Taalas
Our Take
While most AI chip startups add more cores and more memory, Taalas did the opposite: it ripped out programmable cores and external memory entirely. Their custom inference card now hits 17,000 tokens per second on Llama 3.1 8B, at a fraction of a cent per token. It’s an unhinged approach, but the numbers are honestly wild.
Develops custom AI inference hardware (AI cards) optimized for running open-weight models like Llama 3.1 8B, achieving 17,000 tokens per second at a fraction of a cent per token by stripping away programmable cores and external memory.
The people behind Taalas
Ljubisa Bajic
Founder
Founded Taalas in 2023, an AI model development platform that turns models into custom silicon. $219M Series D from Quiet Capital.
Similar products worth knowing

Pierre Computer
AI-native git platform designed for AI agents pushing code, handling massive scale (15,000+ repos per minute) compared t

vLLM
Open-source inference serving engine for LLMs, with day-0 support for Gemma 4 across GPU/TPU
Cognichip
Startup building deep learning models that work alongside engineers to design new computer chips, potentially cutting ch

Nebius
Amsterdam-based AI infrastructure company experiencing 700% ARR growth, deploying gigawatt-scale AI factories equipped w