vLLM

Open-source inference serving engine for LLMs, with day-0 support for Gemma 4 across GPU/TPU

Infrastructure, inference, serving, llm, open-source, Reviewed

Our Take

vLLM is the open-source inference engine that's basically carrying the LLM serving infrastructure game right now, and day-0 Gemma 4 support across both GPU and TPU is the kind of move that makes competitors nervous. It was created by Woosuk Kwon and the team at UC Berkeley, and if you're running any serious LLM workload without considering it, you're making things harder for yourself than they need to be.

News

X/Twitter: Inferact, a startup by the vLLM creators, was founded in January 2026. https://x.com/woosuk_k/status/2014384730528202919

Key Facts

Category

Infrastructure

Discovered via

Substack newsletter

The people behind vLLM

Woosuk Kwon

Co-Founder

Similar products worth knowing

Pierre Computer

AI-native git platform designed for AI agents pushing code, handling massive scale (15,000+ repos per minute) compared t

Cognichip

Startup building deep learning models that work alongside engineers to design new computer chips, potentially cutting ch

Taalas

Develops custom AI inference hardware (AI cards) optimized for running open-weight models like Llama 3-8B, achieving 17,

Nebius

Amsterdam-based AI infrastructure company experiencing 700% ARR growth, deploying gigawatt-scale AI factories equipped w

vLLM — SLAYREPORT