Phi-4-reasoning-vision
Open-weight 15B multimodal model for thinking and GUI agents

Our Take
Microsoft just dropped Phi-4-reasoning-vision, a 15-billion-parameter open-weight multimodal model built for thinking and GUI agents. Microsoft researchers Emad Ibrahim, Piroune Balachandran, and Zac Zuo built this beast to process both text and images while handling complex reasoning tasks that most models choke on. It's open-weight, meaning developers can actually download, inspect, and fine-tune it themselves instead of praying to the closed-source gods.
Most multimodal models are either good at understanding images or good at reasoning, not both. Phi-4-reasoning-vision tries to crack that by combining visual understanding with advanced chain-of-thought reasoning in a single 15B package. For developers building AI agents that need to see screens, read documents, and make decisions, this is exactly the kind of open foundation you'd want to build on. It's not a consumer product you "use"; it's infrastructure. The kind of thing that shows Microsoft still knows how to build real AI, not just wrap ChatGPT in a different skin.
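To make the "agents that see screens" idea concrete, here's a minimal sketch of the kind of multimodal prompt such an agent would send to a vision-language model. This assumes an OpenAI-style message schema, which many open-weight VLM runtimes accept; the exact format Phi-4-reasoning-vision expects may differ, so treat the field names and the screenshot URL as placeholders.

```python
# Minimal sketch of a multimodal chat payload for an open-weight VLM agent.
# The message schema below is an assumption (OpenAI-style); check the model's
# actual chat template before sending it to a real runtime.

def build_vision_prompt(image_url: str, question: str) -> list[dict]:
    """Build a single user turn that pairs a screenshot with a text question,
    the basic unit of a GUI-agent loop (observe screen, ask, act)."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": question},
            ],
        }
    ]

# Hypothetical usage: the agent captures a screenshot, then asks the model
# to reason about what to click next.
messages = build_vision_prompt(
    "https://example.com/screenshot.png",
    "Which button submits the form? Think step by step.",
)
```

In a real agent you'd hand `messages` to the model's processor or an inference server and parse the reasoning trace out of the reply; the payload-building step stays the same either way.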
Similar products worth knowing

Cardboard
Cursor for video editing.

Copperlane
Agents for Mortgage Origination

MochaCare
AI-Supercharged Humans for Home Care Agency Growth

Didit v3
The all-in-one Identity platform