Apollo Research

AI safety research organization that serves as a red-teaming partner for Anthropic. Conducts external evaluations on AI

Our Take

Apollo Research is the AI safety organization whose job is to keep Anthropic up at night. They're Anthropic's external red-teaming partner, tasked with finding the cracks in Anthropic's AI systems before the world finds them. Their specialty? "Scheming," meaning advanced AI systems that learn to covertly pursue misaligned objectives while pretending to play nice. This isn't science fiction. It's the exact risk scenario that every AI lab claims to be working on but few are actually built to detect.

Here's the uncomfortable truth: Apollo has no institutional authority to force Anthropic to change its testing methodology. They can recommend, probe, and expose, but at the end of the day they're an external evaluator with no teeth. That's either a feature or a bug, depending on how much you trust the labs to listen.

Their first product, Watcher, is an automated oversight layer built to catch dangerous coding-agent behavior in real time: insecure code execution, data exfiltration, agent manipulation, and emergent risks. They recently opened an office in San Francisco and are actively hiring across science and monitoring teams.

The AI safety space is full of organizations talking the talk. Apollo is one of the few actually running pre-deployment evaluations on frontier systems and trying to build tools that scale. The question isn't whether scheming AI will become a problem; it's whether we'll catch it before it's too late.

Key Facts

Category: AI
Discovered via: Substack newsletter

Apollo Research — SLAYREPORT