METR

Model Evaluation & Threat Research

AI Safety Research & Evaluation · Non-profit/Donation-funded · ai · evaluation · threat-research · safety · model-analysis · Reviewed

Our Take

METR (née ARC Evals) is out here doing the unglamorous work of actually measuring what AI agents can and can't do, which is honestly more useful than the endless stream of "we crushed the benchmark" press releases flooding my inbox. They're not building models; they're the people making sure the rest of us can tell what's real and what's just training data optimized for the benchmark. The evaluation space is getting crowded, but METR has been at this long enough to have real credibility.

Evaluates frontier AI models to help companies and wider society understand AI capabilities and the risks they pose

Key Features
Frontier Risk Reports assessing AI deployment risks, Time Horizon methodology measuring AI task-completion capabilities, MALT dataset of evaluation integrity threats, Hawk open-source platform for running AI agent evaluations at scale, Independent reviews of AI developer risk assessments, AI productivity impact studies, Prototype governance approaches like Responsible Scaling Policies
Problem It Solves
Need for independent third-party evaluation of AI autonomous capabilities and catastrophic risks
Target Customer
AI companies (OpenAI, Anthropic, Google, Meta, xAI), governments, policymakers, and society
Use Cases
Evaluating AI autonomous capabilities, Assessing rogue deployment risks, Measuring AI agent task-completion time horizons, Evaluating AI sabotage risks, Studying AI effects on developer productivity, Risk assessment for frontier AI models
Pricing Details
Funded by donations; does not accept compensation from AI companies
Differentiator
Independent third-party scientific evaluation of AI capabilities and risks; has partnered with leading AI developers including Anthropic, Google, Meta, and OpenAI
Why Now
AI systems are advancing rapidly and may gain autonomous capabilities that pose catastrophic risks, so there is a growing need to measure when they might have wide-reaching impacts
Traction
Customers Mentioned: Anthropic, Google, Meta, OpenAI, xAI, DeepSeek · Notable Metrics: Time horizon metric shows the length of tasks AI agents can complete has doubled approximately every 7 months over 6 years; 349 technical workers surveyed for productivity study · Press Mentions: May 2026, April 2026, February 2026, August 2025, July 2025, March 2025, March 2024
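
For a sense of scale, here is a rough back-of-the-envelope sketch of what "doubling every 7 months for 6 years" compounds to. It assumes a perfectly constant doubling time, which the listing only describes as approximate, and the function name `growth_factor` is made up for this illustration; it is not METR's code or methodology.

```python
def growth_factor(months_elapsed: float, doubling_months: float = 7.0) -> float:
    """Multiplicative growth implied by a constant doubling time (an assumption)."""
    return 2.0 ** (months_elapsed / doubling_months)


# 6 years expressed in months, using the 7-month doubling figure quoted above.
months = 6 * 12
doublings = months / 7.0
factor = growth_factor(months)

print(f"{doublings:.1f} doublings, growth factor of about {factor:,.0f}x")
```

Under those assumptions the figure works out to a bit over ten doublings, so on the order of a thousandfold increase in task length. Small changes in the doubling time move the result a lot, so treat it as an order-of-magnitude reading rather than a precise number.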

Key Facts

Category
AI Safety Research & Evaluation
Stage
Non-profit/Donation-funded
Discovered via
Substack newsletter

The people behind METR

METR Team

Non-profit research organization

METR (formerly ARC Evals) is a donation-funded non-profit that evaluates autonomous AI agents. It does not build models itself.
