METR

Model Evaluation & Threat Research

AI Safety Research & Evaluation · Non-profit/Donation-funded · ai · evaluation · threat-research · safety · model-analysis · Reviewed

Our Take

METR (née ARC Evals) is out here doing the unglamorous work of actually measuring what AI agents can and can't do, which is honestly more useful than the endless stream of "we crushed the benchmark" press releases flooding my inbox. They're not building models; they're the people making sure the rest of us can tell what's real capability and what's just training data optimized for the benchmark. The evaluation space is getting crowded, but METR's been at this long enough to have earned real credibility.

Evaluates frontier AI models to help companies and wider society understand AI capabilities and the risks they pose

Key Features

Frontier Risk Reports assessing AI deployment risks, Time Horizon methodology measuring AI task-completion capabilities, MALT dataset of evaluation integrity threats, Hawk open-source platform for running AI agent evaluations at scale, Independent reviews of AI developer risk assessments, AI productivity impact studies, Prototype governance approaches like Responsible Scaling Policies

Problem It Solves

Need for independent third-party evaluation of AI autonomous capabilities and catastrophic risks

Target Customer

AI companies (OpenAI, Anthropic, Google, Meta, xAI), governments, policymakers, and society

Use Cases

Evaluating AI autonomous capabilities, Assessing rogue deployment risks, Measuring AI agent task-completion time horizons, Evaluating AI sabotage risks, Studying AI effects on developer productivity, Risk assessment for frontier AI models

Pricing Details

Funded by donations; does not accept compensation from AI companies

Differentiator

Independent third-party scientific evaluation of AI capabilities and risks; has partnered with leading AI developers including Anthropic, Google, Meta, and OpenAI

Why Now

AI systems advancing rapidly with potential for autonomous capabilities that could pose catastrophic risks; need to measure when AI systems might have wide-reaching impacts

Traction

Customers Mentioned: Anthropic, Google, Meta, OpenAI, xAI, DeepSeek · Notable Metrics: Time horizon metric shows the length of tasks AI agents can complete has doubled approximately every 7 months over the past 6 years; 349 technical workers surveyed for productivity study · Press Mentions: May 2026, April 2026, February 2026, August 2025, July 2025, March 2025, March 2024
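
For a sense of scale, here is a back-of-the-envelope sketch of what the doubling figure under Notable Metrics implies. It assumes clean exponential growth at exactly 7 months per doubling for exactly 72 months, which the listing only states approximately. The function names are illustrative and are not part of any METR tooling.

```python
# Rough arithmetic only: assumes a constant doubling period, which is a
# simplification of the "approximately every 7 months" figure quoted above.

def doublings(elapsed_months: float, doubling_months: float) -> float:
    """Number of doublings that fit into the elapsed period."""
    return elapsed_months / doubling_months


def growth_factor(elapsed_months: float, doubling_months: float) -> float:
    """Overall multiplier after repeated doubling over the elapsed period."""
    return 2.0 ** doublings(elapsed_months, doubling_months)


if __name__ == "__main__":
    months = 6 * 12  # the 6-year span quoted in the listing
    period = 7       # the quoted doubling period, in months
    print(f"doublings: {doublings(months, period):.1f}")
    print(f"growth factor: {growth_factor(months, period):.0f}x")
```

Under those assumptions the figure works out to a bit more than ten doublings, or growth on the order of a thousandfold over the six years.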

Key Facts

The people behind METR

METR Team

Non-profit

Independent research organization that evaluates frontier AI models for autonomous capabilities and the risks they pose. METR was formerly known as ARC Evals.

Links

Website
