Methodology — How We Measure Task-Level AI Exposure

01 · Definitions

Tasks, not jobs.

We define a task as a discrete, observable unit of paid work, drawn from the O*NET 28.3 occupational taxonomy maintained by the U.S. Department of Labor. A profession is a weighted bag of tasks. Models substitute for or assist with tasks, not whole jobs.

Throughout this document, exposure refers to the share of time-weighted task value that current frontier models can perform at or above human-median quality, holding context, tools, and oversight constant.

Definition

E_profession = Σ_{i ∈ tasks} w_i · c(t_i, M)

w = TIME WEIGHT (BLS) · c = CAPABILITY SCORE · M = FRONTIER MODEL SET
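
A minimal sketch of the definition above, not the production pipeline. It assumes the time weights are normalized to sum to 1 within a profession and that c(t_i, M) has already been collapsed to one number in [0, 1] per task. The `Task` class, the `profession_exposure` function, and the three example tasks with their weights are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    time_weight: float  # w_i: share of working time spent on the task
    capability: float   # c(t_i, M): capability score in [0, 1]


def profession_exposure(tasks: list[Task]) -> float:
    """Return sum_i w_i * c(t_i, M), with weights normalized to sum to 1."""
    total_weight = sum(t.time_weight for t in tasks)
    if total_weight <= 0:
        raise ValueError("time weights must sum to a positive number")
    return sum(t.time_weight / total_weight * t.capability for t in tasks)


# Hypothetical profession with three tasks (illustrative numbers only).
example_tasks = [
    Task("write routine reports", time_weight=0.5, capability=0.88),
    Task("triage incident logs", time_weight=0.3, capability=0.41),
    Task("design system architecture", time_weight=0.2, capability=0.27),
]

exposure = profession_exposure(example_tasks)
print(f"{exposure:.3f}")  # prints 0.617
```

Normalizing inside the function means raw hours per task can be passed in place of shares without changing the result.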

02 · Data sources

Four primary inputs.

TASK TAXONOMY

O*NET 28.3

Source taxonomy · 923 occupations · 19,260 tasks

U.S. DEPT OF LABOR

LABOR MARKET

BLS OEWS 2025

154M workers · wages · employment · time-use

BUREAU OF LABOR STATISTICS

CAPABILITY MAPPING

Anthropic 2024

Task-level model performance benchmarks

INTERNAL & ARXIV

EXPOSURE RESEARCH

Eloundou et al.

GPTs are GPTs — exposure framework foundations

ARXIV:2303.10130

03 · The capability matrix

52 capabilities × 6 frontier models.

Each task is decomposed into one or more capability primitives — atomic skills like “summarize unstructured text” or “debug deterministic code under spec.” We score every capability against six frontier models, refreshed quarterly.

Capability	Claude 4.5	GPT-5	Gemini 2.5	Llama 4	o-Series	Median
Generate boilerplate code	0.94	0.92	0.89	0.81	0.86	0.89
Summarize unstructured text	0.91	0.93	0.88	0.85	0.84	0.88
Multi-turn empathetic dialog	0.62	0.58	0.55	0.41	0.49	0.55
Triage production incident logs	0.41	0.45	0.38	0.31	0.46	0.41
Design system architecture	0.28	0.31	0.25	0.19	0.27	0.27
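
The Median column can be reproduced from the per-model scores. The sketch below copies the five model scores shown in each row and takes their median; the dictionary layout and variable names are assumptions made for illustration.

```python
from statistics import median

# Scores copied from the table above, in column order:
# Claude 4.5, GPT-5, Gemini 2.5, Llama 4, o-Series.
scores = {
    "Generate boilerplate code": [0.94, 0.92, 0.89, 0.81, 0.86],
    "Summarize unstructured text": [0.91, 0.93, 0.88, 0.85, 0.84],
    "Multi-turn empathetic dialog": [0.62, 0.58, 0.55, 0.41, 0.49],
    "Triage production incident logs": [0.41, 0.45, 0.38, 0.31, 0.46],
    "Design system architecture": [0.28, 0.31, 0.25, 0.19, 0.27],
}

# With an odd number of models the median is the middle score after sorting.
medians = {capability: median(values) for capability, values in scores.items()}

for capability, value in medians.items():
    print(f"{capability}: {value:.2f}")
```

Using the median rather than the mean keeps one unusually weak or strong model from moving the capability score.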

04 · Scoring & aggregation

Bottom-up, weighted, classed.

Tasks are aggregated bottom-up. Each task is given a capability-weighted exposure score (0–100), then classified into one of three buckets at the canonical thresholds:

AI-Substitutable

SCORE ≥ 75

Frontier models meet human-median quality without prompted oversight.

AI-Assisted

40 ≤ SCORE < 75

Models reliably accelerate but require human review for correctness.

Human-Critical

SCORE < 40

Models underperform humans materially; oversight is the work.
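
The scoring and bucketing steps can be sketched as follows. This is an illustration under stated assumptions, not the scoring code itself: it assumes a task's score is 100 times the weighted mean of its capability medians, and that the bands are half-open so that a fractional score such as 74.8 falls in AI-Assisted. The function names and the example task's 0.6/0.4 capability weights are hypothetical.

```python
def task_score(capabilities: list[tuple[float, float]]) -> float:
    """Score a task on 0-100 from (weight, median capability score) pairs."""
    total_weight = sum(weight for weight, _ in capabilities)
    if total_weight <= 0:
        raise ValueError("capability weights must sum to a positive number")
    weighted_mean = sum(w * score for w, score in capabilities) / total_weight
    return 100 * weighted_mean


def classify(score: float) -> str:
    """Map a 0-100 task score to one of the three buckets."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 75:
        return "AI-Substitutable"
    if score >= 40:
        return "AI-Assisted"
    return "Human-Critical"


# Hypothetical task: 60% text summarization (median 0.88) and
# 40% empathetic dialog (median 0.55), using medians from the matrix.
example_score = task_score([(0.6, 0.88), (0.4, 0.55)])
example_bucket = classify(example_score)
print(f"{example_score:.1f} -> {example_bucket}")  # prints 74.8 -> AI-Assisted
```

Checking the highest threshold first keeps each boundary value in exactly one bucket.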

05 · Updates & versioning

Re-scored quarterly. Versioned forever.

The capability matrix is re-benchmarked every quarter as new frontier models release. Every change is committed to a versioned dataset — you can pin any report to a historical dataset for longitudinal research.

v2.7 · JUL 2026 · Expanded dataset to 148 professions; added statistics and ranking pages; corrected cross-reference data

v2.6 · MAY 2026 · Expanded curated profession dataset and generated role-specific FAQs

v2.5 · APR 2026 · Added personal-risk scoring flows and profession-specific question sets

v2.4 · FEB 2026 · O*NET 28.3 task taxonomy migration

v2.2 · NOV 18, 2025 · Added six creative-professional families

v2.1 · AUG 24, 2025 · Re-classification thresholds adjusted (+5pp)

v2.0 · MAY 4, 2025 · Major rewrite; capability matrix introduced
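
Pinning a report to a historical dataset amounts to picking the latest version released on or before a chosen date. The sketch below shows one way to do that from the changelog; it is not an actual API. `RELEASES` and `version_as_of` are hypothetical names, and entries that list only a month are placed on the first of that month, which is an assumption of this sketch.

```python
from datetime import date

# Versions from the changelog, oldest first. Month-only entries are
# assumed to fall on the first day of the month.
RELEASES = [
    ("v2.0", date(2025, 5, 4)),
    ("v2.1", date(2025, 8, 24)),
    ("v2.2", date(2025, 11, 18)),
    ("v2.4", date(2026, 2, 1)),
    ("v2.5", date(2026, 4, 1)),
    ("v2.6", date(2026, 5, 1)),
    ("v2.7", date(2026, 7, 1)),
]


def version_as_of(as_of: date) -> str:
    """Return the latest dataset version released on or before `as_of`."""
    candidates = [version for version, released in RELEASES if released <= as_of]
    if not candidates:
        raise ValueError(f"no dataset version was released on or before {as_of}")
    return candidates[-1]  # RELEASES is sorted oldest first


pinned = version_as_of(date(2025, 12, 31))
print(pinned)  # prints v2.2
```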

06 · Limitations

Where this score is wrong.

Exposure is not adoption. Substitution is not extinction. Below-average models matter less than the marginal user. This index tells you where the ceiling is, not where the market actually lands. Read the full whitepaper for our priors, our anti-bias adjustments, and the things that genuinely surprised us.

07 · Citations

Primary references.

[1] Eloundou, T., Manning, S., Mishkin, P., & Rock, D. (2023). GPTs are GPTs: An early look at the labor market impact potential of large language models. arXiv:2303.10130.

[2] Acemoglu, D., & Restrepo, P. (2022). Tasks, automation, and the rise in US wage inequality. Econometrica, 90(5), 1973–2016.

[3] O*NET Resource Center. (2026). O*NET 28.3 database. U.S. Department of Labor/Employment and Training Administration.

[4] Bureau of Labor Statistics. (2025). Occupational Employment and Wage Statistics (OEWS) 2025. U.S. Department of Labor.

[5] Anthropic. (2024). Task-level model performance benchmarks [Internal research report]. Anthropic, PBC.

[6] Autor, D., Levy, F., & Murnane, R. J. (2003). The skill content of recent technological change: An empirical exploration. Quarterly Journal of Economics, 118(4), 1279–1333.
