Measuring AI ROI: metrics that convince skeptics

"What's the return on investment (ROI) from AI?" If your answer is only "hours saved," skeptics will rightly push back.

Time matters, but it's one line on a scorecard. Quality, cycle time, risk avoided, and adoption rate tell you whether a pilot deserves to scale or deserves a graceful stop. I use this framework with CFOs and operations leads who are tired of slide-deck promises.

At a glance

Measure before the pilot starts; without a baseline, you'll end up arguing over anecdotes
Balance efficiency metrics with quality, risk, and adoption in one scorecard
Include human review time in cost — it's not "free"
Connect to budget reality, not vendor case studies

Baseline first (two weeks)

Before any tool change, capture:

Metric	How to measure
Time on task	Time 10–20 sampled instances with a stopwatch; record honestly
Error / rework rate	Misses, corrections, client complaints
Cycle time	Request → delivered
Cost of delay	Backlog, overtime, missed SLAs

Without a baseline, "50% faster" is marketing.

The four-quadrant scorecard

1. Efficiency

Hours saved per week (team level, not hero user)
Cost per transaction (for repeatable tasks)
Throughput (items processed)

Caution: Shaving minutes off a broken process just automates waste. Pair efficiency metrics with friction mapping before scaling.

2. Quality

Error rate before/after
Rework tickets
Client satisfaction on affected deliverables

AI that speeds up wrong answers is negative ROI.

3. Speed

Cycle time reduction
Time-to-first-draft (with human review still counted)

4. Risk and resilience

Near-misses caught in review
Consistency of documentation
Reduced dependency on one person's tacit knowledge

These are harder to quantify, but executives feel them when someone is on vacation.
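One way to keep all four quadrants on one page is to record each metric with its baseline, its current value, and which direction counts as better. The sketch below is a hypothetical structure: the `Metric` class, the `scorecard` function, and the example numbers are illustrative, not from a real pilot. It prints efficiency gains next to quality changes, so faster-but-wrong shows up immediately.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    quadrant: str          # "efficiency", "quality", "speed", or "risk"
    name: str
    baseline: float        # measured before the pilot started
    current: float         # measured during or after the pilot
    lower_is_better: bool  # e.g. error rate and cycle time: lower is better

    def pct_change(self) -> float:
        """Relative change from baseline, in percent."""
        if self.baseline == 0:
            raise ValueError(f"{self.name}: zero baseline, percent change undefined")
        return (self.current - self.baseline) / self.baseline * 100

    def improved(self) -> bool:
        """True if the metric moved in its 'better' direction."""
        change = self.current - self.baseline
        return change < 0 if self.lower_is_better else change > 0


def scorecard(metrics: list[Metric]) -> dict[str, list[str]]:
    """Group one-line summaries by quadrant so no quadrant hides behind another."""
    report: dict[str, list[str]] = {}
    for m in metrics:
        status = "better" if m.improved() else "not better"
        line = f"{m.name}: {m.baseline:g} -> {m.current:g} ({m.pct_change():+.0f}%, {status})"
        report.setdefault(m.quadrant, []).append(line)
    return report


# Illustrative numbers only
metrics = [
    Metric("efficiency", "team hours per week on task", 20, 14, lower_is_better=True),
    Metric("quality", "error rate (%)", 4, 5, lower_is_better=True),
]
for quadrant, lines in scorecard(metrics).items():
    print(quadrant, lines)
```

In this made-up case, efficiency improves by 30% while the error rate rises by 25%: exactly the "speeds up wrong answers" pattern the quality quadrant exists to catch.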

Adoption metrics (don't skip)

Weekly active users / eligible users
Completion rate: workflows finished / workflows started
Override rate — humans fixing AI output
Qualitative — short survey: trust, would recommend

A brilliant tool with 15% adoption fails the business case.
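The quantitative adoption metrics are simple ratios, but fixing the numerator and denominator up front avoids arguments later. A minimal sketch, assuming you can count eligible and weekly active users, workflows started and finished, and reviewed outputs that a human had to override. The `adoption_metrics` helper is hypothetical, all counts are illustrative, and the qualitative survey is left out.

```python
def _ratio(numerator: int, denominator: int, label: str) -> float:
    """Divide, refusing to report a rate with an empty denominator."""
    if denominator <= 0:
        raise ValueError(f"{label}: denominator must be positive")
    return numerator / denominator


def adoption_metrics(eligible_users: int, weekly_active_users: int,
                     workflows_started: int, workflows_finished: int,
                     outputs_reviewed: int, outputs_overridden: int) -> dict[str, float]:
    """Hypothetical helper computing adoption, completion, and override rates."""
    return {
        "adoption_rate": _ratio(weekly_active_users, eligible_users, "adoption"),
        "completion_rate": _ratio(workflows_finished, workflows_started, "completion"),
        "override_rate": _ratio(outputs_overridden, outputs_reviewed, "override"),
    }


# Illustrative counts for one week
metrics = adoption_metrics(eligible_users=6, weekly_active_users=5,
                           workflows_started=40, workflows_finished=34,
                           outputs_reviewed=34, outputs_overridden=6)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

A rising override rate alongside steady adoption usually means people are still using the tool but fixing more of its output, which is worth raising before it shows up as rework.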

Worked example: meeting notes pilot (8 weeks)

A professional services SMB (28 people) measured an automated meeting notes pilot across two teams (6 eligible users).

Annual costs (extrapolated from pilot):

Item	Amount
Enterprise tool licenses	$4,800
Integration time (IT + pilot lead)	$3,200
Training (2 × 90 min sessions)	$1,800
Human review time (12 min × 48 sets/mo)	$6,400
Total cost	$16,200

Annual benefits (annualized from measured pilot results, not vendor projections):

Item	Amount
Drafting time saved (312 h × $85 loaded)	$26,520
Rework avoided (2 errors/mo × 2 h × $85)	$4,080
Publication delays avoided (conservative)	$2,400
Total benefit	$33,000

ROI ≈ (33,000 − 16,200) / 16,200 ≈ 104%, with 83% adoption (5 of 6 users active weekly). The CFO approved extending the pilot to a third team. Without the adoption column, leadership would have seen only "312 hours" and missed that one of the six pilot users (17%) wasn't using the tool.

Simple ROI formula (SMB-friendly)

Annual benefit ≈ (hours saved × loaded hourly rate) + rework avoided + delay cost avoided
Annual cost ≈ licenses + integration + training + review time + governance overhead
ROI ≈ (benefit − cost) / cost

Include review time in cost: human-in-the-loop (HITL) review, where a named person approves outputs before they leave the organization, is real work. Budget for ramp-up too; month one is rarely steady state.
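The formula above translates directly into code, which makes it easy to rerun as pilot numbers change. A minimal sketch: the class and function names are hypothetical, `governance_overhead` defaults to zero because the meeting notes pilot lists no such line item, and the usage plugs in that pilot's figures to check that they reproduce its roughly 104% ROI.

```python
from dataclasses import dataclass


@dataclass
class AnnualCost:
    licenses: float
    integration: float
    training: float
    review_time: float               # human-in-the-loop review is real work
    governance_overhead: float = 0.0

    def total(self) -> float:
        return (self.licenses + self.integration + self.training
                + self.review_time + self.governance_overhead)


@dataclass
class AnnualBenefit:
    hours_saved: float
    loaded_hourly_rate: float
    rework_avoided: float
    delay_cost_avoided: float

    def total(self) -> float:
        return (self.hours_saved * self.loaded_hourly_rate
                + self.rework_avoided + self.delay_cost_avoided)


def simple_roi(benefit: AnnualBenefit, cost: AnnualCost) -> float:
    """ROI ≈ (benefit − cost) / cost, as a fraction (1.0 means 100%)."""
    c = cost.total()
    if c <= 0:
        raise ValueError("total cost must be positive")
    return (benefit.total() - c) / c


# Figures from the meeting notes pilot worked example
cost = AnnualCost(licenses=4_800, integration=3_200, training=1_800, review_time=6_400)
benefit = AnnualBenefit(hours_saved=312, loaded_hourly_rate=85,
                        rework_avoided=2 * 2 * 85 * 12,  # 2 errors/mo × 2 h × $85 × 12 months
                        delay_cost_avoided=2_400)
print(f"cost ${cost.total():,.0f}, benefit ${benefit.total():,.0f}, "
      f"ROI {simple_roi(benefit, cost):.0%}")
# prints: cost $16,200, benefit $33,000, ROI 104%
```

Keeping costs and benefits as named line items, rather than two totals, lets a skeptical CFO challenge one assumption (say, the loaded rate) and see the ROI move immediately.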

What convinces skeptics

Side-by-side samples (anonymized) — before vs after
Named process owner endorsing results
Honest misses — "here's where it failed and what we changed"
Bounded scale plan — not open-ended spend

When to stop or pivot

Quality metrics worsen
Review time exceeds time saved
Adoption flat after training
Governance incidents rise
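These stop signals can be checked mechanically each month, so a stop decision rests on numbers rather than mood. This sketch assumes hypothetical `PilotSnapshot` fields and treats adoption as "flat" when it grows by less than five percentage points after training; that threshold is an arbitrary assumption to tune, not a standard.

```python
from dataclasses import dataclass


@dataclass
class PilotSnapshot:
    """Hypothetical monthly snapshot of pilot metrics (field names are assumptions)."""
    error_rate: float             # quality metric: lower is better
    hours_saved_per_week: float   # gross time saved, before review
    review_hours_per_week: float  # human-in-the-loop review time
    weekly_adoption: float        # active users / eligible users
    governance_incidents: int     # incidents logged in the period


def stop_signals(baseline_error_rate: float,
                 after_training: PilotSnapshot,
                 latest: PilotSnapshot,
                 min_adoption_gain: float = 0.05) -> list[str]:
    """Return which stop-or-pivot conditions the latest snapshot triggers."""
    signals: list[str] = []
    if latest.error_rate > baseline_error_rate:
        signals.append("quality metrics worsen")
    if latest.review_hours_per_week > latest.hours_saved_per_week:
        signals.append("review time exceeds time saved")
    # "Flat" is an assumption: growth below min_adoption_gain since training.
    if latest.weekly_adoption - after_training.weekly_adoption < min_adoption_gain:
        signals.append("adoption flat after training")
    if latest.governance_incidents > after_training.governance_incidents:
        signals.append("governance incidents rise")
    return signals


# Illustrative numbers only
after_training = PilotSnapshot(0.04, 6.0, 3.0, 0.50, 0)
latest = PilotSnapshot(0.05, 7.0, 3.0, 0.52, 1)
print(stop_signals(0.04, after_training, latest))
```

An empty list doesn't prove the pilot should scale; it only means none of the stop conditions has fired yet.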

Stopping a pilot isn't failure — it's discipline.

Reporting rhythm

Weekly during pilot — operational tweaks
Monthly — scorecard to leadership
At pilot end — scale / extend / stop decision with numbers

Where you are

You've just completed the Concrete pilots series: meeting notes, field workflow, and measured ROI. The next series, Govern and sustain, starts with Human-in-the-loop: where AI stops and judgment starts, on who approves what before you expand externally.

Building a scorecard for your pilot? Let's talk about metrics that match your CFO's language.