"What's the return on investment (ROI) on AI?" — if your answer is only "hours saved," skeptics will rightly push back.

Time matters, but it's one line on a scorecard. Quality, cycle time, risk avoided, and adoption rate tell you whether a pilot deserves scale or a graceful stop. I use this framework with CFOs and operations leads who are tired of slide-deck promises.

At a glance

  • Measure before the pilot starts: without a baseline, you'll argue over anecdotes
  • Balance efficiency metrics with quality, risk, and adoption in one scorecard
  • Include human review time in cost — it's not "free"
  • Connect to budget reality, not vendor case studies

Baseline first (two weeks)

Before any tool change, capture:

Metric              | How to measure
Time on task        | Sample 10–20 instances and time each one honestly with a stopwatch
Error / rework rate | Misses, corrections, client complaints
Cycle time          | Request → delivered
Cost of delay       | Backlog, overtime, missed SLAs

Without a baseline, "50% faster" is marketing.
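As a rough illustration of what a baseline capture might look like, the sketch below summarizes a handful of sampled task instances into time on task, rework rate, and cycle time. The record fields, the `baseline_summary` helper, and all values are hypothetical; only four records are shown for brevity, whereas a real baseline would use the 10–20 instances suggested above.

```python
from datetime import datetime
from statistics import median

# Hypothetical baseline sample: one record per observed task instance.
# Field names and values are illustrative, not data from a real pilot.
samples = [
    {"minutes_on_task": 42, "reworked": False,
     "requested": "2024-03-04T09:00", "delivered": "2024-03-05T15:00"},
    {"minutes_on_task": 55, "reworked": True,
     "requested": "2024-03-04T10:00", "delivered": "2024-03-06T10:00"},
    {"minutes_on_task": 38, "reworked": False,
     "requested": "2024-03-05T08:00", "delivered": "2024-03-05T20:00"},
    {"minutes_on_task": 47, "reworked": False,
     "requested": "2024-03-06T09:00", "delivered": "2024-03-07T09:00"},
]


def baseline_summary(records):
    """Summarize time on task, rework rate, and cycle time for a sample."""
    if not records:
        raise ValueError("need at least one sampled instance")
    # Cycle time is measured from request to delivery, in hours.
    cycle_hours = [
        (datetime.fromisoformat(r["delivered"])
         - datetime.fromisoformat(r["requested"])).total_seconds() / 3600
        for r in records
    ]
    return {
        "n": len(records),
        "median_minutes_on_task": median(r["minutes_on_task"] for r in records),
        "rework_rate": sum(r["reworked"] for r in records) / len(records),
        "median_cycle_hours": median(cycle_hours),
    }


summary = baseline_summary(samples)
print(summary)
```

Medians are used here rather than means so that one unusually slow instance does not dominate a small sample; that is a design choice of this sketch, not a requirement of the framework.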

The four-quadrant scorecard

1. Efficiency

  • Hours saved per week (at team level, not for one hero user)
  • Cost per transaction (for repeatable tasks)
  • Throughput (items processed)

Caution: Shaving minutes on a broken process automates waste. Pair with friction mapping before scaling.

2. Quality

  • Error rate before/after
  • Rework tickets
  • Client satisfaction on affected deliverables

AI that speeds up wrong answers is negative ROI.

3. Speed

  • Cycle time reduction
  • Time-to-first-draft (with human review still counted)

4. Risk and resilience

  • Near-misses caught in review
  • Consistency of documentation
  • Reduced dependency on one person's tacit knowledge

Harder to quantify — but executives feel these when someone is on vacation.

Adoption metrics (don't skip)

  • Active users / eligible users weekly
  • Completion rate — started workflow vs finished
  • Override rate — humans fixing AI output
  • Qualitative — short survey: trust, would recommend

A brilliant tool with 15% adoption fails the business case.
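The adoption metrics above reduce to a few ratios. Here is a minimal sketch, assuming weekly counts are available from tool logs or a manual tally. The function name, parameter names, and the choice of denominators (for example, overrides divided by outputs reviewed) are assumptions of this sketch. The 5-of-6 active users figure matches the worked example later in this article; the other counts are invented for illustration.

```python
def adoption_metrics(active_users, eligible_users,
                     workflows_started, workflows_finished,
                     outputs_reviewed, outputs_overridden):
    """Return weekly adoption, completion, and override rates as fractions."""

    def ratio(numerator, denominator):
        # Report 0.0 instead of raising in weeks with no activity.
        return numerator / denominator if denominator else 0.0

    return {
        "adoption_rate": ratio(active_users, eligible_users),
        "completion_rate": ratio(workflows_finished, workflows_started),
        "override_rate": ratio(outputs_overridden, outputs_reviewed),
    }


# Hypothetical week: 5 of 6 eligible users active, 10 of 12 workflows
# finished, 3 of 10 reviewed outputs corrected by a human.
week = adoption_metrics(active_users=5, eligible_users=6,
                        workflows_started=12, workflows_finished=10,
                        outputs_reviewed=10, outputs_overridden=3)
for name, value in week.items():
    print(f"{name}: {value:.0%}")
```

The qualitative survey does not fit into a ratio and is left out on purpose; it belongs next to these numbers on the scorecard, not inside them.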

Worked example: meeting notes pilot (8 weeks)

A professional services SMB (28 people) measured an automated meeting notes pilot across two teams (6 eligible users).

Annual costs (extrapolated from pilot):

Item                                    | Amount
Enterprise tool licenses                | $4,800
Integration time (IT + pilot lead)      | $3,200
Training (2 × 90 min sessions)          | $1,800
Human review time (12 min × 48 sets/mo) | $6,400
Total cost                              | $16,200

Annual benefits (annualized from measured pilot results, not projected from estimates):

Item                                      | Amount
Drafting time saved (312 h × $85 loaded)  | $26,520
Rework avoided (2 errors/mo × 2 h × $85)  | $4,080
Publication delays avoided (conservative) | $2,400
Total benefit                             | $33,000

ROI ≈ (33,000 − 16,200) / 16,200 = 104% — with 83% adoption (5/6 users active weekly). The CFO approved extension to a third team. Without the adoption column, leadership would have seen only "312 hours" and missed that 17% of the pilot team wasn't using the tool.

Simple ROI formula (SMB-friendly)

Annual benefit ≈ (hours saved × loaded hourly rate) + rework avoided + delay cost avoided
Annual cost ≈ licenses + integration + training + review time + governance overhead
ROI ≈ (benefit − cost) / cost

Include review time in cost — human-in-the-loop (HITL) review, where a named person approves before outputs leave the organization, is real work. Include ramp-up; month one is rarely steady state.
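The formula can be sketched as a small function. The function and parameter names below are illustrative; the inputs are the dollar figures from the meeting notes example above. That example lists no governance line item, so `governance_overhead` defaults to zero here, which is an assumption of this sketch.

```python
def simple_roi(hours_saved, loaded_hourly_rate, rework_avoided,
               delay_cost_avoided, licenses, integration, training,
               review_time, governance_overhead=0.0):
    """Return (annual benefit, annual cost, ROI as a fraction)."""
    benefit = (hours_saved * loaded_hourly_rate
               + rework_avoided + delay_cost_avoided)
    # Review time is a cost line, not a footnote.
    cost = (licenses + integration + training
            + review_time + governance_overhead)
    if cost <= 0:
        raise ValueError("annual cost must be positive")
    return benefit, cost, (benefit - cost) / cost


# Figures from the meeting notes pilot in this article.
benefit, cost, roi = simple_roi(
    hours_saved=312, loaded_hourly_rate=85,
    rework_avoided=4_080, delay_cost_avoided=2_400,
    licenses=4_800, integration=3_200, training=1_800,
    review_time=6_400,
)
print(f"benefit=${benefit:,.0f} cost=${cost:,.0f} ROI={roi:.0%}")
# prints: benefit=$33,000 cost=$16,200 ROI=104%
```

To account for ramp-up, one option is to run the same function on month-one figures and on steady-state figures separately and report both.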

What convinces skeptics

  • Side-by-side samples (anonymized) — before vs after
  • Named process owner endorsing results
  • Honest misses — "here's where it failed and what we changed"
  • Bounded scale plan — not open-ended spend

When to stop or pivot

  • Quality metrics worsen
  • Review time exceeds time saved
  • Adoption flat after training
  • Governance incidents rise

Stopping a pilot isn't failure — it's discipline.
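The four stop-or-pivot signals can be checked mechanically at each reporting point. This is a minimal sketch: the function name, its inputs, and the `min_adoption_gain` threshold that defines "flat" are all assumptions, and each team should set its own thresholds before the pilot starts.

```python
def stop_signals(error_rate_before, error_rate_after,
                 review_hours, hours_saved,
                 adoption_by_week, incidents_by_month,
                 min_adoption_gain=0.05):
    """Return the list of stop-or-pivot signals that are currently firing."""
    signals = []
    if error_rate_after > error_rate_before:
        signals.append("quality metrics worsen")
    if review_hours > hours_saved:
        signals.append("review time exceeds time saved")
    # Assumption: "flat" means adoption gained less than min_adoption_gain
    # between the first and last recorded week after training.
    if (len(adoption_by_week) >= 2
            and adoption_by_week[-1] - adoption_by_week[0] < min_adoption_gain):
        signals.append("adoption flat after training")
    # Assumption: "rise" compares the latest month with the first month.
    if (len(incidents_by_month) >= 2
            and incidents_by_month[-1] > incidents_by_month[0]):
        signals.append("governance incidents rise")
    return signals


# Hypothetical healthy pilot: no signals fire.
healthy = stop_signals(0.08, 0.05, review_hours=10, hours_saved=26,
                       adoption_by_week=[0.50, 0.67, 0.83],
                       incidents_by_month=[1, 0])

# Hypothetical struggling pilot: all four signals fire.
struggling = stop_signals(0.05, 0.09, review_hours=30, hours_saved=26,
                          adoption_by_week=[0.50, 0.50, 0.50],
                          incidents_by_month=[0, 2])
print(healthy)
print(struggling)
```

A non-empty list is a prompt for a conversation with the process owner, not an automatic shutdown.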

Reporting rhythm

  • Weekly during pilot — operational tweaks
  • Monthly — scorecard to leadership
  • At pilot end — scale / extend / stop decision with numbers

Where you are

You've just completed the Concrete pilots series — meeting notes, field workflow, measured ROI. The next series, Govern and sustain, starts here: Human-in-the-loop: where AI stops and judgment starts — who approves what before expanding externally.
