The Hidden ROI of AI: Why Leaders Are Measuring AI Success the Wrong Way

World Development Corporation Directors’ Institute - World Council of Directors
May 25

Here is a quiet number that has been making the rounds in CFO conversations for the last six months.

Ninety-five percent.

That is the share of corporate generative AI pilots that have not produced a measurable impact on profit and loss, according to a July 2025 report from MIT's NANDA initiative called The GenAI Divide. Between thirty and forty billion dollars has been spent on enterprise AI. Five percent of those pilots are extracting millions in real value. The other ninety-five percent are sitting there, polite and quiet, waiting to be either scaled or quietly shelved.

Now, every think piece you have read in the last six months has used this number to argue that AI is failing.

I want to argue something else.

The AI is not failing. The way we are measuring it is.

An infographic titled "THE HIDDEN ROI OF AI: LEADING THE MEASUREMENT PARADIGM SHIFT" illustrating how to measure AI success. It uses an iceberg metaphor, showing traditional metrics (inputs and lagging indicators) as only the tip of the iceberg with minimal P&L impact, leading to the statistic that 95% of AI pilots show no measurable impact. The much larger submerged part of the iceberg represents "HIDDEN ROI (THE REAL VALUE)," which consists of "leading indicators and avoided cost." A brain graphic with a network of neural connections highlights these leading indicators: Avoided Cost (Compliance Error Caught), Time-to-Value (Accelerated Cycles), Function-Level ROI (Surgical Decisions), and Proficiency (Use-Case Diversity & Prompt Quality). The right side of the infographic outlines the "2026 LEADER MEASUREMENT SHIFT," visually contrasting old practices to stop (like relying on surveys) with new practices to adopt (measuring proficiency and function-level view) to capture "REALIZED ECONOMIC VALUE."

What is the hidden ROI of AI?

The hidden ROI of AI is the value AI is actually producing inside companies right now that does not show up on any dashboard, finance report, or board update — because most companies are measuring the wrong things.

If you ask a CFO whether their AI investment is paying off, they will check three numbers. Licence count. Average productivity hours saved per employee. Maybe a survey score on whether people find AI "helpful."

None of those numbers tell you what AI is actually doing.

The bug caught by an AI code reviewer at 2 AM that would have cost two hundred thousand dollars in production? Invisible. The compliance flag raised before a contract went out the door? Not on the dashboard. The junior analyst who, thanks to a copilot, just produced work that would have needed a senior person three weeks ago? Maybe celebrated, never measured.

This is the hidden ROI of AI in one sentence. It is the value that exists, that is real, that is moving the business — but that nobody has built a number for yet.

And until companies fix that, the AI ROI debate will keep going in circles.

Why is measuring AI success so hard right now?

Because most leaders are still measuring AI the way they measured software ten years ago.

You buy software. You pay a per-user licence. You count usage. You measure cost saved. You report ROI in a tidy slide. Clean. Linear. Auditable.

AI does not behave like software. It behaves like a new hire who is unevenly brilliant. Some days it saves you a week. Some days it confidently writes something wrong. Some weeks it quietly reshapes how an entire team works without anyone documenting it.

Try fitting that into a per-licence ROI calculation.

The IBM 2026 Q4 Think Circle report makes the gap explicit. About seventy-nine percent of executives say they see productivity gains from AI. But only twenty-nine percent say they can measure that ROI with any real confidence. Translation — most leaders know AI is doing something, they just cannot tell you what it is worth.

That is not a technology problem. That is a measurement problem.

PwC's 2026 Global CEO Survey adds another layer. Fifty-six percent of CEOs surveyed said they were getting "nothing" from their AI adoption efforts. Now, two things can be true at the same time. Some of those CEOs are right — their pilots really are stalled. And some of them are simply not seeing what is happening below the C-suite, because their measurement system was not designed to see it.

Why are 95% of AI pilots failing?

According to MIT, it is not the model. It is what they call the learning gap.

The NANDA researchers ran fifty-two executive interviews, surveyed one hundred and fifty senior leaders, analysed three hundred public AI deployments, and gathered three hundred and fifty employee survey responses. Their conclusion is brutal in its simplicity. Most generative AI tools do not retain feedback, do not adapt to context, and do not improve over time. They demo beautifully. They die in the workflow.

Three patterns inside that finding are worth pausing on, because they are exactly where the AI investment value is leaking out.

One. Companies are pouring money into the wrong use case. More than half of enterprise GenAI budgets are going into sales and marketing pilots — flashy, customer-facing, easy to demo. MIT found that the highest ROI is actually showing up in back-office automation. Compliance. Document processing. Internal workflows. Finance ops. The unglamorous middle of the company. The part where nobody books a press release for fixing the invoice cycle.

Two. The build-versus-buy answer is not obvious. Internal builds are succeeding about thirty-three percent of the time. Specialised vendor partnerships are succeeding closer to sixty-seven percent. That is roughly twice the success rate. Yet most large enterprises still default to internal builds because of pride, control, or sunk-cost thinking.

Three. Adoption is happening, but in the shadows. This one is the most interesting. Only about forty percent of companies have official enterprise LLM subscriptions. But ninety percent of workers in the MIT survey said they use a personal AI tool like ChatGPT or Claude for work, daily. The work is getting done. The value is being created. The company just is not seeing it because the tools are unofficial. MIT calls this the shadow AI economy.

If your dashboard says nothing is happening but your output keeps improving, the dashboard is wrong.

What are leaders measuring wrong?

Five specific mistakes, and most companies are making at least three of them.

One. Confusing inputs with outcomes. A licence count is an input. So are hours saved. Neither tells you whether anything that matters to the business actually changed. Outcomes are different: did the customer churn rate drop, did the audit cycle shrink, did the sales conversion rate move? Most AI dashboards stop at inputs.

Two. Relying on surveys. This is one of the biggest unspoken problems. When the CEO has personally championed AI and the board has approved a forty-million-dollar transformation programme, employees feel pressure to report positive experiences. Survey-based ROI is almost always inflated. People remember last Tuesday, not their average month. The number you get back is social desirability dressed up as data.

Three. Tracking the wrong direction in time. Lagging indicators — revenue, profit, churn — are useful but they show up months after the AI is doing its work. Leading indicators — prompt quality, use-case diversity, time-to-decision, error rates caught early — show up in weeks. Companies that only track lagging numbers conclude AI is not working, when in fact the leading indicators have been positive for a quarter.

Four. Forgetting avoided cost. This is the entire invisible category. AI that catches a compliance error before it ships. AI that flags a contract clause that would have lost the company two hundred thousand dollars. AI-assisted code review that prevents a production bug. None of this appears in standard productivity dashboards. For quality-sensitive industries — banking, healthcare, legal, audit — avoided cost is often the largest single component of real AI ROI. And almost nobody is measuring it.

Five. Skipping the middle of the chain. A clean AI ROI metrics framework needs five links. Spend. Adoption depth. Proficiency. Productivity signal. Business outcome. Most companies measure spend and business outcome — the first and the last link — and skip everything in between. When the outcome moves, they cannot tell whether the AI caused it or whether it just happened to be in the room.
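
To make the five-link chain concrete, here is a minimal Python sketch that reports which links a dashboard has no number for. The link names follow the list above. The function name, the dictionary layout, and the sample figures are assumptions made purely for illustration.

```python
from typing import Optional

# The five links, in chain order, as named in the text.
LINKS = [
    "spend",
    "adoption_depth",
    "proficiency",
    "productivity_signal",
    "business_outcome",
]


def missing_links(measured: dict[str, Optional[float]]) -> list[str]:
    """Return the links that have no measurement, in chain order."""
    return [link for link in LINKS if measured.get(link) is None]


# Hypothetical dashboard that tracks only the first and last link.
typical = {"spend": 40_000_000.0, "business_outcome": 0.02}
print(missing_links(typical))
```

A dashboard like the hypothetical one here leaves the three middle links unmeasured, which is exactly the gap that makes attribution impossible when the outcome moves.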

What should leaders actually measure?

Five things, ordered from easiest to hardest.

One. Time-to-value. How long between the day the AI tool was deployed and the day it produced a measurable business result. Shorter time-to-value means the design and the workflow fit. Longer time-to-value usually means the data is dirty, the integration is broken, or the success criteria were never written down. This is one of the cleanest AI ROI metrics available, and almost no one tracks it.

Two. Function-level ROI, not company-wide ROI. A single company-wide AI ROI number is mostly useless. Function-level ROI — finance, customer support, legal, engineering — lets you see which corner of the business AI is actually working in. It also lets the CFO make surgical calls. Double down here. Cut there. Retrain that team. That is how you build a portfolio view of AI investment value.

Three. Proficiency, not just adoption. Adoption is a one-time hurdle. Once people are logged in, you have cleared it. Proficiency is different — it measures use-case diversity (how many distinct tasks someone applies AI to) and prompt quality (how well they interact with the tool). Two employees can both be "AI users" and produce wildly different output. Proficiency is the multiplier between adoption and value.

Four. Avoided cost. Build a category in the dashboard for this and resist the urge to call it soft ROI. A compliance error caught is real money. A bug prevented from reaching production is real money. A misclassification stopped in audit is real money. If you cannot put a number on it, build a range with a confidence rating.

Five. The Levelized Cost of AI. This is a newer metric, borrowed from how the energy industry measures the lifetime cost of producing a unit of electricity. LCOAI measures the cost per useful AI output across the model's full lifecycle — including compute, integration, governance, and ongoing tuning. It lets you compare an external API-based GenAI deployment with an in-house build on the same scale. As compute costs and energy prices move, this metric is going to matter more and more.
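
The five measures above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a standard. The field names, the ROI formula (benefit minus cost, divided by cost), and every figure are hypothetical. Prompt quality is left out because it needs human or model grading, and the LCOAI shown is the simplest undiscounted form: lifecycle cost divided by useful outputs.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Deployment:
    """One AI deployment in one business function (all fields hypothetical)."""
    function: str            # e.g. "finance", "legal"
    deployed_on: date        # day the tool went live
    first_result_on: date    # day of the first measurable business result
    lifecycle_cost: float    # compute + integration + governance + tuning
    measured_benefit: float  # benefit the function can document
    useful_outputs: int      # outputs that passed human review


@dataclass
class AvoidedCost:
    """An avoided-cost estimate stated as a range with a confidence rating."""
    description: str
    low: float
    high: float
    confidence: str          # "low", "medium" or "high"


def time_to_value_days(d: Deployment) -> int:
    """Days between deployment and the first measurable result."""
    return (d.first_result_on - d.deployed_on).days


def function_roi(deployments: list[Deployment]) -> dict[str, float]:
    """ROI per function, assuming every function has a positive cost."""
    cost: dict[str, float] = {}
    benefit: dict[str, float] = {}
    for d in deployments:
        cost[d.function] = cost.get(d.function, 0.0) + d.lifecycle_cost
        benefit[d.function] = benefit.get(d.function, 0.0) + d.measured_benefit
    return {f: (benefit[f] - cost[f]) / cost[f] for f in cost}


def use_case_diversity(task_log: list[tuple[str, str]]) -> dict[str, int]:
    """Number of distinct task types each user applies AI to."""
    tasks: dict[str, set[str]] = {}
    for user, task in task_log:
        tasks.setdefault(user, set()).add(task)
    return {user: len(seen) for user, seen in tasks.items()}


def avoided_cost_range(items: list[AvoidedCost]) -> tuple[float, float]:
    """Total avoided cost as a (low, high) range."""
    return (sum(i.low for i in items), sum(i.high for i in items))


def cost_per_useful_output(d: Deployment) -> float:
    """Simplified LCOAI: lifecycle cost per useful output, no discounting."""
    if d.useful_outputs <= 0:
        raise ValueError("useful_outputs must be positive")
    return d.lifecycle_cost / d.useful_outputs


# Made-up sample data, for illustration only.
deployments = [
    Deployment("finance", date(2026, 1, 5), date(2026, 2, 16),
               120_000.0, 300_000.0, 40_000),
    Deployment("marketing", date(2026, 1, 5), date(2026, 5, 4),
               200_000.0, 150_000.0, 10_000),
]
avoided = [
    AvoidedCost("contract clause flagged", 150_000.0, 250_000.0, "medium"),
    AvoidedCost("production bug prevented", 50_000.0, 200_000.0, "low"),
]
task_log = [("ana", "summarise"), ("ana", "draft"),
            ("ana", "summarise"), ("raj", "summarise")]

for d in deployments:
    print(d.function, time_to_value_days(d), cost_per_useful_output(d))
print(function_roi(deployments))
print(use_case_diversity(task_log))
print(avoided_cost_range(avoided))
```

Keeping avoided cost as a range with its own confidence label, rather than folding it into the ROI figure, lets a CFO see how much of the claimed value rests on estimates. A fuller LCOAI would likely discount future costs and outputs the way the energy metric does. That refinement is omitted here.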

When should leaders expect AI ROI to show up?

This is the question that quietly torpedoes most AI programmes. The honest answer is — it depends on what you measured.

Productivity signals show up in weeks. Proficiency improves over the first three to six months. Function-level ROI usually surfaces in a quarter or two if the use case is well-chosen. Enterprise-wide P&L impact takes a year or more, and that is on a generous timeline.

Jensen Huang, the Nvidia CEO, said something at the start of 2026 that has been quoted ever since. Demanding ROI from AI in the first six months, he argued, is a bit like asking a child to write a business plan. The technology is moving from infrastructure-building into early operating mode. That does not mean leaders should stop asking for returns. It does mean they should stop expecting the wrong returns at the wrong time.

The companies that are getting this right — and the BCG 2026 AI Radar survey found that around twenty-five percent of AI initiatives are delivering expected ROI, with sixteen percent scaling enterprise-wide — are doing two things that the other seventy-five percent are not.

They have multiple time horizons baked into their measurement. Weeks for adoption and proficiency. Quarters for function-level ROI. Years for enterprise transformation.

And they have someone in the room whose job is to challenge the dashboard.

Why does this matter for boards in 2026?

Because the budget cycle is coming.

The companies that can prove AI ROI with data — real data, not survey vibes — will double down next year. The companies that cannot will see their AI budgets cut. BCG's data is clear on this. The 2026 budget conversations in most boardrooms are no longer about whether to fund AI. They are about whether the current AI funding is producing anything worth keeping.

That makes measuring AI success a board-level question, not an IT question. If your AI dashboard cannot tell you which functions are generating return, which are not, what the avoided-cost number looks like, and how long the time-to-value has been, the board is, effectively, voting blind.

And boards that vote blind, in a regulatory environment where AI accountability now sits squarely with directors under existing fiduciary duties, are taking a bigger risk than they realise.

The real takeaway

The MIT ninety-five percent number is not the story.

The story is that of the five percent that are succeeding, almost none of them got there through a bigger model, a fancier vendor, or more spend. They got there because somebody in the room insisted on measuring the right things — back-office where the ROI lived, proficiency over adoption, function-level over company-wide, leading over lagging, avoided cost over surveys.

The hidden ROI of AI is not really hidden. It is sitting there, in your workflows, in your shadow tools, in the small daily wins your dashboard cannot see.

The question is not whether the AI is working.

It is whether you have built a system honest enough to notice when it is.
