Why Generative AI Is (Still) Not a Magic Wand for Consulting
Why we still need to separate the promise from the facts.
Originally published on Northstar Consulting’s The Pulse
There’s no denying that generative AI has swept through the consulting industry like a gust of wind through a boardroom window. From McKinsey’s Lilli to Deloitte’s Sidekick and BCG’s Deckster, major firms are embedding large language models (LLMs) into their workflows, promising to compress weeks of effort into hours.
But beneath the headlines and PowerPoint decks, a more complex reality is emerging.
The Promise: Efficiency, Scale, Speed
Consulting firms tout generative AI as a force multiplier. McKinsey claims that over 70% of its consultants use Lilli nearly 17 times a week for research and analytics. BCG has spun up over 18,000 custom GPTs. Bain’s Microsoft Copilot integration supports drafting, summarisation, and due diligence across 13,000 consultants. Deloitte has rolled out Sidekick to 75,000 EMEA employees.
The pitch is simple: automate the mundane, accelerate the analytical, and free up consultants to focus on insight and strategy.
The Reality: Oversight, Overhead, and Unintended Consequences
Yet, the real-world application tells a different story — one of rework, governance bottlenecks, and unexpected costs.
Internal reports show that nearly one in four AI-generated deliverables at McKinsey requires significant revision. BCG estimates that 30% of auto-generated decks need manual reworking to align with client-specific context. At Bain, prompt engineering hours are eroding early efficiency gains.
And Deloitte? Their security and compliance frameworks, while robust, can delay deliverables by days — a far cry from the “hours-not-weeks” promise.
Failure Modes: A Few Real-World Examples
Hallucinations with Real-World Consequences: In one healthcare pilot, Lilli cited a regulatory amendment that didn’t exist. BCG’s GENE generated plausible-sounding but incorrect market-sizing figures that only surfaced during senior partner review.
Generic Output, Not Insight: Lilli-generated slides were discarded when a one-size-fits-all resilience framework failed to reflect client logistics data. Bain teams reworked Copilot drafts to embed client-specific risk policies.
Prompt Engineering as a Hidden Cost: At McKinsey, junior analysts spend 15–20 hours a week refining prompts, time previously spent on interviews and synthesis. Deloitte allocates up to 25% of project budgets to prompt engineering and tool customisation.
The Governance Tax
Governance is the hidden tax on generative AI deployments. McKinsey’s six-step validation cycle can stretch to three weeks in regulated sectors. Deloitte layers security and privacy reviews on top of standard workflows. PwC embeds compliance checks at every stage.
The result? Efficiency gains evaporate under the weight of oversight.
Beyond the Consulting Giants: AI Boutiques and Sceptical Clients
Outside the traditional consulting giants, AI-native boutiques are offering faster, more focused services at lower costs. But even here, Gartner predicts that one-third of enterprise generative AI pilots will be abandoned by the end of 2025 due to poor data quality, risk gaps, and rising costs.
Clients, too, remain cautious. Sixty-five per cent of C-suite executives distrust fully AI-automated consulting advice, fearing reputational fallout from black-box recommendations.
Legal liability for errors in AI-assisted advice, whether introduced by humans or by the models themselves, remains unresolved.
The Cost of AI Adoption: Not Just a Line Item
The financial stakes are high. Licensing for Lilli alone costs over £8,000 per user annually — a £360 million investment across McKinsey’s global team, before training and ethics roles. Bain’s ChatGPT enterprise deal and Deloitte’s Sidekick rollout each involved eight-figure commitments.
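A quick back-of-the-envelope check shows the quoted figures hang together. The sketch below simply divides the stated total by the per-user licence fee; the implied headcount of roughly 45,000 is derived from the article's own numbers, not independently sourced.

```python
# Sanity-check the licensing figures quoted above.
per_user_annual = 8_000        # £ per consultant per year (quoted)
total_annual = 360_000_000     # £ firm-wide annual investment (quoted)

# Headcount implied by the two quoted figures (an inference, not a sourced number).
implied_users = total_annual / per_user_annual
print(f"Implied licensed users: {implied_users:,.0f}")  # Implied licensed users: 45,000
```

At that scale, even a modest per-user fee becomes a nine-figure annual commitment before any training or governance spend is counted.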
And yet, only 22% of consultancies track AI use against revenue uplift. Most rely on soft metrics like query volume and tool usage, leaving ROI ambiguous.
Structural Barriers: Data, Ethics, and Accountability
Three major structural challenges loom large:
Data Fragmentation: ERP, CRM, and bespoke research systems remain siloed, stalling AI initiatives and inflating integration budgets. Bain warns that fragmented datasets are a key blocker to effective AI deployment.
Accountability Gaps: With few firms tracking AI-to-revenue KPIs, legal and reputational risks grow when AI-driven recommendations lead to client losses.
Ethical and Legal Risks: Generative models often train on scraped content without consent, raising copyright and privacy concerns. Firms must now build auditable data-sourcing frameworks or face regulatory scrutiny.
The Path Forward: Co-Pilot, Not Autopilot
Generative AI has undoubtedly accelerated routine tasks — slide creation, initial drafts, data summarisation — but it is not a standalone consultant. The firms that succeed will be those that treat AI as a co-pilot rather than a replacement.
That means pairing every pound spent on models with equal investment in human oversight, robust controls, and ethical data practices.
The consulting firms of the future won't be the ones with the flashiest AI tools; they'll be the ones that can integrate those tools without sacrificing quality, accountability, or client trust.
Northstar Consulting works with SMEs to develop growth strategies grounded in data and insight. Follow our blog, The Pulse, for ongoing analysis of business, innovation, and strategy in a rapidly changing world.