Automation · 6 min read

What Klarna's AI Reversal Actually Teaches Us

In May 2025, Klarna's CEO admitted the company had pushed AI-driven job cuts too far and began rehiring human agents. The story was covered as a cautionary tale. It is more useful than that, and more instructive about what actually went wrong.

Sarah Chen

Senior Editor

9 June 2025

Let me frame this up front: the people writing "Klarna's AI experiment failed" pieces are getting the story slightly wrong, and the people writing "this proves AI customer service doesn't work" pieces are getting it more wrong. What actually happened is more nuanced and considerably more useful as a lesson.

In February 2024, Klarna announced that its AI assistant had handled the equivalent of 700 human agents' worth of customer service volume. In its first month, the assistant handled 2.3 million conversations (two-thirds of all customer service chats), and resolution times fell from eleven minutes to under two. The announcement was partly genuine operational data and partly investor narrative. Klarna was preparing for an IPO and had a commercial interest in presenting AI efficiency as a multiplier of its business value. Both things were true simultaneously.

What followed, over the next 15 months, was a managed experiment in how far you can push AI-first customer service before it starts costing you more than it saves.

What Bloomberg Reported in May 2025

CX Dive's coverage of the Bloomberg interview on 9 May 2025 confirmed CEO Sebastian Siemiatkowski's acknowledgement that the aggressive AI strategy had gone too far. Klarna would begin rehiring human agents. Customers would always have the option to speak with a real person. Siemiatkowski described the model he was moving towards as an "Uber type" customer service workforce.

The CX Dive piece also carried the most useful industry response, from Julie Geller at Info-Tech Research Group: "The key takeaway is that AI should augment human agents, not replace them. Automate the routine to drive efficiency, but always ensure customers have a clear, easy path to a human, especially when emotions or complexity come into play."

This is not a retreat. It is the design that many practitioners had been arguing for from the beginning.

The Actual Failure Mode

Klarna's AI customer service was not bad at customer service. It was bad at a specific subset of cases, and it was being given those cases anyway.

Routine queries (balance checks, refund timescales, payment splitting) are genuinely well-handled by AI systems trained on consistent, structured data. Response time is better, availability is better (24/7 against business hours), and accuracy on well-understood query types is reliable.

The failure mode is non-routine cases: disputed transactions where the customer is distressed, complaints involving an element of procedural unfairness, refund requests where policy and personal situation are in tension. These cases require contextual reasoning and emotional attunement that current AI systems do not do well. Customers interacting with an AI on a stressful financial matter and receiving generic, policy-reciting responses do not just have a bad experience; they escalate, complain publicly, and in fintech, they close their accounts.

The cost of the AI handling complex cases badly was not just "we didn't save the agent cost on this ticket." It was "we damaged the customer relationship and potentially lost a customer with a long-term value of several hundred pounds."

The Case Volume Trap

The specific error in Klarna's approach, and one that is easy to make because the data tempts you into it, was optimising for the volume of cases handled without regard to their complexity. The AI handled two-thirds of all customer interactions. That sounds impressive. What it obscures is whether that two-thirds included any of the complex, high-stakes cases, or only the routine ones, with the difficult cases still reaching humans at unchanged volume.

Klarna's own Q1 2025 earnings told an interesting story. The headline was positive: customer service cost per transaction fell 40%, from $0.32 in Q1 2023 to $0.19 in Q1 2025. Resolution times had improved 82%. Consumer satisfaction, the company maintained, had remained steady. And yet, in the same period, customers were publicly complaining that the AI provided generic answers and was unable to handle complicated questions. The efficiency metrics and the experience reality were pointing in different directions.
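The 40% headline can be checked directly from the two per-transaction costs quoted above. A minimal sketch in Python; the function name is illustrative, and only the two dollar figures come from the article:

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a percentage."""
    return (after - before) / before * 100.0


# Cost per transaction as quoted in the article (USD).
cost_q1_2023 = 0.32
cost_q1_2025 = 0.19

# A negative change is a fall, so flip the sign to report the drop.
drop = -percent_change(cost_q1_2023, cost_q1_2025)
print(f"Cost per transaction fell {drop:.1f}%")
```

The result is a little over 40%, consistent with the rounded figure in the earnings summary.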

If the AI is handling only the cases it can handle well, the reported efficiency gain is real. If it is also handling cases it cannot handle well and producing poor outcomes, the efficiency gain is partly illusory. You are trading customer service cost for churn cost, which is usually a bad trade.
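To make the trade concrete, here is a rough sketch of the arithmetic. Only the saving per transaction (0.32 minus 0.19) comes from the figures quoted in this article. The transaction count, the number of customers lost and the lifetime value are invented for illustration, and the sketch treats everything as one currency unit for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    transactions: int              # transactions in the period (hypothetical)
    saving_per_transaction: float  # service cost saved per transaction
    customers_lost: int            # extra churn blamed on poor AI handling (hypothetical)
    lifetime_value: float          # long-term value of one lost customer (hypothetical)


def service_saving(s: Scenario) -> float:
    """Total service cost saved by the AI over the period."""
    return s.transactions * s.saving_per_transaction


def churn_cost(s: Scenario) -> float:
    """Long-term value lost through customers who left."""
    return s.customers_lost * s.lifetime_value


def net_saving(s: Scenario) -> float:
    """Service saving minus churn cost; negative means a bad trade."""
    return service_saving(s) - churn_cost(s)


def break_even_churn(s: Scenario) -> float:
    """Number of lost customers at which the saving is wiped out."""
    return service_saving(s) / s.lifetime_value


# 0.13 = 0.32 - 0.19, from the per-transaction costs quoted in the article.
# All other values are made up for the example.
scenario = Scenario(
    transactions=1_000_000,
    saving_per_transaction=0.13,
    customers_lost=500,
    lifetime_value=300.0,
)

print(f"Service saving:   {service_saving(scenario):,.0f}")
print(f"Churn cost:       {churn_cost(scenario):,.0f}")
print(f"Net saving:       {net_saving(scenario):,.0f}")
print(f"Break-even churn: {break_even_churn(scenario):,.0f} customers")
```

Under these assumed numbers, losing a few hundred customers per million transactions is enough to erase the entire saving. The real break-even point depends on figures Klarna has not published.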

What the Hybrid Model Actually Looks Like

Klarna's post-reversal model routes cases by complexity, not just by case type. AI handles the routine cases. Human agents handle the complex ones. The boundary between them is set not by a fixed rule but by a confidence threshold: the AI escalates to a human when its confidence in a clean resolution falls below a defined level.
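A confidence-threshold router of this kind might look roughly like the sketch below. It is not Klarna's implementation: the class names, the 0.85 threshold and the shape of the confidence score are all assumptions made for illustration. The one rule taken from the reporting is that a customer who asks for a person always gets one.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AI = "ai"
    HUMAN = "human"


@dataclass(frozen=True)
class Case:
    case_id: str
    category: str          # e.g. "balance_check", "disputed_transaction"
    ai_confidence: float   # assumed: estimated chance of a clean resolution, 0..1
    customer_asked_for_human: bool = False


# Hypothetical threshold; in practice it would be tuned against outcome data.
CONFIDENCE_THRESHOLD = 0.85


def route(case: Case, threshold: float = CONFIDENCE_THRESHOLD) -> Route:
    """Send a case to the AI only when it is confident of a clean resolution."""
    # An explicit request for a person overrides any confidence score.
    if case.customer_asked_for_human:
        return Route.HUMAN
    # Below the threshold, escalate rather than risk a poor outcome.
    if case.ai_confidence < threshold:
        return Route.HUMAN
    return Route.AI


cases = [
    Case("c1", "balance_check", 0.97),
    Case("c2", "disputed_transaction", 0.40),
    Case("c3", "balance_check", 0.99, customer_asked_for_human=True),
]
for case in cases:
    print(case.case_id, case.category, route(case).value)
```

The threshold only means something if the confidence score is calibrated: a score of 0.85 should correspond to roughly 85% of such cases resolving cleanly.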

This requires a few things to work well: good classification of case complexity at intake, calibrated confidence scoring in the AI system, clean handoff protocols that do not require the customer to repeat their situation to a human agent, and monitoring of escalation rates to detect drift (if escalation rates start rising, something has changed in the case mix or the AI performance).
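The monitoring piece is the easiest to sketch. The example below keeps a rolling window of recent cases and flags when the escalation rate moves away from an expected baseline. The class name, window size and tolerance are hypothetical, and flagging drift in both directions is a design choice of this sketch: a falling rate can be as worrying as a rising one if it means the AI is holding on to cases it should hand over.

```python
from collections import deque


class EscalationMonitor:
    """Tracks the escalation rate over a rolling window of recent cases."""

    def __init__(self, baseline_rate: float, window: int = 1000,
                 tolerance: float = 0.05) -> None:
        if not 0.0 <= baseline_rate <= 1.0:
            raise ValueError("baseline_rate must be between 0 and 1")
        self.baseline_rate = baseline_rate
        self.tolerance = tolerance
        self.window = window
        # Oldest outcomes drop off automatically once the window is full.
        self._outcomes: deque[bool] = deque(maxlen=window)

    def record(self, escalated: bool) -> None:
        """Record whether the latest case was escalated to a human."""
        self._outcomes.append(escalated)

    @property
    def rate(self) -> float:
        """Share of cases in the current window that were escalated."""
        if not self._outcomes:
            return 0.0
        return sum(self._outcomes) / len(self._outcomes)

    def has_drifted(self) -> bool:
        """True when a full window's rate is outside baseline +/- tolerance."""
        if len(self._outcomes) < self.window:
            return False  # not enough data to judge yet
        return abs(self.rate - self.baseline_rate) > self.tolerance


# Hypothetical baseline: about 30% of cases are expected to reach a human.
monitor = EscalationMonitor(baseline_rate=0.30, window=10, tolerance=0.05)
for escalated in [True, False, False] * 3 + [False]:
    monitor.record(escalated)
print(monitor.rate, monitor.has_drifted())
```

A drift flag does not say what changed, only that the case mix or the AI's behaviour is no longer what the threshold was tuned for.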

None of this is exotic. It is operationally demanding, but it is the right way to deploy AI in a customer service context where the failure mode is a damaged customer relationship rather than a failed page load.

The Broader Lesson

The Klarna story is not "AI customer service does not work." It is "designing AI deployment around cost reduction rather than outcome quality leads to a specific failure mode that is predictable and preventable."

The reversal was necessary, and the CEO deserves some credit for saying so publicly rather than quietly reengineering while maintaining the narrative. That kind of honesty is rarer than it should be. The commercial vindication came a few months later, when Klarna's IPO landed on the NYSE with a $17.3 billion first-day closing valuation, positioning the company as an AI-enabled BNPL and payments business that had demonstrated genuine operational improvement. By Q3 2025, Klarna's AI agent was described as doing the equivalent work of 853 employees, even as the company had reintroduced human agents for complex cases.

The retailers and fintech companies watching this story who take the lesson to be "avoid AI in customer service" are likely to find themselves in a worse competitive position than the ones who take the lesson to be "design the human-AI split carefully, monitor quality not just efficiency, and be honest about where the boundary should be."

For UK fintechs in particular, the stakes are concrete. Klarna has 11 million UK users and an EMI licence, so its operational choices in customer service have direct implications for the companies competing with it for the same relationships. The lesson from 2025 is not that AI in customer service fails. It is that it fails in predictable ways when deployed without a careful human-AI split, quality monitoring, and honesty about where the boundary should sit. Those failures are avoidable.