The week the EBA told 2,759 banks their data was wrong

Here's a small exercise. Open your counterparty master — or whatever you call the system of record for your corporate lending book — and pick ten entities at random. Look at their NACE codes. Now check whether those codes match what the company actually does today, not what it did when it was onboarded.

I've run this exercise a few times over the years. The hit rate is never as good as people expect. You'll find a logistics company classified as "wholesale trade" because that's what it was in 2014. A manufacturer that pivoted to services three years ago but still sits in Section C. A holding company coded to whatever its largest subsidiary did at the time someone filled in the form.

It's the kind of error that feels cosmetic until you realise it feeds directly into your concentration risk reporting, your Large Exposure calculations, and — if you're subject to Pillar 2A — your capital add-ons.

The EBA just published a paper that puts a number on how widespread this problem actually is. It's worse than I expected.

What the EBA found

EBA Staff Paper No. 25, published in May 2026, analysed Large Exposure data submitted by 2,759 EEA credit institutions. The focus was NACE codes — the standard industry classification that drives sectoral concentration limits, stress-test scenarios, and supervisory benchmarking.

The findings are blunt: systemic misclassifications of counterparty NACE codes across the sample, distorting capital requirements and supervisory analysis. The EBA's own language calls these findings a "lower bound" for data quality problems — meaning the real error rate is likely higher than what they measured, because they could only detect the misclassifications visible in the data they received.

This isn't a story about a few outlier banks with sloppy processes. The sample spans 2,759 institutions, and the pattern is structural.

Why classification errors compound

The reason this matters more than it looks is that NACE codes aren't just a label. They're load-bearing infrastructure.

When your regulator runs a sectoral stress test — say, "what happens to your book if commercial real estate drops 30%" — the answer depends entirely on which exposures are tagged as commercial real estate. If a property holding company is coded as "financial services" because it's technically a holding entity, it doesn't show up in the stress scenario. Your capital looks fine. The risk hasn't gone anywhere.

I'll be honest: I once presented a board-level concentration report that I later realised was built on NACE codes nobody had reviewed in four years. The numbers were precise, well-formatted, and confidently wrong. Nobody in the room questioned them, including me. That's the thing about bad classification data — it doesn't look broken. It looks like a report.

The same logic applies to internal concentration limits. If your board has set a 15% cap on exposure to construction, and 3% of your construction exposure is hiding under the wrong NACE code, you might be running closer to the limit than anyone thinks. Or already over it.
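To make that arithmetic concrete, here is a minimal sketch with invented figures. None of the names or numbers come from the EBA paper or from any real book, and the section letters assume NACE Rev. 2 (F for construction, K for financial activities). It compares the construction share you would report from the codes on file with the share you get once one miscoded developer is counted where it belongs.

```python
# Hypothetical exposures in GBP millions:
# (name, NACE section on file, section the firm really belongs in, exposure)
EXPOSURES = [
    ("Builder A", "F", "F", 140.0),
    ("Builder B", "F", "F", 5.0),
    ("Developer C", "K", "F", 10.0),  # construction firm coded as a financial holding
    ("Retailer D", "G", "G", 445.0),
    ("Manufacturer E", "C", "C", 400.0),
]

CAP = 0.15  # board limit on construction as a share of the total book


def sector_share(exposures, section, use_actual):
    """Share of total exposure in one section, by filed or by actual code."""
    code_index = 2 if use_actual else 1
    total = sum(row[3] for row in exposures)
    in_section = sum(row[3] for row in exposures if row[code_index] == section)
    return in_section / total


reported = sector_share(EXPOSURES, "F", use_actual=False)
actual = sector_share(EXPOSURES, "F", use_actual=True)

print(f"Reported construction share: {reported:.1%} (cap {CAP:.0%})")
print(f"Actual construction share:   {actual:.1%} (cap {CAP:.0%})")
print("Breach hidden by miscoding:", reported <= CAP < actual)
```

In this toy book the report shows headroom under the cap while the real position is already over it, and nothing in the report itself would tell you.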

And it compounds downstream. If you're building any kind of analytics on top of your lending data — sector-level default rates, portfolio migration analysis, early-warning models — every one of those outputs inherits the classification error. The model doesn't know the input is wrong. It just gives you a confident answer based on bad data.

The Cambridge/BIS report says the same thing from a different angle

This connects to something else published recently. The 2026 Global AI in Financial Services Report from Cambridge's Centre for Alternative Finance, produced in partnership with the BIS, IMF and WEF, surveyed 628 organisations globally — financial institutions, AI vendors, and regulators. The headline finding that got attention was that 81% of financial services firms are adopting AI at some level. The finding that matters more is that data quality is the top barrier to making AI useful in production.

Not model sophistication. Not compute costs. Not talent. Data quality.

If you're a mid-market bank thinking about where to invest in analytics or AI capability, this is the answer staring you in the face. The binding constraint isn't whether you have the right model. It's whether your counterparty data is clean enough for any model to work.

If I asked your credit team to produce a sectoral concentration report tomorrow, how much of the answer would depend on NACE codes that haven't been reviewed since onboarding?

This is a data-engineering problem, not a strategy problem

I want to be specific about what "fixing this" actually looks like, because it's smaller than people assume.

If you're a UK mid-market bank, your counterparty master probably holds somewhere between 500 and 5,000 corporate entities. Companies House gives you free access to SIC codes (the UK counterpart to NACE) for every registered company, updated when companies file their confirmation statements. The Companies House API is free, reasonably well-documented, and rate-limited at 600 requests per five minutes. At one request per entity, that means you can reconcile a 3,000-entity book in about 25 minutes.
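The run-time estimate is simple pacing arithmetic. A rough sketch, assuming one API call per entity and calls spaced evenly so you never exceed the quoted limit (the function names are mine, not part of any library):

```python
REQUESTS_PER_WINDOW = 600  # rate limit quoted above
WINDOW_SECONDS = 300       # five minutes


def seconds_between_calls():
    """Smallest even spacing between calls that stays inside the limit."""
    return WINDOW_SECONDS / REQUESTS_PER_WINDOW


def estimated_minutes(entity_count, calls_per_entity=1):
    """Approximate wall-clock minutes to work through the whole book."""
    total_calls = entity_count * calls_per_entity
    return total_calls * seconds_between_calls() / 60


for book_size in (500, 3000, 5000):
    print(book_size, "entities:", round(estimated_minutes(book_size), 1), "minutes")
```

If you end up needing more than one call per entity, raise calls_per_entity and the estimate scales linearly.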

The script is maybe 150 lines of Python. Pull your internal NACE or SIC codes, fetch the current Companies House classification for each entity by company number, produce a delta report showing every mismatch. The first run will be ugly. You'll find dozens of mismatches, maybe more. Some will be trivial reclassifications. Some will be genuinely wrong in a way that affects your concentration numbers.
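Here is a stripped-down sketch of the core loop, to show how little there is to it. Treat it as a starting point under stated assumptions, not a finished tool: it assumes your internal codes are already held as five-digit UK SIC strings keyed by Companies House company number (any NACE-to-SIC mapping is left out), that the company profile endpoint returns a sic_codes list, and that the API key is sent as the basic-auth username with an empty password. The function names and the shape of the delta report are my own invention.

```python
import base64
import json
import time
import urllib.request

API_BASE = "https://api.company-information.service.gov.uk"


def fetch_sic_codes(company_number, api_key):
    """Return the SIC codes Companies House currently holds for one company."""
    # Basic auth: the API key is the username and the password is empty.
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    request = urllib.request.Request(
        f"{API_BASE}/company/{company_number}",
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        profile = json.load(response)
    return profile.get("sic_codes", [])


def reconcile(internal_codes, fetch, pause_seconds=0.5):
    """Build the delta report: one row per entity whose code has drifted.

    internal_codes: dict of company number -> SIC code in the counterparty master.
    fetch: callable taking a company number and returning a list of SIC codes.
    pause_seconds: spacing between calls, to stay inside the rate limit.
    """
    mismatches = []
    for company_number, internal_sic in internal_codes.items():
        registry_sics = fetch(company_number)
        # A company can file several SIC codes; flag only if ours is not among them.
        if internal_sic not in registry_sics:
            mismatches.append(
                {
                    "company_number": company_number,
                    "internal_sic": internal_sic,
                    "registry_sics": registry_sics,
                }
            )
        time.sleep(pause_seconds)
    return mismatches
```

To run it against the live API you would call something like reconcile(master, lambda number: fetch_sic_codes(number, api_key)), where master is the dictionary exported from your own system. Passing the fetch function in as an argument also means you can test the comparison logic with canned data before you touch the network. The rest of the 150 lines is error handling, retries, and writing the report somewhere people will read it.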

I won't pretend it's glamorous. But it's the kind of work that, once done, makes everything downstream — your stress tests, your board reporting, your regulatory submissions, any future analytics — more trustworthy. And it costs you one person's week, not a procurement cycle.

The takeaway

This week, ask whoever owns your counterparty data to pull a random sample of 50 entities and manually check their NACE or SIC codes against Companies House. If more than five are wrong, you've found a data quality problem that's distorting your concentration risk numbers — and you've scoped a fix that a competent analyst can deliver in a few days with nothing more than the Companies House API and your existing counterparty master.
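If it helps whoever picks this up, drawing the sample and applying the threshold is a few lines. This is only a sketch: the function names are hypothetical, the company numbers are placeholders, and the fixed seed is there so the same sample can be re-drawn if someone asks how the entities were chosen.

```python
import random


def draw_sample(company_numbers, size=50, seed=None):
    """Pick a random sample of entities to check by hand."""
    rng = random.Random(seed)
    population = sorted(company_numbers)  # stable order so a seed is reproducible
    return rng.sample(population, min(size, len(population)))


def worth_fixing(wrong_count, threshold=5):
    """True if the manual check found more errors than the threshold."""
    return wrong_count > threshold


# Placeholder company numbers standing in for a 3,000-entity book.
book = [f"{n:08d}" for n in range(1, 3001)]
sample = draw_sample(book, size=50, seed=2026)
print(len(sample), "entities to check, starting with", sample[:3])
```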

The EBA just told 2,759 banks this problem exists. The question is whether you find out from your regulator or from your own team first.

— Aksel

The Analytical Banker is a weekly note on data, analytics, and AI inside corporate banking — written for finance leaders who actually have to make this stuff work. Reply to this email if something here resonates, or forward it to a colleague who'd benefit.
