AI in Finance Works. Just Not in Your Operation.

The exam finding that keeps compliance officers up at night is not the one that says the technology failed. It is the one that says the technology was running while the operation broke around it.

We have been inside enough of these situations to recognize the pattern before the conversation starts. An institution that brought in a new system twelve or eighteen months ago. Transaction monitoring live. Fraud detection active. Sometimes both. The system is working. The alert backlog is growing. Three products launched since go-live have no monitoring scenarios. The senior analyst who calibrated the thresholds left in February.

The system did exactly what it was built to do. It just was not built around the operation it went into.

We work inside six of these operations: anti-money laundering (AML) alert review and compliance, fraud case management, know your customer (KYC) and know your business (KYB) onboarding, credit underwriting, vendor chain and banking-as-a-service (BaaS) compliance, and regulatory filing. The technology differs across all of them. The failure mode is the same.


The Math That Does Not Add Up

Eighty-one percent of financial services firms are now running AI at some level. That is the headline from the 2026 Global AI in Financial Services Report, produced by Cambridge, the Bank for International Settlements, and the IMF across 628 institutions. The return picture is considerably quieter.

40%  report any increase in profitability

55%  cannot measure the value of what they brought in at all

9.5%  say their data infrastructure is ‘very prepared’ to support what they are running

That last number is from the Q1 2026 Banking Compliance AI Trend Report, covering 148 financial institutions. More than half of the industry is running AI it cannot measure, on data it does not consider ready. The gap was not caused by choosing the wrong tool. It was caused by the sequence.


In Finance, AI Errors Are Not Just Expensive. They Are Evidence.

Every industry is learning that AI errors cost money. Financial services is learning they cost something else too: they appear in examination reports, adverse action notices, and, increasingly, court records.

The Air Canada ruling in 2024 established the principle that matters here. A chatbot invented a bereavement-fare policy. The airline argued the chatbot was a separate entity responsible for its own statements. The tribunal disagreed. It made no difference, the ruling held, whether the information came from a static page or an AI system. The operator carries liability for what the system produces. In financial services, that operator is the institution. The model vendor is not named in the consent order.

The hallucination risk lands differently here than it does in consumer software. A suspicious activity report (SAR) narrative that cites an invented regulatory reference does not just mislead. It fails the audit. A credit memo that contains a fabricated financial ratio does not just produce a wrong recommendation. It creates a fair lending liability. The institution signed off on both. That is what the examiner sees.

There is a third dimension worth naming before anything else. At 95 percent per-step accuracy, a ten-step workflow succeeds about 60 percent of the time, because the per-step rates multiply. At twenty steps: 36 percent. In most industries, those are product quality metrics. In AML alert review running against a 30-day statutory filing window, they are compliance outcomes.
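The arithmetic behind those figures can be sketched in a few lines. This is a minimal illustration, assuming every step fails independently and at the same rate, which real workflows rarely do exactly; the function name workflow_success_rate is ours, not taken from any tool.

```python
def workflow_success_rate(per_step_accuracy: float, steps: int) -> float:
    """Chance that every step succeeds, assuming independent steps
    that all share the same accuracy (a simplifying assumption)."""
    return per_step_accuracy ** steps


# Compare a single step with ten-step and twenty-step workflows.
for steps in (1, 10, 20):
    rate = workflow_success_rate(0.95, steps)
    print(f"{steps:>2} steps at 95% per step: {rate:.1%} end to end")
```

Run as written, the loop prints 95.0%, 59.9%, and 35.8%. The same function shows why narrowing a system's role helps: fewer chained steps means fewer factors in the product.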

Ninety-five percent accuracy is a reasonable benchmark for most software. In a regulated operation with legal deadlines and examiner oversight, it is a gap analysis waiting to happen.

The operations making this work in production have not found a system that reaches 100 percent. They have designed workflows where each system’s role is narrow enough that its error rate does not compound. That design decision comes before any system is chosen.


What the System Does Not Know About Your Operation

No vendor system brings institutional context when it arrives. It comes trained on broad patterns from across the industry, with no memory of the decisions your team made last quarter, last year, or the year before that.

When a system enters without this context, it applies general knowledge to situations built from specific institutional history. The errors it makes are invisible until an examiner, a fraud spike, or a key departure makes them visible at the worst possible time.


Adding It to a Broken Process Does Not Fix It. It Accelerates It.

A compliance team automates alert triage without redesigning the alert workflow. The false positive rate stays at 95 percent. Documentation burden identical. Throughput changes. So does the cost, without the result.

A fraud team adds a new system on top of a case file process built across nine vendor tools. Now there are nine tools and additional outputs to reconcile. Assembly time shifts. It does not shrink.

An onboarding function brings in new technology for a document collection chain still running on email. The result is technology-assisted email chains. The 90-day average does not move.

We see this at the start of almost every engagement where a previous system is already in place. The process was broken before the new technology arrived. Bringing in something new made it faster, more expensive, and harder to unpick.

There is a cost problem sitting underneath the process problem. S&P Global analysis found the top 50 global banks averaged 40 fintech vendor relationships in 2024, up from 12 in 2018. Mid-market compliance operations run a similar stack. The typical setup has a separate tool for each of the following:

  • Transaction monitoring
  • Case management
  • Watchlist and sanctions screening
  • Adverse media feeds
  • KYB verification tools

 

Each charges for data access. None reasons across the others. The analyst stitches them together manually on every case. The budget is going to data collection. The decisions still sit with a human assembling information from five screens. When a new system gets added on top of one tool in that stack, the integration problem stays exactly where it was.


Map Where Judgment Matters. Everything Else Is Assembly Work.

The teams getting consistent results did one thing before selecting any tool. They mapped their operation at the decision level: where human judgment actually changes the outcome, and where it is being consumed by work that requires accuracy, not expertise.

For an AML compliance team, that map produces a boundary that is both obvious and, in most operations, completely unaddressed.

The compliance analyst today spends most of her day on the left side of that boundary, doing assembly work. The system that changes nothing is the one built to assist that work without replacing it. The one that changes throughput, backlog, and exam readiness takes the left side completely, so the analyst’s day is spent on the right, where judgment changes the outcome.


That is not AI replacing compliance judgment. It is AI making compliance judgment possible at scale.

This does not mean rebuilding what you have. The transaction monitoring system still fires alerts. The case management platform stays. The compliance procedures your team spent years getting regulatory approval for are not going anywhere. What changes is who does the assembly: the steps between an alert firing and an analyst being ready to make a call.

The same logic runs across fraud ops, onboarding, underwriting, and vendor chain monitoring. The left side shifts by function. The principle does not.


Your Examiner Will Ask Why the System Made That Decision. Do You Have an Answer?

The hesitation at this stage is always regulatory. If a system is assisting dispositions, what happens when the examiner asks why a specific alert was cleared? If a system contributed to a credit decision, what does the denial notice say?

These are the right questions. Here is what the regulatory environment actually requires right now.

In April 2026, the Federal Reserve, the Office of the Comptroller of the Currency (OCC), and the Federal Deposit Insurance Corporation (FDIC) jointly issued Supervision and Regulation Letter 26-2 (SR 26-2), replacing the model risk management framework that had governed banking since 2011. SR 26-2 explicitly excludes generative and agentic AI from its scope. Not a green light. A governance gap. The agencies have signaled further rules are coming. The institutions building explainability into how their systems work are constructing the evidence base that will matter when those rules arrive. The ones treating explainability as documentation added before the next exam are building the gap the examiner will find.

For credit decisions, the Consumer Financial Protection Bureau (CFPB) has been precise. Regulation B, the federal rule implementing the Equal Credit Opportunity Act, requires specific written reasons for credit denials, and a lender cannot excuse non-compliance on the grounds that its technology is too complex to explain. The adverse action notice must state the actual reason. The difference between a notice that satisfies the requirement and one that does not is not legal language. It looks like this:

✗  Risk score exceeded threshold

✓  Debt-to-income ratio of 48% exceeded our documented underwriting limit of 43%

That distinction comes down to whether the system was built around your specific credit policy or arrived without it. Explainability is not documentation you add afterward. It is the test of whether the system actually knows your operation.
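One way to picture a system built around the credit policy is a check that carries the documented limit with it, so the notice text is generated from the same rule that produced the decision. The sketch below is illustrative only: PolicyLimit, adverse_action_reasons, and the "dti" key are hypothetical names, and the 43% limit is the example figure used above, not a recommended threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyLimit:
    """One documented underwriting limit (hypothetical structure)."""
    name: str         # factor name as it should appear in the notice
    max_value: float  # documented ceiling as a fraction (0.43 means 43%)


def adverse_action_reasons(
    applicant: dict[str, float], limits: dict[str, PolicyLimit]
) -> list[str]:
    """Return one specific, policy-backed reason per exceeded limit."""
    reasons = []
    for key, limit in limits.items():
        value = applicant.get(key)
        # Factors missing from the application are skipped, not guessed.
        if value is not None and value > limit.max_value:
            reasons.append(
                f"{limit.name} of {value:.0%} exceeded our documented "
                f"underwriting limit of {limit.max_value:.0%}"
            )
    return reasons


# Example policy and applicant, using the figures from the notice above.
limits = {"dti": PolicyLimit("Debt-to-income ratio", 0.43)}
print(adverse_action_reasons({"dti": 0.48}, limits))
```

Because the reason string is assembled from the policy record itself, a generic "risk score exceeded threshold" message cannot be produced by this path. That is the design point, whatever form the real implementation takes.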


What We See When the Design Is Done Correctly

AML and compliance

The alert queue gets quieter because the system surfaces what genuinely requires a human call, pre-populated with the context to make it quickly. SARs get drafted from structured case evidence rather than written from scratch under a 30-day deadline. The examiner sees a team working the right problems.

Fraud operations

Case files arrive assembled before the analyst opens the screen. Behavioral context across accounts, devices, and prior disputes is already there. The time that was going to data assembly goes to the fraud judgment that requires a person.

KYC and KYB onboarding

Document collection is sequenced automatically. Registry checks run in parallel rather than sequentially. Ultimate beneficial owner (UBO) tracing reaches the jurisdictional boundary without manual switching between registries. Enhanced due diligence (EDD) triggers on risk signal rather than on whoever has bandwidth that week. The 90 days becomes 20.

Vendor chain and BaaS compliance

Monitoring extends to the actual end customer, not just the direct partner. The compliance program covers the full chain. When the examiner asks how deep the program reaches, the answer is specific, documented, and correct.

These outcomes share one design decision: the boundary between assembly work and judgment work was drawn explicitly, before any tool was selected. That sequence is the difference between a program that runs and one that compounds.

The financial services firms making this work did not find a better tool. They mapped their operations first. They drew the line between accuracy work and judgment work, specified the technology against that boundary, and built explainability into the design from the start rather than as a layer added before the next examination.

The gap between 81 percent running AI and 55 percent unable to measure what they got is not a technology gap. It is a sequence gap. It does not close by trying harder with the same approach.

If your operation sounds like the ones described here, the place to start is the mapping, not the tooling. Let’s talk.
