There's a shift that's been building in BSA/AML examinations over the past five years. Examiners used to ask "how many SARs did you file?" Now they ask something harder: "Can you show me your methodology?"
That question exposes a gap that most mid-market fintechs haven't closed. They've built alert rules. They've hired compliance analysts. They file SARs. But ask them to reconstruct *why* they didn't escalate a specific transaction from six months ago — and the documentation falls apart.
That's not a SAR volume problem. It's a defensibility problem. And it's the real compliance risk most fintechs are carrying right now.
What Examiners Actually Care About
Let's start with what regulators are actually evaluating, because the target has shifted substantially since the early 2020s.
Following FinCEN's 2020-2023 guidance updates, examiner focus moved away from raw SAR counts toward process quality. The question is no longer whether you filed — it's whether your decision, whatever it was, was reasonable given your methodology at the time.
This matters because it changes the compliance exposure profile entirely. A false negative (missed SAR) is still bad. But a false negative you can document and justify — "our system flagged this as lower risk based on these documented criteria, and here's the audit trail showing consistent application of those criteria" — is a very different finding than a false negative where you have no record of ever having evaluated the transaction.
The examiner's actual question: "Did you look at this, and can you prove you applied your stated methodology consistently?" They're not expecting perfection. They're expecting a documented, repeatable process.
Three things regulators now audit for specifically:
- Escalation audit trails — documented evidence that suspicious transactions were reviewed and a decision was made
- Non-escalation audit trails — equally important: documented evidence for transactions you looked at and decided *not* to file on, with reasoning
- Methodology consistency — proof that you applied the same criteria across similar transactions, not just the ones that got escalated
Most compliance teams have solid documentation on their SARs. The gap is almost always in the non-escalations — the transactions you reviewed and cleared. If you can't reconstruct those decisions, you're exposed.
The False Positive Problem (Why Rules Fail)
The standard approach to SAR triage is rules-based monitoring: flag transactions above certain thresholds, matching certain patterns, involving certain counterparties. The problem is well-documented at this point: rules-based systems generate between 70 and 90 percent false positives in live environments.
That number isn't a system failure. It's an inherent property of rule-based detection applied to complex human financial behavior. Consider a $50,000 wire transfer to an account your system has never seen. Your rules flag it. Your analyst opens it. It's a down payment on a house — inheritance from a deceased parent. The transaction is completely legitimate, and completely indistinguishable from a suspicious one based on threshold rules alone.
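To make that concrete, here's a minimal sketch of the kind of threshold rule described above. The function name and the $10,000 cutoff are illustrative, not any vendor's actual rule engine:

```python
# A minimal threshold rule of the kind described above.
# Names and thresholds are illustrative only.

def threshold_rule(amount_usd: float, counterparty_seen_before: bool) -> bool:
    """Flag any transfer over $10,000 to a counterparty the system hasn't seen."""
    return amount_usd > 10_000 and not counterparty_seen_before

# The legitimate house down payment and a genuinely suspicious transfer
# present identical inputs; the rule only sees the arithmetic.
down_payment_flagged = threshold_rule(50_000, counterparty_seen_before=False)
suspicious_flagged = threshold_rule(50_000, counterparty_seen_before=False)
```

Both come back flagged. Nothing in the rule's inputs distinguishes the two cases, which is why the false positive rate is structural rather than a tuning problem.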
The downstream effect is significant. High false positive rates don't just waste analyst time. They create two specific compliance risks:
Risk 1: Real signals buried in noise. When analysts are processing 200 alerts a day and 195 of them are clearly legitimate, fatigue sets in. The 5 genuinely suspicious transactions get the same shallow review as the 195 clean ones. Your detection rate for actual suspicious activity drops.
Risk 2: The process reduces to static rules. When an examiner asks "what's your methodology?" and the answer is "we flag everything over X threshold," that's a methodology. But it's not a defensible one, because it's indiscriminate. It doesn't demonstrate that you've applied judgment — it demonstrates that you've applied arithmetic.
Rules aren't going away. They're a reasonable first filter. The problem is when rules *are* the triage process rather than the front end of one.
What "Defensible" Actually Means in Practice
Defensibility has three components. All three need to be present. One or two isn't enough.
1. Documented Methodology
Your triage criteria need to be written down, versioned, and reviewed on a defined schedule. "We look at transaction patterns and use analyst judgment" is not a documented methodology. It's a description of ad-hoc review.
A documented methodology specifies: what signals you weight, how you weight them, what combinations of signals trigger escalation versus disposition, and what the rationale is for those thresholds. It doesn't need to be a hundred-page manual — but it does need to exist, be accessible, and be the thing your analysts are actually following.
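In practice, "written down and versioned" can be as small as a machine-readable config plus the scoring logic that applies it. A sketch, with hypothetical signal names, weights, and threshold:

```python
# Illustrative only: signal names, weights, and the escalation
# threshold are hypothetical examples, not recommended values.
METHODOLOGY = {
    "version": "2024.2",          # versioned, so past decisions map to past criteria
    "review_cycle": "quarterly",  # the defined review schedule
    "signal_weights": {
        "amount_over_threshold": 0.30,
        "new_counterparty": 0.20,
        "velocity_anomaly": 0.35,
        "high_risk_geography": 0.15,
    },
    "escalate_at": 0.60,
}

def risk_score(signals):
    """Sum the weights of the signals present on a transaction."""
    weights = METHODOLOGY["signal_weights"]
    return sum(w for name, w in weights.items() if signals.get(name))

def disposition(signals):
    """Apply the documented threshold: escalate or clear, same criteria every time."""
    return "escalate" if risk_score(signals) >= METHODOLOGY["escalate_at"] else "clear"
```

The point isn't this particular scheme. It's that the criteria, the weights, and the rationale-bearing threshold live in one versioned artifact that analysts follow and auditors can read.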
2. Consistent Application
The methodology only protects you if you can show it was applied consistently. Two transactions with the same risk profile should receive the same treatment. If one analyst escalates and another clears identical circumstances, that inconsistency is an examiner red flag.
This is where most manual review processes fail at scale. Individual analysts develop their own heuristics. Shift changes mean different reviewers apply different standards. Over six months, your "methodology" in practice becomes a statistical average of twenty different individual approaches — none of which match the documented process.
3. Complete Audit Trail
Every decision needs a record. Not just the SARs you file — every transaction you reviewed, who reviewed it, what criteria were applied, what the outcome was, and when. That record needs to be immutable and retrievable on demand.
"We document the SARs we file" is not an audit trail. An audit trail is the ability to pull up any transaction from the past 12 months and reconstruct the complete decision record in under five minutes.
Examiner red flag: "Can you show me your decision record for transactions you reviewed and cleared last quarter?" If your answer requires more than a few clicks, you have a documentation gap.
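As a sketch of what such a record can look like (field names are hypothetical), a decision log can be made tamper-evident with a simple hash chain, where each entry's hash incorporates the previous one:

```python
import hashlib
import json
from dataclasses import dataclass, asdict

# Field names are illustrative; real programs will have their own schema.
@dataclass(frozen=True)
class DecisionRecord:
    txn_id: str
    reviewer: str
    criteria_version: str  # ties the decision to the methodology in force at the time
    outcome: str           # "escalate" or "clear"; non-escalations are logged too
    rationale: str
    reviewed_at: str       # ISO-8601 timestamp

def append_record(log, record):
    """Append a record, chaining each entry's hash to the previous one
    so any retroactive edit to an earlier entry is detectable."""
    prev_hash = log[-1]["hash"] if log else ""
    payload = json.dumps(asdict(record), sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    log.append({"record": asdict(record), "hash": entry_hash})
    return entry_hash
```

In production you'd back this with WORM storage or a database audit log, but the principle is the same: the record of a cleared transaction is created at decision time, not reconstructed six months later.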
The Mid-Market Gap — and Where AI Fits
Here's the structural problem facing most fintechs with $50M to $1B AUM: enterprise SAR monitoring platforms (NICE Actimize, Fiserv) solve this problem reasonably well, but they cost $25,000 to $100,000 per month. That's sized for banks. It's not sized for growth-stage or mid-market fintechs.
The alternative — manual analyst review of rules-based alerts — doesn't scale, doesn't produce consistent methodology application, and doesn't generate the kind of audit trail that satisfies examiners. You end up with high costs, high false positive rates, and still marginal defensibility.
This is the gap AI-assisted triage was built to close. Not as a replacement for compliance judgment, but as a methodology layer that solves exactly the three defensibility problems above.
Here's what AI-assisted triage actually changes:
On methodology: AI applies consistent criteria across every transaction. The "methodology" isn't a document sitting in a shared drive — it's operationalized in the scoring logic. When an examiner asks what your methodology is, you can point to a system that applies the same criteria to every transaction, every time.
On false positives: Well-trained classification models reliably reduce false positive rates by 50-70% compared to pure rules-based approaches. Not because AI is magic, but because AI can incorporate contextual signals — account history, behavioral patterns, counterparty risk profiles — that threshold rules can't. The $50K wire transfer looks different to a model that knows this account has a history of large legitimate transfers than it does to a rule that only sees the dollar amount.
On audit trails: Every AI-scored transaction comes with a decision record by default. Score, contributing factors, timestamp, analyst disposition. The audit trail isn't something you have to build — it's a byproduct of the process.
The key caveat is that AI-assisted triage is only more defensible than manual review when it's implemented correctly. "We use AI" is not a methodology. The defensibility comes from the combination: documented criteria encoded in the model, consistent application enforced by the system, and audit trails generated automatically.
The right frame is: AI handles the consistency problem; humans handle the nuance problem. AI makes sure every transaction gets evaluated against the same criteria, every time. Human analysts make the calls on the cases where the signals are genuinely ambiguous and institutional knowledge matters. That separation of concerns — AI scores, humans judge — is both more effective and more defensible than either approach alone.
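That separation of concerns can be expressed as a simple routing policy. The band boundaries below are hypothetical, and whether low scorers are auto-cleared or batch-reviewed is a program design choice; the point is that the ambiguous middle, and only the ambiguous middle, goes to a human:

```python
def route(score, clear_below=0.2, escalate_at=0.8):
    """Route a scored transaction: documented dispositions at the extremes,
    analyst judgment for the genuinely ambiguous middle band.
    Band boundaries here are illustrative, not recommendations."""
    if score < clear_below:
        return "clear"            # logged with score and contributing factors
    if score >= escalate_at:
        return "escalate"         # straight to SAR review
    return "analyst_review"       # human makes the call; the call is logged
```

Every transaction gets scored against the same criteria; analyst time concentrates on the cases where institutional knowledge actually changes the outcome.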
How Tagmatic Fits
Tagmatic is built as a methodology application layer for compliance teams that need defensible annotation without enterprise-tier overhead. It doesn't replace your analysts — it gives them a consistent framework to work within, and generates the audit trail automatically.
You bring the compliance expertise: the risk criteria that matter for your specific product and customer base, the thresholds that reflect your risk appetite, the institutional knowledge about what legitimate transaction patterns look like in your context. Tagmatic brings the consistency: every transaction evaluated against the same criteria, every decision documented, every disposition logged with full context.
The result is a SAR triage process where the answer to "can you show me your methodology?" isn't a shrug or a policy document — it's a live audit trail that reconstructs any decision in under a minute.
If you're a mid-market fintech building or overhauling your AML program, start at tagmatic.app. The playground lets you run a few transactions through classification without signing up, which is usually enough to see whether the approach fits your use case.
See it on your own transactions
The Tagmatic playground runs live classification on sample data — no account required. See how consistent, documented triage compares to your current process.
Try the playground →