Why Not Just Use an LLM?
LLMs are powerful. They are not a compliance engine. Here is why the architecture matters.
Sanctions lists are subject to change by their issuing authorities.
The Question
Why build a deterministic engine when an LLM could just answer?
It comes up often: why run a hybrid engine — string metrics, ML models, threshold bands, entity guards — when you could hand the query and the target dataset to an LLM and ask "are these the same person?"
Three reasons: speed, auditability, reproducibility.
For the record: this page is about the screening engine, which is fully deterministic. Some adjacent features do use the Anthropic Claude API: Chat (Q&A on entities), Decision Summary (per-screening synthesis), and the ER Guardian audit. These features are optional and can be disabled per project. The screening itself never depends on an LLM.
Head-to-Head: Hybrid vs. Pure LLM
Across the criteria that matter in production compliance.
| Criterion | Hybrid (Heuristic + ML) | Pure LLM |
|---|---|---|
| Latency | Milliseconds per match | Seconds per match |
| Cost at scale | Very low — minimal compute | Very high — per-token API costs |
| Auditability | Full — every score traceable to the exact comparison | Poor — reasoning varies, hard to document |
| Reproducibility | 100% — same input, same output | Variable — temperature and model updates affect results |
| Regulatory acceptance | High — deterministic rules satisfy BaFin, FCA, OFAC | Low — black-box reasoning is difficult to defend |
| Bulk throughput | Scales to large datasets without API rate limits or token costs | Not viable for watchlist screening at scale |
| Semantic understanding | Limited to programmed features | Excellent — world knowledge, semantic context |
| Hallucination risk | None — deterministic | Real — a missed sanctions hit is a compliance failure |
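To make the auditability and reproducibility rows concrete, here is a minimal sketch of a deterministic comparison that records what it compared and how it scored. It is an illustration, not the engine's actual code: the `AuditRecord` fields, the `BANDS` thresholds, and the choice of Python's `difflib` as the string metric are all assumptions.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class AuditRecord:
    """Hypothetical audit entry: everything needed to replay one comparison."""
    query: str
    candidate: str
    metric: str
    score: float
    band: str


# Assumed threshold bands, highest first; real values would be tuned on ground truth.
BANDS = [(0.90, "match"), (0.75, "review")]


def normalize(name: str) -> str:
    # Case-fold and collapse whitespace so trivial variations do not affect the score.
    return " ".join(name.casefold().split())


def screen(query: str, candidate: str) -> AuditRecord:
    q, c = normalize(query), normalize(candidate)
    # SequenceMatcher.ratio() is a pure function of its inputs: same input, same output.
    score = round(SequenceMatcher(None, q, c).ratio(), 4)
    band = next((label for floor, label in BANDS if score >= floor), "no_match")
    return AuditRecord(query, candidate, "difflib.ratio", score, band)
```

Running `screen` twice on the same pair yields identical records, which is the property an auditor needs in order to replay a decision later.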
The Hybrid Approach
Deterministic rules at the floor, machine learning at the ceiling.
String metrics and entity guards form the floor. Machine-learning models can only raise a score, never lower it, so a model can never suppress a hit that the deterministic rules produced. The reason is regulatory: when the heuristic pipeline says "no match", that decision is mathematically traceable, line by line.
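One simple way to enforce the "raise only" rule is to take the maximum of the two scores. The sketch below assumes both scores share the same 0 to 1 scale. The function name `combine` and the use of `max` are illustrative assumptions, not a description of the production pipeline.

```python
from typing import Optional


def combine(heuristic_score: float, ml_score: Optional[float]) -> float:
    """Return the final score; the ML score may raise it but never lower it."""
    if ml_score is None:
        # ML layer disabled or unavailable: the deterministic floor stands alone.
        return heuristic_score
    # max() guarantees the result is never below the heuristic floor.
    return max(heuristic_score, ml_score)
```

With this shape, `combine(0.8, 0.3)` stays at `0.8`, while `combine(0.6, 0.9)` is lifted to `0.9`.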
Strengths and limitations
Strengths:
- Regulatory defensibility: BaFin, FCA, and OFAC all require explainability. When the system rejects a match, you can show exactly why, down to the character comparison.
- Mass throughput: Banks and payment processors screen millions of transactions a day in real time. String comparisons cost microseconds and scale horizontally — no external API, no rate limits.
- Surgical control: When a new false-positive pattern appears — say, a newly generic token like "Crypto" — we add a guard. It applies platform-wide on the next screening.
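The guard idea from the last bullet can be sketched as a token filter: two names that overlap only on generic words should not count as a match. The token list, the function names, and the whitespace tokenization below are assumptions made for illustration.

```python
# Hypothetical list of generic tokens; adding "crypto" here is the whole "guard".
GENERIC_TOKENS = frozenset({"crypto", "holdings", "group", "ltd"})


def distinctive_tokens(name: str) -> set:
    # Simple whitespace tokenization; a real engine would also handle punctuation.
    return {t for t in name.casefold().split() if t not in GENERIC_TOKENS}


def guard_allows_match(query: str, candidate: str) -> bool:
    """Return False when the names share no distinctive (non-generic) token."""
    return bool(distinctive_tokens(query) & distinctive_tokens(candidate))
```

Here `guard_allows_match("Alpha Crypto Ltd", "Beta Crypto Ltd")` is `False` because the only shared tokens are generic, while `guard_allows_match("Alpha Crypto", "Alpha Holdings")` is `True`.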
Limitations:
- Maintenance overhead: Thresholds and weights need ongoing monitoring against a ground-truth dataset. That is what nightly self-verification is for.
- Context blindness: When a company rebrands from "Twitter" to "X", every string metric fails. We bridge that with metadata — LEIs and curated alias lists — to absorb full name changes.
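The metadata bridge for rebrands can be pictured as a lookup that runs alongside the string metrics. This is a rough sketch under stated assumptions: the `Entity` shape and the function name are hypothetical, and the LEI value in the usage note is a placeholder, not a real identifier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    name: str
    lei: Optional[str] = None          # Legal Entity Identifier, if known
    aliases: frozenset = frozenset()   # curated former or alternative names


def same_entity_by_metadata(a: Entity, b: Entity) -> bool:
    """Link two records via a shared LEI or an overlapping known name."""
    if a.lei and b.lei and a.lei == b.lei:
        return True
    names_a = {a.name.casefold()} | {n.casefold() for n in a.aliases}
    names_b = {b.name.casefold()} | {n.casefold() for n in b.aliases}
    return bool(names_a & names_b)
```

For example, `Entity("X", aliases=frozenset({"Twitter"}))` links to `Entity("Twitter")` through the alias list, and two records carrying the same placeholder LEI such as `"PLACEHOLDER-LEI-1"` link regardless of name. Without either piece of metadata, "X" and "Twitter" stay unlinked.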
Sanctions Screening Built to Be Audited.
Every score traceable, every decision documented, no black boxes.