
Why Not Just Use an LLM?

LLMs are powerful. They are not a compliance engine. Here is why the architecture matters.

Sanctions lists are subject to change by their issuing authorities.

The Question

Why build a deterministic engine when an LLM could just answer?

It comes up often: why run a hybrid engine — string metrics, ML models, threshold bands, entity guards — when you could hand the query and the target dataset to an LLM and ask "are these the same person?"

Three reasons: speed, auditability, reproducibility.

For the record: this page is about the screening engine, which is fully deterministic. Some adjacent features do use the Anthropic Claude API: Chat (Q&A on entities), Decision Summary (per-screening synthesis), and the ER Guardian audit. They are optional and can be disabled per project. The screening itself never depends on an LLM.

Head-to-Head: Hybrid vs. Pure LLM

Across the criteria that matter in production compliance.

Criterion | Hybrid (Heuristic + ML) | Pure LLM
Latency | Milliseconds per match | Seconds per match
Cost at scale | Very low: minimal compute | Very high: per-token API costs
Auditability | Full: every score traceable to the exact comparison | Poor: reasoning varies, hard to document
Reproducibility | 100%: same input, same output | Variable: temperature and model updates affect results
Regulatory acceptance | High: deterministic rules satisfy BaFin, FCA, OFAC | Low: black-box reasoning is difficult to defend
Bulk throughput | Scales to large datasets without API rate limits or token costs | Not viable for watchlist screening at scale
Semantic understanding | Limited to programmed features | Excellent: world knowledge, semantic context
Hallucination risk | None: deterministic | Real: a missed sanctions hit is a compliance failure

The Hybrid Approach

Deterministic rules at the floor, machine learning at the ceiling.

String metrics and entity guards form the floor. Machine-learning models can only raise a score, never lower it, so a hit found by the deterministic rules can never be suppressed by a model. The reason is regulatory: when the heuristic pipeline says "no match", that decision is mathematically traceable, line by line.
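As an illustration of the floor-and-ceiling rule, here is a minimal Python sketch. It is not the engine's actual code: the function names, the use of difflib as the string metric, and the threshold values 0.75 and 0.90 are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher
from typing import Optional


def heuristic_score(query: str, candidate: str) -> float:
    """Deterministic string similarity in [0, 1].

    difflib is only a stand-in here; the real engine's metrics are not
    described on this page.
    """
    return SequenceMatcher(None, query.casefold(), candidate.casefold()).ratio()


def combined_score(heuristic: float, ml: Optional[float]) -> float:
    """Apply the raise-only rule: the ML score may lift the result,
    but the heuristic score is always the floor."""
    if ml is None:  # ML model disabled or unavailable
        return heuristic
    return max(heuristic, ml)


def band(score: float, review_at: float = 0.75, match_at: float = 0.90) -> str:
    """Map a score to a threshold band (hypothetical cut-offs)."""
    if score >= match_at:
        return "match"
    if score >= review_at:
        return "review"
    return "no match"
```

With this shape, combined_score(0.80, 0.60) stays at 0.80 because the model cannot pull the score down, while combined_score(0.60, 0.95) rises to 0.95.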

Strengths and limitations

Strengths:

  • Regulatory defensibility: BaFin, FCA, and OFAC all require explainability. When the system rejects a match, you can show exactly why, down to the character comparison.
  • Mass throughput: Banks and payment processors screen millions of transactions a day in real time. String comparisons cost microseconds and scale horizontally — no external API, no rate limits.
  • Surgical control: When a new false-positive pattern appears — say, a newly generic token like "Crypto" — we add a guard. It applies platform-wide on the next screening.
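The "Crypto" guard mentioned above could look roughly like the following Python sketch. The token list, the function names, and the whitespace tokenisation are assumptions made for illustration, not the platform's actual guard.

```python
from __future__ import annotations

# Hypothetical list of tokens considered too generic to carry a match alone.
GENERIC_TOKENS = frozenset({"crypto", "bank", "holding", "group", "ltd"})


def distinctive_tokens(name: str) -> set[str]:
    """Lower-case the name, split on whitespace, drop generic tokens."""
    return {tok for tok in name.casefold().split() if tok not in GENERIC_TOKENS}


def passes_generic_token_guard(query: str, candidate: str) -> bool:
    """Return False when the two names share no distinctive token,
    i.e. any overlap between them consists of generic tokens only."""
    return bool(distinctive_tokens(query) & distinctive_tokens(candidate))
```

Under these assumptions, "Alpha Crypto" and "Beta Crypto" share only a generic token and are blocked, while "Alpha Crypto Ltd" and "Alpha Holdings" still pass on the shared token "alpha". Adding a new guard is then a one-line change to the token list.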

Limitations:

  • Maintenance overhead: Thresholds and weights need ongoing monitoring against a ground-truth dataset. That is what nightly self-verification is for.
  • Context blindness: When a company rebrands from "Twitter" to "X", every string metric fails. We bridge that with metadata — LEIs and curated alias lists — to absorb full name changes.
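A minimal Python sketch of how metadata can bridge a full rename where string metrics fail. The record layout, the function name, and the LEI value are hypothetical; the LEI shown is a 20-character placeholder, not a real identifier.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class EntityRecord:
    """Hypothetical watchlist record with identifier and alias metadata."""
    name: str
    lei: Optional[str] = None
    aliases: FrozenSet[str] = frozenset()


def metadata_match(query_name: str, query_lei: Optional[str],
                   record: EntityRecord) -> Optional[str]:
    """Return the reason for a metadata-based match, or None.

    An identical LEI is checked first; otherwise the query name is
    compared case-insensitively against the curated alias list.
    """
    if query_lei and record.lei and query_lei == record.lei:
        return "lei"
    if query_name.casefold() in {alias.casefold() for alias in record.aliases}:
        return "alias"
    return None


# Placeholder data: the former name is carried as a curated alias.
record = EntityRecord(
    name="X",
    lei="EXAMPLELEI0000000001",  # placeholder, not a real LEI
    aliases=frozenset({"Twitter"}),
)
```

Here metadata_match("twitter", None, record) returns "alias" even though "Twitter" and "X" have no characters in common, and the returned reason can be logged alongside the score.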

Sanctions Screening Built to Be Audited.

Every score traceable, every decision documented, no black boxes.
