Multi-Algorithm Name Matching

Beyond keyword search.
Every score traceable to the exact character comparison that produced it.

Sanctions lists are subject to change by their issuing authorities.

Why Keyword Search Fails

The compliance gap that exact matching cannot close

Keyword search returns only exact matches. Sanctions lists rarely hold the exact spelling found in your records. "Владимир Путин" becomes "Vladimir Putin" or "Wladimir Putin", depending on the transliteration standard. "Kulazhin" and "Kulagin" differ by two characters, yet they are two different sanctioned persons.
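To make the character counts concrete, the sketch below computes a plain Levenshtein edit distance between the names mentioned above. It is only an illustration of how close these spellings are: this page does not say the engine uses Levenshtein distance, and the function name is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn `a` into `b`."""
    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute if different
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("Kulazhin", "Kulagin"))
    print(levenshtein("Vladimir Putin", "Wladimir Putin"))
```

An exact keyword search treats every one of these pairs as a non-match, while a naive "distance of two or less" rule would merge two different people. Neither extreme is enough on its own.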

A compliance system that cannot handle this is not a compliance system. The names on the list are not necessarily the names in your records. The gap between them is where sanctions evasion lives.

The Multi-Layer Pipeline

Every comparison runs the same pipeline. Nothing skipped, nothing approximated.

A name query triggers eleven sequential layers, each capturing a different class of name variation. The layers interact under fixed rules: bonuses can only lift a score, guards can only cap it, and penalties can only reduce it. The result is a final score from 0 to 100.
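One way to picture this interaction is as a chain of small score transformations. The sketch below is an assumption-laden illustration, not the engine's code: the helper names (bonus, cap, penalty, run_pipeline) are hypothetical, and the example numbers are placeholders.

```python
from typing import Callable, Iterable

# A layer takes the running score and returns the adjusted score.
Layer = Callable[[float], float]


def bonus(points: float) -> Layer:
    """A bonus can only lift the score; negative input is ignored."""
    return lambda score: score + max(0.0, points)


def cap(limit: float) -> Layer:
    """A guard can only cap the score; it never raises it."""
    return lambda score: min(score, limit)


def penalty(points: float) -> Layer:
    """A penalty can only reduce the score; negative input is ignored."""
    return lambda score: score - max(0.0, points)


def run_pipeline(base_score: float, layers: Iterable[Layer]) -> float:
    """Apply the layers in order, then clamp the result to 0..100."""
    score = base_score
    for layer in layers:
        score = layer(score)
    return max(0.0, min(100.0, score))


if __name__ == "__main__":
    # Placeholder values: a 5-point bonus followed by a cap of 70.
    print(run_pipeline(82, [bonus(5), cap(70)]))
```

Because each helper can move the score in one direction only, the effect of any single layer on the final score stays easy to trace.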

All Eleven Layers

Four String Metrics: Token Set Ratio ignores word order and tolerates extra or duplicated words. Partial Ratio catches truncations. Token Sort Ratio handles reordered name components. Character Ratio measures overall character-level similarity. Each captures a different class of name variation.
Weighted Combination: Metrics combined with tuned weights. When query and candidate differ greatly in length, weights shift automatically — preventing a short name from scoring artificially high as a substring of a long one.
Organisation Guards: Common-Word Guard caps scores when overlap consists only of generic terms (bank, group, holdings, international). Token-Overlap Guard caps scores (for organisations and companies) when fewer than 40% of query words appear in the candidate. Both prevent false positives on generic organisation names.
Subset Bonus: When one name is a proper subset of the other, the score is boosted proportionally to coverage. "Putin" ⊂ "Vladimir Putin" is rewarded. "bank" ⊂ "Deutsche Bank AG" is not — it fails the Common-Word Guard.
Jaro-Winkler Bonus: A string-similarity measure that rewards matching prefixes — catches transliterations that preserve the beginning of a name. Reduced for names shorter than 6 characters, where prefix matching is less discriminative.
Phonetic Bonus: Two phonetic algorithms (Soundex and Metaphone) detect names that sound alike despite different spelling. Boosts score by up to 5 points. Catches transliteration variants that pure string metrics may miss.
Surname Boost: For person names, independently rewards matching surnames and first names — because a surname match is stronger evidence than a random string overlap.
Tertiary Penalty: When biographical data is available, it is compared. A mismatch in date of birth, place of birth, or nationality reduces the score, as does a gender mismatch for person entities. A matching Legal Entity Identifier suppresses all other tertiary checks: it is definitive identity proof. An exact date-of-birth match suppresses secondary mismatches: it is treated as identity confirmation. The combined penalty is capped to avoid over-penalising sparse data.
Identifier Match Bonus: When the query and the candidate share the same Legal Entity Identifier (LEI) and at least one name token overlaps, the score receives a hard-positive boost. This identifier-graph signal actively rewards confirmed identity rather than merely suppressing penalties.
Short-Name Cap: Single-word organisation names are capped based on character length. A three-character acronym can score no higher than 70. A seven-character name can score no higher than 95. Prevents inflated confidence on fragments.
Machine Learning Override: Four ML models, one per entity type, score each match against 27 features, including string metrics, script detection, and legal-suffix equivalence. ML can only raise a score, never lower it. The heuristic engine is the floor; ML is the ceiling.
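The four string metrics in the first layer can be approximated with Python's standard-library difflib. This is a sketch under stated assumptions: the page does not disclose the engine's actual implementations or weights, the function names are hypothetical, and SequenceMatcher stands in for whatever similarity measure the engine uses. The definitions follow the common fuzzy-matching conventions for these metric names.

```python
from difflib import SequenceMatcher


def char_ratio(a: str, b: str) -> float:
    """Overall character-level similarity on a 0..100 scale."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_sort_ratio(a: str, b: str) -> float:
    """Sort the words first, so reordered name components compare equal."""
    def normalise(s: str) -> str:
        return " ".join(sorted(s.lower().split()))
    return char_ratio(normalise(a), normalise(b))


def partial_ratio(a: str, b: str) -> float:
    """Best similarity of the shorter string against any same-length
    window of the longer one; catches truncations."""
    short, long_ = sorted((a.lower(), b.lower()), key=len)
    if not short:
        return 0.0
    width = len(short)
    return max(
        char_ratio(short, long_[i:i + width])
        for i in range(len(long_) - width + 1)
    )


def token_set_ratio(a: str, b: str) -> float:
    """Compare the shared words against each side's full word set, so
    extra or duplicated words do not drag the score down."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    common = " ".join(sorted(tokens_a & tokens_b))
    full_a = (common + " " + " ".join(sorted(tokens_a - tokens_b))).strip()
    full_b = (common + " " + " ".join(sorted(tokens_b - tokens_a))).strip()
    return max(
        char_ratio(common, full_a),
        char_ratio(common, full_b),
        char_ratio(full_a, full_b),
    )


if __name__ == "__main__":
    for metric in (char_ratio, token_sort_ratio, partial_ratio, token_set_ratio):
        print(metric.__name__, round(metric("Putin Vladimir", "Vladimir Putin"), 1))
```

Taken alone, the set and partial metrics give a bare "Putin" a perfect score against "Vladimir Putin". That is the kind of inflation the length-aware weighting and the guard layers described above are meant to contain.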

Thresholds by Entity Type

One threshold for all entity types produces noise. Separate thresholds, tuned per type, produce precision.

Below threshold, a result is discarded. Above threshold, it appears in the review pipeline. Thresholds are not universal — generic organisation names require a higher bar to avoid noise.

Threshold Bands by Entity Type

Person / Unknown: Lower threshold. Names are highly variable across transliterations and jurisdictions. The engine must cast a wider net.
Organisation / Company / Security: Higher threshold. Generic word overlap is common. The Common-Word Guard and Token-Overlap Guard reduce noise, but a higher base threshold adds a second layer of defence.
Vessel / Aircraft: Intermediate threshold. Names are often distinctive but can be translated or abbreviated across registries.

Thresholds are configurable per project. The defaults are tuned against the nightly self-verification results across all active sources.
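A per-type threshold table with project overrides might look like the sketch below. The numeric values are hypothetical placeholders: this page gives only the ordering (persons lowest, vessels and aircraft in between, organisations highest), not the defaults. Treating a score exactly at the threshold as a pass is also an assumption.

```python
from typing import Mapping, Optional

# Hypothetical placeholder defaults; only their relative order comes
# from the threshold bands described above.
DEFAULT_THRESHOLDS = {
    "person": 60,
    "unknown": 60,
    "vessel": 70,
    "aircraft": 70,
    "organisation": 80,
    "company": 80,
    "security": 80,
}


def threshold_for(
    entity_type: str,
    project_overrides: Optional[Mapping[str, float]] = None,
) -> float:
    """Return the project's threshold for this entity type, falling back
    to the default when the project has not overridden it."""
    merged = {**DEFAULT_THRESHOLDS, **(project_overrides or {})}
    return merged[entity_type.lower()]


def passes_threshold(
    score: float,
    entity_type: str,
    project_overrides: Optional[Mapping[str, float]] = None,
) -> bool:
    """True when the result should enter the review pipeline.
    Assumption: a score equal to the threshold passes."""
    return score >= threshold_for(entity_type, project_overrides)


if __name__ == "__main__":
    print(passes_threshold(65, "person"))
    print(passes_threshold(65, "organisation"))
    print(threshold_for("person", {"person": 55}))
```

The same score of 65 is kept for a person and discarded for an organisation, which is the effect separate thresholds are meant to have.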

Zone Classification

Results sorted by Machine Learning confidence. The highest-risk results surface first.

Results above threshold are classified into zones by Machine Learning confidence. Zone assignment determines review order — not whether a result is shown.

Zone Definitions

Zone A — Priority: High ML confidence — likely true positive. Review first.
Zone B — Review: Above decision threshold or with a strong heuristic score, but lower or absent ML confidence. Manual check.
Zone C — Workbasket: Below both thresholds. Can be bulk-cleared with configurable auto-clear, with safety floors at 50 (persons) and 72 (organisations) to prevent accidental clearance of true positives.

The heuristic floor is a safety net. Even if the Machine Learning model is uncertain, a high heuristic score keeps a result in the review queue. Machine Learning cannot suppress a strong name match.
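The zone rules can be read as a small decision function. The sketch below is one interpretation, not the engine's logic: it assumes "both thresholds" means an ML decision threshold and a strong-heuristic threshold, and all three cut-off values and the function name are hypothetical.

```python
from typing import Optional

# Hypothetical cut-offs; the page does not publish these values.
ML_PRIORITY_CONFIDENCE = 0.90   # at or above: Zone A
ML_DECISION_CONFIDENCE = 0.50   # at or above: at least Zone B
STRONG_HEURISTIC_SCORE = 85.0   # at or above: at least Zone B


def classify_zone(heuristic_score: float, ml_confidence: Optional[float]) -> str:
    """Assign a review zone to a result that already passed its
    entity-type threshold. `ml_confidence` is None when no ML score exists."""
    has_ml = ml_confidence is not None
    if has_ml and ml_confidence >= ML_PRIORITY_CONFIDENCE:
        return "A"  # high ML confidence: review first
    if has_ml and ml_confidence >= ML_DECISION_CONFIDENCE:
        return "B"  # above the ML decision threshold: manual check
    if heuristic_score >= STRONG_HEURISTIC_SCORE:
        return "B"  # heuristic floor: ML cannot suppress a strong name match
    return "C"      # below both thresholds: workbasket


if __name__ == "__main__":
    print(classify_zone(70, 0.95))
    print(classify_zone(92, None))
    print(classify_zone(92, 0.10))
    print(classify_zone(70, 0.20))
```

The third call shows the safety net in action: low ML confidence does not push a strong heuristic match out of the review queue.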

Multi-Layer Scoring Pipeline

4 ML Models

27 ML Features per Match

<50 ms Per Name


Sanctions Screening Built to Be Audited.

Free to use: full screening across every official source, the complete review workflow, and audit-ready exports.