Entity Resolution

A false merge hides a designation. A missed merge leaves a gap. Ten guards prevent both.

Early access. Sanctions lists are subject to change by their issuing authorities.

The Stakes

Entity Resolution is a consequential decision. Wrong in either direction: compliance failure.

Entity Resolution decides whether two list entries represent the same real-world actor. Get it wrong in one direction: a designated person disappears behind the wrong identity card. Get it wrong in the other: the same actor appears under multiple disconnected records, and a screening result depends on which name variant was searched.

Both failures are compliance failures. The system is designed to avoid both — with separate mechanisms for each failure mode.

Resolution Tiers

Four tiers, attempted in order. Guards block bad merges before they happen.

If a guard blocks a merge at any tier, resolution falls through to the next tier. The system never forces a merge that violates a safety constraint.

Tier Definitions
  • Tier 1: Exact match on canonical name and entity type. Highest confidence — no scoring required. Guards still apply.
  • Tier 2: Exact match on canonical name only. Entity type may differ — the Entity Type Guard determines whether enrichment is permitted.
  • Tier 3: Fuzzy match using the scoring pipeline. Above the auto-merge threshold: merged automatically, subject to guards. Between auto-merge and review thresholds: held for manual review. Below review threshold: new entity created.
  • Tier 4: No match found — new entity created. The incoming entry is the first known record of this actor.
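The tier order above can be sketched roughly as follows. This is a minimal illustration, not the system's actual code: the threshold values, the `Entry` fields, the `resolve` function, and the single `guards_pass` flag (standing in for the full guard evaluation) are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

# Hypothetical thresholds; the real values are not published in this document.
AUTO_MERGE_THRESHOLD = 0.92
REVIEW_THRESHOLD = 0.80


class Outcome(Enum):
    MERGE = "merge"
    REVIEW = "review"
    NEW_ENTITY = "new_entity"


@dataclass(frozen=True)
class Entry:
    canonical_name: str
    entity_type: str


def resolve(incoming: Entry, candidate: Optional[Entry],
            score: float = 0.0, guards_pass: bool = True) -> Tuple[int, Outcome]:
    """Return (tier, outcome) for an incoming entry and its best candidate."""
    if candidate is None:
        # Tier 4: nothing to compare against, so this is a new entity.
        return 4, Outcome.NEW_ENTITY

    same_name = incoming.canonical_name == candidate.canonical_name
    same_type = incoming.entity_type == candidate.entity_type

    # Tier 1: exact name and type. Guards still apply.
    if guards_pass and same_name and same_type:
        return 1, Outcome.MERGE
    # Tier 2: exact name only.
    if guards_pass and same_name:
        return 2, Outcome.MERGE
    # Tier 3: fuzzy score. Assumption: a failed guard downgrades to review.
    if score >= AUTO_MERGE_THRESHOLD:
        return 3, (Outcome.MERGE if guards_pass else Outcome.REVIEW)
    if score >= REVIEW_THRESHOLD:
        return 3, Outcome.REVIEW
    return 3, Outcome.NEW_ENTITY
```

In this sketch a blocked exact match is not forced through: it drops to the scoring tier, where the failed guard holds it for review.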

From Source to Entity

Three stages reduce raw list data to deduplicated, testable entities.

Every sanctions authority publishes its list in a different way: as XML, CSV, or JSON files, or through an API. The same designated person can appear in dozens of raw records. Before entity resolution begins, the data passes through three reduction stages.

The Data Pipeline

Step 1 — Download & Parse: The system downloads the authoritative data feed from each source. A single person can appear many times in the raw feed — the EU consolidated list, for example, contains one record per regime listing, so a person sanctioned under five EU programmes appears five times. The parser deduplicates by source-specific identifier, collapsing all records into one entry per designated actor. If the source authority assigns two different IDs to two different persons, they remain two separate entries — regardless of how similar the names are. The source authority defines actor boundaries, not the system.
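A minimal sketch of the Step 1 deduplication, assuming each raw record has already been parsed into a dictionary. The field names (`source_id`, `name`, `regime`) and the function `dedupe_by_source_id` are hypothetical and chosen only for illustration.

```python
def dedupe_by_source_id(records):
    """Collapse raw records into one entry per source-assigned identifier."""
    entries = {}
    for rec in records:
        entry = entries.setdefault(
            rec["source_id"],
            {"source_id": rec["source_id"], "names": [], "regimes": []},
        )
        # Keep each distinct name and regime once, in first-seen order.
        if rec["name"] not in entry["names"]:
            entry["names"].append(rec["name"])
        if rec["regime"] not in entry["regimes"]:
            entry["regimes"].append(rec["regime"])
    return list(entries.values())


raw = [
    {"source_id": "EU-1", "name": "Ivan Petrov", "regime": "REGIME-A"},
    {"source_id": "EU-1", "name": "Ivan Petrov", "regime": "REGIME-B"},
    {"source_id": "EU-2", "name": "Ivan Petrov", "regime": "REGIME-A"},
]
entries = dedupe_by_source_id(raw)
```

Names are never compared here. `EU-1` and `EU-2` stay separate entries even though the names are identical, because only the source identifier defines an actor boundary at this stage.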

Step 2 — Registry Import: Each deduplicated entry becomes one listing in the entity registry. The system tracks every change via content hashing and version history. If a listing's properties change between imports, a new version is created. If nothing changed, no action is taken.
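Change tracking by content hash could look roughly like this. The sketch assumes SHA-256 over a key-sorted JSON serialisation and an in-memory dictionary as the registry. The document does not name the hash function or the storage model, so both are illustrative choices, as are the names `content_hash` and `import_listing`.

```python
import hashlib
import json


def content_hash(properties: dict) -> str:
    """Hash a listing's properties independently of key order."""
    canonical = json.dumps(properties, sort_keys=True,
                           ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def import_listing(registry: dict, listing_id: str, properties: dict) -> bool:
    """Append a new version if the content changed; return True if it did."""
    versions = registry.setdefault(listing_id, [])
    digest = content_hash(properties)
    if versions and versions[-1]["hash"] == digest:
        return False  # nothing changed, no action taken
    versions.append({
        "version": len(versions) + 1,
        "hash": digest,
        "properties": dict(properties),
    })
    return True
```

Sorting the keys before hashing means a feed that only reorders its fields does not create a spurious new version.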

Step 3 — Entity Resolution: Listings from different sources that refer to the same real-world actor are resolved into a single entity. This is where the four resolution tiers and ten guards described below apply. The result: one entity card per real-world actor, with all cross-source listings attached.

Example — March 2026 figures:

  • EU Consolidated: 27,793 name records in XML → 5,860 deduplicated entries → 5,808 testable entities
  • OFAC SDN: 41,547 name records → 18,706 entries → 18,576 testable entities
  • UK FCDO: 14,523 name records → 6,033 entries → 5,985 testable entities
  • All sources combined: 78,000+ unique entities in the registry — 29,000+ sanctioned entities from 9 sanctions lists, 44,000+ politically exposed persons from 11 PEP sources

Figures as of March 2026. Counts change nightly as authorities update their lists. These numbers refer to the deduplicated XML/API feeds processed by the system. CSV or XLSX files available for manual download from the same authorities may show different row counts — for example, one row per regime listing rather than one row per designated person.

Ten Guards

Each guard addresses a documented class of false merge. None are arbitrary.

Guards are evaluated before any merge, whether automatic or approved through manual review. A guard can downgrade a proposed auto-merge to manual review, or block a merge entirely. Guards cannot be bypassed by a high score.

All Ten Guards
  • 1. Intra-Source Guard: Two entries from the same source with different source IDs are never merged. Different IDs mean different actors — by definition. The source authority made that determination.
  • 2. Entity Type Guard: COMPANY cannot merge into PERSON. VESSEL cannot merge into ORGANISATION. UNKNOWN allows cross-source type enrichment — it is the only permissive case.
  • 3. Single-Token Canonical Guard: Single-token names inflate fuzzy scores. If a multi-token variant exists, it is promoted to canonical before matching begins. This prevents short surnames from scoring high against unrelated full names.
  • 4. Short-Name Guard: Fuzzy matches between single-word names are unreliable — downgraded from auto-merge to manual review regardless of score. A reviewer must confirm.
  • 5. Length-Ratio Guard: When the shorter name is only a small fraction of the longer name's length and the score is below a high threshold, the match is downgraded to review. Prevents a fragment of one name from merging into the full name of a different entity.
  • 6. Category Guard: When the two entries belong to different categories (sanctions, securities, PEP), auto-merge is downgraded to manual review. A securities instrument should not silently merge with a sanctioned person.
  • 7. Nationality Guard: When both entries have nationality data and values differ — auto-merge downgraded to review, regardless of score. See the documented case below.
  • 8. Alias Contamination Guard: Incoming aliases that match another entity's canonical name are rejected — preventing cross-contamination. A name that already identifies a different sanctioned actor cannot become an alias of a second.
  • 9. Surname Mismatch Guard: For person names with three or more tokens — when two tokens match well (given name and patronymic) but each side has an unmatched token that differs substantially — the score is capped below auto-merge. Prevents false merges between "Ivanov Aleksandr Gennadevich" and "Bogdanov Aleksandr Gennadevich".
  • 10. ORG Distinguishing Token Guard: When two organisations share generic terms but each has a meaningful distinguishing token that differs — the score is capped below the review threshold. Prevents "Kovrov Mechanical Plant" from merging with "Serov Mechanical Plant".
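To illustrate how guards can sit in front of a merge decision, here is a minimal sketch covering three of the ten guards (1, 2, and 4). The `Verdict` levels, the `Listing` fields, and the rule that the most restrictive verdict wins are assumptions made for the example. The short-name check is also one interpretation: it fires only when both names are single tokens and are not identical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Verdict(IntEnum):
    ALLOW = 0   # no objection to the proposed merge
    REVIEW = 1  # downgrade auto-merge to manual review
    BLOCK = 2   # never merge


@dataclass(frozen=True)
class Listing:
    source: str
    source_id: str
    name: str
    entity_type: str  # e.g. "PERSON", "COMPANY", "VESSEL", "UNKNOWN"


def intra_source_guard(a: Listing, b: Listing) -> Verdict:
    # Same source, different IDs: the authority says these are different actors.
    if a.source == b.source and a.source_id != b.source_id:
        return Verdict.BLOCK
    return Verdict.ALLOW


def entity_type_guard(a: Listing, b: Listing) -> Verdict:
    # UNKNOWN is the only permissive case for differing types.
    if a.entity_type == b.entity_type or "UNKNOWN" in (a.entity_type, b.entity_type):
        return Verdict.ALLOW
    return Verdict.BLOCK


def short_name_guard(a: Listing, b: Listing) -> Verdict:
    # Fuzzy matches between single-word names always need a reviewer.
    single = len(a.name.split()) == 1 and len(b.name.split()) == 1
    if single and a.name != b.name:
        return Verdict.REVIEW
    return Verdict.ALLOW


GUARDS = (intra_source_guard, entity_type_guard, short_name_guard)


def evaluate_guards(a: Listing, b: Listing) -> Verdict:
    """Return the most restrictive verdict; the score plays no part here."""
    return max(guard(a, b) for guard in GUARDS)
```

The score is not an input to `evaluate_guards`, which is one way to make sure a high score cannot bypass a guard.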

Nationality Guard — A Real Case

Same name. Same programme. Two different persons. The guard catches what the score cannot.

Two people can share an identical full name, appear on the same sanctions programmes, and be entirely different persons. The nationality guard was introduced after a documented case.

The Documented Case

Two sanctioned individuals with the same canonical name in Cyrillic — one a Russian State Duma member, one a Belarusian Major General — had been merged into a single entity card. The EU and UK designations of two distinct people were attributed to one record.

The scoring engine produced a score above the auto-merge threshold because the names were identical. The nationality mismatch penalty reduced the score — but not below the threshold.

The nationality guard adds a structural safety layer: if both entries have nationality data and the values differ, the merge is held for human review regardless of score. The reviewer determines whether the difference is explainable (dual citizenship, country name variant) or whether these are different persons.
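The rule can be sketched in a few lines. The threshold value, the `decide` function, and the plain-string comparison of nationality codes are assumptions for illustration. In particular, this sketch does not handle country name variants, so those would land in the review queue.

```python
from typing import Optional

AUTO_MERGE_THRESHOLD = 0.92  # hypothetical value


def decide(score: float, nat_a: Optional[str], nat_b: Optional[str]) -> str:
    """Apply the nationality guard to a proposed auto-merge."""
    if score < AUTO_MERGE_THRESHOLD:
        return "not_auto_merge"  # handled by the review/new-entity path
    both_known = bool(nat_a) and bool(nat_b)
    if both_known and nat_a.strip().upper() != nat_b.strip().upper():
        # Differing nationalities: hold for a human, whatever the score.
        return "review"
    return "auto_merge"
```

With identical names the score stays above the threshold, yet `decide(0.97, "RU", "BY")` still returns `"review"`. When one side has no nationality data, the guard does not fire.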

The guard does not reject merges. It flags them. Human judgment closes the loop.

10 Resolution Guards
4 Resolution Tiers
Full Audit Trail per Merge
Manual Review Queue for Borderline Cases

Sanctions Screening Built to Be Audited.

Early access is free and includes full screening functionality across all official sources, the complete review workflow, and audit-ready exports.
