Nightly Self-Verification
Recall proven every night. Not assumed.
Beta version. Sanctions lists are subject to change by their issuing authorities.
The Requirement
Complete recall on exact-name matches. Not a target — a requirement.
A screening system must prove that it works — not just produce results and hope. The standard is unambiguous: if a designated person cannot be found by their own official name, the system has failed.
There is no acceptable recall threshold below complete for exact-name matches. Anything less means a sanctioned actor can appear in your data and go undetected. The nightly self-verification enforces this standard automatically.
Test Design
Eight sources × six configurations × five mutation levels = 240 sessions per night.
Every entity from every active source is screened against itself under multiple conditions. The test is not a sample — it is exhaustive. Every entity. Every night.
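The nightly plan is a straightforward cross product. A minimal sketch, assuming illustrative labels — the actual source and configuration identifiers are not specified here:

```python
from itertools import product

# Hypothetical labels; the system's real source and configuration
# names are assumptions for illustration only.
SOURCES = [f"source_{i}" for i in range(1, 9)]              # eight active sources
CONFIGURATIONS = ["baseline", "no_entity_type", "no_tertiary",
                  "production", "no_ml", "cross_source"]     # six configurations
MUTATIONS = ["M0", "M1", "M2", "M3", "M4"]                   # five mutation levels

# One verification session per (source, configuration, mutation) triple.
sessions = list(product(SOURCES, CONFIGURATIONS, MUTATIONS))
print(len(sessions))  # 240
```

Within each session, every entity from that source is screened, so the entity count per night is the session plan multiplied by the full list sizes.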
Configurations and Mutation Levels
Configurations:
- Baseline: Full data — name, entity type, date of birth, nationality. Tests the system under ideal conditions.
- Without entity type: Tests resilience when type is unknown — a common real-world scenario when screening from unstructured data.
- Without tertiary data: Tests name-only matching strength. The baseline for any system that receives a name without biographical context.
- Production mode: Re-evaluates results with production DOB rescue thresholds. Tests whether entities that narrowly miss the threshold in baseline would be recovered by the DOB rescue pass used in live screening.
- Without ML: Disables the Machine Learning override entirely. Isolates the heuristic pipeline performance — the score floor without ML assistance.
- Cross-source validation: Screens each source's entities against all other sources simultaneously. Tests whether a designation on one list is detectable via a different list's entry for the same actor.
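The six configurations differ only in which inputs and pipeline stages are active, so they can be modelled as a small set of flags. A hypothetical sketch — the flag names are assumptions, not the system's actual settings:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SessionConfig:
    """Hypothetical flags for one verification configuration."""
    use_entity_type: bool = True       # entity type available as input
    use_tertiary_data: bool = True     # DOB / nationality available
    production_dob_rescue: bool = False  # re-score with the live DOB rescue pass
    ml_override: bool = True           # Machine Learning override enabled
    cross_source: bool = False         # screen against all other sources

CONFIG_FLAGS = {
    "baseline":       SessionConfig(),
    "no_entity_type": SessionConfig(use_entity_type=False),
    "no_tertiary":    SessionConfig(use_tertiary_data=False),
    "production":     SessionConfig(production_dob_rescue=True),
    "no_ml":          SessionConfig(ml_override=False),
    "cross_source":   SessionConfig(cross_source=True),
}
```

Modelling configurations as data rather than code paths keeps the nightly run a pure loop over declarative settings.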
Mutation Levels:
- M0 — Unmodified: Original name as it appears on the list. Must achieve complete recall. Any miss at M0 is a critical failure.
- M1 — One mutation: Random character swap, omission, insertion, replacement, diacritic change, or transliteration. Models a single transcription error or variant spelling.
- M2 — Two mutations: Two mutations applied simultaneously. Tests the degradation curve — how quickly recall falls as input quality degrades.
- M3 — Name reorder: Token order is reversed or first and last tokens are swapped. Tests resilience to name component reordering — a common variant when names cross cultural conventions (given name first vs. family name first).
- M4 — Partial name: Only the first or last token is kept, simulating partial name entry. Tests whether the system can still surface a candidate when only a fragment of the full name is provided.
Every mutation is deterministically seeded. Every false negative is traceable to the exact mutation that caused it.
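A deterministically seeded character mutation (the M1/M2 style) can be sketched as below. The operation set and alphabet here are illustrative simplifications; the real generator also covers diacritic changes and transliteration:

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
OPS = ("swap", "omit", "insert", "replace")

def mutate(name: str, level: int, seed: int) -> str:
    """Apply `level` random single-character mutations to `name`.

    Seeded deterministically, so any false negative is reproducible
    from its (name, level, seed) triple.
    """
    rng = random.Random(seed)
    chars = list(name)
    for _ in range(level):
        op = rng.choice(OPS)
        i = rng.randrange(len(chars))
        if op == "swap":
            j = min(i + 1, len(chars) - 1)  # swap at the final position is a no-op
            chars[i], chars[j] = chars[j], chars[i]
        elif op == "omit" and len(chars) > 1:
            del chars[i]
        elif op == "insert":
            chars.insert(i, rng.choice(ALPHABET))
        else:  # replace (also the fallback when an omit would empty the name)
            chars[i] = rng.choice(ALPHABET)
    return "".join(chars)
```

Because `random.Random(seed)` yields the same sequence on every run, replaying a session with the logged seed regenerates the exact input that caused a miss.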
What Is Measured
Five metrics per session. Every false negative analysed, not just counted.
The output of each verification session is a structured dataset — not a pass/fail flag. Every metric is preserved for trend analysis.
Metrics Collected Per Session
- True Positive Rate (Recall): Proportion of entities that find themselves as the top result. The primary metric. M0 recall must be 100%.
- Mean Reciprocal Rank: The average reciprocal of the rank at which each entity appears in its own top-5 results. A rank-1 result is ideal; rank 2 or lower flags a concern.
- Score Distribution: Mean, median, and percentiles across all sessions. Detects drift — a gradual score decline that precedes recall failures.
- False Negative Analysis: Every missed entity logged with exact mutation applied and the actual top-1 result returned. Makes failure modes visible and debuggable.
- Wilson Confidence Intervals: Statistical bounds on the true positive rate at 95% confidence. Prevents overstating reliability on small source sizes.
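The Wilson score interval itself is a standard construction. A minimal implementation at 95% confidence (z = 1.96):

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (default 95%)."""
    if trials == 0:
        return (0.0, 1.0)
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials
                         + z * z / (4 * trials * trials)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))
```

Unlike the naive normal approximation, the Wilson lower bound stays strictly below 1.0 even at 100% observed recall on a small source, which is precisely why it prevents overstating reliability.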
Results are exported as a structured XLSX report after each nightly cycle.
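Given each entity's self-match rank (or a miss), the recall and MRR metrics reduce to a few lines. A sketch, assuming ranks are recorded as shown — the actual report schema is not specified here:

```python
def session_metrics(ranks):
    """Recall and mean reciprocal rank for one verification session.

    `ranks[i]` is the 1-based position at which entity i surfaced
    itself in its own top-5 results, or None for a false negative.
    """
    n = len(ranks)
    hits = [r for r in ranks if r is not None]
    recall = len(hits) / n
    # Misses contribute 0 to MRR, so the denominator stays n.
    mrr = sum(1.0 / r for r in hits) / n
    return {"recall": recall, "mrr": mrr}
```

For an M0 session the pass condition is simply `recall == 1.0`; MRR then separates clean rank-1 sessions from ones where entities surfaced but slipped below the top position.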
What the Results Mean
M0 recall is complete. M1 exceeds 99%. M2 degrades — and the report shows exactly why.
M0 recall is complete. Every entity, across all sources, is findable by its official name — every night. M1 recall (one mutation) exceeds 99%. The system tolerates one-character errors in real screening.
M2 recall degrades further — as expected. Two simultaneous mutations challenge any string similarity system, and the results document exactly which mutation types cause failures and why. That information is used to tune the matching pipeline between releases.
Sanctions Screening Built to Be Audited.
Beta access is free and includes full screening functionality across all official sources, the complete review workflow, and audit-ready exports.