Nightly Self-Verification
Detection proven every night. Not assumed.
Early access. Sanctions lists are subject to change by their issuing authorities.
The Requirement
Complete detection on exact-name matches. Not a target — a requirement.
A screening system must prove that it works — not just produce results and hope. The standard is unambiguous: if a designated person cannot be found by their own official name, the system has failed.
There is no acceptable detection threshold below complete for exact-name matches. Anything less means a sanctioned actor can appear in your data and go undetected. The nightly self-verification enforces this standard automatically.
Test Design
Nine sources × ~22 sessions each ≈ 196 sessions per night.
Every entity from every active source is screened against itself under multiple conditions. The test is not a sample — it is exhaustive. Every entity. Every night.
Configurations and Mutation Levels
Configurations:
- Baseline: Full data — name, entity type, date of birth, nationality. Tests the system under ideal conditions.
- Without entity type: Tests resilience when type is unknown — a common real-world scenario when screening from unstructured data.
- Without tertiary data: Tests name-only matching strength. The baseline for any system that receives a name without biographical context.
- Production mode: Re-evaluates results with production DOB rescue thresholds. Tests whether entities that narrowly miss the threshold in baseline would be recovered by the DOB rescue pass used in live screening.
- Without ML: Disables the Machine Learning override entirely. Isolates the heuristic pipeline performance — the score floor without ML assistance.
- Cross-source validation (runs as a separate session, not one of the five configurations above): Screens each source's entities against all other sources simultaneously. Tests whether a designation on one list is detectable via a different list's entry for the same actor.
Mutation Levels:
- M0 — Unmodified: Original name as it appears on the list. Must achieve complete detection. Any miss at M0 is a critical failure.
- M1 — One mutation: Random character swap, omission, insertion, replacement, diacritic change, or transliteration. Models a single transcription error or variant spelling.
- M2 — Two mutations: Two independent mutations from the M1 set applied to the same name. Tests the degradation curve — how quickly detection falls as input quality degrades.
- M3 — Name reorder: Token order is reversed or first and last tokens are swapped. Tests resilience to name component reordering — a common variant when names cross cultural conventions (given name first vs. family name first).
- M4 — Partial name: Only the first or last token is kept, simulating partial name entry. Tests whether the system can still surface a candidate when only a fragment of the full name is provided.
Every mutation is deterministically seeded. Every false negative is traceable to the exact mutation that caused it.
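As a rough illustration, a deterministically seeded mutation pass could look like the sketch below. The function name, mutation set, and seeding scheme are hypothetical, not the production implementation, and the diacritic and transliteration mutations are omitted for brevity:

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def mutate(name: str, n_mutations: int, seed: str) -> str:
    """Apply n_mutations single-character edits, reproducibly seeded
    so every false negative is traceable to the exact mutation."""
    # Seeding random.Random with a string is stable across runs.
    rng = random.Random(f"{seed}|{name}|{n_mutations}")
    chars = list(name)
    for _ in range(n_mutations):
        op = rng.choice(["swap", "omit", "insert", "replace"])
        i = rng.randrange(len(chars))
        if op == "swap" and len(chars) > 1:
            j = min(i + 1, len(chars) - 1)
            chars[i], chars[j] = chars[j], chars[i]   # adjacent transposition
        elif op == "omit" and len(chars) > 1:
            del chars[i]                              # character omission
        elif op == "insert":
            chars.insert(i, rng.choice(ALPHABET))     # character insertion
        else:
            chars[i] = rng.choice(ALPHABET)           # character replacement
    return "".join(chars)

# Same seed → same mutation, every night:
assert mutate("Sergei Ivanov", 1, "2024-06-01") == mutate("Sergei Ivanov", 1, "2024-06-01")
```

Seeding on the name itself means each entity gets its own reproducible mutation, so a failing case can be replayed in isolation.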
What Is Measured
Five metrics per session. Every false negative analysed, not just counted.
The output of each verification session is a structured dataset — not a pass/fail flag. Every metric is preserved for trend analysis.
Metrics Collected Per Session
- Detection Rate: Proportion of entities that appear in their own screening results. The primary metric. M0 detection must be 100%. A separate Wrong Match Rate (WMR) tracks how often a different entity ranks first.
- Mean Reciprocal Rank: The mean of 1/rank across entities, where rank is the position at which each entity appears in its own top-5 results. A rank-1 result is ideal. Rank 2 or lower flags a concern.
- Score Distribution: Mean, median, and percentiles across all sessions. Detects drift — a gradual score decline that precedes detection failures.
- False Negative Analysis: Every missed entity logged with exact mutation applied and the actual top-1 result returned. Makes failure modes visible and debuggable.
- Wilson Confidence Intervals: Statistical bounds on the true positive rate at 95% confidence. Prevents overstating reliability on small source sizes.
Results are exported as a structured XLSX report after each nightly cycle.
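For reference, the Wilson score interval itself takes only a few lines to compute. This is a generic textbook sketch, not the report's own code:

```python
from math import sqrt

def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 → 95%).
    Better behaved than the normal approximation on small sources."""
    if trials == 0:
        return (0.0, 1.0)
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (centre - half, centre + half)

# 48 of 50 entities detected on a small source:
lo, hi = wilson_interval(48, 50)  # roughly (0.865, 0.989)
```

Note how wide the interval is at 50 trials: a raw 96% detection rate on a small source only supports a lower bound of about 87%, which is exactly why the report carries the interval rather than the point estimate.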
What the Results Mean
M0 detection is complete. M1 and M2 detection rates are measured, analysed, and published — every night.
M0 — Exact name (100.0%): Every entity, across all sources, is findable by its official name. Every night. No exceptions. Any miss at M0 is a critical failure and triggers an immediate alert.
M1 — One mutation (>99.9%): A single character error — a typo, a transposition, a phonetic variant. Example: Sergei Ivanov → Sergei Ivanvo (two adjacent characters swapped), or Mohammed → Mohanned (key next to ‘m’ on the keyboard). The detection rate measures whether the correct entity appears anywhere in the results — not just at rank 1. The few false negatives are concentrated in ultra-short identifiers (2–3 character vessel codes, wallet labels) where a single mutation destroys most of the name. For person and organisation names, the effective rate is even higher.
M2 — Two mutations (>99%): Two simultaneous errors compound. Example: Hassan Nasrallah → Hassam Nasralah (adjacent key + character deleted). This stress-tests the degradation curve. The remaining false negatives are concentrated in short identifiers where two mutations leave too little structure for any string similarity system to recover.
Every false negative is logged with the exact mutation applied, the mutation type, and the score achieved. The nightly report identifies exactly which names fail under which conditions — and that information feeds directly into matching pipeline improvements.
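The per-level roll-up behind these numbers is conceptually simple. A hypothetical aggregation over (mutation level, hit rank) pairs, where a miss is recorded as a rank of None, might look like this (the row shape and names are illustrative):

```python
from collections import defaultdict

def summarise(rows):
    """Group (mutation_level, hit_rank) rows and compute detection rate
    and mean reciprocal rank per mutation level."""
    by_level = defaultdict(list)
    for level, rank in rows:
        by_level[level].append(rank)
    report = {}
    for level, ranks in by_level.items():
        hits = [r for r in ranks if r is not None]
        detection = len(hits) / len(ranks)          # misses count against detection
        mrr = sum(1 / r for r in hits) / len(ranks)  # misses contribute 0 to MRR
        report[level] = {"detection": detection, "mrr": mrr}
    return report

rows = [("M0", 1), ("M0", 1), ("M1", 2), ("M1", None)]
report = summarise(rows)
# M0: detection 1.0, MRR 1.0; M1: detection 0.5, MRR 0.25
```

Counting misses as zero in the MRR keeps the two metrics consistent: a level cannot score a high MRR while hiding a low detection rate.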
Sanctions Screening Built to Be Audited.
Early access is free and includes full screening functionality across all official sources, the complete review workflow, and audit-ready exports.