Entity Connections & Network Intelligence

The Intelligence Gap

Sanctions lists are flat. Corporate structures are graphs.

Ten official sanctions sources publish independent lists. The EU does not reference OFAC entries. OFAC does not cross-reference the UN. Each authority publishes its own data — with no links to other jurisdictions and no links to the corporate registries, offshore databases, or ownership records that reveal how designated actors structure their assets.

A sanctioned entity rarely operates alone. It has subsidiaries in foreign jurisdictions, officers who sit on the boards of other companies, shell companies at shared addresses, and intermediaries documented in leaked offshore data. None of this is visible from the sanctions list itself.

We close that gap by combining cross-list referencing with six external data sources, rationale text mining, and graph-based analysis of corporate structures.

Ten Connection Types

Every connection points to a specific data point. Nothing inferred.

Connections are detected from structured data, never guessed. Each type is a distinct evidence class with its own detection mechanism and confidence score.

Cross-List Connections

Shared Address: Two entities registered at the same normalised address across different sanctions programmes. Detected by comparing address fields across all entity cards after every import cycle.
Shared Identifier: The same passport number, LEI, IMO number, or tax ID appearing on separate list entries. This indicates either a genuine relationship or a data quality issue, and both are worth surfacing.
Shared Alias: A name appearing as an alias of one entity on one list and simultaneously as an alias of a different entity on another list. Cross-indexed across all aliases after every import.
Co-Listed Regulation: Two entities designated under the same EUR-Lex regulation amendment. Tracked via automated EUR-Lex amendment monitoring.
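
As an illustration of the cross-list checks above, the following minimal sketch groups entity cards by normalised identifier and emits one connection per pair of entries that share it. The card layout, field names, and normalisation rules are assumptions made for this example, not the production schema.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical entity cards; the field names are assumptions for illustration.
CARDS = [
    {"id": "eu-001", "source": "EU", "identifiers": {"imo": "IMO 9123456"}},
    {"id": "ofac-77", "source": "OFAC", "identifiers": {"imo": "9123456"}},
    {"id": "un-5", "source": "UN", "identifiers": {"passport": "AB 123-456"}},
    {"id": "eu-002", "source": "EU", "identifiers": {"passport": "ab123456"}},
    {"id": "eu-003", "source": "EU", "identifiers": {"lei": "5493001KJTIIGC8Y1R12"}},
]


def normalise_identifier(kind, value):
    """Uppercase and drop separators so formatting variants compare equal."""
    cleaned = "".join(ch for ch in value.upper() if ch.isalnum())
    if kind == "imo" and cleaned.startswith("IMO"):
        cleaned = cleaned[3:]  # "IMO 9123456" and "9123456" are the same number
    return cleaned


def shared_identifier_connections(cards):
    """Return one connection per pair of distinct cards sharing an identifier."""
    buckets = defaultdict(list)
    for card in cards:
        for kind, value in card["identifiers"].items():
            key = (kind, normalise_identifier(kind, value))
            buckets[key].append(card["id"])

    connections = []
    for (kind, value), card_ids in buckets.items():
        for a, b in combinations(sorted(set(card_ids)), 2):
            connections.append(
                {"type": "shared_identifier", "kind": kind,
                 "value": value, "entities": (a, b)}
            )
    return connections
```

The same bucketing pattern would apply to normalised addresses and aliases; only the normalisation function changes.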

Beneficial Ownership Connections

Offshore Link: A sanctioned entity matched in the ICIJ Offshore Leaks database (Panama Papers, Pandora Papers, Paradise Papers). Includes intermediary and connected entity relationships at 1–2 hop distance.
Corporate Owner: Ownership relationships from GLEIF Level 2, UK Companies House PSC data, and company registry records. Includes direct ownership, ultimate parent, and subsidiary chains.
Corporate Officer: A person identified as director, manager, or authorised signatory of a company linked to a sanctioned entity. Extracted from company registry officer records.

Text-Mined Connections

Rationale Mention: One sanctioned entity named in the listing rationale text of another entity — across sources. Detected by scanning 45,000+ listing texts and 5,500+ EUR-Lex annex texts with an alias-pattern matching engine built from 225,000 name patterns.
Family Member: Family relationships extracted from listing rationale text ("son of", "wife of", "brother of") and linked to the corresponding entity in the registry.
Explicit Associate: Business or political associations explicitly stated in listing rationale text ("associated with", "acting on behalf of", "controlled by").
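
To make the text-mined types more concrete, here is a rough sketch of how a relation phrase such as "son of" could be pulled from rationale text and linked to a registry entry. The regular expression, the registry lookup, and all names are hypothetical; a production extractor would need to handle far more phrasing and transliteration variants.

```python
import re

# Relation keyword (case-insensitive), then " of ", then a capitalised name.
RELATION = re.compile(
    r"\b((?i:son|daughter|wife|husband|brother|sister|father|mother)) of "
    r"([A-Z][\w'-]+(?: [A-Z][\w'-]+)*)"
)

# Hypothetical registry lookup: lowercased name -> entity id.
REGISTRY = {"ivan example": "ent-1"}


def family_connections(entity_id, rationale, registry):
    """Link relation phrases in one entity's rationale to known registry entries."""
    connections = []
    for match in RELATION.finditer(rationale):
        relation, name = match.group(1).lower(), match.group(2)
        target = registry.get(name.lower())
        if target is not None and target != entity_id:
            connections.append(
                {"type": "family_member", "relation": relation,
                 "source": entity_id, "target": target, "evidence": match.group(0)}
            )
    return connections
```

Names that do not resolve to a registry entry are dropped here, which keeps every connection tied to a specific data point.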

Six External Data Sources

Offshore leaks, corporate ownership records, company registries, and debarment lists

On top of cross-referencing the sanctions lists themselves, six external data sources map the corporate structures, ownership chains, and offshore networks around sanctioned entities.

Source Details

ICIJ Offshore Leaks (ODbL) — 2 million records from the Panama Papers, Pandora Papers, Paradise Papers, and Offshore Leaks. Matched at 95% precision after LLM-assisted review. Includes intermediary and officer relationships at 1–2 hop distance.
GLEIF Level 2 (Open Data) — 3 million Legal Entity Identifier relationship records. Maps parent–subsidiary chains, ultimate parent entities, and fund relationships. Updated weekly.
UK Companies House PSC (OGL) — Persons with Significant Control data for all UK-registered companies. Includes control type, share thresholds, and beneficial ownership chains. Updated weekly.
OffeneRegister.de (CC BY 4.0) — 5.3 million German companies and 4.8 million officer records from the Handelsregister. Analysed via graph mining (anchor matching, officer extraction, network expansion), not name matching. See case study below.
SAM.gov Exclusions (Public) — US federal debarment and exclusion records. Cross-referenced against sanctioned entities for dual-listing detection.
World Bank / ADB Debarment (Public) — Multilateral development bank debarment lists. Cross-referenced for entities sanctioned in multiple jurisdictions.

Three Detection Directions

Forward, inward, and reverse. Each catches relationships the others miss.

A single matching direction is not enough. Forward matching alone — querying external databases for each sanctioned entity — misses transliteration variants and does not scale to millions of records. We run three complementary directions to maximise coverage.

Forward: Registry → External

Each sanctioned entity's canonical name and aliases are queried against external databases via trigram similarity and full-text search, confirmed by biographical data — date of birth, jurisdiction, registration number. The traditional approach: precise but slow, and limited to the name forms in the registry.
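
A minimal sketch of the forward direction, assuming a plain-Python Jaccard similarity over character trigrams and a date-of-birth check as the biographical confirmation. The threshold, record layout, and function names are illustrative assumptions; a database-side trigram index would replace the loop in practice.

```python
def trigrams(name):
    """Character trigrams of a lowercased, space-padded name."""
    padded = f"  {name.lower().strip()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a, b):
    """Jaccard similarity of the two trigram sets, between 0.0 and 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def forward_match(entity, records, threshold=0.6):
    """Return external records whose name is similar AND whose DOB agrees."""
    matches = []
    for record in records:
        best = max(trigram_similarity(n, record["name"]) for n in entity["names"])
        if best >= threshold and record.get("dob") == entity["dob"]:
            matches.append((record["id"], round(best, 2)))
    return matches


# Hypothetical data for illustration only.
ENTITY = {"names": ["Ivan Petrov", "Iwan Petrow"], "dob": "1970-01-01"}
RECORDS = [
    {"id": "r1", "name": "Ivan Petrov", "dob": "1970-01-01"},
    {"id": "r2", "name": "Ivan Petrov", "dob": "1985-06-30"},  # same name, other person
    {"id": "r3", "name": "Maria Schmidt", "dob": "1970-01-01"},
]
```

The second record shows why the biographical check matters: a perfect name score alone is not treated as a match.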

Inward: Rationale Text → Entity Connections

An alias-pattern matching engine is built from 225,000 name patterns (all canonical names, aliases, and name variants of all entities). It scans 45,000+ listing rationale texts and 5,500+ EUR-Lex annex texts in under two minutes. Every mention of one entity in the rationale text of another entity creates a connection — including cross-source connections (e.g., an EU listing rationale that names an OFAC-designated entity).

That produces over 36,000 connections invisible to any approach that only compares names between lists, at over 95% precision — rationale text is structured enough that name mentions are almost always intentional references.

Reverse: External → Registry

The same engine is applied in the opposite direction: millions of external names are scanned for matches against the 161,000 entity name patterns. 5 million names from ICIJ, GLEIF, SAM, and debarment lists are processed in a single pass in under 90 seconds — runtime scales with text length, not pattern count.
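
The text does not name the matching algorithm. One well-known technique whose runtime depends on text length rather than pattern count is the Aho-Corasick automaton, sketched below purely as an assumption about how such an engine could work. Patterns, entity ids, and the word-boundary rule are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    """patterns: iterable of (pattern_text, entity_id). Returns (goto, fail, out)."""
    goto, fail, out = [{}], [0], [[]]
    for text, entity_id in patterns:
        text = text.lower()
        state = 0
        for ch in text:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append((entity_id, len(text)))

    # Breadth-first pass: compute failure links and merge inherited outputs.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def scan(name, automaton):
    """Return entity ids whose patterns occur in `name` on word boundaries."""
    goto, fail, out = automaton
    text = name.lower()
    state, found = 0, set()
    for pos, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for entity_id, length in out[state]:
            start = pos - length + 1
            left_ok = start == 0 or not text[start - 1].isalnum()
            right_ok = pos + 1 == len(text) or not text[pos + 1].isalnum()
            if left_ok and right_ok:
                found.add(entity_id)
    return found


# Hypothetical patterns for illustration.
AUTOMATON = build_automaton([
    ("example trading llc", "ent-1"),
    ("example trading", "ent-1"),
    ("orion", "ent-2"),
])
```

The automaton is built once from all patterns, so each external name is read a single time regardless of how many patterns are loaded.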

Four precision guards eliminate false positives: a pattern blocklist for common words, single-token rejection, an IDF floor (a name appearing in 500+ external records is not discriminative), and a token-overlap gate. The result: 14,000+ connections at 98% measured precision.
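
The four guards can be pictured as a chain of cheap checks applied to each candidate hit. The sketch below is an assumption-laden illustration: the blocklist entries are invented, and the token-overlap gate is interpreted here as the share of the external name's tokens covered by the pattern, which the text does not specify. Only the 500-record limit is taken from the text.

```python
# Hypothetical blocklist entries; the real list is not published in the text.
BLOCKLIST = {"global trading", "international group"}
IDF_MAX_RECORDS = 500  # from the text: 500+ external records is not discriminative


def passes_guards(pattern, matched_name, record_frequency, min_overlap=0.5):
    """Apply the four guards in order; any failure rejects the candidate."""
    pattern_tokens = pattern.lower().split()
    name_tokens = set(matched_name.lower().split())

    if " ".join(pattern_tokens) in BLOCKLIST:   # 1. pattern blocklist
        return False
    if len(pattern_tokens) < 2:                 # 2. single-token rejection
        return False
    if record_frequency >= IDF_MAX_RECORDS:     # 3. IDF floor
        return False
    if not name_tokens:
        return False
    # 4. token-overlap gate (assumed definition, see lead-in)
    overlap = len(set(pattern_tokens) & name_tokens) / len(name_tokens)
    return overlap >= min_overlap
```

Ordering the checks from cheapest to most expensive keeps the per-candidate cost low when millions of names are scanned.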

Case Study: German Company Registry

Why name matching fails on company registries — and what works instead

The OffeneRegister dataset holds 5.3 million German companies and 4.8 million officer records from the Handelsregister. The naive approach of matching sanctioned entity names against company names was tested and failed. A name-pattern reverse scan produced 5,099 candidates. An independent review by ten AI agents found one true positive among 299 sampled entries, a precision of roughly 0.3%.

The problem is structural: two-word patterns like "solar invest" or "cargo service" are generic German business vocabulary, not sanctions indicators. Name matching treats the registry as a list of names. But a company registry is not a list — it is a graph. The value is in the structure: which companies link to which officers, and which officers sit on the boards of other companies.

Phase 1 — Anchor Companies

Instead of name-similarity matching, the engine sets exact anchors: sanctioned organisations matched to Handelsregister companies by exact or near-exact name after legal-form stripping. "Gazprom" matches "Gazprom Germania GmbH", "Gazprom NGV Europe GmbH", and "Gazprom Schweiz AG". "VTB" matches "VTB Bank (Europe) SE". These are not probabilistic matches — they are known subsidiaries registered in the commercial register.

Quality filters eliminate generic matches: a blocklist of ~90 common German business names (Berg, Phoenix, Fortuna), minimum token-length requirements, and confidence tiers by match type. Over three iterations, each reviewed by a dedicated audit agent, the filters narrowed the anchor set from 3,996 anchors (10–15% precision) to 2,607 anchors at over 90% precision. These anchors map 812 sanctioned entities to 2,340 German companies.
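
A simplified sketch of anchor matching, assuming that "near-exact" means the sanctioned name's tokens form the leading tokens of the company name once legal forms are removed. The legal-form list, the three sample blocklist names, and the tier labels are illustrative; the actual rules are not spelled out in the text.

```python
import re

# Assumed subset of German and European legal-form tokens to strip.
LEGAL_FORMS = {"gmbh", "ag", "se", "kg", "ug", "ohg", "mbh", "co"}
# Three of the ~90 blocklisted generic names mentioned in the text.
GENERIC_NAMES = {"berg", "phoenix", "fortuna"}


def core_tokens(name):
    """Lowercase word tokens with legal-form tokens removed."""
    return [t for t in re.findall(r"\w+", name.lower()) if t not in LEGAL_FORMS]


def anchor_match(sanctioned_name, company_name, min_token_len=3):
    """Return 'exact', 'prefix', or None for a sanctioned name vs. a company."""
    s, c = core_tokens(sanctioned_name), core_tokens(company_name)
    if not s or " ".join(s) in GENERIC_NAMES:
        return None                      # generic business vocabulary
    if len(s) == 1 and len(s[0]) < min_token_len:
        return None                      # too short to be discriminative
    if s == c:
        return "exact"
    if c[:len(s)] == s:
        return "prefix"                  # e.g. "Gazprom" -> "Gazprom Germania GmbH"
    return None
```

Returning the match type rather than a boolean lets a later step assign a confidence tier per type.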

Phase 2 — Officer Extraction

For each anchor company: who is or was the managing director, board member, or authorised signatory? The dataset contains the full officer history for each company. 4,817 officer links were extracted from 1,607 companies, identifying 4,199 unique persons. 32% of these officers carry a "dismissed" flag — they were removed from their position at a specific date, which is intelligence in itself when compared against sanctions designation dates.

Phase 3 — Network Expansion

For each extracted officer: in which other companies does this person appear? This is the 1-hop expansion that reveals potential evasion structures, successor vehicles, and shell companies. 2,887 officer profiles were expanded (after filtering 59 professional administrators with 100+ company directorships — Treuhand noise, not signal), producing 16,608 exposure links to 11,389 unique companies via 1,618 officers.
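
The 1-hop expansion can be sketched as a join over officer records: collect every company per officer, skip officers above the directorship limit, and keep the companies that are not anchors themselves. Record layout and names are assumptions; only the 100-directorship limit comes from the text.

```python
from collections import defaultdict


def expand_one_hop(officer_records, anchor_companies, max_directorships=100):
    """officer_records: iterable of (officer, company) pairs.

    Returns (officer, company) exposure links for officers of anchor
    companies, excluding professional administrators.
    """
    companies_by_officer = defaultdict(set)
    for officer, company in officer_records:
        companies_by_officer[officer].add(company)

    exposure = []
    for officer, companies in companies_by_officer.items():
        if not companies & anchor_companies:
            continue  # officer has no link to an anchor company
        if len(companies) >= max_directorships:
            continue  # professional administrator: noise, not signal
        for company in sorted(companies - anchor_companies):
            exposure.append((officer, company))
    return exposure


# Hypothetical records; the limit is lowered to 5 to keep the example small.
RECORDS = (
    [("officer-a", "anchor-1"), ("officer-a", "co-x"), ("officer-a", "co-y")]
    + [("trustee", "anchor-1")] + [("trustee", f"t{i}") for i in range(5)]
    + [("officer-b", "co-z")]
)
```

Calling `expand_one_hop(RECORDS, {"anchor-1"}, max_directorships=5)` keeps the two links for `officer-a` and drops both the trustee and the unrelated officer.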

// Concrete example
Sanctioned Entity: ASCOTEC Holding GmbH (Iran procurement network)
  → Anchor: ASCOTEC Holding GmbH (Handelsregister HRB ...)
  → Officers: Ahmad Karami, Ahmad Katani, Ehsan Mojtahed
    → Also at: ASCOTEC Steel Trading GmbH
    → Also at: ASCOTEC Mineral & Machinery GmbH
    → Also at: Kara Industrial Trading GmbH
    → Also at: Breyeller Kaltband GmbH
// Result: 3 Hub Officers across 4 sanctioned entities + network companies

Phase 4 — Structural Signals

The final phase analyses the expanded network for structural patterns that indicate elevated risk:

Hub Officers (305 detected): A person sitting on the boards of companies linked to two or more distinct sanctioned entities. This is the strongest signal — a person bridging separate sanctions networks through corporate directorships.
Dismissed Officers (1,525 detected): Officers removed from their positions at anchor companies. The dismissal date compared against the sanctions designation date reveals whether the removal was a response to sanctions — or a pre-emptive restructuring.
High-Exposure Officers (571 detected): Persons with directorships at five or more companies beyond their anchor company. Indicates a professional network that could facilitate sanctions evasion through corporate complexity.
Verwaltungs-GmbH Patterns (2,920 detected): "GmbH & Co. KG" structures where the Komplementär-GmbH is a management shell. Common in legitimate German corporate structuring, but also used to obscure beneficial ownership.
Shared Address Clusters (1,604 detected): Multiple anchor-linked companies registered at the same address. Potential indicators of shell-company clusters or shared nominee offices.

Total: 6,925 structural signals, of which 2,029 classified as HIGH severity.
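
Two of the signals above lend themselves to a short sketch: hub officers (linked to two or more distinct sanctioned entities) and the timing of a dismissal relative to the designation date. The tuple layout, labels, and sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date


def hub_officers(officer_links, min_entities=2):
    """officer_links: iterable of (officer, anchor_company, sanctioned_entity).

    Returns officers whose anchor companies trace to at least
    `min_entities` distinct sanctioned entities.
    """
    entities = defaultdict(set)
    for officer, _company, entity in officer_links:
        entities[officer].add(entity)
    return {o: sorted(e) for o, e in entities.items() if len(e) >= min_entities}


def dismissal_timing(dismissed_on, designated_on):
    """Classify a dismissal relative to the sanctions designation date."""
    days = (dismissed_on - designated_on).days
    label = "before_designation" if days < 0 else "on_or_after_designation"
    return label, days


# Hypothetical links for illustration.
LINKS = [
    ("officer-a", "company-1", "ent-1"),
    ("officer-a", "company-2", "ent-2"),
    ("officer-b", "company-1", "ent-1"),
    ("officer-b", "company-3", "ent-1"),
]
```

In this sample only `officer-a` qualifies as a hub: `officer-b` holds two positions, but both trace back to the same sanctioned entity.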

What comes out

Not thousands of noisy name matches, but structured entity connections stored in the registry. Every anchor company, every extracted officer, and every 1-hop exposure company becomes a connection in the entity graph — with type, confidence score, evidence, and the concrete path that produced it.

These connections enrich the entity profiles visible in the network graph, feed the post-screening exposure detection, and are accessible via the REST API. A compliance officer reviewing a screening hit sees not just the sanctions match, but the corporate structure surrounding it: which German subsidiaries exist, who manages them, and where the officers also hold positions.

Interactive Network Graph

Explore entity relationships visually — filter, expand, export.

Every entity's connections are visualised as an interactive network graph. Nodes represent entities and associated actors. Edges represent connections with type-specific styling. The graph loads on demand and supports click-to-expand: clicking a node fetches its neighbours from the server, revealing the network incrementally.

Visualisation Details

Layout engine: Interactive graph viewer with force-directed and hierarchical layout algorithms. Automatic layout selection based on graph structure.
Node colours: Red for sanctioned entities, amber for offshore officers, violet for shell companies, cyan for intermediaries. PEP entities are marked separately.
Connection-type filters: Five toggle buttons (Family, Corporate, ICIJ, Shared, Rationale) filter the visible network in real time. Orphan nodes hidden automatically when their edges are filtered out.
Click-to-expand: Clicking any node loads its direct neighbours from the server, expanding the graph incrementally up to a 300-node hard cap. Rate-limited to prevent overloading.
Node and edge panels: Clicking a node shows entity details (type, DOB, nationality, sanctions status). Clicking an edge shows connection evidence, confidence score, and a compliance disclaimer by connection type.
CSV export: Export the visible network (filtered edges and connected nodes) as CSV for offline analysis.
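
The expand and filter behaviour described in the list above can be sketched independently of any graph library. This is a logic sketch only: `fetch_neighbours` is a hypothetical stand-in for the server call, rate limiting is omitted, and only the 300-node cap and the filter names are taken from the text.

```python
NODE_CAP = 300  # hard cap from the text


def expand_node(nodes, edges, node_id, fetch_neighbours, cap=NODE_CAP):
    """Add a node's neighbours to the visible graph without exceeding the cap."""
    nodes, edges = set(nodes), set(edges)
    for neighbour, edge_type in fetch_neighbours(node_id):
        if neighbour not in nodes:
            if len(nodes) >= cap:
                continue  # cap reached: skip new nodes, keep existing ones
            nodes.add(neighbour)
        edges.add((node_id, neighbour, edge_type))
    return nodes, edges


def filter_graph(nodes, edges, active_types):
    """Keep edges of active types and hide nodes left without any edge."""
    kept = {e for e in edges if e[2] in active_types}
    connected = {n for a, b, _ in kept for n in (a, b)}
    return nodes & connected, kept


def demo_fetch(node_id):
    """Hypothetical server response: (neighbour id, connection type) pairs."""
    return [("a", "Family"), ("b", "Corporate"), ("c", "ICIJ"), ("d", "Shared")]
```

With `cap=4`, expanding `"root"` via `demo_fetch` adds three neighbours and skips the fourth; filtering to `{"Family"}` then leaves only `root` and `a` visible.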

Network Screening

Post-screening exposure detection. Every screened name checked against 54,000 network names.

After the primary sanctions screening completes, a second pass checks every screened name against the Connection Name Index, a curated index of 54,000 names derived from entity connections. If a screened name matches a person or company that is connected to a sanctioned entity but is not itself sanctioned, the match is surfaced as network exposure context alongside the primary screening results.

Network exposure is context, not a sanctions hit. It means: "this name is not on a sanctions list, but it appears in the network of a sanctioned entity." The compliance officer decides what to do with that — the system makes sure they see it.
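
A minimal sketch of the second pass, assuming the Connection Name Index can be modelled as a dictionary from normalised name to connection records. The result layout and field names are invented for illustration. The point it shows is that exposure is attached as extra metadata while the primary score and hit flag are left unchanged.

```python
# Hypothetical index: normalised name -> connections to sanctioned entities.
CONNECTION_NAME_INDEX = {
    "ivan example": [{"connected_to": "ent-1", "type": "corporate_officer"}],
}


def normalise(name):
    """Lowercase and collapse whitespace."""
    return " ".join(name.lower().split())


def add_network_exposure(result, index):
    """Return a copy of a screening result with network exposure attached.

    The primary fields (score, hit) are copied unchanged.
    """
    enriched = dict(result)
    enriched["network_exposure"] = list(index.get(normalise(result["name"]), []))
    return enriched
```

A screened name with no index entry simply receives an empty `network_exposure` list, so downstream consumers can treat the field uniformly.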

Precision & Architecture

The Connection Name Index is rebuilt nightly from entity connections and reverse name-pattern scan results. Five precision guards ensure 98% measured precision (audited on 600 stratified samples with Wilson confidence intervals):

Self-reference filter: Names that are spelling variants of the sanctioned entity itself are excluded (normalised match + token subset + similarity score ≥ 90).
Short-word filter: Single words under 4 characters are excluded, unless they match known acronyms of sanctioned organisations (e.g., IRGC, NIOC).
Name-fragment filter: Single-word entries that are tokens of a sanctioned entity's name are excluded (prevents first-name fragments).
ORG-existence guard: Single-word entries must match a sanctioned organisation's canonical name or known variant. Eliminates vessel names and common first names.
Token-frequency cap: Names appearing in 500+ external records are excluded — if a name appears everywhere, it is not discriminative.
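
For readers unfamiliar with the Wilson interval mentioned in the audit description, the sketch below computes it for a single pooled sample. The count of 588 correct out of 600 is a hypothetical figure chosen only to be consistent with the stated 98%, and the stratification used in the actual audit is ignored here.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - margin, centre + margin


# Hypothetical audit outcome: 588 of 600 sampled connections judged correct.
LOWER, UPPER = wilson_interval(588, 600)
```

Under these assumptions the interval runs from roughly 0.965 to 0.989. Unlike the simple normal approximation, the Wilson interval stays inside [0, 1] and behaves sensibly for proportions close to 100%.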

Network exposure is score-neutral: it adds metadata to screening results without touching the screening score or threshold. The core sanctions screening stays untouched and independently verifiable.

≈118,000+ Entity Connections

54,000 Network Names Indexed

98% Measured Precision

6 External Data Sources