The Global Synthetic Dataset

The Global Synthetic Dataset is the largest publicly available individual-level dataset on human trafficking. It is made possible by innovative technology that protects the safety and privacy of victims and survivors. Developed in partnership with Microsoft Research, the dataset provides in-depth information to accelerate evidence-based policy in the fight against human trafficking.

The Global Synthetic Dataset provides critical information on the socio-demographic profile of victims, types of exploitation, and the trafficking process, including means of control used on victims. This data, updated in 2024, represents 20 years of assistance and hotline data – with substantial contributions from IOM and Polaris, as well as contributions from A21, RecollectiV, and the Portuguese Observatory on Trafficking in Human Beings (OTSH).

This dataset represents over 206,000 victims and survivors of trafficking identified across 190 countries and territories from 2002 to 2022. Note that some attribute values (e.g., countries) are suppressed because they are highly sensitive and cannot be adequately protected. A suppressed value does not mean that no trafficking cases were recorded.
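As a minimal sketch of what the note on suppression means in practice, the Python below counts values in one column while tracking suppressed cells separately, so they are not mistaken for zero cases. The sample rows, the column names (`yearOfRegistration`, `gender`, `citizenship`), the placeholder codes `XX` and `YY`, and the assumption that suppressed values appear as empty cells are all hypothetical and used only for illustration; consult the codebook for the actual encoding.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; column names and codes are illustrative assumptions.
SAMPLE_CSV = """yearOfRegistration,gender,citizenship
2019,Woman,
2019,Man,XX
2020,Woman,XX
2020,,YY
"""


def count_values(csv_text, column):
    """Count non-empty values in `column` and report empty cells separately.

    Assumes (for illustration) that a suppressed value appears as an empty cell.
    """
    known = Counter()
    suppressed = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(column) or "").strip()
        if value:
            known[value] += 1
        else:
            suppressed += 1
    return known, suppressed


known, suppressed = count_values(SAMPLE_CSV, "citizenship")
print(dict(known), "suppressed:", suppressed)
```

Reporting the suppressed count alongside the known values keeps the distinction visible: a country missing from the tally may have been withheld rather than absent from the case records.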

This is the third synthetic dataset derived from the case records of victims of trafficking. It accurately preserves the statistical properties of the original case records while providing the guarantee of differential privacy. Differential privacy was first developed at Microsoft Research in 2006, and today represents the gold standard in privacy protection. The differential privacy approach to synthetic data generation provides quantifiable privacy guarantees against any privacy attack, even across multiple data releases. The technology has enabled CTDC to share more data and conduct more robust research while protecting privacy and civil liberties.

More information on the approach is available through the open-source software and documentation on differential privacy via GitHub.

This data release and supporting technology were made possible by the Tech Against Trafficking 2019 Accelerator Program, in which IOM worked with Microsoft, Amazon, BT, Salesforce, and the broader community to advance the data and technology foundations of the CTDC platform.

Please find the dataset, codebook, and data dictionary below. We encourage you to check out the FAQs page for more information about the data.

Global Synthetic Data Dashboard

This dashboard is generated using the Global Synthetic Dataset. It represents over 206K trafficked persons assisted by CTDC partners from 2002 to 2022. The Global Synthetic Dataset protects trafficked persons via differential privacy (ε=12, δ=3.9 x 10^-7).

Click around and explore! Download the data and read the codebook to learn more. 

Data and Resources

The Global Synthetic Dataset

This is the global synthetic dataset. It represents over 206,000 victims and survivors of trafficking identified by CTDC partners from 190 countries and territories between 2002 and 2022. The dataset protects trafficked persons via differential privacy with ε=12, δ=3.9 x 10^-7. The dataset is generated with Synthetic Data Showcase developed at Microsoft Research.
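To make the stated parameters concrete, the sketch below applies the standard (ε, δ) definition of differential privacy: for any two datasets that differ in one person's record, and any set of outputs S, Pr[M(D) in S] ≤ e^ε · Pr[M(D') in S] + δ. The function name `dp_upper_bound` and the example probabilities are illustrative assumptions, not part of the CTDC release or of Synthetic Data Showcase.

```python
import math

# Privacy parameters as stated for the Global Synthetic Dataset.
EPSILON = 12.0
DELTA = 3.9e-7


def dp_upper_bound(p_without, epsilon=EPSILON, delta=DELTA):
    """Upper bound on Pr[M(D) in S] when Pr[M(D') in S] = p_without.

    D and D' differ in one person's record. The result is capped at 1
    because it is a probability.
    """
    return min(1.0, math.exp(epsilon) * p_without + delta)


# Multiplicative factor e^epsilon in the bound.
print(f"e^epsilon = {math.exp(EPSILON):.2f}")

# Illustrative probabilities only; they are not taken from the dataset.
for p in (0.0, 1e-9, 0.5):
    print(p, dp_upper_bound(p))
```

The bound is a worst-case ceiling on how much one person's record can change the probability of any output; it does not describe how often that ceiling is reached.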
csv
Modified: 2024-02-27
Release Date: 2021-08-19
Identifier: 66ec189c-93ea-4eb6-a376-7351f4762cd8
License: IOM Terms of Use
Public Access Level: Public