182 changes: 131 additions & 51 deletions README.md
@@ -1,77 +1,157 @@
# Project overview
...

# Installation
# 🧠 Human Trafficking Data Intelligence: *Modern Day Slavery Still Exists*

1. **Clone the repository**:
> **Human trafficking is not history — it is today’s silent crisis.**
> This project brings data to the frontlines of one of the world’s most pressing human rights challenges.

```bash
git clone https://github.com/YourUsername/repository_name.git
```
---

2. **Install UV**
## 🌍 Project Title: International Trafficking Victim Analytics & Intelligence

If you're a MacOS/Linux user type:
### 📣 Executive Summary

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
This project uses data analytics to address the global issue of modern-day slavery. Human trafficking remains widespread, and many victims go unidentified and unsupported. By analyzing international data, we aim to uncover patterns, highlight the countries with the most reported cases of human trafficking, and support data-based decisions by governments and NGOs working to stop these crimes and help survivors.

If you're a Windows user, open an Anaconda Powershell Prompt and type:
Our project is framed as a **public threat case**, with three goals:
- **Understand the scope and distribution** of trafficking offenses.
- **Uncover patterns in victim demographics** to inform specialized support.
- **Help governments and NGOs** strategically allocate resources and establish victim recovery centers.

```bash
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```
By analyzing reported geographic and demographic data on this issue, we seek to build insights that raise awareness and support real-world intervention.

3. **Create an environment**
---

```bash
uv venv
```
## 🔍 Hypothesis & Research Questions

3. **Activate the environment**
### Hypothesis
With the right data, enforcement agencies can identify critical hotspots and demographic groups in need of immediate support.

If you're a MacOS/Linux user type (if you're using a bash shell):
Insights:

```bash
source .venv/bin/activate
```
> Determine the concentration of reported victims and the types of trafficking to help governments make data-based decisions to fight these crimes.

If you're a MacOS/Linux user type (if you're using a csh/tcsh shell):
> Advise NGOs on which countries need new support centers to help victims.

```bash
source .venv/bin/activate.csh
```

If you're a Windows user type:
### Key Questions
- Which countries have the highest number of reported offenses and need prioritized support from NGOs?
- What are the gender and age demographics of victims?

```bash
.venv\Scripts\activate
```
---

4. **Install dependencies**:
## 🧾 Dataset Description

```bash
uv pip install -r requirements.txt
```
The dataset, comprising **39,485 rows** and **10 columns**, originates from the United Nations Office on Drugs and Crime (UNODC) and covers two decades of multilevel data across **regions**, **subregions**, **countries**, **demographics**, **victims**, and **offenders**.

# Questions
...
### Features Breakdown:

# Dataset
...
| Column | Description |
|------------------|-----------------------------------------------------------------------------|
| `country` | Country where the incident occurred or victim was detected |
| `region` | High-level regional classification (e.g., Asia, Africa) |
| `subregion` | Subdivided region classification |
| `indicator` | Classification of the record (e.g., offense or victim repatriation) |
| `dimension` | Reporting dimension (e.g., country of detection, repatriation) |
| `category` | Victim source category or trafficking type |
| `sex` | Gender of the victim (if reported) |
| `age` | Age group (minor or adult) (if reported) |
| `year` | Year the offense or detection was reported |
| `nr_of_victims` | Number of victims (cleaned and converted to numeric for analysis) |

## Main dataset issues
> Note: Data cleaning was applied to standardize victim counts and handle anonymized entries (e.g., "<5" was replaced with the mean value 2.5).
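
The cleaning rule in the note above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code: the helper name `parse_victim_count`, the sample cells, and the handling of thousands separators and blank cells are all assumptions made for the example.

```python
from typing import Optional

SUPPRESSED_MARKER = "<5"  # anonymized small count, as described in the note
SUPPRESSED_VALUE = 2.5    # replacement value stated in the note


def parse_victim_count(raw: object) -> Optional[float]:
    """Convert one raw victim-count cell to a float, or None if unusable.

    Hypothetical helper: the real cleaning step may differ.
    """
    if raw is None:
        return None
    text = str(raw).strip()
    if text == SUPPRESSED_MARKER:
        return SUPPRESSED_VALUE
    try:
        # Assumption: some cells may carry thousands separators such as "1,204".
        return float(text.replace(",", ""))
    except ValueError:
        # Blank or non-numeric cells are treated as missing.
        return None


# Made-up sample cells, not taken from the dataset.
raw_cells = ["12", "<5", " 7 ", "1,204", "", None]
cleaned = [parse_victim_count(cell) for cell in raw_cells]
print(cleaned)
```

With pandas, the same helper could be applied column-wise, for example `df["nr_of_victims"].map(parse_victim_count)`, assuming the raw column is loaded as text.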

- ...
- ...
- ...
### Dataset obstacles:

## Solutions for the dataset issues
...
In analyzing datasets related to illicit activities, a major challenge is incomplete reporting, leading to missing values. This lack of data, often due to underreporting or the secretive nature of these activities, complicates data processing and analysis. Such gaps can undermine the accuracy of analytical models, requiring techniques like data imputation to mitigate the impact and enhance analysis reliability.

# Conclusions
...
---

## 🧱 Entity Relationship Model and Diagram

The database schema is relationally structured to support multi-layered analysis across geography and time.

### Core Tables:

- **Region (region_id, region_name)**
- **Subregion (subregion_id, subregion_name, region_id)**
- **Country (country_id, country_name, subregion_id)**
- **Victim (victim_id, sex, age)**
- **Offense (offense_id, year, dimension, category, nr_of_victims, country_id, victim_id)**

### Cardinality Logic:

- Each `Offense` may involve one or more `Victims`.
- Each `Country` has multiple `Offenses`.
- A `Region` contains multiple `Subregions`, which contain multiple `Countries`.


![ER Model](first_project/slides/ERM.png)
![ER Diagram](first_project/slides/ERD.png)

This normalized schema allows efficient filtering and joins across geography, victim profiles, and offense dimensions.
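
As a rough sketch of how such joins could look, the example below builds the five core tables in an in-memory SQLite database and totals victims per country. The project itself uses MySQL; SQLite is swapped in here only so the example is self-contained. The column types, constraints, and all inserted rows are made-up placeholders, not taken from the dataset or the project's schema file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table and column names follow the "Core Tables" list; types are assumptions.
conn.executescript("""
CREATE TABLE region (region_id INTEGER PRIMARY KEY, region_name TEXT NOT NULL);
CREATE TABLE subregion (
    subregion_id INTEGER PRIMARY KEY,
    subregion_name TEXT NOT NULL,
    region_id INTEGER NOT NULL REFERENCES region(region_id)
);
CREATE TABLE country (
    country_id INTEGER PRIMARY KEY,
    country_name TEXT NOT NULL,
    subregion_id INTEGER NOT NULL REFERENCES subregion(subregion_id)
);
CREATE TABLE victim (victim_id INTEGER PRIMARY KEY, sex TEXT, age TEXT);
CREATE TABLE offense (
    offense_id INTEGER PRIMARY KEY,
    year INTEGER,
    dimension TEXT,
    category TEXT,
    nr_of_victims REAL,
    country_id INTEGER NOT NULL REFERENCES country(country_id),
    victim_id INTEGER REFERENCES victim(victim_id)
);
""")

# Placeholder rows for illustration only.
conn.execute("INSERT INTO region VALUES (1, 'Region A')")
conn.execute("INSERT INTO subregion VALUES (1, 'Subregion A1', 1)")
conn.executemany(
    "INSERT INTO country VALUES (?, ?, ?)",
    [(1, "Country X", 1), (2, "Country Y", 1)],
)
conn.execute("INSERT INTO victim VALUES (1, 'Female', 'Adult')")
conn.executemany(
    "INSERT INTO offense VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 2019, "dim", "cat", 10.0, 1, 1),
        (2, 2020, "dim", "cat", 2.5, 1, 1),
        (3, 2020, "dim", "cat", 4.0, 2, 1),
    ],
)

# Walk the geography hierarchy and total reported victims per country.
totals = conn.execute("""
SELECT r.region_name, c.country_name, SUM(o.nr_of_victims) AS total_victims
FROM offense AS o
JOIN country AS c ON c.country_id = o.country_id
JOIN subregion AS s ON s.subregion_id = c.subregion_id
JOIN region AS r ON r.region_id = s.region_id
GROUP BY r.region_name, c.country_name
ORDER BY total_victims DESC
""").fetchall()
print(totals)
```

The same `SELECT` should carry over to MySQL largely unchanged, since it relies only on standard joins and aggregation.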

---

## 📊 Exploratory Data Analysis (EDA)

- **Temporal Trends**: Year-over-year trends in reported victims
- **Geospatial Mapping**: Number of trafficking victims per reporting country
- **Victim Profiling**: Grouping victims by age, sex, and type of exploitation
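
The first two aggregations listed above can be sketched in a few lines. This is an illustrative example using only the standard library so it runs anywhere. The project's notebooks use pandas, where the yearly total would be something like `df.groupby("year")["nr_of_victims"].sum()`. The `total_by` helper and the sample records are hypothetical.

```python
from collections import defaultdict

# Made-up records shaped like the cleaned dataset (subset of columns).
records = [
    {"country": "Country X", "year": 2019, "nr_of_victims": 10.0},
    {"country": "Country X", "year": 2020, "nr_of_victims": 2.5},
    {"country": "Country Y", "year": 2020, "nr_of_victims": 4.0},
]


def total_by(rows, key):
    """Sum nr_of_victims over rows, grouped by the given column name."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["nr_of_victims"]
    return dict(totals)


# Temporal trend: total reported victims per year.
victims_per_year = total_by(records, "year")

# Geographic view: countries ranked by total reported victims.
victims_per_country = total_by(records, "country")
ranking = sorted(victims_per_country.items(), key=lambda item: item[1], reverse=True)

print(victims_per_year)
print(ranking)
```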

---

## 💻 Technologies Used

| Area | Tools/Technologies |
|----------------------|---------------------------------------------------------|
| Data Manipulation | Python (Pandas, NumPy) |
| Data Visualization   | Matplotlib (pyplot), Seaborn                            |
| Database Modeling | MySQL Workbench, Miro, Lucid |
| Documentation | Jupyter Notebook, Markdown, GitHub, Visual Studio Code |
| Version Control | Git, GitHub, Anaconda Powershell |

---

## 📦 Deliverables

- ✅ [Repository "first_project" on GitHub](https://github.com/mari21041/first_project)
- ✅ [Raw dataset](https://view.officeapps.live.com/op/view.aspx?src=https%3A%2F%2Fdataunodc.un.org%2Fsites%2Fdataunodc.un.org%2Ffiles%2Fdata_glotip.xlsx&wdOrigin=BROWSELINK)
- ✅ Jupyter Notebook with cleaned and documented dataset (`load_clean_data.csv`)
- ✅ ERM and ERD schemas with relationship logic
- ✅ Jupyter Notebook with EDA and visualizations
- ✅ MySQL file with database and queries
- ✅ README documentation
- ✅ [Final presentation report](https://docs.google.com/presentation/d/1ZxcF3VxB39Q2w0D33H5HTTdKfI78sPBHEtSYHJ0Mm8I/edit?usp=sharing)

---

## 👨‍💼 Target Audience

- **Policy Makers**: Use insights to influence anti-trafficking strategies
- **NGOs**: Identify where to open support centers
- **Researchers**: Access a clean dataset for further academic work

---

## 🛠️ Future Work

- Further research could add perpetrator information (from an additional dataset) to build an overall view of both victims and perpetrators.
- The analysis could be deepened to cover the specific trafficking dynamics in each country, supporting more focused enforcement efforts.
- Explore whether predictive models could fill in the "Unknown" values in the reports.

---

## 👥 Contributors

- Hipolito Marin
- Marianne Filbig
- Delmar Bumanglag
- Egbe Grace

---

## 🌐 Call to Action

Human trafficking is real, widespread, and preventable. Data-driven insights must be used to take concrete steps.
📢 *Share this repository, contribute to awareness, and help make a difference.*

# Next steps
...