AML Transaction Detection

Exploratory data analysis and machine-learning pipeline for the IBM Transactions for Anti-Money Laundering (AML) synthetic dataset.

Dataset: https://www.kaggle.com/datasets/ealtman2019/ibm-transactions-for-anti-money-laundering-aml

Quick start

# 1. Install uv (if not present)
curl -LsSf https://astral.sh/uv/install.sh | sh

# 2. Create virtual environment and install dependencies
uv sync

# 3. Download dataset (requires Kaggle API token at ~/.kaggle/kaggle.json)
uv run python -c "
import kaggle
kaggle.api.authenticate()
kaggle.api.dataset_download_files(
    'ealtman2019/ibm-transactions-for-anti-money-laundering-aml',
    path='data/raw', unzip=True
)
"

# 4. Launch Jupyter
uv run jupyter lab

Dataset

The benchmark family includes multiple HI / LI and Small / Medium / Large splits, but this repo currently analyses only the HI-Large split.

Notebook transaction input: data/raw/HI-Large_Trans.csv
Notebook patterns input: data/raw/HI-Large_Patterns.txt
Other splits and companion files are not used by notebooks/01_eda.ipynb

In the broader benchmark:

HI = higher-illicit-ratio split, with roughly 0.1% laundering-labelled transactions (about 1 in 800 to 1,000)
LI = lower-illicit-ratio split, with roughly 0.05% laundering-labelled transactions (about 1 in 1,750 to 1,950)
Small / Medium / Large refer to dataset size, not a different schema

Group	Size	Approx illicit ratio
HI	Small	~0.1 %
HI	Medium	~0.1 %
HI	Large	~0.1 %
LI	Small	~0.05 %
LI	Medium	~0.05 %
LI	Large	~0.05 %
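
The illicit ratio of any split can be checked directly against its *_Trans.csv file. The sketch below is one way to do that with only the standard library, streaming the file row by row so that a large split does not need to fit in memory. The function name `illicit_ratio` is hypothetical and not part of this repo; the label column name is taken from the Columns section.

```python
import csv
from typing import Iterable


def illicit_ratio(lines: Iterable[str], label_column: str = "Is Laundering") -> float:
    """Return the fraction of data rows whose label column equals "1".

    `lines` can be an open text file or any iterable of CSV lines.
    """
    reader = csv.reader(lines)
    header = next(reader)
    # Look the label up by name so its position is not hard-coded.
    label_idx = header.index(label_column)

    total = 0
    illicit = 0
    for row in reader:
        if not row:  # skip blank lines
            continue
        total += 1
        if row[label_idx].strip() == "1":
            illicit += 1

    if total == 0:
        raise ValueError("no data rows found")
    return illicit / total


# Tiny in-memory sample (made-up rows) to show the call shape.
sample = ["Timestamp,Is Laundering", "t1,0", "t2,1", "t3,0", "t4,0"]
print(f"{illicit_ratio(sample):.2%}")
```

Against the real data, the same function can be called on an open file handle, for example `open("data/raw/HI-Large_Trans.csv", newline="")`.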

Each split ships with three companion files:

*_Trans.csv = transaction-level records
*_accounts.csv = account metadata
*_Patterns.txt = ground-truth laundering pattern blocks
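
As a rough illustration of how the pattern blocks in *_Patterns.txt might be read, here is a minimal parser sketch. It assumes each block is delimited by lines starting with `BEGIN LAUNDERING ATTEMPT - <KIND>` and `END LAUNDERING ATTEMPT`, with comma-separated transaction rows in between; verify this against the actual file before relying on it. `PatternBlock` and `parse_patterns` are hypothetical names, not helpers from this repo.

```python
from dataclasses import dataclass, field
from typing import Iterable, List

# Assumed block delimiters; check them against the real *_Patterns.txt file.
BEGIN = "BEGIN LAUNDERING ATTEMPT - "
END = "END LAUNDERING ATTEMPT"


@dataclass
class PatternBlock:
    kind: str    # pattern name from the BEGIN line, e.g. "FAN-OUT"
    detail: str  # free text after the colon on the BEGIN line, if any
    rows: List[List[str]] = field(default_factory=list)  # raw transaction fields


def parse_patterns(lines: Iterable[str]) -> List[PatternBlock]:
    """Group the transaction rows of a patterns file into labelled blocks."""
    blocks: List[PatternBlock] = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(BEGIN):
            kind, _, detail = line[len(BEGIN):].partition(":")
            current = PatternBlock(kind.strip(), detail.strip())
        elif line.startswith(END):
            if current is not None:
                blocks.append(current)
            current = None
        elif current is not None:
            # Rows inside a block are kept as raw string fields.
            current.rows.append(line.split(","))
    return blocks
```

With the real file, `parse_patterns(open("data/raw/HI-Large_Patterns.txt"))` would return one `PatternBlock` per laundering attempt, which can then be counted by `kind`.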

Columns

Column	Description
Timestamp	Date-time of the transaction
From Bank	Originating bank ID
Account (from)	Originating account ID
To Bank	Receiving bank ID
Account (to)	Receiving account ID
Amount Received	Amount received (in Receiving Currency)
Receiving Currency	ISO currency code at destination
Amount Paid	Amount paid (in Payment Currency)
Payment Currency	ISO currency code at source
Payment Format	Wire, Cheque, Credit Card, ACH, etc.
Is Laundering	Binary label – 1 = laundering, 0 = legitimate
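
The column list above maps naturally onto a typed record. The sketch below reads rows by position rather than by header name, on the assumption that the raw CSV header repeats the name "Account" for both the sending and receiving account. It also assumes timestamps look like `2022/09/01 00:20`. Both assumptions should be checked against the downloaded file. `Transaction` and `read_transactions` are hypothetical names.

```python
import csv
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from typing import Iterable, Iterator

TIMESTAMP_FORMAT = "%Y/%m/%d %H:%M"  # assumption; adjust if the file differs


@dataclass(frozen=True)
class Transaction:
    timestamp: datetime
    from_bank: str
    from_account: str
    to_bank: str
    to_account: str
    amount_received: Decimal
    receiving_currency: str
    amount_paid: Decimal
    payment_currency: str
    payment_format: str
    is_laundering: bool


def read_transactions(lines: Iterable[str]) -> Iterator[Transaction]:
    """Yield one Transaction per data row, reading the 11 columns by position."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    for row in reader:
        if not row:
            continue
        (ts, from_bank, from_acct, to_bank, to_acct,
         amt_recv, recv_cur, amt_paid, pay_cur, fmt, label) = row
        yield Transaction(
            timestamp=datetime.strptime(ts, TIMESTAMP_FORMAT),
            from_bank=from_bank,      # IDs stay strings to preserve leading zeros
            from_account=from_acct,
            to_bank=to_bank,
            to_account=to_acct,
            amount_received=Decimal(amt_recv),
            receiving_currency=recv_cur,
            amount_paid=Decimal(amt_paid),
            payment_currency=pay_cur,
            payment_format=fmt,
            is_laundering=(label.strip() == "1"),
        )
```

Bank and account IDs are kept as strings and amounts as `Decimal` so that no precision or leading zeros are lost before any feature engineering.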

Laundering patterns

Fan-out – One account rapidly sends to many recipients
Fan-in – Many accounts consolidate to one
Bipartite – Many-to-many transfer block between sender and receiver sets
Cycle – Money circulates through a closed loop of accounts
Gather-Scatter – Aggregate then disperse through layering
Scatter-Gather – Disperse then re-aggregate
Stack – Layered pass-through chains
Random – Irregular mixing pattern
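
Several of these patterns have simple graph signatures when accounts are treated as nodes and transactions as directed edges. The sketch below is a deliberately simplified illustration, not the method used in the notebook: it flags fan-out and fan-in hubs by counting distinct counterparties, and finds one directed cycle with a depth-first search. It ignores timestamps and amounts, so "rapidly" and the ordering of transfers within a cycle are not modelled. `degree_hubs`, `find_cycle`, and `min_degree` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Edge = Tuple[str, str]  # (sender account, receiver account)


def degree_hubs(edges: Iterable[Edge], min_degree: int) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (fan_out, fan_in): accounts with at least `min_degree` distinct counterparties."""
    receivers = defaultdict(set)
    senders = defaultdict(set)
    for src, dst in edges:
        receivers[src].add(dst)
        senders[dst].add(src)
    fan_out = {a: len(r) for a, r in receivers.items() if len(r) >= min_degree}
    fan_in = {a: len(s) for a, s in senders.items() if len(s) >= min_degree}
    return fan_out, fan_in


def find_cycle(edges: Iterable[Edge]) -> Optional[List[str]]:
    """Return the accounts of one directed cycle, or None if the graph is acyclic."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = defaultdict(int)
    for start in list(graph):
        if color[start] != WHITE:
            continue
        color[start] = GREY
        path = [start]
        iters = [iter(graph[start])]
        while iters:
            for nxt in iters[-1]:
                if color[nxt] == GREY:
                    # Back edge: the cycle is the path from nxt onwards.
                    return path[path.index(nxt):]
                if color[nxt] == WHITE:
                    color[nxt] = GREY
                    path.append(nxt)
                    iters.append(iter(graph.get(nxt, ())))
                    break
            else:
                # All neighbours explored: backtrack.
                color[path.pop()] = BLACK
                iters.pop()
    return None


# Made-up edges: A fans out to three accounts, and B -> E -> F -> B is a cycle.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "E"), ("E", "F"), ("F", "B")]
print(degree_hubs(edges, min_degree=3))
print(find_cycle(edges))
```

On real data the edge list would come from the sender and receiver account columns, ideally restricted to a time window so that degree counts reflect bursts rather than an account's whole history.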

Example

The structuring / smurfing view from the EDA is one of the clearest visuals in the notebook:

Project layout

aml-transaction-detection/
├── data/
│   ├── raw/          <- original Kaggle CSVs
│   └── processed/    <- parquet caches
├── notebooks/
│   └── 01_eda.ipynb  <- main EDA notebook
├── src/              <- helper modules (future)
├── pyproject.toml
└── README.md
