Welcome to the embedding-to-individual-id project. This repository contains a deep learning pipeline for identifying individual birds from embeddings extracted with BirdNET and Google Perch. It supports training and evaluation of FCNNs (Fully Connected Neural Networks) and RNNs (Recurrent Neural Networks), specifically LSTMs.
- Modular Architecture: Easy to adapt to new species or embedding types.
- Multi-Model Support: Compare performance between FCNN and LSTM.
- Streamlined Workflow: From raw embeddings to performance metrics and confusion matrices.
Install Miniconda or Anaconda.
Clone the repository and navigate to the project directory:
git clone xxxxx
cd xxxxx
Create the environment using the provided YAML file:
conda env create -f environment.yml
conda activate embtoindv
This repository requires embeddings as input. You can generate them using the [Extraction Repository] or download the pre-extracted datasets:
Access the pre-extracted embeddings here:
Organization: Place all embeddings inside the Output_files folder in the project root, separating them by window duration (3s or 5s):
Output_files/
├── Embeddings_from_3sPadding/
│   ├── <dataset_name>_parquet_parts/   # .parquet files (e.g., littleowl_parquet_parts)
│   └── ...
└── Embeddings_from_5sPadding/
    ├── <dataset_name>_parquet_parts/   # .parquet files (e.g., littleowl_parquet_parts)
    └── ...
Metadata files are not included in this repository. Access them through:
Paste the files into the Output_metadata folder following this structure:
Output_metadata/
├── GreatTit_metadata/
│   ├── final_greatTit_metadata.csv
│   └── ... (train, test, val csv files)
├── chiffchaff-fg/
├── KiwiTrimmed/
├── littleowl-fg/
├── littlepenguin_metadata/
├── pipit-fg/
└── rtbc_metadata/
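The split CSVs for one species can then be picked up by filename. A minimal sketch assuming pandas and the folder layout above; the helper name and the split-name matching are illustrative, not guaranteed by the metadata files:

```python
from pathlib import Path

import pandas as pd


def load_splits(metadata_dir):
    """Load train/val/test metadata CSVs for one species, keyed by split name."""
    metadata_dir = Path(metadata_dir)
    splits = {}
    for split in ("train", "val", "test"):
        matches = sorted(metadata_dir.glob(f"*{split}*.csv"))
        if matches:
            splits[split] = pd.read_csv(matches[0])
    return splits


# Example (path is illustrative):
# splits = load_splits("Output_metadata/GreatTit_metadata")
```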
The notebooks are categorized by architecture and dataset:
- FCNN Models: Found in Notebooks/. High-performance classification using 1024-D (BirdNET) or 1280-D (Perch) features.
- Pooling Strategies: Notebooks with _pooling in the name. They explore global average or maximum pooling across multiple temporal segments.
- LSTM Models: Specialized notebooks for capturing temporal dependencies within vocalization sequences.
- Study Cases: Experiments for both across-year and within-year temporal constraints across all target species.
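The two pooling strategies reduce a variable number of temporal segments to one fixed-length vector per vocalization. A minimal NumPy sketch (the segment count is illustrative):

```python
import numpy as np

# One vocalization: 4 temporal segments, each a 1024-D BirdNET embedding.
segments = np.random.rand(4, 1024)

avg_pooled = segments.mean(axis=0)  # global average pooling -> (1024,)
max_pooled = segments.max(axis=0)   # global max pooling     -> (1024,)
```

Either pooled vector can then be fed to the FCNN classifiers exactly like a single-segment embedding.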
- Verify Data: Ensure .parquet files and .csv metadata are in their respective Output_ folders.
- Select a Model: Open a notebook from the Notebooks/ directory (e.g., Notebooks/chiffchaff_withinyear_LSTM_birdnet.ipynb).
- Configure Paths: Verify the data paths in the initial cells match your local directory structure.
- Train and Evaluate: Execute all cells. The notebooks will output accuracy metrics, loss curves, and confusion matrices for the test split.
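For reference, the accuracy and confusion-matrix computation in the evaluation cells boils down to something like this NumPy sketch (the notebooks may use scikit-learn instead; the labels below are illustrative):

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows = true individual, columns = predicted individual."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


y_true = np.array([0, 0, 1, 2, 2])
y_pred = np.array([0, 1, 1, 2, 2])
cm = confusion_matrix(y_true, y_pred, n_classes=3)
accuracy = np.trace(cm) / cm.sum()  # 4 of 5 correct -> 0.8
```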
- Input: BirdNET v2.4 (1024-D) or Google Perch (1280-D) embeddings.
- FCNN: Dense layers with Dropout and Batch Normalization.
- RNN: LSTM-based architecture for sequence processing.
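As a rough illustration of the FCNN side, one dense block (Dense → Batch Normalization → ReLU, with Dropout active only during training) can be sketched in NumPy. All layer sizes, the 0.5 dropout rate, and the function itself are illustrative, not the repository's actual hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)


def dense_block(x, W, b, gamma, beta, train=False, p_drop=0.5, eps=1e-5):
    """Dense -> BatchNorm (batch statistics) -> ReLU -> Dropout."""
    h = x @ W + b
    mu, var = h.mean(axis=0), h.var(axis=0)
    h = gamma * (h - mu) / np.sqrt(var + eps) + beta  # batch normalization
    h = np.maximum(h, 0.0)                            # ReLU
    if train:                                         # inverted dropout
        mask = rng.random(h.shape) > p_drop
        h = h * mask / (1.0 - p_drop)
    return h


# 1024-D BirdNET embeddings -> 256-D hidden layer (sizes illustrative)
x = rng.standard_normal((8, 1024))
W = rng.standard_normal((1024, 256)) * 0.01
out = dense_block(x, W, np.zeros(256), np.ones(256), np.zeros(256))
```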
Note: This repository focuses on the classification and inference stage. For audio preprocessing and raw embedding extraction, please refer to the Extraction Repository.
We welcome contributions!
This project is licensed under the MIT License.