Merged
5 changes: 4 additions & 1 deletion CONTRIBUTING.md
@@ -31,7 +31,10 @@ cd avex
### Install with uv (Recommended)

```bash
# Install the project with dev dependencies
# Install the project without dev dependencies
uv sync

# For ESP users, install the project with dev dependencies
uv sync --group dev
```
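As a quick sanity check after `uv sync` (not part of the project's documented workflow, just a convenient pattern), you can confirm that a package resolved inside the synced environment; `importlib.util.find_spec` returns `None` when a top-level package is not importable:

```python
import importlib.util

def is_installed(name: str) -> bool:
    """Return True if `name` is importable in the current environment."""
    return importlib.util.find_spec(name) is not None

# Run inside the synced environment, e.g. `uv run python check.py`:
print(is_installed("avex"))
```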

52 changes: 36 additions & 16 deletions README.md
@@ -1,13 +1,16 @@
# avex - Animal Vocalization Encoder Library
# AVEX - Animal Vocalization Encoder Library

[![arXiv](https://img.shields.io/badge/arXiv-2508.11845-b31b1b.svg)](https://arxiv.org/abs/2508.11845)
[![PyPI](https://img.shields.io/pypi/v/avex.svg)](https://pypi.org/project/avex/)
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Collection-yellow)](https://huggingface.co/collections/EarthSpeciesProject/esp-aves2)
![CI status](https://github.com/earthspecies/avex/actions/workflows/pythonapp.yml/badge.svg?branch=main)
![Pre-commit status](https://github.com/earthspecies/avex/actions/workflows/pre-commit.yml/badge.svg?branch=main)

An API for model loading and inference, and a Python-based system for training and evaluating bioacoustics representation learning models.

## Description

The Animal Vocalization Encoder library avex provides a unified interface for working with pre-trained bioacoustics representation learning models, with support for:
The Animal Vocalization Encoder library AVEX provides a unified interface for working with pre-trained bioacoustics representation learning models, with support for:

- **Model Loading**: Load pre-trained models with checkpoints and class mappings
- **Embedding Extraction**: Extract features from audio for downstream tasks
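The two capabilities above can be pictured with a minimal, self-contained sketch. The names here (`AudioEncoder`, `embed`, `extract_embeddings`) and the toy encoder are illustrative stand-ins, not the actual avex API:

```python
from dataclasses import dataclass
from typing import Protocol, Sequence

class AudioEncoder(Protocol):
    """Hypothetical shape of a unified encoder interface."""
    def embed(self, waveform: Sequence[float]) -> list[float]: ...

@dataclass
class MeanEnergyEncoder:
    """Toy stand-in: 'embeds' a clip as a single mean-square feature."""
    def embed(self, waveform: Sequence[float]) -> list[float]:
        n = max(len(waveform), 1)
        return [sum(x * x for x in waveform) / n]

def extract_embeddings(encoder: AudioEncoder, clips) -> list[list[float]]:
    # Downstream tasks consume one fixed-size vector per clip.
    return [encoder.embed(clip) for clip in clips]

emb = extract_embeddings(MeanEnergyEncoder(), [[0.0, 1.0, -1.0]])
print(emb)  # → [[0.6666666666666666]]
```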
@@ -148,9 +151,9 @@ The framework supports the following audio representation learning models:
- **ATST** - Audio Teacher-Student Transformer
- **ResNet** - ResNet models (ResNet18, ResNet50, ResNet152)
- **CLIP** - Contrastive Language-Audio Pretraining models
- **BirdNet** - BirdNet models for bioacoustic classification
- **Perch** - Perch models for bioacoustics
- **SurfPerch** - SurfPerch models
- **BirdNet** - BirdNet models for bioacoustic classification (external TensorFlow model; some features may be unavailable)
- **Perch** - Perch models for bioacoustics (external TensorFlow model; some features may be unavailable)
- **SurfPerch** - SurfPerch models (external TensorFlow model; some features may be unavailable)
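Because the TensorFlow-backed models may not expose every feature, a defensive pattern is to probe for a capability before relying on it. This sketch uses an invented stub class and a hypothetical method name (`get_intermediate_embeddings`) purely for illustration:

```python
class TFBackedModel:
    """Stub standing in for an externally wrapped TensorFlow model."""
    def predict(self, audio):
        return {"label": "bird"}
    # Note: deliberately no `get_intermediate_embeddings` method.

def supports(model, feature: str) -> bool:
    """True if `model` exposes a callable attribute named `feature`."""
    return callable(getattr(model, feature, None))

m = TFBackedModel()
print(supports(m, "predict"))                      # True
print(supports(m, "get_intermediate_embeddings"))  # False
```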

See [Supported Models](docs/supported_models.md) for detailed information and configuration examples.

@@ -175,29 +178,46 @@ See [Probe System](docs/probe_system.md) and [API Probes](docs/api_probes.md) fo
If you use this framework in your research, please cite:

```bibtex
@article{miron2025matters,
@inproceedings{miron2025matters,
title={What Matters for Bioacoustic Encoding},
author={Miron, Marius and Robinson, David and Alizadeh, Milad and Gilsenan-McMahon, Ellen and Narula, Gagan and Pietquin, Olivier and Geist, Matthieu and Chemla, Emmanuel and Cusimano, Maddie and Effenberger, Felix and others},
journal={arXiv preprint arXiv:2508.11845},
year={2025}
author={Miron, Marius and Robinson, David and Alizadeh, Milad and Gilsenan-McMahon, Ellen and Narula, Gagan and Chemla, Emmanuel and Cusimano, Maddie and Effenberger, Felix and Hagiwara, Masato and Hoffman, Benjamin and Keen, Sara and Kim, Diane and Lawton, Jane K. and Liu, Jen-Yu and Raskin, Aza and Pietquin, Olivier and Geist, Matthieu},
  booktitle={The Fourteenth International Conference on Learning Representations},
year={2026}
}
```

Related ESP papers:

```bibtex
@inproceedings{miron2026probing,
title={Multi-layer attentive probing improves transfer of audio representations for bioacoustics},
  author={Miron, Marius and Robinson, David and Hagiwara, Masato and Parcollet, Titouan and Cauzinille, Jules and Narula, Gagan and Alizadeh, Milad and Gilsenan-McMahon, Ellen and Keen, Sara and Chemla, Emmanuel and Hoffman, Benjamin and Cusimano, Maddie and Kim, Diane and Effenberger, Felix and Lawton, Jane K. and Raskin, Aza and Pietquin, Olivier and Geist, Matthieu},
booktitle={ICASSP 2026-2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={1--5},
year={2026},
organization={IEEE}
}
@inproceedings{hagiwara2023aves,
title={Aves: Animal vocalization encoder based on self-supervision},
author={Hagiwara, Masato},
booktitle={ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={1--5},
year={2023},
organization={IEEE}
}
```

## Contributing

We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for:

- Development setup
- Running tests
- Code style guidelines
- Adding new functionality
- Pull request process

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Built on top of PyTorch
- Integrates with various pre-trained audio models
- ICLR 2026 and ICASSP 2026 reviewers for their feedback
- Titouan Parcollet for templating and engineering feedback
- Bioacoustics community (IBAC, BioDCASE, ABS)
4 changes: 2 additions & 2 deletions docs/index.md
@@ -1,10 +1,10 @@
# Representation Learning Framework Documentation
# AVEX - Animal Vocalization Encoder Library Documentation

Welcome to the Representation Learning Framework documentation. This framework provides an API for model loading and inference, and a Python-based system for training and evaluating bioacoustics representation learning models.

## Getting Started

### What is avex?
### What is AVEX?

The Representation Learning Framework is an API for model loading and inference, and a Python-based system for training and evaluating bioacoustics representation learning models. It provides:

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "avex"
version = "0.5.0a1"
version = "1.0.0"
description = "A comprehensive Python-based system for training, evaluating, and analyzing audio representation learning models with support for both supervised and self-supervised learning paradigms"
readme = "README.md"
requires-python = ">=3.10,<3.13"
2 changes: 1 addition & 1 deletion uv.lock

