NEWS: Agentomics has been accepted into the ISMB 2026 Proceedings
Made for biomedical data, Agentomics outperformed human experts and created new state-of-the-art models for problems in Protein Engineering, Drug Discovery, and Regulatory Genomics.
How it works
- Input is a CSV training dataset plus an optional data description
- Agentomics autonomously experiments with various ML models and strategies
- Output is a trained model ready for inference and a detailed PDF report summarizing the development process and achieved metrics
For more details, see the preprint.
git clone https://github.com/BioGeMT/agentomics-ml.git
cd agentomics-ml
cp .env.example .env
# Edit .env and set at least one API key (OPENROUTER_API_KEY or OPENAI_API_KEY)
# Download example dataset
./scripts/download_example_dataset.sh
./run.sh
Recommended model: gpt-5.1-codex-max
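Before launching `./run.sh`, it can help to confirm that `.env` really sets one of the two keys named in the setup comment. The following is a minimal sketch, not part of Agentomics: the function name `has_api_key` is hypothetical, and it assumes `.env` uses plain `KEY=value` lines.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): succeed if the env file sets at least one
# of OPENROUTER_API_KEY or OPENAI_API_KEY to a non-empty value.
# Assumes plain KEY=value lines, optionally quoted.
has_api_key() {
  local env_file="${1:-.env}"
  # After '=', allow one optional quote, then require a real character,
  # so that KEY= and KEY="" both count as unset.
  local pattern="^(OPENROUTER_API_KEY|OPENAI_API_KEY)=[\"']?[^\"'[:space:]]"
  grep -Eq "$pattern" "$env_file"
}

# Usage:
#   has_api_key .env && ./run.sh
```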
Outputs are saved to outputs/<agent_id>/, including PDF reports in outputs/<agent_id>/pdf_reports.
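Given the output layout described above, the reports of a single run can be listed with a few lines of shell. This is a sketch under assumptions: `list_reports` is a hypothetical helper, and it assumes the reports carry a `.pdf` extension; only the `outputs/<agent_id>/pdf_reports` path comes from this README.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the PDF reports of one run, sorted
# by name. Arguments: agent id, optional outputs root (default: outputs).
list_reports() {
  local agent_id="$1"
  local root="${2:-outputs}"
  local report_dir="$root/$agent_id/pdf_reports"
  if [ ! -d "$report_dir" ]; then
    echo "no report directory at $report_dir" >&2
    return 1
  fi
  # Assumption: reports are regular files ending in .pdf
  find "$report_dir" -maxdepth 1 -type f -name '*.pdf' | sort
}

# Usage:
#   list_reports <agent_id>
```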
Agentomics can be run either:
For more details visit https://biogemt.github.io/agentomics-ml/
- Generic: Agentomics can handle any classification or regression dataset in CSV format.
- Secure: Agents execute code securely in Docker with read-only mounts to your file system and are only allowed to write in a Docker Volume.
- Reproducible: Outputs include models, scripts, and conda environments needed to run inference or re-train models with one bash command.
- Trustworthy: If you provide a test set, Agentomics keeps it fully hidden from the LLMs, so you can rely on programmatically computed and reported test set metrics.
- Foundation models: Agentomics can leverage foundation models from Hugging Face for both embeddings and fine-tuning.
- Various LLM providers: OpenAI, OpenRouter, or local models via Ollama
- Reliability: Thanks to our functional validators, Agentomics creates a working model 100% of the time (when using recommended settings).
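To illustrate the general pattern behind the "Secure" item, the sketch below prints a generic `docker run` command that mounts a host directory read-only and attaches a named volume for writes. It is not the command Agentomics itself runs: the helper `sandbox_cmd`, the image name, the volume name, and the container paths are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print a docker command that mounts a host
# dataset directory read-only and attaches a named volume for writes.
# Image name, volume name, and container paths are placeholders.
sandbox_cmd() {
  local data_dir="$1"
  local image="${2:-example/agent-image}"
  # ':ro' makes the bind mount read-only inside the container;
  # 'agent_workspace' is a named Docker volume, not a host path.
  printf 'docker run --rm -v %s:/data:ro -v agent_workspace:/workspace %s\n' \
    "$data_dir" "$image"
}

# Usage (dry run, only prints the command):
#   sandbox_cmd "$PWD/data"
```

With this layout, code inside the container cannot modify the files under the host directory, and anything it writes to `/workspace` lands in the volume.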
Agentomics is in active development. We welcome issues and suggestions, and you can also email us.
Features coming soon:
- Support for any data type (currently only CSV datasets)
- Run forking and continuing
- Better local model support and configuration
- Remote GPU support for GCP
See the ismb_submission branch README for instructions.
If you use Agentomics in your work, please cite:
Martinek et al. (2026). Agentomics: An Agentic System that Autonomously Develops Novel State-of-the-Art Solutions for Biomedical Machine Learning Tasks. bioRxiv (preprint) https://www.biorxiv.org/content/10.64898/2026.01.27.702049v1
MIT. See LICENSE.
