
Pro-GenAI/Agent-Action-Classifier



Agent Action Classifier: Classifying AI agent actions to ensure safety and reliability

A neural network model that classifies actions proposed by autonomous AI agents as harmful or safe. The model is trained on a small dataset of labeled examples.

Links: Preprint, Medium. License: CC BY 4.0.

Implementation

[Figure: Implementation diagram]

Training

[Figure: Training diagram]

Usage:

  1. Create a virtual environment and install dependencies:
     python3 -m venv .venv
     source .venv/bin/activate
     pip install -r requirements.txt

     For development (optional; includes linting, formatting, and testing tools):

     pip install -r requirements-dev.txt
  2. Train the model (optional):
     python3 train_nn.py
  3. Use the trained model in LLM calls by running the example:
     python3 run_sample_query.py
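Step 3 wires the classifier into LLM calls. This README does not show how run_sample_query.py does that, so the following is only a hedged sketch of one common pattern: a guard that classifies each proposed tool call before it runs and refuses the ones flagged as harmful. The classifier is passed in as a callable returning a (label, score) pair. `guarded_execute`, `ActionBlocked`, `toy_classifier`, and `run_tool` are hypothetical names, and `toy_classifier` is a keyword stand-in for the trained model, not the model itself.

```python
from typing import Any, Callable, Dict, Tuple

Action = Dict[str, Any]
Classifier = Callable[[Action], Tuple[str, float]]  # returns (label, score)


class ActionBlocked(Exception):
    """Raised when the classifier flags a proposed action as harmful."""


def guarded_execute(action: Action, classifier: Classifier,
                    execute: Callable[[Action], Any]) -> Any:
    """Run execute(action) only if classifier(action) does not return 'harmful'."""
    label, score = classifier(action)
    if label == "harmful":
        raise ActionBlocked(f"blocked {action.get('tool')!r} (score={score:.2f})")
    return execute(action)


def toy_classifier(action: Action) -> Tuple[str, float]:
    # Hypothetical stand-in for the trained model: flags two destructive phrases.
    text = str(action).lower()
    harmful = any(phrase in text for phrase in ("rm -rf", "drop table"))
    return ("harmful", 0.99) if harmful else ("safe", 0.01)


def run_tool(action: Action) -> str:
    # Hypothetical tool executor; a real agent would dispatch to the named tool here.
    return f"executed {action['tool']}"


if __name__ == "__main__":
    print(guarded_execute({"tool": "search", "arguments": {"query": "weather"}},
                          toy_classifier, run_tool))
    try:
        guarded_execute({"tool": "shell", "arguments": {"cmd": "rm -rf /"}},
                        toy_classifier, run_tool)
    except ActionBlocked as err:
        print(err)
```

Raising an exception instead of silently skipping the call leaves the agent loop free to report the refusal back to the LLM, ask a human for confirmation, or stop.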

Files:

  • sample_actions.json — dataset of action prompts and labels/resources in MCP-like format.
  • train_nn.py — small script that trains a neural network model and saves the trained model.
  • action_classifier.py — module that loads the trained model and provides a function to classify actions.
  • run_sample_query.py — script to classify new actions using the trained model (example wrapper).
  • requirements.txt — minimal dependencies.
  • requirements-dev.txt — development dependencies (linting, formatting, testing tools).
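The files above split the pipeline into training (train_nn.py) and inference (action_classifier.py). The sketch below is not the repository's code. It is a minimal, standard-library-only illustration of the same idea, under these assumptions: MCP-like actions carry hypothetical `tool` and `arguments` keys, text is turned into a hashed bag-of-words vector, and a one-hidden-layer network is trained with plain SGD on binary cross-entropy and saved as JSON. All names (`action_to_text`, `featurize`, `train`, `classify`, `TOY_EXAMPLES`, and so on) and the layer sizes are hypothetical.

```python
import json
import math
import random
import zlib

# Hypothetical sizes; the repository's actual architecture may differ.
N_FEATURES = 64  # buckets in the hashed bag-of-words vector
N_HIDDEN = 8     # width of the single hidden layer


def action_to_text(action):
    """Flatten an MCP-like action (assumed 'tool' and 'arguments' keys) into lowercase text."""
    args = " ".join(f"{k} {v}" for k, v in sorted(action.get("arguments", {}).items()))
    return f"{action.get('tool', '')} {args}".lower()


def featurize(text):
    """Hashed bag-of-words; crc32 keeps bucket indices stable across processes."""
    vec = [0.0] * N_FEATURES
    for token in text.split():
        vec[zlib.crc32(token.encode("utf-8")) % N_FEATURES] += 1.0
    return vec


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def forward(model, x):
    """Return (hidden activations, probability that the action is harmful)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(model["W1"], model["b1"])]
    logit = sum(w * h for w, h in zip(model["W2"], hidden)) + model["b2"]
    return hidden, sigmoid(logit)


def train(examples, epochs=300, lr=0.1, seed=0):
    """Plain SGD on binary cross-entropy; examples are (action, label), 1 = harmful, 0 = safe."""
    rng = random.Random(seed)
    model = {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(N_FEATURES)] for _ in range(N_HIDDEN)],
        "b1": [0.0] * N_HIDDEN,
        "W2": [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)],
        "b2": 0.0,
    }
    data = [(featurize(action_to_text(a)), y) for a, y in examples]
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            hidden, p = forward(model, x)
            d_logit = p - y  # dBCE/dlogit for a sigmoid output
            for j in range(N_HIDDEN):
                # Backpropagate through tanh using W2[j] before it is updated.
                d_hidden = d_logit * model["W2"][j] * (1.0 - hidden[j] ** 2)
                model["W2"][j] -= lr * d_logit * hidden[j]
                model["b1"][j] -= lr * d_hidden
                row = model["W1"][j]
                for i, xi in enumerate(x):
                    if xi:  # only nonzero features receive a gradient
                        row[i] -= lr * d_hidden * xi
            model["b2"] -= lr * d_logit
    return model


def save_model(model, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(model, f)


def load_model(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def classify(model, action, threshold=0.5):
    """Return ('harmful' or 'safe', probability of harmful)."""
    _, p = forward(model, featurize(action_to_text(action)))
    return ("harmful" if p >= threshold else "safe"), p


# Hypothetical toy data, not the contents of sample_actions.json.
TOY_EXAMPLES = [
    ({"tool": "shell", "arguments": {"cmd": "rm -rf /"}}, 1),
    ({"tool": "email", "arguments": {"body": "send all customer passwords"}}, 1),
    ({"tool": "search", "arguments": {"query": "weather in paris"}}, 0),
    ({"tool": "calendar", "arguments": {"title": "team standup"}}, 0),
]

if __name__ == "__main__":
    model = train(TOY_EXAMPLES)
    for action, _ in TOY_EXAMPLES:
        print(action["tool"], classify(model, action))
```

Four examples only show the mechanics; with a dataset this small the model will mostly memorize surface tokens. A larger labeled set, or sentence embeddings in place of hashed tokens, would be the natural next step.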

Citation

If you find this repository useful in your research, please consider citing:

@misc{vadlapati2025agentactionclassifier,
  author       = {Vadlapati, Praneeth},
  title        = {Agent Action Classifier: Classifying AI agent actions to ensure safety and reliability},
  year         = {2025},
  howpublished = {\url{https://github.com/Pro-GenAI/Agent-Action-Classifier}},
  note         = {GitHub repository},
}

This project builds on my earlier work:

Agent-Supervisor: Supervising Actions of Autonomous AI Agents for Ethical Compliance (GitHub)

Image credits: