NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.
DeepTeam is a framework to red team LLMs and AI agents.
UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM hallucination detection.
Internal Safety Collapse (ISC): Turning the LLM or an AI Agent into a sensitive data generator.
A resource repository for machine unlearning in large language models
Agent trace and tool-use safety evaluation lab.
Decrypted Generative Model safety files for Apple Intelligence containing filters
[NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.
Static security scanner for LLM agents — prompt injection, MCP config auditing, taint analysis. 51 rules mapped to OWASP Agentic Top 10 (2026). Works with LangChain, CrewAI, AutoGen.
Papers about red teaming LLMs and Multimodal models.
An attack that induces hallucinations in LLMs.
Reading list for adversarial perspective and robustness in deep reinforcement learning.
Open Source Reliability Harness: Make your agents follow rules. One line of code to enforce, trace, and improve.
Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses | 500+ Papers | Perception, Cognition, Planning, Interaction, Agentic System
[NeurIPS 2025] SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations
Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"
AISecOps (AI Security Operations) framework for deterministic verification of AI systems. QWED verifies LLM outputs using math, logic, and symbolic execution — creating an auditable trust boundary for agentic AI systems. Not generation. Verification.
Papers from our SoK on Red-Teaming (Accepted at TMLR)
lintlang is a static linter for AI agent configs, tool descriptions, and system prompts that runs zero-LLM quality gating in CI. Catches language-level failures (vague tool descriptions, missing stop conditions, schema gaps) before they reach runtime, with deterministic regex + structural detectors and no model calls.
NeurIPS'24 - LLM Safety Landscape