llm-safety

Here are 229 public repositories matching this topic...

NVIDIA-NeMo / Guardrails

NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.

python nvidia safety agents guardrails llms generative-ai llm-security llm-safety

Updated Jun 9, 2026
Python

confident-ai / deepteam

DeepTeam is a framework to red team LLMs and AI agents.

python llm-safety llm-guardrails llm-red-teaming llm-seecurity

Updated Jun 9, 2026
Python

cvs-health / uqlm

UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM hallucination detection.

uncertainty-quantification uncertainty-estimation ai-safety confidence-score hallucination confidence-estimation ai-evaluation llm llm-evaluation llm-safety hallucination-evaluation hallucination-detection hallucination-mitigation llm-hallucination

Updated Jun 8, 2026
Python

wuyoscar / Internal-Safety-Collapse

Internal Safety Collapse (ISC): Turning the LLM or an AI Agent into a sensitive data generator.

benchmark jailbreak ai-safety red-teaming large-language-models llm-safety safety-evaluation agent-safety

Updated Jun 7, 2026
Python

chrisliu298 / awesome-llm-unlearning

A resource repository for machine unlearning in large language models

Updated Jun 10, 2026

YutoTerashima / agent-safety-eval-lab

Agent trace and tool-use safety evaluation lab.

ai-agents red-teaming tool-use evals llm-safety

Updated May 2, 2026
Python

BlueFalconHD / apple_generative_model_safety_decrypted

Decrypted Generative Model safety files for Apple Intelligence containing filters

apple ai safety decryption lldb-script llm llm-safety apple-intelligence

Updated Jan 26, 2026
Python

CryptoAILab / JailbreakEval

[NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.

llm-safety llm-jailbreaks

Updated Apr 1, 2025
Python

HeadyZhang / agent-audit

Static security scanner for LLM agents — prompt injection, MCP config auditing, taint analysis. 51 rules mapped to OWASP Agentic Top 10 (2026). Works with LangChain, CrewAI, AutoGen.

python cli security mcp static-analysis owasp taint-analysis vulnerability-detection vulnerability-scanner ai-security ai-agent langchain prompt-injection llm-security llm-safety crewai ai-security-tool langchain-security-

Updated Jun 7, 2026
Python

Libr-AI / OpenRedTeaming

Papers about red teaming LLMs and Multimodal models.

safety awesome-list papers language-model redteaming llm-safety

Updated May 28, 2025

PKU-YuanGroup / Hallucination-Attack

Attack to induce hallucinations in LLMs

nlp machine-learning deep-learning ai-safety adversarial-attacks hallucinations llm llm-safety

Updated May 17, 2024
Python

EzgiKorkmaz / adversarial-reinforcement-learning

Reading list for adversarial perspective and robustness in deep reinforcement learning.

Updated Mar 2, 2026

open-bias / open-bias

Open Source Reliability Harness: Make your agents follow rules. One line of code to enforce, trace, and improve.

Updated May 23, 2026
Python

x-zheng16 / Awesome-Embodied-AI-Safety

Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses | 500+ Papers | Perception, Cognition, Planning, Interaction, Agentic System

Updated Jun 3, 2026
Shell

Buyun-Liang / SECA

[NeurIPS 2025] SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations

adversarial-attacks large-language-models llm-safety llm-hallucination

Updated Dec 10, 2025
Python

Babelscape / ALERT

Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"

nlp benchmark ai artificial-intelligence nlp-machine-learning red-teaming bias-detection safety-monitoring transformers-models llm llm-evaluation llm-safety llm-safety-benchmark

Updated Sep 20, 2024
Python

QWED-AI / qwed-verification

AISecOps (AI Security Operations) framework for deterministic verification of AI systems. QWED verifies LLM outputs using math, logic, and symbolic execution, creating an auditable trust boundary for agentic AI systems. Not generation. Verification.

Updated Jun 6, 2026
Python

dapurv5 / awesome-red-teaming-llms

Papers from our SoK on Red-Teaming (Accepted at TMLR)

awesome awesome-list ai-safety adversarial-attacks red-teaming ai-security llm-security llm-safety

Updated May 2, 2026

hermes-labs-ai / lintlang

lintlang is a static linter for AI agent configs, tool descriptions, and system prompts that runs zero-LLM quality gating in CI. Catches language-level failures (vague tool descriptions, missing stop conditions, schema gaps) before they reach runtime, with deterministic regex + structural detectors and no model calls.

Updated Jun 2, 2026
Python

poloclub / llm-landscape

NeurIPS'24 - LLM Safety Landscape

llm llm-safety safety-basin llm-safety-landscape llm-landscape

Updated Oct 21, 2025
Python
