git-disl
Repositories
- awesome_LLM-harmful-fine-tuning-papers
  A survey on harmful fine-tuning attacks for large language models.
- GTLLMZoo
GTLLMZoo: A comprehensive framework that aggregates LLM benchmark data from multiple sources with an interactive UI for efficient model comparison, filtering, and evaluation across performance, safety, and efficiency metrics.
- Booster
  Official code for the paper "Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation" (ICLR 2025 Oral).
- Safety-Tax
  Official code for the paper "Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable".
- Virus
  Official code for the paper "Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation".
- llm-topla
- PFT
- Chameleon
- Vaccine
  Official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS 2024).