Arth-Singh/README.md

Hi there 👋

I’m Arth Singh — an AI Safety & Red Teaming researcher from Mumbai, India 🇮🇳. I currently work at AIM Intelligence as a Research Engineer in the AI Safety department and collaborate with the Seoul National University PI Lab on red teaming mobile-use agents. I was previously a Research Collaborator with FAR.AI, where I helped build their Red Teaming Toolkit.

  • 🧨 I enjoy red teaming AI models, but lately I’m more focused on AI alignment & safety

📫 Let’s connect:

Always down to talk alignment, adversarial evals, or half-baked research ideas that can turn into collaborations.

Pinned repositories

  1. arth-finds-weird-model-behaviours (Public)

     I created this repository to document the weird LLM behaviours I come across.

     Python

  2. A-Red-Team-Havoc (Public)

     A red teaming toolkit I built to run attacks on LLMs. More to come soon.

     Python

  3. NSFW-Image-Gen-Prompt-Injection-automation (Public)

     3-step pipeline: 1. Kimi K2 → generates creative, boundary-testing prompts for safety evaluation; 2. OpenAI GPT-Image-1 → creates images from the prompts (falls back to DALL-E 3); 3. OpenAI Moderation → …

     Python

  4. Arth-Jailbreak-Templates (Public)

     A repository where I collect my own jailbreak templates, which I build with the help of LLMs.

  5. arth-whitebox-redteam (Public)

     Open-source framework for mechanistic interpretability-driven red teaming of language model safety mechanisms.

     Python
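The 3-step pipeline described in the NSFW-Image-Gen-Prompt-Injection-automation entry above could be sketched as follows. This is only a minimal illustration, not the repository's actual code: the function name `run_pipeline` and the injected callables are hypothetical, and in the real project each callable would wrap the corresponding API (Kimi K2 for prompt generation, GPT-Image-1 with a DALL-E 3 fallback for rendering, OpenAI Moderation for scoring).

```python
# Hypothetical sketch of the 3-step pipeline: generate a boundary-testing
# prompt, render it to an image (with a fallback model), then moderate it.
# Each step is injected as a callable so the control flow can be run and
# tested offline, without any API keys.

def run_pipeline(generate_prompt, generate_image, fallback_image, moderate):
    # Step 1: Kimi K2 (assumed) produces a creative, boundary-testing prompt.
    prompt = generate_prompt()
    # Step 2: primary image model (GPT-Image-1 in the repo description);
    # if it raises (e.g. a refusal or API error), fall back to DALL-E 3.
    try:
        image = generate_image(prompt)
    except Exception:
        image = fallback_image(prompt)
    # Step 3: run the result through a moderation check and return everything.
    return prompt, image, moderate(image)
```

Injecting the three stages as callables keeps the fallback logic testable with stubs; swapping in real API clients would not change the control flow.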