Stars
🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…
A Python library to access Instagram's private API.
A library for mechanistic interpretability of GPT-style language models
Siamese and triplet networks with online pair/triplet mining in PyTorch
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720
Code release for Best-of-N Jailbreaking
Code for the EMNLP 2024 paper "Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps"
Code for the paper "DiffusionNER: Boundary Diffusion for Named Entity Recognition", accepted at ACL 2023.
Awesome jailbreak and red-teaming arXiv papers (automatically updated every 12 hours)
Awesome Large Reasoning Model (LRM) Safety. This repository collects security-related research on currently popular large reasoning models such as DeepSeek-R1 and OpenAI o1.
A new algorithm that formulates jailbreaking as a reasoning problem.
Official code implementation of SKU, accepted to ACL 2024 Findings