Gagan Bhatia gagan3012

🎯

Focusing

NLP Research | MLE

131 followers · 35 following

Achievements

x3 x2

Starred repositories

lili-chen / rltf

Reinforcement Learning from Text Feedback

Python · 47 stars · 4 forks · Updated Feb 17, 2026

google-deepmind / concordia

A library for generative social simulation

Python · 1,485 stars · 335 forks · Updated Jun 17, 2026

run-llama / liteparse

A fast, helpful, and open-source document parser

Rust · 10,185 stars · 665 forks · Updated Jun 18, 2026

TruthfulAI-research / negation_neglect

Code for Negation Neglect

Python · 15 stars · 4 forks · Updated May 22, 2026

kitft / natural_language_autoencoders

Python · 833 stars · 111 forks · Updated Jun 9, 2026

eliaskempf / model-diffing

Official code for the paper: "Simple LLM Baselines are Competitive for Model Diffing"

Python · 9 stars · Updated Feb 13, 2026

safety-research / introspection-adapters

Training LLMs to Report Their Learned Behaviors

Python · 22 stars · 2 forks · Updated Apr 28, 2026

UKGovernmentBEIS / vllm-lens

Extract residual-stream activations and apply steering vectors (including activation oracles) to any vLLM model during inference.

Python · 107 stars · 8 forks · Updated May 7, 2026

science-of-finetuning / diffing-toolkit

A toolkit that provides a range of model diffing techniques including a UI to visualize them interactively.

Python · 74 stars · 18 forks · Updated Apr 15, 2026

natolambert / colloquium

A markdown native slides tool for academics building with agents.

Python · 208 stars · 11 forks · Updated Jun 4, 2026

pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook · 10,139 stars · 1,072 forks · Updated Jun 17, 2026

UKGovernmentBEIS / reward-hacking-misalignment

Reproducing "Natural Emergent Misalignment from Reward Hacking" (MacDiarmid et al., Anthropic 2025) with open-source models. Includes reward-hackable RL environments, misalignment evaluations, trai…

HTML · 22 stars · 7 forks · Updated Mar 30, 2026

ndif-team / nnsight

The nnsight package enables interpreting and manipulating the internals of deep learned models.

Python · 963 stars · 93 forks · Updated Jun 18, 2026

Ronen-Rusinov / Manifolds_in_LLMs

Research project seeking to confirm/disprove the manifold hypothesis in the context of LLM internal activations

HTML · 2 stars · Updated Mar 2, 2026

danielpcox / irtk

Interpretability Research Tool Kit

Python · 1 star · Updated Mar 11, 2026

patrickhyw / logitscope

A simplified and more accurate next token Patchscope

Jupyter Notebook · 2 stars · Updated Jan 16, 2026

AI-in-Transportation-Lab / awesome-mechanistic-interpretability

A carefully curated collection of high-quality libraries, projects, tutorials, research papers, and other essential resources focused on Mechanistic Interpretability, a growing subfield in machine …

JavaScript · 111 stars · 10 forks · Updated Jun 18, 2026