gagan3012
🎯
Focusing
Organizations

@conda-forge @UBC-NLP @EddieHubCommunity @openwater-fall2020 @ubcdsc @jupyter-naas @accelerateplus

Starred repositories

Reinforcement Learning from Text Feedback

Python · 47 stars · 4 forks · Updated Feb 17, 2026

A library for generative social simulation

Python · 1,485 stars · 335 forks · Updated Jun 17, 2026

A fast, helpful, and open-source document parser

Rust · 10,185 stars · 665 forks · Updated Jun 18, 2026

Code for Negation Neglect

Python · 15 stars · 4 forks · Updated May 22, 2026

Official code for the paper: "Simple LLM Baselines are Competitive for Model Diffing"

Python · 9 stars · Updated Feb 13, 2026

Training LLMs to Report Their Learned Behaviors

Python · 22 stars · 2 forks · Updated Apr 28, 2026

Extract residual-stream activations and apply steering vectors (including activation oracles) to any vLLM model during inference.

Python · 107 stars · 8 forks · Updated May 7, 2026

A toolkit that provides a range of model diffing techniques including a UI to visualize them interactively.

Python · 74 stars · 18 forks · Updated Apr 15, 2026

A Markdown-native slides tool for academics building with agents.

Python · 208 stars · 11 forks · Updated Jun 4, 2026

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook · 10,139 stars · 1,072 forks · Updated Jun 17, 2026

Reproducing "Natural Emergent Misalignment from Reward Hacking" (MacDiarmid et al., Anthropic 2025) with open-source models. Includes reward-hackable RL environments, misalignment evaluations, trai…

HTML · 22 stars · 7 forks · Updated Mar 30, 2026

The nnsight package enables interpreting and manipulating the internals of deep learned models.

Python · 963 stars · 93 forks · Updated Jun 18, 2026

Research project seeking to confirm/disprove the manifold hypothesis in the context of LLM internal activations

HTML · 2 stars · Updated Mar 2, 2026

Interpretability Research Tool Kit

Python · 1 star · Updated Mar 11, 2026

A simplified and more accurate next token Patchscope

Jupyter Notebook · 2 stars · Updated Jan 16, 2026

A carefully curated collection of high-quality libraries, projects, tutorials, research papers, and other essential resources focused on Mechanistic Interpretability, a growing subfield in machine …

JavaScript · 111 stars · 10 forks · Updated Jun 18, 2026

Code for `LLM2VEC-GEN: Generative Embeddings from Large Language Models`

Python · 71 stars · 2 forks · Updated Apr 5, 2026

Late Interaction Models Training & Retrieval

Python · 851 stars · 89 forks · Updated Jun 17, 2026

Optimize prompts, code, and more with AI-powered Reflective Text Evolution

Jupyter Notebook · 5,216 stars · 433 forks · Updated Jun 18, 2026

Official PyTorch implementation for "TensorLens: End-to-End Transformer Analysis via High-Order Attention Tensors" [ACL 2026]

Jupyter Notebook · 47 stars · 4 forks · Updated Apr 14, 2026

🪄 Interpreto is an interpretability toolbox for LLMs

Python · 184 stars · 5 forks · Updated Jun 18, 2026

ADAG: Transluce's MLP neuron-level circuit tracing library

Python · 29 stars · 4 forks · Updated Apr 10, 2026

A collection of lightweight interpretability scripts to understand how LLMs think

Python · 89 stars · 7 forks · Updated Mar 18, 2026

Pivotal Token Search

Python · 153 stars · 9 forks · Updated Dec 20, 2025

A toolkit for embedding text datasets with sparse autoencoders

Python · 29 stars · 5 forks · Updated Mar 24, 2026

Fully automatic censorship removal for language models

Python · 25,140 stars · 2,706 forks · Updated Jun 18, 2026

Code for Paper "The Geometry of Reasoning: Flowing Logics in Representation Space" (ICLR 2026)

Python · 52 stars · 5 forks · Updated Jan 31, 2026

Unified access to Large Language Model modules using NNsight

Python · 112 stars · 11 forks · Updated May 6, 2026