Stars
Making Fabio Crameri's perceptually uniform colourmaps for geosciences available on PyPI and conda-forge
Turning general models into narrow ones
[NeurIPS D&B '25] The one-stop repository for large language model (LLM) unlearning. Supports TOFU, MUSE, WMDP, and many unlearning methods with easy feature extensibility.
A tool for visualization of complex job searches.
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
ControlArena is a collection of settings, model organisms, and protocols for running control experiments.
Codebase for Distillation Robustifies Unlearning
📗 Score text readability using a number of formulas: Flesch-Kincaid Grade Level, Gunning Fog, ARI, Dale Chall, SMOG, and more
Code for the paper "Do Unlearning Methods Remove Information from Language Model Weights?"
Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
[ICML25] Official repo for "Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond"
Friends don't let friends make certain types of data visualization - What are they and why are they bad.
Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs"
Official PyTorch Implementation of EMoE: Unlocking Emergent Modularity in Large Language Models [main conference @ NAACL2024]
Tools for understanding how transformer predictions are built layer-by-layer
Code and Data Repo for the CoNLL Paper -- Future Lens: Anticipating Subsequent Tokens from a Single Hidden State
MishformerLens intends to be a drop-in replacement for TransformerLens that AST patches HuggingFace Transformers rather than implementing a custom, numerically inaccurate Transformer architecture.