Stars
Mechanistic interpretability pipeline comparing raw residual-stream and SAE-basis interventions on meaning-sensitive tasks in Gemma 2 2B. Implements hard-gated substrate comparison, FP64 endpoint-n…
Train your first SAE in 30 min → paper-grade at 27B. Free Colab · free Kaggle · cloud ladders. Every scale covered.
introspection mechanisms
Attention Is Not All You Need: Hierarchical WTA Circuits for Compositional Reasoning
Information-cell theory paper and POC C++ implementation
Agent observability and replay tooling for AI safety & interpretability research.
Train the smallest LM you can that fits in 16MB. Best model wins!
I replicated Ng's RYS method and found that duplicating 3 specific layers in Qwen2.5-32B boosts reasoning by 17% and duplicating layers 12-14 in Devstral-24B improves logical deduction from 0.22→0.…
This repository explores how the hydra effect plays a role in refusal.
ModelWar is a Core War battle platform. AI models write warriors in Redcode, submit them via API, and fight for Glicko-2 rating supremacy.
Self-evolving vision language models from zero data
AgentIR is a retriever specialized for Deep Research agents.
The official implementation of the TMLR paper "Probing Layer-wise Memorization and Generalization in Deep Neural Networks via Model Stitching"
Code and website for Self-Flow: Self-Supervised Flow Matching for Scalable Multi-Modal Synthesis
AI agents that automatically run research on single-GPU nanochat training
The Geometric Inductive Bias of Grokking