[NeurIPS 2025 Spotlight] Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning.

Python 92 6 Updated Dec 19, 2025

Processed / Cleaned Data for Paper Copilot

Python 786 36 Updated Dec 4, 2025
Python 38 3 Updated Sep 30, 2025
Python 3 Updated Sep 30, 2025

Code for Learning to Interpret Weight Differences in Language Models (Goel et al. 2025)

Jupyter Notebook 14 1 Updated Dec 15, 2025

A Unified Framework for High-Performance and Extensible LLM Steering

Python 137 13 Updated Dec 5, 2025

Persona Vectors: Monitoring and Controlling Character Traits in Language Models

Python 308 72 Updated Jul 30, 2025

The best ChatGPT that $100 can buy.

Python 38,886 4,909 Updated Dec 9, 2025
Python 6 2 Updated Sep 27, 2024

This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning"

Python 277 27 Updated Nov 24, 2025

Stanford NLP Python library for benchmarking the utility of LLM interpretability methods

Python 157 23 Updated Jun 25, 2025

[NeurIPS'25] Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders

Python 15 1 Updated May 28, 2025

Enhancing Automated Interpretability with Output-Centric Feature Descriptions

Jupyter Notebook 9 Updated Jun 12, 2025

LaTeX files for the Deep Learning book notation

TeX 1,835 370 Updated May 8, 2023

Config files for mixer

Emacs Lisp 498 61 Updated Dec 11, 2025

tmux source code

C 40,199 2,342 Updated Dec 19, 2025

Lightweight clipboard manager for macOS

Swift 17,897 803 Updated Dec 18, 2025

AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models, ICLR 2025 (Outstanding Paper)

Python 380 40 Updated Oct 15, 2025

Go implementation of the ZJU RVPN client

Go 528 39 Updated Oct 10, 2025

A technique for removing sleeper agent behavior

Python 1 Updated Apr 2, 2024

This repository collects all relevant resources about interpretability in LLMs

389 26 Updated Nov 1, 2024
Python 240 17 Updated Feb 22, 2024

awesome papers in LLM interpretability

600 20 Updated Aug 20, 2025

Color palettes for Python

Python 809 78 Updated Aug 23, 2025

Research papers about Chain of Thought (CoT)

57 4 Updated Oct 25, 2023

Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models

Python 3,173 615 Updated Jul 19, 2024

Measuring Massive Multitask Language Understanding | ICLR 2021

Python 1,537 113 Updated May 28, 2023

Stanford NLP Python library for Representation Finetuning (ReFT)

Python 1,545 130 Updated Feb 6, 2025