Zhejiang University
https://colored-dye.github.io
Stars
[NeurIPS 2025 Spotlight] Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning.
Processed / Cleaned Data for Paper Copilot
Code for "Learning to Interpret Weight Differences in Language Models" (Goel et al., 2025)
A Unified Framework for High-Performance and Extensible LLM Steering
Persona Vectors: Monitoring and Controlling Character Traits in Language Models
Source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning"
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
[NeurIPS'25] Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders
Enhancing Automated Interpretability with Output-Centric Feature Descriptions
LaTeX files for the Deep Learning book notation
AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models, ICLR 2025 (Outstanding Paper)
A technique for removing sleeper agent behavior
A collection of resources on interpretability in LLMs
Awesome papers in LLM interpretability
Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models
Measuring Massive Multitask Language Understanding | ICLR 2021
Stanford NLP Python library for Representation Finetuning (ReFT)