Stars
A benchmark for LLMs on complicated tasks in the terminal
[EMNLP 2025] Remedy: Learning Machine Translation Evaluation from Human Preferences with Reward Modeling
Open-Source Battery Monitoring & Modeling Resources
The official PyTorch implementation of the Interspeech 2024 paper "Reshape Dimensions Network for Speaker Recognition"
[INTERSPEECH 2022] This dataset is designed for multi-modal speaker diarization and lip-speech synchronization in the wild.
Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.
DataComp: In search of the next generation of multimodal datasets
Code accompanying the paper "Massive Activations in Large Language Models"
Implementation of the BatchTopK activation function for training sparse autoencoders (SAEs)
A list of Twitter datasets and related resources.
MTEB: Massive Text Embedding Benchmark
Sparsify transformers with SAEs and transcoders
Training Sparse Autoencoders on Language Models
Code that renders an image as a series of lines connecting pins around a circular frame (for more detail, see my Medium page).
Code for reproducing our paper "Not All Language Model Features Are Linear"
DSPy: The framework for programming—not prompting—language models
Open-source smart glasses designed to be (1) all-day wearable, (2) immediately useful, and (3) extendable for makers, startups, and everyone else.
Physics Informed Deep Learning: Data-driven Solutions and Discovery of Nonlinear Partial Differential Equations
Single- and Multi-Speaker Cloned Voice Detection: From Perceptual to Learned Features