Stars
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
Browser extension that blocks algorithmic 'home' feeds while preserving unique pages/links, DMs, search, and subscriptions
Implementations of the Sutskever 30 papers, inspired by https://papercode.vercel.app/
Claude UX skill plugin for web app usability audits, accessibility checks, and design specs
A super-fast graph database that uses GraphBLAS under the hood for its sparse adjacency matrix graph representation. Our goal is to provide the best knowledge graph for LLMs (GraphRAG).
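As an aside on that representation: storing edges in a sparse adjacency matrix turns graph traversal into linear algebra. A minimal sketch of the idea (illustrative only — scipy standing in for GraphBLAS, with a made-up 4-node graph, not the database's actual code):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical 4-node directed graph: 0->1, 0->2, 1->3, 2->3,
# stored as a sparse adjacency matrix A where A[i, j] = 1 iff edge i -> j.
rows, cols = [0, 0, 1, 2], [1, 2, 3, 3]
A = csr_matrix((np.ones(4), (rows, cols)), shape=(4, 4))

# One BFS step from vertex 0: multiplying A's transpose by the frontier
# vector marks every vertex reachable in a single hop.
frontier = np.zeros(4)
frontier[0] = 1.0
next_frontier = A.T @ frontier
print(np.nonzero(next_frontier)[0])  # -> [1 2]
```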
This is the template I use to start new full-stack projects.
PyTorch code for the Energy-Based Transformers paper -- generalizable reasoning and scalable learning
MiniHF is an inference, human preference data collection, and fine-tuning tool for local language models. It is intended to help the user develop their prompts into full models.
AI chat assistant for Obsidian with contextual awareness, smart writing assistance, and one-click edits. Features vault-aware conversations, semantic search, and local model support.
Inspect: A framework for large language model evaluations
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
DSPy: The framework for programming—not prompting—language models
SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]
A modern cookiecutter template for Python projects that use Poetry for dependency management
Contains random samples referenced in the paper "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training".
Code associated with the paper "Soft Prompts for Evaluation".
List of projects related to EA Software Engineers
Machine Learning for Alignment Bootcamp