- Dublin, Ireland
- stephenmcaleese.com
Stars
Inspect: A framework for large language model evaluations
A simple PyTorch implementation of GPT-2, optimized to run on MacBook Pro M1/M2.
Secrets of RLHF in Large Language Models Part I: PPO
Code for my Master's thesis: I generate Counterfactual Trajectory Explanations about Reward Functions that were learned with Inverse Reinforcement Learning
This project focuses on understanding sycophantic behavior in LLMs.
Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
Steering Llama 2 with Contrastive Activation Addition
LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Reproduction of OpenAI and DeepMind's "Deep Reinforcement Learning from Human Preferences"
Website to track people, organizations, and products (tools, websites, etc.) in AI safety
An Obsidian starter kit for LessWrong, Effective Altruism, AI Alignment, etc.
Master programming by recreating your favorite technologies from scratch.
An opinionated guide on how to become a professional Web/Mobile App Developer.
[NeurIPS 2023] Reflexion: Language Agents with Verbal Reinforcement Learning
Mac app for crushing tech interviews with AI
A map of the AI alignment landscape
Personal Website built using GatsbyJS and Strapi
Models for data stocks and training dataset sizes
Code for Deep RL from Human Preferences [Christiano et al.], plus a webapp for collecting human feedback
My website/blog thing. Made with Jekyll.
Exercise solutions and explanations for the book Probability Theory: The Logic of Science by E.T. Jaynes. Created by the reading group at r/jaynesprobability
Multiversal tree writing interface for human-AI collaboration