Hao Sun holarissun

🎯

Focusing

PhD in Reinforcement Learning, LLM Alignment, RLHF

124 followers · 37 following

University of Cambridge
https://holarissun.github.io/
@HolarisSun

Stars

JacobPfau / fillerTokens

Python 74 7 Updated Apr 27, 2024

ethansbrown / acpc

Projects related to Annual Computer Poker Competition

C 15 11 Updated Sep 19, 2016

google-deepmind / open_spiel

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.

C++ 5,180 1,127 Updated Apr 26, 2026

google-deepmind / game_arena

Python 104 23 Updated Feb 2, 2026

TextArena / UnstableBaselines

Python 120 14 Updated Apr 7, 2026

TextArena / TextArena

A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning

Python 394 90 Updated Apr 15, 2026

keithlee96 / pluribus-poker-AI

Forked from fedden/poker_ai

🤖 An Open Source Texas Hold'em AI

Python 347 71 Updated Oct 22, 2023

HenryRLee / PokerHandEvaluator

Poker-Hand-Evaluator: An efficient poker hand evaluation algorithm and its implementation, supporting 7-card poker and Omaha poker evaluation

C 494 106 Updated Nov 25, 2025

ge-ne / bibtool

BibTool is a tool for manipulating BibTeX data bases. BibTeX provides a mean to integrate citations into LaTeX documents. BibTool allows the manipulation of BibTeX files which goes beyond the possi…

C 239 33 Updated Jan 14, 2026

inclusionAI / AReaL

The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.

Python 5,121 485 Updated Apr 30, 2026

hiyouga / EasyR1

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python 4,900 371 Updated Apr 6, 2026

proroklab / popgym

Partially Observable Process Gym

Python 214 20 Updated Jun 12, 2025

ruixin31 / Spurious_Rewards

Python 359 20 Updated Jul 29, 2025

zhangxy-2019 / critique-GRPO

Python 64 3 Updated Mar 8, 2026

verl-project / verl

verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework

Python 21,026 3,775 Updated Apr 30, 2026

jingyangcarl / openreview

Python 11 2 Updated Apr 29, 2026

span-man / ebooks

178 61 Updated Aug 26, 2020

tengxiao1 / SimPER

SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters (ICLR 2025)

Python 17 Updated Aug 22, 2025

facebookresearch / xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 10,439 773 Updated Apr 21, 2026

YunyiShen / ARM-FI

Active reward modeling with last layer Fisher Information (ICML'25)

Python 7 Updated Jul 9, 2025

QwenLM / Qwen3

Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.

Python 27,187 1,983 Updated Jan 9, 2026

alirezadir / Machine-Learning-Interviews

This repo is meant to serve as a guide for Machine Learning/AI technical interviews.

Jupyter Notebook 8,116 1,457 Updated Nov 28, 2025

holarissun / embedding-based-llm-alignment

Codebase for Paper Reusing Embeddings: Reproducible Reward Model Research in Large Language Model Alignment without GPUs

Python 22 2 Updated Apr 24, 2025

Linear95 / SPAG

Self-playing Adversarial Language Game Enhances LLM Reasoning, NeurIPS 2024

Python 144 23 Updated Feb 24, 2025

BlackHC / batchbald_redux

Reusable BatchBALD implementation

Jupyter Notebook 77 15 Updated Feb 28, 2024

deepseek-ai / DeepSeek-R1

92,002 11,729 Updated Jun 27, 2025

opendilab / DI-engine

OpenDILab Decision AI Engine. The Most Comprehensive Reinforcement Learning Framework B.P.

Python 3,621 431 Updated Dec 7, 2025

google-deepmind / alphafold3

AlphaFold 3 inference pipeline.

Python 7,922 1,196 Updated Apr 23, 2026

BlinkDL / RWKV-LM

RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RN…

Python 14,498 1,006 Updated Apr 28, 2026

holarissun / RewardModelingBeyondBradleyTerry

official implementation of ICLR'2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and Alternatives

Python 73 5 Updated Apr 2, 2025
