kykim0

Organizations

@sisl @JuliaPOMDP @StanfordVL

Starred repositories

LLM Chess - evaluating Large Language Models' reasoning and instruction-following abilities by simulating chess games

Jupyter Notebook · 91 stars · 9 forks · Updated Feb 12, 2026

A collection of LLM pruning implementations, training code for GPUs and TPUs, and evaluation scripts.

Python · 60 stars · 8 forks · Updated Feb 3, 2026

CATArena is an engineering-level tournament evaluation platform for LLM-driven code agents, based on an iterative, competitive peer-learning framework.

Python · 59 stars · 10 forks · Updated Dec 25, 2025

"AI-Trader: Can AI Beat the Market?" Live trading bench: https://ai4trade.ai · Tech report: https://arxiv.org/abs/2512.10971

Python · 11,274 stars · 1,887 forks · Updated Dec 19, 2025

Synthetic data curation for post-training and structured data extraction

Python · 1,632 stars · 134 forks · Updated Jan 24, 2026

Benchmark LLM reasoning capability by solving chess puzzles.

Python · 90 stars · 5 forks · Updated Apr 26, 2025

Training VLM agents with multi-turn reinforcement learning

Python · 412 stars · 47 forks · Updated Feb 13, 2026

Harsh Jhamtani*, Varun Gangal*, Eduard Hovy, Graham Neubig, Taylor Berg-Kirkpatrick. Learning to Generate Move-by-Move Commentary for Chess Games from Large-Scale Social Forum Data. ACL 2018

OpenEdge ABL · 45 stars · 11 forks · Updated Jul 21, 2022

Open source neural network chess engine with GPU acceleration and broad hardware support.

C++ · 2,983 stars · 614 forks · Updated Dec 30, 2025

A Text-Based Environment for Interactive Debugging

Python · 293 stars · 39 forks · Updated Feb 13, 2026

This is the official GitHub repository for our survey paper "Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models".

173 stars · 4 forks · Updated May 14, 2025

Fully open reproduction of DeepSeek-R1

Python · 25,880 stars · 2,413 forks · Updated Nov 24, 2025

[ICLR 2026] Learning to Reason without External Rewards

Python · 390 stars · 41 forks · Updated Jan 26, 2026

A library for generative social simulation

Python · 1,203 stars · 281 forks · Updated Feb 16, 2026

AI paper-trading project inspired by nof1's Alpha Arena, using ccxt for market quotes.

Python · 538 stars · 139 forks · Updated Nov 21, 2025

Procgen Benchmark: Procedurally-Generated Game-Like Gym-Environments

C++ · 1,141 stars · 216 forks · Updated Jan 3, 2024

Defeating the Training-Inference Mismatch via FP16

Python · 182 stars · 15 forks · Updated Nov 14, 2025

Natural Language Reinforcement Learning

Python · 101 stars · 7 forks · Updated Jul 30, 2025

Python · 12 stars · 3 forks · Updated Jul 10, 2025

Post-training with Tinker

Python · 2,843 stars · 323 forks · Updated Feb 17, 2026

A library for mechanistic interpretability of GPT-style language models

Python · 3,087 stars · 510 forks · Updated Feb 17, 2026

MARSHAL: Incentivizing Multi-Agent Reasoning via Self-Play with Strategic LLMs

Python · 37 stars · 1 fork · Updated Feb 11, 2026

[ICLR 2026] Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play.

Python · 115 stars · 4 forks · Updated Feb 6, 2026

Jupyter Notebook · 4 stars · Updated Dec 16, 2025

An extensible benchmark for evaluating large language models on planning

PDDL · 450 stars · 47 forks · Updated Sep 17, 2025

Awesome List for Agentic RL

HTML · 798 stars · 34 forks · Updated Feb 10, 2026

A-MEM: Agentic Memory for LLM Agents

Python · 271 stars · 44 forks · Updated Nov 21, 2025

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Python · 17,691 stars · 2,878 forks · Updated Nov 3, 2025

All the source code for "Robot Learning: A Tutorial". Get involved to be featured in the next iteration!

TeX · 465 stars · 54 forks · Updated Feb 4, 2026