Soichiro Nishimori nissymori

PhD student. Interested in Game AI, JAX-based RL, offline RL and exploration.

38 followers · 21 following

The University of Tokyo
Tokyo, Japan
https://nissymori.github.io/
@nissymori1

Stars

nissymori / ReMAC

The Official JAX Code for "Retry Policy Gradients for Continuous Action Spaces"

Python 3 Updated Jun 5, 2026

paavo5 / ordergrad

Policy gradient reinforcement learning beyond the mean, e.g., Pass@k, Max@k, TopM@K, CVaR, VaR, Quantiles, Trimmed Means or other functionals of the reward distribution based on unbiased order-stat…

Python 11 1 Updated Jun 5, 2026

proppo / proppo

Proppo is a prototype Automatic Propagation software library, a generalization of Automatic Differentiation.

Python 9 Updated Nov 30, 2022

RobertTLange / headless-cli

One unified CLI for headless coding agent execution 🤖

TypeScript 22 3 Updated Jun 18, 2026

ishida-lab / capbencher

[ICML 2026] CapBencher toolkit: Give your LLM benchmark a built-in alarm for leakage and gaming

Python 8 1 Updated May 29, 2026

ImIntheMiddle / awesome-online-prediction

[English/Japanese] A curated list of awesome online-prediction papers, libraries, and resources. Created and hosted by MIRU2025 Young Researchers Program group 5.

8 Updated Aug 19, 2025

easonyu0203 / nash_policy_gradient_public

Python 3 Updated May 2, 2026

nissymori / remax-rl

[ICML2026] Official JAX code for Emergence of Exploration in Policy Gradient Reinforcement Learning via Retrying

Python 8 Updated Jun 5, 2026

takashiishida / bibfixer

A Python tool that automatically cleans, completes, and standardizes BibTeX entries using LLMs and web search.

Python 186 7 Updated Jun 10, 2026

takashiishida / arxiv-to-prompt

Transform arXiv papers into a single LaTeX source that can be used as a prompt for asking LLMs questions about the paper.

Python 163 10 Updated Jun 10, 2026

takashiishida / arxiv-latex-mcp

MCP server that uses arxiv-to-prompt to fetch and process arXiv LaTeX sources for precise interpretation of mathematical expressions in scientific papers.

Python 138 14 Updated Jun 17, 2026

aiming-lab / AutoResearchClaw

Fully autonomous & self-evolving research from idea to paper. Chat an Idea. Get a Paper. 🦞

Python 13,485 1,582 Updated Jun 3, 2026

karpathy / autoresearch

AI agents running research on single-GPU nanochat training automatically

Python 87,593 12,668 Updated Mar 26, 2026

666ghj / MiroFish

A Simple and Universal Swarm Intelligence Engine, Predicting Anything.

Python 66,784 10,414 Updated May 24, 2026

JohannesAck / gradientregularization_trl

Implementation for our paper "Gradient Regularization prevents Reward Hacking in RLHF and RLVR". Implemented in TRL and for Hugging Face Transformers.

Python 11 Updated Feb 24, 2026

softmatcha / softmatcha2

A fast and soft pattern search for trillion-scale corpora.

Python 230 11 Updated Feb 28, 2026

softmatcha / softmatcha.github.io

Forked from nerfies/nerfies.github.io

HTML 3 Updated May 19, 2026

AtaraxosAI / stratego

C++ 10 1 Updated Feb 18, 2026

rutopio / mahjong-font

https://mahjongfont.pages.dev - Japanese Mahjong (Riichi Mahjong) Font with OpenType｜Mahjong Tile Image Generator

CSS 26 2 Updated Jun 14, 2026

smly / RiichiEnv

High-Performance Research Environment for Riichi Mahjong

Rust 61 14 Updated May 9, 2026

st-tech / zr-obp

Open Bandit Pipeline: a python library for bandit algorithms and off-policy evaluation

Python 704 99 Updated Jun 3, 2024

nissymori / mahjax

A GPU-Accelerated Mahjong Simulator for RL in JAX

Python 50 6 Updated May 27, 2026

MizuhoAOKI / jax_generative_models

Minimal JAX implementation unifying Diffusion and Flow Matching algorithms as alternative strategies for transporting data distributions.

Python 66 3 Updated Dec 19, 2025

R-Yin-217 / Towards-Scalable-Oversight-via-Partitioned-Human-Supervision

Python 8 Updated Feb 8, 2026

naruya / gaussian-vrm

Instant Skinned Gaussian Avatars for Web, Mobile and VR Applications

JavaScript 411 27 Updated May 30, 2026

google-deepmind / md4

Official JAX Implementation of MD4 Masked Diffusion Models

Python 161 17 Updated Feb 27, 2025

nissymori / JAX-CORL

Clean single-file implementation of offline RL algorithms in JAX

Python 180 4 Updated Jun 5, 2026

Tang-Yuting / recursive-reward-aggregation

Jupyter Notebook 11 1 Updated Aug 8, 2025

openai / gpt-oss

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 20,173 2,096 Updated Jun 9, 2026

motokiomura / Q-DOT

[RLC 2025] Official code repository for "Offline Reinforcement Learning with Wasserstein Regularization via Optimal Transport Maps"

Python 3 1 Updated Oct 20, 2025