nissymori

Soichiro Nishimori nissymori

PhD student. Interested in Game AI, JAX-based RL, offline RL and exploration.

40 followers · 21 following

The University of Tokyo
Tokyo, Japan
https://nissymori.github.io/
@nissymori1

Stars

sotetsuk / causal-ai-ja

Support page for the book "Causal AI: Learning Causal Inference with a Code-First Approach"

Jupyter Notebook 1 Updated Jul 23, 2026

next-state / open-dreamer

Open-source Dreamer world-model implementation in JAX

Python 308 25 Updated Jul 26, 2026

Infatoshi / craftax.cu

CUDA Craftax-Classic: 7x faster RL training than JAX

Cuda 73 1 Updated Jul 25, 2026

AlexGoldie / discogen

Official implementation of DiscoGen, for "Procedural Generation of Algorithm Discovery Tasks in Machine Learning"

Python 48 10 Updated Jul 2, 2026

KazukiOhta / klent

Revisiting Regularized Policy Optimization for Stable and Efficient Reinforcement Learning in Two-Player Games

Python 4 Updated Jul 12, 2026

trotsky1997 / OpenFugu

Open reimplementation of Sakana Fugu — the 'one model to command them all' LLM orchestrator. Read → run → train → serve.

Python 440 82 Updated Jun 22, 2026

ishida-lab / irreducible

[ICLR 2023] Is the Performance of My Deep Network Too Good to Be True? A Direct Approach to Estimating the Bayes Error in Binary Classification

Python 24 Updated Aug 12, 2025

ishida-lab / capcode

CapCode: Detecting cheating in coding agents with capped, randomized tests

Python 5 Updated Jun 8, 2026

ishida-lab / capreward

CapReward: Penalizing implausibly high pass rates to prevent reward hacking in coding RL

Python 5 Updated Jun 8, 2026

nissymori / ReMAC

The Official JAX Code for "Retry Policy Gradients for Continuous Action Spaces"

Python 3 Updated Jun 5, 2026

paavo5 / ordergrad

Policy gradient reinforcement learning beyond the mean, e.g., Pass@k, Max@k, TopM@K, CVaR, VaR, Quantiles, Trimmed Means or other functionals of the reward distribution based on unbiased order-stat…

Python 13 1 Updated Jun 5, 2026

proppo / proppo

Proppo is a prototype Automatic Propagation software library, a generalization of Automatic Differentiation.

Python 9 Updated Nov 30, 2022

RobertTLange / headless-cli

One unified CLI for headless coding agent execution 🤖

TypeScript 33 4 Updated Jul 27, 2026

ishida-lab / capbencher

[ICML 2026] CapBencher toolkit: Give your LLM benchmark a built-in alarm for leakage and gaming

Python 11 1 Updated May 29, 2026

ImIntheMiddle / awesome-online-prediction

[English/Japanese] A curated list of awesome online-prediction papers, libraries, and resources. Created and hosted by MIRU2025 Young Researchers Program group 5.

8 Updated Aug 19, 2025

easonyu0203 / nash_policy_gradient_public

Python 4 Updated May 2, 2026

nissymori / remax-rl

[ICML2026] Official JAX code for Emergence of Exploration in Policy Gradient Reinforcement Learning via Retrying

Python 15 Updated Jul 3, 2026

takashiishida / bibfixer

A Python tool that automatically cleans, completes, and standardizes BibTeX entries using LLMs and web search.

Python 185 7 Updated Jun 10, 2026

takashiishida / arxiv-to-prompt

Transform arXiv papers into a single LaTeX source that can be used as a prompt for asking LLMs questions about the paper.

Python 167 10 Updated Jun 10, 2026

takashiishida / arxiv-latex-mcp

MCP server that uses arxiv-to-prompt to fetch and process arXiv LaTeX sources for precise interpretation of mathematical expressions in scientific papers.

Python 142 14 Updated Jul 17, 2026

aiming-lab / AutoResearchClaw

Fully autonomous & self-evolving research from idea to paper. Chat an Idea. Get a Paper. 🦞

Python 13,919 1,630 Updated Jul 13, 2026

karpathy / autoresearch

AI agents running research on single-GPU nanochat training automatically

Python 92,301 13,171 Updated Mar 26, 2026

666ghj / MiroFish

A Simple and Universal Swarm Intelligence Engine, Predicting Anything.

Python 69,654 10,883 Updated Jul 23, 2026

JohannesAck / gradientregularization_trl

Implementation for our paper "Gradient Regularization prevents Reward Hacking in RLHF and RLVR". Implemented for TRL and Hugging Face Transformers.

Python 12 Updated Feb 24, 2026

softmatcha / softmatcha2

A fast and soft pattern search for trillion-scale corpora.

Python 238 11 Updated Feb 28, 2026

softmatcha / softmatcha.github.io

Forked from nerfies/nerfies.github.io

HTML 3 Updated Jul 22, 2026

AtaraxosAI / stratego

C++ 13 1 Updated Feb 18, 2026

rutopio / mahjong-font

https://mahjong.chingru.com - Japanese Mahjong Font (Riichi Mahjong) turns tile notation like 7m7m7m2p3p4p into inline-SVG mahjong hands, with an embeddable shields.io-style Image API.

TypeScript 29 2 Updated Jul 14, 2026

smly / RiichiEnv

High-Performance Research Environment for Riichi Mahjong

Rust 69 18 Updated May 9, 2026

st-tech / zr-obp

Open Bandit Pipeline: a python library for bandit algorithms and off-policy evaluation

Python 704 100 Updated Jun 3, 2024
