The Official JAX Code for "Retry Policy Gradients for Continuous Action Spaces"

Python 3 Updated Jun 5, 2026

Policy gradient reinforcement learning beyond the mean, e.g., Pass@k, Max@k, TopM@K, CVaR, VaR, Quantiles, Trimmed Means or other functionals of the reward distribution based on unbiased order-stat…

Python 11 1 Updated Jun 5, 2026

Proppo is a prototype Automatic Propagation software library, a generalization of Automatic Differentiation.

Python 9 Updated Nov 30, 2022

One unified CLI for headless coding agent execution 🤖

TypeScript 22 3 Updated Jun 18, 2026

[ICML 2026] CapBencher toolkit: Give your LLM benchmark a built-in alarm for leakage and gaming

Python 8 1 Updated May 29, 2026

[English/Japanese] A curated list of awesome online-prediction papers, libraries, and resources. Created and hosted by MIRU2025 Young Researchers Program group 5.

8 Updated Aug 19, 2025

[ICML 2026] Official JAX code for "Emergence of Exploration in Policy Gradient Reinforcement Learning via Retrying"

Python 8 Updated Jun 5, 2026

A Python tool that automatically cleans, completes, and standardizes BibTeX entries using LLMs and web search.

Python 186 7 Updated Jun 10, 2026

Transform arXiv papers into a single LaTeX source that can be used as a prompt for asking LLMs questions about the paper.

Python 163 10 Updated Jun 10, 2026

MCP server that uses arxiv-to-prompt to fetch and process arXiv LaTeX sources for precise interpretation of mathematical expressions in scientific papers.

Python 138 14 Updated Jun 17, 2026

Fully autonomous & self-evolving research from idea to paper. Chat an Idea. Get a Paper. 🦞

Python 13,485 1,582 Updated Jun 3, 2026

AI agents running research on single-GPU nanochat training automatically

Python 87,593 12,668 Updated Mar 26, 2026

A Simple and Universal Swarm Intelligence Engine, Predicting Anything.

Python 66,784 10,414 Updated May 24, 2026

Implementation for our paper "Gradient Regularization prevents Reward Hacking in RLHF and RLVR". Implemented for TRL and Hugging Face Transformers.

Python 11 Updated Feb 24, 2026

A fast and soft pattern search for trillion-scale corpora.

Python 230 11 Updated Feb 28, 2026
C++ 10 1 Updated Feb 18, 2026

https://mahjongfont.pages.dev - Japanese Mahjong (Riichi Mahjong) Font with OpenType|Mahjong Tile Image Generator

CSS 26 2 Updated Jun 14, 2026

High-Performance Research Environment for Riichi Mahjong

Rust 61 14 Updated May 9, 2026

Open Bandit Pipeline: a Python library for bandit algorithms and off-policy evaluation

Python 704 99 Updated Jun 3, 2024

A GPU-Accelerated Mahjong Simulator for RL in JAX

Python 50 6 Updated May 27, 2026

Minimal JAX implementation unifying Diffusion and Flow Matching algorithms as alternative strategies for transporting data distributions.

Python 66 3 Updated Dec 19, 2025

Instant Skinned Gaussian Avatars for Web, Mobile and VR Applications

JavaScript 411 27 Updated May 30, 2026

Official JAX Implementation of MD4 Masked Diffusion Models

Python 161 17 Updated Feb 27, 2025

Clean single-file implementation of offline RL algorithms in JAX

Python 180 4 Updated Jun 5, 2026
Jupyter Notebook 11 1 Updated Aug 8, 2025

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 20,173 2,096 Updated Jun 9, 2026

[RLC 2025] Official code repository for "Offline Reinforcement Learning with Wasserstein Regularization via Optimal Transport Maps"

Python 3 1 Updated Oct 20, 2025