Stars
C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.
Grammars written for ANTLR v4, with the expectation that the grammars are free of actions.
A framework for modeling non-stationary Markov decision processes and the key decision making problems in these environments
A non-saturating, open-ended environment for evaluating LLMs in Factorio
verl: Volcano Engine Reinforcement Learning for LLMs
An interface library for RL post training with environments.
torchcomms: a modern PyTorch communications API
Machine Learning Engineering Open Book
The contents of /mnt/skills in Claude's code interpreter environment
Epistemic AlphaZero utilizes uncertainty to explore and learn even when AlphaZero gets stuck.
stdgpu: Efficient STL-like Data Structures on the GPU
Really Fast End-to-End JAX RL Implementations
Chisel: A Modern Hardware Design Language
open-source IEEE 802.11 WiFi baseband FPGA (chip) design: driver, software
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
A Survey of Reinforcement Learning for Large Reasoning Models
An open-source efficient deep learning framework/compiler, written in Python.
Source code for Self-Evaluation Guided MCTS for online DPO.
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback