-
agent-skills-for-compute Public
Agent-optimized skills for the full LLM lifecycle — pre-training, post-training (RL/DPO/RLHF), inference, and autonomous research — plus GPU/TPU/QPU kernel programming, simulation, and scientific c…
-
Mem-RLM Public
Memory augmented inference library for Recursive Language Models (RLMs), built on top of rlm.
-
ContextJira Public
Chrome Extension for extracting AI-ready Markdown from Jira Cloud & Server. Copy issue context — metadata, descriptions, comments, linked issues, attachments. Built for Claude, ChatGPT, Copilot and…
-
Implementation of Lin et al., 2025.
-
sglang Public
Forked from sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
Python · Apache License 2.0 · Updated Nov 26, 2025
-
mirage Public
Forked from mirage-project/mirage
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
C++ · Apache License 2.0 · Updated Nov 8, 2025
-
Streaming-DeepAgents Public
Streaming and task delegation for LangChain's Deepagents
-
awesome-gemini-cli Public
A curated list of awesome resources, tools, workflows, and guides for Google's Gemini CLI
-
SynthToT Public
SynthToT: Generate synthetic training datasets through deliberate problem-solving (Tree of Thoughts, Yao et al., 2023).
-
cpp-langchain Public
Tool for executing C/C++ code snippets with LangChain agents.
-
xLSTM-Jax Public
Jax implementation of xLSTM: Extended Long Short-Term Memory by Beck et al. (2024)
-
Mixture-of-Depths-Jax Public
Jax module for the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
-
LongConv-Jax Public
Jax/Flax/Linen implementation of "Simple Hardware-Efficient Long Convolutions for Sequence Modeling"
-
Tri-RMSNorm Public
Efficient kernel for RMS normalization with fused operations, including forward and backward passes and PyTorch compatibility.
-
GradientAscent-Jax Public
Custom gradient ascent solver (optimizer) for JAX/Flax models
-
Ring-Attention-Jax Public
Packaged Ring Attention with Blockwise Transformers for Near-Infinite Context implemented in Jax + Flax.
-
Griffin-Jax Public
Jax implementation of "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models"
-
triton-activations Public
Collection of neural network activation function kernels for Triton Language Compiler by OpenAI
-
PaLM-rlhf-pytorch-DS Public
Forked from lucidrains/PaLM-rlhf-pytorch
Modified DeepSpeed training setup fork of RLHF (Reinforcement Learning with Human Feedback) by lucidrains on top of the PaLM architecture. Basically ChatGPT but with PaLM.
Python · MIT License · Updated Apr 11, 2024
-
MEGABYTE-pytorch-DS Public
Forked from lucidrains/MEGABYTE-pytorch
Modified DeepSpeed training setup fork of MEGABYTE - PyTorch by lucidrains: Predicting Million-byte Sequences with Multiscale Transformers, in PyTorch.
Python · MIT License · Updated Apr 11, 2024
-
kmeansops Public
PyKeOps-powered K-Means clustering module for both CPU and GPU
-
jax-triton Public
Forked from jax-ml/jax-triton
jax-triton contains integrations between JAX and OpenAI Triton
Python · Apache License 2.0 · Updated Mar 12, 2024
-
mpi-ds Public
MPI Operator DeepSpeed Base Configuration for CIFAR-10
-
miniF2F-code Public
Dataset of formal Olympiad-level mathematics problems solved with Python code instructions.
-
smooth-activations Public
Smooth ReLU activations in CUDA. Shamir, G. I., et al.