View alecco's full-sized avatar
  • Spain


Quartet II Official Code

Python 75 9 Updated May 1, 2026

High-throughput tensor loading for PyTorch

Python 256 15 Updated Mar 11, 2026
Python 4 3 Updated May 7, 2026
Python 10 Updated May 26, 2026

100M tokens. Infinite compute. Lowest val loss wins.

Python 503 75 Updated Jun 15, 2026

CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs

Python 216 22 Updated Jun 22, 2026

Long Context Pre-Training with Lighthouse Attention

Python 58 12 Updated May 9, 2026

The agent that grows with you

Python 199,923 35,571 Updated Jun 22, 2026

CUDA kernels that exploit LLM sparsity to improve throughput and reduce memory requirements during inference and training.

Cuda 246 23 Updated May 14, 2026

Muon is an optimizer for hidden layers in neural networks

Python 2,674 125 Updated May 24, 2026

NanoGPT (124M) in 90 seconds

Python 5,434 816 Updated Jun 21, 2026

SmoothE: Differentiable E-Graph Extraction (ASPLOS'25 Best Paper)

Python 33 3 Updated Jan 15, 2026

TokenSpeed is a speed-of-light LLM inference engine.

Python 1,480 166 Updated Jun 22, 2026

cuDNN Frontend is NVIDIA's modern, open-source entry point to the cuDNN library and a growing collection of high-performance open-source kernels.

Python 851 191 Updated Jun 17, 2026

Node0: A collaborative event powered by Protocol Learning, our decentralized approach to AI development

Python 95 29 Updated Sep 23, 2025

Learn CUDA with PyTorch

Cuda 337 50 Updated Jun 1, 2026

Benchmark and deploy optimized LLM models on GPU servers with vLLM or SGLang. Choose from a list of optimized recipes for popular models or create your own with custom configurations. Run benchmarks…

Python 60 8 Updated Jun 22, 2026

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda 3,428 434 Updated Jan 17, 2026

A beautiful, simple, clean, and responsive Jekyll theme for academics

HTML 15,759 13,072 Updated Jun 22, 2026

Every Code - push frontier AI to its limits. A fork of the Codex CLI with validation, automation, browser integration, multi-agents, theming, and much more. Orchestrate agents from OpenAI, Claude, G…

Rust 3,809 232 Updated Jun 22, 2026

FlashKDA: high-performance Kimi Delta Attention kernels

Cuda 449 40 Updated May 26, 2026

how few training tokens can you use to reach a target validation loss?

Python 5 Updated Apr 6, 2026

Accelerating MoE with IO and Tile-aware Optimizations

Python 719 90 Updated Jun 15, 2026

A dedicated effort to make an optimized, bleeding edge vLLM image using Docker to support DGX comprehensively

Cuda 118 22 Updated Feb 22, 2026

FlashInfer: Kernel Library for LLM Serving

Python 5,838 1,070 Updated Jun 22, 2026

Experiment on replacing the Scaled Dot-Product Attention in Transformers with a distance-based metric: the Radial Basis Function (RBF) kernel.

Python 25 3 Updated Apr 19, 2026

Trains small LMs. Designed for training on SimpleStories

Python 14 5 Updated Sep 15, 2025

Dataset Generation Code for SimpleStories

Jupyter Notebook 12 2 Updated Dec 19, 2025

🎨 NeMo Data Designer: Generate high-quality synthetic data from scratch or from seed data.

Python 2,025 187 Updated Jun 22, 2026