View duoan's full-sized avatar

Scalable and Performant Data Loading

Python 355 21 Updated Dec 22, 2025

Curated collection of papers in machine learning systems

481 34 Updated Dec 13, 2025

A curated list of awesome projects and papers for distributed training or inference

261 30 Updated Oct 8, 2024

Collective communications library with various primitives for multi-machine training.

C++ 1,379 339 Updated Dec 2, 2025

Build LLM from scratch

Python 73 5 Updated Nov 19, 2025

The hub for EleutherAI's work on interpretability and learning dynamics

Jupyter Notebook 2,691 200 Updated Nov 15, 2025
Python 18 4 Updated Mar 11, 2025

[NeurIPS 2025] AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning

340 8 Updated Dec 2, 2025

A summary of existing representative LLM text datasets.

1,404 139 Updated Oct 11, 2025

NanoGPT (124M) in 3 minutes

Python 3,985 526 Updated Dec 21, 2025

Tutel MoE: Optimized Mixture-of-Experts Library, Support GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4

C 950 107 Updated Dec 21, 2025
Python 1,511 219 Updated Jun 26, 2025

An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collec…

Jupyter Notebook 2,783 134 Updated Jan 10, 2025

🔍 LangKit: An open-source toolkit for monitoring Large Language Models (LLMs). 📚 Extracts signals from prompts & responses, ensuring safety & security. 🛡️ Features include text quality, relevance m…

Jupyter Notebook 969 72 Updated Nov 22, 2024

Easy data preparation with the latest LLM-based operators and pipelines.

Python 1,645 118 Updated Dec 21, 2025

Best practices & guides on how to write distributed pytorch training code

Python 552 60 Updated Oct 22, 2025
Jupyter Notebook 849 535 Updated Nov 12, 2025

A library for mechanistic interpretability of GPT-style language models

Python 2,902 483 Updated Dec 7, 2025

📝 A simple and elegant markdown editor, available for Linux, macOS and Windows.

JavaScript 52,871 3,896 Updated Nov 19, 2025

An open source implementation of CLIP.

Python 13,143 1,220 Updated Nov 4, 2025

Recreating every milestone in Machine Learning and Artificial Intelligence

Python 2 Updated Oct 30, 2025

TinySigLIP: SigLIP Distillation via Affinity Mimicking and Weight Inheritance

Python 1 Updated Dec 2, 2025

Seek Vision Language Model based on Siglip and Qwen

Python 1 Updated Dec 16, 2025

A Framework of Small-scale Large Multimodal Models

Python 938 95 Updated Apr 26, 2025

Lightweight Nearest Neighbors with Flexible Backends

Python 324 10 Updated Oct 5, 2025

State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!

Jupyter Notebook 1,944 127 Updated Dec 18, 2025

🐹 Deep clean and optimize your Mac.

Shell 11,644 383 Updated Dec 22, 2025

An orchestration platform for the development, production, and observation of data assets.

Python 14,632 1,913 Updated Dec 19, 2025

Official PyTorch implementation for "Large Language Diffusion Models"

Python 3,414 230 Updated Nov 12, 2025

"Deep Learning Crash Course" is a comprehensive and up-to-date guide that takes you from simple neural networks all the way to cutting-edge deep learning architectures, no advanced math and programm…

Jupyter Notebook 81 26 Updated Dec 11, 2025