Accelerating MoE with IO and Tile-aware Optimizations

Python 192 6 Updated Dec 18, 2025

LM engine is a library for pretraining/finetuning LLMs

Python 87 23 Updated Dec 18, 2025

Zero Bubble Pipeline Parallelism

Python 442 31 Updated May 7, 2025

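🎯 Say goodbye to information overload: AI helps you make sense of trending news, with simple public-opinion monitoring and analysis. A multi-platform trending-topic aggregator plus an MCP-based AI analysis tool. Monitors 35 platforms (Douyin, Zhihu, Bilibili, Wallstreetcn, Cailian Press, etc.), with smart filtering, automatic push, and AI conversational analysis (dig into news in natural language with 13 tools such as trend tracking, sentiment analysis, and similarity search). Supports push via WeCom/personal WeChat/Feishu/DingTalk/Telegram/email/ntfy/bark/slack, phone notifications within 1 minute, no need…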

Python 39,578 20,680 Updated Dec 18, 2025

Run LLMs with MLX

Python 3,078 332 Updated Dec 18, 2025

Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention

Python 33 3 Updated Oct 16, 2025

🔥 A minimal training framework for scaling FLA models

Python 322 49 Updated Nov 15, 2025

A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

Python 2,927 228 Updated Jun 19, 2025

The best ChatGPT that $100 can buy.

Python 38,859 4,902 Updated Dec 9, 2025

CUDA Python: Performance meets Productivity

Cython 3,095 233 Updated Dec 18, 2025

Official PyTorch implementation for "Large Language Diffusion Models"

Python 3,412 231 Updated Nov 12, 2025

learning how CUDA works

Cuda 351 45 Updated Mar 3, 2025

GPU documentation for humans

Python 415 51 Updated Dec 9, 2025

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda 1,206 177 Updated Jul 29, 2023

Complete solutions to the Programming Massively Parallel Processors Edition 4

Jupyter Notebook 608 81 Updated Jun 18, 2025

A guide to hand-writing CUDA operators and to CUDA interviews

Cuda 735 80 Updated Aug 23, 2025

LeetGPU Challenges

Python 543 41 Updated Dec 11, 2025

Puzzles for learning Triton, play it with minimal environment configuration!

Python 574 73 Updated Nov 30, 2025

A 120-day CUDA learning plan covering daily concepts, exercises, pitfalls, and references (including “Programming Massively Parallel Processors”). Features six capstone projects to solidify GPU par…

Shell 820 94 Updated Mar 29, 2025

This repository is a curated collection of resources, tutorials, and practical examples designed to guide you through the journey of mastering CUDA programming. Whether you're just starting or look…

427 38 Updated Feb 22, 2025

A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.

Python 441 27 Updated Mar 10, 2025

Python 107 8 Updated Sep 22, 2025

A unified inference and post-training framework for accelerated video generation.

Python 2,833 226 Updated Dec 18, 2025

A simple, unified multimodal models training engine. Lean, flexible, and built for hacking at scale.

Python 680 26 Updated Dec 17, 2025

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 16,321 3,238 Updated Dec 18, 2025

⚡️SwanLab - an open-source, modern-design AI training tracking and visualization tool. Supports Cloud / Self-hosted use. Integrated with PyTorch / Transformers / verl / LLaMA Factory / ms-swift / U…

Python 3,276 176 Updated Dec 18, 2025

How to optimize some algorithms in CUDA.

Cuda 2,696 244 Updated Dec 6, 2025

flash attention tutorial written in python, triton, cuda, cutlass

Cuda 459 50 Updated May 14, 2025

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda 1,023 100 Updated Dec 30, 2024