Stars
Accelerating MoE with IO and Tile-aware Optimizations
LM engine is a library for pretraining and finetuning LLMs.
Zero Bubble Pipeline Parallelism
🎯 Say goodbye to information overload: AI helps you make sense of trending news, with simple public-opinion monitoring and analysis. A multi-platform trending-topic aggregator plus an MCP-based AI analysis tool. Monitors 35 platforms (Douyin, Zhihu, Bilibili, Wallstreetcn, Cailian Press, etc.), with smart filtering, automatic push, and conversational AI analysis (dig into news in natural language: trend tracking, sentiment analysis, similarity search, and more across 13 tools). Supports push via WeCom / personal WeChat / Feishu / DingTalk / Telegram / email / ntfy / bark / Slack, with phone notifications within 1 minute, no need to…
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
🔥 A minimal training framework for scaling FLA models
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
CUDA Python: Performance meets Productivity
Official PyTorch implementation for "Large Language Diffusion Models"
A series of GPU optimization topics that explains in detail how to optimize CUDA kernels, covering several basic kernel optimizations, including elementwise, reduce, s…
Complete solutions to the Programming Massively Parallel Processors Edition 4
Puzzles for learning Triton; play with minimal environment configuration!
A 120-day CUDA learning plan covering daily concepts, exercises, pitfalls, and references (including “Programming Massively Parallel Processors”). Features six capstone projects to solidify GPU par…
This repository is a curated collection of resources, tutorials, and practical examples designed to guide you through the journey of mastering CUDA programming. Whether you're just starting or look…
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
A unified inference and post-training framework for accelerated video generation.
A simple, unified multimodal models training engine. Lean, flexible, and built for hacking at scale.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
⚡️SwanLab - an open-source, modern-design AI training tracking and visualization tool. Supports Cloud / Self-hosted use. Integrated with PyTorch / Transformers / verl / LLaMA Factory / ms-swift / U…
How to optimize some algorithms in CUDA.
Flash Attention tutorial written in Python, Triton, CUDA, and CUTLASS.
Flash Attention in ~100 lines of CUDA (forward pass only)