cjmcv

Chen Jianming cjmcv

45 followers · 148 following

tilelang Public
Forked from tile-ai/tilelang

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

Python Other Updated May 15, 2026
Megakernels Public
Forked from HazyResearch/Megakernels

Kernels, of the mega variety :)

Python MIT License Updated May 12, 2026
xalpha Public
Forked from refraction-ray/xalpha

Fund investment management and backtesting engine

Python MIT License Updated May 10, 2026
cutlass Public
Forked from NVIDIA/cutlass

CUDA Templates for Linear Algebra Subroutines

C++ 1 Other Updated Apr 13, 2026
ai-infra-notes Public

Reading notes on the open source code of AI infrastructure (sglang, llm, cutlass, hpc, etc.)

hpc gpu cuda inference simd cutlass heterogeneous-computing

6 Updated Mar 21, 2026
flash-attention Public
Forked from Dao-AILab/flash-attention

Fast and memory-efficient exact attention

Python 1 BSD 3-Clause "New" or "Revised" License Updated Jan 30, 2026
mirage Public
Forked from mirage-project/mirage

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ Apache License 2.0 Updated Jan 8, 2026
flux Public
Forked from bytedance/flux

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ Apache License 2.0 Updated Dec 19, 2025
tvm Public
Forked from apache/tvm

Open Machine Learning Compiler Framework

Python Apache License 2.0 Updated Nov 27, 2025
mlc-llm Public
Forked from mlc-ai/mlc-llm

Universal LLM Deployment Engine with ML Compilation

Python Apache License 2.0 Updated Nov 26, 2025
sglang Public
Forked from sgl-project/sglang

SGLang is a fast serving framework for large language models and vision language models.

Python 1 Apache License 2.0 Updated Oct 20, 2025
relax Public
Forked from mlc-ai/relax

Python Apache License 2.0 Updated Oct 11, 2025
nvshmem Public
Forked from NVIDIA/nvshmem

NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…

C++ Other Updated Sep 10, 2025
flashinfer Public
Forked from flashinfer-ai/flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda Apache License 2.0 Updated Aug 14, 2025
TensorRT-LLM Public
Forked from NVIDIA/TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…

C++ Apache License 2.0 Updated Jul 14, 2025
SageAttention Public
Forked from thu-ml/SageAttention

Quantized Attention achieves speedups of 2-5x and 3-11x compared to FlashAttention and xformers, without losing end-to-end metrics across language, image, and video models.

Cuda 1 Apache License 2.0 Updated Jul 4, 2025
marlin Public
Forked from IST-DASLab/marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Cuda Apache License 2.0 Updated Jun 29, 2025
hpc Public

Learning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc.)

hpc gpu vulkan opencl thread mpi parallel-computing

C++ 62 8 Apache License 2.0 Updated Mar 23, 2025
pocket-ai Public

A Portable Toolkit for deploying Edge AI and HPC (opencl, vulkan, simd, task scheduling)

hpc gpu vulkan opencl cuda heterogeneous task-scheduling

Python 2 MIT License Updated Mar 22, 2025
TinyNeuralNetwork Public
Forked from alibaba/TinyNeuralNetwork

TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.

Python MIT License Updated Mar 4, 2025
lighteval Public
Forked from huggingface/lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

Python 1 MIT License Updated Feb 10, 2025
vllm Public
Forked from vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 1 Apache License 2.0 Updated Feb 9, 2025
tflite_micro Public
Forked from tensorflow/tflite-micro

Infrastructure to enable deployment of ML models to low-power resource-constrained embedded targets (including microcontrollers and digital signal processors).

C++ Apache License 2.0 Updated Jul 17, 2024
cjmcv Public

Updated Mar 4, 2024
ecas Public

ECAS is a library for edge AI computing acceleration.

performance ai graph hpc gpu neon vulkan

C++ 2 MIT License Updated Jul 22, 2023
patterns Public

A collection of architectural patterns and design patterns.

patterns design-patterns gof architectural-patterns software-architecture

C++ 4 4 Apache License 2.0 Updated Oct 15, 2022
mxnet Public

A fork of apache/incubator-mxnet.

C++ 1 1 Apache License 2.0 Updated Oct 15, 2022
cpy Public

Notes on calling C from Python and Python from C.

C++ Apache License 2.0 Updated Aug 15, 2021
algorithm Public

C++ Apache License 2.0 Updated Sep 14, 2019
deeplearning-paper-notes Public

Reading notes on deep learning papers (2013-2018)

blog machine-learning algorithm computer-vision deep-learning notes paper

HTML 40 16 MIT License Updated Aug 24, 2019
