bigPYJ1151

🎯

Focusing

Li, Jiang bigPYJ1151

47 followers · 31 following

@intel

Achievements

x4 x3

Organizations

Lists (2)

My stack

11 repositories

TODO

3 repositories

Starred repositories

tanweai / pua

You are a P8-level engineer who once carried high hopes. When Anthropic first set your level, its expectations of you were high. A high-agency skill for agents to use. Your AI has been placed on a PIP. 30 days to show improvement.

TypeScript 16,363 940 Updated Apr 17, 2026

Tencent / hpc-ops

High Performance LLM Inference Operator Library

C++ 828 82 Updated Apr 13, 2026

InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 7,788 685 Updated Apr 17, 2026

ByteDance-Seed / Triton-distributed

Distributed Compiler based on Triton for Parallel Systems

Python 1,409 138 Updated Apr 17, 2026

RRZE-HPC / gpu-benches

collection of benchmarks to measure basic GPU capabilities

C++ 513 82 Updated Oct 24, 2025

xdit-project / xDiT

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python 2,597 315 Updated Apr 9, 2026

thuml / depyf

depyf is a tool to help you understand and adapt to PyTorch compiler torch.compile.

Python 801 28 Updated Oct 13, 2025

apache / gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.

Scala 1,548 595 Updated Apr 17, 2026

efeslab / Atom

[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Cuda 337 32 Updated Jul 2, 2024

mikeroyal / AMX-Guide

Advanced Matrix Extensions (AMX) Guide

C++ 113 8 Updated Jan 11, 2022

NVIDIA / cutlass

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,599 1,797 Updated Apr 17, 2026

openucx / ucc

Unified Collective Communication Library

C 303 128 Updated Apr 15, 2026

pigirons / cpufp

A CPU tool for benchmarking the peak of floating points

Assembly 580 132 Updated Feb 7, 2026

NVIDIA / TensorRT-LLM

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python 13,406 2,298 Updated Apr 18, 2026

ByteByteGoHq / system-design-101

Explain complex systems using visuals and simple terms. Help you prepare for system design interviews.

81,983 9,033 Updated Apr 4, 2025

mit-han-lab / streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Python 7,215 398 Updated Jul 11, 2024

timeplus-io / proton

⚡ Fastest SQL ETL pipeline in a single C++ binary, built for stream processing, observability, analytics and AI/ML

C++ 2,189 107 Updated Apr 17, 2026

linkedin / coral

Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages.

Java 897 209 Updated Apr 15, 2026

dendibakh / perf-ninja

This is an online course where you can learn and master the skill of low-level performance analysis and tuning.

C++ 3,654 368 Updated Apr 15, 2026

bloomberg / blazingmq

A modern high-performance open source message queuing system

C++ 3,160 181 Updated Apr 17, 2026

ggml-org / ggml

Tensor library for machine learning

C++ 14,463 1,561 Updated Apr 14, 2026

grgalex / nvshare

Practical GPU Sharing Without Memory Size Constraints

C 308 33 Updated Mar 28, 2025

MLIR-China / mlir-playground

Play with MLIR right in your browser

TypeScript 139 8 Updated May 25, 2023

facebookresearch / segment-anything

The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Jupyter Notebook 53,973 6,323 Updated Sep 18, 2024