  • Tencent
  • Shenzhen, Guangdong, China


FlashInfer: Kernel Library for LLM Serving

Python 5,795 1,050 Updated Jun 15, 2026

A general-purpose programmatic animation tool

Python 252 18 Updated Jun 6, 2026

ASCII generator (image to text, image to image, video to video)

Python 8,269 651 Updated Nov 22, 2024

D2 is a modern diagram scripting language that turns text to diagrams.

Go 24,412 688 Updated Apr 24, 2026

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Python 2,101 117 Updated Jul 29, 2024

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python 5,322 399 Updated Apr 20, 2026

A generative speech model for daily dialogue.

Python 39,459 4,244 Updated Apr 10, 2026

Fast and memory-efficient exact attention

Python 24,153 2,831 Updated Jun 10, 2026

LLM inference in C/C++

C++ 116,628 19,597 Updated Jun 15, 2026

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 22,147 2,697 Updated Jan 23, 2026

Scalable toolkit for efficient model alignment

Python 851 105 Updated Oct 6, 2025

Tile primitives for speedy kernels

Cuda 3,431 295 Updated Jun 15, 2026

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…

Python 3,394 749 Updated Jun 14, 2026

OpenMMLab Model Deployment Framework

Python 3,125 713 Updated Sep 30, 2024

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 7,900 701 Updated Jun 15, 2026

CUDA Library Samples

Cuda 2,436 461 Updated Jun 10, 2026

CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.

C++ 2,693 252 Updated May 28, 2026

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Python 1,661 206 Updated Jul 12, 2024

ChatGLM-6B: An Open Bilingual Dialogue Language Model

Python 41,040 5,142 Updated Jun 27, 2024

Accessible large language models via k-bit quantization for PyTorch.

Python 8,273 872 Updated Jun 12, 2026

Transformer related optimization, including BERT, GPT

C++ 6,421 935 Updated Mar 27, 2024

Healthy Learning to Age 150: An Incomplete Guide to Tuning the Human Body System

21,731 1,517 Updated Sep 10, 2025

The road to hack SysML and become a system expert

Emacs Lisp 514 64 Updated Sep 25, 2024

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsens…

Python 971 105 Updated Feb 27, 2023

Low-precision matrix multiplication

C++ 1,842 460 Updated Jan 29, 2024

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

Jupyter Notebook 33,772 4,017 Updated Mar 25, 2026

Code for "Discovering Symbolic Models from Deep Learning with Inductive Biases"

Python 781 143 Updated Nov 20, 2023

An index of algorithms for learning causality with data

3,268 474 Updated Jan 22, 2025

Repository with code and slides for a tutorial on causal inference.

Jupyter Notebook 590 113 Updated Sep 23, 2019