weijietong

Organizations

@RoaringBitmap


Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.

Python · 30,988 stars · 3,699 forks · Updated Apr 1, 2026

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Python · 35,295 stars · 3,503 forks · Updated Apr 4, 2026

A machine learning compiler for GPUs, CPUs, and ML accelerators

C++ · 4,131 stars · 772 forks · Updated Apr 4, 2026

Implement a reasoning LLM in PyTorch from scratch, step by step

Jupyter Notebook · 3,978 stars · 560 forks · Updated Apr 1, 2026

Build smaller, faster, and more secure desktop and mobile applications with a web frontend.

Rust · 104,903 stars · 3,490 forks · Updated Apr 3, 2026

AnyBlox runtime and tooling

C · 36 stars · 1 fork · Updated Sep 4, 2025

An extensible, state-of-the-art columnar file format. Formerly at @spiraldb, now an Incubation Stage project at LFAI&Data, part of the Linux Foundation.

Rust · 2,832 stars · 144 forks · Updated Apr 3, 2026

An easy-to-use, header-only C++ wrapper for Linux's perf event API

C++ · 140 stars · 22 forks · Updated Jan 7, 2026

C++ · 23 stars · 4 forks · Updated Nov 7, 2025

C++ · 913 stars · 82 forks · Updated Apr 3, 2026

Goal: Enable awesome tooling for Bazel users of the C language family.

Python · 896 stars · 183 forks · Updated Aug 11, 2025

The universal proxy platform

Go · 32,088 stars · 3,755 forks · Updated Apr 3, 2026

[TMLR 2025] Efficient Reasoning Models: A Survey

Python · 304 stars · 22 forks · Updated Mar 9, 2026

Vector (and Scalar) Quantization, in Pytorch

Python · 3,884 stars · 325 forks · Updated Mar 30, 2026

Official repository of the xLSTM.

Python · 2,140 stars · 176 forks · Updated Nov 4, 2025

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…

Python · 3,255 stars · 684 forks · Updated Apr 3, 2026

The hub for EleutherAI's work on interpretability and learning dynamics

Jupyter Notebook · 2,761 stars · 210 forks · Updated Nov 15, 2025

Modeling, training, eval, and inference code for OLMo

Python · 6,453 stars · 734 forks · Updated Nov 24, 2025

InkFuse - An Experimental Database Runtime Unifying Vectorized and Compiled Query Execution.

C++ · 55 stars · 3 forks · Updated May 13, 2024

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ · 9,527 stars · 1,768 forks · Updated Apr 2, 2026

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python · 13,260 stars · 2,250 forks · Updated Apr 3, 2026

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python · 69,473 stars · 8,452 forks · Updated Apr 1, 2026

Unsloth Studio is a web UI for training and running open models like Qwen3.5, Gemma 4, DeepSeek, gpt-oss locally.

Python · 59,291 stars · 5,023 forks · Updated Apr 3, 2026

The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.

Python · 10,947 stars · 851 forks · Updated Apr 4, 2026

A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations

Python · 16,891 stars · 1,252 forks · Updated Apr 3, 2026

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python · 1,048 stars · 87 forks · Updated Sep 4, 2024

A concise but complete full-attention transformer with a set of promising experimental features from various papers

Python · 5,813 stars · 506 forks · Updated Mar 27, 2026

Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.

Go · 167,026 stars · 15,301 forks · Updated Apr 4, 2026

41 stars · 22 forks · Updated Apr 3, 2022

LLM training in simple, raw C/CUDA

Cuda · 29,342 stars · 3,481 forks · Updated Jun 26, 2025