HangJie720

torchcomms: a modern PyTorch communications API

C++ · 327 stars · 82 forks · Updated Feb 5, 2026

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python · 7,576 stars · 651 forks · Updated Feb 5, 2026

The true story of the hardship and darkness behind the development of the Noah Pangu large model.

11,389 stars · 1,343 forks · Updated Jul 9, 2025

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Python · 6,184 stars · 569 forks · Updated Aug 22, 2025

A PyTorch native platform for training generative AI models

Python · 5,039 stars · 698 forks · Updated Feb 5, 2026

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda · 3,137 stars · 335 forks · Updated Jan 17, 2026

DeepSparkInference has selected 216 inference models of both small and large sizes. The small models cover fields such as computer vision, natural language processing, and speech recognition; the L…

Python · 27 stars · 7 forks · Updated Feb 4, 2026

The DeepSpark open platform selects hundreds of open source application algorithms and models that are deeply coupled with industrial applications, supports mainstream application frameworks, and p…

45 stars · 5 forks · Updated Dec 25, 2025

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ · 2,116 stars · 172 forks · Updated Jan 29, 2026

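Source code from the book, exercise answers, and personal notes for *C++ Primer Plus, 6th Edition (Chinese edition)*; for study and exchange purposes only.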

C++ · 3,098 stars · 613 forks · Updated Mar 10, 2025

Ascend PyTorch adapter (torch_npu). Mirror of https://gitee.com/ascend/pytorch

Python · 485 stars · 42 forks · Updated Feb 5, 2026

FlashInfer: Kernel Library for LLM Serving

Python · 4,876 stars · 692 forks · Updated Feb 5, 2026

Python · 76 stars · 35 forks · Updated Nov 22, 2024

Mamba SSM architecture

Python · 17,145 stars · 1,580 forks · Updated Jan 12, 2026

Causal depthwise conv1d in CUDA, with a PyTorch interface

Cuda · 712 stars · 150 forks · Updated Jan 12, 2026

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python · 66,940 stars · 8,140 forks · Updated Feb 4, 2026

Ongoing research training transformer models at scale

Python · 15,148 stars · 3,568 forks · Updated Feb 5, 2026

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python · 10,326 stars · 765 forks · Updated Feb 5, 2026

This repository contains the results and code for the MLPerf™ Training v2.1 benchmark.

C++ · 15 stars · 18 forks · Updated Aug 9, 2023

Forward and backward Attention DNN operators implemented with LibTorch, cuDNN, and Eigen.

C++ · 30 stars · 5 forks · Updated Jun 6, 2023

Efficient GPU kernels for block-sparse matrix multiplication and convolution

Cuda · 1,063 stars · 198 forks · Updated Jun 8, 2023

Memory Efficient Attention (O(sqrt(n))) for Jax and PyTorch

Python · 184 stars · 21 forks · Updated Jan 6, 2023

Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"

Python · 390 stars · 35 forks · Updated Jul 18, 2023

Fast and memory-efficient exact attention

Python · 22,104 stars · 2,350 forks · Updated Feb 5, 2026

DeepSparkHub selects hundreds of application algorithms and models, covering various fields of AI and general-purpose computing, to support the mainstream intelligent computing scenarios. This repo…

Python · 70 stars · 14 forks · Updated Feb 4, 2026

Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.

Python · 30,805 stars · 3,666 forks · Updated Feb 4, 2026

How to optimize some algorithms in CUDA.

Cuda · 2,818 stars · 256 forks · Updated Jan 31, 2026

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.

C · 7,267 stars · 1,643 forks · Updated Feb 5, 2026