- iluvatar.ai
- Cloud Security City, Nanjing
- https://scholar.google.com/citations?hl=zh-CN&user=I5UcdEYAAAAJ&view_op=list_works&sortby=pubdate#
Stars
torchcomms: a modern PyTorch communications API
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
A PyTorch native platform for training generative AI models
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
DeepSparkInference has selected 216 inference models of both small and large sizes. The small models cover fields such as computer vision, natural language processing, and speech recognition; the L…
The DeepSpark open platform selects hundreds of open source application algorithms and models that are deeply coupled with industrial applications, supports mainstream application frameworks, and p…
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
Original source code, exercise solutions, and personal notes for "C++ Primer Plus, 6th Edition (Chinese edition)"; for learning and exchange only.
Ascend PyTorch adapter (torch_npu). Mirror of https://gitee.com/ascend/pytorch
FlashInfer: Kernel Library for LLM Serving
Causal depthwise conv1d in CUDA, with a PyTorch interface
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Ongoing research training transformer models at scale
Hackable and optimized Transformers building blocks, supporting a composable construction.
This repository contains the results and code for the MLPerf™ Training v2.1 benchmark.
Forward and backward attention DNN operators implemented with LibTorch, cuDNN, and Eigen.
Efficient GPU kernels for block-sparse matrix multiplication and convolution
Memory Efficient Attention (O(sqrt(n))) for Jax and PyTorch
Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory" (see the chunked-attention sketch after this list)
Fast and memory-efficient exact attention
DeepSparkHub selects hundreds of application algorithms and models, covering various fields of AI and general-purpose computing, to support mainstream intelligent-computing scenarios. This repo…
Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
How to optimize some algorithms in CUDA.
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
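
Several of the starred attention repositories above (memory-efficient attention, FlashAttention) share the same core idea: compute exact softmax attention over key/value chunks with an online softmax, so the full score matrix is never materialized. Below is a minimal PyTorch sketch of that idea, not code from any of the listed repos; the shapes, `key_chunk` size, and function name are illustrative assumptions.

```python
import torch

def chunked_attention(q, k, v, key_chunk=256):
    """Exact softmax attention computed by streaming over key/value chunks.

    q: (n_q, d), k: (n_k, d), v: (n_k, d). Instead of materializing the full
    (n_q, n_k) score matrix, keep a running max, a running normalizer, and a
    running weighted sum of values, rescaling them as new chunks arrive.
    """
    scale = q.shape[-1] ** -0.5
    n_q = q.shape[0]
    acc = torch.zeros(n_q, v.shape[-1], dtype=q.dtype)        # running sum of exp(scores) @ v
    denom = torch.zeros(n_q, 1, dtype=q.dtype)                 # running sum of exp(scores)
    running_max = torch.full((n_q, 1), float("-inf"), dtype=q.dtype)

    for start in range(0, k.shape[0], key_chunk):
        k_blk = k[start:start + key_chunk]
        v_blk = v[start:start + key_chunk]
        scores = (q @ k_blk.T) * scale                          # (n_q, chunk)
        blk_max = scores.max(dim=-1, keepdim=True).values
        new_max = torch.maximum(running_max, blk_max)
        # Rescale the previous accumulators to the new max before adding this chunk.
        correction = torch.exp(running_max - new_max)
        p = torch.exp(scores - new_max)
        acc = acc * correction + p @ v_blk
        denom = denom * correction + p.sum(dim=-1, keepdim=True)
        running_max = new_max

    return acc / denom

# Quick check against the naive implementation.
if __name__ == "__main__":
    q, k, v = (torch.randn(1024, 64) for _ in range(3))
    ref = torch.softmax((q @ k.T) * q.shape[-1] ** -0.5, dim=-1) @ v
    assert torch.allclose(chunked_attention(q, k, v), ref, atol=1e-5)
```

With query blocks of size sqrt(n) and key blocks of size sqrt(n), this scheme gives the O(sqrt(n)) working memory described in "Self-attention Does Not Need O(n²) Memory"; the CUDA kernels in the starred projects fuse these steps on-chip rather than looping in Python.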