-
PaddlePaddle, Baidu
- Beijing
-
PaddleFleet Public
Forked from PaddlePaddle/PaddleFleet
Core Functional Library for Distributed Training
Python Apache License 2.0 Updated Jun 4, 2026 -
PaddleFormers Public
Forked from PaddlePaddle/PaddleFormers
PaddleFormers is an easy-to-use library of pre-trained large language model zoo based on PaddlePaddle.
Python Apache License 2.0 Updated Jun 2, 2026 -
Paddle Public
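Forked from PaddlePaddle/Paddle
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)
C++ Apache License 2.0 Updated May 19, 2026 -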
flash-attention Public
Forked from PaddlePaddle/flash-attention
Fast and memory-efficient exact attention
C++ BSD 3-Clause "New" or "Revised" License Updated May 9, 2026 -
SpInfer Public
SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs
-
Fine-tuning Llama-2-7B for text classification. Dataset: IMDb; framework: DeepSpeed.
-
Distributed-SpMV Public
Distributed SpMV in C with MPI and OpenMP. This work was accepted at IEEE/ACM CCGrid'23.
-
cuAlias Public
Graph sampling for GNNs on the GPU, in particular building and using alias tables for random search.
C Updated Mar 4, 2024 -
-
Attention Public
This is my final project for the GPU course MICS600J. It is mainly my attempt to hand-write the attention process.
-
transformers Public
Forked from huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Python Apache License 2.0 Updated Dec 13, 2023 -
TensorRT-LLM Public
Forked from NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…
C++ Apache License 2.0 Updated Dec 11, 2023 -
AmpereSparseMatmul Public
Forked from lenLRX/AmpereSparseMatmul
Study of Ampere's sparse tensor core matmul
Cuda MIT License Updated Jan 10, 2021