pipecat
  • Ant Group
  • Beijing

Organizations

@NJUPT-SACC

Starred repositories

A high-performance RL training-inference weight synchronization framework, designed to enable second-level parameter updates from training to inference in RL workflows

Python 117 9 Updated Dec 22, 2025

IEEE 754-style floating-point converter

TypeScript 17 2 Updated Jan 30, 2023

Tile primitives for speedy kernels

Cuda 3,012 219 Updated Dec 9, 2025

Python 34 10 Updated Oct 11, 2025

Financial data platform for analysts, quants and AI agents.

Python 55,772 5,420 Updated Dec 23, 2025

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda 2,911 291 Updated Dec 22, 2025

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda 1,208 178 Updated Jul 29, 2023

Hand-written CUDA operators and interview guide

Cuda 739 81 Updated Aug 23, 2025

C++ 10 Updated Jul 27, 2023

Efficient platform for inference and serving of local LLMs, including an OpenAI-compatible API server.

Rust 552 65 Updated Dec 22, 2025

[🔥updating ...] AI automated quantitative trading bot (fully local deployment). AI-powered Quantitative Investment Research Platform. 📃 online docs: https://ufund-me.github.io/Qbot ✨ :news: qbot-mini: https://github.com/Charmve/iQuant

Jupyter Notebook 15,519 2,211 Updated Jul 6, 2025

Material for gpu-mode lectures

Jupyter Notebook 5,445 552 Updated Dec 8, 2025

Python-based open-source framework for developing quantitative trading platforms

Python 34,843 10,536 Updated Dec 22, 2025

C++ 12 2 Updated Dec 31, 2020

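IPTV live-stream source scraper: automatically aggregates hao趣网 live sources, TVBox live sources, and other online live sources; selects the video streams with the best resolution and speed; updated regularly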

10,278 728 Updated Dec 31, 2024

[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

C++ 795 56 Updated Mar 6, 2025

Code base and slides for ECE408:Applied Parallel Programming On GPU.

C++ 141 34 Updated Jul 2, 2021

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python 4,858 329 Updated Nov 28, 2025

CUDA/Metal accelerated language model inference

C 625 30 Updated May 29, 2025

LLM training in simple, raw C/CUDA

Cuda 28,458 3,337 Updated Jun 26, 2025

Large Language Model Text Generation Inference

Python 10,711 1,247 Updated Dec 19, 2025

Lightning fast C++/CUDA neural network framework

C++ 4,361 537 Updated Dec 14, 2025

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.

Python 12,097 1,071 Updated Oct 29, 2025

Fast Multimodal LLM on Mobile Devices

C++ 1,293 156 Updated Dec 23, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 4,340 614 Updated Dec 23, 2025

Full and up-to-date source code of the chapters of the "SFML Game Development" book

C++ 1,012 235 Updated Sep 27, 2015

LLM Inference benchmark

Python 430 40 Updated Jul 23, 2024

How to learn PyTorch and OneFlow

466 28 Updated Mar 22, 2024