Math.SDU
Jinan, Shandong, China

Stars
The official implementation of the OSDI'25 paper BlitzScale
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and expert parallelism (EP) (e.g., GPU-driven)
A tiny deep learning training framework implemented from scratch in C++ that follows PyTorch's API.
A tiny C++ LLM inference implementation from scratch
MessagePack is an extremely efficient object serialization library. It's like JSON, but very fast and small.
torchcomms: a modern PyTorch communications API
🚀🚀 Train a 26M-parameter GPT (LLM) completely from scratch in just 2 hours!
We are committed to open-sourcing quantitative knowledge, aiming to bridge the information gap between the domestic and international quantitative finance industries.
A multi-agent LLM framework for Chinese financial trading, an enhanced Chinese edition of TradingAgents
muvm - run programs from your system in a microVM
Hook library that enables screen sharing with Tencent Wemeet on Linux Wayland, without the need for virtual cameras.
A scalable file analysis and data generation platform that lets users easily orchestrate arbitrary Docker/VM/shell tools at scale.
Legacy-Mess Detector – assess the "legacy-mess level" of your code and output a beautiful report
This repository contains a 90-day cybersecurity study plan, along with resources and materials for learning various cybersecurity concepts and technologies. The plan is organized into daily tasks, …
Patterns and resources for low-latency programming.
Library for specialized dense and sparse matrix operations, and deep learning primitives.
Stepwise optimizations of DGEMM on CPU, eventually exceeding Intel MKL's performance, even with multithreading.
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.