Math.SDU
- Jinan, Shandong, China
Stars
A fast multi-producer, multi-consumer lock-free concurrent queue for C++11
A small CLI tool to detect the CPU flags (instruction sets) used by x86 binaries.
NEO is an LLM inference engine built to relieve the GPU memory crisis through CPU offloading
Helpful kernel tutorials and examples for tile-based GPU programming
Detect concurrency and memory bugs and possible panic locations in Rust projects
A lock-free, read-optimized concurrency primitive.
Nsight Python is a Python kernel profiling interface based on NVIDIA Nsight Tools
High-throughput, token-authenticated tunneling built in Zig.
Flash Attention from Scratch on CUDA Ampere
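A curated collection of Fingerprint Browser (Antidetect Browser) resources
🎯 Say goodbye to information overload: AI helps you make sense of trending news, with simple public-opinion monitoring and analysis. Multi-platform trending-topic aggregation plus an MCP-based AI analysis tool. Monitors 35 platforms (Douyin, Zhihu, Bilibili, Wallstreetcn, Cailian Press, and others), with smart filtering, automatic push, and conversational AI analysis (dig into the news in natural language with 13 tools, including trend tracking, sentiment analysis, and similarity search). Supports push via WeCom/personal WeChat/Feishu/DingTalk/Telegram/email/ntfy/bark/slack, phone notifications in 1 minute, no…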
Low overhead tracing library and trace visualizer for pipelined CUDA kernels
The official implementation of OSDI'25 paper BlitzScale
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)
A tiny deep learning training framework implemented from scratch in C++ that follows PyTorch's API.
Tiny C++ LLM inference implementation from scratch
MessagePack is an extremely efficient object serialization library. It's like JSON, but very fast and small.
torchcomms: a modern PyTorch communications API
🚀🚀 [LLM] Train a small 26M-parameter GPT completely from scratch in just 2h!
We are committed to open-sourcing quantitative knowledge and translating it into Chinese, aiming to bridge the information gap between the domestic and international quantitative finance industries…
A Chinese-language financial trading framework based on multi-agent LLMs - TradingAgents Chinese Enhanced Edition