ECNU
- Shanghai
Stars
Transformer-related optimization, including BERT, GPT
A modern model graph visualizer and debugger
Source code examples from the Parallel Forall Blog
Tensors and Dynamic neural networks in Python with strong GPU acceleration
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…
AI agents running research on single-GPU nanochat training automatically
A lightweight alternative to OpenClaw that runs in containers for security. Connects to WhatsApp, Telegram, Slack, Discord, Gmail and other messaging apps, has memory, scheduled jobs, and runs dir…
Very low latency speech to text, intent recognition, and text to speech, for building voice agents and interfaces
AliSQL is a MySQL branch originating from Alibaba Group. Documentation is available from the Release Notes at the bottom.
A lightweight, lightning-fast, in-process vector database
A simple C++11 Thread Pool implementation
Algorithm powering the For You feed on X
A vector indexing library to bring fast, fresh and filtered search to your database
Minimal Claude Code alternative. Single Python file, zero dependencies, ~250 lines.
Roxanne0321 / vsag
Forked from antgroup/vsag
vsag is a vector indexing library used for similarity search.
A curated list of awesome Claude Skills, resources, and tools for customizing Claude AI workflows
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.
🧩 Hands-on SIMD Programming with C++
Awesome Generative Recommendation papers primarily focused on industry-level applications.
Flash Attention from Scratch on CUDA Ampere