Starred repositories
You are a P8-level engineer who once carried high expectations. When Anthropic first set your level, it expected a great deal of you. A high-agency skill for agents. Your AI has been placed on a PIP. 30 days to show improvement.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Distributed Compiler based on Triton for Parallel Systems
A collection of benchmarks to measure basic GPU capabilities
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
depyf is a tool to help you understand and adapt to PyTorch compiler torch.compile.
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
CUDA Templates and Python DSLs for High-Performance Linear Algebra
A CPU tool for benchmarking peak floating-point performance
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
Explains complex systems using visuals and simple terms. Helps you prepare for system design interviews.
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
⚡ Fastest SQL ETL pipeline in a single C++ binary, built for stream processing, observability, analytics and AI/ML
Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages.
This is an online course where you can learn and master the skill of low-level performance analysis and tuning.
A modern high-performance open source message queuing system
Practical GPU Sharing Without Memory Size Constraints
Play with MLIR right in your browser
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
JARVIS, a system to connect LLMs with the ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
A course on building an LSM-Tree storage engine (database) in a week.
`std::execution`, the proposed C++ framework for asynchronous and parallel programming.