Huazhong University of Science and Technology
Wuhan, China (UTC +08:00)
Stars
A verification tool for ensuring parallelization equivalence in distributed model training.
DLRover: An Automatic Distributed Deep Learning System
A curated reading list for machine learning reliability research and practice
Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system like the Linux kernel, CPU, disks, Intel PT, GPUs etc. Dynolog also …
A Framework for Automated Validation of Deep Learning Training Tasks
GKLEE is a symbolic analyser and test generator tailored for CUDA C++ programs
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Projects related to end-to-end tracing and causal metadata propagation, such as X-Trace, Retro, and Pivot Tracing
A computer algebra system written in pure Python
A fault injection framework for detecting fail-slow hardware failures in cloud systems
Fuzzing Automatic Differentiation in Deep-Learning Libraries (ICSE'23)
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
The old Linux kernel source, version 0.11, which has been tested under modern Linux, Mac OS X and Windows.
Extract semantic information about static Python code
The property-based testing library for Python
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Large Language Model (LLM) Systems Paper List
Awesome-LLM: a curated list of Large Language Models
SGLang is a fast serving framework for large language models and vision language models.
OpenHands: AI-Driven Development
An awesome, curated list of the best code LLMs for research.