Stars
Repository to host and maintain SCALE-Sim code
ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
The repository for ATC'25 paper "Greyhound: Hunting Fail-Slows in Hybrid-Parallel Training at Scale"
Scapy: the Python-based interactive packet manipulation program & library.
Countdowns to top Networking and Measurement conference deadlines.
Cluster data collected from production clusters at Alibaba for cluster-management research.
Collective communications library with various primitives for multi-machine training.
eBPF Developer Tutorial: Learning eBPF Step by Step with Examples
[NeurIPS'24 Oral] HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning
ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations [COLM 2025]
An easy-to-use federated learning platform
A prefetching technique for faster federated learning
A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.
Desktop utility to download images/videos/music/text from various websites, and more.
The official repo of "In-Network Address Caching for Virtual Networks" (ACM SIGCOMM'24).
Emulator for rapid prototyping of Software Defined Networks
THC: Accelerating Distributed Deep Learning Using Tensor Homomorphic Compression
This repository contains a list of papers on various topics in the systems and networking area (topics I am working on or have worked on).