- Alibaba Cloud
- Hangzhou, China
- UTC +08:00
- https://tonylu.dev (UNDER MAINTENANCE)
- @tonyluj
Starred repositories
A Next.js web application that integrates AI capabilities with draw.io diagrams. This app allows you to create, modify, and enhance diagrams through natural language commands and AI-assisted visual…
React app for inspecting, building and debugging with the Realtime API
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
A framework for efficient model inference with omni-modality models
A Lightweight LLM Inference Performance Simulator
SGLang is a fast serving framework for large language models and vision language models.
Shared data types for building collaborative software
WebAssembly Micro Runtime (WAMR)
A highly customizable, adaptable, runtime-agnostic, and WASM/WASI-friendly gossip protocol (SWIM) that helps manage cluster membership and member failure detection.
A collection of well-tested, serializable CRDTs for Rust
Offline optimization of your disaggregated Dynamo graph
A transformer-based LLM, written entirely in Rust
Python tool for converting files and office documents to Markdown.
Raft distributed consensus algorithm implemented in Rust.
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
📰 Must-read papers and blogs on Speculative Decoding ⚡️
This is the release repository for Fan Control, a highly customizable fan controlling software for Windows.
The official Python library for the OpenAI API
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
A workload for deploying LLM inference services on Kubernetes
Fast inference from large language models via speculative decoding
Code for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings)
A curated list for Efficient Large Language Models
vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Ergonomic and modular web framework built with Tokio, Tower, and Hyper