Starred repositories
SHAS: Approaching optimal Segmentation for End-to-End Speech Translation
A framework for efficient model inference with omni-modality models
Precision Alignment, Infinite Possibilities
DeepSeek 4 Flash and PRO local inference engine for Metal, CUDA and ROCm
High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale
⚡️SwanLab - an open-source, modern-design AI training tracking and visualization tool. Supports Cloud / Self-hosted use. Integrated with PyTorch / Transformers / verl / LLaMA Factory / ms-swift / U…
Plug-and-play streaming semantic VAD for real-time full-duplex spoken dialogue systems.
The Full-Duplex Interaction Track of the ICASSP 2026 Human-like Spoken Dialogue Systems Challenge aims to advance the evaluation of full-duplex dialogue systems by introducing a dual-channel dial…
Examples and guides for using the OpenAI API
High-Quality Voice Cloning TTS for 600+ Languages
Unofficial Python API and agentic skill for Google NotebookLM. Full programmatic access to NotebookLM's features—including capabilities the web UI doesn't expose—via Python, CLI, and AI agents like…
🥨 Lobe Icons - Brings AI/LLM brand logos to your React & React Native apps — static SVG/PNG/WebP, no dependencies.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Noise reduction in Python using spectral gating (speech, bioacoustics, audio, time-domain signals)
An open-source long-horizon SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skill, subagents and message gateway, it handles different levels of…
Bash is all you need - A nano Claude Code-like "agent harness", built from 0 to 1
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singi…
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…
Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions.
A feature-rich command-line audio/video downloader
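👮♂️The sensitive word tool for Java. (Sensitive words / prohibited words / illegal words / profanity. A high-performance Java sensitive-word filtering framework based on the DFA algorithm. Built-in support for classifying and grading words by tag. Please do not post content involving politics, advertising, marketing, circumvention of internet restrictions, or violations of national laws and regulations. A high-performance sensitive-word detection and filtering component, with Traditional/Simplified Chinese conversion, full-width/half-width conversion, Chinese character to pinyin conversion, fuzzy search, and other features.)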
Framework to bring LLM applications to production
Use Kimi's latest model (kimi-k2-0711-preview) to drive your Claude Code.
Hydra is a framework for elegantly configuring complex applications
A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.
🌐 Make websites accessible for AI agents. Automate tasks online with ease.