Stars
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
High-Quality Voice Cloning TTS for 600+ Languages
A SOTA Industrial-Grade Voice Activity Detection & Audio Event Detection, supporting 100+ languages, outperforming Silero-VAD, TEN-VAD, FunASR-VAD and WebRTC-VAD
A highly compressive and high-quality neural audio codec for speech models.
A Streaming-Native Serving Engine for TTS/STS Models
Lightweight coding agent that runs in your terminal
[EMNLP Main '25] LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation
Repair malformed JSON from LLMs, APIs, logs, and user input in Python.
Things you can do with the token embeddings of an LLM
vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
Achieve state of the art inference performance with modern accelerators on Kubernetes
Fast neural codec compression and generation for audio waveforms
Analyze coding (agent) CLI token usage and costs from local data.
Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.
Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"
Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation, where one waits for the end of the source utterance to start translating, H…
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
A neural word aligner based on multilingual BERT
tsukumijima / pyopenjtalk-plus
Forked from r9y9/pyopenjtalk
pyopenjtalk-plus: A Python wrapper for OpenJTalk with additional improvements
A Python framework for sequence labeling evaluation (named-entity recognition, POS tagging, etc.)
[ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning