Stars
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
A Conversational Speech Generation Model
Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"
Official implementation of "Continuous Autoregressive Language Models"
Metrics for evaluating music and audio generative models, with a focus on long-form, full-band, and stereo generations.
This is the official repo for the paper "LongCat-Flash-Omni Technical Report"
A list of tools, papers and code related to Fake Audio Detection.
"AI-Trader: Can AI Beat the Market?" Live Trading Bench: https://ai4trade.ai
Official code of ConfTuner: Training Large Language Models to Express Their Confidence Verbally
Training, inference, and testing of the SAC speech codec model.
An All-in-One Speech, Sound, Music Codec with Single Nested Codebook
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
This is the official repository for the UltraVoice100K dataset, providing code and dataset samples.
The contents of /mnt/skills in Claude's code interpreter environment
Post-training with Tinker
Official GitHub repository for the paper "SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information" (Interspeech 2025)
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
MiMo-Audio: Audio Language Models are Few-Shot Learners
NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment
Lightweight coding agent that runs in your terminal
An extremely fast Python package and project manager, written in Rust.
iOS OCR Server, using Apple's Vision Framework API.