Stars
This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Lan…
FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
A Conversational Speech Generation Model
Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"
Official implementation of "Continuous Autoregressive Language Models"
Metrics for evaluating music and audio generative models, with a focus on long-form, full-band, and stereo generations.
This is the official repo for the paper "LongCat-Flash-Omni Technical Report"
A list of tools, papers and code related to Fake Audio Detection.
"AI-Trader: Can AI Beat the Market?" Live Trading Bench: https://ai4trade.ai
Official code of ConfTuner: Training Large Language Models to Express Their Confidence Verbally
Training, inference, and testing of the SAC speech codec model.
An All-in-One Speech, Sound, Music Codec with Single Nested Codebook
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
This is the official repository for the UltraVoice100K dataset, providing code and dataset samples.
The contents of /mnt/skills in Claude's code interpreter environment
Post-training with Tinker
Official GitHub repository for paper "SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information" (Interspeech 2025)
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
MiMo-Audio: Audio Language Models are Few-Shot Learners
NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment
Lightweight coding agent that runs in your terminal