
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation

Python 302 25 Updated Oct 28, 2025

A Conversational Speech Generation Model

Python 14,257 1,426 Updated May 27, 2025

Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"

Jupyter Notebook 152 4 Updated Oct 21, 2025

Official implementation of "Continuous Autoregressive Language Models"

Python 294 42 Updated Nov 6, 2025

Metrics for evaluating music and audio generative models – with a focus on long-form, full-band, and stereo generations.

Python 253 22 Updated Oct 31, 2025
Python 1 Updated Nov 7, 2025

This is the official repo for the paper "LongCat-Flash-Omni Technical Report"

Python 366 16 Updated Nov 4, 2025

A list of tools, papers and code related to Fake Audio Detection.

190 10 Updated Oct 20, 2025

"AI-Trader: Can AI Beat the Market?" Live Trading Bench: https://ai4trade.ai

Python 9,024 1,302 Updated Nov 6, 2025

NOF0 - Open-source AI trading arena

Go 2,630 410 Updated Nov 6, 2025
Python 243 26 Updated May 19, 2025

Official code of ConfTuner: Training Large Language Models to Express Their Confidence Verbally

Python 11 Updated Sep 26, 2025

Training, inference, and testing of the SAC speech codec model.

Python 84 6 Updated Nov 1, 2025

An All-in-One Speech, Sound, Music Codec with Single Nested Codebook

Python 20 Updated Oct 11, 2025

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

Python 62,867 9,262 Updated Nov 6, 2025

AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Python 259 18 Updated Oct 12, 2025

This is the official repository for the UltraVoice100K dataset, providing code and dataset samples.

JavaScript 12 1 Updated Oct 26, 2025

The best ChatGPT that $100 can buy.

Python 35,956 4,161 Updated Nov 5, 2025

The contents of /mnt/skills in Claude's code interpreter environment

860 123 Updated Oct 16, 2025

Post-training with Tinker

Python 1,446 114 Updated Nov 5, 2025

Official GitHub repository for paper "SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information" (Interspeech 2025)

Python 19 3 Updated Aug 14, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 2,829 160 Updated Oct 9, 2025

MiMo-Audio: Audio Language Models are Few-Shot Learners

Python 835 84 Updated Sep 20, 2025
Python 374 48 Updated Nov 1, 2025

NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment

Python 879 142 Updated Dec 1, 2024
Python 117 5 Updated Sep 4, 2025

Lightweight coding agent that runs in your terminal

Rust 49,938 6,174 Updated Nov 7, 2025

An extremely fast Python package and project manager, written in Rust.

Rust 72,125 2,200 Updated Nov 7, 2025

iOS OCR Server, using Apple's Vision Framework API.

Swift 879 52 Updated Sep 20, 2025