- Tokyo Metropolitan University
- Tokyo
- UTC +09:00
- https://portfolio.ayutaso.com
- @aya172957
Highlights
- Pro
Stars
FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.
The agent that grows with you
Large-scale, Informative, and Diverse Multi-round Chat Data (and Models)
Reference implementation of an end-to-end voice agent built using the NVIDIA Nemotron models
[EMNLP 2025 Findings] Code for "Distilling Many-Shot In-Context Learning into a Cheat Sheet"
List of open-source TTS, voice cloning, and music generation models
Erasing concepts from neural representations with provable guarantees
A package for NeuCodec: a 50 Hz, 0.8 kbps, 24 kHz audio codec.
MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It is designed for high‑fidelity, high‑expressiveness, and complex real‑world scenario…
Whisper-Flow is a framework designed to enable real-time transcription of audio content using OpenAI’s Whisper model. Rather than processing entire files after upload (“batch mode”), Whisper-Flow a…
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Andr…
An open-source wake word library for creating voice-enabled applications.
The AsyncAPI specification allows you to create machine-readable definitions of your asynchronous APIs.
Open Source framework for voice and multimodal conversational AI
Public release of the Sound Effect Foundation model by Sony AI.
kyutai-labs / nanoGPTaudio
Forked from karpathy/nanoGPT. Code for the blog "Neural audio codecs: how to get audio into LLMs"
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
MiMo-Audio: Audio Language Models are Few-Shot Learners
SGLang is a high-performance serving framework for large language models and multimodal models.
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, …
Official inference code for SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis
AI Generated Music Player with 3D Carousel - Powered by ACE-Step 1.5, HonoX, Cloudflare Workers/R2/D1