Starred repositories
Open-source, self-hosted note-taking tool built for quick capture. Markdown-native, lightweight, and fully yours.
Interactive 3D cell architecture gallery built with React and Three.js
A Next.js web application that integrates AI capabilities with draw.io diagrams. This app allows you to create, modify, and enhance diagrams through natural language commands and AI-assisted visual…
Audio-transcript alignment using MMS
Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University.
Python tool for converting files and office documents to Markdown.
High-Quality Voice Cloning TTS for 600+ Languages
Official code for "Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis"
Flexible audio loudness meter in Python with implementation of ITU-R BS.1770-4 loudness algorithm
A high-quality rapid TTS voice cloning model that reaches speeds of 150x realtime.
End-to-end speech recognition large model: 31 languages, dialects, accents, lyrics, hotwords, timestamps, speaker diarization. Trained on tens of millions of hours.
Patterns and resources of low latency programming.
Opencpop: A High-Quality Open Source Chinese Popular Song Database for Singing Voice Synthesis
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
We Speech Toolkit, LLM-based Speech Toolkit for Speech Understanding, Generation, and Interaction
SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.
The official implementation of CATT Arabic diacritization models.
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
Convert PDF to Markdown + JSON quickly with high accuracy
A Python package to analyze and compare voices with deep learning
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
A generative speech model for daily dialogue.