🌍 Advance world modeling with LingBot-World, an open-source simulator designed for video generation and high-quality environment interaction.
🖌️ Generate artistic creations using prompts, reference images, and trajectories with WorldCanvas, a versatile platform for visual expression and exploration.
ZPE-Prosody V0.0: DETERMINISTIC SPEECH PROSODY CODEC: Intonation | Rhythm | Stress | Emotional Contour | Pitch Transport | Duration Encoding
An unofficial Python reimplementation of legacy-STRAIGHT, the classic STRAIGHT speech analysis and synthesis system.
Converts .mp3 audio to a .txt transcript with WhisperModel.
This repository hosts a deep-learning model for detecting steganographic information in compressed speech. Compared with RNN-SM, CSW, and SFFN, the method detects and captures hidden information in compressed speech more effectively and supports steganography-detection classification. (This was also my first SCI paper, written as an undergraduate before AI tooling was widespread; it is entirely hand-built, and suggestions are welcome.)
CLI-based speech analysis pipeline: transcribes audio/video with OpenAI Whisper, then runs NLP analyzers (speech rate, complexity, vocabulary, wordcloud) via spaCy. Exports metrics and charts as PDF or JSON.
Turn AI into anyone: data-driven persona generation from real social media. Give it a name and the AI becomes that person, automatically collecting real data to generate characters 100× more realistic than hand-written personas.
Speaker State Trajectory analysis — treats voice as a nonlinear dynamical system and drives research with a Karpathy-style autoresearch loop.
Speech analysis system for detecting pauses and stuttered speech patterns using MFCC, cosine similarity, and phoneme-based reconstruction with a Streamlit interface.
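The repository above compares frames with cosine similarity over MFCC vectors. A minimal sketch of that comparison step (not the repo's actual code; the example vectors and the flagging threshold are illustrative):

```python
import numpy as np

def frame_similarity(mfcc_a, mfcc_b):
    """Cosine similarity between two MFCC feature vectors.

    A low similarity between nearby frames can flag abrupt breaks or
    repetitions; any threshold would be tuned per corpus.
    """
    num = float(np.dot(mfcc_a, mfcc_b))
    den = float(np.linalg.norm(mfcc_a) * np.linalg.norm(mfcc_b)) + 1e-12
    return num / den

a = np.array([1.0, 2.0, 3.0])
s_same = frame_similarity(a, 2 * a)                    # collinear → ~1.0
s_diff = frame_similarity(a, np.array([3.0, -1.0, 0.0]))
```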
Open-source toolkit for social interaction research: extract 400+ multimodal features from conversation videos, then analyze synchrony, conversational states, and impression dynamics
Code for audio-based autism spectrum disorder (ASD) classification using Transformer models, machine learning baselines, and SHAP analysis.
PolyglotDB is a package for phonetic corpus storage and analysis
Listening Between the Lines: An explainable multimodal framework for MCI detection from spontaneous speech. Leverages Selective State Space Models (Mamba) and Gated Fusion to integrate linguistic disfluencies and eGeMAPS biomarkers across multi-corpus benchmarks (Pitt, ADReSS, TAUKADIAL)
Local Sanskrit recitation coach for Bhagavad Gita shlokas with audio-based pronunciation analysis, shloka detection, practice mode, and LLM feedback.
AI-powered communication coach that analyzes real-time speech signals to detect confidence drops, hesitation, and nervousness, providing data-driven feedback for interviews and public speaking.
Streamlit app for time-domain audio signal analysis — silence detection, voiced/unvoiced classification, F0 estimation via autocorrelation and AMDF, and weighted multi-feature speech/music discrimination.
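Autocorrelation-based F0 estimation, as named above, picks the lag of the strongest autocorrelation peak inside a pitch search range. A minimal sketch under assumed defaults (the 50–500 Hz range is illustrative, not taken from the app):

```python
import numpy as np

def estimate_f0_autocorr(frame, sr, fmin=50.0, fmax=500.0):
    """Estimate F0 of a voiced frame via the autocorrelation method."""
    frame = frame - frame.mean()
    # Full autocorrelation; keep only non-negative lags.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(sr / fmax)                      # smallest lag in range
    hi = min(int(sr / fmin), len(ac) - 1)    # largest lag in range
    if hi <= lo:
        return 0.0
    peak = lo + int(np.argmax(ac[lo:hi]))    # lag of strongest peak
    return sr / peak

# 200 Hz sine at 16 kHz: the peak should sit at lag sr/200 = 80 samples.
sr = 16000
t = np.arange(0, 0.04, 1 / sr)
f0 = estimate_f0_autocorr(np.sin(2 * np.pi * 200 * t), sr)
```

AMDF works the same way but takes the lag minimizing the average magnitude difference instead of maximizing correlation.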
Streamlit app for frequency-domain audio signal analysis — FFT, spectrogram, spectral features (centroid, bandwidth, SFM, SCF), formant detection, and F0 estimation via cepstrum.
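Of the spectral features listed, the centroid is the magnitude-weighted mean frequency of a frame's spectrum. A minimal sketch (Hann windowing and the test tone are assumptions, not the app's code):

```python
import numpy as np

def spectral_centroid(frame, sr):
    """Magnitude-weighted mean frequency of a windowed frame."""
    win = frame * np.hanning(len(frame))
    mag = np.abs(np.fft.rfft(win))
    freqs = np.fft.rfftfreq(len(win), d=1.0 / sr)
    return float(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))

sr = 8000
t = np.arange(1024) / sr
tone = np.sin(2 * np.pi * 1000 * t)
c = spectral_centroid(tone, sr)   # near 1000 Hz for a pure tone
```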
Lightweight Python toolkit for analyzing speech fluency features such as pauses and silence ratio.
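A silence ratio like the one this toolkit reports is typically the fraction of frames whose energy falls below a threshold. A minimal sketch, with illustrative frame length and threshold (real tools usually adapt the threshold to the recording's noise floor):

```python
import numpy as np

def silence_ratio(signal, sr, frame_ms=25, threshold=0.02):
    """Fraction of non-overlapping frames with RMS below `threshold`."""
    n = int(sr * frame_ms / 1000)
    frames = [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]
    rms = np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
    return float(np.mean(rms < threshold))

sr = 16000
speech = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)  # 1 s "voiced" tone
pause = np.zeros(sr)                                   # 1 s silence
ratio = silence_ratio(np.concatenate([speech, pause]), sr)  # ~0.5
```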
An automatic system that extracts Praat-like speech features from raw speech WAV files while also producing high-quality, low-WER (<10) transcriptions.