Stars
7 stars written in Jupyter Notebook
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
FaceChain is a deep-learning toolchain for generating your Digital-Twin.
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
An open-source audio wake word (or phrase) detection framework with a focus on performance and simplicity.