Stars
📚 《从零开始构建智能体》 (Building Agents from Scratch): a from-scratch tutorial on agent principles and practice
The first Large Audio Language Model with native in-depth thinking, trained on large-scale audio Chain-of-Thought data.
[NeurIPS 2025] PyTorch implementation of ThinkSound, a unified framework for generating audio from any modality, guided by Chain-of-Thought (CoT) reasoning.
A Fully Self-Hosted Solution for Full-Duplex Voice Interaction
Efficient audio understanding with general audio captions
Course materials for Modern Digital Signal Processing, a core course of the School of Electronic Engineering, University of Chinese Academy of Sciences (UCAS), taught by Prof. Zhang Hao (张颢)
Production First and Production Ready End-to-End Speech Recognition Toolkit
Open-Source Turn-Taking Detection Model and Dataset for Full-Duplex Spoken Dialogue Systems
ICT-STAR Group Survival Guide | A website for sharing information on how to become a qualified Master's/Ph.D. student
Course resources for cybersecurity and computer science at the University of Chinese Academy of Sciences (UCAS): Advanced Artificial Intelligence, Deep Learning, Applied Cryptography, Machine Learning, Information Hiding, Information Theory and Coding, Multimedia Coding, etc.
Some useful things for the concentrated-teaching phase (集中教学) at Yanqi Lake, Beijing: scripts for UCAS humanities lectures
PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides [EMNLP 2025]
Code for Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation (CVPR 2021)
Code for the IJCAI 2021 paper "Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion"
This repository contains the code for LipGAN, published as part of the paper "Towards Automatic Face-to-Face Translation".
ABAW3 (CVPRW): A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition
Speech enhancement / speech separation / sound source localization
natlamir/Wav2Lip-WebUI (forked from Rudrabha/Wav2Lip): a Wav2Lip Web UI using Gradio
LaTeX code for drawing neural network diagrams
Out of time: automated lip sync in the wild