- University of Surrey
- UK
- https://swapb94.github.io/
- https://scholar.google.com/citations?user=FsO6e24AAAAJ&hl=en&oi=ao
Stars
Technically-oriented PDF Collection (Papers, Specs, Decks, Manuals, etc)
[ICLR 2025] Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes
Official implementation of "ViSAGe: Video-to-Spatial Audio Generation" (ICLR 2025)
A 360-degree video dataset designed for 360-degree video-to-spatial audio generation.
[ICML 2025] PyTorch Implementation of "OmniAudio: Generating Spatial Audio from 360-Degree Video"
Implementation of the paper "T-FOLEY: A Controllable Waveform-Domain Diffusion Model for Temporal-Event-Guided Foley Sound Synthesis", accepted at ICASSP 2024
Reference implementation for DPO (Direct Preference Optimization)
Code for "VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement" [ACL 2026 Findings]
[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
A curated list of recent diffusion models for video generation, editing, and various other applications.
A generative world for general-purpose robotics & embodied AI learning.
[3DV 2025] MaterialFusion: Enhancing Inverse Rendering with Material Diffusion Priors
[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"
A Framework for Speech, Language, Audio, Music Processing with Large Language Model
[ICRA 2025] Integrates the vision, touch, and common-sense information of foundational models, customized to the agent's perceptual needs.
DN-Splatter + AGS-Mesh: Depth and Normal Priors for Gaussian Splatting
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Official implementation of “GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting” by Kyusun Cho, Joungbin Lee, Heeji Yoon, Yeobin Hong, Jaehoon Ko,…
Generative models for conditional audio generation
The implementation of MeshSegmenter: Zero-Shot Mesh Semantic Segmentation via Texture Synthesis
[ECCV 2024 Oral] Audio-Synchronized Visual Animation
[TVCG 2024] PGSR: Planar-based Gaussian Splatting for Efficient and High-Fidelity Surface Reconstruction
PyTorch implementation of the paper "GaussNav: Gaussian Splatting for Visual Navigation"
arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv
[NeurIPS 2023] AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis