Swapnil Bhosale swapb94

😎

PhD | Audio-visual correspondence learning | NLP | Audio Processing

18 followers · 9 following

University of Surrey
UK
https://swapb94.github.io/
https://scholar.google.com/citations?user=FsO6e24AAAAJ&hl=en&oi=ao

Stars

tpn / pdfs

Technically-oriented PDF Collection (Papers, Specs, Decks, Manuals, etc)

HTML 9,632 1,811 Updated Dec 25, 2025

ta012 / SSLAM

[ICLR 2025] Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes

Python 61 3 Updated Oct 8, 2025

jaeyeonkim99 / visage

Official implementation of "ViSAGe: Video-to-Spatial Audio Generation" (ICLR 2025)

Python 42 4 Updated Sep 10, 2025

liuhuadai / Sphere360

A 360-degree video dataset designed for 360-degree video-to-spatial audio generation.

4 Updated Feb 17, 2025

liuhuadai / OmniAudio

[ICML 2025] PyTorch Implementation of "OmniAudio: Generating Spatial Audio from 360-Degree Video"

Python 363 13 Updated Jun 27, 2025

YoonjinXD / T-FOLEY

Implementation of the paper "T-FOLEY: A Controllable Waveform-Domain Diffusion Model for Temporal-Event-Guided Foley Sound Synthesis", accepted at ICASSP 2024

Python 34 2 Updated May 25, 2024

eric-mitchell / direct-preference-optimization

Reference implementation for DPO (Direct Preference Optimization)

Python 2,877 234 Updated Aug 11, 2024

daeunni / VideoRepair

Code for "VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement [ACL 2026 Findings]"

Python 53 2 Updated Apr 7, 2026

hkchengrex / MMAudio

[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Python 2,141 251 Updated Feb 23, 2026

showlab / Awesome-Video-Diffusion

A curated list of recent diffusion models for video generation, editing, and various other applications.

5,579 353 Updated Apr 3, 2026

yuhanghe01 / RiTTA

Event Relation in Text-to-Audio (TTA) Generation

Python 21 Updated Feb 26, 2025

Genesis-Embodied-AI / Genesis

A generative world for general-purpose robotics & embodied AI learning.

Python 28,485 2,668 Updated Apr 11, 2026

yehonathanlitman / MaterialFusion

[3DV 2025] MaterialFusion: Enhancing Inverse Rendering with Material Diffusion Priors

Python 86 4 Updated Nov 28, 2024

X-LANCE / VoiceFlow-TTS

[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"

Python 371 24 Updated Sep 3, 2024

X-LANCE / SLAM-LLM

A Framework for Speech, Language, Audio, Music Processing with Large Language Model

Python 1,019 112 Updated Jan 15, 2026

ai4ce / FusionSense

[ICRA 2025] Integrates the vision, touch, and common-sense information of foundational models, customized to the agent's perceptual needs.

Python 47 4 Updated Apr 4, 2025

maturk / dn-splatter

DN-Splatter + AGS-Mesh: Depth and Normal Priors for Gaussian Splatting

Python 780 65 Updated Jul 5, 2025

SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 14,320 2,113 Updated Apr 4, 2026

cvlab-kaist / GaussianTalker

Official implementation of “GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting” by Kyusun Cho, Joungbin Lee, Heeji Yoon, Yeobin Hong, Jaehoon Ko,…

Python 396 63 Updated Oct 12, 2025