sivannavis.github.io - in/sivanding
Starred repositories
My Python scripts to make high-quality figures for publications in top AI conferences and journals.
Semantic search for conference papers via OpenReview API. This is a helper package for my agentic research workflow.
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
This is the official codebase for WavJEPA, a time-domain audio foundation model for holistic downstream tasks, from the paper "Self-supervised learning from raw waveforms unlock robust audio foundation models".
This repo hosts the code and models of "Masked Autoencoders that Listen".
[CVPR 2024] Code and datasets for 'Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos'
PyTorch code and models for VJEPA2 self-supervised learning from video.
Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization".
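The residual vector quantization (RVQ) at the heart of MuQ is easy to state: each quantizer stage encodes the residual left over by the previous stages, so the summed codes approximate the input ever more closely. A minimal NumPy sketch under toy assumptions (the codebook sizes, dimensions, and shrinking scales are illustrative, not MuQ's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

def rvq_encode(x, codebooks):
    """Residual vector quantization: each stage picks the nearest
    codebook entry to the residual left by the previous stages."""
    residual = x.copy()
    quantized = np.zeros_like(x)
    codes = []
    for cb in codebooks:                                    # cb: (K, D)
        dists = np.linalg.norm(residual[:, None, :] - cb[None, :, :], axis=-1)
        idx = dists.argmin(axis=1)                          # nearest code per vector
        quantized += cb[idx]
        residual -= cb[idx]
        codes.append(idx)
    return codes, quantized, residual

# Toy data: 8 vectors of dim 4, three 16-entry codebooks with shrinking
# scale, mimicking how later stages model ever-smaller residuals.
x = rng.normal(size=(8, 4))
codebooks = [rng.normal(size=(16, 4)) * 0.5**i for i in range(3)]
codes, xq, res = rvq_encode(x, codebooks)
```

By construction the quantized sum plus the final residual reconstructs the input exactly, and the per-stage index lists are what a codec would transmit.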
[NeurIPS 2024 Spotlight] code for "Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement"
PyTorch implementation of FractalGen https://arxiv.org/abs/2502.17437
Object-oriented handling of audio data, with GPU-powered augmentations, and more.
Official PyTorch implementation of ReWaS (AAAI'25) "Read, Watch and Scream! Sound Generation from Text and Video"
Ultra-low-bitrate neural audio codec (0.31–1.40 kbps) with richer semantics in its latent space.
A collection of literature after or concurrent with Masked Autoencoders (MAE) (Kaiming He et al.).
Latte: Cross-framework Python Package for Evaluation of Latent-based Generative Models
A simple and elegant Jekyll theme for an academic personal homepage
A beautiful, simple, clean, and responsive Jekyll theme for academics
[ICLR'24] Learning to Compose: Improving Object Centric Learning by Injecting Compositionality
Multi-view-AE: An extensive collection of multi-modal autoencoders implemented in a modular, scikit-learn style framework.
Multi-VAE: Learning Disentangled View-common and View-peculiar Visual Representations for Multi-view Clustering
Official code for SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
This toolbox aims to unify audio generation model evaluation for easier comparison.
Python packaging and dependency management made easy
This repo implements a Stable Diffusion model in PyTorch with all the essential components.
This repo implements Denoising Diffusion Probabilistic Models (DDPM) in PyTorch.
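The DDPM forward (noising) process that such an implementation trains against has a closed form: x_t = sqrt(ᾱ_t) · x_0 + sqrt(1 − ᾱ_t) · ε, with ᾱ_t the cumulative product of 1 − β_t. A minimal NumPy sketch assuming the linear β schedule from Ho et al. (2020); the toy batch is illustrative, not tied to this repo's API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear beta schedule as in the original DDPM paper (Ho et al., 2020).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)            # cumulative signal-retention factor

def q_sample(x0, t, eps):
    """Closed-form forward process: x_t = sqrt(a_bar_t)*x0 + sqrt(1-a_bar_t)*eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = rng.normal(size=(4, 8))              # toy "clean" batch
eps = rng.normal(size=x0.shape)           # the noise a network learns to predict
x_mid = q_sample(x0, 500, eps)            # partially noised sample
x_end = q_sample(x0, T - 1, eps)          # nearly pure noise: alpha_bar[-1] ~ 0
```

A denoising network is then trained to predict `eps` from `x_t` and `t`; sampling runs the learned reverse process from pure noise back to `x_0`.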