Janus-Series: Unified Multimodal Understanding and Generation Models

Python 17,702 2,235 Updated Feb 1, 2025

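nsfc - LaTeX templates for National Natural Science Foundation of China (NSFC) proposals (General, Regional, and Youth programs; CBA)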

TeX 1,217 312 Updated Mar 5, 2026

ACMMM2021 paper "I2V-GAN: Unpaired Infrared-to-Visible Video Translation"

Python 126 25 Updated Feb 11, 2022

I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models

214 4 Updated Dec 30, 2023

LaVIT: Empower the Large Language Model to Understand and Generate Visual Content

Jupyter Notebook 605 31 Updated Oct 6, 2024
Python 93 7 Updated May 25, 2024

Code of "3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces"

Python 68 12 Updated Jul 2, 2022

Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis; ICLR 2024 Spotlight; Official code

Python 1,094 131 Updated Oct 18, 2024

CVPR and NeurIPS poster examples and templates. May we have in-person poster sessions soon!

1,881 166 Updated May 9, 2023

Repository for HapticLLaMA: A Multimodal Sensory Language Model for Haptic Captioning

Python 200 27 Updated Sep 3, 2025

A Python library for audio data augmentation. Useful for making audio ML models work well in the real world, not just in the lab.

Python 2,250 213 Updated Dec 27, 2025

Source code for models described in the paper "AudioCLIP: Extending CLIP to Image, Text and Audio" (https://arxiv.org/abs/2106.13043)

Python 863 104 Updated Sep 30, 2021
Jupyter Notebook 54 13 Updated Oct 17, 2023

One minute of voice data can be enough to train a good TTS model! (few-shot voice cloning)

Python 56,313 6,153 Updated Feb 9, 2026

[CVPR-2022] Official implementation for "Knowledge Distillation with the Reused Teacher Classifier".

Python 103 19 Updated Jun 16, 2022

A PyTorch Implementation of AC-SUM-GAN from "AC-SUM-GAN: Connecting Actor-Critic and Generative Adversarial Networks for Unsupervised Video Summarization" (IEEE TCSVT 2021)

Python 28 10 Updated May 4, 2022

Source code for the paper "Unsupervised Video Summarization via Multi-source Features" published at ICMR 2021

Python 21 10 Updated Apr 5, 2022
Python 16 3 Updated Jul 10, 2024

The code for ICASSP23 paper "MHSCNet: A Multimodal Hierarchical Shot-aware Convolutional Network for Video Summarization"

Python 10 2 Updated Aug 12, 2024
Python 11 Updated Feb 29, 2024

Video summarization research repo

Python 2 1 Updated Jul 10, 2022

PyTorch code for the paper "Contrastive Losses Are Natural Criteria for Unsupervised Video Summarization"

Python 21 3 Updated Jan 7, 2023

The official implementation of 'Align and Attend: Multimodal Summarization with Dual Contrastive Losses' (CVPR 2023)

Python 86 10 Updated Apr 24, 2023
Jupyter Notebook 15 3 Updated Mar 29, 2023

Group Gated Fusion on Attention-based Bidirectional Alignment for Multimodal Emotion Recognition

Python 14 Updated May 10, 2022
Python 27 7 Updated Oct 7, 2021
Python 86 8 Updated Mar 27, 2024

[CVPR 2023] Official implementation for "CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion."

Python 612 55 Updated Jan 15, 2025

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Python 2,866 257 Updated Dec 8, 2025

Swift Parameter-free Attention Network for Efficient Super-Resolution

Python 257 18 Updated Feb 28, 2026