[TPAMI'23] Unifying Flow, Stereo and Depth Estimation
-
Updated
Jan 4, 2025 - Python
[TPAMI'23] Unifying Flow, Stereo and Depth Estimation
Unofficial implementation of "Prompt-to-Prompt Image Editing with Cross Attention Control" with Stable Diffusion
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
🚀 Cross attention map tools for huggingface/diffusers
T-GATE: Temporally Gating Attention to Accelerate Diffusion Model for Free!
1-shot image segmentation using Stable Diffusion
Code on selecting an action based on multimodal inputs. Here in this case inputs are voice and text.
[IV 2025, Oral] Official code of "6Img-to-3D: Few-Image Large-Scale Outdoor Novel View Synthesis"
Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google Deepmind
This is the project for the paper of "Low-Light Video Enhancement via Spatial-Temporal Consistent Decomposition" in IJCAI2025
The official repository of "Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models".
[NeurIPS 2023] Official implementation of the paper "CAST: Cross-Attention in Space and Time for Video Action Recognition"
This is the implementation of the paper Enhanced Photovoltaic Power Forecasting: An iTransformer and LSTM-Based Model Integrating Temporal and Covariate Interactions
Detect Deepfaked Faces Using Multiple Deeplearning Models
A lightweight PyTorch implementation of the Transformer-XL architecture proposed by Dai et al. (2019)
[ISMB 2024] Official PyTorch Code for "PhiHER2: Phenotype-informed weakly supervised model for HER2 status prediction from WSIs"
Official source code of the paper: "Unveiling Interpretability in Self-Supervised Speech Representations for Parkinson’s Diagnosis"
[ITSC-2023] HRFuser: A Multi-resolution Sensor Fusion Architecture for 2D Object Detection
SOVL System (Self-Organizing Virtual Lifeform): A complex, purpose-agnostic autonomous agent with continuous, asynchronous learning capabilities via a dynamic scaffolded LLM and a frozen base LLM
Raw C/cuda implementation of 3d GAN
Add a description, image, and links to the cross-attention topic page so that developers can more easily learn about it.
To associate your repository with the cross-attention topic, visit your repo's landing page and select "manage topics."