Deepfake Detection Solution using Multimodal Approach.
Updated Jun 22, 2025 - Python
Experiments with Multi-Modal Causal Attention combined with Multi-Grouped Query Attention
Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling
Multi-speaker diarization from video using SyncNet’s cross-modal embedding space to match multiple face tracks to corresponding audio tracks.
Multimodal deep learning package that uses both categorical and text-based features in a single deep architecture for regression and binary classification use cases.
Deep learning utilities for multimodal research
Code and Models for Binding Text, Images, Graphs, and Audio for Music Representation Learning
The first-ever series of embedding models for olfaction-vision-language applications in robotics and embodied AI.
Deep learning app that cheers you up with awesome quotes when you feel depressed
A PyTorch implementation of a Transformer Network for Machine Translation that incorporates image features to enhance the performance of the translation
Repository for the "Latent Multimodal Reconstruction for Misinformation Detection" paper
Multimodal benchmark for evaluating handwritten editorial correction in printed text.
This repository implements temporal reasoning capabilities for vision-language models in simulated embodied environments, addressing the critical limitation of frame-by-frame processing in current multimodal AI systems.
Unofficial implementation for Sigmoid Loss for Language Image Pre-Training
Brain Tumor Classification from Multisequence MRI (T1, T1C and T2) and Multimodal CT & MRI using EfficientNetV2B0 and Multiheaded Self Attention with Hyperparameter Fine-Tuning
A unified multimodal generative AI system designed to learn and adapt across multiple modalities (text, audio, vision, robotics) with minimal data and long-term autonomy through reinforcement learning.
[IROS 2023] GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation
Mixed vision-language Attention Model that gets better by making mistakes
This repo contains the official PyTorch implementation of vLMIG: Improving Visual Commonsense in Language Models via Multiple Image Generation