[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
A flexible package for multimodal deep learning that combines tabular data with text and images using Wide and Deep models in PyTorch
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
This repository contains the code for a video captioning system inspired by "Sequence to Sequence -- Video to Text". The system takes a video as input and generates an English caption describing it.
Towards Generalist Biomedical AI
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
An intelligent multimodal-learning-based system for video, product, and ad analysis. On top of this system, many downstream applications can be built, such as product recommendation and video retrieval.
This repository contains the official implementation of the paper "Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis", accepted at EMNLP 2021.
Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".
Robust multimodal integration method implemented in PyTorch and TensorFlow
[CVPR'22 Best Paper Finalist] Official PyTorch implementation of the method presented in "Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation"
Speech-conditioned face generation using Generative Adversarial Networks (ICASSP 2019)
Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
A tool for extracting multimodal features from videos.
[CVPR 2024] Official code for "Text-Driven Image Editing via Learnable Regions"
The code for our IEEE Access (2020) paper "Multimodal Emotion Recognition with Transformer-Based Self Supervised Feature Fusion".
A two-stage multimodal loss model, combined with rigid-body transformations, to regress 3D bounding boxes
CapDec: SOTA zero-shot image captioning using CLIP and GPT-2, EMNLP 2022 (Findings)
End-to-end Training for Multimodal Recommendation Systems
Implementation of CVPR 2020 paper "MMTM: Multimodal Transfer Module for CNN Fusion"