LAVIS - A One-stop Library for Language-Vision Intelligence
-
Updated
Nov 18, 2024 - Jupyter Notebook
LAVIS - A One-stop Library for Language-Vision Intelligence
FinRobot: An Open-Source AI Agent Platform for Financial Analysis using LLMs 🚀 🚀 🚀
[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
A flexible package for multimodal-deep-learning to combine tabular data with text and images using Wide and Deep models in Pytorch
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch
This repository contains various models targetting multimodal representation learning, multimodal fusion for downstream tasks such as multimodal sentiment analysis.
收集 CVPR 最新的成果,包括论文、代码和demo视频等,欢迎大家推荐!Collect the latest CVPR (Conference on Computer Vision and Pattern Recognition) results, including papers, code, and demo videos, etc., and welcome recommendations from everyone!
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
awesome grounding: A curated list of research papers in visual grounding
A collection of resources on applications of multi-modal learning in medical imaging.
TensorFlow implementation of "Multimodal Speech Emotion Recognition using Audio and Text," IEEE SLT-18
Multimodal Sarcasm Detection Dataset
This repository contains the code for a video captioning system inspired by Sequence to Sequence -- Video to Text. This system takes as input a video and generates a caption in English describing the video.
Reference mapping for single-cell genomics
Towards Generalist Biomedical AI
A comprehensive reading list for Emotion Recognition in Conversations
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
An intelligent multimodal-learning based system for video, product and ads analysis. Based on the system, people can build a lot of downstream applications such as product recommendation, video retrieval, etc.
Add a description, image, and links to the multimodal-deep-learning topic page so that developers can more easily learn about it.
To associate your repository with the multimodal-deep-learning topic, visit your repo's landing page and select "manage topics."