Stars
Official repository of Evolutionary Optimization of Model Merging Recipes
CLIP-based aesthetics predictor inspired by the interface of 🤗 huggingface transformers.
A free asynchronous library built on the reverse-engineered Shazam API, written in Python 3.10+ with asyncio and aiohttp.
A framework to enable multimodal models to operate a computer.
Open-Source Evaluation & Testing library for LLM Agents
Run Latent Consistency Models on your Mac
The official Python library for the OpenAI API
[CVPR 2023 Workshop] Code to reproduce the results of our solutions on both tracks of the Meta AI Video Similarity Challenge (CVPR 2023 Workshop). Our solutions won first place on both tracks, in…
Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch
An Open-source Toolkit for LLM Development
LAVIS - A One-stop Library for Language-Vision Intelligence
[CVPR'24 Highlight] Official PyTorch implementation of CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
[ICCV23] Official implementation of eP-ALM: Efficient Perceptual Augmentation of Language Models.
The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
An efficient implementation of the popular sequence models for text generation, summarization, and translation tasks. https://arxiv.org/pdf/2106.04718.pdf
A high-throughput and memory-efficient inference and serving engine for LLMs
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
[TMLR23] Official implementation of UnIVAL: Unified Model for Image, Video, Audio and Language Tasks.
Soldelli / Awesome-Temporal-Language-Grounding-in-Videos
Forked from rookiecm/Awesome-Temporal-Sentence-Grounding-in-Videos
A curated list of work on grounding natural language in video and related areas. :-)
Code release for "MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound"
MERLOT: Multimodal Neural Script Knowledge Models
Retrieval-Augmented Video Generation for Telling a Story
Official code for VisProg (CVPR 2023 Best Paper!)
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
A lightweight yet powerful audio-to-MIDI converter with pitch bend detection