Stars
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
A simple screen parsing tool towards pure vision based GUI agent
Welcome to the Llama Cookbook! This is your go to guide for Building with Llama: Getting started with Inference, Fine-Tuning, RAG. We also show you how to solve end to end problems using Llama mode…
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
A hands-on introduction to video technology: image, video, codec (av1, vp9, h265) and more (ffmpeg encoding). Translations: 🇺🇸 🇨🇳 🇯🇵 🇮🇹 🇰🇷 🇷🇺 🇧🇷 🇪🇸
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
PyTorch code and models for the DINOv2 self-supervised learning method.
Code for replicating Roboflow 100 benchmark results and programmatically downloading benchmark datasets
Live-bending a foundation model's output at neural network level.