Stars
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal domains, for both inference and training.
A feature-rich command-line audio/video downloader
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Magnificent app which corrects your previous console command.
🌐 Make websites accessible for AI agents. Automate tasks online with ease.
A high-throughput and memory-efficient inference and serving engine for LLMs
Tesseract Open Source OCR Engine (main repository)
Rich is a Python library for rich text and beautiful formatting in the terminal.
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
A library for efficient similarity search and clustering of dense vectors.
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
DSPy: The framework for programming—not prompting—language models
CLIP (Contrastive Language-Image Pretraining): predicts the most relevant text snippet given an image
Swagger UI is a collection of HTML, JavaScript, and CSS assets that dynamically generate beautiful documentation from a Swagger-compliant API.
Generative Models by Stability AI
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Oh my tmux! My self-contained, pretty & versatile tmux configuration made with 💛🩷💙🖤❤️🤍
Fast and memory-efficient exact attention
Learn OpenCV: C++ and Python Examples
🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
verl: Volcano Engine Reinforcement Learning for LLMs
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
An open source implementation of CLIP.
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬