Stars
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models across text, vision, audio, and multimodal tasks, for both inference and training.
Tesseract Open Source OCR Engine (main repository)
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
A library for efficient similarity search and clustering of dense vectors.
💫 Industrial-strength Natural Language Processing (NLP) in Python
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
Library for fast text representation and classification.
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Code for the paper "Language Models are Unsupervised Multitask Learners"
Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
AlexeyAB / darknet
Forked from pjreddie/darknet
YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
A fully-modern text-based browser, rendering to TTY and browsers
Development repository for the Triton language and compiler
State-of-the-Art Text Embeddings
Datasets, Transforms and Models specific to Computer Vision
Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
StableLM: Stability AI Language Models
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:
This repository contains implementations and illustrative code to accompany DeepMind publications
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
A very simple framework for state-of-the-art Natural Language Processing (NLP)
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RN…
OpenProject is the leading open source project management software.
Open source code for AlphaFold 2.