Starred repositories
Robust Speech Recognition via Large-Scale Weak Supervision
Interact with your documents using the power of GPT, 100% privately, no data leaks
The world's simplest facial recognition API for Python and the command line
An open-source RAG-based tool for chatting with your documents.
Build Real-Time Knowledge Graphs for AI Agents
OCR, layout analysis, reading order, table recognition in 90+ languages
A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.
Open Source AI Platform - AI Chat with advanced features that works with every LLM
Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.
Always know what to expect from your data.
An open source multi-tool for exploring and publishing data
Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
Anomaly detection related books, papers, videos, and toolboxes
File parser optimised for LLM ingestion with no loss 🧠 Parse PDFs, DOCX, and PPTX files into a format that is ideal for LLMs.
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Wo…
A suite of utilities for converting to and working with CSV, the king of tabular file formats.
[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models
High-resolution models for human tasks.
Structured data extraction and instruction calling with ML, LLM and Vision LLM
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social …
Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.
A Python library for automating interaction with websites.
🆔 A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.
A generic JSON document store with sharing and synchronisation capabilities.
A utility for mocking out the Python Requests library.