HiromuHota

Organizations

@apache @spaCy-ja

51 stars written in Python

The uncompromising Python code formatter

Python 41,455 2,753 Updated Apr 8, 2026

💫 Industrial-strength Natural Language Processing (NLP) in Python

Python 33,428 4,669 Updated Mar 28, 2026

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

Python 33,170 2,299 Updated Apr 8, 2026

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

Python 29,258 3,552 Updated Dec 5, 2025

The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while control…

Python 25,241 5,541 Updated Apr 9, 2026

🦉 Data Versioning and ML Experiments

Python 15,518 1,291 Updated Apr 7, 2026

Low-code framework for building custom LLMs, neural networks, and other AI models

Python 11,664 1,215 Updated Apr 8, 2026

Ready-to-run Docker images containing Jupyter applications

Python 8,428 2,991 Updated Apr 7, 2026

A system for quickly generating training data with weak supervision

Python 5,948 855 Updated May 2, 2024

Representation learning on large graphs using stochastic graph convolutions.

Python 3,678 854 Updated Aug 4, 2024

A command line utility to display dependency tree of the installed Python packages

Python 2,991 158 Updated Apr 8, 2026

Deep neural network to extract intelligent information from invoice documents.

Python 2,682 413 Updated May 3, 2024

Simple wrapper of tabula-java: extract table from PDF into pandas DataFrame

Python 2,314 303 Updated Dec 5, 2024

A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.

Python 2,256 370 Updated Jun 24, 2022

A benchmark for LLMs on complicated tasks in the terminal

Python 1,940 499 Updated Jan 22, 2026

Generate modern Python clients from OpenAPI

Python 1,930 270 Updated Apr 8, 2026

A web interface to extract tabular data from PDFs

Python 1,794 238 Updated Jan 3, 2025

Auto-generate PEP-484 annotations

Python 1,447 60 Updated Jul 3, 2022

Strip output from Jupyter and IPython notebooks

Python 1,441 103 Updated Feb 21, 2026

Harbor is a framework for running agent evaluations and creating and using RL environments.

Python 1,385 888 Updated Apr 9, 2026

Simple reference implementation of GraphSAGE.

Python 1,044 249 Updated May 11, 2020

Simple PDF text extraction

Python 1,008 109 Updated Feb 27, 2026

A Japanese NLP Library using spaCy as framework based on Universal Dependencies

Python 842 58 Updated Mar 30, 2024
Python 471 159 Updated Jun 14, 2020

🌲 A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved.

Python 459 94 Updated Aug 3, 2023

Snorkel MeTaL: A framework for training models with multi-task weak supervision

Python 430 77 Updated Sep 16, 2019

A knowledge base construction engine for richly formatted data

Python 412 76 Updated Jun 23, 2021

Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.

Python 412 82 Updated Aug 10, 2024

Neural Symbolic Machines is a framework to integrate neural networks and symbolic representations using reinforcement learning, with applications in program synthesis and semantic parsing.

Python 384 69 Updated Nov 21, 2022

Pythonic interface to GnuCash SQL documents

Python 337 86 Updated Jan 5, 2025