abubelinha

Stars

data-mining

27 repositories

A pure-Python utility to extract text and images from docx files.

Python 575 105 Updated Mar 24, 2025

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

Python 8,855 1,580 Updated Jun 10, 2024

A Python 3 module that converts a bs4 Tag into a JSON object (dict).

Python 15 2 Updated Aug 3, 2025

Apply different text recognition services to images of handwritten documents.

Python 188 18 Updated Dec 26, 2022

Python wrapper for the tesseract OCR engine. The module is based on OpenCV

Python 177 48 Updated Sep 8, 2017

A Python wrapper for Google Tesseract

Python 6,288 749 Updated Dec 22, 2025

Google Search Results via SERP API pip Python Package

Python 723 117 Updated Dec 3, 2025

A Python module for use with Elsevier's APIs: Scopus, ScienceDirect, others.

Python 417 161 Updated Nov 9, 2022

Extract data from all Google Scholar pages from a single Python module.

Python 126 21 Updated Jul 31, 2025

Zotero is a free, easy-to-use tool to help you collect, organize, annotate, cite, and share your research sources.

JavaScript 13,046 918 Updated Dec 21, 2025

Catalogue of Life toolkit for Python

Python 11 2 Updated Aug 4, 2020

Python PDF Parser (Not actively maintained). Check out pdfminer.six.

Python 5,301 1,122 Updated Dec 7, 2022

Community maintained fork of pdfminer - we fathom PDF

Python 6,835 1,015 Updated Dec 14, 2025

pdfrw is a pure Python library that reads and writes PDFs

Python 34 4 Updated Nov 9, 2022

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

Python 9,369 838 Updated Nov 8, 2025

smoothscan is a tool to convert scanned text into a vectorized output form.

C 67 4 Updated Sep 23, 2013

Extract text and its region from an image using OpenCV.

Python 1 Updated Nov 23, 2019

GUI for cropping a large amount of images quickly.

Python 2 3 Updated May 9, 2021

A list of awesome resources and links related to digital repositories (especially open access)

1 Updated Oct 25, 2023

client for Crossref search API

Python 237 32 Updated Oct 10, 2025

A versatile Python library for EPUB2/EPUB3 manipulation and processing.

Python 1,730 251 Updated Nov 18, 2025

Classifiers for the Phenobase project

Python 5 2 Updated Nov 21, 2025

Scrapes dictionary definitions from Oxford and Collins Cobuild dictionaries and stores them in a sqlite database.

Python 2 Updated May 23, 2021

Scrapy-based Python code that crawls well-known online dictionaries such as Oxford, Longman, Cambridge, Webster, and Collins to build a dataset.

Python 107 21 Updated May 21, 2023

Wrapper for the pycatastro library: simplified queries to the API of the Spanish Catastro (land registry).

Python 7 Updated Jul 23, 2023

A Python library to extract tabular data from PDFs

Python 3,552 523 Updated Nov 12, 2025