abubelinha

7 followers · 21 following

Achievements

Stars

data-mining

27 repositories

ankushshah89 / python-docx2txt

A pure python based utility to extract text and images from docx files.

Python 575 105 Updated Mar 24, 2025

clips / pattern

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

Python 8,855 1,580 Updated Jun 10, 2024

MrDebugger / bs2json

A python3 module that converts your bs4 Tag into json object (dict)

Python 15 2 Updated Aug 3, 2025

caltechlibrary / handprint

Apply different text recognition services to images of handwritten documents.

Python 188 18 Updated Dec 26, 2022

RobinDavid / Pytesser

Python wrapper for the tesseract OCR engine. The module is based on OpenCV

Python 177 48 Updated Sep 8, 2017

madmaze / pytesseract

A Python wrapper for Google Tesseract

Python 6,288 749 Updated Dec 22, 2025

serpapi / google-search-results-python

Google Search Results via SERP API pip Python Package

Python 723 117 Updated Dec 3, 2025

ElsevierDev / elsapy

A Python module for use with Elsevier's APIs: Scopus, ScienceDirect, others.

Python 417 161 Updated Nov 9, 2022

dimitryzub / scrape-google-scholar-py

Extract data from all Google Scholar pages from a single Python module.

Python 126 21 Updated Jul 31, 2025

zotero / zotero

Zotero is a free, easy-to-use tool to help you collect, organize, annotate, cite, and share your research sources.

JavaScript 13,046 918 Updated Dec 21, 2025

ynulihao / SP2000

Catalogue of Life toolkit for Python

Python 11 2 Updated Aug 4, 2020

euske / pdfminer

Python PDF Parser (Not actively maintained). Check out pdfminer.six.

Python 5,301 1,122 Updated Dec 7, 2022

pdfminer / pdfminer.six

Community maintained fork of pdfminer - we fathom PDF

Python 6,835 1,015 Updated Dec 14, 2025

sarnold / pdfrw

pdfrw is a pure Python library that reads and writes PDFs

Python 34 4 Updated Nov 9, 2022

jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

Python 9,369 838 Updated Nov 8, 2025

ncraun / smoothscan

smoothscan is a tool to convert scanned text into a vectorized output form.

C 67 4 Updated Sep 23, 2013

scantailor / scantailor

C++ 1,747 318 Updated Nov 29, 2020

ImageProcessing-ElectronicPublications / python-image-region-extraction

Extract text and its region from image using openCV

Python 1 Updated Nov 23, 2019

ImageProcessing-ElectronicPublications / python-cropper-tk

GUI for cropping a large amount of images quickly.

Python 2 3 Updated May 9, 2021

kshepherd / awesome-digital-repository-list

A list of awesome resources and links related to digital repositories (especially open access)

1 Updated Oct 25, 2023

sckott / habanero

client for Crossref search API

Python 237 32 Updated Oct 10, 2025

aerkalov / ebooklib

A versatile Python library for EPUB2/EPUB3 manipulation and processing.

Python 1,730 251 Updated Nov 18, 2025

rafelafrance / phenobase

Classifiers for the Phenobase project

Python 5 2 Updated Nov 21, 2025

twisp007 / dictScrape

Scrapes dictionary definitions from Oxford and Collins Cobuild dictionaries and stores them in a sqlite database.

Python 2 Updated May 23, 2021

kiasar / Dictionary_crawler

This is a python code based on Scrapy package to crawl famous online dictionaries like Oxford, Longman, Cambridge, Webster, and Collins to make a dataset

Python 107 21 Updated May 21, 2023

MrCabss69 / Python-Catastro

Wrapper for the pycatastro library - simplified queries to the API of the Spanish Catastro (land registry).

Python 7 Updated Jul 23, 2023

camelot-dev / camelot

A Python library to extract tabular data from PDFs

Python 3,552 523 Updated Nov 12, 2025