Stars
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models across text, vision, audio, and multimodal tasks, for both inference and training.
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
🔒 Consolidating and extending hosts files from several well-curated sources. Optionally pick extensions for porn, social media, and other categories.
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials,…
Nuitka is a Python compiler written in Python. It's fully compatible with Python 2.6, 2.7, 3.4-3.13. You feed it your Python app, it does a lot of clever things, and spits out an executable or exte…
Statsmodels: statistical modeling and econometrics in Python
An open access book on scientific visualization using Python and matplotlib
An open source multi-tool for exploring and publishing data
Practical Python Programming (course by @dabeaz)
Deep learning library featuring a higher-level API for TensorFlow.
Community-maintained fork of pdfminer - we fathom PDF
A Unified Toolkit for Deep Learning Based Document Image Analysis
Python PDF Parser (Not actively maintained). Check out pdfminer.six.
💡 Full-featured code intelligence and smart autocomplete for Sublime Text
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
CommonMark spec, with reference implementations in C and JavaScript
Snips Python library to extract meaning from text
Best practices and tips & tricks for writing scientific papers in LaTeX, with figures generated in Python or Matlab.
(OLD REPO) Line-by-line profiling for Python - Current repo ->
A Python library to extract tabular data from PDFs
Was an interactive continuous Python profiler.