- San Francisco, CA
Starred repositories
Library for fast text representation and classification.
newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website …
Convert PDF to HTML without losing text or format.
✨ Build a beautiful and simple website in literally minutes. Demo at https://beautifuljekyll.com
Awesome Search - this is all about the (e-commerce, but not only) search and its awesomeness
(Java) A Method to Extract Tabular Content from PDF Files
BFO repository including source code and latest documents
GIS boundaries in GeoJSON format for all US Congressional Districts, 1789 to 2012
This contains materials for the word embeddings workshop
[deprecated] U.S. public financial analysis tools using pandas.