  • San Francisco, CA

Highlights

  • Pro

Starred repositories

13 stars written in HTML
Library for fast text representation and classification.

HTML 26,458 4,817 Updated Mar 22, 2024

newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:

HTML 14,907 2,134 Updated Dec 6, 2025

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website …

HTML 13,458 1,111 Updated Dec 19, 2025

Convert PDF to HTML without losing text or format.

HTML 10,561 1,858 Updated Jun 2, 2023

✨ Build a beautiful and simple website in literally minutes. Demo at https://beautifuljekyll.com

HTML 5,748 17,304 Updated Mar 23, 2025

Awesome Search: all about search (e-commerce, but not only) and its awesomeness

HTML 1,502 131 Updated Dec 13, 2025

(Java) A Method to Extract Tabular Content from PDF Files

HTML 336 133 Updated Apr 22, 2023

PYthon Automated Term Extraction

HTML 318 39 Updated Feb 8, 2023

BFO repository including source code and latest documents

HTML 290 49 Updated Jan 30, 2024

GIS boundaries in GeoJSON format for all US Congressional Districts, 1789 to 2012

HTML 156 70 Updated Nov 29, 2025

This contains materials for the word embeddings workshop

HTML 127 66 Updated Jul 19, 2017

[deprecated] U.S. public financial analysis tools using pandas.

HTML 14 3 Updated Jan 8, 2022