content-extraction

Here are 7 public repositories matching this topic...

currentslab / extractnet

A fork of Dragnet that also extracts author, headline, date, and keywords from context, plus built-in metadata extraction, all in one package

python machine-learning text-mining news web-scraping webscraping news-articles news-extractor content-extraction news-extraction text-cleaning date-extraction author-extraction

Updated May 19, 2025
HTML

nikitautiu / learnhtml

Web content extraction using machine learning

html deep-learning content-extraction

Updated Mar 3, 2021
HTML

peremenov / seize

Seize is a lightweight Node or browser web-page content extractor inspired by arc90 Readability and Safari Reader

dom extract reader readability content-extraction text-score

Updated May 20, 2017
HTML

thorkill / dbce

Diff-Based Content Extraction is part of my Bachelor Thesis: Joint Approach to Boilerplate Detection in Web Archives

machine-learning machine-learning-algorithms bachelor-thesis webarchive content-extraction html-content-extraction

Updated Jun 11, 2017
HTML

arman-bd / www2any

A web application that scrapes web pages, extracts the main content, and uses OpenLLaMA to convert that content into specified formats.

flask transformer webscraping content-extraction playwright llm openllama

Updated Dec 9, 2024
HTML

chithraxx-0616 / AI_SUMMARIZER

A Chrome extension that summarizes articles using Gemini API

javascript css chrome-extension html text-analysis browser-extension web-extension user-friendly content-extraction gemini-api reading-tools generative-ai google-ai-studio summarizer-api

Updated Oct 21, 2025
HTML

xsukax / xsukax-ReadClean-PDF

A privacy-focused, client-side web application that extracts clean, readable content from any webpage and converts it to PDF format. Built with pure HTML, CSS, and JavaScript: no backend required, no tracking, complete privacy.

Updated Oct 5, 2025
HTML