A fork of Dragnet that also extracts author, headline, date, and keywords from context, with built-in metadata extraction, all in one package
Updated May 19, 2025 (HTML)
Web content extraction using machine learning
Seize is a lightweight web-page content extractor for Node or the browser, inspired by arc90 readability and Safari Reader
Diff Based Content Extraction is part of my bachelor's thesis: Joint Approach to Boilerplate Detection in Web Archives
A web application that scrapes web pages, extracts main content, and uses OpenLLaMA to convert the content into specified formats.
A Chrome extension that summarizes articles using Gemini API
A privacy-focused, client-side web application that extracts clean, readable content from any webpage and converts it to PDF format. Built with pure HTML, CSS, and JavaScript—no backend required, no tracking, complete privacy.