Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Updated Sep 12, 2025 - Python
news-please - an integrated web crawler and information extractor for news that just works
A very simple news crawler with a funny name
A Korean news crawler built to ingest large amounts of news data.
A news crawler for BBC News, Reuters and New York Times.
News crawler is a tool that helps you crawl data from a news site.
A crawler built with Python and Scrapy for real-time Taiwanese news websites.
A fast and lightweight Python API that searches for articles on Google News and returns a JSON response.
Generate large textual corpora for almost any language by crawling the web
Crawler (scraper) for collecting public data from several well-known Persian news sites
A web crawler that collects the latest Indonesian news from many sources
A Scrapy web scraper that scrapes and stores articles from theguardian.com
News Site Crawler is a multi-threaded web crawling system designed to ingest, traverse, and analyze large-scale news websites in a controlled and reliable manner.
News crawler project written in Python.
AI-powered news crawler with a RAG pipeline: scrapes, processes, and summarises news articles using LLMs
A Scrapy-based web scraper for collecting Kurdish text data from websites. The tool recursively crawls specified domains, extracts article content using Trafilatura, and filters results by language using Facebook's FastText language identification model.