dwillis

Organizations: @unitedstates

Starred repositories

44 results for source starred repositories written in HTML

newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:

HTML 14,847 2,134 Updated Nov 3, 2025

Jupyter notebooks for teaching/learning Python 3

HTML 6,712 1,823 Updated Jan 4, 2024

HTML Content / Article Extractor, web scraping lib in Python

HTML 4,050 783 Updated Dec 26, 2021

The official online compendium for Mining the Social Web, 2nd Edition (O'Reilly, 2013)

HTML 2,893 1,466 Updated Jul 11, 2022

Code examples for “Interactive Data Visualization for the Web”

HTML 2,413 1,787 Updated Dec 7, 2019

A pure-python HTML screen-scraping library

HTML 1,886 273 Updated Apr 4, 2022

Guide to Python's magic methods

HTML 1,716 347 Updated Dec 25, 2023

A batteries-included framework for easy web-scraping. Just add CSS! (Or do more.)

HTML 1,602 110 Updated Dec 26, 2018

A Python 3 compatible version of goose http://goose3.readthedocs.io/en/latest/index.html

HTML 891 109 Updated Oct 30, 2025

Documents with Scientific Intelligence

HTML 848 52 Updated Nov 5, 2025

Find dates inside text using Python and get back datetime objects

HTML 664 170 Updated May 13, 2024

Web Content Retrieval for Humans™

HTML 623 48 Updated Jul 30, 2022

Wget-compatible web downloader and crawler.

HTML 595 84 Updated Apr 29, 2024

Experimental HTML templates linting for Jinja, Nunjucks, Django templates, Twig, Liquid

HTML 251 27 Updated Jan 28, 2024

Open local or remote XLSX, XLS, ODS, CSV (comma separated), TSV (tab separated), other delimited, fixed-width files, and Google Docs. Returns an enumerator of Arrays or Hashes, depending on whether…

HTML 229 19 Updated Feb 1, 2016

A collection of R packages spanning natural language processing, statistical analysis, data visualization, and text analysis

HTML 218 36 Updated Jun 8, 2025

A Gentle Introduction to SQL Using SQLite

HTML 202 82 Updated Aug 18, 2022

Supplementary materials for Visual Storytelling with D3 (plus more D3 goodness)

HTML 138 52 Updated Feb 15, 2016

Workshop: Collecting and Analyzing Social Media Data with R

HTML 101 68 Updated Jun 23, 2019

Introduction to Computational Tools and Techniques for Social Research

HTML 99 69 Updated Sep 25, 2020

OSDI Specification

HTML 86 47 Updated Aug 25, 2022

Tooling to extract data from scanned paper forms OCR-ed by Tesseract using the HOCR standard.

HTML 84 16 Updated Mar 1, 2016

A step-by-step guide to publishing a simple news application.

HTML 75 27 Updated Mar 2, 2018

A repository for collecting several simple datasets that track the impact of the Trump 47 regime

HTML 55 2 Updated Aug 19, 2025

Interesting datasets for personal projects or submissions to #TidyTuesday

HTML 50 9 Updated Dec 8, 2020

An attempt at creating a silver/gold standard dataset for backtesting yesterday & today's content-extractors

HTML 35 3 Updated Mar 19, 2015

Files for the On The Books project

HTML 35 6 Updated Nov 5, 2024

Tracking changes to the official U.S. House and Senate roll call votes XML data files. Monitored hourly-ish by @GovTrack/@JoshData.

HTML 33 7 Updated Dec 22, 2018

Interactive and searchable House staffer directory, based on House disbursement data.

HTML 29 4 Updated Feb 29, 2024