Derek Willis dwillis

I teach data journalism at the University of Maryland and am co-founder of @openelections. I enjoy cricket, scraping and other things people enjoy.

826 followers · 135 following

Starred repositories

44 results for source starred repositories written in HTML

codelucas / newspaper

newspaper3k is a news, full-text, and article metadata extraction in Python 3. Advanced docs:

HTML 14,847 2,134 Updated Nov 3, 2025

jerry-git / learn-python3

Jupyter notebooks for teaching/learning Python 3

HTML 6,712 1,823 Updated Jan 4, 2024

grangier / python-goose

Html Content / Article Extractor, web scrapping lib in Python

HTML 4,050 783 Updated Dec 26, 2021

ptwobrussell / Mining-the-Social-Web-2nd-Edition

The official online compendium for Mining the Social Web, 2nd Edition (O'Reilly, 2013)

HTML 2,893 1,466 Updated Jul 11, 2022

scotthmurray / d3-book

Code examples for “Interactive Data Visualization for the Web”

HTML 2,413 1,787 Updated Dec 7, 2019

scrapy / scrapely

A pure-python HTML screen-scraping library

HTML 1,886 273 Updated Apr 4, 2022

RafeKettler / magicmethods

Guide to Python's magic methods

HTML 1,716 347 Updated Dec 25, 2023

propublica / upton

A batteries-included framework for easy web-scraping. Just add CSS! (Or do more.)

HTML 1,602 110 Updated Dec 26, 2018

goose3 / goose3

A Python 3 compatible version of goose http://goose3.readthedocs.io/en/latest/index.html

HTML 891 109 Updated Oct 30, 2025

stencila / stencila

Documents with Scientific Intelligence

HTML 848 52 Updated Nov 5, 2025

akoumjian / datefinder

Find dates inside text using Python and get back datetime objects

HTML 664 170 Updated May 13, 2024

michaelhelmick / lassie

Web Content Retrieval for Humans™

HTML 623 48 Updated Jul 30, 2022

ArchiveTeam / wpull

Wget-compatible web downloader and crawler.

HTML 595 84 Updated Apr 29, 2024

thibaudcolas / curlylint

Experimental HTML templates linting for Jinja, Nunjucks, Django templates, Twig, Liquid

HTML 251 27 Updated Jan 28, 2024

seamusabshere / remote_table

Open local or remote XLSX, XLS, ODS, CSV (comma separated), TSV (tab separated), other delimited, fixed-width files, and Google Docs. Returns an enumerator of Arrays or Hashes, depending on whether…

HTML 229 19 Updated Feb 1, 2016

taylor-arnold / rpkg

A collection of R packages spanning natural language processing, statistical analysis, data visualization, and text analysis

HTML 218 36 Updated Jun 8, 2025

tthibo / SQL-Tutorial

A Gentle Introduction to SQL Using SQLite

HTML 202 82 Updated Aug 18, 2022

ritchieking / d3-book

Supplementary materials for Visual Storytelling with D3 (plus more D3 goodness)

HTML 138 52 Updated Feb 15, 2016

pablobarbera / social-media-workshop

Workshop: Collecting and Analyzing Social Media Data with R

HTML 101 68 Updated Jun 23, 2019

COVID19StatePolicy / SocialDistancing

HTML 100 47 Updated Apr 27, 2023

rochelleterman / PS239T

Introduction to Computational Tools and Techniques for Social Research

HTML 99 69 Updated Sep 25, 2020

opensupporter / osdi-docs

OSDI Specification

HTML 86 47 Updated Aug 25, 2022

jsfenfen / whatwordwhere

Tooling to extract data from scanned paper forms OCR-ed by Tesseract using the HOCR standard.

HTML 84 16 Updated Mar 1, 2016

palewire / first-news-app

A step-by-step guide to publishing a simple news application.

HTML 75 27 Updated Mar 2, 2018

harrisj / trump_data

A repository for collecting several simple datasets that track the impact of the Trump 47 regime

HTML 55 2 Updated Aug 19, 2025

tacookson / data

Interesting datasets for personal projects or submissions to #TidyTuesday

HTML 50 9 Updated Dec 8, 2020

rodricios / crawl-to-the-future

An attempt at creating a silver/gold standard dataset for backtesting yesterday & today's content-extractors

HTML 35 3 Updated Mar 19, 2015

UNC-Libraries-data / OnTheBooks

Files for the On The Books project

HTML 35 6 Updated Nov 5, 2024

unitedstates / congress-votes-servo

Tracking changes to the official U.S. House and Senate roll call votes XML data files. Monitored hourly-ish by @GovTrack/@JoshData.

HTML 33 7 Updated Dec 22, 2018

propublica / staffers

Interactive and searchable House staffer directory, based on House disbursement data.

HTML 29 4 Updated Feb 29, 2024
