Starred repositories
newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:
Jupyter notebooks for teaching/learning Python 3
HTML content / article extractor, web scraping lib in Python
The official online compendium for Mining the Social Web, 2nd Edition (O'Reilly, 2013)
Code examples for “Interactive Data Visualization for the Web”
A batteries-included framework for easy web-scraping. Just add CSS! (Or do more.)
A Python 3-compatible version of goose: http://goose3.readthedocs.io/en/latest/index.html
Find dates inside text using Python and get back datetime objects
Experimental HTML template linting for Jinja, Nunjucks, Django templates, Twig, and Liquid
Open local or remote XLSX, XLS, ODS, CSV (comma separated), TSV (tab separated), other delimited, fixed-width files, and Google Docs. Returns an enumerator of Arrays or Hashes, depending on whether…
A collection of R packages spanning natural language processing, statistical analysis, data visualization, and text analysis
Supplementary materials for Visual Storytelling with D3 (plus more D3 goodness)
Workshop: Collecting and Analyzing Social Media Data with R
Introduction to Computational Tools and Techniques for Social Research
Tooling to extract data from scanned paper forms OCR-ed by Tesseract using the HOCR standard.
A step-by-step guide to publishing a simple news application.
A repository for collecting several simple datasets that track the impact of the Trump 47 regime
Interesting datasets for personal projects or submissions to #TidyTuesday
An attempt at creating a silver/gold standard dataset for backtesting yesterday's and today's content extractors
Tracking changes to the official U.S. House and Senate roll call votes XML data files. Monitored hourly-ish by @GovTrack/@JoshData.
Interactive and searchable House staffer directory, based on House disbursement data.