- London, United Kingdom
- https://underwood.network
- https://lpx.org.uk
Stars
Automatically visualize your pandas dataframe via a single print! 📊 💡
A collection of public presentations from the Common Crawl Foundation
Examples for using the Daft data engine
High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale
Magnificent app which corrects your previous console command.
tall, condensed, bitmap font for geeks
Internet Archive's Sparkling Data Processing Library
Introduction to WebGraphs - Workshop at the IIPC Web Archiving Conference 2025
Materials for the workshop held at Born-Digital Collections, Archives and Memory conference 2025.
Lightweight Python utility for retrieving individual pages from the Common Crawl archives.
A very simple Python module to suppress `KeyboardInterrupt` traceback spam when pressing `^C`.
ripgrep recursively searches directories for a regex pattern while respecting your gitignore
Vocabulary for Expressing Content Preferences for AI Training
Common Voice is part of Mozilla's initiative to help teach machines how real people speak.
An open-source handbook of applied guidance and tools for sustainable software development and maintenance.
Yet another Markdown renderer, but this one extends standard Markdown with additional features, offering enhanced functionality and flexibility for content creators.
yq is a portable command-line YAML, JSON, XML, CSV, TOML, HCL and properties processor
Statistics of Common Crawl monthly Web Graphs
Working repo to support the Alliance's Open Trusted Data Initiative
Book repository for The Turing Way: a how-to guide for reproducible, ethical and collaborative data science
LaTeX to image converter with web UI using Node.js / Docker
Run safety benchmarks against AI models and view detailed reports showing how well they performed.
Open source project for data preparation for GenAI applications
Crowd-sourced lists of URLs to help Common Crawl crawl under-resourced languages. See https://github.com/commoncrawl/web-languages-code/ for the code
⏩ Ship faster with Continuous AI. Open-source CLI that can be used in TUI mode as a coding agent or in headless mode to run background agents
X-SAMPA to IPA and IPA to X-SAMPA converter
A JavaScript utility for web pages that creates dynamic, human-readable dates, times, and relative time descriptions from UNIX timestamps.
The day-to-day front-end to the IETF database for people who work on IETF standards.