thunderpoot

Sponsors

@pbernicchi
@blacknite4ever

Highlights

  • Pro

Organizations

@commoncrawl @mlcommons @telehack-foundation @londonpixelexchange @vaughantype


Automatically visualize your pandas dataframe via a single print! 📊 💡

Python · 5,361 stars · 377 forks · Updated Mar 20, 2024

A collection of public presentations from the Common Crawl Foundation

9 stars · 1 fork · Updated Oct 22, 2025

Examples for using the Daft data engine

Jupyter Notebook · 9 stars · 2 forks · Updated Dec 18, 2025

High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale

Rust · 5,026 stars · 364 forks · Updated Dec 20, 2025

Magnificent app which corrects your previous console command.

Python · 94,958 stars · 3,802 forks · Updated Jul 19, 2024

tall, condensed, bitmap font for geeks

Vim Script · 2,039 stars · 33 forks · Updated Nov 4, 2023

Internet Archive's Sparkling Data Processing Library

Scala · 15 stars · 2 forks · Updated Dec 15, 2025

Introduction to WebGraphs - Workshop at the IIPC Web Archiving Conference 2025

Shell · 3 stars · Updated Apr 10, 2025

Materials for the workshop held at Born-Digital Collections, Archives and Memory conference 2025.

2 stars · Updated Apr 10, 2025

Lightweight Python utility for retrieving individual pages from the Common Crawl archives.

Python · 7 stars · 1 fork · Updated Mar 2, 2025

A very simple Python module to suppress `KeyboardInterrupt` traceback spam when pressing `^C`.

Python · 4 stars · Updated Feb 28, 2025

ripgrep recursively searches directories for a regex pattern while respecting your gitignore

Rust · 58,238 stars · 2,342 forks · Updated Dec 17, 2025

Vocabulary for Expressing Content Preferences for AI Training

Makefile · 2 stars · Updated Dec 18, 2025

Common Voice is part of Mozilla's initiative to help teach machines how real people speak.

TypeScript · 3,432 stars · 871 forks · Updated Dec 20, 2025

An open-source handbook of applied guidance and tools for sustainable software development and maintenance.

Jupyter Notebook · 23 stars · 4 forks · Updated Dec 15, 2025

Yet another Markdown renderer, but this one extends standard Markdown with additional features, offering enhanced functionality and flexibility for content creators

Go · 5 stars · Updated Dec 11, 2024

yq is a portable command-line YAML, JSON, XML, CSV, TOML, HCL and properties processor

Go · 14,591 stars · 725 forks · Updated Dec 20, 2025

Statistics of Common Crawl monthly Web Graphs

Python · 5 stars · 1 fork · Updated Nov 24, 2025

Working repo to support the Alliance's Open Trusted Data Initiative

Jupyter Notebook · 9 stars · 7 forks · Updated Dec 16, 2025

Book repository for The Turing Way: a how-to guide for reproducible, ethical and collaborative data science

TeX · 2,118 stars · 740 forks · Updated Dec 19, 2025

A utility to do detailed analysis of gzip files.

Rust · 3 stars · Updated Nov 19, 2024

LaTeX to image converter with web UI using Node.js / Docker

JavaScript · 287 stars · 42 forks · Updated Oct 4, 2023

Run safety benchmarks against AI models and view detailed reports showing how well they performed.

Python · 114 stars · 27 forks · Updated Dec 19, 2025

Open source project for data preparation for GenAI applications

HTML · 874 stars · 237 forks · Updated Dec 19, 2025

Crowd-sourced lists of urls to help Common Crawl crawl under-resourced languages. See https://github.com/commoncrawl/web-languages-code/ for the code

66 stars · 84 forks · Updated Dec 5, 2025

⏩ Ship faster with Continuous AI. Open-source CLI that can be used in TUI mode as a coding agent or in Headless mode to run background agents

TypeScript · 30,433 stars · 3,928 forks · Updated Dec 20, 2025

Turn ASCII art into SVG

JavaScript · 92 stars · 17 forks · Updated Sep 6, 2024

X-SAMPA to IPA and IPA to X-SAMPA converter

JavaScript · 3 stars · 2 forks · Updated May 19, 2021

A JavaScript utility for web pages that creates dynamic, human-readable dates, times, and relative time descriptions from UNIX timestamps.

JavaScript · 3 stars · Updated Aug 28, 2024

The day-to-day front-end to the IETF database for people who work on IETF standards.

Python · 874 stars · 661 forks · Updated Dec 19, 2025