Smerity

20 starred repositories written in Java

OpenRefine is a free, open source power tool for working with messy data and improving it

Java · 11,580 stars · 2,090 forks · Updated Nov 4, 2025

1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java

Java · 7,561 stars · 2,126 forks · Updated Aug 20, 2024

Enso Analytics is a self-service data prep and analysis platform designed for data teams.

Java · 7,437 stars · 334 forks · Updated Nov 5, 2025

Yahoo! Cloud Serving Benchmark

Java · 5,154 stars · 2,313 forks · Updated Oct 27, 2025

Apache Nutch is an extensible and scalable web crawler

Java · 3,087 stars · 1,258 forks · Updated Oct 15, 2025

Web-Scale Open Information Extraction

Java · 542 stars · 132 forks · Updated Mar 6, 2019

Clear implementation of arithmetic coding for educational purposes in Java, Python, C++.

Java · 387 stars · 106 forks · Updated Jan 22, 2023

Mojang's Humble bundle source

Java · 302 stars · 100 forks · Updated Oct 12, 2023

Huge Collections for Java using efficient off heap storage

Java · 276 stars · 50 forks · Updated Dec 10, 2014

Unsupervised Statistical Machine Translation

Java · 229 stars · 40 forks · Updated Aug 30, 2020

Java implementation of a probabilistic set data structure

Java · 144 stars · 15 forks · Updated May 18, 2017

A virtual pet that helps you raise your TDD practice.

Java · 122 stars · 12 forks · Updated Apr 10, 2011

Playing around with the Common Crawl dataset

Java · 70 stars · 9 forks · Updated Aug 18, 2012

An AWS SDK-backed FileSystem driver for Hadoop

Java · 64 stars · 44 forks · Updated Oct 13, 2020

NLP Utilities in Java

Java · 43 stars · 5 forks · Updated Dec 14, 2022

Hadoop jobs for WikiReverse project. Parses Common Crawl data for links to Wikipedia articles.

Java · 38 stars · 12 forks · Updated Aug 12, 2018

CommonCrawl Hello World example

Java · 33 stars · 15 forks · Updated Jun 25, 2014

Stipple Image and TSP-Art Generator for Processing

Java · 31 stars · 4 forks · Updated Nov 10, 2012

Java · 26 stars · 9 forks · Updated Aug 14, 2013

CommonCrawl Test version of Nutch

Java · 17 stars · 2 forks · Updated Jul 10, 2014