datawrangling

34 stars written in Java

Free and Open Source, Distributed, RESTful Search Engine

Java 76,391 25,834 Updated Apr 1, 2026

Android application powering the mirror in my house

Java 7,830 655 Updated May 28, 2023

Hackpad is a web-based realtime wiki.

Java 3,645 520 Updated May 22, 2023

AWS Usage Tool

Java 2,879 428 Updated Oct 7, 2022

An open source clone of Amazon's Dynamo.

Java 2,681 583 Updated Jul 24, 2023

Plugin to integrate Learning to Rank (aka machine learning for better relevance) with Elasticsearch

Java 1,524 374 Updated Feb 19, 2026

WE HAVE MOVED to Apache Incubator. https://cwiki.apache.org/FLUME/ . Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log…

Java 944 211 Updated May 26, 2021

Mirror of Apache Pig

Java 689 445 Updated Sep 15, 2025

Generating Vectors for DBpedia Entities via Word2Vec and Wikipedia Dumps. Questions? https://gitter.im/idio-opensource/Lobby

Java 600 134 Updated Jan 11, 2018

Hadoop library for large-scale data processing, now an Apache Incubator project

Java 581 132 Updated Jul 8, 2014

Web-Scale Open Information Extraction

Java 544 132 Updated Mar 6, 2019

Timeline visualization application

Java 459 70 Updated Jul 30, 2010

Fast Entity Linker Toolkit for training models to link entities to KnowledgeBase (Wikipedia) in documents and queries.

Java 340 82 Updated Feb 12, 2021

Processing applet which creates the images seen in the Streamgraph paper

Java 285 38 Updated Aug 30, 2018

Speech and Vision Based Intelligent Personal Assistant

Java 255 68 Updated Oct 7, 2017

Json Wikipedia contains code to convert the Wikipedia XML dump into a JSON/Avro dump

Java 255 43 Updated Dec 5, 2023

Remedy small files by combining them into larger ones.

Java 195 118 Updated Jul 1, 2022

Bulk loading for Elasticsearch

Java 187 57 Updated Dec 16, 2023

Warcbase is an open-source platform for managing and analyzing web archives

Java 161 47 Updated Dec 8, 2017

Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps.

Java 161 60 Updated Nov 8, 2022

Kinesis spout for Storm

Java 108 61 Updated Mar 28, 2018

Use Solr clients/tools with ElasticSearch

Java 77 25 Updated Feb 25, 2013

Playing around with the Common Crawl dataset

Java 70 9 Updated Aug 18, 2012

A grouping of Apache Pig examples.

Java 65 14 Updated Oct 13, 2020

Redis bulk-loader for Apache Pig

Java 40 10 Updated Apr 21, 2012

Disambiguation of Semantic Resources - Full framework

Java 30 4 Updated Oct 31, 2016

Gora has moved to Apache Incubator; please go to http://incubator.apache.org/gora/

Java 22 2 Updated Nov 17, 2010

Common Crawl quick hack examples

Java 19 7 Updated Feb 11, 2015

DoSeR with entity disambiguation components only

Java 16 4 Updated Jan 29, 2019

Term payloads for Elasticsearch

Java 11 5 Updated Sep 8, 2016