ogrisel

Organizations

@scikit-learn @compatible @pydata @sup-e-educ

33 stars written in Java

NOTE: This repo (The Clojure programming language) has moved:

Java 911 12 Updated Jul 4, 2014

Solandra = Solr + Cassandra

Java 883 149 Updated Mar 9, 2016

Distributed database specialized in exporting key/value data from Hadoop

Java 558 51 Updated Jun 27, 2014

Bnd/Bndtools. Tooling to build OSGi bundles including Eclipse, Maven, and Gradle plugins.

Java 555 304 Updated Dec 17, 2025

Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster.

Java 352 218 Updated Apr 8, 2025

Java version of LIBLINEAR

Java 308 137 Updated Dec 31, 2024

Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.

Java 283 59 Updated Apr 25, 2018

S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of da…

Java 234 15 Updated Mar 4, 2011

Counterclockwise is an Eclipse plugin helping developers write Clojure code

Java 220 50 Updated Apr 17, 2018

ARCHIVED: The contents of this repo have been merged into the `bnd` repo.

Java 195 86 Updated Dec 6, 2018

Bulk loading for Elasticsearch

Java 185 57 Updated Dec 16, 2023

Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps.

Java 160 61 Updated Nov 8, 2022

Java implementation of a probabilistic set data structure

Java 144 15 Updated May 18, 2017

Mirror of Apache Stanbol (incubating)

Java 115 73 Updated Feb 29, 2024

Some utilities for Lucene

Java 111 12 Updated Jun 21, 2013

Explorations relative to cloning FlumeJava

Java 94 16 Updated Oct 13, 2020

Mirror of Apache Whirr

Java 94 55 Updated Apr 28, 2017

Playing around with the Common Crawl dataset

Java 70 9 Updated Aug 18, 2012

Machine learning and natural language processing with Apache Pig

Java 53 16 Updated Dec 17, 2013

Utility to restructure research papers published as US Letter or A4 PDF files, typically to remove the two-column layout.

Java 53 5 Updated Nov 8, 2010

Example applications

Java 50 3 Updated Mar 8, 2011

NLP Utilities in Java

Java 43 5 Updated Dec 14, 2022

S4 Communication Layer

Java 36 4 Updated Jan 21, 2011

The Colossal Pipe framework for map/reduce processing.

Java 29 2 Updated Aug 19, 2014

A repository of user-defined functions for Apache Pig

Java 16 1 Updated Sep 20, 2010

Personal development repository to prepare contributions and patches for Apache Mahout

Java 16 314 Updated Jun 8, 2010

Jersey Provider for Freemarker templates

Java 7 1 Updated Jul 4, 2009

Tools for using Clueweb09 and HBase together

Java 7 5 Updated Jun 27, 2025

A word tokenizer component for UIMA that takes advantage of Unicode general classes. The tokenizer only handles French for the moment, but can be extended quite easily.

Java 7 1 Updated Sep 8, 2010