ogrisel

Organizations

@scikit-learn @compatible @pydata @sup-e-educ

33 stars written in Java

NOTE: This repo (The Clojure programming language) has moved:

Java 917 12 Updated Jul 4, 2014

Solandra = Solr + Cassandra

Java 882 149 Updated Mar 9, 2016

Distributed database specialized in exporting key/value data from Hadoop

Java 559 50 Updated Jun 27, 2014

Bnd/Bndtools. Tooling to build OSGi bundles including Eclipse, Maven, and Gradle plugins.

Java 556 300 Updated Apr 15, 2026

Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing flows locally or on a cluster.

Java 353 219 Updated Apr 8, 2025

Java version of LIBLINEAR

Java 310 139 Updated Dec 31, 2024

Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.

Java 284 59 Updated Apr 25, 2018

S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of da…

Java 233 14 Updated Mar 4, 2011

Counterclockwise is an Eclipse plugin helping developers write Clojure code

Java 223 51 Updated Apr 17, 2018

ARCHIVED: The contents of this repo have been merged into the `bnd` repo.

Java 197 84 Updated Dec 6, 2018

Bulk loading for Elasticsearch

Java 187 57 Updated Dec 16, 2023

Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps.

Java 161 60 Updated Nov 8, 2022

Java implementation of a probabilistic set data structure

Java 146 15 Updated May 18, 2017

Mirror of Apache Stanbol (incubating)

Java 117 72 Updated Feb 29, 2024

Some utilities for Lucene

Java 113 12 Updated Jun 21, 2013

Mirror of Apache Whirr

Java 95 55 Updated Apr 28, 2017

Explorations relative to cloning FlumeJava

Java 94 16 Updated Oct 13, 2020

Playing around with the Common Crawl dataset

Java 70 9 Updated Aug 18, 2012

Machine learning and natural language processing with Apache Pig

Java 53 16 Updated Dec 17, 2013

Utility to restructure research papers published as US Letter or A4 PDF files, typically to remove the two-column layout.

Java 53 5 Updated Nov 8, 2010

Example applications

Java 50 3 Updated Mar 8, 2011

NLP Utilities in Java

Java 43 5 Updated Dec 14, 2022

S4 Communication Layer

Java 36 4 Updated Jan 21, 2011

The Colossal Pipe framework for map/reduce processing.

Java 29 2 Updated Aug 19, 2014

A repository of user-defined functions for Apache Pig

Java 16 1 Updated Sep 20, 2010

Personal development repository to prepare contributions and patches for Apache Mahout

Java 16 314 Updated Jun 8, 2010

Jersey Provider for Freemarker templates

Java 7 1 Updated Jul 4, 2009

Tools for using Clueweb09 and HBase together

Java 7 5 Updated Apr 14, 2026

A word tokenizer component for UIMA that takes advantage of Unicode general classes. The tokenizer only handles French for the moment, but can be extended quite easily.

Java 7 1 Updated Sep 8, 2010