- probabl.ai
- Paris, France
- https://ogrisel.com
- @ogrisel@sigmoid.social
- @ogrisel.bsky.social
- @ogrisel
Stars
richhickey / clojure
Forked from clojure/clojure. NOTE: This repo (The Clojure programming language) has moved:
Distributed database specialized in exporting key/value data from Hadoop
Bnd/Bndtools. Tooling to build OSGi bundles including Eclipse, Maven, and Gradle plugins.
Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing flows locally or on a cluster.
Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of da…
Counterclockwise is an Eclipse plugin helping developers write Clojure code
ARCHIVED: The contents of this repo have been merged into the `bnd` repo.
Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps.
Java implementation of a probabilistic set data structure
Machine learning and natural language processing with Apache Pig
Utility to restructure research papers published as US Letter or A4 PDF files, typically to remove the two-column layout.
The Colossal Pipe framework for map/reduce processing.
A repository of user-defined functions for Apache Pig
Personal development repository to prepare contributions and patches for Apache Mahout
Tools for using Clueweb09 and HBase together
A word tokenizer component for UIMA that takes advantage of Unicode general classes. The tokenizer only handles French for the moment, but can be extended quite easily.