- California
- https://twitter.com/grep_alex
-
mleap Public
Forked from combust/mleap. MLeap: Deploy ML Pipelines to Production
Scala, Apache License 2.0, Updated Apr 1, 2022
-
spark Public
Forked from apache/spark. Apache Spark - A unified analytics engine for large-scale data processing
Scala, Apache License 2.0, Updated Feb 1, 2022
-
hadoop-book Public
Source code to accompany the book "Hadoop in Practice", published by Manning.
-
hdfs-file-slurper Public
Utility to easily copy files into HDFS
-
hadoop-utils Public
A set of Hadoop utilities to make working with Hadoop a little easier.
-
vagrant-hadoop-spark-hive Public
Vagrant project to spin up a single virtual machine running current versions of Hadoop, Hive and Spark
-
storm-trending-words Public
Quick and dirty trending words example on Storm.
-
json-mapreduce Public
InputFormat that can split multi-line JSON
-
htuple Public
A library to simplify compound field partitioning, sorting and grouping in MapReduce.
-
hiped2 Public
Source code that accompanies the book "Hadoop in Practice, Second Edition".
-
avro-sorting Public
Examples of built-in and customizable sorting in Avro and Hadoop.
-
avro-maven Public
A simple example of how to use the Avro Maven plugin to generate Avro sources.
-
filecrush Public
Forked from edwardcapriolo/filecrush. Remedy small files by combining them into larger ones.
-
hsync Public
HDFS rsync-like utility to replicate data between HDFS clusters
-
redline Public
Forked from craigwblake/redline. Pure Java Rpm Library