Stars
Egeria's Guidance on Governance as well as large media files such as presentations and movies
This repository contains the notebooks and presentations we use for our Databricks Tech Talks
Code repository for O'Reilly Hadoop Application Architectures book
hadoop-mini-clusters provides an easy way to test Hadoop projects directly in your IDE
Kerberos and Hadoop: The Madness beyond the Gate
KillrWeather is a reference application (work in progress) showing how to easily integrate streaming and batch data processing with Apache Spark Streaming, Apache Cassandra, Apache Kafka and Akka f…
Snippets and small examples demonstrating Kafka features and configs
Scalable machine learning library for Apache Hive/Spark/Pig
Streaming MapReduce with Scalding and Storm
📚 Freely available programming books
JeroMQ is a pure Java implementation of the ZeroMQ messaging library, offering high-performance asynchronous messaging for distributed or concurrent applications.
Elephant Twin is a framework for creating indexes in Hadoop
Common Crawl support library to access 2008-2012 crawl archives (ARC files)
Apache Kafka - A distributed event streaming platform