Install Spark, Kafka, Cassandra, Zookeeper
Updated Feb 20, 2017 · Python
Apache Spark is an open source distributed general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Implemented parallel and distributed algorithms using OpenMP, Apache Spark and NVIDIA CUDA
An engineering process for data science and big data processing
A content based movie recommendation system
Big data analysis of a 'shared-world' cloud application.
Repository for my software development and data science portfolio.
Notebooks for Python and Spark for Big Data
Analysis of weather data records from 1985-01-01 to 2014-12-31 for weather stations in Nebraska, Iowa, Illinois, Indiana, and Ohio.
You can do a lot of things with Apache Spark; this project works with a static file to build a batch ETL system.
This project was completed as a part of the "Advanced Big Data" course at Nile University.
A Twitter stream-processing pipeline with ingestion, processing, storage, and visualization.
Final challenge developed for the Porto Digital Residency at A3Data, where we had the opportunity to build a Data Lake with Bronze, Silver, and Gold layers for creating dashboards and analyses.
A coursework-style project from my Master's studies in Machine Learning on Big Data (University of East London), implementing distributed word embeddings and K-Means topic clustering on a large-scale news dataset using PySpark, and extending the trained models to a real-time Structured Streaming pipeline.
Created by Matei Zaharia
Released May 26, 2014