Working examples for some components on GCP, and instructions on how to run them.
Updated Apr 26, 2017 - Java
Neo4j, Cassandra, Hadoop, PySpark, RDD, MapReduce, Cluster-Computing, DataProc
Support code for the article "Connecting GCP Dataproc and Elasticsearch: Bridging the Worlds of Big Data and (vector) Search"
Co-Purchase Analysis with Scala and Spark. Project for Scalable and Cloud Programming (81942) university course @unibo
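At its core, the co-purchase analysis above amounts to counting how often each pair of products appears in the same order. As a hedged illustration, here is a minimal, dependency-free Python sketch of that counting step (the repo itself implements it at scale with Spark in Scala; the sample `orders` data is invented for the example):

```python
from collections import Counter
from itertools import combinations

def co_purchase_counts(orders):
    """Count how often each pair of products appears in the same order."""
    pair_counts = Counter()
    for items in orders:
        # deduplicate and sort so (a, b) and (b, a) collapse into one key
        for pair in combinations(sorted(set(items)), 2):
            pair_counts[pair] += 1
    return pair_counts

# Hypothetical sample data
orders = [
    ["milk", "bread", "eggs"],
    ["bread", "eggs"],
    ["milk", "bread"],
]
counts = co_purchase_counts(orders)
print(counts[("bread", "milk")])  # bread and milk co-occur in 2 orders
```

In a Spark version, the same logic typically becomes a `flatMap` emitting pairs followed by a `reduceByKey` sum, which is what lets it scale across a Dataproc cluster.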
An e-commerce data lakehouse implemented on Google Cloud Platform (GCP). This project features an end-to-end data pipeline, from raw data generation via Cloud Functions, layered processing with PySpark on Dataproc, to structured data warehousing in BigQuery. It's fully orchestrated by Apache Airflow, enabling analytics and BI with Metabase.
Counting words in a book with PySpark, for a Digital Innovation One challenge
Data Engineering Using Google Cloud Platform and Mage
Repository for the Udemy course "Airflow2.0 De 0 a Héroe" by the "Datapath" academy.
PySpark-based ETL pipeline leveraging Dataproc, Cloud Storage, Cloud Run Functions and BigQuery, to automate Spotify "New Releases" data processing and visualization in Looker Studio.
Running a wordcount job on a Google Dataproc cluster
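Wordcount is the canonical first job on a Dataproc/Spark cluster, and it follows the classic map → shuffle → reduce pattern. A minimal sketch of those steps in plain Python, runnable without a cluster (on Dataproc you would express the same pipeline with PySpark's RDD API, e.g. `flatMap`/`map`/`reduceByKey`; the sample `lines` are invented for the example):

```python
from itertools import groupby
from operator import itemgetter

def wordcount(lines):
    """Word count via the classic map -> shuffle -> reduce steps."""
    # map: emit a (word, 1) pair for every word in every line
    mapped = [(word.lower(), 1) for line in lines for word in line.split()]
    # shuffle: bring identical keys together by sorting on the word
    mapped.sort(key=itemgetter(0))
    # reduce: sum the counts for each distinct word
    return {word: sum(count for _, count in group)
            for word, group in groupby(mapped, key=itemgetter(0))}

# Hypothetical sample input
lines = ["the quick brown fox", "the lazy dog"]
print(wordcount(lines)["the"])  # "the" appears twice
```

The sort-then-group step plays the role of Spark's shuffle: it is what allows the per-key reduction to run independently for each word, and hence in parallel on a cluster.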
The Tokyo Olympic Data Analysis on GCP project is a comprehensive solution for analyzing and visualizing Olympic Games data using various GCP services.
Extract stock market data and upload it to GCP using Airflow, GCS, BigQuery, and Spark