Code snippets for Data Engineering Design Patterns book
-
spark-playground Public
Code snippets used in demos recorded for the blog.
-
Code snippets for the NDC Porto 2024 2-hour workshop on stream processing with Apache Spark Structured Streaming and Apache Flink
-
idempotency-ndc-porto-2024 Public
Demo for idempotency examples presented as part of my talk at NDC Porto 2024: https://ndcporto.com/agenda/embrace-the-failure-stay-idempotent-0v69/06fl3582ypx
-
data-ai-summit-2024 Public
Visits sessionization pipeline used for the talk
-
acid-file-formats Public
Code for Apache Hudi, Apache Iceberg and Delta Lake analysis
-
delta Public
Forked from delta-io/delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
HTML, Apache License 2.0, Updated Sep 27, 2023
-
recipes-fork Public
Forked from tspannhw/recipes
The Immerok Apache Flink Cookbook is a collection of examples of Apache Flink applications in the format of "recipes". Each recipe explains how you can solve a specific problem by leveraging one or…
-
data-generator Public
User web sessions data generator written in Python, for Kafka, Kinesis or local file system sinks
-
data-ai-summit-2020 Public
Demo code for my Data+AI Summit 2020 talk about customizing the Apache Spark state store.
-
spark-scala-playground Public
Sample processing code using Spark 2.1+ and Scala
-
Demo code for my talk about Cerberus integration with PySpark
Python, Updated Nov 6, 2019
-
spark-docker Public
Repository containing Docker images for Spark master and slave
-
Real-time data visualization with Apache Spark GraphX, Cytoscape.js, Jetty and websockets
-
bigdata-sandbox Public
Tests of tools that can be used in Big Data projects: ZooKeeper, Kafka, Cassandra, Spark and so on.
-
scala-learn Public
Learning tests showing Scala features such as implicits, guards, pattern matching, and more.