Qubole StreamingLens tool for tuning Spark Structured Streaming pipelines (Scala; updated Jan 21, 2020)
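As a rough illustration of what tuning a Structured Streaming pipeline involves, here is a minimal, generic PySpark sketch of the usual knobs (micro-batch trigger interval, shuffle partitions, per-trigger input caps). It does not use StreamingLens's own API, and the broker, topic, and checkpoint path are hypothetical placeholders.

```python
# Generic Spark Structured Streaming tuning sketch (not StreamingLens's API).
# Broker, topic, and checkpoint path are hypothetical placeholders.
# Running it also requires the spark-sql-kafka connector on the classpath.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("tuning-sketch")
    .config("spark.sql.shuffle.partitions", "64")           # size shuffles to the cluster
    .config("spark.sql.streaming.metricsEnabled", "true")   # expose streaming metrics
    .getOrCreate()
)

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")       # hypothetical broker
    .option("subscribe", "events")                           # hypothetical topic
    .option("maxOffsetsPerTrigger", 10000)                   # cap records per micro-batch
    .load()
)

query = (
    events.selectExpr("CAST(value AS STRING) AS json")
    .writeStream.format("console")
    .trigger(processingTime="30 seconds")                    # micro-batch interval
    .option("checkpointLocation", "/tmp/checkpoints/tuning-sketch")
    .start()
)
query.awaitTermination()
```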
E-commerce GCP streaming pipeline ― Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery, and Tableau; GCP batch pipeline ― Cloud Storage, Dataproc, PySpark, Cloud Spanner, and Tableau
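A minimal Apache Beam sketch of the streaming leg described above (Pub/Sub into BigQuery); the project, subscription, table, and schema names are hypothetical placeholders, not the repository's actual configuration.

```python
# Minimal Beam sketch of a Pub/Sub -> BigQuery streaming leg.
# Project, subscription, table, and schema are hypothetical placeholders.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)  # add --runner=DataflowRunner to run on Dataflow

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadOrders" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/orders-sub")
        | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "WriteToBQ" >> beam.io.WriteToBigQuery(
            "my-project:ecommerce.orders",
            schema="order_id:STRING,amount:FLOAT,ts:TIMESTAMP",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        )
    )
```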
Collecting highlights from the Quix community and social media in the form of interesting questions, comments, challenges, solutions and insights
Real-Time Data Streaming Pipeline
Stream data directly from an API into BigQuery using Apache Beam.
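A minimal sketch of that flow under simplifying assumptions: records are fetched once from a hypothetical REST endpoint and loaded into BigQuery with Beam. The endpoint URL, table, and schema are placeholders, and the repository may implement this as a continuous streaming job instead.

```python
# Minimal sketch: pull records from a REST API, load into BigQuery with Beam.
# The endpoint URL, table name, and schema are hypothetical placeholders.
import apache_beam as beam
import requests

def fetch_records():
    # One-shot fetch; a real pipeline might poll or read from Pub/Sub instead.
    resp = requests.get("https://api.example.com/v1/readings", timeout=30)
    resp.raise_for_status()
    return resp.json()  # expected: a list of dicts matching the schema below

with beam.Pipeline() as p:
    (
        p
        | "CreateRows" >> beam.Create(fetch_records())
        | "WriteToBQ" >> beam.io.WriteToBigQuery(
            "my-project:raw.readings",
            schema="sensor_id:STRING,value:FLOAT,observed_at:TIMESTAMP",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```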
Docs-only case study of a compliance & anomaly detection platform on Azure + Databricks (Streaming ETL + Batch ELT + ML).
Master's degree | Data Engineering | Final course projects | goit-de-fp
Docs-only case study – compliance reporting data platform on Azure for a Big-4 audit & consulting firm (BFSI, healthcare-style datasets), using a streaming pipeline (ETL) and a batch pipeline (ELT) with Snowflake, Synapse, ADF, Power BI, ML risk scoring, data quality, governance, and lineage.
Streaming pipeline using AWS MSK and AWS EMR with Spark, retrieving data from the Twitter Streams API
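A minimal PySpark Structured Streaming sketch of that consumer side, reading a tweets topic from Kafka (MSK) and landing it on S3. Broker addresses, topic, schema, and paths are hypothetical placeholders, and the spark-sql-kafka connector must be on the classpath.

```python
# Minimal Structured Streaming consumer for a tweets topic on MSK.
# Broker addresses, topic name, schema, and S3 paths are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StringType, TimestampType

spark = SparkSession.builder.appName("twitter-msk-sketch").getOrCreate()

tweet_schema = (
    StructType()
    .add("id", StringType())
    .add("text", StringType())
    .add("created_at", TimestampType())
)

tweets = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "b-1.msk.example.com:9092")  # hypothetical MSK broker
    .option("subscribe", "tweets")
    .option("startingOffsets", "latest")
    .load()
    .select(from_json(col("value").cast("string"), tweet_schema).alias("t"))
    .select("t.*")
)

(
    tweets.writeStream.format("parquet")
    .option("path", "s3://my-bucket/tweets/")                        # hypothetical sink
    .option("checkpointLocation", "s3://my-bucket/checkpoints/tweets/")
    .start()
    .awaitTermination()
)
```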
This project implements a modern data engineering pipeline using Databricks, PySpark, DBT, and Delta Live Tables. It follows the Medallion Architecture, supports real-time data ingestion with Autoloader, and models data with fact and dimension tables, including Slowly Changing Dimensions (SCD Type 2), all orchestrated in a scalable cloud environment.
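A minimal sketch of the ingestion step with Autoloader into a bronze Delta table, assuming hypothetical paths and table names; the project itself may express this as a Delta Live Tables pipeline rather than a plain streaming write.

```python
# Minimal Databricks Autoloader sketch for the bronze (raw) layer of a
# Medallion pipeline. Paths and table names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # provided automatically on Databricks

bronze = (
    spark.readStream.format("cloudFiles")                              # Autoloader source
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/lake/_schemas/orders")  # schema inference/evolution
    .load("/mnt/lake/raw/orders/")
)

(
    bronze.writeStream.format("delta")
    .option("checkpointLocation", "/mnt/lake/_checkpoints/bronze_orders")
    .trigger(availableNow=True)                                        # drain available files, then stop
    .toTable("bronze.orders")
)
```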
Data Engineer Training Using Google Cloud Platform
AI-powered data sanitizer with schema detection, dedupe, outlier detection, and LLM enrichment.
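A minimal pandas sketch of two of those steps, dedupe and outlier flagging; the column names are hypothetical, and schema detection and LLM enrichment are omitted.

```python
# Minimal pandas sketch: drop duplicates, flag numeric outliers (1.5 * IQR rule).
# Column names are hypothetical placeholders.
import pandas as pd

def sanitize(df: pd.DataFrame) -> pd.DataFrame:
    # Drop exact duplicate rows, keeping the first occurrence.
    df = df.drop_duplicates().reset_index(drop=True)

    # Flag numeric outliers column by column with the 1.5 * IQR rule.
    numeric = df.select_dtypes(include="number")
    q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
    iqr = q3 - q1
    outlier_mask = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)
    df["is_outlier"] = outlier_mask.any(axis=1)
    return df

cleaned = sanitize(pd.DataFrame({"amount": [10, 11, 12, 10, 9999], "qty": [1, 1, 2, 1, 1]}))
print(cleaned)
```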
🏥 Streamline healthcare claims processing with this Snowflake pipeline, featuring auto-ingestion, CDC, SCD Type 2, and data quality checks.
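A minimal sketch of the SCD Type 2 step issued through the Snowflake Python connector; connection parameters, table, and column names are hypothetical, and the actual pipeline likely drives this with Snowpipe and Streams/Tasks rather than an external script.

```python
# Simplified two-step SCD Type 2 pattern against hypothetical Snowflake tables.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="***",
    warehouse="ETL_WH", database="CLAIMS", schema="CORE",
)
cur = conn.cursor()

# Step 1: close out current dimension rows whose attributes changed.
cur.execute("""
    MERGE INTO dim_claim d
    USING stg_claim s
      ON d.claim_id = s.claim_id AND d.is_current
    WHEN MATCHED AND d.attr_hash <> s.attr_hash THEN UPDATE SET
      is_current = FALSE,
      valid_to   = CURRENT_TIMESTAMP()
""")

# Step 2: insert new versions for new or changed claims.
cur.execute("""
    INSERT INTO dim_claim (claim_id, attr_hash, status, amount,
                           valid_from, valid_to, is_current)
    SELECT s.claim_id, s.attr_hash, s.status, s.amount,
           CURRENT_TIMESTAMP(), NULL, TRUE
    FROM stg_claim s
    LEFT JOIN dim_claim d
      ON d.claim_id = s.claim_id AND d.is_current
    WHERE d.claim_id IS NULL
""")
conn.commit()
```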
🔍 Detect compliance anomalies in financial transactions with Azure and Databricks, ensuring accuracy and transparency for audit processes.
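A minimal PySpark sketch of one simple anomaly rule (a per-account 3-sigma check on transaction amounts); the table and column names and the threshold are hypothetical, and the repository's detection logic on Azure Databricks is presumably richer.

```python
# Minimal z-score style anomaly flag on transaction amounts, per account.
# Table and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()
tx = spark.table("finance.transactions")   # hypothetical Delta table

stats = Window.partitionBy("account_id")   # compare each account against its own history

flagged = (
    tx.withColumn("mean_amt", F.avg("amount").over(stats))
      .withColumn("std_amt", F.stddev("amount").over(stats))
      .withColumn(
          "is_anomaly",
          F.abs(F.col("amount") - F.col("mean_amt")) > 3 * F.col("std_amt"),
      )
)

flagged.filter("is_anomaly").write.mode("overwrite").saveAsTable("finance.tx_anomalies")
```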