Data Engineer | Technical Writer | Building Scalable, Reliable Data Pipelines | Cloud & Workflow Automation
With a passion for modern data stack tooling, I specialize in building production-ready data pipelines using Python, Apache Flink, dbt, and cloud-native GCP services, with a focus on clean, maintainable streaming and batch processing, orchestration, and infrastructure-as-code.
Core scripting and advanced querying for data engineering workflows.
Building robust batch and real-time data ingestion pipelines at scale.
Cloud data architecture and modern data warehousing solutions.
Orchestrating reliable, production-grade data workflows.
- Stream Processing: Apache Flink / PyFlink, Redpanda (Kafka-compatible)
- Data Ingestion: dlt (data load tool), PySpark, REST API pipelines
- Orchestration & Workflow: Apache Airflow, Kestra, Prefect
- Data Transformation: dbt Cloud, SQL
- Infrastructure & Deployment: Docker, Terraform, GCS, BigQuery
- CI/CD: GitHub Actions
- Version Control: Advanced Git
- Monitoring & Reliability: Structured logging, pipeline health checks, alerting
- Documentation: Pipeline lineage, runbooks, data dictionaries
A production-grade hybrid streaming and batch cryptocurrency analytics pipeline on GCP, tracking BTC, ETH, SOL, BNB, and ADA in real time at near-zero infrastructure cost (~$0.01/month).
Impact: Delivers live price aggregations with ~1 minute latency alongside enriched daily market context (market cap, OHLC candles, 24h change %) — all surfaced in a public, auto-refreshing Grafana Cloud dashboard.
Key Challenge: Designing a dual-lane architecture that keeps compute costs at zero by running Flink, Redpanda, and Airflow locally in Docker while using GCP only for storage — replacing expensive BigQuery Streaming Inserts with free Load Jobs via a GCS JSONL intermediate layer.
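A minimal sketch of that load-job pattern with the google-cloud-bigquery client; the bucket, dataset, and table names below are hypothetical placeholders:

```python
# Sketch: load JSONL files from GCS into BigQuery with a (free) Load Job
# instead of paid Streaming Inserts. Bucket and table IDs are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    autodetect=True,  # infer the schema from the JSONL records
)

# The wildcard URI picks up every window file the streaming job has flushed.
load_job = client.load_table_from_uri(
    "gs://coinpulse-raw/streaming/*.jsonl",    # hypothetical bucket/prefix
    "my-project.coinpulse.raw_price_windows",  # hypothetical table ID
    job_config=job_config,
)
load_job.result()  # blocks until the load succeeds or raises
```

Load Jobs are free within quota while Streaming Inserts bill per ingested byte; the trade-off is a minute or two of extra latency, which the 1-minute windows already tolerate.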
Architecture:
- Streaming lane: Binance WebSocket → Python Producer → Redpanda → PyFlink (1-min tumbling windows; sketched after this list) → GCS JSONL → BigQuery
- Batch lane: CoinGecko API → Airflow 7-task DAG → GCS Parquet → BigQuery
- Transformation: dbt Cloud staging views + incremental mart tables (daily @ 07:00 UTC)
- Visualization: Grafana Cloud, 6 panels, 30-second auto-refresh
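To make the streaming lane concrete, here is a minimal PyFlink Table API sketch of the 1-minute tumbling-window aggregation; the topic, field names, and paths are hypothetical, and the Kafka/filesystem connector JARs plus GCS credentials are assumed to be configured:

```python
# Sketch of the streaming lane's 1-minute tumbling-window aggregation.
from pyflink.table import EnvironmentSettings, TableEnvironment
from pyflink.table.expressions import col, lit
from pyflink.table.window import Tumble

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source: trade events from Redpanda (Kafka-compatible). Topic and fields
# are hypothetical.
t_env.execute_sql("""
    CREATE TABLE trades (
        symbol STRING,
        price DOUBLE,
        event_time TIMESTAMP(3),
        WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'crypto-trades',
        'properties.bootstrap.servers' = 'localhost:9092',
        'scan.startup.mode' = 'latest-offset',
        'format' = 'json'
    )
""")

# Sink: newline-delimited JSON on GCS, later picked up by a free BigQuery
# Load Job. Bucket is hypothetical.
t_env.execute_sql("""
    CREATE TABLE gcs_jsonl_sink (
        symbol STRING,
        window_start TIMESTAMP(3),
        avg_price DOUBLE,
        high DOUBLE,
        low DOUBLE
    ) WITH (
        'connector' = 'filesystem',
        'path' = 'gs://coinpulse-raw/streaming/',
        'format' = 'json'
    )
""")

# 1-minute tumbling windows keyed by symbol.
(
    t_env.from_path("trades")
    .window(Tumble.over(lit(1).minutes).on(col("event_time")).alias("w"))
    .group_by(col("symbol"), col("w"))
    .select(
        col("symbol"),
        col("w").start.alias("window_start"),
        col("price").avg.alias("avg_price"),
        col("price").max.alias("high"),
        col("price").min.alias("low"),
    )
    .execute_insert("gcs_jsonl_sink")
)
```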
Stack: PyFlink 2.2.0 · Redpanda · Apache Airflow 2.9.2 · dbt Cloud · BigQuery · GCS · Terraform · Grafana Cloud · Python · Docker
Live Dashboard: derrickryangiggs.grafana.net | Repo: coinpulse
An end-to-end ELT pipeline ingesting World Bank external debt data (JEDH + QEDS datasets) into BigQuery, with dbt Cloud transformations and a Looker Studio dashboard tracking sovereign debt trends across 120+ countries.
Impact: Automated quarterly ingestion of World Bank IDS data, surfacing debt-to-GNI ratios, creditor composition, and external debt stock trends across developing economies in an interactive public dashboard.
Key Challenge: Fixing inflated, double-counted debt totals in staging models, caused by World Bank aggregate region codes being included alongside country-level records; the corrected figures were verified against published World Bank numbers.
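A hedged sketch of the ingestion call using wbgapi's option to exclude aggregates; the series code, time range, and GCS path are hypothetical choices:

```python
# Sketch: pull a World Bank debt series with wbgapi and exclude aggregate
# region codes so they are not double-counted alongside country rows.
# Series code, time range, and GCS path are hypothetical.
import wbgapi as wb

df = wb.data.DataFrame(
    "DT.DOD.DECT.CD",    # external debt stocks, total (example series)
    time=range(2000, 2025),
    skipAggs=True,       # drop aggregates like regions and income groups
    labels=True,         # keep human-readable economy names
)

# Writing straight to GCS requires the gcsfs and pyarrow packages.
df.to_parquet("gs://sovereign-debt-raw/ids/external_debt.parquet")
```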
Architecture:
- Ingestion: wbgapi Python library → Apache Airflow (CeleryExecutor, Docker Compose) → GCS Parquet → BigQuery
- Transformation: dbt Cloud (staging → mart layer, incremental models)
- Visualization: Looker Studio connected to BigQuery mart tables
Stack: Python · Apache Airflow · dbt Cloud · BigQuery · GCS · PySpark · Docker · Looker Studio
Repo: sovereign-debt-observatory
A cloud-native batch pipeline analyzing global tech ecosystem health by correlating layoffs trends with YC startup activity, built entirely on GCP with infrastructure-as-code.
Impact: Enables macro-level analysis of tech sector cycles — surfacing patterns between funding activity, layoff waves, and startup formation rates in a Looker Studio dashboard refreshed on a weekly schedule.
Key Challenge: Joining two independently sourced datasets (Layoffs.fyi + YC company data) with different granularities and update cadences into a coherent, time-aligned analytical model without double-counting events across reporting periods.
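One way to avoid that double-counting, sketched here with pandas on illustrative mini-frames, is to roll both sources up to a shared monthly grain before joining:

```python
# Sketch: align two differently grained sources to a monthly grain so each
# event lands in exactly one reporting period. Data is illustrative.
import pandas as pd

layoffs = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-05", "2023-01-20", "2023-02-03"]),
    "laid_off": [120, 300, 80],
})
yc = pd.DataFrame({
    "company": ["a", "b", "c"],
    "batch_start": pd.to_datetime(["2023-01-01", "2023-01-01", "2023-06-01"]),
})

# Roll both up to calendar months (month-start index).
layoffs_monthly = layoffs.resample("MS", on="date")["laid_off"].sum()
yc_monthly = yc.groupby(
    yc["batch_start"].dt.to_period("M").dt.to_timestamp()
).size()

# Outer-join on the shared monthly index; missing months become zero.
aligned = pd.concat(
    [layoffs_monthly.rename("laid_off"), yc_monthly.rename("new_yc_companies")],
    axis=1,
).fillna(0)
print(aligned)
```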
Architecture:
- Ingestion: REST APIs + CSV sources → Kestra workflow orchestration → GCS
- Transformation: dbt Cloud (staging → mart layer)
- Infrastructure: Terraform (GCS bucket, BigQuery datasets, IAM)
- Visualization: Looker Studio
Stack: Python · Kestra · dbt Cloud · BigQuery · GCS · Terraform · Looker Studio · Docker
Repo: tech-ecosystem-observatory
Built a production-ready ingestion pipeline using dlt (data load tool) to extract, normalize, and load NYC taxi trip data into a cloud data warehouse.
Impact: Automated end-to-end data loading with schema inference, incremental loading, and built-in data quality checks.
Key Challenge: Handling schema evolution across different taxi dataset versions while maintaining idempotent, reliable loads.
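A minimal sketch of that pattern in dlt; the endpoint, primary key, and cursor column are hypothetical, but a merge write disposition plus an incremental cursor is what makes re-runs idempotent:

```python
# Sketch: incremental, idempotent loading with dlt. Endpoint and column
# names are hypothetical placeholders.
import dlt
import requests

@dlt.resource(
    name="taxi_rides",
    write_disposition="merge",  # upsert on the primary key => safe re-runs
    primary_key="ride_id",
)
def taxi_rides(
    pickup=dlt.sources.incremental(
        "pickup_datetime", initial_value="2009-01-01T00:00:00"
    ),
):
    # Only fetch records newer than the last stored cursor value; dlt
    # infers (and evolves) the schema from the yielded records.
    resp = requests.get(
        "https://example.com/api/taxi-trips",  # placeholder endpoint
        params={"since": pickup.last_value},
        timeout=30,
    )
    resp.raise_for_status()
    yield resp.json()

pipeline = dlt.pipeline(
    pipeline_name="nyc_taxi",
    destination="bigquery",
    dataset_name="taxi_data",
)
print(pipeline.run(taxi_rides()))
```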
Stack: Python · dlt · SQL · GitHub Actions
Leveraged PySpark to process and analyze large-scale datasets using distributed computing techniques.
Impact: Applied big data processing fundamentals to transform raw datasets into structured, analysis-ready formats.
Key Challenge: Optimizing Spark jobs for performance while maintaining code clarity and reproducibility in Jupyter notebooks.
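An illustrative PySpark sketch of the kind of transform involved; the input path and column names are hypothetical:

```python
# Sketch: read raw CSV, fix types, and aggregate with PySpark.
# Input path and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch-processing-demo").getOrCreate()

trips = (
    spark.read.option("header", True)
    .csv("data/raw/taxi/*.csv")  # placeholder path
    .withColumn("trip_distance", F.col("trip_distance").cast("double"))
    .withColumn("fare_amount", F.col("fare_amount").cast("double"))
)

# Analysis-ready summary: per-zone trip counts, revenue, and distance.
summary = trips.groupBy("pickup_zone").agg(
    F.count("*").alias("trips"),
    F.sum("fare_amount").alias("total_fares"),
    F.avg("trip_distance").alias("avg_distance"),
)
summary.show()
```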
Stack: PySpark · Python · Jupyter Notebook
Open to Remote & Hybrid Opportunities
GitHub: github.com/Derrick-Ryan-Giggs · Blog: medium.com/@derrickryangiggs · dev.to/derrickryangiggs · ryan-giggs.hashnode.dev · LinkedIn: in/ryan-giggs-a19330265
Open to collaborating on interesting data infrastructure projects and discussions about data engineering, cloud architecture, and modern data stack tooling.
Last Updated: 2026-04-24