
Derrick Ryan Giggs

Data Engineer | Technical Writer | Building Scalable, Reliable Data Pipelines | Cloud & Workflow Automation

With a passion for modern data stack tooling, I specialize in building production-ready data pipelines with Python, Apache Flink, dbt, and cloud-native GCP services, focusing on clean, maintainable streaming and batch workloads, orchestration, and infrastructure-as-code.


Core Competencies

Languages & Querying

Python · SQL · Shell Scripting

Core scripting and advanced querying for data engineering workflows.

Data Processing & Ingestion

Apache Flink · dlt · PySpark · Jupyter

Building robust batch and real-time data ingestion pipelines at scale.

Cloud & Data Platforms

GCP · BigQuery · AWS · OCI

Cloud data architecture and modern data warehousing solutions.

Orchestration & Infrastructure

Airflow · Kestra · Terraform · Docker · dbt · Redpanda

Orchestrating reliable, production-grade data workflows.


Engineering Practices & Tooling

  • Stream Processing: Apache Flink / PyFlink, Redpanda (Kafka-compatible)
  • Data Ingestion: dlt (data load tool), PySpark, REST API pipelines
  • Orchestration & Workflow: Apache Airflow, Kestra, Prefect
  • Data Transformation: dbt Cloud, SQL
  • Infrastructure & Deployment: Docker, Terraform, GCS, BigQuery
  • CI/CD: GitHub Actions
  • Version Control: Advanced Git
  • Monitoring & Reliability: Structured logging, pipeline health checks, alerting (see the sketch after this list)
  • Documentation: Pipeline lineage, runbooks, data dictionaries
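
For illustration, a minimal sketch of the structured-logging pattern listed above, using only Python's standard library; the logger name, task name, and metric values are hypothetical placeholders:

```python
import json
import logging
import time

# Emit one JSON object per pipeline event so log aggregators can filter
# on fields instead of parsing free text.
class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Attach any structured fields passed via `extra=`.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("pipeline")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Hypothetical health-check event for a pipeline run.
start = time.monotonic()
rows_loaded = 1234  # placeholder metric
logger.info(
    "load complete",
    extra={"fields": {"task": "load_to_bq", "rows": rows_loaded,
                      "elapsed_s": round(time.monotonic() - start, 3)}},
)
```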

Featured Projects


CoinPulse — Real-Time Crypto Analytics Pipeline

A production-grade hybrid streaming and batch cryptocurrency analytics pipeline on GCP, tracking BTC, ETH, SOL, BNB, and ADA in real time at near-zero infrastructure cost (~$0.01/month).

Impact: Delivers live price aggregations with ~1-minute latency alongside enriched daily market context (market cap, OHLC candles, 24h change %), all surfaced in a public, auto-refreshing Grafana Cloud dashboard.

Key Challenge: Designing a dual-lane architecture that keeps compute costs at zero by running Flink, Redpanda, and Airflow locally in Docker while using GCP only for storage — replacing expensive BigQuery Streaming Inserts with free Load Jobs via a GCS JSONL intermediate layer.
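
This cost pattern is standard BigQuery behavior: batch load jobs from GCS are free of charge, while the Streaming Insert API is billed per ingested volume. A minimal sketch of the load-job step with the google-cloud-bigquery client; the bucket path and table ID are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical locations: windowed aggregates land in GCS as
# newline-delimited JSON, then a free batch load job moves them to BigQuery.
uri = "gs://coinpulse-staging/agg/*.jsonl"
table_id = "my-project.crypto.price_agg_1m"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    autodetect=True,  # a production pipeline would pin an explicit schema
)

load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()  # blocks until the job finishes, raising on failure
```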

Architecture:

  • Streaming lane: Binance WebSocket → Python Producer → Redpanda → PyFlink (1-min tumbling windows; sketched below) → GCS JSONL → BigQuery
  • Batch lane: CoinGecko API → Airflow 7-task DAG → GCS Parquet → BigQuery
  • Transformation: dbt Cloud staging views + incremental mart tables (daily @ 07:00 UTC)
  • Visualization: Grafana Cloud, 6 panels, 30-second auto-refresh
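
As a sketch of the streaming lane's windowing step: Redpanda is Kafka-compatible, so Flink's standard Kafka connector applies unchanged. The topic name, schema, and broker address below are hypothetical:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Hypothetical source table over the Redpanda topic fed by the producer.
t_env.execute_sql("""
    CREATE TABLE trades (
        symbol STRING,
        price DOUBLE,
        event_time TIMESTAMP(3),
        WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'trades',
        'properties.bootstrap.servers' = 'redpanda:9092',
        'format' = 'json'
    )
""")

# 1-minute tumbling-window aggregates per symbol; in the pipeline this
# result would be written out to GCS as JSONL for the free load jobs.
agg = t_env.sql_query("""
    SELECT symbol,
           window_start,
           window_end,
           AVG(price) AS avg_price,
           MIN(price) AS low,
           MAX(price) AS high
    FROM TABLE(TUMBLE(TABLE trades, DESCRIPTOR(event_time), INTERVAL '1' MINUTE))
    GROUP BY symbol, window_start, window_end
""")
```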

Stack: PyFlink 2.2.0 · Redpanda · Apache Airflow 2.9.2 · dbt Cloud · BigQuery · GCS · Terraform · Grafana Cloud · Python · Docker

Live Dashboard: derrickryangiggs.grafana.net | Repo: coinpulse


Sovereign Debt Observatory

An end-to-end ELT pipeline ingesting World Bank external debt data (JEDH + QEDS datasets) into BigQuery, with dbt Cloud transformations and a Looker Studio dashboard tracking sovereign debt trends across 120+ countries.

Impact: Automated quarterly ingestion of World Bank IDS data, surfacing debt-to-GNI ratios, creditor composition, and external debt stock trends across developing economies in an interactive public dashboard.

Key Challenge: Fixing inflated, double-counted totals in staging models, caused by World Bank aggregate region codes (regional and income-group rollups) being included alongside country-level records; figures were verified against published World Bank data after the fix.
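
For illustration, one way to apply that fix at the ingestion boundary using wbgapi's economy metadata. The assumption here is that the metadata's aggregate flag marks regional and income-group rollups; the data-frame column names are hypothetical:

```python
import pandas as pd
import wbgapi as wb

# World Bank "economies" include aggregates (regions, income groups) whose
# values already sum their member countries; keeping both double-counts.
economies = wb.economy.DataFrame()  # indexed by economy code
aggregate_codes = set(economies[economies["aggregate"]].index)

def drop_aggregates(df: pd.DataFrame, code_col: str = "economy") -> pd.DataFrame:
    """Keep only country-level rows; `code_col` is a hypothetical column."""
    return df[~df[code_col].isin(aggregate_codes)]
```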

Architecture:

  • Ingestion: wbgapi Python library → Apache Airflow (CeleryExecutor, Docker Compose) → GCS Parquet → BigQuery (sketched below)
  • Transformation: dbt Cloud (staging → mart layer, incremental models)
  • Visualization: Looker Studio connected to BigQuery mart tables
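
A compressed sketch of that ingestion flow as an Airflow TaskFlow DAG; the DAG name, task bodies, and paths are hypothetical placeholders for the real implementation:

```python
from datetime import datetime

from airflow.decorators import dag, task

@dag(
    schedule="0 0 1 */3 *",  # quarterly, matching the World Bank release cadence
    start_date=datetime(2025, 1, 1),
    catchup=False,
)
def sovereign_debt_ingest():
    @task
    def fetch_debt_data() -> str:
        # Pull JEDH/QEDS series via wbgapi and stage them as Parquet
        # (hypothetical local path).
        return "/tmp/debt.parquet"

    @task
    def upload_to_gcs(local_path: str) -> str:
        # Copy the Parquet file to the staging bucket (hypothetical URI).
        return "gs://debt-staging/debt.parquet"

    @task
    def load_to_bigquery(gcs_uri: str) -> None:
        # Batch load job into the raw dataset, the same free-tier
        # pattern as the CoinPulse load step above.
        ...

    load_to_bigquery(upload_to_gcs(fetch_debt_data()))

sovereign_debt_ingest()
```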

Stack: Python · Apache Airflow · dbt Cloud · BigQuery · GCS · PySpark · Docker · Looker Studio

Repo: sovereign-debt-observatory


Tech Ecosystem Observatory

A cloud-native batch pipeline analyzing global tech ecosystem health by correlating layoff trends with YC startup activity, built entirely on GCP with infrastructure-as-code.

Impact: Enables macro-level analysis of tech sector cycles — surfacing patterns between funding activity, layoff waves, and startup formation rates in a Looker Studio dashboard refreshed on a weekly schedule.

Key Challenge: Joining two independently sourced datasets (Layoffs.fyi + YC company data) with different granularities and update cadences into a coherent, time-aligned analytical model without double-counting events across reporting periods.
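
For illustration, the kind of time alignment that avoids the double counting: deduplicate each source on its natural key, aggregate both to a shared monthly grain, and only then join. Column names and sample rows are hypothetical:

```python
import pandas as pd

# Hypothetical rows standing in for the Layoffs.fyi and YC feeds.
layoffs = pd.DataFrame({
    "company": ["Acme", "Acme", "Globex"],  # note the duplicate report
    "date": ["2025-01-15", "2025-01-15", "2025-02-03"],
    "laid_off": [100, 100, 50],
})
yc_companies = pd.DataFrame({
    "company": ["Initech", "Hooli"],
    "founded_date": ["2025-01-20", "2025-02-10"],
})

def to_monthly(df: pd.DataFrame, date_col: str, value_col: str) -> pd.DataFrame:
    """Collapse a feed to one row per month so overlapping reporting
    periods cannot contribute twice."""
    out = df.copy()
    out["month"] = pd.to_datetime(out[date_col]).dt.to_period("M")
    return out.groupby("month", as_index=False)[value_col].sum()

layoffs_m = to_monthly(layoffs.drop_duplicates(["company", "date"]), "date", "laid_off")
yc_m = to_monthly(yc_companies.assign(new_companies=1), "founded_date", "new_companies")
aligned = layoffs_m.merge(yc_m, on="month", how="outer")  # one row per month
```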

Architecture:

  • Ingestion: REST APIs + CSV sources → Kestra workflow orchestration → GCS
  • Transformation: dbt Cloud (staging → mart layer)
  • Infrastructure: Terraform (GCS bucket, BigQuery datasets, IAM)
  • Visualization: Looker Studio

Stack: Python · Kestra · dbt Cloud · BigQuery · GCS · Terraform · Looker Studio · Docker

Repo: tech-ecosystem-observatory


DLT Taxi Pipeline

Built a production-ready data ingestion pipeline using dlt (data load tool) to ingest, normalize, and load NYC taxi trip data into a cloud data warehouse.

Impact: Automated end-to-end data loading with schema inference, incremental loading, and built-in data quality checks.

Key Challenge: Handling schema evolution across different taxi dataset versions while maintaining idempotent, reliable loads.
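
A minimal sketch of that pattern with dlt; the resource name, cursor column, and destination are hypothetical stand-ins (the sketch loads to DuckDB rather than a cloud warehouse):

```python
import dlt

@dlt.resource(name="taxi_trips", write_disposition="append")
def taxi_trips(
    pickup=dlt.sources.incremental("pickup_datetime", initial_value="2024-01-01"),
):
    # Hypothetical page of records; a real source would call the dataset API
    # and yield only rows newer than `pickup.last_value`.
    yield [
        {"trip_id": 1, "pickup_datetime": "2024-06-01T10:00:00", "fare": 12.5},
        {"trip_id": 2, "pickup_datetime": "2024-06-01T10:05:00", "fare": 8.0},
    ]

pipeline = dlt.pipeline(
    pipeline_name="taxi",
    destination="duckdb",  # stand-in; the project targets a cloud warehouse
    dataset_name="nyc_taxi",
)

# dlt infers (and evolves) the schema from the yielded records, and the
# incremental cursor makes repeated runs idempotent over loaded ranges.
print(pipeline.run(taxi_trips()))
```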

Stack: Python · dlt · SQL · GitHub Actions


PySpark Data Engineering

Leveraged PySpark to process and analyze large-scale datasets using distributed computing techniques.

Impact: Applied big data processing fundamentals to transform raw datasets into structured, analysis-ready formats.

Key Challenge: Optimizing Spark jobs for performance while maintaining code clarity and reproducibility in Jupyter notebooks.
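
As an illustration of that trade-off, a small sketch: prune columns before wide transformations, cache only what is reused, and right-size shuffle partitions. The input path and column names are hypothetical:

```python
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("taxi-batch")
    # Right-size shuffles for a small cluster instead of the default 200.
    .config("spark.sql.shuffle.partitions", "64")
    .getOrCreate()
)

# Hypothetical input; select only the needed columns before any shuffle.
trips = (
    spark.read.parquet("/data/taxi/*.parquet")
    .select("pickup_datetime", "payment_type", "fare_amount")
)

# Cache only because `trips` feeds two separate aggregations below.
trips.cache()

daily_revenue = trips.groupBy(F.to_date("pickup_datetime").alias("day")).agg(
    F.sum("fare_amount").alias("revenue")
)
by_payment = trips.groupBy("payment_type").count()
```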

Stack: PySpark · Python · Jupyter Notebook


Connect & Collaborate

Open to Remote & Hybrid Opportunities

GitHub: github.com/Derrick-Ryan-Giggs · Blog: medium.com/@derrickryangiggs · dev.to/derrickryangiggs · ryan-giggs.hashnode.dev · LinkedIn: in/ryan-giggs-a19330265

Open to collaborating on interesting data infrastructure projects and discussions about data engineering, cloud architecture, and modern data stack tooling.


Last Updated: 2026-04-24
