abhishek2f24/README.md

Abhishek Kumar Maurya

Senior Data Engineer · Azure · Databricks · Kafka · Spark · Snowflake


About Me

I architect and deliver cloud-native data platforms that run at production scale: 5 TB/day of ETL, ~1M IoT events/hour, and 5M+ GPS records. Currently at GHD (UK), building geospatial and streaming pipelines for global infrastructure clients.

  • 5 years designing end-to-end data platforms (ingestion → transformation → orchestration → serving)
  • Triple cloud certified — Microsoft Azure · AWS · Google Cloud
  • Expertise in real-time streaming (Kafka, Spark Streaming, Delta Lake) and the modern data stack (dbt, Snowflake, Airflow, Databricks)
  • Open to fully remote roles across US, UK, Australia, and Europe

Tech Stack

Cloud
Azure · AWS · GCP · Microsoft Fabric

Processing & Streaming
Apache Spark · PySpark · Apache Kafka · Databricks · Delta Lake

Warehousing & Transformation
Snowflake · BigQuery · dbt · Azure Synapse · Azure Data Factory

Orchestration
Apache Airflow · Azure DevOps

Languages
Python · SQL · Scala

DevOps & IaC
Docker · Kubernetes · Terraform · GitHub Actions

BI & Visualization
Power BI · Tableau


Production Impact

| Metric | Value |
| --- | --- |
| ETL throughput | 5 TB/day |
| Streaming throughput | ~1M events/hour |
| GPS records processed | 5M+ |
| Pipeline latency reduced | 40% |
| Storage costs reduced | 30% |
| Manual audit effort eliminated | 65% |
| Securable objects validated | 500+ |
| Cloud certifications | 3 (Azure + AWS + GCP) |
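
To put the daily and hourly throughput figures above on a per-second footing, here is a small conversion sketch. It assumes decimal terabytes (10^12 bytes) and an evenly spread load; neither is stated in this README, and real pipelines are usually bursty. The function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_HOUR = 60 * 60


def bytes_per_second(terabytes_per_day, bytes_per_tb=10**12):
    """Average byte rate, assuming decimal TB and a perfectly even load."""
    return terabytes_per_day * bytes_per_tb / SECONDS_PER_DAY


def events_per_second(events_per_hour):
    """Average event rate, assuming a perfectly even load."""
    return events_per_hour / SECONDS_PER_HOUR


etl_mb_per_s = bytes_per_second(5) / 10**6   # roughly 57.9 MB/s sustained
stream_eps = events_per_second(1_000_000)    # roughly 277.8 events/s sustained
```

Under those assumptions the averages are about 58 MB/s and about 278 events/s; peak rates would be higher.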

Featured Projects

Real-Time Streaming Pipeline — Kafka to Delta Lake

End-to-end streaming data platform: Kafka ingestion → PySpark Structured Streaming → Delta Lake → Snowflake → dbt models → Airflow orchestration. Production-grade with CI/CD, data quality checks, and monitoring.
Kafka · PySpark · Delta Lake · Snowflake · dbt · Airflow · Docker · GitHub Actions

Cloud Lakehouse on Azure — Medallion Architecture

Medallion architecture (Bronze/Silver/Gold) on Azure Data Lake Storage + Databricks + Delta Lake + ADF. Includes Unity Catalog governance, Great Expectations data quality checks, and a Power BI serving layer.
Azure · Databricks · Delta Lake · ADF · dbt · Unity Catalog · Power BI
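
As a rough illustration of the Bronze/Silver/Gold layering, here is a minimal plain-Python sketch. It is not the project's code: the project uses Databricks and Delta Lake, while this sketch uses in-memory lists, and the record fields (`device_id`, `ts`, `value`) and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw records as they might land in the Bronze layer, unvalidated.
bronze = [
    {"device_id": "a", "ts": 1, "value": "10.5"},
    {"device_id": "a", "ts": 1, "value": "10.5"},  # exact duplicate
    {"device_id": "b", "ts": 1, "value": None},    # missing measurement
    {"device_id": "a", "ts": 2, "value": "11.5"},
    {"device_id": "b", "ts": 2, "value": "4.0"},
]


def to_silver(rows):
    """Silver: drop incomplete rows, deduplicate on (device_id, ts), cast types."""
    seen, silver = set(), []
    for row in rows:
        if any(row.get(field) is None for field in ("device_id", "ts", "value")):
            continue
        key = (row["device_id"], row["ts"])
        if key in seen:
            continue
        seen.add(key)
        silver.append(
            {"device_id": row["device_id"], "ts": row["ts"], "value": float(row["value"])}
        )
    return silver


def to_gold(rows):
    """Gold: aggregate cleaned rows into a per-device average for serving."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["device_id"]].append(row["value"])
    return {device: sum(values) / len(values) for device, values in grouped.items()}


silver = to_silver(bronze)
gold = to_gold(silver)
```

Each layer reads only from the one before it, so the raw Bronze data stays available for reprocessing if the cleaning rules change.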

Geospatial Fleet Analytics Platform

Processes 5M+ GPS records for mining fleet operations: haversine distance, raster elevation, DBSCAN clustering for EV station placement, and an interactive Leaflet.js stakeholder dashboard.
PySpark · Databricks · GeoPandas · DBSCAN · Leaflet.js · Delta Lake
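
The haversine step mentioned above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the project's PySpark implementation: the mean Earth radius constant and the helper names `haversine_km` and `track_length_km` are chosen here for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def track_length_km(points):
    """Total distance along an ordered list of (lat, lon) GPS fixes."""
    return sum(haversine_km(*p, *q) for p, q in zip(points, points[1:]))
```

For example, `track_length_km([(0, 0), (0, 1), (0, 2)])` sums two one-degree hops along the equator. At fleet scale the same formula would typically be expressed with vectorized column operations instead of a Python loop.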


Certifications

  • Microsoft Certified: Azure Data Engineer Associate
  • AWS Certified Data Engineer
  • Google Cloud Professional Data Engineer

Currently at

GHD — Data Engineer (Remote, UK-based client)
Building geospatial data platforms for global infrastructure and mining clients.



Open to Senior Data Engineer / Data Platform Engineer roles — Remote · US · UK · Australia · Europe
abhishek2f24@gmail.com

Pinned

  1. LLM_ChatGPT (Public, Jupyter Notebook): Implementation of various LLMs in Python

  2. NoSQL-Schema-Extraction (Public, Python)

  3. rasa-chatbot (Public, Python)

  4. MultiClass-Sentence-Classification (Public, PowerShell): This repository contains files from preprocessing to deployment (as an Azure HTTP trigger function)

  5. ReinforcementLearning (Public, Jupyter Notebook)

  6. OpenCV_projects (Public, Jupyter Notebook)