Junior Data Engineer specializing in building reliable, production-style data pipelines using SQL, dbt, Airflow, and AWS. Developed an end-to-end analytics platform with idempotent ingestion, backfill handling, data quality checks, and cloud-based deployment. Familiar with Infrastructure-as-Code (Terraform) and containerized workflows (Docker). Focused on delivering stable, cost-aware, and scalable data systems.
| Category | Stack |
|---|---|
| Languages | |
| Orchestration & Transformation | |
| Streaming & Event Processing | |
| Compute & Storage | |
| Infra & DevOps | |
Architecture: Python • Kafka (KRaft) • Apache Flink • PostgreSQL • Grafana
- Built a decoupled, real-time streaming architecture to process continuous IoT telemetry data using Kafka and stateful Apache Flink (PyFlink) tumbling windows.
- Prevented downstream pipeline failures by enforcing Avro data contracts via Confluent Schema Registry.
- Orchestrated automated Airflow anomaly-detection jobs against a live PostgreSQL serving layer, integrated with Grafana for sub-second observability.
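As a rough illustration of the tumbling-window idea mentioned above, here is a minimal plain-Python sketch, not the project's PyFlink job. The event shape `(timestamp, sensor_id, value)`, the function name `tumbling_window_avg`, and the 5-second window size are assumptions made for the example.

```python
from collections import defaultdict


def tumbling_window_avg(events, window_seconds):
    """Average values per sensor over fixed, non-overlapping time windows.

    `events` is an iterable of (timestamp_seconds, sensor_id, value) tuples.
    This event shape is a hypothetical one chosen for illustration.
    """
    # (window_start, sensor_id) -> [running_sum, count]
    accumulators = defaultdict(lambda: [0.0, 0])
    for ts, sensor_id, value in events:
        # Each event belongs to exactly one window: the one whose start is
        # the timestamp rounded down to a multiple of the window size.
        window_start = (ts // window_seconds) * window_seconds
        acc = accumulators[(window_start, sensor_id)]
        acc[0] += value
        acc[1] += 1
    return {
        key: total / count
        for key, (total, count) in sorted(accumulators.items())
    }


events = [(0, "a", 10.0), (3, "a", 20.0), (5, "a", 40.0), (7, "b", 1.0)]
result = tumbling_window_avg(events, window_seconds=5)
```

A stream processor such as Flink applies the same assignment rule incrementally and keeps the accumulators as managed state; this sketch only shows how events map to windows.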
Architecture: Python • PyTorch • Docker • MLOps • DAG Orchestration
- Built a GPU-accelerated data orchestration pipeline for autonomous systems, reducing simulation bottlenecks by parallelizing 1M+ Bayesian tensor evaluations in local VRAM.
- Tested initially on the SMARTS framework, with Gymnasium orchestrating the RL environment.
- Processed complex vehicle telemetry using Directed Acyclic Graphs (DAGs) to transition from data correlation to causal root-cause extraction.
- Decoupled the infrastructure into a deployable Docker container, generating automated JSON compliance artifacts and HTML dashboards for CI/CD integration.
Architecture: Airflow • Kafka • dbt • DuckDB • AWS S3 • Terraform
- Designed an end-to-end Medallion data pipeline using Apache Airflow to ingest, cleanse, and transform streaming logistics telemetry from Apache Kafka into AWS S3.
- Engineered a decoupled compute architecture utilizing DuckDB for in-memory processing and dbt for advanced SQL transformations to track delivery SLA breaches.
- Built a resilient data contract layer using Great Expectations to filter corrupted IoT sensor data without pipeline bottlenecks.
Architecture: Python • AutoMQ • Apache Kafka • S3 • Apache Flink
- Built a proof of concept, with PermuteX as the foundation, demonstrating a migration from traditional KRaft-based Kafka to AutoMQ (S3-backed storage) with zero lines of code changed in the downstream Apache Flink compute layer.
Architecture: Python • AST • Gemini API • HuggingFace • Code-BERT
- A developer tool designed to instantly reverse-engineer and visually map legacy or undocumented Python codebases by parsing Abstract Syntax Trees (AST) into deterministic dependency graphs.
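The AST-to-dependency-graph step described above could look roughly like the following sketch. It uses only the standard-library `ast` module; the function name `import_graph` and the in-memory `sources` mapping are hypothetical and do not reflect the tool's actual interface.

```python
import ast


def import_graph(sources):
    """Map each module name to the sorted list of modules it imports.

    `sources` maps a module name to its source code as a string
    (a hypothetical input format chosen for this example).
    """
    graph = {}
    for module_name, code in sources.items():
        deps = set()
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                # `import a, b` yields one alias per imported module
                deps.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                # `from pkg import name` depends on `pkg`;
                # bare relative imports (`from . import x`) are skipped
                deps.add(node.module)
        # Sorting keeps the output deterministic across runs
        graph[module_name] = sorted(deps)
    return graph


sources = {
    "app": "import os\nfrom utils import helper\n",
    "utils": "import json\n",
}
graph = import_graph(sources)
```

Because the graph is derived from parsed syntax rather than from running the code, the same input always produces the same output, and no imports are executed.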