KOUSHIC8976/README.md

Hello, I'm KOUSHIC

Data Engineer | Modern Data Stack | IaC

Building reliable, decoupled, and production-grade data pipelines.

LinkedIn Email

About Me

Junior Data Engineer specializing in building reliable, production-style data pipelines using SQL, dbt, Airflow, and AWS. Developed an end-to-end analytics platform with idempotent ingestion, backfill handling, data quality checks, and cloud-based deployment. Familiar with Infrastructure-as-Code (Terraform) and containerized workflows (Docker). Focused on delivering stable, cost-aware, and scalable data systems.


Technical Skills

| Category | Stack |
| --- | --- |
| Languages | Python, SQL, Bash |
| Orchestration & Transformation | Airflow, dbt, Great Expectations |
| Streaming & Event Processing | Kafka, Flink |
| Compute & Storage | DuckDB, Iceberg |
| Infra & DevOps | AWS, Terraform, Docker |

📂 Featured Projects

Architecture: Python, Kafka (KRaft), Apache Flink, PostgreSQL, Grafana

  • Built a decoupled, real-time streaming architecture to process continuous IoT telemetry data using Kafka and stateful Apache Flink (PyFlink) tumbling windows.
  • Prevented downstream pipeline failures by enforcing Avro data contracts via Confluent Schema Registry.
  • Orchestrated automated Airflow anomaly-detection jobs against a live PostgreSQL serving layer, integrated with Grafana for sub-second observability.
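To illustrate the tumbling-window idea mentioned above, here is a minimal sketch in plain Python. It is not the PyFlink API and not the project's code; the event layout `(timestamp_seconds, sensor_id, value)` and the function name `tumbling_window_avg` are assumptions made for the example. Each event is assigned to exactly one fixed-size, non-overlapping window, and values are averaged per sensor and window.

```python
from collections import defaultdict


def tumbling_window_avg(events, window_seconds):
    """Average `value` per (sensor_id, window_start) over fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_seconds, sensor_id, value) tuples.
    This tuple layout is a hypothetical one chosen for illustration.
    """
    # Running [sum, count] for each (sensor_id, window_start) key.
    sums = defaultdict(lambda: [0.0, 0])
    for ts, sensor_id, value in events:
        # Align the timestamp down to the start of its window.
        window_start = ts - (ts % window_seconds)
        acc = sums[(sensor_id, window_start)]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


events = [
    (0, "s1", 10.0),
    (4, "s1", 20.0),
    (10, "s1", 30.0),  # falls into the next 10-second window
    (12, "s2", 5.0),
]
result = tumbling_window_avg(events, window_seconds=10)
```

A stateful stream processor such as Flink keeps this kind of per-key accumulator as managed state and emits a result when each window closes; the batch-style loop here only shows the window assignment and aggregation logic.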

Architecture: Python, PyTorch, Docker, MLOps, DAG Orchestration

  • Built a GPU-accelerated data orchestration pipeline for autonomous systems, reducing simulation bottlenecks by parallelizing 1M+ Bayesian tensor evaluations in local VRAM.
  • Initially tested on the SMARTS framework; uses Gymnasium to orchestrate the RL environment.
  • Processed complex vehicle telemetry using Directed Acyclic Graphs (DAGs) to transition from data correlation to causal root-cause extraction.
  • Decoupled the infrastructure into a deployable Docker container, generating automated JSON compliance artifacts and HTML dashboards for CI/CD integration.
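As a rough illustration of how a DAG can narrow a set of correlated failures down to candidate root causes, the sketch below treats edges as "cause → effect" links and keeps only the anomalous nodes that have no anomalous ancestor. This is simple graph reachability, not the project's actual method, and the node names and the `root_causes` helper are hypothetical.

```python
def root_causes(edges, anomalous):
    """Return anomalous nodes that have no anomalous ancestor in the DAG.

    `edges` is an iterable of (cause, effect) pairs; `anomalous` is a set of
    node names flagged as abnormal. Both shapes are assumptions for this sketch.
    """
    # Build a child -> parents lookup from the edge list.
    parents = {}
    for cause, effect in edges:
        parents.setdefault(effect, set()).add(cause)

    def has_anomalous_ancestor(node, seen):
        # Depth-first walk upstream; `seen` avoids revisiting shared ancestors.
        for parent in parents.get(node, ()):
            if parent in seen:
                continue
            seen.add(parent)
            if parent in anomalous or has_anomalous_ancestor(parent, seen):
                return True
        return False

    return sorted(n for n in anomalous if not has_anomalous_ancestor(n, set()))


# Hypothetical vehicle-telemetry events, for illustration only.
edges = [
    ("brake_pressure_drop", "late_braking"),
    ("sensor_noise", "late_braking"),
    ("late_braking", "collision"),
]
anomalous = {"brake_pressure_drop", "late_braking", "collision"}
causes = root_causes(edges, anomalous)
```

Here `late_braking` and `collision` are both flagged, but each sits downstream of another flagged node, so only the most upstream anomaly is reported as a candidate root cause.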

Architecture: Airflow, Kafka, dbt, DuckDB, AWS S3, Terraform

  • Designed an end-to-end Medallion data pipeline using Apache Airflow to ingest, cleanse, and transform streaming logistics telemetry from Apache Kafka into AWS S3.
  • Engineered a decoupled compute architecture utilizing DuckDB for in-memory processing and dbt for advanced SQL transformations to track delivery SLA breaches.
  • Built a resilient data contract layer using Great Expectations to filter corrupted IoT sensor data without pipeline bottlenecks.
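The data contract idea above can be sketched without any framework: validate each record against an expected schema and route failures to a quarantine list instead of stopping the run. This example uses plain Python rather than the Great Expectations API, and the `CONTRACT` fields and helper names are hypothetical.

```python
import math

# Hypothetical contract: field name -> required Python type.
CONTRACT = {"shipment_id": str, "temperature_c": float, "event_ts": int}


def validate(record):
    """Return a list of contract violations for one record (empty if valid)."""
    errors = []
    for field, expected_type in CONTRACT.items():
        value = record.get(field)
        if not isinstance(value, expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")
        elif isinstance(value, float) and not math.isfinite(value):
            # NaN or infinity usually indicates a corrupted sensor reading.
            errors.append(f"{field}: not a finite number")
    return errors


def split_by_contract(records):
    """Separate valid records from quarantined ones without raising."""
    valid, quarantined = [], []
    for record in records:
        errors = validate(record)
        if errors:
            quarantined.append({"record": record, "errors": errors})
        else:
            valid.append(record)
    return valid, quarantined


records = [
    {"shipment_id": "A1", "temperature_c": 4.5, "event_ts": 1700000000},
    {"shipment_id": "A2", "temperature_c": "NaN", "event_ts": 1700000001},
    {"shipment_id": "A3", "temperature_c": float("nan"), "event_ts": 1700000002},
    {"shipment_id": "A4", "event_ts": 1700000003},  # missing temperature_c
]
valid, quarantined = split_by_contract(records)
```

Quarantining bad rows rather than failing the batch is one way to keep corrupted readings out of downstream tables while the rest of the data continues to flow.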

Architecture: Python, AutoMQ, Apache Kafka, S3, Apache Flink

  • A proof of concept, built on PermuteX, demonstrating a migration from traditional KRaft-based Kafka to AutoMQ (S3-backed storage) with zero lines of code changed in the downstream Apache Flink compute layer.

Architecture: Python, AST, Gemini API, HuggingFace, Code-BERT

  • A developer tool designed to instantly reverse-engineer and visually map legacy or undocumented Python codebases by parsing Abstract Syntax Trees (AST) into deterministic dependency graphs.
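A minimal sketch of the AST-to-dependency-graph step, using only Python's standard `ast` module. It is an illustration under stated assumptions, not the tool's implementation: the `import_graph` function and the in-memory `sources` mapping are hypothetical, and only absolute imports are tracked.

```python
import ast


def import_graph(sources):
    """Map each module name to the sorted list of modules it imports.

    `sources` maps a module name to its source text. A real tool would read
    files from disk; the in-memory mapping keeps this sketch self-contained.
    """
    graph = {}
    for module_name, source in sources.items():
        deps = set()
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Import):
                # `import a, b` yields one alias per imported module.
                deps.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                # Bare relative imports (`from . import x`) have no module
                # name and are skipped in this sketch.
                deps.add(node.module)
        # Sorting makes the output deterministic across runs.
        graph[module_name] = sorted(deps)
    return graph


sources = {
    "app": "import os\nfrom utils import helper\n",
    "utils": "import json\n",
}
graph = import_graph(sources)
```

Because the graph is derived from the parsed syntax tree rather than from running the code, the same source always produces the same edges.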

Pinned

  1. RERA (Public)

    RERA is a failure analysis and insight layer built on top of simulation environments.

    Python 3

  2. Ledger-Sync (Public)

    Ledger Sync is a micro-batch data pipeline built on the Medallion Architecture. It ingests simulated, erratic logistics telemetry via Apache Kafka and Airflow.

    Python 2

  3. PermuteX (Public)

    An end-to-end, stateful streaming data architecture designed to ingest, validate, process, and visualize high-velocity IoT telemetry data in real-time.

    Python 1

  4. AutoMQ-Flink-Streaming (Public)

    PyFlink streaming integration demo for AutoMQ, writing real-time tumbling-window results to PostgreSQL.

    Python

  5. Code-Cartographer (Public)

    Code Cartographer is a desktop developer tool designed to instantly reverse-engineer and visually map legacy or undocumented Python codebases.

    Python 3