OlgaTatarinova/README.md

Olga Tatarinova

Senior Data Engineer · Real-Time Data Platform Specialist

Building the infrastructure that turns raw events into business intelligence



About Me

I'm a Senior Data Engineer with 5+ years of experience building production data platforms for high-load e-commerce systems. I'm currently at Wildberries (one of Europe's largest e-commerce platforms), where I design and operate analytics infrastructure processing 700M+ events per hour.

I specialize in the intersection of real-time streaming and analytical workloads: the hard problems that arise when you need both sub-second latency and fast historical queries at scale.

PostgreSQL → Debezium CDC → Kafka → Flink → Delta Lake → dbt → ClickHouse → Grafana

Core Stack

Streaming & Messaging

  • Apache Kafka

  • Apache Flink (exactly-once)

  • Apache Spark (PySpark)

  • Debezium CDC

  • Avro + Schema Registry

Storage & Processing

  • ClickHouse (OLAP)

  • Delta Lake / Apache Iceberg

  • PostgreSQL · Greenplum

  • AWS Redshift

Transformation & Orchestration

  • dbt Core (Silver/Gold layers)

  • Apache Airflow (40+ DAGs)

  • Great Expectations

  • DataHub (data lineage)

Cloud & Infrastructure

  • AWS (S3, Lambda, EC2, Redshift, Kinesis, Glue, SageMaker)

  • Yandex Cloud

  • Docker · Kubernetes · Terraform

  • GitLab CI/CD

Observability

  • Prometheus · Grafana

  • Alertmanager (P0/P1/P2)

  • Loki · Promtail

  • Superset · Redash · DataLens

Languages

  • SQL (advanced)

  • Python

  • English — fluent

  • Chinese — conversational

  • Russian — native

AI-Augmented Development

  • Claude API · OpenAI API · LLM API integration

  • Prompt Engineering (chain-of-thought, structured outputs, tool use)

  • Claude Code (agentic coding workflows)

  • AI-assisted code review, SQL generation, documentation

  • LLM-powered data pipeline automation


Featured Projects

Production-grade local data platform — from CDC to ML predictions

A complete end-to-end data platform built with open-source tooling. Demonstrates real-world patterns: exactly-once Flink, Avro schema evolution, Data Contracts, full observability stack, and troubleshooting runbooks.

CDC (Debezium) → Kafka → Flink (exactly-once) → Delta Lake → dbt → Superset + Grafana
                                                                  ↳ MLflow → SageMaker

Key features: Schema Registry · Data Contracts · 5 Troubleshooting Runbooks · GDPR/PII masking · one-command setup (make demo)
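
As an illustration of the GDPR/PII masking feature, here is a minimal sketch of one common approach: pseudonymizing PII fields with a keyed hash. This is an assumption about how masking could be done, not the project's actual code; the field names in `PII_FIELDS` and the helper `mask_record` are hypothetical.

```python
import hashlib
import hmac

# Hypothetical set of column names treated as PII.
PII_FIELDS = {"email", "phone"}


def mask_record(record: dict[str, str], secret: bytes) -> dict[str, str]:
    """Replace PII values with a keyed SHA-256 digest; other fields pass through."""
    masked = {}
    for key, value in record.items():
        if key in PII_FIELDS:
            # HMAC keeps the mapping deterministic but not reversible without the key.
            masked[key] = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
        else:
            masked[key] = value
    return masked


if __name__ == "__main__":
    print(mask_record({"order_id": "42", "email": "user@example.com"}, b"demo-key"))
```

Because the digest is deterministic for a given key, the same customer still joins across tables after masking, while the raw value never reaches the analytical layers.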

Python · Apache Flink · dbt · Delta Lake


Cloud-native data platform on AWS — Kinesis · Glue · Iceberg · SageMaker

A 12-week portfolio project targeting the AWS Data Engineer Associate certification. It implements a full Medallion architecture on AWS: real-time ingestion via Kinesis, an S3 lakehouse with Apache Iceberg, batch processing through AWS Glue + dbt, Flink anomaly detection, and an MLOps pipeline with SageMaker + MLflow.

Phases: Foundation → Lakehouse → Batch (dbt) → Real-Time (Flink) → ML (Churn Prediction) → Serving (QuickSight)

AWS · Apache Iceberg · SageMaker


Professional Experience Highlights

Wildberries · Senior Data Engineer / Tech Lead (Dec 2025 — present)

  • Designed a 6-layer hybrid Kappa architecture (Ingestion → Staging → Entities → Speed → Batch → Serving) processing 700M+ events/hour from Kafka
  • Built a sharded ClickHouse cluster: sipHash64-based sharding, Buffer engine to batch small inserts, ReplicatedReplacingMergeTree for entity tracking
  • Introduced Entities layer consolidating status lifecycles across 1.2B+ rows, eliminating full-scans and enabling lineage tracking
  • Optimized dashboards: pre-aggregated Serving layer reduced scan volume from 1.2M rows to 1,000 (roughly 1,200x fewer rows scanned per query)
  • Implemented Rolling Backfill with 8-day recalculation window — 100% accuracy under delays up to 186 hours
  • Led team of 3 engineers: code review, branching strategy, CI/CD (trunk-based Git)
  • Analyzed 13,000+ lines of SQL across 4 business domains, 60+ dashboards, 20+ materialized views
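
The Rolling Backfill bullet above comes down to simple window arithmetic: an 8-day window is 192 hours, so an event delayed by 186 hours still falls inside it with 6 hours to spare. A minimal sketch, assuming the recalculation run happens at (or right after) the late event's arrival; the function names are hypothetical and not taken from the production pipeline.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=8)  # recalculation window from the bullet above


def window_start(run_time: datetime, window: timedelta = WINDOW) -> datetime:
    """Earliest event time that a backfill run at `run_time` recalculates."""
    return run_time - window


def is_recalculated(event_time: datetime, run_time: datetime, window: timedelta = WINDOW) -> bool:
    """True if an event with this timestamp falls inside the run's rolling window."""
    return window_start(run_time, window) <= event_time <= run_time


# An event delayed by 186 hours, picked up by a run at its arrival time.
arrival = datetime(2025, 1, 10, 12, 0)
late_event = arrival - timedelta(hours=186)

print(is_recalculated(late_event, arrival))                      # True: 186 h <= 192 h
print(is_recalculated(arrival - timedelta(hours=200), arrival))  # False: outside the window
```

Anything that arrives later than the window would need a separate, explicit backfill, which is the trade-off of keeping the recalculation window bounded.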

Wildberries · Senior Data Engineer (Apr 2024 — Dec 2025)

  • Built Medallion architecture (Bronze/Silver/Gold) from scratch on ClickHouse — first structured data platform for the team
  • Designed Data Mesh across 5 products: each department as an independent data domain owner
  • Deployed alerting for Kafka, ClickHouse, Airflow with 3-tier escalation (P0/P1/P2)
  • Doubled warehouse processing speed, reduced inventory losses by 80%

CandyCat · Senior Data Engineer (Aug 2023 — Jun 2024)

  • Built real-time CDC pipeline: Debezium (PostgreSQL) → Kafka → Flink (streaming) + Spark (batch) — reduced data delivery latency from 4 hours to 30 seconds
  • Built Apache Iceberg lakehouse with ACID guarantees and schema evolution — reduced storage costs by 35% via compaction and partitioning
  • Orchestrated 40+ Airflow DAGs, dbt transformations, Terraform + Docker infrastructure
  • Implemented AWS analytics: S3 data lake, Lambda for event processing, EC2 for batch, Redshift for OLAP
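
To make the CDC bullet above concrete, here is a minimal sketch of how a downstream consumer can apply Debezium change events to rebuild table state. It relies on the `op`, `before`, and `after` fields of Debezium's change event envelope; the `orders` table, its `id` key, and the `apply_change` helper are hypothetical, and the sketch ignores truncate events, ordering, and Kafka itself.

```python
from typing import Any

Row = dict[str, Any]


def apply_change(table: dict[int, Row], event: dict[str, Any]) -> None:
    """Apply one Debezium-style change event to an in-memory table keyed by `id`."""
    op = event["op"]
    if op in ("c", "r", "u"):
        # create, snapshot read, update: upsert the new row image
        row = event["after"]
        table[row["id"]] = row
    elif op == "d":
        # delete: remove the row identified by the old row image
        table.pop(event["before"]["id"], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")


events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "u", "before": {"id": 1, "status": "new"}, "after": {"id": 1, "status": "paid"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
    {"op": "d", "before": {"id": 2, "status": "new"}, "after": None},
]

orders: dict[int, Row] = {}
for e in events:
    apply_change(orders, e)

print(orders)  # {1: {'id': 1, 'status': 'paid'}}
```

Because every event is applied as an upsert or a delete by key, replaying the same events after a restart leaves the table in the same state, which is what makes this pattern tolerant of at-least-once delivery.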

Kata Academy · Middle / Senior Data Engineer (Feb 2022 — Aug 2023)

  • Built analytics system from scratch: PostgreSQL schema (3NF) + ETL pipelines to ClickHouse
  • Created 50+ dashboards in DataLens and Superset for marketing and product teams
  • Automated reporting — reduced manual work by 50%, weekly report prep from 8 hours → 15 minutes
  • Deployed and administered analytics infrastructure in Yandex Cloud

What I'm Working On

  • 🔨 Production Hardening of the E-Commerce Data Platform: Schema Registry, Data Contracts, troubleshooting runbooks
  • 📚 AWS Data Engineer Associate certification (DEA-C01)
  • 🤖 Exploring MLOps as a growth path — Feature Stores, model monitoring, retraining pipelines
  • 📝 Writing ADRs (Architecture Decision Records) on Flink vs Spark, Delta Lake vs Iceberg


Let's Connect

I'm actively looking for remote Senior/Staff Data Engineer roles at international companies — particularly those working on real-time data platforms, streaming analytics, or MLOps infrastructure.

Interested in: Tinybird · DoubleCloud · Altinity · adtech/fintech scale-ups



Built with ☕ and ClickHouse queries in Nha Trang, Vietnam 🌊

Pinned

  1. aws-ecommerce-analytics (Public)

    Cloud-native e-commerce analytics on AWS: Kinesis, Glue, Iceberg, SageMaker

  2. clickhouse-analytics-cookbook (Public)

    ClickHouse patterns, materialized views, and distributed query optimization