Duyet Le

Data & AI Engineer building scalable data platforms and AI-powered systems. 8+ years shipping production workloads across ClickHouse, Kubernetes, and cloud infrastructure. Currently exploring LLM agents and Rust.

Education

University of Information Technology

Thesis
Bachelor's degree, Information Systems

Experience

Sr. Data Engineer, Oct 2023 – Present
  • Migrated from legacy stack (Spark, Iceberg, Trino) to ClickHouse.
  • Migrated 350TB+ Iceberg Data Lake to ClickHouse on Kubernetes.
  • Achieved 300% better data compression and 2x-100x faster queries with ClickHouse compared to Trino + Iceberg.
  • Automated operations with Airflow: data replication, data processing, healthchecks, etc.
  • Built AI Agents and AI Workflows on top of ClickHouse Data Lake and Documentation with LangGraph, LlamaIndex, Qdrant, Firecrawl, Cube.js, Next.js.
Sr. Data Engineer, Oct 2018 – Jul 2023
  • Optimized monthly costs from $45,000 to $20,000 (GCP and AWS).
  • Managed a team of 4 data engineers and 2 data analysts to provide end-to-end analytics solutions to stakeholders. Raised data awareness throughout the organization and encouraged a data-driven approach to problem-solving.
  • Designed next-gen Data Platform in Rust.
  • Developed tools for Data Monitoring, Data Catalog, and Self-service Analytics for internal teams with everything deployed on Kubernetes.

FPT Software

Sr. Data Engineer, Jun 2017 – Oct 2018
  • Built data pipelines on AWS processing 2TB/day for a Recommendation System.
  • Ingested and transformed 1TB+/day into a Data Lake using Azure Cloud and Databricks.

John von Neumann Institute

Data Engineer, Sep 2015 – Jun 2017
  • Developed data pipelines, data cleaning, and visualizations for ad-hoc problems.
  • Trained and deployed ML models: customer lifetime value, churn prediction, sales optimization, recruitment optimization, etc.

Technical Skills

Languages & Frameworks: Python, Rust, TypeScript, SQL, Spark
Data & AI: LlamaIndex, AI SDK, LangGraph, ClickHouse, Kafka, Airflow, BigQuery, AWS
DevOps: CI/CD, Kubernetes, Helm Charts, Cloudflare