
theanh/theanh



The Anh Nguyen

     

Focus: AI Engineering - Agentic Systems - Data Engineering - Cloud Infrastructure - Backend Systems - Engineering Leadership

Engineering at scale. I optimize data systems that handle billions of records, cut infrastructure costs, and actually last. Hands-on from product discovery through architecture and implementation to deployment.

Engineering since 2011. Pipelines, EMR clusters, Airflow DAGs, and the unglamorous work of cutting cloud costs from the inside. I profile actual bottlenecks before changing anything, and prefer durable fixes over clever ones.

Built teams too: grew my team at iPrice from 3 to 7 and mentored two engineers who were later promoted.

AWS Certified Solutions Architect: Associate (July 2022).

Numbers worth quoting

Backend at scale

  • 20M+ monthly visitors served
  • <250 ms p95 API latency
  • 4x crawler throughput (60K -> 250K pages/hour)

Data engineering

  • 6B records at peak
  • $6K+ monthly infrastructure savings (EMR $7K -> $3K, OpenSearch $2K -> self-hosted, reporting ~$1K)
  • <8 h data processing time

Leadership

  • Grew my team from 3 to 7, with 2 engineers later promoted

Case studies

See theanh.github.io for the long versions.

  • SketchNet: convolutional neural network (CNN) for hand-drawn sketches. 95.1% accuracy, 938 KB model, 1 ms inference. Live demo.
  • DIA Risk Screener: five algorithms scoring the same molecule for drug-induced autoimmunity risk. The spread between their probabilities is the trust signal. 0.896 best test AUC (area under the ROC curve), 477 compounds. Live demo.
  • PCA Audio Toolkit: Principal Component Analysis (PCA)-based audio denoising and lossy compression. Live demo.
  • Cutting Spark shuffle cost: wide vs narrow transformations on billion-record EMR pipelines.
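For the DIA Risk Screener entry above, the "spread as trust signal" idea can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `score_with_spread`, the choice of max minus min as the spread measure, and the `max_spread` threshold of 0.2 are all assumptions made for the example.

```python
from statistics import mean


def score_with_spread(probabilities, max_spread=0.2):
    """Summarize several models' risk probabilities for one molecule.

    Hypothetical helper: spread is taken as max - min, and the 0.2
    threshold is an arbitrary illustrative value.
    """
    if not probabilities:
        raise ValueError("need at least one probability")
    spread = max(probabilities) - min(probabilities)
    return {
        "mean_risk": mean(probabilities),
        "spread": spread,
        # A small spread means the models agree, so the score is easier to trust.
        "models_agree": spread <= max_spread,
    }


# Five made-up model outputs for two made-up molecules.
agreeing = score_with_spread([0.81, 0.78, 0.84, 0.80, 0.79])
disagreeing = score_with_spread([0.15, 0.82, 0.40, 0.91, 0.33])

print(agreeing["models_agree"], disagreeing["models_agree"])
```

A wide spread does not say which model is right; it only flags that the averaged score deserves less confidence.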

Stack

Python - TypeScript - JavaScript - PHP - PyTorch - scikit-learn - XGBoost - LLM applications - AWS (Athena - EMR - S3 - SQS - ElastiCache - Lambda - RDS) - Azure Data Warehouse - Apache Airflow - PySpark - Elasticsearch - MySQL - PostgreSQL - SQL Server - Cassandra - Laravel - Symfony - RESTful API - GraphQL - Docker - Terraform - ELK Stack

Building AI applications now: LLM-powered automation in production, and learning ML by building (see SketchNet, DIA Risk Screener). I also travel and read.

"Great code is minimal to no code."
