theanh/README.md

The Anh Nguyen

Focus: AI Engineering - Agentic Systems - Data Engineering - Cloud Infrastructure - Backend Systems - Engineering Leadership

Engineering at scale. I optimize data systems that handle billions of records, cut infrastructure costs, and actually last. Hands-on from product discovery and architecture through implementation and deployment.

Engineering since 2011. Pipelines, EMR clusters, Airflow DAGs, and the unglamorous work of cutting cloud costs from the inside. I profile actual bottlenecks before changing anything, and prefer durable fixes over clever ones.

Built teams too: grew my team at iPrice from 3 to 7 and mentored two engineers who were later promoted.

AWS Certified Solutions Architect - Associate (July 2022).

Numbers worth quoting

Backend at scale

  • 20M+ monthly visitors served
  • <250ms API p95 latency
  • 4x crawler throughput (60K -> 250K pages/hour)

Data engineering

  • 6B records at peak
  • $6K+ monthly infrastructure savings (EMR $7K -> $3K, OpenSearch $2K -> self-hosted, reporting ~$1K)
  • <8 h data processing time

Leadership

  • Grew my team from 3 to 7, with 2 engineers later promoted

Case studies

See theanh.github.io for the long versions.

  • SketchNet: convolutional neural network (CNN) for hand-drawn sketches. 95.1% accuracy, 938 KB model, 1ms inference. Live demo.
  • DIA Risk Screener: five algorithms scoring the same molecule for drug-induced autoimmunity risk. The spread between their probabilities is the trust signal. 0.896 best test AUC (area under the ROC curve), 477 compounds. Live demo.
  • PCA Audio Toolkit: Principal Component Analysis (PCA)-based audio denoising and lossy compression. Live demo.
  • Cutting Spark shuffle cost: wide vs narrow transformations on billion-record EMR pipelines.
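The "spread as trust signal" idea from the DIA Risk Screener entry can be sketched in a few lines. This is a minimal illustration, not the project's code: the model names, the example probabilities, the max-minus-min definition of spread, and the 0.2 agreement threshold are all assumptions made for the example.

```python
from statistics import mean


def screen(probabilities, max_spread=0.2):
    """Summarise several models' risk probabilities for one molecule.

    `probabilities` maps a model name to its predicted risk in [0, 1].
    `max_spread` is a hypothetical cutoff: a wider spread means the
    models disagree, so the averaged score deserves less trust.
    """
    if not probabilities:
        raise ValueError("need at least one model probability")
    values = list(probabilities.values())
    spread = max(values) - min(values)  # assumed definition of "spread"
    return {
        "mean_risk": mean(values),
        "spread": spread,
        "models_agree": spread <= max_spread,
    }


# Hypothetical scores from five models for the same molecule.
scores = {
    "model_a": 0.82,
    "model_b": 0.79,
    "model_c": 0.85,
    "model_d": 0.80,
    "model_e": 0.84,
}
result = screen(scores)
print(result["models_agree"], round(result["spread"], 2))
```

A tight spread suggests the averaged risk can be read with more confidence. A wide one flags the molecule for a closer look, whichever way the average leans.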

Stack

Python - TypeScript - JavaScript - PHP - PyTorch - scikit-learn - XGBoost - LLM applications - AWS (Athena - EMR - S3 - SQS - ElastiCache - Lambda - RDS) - Azure Data Warehouse - Apache Airflow - PySpark - Elasticsearch - MySQL - PostgreSQL - SQL Server - Cassandra - Laravel - Symfony - RESTful API - GraphQL - Docker - Terraform - ELK Stack

Building AI applications now: LLM-powered automation in production, and learning ML by building (see SketchNet and DIA Risk Screener). I also travel and read.

"Great code is minimal to no code."

Popular repositories

  1. vm-docker Public

    Virtual machine based on Docker

    Smarty 1

  2. flow Public

    Forked from facebook/flow

    Adds static typing to JavaScript to improve developer productivity and code quality.

    OCaml 1

  3. clockwork Public

    Forked from itsgoingd/clockwork

    Clockwork - PHP dev tools integrated into your browser - server-side component

    PHP 1

  4. puppetron Public

    Forked from cheeaun/puppetron

    Puppeteer (Headless Chrome Node API)-based rendering solution.

    JavaScript 1

  5. gdpr-cookie Public

    Forked from acoustep/gdpr-cookie

    A jQuery plugin to manage cookie settings in compliance with EU law

    JavaScript 1

  6. 46e6e8d89d8d9c7fcd04bf0ca78b5b1d Public

    Application that lets users take a survey

    CSS