Mathematician by training (PhD in extreme value theory), full-stack AI engineer in practice. I like problems that need both a whiteboard and a deploy pipeline.
These days I'm co-founder & CDO at Nabu, where I've spent the last six years building the data and ML side of a document-intelligence platform for customs & trade: turning messy, unstructured trade documents into structured, calibrated data. In practice, a lot of slow manual processing collapses into a few minutes.
- Document AI & RAG: LangChain / LangGraph, Weaviate & pgvector, visual-rich document understanding, custom OCR pipelines
- LLM systems with some rigour: dynamic structured outputs, logprob-calibrated confidence scores, and MILP where it earns its keep
- The stack around it: FastAPI, PostgreSQL, React/Vue, AWS (EKS, SageMaker), Terraform, Kubernetes, and a soft spot for observability (OpenTelemetry, Datadog, SigNoz, Grafana)
- Quant & crypto tinkering: backtesting ideas, poking at market data, and the occasional Ethereum rabbit hole
- ML / RL & generative AI for fun: reinforcement learning, self-hosted LLMs, Stable Diffusion
- Self-hosting & homelab: Proxmox, ZFS, Grafana dashboards, and a healthy distrust of the cloud for personal stuff
- Data hoarding: web archiving, media library tooling, metadata wrangling, giving everything a tidy, well-tagged home. If it can be catalogued, I've probably tried.
- Beets-Plugin_VGMdb: VGMdb metadata for the beets music manager
- fast_deskew: a fast document deskew library, born from real OCR pain
- py-prisma2markdown: turn Prisma schemas into readable Markdown docs
⚡ Fun fact: all of this runs on three servers and ~200 TB at home, which I assure everyone is "for the homelab" and definitely not just hoarding.
💬 Always happy to talk document AI, applied ML with real math behind it, quant experiments, or homelab over-engineering.
📫 hozhenwai@gmail.com · LinkedIn · based in Strasbourg 🇫🇷