Software Engineer | French Dev in NYC 🗽🇫🇷
Lead SWE @ Salesforce. 20+ years in deterministic systems, now building and learning in public. Agent evals, AI tooling, and much more.
Find me: shahfazal.com | LinkedIn
ARIA-ready descriptions for civic data visualizations. Submitted to the Kaggle Gemma 4 Good Hackathon (May 2026).
Fine-tuned Gemma 4 E4B on 61 hand-curated examples, paired with a deterministic verification layer that grounds extracted numbers against source CSV when available. Numbers from the model are treated as claims to be checked, not tokens to be trusted.
Tech stack: Unsloth, TRL, PyTorch, Modal, FastAPI, Python
Shipped:
- Fine-tuned model published to HuggingFace
- Deterministic verifier with four states (verified / partial / unverified / structural-issue)
- Live demo on Modal
- Two upstream vision DPO fixes contributed to unslothai/unsloth#5196 (merged April 29, 2026)
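To make the "claims to be checked" idea concrete, here is a minimal sketch of a four-state verifier. It is not the project's actual code: the function names, the number-matching regex, the tolerance, and above all the rule that maps a description to each state are assumptions chosen for illustration.

```python
import csv
import io
import re

# Hypothetical pattern: plain integers and decimals only.
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def extract_numbers(text):
    """Pull every numeric claim out of a generated description."""
    return [float(m) for m in NUMBER_RE.findall(text)]


def csv_numbers(csv_text):
    """Collect every cell of the source CSV that parses as a number."""
    values = set()
    for row in csv.reader(io.StringIO(csv_text)):
        for cell in row:
            try:
                values.add(float(cell))
            except ValueError:
                continue  # header or text cell
    return values


def verify(description, csv_text=None, tol=1e-9):
    """Return one of the four states (the mapping below is an assumption)."""
    if not description.strip():
        return "structural-issue"  # nothing usable to check
    claims = extract_numbers(description)
    if csv_text is None or not claims:
        return "unverified"  # no source data, or no numeric claims
    source = csv_numbers(csv_text)
    matched = sum(any(abs(c - s) <= tol for s in source) for c in claims)
    if matched == len(claims):
        return "verified"
    if matched > 0:
        return "partial"
    return "unverified"


if __name__ == "__main__":
    data = "year,value\n2020,12.5\n2021,14\n"
    print(verify("Value rose from 12.5 in 2020 to 14 in 2021.", data))
```

The point of the design is that the check is deterministic: the same description and CSV always produce the same state, regardless of how the model arrived at its numbers.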
Session browser for Claude Code. Five shipped versions:
- v0.1: Local session explorer (parses the ~/.claude/ directory)
- v0.2: Memory browser (reads project memory states)
- v0.3: Compaction viewer (analyzes context window compression)
- v0.4: Resilience — environment health checks, session export endpoint, externalized pricing config
- v0.5: Stats dashboard — D3 heatmap, per-project cost bars, cumulative cost line, date-range filter, graceful degradation on parse failures
Upcoming: Driver.js guided help tour across the nav surfaces
Tech stack: Python, Flask, Jinja2, D3.js, pytest
Use case: Browse session history, review memory evolution, analyze cost and token usage across projects.
French municipal elections 2026 data viz.
- Live at shahfazal.com/elections-municipales-2026.
- Submission on data.gouv.fr.
Data sources: DVF (property prices), 2nd round results (Ministère de l'Intérieur)
Stack: Python + pandas (pipeline), Plotly.js (charts), Leaflet.js (maps), Driver.js (help tours), vanilla JS
Shipped:
- 5 interactive tabs: quintile breakdown, abstention box plot by bloc, price distribution box plot, Paris-Lyon-Marseille choropleth, prix/m² vs abstention scatter plot with year toggle
- 838 communes analysed, DVF 2024 + 2025
- Guided tours, full French UI, accessibility attributes
- Published réutilisation on data.gouv.fr for the Défi 1 challenge
Key lesson: Declarative specs upfront beat imperative iteration. Full build log coming in blog series.
datagouv/datagouv-mcp#115 (merged)
Fixed search_datasets reporting resources_count: 4 for every dataset. The v2 search API returns resources as a HATEOAS link dict, so the client was counting its 4 keys instead of reading resources.total. Found while using the MCP, isolated against the live API, locked with a regression unit test, and verified end-to-end through the local MCP loop (using the call_tool.py I shipped in #100).
Impact: Consuming models now see true per-dataset resource counts instead of a constant 4.
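The bug class is easy to reproduce in isolation. The sketch below is illustrative only, not the merged patch: the function name is hypothetical, and the payload shape is an assumption apart from the `total` key and the fact that the link dict has four keys, both described above.

```python
def resources_count(dataset):
    """Read the resource count from a search result.

    Handles both shapes: a HATEOAS link dict carrying a `total` field,
    and a plain list of resources.
    """
    resources = dataset.get("resources")
    if isinstance(resources, dict):
        total = resources.get("total")
        return total if isinstance(total, int) else None
    if isinstance(resources, list):
        return len(resources)
    return None


# Assumed payload shape; key names other than `total` are illustrative.
sample = {
    "title": "Example dataset",
    "resources": {
        "rel": "subsection",
        "href": "https://example.invalid/datasets/abc/resources/",
        "type": "GET",
        "total": 12,
    },
}

if __name__ == "__main__":
    print(len(sample["resources"]))  # the buggy reading: number of dict keys
    print(resources_count(sample))   # the intended reading: resources.total
```

Calling `len()` on the link dict always yields the number of keys, which is why every dataset appeared to have the same count.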
datagouv/datagouv-mcp#100 (merged)
Reduced dev friction when testing the official French data.gouv.fr MCP server. Added a /health endpoint that runs a full MCP handshake plus tool call, and a call_tool.py script that replaces the manual 3-curl handshake with a single command.
Impact: Lowers barrier for contributors testing MCP integrations locally.
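For readers unfamiliar with what the manual handshake involves, this sketch builds the three JSON-RPC messages an MCP client typically sends: an `initialize` request, a `notifications/initialized` notification, and a `tools/call` request. It assumes these are the three steps the script replaces. It is not the code in call_tool.py; the function name, client info, protocol version string, and tool arguments are placeholders.

```python
import json


def handshake_messages(tool_name, arguments, protocol_version="2025-03-26"):
    """Build the three JSON-RPC 2.0 messages for an MCP session plus one tool call."""
    initialize = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": protocol_version,  # assumed version string
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.0.0"},
        },
    }
    # Notifications carry no "id" because no response is expected.
    initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}
    call = {
        "jsonrpc": "2.0",
        "id": 2,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return [initialize, initialized, call]


if __name__ == "__main__":
    # "query" is a hypothetical argument name for illustration.
    for message in handshake_messages("search_datasets", {"query": "elections"}):
        print(json.dumps(message))
```

Sending each message by hand means one HTTP request per step, with session state carried between them, which is the friction a single-command script removes.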
unslothai/unsloth#5199 (merged)
Filed unslothai/unsloth#5196 reporting two vision DPO blockers on Gemma 4 (tokenization hang in dataset.map + data collator schema mismatch) with reproductions and documented workaround attempts. Fix merged into Unsloth main on April 29, 2026.
Posts (and ramblings) at shahfazal.com/posts:
- "Nobody Tests the Steering Wheel" - Why agent evals need observe-first methodology
- "Claude Gatekeep You Yet?" - Why it's important to stop and think before handing the reins to your coding agent
- Declarative Viz series (upcoming) - Build log from elections-municipales-2026
- CivicInsight retrospective (upcoming) - 5 weeks, 19 sessions, what shipped and what got dropped
Next up: Decompressing from CivicInsight. Picking up backlog projects.
Backlog:
- ADS-B + Gemma 4 voice assistant on Raspberry Pi (family collaborative project)
- TinyDiffusion (3-phase learning project: 1D scalar diffusion → 2x2 unconditional → 2x2 conditional)
- CodeHaiku (fine-tune Gemma 4 to write PR review comments as haiku)
- Public agent eval demo using datagouv-mcp
- AI workflow optimizer (analyzes Claudio session exports for inefficiency patterns)
- Plotly a11y toolkit
Before building production eval systems, rebuilt intuition from first principles:
- TinyNet - Neural net from scratch (Python, no frameworks)
- NYC EV LSTM - Spatio-temporal demand forecasting
These aren't production systems - they're foundational exercises to understand backprop, overfitting, and temporal modeling before applying those concepts to agent evaluation.
Philosophy: If it can't be measured, it can't be trusted. I apply 20+ years of production engineering rigor (observability, regression detection, test harness design) to the chaos of agentic systems.