Shahfazal Mohammed shahfazal

Bonjour, I'm Fazal! 👋

Software Engineer | French Dev in NYC 🗽🇫🇷

Lead SWE @ Salesforce. 20+ years of deterministic systems, now building and learning in public. Agent evals, AI tooling, and much more.

Find me: shahfazal.com | LinkedIn

Active Projects

CivicInsight

ARIA-ready descriptions for civic data visualizations. Submitted to the Kaggle Gemma 4 Good Hackathon (May 2026).

Fine-tuned Gemma 4 E4B on 61 hand-curated examples, paired with a deterministic verification layer that grounds extracted numbers against source CSV when available. Numbers from the model are treated as claims to be checked, not tokens to be trusted.

Tech stack: Unsloth, TRL, PyTorch, Modal, FastAPI, Python

Shipped:

Fine-tuned model published to HuggingFace
Deterministic verifier with four states (verified / partial / unverified / structural-issue)
Live demo on Modal
Two upstream vision DPO fixes contributed to unslothai/unsloth#5196 (merged April 29, 2026)
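
The "claims to be checked" idea can be sketched roughly as follows. This is a minimal illustration, not CivicInsight's actual code: the function name `verify_description`, the number regex, the tolerance, and the exact meaning given to each of the four states (in particular, using `structural-issue` for a CSV with no numeric cells) are all assumptions.

```python
import csv
import io
import re

# Simple decimal matcher; a real verifier would also need to handle
# thousands separators, percentages, and locale-specific formats.
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def verify_description(description: str, csv_text: str, tol: float = 1e-6) -> str:
    """Classify a description by checking its numbers against CSV cells.

    Hypothetical state semantics (assumed for this sketch):
      structural-issue: the CSV has no numeric cells to check against
      unverified:       no number in the description matches the CSV
      partial:          some, but not all, numbers match
      verified:         every number matches
    """
    source_values = []
    for row in csv.reader(io.StringIO(csv_text)):
        for cell in row:
            try:
                source_values.append(float(cell))
            except ValueError:
                continue  # header or text cell, nothing to ground against

    if not source_values:
        return "structural-issue"

    claims = [float(m) for m in NUMBER_RE.findall(description)]
    if not claims:
        return "unverified"

    matched = sum(
        any(abs(claim - value) <= tol for value in source_values)
        for claim in claims
    )
    if matched == len(claims):
        return "verified"
    return "partial" if matched else "unverified"
```

The point of the design is that the check is pure string and float comparison, so the same description and CSV always produce the same state regardless of what the model does.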

Claudio

Session browser for Claude Code. Five shipped versions:

v0.1: Local session explorer (parses ~/.claude/ directory)
v0.2: Memory browser (reads project memory states)
v0.3: Compaction viewer (analyzes context window compression)
v0.4: Resilience — environment health checks, session export endpoint, externalized pricing config
v0.5: Stats dashboard — D3 heatmap, per-project cost bars, cumulative cost line, date-range filter, graceful degradation on parse failures

Upcoming: Driver.js guided help tour across the nav surfaces

Tech stack: Python, Flask, Jinja2, D3.js, pytest

Use case: Browse session history, review memory evolution, analyze cost and token usage across projects.
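
The "graceful degradation on parse failures" behaviour from v0.5 might look something like the sketch below. It assumes session records arrive as one JSON object per line; the function name `summarize_costs` and the field names `project` and `cost_usd` are hypothetical and are not taken from Claudio or from Claude Code's on-disk format.

```python
import json
from collections import defaultdict


def summarize_costs(lines):
    """Sum cost per project, skipping records that fail to parse.

    Returns (totals, skipped) so a dashboard can still render the good
    records and report how many were dropped.
    """
    totals = defaultdict(float)
    skipped = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines are not counted as failures
        try:
            record = json.loads(line)
            # Read both fields before touching totals, so a bad record
            # never leaves a half-created entry behind.
            project = record["project"]
            cost = float(record["cost_usd"])
            totals[project] += cost
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            skipped += 1
    return dict(totals), skipped
```

Returning the skipped count alongside the totals lets the UI surface a warning instead of failing the whole page on one malformed record.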

elections-municipales-2026

French municipal elections 2026 data viz.

Live at shahfazal.com/elections-municipales-2026.
Submission on data.gouv.fr.

Data sources: DVF (property prices), 2nd round results (Ministère de l'Intérieur)

Stack: Python + pandas (pipeline), Plotly.js (charts), Leaflet.js (maps), Driver.js (help tours), vanilla JS

Shipped:

5 interactive tabs: quintile breakdown, abstention box plot by bloc, price distribution box plot, Paris-Lyon-Marseille choropleth, prix/m² vs abstention scatter plot with year toggle
838 communes analysed, DVF 2024 + 2025
Guided tours, full French UI, accessibility attributes
Published réutilisation on data.gouv.fr for the Défi 1 challenge
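
A quintile breakdown of this kind can be sketched as below. The project's pipeline uses pandas; this version deliberately swaps in the standard library (`statistics.quantiles` plus `bisect`) so it runs without dependencies. The function name and the `(price_per_m2, abstention_pct)` input shape are assumptions, not the project's schema.

```python
from bisect import bisect_right
from statistics import mean, quantiles


def abstention_by_price_quintile(communes):
    """Mean abstention per price quintile.

    communes: iterable of (price_per_m2, abstention_pct) pairs.
    Returns {quintile_number: mean_abstention} for quintiles 1..5,
    omitting any quintile that ends up empty.
    """
    communes = list(communes)
    # n=5 yields the 4 cut points that separate the 5 quintiles.
    cuts = quantiles([price for price, _ in communes], n=5)
    buckets = {q: [] for q in range(1, 6)}
    for price, abstention in communes:
        # bisect_right gives how many cut points are <= price (0..4).
        buckets[bisect_right(cuts, price) + 1].append(abstention)
    return {q: mean(values) for q, values in buckets.items() if values}
```

With pandas, `pandas.qcut(prices, 5)` followed by a group-by would express the same idea, though its bin-edge handling differs slightly from `statistics.quantiles`.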

Key lesson: Declarative specs upfront beat imperative iteration. Full build log coming in blog series.

Recent Contributions

datagouv/datagouv-mcp#115 (merged)

Fixed search_datasets reporting resources_count: 4 for every dataset. The v2 search API returns resources as a HATEOAS link dict, so the client was counting its 4 keys instead of reading resources.total. Found while using the MCP, isolated against the live API, locked with a regression unit test, and verified end-to-end through the local MCP loop (using the call_tool.py I shipped in #100).

Impact: Consuming models now see true per-dataset resource counts instead of a constant 4.
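
The bug described above reduces to the difference between counting a dict's keys and reading its `total` field. The sketch below is illustrative only: the function names are hypothetical, and every key in the sample payload other than `total` is made up to give the dict four keys.

```python
def resources_count_buggy(dataset: dict) -> int:
    # len() of a dict counts its keys, not the resources it describes,
    # so a 4-key link dict always reports 4.
    return len(dataset["resources"])


def resources_count_fixed(dataset: dict) -> int:
    resources = dataset.get("resources")
    if isinstance(resources, dict):
        # Link-dict shape: the real count lives in "total".
        return int(resources.get("total", 0))
    # Fall back to counting when resources is an inline list (or missing).
    return len(resources or [])


# Illustrative payload; only "total" is taken from the description above.
sample_dataset = {
    "title": "example",
    "resources": {
        "rel": "subsection",
        "href": "https://example.invalid/resources",
        "type": "GET",
        "total": 12,
    },
}
```

A regression test for this kind of fix only needs a payload like `sample_dataset` where the key count and `total` differ, so the two code paths cannot agree by accident.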

datagouv/datagouv-mcp#100 (merged)

Reduced dev friction when testing the official French data.gouv.fr MCP server. Added a /health endpoint that runs a full MCP handshake plus tool call, and a call_tool.py script that replaces the manual 3-curl handshake with a single command.

Impact: Lowers barrier for contributors testing MCP integrations locally.
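
For context, the three-message exchange that a script like call_tool.py wraps can be sketched as the JSON-RPC payloads below. This only builds the messages and does not send them. It is not the actual script: the function name `build_handshake`, the protocol version string, the client name, and the example arguments are assumptions, and the message shapes follow the general MCP pattern of initialize, initialized notification, then tool call.

```python
import json


def build_handshake(tool_name: str, arguments: dict) -> list[dict]:
    """Return the three JSON-RPC messages for a minimal MCP tool call."""
    initialize = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumed version string
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.0.1"},
        },
    }
    # Notifications carry no "id" because no response is expected.
    initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}
    call = {
        "jsonrpc": "2.0",
        "id": 2,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return [initialize, initialized, call]


if __name__ == "__main__":
    # The "query" argument name is a guess for illustration.
    for message in build_handshake("search_datasets", {"query": "elections"}):
        print(json.dumps(message))
```

Each printed line corresponds to one of the manual curl calls; a single script can send them in order over the same session and print only the final tool result.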

unslothai/unsloth#5199 (merged)

Filed unslothai/unsloth#5196 reporting two vision DPO blockers on Gemma 4 (tokenization hang in dataset.map + data collator schema mismatch) with reproductions and documented workaround attempts. Fix merged into Unsloth main on April 29, 2026.

Writing

Posts (and ramblings) at shahfazal.com/posts:

"Nobody Tests the Steering Wheel" - Why agent evals need observe-first methodology
"Claude Gatekeep You Yet?" - Why it's important to stop and think before handing the reins to your coding agent
Declarative Viz series (upcoming) - Build log from elections-municipales-2026
CivicInsight retrospective (upcoming) - 5 weeks, 19 sessions, what shipped and what got dropped

What I'm Working On

Next up: Decompressing from CivicInsight. Picking up backlog projects.

Backlog:

ADS-B + Gemma 4 voice assistant on Raspberry Pi (family collaborative project)
TinyDiffusion (3-phase learning project: 1D scalar diffusion → 2x2 unconditional → 2x2 conditional)
CodeHaiku (fine-tune Gemma 4 to write PR review comments as haiku)
Public agent eval demo using datagouv-mcp
AI workflow optimizer (analyzes Claudio session exports for inefficiency patterns)
Plotly a11y toolkit

ML Foundations

Before building production eval systems, I rebuilt my intuition from first principles:

TinyNet - Neural net from scratch (Python, no frameworks)
NYC EV LSTM - Spatio-temporal demand forecasting

These aren't production systems - they're foundational exercises to understand backprop, overfitting, and temporal modeling before applying those concepts to agent evaluation.

Philosophy: If it can't be measured, it can't be trusted. I apply 20+ years of production engineering rigor (observability, regression detection, test harness design) to the chaos of agentic systems.
