Andrew Friedman

afriedman412 [at] gmail [dot] com

WORK

ML Engineer

1-1-2022 to present
Sludge

Build and operate production AI/data systems that extract structured information from FEC campaign-finance filings, congressional stock-trading disclosures and municipal budget PDFs. The pipelines power ongoing newsroom queries and super-PAC industry-flow visualizations.

Designed a RAG pipeline over thousands of municipal budget PDFs (ChromaDB + OpenAI embeddings) with metadata filtering and async producer-consumer ingest, plus crash-resistant subprocess isolation and SQLite-based job-state tracking.
Owned the full lifecycle of fine-tuned extraction models (Mistral 7B with LoRA) — dataset curation, evaluation against held-out ground truth, and production deployment.
Productionized real-time FEC ingestion as a containerized FastAPI service on GCP Cloud Run and Cloud SQL, with Cloud Scheduler triggering ingest jobs and multi-tier job sizing for heavy parallel workers.
Implemented retry/backoff, idempotent dedup and checkpoint/resume to ingest 1.5M+ FEC records under API rate limits. Automated deploys to Cloud Run via GitHub Actions with OIDC.

A few stories that used data from these systems:

Data Scientist (Contract)

1-1-2023 to 6-1-2023
Center for Just Journalism

In partnership with the NYU Wagner School of Public Service, I worked with graduate students to investigate American newspapers' reliance on police sources when reporting on crime, and how that reliance affected coverage of both crime and police. While the students conducted an in-depth analysis of a representative 300-article sample, I took a programmatic approach to analyze the full 100,000-article data set.

KEY RESPONSIBILITIES:

Development of a standalone Python package for quote identification, attribution and resolution
Acquisition and processing of 100,000 articles
Lexis/Nexis query optimization to minimize irrelevant or off-topic articles
Topic modeling to verify the fidelity of the students' sub-sample

Data Lead

6-1-2022 to 6-2-2023
Google/Medill Data-Driven Reporting Project

Member of a team awarded a grant from the Google/Medill Data-Driven Reporting Project to study 30 years of detailed crime statistics obtained from the Baltimore Police Department.

Results of the study are being published as a multipart series in The Real News:

Part 1: Baltimore's Crime Numbers Game
Part 2: The Short History and Long Tail of Baltimore’s “Zero Tolerance” Policing
Part 3: An Audit of Baltimore City's Data Integrity
Part 4: An Evaluation of City Budget and Health Metrics

Data Scientist

11-1-2018 to 8-1-2022
Remarkable AI (fka Chatdesk)

Remarkable AI (previously Chatdesk) is a Series A customer service company backed by Silicon Valley investors including Menlo Ventures, Susa Ventures and Slow Ventures. Its customers include brands such as Grubhub, BarkBox, Thinx, and OLAPLEX.

KEY RESPONSIBILITIES:

Deployed a message classification model that handled 1 million weekly messages with 99.5% accuracy
Implemented named entity recognition to make the cleaning code more flexible
Developed and maintained the code base that cleaned and standardized incoming messages for downstream processing, covering 100+ companies, 10+ languages and diverse sources (Zendesk, Salesforce, Intercom, Facebook, Instagram)
Attended client meetings on technical integration and conducted data analysis to support the sales team

Instructional Associate

9-1-2018 to 5-1-2020
General Assembly

Trained students in data science methods, concepts and technologies, including bash, Python, data mining, supervised and unsupervised learning techniques, model building, forecasting, SQL, AWS and NLP.