Andrew Friedman
afriedman412 [at] gmail [dot] com
PROJECTS
Petey
1-1-2026
An open-source Python framework for extracting structured data from PDFs using configurable LLM and parser backends. Distributed as a pip-installable package with a CLI, Python API and FastAPI service.
Pluggable model-serving layer abstracts 12 LLM backends (OpenAI, Anthropic, Ollama, plus OpenAI-compatible providers) and 8 PDF parsers, with an async concurrency manager that splits CPU and API worker pools to maximize throughput under provider rate limits.
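The split between a CPU pool and an API pool can be sketched with asyncio. This is a minimal illustration under stated assumptions, not Petey's actual code:
- `parse_pdf`, `call_llm`, `extract_one` and `extract_all` are hypothetical names.
- The stubs fake the parsing and the LLM call.
- A thread pool stands in for the CPU pool.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def parse_pdf(path: str) -> str:
    """Hypothetical CPU-bound step: stand-in for a real PDF parser."""
    return f"text of {path}"


async def call_llm(text: str) -> dict:
    """Hypothetical I/O-bound step: stand-in for a rate-limited LLM API call."""
    await asyncio.sleep(0.01)  # simulated network latency
    return {"source_chars": len(text)}


async def extract_one(path: str, cpu_pool: ThreadPoolExecutor, api_slots: asyncio.Semaphore) -> dict:
    loop = asyncio.get_running_loop()
    # Parsing runs in the CPU pool so it never blocks the event loop.
    text = await loop.run_in_executor(cpu_pool, parse_pdf, path)
    # The semaphore caps in-flight API requests, e.g. to stay under a provider's rate limit.
    async with api_slots:
        return await call_llm(text)


async def extract_all(paths: list[str], cpu_workers: int = 4, api_workers: int = 8) -> list[dict]:
    api_slots = asyncio.Semaphore(api_workers)
    with ThreadPoolExecutor(max_workers=cpu_workers) as cpu_pool:
        # gather() returns results in the same order as the input paths.
        return await asyncio.gather(*(extract_one(p, cpu_pool, api_slots) for p in paths))
```

A `ProcessPoolExecutor` is the usual choice when parsing is truly CPU-bound, because threads share the GIL. It requires `parse_pdf` to be a picklable, module-level function.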
Includes a schema-agnostic evaluation harness — sentence-transformers cosine similarity with field-type-aware scoring — that runs factorial sweeps across model × parser × dataset, tracking accuracy, latency and cost. Production-deployed as a containerized FastAPI service on Google Cloud Run with autoscaling.
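Here is a rough sketch of field-type-aware scoring and the factorial sweep. The field types, scoring rules and function names are assumptions for illustration. `embed` can be any function that returns a vector, so the sketch runs without downloading a model.

```python
import itertools
import math
from typing import Callable, Sequence

Embed = Callable[[str], Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_field(pred, gold, field_type: str, embed: Embed) -> float:
    """Score one extracted field against ground truth using a rule chosen by field type."""
    if pred is None or gold is None:
        return 1.0 if pred == gold else 0.0
    if field_type == "number":
        # Hypothetical rule: 1 minus relative error, floored at 0.
        denom = max(abs(float(gold)), 1e-9)
        return max(0.0, 1.0 - abs(float(pred) - float(gold)) / denom)
    if field_type in ("date", "enum"):
        # Hypothetical rule: exact match after normalizing case and whitespace.
        return float(str(pred).strip().lower() == str(gold).strip().lower())
    # Free text: cosine similarity of sentence embeddings.
    return cosine(embed(str(pred)), embed(str(gold)))


def score_record(pred: dict, gold: dict, schema: dict, embed: Embed) -> float:
    """Mean field score; `schema` maps field name -> field type."""
    scores = [score_field(pred.get(f), gold.get(f), t, embed) for f, t in schema.items()]
    return sum(scores) / len(scores)


def sweep(models, parsers, datasets):
    """Full factorial design: every model x parser x dataset combination."""
    return list(itertools.product(models, parsers, datasets))
```

With sentence-transformers installed, `embed` could be `SentenceTransformer("all-MiniLM-L6-v2").encode`. Each tuple returned by `sweep` would then be one run, with its accuracy, latency and cost logged.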
Python, FastAPI, LLMs, Async Concurrency, Evaluation Frameworks, GCP Cloud Run, Docker
Short-Term Rental Revenue Estimator
12-5-2025
Enter a Zillow link for a Chicago property, its number of bedrooms and bathrooms, and how many guests it accommodates. You get back the optimal nightly price, expected annual occupancy and predicted total revenue if you were to list it on Airbnb.
Built on a trio of LightGBM models and powered by a neighbor-search stack that uses BallTree for precise local comps, FAISS for fast semantic similarity, and KNN-style aggregation to turn those neighbors into pricing and occupancy signals.
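To illustrate the comps-then-aggregate idea, here is a brute-force stand-in for the BallTree query plus a simple KNN-style aggregation. This is not the capstone's pipeline:
- The inverse-distance weighting, the `smoothing_km` term and all names are assumptions.
- The FAISS similarity search and the LightGBM models are not shown.

```python
import heapq
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass
class Listing:
    lat: float
    lon: float
    nightly_price: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_comps(lat: float, lon: float, listings: list[Listing], k: int = 5):
    """Brute-force k nearest listings; a BallTree answers the same query without a full scan."""
    pairs = ((haversine_km(lat, lon, item.lat, item.lon), item) for item in listings)
    return heapq.nsmallest(k, pairs, key=lambda pair: pair[0])


def comp_price_signal(lat: float, lon: float, listings: list[Listing], k: int = 5, smoothing_km: float = 0.1) -> float:
    """Inverse-distance-weighted mean nightly price of the k nearest comps."""
    comps = nearest_comps(lat, lon, listings, k)
    # smoothing_km keeps the weight finite when a comp sits at distance 0.
    weights = [1.0 / (dist + smoothing_km) for dist, _ in comps]
    return sum(w * comp.nightly_price for w, (_, comp) in zip(weights, comps)) / sum(weights)
```

scikit-learn's `BallTree` accepts `metric="haversine"` on `[lat, lon]` pairs in radians and returns distances in radians. Multiplying those distances by the Earth's radius gives kilometers. It could replace `nearest_comps` without changing the aggregation.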
Also includes a user-facing AI interface that synthesizes model predictions into plain-language investment evaluations.
MSDS Capstone project. Blood, sweat and tears. Not investment advice!!!
Machine Learning, Feature Engineering, Backend Development, Cloud Deployment, DevOps & Tooling
The Shape of Meaning — Model Sensitivity to Textual Perturbation
12-1-2025
An empirical study of how off-the-shelf NLP classifiers respond when language is systematically distorted. Used dialogue from The Office, It's Always Sunny in Philadelphia and South Park — three sitcoms with similar formats but distinct tonal identities — as a natural comparative corpus, splitting ~150,000 lines into ~28,000 overlapping 8-line chunks and scoring each chunk for Emotion (4-model ensemble), Toxicity (Detoxify) and Topic (BART-large-MNLI zero-shot).
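The overlapping chunking step might look like the following. Two choices here are assumptions:
- The stride of 4 is a guess; the write-up gives only the 8-line window and approximate counts.
- Each episode is chunked separately.

```python
from typing import Iterator, Sequence


def overlapping_chunks(lines: Sequence[str], size: int = 8, stride: int = 4) -> Iterator[list[str]]:
    """Yield windows of `size` consecutive lines, starting a new window every `stride` lines."""
    # A script shorter than `size` lines yields no chunks.
    for start in range(0, len(lines) - size + 1, stride):
        yield list(lines[start:start + size])


def chunk_corpus(episodes: dict[str, list[str]], size: int = 8, stride: int = 4) -> list[tuple[str, list[str]]]:
    """Chunk each episode separately so no window spans two episodes."""
    return [
        (episode, chunk)
        for episode, lines in episodes.items()
        for chunk in overlapping_chunks(lines, size, stride)
    ]
```

An episode of `n` lines yields `(n - size) // stride + 1` windows when `n >= size`.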
Applied a series of controlled perturbations (case/punctuation removal, light and full token shuffling, and a "shuffle with preservation" condition that scrambled word order while keeping punctuation and capitalization intact), then quantified how each signal degraded using PCA + UMAP embeddings, silhouette scores, Jensen-Shannon divergence and Wasserstein distance.
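Here are three of the perturbation conditions and one of the divergence measures, sketched in plain Python. This is not the study's code:
- The tokenization rules and the way "shuffle with preservation" restores capitalization are assumptions about how such a condition could work.
- Light shuffling, the embeddings and the Wasserstein distance are omitted.

```python
import math
import random
import re
import string

# A "word" is a run of word characters, optionally with one apostrophe part (don't, it's).
WORD_RE = re.compile(r"\w+(?:'\w+)?")


def strip_case_punct(text: str) -> str:
    """Lowercase and delete ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


def full_shuffle(text: str, rng: random.Random) -> str:
    """Shuffle whitespace-separated tokens; punctuation travels with its token."""
    tokens = text.split()
    rng.shuffle(tokens)
    return " ".join(tokens)


def shuffle_with_preservation(text: str, rng: random.Random) -> str:
    """Shuffle words while punctuation, spacing and capitalized slots stay in place."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    rng.shuffle(words)
    replacements = iter(words)

    def place(match: re.Match) -> str:
        word = next(replacements)
        # Capitalize the incoming word if the slot it fills was capitalized.
        return word.capitalize() if match.group(0)[0].isupper() else word

    return WORD_RE.sub(place, text)


def js_divergence(p, q, base: float = 2.0) -> float:
    """Jensen-Shannon divergence between two discrete distributions over the same categories."""
    m = [(x + y) / 2 for x, y in zip(p, q)]

    def kl(a, b):
        # Terms with zero probability in `a` contribute nothing.
        return sum(x * math.log(x / y, base) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

SciPy's `scipy.spatial.distance.jensenshannon` returns the square root of this quantity (the Jensen-Shannon distance). Numbers from the two are therefore not directly comparable.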
Graduate project, Northwestern MSDS.
NLP, Model Evaluation, Zero-Shot Classification, Hugging Face, PyTorch, UMAP / PCA, Statistical Analysis
Look At Me
12-4-2023
A light, modular website template built on Python, Jinja2 and not much else.
I made Look At Me because I was having trouble figuring out how to best structure a portfolio website that could showcase projects that fell under multiple categories. For example: I wanted to have a "writing" page and a "data journalism" page, with projects formatted differently on each page. Some projects fell under both categories, but having an entry on each page for the same project didn't seem like the best approach.
Look At Me lets you build a site by filling out some YAML and maybe tweaking some Jinja code if you are feeling spicy. As a bonus, you don't need to learn a new language or use an external service to do it!
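The core idea is one project entry that appears on several pages with different formatting. It can be sketched without any templating library. The field names (`title`, `year`, `pages`) are hypothetical rather than Look At Me's actual YAML schema, and `str.format` stands in for Jinja.

```python
from collections import defaultdict

# Hypothetical entries, shaped like what yaml.safe_load() might return for a projects file.
PROJECTS = [
    {"title": "Project A", "year": 2022, "pages": ["writing", "data-journalism"]},
    {"title": "Project B", "year": 2023, "pages": ["open-source"]},
]

# One template per page; str.format stands in for per-page Jinja templates.
PAGE_TEMPLATES = {
    "writing": "<h3>{title}</h3>",
    "data-journalism": "<h3>{title} ({year})</h3>",
    "open-source": "<li>{title}</li>",
}


def build_pages(projects: list[dict], templates: dict[str, str]) -> dict[str, list[str]]:
    """Render each project once for every page it lists, using that page's template."""
    pages: dict[str, list[str]] = defaultdict(list)
    for project in projects:
        for page in project["pages"]:
            pages[page].append(templates[page].format(**project))
    return dict(pages)
```

"Project A" is declared once but shows up on both the writing and data-journalism pages, each in its own format.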
Flask, Jinja2, CSS
SaysWho
4-1-2023
A Python package for identification and attribution of quotes within a document.
- Finds quotes and their speakers using a combination of grammar and logic.
- Uses coreferencing models to resolve ambiguous speakers.
- Built on spaCy and Textacy.
spaCy, Named Entity Recognition, NumPy, Regular Expressions
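Here is a heavily simplified sketch of quote attribution using only regular expressions. It is not SaysWho's method: the package works from spaCy's grammatical parse and adds coreference resolution. `find_quotes`, the name pattern and the verb list are illustrative assumptions.

```python
import re
from typing import Optional

NAME = r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*"  # capitalized words, e.g. "Jane Doe"
VERB = r"(?:said|says|told|asked|added|explained)"  # small illustrative speech-verb list
QUOTE_RE = re.compile(r'"([^"]+)"')
# Speaker right after the quote: '"...," Jane Doe said' or '"...," said Jane Doe'
AFTER_RE = re.compile(rf"\s*(?:{VERB}\s+({NAME})|({NAME})\s+{VERB})")
# Speaker right before the quote: 'Jane Doe said, "..."'
BEFORE_RE = re.compile(rf"({NAME})\s+{VERB},?\s*$")


def find_quotes(text: str) -> list[tuple[str, Optional[str]]]:
    """Return (quote, speaker) pairs; speaker is None when no adjacent attribution is found."""
    results = []
    for m in QUOTE_RE.finditer(text):
        after = AFTER_RE.match(text, m.end())  # anchored at the closing quote mark
        before = BEFORE_RE.search(text[:m.start()])  # must end right before the opening mark
        if after:
            speaker = after.group(1) or after.group(2)
        elif before:
            speaker = before.group(1)
        else:
            speaker = None
        results.append((m.group(1), speaker))
    return results
```

A pronoun speaker, as in `He said, "..."`, comes back as the literal `He`. Resolving it to a named person is the job of the coreference step.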
Rap Caviar Gender Balance Tracker
10-1-2022
Automated data collection and visualization of Spotify's influential Rap Caviar playlist, tracking how the gender balance of its artists changed over time.
(No longer updated due to API changes as of December 2024)
- Pulls chart updates daily using the Spotify API.
- Infers artist gender based on pronoun usage in artist bios from several sources.
- Identifies groups and breaks them into their constituent artists.
- Visualizes data using Plotly, including charts of raw and normalized gender balance.
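Here is a minimal sketch of pronoun-based inference and the raw share calculation. The pronoun sets, the 75% majority threshold and the function names are assumptions. The tracker's own rules, group handling and normalization are not reproduced.

```python
import re
from collections import Counter

# Illustrative pronoun sets. "They" is ambiguous (groups, nonbinary artists),
# so this sketch does not map it to any label.
PRONOUNS = {
    "woman": {"she", "her", "hers", "herself"},
    "man": {"he", "him", "his", "himself"},
}
TOKEN_RE = re.compile(r"[a-z]+")


def infer_gender(bios: list[str], min_share: float = 0.75) -> str:
    """Label an artist from pronoun counts pooled across bios from several sources."""
    counts: Counter = Counter()
    for bio in bios:
        for token in TOKEN_RE.findall(bio.lower()):
            for label, pronouns in PRONOUNS.items():
                if token in pronouns:
                    counts[label] += 1
    total = sum(counts.values())
    if total == 0:
        return "unknown"
    label, n = counts.most_common(1)[0]
    # Require a clear majority so mixed pronoun usage doesn't force a label.
    return label if n / total >= min_share else "unknown"


def gender_share(labels: list[str]) -> dict[str, float]:
    """Raw share of playlist artist slots per label for one snapshot."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}
```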
SQL, AWS, Elastic Beanstalk, Plotly, Pandas
Tracking Use of Passive Voice When Reporting on Police Violence
1-13-2022
I tracked how often "officer-involved" and similar exonerative language appeared across 20 years of American reporting on police violence. A collaboration with Brandon Soderberg.
Natural Language Processing
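This sketch computes the per-year share of articles containing the phrasing, to show how such a rate could be measured. Two choices are assumptions: defining the rate as a share of articles, and the two-term pattern. The collaboration's actual corpus, term list and rate definition may differ.

```python
import re
from collections import defaultdict

# Illustrative pattern only; the study's full list of exonerative phrasings isn't reproduced here.
EXONERATIVE_RE = re.compile(r"\b(?:officer|police)[- ]involved\b", re.IGNORECASE)


def yearly_rate(articles) -> dict[int, float]:
    """articles: iterable of (year, text). Returns {year: share of that year's articles using the phrasing}."""
    hits: dict[int, int] = defaultdict(int)
    totals: dict[int, int] = defaultdict(int)
    for year, text in articles:
        totals[year] += 1
        if EXONERATIVE_RE.search(text):
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}
```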
G Sheets Tools
9-1-2020
Light Python package for moving data in and out of Google Sheets.
- Manipulate, analyze and share .csv (and .csv-like) data using tools like Jupyter and Pandas.
- Avoid the formatting and versioning issues raised by repeated imports and exports.
- Easier than using the Google Cloud API!
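The shape of the data at the Google Sheets API boundary can be sketched with the standard library. These helpers are hypothetical, not G Sheets Tools' interface:
- `csv_to_value_range` builds the `ValueRange` body that the Sheets API v4 `spreadsheets.values.update` method expects.
- `values_to_records` turns `values.get` output back into records, padding rows because the API omits trailing empty cells.

```python
import csv
import io


def csv_to_value_range(csv_text: str, range_name: str = "Sheet1!A1") -> dict:
    """Build a ValueRange body for spreadsheets.values.update from CSV text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return {"range": range_name, "majorDimension": "ROWS", "values": rows}


def values_to_records(values: list[list[str]]) -> list[dict[str, str]]:
    """Convert values.get output (header row first) into dicts, padding short rows with ""."""
    if not values:
        return []
    header, *rows = values
    return [dict(zip(header, row + [""] * (len(header) - len(row)))) for row in rows]
```

With google-api-python-client, the body could be sent as `service.spreadsheets().values().update(spreadsheetId=sheet_id, range=body["range"], valueInputOption="RAW", body=body).execute()`, where `sheet_id` is a placeholder.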
Housing Court Data Scraper
6-1-2020
I built an app to help housing lawyers fight evictions during the upheaval of the pandemic.
- Automatically identified tenants at risk of missing court dates, allowing lawyers to reach out and offer assistance.
- Reduced hours of manually searching through online forms to a few minutes.
Selenium, BeautifulSoup, SQL
Tenant Help Hotline
6-1-2020
I set up a hotline to make it easier for tenants without internet access to contact Brooklyn Eviction Defense if they needed the organization's help.
- Provides useful tenant information in English and Spanish.
- Automatically populates documents from voicemail transcriptions to streamline intake of new tenants looking for help.
- Quick response line sends a text to members on call in case of emergency.
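Here is a sketch of the bilingual greeting and the voicemail-to-intake flow. The TwiML pieces are standard:
- `<Say>` takes a `language` attribute.
- `<Record>` takes `transcribe` and `transcribeCallback`.
- Twilio posts the finished transcription, including fields such as `TranscriptionText` and `RecordingUrl`, to the callback URL.

The greeting text, callback URL, intake layout and function names are hypothetical, not the hotline's actual setup.

```python
import xml.etree.ElementTree as ET
from string import Template


def voicemail_twiml(callback_url: str) -> str:
    """TwiML: greet in English and Spanish, then record a voicemail and request a transcription."""
    response = ET.Element("Response")
    ET.SubElement(response, "Say", language="en-US").text = (
        "Brooklyn Eviction Defense. Please leave a message after the tone."
    )
    ET.SubElement(response, "Say", language="es-MX").text = (
        "Brooklyn Eviction Defense. Por favor, deje su mensaje después del tono."
    )
    # Twilio POSTs the finished transcription to transcribeCallback.
    ET.SubElement(response, "Record", transcribe="true", transcribeCallback=callback_url, maxLength="120")
    return ET.tostring(response, encoding="unicode")


# Hypothetical intake layout filled from the callback's form fields.
INTAKE = Template("Caller: $caller\nRecording: $recording\nMessage:\n$message\n")


def intake_document(form: dict) -> str:
    """Fill an intake note from a transcription callback's POSTed form fields."""
    return INTAKE.substitute(
        caller=form.get("From", "unknown"),
        recording=form.get("RecordingUrl", ""),
        message=form.get("TranscriptionText", "(no transcription)"),
    )
```

The quick response line could send its alert with the Twilio Python helper library's `client.messages.create(to=..., from_=..., body=...)`.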
Twilio, SQL, AWS