Blog

Writing code with AI: My 4 phases

January 9, 2026

These days, I develop with a Q&A AI tool, a systematic, slow-and-steady human-and-AI approach, and/or parallel agent mode. Mixed in are hand-written code, supplementary research from docs or online, and hopefully some human intelligence.

Read the full post >

Quotes and sources for reliable RAG

November 11, 2025

I find that it’s effective to nudge a RAG system to prioritize returning quotes and sources.

Read the full post >

You can fine-tune the iPhone’s built-in AI model…

June 13, 2025

…which can then be served on-device, for free, as part of your iOS app.

Read the full post >

Give the world yet another chatbot, with FastAPI and AWS (Bonus: Troubleshoot API Issues)

May 16, 2025

You’ve probably wanted to deploy a chatbot lately.

Read the full post >

Host LanceDB on S3 for RAG or other applications

April 24, 2025

While ChatGPT or similar can now search the web, your proprietary content is probably unavailable online, and therefore can’t be used to provide more “contextual” responses.

Read the full post >

For better AI, add randomness. And start with a lot of answers.

October 28, 2024

When training AI, engineers begin with some random numbers.

Read the full post >

Comparing some early language models: BERT vs GPT-1

September 3, 2024

BERT is a language model released in late 2018, soon after GPT-1. There are many similarities, and some major differences.

Read the full post >

Simple example of XGBoost with nested cross validation, in Python (with Bonus)

August 6, 2024

Actual machine learning code usually takes up a minority of your project code [1].

Read the full post >

Minimum cost or maximum likelihood?

August 1, 2024

In classification, you want to minimize your mistakes, or misclassifications. The fewer mistakes, the better your model.

Read the full post >

ML for Tabular Data

July 26, 2024

For machine learning on tabular data, you probably should use XGBoost.

Read the full post >

Survival Methods in R

June 13, 2024

Easily obtain common results and outputs. This should get you 80% of the way.

Read the full post >

Makefile quick start

May 13, 2019

Makefiles are a simple way to organize code execution/compilation.

Read the full post >

Fred Brooks: “grow” software

March 24, 2018

Great design comes from great designers.

Read the full post >

What is nested cross-validation (for) and why you should use it

December 2, 2017

In machine learning, cross-validation is always the answer. But what are the questions?

Read the full post >

What is gradient descent (for)?

November 20, 2017

When normal equations are computationally expensive, such as with large feature spaces, linear regression may be solved using the Gradient Descent algorithm.

Read the full post >

When programming, consider the rule of three

August 30, 2017

In software engineering, there is a rule of three which can make your programming work more efficient.

Read the full post >

Tmux Reference

May 9, 2017

tmux manages your terminal windows to help you work on multiple projects.

Read the full post >

Reproducible vs. non-reproducible analysis

April 17, 2017

Below are some characteristics of a reproducible analysis, in contrast to a non-reproducible analysis.

Read the full post >

dplyr is a top 10 invention in the past 10 years

February 10, 2017

Do you love R’s dplyr?

Read the full post >

Distinguishing between statistical modeling and machine learning

July 12, 2016

If you are looking for it, here is one framework to distinguish statistical modeling from machine learning. It hinges on whether or not you are interested in the interpretability of your model.

Read the full post >

Where a human still beats an algorithm

July 6, 2016

“Algorithms don’t understand the subtlety and the mixing of [music] genres. So we hired hundreds of the best people we know.” - Jimmy Iovine, on curation in Apple Music.

Read the full post >

Essential R packages that will dramatically improve your workflow

June 7, 2016

R is a fantastic tool for data analysis, and you can take it to the next level by learning the pipe %>% operator and using the packages dplyr, ggplot2, broom and a few others.

Read the full post >

Software Carpentry

January 19, 2015

Below are notes from a talk at the Harvard Innovation Lab by Jim Waldo, CTO and instructor, Harvard University, on January 19, 2015. The following notes cover about 85% of the talk. See bottom for a bonus.

Read the full post >

The Guardian: Why Your Data Matters

November 14, 2014

Rather than burying its data collection motives and tactics in an obscure privacy policy, the Guardian has set up a portal* with articles and videos explaining these in layman’s terms.

Read the full post >

“Summary Statistics Obscure Important Details”

November 7, 2014

Boston Data Festival was this week, packed with excellent talks on topics from “Quantifying Culture” to “Evaluating Trading Algorithms Using Probabilistic Programming.”

Read the full post >

Deflating Big Data Hubris

September 24, 2014

The highs and lows of Google Flu Trends, “once a poster child for the power of big-data analysis,” serve as a case study for David Lazer, Ryan Kennedy, Gary King, et al., in their Science Magazine* article “The Parable of Google Flu: Traps in Big Data Analysis.”

Read the full post >

A Peek into Apple University

August 26, 2014

From a New York Times article*, “Inside Apple’s Internal Training Program,” some highlights on simplicity at Apple:

Read the full post >

“Should we believe more in Big Data or in magic?”

November 6, 2013

From a Reuters column*:

Read the full post >

Ever seen a Twitter-activity map? Then you must consider Big-Data bias…

April 4, 2013

Over the weekend of Apple’s April 3 release of the iPad, 73% of circulated tweets were favorable toward the iPad, but 26% expressed disappointment that the iPad could not replace the iPhone, according to a study.

Read the full post >

NYT: Sure, Big Data Is Great. But So Is Intuition

January 27, 2013

Steve Lohr at the New York Times* writes a reflection on the promises of Big Data, citing increasing buzz, yet also its initial big failure:

Read the full post >

Don’t go from 2D to 1D with p-values

May 14, 2012

Or, when not to be too precise.

Read the full post >
