🦄 AI That Works

On Zoom, Tuesdays at 10 AM PST - an hour of live coding, Q&A, and production-ready AI engineering

🦄 Next Episode

Constructing Taxonomy from Unstructured Data

Tuesday, December 16, 2025 at 10 AM PST

Building taxonomies and classification systems from unstructured data using AI.

What We're About

Weekly conversations with @hellovai & @dexhorthy about getting the most juice out of today's models

When: Every Tuesday at 10 AM PST on Zoom
Duration: 1 hour of live coding, Q&A, and production-ready insights
Goal: Take your AI app from demo → production

Let's code together.

Pre-Reading & Setup

Before joining, get familiar with our toolkit:

Core Tools

Zoom - Live sessions
Cursor - AI-powered IDE
Git - Version control
Claude Code - Agentic Coding
CodeLayer - Agentic Coding Tool

Languages

Python/TypeScript/Go - Application logic
BAML - Prompting DSL
- Repository
- Getting Started Guide

Package Managers

Python: UV
TypeScript: PNPM
Go: Go modules

Episodes & Workshops

From Demo to Production - One Episode at a Time

📅 Episode	📝 Description
UPCOMING 2025-12-16 #35: Constructing Taxonomy from Unstructured Data code • register	Building taxonomies and classification systems from unstructured data using AI.
PAST 2025-12-09 #34: Git Worktrees for AI Coding Agents code	Since ~ May 2025, there's been a ton of buzz around AI coding agents, parallelizing workflows, and it's not stopping any time soon. On this episode we'll go deep on the tech that can help you push the limits of these tools, including: - Crash course on Git Worktrees - File and Spec Management, tradeoffs in hardlinks vs symlinks - tmux as a building block for collaborative agent workflows
PAST 2025-11-25 #33: No Vibes Allowed: Using CodeLayer to Build CodeLayer code	Live coding with CodeLayer, we'll use Research / Plan / Implement live to ship 3 new features to CodeLayer.
PAST 2025-11-18 #32: Building an Animation Pipeline code	We do a lot of work with Excalidraw, and this session shows the AI-first workflow for turning any sketch into a finished animation. We'll blend Claude Code with custom TypeScript scripts, wire up interactive slash commands, and add browser automation to existing OSS tools to export polished WebM assets.
PAST 2025-11-11 #31: Dates, Times, and LLMs code	How do you make an LLM amazing at dates? Relative dates, absolute dates, timezones, all that madness. Let's talk dates, times, and all that goodness.
PAST 2025-11-04 #30: Event-driven agentic loops watch • code	Key takeaway: treat agent interactions as an event log, not mutable state. Modeling user inputs, LLM chunks, tool calls, interrupts, and UI actions as a single event stream lets you project state for the UI, agent loop, and persistence without drift. We walk through effect-ts patterns for subscribing to the bus, deriving “current” state via pure projections, and deciding when to persist or replay events—plus trade-offs for queuing, cancelation, and tool orchestration in complex agent UX.
PAST 2025-10-28 #29: Ralph Wiggum under the hood: Coding Agent Power Tools watch • code	We've talked a lot about how to use context engineering to get more out of coding agents. In this episode, we dive deep on the Ralph Wiggum technique and why this different approach can reshape your coding workflow. We explore how Ralph handles greenfield work, refactors, and spec generation—surprise: it's all about higher-quality context engineering.
PAST 2025-10-21 #28: Agentic RAG + Context Engineering watch • code	In this conversation, Vaibhav Gupta and Dex explore the intricacies of building an Agentic Retrieval-Augmented Generation (RAG) system. They discuss the differences between traditional RAG and Agentic RAG, emphasizing the flexibility and decision-making capabilities of the latter. The conversation includes a live demo of a coding agent, insights into the coding architecture, challenges faced during tool implementation, and the iterative process of refining the system. They also touch on the integration of web search functionalities and the evaluation of tool effectiveness, providing a comprehensive overview of the development process and the underlying principles of Agentic RAG systems. In this conversation, Vaibhav Gupta and Dex discuss the intricacies of building dynamic AI systems, focusing on tool implementation, user interface optimization, and model performance. They explore the importance of reinforcement learning in training models, the challenges of debugging AI systems, and the significance of writing code to enhance understanding and efficiency in AI development. The dialogue emphasizes the balance between different AI approaches and the necessity of real use cases in building effective solutions.
PAST 2025-10-14 #27: No Vibes Allowed - Live Coding with AI Agents watch • code	Vaibhav Gupta and Dex demonstrate the power of AI-assisted coding by implementing a complex timeout feature for BAML (a programming language for AI applications) in a live coding session. Starting from a GitHub issue that had been open since March, they showcase a systematic workflow: specification refinement, codebase research, implementation planning, and phased execution. Using Claude and specialized coding agents, they navigate a 400,000+ line codebase, implementing timeout configurations for HTTP clients including connection timeouts, request timeouts, idle timeouts, and time-to-first-token for streaming responses. The session highlights key practices like context engineering, frequent plan validation, breaking complex features into testable phases, and the importance of reading AI-generated code. In under 3 hours of live coding, they achieve what would typically take 1-2 days of engineering time, successfully implementing parsing, validation, error handling, and Python integration tests.
PAST 2025-10-12 SF Workshop: Unconference SF code	Special unconference episode from San Francisco.
PAST 2025-10-07 #26: Anthropic Post Mortem watch • code	In this conversation, Vaibhav Gupta and Aaron discuss various aspects of AI model performance, focusing on the recent downtime experienced by Anthropic and the implications for AI systems. They explore the sensitivity of models to context windows, the challenges of output corruption, and the complexities of token selection mechanisms. The discussion also highlights the importance of debugging and observability in AI systems, as well as the role of user-friendly workflows and integrations in making AI accessible to non-technical users. The conversation concludes with thoughts on the future of AI development and the need for effective metrics to monitor product performance.
PAST 2025-09-30 #25: Dynamic Schemas watch • code	In this episode, Dex and Vaibhav explore the concept of dynamic UIs and how to build systems that can adapt to unknown data structures. They discuss the importance of dynamic schema generation, meta programming with LLMs, and the potential for creating dynamic React components. The conversation also delves into the execution and rendering of these dynamic schemas, highlighting the challenges and opportunities in this evolving field. They conclude with thoughts on future directions and the importance of building robust workflows around schema management.
PAST 2025-09-23 #24: Evals for Classification watch • code	In this episode of AI That Works, hosts Vaibhav Gupta and Dex, along with guest Kevin Gregory, explore the intricacies of building AI systems that are ready for production. They discuss the concept of dynamic UIs, the challenges of large-scale classification, and the importance of user experience in AI applications. The conversation delves into the use of LLMs for enhancing classification systems, the evaluation and tuning of these systems, and the subjective nature of what constitutes a 'correct' classification. The episode emphasizes the need for engineers to focus on accuracy and user experience while navigating the complexities of AI engineering. The speakers also discuss model upgrades, user feedback, and the importance of building effective user interfaces, emphasizing iterative development and rapid prototyping for chatbot performance evaluation.
PAST 2025-09-16 #23: Bash vs. MCP - token efficient coding agent tooling watch • code	In this conversation, Dex and Vaibhav delve into the intricacies of coding agents, focusing on the debate between using MCP (Model Control Protocol) and Bash for tool integration. They explore the importance of understanding context windows, token management, and the efficiency of using different tools. The discussion emphasizes the significance of naming conventions, dynamic context engineering, and the engineering efforts required to optimize performance. They also share real-world applications, best practices for using MCPs, and engage with the community through a Q&A session.
PAST 2025-09-09 #22: Generative UIs and Structured Streaming watch • code	We'll explore hard problems in building rich UIs that rely on streaming data from LLMs. Specifically, we'll talk through techniques for rendering STRUCTURED outputs from LLMs, with real-world examples of how to handle partially-streamed outputs over incomplete JSON data. We'll explore advanced needs like * Fields that should be required for stream to start * Rendering React Components with partial data * Handling nullable fields vs. yet-to-be-streamed fields * Building high-quality User feedback * Handling errors mid-stream
PAST 2025-09-02 #21: Voice Agents and Supervisor Threading watch • code	Exploring voice-based AI agents and supervisor threading patterns for managing complex conversational workflows.
PAST 2025-08-26 #20: Claude for Non-Code Tasks watch • code	On #17 we talked about advanced context engineering workflows for using Claude code to work in complex codebases. This week, we're gonna get a little weird with it, and show off a bunch of ways you can use Claude Code as a generic agent to handle non-coding tasks. We'll learn things like: Skipping the MCP and having claude write its own scripts to interact with external systems, Creating internal knowledge graphs with markdown files, How to blend agentic retrieval and search with deterministic context packing
PAST 2025-08-19 #19: Interruptible Agents watch • code	Anyone can build a chatbot, but the user experience is what truly sets it apart. Can you cancel a message? Can you queue commands while it's busy? How finely can you steer the agent? We'll explore these questions and code a solution together.
PAST 2025-08-12 #18: Decoding Context Engineering Lessons from Manus watch • code	A few weeks ago, the Manus team published an excellent paper on context engineering. It covered KV Cache, Hot-swapping tools with custom samplers, and a ton of other cool techniques. On this week's episode, we'll dive deep on the manus Article and put some of the advice into practice, exploring how a deep understanding of models and inference can help you to get the most out of today's LLMs.
PAST 2025-08-05 #17: Context Engineering for Coding Agents watch • code	By popular demand, AI That Works #17 will dive deep on a new kind of context engineering: managing research, specs, and planning to get the most of coding agents and coding CLIs. You've heard people bragging about spending thousands/mo on Claude Code, maxing out Amp limits, and much more. Now Dex and Vaibhav are gonna share some tips and tricks for pushing AI coding tools to their absolute limits, while still shipping well-tested, bug-free code. This isn't vibe-coding, this is something completely different.
PAST 2025-07-29 #16: Evaluating Prompts Across Models watch • code	AI That Works #16 will be a super-practical deep dive into real-world examples and techniques for evaluating a single prompt against multiple models. While this is a commonly heralded use case for Evals, e.g. 'how do we know if the new model is better' / 'how do we know if the new model breaks anything', there's not a ton of practical examples out there for real-world use cases.
PAST 2025-07-22 #15: PDFs, Multimodality, Vision Models watch • code	Dive deep into practical PDF processing techniques for AI applications. We'll explore how to extract, parse, and leverage PDF content effectively in your AI workflows, tackling common challenges like layout preservation, table extraction, and multi-modal content handling.
PAST 2025-07-15 #14: Implementing Decaying-Resolution Memory watch • code	Last week on #13, we did a conceptual deep dive on context engineering and memory - this week, we're going to jump right into the weeds and implement a version of Decaying-Resolution Memory that you can pick up and apply to your AI Agents today. For this episode, you'll probably want to check out episode #13 in the session listing to get caught up on DRM and why its worth building from scratch.
PAST 2025-07-08 #13: Building AI with Memory & Context watch • code	How do we build agents that can remember past conversations and learn over time? We'll explore memory and context engineering techniques to create AI systems that maintain state across interactions.
PAST 2025-07-01 #12: Boosting AI Output Quality watch • code	This week's session was a bit meta! We explored 'Boosting AI Output Quality' by building the very AI pipeline that generated this email from our Zoom recording. The real breakthrough: separating extraction from polishing for high-quality AI generation.
PAST 2025-06-24 #11: Building an AI Content Pipeline watch • code	Content creation involves a lot of manual work - uploading videos, sending emails, and other follow-up tasks that are easy to drop. We'll build an agent that integrates YouTube, email, GitHub and human-in-the-loop to fully automate the AI that Works content pipeline, handling all the repetitive work while maintaining quality.
PAST 2025-06-17 #10: Entity Resolution: Extraction, Deduping, and Enriching watch • code	Disambiguating many ways of naming the same thing (companies, skills, etc.) - from entity extraction to resolution to deduping. We'll explore breaking problems into extraction → resolution → enrichment stages, scaling with two-stage designs, and building async workflows with human-in-loop patterns for production entity resolution systems.
PAST 2025-06-10 #9: Cracking the Prompting Interview watch • code	Ready to level up your prompting skills? Join us for a deep dive into advanced prompting techniques that separate good prompt engineers from great ones. We'll cover systematic prompt design, testing tools / inner loops, and tackle real-world prompting challenges. Perfect prep for becoming a more effective AI engineer.
PAST 2025-06-03 #8: Humans as Tools: Async Agents and Durable Execution watch • code	Agents are great, but for the most accuracy-sensitive scenarios, we some times want a human in the loop. Today we'll discuss techniques for how to make this possible. We'll dive deep into concepts from our 4/22 session on 12-factor agents and extend them to handle asynchronous operations where agents need to contact humans for help, feedback, or approvals across a variety of channels.
PAST 2025-05-27 #7: 12-factor agents: selecting from thousands of MCP tools watch • code	MCP is only as great as your ability to pick the right tools. We'll dive into showing how to leverage MCP servers and accurately use the right ones when only a few have actually relevant tools.
PAST 2025-05-20 #6: Policy to Prompt: Evaluating w/ the Enron Emails Dataset watch • code	One of the most common problems in AI engineering is looking at a set of policies/rules and evaluating evidence to determine if the rules were followed. In this session we'll explore turning policies into prompts and pipelines to evaluate which emails in the massive Enron email dataset violated SEC and Sarbanes-Oxley regulations.
PAST 2025-05-17 SF Workshop: Workshop SF – Twelve Factor Agents code	Live workshop in San Francisco on building 12 factor agents. Interactive instruction, code-along format, and hackathon to build production-ready AI agents.
PAST 2025-05-13 #5: Designing Evals watch • code	Minimalist and high-performance testing/evals for LLM applications. Stay tuned for our season 2 kickoff topic on testing and evaluation strategies.
PAST 2025-05-10 NYC Workshop: Workshop NYC – Twelve Factor Agents code	Live workshop in NYC on building 12 factor agents. Interactive instruction, code-along format, and hackathon to build production-ready AI agents.
PAST 2025-04-22 #4: Twelve Factor Agents watch • code	Learn how to build production-ready AI agents using the twelve-factor methodology. We'll cover the core concepts and build a real agent from scratch.
PAST 2025-04-15 #3: Code Generation with Small Models watch • code	Large models can do a lot, but so can small models. We'll discuss techniques for how to leverage extremely small models for generating diffs and making changes in complete codebases.
PAST 2025-04-08 #2: Reasoning Models vs Reasoning Prompts watch • code	Models can reason but you can also reason within a prompt. Which technique wins out when and why? We'll find out by adding reasoning to an existing movie chat agent.
PAST 2025-03-31 #1: Large Scale Classification watch • code	LLMs are great at classification from 5, 10, maybe even 50 categories. But how do we deal with situations when we have over 1000? Perhaps it's an ever changing list of categories?

Name		Name	Last commit message	Last commit date
Latest commit History 371 Commits
.claude/commands		.claude/commands
.vscode		.vscode
2025-03-31-large-scale-classification		2025-03-31-large-scale-classification
2025-04-07-reasoning-models-vs-prompts		2025-04-07-reasoning-models-vs-prompts
2025-04-15-code-generation-small-models		2025-04-15-code-generation-small-models
2025-04-22-twelve-factor-agents		2025-04-22-twelve-factor-agents
2025-05-10-workshop-nyc-twelve-factor-agents		2025-05-10-workshop-nyc-twelve-factor-agents
2025-05-13-designing-evals		2025-05-13-designing-evals
2025-05-17-workshop-sf-twelve-factor-agents		2025-05-17-workshop-sf-twelve-factor-agents
2025-05-20-policies-to-prompts		2025-05-20-policies-to-prompts
2025-05-27-mcp-with-10000-tools		2025-05-27-mcp-with-10000-tools
2025-06-03-humans-as-tools-async		2025-06-03-humans-as-tools-async
2025-06-10-cracking-the-prompting-interview		2025-06-10-cracking-the-prompting-interview
2025-06-17-entity-extraction		2025-06-17-entity-extraction
2025-06-24-ai-content-pipeline		2025-06-24-ai-content-pipeline
2025-07-01-ai-content-pipeline-2		2025-07-01-ai-content-pipeline-2
2025-07-08-context-engineering		2025-07-08-context-engineering
2025-07-15-decaying-resolution-memory		2025-07-15-decaying-resolution-memory
2025-07-22-multimodality		2025-07-22-multimodality
2025-07-29-eval-many-models-same-prompt		2025-07-29-eval-many-models-same-prompt
2025-08-05-advanced-context-engineering-for-coding-agents		2025-08-05-advanced-context-engineering-for-coding-agents
2025-08-12-manus-context-engineering		2025-08-12-manus-context-engineering
2025-08-19-interruptible-agents		2025-08-19-interruptible-agents
2025-08-26-claude-for-non-code-workflows		2025-08-26-claude-for-non-code-workflows
2025-09-02-voice-agent-supervisor-threading		2025-09-02-voice-agent-supervisor-threading
2025-09-09-generative-uis		2025-09-09-generative-uis
2025-09-16-coding-agent-tools-bash-vs-mcp		2025-09-16-coding-agent-tools-bash-vs-mcp
2025-09-23-evals-for-classification		2025-09-23-evals-for-classification
2025-09-30-dyanmic-schemas		2025-09-30-dyanmic-schemas
2025-10-07-anthropic-post-mortem		2025-10-07-anthropic-post-mortem
2025-10-12-unconference-sf		2025-10-12-unconference-sf
2025-10-14-no-vibes-allowed		2025-10-14-no-vibes-allowed
2025-10-21-agentic-rag-context-engineering		2025-10-21-agentic-rag-context-engineering
2025-10-28-ralph-wiggum-coding-agent-power-tools		2025-10-28-ralph-wiggum-coding-agent-power-tools
2025-11-05-event-driven-agents		2025-11-05-event-driven-agents
2025-11-11-dates-and-times		2025-11-11-dates-and-times
2025-11-18-building-an-animation-pipeline		2025-11-18-building-an-animation-pipeline
2025-11-25-no-vibes-allowed-using-codelayer-to-build-codelayer		2025-11-25-no-vibes-allowed-using-codelayer-to-build-codelayer
2025-12-02-multimodal-evals		2025-12-02-multimodal-evals
2025-12-09-git-worktrees		2025-12-09-git-worktrees
2025-12-16-taxonomy-from-unstructured-data		2025-12-16-taxonomy-from-unstructured-data
thoughts		thoughts
tools		tools
.DS_Store		.DS_Store
.env.example		.env.example
.envrc		.envrc
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md
data.json		data.json
feed.xml		feed.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

🦄 AI That Works

🦄 Next Episode

Constructing Taxonomy from Unstructured Data

What We're About

Pre-Reading & Setup

Core Tools

Languages

Package Managers

Episodes & Workshops

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 7

Uh oh!

Languages

ai-that-works/ai-that-works

Folders and files

Latest commit

History

Repository files navigation

🦄 AI That Works

🦄 Next Episode

Constructing Taxonomy from Unstructured Data

What We're About

Pre-Reading & Setup

Core Tools

Languages

Package Managers

Episodes & Workshops

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 7

Uh oh!

Languages

Packages