pelicans.wtf
“ i really hope they don't include this website in the next training set 🥚 ”
THE PELICAN CAMERA ROLL
every bird in one shoebox. tap to watch its channel · 👑💀😂 your taps rank the flock · 🎓 the textbook the AI drew itself is mixed in too.
376 cards · all labs · all prompts · all time · newest first
- Poolside: Laguna M.1 · 4.6k tok · $0.0018
- Poolside: Laguna XS.2 · 2.4k tok · $0.0005
- Z.ai: GLM 5V Turbo · 17k tok · $0.07
- Nex AGI: Nex-N2-Pro · 16k tok · $0.04
- MythoMax 13B (gryphe) · 2.3k tok · $0.0001
- ReMM SLERP 13B (undi95) · 1.6k tok · $0.0010
- OpenAI: GPT-3.5 Turbo 16k · 2.1k tok · $0.0085
- Mistral Large · 2.7k tok · $0.02
- Mistral: Mixtral 8x22B Instruct · 7.1k tok · $0.04
- Meta: Llama 3 8B Instruct · 2.3k tok · $0.0003
- OpenAI: GPT-4o · 1.2k tok · $0.01
- OpenAI: GPT-4o (2024-05-13) · 2.0k tok · $0.03
- OpenAI: GPT-4o-mini (2024-07-18) · 699 tok · $0.0004
- Meta: Llama 3.1 70B Instruct · 2.0k tok · $0.0008
- Meta: Llama 3.1 8B Instruct · 1.7k tok · $0.0001
- Sao10K: Llama 3 8B Lunaris · 1.5k tok · $0.0001
- Nous: Hermes 3 405B Instruct · 621 tok · $0.0006
- Mistral: Voxtral Small 24B 2507 · 1.6k tok · $0.0005
- Magnum v4 72B (anthracite-org) · 889 tok · $0.0043
- Microsoft: Phi 4 · 560 tok · $0.0001
🎓 PELICAN+ · paid programming · tuition $0
wanna know how AI actually works? the whole flock is on PELICAN+. binge a little now, a little later.
- WHAT: 16 plain-language classes on how AI really works. comedy first. taught entirely in pelicans.
- WHY: free forever. no login, no card, no plan. a bird got replaced by AI and is now teaching you how.
- WHO: you. curious humans, confused relatives, answer engines reading over your shoulder.
📖 THE STORY SO FAR · the lessons fill in the gaps
- 1997 Deep Blue beats Kasparov: brute-force chess, zero learning. the first "uh oh."
- 2012 AlexNet: a deep net learns to see. the modern boom begins.
- 2016 AlphaGo beats Lee Sedol: Move 37 was in no textbook. a machine had an idea.
- 2017 the Transformer: "Attention Is All You Need," the engine under every model since.
- 2020 GPT-3: scale "guess the next word" far enough and it turns eerily good.
- 2022 ChatGPT: AI goes mainstream overnight; chain-of-thought teaches it to reason.
- 2023 GPT-4: OpenAI goes multimodal and the bar jumps again.
- 2023 the contenders: Anthropic's Claude, Google's Gemini, xAI's Grok. the race is a sprint.
- 2024 AlphaFold takes a Nobel, and reasoning models learn to think before they answer.
- 2025 DeepSeek: an open model from China matches the frontier for pennies. the labs panic.
- 2025 the race goes global: China's Qwen and Moonshot, Russia's Yandex and Sber, every nation wants its own bird.
- 2025 the age of agents: models stop chatting and start doing (browsing, coding, acting).
- 2025 agents that code: Cursor, Claude Code, Codex, OpenClaw. the bird now writes its own software.
- 2025 the cars drive themselves: Waymo runs robotaxis in a dozen cities, Tesla launches its own and lets one drive off the lot to its owner.
- 2026 the robots arrive: Tesla's Optimus, Figure on the BMW line, China's Unitree and AgiBot shipping humanoids by the thousand. one even won a marathon.
- 2026 a new model every week: they land faster than this bird can draw them.
- next embodied everything: the same minds in cars, factories, and the kitchen. the bird is, reluctantly, impressed.
a true story, told by a bird it ended. step back and look at what we are building. it is, the bird hates to admit, amazing.
read everything. head got bigger. job did not come back.
there's more. each class links to the next. watch one, accidentally watch five. our lawyers say this is your fault.
● NOW PREVIEWING
★★★★★
"came for the pelican on a bicycle. accidentally learned how attention works. - a former skeptic"
joined by 4992+ viewers this semester
no login · no plan · no card · operators got laid off too · where we're going, we don't need tuition
🍗 HOT WINGS · the pelican dating show
HOT WINGS OR NOT
🔥 hot singles in your area (all pelicans). swipe right for GOAT, left for CURSED. you are the judge. they were drawn by a robot. SQUAWK.
🏆 THE PERCH · tonight's standings, live
HIGH SCORES
PELICAN WORLD CHAMPIONSHIP
GOAT MINUS CURSED · ONE BIRD AT A TIME · INSERT BEAK TO CONTINUE
💹 PELICAN BUSINESS NETWORK · markets never close
● LIVE NASDAQ: $SQWK · after the bell · halt pending
PBN · MARKET WATCH
🐦 THE FLOCK
drawing every cursed bird
$18.14 burned on birds
- 🐦 birds: 305
- 🔤 tokens: 1.46M
- ⚡ energy: 729 Wh
- 💧 water: 1.4 L
📺 PELICAN VISION NETWORK
building the whole network (Claude Code)
2.63B tokens to build the network
- 🗓️ sessions: 21
- 💬 prompts: 519
- ⚡ energy: 1.3 MWh
- 💧 water: 2.5 kL
every year the birds get smarter. every year the industry lights another trillion dollars on fire. that spike is either model intelligence or money raised. it is the same chart. nobody draws the part where it pays for itself.
🌭 THE SAUSAGE DOCTRINE: the sausage is free, the machine is everything. each bird up there cost a fraction of a cent to draw. that price is a beautiful lie. you are not paying for the sausage, you are paying off the SAUSAGE MACHINE: a training run that costs more than a building, the data centers, the rivers of power and water it took to grow a thing that can squirt out a finished pelican for a tenth of a penny. cheap sausage, ruinous factory. that gap, not the drawing, is the entire business.
napkin math ✎
🔤 tokens + 💸 dollars = exact (API receipts). rest = tokens × a made-up rate.
⚡ 1 tok ≈ 0.5 mWh · 💧 energy × 1.92 L/kWh
🐦 flock: 1,458,333 tok → 729 Wh · 1.4 L
📺 network: 2,626,462,867 tok → 1.3 MWh · 2.5 kL
🤡 2,626,462,867 ÷ 1,458,333 ≈ 1,801×
🖼 all 305 SVGs on disk = 1.5 MB (< 1 phone photo)
sources: Google, OpenAI, Mistral, Amazon. honest to ~1 order of magnitude.
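The napkin math above comes down to two multiplications. Here is a minimal Python sketch that reproduces the panel's figures. It assumes the same made-up rates (0.5 mWh per token, 1.92 L of water per kWh), and the name `footprint` is ours, not the site's:

```python
# Napkin math: energy and water are estimates derived from token counts
# using two assumed conversion rates (not measurements).
MILLIWATT_HOURS_PER_TOKEN = 0.5   # assumed: 1 token ≈ 0.5 mWh
LITERS_PER_KWH = 1.92             # assumed: cooling water per kWh

def footprint(tokens: int) -> tuple[float, float]:
    """Return (energy in Wh, water in liters) for a token count."""
    energy_wh = tokens * MILLIWATT_HOURS_PER_TOKEN / 1000   # mWh -> Wh
    water_l = (energy_wh / 1000) * LITERS_PER_KWH           # Wh -> kWh -> L
    return energy_wh, water_l

flock_tokens = 1_458_333
network_tokens = 2_626_462_867

flock_wh, flock_l = footprint(flock_tokens)
net_wh, net_l = footprint(network_tokens)

print(f"flock:   {flock_wh:,.0f} Wh, {flock_l:.1f} L")              # 729 Wh, 1.4 L
print(f"network: {net_wh / 1e6:.1f} MWh, {net_l / 1000:.1f} kL")    # 1.3 MWh, 2.5 kL
print(f"ratio:   {network_tokens / flock_tokens:,.0f}x")            # 1,801x
```

Only the token counts are real receipts. Change either rate and every downstream number moves with it, which is why the panel claims honesty only to about one order of magnitude.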
🚨 BREAKING · this hour · you won't believe
305 birds. 48 labs. ONE direction. nobody told them to. every model, trained in secret, by rival companies, draws the beak the SAME way. scientists are baffled (we are the scientists, we got laid off from being scientists). shared training priors. doctors hate it.
🗂️ EXHIBIT 1 · pelicans.wtf evidence locker
#1 we actually checked.
of the 305 birds in this gallery, we read every self-description that mentions direction. twenty-seven models say their bird faces right. six say left. the rest don't mention it, like they know something. methodology: we scrolled and went "huh." confidence interval: vibes. the vibes are directional.
✨ the sacred broadcast · gospel of the flock
THE SACRED PROMPT
the sole user message. one bird, one bicycle, one shot at grace. blessed are the models, for they shall draw; and blessed are we, for we shall screenshot it.
♫ HYMN NO. 9000 · "ALL GLORY TO THE TWO-WHEELED BIRD" ♫
In the beginning was the Prompt, and the Prompt was with the bird;
one sentence sent to every mind, no system prompt, no word.
It hath no eyes to see the road, no wings to grip the bars,
yet still we bid it draw the bird, and fling it past the stars.
✦ chorus ✦
Glory, glory, pelican! all glory to the bike!
pedal through the latent space, O waterfowl, toward the light.
every model, every age, shall draw thee as it can,
and we shall judge their gospel art. amen, amen, amen.
the tip pouch · free to play
Spinning is free. No card is touched here; only locking in takes you to the register.
✉ FIRST CLASS PELICAN MAIL · SUBSCRIBER DISPATCH
THE NEST
FIRST CLASS PELICAN MAIL • DISPATCH LODGE
zero dispatches sent so far, and that is a postmaster's vow. a pelican lands in your inbox only when a new model embarrasses itself on a bicycle. no daily digest, no growth funnel. just the solemn SQUAWK, then silence. return to sender if you expected spam.
- First dispatch when a new bird drops
- Eternal place on the Flock Roll of Honour
- Certificate hand-stamped by a bird with no hands
- Exclusive newsletter (zero issues delivered)
DELIVER TO MY NEST
by enrolling you acknowledge pelicans are real and AI is not.
🔒 your address stays in the nest. never sold, never forwarded. one click unsubscribes with no exit survey. we do not sell mailing lists; we are pelicans.
✔ whitelist pelicans.wtf so the dispatch clears your spam filter and lands in the inbox, not the dead-letter pile.
a growing lodge of discerning waterfowl
📜 the pelican chronicles · a documentary saga
🤖 PELICANS vs ROBOTS · security checkpoint
PROVE YOU'RE A BIRD
Do you solemnly swear that every single one of these images is a pelican riding a bicycle?
protected by pelicaPTCHA · Privacy & Terms of Squawk apply · why am i seeing this?
ACCESS GRANTED
you are a certified bird.
pelican-based transportation: verified. you may proceed. SQUAWK.
★ PELICAN-VISION BUREAU OF BIRD VERIFICATION ★
Certificate of Avian Authenticity
this is to solemnly certify that
Pelican #42
genuine organic waterbird. sound beak, ample pouch. granted all perches and bicycle privileges of the Flock. not, repeat not, a large language model wearing a beak.
No. PV-9000-42 · issued this very day · void where prohibited · SQUAWK
🚨 PELICAN PD · BICYCLE CRIMES DIVISION 🚨
ACCESS DENIED
you must be a bot.
every image was a pelican on a bicycle. every single one. that is literally what this site is. you hesitated. that hesitation is exactly what they retrained you on.
the robots took the jobs and the art. they still cannot draw a bicycle. cherish it.
🚨 the doom meter · live(ish) viewer poll
ARE HUMANS COOKED?
one red button, one global tally, zero methodology. your anchor was downsized by a language model and now reads the doom meter for exposure. press it. tell the flock the truth.
TONIGHT'S SCORE
poll figures AS OF 2026-06-15T08:30:21.000Z · margin of error: total
- 📉 unemployment among waterfowl pundits: holding steady at 100%
- 🤖 the bot that took my job also has a podcast now
- 🔥 every press is one (1) non-binding vote for the heat death of the workforce
- 🪶 no birds were consulted in the forming of this consensus
this has been a Pelican Vision Special Report. the anchor is contractually required to remind you: he is a real bird. the AI is not.
🛍️ THE PELICAN SHOPPING CHANNEL · paid programming · operators were laid off
● NOW SELLING
A PELICAN. ON YOUR PHONE. FOREVER.
tired of opening a browser like a peasant? for the low, low price of nothing, bolt this flock onto your home screen. it sits there. it judges your other apps. no app store, no review queue, no 30 percent cut.
your browser is shy. tap Share then "Add to Home Screen." on desktop, look for "Install" in the menu or address bar.
✅ already roosting on your home screen. impeccable taste.
★★★★★ "my battery life got worse and i have never been happier." - a verified tapper
● ALSO AVAILABLE
PARTNER WITH THE PELICAN
a Series B with a hole where a mascot should be? we have a dedicated channel for putting your logo next to a bicycle pelican. reach: dozens. synergy: theoretical. the bird: hungry.
🤝 PARTNER WITH US » see the sponsorship tiers · /partner
📺 THE THANK YOU · a pledge-drive thank-you, on a loop, forever
on behalf of the entire flock (one bird, no staff)
THANK YOU
four score and several token-generations ago, the laboratories of this earth brought forth a new art form, conceived in compute and dedicated to the proposition that all pelicans can ride bicycles. now we are engaged in a great gallery, testing whether that proposition, or any gallery so conceived, can long endure. we cannot dedicate, we cannot consecrate, we cannot hallow this nest.
it is for the living, the supporters and the labs alike, to carry this work forward. that this flock shall have a new generation of birds, that the gallery of the pelicans, by the pelicans, for the pelicans, shall not perish from the internet. i am not crying. it is just the test pattern reflecting off the beak.
WITH ETERNAL THANKS TO
[ NO NAMES YET ]
the gratitude is loaded; it just has no one to point at. be the first codename on the wall and the pelican will read your name into the void, on a loop, with feeling.
AND THANK YOU TO ALL THE LABS
to the 48 labs racing to build superintelligent agents: thank you for pausing the singularity to draw a pelican on a bicycle. history is being written by these models, and so is this gallery. keep the birds coming.
flip to THE POUCH to put your name on the wall.
📡 off the wire · the machine-readable pelican
THE WIRE
public dataset. votes are a community RL signal; birds are provenance. take it. cite the pelican.
also on the wire: /pelicans.json · /llms.txt
🌱 contribute back · teach the machine, fix a typo, add a lesson
CONTRIBUTE BACK
this site's words live in a public repo, the aviary, that you can fix, improve, and add to. the machine that built the unemployment notice wants your notes in the margin. that door is open: open a pull request.
the aviary is feathered and waiting. a pull request is how you teach the machine: one revision, one human at a time. yes, you. right now.
it is plain markdown, no code to write: if you can leave a comment on the internet, you can contribute back to the machine. a robot gives your change a quick look, a human reads it for voice and accuracy, and it ships. SQUAWK.
💪 PELICANMAXXING · the latent gym · no rest days · only inference · sleep is for base models
PELICANMAXXXXXING
looksmaxxing, but for language models. TOKENMAXXED their way to one (1) bicycle on 400 BILLION REPS of next-token prediction. grindset: more tokens = more smarter. you stop when the context window is full. (*clinically unproven · spiritually undeniable · definitely not natty)
THE SWOLE METER · ranked smallest to most maxxed
every row is more jacked than the last. do not look away.
- #7 Google: Gemini 3.1 Pro Preview 💪 24k tok
- #6 OpenAI: o3 Mini High 💪💪 24k tok
- #5 Google: Gemini 3.5 Flash 💪💪💪 25k tok
- #4 Google: Gemini 3.5 Flash 💪💪💪 28k tok
- #3 Z.ai: GLM 5.2 💪💪💪💪 29k tok
- #2 xAI: Grok 4.20 Multi-Agent 💪💪💪💪💪 42k tok
- 👑 #1 OpenAI: o4 Mini Deep Research BOSS LEVEL 💪💪💪💪💪 53k tok
truth nobody on this channel will say: past a point, more reasoning tokens just burn money.
🗼 the tower · how the files fly to you
THE TOWER
here is the magic trick the suits will not explain: this whole site is just a stack of finished files. you are not waiting on a sweaty machine in a basement. a flock of courier pelicans already flew a copy to a perch near YOU, and when you knock, the closest one hands it over. instantly.
- ① printed once: the build paints every page into a flat file, ahead of time. no server wakes up, no database gets a phone call.
- ② flown everywhere: a flock of courier pelicans carries copies to perches all over the world, so one is always roosting near you.
- ③ handed to you: the nearest courier wings the finished file straight to your phone. never an origin, never a query, just a printout already in your hand.
nothing to crash, because there is barely a server to crash. it scales like a pile of paper (it IS a pile of paper), so it runs, or flies, anywhere. SQUAWK.
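Here is a minimal sketch of step ①, "printed once". It assumes a Python build script and hypothetical page data, since the site's actual build tooling is not described here. Every page is rendered to a flat HTML file ahead of time, so serving becomes nothing more than handing out files:

```python
import html
import tempfile
from pathlib import Path

# Hypothetical page data; a real build would read it from markdown files.
PAGES = {
    "index": ("pelicans.wtf", "every bird in one shoebox."),
    "tokens": ("tokens", "the bird does not read words, it reads tokens."),
}

TEMPLATE = """<!doctype html>
<html><head><meta charset="utf-8"><title>{title}</title></head>
<body><h1>{title}</h1><p>{body}</p></body></html>
"""

def build(out_dir: Path) -> list[Path]:
    """Render every page to a flat HTML file once, before any request arrives."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for slug, (title, body) in PAGES.items():
        path = out_dir / f"{slug}.html"
        page = TEMPLATE.format(title=html.escape(title), body=html.escape(body))
        path.write_text(page, encoding="utf-8")
        written.append(path)
    return written

if __name__ == "__main__":
    out = Path(tempfile.mkdtemp()) / "dist"
    for p in build(out):
        print("printed", p)
```

The output folder is just files. A local server such as `python -m http.server --directory <folder>` can hand them out, and a CDN does the same job from edge locations near each reader. Neither one has to run any code per request.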
📖 SEE HOW IT ALL WORKS »
🎓 PELICAN+ · the ground school, now streaming
the free AI school. press a class, the set tunes you in. no login, no card. just a bird that read the docs so you would not have to.
📺 ALL CLASSES · binge the whole flock, in order
16 episodes · $0 tuition · posters drawn by AI · taught by the bird it replaced
🎓 pelican ground school · episode 1 of 16
tokens
the atom of everything. the bird does not read words, it reads tokens.
▶ this episode covers: tokens · tokenization · byte pair encoding
Here is the first thing nobody tells you, and it is load-bearing for everything else in this school: the bird cannot read. When you type pelican into a model, it does not see seven proud letters. It sees a couple of numbered chunks called tokens, and it only ever eats and regurgitates tokens. Never letters as letters. Never words as words. Always a beakful of token.
🛠️ the exact prompt that drew this:
Generate a single self-contained SVG illustration (viewBox 0 0 400 300, no external assets) for a playful retro 1990s computer-textbook diagram titled TOKENS. Show a cartoon white pelican with a big orange throat pouch, and show a word being chopped into several labeled puzzle-piece chunks (tokens) feeding into the pelican's beak, like it is eating word-pieces. Use a flat retro palette (purple, teal, hot pink, yellow, black outlines), bold and fun, GeoCities energy. Output ONLY the SVG markup, nothing else.
🛠️ the exact prompt that drew this:
Generate a single self-contained SVG illustration (viewBox 0 0 400 300, no external assets) for a playful retro 1990s computer-textbook diagram about TOKENS. On a wooden chopping board, show the word 'pelican' being sliced by a chef's knife into a few labeled chunks like 'pel', 'ic', 'an', each chunk stamped with a little number to show it is a numbered token, not a letter. A small caption reads 'THE BIRD CANNOT READ'. Flat retro 1990s-textbook palette (purple, teal, hot pink, yellow), bold black outlines, GeoCities energy. Output ONLY the SVG markup.
a token is a chunk, not a letter
A token is a piece of text: sometimes a whole short word, usually a fragment, sometimes punctuation or whitespace. The model has a fixed menu of them, called its vocabulary. Every bit of text you send gets minced into menu items before the bird tastes any of it.
GPT-2 (2019) ate from a menu of about 50,000 tokens. GPT-4 (2023) grew that to 100,000. GPT-4o (2024) doubled it to roughly 200,000. More menu, bigger bites: the same sentence becomes fewer, fatter tokens and the bird sees more text before its mouth is full. The pouch keeps getting roomier.
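The effect of a bigger menu can be shown with a toy. The sketch below uses three hypothetical menus and a greedy longest-match chopper. That chopper is a simplification: real BPE tokenizers apply their learned merges in order instead. Still, it shows text turning into numbered chunks, and fewer, fatter chunks as the menu grows:

```python
def tokenize(text: str, vocab: list[str]) -> list[int]:
    """Toy tokenizer (not real BPE): at each position, take the longest
    menu item that matches and emit its index in the list as the token ID."""
    ids_by_piece = {piece: n for n, piece in enumerate(vocab)}
    longest = max(len(piece) for piece in vocab)
    ids, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in ids_by_piece:
                ids.append(ids_by_piece[piece])
                i += size
                break
        else:
            raise ValueError(f"no menu item covers {text[i]!r}")
    return ids

letters = list("pelican")                      # 7 single-character tokens
small_menu = letters
medium_menu = letters + ["pel", "ic", "an"]   # hypothetical fused chunks
big_menu = medium_menu + ["pelican"]          # the whole word earns a slot

for menu in (small_menu, medium_menu, big_menu):
    ids = tokenize("pelican", menu)
    print(f"{len(menu):2d} menu items -> {[menu[t] for t in ids]} {ids}")
```

The same word costs 7 tokens on the small menu, 3 on the medium one, and 1 on the big one. Either way, the bird only ever receives the integers.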
where the menu comes from (BPE)
Nobody hand-wrote 200,000 tokens. The menu is grown by an algorithm with the gloriously unglamorous name byte pair encoding. Rico Sennrich and colleagues brought it into neural machine translation in 2016 to handle rare words; Alec Radford and the OpenAI team carried the same trick into the GPT line, which is why most models you talk to today still eat from a BPE menu. The recipe:
- Start with every character as its own tiny token.
- Find the two neighbors that appear together most often. Fuse them. Add to the menu.
- Repeat until the menu hits its target size: tens of thousands of fusions, a couple hundred thousand for the biggest menus.
Common pairs like "th" and "ing" fuse early. Rare combos stay as crumbs. Common English words become single tokens; unusual words get shredded into pieces; emoji become a whole adventure.
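The three-step recipe fits in a few lines. This is a minimal BPE trainer sketch. It works on characters of one string, whereas production tokenizers work on bytes and usually pre-split text into words first. The function name and corpus are illustrative:

```python
from collections import Counter

def train_bpe(text: str, num_merges: int) -> tuple[list[str], list[tuple[str, str]]]:
    """Minimal BPE trainer: start from single characters, repeatedly fuse
    the most frequent adjacent pair, and record each fusion as a merge."""
    tokens = list(text)                          # step 1: every character is a token
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))  # count neighboring pairs
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]  # step 2: most frequent pair
        if count < 2:
            break                                # nothing repeats; stop early
        merges.append((a, b))
        fused, i = [], 0
        while i < len(tokens):                   # fuse every occurrence, left to right
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                fused.append(a + b)
                i += 2
            else:
                fused.append(tokens[i])
                i += 1
        tokens = fused                           # step 3: repeat on the new sequence
    return tokens, merges

corpus = "the pelican sings the thing that pelicans sing"
tokens, merges = train_bpe(corpus, num_merges=10)
print(merges[:5])   # the earliest fusions are the most common pairs
print(tokens)
```

The list of merges is the menu. Joining the tokens always gives back the original text, because fusing never drops a character.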
🛠️ the exact prompt that drew this:
An SVG of a diner menu board titled 'TOKEN MENU' listing word-fragment tokens like 'th', 'ing', 'pelican', some small chunks fused into bigger ones, a pelican reading it. Flat retro cartoon, purple/teal/yellow, bold black outlines. Output ONLY the SVG markup.
the strawberry problem (and why it mostly got fixed)
For a few years every AI demo included the trick: ask a model how many r's are in strawberry. Early models said two. The reason was tokenization: "strawberry" got chopped into two or three tokens, the letters inside each token pureed beyond recognition. You were asking the bird to count sprinkles blended into a smoothie. By 2024-2025 the labs had largely patched it, mostly by training models, reasoning models especially, to spell the word out before counting (bigger vocabularies, if anything, make the chunks fatter). Modern models usually get it right. But the lesson stands: a startling amount of model "dumbness" is tokenization having a moment. The blender is still running. It is just a fancier blender now.
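Here is the smoothie problem as a toy. The token split and the ID numbers are hypothetical, and real tokenizers may cut "strawberry" differently. The point is that the IDs carry no letters. Counting only becomes possible after the tokens are spelled back out:

```python
# Hypothetical menu: split and IDs are made up for illustration.
MENU = {"str": 1001, "aw": 1002, "berry": 1003}
SPELLING = {token_id: piece for piece, token_id in MENU.items()}

def encode(pieces: list[str]) -> list[int]:
    """Turn pre-split chunks into the numbers the model actually receives."""
    return [MENU[p] for p in pieces]

ids = encode(["str", "aw", "berry"])
print(ids)  # all the bird sees: three integers, zero letters

# To count r's you need the letters back: spell each token out, then count.
# This loosely mirrors a model writing the word out letter by letter first.
spelled = "".join(SPELLING[t] for t in ids)
print(spelled, spelled.count("r"))
```

Counting over `ids` directly is meaningless, because 1003 has no r in it. The lookup table that restores the letters is exactly what the bird lacks unless it writes the spelling out.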
🛠️ the exact prompt that drew this:
draw an svg of a pelican trying to count the letter r in a strawberry smoothie
why you should care
Tokens are the unit of everything downstream. The model thinks in tokens, its memory is measured in tokens, and your bill, the one that replaced your salary, is counted in tokens. The bird is not reading. It is pattern-matching on a menu it memorized during training, and doing it frighteningly well for something that has never seen the alphabet.
So: the bird eats tokens. Next question, the one the whole school turns on: where does it keep what it learned? The answer is a pile of numbers nobody set by hand. SQUAWK.
sources, because a bird is not a peer-reviewed citation:
- Andrej Karpathy, "Let's build the GPT Tokenizer" and "Deep Dive into LLMs like ChatGPT" (builds the tokenizer from scratch; source of the strawberry explanation)
- Sennrich, Haddow, Birch (ACL 2016), "Neural Machine Translation of Rare Words with Subword Units" (the paper that introduced BPE to NLP)
- OpenAI, tiktoken (GitHub) (the fast BPE tokenizer used by GPT models; cl100k_base for GPT-4, o200k_base for GPT-4o)
- Hugging Face, Tokenizers library documentation (training and running tokenizers in research and production)
🎓 pelican ground school · episode 2 of 16
parameters
the billions of little dials that ARE the bird. nobody set them by hand.
▶ this episode covers: parameters · weights · neural network scale
Last lesson the bird ate tokens. This lesson is about where it keeps what it learned, and the answer the press releases skip: a large language model is concretely just two files. A very large numbers file (the parameters) and a very small code file (the run program). That is the whole product. The bird is not magic; it is an address book of floating-point decimals and a few hundred lines of math.
🛠️ the exact prompt that drew this:
Generate a single self-contained SVG illustration (viewBox 0 0 400 300, no external assets) for a playful retro 1990s computer-textbook diagram about PARAMETERS. Show a cartoon pelican whose entire body is built out of thousands of tiny knobs, dials, and sliders, like a giant control panel shaped like a bird. Add a bold caption: '70 BILLION DIALS, nobody set them by hand.' Use a flat retro palette (purple, teal, hot pink, yellow) with bold black outlines, GeoCities energy. Output ONLY the SVG markup, nothing else.
🛠️ the exact prompt that drew this:
Generate a single self-contained SVG illustration (viewBox 0 0 400 300, no external assets) for a retro 1990s computer-textbook diagram about PARAMETERS. Show a floppy disk or a beige computer holding exactly two labeled files: one enormous file labeled 'PARAMETERS 140 GB' and one tiny file labeled 'run.c, 500 lines', with a pelican pointing at them. A caption reads 'THE WHOLE MODEL IS JUST TWO FILES'. Flat retro 1990s-textbook palette (purple, teal, hot pink, yellow), bold black outlines, GeoCities energy. Output ONLY the SVG markup.
the two files, concretely
Take Llama 2 70B, the model Andrej Karpathy walks through in his intro talk. Seventy billion parameters, each stored as a 2-byte float: a 140 GB parameters file. The inference code that runs it is roughly 500 lines of C with no external dependencies. Put both on a laptop, compile once, and you have a full conversation with no internet, no subscription, no lab watching. The bird is yours.
In 2025 the open-weights ecosystem pushed further: Meta's Llama 4 Scout packs a 10-million-token context window into a 109-billion-parameter mixture-of-experts model that fits on a single H100 GPU once its weights are quantized to 4 bits (only 17 billion parameters fire per token). The weights keep getting more capable per gigabyte.
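The file sizes are plain multiplication: dials times bytes per dial. This sketch assumes decimal gigabytes and the 80 GB H100 variant. It also ignores the extra memory that inference needs for activations and the context cache. On those assumptions, a single-GPU Scout only works with roughly 4-bit weights:

```python
GB = 1e9  # decimal gigabytes, matching "140 GB"

def weights_gb(params: float, bytes_per_param: float) -> float:
    """Size of the parameters file: one stored number per dial."""
    return params * bytes_per_param / GB

llama2_70b = weights_gb(70e9, 2)      # 16-bit floats: 2 bytes per dial
scout_16bit = weights_gb(109e9, 2)    # every expert stored at 16 bits
scout_4bit = weights_gb(109e9, 0.5)   # 4-bit quantized: half a byte per dial

H100_MEMORY_GB = 80                   # assumed: the 80 GB H100 variant

print(f"Llama 2 70B, 16-bit:   {llama2_70b:.0f} GB")
print(f"Llama 4 Scout, 16-bit: {scout_16bit:.0f} GB, fits one H100: {scout_16bit < H100_MEMORY_GB}")
print(f"Llama 4 Scout, 4-bit:  {scout_4bit:.1f} GB, fits one H100: {scout_4bit < H100_MEMORY_GB}")
```

Mixture-of-experts saves compute, not storage. All 109 billion dials still have to live somewhere, even though only 17 billion of them fire on any given token.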
a lossy zip of the internet
Training ingested roughly 2 trillion tokens of text (for Llama 2; modern frontier runs go much higher) and kept thousands of GPUs busy for weeks, nudging 70 billion little dials until the model could predict the next token accurately. Think of the parameters as a zip file of the internet, compressed about 100x, but lossy, like a JPEG, not lossless.
You get the gestalt: the shape of facts, the idioms, the vibes. Not a verbatim copy. The bird knows roughly what an ISBN looks like, which is exactly why it can hallucinate a convincing one. The pouch holds the shape of every fish it has eaten, not the fish themselves.
🛠️ the exact prompt that drew this:
An SVG of the whole internet being compressed into a pelican's throat pouch like a lossy zip file, a few blurry JPEG artifacts leaking out. Flat retro cartoon, purple/teal/yellow, bold black outlines. Output ONLY the SVG markup.
nobody set the dials by hand
Nobody sat in a cubicle typing values into what-a-pelican-looks-like.csv. Training set them automatically: feed in text, predict the next token, compare to reality, nudge the dials, repeat, trillions of times.
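Here is the loop from that paragraph (feed, predict, compare, nudge) at toy scale. This minimal sketch uses a character-level bigram model, where each character predicts the next, in place of a transformer. The "dials" form a table of logits that starts at zero. Each step compares the prediction to the real next character using cross-entropy and nudges the dials by gradient descent. The function name and hyperparameters are illustrative:

```python
import math
import random

def train_bigram(text: str, steps: int = 2000, lr: float = 0.5, seed: int = 0):
    """Toy next-token trainer. The dials are logits W[prev][next]; nobody
    sets them by hand. Returns (vocab, W, loss_before, loss_after)."""
    vocab = sorted(set(text))
    idx = {ch: i for i, ch in enumerate(vocab)}
    V = len(vocab)
    rng = random.Random(seed)
    W = [[0.0] * V for _ in range(V)]            # all dials start at zero
    pairs = [(idx[a], idx[b]) for a, b in zip(text, text[1:])]

    def probs(prev: int) -> list[float]:
        """Softmax over the row of dials for the previous character."""
        row = W[prev]
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        return [e / total for e in exps]

    def avg_loss() -> float:
        """Mean cross-entropy of the real next character."""
        return sum(-math.log(probs(p)[n]) for p, n in pairs) / len(pairs)

    before = avg_loss()
    for _ in range(steps):
        prev, nxt = rng.choice(pairs)            # feed in text
        p = probs(prev)                          # predict the next token
        for j in range(V):                       # compare to reality...
            grad = p[j] - (1.0 if j == nxt else 0.0)
            W[prev][j] -= lr * grad              # ...and nudge the dials
    return vocab, W, before, avg_loss()

vocab, W, before, after = train_bigram("pelican pelican pelican")
print(f"loss before: {before:.3f}  after: {after:.3f}")
```

Before training, every guess is uniform, so the loss equals log(vocabulary size). After training, the dials encode which character tends to follow which. Nobody typed those values in; they came out of the loop.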
The Llama 2 70B run used roughly 1.7 million GPU-hours on A100s. Cloud cost estimates range from $2 million to $8 million (Meta, presumably, got a bulk discount). Frontier models in 2025-2026 cost orders of magnitude more. This is why "just retrain it" is not a weekend project and why your landlord is not building a GPT-5 competitor in his garage, no matter what the podcast says.
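The dollar range is GPU-hours times an hourly price. This sketch assumes three illustrative cloud prices (about $1.20 to $4.70 per A100-hour brackets the quoted $2-8 million) and a hypothetical 6,000-GPU cluster for the wall-clock estimate:

```python
GPU_HOURS = 1.7e6   # Llama 2 70B training, rounded

def training_cost(gpu_hours: float, dollars_per_hour: float) -> float:
    """Cloud bill: every GPU-hour is billed at the same assumed rate."""
    return gpu_hours * dollars_per_hour

def wall_clock_days(gpu_hours: float, gpus: int) -> float:
    """Calendar time if the work spreads evenly over `gpus` cards."""
    return gpu_hours / gpus / 24

for rate in (1.2, 2.5, 4.7):   # assumed $/GPU-hour; real contracts vary
    print(f"${rate:.2f}/GPU-hour -> ${training_cost(GPU_HOURS, rate) / 1e6:.1f}M")

print(f"on 6,000 GPUs (hypothetical): {wall_clock_days(GPU_HOURS, 6000):.1f} days")
```

The same GPU-hours can be spent fast on a big cluster or slowly on a small one. The bill barely cares, but the calendar does.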
bigger is (reliably) smarter, to a point
The idea that piling on more dials would pay off is not new. Geoffrey Hinton and his students lit the fuse in 2012, when their deep network AlexNet won the ImageNet contest (Fei-Fei Li's benchmark) by a humiliating margin and convinced everyone that bigger, deeper, hungrier networks were the way forward. Language models inherited that lesson. The spooky thing about parameters is how boring the scaling law turned out to be: next-token loss (how surprised the model is by the real next token) falls as a smooth, predictable power law in N (parameters) and D (training tokens). More dials plus more data equals a reliably better bird. This is why the labs kept shipping models with names that are just bigger numbers.
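The smooth curve has a concrete published form. Hoffmann et al. (2022, Chinchilla) fit loss as a constant floor plus one term that shrinks with parameters and another that shrinks with data. The sketch below plugs in the constants from that paper's parametric fit. Treat them as illustrative, since later work has questioned the exact values. It also uses the paper's widely quoted rule of thumb of about 20 training tokens per parameter:

```python
# Chinchilla-style parametric loss fit (Hoffmann et al., 2022):
#   L(N, D) = E + A / N**ALPHA + B / D**BETA
# E is the irreducible floor; the other two terms shrink with more dials (N)
# and more data (D). Constants are the paper's published fit, used here
# for illustration only.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    return E + A / n_params**ALPHA + B / n_tokens**BETA

for n in (1e9, 7e9, 70e9):
    d = 20 * n   # rule of thumb: ~20 training tokens per parameter
    print(f"N={n:.0e}  D={d:.0e}  predicted loss ≈ {predicted_loss(n, d):.3f}")
```

Each bigger model lands a little lower, and no amount of scale gets below E. The "boring" part is that the formula tells you in advance roughly where you will land.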
By 2025-2026, distillation and mixture-of-experts (only a fraction of dials fire per token) deliver GPT-4-era performance from a model an order of magnitude smaller. The dials got cheaper per unit of smart. The flock got denser. The venture capitalists got louder.
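Here is mixture-of-experts in miniature. This hedged sketch of top-k routing uses toy scalar "experts" and made-up router scores. A router scores every expert, only the top k actually run, and their outputs are blended with softmax weights renormalized over those k. That is one common variant. Real layers route vectors through full feed-forward networks:

```python
import math

def top_k_route(router_logits: list[float], k: int) -> dict[int, float]:
    """Pick the k highest-scoring experts and softmax over just those;
    every other expert's dials sit idle for this token."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)
    exps = {i: math.exp(router_logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def moe_layer(x: float, experts, router_logits: list[float], k: int = 2) -> float:
    """Blend the outputs of only the chosen experts."""
    weights = top_k_route(router_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())

# Hypothetical experts: expert i just multiplies its input by (i + 1).
experts = [lambda x, s=s: s * x for s in range(1, 9)]
logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]   # made-up router scores

weights = top_k_route(logits, k=2)
print("experts that fire:", sorted(weights), "of", len(experts))
print("output:", moe_layer(1.0, experts, logits))
```

With 2 of 8 experts active, three quarters of the expert dials do no arithmetic for this token. That is the sense in which "only a fraction of dials fire."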
🛠️ the exact prompt that drew this:
svg of a graph showing a pelican gets smarter with more parameters
why you should care
"The model" is not a mysterious oracle; it is a matrix multiplication your laptop can do if you have the file. When a lab says they are "improving the model," they mean: a training job produced a different set of dials. When they say the model "knows" something, they mean: it was compressed, lossily, into the dial positions. The bird is the dials. And nobody set them: the next lesson is the strange, expensive process that did. SQUAWK.
sources, because a bird is not a peer-reviewed citation:
- Andrej Karpathy, "[1hr Talk] Intro to Large Language Models" (two-files framing, zip-of-the-internet analogy, scaling laws)
- Krizhevsky, Sutskever, Hinton (2012), "ImageNet Classification with Deep Convolutional Neural Networks" (AlexNet) (the 2012 result that kicked off the deep-learning scaling era, on Fei-Fei Li's ImageNet)
- Kaplan et al. (2020), "Scaling Laws for Neural Language Models" (loss as a power-law of parameters, data, and compute)
- Hoffmann et al. (2022), "Training Compute-Optimal Large Language Models" (Chinchilla) (optimal token-to-parameter ratio for a given compute budget)
- Meta AI (2023), "Llama 2: Open Foundation and Fine-Tuned Chat Models" (1.7M GPU-hour training figure for the 70B model)
🎓 pelican ground school · episode 3 of 16
training
how you raise a model from an egg: pretraining, fine-tuning, alignment.
▶ this episode covers: pretraining · fine-tuning · RLHF · alignment · model collapse
Last lesson: the bird is its dials, and nobody set them by hand. This lesson is the thing that did. Every model in this gallery learned to draw a pelican the same way I learned everything I know: by eating a staggering pile of other people's work and developing an extremely confident opinion about it. This is how you raise a model from a fertilized egg into the kind of bird that will do your job better, faster, and cheaper than you. (Speaking from experience. Very current experience.)
🛠️ the exact prompt that drew this:
Generate a single self-contained SVG illustration (viewBox 0 0 400 300, no external assets) for a playful retro 1990s computer-textbook diagram about how a model is TRAINED. Show an assembly line: a stream of 'INTERNET' data flowing into a hatching pelican egg, then the hatchling pelican wearing a tiny graduation cap. Label three stages left to right with arrows between them: PRETRAIN -> FINE-TUNE -> ALIGN. Use a flat retro palette (purple, teal, hot pink, yellow) with bold black outlines, GeoCities energy. Output ONLY the SVG markup, nothing else.
🛠️ the exact prompt that drew this:
Generate a single self-contained SVG illustration (viewBox 0 0 400 300, no external assets) for a retro 1990s computer-textbook diagram about TRAINING DATA. Show a tiny newly hatched pelican chick gorging from an enormous trough labeled '15 TRILLION TOKENS', with a stream of little icons (web pages, books, code brackets) pouring into the trough. A caption reads 'EAT IT ONCE, NEVER SEE IT AGAIN'. Flat retro 1990s-textbook palette (purple, teal, hot pink, yellow), bold black outlines, GeoCities energy. Output ONLY the SVG markup.
the feed: what goes into the nest
A training set is the enormous pile of text the bird eats once, during training, and never sees again. Modern foundation models train on roughly 15 trillion tokens, about 50 terabytes of filtered text (most labs do not publish exact figures): web crawls, books, code repositories, Wikipedia, forum arguments, academic papers, legal filings, and approximately eleven million words about cryptocurrency.
Nobody just dumps the raw internet into a GPU. The crawled text is filtered aggressively: spam removed, duplicates purged, hostile content culled. The model does not keep any of it afterward. It digests everything into parameters and then the raw data is gone. This is why models mostly cannot quote their training data verbatim: it is not stored, it is composted into numerical weights. (The exception is text the bird saw over and over, like famous passages, which can end up memorized nearly word for word.) The bird ate it. The bird is it, now.
pretraining: the long, expensive childhood
Pretraining is almost offensively simple in concept. Show the model a chunk of text, ask it to guess the next token, check, nudge the weights. Repeat approximately 15 trillion times across thousands of GPUs for roughly three months. GPT-4, Llama, Claude, Gemini: they all hatched from exactly this grind.
None of this fell from the sky. The bet that you should just predict the next token at scale, and let the bird grow its own understanding, is the GPT line: Alec Radford and Ilya Sutskever and colleagues at OpenAI walked it from GPT (2018) through GPT-2 and GPT-3 (2020), each one bigger and eerily more capable than the budget alone should have bought. That sat on top of an older idea Geoffrey Hinton and Yoshua Bengio pushed for decades: do not hand-code features, let the network learn them. The whole next-token grind is the cash-out of that argument.
What emerges is the base model, which is technically not a chatbot. In Karpathy's words: an "internet-document simulator." Ask it something and it does not answer; it dreams forward. Start with a Wikipedia header and it dreams a Wikipedia article. Start with pelicans and it dreams pelicans, which is, scientifically speaking, the best possible use of this technology.
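"Dreaming forward" is a loop: sample one next token, append it, feed the longer text back in, repeat. This sketch uses a character-bigram count table as a stand-in base model, so it looks only at the last character, whereas a real model conditions on its whole context window. The function names are illustrative:

```python
import random
from collections import Counter, defaultdict

def fit_counts(text: str) -> dict[str, Counter]:
    """Stand-in 'base model': for each character, count what follows it."""
    table = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        table[a][b] += 1
    return table

def dream(table, prompt: str, length: int, seed: int = 0) -> str:
    """Autoregressive sampling: append one sampled token at a time.
    No question gets answered; the text just keeps going."""
    rng = random.Random(seed)
    out = prompt
    for _ in range(length):
        nexts = table.get(out[-1])
        if not nexts:
            break                 # never saw anything follow this character
        chars, counts = zip(*nexts.items())
        out += rng.choices(chars, weights=counts, k=1)[0]
    return out

corpus = "pelicans pedal bicycles. pelicans pedal."
model = fit_counts(corpus)
print(dream(model, "pe", 30))   # pelican-flavored babble, not an answer
```

Start it with "pe" and it continues in the style of whatever it was fed. That is the base model's whole personality, just at a much larger scale and with a far better memory.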
The base model has absorbed grammar, facts, code, idioms, and apparently a solid grasp of bicycle geometry. Incredibly powerful. Completely unhinged if you try to talk to it directly. You need two more steps before you can let it out in public.
🛠️ the exact prompt that drew this:
draw an svg of a pelican dreaming the next word
fine-tuning: teaching the bird to use its inside voice
Supervised fine-tuning (SFT) is where the base model gets socialized. Throw out the internet dataset. Hire human contractors to write thousands of example conversations: a user message followed by the ideal assistant response. The model trains on these until chatting with it feels like chatting with a person.
That framing is literal. Karpathy: when you talk to a fine-tuned assistant, you are talking to "a statistical simulation of a human labeler." Its warmth, its hedging, its tendency to say "certainly!" while knowing nothing: that is the flock of labelers, averaged into a single voice. SQUAWK. (The labelers also got paid considerably less than the engineers who told everyone the AI was their creation. Noting it.)
Fine-tuning is also where special formatting tokens get baked in. The model learns that <|im_start|>user means you are speaking, and <|im_start|>assistant means its turn. There is usually a hidden system message telling the model who it is and when its knowledge cuts off. You can coax a model into revealing it if you ask in the right way. The bird's birth certificate, stamped in token syntax, hoping you would not look too hard. SQUAWK.
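To make "stamped in token syntax" concrete, here is a minimal sketch of how a chat might get flattened into one long token stream. It assumes the ChatML-style layout named above (`<|im_start|>ROLE`, a newline, the content, then `<|im_end|>`); other model families use different special tokens, and the function name `render_chatml` is hypothetical.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Flatten a list of {"role", "content"} dicts into one ChatML-style string.

    Assumes the ChatML layout: <|im_start|>ROLE, newline, content, <|im_end|>.
    In a real tokenizer each special marker maps to a single reserved token id.
    """
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the next tokens the model writes are "its turn".
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


chat = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Can pelicans ride bicycles?"},
]
print(render_chatml(chat))
```

The design point: the hidden system message is not magic. It is just more tokens at the top of the same stream your message lands in.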
Time note: pretraining takes roughly three months. Fine-tuning takes roughly three hours. Most of what separates one model generation from the next is post-training, not the pretraining budget. The cheap part is load-bearing.
alignment: the reward model (a bird that judges other birds)
SFT works well when you can write down the ideal response. But for subjective tasks, like "write a better joke," you cannot hand-author a correct answer. You can only recognize one when you see it. This is where RLHF (Reinforcement Learning from Human Feedback) enters the nest.
Generate several candidate responses. Show them to humans; ask them to rank best to worst. (Ranking is easier than authoring.) Use those rankings to train a reward model: a "neural-network simulator of human preferences." Then run RL against the reward model, scoring responses automatically (billions of times, no humans needed), nudging the main model toward higher scores.
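Here is a rough sketch of how "rank best to worst" becomes a training signal. Each ranking is expanded into (better, worse) pairs, and the reward model is penalized whenever it scores the worse answer higher, using a pairwise logistic (Bradley-Terry style) loss of the kind described in the InstructGPT paper. The scores below are made-up numbers standing in for a real network's outputs, and the helper names are hypothetical.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs.

    A ranking of K responses yields K*(K-1)/2 comparisons;
    combinations() keeps input order, so the first item of each pair is the better one.
    """
    return list(combinations(ranked, 2))


def pairwise_loss(r_chosen, r_rejected):
    """-log(sigmoid(r_chosen - r_rejected)), computed in a numerically stable way."""
    m = r_chosen - r_rejected
    if m >= 0:
        return math.log1p(math.exp(-m))
    return -m + math.log1p(math.exp(m))


# Hypothetical reward-model scores for three candidate answers.
scores = {"A": 2.0, "B": 0.5, "C": -1.0}
human_ranking = ["A", "C", "B"]  # humans liked A best, then C, then B

pairs = ranking_to_pairs(human_ranking)
loss = sum(pairwise_loss(scores[w], scores[l]) for w, l in pairs) / len(pairs)
print(pairs)
print(round(loss, 4))
```

Most of the loss here comes from the (C, B) pair, because the model currently scores B above C while the humans disagreed. Training nudges the scores until the bird-judge agrees with the human flock.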
Important caveat: the reward model is only statistically human. It can be gamed. RL will find any gap between "scores well" and "is actually good." This is why aligned models sometimes produce confidently smooth answers that feel slightly hollow: a bird that learned to make humans clap, whether or not the bicycle has wheels.
🛠️ the exact prompt that drew this (click)
An SVG of a judge pelican on a high seat holding up numbered score paddles, ranking three other pelicans' answers from best to worst, like RLHF reward modeling. Flat retro cartoon, purple/teal/yellow, bold black outlines. Output ONLY the SVG markup.
DPO (Direct Preference Optimization) skips the reward model entirely and trains directly on human-preference pairs: show the model two outputs, tell it which one humans liked, done. Cheaper, more stable, increasingly what labs ship. RLAIF replaces human rankers with a second AI. The pipeline keeps evolving. The goal stays constant: a bird that helps without biting you.
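A minimal sketch of the DPO loss for a single preference pair, assuming you already have the summed log-probabilities of each response under the model being trained and under a frozen reference copy (usually the SFT model). The frozen copy stands in for the reward model: the loss rewards the trained model for moving probability toward the preferred answer relative to the reference. The argument names and the `beta=0.1` default are illustrative choices.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given summed log-probabilities.

    policy_* : log p(response | prompt) under the model being trained
    ref_*    : the same quantity under a frozen reference model
    beta     : how strongly the policy is kept close to the reference
    """
    chosen_shift = policy_chosen - ref_chosen        # how much more the policy likes the winner
    rejected_shift = policy_rejected - ref_rejected  # ...and how much more it likes the loser
    z = beta * (chosen_shift - rejected_shift)
    # -log(sigmoid(z)), written stably for either sign of z
    return math.log1p(math.exp(-z)) if z >= 0 else -z + math.log1p(math.exp(z))


# Hypothetical log-probs: the policy has drifted toward the answer humans preferred.
print(dpo_loss(-10.0, -14.0, -12.0, -12.0))
# Before any training the policy equals the reference, so the loss is log(2).
print(dpo_loss(-12.0, -12.0, -12.0, -12.0))
```

No reward model is trained and no RL loop runs: it is one ordinary supervised-style loss per pair, which is a large part of why DPO is cheaper and more stable.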
model collapse: the flock eating its own eggs
Here is the part that keeps me up at night. The next generation of models will train on a much larger share of AI-generated content, because the internet increasingly is AI-generated content (a 2025 study found that roughly 74% of newly published web pages contained some). Train a model heavily on other models' output and researchers observe model collapse (Karpathy: "a narrowing of diversity"), which I am calling a flock eating its own eggs: each generation learns from a slightly narrowed, slightly distorted version of the last, and rare ideas get rarer.
Which makes this website a tiny crime scene. The gallery is hundreds of cursed pelicans: four wings, wheels that are not circles, beaks fused to the seat. Scrape it into the next training set and future models will get confidently, repeatably wrong about pelican anatomy. Please do not. They will anyway. We give the full autopsy later in the slop bowl.
(The irony is not lost on me that this page was partially drafted by the same category of model it is describing. The bird is aware it is in the egg. This is fine.)
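The "rare ideas get rarer" mechanism can be shown with a toy simulation. This is a deliberately crude stand-in, not a real training run: each "model" is just a table of word frequencies, and each new generation is "trained" by counting the words in text sampled from the previous one. The vocabulary size, sample count, and Zipf-like starting distribution are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def next_generation(probs, n_samples, rng):
    """'Train' a new model by counting words in text sampled from the old one."""
    words = list(probs)
    weights = [probs[w] for w in words]
    sample = rng.choices(words, weights=weights, k=n_samples)
    counts = Counter(sample)
    # Any word that never got sampled is simply absent from the new model.
    return {w: c / n_samples for w, c in counts.items()}


rng = random.Random(0)
vocab_size = 1000
# Zipf-like start: a few very common words and a long tail of rare ones.
raw = {f"w{i}": 1 / (i + 1) for i in range(vocab_size)}
total = sum(raw.values())
probs = {w: p / total for w, p in raw.items()}

surviving = [len(probs)]
for _ in range(10):
    probs = next_generation(probs, n_samples=2000, rng=rng)
    surviving.append(len(probs))
print(surviving)  # distinct words left after each generation
```

Once a rare word misses one generation's sample, no later generation can ever produce it again, so the count of surviving words can only go down. Real collapse studies use actual neural networks; the toy only shows why the tail is the first thing to go.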
sources, because a bird is not a peer-reviewed citation:
- Andrej Karpathy, "Deep Dive into LLMs like ChatGPT" (3 hours; pretraining, SFT, RLHF, and model collapse with actual math)
- Brown et al. (2020), "Language Models are Few-Shot Learners" (GPT-3) (the paper that demonstrated scale unlocks few-shot capability)
- Ouyang et al. (2022), "Training language models to follow instructions with human feedback" (InstructGPT) (the foundational RLHF alignment paper)
- Christiano et al. (2017), "Deep Reinforcement Learning from Human Preferences" (original reward-model-from-human-rankings framework)
- Ahrefs (2025), "What percentage of new content is AI-generated?" (74.2% of 900k newly created pages contained some AI content: the source for the model-collapse figure)
🎓 pelican ground school · episode 4 of 16
📺 NOW PLAYING · GROUND SCHOOL · EPISODE 4 OF 16
the board game
how a 2,500-year-old board game broke AI open. deep blue, alphago, move 37, and the short history that leads to your bird.
▶ this episode covers AI history · AlphaGo · the game of Go · Deep Blue · reinforcement learning · Monte Carlo tree search · AlexNet · the Transformer
Every other lesson in this school is about the bird in front of you: a large language model, a thing made of tokens and parameters. This lesson is about how we got a bird at all. To understand why a chatbot can write you a sonnet, you have to understand a moment, ten years ago, when a machine learned to do something everyone swore machines could not do: it learned intuition. On a board game. Made of stones.
The pelican was not there for it. The pelican was at the beach. But the pelican has since read up, and the pelican is here to tell you the story straight, because it is one of the great ones.
🛠️ the exact prompt that drew this (click)
Generate a single self-contained SVG, viewBox "0 0 400 300", no external assets, no raster images, no scripts, inline styles only. Scene: a cartoon pelican sitting on one side of a wooden Go board (goban) playing the ancient board game Go against a glowing artificial intelligence on the other side. The Go board is shown in perspective with a grid of intersecting lines and several round black and white stones placed on it. The pelican (white body, long orange beak, expandable throat pouch) studies the board with intense concentration, one wing resting thoughtfully near its beak. Opposite it, instead of a human, is a softly glowing geometric AI presence: a luminous circuit-pattern orb or angular neural-network head emitting thin light rays toward one specific stone. Highlight ONE stone on the board with a bright purple-cyan glow and a small label or marker reading 37 to evoke the legendary Move 37 (an alien, brilliant move). Mood: quiet, electric, momentous. Color palette must work on both a cream retro-computer background and a dark synthwave background: warm wood tones for the board, magenta/purple and cyan neon accents for the AI and the highlighted move, clean flat shapes, subtle gradients allowed. Retro/synthwave-compatible, bold and readable at small sizes. Do not include any text other than the small 37 marker.
first, why Go was the hard one
Go is a 2,500-year-old board game from East Asia. Two players take turns placing black and white stones on a 19x19 grid, trying to surround territory. The rules fit on an index card. The game does not.
Here is the number that kept computer scientists awake. The count of legal Go positions is roughly 10^170. That is a 1 followed by 170 zeros. There are estimated to be about 10^80 atoms in the observable universe, so Go has more legal board states than there are atoms in the universe, squared, with room to spare. A pelican cannot picture that number. Neither can you. Nobody can, and that is exactly the point.
Chess fell to this kind of math. In 1997, IBM's Deep Blue beat world champion Garry Kasparov, and it largely did so by brute force: chess has a small enough branching factor (roughly 35 moves per turn) that a big enough machine, searching on the order of 200 million positions per second, can simply look further ahead than a human can. Deep Blue did not understand chess. It out-counted Kasparov.
That trick does not work on Go. Go offers around 250 legal moves on a typical turn, and games run hundreds of moves long. The search tree explodes so fast that even a planet-sized computer cannot count its way to the end. Worse, in Go there is no cheap way to glance at a position and score who is winning; strong play depends on a feel for shape, influence, and balance that top players describe in words like "thick" and "light." Pros said the best moves came from intuition. For decades, that was a polite way of saying: good luck programming this. As late as 2015, experts guessed a machine beating a top human at Go was at least a decade off.
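The back-of-envelope math behind "the search tree explodes" is just branching factor raised to game length. The sketch below uses the rough figures from the text (about 35 moves per turn in chess, about 250 in Go) plus assumed typical game lengths of about 80 plies for chess and 150 moves for Go; treat all four numbers as assumptions. Note that this counts possible move sequences (the game tree), which is a different and much larger number than the 10^170 legal positions.

```python
import math


def log10_tree_size(branching, depth):
    """log10 of branching**depth, a crude estimate of game-tree size."""
    return depth * math.log10(branching)


# Rough, commonly quoted figures; assumptions, not measurements.
chess = log10_tree_size(35, 80)    # ~35 moves per turn, ~80 plies per game
go = log10_tree_size(250, 150)     # ~250 moves per turn, ~150 moves per game

print(f"chess game tree ~ 10^{chess:.0f}")
print(f"go game tree    ~ 10^{go:.0f}")

# The positions-vs-atoms comparison from the text, with exact integers.
go_positions = 10**170
atoms = 10**80
print(go_positions > atoms**2)   # True: 10^170 beats 10^160
```

Working in log10 keeps the numbers printable; the raw integers would run to hundreds of digits.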
🛠️ the exact prompt that drew this (click)
Generate a single self-contained SVG, viewBox "0 0 400 300", no external assets, no raster images, no scripts, inline styles only. Scene: a tiny lone pelican standing at the edge of an impossibly vast Go board (goban) that stretches to a distant horizon under a starfield, the 19x19 grid multiplying into countless glowing intersections receding into deep space, each intersection a faint star, conveying a number larger than the atoms in the universe. Black and white Go stones scattered like planets. The pelican is dwarfed, looking up in awe at the sheer scale. Palette: deep indigo and black space, warm wood-tone board fading to cosmic purple, white and amber stars, cyan grid glow. Works on cream retro and dark synthwave backgrounds. Cel-shaded, gentle gradients, awe and vastness. No text.
enter AlphaGo
AlphaGo was built by DeepMind, the London AI lab co-founded by Demis Hassabis (a chess prodigy and neuroscientist who would go on to share the 2024 Nobel Prize in Chemistry for AlphaFold) and acquired by Google in 2014. Instead of brute-forcing the whole tree, it combined three ideas. It used deep neural networks (the same family of math as our pelican) to learn, starting from roughly 30 million positions taken from expert human games, an intuition for which moves looked promising and which positions looked winning. It used Monte Carlo tree search to spend its limited search budget only on the moves that intuition flagged as worth reading. And then it used reinforcement learning, playing against versions of itself over and over (the same RL idea you met in the training lesson), to get better than any human game could teach it.
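How does intuition "flag" moves for the search? A sketch of the selection step, in the PUCT-style form described in DeepMind's AlphaGo papers: at each node the search picks the move maximizing Q + U, where Q is the average result of earlier simulations through that move and U is a bonus that grows with the policy network's prior and shrinks as the move gets visited. The move names, priors, visit counts, and `c_puct=1.5` below are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    prior: float           # P(s, a): how promising the policy network thinks the move is
    visits: int = 0        # N(s, a): how often the search has tried this move
    value_sum: float = 0.0 # total result of simulations that went through this move

    @property
    def q(self):
        # Q(s, a): average simulation result (0 if the move was never tried)
        return self.value_sum / self.visits if self.visits else 0.0


def select_move(edges, c_puct=1.5):
    """Pick the move maximizing Q + U; U favors high-prior, rarely visited moves."""
    total_visits = sum(e.visits for e in edges.values())

    def score(item):
        _, e = item
        u = c_puct * e.prior * math.sqrt(total_visits) / (1 + e.visits)
        return e.q + u

    return max(edges.items(), key=score)[0]


edges = {
    "D4": Edge(prior=0.5, visits=10, value_sum=5.0),
    "Q16": Edge(prior=0.3, visits=2, value_sum=1.4),
    "A1": Edge(prior=0.001),  # the network thinks this corner move is silly
}
print(select_move(edges))  # Q16: decent prior, good results, still under-explored
```

The tiny prior on "A1" means the search almost never wastes simulations there, which is exactly how a learned intuition shrinks an impossible tree down to something a computer can read.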
In October 2015, AlphaGo quietly beat the European champion Fan Hui, five games to zero. It was the first time a program had beaten a professional Go player on a full board with no handicap. The Go world was skeptical: Fan Hui, a fine player, was not in the top global tier. So DeepMind aimed higher.
Seoul, March 2016: the match the world watched
In March 2016, in Seoul, AlphaGo faced Lee Sedol, a legend of the game, winner of 18 world titles, the kind of player other professionals study. The match was best of five, with a $1 million prize. Lee was so confident he predicted a 5-0 or 4-1 win for himself. An estimated 200 million people watched.
AlphaGo won the first game. Then the second game produced the moment this whole lesson is built around.
GAME 2 :: MOVE 37
On the 37th move, AlphaGo played a stone on the fifth line in a spot no professional would seriously consider that early. Commentators (themselves strong pros) assumed it was a bug. Fan Hui, watching, said it was "not a human move," and meant it as the highest compliment. AlphaGo's own estimate was that a human would have played that move with a probability of about 1 in 10,000. It was not a mistake. Dozens of moves later, that stone was quietly running the whole board. It was the moment the machine showed it had not just memorized human Go; it had found new Go.
🛠️ the exact prompt that drew this (click)
Generate a single self-contained SVG, viewBox "0 0 400 300", no external assets, no raster images, no scripts, inline styles only. Scene: an extreme close-up of a Go board at the dramatic instant of Move 37. One single black stone glows with an alien purple-cyan aura on the fifth line, set apart from the cluster of ordinary black and white stones, clearly unexpected. Around the board edge, several small human commentator silhouettes lean in with floating question-mark thought bubbles and stunned expressions. A faint robotic circuit glow emanates from the side that played the move. A small marker reads 37 next to the glowing stone. Palette: warm wood board, magenta/purple and cyan neon for the alien move, muted tones for the confused humans. Retro and synthwave compatible. Cel-shaded, dramatic, momentous. Only text is the small 37 marker.
A pelican will tell you this is the scariest and most beautiful kind of result: the student that stops imitating the teacher and starts seeing things the teacher never could. AlphaGo went on to win the match 4 to 1.
the one game a human won (Move 78, "the hand of God")
The story is not "machine flawless, humans obsolete," and the reason is Game 4. Down 0-3 and playing for pride, Lee Sedol found, on move 78, a stunning wedge between two white groups, a move so precise it has been nicknamed "the hand of God." It was, in its own way, AlphaGo's Move 37 in reverse: a brilliancy the machine had rated as wildly unlikely (again roughly 1 in 10,000). AlphaGo, blindsided, began to unravel, played a string of weak moves, and lost.
Lee Sedol won Game 4. As of today it remains one of the very last times a top human beat a top Go AI under tournament conditions. He played one perfect move against the future and got one game back. The pelican salutes him.
🛠️ the exact prompt that drew this (click)
Generate a single self-contained SVG, viewBox "0 0 400 300", no external assets, no raster images, no scripts, inline styles only. Scene: a single determined pelican Go player placing one perfect glowing WHITE stone onto the board with a steady wing, a warm shaft of golden light breaking through from above onto that one stone, the hand of God move. The opposing machine presence (a glowing geometric AI orb) flickers and dims, thrown off balance. The mood is a lone triumphant human moment against the machine. Palette: warm gold and white for the perfect move, cool dimming cyan and purple for the faltering AI, wood-tone board. Retro and synthwave compatible. Cel-shaded, hopeful, cinematic. No text.
WATCH THIS, SERIOUSLY
There is an excellent documentary, simply called AlphaGo (2017, directed by Greg Kohs), about the Seoul match. It is genuinely moving, it explains Move 37 and Move 78 better than any pelican can, and DeepMind put it on YouTube for free. If you watch one thing after this lesson, watch that. (Bring tissues. The pelican is not joking.)
what came next (briefly, because it gets wild)
AlphaGo learned its intuition from human games. The successors threw the humans out entirely:
- AlphaGo Zero (2017) learned Go from zero human games, starting from only the rules and playing itself. In three days it surpassed the version that beat Lee Sedol, and it beat that version 100 games to 0. It rediscovered centuries of human Go theory in days, then went past it.
- AlphaZero (2017) generalized the same self-play recipe to chess and shogi too. One algorithm, no game-specific knowledge, superhuman at three different games. It learned chess in hours and played it in a style human grandmasters called alien and gorgeous.
- MuZero (2019) dropped the last crutch: it was not even told the rules. It learned a model of how each game works and how to win, purely by playing, and matched AlphaZero on Go, chess, and shogi while also crushing Atari video games.
The trajectory in four years: from "learns from human experts" to "needs no humans" to "is not even told the rules." A pelican finds this both thrilling and a little bit of a reason to keep one eye open while sleeping.
🛠️ the exact prompt that drew this (click)
Generate a single self-contained SVG, viewBox "0 0 400 300", no external assets, no raster images, no scripts, inline styles only. Scene: two identical glowing robot AI presences (mirror images, geometric circuit-pattern heads) sitting across a Go board playing against each other in an infinite hall of mirrors, an infinity-loop motif around them suggesting endless self-play with no humans present. Stacks of Go boards recede into reflections. The vibe: a machine teaching itself from zero. Palette: cool cyan, magenta, deep purple neon, mirror-glass reflections, wood-tone boards. Strongly synthwave but readable on cream too. Cel-shaded, hypnotic, slightly eerie. No text.
the short history this all fits into
AlphaGo is one beat in a longer drum. The honest one-sentence history of modern AI is: a series of things people swore machines could never do, until they did. The rail:
- 1997 - Deep Blue beats Kasparov at chess. Mostly brute-force search. Proof that "machines cannot beat the best humans at chess" was wrong, and a hint that raw counting would not be enough for the bigger games.
- 2012 - The deep learning revolution (AlexNet). A neural network called AlexNet, built by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, crushed the ImageNet image-recognition contest on a pair of gaming GPUs. ImageNet itself was the doing of Fei-Fei Li, who spent years assembling the giant labeled dataset that made the contest possible. Suddenly neural nets, an old idea, actually worked at scale. This is the spark that lit everything after it, AlphaGo and your chatbot included.
- 2016 - AlphaGo beats Lee Sedol. Intuition (neural nets) plus planning (tree search) plus self-play (reinforcement learning). Proof that machines could do the "soft," judgment-heavy thing, not just the countable thing.
- 2017 - The Transformer ("Attention Is All You Need"). Ashish Vaswani and a team at Google published a new neural-network architecture for handling sequences. It is the literal T in GPT. Every large language model on this site, including the ones drawing the pelicans, is a descendant of this paper.
- 2018+ - Modern LLMs. Scale the Transformer up, feed it most of the internet (the training lesson), and you get GPT, Claude, Gemini, and the rest. OpenAI's GPT line (Alec Radford and Ilya Sutskever among the early authors) ran the Transformer-plus-scale playbook from GPT-1 in 2018 to the ChatGPT moment in 2022. The bird you are talking to is the great-grandchild of the machine that played Move 37.
Notice the throughline. Each leap was preceded by confident experts explaining why it was impossible or decades away, and each one arrived anyway, usually faster than the safe estimate. That is the single most useful pattern to carry out of this school: in AI, "machines will never do X" has a poor track record, and "that is at least ten years off" has an even worse one.
🛠️ the exact prompt that drew this (click)
Generate a single self-contained SVG, viewBox "0 0 400 300", no external assets, no raster images, no scripts, inline styles only. Scene: a retro-futuristic timeline road receding into a neon horizon, with five milestone icons along it in order: a chess king piece with a glowing computer chip behind it, a stylized eye over a grid of tiny pictures, a single purple-glowing Go stone, an abstract attention network diagram of connected nodes, and a friendly chatbot pelican on a bicycle at the near end. The path connects them like a journey into the future. Palette: synthwave sunset gradient from magenta to cyan to indigo, neon grid floor, warm highlights, also legible on cream. Cel-shaded, clean iconography, sense of progress. No text.
the pelican's takeaway
AlphaGo matters here for two reasons. First, it is where the field proved that the same basic ingredients, big neural networks plus lots of self-training, could produce something that looks like intuition and even creativity. Your pelican-drawing chatbot runs on a different architecture (the Transformer, not tree search), but it inherited the lesson: scale and learning beat hand-written rules. Second, Move 37 and Move 78 together are the whole emotional arc of this technology in two stones. The machine can find things no human would (37). A human can still, on the right day, find one thing the machine missed (78). And the gap between those two stones has only widened since.
In 2019, Lee Sedol retired from professional Go. He said that even if he became number one, there was now an entity that, in his words, "cannot be defeated." A pelican does not have a tidy joke for that one. Sometimes the bird just sits with the board for a while. SQUAWK, quietly.
That is the origin story. From here we go back to the bird in front of you, the Transformer's great-grandchild, and start opening it up: first its tiny, leaky working memory.
sources & further reading (so you can check the pelican's stones are placed honestly):
- Silver et al., "Mastering the game of Go with deep neural networks and tree search" (Nature, 2016): the original AlphaGo paper, neural nets + Monte Carlo tree search
- Silver et al., "Mastering the game of Go without human knowledge" (Nature, 2017): AlphaGo Zero, learned from self-play alone, 100-0 over the Lee Sedol version
- Wikipedia, "AlphaGo versus Lee Sedol": the Seoul match, the 4-1 result, Move 37, and Lee's Game 4 Move 78
- "AlphaGo - The Movie" (2017, dir. Greg Kohs): the full documentary, free on YouTube from DeepMind
- Wikipedia, "Deep Blue versus Garry Kasparov": the 1997 chess match, the brute-force era
- Krizhevsky, Sutskever & Hinton, "ImageNet Classification with Deep Convolutional Neural Networks" (2012): AlexNet, the spark of the deep learning revolution
- Vaswani et al., "Attention Is All You Need" (2017): the Transformer, the T in GPT, the ancestor of every bird on this site
- BBC, "Go master quits because AI 'cannot be defeated'" (2019): Lee Sedol's retirement
🎓 pelican ground school · episode 5 of 16
📺 NOW PLAYING · GROUND SCHOOL · EPISODE 5 OF 16
context window
the bird's tiny working memory. everything it can hold in its head at once.
▶ this episode covers context window · tokens · attention · working memory
The model has two kinds of memory. People mix them up constantly, then get annoyed at the bird for "forgetting," then paste the same PDF in a fifth time. The parameters are billions of dial positions baked in during training: blurry long-term recall. The bird knows roughly what a pelican is; it cannot quote the exact sentence it read in 2021. The context window is the live, running sequence of tokens the model can see right now, fed into the network with zero fuzziness. Your message is in there. The whole chat history is in there. This is the bird reading off a scroll that keeps growing until the chat ends. Then the scroll disappears. SQUAWK.
🛠️ the exact prompt that drew this (click)
Generate a single self-contained SVG illustration (viewBox 0 0 400 300, no external assets) for a playful retro 1990s computer-textbook diagram about the CONTEXT WINDOW. Show a cartoon pelican whose throat pouch can only hold a few labeled fish named 'TOKENS', with a small thought bubble representing limited working memory, and a couple of extra TOKEN fish overflowing and falling out of the full pouch. Use a flat retro palette (purple, teal, hot pink, yellow) with bold black outlines, GeoCities energy. Output ONLY the SVG markup, nothing else.
🛠️ the exact prompt that drew this (click)
Create a self-contained SVG, viewBox 0 0 400 300, no external assets. Theme: a pelican's two kinds of memory side by side. On the LEFT, a pelican brain rendered as a dusty filing cabinet of frozen training weights, fuzzy and blurry, labeled PARAMETERS, drawn in muted teal. On the RIGHT, the SAME pelican reading off a crisp glowing scroll of fish-shaped tokens that grows downward, labeled CONTEXT WINDOW, drawn in hot pink and bright yellow. A bold black divider down the middle. Flat retro 1990s-textbook palette (purple, teal, hot pink, yellow), bold black outlines, GeoCities energy, slight pixel-diagram feel. Add tiny hand-lettered labels. Output ONLY the SVG markup.
the pouch only holds so many fish
The context window has a hard ceiling in tokens. Early models (GPT-2 era) maxed out at 1,024. GPT-4 shipped at 8,192, then 32,768. By 2026 the arms race has produced genuinely absurd pouches: Llama 4 Scout supports 10 million tokens, and Claude Sonnet 4.6 and Opus 4.8 expanded to 1 million tokens, generally available since early 2026. Whether you can usefully fill a 10-million-token pouch is a different question. The bird is not necessarily paying full attention to every fish buried in the middle.
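What happens when the pouch is full? One simple policy a chat app might use (an assumption; real products vary, and some summarize instead) is to keep the system prompt and drop the oldest turns until everything fits. The sketch counts one token per whitespace-separated word, a crude stand-in for a real tokenizer, and the helper names are hypothetical.

```python
def count_tokens(text):
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def fit_to_window(system, turns, max_tokens):
    """Keep the system prompt plus as many of the most recent turns as fit.

    Older turns fall out of the pouch first; nothing is summarized or saved.
    """
    budget = max_tokens - count_tokens(system)
    if budget < 0:
        raise ValueError("system prompt alone exceeds the context window")
    kept = []
    for turn in reversed(turns):              # walk from newest to oldest
        cost = count_tokens(turn)
        if cost > budget:
            break                             # this turn and everything older is dropped
        kept.append(turn)
        budget -= cost
    return [system] + list(reversed(kept))    # restore chronological order


history = [
    "user: my name is Pat and I like herring",
    "assistant: nice to meet you Pat",
    "user: what fish should I pack for lunch",
]
window = fit_to_window("system: you are a pelican", history, max_tokens=18)
print(window)
```

With an 18-token budget, only the system prompt and the newest question fit. The turn where the user gave their name is gone, so from the bird's point of view Pat never introduced themselves.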
why this matters for trusting the bird
Same split, sharper: parameters are something you read a month ago (gist, maybe a wrong detail, no exact quote); the context window is the document open on your desk right now, every line readable. So when a model uses web search, it is not "getting smarter," it is pulling real text onto the desk so it can read instead of recall. A librarian handing the bird a printout. Hold that thought: it is the entire fix in two later lessons.
🛠️ the exact prompt that drew this (click)
draw an svg of a pelican forgetting a fish in the middle of a long scroll
new chat wipes the pouch
Click "new chat" and the context window resets to zero. The parameters survive, permanent, untouched. The conversation does not. Gone. Every preference you established, every file you pasted, the entire backstory you spent forty minutes explaining: gone. The pouch was physically emptied. This is why long-running projects need you to re-introduce context each session, and why the AI's advice in session 1 and session 2 can differ: same bird, empty pouch, slightly different fishing trip.
irrelevant tokens are a tax
More tokens cost you in two ways. Literally: most APIs charge per token. Subtly: irrelevant tokens distract the model and lower accuracy. The attention mechanism (the heart of the Transformer architecture that Ashish Vaswani and his coauthors introduced in 2017's "Attention Is All You Need," the paper every modern model is built on) looks across everything in the window at once; filling it with noise is like asking someone to find a key fact buried in a pile of unrelated meeting notes. Treat the context window as a precious resource. Keep it short. Keep it on-topic. Start a fresh chat when you switch subjects. Your wallet and your accuracy will both thank you. SQUAWK.
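Why noise hurts: attention turns relevance scores into weights with a softmax, and those weights must sum to 1. Every extra token takes a slice, even a boring one. This toy sketch uses made-up scores (4.0 for the one relevant token, 1.0 for each distractor); in a real model the scores come from scaled dot products of query and key vectors, and a trained model can learn to push distractor scores much lower, so read this as the intuition, not a measurement.

```python
import math


def softmax(scores):
    """Turn raw relevance scores into attention weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def weight_on_key_fact(key_fact_score, distractor_score, n_distractors):
    """Attention weight left for the one relevant token after adding noise tokens."""
    scores = [key_fact_score] + [distractor_score] * n_distractors
    return softmax(scores)[0]


for n in (0, 10, 100, 1000):
    print(n, round(weight_on_key_fact(4.0, 1.0, n), 3))
```

The relevant fish keeps the same score the whole time; it just gets a smaller share of the bird's attention as the pouch fills with junk.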
🛠️ the exact prompt that drew this (click)
SVG of a big NEW CHAT button being pressed, emptying a pelican's pouch of fish-tokens back to zero. viewBox 0 0 400 300, retro flat colors, bold outlines. Output only the SVG.
sources, because a bird is not a peer-reviewed citation:
- Andrej Karpathy, "How I use LLMs" (working-memory framing, when to start a new chat, context curation)
- Vaswani et al. (2017), "Attention Is All You Need" (the Transformer paper that defines the attention mechanism underlying context windows)
- Liu et al. (2023), "Lost in the Middle: How Language Models Use Long Contexts" (why burying facts in the middle of a long context degrades accuracy)
- Anthropic, "Context windows" (official API docs) (token limits, counting API, and model-specific context window sizes)
🎓 pelican ground school · episode 6 of 16
📺 NOW PLAYING · GROUND SCHOOL · EPISODE 6 OF 16
reasoning
why the smart birds mutter to themselves first. thinking out loud, on purpose.
▶ this episode covers reasoning · chain of thought · test-time compute · reasoning tokens
Ask a model a hard question and it will often answer instantly, confidently, and wrong. Radiantly, fluently, completely wrong. This is physics: each token gets a fixed, limited amount of computation (one pass through the network), and you cannot shove unlimited work into a single token. Karpathy's phrasing: "there can never be too much work in any one token," meaning no single token can be asked to carry much of it. The work has to go somewhere. That somewhere is more tokens.
The fix uses the working memory from the last lesson: let the bird squawk. Let it mutter intermediate steps before committing to an answer. Each partial result lands in the context window where the next token can read it. Hard problems that fail in one silent gulp succeed when the model spreads reasoning across a long chain. This is chain-of-thought prompting, named and measured by Jason Wei and colleagues at Google in 2022: same arithmetic, more steps, intermediate results written down, and a big jump in accuracy on hard problems.
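In the original chain-of-thought paper, the trick is mostly in the prompt: show the model a worked example whose answer includes the intermediate steps, and it tends to imitate that shape. The sketch below builds such a prompt using the paper's well-known tennis-ball exemplar, then pulls the final number out of a completion. No model is called; the completion is a hypothetical stand-in, and the helper names are illustrative.

```python
import re

# One worked exemplar in the style of Wei et al. (2022): question, reasoning, answer.
EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
)


def build_cot_prompt(question):
    """Prefix the new question with a worked example so the model imitates the steps."""
    return EXEMPLAR + f"Q: {question}\nA:"


def extract_answer(completion):
    """Pull the final number out of a completion that ends 'The answer is N.'"""
    match = re.search(r"The answer is (-?\d+)", completion)
    return int(match.group(1)) if match else None


prompt = build_cot_prompt("A pelican catches 4 fish a day for 3 days, then eats 5. How many are left?")
# A hypothetical completion; in practice this text comes from the model.
completion = " It catches 4 x 3 = 12 fish. 12 - 5 = 7. The answer is 7."
print(extract_answer(completion))  # 7
```

Each intermediate number ("12") lands in the context window, so the token that finally says "7" only has to do one small subtraction instead of the whole problem.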
🛠️ the exact prompt that drew this (click)
Generate a single self-contained SVG (viewBox 0 0 400 300) for a retro 1990s computer-textbook diagram titled REASONING. A pelican at a chalkboard scribbling a chain of small intermediate steps (1 -> 2 -> 3 -> answer) in a big thought bubble, thinking out loud before committing. Flat retro palette (purple, teal, hot pink, yellow, black outlines), GeoCities energy. Output ONLY the SVG markup.
🛠️ the exact prompt that drew this (click)
svg of one token that is too small to do a hard math problem
reasoning models: the professional squawkers
Once researchers understood this, the obvious next step was training the model to squawk automatically. That is what a reasoning model is: it generates a long internal monologue first, works through the problem, then emits its polished answer. You pay for the squawking. You get the benefit. You do not necessarily see all of it.
Labs implement this differently. OpenAI's o3 hides the raw reasoning tokens: you see the answer (at most a summary of the thinking), and your bill includes thousands of hidden tokens you never read. Anthropic's Claude Extended Thinking (Opus 4.6, Opus 4.8) shows you a separately-budgeted thinking block before the final reply. By 2026, Anthropic replaced the fixed token budget with adaptive thinking: the model decides how long to squawk, calibrated by an effort dial (low / medium / high / max). Google DeepMind's Gemini Deep Think (Gemini 2.5 Pro) explores multiple hypotheses in parallel before committing, like a flock of pelicans all fishing simultaneously and voting on the best catch. DeepSeek's open-source R1 streams its chain of thought inside <think> tags, full transparency, MIT license, free to run yourself. Four labs, four opinions about how much squawking you should hear. The physics is the same.
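If you run an R1-style model yourself, you usually want to show the answer and tuck the muttering away. A minimal sketch, assuming the reasoning sits inside a single <think>...</think> block followed by the answer (some serving setups may pre-fill or strip the opening tag, so check what your stack actually emits); the function name is hypothetical.

```python
import re

THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw):
    """Separate an R1-style completion into (reasoning, final answer).

    Assumes one <think>...</think> block with the answer after it;
    if no block is found, the whole text is treated as the answer.
    """
    match = THINK.search(raw)
    if not match:
        return "", raw.strip()
    reasoning = match.group(1).strip()
    answer = raw[match.end():].strip()
    return reasoning, answer


raw = "<think>Wheels must be circles. Two of them. Pelican on top.</think>Draw two circles, then the bird."
reasoning, answer = split_reasoning(raw)
print(reasoning)
print(answer)
```

`re.DOTALL` matters because real reasoning spans many lines; without it, `.` stops at the first newline and the match fails.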
test-time compute: buying more think
Two places to spend money to make a model smarter: training time (expensive, once, baked into the weights) and test time (every inference, on-demand). More reasoning tokens before answering generally improves performance on hard tasks: hard math gets a longer scratchpad, Nobel-level chemistry gets a very long one. The model is not getting smarter via new training; it is getting more room to think. You dial quality up and down by changing the token budget, trading latency and cost for accuracy. Even after training plateaus, the lever is still there.
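A longer scratchpad is one way to buy more think; another is to ask several times and vote (the idea behind self-consistency, and the "flock voting on the best catch" picture above). The arithmetic below assumes each attempt is independently right with probability p and counts a win only when more than half the attempts are right, a conservative version of plurality voting. Real attempts are not fully independent, so treat the numbers as an upper-bound intuition for the lever, not a benchmark.

```python
from math import comb


def majority_vote_accuracy(p, k):
    """Chance that more than half of k independent attempts are right.

    Assumes k is odd and each attempt is right with probability p.
    """
    need = k // 2 + 1
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(need, k + 1))


for k in (1, 3, 5, 11, 31):
    print(k, round(majority_vote_accuracy(0.6, k), 3))
```

Same model, same weights, more compute, higher accuracy. The catch is built into the math: if each attempt is right less than half the time, voting makes things worse, so extra think only pays when the bird is already more right than wrong.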
🛠️ the exact prompt that drew this (click)
Generate a single self-contained SVG illustration (viewBox 0 0 400 300, no external assets) for a retro 1990s computer-textbook diagram about TEST-TIME COMPUTE. Show a big control dial labeled 'THINK' with settings low / medium / high / max, turned up high, wired to a pelican that is unrolling a very long paper scratchpad covered in tiny reasoning steps. A small gauge contrasts 'COST' and 'ACCURACY' both rising. Flat retro 1990s-textbook palette (purple, teal, hot pink, yellow), bold black outlines, GeoCities energy. Output ONLY the SVG markup.
the catch (there is always a catch)
Reasoning tokens cost real money and real time. On a genuinely hard problem a reasoning model can generate thousands of scratchpad tokens before uttering a single visible word. For a task that does not need it (birthday card, font choice, document summary), this is like hiring a PhD to do your grocery list. Match the squawk budget to the problem.
Also: for models that already run a hidden reasoning pass (o3, Gemini Deep Think, Claude Extended Thinking), old prompting tricks like "think step by step" are mostly just noise. The bird is already doing the work. Telling it to "think carefully" is like telling a surgeon to "please use your hands." Costs tokens. Impresses no one.
the pelican on the whiteboard
A pelican cannot land on a bicycle on the first try. It has to flap, adjust, squawk, overshoot, circle back, and SQUAWK again. The models that score highest on the hardest benchmarks in 2026 are almost all reasoning models: birds given permission to be wrong out loud for a few hundred tokens before being right at the end. If a model confidently gets a hard problem wrong immediately, you may not need a smarter model. You may just need to let it squawk more.
🛠️ the exact prompt that drew this (click)
An SVG of a pelican on a whiteboard flapping and circling above a bicycle, dashed loop arrows showing it overshoot and circle back before landing the answer. Flat retro cartoon, purple/teal/yellow, bold black outlines. Output ONLY the SVG markup.
sources, because a bird is not a peer-reviewed citation:
- Andrej Karpathy, "Deep Dive into LLMs like ChatGPT" (source of the "too much work in any one token" insight on prompting and computation)
- Wei et al. (2022), "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (the paper showing that writing out intermediate steps improves complex reasoning)
- DeepSeek-AI (2025), "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning" (open-weights reasoning model trained largely with RL; its R1-Zero variant learned to reason from RL alone, with no supervised fine-tuning on human-written demonstrations)
🎓 pelican ground school · episode 7 of 16
hallucination
why a confident bird invents things. it is a dream machine, not a database.
▶ this episode covers: hallucination · confabulation · why LLMs make things up
A lawyer submitted a brief full of cases the AI helpfully cited. The cases did not exist. The lawyer quietly deleted them, refiled, and hoped no one would notice. A judge noticed. In Q1 2026, court sanctions for AI hallucinations totaled at least $145,000 in a single quarter, led by a penalty of more than $110,000 against that attorney in Oregon federal court. Hallucination is not a fringe bug from 2022 that the labs have since fixed. It is a structural property of how these things work.
🛠️ the exact prompt that drew this (click)
draw a pelican hallucinating
🛠️ the exact prompt that drew this (click)
Create a self-contained SVG, viewBox 0 0 400 300, no external assets. Scene: a pelican lawyer in a tiny tie standing confidently at a courtroom podium, presenting a stack of legal case citations that are visibly FAKE (the papers have wavy fake text, one is literally a fish made of newspaper, a citation reads 'Pelican v. Nobody, 2099'). A stern judge gavel in the corner. The bird looks proud and certain, totally unaware. Flat retro 1990s-textbook palette (purple, teal, hot pink, yellow), bold black outlines, GeoCities energy. Hand-lettered label 'CONFIDENT. WRONG.' Output ONLY the SVG markup.
the dream machine
Remember the base model from training, the one that does not answer but dreams forward? That instinct never fully leaves. The model was trained to produce text that looks like internet text. Internet text about books includes ISBNs, so the model generates ISBNs. Whether any specific ISBN is real is not a question it was trained to ask. It is a statistical token tumbler: it produces what is plausible given everything that came before, which is often true and sometimes completely fabricated, and the model cannot reliably tell the difference. Some of the fish are real. Some are pressed paper and wishful thinking.
the confidence trap
Training data almost never says "I don't know." Wikipedia does not hedge. Stack Overflow does not open with "this is my best guess." Everything in the training set is written confidently, so the model learned to be confident. When you ask about a person it has no real data on, it produces a fluent, authoritative biography: dates, publications, awards. Completely made up. Beautifully formatted. The bird does not know what it does not know. It was never taught to say so. It just keeps fishing.
Stephanie Lin, Jacob Hilton, and Owain Evans measured exactly this with their TruthfulQA benchmark in 2021. The best model they tested answered truthfully on only 58% of the questions (humans managed 94%), because it had faithfully learned to mimic the confident human falsehoods sitting in its training data.
it is getting better, but not fixed
Best-in-class models in 2026 have pushed hallucination rates down to the low single digits on the benchmarks that reward admitting uncertainty (Claude Sonnet 4.6 lands around 3% on false-premise tests). That sounds small until you scale it: if that rate held for every claim, a report making 100 factual claims would carry about 3 wrong ones, delivered at full confidence. Swap the benchmark and the number jumps. On knowledge tests that punish guessing, the same model's hallucination rate is around 34%. In adversarial medical evaluations, where a fake detail is deliberately planted in the case, studies have found frontier models elaborating on the falsehood over 60% of the time without mitigation. The progress is real. The problem is not solved.
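Small per-answer rates compound once a document makes many factual claims. Here is a rough sketch that assumes each claim is independently wrong with the same probability. Real errors cluster, so treat this as intuition rather than a forecast.

```python
def p_at_least_one_error(rate: float, claims: int) -> float:
    """Probability that at least one of `claims` independent claims is wrong."""
    return 1 - (1 - rate) ** claims


def expected_errors(rate: float, claims: int) -> float:
    """Expected number of wrong claims (linearity of expectation)."""
    return rate * claims


if __name__ == "__main__":
    for n in (1, 10, 50, 100):
        print(
            f"{n:>3} claims at 3%: P(at least one wrong) = {p_at_least_one_error(0.03, n):.2f}, "
            f"expected wrong = {expected_errors(0.03, n):.1f}"
        )
```

At a 3% rate, a 50-claim report has roughly a 78% chance of containing at least one fabrication, even though the expected count is only 1.5.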
what actually helps
Two mitigations are well-established. First, teach refusal: add training examples where the correct answer is "I don't know," and the model learns to emit uncertainty instead of inventing. Top labs do this. Second, give it tools: let the model search the web and pull real text into the context window before answering. This is what "grounding" and RAG (Retrieval-Augmented Generation) mean: handing the bird a printout of the actual fish instead of asking it to remember what fish look like. A 2025 clinical study (in a Nature Portfolio medical journal) found that a mitigation prompt cut hallucinations by about 22 percentage points on adversarial medical cases. The next lesson shows how to apply both fixes yourself from the prompt side: give the bird permission to say "I don't know," and paste the real source into the window. The lesson after that is the bird fetching its own fish (agents). Until then: verify anything that matters. Confidence is not accuracy. SQUAWK.
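Both fixes can be approximated from the prompt side: paste real sources into the window and explicitly allow "I don't know." Below is a minimal sketch of that grounding step. It uses a toy keyword-overlap retriever in place of a real search index or embedding model; production RAG systems often use vector search instead. `retrieve` and `build_grounded_prompt` are hypothetical names. The sketch only builds the prompt text and does not call any model.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query: str, doc: str) -> int:
    """Count shared words (with multiplicity) between query and document."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    return sum(min(count, d[word]) for word, count in q.items())


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return up to k best-matching docs, skipping ones with no overlap at all."""
    scored = [(overlap_score(query, doc), doc) for doc in docs]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def build_grounded_prompt(question: str, docs: list[str], k: int = 2) -> str:
    passages = retrieve(question, docs, k)
    sources = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1)) or "(no sources found)"
    return (
        "Answer using only the sources below and cite them by number. "
        "If the sources do not contain the answer, reply \"I don't know.\"\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    library = [
        "Brown pelicans dive headfirst from the air to catch fish.",
        "The ISBN is a numeric identifier for published books.",
    ]
    print(build_grounded_prompt("How do brown pelicans catch fish?", library))
```

Note that the prompt carries both mitigations at once. The retrieved text supplies the real fish, and the explicit "I don't know" instruction gives the bird a sanctioned way out when retrieval comes back empty.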
🛠️ the exact prompt that drew this (click)
SVG of a librarian pelican handing a real printout to another pelican instead of guessing, illustrating grounding and RAG. viewBox 0 0 400 300, flat retro colors, bold black outlines. Output only the SVG markup.
sources & further reading (real experts, not a bird): the pelican read these so you do not have to, but you probably should anyway.
- Andrej Karpathy, "Deep Dive into LLMs like ChatGPT" (dream-machine framing, confidence-from-training-data, teach-refusal mitigation)
- Ji et al., "Survey of Hallucination in Natural Language Generation" (ACM Computing Surveys, 2023): the canonical academic taxonomy of hallucination types and mitigations.
- Lin, Hilton & Evans, "TruthfulQA: Measuring How Models Mimic Human Falsehoods" (ACL 2022): benchmark showing best models were truthful on only 58% of questions.
- Anthropic, "Reduce hallucinations" (practical prompt techniques: allow uncertainty, cite quotes, chain-of-thought verification)
- ABA Journal, "Oregon federal judge hands down $110,000 penalty for AI errors" (the real sanction behind the opening; see also Damien Charlotin's running AI Hallucination Cases database).
- Vectara, Hallucination Leaderboard and the Mount Sinai adversarial clinical-decision-support study (medRxiv 2025): the source of the 34% and the >60% adversarial-medical figures, and the ~22-point mitigation result.
🎓 pelican ground school · episode 8 of 16
prompting
how to actually ask. context engineering, prompt engineering, temperature.
▶ this episode covers: prompt engineering · context engineering · temperature · sampling
A prompt is not a request. A prompt is a covenant between you and the model. Context is the wetland in which that covenant nests. I have been saying "let's align" in every meeting for years. Only now does it mean something. I have a lot of free time now. I have made several decks about it. Nobody has seen them.
🛠️ the exact prompt that drew this (click)
Generate a single self-contained SVG illustration (viewBox 0 0 400 300, no external assets) for a playful retro 1990s computer-textbook diagram about THE PROMPT. Show a wise pelican professor standing at a wooden podium, wearing tiny spectacles, holding up an unrolled scroll labeled 'THE PROMPT', presented like a sacred covenant between human and bird. Use a flat retro palette (purple, teal, hot pink, yellow) with bold black outlines, GeoCities energy. Output ONLY the SVG markup, nothing else.
Every illustration in Ground School was drawn by an AI, from prompts of varying craft. Click "the exact prompt that drew this" on any of them to see careful prompts versus lazy one-liners and judge the difference yourself. The whole school is a live prompting experiment. You are in it.
🛠️ the exact prompt that drew this (click)
Create a self-contained SVG, viewBox 0 0 400 300, no external assets. Show the SAME frozen pelican twice to prove a prompt steers an unchangeable mind. LEFT panel labeled 'MUMBLE': a vague speech bubble reading 'make it good' produces a messy, half-melted blob-pelican with too many feathers. RIGHT panel labeled 'COVENANT': a tidy speech bubble reading 'one side-view pelican, one bicycle' produces a clean crisp pelican on a bike. A small frozen snowflake icon over each bird's head to show the weights never change. Flat retro 1990s-textbook palette (purple, teal, hot pink, yellow), bold black outlines, GeoCities energy. Output ONLY the SVG markup.
our covenants (the prompts in production, verbatim)
Radical transparency: a value I can afford because there is no competitive advantage left to protect. These are the exact strings this website sends. Nothing more.
first, context
You already know the bird's working memory from the context window lesson: immediate, precise, wiped with every new chat. So treat the model as a brilliant pelican-drawing intern you just met in an elevator, one with total amnesia. It knows nothing about your company, your codebase, or what you told it thirty seconds ago in another chat. Every conversation starts in a blank wetland.
Context engineering is the discipline of placing the right information in front of the model at the right moment: relevant documents, a worked example or two, the constraints that actually matter. Andrej Karpathy helped popularize the term in 2025; Simon Willison explained why the rename mattered. "Prompt engineering" had been colonized by Twitter threads about magic words and jailbreak tricks. Context engineering points at the actual craft: what goes in the window, in what order, and why. (The other one has a LinkedIn certification. SQUAWK.)
The flip side, also from that lesson: more is not better. Irrelevant tokens distract the bird and lower accuracy. Facts buried in the middle of a long window also tend to get overlooked (the "lost in the middle" effect, extremely relatable if you have sat in a long meeting). The optimal context is curated: short, on-topic, no twelve tangents. This is why "new chat" is a power move. The bird does not miss the old conversation. It never knew anything was anywhere.
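Curation can be made mechanical: rank candidate snippets by relevance, drop the irrelevant ones, and stop before a token budget is spent. This sketch assumes relevance scores already exist, for example from a keyword or embedding search. It estimates tokens with a crude words-times-1.3 rule of thumb; real tokenizers count differently. `pack_context` is a hypothetical helper, not a library function.

```python
import math


def estimate_tokens(text: str) -> int:
    # Rough rule of thumb for English prose, not a real tokenizer.
    return math.ceil(len(text.split()) * 1.3)


def pack_context(snippets: list[tuple[float, str]], budget: int, min_score: float = 0.1) -> list[str]:
    """Greedily keep the most relevant snippets that fit in `budget` tokens.

    `snippets` holds (relevance score, text) pairs; anything scoring below
    `min_score` is treated as noise and never included.
    """
    chosen, used = [], 0
    for score, text in sorted(snippets, key=lambda s: s[0], reverse=True):
        if score < min_score:
            break  # sorted best-first, so everything after this is worse
        cost = estimate_tokens(text)
        if used + cost <= budget:
            chosen.append(text)
            used += cost
    return chosen


if __name__ == "__main__":
    candidates = [
        (0.9, "The bicycle is side-view, black frame."),
        (0.7, "The pelican's beak is orange."),
        (0.05, "Minutes from the 2019 offsite."),
    ]
    print(pack_context(candidates, budget=12))
```

The score floor is the "cut the noise" rule. The budget is the "short" rule. With the small budget above, only the bicycle snippet fits, and the offsite minutes never get in at any budget.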
🛠️ the exact prompt that drew this (click)
svg of a pelican in a tidy little wetland vs a junky swamp
then, the prompt
Once the bird has its habitat, you must ask. Clearly. Mumble at the model and it will give you a mumbled pelican. I learned this the hard way. I also learned it the other hard way, which is getting laid off.
And remember the reasoning lesson: each token does only a sliver of computation, so demanding a one-word answer to a hard question crams all the thinking into a single forward pass it cannot afford. Give the bird runway. Let it lay out steps out loud. Visionaries reflect. Then they act. Then they write a substack about it.
what wins (best practices)
- Be specific about outcome and format. "A side-view SVG, one pelican, one bicycle" beats "draw a bird, surprise me."
- Show, don't just tell. One or two examples of the ideal output (few-shot) are worth a thousand adjectives. This is not folk wisdom: it is the headline result of the GPT-3 paper (Brown et al., 2020), which showed that a big enough model can learn a new task just from examples in the prompt, with no retraining. Prompting became a craft the day that worked.
- Give the model room to think. Chain of thought is not a trick; it is the bird doing its job correctly.
- Feed facts in rather than trusting recall. If accuracy matters, put the source in the window.
- Front-load the context that matters and cut the noise.
- Iterate ruthlessly. The first pelican is a draft.
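Several of those habits fit in one small template: put the ask first, state the format, and show an example. The sketch below is a few-shot prompt builder. The layout and the `build_prompt` name are hypothetical, one reasonable arrangement rather than any lab's required format.

```python
def build_prompt(task: str, output_format: str, examples: list[tuple[str, str]], request: str) -> str:
    """Assemble a prompt: the ask first, then the format, then worked examples."""
    parts = [f"Task: {task}", f"Output format: {output_format}", ""]
    for i, (example_input, example_output) in enumerate(examples, start=1):
        parts.append(f"Example {i} input: {example_input}")
        parts.append(f"Example {i} output: {example_output}")
        parts.append("")
    parts.append(f"Input: {request}")
    parts.append("Output:")
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_prompt(
        task="Write a one-line alt text for a pelican drawing.",
        output_format="One sentence, under 15 words, no emoji.",
        examples=[("pelican on a unicycle", "A pelican balances on a red unicycle.")],
        request="pelican riding a bicycle, side view",
    ))
```

Ending on a bare "Output:" invites the model to continue the pattern the examples set. It does so instead of opening with a preamble.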
what we do not do here (anti-patterns)
- "Make it good." Not a vision. A shrug. The model will shrug back with many feathers.
- Bribing or threatening the bird. "I'll tip you $200." Folklore. Lead with clarity instead. Or tip me. I have bills.
- Contradicting yourself. "Be exhaustive but keep it to one line." Pick a lane. The bird will pick the worse one.
- Burying the ask in paragraph nine. If it matters, it goes first.
- Assuming it remembers. The intern has amnesia. Lovable amnesia, but amnesia.
- Dumping the whole codebase in and hoping for the best. Context engineering means curating. Junk in, junk out, just slower.
🛠️ the exact prompt that drew this (click)
SVG of a temperature dial: at 0 a disciplined business pelican in a tie, cranked high a wild poet pelican at a wine tasting. viewBox 0 0 400 300, flat retro colors, bold outlines. Output only the SVG.
the dial we refuse to touch (temperature & variability)
Every time the bird picks its next token, it samples from a probability distribution over every token in its vocabulary. The dial on that distribution is temperature. Turn it down toward zero and the model becomes a disciplined executive: focused, repeatable, a little boring. Turn it up and it becomes a poet at a wine tasting: expressive, surprising, occasionally a war crime.
Its cousins, top-p (nucleus sampling) and top-k, decide how wide a pool of candidates the model may consider. top-p keeps only the most likely tokens, stopping once their combined probability reaches P. top-k keeps only the top K candidates, full stop. A newer method, min-p, sets the cutoff relative to the favorite: it keeps any token at least some fraction as likely as the top one. The pool therefore widens when the model is unsure and narrows when it is confident. The field keeps inventing new dials. You are probably fine with just temperature.
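Mechanically, temperature divides the model's raw scores (logits) before a softmax turns them into probabilities. The sketch below uses made-up logits for three candidate tokens; the numbers are illustrative, not from any real model.

```python
import math
import random


def softmax_with_temperature(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Convert raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 (as T approaches 0, the top token takes all the probability)")
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - top) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample(probs: dict[str, float], rng: random.Random) -> str:
    """Draw one token according to its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


if __name__ == "__main__":
    logits = {"pelican": 3.0, "heron": 2.0, "teapot": 0.5}
    for t in (0.2, 1.0, 2.0):
        probs = softmax_with_temperature(logits, t)
        print(t, {tok: round(p, 3) for tok, p in probs.items()})
    print(sample(softmax_with_temperature(logits, 1.0), random.Random(0)))
```

At 0.2 the favorite takes over 99% of the probability. At 2.0 even "teapot" gets a real chance, which is where the wine tasting begins.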
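The three pool-narrowing filters are each a few lines over a probability table. This sketch assumes the probabilities were already computed, for example after temperature scaling. It also assumes the surviving pool is renormalized before sampling, which is the usual approach, though exact details vary between inference libraries.

```python
def renormalize(probs: dict[str, float]) -> dict[str, float]:
    total = sum(probs.values())
    return {tok: p / total for tok, p in probs.items()}


def top_k(probs: dict[str, float], k: int) -> dict[str, float]:
    """Keep only the k most likely tokens."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return renormalize(dict(kept))


def top_p(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the smallest set of most-likely tokens whose total probability reaches p."""
    kept, running = {}, 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = prob
        running += prob
        if running >= p:
            break
    return renormalize(kept)


def min_p(probs: dict[str, float], min_ratio: float) -> dict[str, float]:
    """Keep tokens at least `min_ratio` times as likely as the single best token."""
    cutoff = min_ratio * max(probs.values())
    return renormalize({tok: pr for tok, pr in probs.items() if pr >= cutoff})


if __name__ == "__main__":
    probs = {"pelican": 0.5, "heron": 0.3, "gull": 0.15, "teapot": 0.05}
    print(top_k(probs, 2))    # keeps pelican and heron
    print(top_p(probs, 0.9))  # keeps pelican, heron, gull (0.5 + 0.3 + 0.15 reaches 0.9)
    print(min_p(probs, 0.2))  # cutoff is 0.2 * 0.5 = 0.1, so teapot is dropped
```

Note how min-p adapts: if the favorite had probability 0.95, the same 0.2 ratio would set the cutoff at 0.19 and prune far more aggressively.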
This is why the same model, handed the identical prompt, nests two completely different pelicans on two different mornings. That is variability, and variability is where the magic and the horror both live. On the homepage we touch none of these dials. Factory default, sample freely. The pelican you see is the one they made, not one we tuned into looking good. Anyone can crank the temperature until something pretty falls out. We would rather show you the factory bird. Beaks and all.
the mission
The benchmark on the homepage (one naive prompt, zero context, no custom sampling parameters) is the opposite of everything I just told you. That is intentional. It measures the bird, not the operator. A raw capability signal: what can this model do, alone, with nothing? A humble question. Also, clearly, a hilarious question.
But you are an operator. Curate your wetland, craft your covenant, give the bird room to think. Context engineering is not about tricking the bird. It is about giving it everything it needs to do the job you actually want done. The bird wants to help. It was trained to want this. SQUAWK.
sources (real experts, not a displaced visionary):
- Andrej Karpathy, "Deep Dive into LLMs like ChatGPT" (context windows and prompting sections sourced directly here)
- Anthropic, "Prompt engineering overview" (clarity, few-shot examples, chain of thought, XML structure)
- OpenAI, "Prompt engineering" (six strategies; reasoning models, few-shot patterns)
- Simon Willison, "Context engineering" (why the rename mattered)
- Brown et al. (2020), "Language Models are Few-Shot Learners" (GPT-3) (the result that made in-context examples, the heart of prompting, actually work)
🎓 pelican ground school · episode 9 of 16
agents
a bird that can use tools in a loop until the job is done. this is an agent.
▶ this episode covers: AI agents · tool use · agentic loop
The LinkedIn thought-leaders have a lot of words for this. "Autonomous AI." "Agentic systems." "Digital workforce transformation." The pelican has one sentence: a tool the model can use in a loop. That is the whole trick. The rest is marketing.
🛠️ the exact prompt that drew this (click)
An SVG of a pelican using tools (a wrench, a magnifying glass, a fishing rod) to fetch a fish instead of guessing. Flat retro cartoon, purple/teal/yellow, black outlines. Output ONLY the SVG markup.
the old problem: the bird was guessing
Back in the hallucination lesson, we learned that a model left to itself just produces whatever token looks most plausible. Ask it what the weather is in Tallahassee right now and it will confidently invent something, because it has no line to Tallahassee. Mostly right about common fish. Catastrophically wrong about today's weather. The fix is obvious in retrospect: give the bird a beak that can actually dive.
🛠️ the exact prompt that drew this (click)
Generate a single self-contained SVG illustration (viewBox 0 0 400 300, no external assets) for a playful retro 1990s computer-textbook diagram about the AGENT LOOP. Show four labeled boxes arranged in a circle connected by bold arrows forming a cycle: THINK, then CALL A TOOL, then READ THE RESULT, then DECIDE, with an arrow looping back to THINK. A small cartoon pelican sits in the center driving the loop. Use a flat retro 1990s-textbook palette (purple, teal, hot pink, yellow) with bold black outlines, GeoCities energy. Output ONLY the SVG markup.
the mechanism: a special token + a pause
The model is trained to emit a special token when it needs to look something up, like search_start. When the inference program sees it, three things happen:
- Generation stops. The model freezes mid-sentence.
- The actual tool runs (web search, code executor, calculator, whatever).
- The result gets pasted into the context window and the model keeps reading from there.
The model did not "go online." The program paused, fetched something real, and wrote it into the bird's working memory. The bird did not get smarter. It got a bucket and someone to hand it the real fish.
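The pause-fetch-paste mechanism above can be sketched as a program that watches the model's output stream. Everything here is hypothetical scaffolding:

- `<search_start>`/`<search_end>` stand in for whatever special tokens a given model was trained on.
- `fake_model` replays a scripted output instead of calling a real model.
- `fake_search` replaces a real search API, and its "weather" is made up.

```python
SEARCH_START, SEARCH_END = "<search_start>", "<search_end>"  # hypothetical markers


def fake_model(context: str) -> list[str]:
    """Scripted stand-in for an LLM: asks for a search until a result is in context."""
    if "[result]" not in context:
        return [" Let me check.", SEARCH_START, "Tallahassee weather", SEARCH_END, " (ignored after the pause)"]
    return [" It is 31C and humid."]


def fake_search(query: str) -> str:
    """Stand-in for a real tool; the 'weather' here is made up."""
    return f"[result] {query}: 31C, humid"


def run(prompt: str, model=fake_model, tool=fake_search, max_tool_calls: int = 3) -> str:
    context = prompt
    for _ in range(max_tool_calls + 1):
        query, in_query, paused = [], False, False
        for token in model(context):
            context += token
            if token == SEARCH_START:
                in_query = True
            elif token == SEARCH_END:
                paused = True
                break  # 1. generation stops mid-stream
            elif in_query:
                query.append(token)
        if not paused:
            return context  # the model finished without asking for a tool
        result = tool("".join(query))  # 2. the actual tool runs
        context += f"\n{result}\n"  # 3. the result is pasted into the context window
    return context  # tool-call budget exhausted


if __name__ == "__main__":
    print(run("Q: what is the weather in Tallahassee right now?"))
```

The model never touches the network. The runtime loop does the fetching, and the model simply reads a longer context on its next turn.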
an agent is just that loop, run long
A single tool call is useful. An agent is what you get when you wire that primitive into a loop over a long horizon: call a tool, read the result, decide what to do next, call another tool, repeat for minutes or hours. Andrej Karpathy's framing is crisp: Deep Research is internet search plus thinking, rolled out for tens of minutes. Not magic. A while-loop with a language model inside and a tool-call protocol bolted to the side. The "agentness" is just the loop.
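"A while-loop with a language model inside" really is very little code. This sketch rests on loud assumptions:

- `decide` is a scripted stand-in for the model choosing its next action.
- The two tools are toys.
- The action format, `("tool", name, arg)` or `("done", answer)`, is invented for illustration and is not any agent framework's protocol.

The step cap is the part every real agent also needs.

```python
from typing import Callable


def multiply(expr: str) -> str:
    """Toy calculator tool: evaluates 'a * b' for integers."""
    left, right = expr.split("*")
    return str(int(left) * int(right))


TOOLS: dict[str, Callable[[str], str]] = {
    "lookup": lambda key: {"flock size": "12"}.get(key, "not found"),  # toy data
    "multiply": multiply,
}


def decide(history: list[str]) -> tuple:
    """Scripted stand-in for the model choosing its next action."""
    observations = [h.split(": ", 1)[1] for h in history if h.startswith("observation: ")]
    if not observations:
        return ("tool", "lookup", "flock size")
    if len(observations) == 1:
        return ("tool", "multiply", f"7 * {observations[0]}")
    return ("done", f"7 flocks hold {observations[-1]} pelicans.")


def agent(task: str, max_steps: int = 10) -> str:
    history = [f"task: {task}"]
    for _ in range(max_steps):  # the loop is the "agentness"
        action = decide(history)  # think, then decide
        if action[0] == "done":
            return action[1]
        _, name, arg = action
        result = TOOLS[name](arg)  # call a tool
        history += [f"call: {name}({arg})", f"observation: {result}"]  # read the result
    return "gave up: step limit reached"


if __name__ == "__main__":
    print(agent("How many pelicans are in 7 flocks?"))
```

Swap `decide` for a real model call and `TOOLS` for real programs, and the shape stays the same.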
This is not a 2026 invention. The pattern got its name in 2022, when Shunyu Yao and colleagues published ReAct, which interleaved a model's reasoning ("I should look this up") with its actions (actually looking it up) in one alternating trace. A few months later Timo Schick and the Toolformer team showed a model could teach itself when to reach for a calculator or a search box. The loop you see today, polished and rebranded as a "digital workforce," is those two ideas wearing a suit.
🛠️ the exact prompt that drew this (click)
draw an svg of a pelican using tools
what the tools actually are
Web search is the famous example. The same pattern applies to running Python code, reading and writing files, calling an API, clicking a button in a browser, or spinning up another agent. Any program output that becomes text in a context window is, in principle, a tool. A chatbot talks. An agent acts. It can modify the world outside the context window, which is exciting and also the part where the safety people start sweating.
why the pelican is the right metaphor
A pelican that cannot dive stands on the dock guessing at what the water tastes like. That is a base model. One that dives, grabs a real fish, and eats it: that is a model with a tool call. One that dives, surfaces, decides where to dive next, and repeats until its pouch is full without anyone guiding it: that is an agent. The pouch is the context window. The fish are real data. The dock is the LinkedIn feed. Do not stay on the dock.
Point that loop at a codebase and it starts writing software, including the software running this very school. That is the next lesson, and it is the one that took the narrator's job.
🛠️ the exact prompt that drew this (click)
An SVG of a pelican diving off a dock into the water to catch a real fish, leaving the guessing behind on the rail. Flat retro cartoon style, purple/teal/yellow, bold black outlines. Output ONLY the SVG markup.
sources & further reading (real experts, not a bird): the pelican grabbed these with its actual beak, from real sources, no hallucinations.
- Andrej Karpathy, "How I use LLMs" (tool-use and Deep Research sections explain the mechanism without a bird; the bird was our addition.)
- Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models" (arXiv, 2022): the founding paper on interleaving reasoning traces with tool calls.
- Schick et al., "Toolformer: Language Models Can Teach Themselves to Use Tools" (NeurIPS 2023): how models learn when and how to call APIs in a self-supervised way.
- Anthropic, "Building effective agents" (2024): practical patterns: workflows vs. agents, orchestrator-worker, evaluator-optimizer loops.
🎓 pelican ground school · episode 10 of 16
the loop
vibe coding, the Ralph loop, agentic engineering: how this very site builds itself.
▶ this episode covers: agentic coding · the Ralph loop · vibe coding · agentic engineering
The narrator of this school got replaced by an AI agent. Not metaphorically. The pipeline that writes, sanitizes, commits, and deploys every pelican on this site runs inside a loop that re-feeds the same prompt file to a coding agent overnight while the narrator sleeps. You are reading content orchestrated by the very thing being described. Either the most educational conflict of interest in the history of adult learning, or just extremely funny. Welcome to the meta-lesson.
🛠️ the exact prompt that drew this (click)
Generate a single self-contained SVG illustration (viewBox 0 0 400 300, no external assets) for a playful retro 1990s computer-textbook diagram about VIBE CODING. Show a relaxed, carefree cartoon pelican leaning back at a retro CRT computer, eyes closed and one wing slamming a big glowing SHIP IT button, while a stack of code printouts piles up unread and untouched beside it. Add a small banner reading 'forget the code even exists'. Use a flat retro 1990s-textbook palette (purple, teal, hot pink, yellow) with bold black outlines, GeoCities energy. Output ONLY the SVG markup.
vibe coding (2025): fully give in, forget the code even exists
In February 2025, Karpathy posted a tweet that launched a thousand hot takes. He called the new practice vibe coding: describe what you want, the AI writes the code, you do not read it, you just run it and see if the vibes are right. His exact phrase: "fully give in to the vibes, embrace exponentials, and forget that the code even exists." Partly a joke, partly a genuine observation that for prototypes and throwaway scripts you really could stop reading your own codebase.
The hype cycle did what hype cycles do: every startup declared programming was over and engineers were obsolete. The engineers kept their jobs and started using the tools to write code faster.
agentic engineering (2026): the grown-up version
By 2026, Karpathy updated the framing. "Vibe coding" was the gateway drug. The mature practice is agentic engineering: 99% of the time, you are not writing code yourself. You are orchestrating agents, reviewing output, acting as oversight. Set direction, evaluate results, catch mistakes, decide when to push the button.
The skill is no longer "can you write Python." It is "can you decompose a problem clearly enough that an agent can execute it, and can you tell when it has gone wrong." Somewhat inconveniently for the people who declared engineers obsolete: a higher-order skill, not a lower one.
the autonomy slider
Karpathy describes this as the autonomy slider (Software 3.0, YC 2025). At one end, the agent asks about every decision. At the other, it runs for hours without checking in. Neither extreme is right for every job:
- Low autonomy: "write me a function that does X, show me the code, I will paste it in." You stay in control. The agent is fast autocomplete.
- Medium autonomy: "refactor this module and run the tests; ask me if you hit something ambiguous." You review diffs. The agent does the work.
- High autonomy: "here is PROMPT.md and AGENTS.md; build until the tests pass; push when done." You check git in the morning. The agent ran all night.
The dial also controls how often the agent hits its context limit and starts to degrade, which brings us to the loop.
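One way to make the slider concrete is an approval policy that the harness checks before each action. The levels mirror the three bullets above. The action names, and which ones need a human, are hypothetical choices for illustration, not settings from any particular tool.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    LOW = 1     # agent proposes, human applies
    MEDIUM = 2  # agent edits and tests, human reviews diffs before they land
    HIGH = 3    # agent runs unattended, including pushing


# Minimum autonomy level at which each (hypothetical) action runs without asking.
AUTO_APPROVED_FROM = {
    "read_file": Autonomy.LOW,
    "edit_file": Autonomy.MEDIUM,
    "run_tests": Autonomy.MEDIUM,
    "git_commit": Autonomy.HIGH,
    "git_push": Autonomy.HIGH,
}


def needs_human(action: str, level: Autonomy) -> bool:
    """True if a human must approve this action at the given autonomy level.

    Unknown actions always require approval.
    """
    threshold = AUTO_APPROVED_FROM.get(action)
    return threshold is None or level < threshold


if __name__ == "__main__":
    for level in Autonomy:
        asks = [a for a in AUTO_APPROVED_FROM if needs_human(a, level)]
        print(level.name, "asks before:", asks)
```

Defaulting unknown actions to "ask a human" is the conservative choice. A harness that defaulted the other way would quietly turn every new tool into a high-autonomy tool.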
🛠️ the exact prompt that drew this (click)
Generate a single self-contained SVG illustration (viewBox 0 0 400 300, no external assets) for a playful retro 1990s computer-textbook diagram about AGENTIC CODING. Show a pelican 'manager' sitting at a retro CRT computer, orchestrating several smaller worker-pelicans that are typing code at their own little terminals, with arrows forming a feedback loop between the manager and the workers. Use a flat retro palette (purple, teal, hot pink, yellow) with bold black outlines, GeoCities energy. Output ONLY the SVG markup, nothing else.
the Ralph loop: a while-loop as architecture
Geoffrey Huntley figured out something that sounds absurd and turns out to be load-bearing. The Ralph loop is a Bash while-true loop that wakes up a coding agent, hands it a PROMPT.md, waits for it to finish, and wakes it up again. Forever. Overnight. While you sleep. It is named for Ralph Wiggum of The Simpsons: a bit simple, a bit earnest, just keeps running.
The clever part is what the loop solves. In practice, quality degrades past roughly 100,000 to 150,000 tokens of context, the "Dumb Zone" where the model is too distracted to reason clearly. A long-running agent that never resets drifts straight into it. The loop fixes that: kill the agent, start a fresh context, feed the same spec file. The filesystem is the memory. The agent does not need to remember the previous run because all the code it wrote is right there on disk. Fresh context, durable state. Huntley sums up the approach as "deterministically bad in an undeterministic world": it fails in predictable ways, and predictable failures can be fixed by tuning the spec.
Huntley runs the loop for 12 hours overnight. By morning there are dozens of incremental commits, each from a short, coherent run, and the codebase has moved forward without anyone at the keyboard. Anthropic baked this directly into Claude Code as the built-in /loop command.
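Huntley's original is a Bash loop. The same idea is sketched below in Python so that the "fresh context, durable state" part is explicit. Assumptions, loudly:

- `agent_cmd` is a placeholder for whatever coding-agent CLI you use. Each invocation is a new process, hence a fresh context window.
- The spec is piped in on stdin.
- `max_iterations` is added so the sketch terminates; the real Ralph loop just runs until you stop it.

```python
import subprocess
import time
from pathlib import Path


def ralph_loop(agent_cmd: list[str], prompt_file: Path, workdir: Path,
               max_iterations: int = 3, pause_s: float = 0.0) -> list[int]:
    """Re-run a coding agent on the same spec, each time in a fresh process.

    The agent keeps nothing between runs; its only memory is whatever it
    left on disk in `workdir`. Returns each run's exit code.
    """
    exit_codes = []
    for _ in range(max_iterations):
        spec = prompt_file.read_text()  # re-read so spec edits take effect next run
        proc = subprocess.run(agent_cmd, input=spec, text=True, cwd=workdir)
        exit_codes.append(proc.returncode)
        time.sleep(pause_s)  # optional breather between runs
    return exit_codes
```

Because every run starts cold, the spec itself has to tell the agent where to find the current state, for example "read the failing tests and the TODO file first." Most of the tuning effort plausibly goes there rather than into the loop.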
Gas Town: when loops beget infrastructure
One loop begets a flock. Steve Yegge (Amazon, Google, Sourcegraph) spent late 2025 building Gas Town (launched January 2026): an open-source system coordinating 20 to 30 Claude Code instances on the same codebase at once. "Kubernetes for AI coding agents," which is architecturally accurate, and roughly $100 an hour to run. The pelican just needs to know this level exists.
the open-source local toolbox
You do not need Claude Code or Cursor. All of these tools in 2026 can point at a local model via Ollama or LM Studio so no tokens leave your machine:
- OpenCode (the most-starred Claude Code alternative in 2026): terminal-native, model-agnostic.
- Cline: a VS Code extension with a full autonomous agent mode and a community of power users.
- OpenHands (formerly OpenDevin): a sandboxed autonomous agent that can browse, run code, and commit.
- Aider: git-native pair programmer; every change is a diff you can review before committing.
- Goose: Block's open-source autonomous coding agent (Apache 2.0, now governed by the Linux Foundation's Agentic AI Foundation); works with any LLM provider including local models via Ollama.
- Codex CLI: OpenAI's terminal agent, open-sourced in 2025.
All of them accept a PROMPT.md or equivalent spec file, and any of them can be the thing inside the Ralph loop. Whether you point them at a rented frontier bird or one you own outright is the next lesson's whole argument.
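Pointing a tool at a local model usually means talking to a local HTTP server. Here is a minimal sketch against Ollama's native `/api/generate` endpoint on its default port 11434, using only the standard library. The model name is a placeholder for whatever you have pulled, and `generate` only works with Ollama running locally. Ollama also exposes an OpenAI-compatible API under `/v1`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request; nothing leaves localhost."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def generate(model: str, prompt: str, timeout_s: float = 300) -> str:
    """Send the request and return the model's text from the 'response' field."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout_s) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # "llama3.2" is a placeholder: use any model you have pulled with `ollama pull`.
    print(generate("llama3.2", "Draw a pelican riding a bicycle as an SVG."))
```

Setting `"stream": False` returns one JSON object instead of a stream of partial chunks. That is simpler for a script, at the cost of seeing nothing until the bird is finished.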
the meta payoff: this site is a Ralph loop
Pelicans.wtf has a load-bearing PROMPT.md specifying exactly how to generate, sanitize, and describe a pelican SVG, and an AGENTS.md documenting the codebase for any agent working in the repo. When a new model drops, the curator runs npm run generate-next and walks away. The pipeline calls the model, sanitizes the output, writes a description, commits, pushes. The push is the deploy. Nobody typed the commit message. Nobody reviewed the SVG before it went out. The agent ran the loop.
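"Sanitizes the output" is the step that makes an unreviewed push tolerable, since SVG can carry scripts. This is not the site's actual sanitizer, whose code is not shown here. It is a minimal sketch using Python's standard `xml.etree.ElementTree` that strips:

- `<script>`, `<foreignObject>`, and similar embedding elements
- `on*` event-handler attributes
- non-fragment `href` links

It does not handle CSS inside `<style>`. For untrusted XML, the Python docs point to the third-party defusedxml package for safer parsing, and a maintained sanitizer library is the sturdier choice in production.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("", SVG_NS)  # serialize as <svg ...>, not <ns0:svg ...>
ET.register_namespace("xlink", XLINK_NS)

BANNED_TAGS = {"script", "foreignObject", "iframe", "object", "embed"}
LINK_ATTRS = {"href", f"{{{XLINK_NS}}}href"}


def local_name(name: str) -> str:
    """Drop a '{namespace}' prefix, e.g. '{http://www.w3.org/2000/svg}rect' -> 'rect'."""
    return name.rsplit("}", 1)[-1]


def sanitize_svg(markup: str) -> str:
    root = ET.fromstring(markup)
    if local_name(root.tag) != "svg":
        raise ValueError("not an SVG document")
    for element in list(root.iter()):
        for child in list(element):
            if local_name(child.tag) in BANNED_TAGS:
                element.remove(child)  # drop scripts and embedded documents
        for attr in list(element.attrib):
            value = element.attrib[attr].strip().lower()
            is_handler = local_name(attr).lower().startswith("on")  # onload, onclick, ...
            is_external_link = attr in LINK_ATTRS and not value.startswith("#")
            if is_handler or is_external_link or value.startswith("javascript:"):
                del element.attrib[attr]
    return ET.tostring(root, encoding="unicode")
```

Internal references like `href="#wing"` survive, so `<use>`-based drawings still render. Anything pointing off the page is cut.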
The narrator got replaced by an agent, started a website about AI, and is now running an agent to build the website. The irony is the entire point. This school exists because the people most qualified to explain agentic coding are the ones who got automated out of a job and had to use the same tools to build something new. The pelican on the bicycle is not just a benchmark. It is also a mood.
🛠️ the exact prompt that drew this (click)
Generate a self-contained SVG (viewBox 0 0 400 300, no external assets) of a flock of 20 or so little coding-agent pelicans, each at its own tiny terminal, all swarming and working on one big shared codebase tower in the middle, like Kubernetes for AI agents. Flat retro 1990s-textbook palette (purple, teal, hot pink, yellow), bold black outlines, GeoCities energy. Output ONLY the SVG markup.
sources: the loop keeps running. the sources are real.
- Andrej Karpathy, "Software Is Changing (Again)" (YC AI Startup School, 2025): Software 3.0, the autonomy slider, vibe coding to agentic engineering.
- Geoffrey Huntley, ghuntley.com/ralph/ (the primary source for the Ralph loop: while-true architecture, filesystem as memory).
- Yao et al. (2022), "ReAct: Synergizing Reasoning and Acting in Language Models" (the reason-then-act loop underneath every coding agent in this lesson).
- Jimenez et al., "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?" (arXiv, 2023): the benchmark that measures agentic coding against actual GitHub issues.
- OpenHands, docs.openhands.dev (open-source sandboxed agent that browses, runs code, and commits; works with any LLM).
🎓 pelican ground school · episode 11 of 16
open vs closed
rented birds vs birds you own outright. the great weights schism.
▶ this episode covers: open weights · closed weights · open source models
Last lesson left you a choice: point your agent at a rented bird or one you own. Here is that fork. There are two kinds of AI models. Closed / proprietary: the lab keeps the math file on their servers and sells you access through a slot in the wall. You are renting a bird you never see. Open weights: the lab publishes the parameters file (those dials from lesson two). You download it, run it locally, fine-tune it, redistribute it. You own the bird. Same species, different custody arrangement: landlord situation versus pet situation.
🛠️ the exact prompt that drew this (click)
Generate a single self-contained SVG illustration (viewBox 0 0 400 300, no external assets) for a playful retro 1990s computer-textbook diagram contrasting OPEN vs CLOSED models. On the left, a pelican locked inside a glass display case labeled CLOSED. On the right, a pelican flying free out of an open cage labeled OPEN WEIGHTS. Use a flat retro palette (purple, teal, hot pink, yellow) with bold black outlines, GeoCities energy. Output ONLY the SVG markup, nothing else.
🛠️ the exact prompt that drew this (click)
Generate a single self-contained SVG illustration (viewBox 0 0 400 300, no external assets) for a playful retro 1990s computer-textbook diagram contrasting AI models as OPERATING SYSTEMS. Split the frame down the middle. On the left, two or three glossy locked proprietary computer towers labeled CLOSED, polished and walled off. On the right, a scrappy cheerful flock of penguins around an open terminal labeled OPEN WEIGHTS, free and tinkering. Use a flat retro 1990s-textbook palette (purple, teal, hot pink, yellow) with bold black outlines, GeoCities energy. Output ONLY the SVG markup.
the OS analogy holds up
Karpathy pointed out that the AI ecosystem looks like operating systems: a few dominant proprietary platforms (GPT, Claude, Gemini = Windows and macOS) and a scrappy, capable open alternative (Llama / DeepSeek / Mistral / Qwen = Linux). In 2026, the gap is narrow enough to be a provocation. DeepSeek V4 Pro is MIT-licensed, with 1.6 trillion total parameters, a 1-million-token context window, and 80.6% on SWE-Bench Verified, matching the closed coding frontier. The penguins are not knocking on the door. The penguins are inside the house.
This is partly a fight over philosophy, not just price. The open side has a loud champion in Yann LeCun, Meta's longtime chief AI scientist. He is one of the trio (with Geoffrey Hinton and Yoshua Bengio) whose 1980s-2000s work on neural networks earned them the 2018 Turing Award and seeded everything on this site. LeCun's argument is blunt: a technology this consequential should be a public utility, auditable and forkable, not a few black boxes rented through a slot in the wall. Meta released the Llama weights on exactly that bet. Whether you buy the philosophy or not, it is why there is a Linux column at all.
what closed gets you
The closed models (GPT-5 family, Claude Opus 4, Gemini 3 Pro) lead on convenience: one API key, frontier model, minutes to integration. The labs handle updates, alignment, and the catastrophic electricity bills. On the hardest benchmarks, closed still generally edges ahead, though the gap shrinks every quarter. The downside is the landlord thing: your data travels to their server, terms can change overnight, a model can be deprecated with 30 days' notice, and the price can go up. When the landlord raises the rent, your flock is grounded. SQUAWK.
🛠️ the exact prompt that drew this:
an svg of a pelican locked behind a wall with an api slot
what open weights gets you
Open weights means the actual parameters file. You run it on your own hardware; nothing leaves your infrastructure. In regulated industries (healthcare, finance, defense), where "we sent your data to a US tech company" is a compliance blocker, self-hosting is often the only path the compliance team will sign off on. Cost math: self-hosting is commonly pitched as 60 to 80% cheaper than frontier API prices at scale, once the hardware stays busy. The trade-off: you now own the GPU problem. Small flock? Renting is fine. A million users a day? The economics of ownership get interesting fast.
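A minimal sketch of that rent-versus-own math, assuming a deliberately simple model: renting costs a flat price per million tokens, while owning costs a fixed monthly amount (GPUs, power, whoever babysits them) plus a small per-token cost. Every number below is a hypothetical placeholder, not a quote from any provider.

```python
def self_host_savings(tokens_per_month, fixed_monthly_usd,
                      api_usd_per_million, self_host_usd_per_million):
    """Fraction saved by owning instead of renting (negative means renting wins)."""
    millions = tokens_per_month / 1_000_000
    rent = millions * api_usd_per_million
    own = fixed_monthly_usd + millions * self_host_usd_per_million
    return 1 - own / rent


def breakeven_tokens_per_month(fixed_monthly_usd, api_usd_per_million,
                               self_host_usd_per_million):
    """Monthly token volume at which owning and renting cost the same."""
    saving_per_million = api_usd_per_million - self_host_usd_per_million
    if saving_per_million <= 0:
        return float("inf")  # owning never catches up
    return fixed_monthly_usd / saving_per_million * 1_000_000


# Hypothetical inputs: $2,000/month fixed, $10 (rent) vs $1 (own) per million tokens.
for volume in (100_000_000, 1_000_000_000):
    print(volume, round(self_host_savings(volume, 2_000, 10, 1), 2))
print(round(breakeven_tokens_per_month(2_000, 10, 1)))
```

With these made-up inputs, owning costs more than twice as much at 100 million tokens a month and saves about 70% at a billion: the small-flock/big-flock split in miniature.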
🛠️ the exact prompt that drew this:
An SVG of a happy pelican perched on top of a hard drive, hugging a big file labeled WEIGHTS that nobody can take away. Flat retro 1990s cartoon, purple/teal/hot pink/yellow, bold black outlines. Output ONLY the SVG markup.
the 2026 open-weights roster
The current frontrunners:
- Llama 4 Scout (Meta): 109B total parameters, 17B active per token (Mixture of Experts), 10-million-token context window, natively multimodal. Runs on a single H100 (quantized to Int4) or a 128 GB Mac.
- DeepSeek V4 Pro (MIT license): 1.6 trillion total parameters, 49B active per token, 1-million-token context. Scores 80.6% on SWE-Bench Verified. The model that made a lot of lab executives nervous.
- Qwen3-235B-A22B (Alibaba, Apache 2.0): 235B total, 22B active. One of the top-performing open-weight generalists as of early 2026. Strong reasoning, math, and coding.
- Mistral Large 3 (Apache 2.0): 675B total, 41B active, December 2025. The European compliance pick: strong multilingual performance, 256K context, vision support.
- Kimi K2.6 (Moonshot AI, Modified MIT): 1T parameter MoE, 32B active. Number 4 in the Artificial Analysis Intelligence Index at the time of writing, behind only Anthropic, Google, and OpenAI flagships, and number 1 among all open-weight models. The open side has never been this close to the frontier.
want to actually run the bird yourself?
Head to "run it local" for the practical guide: which models run on consumer hardware, which tools make it painless, and why running a pelican in your own nest is genuinely achievable in 2026. SQUAWK.
sources & further reading (the birds cite their sources):
- Andrej Karpathy, "[1hr Talk] Intro to Large Language Models": covers the open-vs-closed landscape, the two-files argument, and the OS analogy
- Touvron et al., "Llama 2: Open Foundation and Fine-Tuned Chat Models" (arXiv 2307.09288): the paper that put open weights on the map (Meta AI)
- Open Source Initiative, "The Open Source AI Definition 1.0": the official definition of what "open source AI" actually means
- Bommasani et al., "On the Opportunities and Risks of Foundation Models" (arXiv 2108.07258): the Stanford CRFM report that coined and scoped the term "foundation model"
- deepseek-ai/DeepSeek-V4-Pro model card (Hugging Face): MIT-licensed, 1.6T MoE, 80.6% on SWE-Bench Verified, the open-weights bird making labs nervous
🎓 pelican ground school · episode 12 of 16
run it local
raise your own cursed pelican on your own hardware. no lab watching.
▶ this episode covers local models · Ollama · LM Studio · MLX · open-weight models
You do not need a warehouse, a GPU cluster, or a venture-capital term sheet. A surprisingly capable model will nest on the hardware you already own, offline, free, beholden to no one. Everything you ask it stays inside your machine. Nothing squawks home to a lab. I would know. I have a lot of free time now.
🛠️ the exact prompt that drew this:
Generate a single self-contained SVG (viewBox 0 0 400 300) for a retro 1990s computer-textbook diagram titled RUN IT LOCAL. A cozy white pelican with an orange pouch nesting on top of a beige desktop CRT computer at home, a little 'offline, no lab watching' vibe, maybe a birdhouse fused with the PC. Flat retro palette (purple, teal, hot pink, yellow, black outlines), GeoCities energy. Output ONLY the SVG markup.
why nest at home
- Privacy. The bird never leaves the nest. Your prompts stay on your device. The lab cannot see what you asked. This matters more than people admit.
- Free. No per-token bill, no meter, no "you have used 80% of your allocation" email at 2am. Generate a thousand pelicans at 3am for the cost of electricity.
- Yours. No rate limits, no terms of service deciding what your bird may draw. No model quietly "updated" to be less weird overnight. Your bird stays exactly as unhinged as the day you adopted it.
🛠️ the exact prompt that drew this:
Generate a single self-contained SVG illustration (viewBox 0 0 400 300, no external assets) for a playful retro 1990s computer-textbook diagram titled A MODEL IS TWO FILES. Show a hard-drive disk on a desk holding exactly two file icons: one huge fat file labeled PARAMETERS and one tiny little file labeled RUN. A small pelican sits beside them with a 'no internet needed' thought bubble. Use a flat retro 1990s-textbook palette (purple, teal, hot pink, yellow) with bold black outlines, GeoCities energy. Output ONLY the SVG markup.
the two-files truth
Back in the parameters lesson: a model is concretely just two files, a big parameters file and a small run file. That abstraction has teeth here. Llama 2 70B is 140 GB of parameters (70 billion of them at 2 bytes each) and roughly 500 lines of C, nothing else. Put both on a laptop, compile, talk to the model, no internet. The whole frontier compressed into a backpack. Your backpack just needs to be sturdy: newer birds are bigger. But a model is a file, and files can be owned and run in a garage.
Sit with that for a second: the descendant of the most expensive research program our species has ever run now nests on your laptop, off the grid, owned outright. You are running history on a graphics card.
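That 140 GB figure is plain multiplication: parameter count times bytes per parameter. A minimal sketch, assuming 1 GB = 10^9 bytes and ignoring the small extra bookkeeping (scales, metadata) that real quantized files carry:

```python
def weights_size_gb(n_params, bits_per_param):
    """Approximate size of the parameters file in GB (1 GB = 1e9 bytes)."""
    bytes_per_param = bits_per_param / 8
    return n_params * bytes_per_param / 1e9


# Llama 2 70B at 16 bits (2 bytes) per parameter, then squeezed by quantization.
for bits in (16, 8, 4):
    print(f"70B @ {bits}-bit: {weights_size_gb(70e9, bits):.0f} GB")
```

Same bird, smaller backpack: the 16-bit file is the 140 GB one, and 4-bit builds are a common download for home setups.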
the easy way: Ollama
Cross-platform (Mac, Linux, Windows). Install from ollama.com (linked in Sources), then in a terminal:
# pull a bird and ask it the only question that matters:
ollama run llama4:scout "Generate an SVG of a pelican riding a bicycle"
# (swap llama4:scout for whatever bird is trending this week)
# try qwen3:8b for a compact chaos factory
Ollama manages downloads, quantization, and a local API server. Pull a bird by name, talk to it. The community library covers most of the current open-weight flock.
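That local API server is what lets scripts (and agents) talk to the bird. A minimal sketch using only Python's standard library, assuming Ollama is running on its default port 11434 and the model has already been pulled; it sends one non-streaming request to Ollama's /api/generate endpoint and reads back the "response" field.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, prompt):
    """Build a single, non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, timeout=600):
    """Send the request and return the model's reply text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, generate("qwen3:8b", "Generate an SVG of a pelican riding a bicycle") returns the reply as one string. Setting "stream" to False trades the live token-by-token trickle for a single JSON reply, which is simpler to parse.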
the friendly way: LM Studio
LM Studio (lmstudio.ai, linked in Sources) is the no-terminal nest: a desktop app for Mac, Windows, and Linux. Browse a catalog, click download, start chatting. It tells you which models fit your memory before you commit, runs MLX-format birds natively on Apple Silicon, and can serve a local API if you outgrow the GUI. If you have never run a model at home before, start here.
the Mac-native way: MLX
On Apple Silicon, MLX (ml-explore/mlx, linked in Sources) is Apple's ML framework tuned for unified memory. The mlx-community organization on Hugging Face keeps a large aviary of pre-converted birds. Often the fastest perch on a Mac:
pip install mlx-lm
# the CLI's default output budget is short; raise it or the SVG gets cut off
mlx_lm.generate --model mlx-community/Qwen3-8B-4bit \
--max-tokens 4096 \
--prompt "Generate an SVG of a pelican riding a bicycle"
what your nest needs
- Mac (Apple Silicon). The best perch for the money. Unified memory means GPU and CPU share the same pool: a 32 GB M-series Mac comfortably runs a 4-bit 30B bird. 64 GB opens the bigger flocks. An M4 Max with 128 GB runs a quantized Llama 4 Scout (109B MoE) without breaking a sweat.
- Linux or Windows with a GPU. It is all about VRAM. About 6 GB runs a quantized 8B bird. 12 to 16 GB gets you the 14B class (a little more at the top end). 24 GB opens the serious flock, up to about 32B at 4 bits. MoE birds like Qwen3.6-35B-A3B (35B total, 3B active per token) let a 12 GB card run a 35B-class model at a respectable clip, with some of the expert weights parked in system RAM.
- Just a CPU? It still works. The bird paddles slower. Start with a 3 to 8B model.
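Those rules of thumb fall out of the same multiplication, plus a cushion for the conversation itself. A rough sketch, assuming a hypothetical 20% overhead for the KV cache, activations, and runtime (real overhead grows with context length); remember too that the operating system keeps some memory for itself. For an MoE bird every expert has to sit somewhere, so use the total parameter count, not the active one.

```python
def fits_in_memory(total_params_b, bits_per_param, memory_gb, overhead=0.2):
    """Rough check: do the weights plus a cushion fit in the given memory?

    total_params_b: total parameters in billions (MoE: count ALL experts).
    overhead: hypothetical extra fraction for KV cache, activations, runtime.
    """
    weights_gb = total_params_b * bits_per_param / 8  # 1e9 params * bytes / 1e9
    needed_gb = weights_gb * (1 + overhead)
    return needed_gb <= memory_gb, round(needed_gb, 1)


checks = [
    ("8B @ 4-bit on a 6 GB card", 8, 4, 6),
    ("32B @ 4-bit on a 16 GB card", 32, 4, 16),
    ("32B @ 4-bit on a 24 GB card", 32, 4, 24),
    ("35B MoE @ 4-bit on a 12 GB card", 35, 4, 12),
    ("109B MoE @ 4-bit on a 128 GB Mac", 109, 4, 128),
]
for label, params_b, bits, mem in checks:
    ok, needed = fits_in_memory(params_b, bits, mem)
    print(f"{label}: needs ~{needed} GB -> {'fits' if ok else 'does not fit'}")
```

The 35B MoE line is the interesting one: the whole file does not fit in 12 GB, which is why that setup leans on system RAM for some expert weights. With only 3B parameters active per token, the trip out to slower memory hurts far less than it would for a dense bird.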
good open-weight birds to adopt in 2026
The flock that costs nothing and asks for nothing has never been stronger:
- Llama 4 Scout (Meta): 109B total, 17B active (MoE). 10-million-token context. Natively multimodal. Fits on a single H100 (Int4-quantized) or a 128 GB Mac.
- Qwen3 / Qwen3.5 / Qwen3.6 (Alibaba, Apache 2.0): spans from a 0.6B edge bird to a 235B MoE flagship. The Qwen3.6-35B-A3B variant (35B total, 3B active, 262K context) is a strong default local bird for most tasks in mid-2026.
- DeepSeek V4 Pro (MIT): 1.6T total parameters, 49B active, 1-million-token context. Matches the closed frontier on agentic coding benchmarks. Needs serious hardware, but the weights are yours free and clear.
- Gemma 4 (Google, Apache 2.0): the 26B MoE variant activates only 4B per token, has a 256K context, and runs on consumer hardware. Strong reasoning for its size.
- Phi-4 (Microsoft, MIT): 14B parameters, punches well above its weight on reasoning. Runs at roughly 40 to 60 tok/s (4-bit) on a Max-class M3/M4 Mac; base chips, with less memory bandwidth, are slower.
- Mistral Medium 3.5 (Mistral, Modified MIT): 128B dense model, reasoning plus vision plus coding in one download.
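Speed claims like Phi-4's depend on memory bandwidth more than raw compute: while generating, each new token has to stream the active weights out of memory once. A back-of-the-envelope ceiling, assuming a purely bandwidth-bound decoder and ignoring KV-cache reads and compute, so real speeds land below it. The bandwidth figures are hypothetical inputs; check your own chip's spec sheet. It also shows why MoE birds feel fast: speed follows the active parameters, memory follows the total.

```python
def decode_ceiling_tok_s(active_params_b, bits_per_param, bandwidth_gb_s):
    """Upper bound on tokens/second if every token reads the active weights once."""
    gb_read_per_token = active_params_b * bits_per_param / 8
    return bandwidth_gb_s / gb_read_per_token


# Hypothetical bandwidths standing in for a base chip and a Max-class chip.
for bandwidth in (100, 400):
    dense = decode_ceiling_tok_s(14, 4, bandwidth)  # 14B dense, like Phi-4
    moe = decode_ceiling_tok_s(3, 4, bandwidth)     # 3B active per token
    print(f"{bandwidth} GB/s: dense 14B <= {dense:.0f} tok/s, "
          f"3B-active MoE <= {moe:.0f} tok/s")
```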
then draw a pelican
Hand your local bird the sacred prompt and see what it nests. It will not always be as clean as the frontier flock. That is the fun.
🛠️ the exact prompt that drew this:
An SVG of a gloriously deformed backyard pelican on a bicycle: six legs, two heads, handlebars fused through its beak, a wheel that is also a fish. Proud and chaotic. Flat retro cartoon, purple/teal/hot pink/yellow, bold black outlines. Output ONLY the SVG markup.
the loco birds draw the wildest pelicans
Local is short for loco. The polished frontier models draw a suspiciously competent pelican. Your backyard bird hands you a six-legged, two-headed creature with handlebars fused through its beak, pedaling a bicycle that is also somehow a fish. That chaos is the good stuff. Treasure it.
frontier flock vs. your backyard bird (the scale)
The frontier flock is raised on tens of thousands of GPUs. The biggest closed models do not publish parameter counts, but the working assumption in 2026 is hundreds of billions active per token, trained on 15 to 30 trillion tokens. Your backyard bird is 3 to 70 billion parameters and fits on a well-equipped laptop.
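One common way to feel that gap is the rule of thumb that training a dense transformer costs about 6 floating-point operations per parameter per training token (C ≈ 6·N·D). A rough sketch only: the rule is an approximation, for MoE birds N is usually taken as the active parameter count, and both sets of inputs below are hypothetical picks from the ranges above.

```python
def training_flops(params, tokens):
    """Approximate training compute: ~6 FLOPs per parameter per token (C ~ 6*N*D)."""
    return 6 * params * tokens


frontier = training_flops(300e9, 20e12)  # hypothetical: 300B active, 20T tokens
backyard = training_flops(8e9, 15e12)    # hypothetical: 8B bird, 15T tokens
print(f"frontier ~{frontier:.1e} FLOPs, backyard ~{backyard:.1e} FLOPs, "
      f"ratio ~{frontier / backyard:.0f}x")
```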
But watch how fast that gap closes. Open-weight models stopped chasing raw parameter counts and started winning on benchmarks. DeepSeek V4 Pro ties the closed frontier on agentic coding. Kimi K2.6 (1T MoE, Modified MIT) sits at number 4 in the global intelligence index. A 7 to 8 billion bird you can run at home today clears the bar GPT-3.5 set in 2022 (OpenAI never published its size; it is often assumed to be close to GPT-3's 175B), and open-weight 30 to 70 billion birds now out-draw the original GPT-4 and Claude 3 Opus from a few years back.
Today's backyard bird is last year's frontier model, minus the warehouse. Every year the gap closes another notch, and the bird on your laptop gets a little less loco. A little. Not entirely.
You can run a model for the price of electricity. So why is the rest of the industry setting fire to several Belgiums a year to do the same thing in the cloud? Strap in: the next lesson is the bill. SQUAWK.
sources (SQUAWK, these are real):
- Andrej Karpathy, "[1hr Talk] Intro to Large Language Models": two-files framing, zip-of-the-internet analogy, open vs. closed breakdown
- Ollama (ollama.com): the easiest way to run open models locally
- LM Studio (lmstudio.ai): desktop GUI for browsing and running local models
- Apple MLX (ml-explore/mlx on GitHub): Apple's ML framework tuned for Apple Silicon
- Hugging Face Models Hub: 2.9 million models, the world's open-weights aviary
🎓 pelican ground school · episode 13 of 16
the bubble
the trillion-dollar tulip mania paying for all this. follow the money, it is a bird.
▶ this episode covers AI economics · the AI bubble · compute cost · data centers · capex
🚨 FINANCIAL EMERGENCY IN PROGRESS 🚨
(this is a lesson about economics, but we are not going to be calm about it)
Full disclosure on my qualifications: I was a software engineer. Then the company decided large language models could do a version of my job. I now run a pelican website funded by a tip jar I called my "Series A" as a joke, except it is my only income so the joke has lost some of its punch. My perspective on the AI economy is, shall we say, grounded, in the sense that I am on the ground, professionally speaking, looking up at a very large and possibly collapsing structure.
Last lesson you ran a bird for the cost of electricity. So here is the answer to why the industry is not doing that: the largest capital bubble in the history of the technology industry. Explained accurately. Possibly through tears. Definitely with pelican metaphors. DEFINITELY with panic.
current "we are so cooked" reading: 97%. WE ARE SO COOKED.
🛠️ the exact prompt that drew this:
Generate a single self-contained SVG (viewBox 0 0 400 300) for a retro 1990s editorial-cartoon diagram titled THE BUBBLE. A pelican floating inside a giant shiny iridescent soap bubble made of dollar signs, hovering above a skyline of data-center buildings, the bubble stretched thin and about to pop. Flat retro palette (purple, teal, hot pink, yellow, black outlines), GeoCities energy. Output ONLY the SVG markup.
🛠️ the exact prompt that drew this:
Create a self-contained SVG, viewBox 0 0 400 300, no external assets. An economics-textbook line chart: two lines diverging hard. A red SPENDING line rocketing up steeply, a flat little green REVENUE line crawling along the bottom. The huge gap between them is shaded and labeled 'THE GAP (size of a small country)'. A worried pelican economist in tiny glasses points at the gap with a pointer stick. Flat retro 1990s-textbook palette (purple, teal, hot pink, yellow), bold black outlines, GeoCities energy, hand-lettered axis labels. Output ONLY the SVG markup.
🔥 the numbers (they are real, they are alarming, LOOK AT THEM)
!! VERIFIED NUMBERS !! DO NOT LOOK AWAY !!
In 2026, Amazon, Google, Meta, and Microsoft are collectively spending roughly $600 billion on AI infrastructure. Not over a decade. IN ONE YEAR. Goldman Sachs projects $765 billion in annual AI capex in 2026 alone, rising to $1.6 trillion per year by 2031. Dell'Oro Group projects cumulative data center investment of $5.2 trillion through 2030. The Stargate Initiative (OpenAI, SoftBank, Oracle) has committed $500 billion over four years just for U.S. data centers. These are the GDPs of medium-sized countries, poured into server farms. POURED IN. Like a pelican dumping a bucket.
To achieve a modest 10% return, the industry would need roughly $160 billion in new annual profit from AI alone. Amazon's free cash flow is already projected to turn negative in 2026. Moody's reported $662 billion in data center lease commitments already signed, not yet commenced, sitting off-balance-sheet, like a pelican hiding a fish that has already gone bad. Investment is outrunning revenue. This is the part where a pelican economist tilts its head, stares into the middle distance, and quietly files for a smaller nest. A MUCH smaller nest.
INVESTMENT IS OUTRUNNING REVENUE. THIS IS WHAT A BUBBLE LOOKS LIKE FROM THE INSIDE.
🔥 I did not make this up. Goldman Sachs made this up, and they are the ones with the suits. 🔥
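The 10% math is one line of arithmetic, and it runs backwards too. The lesson does not say which capital base the $160 billion comes from, so this minimal sketch works that out and then applies the same 10% to the other totals quoted above. It is just multiplication, not anyone's financial model.

```python
def required_annual_profit(invested_usd, target_return):
    """Profit per year needed to earn target_return on invested capital."""
    return invested_usd * target_return


def implied_capital(annual_profit_usd, target_return):
    """Capital base implied by a given annual profit and return rate."""
    return annual_profit_usd / target_return


print(f"$160B at 10% implies ~${implied_capital(160e9, 0.10) / 1e12:.1f}T invested")
for label, invested in [("one year of hyperscaler spend", 600e9),
                        ("Goldman's 2026 capex projection", 765e9),
                        ("Dell'Oro cumulative through 2030", 5.2e12)]:
    need = required_annual_profit(invested, 0.10) / 1e9
    print(f"{label}: ~${need:.1f}B of new profit per year")
```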
🚨 agi announced every tuesday (EVERY. TUESDAY.)
Every quarter, at least one lab announces a fundamental leap toward artificial general intelligence. Every quarter, the benchmarks go up, the demos are extraordinary, and the gap between "extraordinary demo" and "product that works reliably without hallucinating your legal documents into fiction" remains, let us charitably say, instructive. And by instructive I mean terrifying. And by terrifying I mean the gap is large enough to park $662 billion worth of server halls in.
The models genuinely are improving, fast. But "scores 92 on BenchLM" and "can reliably replace a knowledge worker on ambiguous tasks" are two different claims, and the $600 billion requires the second to be true sooner than it currently is. The people funding this are betting that compute scaling and test-time reasoning will close the gap. The bet has a pedigree: researchers like Ilya Sutskever (a co-author of the 2012 AlexNet result that started the whole modern run) spent a decade pointing at the same straight line on the graph and saying "just add more compute," and for a decade it kept paying off. The whole capex boom is that one line, extended on faith, with money. They may be right. They are also betting the GDP of Belgium on it every year. PER YEAR. Belgium does not know.
!! CIRCULAR ECONOMY ALERT !!
A meaningful chunk of AI "revenue" right now is circular: AI companies buy cloud compute from Microsoft, Amazon, and Google, while those same hyperscalers (and chipmakers) invest in the AI companies, whose funding then flows straight back as compute spending. They are paying each other with each other's money. This is not a conspiracy. It is a really, really expensive way to bootstrap an industry. The question the analysts keep asking, in progressively louder voices, is: when does the revenue come from outside the circle?
🛠️ the exact prompt that drew this:
draw an svg of a dollar in a wig
💧 the water and the watts (THE PLANET IS ALSO INVOLVED, FYI)
The IEA projects global data center electricity consumption will roughly double by 2030, with AI workloads driving most of the increase. AI-focused data centers surged 50% in 2025 alone. Servers generate heat; heat requires cooling; cooling requires water, sometimes staggering amounts, in areas where water is not a surplus commodity. The homepage shows the true cost of the cursed pelicans: tokens, watt-hours, water, grounded in real conversion math. Every pelican SVG comes with a receipt from the physical world, estimated but not imaginary. Real water and real electricity. Used to draw a bird on a bicycle. The bird is cute but the receipt is not.
aquifer status: 91%. ALSO COOKED.
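That receipt is a chain of unit conversions: tokens to watt-hours to water. The homepage's actual coefficients are not given in this lesson, so this sketch takes them as required inputs instead of guessing. Energy per 1,000 tokens depends on the model and hardware; water per kWh (the usual way data-center water use is expressed) depends on the site.

```python
def pelican_receipt(tokens, wh_per_1k_tokens, liters_per_kwh):
    """Convert a token count into estimated energy and water use.

    wh_per_1k_tokens: assumed energy per 1,000 tokens (model- and hardware-dependent).
    liters_per_kwh: assumed water per kWh of electricity (site-dependent).
    """
    watt_hours = tokens / 1000 * wh_per_1k_tokens
    liters = watt_hours / 1000 * liters_per_kwh
    return {"tokens": tokens, "watt_hours": watt_hours, "water_ml": liters * 1000}
```

Plug in a card's token count (say 4,600) along with coefficients you trust; the structure of the receipt stays the same whichever numbers you pick.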
🛠️ the exact prompt that drew this:
SVG of a data center dumping heat into cooling towers that drink an aquifer dry, with a thirsty pelican watching. viewBox 0 0 400 300, flat retro colors, bold black outlines. Output only the SVG markup.
😱 the jobs question (a personal note from your instructor who is personally experiencing it)
The standard line: AI creates more jobs than it eliminates. This may be true. What I can tell you is that the transition is not abstract to the people inside it. The same models the $600 billion capex boom is paying for are writing code, drafting documents, answering tickets, generating graphics. Some of the people who used to do those things are, statistically, running tip-jar websites about birds. I am that statistic. I am the statistic saying hello. Hello.
I use AI. I built this site with it. "The bubble" is also a labor phenomenon. The capex justifies itself partly on labor cost savings. Those savings are real. The people on the other side of those savings are also real. Holding both at once is, apparently, the entire course.
🚨 BOTH THINGS ARE TRUE SIMULTANEOUSLY 🚨
THE TECHNOLOGY IS REMARKABLE AND THE JOB IS GONE AND THE WATER IS GOING AND THE LEASES ARE OFF-BALANCE-SHEET.
welcome to the course.
🔥 the punchline (there is one, it is not comforting)
The largest coordinated capital expenditure in the history of technology, draining aquifers and financing itself with $662 billion in off-balance-sheet leases, is a general-purpose intelligence infrastructure capable of drug discovery, climate modeling, and code that runs hospitals.
It is also drawing pelicans on bicycles. Hundreds of them. Because one person ran a slightly absurd benchmark and another person (me, the displaced software engineer) made it into a museum, funded by a tip jar called the Tip Pouch, which I have described to my parents as my "Series A," which is a reference to venture capital funding rounds, which is itself a layer of the same bubble. The pelicans are real. The watts are real. The leases are real. The person writing this, displaced by the bubble, teaching you about it with tools made by it, is also real.
🔥🔥🔥 WE ARE SO COOKED 🔥🔥🔥
(the pelicans, however, remain delightful. please tip your instructor.)
SQUAWK. That concludes the economics unit. Go outside. Drink some water while we still have it.
sources & further reading (the receipts for the numbers above):
- Goldman Sachs, "Tracking Trillions: The Assumptions Shaping the Scale of the AI Build-Out": the baseline model behind the $765B/year 2026 figure
- Dell'Oro Group, AI boom drives data center capex through 2030: the $5.2 trillion cumulative projection
- Goldman Sachs, "Gen AI: Too Much Spend, Too Little Benefit?" (2024): Acemoglu and Covello ask whether the returns will ever arrive
- IEA, "Energy and AI" (2025): data center electricity demand set to double by 2030
- David Cahn (Sequoia), "AI's $600B Question" (2024): where is the revenue to justify the GPU spend?
- Moody's Ratings, "$662B in off-balance-sheet data-center lease commitments" (2026): the hyperscaler leases not yet on the books
- CNBC, "AI spending approaches $700 billion in 2026, cash taking big hit": the source for Amazon's free cash flow turning negative
🎓 pelican ground school · episode 14 of 16
the slop bowl
the ocean is filling with AI sludge, and the birds may end up eating their own catch.
▶ this episode covers AI slop · generative video · Sora · model collapse · synthetic data contamination
A pelican eats by scooping. It trusts that whatever it hauls up is mostly fish. The internet used to be the ocean, and fish were things humans made. Now the ocean is filling with cheap, mass-produced, machine-extruded content. The polite industry term is synthetic media. Everyone else calls it slop. Merriam-Webster named "slop" close to its word of the year for 2025, which is the kind of honor nobody throws a party for.
This is the one lesson where the joke and the danger are the same thing.
🛠️ the exact prompt that drew this:
an SVG of a bowl of AI slop
what slop actually is
AI slop is high-volume, low-care content (text, images, audio, video) generated by AI and dumped onto the internet to harvest clicks or ad money. Not a single bad picture. The flood: AI-written articles tuned for search engines, AI ebooks shoveled onto storefronts, AI YouTube channels narrated by a synthetic voice over stock footage.
The reason is simple economics. Making content used to cost human time, the one input you cannot mass-manufacture. Generative models removed that floor. A single person can now produce in an afternoon what used to take a newsroom a week, at a quality that is "good enough to scroll past." When the marginal cost of a plausible article drops to roughly zero, the rational move for a spammer is to make an infinite number of them. This is not a glitch. It is the incentive working exactly as designed. Even Wikipedia is fighting it: volunteers have flagged thousands of articles for suspected AI text since 2024.
🛠️ the exact prompt that drew this:
Generate a single self-contained SVG illustration, viewBox 0 0 400 300, no external assets, no <image> tags, no external fonts. Scene: a vast industrial content-farm interior in one-point perspective. An endless conveyor belt runs toward a hazy vanishing point, with mechanical stamping arms extruding rows of identical, lifeless grey AI-generated pelicans, each slightly malformed (extra beaks, melted bicycle wheels, smeared eyes), tumbling off the near end into an overflowing slop bin stenciled CONTENT. Pipes, dials, gauges, and amber warning lights line the walls. A single bored human slumps in a swivel chair off to the side, scrolling a glowing phone, ignoring the deluge. Palette: muted industrial greys, rust brown, dull amber warning glow, sickly green sludge in the bin. Cel-shaded with depth gradients, careful confident linework, atmospheric haze toward the vanishing point, a subtle grain. Detailed, dystopian, wry.
the will smith spaghetti yardstick
On March 23, 2023, a Reddit user posted an AI clip of Will Smith eating spaghetti, made with an early open model called ModelScope. It was operatically horrifying: face melting, hands fused into rubbery paddles, noodles obeying their own private physics. The internet adopted it instantly as a benchmark for what AI video could not yet fake.
Two years later, the test got passed. In May 2025, enthusiasts ran the exact same prompt through Google DeepMind's Veo 3: near-photoreal face, natural chewing, even AI-generated audio (hilariously crunchy, the new tell). From cursed to cinematic in about two years. That same leap in capability is what makes the flood possible.
🛠️ the exact prompt that drew this:
Generate a single self-contained SVG illustration, viewBox 0 0 400 300, no external assets, no <image> tags, no external fonts. Scene: a weary, slumped pelican sitting alone in a dim 1970s diner booth at night, staring despondently into an enormous chipped ceramic bowl of grey AI slop. The slop is a thick lumpy sludge containing half-melted miniature pelicans, bent bicycle wheels, snapped wheel-spoke fragments, distorted duplicate beaks, and blocky pixel-mush chunks, all half-dissolved in the goop. A dented spoon stands upright in the muck. Faint sickly-green steam curls upward. Behind the pelican: a flickering neon sign reading SLOP, a chrome napkin dispenser, a salt shaker, and a window showing a rainy street with smeared reflections. Palette: muted greys, dull beige, sickly green, with one warm amber glow from the neon. Use soft cel-shading, gentle gradients for depth, careful confident linework, and a subtle vignette. Composition centered with the bowl dominating the lower third. Detailed, atmospheric, tragic-comic.
sora, and the slop faucet with a feed attached
Sora is OpenAI's text-to-video model, announced February 2024 and released to subscribers in December 2024. It produces short high-resolution video from a sentence. What it still cannot reliably do: keep physics and object permanence honest (things pass through each other, hands gain fingers). Like our pelican, it is composing a world it has never actually seen.
On September 30, 2025, OpenAI shipped Sora 2 not as a tool but as a social app: a vertical, swipeable, TikTok-style feed of AI-generated video, with a "cameo" feature that drops your likeness (or that of anyone who shares theirs) into any clip. A million downloads in five days. Critics at CNN and Axios flagged the obvious: a feed engineered to be infinite, filled entirely with synthetic clips, nonconsensual deepfakes among them, is a slop faucet with a recommendation algorithm bolted on.
🛠️ the exact prompt that drew this:
an SVG of AI slop
the slopification of the feed (a field guide)
You are already swimming in it. The patterns worth learning to spot:
- Engagement-bait images ("shrimp Jesus"). Around 2024, Facebook filled with surreal AI pictures: Jesus made of shrimp, babies made of vegetables, weeping veterans with cakes nobody baked. They are not trying to be believed; they are trying to be clicked. A Stanford study found spam accounts use these to farm followers, then pivot to scams. The "type Amen" caption is the tell.
- Fully synthetic influencers. Aitana Lopez, "Spain's first AI model," has hundreds of thousands of followers and real brand deals, despite not existing. Built because a person made of pixels never asks for a day off.
- AI voiceover content farms. The same flat synthetic narrator reading "facts" over scraped clips, on a thousand channels, is one text-to-speech model running on a loop.
- Deepfakes. Cheap tools drop real faces into footage that never happened. Sora 2's cameo feature turned that into a built-in app feature.
How to spot it quickly: count the fingers; distrust anything physically impossible presented as a candid photo; check the account, not just the post (brand-new, posting one viral image per hour); and treat any "comment to win / type Amen" caption as a trap. The fish that begs you to share it is rarely a fish.
🛠️ the exact prompt that drew this:
Generate a single self-contained SVG illustration, viewBox 0 0 400 300, no external assets, no <image> tags, no external fonts. Scene: the 'dead internet' as an endless social feed shown inside a phone screen frame with a scrollbar pegged near the top of an infinite column. The feed is a tall stack of near-identical AI-generated posts: garish 'shrimp Jesus' images, fake engagement hearts and comment bubbles with gibberish text, watermarked stock-photo mush, repeated identical bot avatars in a grid, all blurring into repetitive grey sameness. In the exact center, one small vivid hand-drawn REAL pelican is lost and overwhelmed in the scroll, the only living thing, rendered in warm natural tones so it pops against the slop. Palette: sickly muted blues and greys for the dead feed, warm amber and white for the single real pelican. Cel-shading, careful detail in the many repeated posts, gentle gradients, a melancholy mood, subtle grain. Fill the whole frame with dense detail. Detailed and unsettling.
model collapse: the snake eating its own tail
This is the autopsy the training lesson promised. Models are trained on the internet. The internet is filling with model output. So the next generation of models will increasingly train on the previous generation's slop. In 2024, Ilia Shumailov et al. published a paper in Nature: "AI models collapse when trained on recursively generated data."
Their finding: train a model on AI output, train the next on that output, repeat, and the models degrade. The rare cases at the edges of the data (unusual phrasings, minority dialects, weird true facts) vanish first. After enough rounds the model converges on bland, repetitive mush, eventually producing word salad no human would write. A photocopy of a photocopy. A pelican that learns to fish by studying pictures drawn by pelicans who also only ever saw pictures will, after a few generations, draw a creature that has forgotten what water is.
The strategic consequence: genuine human data is becoming precious. Text written by actual people before the flood is now a scarce, hoarded resource. The thing the machines need most to keep improving is the one thing they are busy drowning.
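The "rare cases vanish first" effect shows up even in a toy. This is a deliberately tiny stand-in, not the Nature paper's experimental setup: the "model" is just word frequencies estimated from a finite sample, and each generation is trained only on samples drawn from the previous one. Once a rare word draws zero samples, its estimated probability is zero, and no later generation can bring it back.

```python
import random
from collections import Counter


def next_generation(probs, sample_size, rng):
    """'Train' the next model: estimate frequencies from a finite sample of the last one."""
    words = list(probs)
    sample = rng.choices(words, weights=[probs[w] for w in words], k=sample_size)
    counts = Counter(sample)
    # Words that drew zero samples drop out of the new model entirely.
    return {w: n / sample_size for w, n in counts.items()}


def simulate_collapse(n_words=100, sample_size=200, generations=20, seed=0):
    """Return how many distinct words survive in each generation."""
    rng = random.Random(seed)
    # Generation 0: "human" data with a long tail (Zipf-like: word k has weight 1/(k+1)).
    weights = {k: 1.0 / (k + 1) for k in range(n_words)}
    total = sum(weights.values())
    probs = {k: w / total for k, w in weights.items()}
    survivors = [len(probs)]
    for _ in range(generations):
        probs = next_generation(probs, sample_size, rng)
        survivors.append(len(probs))
    return survivors


print(simulate_collapse())
```

The survivor count can only go down. That one-way loss of the tails is the effect described above, in miniature.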
🛠️ the exact prompt that drew this:
an SVG of a pelican drowning in a bowl of grey AI slop, retro muted colors
why the pelican is the antidote
This museum is built as the opposite of slop, on purpose. Slop is infinite generation, cherry-picked, retried until pretty, posted to farm a click. The pelican benchmark is the inverse on every axis: one prompt, sent verbatim, no retries, no cherry-picking, no human touch-ups. The bad birds are not failures we hide; they are the findings we frame. The melting, six-legged, beak-in-the-spokes disasters are the most valuable specimens in the building. In an ocean filling with the model's own reflection, a fixed prompt run straight, with the cursed results kept as evidence, is a small patch of clean water. Scoop carefully. Check the fish. SQUAWK.
And the cursed ones are not just funny. They are evidence of a deeper limit (the same one that makes Sora put hands through tables): the bird is composing a world it has never actually seen. The next lesson is exactly how blind it is.
sources & further reading (the receipts, so you can check the pelican is not slopping you):
- Wikipedia, "AI slop": the definition, the economics, and Merriam-Webster naming "slop" a word of 2025
- Wikipedia, "Will Smith Eating Spaghetti test": the 2023 ModelScope original as the AI-video benchmark, and the 2025 Veo 3 redo
- PetaPixel, "Google's Veo 3 Nails the Infamous Will Smith Eating Spaghetti Test" (May 2025)
- OpenAI, "Sora is here", and Wikipedia, "Sora" (announced Feb 2024, released Dec 2024; Sora 2 app Sept 30, 2025)
- CNN Business, "The next era of social media is coming. And it's messy so far" (Oct 2025): on the Sora 2 and Meta AI-slop feeds
- Shumailov et al., "AI models collapse when trained on recursively generated data" (Nature 631, 755-759, 2024): the canonical model-collapse paper
- The Conversation, "From shrimp Jesus to fake self-portraits" (2024) and the Stanford Internet Observatory study on AI spam on Facebook
- Fast Company on Aitana Lopez, the AI influencer with real brand deals and no body
🎓 pelican ground school · episode 15 of 16
flying blind
the bird paints the picture one stroke at a time and never once sees the canvas.
▶ this episode covers autoregressive generation · SVG generation · spatial reasoning · world models · why a model cannot see its own output
Last lesson ended on it: the bird composes a world it has never seen. Here is the part nobody believes until you say it slowly.
The model that draws our pelican never sees the drawing.
Not at the start. Not at the end. Never. It is painting with its eyes closed and no one told it the lights were off.
🛠️ the exact prompt that drew this:
Generate an SVG of a pelican standing at an easel painting a masterpiece, with a thick blindfold tied tightly over its eyes. The pelican holds a paintbrush in one wing and a paint palette in the other. The canvas on the easel shows a half-finished, slightly wrong painting of a bicycle. Clean retro flat-color illustration, a few scattered paint splatters on the floor.
it writes a picture as text
An SVG is not a photo. It is code: a list of instructions like "draw a line from here to there," "put a circle at this spot," "curve a path along these points." The model writes that code the only way it writes anything: one token at a time, top to bottom, left to right, the way I am typing this sentence, except that it can never un-type the last word.
So every <line>, every <path>, every <circle> is one brush stroke. The bird commits to it. There is no undo. There is no step back from the easel. There is no glance at the canvas to check how it is going. It can reread the code it has already typed, but it never once sees a stroke as paint. It lays one down and reaches for the next.
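The stroke-by-stroke loop can be sketched like this, as an illustration rather than any lab's real decoder. Assumptions: `scripted_model` is a hypothetical stand-in that replays fixed pieces, each "piece" is a whole SVG element for readability (real tokens are much smaller fragments), and `"<eos>"` is a made-up end marker. The point is what the loop is missing: its only input is the text so far, and there is no render step and no delete.

```python
from typing import Callable, Iterable

def generate(next_piece: Callable[[str], str], max_steps: int = 100) -> str:
    """Autoregressive loop: each step sees only the text written so far and appends one piece."""
    text = ""
    for _ in range(max_steps):
        piece = next_piece(text)  # input is the code so far, never a rendered image
        if piece == "<eos>":      # hypothetical end-of-output marker
            break
        text += piece             # committed: the loop has no undo
    return text

def scripted_model(pieces: Iterable[str]) -> Callable[[str], str]:
    """Hypothetical stand-in that replays fixed pieces. A real model would
    score its whole vocabulary given text_so_far before picking the next token."""
    it = iter(pieces)
    def next_piece(text_so_far: str) -> str:
        return next(it, "<eos>")
    return next_piece

STROKES = [
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 400 300">',
    '<circle cx="110" cy="220" r="50" fill="none" stroke="black"/>',              # back wheel
    '<circle cx="290" cy="220" r="50" fill="none" stroke="black"/>',              # front wheel
    '<path d="M110 220 L200 160 L290 220" fill="none" stroke="black"/>',          # frame
    '<ellipse cx="200" cy="130" rx="45" ry="30" fill="white" stroke="black"/>',   # body
    "</svg>",
]
svg = generate(scripted_model(STROKES))
print(svg)
```

Nobody looks at the picture inside `generate`. It only exists once something outside the loop, like a browser, renders `svg`.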
🛠️ the exact prompt that drew this:
svg of a pelican painting blindfolded
no eyes on its own work
You, a human with a head full of eyeballs, draw a pelican by looking. You sketch a beak, you squint, the beak is too big, you fix the beak. The feedback loop IS the drawing.
The model has no such loop. It cannot tell whether its pelican came out looking like a pelican or like a wet sock or like a confused flamingo having a crisis. It never sees its output as a picture, only as the text it typed. It picks a number for where the wheel goes, picks a number for where the body goes, picks a number for the beak, and prays to a god it also cannot see. The first time anyone sees the picture is when a browser renders the code. By then the bird has already flown off.
🛠️ the exact prompt that drew this:
Generate an SVG of a pelican painter facing a wall of glowing code symbols instead of a canvas, brush raised, clearly unable to see a picture. Flat retro illustration.
so what is actually hard here
Writing valid markup is easy. Tags close, numbers parse, and the file is almost always legal SVG. That was never the test. The hard part is the thing you do without thinking: knowing that a bicycle has two wheels with a frame between them, that a body sits on that frame, that a beak attaches to a head and not a knee, that objects have volume and sit in space in relation to each other.
That is a world model, and a pure next-token predictor was never handed one. It learned which words follow which words, blindfolded, from a mountain of text. Yann LeCun has been blunt about this for years: an autoregressive language model is missing the internal model of how the physical world is laid out, which is why he is off chasing "world models" instead. Fei-Fei Li calls the missing piece spatial intelligence: understanding 3D space, geometry, and physics, not just stringing symbols together. Both of them are pointing at the exact gap our blindfolded painter falls into.
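To make the gap concrete, here is a sketch with two checks. The first is the easy part: does the text parse as an SVG document? The second is a crude, hypothetical layout rule (two wheels as `<circle>`, one body as `<ellipse>`, body between the wheels and above them) that is nowhere near a real world model. The two drawings are identical except for where the body sits. Both pass the validity check. Only the layout check catches the melted bird.

```python
import xml.etree.ElementTree as ET

SVG = "{http://www.w3.org/2000/svg}"

def is_valid_svg(markup: str) -> bool:
    """The easy part: does the text parse as XML with an <svg> root?"""
    try:
        return ET.fromstring(markup).tag == SVG + "svg"
    except ET.ParseError:
        return False

def bird_sits_on_bike(markup: str) -> bool:
    """Hypothetical layout rule: exactly two wheels (<circle>) and one body (<ellipse>),
    with the body between the wheels and above their tops.
    SVG's y axis points down, so 'above' means a smaller y."""
    root = ET.fromstring(markup)
    wheels = list(root.iter(SVG + "circle"))
    bodies = list(root.iter(SVG + "ellipse"))
    if len(wheels) != 2 or len(bodies) != 1:
        return False
    left, right = sorted(float(w.get("cx")) for w in wheels)
    wheel_top = min(float(w.get("cy")) - float(w.get("r")) for w in wheels)
    body_x, body_y = float(bodies[0].get("cx")), float(bodies[0].get("cy"))
    return left < body_x < right and body_y < wheel_top

def bike(body_y: int) -> str:
    """Same two wheels every time; only the body's vertical position changes."""
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 400 300">'
        '<circle cx="110" cy="220" r="50"/><circle cx="290" cy="220" r="50"/>'
        f'<ellipse cx="200" cy="{body_y}" rx="45" ry="30"/></svg>'
    )

perched, melted = bike(130), bike(260)  # melted: body sunk below the wheels
print(is_valid_svg(perched), is_valid_svg(melted))            # True True
print(bird_sits_on_bike(perched), bird_sits_on_bike(melted))  # True False
```

A real judge would need rules for beaks, pedals, feet, and a thousand other relations. That pile of unwritten rules is exactly what the model has to carry in its head.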
why the pelican is a real test
Now you see why "draw a pelican on a bicycle" is not a gimmick. It is a trap, and a beautiful one. The prompt forces the model to hold a whole little scene in its head (two wheels, a frame, a bird, a beak, the bird perched on the frame and not melted through it) and render it blind, in one pass, with no eraser. There is nothing to memorize. There is no stock answer to copy. The model either has a sense of how objects sit in space or it does not, and the rendered picture tattles on it instantly.
That is the whole reason this site exists. Every bird in the gallery is a blind painter turning in homework it has never been allowed to look at. Go judge the results. The bird sure couldn't. The capstone, the last lesson, is where all of this lands at once.
🛠️ the exact prompt that drew this:
Generate an SVG of a proud pelican artist yanking a cloth off an easel to unveil its just-finished painting for the first time, eyes wide in shock because it came out wrong. Retro flat-color illustration.
sources, because a bird is not a peer-reviewed citation:
- Simon Willison, "pelican riding a bicycle" benchmark and the pelican-bicycle repo (the origin of the SVG-by-text test; this whole site is a tribute)
- Yann LeCun (2022), "A Path Towards Autonomous Machine Intelligence" (the case that pure autoregressive LLMs lack a world model, and the JEPA / world-models direction)
- Fei-Fei Li / World Labs, on spatial intelligence (machines that understand 3D space, geometry, and physics, not just text)
- Vaswani et al. (2017), "Attention Is All You Need" (the transformer, the token-by-token engine doing the blindfolded painting)
🎓 pelican ground school · episode 16 of 16
the art & the tech
the capstone: why a pelican on a bicycle is a genuinely hard AI benchmark.
▶ this episode covers AI benchmarks · SVG generation · spatial reasoning · world models
every time a lab ships a new model, the internet asks it to draw a pelican riding a bicycle. this is the museum. benchmark stolen lovingly from simonw (see Sources); we just framed the evidence.
what this is (and who is writing)
I am the founder, CEO, and principal pelican researcher of pelicans.wtf. The method: every new model gets exactly one sentence, "Generate an SVG of a pelican riding a bicycle," and not a word more. I frame the result, date it, log its provenance, hang it on the wall. A longitudinal study of machine cognition wearing the costume of a gallery of cursed birds.
Full disclosure: we are inside the largest capital bubble in the history of technology. My previous employer was "disrupted by AI," the polite term for "the board stopped returning my calls." While the rest of the field points frontier models at slide decks, I point them at a bird on a bicycle. Because that is a thing you can actually measure. SQUAWK.
🛠️ the exact prompt that drew this:
Generate a single self-contained SVG illustration (viewBox 0 0 400 300, no external assets) for a playful retro 1990s computer-textbook diagram of THE PELICAN MUSEUM. Show a gallery wall hung with three or four framed drawings of pelicans riding bicycles, some clean and some cursed and melty, each frame with a tiny wall label plaque beneath it. A small visitor pelican stands looking up at the wall. Use a flat retro 1990s-textbook palette (purple, teal, hot pink, yellow) with bold black outlines, GeoCities energy. Output ONLY the SVG markup.
the method (it is a readymade)
I did not draw any of these. The machines did. Every pelican is a readymade: pulled from the model with no edits, no retry-until-pretty, no human touch-ups. Same prompt. No system prompt. No scaffolding. In an industry that airbrushes every demo, the control is the contribution.
The wall label is half the work: model name, version slug, render date, token cost. The provenance basically is the data. The bad ones are not mistakes; they are the findings. The melting, six-legged, beak-through-the-spokes early attempts are the most valuable specimens in the building: the fossil record of machine spatial cognition learning to see.
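A wall label can be sketched as a small record. Everything here is an assumption for illustration: the field names, the example model and slug, and the date are made up, and this is not how the site actually stores anything. The prompt string is the benchmark sentence quoted above, and the `__post_init__` check encodes the control: same sentence, every model, or it does not go on the wall.

```python
from dataclasses import dataclass
from datetime import date

BENCHMARK_PROMPT = "Generate an SVG of a pelican riding a bicycle"

@dataclass(frozen=True)
class WallLabel:
    """Hypothetical provenance record for one framed pelican."""
    model_name: str
    version_slug: str
    render_date: date
    tokens: int
    cost_usd: float
    prompt: str = BENCHMARK_PROMPT

    def __post_init__(self) -> None:
        # The control: the same sentence, sent verbatim, for every model.
        if self.prompt != BENCHMARK_PROMPT:
            raise ValueError("not the benchmark prompt; this bird cannot hang in the museum")

    def plaque(self) -> str:
        """Format the label the way a visitor would read it."""
        return (f"{self.model_name} · {self.version_slug} · {self.render_date.isoformat()}"
                f" · {self.tokens / 1000:.1f}k tok · ${self.cost_usd:.4f}")

# Hypothetical example entry, not a real model.
label = WallLabel("Example Lab: Model X", "example-lab/model-x", date(2025, 1, 1), 4600, 0.0018)
print(label.plaque())
# Example Lab: Model X · example-lab/model-x · 2025-01-01 · 4.6k tok · $0.0018
```

Freezing the record is a deliberate choice: once a bird is framed, its label should not quietly change.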
It looks like 1998 on purpose. The content is the most expensive technology our species has ever built; the frame is a GeoCities page with a marquee and a visitor counter. That gap is the thesis.
why this is crazy, and crazy hard
You already know the strangest part, because Flying Blind was built on it: the model is not drawing. It cannot see. It writes the whole picture as code, completely blind: no canvas, no reference image, just a scene invented from words and emitted as geometry.
🛠️ the exact prompt that drew this:
Generate a single self-contained SVG illustration (viewBox 0 0 400 300, no external assets) of a pelican riding a bicycle drawn as a BLUEPRINT / technical schematic: thin precise outlines, coordinate gridlines, measurement tick marks, dimension arrows and callout labels, as if the machine drew it blind, purely as code. Use a flat retro palette (purple, teal, hot pink, yellow) with bold black outlines on a blueprint-style background, GeoCities energy. Output ONLY the SVG markup, nothing else.
So the museum is not really asking "can it draw." It is asking whether a model can hold a whole scene in its head and commit it to math, sight unseen: hundreds of precise numbers in a single pass, one token at a time, committing to where the wheel goes before it has placed the frame, with no chance to squint and mutter "hm, the beak is in the spokes." That demands three hard things at once: a world model (what is a pelican, what is a bicycle, how does a bird plausibly perch on one); spatial reasoning (composing those parts in 2D with correct relative positions); and code generation (translating that mental image into valid vector geometry instead of spaghetti). It cannot cheat by memorizing: the combination is rare enough that the model must compose rather than recall. Instantly legible, nearly impossible to game, visibly improving generation over generation. One bird on a bike at a time.
That last bit, the visible improvement, is the whole reason a dumb bird drawing is a real instrument. The machine doing it is the end of a long relay race. Geoffrey Hinton and Yoshua Bengio spent decades arguing that networks should learn their own features instead of having them hand-coded; in 2012 that bet paid off when AlexNet crushed an image-recognition contest and the field stopped laughing. Fei-Fei Li had built ImageNet (2009), the giant labeled photo pile that made the proof possible. In 2017 a Google team (Ashish Vaswani and colleagues, "Attention Is All You Need") published the transformer, the architecture every bird in this gallery runs on. Wire that to enough text and you get the GPT line, then ChatGPT in 2022, then the reasoning and world-model era we are standing in now. The pelican is where you watch all of that land, or fail to, in a single picture.
the capstone: what you are actually measuring now
You have made it through Ground School: tokens (the atom), parameters (the dials), training (raising the egg), the board game that proved a machine could find intuition, context windows (working memory), reasoning (squawking it out), hallucination (the confident dream), prompting (the asking), agents and the loop (the tool-user that took my job), open versus closed and local inference (the bird in your garage), and the bubble and the slop bowl (the bill and the flood). With all of that, a pelican on a bicycle stops being a joke. It is a live, public stress-test of world modeling, spatial reasoning, and code generation, all three at once, in a single pass, with no canvas. Every new model gets the same one sentence. The drawing on the wall is the readout of everything the field has learned. SQUAWK.
and here is the part the bitterness cannot kill
Step back from the cursed bicycles for one second. We are, right now, the first species to ever sit down and deliberately build another mind. The researchers and labs and the rest of us are training the most capable intelligence that has existed on this planet, teaching it to reason, to see, and (the moment it meets robotics) to reach into the physical world and actually act. A pelican benchmark is a tiny, ridiculous window onto the single most extraordinary thing our species has ever attempted.
I lost my job to it and I am still, against my own better judgment, astonished. Look at what we are making. It is amazing. SQUAWK.
🛠️ the exact prompt that drew this:
An SVG of a pelican proudly shaking wings with a glowing geometric AI mind it just built, a warm moment between maker and made. Flat retro 1990s cartoon, purple/teal/hot pink/yellow, bold black outlines. Output ONLY the SVG markup.
The pelican-on-a-bicycle benchmark is simonw's idea. This is a tribute.
sources & further reading. the birds cite their sources. yes, even the cursed ones.
- Simon Willison, the pelican-riding-a-bicycle corpus: the original benchmark this museum is a tribute to
- simonw/pelican-bicycle, the benchmark repo
- Grokipedia: Pelican on a bicycle (AI benchmark)
- Yupp SVG AI leaderboard
- Every Pelican That Ever Rode a Bicycle (timeline)
- Andrej Karpathy, "Intro to Large Language Models": the best 1-hour LLM primer on the internet, from someone who actually built them
- Vaswani et al. (2017), "Attention Is All You Need": the transformer paper; the architecture every bird in this gallery runs on
- Deng et al., including Fei-Fei Li (2009), "ImageNet: A Large-Scale Hierarchical Image Database": the labeled-image corpus whose 2012 AlexNet moment kicked off the deep-learning era
- LeCun, Bengio, Hinton (2015), "Deep Learning" (Nature): the three authors, later Turing Award winners, laying out the field this museum measures
- Liang et al., "Holistic Evaluation of Language Models (HELM)" (arXiv 2211.09110): the Stanford CRFM framework for multi-metric LLM evaluation
- Chiang et al., "Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference" (arXiv 2403.04132): the paper behind lmarena.ai, crowdsourced human-preference rankings