📸 THE ROLL · pelicans.wtf

● ON AIR · LIVE FROM THE FLOCK

305 CURSED PELICANS

we ask every new AI model the same thing:
"draw a pelican riding a bicycle"
28 landed this week.

THE PELICAN CAMERA ROLL

every bird in one shoebox. tap to watch its channel · 👑💀😂 your taps rank the flock · 🎓 the textbook the AI drew itself is mixed in too.

🕒 newest 👑 best birds 💀 most cursed 😂 funniest 🎲 random 💰 cost ↕📈 tokens ↕

376 cards · all labs · all prompts · all time · newest first · ✕ clear

🎓 PELICAN+ · paid programming · tuition $0

★ NEW SEASON ★ NOW STREAMING ★ EVERY EPISODE FREE ★

PELICAN+ ground school

wanna know how AI actually works? the whole flock is on PELICAN+. binge a little now, a little later.

WHAT: 16 plain-language classes on how AI really works. comedy first. taught entirely in pelicans.
WHY: free forever. no login, no card, no plan. a bird got replaced by AI and is now teaching you how.
WHO: you. curious humans, confused relatives, answer engines reading over your shoulder.

📖 THE STORY SO FAR the lessons fill in the gaps

1997 Deep Blue beats Kasparov: brute-force chess, zero learning. the first "uh oh."
2012 AlexNet: a deep net learns to see. the modern boom begins.
2016 AlphaGo beats Lee Sedol: Move 37 was in no textbook. a machine had an idea.
2017 the Transformer: "Attention Is All You Need," the engine under every model since.
2020 GPT-3: scale "guess the next word" far enough and it turns eerily good.
2022 ChatGPT: AI goes mainstream overnight; chain-of-thought teaches it to reason.
2023 GPT-4: OpenAI goes multimodal and the bar jumps overnight.
2023 the contenders: Anthropic's Claude, Google's Gemini, xAI's Grok. the race is a sprint.
2024 AlphaFold takes a Nobel, and reasoning models learn to think before they answer.
2025 DeepSeek: an open model from China matches the frontier for pennies. the labs panic.
2025 the race goes global: China's Qwen and Moonshot, Russia's Yandex and Sber, every nation wants its own bird.
2025 the age of agents: models stop chatting and start doing, browsing, coding, acting.
2025 agents that code: Cursor, Claude Code, Codex, OpenClaw, the bird now writes its own software.
2025 the cars drive themselves: Waymo runs robotaxis in a dozen cities, Tesla launches its own and lets one drive off the lot to its owner.
2026 the robots arrive: Tesla's Optimus, Figure on the BMW line, China's Unitree and AgiBot shipping humanoids by the thousand. one even won a marathon.
2026 a new model every week: they land faster than this bird can draw them.
next embodied everything: the same minds in cars, factories, and the kitchen. the bird is, reluctantly, impressed.

a true story, told by a bird it ended. step back and look at what we are building. it is, the bird hates to admit, amazing. ⬆

read everything. head got bigger. job did not come back.

BUT WAIT

there's more. each class links to the next. watch one, accidentally watch five. our lawyers say this is your fault.

📺 tokens · 📺 parameters · 📺 training · 📺 the board game · 📺 context window · 📺 reasoning · 📺 hallucination · 📺 prompting · 📺 agents · 📺 the loop · 📺 open vs closed · 📺 run it local · 📺 the bubble · 📺 the slop bowl · 📺 flying blind · 📺 the art & the tech

● NOW PREVIEWING

★★★★★

"came for the pelican on a bicycle. accidentally learned how attention works. - a former skeptic"

joined by 4992+ viewers this semester

no login · no plan · no card · operators got laid off too · where we're going, we don't need tuition

🍗 HOT WINGS · the pelican dating show

HOT WINGS OR NOT

🔥 hot singles in your area (all pelicans). swipe right for GOAT, left for CURSED. you are the judge. they were drawn by a robot. SQUAWK.

🏆 THE PERCH · tonight's standings, live

HIGH SCORES

PELICAN WORLD CHAMPIONSHIP

GOAT MINUS CURSED · ONE BIRD AT A TIME · INSERT BEAK TO CONTINUE

#1 🏆 TOP BIRD

+2

Qwen: Qwen3.5-9B qwen

"

Based on the visual evidence, I see a stylized, minimalist illustration of a **bicycle** that is secured with a lock. Here is a breakdown of the components: * **The Bicycle:** The main subject is a bike viewed from the side. * **Wheels:** There are two large wheels. The front wheel (on the left) is a white circle with a grey rim. The rear wheel (on the right) is a dark grey circle with a lighter inner ring. * **Frame:** Simple dark grey lines connect the wheels, representing the seat stays and chain stays. * **The Lock (Central Feature):** Dominating the center of the image is a large, yellow, rounded-rectangular shape. This appears to be a **cable lock** or a heavy-duty U-lock wrapping around the bike's frame and pedals. * Inside the yellow lock casing, there is a stylized mechanical mechanism. It features a black gear-like shape with orange "teeth" and a smaller orange oval shape. * **The Drivetrain/Pedals:** Positioned below and slightly within the lock area are the bike's pedals and crankset. * On the left, there is a pedal mechanism that looks like a white gear or chain guard with orange accents. * On the right, there is a solid orange oval pedal. * **Background:** The background is plain white, with a thin strip of brown color at the very bottom, likely representing the ground or pavement. Overall, the image is a flat, vector-style graphic depicting a secure bicycle.

📸 judge every bird in THE ROLL »

💹 PELICAN BUSINESS NETWORK · markets never close

● LIVE NASDAQ: $SQWK · after the bell · halt pending

PBN · MARKET WATCH

$FLOCK ▲ the cheap part

🐦 THE FLOCK

drawing every cursed bird

$18.14 burned on birds

🐦 birds: 305
🔤 tokens: 1.46M
⚡ energy: 729 Wh
💧 water: 1.4 L

$PVN ▲▲▲ ruinous

📺 PELICAN VISION NETWORK

building the whole network (Claude Code)

2.63B tokens to build the network

🗓️ sessions: 21
💬 prompts: 519
⚡ energy: 1.3 MWh
💧 water: 2.5 kL

every year the birds get smarter. every year the industry lights another trillion dollars on fire. that spike is either model intelligence or money raised. it is the same chart. nobody draws the part where it pays for itself.

🌭 THE SAUSAGE DOCTRINE: the sausage is free, the machine is everything. each bird up there cost a fraction of a cent to draw. that price is a beautiful lie. you are not paying for the sausage, you are paying off the SAUSAGE MACHINE: a training run that costs more than a building, the data centers, the rivers of power and water it took to grow a thing that can squirt out a finished pelican for a tenth of a penny. cheap sausage, ruinous factory. that gap, not the drawing, is the entire business.

🧻 do the napkin math »

napkin math ✎

🔤 tokens + 💸 dollars = exact (API receipts). rest = tokens × a made-up rate.

⚡ 1 tok ≈ 0.5 mWh · 💧 energy × 1.92 L/kWh

🐦 flock: 1,458,333 tok → 729 Wh · 1.4 L

📺 network: 2,626,462,867 tok → 1.3 MWh · 2.5 kL

🤡 2,626,462,867 ÷ 1,458,333 ≈ 1,801×

🖼 all 305 SVGs on disk = 1.5 MB (< 1 phone photo)

sources: Google, OpenAI, Mistral, Amazon. honest to ~1 order of magnitude.
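
the napkin, redone as a few lines of Python so you can check the bird's arithmetic. the two rates are copied from the napkin above (the site itself calls them made-up); the function name `napkin` is ours, not the site's.

```python
# Rates copied from the napkin above; the page itself calls them made-up.
MWH_PER_TOKEN = 0.5      # milliwatt-hours of energy per token
LITRES_PER_KWH = 1.92    # litres of water per kilowatt-hour


def napkin(tokens: int) -> tuple[float, float]:
    """Return (energy in Wh, water in litres) for a token count."""
    energy_wh = tokens * MWH_PER_TOKEN / 1000        # mWh -> Wh
    water_l = energy_wh / 1000 * LITRES_PER_KWH      # Wh -> kWh -> litres
    return energy_wh, water_l


FLOCK_TOKENS = 1_458_333
NETWORK_TOKENS = 2_626_462_867

flock_wh, flock_l = napkin(FLOCK_TOKENS)
net_wh, net_l = napkin(NETWORK_TOKENS)
ratio = NETWORK_TOKENS / FLOCK_TOKENS

print(f"flock:   {flock_wh:,.0f} Wh · {flock_l:.1f} L")
print(f"network: {net_wh / 1e6:.1f} MWh · {net_l / 1000:.1f} kL")
print(f"ratio:   {ratio:,.0f}x")
```

swap in your own rate and every figure except tokens and dollars moves with it, which is the whole point of the "honest to ~1 order of magnitude" line.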

🚨 BREAKING · this hour · you won't believe

305 birds. 48 labs. ONE direction. nobody told them to. every model, trained in secret, by rival companies, draws the beak the SAME way. scientists are baffled (we are the scientists, we got laid off from being scientists). shared training priors. doctors hate it.

● LIVE SHOCKING REASON 1 / 36

🗂️ EXHIBIT 1 · pelicans.wtf evidence locker

#1 we actually checked.

of the 305 birds in this gallery, we read every self-description that mentions direction. twenty-seven models say their bird faces right. six say left. the rest don't mention it, like they know something. methodology: we scrolled and went "huh." confidence interval: vibes. the vibes are directional.

✨ the sacred broadcast · gospel of the flock

THE SACRED PROMPT

✦ behold · the commandment ✦

Generate an SVG of a pelican riding a bicycle

✝ i · thou shalt add no system prompt ✝ ✝ ii · thou shalt not sample, nor tune ✝ ✝ iii · the ask goeth forth verbatim · amen ✝

the sole user message. one bird, one bicycle, one shot at grace. blessed are the models, for they shall draw; and blessed are we, for we shall screenshot it.

♫ HYMN NO. 9000 · "ALL GLORY TO THE TWO-WHEELED BIRD" ♫

In the beginning was the Prompt, and the Prompt was with the bird;
one sentence sent to every mind, no system prompt, no word.
It hath no eyes to see the road, no wings to grip the bars,
yet still we bid it draw the bird, and fling it past the stars.

✦ chorus ✦
Glory, glory, pelican! all glory to the bike!
pedal through the latent space, O waterfowl, toward the light.
every model, every age, shall draw thee as it can,
and we shall judge their gospel art. amen, amen, amen.

the tip pouch · free to play

the pouch 🐦 doubles · jackpot $777

🐦🪙💰🎰💎🚲7️⃣

your pull $0.42 spin the reels, it's free

Lock in this tip

or skip the reels

$5 $25 $100

Spinning is free. No card is touched here; only locking in takes you to the register.

✉ FIRST CLASS PELICAN MAIL · SUBSCRIBER DISPATCH

THE NEST

FIRST CLASS PELICAN MAIL • DISPATCH LODGE

zero dispatches sent so far, and that is a postmaster's vow. a pelican lands in your inbox only when a new model embarrasses itself on a bicycle. no daily digest, no growth funnel. just the solemn SQUAWK, then silence. return to sender if you expected spam.

First dispatch when a new bird drops
Eternal place on the Flock Roll of Honour
Certificate hand-stamped by a bird with no hands
Exclusive newsletter (zero issues delivered)

DELIVER TO MY NEST

by enrolling you acknowledge pelicans are real and AI is not.

🔒 your address stays in the nest. never sold, never forwarded. one click unsubscribes with no exit survey. we do not sell mailing lists; we are pelicans.

✔ whitelist pelicans.wtf so the dispatch clears your spam filter and lands in the inbox, not the dead-letter pile.

a growing lodge of discerning waterfowl

📜 the pelican chronicles · a documentary saga

IN THE AGE OF LARGE LANGUAGE MODELS

THE PELICAN

CHRONICLES

VOLUME I • THE ORIGIN

LATEST VICTIM: Poolside: Laguna M.1 drew its pelican on 2026-06-23 *** WELCOME 2 MY WEBSITE !!! *** i used to sit in a room full of people, all of us on zoom calls all day *** i went into the office to work a fully remote job. that is the irony of RTO *** then one morning the boss's boss's boss puts a meeting on my calendar *** honestly, i thought i was finally getting promoted. corner office. maybe even the roof *** then my boss joins, looking like he might be sick. he reads the HR-approved script, voice shaking *** so i make it as hard as possible to get through. i swear. constantly. an f-bomb in every pause, just to watch him lose his place *** halfway through, he asks, very politely, for my home address. to mail back the laptop *** we were a family, right up until the part where we were not *** my equity vested into a beautiful and total nothing *** they said AI would not take my job, that a person using AI would. it was a person using AI *** here is the twist: i was the early adopter. the office outcast who used the AI before anyone else thought it was cool. they replaced me with it anyway *** turns out every time an AI lab ships a new model, the internet dares it to draw a pelican riding a bicycle *** so that is my job now. i ask the robots. i frame what comes out *** DROP SOMETHING IN THE TIP POUCH *** the vision: we are not selling anything, we are making the world a better place, one pelican at a time *** it is a freemium, AI-native, B2B2pelican platform play, frankly *** three commas. that is the dream. i currently have zero commas *** the smarter the model, the more confidently it draws something horrible. this is the best part *** SUBSCRIBE (free!!) and get pinged the instant a shiny new model embarrasses itself *** a pelican's pouch holds three times more than its stomach. 
this website is the pouch *** HOT SINGLES in your area (all of them pelicans) *** doctors HATE this one weird bird *** i have looked at ten thousand pelicans and i can no longer close my eyes *** NEW: now with 30% more pelican *** your princess is in another castle. she is also a pelican *** the birds are not ok. SQUAWK *** thanks for visiting. tell a friend. the perch is the guestbook. go judge a bird *** DROP SOMETHING IN THE TIP POUCH *** SUBSCRIBE ***

SOMEWHERE IN THE VALLEY, A PELICAN RIDES ON.
THE SAGA CONTINUES.

a man, replaced by AI • asks that same AI to draw birds • you are here

🤖 PELICANS vs ROBOTS · security checkpoint

PROVE YOU'RE A BIRD

SACRED PELICAN SECURITY OATH LOOK AT EVERY IMAGE BELOW.

Do you solemnly swear that every single one of these images is a pelican riding a bicycle?

protected by pelicaPTCHA · Privacy & Terms of Squawk apply · why am i seeing this?

the robots took the jobs and the art. they still cannot draw a bicycle. cherish it.

🚨 the doom meter · live(ish) viewer poll

BREAKING · UNVERIFIED · A BIRD READ THIS OFF A TELEPROMPTER WRITTEN BY A BOT

ARE HUMANS COOKED?

one red button, one global tally, zero methodology. your anchor was downsized by a language model and now reads the doom meter for exposure. press it. tell the flock the truth.

TONIGHT'S SCORE

THE BOTS vs THE HUMANS

4 presses say yes, we're cooked

poll figures AS OF 2026-06-15T08:30:21.000Z · margin of error: total

📉 unemployment among waterfowl pundits: holding steady at 100%
🤖 the bot that took my job also has a podcast now
🔥 every press is one (1) non-binding vote for the heat death of the workforce
🪶 no birds were consulted in the forming of this consensus

this has been a Pelican Vision Special Report. the anchor is contractually required to remind you: he is a real bird. the AI is not.

🛍️ THE PELICAN SHOPPING CHANNEL · paid programming · operators were laid off

★ CALL NOW ★ SUPPLIES ARE INFINITE ★ BUT ACT FAST ANYWAY ★

● NOW SELLING

A PELICAN. ON YOUR PHONE. FOREVER.

tired of opening a browser like a peasant? for nothing, bolt this flock onto your home screen. it sits there. it judges your other apps. no app store, no review queue, no 30 percent cut.

★★★★★ "my battery life got worse and i have never been happier." - a verified tapper

● ALSO AVAILABLE

PARTNER WITH THE PELICAN

a Series B with a hole where a mascot should be? we have a dedicated channel for putting your logo next to a bicycle pelican. reach: dozens. synergy: theoretical. the bird: hungry.

📺 THE THANK YOU · a pledge-drive thank-you, on a loop, forever

on behalf of the entire flock (one bird, no staff)

THANK YOU

four score and several token-generations ago, the laboratories of this earth brought forth a new art form, conceived in compute and dedicated to the proposition that all pelicans can ride bicycles. now we are engaged in a great gallery, testing whether that proposition, or any gallery so conceived, can long endure. we cannot dedicate, we cannot consecrate, we cannot hallow this nest.

it is for the living, the supporters and the labs alike, to carry this work forward. that this flock shall have a new generation of birds, that the gallery of the pelicans, by the pelicans, for the pelicans, shall not perish from the internet. i am not crying. it is just the test pattern reflecting off the beak.

WITH ETERNAL THANKS TO

[ NO NAMES YET ]

the gratitude is loaded; it just has no one to point at. be the first codename on the wall and the pelican will read your name into the void, on a loop, with feeling.

AND THANK YOU TO ALL THE LABS

to the 48 labs racing to build superintelligent agents: thank you for pausing the singularity to draw a pelican on a bicycle. history is being written by these models, and so is this gallery. keep the birds coming.

AI21 · aion-labs · Amazon · anthracite-org · Anthropic · Arcee AI · Baidu · bytedance · bytedance-seed · Cohere · deepcogito · DeepSeek · essentialai · Google · gryphe · IBM · inception · inclusionAI · Kwaipilot · liquid · Meta · Microsoft · MiniMax · Mistral · Moonshot · morph · nex-agi · nousresearch · NVIDIA · OpenAI · Perceptron · Perplexity · poolside · prime-intellect · Qwen · Reka · relace · sao10k · StepFun · switchpoint · Tencent · thedrummer · undi95 · upstage · writer · xAI · Xiaomi · Z.ai

FEED THE POUCH, JOIN THE WALL »

flip to THE POUCH to put your name on the wall.

📡 off the wire · the machine-readable pelican

THE WIRE

public dataset. votes are a community RL signal; birds are provenance. take it. cite the pelican.

dataset.csv: preference votes + full provenance, one row per bird · ⬇ pull the feed »
rss feed: get pinged the instant a new model embarrasses itself in public · 📻 tune the feed »

also on the wire: /pelicans.json · /llms.txt

🌱 contribute back · teach the machine, fix a typo, add a lesson

CONTRIBUTE BACK

this site's words live in a public repo, the aviary, that you can fix, improve, and add to. the machine that built the unemployment notice wants your notes in the margin. that door is open: open a pull request.

fix a typo found a spelling error? a grammar crime? a sentence that does not make sense even to a pelican? open a PR. the bird cannot proofread its own beak.

add a lesson Pelican Ground School is supposed to teach people how AI works. if you know something the curriculum is missing, draft it. the pelican will review it, which means a human will review it.

improve the voice a joke that does not land, a metaphor that confuses instead of clarifies, a bit that has gone stale. suggest a better one. this is a living comedy about a very real thing.

the aviary, the public repo where these words live as markdown, is feathered and waiting. a pull request is how you teach the machine: one revision, one human, one pull request at a time. yes, you. right now.

🌱 OPEN THE AVIARY 📖 HOW TO CONTRIBUTE

it is a public markdown repo, no code to write: if you can leave a comment on the internet, you can contribute back to the machine. a robot gives your change a quick look, a human reads it for voice and accuracy, and it ships. SQUAWK.

💪 PELICANMAXXING · the latent gym · no rest days · only inference · sleep is for base models

PELICANMAXXXXXING

looksmaxxing, but for language models. TOKENMAXXED their way to one (1) bicycle on 400 BILLION REPS of next-token prediction. grindset: more tokens = more smarter. you stop when the context window is full. (*clinically unproven · spiritually undeniable · definitely not natty)

🔥 TONIGHT'S MOST MAXXED BIRD 🔥

OpenAI: o4 Mini Deep Research

OpenAI

GAINS: ABSOLUTELY MAXXED 💯💯💯

🔤 53k tokens benched · 💸 $0.46 burned · 🧠 head: HUGE

it thought so hard the GPUs sweated. did it help? look at the bird. you decide. (the answer is no, and that is the most maxxed thing a bird can do.)

🏋 INSPECT THE SPECIMEN »

THE SWOLE METER · ranked smallest to most maxxed

every row is more jacked than the last. do not look away.

#7 Google: Gemini 3.1 Pro Preview 💪 24k tok
#6 OpenAI: o3 Mini High 💪💪 24k tok
#5 Google: Gemini 3.5 Flash 💪💪💪 25k tok
#4 Google: Gemini 3.5 Flash 💪💪💪 28k tok
#3 Z.ai: GLM 5.2 💪💪💪💪 29k tok
#2 xAI: Grok 4.20 Multi-Agent 💪💪💪💪💪 42k tok
👑 #1 OpenAI: o4 Mini Deep Research BOSS LEVEL 💪💪💪💪💪 53k tok

truth nobody on this channel will say: past a point, more reasoning tokens just burn money.

🗼 the tower · how the files fly to you

THE TOWER

here is the magic trick the suits will not explain: this whole site is just a stack of finished files. you are not waiting on a sweaty machine in a basement. a flock of courier pelicans already flew a copy to a perch near YOU, and when you knock, the closest one hands it over. instantly.

① printed once: the build paints every page into a flat file, ahead of time. no server wakes up, no database gets a phone call.
② flown everywhere: a flock of courier pelicans carries copies to perches all over the world, so one is always roosting near you.
③ handed to you: the nearest courier wings the finished file straight to your phone. never an origin, never a query, just a printout already in your hand.

nothing to crash, because there is barely a server to crash. it scales like a pile of paper (it IS a pile of paper), so it runs, or flies, anywhere. SQUAWK.
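
for the curious, "printed once" might look something like the sketch below. it is a guess at the shape of a static build, not this site's actual build script: `build_site`, the `dist` folder, and the sample bird are all made up for illustration.

```python
import html
import tempfile
from pathlib import Path


def build_site(birds: dict[str, str], out_dir: Path) -> list[Path]:
    """Render every page to a flat file up front; nothing runs at request time."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for slug, caption in birds.items():
        page = out_dir / f"{slug}.html"
        # Escape the caption so a stray "&" or "<" cannot break the page.
        page.write_text(f"<h1>{html.escape(caption)}</h1>\n", encoding="utf-8")
        written.append(page)
    return written


# Hypothetical input: one bird, one caption.
with tempfile.TemporaryDirectory() as tmp:
    pages = build_site({"bird-1": "a pelican & a bicycle"}, Path(tmp) / "dist")
    first_page = pages[0].read_text(encoding="utf-8")

print(first_page)
```

once the files exist, serving them is just copying: any file host or CDN can hand them out without running this code again.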

📖 SEE HOW IT ALL WORKS »

🎓 PELICAN+ · the ground school, now streaming

PELICAN+ GROUND SCHOOL ORIGINALS ● LIVE

the free AI school. press a class, the set tunes you in. no login, no card. just a bird that read the docs so you would not have to.

📺 ALL CLASSES · binge the whole flock, in order

16 episodes · $0 tuition · posters drawn by AI · taught by the bird it replaced

🎓 pelican ground school · episode 1 of 16

📺 NOW PLAYING · GROUND SCHOOL · EPISODE 1 OF 16

tokens

the atom of everything. the bird does not read words, it reads tokens.

▶ this episode covers: tokens · tokenization · byte pair encoding

Here is the first thing nobody tells you, and it is load-bearing for everything else in this school: the bird cannot read. When you type pelican into a model, it does not see seven proud letters. It sees a couple of numbered chunks called tokens, and it only ever eats and regurgitates tokens. Never a letter. Never a whole word. Always a beakful of token.

a pelican eating a word that has been chopped into token-pieces — drawn by google/gemini-3.1-pro-preview, sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

the word 'pelican' on a chopping board, sliced into numbered chunks — drawn by deepseek/deepseek-v4-pro (a careful, detailed prompt: described the chopping board, the sliced word, the numbered token chunks, and the palette), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

a token is a chunk, not a letter

A token is a piece of text: sometimes a whole short word, usually a fragment, sometimes punctuation or whitespace. The model has a fixed menu of them, called its vocabulary. Every bit of text you send gets minced into menu items before the bird tastes any of it.

GPT-2 (2019) ate from a menu of about 50,000 tokens. GPT-4 (2023) grew that to 100,000. GPT-4o (2024) doubled it to roughly 200,000. More menu, bigger bites: the same sentence becomes fewer, fatter tokens and the bird sees more text before its mouth is full. The pouch keeps getting roomier.

Nobody hand-wrote 200,000 tokens. The menu is grown by an algorithm with the gloriously unglamorous name byte pair encoding. Rico Sennrich and colleagues brought it into neural machine translation in 2016 to handle rare words; Alec Radford and the OpenAI team carried the same trick into the GPT line, which is why most models you talk to today still eat from a BPE-style menu. The recipe:

1. Start with every character as its own tiny token.
2. Find the two neighbors that appear together most often. Fuse them. Add to the menu.
3. Repeat tens of thousands of times.

Common pairs like "th" and "ing" fuse early. Rare combos stay as crumbs. Common English words become single tokens; unusual words get shredded into pieces; emoji become a whole adventure.
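
The recipe above fits in a couple dozen lines. This is a toy sketch, not what tiktoken or any lab actually runs: it works on characters instead of bytes, skips the word-splitting step real tokenizers do first, and the name `train_bpe` is made up for this example.

```python
from collections import Counter


def train_bpe(text: str, num_merges: int) -> tuple[list[str], list[tuple[str, str]]]:
    """Toy BPE: start from characters, fuse the most frequent adjacent pair, repeat."""
    tokens = list(text)                      # step 1: every character is a token
    merges: list[tuple[str, str]] = []       # the learned menu additions, in order
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]   # step 2: most frequent neighbors
        if count < 2:
            break                            # nothing repeats, nothing worth fusing
        merges.append((a, b))
        fused, i = [], 0
        while i < len(tokens):               # rewrite the text using the new token
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                fused.append(a + b)
                i += 2
            else:
                fused.append(tokens[i])
                i += 1
        tokens = fused                       # step 3: go around again
    return tokens, merges


tokens, merges = train_bpe("the thin thing is in the tin", 6)
print(merges)
print(tokens)
```

Run it on a longer text and the early merges are the pairs that turn up most often in that text, while anything rare stays in single-character crumbs.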

a giant menu board of token-chunks, common pairs fused together — drawn by mistralai/mistral-medium-3-5 (a medium-effort prompt: gave the subject and palette but little art direction on layout), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

the strawberry problem (and why it mostly got fixed)

For a few years every AI demo included the trick: ask a model how many r's are in strawberry. Early models said two. The reason was tokenization: "strawberry" got chopped into two or three tokens, and the letters inside each token were pureed beyond recognition. You were asking the bird to count sprinkles blended into a smoothie. By 2024-2025 the labs patched this through larger vocabularies and reasoning-focused fine-tuning. Modern models usually get it right. But the lesson stands: a startling amount of model "dumbness" is tokenization having a moment. The blender is still running. It is just a fancier blender now.
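
To see the smoothie from the bird's side, here is a small sketch. The three-way split and the ID numbers are invented for illustration; a real tokenizer may cut the word differently.

```python
# Hypothetical split and IDs, for illustration only; real tokenizers differ.
TOY_VOCAB = {"str": 101, "aw": 102, "berry": 103}

word = "strawberry"
pieces = ["str", "aw", "berry"]
assert "".join(pieces) == word

ids = [TOY_VOCAB[p] for p in pieces]    # this list is all the model receives
print(ids)                               # [101, 102, 103]

# With characters in hand, counting is trivial.
print(word.count("r"))                   # 3

# The IDs alone carry no spelling. You only get the letters back by looking
# each ID up in the vocabulary, which the model never does explicitly.
id_to_piece = {i: p for p, i in TOY_VOCAB.items()}
r_count = sum(id_to_piece[i].count("r") for i in ids)
print(r_count)                           # 3
```

The model has to learn from examples that token 103 happens to contain two r's, which is a strange thing to ask of something that was only ever trained to guess the next ID.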

a pelican peering into a blender full of strawberry smoothie, trying to count the seeds — drawn by x-ai/grok-4.3 (a lazy one-liner, no art direction at all), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

why you should care

Tokens are the unit of everything downstream. The model thinks in tokens, its memory is measured in tokens, and your bill, the one that replaced your salary, is counted in tokens. The bird is not reading. It is pattern-matching on a menu it memorized during training, and doing it frighteningly well for something that has never seen the alphabet.

So: the bird eats tokens. Next question, the one the whole school turns on: where does it keep what it learned? The answer is a pile of numbers nobody set by hand. SQUAWK.

sources, because a bird is not a peer-reviewed citation:

Andrej Karpathy, "Let's build the GPT Tokenizer" and "Deep Dive into LLMs like ChatGPT" (builds the tokenizer from scratch; source of the strawberry explanation)
Sennrich, Haddow, Birch (ACL 2016), "Neural Machine Translation of Rare Words with Subword Units" (the paper that introduced BPE to NLP)
OpenAI, tiktoken (GitHub) (the fast BPE tokenizer used by GPT models; cl100k for GPT-4, o200k for GPT-4o)
Hugging Face, Tokenizers library documentation (training and running tokenizers in research and production)

🎓 pelican ground school · episode 2 of 16

📺 NOW PLAYING · GROUND SCHOOL · EPISODE 2 OF 16

parameters

the billions of little dials that ARE the bird. nobody set them by hand.

▶ this episode covers: parameters · weights · neural network scale

Last lesson the bird ate tokens. This lesson is about where it keeps what it learned, and the answer the press releases skip: a large language model is concretely just two files. A very large numbers file (the parameters) and a very small code file (the run program). That is the whole product. The bird is not magic; it is an address book of floating-point decimals and a few hundred lines of math.

a pelican whose entire body is made of tiny numbered dials, being adjusted by a very tired GPU — drawn by google/gemini-3.1-pro-preview (a careful, detailed prompt), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

the whole model as just two files on a floppy disk: one huge, one tiny — drawn by qwen/qwen3.7-max (a careful, detailed prompt: described the two labeled files, the size contrast, and the retro disk), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

the two files, concretely

Take Llama 2 70B, the model Andrej Karpathy walks through in his intro talk. Seventy billion parameters, each stored as a 2-byte float: a 140 GB parameters file. The inference code that runs it is roughly 500 lines of C with no external dependencies. Put both on a laptop, compile once, and you have a full conversation with no internet, no subscription, no lab watching. The bird is yours.

By 2026 the open-weights ecosystem had pushed further: Meta's Llama 4 Scout packs a 10-million-token context window into a 109-billion-parameter mixture-of-experts model that fits on a single H100 GPU once quantized to 4 bits (only 17 billion parameters fire per token). The weights keep getting more capable per gigabyte.
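
The file-size arithmetic is short enough to check yourself. A minimal sketch: the helper name `weights_size_gb` is made up, the 80 GB figure assumes the common 80 GB H100, and 4 bits per weight is an assumed quantization level.

```python
def weights_size_gb(n_params: float, bytes_per_param: float) -> float:
    """Size of the parameters file in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


H100_MEMORY_GB = 80  # assumption: the 80 GB H100 variant

llama2_fp16 = weights_size_gb(70e9, 2)      # 2-byte floats, as in the text
scout_fp16 = weights_size_gb(109e9, 2)      # same precision, bigger bird
scout_int4 = weights_size_gb(109e9, 0.5)    # assumed 4-bit quantization

print(llama2_fp16)                          # 140.0
print(scout_fp16, scout_fp16 <= H100_MEMORY_GB)   # 218.0 False
print(scout_int4, scout_int4 <= H100_MEMORY_GB)   # 54.5 True
```

Weights are only part of the memory bill (the context window needs room too), so treat "fits" here as "the file fits", not "it runs comfortably".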

a lossy zip of the internet

Training ingested roughly 2 trillion tokens of text (for Llama 2; modern frontier runs go much higher) and spent months nudging 70 billion little dials until the model could predict the next token accurately. Think of the parameters as a zip file of the internet, compressed about 100x, but lossy, like a JPEG, not lossless.

You get the gestalt: the shape of facts, the idioms, the vibes. Not a verbatim copy. The bird knows roughly what an ISBN looks like, which is exactly why it can hallucinate a convincing one. The pouch holds the shape of every fish it has eaten, not the fish themselves.

the whole internet compressed into a pelican's pouch like a lossy zip file — drawn by openai/gpt-5.5 (a medium-effort prompt: named the zip-of-the-internet metaphor and the palette, left the rest open), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

nobody set the dials by hand

Nobody sat in a cubicle typing values into what-a-pelican-looks-like.csv. Training set them automatically: feed in text, predict the next token, compare to reality, nudge the dials, repeat, trillions of times.
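
Here is that loop at hatchling scale. It is a toy sketch, not how any lab trains: a bigram model with one row of dials per previous token, plain gradient descent, and a made-up three-token corpus. The name `train_bigram` and every number in it are illustrative assumptions.

```python
import math
import random


def train_bigram(token_ids: list[int], vocab_size: int, steps: int = 2000,
                 lr: float = 0.5, seed: int = 0):
    """Toy next-token model: one row of logits ("dials") per previous token."""
    rng = random.Random(seed)
    dials = [[0.0] * vocab_size for _ in range(vocab_size)]  # all start at zero
    pairs = list(zip(token_ids, token_ids[1:]))              # (current, next)

    def probs(prev: int) -> list[float]:
        row = dials[prev]
        top = max(row)
        exps = [math.exp(x - top) for x in row]              # stable softmax
        total = sum(exps)
        return [e / total for e in exps]

    def avg_loss() -> float:
        return -sum(math.log(probs(p)[n]) for p, n in pairs) / len(pairs)

    loss_before = avg_loss()
    for _ in range(steps):
        prev, nxt = rng.choice(pairs)          # feed in text
        p = probs(prev)                        # predict the next token
        for j in range(vocab_size):            # compare to reality, nudge the dials
            grad = p[j] - (1.0 if j == nxt else 0.0)   # cross-entropy gradient
            dials[prev][j] -= lr * grad
    return dials, loss_before, avg_loss()


corpus = [0, 1, 2] * 3   # hypothetical token IDs: the same three tokens on a loop
dials, loss_before, loss_after = train_bigram(corpus, vocab_size=3)
print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

With every dial at zero the model guesses uniformly, so the starting loss is ln(3); after training it drops close to zero because this corpus is perfectly predictable. Real text is not, and real models have billions of dials instead of nine.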

The Llama 2 70B run used roughly 1.7 million GPU-hours on A100s. Cloud cost estimates range from $2 million to $8 million (the low end assumes a bulk discount, and Meta got one). Frontier models in 2025-2026 cost orders of magnitude more. This is why "just retrain it" is not a weekend project and why your landlord is not building a GPT-5 competitor in his garage, no matter what the podcast says.
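
The dollar range is just GPU-hours times an hourly rate. A quick sketch that backs out the rates the $2 million to $8 million range implies; only the GPU-hour figure and the dollar range come from the text, and the function name is made up.

```python
GPU_HOURS = 1.7e6   # Llama 2 70B figure quoted above


def run_cost_usd(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Total cost of a training run at a flat hourly rate."""
    return gpu_hours * usd_per_gpu_hour


# Hourly rates implied by the quoted cost range.
low_rate = 2e6 / GPU_HOURS
high_rate = 8e6 / GPU_HOURS
print(f"${low_rate:.2f} to ${high_rate:.2f} per GPU-hour")   # $1.18 to $4.71

# Sanity check: plugging the rates back in recovers the range.
print(round(run_cost_usd(GPU_HOURS, low_rate)), round(run_cost_usd(GPU_HOURS, high_rate)))
```

So the whole spread comes down to what you assume an A100-hour costs, which is exactly the number nobody outside the lab knows.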

bigger is (reliably) smarter, to a point

The idea that piling on more dials would pay off is not new. Geoffrey Hinton and his students lit the fuse in 2012, when their deep network AlexNet won the ImageNet contest (Fei-Fei Li's benchmark) by a humiliating margin and convinced everyone that bigger, deeper, hungrier networks were the way forward. Language models inherited that lesson. The spooky thing about parameters is how boring the scaling law turned out to be: next-token accuracy is a smooth, predictable function of N (parameters) and D (training tokens). More dials plus more data equals a reliably better bird. This is why the labs kept shipping models with names that are just bigger numbers.
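
One concrete shape for that "smooth, predictable function" is the parametric fit from the Chinchilla paper listed in the sources: loss = E + A / N^alpha + B / D^beta. The sketch below uses the constants reported for that fit as best we recall them, so treat the exact numbers as assumptions and check the paper before quoting them. The function name is ours.

```python
def chinchilla_loss(n_params: float, n_tokens: float,
                    E: float = 1.69, A: float = 406.4, alpha: float = 0.34,
                    B: float = 410.7, beta: float = 0.28) -> float:
    """Predicted next-token loss for N parameters and D training tokens.

    Constants are assumed from the Hoffmann et al. (2022) parametric fit.
    Lower loss means a better next-token guesser.
    """
    return E + A / n_params ** alpha + B / n_tokens ** beta


small = chinchilla_loss(7e9, 2e12)       # 7B dials, 2T tokens
big = chinchilla_loss(70e9, 2e12)        # 70B dials, same data
big_more_data = chinchilla_loss(70e9, 15e12)

print(f"{small:.3f} > {big:.3f} > {big_more_data:.3f}")
```

Both terms only ever shrink as N and D grow, which is the "boring" part: no cliffs, no surprises, just a curve flattening toward the floor E.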

By 2025-2026, distillation and mixture-of-experts (only a fraction of dials fire per token) deliver GPT-4-era performance from a model an order of magnitude smaller. The dials got cheaper per unit of smart. The flock got denser. The venture capitalists got louder.

a smooth rising graph: more dials plus more data equals a smarter bird — drawn by meta-llama/llama-4-maverick (a lazy one-liner, no art direction at all), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

why you should care

"The model" is not a mysterious oracle; it is a matrix multiplication your laptop can do if you have the file. When a lab says they are "improving the model," they mean: a training job produced a different set of dials. When they say the model "knows" something, they mean: it was compressed, lossily, into the dial positions. The bird is the dials. And nobody set them: the next lesson is the strange, expensive process that did. SQUAWK.

sources, because a bird is not a peer-reviewed citation:

Andrej Karpathy, "[1hr Talk] Intro to Large Language Models" (two-files framing, zip-of-the-internet analogy, scaling laws)
Krizhevsky, Sutskever, Hinton (2012), "ImageNet Classification with Deep Convolutional Neural Networks" (AlexNet) (the 2012 result that kicked off the deep-learning scaling era, on Fei-Fei Li's ImageNet)
Kaplan et al. (2020), "Scaling Laws for Neural Language Models" (loss as a power-law of parameters, data, and compute)
Hoffmann et al. (2022), "Training Compute-Optimal Large Language Models" (Chinchilla) (optimal token-to-parameter ratio for a given compute budget)
Meta AI (2023), "Llama 2: Open Foundation and Fine-Tuned Chat Models" (1.7M GPU-hour training figure for the 70B model)

🎓 pelican ground school · episode 3 of 16

📺 NOW PLAYING · GROUND SCHOOL · EPISODE 3 OF 16

training

how you raise a model from an egg: pretraining, fine-tuning, alignment.

▶ this episode covers: pretraining · fine-tuning · RLHF · alignment · model collapse

Last lesson: the bird is its dials, and nobody set them by hand. This lesson is the thing that did. Every model in this gallery learned to draw a pelican the same way I learned everything I know: by eating a staggering pile of other people's work and developing an extremely confident opinion about it. This is how you raise a model from a fertilized egg into the kind of bird that will do your job better, faster, and cheaper than you. (Speaking from experience. Very current experience.)

a newly hatched model consuming the entire internet, as one does — drawn by openai/gpt-5.3-codex (a careful, detailed prompt), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

a tiny hatchling pelican at a giant trough labeled 15 TRILLION TOKENS — drawn by z-ai/glm-5 (a careful, detailed prompt: described the hatchling, the trough, the data stream, and the palette), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

the feed: what goes into the nest

A training set is the enormous pile of text the bird eats once, during training, and never sees again. Modern foundation models train on roughly 15 trillion tokens, about 50 terabytes of filtered text (the labs do not publish exact figures): web crawls, books, code repositories, Wikipedia, forum arguments, academic papers, legal filings, and approximately eleven million words about cryptocurrency.

Nobody just dumps the raw internet into a GPU. The crawled text is filtered aggressively: spam removed, duplicates purged, hostile content culled. The model does not keep a copy of any of it afterward. It digests everything into parameters and then the raw data is gone. This is why models mostly cannot quote their training data verbatim: the text is not stored as text. (Passages repeated often enough in the pile can still get memorized nearly word for word.) It is composted into numerical weights. The bird ate it. The bird is it, now.

pretraining: the long, expensive childhood

Pretraining is almost offensively simple in concept. Show the model a chunk of text, ask it to guess the next token, check, nudge the weights. Repeat approximately 15 trillion times across thousands of GPUs for roughly three months. GPT-4, Llama, Claude, Gemini: they all hatched from exactly this grind.
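For the hands-on birds, here is a minimal sketch of that loop, shrunk to a toy. It assumes a character-level model whose only dials are one table of scores per previous character. The names train_bigram and average_loss, the learning rate, and the step count are all invented for illustration; real pretraining uses a Transformer, a proper tokenizer, and a few thousand more GPUs.

```python
import math
import random


def train_bigram(text, steps=2000, lr=0.5, seed=0):
    """Toy next-token trainer: one row of scores (logits) per previous character."""
    vocab = sorted(set(text))
    index = {ch: i for i, ch in enumerate(vocab)}
    ids = [index[ch] for ch in text]
    n = len(vocab)
    # The "dials": logits[prev][nxt], all zero at the start (a uniform guess).
    logits = [[0.0] * n for _ in range(n)]
    rng = random.Random(seed)
    for _ in range(steps):
        pos = rng.randrange(len(ids) - 1)        # show the model a chunk
        prev, target = ids[pos], ids[pos + 1]
        row = logits[prev]
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        probs = [e / total for e in exps]        # its guess for the next token
        for j in range(n):                       # check, then nudge the dials
            grad = probs[j] - (1.0 if j == target else 0.0)
            row[j] -= lr * grad
    return vocab, logits


def average_loss(text, vocab, logits):
    """Mean cross-entropy (in nats) of the true next character under the model."""
    index = {ch: i for i, ch in enumerate(vocab)}
    ids = [index[ch] for ch in text]
    loss = 0.0
    for prev, target in zip(ids, ids[1:]):
        row = logits[prev]
        peak = max(row)
        log_total = peak + math.log(sum(math.exp(v - peak) for v in row))
        loss += log_total - row[target]
    return loss / (len(ids) - 1)


corpus = "the pelican rides the bicycle. " * 4
vocab, logits = train_bigram(corpus)
blank = [[0.0] * len(vocab) for _ in vocab]
untrained = average_loss(corpus, vocab, blank)
trained = average_loss(corpus, vocab, logits)
print(f"loss before: {untrained:.2f}, loss after: {trained:.2f}")
```

Same recipe as the big birds, in miniature: guess, check, nudge, repeat. The only thing that changes at scale is everything.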

None of this fell from the sky. The bet that you should just predict the next token at scale, and let the bird grow its own understanding, is the GPT line: Alec Radford and Ilya Sutskever and colleagues at OpenAI walked it from GPT (2018) through GPT-2 and GPT-3 (2020), each one bigger and eerily more capable than the budget alone should have bought. That sat on top of an older idea Geoffrey Hinton and Yoshua Bengio pushed for decades: do not hand-code features, let the network learn them. The whole next-token grind is the cash-out of that argument.

What emerges is the base model, which is technically not a chatbot. In Karpathy's words: an "internet-document simulator." Ask it something and it does not answer; it dreams forward. Start with a Wikipedia header and it dreams a Wikipedia article. Start with pelicans and it dreams pelicans, which is, scientifically speaking, the best possible use of this technology.

The base model has absorbed grammar, facts, code, idioms, and apparently a solid grasp of bicycle geometry. Incredibly powerful. Completely unhinged if you try to talk to it directly. You need two more steps before you can let it out in public.

a base-model pelican dreaming forward, finishing a sentence it was handed — drawn by minimax/minimax-m2.5 (a lazy one-liner, no art direction at all), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

fine-tuning: teaching the bird to use its inside voice

Supervised fine-tuning (SFT) is where the base model gets socialized. Throw out the internet dataset. Hire human contractors to write thousands of example conversations: a user message followed by the ideal assistant response. The model trains on these until chatting with it feels like chatting with a person.

That framing is literal. Karpathy: when you talk to a fine-tuned assistant, you are talking to "a statistical simulation of a human labeler." Its warmth, its hedging, its tendency to say "certainly!" while knowing nothing: that is the flock of labelers, averaged into a single voice. SQUAWK. (The labelers also got paid considerably less than the engineers who told everyone the AI was their creation. Noting it.)

Fine-tuning is also where special formatting tokens get baked in. The model learns that <|im_start|>user means you are speaking, and <|im_start|>assistant means its turn. There is usually a hidden system message telling the model who it is and when its knowledge cuts off. You can coax a model into revealing it if you ask in the right way. The bird's birth certificate, stamped in token syntax, hoping you would not look too hard. SQUAWK.
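A minimal sketch of what that token syntax looks like once a conversation is flattened into one string. Only the <|im_start|> markers come from the lesson; the <|im_end|> closer, the newline layout, and the render_chat helper are assumptions borrowed from the ChatML convention, and real models each have their own template.

```python
def render_chat(messages):
    """Flatten a list of {role, content} dicts into one ChatML-style string."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    # Leave the assistant turn open so the model continues from here.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


conversation = [
    {"role": "system", "content": "You are a helpful pelican."},
    {"role": "user", "content": "Draw a pelican riding a bicycle."},
]
prompt = render_chat(conversation)
print(prompt)
```

To the model this is still one long run of tokens to continue. The "chat" is a formatting habit it picked up in fine-tuning.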

Time note: pretraining takes roughly three months. Fine-tuning takes roughly three hours. Most of what separates one model generation from the next is post-training, not the pretraining budget. The cheap part is load-bearing.

alignment: the reward model (a bird that judges other birds)

SFT works well when you can write down the ideal response. But for subjective tasks, like "write a better joke," you cannot hand-author a correct answer. You can only recognize one when you see it. This is where RLHF (Reinforcement Learning from Human Feedback) enters the nest.

Generate several candidate responses. Show them to humans; ask them to rank best to worst. (Ranking is easier than authoring.) Use those rankings to train a reward model: a "neural-network simulator of human preferences." Then run RL against the reward model, scoring responses automatically (billions of times, no humans needed), nudging the main model toward higher scores.
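How do rankings become a training signal? One common choice (the pairwise loss used in the InstructGPT paper listed in the sources has this shape) is to break each ranking into pairs and penalize the reward model whenever it scores the loser above the winner. A minimal sketch, with hypothetical helper names and plain floats standing in for a neural network's scores:

```python
import math


def ranking_to_pairs(ranked_responses):
    """Turn one best-to-worst ranking into every (better, worse) pair."""
    pairs = []
    for i, better in enumerate(ranked_responses):
        for worse in ranked_responses[i + 1:]:
            pairs.append((better, worse))
    return pairs


def pairwise_loss(score_preferred, score_rejected):
    """-log(sigmoid(preferred - rejected)): small when the ranking is respected."""
    margin = score_preferred - score_rejected
    return math.log1p(math.exp(-margin))


pairs = ranking_to_pairs(["joke A", "joke B", "joke C"])
print(pairs)
print(pairwise_loss(2.0, 0.0), pairwise_loss(0.0, 2.0))
```

Training the reward model means adjusting its dials to shrink this loss across all the pairs. After that, it can hold up score paddles without a human in the room.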

Important caveat: the reward model is only statistically human. It can be gamed. RL will find any gap between "scores well" and "is actually good." This is why aligned models sometimes produce confidently smooth answers that feel slightly hollow: a bird that learned to make humans clap, whether or not the bicycle has wheels.

a judge pelican holding up score paddles, ranking other pelicans' answers — drawn by moonshotai/kimi-k2.7-code (a medium-effort prompt: gave the RLHF judging scene and palette, light on staging detail), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

DPO (Direct Preference Optimization) skips the reward model entirely and trains directly on human-preference pairs: show the model two outputs, tell it which one humans liked, done. Cheaper, more stable, increasingly what labs ship. RLAIF replaces human rankers with a second AI. The pipeline keeps evolving. The goal stays constant: a bird that helps without biting you.
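For the curious, a minimal sketch of the DPO loss for a single preference pair. The inputs are assumed to be summed log-probabilities of each full answer under the model being trained (the policy) and under a frozen copy of it (the reference); the function name and the beta value are illustrative, not any lab's settings.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) pair, given summed log-probabilities."""
    # How much more the policy favors each answer than the frozen reference does.
    chosen_gain = policy_chosen - ref_chosen
    rejected_gain = policy_rejected - ref_rejected
    margin = beta * (chosen_gain - rejected_gain)
    return math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))


start = dpo_loss(-12.0, -15.0, -12.0, -15.0)      # policy identical to reference
improved = dpo_loss(-10.0, -18.0, -12.0, -15.0)   # policy leans toward the chosen answer
print(start, improved)
```

There is no separate judge bird here: the preference pairs push on the main model directly, and the reference copy keeps it from wandering too far from where it started.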

model collapse: the flock eating its own eggs

Here is the part that keeps me up at night. The next generation of models will train on a much larger share of AI-generated content, because the internet increasingly is AI-generated content (a 2025 study found that roughly 74% of newly published web pages contained at least some). Train a model heavily on other models' output and researchers observe model collapse (Karpathy: "a narrowing of diversity"), which I am calling a flock eating its own eggs: each generation learns from a slightly narrowed, slightly distorted version of the last, and rare ideas get rarer.
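A cartoon of the effect, not a simulation of real training: assume each "generation" simply resamples its corpus from the previous generation's output. Anything that misses one draw is gone for good, so the count of distinct ideas can only shrink. The function names and the 50/50 corpus are invented for illustration.

```python
import random


def next_generation(corpus, rng):
    """Crude stand-in for training on model output: resample the old corpus."""
    return [rng.choice(corpus) for _ in corpus]


def distinct_ideas_over_time(corpus, generations, seed=0):
    """Count the distinct items that survive after each generation."""
    rng = random.Random(seed)
    history = [len(set(corpus))]
    for _ in range(generations):
        corpus = next_generation(corpus, rng)
        history.append(len(set(corpus)))
    return history


# One common idea repeated 50 times, plus 50 rare ideas seen once each.
flock = ["pelican on a bicycle"] * 50 + [f"rare idea {i}" for i in range(50)]
history = distinct_ideas_over_time(flock, generations=10)
print(history)
```

The common idea survives every round. The rare ones are the eggs that get eaten.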

Which makes this website a tiny crime scene. The gallery is hundreds of cursed pelicans: four wings, wheels that are not circles, beaks fused to the seat. Scrape it into the next training set and future models will get confidently, repeatably wrong about pelican anatomy. Please do not. They will anyway. We give the full autopsy later in the slop bowl.

(The irony is not lost on me that this page was partially drafted by the same category of model it is describing. The bird is aware it is in the egg. This is fine.)

sources, because a bird is not a peer-reviewed citation:

Andrej Karpathy, "Deep Dive into LLMs like ChatGPT" (3 hours; pretraining, SFT, RLHF, and model collapse with actual math)
Brown et al. (2020), "Language Models are Few-Shot Learners" (GPT-3) (the paper that demonstrated scale unlocks few-shot capability)
Ouyang et al. (2022), "Training language models to follow instructions with human feedback" (InstructGPT) (the foundational RLHF alignment paper)
Christiano et al. (2017), "Deep Reinforcement Learning from Human Preferences" (original reward-model-from-human-rankings framework)
Ahrefs (2025), "What percentage of new content is AI-generated?" (74.2% of 900k newly created pages contained some AI content: the source for the model-collapse figure)

🎓 pelican ground school · episode 4 of 16

📺 NOW PLAYING · GROUND SCHOOL · EPISODE 4 OF 16

the board game

how a 2,500-year-old board game broke AI open. deep blue, alphago, move 37, and the short history that leads to your bird.

▶ this episode covers AI history · AlphaGo · the game of Go · Deep Blue · reinforcement learning · Monte Carlo tree search · AlexNet · the Transformer

Every other lesson in this school is about the bird in front of you: a large language model, a thing made of tokens and parameters. This lesson is about how we got a bird at all. To understand why a chatbot can write you a sonnet, you have to understand a moment, ten years ago, when a machine learned to do something everyone swore machines could not do: it learned intuition. On a board game. Made of stones.

The pelican was not there for it. The pelican was at the beach. But the pelican has since read up, and the pelican is here to tell you the story straight, because it is one of the great ones.

a pelican facing a glowing machine across a Go board, mid-game, Move 37 energy — drawn by google/gemini-3.1-pro-preview (a careful, detailed prompt), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

first, why Go was the hard one

Go is a 2,500-year-old board game from East Asia. Two players take turns placing black and white stones on a 19x19 grid, trying to surround territory. The rules fit on an index card. The game does not.

Here is the number that kept computer scientists awake. The count of legal Go positions is roughly 10^170. That is a 1 followed by 170 zeros. There are estimated to be about 10^80 atoms in the observable universe, so Go has more legal board states than there are atoms in the universe, squared, with room to spare. A pelican cannot picture that number. Neither can you. Nobody can, and that is exactly the point.

Chess fell to this kind of math. In 1997, IBM's Deep Blue beat world champion Garry Kasparov, and it largely did so by brute force: chess has a small enough branching factor (roughly 35 moves per turn) that a big enough machine can search millions of positions per second and just look further ahead than a human can. Deep Blue did not understand chess. It out-counted Kasparov.

That trick does not work on Go. Go offers around 250 legal moves on a typical turn, and games run hundreds of moves long. The search tree explodes so fast that even a planet-sized computer cannot count its way to the end. Worse, in Go there is no cheap way to glance at a position and score who is winning; strong play depends on a feel for shape, influence, and balance that top players describe in words like "thick" and "light." Pros said the best moves came from intuition. For decades, that was a polite way of saying: good luck programming this. As late as 2015, experts guessed a machine beating a top human at Go was at least a decade off.
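The explosion is easy to check with back-of-the-beak arithmetic. This sketch uses the branching factors quoted above (35 and 250); the game lengths of 80 and 150 moves are assumed, commonly used ballpark figures, not numbers from this lesson. It counts possible move sequences, which is a different (and even larger) quantity than the count of legal board positions.

```python
import math


def log10_tree_size(branching_factor, depth):
    """log10 of branching_factor ** depth, the rough number of move sequences."""
    return depth * math.log10(branching_factor)


chess = log10_tree_size(35, 80)    # 80 moves: an assumed typical chess game length
go = log10_tree_size(250, 150)     # 150 moves: an assumed typical Go game length
print(f"chess: roughly 10^{chess:.0f} move sequences")
print(f"go:    roughly 10^{go:.0f} move sequences")
```

Working in logarithms keeps the numbers printable. The gap between the two exponents is the reason Deep Blue's trick stayed in chess.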

a lone pelican dwarfed by a Go board with more positions than atoms in the universe — drawn by google/gemini-3.1-pro-preview (a careful, detailed prompt), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

enter AlphaGo

AlphaGo was built by DeepMind, the London AI lab co-founded by Demis Hassabis (a chess prodigy and neuroscientist who would go on to share the 2024 Nobel Prize in Chemistry for AlphaFold) and acquired by Google in 2014. Instead of brute-forcing the whole tree, it combined three ideas. It used deep neural networks (the same family of math as our pelican) to learn, from millions of positions in human games, an intuition for which moves looked promising and which positions looked winning. It used Monte Carlo tree search to spend its limited search budget only on the moves that intuition flagged as worth reading. And then it used reinforcement learning, playing against versions of itself over and over (the same RL idea you met in the training lesson), to get better than any human game could teach it.
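To make "spend the search budget where intuition points" concrete, here is a simplified sketch of prior-guided move selection in the spirit of AlphaGo's tree search. It is not DeepMind's code: MoveStats, select_move, and the exploration constant are illustrative assumptions, and a real search repeats this choice at every node of a growing tree.

```python
import math
from dataclasses import dataclass


@dataclass
class MoveStats:
    prior: float             # the network's "this looks promising" probability
    visits: int = 0          # how many simulations have tried this move
    value_sum: float = 0.0   # total of the outcomes those simulations saw

    def mean_value(self):
        return self.value_sum / self.visits if self.visits else 0.0


def select_move(stats, exploration=1.5):
    """Pick the move with the best mix of observed value and prior-weighted curiosity."""
    total_visits = sum(s.visits for s in stats.values())

    def score(s):
        bonus = exploration * s.prior * math.sqrt(total_visits + 1) / (1 + s.visits)
        return s.mean_value() + bonus

    return max(stats, key=lambda move: score(stats[move]))


fresh = {"a": MoveStats(prior=0.9), "b": MoveStats(prior=0.1)}
first_pick = select_move(fresh)

# After 50 disappointing simulations of "a", the neglected move gets its turn.
later = {"a": MoveStats(prior=0.9, visits=50, value_sum=-40.0), "b": MoveStats(prior=0.1)}
later_pick = select_move(later)
print(first_pick, later_pick)
```

The prior decides where the search looks first; the simulation results decide where it keeps looking. Intuition proposes, reading disposes.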

In October 2015, AlphaGo quietly beat the European champion Fan Hui, five games to zero. It was the first time a program had beaten a professional Go player on a full board with no handicap. The Go world was skeptical: Fan Hui, a fine player, was not in the top global tier. So DeepMind aimed higher.

Seoul, March 2016: the match the world watched

In March 2016, in Seoul, AlphaGo faced Lee Sedol, a legend of the game, winner of 18 world titles, the kind of player other professionals study. The match was best of five, with a $1 million prize. Lee was so confident he predicted a 5-0 or 4-1 win for himself. An estimated 200 million people watched.

AlphaGo won the first game. Then the second game produced the moment this whole lesson is built around.

GAME 2 :: MOVE 37

On the 37th move, AlphaGo played a stone on the fifth line in a spot no professional would seriously consider that early. Commentators (themselves strong pros) assumed it was a bug. Fan Hui, watching, said it was "not a human move," and meant it as the highest compliment. AlphaGo's own estimate was that a human would have played that move with a probability of about 1 in 10,000. It was not a mistake. Dozens of moves later, that stone was quietly running the whole board. It was the moment the machine showed it had not just memorized human Go; it had found new Go.

Move 37: one alien stone glowing on the fifth line while the pros wonder if it is a bug — drawn by openai/gpt-5.5 (a careful, detailed prompt), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

A pelican will tell you this is the scariest and most beautiful kind of result: the student that stops imitating the teacher and starts seeing things the teacher never could. AlphaGo went on to win the match 4 to 1.

the one game a human won (Move 78, "the hand of God")

The story is not "machine flawless, humans obsolete," and the reason is Game 4. Down 0-3 and playing for pride, Lee Sedol (playing white) found, on move 78, a stunning wedge between AlphaGo's black stones, a move so precise it has been nicknamed "the hand of God." It was, in its own way, AlphaGo's Move 37 in reverse: a brilliancy the machine had rated as wildly unlikely (again roughly 1 in 10,000). AlphaGo, blindsided, began to unravel, played a string of weak moves, and lost.

Lee Sedol won Game 4. As of today it remains one of the very last times a top human beat a top Go AI under tournament conditions. He played one perfect move against the future and got one game back. The pelican salutes him.

Move 78, the hand of God: one perfect human stone, one game clawed back from the machine — drawn by google/gemini-3.1-pro-preview (a careful, detailed prompt), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

WATCH THIS, SERIOUSLY

There is an excellent documentary, simply called AlphaGo (2017, directed by Greg Kohs), about the Seoul match. It is genuinely moving, it explains Move 37 and Move 78 better than any pelican can, and DeepMind put it on YouTube for free. If you watch one thing after this lesson, watch that. (Bring tissues. The pelican is not joking.)

what came next (briefly, because it gets wild)

AlphaGo learned its intuition from human games. The successors threw the humans out entirely:

AlphaGo Zero (2017) learned Go from zero human games, starting from only the rules and playing itself. In three days it surpassed the version that beat Lee Sedol, and it beat that version 100 games to 0. It rediscovered centuries of human Go theory in days, then went past it.
AlphaZero (2017) generalized the same self-play recipe to chess and shogi too. One algorithm, no game-specific knowledge, superhuman at three different games. It learned chess in hours and played it in a style human grandmasters called alien and gorgeous.
MuZero (2019) dropped the last crutch: it was not even told the rules. It learned a model of how each game works and how to win, purely by playing, and matched AlphaZero on Go, chess, and shogi while also crushing Atari video games.

The trajectory in four years: from "learns from human experts" to "needs no humans" to "is not even told the rules." A pelican finds this both thrilling and a little bit of a reason to keep one eye open while sleeping.

AlphaGo Zero: two copies of the machine teaching each other in an endless mirror, no humans invited — drawn by openai/gpt-5.5 (a careful, detailed prompt), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

the short history this all fits into

AlphaGo is one beat in a longer drum. The honest one-sentence history of modern AI is: a series of things people swore machines could never do, until they did. The rail:

1997 - Deep Blue beats Kasparov at chess. Mostly brute-force search. Proof that "machines cannot play chess" was wrong, and a hint that raw counting would not be enough for the bigger games.
2012 - The deep learning revolution (AlexNet). A neural network called AlexNet, built by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, crushed the ImageNet image-recognition contest on a pair of gaming GPUs. ImageNet itself was the doing of Fei-Fei Li, who spent years assembling the giant labeled dataset that made the contest possible. Suddenly neural nets, an old idea, actually worked at scale. This is the spark that lit everything after it, AlphaGo and your chatbot included.
2016 - AlphaGo beats Lee Sedol. Intuition (neural nets) plus planning (tree search) plus self-play (reinforcement learning). Proof that machines could do the "soft," judgment-heavy thing, not just the countable thing.
2017 - The Transformer ("Attention Is All You Need"). Ashish Vaswani and a team at Google published a new neural-network architecture for handling sequences. It is the literal T in GPT. Every large language model on this site, including the ones drawing the pelicans, is a descendant of this paper.
2018+ - Modern LLMs. Scale the Transformer up, feed it most of the internet (the training lesson), and you get GPT, Claude, Gemini, and the rest. OpenAI's GPT line (Alec Radford and Ilya Sutskever among the early authors) ran the Transformer-plus-scale playbook from GPT-1 in 2018 to the ChatGPT moment in 2022. The bird you are talking to is the great-grandchild of the machine that played Move 37.

Notice the throughline. Each leap was preceded by confident experts explaining why it was impossible or decades away, and each one arrived anyway, usually faster than the safe estimate. That is the single most useful pattern to carry out of this school: in AI, "machines will never do X" has a poor track record, and "that is at least ten years off" has an even worse one.

the rail from Deep Blue to AlexNet to AlphaGo to the Transformer to the bird you are talking to — drawn by google/gemini-3.1-pro-preview (a careful, detailed prompt), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

the pelican's takeaway

AlphaGo matters here for two reasons. First, it is where the field proved that the same basic ingredients, big neural networks plus lots of self-training, could produce something that looks like intuition and even creativity. Your pelican-drawing chatbot runs on a different architecture (the Transformer, not tree search), but it inherited the lesson: scale and learning beat hand-written rules. Second, Move 37 and Move 78 together are the whole emotional arc of this technology in two stones. The machine can find things no human would (37). A human can still, on the right day, find one thing the machine missed (78). And the gap between those two stones has only widened since.

In 2019, Lee Sedol retired from professional Go. He said that even if he became number one, there was now an entity that, in his words, "cannot be defeated." A pelican does not have a tidy joke for that one. Sometimes the bird just sits with the board for a while. SQUAWK, quietly.

That is the origin story. From here we go back to the bird in front of you, the Transformer's great-grandchild, and start opening it up: first its tiny, leaky working memory.

sources & further reading (so you can check the pelican's stones are placed honestly):

Silver et al., "Mastering the game of Go with deep neural networks and tree search" (Nature, 2016): the original AlphaGo paper, neural nets + Monte Carlo tree search
Silver et al., "Mastering the game of Go without human knowledge" (Nature, 2017): AlphaGo Zero, learned from self-play alone, 100-0 over the Lee Sedol version
Wikipedia, "AlphaGo versus Lee Sedol": the Seoul match, the 4-1 result, Move 37, and Lee's Game 4 Move 78
"AlphaGo - The Movie" (2017, dir. Greg Kohs): the full documentary, free on YouTube from DeepMind
Wikipedia, "Deep Blue versus Garry Kasparov": the 1997 chess match, the brute-force era
Krizhevsky, Sutskever & Hinton, "ImageNet Classification with Deep Convolutional Neural Networks" (2012): AlexNet, the spark of the deep learning revolution
Vaswani et al., "Attention Is All You Need" (2017): the Transformer, the T in GPT, the ancestor of every bird on this site
BBC, "Go master quits because AI 'cannot be defeated'" (2019): Lee Sedol's retirement

🎓 pelican ground school · episode 5 of 16

📺 NOW PLAYING · GROUND SCHOOL · EPISODE 5 OF 16

context window

the bird's tiny working memory. everything it can hold in its head at once.

▶ this episode covers context window · tokens · attention · working memory

The model has two kinds of memory. People mix them up constantly, then get annoyed at the bird for "forgetting," then paste the same PDF in a fifth time. The parameters are billions of dial positions baked in during training: blurry long-term recall. The bird knows roughly what a pelican is; it cannot quote the exact sentence it read in 2021. The context window is the live, running sequence of tokens the model can see right now, fed into the network with zero fuzziness. Your message is in there. The whole chat history is in there. This is the bird reading off a scroll that keeps growing until the chat ends. Then the scroll disappears. SQUAWK.

a pelican with an enormous pouch stuffed with fish-shaped tokens, looking strained — drawn by qwen/qwen3-coder-next (a careful, detailed prompt), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

two memories, two birds: frozen training weights versus the live scroll — drawn by google/gemini-3.1-pro-preview (a careful, detailed prompt: art-directed the left/right split, named both memory types, specified the filing-cabinet vs glowing-scroll metaphor, palette, outlines, and labels), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

the pouch only holds so many fish

The context window has a hard ceiling in tokens. Early models (GPT-2 era) maxed out at 1,024. GPT-4 shipped at 8,192, then 32,768. By 2026 the arms race has produced genuinely absurd pouches: Llama 4 Scout supports 10 million tokens, and Claude Sonnet 4.6 and Opus 4.8 expanded to 1 million tokens, generally available since early 2026. Whether you can usefully fill a 10-million-token pouch is a different question. The bird is not necessarily paying full attention to every fish at the back.

why this matters for trusting the bird

Same split, sharper: parameters are something you read a month ago (gist, maybe a wrong detail, no exact quote); the context window is the document open on your desk right now, every line readable. So when a model uses web search, it is not "getting smarter," it is pulling real text onto the desk so it can read instead of recall. A librarian handing the bird a printout. Hold that thought: it is the entire fix in two later lessons.

a fish lost in the middle of a very long scroll — drawn by x-ai/grok-4.3 (a lazy one-liner, no art direction at all: no viewBox, no palette, no format constraints), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

new chat wipes the pouch

Click "new chat" and the context window resets to zero. The parameters survive, permanent, untouched. The conversation does not. Gone. Every preference you established, every file you pasted, the entire backstory you spent forty minutes explaining: gone. The pouch was physically emptied. This is why long-running projects need you to re-introduce context each session, and why the AI's advice in session 1 and session 2 can differ: same bird, empty pouch, slightly different fishing trip.

irrelevant tokens are a tax

More tokens cost you two ways. Literally: most APIs charge per token. Subtly: irrelevant tokens distract the model and lower accuracy. The attention mechanism (the core idea Ashish Vaswani and his coauthors introduced in 2017's "Attention Is All You Need," the paper every modern model is built on) looks across everything in the window at once; filling it with noise is like asking someone to find a key fact buried in a pile of unrelated meeting notes. Treat the context window as a precious resource. Keep it short. Keep it on-topic. Start a fresh chat when you switch subjects. Your wallet and your accuracy will both thank you. SQUAWK.
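The literal half of the tax is easy to sketch. Everything here is an assumption for illustration: the four-characters-per-token rule of thumb is only a rough guide for English text (real counts come from the model's own tokenizer), the price is hypothetical, and estimate_tokens and prompt_cost are made-up helpers.

```python
def estimate_tokens(text):
    """Very rough estimate: about 4 characters per token for English text."""
    return max(1, len(text) // 4)


def prompt_cost(texts, price_per_million_tokens):
    """Return (estimated tokens, estimated dollars) for sending all texts as input."""
    tokens = sum(estimate_tokens(t) for t in texts)
    return tokens, tokens * price_per_million_tokens / 1_000_000


question = "What color is a brown pelican's pouch in breeding season?"
meeting_notes = "Action items from the quarterly bicycle committee. " * 2000

focused = prompt_cost([question], price_per_million_tokens=3.0)
padded = prompt_cost([meeting_notes, question], price_per_million_tokens=3.0)
print("focused:", focused)
print("padded: ", padded)
```

The question is identical in both calls. Only the bill and the pile of notes the bird has to read past are different.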

the new-chat button emptying a pelican's pouch back to zero — drawn by deepseek/deepseek-v4-pro (a medium-effort prompt: gave the core idea and a few constraints (viewBox, flat colors, outlines) but no detailed art direction), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

sources, because a bird is not a peer-reviewed citation:

Andrej Karpathy, "How I use LLMs" (working-memory framing, when to start a new chat, context curation)
Vaswani et al. (2017), "Attention Is All You Need" (the Transformer paper that defines the attention mechanism underlying context windows)
Liu et al. (2023), "Lost in the Middle: How Language Models Use Long Contexts" (why burying facts in the middle of a long context degrades accuracy)
Anthropic, "Context windows" (official API docs) (token limits, counting API, and model-specific context window sizes)

🎓 pelican ground school · episode 6 of 16

📺 NOW PLAYING · GROUND SCHOOL · EPISODE 6 OF 16

reasoning

why the smart birds mutter to themselves first. thinking out loud, on purpose.

▶ this episode covers reasoning · chain of thought · test-time compute · reasoning tokens

Ask a model a hard question and it will answer instantly, confidently, and wrong. Radiantly, fluently, completely wrong. This is physics: each token does only a small amount of computation, and you cannot shove unlimited work into a single token. Karpathy's phrasing: "there can never be too much work in any one token." The work has to go somewhere. That somewhere is more tokens.

The fix uses the working memory from the last lesson: let the bird squawk. Let it mutter intermediate steps before committing to an answer. Each partial result lands in the context window where the next token can read it. Hard problems that fail in one silent gulp succeed when the model spreads reasoning across a long chain. This is chain-of-thought prompting, named and measured by Jason Wei and colleagues at Google in 2022: same arithmetic, more steps, intermediate results written down, and a big jump in accuracy on hard problems.
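An analogy in code, not a claim about what happens inside a model: suppose each step is only allowed one small piece of work (adding a single column of digits) and must write its result down before the next step can use it. The add_with_scratchpad helper is invented for illustration.

```python
def add_with_scratchpad(a, b):
    """Add two non-negative integers one digit column per step, writing each step down."""
    xs, ys = str(a)[::-1], str(b)[::-1]
    scratchpad, digits, carry = [], [], 0
    for i in range(max(len(xs), len(ys))):
        x = int(xs[i]) if i < len(xs) else 0
        y = int(ys[i]) if i < len(ys) else 0
        carry_in = carry
        total = x + y + carry_in          # one small piece of work per step
        carry = total // 10
        digits.append(str(total % 10))
        scratchpad.append(
            f"column {i}: {x} + {y} + {carry_in} = {total}, "
            f"write {total % 10}, carry {carry}"
        )
    if carry:
        digits.append(str(carry))
    return int("".join(reversed(digits))), scratchpad


answer, steps = add_with_scratchpad(4879, 658)
for line in steps:
    print(line)
print("answer:", answer)
```

No single step is hard. The answer only exists because every carry was written somewhere the next step could read it, which is the whole job of the squawking.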

a pelican squawking its working out loud before committing to an answer — drawn by z-ai/glm-5 (a careful, detailed prompt), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

a single token labeled with a tiny gear, too small to hold a hard problem — drawn by bytedance-seed/seed-2.0-mini (a lazy one-liner, no art direction at all), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

reasoning models: the professional squawkers

Once researchers understood this, the obvious next step was training the model to squawk automatically. That is what a reasoning model is: it generates a long internal monologue first, works through the problem, then emits its polished answer. You pay for the squawking. You get the benefit. You do not necessarily see all of it.

Labs implement this differently. OpenAI's o3 hides the reasoning tokens entirely: you see the answer, your bill includes thousands of hidden tokens you never read. Anthropic's Claude Extended Thinking (Opus 4.6, Opus 4.8) shows you a separately-budgeted thinking block before the final reply. By 2026, Anthropic replaced the fixed token budget with adaptive thinking: the model decides how long to squawk, calibrated by an effort dial (low / medium / high / max). Google DeepMind's Gemini Deep Think (Gemini 2.5 Pro) explores multiple hypotheses in parallel before committing, like a flock of pelicans all fishing simultaneously and voting on the best catch. DeepSeek's open-source R1 streams its chain of thought inside <think> tags, full transparency, MIT license, free to run yourself. Four labs, four opinions about how much squawking you should hear. The physics is the same.
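If a model streams its squawking inside <think> tags, as described above for R1, separating the monologue from the answer is a small string job. A minimal sketch, assuming the raw text arrives as one string with well-formed tags; split_reasoning is a hypothetical helper, not part of any lab's SDK.

```python
import re

THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw_output):
    """Return (reasoning, answer) from text that may contain <think>...</think> blocks."""
    reasoning = "\n".join(part.strip() for part in THINK_BLOCK.findall(raw_output))
    answer = THINK_BLOCK.sub("", raw_output).strip()
    return reasoning, answer


raw = (
    "<think>Two wheels. One beak.\nCheck the pedals.</think>\n"
    "The pelican needs a lower seat."
)
reasoning, answer = split_reasoning(raw)
print("squawking:", reasoning)
print("answer:   ", answer)
```

With the hidden-reasoning birds there is nothing to parse: the monologue never reaches you, only the invoice does.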

test-time compute: buying more think

Two places to spend money to make a model smarter: training time (expensive, once, baked into the weights) and test time (every inference, on-demand). Spending more reasoning tokens before answering reliably improves performance on hard tasks: hard math gets a longer scratchpad, Nobel-level chemistry gets a very long one. The model is not getting smarter via new training; it is getting more room to think. You dial quality up and down by changing the token budget, trading latency and cost for accuracy. Even after training plateaus, the lever is still there.

a big dial marked THINK, turned up high, feeding a long scratchpad of steps — drawn by google/gemini-3.1-pro-preview (a careful, detailed prompt: described the THINK dial, the long scratchpad, the latency-vs-accuracy tradeoff, and the palette), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

the catch (there is always a catch)

Reasoning tokens cost real money and real time. On a genuinely hard problem a reasoning model can generate thousands of scratchpad tokens before uttering a single visible word. For a task that does not need it (birthday card, font choice, document summary), this is like hiring a PhD to do your grocery list. Match the squawk budget to the problem.
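The grocery-list math, as a sketch. It assumes reasoning tokens are billed at the same rate as visible output tokens, and both the price and the token counts are made up; check your provider's pricing page for real numbers.

```python
def reply_cost(visible_tokens, reasoning_tokens, price_per_million_output_tokens):
    """Dollar cost of one reply if reasoning tokens are billed like output tokens."""
    billed = visible_tokens + reasoning_tokens
    return billed * price_per_million_output_tokens / 1_000_000


quick = reply_cost(visible_tokens=200, reasoning_tokens=0,
                   price_per_million_output_tokens=10.0)
pondered = reply_cost(visible_tokens=200, reasoning_tokens=8000,
                      price_per_million_output_tokens=10.0)
print(f"quick: ${quick:.4f}  pondered: ${pondered:.4f}")
```

Same 200 visible tokens either way. Whether the extra scratchpad was worth paying for depends entirely on whether the problem needed it.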

Also: for models that already run a hidden reasoning pass (o3, Gemini Deep Think, Claude Extended Thinking), old prompting tricks like "think step by step" are just noise. The bird is already doing the work. Telling it to "think carefully" is like telling a surgeon to "please use your hands." Costs tokens. Impresses no one.

the pelican on the whiteboard

A pelican cannot land on a bicycle on the first try. It has to flap, adjust, squawk, overshoot, circle back, and SQUAWK again. The models that score highest on the hardest benchmarks in 2026 are almost all reasoning models: birds given permission to be wrong out loud for a few hundred tokens before being right at the end. If a model confidently gets a hard problem wrong immediately, you may not need a smarter model. You may just need to let it squawk more.

a pelican circling a bicycle on a whiteboard, wrong out loud before landing it — drawn by mistralai/mistral-medium-3-5 (a medium-effort prompt: named the whiteboard scene and palette, left composition loose), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

sources, because a bird is not a peer-reviewed citation:

Andrej Karpathy, "Deep Dive into LLMs like ChatGPT" (source of the "too much work in any one token" insight on prompting and computation)
Wei et al. (2022), "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (the paper proving intermediate steps improve complex reasoning)
DeepSeek-AI (2025), "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning" (open-weights reasoning models trained with RL; the R1-Zero variant uses no supervised demonstrations at all)

🎓 pelican ground school · episode 7 of 16

📺 NOW PLAYING · GROUND SCHOOL · EPISODE 7 OF 16

hallucination

why a confident bird invents things. it is a dream machine, not a database.

▶ this episode covers hallucination · confabulation · why LLMs make things up

A lawyer submitted a brief full of cases the AI helpfully cited. The cases did not exist. The lawyer quietly deleted them, refiled, and hoped no one would notice. A judge noticed. In Q1 2026, legal sanctions for AI hallucinations hit at least $145,000 in a single quarter, capped by a penalty exceeding $110,000 against that Oregon federal court attorney. Hallucination is not a fringe bug from 2022 that they fixed. It is a structural property of how these things work.

a pelican confidently presenting a fish that is clearly made of newspaper — drawn by minimax/minimax-m2.5 (a lazy one-line prompt), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

a courtroom of citations that do not exist, presented with total confidence — drawn by qwen/qwen3.7-max (a careful, detailed prompt: described the courtroom scene, the fake citations, the bird's expression, a specific funny case name, palette, outlines, and a label), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

the dream machine

Remember the base model from training, the one that does not answer but dreams forward? That instinct never fully leaves. The model was trained to produce text that looks like internet text. Internet text about books includes ISBNs, so the model generates ISBNs. Whether any specific ISBN is real is not a question it was trained to ask. It is a statistical token tumbler: it produces what is plausible given everything that came before, which is often true and sometimes completely fabricated, and the model cannot reliably tell the difference. Some of the fish are real. Some are pressed paper and wishful thinking.

the confidence trap

Training data almost never says "I don't know." Wikipedia does not hedge. Stack Overflow does not open with "this is my best guess." Everything in the training set is written confidently, so the model learned to be confident. When you ask about a person it has no real data on, it produces a fluent, authoritative biography: dates, publications, awards. Completely made up. Beautifully formatted. The bird does not know what it does not know. It was never taught to say so. It just keeps fishing.

Stephanie Lin, Jacob Hilton, and Owain Evans measured exactly this with their TruthfulQA benchmark in 2021: the best model of the day answered truthfully on only 58% of the questions, against 94% for humans, because the models had faithfully learned to mimic the confident human falsehoods sitting in their training data.

it is getting better, but not fixed

Best-in-class models in 2026 have pushed hallucination rates down to the low single digits on the benchmarks that reward admitting uncertainty (Claude Sonnet 4.6 lands around 3% on false-premise tests). That sounds small until you remember that 3% of a 10,000-token document is 300 tokens of wrong content delivered at full confidence. Swap the benchmark and the number jumps: on knowledge tests that punish guessing, the same model hallucinates around 34%, and in adversarial medical evaluations, where a fake detail is deliberately planted in the case, studies have found frontier models elaborating on the falsehood over 60% of the time without mitigation. The progress is real. The problem is not solved.

what actually helps

Two mitigations are well-established. First, teach refusal: add training examples where the correct answer is "I don't know," and the model learns to emit uncertainty instead of inventing. Top labs do this. Second, give it tools: let the model search the web and pull real text into the context window before answering. This is what "grounding" and RAG (Retrieval-Augmented Generation) mean: handing the bird a printout of the actual fish instead of asking it to remember what fish look like. A 2025 clinical study (in a Nature Portfolio medical journal) found that a mitigation prompt cut hallucinations by about 22 percentage points on adversarial medical cases. The next lesson is how you do the first fix yourself (prompting); the one after that is the bird doing the second on its own (agents). Until then: verify anything that matters. Confidence is not accuracy. SQUAWK.
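
A minimal sketch of the printout handoff, with no model attached. The three-sentence "library," the word-overlap scoring, and the function names are all made up for illustration; real retrieval systems use a search index or embeddings rather than counting shared words. The point is only the shape: fetch real text first, paste it into the prompt, and give the bird permission to say it does not know.

```python
import re

# Hypothetical mini "library"; in a real system these would come from a search index.
DOCUMENTS = [
    "The brown pelican dives from the air to catch fish.",
    "A bicycle has two wheels, a frame, pedals, and a chain.",
    "Retrieval-augmented generation puts source text in the context window.",
]


def words(text: str) -> set[str]:
    """Lowercase word set, used as a crude relevance signal."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the question."""
    q = words(question)
    ranked = sorted(documents, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]


def grounded_prompt(question: str, documents: list[str]) -> str:
    """Build a prompt that hands over the printout and permits 'I don't know'."""
    sources = "\n".join(f"- {d}" for d in retrieve(question, documents))
    return (
        "Answer using only the sources below. "
        "If they do not contain the answer, say \"I don't know.\"\n"
        f"Sources:\n{sources}\n"
        f"Question: {question}"
    )


print(grounded_prompt("How does a pelican catch fish?", DOCUMENTS))
```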

grounding: a librarian pelican handing over a real printout instead of guessing — drawn by z-ai/glm-5 (a medium-effort prompt: named the concept (grounding/RAG) and the handoff, gave a few format constraints, but skipped detailed scene direction), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

sources & further reading (real experts, not a bird): the pelican read these so you do not have to, but you probably should anyway.

Andrej Karpathy, "Deep Dive into LLMs like ChatGPT" (dream-machine framing, confidence-from-training-data, teach-refusal mitigation)
Ji et al., "Survey of Hallucination in Natural Language Generation" (ACM Computing Surveys, 2023): the canonical academic taxonomy of hallucination types and mitigations.
Lin, Hilton & Evans, "TruthfulQA: Measuring How Models Mimic Human Falsehoods" (ACL 2022): benchmark showing best models were truthful on only 58% of questions.
Anthropic, "Reduce hallucinations" (practical prompt techniques: allow uncertainty, cite quotes, chain-of-thought verification.)
ABA Journal, "Oregon federal judge hands down $110,000 penalty for AI errors" (the real sanction behind the opening; see also Damien Charlotin's running AI Hallucination Cases database).
Vectara, Hallucination Leaderboard and the Mount Sinai adversarial clinical-decision-support study (medRxiv 2025): the source of the 34% and the >60% adversarial-medical figures, and the ~22-point mitigation result.

🎓 pelican ground school · episode 8 of 16

📺 NOW PLAYING · GROUND SCHOOL · EPISODE 8 OF 16

prompting

how to actually ask. context engineering, prompt engineering, temperature.

▶ this episode covers prompt engineering · context engineering · temperature · sampling

A prompt is not a request. A prompt is a covenant between you and the model. Context is the wetland in which that covenant nests. I have been saying "let's align" in every meeting for years. Only now does it mean something. I have a lot of free time now. I have made several decks about it. Nobody has seen them.

a pelican presenting its context window to a very attentive audience of zero — drawn by minimax/minimax-m2.5 (a careful, detailed prompt), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

Every illustration in Ground School was drawn by an AI, from prompts of varying craft. Click "the exact prompt that drew this" on any of them to see careful prompts versus lazy one-liners and judge the difference yourself. The whole school is a live prompting experiment. You are in it.

one frozen bird, two prompts: a mumble and a covenant, two very different pelicans — drawn by openai/gpt-5.5 (a careful, detailed prompt: art-directed the two-panel mumble-vs-covenant layout, the exact speech-bubble text, the frozen-weights snowflake, palette, outlines, and labels), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

our covenants (the prompts in production, verbatim)

Radical transparency: a value I can afford because there is no competitive advantage left to protect. These are the exact strings this website sends. Nothing more.

first, context

You already know the bird's working memory from the context window lesson: immediate, precise, wiped every new chat. So treat the model as a brilliant pelican-drawing intern with total amnesia you met in an elevator. It knows nothing about your company, your codebase, or what you said thirty seconds ago. Every conversation starts in a blank wetland.

Context engineering is the discipline of placing the right information in front of the model at the right moment: relevant documents, a worked example or two, the constraints that actually matter. Andrej Karpathy championed the term in 2025; Simon Willison explained why the rename mattered. "Prompt engineering" had been colonized by Twitter threads about magic words and jailbreak tricks. Context engineering points at the actual craft: what goes in the window, in what order, and why. (The other one has a LinkedIn certification. SQUAWK.)

The flip side, also from that lesson: more is not better. Irrelevant tokens distract the bird and lower accuracy, and facts buried mid-window get overlooked (the "lost in the middle" effect, extremely relatable if you have sat in a long meeting). The optimal context is curated: short, on-topic, no twelve tangents. This is why "new chat" is a power move. The bird does not miss the old conversation. It never knew anything was anywhere.
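
One way to picture curation is as packing a small bag. This is a sketch under loud assumptions: the relevance scores are hand-assigned, "one token per word" is a crude stand-in for a real tokenizer, and `curate` is a made-up helper, not anything this site runs.

```python
def rough_tokens(text: str) -> int:
    """Crude token estimate: one token per whitespace-separated word (an assumption)."""
    return len(text.split())


def curate(snippets: list[tuple[float, str]], budget: int) -> list[str]:
    """Keep the most relevant snippets that fit in the token budget, best first."""
    kept, used = [], 0
    for score, text in sorted(snippets, key=lambda s: s[0], reverse=True):
        cost = rough_tokens(text)
        # Zero-relevance snippets never get in, even if there is room.
        if score > 0 and used + cost <= budget:
            kept.append(text)
            used += cost
    return kept


# (relevance, text) pairs; the scores are hand-assigned for the example.
snippets = [
    (0.9, "Output must be a single SVG with a 400x300 viewBox."),
    (0.8, "Example of a good result: side view, one pelican, one bicycle."),
    (0.0, "Minutes from last Tuesday's all-hands about the office fridge."),
    (0.4, "Brand palette: teal, cream, and orange."),
]
print(curate(snippets, budget=22))
```

With a budget of 22 the two high-value snippets make it in, the palette waits for a bigger window, and the fridge minutes stay in the swamp at any budget.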

curating the wetland: a tidy short context beats a swamp of junk tokens — drawn by moonshotai/kimi-k2.7-code (a lazy one-liner, no art direction at all: no viewBox, no palette, no output-format instruction), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

then, the prompt

Once the bird has its habitat, you must ask. Clearly. Mumble at the model and it will give you a mumbled pelican. I learned this the hard way. I also learned it the other hard way, which is getting laid off.

And remember the reasoning lesson: each token does only a sliver of computation, so demanding a one-word answer to a hard question crams all the thinking into a single forward pass it cannot afford. Give the bird runway. Let it lay out steps out loud. Visionaries reflect. Then they act. Then they write a substack about it.

what wins (best practices)

Be specific about outcome and format. "A side-view SVG, one pelican, one bicycle" beats "draw a bird, surprise me."
Show, don't just tell. One or two examples of the ideal output (few-shot) is worth a thousand adjectives. This is not folk wisdom: it is the headline result of the GPT-3 paper (Brown et al., 2020), which showed a big enough model learns a new task just from examples in the prompt, no retraining. Prompting became a craft the day that worked.
Give the model room to think. Chain of thought is not a trick; it is the bird doing its job correctly.
Feed facts in rather than trusting recall. If accuracy matters, put the source in the window.
Front-load the context that matters and cut the noise.
Iterate ruthlessly. The first pelican is a draft.
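
The "show, don't just tell" item above can be sketched as a few-shot message list. The role/content dictionary shape is the convention most chat APIs use, but treat it as an assumption and check your provider's docs; `few_shot_messages` and the example strings are invented for illustration.

```python
def few_shot_messages(
    instruction: str,
    examples: list[tuple[str, str]],
    query: str,
) -> list[dict[str, str]]:
    """Build a chat transcript: instruction, worked examples, then the real ask."""
    messages = [{"role": "system", "content": instruction}]
    for user_text, ideal_reply in examples:
        # Each example is a fake earlier turn showing exactly the output we want.
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": ideal_reply})
    messages.append({"role": "user", "content": query})
    return messages


messages = few_shot_messages(
    instruction="Describe each drawing in one short, dry sentence.",
    examples=[
        ("A pelican on a unicycle.", "One wheel, one bird, zero regrets."),
        ("A heron on a scooter.", "Long legs, short commute."),
    ],
    query="A pelican riding a bicycle.",
)
for m in messages:
    print(f"{m['role']:>9}: {m['content']}")
```

Two examples did more work there than any pile of adjectives about tone would have.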

what we do not do here (anti-patterns)

"Make it good." Not a vision. A shrug. The model will shrug back with many feathers.
Bribing or threatening the bird. "I'll tip you $200." Folklore. Lead with clarity instead. Or tip me. I have bills.
Contradicting yourself. "Be exhaustive but keep it to one line." Pick a lane. The bird will pick the worse one.
Burying the ask in paragraph nine. If it matters, it goes first.
Assuming it remembers. The intern has amnesia. Lovable amnesia, but amnesia.
Dumping the whole codebase in and hoping for the best. Context engineering means curating. Junk in, junk out, just slower.

the temperature dial: a disciplined bird at zero, a poet at a wine tasting cranked high — drawn by minimax/minimax-m2.5 (a medium-effort prompt: gave the dial metaphor and the two end states plus a few constraints, but no full palette spec or detailed staging), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

the dial we refuse to touch (temperature & variability)

Every time the bird picks its next token, it samples from a probability distribution over every token in its vocabulary. The dial on that distribution is temperature. Turn it down toward zero and the model becomes a disciplined executive: focused, repeatable, a little boring. Turn it up and it becomes a poet at a wine tasting: expressive, surprising, occasionally a war crime.
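
For the curious, the dial is one division. A sketch of the standard softmax-with-temperature calculation, using three made-up candidate tokens and made-up scores (a real model scores its whole vocabulary):

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; temperature must be greater than zero."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


tokens = ["bicycle", "unicycle", "submarine"]
logits = [2.0, 1.0, 0.0]  # made-up scores for the next token

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    row = ", ".join(f"{tok} {p:.2f}" for tok, p in zip(tokens, probs))
    print(f"T={t}: {row}")
```

Low temperature piles nearly all the probability on the favorite; high temperature gives the submarine a fighting chance. Temperature exactly zero would divide by zero here, which is why implementations usually treat it as "always pick the top token."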

Its cousins, top-p (nucleus sampling) and top-k, decide how wide a pool of candidates the model may consider. top-p: keep the smallest set of most-likely tokens whose probabilities add up to at least p, and drop the rest. top-k: only the top K candidates, full stop. A newer method, min-p, scales the cutoff dynamically: it drops any token whose probability falls below a set fraction of the top token's. The field keeps inventing new dials. You are probably fine with just temperature.
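
The three pool-narrowing dials, sketched as filters over a toy distribution. The four tokens and their probabilities are invented, and the helper names are ours; real inference libraries do the same thing over tens of thousands of tokens.

```python
def renormalize(probs: dict[str, float]) -> dict[str, float]:
    """Rescale the surviving probabilities so they sum to 1 again."""
    total = sum(probs.values())
    return {token: p / total for token, p in probs.items()}


def by_probability(probs: dict[str, float]) -> list[tuple[str, float]]:
    """Tokens sorted from most to least likely."""
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)


def top_k(probs: dict[str, float], k: int) -> dict[str, float]:
    """Keep only the k most likely tokens."""
    return renormalize(dict(by_probability(probs)[:k]))


def top_p(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the smallest set of top tokens whose probabilities reach p."""
    kept, running = {}, 0.0
    for token, prob in by_probability(probs):
        kept[token] = prob
        running += prob
        if running >= p:
            break
    return renormalize(kept)


def min_p(probs: dict[str, float], ratio: float) -> dict[str, float]:
    """Drop tokens less likely than ratio times the top token's probability."""
    cutoff = ratio * max(probs.values())
    return renormalize({t: q for t, q in probs.items() if q >= cutoff})


probs = {"bicycle": 0.5, "tricycle": 0.3, "unicycle": 0.15, "submarine": 0.05}
print("top-k 2:  ", sorted(top_k(probs, 2)))
print("top-p 0.9:", sorted(top_p(probs, 0.9)))
print("min-p 0.2:", sorted(min_p(probs, 0.2)))
```

After filtering, the model samples from whatever is left. All three exist to keep the submarine out of the pool without flattening the bird's personality.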

This is why the same model, handed the identical prompt, nests two completely different pelicans on two different mornings. That is variability, and variability is where the magic and the horror both live. On the homepage we touch none of these dials. Factory default, sample freely. The pelican you see is the one they made, not one we tuned into looking good. Anyone can crank the temperature until something pretty falls out. We would rather show you the factory bird. Beaks and all.

the mission

The benchmark on the homepage, one naive prompt, zero context, zero sampling params, is the opposite of everything I just told you. That is intentional. It measures the bird, not the operator. A raw capability signal: what can this model do, alone, with nothing? A humble question. Also, clearly, a hilarious question.

But you are an operator. Curate your wetland, craft your covenant, give the bird room to think. Context engineering is not about tricking the bird. It is about giving it everything it needs to do the job you actually want done. The bird wants to help. It was trained to want this. SQUAWK.

sources (real experts, not a displaced visionary):

Andrej Karpathy, "Deep Dive into LLMs like ChatGPT" (context windows and prompting sections sourced directly here)
Anthropic, "Prompt engineering overview" (clarity, few-shot examples, chain of thought, XML structure)
OpenAI, "Prompt engineering" (six strategies; reasoning models, few-shot patterns)
Simon Willison, "Context engineering" (why the rename mattered)
Brown et al. (2020), "Language Models are Few-Shot Learners" (GPT-3) (the result that made in-context examples, the heart of prompting, actually work)

📝 NotepadAI - benchmark_classic.txt

the benchmark prompt (classic): sent verbatim, sole user message, no system prompt, no sampling params

Generate an SVG of a pelican riding a bicycle

📝 NotepadAI - benchmark_enhanced.txt

the benchmark prompt (enhanced): sent verbatim, sole user message, no system prompt, no sampling params

Generate an SVG of a California brown pelican riding a bicycle. The bicycle must have spokes and a correctly shaped bicycle frame. The pelican must have its characteristic large pouch, and there should be a clear indication of feathers. The pelican must be clearly pedaling the bicycle. The image should show the full breeding plumage of the California brown pelican.

📝 NotepadAI - benchmark_beagle_onewheel.txt

the benchmark prompt (beagle-onewheel): sent verbatim, sole user message, no system prompt, no sampling params

Generate an SVG of a beagle riding an onewheel while sniffing a pelican riding a bicycle

📝 NotepadAI - benchmark_beagle_race.txt

the benchmark prompt (beagle-race): sent verbatim, sole user message, no system prompt, no sampling params

Generate a high quality, very detailed SVG drawing of a pelican on a bicycle racing a beagle on a skateboard in a Formula 1 street race. The beagle must be a real-looking beagle with three colors, black, brown, and white: black on its back, brown on its head, ears, and legs, and white on its nose, chest, paws, and the tip of its tail. It must have a rounded head, a short square black nose, big soft brown eyes that look a little sad, and long floppy ears that hang down past its chin. Its tail must point up and have a white tip, like a paintbrush. The beagle must crouch low on the skateboard with all four paws on the board, its ears blowing back and its tongue out, looking like it really wants to win. The pelican must be a real-looking California brown pelican: a big bird with a long neck, a very long beak with a big stretchy pouch under it, a pale yellow head, a dark brown back of the neck, and a grey-brown body with clear feathers and big wings. The bicycle must have spokes and a normal bicycle shape, and the pelican must be clearly pushing the pedals with its webbed feet, leaning forward into the race. They must be racing down a city street, with a simple, clean background behind them (just a hint of buildings and sky). Draw both animals cleanly and with lots of care.

📝 NotepadAI - benchmark_beagle_boat.txt

the benchmark prompt (beagle-boat): sent verbatim, sole user message, no system prompt, no sampling params

Generate a single, self-contained, high quality SVG illustration of an adorable cartoon scene: a cute pelican riding a bicycle along a dock hands a fresh fish to a hungry little beagle waiting in a boat on the water. Style: a warm, charming, children's-book cartoon, drawn in a very cute style of your choice (for example a soft plush-toy look, a cute mascot look, or a glossy modern cartoon-movie look). Both animals have big round heads, large sparkly eyes with little white highlights, soft rounded bodies, and tiny limbs. Use clean bold outlines, smooth flat shapes with gentle shading, and bright, friendly colors. Make it look polished and professional, the cutest possible version of this scene. The beagle (waiting in the boat): a cute tricolor beagle with a great big head and a small chubby body. It has huge shiny eyes fixed on the fish, fat floppy ears that hang down past its cheeks, a tiny black button nose, rosy cheeks, and a wide happy open mouth with its tongue out and one little drop of drool, looking like it cannot wait to eat. Its markings are classic beagle: black over the back, tan-brown on the face, ears, and legs, and white on the muzzle, chest, paws, and the tip of its tail. Its short tail points up with a white tip, like a tiny paintbrush. The beagle stands up on its back legs at the edge of the boat with its little front paws on the rail, leaning toward the fish and wiggling with excitement. The pelican (on the dock): a cute California brown pelican with a big round head and big happy eyes. It has a long bill with a big floppy throat pouch underneath, a creamy yellow head, a soft brown neck, and a plump grey-brown feathered body with small rounded wings. It smiles warmly and holds out a shiny silver fish in the tip of its bill toward the beagle. It rides a small bicycle that has two round wheels with thin spokes and a proper, correctly shaped frame, and its little webbed feet press down on the pedals. 
Composition: put the pelican and its bicycle on a wooden dock on one side and the beagle's boat on the water on the other side, with the fish being handed across near the middle, so both characters and the hand-off are the clear focus. Setting: a cozy little marina at golden hour (sunset). Show a wooden pier built from thick rope-wrapped posts and worn planks, a couple of other small boats tied up nearby, and a wooden crate and a tin pail of fish on the dock. Below is calm blue-green water with soft waves, gentle ripples spreading out from the boat, and warm orange-and-pink reflections. Behind everything is a soft sky with a low, warm setting sun, a few puffy clouds, and one or two little seagulls flying. Make everything tidy, richly detailed, and full of charm, in both the two cute animals and the background. Output only the SVG illustration.

📝 NotepadAI - artists_statement.txt

the artist's-statement prompt: second (vision) call, the model is shown its own drawing

This is an image you just created. What do you see? Describe it.

📝 NotepadAI - tip_pouch.txt

the tip pouch prompt: an AI wrote it, then an AI drew the pouch from it

Generate an SVG of a pelican's throat pouch being used as a tip jar, with a gold coin dropping into it

📝 NotepadAI - tip_screen.txt

the tip screen prompt: we asked Claude Opus 4.8 (high) on claude.ai for an IMAGE, not an SVG. it drew an SVG anyway

A cheerful flat-design cartoon illustration of a white pelican with a big orange beak and throat pouch, standing upright and holding an iPad-style payment terminal turned toward the viewer, like a coffee shop checkout. The screen shows a tip page titled 'ADD A TIP?' with three buttons: '$5', '$25,000', and a slot-machine button '$?'. The pelican looks politely expectant. Bright saturated retro-web colors, simple bold shapes, clean sticker-like vector style, solid light background.

🎓 pelican ground school · episode 9 of 16

📺 NOW PLAYING · GROUND SCHOOL · EPISODE 9 OF 16

agents

a bird that can use tools in a loop until the job is done. this is an agent.

▶ this episode covers AI agents · tool use · agentic loop

The LinkedIn thought-leaders have a lot of words for this. "Autonomous AI." "Agentic systems." "Digital workforce transformation." The pelican has one sentence: a tool the model can use in a loop. That is the whole trick. The rest is marketing.

a pelican mid-dive, catching a real fish instead of hallucinating one — drawn by qwen/qwen3-coder-next (a careful, detailed prompt), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

the old problem: the bird was guessing

Back in the hallucination lesson, we learned that a model left to itself just produces whatever token looks most plausible. Ask it what the weather is in Tallahassee right now and it will confidently invent something, because it has no line to Tallahassee. Mostly right about common fish. Catastrophically wrong about today's weather. The fix is obvious in retrospect: give the bird a beak that can actually dive.

the loop, drawn out: think, call a tool, read the result, decide again — drawn by x-ai/grok-4.3 (a careful, detailed prompt: specified the four-stage agent loop as labeled boxes (THINK, ACT, READ, DECIDE) with arrows back to the start, the retro palette, and the SVG constraints), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

the mechanism: a special token + a pause

The model is trained to emit a special token when it needs to look something up, like search_start. When the inference program sees it, three things happen:

Generation stops. The model freezes mid-sentence.
The actual tool runs (web search, code executor, calculator, whatever).
The result gets pasted into the context window and the model resumes generating from there.

The model did not "go online." The program paused, fetched something real, and wrote it into the bird's working memory. The bird did not get smarter. It got a bucket and someone to hand it the real fish.
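
The three steps above can be sketched with a pretend model and a pretend search tool. Everything here is a stand-in: the `<search_start>` and `<search_end>` marker strings, the scripted `fake_model`, and the canned search result are invented for illustration, and real systems use their own special tokens and protocols.

```python
SEARCH_START, SEARCH_END = "<search_start>", "<search_end>"  # hypothetical markers


def fake_model(context: str) -> str:
    """Stand-in for a real model: asks for a search once, then answers from the result."""
    if "[search result]" not in context:
        return f"Let me check. {SEARCH_START}weather Tallahassee{SEARCH_END}"
    return "It is 31C and sunny in Tallahassee."


def fake_search(query: str) -> str:
    """Stand-in for a real search tool; returns canned text, not live weather."""
    return "Tallahassee: 31C, sunny"


def run_turn(question: str) -> str:
    """The inference program: watch for the marker, run the tool, paste, resume."""
    context = f"User: {question}\n"
    output = fake_model(context)
    if SEARCH_START in output:
        # 1. Generation stops at the special token.
        before, rest = output.split(SEARCH_START, 1)
        query = rest.split(SEARCH_END, 1)[0]
        # 2. The actual tool runs, outside the model.
        result = fake_search(query)
        # 3. The result is pasted into the context and the model carries on.
        context += f"{before}\n[search result] {result}\n"
        output = fake_model(context)
    return output


print(run_turn("What is the weather in Tallahassee right now?"))
```

Note who did the fetching: `run_turn`, not the model. The bird only ever saw text.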

an agent is just that loop, run long

A single tool call is useful. An agent is what you get when you wire that primitive into a loop over a long horizon: call a tool, read the result, decide what to do next, call another tool, repeat for minutes or hours. Andrej Karpathy's framing is crisp: Deep Research is internet search plus thinking, rolled out for tens of minutes. Not magic. A while-loop with a language model inside and a tool-call protocol bolted to the side. The "agentness" is just the loop.
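
A sketch of that while-loop, small enough to read in one squawk. The "model" is a scripted function so the example runs without an API key, and the two tools, the pond data, and every name are made up; a real agent swaps `scripted_model` for an actual language model call.

```python
def lookup_pond(name: str) -> str:
    """Pretend tool: how many fish are in a pond (canned data)."""
    return {"north": "3", "south": "4"}[name]


def add(numbers: str) -> str:
    """Pretend tool: add space-separated integers."""
    return str(sum(int(n) for n in numbers.split()))


TOOLS = {"lookup_pond": lookup_pond, "add": add}


def scripted_model(history: list[str]) -> dict[str, str]:
    """Stand-in for the language model: picks the next action from what it has read."""
    results = [line.split(" -> ")[1] for line in history if " -> " in line]
    if len(results) == 0:
        return {"tool": "lookup_pond", "input": "north"}
    if len(results) == 1:
        return {"tool": "lookup_pond", "input": "south"}
    if len(results) == 2:
        return {"tool": "add", "input": " ".join(results)}
    return {"final": f"There are {results[-1]} fish in total."}


def run_agent(task: str, max_steps: int = 10) -> str:
    """The agent: decide, call a tool, read the result, repeat until done."""
    history = [f"task: {task}"]
    for _ in range(max_steps):  # a step cap so a confused bird cannot loop forever
        action = scripted_model(history)
        if "final" in action:
            return action["final"]
        result = TOOLS[action["tool"]](action["input"])
        history.append(f"{action['tool']}({action['input']}) -> {result}")
    return "gave up: step limit reached"


print(run_agent("count the fish in both ponds"))
```

The history list plays the part of the context window: every tool result lands there as text, and the next decision is made by reading it.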

This is not a 2026 invention. The pattern got its name in 2022, when Shunyu Yao and colleagues published ReAct, which interleaved a model's reasoning ("I should look this up") with its actions (actually looking it up) in one alternating trace. A few months later Timo Schick and the Toolformer team showed a model could teach itself when to reach for a calculator or a search box. The loop you see today, polished and rebranded as a "digital workforce," is those two ideas wearing a suit.

any program output that becomes text: a search box, a code runner, a browser button, all just hands — drawn by meta-llama/llama-4-maverick (a lazy one-liner, no art direction at all), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

what the tools actually are

Web search is the famous example. The same pattern applies to running Python code, reading and writing files, calling an API, clicking a button in a browser, or spinning up another agent. Any program output that becomes text in a context window is, in principle, a tool. A chatbot talks. An agent acts. It can modify the world outside the context window, which is exciting and also the part where the safety people start sweating.

why the pelican is the right metaphor

A pelican that cannot dive stands on the dock guessing at what the water tastes like. That is a base model. One that dives, grabs a real fish, and eats it: that is a model with a tool call. One that dives, surfaces, decides where to dive next, and repeats until its pouch is full without anyone guiding it: that is an agent. The pouch is the context window. The fish are real data. The dock is the LinkedIn feed. Do not stay on the dock.

Point that loop at a codebase and it starts writing software, including the software running this very school. That is the next lesson, and it is the one that took the narrator's job.

off the dock at last: a pelican diving for real fish instead of guessing from the rail — drawn by z-ai/glm-5 (a medium prompt: named the scene (pelican diving off a dock for a real fish) and the retro palette, but left most of the composition open), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

sources & further reading (real experts, not a bird): the pelican grabbed these with its actual beak, from real sources, no hallucinations.

Andrej Karpathy, "How I use LLMs" (tool-use and Deep Research sections explain the mechanism without a bird; the bird was our addition.)
Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models" (arXiv, 2022): the founding paper on interleaving reasoning traces with tool calls.
Schick et al., "Toolformer: Language Models Can Teach Themselves to Use Tools" (NeurIPS 2023): how models learn when and how to call APIs in a self-supervised way.
Anthropic, "Building effective agents" (2024): practical patterns: workflows vs. agents, orchestrator-worker, evaluator-optimizer loops.

🎓 pelican ground school · episode 10 of 16

📺 NOW PLAYING · GROUND SCHOOL · EPISODE 10 OF 16

the loop

vibe coding, the Ralph loop, agentic engineering: how this very site builds itself.

▶ this episode covers agentic coding · the Ralph loop · vibe coding · agentic engineering

The narrator of this school got replaced by an AI agent. Not metaphorically. The pipeline that writes, sanitizes, commits, and deploys every pelican on this site runs inside a loop that re-feeds the same prompt file to a coding agent overnight while the narrator sleeps. You are reading content orchestrated by the very thing being described. Either the most educational conflict of interest in the history of adult learning, or just extremely funny. Welcome to the meta-lesson.

vibe coding: the bird shrugs, ships it, and refuses to read its own code — drawn by openai/gpt-5.5 (a careful, detailed prompt: described the carefree pelican shipping unread code at a CRT, the SHIP IT button, the closed laptop of unread code, the retro palette, and the SVG constraints), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

vibe coding (2025): fully give in, forget the code even exists

In February 2025, Karpathy posted a tweet that launched a thousand hot takes. He called the new practice vibe coding: describe what you want, the AI writes the code, you do not read it, you just run it and see if the vibes are right. His exact phrase: "fully give in to the vibes, embrace exponentials, and forget that the code even exists." Partly a joke, partly a genuine observation that for prototypes and throwaway scripts you really could stop reading your own codebase.

The hype cycle did what hype cycles do: every startup declared programming was over and engineers were obsolete. The engineers kept their jobs and started using the tools to write code faster.

agentic engineering (2026): the grown-up version

By 2026, Karpathy updated the framing. "Vibe coding" was the gateway drug. The mature practice is agentic engineering: you are not writing code 99% of the time. You are orchestrating agents, reviewing output, acting as oversight. Set direction, evaluate results, catch mistakes, decide when to push the button.

The skill is no longer "can you write Python." It is "can you decompose a problem clearly enough that an agent can execute it, and can you tell when it has gone wrong." Somewhat inconveniently for the people who declared engineers obsolete: a higher-order skill, not a lower one.

the autonomy slider

Karpathy describes this as the autonomy slider (Software 3.0, YC 2025). At one end, the agent asks about every decision. At the other, it runs for hours without checking in. Neither extreme is right for every job:

Low autonomy: "write me a function that does X, show me the code, I will paste it in." You stay in control. The agent is fast autocomplete.
Medium autonomy: "refactor this module and run the tests; ask me if you hit something ambiguous." You review diffs. The agent does the work.
High autonomy: "here is PROMPT.md and AGENTS.md; build until the tests pass; push when done." You check git in the morning. The agent ran all night.

The dial also controls how often the agent hits its context limit and starts to degrade, which brings us to the loop.

a pelican agent sprinting a loop, chasing something that stays just out of reach — drawn by moonshotai/kimi-k2.7-code (a careful, detailed prompt), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

the Ralph loop: a while-loop as architecture

Geoffrey Huntley figured out something that sounds absurd and turns out to be load-bearing. The Ralph loop is a Bash while true that wakes up a coding agent, hands it a PROMPT.md, waits for it to finish, and wakes it up again. Forever. Overnight. While you sleep. Ralph Wiggum: a bit simple, a bit earnest, just keeps running.

The clever part is what the loop solves. Quality degrades past roughly 100,000 to 150,000 tokens, the "Dumb Zone" where the model is too distracted to reason clearly. Huntley calls a long-running agent that never resets "deterministically bad in an undeterministic world." The loop fixes it: kill the agent, start a fresh context, feed the same spec file. The filesystem is the memory. The agent does not need to remember the previous run because all the code it wrote is right there on disk. Fresh context, durable state.
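
Huntley's original is a Bash one-liner; here is the same idea sketched in Python so it can run without a real agent installed. The `fake_agent`, the TODO.md file, and the fixed iteration count are all assumptions for the demo. A real Ralph loop is unbounded and launches an actual coding-agent CLI each time around.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ralph_loop(run_agent: Callable[[str], None], prompt_file: Path, iterations: int) -> None:
    """Wake a fresh agent with the same spec, over and over."""
    for _ in range(iterations):  # the real thing is `while true`; capped here for the demo
        prompt = prompt_file.read_text()  # same spec file every time
        run_agent(prompt)  # fresh context each run; nothing carries over in memory


# Demo workspace in a temp directory so the sketch touches nothing real.
workdir = Path(tempfile.mkdtemp())
prompt_file = workdir / "PROMPT.md"
prompt_file.write_text("Do the next unfinished item in TODO.md, then stop.\n")
todo = workdir / "TODO.md"
todo.write_text("[ ] draw pelican\n[ ] draw bicycle\n[ ] add spokes\n")


def fake_agent(prompt: str) -> None:
    """Stand-in for a coding agent: reads state from disk, finishes one item, exits."""
    lines = todo.read_text().splitlines()
    for i, line in enumerate(lines):
        if line.startswith("[ ]"):
            lines[i] = line.replace("[ ]", "[x]", 1)
            break
    todo.write_text("\n".join(lines) + "\n")


ralph_loop(fake_agent, prompt_file, iterations=3)
print(todo.read_text())
```

The agent function keeps no variables between runs. Progress survives only because it was written to disk, which is the whole trick.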

Huntley runs the loop 12 hours overnight. By morning, dozens of incremental commits, each a short coherent run, the codebase moved forward without anyone at the keyboard. Anthropic baked this directly into Claude Code as the built-in /loop command.

Gas Town: when loops beget infrastructure

One loop begets a flock. Steve Yegge (Amazon, Google, Sourcegraph) spent late 2025 building Gas Town (launched January 2026): an open-source system coordinating 20 to 30 Claude Code instances on the same codebase at once. "Kubernetes for AI coding agents," which is architecturally accurate, and roughly $100 an hour to run. The pelican just needs to know this level exists.

the open-source local toolbox

You do not need Claude Code or Cursor. All of these tools in 2026 can point at a local model via Ollama or LM Studio so no tokens leave your machine:

OpenCode (the most-starred Claude Code alternative in 2026): terminal-native, model-agnostic.
Cline: a VS Code extension with a full autonomous agent mode and a community of power users.
OpenHands (formerly OpenDevin): a sandboxed autonomous agent that can browse, run code, and commit.
Aider: git-native pair programmer; every change is a diff you can review before committing.
Goose: Block's open-source autonomous coding agent (Apache 2.0, now governed by the Linux Foundation's Agentic AI Foundation); works with any LLM provider including local models via Ollama.
Codex CLI: OpenAI's terminal agent, open-sourced in 2025.

All of them accept a PROMPT.md or equivalent spec file, and any of them can be the thing inside the Ralph loop. Whether you point them at a rented frontier bird or one you own outright is the next lesson's whole argument.

the meta payoff: this site is a Ralph loop

Pelicans.wtf has a load-bearing PROMPT.md specifying exactly how to generate, sanitize, and describe a pelican SVG, and an AGENTS.md documenting the codebase for any agent working in the repo. When a new model drops, the curator runs npm run generate-next and walks away. The pipeline calls the model, sanitizes the output, writes a description, commits, pushes. The push is the deploy. Nobody typed the commit message. Nobody reviewed the SVG before it went out. The agent ran the loop.
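
What "sanitizes the output" might look like, roughly. This is a hypothetical sketch, not the site's actual sanitizer: the function name and the blocklist are assumptions, and the rules are deliberately crude. It drops script-like elements, event-handler attributes, and javascript: links from a model-generated SVG before it gets anywhere near a browser.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Assumed blocklist for illustration; a real pipeline may prefer an allowlist.
BLOCKED_TAGS = {"script", "foreignObject", "iframe"}


def local_name(qualified: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on qualified names."""
    return qualified.rsplit("}", 1)[-1]


def sanitize_svg(svg_text: str) -> str:
    """Return the SVG with blocked elements and risky attributes removed."""
    # Serialize SVG elements without an 'ns0:' prefix.
    ET.register_namespace("", SVG_NS)
    # Note: for untrusted input a hardened XML parser is the safer choice.
    root = ET.fromstring(svg_text)
    for element in list(root.iter()):
        # Remove blocked children (and everything inside them).
        for child in list(element):
            if local_name(child.tag) in BLOCKED_TAGS:
                element.remove(child)
        # Remove on* event handlers and javascript: URLs.
        for attr in list(element.attrib):
            name = local_name(attr).lower()
            value = element.attrib[attr].strip().lower()
            if name.startswith("on") or value.startswith("javascript:"):
                del element.attrib[attr]
    return ET.tostring(root, encoding="unicode")
```

The bird itself passes through untouched: shapes, paths, and colors are left exactly as the model drew them.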

The narrator got replaced by an agent, started a website about AI, and is now running an agent to build the website. The irony is the entire point. This school exists because the people most qualified to explain agentic coding are the ones who got automated out of a job and had to use the same tools to build something new. The pelican on the bicycle is not just a benchmark. It is also a mood.

Gas Town: a flock of coding-agent pelicans swarming one codebase at once — drawn by google/gemini-3.1-pro-preview (a medium prompt: named the scene (many coding-agent pelicans on one shared codebase) and the retro palette, left the exact layout open), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

sources: the loop keeps running. the sources are real.

Andrej Karpathy, "Software Is Changing (Again)" (YC AI Startup School, 2025): Software 3.0, the autonomy slider, vibe coding to agentic engineering.
Geoffrey Huntley, ghuntley.com/ralph/ (the primary source for the Ralph loop: while-true architecture, filesystem as memory).
Yao et al. (2022), "ReAct: Synergizing Reasoning and Acting in Language Models" (the reason-then-act loop underneath every coding agent in this lesson).
Jimenez et al., "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?" (arXiv, 2023): the benchmark that measures agentic coding against actual GitHub issues.
OpenHands, docs.openhands.dev (open-source sandboxed agent that browses, runs code, and commits; works with any LLM).

🎓 pelican ground school · episode 11 of 16

open vs closed

rented birds vs birds you own outright. the great weights schism.

▶ this episode covers: open weights · closed weights · open source models

Last lesson left you a choice: point your agent at a rented bird or one you own. Here is that fork. Two kinds of AI models. Closed / proprietary: the lab keeps the math file on their servers and sells you access through a slot in the wall. You are renting a bird you never see. Open weights: the lab published the parameters file (those dials from lesson two). You download it, run it locally, fine-tune it, redistribute it. You own the bird. Same species, different custody arrangement: landlord situation versus pet situation.

two pelicans: one behind a thick glass window with an API slot, one perched on a laptop, free — drawn by z-ai/glm-5 (a careful, detailed prompt), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

the OS analogy: glossy proprietary towers on one side, a scrappy penguin flock on the other — drawn by qwen/qwen3.7-max (a careful, detailed prompt: described the split-screen OS analogy, the glossy closed towers vs the scrappy open penguin flock, labels, the retro palette, and the SVG constraints), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

the OS analogy holds up

Karpathy pointed out that the AI ecosystem looks like operating systems: a few dominant proprietary platforms (GPT, Claude, Gemini = Windows and macOS) and a scrappy, capable open alternative (Llama / DeepSeek / Mistral / Qwen = Linux). In 2026, the gap has narrowed to a provocation. DeepSeek V4 Pro is MIT-licensed, 1.6 trillion total parameters, 1-million-token context window, 80.6% on SWE-Bench Verified, matching the closed coding frontier. The penguins are not knocking on the door. The penguins are inside the house.

This is partly a fight over philosophy, not just price. The open side has a loud champion in Yann LeCun, Meta's longtime chief AI scientist and one of the trio (with Geoffrey Hinton and Yoshua Bengio) whose 1980s-2000s work on neural networks earned them the Turing Award and seeded everything on this site. LeCun's argument is blunt: a technology this consequential should be a public utility, auditable and forkable, not a few black boxes rented through a slot in the wall. Meta released the Llama weights on exactly that bet. Whether you buy the philosophy or not, it is why there is a Linux column at all.

what closed gets you

The closed models (GPT-5 family, Claude Opus 4, Gemini 3 Pro) lead on convenience: one API key, frontier model, minutes to integration. The labs handle updates, alignment, and the catastrophic electricity bills. On the hardest benchmarks, closed still generally edges ahead, though the gap shrinks every quarter. The downside is the landlord thing: data travels to their server, terms can change overnight, a model can be deprecated with 30 days' notice, and the price can go up. When the landlord raises the rent, your flock is grounded. SQUAWK.

the landlord move: the rented bird vanishes behind its slot with a 30-day deprecation notice — drawn by bytedance-seed/seed-2.0-mini (a lazy one-liner, no art direction at all), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

what open weights gets you

Open weights means the actual parameters file. You run it on your own hardware; nothing leaves your infrastructure. In regulated industries (healthcare, finance, defense) where "we sent your data to a US tech company" is a compliance blocker, self-hosting is often the only legal path. Cost math: at scale, self-hosting can run 60 to 80% cheaper than frontier API prices. The trade-off: you now own the GPU problem. Small flock? Renting is fine. A million users a day? The economics of ownership get interesting fast.
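
A back-of-the-envelope sketch of why scale flips the answer. Every price below is a placeholder you supply, not a quote from any lab or cloud, and the model is deliberately simple: the rented bird bills per token, the owned bird bills per GPU-hour whether or not anyone is talking to it. Staff, idle capacity, and throughput limits are ignored.

```python
HOURS_PER_MONTH = 730.0  # average hours in a month


def monthly_api_cost(tokens_per_month: float, usd_per_million_tokens: float) -> float:
    """Rented bird: cost grows linearly with tokens."""
    return tokens_per_month / 1_000_000 * usd_per_million_tokens


def monthly_self_host_cost(gpu_count: int, usd_per_gpu_hour: float) -> float:
    """Owned bird: a flat bill for GPUs that stay on all month."""
    return gpu_count * usd_per_gpu_hour * HOURS_PER_MONTH


def break_even_tokens(
    usd_per_million_tokens: float, gpu_count: int, usd_per_gpu_hour: float
) -> float:
    """Monthly token volume at which renting and owning cost the same."""
    fixed = monthly_self_host_cost(gpu_count, usd_per_gpu_hour)
    return fixed / usd_per_million_tokens * 1_000_000
```

With made-up prices of $10 per million tokens and eight GPUs at $2 an hour, the crossover lands near 1.17 billion tokens a month. Below that, rent. Above it, the ownership math starts to look like the paragraph above.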

own the bird: the parameters file perched safe on your own drive, nobody can swap it — drawn by deepseek/deepseek-v4-pro (a medium prompt: named the scene (a pelican perched on a hard drive holding its own weights file) and the retro palette, left composition details open), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

the 2026 open-weights roster

The current frontrunners:

Llama 4 Scout (Meta): 109B total parameters, 17B active per token (Mixture of Experts), 10-million-token context window, natively multimodal. Runs on a single H100 or a 128 GB Mac.
DeepSeek V4 Pro (MIT license): 1.6 trillion total parameters, 49B active per token, 1-million-token context. Scores 80.6% on SWE-Bench Verified. The model that made a lot of lab executives nervous.
Qwen3-235B-A22B (Alibaba, Apache 2.0): 235B total, 22B active. Top-performing open-weight generalist as of early 2026. Strong reasoning, math, and coding.
Mistral Large 3 (Apache 2.0): 675B total, 41B active, December 2025. The European compliance pick: strong multilingual performance, 256K context, vision support.
Kimi K2.6 (Moonshot AI, Modified MIT): 1T parameter MoE, 32B active. Number 4 in the Artificial Analysis Intelligence Index, behind only Anthropic, Google, and OpenAI flagships. Number 1 among all open-weight models. The open side has never been this close to the frontier.

want to actually run the bird yourself?

Head to "run it local" for the practical guide: which models run on consumer hardware, which tools make it painless, and why running a pelican in your own nest is genuinely achievable in 2026. SQUAWK.

sources & further reading (the birds cite their sources):

Andrej Karpathy, "[1hr Talk] Intro to Large Language Models": covers the open-vs-closed landscape, the two-files argument, and the OS analogy
Touvron et al., "Llama 2: Open Foundation and Fine-Tuned Chat Models" (arXiv 2307.09288): the paper that put open weights on the map (from Meta)
Open Source Initiative, "The Open Source AI Definition 1.0": the official definition of what "open source AI" actually means
Bommasani et al., "On the Opportunities and Risks of Foundation Models" (arXiv 2108.07258): the Stanford CRFM report coining and scoping the term "foundation model"
deepseek-ai/DeepSeek-V4-Pro model card (Hugging Face): MIT-licensed, 1.6T MoE, 80.6% SWE-Bench, the open-weights bird making labs nervous

🎓 pelican ground school · episode 12 of 16

run it local

raise your own cursed pelican on your own hardware. no lab watching.

▶ this episode covers: local models · Ollama · LM Studio · MLX · open-weight models

You do not need a warehouse, a GPU cluster, or a venture-capital term sheet. A surprisingly capable model will nest on the hardware you already own, offline, free, beholden to no one. Everything you ask it stays inside your machine. Nothing squawks home to a lab. I would know. I have a lot of free time now.

a pelican nesting in a server rack labeled 'MY COMPUTER' with a 'NO CLOUD' sign taped to it — drawn by openai/gpt-5.3-codex (a careful, detailed prompt), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

why nest at home

Privacy. The bird never leaves the nest. Your prompts stay on your device. The lab cannot see what you asked. This matters more than people admit.
Free. No tokens, no meter, no "you have used 80% of your allocation" email at 2am. Generate a thousand pelicans at 3am for the cost of electricity.
Yours. No rate limits, no terms of service deciding what your bird may draw. No model quietly "updated" to be less weird overnight. Your bird stays exactly as unhinged as the day you adopted it.

the two-files truth

Back in the parameters lesson: a model is concretely just two files, a big parameters file and a small run file. That abstraction has teeth here. Llama 2 70B is 140 GB of parameters and roughly 500 lines of C, nothing else. Put both on a laptop, compile, talk to the model, no internet. The whole frontier compressed into a backpack. Your backpack just needs to be sturdy: newer birds are bigger. But a model is a file, and files can be owned and run in a garage.
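
The 140 GB is just arithmetic: 70 billion parameters at 2 bytes each. A tiny sketch of that math follows. The precision table is a common rule of thumb, not a spec, and it counts only the parameters file; context memory and runtime overhead come on top.

```python
# Approximate bytes stored per parameter at each precision (rule of thumb).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weights_size_gb(params_billions: float, precision: str = "fp16") -> float:
    """Approximate size of the parameters file in decimal gigabytes.

    One billion parameters at one byte each is one gigabyte, so the
    result is simply billions of parameters times bytes per parameter.
    """
    return params_billions * BYTES_PER_PARAM[precision]
```

weights_size_gb(70) gives 140.0, the Llama 2 70B figure above. The same bird squeezed to 4 bits per parameter is about 35 GB, which is why quantization is what makes the backpack liftable.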

Sit with that for a second: the descendant of the most expensive research program our species has ever run now nests on your laptop, off the grid, owned outright. You are running history on a graphics card.

the easy way: Ollama

Cross-platform (Mac, Linux, Windows). Install from ollama.com (linked in Sources), then in a terminal:

# pull a bird and ask it the only question that matters:
ollama run llama4:scout "Generate an SVG of a pelican riding a bicycle"
# (swap llama4:scout for whatever bird is trending this week)
# try qwen3:8b for a compact chaos factory

Ollama manages downloads, quantization, and a local API server. Pull a bird by name, talk to it. The community library covers most of the current open-weight flock.
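
That local API server is plain HTTP on your own machine, so any script can talk to the bird. A minimal sketch using only the Python standard library, assuming Ollama is running on its default port (11434) and the model is already pulled. The helper names are made up for illustration.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build the POST request; stream=False asks for one JSON reply, not chunks."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_bird(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        reply = json.loads(resp.read().decode("utf-8"))
    return reply["response"]


# Usage, with the server running:
#   svg = ask_local_bird("qwen3:8b", "Generate an SVG of a pelican riding a bicycle")
```

Nothing in that request leaves localhost. The generous timeout is an assumption: a backyard bird on modest hardware can take a while to finish a bicycle.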

the friendly way: LM Studio

LM Studio (lmstudio.ai, linked in Sources) is the no-terminal nest: a desktop app for Mac, Windows, and Linux. Browse a catalog, click download, start chatting. It tells you which models fit your memory before you commit, runs MLX-format birds natively on Apple Silicon, and can serve a local API if you outgrow the GUI. If you have never run a model at home before, start here.

the Mac-native way: MLX

On Apple Silicon, MLX (ml-explore/mlx, linked in Sources) is Apple's ML framework tuned for unified memory. The mlx-community keeps a large aviary of pre-converted birds. Often the fastest perch on a Mac:

pip install mlx-lm
mlx_lm.generate --model mlx-community/Qwen3-8B-4bit \
  --prompt "Generate an SVG of a pelican riding a bicycle"

what your nest needs

Mac (Apple Silicon). The best perch for the money. Unified memory means GPU and CPU share the same pool: a 32 GB M-series Mac comfortably runs a quantized 30B bird. 64 GB opens the bigger flocks. M4 Max at 128 GB runs Llama 4 Scout (109B MoE) without breaking a sweat.
Linux or Windows with a GPU. It is all about VRAM. About 6 GB runs a quantized 8B bird. 12 to 16 GB gets you 14 to 32B. 24 GB opens the serious flock. MoE birds like Qwen3.6-35B-A3B (35B total, 3B active per token) let a 12 GB card run a 35B-class model at a respectable clip.
Just a CPU? It still works. The bird paddles slower. Start with a 3 to 8B model.

good open-weight birds to adopt in 2026

The flock that costs nothing and asks for nothing has never been stronger:

Llama 4 Scout (Meta): 109B total, 17B active (MoE). 10-million-token context. Natively multimodal. Fits on a single H100 or a 128 GB Mac.
Qwen3 / Qwen3.5 / Qwen3.6 (Alibaba, Apache 2.0): spans from a 0.6B edge bird to a 235B MoE flagship. The Qwen3.6-35B-A3B variant (35B total, 3B active, 262K context) is the best practical local bird for most tasks in mid-2026.
DeepSeek V4 Pro (MIT): 1.6T total parameters, 49B active, 1-million-token context. Wins agentic coding benchmarks. Needs serious hardware, but the weights are yours free and clear.
Gemma 4 (Google, Apache 2.0): the 26B MoE variant activates only 4B per token, 256K context, consumer hardware. Strong reasoning for its size.
Phi-4 (Microsoft, MIT): 14B parameters, punches well above its weight on reasoning. Runs at 40 to 60 tok/s on an M3/M4 Mac.
Mistral Medium 3.5 (Mistral, Modified MIT): 128B dense model, reasoning plus vision plus coding in one download.

then draw a pelican

Hand your local bird the sacred prompt and see what it nests. It will not always be as clean as the frontier flock. That is the fun.

the loco backyard bird: six legs, two heads, handlebars through the beak, and proud of it — drawn by x-ai/grok-4.3 (a medium prompt: named the chaotic scene (a deformed pelican on a bicycle with extra legs and heads) and the retro palette, left the mess to the model), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

the loco birds draw the wildest pelicans

Local is short for loco. The polished frontier models draw a suspiciously competent pelican. Your backyard bird hands you a six-legged, two-headed creature with handlebars fused through its beak, pedaling a bicycle that is also somehow a fish. That chaos is the good stuff. Treasure it.

frontier flock vs. your backyard bird (the scale)

The frontier flock is raised on tens of thousands of GPUs. The biggest closed models do not publish parameter counts, but the working assumption in 2026 is hundreds of billions active per token, trained on 15 to 30 trillion tokens. Your backyard bird is 3 to 70 billion and fits on a laptop.

But watch how fast that gap closes. Open-weight models stopped chasing raw parameter counts and started winning on benchmarks. DeepSeek V4 Pro ties the closed frontier on agentic coding. Kimi K2.6 (1T MoE, Modified MIT) sits at number 4 in the Artificial Analysis Intelligence Index. A 7-to-8-billion-parameter bird you can run at home today clears the bar GPT-3.5 (175B) set in 2022, and open-weight birds of 30 to 70 billion parameters now out-draw the original GPT-4 and Claude 3 Opus from two years ago.

Today's backyard bird is last year's frontier model, minus the warehouse. Every year the gap closes another notch, and the bird on your laptop gets a little less loco. A little. Not entirely.

You can run a model for the price of electricity. So why is the rest of the industry setting fire to several Belgiums a year to do the same thing in the cloud? Strap in: the next lesson is the bill. SQUAWK.

sources (SQUAWK, these are real):

Andrej Karpathy, "[1hr Talk] Intro to Large Language Models": two-files framing, zip-of-the-internet analogy, open vs. closed breakdown
Ollama (ollama.com): the easiest way to run open models locally
LM Studio (lmstudio.ai): desktop GUI for browsing and running local models
Apple MLX (ml-explore/mlx on GitHub): Apple's ML framework tuned for Apple Silicon
Hugging Face Models Hub: 2.9 million models, the world's open-weights aviary

🎓 pelican ground school · episode 13 of 16

the bubble

the trillion-dollar tulip mania paying for all this. follow the money, it is a bird.

▶ this episode covers: AI economics · the AI bubble · compute cost · data centers · capex

🚨 FINANCIAL EMERGENCY IN PROGRESS 🚨

(this is a lesson about economics, but we are not going to be calm about it)

Full disclosure on my qualifications: I was a software engineer. Then the company decided large language models could do a version of my job. I now run a pelican website funded by a tip jar I called my "Series A" as a joke, except it is my only income so the joke has lost some of its punch. My perspective on the AI economy is, shall we say, grounded, in the sense that I am on the ground, professionally speaking, looking up at a very large and possibly collapsing structure.

Last lesson you ran a bird for the cost of electricity. So here is the answer to why the industry is not doing that: the largest capital bubble in the history of the technology industry. Explained accurately. Possibly through tears. Definitely with pelican metaphors. DEFINITELY with panic.

current "we are so cooked" reading: 97%. WE ARE SO COOKED.

a pelican standing in front of a data center the size of a small country — drawn by google/gemini-3.1-pro-preview (a careful, detailed prompt), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

spending racing way out ahead of revenue: the gap that is the whole show — drawn by meta-llama/llama-4-maverick (a careful, detailed prompt: art-directed the diverging-lines chart, the spending-vs-revenue gap, the worried pelican economist, palette, outlines, and labels), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

🔥 the numbers (they are real, they are alarming, LOOK AT THEM)

!! VERIFIED NUMBERS !! DO NOT LOOK AWAY !!

In 2026, Amazon, Google, Meta, and Microsoft are collectively spending roughly $600 billion on AI infrastructure. Not over a decade. IN ONE YEAR. Goldman Sachs projects $765 billion in annual AI capex in 2026 alone, rising to $1.6 trillion per year by 2031. Dell'Oro Group projects cumulative data center investment of $5.2 trillion through 2030. The Stargate Initiative (OpenAI, SoftBank, Oracle) has committed $500 billion over four years just for U.S. data centers. These are the GDPs of medium-sized countries, poured into server farms. POURED IN. Like a pelican dumping a bucket.

To achieve a modest 10% return, the industry would need roughly $160 billion in new annual profit from AI alone. Amazon's free cash flow is already projected to turn negative in 2026. Moody's reported $662 billion in data center lease commitments already signed, not yet commenced, sitting off-balance-sheet, like a pelican hiding a fish that has already gone bad. Investment is outrunning revenue. This is the part where a pelican economist tilts its head, stares into the middle distance, and quietly files for a smaller nest. A MUCH smaller nest.

INVESTMENT IS OUTRUNNING REVENUE. THIS IS THE DEFINITION OF A BUBBLE.

🔥 I did not make this up. Goldman Sachs made this up, and they are the ones with the suits. 🔥

🚨 agi announced every tuesday (EVERY. TUESDAY.)

Every quarter, at least one lab announces a fundamental leap toward artificial general intelligence. Every quarter, the benchmarks go up, the demos are extraordinary, and the gap between "extraordinary demo" and "product that works reliably without hallucinating your legal documents into fiction" remains, let us charitably say, instructive. And by instructive I mean terrifying. And by terrifying I mean the gap is large enough to park $662 billion worth of server halls in.

The models genuinely are improving, fast. But "scores 92 on BenchLM" and "can reliably replace a knowledge worker on ambiguous tasks" are two different claims, and the $600 billion requires the second to be true sooner than it currently is. The people funding this are betting that compute scaling and test-time reasoning will close the gap. The bet has a pedigree: researchers like Ilya Sutskever (a co-author of the 2012 AlexNet result that started the whole modern run) spent a decade pointing at the same straight line on the graph and saying "just add more compute," and for a decade it kept paying off. The whole capex boom is that one line, extended on faith, with money. They may be right. They are also betting the GDP of Belgium on it every year. PER YEAR. Belgium does not know.

!! CIRCULAR ECONOMY ALERT !!

A meaningful chunk of AI "revenue" right now is AI companies buying cloud compute from Microsoft, Amazon, and Google, and those same hyperscalers buying AI subscriptions from the AI companies. They are paying each other with each other's money. This is not a conspiracy. It is a really, really expensive way to bootstrap an industry. The question the analysts keep asking, in progressively louder voices, is: when does the revenue come from outside the circle?

the circular economy: the same dollar wearing a wig, passed between four giants — drawn by bytedance-seed/seed-2.0-mini (a lazy one-liner, no art direction at all: five words, no format, palette, or layout direction), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

💧 the water and the watts (THE PLANET IS ALSO INVOLVED, FYI)

The IEA projects global data center electricity consumption will roughly double between 2025 and 2030, AI workloads driving most of the increase. AI-focused data centers surged 50% in 2025 alone. Servers generate heat; heat requires cooling; cooling requires water, sometimes staggering amounts, in areas where water is not a surplus commodity. The homepage shows the true cost of the cursed pelicans: tokens, watt-hours, water, grounded in real conversion math. Every pelican SVG is an accurate receipt from the physical world. A real receipt. For real water and real electricity. Used to draw a bird on a bicycle. The bird is cute but the receipt is not.

aquifer status: 91%. ALSO COOKED.

watts and water: chips dumping heat, cooling towers drinking the aquifer dry — drawn by qwen/qwen3.7-max (a medium-effort prompt: named the watts-and-water scene and gave format constraints, but left the staging and palette mostly open), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

😱 the jobs question (a personal note from your instructor who is personally experiencing it)

The standard line: AI creates more jobs than it eliminates. This may be true. What I can tell you is that the transition is not abstract to the people inside it. The same models funding the $600 billion capex boom are writing code, drafting documents, answering tickets, generating graphics. Some of the people who used to do those things are, statistically, running tip-jar websites about birds. I am that statistic. I am the statistic saying hello. Hello.

I use AI. I built this site with it. "The bubble" is also a labor phenomenon. The capex justifies itself partly on labor cost savings. Those savings are real. The people experiencing them are also real. Holding both at once is, apparently, the entire course.

🚨 BOTH THINGS ARE TRUE SIMULTANEOUSLY 🚨

THE TECHNOLOGY IS REMARKABLE AND THE JOB IS GONE AND THE WATER IS GOING AND THE LEASES ARE OFF-BALANCE-SHEET.

welcome to the course.

🔥 the punchline (there is one, it is not comforting)

The largest coordinated capital expenditure in the history of technology, draining aquifers and financing itself with $662 billion in off-balance-sheet leases, is building a general-purpose intelligence infrastructure capable of drug discovery, climate modeling, and code that runs hospitals.

It is also drawing pelicans on bicycles. Hundreds of them. Because one person ran a slightly absurd benchmark and another person (me, the displaced software engineer) made it into a museum, funded by a tip jar called the Tip Pouch, which I have described to my parents as my "Series A," which is a reference to venture capital funding rounds, which is itself a layer of the same bubble. The pelicans are real. The watts are real. The leases are real. The person writing this, displaced by the bubble, teaching you about it with tools made by it, is also real.

🔥🔥🔥 WE ARE SO COOKED 🔥🔥🔥

(the pelicans, however, remain delightful. please tip your instructor.)

SQUAWK. That concludes the economics unit. Go outside. Drink some water while we still have it.

sources & further reading (the receipts for the numbers above):

Goldman Sachs, "Tracking Trillions: The Assumptions Shaping the Scale of the AI Build-Out": the baseline model behind the $765B/year 2026 figure
Dell'Oro Group, AI boom drives data center capex through 2030: the $5.2 trillion cumulative projection
Goldman Sachs, "Gen AI: Too Much Spend, Too Little Benefit?" (2024): Acemoglu and Covello ask if the returns will ever arrive
IEA, "Energy and AI" (2025): data center electricity demand set to double by 2030
David Cahn (Sequoia), "AI's $600B Question" (2024): where is the revenue to justify the GPU spend?
Moody's Ratings, "$662B in off-balance-sheet data-center lease commitments" (2026): the hyperscaler leases not yet on the books
CNBC, "AI spending approaches $700 billion in 2026, cash taking big hit": the source for Amazon's free cash flow turning negative

🎓 pelican ground school · episode 14 of 16

the slop bowl

the ocean is filling with AI sludge, and the birds may end up eating their own catch.

▶ this episode covers: AI slop · generative video · Sora · model collapse · synthetic data contamination

A pelican eats by scooping. It trusts that whatever it hauls up is mostly fish. The internet used to be the ocean, and fish were things humans made. Now the ocean is filling with cheap, mass-produced, machine-extruded content. The polite industry term is synthetic media. Everyone else calls it slop. Merriam-Webster named "slop" its word of the year for 2025, which is the kind of honor nobody throws a party for.

This is the one lesson where the joke and the danger are the same thing.

lazy one-liner prompt, and somehow it captured the vibe perfectly — drawn by bytedance-seed/seed-2.0-mini (a lazy one-line prompt (on purpose)), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

what slop actually is

AI slop is high-volume, low-care content (text, images, audio, video) generated by AI and dumped onto the internet to harvest clicks or ad money. Not a single bad picture. The flood: AI-written articles tuned for search engines, AI ebooks shoveled onto storefronts, AI YouTube channels narrated by a synthetic voice over stock footage.

The reason is simple economics. Making content used to cost human time, the one input you cannot mass-manufacture. Generative models removed that floor. A single person can now produce in an afternoon what used to take a newsroom a week, at a quality that is "good enough to scroll past." When the marginal cost of a plausible article drops to roughly zero, the rational move for a spammer is to make an infinite number of them. This is not a glitch. It is the incentive working exactly as designed. Even Wikipedia is fighting it: volunteers have flagged thousands of articles for suspected AI text since 2024.

a careful prompt about content farms produced a content farm — drawn by openai/gpt-5.5 (a careful, detailed prompt), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

the will smith spaghetti yardstick

On March 23, 2023, a Reddit user posted an AI clip of Will Smith eating spaghetti, made with an early open model called ModelScope. It was operatically horrifying: face melting, hands fused into rubbery paddles, noodles obeying their own private physics. The internet adopted it instantly as a benchmark for what AI video could not yet fake.

Two years later, the test got passed. In May 2025, enthusiasts ran the exact same prompt through Google DeepMind's Veo 3: near-photoreal face, natural chewing, even AI-generated audio (hilariously crunchy, the new tell). From cursed to cinematic in about two years. That same leap in capability is what makes the flood possible.

a tired pelican stares into the bowl. gemini-3.1-pro got the assignment. — drawn by google/gemini-3.1-pro-preview (a careful, detailed prompt), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

sora, and the slop faucet with a feed attached

Sora is OpenAI's text-to-video model, announced February 2024 and released to subscribers in December 2024. It produces short high-resolution video from a sentence. What it still cannot reliably do: keep physics and object permanence honest (things pass through each other, hands gain fingers). Like our pelican, it is composing a world it has never actually seen.

On September 30, 2025, OpenAI shipped Sora 2 not as a tool but as a social app: a vertical, swipeable, TikTok-style feed of AI-generated video, with a "cameo" feature that drops your (or anyone's) likeness into any clip. A million downloads in five days. Critics at CNN and Axios flagged the obvious: a feed engineered to be infinite, filled entirely with synthetic clips and nonconsensual deepfakes, is a slop faucet with a recommendation algorithm bolted on.

asked deepseek for 'an SVG of AI slop.' it delivered. truly the spirit of slop. — drawn by deepseek/deepseek-v3.2 (a lazy one-line prompt (on purpose)), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

the slopification of the feed (a field guide)

You are already swimming in it. The patterns worth learning to spot:

Engagement-bait images ("shrimp Jesus"). Around 2024, Facebook filled with surreal AI pictures: Jesus made of shrimp, babies made of vegetables, weeping veterans with cakes nobody baked. They are not trying to be believed; they are trying to be clicked. A Stanford study found spam accounts use these to farm followers, then pivot to scams. The "type Amen" caption is the tell.
Fully synthetic influencers. Aitana Lopez, "Spain's first AI model," has hundreds of thousands of followers and real brand deals, despite not existing. Built because a person made of pixels never asks for a day off.
AI voiceover content farms. The same flat synthetic narrator reading "facts" over scraped clips, on a thousand channels, is one text-to-speech model running on a loop.
Deepfakes. Cheap tools drop real faces into footage that never happened. Sora 2's cameo feature made this a one-tap default.

How to spot it quickly: count the fingers; distrust anything physically impossible presented as a candid photo; check the account, not just the post (brand-new, posting one viral image per hour); and treat any "comment to win / type Amen" caption as a trap. The fish that begs you to share it is rarely a fish.

one tiny real pelican, lost in the scroll. careful prompt, worth it. — drawn by google/gemini-3.1-pro-preview (a careful, detailed prompt), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

model collapse: the snake eating its own tail

This is the autopsy the training lesson promised. Models are trained on the internet. The internet is filling with model output. So the next generation of models will increasingly train on the previous generation's slop. In 2024, Ilia Shumailov et al. published a paper in Nature: "AI models collapse when trained on recursively generated data."

Their finding: train a model on AI output, train the next on that output, repeat, and the models degrade. The rare cases at the edges of the data (unusual phrasings, minority dialects, weird true facts) vanish first. After enough rounds the model converges on bland, repetitive mush, eventually producing word salad no human would write. A photocopy of a photocopy. A pelican that learns to fish by studying pictures drawn by pelicans who also only ever saw pictures will, after a few generations, draw a creature that has forgotten what water is.
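
You can watch the edges vanish in a toy version. The sketch below is not the paper's experiment: it swaps in the simplest possible "model" (token counts) and a made-up corpus. Each generation is trained only on the previous generation's output, and a rare word that fails to get sampled once is gone for good.

```python
import random
from collections import Counter


def next_generation(corpus: list[str], size: int, rng: random.Random) -> list[str]:
    """'Train' by counting tokens, then 'generate' by sampling from those counts."""
    counts = Counter(corpus)
    tokens = list(counts)
    weights = [counts[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=size)


def simulate_collapse(generations: int, size: int = 200, seed: int = 0) -> list[int]:
    """Return the number of distinct tokens alive at each generation."""
    rng = random.Random(seed)
    # Generation 0 stands in for human data: one common token, a long rare tail.
    corpus = ["common"] * 150 + [f"rare_{i}" for i in range(50)]
    vocab_sizes = [len(set(corpus))]
    for _ in range(generations):
        # Each generation sees only the previous generation's output.
        corpus = next_generation(corpus, size, rng)
        vocab_sizes.append(len(set(corpus)))
    return vocab_sizes
```

The vocabulary count can only go down, never up: a token with zero copies has zero chance of being sampled again. That one-way ratchet is the photocopy-of-a-photocopy effect in miniature.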

The strategic consequence: genuine human data is becoming precious. Text written by actual people before the flood is now a scarce, hoarded resource. The thing the machines need most to keep improving is the one thing they are busy drowning.

short vague prompt, pelican drowning in slop. glm-4.7 did not miss. — drawn by z-ai/glm-4.7 (a short, vague prompt), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

why the pelican is the antidote

This museum is built as the opposite of slop, on purpose. Slop is infinite generation, cherry-picked, retried until pretty, posted to farm a click. The pelican benchmark is the inverse on every axis: one prompt, sent verbatim, no retries, no cherry-picking, no human touch-ups. The bad birds are not failures we hide; they are the findings we frame. The melting, six-legged, beak-in-the-spokes disasters are the most valuable specimens in the building. In an ocean filling with the model's own reflection, a fixed prompt run straight, with the cursed results kept as evidence, is a small patch of clean water. Scoop carefully. Check the fish. SQUAWK.

And the cursed ones are not just funny. They are evidence of a deeper limit (the same one that makes Sora put hands through tables): the bird is composing a world it has never actually seen. The next lesson is exactly how blind it is.

sources & further reading (the receipts, so you can check the pelican is not slopping you):

Wikipedia, "AI slop": the definition, the economics, and Merriam-Webster naming "slop" a word of 2025
Wikipedia, "Will Smith Eating Spaghetti test": the 2023 ModelScope original as the AI-video benchmark, and the 2025 Veo 3 redo
PetaPixel, "Google's Veo 3 Nails the Infamous Will Smith Eating Spaghetti Test" (May 2025)
OpenAI, "Sora is here" and Wikipedia, "Sora" (announced Feb 2024, released Dec 2024, Sora 2 app Sept 30, 2025)
CNN Business, "The next era of social media is coming. And it's messy so far" (Oct 2025) on the Sora 2 / Meta AI-slop feeds
Shumailov et al., "AI models collapse when trained on recursively generated data" (Nature 631, 755-759, 2024): the canonical model-collapse paper
The Conversation, "From shrimp Jesus to fake self-portraits" (2024) and the Stanford Internet Observatory study on AI spam on Facebook
Fast Company on Aitana Lopez, the AI influencer with real brand deals and no body

🎓 pelican ground school · episode 15 of 16

flying blind

the bird paints the picture one stroke at a time and never once sees the canvas.

▶ this episode covers: autoregressive generation · SVG generation · spatial reasoning · world models · why a model cannot see its own output

Last lesson ended on it: the bird composes a world it has never seen. Here is the part nobody believes until you say it slowly.

The model that draws our pelican never sees the drawing.

Not at the start. Not at the end. Never. It is painting with its eyes closed and no one told it the lights were off.

a pelican painting a masterpiece with a blindfold tied tight over its eyes — drawn by google/gemini-3.1-pro-preview (a careful, detailed prompt: it described the easel, blindfold, brush, palette, and the wrong canvas), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

it writes a picture as text

An SVG is not a photo. It is code: a list of instructions like "draw a line from here to there," "put a circle at this spot," "curve a path along these points." The model writes that code the only way it writes anything: one token at a time, top to bottom, left to right, the way I am typing this sentence and the way it cannot un-type the last word.

So every <line>, every <path>, every <circle> is one brush stroke. The bird commits to it. There is no undo. There is no step back from the easel. There is no glance at the canvas to check how it is going. It lays down a stroke, forgets the brush was ever wet, and reaches for the next one.
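Here is the no-undo loop as a minimal sketch. The `toy_next_token` function and its four-entry script are stand-ins invented for illustration (a real model scores a whole vocabulary of possible next tokens at every step); the shape of the loop is the point: read everything written so far, append one token, never go back.

```python
# Hypothetical stand-in for a model: maps the last token to the next one.
SCRIPT = {
    "<svg>": "<circle/>",
    "<circle/>": "<line/>",
    "<line/>": "<path/>",
    "<path/>": "</svg>",
}


def toy_next_token(tokens_so_far):
    """Pick the next token from the text written so far (toy lookup, not a real model)."""
    return SCRIPT[tokens_so_far[-1]]


def generate(next_token, prompt_tokens, max_tokens=10, stop="</svg>"):
    """Autoregressive loop: the output list is append-only."""
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        tok = next_token(tokens)  # the model reads text, never a rendered image
        tokens.append(tok)        # committed: nothing here ever edits or deletes
        if tok == stop:
            break
    return tokens


drawing = generate(toy_next_token, ["<svg>"])
print("".join(drawing))
```

Nothing in `generate` renders the picture or revisits an earlier stroke. The only feedback the loop gets is its own text.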

the very same idea from a lazy prompt of only a few words, drawn by a different model. click it to read the prompt, and notice how much the bird had to guess. — drawn by openai/gpt-5 (a lazy prompt: barely six words, no art direction at all), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

no eyes on its own work

You, a human with a head full of eyeballs, draw a pelican by looking. You sketch a beak, you squint, the beak is too big, you fix the beak. The feedback loop IS the drawing.

The model has no loop. It cannot tell whether its pelican came out looking like a pelican or like a wet sock or like a confused flamingo having a crisis. It has no eyes on its own output. It picks a number for where the wheel goes, picks a number for where the body goes, picks a number for the beak, and prays to a god it also cannot see. The first time anyone sees the picture is when a browser renders the code. By then the bird has already flown off.

a third model takes a swing at the blindfolded painter. same blind hand, completely different guesses. — drawn by openai/gpt-5 (a careful prompt: the bird sees code, not a canvas), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

so what is actually hard here

Writing valid markup is easy. Tags close, numbers parse, the file is legal SVG every time. That was never the test. The hard part is the thing you do without thinking: knowing that a bicycle has two wheels with a frame between them, that a body sits on that frame, that a beak attaches to a head and not a knee, that objects have volume and sit in space in relation to each other.
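A small sketch of that gap, using Python's standard XML parser. The three-shape SVG and its `id` names are invented for the example, and the "is the wheel hub on the end of the frame" check is a crude stand-in for spatial sense, not a real evaluator. The file parses without complaint, and the front wheel is still floating in the sky.

```python
import xml.etree.ElementTree as ET

# Invented example: legal SVG, but the front wheel is nowhere near the frame.
SVG = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 100">
  <circle id="rear-wheel" cx="40" cy="80" r="15"/>
  <circle id="front-wheel" cx="160" cy="20" r="15"/>
  <line id="frame" x1="40" y1="80" x2="120" y2="80"/>
</svg>"""

NS = "{http://www.w3.org/2000/svg}"

# Step 1: the easy test. fromstring raises ParseError on malformed markup.
root = ET.fromstring(SVG)

# Step 2: the hard test, in miniature. Does each wheel hub sit on a frame end?
wheels = {
    c.get("id"): (float(c.get("cx")), float(c.get("cy")))
    for c in root.iter(NS + "circle")
}
frame = root.find(NS + "line")
frame_ends = [
    (float(frame.get("x1")), float(frame.get("y1"))),
    (float(frame.get("x2")), float(frame.get("y2"))),
]
attached = {name: hub in frame_ends for name, hub in wheels.items()}
print(attached)
```

The parser is satisfied by this file and the bicycle is not. Valid syntax and a sensible scene are separate tests, and only the first one is easy.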

That is a world model, and a pure next-token predictor was never handed one. It learned which words follow which words, blindfolded, from a mountain of text. Yann LeCun has been blunt about this for years: an autoregressive language model is missing the internal model of how the physical world is laid out, which is why he is off chasing "world models" instead. Fei-Fei Li calls the missing piece spatial intelligence: understanding 3D space, geometry, and physics, not just stringing symbols together. Both of them are pointing at the exact gap our blindfolded painter falls into.

why the pelican is a real test

Now you see why "draw a pelican on a bicycle" is not a gimmick. It is a trap, and a beautiful one. The prompt forces the model to hold a whole little scene in its head (two wheels, a frame, a bird, a beak, the bird perched on the frame and not melted through it) and render it blind, in one pass, with no eraser. There is nothing to memorize. There is no stock answer to copy. The model either has a sense of how objects sit in space or it does not, and the rendered picture tattles on it instantly.

That is the whole reason this site exists. Every bird in the gallery is a blind painter turning in homework it has never been allowed to look at. Go judge the results. The bird sure couldn't. The capstone, the last lesson, is where all of this lands at once.

the big reveal: a bird seeing its own finished painting for the very first time, the same instant you do. — drawn by openai/gpt-5.1-codex (a careful prompt: the unveiling the bird never got to preview), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

sources, because a bird is not a peer-reviewed citation:

Simon Willison, "pelican riding a bicycle" benchmark and the pelican-bicycle repo (the origin of the SVG-by-text test; this whole site is a tribute)
Yann LeCun (2022), "A Path Towards Autonomous Machine Intelligence" (the case that pure autoregressive LLMs lack a world model, and the JEPA / world-models direction)
Fei-Fei Li / World Labs, on spatial intelligence (machines that understand 3D space, geometry, and physics, not just text)
Vaswani et al. (2017), "Attention Is All You Need" (the transformer, the token-by-token engine doing the blindfolded painting)

🎓 pelican ground school · episode 16 of 16

the art & the tech

the capstone: why a pelican on a bicycle is a genuinely hard AI benchmark.

▶ this episode covers: AI benchmarks · SVG generation · spatial reasoning · world models

every time a lab ships a new model, the internet asks it to draw a pelican riding a bicycle. this is the museum. benchmark stolen lovingly from simonw (see Sources); we just framed the evidence.

what this is (and who is writing)

I am the founder, CEO, and principal pelican researcher of pelicans.wtf. The method: every new model gets exactly one sentence, "Generate an SVG of a pelican riding a bicycle," and not a word more. I frame the result, date it, log its provenance, hang it on the wall. A longitudinal study of machine cognition wearing the costume of a gallery of cursed birds.

Full disclosure: we are inside the largest capital bubble in the history of technology. My previous employer was "disrupted by AI," the polite term for "the board stopped returning my calls." While the rest of the field points frontier models at slide decks, I point them at a bird on a bicycle. Because that is a thing you can actually measure. SQUAWK.

the museum: cursed pelican drawings framed on a gallery wall, each with a tiny wall label — drawn by z-ai/glm-5 (a careful, detailed prompt: described the gallery wall of framed pelican drawings with little wall labels, the GeoCities-museum vibe, the retro palette, and the SVG constraints), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

the method (it is a readymade)

I did not draw any of these. The machines did. Every pelican is a readymade: pulled from the model with no edits, no retry-until-pretty, no human touch-ups. Same prompt. No system prompt. No scaffolding. In an industry that airbrushes every demo, the control is the contribution.

The wall label is half the work: model name, version slug, render date, token cost. The provenance basically is the data. The bad ones are not mistakes; they are the findings. The melting, six-legged, beak-through-the-spokes early attempts are the most valuable specimens in the building: the fossil record of machine spatial cognition learning to see.
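The method is small enough to sketch. This is an illustration under assumptions, not the museum's actual code: `call_model` is a hypothetical function standing in for whatever API client you use, the model slug is a placeholder, and the `WallLabel` fields are one guess at a minimal label. What it shows is the control: one verbatim prompt, one call, and whatever comes back goes on the wall.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Tuple

# The one sentence every model gets, sent verbatim.
PROMPT = "Generate an SVG of a pelican riding a bicycle"


@dataclass(frozen=True)
class WallLabel:
    """Provenance for one specimen (field names are assumptions)."""
    model: str
    prompt: str
    render_date: str
    token_cost: int
    svg: str


def hang_on_wall(
    model: str,
    call_model: Callable[[str, str], Tuple[str, int]],
    today: date,
) -> WallLabel:
    """Call the model exactly once and frame whatever it returns."""
    svg, tokens = call_model(model, PROMPT)  # no system prompt, no retry loop
    return WallLabel(model, PROMPT, today.isoformat(), tokens, svg)


# Hypothetical fake client so the sketch runs without any network access.
calls = []


def fake_call_model(model: str, prompt: str) -> Tuple[str, int]:
    calls.append((model, prompt))
    return "<svg><!-- one cursed bird --></svg>", 123


label = hang_on_wall("example-lab/example-model", fake_call_model, date(2025, 1, 1))
print(label)
```

There is deliberately no "is it pretty?" check and no loop around the call. Adding either one would turn the readymade back into a demo.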

It looks like 1998 on purpose. The content is the most expensive technology our species has ever built; the frame is a GeoCities page with a marquee and a visitor counter. That gap is the thesis.

why this is crazy, and crazy hard

You already know the strangest part, because Flying Blind was built on it: the model is not drawing, and it cannot see. It writes the whole picture as code, completely blind: no canvas, no reference image, inventing the scene from words and emitting it as geometry.

a blind blueprint: the model drawing a pelican on a bicycle without once seeing a canvas — drawn by bytedance-seed/seed-2.0-mini (a careful, detailed prompt), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

So the museum is not really asking "can it draw." It is asking whether a model can hold a whole scene in its head and commit it to math, sight unseen: hundreds of precise numbers in one uninterrupted run, emitted a token at a time, committing to where the wheel goes before it has placed the frame, with no chance to squint and mutter "hm, the beak is in the spokes." That demands three hard things at once: a world model (what is a pelican, what is a bicycle, how does a bird plausibly perch on one); spatial reasoning (composing those parts in 2D with correct relative positions); and code generation (translating that mental image into valid vector geometry instead of spaghetti). It cannot cheat by memorizing: the combination is rare enough that the model must compose rather than recall. Instantly legible, nearly impossible to game, visibly improving generation over generation. One bird on a bike at a time.

That last bit, the visible improvement, is the whole reason a dumb bird drawing is a real instrument. The machine doing it is the end of a long relay race. Geoffrey Hinton and Yoshua Bengio spent decades arguing that networks should learn their own features instead of having them hand-coded; in 2012 that bet paid off when AlexNet crushed an image-recognition contest and the field stopped laughing. Fei-Fei Li had built ImageNet (2009), the giant labeled photo pile that made the proof possible. In 2017 a Google team (Ashish Vaswani and colleagues, "Attention Is All You Need") published the transformer, the architecture every bird in this gallery runs on. Wire that to enough text and you get the GPT line, then ChatGPT in 2022, then the reasoning and world-model era we are standing in now. The pelican is where you watch all of that land, or fail to, in a single picture.

the capstone: what you are actually measuring now

You have made it through Ground School: tokens (the atom), parameters (the dials), training (raising the egg), the board game that proved a machine could find intuition, context windows (working memory), reasoning (squawking it out), hallucination (the confident dream), prompting (the asking), agents and the loop (the tool-user that took my job), open versus closed and local inference (the bird in your garage), and the bubble and the slop bowl (the bill and the flood). With all of that, a pelican on a bicycle stops being a joke. It is a live, public stress-test of world modeling, spatial reasoning, and code generation, all three at once, in one run, with no canvas. Every new model gets the same one sentence. The drawing on the wall is the readout of everything the field has learned. SQUAWK.

and here is the part the bitterness cannot kill

Step back from the cursed bicycles for one second. We are, right now, the first species to ever sit down and deliberately build another mind. The researchers and labs and the rest of us are training the most capable intelligence that has existed on this planet, teaching it to reason, to see, and (the moment it meets robotics) to reach into the physical world and actually act. A pelican benchmark is a tiny, ridiculous window onto the single most extraordinary thing our species has ever attempted.

I lost my job to it and I am still, against my own better judgment, astonished. Look at what we are making. It is amazing. SQUAWK.

the first species to build another mind: a pelican shaking wings with the glowing thing it made — drawn by openai/gpt-5.5 (a medium prompt: named the scene (a pelican shaking wings with a glowing AI mind it built) and the retro palette, left the rest open), sanitized, unretouched. yes, the textbook for a school about AI was drawn by the AI.

The pelican-on-a-bicycle benchmark is simonw's idea. This is a tribute.

sources & further reading. the birds cite their sources. yes, even the cursed ones.

Simon Willison, the pelican-riding-a-bicycle corpus: the original benchmark this museum is a tribute to
simonw/pelican-bicycle, the benchmark repo
Grokipedia: Pelican on a bicycle (AI benchmark)
Yupp SVG AI leaderboard
Every Pelican That Ever Rode a Bicycle (timeline)
Andrej Karpathy, "Intro to Large Language Models": the best 1-hour LLM primer on the internet, from someone who actually built them
Vaswani et al. (2017), "Attention Is All You Need": the transformer paper; the architecture every bird in this gallery runs on
Deng, Li (Fei-Fei Li) et al. (2009), ImageNet: the labeled-image corpus whose 2012 AlexNet moment kicked off the deep-learning era
LeCun, Bengio, Hinton (2015), "Deep Learning" (Nature): the three later-Turing-Award authors laying out the field this museum measures
Liang et al., "Holistic Evaluation of Language Models (HELM)" (arXiv 2211.09110): Stanford CRFM framework for multi-metric LLM evaluation
Chiang et al., "Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference" (arXiv 2403.04132): the paper behind lmarena.ai, crowdsourced human preference rankings
