Adactio: Tags—formation

Threat models

People talk about the effectiveness (or lack thereof) of large language models as though all tasks are comparable. But it strikes me that there are three broad categories of work that large language models are applied to:

Compression.
Transformation.
Expansion.

Compression is when you feed a large language model something big that you want to make small. Summarise this book. Give me the gist of this meeting. Large language models are generally pretty good at this, which makes sense given that they themselves are kind of like compressed artifacts.

Transformation is when large language models convert from one format into another. Turn this audio into text. Turn this jumble of data into structured JSON. A large language model can handle these tasks pretty well. There’ll probably be a few errors so make sure that’s not a deal-breaker.

Expansion is when you give a large language model a prompt to generate something from scratch. An image. A presentation. An email. A poem. This is where slop lives. The output inevitably betrays its origins, glistening with a sheen of mediocrity.

Laurie spotted this three-way split a while back:

Is what you’re doing taking a large amount of text and asking the LLM to convert it into a smaller amount of text? Then it’s probably going to be great at it. If you’re asking it to convert into a roughly equal amount of text it will be so-so. If you’re asking it to create more text than you gave it, forget about it.

I hope that when the bubble finally bursts, we’ll see the surviving large language models put to work on the first two categories. The boring stuff. The work that’s tedious for humans.

But tedious is as tedious does. Something I consider drudgery might be the very thing that gives you life. Like Giles says:

I have a feeling that everyone likes using AI tools to try doing someone else’s profession. They’re much less keen when someone else uses it for their profession.

The big exception seems to be programming. Apparently there are plenty of coders who never before expressed an interest in being managers who are now happily hanging up their coding spurs in favour of being the overseer of non-human workers.

It’s a reasonable outlook. It could even be considered a user-centred approach. Users don’t care about the elegance of your code; they care about accomplishing their tasks.

Programming is something of an exception to the efficacy of large language models in general. Instead of relying on the subjectivity of painting, poetry, or prose, programming can be objectively tested. Throw enough money at the worst people in the world and they’ll give you tokens you can use to get the machines to test their own output. So you can get a large language model to create something reasonably good from scratch as long as that something is code.

If you had asked me about the threat model of large language models two years ago, I probably would’ve been worried for artists, writers, and musicians. I thought that software had enough inherent complexity to be relatively safe.

Now my opinion has completely reversed. Software is almost certainly the killer app for large language models.

I think the artists, writers, and musicians will be okay, or at least as okay as they ever were. It turns out that humans like things made by other humans.

And y’know what? If I had to choose which endeavour I’d rather see automated away—programming or art—it’s no competition.

Don’t get me wrong—it would be nice if everyone got paid for doing what they enjoy. It’s just that I’m okay with software engineers not being at the front of that line.

I remember when I first started getting paid money to make websites. “Really?” I thought, “Someone is willing to pay me to do something I’d do anyway?” I kept waiting for the jig to be up. Instead I saw my profession grow and expand.

Perhaps there’s a long-overdue compression happening.

Or maybe it’s more like a transformation.

Denial

The Wikimedia Foundation, stewards of the finest projects on the web, have written about the hammering their servers are taking from the scraping bots that feed large language models.

Our infrastructure is built to sustain sudden traffic spikes from humans during high-interest events, but the amount of traffic generated by scraper bots is unprecedented and presents growing risks and costs.

Drew DeVault puts it more bluntly, saying “Please stop externalizing your costs directly into my face”:

Over the past few months, instead of working on our priorities at SourceHut, I have spent anywhere from 20-100% of my time in any given week mitigating hyper-aggressive LLM crawlers at scale.

And no, a robots.txt file doesn’t help.

If you think these crawlers respect robots.txt then you are several assumptions of good faith removed from reality. These bots crawl everything they can find, robots.txt be damned.
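For context on why robots.txt offers no protection: the check runs entirely on the crawler's side, so it only works when the crawler chooses to run it. Here is a minimal sketch of what a well-behaved crawler would do before each request, using Python's standard `urllib.robotparser`. The bot name `ExampleAIBot`, the rules, and the `example.org` URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, inlined so the example runs offline.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A polite crawler asks first and skips anything that comes back False.
print(may_fetch("ExampleAIBot", "https://example.org/tunes/1"))       # False
print(may_fetch("SomeBrowser", "https://example.org/tunes/1"))        # True
print(may_fetch("SomeBrowser", "https://example.org/private/notes"))  # False
```

Nothing on the server enforces the answer. A crawler that never calls the equivalent of `may_fetch`, or ignores the result, gets served like any other client.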

Free and open source projects are particularly vulnerable. FOSS infrastructure is under attack by AI companies:

LLM scrapers are taking down FOSS projects’ infrastructure, and it’s getting worse.

You try to do the right thing by making knowledge and tools freely available. This is how you get repaid. AI bots are destroying Open Access:

There’s a war going on on the Internet. AI companies with billions to burn are hard at work destroying the websites of libraries, archives, non-profit organizations, and scholarly publishers, anyone who is working to make quality information universally available on the internet.

My own experience with The Session bears this out.

Ars Technica has a piece on this: Open source devs say AI crawlers dominate traffic, forcing blocks on entire countries.

So does MIT Technology Review: AI crawler wars threaten to make the web more closed for everyone.

When we talk about the unfair practices and harm done by training large language models, we usually talk about it in the past tense: how they were trained on other people’s creative work without permission. But this is an ongoing problem that’s just getting worse.

The worst of the internet is continuously attacking the best of the internet. This is a distributed denial of service attack on the good parts of the World Wide Web.

If you’re using the products powered by these attacks, you’re part of the problem. Don’t pretend it’s cute to ask ChatGPT for something. Don’t pretend it’s somehow being technologically open-minded to continuously search for nails to hit with the latest “AI” hammers.

If you’re going to use generative tools powered by large language models, don’t pretend you don’t know how your sausage is made.

Open source devs say AI crawlers dominate traffic, forcing blocks on entire countries - Ars Technica

As it currently stands, both the rapid growth of AI-generated content overwhelming online spaces and aggressive web-crawling practices by AI firms threaten the sustainability of essential online resources. The current approach taken by some large AI companies—extracting vast amounts of data from open-source projects without clear consent or compensation—risks severely damaging the very digital ecosystem on which these AI models depend.

Go To Hellman: AI bots are destroying Open Access

AI companies with billions to burn are hard at work destroying the websites of libraries, archives, non-profit organizations, and scholarly publishers, anyone who is working to make quality information universally available on the internet.

The Blowtorch Theory: A New Model for Structure Formation in the Universe

Make yourself a nice cup of tea and settle in with Julian Gough’s magnum opus:

How early, sustained, supermassive black hole jets carved out cosmic voids, shaped filaments, and generated magnetic fields

AI wants to rule the World, but it can’t handle dairy.

AI has the same problem that I saw ten years ago at IBM. And remember that IBM has been at this AI game for a very long time. Much longer than OpenAI or any of the new kids on the block. All of the shit we’re seeing today? Anyone who worked on or near Watson saw or experienced the same problems long ago.

The New York Good Times

Better than the real thing. All true too.

Refresh for more.

What I’ve learned about writing AI apps so far | Seldo.com

LLMs are good at transforming text into less text

Laurie is really onto something with this:

This is the biggest and most fundamental thing about LLMs, and a great rule of thumb for what’s going to be an effective LLM application. Is what you’re doing taking a large amount of text and asking the LLM to convert it into a smaller amount of text? Then it’s probably going to be great at it. If you’re asking it to convert into a roughly equal amount of text it will be so-so. If you’re asking it to create more text than you gave it, forget about it.

Depending on how much of the hype around AI you’ve taken on board, the idea that they “take text and turn it into less text” might seem a gigantic back-pedal away from previous claims of what AI can do. But taking text and turning it into less text is still an enormous field of endeavour, and a huge market. It’s still very exciting, all the more exciting because it’s got clear boundaries and isn’t hype-driven over-reaching, or dependent on LLMs overnight becoming way better than they currently are.

Prescriptive and Descriptive Information Architectures | Jorge Arango

Interesting—this is exactly the same framing I used to talk about design systems a few years ago.

Information literacy and chatbots as search • Buttondown

If someone uses an LLM as a replacement for search, and the output they get is correct, this is just by chance. Furthermore, a system that is right 95% of the time is arguably more dangerous than one that is right 50% of the time. People will be more likely to trust the output, and likely less able to fact check the 5%.

Daring Fireball: Kottke on the Art and Power of Hypertextual Writing

Hypertext links are an information-density multiplier.

The way I’ve long thought about it is that traditional writing — like for print — feels two-dimensional. Writing for the web adds a third dimension. It’s not an equal dimension, though. It doesn’t turn writing from a flat plane into a full three-dimensional cube. It’s still primarily about the same two dimensions as old-fashioned writing. What hypertext links provide is an extra layer of depth. Just the fact that the links are there — even if you, the reader, don’t follow them — makes a sentence read slightly differently. It adds meaning in a way that is unique to the web as a medium for prose.

Information Architecture First Principles | Jorge Arango

People only understand things relative to things they already understand

People only understand things in context

People rely on patterns and consistency

People seek to minimize cognitive load

People have varying levels of expertise and familiarity

People are goal-oriented

People often don’t know what they’re looking for

Information is more useful when it’s actionable

Notes – David Bushell

David is on board. Who else?

Directory enquiries

I was having a discussion with some of my peers a little while back. We were collectively commenting on the state of education and documentation for front-end development.

A lot of the old stalwarts have fallen by the wayside of late. CSS Tricks hasn’t been the same since it got bought out by Digital Ocean. A List Apart goes through fallow periods. Even the Mozilla Developer Network is looking to squander its trust by adding inaccurate “content” generated by a large language model.

The most obvious solution is to start up a brand new resource for front-end developers. But there are two problems with that:

It’s really, really, really hard work, and
It feels a bit 927.

I actually think there are plenty of good articles and resources on front-end development being published. But they’re not being published in any one specific place. People are publishing them on their own websites.

Ahmed, Josh, Stephanie, Andy, Lea, Rachel, Robin, Michelle …I could go on, but you get the picture.

All this wonderful stuff is distributed across the web. If you have a well-stocked RSS reader, you’re all set. But if you’re new to front-end development, how do you know where to find this stuff? I don’t think you can rely on search, unless you have a taste for slop.

I think the solution lies not with some hand-wavey “AI” algorithm that burns a forest for every query. I think the solution lies with human curation.

I take inspiration from Phil’s fantastic project, ooh.directory. Imagine taking that idea of categorisation and applying it to front-end dev resources.

Whether it’s a post on web.dev, Smashing Magazine, or someone’s personal site, it could be included and categorised appropriately.

Now, there would still be a lot of work involved, especially in listing and categorising the articles that are already out there, but it wouldn’t be nearly as much work as trying to create those articles from scratch.

I don’t know what the categories should be. Does it make sense to have top-level categories for HTML, CSS, and JavaScript, with sub-directories within them? Or does it make more sense to categorise by topics like accessibility, animation, and so on?

And this being the web, there’s no reason why one article couldn’t be tagged to simultaneously live in multiple categories.
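One way to picture an article living in several categories at once is an index that maps each category to the articles tagged with it. What follows is only a rough sketch of that idea, not a proposal for how such a directory should be built. The `Resource` type, the `build_index` helper, the article titles, and the `example.com` URLs are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """One curated link. Every field here is illustrative."""
    title: str
    url: str
    categories: frozenset


def build_index(resources):
    """Map each category name to the list of resources tagged with it."""
    index = defaultdict(list)
    for resource in resources:
        for category in resource.categories:
            index[category].append(resource)
    return dict(index)


# Hypothetical entries: the first one sits in two categories at once.
resources = [
    Resource("Accessible animation basics", "https://example.com/a",
             frozenset({"accessibility", "animation"})),
    Resource("Getting started with grid", "https://example.com/b",
             frozenset({"CSS"})),
]

index = build_index(resources)
print([r.title for r in index["accessibility"]])
print([r.title for r in index["animation"]])
```

Because categories are a set on each resource rather than a single folder, the question of top-level structure (HTML, CSS, and JavaScript versus topics) becomes a matter of which tags to offer, not where a link has to be filed.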

There’s plenty of meaty information architecture work to be done. And there’d be no shortage of ongoing work to handle new submissions.

A stretch goal could be the creation of “playlists” of hand-picked articles. “Want to get started with CSS grid layout? Read that article over there, watch this YouTube video, and study this page on MDN.”

What do you think? Does this one-stop shop of hyperlinks sound like it would be useful? Does it sound feasible?

I’m just throwing this out there. I’d love it if someone were to run with it.

Labels

I love libraries. I think they’re one of humanity’s greatest inventions.

My local library here in Brighton is terrific. It’s well-stocked, it’s got a welcoming atmosphere, and it’s in a great location.

But it has an information architecture problem.

Like most libraries, it’s using the Dewey Decimal system. It’s not a great system, but every classification system is going to have flaws—wherever you draw boundaries, there will be disagreement.

The Dewey Decimal class of 900 is for history and geography. Within that class, those 100 numbers (900 to 999) are further subdivided in groups of 10. For example, everything from 940 to 949 is for the history of Europe.

Dewey Decimal number 941 is for the history of the British Isles. The term “British Isles” is a geographical designation. It’s not a good geographical designation, but technically it’s not a political term. So it’s actually pretty smart to use a geographical rather than a political term for categorisation: geology moves a lot slower than politics.
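The nesting described above (a class of 100 numbers, groups of 10 within it, then a single number) can be sketched as a small lookup. This is an illustration only: it holds just the three labels mentioned in this post, and the `dewey_path` function and its placeholder text are made up for the example.

```python
# Labels taken from the text above; everything else is left unlisted.
CLASS_LABELS = {900: "History and geography"}
GROUP_LABELS = {940: "History of Europe"}
NUMBER_LABELS = {941: "History of the British Isles"}

UNLISTED = "(not listed in this sketch)"


def dewey_path(number: int) -> list:
    """Return (range, label) pairs from broadest to narrowest."""
    if not 0 <= number <= 999:
        raise ValueError("expected a three-digit Dewey number (0 to 999)")
    hundreds = number // 100 * 100  # start of the class of 100 numbers
    tens = number // 10 * 10        # start of the group of 10 numbers
    return [
        (f"{hundreds:03d}-{hundreds + 99:03d}", CLASS_LABELS.get(hundreds, UNLISTED)),
        (f"{tens:03d}-{tens + 9:03d}", GROUP_LABELS.get(tens, UNLISTED)),
        (f"{number:03d}", NUMBER_LABELS.get(number, UNLISTED)),
    ]


for span, label in dewey_path(941):
    print(span, label)
# 900-999 History and geography
# 940-949 History of Europe
# 941 History of the British Isles
```

The label attached to 941 is a separate piece of data from the number itself, which is why a shelf can carry the right number and still display the wrong words.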

But the Brighton Library is using the wrong label for their shelves. Everything under 941 is labelled “British History.”

The island of Ireland is part of the British Isles.

The Republic of Ireland is most definitely not part of Britain.

Seeing books about the history of Ireland, including post-colonial history, on a shelf labelled “British History” is …not good. Frankly, it’s offensive.

(I mentioned this situation to an English friend of mine, who said “Well, Ireland was once part of the British Empire”, to which I responded that all the books in the library about India should also be filed under “British History” by that logic.)

Just to be clear, I’m not saying there’s a problem with the library using the Dewey Decimal system. I’m saying they’re technically not using the system. They’ve deviated from the system’s labels by treating “History of the British Isles” and “British History” as synonymous.

I spoke to the library manager. They told me to write an email. I’ve written an email. We’ll see what happens.

You might think I’m being overly pedantic. That’s fair. But the fact this is happening in a library in England adds to the problem. It’s not just technically incorrect, it’s culturally clueless.

Mind you, I have noticed that quite a few English people have a somewhat fuzzy idea about the Republic of Ireland. Like, they understand it’s a different country, but they think it’s a different country in the way that Scotland is a different country, or Wales is a different country. They don’t seem to grasp that Ireland is a different country like France is a different country or Germany is a different country.

It would be charming if not for, y’know, those centuries of subjugation, exploitation, and forced starvation.

British history.

Update: They fixed it!

Beyond the Frame | Untangling Non-Linearity

A fascinating look at the connections between hypertext and film editing. I’m a sucker for any article that cites both Ted Nelson and Walter Murch.

RSS Anything

Next time you’re frustrated by a website that doesn’t provide an RSS feed, try using this tool:

Transform any old website with a list of links into an RSS Feed
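To give a feel for the kind of transformation involved, here is a minimal sketch that turns a list of links into an RSS 2.0 document using Python's standard library. It is not how RSS Anything works internally, which isn't described here. The `links_to_rss` function, the feed title, and the `example.com` URLs are assumptions for illustration, and the step of extracting links from a live page is left out entirely.

```python
import xml.etree.ElementTree as ET


def links_to_rss(feed_title, site_url, links):
    """Build an RSS 2.0 string from (title, url) pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires a title, link, and description on the channel.
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = f"Links found on {site_url}"
    for title, url in links:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
    return ET.tostring(rss, encoding="unicode")


# Hypothetical links, as if already extracted from a page.
feed = links_to_rss(
    "Example site",
    "https://example.com/",
    [("First post", "https://example.com/1"),
     ("Second post", "https://example.com/2")],
)
print(feed)
```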

The Website vs. Web App Dichotomy Doesn’t Exist | jakelazaroff.com

Amen!

If there’s one takeaway from all this, it’s that the web is a flexible medium where any number of technologies can be combined in all sorts of interesting ways.

The map-reduce is not the territory

Unlike many people, I’m not particularly worried about AI replacing people’s jobs, although employers will certainly try and use it to reduce their headcount. I’m more worried about it transforming jobs into roles without agency or space to be human. Imagine a world where performance reviews are conducted by software; where deviance from the norm is flagged electronically, and where hiring and firing can be performed without input from a human. Imagine models that can predict when unionization is about to occur in a workplace. All of this exists today, but in relatively experimental form. Capital needs predictability and scale; for most jobs, the incentives are not in favor of human diversity and intuition.

Undersea Cables by Rishi Sunak [PDF]

Years before becoming Prime Minister of the UK, Rishi Sunak wrote this report, Undersea Cables: Indispensable, insecure.

Tags: formation

69

Thursday, April 16th, 2026

Threat models

Monday, April 7th, 2025

Denial

Friday, March 28th, 2025

Open source devs say AI crawlers dominate traffic, forcing blocks on entire countries - Ars Technica

Wednesday, March 26th, 2025

Go To Hellman: AI bots are destroying Open Access

Friday, March 21st, 2025

The Blowtorch Theory: A New Model for Structure Formation in the Universe

Wednesday, February 12th, 2025

AI wants to rule the World, but it can’t handle dairy.

Tuesday, January 21st, 2025

The New York Good Times

What I’ve learned about writing AI apps so far | Seldo.com

Wednesday, January 15th, 2025

Prescriptive and Descriptive Information Architectures | Jorge Arango

Thursday, November 7th, 2024

Information literacy and chatbots as search • Buttondown

Daring Fireball: Kottke on the Art and Power of Hypertextual Writing

Thursday, September 5th, 2024

Information Architecture First Principles | Jorge Arango

Wednesday, July 10th, 2024

Notes – David Bushell

Directory enquiries

Friday, May 17th, 2024

Labels

Thursday, February 8th, 2024

Beyond the Frame | Untangling Non-Linearity

Sunday, January 7th, 2024

RSS Anything

Friday, January 5th, 2024

The Website vs. Web App Dichotomy Doesn’t Exist | jakelazaroff.com

Monday, October 23rd, 2023

The map-reduce is not the territory

Thursday, August 10th, 2023

Undersea Cables by Rishi Sunak [PDF]