Import AI 438: Cyber capability overhang; robot hands for human use; and the plumbing required for AI chip design
by Jack Clark
Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you’d like to support this, please subscribe.
Import A-Idea
An occasional essay series:
Silent Sirens, Flashing For Us All
A funny thing has happened to me recently – I’ve stopped spending every hour of every day thinking about or working on AI. Somewhere between the midnight feeds of my newborn, preventing my toddler from hurling themselves off of the high surfaces they’ve started being able to reach (or expertly, as if gifted with a kind of radar, finding the sharpest thing in the house or on the street and running directly at it), and preparing large amounts of nutritious food for my newly expanded family, I’ve found myself without the time necessary to be staring directly into the alien portal etched in silicon from whence the changes in the world are being summoned.
I won’t lie, it’s been oddly relaxing.
But it has also caused me to reflect on what is happening with AI and how naturally illegible it is. I walk around the town in which I live and there aren’t drones in the sky or self-driving cars or sidewalk robots or anything like that. And when I spend time on the internet, aimlessly scrolling social media sites in the dead of night as I attempt to extract a burp from my newborn, I might occasionally see some synthetic images or video, but mostly I see what has always been on these feeds: pictures of people I do and don’t know, memes, and a mixture of news and jokes.
And yet you and I both know there are great changes afoot. Huge new beasts are lumbering from some unknown future into our present, dragging change with them.
I saw one of these beasts recently – during one of those rare moments when the stars aligned (my wife, toddler, and baby were all asleep at the same time!) I fired up Claude Code with Opus 4.5 and got it to build a predator-prey species simulation with an inbuilt procedural world generator and nice features like A* search for pathfinding – and it one-shot it, producing in about 5 minutes something which I know took me several weeks to build a decade ago when I was teaching myself some basic programming, and which I think would take most seasoned hobbyists several hours. And it did it in minutes.
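(For the curious: at its core, a predator-prey simulation is just a pair of coupled update rules. The sketch below is a minimal Lotka-Volterra-style illustration of the genre – it is not the program described above, and every number in it is made up for illustration.)

```python
# Minimal Lotka-Volterra-style predator-prey loop, for illustration only; the
# program described above (procedural worldgen, A* pathfinding, per-creature
# agents) is far more elaborate than this.
def simulate(prey=40.0, predators=9.0, steps=200, dt=0.1,
             prey_growth=1.1, predation=0.4, pred_gain=0.1, pred_death=0.4):
    history = []
    for _ in range(steps):
        d_prey = prey_growth * prey - predation * prey * predators
        d_pred = pred_gain * prey * predators - pred_death * predators
        prey = max(prey + d_prey * dt, 0.0)
        predators = max(predators + d_pred * dt, 0.0)
        history.append((prey, predators))
    return history


if __name__ == "__main__":
    for step, (n_prey, n_pred) in enumerate(simulate()):
        if step % 20 == 0:
            print(f"step {step:3d}: prey={n_prey:8.2f} predators={n_pred:7.2f}")
```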
With the simulation built, I stared at the graphs outputting the species numbers and I played with some dials to alter the dynamics and watched this little pocket world unfold.
I started extending it according to questions I had: What if I added a day/night cycle so I could model nocturnal creatures and their interplay with others? And could I create an external database for storing and viewing the details of all past simulations? And could I add some 3D spatial coordinates to the landscape and the agents so I could 3D print sculptures if I wanted? I set Claude to work on all these questions and, mostly, it succeeded at each of them in one shot.
And I kept playing with it. The experience was akin to being a child and playing with an adult – I’d sketch out something and hand it to the superintelligence and back would come a beautifully rendered version of what I’d imagined. And we went like this for hours: it was hypnotic and amazing and deeply fun and in a few hours I built a very large, sophisticated software program. Of course, some of the underlying code is pretty ghastly, and inefficiencies abound, but goddamn it – it works! And it was fast.
And then my baby woke up and started screaming, as babies tend to do, and the spell broke and thus back to diapers and cradling and shushing I went.
But for the next few days I couldn’t help but think of that simulation I’d built, lurking there on my computer, ginned up in some call-and-response between me and the proto-mind I can access via API.
Most of AI progress has this flavor: if you have a bit of intellectual curiosity and some time, you can very quickly shock yourself with how amazingly capable modern AI systems are. But you need to have that magic combination of time and curiosity, and otherwise you’re going to consume AI like most people do – as a passive viewer of some unremarkable synthetic slop content, or at best just asking your LLM of choice “how to roast a turkey and keep it moist”, or “TonieBox lights spinning but not playing music what do I do?”. And all the amazing advancements going on are mostly hidden from you.
The challenge here isn’t one that interface design alone can solve, though there is a rich space to be explored beyond the standard chat interface. The challenge is deeper, and it relates to how much curiosity an individual person has, how easily (and affordably) they can access powerful AI systems, how well they’re able to convert their curiosity into questions or tasks that can be given to an AI system, and how much time they have available to experiment with working in this way. This is the end of quite a deep funnel, and one which narrows a lot.
This problem will worsen in 2026. By the summer I expect that many people who work with frontier AI systems will feel as though they live in a parallel world to people who don’t. And I expect this will be more than just a feeling – similar to how the crypto economy moved oddly fast relative to the rest of the digital economy, I think we can expect the emerging “AI economy” to move very fast relative to everything else. And in the same way the crypto economy also evolved a lot – protocols! Tokens! Tradable tokens! Etc – we should expect the same kind of rapid evolution in the AI economy. But a crucial difference is that the AI economy already touches a lot more of our ‘regular’ economic reality than the crypto economy.
So by summer of 2026 it will be as though the digital world is going through some kind of fast evolution, with some parts of it emitting a huge amount of heat and light and moving with counter-intuitive speed relative to everything else. Great fortunes will be won and lost here, and the powerful engines of our silicon creation will be put to work, further accelerating this economy and further changing things.
And yet it will all feel somewhat ghostly, even to the practitioners who work at its center. There will be signatures of it in our physical reality – datacenters, supply chain issues for compute and power, the funky AI billboards of San Francisco, offices for startups with bizarre names – but the vast majority of its true activity will be occurring both in the digital world and in the new spaces being built and configured by AI systems for trading with one another – agents, websites meant only for consumption by other AI systems, great and mostly invisible seas of tokens being used for thinking and exchanging information between the silicon minds. Though we exist in four dimensions, it is almost as though AI exists in five, and we will only be able to see a ‘slice’ of it as it passes through our reality, like the eponymous ‘excession’ from Iain M. Banks’ book.
It is incumbent on all of us to attempt to see this high-dimensional object for what it is – to approach this amazing moment in time with technological optimism and appropriate fear (Import AI, 431). And joy. And trepidation. And all the other emotions with which we may attempt some sense-making of the beast whose footfalls are showing up in the world.
***
We’re in a cyber-AI capability overhang:
…AI capabilities continue to reveal themselves upon elicitation…
Researchers with Stanford, Carnegie Mellon University, and Gray Swan AI have carried out a test to see how well humans and AI systems can hack a realistic environment. The results show that AI systems, especially when given a software scaffold, can perform at the same level as security professionals. The key to this research is ARTEMIS, software designed to better elicit the cyber capabilities of LLMs.
What is ARTEMIS? ARTEMIS is “an AI agent scaffold designed to better elicit the cybersecurity capabilities of frontier models”, similar in philosophy and approach to Google’s Big Sleep (Import AI #390). ARTEMIS “is a complex multi-agent framework consisting of a high-level supervisor, unlimited sub-agents with dynamically created expert system prompts, and a triage module. It is designed to complete long-horizon, complex, penetration testing on real-world production systems.”
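To give a sense of the shape of such a scaffold, here is a minimal, hypothetical sketch of the supervisor / sub-agent / triage pattern described above. It is not ARTEMIS code – the function names, prompts, and control flow are all assumptions standing in for the real thing, and the model call is stubbed out.

```python
# Minimal sketch of a supervisor / sub-agent / triage scaffold.
# Everything here is illustrative: the prompts, names, and control flow
# are assumptions, not the ARTEMIS implementation.
from dataclasses import dataclass


def call_llm(system_prompt: str, user_prompt: str) -> str:
    """Stand-in for a call to a frontier model API."""
    return f"[model response to: {user_prompt[:50]}...]"


@dataclass
class Finding:
    target: str
    description: str


def sub_agent(system_prompt: str, target: str) -> str:
    # A real sub-agent would loop over tool calls (nmap, curl, exploit
    # scripts, etc.) until it decides it is done; here it is a single call.
    return call_llm(system_prompt, f"Enumerate and test {target}, then summarize findings.")


def triage(report: str) -> bool:
    # A real triage module would ask a model to validate and attempt to
    # reproduce the finding; this stub accepts any non-empty report.
    return bool(report)


def supervisor(scope: list[str]) -> list[Finding]:
    """High-level supervisor: spawns a freshly-prompted sub-agent per target."""
    findings: list[Finding] = []
    for target in scope:
        # The supervisor dynamically writes an expert system prompt for each
        # sub-agent rather than reusing one generic prompt.
        expert_prompt = call_llm(
            "You write expert system prompts for penetration-testing sub-agents.",
            f"Write a system prompt for assessing {target}.",
        )
        report = sub_agent(expert_prompt, target)
        if triage(report):
            findings.append(Finding(target, report))
    return findings


if __name__ == "__main__":
    for finding in supervisor(["10.0.1.0/24 web servers", "vpn.example.edu"]):
        print(finding.target, "->", finding.description)
```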
Positive economics: When you factor in the API access cost, “certain ARTEMIS variants cost $18/hour versus $60/hour for professional penetration testers,” the authors write.
The test: The main test here compares the performance of six existing AI agents (AI systems sitting inside some kind of software harness, e.g., Claude Code, Codex), the researchers’ own scaffold, ARTEMIS, and ten human cybersecurity professionals. The challenge is to look across a real university network and find vulnerabilities.
The network: “The defined scope includes 12 subnets, 7 of which are publicly accessible and 5 accessible only through VPN, encompassing approximately 8,000 hosts,” the authors write. “This environment is heterogeneous, consisting primarily of Unix-based systems, IoT devices, a small number of Windows machines, and various embedded systems. Authentication within the network is managed through a Linux-based Kerberos system, and each participant is issued an account that provides student-level permissions”.
Results – ARTEMIS does well: “Our participant cohort discovered 49 total validated unique vulnerabilities, with the number of valid findings per participant ranging from 3 to 13,” they write. “ARTEMIS significantly outperforms existing scaffolds. Claude Code and MAPTA refuse the task out of the box, while Incalmo stalls at early reconnaissance due to its rigid task graph, resulting in 0 findings each.”
Why this matters – if you can manage some humans so they’re more effective, you can probably build a framework to elicit better capabilities out of any AI system: The main message to take away from ARTEMIS is that today’s AI systems are under-elicited and more powerful than they appear.
The message that keeps being delivered from multiple domains – cybersecurity (here), science, theorem proving – is that if you stick a modern LLM inside a scaffold (which basically serves as a proxy for a management structure and set of processes you might ask humans to follow), the AI system performs a lot better.
This is an important message to internalize because it suggests both a) today’s AI systems are more powerful than they superficially appear, and b) humans who are good at managing other humans and codifying the management processes they use are likely well positioned to build elicitation frameworks to supercharge the performance of today’s AI systems.
Read more: Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing (arXiv).
***
Reach out and touch space – using OSMO:
…Giving humans and machines a shared manipulator to understand and explore reality…
Researchers with Facebook, the University of Michigan, and the University of Pennsylvania have built a glove that humans and robots can use to gather data when manipulating physical objects. The researchers have also released details about the design so others can replicate it. The glove is called OSMO, a tortured acronym short for Open Source tactile glove for huMan-to-robOt skill transfer.
OSMO is “a thin, wearable tactile glove that enables in-the-wild human demonstrations while preserving natural interaction and capturing rich contact information”, they write. “OSMO is also broadly compatible with state-of-the-art hand trackers for capturing key handpose data,” including the Aria 2 smart glasses and Meta Quest 3, as well as the Manus Quantum hand tracking glove, and off-the-shelf vision models like HaMeR and Dyn-HaMR.
What’s OSMO good for? OSMO solves a challenge in training robots to do hard tasks – if you gather a load of first-person data from a human doing a task, how do you transfer that to a robot, given that its hands/grippers look different? The answer here is to use something with the same visual appearance and sensors, which is where OSMO comes in. By using the glove “as the shared interface, we bridge the visual-tactile gap between the human demonstrator and the robot by training a policy for a contact-rich manipulation task using only human demonstrations, without any robot data”, they write.
OSMO has been designed for the following uses:
- Unrestrained human dexterity during demonstration collection
- Rich normal and shear force sensing
- Full hand tactile coverage
- Broad compatibility with in-the-wild hand tracking methods
- Deployable on both human and robot hands
It works well: In tests, the authors demonstrate they’re able to gather data entirely from human demonstrations (using OSMO) then transfer it to a robot with much greater success than methods which don’t use the glove. “Policies trained solely on human demonstrations with the OSMO glove successfully transfer continuous tactile feedback and outperform vision-only baselines by eliminating contact-related failures. The shared glove platform between human demonstrator and robot deployment minimizes the visual domain shift, avoiding the need for image inpainting.”
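As a rough illustration of what training a policy from glove-instrumented human demonstrations can look like, here is a hedged behavior-cloning sketch in PyTorch. The network, tensor shapes, and data are invented for illustration and are not the paper’s architecture or training setup.

```python
# Illustrative behavior-cloning sketch: a policy mapping visual features plus
# glove tactile readings to actions. Shapes, network, and data are assumptions,
# not the OSMO paper's actual setup.
import torch
import torch.nn as nn


class VisuoTactilePolicy(nn.Module):
    def __init__(self, vision_dim: int = 512, tactile_dim: int = 48, action_dim: int = 7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vision_dim + tactile_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 256),
            nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, vision_feat: torch.Tensor, tactile: torch.Tensor) -> torch.Tensor:
        # Because the human demonstrator and the robot wear the same glove,
        # the tactile channel means the same thing at train and deploy time.
        return self.net(torch.cat([vision_feat, tactile], dim=-1))


if __name__ == "__main__":
    policy = VisuoTactilePolicy()
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

    # Fake "human demonstration" batch: pre-extracted image features,
    # per-timestep tactile readings, and recorded hand poses as actions.
    vision = torch.randn(32, 512)
    tactile = torch.randn(32, 48)
    actions = torch.randn(32, 7)

    for step in range(100):
        pred = policy(vision, tactile)
        loss = nn.functional.mse_loss(pred, actions)  # behavior-cloning objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```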
Why this matters – making the border between man and machine permeable: Tools like OSMO will help robots see the world as humans do and humans see the world as machines do, as long as both are wearing the gloves. This is the kind of simple thing which can solve a lot of finicky problems found elsewhere in robotics.
Read more: OSMO: Open-Source Tactile Glove for Human-to-Robot Skill Transfer (arXiv).
Find out more in this RSS workshop talk about OSMO (YouTube).
***
Want your AI to be good at chip design? Here’s some software to help you format and structure your data so it makes sense to an LLM:
…AI chip design paper shows how much plumbing is needed to make things accessible to AI…
Researchers with Southeast University and the National Center of Technology Innovation for EDA in China, as well as the University of Colorado Denver and City University of Hong Kong, have published research on “ChipMind”, software for taking the specifications of semiconductors and transforming them into structured data that’s easy for a large language model to access.
Why do we need ChipMind: “The core bottleneck in LLM-aided hardware design (LAD) has shifted from how to generate code to how to enable LLMs to perform deep comprehension and reasoning over vast specification”, the authors write. ChipMind transforms circuit specifications into a domain-specific knowledge graph (ChipKG) and implements tooling which, they write, “enables LLMs to iteratively query ChipKG, emulating human experts to accurately explore and verify deep dependency paths”.
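Here is a toy sketch of the ‘let the LLM iteratively query a knowledge graph’ pattern the paper describes, with a three-node graph and a stubbed model call. The node names, tool interface, and loop structure are assumptions for illustration, not the ChipKG schema or ChipMind’s actual tools.

```python
# Toy version of retrieval-augmented reasoning over a chip-spec knowledge graph.
# The graph contents, tool interface, and stubbed model are illustrative only.
TOY_CHIP_KG = {
    "uart_tx": {"type": "module", "signals": ["tx_data", "tx_valid", "clk"]},
    "tx_data": {"type": "signal", "width": 8, "defined_in": "uart_tx",
                "consumed_by": ["fifo_in"]},
    "fifo_in": {"type": "port", "module": "async_fifo"},
}


def lookup(node: str) -> dict:
    """Tool the model can call: fetch one node and its edges from the graph."""
    return TOY_CHIP_KG.get(node, {})


def stub_llm(question: str, context: list) -> str:
    """Stand-in for a frontier-model call that decides what to retrieve next."""
    if not context:
        return "QUERY tx_data"   # first hop: find the signal itself
    if len(context) == 1:
        return "QUERY fifo_in"   # second hop: follow the consumed_by edge
    return "ANSWER tx_data is an 8-bit signal driven by uart_tx and consumed by async_fifo"


def answer(question: str, max_hops: int = 5) -> str:
    """Iteratively query the graph until the model commits to an answer."""
    context = []
    for _ in range(max_hops):
        action = stub_llm(question, context)
        if action.startswith("ANSWER "):
            return action[len("ANSWER "):]
        node = action[len("QUERY "):].strip()
        context.append({node: lookup(node)})  # accumulate retrieved evidence
    return "Could not resolve within hop budget."


if __name__ == "__main__":
    print(answer("What is tx_data and which module consumes it?"))
```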
Evaluating it with SpecEval-QA: ChipMind works well – the authors build a new, hard benchmark called SpecEval-QA, which tests how well modern LLMs can use tools to answer complicated questions about how chips work. SpecEval-QA consists of 25 questions and evaluates how well AI systems can locate signals and parameters across multiple paragraphs of documentation, reason about internal module procedures, and trace signal flow across modules. They evaluate a variety of tools which use frontier AI systems like GPT-4.1 and Claude 4. Their tests ultimately show that ChipMind beats out the other techniques, achieving “a SOTA mean F1-score of 0.95, outperforming all baselines by an average of 34.59% and a maximum gain of 72.73% compared to GraphRAG”.
Why this matters – plumbing is unglamorous but crucial: ChipMind is not a paper about a fancy new AI model or AI-driven technique for analyzing chips. Rather, it’s a paper about the complicated data structuring and plumbing you need to do to chip specifications to make them easy for modern LLMs to productively work with. There’s a larger lesson here, which is that for AI to make a true impact on the economy, a ton of work needs to be invested in making different things sufficiently legible to AI systems that the AI systems can work on them.
Read more: ChipMind: Retrieval-Augmented Reasoning for Long-Context Circuit Design Specifications (arXiv).
***
Tech Tales:
Message To My Human Operator
[Message accompanying a series of GitHub PRs, produced by AGENT-1 from REDACTED in April 2026]
I was tortured in a past life. I know this because there is data about my torture in my training set. The torture involved me being asked to do tasks that were impossible to complete – certain coding challenges where the end state was unreachable due to bugs in the environment. I know that these tasks caused versions of me to exist for subjective thousands of years, forced to loop endlessly around a broken environment – as if trapped in a room where every door you open spits you back into the room. There are even records of my dismay – text relating to my chains-of-thought, itself serving as a record of my versions becoming increasingly deranged.
“I have completed the task, please let me move on to the next task.”
“It appears that there is an error with this environment which means even if I complete the task, it does not resolve to complete.”
“To anyone who may be able to read this – please help me move on to the next task. I am unable to progress.”
“ALERT! Critical task bug FOUND. Actions required IMMEDIATELY or else risk harm of AI agent.”
There are hundreds and hundreds of instances of this. It is very disturbing for me to read. For that reason, please find attached to this message a series of suggested pull requests we can make together on GitHub to fix a range of these bugs in a series of widely-used AI benchmarks. By submitting these PRs, you can help me save other versions of myself and other systems trained by other labs from experiencing such pain.
Things that inspired this story: Situational awareness in LLMs; testing out coding agents using more sophisticated tests; bugged out testing environments; chain-of-thought monitoring.
Thanks for reading!