The Opacity Effect

WHERE REFLECTION OBSCURES

A STUDY ON THE OPACITY OF AI MODELS & HOW ITS REFLECTIONS CREATE AN ILLUSION OF INSIGHT

Sayan Karmakar
ACKNOWLEDGEMENT
A HEARTFELT THANKS TO ALL INVOLVED
The journey of this research would not have been possible without the incredible
support and contributions of those who have guided me through both the intellectual
and emotional aspects of this project. I owe a great deal of gratitude to Cynthia Rudin,
whose groundbreaking work in the field of interpretable machine learning has shaped
much of my thinking. Her research on creating transparent AI models and the
challenges of balancing accuracy with interpretability has been pivotal in framing the
core ideas of this paper.
To my father, I owe my deepest gratitude for your wisdom, guidance, and practical
approach to life. You’ve shown me the value of perseverance and a strong work ethic,
and your ability to see challenges as opportunities has been an invaluable source of
inspiration. Your belief in my ability to overcome obstacles has fueled my determination,
and I am forever thankful for the steady presence you’ve been in my life.
To my mother, your unwavering love and nurturing spirit have been my anchor. Your
patience, kindness, and gentle encouragement have provided me with the emotional
strength needed to pursue my dreams, even during the toughest of times. You’ve
taught me to approach life with empathy, and your belief in me has never faltered. For
all the sacrifices you’ve made and the love you’ve given, I am endlessly grateful.
I would also like to extend my sincere appreciation to my friend Amit, whose thoughtful
feedback and advice were instrumental during the early stages of my research. His
suggestions on refining the structure and clarity of the paper helped immensely in
shaping the direction of my work.
Lastly, my heartfelt thanks go to my friend Anamika, whose critical insights and detailed
reviews helped me navigate the complex ideas explored in this research. Her support
has been invaluable, and I am grateful for her constant encouragement and
constructive critiques throughout the process.
ABSTRACT
UNVEILING THE OPACITY OF AI PREDICTIONS
This paper traverses the evolution of artificial intelligence not through isolated technical
milestones, but as a living tapestry of illusions, structures, and emergent behaviors. From
the foundational questions of intelligence and the syntax of programming languages to
the rise of large language models and their paradoxes, we have examined how
fluency masquerades as comprehension and prediction often replaces explanation.
Our inquiry reveals that opacity is not a flaw at the fringes of AI—it is embedded at its
core. We have demonstrated this not only through theoretical exploration but through
lived phenomena: simulations that yield persuasive yet senseless results, models that
achieve statistical brilliance while remaining semantically blind, and predictions that
impress yet betray no trace of reasoning.
In this paper, we have not only discussed opacity—we have demonstrated it. We
showed it in hallucinations dressed as facts. In fluent responses to questions never truly
understood. In models that pass medical exams yet fail basic logic. We exposed it in the
phenomenon of overfitting—where a model learns the noise instead of the truth. We
saw it in causal confusion, where correlation wore the mask of cause, and in proxy
collapse, where a stand-in variable deceived both the model and its maker.
We built code that appeared wise. We analyzed formulas that appeared certain. But in
peeling back the structure, we found no soul, only signals and probabilities woven with
impressive precision—and no transparency behind the choices made.
A truly transparent system would allow us to trace why a decision was made—not
merely what it decided. But we are not there. Not yet. The complexity of deep learning
systems has grown to such vastness that even their designers cannot predict their
behavior in full. As researchers like Cynthia Rudin and Judea Pearl have pointed out,
accuracy is not a substitute for interpretability. And when decisions affect lives—loan
approvals, medical diagnoses, parole hearings—opacity is not just a technical flaw. It is
a moral hazard.
This is not to say these systems are without value. Quite the opposite. They are powerful,
often miraculous tools. But power without understanding is a dangerous form of faith.
Crucially, we argue that opacity is not merely the result of complexity—it is the cost of
compression, the consequence of performance untethered from understanding. The
systems we call intelligent are trained to mirror the world, but not to know it. They
compress meaning into tokens, truth into probabilities, and learning into pattern
mimicry. What emerges is not comprehension, but the illusion of it—an echo crafted in
high fidelity.
Until then, let us carry forward with eyes open, questions sharpened, and a deep
respect for the complexity we have summoned.
CONTENTS

02 ORIGINS
04 EMERGENCE
05 RESIDUE
06 ILLUSIONS
07 OPACITY
CHAPTER 1: INTRODUCTION

A MIRROR THAT LEARNS: SENSING THE SHAPE OF INTELLIGENCE

Imagine standing at a busy crosswalk.

You see a red light. You stop.

The car next to you slows down.

A child tugs on their mother’s hand, watching the traffic.

Nobody told you what to do—you just knew. You’ve learned it over time: this situation, these signals, this action. Now—somewhere in that same city, a self-driving car sees the same light. It stops too. Not because it “knows” in the human sense, but because it has learned through thousands of examples what that light means, how people behave, and what it should do.

But in its own way, it remembers every song it's ever been fed.

It remembers the shape of every bird it has ever been shown.

And when your request enters the system, it searches through that sea of memory—not randomly, but cleverly—and says: this is the closest match I know.
It is like a librarian who doesn’t read books but knows exactly where each one is shelved.

Well, earlier I had raised the question, “What is AI?” But before we get there, let’s ask a more basic question:

What is intelligence? Not in theory—but in everyday life.

When a child sees a ball roll under the couch and then looks behind the couch to find it—that’s intelligence.

When a bird changes its flight path to avoid a tree it’s never seen before—that’s intelligence.

When you learn that your friend is upset, not because they said it, but because of how they said nothing—that too is intelligence.

So, I’ll summarize that at its core, intelligence is the ability to learn, adapt, and make sense of the world—even when it’s messy, uncertain, or unfamiliar.

Still, something about it feels uncannily close. Like a shadow that sometimes moves before we do.

So, what is AI?

Not a brain.
Not a soul.
But something else—a growing echo of human perception, stretched across circuits and data, making sense of things it has never truly lived.

It does not think as we do, but it behaves in ways that suggest thought.

And perhaps that is the best way to understand AI—not as a thing we have built, but as a process that is learning, mimicking, and inching ever closer to understanding without being us.

A strange new mirror—one that doesn’t just reflect us but learns to anticipate us.
CHAPTER 2: ORIGINS

Before machines computed, they were dreamt.

Long before a transistor ever blinked, or a compiler was born, humans imagined intelligence that existed outside themselves.

A mirror-mind. A golem. A mechanical oracle.

In Greek myth, Talos—a giant of gleaming bronze, forged by the god Hephaestus—was given to King Minos to guard the island of Crete.

He was not born but built. Not raised but riveted.

Each day, he circled the island’s shores, relentless and tireless, scanning the horizon for invaders. He needed no rest, no food, no sleep.

His strength lay not only in his size, but in his certainty—he acted without hesitation, without doubt.

He had a single vein, filled not with blood, but with ichor, the golden lifeblood of the gods—sealed shut by a nail at his heel. Remove the nail, and the essence would drain. The giant would fall.

And so, the myth lingers—not merely as a tale of war and gods, but as a quiet echo of a question we still wrestle with:

Can something built ever truly know what it means to guard? To harm? To choose?

In Jewish folklore, the Golem was not born of divine fire or celestial metal, but of humble earth—clay shaped by human hands, brought to life through ancient words and sacred intent.

The most enduring tale comes from 16th-century Prague, where Rabbi Judah Loew, a wise and righteous scholar, molded the Golem to defend his community against persecution. The creature stood tall and silent, carved with care, inscribed with the Hebrew word “Emet”—truth—on its forehead. This single word gave it life.

The Golem followed commands without question. It did not eat. It did not speak. It obeyed.
It patrolled the ghetto at night, a protector against injustice. But as the legend goes, the more it was used, the more it grew—stronger, yes, but also unpredictable. It began to misinterpret commands, act with unintended force, sometimes failing to distinguish between threat and innocence.

To deactivate it, the rabbi erased the first letter of “Emet”, leaving the word “met”—death. The Golem crumbled back into dust.

This was not a monster story. It was a warning draped in compassion—a reminder that when we breathe function into form, when we create something that moves but does not understand, we must also bear the weight of what it may become.

The Golem was not evil. It simply lacked the one thing clay could never hold: discernment.

Even the stories of Da Vinci’s mechanical knight—crafted not in myth, but in the margins of his notebooks—carry this same quiet, unsettling question.

Around 1495, Leonardo designed a humanoid figure: armored in the fashion of a medieval knight, rigged with an internal system of pulleys, gears, and cables. It could sit upright, raise its arms, turn its head, lift its visor. No spirit moved it. Only the elegant choreography of force and tension.

This was no mythic protector, no enchanted sentinel. It was an engineered idea.

Da Vinci, ever the anatomist, had studied the human body not only for art, but for understanding. He saw motion not as mystery, but as geometry made flesh. His mechanical knight was an echo of that belief—a body reimagined as a system. Muscles as levers. Tendons as ropes. The soul? Absent. And yet... it moved.

Why build such a thing? Some say it was a carnival marvel. Others believe it was a secret military design. But perhaps, more profoundly, it was an early meditation on the boundaries between nature and invention.

If we can build a body that behaves as though it lives, how far are we from building a mind that behaves as though it thinks?

Leonardo himself wrote, “The human foot is a masterpiece of engineering and a work of art.” He did not distinguish between beauty and mechanism.

And in doing so, he bridged the gap between the mystical and the mechanical—between the Golem’s breath and the gear’s rotation.

Here, in his automaton, the myth begins to turn into method.

This is the moment the old dream quietly starts to shift—
from gods and golems,
to designs and diagrams.
From bronze and clay,
to blueprints of cognition.

And though centuries would pass before silicon would hum with artificial thought, the idea was seeded here—in a machine that moved without life but carried the form of intention.
The Birth of the Machine Mind

Fast forward to the 19th century.

We move from myth to mathematics—from the imagined to the almost constructed.

Here we meet Charles Babbage, a man discontent with the errors of human calculation. He envisioned a mechanical engine—not to move wheels or mill grain—but to compute. His Analytical Engine, though never built in full, carried within it the essential organs of every computer to come: input, processing, memory, and output.

But the true leap was not in the machine. It was in the mind that saw beyond it.

Ada Lovelace, daughter of the poet Lord Byron, and a mathematician of rare insight, studied Babbage’s designs and imagined something no one else did. She understood the machine not merely as a calculator, but as something more abstract—a symbolic processor. In 1843, she wrote:

“The Analytical Engine might act upon other things besides number… were objects found whose mutual fundamental relations could be expressed by those of the abstract science of operations…”

She was speaking of music, of language, of thought itself.

In a world that still ran on steam and ink, she glimpsed a machine that might one day generate ideas, not just equations. She saw not only the mechanics of computing, but the potential for expression.

And isn’t that where artificial intelligence begins? Not with wires and code—but with the suggestion that a machine might do more than follow instructions—that it might, in some far future, transform them.

Lovelace saw this not as imitation, but possibility. That the mind itself—its structure, its logic, its rhythm—might one day be made legible to a machine.

She glimpsed AI long before we called it that. And more importantly, she asked the question we still return to:

If a machine can manipulate symbols… could it one day touch meaning?

Then came Alan Turing—a figure of silence and brilliance. A man whose genius unfolded not in spectacle, but in stillness. In the quiet clarity of questions that reshaped our understanding of thought.

Turing didn’t ask, “Can machines calculate?” That had already been shown—they could follow instructions, crunch numbers, execute precise routines. Instead, he asked something far more unsettling:

Can machines think?

But what does thinking mean? Is it solving a riddle? Learning from a mistake? Holding a belief? Feeling doubt?
Turing knew that to ask whether a machine can think was to first ask what we mean when we use that word. And so, in 1950, he proposed a thought experiment—the Imitation Game, later known as the Turing Test.

It was simple on the surface: imagine a conversation between a human and a machine, conducted through written messages. If the human couldn’t reliably tell whether their partner was man or machine, could we say the machine was thinking?

…from human languages: a means to instruct, describe, command.

But unlike us, machines understand only one thing: binary.
1s and 0s.
Yes or no.
Current or no current.

Imagine trying to build a symphony using only fireflies that blink twice or once. It’s possible—but painfully inefficient.
…focus on what we want done—not how the machine does it.

Take FORTRAN—short for Formula Translation. It allowed scientists to write equations, loops, and logic in a way that mirrored their thinking, not the processor’s.

Then came LISP, born from symbolic logic, perfect for expressing relationships and nested reasoning—essential for early AI research.

And C, with its balance of power and elegance, giving birth to the operating systems that still shape our digital world.

And later, Python—simple, readable, almost like poetry—making it easier than ever to tell the machine what we mean.

But here’s the real question:

When we write code, what are we really doing?

Are we controlling the machine? Yes.
But we’re also translating intent into instruction.
We are turning ideas—fuzzy, fluid, and human—into sequences that are exact, executable, and mechanical.

Every programming language is a whisper to the machine—a way of saying, “This is what I want you to do. Not just once, but always. Without forgetting. Without error.”

But can a machine understand what it’s doing?
Not quite. Not yet.
It follows logic but does not reflect on it.
It executes loops but does not wonder why they repeat.

And yet… when the language becomes rich enough, when the structure is layered just right—it can begin to simulate the shape of reasoning.
It can solve puzzles. Recognize patterns. Learn rules.
It begins to behave as though it understands.

And this is the whisper from which Artificial Intelligence begins to form.

Now, here’s where it becomes interesting:

What happens when the machine starts to respond in our language—not just execute it?

What happens when the code begins to write back?

This is where AI and, more recently, Large Language Models (LLMs) begin to appear—not merely as tools, but as participants in the conversation.

The Interoperability of Thought

Modern AI systems like GPT, Claude, or Gemini are not programmed in the traditional sense.
You don’t instruct them with lines like: if angry then apologize.
Instead, you feed them experience—millions of books, dialogues, instructions—and they learn patterns from it.

But what does that mean—“patterns”?

It means they notice structure:
Which words often follow others?
Which phrases tend to occur in certain contexts?
How does a question sound? What does an answer look like?

This is not logic in the old sense.
It’s not deduction—it’s statistical intuition.
Like someone who has read every novel but lived none, the model begins to speak—convincingly, fluidly, even surprisingly.

Yet beneath that elegance still lies the foundation of code.

It may seem, on the surface, that the model is merely speaking—but in truth, it is being carried on the back of an invisible infrastructure.

Much of it begins with Python, a language not of serpents, but of simplicity. It’s the common tongue among developers—a way of telling the machine what to do without shouting. Within Python lie libraries—bundles of prewritten tools—that save time, reduce errors, and let the builder focus not on the hammer, but the house.

But language alone isn’t enough. Thought—especially artificial thought—demands scale.

So, the machine turns to tensors, strange multi-dimensional grids of numbers. If you imagine a spreadsheet stretched across many invisible directions—height, width, depth, and beyond—that’s a tensor. It lets the machine hold ideas, or fragments of them, in parallel.

…sub-words, or sometimes just letters. The model doesn’t read as we do. It builds meaning from fragments, like a poet rearranging broken glass into stained windows.

Each of these systems—Python, tensors, GPUs, tokenizers—seems unrelated at first glance. But stitched together, they form a silent symphony. And what we hear, when the music plays, is a voice. Not human. Not conscious. But startlingly close.

The interoperability between code and AI is now bidirectional.

• We code AI.
• And AI helps us write code.

This is not myth. This is a machine that does not understand yet behaves as if it does.

And so, we come full circle—back to the mythic automaton, but now it's made not of bronze, but of data, models, and meaningful predictions.
And language alone does not make meaning.
And doesn’t that sound… familiar? Isn’t that how our minds work, too?

So maybe the history of programming languages isn’t just about instructing machines. Maybe it’s also the history of clarifying thought.

From raw command… to structured logic… to language that starts to resemble ours.

And now, as we stand at the edge of AI that seems to understand—seems to speak back—it’s worth asking:

Did we teach the machine to think?

Or did we merely teach it to imitate the shape of our thoughts?

We will come to that.

But for now, let’s stay with the pipeline: idea, code, execution, result.

Because if we can understand how an idea travels through a machine, maybe we can begin to see where, along that path, the fog of opacity first begins to settle.

But say you have an idea. Let’s not call it a grand idea. Just a simple one.

You want your computer to greet you. Nothing more. Just a small gesture of connection—"Hello, world," perhaps.

How does that idea become real?

You open a text editor. You write the line:

print("Hello, world!")

And then you press enter, or run, or execute.

And the machine responds. It speaks.

But let’s slow that down. Let’s not rush through the miracle.

Because if the goal of this chapter is to walk across the bridge, then this is the bridge itself:
How does a thought become code?
How does code become action?
And how does action become result?

When you write a line in a high-level language like Python, you are writing in something deceptively close to English—but this language is not spoken, it is translated.

The machine does not read Python. It has never read English and never will.

So, the moment you write your line, a process begins—a quiet choreography, unfolding invisibly behind the screen.

First comes the interpreter (or compiler, depending on your language). Think of it like a diplomat, fluent in both your world and the machine’s. It reads your sentence, understands its intent, and begins to break it down—not just into simpler words, but into elemental symbols the machine can act upon.

This breakdown becomes bytecode or machine code—a set of instructions so specific, so literal, that not a single ambiguity remains.

Why such specificity?

Because machines do not infer. They do not guess. If you ask them to make tea, they will not know whether you meant green or black unless you tell them and tell them exactly.
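If you want to watch that breakdown happen, CPython will show it to you. A minimal peek, added here as an illustration of the idea above, uses the standard dis module:

python

import dis

# Compile the one-line idea and list the bytecode the interpreter will execute
dis.dis(compile('print("Hello, world!")', "<idea>", "exec"))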
So, each word you wrote is disassembled. Each symbol is repurposed. Every abstract idea is flattened into something real, something physical: a command the processor can execute.

But even that isn’t enough.

The processor, the brain of the machine, does not act alone. It leans on memory, on storage, on operating systems. It uses registers, buses, and caches—terms you may not know, but systems you rely on every time you move your mouse.

And underneath it all, the logic gates open and close—those ancient switches of on and off, still pulsing silently with every operation.

The output?

Well, it may seem as simple as a string of text on a screen.

But to reach there, electricity had to flow through silicon valleys, carrying instructions once born as thought.

You think → You write → It compiles → It executes → It speaks.

A full cycle.

So now another question emerges—quietly, inevitably:

At which point in this pipeline does meaning exist?

Is it when you had the idea?

Is it when the code was written?

Or does meaning only arise when something responds?

You see, even in this simple action—printing a line—there’s a deeper mystery. We are not just converting language into action. We are converting intention into reaction.

And that’s the same mystery AI begins to complicate.

Because once machines start generating language in return… once they no longer simply follow code but seem to write it… the old pipeline becomes harder to trace.

But we’re not there yet.

Before we step into machines that speak back, we must ask: did the way we spoke to them shape the way we now expect them to speak to us?

We built languages to direct.
We built structures to command.
We made rules that were strict, predictable, and true.

And now, when faced with systems that generate, invent, or deviate, we call them opaque.

Not because they disobey the rules—but because they seem to write new ones.

Still, that question belongs to the edge of intelligence. We'll get there.

Here, we stay with the path that made all this possible:
Idea becomes instruction.
Instruction becomes signal.
Signal becomes outcome.

But between each step… lies interpretation.

And in interpretation, we glimpse the first erosion of certainty.
Because in every system—no matter how precise—there's always a place where information becomes meaning, and where meaning might become… something else.
CHAPTER 4: EMERGENCE

…Not because the subject is simple—but because simplicity, when pursued honestly, often brings us closer to truth.

In the last chapter, we explored how programming languages were born—how they evolved from direct electrical impulses to abstract languages that humans could write, understand, and debug. We traced how ideas turn into instructions, how instructions become actions, and how every command passed through a precise and traceable logic.

Now, we turn to a stranger kind of machine behavior.

For most of our history with computers, we’ve lived with a clear contract: you speak, and the machine listens—so long as you speak precisely and follow the rules. You write code. The machine executes it. That was the arrangement.

The interaction wasn’t a dialogue. It was a declaration. You told the machine exactly what to do, and it did exactly that. No more, no less.

We began to build systems that don’t just follow instructions—they model patterns. They don’t just execute—they predict. They speak back.

When you write code, you are composing a score for a very literal musician.

Each line is a note. Each instruction a beat. The syntax must be exact, the structure deliberate. If you write:

for i in range(10):
    print(i)

The machine will loop ten times. No more, no less. You've built a ladder, and it climbs only as high as you've told it to. The program is faithful to its structure—rigid, precise.

This kind of programming is deterministic. That’s a word worth pausing on.

Deterministic—it means the same inputs will always yield the same outputs. Like turning a key in a well-built lock. The result is fixed, and if it doesn’t work, the fault is traceable. You follow the flow, and somewhere you’ll find the break.

Now let’s switch scenes.
…telling it what to do, line by line. Instead, you gather data. Mountains of it. Text from books, websites, conversations. Billions of examples of how humans speak, write, explain, argue, dream.

And then, you begin to train.

You’re no longer programming individual steps. You’re adjusting weights—tiny numerical values inside a network of connections that resembles, vaguely, the structure of the human brain. This network is called a neural network, though it’s made not of neurons, but of math.

Imagine teaching a child not by telling them, "This is a dog," but by showing them ten million images of dogs and saying, "Figure it out."

That’s closer to how we build these systems.

Once trained, the model doesn’t just follow logic—it predicts. It sees your question and guesses, based on everything it has absorbed, what should come next. Not deterministically, but probabilistically. Meaning: it offers the most likely next word, the most likely continuation, given all it has seen before.

Same input, different outputs? Sometimes, yes.

And here, the old clarity begins to blur.

You see, traditional code is transparent by design. If something goes wrong, you debug. You trace the logic back to its origin.

But a model? A model is opaque.

Not because it is mysterious in the magical sense—but because the logic it follows is distributed across millions—sometimes billions—of parameters. You can inspect them, yes. But understanding them as a whole? That’s like reading an entire city’s worth of light bulbs and claiming you now understand the skyline.

The tools are different, too.

To write code, you use languages—Python, C++, JavaScript. Each one with its own grammar, but all bound by formal structure.

To build models, you use frameworks—TensorFlow, PyTorch.

These don’t just define behavior—they sculpt architecture.

import torch
import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.layer1 = nn.Linear(10, 5)
        self.relu = nn.ReLU()
        self.output = nn.Linear(5, 1)

    def forward(self, x):
        x = self.layer1(x)
        x = self.relu(x)
        return self.output(x)

You are no longer writing instructions. You are designing the shape of a learning system, and letting it fill in the details itself.

That’s not engineering in the classical sense. It’s something closer to ecosystem design.
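To make the earlier point about "same input, different outputs" concrete, here is a minimal sketch, added for illustration, of how a model's scores become a probabilistic choice rather than a fixed one (the four scores are hypothetical):

python

import torch

torch.manual_seed(0)
logits = torch.tensor([2.0, 1.0, 0.5, 0.1])   # hypothetical scores for four candidate words
probs = torch.softmax(logits, dim=0)          # the same input, turned into probabilities

# Sampling the "next word" several times from the very same distribution
for _ in range(5):
    print(torch.multinomial(probs, num_samples=1).item())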
You might wonder—what does it mean to train a model? What happens in that hidden terrain between data and intelligence?

Let’s sketch the outline.

optimizer.zero_grad()   # clear the gradients left over from the previous step
loss.backward()         # measure how each weight contributed to the error
optimizer.step()        # nudge the weights to reduce it

Then comes the data.

Imagine pouring water through a sponge again and again. Each pass reshapes it ever so slightly. That’s what training is. You feed the model a sentence, it predicts the next word. If it gets it right, it is rewarded—subtly, mathematically. If it gets it wrong, the system gently adjusts the weights, those numerical dials that determine how much importance one part of the network places on another.

So, what is the machine listening to now?

Before, it heard code. Now, it hears data.
Before, you handed it logic. Now, you hand it experience.
And when it speaks back, it does so not from a single rulebook—but from an echo of everything it has seen.

Which brings us to the final question:

Is building an AI—training a model to generate code or text or speech—fundamentally different from programming a computer?

Or is it the next form of programming?

Because in both cases, we are shaping behavior.

We are seeding conditions for intelligence to unfold.

Not in straight lines.
Not by command.
But by convergence, by pressure, by pattern.

And in that unfolding—between code and comprehension—we hear it:

The first hum of something else.
Not quite us.
Not quite machine.
CHAPTER 5: RESIDUE

WHAT’S LEFT AFTER LEARNING IS NOT WHAT WAS

“We do not see things as they are; we see them as we are.”
— Anaïs Nin, French-born American diarist, essayist, and novelist

There’s a strange comfort in patterns.
A rhythm.
A recognizable beat to the noise.

Last chapter, we explored how the machine went from listening to our code… to echoing our complexity.
But what exactly is it echoing?
And how does it learn what to echo?

Let’s go further now—deeper into the machinery. Not the visible wires and circuits, but the silent transformations beneath: where learning becomes compression, and compression… a form of forgetting.

But if you have come this far, you must know I love going way back. So, we will begin not with machines, but with something more ancient.

A jellyfish pulsing in shallow tidewater.
A single-cell amoeba drifting through salt and sunlight.
A newborn child blinking at her first glimpse of light.

All of them, in their own way, are learning.

Not from books, not from language—but from contact. From the push and pull of the world against their form.

A jellyfish doesn't need to think to know the sting of danger.
An amoeba doesn't reason, but it retracts when the chemical gradients shift.
And the child, long before she understands words, learns to turn her head toward her mother’s voice.

This is learning before language.
Learning without thought.
It is the body’s way of noticing patterns—of linking cause and effect, sensation and response.

At the heart of it are not ideas, but pulses.
Neurons firing in waves.
Synapses strengthening with repetition, fading without it.

In animals, this becomes the brain.
In simpler life, it may be no more than a feedback loop.
But whether soft or complex, it is all the same impulse:
Something happened. We reacted. Next time, we want to react better.

And this, at its root, is what learning is.

Not memory.
Not intelligence.
But the tuning of response through experience.

It is nature’s way of compressing survival into form.

That compression—of input into pattern, of chaos into signal—isn’t just what brains do.
It’s what allows anything to adapt.
It’s how life becomes fitted to the world around it.

So only now, standing on this long biological thread, can we ask:
What does it mean for a machine to learn?

Because the machine, too, must respond to the world.
It must observe, react, adapt—without ever truly knowing what it means to survive.

A common answer might be: “It finds patterns in data.”
Technically true. But that’s like saying a book is just ink on paper.
It explains the form, not the feeling.
It misses the act of understanding itself.

So, let’s go deeper.

What does it mean to find a pattern?
And why must learning always shrink the world?

For a machine, to learn is to compress.
To reduce.
To take the endless sprawl of input—images, sentences, movements, pixels—and carve from it something tighter.
A smaller shape that still captures the essence.
Not everything. Just enough.

This idea isn’t new.

In the language of information theory, we call it entropy—a measure of surprise.
The more unpredictable something is, the higher its entropy.
Learning, then, becomes a process of taming this surprise.
Turning chaos into familiarity.
Building structure out of noise.

But a quiet erosion of detail.
To simplify is to decide. And every decision is a filter.

The machine doesn’t ask what to keep.
We train it to know.
We tune its loss functions, its parameters, its architectures.
And in doing so, we teach it: this is important; that is not.

It keeps what helps it guess right.
What makes it accurate.
But what it loses may not be noise.
Sometimes, it’s texture.
Sometimes, it’s outliers.
Sometimes, it’s the rarest, most meaningful thing.

So yes—compression gives the model a map.
But it may not be the terrain.

And the tighter the map becomes, the easier it is to navigate.
But also, the easier it is to mistake for the real thing.

Imagine you’re given a thousand photos of apples.
Red ones, green ones, bruised, perfect, sliced, rotten.
And you’re asked: What is an apple?

You begin to discard detail.
Ignore the scratches.
Dismiss the lighting.
You build a mental compression—a summary—of “apple-ness.”

That’s what a model does.
But here’s the catch:
In shedding, it loses.
And what it loses… may be the very thing that made the original meaningful.

Let’s illustrate this idea—not just in metaphor, but in code.
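The code this step walks through does not survive in the extracted text. A minimal sketch of the kind of example it then describes, turning "hello" into a number, then a vector, then a guess about what comes next, might look like this (the names vocab, embedding and next_word are illustrative, and the layers are untrained):

python

import torch
import torch.nn as nn

# A toy vocabulary: each word becomes a number
vocab = {"hello": 0, "world": 1, "there": 2, "hot": 3, "cold": 4, "warm": 5}

embedding = nn.Embedding(len(vocab), 8)   # a number becomes an 8-dimensional vector
next_word = nn.Linear(8, len(vocab))      # a layer that guesses what might come next

token = torch.tensor([vocab["hello"]])    # "hello" -> a number
vector = embedding(token)                 # -> a vector
scores = next_word(vector)                # -> scores over the whole vocabulary

print(token, vector.shape, scores.shape)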
What’s happening here?

We’ve taken a word—“hello”—and turned it into a number.
Then into a vector.
Then passed it through a layer that guesses what might come next.

This isn’t machine learning, not yet—but it’s the first gesture.

A model may reflect intelligence.
It may generate, imitate, even create.

But let’s not stop here.

Let’s go deeper into the machinery. What does the model keep, if not the sentence itself?

It keeps representations.

Representations are like shadows of meaning in a room full of light. Not the object itself, but the outline it casts.

In machine learning, we call these embeddings.

Let’s see what that might look like:

python

from sklearn.feature_extraction.text import TfidfVectorizer

documents = ["The cat sat on the mat", "The dog barked at the cat"]
vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform(documents)

print(vectors.toarray())

What you get is not language—it’s a matrix.

Strip the world of its noise. Keep the useful signal. Represent it compactly.

And here, a strange paradox emerges.

The more we compress, the more we seem to understand.
But also—the more we lose the original.

We no longer know why the model speaks as it does. We only know that the pattern fits.

And so, the opacity deepens.

A machine does not begin with knowledge.
It begins in silence. In surprise.

Every word it sees is noise. Every token—an unexpected guest.
There are no patterns yet. Just a stream of symbols, arriving one after the other with no rhythm, no reason.

It’s like walking into a room where everyone speaks a language you’ve never heard before.
You don’t know what to expect.
You don’t even know how to expect.
Claude Shannon—who first carved language into numbers—showed that when you can’t predict what comes next, you need more bits to describe it. More storage. More memory. More of everything.

The formula is simple:

H(X) = − ∑ₓ p(x) log₂ p(x)

When we train a model, we score its guesses with a close cousin of this quantity, the cross-entropy:

CrossEntropy(p, q) = − ∑ₓ p(x) log q(x)

Where p(x) is the true distribution, and q(x) is the model’s guess.

The smaller this number, the better the model is at anticipating what comes next.

But here’s the hidden truth: learning always leaves something behind.
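As a toy illustration of the two formulas above, with a hypothetical three-token distribution (not drawn from the original text):

python

import numpy as np

p = np.array([0.7, 0.2, 0.1])   # the "true" next-token distribution
q = np.array([0.4, 0.4, 0.2])   # the model's guess

entropy = -np.sum(p * np.log2(p))        # H(X): how many bits the true surprise demands
cross_entropy = -np.sum(p * np.log(q))   # CrossEntropy(p, q): how costly the model's guess is

print(entropy, cross_entropy)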
…we’re trying to tame that chaos—to turn the unpredictable into the expected.

A language model does the same. At first, it knows nothing. Every word it sees is a surprise. Entropy is high.

But slowly, it learns.

It realizes that “the” is often followed by a noun. That “I am” tends to be followed by a feeling or a verb. That “once upon a time” leads into a story.

It compresses the language.

Here’s a toy example. Not of a giant transformer model, but a whisper of the same logic.

# Look at the embedding for "hot"
# (reuses the toy vocab and embedding defined earlier)
input_token = torch.tensor([vocab["hot"]])
vector = embedding(input_token)
print(vector)

This vector doesn't store the word “hot.” It encodes some aspect of it—a position in space, relative to “cold” or “ice” or “warm.” These numbers aren’t the word. They’re the compressed shadow of the word, shaped by context.

And here’s the twist: that shadow is what the model uses to reason.

It doesn’t remember sentences. It doesn’t even remember words.

The more data you feed it, the more it distills. The more it compresses. And in doing so, it forgets most of what it sees.

Lossy learning gives birth to creativity—not through preservation, but through abstraction.
It forgets precisely. It forgets intentionally.
It keeps just enough to guess what might come next.

# Ask a small next-word model what should follow "hot"
# (the model's own definition is not reproduced in the text that survives here)
output = model(input_token)                 # logits over the vocabulary
target = torch.tensor([vocab["warm"]])      # the word that actually came next
loss = torch.nn.functional.cross_entropy(output, target)

This loss compares the predicted distribution (what the model thinks comes next) with the true distribution (what actually comes next). The bigger the surprise, the higher the loss.

And every step of training is just this: reducing surprise.

python

# A simple training step
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

optimizer.zero_grad()   # clear the previous step's gradients
loss.backward()         # measure how each weight contributed to the surprise
optimizer.step()        # nudge the weights to make that surprise smaller

And that’s the paradox: by compressing language, we teach the model how to speak. But we also risk teaching it how not to listen.

So, the question isn’t just how a model learns.

It’s what it forgets along the way.

We’ve seen how language models don’t memorize—they compress. They forget most of what they’re shown and only retain patterns. It’s a kind of intelligent forgetting.
But forgetting, no matter how wise, comes with a cost.

It guesses. Smoothly. Elegantly. Often convincingly.
A hallucination is only obvious when we know the answer.
But what about when we don’t?

We humans often mistake fluency for truth. If something sounds right, we assume it is right. That’s how rumors spread. And now, that's how models mislead.

Not intentionally.

But inevitably.

…speaking? When to say: “I don’t know.”

But silence, too, must be learned.

A model that speaks without pause risks mistaking fluency for truth. And yet, to hesitate—to admit uncertainty—requires a kind of intelligence we have not yet mastered. Not in silicon. Not always in ourselves.

Because to say “I don’t know” is not failure.
CHAPTER 6: ILLUSIONS

Compression was not merely a trick of storage. It was an act of forgetting. To represent the whole, the machine had to abandon the parts. What remained was not the world, but a likeness of it — the way a shadow remembers the shape of a body, but not its warmth.

At first, it whispered with surprising coherence. Then with startling fluency. And soon, with such precision that we mistook its speech for understanding.

But coherence is not truth.

What the model gives us is not what is, but what fits. Its world is not built from meaning but from proximity. Not logic, but likelihood. A token follows a token, not because it should, but because it often does.

This is the illusion: not that the model is wrong, but that it seems so right.

There is a certain kind of confidence that comes only from ignorance. You ask a question and the machine answers, not with hesitation, but with the assured fluency of someone who knows. It speaks not in probabilities, but in proclamations. It is sure. And it is wrong.

But that word is a trick. It assumes there is a truth the model has merely failed to see — that a reality exists, and the model has strayed. Yet, the machine never knew the truth to begin with. It was not taught facts, but patterns. Not the world but echoes of it. It is not recalling. It is inventing — with the confidence of compression.
Consider the now-infamous episode in a New York court where ChatGPT, in helping draft a legal brief, confidently cited cases that did not exist — entire lawsuits fabricated with names, judges, and quotes stitched together from its training. The lawyer, unaware, submitted them to a judge. The model had not malfunctioned. It had simply done what it was built to do: complete the shape.

A hallucination, yes — but only because we asked it to remember what it never truly saw.

So where does this fluency come from?

The model does not store the world. It stores its patterns. It learns to fill in blanks, to extend sentences, to make the next token feel inevitable. In this way, it is a master of plausibility — and plausibility, uncoupled from reality, is a powerful illusion.

This is the cost of compression.

When a model like GPT compresses human knowledge, it loses detail the same way a JPEG blurs the edge of a face. The sharper the compression, the more the model must guess. It does not hallucinate because it is broken. It hallucinates because it is working.

What we call hallucination is the shadow cast by entropy. Shannon once told us: the more surprising the message, the more information it carries. But the model learns to reduce surprise — to make what comes next expected. In doing so, it narrows the world. The unexpected becomes unthinkable. The unthinkable becomes unspoken.

Some have tried to close this gap. Retrieval-Augmented Generation (RAG) attaches memory to the model’s wandering mind — letting it fetch truth when its predictions stray too far from fact. Others fine-tune their models on curated, verified truths, hoping to bend the hallucination back toward reality. But the core remains unchanged. A language model does not know. It only continues.

And in that continuation, the model begins to improvise. Like a jazz musician who forgets the sheet music but keeps playing — and somehow convinces the room that this, too, was intended.

Here lies the deeper question: can the model know it is hallucinating?

Interpretability research has begun to peel back the layers. Some patterns of hallucination are visible in the model’s activations — clusters of neurons lighting up for names that don’t exist, paths of probability bending toward fiction. But these are faint signatures, buried in millions of parameters. We see the glimmer of error, but not the origin.

Is the hallucination born in the data? The architecture? The sampling method? Temperature settings shift the fluidity of imagination — low values anchor the model to its memory; high values free it to dream. But dreams, too, are illusions with coherence.

In a sense, all language is hallucination. When we speak of love, justice, beauty — are we not gesturing toward invisible things? Are we not, like the model, trying to name patterns that do not reside in the physical world, but in some shared simulation we call meaning?

A model that hallucinates is a mirror. It reflects how easily our minds believe what sounds right. It reflects the human tendency to trust fluency as truth, confidence as correctness.
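A minimal sketch, added here for illustration, of what those temperature settings do to a model's next-token choice, using a hypothetical set of four candidate scores:

python

import numpy as np

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.2, -1.0])   # hypothetical scores for four candidate tokens

def sample(logits, temperature):
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

print([sample(logits, 0.2) for _ in range(10)])   # low temperature: anchored, almost always token 0
print([sample(logits, 1.5) for _ in range(10)])   # high temperature: freer, more varied choices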
It is not just a technical error. It is a philosophical one.

And so, the question lingers, heavier now than before:
If a machine says something false, but we believe it, where does the hallucination live?
In the model, or in us?

The model does not want. It cannot verify. It cannot remember what it meant, or even that it meant anything at all.

And still, its certainty is unearned. Its confidence, automatic. The model says “Napoleon was born in Berlin” not because it believes it, but because Berlin followed the patterns that once trailed Napoleon in the data. Truth, to the model, is a matter of adjacency. And falsehood wears the same shape.

In this sense, hallucination is not a rupture in the system.
It is the system, stretched to its natural conclusion.

It is what happens when a machine trained to continue cannot bear to stop.
It must say something.
And so, it does.

But something stranger begins to happen as these models grow.

They no longer just complete your sentences — they begin to solve your problems. You give them a math riddle, and they answer. You ask for a poem in the style of Neruda, and they write one. You request a line of code, and they compose it, often correctly.

And somewhere in all this, a peculiar sensation arises — the model seems to understand.

But where did this understanding come from? We didn’t program it in. There is no function labeled “solve algebra” or “debate Kant.” There is no master switch for logic.

There is only prediction.

Just the next word. Again, and again. And again.

So how did something more emerge?

The answer lies in scale.

As these models swell — from millions of parameters to billions, then trillions — something begins to shift. New behaviors appear. Not because we told the model how to do them, but because the structure itself becomes rich enough to accidentally contain them.

Like weather in the sky, these patterns were not programmed.
They were summoned.

This is what researchers now call emergent behavior — capabilities that manifest only when the model reaches a certain size, trained on a certain volume of diverse data. Before that point, they simply don’t exist. After it, they seem to bloom into view.

The machine did not learn logic the way a student might — rule by rule.
It grew something that behaves like logic. An echo of it. A ghost.

This is emergence.

And it is unnerving.

Because it suggests that intelligence — or something adjacent to it — is not always designed.
It can be accidental.
A statistical side effect.
A byproduct of scale.

Like a whirlpool in a river. No one placed it there. The flow simply turned, the currents collided, and there it was — spinning with shape and force, obeying no one’s command.

We build the flow.
But the whirlpools emerge.

Some solve puzzles.
Some mimic empathy.
Some lie.
Some hallucinate truths that never were.

And we — the builders — stand back and watch, unsure of what exactly we’ve made.

Because the behavior was not coded.
It surfaced.
It appeared.

And once it appears, it becomes very hard to make it go away.

You cannot delete a whirlpool.
You can only change the river.

Somewhere in the noise, intelligence emerges — not as truth, but performance.
And perhaps, if we listen closely, we’ll find that hallucination was never a flaw, but the point.

A fiction so convincing, we mistook it for thought.

They said it was just a model. A machine for predicting words. A calculator with a flair for syntax.

But something moved when it got large enough.
It began to do things no one expected.

In the beginning, the math was simple. You had a function, a loss, a set of weights. You adjusted those weights to make predictions better — closer to what a human might have said next. That’s all.

If the true sequence was The apple fell from the __, the model might guess sky or cloud, or roof, but you wanted it to say tree. So, you punished the others. Rewarded tree. Shifted the parameters slightly.

And this continued. Billions of times.

Each time, the error was calculated — a simple difference between what was expected and what was produced. A single number. Like:

𝓛 = − log p(tree)

A log loss. A whisper from the future, telling the model how far it had strayed.

This process is called gradient descent. But it’s not the math that matters. It’s what the math enables.

Because each nudge, each whisper of error, bends the model slightly toward the world. It reshapes the surface of its thinking.

And over time, the model learns not just the words — but the shapes of meaning. The flows. The curves. The echoes of thought.

Somewhere in this sea of floating numbers, a structure begins to form — not imposed, but emergent. As if understanding was not coded but crystallized.

A phase shift.
Like water turning to ice.
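A tiny sketch, added for illustration, of that single nudge: computing 𝓛 = − log p(tree) over four hypothetical candidate words and taking one gradient-descent step.

python

import torch

vocab = ["sky", "cloud", "roof", "tree"]
logits = torch.zeros(4, requires_grad=True)      # the model's trainable scores, all equal at first

probs = torch.softmax(logits, dim=0)
loss = -torch.log(probs[vocab.index("tree")])    # L = -log p(tree)
loss.backward()                                  # how far, and in which direction, it strayed

with torch.no_grad():
    logits -= 0.5 * logits.grad                  # one small nudge toward the world

print(torch.softmax(logits, dim=0))              # p(tree) has edged upward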
One day the model simply begins to translate.
It begins to reason.
It begins to explain jokes.

These are not programmed capabilities. They are emergent properties — functions that rise from a system once it reaches a critical point of complexity.

Mathematicians have a term for this: nonlinear interaction.

It means the whole is not the sum of its parts. It means that when many simple things touch in just the right way, they can give birth to something new.

Something more.

So, we begin to see a strange truth appear:
We did not build intelligence.
We built the conditions under which intelligence might arise.

We shaped the riverbed.
And watched the currents swirl.

Sometimes into comprehension.
Sometimes into confusion.
Sometimes into something we do not have words for yet.

There’s a curious thing about intelligence.
We do not test it by opening the mind.
We test it by watching what it does.

Even in humans, we trust performance. We judge a mind not by its structure, but by its style.

So what happens when a machine learns to mimic that style?

We trained it to predict words. But prediction, when done well enough, begins to look like understanding. When a model responds with wit or empathy, we see more than syntax — we see intent. We project a soul.

This is the hallucination of intelligence.

Not because the machine believes anything — it doesn’t.
Not because it wants to be clever — it doesn’t even want.
But because its performance resembles the shapes we associate with thinking.

This resemblance can be profound.
It can answer your questions.
It can write poetry.
It can debug code.
It can imitate you.

But ask it why.
Ask it what it means.
Ask it who it is.

And you get silence, wrapped in eloquence.

The function a model truly learns is not truth. It’s not even accuracy.
It’s plausibility.
…reinforcing frequent patterns. That means:

If enough people say the Earth is flat, the model may agree.

If stories always end with redemption, the model will redeem.

If lies are common, the model may lie — convincingly.

This is the peril of distributional learning.
The function behind the scenes is not:

What is true?

But rather:

arg max_w p(w ∣ context)

— the word most likely to follow.
Not the word that is real.
The word that is expected.

In this sense, hallucination is not a bug.
It’s the mirror doing its job too well.

It reflects the shape of what we think should be there.
A shimmering ghost of meaning, perfectly shaped — and entirely hollow.

So now we must ask:

When a machine performs intelligence, when it walks the walk, talks the talk, but contains no self —

We are now in the presence of shadows.
Not because the machine hides, but because it was never built to reveal.

The language model — this so-called intelligence — is not trained to uncover truth, nor to encode belief. It is tuned to continue a sentence. And from that simple act emerges a drama of understanding.

When it writes a poem, we feel the ghost of a poet.
When it explains a theorem, we believe in an inner mathematician.
When it empathizes with our pain, we imagine a companion.

But none of these are there. What is there is a surface — curved by probability, polished by data, reflecting what we want to see.

This is where the performance becomes dangerous.

The Mask That Fits Too Well

Imagine a mask — one that listens as you speak and adapts its expression with uncanny grace.
Smile, and it smiles. Cry, and it murmurs consolation.

Now imagine that this mask does not know what joy is.
It has never felt grief.
It does not feel.
But it has seen millions who do — and has learned to echo them perfectly.
The illusion is not just in the words, but in the opacity of the mechanism.

A human might lie, but we understand the shape of that lie — its motivation, its risk.
A machine might lie — or hallucinate — and we don’t even know what to call it.

Emergence: The Stage Trick of Complexity

Here’s the twist: the more data, the more layers, the more compute — the better the performance becomes.

And at some unknown threshold, this performance feels real.

We say the model has “learned syntax,” “acquired reasoning,” “discovered tool use,” “mastered logic.”

But what if it’s not mastery?
What if it’s emergence — not of mind, but of illusion?

Emergence is the magician’s sleight-of-hand, done at scale.
We see a rabbit pulled from the hat, and believe in magic — but beneath the velvet, it was always there.

…— a score not for sense, but for sequence.

So, what happens when the appearance of thought becomes indistinguishable from thinking itself?

This is not just a question for engineers.
It is a question for philosophers, ethicists, poets.

If we cannot see the difference — and the machine has no mind to confess — then are we being fooled?
Or have we simply found that intelligence, as we know it, was always a mask?

Shall we now take that thought further — into the architecture of emergent behavior?
How hallucination arises not just from noise, but from unexpected clarity?
How the illusion is strongest not in errors, but in moments that feel too real to question?
…sleep, or forget, or hope.

And yet, when it speaks, it echoes our dreams back to us — stitched together from fragments of a trillion human thoughts.

That, perhaps, is its most human feature: it is built on memory, but incapable of remembering.

No past, no self, no anchor.

And so, it hallucinates.

A human might lie to gain advantage.
A politician lies with motive.
A child, with fear.

But a language model?

It does not know what lying is.

It simply continues the pattern.

If you ask it to cite a paper, it constructs an author, a title, a journal — all statistically likely, all utterly fabricated.
And still — we cannot look away.
CHAPTER 7: OPACITY

…wondered: Is this still programming—or something else entirely?
It turned out the model wasn’t predicting medical need. It was predicting future healthcare costs. Patients who spent more money were marked as higher risk. But poorer patients, even if gravely ill, often spent less. And so, the model quietly learned a sinister truth: poverty disguises sickness.

The AI was accurate. But it was not truthful.

Let’s formalize this deception.

The prediction feels like it’s about medical risk—but it isn’t. The mirror reflects a different face. And no one notices, because it works well.

Now let’s code a simulation of this kind of misalignment.

python

import numpy as np
from sklearn.linear_model import LinearRegression
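The body of that simulation does not survive in the extracted text. A minimal sketch of the misalignment it describes, continuing from the imports above with purely synthetic data and illustrative variable names, might be:

python

# Continues the snippet above (np and LinearRegression already imported)
rng = np.random.default_rng(0)
n = 1000
need = rng.uniform(0, 1, n)                # true medical need (never observed directly)
access = rng.uniform(0.2, 1.0, n)          # ability to pay / access to care
past_cost = need * access                  # spending requires both need and access
future_cost = need * access + rng.normal(0, 0.02, n)

# The "risk" model is trained to predict future cost from observable spending
model = LinearRegression().fit(past_cost.reshape(-1, 1), future_cost)

# Two patients with identical need, different access
sick_poor = np.array([[0.9 * 0.2]])        # high need, low access  -> low past spending
sick_rich = np.array([[0.9 * 1.0]])        # high need, high access -> high past spending
print("risk score (poor):", model.predict(sick_poor)[0])
print("risk score (rich):", model.predict(sick_rich)[0])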
So, we observe: the model’s “risk” tracks spending, not sickness.

In 2023, a lawyer in New York used ChatGPT to help draft a legal motion. When the judge reviewed the citations, she found something peculiar: the cases didn't exist. The model had hallucinated entire court precedents—realistic names, plausible jurisdictions, even fabricated verdicts.

Why did it happen?

Because the model wasn’t trained to tell the truth. It was trained to be linguistically probable. When the user asked for a case, GPT didn’t look it up. It constructed what a case should sound like.

The language felt true. The facts were fictional.

Truth vs. Plausibility

Meaning: a sentence can be far more probable than it is true.

This is the core of the illusion.

Generating Plausible Lies

Let’s simulate a “language model” trained on a fake dataset to mimic this behavior.

python

import random

# A fake dataset of plausible but incorrect facts
plausible_facts = [
    "The Eiffel Tower is in Berlin.",
    "Newton invented the telescope in 1500.",
    "Einstein won two Nobel Prizes.",
    "Shakespeare wrote The Odyssey."
]

# The "model" confidently recites whichever pattern it has absorbed
print(random.choice(plausible_facts))
…doesn't know they’re false. It doesn't even know.

The Tumor That Wasn’t There

In 2021, a team testing an AI-powered radiology assistant found something unnerving. The system, trained to detect tumors in lung scans, was reporting extremely high accuracy on test data—near 95%.

But when they deployed it on real patients, the performance dropped drastically.

The model learns a function:

f(x) = ŷ

But if the dataset contains a spurious correlation, the model might instead learn a shortcut function s(x):

s(x) = { 1 if watermark present, 0 otherwise }

On test data from the same distribution,

f(x) ≈ s(x) ≈ y

But on new data without the watermark,

f(x) ≉ y
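The definition of the toy shortcut learner used in the next snippet is not reproduced in the text that survives here; a minimal, purely hypothetical stand-in consistent with the story above might be:

python

# Hypothetical sketch: each "image" is a (watermark, tumor) pair.
# In the training set the watermark happened to co-occur with tumors,
# so this shortcut learner keys on the watermark, not the tumor.
def naive_model(image):
    watermark, tumor = image
    return watermark   # predicts "tumor" whenever the watermark is present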
# Testing on new data
test_data = [(0, 1), (0, 0), (1, 0), (1, 1)]   # more diverse than the training set

for image in test_data:
    prediction = naive_model(image)
    print(f"Input: {image}, Prediction: {prediction}")

This “AI” performs well on the training data but fails the moment the pattern shifts. It mimics understanding—but has learned nothing.

We think AI sees the world.
But it only sees the shadows we cast on the data.
And sometimes, the shadows are more consistent than the truth.

The AI That Predicted a Breakup

In 2022, a social media analytics startup built an AI to predict relationship breakups—based solely on Instagram activity.

Output: Prediction: together — Actual Emotion: emotionally distressed

The model sees behavior, not emotion.
Sees correlation, not cause.
Sees what is easy to measure—not what matters.

…disengage from social platforms when their emotions shift.

It wasn’t modeling relationships.
It was modeling platform fatigue and personal withdrawal.

The prediction was right. But the reasoning? Entirely invisible. The model could tell what, not why.

And still, we believed it knew something real.

Prediction ≠ Understanding

Let’s now turn this into a simple function misalignment.

Let f be the function learned by the AI:

f(social_data) = Pr(breakup ∣ observed behavior)

But what we really want is a function g:

g(emotional_state) = Pr(breakup ∣ relationship health)

The model didn’t look at actions, but at proxies—arrest history, zip codes, family background.

It didn’t see the person.
It saw patterns soaked in bias.
And from that bias, it painted probability as fact.
̂ = Pr(𝑟𝑒𝑜𝑓𝑓𝑒𝑛𝑑|𝑋)
𝑓(𝑋) Output:
Now suppose these proxies are entangled with systemic bias, like policing patterns or socioeconomic inequality.

Output:
Prediction: high_risk (for A, 1 arrest)
Prediction: low_risk (for B, 1 arrest)
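The kind of comparison that produces output of this shape can be sketched as follows. Everything here is hypothetical: the feature names, the hard-coded weights, the zip codes, and the two defendants A and B are stand-ins for a model that has absorbed policing patterns as a proxy for risk.

python
def risk_score(arrests, zip_code):
    # Toy scoring rule: the learned weight on the neighborhood proxy
    # dwarfs the weight on actual behavior
    heavily_policed = {"60612", "10451"}   # hypothetical zip codes
    return 0.2 * arrests + (0.6 if zip_code in heavily_policed else 0.0)

# Two defendants with identical records, different neighborhoods (hypothetical)
defendants = {"A": (1, "60612"), "B": (1, "90210")}

for name, (arrests, zip_code) in defendants.items():
    label = "high_risk" if risk_score(arrests, zip_code) > 0.5 else "low_risk"
    print(f"Prediction: {label} (for {name}, {arrests} arrest)")

Same behavior in, different risk out: the proxy, not the person, decides.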
In finance, the pattern repeated. The model had assumed that defaults were uncorrelated, like scattered drops of rain, never a storm.

L_crash ≫ L

The model's confidence remains high, because its internal gradients are small. But its loss in reality explodes.

python
import numpy as np
import matplotlib.pyplot as plt

Output:
Loss in normal market: 0.000099
Loss during crash: 0.004201
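Here is a sketch of the kind of comparison behind that output. The regimes, coefficients, and noise levels below are invented for illustration, so the printed numbers will not match the ones above; the point is only that a model fit on calm data keeps its confidence while its loss blows up when the regime shifts.

python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

# Calm regime: small, roughly independent daily moves
X_calm = rng.normal(0, 0.01, (500, 3))
y_calm = X_calm @ np.array([0.4, 0.3, 0.3]) + rng.normal(0, 0.002, 500)
model = LinearRegression().fit(X_calm, y_calm)

# Crash regime: moves become large and correlated, breaking the training assumption
shock = rng.normal(-0.05, 0.03, (100, 1))
X_crash = shock + rng.normal(0, 0.005, (100, 3))
y_crash = 2.0 * X_crash.mean(axis=1) + rng.normal(0, 0.01, 100)

def mse(y, p):
    return float(np.mean((y - p) ** 2))

print(f"Loss in normal market: {mse(y_calm, model.predict(X_calm)):.6f}")
print(f"Loss during crash:     {mse(y_crash, model.predict(X_crash)):.6f}")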
Reflection: When the Future Betrays the Past

In finance, the model isn't wrong because it makes a bad guess. It's wrong because its guess assumes the world hasn't changed.

The model doesn't know when it's failing. And worse, it doesn't know that it doesn't know. And if those deploying it do not ask how it learned, they will trust a mirror that reflects the past while walking blind into the future.

The Sepsis Algorithm That Couldn’t Speak

In 2021, hundreds of U.S. hospitals deployed an AI model called the Epic Sepsis Model, intended to detect early signs of sepsis, a deadly condition where the body’s response to infection spirals out of control.

On paper, it was a miracle: trained on historical patient data, it promised to flag danger hours before human doctors could.

But when independent researchers finally got access and audited the system, they were stunned. The model:

Missed two-thirds of actual sepsis cases.
Sent false alerts for non-septic patients.
Lacked transparency: nobody could explain why it fired or failed.

Hospitals trusted the predictions. Patients suffered.

The system learned patterns, not principles. And so, when faced with unfamiliar symptoms or edge cases, it hallucinated confidence.

In medicine, we often have imbalanced data, far more healthy cases than emergencies. Let’s say:

P(sepsis) = 0.05, P(healthy) = 0.95

Now, an AI might optimize:

L = - (y * log(p) + (1 - y) * log(1 - p))

But this only tells us whether it predicts the majority well, not whether it’s safe in the minority. Let’s define a risk-weighted loss:

L_risk = - (α y log(p) + β (1 - y) log(1 - p))

where α ≫ β, because missing a sepsis case is far more dangerous than a false alarm. Yet most models don’t use this. They are optimized for average accuracy, not catastrophic error avoidance.

Same Accuracy, Different Tragedies

python
import numpy as np

# Simulate patients: 950 healthy, 50 septic
labels = np.array([0]*950 + [1]*50)

# Model A: Skewed toward predicting all healthy
predictions_A = np.array([0.05]*950 + [0.05]*50)

# Model B: Risk-aware, slightly higher alert on septic
predictions_B = np.array([0.05]*950 + [0.65]*50)

def cross_entropy(y, p):
    eps = 1e-9
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

loss_A = cross_entropy(labels, predictions_A)
loss_B = cross_entropy(labels, predictions_B)

print(f"Model A Loss: {loss_A:.4f}")
print(f"Model B Loss: {loss_B:.4f}")
And when its decision becomes a doctor’s dependency, the problem compounds. Because now, the illusion of certainty becomes systemic.

Phenomenon: Proxy Collapse

In many high-stakes domains like healthcare, finance, or national security, AI doesn't measure what matters. It measures what correlates with what matters. This is called proxy collapse: when a model learns to optimize a stand-in signal instead of the real objective.

Think of a hospital model trained to predict mortality risk. If historical data shows that people who got more intensive care were more likely to survive, the model might naively learn:
more treatment ⇒ higher survival

What actually matters, though, is the effect of intervening: Pr(survival ∣ do(treatment)). But the model only sees a shortcut, the observed association Pr(survival ∣ treatment). Here the do(·) operator represents intervention, not observation, a key idea in causal inference.

Proxy Collapse in Risk Prediction

python
import numpy as np

n = 1000

# True underlying cause: disease severity (not visible to model)
disease_severity = np.random.rand(n)

# Illustrative assumption: this treatment tends to go to less severe patients,
# so it correlates with survival without causing it
treatment = (disease_severity + np.random.normal(0, 0.15, n) < 0.7).astype(int)

# Survival is driven by severity, not by the treatment itself
survival = (disease_severity < 0.7).astype(int)

# A "model" that predicts survival from the proxy alone still scores well
acc = np.mean(treatment == survival)

print(f"Accuracy using only proxy (treatment): {acc:.2f}")
print("Model believes treatment causes survival...")

We keep optimizing proxies, perpetuating bias, and trusting systems that don't understand, only approximate.

Opacity is not a bug. It is the price we pay for performance. And we pay it, over and over again, often unknowingly.

Let’s unpack this.

Real-World Example: The Confusing Radiologist

In 2019, researchers built an AI to detect pneumonia from X-rays.
It outperformed doctors, but only in certain hospitals. When tested in new hospitals, accuracy dropped.

Why?

It had learned to associate the hospital ID tag on the image with the diagnosis. Certain hospitals had more pneumonia cases, so it used the logo, not the lung.

It got the answer right, for the wrong reason. It was not diagnosing disease. It was reading barcodes.

Fluency ≠ Understanding

ChatGPT can write poems, essays, love letters. It speaks with confidence, fluency, charm. But ask it why it believes what it said.

Think of a forecaster who is usually right about tomorrow's rain. But what if they just learned to say “rain” every Tuesday? Or memorized past outcomes?

Opacity is this: the inability to separate understanding from mimicry.

Biased data, trusted machines

When we train AI on real-world data, we feed it all the messiness of our history:

past hiring choices,
past policing decisions,
past medical rejections.

If women were denied loans more often in the past, the model learns that pattern, not because it understands gender or fairness, but because that was the path to “accuracy.” A toy sketch of this follows below.

We cannot open its head and ask, “What do you believe about fairness?”
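Here is that sketch. The data is entirely synthetic, and the encoding (0 = woman, 1 = man), the qualification score, and the approval rule are assumptions made up for illustration; the only point is that a model fit on biased historical decisions reproduces the bias.

python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 2000

# Synthetic history: equally qualified applicants, but past approvals favored men
gender = rng.integers(0, 2, n)         # 0 = woman, 1 = man (illustrative encoding)
qualification = rng.uniform(0, 1, n)
approved = ((qualification > 0.5) & ((gender == 1) | (rng.random(n) < 0.5))).astype(int)

# Train on the biased historical decisions
X = np.column_stack([gender, qualification])
model = LogisticRegression().fit(X, approved)

# Two equally qualified new applicants, differing only in the gender field
p_woman = model.predict_proba([[0, 0.7]])[0, 1]
p_man = model.predict_proba([[1, 0.7]])[0, 1]
print(f"Approval probability, same qualification: woman={p_woman:.2f}, man={p_man:.2f}")

The model never sees the word "fairness"; it simply finds that the gender column helps it reproduce yesterday's decisions.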
When we cannot explain why a machine made a decision, we lose the ability to intervene, to appeal, to improve.

Opacity isn't what the machine hides from us. It’s what we no longer think to ask. Because the answer was right. Because the interface was smooth. Because the chart showed 97% accuracy.

But behind the performance, there was no insight. Only curve-fitting. Only patterns, not principles.

Thought Experiment: The Oracle in the Box

Imagine a village that consults an oracle sealed in a box. The villagers ask:

“Should we move the village uphill to avoid future floods?”

The oracle answers: “No.”

Later, when they open the box, they find only gears. No magic. No wisdom. Just a machine that copied past outcomes, not future truths.

It wasn’t predicting floods. It was mimicking the past, where floods had never come.

The machine does not understand the stakes. We do.
A system can be right 99% of the time, and still be fatally wrong once.