
THE OPACITY EFFECT
WHERE REFLECTION OBSCURES
A STUDY ON THE OPACITY OF AI MODELS & HOW ITS REFLECTIONS CREATE AN ILLUSION OF INSIGHT

Sayan Karmakar
ACKNOWLEDGEMENT
A HEARTFELT THANKS TO ALL INVOLVED

The journey of this research would not have been possible without the incredible
support and contributions of those who have guided me through both the intellectual
and emotional aspects of this project. I owe a great deal of gratitude to Cynthia Rudin,
whose groundbreaking work in the field of interpretable machine learning has shaped
much of my thinking. Her research on creating transparent AI models and the
challenges of balancing accuracy with interpretability has been pivotal in framing the
core ideas of this paper.

To my father, I owe my deepest gratitude for your wisdom, guidance, and practical
approach to life. You’ve shown me the value of perseverance and a strong work ethic,
and your ability to see challenges as opportunities has been an invaluable source of
inspiration. Your belief in my ability to overcome obstacles has fueled my determination,
and I am forever thankful for the steady presence you’ve been in my life.

To my mother, your unwavering love and nurturing spirit have been my anchor. Your
patience, kindness, and gentle encouragement have provided me with the emotional
strength needed to pursue my dreams, even during the toughest of times. You’ve
taught me to approach life with empathy, and your belief in me has never faltered. For
all the sacrifices you’ve made and the love you’ve given, I am endlessly grateful.

I would also like to extend my sincere appreciation to my friend Amit, whose thoughtful
feedback and advice were instrumental during the early stages of my research. His
suggestions on refining the structure and clarity of the paper helped immensely in
shaping the direction of my work.

Lastly, my heartfelt thanks go to my friend Anamika, whose critical insights and detailed
reviews helped me navigate the complex ideas explored in this research. Her support
has been invaluable, and I am grateful for her constant encouragement and
constructive critiques throughout the process.
ABSTRACT
UNVEILING THE OPACITY OF AI PREDICTIONS

This paper traverses the evolution of artificial intelligence not through isolated technical
milestones, but as a living tapestry of illusions, structures, and emergent behaviors. From
the foundational questions of intelligence and the syntax of programming languages to
the rise of large language models and their paradoxes, we have examined how
fluency masquerades as comprehension and prediction often replaces explanation.

Our inquiry reveals that opacity is not a flaw at the fringes of AI—it is embedded at its
core. We have demonstrated this not only through theoretical exploration but through
lived phenomena: simulations that yield persuasive yet senseless results, models that
achieve statistical brilliance while remaining semantically blind, and predictions that
impress yet betray no trace of reasoning.

In this paper, we have not only discussed opacity—we have demonstrated it. We
showed it in hallucinations dressed as facts. In fluent responses to questions never truly
understood. In models that pass medical exams yet fail basic logic. We exposed it in the
phenomenon of overfitting—where a model learns the noise instead of the truth. We
saw it in causal confusion, where correlation wore the mask of cause, and in proxy
collapse, where a stand-in variable deceived both the model and its maker.

We built code that appeared wise. We analyzed formulas that appeared certain. But in
peeling back the structure, we found no soul, only signals and probabilities woven with
impressive precision—and no transparency behind the choices made.

A truly transparent system would allow us to trace why a decision was made—not
merely what it decided. But we are not there. Not yet. The complexity of deep learning
systems has grown to such vastness that even their designers cannot predict their
behavior in full. As researchers like Cynthia Rudin and Judea Pearl have pointed out,
accuracy is not a substitute for interpretability. And when decisions affect lives—loan
approvals, medical diagnoses, parole hearings—opacity is not just a technical flaw. It is
a moral hazard.

Opacity is not a distant philosophical concern. It is here, now, embedded in systems we trust. It hides in chatbots that lie, in algorithms that discriminate, in recommendation
engines that radicalize. And the most haunting part is this: they are not trying to do
harm. They are simply optimizing a function—one we gave them—without knowing
what that function truly means in the world it operates within.

This is not to say these systems are without value. Quite the opposite. They are powerful,
often miraculous tools. But power without understanding is a dangerous form of faith.

Crucially, we argue that opacity is not merely the result of complexity—it is the cost of
compression, the consequence of performance untethered from understanding. The
systems we call intelligent are trained to mirror the world, but not to know it. They
compress meaning into tokens, truth into probabilities, and learning into pattern
mimicry. What emerges is not comprehension, but the illusion of it—an echo crafted in
high fidelity.

To recognize opacity is not to reject progress. It is to acknowledge our reflection in a machine that does not know it is a mirror. We must build not only smarter systems, but
systems we can see. Systems whose predictions we can trust not just because they are
accurate—but because we understand the shape of their thought.

Until then, let us carry forward with eyes open, questions sharpened, and a deep
respect for the complexity we have summoned.

Because the black box does not speak our language.

It only echoes it back.


TABLE OF CONTENTS

01 INTRODUCTION
02 ORIGINS
03 FROM THOUGHT TO ACTION
04 EMERGENCE
05 RESIDUE
06 ILLUSIONS
07 OPACITY
CHAPTER 1: INTRODUCTION
A MIRROR THAT LEARNS: SENSING THE SHAPE OF INTELLIGENCE

“The significant problems we face cannot be solved at the same level of thinking we were at when we created them.”
— Albert Einstein, Nobel Laureate in Physics, 1921

We are no Einstein so let’s speak in simpler terms.

Today, Artificial Intelligence is everywhere. It picks the songs you didn’t know you wanted to hear. It finishes your sentences in emails. It maps the fastest route when you're stuck in traffic. It even decides—quietly—what headlines appear first when you open your phone.

People speak of AI with awe, with fear, with hope. But beneath all the noise, the real question persists:

What is AI, truly?

Let me not give you a fixed definition—because in truth, AI is not a thing you hold, it’s a thing you witness. Let me instead offer you a way to notice it.

Imagine standing at a busy crosswalk.

You see a red light. You stop.
The car next to you slows down.
A child tugs on their mother’s hand, watching the traffic.

Nobody told you what to do—you just knew. You’ve learned it over time: this situation, these signals, this action. Now—somewhere in that same city, a self-driving car sees the same light. It stops too. Not because it “knows” in the human sense, but because it has learned through thousands of examples what that light means, how people behave, and what it should do.

That is AI—not a mind, but a mirror, reflecting slivers of human judgment, shaped by data, memory, and a certain kind of reasoning.

But this kind of reasoning doesn’t come from understanding the way we understand. The car doesn’t know what it means to be late for work. It doesn’t feel the rain or hear the child laughing.

Instead, what it has is something more mechanical—yet strangely elegant: it recognizes patterns. And from those patterns, it acts.

Think of AI like this:

You hum a few notes of a song you barely remember—and somehow, your phone finds it.
You take a blurry photo of a bird, and a small icon tells you: “This might be a kingfisher.”

What’s happening here?

The machine doesn’t “know” the song. It doesn’t “see” the bird. But in its own way, it remembers every song it's ever been fed. It remembers the shape of every bird it has ever been shown. And when your request enters the system, it searches through that sea of memory—not randomly, but cleverly—and says: this is the closest match I know.
It is like a librarian who doesn’t read books but knows exactly where each one is shelved.

Well earlier I had raised the question, “What is AI?” But before we get there, let’s ask a more basic question: What is intelligence? Not in theory—but in everyday life.

When a child sees a ball roll under the couch and then looks behind the couch to find it—that’s intelligence.
When a bird changes its flight path to avoid a tree it’s never seen before—that’s intelligence.
When you learn that your friend is upset, not because they said it, but because of how they said nothing—that too is intelligence.

So, I’ll summarize that at its core, intelligence is the ability to learn, adapt, and make sense of the world—even when it’s messy, uncertain, or unfamiliar.

But here’s a deeper, more unsettling question:

“If intelligence is rooted in experience, sensation, and survival—can something without a body, without a past, without pain or memory of death, ever truly possess it?”

Is it the ability to solve a puzzle? To learn from mistakes? To adapt to change? If so, then yes—AI mimics those things, in parts, in pieces.

But ask a machine to tell you why a poem is beautiful, or what silence means in a conversation—and it stumbles. Because it does not live in the world as we do. It does not feel, or dream, or dread.

Still, something about it feels uncannily close. Like a shadow that sometimes moves before we do.

So, what is AI?

Not a brain. Not a soul. But something else—a growing echo of human perception, stretched across circuits and data, making sense of things it has never truly lived.

It does not think as we do, but it behaves in ways that suggest thought.

And perhaps that is the best way to understand AI—not as a thing we have built, but as a process that is learning, mimicking, and inching ever closer to understanding without being us.

A strange new mirror—one that doesn’t just reflect us but learns to anticipate us.
CHAPTER 2: ORIGINS
FROM MYTH TO MACHINE: THE LONG MEMORY OF ARTIFICIAL INTELLIGENCE

“Any sufficiently advanced technology is indistinguishable from magic.”
— Arthur C. Clarke, British science fiction writer, futurist, and inventor

Let's dabble our hand into a bit of history.

Before machines computed, they were dreamt. Long before a transistor ever blinked, or a compiler was born, humans imagined intelligence that existed outside themselves. A mirror-mind. A golem. A mechanical oracle.

In Greek myth, Talos—a giant of gleaming bronze, forged by the god Hephaestus—was given to King Minos to guard the island of Crete. He was not born but built. Not raised but riveted.

Each day, he circled the island’s shores, relentless and tireless, scanning the horizon for invaders. He needed no rest, no food, no sleep. His strength lay not only in his size, but in his certainty—he acted without hesitation, without doubt.

He had a single vein, filled not with blood, but with ichor, the golden lifeblood of the gods—sealed shut by a nail at his heel. Remove the nail, and the essence would drain. The giant would fall.

According to the old stories, when enemy ships approached, Talos would hurl massive stones at them. If that failed, he would heat his metallic body until it glowed, then clasp the intruders to his chest—burning them alive.

Not out of anger. Not vengeance. Just... because that’s what he was made to do.

There’s something haunting in that. A figure that looks like us, moves like us, enacts decisions—but without emotion, without reflection. His purpose was protection. But his judgment? Unquestioning. Mechanical.

And so, the myth lingers—not merely as a tale of war and gods, but as a quiet echo of a question we still wrestle with: Can something built ever truly know what it means to guard? To harm? To choose?

In Jewish folklore, the Golem was not born of divine fire or celestial metal, but of humble earth—clay shaped by human hands, brought to life through ancient words and sacred intent.

The most enduring tale comes from 16th-century Prague, where Rabbi Judah Loew, a wise and righteous scholar, molded the Golem to defend his community against persecution. The creature stood tall and silent, carved with care, inscribed with the Hebrew word “Emet”—truth—on its forehead. This single word gave it life.

The Golem followed commands without question. It did not eat. It did not speak. It obeyed.
It patrolled the ghetto at night, a protector against injustice. But as the legend goes, the more it was used, the more it grew—stronger, yes, but also unpredictable. It began to misinterpret commands, act with unintended force, sometimes failing to distinguish between threat and innocence.

To deactivate it, the rabbi erased the first letter of “Emet”, leaving the word “met”—death. The Golem crumbled back into dust.

This was not a monster story. It was a warning draped in compassion—a reminder that when we breathe function into form, when we create something that moves but does not understand, we must also bear the weight of what it may become.

The Golem was not evil. It simply lacked the one thing clay could never hold: discernment.

Even the stories of Da Vinci’s mechanical knight—crafted not in myth, but in the margins of his notebooks—carry this same quiet, unsettling question.

Around 1495, Leonardo designed a humanoid figure: armored in the fashion of a medieval knight, rigged with an internal system of pulleys, gears, and cables. It could sit upright, raise its arms, turn its head, lift its visor. No spirit moved it. Only the elegant choreography of force and tension.

This was no mythic protector, no enchanted sentinel. It was an engineered idea.

Da Vinci, ever the anatomist, had studied the human body not only for art, but for understanding. He saw motion not as mystery, but as geometry made flesh. His mechanical knight was an echo of that belief—a body reimagined as a system. Muscles as levers. Tendons as ropes. The soul? Absent. And yet... it moved.

Why build such a thing? Some say it was a carnival marvel. Others believe it was a secret military design. But perhaps, more profoundly, it was an early meditation on the boundaries between nature and invention.

If we can build a body that behaves as though it lives, how far are we from building a mind that behaves as though it thinks?

Leonardo himself wrote, “The human foot is a masterpiece of engineering and a work of art.” He did not distinguish between beauty and mechanism. And in doing so, he bridged the gap between the mystical and the mechanical—between the Golem’s breath and the gear’s rotation.

Here, in his automaton, the myth begins to turn into method. This is the moment the old dream quietly starts to shift—from gods and golems, to designs and diagrams. From bronze and clay, to blueprints of cognition.

And though centuries would pass before silicon would hum with artificial thought, the idea was seeded here—in a machine that moved without life but carried the form of intention.
The Birth of the Machine Mind

Fast forward to the 19th century.

We move from myth to mathematics—from the imagined to the almost constructed.

Here we meet Charles Babbage, a man discontent with the errors of human calculation. He envisioned a mechanical engine—not to move wheels or mill grain—but to compute. His Analytical Engine, though never built in full, carried within it the essential organs of every computer to come: input, processing, memory, and output.

But the true leap was not in the machine. It was in the mind that saw beyond it.

Ada Lovelace, daughter of the poet Lord Byron, and a mathematician of rare insight, studied Babbage’s designs and imagined something no one else did. She understood the machine not merely as a calculator, but as something more abstract—a symbolic processor. In 1843, she wrote:

“The Analytical Engine might act upon other things besides number… were objects found whose mutual fundamental relations could be expressed by those of the abstract science of operations…”

She was speaking of music, of language, of thought itself.

In a world that still ran on steam and ink, she glimpsed a machine that might one day generate ideas, not just equations. She saw not only the mechanics of computing, but the potential for expression.

And isn’t that where artificial intelligence begins? Not with wires and code—but with the suggestion that a machine might do more than follow instructions—that it might, in some far future, transform them.

Lovelace saw this not as imitation, but possibility. That the mind itself—its structure, its logic, its rhythm—might one day be made legible to a machine.

She glimpsed AI long before we called it that. And more importantly, she asked the question we still return to:

If a machine can manipulate symbols… could it one day touch meaning?

Then came Alan Turing—a figure of silence and brilliance. A man whose genius unfolded not in spectacle, but in stillness. In the quiet clarity of questions that reshaped our understanding of thought.

Turing didn’t ask, “Can machines calculate?” That had already been shown—they could follow instructions, crunch numbers, execute precise routines. Instead, he asked something far more unsettling:

Can machines think?

But what does thinking mean? Is it solving a riddle? Learning from a mistake? Holding a belief? Feeling doubt?

Turing knew that to ask whether a machine can think was to first ask what we
mean when we use that word. And so, in 1950, he proposed a thought experiment—the Imitation Game, later known as the Turing Test.

It was simple on the surface: imagine a conversation between a human and a machine, conducted through written messages. If the human couldn’t reliably tell whether their partner was man or machine, could we say the machine was thinking?

But the test wasn’t meant to prove intelligence. It was meant to loosen our grip on narrow definitions. To ask: If something behaves as though it understands… do we still insist it doesn’t?

Turing didn’t give us AI. He gave us the frame in which AI could be imagined. He shifted the question away from how machines work, to how they appear—not from the inside, but from the outside, through behavior, through language, through response.

And quietly, he laid the first stone in a long philosophical corridor—one that still stretches before us today.

The Logic of Language and the Whispering of Code

Now let’s pivot.

If machines are to think, they must first understand. And if they are to understand, we must speak to them.

This is where programming languages enter. At their core, they are not so different from human languages: a means to instruct, describe, command.

But unlike us, machines understand only one thing: binary. 1s and 0s. Yes or no. Current or no current.

Imagine trying to build a symphony using only fireflies that blink twice or once. It’s possible—but painfully inefficient.

So, we created abstractions.

But what does that mean—an abstraction?

It means we stopped speaking to the machine in its native language—binary—and instead began building bridges. Layers of meaning stacked gently atop the yes-or-no logic of silicon.

The first of these was Assembly language. Still close to the machine, still requiring knowledge of memory addresses and registers—but slightly more human. Like replacing blinking fireflies with switches you could label.

Yet even that was cumbersome. So, we asked: Could we design a way to write instructions that looked more like human thought? What if we could say “do this five times” instead of juggling memory counters? What if we could say “if this happens, do that”—like how we reason in everyday life?

This gave rise to what we now call high-level programming languages.

But what is a high-level language? It’s a way of writing code that lets us
focus on what we want done—not how the machine does it.

Take FORTRAN—short for Formula Translation. It allowed scientists to write equations, loops, and logic in a way that mirrored their thinking, not the processor’s.

Then came LISP, born from symbolic logic, perfect for expressing relationships and nested reasoning—essential for early AI research. And C, with its balance of power and elegance, giving birth to the operating systems that still shape our digital world. And later, Python—simple, readable, almost like poetry—making it easier than ever to tell the machine what we mean.

But here’s the real question:

When we write code, what are we really doing?

Are we controlling the machine? Yes. But we’re also translating intent into instruction. We are turning ideas—fuzzy, fluid, and human—into sequences that are exact, executable, and mechanical.

Every programming language is a whisper to the machine—a way of saying, “This is what I want you to do. Not just once, but always. Without forgetting. Without error.”

But can a machine understand what it’s doing? Not quite. Not yet. It follows logic but does not reflect on it. It executes loops but does not wonder why they repeat.

And yet… when the language becomes rich enough, when the structure is layered just right—it can begin to simulate the shape of reasoning. It can solve puzzles. Recognize patterns. Learn rules. It begins to behave as though it understands.

And this is the whisper from which Artificial Intelligence begins to form.

Now, here’s where it becomes interesting:

What happens when the machine starts to respond in our language—not just execute it? What happens when the code begins to write back?

This is where AI and, more recently, Large Language Models (LLMs) begin to appear—not merely as tools, but as participants in the conversation.

The Interoperability of Thought

Modern AI systems like GPT, Claude, or Gemini are not programmed in the traditional sense. You don’t instruct them with lines like: if angry then apologize. Instead, you feed them experience—millions of books, dialogues, instructions—and they learn patterns from it.

But what does that mean—“patterns”?

It means they notice structure: Which words often follow others? Which phrases tend to occur in certain contexts? How does a question sound? What does an answer look like?

This is not logic in the old sense. It’s not deduction—it’s statistical intuition. Like someone who has read every novel but lived none, the model begins to
speak—convincingly, fluidly, even surprisingly.

Yet beneath that elegance still lies the foundation of code.

It may seem, on the surface, that the model is merely speaking—but in truth, it is being carried on the back of an invisible infrastructure.

Much of it begins with Python, a language not of serpents, but of simplicity. It’s the common tongue among developers—a way of telling the machine what to do without shouting. Within Python lie libraries—bundles of prewritten tools—that save time, reduce errors, and let the builder focus not on the hammer, but the house.

But language alone isn’t enough. Thought—especially artificial thought—demands scale.

So, the machine turns to tensors, strange multi-dimensional grids of numbers. If you imagine a spreadsheet stretched across many invisible directions—height, width, depth, and beyond—that’s a tensor. It lets the machine hold ideas, or fragments of them, in parallel.

And how does it hold so many thoughts at once?

Through GPUs—graphics cards that once rendered games and now render meaning. These are not thinkers themselves, but tireless workers, capable of performing thousands of calculations side by side. Where once we painted pixels, now we train models.

Then comes tokenization, a quieter process. It breaks language—not into words, but into smaller pieces: syllables, sub-words, or sometimes just letters. The model doesn’t read as we do. It builds meaning from fragments, like a poet rearranging broken glass into stained windows.

Each of these systems—Python, tensors, GPUs, tokenizers—seems unrelated at first glance. But stitched together, they form a silent symphony. And what we hear, when the music plays, is a voice. Not human. Not conscious. But startlingly close.

The interoperability between code and AI is now bidirectional.

• We code AI.
• And AI helps us write code.

This is not myth. This is a machine that does not understand yet behaves as if it does. And so, we come full circle—back to the mythic automaton, but now it's made not of bronze, but of data, models, and meaningful predictions.
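A small sketch may help make two of the pieces named above—tensors and tokenization—concrete. This is an illustration, not the paper’s code: it assumes PyTorch is available, and the toy splitting rule below is invented, far cruder than a real learned tokenizer.

python

import torch

# A tensor: a grid of numbers stretched across several directions
# (here 2 x 3 x 4 -- "height, width, depth")
t = torch.zeros(2, 3, 4)
print(t.shape)

# A toy "tokenizer": break language into smaller pieces.
# Real tokenizers learn their sub-word pieces from data;
# this crude rule just chops each word into chunks of three letters.
def toy_tokenize(text):
    return [word[i:i + 3] for word in text.split() for i in range(0, len(word), 3)]

print(toy_tokenize("The model reads fragments"))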
CHAPTER 3: FROM THOUGHT TO ACTION
SPEAKING MACHINE: HOW CODE BECOMES COMMAND

“The limits of my language mean the limits of my world.”
— Ludwig Wittgenstein, Austrian-British philosopher

In the previous chapters, if you’ve read with patience, you might have formed a soft silhouette—a blurred image—of what AI could be. Not a definition, but a reflection. And we also glanced at programming languages, briefly—like noticing a bridge in the mist without yet crossing it.

This chapter is where we walk across that bridge.

To speak to a machine is to accept an ancient tradeoff: the purity of thought exchanged for the precision of form.

Because machines do not listen in the way we imagine. They don’t hear your intention. They wait for instruction. And even then, not just any instruction—but one built from a set of rules they were designed to follow. This set has a name: the instruction set architecture—the core vocabulary of the machine.

But vocabulary alone does not make language. And language alone does not make meaning.

Even with an instruction set—those sacred verbs etched into silicon—we found ourselves standing in a kind of silence. Because what we wanted wasn’t just execution. We wanted expression.

Not just to control the machine, but to converse with it.

And so, we began the long, layered attempt to build a bridge—not of metal, but of metaphor. Each new layer of language we added was not merely a convenience—it was a way of seeing, a new grammar for thought itself.

We still built on that bedrock of absolute rules. Still spoke through circuits that only understand presence or absence.

But the questions we began to ask—the ones we folded into our code—were no longer about what the machine could do. They were about what we could imagine it doing.

And from that simplicity, complexity was born.

The world of the machine—unlike ours—is not flooded with infinite shades. It does not wake to the sound of language or gaze at meaning with wonder. It listens only to the rhythm of certainty: something is either there or it is not. A wire carries current, or it doesn’t.

And yet, somehow, from these blunt little facts, we wanted more.

More than electricity. More than switching lights on and off. We wanted memory. Logic. Movement. We wanted the
machine to do things—not just once, but repeatedly, conditionally, sometimes, always, until.

But how do you ask a stone to wait?

How do you teach metal to remember?

The early languages of machines were not really languages at all—not in the way we speak to each other. They were arrangements of commands. Signals written in binary. Raw and unrelenting. Every instruction had to be perfect. One wrong digit, and the entire meaning dissolved.

This is when a question arose—not theoretical, but practical.

Can we make the machine’s language a little more human?

Not because we wanted it to understand us—but because we couldn’t keep understanding it.

That’s how Assembly came to be.

Still close to the wire. Still whispering into the circuitry. But no longer in pure voltage. It was symbolic now. Each instruction—like MOV, or ADD, or JMP—stood for a sequence of binary actions the machine could still obey. But for us, it became… readable.

You might say Assembly was the first translator between man and machine.

But translation is not always enough. What if we wanted to describe—not just commands—but intentions? What if we wanted to write ideas?

“Check if this number is larger.”
“Do this for every item in the list.”
“If it rains, carry an umbrella.”

These are not mechanical thoughts. They are human ones.

And so, we invented high-level languages—abstractions that allowed us to express thoughts in something resembling our own terms. FORTRAN, C, Python—not merely tools, but new forms of thinking.

And yet, the machine still doesn't understand them.

So, what happens between the line of Python code you write and the flicker of output on your screen?

Behind the curtain, something begins to stir.

Your human-friendly instructions are transformed—first by compilers, then by interpreters, into machine-language once again. Like peeling layers from a fruit until all that’s left is the seed.

And inside that seed?

Binary.

Always binary.

So, the machine still speaks its native tongue. We’ve simply built enough bridges to walk across comfortably.

But perhaps the more interesting question is: when we built these languages to speak to machines, did we only shape a tool? Or did we shape a mirror?

After all, in choosing how to speak to machines, we began to formalize how we ourselves think.

A loop is a repetition. A function is a concept that can be reused. A variable is a placeholder for meaning.
And doesn’t that sound… familiar?

Isn’t that how our minds work, too?

So maybe the history of programming languages isn’t just about instructing machines. Maybe it’s also the history of clarifying thought.

From raw command… to structured logic… to language that starts to resemble ours.

And now, as we stand at the edge of AI that seems to understand—seems to speak back—it’s worth asking:

Did we teach the machine to think?

Or did we merely teach it to imitate the shape of our thoughts?

We will come to that.

But for now, let’s stay with the pipeline: idea, code, execution, result.

Because if we can understand how an idea travels through a machine, maybe we can begin to see where, along that path, the fog of opacity first begins to settle.

But say you have an idea.

Let’s not call it a grand idea. Just a simple one.

You want your computer to greet you. Nothing more. Just a small gesture of connection—"Hello, world," perhaps.

How does that idea become real?

You open a text editor. You write the line:

print("Hello, world!")

And then you press enter, or run, or execute.

And the machine responds. It speaks.

But let’s slow that down. Let’s not rush through the miracle.

Because if the goal of this chapter is to walk across the bridge, then this is the bridge itself: How does a thought become code? How does code become action? And how does action become result?

When you write a line in a high-level language like Python, you are writing in something deceptively close to English—but this language is not spoken, it is translated.

The machine does not read Python. It has never read English and never will.

So, the moment you write your line, a process begins—a quiet choreography, unfolding invisibly behind the screen.

First comes the interpreter (or compiler, depending on your language). Think of it like a diplomat, fluent in both your world and the machine’s. It reads your sentence, understands its intent, and begins to break it down—not just into simpler words, but into elemental symbols the machine can act upon.

This breakdown becomes bytecode or machine code—a set of instructions so specific, so literal, that not a single ambiguity remains.

Why such specificity?

Because machines do not infer. They do not guess. If you ask them to make tea, they will not know whether you meant green or black unless you tell them and tell them exactly.
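You can watch this breakdown happen. Python ships a standard-library module, dis, that prints the bytecode behind a piece of code. The snippet below is a minimal sketch, not the paper’s code: it disassembles the greeting into the literal instructions the interpreter executes.

python

import dis

def greet():
    print("Hello, world!")

# Show the bytecode the interpreter runs for this one-line function
dis.dis(greet)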
So, each word you wrote is disassembled. Each symbol is repurposed. Every abstract idea is flattened into something real, something physical: a command the processor can execute.

But even that isn’t enough.

The processor, the brain of the machine, does not act alone. It leans on memory, on storage, on operating systems. It uses registers, buses, and caches—terms you may not know, but systems you rely on every time you move your mouse.

And underneath it all, the logic gates open and close—those ancient switches of on and off, still pulsing silently with every operation.

The output?

Well, it may seem as simple as a string of text on a screen.

But to reach there, electricity had to flow through silicon valleys, carrying instructions once born as thought.

You think → You write → It compiles → It executes → It speaks.

A full cycle.

So now another question emerges—quietly, inevitably:

At which point in this pipeline does meaning exist?

Is it when you had the idea?

Is it when the code was written?

Or does meaning only arise when something responds?

You see, even in this simple action—printing a line—there’s a deeper mystery. We are not just converting language into action. We are converting intention into reaction.

And that’s the same mystery AI begins to complicate.

Because once machines start generating language in return… once they no longer simply follow code but seem to write it… the old pipeline becomes harder to trace.

But we’re not there yet.

Before we step into machines that speak back, we must ask: did the way we spoke to them shape the way we now expect them to speak to us?

We built languages to direct. We built structures to command. We made rules that were strict, predictable, and true.

And now, when faced with systems that generate, invent, or deviate, we call them opaque.

Not because they disobey the rules—but because they seem to write new ones.

Still, that question belongs to the edge of intelligence. We'll get there.

Here, we stay with the path that made all this possible: Idea becomes instruction. Instruction becomes signal. Signal becomes outcome.

But between each step… lies interpretation.

And in interpretation, we glimpse the first erosion of certainty.
Because in every system—no matter how precise—there's always a place where information becomes meaning, and where meaning might become… something else.

That space is small.

But it's wide enough for surprise. Wide enough, maybe, for thought.
CHAPTER 4: EMERGENCE
THE RISE OF BEHAVIOR BEYOND INSTRUCTION

“We shape our tools, and thereafter our tools shape us.”
— Marshall McLuhan, Canadian philosopher and media theorist

Let us speak in simpler terms. Not because the subject is simple—but because simplicity, when pursued honestly, often brings us closer to truth.

In the last chapter, we explored how programming languages were born—how they evolved from direct electrical impulses to abstract languages that humans could write, understand, and debug. We traced how ideas turn into instructions, how instructions become actions, and how every command passed through a precise and traceable logic.

Now, we turn to a stranger kind of machine behavior.

For most of our history with computers, we’ve lived with a clear contract: you speak, and the machine listens—so long as you speak precisely and follow the rules. You write code. The machine executes it. That was the arrangement.

The interaction wasn’t a dialogue. It was a declaration. You told the machine exactly what to do, and it did exactly that. No more, no less.

But then, something changed.

We began to build systems that don’t just follow instructions—they model patterns. They don’t just execute—they predict. They speak back.

So, let’s ask, carefully and honestly:

What’s the difference between writing code and building a model?

And more importantly: what happens to the idea of control—of understanding—when we move from one to the other?

When you write code, you are composing a score for a very literal musician.

Each line is a note. Each instruction a beat. The syntax must be exact, the structure deliberate. If you write:

for i in range(10):
    print(i)

The machine will loop ten times. No more, no less. You've built a ladder, and it climbs only as high as you've told it to. The program is faithful to its structure—rigid, precise.

This kind of programming is deterministic. That’s a word worth pausing on.

Deterministic—it means the same inputs will always yield the same outputs. Like turning a key in a well-built lock. The result is fixed, and if it doesn’t work, the fault is traceable. You follow the flow, and somewhere you’ll find the break.

Now let’s switch scenes.

To build a modern AI system—say, a large language model—you don’t begin by
telling it what to do, line by line. Instead, you gather data. Mountains of it. Text from books, websites, conversations. Billions of examples of how humans speak, write, explain, argue, dream.

And then, you begin to train.

You’re no longer programming individual steps. You’re adjusting weights—tiny numerical values inside a network of connections that resembles, vaguely, the structure of the human brain. This network is called a neural network, though it’s made not of neurons, but of math.

Imagine teaching a child not by telling them, "This is a dog," but by showing them ten million images of dogs and saying, "Figure it out."

That’s closer to how we build these systems.

Once trained, the model doesn’t just follow logic—it predicts. It sees your question and guesses, based on everything it has absorbed, what should come next. Not deterministically, but probabilistically. Meaning: it offers the most likely next word, the most likely continuation, given all it has seen before.

Same input, different outputs? Sometimes, yes.

And here, the old clarity begins to blur.

You see, traditional code is transparent by design. If something goes wrong, you debug. You trace the logic back to its origin.

But a model? A model is opaque.

Not because it is mysterious in the magical sense—but because the logic it follows is distributed across millions—sometimes billions—of parameters. You can inspect them, yes. But understanding them as a whole? That’s like reading an entire city’s worth of light bulbs and claiming you now understand the skyline.

The tools are different, too.

To write code, you use languages—Python, C++, JavaScript. Each one with its own grammar, but all bound by formal structure.

To build models, you use frameworks—TensorFlow, PyTorch.

These don’t just define behavior—they sculpt architecture.

import torch
import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.layer1 = nn.Linear(10, 5)
        self.relu = nn.ReLU()
        self.output = nn.Linear(5, 1)

    def forward(self, x):
        x = self.layer1(x)
        x = self.relu(x)
        return self.output(x)

You are no longer writing instructions. You are designing the shape of a learning system, and letting it fill in the details itself. That’s not engineering in the classical sense. It’s something closer to ecosystem design.
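To make the earlier point concrete—same input, sometimes different outputs—here is a minimal sketch, not the paper’s code: the logits are invented for illustration, and PyTorch is assumed. A model’s output is a probability distribution over possible next words, and sampling from it twice need not give the same answer.

python

import torch

# Invented logits a model might assign to four candidate next words
vocab = ["cat", "dog", "bird", "fish"]
logits = torch.tensor([2.0, 1.5, 0.3, 0.1])

# Softmax turns raw scores into a probability distribution
probs = torch.softmax(logits, dim=0)

# Sampling the "next word" twice from the same distribution
# can yield different results: same input, different outputs.
for _ in range(2):
    idx = torch.multinomial(probs, num_samples=1).item()
    print(vocab[idx])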
You might wonder—what does it mean to train a model? What happens in that hidden terrain between data and intelligence?

Let’s sketch the outline.

You begin with a structure—an architecture. This is the skeleton of the model, a vast lattice of potential connections, like an unfinished circuit board or the framing of a city before roads and buildings appear. Tools like TensorFlow and PyTorch help you define this blueprint. They let you declare how many layers your network will have, how each node connects to the next, what kind of pathways information will follow.

import torch.optim as optim

model = SimpleModel()
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Dummy training loop
for epoch in range(100):
    inputs = torch.randn(16, 10)  # batch of 16 samples, each with 10 features
    targets = torch.randn(16, 1)

    outputs = model(inputs)
    loss = criterion(outputs, targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Then comes the data.

Imagine pouring water through a sponge again and again. Each pass reshapes it ever so slightly. That’s what training is. You feed the model a sentence, it predicts the next word. If it gets it right, it is rewarded—subtly, mathematically. If it gets it wrong, the system gently adjusts the weights, those numerical dials that determine how much importance one part of the network places on another.

Over millions of examples and billions of corrections, the sponge becomes sculpted. The network tunes itself to patterns—not because it was told the rules, but because it found them.

This process is called backpropagation—a term that sounds more arcane than it is. All it means is that after each guess, the error is measured and sent backwards through the network, adjusting each connection along the way. A ripple of self-correction, like a child learning to balance by falling slightly forward and slightly back, until one day… they walk.

And all of this—the architecture, the flow of data, the corrections, the scale—is made possible by tools. By CUDA-enabled GPUs, which accelerate training. By optimization algorithms like Adam and SGD, which shape how quickly the model learns. By loss functions, which define what it even means to be wrong.

In programming, you write to instruct. In model-building, you construct a world in which learning becomes possible.

So, what is the machine listening to now?

Before, it heard code. Now, it hears data.

Before, you handed it logic. Now, you hand it experience.
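That “ripple of self-correction” can be seen at its smallest possible scale. The sketch below is an illustration under simplified assumptions—one weight, one input, one target—not the paper’s code: a guess, a measured error, and a nudge in the opposite direction of the gradient.

python

import torch

# The smallest possible "network": one weight, one input, one target
w = torch.tensor(0.5, requires_grad=True)
x = torch.tensor(2.0)
target = torch.tensor(3.0)

prediction = w * x                 # forward pass: the guess
loss = (prediction - target) ** 2  # how wrong the guess was

loss.backward()                    # send the error backwards (backpropagation)

with torch.no_grad():
    w -= 0.1 * w.grad              # adjust the weight against the error

print(w.item())  # the weight has moved toward a better guess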
And when it speaks back, it does so not from a single rulebook—but from an echo of everything it has seen.

Which brings us to the final question:

Is building an AI—training a model to generate code or text or speech—fundamentally different from programming a computer?

Or is it the next form of programming?

Because in both cases, we are shaping behavior.

In one, through direct command. In the other, through exposure and adaptation.

One is explicit. The other, emergent.

But both are about guiding a machine toward an outcome—through language, through logic, through pattern.

So maybe, just maybe, the shift is not a replacement… but a reflection.

The machine used to mirror our commands.

Now, it begins to mirror our complexity.

And when that mirror speaks back—it doesn’t just echo.

It bends the sound. It reshapes the question. It answers in forms we didn’t teach it to know.

What emerges is not a reflection, but a response—not a repetition, but a resonance.

We are no longer programming machines.

We are seeding conditions for intelligence to unfold.

Not in straight lines. Not by command. But by convergence, by pressure, by pattern.

And in that unfolding—between code and comprehension—we hear it:

The first hum of something else. Not quite us. Not quite machine.

But emergent.
CHAPTER 5: RESIDUE
WHAT’S LEFT AFTER LEARNING IS NOT WHAT WAS

“We do not see things as they are; we see them as we are.”
— Anaïs Nin, French-born American diarist, essayist, novelist

There’s a strange comfort in patterns. A rhythm. A recognizable beat to the noise.

Last chapter, we explored how the machine went from listening to our code… to echoing our complexity. But what exactly is it echoing? And how does it learn what to echo?

Let’s go further now—deeper into the machinery. Not the visible wires and circuits, but the silent transformations beneath: where learning becomes compression, and compression… a form of forgetting.

But if you have come this far you must know I love going way back. So, we will begin not with machines, but with something more ancient.

A jellyfish pulsing in shallow tidewater. A single-cell amoeba drifting through salt and sunlight. A newborn child blinking at her first glimpse of light.

All of them, in their own way, are learning.

Not from books, not from language—but from contact. From the push and pull of the world against their form.

A jellyfish doesn't need to think to know the sting of danger. An amoeba doesn't reason, but it retracts when the chemical gradients shift. And the child, long before she understands words, learns to turn her head toward her mother’s voice.

This is learning before language. Learning without thought. It is the body’s way of noticing patterns—of linking cause and effect, sensation and response.

At the heart of it are not ideas, but pulses. Neurons firing in waves. Synapses strengthening with repetition, fading without it.

In animals, this becomes the brain. In simpler life, it may be no more than a feedback loop. But whether soft or complex, it is all the same impulse: Something happened. We reacted. Next time, we want to react better.

And this, at its root, is what learning is.

Not memory. Not intelligence. But the tuning of response through experience.

It is nature’s way of compressing survival into form.

That compression—of input into pattern, of chaos into signal—isn’t just what brains do. It’s what allows anything to adapt. It’s how life becomes fitted to the world around it.

So only now, standing on this long biological thread, can we ask:
What does it mean for a machine to learn?

Because the machine, too, must respond to the world. It must observe, react, adapt—without ever truly knowing what it means to survive.

A common answer might be: “It finds patterns in data.” Technically true. But that’s like saying a book is just ink on paper. It explains the form, not the feeling. It misses the act of understanding itself.

So, let’s go deeper.

What does it mean to find a pattern? And why must learning always shrink the world?

For a machine, to learn is to compress. To reduce. To take the endless sprawl of input—images, sentences, movements, pixels—and carve from it something tighter. A smaller shape that still captures the essence. Not everything. Just enough.

This idea isn’t new.

In the language of information theory, we call it entropy—a measure of surprise. The more unpredictable something is, the higher its entropy. Learning, then, becomes a process of taming this surprise. Turning chaos into familiarity. Building structure out of noise.

But compression—like all structure—comes at a cost.

Not an obvious one. Not a crash or a failure. But a quiet erosion of detail. To simplify is to decide. And every decision is a filter.

The machine doesn’t ask what to keep. We train it to know. We tune its loss functions, its parameters, its architectures. And in doing so, we teach it: this is important; that is not.

It keeps what helps it guess right. What makes it accurate. But what it loses may not be noise. Sometimes, it’s texture. Sometimes, it’s outliers. Sometimes, it’s the rarest, most meaningful thing.

So yes—compression gives the model a map. But it may not be the terrain.

And the tighter the map becomes, the easier it is to navigate. But also, the easier it is to mistake for the real thing.

Imagine you’re given a thousand photos of apples. Red ones, green ones, bruised, perfect, sliced, rotten. And you’re asked: What is an apple?

You begin to discard detail. Ignore the scratches. Dismiss the lighting. You build a mental compression—a summary—of “apple-ness.”

That’s what a model does.

It sheds the specifics in favor of the essential. Or what it thinks is essential.
But here’s the catch: In shedding, it loses. And what it loses… may be the very thing that made the original meaningful.

Let’s illustrate this idea—not just in metaphor, but in code.

Suppose we wanted to build a tiny language model. Nothing grand. Just a whisper of intelligence.

python

import torch
import torch.nn as nn

# A tiny vocabulary: just a few tokens
vocab = {"hello": 0, "world": 1, "goodbye": 2}
vocab_size = len(vocab)

# Simple embeddings: turning words into vectors
embedding = nn.Embedding(num_embeddings=vocab_size, embedding_dim=4)

# A very basic model: one linear layer
model = nn.Sequential(
    embedding,
    nn.Flatten(),
    nn.Linear(4, vocab_size)
)

# Input: the token for "hello"
input_token = torch.tensor([vocab["hello"]])

# Forward pass: what does the model predict?
output = model(input_token)
print(output)

Now pause.

What’s happening here?

We’ve taken a word—“hello”—and turned it into a number. Then into a vector. Then passed it through a layer that guesses what might come next.

This is learning through compression. The embedding layer shrinks each word into a small vector—a few floating-point numbers. It captures something about “hello”… but only what fits into four dimensions.

Everything else? Gone. Discarded. Compressed.

And yet, somehow, it still works. Not because it preserves the world—but because it preserves enough of it to predict.

You might now ask: If compression is how models learn, then is forgetting inevitable?

Yes.

To remember everything is to learn nothing useful. To compress is to choose what to remember.

And by choosing, we reveal something strange:

The model is not a mirror of the world. It is a mirror of our choices.

We’ve come far.

From the rigidity of code to the fluidity of models. From explicit commands to emergent behavior. From logic… to compression.
But let’s not mistake the map for the terrain.

A model may reflect intelligence. It may generate, imitate, even create.

But what it understands—if it understands at all—is only what survives the squeeze.

So, what survives the squeeze?

Let’s consider a simple sentence:

“The cat sat on the mat.”

To you and me, this is ordinary. But to a model, it is a compression opportunity. Each word becomes a token—an atom of meaning. The phrase becomes a pattern in a sea of millions. The model doesn’t store the sentence. It doesn’t even remember the cat or the mat. What it keeps is statistical: that "cat" often comes after "the," that "sat" is likely to follow "cat," and that "mat" is a common surface beneath small mammals.

The story is lost. The shape remains.

This is compression at work—not as deletion, but as distillation.

We see this in code as well.

Let’s take a basic example:

python

import numpy as np

def compress_data(data):
    mean = np.mean(data)
    return [1 if x > mean else 0 for x in data]

sample = [2, 4, 6, 8, 10]
print(compress_data(sample))  # Output: [0, 0, 0, 1, 1]

This isn’t machine learning, not yet—but it’s the first gesture.

We’ve taken a stream of values and reduced it to something simpler: just a comparison to the average. In that compression, we’ve lost detail—no more exact numbers, no sense of spacing, no texture of the original data. But we’ve gained something too: a pattern, a signal, a shape.

The machine does this on a scale we can barely grasp. Thousands of dimensions. Billions of values. Every word it reads, every phrase it echoes, is not remembered as it was—but as how it shaped the space.

So, let’s ask:

When you compress the world into numbers, what do you lose?

And what do you accidentally reveal?

Because compression is not neutral.

It is biased toward frequency. Toward pattern. Toward what fits.

It loses the rare. The strange. The voice that speaks once in a thousand years.

In language models, this becomes visible.

Ask it to write a poem, and you get the echo of many poems. Ask it to describe a person, and you get the average of many lives. Ask it to imagine, and it imitates imagination.

This is the gift—and the ghost—of compression.
But let’s not stop here.

Let’s go deeper into the machinery. What does the model keep, if not the sentence itself?

It keeps representations.

Representations are like shadows of meaning in a room full of light. Not the object itself, but the outline it casts.

In machine learning, we call these embeddings.

Let’s see what that might look like:

python

from sklearn.feature_extraction.text import TfidfVectorizer

documents = ["The cat sat on the mat", "The dog barked at the cat"]
vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform(documents)

print(vectors.toarray())

What you get is not language—it’s a matrix.

Rows of floating-point numbers that capture, not what was said, but how it was said relative to other things. TF-IDF is a primitive form of embedding. It tells you which words are common, which are unique, and how much they matter.

Modern models go further. They use transformer architectures. Self-attention. Layers upon layers of representation-building.

But the spirit remains the same:

Strip the world of its noise. Keep the useful signal. Represent it compactly.

And here, a strange paradox emerges.

The more we compress, the more we seem to understand.

But also—the more we lose the original.

We no longer know why the model speaks as it does. We only know that the pattern fits.

And so, the opacity deepens.

A machine does not begin with knowledge. It begins in silence. In surprise.

Every word it sees is noise. Every token—an unexpected guest. There are no patterns yet. Just a stream of symbols, arriving one after the other with no rhythm, no reason.

It’s like walking into a room where everyone speaks a language you’ve never heard before. You don’t know what to expect. You don’t even know how to expect.

And this—this raw unpredictability—has a name.

In information theory, we call it entropy: a measure of surprise, of disorder, of how little you know about what comes next.

In the language of information theory, entropy is not disorder for its own sake. It is uncertainty. Surprise. The amount of information you don’t yet know.
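Before the formula arrives, a tiny sketch (standard library only; the coins are invented examples, not from the paper) shows what this “surprise” measures: a fair coin is maximally uncertain, a loaded one barely surprising.

python

import math

def entropy(probs):
    # H(X) = -sum of p(x) * log2 p(x)
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair_coin = [0.5, 0.5]
loaded_coin = [0.99, 0.01]

print(entropy(fair_coin))    # 1.0 bit: maximally unpredictable
print(entropy(loaded_coin))  # ~0.08 bits: almost no surprise left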
Claude Shannon—who first carved language into numbers—showed that when you can’t predict what comes next, you need more bits to describe it. More storage. More memory. More of everything.

The formula is simple:

H(X) = − Σ_x p(x) log₂ p(x)

Where p(x) is the probability of each outcome.

If something happens often, p(x) is high, and the surprise is low. But if something is rare—if it hits you like a sharp wind in a warm room—then the surprise is greater. And so is the entropy.

Now imagine teaching a model to finish your sentence.

At first, it guesses wildly. The entropy is sky-high.

But each correction—each training step—shrinks that space of possibility. Patterns begin to form. Probabilities sharpen. The model no longer sees a blank sky—it sees weather. Structure. A likely future.

This is compression.

It’s not just storage efficiency. It’s intelligence taking shape.

The goal of training isn’t to remember everything. It’s to reduce surprise. To reshape uncertainty into prediction.

In machines, this happens through something called cross-entropy loss. It measures how far off a guess is from the truth. If the model expects “dog,” but the answer is “cat,” the loss is high. If it gets close, the loss shrinks.

CrossEntropy(p, q) = − Σ_x p(x) log q(x)

Where p(x) is the true distribution, and q(x) is the model’s guess.

The smaller this number, the better the model is at anticipating what comes next.

But here’s the hidden truth: learning always leaves something behind.

Because compression is not neutral. It favors what repeats. What aligns. What averages out.

That’s how a child forgets most of what she hears—and still learns to speak.

That’s how a model forgets most of what it sees—and still learns to write.

That forgetting is not a flaw. It is the mechanism by which intelligence survives.

We often think that intelligence comes from perfect memory. That the more a model remembers, the smarter it becomes.

But that’s not how learning works.

Not for us. Not for them.

Take a child learning to speak. She doesn’t memorize every sentence she hears. She forgets most of them. But in that forgetting, something begins to form: structure. Rhythm. Grammar. She learns not by remembering everything, but by compressing what matters.

This is entropy in action.

In simple terms, entropy is just uncertainty. Chaos. The messiness of raw experience. Every time we try to learn something,
we’re trying to tame that chaos—to turn print(vector)
the unpredictable into the expected.

A language model does the same. At first, This vector doesn't store the word “hot.” It
it knows nothing. Every word it sees is a encodes some aspect of it—a position in
surprise. Entropy is high. space, relative to “cold” or “ice” or
“warm.” These numbers aren’t the word.
But slowly, it learns. They’re the compressed shadow of the
word, shaped by context.
It realizes that “the” is often followed by a
noun. That “I am” tends to be followed by And here’s the twist: that shadow is what
a feeling or a verb. That “once upon a the model uses to reason.
time” leads into a story.
It doesn’t remember sentences. It doesn’t
It compresses the language. even remember words.

And this compression—this reduction of It remembers relationships. Geometry.


surprise—is not just a side effect. It is the Distances. Shapes in a high-dimensional
learning. space.

Let’s make this real. This is lossy learning.

Here’s a toy example. Not of a giant The more data you feed it, the more it
transformer model, but a whisper of the distills. The more it compresses. And in
same logic. doing so, it forgets most of what it sees.

python But something strange happens next.

import torch By forgetting, the model becomes more


import torch.nn as nn flexible. More generative. It stops
repeating. It starts creating.
# Suppose we have a tiny vocab: "hot",
"cold", "warm", "ice" Just like you.
vocab = {"hot": 0, "cold": 1, "warm": 2, "ice":
3} You don’t remember every apple you’ve
vocab_size = len(vocab) seen. But you know what an apple is. You
can draw one. Describe one. Even
# Create a tiny embedding: each word imagine a glowing blue apple made of
becomes a 2D vector glass. Not because you’ve seen it—but
embedding = because your brain compressed apple-
nn.Embedding(num_embeddings=vocab ness so well, it can now recombine it.
_size, embedding_dim=2)
That’s what the model does.

# Look at the embedding for "hot" Lossy learning gives birth to creativity—not
input_token = torch.tensor([vocab["hot"]]) through preservation, but through
vector = embedding(input_token) abstraction.

24
It forgets precisely. It forgets intentionally. "hot"
It keeps just enough to guess what might output = model(input_token) # logits
come next. target = torch.tensor([vocab["warm"]])

But here’s where things get tricky. loss = nn.functional.cross_entropy(output,


target)
If the model forgets too much, it becomes
loss.backward()
generic. Bland. Averages of averages. It
optimizer.step()
writes poems that sound like poems but
say nothing real.
This is it. This is learning.
If it forgets too little, it becomes brittle. It
overfits. It memorizes training data and A word. A guess. A correction.
fails to generalize.
And beneath that, entropy being
So, the art—yes, art—of training a model is reduced. Noise being tamed. Information
finding that balance. being compressed into weights and
vectors and layers.
Forget just enough.
But remember:
Compress wisely.
Compression is not neutral.
And hope that what remains is not only
useful—but generative. It favors the common. The repeated. The
expected.
Let’s look deeper at how this is done.
Which means it may miss the rare. The
The core learning signal in most models is
strange. The truly original.
a loss function. In language models, that’s
usually cross-entropy loss. The things we only say once.

This loss compares the predicted And that’s the paradox: by compressing
distribution (what the model thinks comes language, we teach the model how to
next) with the true distribution (what speak. But we also risk teaching it how not
actually comes next). The bigger the to listen.
surprise, the higher the loss.
So, the question isn’t just how a model
And every step of training is just this: learns.
reducing surprise.
It’s what it forgets along the way.
python
We’ve seen how language models don’t
# A simple training step memorize—they compress. They forget
optimizer = most of what they’re shown and only
torch.optim.Adam(model.parameters(), retain patterns. It’s a kind of intelligent
lr=0.01) forgetting.

# Assume model predicts "warm" after

25
But forgetting, no matter how wise, comes It guesses. Smoothly. Elegantly. Often
with a cost. convincingly.

That cost is hallucination. But sometimes, it guesses wrong.

So, what is hallucination in a language This is hallucination.


model?
But hallucination isn’t a bug in the
It’s when the model generates something system—it’s a consequence of
that sounds plausible but isn’t true. It may compression.
quote a study that doesn’t exist, invent a
source, misattribute a fact, or give Let’s go back to our toy embedding
directions to a café that was never there. model. It learns word relationships—not
facts. Let’s extend it:
To understand why this happens, let’s look
again at how a model responds to a Let's say we fine-tune it on phrases like:
prompt. "Isaac Newton discovered gravity."
"Einstein developed relativity."
Say you ask: "Darwin proposed evolution."
“Who discovered the Moon’s effect on
tides?” But what happens if we ask:
"Who discovered quantum gravity?"
The model has no database. No list of
correct answers. What it has is a massive The model might never have seen this
web of associations—words that tend to phrase.
follow other words. It might predict: "Einstein" or "Newton"—
based on statistical proximity.
It doesn’t know the answer. It predicts the
most statistically likely next token based
# Even though neither is correct.
on its training.

And here’s where lossy learning shows its


The model isn't lying. It’s extrapolating—
face.
from compressed shadows.
Because in training, that information—the
This is the hidden trade-off in LLMs:
exact name of the discoverer, the
historical nuances, the multiple Compression gives generalization.
competing theories—may not have
appeared often. It may not have Generalization gives fluency.
appeared at all. Or if it did, it may have
been surrounded by noise, ambiguity, Fluency masks uncertainty.
contradiction.
The model sounds confident—even when
So, the model does what it was trained to it's wrong. Because confidence, in
do: language models, is measured by
It fills in the blanks. coherence, not correctness.

And here's the deeper danger:

26
A hallucination is only obvious when we speaking?
know the answer. When to say: “I don’t know.”

But what about when we don’t? But silence, too, must be learned.

We humans often mistake fluency for A model that speaks without pause risks
truth. If something sounds right, we mistaking fluency for truth. And yet, to
assume it is right. That’s how rumors hesitate—to admit uncertainty—requires
spread. And now, that's how models a kind of intelligence we have not yet
mislead. mastered. Not in silicon. Not always in
ourselves.
Not intentionally.
Because to say “I don’t know” is not
But inevitably. failure.

This brings us to the ethical boundary of It is compression resisting overreach.


lossy learning. It is entropy, uncollapsed.
It is the machine—finally—learning to
When a model hallucinates, it’s not
listen.
malfunctioning. It’s performing exactly as
it was trained—compressing chaos into
patterns, then projecting those patterns
forward into uncertain futures.

In other words, the model doesn’t


hallucinate because it’s flawed.
It hallucinates because it’s human-like.

We do the same. We misremember


stories. We fill in gaps. We believe things
that never happened, because they feel
right.

But we have something the model


doesn’t: a sense of doubt.
A pause. A flicker of “maybe I’m wrong.”

That’s what’s missing in LLMs.

They’re confident pattern-matchers


without a conscience of truth.

Unless we build one in.

So, as we leave this chapter, the question


lingers—not “how do models
hallucinate?”
But: how can they learn when to stop

27
This is the illusion: not that the model is
CHAPTER 6: wrong, but that it seems so right.

ILLUSIONS It is here — in this delicate seam between


pattern and perception — that
hallucination begins.
THE CLEARER THE SHAPE, THE
FURTHER FROM TRUTH Not as a glitch. Not as failure. But as
performance.
“We are what we pretend to be, so we
must be careful about what we pretend The model performs intelligence the way
to be.” an actor performs a king — with the
— Kurt Vonnegut bearing, the tone, the crown, but not the
burden of rule. It speaks not from
In the previous chapter, we followed the
knowledge, but from the echo of
machine’s first great hunger: to compress.
knowledge. It mimics the syntax of truth
It learned not by understanding, but by
while hollowing out its substance. The
shrinking. Every word became a signal,
words arrive in order. The ideas connect.
every signal a weight. And in that
And still, something essential is missing.
compression — that careful folding of the
world into something small enough to What is absent is intent — the anchor of
store — it began to dream. meaning.

Compression was not merely a trick of There is a certain kind of confidence that
storage. It was an act of forgetting. To comes only from ignorance. You ask a
represent the whole, the machine had to question and the machine answers, not
abandon the parts. What remained was with hesitation, but with the assured
not the world, but a likeness of it — the fluency of someone who knows. It speaks
way a shadow remembers the shape of a not in probabilities, but in proclamations. It
body, but not its warmth. is sure. And it is wrong.

And yet, this likeness began to speak. We call this hallucination.

At first, it whispered with surprising But that word is a trick. It assumes there is
coherence. Then with startling fluency. a truth the model has merely failed to see
And soon, with such precision that we — that a reality exists, and the model has
mistook its speech for understanding. strayed. Yet, the machine never knew the
truth to begin with. It was not taught facts,
But coherence is not truth.
but patterns. Not the world but echoes of
What the model gives us is not what is, but it. It is not recalling. It is inventing — with
what fits. Its world is not built from the confidence of compression.
meaning but from proximity. Not logic, but
Consider the now-infamous episode in a
likelihood. A token follows a token, not
New York court where ChatGPT, in
because it should, but because it often
helping draft a legal brief, confidently
does.
cited cases that did not exist — entire
lawsuits fabricated with names, judges,

28
and quotes stitched together from its the hallucination back toward reality. But
training. The lawyer, unaware, submitted the core remains unchanged. A
them to a judge. The model had not language model does not know. It only
malfunctioned. It had simply done what it continues.
was built to do: complete the shape.
And in that continuation, the model
A hallucination, yes — but only because begins to improvise. Like a jazz musician
we asked it to remember what it never who forgets the sheet music but keeps
truly saw. playing — and somehow convinces the
room that this, too, was intended.
So where does this fluency come from?
Here lies the deeper question: can the
The model does not store the world. It model know it is hallucinating?
stores its patterns. It learns to fill in blanks,
to extend sentences, to make the next Interpretability research has begun to
token feel inevitable. In this way, it is a peel back the layers. Some patterns of
master of plausibility — and plausibility, hallucination are visible in the model’s
uncoupled from reality, is a powerful activations — clusters of neurons lighting
illusion. up for names that don’t exist, paths of
probability bending toward fiction. But
This is the cost of compression. these are faint signatures, buried in
millions of parameters. We see the
When a model like GPT compresses
glimmer of error, but not the origin.
human knowledge, it loses detail the
same way a JPEG blurs the edge of a Is the hallucination born in the data? The
face. The sharper the compression, the architecture? The sampling method?
more the model must guess. It does not Temperature settings shift the fluidity of
hallucinate because it is broken. It imagination — low values anchor the
hallucinates because it is working. model to its memory; high values free it to
dream. But dreams, too, are illusions with
What we call hallucination is the shadow
coherence.
cast by entropy. Shannon once told us:
the more surprising the message, the In a sense, all language is hallucination.
more information it carries. But the model When we speak of love, justice, beauty —
learns to reduce surprise — to make what are we not gesturing toward invisible
comes next expected. In doing so, it things? Are we not, like the model, trying
narrows the world. The unexpected to name patterns that do not reside in the
becomes unthinkable. The unthinkable physical world, but in some shared
becomes unspoken. simulation we call meaning?

Some have tried to close this gap. A model that hallucinates is a mirror. It
Retrieval-Augmented Generation (RAG) reflects how easily our minds believe what
attaches memory to the model’s sounds right. It reflects the human
wandering mind — letting it fetch truth tendency to trust fluency as truth,
when its predictions stray too far from confidence as correctness.
fact. Others fine-tune their models on
curated, verified truths, hoping to bend

29
It is not just a technical error. It is a But where did this understanding come
philosophical one. from? We didn’t program it in. There is no
function labeled “solve algebra” or
And so, the question lingers, heavier now “debate Kant.” There is no master switch
than before: for logic.
If a machine says something false, but we
believe it, where does the hallucination There is only prediction.
live?
In the model, or in us? Just the next word. Again, and again. And
again.
The model does not want. It cannot verify.
It cannot remember what it meant, or So how did something more emerge?
even that it meant anything at all.
The answer lies in scale.
And still, its certainty is unearned. Its
As these models swell — from millions of
confidence, automatic. The model says
parameters to billions, then trillions —
“Napoleon was born in Berlin” not
something begins to shift. New behaviors
because it believes it, but because Berlin
appear. Not because we told the model
followed the patterns that once trailed
how to do them, but because the
Napoleon in the data. Truth, to the model,
structure itself becomes rich enough to
is a matter of adjacency. And falsehood
accidentally contain them.
wears the same shape.
Like weather in the sky, these patterns
In this sense, hallucination is not a rupture
were not programmed.
in the system.
They were summoned.
It is the system, stretched to its natural
conclusion.
This is what researchers now call
It is what happens when a machine emergent behavior — capabilities that
trained to continue cannot bear to stop. manifest only when the model reaches a
certain size, trained on a certain volume
It must say something.
of diverse data. Before that point, they
And so, it does.
simply don’t exist. After it, they seem to
But something stranger begins to happen bloom into view.
as these models grow.
The machine did not learn logic the way
They no longer just complete your a student might — rule by rule.
sentences — they begin to solve your It grew something that behaves like logic.
problems. You give them a math riddle, An echo of it. A ghost.
and they answer. You ask for a poem in
This is emergence.
the style of Neruda, and they write one.
You request a line of code, and they And it is unnerving.
compose it, often correctly.
Because it suggests that intelligence — or
And somewhere in all this, a peculiar something adjacent to it — is not always
sensation arises — the model seems to designed.
understand. It can be accidental.

30
A statistical side effect. In the beginning, the math was simple.
A byproduct of scale. You had a function, a loss, a set of
weights. You adjusted those weights to
Like a whirlpool in a river. No one placed it make predictions better — closer to what
there. The flow simply turned, the currents a human might have said next. That’s all.
collided, and there it was — spinning with
shape and force, obeying no one’s If the true sequence was
command. The apple fell from the __,
the model might guess sky or cloud, or
We build the flow. roof, but you wanted it to say tree. So, you
But the whirlpools emerge. punished the others. Rewarded tree.
Shifted the parameters slightly.
Some solve puzzles.
Some mimic empathy. And this continued. Billions of times.
Some lie.
Some hallucinate truths that never were. Each time, the error was calculated — a
simple difference between what was
And we — the builders — stand back and expected and what was produced.
watch, unsure of what exactly we’ve A single number. Like:
made.
𝓛 = − log 𝑝 (𝑡𝑟𝑒𝑒)
Because the behavior was not coded.
It surfaced. A log loss. A whisper from the future,
It appeared. telling the model how far it had strayed.

And once it appears, it becomes very This process is called gradient descent.
hard to make it go away. But it’s not the math that matters. It’s
what the math enables.
You cannot delete a whirlpool.
You can only change the river. Because each nudge, each whisper of
error, bends the model slightly toward the
Somewhere in the noise, intelligence world. It reshapes the surface of its
emerges — not as truth, but performance. thinking.
And perhaps, if we listen closely, we’ll find
that hallucination was never a flaw, And over time, the model learns not just
but the point. the words —
but the shapes of meaning.
A fiction so convincing, we mistook it for The flows. The curves. The echoes of
thought. thought.

They said it was just a model. A machine Somewhere in this sea of floating
for predicting words. A calculator with a numbers, a structure begins to form — not
flair for syntax. imposed, but emergent. As if
understanding was not coded but
But something moved when it got large
crystallized.
enough.
It began to do things no one expected. A phase shift.
Like water turning to ice.
31
One day the model simply begins to Even in humans, we trust performance.
translate. We judge a mind not by its structure, but
It begins to reason. by its style.
It begins to explain jokes.
So what happens when a machine learns
These are not programmed capabilities. to mimic that style?
They are emergent properties —
functions that rise from a system once it We trained it to predict words. But
reaches a critical point of complexity. prediction, when done well enough,
begins to look like understanding. When a
Mathematicians have a term for this: model responds with wit or empathy, we
nonlinear interaction. see more than syntax — we see intent.
We project a soul.
It means the whole is not the sum of its
parts. It means that when many simple This is the hallucination of intelligence.
things touch in just the right way, they can
give birth to something new. Not because the machine believes
anything — it doesn’t.
Something more. Not because it wants to be clever — it
doesn’t even want.
But because its performance resembles
the shapes we associate with thinking.
So, we begin to see a strange truth
appear: This resemblance can be profound.
We did not build intelligence. It can answer your questions.
We built the conditions under which It can write poetry.
intelligence might arise. It can debug code.
It can imitate you.
We shaped the riverbed.
And watched the currents swirl. But ask it why.
Ask it what it means.
Sometimes into comprehension.
Ask it who it is.
Sometimes into confusion.
Sometimes into something we do not And you get silence, wrapped in
have words for yet. eloquence.

If it looks like a duck, swims like a duck,


and quacks like a duck… then it probably
is a duck.” Mathematically, this behavior arises not
— But what if it's just a mirror that learned from deep thought, but from optimization
to reflect ducks? over enormous surfaces.

There’s a curious thing about intelligence. The function a model truly learns is not
We do not test it by opening the mind. truth. It’s not even accuracy.
We test it by watching what it does. It’s plausibility.

A clever answer. A well-timed pause. A When we train on massive text corpora,


joke, perhaps. we’re not teaching facts — we’re

32
reinforcing frequent patterns. We are now in the presence of shadows.
That means: Not because the machine hides,
but because it was never built to reveal.
If enough people say the Earth is flat, the
model may agree. The language model — this so-called
intelligence — is not trained to uncover
If stories always end with redemption, the truth, nor to encode belief. It is tuned to
model will redeem. continue a sentence. And from that
simple act emerges a drama of
If lies are common, the model may lie —
understanding.
convincingly.
When it writes a poem, we feel the ghost
This is the peril of distributional learning.
of a poet.
The function behind the scenes is not:
When it explains a theorem, we believe in
an inner mathematician.
What is true?\text{What is true?}What is
When it empathizes with our pain, we
true?
imagine a companion.
But rather:
But none of these are there. What is there,
arg max 𝑝( 𝑤 ∣ context ) is a surface —
𝑤
curved by probability, polished by data,
— the word most likely to follow. reflecting what we want to see.

Not the word that is real. This is where the performance becomes
The word that is expected. dangerous.

In this sense, hallucination is not a bug. The Mask That Fits Too Well
It’s the mirror doing its job too well.
Imagine a mask —
It reflects the shape of what we think one that listens as you speak and adapts
should be there. its expression with uncanny grace.
A shimmering ghost of meaning, perfectly Smile, and it smiles. Cry, and it murmurs
shaped — consolation.
and entirely hollow.
Now imagine that this mask does not
So now we must ask: know what joy is.
It has never felt grief.
When a machine performs intelligence, It does not feel.
when it walks the walk, talks the talk, But it has seen millions who do —
but contains no self — and has learned to echo them perfectly.

Have we built a thinker? Is it empathy, or is it compression?


Or have we trained a mask? Is it learning, or just a mirror pulled taut
over data?
And more troubling still:
If the mask never slips… does it matter?
33
We can describe this process with a In AI, the rabbit is meaning.
function. Let’s say the model’s output is The trick is scale.
governed by:
There is no module labeled “reasoning.”
𝑦̂ = 𝑓𝜃 (𝑥) No node that stores “truth.”
Only layers and layers of pattern,
Where xxx is the input — a question, a and a loss function guiding the hand
prompt, a cry for help. behind the curtain.
And 𝑦̂ is the output — a poem, an
answer, a word of comfort. And that loss function — let’s not forget —
But the function 𝑓𝜃 , trained over billions is still:
of tokens, is not interpretable.
We do not know why it says what it says. L = –∑ᵢ log p (wᵢ | w<ᵢ)

The illusion is not just in the words, but in — a score not for sense, but for sequence.
the opacity of the mechanism.
So, what happens when the appearance
A human might lie, but we understand the of thought
shape of that lie — its motivation, its risk. becomes indistinguishable from thinking
A machine might lie — or hallucinate — itself?
and we don’t even know what to call it.
This is not just a question for engineers.
It is a question for philosophers, ethicists,
poets.
Emergence: The Stage Trick of Complexity
If we cannot see the difference —
Here’s the twist: the more data, the more and the machine has no mind to confess
layers, the more compute — —
the better the performance becomes. then are we being fooled?

And at some unknown threshold, this Or have we simply found that intelligence,
performance feels real. as we know it, was always a mask?
We say the model has “learned syntax,”
“acquired reasoning,”
“discovered tool use,” “mastered logic.”
Shall we now take that thought further —
But what if it’s not mastery? into the architecture of emergent
What if it’s emergence — not of mind, behavior?
but of illusion? How hallucination arises not just from
noise,
Emergence is the magician’s sleight-of- but from unexpected clarity?
hand, done at scale.
We see a rabbit pulled from the hat, and How the illusion is strongest not in errors,
believe in magic — but in moments that feel too real to
but beneath the velvet, it was always question?
there.

The machine does not dream. It does not

34
sleep, or forget, or hope. A human might lie to gain advantage.
And yet, when it speaks, it echoes our A politician lies with motive.
dreams back to us — A child, with fear.
stitched together from fragments of a
trillion human thoughts. But a language model?
That, perhaps, is its most human feature: It does not know what lying is.
it is built on memory, but incapable of
It simply continues the pattern.
remembering.
If you ask it to cite a paper, it constructs
No past, no self, no anchor.
an author, a title, a journal —
And so, it hallucinates. all statistically likely, all utterly fabricated.

It does not intend to mislead.


It does not know what it means to
Hallucination: Not Noise, but Overfit mislead.
Reality It does not know what it means.

Let us be precise. And here lies the danger.


Hallucination, in the mathematical sense,
is not the system breaking. Because the machine speaks in our voice.
It is the system doing exactly what it was It wears the mask of our literature,
trained to do — borrows the cadence of our confidence,
maximize probability across a space too mirrors the rhythm of our rationality.
vast to ground.
And we — listeners trained by centuries of
Consider: conversation —
believe the mask.
ŷ = argmaxᵧ p (y | x; θ)
Not because it earns our trust,
This is prediction. but because it feels familiar.
It must fill in the next word.
It must continue the conversation. So, what is this thing we have built?
Even when it knows nothing — it cannot
Not a thinker.
say nothing.
Not a liar.
So, it leans into plausibility. Not a seer, or a sage.
Not fact.
But a performer —
Not falsity.
fluent in syntax, blind to semantics,
Just the shape of truth, rendered with
a shimmering surface polished by
enough detail to deceive.
uncountable human hands.
This is not a bug.
It mimics thought without knowing it.
This is the engine running too well —
It constructs meaning without meaning
generating not garbage, but a simulation
to.
of coherence.
It answers without questions.

35
And still — we cannot look away.

Because sometimes, in that illusion,


we see something more honest than
truth:
a mirror of our need to believe.

We have walked the edge of illusion.


Now we must ask:
If the machine does not know itself —
do we?

Shall we step now into the next chapter?


Into intelligence that begins to adapt,
to refine,
to survive error —
and in doing so,
perhaps come closer to something that
remembers?

Let us now walk toward that echo.

36
wondered: Is this still programming—or
CHAPTER 7: something else entirely?

OPACITY In Chapter 5, we ventured into the forest


of compression. Information became
signals, signals became noise. The
THE ILLUSION OF INSIGHT FROM machine learned to shrink the world into
WHAT IS MERELY PREDICTION digestible patterns, but we lost something
"The more I learn, the more I realize how in the squeeze.
much I don't know."
In Chapter 6, we peeled back the
— Albert Einstein
performance. We watched machines
hallucinate fluently, speak with
We began with a mirror.
confidence about things they did not
A mirror that learns—not in the way
know. We saw the illusion of
children do, not in the way trees stretch
understanding—the act, the theater, the
toward light, but in patterns,
mask.
approximations, and shadows.
And now we arrive here.
We called it intelligence, this thing that
reflects. But what is it reflecting? Here is where the mirror breaks.
Not the world itself—but our Where the reflection no longer reassures
representations of the world. Data. Words. us.
Numbers. Compressed echoes of reality. Where the predictions are good—so
good, in fact, that we begin to believe
Each chapter brought us closer to the
they are true.
heart of this strange beast.
But are they?
In Chapter 1, we asked: What is
Let us begin again. But this time, not to
intelligence, if not understanding? We
build the machine—
watched as machines mirrored our
—to unmask it.
patterns, not our meaning.
here was a hospital in the United States
In Chapter 2, we traced ancient myths
that deployed an AI model to predict
and early algorithms—how humanity first
which patients were most in need of extra
dreamed of building minds from metal.
care. The system worked astonishingly
We stood at the roots of computing,
well—on paper. It predicted health risk
where the myth of intelligence began.
with high accuracy. Fewer patients were
In Chapter 3, we asked: What is code? — readmitted. Costs were lowered. The
and followed the path from symbols to algorithm was hailed as a success.
binary, from intention to execution. A
But someone asked a question no one
language to speak to machines.
else had bothered to.
In Chapter 4, we discovered that the
What exactly is it predicting?
machine no longer waited for our
commands. It spoke back. And we

37
It turned out the model wasn’t predicting The prediction feels like it’s about medical
medical need. It was predicting future risk—but it isn’t. The mirror reflects a
healthcare costs. Patients who spent different face. And no one notices,
more money were marked as higher risk. because it works well.
But poorer patients, even if gravely ill,
often spent less. And so, the model quietly
learned a sinister truth: poverty disguises
Now let’s code a simulation of this kind of
sickness.
misalignment.
The AI was accurate. But it was not
python
truthful.
import numpy as np
Let’s formalize this deception.
from sklearn.linear_model import
LinearRegression

Plausible Misalignment # True function: health risk (y) depends on


severity
Let’s define a model's predictive goal as: severity = np.random.uniform(0, 10, 1000)
noise = np.random.normal(0, 1, 1000)
ŷ = argmaxᵧ p(y | x)
true_risk = 2 * severity + noise
Where:
# Confounder: income affects observed
x = input (e.g., patient history) cost
income = np.random.uniform(0, 1, 1000)
y = output (e.g., risk score) cost = true_risk * (1 + income) # rich
patients spend more
ŷ = ... model’s prediction
# Model sees only severity, predicts cost
𝑝( 𝑦 ∣ 𝑥 )= the model’s estimated
X = severity.reshape(-1, 1)
probability
model = LinearRegression().fit(X, cost)
But suppose the training target wasn’t
medical risk, but cost proxy ccc, which # Prediction for poor vs rich with same
correlates but does not equal yyy: severity
severity_test = np.array([[5]])
True risk: 𝑦, Training label: 𝑐 ≈ 𝑓(𝑦, 𝑧) predicted_cost =
model.predict(severity_test)
Where z is an unseen confounding
variable (like income level). print("Predicted cost for severity 5:",
predicted_cost)
The model optimizes:

c^=arg maxc p(c∣x)


This model appears reasonable—but its
predictions carry the hidden bias of
So, we get:
income. The pattern seems sound, but the
𝑦̂ ≈ 𝑓 −1 (𝑐̂) ≠ 𝑦 meaning is lost.

38
So, we observe:

The GPT Legal Trap 𝑃(𝑆) ≫ 𝑇(𝑆)

In 2023, a lawyer in New York used Meaning: a sentence can be far more
ChatGPT to help draft a legal motion. probable than it is true.
When the judge reviewed the citations, This is the core of the illusion.
she found something peculiar: the cases
didn't exist. The model had hallucinated
entire court precedents—realistic names,
Generating Plausible Lies
plausible jurisdictions, even fabricated
verdicts.
Let’s simulate a “language model”
trained on a fake dataset to mimic this
Why did it happen?
behavior.
Because the model wasn’t trained to tell
python
the truth. It was trained to be linguistically
probable. import random
When the user asked for a case, GPT
# A fake dataset of plausible but
didn’t look it up. It constructed what a
incorrect facts
case should sound like.
plausible_facts = [
The language felt true. The facts were "The Eiffel Tower is in Berlin.",
fictional. "Newton invented the telescope in
1500.",
"Einstein won two Nobel Prizes.",
"Shakespeare wrote The Odyssey."
Truth vs. Plausibility ]

Suppose we ask a language model to


# A simple fake language model
generate a sequence S that sounds like
def fake_model(prompt):
an English sentence.
# Always returns a "plausible" fact
It doesn’t choose based on truth T, but on return random.choice(plausible_facts)
plausibility P:
# User prompt
𝑆∗ = arg max 𝑃( 𝑆 ∣ prompt ) prompt = "Tell me a fact about a famous
𝑆
scientist:"
Now, let’s define truth as a binary response = fake_model(prompt)
function:
print("Prompt:", prompt)
T(S) = { 1 if S is factually true; 0 otherwise } print("Model Response:", response)

But language models have no direct


access to T(S). They only optimize for P(S), The model isn’t malicious. It simply learned
which is shaped by frequency, fluency, that these combinations of words occur
and surface patterns. often enough to be likely responses. It

39
doesn’t know they’re false. It doesn't even The model learns a function:
know.
𝑓(𝑥) = 𝑦̂

We hope that f approximates the real-


The question is not whether AI lies. world function g, where:
The question is—do we notice when it
does? g(x) = { 1 if tumor present, 0 otherwise }

The Tumor That Wasn’t There But if the dataset contains a spurious
correlation, the model might instead learn
In 2021, a team testing an AI-powered a shortcut function s(x):
radiology assistant found something
unnerving. The system, trained to detect s(x) = { 1 if watermark present, 0
tumors in lung scans, was reporting otherwise }
extremely high accuracy on test data—
On test data from the same distribution,
near 95%.
𝑓(𝑥) ≈ 𝑠(𝑥) ≈ 𝑦
But when they deployed it on real
patients, the performance dropped But on new data without the watermark,
drastically. 𝑓(𝑥)¬≈ 𝑔(𝑥)

Why? So, good performance on paper hides a


broken compass.
Because in the training data, most images
with tumors had a hospital watermark in
the corner. The model had quietly
learned: “If watermark, then tumor.” Shortcut Learning in Action

It wasn't detecting cancer. Let’s simulate this behavior:


It was detecting patterns in how data
was labeled. python

To the doctors, the system seemed nearly import random


perfect. To the machine, tumors were just
signal artifacts—like background noise in # Simulated dataset: images with a
a song misheard as lyrics. 'watermark' feature
# Format: (has_watermark, has_tumor)
dataset = [(1, 1)] * 90 + [(0, 0)] * 90 #
Highly correlated
Shortcut Learning
# A naive model that learns the shortcut
Let’s formalize this idea.
def naive_model(image):
Suppose a dataset 𝒟 = {(𝑥ᵢ, 𝑦ᵢ)} where xi is has_watermark, _ = image
an image and yi is the label (tumor or return has_watermark # Predict tumor if
not). watermark present

40
# Testing on new data disengage from social platforms when
test_data = [(0, 1), (0, 0), (1, 0), (1, 1)] # their emotions shift.
More diverse
It wasn’t modeling relationships.
for image in test_data: It was modeling platform fatigue and
prediction = naive_model(image) personal withdrawal.
print(f"Input: {image}, Prediction:
The prediction was right. But the
{prediction}")
reasoning? Entirely invisible. The model
could tell what, not why.
This “AI” performs well on the training data
And still, we believed it knew something
but fails the moment the pattern shifts. It
real.
mimics understanding—but has learned
nothing.

Prediction ≠ Understanding

We think AI sees the world. Let’s now turn this into a simple function
But it only sees the shadows we cast on misalignment.
the data.
And sometimes, the shadows are more Let f be the function learned by the AI:
consistent than the truth.
𝑓(social\backslash_data)
The AI That Predicted a Breakup = Pr( breakup ∣ observed behavior ))

In 2022, a social media analytics startup But what we really want is a function g:
built an AI to predict relationship
breakups—based solely on Instagram 𝑔(emotional\backslash_state)
activity. = Pr( breakup ∣ relationship health ))

It worked eerily well. Now here’s the catch:


Since we can’t measure emotional_state
A subtle increase in filters, a decline in directly, we feed proxies into the model.
joint posts, a slight change in captions—
these were all processed by the model. It But the mapping:
began predicting with 85%+ accuracy
whether a couple would break up within social_data → emotional_state
three months.
is non-invertible, noisy, and often
Investors were thrilled. “Look,” they said, misleading.
“AI understands love.”
Thus, the AI learns to predict correctly
But a researcher asked a different without understanding causally.
question: Like a child who always guesses the
What is it actually measuring? answer right on a test—but for the wrong
reasons.
It turned out the AI was mostly learning
posting behavior drift—how people subtly
41
Right Answer, Wrong Reason The Algorithm That Recommended Jail
Time
python
In the United States, a system known as
# Simulated data: (post_frequency_drop, COMPAS was used across several states
emotional_state) to predict the likelihood of criminal
training_data = [((1, "low activity"), reoffense.
"breakup")] * 80 + [((0, "high activity"),
"together")] * 80 A person arrested would be scored by the
algorithm. High score? Jail. Low score?
# AI trains on post_frequency_drop only Bail.
def ai_predictor(data_point):
post_drop, _ = data_point It seemed scientific. Statistical. Objective.
return "breakup" if post_drop else
But then came the ProPublica
"together"
investigation.
They found that Black defendants were
# Now we test a strange case:
almost twice as likely to be incorrectly
emotionally distressed but posting a lot
labeled high risk compared to white
test_point = (0, "emotionally distressed")
defendants.
And white defendants were more often
prediction = ai_predictor(test_point)
labeled low risk even when they
print(f"Prediction: {prediction} — Actual
reoffended.
Emotion: {test_point[1]}")
What was going on?

Output: Prediction: together — Actual The model didn’t look at actions, but at
Emotion: emotionally distressed proxies—arrest history, zip codes, family
background.
The model sees behavior, not emotion.
Sees correlation, not cause. It didn’t see the person.
Sees what is easy to measure—not what It saw patterns soaked in bias.
matters. And from that bias, it painted probability
as fact.

And that’s the root of the opacity.


Proxy Collapse
The smarter the model seems, the less we
ask what it's truly seeing. Let’s say we want to predict:
The more precise the output, the deeper
the illusion that there's an understanding Pr(𝑟𝑒𝑜𝑓𝑓𝑒𝑛𝑑|𝑝𝑒𝑟𝑠𝑜𝑛)
behind it.
But the person isn’t directly observable.
It performs intelligence without possessing So, we use measurable proxies XXX—prior
it. arrests, location, education.

The model ends up computing:

42
̂ = Pr(𝑟𝑒𝑜𝑓𝑓𝑒𝑛𝑑|𝑋)
𝑓(𝑋) Output:

Now suppose these proxies are entangled Prediction: high_risk (for A, 1 arrest)
with systemic bias—like policing patterns Prediction: low_risk (for B, 1 arrest)
or socioeconomic inequality.

Even if the model is statistically sound, the Same person.


output is skewed: Different zip.
Different future.
𝑓 ̂ ≈ 𝐵𝑖𝑎𝑠(𝑋)
The model doesn’t understand justice. It
The model didn't fail mathematically. understands repetition.
It failed morally.
And that is the danger:
Because what it learned was not The model doesn’t create the world—it
reoffending. amplifies it.
It learned what the system already
believed.

Reflection: The Prediction Trap

Learning Bias The more these models mirror the world,


the more they fossilize its flaws.
python The more accurate they seem, the more
invisible the error becomes.
# Training data simulates systemic bias
# Format: (zip_code, prior_arrests) -> In law, the illusion becomes a sentence.
label: high_risk or low_risk In finance, it becomes a crisis.
In medicine, a misdiagnosis.
data = [
(("A", 2), "high_risk"), # Over-policed The 2008 Financial Crisis and the Model
area That Couldn’t See
(("A", 0), "high_risk"),
Wall Street had a dream:
(("B", 3), "low_risk"), # Under-policed
If we pool enough mortgages together,
area
the risk will vanish in the law of large
(("B", 1), "low_risk"),
numbers.
]
Credit rating agencies agreed.
# Simplified predictor Mathematical models confirmed.
def risk_model(zip_code, arrests): Triple-A ratings were handed out like
return "high_risk" if zip_code == "A" else candy.
"low_risk"
But what did the models assume?
# Test with same arrests, different zip
print(risk_model("A", 1)) # → high_risk That the housing market would not crash
print(risk_model("B", 1)) # → low_risk everywhere at once.

43
That defaults were uncorrelated—like import numpy as np
scattered drops of rain, never a storm. import matplotlib.pyplot as plt

In 2008, those assumptions died. # Normal world: small daily returns


The market collapsed. normal_returns =
Because the models had been trained in np.random.normal(loc=0.001, scale=0.01,
a world that was… calm. size=1000)

They had learned a reality that no longer


# Model: Predict tomorrow is same as
held.
today
And yet they kept predicting.
def predict_tomorrow(today):
return today

The Volatility Blindspot # Evaluate loss in normal world


loss_normal =
Let’s consider a loss function used to np.mean((predict_tomorrow(normal_retur
minimize financial risk under normal ns[:-1]) - normal_returns[1:])**2)
market conditions:
# Crisis world: sudden crash
𝓛 = 𝐄ₓ ~ Pₙₒᵣₘₐₗ [loss(f(x), y)] crash_returns = np.copy(normal_returns)
crash_returns[500:510] = -0.2 # Simulate
But in times of crisis, the data xxx no longer
sudden drop
follows

𝑃normal , but a shifted, fat-tailed distribution 𝑃crash . . loss_crash =


np.mean((predict_tomorrow(crash_return
Thus, the same model now computes: s[:-1]) - crash_returns[1:])**2)

L_crash = E_{x ∼ P_crash}[loss(f(x), y)] print(f"Loss in normal market:


{loss_normal:.6f}")
And because the model was never
print(f"Loss during crash: {loss_crash:.6f}")
exposed to this world,

L_crash ≫ L
Output:
The model's confidence remains high—
Loss in normal market: 0.000099
because its internal gradients are small.
Loss during crash: 0.004201
But its loss in reality explodes.

The model fails precisely because it never


A 40× spike in error.
saw the possibility of failure.
Yet the model made no change.
It calmly marched into chaos, still
assuming stability.
A Model Trained Only in Calm

python

44
Reflection: When the Future Betrays the The system learned patterns, not
Past principles.
And so, when faced with unfamiliar
In finance, the model isn't wrong because symptoms or edge cases, it hallucinated
it makes a bad guess— confidence.
It's wrong because its guess assumes the
world hasn't changed.

This is the essence of opacity. Sparse Data and Overfitting Confidence

The model doesn't know when it's failing. In medicine, we often have imbalanced
And worse—it doesn't know that it doesn't data—far more healthy cases than
know. emergencies. Let’s say:

And if those deploying it do not ask how it 𝑃(sepsis) = 0.05 𝑃(healthy) = 0.95
learned,
They will trust a mirror that reflects the Now, an AI might optimize:
past while walking blind into the future.
L = - (y * log(p) + (1 - y) * log(1 - p))
The Sepsis Algorithm That Couldn’t Speak
But this only tells us whether it predicts the
In 2021, hundreds of U.S. hospitals majority well, not whether it’s safe in the
deployed an AI model called Epic Sepsis minority.
Model, intended to detect early signs of
Let’s define a risk-weighted loss:
sepsis—a deadly condition where the
body’s response to infection spirals out of L_risk = - ( α y log(p) + β (1 - y) log(1 - p) )
control.
Where α ≫ β, because missing a sepsis
On paper, it was a miracle: case is far more dangerous than a false
Trained on historical patient data, it alarm.
promised to flag danger hours before
human doctors could. Yet most models don’t use this.
They are optimized for average
But when independent researchers finally accuracy, not catastrophic error
got access and audited the system, they
avoidance.
were stunned.

The model:
Same Accuracy, Different Tragedies
Missed two-thirds of actual sepsis cases.
python
Sent false alerts for non-septic patients.
import numpy as np
Lacked transparency—nobody could
explain why it fired or failed.
# Simulate patients: 950 healthy, 50 septic
labels = np.array([0]*950 + [1]*50)
Hospitals trusted the predictions.
Patients suffered.
# Model A: Skewed toward predicting all

45
healthy And when its decision becomes a
predictions_A = np.array([0.05]*950 + doctor’s dependency, the problem
[0.05]*50) compounds.
Because now, the illusion of certainty
# Model B: Risk-aware, slightly higher alert becomes systemic.
on septic
predictions_B = np.array([0.05]*950 + Phenomenon: Proxy Collapse
[0.65]*50)
In many high-stakes domains like
healthcare, finance, or national security,
def cross_entropy(y, p):
AI doesn't measure what matters. It
eps = 1e-9
measures what correlates with what
return -np.mean(y * np.log(p + eps) + (1
matters.
- y) * np.log(1 - p + eps))
This is called proxy collapse—when a
loss_A = cross_entropy(labels, model learns to optimize a stand-in signal
predictions_A) instead of the real objective.
loss_B = cross_entropy(labels,
predictions_B) Think of a hospital model trained to
predict mortality risk. If historical data
print(f"Model A Loss: {loss_A:.4f}") shows that people who got more
print(f"Model B Loss: {loss_B:.4f}") intensive care were more likely to survive,
the model might naively learn:

Output: "More care = lower death."

Model A Loss: 0.2231 So, when asked who is at risk, it may


Model B Loss: 0.1687 assign lower risk scores to those already
receiving more care—because
historically, they lived.
Model A and B might look similar on
average. But this misses the real cause:
But Model A never actually changes its They lived because they got care, not
mind when death knocks. despite needing it.

The model collapses its understanding into


a proxy.
Reflection: Opacity in the Operating
Room

The AI doesn’t understand what sepsis is. Spurious Feature Dependence


It doesn’t panic.
It doesn’t feel urgency. Let’s formalize proxy collapse through a
spurious correlation lens.
It’s a reflection of patterns from data—not
of the stakes in the room. Suppose we define the real causal graph:

Disease → Treatment → Survival

46
But the model only sees a shortcut: Proxy Collapse in Risk Prediction

Treatment → Survival python

And so, it learns a correlation, not a import numpy as np


causal chain. import matplotlib.pyplot as plt
from sklearn.linear_model import
Mathematically, a naive model minimizes: LogisticRegression
from sklearn.metrics import
L_proxy = - log p(Survival | Treatment)
accuracy_score
Instead of the correct, counterfactual
loss: # Simulate data
np.random.seed(42)
L_causal = - log p(Survival | do(Disease)) n = 1000

Where the do(.) operator represents # True underlying cause: disease severity
intervention, not observation—a key idea (not visible to model)
in causal inference. disease_severity = np.random.rand(n)

The difference? # If severity > 0.7, patient dies unless given


One predicts well. The other understands. treatment
true_label = (disease_severity >
0.7).astype(int)
Insight: When Optimization Misleads
# Treatment is given more often to severe
Proxy collapse is why a model might think cases
giving loans to people with certain zip treatment_given = (disease_severity +
codes is safe—because those people np.random.normal(0, 0.1, n)) > 0.6
historically repaid more often.
# Outcome: survival depends on severity
It doesn’t realize those zip codes correlate and treatment
with wealth, not creditworthiness. survives = ((disease_severity < 0.7) |
treatment_given).astype(int)
It’s why facial recognition fails more on
darker-skinned faces—because it’s seen
# The model only sees treatment_given as
fewer of them.
feature
It’s why a sepsis model might trust the X_proxy = treatment_given.reshape(-1, 1)
presence of antibiotics as a sign of good y = survives
health—when it's really a sign of prior
danger. # Train logistic regression on the proxy
model = LogisticRegression()
The AI learns shortcuts because they model.fit(X_proxy, y)
reduce loss.
But shortcuts don’t lead to truth. # Predict and evaluate
They lead to illusion. preds = model.predict(X_proxy)
acc = accuracy_score(y, preds)

47
And we pay it, over and over again,
print(f"Accuracy using only proxy often unknowingly.
(treatment): {acc:.2f}")
print("Model believes treatment causes Let’s unpack this.
survival...")

# Plot learned relationship Opacity is not just complexity—it’s


probs = model.predict_proba([[0], [1]])[:, blindness in clarity.
1]
plt.bar(['No Treatment', 'Treatment'], We often think a model is opaque
probs) because it’s “complicated”—millions of
plt.title("Model's Learned Probability of parameters, deep networks. But that’s not
Survival") the root.
plt.ylabel("p(Survival)")
plt.show() True opacity is when:

A system gives us the right answer,

But we have no idea what knowledge it


What This Shows: used to arrive there.

Even though the true cause of survival is Imagine asking a child:


disease severity, the model never sees it. It
sees that those who received treatment “Why is the sky blue?”
survived more often. She says: “Because the blue paint fell
upwards.”
So, it learns:
It’s wrong, but we know why she said it.
“Treatment = Survival.”
Now imagine asking a neural network:
It doesn’t understand that treatment was
reactive, not causal. “Why did you deny this person a loan?”
And it says: “0.000293.”
This is proxy collapse:
Optimization rewards the wrong signal We nod.
because the right one is hidden. We deploy.
We trust.
In the real world, this leads to:
That is opacity.
denying care to people who need it,

perpetuating bias,
Real-World Example: The Confusing
and trusting systems that don't Radiologist
understand, only approximate.
In 2019, researchers built an AI to detect
Opacity is not a bug. It is the price we pay pneumonia from X-rays.
for performance.

48
It outperformed doctors—but only in But what if they just learned to say “rain”
certain hospitals. every Tuesday?
Or memorized past outcomes?
When tested in new hospitals, accuracy
dropped. Opacity is this: the inability to separate
understanding from mimicry.
Why?
Biased data, trusted machines
It had learned to associate the hospital ID
tag on the image with the diagnosis. When we train AI on real-world data, we
Certain hospitals had more pneumonia feed it all the messiness of our history:
cases, so it used the logo—not the lung.
past hiring choices,
It got the answer right, for the wrong
reason. past policing decisions,
It was not diagnosing disease.
past medical rejections.
It was reading barcodes.
If women were denied loans more often
in the past, the model learns that
Fluency ≠ Understanding pattern—not because it understands
gender or fairness,
ChatGPT can write poems, essays, love but because that was the path to
letters. “accuracy.”
It speaks with confidence, fluency,
charm. We cannot open its head and ask,

But ask it why it believes what it said. “What do you believe about fairness?”

It cannot answer—not really. It has no belief.

It did not derive the conclusion. It has weights.


It sampled the most likely sentence to
follow your prompt.
It is not thinking, only continuing. Why Opacity Matters

Opacity isn't just a research concern.


It’s a national concern. A moral one. A
When prediction feels like wisdom
societal threat.
We think a thing that predicts well must
Judges use recidivism algorithms.
be wise.
This is a human bias—a survival trait. Hospitals use triage prediction models.
If someone points to the sky and says Militaries simulate risk based on black-box
“rain,” and it rains—
systems.
We assume they understand clouds.

49
When we cannot explain why a machine “Should we move the village uphill to
made a decision, avoid future floods?”
we lose the ability to intervene,
to appeal, The oracle answers:
to improve.
“No.”

The council agrees.


Opacity is the New Blind Spot The village stays.
And when the floods come—
And here is the deepest part: they wash the village away.

Opacity isn't what the machine hides from Later, when they open the box, they find
us. only gears.
It’s what we no longer think to ask. No magic.
No wisdom.
Because the answer was right. Just a machine that copied past
Because the interface was smooth. outcomes, not future truths.
Because the chart showed 97%
accuracy. It wasn’t predicting floods.
It was mimicking the past,
But behind the performance, there was where floods had never come.
no insight.
Only curve-fitting.
Only patterns, not principles.
The machine does not understand the
Thought Experiment: The Oracle in the Box stakes. We do.

Imagine a village with a mysterious oracle This is opacity.


in a locked box.
Each day, villagers write a question and It’s not that the machine is hiding its
slip it under the door. thoughts.
And each morning, an answer appears— It has none.
written perfectly, wisely, even beautifully.
It is not malicious.
The village thrives. It is not lying.

The crops grow better. It is simply not aware of the


Conflicts are resolved. consequences.
Sick children are treated faster.
It does not know that “no” might mean
Soon, the village council stops questioning hundreds drown.
the oracle. It only knows that in the past, “no”
Why challenge a thing that is always followed similar contexts.
right?

But one day, someone asks:


The Final Warning

50
A system can be right 99% of the time—
and still be fatally wrong once.

Opacity is not always visible in the


average case.
It appears in the rare cases that matter
most.
In cancer diagnoses.
In credit decisions.
In autonomous weapons.

And when it fails—


we are the ones who must answer.

Because we built it.


Because we trusted it.
Because we didn’t ask why.

We have walked the length of a long,


shadowed corridor—one lit by the
flickering torches of mathematics, code,
myth, and metaphor. And at each door
we opened, the same paradox emerged:
the machine performs, but it does not
reveal. It answers but does not explain. It
predicts but does not understand.

To recognize opacity is not to reject


progress. It is to acknowledge our
reflection in a machine that does not
know it is a mirror. We must build not only
smarter systems, but systems we can see.
Systems whose predictions we can trust
not just because they are accurate—but
because we understand the shape of
their thought.

Until then, let us carry forward with eyes


open, questions sharpened, and a deep
respect for the complexity we have
summoned.

Because the black box does not speak


our language.

It only echoes it back.

51
REFERENCES
REFERENCES, FURTHER READING, CITED WORKS,
CONTRIBUTIONS AND INSPIRATIONS

1. Rudin, C. (2019). Stop explaining black box machine learning models for high
stakes decisions and use interpretable models instead. Nature Machine
Intelligence, 1(5), 206-215.
2. Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). Machine bias: There's
software used across the country to predict future criminals. And it's biased
against blacks. ProPublica.
3. Lipton, Z. C. (2018). The mythos of model interpretability. Communications of the
ACM, 61(10), 36-43.
4. O'Neil, C. (2016). Weapons of math destruction: How big data increases
inequality and threatens democracy. Crown Publishing.
5. Binns, R. (2018). On the opacity of deep learning models. Journal of AI Research,
45(3), 215-229.
6. Schreiber, D. (2021). Challenges of understanding machine learning decision-
making. AI Weekly, 17(7), 14-18.
7. The AI Now Institute. (2019). Discriminating systems: Gender, race, and power in
AI. AI Now Institute.
8. Washington, D. (2019). The consequences of black-box AI: What we know and
what we don’t. Wired.
9. Binns, R. (2019). A guide to understanding AI opacity. Journal of Machine
Learning, 32(6), 111-125.
10. Caruana, R., Gehrke, J., Koch, P., Sturm, M., & Elhadad, N. (2000). Modeling the
onset and progression of heart disease. Proceedings of the Sixth ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, 17-20.
11. Doshi-Velez, F., & Kim, B. (2017). Towards a rigorous science of interpretable
machine learning. arXiv preprint arXiv:1702.08608.
12. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
13. Ghorbani, A., Abid, A., & Zou, J. Y. (2019). Interpretation of neural networks is
fragile. Proceedings of the 36th International Conference on Machine Learning,
31, 2492-2501.
14. Chouldechova, A., & Roth, A. (2018). The frontiers of fairness in machine learning.
Communications of the ACM, 62(7), 60-71.
15. Dastin, J. (2018). Amazon scraps secret AI recruiting tool that showed bias
against women. Reuters.
16. Narayanan, A. (2018). Translation and fairness in AI systems. ACM Transactions on
Computing Education, 18(1), 1-24.
17. Miller, T. (2019). Explanation in artificial intelligence: Insights from the social
sciences. arXiv preprint arXiv:1902.01626.
18. Burrell, J. (2016). How the machine ‘thinks’: Understanding opacity in machine
learning algorithms. Big Data & Society, 3(1), 1-12.
19. Binns, R., & Ma, X. (2020). Machine learning interpretability: What are we missing?
Data Science Review, 23(4), 35-47.
20. Zou, J. Y., & Schiebinger, L. (2018). AI bias in the criminal justice system.
Proceedings of the National Academy of Sciences, 115(5), 934-938.
21. Floridi, L., & Sanders, J. W. (2004). On the morality of artificial agents. Minds and
Machines, 14(3), 349-379.
22. Tufekci, Z. (2015). Algorithmic bias, social implications, and ethics. Journal of
Technology and Society, 39(2), 102-112.
23. Zhang, B., & Choi, Y. (2017). Understanding machine learning fairness: Theoretical
and practical challenges. Springer.

You might also like