An important concept for ML interviews
What is Isolation Forest in AI & ML?
                         Sanjay N Kumar
   Data scientist | AI ML Engineer | Statistician | Analytics Consultant
 Why Do We Need Isolation Forest? 🚨
● Let’s start with a story!
● 👫 Imagine a class photo with 29 students
  and... 1 monkey 🐵
   Everyone is smiling 😊 — and suddenly, you
  see the monkey.
   That monkey is different. That’s an
  anomaly.
● 🧠 In data, such "monkeys" are called
  outliers — strange values.
   📌 Isolation Forest helps us spot these
  strange ones quickly.
      What is Isolation Forest? 🌳
Isolation Forest is an unsupervised algorithm for
finding anomalies (unusual points) in data.
🌲 It works like this:
● It builds many decision trees using random
  cuts
● If a data point gets isolated quickly, it's
  probably an anomaly
● If it takes more steps to isolate, it's probably normal
📌 Goal: Isolate strange points faster than
normal ones
         Real-Life Example 1 🛒
🏪 A grocery store sees most customers buying
5–10 apples a day. 🍎
One day, someone buys 1000 apples! 😱
That’s strange. That’s an outlier.
✅ Isolation Forest will catch this behavior and
flag it!
         Real-Life Example 2 🏥
🏥 A hospital gets 50–60 patients per day.
Suddenly, 500 patients arrive in one day. 😮
Could be a virus outbreak or emergency! 🦠
🚨 Isolation Forest alerts doctors to this
unusual pattern.
         Real-Life Example 3 🏦
🏦 In a bank, people usually withdraw ₹500 to
₹5000.
But one person withdraws ₹1 crore! 💰
That’s a red flag 🚩 — maybe a fraud case.
✅ Isolation Forest will catch this and alert the
system.
    Step-by-Step Process – Simple Walkthrough 🧠
Let’s understand how the algorithm works, like
a game 🎮
🪄 Step 1: Pick a Feature
From your data, randomly pick one column
(feature).
 🔍 Example: “Money Spent” or “Speed” or
“Height”
✂ Step 2: Pick a Random Value
Now pick a random value in that column, somewhere
between its smallest and largest values.
📏 Like: "Split at ₹500" or "Split at 5 feet"
Divide the data:
● Less than ₹500 ➡ Group A
● ₹500 or more ➡ Group B
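Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration: the rows, the column names, and the helper `random_split` are made up for the example, and drawing the cut uniformly between the column's minimum and maximum is an assumption taken from how the algorithm is usually described.

```python
import random

# Hypothetical rows; the column names and values are invented for illustration.
rows = [
    {"money_spent": 100, "height_ft": 5.4},
    {"money_spent": 120, "height_ft": 5.9},
    {"money_spent": 130, "height_ft": 5.1},
    {"money_spent": 150, "height_ft": 6.0},
    {"money_spent": 10_000, "height_ft": 5.6},
]

def random_split(rows, rng):
    """Pick a random feature and a random cut value, then divide the rows."""
    feature = rng.choice(sorted(rows[0]))            # Step 1: random column
    values = [row[feature] for row in rows]
    cut = rng.uniform(min(values), max(values))      # Step 2: random value in range
    group_a = [row for row in rows if row[feature] < cut]    # less than the cut
    group_b = [row for row in rows if row[feature] >= cut]   # the cut or more
    return feature, cut, group_a, group_b

rng = random.Random(0)  # fixed seed so the run is repeatable
feature, cut, group_a, group_b = random_split(rows, rng)
print(f"Split on {feature} at {cut:.2f}: "
      f"{len(group_a)} rows in Group A, {len(group_b)} rows in Group B")
```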
🔁 Step 3: Repeat the Splitting
Keep doing this: randomly pick a feature ➡
randomly split
 🎯 Do this until each data point is isolated
(alone) in its group
⏳ Step 4: Count the Number of Steps
Now see how fast each point was isolated.
🟢 More steps → Blended in → Normal
🔴 Fewer steps → Got isolated quickly →
Weird!
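Here is a minimal sketch of Steps 3 and 4 for a single random tree. The data and the function name `path_length` are hypothetical, and the sketch assumes every row is distinct (otherwise two identical rows could never be separated). It follows one point down the tree and counts the splits until that point is alone.

```python
import random

# Hypothetical data: (money_spent, speed). The last row is far from the rest.
rows = [(100, 40), (120, 42), (130, 38), (150, 41), (10_000, 120)]

def path_length(point, rows, rng):
    """Count the random splits needed until `point` is alone in its group."""
    group = list(rows)
    steps = 0
    while len(group) > 1:
        feature = rng.randrange(len(point))          # randomly pick a feature
        values = [row[feature] for row in group]
        low, high = min(values), max(values)
        if low == high:
            continue                                 # this column cannot split the group
        cut = rng.uniform(low, high)                 # randomly pick a split value
        if cut == low:
            continue                                 # degenerate draw, nothing would move
        # Keep only the rows that land on the same side of the cut as our point.
        group = [row for row in group
                 if (row[feature] < cut) == (point[feature] < cut)]
        steps += 1
    return steps

rng = random.Random(7)
print("steps for the far-away row:", path_length(rows[-1], rows, rng))
print("steps for a typical row:   ", path_length(rows[1], rows, rng))
```

One tree is noisy, so a real forest repeats this over many trees and averages the step counts.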
        Easy Math Example 🔢
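Let's say this is the money spent by 5 people.
A, B, C, and D each spend between ₹100 and ₹150,
while E spends far more than everyone else.
Randomly split the data:
● A–D sit close together, so each of them needs
  more splits (up to 4 when there are only 5 points)
● Person E stands out and is isolated in 1–2
  splits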
🎯 E is an anomaly
    What Happens in Isolation Forest? 💡
Let’s say we randomly choose the feature:
Money Spent
 And now we randomly pick split values like:
● ₹5000: E goes to one side 🡆 Everyone else
  to the other
   ✅ That already separates E
Just 1 random cut is enough to isolate E most
of the time!
🧮 What About the Others (A–D)?
These are close to each other:
●   A = ₹100
●   B = ₹120
●   C = ₹130
●   D = ₹150
To separate each of them, you need more
random cuts:
●   Maybe ₹110 splits A from the rest
●   Then ₹125 splits B
●   Then ₹140 splits C
●   And finally, D is alone
⏳ Counting the cut that removed E, that is 4 cuts
before everyone is alone, so the last points (C and D)
sit much deeper in the tree than E.
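The arithmetic behind "1 cut is usually enough for E" can be checked directly. In this sketch the values for A–D are the ones listed above, but E's amount (₹10,000) is a hypothetical stand-in: the slides only say that a cut at ₹5000 separates E from everyone else. The cut is assumed to be drawn uniformly between the smallest and largest value.

```python
# A-D come from the list above; E's amount is a hypothetical stand-in.
spend = {"A": 100, "B": 120, "C": 130, "D": 150, "E": 10_000}

low, high = min(spend.values()), max(spend.values())
second_highest = sorted(spend.values())[-2]          # D's amount

# A uniform cut between low and high isolates E when it lands above D.
p_isolate_e = (high - second_highest) / (high - low)

# The same cut isolates A only when it lands below B.
p_isolate_a = (spend["B"] - low) / (high - low)

print(f"P(first cut isolates E) = {p_isolate_e:.3f}")
print(f"P(first cut isolates A) = {p_isolate_a:.3f}")
```

The gap between E and the crowd covers almost the whole range, so almost any cut falls inside it.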
🎯 Why Is Person E an Anomaly?
● Person E is far away from others
● Because they are so far, random split values
  will likely isolate them quickly
● So their average number of steps (tree
  depth) is small
● That gives E a high anomaly score
  ✅ Hence, Isolation Forest says: "E is
  different!"
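To see the "average number of steps" idea in action, here is a small simulation over many random trees. It is a sketch under assumptions: E's amount (₹10,000) is hypothetical, the helper `steps_to_isolate` is made up for this example, and only the single feature Money Spent is used.

```python
import random

# A-D come from the slides; E's amount is a hypothetical stand-in.
spend = {"A": 100, "B": 120, "C": 130, "D": 150, "E": 10_000}

def steps_to_isolate(name, rng):
    """Number of random cuts until `name` is alone, for one random tree."""
    group = dict(spend)
    steps = 0
    while len(group) > 1:
        low, high = min(group.values()), max(group.values())
        cut = rng.uniform(low, high)
        if cut == low:
            continue                                 # degenerate draw, try again
        side = spend[name] < cut
        # Keep only the people on the same side of the cut as `name`.
        group = {k: v for k, v in group.items() if (v < cut) == side}
        steps += 1
    return steps

rng = random.Random(42)
n_trees = 1000
avg_steps = {
    name: sum(steps_to_isolate(name, rng) for _ in range(n_trees)) / n_trees
    for name in spend
}
for name, avg in avg_steps.items():
    print(name, round(avg, 2))
```

E's average should come out close to 1, while A–D need noticeably more cuts on average.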
  Simple Formula (Don’t Worry!) 🧮
📌 Isolation Forest gives each point an
anomaly score:
Anomaly_Score = 2^(-avg_steps / c(n))
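● Fewer steps → Higher score (close to 1) →
  Weird
● More steps → Lower score (well below 0.5) →
  Normal
● Scores around 0.5 for every point → no clear
  anomalies
(c(n) is the average number of steps expected for
n points; it normalizes the score so it always
falls between 0 and 1)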
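The formula can be tried out with a few lines of Python. The slides do not spell out c(n), so this sketch assumes the definition from the original Isolation Forest paper, c(n) = 2·H(n−1) − 2(n−1)/n, with the harmonic number H approximated by ln(n−1) + 0.5772. The step counts passed in at the end are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649

def c(n):
    """Normalizing constant: average steps expected for n points (assumed definition)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA         # approximates H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(avg_steps, n):
    """Anomaly_Score = 2^(-avg_steps / c(n))."""
    return 2.0 ** (-avg_steps / c(n))

n = 5
print("c(5) =", round(c(n), 3))
print("score for avg_steps = 1.0:", round(anomaly_score(1.0, n), 3))  # isolated fast
print("score for avg_steps = 3.5:", round(anomaly_score(3.5, n), 3))  # isolated slowly
```

A point whose average step count equals c(n) gets a score of exactly 0.5, which is why 0.5 acts as the "nothing unusual" midpoint.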
          Where is It Used? 🛠
✅ Banks → Fraud detection
 ✅ Hospitals → Detect abnormal patient
surges
 ✅ E-commerce → Catch weird buying
patterns
 ✅ Cybersecurity → Detect hacking activity
 ✅ Sensors → Detect machine breakdowns
  Advantages of Isolation Forest 🎯
💡 Works well on large datasets, since each tree
only needs a small random sample of the rows
 💡 Finds anomalies without needing labels
(unsupervised)
 💡 Fast and light on memory: training time grows
roughly linearly with the amount of data
 💡 Easy to use in Python using scikit-learn
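A minimal scikit-learn sketch, assuming scikit-learn is installed. The data reuses the Money Spent example, with a hypothetical ₹10,000 for person E. `predict` returns +1 for normal points and -1 for anomalies, and `score_samples` returns the negated anomaly score, so the sign is flipped below to make higher mean weirder.

```python
from sklearn.ensemble import IsolationForest

# One feature (money spent). The last row is a hypothetical value for person E.
X = [[100], [120], [130], [150], [10_000]]

model = IsolationForest(n_estimators=100, random_state=0)
model.fit(X)

labels = model.predict(X).tolist()            # +1 = normal, -1 = anomaly
scores = (-model.score_samples(X)).tolist()   # flipped sign: higher = more anomalous

for row, label, score in zip(X, labels, scores):
    print(row[0], label, round(score, 3))
```

On real data you would usually also think about the `contamination` parameter, which controls how many points get flagged; it is left at its default here.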
 Recap – Remember These Points 📌
🌳 Isolation Forest is a tree-based algorithm
✂ Uses random cuts to split data
⏱ If a point gets isolated in fewer cuts, it’s
probably anomalous
💥 Helps detect frauds, mistakes, or surprises
in data
           Final Thought 💬
“Finding an anomaly in data is like spotting a
zebra 🦓 among horses 🐎
Isolation Forest makes this job super easy!” 🙌
Spot the Unseen, Isolate the
Unexpected….
Let Isolation Forest be your eyes in the data jungle
—
 swift, sharp, and always alert to anomalies!
🌲👁💡
Let’s turn confusion into clarity — one tree at a
time.
 Reach out, and let’s find the outliers that matter!
🚀📊
                         Sanjay N Kumar
  Data scientist | AI ML Engineer | Statistician | Analytics Consultant
         https://www.linkedin.com/in/sanjaytheanalyst360/
          sanjaytheanalyst360@gmail.com