Isolation Forest AI-ML

Isolation Forest is an algorithm designed to detect anomalies in data by isolating outliers through random decision tree cuts. It identifies points that are isolated quickly as anomalies, making it useful in various fields such as fraud detection in banking and abnormal patient surge detection in hospitals. The method is efficient, unsupervised, and can handle large datasets effectively.

Uploaded by samina
Copyright © All Rights Reserved

An important concept for ML interviews

What is Isolation Forest in AI & ML?

Sanjay N Kumar
Data scientist | AI ML Engineer | Statistician | Analytics Consultant
Why Do We Need Isolation Forest? 🚨

● Let’s start with a story!
● 👫 Imagine a class photo with 29 students and... 1 monkey 🐵
Everyone is smiling 😊, and suddenly you see the monkey.
That monkey is different. That’s an anomaly.
● 🧠 In data, such "monkeys" are called outliers: strange values that don't fit with the rest.
📌 Isolation Forest helps us spot these strange ones quickly.
What is Isolation Forest? 🌳

Isolation Forest is an unsupervised algorithm for finding weird things (anomalies) in data.
🌲 It works like this:
● It builds many random trees (called isolation trees), each made of random cuts
● If a data point gets isolated quickly, it's probably an anomaly
● If it takes more steps to isolate, it’s probably normal
📌 Goal: Isolate strange points faster than normal ones
Real-Life Example 1 🛒

🏪 A grocery store sees most customers buying 5–10 apples a day. 🍎
One day, someone buys 1000 apples! 😱
That’s strange. That’s an outlier.
✅ Isolation Forest can catch this behavior and flag it!
Real-Life Example 2 🏥

🏥 A hospital gets 50–60 patients per day.
Suddenly, 500 patients arrive in one day. 😮
It could be a virus outbreak or an emergency! 🦠
🚨 Isolation Forest can alert doctors to this unusual pattern.
Real-Life Example 3 🏦

🏦 In a bank, people usually withdraw ₹500 to ₹5000.
But one person withdraws ₹1 crore! 💰
That’s a red flag 🚩, maybe a fraud case.
✅ Isolation Forest can catch this and alert the system.
Step-by-Step Process – Simple Walkthrough 🧠

Let’s understand how the algorithm works, like a game 🎮
🪄 Step 1: Pick a Feature
From your data, randomly pick one column (feature).
🔍 Example: “Money Spent” or “Speed” or “Height”
✂ Step 2: Pick a Random Value
Now pick a random value between the smallest and largest values in that column.
📏 Like: "Split at ₹500" or "Split at 5 feet"
Divide the data:
● Less than ₹500 ➡ Group A
● ₹500 or more ➡ Group B
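Steps 1 and 2 together make one random cut. The Python sketch below is a minimal illustration, not a reference implementation: it picks a column at random and then draws the split value uniformly between that column's minimum and maximum, which is how the original Isolation Forest paper describes the cut. The rows, including the made-up second column (visits per month) and E's amount of ₹9000, are hypothetical.

```python
import random


def random_split(rows, rng):
    """One random cut: pick a feature (Step 1), then a split value (Step 2)."""
    # Step 1: pick one column (feature) at random
    feature = rng.randrange(len(rows[0]))
    values = [row[feature] for row in rows]
    low, high = min(values), max(values)
    # Step 2: draw the split value uniformly between that column's min and max
    split = rng.uniform(low, high)
    group_a = [row for row in rows if row[feature] < split]   # "less than" side
    group_b = [row for row in rows if row[feature] >= split]  # "equal or more" side
    return feature, split, group_a, group_b


# Hypothetical rows: (money spent in ₹, visits per month); E = (9000, 1) is made up
rows = [(100, 4), (120, 5), (130, 3), (150, 6), (9000, 1)]
rng = random.Random(42)  # fixed seed so the cut is repeatable
feature, split, group_a, group_b = random_split(rows, rng)
print(f"feature {feature}, split at {split:.1f}: A={group_a}, B={group_b}")
```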
🔁 Step 3: Repeat the Splitting
Keep doing this inside each new group: randomly pick a feature ➡ randomly split
🎯 Do this until each data point is isolated (alone) in its group
⏳ Step 4: Count the Number of Steps
Now see how many cuts it took to isolate each point, averaged over all the trees.
🟢 More steps → Blended in → Normal
🔴 Fewer steps → Got isolated quickly → Weird!
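Steps 3 and 4 can be sketched in Python for a single feature: cut each group again and again until every point is alone, record how many cuts that took, and average over many random trees. This is a simplified sketch under stated assumptions: the original algorithm also builds each tree from a small random subsample and stops growing a tree at a height limit of about log2(sample size), and both of those are skipped here. A–D reuse the amounts from the Easy Math Example; E's ₹9000 is a hypothetical value.

```python
import random


def isolation_depths(points, rng, depth=0, out=None):
    """Recursively cut (Step 3) and record how many cuts isolated each point (Step 4).

    points: list of (name, value) pairs for a single feature.
    """
    if out is None:
        out = {}
    values = [value for _, value in points]
    low, high = min(values), max(values)
    if len(points) == 1 or low == high:
        # The point is alone (or the group cannot be split further): record its depth
        for name, _ in points:
            out[name] = depth
        return out
    split = rng.uniform(low, high)
    left = [p for p in points if p[1] < split]
    right = [p for p in points if p[1] >= split]
    if not left:
        # Rare case: the cut landed exactly on the minimum and moved nothing; cut again
        return isolation_depths(points, rng, depth, out)
    isolation_depths(left, rng, depth + 1, out)
    isolation_depths(right, rng, depth + 1, out)
    return out


# A-D match the Easy Math Example; E = 9000 is a hypothetical stand-in
spend = {"A": 100, "B": 120, "C": 130, "D": 150, "E": 9000}
rng = random.Random(0)
n_trees = 200

totals = dict.fromkeys(spend, 0)
for _ in range(n_trees):  # build many random trees and add up each point's depth
    for name, depth in isolation_depths(list(spend.items()), rng).items():
        totals[name] += depth
avg_depth = {name: total / n_trees for name, total in totals.items()}
print(avg_depth)
```

With a fixed seed the output is repeatable; E comes out with an average depth close to 1, while A–D sit deeper in the trees.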
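Easy Math Example 🔢

Let’s say these are the amounts of money spent by 5 people, A to E: A–D each spent between ₹100 and ₹150, while E spent far more.

Randomly split the data:
● Most people group together and need several splits each (up to 4, since there are only 5 people)
● Person E stands out and is isolated in 1–2 splits
🎯 E is an anomaly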
What Happens in Isolation Forest? 💡
Let’s say we randomly choose the feature: Money Spent
And now we randomly pick split values like:
● ₹5000: E goes to one side 🡆 everyone else goes to the other
✅ That already separates E

Just 1 random cut is enough to isolate E most of the time, because a cut drawn anywhere between ₹100 and E's amount will usually land above D's ₹150!
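The "most of the time" claim can be checked with a little arithmetic. If the only feature is Money Spent and the cut is drawn uniformly between ₹100 (A) and E's amount, E is isolated exactly when the cut lands above D's ₹150, so the probability is (E - 150) / (E - 100). Assuming a hypothetical E = ₹9000, that is 8850 / 8900 ≈ 0.994. The Python sketch below computes this value and confirms it with a quick simulation.

```python
import random

amounts = [100, 120, 130, 150, 9000]  # E = 9000 is a hypothetical value
low, high = min(amounts), max(amounts)
second_highest = sorted(amounts)[-2]  # D's amount

# A uniform cut in [low, high] isolates E exactly when it lands above D
p_exact = (high - second_highest) / (high - low)

# Monte Carlo check: draw many random cuts and count how often E ends up alone
rng = random.Random(1)
trials = 100_000
hits = sum(rng.uniform(low, high) > second_highest for _ in range(trials))
p_simulated = hits / trials
print(f"exact: {p_exact:.4f}, simulated: {p_simulated:.4f}")
```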
🧮 What About the Others (A–D)?
These are close to each other:
● A = ₹100
● B = ₹120
● C = ₹130
● D = ₹150
To separate each of them, you need more random cuts:
● Maybe ₹110 splits A from the rest
● Then ₹125 splits B
● Then ₹140 splits C from D
● And finally, D is alone too
⏳ Here C and D need 4 cuts (counting the ₹5000 one) before they are alone, and unlucky random cuts can take even more.
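The particular run of cuts described above (₹5000, then ₹110, ₹125 and ₹140) can be written down as one fixed tree and replayed. In this Python sketch each node is a hypothetical `(cut, left, right)` tuple and `path_length` counts how many cuts a value passes through before it is alone; E's ₹9000 is an assumed stand-in for any amount of ₹5000 or more.

```python
# Each node is (cut, left_subtree, right_subtree); None marks a leaf (point is alone).
# This hard-codes the cut sequence ₹5000, ₹110, ₹125, ₹140 from the example.
tree = (5000, (110, None, (125, None, (140, None, None))), None)


def path_length(value, node):
    """Number of cuts a value passes through before it reaches a leaf."""
    steps = 0
    while node is not None:
        cut, left, right = node
        node = left if value < cut else right  # "less than" goes left
        steps += 1
    return steps


spend = {"A": 100, "B": 120, "C": 130, "D": 150, "E": 9000}  # E's amount is hypothetical
depths = {name: path_length(value, tree) for name, value in spend.items()}
print(depths)  # {'A': 2, 'B': 3, 'C': 4, 'D': 4, 'E': 1}
```

E needs a single cut while A–D need 2 to 4, which is exactly the gap Isolation Forest turns into an anomaly score.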

🎯 Why Is Person E an Anomaly?

● Person E is far away from the others
● Because E is so far away, random split values are likely to isolate E quickly
● So E's average number of steps (tree depth) is small
● That gives E a high anomaly score
✅ Hence, Isolation Forest says: "E is different!"
Simple Formula (Don’t Worry!) 🧮

📌 Isolation Forest gives each point an anomaly score:
Anomaly_Score = 2^(-avg_steps / c(n))
● Fewer steps → Higher score (close to 1) → Weird
● More steps → Lower score (below 0.5, toward 0) → Normal
(c(n) is the average number of steps you would expect for n points, where n is the number of points each tree is built from; dividing by it keeps scores comparable across data sizes)
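The formula can be computed directly. This Python sketch assumes the normalizing term from the original Isolation Forest paper, c(n) = 2·H(n-1) - 2(n-1)/n, where H(i) is the i-th harmonic number (the paper approximates H(i) with ln(i) + 0.5772156649; the sketch simply sums it exactly). The sample size n = 256 is an assumed value.

```python
def harmonic(i):
    """Exact harmonic number H(i) = 1 + 1/2 + ... + 1/i."""
    return sum(1.0 / k for k in range(1, i + 1))


def c(n):
    """Average path length for n points: c(n) = 2*H(n-1) - 2*(n-1)/n (needs n >= 2)."""
    if n < 2:
        raise ValueError("c(n) needs at least 2 points")
    return 2.0 * harmonic(n - 1) - 2.0 * (n - 1) / n


def anomaly_score(avg_steps, n):
    """Anomaly_Score = 2^(-avg_steps / c(n)); values close to 1 mean 'weird'."""
    return 2.0 ** (-avg_steps / c(n))


n = 256  # assumed number of points each tree is built from
for steps in (2, c(n), 15):
    print(f"avg_steps={steps:5.2f} -> score={anomaly_score(steps, n):.3f}")
```

When avg_steps equals c(n), the exponent is exactly -1 and the score is 0.5; fewer steps push the score toward 1, and more steps push it below 0.5.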
Where Is It Used? 🛠

✅ Banks → Fraud detection
✅ Hospitals → Detect abnormal patient surges
✅ E-commerce → Catch weird buying patterns
✅ Cybersecurity → Detect hacking activity
✅ Sensors → Detect machine breakdowns
Advantages of Isolation Forest 🎯

💡 Works well even on huge datasets
💡 Finds anomalies without needing labels (unsupervised)
💡 Fast and efficient, since each tree only looks at a small random sample of the data
💡 Easy to use in Python via scikit-learn
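As a minimal sketch of the scikit-learn route, assuming scikit-learn and NumPy are installed, the snippet below fits `sklearn.ensemble.IsolationForest` to made-up daily apple counts in the spirit of the grocery example. `fit_predict` returns -1 for points it considers anomalies and 1 for normal ones, and `score_samples` returns the negated anomaly score from the original paper, so lower values mean more anomalous. The data and `random_state=0` are illustrative choices.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical daily apple purchases (5-10 apples) plus one extreme buyer (1000 apples)
apples = np.array([5, 7, 6, 10, 8, 9, 5, 6, 7, 8, 10, 9, 6, 7, 1000]).reshape(-1, 1)

model = IsolationForest(n_estimators=100, random_state=0)
labels = model.fit_predict(apples)    # -1 = anomaly, 1 = normal
scores = model.score_samples(apples)  # lower = more anomalous (negated paper score)

for value, label, score in zip(apples.ravel(), labels, scores):
    print(f"{int(value):5d}  label={int(label):2d}  score={score:.3f}")
```

With the default `contamination="auto"`, scikit-learn flags a point when its original-paper score is above 0.5; passing a number such as `contamination=0.05` instead fixes the expected share of anomalies in the training data.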
Recap – Remember These Points 📌

🌳 Isolation Forest is a tree-based algorithm
✂ Uses random cuts to split data
⏱ If a point gets isolated in fewer cuts, it’s probably anomalous
💥 Helps detect fraud, mistakes, or surprises in data
Final Thought 💬

“Finding an anomaly in data is like spotting a zebra 🦓 among horses 🐎
Isolation Forest makes this job super easy!” 🙌


Spot the Unseen, Isolate the Unexpected…
Let Isolation Forest be your eyes in the data jungle:
swift, sharp, and always alert to anomalies!
🌲👁💡
Let’s turn confusion into clarity, one tree at a time.
Reach out, and let’s find the outliers that matter!
🚀📊

Sanjay N Kumar
Data scientist | AI ML Engineer | Statistician | Analytics Consultant

https://www.linkedin.com/in/sanjaytheanalyst360/

sanjaytheanalyst360@gmail.com