
Week 8: Introduction to Artificial Intelligence for Mechatronics Engineering

Title: Anomaly Detection for Predictive Maintenance

Objective:
1. Understand the concept of anomaly detection and its application in predictive
maintenance.
2. Implement unsupervised learning methods to detect anomalies in sensor
data.
3. Apply Isolation Forest and Autoencoder neural networks for anomaly
detection.
4. Visualize normal and anomalous sensor readings using Matplotlib.
5. Evaluate model performance using precision, recall, and F1-score.

Key Skills:

 Understanding predictive maintenance and its role in industrial automation.
 Applying machine learning for anomaly detection in sensor data.
 Using Isolation Forest and Autoencoders for anomaly detection.
 Evaluating model performance using classification metrics.
 Visualizing sensor data trends and anomalies using Matplotlib and Seaborn.

Problem Statement:

 A factory floor has multiple vibration sensors installed on different machines.
 These sensors monitor the health of rotating machinery by measuring vibration levels over time.
 Normal machines show vibration levels between 5 and 15 mm/s.
 Faulty machines exhibit outliers (anomalies) with higher vibration levels.

Instructions

Step 1: Import Required Libraries (5 minutes)

 Start by importing the necessary Python libraries.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

Step 2: Load and Preprocess the Dataset (15 minutes)

 Generate Sample Sensor Data
 We will simulate a dataset representing normal and faulty machine conditions.

# Set random seed for reproducibility
np.random.seed(42)

# Generate normal sensor readings (500 samples)
normal_data = np.random.normal(loc=10, scale=2, size=500)  # Mean = 10

# Generate abnormal sensor readings (20 samples)
anomaly_data = np.random.normal(loc=25, scale=5, size=20)  # Mean = 25

# Combine data into a single dataset
vibration_data = np.concatenate((normal_data, anomaly_data))

# Create labels (0 = Normal, 1 = Anomaly)
labels = np.concatenate((np.zeros(len(normal_data)), np.ones(len(anomaly_data))))

# Convert to DataFrame
df = pd.DataFrame({'Vibration': vibration_data, 'Label': labels})

# Shuffle the dataset
df = df.sample(frac=1, random_state=42).reset_index(drop=True)

# Display dataset summary
print(df.head())

Step 3: Visualizing the Data (10 minutes)

# Plot histogram of vibration levels
plt.figure(figsize=(8, 5))
sns.histplot(data=df, x='Vibration', bins=30, kde=True, hue='Label', palette=['green', 'red'])
plt.xlabel("Vibration Level (mm/s)")
plt.ylabel("Frequency")
plt.title("Distribution of Sensor Readings")
plt.legend(["Normal", "Anomaly"])
plt.show()

Step 4: Anomaly Detection Using Isolation Forest (20 minutes)

 Train the Isolation Forest Model

# Standardize the data
scaler = StandardScaler()
df['Vibration_Scaled'] = scaler.fit_transform(df[['Vibration']])

# Train Isolation Forest model
iso_forest = IsolationForest(contamination=0.05, random_state=42)
df['Anomaly_Score'] = iso_forest.fit_predict(df[['Vibration_Scaled']])

# Convert anomaly scores (-1 = anomaly, 1 = normal) to binary labels (0 = normal, 1 = anomaly)
df['Anomaly_Detected'] = (df['Anomaly_Score'] == -1).astype(int)

# Print classification report
print(classification_report(df['Label'], df['Anomaly_Detected']))

 Visualizing Isolation Forest Predictions

plt.figure(figsize=(8, 5))
sns.scatterplot(x=df.index, y=df['Vibration'], hue=df['Anomaly_Detected'], palette=['green', 'red'])
plt.xlabel("Sample Index")
plt.ylabel("Vibration Level (mm/s)")
plt.title("Anomaly Detection with Isolation Forest")
plt.legend(["Normal", "Anomaly"])
plt.show()

Step 5: Anomaly Detection Using Autoencoder Neural Network (25 minutes)

 Define and Train an Autoencoder Model

# Define Autoencoder model
autoencoder = Sequential([
    Dense(8, activation='relu', input_shape=(1,)),
    Dense(4, activation='relu'),
    Dense(8, activation='relu'),
    Dense(1, activation='linear')
])

# Compile the model
autoencoder.compile(optimizer=Adam(learning_rate=0.01), loss='mse')

# Train the Autoencoder
history = autoencoder.fit(df['Vibration_Scaled'], df['Vibration_Scaled'],
                          epochs=50, batch_size=32, validation_split=0.2, verbose=0)

 Reconstruction Error and Anomaly Detection

# Compute reconstruction errors
reconstructed_data = autoencoder.predict(df['Vibration_Scaled'], verbose=0)
reconstruction_error = np.abs(df['Vibration_Scaled'] - reconstructed_data.flatten())

# Set threshold for anomaly detection (Mean + 2 * Std Dev)
threshold = np.mean(reconstruction_error) + 2 * np.std(reconstruction_error)
df['Anomaly_Autoencoder'] = (reconstruction_error > threshold).astype(int)

# Print classification report
print(classification_report(df['Label'], df['Anomaly_Autoencoder']))

 Visualizing Autoencoder-Based Anomaly Detection

plt.figure(figsize=(8, 5))
sns.scatterplot(x=df.index, y=df['Vibration'], hue=df['Anomaly_Autoencoder'], palette=['green', 'red'])
plt.xlabel("Sample Index")
plt.ylabel("Vibration Level (mm/s)")
plt.title("Anomaly Detection with Autoencoder")
plt.legend(["Normal", "Anomaly"])
plt.show()

Step 6: Compare Model Performance (10 minutes)

 Compare the Isolation Forest and Autoencoder models based on:
 Precision (accuracy of detecting actual anomalies).
 Recall (ability to find all anomalies).
 F1-score (harmonic mean of precision and recall).

print("Isolation Forest Performance:")
print(classification_report(df['Label'], df['Anomaly_Detected']))
print("Autoencoder Performance:")
print(classification_report(df['Label'], df['Anomaly_Autoencoder']))
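To read the comparison at a glance, the two reports can also be condensed into a single table. A minimal sketch, assuming df still holds the 'Label', 'Anomaly_Detected', and 'Anomaly_Autoencoder' columns from the steps above:

from sklearn.metrics import precision_recall_fscore_support

rows = {}
for name, col in [("Isolation Forest", "Anomaly_Detected"),
                  ("Autoencoder", "Anomaly_Autoencoder")]:
    # Metrics for the anomaly class (label 1) only
    p, r, f1, _ = precision_recall_fscore_support(
        df['Label'], df[col], labels=[1], zero_division=0)
    rows[name] = {"Precision": p[0], "Recall": r[0], "F1-score": f1[0]}

print(pd.DataFrame(rows).T.round(3))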

Evaluation Questions:

1. What are the key differences between Isolation Forest and Autoencoder
for anomaly detection?
 Algorithm Type: Isolation Forest is a tree-based model, while an
Autoencoder is a neural network-based model.
 Training Method: Isolation Forest isolates anomalies using decision trees
without requiring labeled data, whereas an Autoencoder learns to
reconstruct normal data patterns and uses reconstruction error as an
anomaly score.
 Feature Dependencies: Isolation Forest works well with tabular data and
high-dimensional data without assuming feature dependencies.
Autoencoders, on the other hand, learn complex feature relationships,
making them better suited for structured or sequential data.
 Computational Complexity: Isolation Forest is computationally efficient
and scalable, whereas Autoencoders require significant computational
resources, especially for deep networks.
 Interpretability: Isolation Forest is more interpretable since it provides a
direct anomaly score based on tree depth, while Autoencoders are more of
a "black-box" requiring additional interpretation techniques.

2. How does the contamination level in Isolation Forest affect the results?

Effect of Contamination Level in Isolation Forest
The contamination level is a hyperparameter that represents the expected
proportion of anomalies in the dataset.
 High Contamination: More points are classified as anomalies, increasing
recall but reducing precision, as more normal points might be falsely
labeled as outliers.
 Low Contamination: Fewer points are labeled as anomalies, increasing
precision but possibly missing actual outliers (lower recall).
 Incorrect Contamination Estimate: If the true anomaly ratio is different
from the contamination parameter, the model might overfit or under-
detect anomalies.
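This trade-off can be observed directly by sweeping the contamination parameter. A minimal sketch, reusing the scaled data from Step 4 (the sweep values are illustrative):

from sklearn.metrics import precision_score, recall_score

for c in [0.01, 0.05, 0.10, 0.20]:
    model = IsolationForest(contamination=c, random_state=42)
    pred = (model.fit_predict(df[['Vibration_Scaled']]) == -1).astype(int)
    # Typically, precision falls and recall rises as contamination grows
    print(f"contamination={c:.2f}  "
          f"precision={precision_score(df['Label'], pred, zero_division=0):.2f}  "
          f"recall={recall_score(df['Label'], pred, zero_division=0):.2f}")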

3. What happens when we change the threshold for anomaly detection in the Autoencoder?

Effect of Changing the Threshold in the Autoencoder for Anomaly Detection
 Higher Threshold: Fewer anomalies detected, reducing false positives
but potentially missing real anomalies.
 Lower Threshold: More anomalies detected, increasing recall but also
increasing false positives.
 Tuning Strategy: The threshold is usually set based on validation data or
domain knowledge, and sometimes through statistical methods like using
the mean + k standard deviations of reconstruction error.
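A minimal sketch of this tuning strategy, reusing reconstruction_error from Step 5 and varying k in the mean + k * std rule:

from sklearn.metrics import precision_score, recall_score

mu, sigma = np.mean(reconstruction_error), np.std(reconstruction_error)
for k in [1, 2, 3]:
    pred = (reconstruction_error > mu + k * sigma).astype(int)
    # Larger k means a stricter threshold: typically higher precision, lower recall
    print(f"k={k}  precision={precision_score(df['Label'], pred, zero_division=0):.2f}  "
          f"recall={recall_score(df['Label'], pred, zero_division=0):.2f}")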

4. How could this method be applied to real-time predictive maintenance in factories?

Application to Real-Time Predictive Maintenance in Factories
 Data Collection: Use IoT sensors to collect machine operation data
(vibration, temperature, pressure, sound, etc.).
 Model Training: Train an Autoencoder on normal operating conditions to
learn normal behavior patterns.
 Real-Time Monitoring: Continuously feed live data into the trained
Autoencoder and calculate the reconstruction error.
 Anomaly Detection: If the reconstruction error exceeds a threshold, flag
the instance as an anomaly.
 Predictive Action: Trigger alerts for maintenance before failures occur,
reducing downtime and costs.
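A minimal sketch of such a monitoring loop, assuming a hypothetical read_sensor() function that returns the latest vibration reading; scaler, autoencoder, and threshold come from the earlier steps:

import time

def monitor(read_sensor, interval_s=1.0):
    while True:
        reading = read_sensor()               # hypothetical live reading in mm/s
        x = scaler.transform([[reading]])     # apply the same scaling as training
        err = abs(x[0, 0] - autoencoder.predict(x, verbose=0)[0, 0])
        if err > threshold:                   # flag the instance and alert maintenance
            print(f"ALERT: anomalous vibration detected ({reading:.1f} mm/s)")
        time.sleep(interval_s)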
Complete Code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

# Set random seed for reproducibility
np.random.seed(42)

# Generate normal sensor readings (500 samples)
normal_data = np.random.normal(loc=10, scale=2, size=500)  # Mean = 10

# Generate abnormal sensor readings (20 samples)
anomaly_data = np.random.normal(loc=25, scale=5, size=20)  # Mean = 25

# Combine data into a single dataset
vibration_data = np.concatenate((normal_data, anomaly_data))

# Create labels (0 = Normal, 1 = Anomaly)
labels = np.concatenate((np.zeros(len(normal_data)), np.ones(len(anomaly_data))))

# Convert to DataFrame
df = pd.DataFrame({'Vibration': vibration_data, 'Label': labels})

# Shuffle the dataset
df = df.sample(frac=1, random_state=42).reset_index(drop=True)

# Display dataset summary
print(df.head())

# Visualizing the Data
plt.figure(figsize=(8, 5))
sns.histplot(data=df, x='Vibration', bins=30, kde=True, hue='Label', palette=['green', 'red'])
plt.xlabel("Vibration Level (mm/s)")
plt.ylabel("Frequency")
plt.title("Distribution of Sensor Readings")
plt.legend(["Normal", "Anomaly"])
plt.show()

# Anomaly Detection Using Isolation Forest

# Standardize the data
scaler = StandardScaler()
df['Vibration_Scaled'] = scaler.fit_transform(df[['Vibration']])

# Train Isolation Forest model
iso_forest = IsolationForest(contamination=0.05, random_state=42)
df['Anomaly_Score'] = iso_forest.fit_predict(df[['Vibration_Scaled']])

# Convert anomaly scores (-1 = anomaly, 1 = normal) to binary labels (0 = normal, 1 = anomaly)
df['Anomaly_Detected'] = (df['Anomaly_Score'] == -1).astype(int)

# Print classification report
print("Isolation Forest Classification Report:")
print(classification_report(df['Label'], df['Anomaly_Detected']))

# Visualizing Isolation Forest Predictions
plt.figure(figsize=(8, 5))
sns.scatterplot(x=df.index, y=df['Vibration'], hue=df['Anomaly_Detected'], palette=['green', 'red'])
plt.xlabel("Sample Index")
plt.ylabel("Vibration Level (mm/s)")
plt.title("Anomaly Detection with Isolation Forest")
plt.legend(["Normal", "Anomaly"])
plt.show()

# Anomaly Detection Using Autoencoder Neural Network

# Define Autoencoder model
autoencoder = Sequential([
    Dense(8, activation='relu', input_shape=(1,)),
    Dense(4, activation='relu'),
    Dense(8, activation='relu'),
    Dense(1, activation='linear')
])

# Compile the model
autoencoder.compile(optimizer=Adam(learning_rate=0.01), loss='mse')

# Train the Autoencoder
history = autoencoder.fit(df['Vibration_Scaled'], df['Vibration_Scaled'],
                          epochs=50, batch_size=32, validation_split=0.2, verbose=0)

# Reconstruction Error and Anomaly Detection
# Compute reconstruction errors
reconstructed_data = autoencoder.predict(df['Vibration_Scaled'], verbose=0)
reconstruction_error = np.abs(df['Vibration_Scaled'] - reconstructed_data.flatten())

# Set threshold for anomaly detection (Mean + 2 * Std Dev)
threshold = np.mean(reconstruction_error) + 2 * np.std(reconstruction_error)
df['Anomaly_Autoencoder'] = (reconstruction_error > threshold).astype(int)

# Print classification report
print("Autoencoder Classification Report:")
print(classification_report(df['Label'], df['Anomaly_Autoencoder']))

# Visualizing Autoencoder-Based Anomaly Detection
plt.figure(figsize=(8, 5))
sns.scatterplot(x=df.index, y=df['Vibration'], hue=df['Anomaly_Autoencoder'], palette=['green', 'red'])
plt.xlabel("Sample Index")
plt.ylabel("Vibration Level (mm/s)")
plt.title("Anomaly Detection with Autoencoder")
plt.legend(["Normal", "Anomaly"])
plt.show()

# Compare Model Performance
print("Isolation Forest Performance:")
print(classification_report(df['Label'], df['Anomaly_Detected']))
print("\nAutoencoder Performance:")
print(classification_report(df['Label'], df['Anomaly_Autoencoder']))
