Week 8: Introduction to Artificial Intelligence for Mechatronics Engineering
Title: Anomaly Detection for Predictive Maintenance
Objectives:
      1. Understand the concept of anomaly detection and its application in predictive
         maintenance.
      2. Implement unsupervised learning methods to detect anomalies in sensor
         data.
      3. Apply Isolation Forest and Autoencoder neural networks for anomaly
         detection.
      4. Visualize normal and anomalous sensor readings using Matplotlib.
      5. Evaluate model performance using precision, recall, and F1-score.
Key Skills:
      Understanding predictive maintenance and its role in industrial automation.
      Applying machine learning for anomaly detection in sensor data.
      Using Isolation Forest and Autoencoders for anomaly detection.
      Evaluating model performance using classification metrics.
      Visualizing sensor data trends and anomalies using Matplotlib and Seaborn.
Problem Statement:
      A factory floor has multiple vibration sensors installed on different machines.
      These sensors monitor the health of rotating machinery by measuring vibration levels over time.
      Normal machines show vibration levels between 5 and 15 mm/s.
      Faulty machines exhibit outliers (anomalies) with higher vibration levels.
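      The 5 to 15 mm/s range in the problem statement already suggests a simple rule-based baseline to compare the learned models against. The sketch below is not part of the lab: the function name band_rule_flags and its default limits are assumptions taken directly from the stated normal range, and it uses the same 0 = Normal, 1 = Anomaly convention as the labels in Step 2.

```python
import numpy as np

def band_rule_flags(vibration, low=5.0, high=15.0):
    """Hypothetical baseline: flag readings outside the stated normal band.

    Returns 1 for readings below `low` or above `high` (mm/s), else 0.
    The band limits themselves count as normal.
    """
    vibration = np.asarray(vibration, dtype=float)
    return ((vibration < low) | (vibration > high)).astype(int)

# Example readings in mm/s
readings = np.array([4.9, 5.0, 10.0, 15.0, 15.1, 25.0])
print(band_rule_flags(readings))  # [1 0 0 0 1 1]
```

      A fixed band like this has to be set by hand for every machine, which is one reason to learn what "normal" looks like from the data instead.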
Instructions
Step 1: Import Required Libraries (5 minutes)
      Start by importing the necessary Python libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
Step 2: Load and Preprocess the Dataset (15 minutes)
      Generate Sample Sensor Data
      We will simulate a dataset representing normal and faulty machine conditions.
# Set random seed for reproducibility
np.random.seed(42)
# Generate normal sensor readings (500 samples)
normal_data = np.random.normal(loc=10, scale=2, size=500)  # Mean = 10 mm/s, Std Dev = 2 mm/s
# Generate abnormal sensor readings (20 samples)
anomaly_data = np.random.normal(loc=25, scale=5, size=20)  # Mean = 25 mm/s, Std Dev = 5 mm/s
# Combine data into a single dataset
vibration_data = np.concatenate((normal_data, anomaly_data))
# Create labels (0 = Normal, 1 = Anomaly)
labels = np.concatenate((np.zeros(len(normal_data)), np.ones(len(anomaly_data))))
# Convert to DataFrame
df = pd.DataFrame({'Vibration': vibration_data, 'Label': labels})
# Shuffle the dataset
df = df.sample(frac=1, random_state=42).reset_index(drop=True)
# Display dataset summary
print(df.head())
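      Because the labels are known in this simulation, it is worth checking the true share of anomalies before choosing the Isolation Forest contamination value in Step 4. A short sketch, assuming only the sample counts used above (500 normal and 20 anomalous readings):

```python
import numpy as np

n_normal, n_anomaly = 500, 20  # sample counts used in Step 2
labels = np.concatenate((np.zeros(n_normal), np.ones(n_anomaly)))

# Fraction of readings that are true anomalies
true_fraction = labels.mean()
print(f"True anomaly fraction: {true_fraction:.4f}")  # True anomaly fraction: 0.0385

# Approximate number of readings flagged when contamination = 0.05
print(f"Readings flagged at contamination=0.05: about {0.05 * len(labels):.0f}")  # about 26
```

      If roughly 26 readings are flagged but only 20 are true anomalies, at least 6 flagged readings must be normal, so anomaly-class precision cannot exceed about 20/26 (roughly 0.77) with contamination=0.05.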
Step 3: Visualizing the Data (10 minutes)
# Plot histogram of vibration levels
plt.figure(figsize=(8, 5))
sns.histplot(data=df, x='Vibration', bins=30, kde=True, hue='Label', palette=['green', 'red'])
plt.xlabel("Vibration Level (mm/s)")
plt.ylabel("Frequency")
plt.title("Distribution of Sensor Readings")
plt.legend(["Normal", "Anomaly"])
plt.show()
Step 4: Anomaly Detection Using Isolation Forest (20 minutes)
      Train the Isolation Forest Model
# Standardize the data
scaler = StandardScaler()
df['Vibration_Scaled'] = scaler.fit_transform(df[['Vibration']])
# Train Isolation Forest model
iso_forest = IsolationForest(contamination=0.05, random_state=42)
df['Anomaly_Score'] = iso_forest.fit_predict(df[['Vibration_Scaled']])
# fit_predict returns -1 for anomalies and 1 for normal points;
# convert these to binary labels (0 = normal, 1 = anomaly)
df['Anomaly_Detected'] = (df['Anomaly_Score'] == -1).astype(int)
# Print classification report
print(classification_report(df['Label'], df['Anomaly_Detected']))
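      In scikit-learn, contamination does not act on the raw readings: after fitting, the model stores an offset_ chosen so that roughly that fraction of the training points has a negative decision_function value, and those points are labeled -1. The sketch below checks this on freshly generated stand-in data shaped like Step 2 (variable names are illustrative, and the data are not standardized here).

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Stand-in data shaped like Step 2: 500 normal and 20 anomalous readings, one feature
X = np.concatenate((rng.normal(10, 2, 500), rng.normal(25, 5, 20))).reshape(-1, 1)

iso = IsolationForest(contamination=0.05, random_state=42)
pred = iso.fit_predict(X)  # -1 = anomaly, 1 = normal

# decision_function = score_samples - offset_; negative values are labeled -1
decision = iso.decision_function(X)
print("Fraction flagged:", (pred == -1).mean())
print("Same as decision_function < 0:", np.array_equal(pred == -1, decision < 0))
```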
      Visualizing Isolation Forest Predictions
plt.figure(figsize=(8, 5))
sns.scatterplot(x=df.index, y=df['Vibration'], hue=df['Anomaly_Detected'], palette=['green', 'red'])
plt.xlabel("Sample Index")
plt.ylabel("Vibration Level (mm/s)")
plt.title("Anomaly Detection with Isolation Forest")
plt.legend(["Normal", "Anomaly"])
plt.show()
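      Matplotlib's documentation discourages passing only a list of labels to legend(), because the labels are then matched to plot elements purely by order. An alternative, sketched here with Matplotlib alone, is to draw each predicted class with its own labeled scatter call. The arrays are hypothetical stand-ins for df['Vibration'] and df['Anomaly_Detected'].

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
vibration = np.concatenate((rng.normal(10, 2, 50), rng.normal(25, 5, 5)))     # stand-in readings
predicted = np.concatenate((np.zeros(50, dtype=int), np.ones(5, dtype=int)))  # stand-in predictions
index = np.arange(len(vibration))

fig, ax = plt.subplots(figsize=(8, 5))
# One scatter call per class, so each legend entry is tied to its own points
for value, name, color in [(0, "Normal", "green"), (1, "Anomaly", "red")]:
    mask = predicted == value
    ax.scatter(index[mask], vibration[mask], c=color, label=name, s=15)
ax.set_xlabel("Sample Index")
ax.set_ylabel("Vibration Level (mm/s)")
ax.set_title("Anomaly Detection with Isolation Forest")
ax.legend()
# plt.show()  # display the figure in an interactive session
```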
Step 5: Anomaly Detection Using Autoencoder Neural Network (25 minutes)
      Define and Train an Autoencoder Model
# Define Autoencoder model: encoder 1 -> 8 -> 4, decoder 4 -> 8 -> 1
autoencoder = Sequential([
    Dense(8, activation='relu', input_shape=(1,)),
    Dense(4, activation='relu'),
    Dense(8, activation='relu'),
    Dense(1, activation='linear')  # linear output reconstructs the scaled reading
])
# Compile the model
autoencoder.compile(optimizer=Adam(learning_rate=0.01), loss='mse')
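# Train the Autoencoder to reproduce its own input (input and target are the same)
history = autoencoder.fit(df['Vibration_Scaled'], df['Vibration_Scaled'],
                          epochs=50, batch_size=32, validation_split=0.2, verbose=0)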
      Reconstruction Error and Anomaly Detection
# Compute reconstruction errors
reconstructed_data = autoencoder.predict(df['Vibration_Scaled'], verbose=0)
reconstruction_error = np.abs(df['Vibration_Scaled'] - reconstructed_data.flatten())
# Set threshold for anomaly detection (Mean + 2 * Std Dev)
threshold = np.mean(reconstruction_error) + 2 * np.std(reconstruction_error)
df['Anomaly_Autoencoder'] = (reconstruction_error > threshold).astype(int)
# Print classification report
print(classification_report(df['Label'], df['Anomaly_Autoencoder']))
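      Written out, the rule in the code above computes a per-reading reconstruction error e_i from the scaled reading x_i and its reconstruction, then flags readings whose error exceeds the mean by more than two standard deviations (np.std uses the population form, dividing by N, with N = 520 here):

```latex
e_i = \lvert x_i - \hat{x}_i \rvert, \qquad
\mu_e = \frac{1}{N}\sum_{i=1}^{N} e_i, \qquad
\sigma_e = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(e_i - \mu_e\right)^2}

\tau = \mu_e + 2\,\sigma_e, \qquad
\text{Anomaly}_i =
\begin{cases}
1 & \text{if } e_i > \tau \\
0 & \text{otherwise}
\end{cases}
```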
      Visualizing Autoencoder-Based Anomaly Detection
plt.figure(figsize=(8, 5))
sns.scatterplot(x=df.index, y=df['Vibration'], hue=df['Anomaly_Autoencoder'], palette=['green', 'red'])
plt.xlabel("Sample Index")
plt.ylabel("Vibration Level (mm/s)")
plt.title("Anomaly Detection with Autoencoder")
plt.legend(["Normal", "Anomaly"])
plt.show()
Step 6: Compare Model Performance (10 minutes)
      Compare the Isolation Forest and Autoencoder models based on:
           Precision (the fraction of flagged readings that are true anomalies).
           Recall (the fraction of true anomalies that were flagged).
           F1-score (harmonic mean of precision and recall).
print("Isolation Forest Performance:")
print(classification_report(df['Label'], df['Anomaly_Detected']))
print("\nAutoencoder Performance:")
print(classification_report(df['Label'], df['Anomaly_Autoencoder']))
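      The three metrics can also be computed directly from confusion counts, which makes explicit what classification_report shows in the row for the anomaly class (label 1). A minimal sketch; the helper name anomaly_metrics and the toy labels are illustrative:

```python
import numpy as np

def anomaly_metrics(y_true, y_pred):
    """Precision, recall and F1 for the anomaly class (label 1)."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = np.asarray(y_pred).astype(int)
    tp = np.sum((y_true == 1) & (y_pred == 1))  # anomalies correctly flagged
    fp = np.sum((y_true == 0) & (y_pred == 1))  # normal readings wrongly flagged
    fn = np.sum((y_true == 1) & (y_pred == 0))  # anomalies that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example: 4 true anomalies; the model flags 5 readings, 3 of them correctly
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
y_pred = [0, 1, 0, 1, 1, 1, 1, 0, 0, 0]
p, r, f = anomaly_metrics(y_true, y_pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")  # precision=0.60 recall=0.75 f1=0.67
```

      Applied to df['Label'] with df['Anomaly_Detected'] or df['Anomaly_Autoencoder'], these values should agree with the anomaly-class row of the corresponding report.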
Evaluation Questions:
1. What are the key differences between Isolation Forest and Autoencoder
for anomaly detection?
    Algorithm Type: Isolation Forest is a tree-based model, while an
      Autoencoder is a neural network-based model.
    Training Method: Isolation Forest builds an ensemble of random trees and
      flags readings that are isolated after only a few random splits, whereas an
      Autoencoder learns to reconstruct normal data patterns and uses the
      reconstruction error as an anomaly score. Neither method needs labeled
      data, although an Autoencoder works best when its training data is mostly
      normal.
    Feature Dependencies: Isolation Forest works well with tabular and
      high-dimensional data because each split uses one randomly chosen feature,
      so it does not model relationships between features. Autoencoders, on the
      other hand, learn joint relationships between features, which makes them
      better suited to correlated, multichannel inputs such as windows of
      time-series sensor data.
    Computational Complexity: Isolation Forest is fast to train and scales
      well, whereas Autoencoders need iterative gradient-based training over many
      epochs, which becomes expensive for deep networks.
    Interpretability: Isolation Forest is more interpretable because its anomaly
      score comes directly from the average path length (tree depth) needed to
      isolate a point, while Autoencoders are more of a "black box" that needs
      additional interpretation techniques.
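      The path-length score mentioned above comes from the original Isolation Forest formulation: a point's average path length E[h(x)] over all trees is normalized by c(n), the average path length of an unsuccessful search in a binary search tree built on n points, giving s = 2^(-E[h(x)] / c(n)). Scores close to 1 mark points isolated after very few splits (likely anomalies), and a point of exactly average depth scores 0.5. A small sketch of the formula; the function names are illustrative, and scikit-learn's score_samples returns the negative of this score.

```python
import math

EULER_GAMMA = 0.5772156649015329

def average_path_length(n):
    """c(n): average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of the harmonic number H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def isolation_score(mean_path_length, n):
    """s(x, n) = 2 ** (-E[h(x)] / c(n)); values near 1 suggest an anomaly."""
    return 2.0 ** (-mean_path_length / average_path_length(n))

n = 256  # subsample size per tree (scikit-learn's default max_samples is min(256, n_samples))
print(round(isolation_score(average_path_length(n), n), 3))  # 0.5 at exactly average depth
print(isolation_score(2.0, n) > isolation_score(10.0, n))    # True: shorter path, higher score
```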
2. How does contamination level in Isolation Forest affect the results?
Effect of Contamination Level in Isolation Forest
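The contamination level is a hyperparameter that represents the expected
proportion of anomalies in the dataset. In scikit-learn it sets the decision
threshold so that roughly this fraction of the training points is labeled -1
(anomaly); this lab uses contamination=0.05.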
      High Contamination: More points are classified as anomalies, increasing
        recall but reducing precision, as more normal points might be falsely
        labeled as outliers.
      Low Contamination: Fewer points are labeled as anomalies, increasing
        precision but possibly missing actual outliers (lower recall).
      Incorrect Contamination Estimate: If the true anomaly ratio differs
        from the contamination parameter, the model will either flag too many
        normal readings (false positives) or under-detect real anomalies.
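      To see this trade-off numerically without retraining a model, the sketch below treats contamination as "flag the top q fraction of anomaly scores" and sweeps q. As an assumption for illustration, the distance of each reading from the median stands in for the Isolation Forest score; the data mimic Step 2 but are freshly generated.

```python
import numpy as np

rng = np.random.default_rng(42)
vibration = np.concatenate((rng.normal(10, 2, 500), rng.normal(25, 5, 20)))
labels = np.concatenate((np.zeros(500, dtype=int), np.ones(20, dtype=int)))

# Stand-in anomaly score (assumption): distance from the median reading
score = np.abs(vibration - np.median(vibration))
order = np.argsort(-score)  # most anomalous readings first

results = []
for q in [0.01, 0.03, 0.05, 0.10, 0.20]:
    k = int(round(q * len(score)))  # number of readings flagged at this contamination
    flagged = np.zeros(len(score), dtype=int)
    flagged[order[:k]] = 1
    tp = np.sum((flagged == 1) & (labels == 1))
    precision = tp / k if k else 0.0
    recall = tp / labels.sum()
    results.append((q, k, precision, recall))
    print(f"contamination={q:.2f} flagged={k:3d} precision={precision:.2f} recall={recall:.2f}")
```

      Recall can only rise as q grows, because each larger flagged set contains the smaller one, while precision generally falls once q exceeds the true anomaly fraction (about 0.04 here).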
3. What happens when we change the threshold for anomaly detection in
the Autoencoder?
Effect of Changing the Threshold in Autoencoder for Anomaly Detection
      Higher Threshold: Fewer anomalies detected, reducing false positives
        but potentially missing real anomalies.
      Lower Threshold: More anomalies detected, increasing recall but also
        increasing false positives.
      Tuning Strategy: The threshold is usually set based on validation data or
        domain knowledge, or through statistical rules such as the mean plus k
        standard deviations of the reconstruction error (this lab uses k = 2).
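      The same trade-off appears when sweeping k in the mean + k standard deviations rule used in Step 5. In this sketch the reconstruction errors are simulated (an assumption; in the lab they come from the trained autoencoder), with larger errors for the anomalous readings.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated reconstruction errors (assumption): small for normal readings, larger for anomalies
errors = np.concatenate((np.abs(rng.normal(0, 0.1, 500)), np.abs(rng.normal(0.8, 0.3, 20))))
labels = np.concatenate((np.zeros(500, dtype=int), np.ones(20, dtype=int)))

mu, sigma = errors.mean(), errors.std()
flag_counts = []
for k in [1.0, 2.0, 3.0, 4.0]:
    threshold = mu + k * sigma
    flagged = errors > threshold
    tp = np.sum(flagged & (labels == 1))   # anomalies caught
    fp = np.sum(flagged & (labels == 0))   # false alarms
    fn = np.sum(~flagged & (labels == 1))  # anomalies missed
    flag_counts.append(int(flagged.sum()))
    print(f"k={k:.0f} threshold={threshold:.3f} TP={tp} FP={fp} FN={fn}")
```

      Raising k trades false alarms (FP) for missed anomalies (FN); which side to favour depends on whether a false alarm or a missed fault costs more on the factory floor.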
4. How could this method be applied to real-time predictive maintenance in
factories?
Application to Real-Time Predictive Maintenance in Factories
      Data Collection: Use IoT sensors to collect machine operation data
        (vibration, temperature, pressure, sound, etc.).
      Model Training: Train an Autoencoder on normal operating conditions to
        learn normal behavior patterns.
      Real-Time Monitoring: Continuously feed live data into the trained
        Autoencoder and calculate the reconstruction error.
      Anomaly Detection: If the reconstruction error exceeds a threshold, flag
        the instance as an anomaly.
      Predictive Action: Trigger alerts for maintenance before failures occur,
        reducing downtime and costs.
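      The monitoring loop described above can be sketched as a small class that scores each reading as it arrives. Everything here is illustrative: StreamingMonitor is a hypothetical name, the reconstruct callable stands in for calling the trained autoencoder on one reading (kept in raw mm/s for simplicity), and the patience rule, which requires several consecutive exceedances before alerting to suppress one-off spikes, is an added assumption.

```python
from typing import Callable, Optional

class StreamingMonitor:
    """Hypothetical real-time monitor based on reconstruction error."""

    def __init__(self, reconstruct: Callable[[float], float], threshold: float, patience: int = 3):
        self.reconstruct = reconstruct  # e.g. a wrapper around a trained autoencoder
        self.threshold = threshold      # e.g. mean + k * std of errors on normal data
        self.patience = patience        # consecutive exceedances required before alerting
        self._streak = 0

    def update(self, reading: float) -> Optional[str]:
        """Process one reading; return an alert message or None."""
        error = abs(reading - self.reconstruct(reading))
        self._streak = self._streak + 1 if error > self.threshold else 0
        if self._streak >= self.patience:
            return f"ALERT: error {error:.2f} above {self.threshold:.2f} for {self._streak} readings"
        return None

# Stand-in model (assumption): "reconstructs" every reading as the normal mean of 10 mm/s
monitor = StreamingMonitor(reconstruct=lambda x: 10.0, threshold=5.0, patience=2)
alerts = [monitor.update(x) for x in [9.5, 11.0, 30.0, 10.2, 28.0, 31.0]]
print(alerts)
```

      The isolated spike at 30.0 mm/s does not trigger an alert, while the two consecutive high readings at the end do.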
Complete Code and Obtained Output:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
# Set random seed for reproducibility
np.random.seed(42)
# Generate normal sensor readings (500 samples)
normal_data = np.random.normal(loc=10, scale=2, size=500) # Mean = 10
# Generate abnormal sensor readings (20 samples)
anomaly_data = np.random.normal(loc=25, scale=5, size=20) # Mean = 25
# Combine data into a single dataset
vibration_data = np.concatenate((normal_data, anomaly_data))
# Create labels (0 = Normal, 1 = Anomaly)
labels = np.concatenate((np.zeros(len(normal_data)), np.ones(len(anomaly_data))))
# Convert to DataFrame
df = pd.DataFrame({'Vibration': vibration_data, 'Label': labels})
# Shuffle the dataset
df = df.sample(frac=1, random_state=42).reset_index(drop=True)
# Display dataset summary
print(df.head())
# Visualizing the Data
plt.figure(figsize=(8, 5))
sns.histplot(data=df, x='Vibration', bins=30, kde=True, hue='Label', palette=['green', 'red'])
plt.xlabel("Vibration Level (mm/s)")
plt.ylabel("Frequency")
plt.title("Distribution of Sensor Readings")
plt.legend(["Normal", "Anomaly"])
plt.show()
# Anomaly Detection Using Isolation Forest
# Standardize the data
scaler = StandardScaler()
df['Vibration_Scaled'] = scaler.fit_transform(df[['Vibration']])
# Train Isolation Forest model
iso_forest = IsolationForest(contamination=0.05, random_state=42)
df['Anomaly_Score'] = iso_forest.fit_predict(df[['Vibration_Scaled']])
# Convert anomaly scores (-1= anomaly, 1 = normal) to binary labels (0 = normal, 1 = anomaly)
df['Anomaly_Detected'] = (df['Anomaly_Score'] == -1).astype(int)
# Print classification report
print("Isolation Forest Classification Report:")
print(classification_report(df['Label'], df['Anomaly_Detected']))
# Visualizing Isolation Forest Predictions
plt.figure(figsize=(8, 5))
sns.scatterplot(x=df.index, y=df['Vibration'], hue=df['Anomaly_Detected'], palette=['green', 'red'])
plt.xlabel("Sample Index")
plt.ylabel("Vibration Level (mm/s)")
plt.title("Anomaly Detection with Isolation Forest")
plt.legend(["Normal", "Anomaly"])
plt.show()
# Anomaly Detection Using Autoencoder Neural Network
# Define Autoencoder model
autoencoder = Sequential([
   Dense(8, activation='relu', input_shape=(1,)),
   Dense(4, activation='relu'),
   Dense(8, activation='relu'),
   Dense(1, activation='linear')
])
# Compile the model
autoencoder.compile(optimizer=Adam(learning_rate=0.01), loss='mse')
# Train the Autoencoder
history = autoencoder.fit(df['Vibration_Scaled'], df['Vibration_Scaled'],
                epochs=50, batch_size=32, validation_split=0.2, verbose=0)
# Reconstruction Error and Anomaly Detection
# Compute reconstruction errors
reconstructed_data = autoencoder.predict(df['Vibration_Scaled'], verbose=0)
reconstruction_error = np.abs(df['Vibration_Scaled'] - reconstructed_data.flatten())
# Set threshold for anomaly detection (Mean + 2 * Std Dev)
threshold = np.mean(reconstruction_error) + 2 * np.std(reconstruction_error)
df['Anomaly_Autoencoder'] = (reconstruction_error > threshold).astype(int)
# Print classification report
print("Autoencoder Classification Report:")
print(classification_report(df['Label'], df['Anomaly_Autoencoder']))
# Visualizing Autoencoder-Based Anomaly Detection
plt.figure(figsize=(8, 5))
sns.scatterplot(x=df.index, y=df['Vibration'], hue=df['Anomaly_Autoencoder'], palette=['green',
'red'])
plt.xlabel("Sample Index")
plt.ylabel("Vibration Level (mm/s)")
plt.title("Anomaly Detection with Autoencoder")
plt.legend(["Normal", "Anomaly"])
plt.show()
# Compare Model Performance
print("Isolation Forest Performance:")
print(classification_report(df['Label'], df['Anomaly_Detected']))
print("\nAutoencoder Performance:")
print(classification_report(df['Label'], df['Anomaly_Autoencoder']))