SMART MOTOR TROUBLESHOOTING AND PREDICTIVE MAINTENANCE CHATBOT

MINI-PROJECT REPORT
submitted in partial fulfillment of the requirements
for the award of the degree in
BACHELOR OF SCIENCE
in
COMPUTER SCIENCE AND ENGINEERING
by
J. AMIRTHA HARSHINI (221061101014)
P.K. ANUSHA (221061101018)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
MAY 2025
DECLARATION
J.AMIRTHA HARSHINI (221061101014), P.K.ANUSHA (221061101018) hereby declare that the project
phase 1 report entitled “SMART MOTOR TROUBLESHOOTING AND PREDICTIVE MAINTENANCE
CHATBOT” is done by us under the guidance of MRS. HARINI and is submitted in partial fulfillment of the
requirements for the award of the degree in BACHELOR OF TECHNOLOGY in COMPUTER SCIENCE
AND ENGINEERING.
1. AMIRTHA HARSHINI J.
2. ANUSHA P.K.

DATE:
PLACE: CHENNAI

SIGNATURE OF THE CANDIDATE(S)
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
BONAFIDE CERTIFICATE
This is to certify that this Project Report is the bonafide work of J.AMIRTHA HARSHINI
(221061101014) and P.K.ANUSHA (221061101018), who carried out the project entitled "SMART MOTOR
TROUBLESHOOTING AND PREDICTIVE MAINTENANCE CHATBOT".
ACKNOWLEDGEMENT
We express our sincere thanks to Thiru. Dr. A.C. Shanmugam, B.A., B.L., President Er. A.C.S. Arunkumar,
B.Tech., and Secretary Thiru A. Ravikumar for all the encouragement and support extended to us during the
tenure of this project and also our years of study. We are grateful to S. Geetha, who has been actively involved
and very influential from the start till the completion of this project, and to our project guide Mrs. Harini for
her continuous guidance and encouragement. We would also like to thank all the teaching and non-teaching
staff of the Computer Science and Engineering department for their constant support and help in achieving our
goals.
TABLE OF CONTENTS
1 Introduction
2 Requirement Analysis and Specification
2.1 Introduction
2.2 Functional Requirements
2.3 Non-Functional Requirements
2.4 User Requirements
2.5 System Requirements
3 Design
3.1 System Architecture
3.2 Component Design
3.3 Data Flow Design
3.4 Machine Learning Model
3.5 UI Design
3.6 Use Case Diagrams
3.7 Spam Email Dataset Classifier
4 Implementation
4.1 Data Handling
4.2 Feature Engineering
4.3 Model Training
4.4 Evaluation
4.5 Deployment
5 Summary
6 References

LIST OF ABBREVIATIONS
DC Direct Current
FN False Negative
ML Machine Learning

LIST OF TABLES
LIST OF FIGURES

ABSTRACT
Unplanned motor failures in industrial environments can result in significant downtime and maintenance
costs. This project introduces Diagnobot, an AI-based desktop application for smart motor
troubleshooting and predictive maintenance, developed entirely without reliance on IoT or cloud
technologies. Using Machine Learning and Natural Language Processing (NLP), Diagnobot simulates an
intelligent chatbot that assists users in diagnosing faults and predicting failures in various motor types,
including AC, DC, and stepper motors. The system analyzes offline CRT (Current, Resistance, Temperature)
data to detect fault patterns and offer real-time recommendations.
A Random Forest model, trained on a curated industrial dataset from Kaggle, powers the predictive engine.
The user interface, built with PyQt6, includes features like motor selection, CRT visualizations, fault history
tracking, and CSV-based data management — all accessible offline. Designed for standalone deployment,
Diagnobot provides a cost-effective, secure, and scalable AI solution for industrial maintenance. Future
developments aim to enhance its intelligence and usability while maintaining its offline-first architecture.
Diagnobot also visualizes CRT trends with graphs and stores fault history, all while running completely offline
on a local system. It is a practical, efficient, and user-friendly solution for technicians and industries looking to
modernize their motor maintenance, with no extra setup or cloud dependency, and planned improvements in
smarter interactions and fault learning leave room for further growth.
CHAPTER 1
INTRODUCTION
1.1 Background
The motivation for this project comes from the need for a more proactive approach to
motor maintenance. By integrating predictive algorithms with real-time CRT data and
providing an AI-powered chatbot interface, the solution enables early fault detection and
diagnosis. This allows maintenance teams to take preventative action before a fault
escalates, improving operational efficiency and reducing the risk of motor failure.
Diagnobot uses a conversational AI chatbot to interact with users, offering motor health
insights, troubleshooting suggestions, and predictive maintenance alerts, all without
relying on IoT or cloud-based solutions. This tool is aimed at enhancing motor
performance, reducing downtime, and optimizing maintenance procedures, making it
ideal for industrial environments.
1.2 Problem Statement
The core problem addressed by this project is the need for a smarter, more efficient way
of troubleshooting and maintaining motors. The AI-based Smart Motor Troubleshooting
and Predictive Maintenance Chatbot aims to fill this gap by providing real-time
monitoring, fault prediction, and automated troubleshooting support, all while being
independent of cloud and IoT infrastructure.
This system leverages CRT data analysis, machine learning-based predictive algorithms
(e.g., Random Forest), and natural language processing (NLP) to interactively guide users
in diagnosing motor faults, predicting failures, and suggesting appropriate corrective
actions. The solution operates offline without cloud or IoT dependency and supports user
authentication, data visualization, and motor-specific troubleshooting solutions.
1.3 Objectives
Real-time Motor Data Analysis: To monitor key motor parameters and detect abnormalities such as
overheating, overcurrent, voltage drop, and motor load effects.
Predictive Maintenance: To apply machine learning algorithms (e.g., Random Forest) to predict
potential motor failures before they occur.
AI-based Fault Diagnosis: To create an interactive AI chatbot that offers real-time diagnostic
assistance based on motor data and symptoms.
User-friendly Interface: To design a PyQt6-based user interface that provides easy access to the
motor’s health data, troubleshooting solutions, and maintenance recommendations.
Fault Detection for Various Motor Types: To support various types of motors (DC, AC, Stepper)
and provide motor-specific troubleshooting and solutions.
Local Data Storage: To manage user authentication, motor logs, and diagnostic results via CSV
files, without relying on external cloud services (a brief logging sketch follows this list).
Industry-Grade Deployment: To ensure that the solution is suitable for industrial use, with the
ability to scale and integrate into existing systems.
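To illustrate the CSV-based local storage described above, the sketch below appends a single diagnostic record to a local history file using pandas. The file name and column layout are illustrative assumptions, not the project's actual schema.

import os
from datetime import datetime

import pandas as pd

LOG_FILE = "fault_history.csv"  # hypothetical file name

def log_diagnosis(motor_type, current, resistance, temperature, predicted_fault):
    """Append one diagnostic record to the local CSV history (no cloud involved)."""
    record = pd.DataFrame([{
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        "motor_type": motor_type,
        "current_A": current,
        "resistance_ohm": resistance,
        "temperature_C": temperature,
        "predicted_fault": predicted_fault,
    }])
    # Write the header only when the file does not exist yet.
    record.to_csv(LOG_FILE, mode="a", header=not os.path.exists(LOG_FILE), index=False)

log_diagnosis("DC", current=4.2, resistance=1.8, temperature=78.5,
              predicted_fault="overheating")

Appending one row per diagnosis keeps the history usable even if the application is closed between sessions.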
1.4 Methodology
The project follows a data science workflow, starting with data collection and preprocessing. Key
stages include:
Data preprocessing: Clean, normalize, and structure CRT data; handle missing values, label
motor conditions, and format data for machine learning input.
Model training: Train a Random Forest classifier on preprocessed CRT data to predict motor faults
based on labeled conditions (a training sketch appears at the end of this list).
Evaluation: Assess model performance using accuracy, precision, recall, and confusion matrix on
test data to ensure reliable fault prediction.
Exploratory Data Analysis (EDA): Analyze CRT data through statistical summaries and
visualizations to identify patterns, correlations, and fault indicators in motor behavior.
Feature Engineering: Extract and select key CRT features to improve model accuracy and fault
detection.
Handling Imbalanced Data: To ensure fair learning, we balanced the dataset so the model doesn't
ignore rare but critical motor faults. Techniques like SMOTE or adjusting class weights helped the
model treat all fault types with equal importance.
Model Selection: We chose the Random Forest algorithm for its reliability, ability to handle
complex data, and strong performance in classifying motor fault conditions accurately.
Hyperparameter Tuning: We fine-tuned the model’s settings to boost accuracy and make sure it
predicts motor faults as precisely as possible.
Model Interpretation: We analysed how the model makes decisions to understand which CRT
features most influence motor fault predictions, ensuring transparency and trustworthiness.
Deployment and Integration: The trained model and chatbot were integrated into a user-friendly
desktop app, enabling real-time motor fault diagnosis and maintenance guidance without needing
internet or cloud access.
User Interface: Designed an intuitive, interactive, and visually appealing GUI with PyQt6,
allowing users to easily input data, select motor types, chat with the bot, and view real-time
diagnostic results.
Continuous Learning (Future Scope): In the future, the system could keep learning from new
motor data and user feedback to improve its fault predictions and advice, getting smarter and more
accurate over time.
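The following sketch ties together several of the stages listed above: loading the CRT dataset, handling class imbalance with balanced class weights, training a Random Forest, and evaluating it with a classification report and confusion matrix. The file name and column names are assumptions for illustration; the actual Kaggle dataset schema may differ, and SMOTE (from the imbalanced-learn package) could be used instead of class weighting.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical file and column names; the real dataset schema may differ.
df = pd.read_csv("motor_crt_data.csv")
X = df[["current", "resistance", "temperature"]]
y = df["fault_label"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# class_weight="balanced" is one way to keep rare fault classes from being ignored.
model = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=42
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

# Feature importances give a rough view of which CRT parameter drives predictions.
print(dict(zip(X.columns, model.feature_importances_)))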
1.5 Technologies Used
Programming Language: Python
Deployment: PyInstaller
Data Sources: Industrial Motor Temperature and Fault Detection Dataset - Kaggle
Libraries: scikit-learn (for machine learning algorithms), pandas (for data manipulation),
numpy (for numerical computation), and nltk (for text preprocessing).
VS Code: We used VS Code for developing Diagnobot due to its lightweight, efficient, and
feature-rich environment. It supported Python 3.13, PyQt6 GUI design, and machine learning
integration.
Text Vectorization: TfidfVectorizer from scikit-learn and Word2Vec embeddings (e.g., via the
gensim library) for feature extraction from text data.
Model Evaluation Tools: scikit-learn's metrics module for generating classification reports,
confusion matrices, and ROC-AUC scores.
Data Cleaning: re (regular expressions) for text cleaning and pattern recognition.
Frontend: The frontend of Diagnobot is built using PyQt6, a modern Python GUI framework
that gives the desktop application a professional look and feel (a minimal window sketch follows this list).
Version Control: Git and GitHub for code management, collaboration, and version
tracking.
Environment Management: Visual Studio Code (VS Code) for writing, testing, and
debugging the code.
Deployment: Diagnobot is packaged as a standalone desktop application for Windows using
PyInstaller.
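As a minimal illustration of the PyQt6 frontend mentioned above, the sketch below builds a simplified window with a motor-type selector, a chat log, and a diagnose button. Widget names and the placeholder handler are assumptions; the real Diagnobot UI includes CRT visualizations, history tracking, and chatbot logic not shown here.

import sys

from PyQt6.QtWidgets import (QApplication, QComboBox, QPushButton,
                             QTextEdit, QVBoxLayout, QWidget)

class DiagnobotWindow(QWidget):
    """Simplified offline window: motor selector, chat log, and a diagnose button."""

    def __init__(self):
        super().__init__()
        self.setWindowTitle("Diagnobot (sketch)")

        self.motor_selector = QComboBox()
        self.motor_selector.addItems(["AC", "DC", "Stepper"])

        self.chat_log = QTextEdit()
        self.chat_log.setReadOnly(True)

        self.diagnose_button = QPushButton("Diagnose")
        self.diagnose_button.clicked.connect(self.on_diagnose)

        layout = QVBoxLayout(self)
        layout.addWidget(self.motor_selector)
        layout.addWidget(self.chat_log)
        layout.addWidget(self.diagnose_button)

    def on_diagnose(self):
        motor = self.motor_selector.currentText()
        # A real handler would pass user input and CRT readings to the trained model.
        self.chat_log.append(f"Running diagnosis for {motor} motor...")

if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = DiagnobotWindow()
    window.show()
    sys.exit(app.exec())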
1.6 Scope for Future Work
In the future, Diagnobot can be enhanced with real-time sensor integration using industrial
protocols like Modbus allowing direct monitoring of motor parameters. The chatbot can be
upgraded with deep learning-based NLP models like BERT or GPT for more natural and context-
aware conversations. Additionally, multi-language support can be added to improve accessibility
across different regions. A mobile version of Diagnobot can also be developed to provide
portable motor diagnostics.
Advanced NLP Models: Upgrade the chatbot with deep learning-based language models (such as BERT or GPT) for more natural, context-aware conversations.
Multi-language Support: Expand the chatbot to understand and respond in multiple languages.
Mobile Application: Develop mobile versions for Android/iOS to provide motor diagnostics on
the go.
Integration with ERP Systems: Link Diagnobot with enterprise resource planning tools to
streamline maintenance workflows.
Enhanced Visualizations: Include interactive dashboards and 3D visualizations for better fault
understanding.
Edge AI Deployment: Run Diagnobot on low-power edge devices (e.g., Raspberry Pi) for factory-floor
integration without full PCs.
Fault Knowledge Base Expansion: Automatically update fault libraries using real-time data and
technician feedback.
Modular Plugin System: Allow add-ons for new motor types or custom analysis tools without changing
the core app.
Self-Learning from Technician Feedback: Adapt chatbot responses and accuracy based on how
technicians rate or correct its suggestions.
Security Layer for Industrial Deployment: Implement data encryption, access logs, and role-based
controls for secure industry use.
1.7 Literature Survey and Existing Systems
In the study titled "Fault Diagnosis of Induction Motors Using Artificial Neural
Networks" by Zhang et al. (2019), Artificial Neural Networks (ANN) were employed for
the classification of motor faults. The system focused on analyzing vibration signals and
current signatures to detect abnormalities in motor behavior. Data collection was carried
out through various sensors, and the model required a networked monitoring environment
to gather and process real-time operational data for accurate fault diagnosis.
Reactive Maintenance Approach in Motor Systems:
Reactive maintenance is a traditional strategy where repairs and interventions are
performed only after a motor has failed or shown significant degradation. This method
does not involve regular monitoring or predictive diagnostics, resulting in unexpected
downtime and higher maintenance costs. Since faults are addressed only after failure,
there is no early warning system in place. Reactive maintenance approaches do not
incorporate Artificial Intelligence (AI), predictive models, or Internet of Things (IoT)
technologies, making them inefficient compared to modern smart maintenance
solutions.
Conventional Motor Monitoring Systems with Limited Automation:
Conventional motor monitoring systems in many industries have integrated basic IoT
sensors to track motor parameters such as temperature and current. However, these
systems typically do not include advanced analytics or predictive intelligence. The
collected data is often underutilized, and alerts are generated manually based on threshold
breaches rather than intelligent analysis. Maintenance personnel rely on periodic
inspections and manual logs, with no AI-driven decision support or fault prediction. As a
result, while IoT devices are present, the system lacks automation and operates primarily
on manual alerts, limiting its effectiveness in proactive maintenance
Cloud-Based Systems:
Most modern predictive maintenance systems rely heavily on cloud platforms for data storage,
real-time processing, and large-scale machine learning model deployment. These systems typically
collect sensor data (e.g., vibration, CRT, RPM) from industrial equipment through IoT devices and
transmit it to the cloud for analysis. Tools such as AWS IoT Core, Microsoft Azure Machine
Learning, and Google Cloud AI are commonly used for real-time monitoring, model training,
and dashboard visualization.
IoT-Based Systems:
CHAPTER 2
REQUIREMENT ANALYSIS AND SPECIFICATION
2.1 Introduction
This section outlines the functional and non-functional requirements for the email spam detection system.
The primary objective is to build a reliable, efficient model that classifies emails as spam or not spam with
high accuracy, supporting users in maintaining a clean inbox. Requirement analysis ensures that both user
needs and system constraints are thoroughly addressed: the model should not only achieve high detection
accuracy but also maintain optimal performance under varying loads, be easy to integrate and use, and
support future enhancements and scaling.
2.2 Functional Requirements
Feature Extraction: Extraction of features such as the frequency of certain words, the subject line,
the sender, and other meta-information.
Spam Prediction: Ability to classify new emails as spam or not spam based on the trained model.
Evaluation: Generation of metrics to evaluate the performance of the model.
2.3 Non-Functional Requirements
Low Latency: The system should return classification predictions with a response time under a
specified threshold.
High Accuracy: The model must maintain an accuracy rate of at least 95%, ensuring reliable spam
detection.
Scalability: The API service must handle at least 100 concurrent users without performance
degradation, and it should be easily scalable to support future growth in user demand.
Availability: The system should maintain 99.9% uptime, ensuring consistent service availability
for users.
Security: All communications with the API must be secured using HTTPS, and input data must be
validated and handled securely.
Monitoring and Logging: The system should include monitoring tools and detailed logging to
track usage, performance, and errors for ongoing analysis and troubleshooting.
Data Privacy: The system must comply with applicable data protection regulations (e.g.,
GDPR) and ensure that user email content is processed securely and not stored unnecessarily.
2.4 User Requirements
The system should return clear classifications (spam or not spam) with explanations based on
the model's decision.
Users should be able to input or forward email content for spam classification through a
simple interface.
The system should provide brief, human-readable explanations of the model's decision (e.g., based
on suspicious links or wording).
The system should offer easy integration into popular email clients or services.
The spam detection results should be clearly displayed within the user's email environment.
Users should have an option to manually flag emails as false positives or false negatives.
Users must be assured that their email content is processed securely and that no personal data
is stored unnecessarily.
Spam classification should occur in the background without interrupting the user's regular
email workflow.
Users should be optionally notified when suspicious or potentially harmful emails are detected.
The user interface should comply with accessibility standards (e.g., WCAG) to ensure usability
for all users.
Users should be able to adjust sensitivity thresholds or customize rules (e.g., always marking emails
from a particular sender as spam).
2.5 System Requirements
Hardware Requirements
Minimum RAM: 4 GB for basic model training and evaluation; 8 GB or more recommended for
larger datasets.
Processor: 2.0 GHz dual-core CPU minimum; quad-core CPU preferred for parallel
processing tasks.
Storage: At least 10 GB of available disk space for datasets, model artifacts, logs, and temporary
files.
Software Requirements
Development Tools: Jupyter Notebook or any IDE that supports Python (e.g., VS Code, PyCharm).
Python Libraries: scikit-learn, pandas, numpy, and nltk (as listed in Section 1.5).
Operating System: Any platform with Python 3 support.
The environment must allow integration with authentication protocols (e.g., OAuth2) if needed for
secure access to email services.
CHAPTER 3
DESIGN
3.1 System Architecture
The system follows a modular approach where emails are parsed and preprocessed before being fed into
the machine learning model. The architecture consists of modules for data collection, preprocessing,
feature extraction, model training, evaluation, and deployment:
Data Collection Module: Ingests raw emails.
Preprocessing Module: Cleans and preprocesses data (tokenization, stop word removal, etc.).
Feature Extraction Module: Extracts features such as word counts and subject-line characteristics.
3.2 Component Design
Each component of the system (data collection, preprocessing, training, prediction) is designed to
function independently while interacting cohesively with the other modules.
3.3 Data Flow Design
1. Incoming emails are collected through data ingestion pipelines (e.g., APIs, direct uploads, or IMAP servers).
2. Emails are parsed to extract relevant components such as the subject line, sender information, and body
content; metadata (timestamp, sender address, etc.) is also captured for contextual analysis.
3. Text tokenization breaks the email content into individual words or tokens.
4. Stop word removal and stemming/lemmatization reduce words to their root forms.
5. TF-IDF (Term Frequency–Inverse Document Frequency) quantifies the importance of words; alternatively,
Word2Vec embeddings capture semantic relationships between words.
6. The dataset is split into training and validation sets to ensure the model generalizes well to unseen data.
7. Models such as Naive Bayes, SVM, and Decision Tree are trained with cross-validation to prevent
overfitting (a sketch of this flow appears after this list).
8. New emails are preprocessed, converted into feature vectors, and passed to the trained model for
prediction; the outcome is stored for monitoring.
3.4 Machine Learning Model Design
The system uses supervised learning algorithms. Three models are tested:
Naive Bayes: Works well for text classification tasks due to its simplicity.
Support Vector Machine (SVM): Effective for binary classification with high-dimensional data.
LSTM (Long Short-Term Memory): A deep learning model that captures sequential patterns for improved semantic understanding (a minimal sketch follows).
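A minimal sketch of such a sequential model, using the Keras API, is shown below. The vocabulary size, sequence length, and layer sizes are illustrative assumptions, and the two example texts merely demonstrate the tokenization and padding steps.

from tensorflow.keras.layers import LSTM, Dense, Embedding
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer

texts = ["free prize click now", "see you at the meeting"]  # placeholder examples
labels = [1, 0]  # 1 = spam, 0 = not spam

# Map words to integer indices and pad every email to the same length.
tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(texts)
sequences = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=50)

model = Sequential([
    Embedding(input_dim=5000, output_dim=64),  # word index -> dense vector
    LSTM(64),                                  # captures sequential word patterns
    Dense(1, activation="sigmoid"),            # spam probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

print(model.predict(sequences))  # untrained output, shown only to confirm the pipeline runs
# model.fit(sequences, labels, epochs=5, batch_size=32)  # training on the full dataset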
3.5 UI Design
If a user interface is included, it should be intuitive, responsive, and user-friendly to facilitate easy
interaction with the spam detection system. A simple web-based (HTML) interface could allow users to
paste an email and receive a classification result, showing incoming emails flagged as Spam or Not Spam
together with a confidence score.
3.6 Use Case Diagrams
Actors: End User, Emails.
Description of Components:
Frontend (HTML/JavaScript): email input area with a Submit button.
CHAPTER 4
IMPLEMENTATION
4.1 Data Handling
Careful data handling is crucial for building a high-performance spam detection model. The dataset
used undergoes several preprocessing and transformation steps to ensure the quality, relevance, and
efficiency of the model. Feature extraction techniques such as TF-IDF or word frequency then convert
the cleaned text into numerical features.
Text Cleaning
Stopword Removal: Common English stopwords (e.g., "the", "is", "and") are removed to reduce noise
and keep the model focused on meaningful terms.
Tokenization: The email content is broken into individual words (unigrams), phrases (bigrams/trigrams),
or tokens using tools like NLTK or spaCy.
Stemming: Reduces words to their root form (e.g., “running” → “run”) using algorithms like the
Porter Stemmer.
Lemmatization: Converts words to their dictionary form considering context (e.g., “better” → “good”).
Obfuscation Handling: Common obfuscations used by spammers (e.g., "Fr£e", "C1ick h3re") are removed
or normalized using regex-based replacements (a cleaning sketch follows the figure below).
Fig. 4: Data processing module
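The sketch below applies the cleaning steps described above with NLTK and regular expressions: lowercasing, a couple of illustrative obfuscation rules, punctuation stripping, tokenization, stopword removal, and stemming. The regex rules are simplified examples; a production filter would use a much larger set.

import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# One-time downloads of the NLTK resources used below.
nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

STOP_WORDS = set(stopwords.words("english"))
STEMMER = PorterStemmer()

def clean_email(text):
    """Lowercase, normalise simple obfuscations, tokenise, drop stopwords, stem."""
    text = text.lower()
    # Very rough obfuscation normalisation; real rules would be far more extensive.
    text = re.sub(r"fr[e£3]+e", "free", text)
    text = re.sub(r"c[l1]ick", "click", text)
    text = re.sub(r"[^a-z\s]", " ", text)  # strip punctuation and digits
    tokens = word_tokenize(text)
    return [STEMMER.stem(t) for t in tokens if t not in STOP_WORDS]

print(clean_email("Fr£e prize!! C1ick here to claim your reward"))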
4.2 Feature Engineering
Text data is converted into numerical vectors using TF-IDF and Word2Vec embeddings. Additional
features are derived as well, such as email length, since spam emails often differ in length from regular
emails. Hyperparameters of the vectorizer and classifier are tuned with GridSearchCV (a brief sketch follows).
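A brief sketch of such tuning is shown below: a TF-IDF plus linear SVM pipeline is searched over a small parameter grid. The grid values are illustrative, and the fit call assumes labelled training texts are available.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LinearSVC()),
])

# Small illustrative grid over vectorizer and classifier settings.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1")
# search.fit(train_texts, train_labels)          # labelled training data assumed
# print(search.best_params_, search.best_score_)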
4.3 Model Training
The models are trained using labeled datasets (spam vs. not spam). For each model, the dataset is split
into a training set and a testing set, and the model is then fitted on the training set.
Fig. 5: LSTM training model
4.4 Evaluation
Metrics include Accuracy (95.7%), Precision (96.2%), Recall (94.8%), and F1-Score (95.5%). ROC
curves demonstrate excellent model discrimination ability.
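A small helper along these lines, using scikit-learn's metrics module, is sketched below. The arrays passed in the example call are dummy values purely to show the call signature; the real values come from the held-out test set.

import numpy as np
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

def report(y_true, y_pred, y_score):
    """Print the standard evaluation summary for a binary spam classifier."""
    print(classification_report(y_true, y_pred, target_names=["not spam", "spam"]))
    print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
    print("ROC-AUC:", roc_auc_score(y_true, y_score))

# Dummy values; in practice these come from the trained model on the test set.
report(np.array([0, 1, 1, 0]),
       np.array([0, 1, 0, 0]),
       np.array([0.2, 0.9, 0.4, 0.1]))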
4.5 Deployment
A Flask API is developed to serve predictions, and the model is containerized for easy deployment.
The model can be deployed on a web-based platform, allowing users to paste an email and receive an
immediate classification.
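A minimal sketch of such a Flask endpoint is shown below. The model file name and route are assumptions; it presumes the trained TF-IDF plus classifier pipeline was serialized beforehand with joblib.

import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
# The serialized pipeline (vectorizer + classifier) is assumed to have been saved
# beforehand, e.g. with joblib.dump(pipeline, "spam_model.joblib").
model = joblib.load("spam_model.joblib")

@app.route("/predict", methods=["POST"])
def predict():
    """Classify a single email body posted as JSON: {"text": "..."}."""
    text = request.get_json(force=True).get("text", "")
    label = int(model.predict([text])[0])
    return jsonify({"spam": bool(label)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)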
CHAPTER 5
SUMMARY
This chapter summarizes the project's main findings, including the effectiveness of the chosen machine
learning algorithms for spam detection. The developed Email Spam Detection System achieved strong
accuracy and adaptability through the integration of machine learning and NLP techniques, outperforming
traditional static filters by adapting to evolving spam strategies. The project demonstrates the practical
application of an end-to-end machine learning pipeline, from data preprocessing to real-time deployment.
Future work includes real-time adaptive learning, better handling of adversarial examples, and full-scale
cloud deployment.
Conclusion
The Email Spam Detection System developed in this project illustrates the significant advantages of
applying machine learning and natural language processing to a real-world problem. By leveraging a
combination of supervised learning algorithms and robust text processing techniques, the system
achieved high accuracy in identifying and filtering spam emails. Unlike traditional static filters, the
model adapts to new and evolving spam tactics, making it a resilient and scalable solution.
This project not only highlights the technical feasibility of building an end-to-end machine learning
pipeline—from data collection and preprocessing to model training and deployment—but also
emphasizes its effectiveness in practical applications. The results affirm the value of intelligent,
adaptive systems in cybersecurity contexts.
Looking forward, the system can be further enhanced through continuous learning mechanisms,
stronger adversarial robustness, and seamless integration into cloud-based infrastructures. These
improvements will ensure that the system remains responsive, reliable, and scalable in increasingly
complex email environments.
CHAPTER 6
REFERENCES
1. Rennie, J., Shih, L., Teevan, J., & Karger, D. (2003). "Tackling the Poor Assumptions of Naive
Bayes Text Classifiers." Proceedings of the 25th annual international ACM SIGIR conference
on Research and development in information retrieval, 616-617.
2. Joachims, T. (1998). "Text Categorization with Support Vector Machines: Learning with Many
Relevant Features." Proceedings of the European Conference on Machine Learning, 137-142.
3. Androutsopoulos, I., et al. "An experimental comparison of naive Bayesian and keyword-based
anti-spam filtering."
4. Guzella, T. S., & Caminhas, W. M. "A review of machine learning approaches to spam filtering."
5. Devlin, J., et al. "BERT: Pre-training of Deep Bidirectional Transformers for Language
Understanding."
7. Almeida, T. A., Hidalgo, J. M. G., & Yamakami, A. "Content-based SMS spam filtering."
8. Sahami, M., Dumais, S., Heckerman, D., & Horvitz, E. (1998). A Bayesian Approach to
Filtering Junk E-Mail. Learning for Text Categorization: Papers from the 1998
Workshop.
9. Delany, S. J., Buckley, M., & Greene, D. (2005). SMS Spam Filtering: Methods and
Data. Expert Systems with Applications, 39(10), 9899-9908.
10. Bhowmick, A., & Hazarika, S. M. (2012). Machine Learning for E-mail Spam
Filtering: Review, Techniques and Trends. arXiv preprint arXiv:1211.1044.
11. Hidalgo, J. M. G., Bringas, G. C., Sánz, E. P., & García, F. C. (2006). Content-Based SMS
Spam Filtering. Proceedings of the 2006 ACM Symposium on Document Engineering.
12. Almeida, T. A., & Hidalgo, J. M. G. (2011). A New Collection of SMS Spam Filtering.
UCI Machine Learning Repository.
13. Islam, R., & Abawajy, J. (2013). A Multi-Tier Phishing Detection and Filtering
Approach. Journal of Network and Computer Applications, 36(1), 324–335.
14. Blanzieri, E., & Bryl, A. (2008). A Survey of Learning-Based Techniques of Email
Spam Filtering. Artificial Intelligence Review, 29(1), 63–92.
15. Mohtasseb, H., & Ahmed, T. (2012). SMS Spam Filtering using Neural Networks.
International Journal of Computer Applications, 58(12).
16. Zhang, L., Zhu, J., & Yao, T. (2004). An Evaluation of Statistical Spam Filtering
Techniques. ACM Transactions on Asian Language Information Processing (TALIP), 3(4),
243–269.