INDUSTRIAL INTERNSHIP TRAINING
REPORT
Submitted by
RUDHRA S N
GOPIKA S
In partial fulfillment for the award of the degree
of
BACHELOR OF TECHNOLOGY
in
Information Technology
St. XAVIER’S CATHOLIC COLLEGE OF ENGINEERING
(An Autonomous college)
Chunkankadai, Nagercoil – 629 003
ANNA UNIVERSITY : CHENNAI 600025
NOVEMBER 2024
ST. XAVIER’S CATHOLIC COLLEGE OF ENGINEERING
(An Autonomous Institution)
Chunkankadai, Nagercoil – 629 003
BONAFIDE CERTIFICATE
Certified that this Industrial Internship Training titled “DATA SCIENCE” undergone at
“PERSONIFWY, BANGALORE” is the bonafide work of “RUDHRA S N (962221205045),
GOPIKA S (962221205025)” who carried out the training work under my supervision. Certified
further that to the best of my knowledge the work reported herein does not form part of any other
project report or dissertation on the basis of which a degree or award was conferred on an earlier
occasion on this or any other candidate.
Period of Training : 17/06/2024 – 17/07/2024
SIGNATURE
Dr. G. Sahaya Stalin Jose, M.E., Ph.D.,
HEAD OF THE DEPARTMENT
Assistant Professor,
Department of Information Technology
St. Xavier’s Catholic College of Engineering
Chunkankadai-629003
SIGNATURE
Dr. G. Geo Jenefer, M.E., Ph.D.,
INTERNSHIP COORDINATOR
Assistant Professor,
Department of Information Technology
St. Xavier’s Catholic College of Engineering
Chunkankadai-629003
Submitted for the Industrial Internship Training viva-voce examination held at St. Xavier’s
Catholic College of Engineering on ………………..
Examiner Examiner Examiner
INTERNSHIP CERTIFICATE
DECLARATION
I, RUDHRA S N (962221205045), hereby declare that the internship entitled “DATA SCIENCE”
undergone at “PERSONIFWY, BANGALORE”, being submitted in partial fulfilment of the
requirements for the award of the Degree of “BACHELOR OF TECHNOLOGY”, is the original work
carried out by me. It has not formed part of any other project work submitted for the award of
any degree or diploma, either in this or any other Institution.
Period of Training : 17/06/2024 – 17/07/2024
Place : Chunkankadai Signature :
Date : Name :
Reg. No :
DECLARATION
I, GOPIKA S (962221205025), hereby declare that the internship entitled “DATA SCIENCE”
undergone at “PERSONIFWY, BANGALORE”, being submitted in partial fulfilment of the
requirements for the award of the Degree of “BACHELOR OF TECHNOLOGY”, is the original work
carried out by me. It has not formed part of any other project work submitted for the award of
any degree or diploma, either in this or any other Institution.
Period of Training : 17/06/2024 – 17/07/2024
Place : Chunkankadai Signature :
Date : Name :
Reg. No :
CO-PO & PSO MAPPING
COURSE OUTCOMES
CO1: Describe industry practices, processes, techniques, technology, automation and other
core aspects of the software industry
CO2: Analyze and Design solutions to complex business problems
CO3: Build and deploy solutions for target platform
CO4: Prepare technical reports and presentations
Mapping of Course Outcomes to Programme Outcomes
Course Outcomes  PO1  PO2  PO3  PO4  PO5   PO6  PO7  PO8  PO9  PO10  PO11  PO12
CO1              3    3    3    3    2     2    -    1    2    2     1     1
CO2              3    3    3    3    2     2    -    1    2    2     1     1
CO3              3    3    3    3    2     2    -    1    2    2     1     1
CO4              1    1    1    1    1     -    -    1    2    3     -     -
Average          2.5  2.5  2.5  2.5  1.75  1.5  -    1    2    2.25  0.75  0.75
Mapping of Course Outcomes to Programme Specific Outcomes
Course Outcomes  PSO1  PSO2  PSO3
CO1              3     3     3
CO2              3     3     3
CO3              3     3     3
CO4              3     3     3
Average          3     3     3
EXECUTIVE SUMMARY
Personifwy is an organization known for its data-driven solutions and its strong digital presence.
Its services include machine learning algorithms, predictive modeling, and business intelligence
systems, and it delivers customized solutions that fit clients' requirements in terms of accuracy
and creativity.
My internship with this company concentrated on data science using Python and various machine
learning libraries, and it made for a stimulating four-week stay at Personifwy. The initial weeks
were spent in intensive training, where I was exposed to modern data analysis methods and
industry best practices. This foundational training sharpened my skills and prepared me for
projects outside the classroom. By the end of the internship, I had not only applied those skills
in practice but also contributed to producing useful insights and solutions for the company.
When the training was over, I worked on two projects. Best Ad Predictor uses data analysis to
predict the best-performing ads, while SMS Spam Detection classifies messages as spam or ham
(legitimate messages).
During my work on the Best Ad Predictor project, I focused on data preparation, including
gathering, cleaning, and preprocessing ad performance data. I then engaged in feature engineering
to create relevant features influencing ad success. The core tasks involved developing predictive
models using Upper Confidence Bound (UCB) and Thompson Sampling. Once trained and optimized,
these models were deployed in the ad-serving platform, enhancing ad targeting and significantly
impacting the company’s marketing strategies.
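The two selection strategies named in this paragraph can be illustrated with a minimal Python sketch. It is not the code used in the project: the click-through rates are invented, clicks are simulated with a random number generator, and the function names `ucb_select_ads` and `thompson_select_ads` are hypothetical. UCB shows the ad with the highest average reward plus an exploration bonus, while Thompson Sampling draws a plausible click rate for each ad from a Beta distribution and shows the ad with the highest draw.

```python
import math
import random


def ucb_select_ads(true_ctrs, n_rounds, seed=0):
    """Simulate UCB1 ad selection; return (times each ad was shown, total clicks).

    true_ctrs is a hypothetical list of click probabilities, one per ad.
    """
    rng = random.Random(seed)
    n_ads = len(true_ctrs)
    counts = [0] * n_ads   # times each ad was shown
    clicks = [0] * n_ads   # clicks received by each ad

    for t in range(1, n_rounds + 1):
        if t <= n_ads:
            # Show every ad once so each has an initial estimate.
            ad = t - 1
        else:
            # Average reward plus an exploration bonus that shrinks
            # as an ad is shown more often.
            ad = max(
                range(n_ads),
                key=lambda i: clicks[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        counts[ad] += 1
        if rng.random() < true_ctrs[ad]:
            clicks[ad] += 1
    return counts, sum(clicks)


def thompson_select_ads(true_ctrs, n_rounds, seed=0):
    """Simulate Thompson Sampling with a Beta(1, 1) prior on each ad."""
    rng = random.Random(seed)
    n_ads = len(true_ctrs)
    successes = [0] * n_ads  # clicks per ad
    failures = [0] * n_ads   # impressions without a click per ad

    for _ in range(n_rounds):
        # Draw one plausible click rate per ad from its Beta posterior.
        draws = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_ads)
        ]
        ad = draws.index(max(draws))
        if rng.random() < true_ctrs[ad]:
            successes[ad] += 1
        else:
            failures[ad] += 1
    counts = [s + f for s, f in zip(successes, failures)]
    return counts, sum(successes)


# Invented click-through rates, for illustration only.
simulated_ctrs = [0.05, 0.10, 0.30]
ucb_counts, ucb_clicks = ucb_select_ads(simulated_ctrs, n_rounds=5000)
ts_counts, ts_clicks = thompson_select_ads(simulated_ctrs, n_rounds=5000)
print("UCB:", ucb_counts, ucb_clicks)
print("Thompson Sampling:", ts_counts, ts_clicks)
```

In this simulation both strategies end up showing the third ad, which has the highest simulated click rate, far more often than the other two.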
For the SMS Spam Detection project, I collected and preprocessed SMS data, extracted features,
and converted text into numerical format. I trained and optimized classification models to
differentiate spam from ham messages and deployed the final model for real-time classification.
This effort improved spam detection accuracy and contributed to more efficient message filtering.
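The pipeline described in this paragraph (text converted to numbers, then a classifier separating spam from ham) can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the project's model: it uses raw word counts rather than TF-IDF, a hand-written multinomial Naive Bayes classifier, and six invented training messages. The names `tokenize`, `train_naive_bayes`, and `predict` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train_naive_bayes(messages, labels):
    """Count words per class; return (word counts, class counts, vocabulary)."""
    word_counts = {label: Counter() for label in set(labels)}
    class_counts = Counter(labels)
    for text, label in zip(messages, labels):
        word_counts[label].update(tokenize(text))
    vocab = set()
    for counter in word_counts.values():
        vocab.update(counter)
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior for the message."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Start from the log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy messages, for illustration only.
train_messages = [
    "Win a free prize now, claim your cash",
    "Free entry: text WIN to claim your reward",
    "Urgent! You have won cash, call now",
    "Are we still meeting for lunch today?",
    "Can you send me the notes from class?",
    "I will call you when I get home",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = train_naive_bayes(train_messages, train_labels)
print(predict(model, "Claim your free cash prize"))    # spam
print(predict(model, "See you at lunch after class"))  # ham
```

A real data set would need far more messages and a held-out test split before any accuracy figure could be reported.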
My internship with Personifwy was educational: it showed me how to use data science to solve
practical problems and to produce conclusions and solutions in line with the company’s goals.
This experience not only improved my technical skills but also demonstrated how decisions can
be made based on data.
ACKNOWLEDGEMENT
First and foremost, we express our gratitude to the Almighty for His presence and abundant
grace in granting us the knowledge, wisdom, and strength to complete this internship
successfully.
We are thankful to our Correspondent Rev. Fr. S. Godwin Selva Justus, for facilitating the
resources needed to complete our internship, and to our Principal Dr. J. Maheswaran, M.E.,
Ph.D., for his continuous support and encouragement.
We are especially indebted to our Head of the Department Dr. G. Sahaya Stalin Jose, M.E.,
Ph.D., for his guidance and support during our internship journey.
We are immensely grateful to Personifwy for providing this valuable internship opportunity.
Our heartfelt thanks go to our project guide Mr. Uppendra Navada, for his exceptional
guidance, insightful feedback, and continuous support, which were invaluable in helping us
understand the complexities of our projects.
Our sincere thanks to Dr. G. Geo Jenefer, M.E., Ph.D., our Internship Coordinator, for giving
us innovative ideas, motivation, and wholehearted encouragement in completing our internship
successfully.
We extend our heartfelt gratitude to Dr. G. Geo Jenefer, M.E., Ph.D., and Er. T. M. Angelin
Monisha Sharean, M.E., for their invaluable mentorship. Their guidance and support greatly
enriched our learning experience during the internship.
We shall gratefully acknowledge all suggestions received for further improvement in the
project.
LIST OF FIGURES
Fig.No. Title of Figures Page no.
4.1 Sample datum in the data set 10
4.2 Visualization before and after balancing 11
4.3 Enhancing model’s ability 11
4.4 Countplot for sentence containing Currency Symbol 11
4.5 Distribution of Spam vs Ham words 12
4.6 Confusion Matrix of Decision Tree vs Naïve Bayes 12
4.7 Real time Example 12
4.8 Upper Confidence Bound (UCB) 15
4.9 Thompson Sampling 15
LIST OF ABBREVIATIONS
Abbreviation Full Form
UCB Upper Confidence Bound
TS Thompson Sampling
TF-IDF Term Frequency-Inverse Document Frequency
NLP Natural Language Processing
NLTK Natural Language Toolkit
Re Regular Expressions (Python's re module)
TABLE OF CONTENTS
CHAPTER NO. TITLE PAGE NO.
Certificate ii
Company Certificate iii
Declaration iv
CO-PO & PSO Mapping vi
Executive Summary vii
Acknowledgement viii
List of Figures ix
List of Abbreviation x
1. Introduction 1
1.1 Background 1
1.2 Objectives 1
1.3 Scope 2
2. Profile 3
2.1 Industry profile 3
2.2 Industry Culture 4
3. Internship Details 5
3.1 Internship Duration, Department & Industry Supervisor Details 5
3.2 Responsibilities 5
4. Project 7
4.1 Work 1: SMS Spam classification 7
4.1.1 Project Overview 7
4.1.2 Technologies Used 7
4.1.3 Project Implementation 8
4.1.4 Challenges Faced 9
4.1.5 Key Takeaways from the project 10
4.1.6 Output Screenshots 10
4.2 Work 2: Best Ad predictor 13
4.2.1 Project Overview 13
4.2.2 Technologies Used 13
4.2.3 Project Implementation 13
4.2.4 Challenges Faced 14
4.2.5 Key Takeaways 14
4.2.6 Output Screenshots 15
5. Skills Acquired 16
5.1 Technical Skills 16
5.2 Soft Skills 16
6. Conclusion 17
References 18