This document is a configuration manual for Muhammad Imran Shaikh's MSc research project on data analytics. It details the system configuration including hardware of an Intel Core i5 processor with 16GB RAM and software including Microsoft Office 365 and Python coding libraries. It describes the project development process involving data extraction, preprocessing, and implementation of recommendation engine models including content-based filtering, collaborative filtering, and matrix factorization. Evaluation of models is done using k-fold cross validation and leave one out cross validation to optimize hyperparameters and accuracy.


Configuration Manual

MSc Research Project


Data Analytics

Muhammad Imran Shaikh


Student ID: x17119308

School of Computing
National College of Ireland

Supervisor: Dr. Muhammad Iqbal


National College of Ireland
Project Submission Sheet
School of Computing

Student Name: Muhammad Imran Shaikh


Student ID: x17119308
Programme: Data Analytics
Year: 2020
Module: MSc Research Project
Supervisor: Dr. Muhammad Iqbal
Submission Due Date: 17/08/2020
Project Title: Configuration Manual
Word Count: 1458
Page Count: 12

I hereby certify that the information contained in this (my submission) is information
pertaining to research I conducted for this project. All information other than my own
contribution will be fully referenced and listed in the relevant bibliography section at the
rear of the project.
ALL internet material must be referenced in the bibliography section. Students are
required to use the Referencing Standard specified in the report template. To use other
authors' written or electronic work is illegal (plagiarism) and may result in disciplinary
action.

Signature:

Date: 25th September 2020

PLEASE READ THE FOLLOWING INSTRUCTIONS AND CHECKLIST:

Attach a completed copy of this sheet to each project (including multiple copies).
Attach a Moodle submission receipt of the online project submission to each project
(including multiple copies).
You must ensure that you retain a HARD COPY of the project, both for your own
reference and in case a project is lost or mislaid. It is not sufficient to keep a copy
on computer.

Assignments that are submitted to the Programme Coordinator office must be placed
into the assignment box located outside the office.

Office Use Only


Signature:

Date:
Penalty Applied (if applicable):
Configuration Manual
Muhammad Imran Shaikh
x17119308

1 Introduction
This configuration manual documents the system setup and the software and hardware
compatibility required to run the code that underpins the research project and report.
The manual covers System Configuration, Project Development, Code Implementation,
and Experiments with different machine learning models.

2 System Configuration
2.1 Hardware
Processor: 3rd Generation Intel Core i5-3320M (2.6 GHz, 3 MB L3 cache, 2 cores), up to
3.30 GHz; RAM: 16 GB; System type: 64-bit OS; Graphics: NVIDIA Quadro K2000M with
2 GB dedicated DDR3; Operating system: Windows 10 Pro (2019)

2.2 Software
Microsoft Office 365: Microsoft Word (for all professional written material), Microsoft
Excel (for storing the dataset in CSV and Excel formats and for visualization), and
Microsoft PowerPoint (presentation slides).
Python programming language: loading libraries, data cleaning, data preprocessing
and engineering, initial data analysis, train/test splitting, model implementation,
hyperparameter tuning, and evaluation. Python IDEs: Jupyter Notebook and PyCharm.

3 Project Development
The main steps involved in the project development phase are: selection of a suitable
IDE (Jupyter Notebook) for the coding tasks; loading the required libraries; data
cleaning (checking for null values and imputing with aggregations); data preprocessing
and engineering (grouping and joining datasets, merging data, describing columns,
removing unnecessary features, and removing pipes by splitting the Genre strings); and
initial data visualization (word cloud, several bar charts).
A separate class is prepared for the movie dataset to implement the recommendation
system techniques (content-based filtering and collaborative filtering), and the data is
split into training and test sets to fit the various models of our recommendation engine.
Top-N movie results are obtained from the recommendation machine learning models using
different techniques. Model hyperparameters are tuned to extract the best parameters for
optimal results. The models are evaluated with K-fold cross-validation and LOO (Leave
One Out) cross-validation over several splits to obtain better accuracy results, and
multiple evaluation plots are drawn for visual analysis.

3.1 Data Extraction and Pre-processing


The dataset was collected and generated by the GroupLens Research Group. The dataset
(ml-latest-small)1 consists of almost 100k ratings by different users, with over 1,200
movie tags across 9,125 movies; 'ml' stands for MovieLens. Each selected user had rated
at least 20 movies. Four files are included in this dataset: 'movies.csv', 'ratings.csv',
'tags.csv', and 'links.csv', but for recommendation purposes we consider only two of
them, 'movies.csv' and 'ratings.csv'. The entire coding is done in the Python programming
language. Various Python libraries are imported depending on the recommendation technique
being implemented. All recommendation system models are imported from the Surprise
library, a Python library dedicated to recommender systems, as can be seen in Figure 1.
Data preprocessing and engineering are achieved by grouping and joining

Figure 1: Importing python libraries

datasets, merging data, describing columns, deleting unnecessary features, removing
pipes by splitting the Genre strings, and creating a function that counts the number of
times each genre appears, as can be seen in Figure 2.
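The genre-counting step can be sketched as follows. This is a minimal illustration assuming pandas and the pipe-separated `genres` column of `movies.csv`; the rows shown here are synthetic stand-ins, not the project's data.

```python
import pandas as pd
from collections import Counter

# Synthetic stand-in for movies.csv (pipe-separated genres, as in MovieLens)
movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Heat (1995)"],
    "genres": ["Adventure|Animation|Comedy",
               "Adventure|Children|Fantasy",
               "Action|Crime|Thriller"],
})

def count_genres(df):
    """Split the pipe-separated genres column and count each genre's occurrences."""
    counts = Counter()
    for entry in df["genres"]:
        counts.update(entry.split("|"))
    return counts

genre_counts = count_genres(movies)
```

The resulting counts feed directly into the bar charts and word cloud described above.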

4 Implementation of Recommendation Engine Machine Learning Models
Our recommendation engine is tested with three recommendation system techniques
(content-based filtering, collaborative filtering, and matrix factorization), producing
Top-N movie results for users and, in content-based filtering, for movies as well. A
separate Movie class is generated to combine both the movies and ratings CSV files by
grouping them on user and movie IDs, as can be seen in the figure. Different machine
learning models are implemented with their defined parameters to obtain better
recommendations. The Surprise library, a Python library dedicated to recommender
systems, has been utilized to implement the machine learning models for our
recommendation engine (Figure 1). The following steps were taken to create and evaluate
our recommendation engine.

1 https://grouplens.org/datasets/movielens/latest/

Figure 2: Create a function that counts the number of times each genre appears

4.1 Experiment with Movielens Dataset Analysis


The analysis of the MovieLens dataset is based on merging the two dataset files,
'movies.csv' and 'ratings.csv', as an inner join on movie IDs (see figure). With the
help of this merged dataset, we visualize the movie genres using a word cloud and a
histogram to analyze which genres are most popular, as can be seen in Figure 3 and
Figure 4. Moreover, the top 25 movies with the highest ratings are also plotted to
analyze which movies are rated highest by different users (Figure 5).
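The merge and the per-title rating aggregation behind these plots can be sketched with pandas. The two DataFrames below are synthetic stand-ins for `movies.csv` and `ratings.csv`; the actual project loads the full MovieLens files.

```python
import pandas as pd

# Synthetic stand-ins for movies.csv and ratings.csv
movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Heat (1995)"],
    "genres": ["Adventure|Animation|Comedy",
               "Adventure|Children|Fantasy",
               "Action|Crime|Thriller"],
})
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "movieId": [1, 2, 1, 3, 2],
    "rating": [4.0, 3.5, 5.0, 2.0, 4.5],
})

# Inner join on movieId, as described above
merged = movies.merge(ratings, on="movieId", how="inner")

# Mean rating per title, sorted to surface the highest-rated movies
top_rated = (merged.groupby("title")["rating"]
             .mean()
             .sort_values(ascending=False))
```

Plotting `top_rated` (e.g. with a horizontal bar chart) reproduces the kind of "top movies by rating" view described above.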

4.2 Experiment with Content Based Filtering


Content-based filtering is based on a user's interest in different items: given that
interest, similar items are recommended to the user. The recommendations become more
accurate as the user provides more input. In our content-based recommendation engine, we
find the 10 nearest neighbours of a movie of interest by applying the KNNBaseline
algorithm with the Pearson-baseline similarity metric (Figure 6).

Figure 3: Word cloud to analyze which Genre are the popular ones

Figure 4: Histogram to analyze which Genre are the popular ones

Figure 5: Top 25 movies with highest ratings

Figure 6: Content based filtering technique by using KNNBaseLine algorithm

4.3 Experiment with User and Item (Memory) Based Collaborative Filtering
User and item-based collaborative filtering is one of the most extensively used
techniques in recommendation systems. It works by finding a group of similar users who
have reacted similarly to the item of interest. A rating matrix is created to find
similar users and items based on the ratings given by the users. The KNNWithMeans
machine learning algorithm, with the cosine similarity metric, is utilized to get the
Top-10 nearest-neighbour movies for a specific user (Figure 7, Figure 8).

Figure 7: User based collaborative filtering technique by using KNNWithMean algorithm

Figure 8: Item based collaborative filtering technique by using KNNWithMean algorithm

4.4 Experiment with Matrix Factorization (Model) Based Collaborative Filtering
Matrix factorization, or model-based collaborative filtering, is a dimensionality
reduction technique, similar in spirit to Principal Component Analysis (PCA). Matrix
factorization breaks the large user-item matrix down into smaller matrices: the hidden
features are captured by latent factors derived from the item and user row and column
matrices. In our matrix factorization method, we implement two matrix factorization
algorithms, SVD (Singular Value Decomposition) and SVD++ (SVDpp), to get Top-10
recommendation results, shown in Figure 9 and Figure 10.

Figure 9: Matrix Factorization technique by using SVD algorithm

Figure 10: Matrix Factorization technique by using SVD++ algorithm

5 Experiment with Models Hyperparameter Tuning


For model hyperparameter tuning we use GridSearchCV from the Python Surprise library,
which provides the best parameters for obtaining optimal results from our machine
learning models when training on the dataset. The main parameters considered for the KNN
and matrix factorization based algorithms are different K values, the number of epochs,
the learning rate, the similarity options, and the accuracy measures (RMSE and MAE), as
can be seen in Figure 11 and Figure 12.

Figure 11: Hyperparameter tuning with GridSearchCV for different params of collaborative filtering models

Figure 12: Hyperparameter tuning with GridSearchCV for different params of Matrix
Factorization models

6 Experiment with Models Evaluation


For model cross-validation, we use two cross-validators to test the accuracy of our
recommendation engine models across different splits: the K-Fold cross-validator and the
LOOCV (Leave One Out) cross-validator, as can be seen in Figure 13 and Figure 14.

Figure 13: Models Evaluation with K-fold Cross Validation

Figure 14: Models Evaluation with LOO(Leave One Out) Cross Validation

