
PROJECT REPORT

Book Recommendation System

Submitted in partial fulfillment of the

requirement for the award of the degree of

MASTER OF COMPUTER APPLICATION

Session 2024-2025
In
DATA ANALYTICS
Submitted By

DEEPAK YADAV
Admis No. 23SCSE2160014

DEEPAK SINGH
Admis No. 23SCSE2160043

Under the guidance of

Dr. Mohd Arif
Assistant Professor

School of Computing Science & Engineering

Galgotias University
November 2024
SCHOOL OF COMPUTER APPLICATION AND
TECHNOLOGY

GALGOTIAS UNIVERSITY,
GREATER NOIDA

CANDIDATE’S DECLARATION

I/We hereby certify that the work which is being presented in the project entitled “Book Recommendation System”, in partial fulfillment of the requirements for the award of the MCA (Master of Computer Application), submitted in the School of Computer Application and Technology of Galgotias University, Greater Noida, is an original work carried out during the period from August 2024 to January 2025 under the supervision of Dr. Mohd Arif, Department of Computer Science and Engineering/School of Computer Application and Technology, Galgotias University, Greater Noida.

The matter presented in the thesis/project/dissertation has not been submitted by me/us for the award of any other degree of this or any other institution.

Deepak Yadav (23SCSE2160014)

Deepak Singh (23SCSE2160043)

This is to certify that the above statement made by the candidates is correct to the best of my knowledge.

Dr. Mohd Arif

Asst. Professor

CERTIFICATE

This is to certify that the work in the project report entitled “Book Recommendation System” by Deepak Yadav and Deepak Singh, bearing Enrolment Nos. 23SCSE2160014 and 23SCSE2160043, is a bona fide record of project work carried out by them under my supervision and guidance in partial fulfilment of the requirements for the award of the degree of Master of Computer Application in the School of Computer Science and Engineering, Galgotias University, Greater Noida. Neither this project nor any part of it has been submitted for any degree or academic award elsewhere.

Dr. Mohd Arif

Assistant Professor

ACKNOWLEDGEMENT

My journey through the design and development of this project has finally come to an end, and I would like to express my heartfelt gratitude to everyone who contributed to its successful completion. I am deeply grateful to my project guide, Dr. Mohd Arif, for their invaluable guidance, support, and expertise throughout the development of the “Book Recommendation System” project. I am also thankful to my friends and family for their endless support and motivation. Despite the challenges faced, I learned a lot and had fun throughout this journey. This project would not have been possible without the relentless support and encouragement of everyone involved.

Date: 27 January 2024

Deepak Singh

Deepak Yadav

ABSTRACT

Recommender systems are one of the most successful and widespread applications of machine learning. The primary aim of this project is to build a book recommender system that provides personalized book recommendations for each user. The machine learning algorithms in recommender systems are typically classified into two categories: content-based and collaborative filtering. Both methods have their pros and cons. Content-based filtering requires rich descriptions of items and a well-organized user profile before recommendations can be made, whereas collaborative filtering suffers from the cold-start problem. Hybrid methods combine the best of both, but they tend to be complex. The goal of this project is to implement a hybrid recommender system in its simplest form without losing much recommendation quality. Furthermore, it is important to account for both implicit and explicit feedback from the user. To this end, the sentiment of book reviews is analyzed and combined with the rating and the clicks on a particular book to generate a weighted mean rating. This rating is used in the user-item interaction matrix to generate recommendations by matrix factorization. Thus, the objective is to eliminate the need for high-quality descriptive item data and extensive information about a user in order to make relevant recommendations.

The system evaluation results show that the system's recommendations are accurate. They also show that a recommender system combining content-based and collaborative filtering provides more accurate recommendations for the book domain.
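A minimal sketch of the weighted mean rating described above, in Python. The weights, the 0-10 scale, the click cap and the name `combined_rating` are hypothetical choices for illustration. The report does not specify the exact weighting it used.

```python
def combined_rating(star_rating, sentiment, clicks,
                    weights=(0.6, 0.3, 0.1), max_clicks=10):
    """Blend explicit and implicit feedback into one score on a 0-10 scale.

    Hypothetical sketch: star_rating is on 0-10, sentiment is a positive-review
    probability in [0, 1], and clicks are capped at max_clicks.
    """
    sentiment_score = sentiment * 10                         # map [0, 1] onto 0-10
    click_score = min(clicks, max_clicks) / max_clicks * 10  # cap, then map onto 0-10
    w_rating, w_sentiment, w_clicks = weights
    total = (w_rating * star_rating
             + w_sentiment * sentiment_score
             + w_clicks * click_score)
    return total / (w_rating + w_sentiment + w_clicks)       # weighted mean


# A reader rated a book 8/10, wrote a clearly positive review and opened it 5 times.
print(round(combined_rating(8, 0.9, 5), 2))  # 8.0
```

Normalising every signal to the same scale before weighting keeps any single signal, such as a very large click count, from dominating the blended rating.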

Signature of Student Signature of Guide

List of Figures

Figure 1: Distribution of Books by Rating
Figure 2: RMSE Formula
Figure 3: Ratings of Books per User
Figure 4: Dataset

Table of Contents
CHAPTER 1
1. INTRODUCTION
1.1. User-based approach
1.2. Item-based approach
1.3. Researchers
CHAPTER 2
2. OBJECTIVES
CHAPTER 3
3. LITERATURE REVIEW
CHAPTER 4
4. DATA SOURCES AND PRE-PROCESSING
4.1. System Architecture
4.2. RMSE Model
4.3. Sentiment Analysis
CHAPTER 5
5. IMPLEMENTATION DETAILS
5.1. Development and Deployment Details
5.2. Algorithm
5.2.1. Content-Based Filtering
5.2.2. Collaborative-Based Filtering
5.2.3. Hybrid Filtering Method
5.2.4. Dataset Description
5.2.5. Loading Data
5.2.6. Preprocessing Data
5.2.7. Exploratory Data Analysis
5.2.8. Content-based filtering / popularity-based filtering
5.2.9. Collaborative-based filtering approach
5.3. Accuracy
CHAPTER 6
6. CONCLUSIONS AND FUTURE WORK
6.1. Overview
6.2. Achievements
6.3. Limitations
CHAPTER 7
7. EXPERIMENTATION AND RESULTS
7.1. Testing
7.2. Experimental Results
CHAPTER 8
References

CHAPTER 1

1. INTRODUCTION
Current recommendation approaches such as content-based filtering and collaborative filtering use different information sources to make recommendations. Content-based filtering makes recommendations based on user preferences for product features: in this case the genre of the books, the authors, the ratings, etc. Collaborative filtering models user-to-user recommendations as a weighted linear combination of other users' preferences. Both methods have limitations.

Content-based filtering recommends new items by using one or more sources of item data to find the best matches. Similarly, collaborative filtering requires a large dataset of active users who have previously rated the product in order to make accurate predictions. The combination of these different recommender systems is called a hybrid system, and it allows item properties to be combined with other users' preferences. Existing services such as Goodreads personalize recommendations and use weighted averages to provide an overall rating for a book. This greatly enhances the value of each recommendation, as it takes into account the user's individual book preferences. Our recommendation engine employs a collaborative, social-networking-style approach that mixes the user's tastes with those of the wider community to produce meaningful results. A complete recommender system (RS) contains three main components: user resources, item resources, and the recommendation algorithm. The user model analyzes consumer interests, and the item model analyzes the properties of items in a similar way. The system then matches the consumer's characteristics with the item's characteristics and uses the recommendation algorithm to estimate which items to recommend. The performance of this algorithm largely determines the performance of the entire system.

In memory-based CF, book ratings are used directly to predict unknown ratings for new books. This method can be divided into two types: a user-based approach and an item-based approach.

1.1. User-based approach:

User-based collaborative filtering looks for users who are similar to the target user. It predicts which items the target user might like from the ratings given to those items by similar users who have interacted positively with them.
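The user-based approach can be sketched with NumPy as below. The sketch makes four assumptions:

- the input is a small users-by-items rating matrix in which 0 means "not rated";
- similarity between users is the cosine of their rating rows;
- a rating is predicted as the similarity-weighted average over the k most similar users who rated the item;
- the function names and the toy matrix are hypothetical.

```python
import numpy as np


def cosine_sim(a, b):
    """Cosine similarity between two rating vectors (0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def predict_user_based(R, user, item, k=2):
    """Predict R[user, item] from the k most similar users who rated the item.

    R is a users-by-items matrix in which 0 means "not rated".
    """
    raters = [v for v in range(R.shape[0]) if v != user and R[v, item] > 0]
    neighbours = sorted(((cosine_sim(R[user], R[v]), v) for v in raters),
                        reverse=True)[:k]
    num = sum(sim * R[v, item] for sim, v in neighbours)
    den = sum(abs(sim) for sim, _ in neighbours)
    return num / den if den else 0.0


# Toy ratings: 4 users x 4 books on a 1-5 scale.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 3, 1],
    [1, 1, 5, 4],
    [5, 4, 4, 0],
], dtype=float)
print(predict_user_based(R, user=0, item=2))
```

Treating missing ratings as 0 inside the cosine is a simplification. Mean-centred (Pearson) similarity is a common alternative.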
1.2. Item-based approach:
This method determines the similarity between the items the user has already rated and the target item. The most similar items are selected. The predicted value is then calculated as the weighted average of the user's ratings for those similar items.
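An item-based counterpart, again a sketch on a hypothetical toy matrix. Item-item cosine similarity is computed from the columns of the matrix. The prediction is the similarity-weighted average of the user's own ratings on the k items most similar to the target item.

```python
import numpy as np


def item_similarity(R):
    """Item-by-item cosine similarity computed from the columns of R."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0   # avoid dividing by zero for items nobody rated
    unit = R / norms          # scale every item column to unit length
    return unit.T @ unit


def predict_item_based(R, user, item, k=2):
    """Weighted average of the user's ratings on the k items most similar to `item`."""
    S = item_similarity(R)
    rated = [j for j in range(R.shape[1]) if j != item and R[user, j] > 0]
    top = sorted(rated, key=lambda j: S[item, j], reverse=True)[:k]
    num = sum(S[item, j] * R[user, j] for j in top)
    den = sum(abs(S[item, j]) for j in top)
    return num / den if den else 0.0


R = np.array([
    [5, 4, 0, 1],
    [4, 5, 3, 1],
    [1, 1, 5, 4],
    [5, 4, 4, 0],
], dtype=float)
print(predict_item_based(R, user=0, item=2))
```

Item similarities change slowly compared with user profiles. For that reason they are often precomputed once and reused for many predictions.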
1.3. Researchers
Researchers have combined various recommendation techniques to obtain accurate and rapid recommendations; such combinations are called hybrid recommender systems. Some hybrid recommendation approaches are:

• Run the techniques separately and combine their results.

• Add some content-based filtering rules to the collaborative filtering approach.

• Add some collaborative filtering principles to content-based recommendations.

• Build a single recommender that uses both content-based and collaborative filtering.

In addition, research on semantic-based, contextual, cross-language, cross-domain, and peer-to-peer methods is in progress.

CHAPTER 2

2. OBJECTIVES
The primary goal of this project is to build a recommender system that provides personalized book recommendations to users. A second goal is to build a website where users can give feedback on books and get recommendations accordingly. The concrete functional objectives of this project are as follows:

• To account for all three forms of user feedback (star ratings, reviews and clicks) when generating recommendations.

• To develop a recommender system that uses book metadata (author, genre and description) to find similarities among books and thereby address the cold-start problem.

• To build a website with decent authentication where users can provide feedback on books and get personalized recommendations.

CHAPTER 3

3. LITERATURE REVIEW
It is vital to account for every piece of feedback from the user in order to deliver quality recommendations. Often, reviews of a particular item are given lower priority than direct ratings. Business analysts of online reviews believe that reviews directly shape marketing strategies as well as purchasing decisions. Horrigan carried out two surveys on the impact of opinion mining on individuals, society and organizations, among American Internet users in 2000 and 2008. The key findings of these surveys are stated below:
• 81% of Internet users research a product online at least once before purchasing it.
• 20% of Internet users perform such research on a typical day.
• Consumers report being willing to pay from 20% to 99% more for a 5-star-rated product than for a 4-star-rated product.

Isinkaye et al. described the characteristics and potential of different prediction techniques in recommendation systems, to serve as a compass for research and practice in the field. The collaborative filtering technique filters recommendations using user behavior in the form of ratings. It generates a rating for an item the user has not rated, based on the commonalities among users and their ratings. Recommendation quality improves as the rating dataset grows. This technique suffers from the cold-start problem for new users and new items.

This is because a user with few ratings is difficult to categorize, and an item with few ratings cannot easily be recommended. Also, users with unusual tastes will not get recommendations that match their expectations, because it is difficult to find their correlation with other users.

The content-based technique uses item features and user preferences to provide recommendations. In this technique, item features such as keywords are used to describe items, while user preferences indicate the items liked by the user. It recommends items that are similar to the items preferred by the user. Because it relies on user preferences and item features, it cannot recommend items from other genres that the user might like.

Knowledge-based methods mainly use the path between concepts in knowledge resources to indicate their semantic similarity. Lin and Resnik measured semantic similarity by the information content of the least common subsumer of the two concepts. Instead of directly exploiting the graph distance in knowledge resources, several studies constructed concept vectors from a set of ontology properties to measure concept semantic similarity.

Corpus-based methods assume that words with similar meanings often occur in similar contexts. Latent Semantic Analysis (LSA) represents words as compact vectors via singular value decomposition (SVD) of the corpus matrix. GloVe reduced computational cost by training directly on the non-zero elements of the corpus matrix. Turney measured semantic similarity with point-wise mutual information. The Web can also be regarded as a large corpus: the Google Distance uses the number of web pages in which words co-occur to measure concept semantic similarity. Mikolov proposed a neural-network word embedding approach that captures the semantics of words occurring within a fixed-size window.

A challenge in determining the semantic similarity of short texts is how to go from word-level semantics to short-text-level semantics. A direct approach uses a weighted sum of word-to-word semantic similarities; another uses greedy word alignment to form the sentence semantic similarity. Banea et al. measured the semantic similarity between text snippets infused with opinion knowledge. D. Ramage et al. constructed a concept graph from the words of each text snippet, then measured the similarity between the two concept graphs. More recently, the SemEval conference released the Semantic Textual Similarity (STS) task for the growing field of short-text semantics. Most teams adopted regression models to predict the similarity score, exploiting lexical and syntactic features. The paragraph vector, which is similar to word embedding, has also been proposed with a neural network to measure the semantic similarity between short texts.

Research directly related to document semantic similarity is rare and underdeveloped. Traditional similarity approaches such as TF-IDF convert documents to vectors via word counting and measure the similarity between documents by the similarity of their vectors. Academic articles can be regarded as semi-structured texts, which contain many structured annotations besides plain text, so the similarity between academic articles can be measured with the help of this annotated information. Martin et al. fused structured information, such as authors and keywords, with traditional text-based measures of academic article similarity. Other work regards articles and their citations as an information network. LDA can also be used in the semantic analysis of long documents.

Muhammad Rafi defined a similarity measure based on topic maps for the document clustering task. Documents are transformed into topic maps based on coded knowledge, and the similarity between a pair of documents is represented as the correlation between their common patterns. M. Zhang et al. enriched documents with hidden topics from external corpora. They measured document similarity in a text classification task by the similarity of the topic distributions. A hybrid technique combines two or more recommendation techniques to make recommendations.
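To make the TF-IDF idea concrete, the plain-Python sketch below turns short book descriptions into TF-IDF vectors and compares them with cosine similarity. Several simplifications apply:

- the descriptions are invented;
- whitespace tokenisation stands in for a real tokenizer;
- IDF uses the plain `log(N / df)` form, whereas libraries such as scikit-learn use smoothed variants.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn each document into a sparse {term: tf-idf weight} dict."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n_docs / df[t])
                        for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


descriptions = [
    "a young wizard attends a school of magic",
    "a wizard and a dragon battle with magic",
    "a guide to stock market investing",
]
vecs = tfidf_vectors(descriptions)
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # True
```

A word such as "a" that appears in every description gets an IDF of zero, so it contributes nothing to the similarity.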

CHAPTER 4

4. DATA SOURCES AND PRE-PROCESSING

Data is the fuel of any machine learning algorithm. To bring the model to life, a meaningful dataset is needed for the algorithms to work on. This project contains two major parts: the recommender system and sentiment analysis. The data fed into the network differs in nature between these two parts, so data collection and preprocessing are explained separately for each.
4.1. Book recommender system
The project plans to develop a system where users can give ratings and add reviews themselves. Even so, a properly labeled dataset is necessary for the initial training of the models. Unlike for movies, good datasets are not abundant for books. After some research, a few good data sources were found, and this project merged the following two publicly available datasets.
4.1.1. Book Crossing Dataset
This collection consists of 278,858 users providing 1,149,780 ratings on 271,379 books. Duplicate entries and missing values were removed from the dataset. The data showed a long-tail distribution in both the number of ratings per book and the number of ratings per user. Books with fewer than 15 ratings and users who had provided fewer than 15 ratings were therefore removed.
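A pandas sketch of the cleaning described above. It assumes a ratings table with hypothetical `user_id`, `isbn` and `rating` columns. Dropping sparse books can push some users below the threshold, and vice versa. This version therefore repeats the filter until nothing changes. The report does not state whether it filtered once or iteratively.

```python
import pandas as pd


def clean_ratings(ratings, min_count=15):
    """Drop duplicates and missing values, then sparse books and users until stable."""
    ratings = ratings.drop_duplicates().dropna()
    while True:
        # Per-row count of how many ratings this row's book and user have.
        book_n = ratings["isbn"].map(ratings["isbn"].value_counts())
        user_n = ratings["user_id"].map(ratings["user_id"].value_counts())
        kept = ratings[(book_n >= min_count) & (user_n >= min_count)]
        if len(kept) == len(ratings):   # nothing removed in this pass: done
            return kept
        ratings = kept


# Toy data: one duplicate row, one missing ISBN, and a small threshold of 2.
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3, 3, 1, 4],
    "isbn":    ["b1", "b2", "b1", "b2", "b1", "b3", "b1", None],
    "rating":  [8, 7, 6, 9, 5, 4, 8, 3],
})
print(clean_ratings(toy, min_count=2))
```

In the toy run, removing book b3 leaves user 3 with a single rating. The second pass then removes that user as well.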
4.1.2. Surprise Library
Creating a recommendation system for a library is an innovative way to enhance user
experience by suggesting books that align with their interests while introducing delightful
surprises. The process involves gathering data on users’ reading habits, selecting an
appropriate recommendation technique (such as collaborative filtering or content-based
filtering), and building a model that balances relevance with novelty. Evaluation metrics
like precision, recall, and diversity help ensure quality, while user feedback and A/B
testing guide improvements. The goal is to create a system that feels intuitive and
engaging, encouraging users to discover hidden gems and explore new genres in the
library.

The following figures visualize the basic structure of the dataset obtained after merging.

Figure 4.1: Distribution of books by rating

4.2. RMSE Model

Root Mean Square Error (RMSE) is a popular metric for evaluating the performance of
regression models. RMSE measures the average magnitude of the errors between
predicted and actual values, making it ideal for assessing how well a model captures
patterns in continuous data. It is calculated as the square root of the mean of the squared
differences between predicted and actual values.

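The formula is:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value and $n$ is the number of predictions.

Figure 4.2: RMSE formula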

A lower RMSE indicates a better fit, with values close to zero representing highly accurate
predictions. RMSE penalizes larger errors more heavily, making it sensitive to outliers.
It's commonly used in recommendation systems, forecasting, and any application where
precise predictions are valuable.
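A direct Python translation of the standard RMSE formula. The first call is a simple worked example. The second pair of calls shows the outlier sensitivity mentioned above. Both prediction sets have the same total absolute error, but the one with a single large error gets twice the RMSE.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


print(round(rmse([4, 3, 5], [3.5, 3, 4]), 3))  # 0.645

# Same total absolute error (4), but one large miss doubles the RMSE.
print(rmse([0, 0, 0, 0], [1, 1, 1, 1]))  # 1.0
print(rmse([0, 0, 0, 0], [0, 0, 0, 4]))  # 2.0
```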

Figure 4.1 shows the number of books at each rating value. The original dataset contains rating values from zero to ten, with a rating of zero signifying that the book has not been rated. The figure shows that the dataset contains a large number of books that have not been rated.

Figure 4.3 shows the distribution of ratings per user. The number of users who have not rated books is high, and the count decreases as the rating value increases.

Figure 4.3: Ratings of books per user
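The two distributions discussed above can be computed with pandas along these lines: the number of ratings at each value, and the number of ratings given by each user. The toy table and its column names are hypothetical. As in the dataset described above, a rating of 0 marks a book that has not been explicitly rated.

```python
import pandas as pd

# Toy ratings table; column names and values are hypothetical.
ratings = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 3, 3],
    "isbn":    ["b1", "b2", "b3", "b1", "b4", "b2", "b3", "b4"],
    "rating":  [0, 8, 0, 10, 0, 7, 5, 0],
})

# Figure 4.1-style view: how many ratings fall on each value from 0 to 10.
per_value = ratings["rating"].value_counts().sort_index()

# Separate unrated interactions (rating 0) from explicit ratings (1-10).
explicit = ratings[ratings["rating"] > 0]
implicit = ratings[ratings["rating"] == 0]

# Figure 4.3-style view: number of explicit ratings given by each user.
per_user = explicit.groupby("user_id").size()
print(per_value.to_dict())
print(per_user.to_dict())
```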

4.3. Sentiment Analysis
For the sentiment analyzer, the deep learning framework PyTorch was used, which provides access to the Large Review Dataset for educational purposes. Along with the review dataset, GloVe, open-sourced by Stanford, was used to obtain vector representations of words. Word vectors are obtained by mapping words into a meaningful space in which the distance between words is related to their semantic similarity. GloVe is trained on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations exhibit linear substructures of the word vector space. The following steps were carried out before the data was fed to the network.
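GloVe vectors are distributed as plain text: one word per line, followed by its vector components. The sketch below parses that format and compares words with cosine similarity. The three 3-dimensional vectors are made up for illustration; real GloVe files typically have 50 to 300 dimensions. The function names are hypothetical.

```python
import io

import numpy as np


def load_glove(lines):
    """Parse GloVe text format: a word followed by its vector components."""
    vectors = {}
    for line in lines:
        word, *values = line.rstrip().split(" ")
        vectors[word] = np.asarray(values, dtype=np.float32)
    return vectors


def cosine(u, v):
    """Cosine similarity: near 1 for similar directions, negative for opposite ones."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Made-up 3-dimensional vectors standing in for a real GloVe file.
toy_file = io.StringIO(
    "good 0.9 0.1 0.3\n"
    "great 0.85 0.15 0.35\n"
    "awful -0.8 0.2 0.1\n"
)
vecs = load_glove(toy_file)
print(cosine(vecs["good"], vecs["great"]) > cosine(vecs["good"], vecs["awful"]))  # True
```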

4.3.1. Data Split

The dataset provided by PyTorch comes pre-split into training and test sets. Splitting is necessary because testing a model on the same data that was used to train it does not give a true indication of how well the model performs.
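The review data came pre-split, but the same principle applies to any data that does not, such as the ratings data. Below is a minimal, reproducible hold-out split in plain Python. The helper name `holdout_split`, the 80/20 ratio and the seed are arbitrary, hypothetical choices. scikit-learn offers `sklearn.model_selection.train_test_split` for the same purpose.

```python
import random


def holdout_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` reproducibly and hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # fixed seed -> same split every run
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = holdout_split(range(100))
print(len(train), len(test))  # 80 20
```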

4.3.2. Data preprocessing
Data preprocessing is a data mining technique that involves transforming raw data into an understandable format. Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviors or trends, and is likely to contain many errors. Data preprocessing is a proven method of resolving such issues and prepares raw data for further processing.

This is the basic step after data collection. The available datasets may not match the needs of the project, so they have to be modified according to its requirements. During this step, missing values, outliers and other misrepresentations in the datasets may need to be handled. The text reviews contain many elements that are not needed and are redundant for this project. These include punctuation (commas, full stops and other symbols) and different inflected forms of words, for example past or future tense. For this, NLTK, a Python library for Natural Language Processing, was used.
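The cleaning described above can be sketched with the standard library alone. The stopword set below is a small hypothetical list. With NLTK, `nltk.corpus.stopwords.words("english")` supplies a full list after the stopwords corpus is downloaded. A stemmer such as `nltk.stem.PorterStemmer` can reduce inflected forms like past tenses to a common stem.

```python
import re

# Small hypothetical stopword list for the demo.
STOPWORDS = {"a", "an", "and", "i", "is", "it", "of", "the", "this", "to", "was"}


def clean_review(text):
    """Lower-case a review, strip HTML tags, punctuation and digits, drop stopwords."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)    # remove tags such as <br />
    text = re.sub(r"[^a-z\s]", " ", text)   # keep letters and whitespace only
    return [tok for tok in text.split() if tok not in STOPWORDS]


print(clean_review("This book was GREAT!<br />I loved it, 10/10."))  # ['book', 'great', 'loved']
```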

4.3.3. Data cleaning

Data cleaning is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. It involves identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data. Since the data provided by PyTorch was already curated, there were no faults to be cleaned.

4.3.4. Data reduction

Data reduction is the transformation of numerical or alphabetical digital information derived empirically or experimentally into a corrected, ordered, and simplified form. The basic concept is the reduction of large amounts of data down to the meaningful parts. This step is also useful for testing the operation and efficiency of machine learning algorithms: the sentiment analyzer and the recommender model were first tested on reduced datasets to make computation faster.
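One way to build such reduced datasets is to sample users rather than individual rows, so that each sampled user's rating history stays complete. The helper name and columns below are hypothetical, and the report does not say how its reduced datasets were drawn.

```python
import pandas as pd


def sample_users(ratings, n_users, seed=42):
    """Keep every rating of a random subset of users (hypothetical helper)."""
    users = ratings["user_id"].drop_duplicates().sample(n=n_users, random_state=seed)
    return ratings[ratings["user_id"].isin(users)]


# Toy data: 10 users with 3 ratings each.
ratings = pd.DataFrame({
    "user_id": [u for u in range(10) for _ in range(3)],
    "isbn": [f"b{i}" for i in range(30)],
    "rating": [5] * 30,
})
small = sample_users(ratings, n_users=3)
print(small["user_id"].nunique(), len(small))  # 3 9
```

Sampling rows uniformly would instead leave many users with only one or two ratings, which distorts the similarity computations the reduced dataset is meant to test.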

4.3.5. Collaborative Filtering

Collaborative Filtering (CF) is a popular recommendation technique that predicts a user’s
interests by analyzing preferences or behaviors of similar users. There are two main types of
collaborative filtering:

1. User-Based Collaborative Filtering: Recommends items that users with similar preferences have liked or interacted with. For example, if two users share similar book interests, books liked by one user are recommended to the other.

2. Item-Based Collaborative Filtering: Suggests items based on the similarity between the items themselves, often recommending items that are frequently liked together. For instance, if many users read two books together, it is common to recommend one of the books to someone who has read the other.

CF requires a large dataset of user interactions and performs best with active user data. It’s
widely used in platforms like e-commerce and streaming services, where it helps discover
new and relevant items based on shared user preferences.
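The "read together" idea in item-based filtering can be illustrated by counting how often two books appear in the same user's history. This is a simplified co-occurrence sketch with invented reading histories, not a full similarity model.

```python
from collections import Counter
from itertools import combinations


def co_read_counts(histories):
    """Count, for every pair of books, how many users have read both."""
    pairs = Counter()
    for books in histories.values():
        for a, b in combinations(sorted(books), 2):
            pairs[(a, b)] += 1
    return pairs


def also_read(book, pairs, top_n=2):
    """Books most often read together with `book`, best first."""
    scores = Counter()
    for (a, b), count in pairs.items():
        if a == book:
            scores[b] += count
        elif b == book:
            scores[a] += count
    return [title for title, _ in scores.most_common(top_n)]


histories = {
    "u1": {"Dune", "Foundation", "Hyperion"},
    "u2": {"Dune", "Foundation"},
    "u3": {"Dune", "Emma"},
    "u4": {"Foundation", "Hyperion"},
}
print(also_read("Dune", co_read_counts(histories)))  # 'Foundation' first: 2 users read both
```

Raw counts favour very popular books. Real systems usually normalise them, for example with cosine or Jaccard similarity.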

CHAPTER 5
5. IMPLEMENTATION DETAILS
A recommendation engine is a class of machine learning system that offers relevant suggestions to customers. Before recommendation systems existed, people mostly relied on suggestions from friends when deciding what to buy. Now Google knows what news you will read, and YouTube knows what type of videos you will watch, based on your search history, watch history, or purchase history.

A recommendation system helps an organization create loyal customers and build trust by offering them the products and services they came to the site for. Today's recommendation systems are powerful enough to handle new customers who visit the site for the first time. They recommend products that are currently trending or highly rated, and they can also recommend the products that bring maximum profit to the company.

5.1 DEVELOPMENT AND DEPLOYMENT DETAILS

Machine learning is widely used for prediction, and many algorithms that can be used for prediction are available in various libraries. In this project, we build a prediction model on historical data using different machine learning algorithms and classifiers. We plot the results and calculate the accuracy of the model on the test data.
Building and training a model using various algorithms on a large dataset is one part of the work. Using these models within applications is the second part of deploying machine learning in the real world.
To use the model to make predictions on new data, we have to deploy it over the internet so that the outside world can use it. This report describes how we trained a machine learning model and created a web application around it using Flask.

The required libraries are installed with pip: pip install pandas numpy scikit-learn

5.2 ALGORITHM

5.2.1. Content-based Filtering: The algorithm recommends products that are similar to those the user has already watched. In simple words, this algorithm looks for look-alike items. For example, a person who likes to watch Sachin Tendulkar's shots may also like watching Ricky Ponting's shots, because the two videos have similar tags and similar categories.
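The tag-based reasoning in this example can be expressed with Jaccard similarity between tag sets: the number of shared tags divided by the number of all distinct tags. The catalogue and tags below are invented for illustration.

```python
def jaccard(a, b):
    """Shared tags divided by all distinct tags (0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_similar(liked, catalog, top_n=2):
    """Items whose tags overlap most with the tags of the liked item."""
    scores = [(jaccard(catalog[liked], tags), name)
              for name, tags in catalog.items() if name != liked]
    scores.sort(reverse=True)
    return [name for _, name in scores[:top_n]]


catalog = {
    "Sachin Tendulkar shots": {"cricket", "batting", "highlights"},
    "Ricky Ponting shots": {"cricket", "batting", "highlights"},
    "Test match recap": {"cricket", "highlights"},
    "Pasta recipe": {"food", "cooking"},
}
print(recommend_similar("Sachin Tendulkar shots", catalog))
# ['Ricky Ponting shots', 'Test match recap']
```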

5.2.2. Collaborative-based Filtering: Collaborative filtering recommender systems are based on past interactions between users and target items. In simple words, we search for look-alike customers and offer products based on what their look-alikes have chosen. For example, let X and Y be two similar users, where X has watched movies A, B, and C, and Y has watched movies B, C, and D. We then recommend movie A to Y and movie D to X.
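The X/Y example can be reproduced in a few lines. Here two users count as look-alikes when they share at least two items. This is a hypothetical rule chosen only to keep the sketch small.

```python
def recommend_from_lookalike(target, other, min_common=2):
    """Recommend what `other` has watched but `target` has not, if the two look alike.

    Two users count as look-alikes here when they share at least `min_common`
    items (a hypothetical rule; real systems use a similarity score such as cosine).
    """
    if len(target & other) < min_common:
        return []
    return sorted(other - target)


x = {"A", "B", "C"}
y = {"B", "C", "D"}
print(recommend_from_lookalike(y, x))  # ['A']  -> recommend A to Y
print(recommend_from_lookalike(x, y))  # ['D']  -> recommend D to X
```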

5.2.3. Hybrid filtering method: This is a combination of both of the above methods. It is a more complex model that recommends products based on the user's own history as well as on users similar to them. Some organizations use this method. For example, Facebook shows news that is important to you and to others in your network, and LinkedIn does the same.
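A common and simple way to combine the two approaches is a weighted blend of their scores. The sketch assumes both recommenders return scores in [0, 1]. The weight `alpha` and the function name are hypothetical. This is a generic weighted hybrid, not a reproduction of the report's own hybrid (a feedback-weighted rating matrix with matrix factorization).

```python
def hybrid_scores(content_scores, cf_scores, alpha=0.5):
    """Blend two {book: score in [0, 1]} dicts; a missing score counts as 0."""
    books = set(content_scores) | set(cf_scores)
    blended = {
        b: alpha * content_scores.get(b, 0.0) + (1 - alpha) * cf_scores.get(b, 0.0)
        for b in books
    }
    return sorted(blended.items(), key=lambda kv: kv[1], reverse=True)


content = {"A": 0.9, "B": 0.2}   # "A" is a new book: no ratings, only metadata
cf = {"B": 0.8, "C": 0.6}
print([book for book, _ in hybrid_scores(content, cf)])  # ['B', 'A', 'C']
```

Book "A" has no collaborative score at all, yet it still ranks second. This is how a hybrid softens the cold-start problem for new items.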

5.2.4. Dataset description

The dataset has three files, extracted from book-selling websites.

• Books: the first file contains all the information related to books, such as author, title, and publication year.
• Users: the second file contains registered users' information, such as user ID and location.
• Ratings: the third file records which user gave what rating to which book.

Based on these three files, we can build a collaborative filtering model. Let's get started.

5.2.5. Loading data

Let us start by importing the libraries and loading the datasets. Loading the files raises a few problems:
• The values in the CSV files are separated by semicolons, not commas.
• Some lines are malformed, so pandas cannot parse them and raises an error.
• The files are encoded in Latin-1 rather than UTF-8.

We therefore handle these issues while loading the data; pandas then prints warnings that identify the malformed lines it skipped.
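A minimal loading sketch with pandas, assuming pandas 1.3 or later (which added the `on_bad_lines` parameter of `pandas.read_csv`). The in-memory sample below is an illustrative stand-in for one of the files, with made-up ISBNs; with the real data you would pass the file path instead of the `BytesIO` buffer.

```python
import io

import pandas as pd


def load_bx_csv(source):
    """Load one semicolon-separated, Latin-1 encoded file, skipping malformed rows."""
    return pd.read_csv(
        source,
        sep=";",              # values are separated by semicolons, not commas
        encoding="latin-1",   # the files are Latin-1, not UTF-8
        on_bad_lines="warn",  # skip rows that cannot be parsed and report them
        dtype=str,            # keep identifiers such as ISBNs with leading zeros as text
    )


# Illustrative rows in the same layout; the third line has an extra field.
SAMPLE = (
    '"ISBN";"Book-Title";"Book-Author"\n'
    '"0000000001";"Sample Title";"Sample Author"\n'
    '"0000000002";"Another Title";"Another Author";"extra"\n'
    '"0000000003";"Germinal";"Émile Zola"\n'
).encode("latin-1")

books = load_bx_csv(io.BytesIO(SAMPLE))
print(books)
```

Using `on_bad_lines="skip"` instead drops malformed rows silently. Reading every column as text is a deliberate choice here; numeric columns such as ratings can be converted afterwards.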

5.2.6. Preprocessing data: The books file contains extra columns, such as image URLs, that are not required for our task, so we drop them. We also rename the columns of each file, because the original names contain separators and uppercase letters; cleaner names are easier to use.

The dataset is large enough to be considered reliable: it contains data on 271,360 books, approximately 278,000 registered users, and about 1.1 million (11 lakh) ratings.

Our goal is not simply to measure how similar users or books are in isolation. Instead, suppose user A has read and liked books x and y, and user B has also liked these two books. If user A now reads and likes a book z that B has not read, we should recommend z to user B. This is what collaborative filtering does.

To achieve this we use a matrix-based approach such as matrix factorization. The first step is to create one matrix whose index (rows) is the books, whose columns are the users, and whose values are the ratings; in other words, we create a pivot table.
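With pandas, the book-user matrix can be built with `DataFrame.pivot_table`. The sketch assumes a ratings table that already has hypothetical `user_id`, `title` and `rating` columns (for example after merging ratings with book titles); the toy rows are made up.

```python
import pandas as pd

ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "title": ["Book X", "Book Y", "Book X", "Book Z", "Book Y"],
    "rating": [8, 6, 9, 7, 5],
})

# Rows (index) are books, columns are users, values are ratings.
# Books a user has not rated become NaN, which we replace with 0.
book_user = ratings.pivot_table(index="title", columns="user_id", values="rating").fillna(0)
print(book_user)
```

Filling missing entries with 0 is a common simplification that treats "not rated" as a rating of 0. On the full dataset the matrix is mostly zeros, so converting it to a sparse format such as `scipy.sparse.csr_matrix` can save memory.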

Using all the books and all the users for modeling creates a problem. Many users have only registered on the website or have read only one or two books, and we cannot rely on such users to recommend books to others, because we need to extract knowledge from the data. We therefore reduce the number of users and books: we keep only users who have rated at least 200 books, and only books that have received at least 50 ratings.
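The thresholds can be applied with `value_counts`. In this sketch users are filtered first and the book counts are then taken over the remaining users' ratings; this order is one reasonable reading of the text, not necessarily the project's exact procedure. Column names are hypothetical, and the demo uses tiny thresholds so that the toy data survives.

```python
import pandas as pd


def filter_active(ratings: pd.DataFrame, min_user_ratings: int = 200,
                  min_book_ratings: int = 50) -> pd.DataFrame:
    """Keep users with enough ratings, then books with enough ratings from those users."""
    user_counts = ratings["user_id"].value_counts()
    active_users = user_counts[user_counts >= min_user_ratings].index
    ratings = ratings[ratings["user_id"].isin(active_users)]

    book_counts = ratings["title"].value_counts()
    popular_books = book_counts[book_counts >= min_book_ratings].index
    return ratings[ratings["title"].isin(popular_books)]


toy = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2, 3],
    "title": ["A", "B", "C", "A", "B", "D", "A"],
    "rating": [5, 7, 8, 6, 9, 4, 10],
})
print(filter_active(toy, min_user_ratings=3, min_book_ratings=2))
```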

5.2.7. Exploratory data analysis

The primary goal of EDA is to examine the data before drawing any conclusions. It can help detect obvious errors, reveal patterns in the data, identify outliers or anomalous events, and uncover interesting relationships between variables.
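A few quick EDA checks on the ratings table might look like this sketch: the number of ratings, users and books, the count of missing values (obvious errors), the distribution of rating values (patterns), and the number of ratings per user, whose maximum can reveal outliers. Column names and rows are hypothetical.

```python
import pandas as pd


def summarize_ratings(ratings: pd.DataFrame) -> dict:
    """Collect a few basic EDA statistics for a ratings table."""
    per_user = ratings.groupby("user_id").size()   # number of ratings per user
    return {
        "n_ratings": len(ratings),
        "n_users": ratings["user_id"].nunique(),
        "n_books": ratings["isbn"].nunique(),
        "missing_values": int(ratings.isna().sum().sum()),
        "rating_counts": ratings["rating"].value_counts().sort_index().to_dict(),
        "median_ratings_per_user": float(per_user.median()),
        "max_ratings_per_user": int(per_user.max()),
    }


ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2, 3],
    "isbn": ["a", "b", "a", "c", "d", "a"],
    "rating": [5, 8, 8, 0, 10, 5],
})
print(summarize_ratings(ratings))
```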

5.2.8. Content-based / popularity-based filtering

Popularity-based filtering is a type of recommendation system that works on the principle of popularity, that is, whatever is currently trending. Such systems identify the products or movies that are trending or most popular among users and recommend them directly.
In our popularity-based approach, we compute the average rating of each book and recommend the books with the highest averages. To compute these averages, we merge the ratings dataset with the books dataset.
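The merge-and-average step could be sketched as follows, assuming hypothetical `isbn`, `title` and `rating` columns and made-up rows. The `min_ratings` parameter is an addition for illustration; with its default of 1 the function is a plain highest-average ranking, as described above.

```python
import pandas as pd


def most_popular(ratings: pd.DataFrame, books: pd.DataFrame,
                 n: int = 10, min_ratings: int = 1) -> pd.DataFrame:
    """Rank books by average rating after merging ratings with book details."""
    merged = ratings.merge(books, on="isbn")      # attach titles to ratings
    stats = (
        merged.groupby("title")["rating"]
        .agg(num_ratings="count", avg_rating="mean")
        .reset_index()
    )
    stats = stats[stats["num_ratings"] >= min_ratings]
    return stats.sort_values(["avg_rating", "num_ratings"], ascending=False).head(n)


books = pd.DataFrame({"isbn": ["1", "2", "3"], "title": ["Alpha", "Beta", "Gamma"]})
ratings = pd.DataFrame({
    "isbn": ["1", "1", "2", "3", "3", "3"],
    "rating": [10, 8, 10, 7, 9, 8],
})
print(most_popular(ratings, books, n=3))
print(most_popular(ratings, books, n=3, min_ratings=2))
```

In the toy data "Beta" ranks first on the strength of a single rating, which is why a minimum number of ratings is often required before a book's average is trusted.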

5.2.9. Collaborative filtering approach

Collaborative filtering is widely used in recommendation systems to find similar patterns in user behaviour: it filters out items that a user might like on the basis of the ratings or reactions of similar users.

It works by searching a large group of people and finding a smaller set of users with tastes
similar to a particular user. It looks at the items they like and combines them to create a
ranked list of suggestions.
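The neighbourhood search described above can be sketched with NumPy. Assumptions: ratings are held in a users-by-items array where 0 means "not rated", similarity is cosine similarity between users' rating rows, and items are ranked by the similarity-weighted sum of the neighbours' ratings. These are common choices rather than the only ones, and the data is a toy example.

```python
import numpy as np


def cosine_similarity_matrix(R: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows (users) of R."""
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    norms[norms == 0] = 1.0            # users with no ratings keep a zero row
    unit = R / norms
    return unit @ unit.T


def recommend_for(user: int, R: np.ndarray, k: int = 2, n: int = 3) -> list:
    """Rank items `user` has not rated, using the k most similar users."""
    sims = cosine_similarity_matrix(R)[user].copy()
    sims[user] = -np.inf               # never pick the user as their own neighbour
    k = min(k, R.shape[0] - 1)
    neighbours = np.argsort(sims)[::-1][:k]
    # Similarity-weighted sum of the neighbours' ratings for every item.
    scores = sims[neighbours] @ R[neighbours]
    scores[R[user] > 0] = -np.inf      # skip items the user already rated
    ranked = np.argsort(scores)[::-1]
    return [int(i) for i in ranked if scores[i] > 0][:n]


# Rows are users, columns are items; 0 means "not rated" (toy data).
R = np.array([
    [5, 4, 0, 0],
    [5, 5, 4, 0],
    [0, 0, 0, 5],
    [1, 0, 0, 5],
], dtype=float)
print(recommend_for(0, R, k=2))
```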

There are many ways to decide which users are similar and to combine their choices into a list of recommendations; in this project we implement this in Python.

To experiment with recommendation algorithms, you need data that contains a set of items and a set of users who have reacted to some of those items.
The reaction can be explicit (a rating on a scale of 1 to 5, likes or dislikes) or implicit (viewing an item, adding it to a wish list, the time spent on an article).

Such data is usually represented as a matrix of the reactions given by a set of users to a set of items: each row contains the ratings given by one user, and each column contains the ratings received by one item.

To build a system that can automatically recommend items to users based on the
preferences of other users, the first step is to find similar users or items. The second step
is to predict the ratings of the items that are not yet rated by a user.
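For the second step, one common prediction rule (assumed here; the project's exact rule is not stated) is the similarity-weighted average of the ratings that the k most similar users gave to the item: predicted r(u, i) = Σ sim(u, v) · r(v, i) / Σ sim(u, v), summed over those neighbours v. A NumPy sketch with toy data, where 0 again means "not rated":

```python
import numpy as np


def cosine_similarity_matrix(R: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows (users) of R."""
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = R / norms
    return unit @ unit.T


def predict_rating(user, item, R, sims, k=2):
    """Similarity-weighted average of the ratings given to `item` by the k users
    most similar to `user` who rated it. Returns None if no such user exists."""
    raters = [v for v in range(R.shape[0]) if v != user and R[v, item] > 0]
    raters.sort(key=lambda v: sims[user, v], reverse=True)
    top = [v for v in raters[:k] if sims[user, v] > 0]
    if not top:
        return None
    weights = np.array([sims[user, v] for v in top])
    neighbour_ratings = np.array([R[v, item] for v in top])
    return float(weights @ neighbour_ratings / weights.sum())


R = np.array([
    [4, 0, 5],   # user 0 has not rated item 1
    [4, 2, 5],
    [5, 4, 4],
    [0, 5, 1],
], dtype=float)
sims = cosine_similarity_matrix(R)
print(predict_rating(0, 1, R, sims, k=2))
```

The prediction always lies between the lowest and highest neighbour rating, and it leans towards the ratings of the most similar users.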

Figure 5.1: DataSet

5.3. Accuracy
One way to measure the accuracy of the results is the Root Mean Square Error (RMSE). We predict ratings for a test set of user-item pairs whose rating values are already known; the difference between the known value and the predicted value is the error. Square all the error values for the test set, take their average (mean), and then take the square root of that average to obtain the RMSE.
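The procedure above corresponds to RMSE = sqrt((1/n) Σ (r_i − p_i)²), where r_i is a known rating, p_i the predicted rating and n the number of test pairs. A plain-Python sketch, with illustrative numbers rather than the project's results:

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error between known and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Known test-set ratings vs. model predictions (illustrative numbers).
print(rmse([4, 3, 5, 2], [3.5, 3, 4, 2]))
```

Lower values are better; because errors are squared before averaging, a few large mistakes raise the RMSE more than many small ones.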

CHAPTER 6

6. CONCLUSIONS AND FUTURE WORK

6.1 Overview

This chapter discusses the project's achievements, limitations, conclusions, and recommendations.

6.2 Achievements

All the objectives of this system were met. First, a reader is able to customize the type of book to read, which helps the system predict the types of books the reader is interested in. Second, the user is able to select a book online, and because the recommender system holds a large amount of data, the user has a wide variety of books to choose from. Third, I added a payment method through which the user can pay for a specific item on the system. The system also keeps track of the books a reader has read, so that a user does not buy the same book twice and can refer back to a particular book. The system can also generate a report based on the reader's interests. Additionally, I added an author interface where authors can advertise their books, see how many times their books have sold, and also buy other sellers' books to further their own work.

6.3 Limitations

6.3.1. Lack of Data

The primary difficulty with recommender systems is that they need a large amount of data to provide meaningful suggestions (2009, January 28). It is no accident that the firms most associated with outstanding suggestions are those that collect large amounts of customer data: Google, Amazon, Netflix, and Last.fm. A good recommender system must first collect and evaluate item data, then user data (behavioural events), and finally apply its recommendation algorithm. The more item and user data a recommender system has to work with, the more likely it is to provide accurate suggestions. However, obtaining enough reliable data can be challenging: a large number of users is needed to amass enough data for good suggestions.

6.3.2. Changing Data

Over the years, people change the way they think and discover new things. Because of continual change, improvement, and new technology, authors write new editions of books, which makes it harder for readers to keep up; as new data appears, readers may also change their preferences. As a result, the system administrator may have to keep adding new books and removing outdated ones.

6.3.3. Changing User Preferences

People's interests change with their lifestyle, age, and environment. The system uses algorithms to predict users' interests, but neither the code nor the algorithm is aware of these changes in a user's environment, so the system may keep predicting the same choices the user made when they first began using it.

6.3.4. Unpredictable Items

An article on the Netflix Prize (The Netflix Prize - ResearchGate, retrieved September 15, 2020), Netflix's $1 million award for developing a collaborative filtering system that improves the company's own recommendation algorithm by 10%, points out a problem with oddball films: films that people either love or despise, such as Napoleon Dynamite. Recommendations for such items are difficult to make, because consumer reactions to them are varied and unpredictable.

6.4 Conclusions

Numerous recommendation systems based on collaborative filtering, content-based, and hybrid recommendation approaches have been presented; however, they all have significant drawbacks that pose challenges for future research. Further study is needed to discover and develop novel ways of overcoming these obstacles and of providing collaborative-filtering recommendations for a broad variety of applications while taking quality and privacy into account. The existing recommendation systems therefore need to be upgraded to meet current and future demands for higher recommendation quality.

6.5 Recommendations

In future, the system could be extended so that authors can negotiate the selling price of their books with the sellers. The problem of changing user preferences could be addressed by a method that detects when a user's preferences change and updates the recommendations accordingly. The system could also recommend more than books alone, combining items such as movies and music for entertainment purposes.

CHAPTER 7

7 Experimentation And Results

7.1 Testing

To obtain the results, numerous recordings were taken and their accuracy in detecting eye blinks and drowsiness was tested. For this project, we used a webcam connected to the PC. The webcam required white LEDs attached to it to provide better illumination. In a real-time scenario, infrared LEDs should be used instead of white LEDs so that the system is non-intrusive. A buzzer is used to produce an alert sound that wakes the driver when drowsiness exceeds a certain threshold. The system was tested on different people under different ambient lighting conditions (daytime and night-time). When the webcam backlight was turned on and the face was kept at an optimal distance, the system could detect blinks as well as drowsiness with over 90% accuracy. This is a good result, and the approach can be implemented in real-time systems as well. Sample outputs for different conditions are given in the accompanying images. Videos were recorded in which both the face and the eyes were detected; the two methods have relatively similar accuracy. The test was executed on the PC. To verify the eye segmentation algorithm, the tester stood 0.2 to 0.3 away from the webcam, which was positioned in front of the tester at an angle within ±15°. The eye region was successfully extracted and highlighted with a red rectangle. To test daytime detection, we set the ambient light intensity to 80 W, and the tester remained at the same detection position as above. When the tester closed their eyes for more than two seconds, the program displayed "eye shut" on the terminal, as required. Moreover, when the tester merely blinked, nothing happened, as expected. To test night-time detection, the only difference was that we set the ambient light intensity to 20 W, and we obtained the same detection result as in daytime. The accuracy of blink detection was calculated using the formula Accuracy = 1 − (total no. of blinks − no. of blinks detected) / total no. of blinks. The same formula was used to calculate the accuracy of drowsiness detection.
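The accuracy formula above can be written as a small function; the counts in the example are illustrative, not the project's measurements. Note that 1 − (total − detected) / total simplifies to detected / total, so the formula measures the share of true blinks that were detected and does not penalize false detections.

```python
def detection_accuracy(total_events: int, detected_events: int) -> float:
    """Accuracy = 1 - (total - detected) / total, as used for blinks and drowsiness."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 <= detected_events <= total_events:
        raise ValueError("detected_events must be between 0 and total_events")
    return 1 - (total_events - detected_events) / total_events


# Illustrative counts: 46 of 50 blinks detected.
print(detection_accuracy(50, 46))
```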

7.2. EXPERIMENTAL RESULTS

To validate our system, we tested it on a driver in a car under real driving conditions, using a camera with a buzzer. The results for the eye states are illustrated in the image, where the eyes are closed and the system detects drowsiness and sounds the alert.

CHAPTER 8

REFERENCES
[1] P. K. Singh, P. K. D. Pramanik, A. K. Dey, P. Choudhury, "Recommender systems: an overview, research trends, and future directions", Int. J. Business Syst. Res. 15 (1) (2021) 14–52.

[2] K. Heung-Nam, E. S. Abdulmotaleb, J. Geun-Sik, "Collaborative error-reflected models for cold-start recommender systems", Decision Support Systems 51 (3) (2011) 519–531.

[3] J. Bobadilla, F. Serradilla, "The effect of sparsity on collaborative filtering metrics", in: Australian Database Conference, 2009, pp. 9–17.

[4] J. Bobadilla, A. Hernando, F. Ortega, J. Bernal, "A framework for collaborative filtering recommender systems", Expert Systems with Applications 38 (12) (2011) 14609–14623.

[5] J. B. Schafer, D. Frankowski, J. Herlocker, S. Sen, "Collaborative filtering recommender systems", in: P. Brusilovsky, A. Kobsa, W. Nejdl (Eds.), The Adaptive Web, 2007, pp. 291–324.

[6] G. Adomavicius, A. Tuzhilin, "Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions", IEEE Transactions on Knowledge and Data Engineering 17 (6) (2005) 734–749.

[7] B. M. Sarwar, G. Karypis, J. A. Konstan, J. Riedl, "Item-based collaborative filtering recommendation algorithms", in: Proceedings of WWW '01, ACM, 2001, pp. 285–295.

[8] C. Basu, H. Hirsh, W. Cohen, "Recommendation as classification: using social and content-based information in recommendation", in: Proceedings of the Fifteenth National Conference on Artificial Intelligence, 1998, pp. 714–720.

[9] Book-Crossing Dataset by apl. Prof. Dr. Cai-Nicolas Ziegler, http://www.informatik.unifreiburg.de/~cziegler/BX

[10] P. B. Thorat, R. M. Goudar, S. Barve, "Survey on collaborative filtering, content-based filtering and hybrid recommendation system", International Journal of Computer Applications 110 (4) (2015) 31–36.

[11] M. Balabanovic, Y. Shoham, "Content-based, collaborative recommendation", Communications of the ACM 40 (3) (1997) 66–72.

[12] M. Pazzani, "A framework for collaborative, content-based and demographic filtering", Artificial Intelligence Review, Special Issue on Data Mining on the Internet 13 (5-6) (1999) 393–408.

[13] J. Konstan, B. Miller, D. Maltz, J. Herlocker, L. Gordon, J. Riedl, "GroupLens: applying collaborative filtering to Usenet news", Communications of the ACM 40 (2000), doi:10.1145/245108.245126.

