MCA2282
On
Session 2024-2025
In
DATA ANALYTICS
Submitted By
DEEPAK YADAV
Admis No. 23SCSE2160014
DEEPAK SINGH
Admis No. 23SCSE2160043
GALGOTIAS UNIVERSITY,
GREATER NOIDA
CANDIDATE’S DECLARATION
I/We hereby certify that the work which is being presented in the project, entitled “Book
Recommendation System” in partial fulfillment of the requirements for the award of the MCA
Technology of Galgotias University, Greater Noida, is an original work carried out during the
period of August 2024 to January 2025, under the supervision of Dr.-Mohd Arif, Department of
The matter presented in the thesis/project/dissertation has not been submitted by me/us for the
This is to certify that the above statement made by the candidates is correct to the best of my
knowledge.
Dr.-Mohd Arif
Asst. Professor
SCHOOL OF COMPUTER APPLICATION AND
TECHNOLOGY
GALGOTIAS UNIVERSITY,
GREATER NOIDA
CERTIFICATE
This is to certify that the work in the project report entitled “Book Recommendation
System” by Deepak Yadav and Deepak Singh, bearing Enrolment Nos. 23SCSE2160014 and
23SCSE2160043, is a bona fide record of project work carried out by them under my
supervision and guidance in partial fulfilment of the requirements for the award of the degree
of Master of Computer Applications in the School of Computer Science and
Engineering, Galgotias University, Greater Noida. Neither this project nor any part of it has
been submitted for any degree or academic award elsewhere.
Dr.-Mohd Arif
Assistant Professor
SCHOOL OF COMPUTER APPLICATION AND
TECHNOLOGY
GALGOTIAS UNIVERSITY,
GREATER NOIDA
ACKNOWLEDGEMENT
My journey towards the design and development of this project has finally come to an end,
and I would like to express my heartfelt gratitude to everyone who contributed to its
successful completion. I am deeply grateful to my project guide, Dr.-Mohd Arif, for his
invaluable guidance, support, and expertise throughout the development of the
“Book Recommendation System” project. I am also thankful to my friends and family for their
endless support and motivation. Despite the challenges faced, I learned a lot and had fun
throughout this journey. This project would not have been possible without the relentless
support and encouragement of everyone involved.
Deepak Yadav
SCHOOL OF COMPUTER APPLICATION AND
TECHNOLOGY
GALGOTIAS UNIVERSITY,
GREATER NOIDA
ABSTRACT
Recommender systems are one of the most successful and widespread applications of machine
learning. The primary aim of this project is to build a book recommender system that provides
personalized book recommendations for each user. The machine learning algorithms in
recommender systems are typically classified into two categories: content-based and collaborative
filtering. Both methods have their pros and cons. Content-based filtering requires rich descriptions
of items and a well-organized user profile before recommendations can be made, whereas
collaborative filtering suffers from the cold-start problem. Hybrid methods combine the best of
both, but they tend to be complex. The goal of this project is to implement a hybrid
recommender system in its simplest form without losing much quality in its recommendations.
Furthermore, it is important to account for both implicit and explicit feedback from the user. In
this regard, the sentiment of the book reviews is analyzed and taken into account along with the
rating and the clicks on a particular book to generate a weighted mean rating. This rating is
used in the user-item interaction matrix to generate recommendations by matrix
factorization. Thus, the objective here is to eliminate the need for high-quality descriptive item
data and extensive information about a user in order to make relevant recommendations.
The system evaluation results show that the accuracy of the system's recommendations is very good
and that a recommender system based on the combination of content-based and collaborative
filtering approaches provides more accurate recommendations for the book domain.
Table of Contents
CHAPTER 1
1. INTRODUCTION
1.1. User-based approach
1.2. Item-based approach
1.3. Researchers
CHAPTER 2
2. OBJECTIVES
CHAPTER 3
3. LITERATURE REVIEW
CHAPTER 4
4. DATA SOURCES AND PRE-PROCESSING
4.1. System Architecture
4.2. RMSE Model
4.3. Sentiment Analysis
CHAPTER 5
5. IMPLEMENTATION DETAILS
5.1. Development And Deployment Details
5.2. Algorithm
5.2.1. Content Based Filtering
5.2.2. Collaborative Based Filtering
5.2.3. Hybrid Filtering Method
5.2.4. Dataset Description
5.2.5. Loading Data
5.2.6. Preprocessing Data
5.2.7. Exploratory Data Analysis
5.2.8. Content based filtering / popularity-based filtering
5.2.9. Collaborative based filtering Approach
5.3. Accuracy
CHAPTER 6
6. CONCLUSIONS AND FUTURE WORK
6.1. Overview
6.2. Achievements
6.3. Limitations
CHAPTER 7
7. EXPERIMENTATION AND RESULTS
7.1. Testing
7.2. Experimental Results
CHAPTER 8
References
CHAPTER 1
1. INTRODUCTION
Current recommendation systems, such as content-based filtering and collaborative
filtering, use different information sources to make recommendations. Content-based
filtering makes recommendations based on user preferences for product features, in this
case the genre of the books, the authors, the ratings etc. Collaborative filtering models
user-to-user recommendations as a weighted, linear combination of other users'
preferences. Both methods have limitations.
Content-based filtering can recommend new items, but it needs more data about user
preferences to find the best matches. Similarly, collaborative filtering requires a large dataset
of active users who have previously rated the product in order to make accurate predictions. The
combination of these different recommender systems is called a hybrid system and allows
you to combine item properties with other users' preferences. Existing services such as
Goodreads personalize recommendations and use weighted averages to provide an overall
rating for a book. This greatly enhances the value of each recommendation as it takes into
account the user's individual book preferences. However, our recommendation engine
employs a collaborative social networking approach that mixes your tastes with those of the
wider community to produce meaningful results. A complete recommender system (RS) contains
three main components: a user model, an item model, and a recommendation algorithm. The user
model analyzes consumer interests, and the item model analyzes the properties of items in a
similar way. The system then compares the consumer's characteristics with the item's
characteristics and uses the recommendation algorithm to estimate which items to recommend.
The performance of this algorithm affects the performance of the entire system.
In memory-based collaborative filtering (CF), existing book ratings are used directly to
predict unknown ratings for new books. This method can be divided into two types: a user-based
approach and an item-based approach.
CHAPTER 2
2. OBJECTIVES
The primary goal of this project is to build a recommender system that provides
personalized book recommendations to users and to build a website where users can
give feedback on the books and get recommendations accordingly. The concrete functional
objectives of this project are as follows:
• To account for all three forms of user feedback (star rating, review and clicks) when
generating recommendations.
• To develop a recommender system that incorporates book metadata (author, genre and
description) to find similarities among the books and so address the cold-start problem.
• To build a website with decent authentication where users can provide feedback on the
books and get personalized recommendations.
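The three feedback signals listed above could be folded into a single score before they
enter the user-item matrix. The sketch below is only an illustration of that idea: the
function name combined_rating, the weights, the 0-10 scale and the click cap are all
assumptions made for this example, not values taken from the project.

```python
def combined_rating(star_rating, sentiment, clicks,
                    weights=(0.6, 0.3, 0.1), max_clicks=10):
    """Blend explicit and implicit feedback into one score on a 0-10 scale.

    star_rating: explicit rating on a 0-10 scale
    sentiment:   review sentiment in [-1.0, 1.0]
    clicks:      number of times the user opened the book page
    The weights and max_clicks are hypothetical and would need tuning.
    """
    w_rating, w_sentiment, w_clicks = weights
    # Map sentiment from [-1, 1] onto the 0-10 rating scale.
    sentiment_score = (sentiment + 1.0) * 5.0
    # Cap clicks so heavy browsing cannot dominate, then rescale to 0-10.
    click_score = min(clicks, max_clicks) / max_clicks * 10.0
    total = w_rating + w_sentiment + w_clicks
    return (w_rating * star_rating
            + w_sentiment * sentiment_score
            + w_clicks * click_score) / total


print(combined_rating(8, 0.5, 4))
```

Dividing by the sum of the weights keeps the result on the same 0-10 scale even if the
weights are later changed so that they no longer add up to one.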
CHAPTER 3
3. LITERATURE REVIEW
It is vital to account for every form of user feedback in order to deliver quality
recommendations. Reviews of a particular item are often given lower priority than direct
ratings. However, business analysts who study online reviews believe that reviews directly
influence marketing strategies as well as purchasing decisions. Two surveys on the impact of
opinion mining on individuals, society and organizations were carried out in America, in 2000
and in 2008, by Horrigan. The key findings of these surveys are stated below:
• 81% of Internet users have researched a product online at least once before purchasing it.
• 20% of Internet users do such research on a typical day.
• Consumers report being willing to pay from 20% to 99% more for a 5-star-rated product than
for a 4-star-rated product.
Isinkaye et al. have described the characteristics and potential of different prediction
techniques in recommendation systems, in order to serve as a compass for research and
practice in the field. The collaborative filtering technique derives recommendations
from user behavior in the form of ratings. It generates a rating for an item the user has
not rated, based on the commonalities among users and their ratings.
Recommendation quality generally improves with the size of the rating dataset. This technique
suffers from the cold-start problem for a new user and a new item.
This is because a user with few ratings is difficult to categorize and an item with few ratings
cannot easily be recommended. Also, users with unusual tastes will not get the recommendations
they expect, because it is difficult to find their correlation with other users from which to
derive ratings. The content-based technique uses item features and user preferences to provide
recommendations. In this technique, item features such as keywords are used to describe items,
while user preference indicates the items liked by the user. It recommends items that are
similar to the items preferred by the user. Since it recommends similar items based on user
preference and item features, cross-genre items that the user might like cannot be
recommended. Knowledge-based methods mainly use the path between concepts in
knowledge resources to indicate their semantic similarity. Lin and Resnik measured
semantic similarity by the information content ratio of the least common subsumer of the
two concepts. Instead of directly exploiting the graph distance in knowledge resources,
several studies constructed concept vectors from a set of ontology properties to measure
concept semantic similarity.
Corpus-based methods assume that words with similar meanings often occur in similar
contexts. Latent Semantic Analysis (LSA) represents words as compact vectors via singular
value decomposition (SVD) of the corpus matrix, and GloVe reduced computational costs by
training directly on the non-zero elements of the corpus matrix. Turney measured semantic
similarity with point-wise mutual information. The Web can also be regarded as a large corpus,
and the Google Distance uses how often words co-occur in web pages to measure concept
semantic similarity. Mikolov proposed the word embedding approach, in which a neural network
captures the semantics of words occurring within a fixed-size window.
A challenge in determining the semantic similarity of short texts is how to go from word-level
semantics to short-text-level semantics. A direct approach is to use a weighted sum of
word-to-word semantic similarities; another uses a greedy word alignment method to form the
sentence semantic similarity. Banea et al. measured the semantic similarity between text
snippets infused with opinion knowledge. D. Ramage et al. constructed a concept graph from the
words of each text snippet, then measured the similarity between the two concept graphs.
Recently, the SemEval conference released the Semantic Text Similarity (STS) task in response
to the boom in short-text semantics. Most teams adopt regression models to predict
the similarity score and exploit lexical and syntactic features. The paragraph
vector, which is similar to word embedding, has also been proposed, using a neural network to
measure the semantic similarity between short texts.
Research directly related to document semantic similarity is rare and underdeveloped.
Traditional approaches such as TF-IDF convert documents to vectors via word
counting, and measure the similarity between documents by the similarity of their vectors.
Academic articles can be regarded as semi-structured texts, which contain many structured
annotations besides plain text. The similarity between academic articles can be measured with
the help of this annotated information. Martin et al. fused structured information, such as
authors and keywords, with traditional text-based measures of academic article similarity.
Other work regards articles and their citations as an information network. LDA can thus be
used in the semantic analysis of long documents. Muhammad Rafi defined a similarity measure
based on topic maps for the document clustering task. The documents are transformed into
knowledge coded as topic maps, and the similarity between a pair of documents is represented
as a correlation between their common patterns. M. Zhang et al. enriched documents with
hidden topics from external corpora, and measured document similarity in a text
classification task by the similarity of topic distributions. The hybrid technique combines two
or more recommendation techniques to produce recommendations.
CHAPTER 4
The following section visualizes the basic structure of the dataset obtained after merging.
The rating distribution shows the number of books at each rating value. The dataset originally
contains rating values from zero to ten, with a zero rating signifying that the book has not
been rated. It follows that the dataset contains a large number of books which have not been
rated.
Figure 4.3 represents the distribution of ratings per user. It can be seen that the number of
users who have not rated a book is high. As the rating value increases, the count decreases.
4.3.2. Data preprocessing
Data preprocessing is a data mining technique that involves transforming raw data into a
format that can be understood. Real-world data is often incomplete, inconsistent, and/or
lacking in certain behaviors or trends, and is likely to contain many errors. Data
preprocessing is a proven method of resolving such issues. It prepares raw data for further
processing.
This is the basic step after the collection of the data. The collected dataset may not match
what is needed, so it has to be modified according to the needs and demands of the
project. In the course of this, missing values, outliers and any
other misrepresentation in the dataset may need to be handled. The review text contains a large
number of tokens, including punctuation, commas, full stops and inflected forms of words such
as past or future forms, which are not needed and are redundant for our project. To remove
them, NLTK, a Python library for Natural Language Processing, has been used.
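The cleaning steps described above can be sketched as follows. This is not the project's
NLTK code, which is not reproduced in this report; it is a standard-library stand-in written
for illustration. The tiny STOP_WORDS set and the naive_stem rule are hypothetical
substitutes for NLTK's stop-word list and stemmer.

```python
import re

# Hypothetical, deliberately tiny stop-word list; NLTK ships a much larger one.
STOP_WORDS = {"a", "an", "the", "is", "was", "and", "or", "of", "to", "it", "this", "i"}


def naive_stem(word):
    """Very rough suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_review(text):
    """Lower-case a review, drop punctuation and stop words, and stem the rest."""
    text = text.lower()
    # Replace everything except letters and whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]


print(clean_review("I loved this book, and the ending was amazing!"))
```

The stems produced by such a crude rule are not real words; that is acceptable here because
they only serve to group the different forms of a word together.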
users read two books together, recommending one book if the other has been read is
common.
CF requires a large dataset of user interactions and performs best with active user data. It’s
widely used in platforms like e-commerce and streaming services, where it helps discover
new and relevant items based on shared user preferences.
CHAPTER 5
5. IMPLEMENTATION DETAILS
A recommendation engine is a class of machine learning system which offers relevant
suggestions to the customer. Before recommendation systems, the main way people decided what
to buy was to take suggestions from friends. But now Google knows what news you will read, and
YouTube knows what type of videos you will watch, based on your search history, watch history,
or purchase history.
A recommendation system helps an organization to create loyal customers and build trust
by showing them the products and services for which they came to the site. Today's
recommendation systems are so powerful that they can also handle a new customer
who is visiting the site for the first time. They recommend the products which are
currently trending or highly rated, and they can also recommend the products which bring
maximum profit to the company.
Machine learning is widely used for prediction. Many algorithms
are available in various libraries which can be used for prediction. In this chapter, we
build a prediction model on historical data using different machine learning
algorithms and classifiers, plot the results, and calculate the accuracy of the model on the
testing data.
Building and training a model using various algorithms on a large dataset is one part of the
work. Using these models within different applications is the second part:
deploying machine learning in the real world.
To use the model to make predictions on new data, we have to deploy it over the internet so
that the outside world can use it. In this chapter, we describe how we trained a
machine learning model and created a web application on top of it using Flask.
We have to install the libraries which will be used in this model. Use the pip
command to install them:
pip install pandas
pip install numpy
pip install scikit-learn
5.2 ALGORITHM
5.2.1. Content based Filtering: The algorithm recommends a product that is similar to
those the user has already watched. In simple words, in this algorithm we try to find
look-alike items. For example, a person likes to watch Sachin Tendulkar shots, so he may
like watching Ricky Ponting shots too, because the two videos have similar tags and similar
categories.
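The idea of finding look-alike items from shared tags can be sketched with a simple set
overlap (Jaccard) score. This is one possible similarity measure chosen for illustration,
not necessarily the one used in this project; the catalog titles and tags are made up.

```python
def jaccard(tags_a, tags_b):
    """Share of tags two items have in common (0.0 = none, 1.0 = identical)."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(target, catalog, top_n=2):
    """Rank the other items in the catalog by tag overlap with the target."""
    scores = [
        (jaccard(catalog[target], tags), title)
        for title, tags in catalog.items()
        if title != target
    ]
    # Highest similarity first; ties are broken alphabetically by title.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scores[:top_n]]


# Hypothetical catalog: titles and tags are invented for this example.
catalog = {
    "Tendulkar shots": {"cricket", "batting", "highlights"},
    "Ponting shots": {"cricket", "batting", "highlights"},
    "Cooking basics": {"food", "tutorial"},
    "Cricket bowling tips": {"cricket", "bowling", "tutorial"},
}

print(most_similar("Tendulkar shots", catalog))
```

For books, the same function would work with tags built from author, genre and keywords
taken from the description.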
5.2.2. Collaborative based filtering: For example, if user X has watched movies A, B, and C,
and user Y has watched movies B, C, and D, then we will recommend movie A to user Y and
movie D to user X.
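The two-user example above amounts to a set difference between the viewing histories. A
minimal sketch, with the helper name recommend_from_neighbor chosen here for illustration:

```python
def recommend_from_neighbor(history, user, neighbor):
    """Suggest items the neighbor has seen that the user has not."""
    return sorted(history[neighbor] - history[user])


history = {
    "X": {"A", "B", "C"},
    "Y": {"B", "C", "D"},
}

print(recommend_from_neighbor(history, "Y", "X"))  # items for user Y
print(recommend_from_neighbor(history, "X", "Y"))  # items for user X
```

A real system would first decide which users count as neighbors, for instance by how many
items their histories share.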
5.2.3. Hybrid filtering method: It is basically a combination of both the above methods.
It is a more complex model which recommends products based on your history as well as
on users similar to you. Some organizations use this method, such as Facebook,
which shows news that is important for you and for others in your network, and the
same approach is used by LinkedIn too.
5.2.4. Dataset Description: The dataset consists of three files.
• Books – The first file contains all the information related to books, such as the
author, title, publication year, etc.
• Users – The second file contains registered users' information, such as user id and location.
• Ratings – The third file records which user has given what rating to
which book.
Based on these three files we can build a powerful collaborative filtering model. Let's
get started.
5.2.5. Loading Data: Some lines in the data files contain errors, so while loading the data
we have to handle these exceptions. The loading code emits warnings that show which lines
had an error and were skipped.
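The project's own loading code is not reproduced in this report. The sketch below shows the
same idea with Python's built-in csv module: rows with the wrong number of fields are
skipped and reported with a warning. The function name load_rows, the semicolon delimiter
and the sample data are assumptions made for this example. (pandas users can get comparable
behavior from the on_bad_lines parameter of read_csv in recent pandas versions.)

```python
import csv
import io
import warnings


def load_rows(handle, delimiter=";"):
    """Read delimited rows, skipping (and warning about) malformed lines."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    rows, skipped = [], []
    for line_no, fields in enumerate(reader, start=2):
        if len(fields) != len(header):
            # Wrong number of fields: record the line number and move on.
            warnings.warn(f"Skipping line {line_no}: expected "
                          f"{len(header)} fields, saw {len(fields)}")
            skipped.append(line_no)
            continue
        rows.append(dict(zip(header, fields)))
    return rows, skipped


# Made-up sample in which line 3 has one field too many.
sample = io.StringIO(
    "ISBN;Book-Title;Book-Author\n"
    "111;First Book;Author One\n"
    "222;Broken;Title;Author Two\n"
    "333;Third Book;Author Three\n"
)
rows, skipped = load_rows(sample)
print(len(rows), skipped)
```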
5.2.6. Preprocessing Data: The books file has some extra columns which are
not required for our task, such as image URLs, so we drop them. We also rename the columns
of each file, because the column names contain spaces and uppercase letters, to
make them easier to use.
The dataset is reliable and can be considered large. It has data on 271,360 books,
approximately 278,000 registered users on the website, and about 11 lakh (1.1 million)
ratings. Hence we can say that the dataset is good and reliable.
We do not want to find a similarity between users or books. What we want is this: if
user A has read and liked books x and y, and user B has also liked these two books,
and now user A has read and liked some book z which B has not read, then we have to
recommend book z to user B. This is what collaborative filtering is.
This is achieved using matrix factorization. We create one matrix in which the columns
are users, the index (rows) is books and the values are ratings; that is, we create a
pivot table.
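The book-by-user pivot table described above can be sketched in plain Python as follows.
The function name build_pivot and the sample ratings are made up for illustration, and
filling missing ratings with 0 is an assumption; the factorization step itself is not
shown.

```python
from collections import defaultdict


def build_pivot(ratings):
    """Turn (user_id, title, rating) records into a book-by-user matrix.

    Returns the sorted book titles (rows), the sorted user ids (columns)
    and a list of rows where missing ratings are filled with 0.
    """
    lookup = defaultdict(dict)
    users = set()
    for user_id, title, rating in ratings:
        lookup[title][user_id] = rating
        users.add(user_id)
    books = sorted(lookup)
    users = sorted(users)
    matrix = [[lookup[b].get(u, 0) for u in users] for b in books]
    return books, users, matrix


# Made-up ratings for illustration.
ratings = [
    (1, "Book X", 8),
    (1, "Book Y", 6),
    (2, "Book X", 9),
    (3, "Book Z", 7),
]
books, users, matrix = build_pivot(ratings)
for title, row in zip(books, matrix):
    print(title, row)
```

Most cells of such a matrix are empty in practice, which is why real implementations
usually store it in a sparse format.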
If we take all the books and all the users for modeling, it will create a problem.
So we have to decrease the number of users and books,
because we cannot rely on a user who has only registered on the website or has read only
one or two books. Recommendations to others cannot be based on such a user, because
we have to extract knowledge from data. We therefore limit this number: we take only
users who have rated at least 200 books, and only books which have received at least
50 ratings from users.
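The two thresholds can be applied as in the sketch below. The function name filter_ratings
is hypothetical, and the order of the steps (users first, then books counted among the
remaining ratings) is an assumption, since the text does not specify it.

```python
from collections import Counter


def filter_ratings(ratings, min_user_ratings=200, min_book_ratings=50):
    """Keep ratings from active users and, among those, well-rated books."""
    per_user = Counter(user for user, _, _ in ratings)
    active = [r for r in ratings if per_user[r[0]] >= min_user_ratings]
    per_book = Counter(book for _, book, _ in active)
    return [r for r in active if per_book[r[1]] >= min_book_ratings]


# Tiny made-up data set, so the thresholds are lowered to 2 and 2.
ratings = [
    ("u1", "b1", 8), ("u1", "b2", 7),
    ("u2", "b1", 9), ("u2", "b3", 5),
    ("u3", "b1", 6),
]
print(filter_ratings(ratings, min_user_ratings=2, min_book_ratings=2))
```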
Collaborative filtering is a technique that can filter out items that a user might like on the
basis of reactions by similar users.
It works by searching a large group of people and finding a smaller set of users with tastes
similar to a particular user. It looks at the items they like and combines them to create a
ranked list of suggestions.
There are many ways to decide which users are similar and combine their choices to create
a list of recommendations. This article will show you how to do that with Python.
To experiment with recommendation algorithms, you’ll need data that contains a set of
items and a set of users who have reacted to some of the items.
The reaction can be explicit (rating on a scale of 1 to 5, likes or dislikes) or implicit
(viewing an item, adding it to a wish list, the time spent on an article).
While working with such data, you’ll mostly see it in the form of a matrix consisting of
the reactions given by a set of users to some items from a set of items. Each row would
contain the ratings given by a user, and each column would contain the ratings received
by an item.
To build a system that can automatically recommend items to users based on the
preferences of other users, the first step is to find similar users or items. The second step
is to predict the ratings of the items that are not yet rated by a user.
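The two steps can be sketched for the user-based case: cosine similarity finds similar
users, and a similarity-weighted average predicts a missing rating. This is a generic
textbook formulation given as an illustration, not the project's matrix factorization
model; the ratings are made up and assumed to be positive.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two rating dicts; unrated items count as 0."""
    dot = sum(u[i] * v[i] for i in set(u) & set(v))
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += sim
    # None means no similar user has rated the item.
    return num / den if den > 0 else None


# Made-up ratings: A resembles B much more than C.
ratings = {
    "A": {"x": 5, "y": 4},
    "B": {"x": 5, "y": 4, "z": 5},
    "C": {"x": 1, "z": 2},
}
print(predict(ratings, "A", "z"))
```

Because user A's ratings resemble B's more than C's, the prediction for item z lands
closer to B's rating of 5 than to C's rating of 2.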
5.3. Accuracy
One of the approaches to measure the accuracy of your result is the Root Mean Square
Error (RMSE), in which you predict ratings for a test dataset of user-item pairs whose rating
values are already known. The difference between the known value and the predicted value
would be the error. Square all the error values for the test set, find the average (or mean),
and then take the square root of that average to get the RMSE.
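The RMSE calculation described above translates directly into code. The sample ratings
below are made up for illustration.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error between known and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


known = [8, 6, 9, 7]
predicted = [7, 6, 10, 5]
print(rmse(known, predicted))
```

A lower RMSE means the predicted ratings are closer to the known ones; a value of 0 would
mean every prediction was exact.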
CHAPTER 6
6.1 Overview
This chapter presents the achievements, limitations, conclusions and recommendations of
the project.
6.2 Achievements
In this system all the objectives were met. The first objective was that a reader is able to
customize the type of book to read, which helps the system to predict the type of books the
reader is interested in. Secondly, the user is able to select a book online and, owing to the
large amount of data in the recommender system, has a wide variety of books to choose from.
Thirdly, I added a payment method so that the user can pay for a specific item on the
system. The system keeps track of the books the reader has read, so that a user does not buy
a book twice and can refer back to a particular book. The system is also able to generate a
report based on the interests of the reader. Additionally, I was able to add an author's
interface where authors can advertise their books, see how many times their books have sold,
and also buy other sellers' books in order to further their own work.
6.3 Limitations
The primary difficulty with recommender systems is that they need a large amount of data
in order to provide meaningful suggestions (2009, January 28). It is no accident that the
firms most associated with outstanding suggestions are those that collect large amounts
of customer data: Google, Amazon, Netflix, and Last.fm. A good recommender system must
first collect and evaluate item data, then user data (behavioral events), and only then can
the algorithm do its work. The more item and user data a recommender system has to work
with, the more likely it is to provide accurate suggestions. However, obtaining good
recommendations can be challenging, because a large number of users is needed to amass
enough data for the suggestions.
6.3.2. Changing Data
Over the years people change the way they think and discover new things. Because of
regular change, improvement and new technology, authors tend to write
new editions of books. As readers' thinking changes, they may start changing their
preferences, so the system admin may be forced to keep adding new books and removing
outdated ones.
People change their interests according to changes in lifestyle, age and environment. The
system uses algorithms to predict the interests of users, and the
machine is not affected by the environment. Hence, unless the code or the algorithm is
changed, the system will keep predicting the same choices the user made when beginning to
use the system.
In an article on the Netflix Prize (The Netflix Prize - ResearchGate. Retrieved September
15, 2020), Netflix's $1 million award for developing a collaborative filtering system
that improves the company's own recommendation algorithm by 10%, it was noted that
there was a problem with oddball films: films that people either love or despise, such as
Napoleon Dynamite. Recommendations for these sorts of items are difficult to make, since
consumer reactions to them are often varied and unexpected.
6.4 Conclusions
6.5 Recommendations
In future, the system may be extended so that authors are able to negotiate the
selling price of their books with the sellers. The problem of changing user preferences
may also be addressed by devising a formula that the system can use to update its
recommendations when a user's preferences change. Finally, the
system may widen what it recommends beyond books alone, for example by including
movies and music for entertainment purposes.
CHAPTER 7
7.1 Testing
To obtain the results, a large number of videos were recorded and the accuracy in detecting
eye blinks and drowsiness was tested. For this project, we used a webcam connected to
the PC. The webcam had white LEDs attached to it to provide better illumination.
In a real-time scenario, infrared LEDs should be used instead of white LEDs so
that the system is non-intrusive. A buzzer is used to produce an alert sound to
wake the driver when drowsiness exceeds a certain threshold. The system was tested
on different people in different ambient lighting conditions (daytime and night-time).
When the webcam backlight was turned ON and the face was
kept at an optimal distance, the system could detect blinks as well as drowsiness
with over 90% accuracy. This is a good result and can be implemented in real-time
systems as well. Sample outputs for various conditions in various images are given
below. Videos were taken in which both the face and the eyes were detected. The two
processes have relatively equal accuracy. The test was
run on the PC. To verify the eye-region extraction algorithm, the tester stood 0.2 to 0.3
away from the webcam, which was placed in front of the tester at an angle within ±15°. The
image of the eye region was extracted successfully and was highlighted by a red rectangle. To
test daytime detection, we set the ambient light intensity to 80 W and the tester
stayed at the same detection position as above. When the tester closed
the eyes for over two seconds, the program displayed "eye shut" at the terminal as required.
Moreover, when the tester blinked, nothing happened, as expected. To test
night detection, the only difference is that we set the ambient light intensity to
20 W. We obtained the same detection result as in daytime. The accuracy for eye
blinks was calculated by the formula Accuracy = 1 - (total no. of blinks - no. of
blinks detected) / total no. of blinks. The same formula was used to calculate the
accuracy of drowsiness detection.
7.2. EXPERIMENTAL RESULTS
To validate our system, we tested it on a driver in a car under real driving conditions. We
used a camera with a buzzer. The results for the eye states are illustrated in the image,
where the eyes are closed and the system detects the drowsiness and starts the alert sound.
CHAPTER 8
REFERENCES
[1] P.K. Singh, P.K.D. Pramanik, A.K. Dey and P. Choudhury, “Recommender systems: an
overview, research trends, and future directions”, Int. J. Business Syst. Res. 15 (1) (2021)
pp. 14–52.
[3] J. Bobadilla, F. Serradilla, “The effect of sparsity on collaborative filtering metrics”, in:
Australian Database Conference, 2009, pp. 9–17.
[6] G. Adomavicius, A. Tuzhilin, “Toward the next generation of recommender systems: a
survey of the state-of-the-art and possible extensions”, IEEE Transactions on Knowledge and
Data Engineering 17 (6) (2005) pp. 734–749.
[9] Book Crossing Dataset by apl. Prof. Dr. Cai-Nicolas Ziegler,
http://www.informatik.unifreiburg.de/~cziegler/BX
[10] P.B. Thorat, R.M. Goudar, S. Barve, “Survey on collaborative
filtering, content-based filtering and hybrid recommendation system”, International Journal
of Computer Applications 110 (4) (2015) pp. 31–36.
[11] M. Balabanovic, Y. Shoham, “Content-based, collaborative recommendation”,
Communications of the ACM 40 (3) (1997) pp. 66–72.
[13] J. Konstan, B. Miller, D. Maltz, J. Herlocker, L. Gordon, J. Riedl, “GroupLens: Applying
collaborative filtering to Usenet news”, Communications of the ACM 40 (1997),
doi:10.1145/245108.245126.