Movie Recommendation System

The document describes a movie recommendation system project submitted by two students. It includes a bona fide certificate signed by faculty members, a table of contents, and sections on the abstract, introduction, literature survey, model architecture, implementation, results and discussions, conclusion, and references. The objective is to provide accurate movie recommendations to users using a hybrid approach combining content-based and collaborative filtering.

MOVIE RECOMMENDATION SYSTEM

Submitted by
SIVADHARISHANA T.D (211501100)
SOWMYA.S (211501101)

AI19341 Principles of Artificial Intelligence

Department of Artificial Intelligence and Machine Learning

Rajalakshmi Engineering College, Thandalam

BONAFIDE CERTIFICATE

This is to certify that the Mini project work titled “MOVIE RECOMMENDATION SYSTEM” done by
“SIVADHARISHANA.T.D” 211501100(AIML), “SOWMYA.S” 211501101(AIML), is a record of bonafide
work carried out by them under my supervision as a part of MINI PROJECT for
the subject titled AI19341/Principles of Artificial Intelligence by Department of
Artificial Intelligence and Machine Learning.

HEAD OF THE DEPARTMENT
Dr.S.Bagavathi Priya M.Tech.,Ph.D
Professor and Head,
Artificial Intelligence and Machine Learning,
Rajalakshmi Engineering College, Thandalam,
Chennai – 602 105.

FACULTY IN CHARGE
Mrs.K.R.Sowmia
Assistant Professor (SG),
Artificial Intelligence and Machine Learning,
Rajalakshmi Engineering College, Thandalam,
Chennai – 602 105.

This project report is submitted for practical examination for AI19341 / Principles of Artificial
Intelligence to be held on…………….at Rajalakshmi Engineering College, Thandalam.

EXTERNAL EXAMINER INTERNAL EXAMINER

TABLE OF CONTENTS

S.No Chapter

1. ABSTRACT
2. INTRODUCTION
3. LITERATURE SURVEY
4. MODEL ARCHITECTURE
5. IMPLEMENTATION
6. RESULTS AND DISCUSSIONS
7. CONCLUSION
8. REFERENCES
9. APPENDIX I-CODING
10. APPENDIX II-OUTPUT SCREENSHOTS

ABSTRACT

A recommender system predicts or suggests items according to a user’s choices. The technique used
here is called Content-Based Filtering. In today’s world, not everything shown to people matches
their interests. Sometimes a feed shows videos, music, news, clothing, etc. that do not match the
user’s liking and interest. This lowers the customer’s interest in the application, and the user may
not want to use the same application again. The need of the hour is to develop code that can observe
the pattern of customer trends at a beginner level and recommend the items that best match the
customer’s interests. This will help in giving a satisfactory customer experience and in achieving
good ratings and popularity.

Many recommendation systems can be seen in everyday applications. On YouTube, for example, if a
user watches a lot of news about current world affairs, the platform offers related videos. The
application gains popularity through its ratings and at the same time enhances the customer
experience. This recommendation policy helps deliver optimum results to both the user and the
application. Users can also see recommendations in online food applications such as Zomato and
Swiggy, which suggest restaurants that supply food according to the customer’s taste. These
applications learn the behaviour of the customer from previous orders and try to impress them with
the latest additions. A large number of companies use recommendation systems to increase user
interaction and enrich the user's shopping experience. Recommendation systems have several
benefits, the most important being customer satisfaction and revenue. A movie recommendation
system is a powerful and important system. However, due to the problems associated with a pure
collaborative approach, movie recommendation systems also suffer from poor recommendation quality
and scalability issues. This Movie Recommendation System compares the ratings and the similarity
of movies using the pandas library and the required datasets. After the ratings are obtained, a
graph is plotted and the output is generated.

CHAPTER 1

INTRODUCTION

A recommendation system is an information filtering model that tries to predict the preferences
of a user and provides suggestions based on these preferences. These systems have become
increasingly popular and are widely used today in areas such as movies, music, books, videos,
clothing, restaurants, food, places and other utilities. These systems collect information about a
user's preferences and behaviour, and then use this information to improve their suggestions in the
future.

Movies are part and parcel of life. There are different types of movies: some for entertainment,
some for educational purposes, some animated movies for children, and some horror or action
films. Movies can be easily differentiated by genre, such as comedy, thriller, animation or action.
Other ways to distinguish movies are by release year, language, director, etc. When watching movies
online, there are a large number of movies to search through to find the ones we like most. A movie
recommendation system helps us find our preferred movies among all of these different types and
hence reduces the time spent searching for our favourite movies. The movie recommendation system
should therefore be very reliable and should recommend movies that are the same as, or most
closely matched with, our preferences.

The objective of this project is to provide accurate movie recommendations to users. The goal of
the project is to improve the performance of the movie recommendation system in terms of accuracy,
quality and scalability. This is done using a hybrid approach that combines content-based filtering
and collaborative filtering. To reduce data overload, recommendation systems are used as
information filtering tools in social networking sites.

Recommender systems constitute one of the fastest growing segments of the Internet economy
today. They help reduce information overload and provide customized information access for
targeted domains. Building and deploying recommender systems has matured into a fertile business
activity, with benefits in retaining customers and enhancing revenues.

Elements of the recommender landscape include customized search engines, handcrafted content
indices, personalized shopping agents on e-commerce sites, and news-on-demand services. The
scope of such personalization thus extends to many different forms of information content and
delivery, not just web pages. The underlying algorithms and techniques, in turn, range from simple
keyword matching of consumer profiles and collaborative filtering to more sophisticated forms of
data mining, such as clustering web server logs. Recommendation is often viewed as a system
involving two modes (typically people and artifacts, such as movies and books) and has been studied
in domains that focus on harnessing online information resources, information aggregation, social
schemes for decision making, and user interfaces. In this paper, we aim to use collaborative
filtering recommendation and actively contribute recommendations that satisfy users' tastes.

The design is based on data from a well-known online movie database. The data collected is not
anonymous and is collected in a safeguarded manner. The data is not fake, as each user has to
log in with his/her complete details (user login ID, user password). Users can view related links
for the movies recommended to them. Further filtering techniques are then applied and the user’s
recommendation page is updated. Recommender systems use user, item and ratings information
to predict how other users will like a particular item. Recommender systems will become an integral
part of the Media and Entertainment (M&E) industry in the near future. There are six major types
of recommender systems which work primarily in the Media and Entertainment industry:
collaborative, content-based, demographic-based, utility-based, knowledge-based and hybrid
recommender systems.

CHAPTER 2

LITERATURE SURVEY

Recommender systems handle the problem of information overload that users normally encounter
by providing them with personalized, exclusive content and service recommendations. Recently,
various approaches for building recommendation systems have been developed, which can utilize
collaborative filtering, content-based filtering or hybrid filtering [9–11]. The collaborative
filtering technique is the most mature and the most commonly implemented. Collaborative filtering
recommends items by identifying other users with similar taste; it uses their opinions to recommend
items to the active user. Collaborative recommender systems have been implemented in different
application areas. GroupLens is a news-based architecture which employed collaborative methods
to assist users in locating articles from a massive news database [9].

Ringo is an online social information filtering system that uses collaborative filtering to build
user profiles based on their ratings of music albums [10]. Amazon uses topic diversification
algorithms to improve its recommendations [13]. The system uses a collaborative filtering method to
overcome the scalability issue by generating a table of similar items offline through the use of an
item-to-item matrix [15]. Content-based techniques match content resources to user characteristics.
Unlike collaborative techniques, content-based filtering techniques ignore contributions from other
users. Collaborative filtering and content-based filtering approaches are widely used today, either
by implementing them separately and later combining their results, or by adding characteristics of
content-based filtering to collaborative filtering and vice versa [12].

In another recommendation system, recommendations are generated based on the items used by other
users whose preferences are similar to those of the current user. These techniques are only
applicable when we want to predict items for a single user. This approach can be extended
to a group recommender system. A lack of relationship between user-centered metrics and objective
metrics has been demonstrated. A metric is used to balance the weight of each of the user-centered
metrics (relevance, novelty, global satisfaction, serendipity), combining them into a single value
in order to have a more reliable way to judge. It is easier for the user to express an opinion
about the recommendation than to answer 60 questions (a short questionnaire rather than the long
version proposed in the ResQue model). This work can be easily expanded to other parameters. Users
did not perceive the non-personalized algorithm positively, and the result is different because
users perceived these recommendations as not pertinent to their choices [19].

Datasets are used as benchmarks to develop new recommendation algorithms and to compare
them to other algorithms in given settings. In this section, we present an overview of different
datasets, which are available in different domains. Kumar et al. [29] proposed MOVREC, a movie
recommendation system based on collaborative filtering approaches. Collaborative filtering takes
the data from all the users and generates recommendations based on it. A hybrid system has been
presented by Virk et al. [30].

This system combines both collaborative and content-based methods. De Campos et al. [34] also
analysed both of the traditional recommendation techniques. As both of these techniques
have certain setbacks, they proposed another system which is a combination of a Bayesian network
and a collaborative technique. Kużelewska [35] proposed clustering as an approach to handle
recommendations. Two methods for clustering were analyzed: a centroid-based solution and
memory-based methods. The result was that accurate recommendations were generated.

Chiru et al. [27] proposed Movie Recommender, a system that uses the user’s history in order to
generate recommendations. Sharma and Maan [36] analyzed various techniques used for
recommendations (collaborative, hybrid and content-based) and described the pros and cons of these
approaches. Li and Yamada [37] proposed an inductive learning algorithm in which a tree is built
that shows the user recommendation.

Some of the major contributions to recommendation systems are discussed in Table 1.

Table 1. Literature Review of Recommendation Systems

AUTHORS | YEARS | DESCRIPTIONS
Scharf & Alley [38] | 1993 | The authors proposed a flexible multicomponent rate recommendation system to predict the optimum rate of fertilizer for winter wheat.
Basu et al. [39] | 1998 | The authors proposed an approach to the recommendation that can exploit both ratings and content information.
Sarwar et al. [40] | 2001 | The authors proposed various techniques for computing item-item similarities.
Bomhardt [41] | 2004 | The author proposed an approach for a personal recommendation of news.
Manikrao & Prabhakar [42] | 2005 | The authors presented the design of a dynamic web selection framework.
Von Reischach et al. [43] | 2009 | The authors proposed a rating concept that allows users to generate rating criteria.
Choi et al. [44] | 2012 | The authors proposed approaches for integrating various techniques for improving the recommendation quality.

Table 2 discusses the contribution of filtering techniques for different purposes.

Table 2. Literature Review of Filtering Techniques.

AUTHORS | YEARS | DESCRIPTIONS
Goldberg et al. [45] | 1992 | The authors introduced the collaborative filtering technique.
Herlocker et al. [46] | 1997 | Authors applied filtering techniques to Usenet news.
Miyahara & Pazzani [47] | 2000 | The authors introduced an approach to calculate the similarity between a user from negative ratings to positive ratings separately.
Hofmann [48] | 2004 | The author introduced a new family of model-based algorithms.
Dabov et al. [49] | 2008 | The authors proposed an image restoration technique using collaborative filtering.
Pennock et al. [50] | 2013 | The authors proposed various approaches for filtering by personality diagnosis.
Liu et al. [51] | 2014 | The authors introduced a new method to provide an accurate recommendation.

Table 3. Literature Review for Existing System.

Table 3 discusses the techniques used and the outcomes of the existing systems.

S.NO | AUTHOR(S) | YEAR | TECHNIQUE | DESCRIPTION | OUTCOME
1. | Manoj Kumar, DK Yadav, Ankur Singh | 2015 | By using K-means algorithm | A Pre filter is used before applying K-means algorithm. Several attributes are used. | MovieREC, a movie recommender using K-means had been created.
2. | Anirudh Challa, Vijayakumar V. | 2017 | Filtering and clustering techniques are used | Movielens dataset provides a reliable model which gives precise suggestions compared to previous models. | The recommender system was able to understand the user’s interests precisely
3. | Sheelavathi. A, Priyadharshan. M, Vignesh. S, Elango. K | 2022 | Content-Based Filtering | A dataset that contains the metadata is used in this project. | The performance results show that the projected strategies improve the accuracy of system

CHAPTER 3

MODEL ARCHITECTURE

Figure 3.1 Architecture Diagram

The diagram represents our proposed system architecture. The problem statement here is to
recommend a movie to the user. In our project, we combine the two major types of filtering to
get the optimal result.

The major types of filtering are:

❖ Content-based filtering: This filtering uses similarities in products, services, or content
features, as well as information accumulated about the user, to make recommendations.

❖ Collaborative filtering: This technique filters out items that a user might like on the
basis of reactions by similar users. For example, if users A and B both watch comedy movies and a
new comedy is watched by user A, it will also be recommended to user B.

Figure 3.2 Representation For Types Of Filtering

This project uses the rating and type (similarity) of movies to produce the recommendations as
output. The project therefore uses a combination of collaborative and content-based filtering,
also known as a hybrid algorithm.

The movie ratings and number of ratings are calculated, each movie is checked by both
content-based filtering and collaborative filtering, and the optimal result, a list of recommended
movies matching the user's taste, is derived as the output.

The proposed approach to the movie recommendation system therefore mixes both strategies
to obtain the most refined and explicit results.

CHAPTER 4
IMPLEMENTATION

A recommender system is an information filtering system that provides suggestions for items that
are most pertinent to a particular user. Typically, the suggestions refer to various decision-making
processes, such as what product to purchase, what music to listen to, or what online news to
read. Recommender systems are particularly useful when an individual needs to choose an item
from a potentially overwhelming number of items that a service may offer.

Recommender systems usually make use of collaborative filtering, content-based filtering, or
both, as well as other approaches such as knowledge-based systems. Collaborative filtering
approaches build a model from a user's past behaviour (items previously purchased or selected
and/or numerical ratings given to those items) as well as similar decisions made by other users.
This model is then used to predict items that the user may have an interest in. Content-based
filtering approaches utilize a series of discrete, pre-tagged characteristics of an item in order
to recommend additional items with similar properties.

Movie recommendation systems use a set of different filtration strategies and algorithms to help
users find the most relevant films. The most popular categories of the ML algorithms used for
movie recommendations are content-based filtering and collaborative filtering systems. Each type
of system has its strengths and weaknesses.

Recommender systems are a useful alternative to search algorithms since they help users discover
items they might not have found otherwise. Recommender systems are often implemented using
search engines indexing non-traditional data.

Content-Based Filtering

A common approach when designing recommender systems is content-based filtering. Content-based
filtering methods are based on a description of the item and a profile of the user's
preferences. These methods are best suited to situations where there is known data on an item
(name, location, description, etc.), but not on the user.

Content-based recommenders treat recommendations as a user-specific classification problem
and learn a classifier for the user's likes and dislikes based on an item's features.

In this system, keywords are used to describe the items, and a user profile is built to indicate the
type of item this user likes. In other words, these algorithms try to recommend items similar to
those that a user liked in the past or is examining in the present. It does not rely on a user sign-in
mechanism to generate this often temporary profile. Various candidate items are compared with
items previously rated, and the best-matching items are recommended. This approach has its roots
in information retrieval and information filtering research.

Basically, it is a filtration strategy for movie recommendation systems that uses the data
provided about the items (movies). This data plays a crucial role here and is extracted from only
one user. An ML algorithm used for this strategy recommends motion pictures that are similar to
the user’s past preferences. The similarity in content-based filtering is therefore generated
from the data about the past film selections and likes of that one user. The recommendation system
analyses the past preferences of the user concerned and then uses this information, which is
available in the database (e.g., lead actors, director, genre), to try to find similar movies.
After that, the system provides movie recommendations for the user. In short, the core element in
content-based filtering is the data of a single user, which is used to make predictions.

A key issue with content-based filtering is whether the system can learn user preferences from
users' actions regarding one content source and use them across other content types. When the
system is limited to recommending content of the same type as the user is already using, the value
from the recommendation system is significantly less than when other content types from other
services can be recommended. For example, recommending news articles based on news
browsing is useful. Still, it would be much more useful when music, videos, products, discussions,
etc., from different services, can be recommended based on news browsing. To overcome this,
most content-based recommender systems now use some form of the hybrid system.
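
As a minimal sketch of the content-based idea described above, the snippet below describes each
movie by a set of genre keywords, builds a profile from the movies one user liked, and ranks the
remaining movies by cosine similarity to that profile. The titles come from the dataset sample in
this report, but the genre tags, the function names and the liked list are illustrative
assumptions and are not part of the project code.

```python
import math

# Hypothetical genre tags (assumed for illustration; not taken from the project dataset).
MOVIE_GENRES = {
    "Toy Story (1995)": {"animation", "comedy", "children"},
    "GoldenEye (1995)": {"action", "thriller"},
    "Babe (1995)": {"children", "comedy"},
    "Braveheart (1995)": {"action", "drama"},
    "Lion King, The (1994)": {"animation", "children"},
}

def build_profile(liked_titles):
    """Count how often each genre appears among the movies the user liked."""
    profile = {}
    for title in liked_titles:
        for genre in MOVIE_GENRES[title]:
            profile[genre] = profile.get(genre, 0) + 1
    return profile

def cosine(profile, genres):
    """Cosine similarity between a genre-count profile and a binary genre set."""
    dot = sum(weight for genre, weight in profile.items() if genre in genres)
    norm_profile = math.sqrt(sum(w * w for w in profile.values()))
    norm_item = math.sqrt(len(genres))
    if norm_profile == 0 or norm_item == 0:
        return 0.0
    return dot / (norm_profile * norm_item)

def recommend(liked_titles, top_n=2):
    """Rank movies the user has not liked yet by similarity to the profile."""
    profile = build_profile(liked_titles)
    candidates = [t for t in MOVIE_GENRES if t not in liked_titles]
    ranked = sorted(candidates, key=lambda t: cosine(profile, MOVIE_GENRES[t]), reverse=True)
    return ranked[:top_n]

content_recs = recommend(["Toy Story (1995)"])
print(content_recs)
```

Only the data of one user enters the calculation, which is the property that separates this
strategy from collaborative filtering.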

Collaborative Filtering

Collaborative filtering is based on the assumption that people who agreed in the past will agree in
the future, and that they will like similar kinds of items as they liked in the past. The system
generates recommendations using only information about rating profiles for different users or
items. By locating peer users/items with a rating history similar to the current user or item, the
system generates recommendations using this neighbourhood. Collaborative filtering methods are
classified as memory-based and model-based. A well-known example of memory-based
approaches is the user-based algorithm, while that of model-based approaches is matrix
factorization.

A key advantage of the collaborative filtering approach is that it does not rely on machine
analysable content and therefore it is capable of accurately recommending complex items such as
movies without requiring an understanding of the item itself. Many algorithms have been used in
measuring user similarity or item similarity in recommender systems. Examples include the
k-nearest neighbour approach and the Pearson correlation, as first implemented by Allen.
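
The two similarity measures named above can be illustrated with a small user-based sketch: the
Pearson correlation is computed over the movies two users have both rated, the k most correlated
users form the neighbourhood, and their ratings are averaged to predict an unseen movie. The
ratings, user labels and function names below are invented for illustration and do not come from
the project dataset.

```python
import math

# Toy ratings on a 1-5 scale: user -> {movie: rating}. Values are invented for illustration.
RATINGS = {
    "A": {"Star Wars (1977)": 5, "Toy Story (1995)": 4, "GoldenEye (1995)": 1},
    "B": {"Star Wars (1977)": 4, "Toy Story (1995)": 5, "GoldenEye (1995)": 2, "Babe (1995)": 5},
    "C": {"Star Wars (1977)": 1, "Toy Story (1995)": 2, "GoldenEye (1995)": 5, "Babe (1995)": 1},
}

def pearson(u, v):
    """Pearson correlation over the movies both users have rated."""
    common = [m for m in RATINGS[u] if m in RATINGS[v]]
    if len(common) < 2:
        return 0.0
    xs = [RATINGS[u][m] for m in common]
    ys = [RATINGS[v][m] for m in common]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def nearest_neighbours(user, k=1):
    """Return the k users whose ratings correlate most strongly with `user`."""
    others = [u for u in RATINGS if u != user]
    return sorted(others, key=lambda u: pearson(user, u), reverse=True)[:k]

def predict(user, movie, k=1):
    """Average the neighbours' ratings for a movie the user has not rated."""
    rated = [RATINGS[n][movie] for n in nearest_neighbours(user, k) if movie in RATINGS[n]]
    return sum(rated) / len(rated) if rated else None

neighbours_of_a = nearest_neighbours("A")
predicted_babe = predict("A", "Babe (1995)")
print(neighbours_of_a, predicted_babe)
```

In this toy data, user B rates movies much like user A while user C rates them in the opposite
way, so B is chosen as the neighbour and B's rating of the unseen movie becomes the prediction.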

As the name suggests, this filtering strategy is based on the combination of the relevant user’s and
other users’ behaviours. The system compares these behaviours for the most optimal results. It is
a collaboration of the multiple users’ film preferences and behaviours.

The core element in this movie recommendation system and the ML algorithm it’s built on is the
history of all users in the database. Basically, collaborative filtering is based on the interaction of
all users in the system with the movies. Thus, every user impacts the outcome of this ML-based
recommendation system, while content-based filtering depends strictly on the data from one user
for its modelling.

One of the most famous examples of collaborative filtering is item-to-item collaborative filtering
(people who buy x also buy y), an algorithm popularized by Amazon's recommender system.

Collaborative filtering approaches often suffer from three problems: cold start, scalability, and
sparsity.

• Cold start: For a new user or item, there isn't enough data to make accurate recommendations.

• Scalability: There are millions of users and products in many of the environments in which these
systems make recommendations. Thus, a large amount of computation power is often necessary
to calculate recommendations.

• Sparsity: The number of items sold on major e-commerce sites is extremely large. The most
active users will only have rated a small subset of the overall database. Thus, even the most
popular items have very few ratings.
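
Sparsity can be made concrete as the fraction of user-item pairs that have no rating. The
calculation below is a small worked example; the user, item and rating counts are hypothetical
values chosen only to show the arithmetic.

```python
def sparsity(num_users, num_items, num_ratings):
    """Fraction of user-item pairs that have no rating."""
    return 1.0 - num_ratings / (num_users * num_items)

# Hypothetical sizes chosen only to illustrate the calculation.
example_sparsity = sparsity(num_users=1000, num_items=1700, num_ratings=100_000)
print(round(example_sparsity, 3))
```

With these assumed sizes, roughly 94% of the cells in the user-item rating matrix are empty.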

Hybrid Recommendation System

Most recommender systems now use a hybrid approach, combining collaborative filtering,
content-based filtering, and other approaches. There is no reason why several different techniques
of the same type could not be hybridized. Hybrid approaches can be implemented in several ways:
by making content-based and collaborative-based predictions separately and then combining
them; by adding content-based capabilities to a collaborative-based approach; or by unifying the
approaches into one model. Several studies that compare the performance of hybrid methods with
pure collaborative and content-based methods have demonstrated that hybrid methods can
provide more accurate recommendations than pure approaches. These methods can also be used
to overcome some of the common problems in recommender systems, such as cold start and
sparsity, as well as the knowledge engineering bottleneck in knowledge-based approaches.

Netflix is a good example of the use of hybrid recommender systems. The website makes
recommendations by comparing the watching and searching habits of similar users (i.e.,
collaborative filtering) as well as by offering movies that share characteristics with films that a
user has rated highly (content-based filtering).
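
The first hybridization strategy mentioned in this section, making content-based and
collaborative predictions separately and then combining them, can be sketched as a weighted
blend of two score tables. The scores, the weight `alpha` and the function name are
assumptions made for illustration; the report does not specify how the two results are combined.

```python
def hybrid_scores(content_scores, collab_scores, alpha=0.5):
    """Blend two score dictionaries; alpha is the weight of the content-based score."""
    titles = set(content_scores) | set(collab_scores)
    return {
        t: alpha * content_scores.get(t, 0.0) + (1 - alpha) * collab_scores.get(t, 0.0)
        for t in titles
    }

# Hypothetical normalised scores in [0, 1] from two separate recommenders.
content = {"Babe (1995)": 0.8, "GoldenEye (1995)": 0.1}
collab = {"Babe (1995)": 0.6, "Star Wars (1977)": 0.9}

blended = hybrid_scores(content, collab, alpha=0.5)
ranking = sorted(blended, key=blended.get, reverse=True)
print(ranking)
```

A movie that only one recommender knows about still receives a score, which is one way a hybrid
can soften the cold start and sparsity problems listed earlier.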

PROPOSED SYSTEM:

❖ Data: Machine Learning systems need data, so find and import the essential libraries with movie
datasets that already have global ratings.

❖ Analysis: Create analysis of top-rated movies from the existing dataset.

❖ Personalization: Get personalized ratings by importing datasets with all the details required.

❖ Strategy: Implement content-based or collaborative filtering strategy.

❖ Combination: Combine recommendation lists to get a reasonable estimate across the ratings.
The combined dataset of movie ratings can now be used for either filtering model.

The Proposed system includes the steps:

● Importing pandas for python

The library imported is pandas. Pandas is an open-source Python package that is widely used
for data analysis and machine learning. It was developed by Wes McKinney in 2008.

import pandas as pd

● Loading the dataset.

The related content for the user is found from ratings, using a dataset that provides the
movie IDs and details such as year of release. The dataset is loaded using pandas. The dataset is
tab separated, so \t is passed to the sep parameter, and the column names are passed using the
names parameter. The data consists of IDs, rating and timestamp. After loading the data, the
head of the data is checked to see what the data looks like.

df.head()
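
A minimal sketch of this loading step is shown below. The report does not name the ratings
file, so an in-memory buffer holding three rows from the dataset sample stands in for it; the
column names `user_id`, `item_id`, `rating` and `timestamp` follow the parameters described in
this chapter but their exact spelling in the project is an assumption.

```python
import io
import pandas as pd

# A few rows copied from the dataset sample in this report, tab separated.
# In the project these rows would come from the ratings file; its name is not
# given in the report, so an in-memory buffer stands in for it here.
raw = "115\t265\t2\t881171488\n253\t465\t5\t891628467\n305\t451\t3\t886324817\n"

# Assumed column names for the four tab-separated fields.
column_names = ["user_id", "item_id", "rating", "timestamp"]

# sep="\t" handles the tab-separated layout; names= supplies the missing header.
df = pd.read_csv(io.StringIO(raw), sep="\t", names=column_names)
print(df.head())
```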

Next, check out all the movies and the users' respective IDs. It would be much more convenient if
the user could see the titles instead of just the IDs. So, load in the movie titles and merge them
with the dataset.

movie_titles = pd.read_csv('Movie_Titles')
movie_titles.head()
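
The merge itself is not shown in the report; one plausible way to do it, assuming both frames
share an `item_id` column and the titles frame has a `title` column, is `pd.merge`. The rows
below are taken from the dataset samples in this report, and the name `data` for the merged
frame is an assumption based on the later `groupby` calls.

```python
import pandas as pd

# Ratings rows taken from the dataset sample in this report.
df = pd.DataFrame({
    "user_id": [308, 224, 210],
    "item_id": [1, 29, 40],
    "rating": [4, 3, 3],
})

# Matching titles taken from the MovieTitles sample (column names are assumed).
movie_titles = pd.DataFrame({
    "item_id": [1, 29, 40],
    "title": [
        "Toy Story (1995)",
        "Batman Forever (1995)",
        "To Wong Foo, Thanks for Everything! Julie Newmar (1995)",
    ],
})

# Join the two frames on the shared movie ID so each rating carries its title.
data = pd.merge(df, movie_titles, on="item_id")
print(data.head())
```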

• Calculate the mean rating.

The mean rating of each movie is calculated by grouping the data by title and sorting the mean
ratings in descending order.

data.groupby('title')['rating'].mean().sort_values(ascending=False).head()

• Calculate the count rating.

The count rating of a movie, which is the number of ratings of that specific movie, is calculated
similarly.

data.groupby('title')['rating'].count().sort_values(ascending=False).head()

• Import the matplotlib and seaborn libraries

Matplotlib is a low-level Python library used for data visualization. It helps to plot graphs
such as line charts, bar charts and histograms. Seaborn is a library that uses matplotlib
to plot graphs. It will be used to visualize random distributions.

import matplotlib.pyplot as plt
import seaborn as sns

• Plot a graph of the number of ratings column and also plot a graph of the mean ratings.

• Sort the values according to the number of ratings.
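
The plots and the later join both rely on a summary frame holding the mean rating and the number
of ratings per title. A minimal sketch of how such a frame could be built is given below; the
frame name `ratings` and the column name `num of ratings` follow the code used later in this
chapter, while the toy rows are invented for illustration. The plotting calls are left out so
that the sketch depends on pandas only.

```python
import pandas as pd

# Toy merged data (values invented for illustration).
data = pd.DataFrame({
    "title": ["Star Wars (1977)", "Star Wars (1977)", "Toy Story (1995)",
              "Babe (1995)", "Star Wars (1977)"],
    "rating": [5, 4, 4, 3, 3],
})

# Mean rating per title, then the number of ratings per title.
ratings = pd.DataFrame(data.groupby("title")["rating"].mean())
ratings["num of ratings"] = data.groupby("title")["rating"].count()

# Most-rated movies first.
ratings = ratings.sort_values("num of ratings", ascending=False)
print(ratings.head())
```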

• Calculate the correlation between the movies.

Analyse the correlation between movies to find similar movies. Correlation is the
statistical relationship between two random variables or sets of data. In this case it is the
relationship between movies, which is interpreted as the similarity of the movies.

# Show the ten movies most correlated with Star Wars
corr_starwars.sort_values('Correlation', ascending=False).head(10)

# Attach the number of ratings for each movie
corr_starwars = corr_starwars.join(ratings['num of ratings'])

corr_starwars.head()

In this project, the correlations of the movies Star Wars and Liar Liar with other movies are
calculated, and the output shows the rating column with the mean rating and the number of ratings
along with the titles of the similar movies.
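
The report does not show how `corr_starwars` is created. One plausible construction, assuming
the merged frame has `user_id`, `title` and `rating` columns, is to pivot the ratings into a
user-by-movie matrix and correlate every movie column with the Star Wars column. The intermediate
names `moviemat` and `similar_to_starwars` and the toy ratings below are assumptions made for
illustration.

```python
import pandas as pd

# Toy merged data: four users, three movies (ratings invented for illustration).
data = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4],
    "title": ["Star Wars (1977)", "Liar Liar (1997)", "Toy Story (1995)"] * 3
             + ["Star Wars (1977)", "Toy Story (1995)"],
    "rating": [5, 1, 4, 4, 2, 5, 2, 5, 1, 3, 3],
})

# Rows are users, columns are movie titles, cells are ratings (NaN if not rated).
moviemat = data.pivot_table(index="user_id", columns="title", values="rating")

# Correlate every movie's rating column with the Star Wars column.
similar_to_starwars = moviemat.corrwith(moviemat["Star Wars (1977)"])
corr_starwars = similar_to_starwars.to_frame(name="Correlation").dropna()

# Attach the number of ratings so that thinly rated movies can be spotted.
ratings = pd.DataFrame(data.groupby("title")["rating"].mean())
ratings["num of ratings"] = data.groupby("title")["rating"].count()
corr_starwars = corr_starwars.join(ratings["num of ratings"])

top = corr_starwars.sort_values("Correlation", ascending=False)
print(top.head(10))
```

A high correlation computed from only a handful of shared ratings is not very reliable, which
is why the number of ratings is kept next to each correlation value.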

• Get the output as a list of similar movies with ratings along with the survey graph.

The Parameters used are:

1. user_id - the ID of the user who rated the movie.

2. item_id - the ID of the movie.

3. Mean rating - The average rating of the movie.

4. title - The title of the movie.

5. Count ratings - The number of ratings for a movie in total.

DATASET

The dataset included in this project contains the user ID, the item ID, the rating and the
timestamp of each rating. The mean rating and the count rating are calculated from it.

A sample of the dataset is attached below.

User ID, Item ID, Rating, Timestamp

115 265 2 881171488
253 465 5 891628467
305 451 3 886324817
6 86 3 883603013
62 257 2 879372434
286 1014 5 879781125
200 222 5 876042340
210 40 3 891035994
224 29 3 888104457
303 785 3 879485318
122 387 5 879270459
194 274 2 879539794
291 1042 4 874834944
234 1184 2 892079237
119 392 4 886176814
167 486 4 892738452
299 144 4 877881320
291 118 2 874833878
308 1 4 887736532
95 546 2 879196566
38 95 5 892430094
102 768 2 883748450
63 277 4 875747401
160 234 5 876861185
50 246 3 877052329
301 98 4 882075827
225 193 4 879539727
290 88 4 880731963
97 194 3 884238860
157 274 4 886890835
181 1081 1 878962623
278 603 5 891295330
276 796 1 874791932
246 201 5 884921594
242 1137 5 879741196

The other dataset contains the movie titles and the year of the movie release. Since it would be
difficult to identify a movie by its ID alone, the dataset that contains the movie titles is merged
with the original dataset. A sample of the MovieTitles dataset is attached below.

Item ID, Movie Title

1 Toy Story (1995)
2 GoldenEye (1995)
3 Four Rooms (1995)
4 Get Shorty (1995)
5 Copycat (1995)
6 Shanghai Triad (Yao a yao yao dao waipo qiao) (1995)
7 Twelve Monkeys (1995)
8 Babe (1995)
9 Dead Man Walking (1995)
10 Richard III (1995)
11 Seven (Se7en) (1995)
12 Usual Suspects, The (1995)
13 Mighty Aphrodite (1995)
14 Postino, Il (1994)
15 Mr. Holland's Opus (1995)
16 French Twist (Gazon maudit) (1995)
17 From Dusk Till Dawn (1996)
18 White Balloon, The (1995)
19 Antonia's Line (1995)
20 Angels and Insects (1995)
21 Muppet Treasure Island (1996)
22 Braveheart (1995)
23 Taxi Driver (1976)
24 Rumble in the Bronx (1995)
25 Birdcage, The (1996)
26 Brothers McMullen, The (1995)
27 Bad Boys (1995)
28 Apollo 13 (1995)
29 Batman Forever (1995)
30 Belle de jour (1967)
31 Crimson Tide (1995)
32 Crumb (1994)
33 Desperado (1995)
34 Doom Generation, The (1995)
35 Free Willy 2: The Adventure Home (1995)
36 Mad Love (1995)
37 Nadja (1994)
38 Net, The (1995)
39 Strange Days (1995)
40 To Wong Foo, Thanks for Everything! Julie Newmar (1995)
41 Billy Madison (1995)
42 Clerks (1994)

43 Disclosure (1994)
44 Dolores Claiborne (1994)
45 Eat Drink Man Woman (1994)
46 Exotica (1994)
47 Ed Wood (1994)
48 Hoop Dreams (1994)
49 I.Q. (1994)
50 Star Wars (1977)
51 Legends of the Fall (1994)
52 Madness of King George, The (1994)
53 Natural Born Killers (1994)
54 Outbreak (1995)
55 Professional, The (1994)
56 Pulp Fiction (1994)
57 Priest (1994)
58 Quiz Show (1994)
59 Three Colors: Red (1994)
60 Three Colors: Blue (1993)
61 Three Colors: White (1994)
62 Stargate (1994)
63 Santa Clause, The (1994)
64 Shawshank Redemption, The (1994)
65 What's Eating Gilbert Grape (1993)
66 While You Were Sleeping (1995)
67 Ace Ventura: Pet Detective (1994)
68 Crow, The (1994)
69 Forrest Gump (1994)
70 Four Weddings and a Funeral (1994)
71 Lion King, The (1994)
72 Mask, The (1994)
73 Maverick (1994)
74 Faster Pussycat! Kill! Kill! (1965)
75 Brother Minister: The Assassination of Malcolm X (1994)

Both datasets cover approximately 1700 items, of which only a small sample is shown here for
reference. After merging, the data contains the movie title (including the year of release), user
IDs, item IDs, mean ratings and the number of ratings (count ratings). At this size the project does
not run into scalability issues, and the merged table makes the available data easier to understand.
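
As a minimal sketch of the merge described above, the snippet below joins a few rating rows with their titles on the item ID and then derives the mean rating and the number of ratings per title. The column names follow the appendix code and the two titles come from the MovieTitles sample; the rating rows themselves are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Illustrative rating rows (user_id, item_id, rating); the values are
# hypothetical and only the column names follow the appendix code.
ratings = pd.DataFrame({
    "user_id": [1, 2, 3, 1, 2],
    "item_id": [1, 1, 1, 50, 50],
    "rating":  [4, 5, 3, 5, 4],
})

# Titles for the same item IDs, as listed in the MovieTitles sample.
titles = pd.DataFrame({
    "item_id": [1, 50],
    "title": ["Toy Story (1995)", "Star Wars (1977)"],
})

# Join on the shared key so every rating row carries its movie title.
data = pd.merge(ratings, titles, on="item_id")

# Mean rating and number of ratings per title.
summary = data.groupby("title")["rating"].agg(["mean", "count"])
print(summary)
```

Merging first and aggregating afterwards means the summary is indexed by readable titles rather than by numeric IDs.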

CHAPTER 5

RESULTS AND DISCUSSIONS

A movie recommendation system can be developed using content-based filtering, collaborative
filtering, or a combination of both. In this project a hybrid approach, i.e., a combination of
content-based and collaborative filtering, has been developed. Both approaches have advantages and
disadvantages, which are discussed below.

In content-based filtering, recommendations are based on the similarity between movies and on the
ratings the user has given before.

Advantages: It is easy to design and takes less time to compute.

Disadvantages: The model can only make recommendations based on the existing interests of the
user. In other words, it has limited ability to expand on the user's existing interests.

In collaborative filtering, recommendations are made by comparing users who have similar rating
behaviour.

Advantages: No domain knowledge is needed because the embeddings are learned automatically.

Disadvantages: If an item is not seen during training, the system cannot create an embedding for it.

Since the existing systems use only one of the two types of filtering, they have some limitations.

The limitations of the existing movie recommender systems are:

• Content-based recommendations have been known for a long time, but they tend to overlook some
great suggestions that are not captured by simple cosine similarity.

• The high sparsity of the data limits the performance of the algorithm.

• Content-based filtering has some shortcomings since it does not involve other users (i.e., their
ratings).

• With a pure collaborative approach, movie recommendation systems suffer from poor
recommendation quality and scalability issues.
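
To make the cosine similarity mentioned in the first limitation concrete, here is a small sketch of content-based scoring. It assumes each movie is described by a binary genre vector; the genre flags below are hypothetical and are not part of the project's datasets.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no features set, so treat the pair as not similar
    return dot / (norm_a * norm_b)

# Hypothetical binary genre flags: [animation, comedy, action, drama]
genres = {
    "Toy Story (1995)": [1, 1, 0, 0],
    "Lion King, The (1994)": [1, 0, 0, 1],
    "GoldenEye (1995)": [0, 0, 1, 0],
}

query = "Toy Story (1995)"
scores = {
    title: cosine_similarity(genres[query], vec)
    for title, vec in genres.items()
    if title != query
}

# Print the other movies, most similar first.
for title, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{title}: {score:.2f}")
```

Two movies that share no genre flag score 0 here even if the same viewers enjoy both, which is the kind of suggestion a purely content-based method overlooks.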

Sample output screenshots of the proposed system are attached below, followed by a discussion of
its advantages.

Figure 5.1 Mean Rating Graph

Figure 5.2 Count Ratings Graph

Along with the output, the movie recommender system also shows graphs of the mean ratings and the
number of ratings (count ratings) to show how the system arrives at its result. These graphs are
plotted using functions from the Matplotlib library for Python.

Figure 5.3 Output Sample 1

Output sample 1 shows the movies similar to the given input ‘Liar Liar’. Each movie is listed with
its title, year of release, correlation and number of ratings.

The correlation measures the similarity between two movies based on their user ratings. The Pearson
correlation used here ranges from -1 to 1, where values close to 1 indicate that the same users tend
to rate both movies alike. The output varies according to the movie the user chooses in the program.
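
The correlation between two movies can be sketched in plain Python as follows. The appendix code uses the pandas `corrwith` method, which defaults to the Pearson correlation. The function below is an illustrative re-implementation over the users who rated both movies, and the ratings are hypothetical.

```python
import math

def pearson(ratings_a, ratings_b):
    """Pearson correlation over users who rated both movies.

    Each argument maps user_id -> rating. Returns None when fewer than
    two users overlap or when either movie's ratings have zero variance.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return None
    xs = [ratings_a[u] for u in common]
    ys = [ratings_b[u] for u in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)

# Hypothetical ratings, keyed by user ID.
movie_a = {1: 5, 2: 4, 3: 1, 4: 2}
movie_b = {1: 4, 2: 5, 3: 2, 5: 3}   # user 4 did not rate this one
movie_c = {1: 1, 2: 2, 3: 5}

print(pearson(movie_a, movie_b))  # positive: users rate both alike
print(pearson(movie_a, movie_c))  # negative: users rate them oppositely
```

Only users present in both dictionaries contribute, so a correlation computed from very few common users can look strong without being reliable.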

Figure 5.4 Output Sample 2

In output sample 2, the movie taken for comparison is ‘Lion King, The (1994)’.
The recommender system calculates the correlation and the number of ratings of similar movies and
ranks them according to the correlation (similarity). The output screen lists movies such as Ice
Storm and Beauty and the Beast as the recommended movies, along with their correlation and number
of ratings.
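
The ranking step can be sketched as below. The threshold of 100 ratings matches the filter in the appendix code; the candidate rows and the helper name `top_similar` are hypothetical and serve only as an illustration.

```python
# Hypothetical (title, correlation, number of ratings) rows.
candidates = [
    ("Obscure Film", 1.00, 3),
    ("Popular Film A", 0.45, 250),
    ("Popular Film B", 0.38, 180),
    ("Niche Film", 0.90, 12),
]

MIN_RATINGS = 100  # same threshold as in the appendix code

def top_similar(rows, min_ratings=MIN_RATINGS, n=5):
    """Keep movies with enough ratings, then sort by correlation, highest first."""
    kept = [row for row in rows if row[2] > min_ratings]
    return sorted(kept, key=lambda row: row[1], reverse=True)[:n]

for title, corr, count in top_similar(candidates):
    print(f"{title}: correlation={corr:.2f}, ratings={count}")
```

Filtering on the number of ratings before sorting keeps movies with only a handful of ratings, whose correlations can be high by chance, out of the recommendations.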

Typically, a recommendation engine processes data through the following four phases:

• Collection
Data collected here can be either explicit, such as data fed by users (ratings and comments on
products), or implicit, such as page views, order history/return history, and cart events. In
this case, the included datasets are the collected data.

• Storing
The type of data used to create recommendations helps decide the kind of storage that should
be used.

• Analyzing
The recommender system analyzes the data and finds items with similar user engagement using
different analysis methods. In this project, the correlation between movies is calculated from
their user ratings, and the movies are then sorted by it.

• Filtering
The last step is to filter the data to get the relevant information required to provide
recommendations to the user. This requires choosing an algorithm that suits the
recommendation engine. This recommendation system uses a combination of collaborative and
content-based filtering.
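
One common way to combine the two filtering methods is a weighted blend of their scores. The sketch below illustrates that general idea only. The report does not state how its hybrid combines the two signals, so the function `hybrid_score`, the weight `alpha` and the candidate values are all assumptions made for this example.

```python
def hybrid_score(content_sim, collab_corr, alpha=0.5):
    """Weighted blend of a content similarity and a rating correlation.

    alpha is a hypothetical mixing weight in [0, 1]; alpha=1 uses only the
    content score and alpha=0 only the collaborative score.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * content_sim + (1.0 - alpha) * collab_corr

# Hypothetical candidates: (title, content similarity, rating correlation)
candidates = [
    ("Movie A", 0.90, 0.20),
    ("Movie B", 0.40, 0.80),
    ("Movie C", 0.10, 0.30),
]

# Rank the candidates by their blended score, highest first.
ranked = sorted(
    candidates,
    key=lambda c: hybrid_score(c[1], c[2], alpha=0.4),
    reverse=True,
)
for title, content_sim, collab_corr in ranked:
    print(title, round(hybrid_score(content_sim, collab_corr, alpha=0.4), 2))
```

With `alpha=0.4` the rating correlation carries slightly more weight, so a movie that many similar users liked can outrank one that merely shares attributes with the query.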

The advantages of the proposed system are:

● Accuracy is higher than with a purely collaborative approach.

● Computing time is low.

● Even though the memory requirement is high, the recommendation quality is better than that of
either approach used alone.

● Scalability is high.

CHAPTER 6

CONCLUSION

With the rapid development of Internet technology, the amount of information is growing at an
explosive speed. Users often struggle to obtain useful information efficiently, and it is difficult
for them to find the information they are interested in simply and quickly. The personalized
recommendation system gives users a passive way to obtain information and can provide information
tailored to each user. At present, personalized recommendation systems are widely used in video
websites, music websites, e-commerce, news reading websites and other fields, and have attracted
more and more attention from scholars and industry. This project proposes a hybrid movie
recommendation system based on content-based and collaborative filtering algorithms. The focus of
the algorithm is to consider the user’s behavior information and item category preference
information at the same time.

Hybrid recommendation engines are essentially a combination of diverse rating and sorting
algorithms. For instance, a hybrid recommendation engine could use collaborative filtering and
product-based filtering in tandem to recommend a broader range of products to customers with
greater precision. Compared to pure collaborative and content-based methods, hybrid methods can
provide more accurate recommendations. They can also overcome common issues in recommendation
systems such as cold start and data sparsity.

Recommendation engines today serve as a key to the success of online businesses. However, for a
sound recommendation system to make relevant recommendations in real time, it needs to correlate
not just product data but also customer, inventory, logistics and social sentiment data.

All in all, recommender systems can be a powerful tool for any e-commerce business, and rapid
future developments in the field will increase their business value even further. With a wide range
of business applications, and by offering better suggestions to customers, brands can leverage
recommender systems in two key areas: customer satisfaction and enhanced personalization.

The first step towards great product recommendations is a willingness to experiment in pursuit of
better conversions. The only way to truly engage with customers and to keep improving is to keep
trying new methods and improvements in the domain.

Thus, a movie recommendation system has been created that uses a dataset to give suggestions
based on ratings and user interests. The system focuses on the user’s personal interests: movies
are recommended to a user based on his or her previous ratings. This strategy helps in improving
the accuracy of the recommendations.

As a future enhancement, a recommender system that uses model-based collaborative filtering
based on matrix factorization can be built. It can be evaluated using measures such as the Root
Mean Squared Error (RMSE).
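
RMSE is the square root of the mean squared difference between the actual and the predicted ratings. A minimal sketch of how it could be computed is given below; the held-out ratings and the predictions are hypothetical values, not results from this project.

```python
import math

def rmse(actual, predicted):
    """Root Mean Squared Error between two equal-length rating lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical held-out ratings and model predictions.
actual = [4, 3, 5, 2]
predicted = [3.5, 3.0, 4.0, 3.0]
print(round(rmse(actual, predicted), 3))
```

A lower RMSE means the predicted ratings are closer to the ratings the users actually gave, and large errors are penalised more than small ones because the differences are squared.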

REFERENCES

1. M. Govindarajan ” Sentiment Analysis of Movie Reviews using Hybrid Method of Naive


Bayes and Genetic Algorithm “International Journal of Advanced Computer Research, Vol.3,
Issue-13, December-2013….

2. Debadrita Roy, Arnab Kundu, (2013), “Design of Movie Recommendation System by Means
of Collaborative Filtering”, International Journal of Emerging Technology and Advanced
Engineering, Volume 3, Issue 4

3. Dhanashri Chafale, Amit Pimpalkar” Sentiment Analysis on Product Reviews Using


Plutchik’s Wheel of Emotions with Fuzzy Logic ” An International Journal of Engineering
& Technology , Vol. 1, Issue No. 2, December, 20140

4. Haruna, K.; Ismail, M.A.; Suyanto, M; Gabralla, L.A.; Bichi, A.B.; Danjuma, S.;
Kakudi, H.A.; Haruna, M.S.; Zerdoumi, S.; Abawajy, J.H.; Herawan, T.; “A Soft Set Approach
for Handling Conflict Situation on Movie Selection”, IEEE, vol: 7, 2019, pp: 116179-116194

5. Kalra, N.; Yadav, D.; Bathla, G.; “SynRec: A Prediction Technique using Collaborative
Filtering and Synergy Score”, International Journal of Engineering and Advanced
Technology, vol: 8, 2019, pp: 457-463

6. Pavithra, M.; Sowmiya, S.; Tamilmalar, A.; Raguvaran, S.; “Searching an Optimal Algorithm
for Movie Recommendation System”, International Research Journal of Engineering and
Technology, vol: 6, 2019, pp: 216-221

7. Nagamanjula. R.; Pethalakshmi, A.; “A Novel Scheme for Movie Recommendation System
using User Similarity and Opinion Mining”, International Journal of Innovative Technology
and Exploring Engineering, vol: 8, 2019, pp: 316-322

8. G. Vaitheeswaran , L. Arockiam “Hybrid Based Approach to Enhance the Accuracy of


Sentiment Analysis on Tweets ” IJCSET ,Vol 6, Issue 6, June, 2016.

31
9. Pravin Keshav Patil, K. P. Adhiya “ Automatic Sentiment Analysis of Twitter Messages
Using Lexicon Based Approach and Naive Bayes Classifier with Interpretation of Sentiment
Variation” International Journal of Innovative Research in Science, Engineering and
Technology, Vol. 4, Issue 9, September 2015.

10. Zhang, R.; Mao, Y.; “Movie Recommendation via Markovian Factorization of Matrix
Processes”, IEEE, vol: 7, 2019, pp: 13189-13199

11. Mhetre, R.; Priya, G.; “Movie Recommendation Engine using Collaborative Filtering with
Alternative Least Square and Singular Value Decomposition Algorithms”, International
Journal of Advanced Research in Computer and Communication Engineering, vol: 8, 2019,
pp: 88-92

12. Xi, W.; Huang, L.; Wang, C.; Zheng, Y.; Lai, J.; “BPAM: Recommendation Based on
BP Neural Network with Attention Mechanism”, Proceedings of the Twenty-Eighth
International Joint Conference on Artificial Intelligence, 2019, pp: 3905-3911

13. Shaik, I.; Nittela, S.S.; Hiwarkar, T.; Nalla, S.; “K-means Clustering Algorithm Based
on E-Commerce Big Data”, International Journal of Innovative Technology and Exploring
Engineering, vol: 8, 2019, pp: 1910-1914

14. Mehra, J.; Thakur, R.S.; “Probability Density Based Fuzzy C Means Clustering for Web
Usage Mining”, International Journal of Innovative Technology and Exploring Engineering,
vol: 8, 2019, pp: 169-17

15. Campadelli, P., Casiraghi, E., & Ceruti, C. (2015, September). Neighborhood selection for
dimensionality reduction. International conference on image analysis and processing (pp. 183-
191). Cham: Springer.

16. Arora, G., Kumar, A., Devre, G. S., & Ghumare, A. (2014). Movie recommendation system
based on users’ similarity. International journal of computer science and mobile computing,
3(4), 765-770.

17. Çano, E., & Morisio, M. (2017). Hybrid recommender systems: a systematic literature review.
Intelligent data analysis, 21(6), 1487-1524.

32
18. De Campos, L. M., Fernández-Luna, J. M., Huete, J. F., & Rueda-Morales, M. A. (2010).
Combining content-based and collaborative recommendations: a hybrid approach based on
Bayesian networks. International journal of approximate reasoning, 51(7), 785-799.

19. Gayen, S., Jha, S., Singh, M., & Kumar, R. (2019). On a generalized notion of anti-fuzzy
subgroup and some characterizations. International journal of engineering and advanced
technology.

20. Zheng, H., Liu, D., Wang, J., & Liang, J. (2019). A QoE-perceived screen updates transmission
scheme in desktop virtualization environment. Multimedia tools and applications, 78(12),
16755-16781.

21. Broumi, S., Dey, A., Talea, M., Bakali, A., Smarandache, F., Nagarajan, D., & Kumar, R.
(2019). Shortest path problem using Bellman algorithm under neutrosophic environment.
Complex & intelligent systems, 5(4), 409-416.

22. Kumar, R., Edalatpanah, S. A., Jha, S., Broumi, S., Singh, R., & Dey, A. (2019). A multi
objective programming approach to solve integer valued neutrosophic shortest path problems.
Neutrosophic sets and systems, 24, 134-149.

23. Kumar, R., Dey, A., Broumi, S., & Smarandache, F. (2020). A study of neutrosophic shortest
path problem. In Neutrosophic graph theory and algorithms (pp. 148-179). IGI Global.

24. Kumar, R., Edalatpanah, S. A., Jha, S., & Singh, R. (2019). A novel approach to solve gaussian
valued neutrosophic shortest path problems. Infinite Study.

25. Kumar, R., Edalatpanah, S. A., Jha, S., Gayen, S., & Singh, R. (2019). Shortest path problems
using fuzzy weighted arc length. International journal of innovative technology and exploring
engineering, 8, 724-731.

26. Kumar, R., Edaltpanah, S. A., Jha, S., & Broumi, S. (2018). Neutrosophic shortest path problem.
Neutrosophic sets and systems, 23(1), 2.

27. Kumar, R., Jha, S., & Singh, R. (2020). A different approach for solving the shortest path
problem under mixed fuzzy environment. International journal of fuzzy system applications
(IJFSA), 9(2), 132- 161.

33
28. Kumar, R., Jha, S., & Singh, R. (2017). Shortest path problem in network with type-2 triangular
fuzzy arc length. Journal of applied research on industrial engineering, 4(1), 1-7.

29. Chiru, C. G., Preda, C., Dinu, V. N., & Macri, M. (2015, September). Movie recommender
system using the user's psychological profile. 2015 IEEE international conference on intelligent
computer communication and processing (ICCP) (pp. 93-99). IEEE.

30. Hande, R., Gutti, A., Shah, K., Gandhi, J., & Kamtikar, V. (2016). MOVIEMENDER-A movie
recommender system. International journal of engineering sciences & research technology
(IJESRT), 5(11), 686

34
APPENDIX I

CODING

# Import the pandas library
import pandas as pd

# Load the ratings data (tab-separated, no header row)
column_names = ['user_id', 'item_id', 'rating', 'timestamp']
path = 'https://media.geeksforgeeks.org/wp-content/uploads/file.tsv'
df = pd.read_csv(path, sep='\t', names=column_names)

# Check the head of the data
df.head()

# Load the movie titles and their respective IDs
movie_titles = pd.read_csv('https://media.geeksforgeeks.org/wp-content/uploads/Movie_Id_Titles.csv')
movie_titles.head()

# Merge the ratings and the titles on the item ID
data = pd.merge(df, movie_titles, on='item_id')
data.head()

# Calculate the mean rating of each movie (highest first)
data.groupby('title')['rating'].mean().sort_values(ascending=False).head()

# Calculate the number of ratings of each movie (highest first)
data.groupby('title')['rating'].count().sort_values(ascending=False).head()

# Create a dataframe with the mean rating and the number of ratings per movie
ratings = pd.DataFrame(data.groupby('title')['rating'].mean())
ratings['num of ratings'] = data.groupby('title')['rating'].count()
ratings.head()

import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style('white')
# Jupyter notebook magic: show the plots inline
%matplotlib inline

# Plot a histogram of the 'num of ratings' column
plt.figure(figsize=(10, 4))
ratings['num of ratings'].hist(bins=70)

# Plot a histogram of the 'rating' column
plt.figure(figsize=(10, 4))
ratings['rating'].hist(bins=70)

# Build a user x movie matrix of ratings (NaN where a user has not rated a movie)
moviemat = data.pivot_table(index='user_id', columns='title', values='rating')
moviemat.head()

# Show the most-rated movies
ratings.sort_values('num of ratings', ascending=False).head(10)

# Ratings given by each user to the two reference movies
starwars_user_ratings = moviemat['Star Wars (1977)']
liarliar_user_ratings = moviemat['Liar Liar (1997)']
starwars_user_ratings.head()

# Correlate every movie's ratings with those of the reference movies
similar_to_starwars = moviemat.corrwith(starwars_user_ratings)
similar_to_liarliar = moviemat.corrwith(liarliar_user_ratings)

corr_starwars = pd.DataFrame(similar_to_starwars, columns=['Correlation'])
corr_starwars.dropna(inplace=True)
corr_starwars.head()

# Movies similar to Star Wars
corr_starwars.sort_values('Correlation', ascending=False).head(10)

# Add the number of ratings and keep only movies with more than 100 ratings
corr_starwars = corr_starwars.join(ratings['num of ratings'])
corr_starwars.head()
corr_starwars[corr_starwars['num of ratings'] > 100].sort_values('Correlation', ascending=False).head()

# Movies similar to Liar Liar
corr_liarliar = pd.DataFrame(similar_to_liarliar, columns=['Correlation'])
corr_liarliar.dropna(inplace=True)
corr_liarliar = corr_liarliar.join(ratings['num of ratings'])
corr_liarliar[corr_liarliar['num of ratings'] > 100].sort_values('Correlation', ascending=False).head()

APPENDIX II

OUTPUT SCREENSHOTS

OUTPUT 1:

The output for the movie – “Star Wars”.

Figure 10.1 Output Movie Set-1

OUTPUT 2:

The output for the movie – “Liar Liar”.

Figure 10.2 Output Movie Set-2

GRAPH REPRESENTATION

GRAPH-1

Figure 10.3 Output- Number of Ratings Graph

GRAPH-2

Figure 10.4 Output- Mean Ratings Graph
