0% found this document useful (0 votes)
116 views6 pages

The Linear Algebra Behind Google'S Pagerank Algorithm: Sujit Dunga 11110102

The document discusses the PageRank algorithm developed by Google to rank websites in search results. It provides an overview of the mathematics behind PageRank, describing it as a function that assigns a real number to each webpage based on the number and quality of inbound links. The PageRank of a page is considered a measure of its importance, with more important pages receiving higher ranks. The document outlines the key steps in calculating PageRank, including formulating the transition matrix and iterating the PageRank vector until it converges. It also discusses some open problems and limitations with PageRank, as well as applications of the algorithm beyond web search.

Uploaded by

Sujit Dunga
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
116 views6 pages

The Linear Algebra Behind Google'S Pagerank Algorithm: Sujit Dunga 11110102

The document discusses the PageRank algorithm developed by Google to rank websites in search results. It provides an overview of the mathematics behind PageRank, describing it as a function that assigns a real number to each webpage based on the number and quality of inbound links. The PageRank of a page is considered a measure of its importance, with more important pages receiving higher ranks. The document outlines the key steps in calculating PageRank, including formulating the transition matrix and iterating the PageRank vector until it converges. It also discusses some open problems and limitations with PageRank, as well as applications of the algorithm beyond web search.

Uploaded by

Sujit Dunga
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 6

The linear algebra behind Googles PageRank algorithm

Sujit Dunga
11110102

1. Abstract
Today, Around 40% of the world population has access to the internet and there are over 800
million websites on the Internet. Search engine forms the most important part of the use of
browsing the internet. More than 60% of the users do not go past the first page and more than 90%
users do not go past the 3rd page in the search engine results page (SERP). If your website cannot be
found within the first 3 pages in the search engine results page (SERP), your website is in great
danger of being visited by very few people.

This paper gives an insight on fundamental mathematics behind PageRank, an algorithm


used by worlds fastest search engine Google to rank websites in their search engine
results page.

2. Introduction
Earlier search engines typically used to give links to pages that had the highest keyword
density. Internet spammers took it to advantage and created websites by repeating the same
words/sentences again and again in order to attract higher search page results.
To overcome the internet spamming, PageRank was first formulated in 1996 by Sergey Brin
and Larry Page (co-founders of Google) at Stanford University. It was named after Larry
Page.
PageRank is a function that assigns a real number to each page in the Web (or at least to that
portion of the Web that has been crawled and its links discovered). The intent is that the
higher the PageRank of a page, the more important it is.1
The PageRank of a page is a measure of how likely it is to be a good response to a search
query. In its simplest form, PageRank is a solution to the recursive equation a page is
important if important pages link to it.2
PageRank is displayed on the toolbar of your web browser if you have installed the Google
toolbar (http://toolbar.google.com/)
1 Mining of Massive Datasets Anand Rajaraman, Jure Leskovec, and Jeffrey D. Ullman,
page 165

2 Mining of Massive Datasets Anand Rajaraman, Jure Leskovec, and Jeffrey D.


Ullman, page 197

3. Motivation
Earlier, Internet has always been full of spammers and it has become a major challenge for
the search engine and even the user to get the exact results the user desires. Developing an
efficient and accurate Web search was one of the biggest challenges of the decade. Even
though Google was not the first search engine, it was the first to be able to defeat the
spammers using the PageRank algorithm. PageRank is the first and still the best-known
algorithm used by the worlds fastest search engine. Hence the paper focuses on giving a
insight on how this PageRank algorithm works.

4. Mathematics behind PageRank

3 http://searchengineland.com/what-is-google-pagerank-a-guide-for-searcherswebmasters-11068

The entire web is assumed to be a directed graph where each node is a page each directed line
is a link from one page to another. In the above picture, n1, n2, n3 are the web pages. It
shows the various out-links from each webpage.
Calculating the PageRank:
Step 1: Formulating the transition matrix:

4 http://computationalculture.net/article/what_is_in_pagerank

5 Mining of Massive Datasets Anand Rajaraman, Jure Leskovec, and Jeffrey D. Ullman,
page 166

A transition matrix

is a square matrix of order

nn

webpages. Each row


i
and column correspond to the in-links and out-links of the webpages i.e, the th row and the
i
i
th column represent the th page of the Web.
The value of the element
i
out-link is to page .
If page

mij

is

1/ k

, if there are

if number of out-links from page

has no out-links to any page then the value of

mij

is

and one of the

is 0.

Therefore the transition matrix for the above sketch is as follows:


0 1/ 2
1/ 3 0
M
1/ 3 0

1/ 3 1/ 2

1 0
0 1/ 2
0 1/ 2

0 0

Column 1, row 1 represent page A; Column 2, row 2 represent page B and so on.
The matrix in-short gives the probability of a random surfer going from one page to all the
other pages in the web including itself.
Step 2: Formulating the PageRank vector:
Page rank is nothing but a column vector
j
the random surfer is at page .

whose

th component gives the probability that

The solution to the PageRank vector is given by the equation:

v Mv
From the above equation PageRank can also be defined as the principal eigenvector of the
M .v 1.v
transition matrix with eigen value 1(
).
This is nothing but a system of linear equations which can solved using analytical methods
(Gaussian elimination ) but in reality there will be billions of pages and the solving
becomes difficult (It takes time of cubic power of number of equations). Hence we use an
iterative method to solve this equation.
We re-write the equation in iterative terms as:
vi 1 Mvi
Step 3: Initializing the PageRank vector:

v0
Initial guess for the PageRank vector is taken as a column vector in which each element is
1
n
n
i.e, we set equal probability for the random surfer to be at any of the webpages initially.
Step 4: Iterating the PageRank vector:
Now we find
v1 Mv0
v2 Mv1 M ( Mv0 ) M 2 v0
v3 Mv2 M (M ( Mv0 )) M 3v0
M
vn M n v0
vn , vn 1
vn
We do this until
converge and the resulting
will be our PageRank function/vector
n
for the webpages.
The above algorithm counts all in-links equally. Modified later versions of PageRank
algorithm determines that some links may be more valuable than others, and therefore assigns
them more weight than others.
5. Applications in Various Disciplines
Predicting the extinction of a species by studying food web.
Used in molecular universe to map the shapes and chemical reactions of
quadrillions of molecules.

Analysis of Metabolic and protein-protein interaction (PPI) networks.

Ranking tweets in Twitter.


Recommendation systems: suggesting Movies on Netflix.
Treatment of lung cancer- to identify tumours more likely to spread to other sites
in the body and treat them before they have a chance to spread.
Suggesting friends on Facebook

6. Open Problems
A link farm is a website (or a group of websites) whose majority of the content
is hyperlinks , often random and unrelated to other websites. Link farms were generated in
response to Google's PageRank algorithm. It is created to increase the PageRank of another
site by increasing the number of incoming links. Since PageRank algorithm assumes that the
webpage and its in-links are related to each other, a page might have a higher PageRank even
if it is unrelated to search query. Google has the right to penalize web sites that have created
link farms.

PageRank of a website can be increased by purchasing paid links from high PageRank
sites. Google has publicly warned high PageRank sites that if they were found to be selling
links to other low rank sites, their links would not be included from then on in their PageRank
calculation scheme.
Google bombing is the practice of enhancing any websites SEO profile by repeatedly
inputting the keyword associated with the destination. With enough mass participation,
Google bombing can bump the desired content to a higher page ranking in search results.6
Example: In 2003, large amount of people gathered to link the phrase miserable failure as
anchor text to link to the biography of George Bush.
The war between those who want to make the internet useful and those who would exploit it
for their own purposes will never end. Modifications will be made to the PageRank algorithm
as years pass by or even a day might come when a new algorithm will algorithm take over the
place of PageRank algorithm.
7. Conclusion
In this paper we have learnt fundamental mathematics behind the googles PageRank
algorithm and its increasing applications in network based problems.
PageRank is a powerful and irreplaceable tool that made the web-search efficient and
accurate and found its place in various network (graph) based real life applications.

8. References
Book:Mining of Massive Datasets Anand Rajaraman, Jure Leskovec, and Jeffrey D. Ullman
http://infolab.stanford.edu/~ullman/mmds/book.pdf
http://www.google.co.in/about/company/history/
http://www.wired.com/2009/09/googlefoodwebs/
http://www.internetlivestats.com/internet-users/
http://news.bbc.co.uk/2/hi/8238462.stm
http://www.extremetech.com/extreme/118419-applying-googles-pagerank-algorithm-to-the-molecular-universe
http://web.stanford.edu/class/msande233/handouts/lecture8.pdf
http://en.wikipedia.org/wiki/PageRank
http://blog.woorank.com/2013/04/what-is-a-pagerank-and-how-important-is-it-in-2013/
http://en.wikipedia.org/wiki/Link_farm
http://searchengineland.com/what-is-google-pagerank-a-guide-for-searchers-webmasters-11068
http://knowyourmeme.com/memes/google-bombing
http://google.about.com/od/g/g/googlebombdef.htm

6 http://knowyourmeme.com/memes/google-bombing

You might also like