0% found this document useful (0 votes)
136 views6 pages

DWM Expt9

The document describes implementing PageRank and HITS algorithms. It provides an overview of the algorithms and references for additional information. It then outlines the steps to code the PageRank algorithm, including using a sparse matrix to represent the graph, iterating until convergence is reached. It also discusses how to address issues like dead ends and spider traps by incorporating a damping factor and modeling a random surfer. Finally, it proposes improvements like personalized PageRank to make the algorithms more robust.

Uploaded by

Temp
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
136 views6 pages

DWM Expt9

The document describes implementing PageRank and HITS algorithms. It provides an overview of the algorithms and references for additional information. It then outlines the steps to code the PageRank algorithm, including using a sparse matrix to represent the graph, iterating until convergence is reached. It also discusses how to address issues like dead ends and spider traps by incorporating a damping factor and modeling a random surfer. Finally, it proposes improvements like personalized PageRank to make the algorithms more robust.

Uploaded by

Temp
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 6

Expt 9

Aim Implementation of Page rank/HITS algorithm

Objective: WAP to implement Page rank algorithm for a given graph and HITS algorithm

References:

http://pi.math.cornell.edu/~mec/Winter2009/RalucaRemus/Lecture3/lecture3.html PageRank

- Wikipedia

http://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.htm

https://www.youtube.com/watch?v=3_1h13PJkUs

Algorithm

Part 1: Simple

I/P- initial graph –READ/DEFINE THE MATRIX matrix and ( N –no of nodes 0

O/P- find page rank vector and k- no of iterations.

1) PASS MATRIX-A TO PAGE RANK FUNCTION

2) initial vector V0= [1/n, 1/n, 1/n..ntimes]-Print V0

3) LOOP -i=1,

Vi= M*V i-1

i=i+1;

4) Iterate until converge V or i steps( COMPARE if( Vi =Vi-1) then stop)


5) return (Print )Vi and i- no of iterations

Part 2: Web Surfer Algorithm ( Damping Factor=0.85)-


Page Rank Algorithm and Implementation - GeeksforGeeks

Page Rank - Wikipedia

Part 3: HITs algorithm- Refer PPT example


Lab Questions

1) Give the efficient approach to handle M and write reason

Ans:

● Sparse Matrix Representation: In a web graph, the link structure between pages often

results in a sparse matrix where most of the elements are zero. By representing the

web graph as a sparse matrix, we can significantly reduce the memory footprint and

computational cost associated with handling large datasets.

● Power Iteration Method: The power iteration method allows for the computation of

the PageRank vector through iterative matrix-vector multiplication. This method

converges to the principal eigenvector of the transition probability matrix, which

represents the PageRank scores for each page. It's an efficient way to estimate

PageRank without directly computing the full eigenvalue decomposition of the

transition matrix, which can be computationally expensive for large matrices.

● Parallel Processing: The power iteration method can be parallelized, allowing for the

distributed computation of PageRank across multiple processors or computing nodes.

This parallel processing capability further improves the efficiency of handling a large

number of web pages by distributing the computation workload.

● Iterative Convergence Control: Implementing a convergence criterion during the

iterative process ensures that the computation stops when the PageRank scores have

converged to a stable value. This helps avoid unnecessary iterations, optimizing the

algorithm's performance for large-scale datasets.

● Graph Partitioning Techniques: Leveraging graph partitioning techniques can help

distribute the web graph across different computing nodes or clusters, enabling

efficient processing and reducing communication overhead, especially in distributed

computing environments.
2) Give improvement of Page rank algorithm for spider trap problem and dead end

Ans:

● Damping Factor (Teleportation): The introduction of a damping factor, typically

denoted by the symbol d, helps in avoiding spider traps and dead ends. This factor

allows the possibility of "teleporting" from a dead end to any other page in the web

graph. By incorporating this probability of teleportation into the computation, the

PageRank algorithm ensures the convergence of the scores even in the presence of

dead ends.

● Random Surfer Model: The concept of the random surfer model assumes that a

random surfer, after encountering a dead end, will teleport to another page in the

graph with a certain probability. This model helps in mitigating the impact of dead

ends and ensures that the web graph remains connected, thereby improving the

robustness and accuracy of the PageRank algorithm.

● Handling Spider Traps: To address spider traps, where a surfer can get stuck in a

subset of pages without escaping, algorithms often introduce heuristics that break the

loops within the web graph. Techniques such as "trust-rank" or "topic-sensitive

PageRank" help prevent a surfer from getting trapped by adjusting the random walk

probabilities based on the topical relevance of web pages.

● Personalized PageRank: Personalized PageRank allows the computation of PageRank

scores with respect to a specific query or context, thereby reducing the impact of

spider traps and dead ends for personalized search results. This approach emphasizes

pages that are more closely related to the user's interests, minimizing the influence of

irrelevant or disconnected components.

● Topic-Sensitive PageRank: Topic-Sensitive PageRank modifies the original PageRank

algorithm to consider the topical relevance of web pages. By incorporating topical


information, the algorithm can navigate through the web graph more effectively,

avoiding spider traps and dead ends related to specific topics or themes.

Other references:

• Data Mining - Mining Text Data - Tutorialspoint

• Data Mining - Mining World Wide Web – Tutorialspoint

• Web Mining (tutorialride.com)

• Web Mining and Text Mining - An In-Depth Mining Guide (eduonix.com)

• Hyperlink Induced Topic Search (HITS) Algorithm using Networxx Module | Python –
GeeksforGeeks

• PageRank Algorithm - The Mathematics of Google Search (cornell.edu) • Google's

PageRank Algorithm: Explained and Tested (link-assistant.com) • HITS Algorithm - Hubs

and Authorities on the Internet (cornell.edu)

• HITS Algorithm: Link Analysis Explanation and Python Implementation from Scratch | by Chonyy | Towards
Data Science

• Text Data Mining – Javatpoint

• Text Summarization with NLTK in Python (stackabuse.com)

• (Tutorial) Text ANALYTICS for Beginners using NLTK – DataCamp • Beautiful Soup 4 Python -

PythonForBeginners.com • NLTK Tutorial in Python: What is Natural Language Toolkit?

(guru99.com) • NLTK Python Tutorial (Natural Language Toolkit) - DataFlair (data-flair.training) •

Natural Language Processing - Introduction – Tutorialspoint

• Natural Language Processing - Introduction – Tutorialspoint

• Natural Language Processing - Introduction - Tutorialspoint

You might also like