The History of PageRank
- PageRank is a link analysis algorithm which assigns a numerical weighting to each Web page,with the purpose of "measuring"relative importance.
- Based on the hyperlinks map
- An excellent way to prioritize the results of web keyword searches
- PageRank was proposed by Larry Page(hencethe name Page-Rank)and Sergey Brin in 1998.
- Shortly after,Page and Brin founded Google.
PageRank:the intuitive idea
- Although page E hasmore"point to"links than page C, page C is more important than page E.
- One reason is that page C is linked by page B whose is the most important page in this network.
A Simple Version of PageRank
\[ R(u)=c \sum_{v \in B_{u}} \frac{R(v)}{N_{v}} \]
- u:a web page
- B(u):the set of u‘s backlinks(point-to-links)
- N(v):the number of forward links of page v
- c:the normalization factor to make \[ \|\mathrm{R}\|_{\mathrm{L} 1}=\left|\mathrm{R}_{1}+\ldots+\mathrm{R}_{\mathrm{n}}\right| \]
An example of Simplified PageRank
A Problem with Simplified PageRank
A loop:
Fix the problem:what PageRank does
- Randomly walking and also randomly jumping
- the“random surfer”simply keeps clicking successive links at random,but periodically “gets bored”and jumps to a random page based on the distribution
- the“random surfer”simply keeps clicking successive links at random,but periodically “gets bored”and jumps to a random page based on the distribution
The actual PageRank
\[ R^{\prime}(u)=c_{1} \sum_{v \in B_{u}} \frac{R^{\prime}(v)}{N_{v}}+\mathrm{c}_{2} E(u) \]
- E(u):a distribution of ranks of web pages that "users"jump to when they"gets bored"after successive links at random.
An example of PageRank
Convergence Experiments
- PR(322 Million Links):52 iterations
- PR(161Million Links):45 iterations
- Scaling factor is roughly linear in logn
Conclusion
- PageRank rank web pages based on their locations in the web graph structure
- PageRank uses information which is external to the web pages-backlinks
- Backlinks from important pages are more significant than backlinks from common pages
- The structure of the web graph is very useful for information retrieval tasks.
以上内容出自小象学院
精彩评论