运维开发网

社交网络分析与挖掘 第六课:网页排序

运维开发网 https://www.qedev.com 2020-07-23 14:41 出处:网络 作者:运维开发网整理
The History of PageRank PageRank is a link analysis algorithm which assigns a numerical weighting to each Web page,with the purpose of "measuring"relative importance. Based on the hyperlinks map An ex

The History of PageRank

  • PageRank is a link analysis algorithm which assigns a numerical weighting to each Web page,with the purpose of "measuring"relative importance.
  • Based on the hyperlinks map
  • An excellent way to prioritize the results of web keyword searches
  • PageRank was proposed by Larry Page(hencethe name Page-Rank)and Sergey Brin in 1998.
  • Shortly after,Page and Brin founded Google.

PageRank:the intuitive idea

分享图片

  • Although page E hasmore"point to"links than page C, page C is more important than page E.
  • One reason is that page C is linked by page B whose is the most important page in this network.

A Simple Version of PageRank

\[ R(u)=c \sum_{v \in B_{u}} \frac{R(v)}{N_{v}} \]

  • u:a web page
  • B(u):the set of u‘s backlinks(point-to-links)
  • N(v):the number of forward links of page v
  • c:the normalization factor to make

    \[ \|\mathrm{R}\|_{\mathrm{L} 1}=\left|\mathrm{R}_{1}+\ldots+\mathrm{R}_{\mathrm{n}}\right| \]

An example of Simplified PageRank

分享图片

A Problem with Simplified PageRank

A loop:

分享图片

During each iteration,the loop accumulates rank but never distributes rank to other pages!

Fix the problem:what PageRank does

  • Randomly walking and also randomly jumping
    • the“random surfer”simply keeps clicking successive links at random,but periodically “gets bored”and jumps to a random page based on the distribution

      分享图片

The actual PageRank

\[ R^{\prime}(u)=c_{1} \sum_{v \in B_{u}} \frac{R^{\prime}(v)}{N_{v}}+\mathrm{c}_{2} E(u) \]

  • E(u):a distribution of ranks of web pages that "users"jump to when they"gets bored"after successive links at random.

An example of PageRank

分享图片

Convergence Experiments

  • PR(322 Million Links):52 iterations
  • PR(161Million Links):45 iterations
  • Scaling factor is roughly linear in logn

    分享图片

Conclusion

  • PageRank rank web pages based on their locations in the web graph structure
  • PageRank uses information which is external to the web pages-backlinks
  • Backlinks from important pages are more significant than backlinks from common pages
  • The structure of the web graph is very useful for information retrieval tasks.

以上内容出自小象学院

0

精彩评论

暂无评论...
验证码 换一张
取 消