PageRank: An Introduction

PageRank Overview

PageRank is an exclusive technology and registered trademark of Google Inc., and refers to the invention described in U.S. Patent No. 6,285,999, "Method for node ranking in a linked database." The technology was invented by Lawrence Page (after whom PageRank is named), co-founder of Google Inc., while he was a PhD candidate in the Stanford University Computer Science Department.

The Google Technology page offers the following:
The heart of our software is PageRank(tm), a system for ranking web pages developed by our founders Larry Page and Sergey Brin at Stanford University. And while we have dozens of engineers working to improve every aspect of Google on a daily basis, PageRank continues to provide the basis for all of our web search tools.

PageRank has undergone a number of interesting refinements over the last six years (many of which are explored in other articles here). Since PageRank’s initial deployment in the BackRub search engine*, the original technical and mathematical concepts have remained essentially intact. PageRank’s continued viability on a rapidly changing internet is a testament to the versatility and scalability of the PageRank formula.

Briefly, PageRank is a mathematical formula that provides a completely objective method of assessing the importance of web pages. The assessment is made by evaluating the relationships between web pages (which can be rendered as the internet link graph). The following description is from the patent document:
A method assigns importance ranks to nodes in a linked database, such as any database of documents containing citations, the world wide web or any other hyper-media database.

Unlike other citation-based ranking formulas (e.g., HITS), PageRank bears the distinction of being a curiously accurate probability model of internet user behavior. This justification is touched upon in The Anatomy of a Large-Scale Hypertextual Web Search Engine by Sergey Brin and Lawrence Page:
PageRank can be thought of as a model of user behavior. We assume there is a “random surfer” who is given a web page at random and keeps clicking on links, never hitting “back” but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank.
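The random surfer described in the quote can be simulated directly. The following is a minimal sketch, not Google's implementation: the four-page graph, the function name simulate_random_surfer, and the 15% chance of getting bored are all assumptions made for illustration. It also assumes that a surfer stuck on a page with no outgoing links simply starts over.

```python
import random

def simulate_random_surfer(links, steps=100_000, boredom=0.15, seed=0):
    """Estimate how often a random surfer visits each page of a small link graph."""
    rng = random.Random(seed)
    pages = sorted(links)
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)              # the surfer is given a page at random
    for _ in range(steps):
        visits[current] += 1
        outgoing = links[current]
        # Assumption: the surfer restarts when bored or when the page has no links.
        if not outgoing or rng.random() < boredom:
            current = rng.choice(pages)
        else:
            current = rng.choice(outgoing)   # click a link, never hit "back"
    return {page: count / steps for page, count in visits.items()}

# Hypothetical four-page web: A, B and C all link to D; D links back to A.
toy_web = {"A": ["D"], "B": ["D"], "C": ["D"], "D": ["A"]}
frequencies = simulate_random_surfer(toy_web)
for page, share in sorted(frequencies.items(), key=lambda item: -item[1]):
    print(page, round(share, 3))
```

The share of visits each page receives is an estimate of its PageRank under these assumptions. In this toy graph D collects the most visits because every other page links to it, and A comes next because it receives D's only link.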

* BackRub was the initial search engine created by Lawrence Page and Sergey Brin, and the conceptual predecessor of Google.

PageRank: Clarifying Importance as Opposed to Relevance


Importance:

The relative value of any given web page or internet document in relation to other web pages or internet documents.

Google’s PageRank is a determination of importance. PageRank is calculated from the quantity and quality of citations (links) pointing to any given document. This data is then used to form a probability distribution that describes the activity of the random surfer. The behavior of the random surfer with regard to a particular web page is described by that page’s PageRank.
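To make "quantity and quality of citations" concrete, here is a small sketch that computes such a probability distribution by repeated passes over a link graph. It uses the normalized form of the formula, in which all ranks sum to 1, with a damping factor of 0.85. The function name pagerank, the page names, and the handling of pages without outgoing links are assumptions for illustration only.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Iterate the PageRank update until the ranks stop changing."""
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(max_iter):
        # Assumption: rank held by pages with no outgoing links is shared by all pages.
        dangling = sum(rank[page] for page in pages if not links[page])
        new_rank = {page: (1.0 - damping) / n + damping * dangling / n
                    for page in pages}
        for page in pages:
            for target in links[page]:
                # Each citation passes on an equal share of the citing page's rank.
                new_rank[target] += damping * rank[page] / len(links[page])
        change = sum(abs(new_rank[page] - rank[page]) for page in pages)
        rank = new_rank
        if change < tol:
            break
    return rank

# Hypothetical graph: every link target must also appear as a key.
citations = {
    "portal": ["news", "blog"],
    "news": ["portal"],
    "blog": ["portal"],
    "fan": ["portal"],
}
ranks = pagerank(citations)
for page, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(value, 4))
```

Quantity shows up in "portal", which is cited by three pages and ends with the highest rank. Quality shows up in "news" and "blog": each has a single citation, but it comes from the highly ranked "portal", so both outrank "fan", which nobody cites.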


Relevance:

The degree of relatedness between a web page or internet document and a search query or a specified set of web pages or internet documents.

Semantic analysis is an example of document-to-query relevancy evaluation.

Caught in the Middle:

Topic-Sensitive PageRank sits between the two. It is a determination of relevancy with respect to a specified set of related documents, and it also determines the relative importance of any given document within the set in relation to all other documents within the set.
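One common way to formulate a topic-sensitive variant is to let the bored surfer restart only on pages that belong to the chosen topic, rather than on any page. The sketch below assumes that formulation; the function name topic_sensitive_pagerank, the five-page graph, and the topic sets are hypothetical and are not taken from any published implementation.

```python
def topic_sensitive_pagerank(links, topic_pages, damping=0.85, iterations=100):
    """PageRank in which every restart lands on a page from topic_pages."""
    pages = sorted(links)
    restart = {page: (1.0 / len(topic_pages) if page in topic_pages else 0.0)
               for page in pages}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {page: 0.0 for page in pages}
        restart_mass = 0.0
        for page in pages:
            targets = links[page]
            if targets:
                for target in targets:
                    new_rank[target] += damping * rank[page] / len(targets)
                restart_mass += (1.0 - damping) * rank[page]
            else:
                restart_mass += rank[page]   # assumption: dead ends always restart
        for page in pages:
            new_rank[page] += restart_mass * restart[page]
        rank = new_rank
    return rank

# Hypothetical graph with two topics joined by a shared portal page.
web = {
    "recipes": ["baking", "portal"],
    "baking": ["recipes"],
    "scores": ["teams", "portal"],
    "teams": ["scores"],
    "portal": ["recipes", "scores"],
}
cooking = topic_sensitive_pagerank(web, {"recipes", "baking"})
sports = topic_sensitive_pagerank(web, {"scores", "teams"})
print(round(cooking["recipes"], 3), round(cooking["scores"], 3))
print(round(sports["recipes"], 3), round(sports["scores"], 3))
```

The link graph is identical in both runs. Only the restart set changes, and that alone is enough to rank "recipes" above "scores" for the cooking topic and the reverse for the sports topic.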

Toolbar PageRank

It is frustrating (and fruitless) to try to gauge the relative “importance” of a web page from the PageRank toolbar.

The logarithmic scale used to compress actual PageRank values into a convenient 0 to 10 package is certainly not designed for accuracy, especially when you consider how large the difference in actual PageRank value can be between sites that show identical toolbar PageRank values.
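The loss of detail is easy to see in a sketch. The mapping Google actually uses has not been published, so everything here is hypothetical: the function name toolbar_value, the base of 10, and the raw scores (assumed to be rescaled so that the smallest meaningful score is 1).

```python
def toolbar_value(raw_score, base=10, top=10):
    """Bucket a raw score on a logarithmic scale, capped at `top`.

    Counting how many times the score can be divided by `base` is equivalent
    to taking the floor of its base-`base` logarithm.
    """
    value = 0
    while raw_score >= base and value < top:
        raw_score /= base
        value += 1
    return value

# Hypothetical raw scores for three sites.
for site, raw in [("site-one", 1_500), ("site-two", 9_000), ("site-three", 12_000)]:
    print(site, raw, toolbar_value(raw))
```

Under these assumptions the first two sites land in the same toolbar bucket even though one raw score is six times the other, while the third site, only a third larger than the second, moves up a full step.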

Nevertheless, the fact remains that the PageRank formula works relatively well even if we don’t get to see its actual workings.

