Monday, August 29, 2005

Link analysis

In a previous post (Web-based Document Search challenge), I talked about the issue of performing simple similarity-based searches for retrieving documents.

One solution, per the book "Text Mining: Predictive Methods for Analyzing Unstructured Information", is to perform link analysis. Google uses the PageRank algorithm, in which the rank of a document is determined by the ranks of the documents that link to it. A document should be ranked highly if it is cited by other highly ranked documents.

Academic documents can also be ranked with this kind of citation analysis. If a paper is cited by highly ranked papers, then it should be highly ranked as well.
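To make the idea concrete, here is a minimal sketch of a PageRank-style ranking over a tiny citation graph. It is an illustration only, not Google's actual implementation: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, the even redistribution of rank from documents with no outgoing links, and the sample documents A to E are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank documents by a simple power iteration.

    links maps each document to the list of documents it cites (links to).
    damping and iterations are illustrative defaults, not tuned values.
    """
    # Collect every document that appears as a source or a target.
    docs = set(links)
    for targets in links.values():
        docs.update(targets)
    n = len(docs)

    # Start with the rank spread evenly over all documents.
    rank = {doc: 1.0 / n for doc in docs}

    for _ in range(iterations):
        # Every document gets a small baseline share.
        new_rank = {doc: (1.0 - damping) / n for doc in docs}
        for doc in docs:
            targets = links.get(doc, [])
            if targets:
                # A document splits its rank among the documents it cites.
                share = damping * rank[doc] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a document that cites nothing spreads
                # its rank evenly over all documents.
                share = damping * rank[doc] / n
                for other in docs:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical citation graph: A and B cite C, C cites D, A also cites E.
citations = {
    "A": ["C", "E"],
    "B": ["C"],
    "C": ["D"],
}

ranks = pagerank(citations)
for doc in sorted(ranks, key=ranks.get, reverse=True):
    print(doc, round(ranks[doc], 3))
```

In this sample graph, D and E are each cited exactly once, but D is cited by C (which is itself cited twice) while E is cited only by A (which nobody cites), so D ends up ranked above E. That is the "cited by a highly ranked document" effect described above.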

References
  1. The anatomy of a search engine
  2. The PageRank citation ranking: Bringing order to the web
  3. Authoritative Sources in a hyperlinked environment
  4. Citation analysis as a tool in journal evaluation
