Saturday, August 27, 2005

Topic: Word Sense Disambiguation (Text Processing)

My notes from the book - "Text Mining: Predictive Methods for Analyzing Unstructured Information"

This is an optional data pre-processing step. It is performed because English words are ambiguous (as to their meaning or reference) - even when they are tagged with their parts of speech. For example, the noun "bore" can refer to a hole (the bore is not large enough) or a person (he is a bore).
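To make the "bore" example concrete, here is a minimal sketch. The sense inventory is a hand-written toy (the `SENSES` dictionary and the `candidate_senses` helper are made up for illustration, not taken from the book or from any real lexical resource). It only shows that fixing the part of speech narrows the candidates without settling on one.

```python
# Toy sense inventory: a hypothetical stand-in for a lexical resource.
# The glosses are hand-written for illustration, not real dictionary data.
SENSES = {
    ("bore", "NOUN"): [
        "a hole, or the diameter of a hole",
        "a dull, tiresome person",
    ],
    ("bore", "VERB"): [
        "to drill a hole",
        "to make someone weary through dullness",
    ],
}


def candidate_senses(word, pos):
    """Return every sense listed for a (word, part-of-speech) pair."""
    return SENSES.get((word.lower(), pos), [])


# Even with the part of speech fixed to NOUN, more than one sense remains.
for gloss in candidate_senses("bore", "NOUN"):
    print(gloss)
```

Both noun senses come back, so a part-of-speech tagger alone cannot tell "the bore is not large enough" from "he is a bore".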
The WordNet project aims to help disambiguate words. It focuses on word meanings and their inter-relationships. However, it does not provide an algorithm for selecting a particular meaning for a word in context.
There are no algorithms that can completely disambiguate text. This is due to the lack of a large corpus of disambiguated text, which is required to train machine-learning algorithms.
So - this pre-processing step is generally avoided.

Other References
Word Sense Disambiguation: The State of the Art.
(the paper sucks - but has great references)
