Saturday, August 27, 2005

Topic: Full sentence parsing (Text processing)

My notes from the book "Text Mining: Predictive Methods for Analyzing Unstructured Information".

Sometimes we need to perform a full parse of a sentence. The sentence is converted into a single structure, such as a tree or a directed acyclic graph, in which every word of the sentence is present. The structure is used to find how each word relates to the other words in the sentence, and also to find the function of each word - is it a subject, an object, etc.
There are several kinds of parsers - e.g. context-free parsers - and a number of algorithms to produce such a tree. A relevant resource here is the Wall Street Journal corpus, available from LDC (see "Corpora for Text Mining").
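To make "an algorithm to produce such a tree" concrete, here is a minimal sketch of one well-known option for context-free grammars, the CKY algorithm. This is my own illustration, not something taken from the book: the toy grammar, the lexicon and the function name `cky_parse` are all assumptions, and the grammar must be in binary (Chomsky normal form) rules for this to work.

```python
from itertools import product

# Toy grammar (assumed for illustration): S -> NP VP, VP -> V NP
BINARY = {("NP", "VP"): "S", ("V", "NP"): "VP"}
# Toy lexicon (assumed): each known word has exactly one category
LEXICON = {"Smith": "NP", "Johnson": "NP", "replaced": "V"}


def cky_parse(words):
    """Return one parse tree rooted at S as nested tuples, or None."""
    n = len(words)
    if n == 0:
        return None
    # chart[(i, j)] maps a category to one tree covering words[i:j]
    chart = {}
    for i, word in enumerate(words):
        category = LEXICON.get(word)
        chart[(i, i + 1)] = {category: (category, word)} if category else {}
    # Build longer spans from pairs of shorter, adjacent spans
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            cell = {}
            for k in range(i + 1, j):
                left, right = chart[(i, k)], chart[(k, j)]
                for (lcat, ltree), (rcat, rtree) in product(left.items(), right.items()):
                    parent = BINARY.get((lcat, rcat))
                    if parent and parent not in cell:
                        cell[parent] = (parent, ltree, rtree)
            chart[(i, j)] = cell
    return chart[(0, n)].get("S")


tree = cky_parse("Smith replaced Johnson".split())
print(tree)
```

Every word ends up as a leaf of the returned tree, and the nesting shows which words group together. A real parser would need a far larger grammar, typically learned from a parsed corpus, and a way to choose among competing parses.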
Full parsing is an expensive process, but it is sometimes needed because it provides information that phrase identification or partial parsing cannot. For example, the sentence "Johnson was replaced at IBM by Smith" is problematic to analyze without full parsing. Because of the passive structure of this sentence, we may wrongly conclude that Smith was replaced by Johnson.
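A small sketch of why the parse helps with this sentence. The parse below is written by hand, not produced by a parser, and the relation labels (`passive_subject`, `agent_marker`, and so on) and function names are made up for illustration. It compares reading the roles off the structure with a naive word-order guess.

```python
# Hand-built dependency structure for "Johnson was replaced at IBM by Smith".
# Each entry maps a word to (head word, relation label); labels are assumed.
PARSE = {
    "Johnson": ("replaced", "passive_subject"),
    "was": ("replaced", "passive_auxiliary"),
    "replaced": (None, "root"),
    "at": ("replaced", "preposition"),
    "IBM": ("at", "preposition_object"),
    "by": ("replaced", "agent_marker"),
    "Smith": ("by", "preposition_object"),
}


def dependents(parse, head, relation):
    """Words attached to `head` with the given relation label."""
    return [w for w, (h, r) in parse.items() if h == head and r == relation]


def who_replaced_whom(parse):
    """Return (agent, patient) by following the passive structure."""
    verb = next(w for w, (h, _) in parse.items() if h is None)
    patient = dependents(parse, verb, "passive_subject")[0]
    marker = dependents(parse, verb, "agent_marker")[0]
    agent = dependents(parse, marker, "preposition_object")[0]
    return agent, patient


def naive_word_order(words, names):
    """Guess (agent, patient) from order alone: first name acts on the last."""
    found = [w for w in words if w in names]
    return found[0], found[-1]


sentence = "Johnson was replaced at IBM by Smith".split()
print(who_replaced_whom(PARSE))                           # ('Smith', 'Johnson')
print(naive_word_order(sentence, {"Johnson", "Smith"}))   # ('Johnson', 'Smith')
```

The word-order guess gets the roles backwards, while the structure makes it clear that Smith is the one doing the replacing.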
