Topic: Phrase Recognition (Text Processing)
My notes from the book "Text Mining: Predictive Methods for Analyzing Unstructured Information".
Phrase Recognition is useful for creating a "partial parse" of a sentence and as a step in identifying the "Named Entities" occurring in a sentence. It is a data pre-processing step performed after the tokens in each sentence have been tagged with their Parts of Speech.
Phrase Recognition systems scan a text and mark the beginnings and ends of phrases. The main phrase types are Noun phrases, Verb phrases and Prepositional phrases. One convention is to mark a word inside a phrase with "I-", a word at the beginning of a phrase that is adjacent to another phrase with "B-", and a word outside any phrase with "O". To the I- and B- tags we then add a code for the phrase type, e.g. I-NP (Noun phrase).
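To make the tagging convention concrete, here is a minimal sketch that turns phrase spans into per-token tags. The function name `spans_to_iob`, the span format and the example sentence are my own for illustration, not from the book. It also assumes that "adjacent to another phrase" means "immediately follows a phrase of the same type"; other variants of the scheme put B- at the start of every phrase.

```python
def spans_to_iob(tokens, spans):
    """Convert phrase spans to I-/B-/O tags.

    spans: list of (start, end, phrase_type) with end exclusive,
    assumed non-overlapping (hypothetical input format).
    """
    tags = ["O"] * len(tokens)          # default: outside any phrase
    prev_end, prev_type = None, None
    for start, end, ptype in sorted(spans):
        # Every word in the phrase starts out as "inside".
        for i in range(start, end):
            tags[i] = f"I-{ptype}"
        # Assumption: B- is only needed when this phrase directly follows
        # another phrase of the same type, to mark the boundary between them.
        if prev_end == start and prev_type == ptype:
            tags[start] = f"B-{ptype}"
        prev_end, prev_type = end, ptype
    return tags


tokens = ["He", "gave", "the", "boy", "a", "book", "."]
spans = [(0, 1, "NP"), (1, 2, "VP"), (2, 4, "NP"), (4, 6, "NP")]
tags = spans_to_iob(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Here "a" gets B-NP because the phrase "a book" directly follows the phrase "the boy"; without the B- tag the two noun phrases would read as one.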
This can be treated as a classification problem over the tokens of a sentence: each token is assigned one of the tags described above. Several corpora are available for developing and testing phrase recognition systems. Performance of these systems varies widely by phrase type, but overall it is quite good.
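As a rough sketch of what "classification over tokens" could look like, each token can be turned into a feature dictionary built from its own word and Part of Speech plus those of its neighbours, and a classifier would then predict the phrase tag from that dictionary. The feature set, the `token_features` name and the `<START>`/`<END>` padding markers are assumptions of mine, not something the book prescribes.

```python
def token_features(tagged_sentence, i):
    """Build features for token i of a POS-tagged sentence.

    tagged_sentence: list of (word, pos) pairs (hypothetical input format).
    """
    word, pos = tagged_sentence[i]
    # Use padding markers at the sentence edges so every token has neighbours.
    prev_pos = tagged_sentence[i - 1][1] if i > 0 else "<START>"
    next_pos = tagged_sentence[i + 1][1] if i < len(tagged_sentence) - 1 else "<END>"
    return {
        "word": word.lower(),
        "pos": pos,
        "prev_pos": prev_pos,
        "next_pos": next_pos,
    }


sentence = [("He", "PRP"), ("gave", "VBD"), ("the", "DT"), ("boy", "NN"),
            ("a", "DT"), ("book", "NN"), (".", ".")]
examples = [token_features(sentence, i) for i in range(len(sentence))]
print(examples[2])
```

Each dictionary would be paired with the token's tag from an annotated corpus to form one training example.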