Tuesday, October 21, 2008

Semi-supervised learning for Natural Language

See link.
"In the spirit of (Miller et al., 2004), our basic strategy for taking advantage of
unlabeled data is to first derive features from unlabeled data (in our case, word
clustering or mutual information features) and then use these features in a supervised
learning algorithm. (Miller et al., 2004) achieved significant performance gains in
named-entity recognition by using word clustering features and active learning. In
this thesis, we show that another type of unlabeled data feature based on mutual
information can also significantly improve performance."
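The quoted strategy is to derive features from unlabeled text and then hand them to an ordinary supervised learner. It can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's or Miller et al.'s actual feature set. PMI is computed here only between adjacent word pairs. The bit-string cluster map is a hypothetical stand-in for the output of a hierarchical word clustering run on unlabeled data. The names `pmi_table` and `token_features`, the prefix lengths, and the PMI threshold are all illustrative choices.

```python
import math
from collections import Counter

def pmi_table(sentences):
    """PMI (in bits) for every adjacent word pair seen in unlabeled sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    table = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        table[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return table

def token_features(tokens, i, pmi, clusters, prefix_lengths=(2, 4), threshold=1.0):
    """Sparse features for tokens[i], ready for any supervised learner (hypothetical design)."""
    word = tokens[i]
    feats = {"word=" + word: 1.0}
    # Word-cluster features: prefixes of a (hypothetical) hierarchical cluster bit string.
    bits = clusters.get(word)
    if bits is not None:
        for k in prefix_lengths:
            feats["cluster%d=%s" % (k, bits[:k])] = 1.0
    # Mutual-information feature: does the previous word strongly co-occur with this one?
    if i > 0 and pmi.get((tokens[i - 1], word), float("-inf")) > threshold:
        feats["high_pmi_with_prev"] = 1.0
    return feats

# Toy "unlabeled" corpus and a made-up cluster map, for illustration only.
unlabeled = [["new", "york", "city"], ["new", "york", "times"], ["a", "city"]]
pmi = pmi_table(unlabeled)
clusters = {"york": "1101", "city": "1001"}
print(token_features(["new", "york"], 1, pmi, clusters))
```

On this toy corpus, PMI("new", "york") is log2(6.4), about 2.68 bits, so the PMI feature fires for "york". The labeled data only needs to supply labels. The co-occurrence statistics and clusters come entirely from unlabeled text.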
"(Shi and Sarkar, 2005) takes a similar approach for the problem of extracting
course names from web pages. They first solve the easier problem of identifying
course numbers on web pages and then use features based on course numbers to solve
the original problem of identifying course names. Using EM, they show that adding
those features leads to significant improvements."
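The two-stage idea can be sketched as follows: solve the easier task first, then turn its output into features for the harder one. This is a hedged illustration only. It does not reproduce Shi and Sarkar's features or their EM training. The single-token regex for course numbers, the `window` parameter, and the function names are hypothetical.

```python
import re

# Hypothetical pattern for a single-token course number such as "CS101" or "MATH221A".
COURSE_NUMBER = re.compile(r"^[A-Z]{2,4}\d{2,3}[A-Z]?$")

def find_course_numbers(tokens):
    """Stage 1 (the easier task): positions of tokens that look like course numbers."""
    return [i for i, tok in enumerate(tokens) if COURSE_NUMBER.match(tok)]

def course_name_features(tokens, window=3):
    """Stage 2: per-token features for course-name tagging, built on stage-1 output."""
    positions = find_course_numbers(tokens)
    features = []
    for i, tok in enumerate(tokens):
        f = {"word=" + tok.lower(): 1.0}
        # Fires when a course number appears at most `window` tokens before this one.
        if any(0 < i - p <= window for p in positions):
            f["after_course_number"] = 1.0
        features.append(f)
    return features

tokens = ["CS101", "Introduction", "to", "Programming", "Fall"]
for tok, f in zip(tokens, course_name_features(tokens)):
    print(tok, sorted(f))
```

These per-token dictionaries would then be fed to whatever learner tags course names. In the quoted work, that learner was trained with EM.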
In my view, the reported results were not that impressive.
