Hamotzi's Data Mining Log: Semi-supervised learning for Natural Language

See link.
"In the spirit of (Miller et al., 2004), our basic strategy for taking advantage of
unlabeled data is to first derive features from unlabeled data -- in our case, word
clustering or mutual information features -- and then use these features in a supervised
learning algorithm. (Miller et al., 2004) achieved significant performance gains in
named-entity recognition by using word clustering features and active learning. In
this thesis, we show that another type of unlabeled data feature based on mutual
information can also significantly improve performance."
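To make the idea concrete for myself, here is a minimal sketch of the "derive features from unlabeled data, then feed them to a supervised learner" recipe. It is not the thesis's method: I am assuming pointwise mutual information between adjacent words as the mutual information feature, and the names `pmi_table`, `token_features` and the toy `UNLABELED` corpus are made up for illustration.

```python
import math
from collections import Counter


def pmi_table(sentences, min_count=1):
    """Estimate pointwise mutual information (PMI) for adjacent word pairs.

    Assumption: adjacent-word PMI stands in for the thesis's mutual
    information features, which may be defined differently.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        word_counts.update(tokens)
        pair_counts.update(zip(tokens, tokens[1:]))

    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())

    pmi = {}
    for (left, right), n in pair_counts.items():
        if n < min_count:
            continue
        p_pair = n / total_pairs
        p_left = word_counts[left] / total_words
        p_right = word_counts[right] / total_words
        pmi[(left, right)] = math.log(p_pair / (p_left * p_right))
    return pmi


def token_features(tokens, i, pmi):
    """Build a feature dict for token i: its identity plus PMI with neighbours.

    The dict could be passed to any supervised tagger that accepts
    sparse named features.
    """
    word = tokens[i].lower()
    feats = {"word=" + word: 1.0}
    if i > 0:
        feats["pmi_prev"] = pmi.get((tokens[i - 1].lower(), word), 0.0)
    if i + 1 < len(tokens):
        feats["pmi_next"] = pmi.get((word, tokens[i + 1].lower()), 0.0)
    return feats


# Toy unlabeled corpus (hypothetical data, only to exercise the code).
UNLABELED = [
    "new york is a big city",
    "she moved to new york",
    "a new car is a big expense",
]
PMI = pmi_table(UNLABELED)
```

The point is that `PMI` is computed without any labels, so it can be estimated on as much raw text as is available, while `token_features` is what the labeled training examples would actually see.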
"(Shi and Sarkar, 2005) takes a similar approach for the problem of extracting
course names from web pages. They first solve the easier problem of identifying
course numbers on web pages and then use features based on course numbers to solve
the original problem of identifying course names. Using EM, they show that adding
those features leads to significant improvements."
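A rough sketch of the two-stage idea, as I read it: solve the easy problem (spotting course numbers) first, then expose its output as features for the harder problem (spotting course names). The regular expression and the feature names below are my own assumptions, not taken from the paper, and the EM training step is left out entirely.

```python
import re

# Assumed course-number shape such as "CS 101" or "MATH 2201".
# Hypothetical pattern for illustration; the paper's detector is not shown here.
COURSE_NUMBER = re.compile(r"\b[A-Z]{2,4}\s?\d{3,4}\b")


def course_name_features(lines, i):
    """Features for deciding whether line i of a page holds a course name.

    Every feature is derived from the easier course-number detector.
    """
    line = lines[i]
    match = COURSE_NUMBER.search(line)
    prev_has_number = i > 0 and COURSE_NUMBER.search(lines[i - 1]) is not None
    # Text left over after the course number is a candidate course name.
    trailing_text = line[match.end():].strip(" :-") if match else ""
    return {
        "has_course_number": match is not None,
        "prev_line_has_course_number": prev_has_number,
        "text_follows_number": bool(trailing_text),
    }
```

For example, `course_name_features(["CS 101: Introduction to Programming"], 0)` flags both `has_course_number` and `text_follows_number`, which is the kind of signal a course-name classifier could then be trained on.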
The results were not that great.

Labels: nlp, ssl

Tuesday, October 21, 2008