Monday, October 20, 2008

Link: Introduction to SSL

See link.
The presentation focuses on semi-supervised classification and is a good overview of SSL classification algorithms. For each algorithm, the author describes the assumptions, the details, and the pros/cons.

Some highlights from the presentation:
Basic objective of SSL: How does one use unlabeled data to improve classification?
Approach: Use labeled and unlabeled data to build better learners (contrast with supervised and unsupervised approaches).
Note that it is not always the case that unlabeled data will help.

Types of semi-supervised learning algorithms:
(1) Self-training:
Train a classifier on the labeled data, use it to label the unlabeled data, add the most confident predictions to the training set, and repeat. Easy and widely used. However, early mistakes can reinforce themselves, and convergence is hard to predict.
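
A minimal self-training sketch (my own illustration, not from the presentation), assuming a scikit-learn-style base classifier with predict_proba and a fixed confidence threshold for accepting pseudo-labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_labeled, y_labeled, X_unlabeled, threshold=0.95, max_iter=10):
    """Iteratively add confident pseudo-labels to the labeled set and retrain."""
    X_l, y_l, X_u = X_labeled, y_labeled, X_unlabeled
    clf = LogisticRegression(max_iter=1000)
    for _ in range(max_iter):
        clf.fit(X_l, y_l)
        if len(X_u) == 0:
            break
        proba = clf.predict_proba(X_u)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break  # nothing is confident enough; stop early
        # move the confident predictions into the labeled set
        pseudo = clf.classes_[proba[confident].argmax(axis=1)]
        X_l = np.vstack([X_l, X_u[confident]])
        y_l = np.concatenate([y_l, pseudo])
        X_u = X_u[~confident]
    return clf
```

The threshold is the main knob: set it too low and the early-mistake problem above kicks in quickly.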

(2) Generative models: Assume the full generative model p(X,Y|θ) is known up to its parameters. Marginalize over the labels of the unlabeled instances and estimate θ with MLE/MAP/Bayesian techniques, typically via EM. Examples: mixtures of Gaussians, mixtures of multinomials (naive Bayes), and HMMs (EM, Baum-Welch). Relies heavily on EM. If the model is correct, this can be very effective and gives a clean probabilistic framework. But unlabeled data can hurt if the generative model is wrong, and heuristics (e.g. down-weighting the unlabeled data) are needed to reduce its impact.
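
A sketch of the EM idea for a semi-supervised Gaussian mixture (my own code, assuming one Gaussian component per class and integer class labels): labeled points keep their one-hot responsibilities fixed, and only the unlabeled responsibilities are re-estimated in each E-step.

```python
import numpy as np
from scipy.stats import multivariate_normal

def ss_gmm_em(X_l, y_l, X_u, n_classes, n_iter=50):
    """Semi-supervised EM for a Gaussian mixture. y_l holds class indices 0..n_classes-1."""
    X = np.vstack([X_l, X_u])
    n_l = len(X_l)
    # responsibilities: one-hot for labeled rows, uniform init for unlabeled rows
    R = np.full((len(X), n_classes), 1.0 / n_classes)
    R[np.arange(n_l)] = 0.0
    R[np.arange(n_l), y_l] = 1.0
    for _ in range(n_iter):
        # M-step: class priors, means and covariances from the current responsibilities
        Nk = R.sum(axis=0)
        pi = Nk / Nk.sum()
        mu = (R.T @ X) / Nk[:, None]
        cov = [np.cov(X.T, aweights=R[:, k], bias=True) + 1e-6 * np.eye(X.shape[1])
               for k in range(n_classes)]
        # E-step: update responsibilities for the unlabeled rows only
        like = np.column_stack([pi[k] * multivariate_normal.pdf(X[n_l:], mu[k], cov[k])
                                for k in range(n_classes)])
        R[n_l:] = like / like.sum(axis=1, keepdims=True)
    return pi, mu, cov
```

A new point x would then be classified by arg max over k of pi[k] * N(x; mu[k], cov[k]).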

(3) Cluster-and-label approach: Cluster all of the data (labeled and unlabeled), then use the labeled instances in each cluster to assign labels to the unlabeled ones.
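
For example, a simple k-means based version (my own sketch; the presentation does not prescribe a particular clustering algorithm) labels each cluster by majority vote over the labeled points it contains:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_and_label(X_l, y_l, X_u, n_clusters):
    """Cluster labeled + unlabeled data, then give each unlabeled point the
    majority label of the labeled points in its cluster (-1 if there are none).
    y_l is assumed to hold non-negative integer class labels."""
    X = np.vstack([X_l, X_u])
    assign = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    assign_l, assign_u = assign[:len(X_l)], assign[len(X_l):]
    majority = {}
    for c in range(n_clusters):
        members = y_l[assign_l == c]
        if len(members) > 0:
            majority[c] = np.bincount(members).argmax()  # majority vote
    return np.array([majority.get(c, -1) for c in assign_u])
```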

(4) Semi-supervised SVMs (also known as Transductive SVMs): Maximize the margin over the unlabeled data as well as the labeled data, pushing the decision boundary into low-density regions. Assumes that unlabeled data from different classes are separated by a (low-density) margin.
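
There is no standard TSVM in the usual toolkits, but the objective being optimized can be written down directly (my notation, a sketch rather than anything from the presentation): the usual hinge loss on labeled points plus a "hat" loss that penalizes unlabeled points falling inside the margin.

```python
import numpy as np

def s3vm_objective(w, b, X_l, y_l, X_u, C=1.0, C_u=0.5):
    """S3VM objective for a linear classifier f(x) = w.x + b, with y_l in {-1, +1}.
    The unlabeled term max(0, 1 - |f(x)|) pushes the boundary away from
    unlabeled points, i.e. into low-density regions."""
    f_l = X_l @ w + b
    f_u = X_u @ w + b
    labeled_hinge = np.maximum(0.0, 1.0 - y_l * f_l).sum()
    unlabeled_hat = np.maximum(0.0, 1.0 - np.abs(f_u)).sum()
    return 0.5 * w @ w + C * labeled_hinge + C_u * unlabeled_hat
```

The unlabeled term makes the objective non-convex, which is why S3VM solvers rely on annealing and other heuristics rather than a standard QP.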

(5) Graph-based algorithms: A graph is given (or built) over the labeled and unlabeled instances, and labels propagate along it: instances connected by a heavy (high-weight) edge tend to have the same label. Examples: mincut, harmonic functions, local and global consistency, and manifold regularization. Can be extended to directed graphs. Performance is good if the graph is good, and bad otherwise.
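
The harmonic-function solution mentioned above has a closed form; a numpy sketch (my own, assuming a precomputed symmetric weight matrix W with the labeled nodes listed first and binary 0/1 labels) could look like this:

```python
import numpy as np

def harmonic_labels(W, y_l):
    """Harmonic function on a graph with weight matrix W. The first len(y_l)
    rows/columns are the labeled nodes (labels 0/1). Solves
        f_u = (D_uu - W_uu)^{-1} W_ul f_l
    and returns soft labels for the unlabeled nodes."""
    n_l = len(y_l)
    D = np.diag(W.sum(axis=1))
    L = D - W                          # unnormalized graph Laplacian
    L_uu = L[n_l:, n_l:]
    W_ul = W[n_l:, :n_l]
    f_u = np.linalg.solve(L_uu, W_ul @ np.asarray(y_l, dtype=float))
    return f_u                         # threshold at 0.5 for hard labels
```

Everything here hinges on W: the "good graph, good performance" caveat applies before any of this math runs.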

(6) Multiview algorithms (e.g. co-training): Split the features of each instance into two views, train a classifier on each view, and have the classifiers teach each other by labeling unlabeled data for one another. Assumes the feature splits are conditionally independent given the class. Less sensitive to mistakes than self-training. Models using BOTH feature splits should do better.
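
A simplified co-training sketch (my own, loosely in the spirit of Blum & Mitchell rather than taken from the presentation): two naive Bayes classifiers, one per feature view, each round nominating their most confident pseudo-labels for the shared training set.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X1_l, X2_l, y_l, X1_u, X2_u, n_rounds=10, k=5):
    """Co-training with two feature views X1 and X2 of the same instances.
    Each round, each view's classifier nominates its k most confident
    unlabeled examples; they are pseudo-labeled and added for both views."""
    clf1, clf2 = GaussianNB(), GaussianNB()
    X1, X2, y = X1_l, X2_l, y_l
    for _ in range(n_rounds):
        clf1.fit(X1, y)
        clf2.fit(X2, y)
        if len(X1_u) == 0:
            break
        p1, p2 = clf1.predict_proba(X1_u), clf2.predict_proba(X2_u)
        pick = np.unique(np.concatenate([
            np.argsort(p1.max(axis=1))[-k:],    # view 1's most confident
            np.argsort(p2.max(axis=1))[-k:]]))  # view 2's most confident
        # pseudo-label each picked example with whichever view is more confident
        use1 = p1[pick].max(axis=1) >= p2[pick].max(axis=1)
        pseudo = np.where(use1,
                          clf1.classes_[p1[pick].argmax(axis=1)],
                          clf2.classes_[p2[pick].argmax(axis=1)])
        X1 = np.vstack([X1, X1_u[pick]])
        X2 = np.vstack([X2, X2_u[pick]])
        y = np.concatenate([y, pseudo])
        keep = np.setdiff1d(np.arange(len(X1_u)), pick)
        X1_u, X2_u = X1_u[keep], X2_u[keep]
    return clf1, clf2
```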

Final analysis (more in the presentation):
Use the right model for the job
no pain, no gain
no model assumption, no gain
wrong model assumption, no gain, a lot of pain
