Transductive learner
SVMlight has a transductive learner: the classifier is trained on a dataset containing both labeled and unlabeled vectors. This looks interesting, and it is something I want to try.
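To make the input concrete, here is a minimal sketch of writing a mixed labeled/unlabeled training file. It assumes the usual SVMlight sparse format (`<target> <index>:<value> ...`, indices starting at 1 and increasing) and that, as far as I understand the SVMlight documentation, unlabeled vectors are marked with a target of 0. The helper names `to_svmlight_line` and `write_transductive_file` are my own, not part of SVMlight.

```python
def to_svmlight_line(label, features):
    """Format one example. label is +1, -1, or 0 (0 = unlabeled, by assumption).

    features is a dict mapping 1-based feature index -> value.
    """
    if label not in (1, -1, 0):
        raise ValueError("label must be +1, -1, or 0")
    # Emit features in increasing index order, as the format expects.
    pairs = " ".join(f"{idx}:{val:g}" for idx, val in sorted(features.items()))
    target = f"{label:+d}" if label else "0"
    return f"{target} {pairs}"


def write_transductive_file(path, labeled, unlabeled):
    """Write labeled (label, features) pairs followed by unlabeled feature dicts."""
    with open(path, "w") as fh:
        for label, features in labeled:
            fh.write(to_svmlight_line(label, features) + "\n")
        for features in unlabeled:
            fh.write(to_svmlight_line(0, features) + "\n")


if __name__ == "__main__":
    print(to_svmlight_line(1, {3: 0.5, 1: 2.0}))
    print(to_svmlight_line(0, {2: 1.0}))
```

The resulting file would then be passed to the learner as the training set; the flags to use are best checked against the SVMlight documentation.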
I need to think about how to pick the unlabeled vectors and how many of them to use. I'm thinking of using my (still to be clearly defined) clustering technique to pick the vectors. The question is: which vectors should be chosen?
Option 1: Create "several" small clusters. Pick n% of the candidates from the top-k clusters. This should give a representative sample that includes samples from all categories.
Option 2: Create a biased dataset: make sure to include every sample that could possibly belong to the target category.
Option 3: Random selection
I think I'll try all three options.
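A rough sketch of the three selection options, so they can be compared on the same data. Everything here is an assumption on my part: "top-k" is taken to mean the k largest clusters, the cluster assignments come from whatever clustering I end up with, and the `scores` and `threshold` used for the biased option are placeholders for some measure of how likely a vector is to belong to the target category.

```python
import random
from collections import defaultdict


def sample_from_top_clusters(cluster_ids, k, fraction, seed=0):
    """Option 1: take `fraction` of the members of each of the k largest clusters."""
    rng = random.Random(seed)
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)
    # Assumption: "top-k" means the k clusters with the most members.
    top = sorted(members.values(), key=len, reverse=True)[:k]
    chosen = []
    for group in top:
        n = max(1, round(len(group) * fraction))  # at least one per cluster
        chosen.extend(rng.sample(group, n))
    return sorted(chosen)


def biased_selection(scores, threshold):
    """Option 2: keep every vector whose (hypothetical) score reaches the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


def random_selection(n_unlabeled, n_pick, seed=0):
    """Option 3: uniform random sample of vector indices, without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_unlabeled), min(n_pick, n_unlabeled)))


if __name__ == "__main__":
    clusters = [0, 0, 0, 0, 1, 1, 2]
    print(sample_from_top_clusters(clusters, k=2, fraction=0.5))
    print(biased_selection([0.1, 0.9, 0.5], threshold=0.5))
    print(random_selection(10, 3))
```

All three functions return indices into the unlabeled pool, so the same downstream code can build the training file whichever option is used.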
The size of the dataset:
Option 1: make it a function of:
- the number of vectors known to belong to the category (based on the labeled dataset)
- the number of labeled vectors belonging to other categories
- the total number of unlabeled vectors
- e.g. ((# in category) / (total labeled)) * (total unlabeled) * (some constant)
Option 2: make it a function of:
- the number of vectors that belong to the category
- e.g. (# in category) * 60
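The two example sizing rules above can be sketched as small functions. The function and parameter names are mine, and capping the result at the number of available unlabeled vectors is an added assumption (the notes do not say what to do when the formula asks for more vectors than exist).

```python
def size_proportional(n_in_category, n_labeled_total, n_unlabeled_total, constant=1.0):
    """(# in category / total labeled) * total unlabeled * constant."""
    if n_labeled_total <= 0:
        raise ValueError("need at least one labeled vector")
    size = (n_in_category / n_labeled_total) * n_unlabeled_total * constant
    # Assumption: never ask for more unlabeled vectors than are available.
    return min(n_unlabeled_total, round(size))


def size_fixed_multiple(n_in_category, n_unlabeled_total, multiple=60):
    """(# in category) * multiple, with 60 as the example multiplier."""
    # Same capping assumption as above.
    return min(n_unlabeled_total, n_in_category * multiple)


if __name__ == "__main__":
    # 50 of 500 labeled vectors are in the category; 10,000 unlabeled vectors.
    print(size_proportional(50, 500, 10_000, constant=2.0))
    print(size_fixed_multiple(50, 10_000))
```

With 50 in-category vectors out of 500 labeled and 10,000 unlabeled, the proportional rule with a constant of 2 asks for 2,000 unlabeled vectors and the fixed multiple asks for 3,000, so the two rules can differ noticeably even on the same data.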