Monday, April 03, 2006

Generating the "Option1" dataset

Category Name: Computers/Hardware
# of labeled vectors: 6
Therefore, the number of unlabeled vectors to be systematically selected = 60 * 6 = 360 (out of 799373)
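
A quick check of that arithmetic, as a minimal Python sketch. The variable names are mine, and reading the 60 as "unlabeled vectors per labeled vector" is an assumption based on the product above; nothing here comes from the actual pipeline.

```python
# Selection budget for the Computers/Hardware category.
labeled = 6        # labeled vectors in the category
ratio = 60         # assumed: unlabeled vectors to select per labeled vector
pool = 799373      # unlabeled vectors available

budget = ratio * labeled      # how many unlabeled vectors to pick
fraction = budget / pool      # share of the unlabeled pool that gets picked

print(budget)                 # 360
print(f"{fraction:.4%}")      # 0.0450%
```

So the selection keeps well under a tenth of a percent of the unlabeled pool.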

Question: I need to pick 360 unlabeled vectors out of the roughly 800K available - what should k (the number of clusters) be?
I'll experiment briefly with the number of clusters and, based on the quality of the resulting clusters, decide what k should be.
For starters I'll try:
(a) k = 60
(b) k = 100
(c) k = 120
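
To get a feel for what each candidate k would mean for the selection, here is a small sketch that splits the 360-vector budget across clusters. It assumes the vectors are drawn as evenly as possible from every cluster, which is only one possible selection scheme; `quotas` is a hypothetical helper, not part of any tool I am using.

```python
def quotas(budget, k):
    """Split `budget` picks across `k` clusters as evenly as possible."""
    base, extra = divmod(budget, k)
    # The first `extra` clusters take one additional vector each.
    return [base + 1 if i < extra else base for i in range(k)]

for k in (60, 100, 120):
    q = quotas(360, k)
    print(k, min(q), max(q))
# 60 6 6
# 100 3 4
# 120 3 3
```

Under that assumption, k = 60 and k = 120 divide the budget exactly (6 and 3 vectors per cluster), while k = 100 leaves an uneven split of 3 or 4 per cluster.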
I believe that increasing the number of clusters will have an adverse (possibly exponential) effect on the time it takes to cluster. So let's see how it goes.
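
One way to test that belief on a small scale before committing to the full set of vectors is to time a toy clustering run for each candidate k. The sketch below is a plain Lloyd-style k-means in pure Python on synthetic random data. The algorithm, the data size (600 points in 4 dimensions), and the fixed iteration count are all assumptions for illustration, not the actual clustering tool or dataset. With a fixed number of iterations, each pass compares every point with every centroid, so this toy version should scale roughly linearly in k; the real tool may behave differently.

```python
import random
import time

def kmeans(points, k, iters=3, seed=0):
    """Plain Lloyd-style k-means with a fixed number of iterations."""
    rng = random.Random(seed)
    # Start from k distinct points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [0] * len(points)
    dim = len(points[0])
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each non-empty centroid to the mean of its points.
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p, c in zip(points, assignment):
            counts[c] += 1
            for j, x in enumerate(p):
                sums[c][j] += x
        for c in range(k):
            if counts[c]:
                centroids[c] = [s / counts[c] for s in sums[c]]
    return assignment

# Synthetic stand-in data, not the real vectors.
data_rng = random.Random(42)
points = [[data_rng.random() for _ in range(4)] for _ in range(600)]

timings = {}
for k in (60, 100, 120):
    start = time.perf_counter()
    labels = kmeans(points, k)
    timings[k] = time.perf_counter() - start
    print(f"k={k}: {timings[k]:.3f} s")
```

The absolute times mean nothing here; what matters is how the time changes from k = 60 to k = 120.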
