Wednesday, April 12, 2006

Picking Vectors from the Clusters (more)

Since the quality of the clusters was poor (even with 2500 clusters), I decided to try 5000 clusters. CLUTO ran for 3 days and then crashed without producing any results.
My options for getting better quality clusters are:
  1. Increase the amount of RAM on my machine by 1 GB
  2. Split the dataset in "half" and cluster each half individually
For splitting, I could divide the existing 2500 clusters into 2 datasets of roughly equal size, keeping each cluster whole, and then cluster each dataset separately. That way I reduce the likelihood of splitting a natural cluster across the two datasets.
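
A rough sketch of how that split might look, assuming the cluster label of every vector from the 2500-cluster run is available as a plain list (one label per vector). The function name `split_clusters` is made up for illustration and is not part of CLUTO. It uses a simple greedy rule: take clusters largest first and put each one in whichever half is currently smaller, so no cluster is ever divided.

```python
from collections import Counter


def split_clusters(labels):
    """Split vector indices into two halves without dividing any cluster.

    labels[i] is the (assumed) cluster id of vector i from an earlier run.
    Returns two lists of vector indices.
    """
    sizes = Counter(labels)
    totals = [0, 0]   # number of vectors assigned to each half so far
    half_of = {}      # cluster id -> 0 or 1

    # Greedy balancing: largest clusters first, each into the smaller half.
    for cluster, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        target = 0 if totals[0] <= totals[1] else 1
        half_of[cluster] = target
        totals[target] += size

    halves = ([], [])
    for idx, cluster in enumerate(labels):
        halves[half_of[cluster]].append(idx)
    return halves


if __name__ == "__main__":
    # Toy labels standing in for the real clustering output.
    first, second = split_clusters([0, 0, 0, 1, 1, 2, 2, 3])
    print(len(first), len(second))
```

The greedy rule only gives roughly equal halves, and it knows nothing about which clusters are similar to each other, so related clusters can still land in different halves.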

I prefer Option 1, since Option 2 might have unintended consequences.
Time to call Fry's.
