Friday, April 14, 2006

5000 Clusters results

I purchased another gig of memory ($100) from Fry's and clustered the 799373 rows with 83595 dimensions into 5000 clusters using rbr (the k-means repeated bisection method). Memory use peaked at about 1.5G of RAM (I have 2.25G).
It took 50555.371 seconds (approximately 14 hours). The results were still not that great.
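
For anyone unfamiliar with repeated bisection, here is a rough Python sketch of the general idea: start with everything in one cluster, then keep splitting a cluster in two with a small 2-means step until there are k clusters. This is not the code of the tool I ran. The function names, the cosine-based assignment, and the "always split the largest cluster" rule are all assumptions made for illustration, and a real implementation would use sparse vectors to cope with 83595 dimensions.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def centroid(rows):
    """Unit-length mean direction of a group of rows."""
    dims = len(rows[0])
    return normalize([sum(r[d] for r in rows) for d in range(dims)])


def bisect(rows, rng, iterations=10):
    """Split rows into two groups with a small cosine-based 2-means."""
    centers = rng.sample(rows, 2)  # two random rows as starting centers
    groups = [rows, []]
    for _ in range(iterations):
        new_groups = [[], []]
        for r in rows:
            side = 0 if dot(r, centers[0]) >= dot(r, centers[1]) else 1
            new_groups[side].append(r)
        if not new_groups[0] or not new_groups[1]:
            break  # one side emptied out; keep the last usable split
        groups = new_groups
        centers = [centroid(g) for g in groups]
    if not groups[1]:
        # Degenerate case (e.g. identical seeds): just cut the list in half.
        half = len(rows) // 2
        groups = [rows[:half], rows[half:]]
    return groups


def repeated_bisection(rows, k, seed=0):
    """Bisect the largest cluster until there are k clusters (assumed rule)."""
    rng = random.Random(seed)
    clusters = [[normalize(r) for r in rows]]
    while len(clusters) < k:
        clusters.sort(key=len)
        largest = clusters.pop()
        if len(largest) < 2:
            clusters.append(largest)
            break  # nothing left that can be split
        clusters.extend(bisect(largest, rng))
    return clusters
```

Calling `repeated_bisection(rows, 5000)` would correspond to the run described here, with `rows` as a list of equal-length numeric lists. Each split only looks at one cluster, which is why this approach scales to large k better than running plain k-means with 5000 centers at once.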

The top 10 clusters range from 0.627 to 0.481 in internal similarity, compared with 0.466 to 0.398 for 2500 clusters. So the results are definitely better, but still not that great.
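
To make the internal similarity numbers concrete, here is a minimal Python sketch, assuming the score is the average pairwise cosine similarity between the rows of a cluster (self-pairs included). That definition is my assumption about what the tool reports, and `internal_similarity` is a hypothetical helper name.

```python
import math


def internal_similarity(rows):
    """Average pairwise cosine similarity of a cluster (self-pairs included).

    Uses the identity: the sum of all pairwise dot products of unit vectors
    equals the squared length of their sum, so no double loop is needed.
    """
    dims = len(rows[0])
    composite = [0.0] * dims
    for r in rows:
        norm = math.sqrt(sum(x * x for x in r))
        if norm == 0:
            continue  # a zero row contributes nothing to any similarity
        for d in range(dims):
            composite[d] += r[d] / norm
    n = len(rows)
    return sum(x * x for x in composite) / (n * n)
```

Under this definition a cluster of identical rows scores 1.0, so a top score of 0.627 means the members of even the tightest cluster are only moderately alike on average.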

