Thursday, March 16, 2006

Document representation for Clustering

Currently, I represent documents by their shared n-gram counts, stored in a sparse matrix format.
I believe I would get better performance if I switched to normalized tf-idf vectors and used cosine similarity as the measure.
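
A minimal sketch of what the tf-idf plus cosine similarity representation could look like, using only the Python standard library. Several things here are assumptions rather than anything settled above: character trigrams as the n-grams, a smoothed idf of log((1 + N) / (1 + df)) + 1, L2 normalization, and the function names themselves, which are purely illustrative. Vectors are kept as dicts so the representation stays sparse.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the character n-grams of text (assumption: character-level n-grams)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(docs, n=3):
    """Build L2-normalized tf-idf vectors, one sparse dict {ngram: weight} per document."""
    counts = [Counter(char_ngrams(doc, n)) for doc in docs]
    num_docs = len(docs)

    # Document frequency: in how many documents each n-gram occurs.
    df = Counter()
    for c in counts:
        df.update(c.keys())

    vectors = []
    for c in counts:
        # Smoothed idf (an assumed variant) keeps every weight strictly positive.
        vec = {
            gram: tf * (math.log((1 + num_docs) / (1 + df[gram])) + 1.0)
            for gram, tf in c.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # Documents shorter than n have no n-grams and get an empty vector.
        vectors.append({g: w / norm for g, w in vec.items()} if norm > 0 else {})
    return vectors


def cosine(u, v):
    """Cosine similarity of two L2-normalized sparse vectors (reduces to a dot product)."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller dict
    return sum(w * v.get(gram, 0.0) for gram, w in u.items())


docs = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "stock prices fell sharply",
]
vectors = tfidf_vectors(docs)
print(cosine(vectors[0], vectors[1]))  # near-duplicate sentences
print(cosine(vectors[0], vectors[2]))  # unrelated sentences
```

Because the vectors are normalized up front, the similarity is just a sparse dot product, so document length no longer dominates the comparison the way raw shared counts can.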
