Hamotzi's Data Mining Log: Document representation for Clustering

Thursday, March 16, 2006

Document representation for Clustering

Currently, I represent documents by their shared n-gram counts, stored as a sparse matrix.
I believe I would get better performance if I switched to normalized tf-idf vectors and compared documents with the cosine similarity measure.
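
To make the proposed change concrete, here is a minimal sketch of L2-normalized tf-idf vectors compared with cosine similarity. It is an illustration under assumptions, not the code used for this log: the choice of character trigrams, the smoothed idf formula, and the helper names char_ngrams, tfidf_vectors, and cosine are all hypothetical. Vectors are kept as dicts so they stay sparse.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of text (assumed n-gram type)."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(docs, n=3):
    """Build L2-normalized sparse tf-idf vectors (one dict per document)."""
    counts = [Counter(char_ngrams(d, n)) for d in docs]

    # Document frequency: in how many documents each n-gram occurs.
    df = Counter()
    for c in counts:
        df.update(c.keys())

    num_docs = len(docs)
    vectors = []
    for c in counts:
        # Smoothed idf (an assumed variant): n-grams found in every
        # document keep a small positive weight instead of dropping to zero.
        vec = {
            g: tf * (math.log((1 + num_docs) / (1 + df[g])) + 1)
            for g, tf in c.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({g: w / norm for g, w in vec.items()} if norm else {})
    return vectors


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(w * v.get(g, 0.0) for g, w in u.items())


if __name__ == "__main__":
    docs = ["data mining log", "data mining blog", "cosine similarity"]
    vecs = tfidf_vectors(docs)
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Because every vector is scaled to unit length, the cosine reduces to a sparse dot product, and long documents no longer dominate just by containing more n-grams, which raw shared counts do not correct for.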
