Document Representation for Clustering
Currently, I'm representing documents by their shared n-gram counts, stored in a sparse matrix format.
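For concreteness, here is a minimal sketch of that kind of representation. It makes some assumptions: word-level bigrams, a `Counter` per document standing in for one sparse row, and "shared count" read as the sum of the smaller count over n-grams that both documents contain. The post doesn't spell out these details, and the names `ngram_counts` and `shared_ngram_count` are hypothetical.

```python
from collections import Counter

def ngram_counts(text, n=2):
    """Sparse bag of word n-grams: only n-grams that actually occur are stored."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def shared_ngram_count(a, b):
    """Hypothetical 'shared count' similarity: sum of min counts over common n-grams."""
    return sum(min(a[g], b[g]) for g in a.keys() & b.keys())

d1 = ngram_counts("the cat sat on the mat")
d2 = ngram_counts("the cat lay on the mat")
print(shared_ngram_count(d1, d2))  # prints 3: ('the','cat'), ('on','the'), ('the','mat')
```

Raw shared counts grow with document length, so long documents tend to look similar to everything. That is one reason to consider a normalized weighting.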
I believe I would get better clustering results if I switched to a normalized tf-idf representation and compared documents using cosine similarity.
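A minimal sketch of that alternative follows, using only the standard library. It assumes word bigrams, the smoothed idf `ln((1 + N) / (1 + df)) + 1`, and L2 normalization. The function names are hypothetical, and this is an illustration rather than a tuned implementation. Once each vector has unit length, cosine similarity reduces to a sparse dot product.

```python
import math
from collections import Counter

def ngram_counts(text, n=2):
    """Sparse bag of word n-grams for one document."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def tfidf_vectors(docs, n=2):
    """Return one L2-normalized tf-idf vector (a dict) per document."""
    counts = [ngram_counts(d, n) for d in docs]
    num_docs = len(counts)
    df = Counter()
    for c in counts:
        df.update(c.keys())  # each n-gram counted at most once per document
    # Smoothed idf (an assumed choice; several variants are in common use).
    idf = {g: math.log((1 + num_docs) / (1 + df[g])) + 1 for g in df}
    vectors = []
    for c in counts:
        v = {g: tf * idf[g] for g, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in v.values()))
        vectors.append({g: w / norm for g, w in v.items()} if norm else v)
    return vectors

def cosine(u, v):
    """Cosine similarity of two L2-normalized sparse vectors: just their dot product."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector
    return sum(w * v.get(g, 0.0) for g, w in u.items())

docs = ["the cat sat on the mat",
        "the cat lay on the mat",
        "stock prices fell sharply today"]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # between 0 and 1: some bigrams shared
print(cosine(vecs[0], vecs[2]))  # 0.0: no bigrams shared
```

The smoothed idf above matches the default formula in scikit-learn's `TfidfVectorizer`. In practice, `TfidfVectorizer(ngram_range=(2, 2))` (L2-normalized by default) together with `sklearn.metrics.pairwise.cosine_similarity` would produce the same kind of sparse matrix without hand-rolled code.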