Hamotzi's Data Mining Log: Issue: tf-idf calculation issue

Thursday, March 16, 2006

Issue: tf-idf calculation issue

The problem with the tf-idf metric is that if I modify a single query (say, to fix a spelling error), I need to regenerate the tf-idf metrics for the entire data set, because the idf part of every weight depends on document frequencies across the whole collection. After that, everything has to be normalized again!
This is a very expensive operation. I wonder whether it would be faster if I used Berkeley DB instead of the MySQL database?
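One possible way to soften the cost, independent of which database sits underneath, might be to store only raw term counts and document frequencies, and derive the normalized tf-idf weights on demand. The sketch below is only an illustration of that idea and not what my current setup does: the class name `IncrementalTfIdf`, its methods, the whitespace tokenizer, and the plain `tf * log(N / df)` weighting with L2 normalization are all assumptions made for the example.

```python
import math
from collections import Counter


class IncrementalTfIdf:
    """Hypothetical helper: keeps raw counts only; weights are computed lazily."""

    def __init__(self):
        self.docs = {}       # doc_id -> Counter of raw term counts
        self.df = Counter()  # term -> number of documents containing the term

    def upsert(self, doc_id, text):
        """Add or replace one document, touching only the terms it contains."""
        old = self.docs.get(doc_id)
        if old is not None:
            # Undo the previous version's contribution to document frequency.
            for term in old:
                self.df[term] -= 1
                if self.df[term] == 0:
                    del self.df[term]
        counts = Counter(text.lower().split())  # naive whitespace tokenizer
        self.docs[doc_id] = counts
        for term in counts:
            self.df[term] += 1

    def vector(self, doc_id):
        """Return the L2-normalized tf-idf vector for one document."""
        counts = self.docs[doc_id]
        n = len(self.docs)
        weights = {t: c * math.log(n / self.df[t]) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm == 0:
            return weights  # every term occurs in every document
        return {t: w / norm for t, w in weights.items()}


index = IncrementalTfIdf()
index.upsert(1, "data minning log")  # entry with a spelling error
index.upsert(2, "data mining")
index.upsert(1, "data mining log")   # fix the typo: only this entry's terms are updated
fixed = index.vector(1)
```

With this layout, correcting one entry costs work proportional to the number of terms in that entry. The trade-off is that weights and normalization are recomputed whenever a vector is read, so it would only pay off if edits are frequent relative to full scans.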
