Sunday, March 19, 2006

Goal for this week

My goal for this week is:
(1) Generate Berkeley DB datasets
Stretch goals:
(2) Train a classifier on one category
(3) Predict on the validation set

Discussion
(1) Generate Berkeley DB datasets
MySQL is a bottleneck. Converting the datasets to Berkeley DB should improve performance. The datasets I'm going to create will hold both tf-idf and normalized tf-idf values.
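A minimal sketch of the two kinds of values, assuming each document is already tokenized into a list of terms and using the common raw-count × log(N/df) weighting, with "normalized" taken to mean scaled to unit Euclidean length. The post does not say which tf-idf variant or normalization is used, so both are assumptions. Python's standard `dbm` module stands in for Berkeley DB here; the helper names `tfidf`, `l2_normalize` and `store` are hypothetical.

```python
import dbm
import json
import math
import os
import tempfile
from collections import Counter


def tfidf(docs):
    """Return one {term: tf-idf} dict per document (raw term count times log(N/df))."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def l2_normalize(vec):
    """Scale a sparse vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm > 0 else dict(vec)


def store(path, doc_ids, vectors):
    """Write each document's vector as a JSON string under its id in a dbm key-value file.

    dbm is only a stand-in for Berkeley DB in this sketch.
    """
    with dbm.open(path, "c") as db:
        for doc_id, vec in zip(doc_ids, vectors):
            db[doc_id] = json.dumps(vec)


# Usage on a toy corpus of three tokenized documents.
docs = [["svm", "kernel", "svm"], ["kernel", "tfidf"], ["svm", "tfidf", "tfidf"]]
raw = tfidf(docs)
normalized = [l2_normalize(v) for v in raw]
path = os.path.join(tempfile.mkdtemp(), "tfidf_norm")
store(path, ["d1", "d2", "d3"], normalized)
with dbm.open(path, "r") as db:
    print(json.loads(db["d1"]))
```

Keeping the raw and normalized vectors as separate stores means the classifier can be pointed at either one without recomputing anything.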
(2) Train a classifier on one category
I might face an issue here. The number of rows is small (about 178), but the number of features is 83K, so libsvm may not scale. I might have to rewrite parts of the SVM software, reduce the number of features, or switch to SVMlight.
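Both libsvm and SVMlight read sparse text input: one example per line, a label followed by `index:value` pairs in ascending index order, with absent features treated as zero. So 83K feature columns only cost space for the nonzero entries. The sketch below writes that format and, as one possible way to shrink the feature set, drops terms that occur in fewer than `min_df` documents. The document-frequency cutoff is a swapped-in technique, not something the post commits to, and `select_features`, `to_libsvm_line` and `min_df` are hypothetical names.

```python
from collections import Counter


def select_features(vectors, min_df):
    """Keep terms that are nonzero in at least min_df documents; map each to a 1-based index."""
    df = Counter()
    for vec in vectors:
        df.update(t for t, v in vec.items() if v != 0)
    kept = sorted(t for t, c in df.items() if c >= min_df)
    return {t: i for i, t in enumerate(kept, start=1)}


def to_libsvm_line(label, vec, index):
    """Format one example as '<label> <index>:<value> ...', ascending indices, zeros and dropped terms omitted."""
    pairs = sorted((index[t], v) for t, v in vec.items() if t in index and v != 0)
    return " ".join([str(label)] + [f"{i}:{v}" for i, v in pairs])


# Usage: three toy tf-idf vectors; "rare" appears in only one document and is dropped.
vectors = [{"svm": 0.8, "kernel": 0.4}, {"kernel": 0.5, "rare": 0.9}, {"svm": 0.3}]
index = select_features(vectors, min_df=2)
lines = [to_libsvm_line(lbl, v, index) for lbl, v in zip([1, -1, 1], vectors)]
print("\n".join(lines))
```

The same term-to-index map has to be reused when writing the validation file, otherwise the feature indices will not line up with the trained model.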
(3) Predict on the validation set
Again, this runs into the same problem as (2). The advantage of predicting on the validation set is that it is considerably smaller (I don't know its size yet).
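For a linear-kernel SVM, prediction is just the sign of w·x + b, and with sparse vectors the dot product only touches each example's nonzero features, so the cost per validation example does not grow with the full 83K-dimensional feature space. A minimal sketch, assuming a linear kernel (nonlinear kernels instead sum over the support vectors) and using made-up weights; `decision_value`, `predict` and `accuracy` are hypothetical helpers, not libsvm functions.

```python
def decision_value(weights, bias, x):
    """Linear SVM score w.x + b, summing only over the nonzero features of x."""
    return sum(weights.get(i, 0.0) * v for i, v in x.items()) + bias


def predict(weights, bias, examples):
    """Label each sparse example +1 if its score is non-negative, else -1."""
    return [1 if decision_value(weights, bias, x) >= 0 else -1 for x in examples]


def accuracy(predicted, gold):
    """Fraction of examples whose predicted label matches the gold label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical trained model, keyed by feature index; not real results.
weights = {1: 0.7, 2: -1.2, 5: 0.3}
bias = -0.1
validation = [{1: 1.0, 5: 0.5}, {2: 0.9}, {1: 0.2, 2: 0.1}]
gold = [1, -1, 1]
pred = predict(weights, bias, validation)
print(pred, accuracy(pred, gold))
```

Treating a score of exactly zero as +1 is an arbitrary tie-breaking choice in this sketch.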
