Hamotzi's Data Mining Log: KDD Cup: Using SVMs

Why SVMs?

Several studies (see other entries in this blog) say it's a good technique

Software
So far, I have found only one package
It's a two-step process: learn a model on the training set, then predict using the testing set.
The Windows version seems buggy: it ran much faster on my less powerful Linux box.
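Since the package isn't named here, this is a minimal pure-Python stand-in for the same learn-then-predict workflow, not the package itself. It trains a linear SVM with the Pegasos stochastic sub-gradient method (a technique swapped in for illustration) and leaves out the bias term for brevity. `train_linear_svm`, `predict`, and the toy data are hypothetical.

```python
import random
from typing import Dict, List, Tuple

SparseVec = Dict[int, float]  # feature index -> value (only nonzeros stored)


def dot(w: SparseVec, x: SparseVec) -> float:
    """Inner product of two sparse vectors."""
    return sum(w.get(i, 0.0) * v for i, v in x.items())


def train_linear_svm(data: List[Tuple[SparseVec, int]], lam: float = 0.01,
                     epochs: int = 20, seed: int = 0) -> SparseVec:
    """Step 1 (learn): Pegasos sub-gradient descent on the regularised hinge loss.

    Labels must be +1 or -1. No bias term, to keep the sketch short.
    """
    rng = random.Random(seed)
    w: SparseVec = {}
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            t += 1
            x, y = data[idx]
            eta = 1.0 / (lam * t)            # Pegasos step size 1 / (lambda * t)
            violated = y * dot(w, x) < 1.0   # margin check uses w before the update
            for i in w:                      # shrink: gradient of (lambda / 2) * ||w||^2
                w[i] *= 1.0 - eta * lam
            if violated:                     # hinge-loss sub-gradient step
                for i, v in x.items():
                    w[i] = w.get(i, 0.0) + eta * y * v
    return w


def predict(w: SparseVec, xs: List[SparseVec]) -> List[int]:
    """Step 2 (predict): sign of the decision value w . x."""
    return [1 if dot(w, x) >= 0.0 else -1 for x in xs]


# Toy data: features 1-2 occur only in positive examples, 3-4 only in negative ones.
train = [({1: 1.0, 2: 0.5}, 1), ({1: 0.5, 2: 1.0}, 1),
         ({3: 1.0, 4: 0.5}, -1), ({3: 0.5, 4: 1.0}, -1)]
model = train_linear_svm(train)
print(predict(model, [{2: 1.0}, {3: 1.0}]))  # [1, -1]
```

Keeping the learn and predict steps as separate functions mirrors the two-step process: the model (here just the weight dictionary) is the only thing passed from one step to the other.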

Data format

Treat each query as a document
Training set:

...
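A minimal sketch of "treat each query as a document", assuming the package reads the sparse `<label> <index>:<value>` text format used by SVMlight and LIBSVM (an assumption, since the expected format isn't spelled out here). Each distinct word becomes one feature and its count in the query becomes the value. The tokenizer, the term-count weighting, `to_sparse_line`, and the example queries are hypothetical choices for illustration.

```python
import re
from collections import Counter
from typing import Dict, List


def tokenize(query: str) -> List[str]:
    """Lower-case the query and keep runs of letters and digits (a minimal tokenizer)."""
    return re.findall(r"[a-z0-9]+", query.lower())


def to_sparse_line(query: str, label: int, vocab: Dict[str, int],
                   grow: bool = True) -> str:
    """Encode one query as '<label> <index>:<value> ...' with term counts as values.

    Feature indices start at 1 and are written in ascending order. With grow=False
    (e.g. for test queries) words missing from the training vocabulary are dropped,
    since the model has no weight for them.
    """
    features: Dict[int, int] = {}
    for term, count in Counter(tokenize(query)).items():
        if term not in vocab:
            if not grow:
                continue
            vocab[term] = len(vocab) + 1
        features[vocab[term]] = count
    pairs = " ".join(f"{i}:{v}" for i, v in sorted(features.items()))
    return f"{label} {pairs}".rstrip()


vocab: Dict[str, int] = {}
print(to_sparse_line("Apple iPod nano", 1, vocab))            # 1 1:1 2:1 3:1
print(to_sparse_line("apple pie recipe apple", -1, vocab))    # -1 1:2 4:1 5:1
print(to_sparse_line("iPod battery", 1, vocab, grow=False))   # 1 2:1
```

The shared `vocab` dictionary is what keeps feature numbers consistent between the training and testing files.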
What if there are several classes?
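If the package only does binary (+1/-1) classification, one common workaround is one-vs-rest: train one model per class with that class as +1 and everything else as -1, then assign a test query to the class whose model gives the highest decision value. This is a minimal sketch of that approach, offered as an assumption rather than the method used here; the class names, feature strings, and scores are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Tuple[Hashable, str]  # (class label, sparse features such as "1:1 4:2")


def split_one_vs_rest(rows: Sequence[Row]) -> Dict[Hashable, List[str]]:
    """Build one binary training set per class: +1 for that class, -1 for every other."""
    classes = sorted({cls for cls, _ in rows}, key=str)
    return {
        c: [f"{1 if cls == c else -1} {features}" for cls, features in rows]
        for c in classes
    }


def pick_class(scores: Dict[Hashable, float]) -> Hashable:
    """Combine the per-class decision values: choose the most confident model."""
    return max(scores, key=lambda c: scores[c])


rows = [("sports", "1:1 2:1"), ("shopping", "3:1"), ("travel", "4:1 5:2")]
binary_sets = split_one_vs_rest(rows)
print(binary_sets["sports"])  # ['1 1:1 2:1', '-1 3:1', '-1 4:1 5:2']
# Decision values would come from running each binary model on a test query.
print(pick_class({"sports": -0.4, "shopping": 0.7, "travel": 0.1}))  # shopping
```

The cost is one training run per class, which matters given the scalability concerns below.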

Testing set:

Open Issues:

Scalability: what is the maximum number of dimensions and records it can process in a reasonable amount of time? (It could not process 100,000 records with 500,000 features.)
Data pre-processing
There are several kernel functions, each with its own parameters. I need to determine by trial and error which one to use.
Need to build a smaller subset of the data for experimenting
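A back-of-the-envelope estimate helps frame the scalability question. Stored densely, 100,000 records x 500,000 features of 8-byte floats need about 400 GB, while a sparse encoding is tiny if queries are short. The 5-words-per-query figure and the 12-bytes-per-entry layout below are assumptions, not measured values. If the package keeps the data sparse, this suggests memory is not the bottleneck and training time is the more likely limit.

```python
def dense_bytes(n_rows: int, n_features: int, bytes_per_value: int = 8) -> int:
    """Memory for a dense matrix of 8-byte floats."""
    return n_rows * n_features * bytes_per_value


def sparse_bytes(n_rows: int, avg_nonzeros: int, bytes_per_entry: int = 12) -> int:
    """Memory for a sparse matrix, assuming a 4-byte index plus an 8-byte value per nonzero."""
    return n_rows * avg_nonzeros * bytes_per_entry


# 100,000 records x 500,000 features, as in the run that could not be processed.
print(dense_bytes(100_000, 500_000) / 1e9, "GB dense")   # 400.0 GB dense
# Hypothetical: about 5 distinct words per query.
print(sparse_bytes(100_000, 5) / 1e6, "MB sparse")       # 6.0 MB sparse
```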
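Trial and error over kernels is usually organised as a grid search scored by k-fold cross-validation. This sketch writes out common textbook forms of the linear, polynomial, and RBF kernels and loops over every combination of parameters, including the regularisation parameter C. `evaluate` is a hypothetical callback that would train with the chosen package on the training indices and return accuracy on the held-out indices; the dummy one below exists only to exercise the loop. Folds are formed by striding, so the rows are assumed to be shuffled first.

```python
import itertools
import math
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

Vec = Sequence[float]


def linear(x: Vec, z: Vec) -> float:
    return sum(a * b for a, b in zip(x, z))


def polynomial(x: Vec, z: Vec, degree: int = 3, coef0: float = 1.0) -> float:
    return (linear(x, z) + coef0) ** degree


def rbf(x: Vec, z: Vec, gamma: float = 1.0) -> float:
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def k_fold(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices); fold i holds rows i, i+k, i+2k, ..."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, valid in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, valid


Evaluate = Callable[[Dict[str, object], List[int], List[int]], float]


def grid_search(grid: Dict[str, Sequence[object]], n: int, k: int,
                evaluate: Evaluate) -> Tuple[Dict[str, object], float]:
    """Return the parameter combination with the best mean cross-validation score."""
    best_params: Dict[str, object] = {}
    best_score = float("-inf")
    keys = list(grid)
    for values in itertools.product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        scores = [evaluate(params, train, valid) for train, valid in k_fold(n, k)]
        mean = sum(scores) / len(scores)
        if mean > best_score:
            best_params, best_score = params, mean
    return best_params, best_score


# Dummy evaluator for illustration only: pretends C = 10 is best.
best, score = grid_search({"kernel": ["linear", "rbf"], "C": [1, 10, 100]},
                          n=6, k=3, evaluate=lambda p, tr, va: -abs(p["C"] - 10))
print(best)  # {'kernel': 'linear', 'C': 10}
```

Running this on the smaller experimental subset keeps the number of training runs (grid size times k) affordable.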
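For the smaller experimental subset, a uniform random sample keeps the class mix representative in expectation. Reservoir sampling (Algorithm R, chosen here as one reasonable option) draws k records in a single pass, so it works on a file too large to load at once. The file name and sample size in the comment are hypothetical.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Draw k items uniformly at random in a single pass (Algorithm R).

    Works on streams such as an open file, so the full data set never has to fit in memory.
    """
    rng = random.Random(seed)
    sample: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)          # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)        # uniform over 0..i inclusive
            if j < k:                    # keep item i with probability k / (i + 1)
                sample[j] = item
    return sample


# With a real file this could be: with open("train.dat") as f: subset = reservoir_sample(f, 10_000)
subset = reservoir_sample(range(100_000), 5)
print(len(subset))  # 5
```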

Saturday, June 04, 2005
