Saturday, June 04, 2005

KDD Cup: Using SVMs

Why SVMs?
  1. Several studies (see entries in this blog) say it's a good technique

Software
So far, I have found only one
It's a two-step process: learn on the training set, then predict on the testing set.
The Windows version seems buggy: the software ran much faster on my less powerful Linux box.
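
To make the two steps concrete, here is a minimal sketch in plain Python. It is not the package mentioned above: it swaps in a tiny linear SVM trained by Pegasos-style stochastic sub-gradient descent. The function names (`train_linear_svm`, `score`, `predict`), the toy data, and the parameter values are all assumptions made for illustration.

```python
import random


def score(weights, features):
    """Dot product between a sparse weight vector and a sparse example."""
    return sum(weights.get(j, 0.0) * v for j, v in features.items())


def train_linear_svm(examples, lam=0.01, epochs=200, seed=0):
    """Step 1 (learn): fit weights from (features, label) pairs.

    features is a dict {feature_index: value}; label is +1 or -1.
    lam, epochs and seed are illustrative defaults, not tuned values.
    """
    rng = random.Random(seed)
    weights = {}
    step = 0
    order = list(range(len(examples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)          # decaying step size
            features, label = examples[i]
            margin = label * score(weights, features)
            shrink = 1.0 - eta * lam          # regularisation shrinkage
            for j in weights:
                weights[j] *= shrink
            if margin < 1.0:                  # hinge loss is active
                for j, value in features.items():
                    weights[j] = weights.get(j, 0.0) + eta * label * value
    return weights


def predict(weights, features):
    """Step 2 (predict): return +1 or -1 for an unseen example."""
    return 1 if score(weights, features) >= 0.0 else -1


# Toy training set: features 1-2 go with +1, features 3-4 with -1.
training_set = [
    ({1: 1.0, 2: 1.0}, 1),
    ({1: 1.0}, 1),
    ({3: 1.0, 4: 1.0}, -1),
    ({4: 1.0}, -1),
]
model = train_linear_svm(training_set)

# Toy testing set: predict a label for each unseen example.
testing_set = [{1: 1.0, 2: 1.0}, {3: 1.0}]
predictions = [predict(model, features) for features in testing_set]
print(predictions)
```

A real SVM package would normally write the learned model to a file in the first step and read it back in the second; the in-memory `model` dictionary stands in for that file here.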

Data format
  1. Treat each query as a document
  2. Training set:
    • ...
    • What if there are several classes?
  3. Testing set:
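
As a rough illustration of "each query as a document", the sketch below turns queries into bag-of-words lines. It assumes the tool accepts the sparse `label index:value` text format used by packages such as SVMlight and LIBSVM, which may not match the software I am using. For several classes it assumes a one-vs-rest setup (one training file per category, +1 if the query belongs to it, -1 otherwise), which is one common approach and not a settled decision. The queries, category names, and function names are hypothetical.

```python
def build_vocabulary(queries):
    """Assign a 1-based integer index to every distinct term."""
    vocab = {}
    for query in queries:
        for term in query.lower().split():
            if term not in vocab:
                vocab[term] = len(vocab) + 1
    return vocab


def to_sparse_line(query, label, vocab):
    """Encode one query as 'label index:count ...' with ascending indices."""
    counts = {}
    for term in query.lower().split():
        index = vocab.get(term)
        if index is not None:              # terms unseen in training are dropped
            counts[index] = counts.get(index, 0) + 1
    pairs = " ".join(f"{i}:{c}" for i, c in sorted(counts.items()))
    return f"{label} {pairs}".rstrip()


# Hypothetical labelled queries; a query may belong to several categories.
training = [
    ("cheap flights paris", {"Travel"}),
    ("python svm tutorial", {"Computers"}),
    ("paris hotel cheap cheap", {"Travel", "Shopping"}),
]
vocab = build_vocabulary(query for query, _ in training)

# One-vs-rest: write one set of lines per category.
for category in ("Travel", "Shopping"):
    for query, categories in training:
        label = "+1" if category in categories else "-1"
        print(category, to_sparse_line(query, label, vocab))
```

Testing-set queries would be encoded with the same vocabulary so that feature indices line up with the training set.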

Open Issues:
  1. Scalability - what is the maximum number of features (dimensions) and records that it can process in a reasonable amount of time? (It could not process 100,000 records with 500,000 features.)
  2. Data pre-processing
  3. There are several kernel functions, each with associated parameters. I need to determine by trial and error which should be used.
  4. I need to build a smaller subset of the data for experimenting.
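
Issues 3 and 4 could be approached together: draw a small random subset once, then try each kernel setting on it. The sketch below is only an outline under assumptions. `reservoir_sample` uses reservoir sampling so the full file never has to fit in memory, the kernel names and parameter values in `candidate_settings` are placeholders, and `score_fn` is a hypothetical caller-supplied function that would train and evaluate with the actual SVM tool.

```python
import random


def reservoir_sample(records, k, seed=0):
    """Return k records chosen uniformly at random from any iterable."""
    rng = random.Random(seed)
    sample = []
    for n, record in enumerate(records):
        if n < k:
            sample.append(record)          # fill the reservoir first
        else:
            j = rng.randint(0, n)          # inclusive on both ends
            if j < k:
                sample[j] = record         # replace with probability k/(n+1)
    return sample


def candidate_settings(costs=(0.1, 1.0, 10.0), degrees=(2, 3), gammas=(0.01, 0.1)):
    """Yield kernel/parameter combinations to try (values are placeholders)."""
    for c in costs:
        yield {"kernel": "linear", "C": c}
        for d in degrees:
            yield {"kernel": "polynomial", "C": c, "degree": d}
        for g in gammas:
            yield {"kernel": "rbf", "C": c, "gamma": g}


def best_setting(settings, score_fn):
    """Pick the setting with the highest score from score_fn (hypothetical)."""
    return max(settings, key=score_fn)


# Example: keep 1,000 of 100,000 record ids for experiments.
subset = reservoir_sample(range(100000), 1000)
print(len(subset), "records sampled")
print(len(list(candidate_settings())), "settings to try")
```

Fixing the seed keeps the subset the same across runs, so differences in accuracy come from the kernel settings and not from the sample.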
