Saturday, June 04, 2005

Scalability approach?

Given a test dataset of 800,000 records, what is the effect of splitting it into smaller chunks and running the prediction on them in parallel? A smaller chunk may contain few or no records from some categories, while other categories may be over-represented in it. What happens then?
The advantage of splitting is clear enough: the chunks can be scored at the same time, so the total wall-clock time should drop.
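
One case can at least be checked with a toy experiment. The sketch below assumes the model is already trained and labels each record on its own, without looking at the other records in its chunk; that is an assumption, not something established above. The `predict_chunk` classifier, its thresholds, and the 8,000 synthetic records are all hypothetical stand-ins. The records are sorted on purpose so that the chunks come out badly skewed by category.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import random

def predict_chunk(chunk):
    # Hypothetical stand-in for an already-trained classifier: each record
    # is labelled independently of the other records in the chunk.
    return ["high" if x >= 0.9 else "mid" if x >= 0.5 else "low" for x in chunk]

def split(records, n_chunks):
    # Contiguous chunks of near-equal size, original order preserved.
    size = -(-len(records) // n_chunks)  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]

random.seed(0)
# Sorting makes the chunks deliberately unrepresentative of the whole set.
records = sorted(random.random() for _ in range(8000))

chunks = split(records, 8)
# Threads keep the sketch short; a CPU-bound model would more likely
# use a process pool or separate machines.
with ThreadPoolExecutor(max_workers=8) as pool:
    per_chunk = list(pool.map(predict_chunk, chunks))  # map keeps chunk order

parallel = [label for part in per_chunk for label in part]
serial = predict_chunk(records)

# Category mix seen by each chunk, to show how skewed the split is.
chunk_counts = [Counter(part) for part in per_chunk]

print("same predictions as one pass:", parallel == serial)
for i, counts in enumerate(chunk_counts):
    print(i, dict(counts))
```

Under that assumption the skew does not change any individual prediction, since the concatenated chunk results match the single-pass results. What the skew does affect is anything computed per chunk, such as per-chunk accuracy or category counts, which should be pooled across all chunks before being reported. If the prediction step depends on the other records in the batch (for example, normalising by batch statistics), this sketch does not apply.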
