KDD Cup: Brainstorming
Learning algorithm options:
- Supervised Learning algorithms, or
- Unsupervised Learning algorithms, or
- Latent Semantic Indexing + Supervised Learning?
- Use standard techniques - Porter stemming, etc.
- Custom code based on analyzing characteristics of the dataset
- Build a custom synonym dictionary? (or use LSI techniques?)
- Polysemy is not an issue because we can map each query to up to 5 categories, so an ambiguous query can get a category for each of its senses
- Need to greatly reduce the number of unique words from about 799,000+ to something more "manageable" - what counts as "manageable" depends on the algorithm
- Need to augment the training set so that it contains examples of every category
- Hand-build a custom validation set
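
As a rough illustration of the vocabulary-reduction idea in the list above, here is a minimal Python sketch. It is only one possible approach, not something already decided: `crude_stem` is a deliberately simple suffix stripper standing in for a real Porter stemmer, and the `SUFFIXES` list, the `min_doc_freq` threshold, and the sample queries are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical suffix list: a crude stand-in for a real Porter stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(query):
    """Lowercase the query, split on non-alphanumerics, and stem each token."""
    return [crude_stem(token) for token in re.findall(r"[a-z0-9]+", query.lower())]


def build_vocabulary(queries, min_doc_freq=2):
    """Keep only stems that occur in at least `min_doc_freq` queries."""
    doc_freq = Counter()
    for query in queries:
        # Use a set so each stem is counted once per query.
        doc_freq.update(set(tokenize(query)))
    return {term for term, count in doc_freq.items() if count >= min_doc_freq}


# Made-up sample queries, for illustration only.
queries = [
    "cheap flights",
    "cheap flight to Boston",
    "Booking flights online",
    "book hotels",
]
vocab = build_vocabulary(queries)
print(sorted(vocab))  # ['book', 'cheap', 'flight']
```

Stemming merges word variants ("flights" and "flight"), and the frequency cutoff drops words that appear in only one query. How aggressive the cutoff should be would depend on which learning algorithm ends up being used.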