Review: Text Categorization with Support Vector Machines: Learning with Many Relevant Features
Author: Thorsten Joachims
Topic: Text Categorization
Approach: Supervised Learning with SVMs
Paper explores the use of SVMs to perform text categorization. The claim is that SVMs work well here because they are fast, robust, efficient, and fully automatic - no parameter selection required. (A background in SVMs is needed to follow the paper.)
Interesting points/concepts:
- Assignment of a text to a category is treated as a binary classification problem: the classifier decides whether the text belongs to that category or not.
- Used IDF weighting to build the feature vectors.
- Used feature selection to reduce the dimensionality of the feature vectors, which should prevent overfitting (a rough sketch of this pipeline follows the list).
- Several feature selection options: DF thresholding, the chi-square test, and the term strength criterion.
- Used the information gain criterion as proposed by Yang.
- ?? Feature selection hurts text categorization, since there are very few irrelevant features, and removing terms leads to loss of information.
- Isn't this a contradiction with using feature selection in the first place?
- Hypotheses for why SVMs should work well for text categorization:
- SVMs work well in high-dimensional feature spaces - they do not overfit
- SVMs work well with sparse vectors
- The overview of SVMs is quite high-level and theoretical; details of the algorithm/implementation were not described.
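
As a rough illustration of the pipeline these notes describe (sparse IDF-weighted term vectors, feature selection, and a binary in-category vs. not-in-category classifier), here is a minimal sketch in Python using scikit-learn. This is not the paper's setup: the corpus, category, labels, and parameter values are made up for illustration, and scikit-learn's mutual information scorer is used only as a rough stand-in for the information gain criterion.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy corpus; hypothetical labels mark whether each document belongs to an "earnings" category.
docs = [
    "quarterly earnings rose on strong sales",
    "net profit beats analyst estimates",
    "the central bank left interest rates unchanged",
    "wheat futures fell after the harvest report",
]
labels = [1, 1, 0, 0]  # binary: in-category (1) vs. not-in-category (0)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),    # sparse, high-dimensional IDF-weighted vectors
    ("select", SelectKBest(mutual_info_classif, k=10)),  # keep the 10 highest-scoring terms
    ("svm", LinearSVC(C=1.0)),                           # linear SVM as the binary classifier
])

pipeline.fit(docs, labels)
print(pipeline.predict(["profit and earnings grew this quarter"]))

In the one-vs-rest setting described above, one such binary classifier would be trained per category; per the paper's argument, the feature selection step could also be dropped entirely.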