Thursday, March 16, 2006

Issue: Databases

My bottleneck in processing is the I/O.
The advantage of using a SQL database is that I can run ad-hoc queries and understand my data better. The disadvantage is that the database runs as a separate (remote) process, so every access is slow.
As a compromise, I might use the Berkeley DB embedded database to store information that I will not really run queries against, but that I need fast access to AND that is going to change often. The normalized tf-idf metrics, for example, could be stored in a Berkeley DB.
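A rough sketch of what such a key-value layout might look like. Big assumption: it uses Python's standard `dbm` module as a stand-in for an embedded store, not the actual Berkeley DB bindings. The key format (`doc_id<TAB>term`), the helper names `put_weight` and `get_weight`, and the sample weights are all made up for illustration.

```python
import dbm
import os
import struct
import tempfile


def put_weight(db, doc_id, term, weight):
    """Store one tf-idf weight under a composite 'doc_id<TAB>term' key."""
    key = f"{doc_id}\t{term}".encode("utf-8")
    db[key] = struct.pack("<d", weight)  # 8-byte little-endian double


def get_weight(db, doc_id, term, default=0.0):
    """Look up one weight; return `default` if the pair was never stored."""
    key = f"{doc_id}\t{term}".encode("utf-8")
    try:
        return struct.unpack("<d", db[key])[0]
    except KeyError:
        return default


# Hypothetical location; a real run would use a fixed data directory.
path = os.path.join(tempfile.mkdtemp(), "tfidf")

# "c" opens the store read-write and creates it if it does not exist.
with dbm.open(path, "c") as db:
    put_weight(db, "doc1", "database", 0.42)
    put_weight(db, "doc1", "database", 0.57)  # frequent updates just overwrite
    print(get_weight(db, "doc1", "database"))
    print(get_weight(db, "doc1", "missing"))
```

There is no query language here, only get and put by key, which is exactly the trade being made: no ad-hoc queries, but no round trip to a server either.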
The "perceived" advantage of Berkeley DB is that it's embedded, so it should be faster than MySQL, but still far slower than a plain file and much slower than in-memory data structures.
Hmmm, since I have one gig of RAM and can expand it to 4 GB, maybe in-memory data structures are not a bad idea for certain metrics.
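For the in-memory option, something like the following might be enough. It is only a sketch under assumptions: "normalized" is taken to mean L2 (cosine) normalization, idf is taken as log(N / df), and the function name `normalized_tfidf` and the two toy documents are invented for the example.

```python
import math
from collections import Counter


def normalized_tfidf(docs):
    """Return {doc_id: {term: weight}} with each document vector scaled to unit length.

    `docs` maps a document id to its list of tokens.
    Assumes idf = log(N / df) and L2 normalization.
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    weights = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        raw = {t: count * math.log(n_docs / df[t]) for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        # Guard against an all-zero vector (every term appears in every document).
        weights[doc_id] = {t: (w / norm if norm else 0.0) for t, w in raw.items()}
    return weights


# Toy corpus, purely illustrative.
docs = {
    "d1": ["sql", "query", "slow"],
    "d2": ["memory", "fast", "query"],
}
weights = normalized_tfidf(docs)
print(weights["d1"])
```

Everything lives in plain dicts, so lookups never touch the disk. Whether the full set of metrics fits in 1 GB or 4 GB depends on the vocabulary and corpus size, which would need to be measured on a sample first.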
