SQLite loves me, RAM hates me
Non-technical version: DN now uses the database to load and save data. Hooray! It eats RAM like candy but that is expected and can be somewhat fixed. Speed is still a concern but real optimization will come later once the codebase is solid.
Next step is to begin working on the client in earnest and get some simulations running! Gruesome technical details for those who care after the jump.
Technical Version
The loading/saving/growing code for both genotypes and phenotypes has been completely ported over to SQLite. The code is much cleaner now and (except for a few areas) a great deal faster. Transaction support in SQLite means I can insert 10,000 rows in under 2 seconds. And it allows me to do it safely.
For those unfamiliar with transactions, a transaction is a way to ensure that everything you want written gets written to the database as a single unit: either all of it lands or none of it does. Even more important for SQLite, it is a huge performance boost. By default, SQLite treats every individual statement as its own transaction. This means that a single Insert operation forces SQLite to open, lock, write, unlock and close the physical file. Multiply these I/O operations by 10,000 rows and you get a very unhappy hard drive.
If the programmer explicitly starts a transaction, however, things are quite different. SQLite will wait until the transaction has been committed before opening/locking/writing/unlocking/closing the physical file. Which means you can cram 10,000 Inserts into a single opening of the database file, greatly improving performance. It also means that if there is a catastrophic error partway through (like the computer losing power), the transaction is rolled back. This keeps the database from becoming corrupted with half-completed data.
In other news, DN is a memory hog. But that was to be expected. I did a simple stress test to see what kind of results I would get. A network with 10,000 neurons eats a whopping 330 MB of memory. This may seem like a ridiculously large amount of memory for such a small neuron count, but consider what 10,000 neurons means:
- 10,000 Neuron objects
- 100,000 Synapse objects (10 Synapses per Neuron)
- 400,000 Receptor objects (4 Receptors per Synapse)
- 400,000 Transmitter objects (1 Transmitter object per Receptor object)
Which leads to an amazing 910,000 objects in memory at one time, plus all the assorted overhead (generic object overhead, Octree footprint, etc). All in all, it is a lot to keep track of at one time. Obviously, real simulations will be nowhere near as large. Even halving the number of synapses decreases the memory footprint by a huge amount.
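The arithmetic behind that total can be sketched as below. The struct and function names are made up for illustration, and the 10-synapses-per-neuron ratio is an assumption: it is the ratio that yields the 100,000 synapses listed above.

```cpp
#include <cstdint>

// Object counts for a network; all names here are illustrative only.
struct ObjectCounts {
    std::uint64_t neurons;
    std::uint64_t synapses;
    std::uint64_t receptors;
    std::uint64_t transmitters;

    constexpr std::uint64_t total() const {
        return neurons + synapses + receptors + transmitters;
    }
};

// Each level of the hierarchy multiplies the count of the level above it.
constexpr ObjectCounts count_objects(std::uint64_t neurons,
                                     std::uint64_t synapses_per_neuron,
                                     std::uint64_t receptors_per_synapse,
                                     std::uint64_t transmitters_per_receptor) {
    const std::uint64_t synapses = neurons * synapses_per_neuron;
    const std::uint64_t receptors = synapses * receptors_per_synapse;
    const std::uint64_t transmitters = receptors * transmitters_per_receptor;
    return ObjectCounts{neurons, synapses, receptors, transmitters};
}
```

With these ratios, `count_objects(10000, 10, 4, 1).total()` is 910,000, and halving the synapses per neuron to 5 drops it to 460,000, because receptors and transmitters scale with the synapse count. Spread over 330 MB, that works out to very roughly 360 bytes per object, overhead included.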
Running speed should prove interesting, although currently it appears to be running fairly well. I’ve yet to optimize the code so there are quite a few sections that may work…but do not work fast. One in particular is the saving code for phenotypes.
Without even thinking, I programmed it to use the StringStream object. Woe upon any who touch strings! I’m forced to use strings because SQL operations are inherently strings, and C++ frankly sucks at working with them. You are forced to either wrestle with a character array (with all the associated misery: internal pointers, allocation sizes and writing your own routines to return the finished string) or use the StringStream object. The StringStream object takes care of allocation and presents a nice interface but is slow as all hell. The only way to clear a StringStream also forces it to deallocate its internal buffer. The alternative is creating a new StringStream object for each query, which is mildly more expensive still.
Which means saving a phenotype to disk can be terribly slow if it is a large network. I believe I am going to rewrite the code with my own StringStream class. It will mimic the same functionality but not erase the buffer each time it is cleared. It will also have a user-definable buffer size to avoid reallocations in the middle of execution.
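One possible shape for such a class, as a rough sketch rather than the eventual implementation. `QueryBuffer` and its interface are hypothetical. It leans on `std::string`, whose `clear()` keeps the allocated capacity on the common standard library implementations, although the standard does not strictly promise that:

```cpp
#include <cstddef>
#include <string>

// A tiny stream-like buffer that keeps its allocation between queries.
class QueryBuffer {
public:
    // Reserve the working size up front to avoid reallocating mid-save.
    explicit QueryBuffer(std::size_t reserve_bytes) { buf_.reserve(reserve_bytes); }

    QueryBuffer& operator<<(const char* text) { buf_ += text; return *this; }
    QueryBuffer& operator<<(const std::string& text) { buf_ += text; return *this; }
    QueryBuffer& operator<<(int value) { buf_ += std::to_string(value); return *this; }
    QueryBuffer& operator<<(double value) { buf_ += std::to_string(value); return *this; }

    // Reset the length to zero; the capacity is normally left untouched.
    void clear() { buf_.clear(); }

    const std::string& str() const { return buf_; }
    std::size_t capacity() const { return buf_.capacity(); }

private:
    std::string buf_;
};
```

Usage would look like `QueryBuffer q(4096); q << "INSERT INTO neuron VALUES (" << 7 << ")";`, followed by `q.clear()` before building the next query. Note that `std::to_string` formats doubles with six fixed decimal places, which may or may not be the precision wanted for stored weights.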
Lastly, SQLite has a new experimental precompiled-query feature which I’m going to toy with. Each time you pass an operation to SQLite, it is compiled to internal opcodes and then executed. In the case of loading/saving, I am repeating the same six queries thousands of times. The precompiled feature allows me to compile these queries once ahead of time and merely swap out values, theoretically saving a huge amount of time. We’ll see.