Implementation Desiderata and Plan



Data needs to either be versioned somehow or distributed enough that it can be reproduced if some is lost.

We don't want there to be a single point of failure where, if some bug gets introduced into the code one day, we lose years of records. We're storing the data in Amazon's Simple Storage Service (S3), which means we pay 15 cents per GB-month of data stored. The data grows quickly, but appears to be quite compressible. I would expect that my ICML 2008 data (1800 experiments x 30+ trials) would require about 100 MB compressed. This is cheap. Saving multiple copies is not a bad idea in the short term. But over time this will start to get expensive, and we'll want to think of something smarter.
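Since the data is serialized Java objects anyway, compression can happen in the same pass as serialization, so what gets uploaded to S3 is already small. A minimal sketch (the class and method names are my own, not part of the plan):

```java
import java.io.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Sketch: serialize an object straight through a GZIP stream, so the
// bytes we'd upload to S3 are already compressed. Hypothetical names.
class CompressedSerializer {

    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                 new GZIPInputStream(new ByteArrayInputStream(data)))) {
            return in.readObject();
        }
    }
}
```

Highly redundant result data (repeated parameter settings, long runs of similar numbers) is exactly what GZIP handles well, which is consistent with the ~100 MB estimate above.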

I'm not sure SVN is the right thing, because the current plan is to store the data as serialized Java objects, which means they are binary and we will probably be storing full new copies every time they change.  One possible solution, something I've used before with some success, is as follows.

First, let me define a computing run: a computing run consists of the results generated in a computing session.  This could be a few trials, or 1 trial, or thousands of trials, on one parameter set, or many.  The main thing is that a computing run is only going to be for a single agent in a single event. Now, if we save the results from each computing run as its own file, we get easy(ish) reproducibility.  This approach will require us to process these smaller files to create larger summary files. This is ok, because it only has to happen when new results are added, and the summarizing process will never write to the original files, so there is no risk of ruining them.  This seems like a good plan.
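The key property is that each run file is written once and never touched again. A sketch of what that could look like, assuming a per-agent directory and a unique name per run (all names here are illustrative, not decided):

```java
import java.io.*;
import java.nio.file.*;
import java.util.UUID;

// Sketch: write each computing run's results to its own immutable file.
// A random UUID in the name means concurrent sessions never collide,
// and CREATE_NEW guarantees we can never overwrite an existing run.
class RunWriter {

    static Path writeRun(Path agentDir, Serializable runResults)
            throws IOException {
        Files.createDirectories(agentDir);
        Path runFile = agentDir.resolve("run-" + UUID.randomUUID() + ".ser");
        try (ObjectOutputStream out = new ObjectOutputStream(
                Files.newOutputStream(runFile, StandardOpenOption.CREATE_NEW))) {
            out.writeObject(runResults);
        }
        return runFile;
    }
}
```

A summarizer would then only ever read these files, writing its output to a separate summary file, so the original records stay safe.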

Of course, we don't want to dump zillions of files all into a single directory, so we'll create a directory for each agent under each event.  This way, all of the results for the agent/event pair can be found in a single place.
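The convention above amounts to a two-level path, event then agent; a one-liner makes the layout concrete (the root name "results" is an assumption):

```java
import java.nio.file.*;

// Sketch of the event/agent directory convention:
// <root>/<event>/<agent>/ holds every run file for that pair.
class ResultLayout {

    static Path agentDir(Path root, String event, String agent) {
        return root.resolve(event).resolve(agent);
    }
}
```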


We want to be able to do fast(ish) queries within an agent. For example, "show me learning curves for Sarsa for step sizes {.1, .2, .5, 1.0}".  Or "what is the best step size when epsilon = .5".  Or whatever.  These queries need to sift through and aggregate potentially tens of thousands of experiments, so all of the information they need should be close together and appropriately summarized.
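Once the summary files exist, a query like "best step size when epsilon = .5" is just a filter and an aggregation over per-run summaries. A sketch, assuming a hypothetical RunSummary record holding a parameter map and a performance number (none of these names are fixed):

```java
import java.util.*;

// Sketch of an in-agent query over summarized runs. RunSummary, its
// fields, and the parameter names "epsilon"/"stepSize" are assumptions.
class RunSummary {
    final Map<String, Double> params;
    final double meanReturn;

    RunSummary(Map<String, Double> params, double meanReturn) {
        this.params = params;
        this.meanReturn = meanReturn;
    }
}

class Queries {

    // "What is the best step size when epsilon = .5":
    // restrict to matching runs, then take the best performer's step size.
    static OptionalDouble bestStepSize(List<RunSummary> runs, double epsilon) {
        return runs.stream()
            .filter(r -> r.params.getOrDefault("epsilon", Double.NaN) == epsilon)
            .max(Comparator.comparingDouble(r -> r.meanReturn))
            .map(r -> OptionalDouble.of(r.params.get("stepSize")))
            .orElse(OptionalDouble.empty());
    }
}
```

With tens of thousands of experiments, the point of the summary files is that this loop runs over small pre-aggregated records rather than re-reading every raw run file.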

It is an advanced, and hopefully less important, thing to ask complicated questions between agents.  It would be nice to say "when the step sizes are the same, show me which epsilon does best for Sarsa and Q-learning".  This actually makes sense in this case, but in general there may be no easy mapping of parameters between agents.  In that case it may make more sense to do some manual aggregation on top of simpler queries.


If an agent or environment changes at all, even in the slightest, I think that should be cause for calling it a different agent or environment.  Comparisons between different versions may be done informally, but never officially.  If an agent changes, it was for a reason, and its results should be re-run.  Similarly, if an environment is changed, that is really creating a new event.