
Handling Multiple Runs

posted Sep 5, 2009, 10:26 AM by Brian Tanner   [ updated Sep 6, 2009, 7:54 AM ]

Generating the Results

The code that I've been using so far for the environments is very careful with random seeds.  It makes sure that, given an action sequence, the environment will generate the same observation and reward sequences for the same MDP every time.  This sounded good at first.  However, now that I want to do multiple runs for each MDP, it's not so good, because it throws away part of the environment's stochasticity: every run on the same MDP would just replay the same randomness.

I'm going to add a new param to the CODA environments called RunNumber, and I'm going to use it (combined with the MDP number) to generate the random seed for transition and reward stochasticity.  This way, for every MDP, you *can* control these features independently.  The only problem is that I have just run 600 000 experiments without this parameter, so I need to re-run all of those experiments.  Shux.
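
Just to make the seed idea concrete, here's a rough Java sketch of what I mean; the class, method, and mixing constant are made up for illustration and aren't the actual CODA code:

    import java.util.Random;

    // Illustration only: derive a reproducible seed from the MDP number and the
    // new RunNumber param, so each (MDP, run) pair gets its own random stream.
    public class SeedSketch {
        static long seedFor(int mdpNumber, int runNumber) {
            // Any deterministic mixing works; a large prime keeps the pairs distinct.
            return (long) mdpNumber * 1000003L + runNumber;
        }

        public static void main(String[] args) {
            Random transitionRng = new Random(seedFor(2, 0));
            Random rewardRng = new Random(seedFor(2, 0) + 1);
            System.out.println(transitionRng.nextDouble() + " " + rewardRng.nextDouble());
        }
    }

Run 0 of MDP 2 stays reproducible, but run 1 gets different transition and reward noise.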

Handling The Results

There are two distinct strategies for handling multiple runs that we could follow.

One would be to treat each run as its own experiment.  For example, we could add a runNumber param to the experiment's paramSummary.  This would later give us fine control over the runs in SQL, because we could do queries over specific run numbers.  The downside of this approach is that there would be a different experimentId for each run, which means queries directly over the resultRecords might be more complicated.  If the runs shared an experimentId, we could do:
select max(score) from resultRecords where experimentId=(select id from Experiment where MDPNumber=2)

Now we'd have to do:
select max(score) from resultRecords where experimentId in (select id from Experiment where MDPNumber=2)

Maybe that's not so bad.  Maybe it gets worse when we have more complicated queries.  Not quite sure.

The opposite strategy is to drop the unique index on ResultRecords over (AgentId, ExperimentId).  This would mean there would literally just be multiple ResultRecords for each Agent/Experiment combination.  We'd have to manually filter or average them when necessary (see the sketch below).  The advantage is that we just submit things multiple times, they get run multiple times, and they end up in our results multiple times.
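
To make "manually filter or average" concrete, here's a rough Java sketch of collapsing the duplicates on the client side; it assumes we've already pulled the (agentId, score) pairs for one experimentId out of resultRecords, and the row values and class name are made up for illustration:

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class AverageRunsSketch {
        public static void main(String[] args) {
            // Pretend these (agentId, score) pairs came back from resultRecords
            // for a single experimentId, one row per run.
            List<int[]> rows = Arrays.asList(
                    new int[]{1, 10}, new int[]{1, 12},
                    new int[]{2, 7}, new int[]{2, 9}, new int[]{2, 8});

            // agentId -> {sum of scores, count of runs}
            Map<Integer, double[]> acc = new HashMap<Integer, double[]>();
            for (int[] row : rows) {
                double[] a = acc.get(row[0]);
                if (a == null) {
                    a = new double[2];
                    acc.put(row[0], a);
                }
                a[0] += row[1];
                a[1] += 1;
            }
            for (Map.Entry<Integer, double[]> e : acc.entrySet()) {
                System.out.printf("agent %d: mean score %.2f%n",
                        e.getKey(), e.getValue()[0] / e.getValue()[1]);
            }
        }
    }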

Now that I'm thinking more clearly about this, I don't think this second strategy fits with the new accountability approach I'm trying to follow, where we keep a record of each submission.  I think we should add the runNumber to the paramSummary.