Migrating the idea of an experiment

The bt-Trainer Way

Below is some sample code that worked with bt-Trainer to describe a full "experiment".  In our new recordBook design, the experiment is logically separated from the agent, but here everything is combined in a single agent-environment-experiment block.

Sample Experiment

import org.rlcommunity.bt.recordbook.experimenter.AbstractExperiment;

/**
 * This is a sample Sarsa(lambda) experiment that varies alpha and lambda.
 */
public class SampleExperiment extends AbstractExperiment {

    static final String thisAgentName = "CMACSarsaLambda - Java";
    static final String thisExperimentName = "SampleExperiment";

    public SampleExperiment(int totalMaxSteps, int numTrials, String theEnv, String baseDataDir) {
        super(totalMaxSteps, numTrials, theEnv, thisAgentName, thisExperimentName, baseDataDir);


        //You can set up only the parameters you want to vary right now,
        //or more parameters in case you want to change them later.  I prefer
        //the latter, because often you'll only realize later what you *really*
        //want to check.

        //I know that all of these parameters exist in EpsilonGreedyCMACSarsaLambda

        //You can add them all at once
        super.addAgentDoubleParameterValues("sarsalambda-alpha", new Double[]{1.0d, .5d, .25d, .125d, .06125d});

        //Or you can add them one at a time
        super.addAgentDoubleParameterValue("sarsalambda-lambda", 1.0d);
        super.addAgentDoubleParameterValue("sarsalambda-lambda", 0.984375d);
        super.addAgentDoubleParameterValue("sarsalambda-lambda", 0.75d);

    }
}
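
For context, here is how such an experiment might be kicked off.  This is an illustrative sketch: the environment name, data directory, and the runExperiment() entry point are my assumptions; only the constructor signature comes from the class above.

//Hypothetical driver: 30 trials of up to 100,000 steps each.
SampleExperiment experiment = new SampleExperiment(100000, 30, "MountainCar", "/tmp/bt-data");
experiment.runExperiment();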

The Abstract Experiment

AbstractExperiment is where the actual magic happens that runs an experiment. I'll explain its steps here:

Setup

The subclass calls methods like super.addAgentDoubleParameterValues(parameterName, arrayOfValues).  Each call "loads up" the abstract experiment with information about which configurations of the agent and environment should be run.  In bt-Trainer, you can run variations on both the agent and the environment at the same time.
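
To make "loads up" concrete, here is a minimal sketch of what that bookkeeping might look like inside AbstractExperiment.  The map-of-lists representation is my guess at the mechanics, not the actual bt-Trainer code; the real class presumably keeps a matching structure for environment parameters too.

import java.util.HashMap;
import java.util.Map;
import java.util.Vector;

//Hypothetical sketch: one list of candidate values per agent parameter name.
abstract class AbstractExperimentSketch {

    private final Map<String, Vector<Double>> agentParamValues = new HashMap<String, Vector<Double>>();

    protected void addAgentDoubleParameterValue(String name, double value) {
        if (!agentParamValues.containsKey(name)) {
            agentParamValues.put(name, new Vector<Double>());
        }
        agentParamValues.get(name).add(value);
    }

    protected void addAgentDoubleParameterValues(String name, Double[] values) {
        for (Double value : values) {
            addAgentDoubleParameterValue(name, value);
        }
    }
}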

Run

The run step is a for-loop over the number of trials that have been requested.  Inside the for-loop, the AbstractExperiment creates a compositeIndex, a helper class that lets you iterate over all combinations of the experiment parameters.  The inner loop basically looks like:
while (!theIndex.exhausted) {
    if (passesFilters(theIndex))
        runTrial(theIndex);
    theIndex.advanceCounter();
}
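
For readers who haven't seen compositeIndex, here is a hedged sketch of the odometer-style counter it presumably implements.  Only exhausted and advanceCounter come from the loop above; the rest of the names are illustrative.

//Illustrative odometer over parameter value lists: counts[i] selects one
//value from the i-th parameter's list; advanceCounter carries like addition.
class CompositeIndexSketch {

    private final int[] sizes;   //number of candidate values per parameter
    private final int[] counts;  //current selection per parameter
    boolean exhausted = false;

    CompositeIndexSketch(int[] sizes) {
        this.sizes = sizes;
        this.counts = new int[sizes.length];
    }

    int valueIndexFor(int parameter) {
        return counts[parameter];
    }

    void advanceCounter() {
        for (int i = 0; i < counts.length; i++) {
            counts[i]++;
            if (counts[i] < sizes[i]) {
                return;          //no carry needed
            }
            counts[i] = 0;       //carry into the next parameter
        }
        exhausted = true;        //wrapped all the way around
    }
}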


runTrial

This is the meat and potatoes of running an experiment.  The code (simplified) looks like this:
void runTrial(compositeIndex theIndex) {
    ExperimentHelper E = new ExperimentHelper();

    ParameterHolder pEnv = E.getEnvParamHolder(theEnv);
    ParameterHolder pAgent = E.getAgentParamHolder(theAgent);

    String paramSummaryString = getParamSummaryStringAndSetParams(theIndex, pEnv, pAgent);

    E.loadEnv(theEnv, pEnv);
    E.loadAgent(theAgent, pAgent);

    /*
     * Main Experiment Guts
     */
    RLGlueProxy.RL_init();

    int stepsLeft = totalMaxSteps;
    Vector<Integer> episodeCompletionPoints = new Vector<Integer>();
    Vector<Double> episodeReturns = new Vector<Double>();
    int totalSteps = 0;

    while (stepsLeft > 0) {
        //Run one episode, capped at the remaining step budget
        RLGlueProxy.RL_episode(stepsLeft);

        int theseSteps = RLGlueProxy.RL_num_steps();
        double thisReturn = RLGlueProxy.RL_return();
        totalSteps += theseSteps;
        //Only record episodes that actually finished: an episode cut off
        //by the step budget did not really complete
        if (theseSteps < stepsLeft) {
            episodeCompletionPoints.add(totalSteps);
            episodeReturns.add(thisReturn);
        }
        stepsLeft -= theseSteps;
    }
    RLGlueProxy.RL_cleanup();

    E.unLoadEnv();
    E.unLoadAgent();

    ResultRecord theResultRecord = new ResultRecord(paramSummaryString, theEnv, theAgent, pEnv, pAgent);

    //totalTime and theResultIndex are fields of AbstractExperiment that have
    //been elided from this simplified listing
    AbstractRunRecord theEpisodeEndPointRunRecord = new EpisodeEndPointRunRecord(totalTime, episodeCompletionPoints, paramSummaryString.hashCode());
    AbstractRunRecord theRewardRunRecord = new EpisodeEndReturnRunRecord(totalTime, episodeReturns, paramSummaryString.hashCode());
    theResultIndex.appendData(theResultRecord, theEpisodeEndPointRunRecord);
    theResultIndex.appendData(theResultRecord, theRewardRunRecord);
}


This code looks pretty good as a core kernel to control each run on the server.  I guess the best thing to do is not to force people to use this, but to actually make this the code that people write as part of their experiment.  They will be free to use this generic version, but they can write fancier code that makes use of new types of run records in the future.

There is magic happening in getParamSummaryStringAndSetParams(theIndex, pEnv, pAgent).  That magic needs to be taken out.  Also, I don't like that we're calling paramSummaryString.hashCode() ourselves; that key should be generated for us behind the scenes.  I would propose that the runTrial used inside the recordBook is called with the agent already loaded and ready to go, to make the experiment truly agent agnostic.
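
To make that proposal concrete, here is a hedged sketch of an agent-agnostic runTrial.  The RunLabel type and these signatures are illustrative, not existing recordBook code.

//Hypothetical: the surrounding infrastructure loads the agent/environment,
//computes the parameter summary and its key, and hands runTrial a finished
//label, so the experiment never touches hashCode() or loading logic itself.
final class RunLabel {

    final String paramSummary;
    final int recordKey;

    RunLabel(String paramSummary, int recordKey) {
        this.paramSummary = paramSummary;
        this.recordKey = recordKey;
    }
}

interface TrialRunner {
    //Called with the agent and environment already loaded and ready to go.
    void runTrial(int totalMaxSteps, RunLabel label);
}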

getResultSummary

This function is very tightly coupled with ResultsManager, the core class used to sift and sort results when summarizing experiments, generating graphs, etc.  I think this coupling needs to be severed.  The AbstractExperiment creates an AbstractEvaluator (which operates on a directory full of results files) and then wraps the Evaluator inside the ResultsManager.  This might have been done just to keep the code short, so that it looked easier to create experiments and results.  I'd much prefer if we could evaluate the data without ever having the original experiment file.  Well, I guess that is still possible, because this lives in AbstractExperiment, not in a specific experiment.

The bt-RecordBook Way

We want to define an experiment independently of the agent that will be part of the experiment.  Every actual experiment extends AbstractExperiment, adding its own specific agent and environment specifications.

I'm refactoring AbstractExperiment so that it no longer owns and controls the experimental parameters, but rather just takes experimental parameters and runs with them.  The ownership part (with the composite index) can be factored into a "submitter" architecture, which will submit jobs to the AbstractExperiment setup.
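
A hedged sketch of that split; "submitter" comes from the paragraph above, ParameterHolder is the type used in runTrial, and the rest of the names are illustrative:

//The submitter owns the parameter enumeration (the composite-index part);
//the experiment just runs whatever configuration it is handed.
interface ConfiguredRunner {
    void run(ParameterHolder envParams, ParameterHolder agentParams);
}

class Submitter {

    private final ConfiguredRunner runner;

    Submitter(ConfiguredRunner runner) {
        this.runner = runner;
    }

    //Each pair of holders is one point in the parameter grid, produced by
    //iterating the composite index, which now lives on this side of the API.
    void submitAll(Iterable<ParameterHolder[]> configurations) {
        for (ParameterHolder[] config : configurations) {
            runner.run(config[0], config[1]);
        }
    }
}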

TODO: Creating a result record requires the agent name, env name, and both of their params.  Maybe we really should just have a summary object, which could even carry some metadata.
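
A hedged guess at what that summary object might look like; every name here is hypothetical:

import java.util.HashMap;
import java.util.Map;

//Hypothetical bundle of everything a ResultRecord needs, plus free-form
//metadata (date, host machine, code revision, ...).
final class RunSummary {

    final String agentName;
    final String envName;
    final ParameterHolder agentParams;
    final ParameterHolder envParams;
    final Map<String, String> metadata = new HashMap<String, String>();

    RunSummary(String agentName, String envName,
            ParameterHolder agentParams, ParameterHolder envParams) {
        this.agentName = agentName;
        this.envName = envName;
        this.agentParams = agentParams;
        this.envParams = envParams;
    }
}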
