
First Large-Scale Experiments: Compromised

posted Jul 14, 2009, 9:18 PM by Brian Tanner
I had a bug in the env_step method of RandomizedCodaV1.

Can you spot it?

    public Reward_observation_terminal env_step(Action theAction) {
        Reward_observation_terminal envStepResult = theWrappedEnv.env_step(theAction);
        Observation warpedObservation = theWarper.warpObservation(envStepResult.getObservation());

        Reward_observation_terminal warpedResult = new Reward_observation_terminal(envStepResult.getReward(), warpedObservation, envStepResult.isTerminal());
        return envStepResult;
    }

That's right: the method builds warpedResult but then returns envStepResult, so the warped observation never reaches the agent. I am using the warped observation in env_start, though.
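In the method above, the fix amounts to returning warpedResult instead of envStepResult. As a minimal, self-contained sketch of that wrapper pattern, here is a version that does not depend on the RL-Glue classes. StepResult, Env, WarpingEnv, the int action, the double[] observation, and the "add 10" warper are all hypothetical stand-ins made up for illustration. They are not the real RandomizedCodaV1 code.

```java
import java.util.Arrays;
import java.util.function.UnaryOperator;

public class WarpedStepSketch {
    // Hypothetical stand-in for Reward_observation_terminal: reward, observation, terminal flag.
    static final class StepResult {
        final double reward;
        final double[] observation;
        final boolean terminal;

        StepResult(double reward, double[] observation, boolean terminal) {
            this.reward = reward;
            this.observation = observation;
            this.terminal = terminal;
        }
    }

    // Hypothetical stand-in for the environment interface (only the step call).
    interface Env {
        StepResult envStep(int action);
    }

    // Wrapper that warps observations coming out of another environment.
    static final class WarpingEnv implements Env {
        private final Env wrapped;
        private final UnaryOperator<double[]> warper;

        WarpingEnv(Env wrapped, UnaryOperator<double[]> warper) {
            this.wrapped = wrapped;
            this.warper = warper;
        }

        @Override
        public StepResult envStep(int action) {
            StepResult raw = wrapped.envStep(action);
            double[] warped = warper.apply(raw.observation);
            // Return the newly built result directly, so there is no
            // leftover local variable to return by mistake.
            return new StepResult(raw.reward, warped, raw.terminal);
        }
    }

    // Builds a toy wrapped environment and takes one step with action 3.
    static StepResult demo() {
        Env base = action -> new StepResult(1.0, new double[] {action, 2.0}, false);
        UnaryOperator<double[]> shiftByTen = obs -> {
            double[] out = obs.clone();
            for (int i = 0; i < out.length; i++) {
                out[i] += 10.0;
            }
            return out;
        };
        return new WarpingEnv(base, shiftByTen).envStep(3);
    }

    public static void main(String[] args) {
        StepResult result = demo();
        System.out.println(Arrays.toString(result.observation)); // prints [13.0, 12.0]
    }
}
```

A check like the one in main, comparing the wrapper's observation against the unwrapped environment's, is the kind of quick test that would flag an unwarped return value before a long run.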

So these 3 million results are not what I expected. The good news is that they still give me 25 MDPs' worth of data for a random selection of domain and randomized noise in the transitions, just with no observation warping.

That's something to work with, but it's not worth following this run through to all 100 or more MDPs. Hmph.


