Home

Interested?

I'm looking for people who are interested in supporting this project, whether by helping to create agents or environments or by helping to decide on a good set of initial events.  If you are interested, please contact brian@tannerpages.com.

Introduction


The RL-Logbook is intended to simplify the task of comparing experimental results in reinforcement learning.  The idea is simple (details follow):

  1. Create a set of open source agents and environments from the reinforcement learning literature.
  2. Create a set of experimental specifications (events).
  3. Record the results (records) of each agent with a variety of parameters for every event.
Once these records have been created, they never need to be duplicated again, by anyone.  At the same time, because all of the source is available, they can be duplicated at any time, by anyone.  When testing a new algorithm, a research scientist will not need to re-implement all of the alternative algorithms.  People evaluating their results won't have to worry about whether the experiment was conducted fairly, either.
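The "record once, reuse forever" idea can be sketched as a lookup keyed by agent, parameters, and event.  This is only an illustration: `RecordKey`, `Logbook`, and `get_or_run` are hypothetical names, not part of any existing RL-Logbook code, and a real logbook would use shared, persistent storage rather than an in-memory dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordKey:
    """Identifies one experiment: an agent, its parameters, and an event."""
    agent: str
    params: tuple  # sorted (name, value) pairs, so the key is hashable
    event: str


class Logbook:
    """In-memory stand-in (hypothetical) for a shared store of records."""

    def __init__(self):
        self._records = {}

    def get_or_run(self, agent, params, event, run):
        """Return the stored record, running the experiment only if it is missing."""
        key = RecordKey(agent, tuple(sorted(params.items())), event)
        if key not in self._records:
            # First request for this combination: run it once and keep the result.
            self._records[key] = run(**params)
        return self._records[key]
```

Because the parameters are sorted before being used as a key, the same experiment requested with its parameters in a different order maps to the same record, so `run` is not called a second time.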

Let me give an example.

Agent:  Sarsa Lambda with Tile Coding Function Approximation and Epsilon Greedy Action Selection
Environment: Mountain Car
Event: How many episodes can the agent complete within 15 000 time steps in Mountain Car, with a fixed starting state?
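An event like this one can be sketched as a function that counts completed episodes within a step budget.  The sketch below makes several assumptions that are not from the logbook itself: it uses the usual textbook Mountain Car dynamics, a fixed start of position -0.5 with zero velocity, and a simple hand-coded "push in the direction of motion" policy in place of the Sarsa Lambda agent.  All class and function names are hypothetical.

```python
import math


class MountainCar:
    """Mountain Car with textbook dynamics and a fixed start state (assumed)."""

    def __init__(self, start_position=-0.5):
        self.start_position = start_position
        self.reset()

    def reset(self):
        self.position = self.start_position
        self.velocity = 0.0
        return self.position, self.velocity

    def step(self, action):
        # action: 0 = push left, 1 = coast, 2 = push right
        self.velocity += 0.001 * (action - 1) - 0.0025 * math.cos(3 * self.position)
        self.velocity = max(-0.07, min(0.07, self.velocity))
        self.position += self.velocity
        if self.position < -1.2:
            # Hitting the left wall stops the car.
            self.position, self.velocity = -1.2, 0.0
        done = self.position >= 0.5  # goal at the top of the right hill
        return (self.position, self.velocity), done


def momentum_policy(state):
    """Stand-in agent (not Sarsa Lambda): push in the direction of travel."""
    _, velocity = state
    return 2 if velocity >= 0 else 0


def episodes_within_budget(env, policy, step_budget=15_000):
    """The event: count episodes completed before the step budget runs out."""
    completed = 0
    state = env.reset()
    for _ in range(step_budget):
        state, done = env.step(policy(state))
        if done:
            completed += 1
            state = env.reset()
    return completed
```

Calling `episodes_within_budget(MountainCar(), momentum_policy)` returns the single number that would be stored as the record for this agent and event; a learning agent would expose the same kind of interface but update itself between steps.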

Desiderata

Save Time / Effort / Frustration

  • No need to re-implement the environments that are used in the logbook
    • Reduces up-front development time
    • Lower probability of needing to re-run results because of bugs
  • Library of events makes it easier to find environments particularly applicable to your new algorithm
  • No need to re-run hundreds of parameter permutations of your favorite competitor algorithms
    • Computation savings
    • Stronger comparisons

Increase Comparability

  • Re-use of experimental design and environments means no idiosyncrasies in setup that change results

Accelerate Advances

  • Open Source / Public agents mean that people with new ideas can stand on the shoulders of giants
  • Trends and macro-level behavior can be seen when larger amounts of data are pooled (running the same agents on many environments and experiments can be enlightening)

Decrease Experimenter Bias

  • "Good" experimental designs will become popular events (natural vetting)
  • The intuition is that events with less bias are less likely to favor their creator and will therefore be used by more people
  • No concerns that the experimenter isn't giving every advantage to the comparison algorithms, because those algorithms have been created and tested by others

RoadMap

The proposal for how things should work, along with future plans, has been moved to the roadmap page.