Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion config/gail_config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ Pyramids:
beta: 1.0e-2
max_steps: 5.0e5
num_epoch: 3
pretraining:
behavioral_cloning:
demo_path: ./demos/ExpertPyramid.demo
strength: 0.5
steps: 10000
Expand Down Expand Up @@ -59,6 +59,10 @@ CrawlerStatic:
summary_freq: 3000
num_layers: 3
hidden_units: 512
behavioral_cloning:
demo_path: ./demos/ExpertCrawlerSta.demo
strength: 0.5
steps: 5000
reward_signals:
gail:
strength: 1.0
Expand Down
2 changes: 2 additions & 0 deletions docs/Migrating.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,8 @@ The versions can be found in
* `reset()` on the Low-Level Python API no longer takes a `config` argument. `UnityEnvironment` no longer has a `reset_parameters` field. To modify float properties in the environment, you must use a `FloatPropertiesChannel`. For more information, refer to the [Low Level Python API documentation](Python-API.md)
* The Academy no longer has a `Training Configuration` nor `Inference Configuration` field in the inspector. To modify the configuration from the Low-Level Python API, use an `EngineConfigurationChannel`. To modify it during training, use the new command line arguments `--width`, `--height`, `--quality-level`, `--time-scale` and `--target-frame-rate` in `mlagents-learn`.
* The Academy no longer has a `Default Reset Parameters` field in the inspector. The Academy class no longer has a `ResetParameters`. To access shared float properties with Python, use the new `FloatProperties` field on the Academy.
* Offline Behavioral Cloning has been removed. To learn from demonstrations, use the GAIL and
Behavioral Cloning features with either PPO or SAC. See [Imitation Learning](Training-Imitation-Learning.md) for more information.

### Steps to Migrate
* If you had a custom `Training Configuration` in the Academy inspector, you will need to pass your custom configuration at every training run using the new command line arguments `--width`, `--height`, `--quality-level`, `--time-scale` and `--target-frame-rate`.
Expand Down
9 changes: 4 additions & 5 deletions docs/Reward-Signals.md
Original file line number Diff line number Diff line change
Expand Up @@ -135,11 +135,10 @@ discriminator is trained to better distinguish between demonstrations and agent
In this way, while the agent gets better and better at mimicing the demonstrations, the
discriminator keeps getting stricter and stricter and the agent must try harder to "fool" it.

This approach, when compared to [Behavioral Cloning](Training-Behavioral-Cloning.md), requires
far fewer demonstrations to be provided. After all, we are still learning a policy that happens
to be similar to the demonstrations, not directly copying the behavior of the demonstrations. It
is especially effective when combined with an Extrinsic signal. However, the GAIL reward signal can
also be used independently to purely learn from demonstrations.
This approach learns a _policy_ that produces states and actions similar to the demonstrations,
requiring fewer demonstrations than direct cloning of the actions. In addition to learning purely
from demonstrations, the GAIL reward signal can be mixed with an extrinsic reward signal to guide
the learning process.

Using GAIL requires recorded demonstrations from your Unity environment. See the
[imitation learning guide](Training-Imitation-Learning.md) to learn more about recording demonstrations.
Expand Down
30 changes: 0 additions & 30 deletions docs/Training-Behavioral-Cloning.md

This file was deleted.

34 changes: 16 additions & 18 deletions docs/Training-Imitation-Learning.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,48 +19,46 @@ imitation learning combined with reinforcement learning can dramatically
reduce the time the agent takes to solve the environment.
For instance, on the [Pyramids environment](Learning-Environment-Examples.md#pyramids),
using 6 episodes of demonstrations can reduce training steps by more than 4 times.
See PreTraining + GAIL + Curiosity + RL below.
See Behavioral Cloning + GAIL + Curiosity + RL below.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that you'll need to change the legend in the linked image.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed


<p align="center">
<img src="images/mlagents-ImitationAndRL.png"
alt="Using Demonstrations with Reinforcement Learning"
width="700" border="0" />
</p>

The ML-Agents toolkit provides several ways to learn from demonstrations.
The ML-Agents toolkit provides two features that enable your agent to learn from demonstrations.
In most scenarios, you should combine these two features

* To train using GAIL (Generative Adversarial Imitation Learning) you can add the
* GAIL (Generative Adversarial Imitation Learning) uses an adversarial approach to
reward your Agent for behaving similar to a set of demonstrations. To use GAIL, you can add the
[GAIL reward signal](Reward-Signals.md#gail-reward-signal). GAIL can be
used with or without environment rewards, and works well when there are a limited
number of demonstrations.
* To help bootstrap reinforcement learning, you can enable
[pretraining](Training-PPO.md#optional-pretraining-using-demonstrations)
on the PPO trainer, in addition to using a small GAIL reward signal.
* To train an agent to exactly mimic demonstrations, you can use the
[Behavioral Cloning](Training-Behavioral-Cloning.md) trainer. Behavioral Cloning can be
used with demonstrations (in-editor), and learns very quickly. However, it usually is ineffective
on more complex environments without a large number of demonstrations.
* Behavioral Cloning (BC) trains the Agent's neural network to exactly mimic the actions
shown in a set of demonstrations.
[The BC feature](Training-PPO.md#optional-behavioral-cloning-using-demonstrations)
can be enabled on the PPO or SAC trainer. BC tends to work best when
there are a lot of demonstrations, or in conjunction with GAIL and/or an extrinsic reward.

### How to Choose

If you want to help your agents learn (especially with environments that have sparse rewards)
using pre-recorded demonstrations, you can generally enable both GAIL and Pretraining.
using pre-recorded demonstrations, you can generally enable both GAIL and Behavioral Cloning
at low strengths in addition to having an extrinsic reward.
An example of this is provided for the Pyramids example environment under
`PyramidsLearning` in `config/gail_config.yaml`.

If you want to train purely from demonstrations, GAIL is generally the preferred approach, especially
if you have few (<10) episodes of demonstrations. An example of this is provided for the Crawler example
environment under `CrawlerStaticLearning` in `config/gail_config.yaml`.

If you have plenty of demonstrations and/or a very simple environment, Offline Behavioral Cloning can be effective and quick. However, it cannot be combined with RL.
If you want to train purely from demonstrations, GAIL and BC _without_ an
extrinsic reward signal is the preferred approach. An example of this is provided for the Crawler
example environment under `CrawlerStaticLearning` in `config/gail_config.yaml`.

## Recording Demonstrations

It is possible to record demonstrations of agent behavior from the Unity Editor,
and save them as assets. These demonstrations contain information on the
observations, actions, and rewards for a given agent during the recording session.
They can be managed from the Editor, as well as used for training with Offline
Behavioral Cloning and GAIL.
They can be managed from the Editor, as well as used for training with BC and GAIL.

In order to record demonstrations from an agent, add the `Demonstration Recorder`
component to a GameObject in the scene which contains an `Agent` component.
Expand Down
Loading