Merged
README.md (2 changes: 1 addition & 1 deletion)
@@ -36,7 +36,7 @@ developer communities.
* Self-play mechanism for training agents in adversarial scenarios
* Train memory-enhanced agents using deep reinforcement learning
* Easily definable Curriculum Learning and Generalization scenarios
* Built-in support for Imitation Learning
* Built-in support for [Imitation Learning](https://github.com/Unity-Technologies/ml-agents/tree/latest_release/docs/Training-Imitation-Learning.md) through Behavioral Cloning or Generative Adversarial Imitation Learning
why is it linking to latest_release? (similarly for PPO/SAC)

Contributor Author

I believe this was discussed at a prior meeting, but it's so users who hit the README.md page from GitHub will access the latest_release docs and not the master docs.

Contributor

Yeah, that was the reason.


The issue is that if we rename/move these files then it will result in broken links for older branches. Not a major issue, but calling it out.

* Flexible agent control with On Demand Decision Making
* Visualizing network outputs within the environment
* Wrap learning environments as a gym
docs/Training-Imitation-Learning.md (37 changes: 27 additions & 10 deletions)
@@ -8,7 +8,7 @@ of training a medic NPC. Instead of indirectly training a medic with the help
of a reward function, we can give the medic real world examples of observations
from the game and actions from a game controller to guide the medic's behavior.
Imitation Learning uses pairs of observations and actions from
a demonstration to learn a policy. [Video Link](https://youtu.be/kpb8ZkMBFYs).
Contributor Author

This video shows online BC, which isn't possible anymore given our current toolkit.

a demonstration to learn a policy.

Imitation learning can also be used to help reinforcement learning. Especially in
environments with sparse (i.e., infrequent or rare) rewards, the agent may never see
@@ -28,7 +28,7 @@ See Behavioral Cloning + GAIL + Curiosity + RL below.
</p>

The ML-Agents toolkit provides two features that enable your agent to learn from demonstrations.
In most scenarios, you should combine these two features
In most scenarios, you can combine these two features.

* GAIL (Generative Adversarial Imitation Learning) uses an adversarial approach to
reward your Agent for behaving similar to a set of demonstrations. To use GAIL, you can add the
@@ -37,11 +37,12 @@ In most scenarios, you should combine these two features
number of demonstrations.
* Behavioral Cloning (BC) trains the Agent's neural network to exactly mimic the actions
shown in a set of demonstrations.
[The BC feature](Training-PPO.md#optional-behavioral-cloning-using-demonstrations)
can be enabled on the PPO or SAC trainer. BC tends to work best when
there are a lot of demonstrations, or in conjunction with GAIL and/or an extrinsic reward.
The BC feature can be enabled on the [PPO](Training-PPO.md#optional-behavioral-cloning-using-demonstrations)
or [SAC](Training-SAC.md#optional-behavioral-cloning-using-demonstrations) trainer. As BC cannot generalize
past the examples shown in the demonstrations, BC tends to work best when there are demonstrations
for nearly all of the states that the agent can experience, or in conjunction with GAIL and/or an extrinsic reward.

### How to Choose
### What to Use

If you want to help your agents learn (especially with environments that have sparse rewards)
using pre-recorded demonstrations, you can generally enable both GAIL and Behavioral Cloning
@@ -55,10 +56,10 @@ example environment under `CrawlerStaticLearning` in `config/gail_config.yaml`.
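
As a rough illustration of enabling both features together, here is a minimal sketch of the relevant block nested under a behavior's entry in the trainer config. The `behavioral_cloning` and `reward_signals.gail` nesting matches the snippets at the end of this page; the `strength` and `steps` keys and all numeric values are illustrative assumptions, not tuned settings (see `config/gail_config.yaml` for tuned examples).

```
behavioral_cloning:
    demo_path: <path_to_your_demo_file>
    strength: 0.5        # assumed: relative weight of the cloning loss
    steps: 150000        # assumed: steps over which the BC influence is annealed
reward_signals:
    extrinsic:
        strength: 1.0    # keep the environment reward alongside imitation
        gamma: 0.99
    gail:
        strength: 0.01   # assumed: small weight so GAIL guides rather than dominates
        demo_path: <path_to_your_demo_file>
```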

## Recording Demonstrations

It is possible to record demonstrations of agent behavior from the Unity Editor,
and save them as assets. These demonstrations contain information on the
Demonstrations of agent behavior can be recorded from the Unity Editor,
and saved as assets. These demonstrations contain information on the
observations, actions, and rewards for a given agent during the recording session.
They can be managed from the Editor, as well as used for training with BC and GAIL.
They can be managed in the Editor, as well as used for training with BC and GAIL.

In order to record demonstrations from an agent, add the `Demonstration Recorder`
component to a GameObject in the scene which contains an `Agent` component.
@@ -75,7 +76,7 @@ When `Record` is checked, a demonstration will be created whenever the scene
is played from the Editor. Depending on the complexity of the task, anywhere
from a few minutes to a few hours of demonstration data may be necessary to
be useful for imitation learning. When you have recorded enough data, end
the Editor play session, and a `.demo` file will be created in the
the Editor play session. A `.demo` file will be created in the
`Assets/Demonstrations` folder (by default). This file contains the demonstrations.
Clicking on the file will provide metadata about the demonstration in the
inspector.
@@ -85,3 +86,19 @@ inspector.
alt="BC Teacher Helper"
width="375" border="10" />
</p>

You can then specify the path to this file as the `demo_path` in your `trainer_config.yaml` file
when using BC or GAIL. For instance, for BC:

```
behavioral_cloning:
    demo_path: <path_to_your_demo_file>
    ...
```
And for GAIL:
```
reward_signals:
    gail:
        demo_path: <path_to_your_demo_file>
        ...
```
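
Beyond `demo_path`, the GAIL reward signal takes the usual reward-signal hyperparameters. A hedged sketch follows; the values here are illustrative assumptions, and tuned examples live in `config/gail_config.yaml`:

```
reward_signals:
    gail:
        strength: 0.01      # weight of the GAIL reward relative to other signals
        gamma: 0.99         # discount factor applied to the GAIL reward
        encoding_size: 128  # assumed size of the discriminator's hidden layer
        demo_path: <path_to_your_demo_file>
```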