Commit 84484b3

Author: Ervin T
[docs] Link to Imitation Learning docs in Readme, cleanup IL docs (#3582)
1 parent 6dbba73 commit 84484b3

File tree

2 files changed: +28, -11 lines

README.md

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ developer communities.
 * Self-play mechanism for training agents in adversarial scenarios
 * Train memory-enhanced agents using deep reinforcement learning
 * Easily definable Curriculum Learning and Generalization scenarios
-* Built-in support for Imitation Learning
+* Built-in support for [Imitation Learning](https://github.com/Unity-Technologies/ml-agents/tree/latest_release/docs/Training-Imitation-Learning.md) through Behavioral Cloning or Generative Adversarial Imitation Learning
 * Flexible agent control with On Demand Decision Making
 * Visualizing network outputs within the environment
 * Wrap learning environments as a gym

docs/Training-Imitation-Learning.md

Lines changed: 27 additions & 10 deletions
@@ -8,7 +8,7 @@ of training a medic NPC. Instead of indirectly training a medic with the help
 of a reward function, we can give the medic real world examples of observations
 from the game and actions from a game controller to guide the medic's behavior.
 Imitation Learning uses pairs of observations and actions from
-a demonstration to learn a policy. [Video Link](https://youtu.be/kpb8ZkMBFYs).
+a demonstration to learn a policy.
 
 Imitation learning can also be used to help reinforcement learning. Especially in
 environments with sparse (i.e., infrequent or rare) rewards, the agent may never see
@@ -28,7 +28,7 @@ See Behavioral Cloning + GAIL + Curiosity + RL below.
 </p>
 
 The ML-Agents toolkit provides two features that enable your agent to learn from demonstrations.
-In most scenarios, you should combine these two features
+In most scenarios, you can combine these two features.
 
 * GAIL (Generative Adversarial Imitation Learning) uses an adversarial approach to
   reward your Agent for behaving similar to a set of demonstrations. To use GAIL, you can add the
@@ -37,11 +37,12 @@ In most scenarios, you should combine these two features
   number of demonstrations.
 * Behavioral Cloning (BC) trains the Agent's neural network to exactly mimic the actions
   shown in a set of demonstrations.
-  [The BC feature](Training-PPO.md#optional-behavioral-cloning-using-demonstrations)
-  can be enabled on the PPO or SAC trainer. BC tends to work best when
-  there are a lot of demonstrations, or in conjunction with GAIL and/or an extrinsic reward.
+  The BC feature can be enabled on the [PPO](Training-PPO.md#optional-behavioral-cloning-using-demonstrations)
+  or [SAC](Training-SAC.md#optional-behavioral-cloning-using-demonstrations) trainer. As BC cannot generalize
+  past the examples shown in the demonstrations, BC tends to work best when there exist demonstrations
+  for nearly all of the states that the agent can experience, or in conjunction with GAIL and/or an extrinsic reward.
 
-### How to Choose
+### What to Use
 
 If you want to help your agents learn (especially with environments that have sparse rewards)
 using pre-recorded demonstrations, you can generally enable both GAIL and Behavioral Cloning
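
Taken together, the hunk above implies a trainer configuration along these lines. This is a minimal sketch only, assuming the config schema of this release; the demo file name and the `strength`/`steps` values are invented for illustration, not recommendations:

```
# Hypothetical trainer config excerpt: BC and GAIL enabled together
# at low strengths, alongside the usual extrinsic reward.
behavioral_cloning:
    demo_path: ./Demonstrations/ExpertMedic.demo  # illustrative path to a recorded .demo file
    strength: 0.5       # weight of the cloning loss relative to the RL loss
    steps: 150000       # anneal the cloning influence over this many steps
reward_signals:
    extrinsic:
        strength: 1.0
        gamma: 0.99
    gail:
        strength: 0.01  # kept low when an extrinsic reward is also present
        gamma: 0.99
        demo_path: ./Demonstrations/ExpertMedic.demo
```

In this combination GAIL shapes the reward signal while BC directly nudges the policy toward the demonstrated actions, matching the pairing the edited text recommends.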
@@ -55,10 +56,10 @@ example environment under `CrawlerStaticLearning` in `config/gail_config.yaml`.
 
 ## Recording Demonstrations
 
-It is possible to record demonstrations of agent behavior from the Unity Editor,
-and save them as assets. These demonstrations contain information on the
+Demonstrations of agent behavior can be recorded from the Unity Editor,
+and saved as assets. These demonstrations contain information on the
 observations, actions, and rewards for a given agent during the recording session.
-They can be managed from the Editor, as well as used for training with BC and GAIL.
+They can be managed in the Editor, as well as used for training with BC and GAIL.
 
 In order to record demonstrations from an agent, add the `Demonstration Recorder`
 component to a GameObject in the scene which contains an `Agent` component.
@@ -75,7 +76,7 @@ When `Record` is checked, a demonstration will be created whenever the scene
 is played from the Editor. Depending on the complexity of the task, anywhere
 from a few minutes to a few hours of demonstration data may be necessary to
 be useful for imitation learning. When you have recorded enough data, end
-the Editor play session, and a `.demo` file will be created in the
+the Editor play session. A `.demo` file will be created in the
 `Assets/Demonstrations` folder (by default). This file contains the demonstrations.
 Clicking on the file will provide metadata about the demonstration in the
 inspector.
@@ -85,3 +86,19 @@ inspector.
   alt="BC Teacher Helper"
   width="375" border="10" />
 </p>
+
+You can then specify the path to this file as the `demo_path` in your `trainer_config.yaml` file
+when using BC or GAIL. For instance, for BC:
+
+```
+    behavioral_cloning:
+        demo_path: <path_to_your_demo_file>
+        ...
+```
+And for GAIL:
+```
+    reward_signals:
+        gail:
+            demo_path: <path_to_your_demo_file>
+            ...
+```
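
For orientation, these blocks sit under the behavior's entry in the config file. Assuming the `CrawlerStaticLearning` example referenced earlier in this doc, the assembled result might look like the sketch below; other trainer settings are omitted, and the demo path is invented for illustration:

```
CrawlerStaticLearning:
    trainer: ppo            # abbreviated; the shipped config carries many more settings
    behavioral_cloning:
        demo_path: ./demos/ExpertCrawler.demo   # invented path; point this at your .demo file
        strength: 0.5
    reward_signals:
        gail:
            strength: 1.0
            demo_path: ./demos/ExpertCrawler.demo
```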
