Commit 84484b3

Author: Ervin T
[docs] Link to Imitation Learning docs in Readme, cleanup IL docs (#3582)
1 parent 6dbba73 commit 84484b3

File tree

2 files changed: +28, -11 lines

README.md

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ developer communities.
 * Self-play mechanism for training agents in adversarial scenarios
 * Train memory-enhanced agents using deep reinforcement learning
 * Easily definable Curriculum Learning and Generalization scenarios
-* Built-in support for Imitation Learning
+* Built-in support for [Imitation Learning](https://github.com/Unity-Technologies/ml-agents/tree/latest_release/docs/Training-Imitation-Learning.md) through Behavioral Cloning or Generative Adversarial Imitation Learning
 * Flexible agent control with On Demand Decision Making
 * Visualizing network outputs within the environment
 * Wrap learning environments as a gym

docs/Training-Imitation-Learning.md

Lines changed: 27 additions & 10 deletions
@@ -8,7 +8,7 @@ of training a medic NPC. Instead of indirectly training a medic with the help
 of a reward function, we can give the medic real world examples of observations
 from the game and actions from a game controller to guide the medic's behavior.
 Imitation Learning uses pairs of observations and actions from
-a demonstration to learn a policy. [Video Link](https://youtu.be/kpb8ZkMBFYs).
+a demonstration to learn a policy.
 
 Imitation learning can also be used to help reinforcement learning. Especially in
 environments with sparse (i.e., infrequent or rare) rewards, the agent may never see
@@ -28,7 +28,7 @@ See Behavioral Cloning + GAIL + Curiosity + RL below.
 </p>
 
 The ML-Agents toolkit provides two features that enable your agent to learn from demonstrations.
-In most scenarios, you should combine these two features
+In most scenarios, you can combine these two features.
 
 * GAIL (Generative Adversarial Imitation Learning) uses an adversarial approach to
   reward your Agent for behaving similar to a set of demonstrations. To use GAIL, you can add the
@@ -37,11 +37,12 @@ In most scenarios, you should combine these two features
   number of demonstrations.
 * Behavioral Cloning (BC) trains the Agent's neural network to exactly mimic the actions
   shown in a set of demonstrations.
-  [The BC feature](Training-PPO.md#optional-behavioral-cloning-using-demonstrations)
-  can be enabled on the PPO or SAC trainer. BC tends to work best when
-  there are a lot of demonstrations, or in conjunction with GAIL and/or an extrinsic reward.
+  The BC feature can be enabled on the [PPO](Training-PPO.md#optional-behavioral-cloning-using-demonstrations)
+  or [SAC](Training-SAC.md#optional-behavioral-cloning-using-demonstrations) trainer. As BC cannot generalize
+  past the examples shown in the demonstrations, BC tends to work best when there exist demonstrations
+  for nearly all of the states that the agent can experience, or in conjunction with GAIL and/or an extrinsic reward.
 
-### How to Choose
+### What to Use
 
 If you want to help your agents learn (especially with environments that have sparse rewards)
 using pre-recorded demonstrations, you can generally enable both GAIL and Behavioral Cloning
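
Taken together, the hunk above implies a trainer configuration along these lines. This is a minimal sketch only, assuming the config schema of this release; the demo file name and the `strength`/`steps` values are invented for illustration, not recommendations:

```
# Hypothetical trainer config excerpt: BC and GAIL enabled together
# at low strengths, alongside the usual extrinsic reward.
behavioral_cloning:
    demo_path: ./Demonstrations/ExpertMedic.demo  # illustrative path to a recorded .demo file
    strength: 0.5       # weight of the cloning loss relative to the RL loss
    steps: 150000       # anneal the cloning influence over this many steps
reward_signals:
    extrinsic:
        strength: 1.0
        gamma: 0.99
    gail:
        strength: 0.01  # kept low when an extrinsic reward is also present
        gamma: 0.99
        demo_path: ./Demonstrations/ExpertMedic.demo
```

In this combination GAIL shapes the reward signal while BC directly nudges the policy toward the demonstrated actions, matching the pairing the edited text recommends.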
@@ -55,10 +56,10 @@ example environment under `CrawlerStaticLearning` in `config/gail_config.yaml`.
 
 ## Recording Demonstrations
 
-It is possible to record demonstrations of agent behavior from the Unity Editor,
-and save them as assets. These demonstrations contain information on the
+Demonstrations of agent behavior can be recorded from the Unity Editor,
+and saved as assets. These demonstrations contain information on the
 observations, actions, and rewards for a given agent during the recording session.
-They can be managed from the Editor, as well as used for training with BC and GAIL.
+They can be managed in the Editor, as well as used for training with BC and GAIL.
 
 In order to record demonstrations from an agent, add the `Demonstration Recorder`
 component to a GameObject in the scene which contains an `Agent` component.
@@ -75,7 +76,7 @@ When `Record` is checked, a demonstration will be created whenever the scene
 is played from the Editor. Depending on the complexity of the task, anywhere
 from a few minutes to a few hours of demonstration data may be necessary to
 be useful for imitation learning. When you have recorded enough data, end
-the Editor play session, and a `.demo` file will be created in the
+the Editor play session. A `.demo` file will be created in the
 `Assets/Demonstrations` folder (by default). This file contains the demonstrations.
 Clicking on the file will provide metadata about the demonstration in the
 inspector.
@@ -85,3 +86,19 @@ inspector.
   alt="BC Teacher Helper"
   width="375" border="10" />
 </p>
+
+You can then specify the path to this file as the `demo_path` in your `trainer_config.yaml` file
+when using BC or GAIL. For instance, for BC:
+
+```
+    behavioral_cloning:
+        demo_path: <path_to_your_demo_file>
+        ...
+```
+And for GAIL:
+```
+    reward_signals:
+        gail:
+            demo_path: <path_to_your_demo_file>
+            ...
+```
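
For orientation, these blocks sit under the behavior's entry in the config file. Assuming the `CrawlerStaticLearning` example referenced earlier in this doc, the assembled result might look like the sketch below; other trainer settings are omitted, and the demo path is invented for illustration:

```
CrawlerStaticLearning:
    trainer: ppo            # abbreviated; the shipped config carries many more settings
    behavioral_cloning:
        demo_path: ./demos/ExpertCrawler.demo   # invented path; point this at your .demo file
        strength: 0.5
    reward_signals:
        gail:
            strength: 1.0
            demo_path: ./demos/ExpertCrawler.demo
```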
