Unity-Technologies · ervteng · Dec 12, 2019 · Nov 25, 2019 · Nov 25, 2019 · Nov 26, 2019
diff --git a/config/gail_config.yaml b/config/gail_config.yaml
@@ -31,7 +31,7 @@ Pyramids:
     beta: 1.0e-2
     max_steps: 5.0e5
     num_epoch: 3
-    pretraining:
+    behavioral_cloning:
         demo_path: ./demos/ExpertPyramid.demo
         strength: 0.5
         steps: 10000
@@ -59,6 +59,10 @@ CrawlerStatic:
     summary_freq: 3000
     num_layers: 3
     hidden_units: 512
+    behavioral_cloning:
+        demo_path: ./demos/ExpertCrawlerSta.demo
+        strength: 0.5
+        steps: 5000
     reward_signals:
         gail:
             strength: 1.0
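
Note on the rename in the first hunk: configs outside this repository that still use the `pretraining` key would presumably need the same one-key change. Below is a minimal sketch of that rename, assuming the trainer config has already been loaded into a plain Python dict keyed by section name (such as `Pyramids` above). `rename_pretraining` is a hypothetical helper written for illustration and is not part of ML-Agents.

```python
from copy import deepcopy

OLD_KEY = "pretraining"          # key used before this change
NEW_KEY = "behavioral_cloning"   # key used after this change


def rename_pretraining(config):
    """Return a copy of `config` with `pretraining` renamed in every section.

    Hypothetical helper: `config` is assumed to map section names to dicts
    of hyperparameters. The input is left unmodified.
    """
    migrated = deepcopy(config)
    for name, section in migrated.items():
        if not isinstance(section, dict) or OLD_KEY not in section:
            continue
        if NEW_KEY in section:
            raise ValueError(f"{name}: both {OLD_KEY!r} and {NEW_KEY!r} are set")
        # Rebuild the section so the renamed key keeps its original position.
        migrated[name] = {
            (NEW_KEY if key == OLD_KEY else key): value
            for key, value in section.items()
        }
    return migrated


# Values copied from the Pyramids hunk above.
old_config = {
    "Pyramids": {
        "num_epoch": 3,
        "pretraining": {
            "demo_path": "./demos/ExpertPyramid.demo",
            "strength": 0.5,
            "steps": 10000,
        },
    }
}
new_config = rename_pretraining(old_config)
```

The helper copies the config instead of editing it in place so the original can be kept for comparison, and it raises an error if a section already defines both keys so that neither block is silently overwritten.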

diff --git a/docs/Migrating.md b/docs/Migrating.md
@@ -16,6 +16,8 @@ The versions can be found in
 * `reset()` on the Low-Level Python API no longer takes a `config` argument. `UnityEnvironment` no longer has a `reset_parameters` field. To modify float properties in the environment, you must use a `FloatPropertiesChannel`. For more information, refer to the [Low Level Python API documentation](Python-API.md)
 * The Academy no longer has a `Training Configuration` nor `Inference Configuration` field in the inspector. To modify the configuration from the Low-Level Python API, use an `EngineConfigurationChannel`. To modify it during training, use the new command line arguments `--width`, `--height`, `--quality-level`, `--time-scale` and `--target-frame-rate` in `mlagents-learn`.
 * The Academy no longer has a `Default Reset Parameters` field in the inspector. The Academy class no longer has a `ResetParameters`. To access shared float properties with Python, use the new `FloatProperties` field on the Academy.
+* Offline Behavioral Cloning has been removed. To learn from demonstrations, use the GAIL and
+Behavioral Cloning features with either PPO or SAC. See [Imitation Learning](Training-Imitation-Learning.md) for more information.
 
 ### Steps to Migrate
  * If you had a custom `Training Configuration` in the Academy inspector, you will need to pass your custom configuration at every training run using the new command line arguments `--width`, `--height`, `--quality-level`, `--time-scale` and `--target-frame-rate`.

diff --git a/docs/Reward-Signals.md b/docs/Reward-Signals.md
@@ -135,11 +135,10 @@ discriminator is trained to better distinguish between demonstrations and agent
 In this way, while the agent gets better and better at mimicing the demonstrations, the
 discriminator keeps getting stricter and stricter and the agent must try harder to "fool" it.
 
-This approach, when compared to [Behavioral Cloning](Training-Behavioral-Cloning.md), requires
-far fewer demonstrations to be provided. After all, we are still learning a policy that happens
-to be similar to the demonstrations, not directly copying the behavior of the demonstrations. It
-is especially effective when combined with an Extrinsic signal. However, the GAIL reward signal can
-also be used independently to purely learn from demonstrations.
+This approach learns a _policy_ that produces states and actions similar to the demonstrations,
+requiring fewer demonstrations than direct cloning of the actions. In addition to learning purely
+from demonstrations, the GAIL reward signal can be mixed with an extrinsic reward signal to guide
+the learning process.
 
 Using GAIL requires recorded demonstrations from your Unity environment. See the
 [imitation learning guide](Training-Imitation-Learning.md) to learn more about recording demonstrations.

diff --git a/docs/Training-Behavioral-Cloning.md b/docs/Training-Behavioral-Cloning.md
diff --git a/docs/Training-Imitation-Learning.md b/docs/Training-Imitation-Learning.md
@@ -19,48 +19,46 @@ imitation learning combined with reinforcement learning can dramatically
 reduce the time the agent takes to solve the environment.
 For instance, on the [Pyramids environment](Learning-Environment-Examples.md#pyramids),
 using 6 episodes of demonstrations can reduce training steps by more than 4 times.
-See PreTraining + GAIL + Curiosity + RL below.
+See Behavioral Cloning + GAIL + Curiosity + RL below.
 
 <p align="center">
   <img src="images/mlagents-ImitationAndRL.png"
        alt="Using Demonstrations with Reinforcement Learning"
        width="700" border="0" />
 </p>
 
-The ML-Agents toolkit provides several ways to learn from demonstrations.
+The ML-Agents toolkit provides two features that enable your agent to learn from demonstrations.
+In most scenarios, you should combine these two features.
 
-* To train using GAIL (Generative Adversarial Imitation Learning) you can add the
+* GAIL (Generative Adversarial Imitation Learning) uses an adversarial approach to
+  reward your Agent for behaving similarly to a set of demonstrations. To use GAIL, you can add the
   [GAIL reward signal](Reward-Signals.md#gail-reward-signal). GAIL can be
   used with or without environment rewards, and works well when there are a limited
   number of demonstrations.
-* To help bootstrap reinforcement learning, you can enable
-  [pretraining](Training-PPO.md#optional-pretraining-using-demonstrations)
-  on the PPO trainer, in addition to using a small GAIL reward signal.
-* To train an agent to exactly mimic demonstrations, you can use the
-  [Behavioral Cloning](Training-Behavioral-Cloning.md) trainer. Behavioral Cloning can be
-  used with demonstrations (in-editor), and learns very quickly. However, it usually is ineffective
-  on more complex environments without a large number of demonstrations.
+* Behavioral Cloning (BC) trains the Agent's neural network to exactly mimic the actions
+  shown in a set of demonstrations.
+  [The BC feature](Training-PPO.md#optional-behavioral-cloning-using-demonstrations)
+  can be enabled on the PPO or SAC trainer. BC tends to work best when
+  there are a lot of demonstrations, or in conjunction with GAIL and/or an extrinsic reward.
 
 ### How to Choose
 
 If you want to help your agents learn (especially with environments that have sparse rewards)
-using pre-recorded demonstrations, you can generally enable both GAIL and Pretraining.
+using pre-recorded demonstrations, you can generally enable both GAIL and Behavioral Cloning
+at low strengths in addition to having an extrinsic reward.
 An example of this is provided for the Pyramids example environment under
  `PyramidsLearning` in `config/gail_config.yaml`.
 
-If you want to train purely from demonstrations, GAIL is generally the preferred approach, especially
-if you have few (<10) episodes of demonstrations. An example of this is provided for the Crawler example
-environment under `CrawlerStaticLearning` in `config/gail_config.yaml`.
-
-If you have plenty of demonstrations and/or a very simple environment, Offline Behavioral Cloning can be effective and quick. However, it cannot be combined with RL.
+If you want to train purely from demonstrations, using GAIL and BC _without_ an
+extrinsic reward signal is the preferred approach. An example of this is provided for the Crawler
+example environment under `CrawlerStaticLearning` in `config/gail_config.yaml`.
 
 ## Recording Demonstrations
 
 It is possible to record demonstrations of agent behavior from the Unity Editor,
 and save them as assets. These demonstrations contain information on the
 observations, actions, and rewards for a given agent during the recording session.
-They can be managed from the Editor, as well as used for training with Offline
-Behavioral Cloning and GAIL.
+They can be managed from the Editor, as well as used for training with BC and GAIL.
 
 In order to record demonstrations from an agent, add the `Demonstration Recorder`
 component to a GameObject in the scene which contains an `Agent` component.