[docs] Link to Imitation Learning docs in Readme, cleanup #3582
Merged
Changes from all commits (7 commits):
bfbdbbc  Link imitation learning docs in readme
ecc281e  Remove online IL video (not possible)
8795991  Clean up links in IL docs
8d26a3b  More cleanup of IL docs
9a6bcdc  More doc tweaks
0621260  Fix link in README
9333ed1  Add paths to IL page
```diff
@@ -8,7 +8,7 @@ of training a medic NPC. Instead of indirectly training a medic with the help
 of a reward function, we can give the medic real world examples of observations
 from the game and actions from a game controller to guide the medic's behavior.
 Imitation Learning uses pairs of observations and actions from
-a demonstration to learn a policy. [Video Link](https://youtu.be/kpb8ZkMBFYs).
+a demonstration to learn a policy.
 
 Imitation learning can also be used to help reinforcement learning. Especially in
 environments with sparse (i.e., infrequent or rare) rewards, the agent may never see
```

Review comment on the removed video link: "This video shows online BC, which isn't possible anymore given our current toolkit."
```diff
@@ -28,7 +28,7 @@ See Behavioral Cloning + GAIL + Curiosity + RL below.
 </p>
 
 The ML-Agents toolkit provides two features that enable your agent to learn from demonstrations.
-In most scenarios, you should combine these two features
+In most scenarios, you can combine these two features.
 
 * GAIL (Generative Adversarial Imitation Learning) uses an adversarial approach to
 reward your Agent for behaving similar to a set of demonstrations. To use GAIL, you can add the
```
```diff
@@ -37,11 +37,12 @@ In most scenarios, you should combine these two features
 number of demonstrations.
 * Behavioral Cloning (BC) trains the Agent's neural network to exactly mimic the actions
 shown in a set of demonstrations.
-[The BC feature](Training-PPO.md#optional-behavioral-cloning-using-demonstrations)
-can be enabled on the PPO or SAC trainer. BC tends to work best when
-there are a lot of demonstrations, or in conjunction with GAIL and/or an extrinsic reward.
+The BC feature can be enabled on the [PPO](Training-PPO.md#optional-behavioral-cloning-using-demonstrations)
+or [SAC](Training-SAC.md#optional-behavioral-cloning-using-demonstrations) trainer. As BC cannot generalize
+past the examples shown in the demonstrations, BC tends to work best when there exists demonstrations
+for nearly all of the states that the agent can experience, or in conjunction with GAIL and/or an extrinsic reward.
 
-### How to Choose
+### What to Use
 
 If you want to help your agents learn (especially with environments that have sparse rewards)
 using pre-recorded demonstrations, you can generally enable both GAIL and Behavioral Cloning
```
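For context beyond this diff: per the `config/gail_config.yaml` example the docs point to, enabling the GAIL reward signal mentioned in the bullet above looks roughly like the sketch below. The `strength` and `gamma` values here are illustrative assumptions, not recommended settings.

```
reward_signals:
    gail:
        strength: 0.01          # assumed value; weights the GAIL reward against other signals
        gamma: 0.99             # assumed discount factor for the GAIL reward
        demo_path: <path_to_your_demo_file>
```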
```diff
@@ -55,10 +56,10 @@ example environment under `CrawlerStaticLearning` in `config/gail_config.yaml`.
 
 ## Recording Demonstrations
 
-It is possible to record demonstrations of agent behavior from the Unity Editor,
-and save them as assets. These demonstrations contain information on the
+Demonstrations of agent behavior can be recorded from the Unity Editor,
+and saved as assets. These demonstrations contain information on the
 observations, actions, and rewards for a given agent during the recording session.
-They can be managed from the Editor, as well as used for training with BC and GAIL.
+They can be managed in the Editor, as well as used for training with BC and GAIL.
 
 In order to record demonstrations from an agent, add the `Demonstration Recorder`
 component to a GameObject in the scene which contains an `Agent` component.
```
```diff
@@ -75,7 +76,7 @@ When `Record` is checked, a demonstration will be created whenever the scene
 is played from the Editor. Depending on the complexity of the task, anywhere
 from a few minutes or a few hours of demonstration data may be necessary to
 be useful for imitation learning. When you have recorded enough data, end
-the Editor play session, and a `.demo` file will be created in the
+the Editor play session. A `.demo` file will be created in the
 `Assets/Demonstrations` folder (by default). This file contains the demonstrations.
 Clicking on the file will provide metadata about the demonstration in the
 inspector.
```
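A note on the surrounding workflow: once the `.demo` file exists and is referenced by `demo_path`, training in toolkit releases of this vintage was started from the command line with something like `mlagents-learn config/gail_config.yaml --run-id=<run-identifier> --train`. The run id is a placeholder, and the exact flags may differ across versions.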
````diff
@@ -85,3 +86,19 @@ inspector.
 alt="BC Teacher Helper"
 width="375" border="10" />
 </p>
+
+You can then specify the path to this file as the `demo_path` in your `trainer_config.yaml` file
+when using BC or GAIL. For instance, for BC:
+
+```
+behavioral_cloning:
+    demo_path: <path_to_your_demo_file>
+    ...
+```
+And for GAIL:
+```
+reward_signals:
+    gail:
+        demo_path: <path_to_your_demo_file>
+        ...
+```
````
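Since the updated text says these two features can be combined in most scenarios, a minimal sketch of the two snippets above merged into a single trainer block might look like the following. The `CrawlerStaticLearning` behavior name comes from the example config the docs mention; the `strength` and `gamma` values are assumptions to illustrate the shape of the config, not tuned settings.

```
CrawlerStaticLearning:
    trainer: ppo
    ...
    behavioral_cloning:
        demo_path: <path_to_your_demo_file>
        strength: 0.5           # assumed weight of the BC loss
    reward_signals:
        extrinsic:
            strength: 1.0
            gamma: 0.99
        gail:
            strength: 0.01      # assumed weight of the GAIL reward
            gamma: 0.99
            demo_path: <path_to_your_demo_file>
```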
Review discussion on the README links:

"why is it linking to `latest_release`? (similarly for PPO/SAC)"

"I believe this was discussed at a prior meeting, but it's so users who hit the Readme.md page from GitHub will access the `latest_release` docs and not the `master` docs."

"Yeah, that was the reason."

"The issue is that if we rename/move these files then it will result in broken links for older branches. Not a major issue, but calling it out."