Commit 3d7b809

remove duplicate of curr documentation
1 parent 269a4c8 commit 3d7b809

1 file changed: +0 -79 lines changed

docs/Training-ML-Agents.md

Lines changed: 0 additions & 79 deletions
@@ -514,85 +514,6 @@ the run, you can resume it using `--resume` and lesson progress will start off w
 ended.
 
 
-#### Curriculum
-
-To enable curriculum learning, you need to add a `curriculum` sub-section to your environment
-parameter. Here is one example with the environment parameter `my_environment_parameter` :
-
-```yml
-behaviors:
-  BehaviorY:
-    # < Same as above >
-
-# Add this section
-environment_parameters:
-  my_environment_parameter:
-    curriculum:
-      - name: MyFirstLesson # The '-' is important as this is a list
-        completion_criteria:
-          measure: progress
-          behavior: my_behavior
-          signal_smoothing: true
-          min_lesson_length: 100
-          threshold: 0.2
-        value: 0.0
-      - name: MySecondLesson # This is the start of the second lesson
-        completion_criteria:
-          measure: progress
-          behavior: my_behavior
-          signal_smoothing: true
-          min_lesson_length: 100
-          threshold: 0.6
-          require_reset: true
-        value:
-          sampler_type: uniform
-          sampler_parameters:
-            min_value: 4.0
-            max_value: 7.0
-      - name: MyLastLesson
-        value: 8.0
-```
-
-Note that this curriculum __only__ applies to `my_environment_parameter`. The `curriculum` section
-contains a list of `Lessons`. In the example, the lessons are named `MyFirstLesson`, `MySecondLesson`
-and `MyLastLesson`.
-Each `Lesson` has 3 fields :
-
-- `name` which is a user defined name for the lesson (The name of the lesson will be displayed in
-  the console when the lesson changes)
-- `completion_criteria` which determines what needs to happen in the simulation before the lesson
-  can be considered complete. When that condition is met, the curriculum moves on to the next
-  `Lesson`. Note that you do not need to specify a `completion_criteria` for the last `Lesson`
-- `value` which is the value the environment parameter will take during the lesson. Note that this
-  can be a float or a sampler.
-
-There are the different settings of the `completion_criteria` :
-
-
-| **Setting** | **Description** |
-| :------------------ | :------------------ |
-| `measure` | What to measure learning progress, and advancement in lessons by.<br><br> `reward` uses a measure received reward, while `progress` uses the ratio of steps/max_steps. |
-| `behavior` | Specifies which behavior is being tracked. There can be multiple behaviors with different names, each at different points of training. This setting allows the curriculum to track only one of them. |
-| `threshold` | Determines at what point in value of `measure` the lesson should be increased. |
-| `min_lesson_length` | The minimum number of episodes that should be completed before the lesson can change. If `measure` is set to `reward`, the average cumulative reward of the last `min_lesson_length` episodes will be used to determine if the lesson should change. Must be nonnegative. <br><br> **Important**: the average reward that is compared to the thresholds is different than the mean reward that is logged to the console. For example, if `min_lesson_length` is `100`, the lesson will increment after the average cumulative reward of the last `100` episodes exceeds the current threshold. The mean reward logged to the console is dictated by the `summary_freq` parameter defined above. |
-| `signal_smoothing` | Whether to weight the current progress measure by previous values. |
-| `require_reset` | Whether changing lesson requires the environment to reset (default: false) |
-##### Training with a Curriculum
-
-Once we have specified our metacurriculum and curricula, we can launch
-`mlagents-learn` to point to the config file containing
-our curricula and PPO will train using Curriculum Learning. For example, to
-train agents in the Wall Jump environment with curriculum learning, we can run:
-
-```sh
-mlagents-learn config/ppo/WallJump_curriculum.yaml --run-id=wall-jump-curriculum
-```
-
-We can then keep track of the current lessons and progresses via TensorBoard. If you've terminated
-the run, you can resume it using `--resume` and lesson progress will start off where it
-ended.
-
-
 ### Training Using Concurrent Unity Instances
 
 In order to run concurrent Unity instances during training, set the number of
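
For readers reviewing what the removed section documented, the `completion_criteria` settings can be read as a small decision rule. The following is a minimal Python sketch of that rule as the table describes it, not the ML-Agents implementation. The names `CompletionCriteria`, `should_advance`, and `smoothing_weight` are hypothetical, and the 0.25 weight on the previous signal is an assumption made for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple


@dataclass
class CompletionCriteria:
    """Hypothetical container mirroring the documented settings."""
    measure: str                    # "reward" or "progress"
    threshold: float
    min_lesson_length: int = 0      # minimum number of finished episodes
    signal_smoothing: bool = True


def should_advance(
    criteria: CompletionCriteria,
    episode_rewards: List[float],   # cumulative reward of each finished episode
    progress: float,                # steps / max_steps
    previous_signal: float = 0.0,
    smoothing_weight: float = 0.25,  # assumed weight on the previous signal
) -> Tuple[bool, float]:
    """Return (advance?, signal to carry into the next check)."""
    # The lesson cannot change before enough episodes have completed.
    if len(episode_rewards) < criteria.min_lesson_length:
        return False, previous_signal

    if criteria.measure == "reward":
        if not episode_rewards:
            return False, previous_signal
        # Average over the last `min_lesson_length` episodes (all of them if 0).
        n = criteria.min_lesson_length
        window = episode_rewards[-n:] if n > 0 else episode_rewards
        signal = mean(window)
    elif criteria.measure == "progress":
        signal = progress
    else:
        raise ValueError(f"unknown measure: {criteria.measure!r}")

    # Optionally weight the current measure by the previous value.
    if criteria.signal_smoothing:
        signal = smoothing_weight * previous_signal + (1 - smoothing_weight) * signal

    return signal > criteria.threshold, signal


# Settings taken from `MyFirstLesson` in the removed example.
first = CompletionCriteria(measure="progress", threshold=0.2, min_lesson_length=100)
advance, signal = should_advance(first, [0.0] * 100, progress=0.3)
```

With smoothing enabled, the carried-over signal damps sudden jumps, so a lesson may advance a few checks later than the raw measure alone would suggest.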
