
[refactor] Structure configuration files into classes #3936


Merged

merged 59 commits into master from develop-attrs on May 26, 2020

Changes from all commits (59 commits):
f2ce4c7
Use attrs for RunOptions and CLI
Apr 30, 2020
124d777
Add example of strict type conversion
Apr 30, 2020
7b39baa
Recursively apply cattr with being strict
Apr 30, 2020
b5121af
PPO trains
May 1, 2020
3dfe312
Use new settings for BC module
May 2, 2020
cd23b0a
Use correct enum typing
May 2, 2020
0ba816d
SAC now works
May 2, 2020
ad33ab1
Better SAC defaults
May 2, 2020
a8406d9
Reward Signals and GhostTrainer to new settings
May 4, 2020
a826bb4
Conversion script and fix mypy
May 4, 2020
65a0e13
Update curriculum to new settings
May 4, 2020
cf7990d
Fix issue with mypy fix
May 5, 2020
5060638
Enable running without config file
May 5, 2020
69ebbfb
Fix issue with upgrade script
May 5, 2020
9e7c32c
Fix some tests
May 6, 2020
d29b4b7
Fix most of simple_rl tests
May 6, 2020
c9c6613
Fix remaining simple_rl tests
May 6, 2020
32b934d
Remove unneeded methods
May 6, 2020
d0c3bd3
Fix some more tests
May 7, 2020
a2bb9a0
Fix meta curriculum test
May 7, 2020
8885cb0
Fix remaining tests
May 7, 2020
a40fa55
Merge branch 'master' into develop-attrs
May 7, 2020
f5a97c8
Fix update config script
May 7, 2020
85827ce
Revert 3DBall.yaml
May 7, 2020
b3bb269
Convert PPO configs to new format
May 7, 2020
41b11f1
Update SAC configs
May 7, 2020
02b54fc
Remove nulls from configs, update imitation
May 7, 2020
bb88ff2
Fix setup.py
May 7, 2020
df3ed19
Clean up typing, variable names
May 12, 2020
b887170
Remove unneeded cast
May 12, 2020
1e252e8
Move cattr.unstructure to settings.py
May 12, 2020
65d451f
Make Type enums standalone
May 13, 2020
6beda1f
Update upgrade_config script
May 13, 2020
4edef46
Fix issue with default hyperparams
May 13, 2020
52b23f8
Fix simple RL test
May 13, 2020
ce20517
Refactor structure methods into appropriate classes
May 14, 2020
d191262
Fix simple_rl tests
May 14, 2020
e16e20e
Clean up some test files
May 14, 2020
3376551
Fix usage of factories in settings classes
May 14, 2020
f0cd713
Add test and fix default mutables
May 14, 2020
e9740e4
Update training_int_tests
May 14, 2020
b4c587c
Merge branch 'master' into develop-attrs
May 15, 2020
2ebb433
Change docs
May 15, 2020
28507bf
Merge branch 'master' into develop-attrs
May 15, 2020
74d523d
Update with migration
May 15, 2020
93ab9d3
Fix run_experiment
May 15, 2020
68634fb
Fix simple_rl test
May 15, 2020
182b7a5
Update docs with defaults
May 15, 2020
5c39284
Add comment about BC
May 15, 2020
2ff78d7
Add more tests for settings
May 15, 2020
c6234b2
Update changelog
May 15, 2020
32ffc97
Test missing demo_path
May 15, 2020
18142fc
Improve docs and docstrings
May 18, 2020
e3d6b06
Merge branch 'master' into develop-attrs
May 20, 2020
81d1186
Move keep_checkpoints to config rather than CLI
May 20, 2020
04a7860
Remove unused check param keys
May 20, 2020
43e5acd
Remove keep_checkpoints from learn.py
May 20, 2020
f22bae8
Fix last test
May 20, 2020
da4bc73
Fix docs
May 22, 2020
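
The commits above describe the core of the change: trainer configuration is no longer handled as loosely typed nested dictionaries but is declared as classes and populated with strict type conversion. Below is a minimal sketch of that pattern using attrs and cattr, as the commit messages indicate; the class and field names are simplified illustrations, not the exact classes added in this PR.

```python
# Illustrative sketch only: simplified stand-ins for the settings classes this
# PR introduces. Field names mirror the new YAML layout shown in the diffs below.
from typing import Dict

import attr
import cattr
import yaml


@attr.s(auto_attribs=True)
class NetworkSettings:
    normalize: bool = False
    hidden_units: int = 128
    num_layers: int = 2
    vis_encode_type: str = "simple"


@attr.s(auto_attribs=True)
class Hyperparameters:
    batch_size: int = 1024
    buffer_size: int = 10240
    learning_rate: float = 3.0e-4
    learning_rate_schedule: str = "linear"


@attr.s(auto_attribs=True)
class TrainerSettings:
    trainer_type: str = "ppo"
    hyperparameters: Hyperparameters = attr.ib(factory=Hyperparameters)
    network_settings: NetworkSettings = attr.ib(factory=NetworkSettings)
    keep_checkpoints: int = 5
    max_steps: int = 500000
    time_horizon: int = 64
    summary_freq: int = 50000
    threaded: bool = True


def load_behaviors(path: str) -> Dict[str, TrainerSettings]:
    """Parse a trainer config file and structure each behavior into typed settings."""
    with open(path) as f:
        raw = yaml.safe_load(f)
    # cattr recursively builds the nested attrs classes from the parsed dict,
    # raising if a value cannot be converted to the annotated type; keys left
    # out of the YAML fall back to the attrs defaults.
    return {
        name: cattr.structure(cfg, TrainerSettings)
        for name, cfg in raw["behaviors"].items()
    }
```

Structuring the run options this way also makes it straightforward to unstructure back to a dict for writing YAML (the cattr.unstructure move mentioned in the commits) and to fail fast on wrongly typed values instead of discovering them mid-training.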
com.unity.ml-agents/CHANGELOG.md (2 additions, 0 deletions)

@@ -25,6 +25,8 @@ vector observations to be used simultaneously. (#3981) Thank you @shakenes !
 - Curriculum and Parameter Randomization configurations have been merged
   into the main training configuration file. Note that this means training
   configuration files are now environment-specific. (#3791)
+- The format for trainer configuration has changed, and the "default" behavior has been deprecated.
+  See the [Migration Guide](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Migrating.md) for more details. (#3936)
 - Training artifacts (trained models, summaries) are now found in the `results/`
   directory. (#3829)
 - Unity Player logs are now written out to the results directory. (#3877)
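
The per-file diffs below all apply the same mechanical regrouping that the upgrade-script commits automate: old flat per-behavior keys move under hyperparameters and network_settings, while run-level keys stay at the top of the behavior entry. Here is a rough sketch of that grouping, with key sets taken from the diffs; the real upgrade script also handles the recurrent/memory keys and reward-signal sections, which this sketch omits.

```python
# Hedged sketch of the flat-to-nested regrouping; not the actual upgrade script.
HYPERPARAMETER_KEYS = {
    "batch_size", "buffer_size", "learning_rate", "beta", "epsilon",
    "lambd", "num_epoch", "learning_rate_schedule",
}
NETWORK_KEYS = {"normalize", "hidden_units", "num_layers", "vis_encode_type"}


def upgrade_behavior(old: dict) -> dict:
    """Regroup one old-style flat behavior config into the nested layout."""
    old = dict(old)  # work on a copy so the caller's dict is untouched
    new = {"trainer_type": old.pop("trainer", "ppo")}
    new["hyperparameters"] = {k: old.pop(k) for k in list(old) if k in HYPERPARAMETER_KEYS}
    new["network_settings"] = {k: old.pop(k) for k in list(old) if k in NETWORK_KEYS}
    # Remaining keys (max_steps, time_horizon, summary_freq, reward_signals,
    # behavioral_cloning, ...) stay at the top level of the behavior entry.
    new.update(old)
    return new
```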
config/imitation/CrawlerStatic.yaml (27 additions, 19 deletions)

@@ -1,29 +1,37 @@
 behaviors:
   CrawlerStatic:
-    trainer: ppo
-    batch_size: 2024
-    beta: 0.005
-    buffer_size: 20240
-    epsilon: 0.2
-    hidden_units: 512
-    lambd: 0.95
-    learning_rate: 0.0003
-    max_steps: 1e7
-    memory_size: 256
-    normalize: true
-    num_epoch: 3
-    num_layers: 3
-    time_horizon: 1000
-    sequence_length: 64
-    summary_freq: 30000
-    use_recurrent: false
+    trainer_type: ppo
+    hyperparameters:
+      batch_size: 2024
+      buffer_size: 20240
+      learning_rate: 0.0003
+      beta: 0.005
+      epsilon: 0.2
+      lambd: 0.95
+      num_epoch: 3
+      learning_rate_schedule: linear
+    network_settings:
+      normalize: true
+      hidden_units: 512
+      num_layers: 3
+      vis_encode_type: simple
     reward_signals:
       gail:
-        strength: 1.0
         gamma: 0.99
+        strength: 1.0
         encoding_size: 128
+        learning_rate: 0.0003
+        use_actions: false
+        use_vail: false
         demo_path: Project/Assets/ML-Agents/Examples/Crawler/Demos/ExpertCrawlerSta.demo
+    output_path: default
+    keep_checkpoints: 5
+    max_steps: 10000000
+    time_horizon: 1000
+    summary_freq: 30000
+    threaded: true
     behavioral_cloning:
       demo_path: Project/Assets/ML-Agents/Examples/Crawler/Demos/ExpertCrawlerSta.demo
-      strength: 0.5
       steps: 50000
+      strength: 0.5
+      samples_per_update: 0
config/imitation/FoodCollector.yaml (27 additions, 19 deletions)

@@ -1,29 +1,37 @@
 behaviors:
   FoodCollector:
-    trainer: ppo
-    batch_size: 64
-    beta: 0.005
-    buffer_size: 10240
-    epsilon: 0.2
-    hidden_units: 128
-    lambd: 0.95
-    learning_rate: 0.0003
-    max_steps: 2.0e6
-    memory_size: 256
-    normalize: false
-    num_epoch: 3
-    num_layers: 2
-    time_horizon: 64
-    sequence_length: 32
-    summary_freq: 10000
-    use_recurrent: false
+    trainer_type: ppo
+    hyperparameters:
+      batch_size: 64
+      buffer_size: 10240
+      learning_rate: 0.0003
+      beta: 0.005
+      epsilon: 0.2
+      lambd: 0.95
+      num_epoch: 3
+      learning_rate_schedule: linear
+    network_settings:
+      normalize: false
+      hidden_units: 128
+      num_layers: 2
+      vis_encode_type: simple
     reward_signals:
       gail:
-        strength: 0.1
         gamma: 0.99
+        strength: 0.1
         encoding_size: 128
+        learning_rate: 0.0003
+        use_actions: false
+        use_vail: false
         demo_path: Project/Assets/ML-Agents/Examples/FoodCollector/Demos/ExpertFood.demo
+    output_path: default
+    keep_checkpoints: 5
+    max_steps: 2000000
+    time_horizon: 64
+    summary_freq: 10000
+    threaded: true
     behavioral_cloning:
       demo_path: Project/Assets/ML-Agents/Examples/FoodCollector/Demos/ExpertFood.demo
-      strength: 1.0
       steps: 0
+      strength: 1.0
+      samples_per_update: 0
config/imitation/Hallway.yaml (29 additions, 19 deletions)

@@ -1,28 +1,38 @@
 behaviors:
   Hallway:
-    trainer: ppo
-    batch_size: 128
-    beta: 0.01
-    buffer_size: 1024
-    epsilon: 0.2
-    hidden_units: 128
-    lambd: 0.95
-    learning_rate: 0.0003
-    max_steps: 1.0e7
-    memory_size: 256
-    normalize: false
-    num_epoch: 3
-    num_layers: 2
-    time_horizon: 64
-    sequence_length: 64
-    summary_freq: 10000
-    use_recurrent: true
+    trainer_type: ppo
+    hyperparameters:
+      batch_size: 128
+      buffer_size: 1024
+      learning_rate: 0.0003
+      beta: 0.01
+      epsilon: 0.2
+      lambd: 0.95
+      num_epoch: 3
+      learning_rate_schedule: linear
+    network_settings:
+      normalize: false
+      hidden_units: 128
+      num_layers: 2
+      vis_encode_type: simple
+      memory:
+        sequence_length: 64
+        memory_size: 256
     reward_signals:
       extrinsic:
-        strength: 1.0
         gamma: 0.99
+        strength: 1.0
       gail:
-        strength: 0.1
         gamma: 0.99
+        strength: 0.1
         encoding_size: 128
+        learning_rate: 0.0003
+        use_actions: false
+        use_vail: false
         demo_path: Project/Assets/ML-Agents/Examples/Hallway/Demos/ExpertHallway.demo
+    output_path: default
+    keep_checkpoints: 5
+    max_steps: 10000000
+    time_horizon: 64
+    summary_freq: 10000
+    threaded: true
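
Hallway is the recurrent example in this set, so its diff nests sequence_length and memory_size under a new memory block inside network_settings and drops the old use_recurrent flag. Below is a hedged sketch of how such an optional sub-section can be modeled in the class-based settings; the names are illustrative and extend the sketch shown after the commit list.

```python
# Illustrative only: an optional memory sub-section. Leaving the "memory:"
# block out of the YAML keeps the field as None, which plays the role of the
# old use_recurrent: false flag.
from typing import Optional

import attr


@attr.s(auto_attribs=True)
class MemorySettings:
    sequence_length: int = 64
    memory_size: int = 128


@attr.s(auto_attribs=True)
class NetworkSettings:
    normalize: bool = False
    hidden_units: int = 128
    num_layers: int = 2
    vis_encode_type: str = "simple"
    memory: Optional[MemorySettings] = None
```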
config/imitation/PushBlock.yaml (25 additions, 18 deletions)

@@ -1,25 +1,32 @@
 behaviors:
   PushBlock:
-    trainer: ppo
-    batch_size: 128
-    beta: 0.01
-    buffer_size: 2048
-    epsilon: 0.2
-    hidden_units: 256
-    lambd: 0.95
-    learning_rate: 0.0003
-    max_steps: 1.5e7
-    memory_size: 256
-    normalize: false
-    num_epoch: 3
-    num_layers: 2
-    time_horizon: 64
-    sequence_length: 64
-    summary_freq: 60000
-    use_recurrent: false
+    trainer_type: ppo
+    hyperparameters:
+      batch_size: 128
+      buffer_size: 2048
+      learning_rate: 0.0003
+      beta: 0.01
+      epsilon: 0.2
+      lambd: 0.95
+      num_epoch: 3
+      learning_rate_schedule: linear
+    network_settings:
+      normalize: false
+      hidden_units: 256
+      num_layers: 2
+      vis_encode_type: simple
     reward_signals:
       gail:
-        strength: 1.0
         gamma: 0.99
+        strength: 1.0
         encoding_size: 128
+        learning_rate: 0.0003
+        use_actions: false
+        use_vail: false
         demo_path: Project/Assets/ML-Agents/Examples/PushBlock/Demos/ExpertPush.demo
+    output_path: default
+    keep_checkpoints: 5
+    max_steps: 15000000
+    time_horizon: 64
+    summary_freq: 60000
+    threaded: true
config/imitation/Pyramids.yaml (14 additions, 16 deletions)

@@ -1,22 +1,20 @@
 behaviors:
   Pyramids:
-    trainer: ppo
-    batch_size: 128
-    beta: 0.01
-    buffer_size: 2048
-    epsilon: 0.2
-    hidden_units: 512
-    lambd: 0.95
-    learning_rate: 0.0003
-    max_steps: 1.0e7
-    memory_size: 256
-    normalize: false
-    num_epoch: 3
-    num_layers: 2
+    trainer_type: ppo
     time_horizon: 128
-    sequence_length: 64
-    summary_freq: 30000
-    use_recurrent: false
+    max_steps: 1.0e7
+    hyperparameters:
+      batch_size: 128
+      beta: 0.01
+      buffer_size: 2048
+      epsilon: 0.2
+      lambd: 0.95
+      learning_rate: 0.0003
+      num_epoch: 3
+    network_settings:
+      num_layers: 2
+      normalize: false
+      hidden_units: 512
     reward_signals:
       extrinsic:
         strength: 1.0
config/ppo/3DBall.yaml (22 additions, 20 deletions)

@@ -1,25 +1,27 @@
 behaviors:
   3DBall:
-    trainer: ppo
-    batch_size: 64
-    beta: 0.001
-    buffer_size: 12000
-    epsilon: 0.2
-    hidden_units: 128
-    lambd: 0.99
-    learning_rate: 0.0003
-    learning_rate_schedule: linear
-    max_steps: 5.0e5
-    memory_size: 128
-    normalize: true
-    num_epoch: 3
-    num_layers: 2
-    time_horizon: 1000
-    sequence_length: 64
-    summary_freq: 12000
-    use_recurrent: false
-    vis_encode_type: simple
+    trainer_type: ppo
+    hyperparameters:
+      batch_size: 64
+      buffer_size: 12000
+      learning_rate: 0.0003
+      beta: 0.001
+      epsilon: 0.2
+      lambd: 0.99
+      num_epoch: 3
+      learning_rate_schedule: linear
+    network_settings:
+      normalize: true
+      hidden_units: 128
+      num_layers: 2
+      vis_encode_type: simple
     reward_signals:
       extrinsic:
-        strength: 1.0
         gamma: 0.99
+        strength: 1.0
+    output_path: default
+    keep_checkpoints: 5
+    max_steps: 500000
+    time_horizon: 1000
+    summary_freq: 12000
+    threaded: true
config/ppo/3DBallHard.yaml (22 additions, 20 deletions)

@@ -1,25 +1,27 @@
 behaviors:
   3DBallHard:
-    trainer: ppo
-    batch_size: 1200
-    beta: 0.001
-    buffer_size: 12000
-    epsilon: 0.2
-    hidden_units: 128
-    lambd: 0.95
-    learning_rate: 0.0003
-    learning_rate_schedule: linear
-    max_steps: 5.0e6
-    memory_size: 128
-    normalize: true
-    num_epoch: 3
-    num_layers: 2
-    time_horizon: 1000
-    sequence_length: 64
-    summary_freq: 12000
-    use_recurrent: false
-    vis_encode_type: simple
+    trainer_type: ppo
+    hyperparameters:
+      batch_size: 1200
+      buffer_size: 12000
+      learning_rate: 0.0003
+      beta: 0.001
+      epsilon: 0.2
+      lambd: 0.95
+      num_epoch: 3
+      learning_rate_schedule: linear
+    network_settings:
+      normalize: true
+      hidden_units: 128
+      num_layers: 2
+      vis_encode_type: simple
     reward_signals:
       extrinsic:
-        strength: 1.0
         gamma: 0.995
+        strength: 1.0
+    output_path: default
+    keep_checkpoints: 5
+    max_steps: 5000000
+    time_horizon: 1000
+    summary_freq: 12000
+    threaded: true
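
A side effect of strict typing is visible throughout these diffs: max_steps values written in scientific notation (for example 5.0e5) become plain integers (500000), since a typed integer field no longer accepts whatever the YAML loader happens to hand it. Below is a small illustration of a strict conversion hook with cattr; the hook itself is an assumption for illustration, not code from this PR.

```python
# Hedged example: a strict int conversion hook registered on cattr's global
# converter. Plain ints pass through, whole floats are accepted, anything else
# (including strings such as "5.0e5") is rejected.
import cattr


def strict_int(value, _type=int) -> int:
    if isinstance(value, bool):
        raise ValueError(f"Expected an integer, got a bool: {value!r}")
    if isinstance(value, int):
        return value
    if isinstance(value, float) and value.is_integer():
        return int(value)
    raise ValueError(f"Expected an integer, got {value!r}")


cattr.register_structure_hook(int, strict_int)

print(cattr.structure(5.0e5, int))   # 500000
print(cattr.structure(500000, int))  # 500000
# cattr.structure("5.0e5", int) would now raise ValueError
```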