Description
Describe the bug
I am using mlagents 0.18.0. After setting everything up and starting training with the provided example environments, I keep getting a KeyError from the trainer and an AttributeError from TensorFlow. The same setup worked well on the same desktop about 2 days ago, but it does not work on my current device.
To Reproduce
Steps to reproduce the behavior:
- I am using the Unity editor to train the 3DBall example environment. The shell command is:
mlagents-learn ppo/3DBall.yaml --run-id=3DBall_test
It is started from a Python virtual environment.
- The config file I used is below; it is the one included in the git repo:
behaviors:
  3DBall:
    trainer_type: ppo
    hyperparameters:
      batch_size: 64
      buffer_size: 12000
      learning_rate: 0.0003
      beta: 0.001
      epsilon: 0.2
      lambd: 0.99
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 128
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    keep_checkpoints: 5
    max_steps: 500000
    time_horizon: 1000
    summary_freq: 12000
    threaded: true
Console logs / stack traces
(mlagents-env) D:\ML-Agents\ml-agents\config>mlagents-learn ppo/3DBall.yaml --run-id=3DBall_test --force
2020-07-21 18:20:45.848482: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
WARNING:tensorflow:From d:\ml-agents\mlagents-env\lib\site-packages\tensorflow\python\compat\v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
▄▄▄▓▓▓▓
╓▓▓▓▓▓▓█▓▓▓▓▓
,▄▄▄m▀▀▀' ,▓▓▓▀▓▓▄ ▓▓▓ ▓▓▌
▄▓▓▓▀' ▄▓▓▀ ▓▓▓ ▄▄ ▄▄ ,▄▄ ▄▄▄▄ ,▄▄ ▄▓▓▌▄ ▄▄▄ ,▄▄
▄▓▓▓▀ ▄▓▓▀ ▐▓▓▌ ▓▓▌ ▐▓▓ ▐▓▓▓▀▀▀▓▓▌ ▓▓▓ ▀▓▓▌▀ ^▓▓▌ ╒▓▓▌
▄▓▓▓▓▓▄▄▄▄▄▄▄▄▓▓▓ ▓▀ ▓▓▌ ▐▓▓ ▐▓▓ ▓▓▓ ▓▓▓ ▓▓▌ ▐▓▓▄ ▓▓▌
▀▓▓▓▓▀▀▀▀▀▀▀▀▀▀▓▓▄ ▓▓ ▓▓▌ ▐▓▓ ▐▓▓ ▓▓▓ ▓▓▓ ▓▓▌ ▐▓▓▐▓▓
^█▓▓▓ ▀▓▓▄ ▐▓▓▌ ▓▓▓▓▄▓▓▓▓ ▐▓▓ ▓▓▓ ▓▓▓ ▓▓▓▄ ▓▓▓▓`
'▀▓▓▓▄ ^▓▓▓ ▓▓▓ └▀▀▀▀ ▀▀ ^▀▀ `▀▀ `▀▀ '▀▀ ▐▓▓▌
▀▀▀▀▓▄▄▄ ▓▓▓▓▓▓, ▓▓▓▓▀
`▀█▓▓▓▓▓▓▓▓▓▌
¬`▀▀▀█▓
Version information:
ml-agents: 0.18.0,
ml-agents-envs: 0.18.0,
Communicator API: 1.0.0,
TensorFlow: 2.2.0
2020-07-21 18:20:48.950830: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
WARNING:tensorflow:From d:\ml-agents\mlagents-env\lib\site-packages\tensorflow\python\compat\v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
2020-07-21 18:20:50 INFO [environment.py:199] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
2020-07-21 18:20:53 INFO [environment.py:108] Connected to Unity environment with package version 1.0.3 and communication version 1.0.0
2020-07-21 18:20:53 INFO [environment.py:265] Connected new brain:
3DBall?team=0
2020-07-21 18:20:53.860642: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-07-21 18:20:53.872630: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2483e493c30 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-21 18:20:53.878502: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-07-21 18:20:53.996742: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-07-21 18:20:54.180968: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1070 computeCapability: 6.1
coreClock: 1.683GHz coreCount: 15 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 238.66GiB/s
2020-07-21 18:20:54.188751: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-07-21 18:20:54.225479: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-07-21 18:20:54.244696: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-07-21 18:20:54.263115: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-07-21 18:20:54.288772: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-07-21 18:20:54.310554: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-07-21 18:20:54.529632: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-07-21 18:20:54.537809: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-07-21 18:20:54 WARNING [stats.py:235] Could not write text summary for Tensorboard.
2020-07-21 18:20:54 INFO [trainer_controller.py:76] Saved Model
Traceback (most recent call last):
  File "d:\ml-agents\ml-agents\ml-agents\mlagents\trainers\trainer_controller.py", line 130, in _create_trainer_and_manager
    trainer = self.trainers[brain_name]
KeyError: '3DBall'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\ML-Agents\mlagents-env\Scripts\mlagents-learn-script.py", line 33, in <module>
    sys.exit(load_entry_point('mlagents', 'console_scripts', 'mlagents-learn')())
  File "d:\ml-agents\ml-agents\ml-agents\mlagents\trainers\learn.py", line 283, in main
    run_cli(parse_command_line())
  File "d:\ml-agents\ml-agents\ml-agents\mlagents\trainers\learn.py", line 279, in run_cli
    run_training(run_seed, options)
  File "d:\ml-agents\ml-agents\ml-agents\mlagents\trainers\learn.py", line 158, in run_training
    tc.start_learning(env_manager)
  File "d:\ml-agents\ml-agents\ml-agents-envs\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "d:\ml-agents\ml-agents\ml-agents\mlagents\trainers\trainer_controller.py", line 181, in start_learning
    self._create_trainers_and_managers(env_manager, new_behavior_ids)
  File "d:\ml-agents\ml-agents\ml-agents\mlagents\trainers\trainer_controller.py", line 168, in _create_trainers_and_managers
    self._create_trainer_and_manager(env_manager, behavior_id)
  File "d:\ml-agents\ml-agents\ml-agents\mlagents\trainers\trainer_controller.py", line 132, in _create_trainer_and_manager
    trainer = self.trainer_factory.generate(brain_name)
  File "d:\ml-agents\ml-agents\ml-agents\mlagents\trainers\trainer_util.py", line 52, in generate
    self.multi_gpu,
  File "d:\ml-agents\ml-agents\ml-agents\mlagents\trainers\trainer_util.py", line 101, in initialize_trainer
    trainer_artifact_path,
  File "d:\ml-agents\ml-agents\ml-agents\mlagents\trainers\ppo\trainer.py", line 48, in __init__
    brain_name, trainer_settings, training, artifact_path, reward_buff_cap
  File "d:\ml-agents\ml-agents\ml-agents\mlagents\trainers\trainer\rl_trainer.py", line 38, in __init__
    StatsPropertyType.HYPERPARAMETERS, self.trainer_settings.as_dict()
  File "d:\ml-agents\ml-agents\ml-agents\mlagents\trainers\stats.py", line 321, in add_property
    writer.add_property(self.category, property_type, value)
  File "d:\ml-agents\ml-agents\ml-agents\mlagents\trainers\stats.py", line 216, in add_property
    self.summary_writers[category].add_summary(text, 0)
  File "d:\ml-agents\mlagents-env\lib\site-packages\tensorflow\python\summary\writer\writer.py", line 127, in add_summary
    for value in summary.value:
AttributeError: 'str' object has no attribute 'value'
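A note on reading the two-part traceback above: the KeyError looks like the ordinary "no trainer cached yet for this behavior" lookup miss, and the AttributeError raised while handling it is the failure that actually stops the run. The sketch below reproduces that chaining pattern in plain Python. It is only an illustration under stated assumptions: `FakeWriter`, `generate`, and `create_trainer` are hypothetical stand-ins, not the ml-agents or TensorFlow code, and the idea that a plain string reached `add_summary` is taken solely from the last frame of the traceback. The "Could not write text summary for Tensorboard." warning logged just before the traceback may be related, but the log alone does not confirm that.

```python
class FakeWriter:
    """Hypothetical stand-in for a summary writer (NOT the TensorFlow class)."""

    def add_summary(self, summary, step):
        # Mirrors the failing line in the traceback: a str has no `.value`.
        for value in summary.value:
            pass


trainers = {}  # nothing cached yet for any behavior name


def generate(brain_name):
    # Assumption for illustration: a string ends up where a summary
    # object with a `.value` attribute was expected.
    FakeWriter().add_summary("", 0)


def create_trainer(brain_name):
    try:
        return trainers[brain_name]
    except KeyError:
        # Lookup miss on first use; build the trainer instead.
        return generate(brain_name)


caught = None
try:
    create_trainer("3DBall")
except AttributeError as err:
    caught = err

# The AttributeError is the real error; the KeyError is only its context,
# which is why Python prints "During handling of the above exception...".
print(type(caught.__context__).__name__)
print(caught)
```

Read this way, the place to investigate is the summary-writing step in stats.py rather than the behavior name in the config.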
Environment (please complete the following information):
- Unity Version: Unity 2019.4.4f1
- OS + version: Windows 10
- ML-Agents version: v0.18.0
- TensorFlow version: 2.2.0
- Environment: 3DBall (actually every env)