KeyError and AttributeError using MLAgents #4250

@qiwu57kevin

Describe the bug
I am using mlagents 0.18.0. After setting everything up and starting training with the provided example environments, I keep getting a KeyError from the trainer and an AttributeError from TensorFlow. I used the same setup on the same desktop about 2 days ago and everything worked well, but it does not work on my current device.

To Reproduce
Steps to reproduce the behavior:

  • I am using the Unity Editor to train the 3DBall example environment. The command, run from a Python virtual environment, is:
    mlagents-learn ppo/3DBall.yaml --run-id=3DBall_test
  • The config file I used is below; it is the one included in the git repo:
behaviors:
  3DBall:
    trainer_type: ppo
    hyperparameters:
      batch_size: 64
      buffer_size: 12000
      learning_rate: 0.0003
      beta: 0.001
      epsilon: 0.2
      lambd: 0.99
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 128
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    keep_checkpoints: 5
    max_steps: 500000
    time_horizon: 1000
    summary_freq: 12000
    threaded: true

Console logs / stack traces

(mlagents-env) D:\ML-Agents\ml-agents\config>mlagents-learn ppo/3DBall.yaml --run-id=3DBall_test --force
2020-07-21 18:20:45.848482: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
WARNING:tensorflow:From d:\ml-agents\mlagents-env\lib\site-packages\tensorflow\python\compat\v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term


                        ▄▄▄▓▓▓▓
                   ╓▓▓▓▓▓▓█▓▓▓▓▓
              ,▄▄▄m▀▀▀'  ,▓▓▓▀▓▓▄                           ▓▓▓  ▓▓▌
            ▄▓▓▓▀'      ▄▓▓▀  ▓▓▓      ▄▄     ▄▄ ,▄▄ ▄▄▄▄   ,▄▄ ▄▓▓▌▄ ▄▄▄    ,▄▄
          ▄▓▓▓▀        ▄▓▓▀   ▐▓▓▌     ▓▓▌   ▐▓▓ ▐▓▓▓▀▀▀▓▓▌ ▓▓▓ ▀▓▓▌▀ ^▓▓▌  ╒▓▓▌
        ▄▓▓▓▓▓▄▄▄▄▄▄▄▄▓▓▓      ▓▀      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌   ▐▓▓▄ ▓▓▌
        ▀▓▓▓▓▀▀▀▀▀▀▀▀▀▀▓▓▄     ▓▓      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌    ▐▓▓▐▓▓
          ^█▓▓▓        ▀▓▓▄   ▐▓▓▌     ▓▓▓▓▄▓▓▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▓▄    ▓▓▓▓`
            '▀▓▓▓▄      ^▓▓▓  ▓▓▓       └▀▀▀▀ ▀▀ ^▀▀    `▀▀ `▀▀   '▀▀    ▐▓▓▌
               ▀▀▀▀▓▄▄▄   ▓▓▓▓▓▓,                                      ▓▓▓▓▀
                   `▀█▓▓▓▓▓▓▓▓▓▌
                        ¬`▀▀▀█▓


 Version information:
  ml-agents: 0.18.0,
  ml-agents-envs: 0.18.0,
  Communicator API: 1.0.0,
  TensorFlow: 2.2.0
2020-07-21 18:20:48.950830: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
WARNING:tensorflow:From d:\ml-agents\mlagents-env\lib\site-packages\tensorflow\python\compat\v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
2020-07-21 18:20:50 INFO [environment.py:199] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
2020-07-21 18:20:53 INFO [environment.py:108] Connected to Unity environment with package version 1.0.3 and communication version 1.0.0
2020-07-21 18:20:53 INFO [environment.py:265] Connected new brain:
3DBall?team=0
2020-07-21 18:20:53.860642: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-07-21 18:20:53.872630: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2483e493c30 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-21 18:20:53.878502: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-07-21 18:20:53.996742: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-07-21 18:20:54.180968: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1070 computeCapability: 6.1
coreClock: 1.683GHz coreCount: 15 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 238.66GiB/s
2020-07-21 18:20:54.188751: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-07-21 18:20:54.225479: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-07-21 18:20:54.244696: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-07-21 18:20:54.263115: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-07-21 18:20:54.288772: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-07-21 18:20:54.310554: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-07-21 18:20:54.529632: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-07-21 18:20:54.537809: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-07-21 18:20:54 WARNING [stats.py:235] Could not write text summary for Tensorboard.
2020-07-21 18:20:54 INFO [trainer_controller.py:76] Saved Model
Traceback (most recent call last):
  File "d:\ml-agents\ml-agents\ml-agents\mlagents\trainers\trainer_controller.py", line 130, in _create_trainer_and_manager
    trainer = self.trainers[brain_name]
KeyError: '3DBall'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\ML-Agents\mlagents-env\Scripts\mlagents-learn-script.py", line 33, in <module>
    sys.exit(load_entry_point('mlagents', 'console_scripts', 'mlagents-learn')())
  File "d:\ml-agents\ml-agents\ml-agents\mlagents\trainers\learn.py", line 283, in main
    run_cli(parse_command_line())
  File "d:\ml-agents\ml-agents\ml-agents\mlagents\trainers\learn.py", line 279, in run_cli
    run_training(run_seed, options)
  File "d:\ml-agents\ml-agents\ml-agents\mlagents\trainers\learn.py", line 158, in run_training
    tc.start_learning(env_manager)
  File "d:\ml-agents\ml-agents\ml-agents-envs\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "d:\ml-agents\ml-agents\ml-agents\mlagents\trainers\trainer_controller.py", line 181, in start_learning
    self._create_trainers_and_managers(env_manager, new_behavior_ids)
  File "d:\ml-agents\ml-agents\ml-agents\mlagents\trainers\trainer_controller.py", line 168, in _create_trainers_and_managers
    self._create_trainer_and_manager(env_manager, behavior_id)
  File "d:\ml-agents\ml-agents\ml-agents\mlagents\trainers\trainer_controller.py", line 132, in _create_trainer_and_manager
    trainer = self.trainer_factory.generate(brain_name)
  File "d:\ml-agents\ml-agents\ml-agents\mlagents\trainers\trainer_util.py", line 52, in generate
    self.multi_gpu,
  File "d:\ml-agents\ml-agents\ml-agents\mlagents\trainers\trainer_util.py", line 101, in initialize_trainer
    trainer_artifact_path,
  File "d:\ml-agents\ml-agents\ml-agents\mlagents\trainers\ppo\trainer.py", line 48, in __init__
    brain_name, trainer_settings, training, artifact_path, reward_buff_cap
  File "d:\ml-agents\ml-agents\ml-agents\mlagents\trainers\trainer\rl_trainer.py", line 38, in __init__
    StatsPropertyType.HYPERPARAMETERS, self.trainer_settings.as_dict()
  File "d:\ml-agents\ml-agents\ml-agents\mlagents\trainers\stats.py", line 321, in add_property
    writer.add_property(self.category, property_type, value)
  File "d:\ml-agents\ml-agents\ml-agents\mlagents\trainers\stats.py", line 216, in add_property
    self.summary_writers[category].add_summary(text, 0)
  File "d:\ml-agents\mlagents-env\lib\site-packages\tensorflow\python\summary\writer\writer.py", line 127, in add_summary
    for value in summary.value:
AttributeError: 'str' object has no attribute 'value'
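
A note on reading the traceback above: the first KeyError has the shape of an ordinary cache miss in a look-up-then-create pattern, and Python prints it only because a second exception was raised while the first was being handled. The sketch below reproduces that shape with hypothetical stand-in names (`FakeWriter`, `create_trainer`, `get_or_create_trainer`). It is not ml-agents or TensorFlow code, and the idea that a plain string reached `add_summary` is inferred only from the final `AttributeError` line.

```python
class FakeWriter:
    """Hypothetical stand-in for a summary writer; not the TensorFlow class."""

    def add_summary(self, summary, step):
        # Same shape as the failing line in the traceback:
        # the argument is expected to expose a `.value` attribute.
        return [v for v in summary.value]


def create_trainer(name, writer, summary):
    # Hypothetical factory that records a summary before returning a trainer.
    writer.add_summary(summary, 0)
    return f"trainer:{name}"


def get_or_create_trainer(trainers, name, writer, summary):
    try:
        return trainers[name]  # a cache miss raises KeyError
    except KeyError:
        # Any exception raised here is chained onto the KeyError.
        trainers[name] = create_trainer(name, writer, summary)
        return trainers[name]


try:
    # Passing a str where a summary object is expected (assumption).
    get_or_create_trainer({}, "3DBall", FakeWriter(), "")
except AttributeError as err:
    caught = err

# The AttributeError is the actual failure; the KeyError is only its context.
print(type(caught).__name__, "| context:", type(caught.__context__).__name__)
```

If this reading is right, the KeyError itself is harmless and the thing to investigate is why the value handed to `add_summary` is a string.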

Environment (please complete the following information):

  • Unity Version: Unity 2019.4.4f1
  • OS + version: Windows 10
  • ML-Agents version: v0.18.0
  • TensorFlow version: 2.2.0
  • Environment: 3DBall (the same errors occur with every example environment)
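
Since the same setup reportedly worked two days earlier, comparing installed package versions between the working and failing Python environments may help narrow this down. A minimal sketch, assuming Python 3.8 or newer for `importlib.metadata`; the helper name `installed_versions` and the list of package names are assumptions, not something this report specifies.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Package names are assumptions about what is worth comparing.
to_check = ["mlagents", "mlagents-envs", "tensorflow", "tensorboard"]
for name, version in installed_versions(to_check).items():
    print(f"{name}: {version or 'not installed'}")
```

Running this inside each virtual environment and diffing the output shows whether a dependency changed between the two setups.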

NOTE: We are unable to help reproduce bugs with custom environments. Please attempt to reproduce your issue with one of the example environments, or provide a minimal patch to one of the environments needed to reproduce the issue.

Labels: bug (Issue describes a potential bug in ml-agents.)