4 changes: 3 additions & 1 deletion .gitignore
@@ -1,6 +1,8 @@
# Tensorflow Model Info
# Output Artifacts (Legacy)
/models
/summaries
# Output Artifacts
/results

# Training environments
/envs
3 changes: 3 additions & 0 deletions com.unity.ml-agents/CHANGELOG.md
@@ -69,6 +69,8 @@ and this project adheres to
instead of "camelCase"; for example, `Agent.maxStep` was renamed to
`Agent.MaxStep`. For a full list of changes, see the pull request. (#3828)
- Update Barracuda to 0.7.0-preview which has breaking namespace and assembly name changes.
- Training artifacts (trained models, summaries) are now found in the `results/`
directory. (#3829)

### Minor Changes

@@ -97,6 +99,7 @@ you will need to change the signature of its `Write()` method. (#3834)
- The maximum compatible version of tensorflow was changed to allow tensorflow 2.1 and 2.2. This
will allow use with python 3.8 using tensorflow 2.2.0rc3.
- `UnityRLCapabilities` was added to help inform users when RL features are mismatched between C# and Python packages. (#3831)
- Unity Player logs are now written out to the results directory. (#3877)

### Bug Fixes

5 changes: 2 additions & 3 deletions docs/Getting-Started.md
@@ -179,12 +179,11 @@ INFO:mlagents_envs:Hyperparameters for the PPO Trainer of brain 3DBallLearning:
sequence_length: 64
summary_freq: 1000
use_recurrent: False
summary_path: ./summaries/first3DBallRun
memory_size: 256
use_curiosity: False
curiosity_strength: 0.01
curiosity_enc_size: 128
model_path: ./models/first3DBallRun/3DBallLearning
output_path: ./results/first3DBallRun/3DBallLearning
INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 1000. Mean Reward: 1.242. Std of Reward: 0.746. Training.
INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 2000. Mean Reward: 1.319. Std of Reward: 0.693. Training.
INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 3000. Mean Reward: 1.804. Std of Reward: 1.056. Training.
@@ -236,7 +235,7 @@ the same command again, appending the `--resume` flag:
mlagents-learn config/trainer_config.yaml --run-id=first3DBallRun --resume
```

Your trained model will be at `models/<run-identifier>/<behavior_name>.nn` where
Your trained model will be at `results/<run-identifier>/<behavior_name>.nn` where
`<behavior_name>` is the name of the `Behavior Name` of the agents corresponding
to the model. This file corresponds to your model's latest checkpoint. You can
now embed this trained model into your Agents by following the steps below,
5 changes: 2 additions & 3 deletions docs/Learning-Environment-Executable.md
@@ -152,12 +152,11 @@ INFO:mlagents_envs:Hyperparameters for the PPO Trainer of brain Ball3DLearning:
sequence_length: 64
summary_freq: 1000
use_recurrent: False
summary_path: ./summaries/first-run-0
memory_size: 256
use_curiosity: False
curiosity_strength: 0.01
curiosity_enc_size: 128
model_path: ./models/first-run-0/Ball3DLearning
output_path: ./results/first-run-0/Ball3DLearning
INFO:mlagents.trainers: first-run-0: Ball3DLearning: Step: 1000. Mean Reward: 1.242. Std of Reward: 0.746. Training.
INFO:mlagents.trainers: first-run-0: Ball3DLearning: Step: 2000. Mean Reward: 1.319. Std of Reward: 0.693. Training.
INFO:mlagents.trainers: first-run-0: Ball3DLearning: Step: 3000. Mean Reward: 1.804. Std of Reward: 1.056. Training.
@@ -171,7 +170,7 @@ INFO:mlagents.trainers: first-run-0: Ball3DLearning: Step: 10000. Mean Reward: 2
```

You can press Ctrl+C to stop the training, and your trained model will be at
`models/<run-identifier>/<behavior_name>.nn`, which corresponds to your model's
`results/<run-identifier>/<behavior_name>.nn`, which corresponds to your model's
latest checkpoint. (**Note:** There is a known bug on Windows that causes the
saving of the model to fail when you terminate training early; it's
recommended to wait until Step has reached the max_steps parameter you set in
2 changes: 2 additions & 0 deletions docs/Migrating.md
@@ -61,6 +61,8 @@ double-check that the versions are the same. The versions can be found in
instead of "camelCase"; for example, `Agent.maxStep` was renamed to
`Agent.MaxStep`. For a full list of changes, see the pull request. (#3828)
- `WriteAdapter` was renamed to `ObservationWriter`. (#3834)
- Training artifacts (trained models, summaries) are now found under `results/`
instead of `summaries/` and `models/`.
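A minimal sketch of the path change for a hypothetical run id (illustration only):

```python
import os

run_id = "first3DBallRun"  # hypothetical run id

# Old layout (pre-change)
old_model_dir = os.path.join("models", run_id)       # checkpoints and .nn files
old_summary_dir = os.path.join("summaries", run_id)  # TensorBoard summaries

# New layout: a single results/ tree per run
new_output_dir = os.path.join("results", run_id)
```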

### Steps to Migrate

7 changes: 4 additions & 3 deletions docs/Training-ML-Agents.md
@@ -64,16 +64,17 @@ for a sample execution of the `mlagents-learn` command.
Regardless of which training methods, configurations or hyperparameters you
provide, the training process will always generate three artifacts:

1. Summaries (under the `summaries/` folder): these are training metrics that
1. Summaries (under the `results/<run-identifier>/<behavior-name>` folder):
these are training metrics that
are updated throughout the training process. They are helpful to monitor your
training performance and may help inform how to update your hyperparameter
values. See [Using TensorBoard](Using-Tensorboard.md) for more details on how
to visualize the training metrics.
1. Models (under the `models/` folder): these contain the model checkpoints that
1. Models (under the `results/<run-identifier>/` folder): these contain the model checkpoints that
are updated throughout training and the final model file (`.nn`). This final
model file is generated once either when training completes or is
interrupted.
1. Timers file (also under the `summaries/` folder): this contains aggregated
1. Timers file (also under the `results/<run-identifier>` folder): this contains aggregated
metrics on your training process, including time spent on specific code
blocks. See [Profiling in Python](Profiling-Python.md) for more information
on the timers generated.
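A rough sketch of where each artifact now lands. The run id and behavior name are hypothetical, and the `run_logs/timers.json` location follows the `learn.py` change in this pull request:

```python
import os

run_id = "first3DBallRun"         # hypothetical
behavior_name = "3DBallLearning"  # hypothetical

results_dir = os.path.join("results", run_id)

# 1. Summaries: TensorBoard event files, written per behavior
summaries_dir = os.path.join(results_dir, behavior_name)

# 2. Models: checkpoints plus the final .nn file
final_model = os.path.join(results_dir, f"{behavior_name}.nn")

# 3. Timers: aggregated profiling metrics
timers_file = os.path.join(results_dir, "run_logs", "timers.json")
```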
2 changes: 1 addition & 1 deletion docs/Training-PPO.md
@@ -294,7 +294,7 @@ Typical Range: Approximately equal to PPO's `buffer_size`
`init_path` can be specified to initialize your model from a previous run before starting.
Note that the prior run should have used the same trainer configurations as the current run,
and have been saved with the same version of ML-Agents. You should provide the full path
to the folder where the checkpoints were saved, e.g. `./models/{run-id}/{behavior_name}`.
to the folder where the checkpoints were saved, e.g. `./results/{run-id}/{behavior_name}`.
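For illustration (the run id and behavior name below are hypothetical), an `init_path` under the new layout would look like:

```python
import os

prev_run_id = "previous_run"      # hypothetical earlier run to initialize from
behavior_name = "3DBallLearning"  # hypothetical

init_path = os.path.join(".", "results", prev_run_id, behavior_name)
```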

This option is provided in case you want to initialize different behaviors from different runs;
in most cases, it is sufficient to use the `--initialize-from` CLI parameter to initialize
2 changes: 1 addition & 1 deletion docs/Training-SAC.md
@@ -295,7 +295,7 @@ Typical Range (Discrete): `32` - `512`
`init_path` can be specified to initialize your model from a previous run before starting.
Note that the prior run should have used the same trainer configurations as the current run,
and have been saved with the same version of ML-Agents. You should provide the full path
to the folder where the checkpoints were saved, e.g. `./models/{run-id}/{behavior_name}`.
to the folder where the checkpoints were saved, e.g. `./results/{run-id}/{behavior_name}`.

This option is provided in case you want to initialize different behaviors from different runs;
in most cases, it is sufficient to use the `--initialize-from` CLI parameter to initialize
2 changes: 1 addition & 1 deletion docs/Using-Tensorboard.md
@@ -12,7 +12,7 @@ start TensorBoard:

1. Open a terminal or console window:
1. Navigate to the directory where the ML-Agents Toolkit is installed.
1. From the command line run: `tensorboard --logdir=summaries --port=6006`
1. From the command line run: `tensorboard --logdir=results --port=6006`
1. Open a browser window and navigate to
[localhost:6006](http://localhost:6006).
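To launch TensorBoard from a Python script instead of the shell, a minimal sketch (assumes `tensorboard` is installed and on your PATH):

```python
import subprocess

# Point TensorBoard at the consolidated results/ directory.
subprocess.run(["tensorboard", "--logdir=results", "--port=6006"], check=True)
```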

30 changes: 22 additions & 8 deletions ml-agents-envs/mlagents_envs/environment.py
@@ -141,8 +141,9 @@ def __init__(
seed: int = 0,
no_graphics: bool = False,
timeout_wait: int = 60,
args: Optional[List[str]] = None,
additional_args: Optional[List[str]] = None,
side_channels: Optional[List[SideChannel]] = None,
log_folder: Optional[str] = None,
):
"""
Starts a new unity environment and establishes a connection with the environment.
@@ -157,9 +158,11 @@ def __init__(
:int timeout_wait: Time (in seconds) to wait for connection from environment.
:list args: Additional Unity command line arguments
:list side_channels: Additional side channels for non-RL communication with Unity
:str log_folder: Optional folder to write the Unity Player log file into. Requires absolute path.
"""
args = args or []
atexit.register(self._close)
self.additional_args = additional_args or []
self.no_graphics = no_graphics
# If base port is not specified, use BASE_ENVIRONMENT_PORT if we have
# an environment, otherwise DEFAULT_EDITOR_PORT
if base_port is None:
@@ -185,6 +188,7 @@ def __init__(
)
)
self.side_channels[_sc.channel_id] = _sc
self.log_folder = log_folder

# If the environment name is None, a new environment will not be launched
# and the communicator will directly try to connect to an existing unity environment.
@@ -195,7 +199,7 @@ def __init__(
"the worker-id must be 0 in order to connect with the Editor."
)
if file_name is not None:
self.executable_launcher(file_name, no_graphics, args)
self.executable_launcher(file_name, no_graphics, additional_args)
else:
logger.info(
f"Listening on port {self.port}. "
@@ -296,6 +300,20 @@ def validate_environment_path(env_path: str) -> Optional[str]:
launch_string = candidates[0]
return launch_string

def executable_args(self) -> List[str]:
args: List[str] = []
if self.no_graphics:
args += ["-nographics", "-batchmode"]
args += [UnityEnvironment.PORT_COMMAND_LINE_ARG, str(self.port)]
if self.log_folder:
log_file_path = os.path.join(
self.log_folder, f"Player-{self.worker_id}.log"
)
args += ["-logFile", log_file_path]
# Add in arguments passed explicitly by the user.
args += self.additional_args
return args

def executable_launcher(self, file_name, no_graphics, args):
launch_string = self.validate_environment_path(file_name)
if launch_string is None:
@@ -306,11 +324,7 @@ def executable_launcher(self, file_name, no_graphics, args):
else:
logger.debug("This is the launch string {}".format(launch_string))
# Launch Unity environment
subprocess_args = [launch_string]
if no_graphics:
subprocess_args += ["-nographics", "-batchmode"]
subprocess_args += [UnityEnvironment.PORT_COMMAND_LINE_ARG, str(self.port)]
subprocess_args += args
subprocess_args = [launch_string] + self.executable_args()
try:
self.proc1 = subprocess.Popen(
subprocess_args,
12 changes: 12 additions & 0 deletions ml-agents-envs/mlagents_envs/tests/test_envs.py
@@ -49,6 +49,18 @@ def test_port_defaults(
assert expected == env.port


@mock.patch("mlagents_envs.environment.UnityEnvironment.executable_launcher")
@mock.patch("mlagents_envs.environment.UnityEnvironment.get_communicator")
def test_log_file_path_is_set(mock_communicator, mock_launcher):
mock_communicator.return_value = MockCommunicator()
env = UnityEnvironment(
file_name="myfile", worker_id=0, log_folder="./some-log-folder-path"
)
args = env.executable_args()
log_file_index = args.index("-logFile")
assert args[log_file_index + 1] == "./some-log-folder-path/Player-0.log"
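The test above pins down the `-logFile` argument produced by `executable_args()`. A minimal usage sketch of the renamed `additional_args` and the new `log_folder` parameters; the build path and extra arguments are hypothetical, and `log_folder` should be an absolute path per the docstring:

```python
from mlagents_envs.environment import UnityEnvironment

env = UnityEnvironment(
    file_name="./builds/3DBall",              # hypothetical build path
    additional_args=["-screen-width", "84"],  # hypothetical Unity Player arguments
    log_folder="/tmp/mlagents_logs",          # Player-<worker_id>.log is written here
)
env.reset()
env.close()
```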


@mock.patch("mlagents_envs.environment.UnityEnvironment.executable_launcher")
@mock.patch("mlagents_envs.environment.UnityEnvironment.get_communicator")
def test_reset(mock_communicator, mock_launcher):
60 changes: 42 additions & 18 deletions ml-agents/mlagents/trainers/learn.py
@@ -1,5 +1,6 @@
# # Unity ML-Agents Toolkit
import argparse
import yaml

import os
import numpy as np
@@ -328,26 +329,29 @@ def run_training(run_seed: int, options: RunOptions) -> None:
:param run_options: Command line arguments for training.
"""
with hierarchical_timer("run_training.setup"):
model_path = f"./models/{options.run_id}"
base_path = "results"
write_path = os.path.join(base_path, options.run_id)
maybe_init_path = (
f"./models/{options.initialize_from}" if options.initialize_from else None
os.path.join(base_path, options.initialize_from) if options.initialize_from else None
)
summaries_dir = "./summaries"
run_logs_dir = os.path.join(write_path, "run_logs")
port = options.base_port

# Check if directory exists
handle_existing_directories(
write_path, options.resume, options.force, maybe_init_path
)
# Make run logs directory
os.makedirs(run_logs_dir, exist_ok=True)
# Configure CSV, Tensorboard Writers and StatsReporter
# We assume reward and episode length are needed in the CSV.
csv_writer = CSVWriter(
summaries_dir,
write_path,
required_fields=[
"Environment/Cumulative Reward",
"Environment/Episode Length",
],
)
handle_existing_directories(
model_path, summaries_dir, options.resume, options.force, maybe_init_path
)
tb_writer = TensorboardWriter(summaries_dir, clear_past_data=not options.resume)
tb_writer = TensorboardWriter(write_path, clear_past_data=not options.resume)
gauge_write = GaugeWriter()
console_writer = ConsoleWriter()
StatsReporter.add_writer(tb_writer)
@@ -358,7 +362,12 @@ def run_training(run_seed: int, options: RunOptions) -> None:
if options.env_path is None:
port = UnityEnvironment.DEFAULT_EDITOR_PORT
env_factory = create_environment_factory(
options.env_path, options.no_graphics, run_seed, port, options.env_args
options.env_path,
options.no_graphics,
run_seed,
port,
options.env_args,
os.path.abspath(run_logs_dir), # Unity environment requires absolute path
)
engine_config = EngineConfig(
width=options.width,
@@ -377,9 +386,8 @@ def run_training(run_seed: int, options: RunOptions) -> None:
)
trainer_factory = TrainerFactory(
options.trainer_config,
summaries_dir,
options.run_id,
model_path,
write_path,
options.keep_checkpoints,
not options.inference,
options.resume,
@@ -391,8 +399,7 @@ def run_training(run_seed: int, options: RunOptions) -> None:
# Create controller and begin training.
tc = TrainerController(
trainer_factory,
model_path,
summaries_dir,
write_path,
options.run_id,
options.save_freq,
maybe_meta_curriculum,
@@ -407,11 +414,26 @@ def run_training(run_seed: int, options: RunOptions) -> None:
tc.start_learning(env_manager)
finally:
env_manager.close()
write_timing_tree(summaries_dir, options.run_id)
write_run_options(write_path, options)
write_timing_tree(run_logs_dir)


def write_run_options(output_dir: str, run_options: RunOptions) -> None:
run_options_path = os.path.join(output_dir, "configuration.yaml")
try:
with open(run_options_path, "w") as f:
try:
yaml.dump(dict(run_options._asdict()), f, sort_keys=False)
except TypeError: # Older versions of pyyaml don't support sort_keys
yaml.dump(dict(run_options._asdict()), f)
except FileNotFoundError:
logger.warning(
f"Unable to save configuration to {run_options_path}. Make sure the directory exists"
)
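As a quick check of what `write_run_options` produces, a sketch of reading the saved configuration back (the run id is hypothetical; the exact fields depend on `RunOptions`):

```python
import os
import yaml

run_id = "first3DBallRun"  # hypothetical
config_path = os.path.join("results", run_id, "configuration.yaml")

with open(config_path) as f:
    saved_options = yaml.safe_load(f)

print(sorted(saved_options))  # the RunOptions fields recorded for this run
```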


def write_timing_tree(summaries_dir: str, run_id: str) -> None:
timing_path = f"{summaries_dir}/{run_id}_timers.json"
def write_timing_tree(output_dir: str) -> None:
timing_path = os.path.join(output_dir, "timers.json")
try:
with open(timing_path, "w") as f:
json.dump(get_timer_tree(), f, indent=4)
@@ -462,6 +484,7 @@ def create_environment_factory(
seed: int,
start_port: int,
env_args: Optional[List[str]],
log_folder: str,
) -> Callable[[int, List[SideChannel]], BaseEnv]:
if env_path is not None:
launch_string = UnityEnvironment.validate_environment_path(env_path)
@@ -481,8 +504,9 @@ def create_unity_environment(
seed=env_seed,
no_graphics=no_graphics,
base_port=start_port,
args=env_args,
additional_args=env_args,
side_channels=side_channels,
log_folder=log_folder,
)

return create_unity_environment
5 changes: 3 additions & 2 deletions ml-agents/mlagents/trainers/policy/tf_policy.py
@@ -1,5 +1,6 @@
from typing import Any, Dict, List, Optional
import abc
import os
import numpy as np
from mlagents.tf_utils import tf
from mlagents import tf_utils
@@ -62,7 +63,7 @@ def __init__(self, seed, brain, trainer_parameters, load=False):
self.use_continuous_act = brain.vector_action_space_type == "continuous"
if self.use_continuous_act:
self.num_branches = self.brain.vector_action_space_size[0]
self.model_path = trainer_parameters["model_path"]
self.model_path = trainer_parameters["output_path"]
self.initialize_path = trainer_parameters.get("init_path", None)
self.keep_checkpoints = trainer_parameters.get("keep_checkpoints", 5)
self.graph = tf.Graph()
@@ -366,7 +367,7 @@ def save_model(self, steps):
:return:
"""
with self.graph.as_default():
last_checkpoint = self.model_path + "/model-" + str(steps) + ".ckpt"
last_checkpoint = os.path.join(self.model_path, f"model-{steps}.ckpt")
self.saver.save(self.sess, last_checkpoint)
tf.train.write_graph(
self.graph, self.model_path, "raw_graph_def.pb", as_text=False
3 changes: 1 addition & 2 deletions ml-agents/mlagents/trainers/ppo/trainer.py
@@ -62,9 +62,8 @@ def __init__(
"sequence_length",
"summary_freq",
"use_recurrent",
"summary_path",
"memory_size",
"model_path",
"output_path",
"reward_signals",
]
self._check_param_keys()
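For illustration only: the parameter dict the trainer validates (normally assembled by the trainer factory rather than written by hand) now carries a single `output_path` where it previously carried `model_path` and `summary_path`:

```python
import os

# Illustrative value; the other required keys listed above are unchanged.
trainer_parameters = {
    "output_path": os.path.join("results", "first3DBallRun", "3DBallLearning"),
    # ... remaining hyperparameters ...
}
```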