4 changes: 3 additions & 1 deletion .gitignore
@@ -1,6 +1,8 @@
# Tensorflow Model Info
# Output Artifacts (Legacy)
/models
/summaries
# Output Artifacts
/results

# Training environments
/envs
3 changes: 3 additions & 0 deletions com.unity.ml-agents/CHANGELOG.md
@@ -69,6 +69,8 @@ and this project adheres to
instead of "camelCase"; for example, `Agent.maxStep` was renamed to
`Agent.MaxStep`. For a full list of changes, see the pull request. (#3828)
- Update Barracuda to 0.7.0-preview which has breaking namespace and assembly name changes.
- Training artifacts (trained models, summaries) are now found in the `results/`
directory. (#3829)

### Minor Changes

@@ -97,6 +99,7 @@ you will need to change the signature of its `Write()` method. (#3834)
- The maximum compatible version of tensorflow was changed to allow tensorflow 2.1 and 2.2. This
will allow use with python 3.8 using tensorflow 2.2.0rc3.
- `UnityRLCapabilities` was added to help inform users when RL features are mismatched between C# and Python packages. (#3831)
- Unity Player logs are now written out to the results directory. (#3877)

### Bug Fixes

5 changes: 2 additions & 3 deletions docs/Getting-Started.md
@@ -179,12 +179,11 @@ INFO:mlagents_envs:Hyperparameters for the PPO Trainer of brain 3DBallLearning:
sequence_length: 64
summary_freq: 1000
use_recurrent: False
summary_path: ./summaries/first3DBallRun
memory_size: 256
use_curiosity: False
curiosity_strength: 0.01
curiosity_enc_size: 128
model_path: ./models/first3DBallRun/3DBallLearning
output_path: ./results/first3DBallRun/3DBallLearning
INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 1000. Mean Reward: 1.242. Std of Reward: 0.746. Training.
INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 2000. Mean Reward: 1.319. Std of Reward: 0.693. Training.
INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 3000. Mean Reward: 1.804. Std of Reward: 1.056. Training.
@@ -236,7 +235,7 @@ the same command again, appending the `--resume` flag:
mlagents-learn config/trainer_config.yaml --run-id=first3DBallRun --resume
```

Your trained model will be at `models/<run-identifier>/<behavior_name>.nn` where
Your trained model will be at `results/<run-identifier>/<behavior_name>.nn` where
`<behavior_name>` is the name of the `Behavior Name` of the agents corresponding
to the model. This file corresponds to your model's latest checkpoint. You can
now embed this trained model into your Agents by following the steps below,
5 changes: 2 additions & 3 deletions docs/Learning-Environment-Executable.md
@@ -152,12 +152,11 @@ INFO:mlagents_envs:Hyperparameters for the PPO Trainer of brain Ball3DLearning:
sequence_length: 64
summary_freq: 1000
use_recurrent: False
summary_path: ./summaries/first-run-0
memory_size: 256
use_curiosity: False
curiosity_strength: 0.01
curiosity_enc_size: 128
model_path: ./models/first-run-0/Ball3DLearning
output_path: ./results/first-run-0/Ball3DLearning
INFO:mlagents.trainers: first-run-0: Ball3DLearning: Step: 1000. Mean Reward: 1.242. Std of Reward: 0.746. Training.
INFO:mlagents.trainers: first-run-0: Ball3DLearning: Step: 2000. Mean Reward: 1.319. Std of Reward: 0.693. Training.
INFO:mlagents.trainers: first-run-0: Ball3DLearning: Step: 3000. Mean Reward: 1.804. Std of Reward: 1.056. Training.
@@ -171,7 +170,7 @@ INFO:mlagents.trainers: first-run-0: Ball3DLearning: Step: 10000. Mean Reward: 2
```

You can press Ctrl+C to stop the training, and your trained model will be at
`models/<run-identifier>/<behavior_name>.nn`, which corresponds to your model's
`results/<run-identifier>/<behavior_name>.nn`, which corresponds to your model's
latest checkpoint. (**Note:** There is a known bug on Windows that causes the
saving of the model to fail when you terminate training early; it's
recommended to wait until Step has reached the max_steps parameter you set in
2 changes: 2 additions & 0 deletions docs/Migrating.md
@@ -61,6 +61,8 @@ double-check that the versions are the same. The versions can be found in
instead of "camelCase"; for example, `Agent.maxStep` was renamed to
`Agent.MaxStep`. For a full list of changes, see the pull request. (#3828)
- `WriteAdapter` was renamed to `ObservationWriter`. (#3834)
- Training artifacts (trained models, summaries) are now found under `results/`
instead of `summaries/` and `models/`.
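A minimal sketch of the path change for a hypothetical run id (illustration only):

```python
import os

run_id = "first3DBallRun"  # hypothetical run id

# Old layout (pre-change)
old_model_dir = os.path.join("models", run_id)       # checkpoints and .nn files
old_summary_dir = os.path.join("summaries", run_id)  # TensorBoard summaries

# New layout: a single results/ tree per run
new_output_dir = os.path.join("results", run_id)
```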

### Steps to Migrate

7 changes: 4 additions & 3 deletions docs/Training-ML-Agents.md
@@ -64,16 +64,17 @@ for a sample execution of the `mlagents-learn` command.
Regardless of which training methods, configurations or hyperparameters you
provide, the training process will always generate three artifacts:

1. Summaries (under the `summaries/` folder): these are training metrics that
1. Summaries (under the `results/<run-identifier>/<behavior-name>` folder):
these are training metrics that
are updated throughout the training process. They are helpful to monitor your
training performance and may help inform how to update your hyperparameter
values. See [Using TensorBoard](Using-Tensorboard.md) for more details on how
to visualize the training metrics.
1. Models (under the `models/` folder): these contain the model checkpoints that
1. Models (under the `results/<run-identifier>/` folder): these contain the model checkpoints that
are updated throughout training and the final model file (`.nn`). This final
model file is generated once either when training completes or is
interrupted.
1. Timers file (also under the `summaries/` folder): this contains aggregated
1. Timers file (also under the `results/<run-identifier>` folder): this contains aggregated
metrics on your training process, including time spent on specific code
blocks. See [Profiling in Python](Profiling-Python.md) for more information
on the timers generated.
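A rough sketch of where each artifact now lands. The run id and behavior name are hypothetical, and the `run_logs/timers.json` location follows the `learn.py` change in this pull request:

```python
import os

run_id = "first3DBallRun"         # hypothetical
behavior_name = "3DBallLearning"  # hypothetical

results_dir = os.path.join("results", run_id)

# 1. Summaries: TensorBoard event files, written per behavior
summaries_dir = os.path.join(results_dir, behavior_name)

# 2. Models: checkpoints plus the final .nn file
final_model = os.path.join(results_dir, f"{behavior_name}.nn")

# 3. Timers: aggregated profiling metrics
timers_file = os.path.join(results_dir, "run_logs", "timers.json")
```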
2 changes: 1 addition & 1 deletion docs/Training-PPO.md
@@ -294,7 +294,7 @@ Typical Range: Approximately equal to PPO's `buffer_size`
`init_path` can be specified to initialize your model from a previous run before starting.
Note that the prior run should have used the same trainer configurations as the current run,
and have been saved with the same version of ML-Agents. You should provide the full path
to the folder where the checkpoints were saved, e.g. `./models/{run-id}/{behavior_name}`.
to the folder where the checkpoints were saved, e.g. `./results/{run-id}/{behavior_name}`.
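For illustration (the run id and behavior name below are hypothetical), an `init_path` under the new layout would look like:

```python
import os

prev_run_id = "previous_run"      # hypothetical earlier run to initialize from
behavior_name = "3DBallLearning"  # hypothetical

init_path = os.path.join(".", "results", prev_run_id, behavior_name)
```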

This option is provided in case you want to initialize different behaviors from different runs;
in most cases, it is sufficient to use the `--initialize-from` CLI parameter to initialize
2 changes: 1 addition & 1 deletion docs/Training-SAC.md
@@ -295,7 +295,7 @@ Typical Range (Discrete): `32` - `512`
`init_path` can be specified to initialize your model from a previous run before starting.
Note that the prior run should have used the same trainer configurations as the current run,
and have been saved with the same version of ML-Agents. You should provide the full path
to the folder where the checkpoints were saved, e.g. `./models/{run-id}/{behavior_name}`.
to the folder where the checkpoints were saved, e.g. `./results/{run-id}/{behavior_name}`.

This option is provided in case you want to initialize different behaviors from different runs;
in most cases, it is sufficient to use the `--initialize-from` CLI parameter to initialize
2 changes: 1 addition & 1 deletion docs/Using-Tensorboard.md
@@ -12,7 +12,7 @@ start TensorBoard:

1. Open a terminal or console window:
1. Navigate to the directory where the ML-Agents Toolkit is installed.
1. From the command line run: `tensorboard --logdir=summaries --port=6006`
1. From the command line run: `tensorboard --logdir=results --port=6006`
1. Open a browser window and navigate to
[localhost:6006](http://localhost:6006).
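To launch TensorBoard from a Python script instead of the shell, a minimal sketch (assumes `tensorboard` is installed and on your PATH):

```python
import subprocess

# Point TensorBoard at the consolidated results/ directory.
subprocess.run(["tensorboard", "--logdir=results", "--port=6006"], check=True)
```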

30 changes: 22 additions & 8 deletions ml-agents-envs/mlagents_envs/environment.py
@@ -141,8 +141,9 @@ def __init__(
seed: int = 0,
no_graphics: bool = False,
timeout_wait: int = 60,
args: Optional[List[str]] = None,
additional_args: Optional[List[str]] = None,
side_channels: Optional[List[SideChannel]] = None,
log_folder: Optional[str] = None,
):
"""
Starts a new unity environment and establishes a connection with the environment.
@@ -157,9 +158,11 @@ def __init__(
:int timeout_wait: Time (in seconds) to wait for connection from environment.
:list args: Additional Unity command line arguments
:list side_channels: Additional side channels for non-RL communication with Unity
:str log_folder: Optional folder to write the Unity Player log file into. Requires absolute path.
"""
args = args or []
atexit.register(self._close)
self.additional_args = additional_args or []
self.no_graphics = no_graphics
# If base port is not specified, use BASE_ENVIRONMENT_PORT if we have
# an environment, otherwise DEFAULT_EDITOR_PORT
if base_port is None:
@@ -185,6 +188,7 @@ def __init__(
)
)
self.side_channels[_sc.channel_id] = _sc
self.log_folder = log_folder

# If the environment name is None, a new environment will not be launched
# and the communicator will directly try to connect to an existing unity environment.
@@ -195,7 +199,7 @@ def __init__(
"the worker-id must be 0 in order to connect with the Editor."
)
if file_name is not None:
self.executable_launcher(file_name, no_graphics, args)
self.executable_launcher(file_name, no_graphics, additional_args)
else:
logger.info(
f"Listening on port {self.port}. "
@@ -296,6 +300,20 @@ def validate_environment_path(env_path: str) -> Optional[str]:
launch_string = candidates[0]
return launch_string

def executable_args(self) -> List[str]:
args: List[str] = []
if self.no_graphics:
args += ["-nographics", "-batchmode"]
args += [UnityEnvironment.PORT_COMMAND_LINE_ARG, str(self.port)]
if self.log_folder:
log_file_path = os.path.join(
self.log_folder, f"Player-{self.worker_id}.log"
)
args += ["-logFile", log_file_path]
# Add in arguments passed explicitly by the user.
args += self.additional_args
return args

def executable_launcher(self, file_name, no_graphics, args):
launch_string = self.validate_environment_path(file_name)
if launch_string is None:
@@ -306,11 +324,7 @@ def executable_launcher(self, file_name, no_graphics, args):
else:
logger.debug("This is the launch string {}".format(launch_string))
# Launch Unity environment
subprocess_args = [launch_string]
if no_graphics:
subprocess_args += ["-nographics", "-batchmode"]
subprocess_args += [UnityEnvironment.PORT_COMMAND_LINE_ARG, str(self.port)]
subprocess_args += args
subprocess_args = [launch_string] + self.executable_args()
try:
self.proc1 = subprocess.Popen(
subprocess_args,
12 changes: 12 additions & 0 deletions ml-agents-envs/mlagents_envs/tests/test_envs.py
@@ -49,6 +49,18 @@ def test_port_defaults(
assert expected == env.port


@mock.patch("mlagents_envs.environment.UnityEnvironment.executable_launcher")
@mock.patch("mlagents_envs.environment.UnityEnvironment.get_communicator")
def test_log_file_path_is_set(mock_communicator, mock_launcher):
mock_communicator.return_value = MockCommunicator()
env = UnityEnvironment(
file_name="myfile", worker_id=0, log_folder="./some-log-folder-path"
)
args = env.executable_args()
log_file_index = args.index("-logFile")
assert args[log_file_index + 1] == "./some-log-folder-path/Player-0.log"
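The test above pins down the `-logFile` argument produced by `executable_args()`. A minimal usage sketch of the renamed `additional_args` and the new `log_folder` parameters; the build path and extra arguments are hypothetical, and `log_folder` should be an absolute path per the docstring:

```python
from mlagents_envs.environment import UnityEnvironment

env = UnityEnvironment(
    file_name="./builds/3DBall",              # hypothetical build path
    additional_args=["-screen-width", "84"],  # hypothetical Unity Player arguments
    log_folder="/tmp/mlagents_logs",          # Player-<worker_id>.log is written here
)
env.reset()
env.close()
```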


@mock.patch("mlagents_envs.environment.UnityEnvironment.executable_launcher")
@mock.patch("mlagents_envs.environment.UnityEnvironment.get_communicator")
def test_reset(mock_communicator, mock_launcher):
60 changes: 42 additions & 18 deletions ml-agents/mlagents/trainers/learn.py
@@ -1,5 +1,6 @@
# # Unity ML-Agents Toolkit
import argparse
import yaml

import os
import numpy as np
@@ -328,26 +329,29 @@ def run_training(run_seed: int, options: RunOptions) -> None:
:param run_options: Command line arguments for training.
"""
with hierarchical_timer("run_training.setup"):
model_path = f"./models/{options.run_id}"
base_path = "results"
write_path = os.path.join(base_path, options.run_id)
maybe_init_path = (
f"./models/{options.initialize_from}" if options.initialize_from else None
os.path.join(base_path, options.initialize_from) if options.initialize_from else None
)
summaries_dir = "./summaries"
run_logs_dir = os.path.join(write_path, "run_logs")
port = options.base_port

# Check if directory exists
handle_existing_directories(
write_path, options.resume, options.force, maybe_init_path
)
# Make run logs directory
os.makedirs(run_logs_dir, exist_ok=True)
# Configure CSV, Tensorboard Writers and StatsReporter
# We assume reward and episode length are needed in the CSV.
csv_writer = CSVWriter(
summaries_dir,
write_path,
required_fields=[
"Environment/Cumulative Reward",
"Environment/Episode Length",
],
)
handle_existing_directories(
model_path, summaries_dir, options.resume, options.force, maybe_init_path
)
tb_writer = TensorboardWriter(summaries_dir, clear_past_data=not options.resume)
tb_writer = TensorboardWriter(write_path, clear_past_data=not options.resume)
gauge_write = GaugeWriter()
console_writer = ConsoleWriter()
StatsReporter.add_writer(tb_writer)
@@ -358,7 +362,12 @@ def run_training(run_seed: int, options: RunOptions) -> None:
if options.env_path is None:
port = UnityEnvironment.DEFAULT_EDITOR_PORT
env_factory = create_environment_factory(
options.env_path, options.no_graphics, run_seed, port, options.env_args
options.env_path,
options.no_graphics,
run_seed,
port,
options.env_args,
os.path.abspath(run_logs_dir), # Unity environment requires absolute path
)
engine_config = EngineConfig(
width=options.width,
@@ -377,9 +386,8 @@ def run_training(run_seed: int, options: RunOptions) -> None:
)
trainer_factory = TrainerFactory(
options.trainer_config,
summaries_dir,
options.run_id,
model_path,
write_path,
options.keep_checkpoints,
not options.inference,
options.resume,
@@ -391,8 +399,7 @@ def run_training(run_seed: int, options: RunOptions) -> None:
# Create controller and begin training.
tc = TrainerController(
trainer_factory,
model_path,
summaries_dir,
write_path,
options.run_id,
options.save_freq,
maybe_meta_curriculum,
@@ -407,11 +414,26 @@ def run_training(run_seed: int, options: RunOptions) -> None:
tc.start_learning(env_manager)
finally:
env_manager.close()
write_timing_tree(summaries_dir, options.run_id)
write_run_options(write_path, options)
write_timing_tree(run_logs_dir)


def write_run_options(output_dir: str, run_options: RunOptions) -> None:
run_options_path = os.path.join(output_dir, "configuration.yaml")
try:
with open(run_options_path, "w") as f:
try:
yaml.dump(dict(run_options._asdict()), f, sort_keys=False)
except TypeError: # Older versions of pyyaml don't support sort_keys
yaml.dump(dict(run_options._asdict()), f)
except FileNotFoundError:
logger.warning(
f"Unable to save configuration to {run_options_path}. Make sure the directory exists"
)
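As a quick check of what `write_run_options` produces, a sketch of reading the saved configuration back (the run id is hypothetical; the exact fields depend on `RunOptions`):

```python
import os
import yaml

run_id = "first3DBallRun"  # hypothetical
config_path = os.path.join("results", run_id, "configuration.yaml")

with open(config_path) as f:
    saved_options = yaml.safe_load(f)

print(sorted(saved_options))  # the RunOptions fields recorded for this run
```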


def write_timing_tree(summaries_dir: str, run_id: str) -> None:
timing_path = f"{summaries_dir}/{run_id}_timers.json"
def write_timing_tree(output_dir: str) -> None:
timing_path = os.path.join(output_dir, "timers.json")
try:
with open(timing_path, "w") as f:
json.dump(get_timer_tree(), f, indent=4)
@@ -462,6 +484,7 @@ def create_environment_factory(
seed: int,
start_port: int,
env_args: Optional[List[str]],
log_folder: str,
) -> Callable[[int, List[SideChannel]], BaseEnv]:
if env_path is not None:
launch_string = UnityEnvironment.validate_environment_path(env_path)
@@ -481,8 +504,9 @@ def create_unity_environment(
seed=env_seed,
no_graphics=no_graphics,
base_port=start_port,
args=env_args,
additional_args=env_args,
side_channels=side_channels,
log_folder=log_folder,
)

return create_unity_environment
5 changes: 3 additions & 2 deletions ml-agents/mlagents/trainers/policy/tf_policy.py
@@ -1,5 +1,6 @@
from typing import Any, Dict, List, Optional
import abc
import os
import numpy as np
from mlagents.tf_utils import tf
from mlagents import tf_utils
@@ -62,7 +63,7 @@ def __init__(self, seed, brain, trainer_parameters, load=False):
self.use_continuous_act = brain.vector_action_space_type == "continuous"
if self.use_continuous_act:
self.num_branches = self.brain.vector_action_space_size[0]
self.model_path = trainer_parameters["model_path"]
self.model_path = trainer_parameters["output_path"]
self.initialize_path = trainer_parameters.get("init_path", None)
self.keep_checkpoints = trainer_parameters.get("keep_checkpoints", 5)
self.graph = tf.Graph()
@@ -366,7 +367,7 @@ def save_model(self, steps):
:return:
"""
with self.graph.as_default():
last_checkpoint = self.model_path + "/model-" + str(steps) + ".ckpt"
last_checkpoint = os.path.join(self.model_path, f"model-{steps}.ckpt")
self.saver.save(self.sess, last_checkpoint)
tf.train.write_graph(
self.graph, self.model_path, "raw_graph_def.pb", as_text=False
3 changes: 1 addition & 2 deletions ml-agents/mlagents/trainers/ppo/trainer.py
@@ -62,9 +62,8 @@ def __init__(
"sequence_length",
"summary_freq",
"use_recurrent",
"summary_path",
"memory_size",
"model_path",
"output_path",
"reward_signals",
]
self._check_param_keys()
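For illustration only: the parameter dict the trainer validates (normally assembled by the trainer factory rather than written by hand) now carries a single `output_path` where it previously carried `model_path` and `summary_path`:

```python
import os

# Illustrative value; the other required keys listed above are unchanged.
trainer_parameters = {
    "output_path": os.path.join("results", "first3DBallRun", "3DBallLearning"),
    # ... remaining hyperparameters ...
}
```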