[refactor] Make PyTorch the default and TensorFlow optional #4517

Merged: 23 commits, Oct 21, 2020
Commits:
- 44df681: Torch setup.py (Sep 25, 2020)
- 9df44fd: Set torch to default (Sep 25, 2020)
- 19a88ee: Make torch default in setup.py (Sep 25, 2020)
- f5761f6: Remove indents (Sep 25, 2020)
- c0d9b81: Remove other instances of TF being used (Sep 25, 2020)
- 978c52f: Add tensorboard to setup.py (Sep 26, 2020)
- c7303f0: Adding correst setup commands for verifying torch is installed (#4524) (vincentpierre, Oct 1, 2020)
- 86faff2: Develop torchdefault raise outside setup (#4530) (vincentpierre, Oct 1, 2020)
- 7419374: Merge branch 'master' into develop-torchdefault (Oct 19, 2020)
- 3c21600: [refactor] Use PyTorch TensorBoard utils (#4518) (Oct 19, 2020)
- fe0cfbf: [Docs] Initial documentation changes for making Torch the default (#4… (vincentpierre, Oct 20, 2020)
- 03f7e79: [refactor] Add --tensorflow, enable Torch as default setting (#4582) (Oct 20, 2020)
- ad958ca: Modify Yamato tests (#4584) (Oct 20, 2020)
- 78bd740: Don't check for PB file in Yamato inference (Oct 20, 2020)
- 0e038ec: Merge branch 'develop-torchdefault' of github.com:Unity-Technologies/… (Oct 20, 2020)
- 5ab0ca0: Only run inference on ONNX (Oct 20, 2020)
- 45e197e: Update docs/Unity-Inference-Engine.md with correct tf2onnx versions (Oct 20, 2020)
- b8b91e1: Add reward signal class comments (Oct 20, 2020)
- 7f7c573: More descriptive import of is_available (Oct 20, 2020)
- 20527c7: Merge branch 'develop-torchdefault' of github.com:Unity-Technologies/… (Oct 20, 2020)
- 2063d71: Updated installation instructions for PyTorch (Oct 20, 2020)
- 7814a25: More reward signal comments (Oct 20, 2020)
- 9197835: More Windows instructions (Oct 20, 2020)
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/bug_report.md
@@ -27,7 +27,7 @@ If applicable, add screenshots to help explain your problem.
- Unity Version: [e.g. Unity 2020.1f1]
- OS + version: [e.g. Windows 10]
- _ML-Agents version_: (e.g. ML-Agents v0.8, or latest `develop` branch from source)
- _TensorFlow version_: (you can run `pip3 show tensorflow` to get this)
- _Torch version_: (you can run `pip3 show torch` to get this)
- _Environment_: (which example environment you used to reproduce the error)

**NOTE:** We are unable to help reproduce bugs with custom environments. Please attempt to reproduce your issue with one of the example environments, or provide a minimal patch to one of the environments needed to reproduce the issue.
2 changes: 1 addition & 1 deletion README.md
@@ -13,7 +13,7 @@
project that enables games and simulations to serve as environments for
training intelligent agents. Agents can be trained using reinforcement learning,
imitation learning, neuroevolution, or other machine learning methods through a
simple-to-use Python API. We also provide implementations (based on TensorFlow)
simple-to-use Python API. We also provide implementations (based on PyTorch)
of state-of-the-art algorithms to enable game developers and hobbyists to easily
train intelligent agents for 2D, 3D and VR/AR games. These trained agents can be
used for multiple purposes, including controlling NPC behavior (in a variety of
5 changes: 5 additions & 0 deletions com.unity.ml-agents/CHANGELOG.md
@@ -12,6 +12,11 @@ and this project adheres to
### Major Changes
#### com.unity.ml-agents (C#)
#### ml-agents / ml-agents-envs / gym-unity (Python)
- PyTorch trainers are now the default. See the
[installation docs](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Installation.md) for
more information on installing PyTorch. For the time being, TensorFlow is still available;
you can use the TensorFlow backend by adding `--tensorflow` to the CLI, or
adding `framework: tensorflow` in the configuration YAML. (#4517)
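
The two ways of selecting the backend described above can be pictured with a small sketch. This is not the ML-Agents implementation: `resolve_framework` and `DEFAULT_FRAMEWORK` are hypothetical names, and the precedence shown (the CLI flag applies to every behavior, the YAML `framework` key applies to one behavior) is an assumption based on how the changelog entry describes the two options.

```python
from typing import Mapping, Optional

# PyTorch is the default backend after this change.
DEFAULT_FRAMEWORK = "pytorch"


def resolve_framework(
    cli_tensorflow: bool,
    behavior_config: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: choose the backend for a single behavior."""
    if cli_tensorflow:
        # Assumption: `--tensorflow` on the CLI applies to all behaviors.
        return "tensorflow"
    if behavior_config and "framework" in behavior_config:
        # Per-behavior setting taken from the configuration YAML.
        return behavior_config["framework"]
    return DEFAULT_FRAMEWORK
```

With no flag and no `framework` key, the sketch falls back to PyTorch, which matches the new default.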

### Minor Changes
#### com.unity.ml-agents (C#)
2 changes: 1 addition & 1 deletion docs/Background-Machine-Learning.md
@@ -194,4 +194,4 @@ we can learn policies for very complex environments (a complex environment is
one where the number of observations an agent perceives and the number of
actions they can take are large). Many of the algorithms we provide in ML-Agents
use some form of deep learning, built on top of the open-source library,
[TensorFlow](Background-TensorFlow.md).
[PyTorch](Background-PyTorch.md).
18 changes: 9 additions & 9 deletions docs/Background-TensorFlow.md → docs/Background-PyTorch.md
@@ -1,29 +1,29 @@
# Background: TensorFlow
# Background: PyTorch

As discussed in our
[machine learning background page](Background-Machine-Learning.md), many of the
algorithms we provide in the ML-Agents Toolkit leverage some form of deep
learning. More specifically, our implementations are built on top of the
open-source library [TensorFlow](https://www.tensorflow.org/). In this page we
provide a brief overview of TensorFlow, in addition to TensorFlow-related tools
open-source library [PyTorch](https://pytorch.org/). In this page we
provide a brief overview of PyTorch and TensorBoard
that we leverage within the ML-Agents Toolkit.

## TensorFlow
## PyTorch

[TensorFlow](https://www.tensorflow.org/) is an open source library for
[PyTorch](https://pytorch.org/) is an open source library for
performing computations using data flow graphs, the underlying representation of
deep learning models. It facilitates training and inference on CPUs and GPUs in
a desktop, server, or mobile device. Within the ML-Agents Toolkit, when you
train the behavior of an agent, the output is a model (.nn) file that you can
train the behavior of an agent, the output is a model (.onnx) file that you can
then associate with an Agent. Unless you implement a new algorithm, the use of
TensorFlow is mostly abstracted away and behind the scenes.
PyTorch is mostly abstracted away and behind the scenes.

## TensorBoard

One component of training models with TensorFlow is setting the values of
One component of training models with PyTorch is setting the values of
certain model attributes (called _hyperparameters_). Finding the right values of
these hyperparameters can require a few iterations. Consequently, we leverage a
visualization tool within TensorFlow called
visualization tool called
[TensorBoard](https://www.tensorflow.org/programmers_guide/summaries_and_tensorboard).
It allows the visualization of certain agent attributes (e.g. reward) throughout
training which can be helpful in both building intuitions for the different
10 changes: 5 additions & 5 deletions docs/Getting-Started.md
@@ -91,7 +91,7 @@ itself to keep the ball balanced on its head.

## Running a pre-trained model

We include pre-trained models for our agents (`.nn` files) and we use the
We include pre-trained models for our agents (`.onnx` files) and we use the
[Unity Inference Engine](Unity-Inference-Engine.md) to run these models inside
Unity. In this section, we will use the pre-trained model for the 3D Ball
example.
@@ -124,7 +124,7 @@ example.

## Training a new model with Reinforcement Learning

While we provide pre-trained `.nn` files for the agents in this environment, any
While we provide pre-trained models for the agents in this environment, any
environment you make yourself will require training agents from scratch to
generate a new model file. In this section we will demonstrate how to use the
reinforcement learning algorithms that are part of the ML-Agents Python package
@@ -229,7 +229,7 @@ Once the training process completes, and the training process saves the model
use it with compatible Agents (the Agents that generated the model). **Note:**
Do not just close the Unity Window once the `Saved Model` message appears.
Either wait for the training process to close the window or press `Ctrl+C` at
the command-line prompt. If you close the window manually, the `.nn` file
the command-line prompt. If you close the window manually, the `.onnx` file
containing the trained model is not exported into the ml-agents folder.

If you've quit the training early using `Ctrl+C` and want to resume training,
@@ -239,7 +239,7 @@ run the same command again, appending the `--resume` flag:
mlagents-learn config/ppo/3DBall.yaml --run-id=first3DBallRun --resume
```

Your trained model will be at `results/<run-identifier>/<behavior_name>.nn` where
Your trained model will be at `results/<run-identifier>/<behavior_name>.onnx` where
`<behavior_name>` is the name of the `Behavior Name` of the agents corresponding
to the model. This file corresponds to your model's latest checkpoint. You can
now embed this trained model into your Agents by following the steps below,
@@ -249,7 +249,7 @@ which is similar to the steps described [above](#running-a-pre-trained-model).
`Project/Assets/ML-Agents/Examples/3DBall/TFModels/`.
1. Open the Unity Editor, and select the **3DBall** scene as described above.
1. Select the **3DBall** prefab Agent object.
1. Drag the `<behavior_name>.nn` file from the Project window of the Editor to
1. Drag the `<behavior_name>.onnx` file from the Project window of the Editor to
the **Model** placeholder in the **Ball3DAgent** inspector window.
1. Press the **Play** button at the top of the Editor.
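
As a rough illustration of where the exported model ends up, the sketch below builds the `results/<run-identifier>/<behavior_name>.onnx` path described above. `trained_model_path` is a hypothetical helper, not part of the `mlagents` package, and the behavior name `3DBall` is only a placeholder for whatever `Behavior Name` your agents use.

```python
from pathlib import PurePosixPath


def trained_model_path(
    run_id: str, behavior_name: str, results_dir: str = "results"
) -> PurePosixPath:
    """Hypothetical helper: path of the final .onnx model for one behavior."""
    return PurePosixPath(results_dir) / run_id / f"{behavior_name}.onnx"


# Placeholder run id and behavior name, matching the example command above.
print(trained_model_path("first3DBallRun", "3DBall"))
# prints "results/first3DBallRun/3DBall.onnx"
```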

24 changes: 18 additions & 6 deletions docs/Installation.md
@@ -44,11 +44,6 @@ If your Python environment doesn't include `pip3`, see these
[instructions](https://packaging.python.org/guides/installing-using-linux-tools/#installing-pip-setuptools-wheel-with-linux-package-managers)
on installing it.

Although we do not provide support for Anaconda installation on Windows, the
previous
[Windows Anaconda Installation (Deprecated) guide](Installation-Anaconda-Windows.md)
is still available.

### Clone the ML-Agents Toolkit Repository (Optional)

Now that you have installed Unity and Python, you can now install the Unity and
@@ -124,6 +119,22 @@ Virtual Environments. Virtual Environments provide a mechanism for isolating the
dependencies for each project and are supported on Mac / Windows / Linux. We
offer a dedicated [guide on Virtual Environments](Using-Virtual-Environment.md).

#### (Windows) Installing PyTorch

On Windows, you'll have to install the PyTorch package separately prior to
installing ML-Agents. Activate your virtual environment and run from the command line:

```sh
pip3 install torch -f https://download.pytorch.org/whl/torch_stable.html
```

Note that on Windows, you may also need Microsoft's
[Visual C++ Redistributable](https://support.microsoft.com/en-us/help/2977003/the-latest-supported-visual-c-downloads)
if you don't have it already. See the [PyTorch installation guide](https://pytorch.org/get-started/locally/)
for more installation options and versions.
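
Before moving on to installing `mlagents`, it can help to confirm that PyTorch is visible in the active virtual environment. The following is a minimal sketch using only the Python standard library; `is_package_installed` is a hypothetical helper name, not something ML-Agents provides.

```python
import importlib.util


def is_package_installed(name: str) -> bool:
    """Return True if a top-level package can be located without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if is_package_installed("torch"):
        print("torch found; you can install mlagents next.")
    else:
        print("torch not found; install it first with the pip3 command above.")
```

Running `pip3 show torch` from the command line gives similar information, including the installed version.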

#### Installing `mlagents`

To install the `mlagents` Python package, activate your virtual environment and
run from the command line:

@@ -138,7 +149,7 @@ line parameters you can use with `mlagents-learn`.

By installing the `mlagents` package, the dependencies listed in the
[setup.py file](../ml-agents/setup.py) are also installed. These include
[TensorFlow](Background-TensorFlow.md) (Requires a CPU w/ AVX support).
[PyTorch](Background-PyTorch.md) (Requires a CPU w/ AVX support).

#### Advanced: Local Installation for Development

@@ -148,6 +159,7 @@ this, you will need to install `mlagents` and `mlagents_envs` separately. From
the repository's root directory, run:

```sh
pip3 install torch -f https://download.pytorch.org/whl/torch_stable.html
pip3 install -e ./ml-agents-envs
pip3 install -e ./ml-agents
```
4 changes: 2 additions & 2 deletions docs/Learning-Environment-Executable.md
@@ -171,7 +171,7 @@ INFO:mlagents.trainers: first-run-0: Ball3DLearning: Step: 10000. Mean Reward: 2
```

You can press Ctrl+C to stop the training, and your trained model will be at
`results/<run-identifier>/<behavior_name>.nn`, which corresponds to your model's
`results/<run-identifier>/<behavior_name>.onnx`, which corresponds to your model's
latest checkpoint. (**Note:** There is a known bug on Windows that causes the
saving of the model to fail when you terminate the training early; it's
recommended to wait until Step has reached the max_steps parameter you set in
@@ -182,6 +182,6 @@ following the steps below:
`Project/Assets/ML-Agents/Examples/3DBall/TFModels/`.
1. Open the Unity Editor, and select the **3DBall** scene as described above.
1. Select the **3DBall** prefab from the Project window and select **Agent**.
1. Drag the `<behavior_name>.nn` file from the Project window of the Editor to
1. Drag the `<behavior_name>.onnx` file from the Project window of the Editor to
the **Model** placeholder in the **Ball3DAgent** inspector window.
1. Press the **Play** button at the top of the Editor.
8 changes: 4 additions & 4 deletions docs/ML-Agents-Overview.md
@@ -35,7 +35,7 @@ open-source project that enables games and simulations to serve as environments
for training intelligent agents. Agents can be trained using reinforcement
learning, imitation learning, neuroevolution, or other machine learning methods
through a simple-to-use Python API. We also provide implementations (based on
TensorFlow) of state-of-the-art algorithms to enable game developers and
PyTorch) of state-of-the-art algorithms to enable game developers and
hobbyists to easily train intelligent agents for 2D, 3D and VR/AR games. These
trained agents can be used for multiple purposes, including controlling NPC
behavior (in a variety of settings such as multi-agent and adversarial),
@@ -51,9 +51,9 @@ transition to the ML-Agents Toolkit easier, we provide several background pages
that include overviews and helpful resources on the
[Unity Engine](Background-Unity.md),
[machine learning](Background-Machine-Learning.md) and
[TensorFlow](Background-TensorFlow.md). We **strongly** recommend browsing the
[PyTorch](Background-PyTorch.md). We **strongly** recommend browsing the
relevant background pages if you're not familiar with a Unity scene, basic
machine learning concepts or have not previously heard of TensorFlow.
machine learning concepts or have not previously heard of PyTorch.

The remainder of this page contains a deep dive into ML-Agents, its key
components, different training modes and scenarios. By the end of it, you should
@@ -280,7 +280,7 @@ for additional information.

### Custom Training and Inference

In the previous mode, the Agents were used for training to generate a TensorFlow
In the previous mode, the Agents were used for training to generate a PyTorch
model that the Agents can later use. However, any user of the ML-Agents Toolkit
can leverage their own algorithms for training. In this case, the behaviors of
all the Agents in the scene will be controlled within Python. You can even turn
2 changes: 1 addition & 1 deletion docs/Readme.md
@@ -11,7 +11,7 @@
- [ML-Agents Toolkit Overview](ML-Agents-Overview.md)
- [Background: Unity](Background-Unity.md)
- [Background: Machine Learning](Background-Machine-Learning.md)
- [Background: TensorFlow](Background-TensorFlow.md)
- [Background: PyTorch](Background-PyTorch.md)
- [Example Environments](Learning-Environment-Examples.md)

## Creating Learning Environments
2 changes: 1 addition & 1 deletion docs/Training-Configuration-File.md
@@ -32,7 +32,7 @@ choice of the trainer (which we review on subsequent sections).
| `time_horizon` | (default = `64`) How many steps of experience to collect per-agent before adding it to the experience buffer. When this limit is reached before the end of an episode, a value estimate is used to predict the overall expected reward from the agent's current state. As such, this parameter trades off between a less biased, but higher variance estimate (long time horizon) and more biased, but less varied estimate (short time horizon). In cases where there are frequent rewards within an episode, or episodes are prohibitively large, a smaller number can be more ideal. This number should be large enough to capture all the important behavior within a sequence of an agent's actions. <br><br> Typical range: `32` - `2048` |
| `max_steps` | (default = `500000`) Total number of steps (i.e., observation collected and action taken) that must be taken in the environment (or across all environments if using multiple in parallel) before ending the training process. If you have multiple agents with the same behavior name within your environment, all steps taken by those agents will contribute to the same `max_steps` count. <br><br>Typical range: `5e5` - `1e7` |
| `keep_checkpoints` | (default = `5`) The maximum number of model checkpoints to keep. Checkpoints are saved after the number of steps specified by the checkpoint_interval option. Once the maximum number of checkpoints has been reached, the oldest checkpoint is deleted when saving a new checkpoint. |
| `checkpoint_interval` | (default = `500000`) The number of experiences collected between each checkpoint by the trainer. A maximum of `keep_checkpoints` checkpoints are saved before old ones are deleted. Each checkpoint saves the `.nn` (and `.onnx` if applicable) files in `results/` folder.|
| `checkpoint_interval` | (default = `500000`) The number of experiences collected between each checkpoint by the trainer. A maximum of `keep_checkpoints` checkpoints are saved before old ones are deleted. Each checkpoint saves the `.onnx` (and `.nn` if using TensorFlow) files in the `results/` folder. |
| `init_path` | (default = None) Initialize trainer from a previously saved model. Note that the prior run should have used the same trainer configurations as the current run, and have been saved with the same version of ML-Agents. <br><br>You should provide the full path to the folder where the checkpoints were saved, e.g. `./models/{run-id}/{behavior_name}`. This option is provided in case you want to initialize different behaviors from different runs; in most cases, it is sufficient to use the `--initialize-from` CLI parameter to initialize all models from the same run. |
| `threaded` | (default = `true`) By default, model updates can happen while the environment is being stepped. This violates the [on-policy](https://spinningup.openai.com/en/latest/user/algorithms.html#the-on-policy-algorithms) assumption of PPO slightly in exchange for a training speedup. To maintain the strict on-policyness of PPO, you can disable parallel updates by setting `threaded` to `false`. There is usually no reason to turn `threaded` off for SAC. |
| `hyperparameters -> learning_rate` | (default = `3e-4`) Initial learning rate for gradient descent. Corresponds to the strength of each gradient descent update step. This should typically be decreased if training is unstable, and the reward does not consistently increase. <br><br>Typical range: `1e-5` - `1e-3` |
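
The interaction between `keep_checkpoints` and `checkpoint_interval` in the table above can be sketched as a fixed-size window over checkpoint names. This is only an illustration of the described behavior (the oldest checkpoint is dropped once the limit is exceeded); `CheckpointWindow` is a hypothetical class and does not mirror the trainer's actual code.

```python
from collections import deque
from typing import Deque, List


class CheckpointWindow:
    """Hypothetical sketch: keep at most `keep_checkpoints` checkpoint names."""

    def __init__(self, keep_checkpoints: int = 5) -> None:
        self.keep_checkpoints = keep_checkpoints
        self.kept: Deque[str] = deque()

    def add(self, name: str) -> List[str]:
        """Record a new checkpoint and return the names that should be deleted."""
        self.kept.append(name)
        removed: List[str] = []
        # Drop the oldest entries until we are back within the limit.
        while len(self.kept) > self.keep_checkpoints:
            removed.append(self.kept.popleft())
        return removed
```

With `keep_checkpoints=2`, adding a third checkpoint returns the first one as the name to delete, leaving the two most recent.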
35 changes: 1 addition & 34 deletions docs/Training-ML-Agents.md
@@ -16,7 +16,6 @@
- [Curriculum Learning](#curriculum)
- [Training with a Curriculum](#training-with-a-curriculum)
- [Training Using Concurrent Unity Instances](#training-using-concurrent-unity-instances)
- [Using PyTorch (Experimental)](#using-pytorch-experimental)

For a broad overview of reinforcement learning, imitation learning and all the
training scenarios, methods and options within the ML-Agents Toolkit, see
@@ -88,7 +87,7 @@ in the `results/<run-identifier>` folder:
values. See [Using TensorBoard](Using-Tensorboard.md) for more details on how
to visualize the training metrics.
1. Models: these contain the model checkpoints that
are updated throughout training and the final model file (`.nn`). This final
are updated throughout training and the final model file (`.onnx`). This final
model file is generated once either when training completes or is
interrupted.
1. Timers file (under `results/<run-identifier>/run_logs`): this contains aggregated
@@ -556,35 +555,3 @@ Some considerations:
- **Result Variation Using Concurrent Unity Instances** - If you keep all the
hyperparameters the same, but change `--num-envs=<n>`, the results and model
would likely change.

### Using PyTorch (Experimental)

ML-Agents, by default, uses TensorFlow as its backend, but experimental support
for PyTorch has been added. To use PyTorch, the `torch` Python package must
be installed, and PyTorch must be enabled for your trainer.

#### Installing PyTorch

If you've already installed ML-Agents, follow the
[official PyTorch install instructions](https://pytorch.org/get-started/locally/) for
your platform and configuration. Note that on Windows, you may also need Microsoft's
[Visual C++ Redistributable](https://support.microsoft.com/en-us/help/2977003/the-latest-supported-visual-c-downloads) if you don't have it already.

If you're installing or upgrading ML-Agents on Linux or Mac, you can also run
`pip3 install mlagents[torch]` instead of `pip3 install mlagents`
during [installation](Installation.md). On Windows, install ML-Agents first and then
separately install PyTorch.

#### Enabling PyTorch

PyTorch can be enabled in one of two ways. First, by adding `--torch` to the
`mlagents-learn` command. This will make all behaviors train with PyTorch.

Second, by changing the `framework` option for your agent behavior in the
configuration YAML as below. This will use PyTorch just for that behavior.

```yaml
behaviors:
  YourAgentBehavior:
    framework: pytorch
```