6 changes: 3 additions & 3 deletions docs/Feature-Monitor.md
@@ -9,13 +9,13 @@ You can track many different things both related and unrelated to the agents
themselves. By default, the Monitor is only active in the *inference* phase, so
not during training. To change this behavior, you can activate or deactivate it
by calling `SetActive(boolean)`. For example to also show the monitor during
training, you can call it in the `InitializeAcademy()` method of your `Academy`:
training, you can call it in the `Awake()` method of your `MonoBehaviour`:

```csharp
using MLAgents;

public class YourAcademy : Academy {
public override void InitializeAcademy()
public class MyBehaviour : MonoBehaviour {
public void Awake()
{
Monitor.SetActive(true);
}
}
```
14 changes: 1 addition & 13 deletions docs/Getting-Started-with-Balance-Ball.md
@@ -50,19 +50,7 @@ to speed up training since all twelve agents contribute to training in parallel.

### Academy

The Academy object for the scene is placed on the Ball3DAcademy GameObject. Since
the base Academy class is abstract, you must always define a subclass. There are
three functions you can implement, though they are all optional:

* Academy.InitializeAcademy() — Called once when the environment is launched.
* Academy.AcademyStep() — Called at every simulation step before
agent.AgentAction() (and after the Agents collect their observations).
* Academy.AcademyReset() — Called when the Academy starts or restarts the
simulation (including the first time).

The 3D Balance Ball environment does not use these functions — each Agent resets
itself when needed — but many environments do use these functions to control the
environment around the Agents.
The Academy object for the scene is placed on the Ball3DAcademy GameObject.

### Agent

51 changes: 6 additions & 45 deletions docs/Learning-Environment-Create-New.md
@@ -17,10 +17,8 @@ steps:
1. Create an environment for your agents to live in. An environment can range
from a simple physical simulation containing a few objects to an entire game
or ecosystem.
2. Implement an Academy subclass and add it to a GameObject in the Unity scene
containing the environment. Your Academy class can implement a few optional
methods to update the scene independently of any agents. For example, you can
add, move, or delete agents and other entities in the environment.
2. Add an Academy MonoBehaviour to a GameObject in the Unity scene
containing the environment.
3. Implement your Agent subclasses. An Agent subclass defines the code an Agent
uses to observe its environment, to carry out assigned actions, and to
calculate the rewards used for reinforcement training. You can also implement
@@ -115,46 +113,16 @@ component later in the tutorial.
You can adjust the camera angles to give a better view of the scene at runtime.
The next steps will be to create and add the ML-Agent components.

## Implement an Academy

## Add an Academy
The Academy object coordinates the ML-Agents in the scene and drives the
decision-making portion of the simulation loop. Every ML-Agent scene needs one
Academy instance. Since the base Academy class is abstract, you must make your
own subclass even if you don't need to use any of the methods for a particular
environment.
(and only one) Academy instance.

First, add a New Script component to the Academy GameObject created earlier:
First, add an Academy component to the Academy GameObject created earlier:

1. Select the Academy GameObject to view it in the Inspector window.
2. Click **Add Component**.
3. Click **New Script** in the list of components (at the bottom).
4. Name the script "RollerAcademy".
5. Click **Create and Add**.

Next, edit the new `RollerAcademy` script:

1. In the Unity Project window, double-click the `RollerAcademy` script to open
it in your code editor. (By default new scripts are placed directly in the
**Assets** folder.)
2. In the code editor, add the statement, `using MLAgents;`.
3. Change the base class from `MonoBehaviour` to `Academy`.
4. Delete the `Start()` and `Update()` methods that were added by default.

In such a basic scene, we don't need the Academy to initialize, reset, or
otherwise control any objects in the environment so we have the simplest
possible Academy implementation:

```csharp
using MLAgents;

public class RollerAcademy : Academy { }
```

The default settings for the Academy properties are also fine for this
environment, so we don't need to change anything for the RollerAcademy component
in the Inspector window.

![The Academy properties](images/mlagents-NewTutAcademy.png)
3. Select **Academy** in the list of components.

## Implement an Agent

@@ -179,13 +147,6 @@ So far, these are the basic steps that you would use to add ML-Agents to any
Unity project. Next, we will add the logic that will let our Agent learn to roll
to the cube using reinforcement learning.

In this simple scenario, we don't use the Academy object to control the
environment. If we wanted to change the environment, for example change the size
of the floor or add or remove agents or other objects before or during the
simulation, we could implement the appropriate methods in the Academy. Instead,
we will have the Agent do all the work of resetting itself and the target when
it succeeds or falls trying.

### Initialization and Resetting the Agent

When the Agent reaches its target, it marks itself done and its Agent reset
49 changes: 0 additions & 49 deletions docs/Learning-Environment-Design-Academy.md

This file was deleted.

58 changes: 34 additions & 24 deletions docs/Learning-Environment-Design.md
@@ -39,15 +39,14 @@ use.

The ML-Agents Academy class orchestrates the agent simulation loop as follows:

1. Calls your Academy subclass's `AcademyReset()` function.
1. Calls your Academy's `OnEnvironmentReset` delegate.
2. Calls the `AgentReset()` function for each Agent in the scene.
3. Calls the `CollectObservations()` function for each Agent in the scene.
4. Uses each Agent's Policy to decide on the Agent's next action.
5. Calls your subclass's `AcademyStep()` function.
6. Calls the `AgentAction()` function for each Agent in the scene, passing in
5. Calls the `AgentAction()` function for each Agent in the scene, passing in
the action chosen by the Agent's Policy. (This function is not called if the
Agent is done.)
7. Calls the Agent's `AgentOnDone()` function if the Agent has reached its `Max
6. Calls the Agent's `AgentOnDone()` function if the Agent has reached its `Max
Step` count or has otherwise marked itself as `done`. Optionally, you can set
an Agent to restart if it finishes before the end of an episode. In this
case, the Academy calls the `AgentReset()` function.
@@ -57,7 +56,7 @@ implement the above methods. The `Agent.CollectObservations()` and
`Agent.AgentAction()` functions are required; the other methods are optional —
whether you need to implement them or not depends on your specific scenario.
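
As a rough orientation, a minimal Agent sketch might look like the following. This is a hypothetical example, not from the original docs, assuming the 0.13/0.14-era C# API (`AddVectorObs()` for observations, `AgentAction(float[])` for continuous actions); the class name and reward scheme are illustrative only:

```csharp
using MLAgents;
using UnityEngine;

public class MinimalAgent : Agent
{
    public override void AgentReset()
    {
        // Called when the episode (re)starts: reposition the Agent.
        transform.localPosition = Vector3.zero;
    }

    public override void CollectObservations()
    {
        // Called before each decision: adds 3 floats to the vector observation.
        AddVectorObs(transform.localPosition);
    }

    public override void AgentAction(float[] vectorAction)
    {
        // Called with the action chosen by the Policy (assumed: 2 continuous actions).
        transform.Translate(vectorAction[0] * Time.deltaTime, 0f,
                            vectorAction[1] * Time.deltaTime);
        SetReward(0.01f);
    }
}
```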

**Note:** The API used by the Python PPO training process to communicate with
**Note:** The API used by the Python training process to communicate with
and control the Academy during training can be used for other purposes as well.
For example, you could use the API to use Unity as the simulation engine for
your own machine learning algorithms. See [Python API](Python-API.md) for more
@@ -66,32 +65,43 @@ information.
## Organizing the Unity Scene

To train and use the ML-Agents toolkit in a Unity scene, the scene must contain
a single Academy subclass and as many Agent subclasses
as you need.
a single Academy and as many Agent subclasses as you need.
Agent instances should be attached to the GameObject representing that Agent.

### Academy

The Academy object orchestrates Agents and their decision making processes. Only
place a single Academy object in a scene.

You must create a subclass of the Academy class (since the base class is
abstract). When you create your Academy subclass, you can implement the
following methods (all are optional):

* `InitializeAcademy()` — Prepare the environment the first time it launches.
* `AcademyReset()` — Prepare the environment and Agents for the next training
episode. Use this function to place and initialize entities in the scene as
necessary.
* `AcademyStep()` — Prepare the environment for the next simulation step. The
base Academy class calls this function before calling any `AgentAction()`
methods for the current step. You can use this function to update other
objects in the scene before the Agents take their actions. Note that the
Agents have already collected their observations and chosen an action before
the Academy invokes this method.

See [Academy](Learning-Environment-Design-Academy.md) for a complete list of
the Academy properties and their uses.
#### Academy resetting

**Contributor Author:** Moved this section from docs/Learning-Environment-Design-Academy.md (now defunct)

To alter the environment at the start of each episode, add your method to the Academy's `OnEnvironmentReset` action.

```csharp
using MLAgents;
using UnityEngine;

public class MySceneBehavior : MonoBehaviour
{
    public void Awake()
    {
        var academy = FindObjectOfType<Academy>();
        academy.LazyInitialization();
        academy.OnEnvironmentReset += EnvironmentReset;
    }

    void EnvironmentReset()
    {
        // Reset the scene here
    }
}
```

For example, you might want to reset an Agent to its starting
position or move a goal to a random position. An environment resets when the
`reset()` method is called on the Python `UnityEnvironment`.

When you reset an environment, consider the factors that should change so that
training is generalizable to different conditions. For example, if you were
training a maze-solving agent, you would probably want to change the maze itself
for each training episode. Otherwise, the agent would probably only learn to
solve one particular maze, not mazes in general.
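
For instance, a hypothetical variant of the behaviour above could move a goal object to a random position on every reset; `goal` is an assumed `Transform` reference assigned in the Inspector, not part of the ML-Agents API:

```csharp
using MLAgents;
using UnityEngine;

public class RandomizedSceneBehavior : MonoBehaviour
{
    public Transform goal;  // assumed: a scene object assigned in the Inspector

    public void Awake()
    {
        var academy = FindObjectOfType<Academy>();
        academy.LazyInitialization();
        academy.OnEnvironmentReset += EnvironmentReset;
    }

    void EnvironmentReset()
    {
        // Move the goal somewhere new so the agent cannot memorize a single layout.
        goal.position = new Vector3(Random.Range(-4f, 4f), 0.5f, Random.Range(-4f, 4f));
    }
}
```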

### Agent

4 changes: 1 addition & 3 deletions docs/ML-Agents-Overview.md
@@ -139,9 +139,7 @@ organize the Unity scene:
receives and assigning a reward (positive / negative) when appropriate. Each
Agent is linked to a Policy.
- **Academy** - which orchestrates the observation and decision making process.
Within the Academy, several environment-wide parameters such as the rendering
quality and the speed at which the environment is run can be specified. The
External Communicator lives within the Academy.
The External Communicator lives within the Academy.

**Contributor Author:** Old cleanup

Every Learning Environment will always have one global Academy and one Agent for
every character in the scene. While each Agent must be linked to a Policy, it is
23 changes: 19 additions & 4 deletions docs/Migrating.md
@@ -10,10 +10,19 @@ The versions can be found in
## Migrating from 0.13 to latest

### Important changes
* The Academy class was changed to be sealed and its virtual methods were removed.
* Trainer steps are now counted per-Agent, not per-environment as in previous versions. For instance, if you have 10 Agents in the scene, 20 environment steps now correspond to 200 trainer steps as printed in the terminal and in TensorBoard.

### Steps to Migrate
* Multiply `max_steps` and `summary_steps` in your `trainer_config.yaml` by the number of Agents in the scene.
* If you have a class that inherits from Academy:
  * If the class didn't override any of the virtual methods and didn't store any additional data, you can just replace the instance of it in the scene with an Academy.
  * If the class had additional data, create a new MonoBehaviour and store the data on this instead.
  * If the class overrode the virtual methods, create a new MonoBehaviour and move the logic to it (see the sketch after this list):
    * Move the `InitializeAcademy()` code to `MonoBehaviour.Awake()`.
    * Move the `AcademyStep()` code to `MonoBehaviour.FixedUpdate()`.
    * Move the `OnDestroy()` code to `MonoBehaviour.OnDestroy()` or add it to the `Academy.DestroyAction` action.
    * Move the `AcademyReset()` code to a new method and add it to the `Academy.OnEnvironmentReset` action.
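
A possible shape for that replacement MonoBehaviour is sketched below. This is a non-authoritative sketch: the class name and helper method are placeholders, and only `FindObjectOfType<Academy>()`, `LazyInitialization()`, and `OnEnvironmentReset` come from the API described above.

```csharp
using MLAgents;
using UnityEngine;

// Was: public class MyAcademy : Academy
public class MyAcademyLogic : MonoBehaviour
{
    public void Awake()
    {
        // Former InitializeAcademy() code goes here.
        var academy = FindObjectOfType<Academy>();
        academy.LazyInitialization();
        academy.OnEnvironmentReset += EnvironmentReset;
    }

    void FixedUpdate()
    {
        // Former AcademyStep() code goes here.
    }

    void OnDestroy()
    {
        // Former OnDestroy() code goes here.
    }

    void EnvironmentReset()
    {
        // Former AcademyReset() code goes here.
    }
}
```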

## Migrating from ML-Agents toolkit v0.12.0 to v0.13.0

@@ -22,7 +31,8 @@ The versions can be found in
* `reset()` on the Low-Level Python API no longer takes a `train_mode` argument. To modify the performance/speed of the engine, you must use an `EngineConfigurationChannel`
* `reset()` on the Low-Level Python API no longer takes a `config` argument. `UnityEnvironment` no longer has a `reset_parameters` field. To modify float properties in the environment, you must use a `FloatPropertiesChannel`. For more information, refer to the [Low Level Python API documentation](Python-API.md)
* `CustomResetParameters` are now removed.
* The Academy no longer has a `Training Configuration` nor `Inference Configuration` field in the inspector. To modify the configuration from the Low-Level Python API, use an `EngineConfigurationChannel`. To modify it during training, use the new command line arguments `--width`, `--height`, `--quality-level`, `--time-scale` and `--target-frame-rate` in `mlagents-learn`.
* The Academy no longer has a `Training Configuration` nor `Inference Configuration` field in the inspector. To modify the configuration from the Low-Level Python API, use an `EngineConfigurationChannel`.
To modify it during training, use the new command line arguments `--width`, `--height`, `--quality-level`, `--time-scale` and `--target-frame-rate` in `mlagents-learn`.
* The Academy no longer has a `Default Reset Parameters` field in the inspector. The Academy class no longer has a `ResetParameters` field. To access shared float properties with Python, use the new `FloatProperties` field on the Academy (see the sketch after this list).
* Offline Behavioral Cloning has been removed. To learn from demonstrations, use the GAIL and
Behavioral Cloning features with either PPO or SAC. See [Imitation Learning](Training-Imitation-Learning.md) for more information.
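
On the C# side, reading one of these shared float properties could look roughly like the sketch below. This assumes the `FloatProperties` field exposes a `GetPropertyWithDefault(key, defaultValue)` accessor, as in that era's side-channel API; `"my_parameter"` is a made-up key:

```csharp
using MLAgents;
using UnityEngine;

public class ReadSharedProperty : MonoBehaviour
{
    void Start()
    {
        var academy = FindObjectOfType<Academy>();
        academy.LazyInitialization();
        // Falls back to 1.0f if the Python side never set "my_parameter".
        float value = academy.FloatProperties.GetPropertyWithDefault("my_parameter", 1.0f);
        Debug.Log("my_parameter = " + value);
    }
}
```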
@@ -46,7 +56,9 @@ Behavioral Cloning features with either PPO or SAC. See [Imitation Learning](Tra
* Barracuda was upgraded to 0.3.2, and it is now installed via the Unity Package Manager.

### Steps to Migrate
* We [fixed a bug](https://github.com/Unity-Technologies/ml-agents/pull/2823) in `RayPerception3d.Perceive()` that was causing the `endOffset` to be used incorrectly. However, this may produce different behavior from previous versions if you use a non-zero `startOffset`. To reproduce the old behavior, you should increase the value of `endOffset` by `startOffset`. You can verify your raycasts are performing as expected in scene view using the debug rays.
* We [fixed a bug](https://github.com/Unity-Technologies/ml-agents/pull/2823) in `RayPerception3d.Perceive()` that was causing the `endOffset` to be used incorrectly. However, this may produce different behavior from previous versions if you use a non-zero `startOffset`.
To reproduce the old behavior, you should increase the value of `endOffset` by `startOffset`.
You can verify your raycasts are performing as expected in scene view using the debug rays.
* If you use RayPerception3D, replace it with RayPerceptionSensorComponent3D (and similarly for 2D). The settings, such as ray angles and detectable tags, are configured on the component now.
RayPerception3D would contribute `(# of rays) * (# of tags + 2)` to the State Size in Behavior Parameters, but this is no longer necessary, so you should reduce the State Size by this amount.
Making this change will require retraining your model, since the observations that RayPerceptionSensorComponent3D produces are different from the old behavior.
@@ -68,7 +80,8 @@ Making this change will require retraining your model, since the observations th
#### Steps to Migrate
* In order to be able to train, make sure both your ML-Agents Python package and UnitySDK code come from the v0.11 release. Training will not work, for example, if you update the ML-Agents Python package, and only update the API Version in UnitySDK.
* If your Agents used visual observations, you must add a CameraSensorComponent corresponding to each old Camera in the Agent's camera list (and similarly for RenderTextures).
* Since Brain ScriptableObjects have been removed, you will need to delete all the Brain ScriptableObjects from your `Assets` folder. Then, add a `Behavior Parameters` component to each `Agent` GameObject. You will then need to complete the fields on the new `Behavior Parameters` component with the BrainParameters of the old Brain.
* Since Brain ScriptableObjects have been removed, you will need to delete all the Brain ScriptableObjects from your `Assets` folder. Then, add a `Behavior Parameters` component to each `Agent` GameObject.
You will then need to complete the fields on the new `Behavior Parameters` component with the BrainParameters of the old Brain.

## Migrating from ML-Agents toolkit v0.9 to v0.10

@@ -79,7 +92,9 @@ Making this change will require retraining your model, since the observations th
#### Steps to Migrate
* `UnitySDK/Assets/ML-Agents/Scripts/Communicator.cs` and its class `Communicator` have been renamed to `UnitySDK/Assets/ML-Agents/Scripts/ICommunicator.cs` and `ICommunicator` respectively.
* The `SpaceType` Enums `discrete`, and `continuous` have been renamed to `Discrete` and `Continuous`.
* We have removed the `Done` call as well as the capacity to set `Max Steps` on the Academy. Therefore an AcademyReset will never be triggered from C# (only from Python). If you want to reset the simulation after a fixed number of steps, or when an event in the simulation occurs, we recommend looking at our multi-agent example environments (such as BananaCollector). In our examples, groups of Agents can be reset through an "Area" that can reset groups of Agents.
* We have removed the `Done` call as well as the capacity to set `Max Steps` on the Academy. Therefore an AcademyReset will never be triggered from C# (only from Python). If you want to reset the simulation after a
fixed number of steps, or when an event in the simulation occurs, we recommend looking at our multi-agent example environments (such as BananaCollector).
In our examples, groups of Agents can be reset together through an "Area" object.
* The import for `mlagents.envs.UnityEnvironment` was removed. If you are using the Python API, change `from mlagents_envs import UnityEnvironment` to `from mlagents_envs.environment import UnityEnvironment`.

