Calling EndEpisode inside CollectObservations results in a recursive call to CollectObservations

Calling EndEpisode inside CollectObservations results in a nested (recursive) CollectObservations call. The stack trace looks like this:
```
16 CollectObservations called 
UnityEngine.Debug:Log(Object)
Ball3DAgent:CollectObservations(VectorSensor) (at Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DAgent.cs:52)
Unity.MLAgents.Agent:NotifyAgentDone(DoneReason) (at D:/Workspace/Unity/MLAgentsColossus_UpgradeToRelease_6/CleanRelease7/com.unity.ml-agents/Runtime/Agent.cs:515)
Unity.MLAgents.Agent:EndEpisodeAndReset(DoneReason) (at D:/Workspace/Unity/MLAgentsColossus_UpgradeToRelease_6/CleanRelease7/com.unity.ml-agents/Runtime/Agent.cs:731)
Unity.MLAgents.Agent:EndEpisode() (at D:/Workspace/Unity/MLAgentsColossus_UpgradeToRelease_6/CleanRelease7/com.unity.ml-agents/Runtime/Agent.cs:706)
Ball3DAgent:CollectObservations(VectorSensor) (at Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DAgent.cs:57)
Unity.MLAgents.Agent:SendInfoToBrain() (at D:/Workspace/Unity/MLAgentsColossus_UpgradeToRelease_6/CleanRelease7/com.unity.ml-agents/Runtime/Agent.cs:1013)
Unity.MLAgents.Agent:SendInfo() (at D:/Workspace/Unity/MLAgentsColossus_UpgradeToRelease_6/CleanRelease7/com.unity.ml-agents/Runtime/Agent.cs:1262)
Unity.MLAgents.Academy:EnvironmentStep() (at D:/Workspace/Unity/MLAgentsColossus_UpgradeToRelease_6/CleanRelease7/com.unity.ml-agents/Runtime/Academy.cs:578)
Unity.MLAgents.AcademyFixedUpdateStepper:FixedUpdate() (at D:/Workspace/Unity/MLAgentsColossus_UpgradeToRelease_6/CleanRelease7/com.unity.ml-agents/Runtime/Academy.cs:43)
```

Notice that CollectObservations is called from within a previous CollectObservations call. I am aware that ending an episode while CollectObservations is running is probably not a great idea, but since the API allows it, the API should handle it.

There is a similar issue when EndEpisode is called from within EnvironmentReset. This should not be valid, because there is no episode to end yet, so it should generate an error. Instead, it fires "CollectObservations" and "OnEpisodeBegin" before the environment has finished resetting.

There is a similar issue when EndEpisode is called from within PreAcademyStep. Which step are we in, the previous one or the next one? FixedUpdate has stepped, but the Academy has not yet incremented the step counter, so "CollectObservations" is called before the Academy step has been incremented. Which step do these observations belong to? "OnEpisodeBegin" also gets fired. How many steps were in that episode? Does the step we are about to execute belong to this episode or to the next one?

**To Reproduce**
- Use the 3DBall example.
- Add an EndEpisode call inside CollectObservations.
- Add a Debug.Log to CollectObservations that prints the Academy step count:
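```csharp
    public override void CollectObservations(VectorSensor sensor)
    {
        // ... existing 3DBall observation code ...
        Debug.Log(Academy.Instance.StepCount + " CollectObservations called ");
        if (Academy.Instance.StepCount % 16 == 0)
        {
            EndEpisode();   // ends the episode from inside CollectObservations
            Debug.Break();  // pauses the Editor so the console can be inspected
        }
        // ... existing 3DBall observation code ...
    }
```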
Run the scene in training mode and observe the nested CollectObservations call in the stack trace.

Try calling EndEpisode inside AcademyReset and PreAcademyStep and observe similar nested calls.

**Expected Result**

There should be some enforcement inside the Academy to ensure that EndEpisode is being called in a valid context.
- EndEpisode should only be callable in the context of a valid episode step (not in PreAcademyStep or EnvironmentReset).
- Calling EndEpisode outside the context of a valid episode step should do nothing and print an error to the console.
- Calling EndEpisode in PreAcademyStep (after the previous step has finished but before the next step has started) should do nothing and print an error to the console. Otherwise, it is unclear which step the resulting observations belong to.
- Calling EndEpisode in CollectObservations should generate an error and do nothing.
- Calling EndEpisode should not generate a nested call to CollectObservations. This extra CollectObservations call will usually not return the correct observations.
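To make the expected behavior concrete, here is a minimal sketch in plain C# (no Unity dependency) of one kind of guard the list describes. Everything in it is hypothetical: StepPhase, EpisodeGuard, RunIn, RequestEndEpisode and ProcessPendingEnd are illustrative names, not part of the ML-Agents API, and the phases only approximate the Academy's real step order. The guard rejects and logs requests made during EnvironmentReset, PreAcademyStep or CollectObservations, and otherwise only records the request and defers the actual end to the step boundary, so ending an episode never triggers a nested CollectObservations call.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical phases of one Academy step. These names are illustrative
// and are not part of the ML-Agents API.
public enum StepPhase
{
    Idle,
    EnvironmentReset,
    PreAcademyStep,
    CollectObservations,
    OnActionReceived
}

// Hypothetical guard modelling the enforcement requested in this issue.
public sealed class EpisodeGuard
{
    private readonly Action<string> _logError;

    public EpisodeGuard(Action<string> logError) => _logError = logError;

    public StepPhase Phase { get; private set; } = StepPhase.Idle;
    public bool EndPending { get; private set; }

    // Runs body with Phase set to phase, then restores the previous phase.
    public void RunIn(StepPhase phase, Action body)
    {
        StepPhase previous = Phase;
        Phase = phase;
        try { body(); }
        finally { Phase = previous; }
    }

    // Stand-in for EndEpisode(): rejects the phases named in the issue,
    // otherwise defers. Returns true if the request was accepted.
    public bool RequestEndEpisode()
    {
        switch (Phase)
        {
            case StepPhase.EnvironmentReset:
            case StepPhase.PreAcademyStep:
            case StepPhase.CollectObservations:
                _logError($"EndEpisode() is not valid during {Phase}; request ignored.");
                return false;
            default:
                // Only record the request; no observations are collected here,
                // so there is no nested CollectObservations call.
                EndPending = true;
                return true;
        }
    }

    // Called once by the stepper at the step boundary, outside any user callback.
    // Returns true if a pending end was processed.
    public bool ProcessPendingEnd(Action endEpisodeAndCollect)
    {
        if (!EndPending) return false;
        EndPending = false;
        endEpisodeAndCollect();
        return true;
    }
}

public static class Program
{
    public static void Main()
    {
        var errors = new List<string>();
        var guard = new EpisodeGuard(errors.Add);
        int collectCalls = 0;

        // Simulated CollectObservations that asks to end the episode.
        guard.RunIn(StepPhase.CollectObservations, () =>
        {
            collectCalls++;
            guard.RequestEndEpisode(); // rejected and logged; nothing is collected
        });

        // Simulated OnActionReceived that ends the episode.
        guard.RunIn(StepPhase.OnActionReceived, () => guard.RequestEndEpisode());

        // Step boundary: the deferred end (and its observation collection) runs once.
        guard.ProcessPendingEnd(() => collectCalls++);

        Console.WriteLine($"collect={collectCalls} errors={errors.Count}");
        // prints "collect=2 errors=1"
    }
}
```

Deferring the request instead of collecting observations inline is one way to avoid the nested call; which phases should accept a request at all is a design decision for the maintainers.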

The extra CollectObservations call generated by EndEpisode is probably not generating valid observations:

If it is called from within PreAcademyStep, the step may not have been fully initialized. For example, StepCount has not been incremented yet, and StepCount may be used in observations (e.g. even steps are my turn, odd steps are my opponent's turn).

If it is called from within CollectObservations, there is a recursive call to CollectObservations. Both calls are likely to set state on the agent, and these state changes would interfere with each other.

If it is called in OnActionReceived, the agent has probably already applied its actions to the environment, so the new observations would not correspond to the actions generated this frame.




**Console logs / stack traces**
```
Ball3DAgent:CollectObservations(VectorSensor) (at Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DAgent.cs:52)
Unity.MLAgents.Agent:NotifyAgentDone(DoneReason) (at D:/Workspace/Unity/MLAgentsColossus_UpgradeToRelease_6/CleanRelease7/com.unity.ml-agents/Runtime/Agent.cs:515)
Unity.MLAgents.Agent:EndEpisodeAndReset(DoneReason) (at D:/Workspace/Unity/MLAgentsColossus_UpgradeToRelease_6/CleanRelease7/com.unity.ml-agents/Runtime/Agent.cs:731)
Unity.MLAgents.Agent:EndEpisode() (at D:/Workspace/Unity/MLAgentsColossus_UpgradeToRelease_6/CleanRelease7/com.unity.ml-agents/Runtime/Agent.cs:706)
Ball3DAgent:CollectObservations(VectorSensor) (at Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DAgent.cs:57)
Unity.MLAgents.Agent:SendInfoToBrain() (at D:/Workspace/Unity/MLAgentsColossus_UpgradeToRelease_6/CleanRelease7/com.unity.ml-agents/Runtime/Agent.cs:1013)
Unity.MLAgents.Agent:SendInfo() (at D:/Workspace/Unity/MLAgentsColossus_UpgradeToRelease_6/CleanRelease7/com.unity.ml-agents/Runtime/Agent.cs:1262)
Unity.MLAgents.Academy:EnvironmentStep() (at D:/Workspace/Unity/MLAgentsColossus_UpgradeToRelease_6/CleanRelease7/com.unity.ml-agents/Runtime/Academy.cs:578)
Unity.MLAgents.AcademyFixedUpdateStepper:FixedUpdate() (at D:/Workspace/Unity/MLAgentsColossus_UpgradeToRelease_6/CleanRelease7/com.unity.ml-agents/Runtime/Academy.cs:43)
```

**Screenshots**


**Environment (please complete the following information):**
- Unity 2019.3
- Windows 10
- ML-Agents Release 7
- 2.3.1
- 3DBall environment but any environment will work

