gym_unity seems to provide a reward of 0.0 for the final step #3460

@alxndrTL

Description

When using gym_unity to interact with a built Unity environment, the reward returned by env.step(action) at the last timestep (when done = True) is always 0.0.
I tried this on the Basic and 3DBall environments; both should produce a nonzero reward at the last timestep.

To reproduce the bug:

  1. Build the 'Basic' environment in Unity
  2. Run the following code:
import numpy as np
from gym_unity.envs import UnityEnv

env = UnityEnv("../../../ml-agents/envs/buildbasic/Basic", 5, no_graphics=False, flatten_branched=False)

for e in range(2):
    print("Episode ", e)
    o, d = env.reset(), False
    
    while not d:
        o, r, d, _ = env.step(np.array([2]))
        print(o, r, d)

env.close()
  3. Output:
Episode  0
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.] 0.0 True
Episode  1
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.] 0.0 True

The agent should receive a reward of +1 at the last timestep for reaching the big goal (it always takes action 2, which moves it to the left). Instead, as the output shows, it receives a reward of 0.0 at the last timestep.
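For illustration only, here is a minimal self-contained sketch of how this kind of bug can arise. The ToyEnv and BuggyWrapper classes below are hypothetical, not the actual gym_unity code: the point is that a wrapper which auto-resets on done before the caller sees the transition can report the reward of the fresh episode (0.0) instead of the terminal reward (+1), reproducing the pattern in the output above.

```python
class ToyEnv:
    """Toy episode of 3 steps: -0.01 per step, +1.0 on the terminal step."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self):
        self.t += 1
        done = self.t == 3
        reward = 1.0 if done else -0.01
        return self.t, reward, done


class BuggyWrapper:
    """Hypothetical wrapper that auto-resets when done and then reports
    the reward of the new episode's first decision (0.0), so the
    terminal +1.0 never reaches the caller."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self):
        obs, reward, done = self.env.step()
        if done:
            obs = self.env.reset()  # auto-reset before returning to the caller
            reward = 0.0            # reward of the fresh step, not the terminal step
        return obs, reward, done


env = BuggyWrapper(ToyEnv())
env.reset()
rewards = []
done = False
while not done:
    _, r, done = env.step()
    rewards.append(r)
print(rewards)  # the terminal +1.0 is replaced by 0.0
```

Running this prints [-0.01, -0.01, 0.0], the same shape as the output above: the per-step penalty arrives intact, but the terminal reward is swallowed by the reset.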

Environment:

  • OS + version: Ubuntu 18.04.4
  • ML-Agents version: v0.14.0
  • Environment: Basic, 3DBall

Thank you.

Labels

bug: Issue describes a potential bug in ml-agents.
