gym_unity seems to provide a reward of 0.0 for the final step #3460

@alxndrTL

Description

When using gym_unity to interact with a built Unity environment, the reward returned by env.step(action) at the last timestep (when done = True) is always 0.0.
I tried this on the Basic and 3DBall environments; both should produce a nonzero reward at the last timestep.

To reproduce the bug:

  1. Build the 'Basic' environment in Unity
  2. Run the following code:
import numpy as np
from gym_unity.envs import UnityEnv

env = UnityEnv("../../../ml-agents/envs/buildbasic/Basic", 5, no_graphics=False, flatten_branched=False)

for e in range(2):
    print("Episode ", e)
    o, d = env.reset(), False
    
    while not d:
        o, r, d, _ = env.step(np.array([2]))
        print(o, r, d)

env.close()
  3. Output:
Episode  0
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.] 0.0 True
Episode  1
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.] 0.0 True

The agent should receive a reward of +1 at the last timestep for reaching the big goal (it always takes action 2, which moves it to the left). Instead, as the output shows, it receives a reward of 0.0 at the last timestep.
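For illustration only, here is a minimal self-contained sketch of how this kind of bug can arise. The ToyEnv and BuggyWrapper classes below are hypothetical, not the actual gym_unity code: the point is that a wrapper which auto-resets on done before the caller sees the transition can report the reward of the fresh episode (0.0) instead of the terminal reward (+1), reproducing the pattern in the output above.

```python
class ToyEnv:
    """Toy episode of 3 steps: -0.01 per step, +1.0 on the terminal step."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self):
        self.t += 1
        done = self.t == 3
        reward = 1.0 if done else -0.01
        return self.t, reward, done


class BuggyWrapper:
    """Hypothetical wrapper that auto-resets when done and then reports
    the reward of the new episode's first decision (0.0), so the
    terminal +1.0 never reaches the caller."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self):
        obs, reward, done = self.env.step()
        if done:
            obs = self.env.reset()  # auto-reset before returning to the caller
            reward = 0.0            # reward of the fresh step, not the terminal step
        return obs, reward, done


env = BuggyWrapper(ToyEnv())
env.reset()
rewards = []
done = False
while not done:
    _, r, done = env.step()
    rewards.append(r)
print(rewards)  # the terminal +1.0 is replaced by 0.0
```

Running this prints [-0.01, -0.01, 0.0], the same shape as the output above: the per-step penalty arrives intact, but the terminal reward is swallowed by the reset.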

Environment:

  • OS + version: Ubuntu 18.04.4
  • ML-Agents version: v0.14.0
  • Environment: Basic, 3DBall

Thank you.

Labels

bug: Issue describes a potential bug in ml-agents.
