Closed
Labels
bug: Issue describes a potential bug in ml-agents.
Description
When using gym_unity to interact with a built Unity environment, the reward returned by env.step(action) at the last timestep (when done = True) seems to be 0.
I tried this on the Basic and 3DBall environments, both of which should produce a nonzero reward at the last timestep.
To reproduce the bug:
- Build the 'Basic' environment in Unity
- Run the following code :
import numpy as np
from gym_unity.envs import UnityEnv

# Load the built 'Basic' environment (worker_id=5)
env = UnityEnv("../../../ml-agents/envs/buildbasic/Basic", 5, no_graphics=False, flatten_branched=False)

for e in range(2):
    print("Episode ", e)
    o, d = env.reset(), False
    while not d:
        # Always take action 2 and print observation, reward, done
        o, r, d, _ = env.step(np.array([2]))
        print(o, r, d)

env.close()
- Output :
Episode 0
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.] 0.0 True
Episode 1
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.] -0.01 False
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.] 0.0 True
The agent should receive a reward of +1 at the last timestep for reaching the big goal (here it always takes action 2, which corresponds to going to the left). Instead, it receives a reward of 0 at the last timestep.
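The check described above can be automated. The following is a minimal sketch, not part of gym_unity: rollout_rewards is a hypothetical helper that works with any Gym-style env whose step() returns (observation, reward, done, info), and StubBasicEnv is a hypothetical stand-in that only imitates the output shown above (start at position 10, goal at 17, -0.01 per step), so the snippet runs without a Unity build. With a real build, the UnityEnv instance from the reproduction script would be passed instead of the stub.

```python
def rollout_rewards(env, action, max_steps=1000):
    """Run one episode with a fixed action and return the list of rewards."""
    env.reset()
    rewards = []
    for _ in range(max_steps):
        _, reward, done, _ = env.step(action)
        rewards.append(reward)
        if done:
            break
    return rewards


class StubBasicEnv:
    """Hypothetical stand-in for the Basic environment (assumption, not the real one).

    With drop_terminal_reward=True it imitates the reported behaviour:
    the reward on the terminal step is 0 instead of +1.
    """

    def __init__(self, start=10, goal=17, drop_terminal_reward=True):
        self.start = start
        self.goal = goal
        self.drop_terminal_reward = drop_terminal_reward
        self.pos = start

    def reset(self):
        self.pos = self.start
        return self.pos

    def step(self, action):
        # Action 2 moves toward the big goal; anything else moves away from it.
        self.pos += 1 if action == 2 else -1
        done = self.pos >= self.goal
        if done:
            reward = 0.0 if self.drop_terminal_reward else 1.0
        else:
            reward = -0.01
        return self.pos, reward, done, {}


# Reported behaviour: the last reward of the episode is 0.
buggy = rollout_rewards(StubBasicEnv(drop_terminal_reward=True), 2)
# Expected behaviour: the last reward of the episode is +1.
expected = rollout_rewards(StubBasicEnv(drop_terminal_reward=False), 2)
print("last reward (reported):", buggy[-1])
print("last reward (expected):", expected[-1])
```

Comparing only the last element of the returned list keeps the check independent of episode length, so the same helper could be reused for 3DBall.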
Environment:
- OS + version: Ubuntu 18.04.4
- ML-Agents version: ML-Agents v0.14.0
- Environment: Basic, 3DBall
Thank you.
JohnBergago