Describe the bug
I initialized 8 environments and expected the reset function to return 8 different initial states, but all 8 were exactly the same. I then applied different actions to the 8 environments, expecting the returned next observations to differ, but they were again identical.
I also tried the code from the CleanRL project and ran into the same issue.
code:
```python
import envpool
import torch

envs = envpool.make("CartPole-v1", num_envs=8, seed=43, env_type="gymnasium", batch_size=8)
obs = envs.reset()
print(obs)
actions = torch.zeros((8,), dtype=torch.int32)
actions[3] = 1
actions[5] = 1
next_obs, _, _, _, _ = envs.step(actions.numpy())
print(next_obs)
```
output:
```
(array([[-0.00031387, -0.00031387, -0.00031387, -0.00031387],
        [-0.00031387, -0.00031387, -0.00031387, -0.00031387],
        [-0.00031387, -0.00031387, -0.00031387, -0.00031387],
        [-0.00031387, -0.00031387, -0.00031387, -0.00031387],
        [-0.00031387, -0.00031387, -0.00031387, -0.00031387],
        [-0.00031387, -0.00031387, -0.00031387, -0.00031387],
        [-0.00031387, -0.00031387, -0.00031387, -0.00031387],
        [-0.00031387, -0.00031387, -0.00031387, -0.00031387]],
       dtype=float32), {'env_id': array([0, 0, 0, 0, 0, 0, 0, 0], dtype=int32), 'players': {'env_id': array([0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)}, 'elapsed_step': array([0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)})
[[-0.00110804 -0.00110804 -0.00110804 -0.00110804]
 [-0.00110804 -0.00110804 -0.00110804 -0.00110804]
 [-0.00110804 -0.00110804 -0.00110804 -0.00110804]
 [-0.00110804 -0.00110804 -0.00110804 -0.00110804]
 [-0.00110804 -0.00110804 -0.00110804 -0.00110804]
 [-0.00110804 -0.00110804 -0.00110804 -0.00110804]
 [-0.00110804 -0.00110804 -0.00110804 -0.00110804]
 [-0.00110804 -0.00110804 -0.00110804 -0.00110804]]
```
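For reference, a quick way to confirm the collapse is to check whether every row of the returned batch equals the first row. This is a minimal NumPy-only sketch (the `obs` array below is filled with the value from the `reset()` output above, not a live envpool call):

```python
import numpy as np

def all_rows_identical(batch: np.ndarray) -> bool:
    """True if every row of the batch equals the first row."""
    return bool(np.all(batch == batch[0]))

# Observation batch reconstructed from the printed reset() output above.
obs = np.full((8, 4), -0.00031387, dtype=np.float32)
print(all_rows_identical(obs))  # → True: all 8 environments report one state
```

With independently seeded environments this check should print `False` for both the reset observations and the post-step observations.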
To Reproduce
```python
import envpool
import torch

envs = envpool.make("CartPole-v1", num_envs=8, seed=43, env_type="gymnasium", batch_size=8)
obs = envs.reset()
print(obs)
actions = torch.zeros((8,), dtype=torch.int32)
actions[3] = 1
actions[5] = 1
next_obs, _, _, _, _ = envs.step(actions.numpy())
print(next_obs)
```
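For comparison, this is roughly the behavior I expected from per-environment seeding. It is a sketch in plain NumPy, not envpool's actual implementation: each sub-environment gets its own RNG stream (here, hypothetically, `seed + i`), and CartPole-v1 samples its initial state uniformly from [-0.05, 0.05) in each of the 4 dimensions, so the 8 initial states should all differ:

```python
import numpy as np

seed, num_envs = 43, 8
# One independent RNG stream per sub-environment (seed offset is an assumption).
states = np.stack([
    np.random.default_rng(seed + i).uniform(-0.05, 0.05, size=4)
    for i in range(num_envs)
])
# Count distinct initial states across the batch.
print(np.unique(states, axis=0).shape[0])  # → 8 distinct initial states
```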