[BUG]I initialized 8 environments and expected the reset function to return 8 different states, but I found they are exactly the same. #324

@bisonliao

Description

Describe the bug

I initialized 8 environments and expected the reset function to return 8 different states, but they are exactly the same. I then applied different actions to the 8 environments, expecting the returned next observations to differ, but they were again identical.

I also tried the code from the CleanRL project and ran into the same issue.

Code:

import envpool
import torch

envs = envpool.make("CartPole-v1", num_envs=8, seed=43, env_type="gymnasium", batch_size=8)
obs = envs.reset()
print(obs)
actions = torch.zeros((8,), dtype=torch.int32)
actions[3] = 1
actions[5] = 1
next_obs, _, _, _, _ = envs.step(actions.numpy())
print(next_obs)

output:
(array([[-0.00031387, -0.00031387, -0.00031387, -0.00031387],
[-0.00031387, -0.00031387, -0.00031387, -0.00031387],
[-0.00031387, -0.00031387, -0.00031387, -0.00031387],
[-0.00031387, -0.00031387, -0.00031387, -0.00031387],
[-0.00031387, -0.00031387, -0.00031387, -0.00031387],
[-0.00031387, -0.00031387, -0.00031387, -0.00031387],
[-0.00031387, -0.00031387, -0.00031387, -0.00031387],
[-0.00031387, -0.00031387, -0.00031387, -0.00031387]],
dtype=float32), {'env_id': array([0, 0, 0, 0, 0, 0, 0, 0], dtype=int32), 'players': {'env_id': array([0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)}, 'elapsed_step': array([0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)})
[[-0.00110804 -0.00110804 -0.00110804 -0.00110804]
[-0.00110804 -0.00110804 -0.00110804 -0.00110804]
[-0.00110804 -0.00110804 -0.00110804 -0.00110804]
[-0.00110804 -0.00110804 -0.00110804 -0.00110804]
[-0.00110804 -0.00110804 -0.00110804 -0.00110804]
[-0.00110804 -0.00110804 -0.00110804 -0.00110804]
[-0.00110804 -0.00110804 -0.00110804 -0.00110804]
[-0.00110804 -0.00110804 -0.00110804 -0.00110804]]

To Reproduce

import envpool
import torch
envs = envpool.make("CartPole-v1", num_envs=8, seed=43, env_type="gymnasium", batch_size=8)
obs = envs.reset()
print(obs)
actions = torch.zeros((8,), dtype=torch.int32)
actions[3] = 1
actions[5] = 1
next_obs, _, _, _, _ = envs.step(actions.numpy())
print(next_obs)
