Gym-PPS is a lightweight Predator-Prey Swarm environment fully compatible with the standard OpenAI Gym interface. It is designed as an efficient platform for rapidly benchmarking reinforcement learning and control algorithms on guidance, swarming, and formation tasks. 🎥 Our demonstration video is available on Bilibili.
Gym-PPS requires Python 3.8. To ensure stability and avoid dependency conflicts, we strongly recommend running the library inside a dedicated virtual environment. Currently, the library must be installed manually from source.
Create and activate a Python 3.8 virtual environment:

```bash
python3.8 -m venv .venv-pps
source .venv-pps/bin/activate
```

Install the library by navigating to the repository directory and installing the package:

```bash
cd Gym-PPS-main
pip install .
```

To verify the installation and run a demo simulation, execute the example script:

```bash
cd example_use_pps
python example1.py
```

A simulation window will pop up similar to the one shown below:
*Demo renderings: Cartesian mode and Polar mode.*
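The two renderings correspond to the `dynamics_mode` entry ('Cartesian' or 'Polar') in the customization file introduced below. As a rough sketch, assuming an environment that has already been created and wrapped as in example1.py, and assuming `get_param` accepts the `dynamics_mode` key in the same way it accepts `'n_p'`, you could check the active mode at runtime:

```python
# Rough sketch: assumes `env` has been created and wrapped with
# PredatorPreySwarmCustomizer (as in example1.py), and that get_param()
# accepts the 'dynamics_mode' key from custom_param.json.
mode = env.get_param('dynamics_mode')
print("Dynamics mode:", mode)  # expected: 'Cartesian' or 'Polar'
```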
Gym-PPS is designed for ease of use; refer to example1.py for a quick demonstration.
```bash
python example1.py
```

```python
import os
import gym
import numpy as np
# (see example1.py for the remaining imports, e.g. PredatorPreySwarmCustomizer)

## Define the Predator-Prey Swarm (PPS) environment
scenario_name = 'PredatorPreySwarm-v0'

## customize PPS environment parameters in the .json file
custom_param = 'custom_param.json'

## Make the environment
env = gym.make(scenario_name)
custom_param = os.path.dirname(os.path.realpath(__file__)) + '/' + custom_param
env = PredatorPreySwarmCustomizer(env, custom_param)

if __name__ == '__main__':

    n_p = env.get_param('n_p')
    n_e = env.n_e
    n_pe = env.n_pe
    print("Number of predators: ", n_p)
    print("Number of prey: ", n_e)
    print("Number of total agents: ", n_pe)

    s = env.reset()  # (obs_dim, n_peo)
    print("Observation space shape: ", s.shape)
    print("Action space shape: ", env.action_space.shape)

    for _ in range(1):
        for step in range(1000):
            env.render(mode='human')
            # Control the predators and the prey separately
            a_pred = np.random.uniform(-1, 1, (2, n_p))    # predator actions
            a_prey = np.random.uniform(-1, 1, (2, n_e))    # prey actions
            a = np.concatenate((a_pred, a_prey), axis=-1)  # joint action
            s_, r, done, info = env.step(a)                # state, reward, done, info
            s = s_                                         # update state
```

You can customize environment parameters, such as the number of predators or the dynamics mode, by modifying the custom_param.json file as shown below:
```json
{
    "dynamics_mode": "Polar",
    "n_p": 3,
    "n_e": 20,
    "pursuer_strategy": "random",
    "escaper_strategy": "nearest",
    "is_periodic": true
}
```

Alternatively, you can access or modify these parameters directly within your code:
```python
n_p = env.get_param('n_p')
env.set_param('n_p', 10)
```

To customize the observation or reward functions, please modify the definitions in custom_env.py. Refer to example2.py for guidance.
```bash
python example2.py
```

```python
## Use the following wrappers to customize reward, observation, and action functions
env = MyReward(env, custom_param)
env = MyObs(env, custom_param)


class MyObs(CustomObservation):

    # def __init__(self, env, args):
    #     super().__init__(env, args)
    #     self.observation_space = spaces.Box(shape=(2, env.n_p + env.n_e), low=-np.inf, high=np.inf)

    def observation(self, obs):
        r"""Example::

            obs = obs[6:, :]  # for example, remove the ego-state
            # ⚠️ WARNING: your algorithm should then stick to your own observation space!
        """
        # your code here
        obs = obs[6:, :]
        return obs


class MyReward(CustomReward):

    def reward(self, observation, reward, action):
        r"""Example::

            reward_p = 5.0 * self.env.is_collide_b2b[self.env.n_p:self.env.n_pe, :self.env.n_p].sum(axis=0, keepdims=True).astype(float)
            reward_e = -1.0 * self.env.is_collide_b2b[self.env.n_p:self.env.n_pe, :self.env.n_p].sum(axis=1, keepdims=True).astype(float).reshape(1, self.env.n_e)
            reward_e -= 0.1 * np.abs(action[[0], self.env.n_p:self.env.n_pe]) + 0.01 * np.abs(action[[1], self.env.n_p:self.env.n_pe])
            reward = np.concatenate((reward_p, reward_e), axis=1)
        """
        # your code here
        return reward
```
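Once the wrappers above have been applied, the shapes your algorithm sees change accordingly. A minimal check, assuming CustomObservation behaves like a standard Gym ObservationWrapper (i.e. `reset()` returns the customized observation):

```python
# Minimal check: assumes `env` has already been wrapped with MyReward and
# MyObs as shown above, and that CustomObservation applies observation()
# on reset(), like a standard Gym ObservationWrapper.
s = env.reset()
print("Customized observation shape:", s.shape)  # ego-state rows removed by MyObs
```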
For more advanced customization, such as adding new methods or functions to the environment, modify the MyEnv class directly. See example3.py for implementation details.

```bash
python example3.py
```

```python
## Use the following wrapper to customize the environment class
env = MyEnv(env, custom_param)


class MyEnv(PredatorPreySwarmCustomizer):

    def __init__(self, env, args):
        super().__init__(env, args)

    ## example
    def compute_speed(self):
        speed = np.sqrt(self.env.dp[[0], :]**2 + self.env.dp[[1], :]**2)
        return speed

    def myfunc(self):
        # define your own function here
        # your code here
        pass
```
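Methods added in MyEnv are then callable directly on the wrapped environment. A minimal usage sketch, assuming `env` and `custom_param` are prepared as in example1.py:

```python
# Minimal usage sketch: assumes `env` and `custom_param` are prepared as in
# example1.py and that MyEnv is defined as shown above.
env = MyEnv(env, custom_param)
env.reset()
print("Agent speeds:", env.compute_speed())  # custom method added by MyEnv
```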
This repository also provides a reference implementation of the MARL algorithm from "Predator-prey survival pressure is sufficient to evolve swarming behaviors" (New Journal of Physics), adapted to the PPS environment.

To train the swarm, first make sure torch is installed in the .venv-pps environment:

```bash
pip install torch
```

Then run:

```bash
cd example_NJP_algorithm
python main.py
```

The training should start immediately. Go grab a coffee, but make it an espresso because this won't take long. Afterward, increase `n_e` to 25 in custom_param.json.
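Only the prey-count entry needs to change; a minimal illustration of the edited entry (keep the other keys the example ships with unchanged):

```json
{
    "n_e": 25
}
```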
Then run:

```bash
python evaluate.py
```

to see the prey agents embrace the swarm mind.
We hope you enjoy this project. If you find it helpful for your research, we would appreciate a citation of the following paper, which helps other researchers find us.

Gym-PPS first appeared in the paper:
```bibtex
@article{li2023predator,
  title={Predator--prey survival pressure is sufficient to evolve swarming behaviors},
  author={Li, Jianan and Li, Liang and Zhao, Shiyu},
  journal={New Journal of Physics},
  volume={25},
  number={9},
  pages={092001},
  year={2023},
  publisher={IOP Publishing}
}
```
Below is a list of the parameters that can be customized:
| Parameter name | Meaning | Default value |
|---|---|---|
| n_p | number of predators | 3 |
| n_e | number of prey | 10 |
| is_periodic | whether the environment is periodic | True |
| pursuer_strategy | embedded pursuer control algorithm | 'input' |
| escaper_strategy | embedded prey control algorithm | 'input' |
| penalize_control_effort | whether to penalize control effort in reward functions | True |
| penalize_collide_walls | whether to penalize wall collision in reward functions | False |
| penalize_distance | whether to penalize predator-prey distance in reward | False |
| penalize_collide_agents | whether to penalize agents collisions in reward functions | False |
| FoV_p | Field of View for predators | 5 |
| FoV_e | Field of View for prey | 5 |
| topo_n_p2e | topological distance for predators seeing prey | 5 |
| topo_n_e2p | topological distance for prey seeing predators | 2 |
| topo_n_p2p | topological distance for predators seeing predators | 2 |
| topo_n_e2e | topological distance for prey seeing prey | 5 |
| m_p | mass of predators | 3 |
| m_e | mass of prey | 1 |
| size_p | size of predators | 0.06 |
| size_e | size of prey | 0.035 |
| render_traj | whether to render trajectories | True |
| save_frame | whether to save rendered frame | False |
| frame_dir | where to save rendered frame | "./frames" |
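As an illustration, a custom_param.json exercising the reward-shaping and rendering options above could look like this (the values shown are simply the defaults from the table, not recommendations):

```json
{
    "n_p": 3,
    "n_e": 10,
    "is_periodic": true,
    "penalize_control_effort": true,
    "penalize_collide_walls": false,
    "penalize_distance": false,
    "penalize_collide_agents": false,
    "FoV_p": 5,
    "FoV_e": 5,
    "render_traj": true,
    "save_frame": false,
    "frame_dir": "./frames"
}
```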