Gym-PPS is a lightweight Predator-Prey Swarm environment fully compatible with the standard OpenAI Gym interface. It is designed as an efficient platform for rapidly benchmarking reinforcement learning and control algorithms on guidance, swarming, and formation tasks. 🎥 Our demonstration video is available on Bilibili.
Gym-PPS requires Python 3.8. To ensure stability and avoid dependency conflicts, we strongly recommend running the library inside a dedicated virtual environment. Currently, the library must be installed manually from source.
Create and activate a Python 3.8 virtual environment:

```bash
python3.8 -m venv .venv-pps
source .venv-pps/bin/activate
```

Install the library by navigating to the repository directory and installing the package:

```bash
cd Gym-PPS-main
pip install .
```

To verify the installation and run a demo simulation, execute the example script:

```bash
cd example_use_pps
python example1.py
```

A simulation window will pop up similar to the one shown below:
*Demo renderings: Cartesian mode and Polar mode.*
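The two renderings correspond to the `dynamics_mode` entry ('Cartesian' or 'Polar') in the customization file introduced below. As a rough sketch, assuming an environment that has already been created and wrapped as in example1.py, and assuming `get_param` accepts the `dynamics_mode` key in the same way it accepts `'n_p'`, you could check the active mode at runtime:

```python
# Rough sketch: assumes `env` has been created and wrapped with
# PredatorPreySwarmCustomizer (as in example1.py), and that get_param()
# accepts the 'dynamics_mode' key from custom_param.json.
mode = env.get_param('dynamics_mode')
print("Dynamics mode:", mode)  # expected: 'Cartesian' or 'Polar'
```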
Gym-PPS is designed for ease of use; refer to example1.py for a quick demonstration.
```bash
python example1.py
```

```python
import os
import gym
import numpy as np
# (see example1.py for the remaining imports, e.g. PredatorPreySwarmCustomizer)

## Define the Predator-Prey Swarm (PPS) environment
scenario_name = 'PredatorPreySwarm-v0'

## customize PPS environment parameters in the .json file
custom_param = 'custom_param.json'

## Make the environment
env = gym.make(scenario_name)
custom_param = os.path.dirname(os.path.realpath(__file__)) + '/' + custom_param
env = PredatorPreySwarmCustomizer(env, custom_param)

if __name__ == '__main__':

    n_p = env.get_param('n_p')
    n_e = env.n_e
    n_pe = env.n_pe
    print("Number of predators: ", n_p)
    print("Number of prey: ", n_e)
    print("Number of total agents: ", n_pe)

    s = env.reset()  # (obs_dim, n_peo)
    print("Observation space shape: ", s.shape)
    print("Action space shape: ", env.action_space.shape)

    for _ in range(1):
        for step in range(1000):
            env.render(mode='human')
            # Control the predators and the prey separately
            a_pred = np.random.uniform(-1, 1, (2, n_p))    # predator actions
            a_prey = np.random.uniform(-1, 1, (2, n_e))    # prey actions
            a = np.concatenate((a_pred, a_prey), axis=-1)  # joint action
            s_, r, done, info = env.step(a)                # state, reward, done, info
            s = s_                                         # update state
```

You can customize environment parameters, such as the number of predators or the dynamics mode, by modifying the custom_param.json file as shown below:
```json
{
    "dynamics_mode": "Polar",
    "n_p": 3,
    "n_e": 20,
    "pursuer_strategy": "random",
    "escaper_strategy": "nearest",
    "is_periodic": true
}
```

Alternatively, you can access or modify these parameters directly within your code:
```python
n_p = env.get_param('n_p')
env.set_param('n_p', 10)
```

To customize the observation or reward functions, please modify the definitions in custom_env.py. Refer to example2.py for guidance.
```bash
python example2.py
```

```python
## Use the following wrappers to customize reward, observation, and action functions
env = MyReward(env, custom_param)
env = MyObs(env, custom_param)


class MyObs(CustomObservation):

    # def __init__(self, env, args):
    #     super().__init__(env, args)
    #     self.observation_space = spaces.Box(shape=(2, env.n_p + env.n_e), low=-np.inf, high=np.inf)

    def observation(self, obs):
        r"""Example::

            obs = obs[6:, :]  # for example, remove the ego-state
            # ⚠️ WARNING: your algorithm should then stick to your own observation space!
        """
        # your code here
        obs = obs[6:, :]
        return obs


class MyReward(CustomReward):

    def reward(self, observation, reward, action):
        r"""Example::

            reward_p = 5.0 * self.env.is_collide_b2b[self.env.n_p:self.env.n_pe, :self.env.n_p].sum(axis=0, keepdims=True).astype(float)
            reward_e = -1.0 * self.env.is_collide_b2b[self.env.n_p:self.env.n_pe, :self.env.n_p].sum(axis=1, keepdims=True).astype(float).reshape(1, self.env.n_e)
            reward_e -= 0.1 * np.abs(action[[0], self.env.n_p:self.env.n_pe]) + 0.01 * np.abs(action[[1], self.env.n_p:self.env.n_pe])
            reward = np.concatenate((reward_p, reward_e), axis=1)
        """
        # your code here
        return reward
```
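Once the wrappers above have been applied, the shapes your algorithm sees change accordingly. A minimal check, assuming CustomObservation behaves like a standard Gym ObservationWrapper (i.e. `reset()` returns the customized observation):

```python
# Minimal check: assumes `env` has already been wrapped with MyReward and
# MyObs as shown above, and that CustomObservation applies observation()
# on reset(), like a standard Gym ObservationWrapper.
s = env.reset()
print("Customized observation shape:", s.shape)  # ego-state rows removed by MyObs
```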
For more advanced customization, such as adding new methods or functions to the environment, modify the MyEnv class directly. See example3.py for implementation details.

```bash
python example3.py
```

```python
## Use the following wrapper to customize the environment class
env = MyEnv(env, custom_param)


class MyEnv(PredatorPreySwarmCustomizer):

    def __init__(self, env, args):
        super().__init__(env, args)

    ## example
    def compute_speed(self):
        speed = np.sqrt(self.env.dp[[0], :]**2 + self.env.dp[[1], :]**2)
        return speed

    def myfunc(self):
        # define your own function here
        # your code here
        pass
```
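Methods added in MyEnv are then callable directly on the wrapped environment. A minimal usage sketch, assuming `env` and `custom_param` are prepared as in example1.py:

```python
# Minimal usage sketch: assumes `env` and `custom_param` are prepared as in
# example1.py and that MyEnv is defined as shown above.
env = MyEnv(env, custom_param)
env.reset()
print("Agent speeds:", env.compute_speed())  # custom method added by MyEnv
```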
This repository also provides a reference implementation of the MARL algorithm from "Predator-prey survival pressure is sufficient to evolve swarming behaviors" (New Journal of Physics), adapted to the PPS environment.

To train the swarm, first make sure torch is installed in the .venv-pps environment:

```bash
pip install torch
```

Then run:

```bash
cd example_NJP_algorithm
python main.py
```

The training should start immediately. Go grab a coffee, but make it an espresso because this won't take long. Afterward, increase `n_e` to 25 in custom_param.json.
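Only the prey-count entry needs to change; a minimal illustration of the edited entry (keep the other keys the example ships with unchanged):

```json
{
    "n_e": 25
}
```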
Then run:

```bash
python evaluate.py
```

to see the prey agents embrace the swarm mind.
We hope you enjoy this project. If you find it helpful for your research, we would appreciate a citation of the following paper, which helps other researchers find us.

Gym-PPS first appeared in the paper:
```bibtex
@article{li2023predator,
  title={Predator--prey survival pressure is sufficient to evolve swarming behaviors},
  author={Li, Jianan and Li, Liang and Zhao, Shiyu},
  journal={New Journal of Physics},
  volume={25},
  number={9},
  pages={092001},
  year={2023},
  publisher={IOP Publishing}
}
```
Below is a list of the parameters that can be customized:
| Parameter name | Meaning | Default value |
|---|---|---|
| n_p | number of predators | 3 |
| n_e | number of prey | 10 |
| is_periodic | whether the environment is periodic | True |
| pursuer_strategy | embedded pursuer control algorithm | 'input' |
| escaper_strategy | embedded prey control algorithm | 'input' |
| penalize_control_effort | whether to penalize control effort in reward functions | True |
| penalize_collide_walls | whether to penalize wall collision in reward functions | False |
| penalize_distance | whether to penalize predator-prey distance in reward | False |
| penalize_collide_agents | whether to penalize agents collisions in reward functions | False |
| FoV_p | Field of View for predators | 5 |
| FoV_e | Field of View for prey | 5 |
| topo_n_p2e | topological distance for predators seeing prey | 5 |
| topo_n_e2p | topological distance for prey seeing predators | 2 |
| topo_n_p2p | topological distance for predators seeing predators | 2 |
| topo_n_e2e | topological distance for prey seeing prey | 5 |
| m_p | mass of predators | 3 |
| m_e | mass of prey | 1 |
| size_p | size of predators | 0.06 |
| size_e | size of prey | 0.035 |
| render_traj | whether to render trajectories | True |
| save_frame | whether to save rendered frame | False |
| frame_dir | where to save rendered frame | "./frames" |
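As an illustration, a custom_param.json exercising the reward-shaping and rendering options above could look like this (the values shown are simply the defaults from the table, not recommendations):

```json
{
    "n_p": 3,
    "n_e": 10,
    "is_periodic": true,
    "penalize_control_effort": true,
    "penalize_collide_walls": false,
    "penalize_distance": false,
    "penalize_collide_agents": false,
    "FoV_p": 5,
    "FoV_e": 5,
    "render_traj": true,
    "save_frame": false,
    "frame_dir": "./frames"
}
```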