ShadowHeistEnv is a grid-based stealth RL environment built around partial observability, autonomous guard agents, shaped rewards, and optional Hugging Face action selection.

Project files:

- `env.py`: main environment with `reset()`, `step(action)`, and `state()`
- `tasks.py`: task definitions for `collect_one`, `safe_heist`, and `perfect_escape`
- `graders.py`: scoring functions returning values in `[0.0, 1.0]`
- `agent.py`: Hugging Face `pipeline` decision agent with rule-based fallback
- `gradio_ui.py`: Gradio-based UI for playing the game manually or via the agent
- `run_shadow_heist.py`: example loop that runs the environment and prints rewards/states
- `openenv.yaml`: OpenEnv manifest pointing at `env:ShadowHeistEnv`

Gameplay:

- The player controls a thief on an `N x N` grid.
- Guards patrol independently and switch to chase behavior once they detect the player.
- Treasures must be stolen with the `steal` action.
- The exit ends the run successfully once at least one treasure has been secured.
- The environment is partially observable: the agent sees the grid through a visibility radius, and the rest of the grid is masked.
- `hide` enables stealth mode and lowers guard detection chance.
- The Gradio UI offers three difficulty levels:
  - `easy`: 6x6 grid, 1 guard, 2 treasures, 80 max steps
  - `medium`: 8x8 grid, 2 guards, 3 treasures, 100 max steps
  - `hard`: 10x10 grid, 3 guards, 4 treasures, 120 max steps
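
The three difficulty levels can be kept as data and turned into constructor arguments. The sketch below is illustrative only: `Difficulty`, `DIFFICULTIES`, and `env_kwargs` are hypothetical names, and `max_steps` is an assumed keyword that `ShadowHeistEnv` may or may not accept. The numbers are copied from the list above.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Difficulty:
    grid_size: int
    num_guards: int
    num_treasures: int
    max_steps: int  # assumed keyword name; check the ShadowHeistEnv signature


# Values taken from the difficulty list in this README.
DIFFICULTIES = {
    "easy": Difficulty(grid_size=6, num_guards=1, num_treasures=2, max_steps=80),
    "medium": Difficulty(grid_size=8, num_guards=2, num_treasures=3, max_steps=100),
    "hard": Difficulty(grid_size=10, num_guards=3, num_treasures=4, max_steps=120),
}


def env_kwargs(level: str, seed: Optional[int] = None) -> dict:
    """Return keyword arguments for the named difficulty level."""
    if level not in DIFFICULTIES:
        raise ValueError(
            f"unknown difficulty {level!r}; expected one of {sorted(DIFFICULTIES)}"
        )
    kwargs = asdict(DIFFICULTIES[level])
    if seed is not None:
        kwargs["seed"] = seed
    return kwargs
```

If the constructor does accept these keywords, `ShadowHeistEnv(**env_kwargs("medium", seed=7))` would build the medium configuration; otherwise drop or rename the keys that do not apply.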

Actions:

- `move_up`
- `move_down`
- `move_left`
- `move_right`
- `hide`
- `steal`
- `wait`
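
As a rough illustration of how the four movement actions could translate into grid positions, here is a minimal sketch. It assumes `(row, col)` coordinates with row 0 at the top and clamps moves at the grid edge; both are assumptions, not confirmed behavior of `env.py`, and `next_position` is a hypothetical helper.

```python
ACTIONS = (
    "move_up", "move_down", "move_left", "move_right",
    "hide", "steal", "wait",
)

# Assumed convention: positions are (row, col), with row 0 at the top.
MOVE_DELTAS = {
    "move_up": (-1, 0),
    "move_down": (1, 0),
    "move_left": (0, -1),
    "move_right": (0, 1),
}


def next_position(pos, action, grid_size):
    """Return the position after `action`, clamped to the grid bounds.

    Non-movement actions (hide, steal, wait) leave the position unchanged.
    """
    if action not in ACTIONS:
        raise ValueError(f"unknown action {action!r}")
    d_row, d_col = MOVE_DELTAS.get(action, (0, 0))
    row = min(max(pos[0] + d_row, 0), grid_size - 1)
    col = min(max(pos[1] + d_col, 0), grid_size - 1)
    return (row, col)
```

Keeping the deltas in a dictionary means a new movement action only needs one extra table entry.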

Rewards:

- `+10`: successful steal
- `+1`: safe movement
- `-5`: detected by a guard
- `-50`: caught
- `+100`: successful escape
- `+0.1`: exploration bonus for reaching a new cell
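
To see how these shaped rewards add up over an episode, the values can be tallied from a list of events. This is only a sketch: the event keys (`steal`, `safe_move`, and so on) are hypothetical labels chosen for the example, and the environment may combine or apply rewards differently.

```python
# Reward values from the list above; the event keys are illustrative labels.
REWARDS = {
    "steal": 10.0,
    "safe_move": 1.0,
    "detected": -5.0,
    "caught": -50.0,
    "escape": 100.0,
    "new_cell": 0.1,
}


def episode_return(events):
    """Sum the shaped reward over a sequence of event labels."""
    return sum(REWARDS[event] for event in events)


# One safe move into a new cell, a steal, a detection, then an escape.
example = ["safe_move", "new_cell", "steal", "detected", "escape"]
total = episode_return(example)  # 1 + 0.1 + 10 - 5 + 100
```

The example sums to about 106.1, which shows how strongly the escape bonus dominates the per-step terms.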

Install the package:

    python -m pip install -e .

Optional Hugging Face integration:

    python -m pip install -e .[huggingface]

Optional Gradio UI:

    python -m pip install -e .[ui]

Example usage:

    from agent import ShadowHeistDecisionAgent
    from env import ShadowHeistEnv

    # Seed both the environment and the agent for repeatable runs.
    env = ShadowHeistEnv(grid_size=8, num_guards=2, num_treasures=3, seed=7)
    agent = ShadowHeistDecisionAgent(seed=7)

    state = env.reset()
    done = False
    while not done:
        # Pick an action from the current observation, then advance one step.
        action = agent.decide_action(state)
        state, reward, done, info = env.step(action)
        print(action, reward, state["player_pos"], state["collected_treasures"])

Run the included script:

    python run_shadow_heist.py

Run the Gradio UI:

    python gradio_ui.py

OpenEnv manifest (`openenv.yaml`):

    name: shadow_heist_env
    entry_point: env:ShadowHeistEnv
    tasks:
      - name: collect_one
        grader: graders:grade_easy
      - name: safe_heist
        grader: graders:grade_medium
      - name: perfect_escape
        grader: graders:grade_hard

Run the tests:

    pytest
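
The `entry_point` and `grader` values in the manifest use a `module:attribute` form such as `env:ShadowHeistEnv`. As a sketch of how such a string can be resolved to a Python object, the hypothetical helper below uses the standard library's `importlib`; it is not necessarily how OpenEnv itself loads the manifest.

```python
import importlib


def load_entry_point(spec: str):
    """Resolve a 'module:attribute' string to the object it names."""
    module_name, sep, attr = spec.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)
```

Run from the project root, `load_entry_point("graders:grade_easy")` would return the grader function named in the manifest, assuming `graders.py` is importable from there.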