Gomathi-006/RL

Shadow Heist OpenEnv Environment

ShadowHeistEnv is a grid-based stealth RL environment built around partial observability, autonomous guard agents, shaped rewards, and optional Hugging Face action selection.

Files

  • env.py: main environment with reset(), step(action), and state()
  • tasks.py: task definitions for collect_one, safe_heist, and perfect_escape
  • graders.py: scoring functions returning values in [0.0, 1.0]
  • agent.py: Hugging Face pipeline decision agent with rule-based fallback
  • gradio_ui.py: Gradio-based UI for playing the game manually or via the agent
  • run_shadow_heist.py: example loop that runs the environment and prints rewards/states
  • openenv.yaml: OpenEnv manifest pointing at env:ShadowHeistEnv

Game Design

  • The player controls a thief on an N x N grid.
  • Guards patrol independently and switch to chase behavior once they detect the player.
  • Treasures must be stolen with the steal action.
  • Reaching the exit ends the run successfully once at least one treasure has been secured.
  • Observations are partial: the agent sees only the cells within a visibility radius, and the rest of the grid is masked.
  • The hide action enables stealth mode and lowers the chance of being detected by a guard.
  • The Gradio UI offers three difficulty levels:
    • easy: 6x6, 1 guard, 2 treasures, 80 max steps
    • medium: 8x8, 2 guards, 3 treasures, 100 max steps
    • hard: 10x10, 3 guards, 4 treasures, 120 max steps
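The masked observation described above can be illustrated with a small sketch. This is not the code in env.py: the mask_grid helper, the "?" marker, the cell symbols, and the use of Chebyshev (square) distance are all assumptions made for illustration.

```python
from typing import List, Tuple

HIDDEN = "?"  # hypothetical marker for cells outside the visibility radius


def mask_grid(
    grid: List[List[str]], player_pos: Tuple[int, int], radius: int
) -> List[List[str]]:
    """Return a copy of grid with every cell farther than radius hidden.

    Distance is Chebyshev (a square window around the player); the real
    environment may use a different metric.
    """
    pr, pc = player_pos
    masked = []
    for r, row in enumerate(grid):
        masked_row = []
        for c, cell in enumerate(row):
            visible = max(abs(r - pr), abs(c - pc)) <= radius
            masked_row.append(cell if visible else HIDDEN)
        masked.append(masked_row)
    return masked


# Illustrative 4x4 layout: P = player, G = guard, T = treasure, E = exit.
grid = [
    ["P", ".", ".", "T"],
    [".", ".", "G", "."],
    [".", ".", ".", "."],
    [".", ".", ".", "E"],
]

observation = mask_grid(grid, player_pos=(0, 0), radius=1)
for row in observation:
    print(" ".join(row))
```

With a radius of 1, the guard at (1, 2) lies outside the visible window, so the agent cannot see it until it moves closer.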

Action Space

  • move_up
  • move_down
  • move_left
  • move_right
  • hide
  • steal
  • wait
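As a rough sketch of how the four movement actions might translate into grid positions, the snippet below maps each action name to a (row, col) delta and clamps the result to the grid. The next_position helper, the (row, col) convention with row 0 at the top, and the clamping at the borders are assumptions, not behavior confirmed by env.py.

```python
from typing import Dict, Tuple

# Action names as listed in this README.
ACTIONS = (
    "move_up", "move_down", "move_left", "move_right",
    "hide", "steal", "wait",
)

# Assumed convention: positions are (row, col) and row 0 is the top row.
MOVE_DELTAS: Dict[str, Tuple[int, int]] = {
    "move_up": (-1, 0),
    "move_down": (1, 0),
    "move_left": (0, -1),
    "move_right": (0, 1),
}


def next_position(
    pos: Tuple[int, int], action: str, grid_size: int
) -> Tuple[int, int]:
    """Return the position after action, staying inside the grid.

    Non-movement actions (hide, steal, wait) leave the position unchanged.
    """
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    dr, dc = MOVE_DELTAS.get(action, (0, 0))
    row = min(max(pos[0] + dr, 0), grid_size - 1)
    col = min(max(pos[1] + dc, 0), grid_size - 1)
    return (row, col)


print(next_position((3, 3), "move_right", grid_size=8))
print(next_position((0, 0), "move_up", grid_size=8))
print(next_position((3, 3), "hide", grid_size=8))
```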

Rewards

  • +10 successful steal
  • +1 safe movement
  • -5 detected by a guard
  • -50 caught
  • +100 successful escape
  • +0.1 exploration bonus for reaching a new cell
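The reward table above can be read as a lookup keyed by event. The sketch below assumes that several events can occur in one step and that their rewards simply add up; the event names and the step_reward helper are hypothetical, and env.py may combine rewards differently.

```python
from typing import Dict, Iterable

# Values taken from the reward list above; the event keys are hypothetical.
REWARDS: Dict[str, float] = {
    "steal": 10.0,      # successful steal
    "safe_move": 1.0,   # safe movement
    "detected": -5.0,   # detected by a guard
    "caught": -50.0,    # caught
    "escape": 100.0,    # successful escape
    "new_cell": 0.1,    # exploration bonus for reaching a new cell
}


def step_reward(events: Iterable[str]) -> float:
    """Sum the rewards for every event that occurred during one step."""
    return sum(REWARDS[event] for event in events)


# A safe move onto a previously unvisited cell.
print(step_reward(["safe_move", "new_cell"]))

# A move during which a guard detects the player.
print(step_reward(["detected"]))
```

Under this additive assumption the exploration bonus stays small next to the steal and escape rewards, so it nudges the agent toward unexplored cells without outweighing the main objectives.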

Install

python -m pip install -e .

Optional Hugging Face integration:

python -m pip install -e ".[huggingface]"

Optional Gradio UI:

python -m pip install -e ".[ui]"

Example

from agent import ShadowHeistDecisionAgent
from env import ShadowHeistEnv

env = ShadowHeistEnv(grid_size=8, num_guards=2, num_treasures=3, seed=7)
agent = ShadowHeistDecisionAgent(seed=7)

state = env.reset()
done = False

while not done:
    action = agent.decide_action(state)
    state, reward, done, info = env.step(action)
    print(action, reward, state["player_pos"], state["collected_treasures"])

Run the included script:

python run_shadow_heist.py

Run the Gradio UI:

python gradio_ui.py

OpenEnv Manifest

name: shadow_heist_env
entry_point: env:ShadowHeistEnv
tasks:
  - name: collect_one
    grader: graders:grade_easy
  - name: safe_heist
    grader: graders:grade_medium
  - name: perfect_escape
    grader: graders:grade_hard
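Each task in the manifest points at a grader, and graders.py is described as returning scores in [0.0, 1.0]. The sketch below shows one way a grader could guarantee that range by clamping. The function name, its arguments, and the scoring rule (fraction of treasures collected) are assumptions for illustration and are not the signatures of grade_easy, grade_medium, or grade_hard.

```python
def clamp01(value: float) -> float:
    """Clamp value into the closed interval [0.0, 1.0]."""
    return max(0.0, min(1.0, value))


def grade_fraction_collected(collected: int, total: int) -> float:
    """Hypothetical grader: score is the fraction of treasures collected.

    Returns 0.0 when there are no treasures, so the result is always defined.
    """
    if total <= 0:
        return 0.0
    return clamp01(collected / total)


print(grade_fraction_collected(collected=2, total=4))
print(grade_fraction_collected(collected=5, total=4))
```

Clamping at the end keeps the score valid even if the inputs are inconsistent, for example when a counter overshoots the number of treasures.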

Tests

pytest
