feat(sglang): add support for sglang server by Lin-xs · Pull Request #1267 · RLinf/RLinf

Lin-xs · 2026-06-11T14:19:30Z

Description

Adds a server-mode SGLang backend to RLinf so rollouts can talk to one or more SGLang HTTP engines through an sglang-router instead of using the in-process SGLang engine.

New module rlinf/hybrid_engines/sglang/server/:

SGLangServerWorker — Ray worker that spawns an SGLang HTTP server per process group (sized by tensor_parallel_size * pipeline_parallel_size), waits on /health, and reports its URL.
SGLangRouterWorker — single Ray worker that runs an sglang-router subprocess, dynamically registers/unregisters server URLs, and exposes router URL + health + a thin generate passthrough.
InferenceHTTPClient — sync/async client over requests / aiohttp for /generate, /v1/chat/completions, and /health against either a router or a single server.
launch_sglang_router_and_server(config, cluster, rollout_hardware_ranks, ...) — one-call orchestrator that builds a PackedPlacementStrategy from the rollout hardware ranks, launches the server group, brings up the router, and registers each server with the router from the driver.

Demo example wired up:

examples/reasoning/sglang_server_demo.py — exercises sync/async /generate and /v1/chat/completions against the router for Qwen2.5-VL-3B.
examples/reasoning/config/sglang_server_demo.yaml — rollout.server / rollout.router config block, with launch_server / launch_router toggles and group_name / router_group_name.
examples/reasoning/run_sglang_server_demo.sh — launch script.

Dependency: sglang-router is added to the agentic-sglang optional dependency group in pyproject.toml.

Motivation and Context

The existing SGLang integration only supports the in-process engine, which couples rollout workers to the engine lifecycle and makes it hard to (1) share a pool of SGLang engines across multiple consumers, (2) scale the engine pool independently of the trainer, and (3) speak the standard OpenAI / SGLang HTTP protocol from agentic / tool-using code.

This change introduces a "server mode" path: each engine runs as a long-lived HTTP server, an sglang-router fronts the pool with cache-aware routing, and callers (RL rollouts, eval harnesses, agentic demos) talk to a single stable router URL. It is fully opt-in — existing configs are unaffected.

How has this been tested?

Ran bash examples/reasoning/run_sglang_server_demo.sh on a 2-GPU node with Qwen2.5-VL-3B (TP=2, PP=1):
- Server group came up, router registered the server, and router_group.get_router_url() returned a reachable URL.
- Sync + async /generate and /v1/chat/completions paths all completed end-to-end.
- Router + server groups shut down cleanly via shutdown().wait().
No existing rollout/training paths were modified, so existing reasoning / embodied configs continue to work unchanged.

Additional information (optional, e.g., figures and logs):

Router placement defaults to node rank 0 (head); override via router_node_rank in launch_sglang_router_and_server.
aiohttp defaults to limit=100 connections — for large fan-outs bump the client's max_connections and ulimit -n accordingly (documented in the InferenceHTTPClient docstring).
Server registration is serialized from the driver to keep worker ordering stable; can be parallelized from N workers if needed.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Documentation update (Document-only update)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

My code follows the code style of this project.
My change requires a change to the documentation.
I have updated the documentation accordingly.
I have added tests to cover my changes.
All new and existing tests passed.

Signed-off-by: Lin-xs <1833080950@qq.com>

Copilot

Pull request overview

This PR introduces an opt-in “server mode” SGLang backend for RLinf, where rollout/eval callers can talk to a pool of external SGLang HTTP servers via an sglang-router (instead of using the in-process engine).

Changes:

Added Ray workers to launch/own SGLang HTTP server subprocesses and an sglang-router subprocess, plus a driver-side orchestrator to place and wire them together.
Added a small sync/async HTTP client for /generate, /v1/chat/completions, and /health.
Added a runnable demo (Python + YAML + shell script) and added sglang-router to the agentic-sglang optional dependency group.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
`rlinf/hybrid_engines/sglang/server/server_launcher.py`	New `SGLangServerWorker` that spawns an SGLang HTTP server subprocess and waits for `/health`.
`rlinf/hybrid_engines/sglang/server/router_launcher.py`	New `SGLangRouterWorker` that spawns `sglang_router.launch_router` and supports dynamic server registration.
`rlinf/hybrid_engines/sglang/server/launcher.py`	Driver helper to build placement, launch server/router groups, and register servers.
`rlinf/hybrid_engines/sglang/server/http_client.py`	New `InferenceHTTPClient` with sync + async APIs for router/server HTTP endpoints.
`rlinf/hybrid_engines/sglang/server/__init__.py`	Exports new server-mode utilities.
`pyproject.toml`	Adds `sglang-router` under `agentic-sglang` extras.
`examples/reasoning/sglang_server_demo.py`	Demo script exercising sync/async generate and chat completions via the router.
`examples/reasoning/run_sglang_server_demo.sh`	Demo launch script.
`examples/reasoning/config/sglang_server_demo.yaml`	Demo configuration for server/router blocks and placement.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+def _wait_for_http_health(host: str, port: int, timeout: float = 300.0) -> None:
+    """Block until ``GET http://host:port/health`` returns 200, or raise."""
+    deadline = time.perf_counter() + timeout
+    url = f"http://{host}:{port}/health"
+    last_err: Optional[BaseException] = None
+    while time.perf_counter() < deadline:
+        try:
+            resp = requests.get(url, timeout=5)
+            if resp.status_code == 200:
+                return
+        except requests.exceptions.RequestException as e:
+            last_err = e
+        time.sleep(1.0)
+    raise RuntimeError(
+        f"sglang server at {url} did not become healthy within {timeout:.0f}s "
+        f"(last error: {last_err!r})."
+    )


+            self._advertise_host = ray.util.get_node_ip_address()
+
+        _wait_for_http_health(self._advertise_host, http_port)
+        self.log_info(f"sglang server ready at {self.get_server_url()}")
+        return self.get_server_url()


+        router_cfg = (
+            OmegaConf.to_container(self._router_cfg, resolve=True)
+            if self._router_cfg is not None
+            else {}
+        ) or {}
+


+        port = int(self._bind_port or self.acquire_free_port())
+        self._port = port
+


+    def __init__(
+        self,
+        base_url: str,
+        connect_timeout: float = 10.0,
+        max_connections: int = 1024 * 16,
+    ):
+        self.base_url = base_url.rstrip("/")
+        self.connect_timeout = connect_timeout
+        self.max_connections = max_connections


 agentic-sglang = [
    "sglang[all]==0.4.6.post5",
+    "sglang-router",
    "torch-memory-saver",
    "numpy==2.2",
    "transformers==4.51.1",


    "sglang[all]==0.4.6.post5",
+    "sglang-router",
    "torch-memory-saver",


+#! /bin/bash
+set -x
+
+tabs 4
+export CUDA_DEVICE_MAX_CONNECTIONS=1
+export RAY_DEDUP_LOGS=0
+


andylin-hao · 2026-06-12T02:23:43Z

+    return args
+
+
+class SGLangRouterWorker(Worker):


Should we move this to workers and name the file xxx worker?

bianhua-12 · 2026-06-12T08:10:57Z

+            placement_strategy=router_placement,
+        )
+
+    router_handle = router_group.init_router() if router_group is not None else None


Could we make the startup path transactional and clean up partially-started resources on failure?

There are two related failure paths here:

At the orchestrator level, if server_handle.wait(), router_handle.wait(), get_server_url(), or register_server() raises, the already-launched server/router Ray groups are returned by no one and their child processes can keep running.

At the router worker level, SGLangRouterWorker.init_router() calls subprocess.Popen() and then waits for /health. If the process stays alive but never becomes healthy, _wait_for_router_health() raises while the router subprocess may still be holding the selected port.

For SGLang this is especially risky because failed startup can leave GPU-serving processes and router ports behind, causing OOM or port conflicts on the next run.

Could we wrap the launch/init/register sequence in try/except and best-effort shut down router then server before re-raising, and also make init_router() call self.shutdown()/reset state when its health wait fails?

bianhua-12 · 2026-06-12T09:00:12Z

+        rollout_placement = PackedPlacementStrategy(
+            start_hardware_rank=ranks[0],
+            end_hardware_rank=ranks[-1],
+            num_hardware_per_process=num_accelerators_per_engine,
+        )
+
+        server_group = SGLangServerWorker.create_group(
+            config=config,
+            sglang_cfg=rollout_cfg.server,
+        ).launch(
+            cluster=cluster,
+            name=rollout_cfg.group_name,
+            placement_strategy=rollout_placement,
+        )


Could we avoid rebuilding placement from only the flat hardware-rank list here?

The caller likely already has a parsed RLinf placement strategy from ComponentPlacement. Reconstructing a new PackedPlacementStrategy from rollout_hardware_ranks loses the original placement semantics, especially node_group / heterogeneous cluster placement / flexible mappings. It also assumes the ranks are contiguous.

For example, if the original component placement targets a non-default node group, this helper currently creates a new PackedPlacementStrategy without passing that node group, so the server group can be scheduled against the default group instead of the configured one.

Could this helper accept the caller-provided placement strategy directly, or preserve the original node group and process-to-resource mapping when repacking?

Signed-off-by: Lin-xs <1833080950@qq.com>

feat(sglang): add support for sglang server

d18d997

Signed-off-by: Lin-xs <1833080950@qq.com>

Lin-xs requested review from bianhua-12 and Copilot June 11, 2026 14:19

Lin-xs requested review from andylin-hao and guozhen1997 as code owners June 11, 2026 14:19

Copilot started reviewing on behalf of Lin-xs June 11, 2026 14:19 View session

Copilot AI reviewed Jun 11, 2026

View reviewed changes

andylin-hao reviewed Jun 12, 2026

View reviewed changes

Lin-xs added the Needs Review! label Jun 12, 2026

bianhua-12 reviewed Jun 12, 2026

View reviewed changes

Lin-xs added 2 commits June 12, 2026 09:14

fix: pass router config to launcher function

e6a4150

Signed-off-by: Lin-xs <1833080950@qq.com>

fix: gracefully shuntdown when launch failed

d1c626d

Signed-off-by: Lin-xs <1833080950@qq.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(sglang): add support for sglang server#1267

feat(sglang): add support for sglang server#1267
Lin-xs wants to merge 3 commits into
RLinf:mainfrom
Lin-xs:feat/sglang_router

Lin-xs commented Jun 11, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

andylin-hao Jun 12, 2026

Uh oh!

Lin-xs Jun 12, 2026

Uh oh!

bianhua-12 Jun 12, 2026

Uh oh!

bianhua-12 Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

		port = int(self._bind_port or self.acquire_free_port())
		self._port = port

Conversation

Lin-xs commented Jun 11, 2026

Description

Motivation and Context

How has this been tested?

Additional information (optional, e.g., figures and logs):

Types of changes

Checklist:

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

andylin-hao Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Lin-xs Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

bianhua-12 Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

bianhua-12 Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants