[Feature] XGrammar Structural Tag Migration

### Checklist

- [x] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- [x] 2. Please use English, otherwise it will be closed.

### Motivation

## Overview

4 issues to migrate to new xgrammar structural tag format. Can be done in **3-5 days** with focused effort.

**Dependencies**:
- Issue 1 → Issue 2 (need base changes before detector implementations)
- Issue 3 can be done in parallel with Issue 2
- Issue 4 is optional/can be deferred

---

## Issue 1: Remove Legacy Format & Update Base Infrastructure

**Estimate**: 4-6 hours

**Goal**: Clean up legacy code and update base classes/types

### Tasks

#### 1. Remove Legacy Types
**File**: `python/sglang/srt/entrypoints/openai/protocol.py`

- [ ] Delete `LegacyStructuralTagResponseFormat` class
- [ ] Delete `StructuresResponseFormat` class
- [ ] Update `ToolCallConstraint` type alias to only use new format:
  ```python
  ToolCallConstraint: TypeAlias = Union[
      Tuple[Literal["structural_tag"], Dict[str, Any]],  # New format only
      Tuple[Literal["json_schema"], Any],
  ]
  ```

#### 2. Update BaseFormatDetector
**File**: `python/sglang/srt/function_call/base_format_detector.py`

- [ ] Remove `structure_info()` abstract method
- [ ] Remove `build_ebnf()` abstract method
- [ ] Add new `build_structural_tag()` abstract method:
  ```python
  @abstractmethod
  def build_structural_tag(
      self,
      tools: List[Tool],
      at_least_one: bool = False,
      stop_after_first: bool = False,
  ) -> Dict[str, Any]:
      """Build xgrammar structural tag for this model's format."""
      raise NotImplementedError()
  ```

#### 3. Update FunctionCallParser
**File**: `python/sglang/srt/function_call/function_call_parser.py`

- [ ] Remove `get_structure_tag()` method (old implementation)
- [ ] Update `get_structure_constraint()` to:
  - Accept `parallel_tool_calls` parameter
  - Use detector's `build_structural_tag()` for auto mode
  - Use `get_json_schema_constraint()` for required/specific
  ```python
  def get_structure_constraint(
      self,
      tool_choice: Union[ToolChoice, Literal["auto", "required"]],
      parallel_tool_calls: bool = True
  ) -> Optional[ToolCallConstraint]:
      if tool_choice == "auto":
          if self.detector.supports_structural_tag():
              tag = self.detector.build_structural_tag(
                  tools=self.tools,
                  at_least_one=False,
                  stop_after_first=not parallel_tool_calls
              )
              return ("structural_tag", tag)
          return None
      elif tool_choice == "required" or isinstance(tool_choice, ToolChoice):
          json_schema = get_json_schema_constraint(
              self.tools, tool_choice, parallel_tool_calls
          )
          return ("json_schema", json_schema)
      return None
  ```

#### 4. Update JSON Schema Builder
**File**: `python/sglang/srt/function_call/utils.py`

- [ ] Add `parallel_tool_calls` parameter to `get_json_schema_constraint()`
- [ ] Add `maxItems: 1` when `parallel_tool_calls=False` for required mode
- [ ] Keep `maxItems: 1` always for specific function choice

#### 5. Update Serving Chat
**File**: `python/sglang/srt/entrypoints/openai/serving_chat.py`

- [ ] Pass `parallel_tool_calls` to `get_structure_constraint()`
- [ ] Remove json_schema override (lines 237-244)

#### 6. Remove Legacy Backend Code
**File**: `python/sglang/srt/constrained/xgrammar_backend.py`

- [ ] Remove `is_legacy_structural_tag()` check
- [ ] Remove legacy format handling
- [ ] Pass new format directly to xgrammar compiler

**File**: `python/sglang/srt/constrained/utils.py`

- [ ] Delete `is_legacy_structural_tag()` function

#### 7. Remove `SGLANG_TOOL_STRICT_LEVEL` env var

### Testing

- [ ] Code compiles (will have import errors from detectors until Issue 2)
- [ ] Type checking passes
- [ ] Legacy types are completely removed

### Acceptance Criteria

- All legacy code removed
- Base infrastructure ready for detector implementations
- No breaking changes to existing API (just internal refactor)

---

## Issue 2: Implement build_structural_tag() for All Detectors

**Estimate**: 8-10 hours (11 detectors, 3 with special formats)

**Goal**: Update all detector classes to implement new method

### Detectors to Update

For **each detector**, do the following:
1. Remove `structure_info()` implementation
2. Remove `build_ebnf()` implementation
3. Add `build_structural_tag()` implementation

**Key principle**: Always include `tool.function.parameters` in schema, even if `strict=False`

### Detector List

#### Standard Detectors (Use triggered_tags format)

**File**: `python/sglang/srt/function_call/llama32_detector.py`
- [ ] Llama32Detector
  - Trigger: `<|python_tag|>`
  - Begin: `<|python_tag|>{"name":"{name}", "arguments":`
  - End: `}`

**File**: `python/sglang/srt/function_call/mistral_detector.py`
- [ ] MistralDetector
  - Trigger: `[TOOL_CALLS]`
  - Begin: `[TOOL_CALLS] [{`
  - End: `}]`

**File**: `python/sglang/srt/function_call/qwen25_detector.py`
- [ ] Qwen25Detector
  - Trigger: `<tool_call>`
  - Begin: `<tool_call>\n{`
  - End: `}</tool_call>`

**File**: `python/sglang/srt/function_call/qwen3_coder_detector.py`
- [ ] Qwen3CoderDetector
  - **SPECIAL FORMAT**: Uses XML parameter format, NOT JSON
  - Parameters: `<parameter=name>value</parameter>`
  - Nested objects remain JSON
  - Refer to: https://xgrammar.mlc.ai/docs/tutorials/structural_tag.html
  - May need custom content format for XML parameters

**File**: `python/sglang/srt/function_call/glm4_moe_detector.py`
- [ ] Glm4MoeDetector
  - Review existing `structure_info()` for format
  - Implement equivalent in `build_structural_tag()`

**File**: `python/sglang/srt/function_call/gpt_oss_detector.py`
- [ ] GptOssDetector
  - Review existing `structure_info()` for format
  - Implement equivalent in `build_structural_tag()`

**File**: `python/sglang/srt/function_call/pythonic_detector.py`
- [ ] PythonicDetector （Might not be able to support)
  - Review existing `structure_info()` for format
  - Implement equivalent in `build_structural_tag()`

**File**: `python/sglang/srt/function_call/kimik2_detector.py`
- [ ] KimiK2Detector
  - Review if needs special format (tags_with_separator?)
  - Implement accordingly

**File**: `python/sglang/srt/function_call/minimax_m2.py`
- [ ] MinimaxM2Detector
  - Review existing `structure_info()` for format
  - Implement equivalent in `build_structural_tag()`

**File**: `python/sglang/srt/function_call/step3_detector.py`
- [ ] Step3Detector
  - Review existing `structure_info()` for format
  - Implement equivalent in `build_structural_tag()`

#### Special Detectors (Custom Formats)

**File**: `python/sglang/srt/function_call/deepseekv31_detector.py`
- [ ] DeepSeekV31Detector
  - **SPECIAL FORMAT**: Uses wrapper tags
  - Outer wrapper: `<｜tool▁calls▁begin｜>` ... `<｜tool▁calls▁end｜>`
  - Individual calls: `<｜tool▁call▁begin｜>` ... `<｜tool▁call▁end｜>`
  - Parameters in markdown code blocks: ` ```jsonc ` ... ` ``` `
  - Use `tags_with_separator` format with `\n` separator
  - Refer to: https://xgrammar.mlc.ai/docs/tutorials/structural_tag.html

**File**: `python/sglang/srt/function_call/deepseekv3_detector.py`
- [ ] DeepSeekV3Detector
  - **SPECIAL FORMAT**: Similar to DeepSeekV31
  - Check if same wrapper tag format applies
  - May need `sequence` format for reasoning support
  - Refer to: https://xgrammar.mlc.ai/docs/tutorials/structural_tag.html

### Summary of Detectors

**Total**: 11 detectors to update

**Standard detectors** (8): Use `triggered_tags` format
- Llama32Detector
- MistralDetector
- Qwen25Detector
- Glm4MoeDetector
- GptOssDetector
- PythonicDetector
- KimiK2Detector
- MinimaxM2Detector
- Step3Detector

**Special format detectors** (3): Need custom implementations
- Qwen3CoderDetector (XML parameter format)
- DeepSeekV31Detector (wrapper tags + tags_with_separator)
- DeepSeekV3Detector (wrapper tags + tags_with_separator)

### Implementation Template

**For standard detectors**, the implementation follows this pattern:

```python
def build_structural_tag(
    self,
    tools: List[Tool],
    at_least_one: bool = False,
    stop_after_first: bool = False,
) -> Dict[str, Any]:
    """Build structural tag for [MODEL] format."""
    tags = []
    triggers = set()

    for tool in tools:
        name = tool.function.name
        if not name:
            continue

        # Define model-specific format
        begin = "..."  # From old structure_info()
        end = "..."    # From old structure_info()
        trigger = "..." # From old structure_info()

        # Always include schema
        schema = tool.function.parameters or {}

        tags.append({
            "format": "tag",
            "begin": begin,
            "content": {
                "format": "json_schema",
                "schema": schema
            },
            "end": end
        })

        triggers.add(trigger)

    return {
        "format": "triggered_tags",
        "triggers": list(triggers),
        "tags": tags,
        "at_least_one": at_least_one,
        "stop_after_first": stop_after_first
    }
```

**For DeepSeek detectors** (wrapper tags format):

```python
def build_structural_tag(
    self,
    tools: List[Tool],
    at_least_one: bool = False,
    stop_after_first: bool = False,
) -> Dict[str, Any]:
    """Build structural tag for DeepSeek wrapper format."""
    # Build individual tool call tags
    tool_tags = []
    for tool in tools:
        name = tool.function.name
        if not name:
            continue

        schema = tool.function.parameters or {}

        tool_tags.append({
            "format": "tag",
            "begin": "<｜tool▁call▁begin｜>\n```jsonc\n{",
            "content": {
                "format": "json_schema",
                "schema": schema
            },
            "end": "}\n```\n<｜tool▁call▁end｜>"
        })

    # Wrap in outer tag with separator
    return {
        "format": "triggered_tags",
        "triggers": ["<｜tool▁calls▁begin｜>"],
        "tags": [{
            "format": "tag",
            "begin": "<｜tool▁calls▁begin｜>\n",
            "content": {
                "format": "tags_with_separator",
                "tags": tool_tags,
                "separator": "\n",
                "at_least_one": at_least_one,
                "stop_after_first": stop_after_first
            },
            "end": "\n<｜tool▁calls▁end｜>"
        }],
        "at_least_one": False,  # Outer level
        "stop_after_first": False
    }
```

**For Qwen3Coder** (XML parameter format):

```python
def build_structural_tag(
    self,
    tools: List[Tool],
    at_least_one: bool = False,
    stop_after_first: bool = False,
) -> Dict[str, Any]:
    """Build structural tag for Qwen3-Coder XML format."""
    tags = []
    triggers = set()

    for tool in tools:
        name = tool.function.name
        if not name:
            continue

        # Qwen3-Coder uses XML parameter format
        # This may require a custom xgrammar format or schema transformation
        # Refer to: https://xgrammar.mlc.ai/docs/tutorials/structural_tag.html

        schema = tool.function.parameters or {}
        # TODO: May need to transform JSON schema to XML parameter schema

        tags.append({
            "format": "tag",
            "begin": f"<tool_call>\n<name>{name}</name>\n",
            "content": {
                "format": "xml_parameters",  # May need custom format
                "schema": schema
            },
            "end": "\n</tool_call>"
        })

        triggers.add("<tool_call>")

    return {
        "format": "triggered_tags",
        "triggers": list(triggers),
        "tags": tags,
        "at_least_one": at_least_one,
        "stop_after_first": stop_after_first
    }
```

**Note**: Qwen3-Coder implementation may require additional research into xgrammar's XML parameter support.

### Testing

For each detector:
- [ ] Unit test: Verify structural tag format is correct
- [ ] Unit test: Verify schema is always included
- [ ] Unit test: Verify at_least_one and stop_after_first work

### Acceptance Criteria

- All detectors implement `build_structural_tag()`
- All detectors have `structure_info()` and `build_ebnf()` removed
- Unit tests pass for each detector
- Code compiles and runs

---

## Issue 3: Add Integration Tests & E2E Validation

**Estimate**: 4-6 hours

**Goal**: Comprehensive testing of new format

### Unit Tests

**File**: `python/tests/srt/function_call/test_structural_tag_new_format.py` (NEW)

- [ ] Test schema always included (even with strict=False)
- [ ] Test at_least_one parameter
- [ ] Test stop_after_first parameter
- [ ] Test multiple tools
- [ ] Test empty parameters handling
- [ ] Test for each detector (Llama, Mistral, Qwen, etc.)

**File**: `python/tests/srt/function_call/test_parser_integration.py` (NEW)

- [ ] Test auto mode returns structural_tag
- [ ] Test auto + parallel=False sets stop_after_first
- [ ] Test required mode returns json_schema
- [ ] Test required + parallel=False sets maxItems
- [ ] Test specific function choice

### Integration Tests

**File**: `python/tests/integration/test_tool_calling_e2e.py` (NEW/UPDATE)

- [ ] Test with real models (Llama, Mistral, Qwen)
- [ ] Test tool_choice="auto" with structural tag
- [ ] Test parallel_tool_calls=True (multiple calls)
- [ ] Test parallel_tool_calls=False (single call)
- [ ] Test tool_choice="required" with json_schema
- [ ] Test schema guidance works (model follows parameter structure)

### Performance Tests

**File**: `python/tests/performance/test_structural_tag_overhead.py` (NEW)

- [ ] Benchmark compilation time vs legacy format
- [ ] Benchmark generation speed
- [ ] Compare to json_schema baseline

### Regression Tests

- [ ] Verify existing tool calling tests still pass
- [ ] Verify backward compatibility for models using json_schema
- [ ] Verify Kimi-K2 special handling still works

### Acceptance Criteria

- All tests pass
- No performance regression
- Tool calling works correctly for all modes
- parallel_tool_calls parameter works as expected

---

## Issue 4: Rust Router Support (OPTIONAL)

**Estimate**: 6-8 hours

**Goal**: Add parallel_tool_calls support and structural tag generation to Rust router

**Note**: This can be deferred or done in parallel by a different person

### Tasks

#### 1. Add ToolParser Trait Methods
**File**: `sgl-router/src/tool_parser/traits.rs`

- [ ] Add `build_structural_tag()` method to ToolParser trait with default implementation
- [ ] Add `get_format_info()` abstract method to ToolParser trait

#### 2. Implement get_format_info() for All Parsers
**Files**: `sgl-router/src/tool_parser/parsers/*.rs`

- [ ] Implement `get_format_info()` for LlamaParser
- [ ] Implement `get_format_info()` for MistralParser
- [ ] Implement `get_format_info()` for QwenParser
- [ ] Implement `get_format_info()` for DeepSeekParser
- [ ] Override `build_structural_tag()` for DeepSeekParser if reasoning support needed
- [ ] Implement for any other parsers

#### 3. Create Constraints Module
**File**: `sgl-router/src/tool_parser/constraints.rs` (NEW)

- [ ] Create new file for constraint generation logic
- [ ] Add `build_tool_call_constraint()` function with parameters:
  - `tools: &[Tool]`
  - `tool_choice: &Option<ToolChoice>`
  - `parallel_tool_calls: bool`
  - `tool_parser_factory: &ToolParserFactory`
  - `configured_parser: Option<&String>`
  - `model: &str`
- [ ] Implement auto mode → structural_tag logic
- [ ] Implement required mode → json_schema logic
- [ ] Implement specific function → json_schema logic
- [ ] Add `build_required_json_schema()` helper
- [ ] Add `build_specific_function_json_schema()` helper
- [ ] Add `parallel_tool_calls` support (maxItems in json_schema)

**File**: `sgl-router/src/tool_parser/mod.rs`

- [ ] Add `pub mod constraints;` export

#### 4. Update Protocol
**File**: `sgl-router/src/protocols/chat.rs`

- [ ] Add `parallel_tool_calls: Option<bool>` field to ChatCompletionRequest

#### 5. Update Regular Router Preparation
**File**: `sgl-router/src/routers/grpc/regular/stages/chat/preparation.rs`

- [ ] Add import: `use crate::tool_parser::constraints;`
- [ ] Replace `utils::generate_tool_constraints()` with `constraints::build_tool_call_constraint()`
- [ ] Pass `parallel_tool_calls` parameter
- [ ] Pass `tool_parser_factory` from SharedComponents
- [ ] Pass `configured_tool_parser` from processor
- [ ] Pass `model` for auto-detection fallback

#### 6. Update Harmony Router Preparation
**File**: `sgl-router/src/routers/grpc/harmony/stages/preparation.rs`

- [ ] Add import: `use crate::tool_parser::constraints;`
- [ ] Replace `utils::generate_tool_constraints()` with `constraints::build_tool_call_constraint()`
- [ ] Pass same parameters as regular router

**File**: `sgl-router/src/routers/grpc/harmony/stages/request_building.rs`

- [ ] Check if constraint generation happens here
- [ ] Update similarly if needed

#### 7. Remove Old Code
**File**: `sgl-router/src/routers/grpc/utils.rs`

- [ ] Remove `generate_tool_constraints()` function

### Testing

**File**: `sgl-router/tests/tool_parser/test_structural_tag.rs` (NEW)
- [ ] Unit tests for each parser's `get_format_info()`
- [ ] Unit tests for `build_structural_tag()` output format
- [ ] Test at_least_one and stop_after_first parameters

**File**: `sgl-router/tests/tool_parser/test_constraints.rs` (NEW)
- [ ] Test `build_tool_call_constraint()` with auto mode
- [ ] Test with required mode
- [ ] Test with specific function
- [ ] Test parallel_tool_calls=true/false
- [ ] Test json_schema maxItems handling
- [ ] Test configured_parser takes precedence
- [ ] Test auto-detection fallback

**Integration Tests**:
- [ ] Test regular router with structural_tag constraint
- [ ] Test harmony router with structural_tag constraint
- [ ] Verify Rust router → Python backend integration
- [ ] Verify parallel_tool_calls passed correctly
- [ ] Verify parity with Python implementation

### Acceptance Criteria

- ToolParser trait has structural tag methods
- All parsers implement the new methods
- Constraint generation moved to tool_parser module
- Rust router supports parallel_tool_calls parameter
- Auto mode uses structural_tag
- Required mode uses json_schema with maxItems support
- Integration tests pass
- Parity with Python router behavior

---

## Success Metrics

- [ ] All legacy code removed
- [ ] All detectors implement new format
- [ ] Schema always included in structural tags
- [ ] parallel_tool_calls parameter works
- [ ] All tests pass
- [ ] No performance regression
- [ ] Documentation updated

### Related resources

_No response_

[Feature] XGrammar Structural Tag Migration #13026

Description

Checklist

Motivation

Overview

Issue 1: Remove Legacy Format & Update Base Infrastructure

Tasks

1. Remove Legacy Types

2. Update BaseFormatDetector

3. Update FunctionCallParser

4. Update JSON Schema Builder

5. Update Serving Chat

6. Remove Legacy Backend Code

7. Remove SGLANG_TOOL_STRICT_LEVEL env var

Testing

Acceptance Criteria

Issue 2: Implement build_structural_tag() for All Detectors

Detectors to Update

Detector List

Standard Detectors (Use triggered_tags format)

Special Detectors (Custom Formats)

Summary of Detectors

Implementation Template

Testing

Acceptance Criteria

Issue 3: Add Integration Tests & E2E Validation

Unit Tests

Integration Tests

Performance Tests

Regression Tests

Acceptance Criteria

Issue 4: Rust Router Support (OPTIONAL)

Tasks

1. Add ToolParser Trait Methods

2. Implement get_format_info() for All Parsers

3. Create Constraints Module

4. Update Protocol

5. Update Regular Router Preparation

6. Update Harmony Router Preparation

7. Remove Old Code

Testing

Acceptance Criteria

Success Metrics

Related resources

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

7. Remove `SGLANG_TOOL_STRICT_LEVEL` env var