Skip to content

[Feature] XGrammar Structural Tag Migration #13026

@CatherineSue

Description

@CatherineSue

Checklist

Motivation

Overview

4 issues to migrate to new xgrammar structural tag format. Can be done in 3-5 days with focused effort.

Dependencies:

  • Issue 1 → Issue 2 (need base changes before detector implementations)
  • Issue 3 can be done in parallel with Issue 2
  • Issue 4 is optional/can be deferred

Issue 1: Remove Legacy Format & Update Base Infrastructure

Estimate: 4-6 hours

Goal: Clean up legacy code and update base classes/types

Tasks

1. Remove Legacy Types

File: python/sglang/srt/entrypoints/openai/protocol.py

  • Delete LegacyStructuralTagResponseFormat class
  • Delete StructuresResponseFormat class
  • Update ToolCallConstraint type alias to only use new format:
    ToolCallConstraint: TypeAlias = Union[
        Tuple[Literal["structural_tag"], Dict[str, Any]],  # New format only
        Tuple[Literal["json_schema"], Any],
    ]

2. Update BaseFormatDetector

File: python/sglang/srt/function_call/base_format_detector.py

  • Remove structure_info() abstract method
  • Remove build_ebnf() abstract method
  • Add new build_structural_tag() abstract method:
    @abstractmethod
    def build_structural_tag(
        self,
        tools: List[Tool],
        at_least_one: bool = False,
        stop_after_first: bool = False,
    ) -> Dict[str, Any]:
        """Build xgrammar structural tag for this model's format."""
        raise NotImplementedError()

3. Update FunctionCallParser

File: python/sglang/srt/function_call/function_call_parser.py

  • Remove get_structure_tag() method (old implementation)
  • Update get_structure_constraint() to:
    • Accept parallel_tool_calls parameter
    • Use detector's build_structural_tag() for auto mode
    • Use get_json_schema_constraint() for required/specific
    def get_structure_constraint(
        self,
        tool_choice: Union[ToolChoice, Literal["auto", "required"]],
        parallel_tool_calls: bool = True
    ) -> Optional[ToolCallConstraint]:
        if tool_choice == "auto":
            if self.detector.supports_structural_tag():
                tag = self.detector.build_structural_tag(
                    tools=self.tools,
                    at_least_one=False,
                    stop_after_first=not parallel_tool_calls
                )
                return ("structural_tag", tag)
            return None
        elif tool_choice == "required" or isinstance(tool_choice, ToolChoice):
            json_schema = get_json_schema_constraint(
                self.tools, tool_choice, parallel_tool_calls
            )
            return ("json_schema", json_schema)
        return None

4. Update JSON Schema Builder

File: python/sglang/srt/function_call/utils.py

  • Add parallel_tool_calls parameter to get_json_schema_constraint()
  • Add maxItems: 1 when parallel_tool_calls=False for required mode
  • Keep maxItems: 1 always for specific function choice

5. Update Serving Chat

File: python/sglang/srt/entrypoints/openai/serving_chat.py

  • Pass parallel_tool_calls to get_structure_constraint()
  • Remove json_schema override (lines 237-244)

6. Remove Legacy Backend Code

File: python/sglang/srt/constrained/xgrammar_backend.py

  • Remove is_legacy_structural_tag() check
  • Remove legacy format handling
  • Pass new format directly to xgrammar compiler

File: python/sglang/srt/constrained/utils.py

  • Delete is_legacy_structural_tag() function

7. Remove SGLANG_TOOL_STRICT_LEVEL env var

Testing

  • Code compiles (will have import errors from detectors until Issue 2)
  • Type checking passes
  • Legacy types are completely removed

Acceptance Criteria

  • All legacy code removed
  • Base infrastructure ready for detector implementations
  • No breaking changes to existing API (just internal refactor)

Issue 2: Implement build_structural_tag() for All Detectors

Estimate: 8-10 hours (11 detectors, 3 with special formats)

Goal: Update all detector classes to implement new method

Detectors to Update

For each detector, do the following:

  1. Remove structure_info() implementation
  2. Remove build_ebnf() implementation
  3. Add build_structural_tag() implementation

Key principle: Always include tool.function.parameters in schema, even if strict=False

Detector List

Standard Detectors (Use triggered_tags format)

File: python/sglang/srt/function_call/llama32_detector.py

  • Llama32Detector
    • Trigger: <|python_tag|>
    • Begin: <|python_tag|>{"name":"{name}", "arguments":
    • End: }

File: python/sglang/srt/function_call/mistral_detector.py

  • MistralDetector
    • Trigger: [TOOL_CALLS]
    • Begin: [TOOL_CALLS] [{
    • End: }]

File: python/sglang/srt/function_call/qwen25_detector.py

  • Qwen25Detector
    • Trigger: <tool_call>
    • Begin: <tool_call>\n{
    • End: }</tool_call>

File: python/sglang/srt/function_call/qwen3_coder_detector.py

File: python/sglang/srt/function_call/glm4_moe_detector.py

  • Glm4MoeDetector
    • Review existing structure_info() for format
    • Implement equivalent in build_structural_tag()

File: python/sglang/srt/function_call/gpt_oss_detector.py

  • GptOssDetector
    • Review existing structure_info() for format
    • Implement equivalent in build_structural_tag()

File: python/sglang/srt/function_call/pythonic_detector.py

  • PythonicDetector (Might not be able to support)
    • Review existing structure_info() for format
    • Implement equivalent in build_structural_tag()

File: python/sglang/srt/function_call/kimik2_detector.py

  • KimiK2Detector
    • Review if needs special format (tags_with_separator?)
    • Implement accordingly

File: python/sglang/srt/function_call/minimax_m2.py

  • MinimaxM2Detector
    • Review existing structure_info() for format
    • Implement equivalent in build_structural_tag()

File: python/sglang/srt/function_call/step3_detector.py

  • Step3Detector
    • Review existing structure_info() for format
    • Implement equivalent in build_structural_tag()

Special Detectors (Custom Formats)

File: python/sglang/srt/function_call/deepseekv31_detector.py

  • DeepSeekV31Detector
    • SPECIAL FORMAT: Uses wrapper tags
    • Outer wrapper: <|tool▁calls▁begin|> ... <|tool▁calls▁end|>
    • Individual calls: <|tool▁call▁begin|> ... <|tool▁call▁end|>
    • Parameters in markdown code blocks: ```jsonc ... ```
    • Use tags_with_separator format with \n separator
    • Refer to: https://xgrammar.mlc.ai/docs/tutorials/structural_tag.html

File: python/sglang/srt/function_call/deepseekv3_detector.py

Summary of Detectors

Total: 11 detectors to update

Standard detectors (8): Use triggered_tags format

  • Llama32Detector
  • MistralDetector
  • Qwen25Detector
  • Glm4MoeDetector
  • GptOssDetector
  • PythonicDetector
  • KimiK2Detector
  • MinimaxM2Detector
  • Step3Detector

Special format detectors (3): Need custom implementations

  • Qwen3CoderDetector (XML parameter format)
  • DeepSeekV31Detector (wrapper tags + tags_with_separator)
  • DeepSeekV3Detector (wrapper tags + tags_with_separator)

Implementation Template

For standard detectors, the implementation follows this pattern:

def build_structural_tag(
    self,
    tools: List[Tool],
    at_least_one: bool = False,
    stop_after_first: bool = False,
) -> Dict[str, Any]:
    """Build structural tag for [MODEL] format."""
    tags = []
    triggers = set()

    for tool in tools:
        name = tool.function.name
        if not name:
            continue

        # Define model-specific format
        begin = "..."  # From old structure_info()
        end = "..."    # From old structure_info()
        trigger = "..." # From old structure_info()

        # Always include schema
        schema = tool.function.parameters or {}

        tags.append({
            "format": "tag",
            "begin": begin,
            "content": {
                "format": "json_schema",
                "schema": schema
            },
            "end": end
        })

        triggers.add(trigger)

    return {
        "format": "triggered_tags",
        "triggers": list(triggers),
        "tags": tags,
        "at_least_one": at_least_one,
        "stop_after_first": stop_after_first
    }

For DeepSeek detectors (wrapper tags format):

def build_structural_tag(
    self,
    tools: List[Tool],
    at_least_one: bool = False,
    stop_after_first: bool = False,
) -> Dict[str, Any]:
    """Build structural tag for DeepSeek wrapper format."""
    # Build individual tool call tags
    tool_tags = []
    for tool in tools:
        name = tool.function.name
        if not name:
            continue

        schema = tool.function.parameters or {}

        tool_tags.append({
            "format": "tag",
            "begin": "<|tool▁call▁begin|>\n```jsonc\n{",
            "content": {
                "format": "json_schema",
                "schema": schema
            },
            "end": "}\n```\n<|tool▁call▁end|>"
        })

    # Wrap in outer tag with separator
    return {
        "format": "triggered_tags",
        "triggers": ["<|tool▁calls▁begin|>"],
        "tags": [{
            "format": "tag",
            "begin": "<|tool▁calls▁begin|>\n",
            "content": {
                "format": "tags_with_separator",
                "tags": tool_tags,
                "separator": "\n",
                "at_least_one": at_least_one,
                "stop_after_first": stop_after_first
            },
            "end": "\n<|tool▁calls▁end|>"
        }],
        "at_least_one": False,  # Outer level
        "stop_after_first": False
    }

For Qwen3Coder (XML parameter format):

def build_structural_tag(
    self,
    tools: List[Tool],
    at_least_one: bool = False,
    stop_after_first: bool = False,
) -> Dict[str, Any]:
    """Build structural tag for Qwen3-Coder XML format."""
    tags = []
    triggers = set()

    for tool in tools:
        name = tool.function.name
        if not name:
            continue

        # Qwen3-Coder uses XML parameter format
        # This may require a custom xgrammar format or schema transformation
        # Refer to: https://xgrammar.mlc.ai/docs/tutorials/structural_tag.html

        schema = tool.function.parameters or {}
        # TODO: May need to transform JSON schema to XML parameter schema

        tags.append({
            "format": "tag",
            "begin": f"<tool_call>\n<name>{name}</name>\n",
            "content": {
                "format": "xml_parameters",  # May need custom format
                "schema": schema
            },
            "end": "\n</tool_call>"
        })

        triggers.add("<tool_call>")

    return {
        "format": "triggered_tags",
        "triggers": list(triggers),
        "tags": tags,
        "at_least_one": at_least_one,
        "stop_after_first": stop_after_first
    }

Note: Qwen3-Coder implementation may require additional research into xgrammar's XML parameter support.

Testing

For each detector:

  • Unit test: Verify structural tag format is correct
  • Unit test: Verify schema is always included
  • Unit test: Verify at_least_one and stop_after_first work

Acceptance Criteria

  • All detectors implement build_structural_tag()
  • All detectors have structure_info() and build_ebnf() removed
  • Unit tests pass for each detector
  • Code compiles and runs

Issue 3: Add Integration Tests & E2E Validation

Estimate: 4-6 hours

Goal: Comprehensive testing of new format

Unit Tests

File: python/tests/srt/function_call/test_structural_tag_new_format.py (NEW)

  • Test schema always included (even with strict=False)
  • Test at_least_one parameter
  • Test stop_after_first parameter
  • Test multiple tools
  • Test empty parameters handling
  • Test for each detector (Llama, Mistral, Qwen, etc.)

File: python/tests/srt/function_call/test_parser_integration.py (NEW)

  • Test auto mode returns structural_tag
  • Test auto + parallel=False sets stop_after_first
  • Test required mode returns json_schema
  • Test required + parallel=False sets maxItems
  • Test specific function choice

Integration Tests

File: python/tests/integration/test_tool_calling_e2e.py (NEW/UPDATE)

  • Test with real models (Llama, Mistral, Qwen)
  • Test tool_choice="auto" with structural tag
  • Test parallel_tool_calls=True (multiple calls)
  • Test parallel_tool_calls=False (single call)
  • Test tool_choice="required" with json_schema
  • Test schema guidance works (model follows parameter structure)

Performance Tests

File: python/tests/performance/test_structural_tag_overhead.py (NEW)

  • Benchmark compilation time vs legacy format
  • Benchmark generation speed
  • Compare to json_schema baseline

Regression Tests

  • Verify existing tool calling tests still pass
  • Verify backward compatibility for models using json_schema
  • Verify Kimi-K2 special handling still works

Acceptance Criteria

  • All tests pass
  • No performance regression
  • Tool calling works correctly for all modes
  • parallel_tool_calls parameter works as expected

Issue 4: Rust Router Support (OPTIONAL)

Estimate: 6-8 hours

Goal: Add parallel_tool_calls support and structural tag generation to Rust router

Note: This can be deferred or done in parallel by a different person

Tasks

1. Add ToolParser Trait Methods

File: sgl-router/src/tool_parser/traits.rs

  • Add build_structural_tag() method to ToolParser trait with default implementation
  • Add get_format_info() abstract method to ToolParser trait

2. Implement get_format_info() for All Parsers

Files: sgl-router/src/tool_parser/parsers/*.rs

  • Implement get_format_info() for LlamaParser
  • Implement get_format_info() for MistralParser
  • Implement get_format_info() for QwenParser
  • Implement get_format_info() for DeepSeekParser
  • Override build_structural_tag() for DeepSeekParser if reasoning support needed
  • Implement for any other parsers

3. Create Constraints Module

File: sgl-router/src/tool_parser/constraints.rs (NEW)

  • Create new file for constraint generation logic
  • Add build_tool_call_constraint() function with parameters:
    • tools: &[Tool]
    • tool_choice: &Option<ToolChoice>
    • parallel_tool_calls: bool
    • tool_parser_factory: &ToolParserFactory
    • configured_parser: Option<&String>
    • model: &str
  • Implement auto mode → structural_tag logic
  • Implement required mode → json_schema logic
  • Implement specific function → json_schema logic
  • Add build_required_json_schema() helper
  • Add build_specific_function_json_schema() helper
  • Add parallel_tool_calls support (maxItems in json_schema)

File: sgl-router/src/tool_parser/mod.rs

  • Add pub mod constraints; export

4. Update Protocol

File: sgl-router/src/protocols/chat.rs

  • Add parallel_tool_calls: Option<bool> field to ChatCompletionRequest

5. Update Regular Router Preparation

File: sgl-router/src/routers/grpc/regular/stages/chat/preparation.rs

  • Add import: use crate::tool_parser::constraints;
  • Replace utils::generate_tool_constraints() with constraints::build_tool_call_constraint()
  • Pass parallel_tool_calls parameter
  • Pass tool_parser_factory from SharedComponents
  • Pass configured_tool_parser from processor
  • Pass model for auto-detection fallback

6. Update Harmony Router Preparation

File: sgl-router/src/routers/grpc/harmony/stages/preparation.rs

  • Add import: use crate::tool_parser::constraints;
  • Replace utils::generate_tool_constraints() with constraints::build_tool_call_constraint()
  • Pass same parameters as regular router

File: sgl-router/src/routers/grpc/harmony/stages/request_building.rs

  • Check if constraint generation happens here
  • Update similarly if needed

7. Remove Old Code

File: sgl-router/src/routers/grpc/utils.rs

  • Remove generate_tool_constraints() function

Testing

File: sgl-router/tests/tool_parser/test_structural_tag.rs (NEW)

  • Unit tests for each parser's get_format_info()
  • Unit tests for build_structural_tag() output format
  • Test at_least_one and stop_after_first parameters

File: sgl-router/tests/tool_parser/test_constraints.rs (NEW)

  • Test build_tool_call_constraint() with auto mode
  • Test with required mode
  • Test with specific function
  • Test parallel_tool_calls=true/false
  • Test json_schema maxItems handling
  • Test configured_parser takes precedence
  • Test auto-detection fallback

Integration Tests:

  • Test regular router with structural_tag constraint
  • Test harmony router with structural_tag constraint
  • Verify Rust router → Python backend integration
  • Verify parallel_tool_calls passed correctly
  • Verify parity with Python implementation

Acceptance Criteria

  • ToolParser trait has structural tag methods
  • All parsers implement the new methods
  • Constraint generation moved to tool_parser module
  • Rust router supports parallel_tool_calls parameter
  • Auto mode uses structural_tag
  • Required mode uses json_schema with maxItems support
  • Integration tests pass
  • Parity with Python router behavior

Success Metrics

  • All legacy code removed
  • All detectors implement new format
  • Schema always included in structural tags
  • parallel_tool_calls parameter works
  • All tests pass
  • No performance regression
  • Documentation updated

Related resources

No response

Metadata

Metadata

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions