Description
Checklist
- 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- 2. Please use English, otherwise it will be closed.
Motivation
Overview
This plan splits the migration to the new xgrammar structural tag format into 4 issues. It can be done in 3-5 days with focused effort.
Dependencies:
- Issue 1 → Issue 2 (need base changes before detector implementations)
- Issue 3 can be done in parallel with Issue 2
- Issue 4 is optional/can be deferred
Issue 1: Remove Legacy Format & Update Base Infrastructure
Estimate: 4-6 hours
Goal: Clean up legacy code and update base classes/types
Tasks
1. Remove Legacy Types
File: python/sglang/srt/entrypoints/openai/protocol.py
- Delete LegacyStructuralTagResponseFormat class
- Delete StructuresResponseFormat class
- Update ToolCallConstraint type alias to only use new format:
    ToolCallConstraint: TypeAlias = Union[
        Tuple[Literal["structural_tag"], Dict[str, Any]],  # New format only
        Tuple[Literal["json_schema"], Any],
    ]
2. Update BaseFormatDetector
File: python/sglang/srt/function_call/base_format_detector.py
- Remove structure_info() abstract method
- Remove build_ebnf() abstract method
- Add new build_structural_tag() abstract method:
    @abstractmethod
    def build_structural_tag(
        self,
        tools: List[Tool],
        at_least_one: bool = False,
        stop_after_first: bool = False,
    ) -> Dict[str, Any]:
        """Build xgrammar structural tag for this model's format."""
        raise NotImplementedError()
3. Update FunctionCallParser
File: python/sglang/srt/function_call/function_call_parser.py
- Remove get_structure_tag() method (old implementation)
- Update get_structure_constraint() to:
  - Accept parallel_tool_calls parameter
  - Use detector's build_structural_tag() for auto mode
  - Use get_json_schema_constraint() for required/specific
    def get_structure_constraint(
        self,
        tool_choice: Union[ToolChoice, Literal["auto", "required"]],
        parallel_tool_calls: bool = True,
    ) -> Optional[ToolCallConstraint]:
        if tool_choice == "auto":
            if self.detector.supports_structural_tag():
                tag = self.detector.build_structural_tag(
                    tools=self.tools,
                    at_least_one=False,
                    stop_after_first=not parallel_tool_calls,
                )
                return ("structural_tag", tag)
            return None
        elif tool_choice == "required" or isinstance(tool_choice, ToolChoice):
            json_schema = get_json_schema_constraint(
                self.tools, tool_choice, parallel_tool_calls
            )
            return ("json_schema", json_schema)
        return None
4. Update JSON Schema Builder
File: python/sglang/srt/function_call/utils.py
- Add parallel_tool_calls parameter to get_json_schema_constraint()
- Add maxItems: 1 when parallel_tool_calls=False for required mode
- Keep maxItems: 1 always for specific function choice
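The maxItems rules above can be sketched as follows. This is a minimal illustration, not the actual get_json_schema_constraint() in utils.py: the function name sketch_json_schema_constraint, the plain-dict tool representation, the specific_function argument, and the array-of-calls schema shape are all assumptions made for the example.

```python
from typing import Any, Dict, List, Optional


def sketch_json_schema_constraint(
    tools: List[Dict[str, Any]],
    specific_function: Optional[str] = None,
    parallel_tool_calls: bool = True,
) -> Optional[Dict[str, Any]]:
    """Hypothetical stand-in for get_json_schema_constraint().

    `tools` is a list of {"name": ..., "parameters": ...} dicts (assumed shape).
    `specific_function` models a specific function choice; None models
    tool_choice="required".
    """
    if specific_function is not None:
        tools = [t for t in tools if t["name"] == specific_function]
    if not tools:
        return None

    schema: Dict[str, Any] = {
        "type": "array",
        "minItems": 1,
        "items": {
            "anyOf": [
                {
                    "type": "object",
                    "properties": {
                        "name": {"enum": [t["name"]]},
                        "parameters": t.get("parameters") or {},
                    },
                    "required": ["name", "parameters"],
                }
                for t in tools
            ]
        },
    }
    # Specific function choice: always a single call.
    # Required mode: a single call only when parallel calls are disabled.
    if specific_function is not None or not parallel_tool_calls:
        schema["maxItems"] = 1
    return schema
```

With this shape, required mode with parallel_tool_calls=True leaves the array unbounded above, while the other two cases cap it at one call.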
5. Update Serving Chat
File: python/sglang/srt/entrypoints/openai/serving_chat.py
- Pass parallel_tool_calls to get_structure_constraint()
- Remove json_schema override (lines 237-244)
6. Remove Legacy Backend Code
File: python/sglang/srt/constrained/xgrammar_backend.py
- Remove is_legacy_structural_tag() check
- Remove legacy format handling
- Pass new format directly to xgrammar compiler
File: python/sglang/srt/constrained/utils.py
- Delete is_legacy_structural_tag() function
7. Remove SGLANG_TOOL_STRICT_LEVEL env var
Testing
- Code compiles (will have import errors from detectors until Issue 2)
- Type checking passes
- Legacy types are completely removed
Acceptance Criteria
- All legacy code removed
- Base infrastructure ready for detector implementations
- No breaking changes to existing API (just internal refactor)
Issue 2: Implement build_structural_tag() for All Detectors
Estimate: 8-10 hours (12 detectors, 3 with special formats)
Goal: Update all detector classes to implement new method
Detectors to Update
For each detector, do the following:
- Remove structure_info() implementation
- Remove build_ebnf() implementation
- Add build_structural_tag() implementation
Key principle: Always include tool.function.parameters in schema, even if strict=False
Detector List
Standard Detectors (Use triggered_tags format)
File: python/sglang/srt/function_call/llama32_detector.py
- Llama32Detector
- Trigger: <|python_tag|>
- Begin: <|python_tag|>{"name":"{name}", "arguments":
- End: }
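As a concrete illustration of the trigger/begin/end values above, here is a rough sketch of what the Llama 3.2 tag could look like when built by hand. It assumes tools are plain dicts rather than sglang Tool objects, the function name sketch_llama32_structural_tag is hypothetical, and the dictionary keys ("format", "schema") follow the implementation template in this issue rather than a verified xgrammar schema.

```python
import json
from typing import Any, Dict, List


def sketch_llama32_structural_tag(
    tools: List[Dict[str, Any]],
    at_least_one: bool = False,
    stop_after_first: bool = False,
) -> Dict[str, Any]:
    """Illustrative only: tools are {"name": ..., "parameters": ...} dicts."""
    trigger = "<|python_tag|>"
    tags = []
    for tool in tools:
        name = tool.get("name")
        if not name:
            continue  # Skip tools without a usable name
        tags.append(
            {
                "format": "tag",
                # json.dumps quotes and escapes the function name.
                "begin": f'{trigger}{{"name":{json.dumps(name)}, "arguments":',
                "content": {
                    "format": "json_schema",
                    # Always include the schema, even if strict=False
                    "schema": tool.get("parameters") or {},
                },
                "end": "}",
            }
        )
    return {
        "format": "triggered_tags",
        "triggers": [trigger],
        "tags": tags,
        "at_least_one": at_least_one,
        "stop_after_first": stop_after_first,
    }
```

Here the begin string opens the outer JSON object and the end string closes it, so the schema-constrained content only covers the arguments value.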
File: python/sglang/srt/function_call/mistral_detector.py
- MistralDetector
- Trigger: [TOOL_CALLS]
- Begin: [TOOL_CALLS] [{
- End: }]
File: python/sglang/srt/function_call/qwen25_detector.py
- Qwen25Detector
- Trigger: <tool_call>
- Begin: <tool_call>\n{
- End: }</tool_call>
File: python/sglang/srt/function_call/qwen3_coder_detector.py
- Qwen3CoderDetector
- SPECIAL FORMAT: Uses XML parameter format, NOT JSON
- Parameters: <parameter=name>value</parameter>
- Nested objects remain JSON
- Refer to: https://xgrammar.mlc.ai/docs/tutorials/structural_tag.html
- May need custom content format for XML parameters
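To make the target output concrete, the sketch below renders a set of arguments in the XML parameter style described above. It only shows the text the grammar would have to accept and does not touch xgrammar. The helper name render_xml_parameters, the newline between parameters, and the use of json.dumps for non-string scalars are assumptions.

```python
import json
from typing import Any, Dict


def render_xml_parameters(arguments: Dict[str, Any]) -> str:
    """Render arguments as <parameter=name>value</parameter> lines.

    Strings are emitted as-is; nested objects and lists stay JSON.
    Rendering other scalars via json.dumps is an assumption.
    """
    lines = []
    for key, value in arguments.items():
        text = value if isinstance(value, str) else json.dumps(value)
        lines.append(f"<parameter={key}>{text}</parameter>")
    return "\n".join(lines)


example = render_xml_parameters({"city": "Paris", "opts": {"units": "c"}})
```

A plain json_schema content format cannot describe this layout directly, which is why a custom content format or a schema transformation may be needed.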
File: python/sglang/srt/function_call/glm4_moe_detector.py
- Glm4MoeDetector
- Review existing structure_info() for format
- Implement equivalent in build_structural_tag()
File: python/sglang/srt/function_call/gpt_oss_detector.py
- GptOssDetector
- Review existing structure_info() for format
- Implement equivalent in build_structural_tag()
File: python/sglang/srt/function_call/pythonic_detector.py
- PythonicDetector (Might not be able to support)
- Review existing structure_info() for format
- Implement equivalent in build_structural_tag()
File: python/sglang/srt/function_call/kimik2_detector.py
- KimiK2Detector
- Review if needs special format (tags_with_separator?)
- Implement accordingly
File: python/sglang/srt/function_call/minimax_m2.py
- MinimaxM2Detector
- Review existing structure_info() for format
- Implement equivalent in build_structural_tag()
File: python/sglang/srt/function_call/step3_detector.py
- Step3Detector
- Review existing structure_info() for format
- Implement equivalent in build_structural_tag()
Special Detectors (Custom Formats)
File: python/sglang/srt/function_call/deepseekv31_detector.py
- DeepSeekV31Detector
- SPECIAL FORMAT: Uses wrapper tags
- Outer wrapper: <|tool▁calls▁begin|>...<|tool▁calls▁end|>
- Individual calls: <|tool▁call▁begin|>...<|tool▁call▁end|>
- Parameters in markdown code blocks: ```jsonc...```
- Use tags_with_separator format with \n separator
- Refer to: https://xgrammar.mlc.ai/docs/tutorials/structural_tag.html
File: python/sglang/srt/function_call/deepseekv3_detector.py
- DeepSeekV3Detector
- SPECIAL FORMAT: Similar to DeepSeekV31
- Check if same wrapper tag format applies
- May need sequence format for reasoning support
- Refer to: https://xgrammar.mlc.ai/docs/tutorials/structural_tag.html
Summary of Detectors
Total: 12 detectors to update
Standard detectors (9): Use triggered_tags format
- Llama32Detector
- MistralDetector
- Qwen25Detector
- Glm4MoeDetector
- GptOssDetector
- PythonicDetector
- KimiK2Detector
- MinimaxM2Detector
- Step3Detector
Special format detectors (3): Need custom implementations
- Qwen3CoderDetector (XML parameter format)
- DeepSeekV31Detector (wrapper tags + tags_with_separator)
- DeepSeekV3Detector (wrapper tags + tags_with_separator)
Implementation Template
For standard detectors, the implementation follows this pattern:
    def build_structural_tag(
        self,
        tools: List[Tool],
        at_least_one: bool = False,
        stop_after_first: bool = False,
    ) -> Dict[str, Any]:
        """Build structural tag for [MODEL] format."""
        tags = []
        triggers = set()
        for tool in tools:
            name = tool.function.name
            if not name:
                continue
            # Define model-specific format
            begin = "..."  # From old structure_info()
            end = "..."  # From old structure_info()
            trigger = "..."  # From old structure_info()
            # Always include schema
            schema = tool.function.parameters or {}
            tags.append({
                "format": "tag",
                "begin": begin,
                "content": {
                    "format": "json_schema",
                    "schema": schema
                },
                "end": end
            })
            triggers.add(trigger)
        return {
            "format": "triggered_tags",
            "triggers": list(triggers),
            "tags": tags,
            "at_least_one": at_least_one,
            "stop_after_first": stop_after_first
        }
For DeepSeek detectors (wrapper tags format):
    def build_structural_tag(
        self,
        tools: List[Tool],
        at_least_one: bool = False,
        stop_after_first: bool = False,
    ) -> Dict[str, Any]:
        """Build structural tag for DeepSeek wrapper format."""
        # Build individual tool call tags
        tool_tags = []
        for tool in tools:
            name = tool.function.name
            if not name:
                continue
            # NOTE: `name` is not yet embedded in `begin`, so every tool tag
            # starts identically; check the detector's actual format and
            # include the function name before relying on this template.
            schema = tool.function.parameters or {}
            tool_tags.append({
                "format": "tag",
                "begin": "<|tool▁call▁begin|>\n```jsonc\n{",
                "content": {
                    "format": "json_schema",
                    "schema": schema
                },
                "end": "}\n```\n<|tool▁call▁end|>"
            })
        # Wrap in outer tag with separator
        return {
            "format": "triggered_tags",
            "triggers": ["<|tool▁calls▁begin|>"],
            "tags": [{
                "format": "tag",
                "begin": "<|tool▁calls▁begin|>\n",
                "content": {
                    "format": "tags_with_separator",
                    "tags": tool_tags,
                    "separator": "\n",
                    "at_least_one": at_least_one,
                    "stop_after_first": stop_after_first
                },
                "end": "\n<|tool▁calls▁end|>"
            }],
            "at_least_one": False,  # Outer level
            "stop_after_first": False
        }
For Qwen3Coder (XML parameter format):
    def build_structural_tag(
        self,
        tools: List[Tool],
        at_least_one: bool = False,
        stop_after_first: bool = False,
    ) -> Dict[str, Any]:
        """Build structural tag for Qwen3-Coder XML format."""
        tags = []
        triggers = set()
        for tool in tools:
            name = tool.function.name
            if not name:
                continue
            # Qwen3-Coder uses XML parameter format
            # This may require a custom xgrammar format or schema transformation
            # Refer to: https://xgrammar.mlc.ai/docs/tutorials/structural_tag.html
            schema = tool.function.parameters or {}
            # TODO: May need to transform JSON schema to XML parameter schema
            tags.append({
                "format": "tag",
                "begin": f"<tool_call>\n<name>{name}</name>\n",
                "content": {
                    "format": "xml_parameters",  # May need custom format
                    "schema": schema
                },
                "end": "\n</tool_call>"
            })
            triggers.add("<tool_call>")
        return {
            "format": "triggered_tags",
            "triggers": list(triggers),
            "tags": tags,
            "at_least_one": at_least_one,
            "stop_after_first": stop_after_first
        }
Note: Qwen3-Coder implementation may require additional research into xgrammar's XML parameter support.
Testing
For each detector:
- Unit test: Verify structural tag format is correct
- Unit test: Verify schema is always included
- Unit test: Verify at_least_one and stop_after_first work
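The three checks above could share one small validation helper. The sketch below is an assumption about how such a helper might look: check_structural_tag is a hypothetical name, and the key names it inspects ("format", "schema") mirror the implementation template in this issue rather than a confirmed xgrammar schema.

```python
from typing import Any, Dict, List


def check_structural_tag(tag: Dict[str, Any], expected_tools: int) -> List[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems: List[str] = []
    if tag.get("format") != "triggered_tags":
        problems.append("top-level format is not triggered_tags")
    if not tag.get("triggers"):
        problems.append("no triggers")
    # Both flags must be present and boolean.
    for flag in ("at_least_one", "stop_after_first"):
        if not isinstance(tag.get(flag), bool):
            problems.append(f"{flag} missing or not bool")
    tags = tag.get("tags", [])
    if len(tags) != expected_tools:
        problems.append(f"expected {expected_tools} tags, got {len(tags)}")
    # Every tag must carry a schema, regardless of strict mode.
    for i, item in enumerate(tags):
        content = item.get("content", {})
        if content.get("format") != "json_schema" or "schema" not in content:
            problems.append(f"tag {i} has no json_schema content")
    return problems
```

A per-detector test would then call build_structural_tag() with different flag combinations and assert that the helper returns an empty list and that the flags are echoed in the result.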
Acceptance Criteria
- All detectors implement build_structural_tag()
- All detectors have structure_info() and build_ebnf() removed
- Unit tests pass for each detector
- Code compiles and runs
Issue 3: Add Integration Tests & E2E Validation
Estimate: 4-6 hours
Goal: Comprehensive testing of new format
Unit Tests
File: python/tests/srt/function_call/test_structural_tag_new_format.py (NEW)
- Test schema always included (even with strict=False)
- Test at_least_one parameter
- Test stop_after_first parameter
- Test multiple tools
- Test empty parameters handling
- Test for each detector (Llama, Mistral, Qwen, etc.)
File: python/tests/srt/function_call/test_parser_integration.py (NEW)
- Test auto mode returns structural_tag
- Test auto + parallel=False sets stop_after_first
- Test required mode returns json_schema
- Test required + parallel=False sets maxItems
- Test specific function choice
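A possible shape for these parser tests is a table of (tool_choice, parallel_tool_calls) inputs against the expected constraint kind. The sketch below uses a hypothetical FakeDetector and a simplified choose_constraint function that mirrors the get_structure_constraint() flow from Issue 1. It does not import sglang, and a specific function choice is modelled as any string other than "auto" or "required", which is an assumption.

```python
from typing import Any, Dict, List, Optional, Tuple


class FakeDetector:
    """Hypothetical test double that echoes the flags it receives."""

    def supports_structural_tag(self) -> bool:
        return True

    def build_structural_tag(
        self, tools: List[Any], at_least_one: bool = False, stop_after_first: bool = False
    ) -> Dict[str, Any]:
        return {
            "format": "triggered_tags",
            "triggers": [],
            "tags": [],
            "at_least_one": at_least_one,
            "stop_after_first": stop_after_first,
        }


def choose_constraint(
    detector: FakeDetector,
    tools: List[Any],
    tool_choice: str,
    parallel_tool_calls: bool = True,
) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Simplified mirror of the get_structure_constraint() flow."""
    if tool_choice == "auto":
        if detector.supports_structural_tag():
            tag = detector.build_structural_tag(
                tools=tools,
                at_least_one=False,
                stop_after_first=not parallel_tool_calls,
            )
            return ("structural_tag", tag)
        return None
    # "required", or a specific function name (assumed modelling)
    schema: Dict[str, Any] = {"type": "array", "minItems": 1}
    if tool_choice != "required" or not parallel_tool_calls:
        schema["maxItems"] = 1
    return ("json_schema", schema)


# Table-driven expectations: (tool_choice, parallel) -> constraint kind
CASES = [
    ("auto", True, "structural_tag"),
    ("auto", False, "structural_tag"),
    ("required", True, "json_schema"),
    ("required", False, "json_schema"),
    ("get_weather", True, "json_schema"),
]
RESULTS = [choose_constraint(FakeDetector(), [], c, p)[0] for c, p, _ in CASES]
```

The real tests would replace choose_constraint with FunctionCallParser.get_structure_constraint() and keep the same table.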
Integration Tests
File: python/tests/integration/test_tool_calling_e2e.py (NEW/UPDATE)
- Test with real models (Llama, Mistral, Qwen)
- Test tool_choice="auto" with structural tag
- Test parallel_tool_calls=True (multiple calls)
- Test parallel_tool_calls=False (single call)
- Test tool_choice="required" with json_schema
- Test schema guidance works (model follows parameter structure)
Performance Tests
File: python/tests/performance/test_structural_tag_overhead.py (NEW)
- Benchmark compilation time vs legacy format
- Benchmark generation speed
- Compare to json_schema baseline
Regression Tests
- Verify existing tool calling tests still pass
- Verify backward compatibility for models using json_schema
- Verify Kimi-K2 special handling still works
Acceptance Criteria
- All tests pass
- No performance regression
- Tool calling works correctly for all modes
- parallel_tool_calls parameter works as expected
Issue 4: Rust Router Support (OPTIONAL)
Estimate: 6-8 hours
Goal: Add parallel_tool_calls support and structural tag generation to Rust router
Note: This can be deferred or done in parallel by a different person
Tasks
1. Add ToolParser Trait Methods
File: sgl-router/src/tool_parser/traits.rs
- Add build_structural_tag() method to ToolParser trait with default implementation
- Add get_format_info() abstract method to ToolParser trait
2. Implement get_format_info() for All Parsers
Files: sgl-router/src/tool_parser/parsers/*.rs
- Implement get_format_info() for LlamaParser
- Implement get_format_info() for MistralParser
- Implement get_format_info() for QwenParser
- Implement get_format_info() for DeepSeekParser
- Override build_structural_tag() for DeepSeekParser if reasoning support needed
- Implement for any other parsers
3. Create Constraints Module
File: sgl-router/src/tool_parser/constraints.rs (NEW)
- Create new file for constraint generation logic
- Add build_tool_call_constraint() function with parameters:
  - tools: &[Tool]
  - tool_choice: &Option<ToolChoice>
  - parallel_tool_calls: bool
  - tool_parser_factory: &ToolParserFactory
  - configured_parser: Option<&String>
  - model: &str
- Implement auto mode → structural_tag logic
- Implement required mode → json_schema logic
- Implement specific function → json_schema logic
- Add build_required_json_schema() helper
- Add build_specific_function_json_schema() helper
- Add parallel_tool_calls support (maxItems in json_schema)
File: sgl-router/src/tool_parser/mod.rs
- Add pub mod constraints; export
4. Update Protocol
File: sgl-router/src/protocols/chat.rs
- Add parallel_tool_calls: Option<bool> field to ChatCompletionRequest
5. Update Regular Router Preparation
File: sgl-router/src/routers/grpc/regular/stages/chat/preparation.rs
- Add import: use crate::tool_parser::constraints;
- Replace utils::generate_tool_constraints() with constraints::build_tool_call_constraint()
- Pass parallel_tool_calls parameter
- Pass tool_parser_factory from SharedComponents
- Pass configured_tool_parser from processor
- Pass model for auto-detection fallback
6. Update Harmony Router Preparation
File: sgl-router/src/routers/grpc/harmony/stages/preparation.rs
- Add import: use crate::tool_parser::constraints;
- Replace utils::generate_tool_constraints() with constraints::build_tool_call_constraint()
- Pass same parameters as regular router
File: sgl-router/src/routers/grpc/harmony/stages/request_building.rs
- Check if constraint generation happens here
- Update similarly if needed
7. Remove Old Code
File: sgl-router/src/routers/grpc/utils.rs
- Remove generate_tool_constraints() function
Testing
File: sgl-router/tests/tool_parser/test_structural_tag.rs (NEW)
- Unit tests for each parser's get_format_info()
- Unit tests for build_structural_tag() output format
- Test at_least_one and stop_after_first parameters
File: sgl-router/tests/tool_parser/test_constraints.rs (NEW)
- Test build_tool_call_constraint() with auto mode
- Test with required mode
- Test with specific function
- Test parallel_tool_calls=true/false
- Test json_schema maxItems handling
- Test configured_parser takes precedence
- Test auto-detection fallback
Integration Tests:
- Test regular router with structural_tag constraint
- Test harmony router with structural_tag constraint
- Verify Rust router → Python backend integration
- Verify parallel_tool_calls passed correctly
- Verify parity with Python implementation
Acceptance Criteria
- ToolParser trait has structural tag methods
- All parsers implement the new methods
- Constraint generation moved to tool_parser module
- Rust router supports parallel_tool_calls parameter
- Auto mode uses structural_tag
- Required mode uses json_schema with maxItems support
- Integration tests pass
- Parity with Python router behavior
Success Metrics
- All legacy code removed
- All detectors implement new format
- Schema always included in structural tags
- parallel_tool_calls parameter works
- All tests pass
- No performance regression
- Documentation updated