BrowseComp/HLE Reproducibility

Hi Zhipu team, thank you so much for open-sourcing such impressive models and sharing your research!
I have two quick questions:

1. How can the BrowseComp and HLE evaluation results be replicated? I noticed the two files ["trajectory_search.json"](https://github.com/zai-org/GLM-4.5/blob/main/resources/trajectory_search.json) and ["glm_4.6_tir_guide.md"](https://github.com/zai-org/GLM-4.5/blob/main/resources/glm_4.6_tir_guide.md), but are they enough on their own to reproduce the reported results?
2. Is the search-agent framework you used for BrowseComp/HLE evaluation open-source, or do you plan to open-source it?

Also, if I may ask, which agent framework was used to achieve the SWE-Bench Verified results?

Thanks again for your great work! 🙏


BrowseComp/HLE Reproducibility #87
