Hi Zhipu team, thank you so much for open-sourcing such impressive models and sharing your research!
I had 2 quick questions:
- How can the BrowseComp and HLE evaluation results be replicated? I noticed the two files "trajectory_search.json" and "glm_4.6_tir_guide.md", but are these enough to reproduce the reported results?
- Is the search-agent framework you used for BrowseComp/HLE evaluation open-source, or do you plan to open-source it?
Also, if I may ask, which agent framework was used to achieve the SWE-Bench Verified results?
Thanks again for your great work! 🙏