Google Scholar

User profiles for Zhoufutu Wen

Zhoufutu Wen

ByteDance SEED

Verified email at bytedance.com

Cited by 607

[PDF] acm.org

Enhancing dynamic image advertising with vision-language pre-training

Z Wen, X Zhao, Z Jin, Y Yang, W Jia, X Chen… - Proceedings of the 46th …, 2023 - dl.acm.org

In the multimedia era, image becomes an effective medium in search advertising. Dynamic
Image Advertising (DIA), a system that matches queries with appropriate ad images and …

Cited by 13

[PDF] iclr.cc

KOR-Bench: Benchmarking language models on knowledge-orthogonal reasoning tasks

K Ma, X Du, Y Wang, H Zhang, Z Wen… - International …, 2025 - proceedings.iclr.cc

In this paper, we introduce Knowledge-Orthogonal Reasoning (KOR), a concept aimed at
minimizing reliance on domain-specific knowledge, enabling more accurate evaluation of …

Cited by 12

[PDF] neurips.cc

SuperGPQA: Scaling LLM evaluation across 285 graduate disciplines

…, D Ma, Y Ni, H Que, Q Wang, Z Wen… - Advances in …, 2026 - proceedings.neurips.cc

Large language models (LLMs) have demonstrated remarkable proficiency in mainstream
academic disciplines such as mathematics, physics, and computer science. However, human …

Cited by 130

[PDF] thecvf.com

SimpleVQA: Multimodal factuality evaluation for multimodal large language models

…, G Zhang, J Liu, Y Mai, Y Zeng, Z Wen… - Proceedings of the …, 2025 - openaccess.thecvf.com

The increasing application of multi-modal large language models (MLLMs) across various
sectors has spotlighted the essence of their output reliability and accuracy, particularly their …

Cited by 43

[PDF] arxiv.org

First return, entropy-eliciting explore

…, Q Gu, T Liang, X Qu, X Zhou, Y Li, Z Wen… - arXiv preprint arXiv …, 2025 - arxiv.org

Reinforcement Learning from Verifiable Rewards (RLVR) improves the reasoning abilities
of Large Language Models (LLMs) but it struggles with unstable exploration. We propose …

Cited by 35

[PDF] arxiv.org

TreePO: Bridging the gap of policy optimization and efficacy and inference efficiency with heuristic tree-based modeling

Y Li, Q Gu, Z Wen, Z Li, T Xing, S Guo, T Zheng… - arXiv preprint arXiv …, 2025 - arxiv.org

Recent advancements in aligning large language models via reinforcement learning have
achieved remarkable gains in solving complex reasoning problems, but at the cost of …

Cited by 36

[PDF] arxiv.org

FutureX: An advanced live benchmark for LLM agents in future prediction

…, J Guo, L Hu, J Jiao, X Li, J Liu, S Ni, Z Wen… - arXiv preprint arXiv …, 2025 - arxiv.org

Future prediction is a complex task for LLM agents, requiring a high level of analytical thinking,
information gathering, contextual understanding, and decision-making under uncertainty. …

Cited by 26

[PDF] neurips.cc

KORGym: A dynamic game platform for LLM reasoning evaluation

…, X Bu, J Chen, J Zhou, K Ma, Z Wen… - Advances in …, 2026 - proceedings.neurips.cc

Recent advancements in large language models (LLMs) underscore the need for more
comprehensive evaluation methods to accurately assess their reasoning capabilities. Existing …

Cited by 4

[PDF] arxiv.org

FinSearchComp: Towards a realistic, expert-level evaluation of financial search and reasoning

L Hu, J Jiao, J Liu, Y Ren, Z Wen, K Zhang… - arXiv preprint arXiv …, 2025 - arxiv.org

Search has emerged as core infrastructure for LLM-based agents and is widely viewed as
critical on the path toward more general intelligence. Finance is a particularly demanding …

Cited by 17

[PDF] arxiv.org

IV-Bench: A benchmark for image-grounded video perception and reasoning in multimodal LLMs

…, Z Yang, Z Peng, B Feng, J Ma, X Gu, Z Wen… - arXiv preprint arXiv …, 2025 - arxiv.org

Existing evaluation frameworks for Multimodal Large Language Models (MLLMs) primarily
focus on image reasoning or general video understanding tasks, largely overlooking the …

Cited by 12
