Zhoufutu Wen

ByteDance SEED
Verified email at bytedance.com
Cited by 607

Enhancing dynamic image advertising with vision-language pre-training

Z Wen, X Zhao, Z Jin, Y Yang, W Jia, X Chen… - Proceedings of the 46th …, 2023 - dl.acm.org
In the multimedia era, image becomes an effective medium in search advertising. Dynamic
Image Advertising (DIA), a system that matches queries with appropriate ad images and …

KOR-Bench: Benchmarking language models on knowledge-orthogonal reasoning tasks

K Ma, X Du, Y Wang, H Zhang, Z Wen… - International …, 2025 - proceedings.iclr.cc
In this paper, we introduce Knowledge-Orthogonal Reasoning (KOR), a concept aimed at
minimizing reliance on domain-specific knowledge, enabling more accurate evaluation of …

SuperGPQA: Scaling LLM evaluation across 285 graduate disciplines

…, D Ma, Y Ni, H Que, Q Wang, Z Wen… - Advances in …, 2026 - proceedings.neurips.cc
Large language models (LLMs) have demonstrated remarkable proficiency in mainstream
academic disciplines such as mathematics, physics, and computer science. However, human …

SimpleVQA: Multimodal factuality evaluation for multimodal large language models

…, G Zhang, J Liu, Y Mai, Y Zeng, Z Wen… - Proceedings of the …, 2025 - openaccess.thecvf.com
The increasing application of multi-modal large language models (MLLMs) across various
sectors has spotlighted the essence of their output reliability and accuracy, particularly their …

First return, entropy-eliciting explore

…, Q Gu, T Liang, X Qu, X Zhou, Y Li, Z Wen… - arXiv preprint arXiv …, 2025 - arxiv.org
Reinforcement Learning from Verifiable Rewards (RLVR) improves the reasoning abilities
of Large Language Models (LLMs) but it struggles with unstable exploration. We propose …

TreePO: Bridging the gap of policy optimization and efficacy and inference efficiency with heuristic tree-based modeling

Y Li, Q Gu, Z Wen, Z Li, T Xing, S Guo, T Zheng… - arXiv preprint arXiv …, 2025 - arxiv.org
Recent advancements in aligning large language models via reinforcement learning have
achieved remarkable gains in solving complex reasoning problems, but at the cost of …

FutureX: An advanced live benchmark for LLM agents in future prediction

…, J Guo, L Hu, J Jiao, X Li, J Liu, S Ni, Z Wen… - arXiv preprint arXiv …, 2025 - arxiv.org
Future prediction is a complex task for LLM agents, requiring a high level of analytical thinking,
information gathering, contextual understanding, and decision-making under uncertainty. …

KORGym: A dynamic game platform for LLM reasoning evaluation

…, X Bu, J Chen, J Zhou, K Ma, Z Wen… - Advances in …, 2026 - proceedings.neurips.cc
Recent advancements in large language models (LLMs) underscore the need for more
comprehensive evaluation methods to accurately assess their reasoning capabilities. Existing …

FinSearchComp: Towards a realistic, expert-level evaluation of financial search and reasoning

L Hu, J Jiao, J Liu, Y Ren, Z Wen, K Zhang… - arXiv preprint arXiv …, 2025 - arxiv.org
Search has emerged as core infrastructure for LLM-based agents and is widely viewed as
critical on the path toward more general intelligence. Finance is a particularly demanding …

IV-Bench: A benchmark for image-grounded video perception and reasoning in multimodal LLMs

…, Z Yang, Z Peng, B Feng, J Ma, X Gu, Z Wen… - arXiv preprint arXiv …, 2025 - arxiv.org
Existing evaluation frameworks for Multimodal Large Language Models (MLLMs) primarily
focus on image reasoning or general video understanding tasks, largely overlooking the …