Title, authors, venue | Cited by | Year |
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ... Transactions on Machine Learning Research, 2022 | 1970* | 2022 |
Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task T Yu, R Zhang, K Yang, M Yasunaga, D Wang, Z Li, J Ma, I Li, Q Yao, ... EMNLP 2018, 2018 | 1552 | 2018 |
QMSum: A new benchmark for query-based multi-domain meeting summarization M Zhong, D Yin, T Yu, A Zaidi, M Mutuma, R Jha, AH Awadallah, ... NAACL 2021, 2021 | 367 | 2021 |
One embedder, any task: Instruction-finetuned text embeddings H Su, J Kasai, Y Wang, Y Hu, M Ostendorf, W Yih, NA Smith, ... ACL 2023, 2023 | 350 | 2023 |
UnifiedSKG: Unifying and multi-tasking structured knowledge grounding with text-to-text language models T Xie, CH Wu, P Shi, R Zhong, T Scholak, M Yasunaga, CS Wu, M Zhong, ... EMNLP 2022, 2022 | 345* | 2022 |
DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation Y Lai, C Li, Y Wang, T Zhang, R Zhong, L Zettlemoyer, SW Yih, D Fried, ... ICML 2023, 2023 | 330 | 2023 |
TypeSQL: Knowledge-based type-aware neural text-to-SQL generation T Yu, Z Li, Z Zhang, R Zhang, D Radev NAACL 2018, 2018 | 329 | 2018 |
Selective annotation makes language models better few-shot learners H Su, J Kasai, CH Wu, W Shi, T Wang, J Xin, R Zhang, M Ostendorf, ... ICLR 2023, 2023 | 309* | 2023 |
GraPPa: grammar-augmented pre-training for table semantic parsing T Yu, CS Wu, XV Lin, B Wang, YC Tan, X Yang, D Radev, R Socher, ... ICLR 2021, 2021 | 272* | 2021 |
SyntaxSQLNet: Syntax Tree Networks for Complex and Cross Domain Text-to-SQL Task T Yu, M Yasunaga, K Yang, R Zhang, D Wang, Z Li, D Radev EMNLP 2018, 2018 | 262 | 2018 |
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments T Xie, D Zhang, J Chen, X Li, S Zhao, R Cao, TJ Hua, Z Cheng, D Shin, ... NeurIPS 2024, 2024 | 260 | 2024 |
SParC: cross-domain semantic parsing in context T Yu, R Zhang, M Yasunaga, YC Tan, XV Lin, S Li, H Er, I Li, B Pang, ... ACL 2019, 2019 | 247* | 2019 |
Binding language models in symbolic languages Z Cheng, T Xie, P Shi, C Li, R Nadkarni, Y Hu, C Xiong, D Radev, ... ICLR 2023, 2023 | 243 | 2023 |
ZeroGen: Efficient zero-shot learning via dataset generation J Ye, J Gao, Q Li, H Xu, J Feng, Z Wu, T Yu, L Kong EMNLP 2022, 2022 | 225 | 2022 |
DART: Open-domain structured data record to text generation L Nan, D Radev, R Zhang, A Rau, A Sivaprasad, C Hsieh, X Tang, A Vyas, ... NAACL 2021, 2020 | 222* | 2020 |
FOLIO: Natural language reasoning with first-order logic S Han, H Schoelkopf, Y Zhao, Z Qi, M Riddell, W Zhou, J Coady, D Peng, ... arXiv preprint arXiv:2209.00840, 2022 | 216* | 2022 |
Twitter sentiment in New York City parks as measure of well-being RA Plunz, Y Zhou, MIC Vintimilla, K Mckeown, T Yu, L Uguccioni, ... Landscape and Urban Planning 189, 235-246, 2019 | 210 | 2019 |
Editing-based SQL query generation for cross-domain context-dependent questions R Zhang, T Yu, HY Er, S Shim, E Xue, XV Lin, T Shi, C Xiong, R Socher, ... EMNLP 2019, 2019 | 176* | 2019 |
Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning T Xie, S Zhao, CH Wu, Y Liu, Q Luo, V Zhong, Y Yang, T Yu ICLR 2024, 2024 | 175* | 2024 |
Generative representational instruction tuning N Muennighoff, H Su, L Wang, N Yang, F Wei, T Yu, A Singh, D Kiela ICLR 2025, 2024 | 173 | 2024 |