Research Article
DOI: 10.1145/3626772.3657855

TriviaHG: A Dataset for Automatic Hint Generation from Factoid Questions

Published: 11 July 2024

Abstract

Nowadays, individuals increasingly turn to Large Language Models for answers to their questions. At a time when such answers are readily available to anyone, stimulating and preserving humans' cognitive abilities and ensuring that humans maintain good reasoning skills become crucial. This study addresses these needs by proposing hints (instead of final answers, or before giving answers) as a viable solution. We introduce a framework for automatic hint generation for factoid questions and employ it to construct TriviaHG, a novel large-scale dataset featuring 160,230 hints corresponding to 16,645 questions from the TriviaQA dataset. Additionally, we present an automatic evaluation method that measures the Convergence and Familiarity quality attributes of hints. To evaluate the TriviaHG dataset and the proposed evaluation method, we enlisted 10 individuals to annotate 2,791 hints and tasked 6 humans with answering questions using the provided hints. The effectiveness of hints varied, with success rates of 96%, 78%, and 36% for questions with easy, medium, and hard answers, respectively. Moreover, the proposed automatic evaluation method showed a robust correlation with the annotators' results. In conclusion, the findings highlight three key insights: the facilitative role of hints in resolving unknown questions, the dependence of hint quality on answer difficulty, and the feasibility of automatic evaluation methods for hint assessment.
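
The abstract above describes the dataset and two automatic quality measures for hints, Convergence and Familiarity. As a rough, non-authoritative illustration only, the Python sketch below shows what a TriviaHG-style record could look like and a toy keyword-overlap stand-in for a Convergence-like score. The field names, candidate answers, and scoring rule are illustrative assumptions, not the authors' implementation or the dataset's actual schema.

# A minimal sketch, assuming hypothetical field names and a toy scoring rule.
# It is NOT the paper's evaluation method; it only illustrates the idea that a
# good hint should narrow the space of plausible answers toward the gold answer.
from dataclasses import dataclass, field


@dataclass
class HintedQuestion:
    question: str
    answer: str
    hints: list[str] = field(default_factory=list)


def toy_convergence(hint: str, answer: str, candidates: list[str]) -> float:
    """Fraction of wrong candidates sharing no token with the hint -- a crude,
    purely illustrative stand-in for a Convergence-style score."""
    hint_terms = set(hint.lower().split())
    wrong = [c for c in candidates if c.lower() != answer.lower()]
    eliminated = [c for c in wrong if not (set(c.lower().split()) & hint_terms)]
    return len(eliminated) / len(wrong) if wrong else 0.0


if __name__ == "__main__":
    record = HintedQuestion(
        question="Which planet is known as the Red Planet?",
        answer="Mars",
        hints=["This planet is named after the Roman god of war."],
    )
    candidates = ["Mars", "Venus", "Jupiter", "Mercury"]
    print(toy_convergence(record.hints[0], record.answer, candidates))  # 1.0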




      Published In

SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2024, 3164 pages
ISBN: 9798400704314
DOI: 10.1145/3626772

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. hint generation
      2. large language models
      3. question answering

      Conference

      SIGIR 2024

      Acceptance Rates

Overall acceptance rate: 792 of 3,983 submissions (20%)
