This repository was archived by the owner on Nov 3, 2025. It is now read-only.

Description
Hello!!!
I am interested in your paper "Hallucination Augmented Recitations for Language Models" and would like to conduct similar experiments.
I have a few questions for you, and I would be incredibly grateful if you could answer them :)
Table 3: Token-level F1 scores of T5-3B models finetuned with TriviaQA, CF-TriviaQA, and their combination. Combining our CF-TriviaQA dataset with TriviaQA achieves good out-of-domain performance while having a similar performance in in-domain as the model finetuned with TriviaQA.
-
Here, are you referring to the whole TriviaQA dataset (61,688 examples) or just a subset? If it's a subset, how many training examples were included?
-
(Similar to the previous question) For TriviaQA + CF-TriviaQA, does that mean the whole TriviaQA dataset plus the 19,327 CF examples you generated?
Looking forward to your responses!