This repository was archived by the owner on Nov 3, 2025. It is now read-only.

Description
Hello!!!
I am interested in your paper "Hallucination Augmented Recitations for Language Models" and would like to conduct similar experiments.
I have a few questions for you, and I would be incredibly grateful if you could answer them :)
Table 3: Token-level F1 scores of T5-3B models finetuned with TriviaQA, CF-TriviaQA, and their combination. Combining our CF-TriviaQA dataset with TriviaQA achieves good out-of-domain performance while having a similar performance in in-domain as the model finetuned with TriviaQA.
-
Here, are you referring to the whole TriviaQA dataset (61,688 examples) or just a subset? If it's a subset, how many training examples were included?
-
(Similar to the previous question) For TriviaQA + CF-TriviaQA, does that mean the whole TriviaQA dataset plus the 19,327 CF examples you generated?
Looking forward to your responses!