How are the claim-level metrics (accuracy, precision, recall, etc.) calculated? The claims extracted by the model will inevitably differ from run to run, so it seems difficult to compute these metrics without manual evaluation.
Is the metric calculation based on the ground-truth annotations in the dataset? That is: the claims are given by default, and the correctness of each claim is known (corresponding to its label in the dataset); evidence is retrieved for each given claim, the claim is verified against that evidence, and the predicted verdict is then compared with the ground-truth label to perform factual verification. I'm not sure whether my understanding is correct. Also, is this part of the code missing from the repo?
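To make my understanding concrete, here is a minimal sketch of how I assume the metrics would be computed if the claims and labels come from the dataset: predicted verdicts and gold labels align one-to-one, so standard scoring applies directly (sklearn here is just for illustration, not necessarily what the repo uses).

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Gold labels taken straight from the dataset annotation of the given claims
# (assumption: True = supported, False = refuted).
gold = [True, False, True, True]
# Predicted verdicts from the retrieve-evidence-then-verify step, one per claim.
pred = [True, False, False, True]

print("accuracy :", accuracy_score(gold, pred))   # 0.75
print("precision:", precision_score(gold, pred))  # 1.0
print("recall   :", recall_score(gold, pred))     # ~0.667
```

If this is roughly what happens, no manual evaluation is needed, since the claim set is fixed by the dataset rather than extracted by the model.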
