This directory contains tools for using SMART to evaluate texts produced by systems, given the source document and the reference summaries.
Link to paper: https://arxiv.org/pdf/2208.01030.pdf
SMART can be run programmatically. For example:
matcher = matching_functions.chrf_matcher
smart_scorer = scorer.SmartScorer(matching_fn=matcher)
score = smart_scorer.smart_score(reference, candidate)
Here, score is a dictionary containing SMART (1/2/L) scores.
To reproduce the SummEval experiments, first download the necessary datasets:
- BARTScore data (you need to unpickle and save it again as a json file)
- SummEval data
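The unpickle-and-resave step for the BARTScore data can be sketched as follows. This is a minimal sketch, not part of the SMART tooling: the helper name `convert_pickle_to_json` is hypothetical, and it assumes the pickled object consists only of JSON-serializable types (dicts, lists, strings, numbers).

```python
import json
import pickle


def convert_pickle_to_json(pickle_path, json_path):
    """Load a pickled object and write it back out as a JSON file.

    Hypothetical helper for illustration; both paths are supplied by the caller.
    """
    # Only unpickle files from a source you trust: pickle can run code on load.
    with open(pickle_path, "rb") as f:
        data = pickle.load(f)

    # Assumption: the object contains only JSON-serializable values.
    # json.dump raises TypeError otherwise, in which case those values
    # must be converted to plain Python types first.
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False)

    return data
```

Call it with the path of the downloaded pickle file and the path where the JSON file should be written; the resulting JSON path is then what you would pass as the BARTScore file to the experiment script.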
You also need to download the precomputed scores for the model-based matching functions (e.g., BLEURT, BERTScore, and T5-ANLI). Install gsutil by following its installation instructions, then run the following in the terminal:
gsutil cp -r gs://gresearch/SMART ./
Finally, run the following:
python summeval_experiments.py --bartscore_file=${BARTSCORE_PATH} --summeval_file=${SUMMEVAL_PATH} --output_file=${OUTPUT_PATH}