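# Evaluation

This repo is used for benchmarking LLMs. All values below are accuracy (%).

| model | Acc@lambada_openai | Acc@cbt-cn-val | Acc@cbt-cn-test | Acc@cbt-ne-val | Acc@cbt-ne-test |
|---|---|---|---|---|---|
| gpt2 117M | 44.7 | 87.05 | 86.12 | 84.35 | 79.44 |
| gpt2 117M (paper) | 45.99 | 87.65 | * | 83.4 | * |

`*` indicates that no corresponding value is listed for the paper.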
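To compare the reproduced gpt2 117M numbers with the paper row, the gap per task is simply the reproduced accuracy minus the paper accuracy, in percentage points. The sketch below copies the values from the table and skips tasks marked `*`. The dictionary keys and the `accuracy_gaps` helper are hypothetical names chosen for illustration, not part of this repo's code.

```python
# Accuracy (%) copied from the results table; None stands for "*" (no paper value).
reproduced = {
    "lambada_openai": 44.7,
    "cbt-cn-val": 87.05,
    "cbt-cn-test": 86.12,
    "cbt-ne-val": 84.35,
    "cbt-ne-test": 79.44,
}
paper = {
    "lambada_openai": 45.99,
    "cbt-cn-val": 87.65,
    "cbt-cn-test": None,
    "cbt-ne-val": 83.4,
    "cbt-ne-test": None,
}


def accuracy_gaps(ours, reference):
    """Hypothetical helper: reproduced minus reference accuracy, in
    percentage points, for every task that has a reference value."""
    return {
        task: round(ours[task] - ref, 2)
        for task, ref in reference.items()
        if ref is not None
    }


gaps = accuracy_gaps(reproduced, paper)
for task, gap in gaps.items():
    print(f"{task}: {gap:+.2f} pp")
```

A negative gap means the reproduced run scores below the paper on that task; tasks without a paper value are left out rather than compared against zero.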