Ph.D., NLP researcher, data scientist, and entrepreneur.
Author of Prof. Torchenstein's PyTorch course.
LLM Eval
9 repositories
Repository for the paper "Shepherd: A Critic for Language Model Generation"
Source code for the paper "GPTScore: Evaluate as You Desire"
GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023)
Evaluation and Tracking for LLM Experiments and AI Agents
Evaluating LLMs' multi-turn conversational ability by assessing dialogues generated between two LLM instances.
[NeurIPS 2024] A comprehensive benchmark for evaluating the critique ability of LLMs