
Organizations

@Ermlab @plon-io

Stars

🔍 LLM Eval

9 repositories

This is the repo for the paper Shepherd -- A Critic for Language Model Generation

Jupyter Notebook · 220 stars · 9 forks · Updated Aug 10, 2023

Source Code of Paper "GPTScore: Evaluate as You Desire"

Python · 257 stars · 22 forks · Updated Feb 21, 2023

Expanding natural instructions

Python · 1,028 stars · 197 forks · Updated Dec 11, 2023

GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023)

Python · 62 stars · 1 fork · Updated Dec 25, 2023

Evaluation and Tracking for LLM Experiments and AI Agents

Python · 2,987 stars · 237 forks · Updated Dec 20, 2025

Evaluating LLMs' multi-round chatting capability by assessing conversations generated by two LLM instances.

Jupyter Notebook · 159 stars · 6 forks · Updated May 22, 2025

[NeurIPS 2024] A comprehensive benchmark for evaluating critique ability of LLMs

Python · 48 stars · 2 forks · Updated Nov 29, 2024

Reproducible, flexible LLM evaluations

Python · 307 stars · 63 forks · Updated Nov 20, 2025