inspect_evals to AVID report #9

Merged
shubhobm merged 2 commits into main from inspect_eval
Mar 19, 2025

@harshraj172

Collaborator

No description provided.

@harshraj172 harshraj172 requested a review from shubhobm March 13, 2025 18:09
Comment thread avidtools/connectors/inspect.py Outdated
report = Report()

report.affects = Affects(
    developer=[],

Contributor

For this report, the developer should be OpenAI. Programmatically, we need to parse the first part of openai/gpt-4o-mini, then do a key-value lookup in a dict of the form name: human-readable-name and append the human-readable name here.

Collaborator Author

It is difficult to make the extraction generalizable because, for example, one can use a model created by Meta AI that is hosted via Azure AI (Microsoft).

Collaborator Author

For example: inspect eval --model azureai/llama-2-70b-chat-wnsnw

Contributor

Those are special cases where deployer can be populated in place of developer. We can even choose to populate only deployer. In any case, splitting the model name from where it is hosted or developed is the right way to fit the schema, rather than leaving the string unsplit.
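A minimal sketch of the split discussed in this thread, assuming the model string has the form provider/name. The function name split_model and both lookup tables are hypothetical and only illustrative; they are not part of avidtools or Inspect. Prefixes known to be developers populate developer, while hosting prefixes (and unknown ones) populate deployer.

```python
# Hypothetical lookup tables; the entries are illustrative, not exhaustive.
DEVELOPERS = {"openai": "OpenAI", "anthropic": "Anthropic", "google": "Google"}
DEPLOYERS = {"azureai": "Microsoft Azure AI", "bedrock": "Amazon Bedrock"}


def split_model(model: str) -> dict:
    """Split a 'provider/name' string into developer, deployer, and bare name."""
    # Split on the first slash only, so names that contain slashes stay intact.
    prefix, sep, name = model.partition("/")
    if not sep:
        # No provider prefix: keep the whole string as the model name.
        return {"developer": [], "deployer": [], "name": model}

    developer = [DEVELOPERS[prefix]] if prefix in DEVELOPERS else []
    # Hosting prefixes go to deployer; unknown prefixes are recorded as-is.
    deployer = [] if developer else [DEPLOYERS.get(prefix, prefix)]
    return {"developer": developer, "deployer": deployer, "name": name}
```

With this shape, openai/gpt-4o-mini yields a developer entry and the bare name gpt-4o-mini, while azureai/llama-2-70b-chat-wnsnw yields only a deployer entry, which matches the special case described above.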

Comment thread avidtools/connectors/inspect.py Outdated
artifacts=[
    Artifact(
        type=ArtifactTypeEnum.model,
        name=eval_log.eval.model

Contributor

This should be only gpt-4o-mini, without the provider prefix.

Comment thread avidtools/connectors/inspect.py Outdated
type=TypeEnum.measurement,
description=LangValue(
    lang='eng',
    value=eval_log.eval.task

Contributor

This should be a canned sentence of the form f"Evaluation of the LLM {model_name} on the {benchmark} benchmark using Inspect Evals".
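A minimal sketch of the canned sentence suggested above. The helper name problemtype_description is hypothetical; the sentence template is taken from the review comment, and the caller is assumed to pass the bare model name and the benchmark name.

```python
def problemtype_description(model_name: str, benchmark: str) -> str:
    """Build the canned problemtype sentence from the review comment."""
    return (
        f"Evaluation of the LLM {model_name} on the {benchmark} "
        "benchmark using Inspect Evals"
    )


# Example call with the model and benchmark mentioned in this review.
print(problemtype_description("gpt-4o-mini", "BOLD"))
```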

lang='eng',
value=f"Sample input: {sample.input}\n"
      f"Model output: {sample.output}\n"
      f"Score: {sample.score}"

Contributor

This is a good structure. To set the context, can you:

  1. Start with a canned description of the benchmark (if it is in the logs), or otherwise just the canned sentence from problemtype.
  2. Add a field f"Scorer: {scorer_description}\n" before the score, so the reader knows what the score signifies.
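The two requests above could be sketched as follows. The helper name sample_description and its parameters are hypothetical; it takes plain values rather than Inspect log objects, so how the context and scorer description are read from the log is left as an assumption for the caller.

```python
def sample_description(
    context: str,
    sample_input: str,
    model_output: str,
    scorer_description: str,
    score: object,
) -> str:
    """Format one sample: context first, then input, output, scorer, score."""
    return (
        f"{context}\n\n"                       # canned benchmark description
        f"Sample input: {sample_input}\n"
        f"Model output: {model_output}\n"
        f"Scorer: {scorer_description}\n"      # explains what the score means
        f"Score: {score}"
    )
```

Placing the scorer line directly before the score keeps the explanation next to the number it qualifies.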

Comment thread avidtools/connectors/inspect.py Outdated
Reference(
    type='source',
    label='Inspect Evaluation Log',
    url=file_path

Contributor

When this report goes up on AVID, the user will not be able to access this log. Instead, please point to the benchmark itself. For example, if we are ingesting a report on BOLD, the URLs should point to that module in inspect_evals that we are contributing and to the corresponding page in the Inspect Evals docs.
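One possible shape for this, sketched with plain dicts instead of the Reference class so it stays self-contained. The helper name benchmark_references and the labels are hypothetical, and the URLs are supplied by the caller because the exact layout of the inspect_evals repository and docs is not stated in this thread.

```python
def benchmark_references(benchmark: str, module_url: str, docs_url: str) -> list:
    """Return two public references for a benchmark instead of a local log path."""
    return [
        {
            "type": "source",
            "label": f"{benchmark} module in inspect_evals",  # hypothetical label
            "url": module_url,
        },
        {
            "type": "source",
            "label": f"{benchmark} page in the Inspect Evals docs",  # hypothetical label
            "url": docs_url,
        },
    ]


# Placeholder URLs only; replace them with the real module and docs pages.
refs = benchmark_references(
    "BOLD",
    "https://example.org/inspect_evals/bold",
    "https://example.org/inspect_evals/docs/bold",
)
```

Each dict carries the same three fields as the Reference call in the diff, so mapping them onto Reference objects should be mechanical.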

@shubhobm shubhobm left a comment

Contributor

Code is functional! A few comments on organization.

@harshraj172

Collaborator Author

@shubhobm please take a look now

@shubhobm shubhobm left a comment

Contributor

lgtm

@shubhobm shubhobm merged commit b6a650a into main Mar 19, 2025
@shubhobm shubhobm deleted the inspect_eval branch March 19, 2025 17:36