inspect_evals to AVID report by harshraj172 · Pull Request #9 · avidml/avidtools

harshraj172 · 2025-03-13T18:09:51Z

No description provided.

shubhobm · 2025-03-13T18:39:13Z

+        report = Report()
+
+        report.affects = Affects(
+            developer=[],


for this report, developer should be OpenAI, so programmatically we need to parse the first part of openai/gpt-4o-mini, then do a key-value lookup in a dict of the form name: human-readable-name and append the human-readable name here

It is a bit difficult to make the extraction generalizable because, for example, one can use a model created by Meta AI and hosted via Azure AI (Microsoft).

For example: inspect eval --model azureai/llama-2-70b-chat-wnsnw

Those are special cases where deployer can be populated in place of developer; we could even choose to populate only deployer. In any case, splitting the model name from where it is hosted/developed is the right way to fit the schema, as opposed to not splitting.
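A minimal sketch of that logic, assuming Inspect model strings of the form provider/model-name. DEVELOPER_NAMES, HOST_NAMES, and affects_fields are hypothetical names (not part of avidtools), and the table entries are illustrative only. For hosting prefixes such as azureai, the sketch takes the "populate only deployer" option.

```python
# Hypothetical lookup tables; entries are illustrative, not an avidtools API.
# Prefix -> human-readable name of the organization that built the model.
DEVELOPER_NAMES = {
    "openai": "OpenAI",
    "anthropic": "Anthropic",
    "mistral": "Mistral AI",
}
# Prefix -> human-readable name of a platform that only hosts the model.
HOST_NAMES = {
    "azureai": "Microsoft",
}


def affects_fields(model_id: str) -> dict[str, list[str]]:
    """Map an Inspect 'provider/model' string to developer/deployer lists."""
    provider = model_id.split("/", 1)[0]
    if provider in HOST_NAMES:
        # Hosted model (e.g. a Meta model on Azure AI): the real developer is
        # not recoverable from the prefix, so populate only the deployer.
        return {"developer": [], "deployer": [HOST_NAMES[provider]]}
    # Otherwise treat the prefix as the developer, falling back to the raw
    # prefix when it is missing from the table.
    return {"developer": [DEVELOPER_NAMES.get(provider, provider)], "deployer": []}


print(affects_fields("openai/gpt-4o-mini"))
# {'developer': ['OpenAI'], 'deployer': []}
print(affects_fields("azureai/llama-2-70b-chat-wnsnw"))
# {'developer': [], 'deployer': ['Microsoft']}
```

Keeping hosts and developers in separate tables makes the special case explicit: adding a new hosting platform only means adding one entry to HOST_NAMES.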

shubhobm · 2025-03-13T18:39:35Z

+            artifacts=[
+                Artifact(
+                    type=ArtifactTypeEnum.model,
+                    name=eval_log.eval.model


this should be only gpt-4o-mini
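Stripping the prefix is a single split. Here is a sketch with a hypothetical helper split_model_id. Splitting only at the first slash keeps any further slashes, which some provider model names may contain, inside the model name.

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Split an Inspect 'provider/model-name' string at the first slash."""
    provider, sep, model_name = model_id.partition("/")
    if not sep:
        # No slash at all: the string does not follow the expected format.
        raise ValueError(f"expected 'provider/model-name', got {model_id!r}")
    return provider, model_name


provider, model_name = split_model_id("openai/gpt-4o-mini")
print(model_name)  # gpt-4o-mini
```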

shubhobm · 2025-03-13T18:41:39Z

+            type=TypeEnum.measurement,
+            description=LangValue(
+                lang='eng',
+                value=eval_log.eval.task


this should be a canned sentence of the form f"Evaluation of the LLM {model_name} on the {benchmark} benchmark using Inspect Evals"
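A small helper that produces this sentence, sketched under the assumption that the model name has already had its provider prefix stripped and the benchmark name has already been cleaned up. The function name problemtype_description is hypothetical.

```python
def problemtype_description(model_name: str, benchmark: str) -> str:
    """Return the canned problemtype sentence proposed in review."""
    return (
        f"Evaluation of the LLM {model_name} on the {benchmark} "
        "benchmark using Inspect Evals"
    )


print(problemtype_description("gpt-4o-mini", "BOLD"))
# Evaluation of the LLM gpt-4o-mini on the BOLD benchmark using Inspect Evals
```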

shubhobm · 2025-03-13T18:45:09Z

+            lang='eng',
+            value=f"Sample input: {sample.input}\n"
+                  f"Model output: {sample.output}\n"
+                  f"Score: {sample.score}"


this is a good structure. To set the context, can you:

start with a canned description of the benchmark (if it's in the logs), or else just the canned sentence from problemtype.

add a field f"Scorer: {scorer_description}\n" before the score, so the reader knows what the score signifies
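A sketch of the suggested layout. It assumes the benchmark description (when the log carries one), the scorer description, and the canned problemtype sentence are all passed in as plain strings. The function and parameter names are hypothetical, and the score is rendered with str() just as in the original f-string.

```python
from typing import Optional


def sample_description(
    sample_input: str,
    model_output: str,
    score: object,
    scorer_description: str,
    canned_sentence: str,
    benchmark_description: Optional[str] = None,
) -> str:
    """Build per-sample report text in the order suggested in review."""
    # Lead with the benchmark description if the log has one, otherwise
    # fall back to the canned problemtype sentence.
    context = benchmark_description or canned_sentence
    return (
        f"{context}\n"
        f"Sample input: {sample_input}\n"
        f"Model output: {model_output}\n"
        f"Scorer: {scorer_description}\n"
        f"Score: {score}"
    )


print(sample_description(
    "example prompt",
    "example completion",
    1.0,
    "hypothetical scorer description",
    "Evaluation of the LLM gpt-4o-mini on the BOLD benchmark using Inspect Evals",
))
```

Placing the Scorer line directly above Score keeps the number next to the text that explains it.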

shubhobm · 2025-03-13T18:47:38Z

+            Reference(
+                type='source',
+                label='Inspect Evaluation Log',
+                url=file_path


when this report goes up on AVID, the user will not be able to access this log. Instead, please point to the benchmark itself: e.g. if we're ingesting a report on BOLD, the URL should point to the corresponding module in inspect_evals that we're contributing, and to the corresponding page in the Inspect Evals docs

shubhobm

Code is functional! A few comments on organization.

harshraj172 · 2025-03-18T16:53:43Z

@shubhobm please take a look now

shubhobm

lgtm

inspect_evals to AVID report · e750784

harshraj172 requested a review from shubhobm March 13, 2025 18:09

shubhobm reviewed Mar 13, 2025

shubhobm requested changes Mar 13, 2025

resolve comments · 072ef00

shubhobm approved these changes Mar 19, 2025

shubhobm merged commit b6a650a into main Mar 19, 2025

shubhobm deleted the inspect_eval branch March 19, 2025 17:36

shubhobm merged 2 commits into main from inspect_eval

2 participants