futureProofR

R software for robust inference with LLMs. It accompanies the following working paper by Bisbee and Spirling:

What to Do When Humans Are No Longer the Gold Standard: Large Language Models, State of the Art and Robustness

The abstract is as follows:

In this short paper, we consider the research implications of large language model (LLM) capabilities approaching, perhaps exceeding, those of highly-trained humans. Specifically, we note that frontier LLMs demonstrate near-expert performance for many data annotation tasks, and they are getting better over time. We show what this will mean for inference in downstream tasks: optimistically, estimated treatment effects will become larger, although claimed null effects may be more dubious. We argue that authors should focus more on sensitivity and robustness with respect to future technological change, and we demonstrate how to use local calibration for such problems. We discuss how our findings, combined with the fact that performance is inherently bounded above (at 100%), should affect debates on the importance of using proprietary “State of the Art” versus open-weight, replicable LLMs. We make available fast and free software (futureProofR) for implementing our suggestions.

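Installation: the sketch below is a minimal, unverified example of installing the package straight from this repository using the remotes package; whether futureProofR is also available on CRAN, and whether the installed package name matches the repository name, are assumptions here.

```r
# Minimal installation sketch (assumes futureProofR installs directly from
# this GitHub repository and that the package name matches the repo name).
# install.packages("remotes")   # uncomment if 'remotes' is not yet installed
remotes::install_github("ArthurSpirling/futureProofR")

# Load the package
library(futureProofR)
```

Once installed, see the package documentation (e.g. `help(package = "futureProofR")`) for the functions implementing the local-calibration robustness checks described in the paper.
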
Comments are very welcome!
