Advanced RAG Pipelines and Evaluation
Updated Dec 7, 2025 - Python
This repository contains code and resources for an in-depth analysis of OpenAI's HealthBench, a benchmark for evaluating large language models in healthcare.
Open-source HealthBench Hard autopsy of GPT-5.2