- CVS Health
- United States
- in/dylan-bouchard-phd-52594664
Stars
Pip-compatible CodeBLEU metric implementation, available for Linux/macOS/Windows
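A minimal usage sketch for the CodeBLEU entry above. It assumes the pip-installable `codebleu` package exposes a `calc_codebleu` helper with `(references, predictions, lang, weights)` parameters as its documentation describes; treat the exact signature as an assumption and check the package docs before relying on it.

```python
# Assumes: `pip install codebleu` and that the package exposes calc_codebleu
# with (references, predictions, lang, weights) parameters -- verify against
# the package documentation before relying on this sketch.
from codebleu import calc_codebleu

reference = "def add(a, b):\n    return a + b"
prediction = "def add(x, y):\n    return x + y"

result = calc_codebleu(
    [reference],                       # one reference per prediction
    [prediction],                      # model-generated code
    lang="python",                     # language used for syntax/dataflow matching
    weights=(0.25, 0.25, 0.25, 0.25),  # n-gram, weighted n-gram, syntax, dataflow
)
print(result)                          # dict with the overall score and sub-scores
```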
Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"
Build resilient language agents as graphs.
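For the LangGraph entry above ("Build resilient language agents as graphs"), here is a minimal sketch of wiring one node into a compiled graph. The `StateGraph`, `START`, and `END` names follow the library's documented API; the toy node is purely illustrative and stands in for an LLM call.

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class State(TypedDict):
    messages: list


def respond(state: State) -> dict:
    # A real agent node would call an LLM here; this just appends a canned reply.
    return {"messages": state["messages"] + ["hello from the graph"]}


builder = StateGraph(State)
builder.add_node("respond", respond)
builder.add_edge(START, "respond")   # older versions: builder.set_entry_point("respond")
builder.add_edge("respond", END)
graph = builder.compile()

print(graph.invoke({"messages": ["hi"]}))
```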
Concise, consistent, and legible badges in SVG and raster format
Virtual whiteboard for sketching hand-drawn like diagrams
A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"
Codebase for reproducing the experiments of the semantic uncertainty paper (short-phrase and sentence-length experiments).
🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
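A quick sketch of the 🤗 Datasets workflow the entry above refers to: load a public dataset from the Hub and filter it. The `imdb` dataset is just a convenient example; the first call downloads and caches the data.

```python
from datasets import load_dataset

# Download (and cache) the public IMDB reviews dataset from the Hub.
train = load_dataset("imdb", split="train")

# Datasets supports fast, dataset-wide transforms; keep only short reviews here.
short_reviews = train.filter(lambda row: len(row["text"]) < 500)

print(train[0]["label"], len(short_reviews))
```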
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal domains, for both inference and training.
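And the canonical 🤗 Transformers one-liner: the `pipeline` helper pulls a default pretrained checkpoint the first time it runs, so this sketch assumes network access and a local cache directory.

```python
from transformers import pipeline

# The first call downloads a default sentiment-analysis checkpoint.
classifier = pipeline("sentiment-analysis")

print(classifier("The evaluation toolkit was easy to set up."))
# -> something like [{'label': 'POSITIVE', 'score': 0.99...}]
```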
LettuceDetect is a hallucination detection framework for RAG applications.
Supercharge Your LLM Application Evaluations 🚀
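A hedged sketch of how that LLM evaluation library (DeepEval) is typically driven, assuming its documented `LLMTestCase` / `AnswerRelevancyMetric` / `evaluate` API and an LLM-judge provider key configured in the environment; exact names may differ across versions.

```python
# Assumes deepeval's documented API (LLMTestCase, AnswerRelevancyMetric, evaluate)
# and that an LLM-judge provider (e.g. an OpenAI key) is configured in the env.
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="What does the refund policy cover?",
    actual_output="Refunds are available within 30 days of purchase.",
)

metric = AnswerRelevancyMetric(threshold=0.7)  # LLM judge scores answer relevancy
evaluate(test_cases=[test_case], metrics=[metric])
```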
Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling you to thoroughly test your data and models.
Python packaging and dependency management made easy
Complete AI governance and LLM Evals platform with support for EU AI Act, ISO 42001, ISO 27001 and NIST AI RMF. Join our Discord channel: https://discord.com/invite/d3k3E4uEpR
Adversarial Natural Language Inference Benchmark
RAG evaluation without the need for "golden answers"
LLM-powered Conversational AI experience using Vectara
scikit-learn: machine learning in Python
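For completeness, the classic scikit-learn fit/score loop the entry above describes; the iris dataset and logistic regression are arbitrary illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.3f}")
```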
Uncertainty Quantification 360 (UQ360) is an extensible open-source toolkit that can help you estimate, communicate and use uncertainty in machine learning model predictions.
Interpretability and explainability of data and machine learning models
The Granite Guardian models are designed to detect risks in prompts and responses.
This repository contains a collection of surveys, datasets, papers, and code for predictive uncertainty estimation in deep learning models.
Awesome-LLM-Robustness: a curated list of work on uncertainty, reliability, and robustness in Large Language Models
nannyml: post-deployment data science in python