Starred repositories
Local-first memory for coding agents. Decisions, bugs, and context stored as Markdown, indexed locally with FTS5 plus optional semantic search. No RAM overhead at idle, no external servers.
Google Suite CLI: Gmail, GCal, GDrive, GContacts.
Mixing Language Models with Self-Verification and Meta-Verification
Sample app demonstrating how to instrument a Python FastAPI/Uvicorn app with Datadog, Elastic, New Relic, and OpenTelemetry.
Code for Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities (NeurIPS'24)
Synthetic data curation for post-training and structured data extraction
Benchmarking LLMs via Uncertainty Quantification
A curated list of awesome approaches to AI model routing
Codebase for reproducing the experiments of the semantic uncertainty paper (short-phrase and sentence-length experiments).
Contextualized Sequence Likelihood: Enhanced Confidence Scores for Natural Language Generation
Awesome-LLM-Robustness: a curated list of Uncertainty, Reliability and Robustness in Large Language Models
Recipes to scale inference-time compute of open models
Optimizing inference proxy for LLMs
FrugalGPT: better quality and lower cost for LLM applications
A Heterogeneous Benchmark for Information Retrieval. Easy to use; evaluate your models across 15+ diverse IR datasets.
DocBench: A Benchmark for Evaluating LLM-based Document Reading Systems
Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.
A reading list on LLM-based Synthetic Data Generation 🔥
[NeurIPS 2023] This is the code for the paper `Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias`.
[NeurIPS 2021] WRENCH: Weak supeRvision bENCHmark