Congrats to our friends at Helicone (YC W23) on this impressive milestone! Their V2 is one of the most exciting applications of the CodeSandbox SDK we've seen yet!
“Helicone will be replaced by OpenAI's dashboard…” was the feedback we received when we first launched on Hacker News. 😰

22 months later, Helicone (YC W23) has processed over 2.1B requests and 2.6T tokens, working with teams ranging from startups to Fortune 500 companies.

Today, we're excited to announce Helicone V2: developer tooling for the complete LLM application lifecycle of logging, evaluation, experimentation, and release.

Helicone V1: Log → Review → Release (hope it works)
Helicone V2: Log → Evaluate → Experiment → Review → Release

---

What's new in V2?

---

1) Fully Open-Source Helm Chart for Self-Hosting
Deploy Helicone anywhere with a production-ready Helm chart and gain full control over your data and infrastructure.

2) Prompt Experiments
Test prompt variations against real-world data and measure real improvement before deployment with offline evaluators (similar to Anthropic's Workbench, but model-agnostic).

3) LLM Evaluation Framework
Finally, quantifiable metrics: online and offline evaluations built into Helicone, either via LLM-as-judge or custom Python evaluators running on the CodeSandbox SDK (link in comments; a sketch of what an evaluator could look like follows at the end of this post).

4) New Prompt Editor
Revamped UI featuring integrated playground testing, real-time co-pilot suggestions, and support for 25+ providers and hundreds of models.

What hasn't changed:

1) Simple one-line integration (example at the end of this post)
2) Core observability offering
3) Our commitment to open source

If you've found Helicone helpful, we'd love your support and feedback. Please star our repo to support the project (link in comments) and join the discussion on GitHub or Discord.
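The post doesn't show Helicone's actual evaluator interface, so the following is only a minimal sketch of what a custom Python evaluator for offline scoring could look like: the `evaluate` name, its signature, the request/response shapes, and the score contract are all assumptions for illustration, not Helicone's API.

```python
# Hypothetical custom evaluator sketch -- the function name, signature,
# and return contract are assumptions, not Helicone's actual interface.
import json


def evaluate(request: dict, response: dict) -> dict:
    """Score one logged LLM call; returns a score in [0, 1] plus a reason."""
    prompt = request.get("messages", [{}])[-1].get("content", "")
    answer = response.get("choices", [{}])[0].get("message", {}).get("content", "")

    # Example check: did the model return valid JSON when the prompt asked for it?
    if "json" in prompt.lower():
        try:
            json.loads(answer)
            return {"score": 1.0, "reason": "valid JSON"}
        except (json.JSONDecodeError, TypeError):
            return {"score": 0.0, "reason": "invalid JSON"}

    # Fallback heuristic: non-empty answers pass.
    return {"score": 1.0 if answer.strip() else 0.0, "reason": "non-empty check"}


if __name__ == "__main__":
    req = {"messages": [{"role": "user", "content": "Reply in JSON."}]}
    res = {"choices": [{"message": {"content": '{"ok": true}'}}]}
    print(evaluate(req, res))  # {'score': 1.0, 'reason': 'valid JSON'}
```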
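And for reference, the one-line integration mentioned above: per Helicone's docs, the OpenAI SDK is wired up by pointing the client at Helicone's proxy and passing your Helicone key as a header. The environment variable names below are an assumption about your setup.

```python
# Helicone's one-line integration for the OpenAI Python SDK: route traffic
# through Helicone's proxy and authenticate with your Helicone key.
# HELICONE_API_KEY and OPENAI_API_KEY are assumed to be set in your env.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="https://oai.helicone.ai/v1",  # the "one line" that enables logging
    default_headers={"Helicone-Auth": f"Bearer {os.environ['HELICONE_API_KEY']}"},
)

reply = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello from Helicone!"}],
)
print(reply.choices[0].message.content)
```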