Langfuse

Software Development

Open Source LLM Engineering Platform, now part of ClickHouse

About us

Langfuse is the most popular open source LLMOps platform. It helps teams collaboratively develop, monitor, evaluate, and debug AI applications. Langfuse can be self-hosted in minutes and is battle-tested: it is used in production by thousands of users, from YC startups to large companies such as Khan Academy and Twilio. Langfuse builds on a proven track record of reliability and performance.

Developers can trace any large language model or framework using our SDKs for Python and JS/TS, our open API, or our native integrations (OpenAI, LangChain, LlamaIndex, Vercel AI SDK). Beyond tracing, developers use Langfuse Prompt Management, its open APIs, and testing and evaluation pipelines to improve the quality of their applications.

Product managers can analyze, evaluate, and debug AI products by accessing detailed metrics on costs, latencies, and user feedback in the Langfuse Dashboard. They can bring humans in the loop by setting up annotation workflows for human labelers to score their application. Langfuse can also be used to monitor security risks through security frameworks and evaluation pipelines. Langfuse enables non-technical team members to iterate on prompts and model configurations directly within the Langfuse UI, or to use the Langfuse Playground for fast prompt testing.

Langfuse is open source, and we are proud to have a fantastic community on GitHub and Discord that provides help and feedback. Do get in touch with us! Langfuse is now part of ClickHouse.

Website: https://langfuse.com
Industry: Software Development
Company size: 11-50 employees
Headquarters: San Francisco
Type: Privately Held
Founded: 2022
Specialties: Langfuse, Large Language Models, Observability, Prompt Management, Evaluations, Testing, Open Source, LLM, AI, Analytics, and Artificial Intelligence

Updates

  • Now live: Langfuse x Hermes integration. Hermes Agent by Nous Research is a self-improving AI agent with persistent memory, autonomous skill creation, and support for any LLM provider. It ships with a bundled Langfuse observability plugin that automatically captures every conversation turn, LLM request, and tool call, including token usage and cost. All details in the comments.

  • Quarterly Langfuse Town Hall. Join the Langfuse team, maintainers, and community for our quarterly call. We'll talk about Langfuse v4 and ClickHouse, cover the latest releases and the roadmap, and do a Q&A. When: Thursday, June 11th, 2026, at 9am PT / 6pm CEST. Where: Virtual, on Google Meet. Link in comments.

  • Day 5 of Langfuse Launch Week: Langfuse MCP. Until this week, the hosted Langfuse MCP server only exposed prompt management. Today, it covers most of Langfuse: observations, metrics, scores, score configs, datasets (including their items and runs), comments, annotation queues, models, media, and health. Agents such as Claude Cowork or Linear can now investigate a production issue, pull the relevant observation, query metrics, leave a comment for the team, create a score, or create dataset items for regression testing, all without leaving the chat. The MCP server complements the Langfuse Skill and the Langfuse CLI: use the CLI when your agent can run bash and pre-filter data, and use the MCP server when it cannot. If you don't want the agent to write, restrict it to read-only by allow-listing only the lookup tools. Shoutout to Ben Bachem for shipping this. Link in comments.

  • Day 4 of Langfuse Launch Week: Code evaluators. Not every evaluation needs an LLM. JSON parseability, schema validation, exact match, required tool arguments, custom business rules: if you can verify it with code, you should verify it with code. Code checks are deterministic, reproducible, and cost no tokens. You can now write an evaluate function in Python or TypeScript directly in Langfuse, attach it to live observations or a dataset experiment, and the result lands as a native Langfuse score. Code evaluators sit alongside LLM-as-a-Judge rather than replacing it. Code wins for objective checks; a judge wins for assessing quality and tone, or when you want reasoning attached to the score. Together they give you a more complete picture. Link in comments.

  • Day 3 of Langfuse Launch Week 5: Full-Text Search. When something breaks in your AI app, you need to pull the one trace that says "refund failed" out of hundreds of GB of production data. Scroll-and-hope doesn't cut it at that scale. Full-text search is now live on Langfuse Cloud. In our benchmarks, large input/output searches that took 18 seconds and scanned 494 GB now return in under half a second and read less than a gigabyte. The feature is built on ClickHouse's new full-text search, and a tight feedback loop with the ClickHouse team helps us land features in Langfuse within weeks of their shipping in ClickHouse core. Humans benefit in the UI. Agents benefit in the API: the new 'matches' operator on Observations API v2 lets coding agents and scripts run token-based search programmatically. Link in comments.

  • Day 2 of Langfuse Launch Week 5: the Langfuse agent skill. Building an agent is the easy part; getting it to production is hard. You set up tracing and evaluators, but how do you know what your agent's real failure modes are? How do you know your LLM-as-a-judge is actually capturing what you care about? The Langfuse Skill turns Langfuse into a headless platform you can drive in natural language. It follows the Open Agent Skills standard, so it works with Claude Code, Cursor, Codex, and any other tool that supports the format. In the accompanying video, Marlies Mayerhofer uses the LLM-as-a-Judge calibration skill with Codex to produce a full analysis with accuracy, F1, precision, recall, and cost, all graphed directly in the new Langfuse Experiments view.

  • Day 1 of Langfuse Launch Week 5: Experiments in CI/CD. Every change to a prompt, model, or chunking strategy is a potential regression that you may only catch in production, when it's too late. The new Langfuse Experiment GitHub Action runs your experiments on every pull request, tests your app against a dataset you control, fails the workflow when scores drop below your threshold, and posts the pass/fail result directly to the PR. Every run is tracked in Langfuse, so when something does regress, the failing items are only one click away. Link in comments.
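
The code-evaluator and CI/CD posts above rest on the same idea: a deterministic check scores each dataset item, and a threshold on the aggregate score decides pass or fail. A minimal Python sketch of that idea follows. It is not Langfuse's SDK or the Experiment GitHub Action; the evaluate signature, the run_gate helper, the required keys, and the sample outputs are all hypothetical.

```python
import json


# Hypothetical code evaluator (not Langfuse's actual signature): returns 1.0 if
# the output parses as a JSON object containing every required key, else 0.0.
# Deterministic and reproducible, with no LLM call and no token cost.
def evaluate(output: str, required_keys: tuple[str, ...]) -> float:
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return 0.0  # not parseable as JSON
    if not isinstance(parsed, dict):
        return 0.0  # valid JSON, but not an object with keys
    return 1.0 if all(key in parsed for key in required_keys) else 0.0


# Hypothetical gate: score every item, then compare the mean to a threshold.
def run_gate(outputs: list[str], required_keys: tuple[str, ...],
             threshold: float) -> tuple[float, bool]:
    scores = [evaluate(output, required_keys) for output in outputs]
    mean = sum(scores) / len(scores) if scores else 0.0
    return mean, mean >= threshold


if __name__ == "__main__":
    # Hypothetical outputs standing in for one experiment run over a dataset.
    sample_outputs = [
        '{"tool": "refund", "order_id": "A1"}',
        '{"tool": "refund"}',                    # missing order_id
        "not json at all",                       # unparseable
        '{"tool": "lookup", "order_id": "B2"}',
    ]
    mean, passed = run_gate(sample_outputs, ("tool", "order_id"), threshold=0.75)
    print(f"mean score {mean:.2f}: {'pass' if passed else 'fail'}")
    # In CI, a wrapper would call sys.exit(0 if passed else 1); a non-zero
    # exit status is what marks a workflow step as failed.
```

Run as a script, it prints "mean score 0.50: fail", because one sample is missing a required key and another is not JSON. Keeping the check as a pure function makes it easy to reuse on live observations and in a pull-request gate alike.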

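The Day 2 post mentions calibrating an LLM-as-a-Judge by reporting accuracy, F1, precision, and recall. These are standard binary-classification metrics, computed by comparing the judge's verdicts with human labels on the same items. A minimal Python sketch, assuming each item gets a boolean verdict (True meaning "flagged as a failure") from both the judge and a human annotator; the function name and label encoding are hypothetical, this is not the Langfuse Skill's implementation, and cost is left out.

```python
# Hypothetical helper: compare judge verdicts with human labels, where
# True means "flagged as a failure" and index i refers to the same item.
def judge_calibration(judge: list[bool], human: list[bool]) -> dict[str, float]:
    if len(judge) != len(human):
        raise ValueError("judge and human label lists must have the same length")
    pairs = list(zip(judge, human))
    tp = sum(1 for j, h in pairs if j and h)          # both flag the item
    fp = sum(1 for j, h in pairs if j and not h)      # judge flags, human does not
    fn = sum(1 for j, h in pairs if not j and h)      # judge misses a human flag
    tn = sum(1 for j, h in pairs if not j and not h)  # both consider it fine
    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical labels for five traces.
    judge_says = [True, True, False, False, True]
    human_says = [True, False, False, True, True]
    for name, value in judge_calibration(judge_says, human_says).items():
        print(f"{name}: {value:.2f}")
```

With these labels the judge agrees with the human on three of five items (accuracy 0.60) and catches two of the three human-flagged failures (recall 0.67). Low recall would suggest the judge prompt misses failure modes the annotators care about; low precision would suggest it flags too eagerly.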