Microsoft Agent Framework Token Count in Langfuse #11252

eddieahn · 2025-12-20T18:21:08Z

eddieahn
Dec 20, 2025

Describe your question

When using the Microsoft Agent Framework, every ChatAgent is decorated with capture_usage=False. This means the Agent Framework intentionally does not send token usage on invoke_agent spans.

But it seems like Langfuse estimates tokens for these spans anyway. As a result, it looks like the total in Langfuse is double counted and is an inaccurate representation of the number of tokens and the cost.

If you could please take a look and confirm, that would be great!

Langfuse Cloud or Self-Hosted?

Langfuse Cloud

If Self-Hosted

No response

If Langfuse Cloud

No response

SDK and integration versions

No response

Pre-Submission Checklist

I have checked for existing issues/discussions and consulted Langfuse AI.

2025-12-20T18:21:35Z

inkeep-ai-assistant[bot]
bot Dec 20, 2025

You're right to be concerned about potential double-counting. Here's what's happening:

Token Inference for Spans

Langfuse only infers usage and cost for observations of type generation and embedding, not for regular span types(1). The documentation explicitly states: "Only observations of type generation and embedding can track costs and usage"(1).

However, if the Microsoft Agent Framework is creating observations that Langfuse interprets as generation type (rather than span type), then Langfuse will attempt to infer token counts based on the model parameter if usage details aren't provided(1).

Double-Counting Issues

There are known issues with double-counting in certain scenarios:

- Parent-Child Generation Observations: If both parent and child observations are of type GENERATION, Langfuse sums tokens from both, causing totals to be roughly doubled(2). The recommended fix is to change the parent observation type from GENERATION to SPAN(2).
- Cached Tokens: When cached tokens are present (like cache_read_input_tokens), the UI can double-count input tokens by summing all fields containing "input"(3)(4).
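The parent-child scenario can be illustrated with a small model. This is only a sketch of the aggregation described above, not Langfuse's implementation; the `Observation` class, `trace_total` function, and the token numbers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Observation:
    """Hypothetical stand-in for one observation in a trace tree."""
    name: str
    type: str  # "SPAN" or "GENERATION"
    total_tokens: int = 0
    children: List["Observation"] = field(default_factory=list)


def trace_total(obs: Observation) -> int:
    """Sum tokens over every GENERATION observation in the tree."""
    own = obs.total_tokens if obs.type == "GENERATION" else 0
    return own + sum(trace_total(child) for child in obs.children)


# The actual LLM call, with 1000 tokens of usage (made-up number).
llm_call = Observation("chat", "GENERATION", total_tokens=1000)

# Parent typed as GENERATION and carrying the same tokens: counted twice.
parent_as_generation = Observation(
    "agent", "GENERATION", total_tokens=1000, children=[llm_call]
)

# Parent typed as SPAN: only the child's tokens are counted.
parent_as_span = Observation("agent", "SPAN", children=[llm_call])

doubled = trace_total(parent_as_generation)
correct = trace_total(parent_as_span)
```

In this model, switching the parent from GENERATION to SPAN is what removes the duplicate contribution, which matches the recommended fix above.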

Recommendations

To resolve this:

- Verify observation types: Check whether the Agent Framework is creating span or generation type observations. Only generation types should have token inference applied(1).
- Explicitly provide usage details: If you want accurate token counts, explicitly ingest the usage details from the Agent Framework's response rather than relying on inference(1).
- Check for nested generations: Ensure you don't have nested GENERATION type observations that would cause aggregation issues(2).

Could you share a link to a specific trace in Langfuse Cloud so we can examine the observation types and token counting behavior?

📚 Sources:

Have another question? Just tag @inkeep.

2025-12-20T18:27:02Z

inkeep-ai-assistant[bot]
bot Dec 20, 2025

Unfortunately, I cannot access the specific trace link you provided as I don't have direct access to your Langfuse Cloud project.

However, based on the token counting behavior you're describing with the Microsoft Agent Framework, here's what's likely happening:

Token Inference Behavior

Langfuse will attempt to infer token usage when:

- Usage details are not ingested via the API/SDK
- The observation is of type generation or embedding
- A tokenizer is specified for the model(1)

The documentation explicitly states: "Only observations of type generation and embedding can track costs and usage"(1). Regular span type observations should not have token inference applied.
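The three conditions can be read as a single predicate. The following is a minimal sketch of that reading, not Langfuse's code; the function name, the dictionary shape, and the `models_with_tokenizer` set are assumptions made for illustration.

```python
from typing import Any, Dict, Set

# Observation types that, per the quoted documentation, can track usage.
INFERABLE_TYPES = {"GENERATION", "EMBEDDING"}


def would_infer_usage(
    observation: Dict[str, Any], models_with_tokenizer: Set[str]
) -> bool:
    """Return True only when all three conditions listed above hold."""
    has_ingested_usage = bool(observation.get("usageDetails"))
    return (
        not has_ingested_usage
        and observation.get("type") in INFERABLE_TYPES
        and observation.get("model") in models_with_tokenizer
    )


# Hypothetical set of models for which a tokenizer is configured.
tokenizers = {"gpt-4.1"}

span_obs = {"type": "SPAN", "model": "gpt-4.1"}
gen_without_usage = {"type": "GENERATION", "model": "gpt-4.1"}
gen_with_usage = {
    "type": "GENERATION",
    "model": "gpt-4.1",
    "usageDetails": {"input": 10, "output": 5},
}
```

Under this reading, a span-type observation is never a candidate for inference, and a generation stops being one as soon as usage details are ingested explicitly.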

Potential Causes of Double-Counting

If you're seeing double-counted tokens, the most likely scenarios are:

- The Agent Framework observations are being created as generation type instead of span type, causing Langfuse to infer tokens even though the framework set capture_usage=False
- Nested generation observations: If both parent and child observations are type generation, Langfuse sums tokens from both levels, effectively doubling the count(1)

How to Verify and Fix

To diagnose this issue, check in the Langfuse UI:

- Look at the observation type for each Agent Framework span: is it showing as SPAN or GENERATION?
- Check if there are nested GENERATION type observations in the trace hierarchy
- Verify whether usage details are being ingested (even if capture_usage=False) or if Langfuse is inferring them

If Agent Framework observations are incorrectly typed as generation, you would need to ensure they're created as span type to prevent automatic token inference(1).

Could you check the observation types in your trace and share what you find?

📚 Sources:

Model Usage & Cost Tracking

Have another question? Just tag @inkeep.

2025-12-20T18:28:16Z

inkeep-ai-assistant[bot]
bot Dec 20, 2025

I cannot directly access or analyze the JSON file you've attached. However, I can help you understand what to look for based on Langfuse's token counting behavior.

Key Points About Token Inference

Langfuse only infers usage and cost for observations of type generation and embedding, not for regular span types(1). The inference happens when:

- Usage details are not explicitly provided
- The observation has a model parameter
- Langfuse has a tokenizer defined for that model(1)

What to Check in Your Trace JSON

Look for these specific fields in your trace JSON:

- Observation Types: Check the type field for each observation. If Agent Framework spans are marked as "type": "GENERATION" instead of "type": "SPAN", Langfuse will attempt token inference even though capture_usage=False was set(1).
- Usage Details: Look for usage or usageDetails fields. If these exist with token counts, they were explicitly ingested. If they're missing but you see token counts in the UI, Langfuse inferred them(1).
- Nested Generations: Check if you have parent-child relationships where both are type GENERATION. This causes Langfuse to sum tokens from both levels, effectively doubling the count(1).
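These checks can be scripted against an exported trace. The sketch below assumes the export contains a flat list of observations with `id`, `name`, `type`, and `parentObservationId` keys alongside the `usage`/`usageDetails` fields mentioned above; those key names and the sample data are assumptions, so adjust them to whatever the actual JSON contains.

```python
from typing import Any, Dict, List


def audit_observations(observations: List[Dict[str, Any]]) -> List[str]:
    """Flag GENERATION observations that lack usage or nest under a GENERATION."""
    by_id = {obs["id"]: obs for obs in observations}
    findings: List[str] = []
    for obs in observations:
        if obs.get("type") != "GENERATION":
            continue
        # No ingested usage: any token count shown in the UI was inferred.
        if not (obs.get("usage") or obs.get("usageDetails")):
            findings.append(f"{obs.get('name')}: GENERATION without ingested usage")
        # Parent is also a GENERATION: candidates for double-counting.
        parent = by_id.get(obs.get("parentObservationId"))
        if parent is not None and parent.get("type") == "GENERATION":
            findings.append(
                f"{obs.get('name')}: nested under GENERATION parent {parent.get('name')}"
            )
    return findings


# Made-up observations shaped like the situation discussed in this thread.
sample = [
    {"id": "a", "name": "invoke_agent", "type": "GENERATION",
     "parentObservationId": None},
    {"id": "b", "name": "chat gpt-4.1", "type": "GENERATION",
     "parentObservationId": "a", "usageDetails": {"input": 10, "output": 5}},
]

findings = audit_observations(sample)
for line in findings:
    print(line)
```

For a real export, the list passed to `audit_observations` would come from `json.load` on the downloaded file, at whichever key holds the observations.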

Expected Behavior

According to the documentation, usage types are summarized as follows: "In the UI, Langfuse summarizes all usage types that include the string input as input usage types, similarly output as output usage types. If no total usage type is ingested, Langfuse sums up all usage type units to a total"(1).
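The quoted rule can be sketched as follows. This is an illustration of the documented behavior under the assumption that matching is a plain substring test on the usage key; the function name and the numbers are hypothetical.

```python
from typing import Dict


def summarize_usage(usage_details: Dict[str, int]) -> Dict[str, int]:
    """Group usage types by substring, as the quoted documentation describes."""
    input_total = sum(v for k, v in usage_details.items() if "input" in k)
    output_total = sum(v for k, v in usage_details.items() if "output" in k)
    total = usage_details.get("total")
    if total is None:
        # No total ingested: every usage type unit is added up.
        total = sum(usage_details.values())
    return {"input": input_total, "output": output_total, "total": total}


# A key such as cache_read_input_tokens contains "input", so it is folded
# into the input figure as well.
with_cache = summarize_usage(
    {"input": 100, "cache_read_input_tokens": 80, "output": 20}
)

# When a total is ingested explicitly, it is used as-is.
with_total = summarize_usage({"input": 100, "output": 20, "total": 120})
```

If the provider's `input` figure already includes the cached tokens, the first example reports them twice, which is why ingesting an explicit total matters.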

Solution

To prevent double-counting, ensure that Agent Framework observations are created as type span rather than generation. Only the actual LLM call observations should be type generation(1)(2).

If you can share specific observation types and whether usage details are present in the JSON, I can provide more targeted guidance.

📚 Sources:

Have another question? Just tag @inkeep.

2025-12-20T18:31:06Z

inkeep-ai-assistant[bot]
bot Dec 20, 2025

Based on the metadata you've shared, I can now identify the issue. Your invoke_agent span is being sent via OpenTelemetry with the attribute gen_ai.request.model: "gpt-4.1". According to Langfuse's OpenTelemetry property mapping, any span that includes a model attribute is automatically treated as a generation observation type(1).

From the documentation: "Any span that includes an attribute mapped to model is considered an LLM call and thus tracked as a generation observation"(1).
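As a rough model of that mapping rule (not the actual Langfuse mapping code), classification depends only on whether a model attribute is present. The `MODEL_ATTRIBUTE_KEYS` tuple below lists only the key named in this thread; the real mapping may recognize others.

```python
from typing import Any, Dict

# Only the attribute key mentioned in this thread; an assumption, not the full list.
MODEL_ATTRIBUTE_KEYS = ("gen_ai.request.model",)


def classify_otel_span(attributes: Dict[str, Any]) -> str:
    """Return "generation" if any model attribute is set, else "span"."""
    if any(attributes.get(key) for key in MODEL_ATTRIBUTE_KEYS):
        return "generation"
    return "span"


invoke_agent_attrs = {
    "gen_ai.operation.name": "invoke_agent",
    "gen_ai.request.model": "gpt-4.1",
}
plain_span_attrs = {"gen_ai.operation.name": "invoke_agent"}

agent_type = classify_otel_span(invoke_agent_attrs)
plain_type = classify_otel_span(plain_span_attrs)
```

Note that capture_usage=False plays no part in this decision: it removes the usage attributes but leaves the model attribute in place.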

Why Token Inference is Happening

Even though the Microsoft Agent Framework sets capture_usage=False, Langfuse will still attempt to infer token usage because:

- The span has gen_ai.request.model: "gpt-4.1", which maps to the model property(1)
- This causes Langfuse to classify it as a generation type observation(1)
- Langfuse only infers usage and cost for observations of type generation and embedding(2)
- When usage details aren't provided and a model is specified, Langfuse uses its predefined tokenizers to estimate tokens(2)

The Double-Counting Problem

The double-counting occurs because:

- The Agent Framework's invoke_agent span gets token counts inferred by Langfuse
- Any nested actual LLM calls (like the OpenAI completion) also get their own token counts
- Langfuse sums these together in the total

Solution

To prevent this, you need to ensure the invoke_agent spans do not include model attributes. The Microsoft Agent Framework should only set model attributes on the actual LLM generation spans, not on the orchestration/routing spans.

If you cannot modify the Agent Framework's OpenTelemetry instrumentation, you could alternatively:

- Manually provide usage details with zero values for the invoke_agent spans using the gen_ai.usage.* attributes(1):
  - gen_ai.usage.input_tokens: 0
  - gen_ai.usage.output_tokens: 0
  - gen_ai.usage.total_tokens: 0
- Use explicit observation type by setting langfuse.observation.type: "span" to override the automatic generation classification(1)
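Either workaround amounts to rewriting the attributes of invoke_agent spans before they are exported. The sketch below shows only that rewrite on a plain dictionary. The function name and the `strategy` parameter are hypothetical, the span-name prefix check is an assumption about how the framework names these spans, and where the function is hooked into the OpenTelemetry pipeline is left open. Whether Langfuse honors these attributes as described has not been confirmed in this thread.

```python
from typing import Any, Dict

ZERO_USAGE = {
    "gen_ai.usage.input_tokens": 0,
    "gen_ai.usage.output_tokens": 0,
    "gen_ai.usage.total_tokens": 0,
}


def neutralize_agent_span(
    name: str, attributes: Dict[str, Any], strategy: str = "type"
) -> Dict[str, Any]:
    """Return a copy of the attributes with one of the two workarounds applied."""
    patched = dict(attributes)
    # Assumption: orchestration spans are named "invoke_agent ...".
    if not name.startswith("invoke_agent"):
        return patched
    if strategy == "type":
        # Option 2: force the observation type to span.
        patched["langfuse.observation.type"] = "span"
    elif strategy == "zero_usage":
        # Option 1: supply explicit zero usage without overwriting real values.
        for key, value in ZERO_USAGE.items():
            patched.setdefault(key, value)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return patched


attrs = {"gen_ai.request.model": "gpt-4.1"}
typed = neutralize_agent_span("invoke_agent my_agent", attrs)
zeroed = neutralize_agent_span("invoke_agent my_agent", attrs, strategy="zero_usage")
untouched = neutralize_agent_span("chat gpt-4.1", attrs)
```

Spans for the actual LLM calls pass through unchanged, so their reported usage remains the only source of token counts.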

The key issue is that the presence of gen_ai.request.model automatically triggers generation-type behavior in Langfuse, regardless of the capture_usage=False setting in the Agent Framework.

📚 Sources:

Have another question? Just tag @inkeep.

jannikmaierhoefer · 2025-12-22T09:33:07Z

jannikmaierhoefer
Dec 22, 2025
Maintainer

Hi @eddieahn, could you send an example of how and why you are using capture_usage=False? The Microsoft Agent Framework passes the usage details as part of the metadata and Langfuse picks up this information:

Example trace: https://cloud.langfuse.com/project/cloramnkj0002jz088vzn1ja4/traces/8e419d3288419b5d944270505640e183?observation=2406e8343fd49c0e&timestamp=2025-12-17T09:37:37.258Z

2 replies

eddieahn Dec 22, 2025
Author

Hi @jannikmaierhoefer, when I take a look at the Microsoft Agent Framework repo, it looks like they have set capture_usage=False to avoid double counting. Please see the attached screenshots from the repo.

When I look at the traces coming into Langfuse, the first invoke_agent call does not have those attributes (gen_ai.usage.input_tokens and gen_ai.usage.output_tokens) in the metadata, but it seems like Langfuse is setting the token count automatically. If you take a look below, you can see that it's including the invoke_agent token counts (1,313+1,657+1,845) when showing the total tokens and cost. So I would love to get clarification and confirmation on whether it should only be looking at the counts and costs where it says "chat gpt-4.1".
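For reference, the three invoke_agent counts quoted above add up as follows. The snippet only performs the arithmetic and assumes all three figures are tokens attributed to invoke_agent spans rather than to the "chat gpt-4.1" observations.

```python
# Token counts quoted above for the three invoke_agent spans.
invoke_agent_tokens = [1313, 1657, 1845]

# Amount by which the trace total would be overstated if these were inferred.
excess_tokens = sum(invoke_agent_tokens)
print(excess_tokens)
```

That is 4,815 tokens that would be added on top of whatever the "chat gpt-4.1" observations report.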

jannikmaierhoefer Dec 23, 2025
Maintainer

@eddieahn, understood - thanks for clarifying. Let us look into it.
