Conversation

@Cyb3rWard0g
Collaborator

This PR introduces a complete observability solution for Dapr Agents, enabling distributed tracing and monitoring through OpenTelemetry with Phoenix UI support. The module provides automatic instrumentation for agents, tools, LLM calls, and workflow executions while maintaining W3C Trace Context standards for distributed tracing across Dapr boundaries. All observability features are optional dependencies that gracefully degrade when not installed.

Key Changes

New Observability Module (dapr_agents/observability/)

  • Automatic Instrumentation: Zero-code tracing for agents, tools, LLM interactions, and workflows
  • Optional Dependencies: Clean fallback behavior when observability packages aren't installed
  • W3C Trace Context: Standards-compliant context propagation across Dapr Workflow boundaries
  • Phoenix UI Integration: Rich visualization and analysis through OpenInference semantic conventions

Core Components

Instrumentor (DaprAgentsInstrumentor)

  • Main entry point for enabling observability (see the usage sketch after this list)
  • Automatic discovery and wrapping of key components
  • Configurable span processors and exporters
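
As an illustration of the entry point described above, here is a minimal setup sketch. It assumes the instrumentor follows the standard OpenTelemetry instrument() pattern and that a Phoenix collector is listening on its default OTLP HTTP endpoint; the exact constructor arguments may differ from the module in this PR.

# Minimal sketch, assuming a BaseInstrumentor-style API and a local Phoenix endpoint.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

from dapr_agents.observability import DaprAgentsInstrumentor

# Standard OTel pipeline pointing at Phoenix (endpoint assumed; adjust as needed).
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:6006/v1/traces"))
)
trace.set_tracer_provider(provider)

# Enable automatic instrumentation for agents, tools, LLM calls, and workflows.
DaprAgentsInstrumentor().instrument(tracer_provider=provider)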

Wrapper Classes

  • AgentWrapper: Traces agent conversations and reasoning flows
  • LLMWrapper: Captures LLM calls with token usage and message processing
  • ToolWrapper: Monitors tool executions with input/output tracking
  • WorkflowWrapper: Traces workflow orchestration and task execution
  • WorkflowTaskWrapper: Detailed task-level tracing within workflows
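
These wrappers share a common pattern: the target method is monkey-patched at instrumentation time so each call runs inside an OpenTelemetry span. A rough, generic sketch of that pattern is shown below; it is not this PR's actual wrapper code, and the wrapped target is hypothetical.

# Generic wrapping sketch using wrapt; not the PR's actual wrapper implementation.
from opentelemetry import trace
from wrapt import wrap_function_wrapper

tracer = trace.get_tracer("dapr_agents.observability")

def _traced_call(wrapped, instance, args, kwargs):
    # Run the wrapped method inside a span and record failures on it.
    with tracer.start_as_current_span(f"{type(instance).__name__}.run") as span:
        try:
            result = wrapped(*args, **kwargs)
            span.set_attribute("output.value", str(result))
            return result
        except Exception as exc:
            span.record_exception(exc)
            raise

# Hypothetical target module/method; the real instrumentor wraps the agent, LLM,
# tool, and workflow entry points in a similar way.
# wrap_function_wrapper("dapr_agents.agent", "Agent.run", _traced_call)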

Context Propagation (context_propagation.py)

  • W3C Trace Context format support for Dapr serialization
  • extract_otel_context() and restore_otel_context() utilities (mechanism sketched after this list)
  • Proper parent-child span relationships across workflow boundaries
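
The sketch below shows the underlying W3C mechanism these utilities rely on, using the standard OTel propagator to round-trip the active span context through a plain dict that Dapr can serialize. The actual helpers in context_propagation.py may differ in signature and detail.

# Sketch of W3C Trace Context round-tripping with the standard OTel propagator.
from typing import Dict
from opentelemetry import context, trace
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

_propagator = TraceContextTextMapPropagator()

def extract_otel_context() -> Dict[str, str]:
    # Serialize the active span context to W3C headers (traceparent/tracestate).
    carrier: Dict[str, str] = {}
    _propagator.inject(carrier)
    return carrier  # JSON-serializable, so Dapr can carry it across workflow boundaries

def restore_otel_context(carrier: Dict[str, str]) -> context.Context:
    # Rebuild an OTel context from the serialized headers on the receiving side.
    return _propagator.extract(carrier)

def start_child_span(name: str, carrier: Dict[str, str]):
    # Start a span whose parent is the remote span described by the carrier.
    tracer = trace.get_tracer(__name__)
    return tracer.start_as_current_span(name, context=restore_otel_context(carrier))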

Message Processing (message_processors.py)

  • Converts various message formats to the OpenInference standard (see the sketch after this list)
  • Tool schema extraction and serialization
  • Token usage tracking and LLM response processing
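
As a simplified illustration of this conversion, OpenInference represents chat history as flattened, indexed span attributes. The attribute keys below follow the public OpenInference conventions, while the PR's actual helpers handle more message formats.

# Simplified sketch of flattening chat messages into OpenInference-style attributes.
from typing import Any, Dict, List

def messages_to_input_attributes(messages: List[Dict[str, Any]]) -> Dict[str, Any]:
    attributes: Dict[str, Any] = {}
    for i, message in enumerate(messages):
        prefix = f"llm.input_messages.{i}.message"
        attributes[f"{prefix}.role"] = message.get("role", "")
        attributes[f"{prefix}.content"] = message.get("content", "")
    return attributes

print(messages_to_input_attributes([
    {"role": "system", "content": "You are a weather assistant."},
    {"role": "user", "content": "What is the weather in Boston?"},
]))
# {'llm.input_messages.0.message.role': 'system', 'llm.input_messages.0.message.content': ..., ...}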

Constants and Utilities

  • OpenInference semantic conventions with fallback values
  • Availability detection for optional dependencies
  • Safe JSON serialization with error handling (sketched after this list)
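
Span attribute values must be primitives, so complex objects are converted defensively and truncated. A minimal sketch of what the safe serialization can look like (hypothetical helper name, not the PR's exact code):

# Hypothetical sketch of defensive JSON serialization for span attributes.
import json
from typing import Any

def safe_json_dumps(value: Any, max_length: int = 4096) -> str:
    # Never raise on odd inputs; fall back to str() for non-JSON-serializable types.
    try:
        serialized = json.dumps(value, default=str)
    except (TypeError, ValueError):
        serialized = str(value)
    # Keep attribute payloads bounded so exporters do not choke on huge blobs.
    return serialized[:max_length]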

Documentation Updates

  • Added observability section to quickstart README
  • Docker Compose setup for Phoenix with PostgreSQL
  • Step-by-step instrumentation guide
  • Troubleshooting and best practices

Features

Distributed Tracing

  • Complete trace hierarchy from agent conversations to individual tool calls
  • Context propagation across async boundaries and Dapr workflows
  • Correlation IDs for grouping related operations

Performance Monitoring

  • Response times for all operations
  • Token usage and cost tracking for LLM calls
  • Error rates and failure analysis
  • Tool execution performance metrics

Rich Visualization

  • Phoenix UI compatibility with proper span relationships
  • Message flow visualization with input/output content
  • Tool schema display and parameter tracking
  • Workflow execution timelines

Graceful Degradation

  • Zero impact when observability packages not installed
  • Automatic fallback to no-op implementations
  • Clear error messages with installation guidance

Technical Implementation

W3C Trace Context Support

  • Proper traceparent and tracestate header handling (format sketched after this list)
  • Compatible with Dapr's serialization mechanisms
  • Maintains trace continuity across workflow restarts
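
For reference, a traceparent header encodes a version, a 128-bit trace ID, the parent span ID, and trace flags. A small sketch of formatting one from the active span using the standard OTel API (not code from this PR):

# Sketch: format a W3C traceparent header from the active span (standard OTel API).
from opentelemetry import trace

def current_traceparent() -> str:
    ctx = trace.get_current_span().get_span_context()
    # version "00" - 32-hex-digit trace id - 16-hex-digit span id - 2-hex-digit flags
    return f"00-{ctx.trace_id:032x}-{ctx.span_id:016x}-{ctx.trace_flags:02x}"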

OpenInference Compliance

  • Standard semantic conventions for AI/ML observability
  • Proper message formatting for Phoenix UI
  • Tool call tracking with function schemas

Optional Dependency Pattern

  • Clean import handling with try/except blocks (sketched after this list)
  • Availability flags throughout the codebase
  • Helpful error messages guiding users to install extras
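
A minimal sketch of the pattern (the OPENINFERENCE_AVAILABLE flag appears in the module; the helper function and the extra name below are assumptions for illustration):

# Illustrative optional-dependency pattern; error wording and extra name are assumed.
try:
    from openinference.semconv.trace import SpanAttributes  # optional extra
    OPENINFERENCE_AVAILABLE = True
except ImportError:
    SpanAttributes = None  # type: ignore[assignment]
    OPENINFERENCE_AVAILABLE = False

def require_observability() -> None:
    # Give users a clear hint when the optional packages are missing.
    if not OPENINFERENCE_AVAILABLE:
        raise ImportError(
            "Observability extras are not installed; install the observability "
            "extra for dapr-agents (extra name assumed) to enable tracing."
        )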

Breaking Changes

None. All observability features are opt-in and don't affect existing functionality.

Signed-off-by: Roberto Rodriguez <9653181+Cyb3rWard0g@users.noreply.github.com>
@Cyb3rWard0g Cyb3rWard0g requested a review from yaron2 as a code owner July 30, 2025 03:20
@Cyb3rWard0g Cyb3rWard0g requested review from sicoyle and yaron2 and removed request for yaron2 July 30, 2025 03:20
Signed-off-by: Roberto Rodriguez <9653181+Cyb3rWard0g@users.noreply.github.com>
Collaborator
@sicoyle sicoyle left a comment

A few comments so far. Overall looking good and a super neat feature add! I'd say I don't think we have to be as verbose in the code comments since the code is pretty clear, and in the logs I think we can trim down on some, and I'm not sure we want emojis in the logs. The Dapr runtime had comments recently saying to avoid emojis in printouts, so maybe we should follow suit there?



# Global instance for workflow context storage across the application
_context_storage = WorkflowContextStorage()
Collaborator

Do we really want this global to the entire application? An application can have multiple agents, so one agent could potentially access another agent's storage here, right? I'm not sure we want that. Could the workflow IDs used as keys maybe be namespaced by agent name to ensure that one agent cannot access another agent's storage here?

Collaborator Author

Hmm, can you elaborate on this? When we run dapr run --app-id weatherapp . that is 1 agent, right? We use the instance_id / workflow_id here to aggregate logs.

Collaborator

The dapr run CLI command runs a single app (https://docs.dapr.io/reference/cli/dapr-run/), but a single app in the Dapr Agents world (the app's Python code) can technically contain multiple agents, since to some extent they are merely Python objects a user can instantiate.

Alternatively, you can have the agent with the @task(agent=custom_agent, ...) syntax. Would we get OTel metrics on the agent with the task syntax?
For example, something like this:

# Imports assumed from the dapr-agents quickstarts; adjust paths to your project.
from dapr.ext.workflow import DaprWorkflowContext
from dapr_agents import Agent
from dapr_agents.workflow import task, workflow

# Define simple agents
extractor = Agent(
    name="DestinationExtractor",
    role="Extract destination",
    instructions=["Extract the main city from the user query"]
)

planner = Agent(
    name="PlannerAgent",
    role="Outline planner",
    instructions=["Generate a 3-day outline for the destination"]
)

expander = Agent(
    name="ItineraryAgent",
    role="Itinerary expander",
    instructions=["Expand the outline into a detailed plan"]
)

# Workflow tasks
@task(agent=extractor)
def extract(user_msg: str) -> str:
    pass

@task(agent=planner)
def plan(destination: str) -> str:
    pass

@task(agent=expander)
def expand(outline: str) -> str:
    pass

# Orchestration
@workflow(name="chained_planner_workflow")
def chained_planner_workflow(ctx: DaprWorkflowContext, user_msg: str):
    dest = yield ctx.call_activity(extract, input=user_msg)
    outline = yield ctx.call_activity(plan, input=dest)
    itinerary = yield ctx.call_activity(expand, input=outline)
    return itinerary

So yeah, in a case such as this I'm not sure how the context sharing would be safe across all agents, since they'd be within the same app ID.

Collaborator Author

OMG! I ran your example (I had to do a minor fix to the agent-as-a-task part, which was not related to observability), and it worked! I loved that it kept everything under one trace starting from the "Workflow". Look:

(screenshots omitted)

Collaborator Author

The Observability module was able to trace and connect each action, with the WorkflowApp as the root node. Three tasks were identified and traced as workflow tasks. Each task, in turn, spawned an AI agent; each agent went through 1 iteration / 1 loop, and each loop requested a chat completion. 🔥 Adding @yaron2 ;)

Collaborator Author

The workflow executed successfully too, based on the app logs ;)

}

if model:
    attributes[LLM_MODEL_NAME] = model
Collaborator

What if a user is running an app with, say, 3 agents within it, each using a different model? How will this work if it is a single global const?

Collaborator Author

That is a good question. I have not tested it with multiple agents and different models.

Collaborator Author

I tested this scenario too :) and, same as before, everything is captured the right way:

(screenshots omitted)

Collaborator Author

😉 @yaron2

    Returns:
        Dict[str, Any]: Span attributes for input messages
    """
    if OPENINFERENCE_AVAILABLE:
Collaborator

I think this would be cleaner if, instead of setting a global var, we used a field on the agent indicating whether this is enabled, or just a bool on one of the classes. Would that be doable?

Collaborator Author

Hmm, I think a global makes more sense, tbh, since we are defining wrappers for all methods used by Agents, Durable Agents, and any agentic workflows. Regarding your previous comments about potential issues with multiple agents under the same app and multi-model tasks: everything works as expected.

try:
    # Use AgentTool's built-in function call format
    if hasattr(tool, "to_function_call"):
        function_call = tool.to_function_call(format_type="openai")
Collaborator

Is it the case that every LLM provider we have abides by the OpenAI format?

Collaborator Author

Good question. I need to do more research on that.

Comment on lines +82 to +102
def strip_method_args(arguments: Mapping[str, Any]) -> Dict[str, Any]:
    """
    Remove self/cls arguments from method parameters.

    Filters out 'self' and 'cls' parameters from bound arguments to avoid
    including instance/class references in span attributes, following the
    SmolAgents pattern for cleaner tracing data.

    Args:
        arguments: Dictionary of bound method arguments

    Returns:
        Dict[str, Any]: Filtered arguments without self/cls

    Example:
        >>> strip_method_args({'self': obj, 'param': 'value', 'cls': MyClass})
        {'param': 'value'}
    """
    return {
        key: value for key, value in arguments.items() if key not in ("self", "cls")
    }
Collaborator

Is the SmolAgents pattern the ideal pattern for us to follow? Why?

Collaborator Author

Compared to the other OpenInference packages for other frameworks, SmolAgents is easy to follow to learn how instrumentation/observability is enabled.

"""
# Check for instrumentation suppression
if context_api and context_api.get_value(
context_api._SUPPRESS_INSTRUMENTATION_KEY
Collaborator

What does this do?

Collaborator Author

It checks the suppress_instrumentation key in the OpenTelemetry context:

_SUPPRESS_INSTRUMENTATION_KEY = create_key("suppress_instrumentation")

which is then checked somewhere else, alongside another key:

def _instrumented_requests_call(
        method: str, url: str, call_wrapped, get_or_create_headers
    ):
        if context.get_value("suppress_instrumentation") or context.get_value(
            _SUPPRESS_REQUESTS_INSTRUMENTATION_KEY
        ):
            return call_wrapped()

And that key, _SUPPRESS_REQUESTS_INSTRUMENTATION_KEY, is:

# A key to a context variable to avoid creating duplicate spans when instrumenting
# both, Session.request and Session.send, since Session.request calls into Session.send
_SUPPRESS_REQUESTS_INSTRUMENTATION_KEY = "suppress_requests_instrumentation"

Some context from OpenTelemetry docs: https://opentelemetry-python-kinvolk.readthedocs.io/en/latest/_modules/opentelemetry/instrumentation/requests.html and where it is set: https://github.com/pexip/os-python-opentelemetry-api/blob/bad159831b8ba321068a4a6b06c282c8737b94a4/src/opentelemetry/context/__init__.py#L171

Cyb3rWard0g and others added 8 commits August 1, 2025 02:35
Signed-off-by: Roberto Rodriguez <9653181+Cyb3rWard0g@users.noreply.github.com>
…quest if not set

Signed-off-by: Roberto Rodriguez <9653181+Cyb3rWard0g@users.noreply.github.com>
Member
@yaron2 yaron2 left a comment

LGTM

@yaron2 yaron2 merged commit 963cef6 into main Aug 1, 2025
6 checks passed