Observability Module with OpenTelemetry and Phoenix Integration #168
Conversation
Signed-off-by: Roberto Rodriguez <9653181+Cyb3rWard0g@users.noreply.github.com>
A few comments so far. Overall looking good and a super neat feature add! I don't think we have to be as verbose in the comments since the code is pretty clear, and in the logs I think we can trim some down; I'm also not sure we want emojis in the logs. Dapr runtime had comments recently saying to avoid emojis in printouts, so maybe we should follow suit there?
# Global instance for workflow context storage across the application
_context_storage = WorkflowContextStorage()
Do we really want this global to the entire application? An application can have multiple agents, so one agent could potentially access another agent's storage here, right? I'm not sure we want that. Could the workflow ids used as keys be namespaced by agent name, to ensure that one agent cannot access another agent's storage?
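For illustration, a minimal sketch of what that namespacing could look like; the class and method names here are hypothetical, not the module's actual API:

from typing import Any, Dict, Optional

class NamespacedWorkflowContextStorage:
    """Hypothetical variant that prefixes storage keys with the agent name."""

    def __init__(self) -> None:
        self._contexts: Dict[str, Dict[str, Any]] = {}

    def _key(self, agent_name: str, instance_id: str) -> str:
        # e.g. "WeatherAgent:6f9c..." so two agents in the same app cannot collide
        return f"{agent_name}:{instance_id}"

    def set_context(self, agent_name: str, instance_id: str, ctx: Dict[str, Any]) -> None:
        self._contexts[self._key(agent_name, instance_id)] = ctx

    def get_context(self, agent_name: str, instance_id: str) -> Optional[Dict[str, Any]]:
        return self._contexts.get(self._key(agent_name, instance_id))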
Hmm, can you elaborate on this? When we run dapr run --app-id weatherapp . that is 1 agent, right? We use the instance_id / workflow_id here to aggregate logs.
The dapr run CLI command runs a single app (https://docs.dapr.io/reference/cli/dapr-run/), but a single app in the Dapr Agents world (the app's Python code) can technically contain multiple agents, since to some extent they are merely Python objects a user can instantiate.
Alternatively, you can attach an agent with the @task(agent=custom_agent, ...) syntax. Would we get OTel metrics on the agent with the task syntax?
For example, something like this:
# Define simple agents
extractor = Agent(
    name="DestinationExtractor",
    role="Extract destination",
    instructions=["Extract the main city from the user query"]
)
planner = Agent(
    name="PlannerAgent",
    role="Outline planner",
    instructions=["Generate a 3-day outline for the destination"]
)
expander = Agent(
    name="ItineraryAgent",
    role="Itinerary expander",
    instructions=["Expand the outline into a detailed plan"]
)

# Workflow tasks
@task(agent=extractor)
def extract(user_msg: str) -> str:
    pass

@task(agent=planner)
def plan(destination: str) -> str:
    pass

@task(agent=expander)
def expand(outline: str) -> str:
    pass

# Orchestration
@workflow(name="chained_planner_workflow")
def chained_planner_workflow(ctx: DaprWorkflowContext, user_msg: str):
    dest = yield ctx.call_activity(extract, input=user_msg)
    outline = yield ctx.call_activity(plan, input=dest)
    itinerary = yield ctx.call_activity(expand, input=outline)
    return itinerary

So yeah, in a case such as this I'm not sure how the context sharing would be safe across all agents, since they'd be within the same app id.
The Observability module was able to trace and connect each action from the WorkflowApp as the root node. Then 3 tasks were identified and traced as workflow tasks. Each task spawned an AI Agent; each agent went through 1 iteration / loop, and each loop requested a chat completion. 🔥 Adding @yaron2 ;)
}

if model:
    attributes[LLM_MODEL_NAME] = model
What if a user is running an app with, say, 3 agents in it, each using a different model? How will this work if it is a single global const?
That is a good question. I have not tested it with multiple agents and different models.
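For reference, a minimal sketch of how the model attribute could be derived per traced LLM call rather than from a single module-level constant; the function name and argument shape are assumptions, and LLM_MODEL_NAME stands in for the module's constant:

LLM_MODEL_NAME = "llm.model_name"  # stand-in for the module's semantic-convention constant

def build_llm_attributes(call_kwargs):
    # Read the model from the specific call being traced, so three agents using
    # three different models each get their own llm.model_name span attribute.
    attributes = {}
    model = call_kwargs.get("model")
    if model:
        attributes[LLM_MODEL_NAME] = model
    return attributes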
😉 @yaron2
    Returns:
        Dict[str, Any]: Span attributes for input messages
    """
    if OPENINFERENCE_AVAILABLE:
I think this would be cleaner if, instead of setting a global var, we used a field on the agent indicating whether this is enabled, or just a bool on one of the classes. Would that be doable?
Hmm, I think a global makes more sense, tbh, since we are defining wrappers for all methods used by Agents, Durable Agents, and any agentic workflows. Regarding your previous comments about potential issues with multiple agents under the same app and multi-model tasks, everything works as expected.
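For context, a minimal sketch of how such a module-level flag is usually derived; the exact import path is an assumption based on the openinference-semantic-conventions package:

try:
    from openinference.semconv.trace import SpanAttributes  # optional dependency
    OPENINFERENCE_AVAILABLE = True
except ImportError:
    SpanAttributes = None
    OPENINFERENCE_AVAILABLE = False

Since import availability is process-wide rather than per-agent, a module-level flag is a natural fit here.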
    try:
        # Use AgentTool's built-in function call format
        if hasattr(tool, "to_function_call"):
            function_call = tool.to_function_call(format_type="openai")
Is it the case that every LLM provider that we have abides by the OpenAI format?
Good question. I need to do more research on that.
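In the meantime, a hedged sketch of a format-agnostic fallback; it assumes tools expose name and description attributes when to_function_call is unavailable, which is an assumption rather than the module's actual behavior:

def tool_to_attributes(tool):
    # Prefer the tool's own OpenAI-style serialization when available.
    if hasattr(tool, "to_function_call"):
        return tool.to_function_call(format_type="openai")
    # Otherwise fall back to a minimal, provider-neutral representation.
    return {
        "name": getattr(tool, "name", str(tool)),
        "description": getattr(tool, "description", ""),
    }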
def strip_method_args(arguments: Mapping[str, Any]) -> Dict[str, Any]:
    """
    Remove self/cls arguments from method parameters.

    Filters out 'self' and 'cls' parameters from bound arguments to avoid
    including instance/class references in span attributes, following the
    SmolAgents pattern for cleaner tracing data.

    Args:
        arguments: Dictionary of bound method arguments

    Returns:
        Dict[str, Any]: Filtered arguments without self/cls

    Example:
        >>> strip_method_args({'self': obj, 'param': 'value', 'cls': MyClass})
        {'param': 'value'}
    """
    return {
        key: value for key, value in arguments.items() if key not in ("self", "cls")
    }
Is the SmolAgents pattern the ideal pattern to follow here? Why?
Compared to the OpenInference packages for other frameworks, the SmolAgents one is easy to follow for learning how instrumentation / observability is enabled.
| """ | ||
| # Check for instrumentation suppression | ||
| if context_api and context_api.get_value( | ||
| context_api._SUPPRESS_INSTRUMENTATION_KEY |
what does this do?
It sets suppress_instrumentation in the OpenTelemetry context:
_SUPPRESS_INSTRUMENTATION_KEY = create_key("suppress_instrumentation")
That key is then checked elsewhere, together with a second key:
def _instrumented_requests_call(
    method: str, url: str, call_wrapped, get_or_create_headers
):
    if context.get_value("suppress_instrumentation") or context.get_value(
        _SUPPRESS_REQUESTS_INSTRUMENTATION_KEY
    ):
        return call_wrapped()
And that key, _SUPPRESS_REQUESTS_INSTRUMENTATION_KEY, is:
# A key to a context variable to avoid creating duplicate spans when instrumenting
# both, Session.request and Session.send, since Session.request calls into Session.send
_SUPPRESS_REQUESTS_INSTRUMENTATION_KEY = "suppress_requests_instrumentation"
Some context from OpenTelemetry docs: https://opentelemetry-python-kinvolk.readthedocs.io/en/latest/_modules/opentelemetry/instrumentation/requests.html and where it is set: https://github.com/pexip/os-python-opentelemetry-api/blob/bad159831b8ba321068a4a6b06c282c8737b94a4/src/opentelemetry/context/__init__.py#L171
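For completeness, a minimal sketch of the suppression pattern discussed above, using the opentelemetry.context API; the wrapper name is illustrative, not the module's actual code:

from opentelemetry import context as context_api

def traced_call(wrapped, *args, **kwargs):
    # Skip span creation when a caller higher up has suppressed instrumentation,
    # which avoids duplicate spans from nested instrumented calls.
    if context_api.get_value(context_api._SUPPRESS_INSTRUMENTATION_KEY):
        return wrapped(*args, **kwargs)
    # Mark instrumentation as suppressed for anything called inside this wrapper.
    token = context_api.attach(
        context_api.set_value(context_api._SUPPRESS_INSTRUMENTATION_KEY, True)
    )
    try:
        return wrapped(*args, **kwargs)
    finally:
        context_api.detach(token)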
Signed-off-by: Roberto Rodriguez <9653181+Cyb3rWard0g@users.noreply.github.com>
yaron2 left a comment
LGTM
This PR introduces a complete observability solution for Dapr Agents, enabling distributed tracing and monitoring through OpenTelemetry with Phoenix UI support. The module provides automatic instrumentation for agents, tools, LLM calls, and workflow executions while maintaining W3C Trace Context standards for distributed tracing across Dapr boundaries. All observability features are optional dependencies that gracefully degrade when not installed.
Key Changes

New Observability Module (dapr_agents/observability/)

Core Components
- Instrumentor (DaprAgentsInstrumentor)
- Wrapper Classes
- Context Propagation (context_propagation.py): extract_otel_context() and restore_otel_context() utilities
- Message Processing (message_processors.py)
- Constants and Utilities

Documentation Updates

Features
- Distributed Tracing
- Performance Monitoring
- Rich Visualization
- Graceful Degradation

Technical Implementation
- W3C Trace Context Support: traceparent and tracestate header handling
- OpenInference Compliance
- Optional Dependency Pattern
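A hypothetical end-to-end usage sketch, assuming DaprAgentsInstrumentor follows the standard OpenInference/OpenTelemetry instrumentor interface and that Phoenix is used as the collector; the import path, endpoint, and instrument() keyword are assumptions, not confirmed API:

# phoenix.otel.register comes from the arize-phoenix-otel package
from phoenix.otel import register
from dapr_agents.observability import DaprAgentsInstrumentor

# Register a tracer provider that exports spans to a local Phoenix instance
tracer_provider = register(
    project_name="dapr-agents",
    endpoint="http://localhost:6006/v1/traces",
)

# Instrument agents, tools, LLM calls, and workflow executions
DaprAgentsInstrumentor().instrument(tracer_provider=tracer_provider)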
Breaking Changes
None. All observability features are opt-in and don't affect existing functionality.