dev -> main (Nov 3rd) #1591

Ahmad-mtos · 2025-11-03T15:41:28Z

PR Type

Bug fix

Description

Remove async LRU caching from secret retrieval functions
Prevent cross-loop futures issues in workflow execution
Add docstrings explaining caching removal rationale
Clean up unused imports and environment variables

Diagram Walkthrough

flowchart LR
  A["Secret Retrieval Functions"] -->|Remove @alru_cache| B["Direct Query Execution"]
  B -->|Avoid| C["Cross-Loop Futures Issues"]
  D["Workflow Context"] -->|Execute Activity| B
  E["Non-Workflow Context"] -->|Direct Query| B

File Walkthrough

Relevant files

Bug fix

secrets.py `Remove caching decorators from secret functions` src/agents-api/agents_api/common/utils/secrets.py Removed `@alru_cache` decorator from `get_secret_by_name()` and `get_secrets_list()` functions Removed `async_lru` import and `secrets_cache_ttl` environment variable reference Added docstrings to both functions explaining caching removal to prevent cross-loop futures Reformatted `get_secrets_list()` function signature for better readability	+6/-7

Important

Remove caching from get_secret_by_name and get_secrets_list in secrets.py to avoid cross-loop futures.

Behavior:
- Remove alru_cache from get_secret_by_name and get_secrets_list in secrets.py to avoid cross-loop futures.
- Update docstrings for get_secret_by_name and get_secrets_list to reflect non-caching behavior.
Imports:
- Remove alru_cache import from secrets.py.
- Remove secrets_cache_ttl import from secrets.py.

^{This description was created by}^{for 54da275. You can customize this summary. It will automatically update as commits are pushed.}

…event cross-loop futures

fix(agents-api): Remove caching from secret retrieval functions

github-actions · 2025-11-03T15:41:48Z

Claude finished @Ahmad-mtos's task —— View job

Analyzing PR for Changelog Updates

Todo List:

Read diff.patch to understand changes
Assess if changes are user-visible
Create/update changelog entry if needed
Update home page template if applicable
Commit and push changes

Analyzing the pull request changes to determine if a changelog entry is needed...

qodo-free-for-open-source-projects · 2025-11-03T15:41:52Z

PR Compliance Guide 🔍

(Compliance updated until commit `54da275`)

Below is a summary of compliance checks for this PR:

Security Compliance
🟢	No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
⚪	🎫 No ticket provided Create ticket/issue
Codebase Duplication Compliance
⚪	Codebase context is not defined Follow the guide to enable codebase context checks.
Custom Compliance
🟢	Generic: Meaningful Naming and Self-Documenting Code Objective: Ensure all identifiers clearly express their purpose and intent, making code self-documenting Status: Passed
	Generic: Robust Error Handling and Edge Case Management Objective: Ensure comprehensive error handling that provides meaningful context and graceful degradation Status: Passed
	Generic: Secure Error Handling Objective: To prevent the leakage of sensitive system information through error messages while providing sufficient detail for internal debugging. Status: Passed
	Generic: Secure Logging Practices Objective: To ensure logs are useful for debugging and auditing without exposing sensitive information like PII, PHI, or cardholder data. Status: Passed
⚪	Generic: Comprehensive Audit Trails Objective: To create a detailed and reliable record of critical system actions for security analysis and compliance. Status: Missing auditing: New secret retrieval functions remove caching and continue to fetch secrets but add no logging/audit trail for access to sensitive data, which may be a critical action requiring audit logs. Referred Code """Fetch a developer secret by name without caching to avoid cross-loop futures.""" try: secret = await workflow.execute_activity( pg_query_step, args=[ "get_secret_by_name", "secrets.get_by_name", {"developer_id": developer_id, "name": name, "decrypt": decrypt}, ], schedule_to_close_timeout=timedelta(days=31), retry_policy=DEFAULT_RETRY_POLICY, heartbeat_timeout=timedelta(seconds=temporal_heartbeat_timeout), ) except _NotInWorkflowEventLoopError: secret = await get_secret_by_name_query( developer_id=developer_id, name=name, decrypt=decrypt, ) if secret is None: ... (clipped 3 lines)
	Generic: Security-First Input Validation and Data Handling Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent vulnerabilities Status: Input validation: Functions accept `name` and `developer_id` and directly query for secrets without visible input validation or sanitization in the new code diff, which may rely on downstream safety not shown here. Referred Code async def get_secret_by_name(developer_id: UUID, name: str, decrypt: bool = False) -> Secret: """Fetch a developer secret by name without caching to avoid cross-loop futures.""" try: secret = await workflow.execute_activity( pg_query_step, args=[ "get_secret_by_name", "secrets.get_by_name", {"developer_id": developer_id, "name": name, "decrypt": decrypt}, ], schedule_to_close_timeout=timedelta(days=31), retry_policy=DEFAULT_RETRY_POLICY, heartbeat_timeout=timedelta(seconds=temporal_heartbeat_timeout), ) except _NotInWorkflowEventLoopError: secret = await get_secret_by_name_query( developer_id=developer_id, name=name, decrypt=decrypt, ) ... (clipped 4 lines)

Compliance status legend

🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

Previous compliance checks

Compliance check up to commit 54da275

Security Compliance
🟢	No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
⚪	🎫 No ticket provided Create ticket/issue
Codebase Duplication Compliance
⚪	Codebase context is not defined Follow the guide to enable codebase context checks.
Custom Compliance
🟢	Generic: Meaningful Naming and Self-Documenting Code Objective: Ensure all identifiers clearly express their purpose and intent, making code self-documenting Status: Passed
	Generic: Robust Error Handling and Edge Case Management Objective: Ensure comprehensive error handling that provides meaningful context and graceful degradation Status: Passed
	Generic: Secure Error Handling Objective: To prevent the leakage of sensitive system information through error messages while providing sufficient detail for internal debugging. Status: Passed
	Generic: Secure Logging Practices Objective: To ensure logs are useful for debugging and auditing without exposing sensitive information like PII, PHI, or cardholder data. Status: Passed
⚪	Generic: Comprehensive Audit Trails Objective: To create a detailed and reliable record of critical system actions for security analysis and compliance. Status: Missing audit logs: New secret retrieval paths do not include audit logging for access to sensitive secrets, and the diff provides no evidence that such access is logged elsewhere. Referred Code async def get_secret_by_name(developer_id: UUID, name: str, decrypt: bool = False) -> Secret: """Fetch a developer secret by name without caching to avoid cross-loop futures.""" try: secret = await workflow.execute_activity( pg_query_step, args=[ "get_secret_by_name", "secrets.get_by_name", {"developer_id": developer_id, "name": name, "decrypt": decrypt}, ], schedule_to_close_timeout=timedelta(days=31), retry_policy=DEFAULT_RETRY_POLICY, heartbeat_timeout=timedelta(seconds=temporal_heartbeat_timeout), ) except _NotInWorkflowEventLoopError: secret = await get_secret_by_name_query( developer_id=developer_id, name=name, decrypt=decrypt, ) ... (clipped 4 lines)
	Generic: Security-First Input Validation and Data Handling Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent vulnerabilities Status: Input validation gap: New code paths accept `name` and `developer_id` for secret access without visible validation or normalization in this diff, which may rely on upstream guarantees not shown. Referred Code async def get_secret_by_name(developer_id: UUID, name: str, decrypt: bool = False) -> Secret: """Fetch a developer secret by name without caching to avoid cross-loop futures.""" try: secret = await workflow.execute_activity( pg_query_step, args=[ "get_secret_by_name", "secrets.get_by_name", {"developer_id": developer_id, "name": name, "decrypt": decrypt}, ], schedule_to_close_timeout=timedelta(days=31), retry_policy=DEFAULT_RETRY_POLICY, heartbeat_timeout=timedelta(seconds=temporal_heartbeat_timeout), ) except _NotInWorkflowEventLoopError: secret = await get_secret_by_name_query( developer_id=developer_id, name=name, decrypt=decrypt, ) ... (clipped 4 lines)

ellipsis-dev

Important

Looks good to me! 👍

Reviewed everything up to 54da275 in 1 minute and 1 seconds. Click for details.

Reviewed 52 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 2 draft comments. View those below.
Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.

1. src/agents-api/agents_api/common/utils/secrets.py:18

Draft comment:
Good update: the alru_cache decorator was removed and replaced with a clear docstring. Please ensure that the removal of caching won’t adversely affect performance if these secrets are fetched frequently.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50% None

2. src/agents-api/agents_api/common/utils/secrets.py:45

Draft comment:
The get_secrets_list function also has its caching decorator removed with an updated docstring. Confirm that the fallback execution (using connection_pool) functions as expected in all contexts.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_MCwr75cYd1PELEiY

^{You can customize}^{by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.}

qodo-free-for-open-source-projects · 2025-11-03T15:42:35Z

PR Code Suggestions ✨

No code suggestions found for the PR.

claude · 2025-11-03T15:44:56Z

Pull Request Review: Remove Async LRU Caching from Secret Functions

Summary

This PR removes async LRU caching from secret retrieval functions to prevent cross-loop futures issues in Temporal workflow execution. While the fix addresses a legitimate technical concern, there are several considerations to discuss.

✅ Positive Aspects

Addresses a Real Issue: Cross-loop futures in async contexts can cause hard-to-debug problems, especially in Temporal workflows
Clean Implementation: The removal is straightforward and includes helpful docstrings explaining the rationale
Maintains Functionality: The core logic remains intact with proper fallback mechanisms for both workflow and non-workflow contexts

🔍 Areas of Concern

1. Performance Impact

Issue: Removing caching could significantly impact performance, especially for frequently accessed secrets.

Each call now results in a database query or Temporal activity execution
In high-throughput scenarios, this could create database bottlenecks
No performance benchmarks or metrics provided to assess impact

Recommendation: Consider implementing request-scoped caching or using Temporal's built-in caching mechanisms that are workflow-safe.

2. Missing Alternative Solutions

Issue: The PR removes caching entirely without exploring alternatives.

Potential alternatives to consider:

Use contextvars for request-scoped caching
Implement workflow-aware caching that respects event loop boundaries
Use Temporal's built-in memo functionality for workflow contexts
Add a simple in-memory cache with proper event loop isolation

3. Documentation Gap

Issue: While docstrings were added, there's no update to broader documentation about this architectural decision.

Recommendation:

Update AGENTS.md or relevant documentation to explain why caching was removed
Document performance implications for future developers
Add an AIDEV-NOTE about potential performance hotspots

🐛 Potential Issues

Database Load: Without caching, every secret access hits the database. Monitor for:
- Increased query latency
- Database connection pool exhaustion
- Overall system performance degradation
Temporal Activity Overhead: Each secret retrieval in workflow context now requires activity execution, which has overhead:
- Network round-trips
- Serialization/deserialization costs
- Temporal history growth

💡 Suggestions for Improvement

Add Metrics:

# Add performance monitoring
from ...monitoring import metrics

async def get_secret_by_name(...):
    with metrics.timer('secrets.fetch_duration'):
        # existing implementation

Consider Request-Scoped Caching:

from contextvars import ContextVar

_request_cache: ContextVar[dict] = ContextVar('secret_cache', default={})

async def get_secret_by_name(...):
    cache = _request_cache.get()
    cache_key = f"{developer_id}:{name}:{decrypt}"
    if cache_key in cache:
        return cache[cache_key]
    # ... fetch logic ...
    cache[cache_key] = secret
    return secret

Add Configuration Option:
Allow enabling/disabling caching via environment variable for different deployment scenarios.

🏗️ Architecture Considerations

This change highlights a broader architectural challenge: balancing performance optimization with Temporal's execution model. Consider:

Creating a dedicated caching strategy document for Temporal workflows
Establishing patterns for handling shared state in distributed workflows
Implementing workflow-safe caching utilities that other parts of the codebase can use

✔️ Testing Recommendations

Before merging:

Load test the secret retrieval endpoints to measure performance impact
Monitor database query rates in a staging environment
Verify no regression in workflow execution times
Test with concurrent workflow executions to ensure no race conditions

📊 Risk Assessment

Risk Level: Medium
Performance Risk: High (without mitigation)
Stability Risk: Low (fixes a real issue)
Security Risk: None identified

Final Recommendation

Approve with reservations - The fix addresses a legitimate issue, but the performance implications need careful consideration. I recommend:

Adding performance monitoring before deployment
Implementing request-scoped caching as a middle ground
Documenting this architectural decision properly
Planning a follow-up to implement workflow-safe caching

The code quality is good and the fix is valid, but we should be proactive about potential performance degradation.

Review based on AGENTS.md guidelines and best practices for the Julep codebase.

Ahmad-mtos and others added 2 commits November 3, 2025 18:31

fix(agents-api): Remove caching from secret retrieval functions to pr…

3991ab9

…event cross-loop futures

Merge pull request #1590 from julep-ai/x/task-loop

54da275

fix(agents-api): Remove caching from secret retrieval functions

qodo-free-for-open-source-projects bot added the Review effort 2/5 label Nov 3, 2025

ellipsis-dev bot reviewed Nov 3, 2025

View reviewed changes

Ahmad-mtos merged commit 30d7ad8 into main Nov 3, 2025
26 of 27 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

dev -> main (Nov 3rd) #1591

dev -> main (Nov 3rd) #1591

Uh oh!

Ahmad-mtos commented Nov 3, 2025 •

edited by ellipsis-dev bot

Loading

Uh oh!

github-actions bot commented Nov 3, 2025 •

edited

Loading

Uh oh!

qodo-free-for-open-source-projects bot commented Nov 3, 2025 •

edited

Loading

Uh oh!

ellipsis-dev bot left a comment

Uh oh!

qodo-free-for-open-source-projects bot commented Nov 3, 2025

Uh oh!

Uh oh!

claude bot commented Nov 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

dev -> main (Nov 3rd) #1591

dev -> main (Nov 3rd) #1591

Uh oh!

Conversation

Ahmad-mtos commented Nov 3, 2025 • edited by ellipsis-dev bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Type

Description

Diagram Walkthrough

File Walkthrough

Uh oh!

github-actions bot commented Nov 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Analyzing PR for Changelog Updates

Uh oh!

qodo-free-for-open-source-projects bot commented Nov 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Compliance Guide 🔍

(Compliance updated until commit 54da275)

Previous compliance checks

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

qodo-free-for-open-source-projects bot commented Nov 3, 2025

PR Code Suggestions ✨

Uh oh!

Uh oh!

claude bot commented Nov 3, 2025

Pull Request Review: Remove Async LRU Caching from Secret Functions

Summary

✅ Positive Aspects

🔍 Areas of Concern

1. Performance Impact

2. Missing Alternative Solutions

3. Documentation Gap

🐛 Potential Issues

💡 Suggestions for Improvement

🏗️ Architecture Considerations

✔️ Testing Recommendations

📊 Risk Assessment

Final Recommendation

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Ahmad-mtos commented Nov 3, 2025 •

edited by ellipsis-dev bot

Loading

github-actions bot commented Nov 3, 2025 •

edited

Loading

qodo-free-for-open-source-projects bot commented Nov 3, 2025 •

edited

Loading

(Compliance updated until commit `54da275`)