Skip to content

Conversation

@Ahmad-mtos
Copy link
Contributor

@Ahmad-mtos Ahmad-mtos commented Nov 3, 2025

PR Type

Bug fix


Description

  • Remove async LRU caching from secret retrieval functions

  • Prevent cross-loop futures issues in workflow execution

  • Add docstrings explaining caching removal rationale

  • Clean up unused imports and environment variables


Diagram Walkthrough

flowchart LR
  A["Secret Retrieval Functions"] -->|Remove @alru_cache| B["Direct Query Execution"]
  B -->|Avoid| C["Cross-Loop Futures Issues"]
  D["Workflow Context"] -->|Execute Activity| B
  E["Non-Workflow Context"] -->|Direct Query| B
Loading

File Walkthrough

Relevant files
Bug fix
secrets.py
Remove caching decorators from secret functions                   

src/agents-api/agents_api/common/utils/secrets.py

  • Removed @alru_cache decorator from get_secret_by_name() and
    get_secrets_list() functions
  • Removed async_lru import and secrets_cache_ttl environment variable
    reference
  • Added docstrings to both functions explaining caching removal to
    prevent cross-loop futures
  • Reformatted get_secrets_list() function signature for better
    readability
+6/-7     


Important

Remove caching from get_secret_by_name and get_secrets_list in secrets.py to avoid cross-loop futures.

  • Behavior:
    • Remove alru_cache from get_secret_by_name and get_secrets_list in secrets.py to avoid cross-loop futures.
    • Update docstrings for get_secret_by_name and get_secrets_list to reflect non-caching behavior.
  • Imports:
    • Remove alru_cache import from secrets.py.
    • Remove secrets_cache_ttl import from secrets.py.

This description was created by Ellipsis for 54da275. You can customize this summary. It will automatically update as commits are pushed.

Ahmad-mtos and others added 2 commits November 3, 2025 18:31
fix(agents-api): Remove caching from secret retrieval functions
@github-actions
Copy link
Contributor

github-actions bot commented Nov 3, 2025

Claude finished @Ahmad-mtos's task —— View job


Analyzing PR for Changelog Updates

Todo List:

  • Read diff.patch to understand changes
  • Assess if changes are user-visible
  • Create/update changelog entry if needed
  • Update home page template if applicable
  • Commit and push changes

Analyzing the pull request changes to determine if a changelog entry is needed...

@qodo-free-for-open-source-projects
Copy link
Contributor

qodo-free-for-open-source-projects bot commented Nov 3, 2025

PR Compliance Guide 🔍

(Compliance updated until commit 54da275)

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status: Passed

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing auditing: New secret retrieval functions remove caching and continue to fetch secrets but add no
logging/audit trail for access to sensitive data, which may be a critical action requiring
audit logs.

Referred Code
"""Fetch a developer secret by name without caching to avoid cross-loop futures."""
try:
    secret = await workflow.execute_activity(
        pg_query_step,
        args=[
            "get_secret_by_name",
            "secrets.get_by_name",
            {"developer_id": developer_id, "name": name, "decrypt": decrypt},
        ],
        schedule_to_close_timeout=timedelta(days=31),
        retry_policy=DEFAULT_RETRY_POLICY,
        heartbeat_timeout=timedelta(seconds=temporal_heartbeat_timeout),
    )
except _NotInWorkflowEventLoopError:
    secret = await get_secret_by_name_query(
        developer_id=developer_id,
        name=name,
        decrypt=decrypt,
    )

if secret is None:


 ... (clipped 3 lines)
Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Input validation: Functions accept name and developer_id and directly query for secrets without visible
input validation or sanitization in the new code diff, which may rely on downstream safety
not shown here.

Referred Code
async def get_secret_by_name(developer_id: UUID, name: str, decrypt: bool = False) -> Secret:
    """Fetch a developer secret by name without caching to avoid cross-loop futures."""
    try:
        secret = await workflow.execute_activity(
            pg_query_step,
            args=[
                "get_secret_by_name",
                "secrets.get_by_name",
                {"developer_id": developer_id, "name": name, "decrypt": decrypt},
            ],
            schedule_to_close_timeout=timedelta(days=31),
            retry_policy=DEFAULT_RETRY_POLICY,
            heartbeat_timeout=timedelta(seconds=temporal_heartbeat_timeout),
        )
    except _NotInWorkflowEventLoopError:
        secret = await get_secret_by_name_query(
            developer_id=developer_id,
            name=name,
            decrypt=decrypt,
        )



 ... (clipped 4 lines)
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

Previous compliance checks

Compliance check up to commit 54da275
Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status: Passed

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing audit logs: New secret retrieval paths do not include audit logging for access to sensitive secrets,
and the diff provides no evidence that such access is logged elsewhere.

Referred Code
async def get_secret_by_name(developer_id: UUID, name: str, decrypt: bool = False) -> Secret:
    """Fetch a developer secret by name without caching to avoid cross-loop futures."""
    try:
        secret = await workflow.execute_activity(
            pg_query_step,
            args=[
                "get_secret_by_name",
                "secrets.get_by_name",
                {"developer_id": developer_id, "name": name, "decrypt": decrypt},
            ],
            schedule_to_close_timeout=timedelta(days=31),
            retry_policy=DEFAULT_RETRY_POLICY,
            heartbeat_timeout=timedelta(seconds=temporal_heartbeat_timeout),
        )
    except _NotInWorkflowEventLoopError:
        secret = await get_secret_by_name_query(
            developer_id=developer_id,
            name=name,
            decrypt=decrypt,
        )



 ... (clipped 4 lines)
Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Input validation gap: New code paths accept name and developer_id for secret access without visible validation
or normalization in this diff, which may rely on upstream guarantees not shown.

Referred Code
async def get_secret_by_name(developer_id: UUID, name: str, decrypt: bool = False) -> Secret:
    """Fetch a developer secret by name without caching to avoid cross-loop futures."""
    try:
        secret = await workflow.execute_activity(
            pg_query_step,
            args=[
                "get_secret_by_name",
                "secrets.get_by_name",
                {"developer_id": developer_id, "name": name, "decrypt": decrypt},
            ],
            schedule_to_close_timeout=timedelta(days=31),
            retry_policy=DEFAULT_RETRY_POLICY,
            heartbeat_timeout=timedelta(seconds=temporal_heartbeat_timeout),
        )
    except _NotInWorkflowEventLoopError:
        secret = await get_secret_by_name_query(
            developer_id=developer_id,
            name=name,
            decrypt=decrypt,
        )



 ... (clipped 4 lines)

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

Looks good to me! 👍

Reviewed everything up to 54da275 in 1 minute and 1 seconds. Click for details.
  • Reviewed 52 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. src/agents-api/agents_api/common/utils/secrets.py:18
  • Draft comment:
    Good update: the alru_cache decorator was removed and replaced with a clear docstring. Please ensure that the removal of caching won’t adversely affect performance if these secrets are fetched frequently.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
2. src/agents-api/agents_api/common/utils/secrets.py:45
  • Draft comment:
    The get_secrets_list function also has its caching decorator removed with an updated docstring. Confirm that the fallback execution (using connection_pool) functions as expected in all contexts.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None

Workflow ID: wflow_MCwr75cYd1PELEiY

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@qodo-free-for-open-source-projects
Copy link
Contributor

PR Code Suggestions ✨

No code suggestions found for the PR.

@Ahmad-mtos Ahmad-mtos merged commit 30d7ad8 into main Nov 3, 2025
26 of 27 checks passed
@claude
Copy link
Contributor

claude bot commented Nov 3, 2025

Pull Request Review: Remove Async LRU Caching from Secret Functions

Summary

This PR removes async LRU caching from secret retrieval functions to prevent cross-loop futures issues in Temporal workflow execution. While the fix addresses a legitimate technical concern, there are several considerations to discuss.

✅ Positive Aspects

  1. Addresses a Real Issue: Cross-loop futures in async contexts can cause hard-to-debug problems, especially in Temporal workflows
  2. Clean Implementation: The removal is straightforward and includes helpful docstrings explaining the rationale
  3. Maintains Functionality: The core logic remains intact with proper fallback mechanisms for both workflow and non-workflow contexts

🔍 Areas of Concern

1. Performance Impact

Issue: Removing caching could significantly impact performance, especially for frequently accessed secrets.

  • Each call now results in a database query or Temporal activity execution
  • In high-throughput scenarios, this could create database bottlenecks
  • No performance benchmarks or metrics provided to assess impact

Recommendation: Consider implementing request-scoped caching or using Temporal's built-in caching mechanisms that are workflow-safe.

2. Missing Alternative Solutions

Issue: The PR removes caching entirely without exploring alternatives.

Potential alternatives to consider:

  • Use contextvars for request-scoped caching
  • Implement workflow-aware caching that respects event loop boundaries
  • Use Temporal's built-in memo functionality for workflow contexts
  • Add a simple in-memory cache with proper event loop isolation

3. Documentation Gap

Issue: While docstrings were added, there's no update to broader documentation about this architectural decision.

Recommendation:

  • Update AGENTS.md or relevant documentation to explain why caching was removed
  • Document performance implications for future developers
  • Add an AIDEV-NOTE about potential performance hotspots

🐛 Potential Issues

  1. Database Load: Without caching, every secret access hits the database. Monitor for:

    • Increased query latency
    • Database connection pool exhaustion
    • Overall system performance degradation
  2. Temporal Activity Overhead: Each secret retrieval in workflow context now requires activity execution, which has overhead:

    • Network round-trips
    • Serialization/deserialization costs
    • Temporal history growth

💡 Suggestions for Improvement

  1. Add Metrics:

    # Add performance monitoring
    from ...monitoring import metrics
    
    async def get_secret_by_name(...):
        with metrics.timer('secrets.fetch_duration'):
            # existing implementation
  2. Consider Request-Scoped Caching:

    from contextvars import ContextVar
    
    _request_cache: ContextVar[dict] = ContextVar('secret_cache', default={})
    
    async def get_secret_by_name(...):
        cache = _request_cache.get()
        cache_key = f"{developer_id}:{name}:{decrypt}"
        if cache_key in cache:
            return cache[cache_key]
        # ... fetch logic ...
        cache[cache_key] = secret
        return secret
  3. Add Configuration Option:
    Allow enabling/disabling caching via environment variable for different deployment scenarios.

🏗️ Architecture Considerations

This change highlights a broader architectural challenge: balancing performance optimization with Temporal's execution model. Consider:

  1. Creating a dedicated caching strategy document for Temporal workflows
  2. Establishing patterns for handling shared state in distributed workflows
  3. Implementing workflow-safe caching utilities that other parts of the codebase can use

✔️ Testing Recommendations

Before merging:

  1. Load test the secret retrieval endpoints to measure performance impact
  2. Monitor database query rates in a staging environment
  3. Verify no regression in workflow execution times
  4. Test with concurrent workflow executions to ensure no race conditions

📊 Risk Assessment

  • Risk Level: Medium
  • Performance Risk: High (without mitigation)
  • Stability Risk: Low (fixes a real issue)
  • Security Risk: None identified

Final Recommendation

Approve with reservations - The fix addresses a legitimate issue, but the performance implications need careful consideration. I recommend:

  1. Adding performance monitoring before deployment
  2. Implementing request-scoped caching as a middle ground
  3. Documenting this architectural decision properly
  4. Planning a follow-up to implement workflow-safe caching

The code quality is good and the fix is valid, but we should be proactive about potential performance degradation.


Review based on AGENTS.md guidelines and best practices for the Julep codebase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants