Optimize queries -> only querying for the current page and created new aggregate fast queries for counts #910

Merged
robert-inkeep merged 13 commits into main from optimize-queries
Nov 6, 2025

Conversation

@shagun-singh-inkeep
Collaborator

No description provided.

@changeset-bot

changeset-bot bot commented Nov 6, 2025

🦋 Changeset detected

Latest commit: 4a83c17

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 9 packages
Name Type
@inkeep/agents-core Patch
@inkeep/agents-manage-ui Patch
@inkeep/agents-cli Patch
@inkeep/agents-manage-api Patch
@inkeep/agents-run-api Patch
@inkeep/agents-sdk Patch
@inkeep/ai-sdk-provider Patch
@inkeep/create-agents Patch
test-agents Patch

@vercel

vercel bot commented Nov 6, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Preview | Comments | Updated (UTC)
agents-manage-api | Ready | Preview | Comment | Nov 6, 2025 10:22pm
agents-manage-ui | Ready | Preview | Comment | Nov 6, 2025 10:22pm
agents-run-api | Ready | Preview | Comment | Nov 6, 2025 10:22pm
1 Skipped Deployment
Project | Deployment | Preview | Comments | Updated (UTC)
agents-docs | Skipped | | | Nov 6, 2025 10:22pm

@shagun-singh-inkeep shagun-singh-inkeep marked this pull request as ready for review November 6, 2025 19:03
@claude
Contributor

claude bot commented Nov 6, 2025

Claude finished @shagun-singh-inkeep's task


Code Review Complete ✅

I've conducted a comprehensive review of the query optimization changes. Here's my detailed analysis and feedback:

Todo List:

  • Read and understand all changed files
  • Analyze system architecture and design decisions
  • Review query optimization strategy and implementation
  • Check for code quality, security, and performance issues
  • Evaluate consistency with existing patterns
  • Assess customer impact and API design
  • Provide actionable feedback and recommendations

🏗️ System Design & Architecture

✅ Positive Aspects:

  • Smart Optimization Strategy: The two-phase approach (fetch conversation IDs first, then detailed stats) is architecturally sound for large datasets
  • Backward Compatibility: Maintains existing non-paginated API surface while adding pagination
  • Clear Separation of Concerns: Pagination logic is well-isolated from business logic

🔧 Areas for Improvement:

1. API Design Consistency (High Priority)

Issue: Mixed return types create confusing API contracts

// Current inconsistent API
async getConversationStats(): Promise<ConversationStats[] | PaginatedConversationStats>

Recommendation: Use a discriminated union or separate methods:

async getConversationStats(): Promise<ConversationStats[]>
async getConversationStatsPaginated(): Promise<PaginatedConversationStats>

Or use an options-based approach:

interface GetConversationStatsOptions {
  pagination?: { page: number; limit: number }
  // ... other options
}
async getConversationStats(options: GetConversationStatsOptions): Promise<{
  data: ConversationStats[]
  pagination?: PaginationInfo
}>
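
The discriminated-union option is named above but not shown. A minimal sketch of what it could look like follows. The `kind` tag, the `ConversationStatsResult` name, and the field shapes of `ConversationStats` and `PaginationInfo` are assumptions for illustration; the PR's actual types do not appear in this thread.

```typescript
// Hypothetical shapes; the real ConversationStats and PaginationInfo are defined in the PR.
interface ConversationStats {
  conversationId: string;
  totalToolCalls: number;
}

interface PaginationInfo {
  page: number;
  limit: number;
  total: number;
  totalPages: number;
}

// `kind` is the discriminant: a caller has to check it before reading
// `pagination`, so the compiler enforces the contract instead of a comment.
type ConversationStatsResult =
  | { kind: "all"; data: ConversationStats[] }
  | { kind: "page"; data: ConversationStats[]; pagination: PaginationInfo };

function describeResult(result: ConversationStatsResult): string {
  switch (result.kind) {
    case "all":
      return `${result.data.length} conversations (unpaginated)`;
    case "page":
      // `pagination` is only reachable in this branch.
      return `page ${result.pagination.page} of ${result.pagination.totalPages}`;
  }
}
```

Compared with `ConversationStats[] | PaginatedConversationStats`, callers narrow on one string field rather than on `Array.isArray` checks.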

2. N+1 Query Pattern Risk (Critical)

Issue: The getPaginatedConversationIds method can make several sequential queries (a fixed chain of round trips rather than a per-row N+1 in the strict sense):

  • First: Get activity data for sorting
  • Second: Apply span filters (if any)
  • Third: Get metadata for search filtering (if searchQuery)
  • Fourth: Get user messages for search (if searchQuery)
  • Fifth: Final detailed stats query

Impact: This could result in 4-5 sequential database queries where a single optimized query might suffice.

Recommendation: Consider query consolidation strategies:

// Consolidate initial filtering into a single query
private async getFilteredConversationIds(): Promise<{
  conversationIds: string[]
  total: number
}> {
  // Single query that handles activity, filters, and search in one go
}

3. Memory Efficiency Concerns (Medium Priority)

Issue: Still fetches ALL conversation IDs before pagination in getPaginatedConversationIds

conversationIds = Array.from(activityMap.keys()) // All IDs loaded
// ... apply filters in memory
// ... then slice for pagination
const paginatedIds = conversationIds.slice(start, start + pagination.limit)
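
For reference, the in-memory slice above amounts to the arithmetic sketched here. `paginateInMemory`, `PageRequest`, and `PageResult` are hypothetical names, and 1-based page numbering is an assumption. The sketch also makes the cost visible: the full array has to exist before any page can be cut from it.

```typescript
interface PageRequest {
  page: number; // 1-based (assumption)
  limit: number;
}

interface PageResult<T> {
  items: T[];
  total: number;
  totalPages: number;
  hasNextPage: boolean;
}

// Cuts one page out of an array that is already fully loaded in memory.
function paginateInMemory<T>(all: readonly T[], request: PageRequest): PageResult<T> {
  const { page, limit } = request;
  if (!Number.isInteger(page) || !Number.isInteger(limit) || page < 1 || limit < 1) {
    throw new RangeError("page and limit must be positive integers");
  }
  const total = all.length;
  const totalPages = Math.ceil(total / limit);
  const start = (page - 1) * limit;
  return {
    items: all.slice(start, start + limit), // slice clamps at the end of the array
    total,
    totalPages,
    hasNextPage: page < totalPages,
  };
}
```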

Recommendation: Push pagination to the database level:

// Add LIMIT/OFFSET to the initial activity query itself
const activityPayload = this.buildConversationActivityPayload(
  startTime, endTime, agentId, projectId, 
  { limit: totalPagesNeeded * pageSize } // Pre-calculate reasonable limit
)

🔍 Code Quality & Engineering

4. Error Handling & Observability (Medium Priority)

Issues:

  • Silent fallbacks to empty arrays mask potential issues
  • Inconsistent error logging across methods
  • Missing performance metrics for the optimization

Recommendations:

// Add performance tracking
const timerLabel = `getPaginatedConversationIds-${pagination.page}`
console.time(timerLabel)
try {
  const result = await this.getPaginatedConversationIds(...)
} catch (error) {
  // More granular error handling
  console.error('getPaginatedConversationIds failed:', {
    error,
    page: pagination.page,
    hasSpanFilters: !!(filters?.spanName || filters?.attributes?.length),
    hasSearchQuery: !!searchQuery
  })
  throw error // Don't silently swallow
} finally {
  console.timeEnd(timerLabel) // Stop the timer on success and on failure
}

5. Type Safety & Documentation (Medium Priority)

Issues:

  • Complex pagination logic lacks comprehensive JSDoc
  • Type definitions could be more granular for pagination states

Recommendations:

/**
 * Optimized pagination strategy for conversation stats
 * 
 * Performance characteristics:
 * - Phase 1: Fetch conversation IDs (O(n) where n = total conversations)  
 * - Phase 2: Fetch detailed stats (O(k) where k = page size)
 * - Total queries: 2-5 depending on filters and search
 * 
 * @param startTime - Unix timestamp in milliseconds
 * @param pagination - Page and limit for results
 * @returns Paginated conversation stats with metadata
 */
private async getConversationStatsPaginated(...)

6. Consistency with Existing Patterns (Low Priority)

Issue: Query building patterns are consistent, but console logging style varies

// Some places use structured logging
console.log('📊 Pagination Stats:', { ... })
// Others use simple strings  
console.error('getAggregateStats error:', e);

Recommendation: Standardize logging approach across the file.

🚀 Performance & Scalability

7. Query Optimization Effectiveness

Positive: The optimization will significantly help with large result sets by avoiding full dataset transfers.

Concern: For small datasets, the multi-query approach may actually be slower than the original single-query method.

Recommendation: Consider a hybrid approach:

// Use simple query for small datasets, optimized pagination for large ones
if (estimatedResultSize < SIMPLE_QUERY_THRESHOLD) {
  return await this.getConversationStatsSimple(...)
}
return await this.getConversationStatsPaginated(...)

What's Done Well

  1. Excellent Backward Compatibility: Existing consumers continue to work unchanged
  2. Clean Hook Integration: The React hook properly abstracts pagination complexity
  3. Comprehensive Query Building: SigNoz query construction follows established patterns
  4. Good Separation of Concerns: UI components remain focused on presentation
  5. Type Safety: Strong TypeScript usage throughout the implementation

🎯 Recommended Priority Order

  1. High: Address API design consistency (#1)
  2. Critical: Investigate and optimize N+1 query pattern (#2)
  3. Medium: Implement true database-level pagination (#3)
  4. Medium: Enhance error handling and observability (#4)
  5. Low: Documentation and logging consistency (#5, #6)

📊 Impact Assessment

Customer Impact: Positive - significantly improved performance for large datasets with no breaking changes
System Impact: Moderate complexity increase, but contained within API layer
Maintainability: Good - clear patterns that other developers can follow

The optimization strategy is sound and addresses a real performance need. With the suggested improvements, particularly around API consistency and query efficiency, this will be a robust solution.


@vercel vercel bot temporarily deployed to Preview – agents-docs November 6, 2025 19:57 Inactive
@shagun-singh-inkeep
Collaborator Author

@claude review again

@claude
Contributor

claude bot commented Nov 6, 2025

Claude finished @shagun-singh-inkeep's task


Code Review Complete ✅

I've conducted a fresh review of the query optimization implementation. Here's my detailed analysis:

Todo List:

  • Review the current optimization implementation and architecture
  • Analyze performance implications and query patterns
  • Review code quality and consistency
  • Check for edge cases and potential issues
  • Provide comprehensive feedback and recommendations

🎯 Summary

The PR successfully implements a two-phase query optimization to improve performance when dealing with large datasets. The approach fetches conversation IDs first, then retrieves detailed stats only for the current page. This is a solid optimization strategy that addresses real performance concerns.

What's Working Well

1. Smart Query Optimization

The two-phase approach is well-executed:

  • Phase 1: Fetch all conversation IDs with minimal data for sorting/filtering
  • Phase 2: Fetch detailed stats only for paginated subset

This reduces data transfer and processing overhead significantly for large datasets.
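
The two phases can be sketched with injected fetchers, as below. Every name here (`ActivityRow`, `ConversationDetail`, `selectPageIds`, `getStatsPage`, and the two fetcher types) is hypothetical; the PR's real SigNoz queries are not reproduced in this thread, so this only illustrates the control flow under that assumption.

```typescript
// Hypothetical row shapes standing in for the PR's real query results.
interface ActivityRow {
  conversationId: string;
  lastActivity: number; // Unix timestamp in milliseconds
}

interface ConversationDetail {
  conversationId: string;
  toolCalls: number;
}

type FetchActivity = () => Promise<ActivityRow[]>;
type FetchDetails = (ids: string[]) => Promise<ConversationDetail[]>;

// Orders by most recent activity, then keeps one page of IDs (page is 1-based).
function selectPageIds(rows: readonly ActivityRow[], page: number, limit: number): string[] {
  const start = (page - 1) * limit;
  return [...rows]
    .sort((a, b) => b.lastActivity - a.lastActivity)
    .slice(start, start + limit)
    .map((row) => row.conversationId);
}

async function getStatsPage(
  fetchActivity: FetchActivity,
  fetchDetails: FetchDetails,
  page: number,
  limit: number,
): Promise<{ data: ConversationDetail[]; total: number }> {
  // Phase 1: lightweight rows for every conversation in the time range.
  const rows = await fetchActivity();
  const ids = selectPageIds(rows, page, limit);
  // Phase 2: detailed stats for the current page only; skip the call for an empty page.
  const details = ids.length > 0 ? await fetchDetails(ids) : [];
  // The detail query may return rows in any order, so restore the page order.
  const byId = new Map<string, ConversationDetail>();
  for (const detail of details) byId.set(detail.conversationId, detail);
  const data: ConversationDetail[] = [];
  for (const id of ids) {
    const detail = byId.get(id);
    if (detail !== undefined) data.push(detail);
  }
  return { data, total: rows.length };
}
```

Passing the fetchers as parameters keeps the sketch testable without a SigNoz instance; the real code presumably calls its query builders directly.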

2. Excellent Backward Compatibility

The implementation maintains the existing API surface while adding pagination capabilities seamlessly through the PaginatedConversationStats wrapper.

3. Clean Hook Abstraction

The React hooks (useConversationStats and useAggregateStats) properly encapsulate the complexity:

// Clean separation of concerns
const { stats, loading, error, refresh, pagination } = useConversationStats({
  // ... options
  pagination: { pageSize: 10 }
});

4. Proper Type Safety

Strong TypeScript usage throughout with well-defined interfaces for pagination metadata.

🔧 Areas for Improvement

1. Query Consolidation Opportunity (High Priority)

Current Implementation: The buildFilteredConversationIdsPayload method creates a consolidated query that fetches activity, metadata, and search data in one request, which is good. However, there's still room for improvement.

Issue: When search is active, the method fetches ALL user messages just to find the first one:

// Line 402-409 in signoz-stats.ts
for (const s of userMessagesSeries) {
  const id = s.labels?.[SPAN_KEYS.CONVERSATION_ID];
  const content = s.labels?.[SPAN_KEYS.MESSAGE_CONTENT];
  if (!id || !content) continue;
  if (!firstMessagesMap.has(id)) {
    firstMessagesMap.set(id, content); // Only keeps first, but fetches all
  }
}

Recommendation: Add a query specifically for fetching only the first user message per conversation:

// Add to buildFilteredConversationIdsPayload when includeSearchData is true
firstUserMessage: {
  aggregateOperator: AGGREGATE_OPERATORS.MIN, // Get earliest
  aggregateAttribute: {
    key: SPAN_KEYS.TIMESTAMP,
    ...QUERY_FIELD_CONFIGS.INT64_TAG_COLUMN,
  },
  groupBy: [
    { key: SPAN_KEYS.CONVERSATION_ID, ...QUERY_FIELD_CONFIGS.STRING_TAG },
  ],
  limit: 1, // Only need first message per conversation
  // ... rest of query config
}
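
What the MIN-by-timestamp grouping is meant to compute can be stated in plain TypeScript over an in-memory array. This is a sketch of the intended result, not of the SigNoz query itself; `UserMessage` and `firstMessagePerConversation` are hypothetical names. Unlike the "keep the first one seen" loop quoted above, it compares timestamps explicitly, because the thread does not say whether the series arrive ordered by time.

```typescript
// Hypothetical message shape for illustration.
interface UserMessage {
  conversationId: string;
  timestamp: number; // Unix timestamp in milliseconds
  content: string;
}

// Returns the content of the earliest user message for each conversation.
function firstMessagePerConversation(messages: readonly UserMessage[]): Map<string, string> {
  const earliest = new Map<string, UserMessage>();
  for (const message of messages) {
    const seen = earliest.get(message.conversationId);
    // Replace only when this message is strictly older than the one kept so far.
    if (seen === undefined || message.timestamp < seen.timestamp) {
      earliest.set(message.conversationId, message);
    }
  }
  const contents = new Map<string, string>();
  earliest.forEach((message, id) => contents.set(id, message.content));
  return contents;
}
```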

2. Memory Usage with Large Result Sets (Medium Priority)

Issue: The method still loads ALL conversation IDs into memory before pagination (line 373):

let conversationIds = Array.from(activityMap.keys());

For very large datasets (e.g., 100k+ conversations), this could cause memory pressure.

Recommendation: Consider adding a reasonable upper limit to prevent memory issues:

const MAX_CONVERSATIONS = 50000; // Reasonable upper bound
// In buildFilteredConversationIdsPayload, add to lastActivity query:
limit: MAX_CONVERSATIONS,
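
One caveat with a hard cap is that totals become lower bounds once it is hit. A small sketch of surfacing that to the caller follows. The 50000 value comes from the recommendation above; `capConversationIds` and the `truncated` flag are hypothetical, and the cap is applied client-side here only for illustration.

```typescript
const MAX_CONVERSATIONS = 50000; // value taken from the recommendation above

interface CappedIds {
  ids: string[];
  truncated: boolean; // true when the cap cut off at least one ID
}

// Keeps at most `max` IDs and reports whether anything was dropped,
// so the UI can label counts as "at least N" instead of exact.
function capConversationIds(ids: readonly string[], max: number = MAX_CONVERSATIONS): CappedIds {
  return { ids: ids.slice(0, max), truncated: ids.length > max };
}
```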

3. Edge Case: Search During Pagination (Low Priority)

Current Behavior: When search is active, pagination controls are hidden (line 183 in conversation-stats-card.tsx):

{pagination && pagination.totalPages > 1 && !searchQuery && (

This makes sense for client-side search, but since search is now server-side, pagination could be preserved.

Recommendation: Consider enabling pagination for search results if the implementation supports it.

4. Logging Improvements (Low Priority)

The console.log statements are helpful for debugging but should be removed or converted to a proper logging system before production:

console.log('📊 Pagination Stats:', { ... }); // Line 218
console.log('📄 getPaginatedConversationIds (optimized):', { ... }); // Line 435

Consider using a logging utility with configurable levels.
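
A minimal sketch of such a utility, assuming nothing about the logging library the project may already use. `createLogger`, `LogLevel`, and `LogSink` are hypothetical names; the default sink forwards to the matching `console` method.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";
type LogContext = Record<string, unknown>;
type LogSink = (level: LogLevel, message: string, context?: LogContext) => void;

const LEVEL_RANK: Record<LogLevel, number> = { debug: 0, info: 1, warn: 2, error: 3 };

// Default sink: forward to console.debug / console.info / console.warn / console.error.
const consoleSink: LogSink = (level, message, context) => {
  if (context === undefined) console[level](message);
  else console[level](message, context);
};

interface Logger {
  debug(message: string, context?: LogContext): void;
  info(message: string, context?: LogContext): void;
  warn(message: string, context?: LogContext): void;
  error(message: string, context?: LogContext): void;
}

// Messages below `minLevel` are dropped before they reach the sink.
function createLogger(minLevel: LogLevel, sink: LogSink = consoleSink): Logger {
  const emit = (level: LogLevel) => (message: string, context?: LogContext): void => {
    if (LEVEL_RANK[level] >= LEVEL_RANK[minLevel]) sink(level, message, context);
  };
  return { debug: emit("debug"), info: emit("info"), warn: emit("warn"), error: emit("error") };
}
```

With this shape, the pagination diagnostics could be logged at `debug` and left in the code, while a production build creates the logger with `"warn"` so they stay silent.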

🚀 Performance Analysis

Query Efficiency

  • Before: Single query fetching all conversation details (potentially MBs of data)
  • After: Two queries - IDs first (KBs), then details for current page only (controlled size)
  • Impact: Significant improvement for datasets > 100 conversations

Network Transfer

  • Page Size 10: ~90% reduction in detailed data transfer
  • Page Size 50: ~80% reduction in detailed data transfer
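
The review does not state the dataset size behind these percentages. Assuming the detailed payload scales linearly with the number of conversations fetched, the reduction for page size k out of N conversations is 1 - k/N, which the hypothetical helper below computes. Under that assumption, ~90% at page size 10 corresponds to roughly 100 conversations and ~80% at page size 50 to roughly 250; larger datasets would see larger reductions.

```typescript
// Fraction of detailed rows no longer transferred when only one page is fetched.
// Assumes payload size is proportional to the number of conversations (an assumption).
function detailTransferReduction(totalConversations: number, pageSize: number): number {
  if (totalConversations <= 0 || pageSize <= 0) {
    throw new RangeError("totalConversations and pageSize must be positive");
  }
  // A page can never contain more rows than exist.
  const fetched = Math.min(pageSize, totalConversations);
  return 1 - fetched / totalConversations;
}
```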

Time Complexity

  • Sorting: O(n log n) where n = total conversations
  • Pagination: O(k) where k = page size
  • Overall: Acceptable trade-off for better user experience

🏆 Final Assessment

This is a well-implemented optimization that addresses a real performance bottleneck. The code is clean, type-safe, and maintains backward compatibility. The two-phase query approach is a proven pattern for handling large datasets with pagination.

Recommended Next Steps:

  1. Address the query consolidation for user messages (High priority - quick win)
  2. Add memory safeguards for extremely large datasets (Medium priority)
  3. Clean up console.log statements before production (Low priority)
  4. Consider monitoring query performance in production to validate the optimization

The optimization successfully achieves its goal of improving performance for paginated views while maintaining the existing functionality. Great work!


Overall Grade: B+ - Solid implementation with minor room for improvement. The optimization strategy is sound and will provide real benefits to users dealing with large datasets.

@vercel vercel bot temporarily deployed to Preview – agents-docs November 6, 2025 21:02 Inactive
@shagun-singh-inkeep shagun-singh-inkeep changed the title from "Optimize queries -> only querying for the current page" to "Optimize queries -> only querying for the current page and created new aggregate fast queries for counts" on Nov 6, 2025
@robert-inkeep robert-inkeep merged commit 02d6839 into main Nov 6, 2025
9 checks passed
@robert-inkeep robert-inkeep deleted the optimize-queries branch November 6, 2025 22:49

2 participants