
Support for Encrypted Intermediates in Aggregation Service: Feedback Requested #77

@preethiraghavan1

Description


Current System

Currently, the Aggregation Service aggregates contributions from raw encrypted reports and produces aggregated and noised histograms.
In scenarios where an adtech needs to batch and send the same reports repeatedly for aggregation (e.g., when querying over extended time ranges such as daily, weekly, and monthly windows with re-querying), recomputing the same underlying data multiple times is computationally expensive, slows query performance, and incurs unnecessary, avoidable costs.

Note that re-querying is in the design phase and is not yet available in the Aggregation Service.

Proposed Enhancement

Introduce the ability to cache the aggregated contributions from decrypted aggregatable reports as un-noised Encrypted Intermediate reports that can be reused across subsequent aggregation jobs. Encrypted Intermediates reduce computation, and therefore latency and cloud cost, for adtechs: each report is decrypted and aggregated only once, and the un-noised aggregation (the Encrypted Intermediate) is cached for future use. Subsequent jobs that include the cached Encrypted Intermediates for the same data only need to decrypt and aggregate the new, smaller set of reports, resulting in lower overall latency.
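The caching idea above can be sketched in a few lines. This is an illustrative sketch only: the real Aggregation Service runs inside a TEE, and the report format, the `aggregate` helper, and the use of a plain `Counter` as the "intermediate" are all hypothetical stand-ins.

```python
# Illustrative sketch only: report formats and helper names are hypothetical;
# in the real service the cached intermediate would be padded and encrypted.
from collections import Counter

def aggregate(reports):
    """Aggregate decrypted report contributions into an un-noised histogram."""
    histogram = Counter()
    for bucket, value in reports:
        histogram[bucket] += value
    return histogram

# First job: process the raw reports once and cache the un-noised result
# (this cached histogram plays the role of an Encrypted Intermediate).
day1_reports = [("campaign_1", 5), ("campaign_2", 3), ("campaign_1", 2)]
intermediate = aggregate(day1_reports)  # cached, never re-computed

# Later job: only the new reports are decrypted and aggregated,
# then merged with the cached intermediate.
day2_reports = [("campaign_1", 4)]
weekly = intermediate + aggregate(day2_reports)  # Counter supports '+'
```

The key point is that `day1_reports` is touched exactly once; every later query over the same window pays only for the new reports plus a cheap histogram merge.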

Motivating Use Cases

  • Re-querying & Extended Time Range Queries: For frequent reporting or analysis across large timeframes, for example, daily, weekly, monthly queries using the same data, repeatedly processing the same raw data leads to unnecessary computational overhead. Using Encrypted Intermediates can reduce the overhead, ultimately reducing the job latency and cost. (#732)

  • Filtering Contributions by Filtering IDs: for example, processing by campaign IDs to do Reach Measurement as proposed here:
    Each query with filtering IDs processes all the input reports, even when only a fraction of the data is actually relevant. This inherent recomputation is amplified with re-querying, which requires sending all reports for every query that involves filtering IDs over the same data, leading to a compounding amount of avoidable reprocessing. Using Encrypted Intermediates per filtering ID can avoid these recomputations.
    The following is an example user journey using filtering IDs. Consider an adtech with 200M reports that segments its data into 5 campaigns using contribution filtering with filtering IDs [1-5], and 100M output domain keys per filtering ID. Below is an example of how they would use Encrypted Intermediates to generate summary reports for each individual campaign and then across all 5 campaigns:

    1. Adtech runs an Encrypted Intermediate job to generate an Encrypted Intermediate (let’s call this EI-1) for filtering ID 1 (let’s call this fid-1)
    2. Adtech runs an aggregation service job for the same fid-1 to generate a summary report
    3. Adtech repeats steps 1 & 2 for fid-2 through fid-5
    4. To generate a summary report across all 5 campaigns, the adtech runs an aggregation job over all filtering IDs, using EI-1 through EI-5, to produce the final summary report covering all campaigns. For this example, using EIs to generate the final summary report gives an 8X improvement in performance/latency compared to re-querying without Encrypted Intermediates.
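The four steps above can be sketched as a small driver loop. This is a hypothetical sketch: the helper names (`run_intermediate_job`, `run_aggregation_job`), request fields, and storage URIs are illustrative only and do not reflect the actual Aggregation Service API.

```python
# Hypothetical sketch of the filtering-ID user journey; all names are invented.
def run_intermediate_job(reports_uri, filtering_id):
    """Step 1: produce an Encrypted Intermediate (EI) for one filtering ID."""
    return {"type": "encrypted_intermediate", "filtering_id": filtering_id,
            "uri": f"{reports_uri}/ei-{filtering_id}"}

def run_aggregation_job(inputs, output_domain_uri):
    """Steps 2 and 4: produce a noised summary report from the given inputs."""
    return {"type": "summary_report", "inputs": [i["uri"] for i in inputs],
            "domain": output_domain_uri}

raw_reports_uri = "s3://adtech-bucket/reports"
intermediates = []
for fid in range(1, 6):                       # steps 1-3: fid-1 through fid-5
    ei = run_intermediate_job(raw_reports_uri, fid)
    intermediates.append(ei)
    # per-campaign summary report from the single intermediate
    run_aggregation_job([ei], f"s3://adtech-bucket/domains/fid-{fid}")

# Step 4: cross-campaign summary from EI-1..EI-5 rather than all 200M raw reports
final = run_aggregation_job(intermediates, "s3://adtech-bucket/domains/all")
```

The final job consumes five already-aggregated intermediates instead of re-reading the full raw report set, which is where the claimed latency win comes from.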

The graph below illustrates how an adtech could benefit from Encrypted Intermediates in their workflow. Daily intermediate reports, built incrementally, feed into further intermediate reports for various cadences and segments. We expect that summary report generation leveraging these encrypted intermediates will lead to significant speed improvements compared to using raw reports directly.

Encrypted Intermediates timeline
Note: the durations are for illustration only; actual durations will vary with query size. In general, a query over intermediates should run faster than the same query over the full set of raw reports.

Design Considerations

  • Encrypted intermediate reports will also be histograms, but they will be stored un-noised and encrypted.
  • Encrypted intermediate reports will be padded to a fixed size, similar to raw reports, to prevent revealing the size of the contributions.
  • They will be written to the cloud location specified in the request, similar to summary reports.
  • These intermediates can contribute to further queries, both intermediate and final.
  • Aggregation report accounting will be applied to Encrypted Intermediate reports just as it is for raw reports.
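The fixed-size padding consideration above can be illustrated with a minimal sketch. The null-record representation and sizes here are assumptions; the real padding and encryption scheme will differ.

```python
# Illustrative only: padding an intermediate histogram to a fixed size so the
# output does not reveal the true number of contributions.
def pad_histogram(entries, padded_size, null_record=(0, 0)):
    """Pad a list of (bucket, value) entries with null records to a fixed size."""
    if len(entries) > padded_size:
        raise ValueError("histogram exceeds the fixed padded size")
    return entries + [null_record] * (padded_size - len(entries))

small = pad_histogram([(101, 7)], padded_size=4)
large = pad_histogram([(101, 7), (102, 3), (103, 1)], padded_size=4)
assert len(small) == len(large) == 4  # both look identical in size from outside
```

Because both padded outputs have the same length, an observer of the (encrypted) intermediate cannot tell how many real contributions it holds, mirroring the padding already applied to raw reports.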

Cost Considerations

Whether to use Encrypted Intermediates for a given use case depends on a cost-benefit analysis. Generating Encrypted Intermediates incurs processing, encryption, latency, and storage costs. If the savings in later processing outweigh these costs, using Encrypted Intermediates is recommended. The difference varies by use case: some queries will benefit from Encrypted Intermediates while others will not. Guidance will be provided to help adtechs make this decision.
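As a toy illustration of that cost-benefit analysis, the break-even point can be modeled with a single inequality. All cost figures below are made up; real costs depend on query sizes, cloud pricing, and storage duration.

```python
# Toy break-even model (all numbers hypothetical) for deciding whether
# generating an Encrypted Intermediate pays off for a re-querying workload.
def worth_generating_ei(raw_job_cost, ei_generation_cost, ei_job_cost, num_queries):
    """True if querying an intermediate num_queries times beats re-processing raw reports."""
    with_ei = ei_generation_cost + num_queries * ei_job_cost
    without_ei = num_queries * raw_job_cost
    return with_ei < without_ei

# e.g. a raw-report job costs 10 units, generating the EI costs 12,
# and each query over the EI costs 2:
assert not worth_generating_ei(10, 12, 2, num_queries=1)  # one-off query: not worth it
assert worth_generating_ei(10, 12, 2, num_queries=2)      # re-querying: pays off
```

The sketch shows the general shape of the decision: the up-front generation cost amortizes only when the same data is queried more than once, which is exactly the re-querying scenario this proposal targets.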

We believe this enhancement can help adtechs avoid repeated, costly computations while providing latency improvements and overall cost reduction for their jobs. We're interested in your feedback on this idea. In particular:

  1. What use cases do you think Encrypted Intermediates can be useful for?
  2. What kind of batching and report management assistance would be helpful when using this feature?
