Site Groups / First Party Sets v2 #22

@pbannist

Site Groups

This document proposes a new web platform mechanism to declare a collection of related domains as being in a Site Group. This is an evolution of the First-Party Sets proposal to accommodate several changes:

  • Removal of much language around “First-Party” as it has many historical connotations/denotations that may be less relevant or confusing in the future.

  • Renaming of the standard to “Site Groups”: this not only removes the “First-Party” confusion, but is also a more straightforward name that may even be usable in communication to users.

  • Proposes modifications to existing standard browser UA policies to remove “single organization control” as a requirement of Site Groups due to:

      1. Lack of public information that documents corporate/organizational ownership, and any clear way of defining a policy that can be fairly applied
      2. Inability of browsers to police organizational ownership
      3. Bias of this requirement towards large companies over small
      4. Organizational ownership not being discernible to the user, nor offering the user any comfort that their data would be used in a specific way

  • Specifically empowers the “owner” site with the incremental cross-site functionality, and disempowers “secondary” sites from having cross-site capabilities. This should allow for all required functionality across sites while minimally increasing the availability of user data beyond the origin.

  • Adds a requirement of a shared privacy policy, and a human-readable site group name across all domains in a Site Group.

Most of the language in this proposal is taken directly from the First-Party Sets proposal. A significant amount of privacy-specific language was removed because this proposal makes no changes to it. That is, I tried to remove any parts of the FPS proposal that were not modified in any way and/or were not relevant to expressing the changes in this version. If Site Groups is deemed to be a useful extension of FPS, those elements can be reintegrated.

Thanks to the editors/writers of the First-Party Sets proposal:
Kaustubha Govind, Google
David Benjamin, Google

Introduction

Browsers have proposed a variety of tracking policies and privacy models which scope access to user identity to some notion of “first-party”. From the user’s perspective, first-party has typically meant a singular domain, but this limits how sites can provide services to the user. Site Groups aims to increase the ability of sites to provide valuable services to their users by widening the privacy boundary to include affiliated sites, while minimally impacting the user’s privacy. In redefining this scope, we must balance two goals: the scope should be small enough to meet the user's privacy expectations, yet large enough to provide the user's desired functionality on the site they are interacting with.

One natural scope is the domain name in the top-level origin. However, the website the user is interacting with may be deployed across multiple domain names. For example, https://google.com, https://google.co.uk, and https://youtube.com are owned by the same entity, as are https://apple.com and https://icloud.com, or https://amazon.com and https://amazon.de.
We may wish to allow user identity to span related origins, where consistent with privacy requirements. For example, Firefox ships an entity list that defines lists of domains belonging to the same organization. This explainer discusses a mechanism to allow organizations to each declare their own list of domains, which is then accepted by a browser if the set conforms to its policy.

Goals

  • Allow related domain names to declare themselves as within a Site Group.
  • Define a framework for browser policy on which declared names will be treated as the same site in privacy mechanisms.
  • Minimally increase the availability of user data to reduce potential privacy issues.

Non-goals

  • Third-party sign-in between unrelated sites.
  • Information exchange between unrelated sites for ad targeting or conversion measurement.
  • Other use cases which involve unrelated sites.

Declaring a Site Group

A site group is identified by one owner registered domain and a list of secondary registered domains. (See alternative designs for a discussion of origins vs registered domains.)

An origin is in the site group if:

  • Its scheme is https; and
  • Its registered domain is either the owner or is one of the secondary domains.
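
As an illustration of the two membership conditions above, here is a minimal sketch. The function names are hypothetical, and `naive_registered_domain` is a stand-in that simply keeps the last two labels of the host; a real browser would derive the registered domain from the Public Suffix List.

```python
from urllib.parse import urlsplit


def naive_registered_domain(host: str) -> str:
    """Assumption: keep the last two labels. Real browsers use the Public Suffix List."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def origin_in_site_group(origin: str, owner: str, members: list[str]) -> bool:
    """Return True if the origin is https and its registered domain is the owner or a member."""
    parts = urlsplit(origin)
    if parts.scheme != "https" or parts.hostname is None:
        return False
    domain = naive_registered_domain(parts.hostname)
    return domain == owner or domain in members
```

For example, `origin_in_site_group("https://www.b.example", "a.example", ["b.example", "c.example"])` is true, while the same host over `http` is not.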

The browser will consider domains to be members of a set if the domains opt in and the set meets UA policy, to incorporate both user and site needs. Domains opt in by hosting a JSON manifest at the path /.well-known/site-group on their own https origin (for example, https://a.example/.well-known/site-group). The secondary domains point to the owning domain, while the owning domain lists the members of the set, a version number to trigger updates, and a set of signed assertions to inform UA policy (details below).

Suppose a.example, b.example, and c.example wish to form a site group, owned by a.example. The sites would then serve the following resources:
https://a.example/.well-known/site-group
{
  "owner": "a.example",
  "version": 1,
  "privacy-policy": "a.example/privacy-policy.html",
  "sg-name": "Human readable name of this site group",
  "members": ["b.example", "c.example"],
  "assertions": {
    "chrome-sg-v1": "",
    "firefox-sg-v1": "",
    "safari-sg-v1": ""
  }
}

https://b.example/.well-known/site-group
{ "owner": "a.example" }

https://c.example/.well-known/site-group
{ "owner": "a.example" }

The browser then imposes additional constraints on the owner's manifest:

  • Entries in members that are not registrable domains are ignored.
  • Only entries in members that meet UA policy will be accepted. The others will be ignored. If the owner is not covered by UA policy, the entire set is rejected.
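
The manifest checks above could be sketched roughly as follows. This is an illustration, not a specified algorithm: `accepted_members`, `looks_registrable`, and the `meets_ua_policy` callback are hypothetical names, and `looks_registrable` is a crude placeholder for a real Public Suffix List lookup.

```python
def looks_registrable(name: object) -> bool:
    """Placeholder (assumption): exactly two hostname-like labels, no scheme or path."""
    if not isinstance(name, str):
        return False
    labels = name.split(".")
    return len(labels) == 2 and all(
        label.replace("-", "").isalnum() for label in labels
    )


def accepted_members(owner_manifest, secondary_manifests, meets_ua_policy):
    """Return the accepted secondary domains, or None if the whole set is rejected.

    secondary_manifests maps each secondary domain to its fetched manifest.
    meets_ua_policy is a callable standing in for the browser's UA policy.
    """
    owner = owner_manifest.get("owner")
    if not owner or not meets_ua_policy(owner):
        return None  # owner not covered by UA policy: reject the entire set
    accepted = []
    for member in owner_manifest.get("members", []):
        if not looks_registrable(member):
            continue  # not a registrable domain: ignored
        if not meets_ua_policy(member):
            continue  # fails UA policy: ignored
        if secondary_manifests.get(member, {}).get("owner") != owner:
            continue  # the secondary domain has not opted in
        accepted.append(member)
    return accepted
```

Returning `None` for a rejected set keeps that case distinct from an accepted owner whose members were all filtered out.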

Owner Privileges

The owner domain of a given site group has special privileges within that site group. It can read and write "first party" data stores (first party cookies, LocalStorage, Storage Access API, etc.) within the browser when the browser origin is set to a domain within its site group. More plainly:

  • The owner domain has access to read/write all data from across the site group
  • Each secondary domain only has access to read/write data from its own domain (same as currently implemented)

This would generally require sites in a site group to include resources (iframes, scripts, etc.) from the owner domain in order to build applications that use the site group, but this seems like a minimal issue. It also lets sites in a site group avoid calling the owner domain when they want to lower any privacy/security risks for a request.
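
The asymmetry between owner and secondary domains can be expressed as a small access check. This is a sketch only; `can_access_storage` and its parameters are hypothetical names, and domains are compared as plain registered-domain strings.

```python
def can_access_storage(requesting_domain: str, storage_domain: str,
                       owner: str, members: list[str]) -> bool:
    """Decide whether requesting_domain may read/write storage_domain's data."""
    if requesting_domain == storage_domain:
        return True  # every domain keeps access to its own data, as today
    group = {owner, *members}
    # Only the owner gains cross-site access, and only within its own group.
    return requesting_domain == owner and storage_domain in group
```

With owner a.example and members b.example and c.example, a.example can reach data on b.example, but b.example cannot reach data on a.example or c.example.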

Discovering Site Groups

By default, every registrable domain is implicitly owned by itself. The browser discovers site groups as it makes network requests and stores the site group owner for each domain. On a top-level navigation, a website may send a Sec-Site-Group response header to inform the browser of its site group owner. For example, https://b.example/some/page may send the following header:
Sec-Site-Group: owner="a.example", minVersion=1

If this header does not match the browser's current information for b.example (either the owner does not match, or its saved site group manifest is too old), the browser pauses navigation to fetch the two manifest resources. Here, it would fetch https://a.example/.well-known/site-group and https://b.example/.well-known/site-group.
These requests must be uncredentialed and with suitably partitioned network caches to not leak cross-site information. In particular, the fetch must not share caches with browsing activity under a.example. See also discussion on cross-site tracking vectors.
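
A rough sketch of the header check described above. The parser is deliberately simplified (comma-separated name=value pairs, optional double quotes) and is an assumption about the header syntax based only on the example shown; the function names and the shape of the browser's stored state are hypothetical.

```python
def parse_sec_site_group(value: str) -> dict:
    """Parse a value such as: owner="a.example", minVersion=1 (simplified)."""
    params = {}
    for part in value.split(","):
        name, sep, raw = part.strip().partition("=")
        if not sep:
            continue  # skip anything that is not a name=value pair
        params[name.strip()] = raw.strip().strip('"')
    return {
        "owner": params.get("owner"),
        "min_version": int(params.get("minVersion", "0")),
    }


def needs_manifest_fetch(header: dict, stored_owner: str, stored_version) -> bool:
    """True if the owner differs or the saved manifest is missing or too old."""
    if header["owner"] != stored_owner:
        return True
    return stored_version is None or stored_version < header["min_version"]
```

Since every domain is implicitly owned by itself by default, the first visit to b.example would see a stored owner of b.example, a mismatch with a.example, and so trigger the two manifest fetches.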

If the manifests show the domain is in the set, the browser records a.example as the owner of b.example (but not c.example) in its site-group storage. It evicts all domains currently recorded as owned by a.example that no longer match the new manifest. Then it clears all state for domains whose owners changed, including reloading all active documents. This should behave like Clear-Site-Data: "*". This is needed to unlink any site identities that should no longer be linked. Note this also means that execution contexts (documents, workers, etc.) are scoped to a particular site group throughout their lifetime. If the group owner changes, existing ones are destroyed.
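
The storage update described above might look like the following sketch. The `owners` dictionary (domain to recorded owner) and the function name are assumptions made for illustration; domains missing from the dictionary are treated as implicitly owned by themselves.

```python
def apply_manifest(owners: dict, visited: str, owner: str, members: list) -> set:
    """Update the domain -> owner map and return the domains whose state must be cleared."""
    changed = set()
    # Evict domains recorded under this owner that the new manifest no longer lists.
    for domain, recorded in list(owners.items()):
        if recorded == owner and domain not in members:
            del owners[domain]  # falls back to implicit self-ownership
            changed.add(domain)
    # Record the owner only for the domain being visited, not for every member.
    if visited in members and owners.get(visited, visited) != owner:
        owners[visited] = owner
        changed.add(visited)
    return changed
```

The caller would then clear state and reload active documents for each returned domain. Applying the same manifest a second time returns an empty set, so no state is cleared unnecessarily.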

The browser then retries the request (state has since been cleared) and completes navigation. As retrying POSTs is undesirable, we should ignore the Sec-Site-Group header directives on POST navigations. Sites that require a site group to be picked up on POST navigations should perform a redirect (as is already common), and have the Sec-Site-Group directive apply on the redirect.
Subresource requests and subframe navigations are simpler as they cannot introduce a new first-party/site group context. If the request matches the origin URL's owner's manifest but is not currently recorded as being in that site group, the browser validates membership as above before making the request. Any Sec-Site-Group headers are ignored and, in particular, the browser should never read or write state for a site-group other than the current one. This simpler process also avoids questions of retrying requests. The minVersion parameter in the header ensures that the browser's view of the owner's manifest is up-to-date enough for this logic.

Design details

UA Policy

Defining acceptable sets

We should have some notion of what sets are acceptable or unacceptable. For instance, a set containing the entire web seems clearly unacceptable. Conversely, a set containing https://acme-corp-landing-page.example and https://acme-corp-online-store.example seems reasonable. There is a wide spectrum between these two scenarios. We should define where to draw the line.

Browsers implementing Site Groups will specify UA policy for which domains may be in the same set. While not required, it is desirable to have some consistency across UA policies. For a set of guiding principles in defining UA policy, we can look to how the various browser proposals describe first parties (emphasis added):

  • A Potential Privacy Model for the Web (Chromium Privacy Sandbox): "The notion of "First Party" may expand beyond eTLD+1, e.g. as proposed in First Party Sets. It is reasonable for the browser to relax its identity-sharing controls within that expanded notion, provided that the resulting identity scope is not too large and can be understood by the user."

  • Edge Tracking Protection Preview: "Not all organizations do business on the internet using just one domain name. In order to help keep sites working smoothly, we group domains owned and operated by the same organization together."

  • Mozilla Anti-Tracking Policy: "A first party is a resource or a set of resources on the web operated by the same organization, which is both easily discoverable by the user and with which the user intends to interact."

  • WebKit Tracking Prevention Policy: "A first party is a website that a user is intentionally and knowingly visiting, as displayed by the URL field of the browser, and the set of resources on the web operated by the same organization." and, under "Unintended Impact", "Single sign-on to multiple websites controlled by the same organization."

UA policies are at the discretion of each browser, but since this proposal does require UA policies to be in alignment, making any required adjustments to those policies is important. Specifically, the requirement of ownership by a single organization has a variety of problems, which are laid out in the following issues:

WICG/first-party-sets#18
WICG/first-party-sets#17
WICG/first-party-sets#14

While it is obviously up to each browser vendor to decide on its own UA policy, removing this requirement does not seem to have a negative impact on overall privacy considerations. Two ways to mitigate the removal of this requirement are:

  • Shared and declared privacy policy across all domains
  • Removal of cross-site privileges from all domains except the owner domain

Additionally, the robust, enforceable requirements of the First-Party Sets proposal remain:

  • Signed assertions by a trusted verification entity
  • Sites being able to join only a single site group

Given the UA policy, policy decisions must be delivered to the user’s browser. This can use either static lists or signed assertions. Note that site group membership requires being listed in the manifest in addition to meeting UA policy. This allows sites to quickly remove domains from their site group.

Shared Privacy Policy

More so than organizational ownership, a shared privacy policy across all domains in a site group can give a user comfort that their data is being used in a consistent and understandable way within the site group. Additionally, including the shared privacy policy within the JSON declaration would allow browser UI elements to link directly to the policy for users to inspect.

Parties that were interested in verifying that a site group was "well-behaved" could easily validate that all member sites did indeed adhere to the shared privacy policy, and report violations to browsers directly or to any entities managing signed assertions for site groups (see below).

A given domain within a group could implement a stricter privacy policy than the site group-shared policy, but could not relax any of the policies from a sharing/transfer/usage perspective. This could also be validated by interested parties and external assertion-management entities.

This idea could be extended to give users direct links to "forget me" from the site group or similar functionality within the browser.

Static lists

The browser vendor could maintain a list of domains which meet its UA policy, and ship it in the browser. This is analogous to the list of domains owned by the same entity used by Edge and Firefox to control cross-site tracking mitigations.

A browser using such a list would then intersect site group manifests with the list. It would ignore the assertions field in the manifest. Note fetching the manifest is still necessary to ensure the site opts into being in the set. This avoids problems if, say, a domain was transferred to another entity and the static list is out of date.
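
A minimal sketch of that intersection. The shape of the static list (a mapping from owner domain to the set of secondary domains the vendor has approved for it) is an assumption for illustration, as is the function name.

```python
def intersect_with_static_list(manifest: dict, static_list: dict) -> list:
    """Keep only manifest members that the shipped static list allows for this owner.

    The manifest's "assertions" field is deliberately ignored in this mode.
    """
    allowed = static_list.get(manifest["owner"], set())
    return [m for m in manifest.get("members", []) if m in allowed]
```

A domain must appear in both places to be accepted: one that is on the static list but absent from the manifest has not opted in, and one that is in the manifest but absent from the list has not passed UA policy.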

Static lists are easy to reason about and easy for others to inspect. At the same time, they can develop deployment and scalability issues. Changes to the list must be pushed to each user's browser via some update mechanism. This complicates sites' ability to deploy new related domains, particularly in markets where network connectivity limits update frequency. They also scale poorly if the list gets too large.

Signed assertions

Alternatively, the browser vendor, or some entities it designates, can sign assertions for domains which meet UA policy, using some private key. A signed assertion has the same meaning as membership in a static list: these domains meet the signer’s policy. The browser would trust the signers’ public keys and, as above, only accept domains covered by suitable assertions.
Assertions are delivered in the assertions field, which contains a dictionary mapping from signer name to signed assertion. Browsers ignore unused assertions. This format allows sites to serve assertions from multiple signers, so they can handle policy variations more smoothly. In particular, we expect policies to evolve over time, so browser vendors may wish to run their own signers. Note these assertions solve a different problem from the Web PKI and are delivered differently. However, many of the lessons are analogous.

As with a static list, signers maintain a full list of currently checked domains. They should publish this list at a well-known location, such as https://sg-signer.example/site-groups.json. Although browsers will not consume the list directly, this allows others to audit the list. The signer may wish to incorporate a Certificate-Transparency-like mechanism for stronger guarantees.
The signer then regularly produces fresh signed assertions for the current list state. For extensibility, the exact format and contents of this assertion are signer-specific (browsers completely ignore unknown signers, so there is no need for a common format). However, there should be a recommended format to avoid common mistakes. Each signed assertion must contain:

  • The domains that have been checked against the signer’s policy
  • An expiration time for the signature
  • A signature over the above, made by the signer’s private key

Assertion lifetimes should be kept short, say two weeks. This reduces the lifetime of any mistakes. The browser vendor may also maintain a blocklist of revoked assertions to react more quickly, but the reduced lifetime reduces the size of such a list.
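
To make the assertion checks concrete, here is a sketch of how a browser might decide which domains are covered. Since the assertion format is signer-specific, the `Assertion` class below is a hypothetical, already-decoded form, and signature verification is left to a per-signer callable rather than tied to any particular cryptographic scheme.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass(frozen=True)
class Assertion:
    domains: frozenset   # domains checked against the signer's policy
    expires: datetime    # expiration time for the signature
    signature: bytes     # signature over the fields above


def domains_covered(assertions: dict[str, Assertion],
                    trusted: dict[str, Callable[[Assertion], bool]],
                    now: datetime) -> set[str]:
    """Union of domains from unexpired, valid assertions made by trusted signers."""
    covered: set[str] = set()
    for signer, assertion in assertions.items():
        verify = trusted.get(signer)
        if verify is None:
            continue  # unknown signers are ignored entirely
        if assertion.expires <= now:
            continue  # expired assertion
        if not verify(assertion):
            continue  # signature check failed
        covered |= assertion.domains
    return covered
```

A vendor-maintained blocklist of revoked assertions could be added as one more check inside the loop.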
To avoid operational challenges for sites, the signer makes the latest assertions available at a well-known location, such as https://sg-signer.example/assertions/. We will provide automated tooling to refresh the manifest from these assertions, and sites with more specialized needs can build their own. To support such automation, the URL patterns must be standard across signers.

Note any duplicate domains in the assertions and members attribute should compress well with gzip.

UI Treatment

In order to provide transparency to users regarding the Site Group that a web page’s top-level domain belongs to, browsers may choose to present UI with information about the Site Group owner and the members list. One potential location in Chrome is the Origin/Page Info Bubble - this provides requisite information to discerning users, while avoiding the use of valuable screen real-estate or presenting confusing permission prompts. However, browsers are free to choose different presentation based on their UI patterns, or adjust as informed by user research.

Browser UI elements can also expose the shared privacy policy for the site group, as well as a human readable name of the site group that would desirably match any cross-site branding. This would hopefully give users even more context about the site group, how their information is used, and why the site is part of a given site group.

Note that Site Groups also gives browsers the opportunity to group per-site controls (such as those at chrome://settings/content/all) by the site group boundary instead of eTLD+1, which is not always the correct site boundary.
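
As a sketch of that grouping, assuming the browser keeps a domain-to-owner map (a hypothetical structure in which unlisted domains own themselves), per-site settings entries could be bucketed by site group owner:

```python
from collections import defaultdict


def group_by_site_group(domains: list, owners: dict) -> dict:
    """Bucket domains by their recorded site group owner (default: the domain itself)."""
    groups = defaultdict(list)
    for domain in domains:
        groups[owners.get(domain, domain)].append(domain)
    return dict(groups)
```

A settings page could then render one entry per key, with the member domains listed beneath it.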
