Dormancy Provider Mode (declarative disable for providers that experience full or partial outage of their SaaS making API requests inconsistent) #3510

webervin · 2025-11-18T19:37:36Z

webervin
Nov 18, 2025

Goal

Introduce a native mechanism that enables successful plan and apply operations on a subset of healthy providers within a single root module. This effectively mitigates the impact of a temporary outage or degradation of a single third-party SaaS provider without resorting to complex module separation or error-prone interactive targeting.

Rationale (Addressing Existing Gaps)

The current workarounds for single-provider outages (separating root modules or using --target) introduce significant operational overhead and reduce the benefits of centralized dependency management:

Module Separation: Increases complexity, slows down dependency propagation, and requires a specific deployment order.
--target Flag: Requires intimate knowledge of resource addresses, is difficult to secure and automate robustly in CI/CD pipelines, and should not be relied upon for generalized deployment.

We can do better by teaching OpenTofu to skip problematic providers whenever possible.

Implementation Details

I propose the introduction of a new, optional setting, dormant_mode = true, configurable within the provider block or via a dedicated environment variable.

Inline configuration: Known long-term maintenance/issue periods; temporary manual override.

provider "saas_vendor" { 
  dormant_mode = true # default false
}

Environment variable: for CI/CD pipelines, where I can set variables for multiple workspaces at the same time, or even use custom scripting to detect provider availability automatically as part of the pipeline.
export OT_DORMANT_MODE_PROVIDERS="saas_vendor,aws-east,cloudflare" (the value is a comma-separated list of providers that will be disabled in the current operation).
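
One way the environment variable could be interpreted is sketched below in Python. This is only an illustration of the proposed semantics, not OpenTofu code: the function names parse_dormant_providers and is_dormant are hypothetical, and the rule that either the inline dormant_mode setting or the environment variable is enough to mark a provider dormant is an assumption that the proposal does not spell out.

```python
import os


def parse_dormant_providers(raw):
    """Split a comma-separated provider list, ignoring blanks and duplicates."""
    names = []
    for part in (raw or "").split(","):
        name = part.strip()
        if name and name not in names:
            names.append(name)
    return names


def is_dormant(provider, inline_dormant_mode=False, env=None):
    """Assumed rule: a provider is dormant if either source says so."""
    env = os.environ if env is None else env
    listed = parse_dormant_providers(env.get("OT_DORMANT_MODE_PROVIDERS", ""))
    return inline_dormant_mode or provider in listed


if __name__ == "__main__":
    # Hypothetical pipeline environment, including stray spaces and an empty entry.
    env = {"OT_DORMANT_MODE_PROVIDERS": "saas_vendor, aws-east,,cloudflare"}
    print(parse_dormant_providers(env["OT_DORMANT_MODE_PROVIDERS"]))
    print(is_dormant("cloudflare", env=env))
    print(is_dormant("google", env=env))
```

Tolerating whitespace and empty entries is a design choice made here so that hand-edited pipeline variables do not fail on a trailing comma.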

OpenTofu Behavior in "Dormant" Mode

When a provider is marked as dormant, OpenTofu modifies the standard graph refresh and execution sequence for that provider only:

Area	Default Behavior (OpenTofu asks the provider to do things)	Dormant Provider Behavior (OpenTofu does not invoke provider code, except maybe validate)
Provider Initialization	Attempt API connection and handshake.	Successful Initialization Mock: The provider is initialized successfully without calling any external API functions.
Resource Refresh	Call API to read current state.	Refresh Bypass: A warning is issued. The state is read only from the local or remote state file. No API calls are made to the provider.
Data Sources	Call API to fetch real-time data.	Data Source Mock: A warning is issued. Data source calls attempt to return the previously known value from the state file (if present and configured to store outputs). If not in state, the call fails with an explicit error, forcing the user to remove or adjust the data source dependency.
Plan (no-refresh)	Compare config, state, and data sources and generate the result.	Compare config, state, and data sources and generate the result.
Plan (refresh)	Compare config and state, ask the provider to query APIs, and generate the result.	Issue a warning and generate a no-refresh plan for the dormant provider. Raise an error if the plan results in proposed changes to a resource managed by the dormant provider.
Apply	Ask provider to perform CRUD operations via API.	Raise an error if the proposed plan contains changes to a dormant provider (e.g., "Cannot apply changes to provider 'saas_vendor' while in dormant mode."). Otherwise ask providers to perform CRUD operations via API.
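
The plan and apply rows of the table can be sketched as a check over the machine-readable plan printed by tofu show -json for a saved plan file, whose resource_changes entries carry provider_name and change.actions. The sketch below is an illustration under assumptions, not the proposed implementation: find_dormant_changes and guard_apply are hypothetical names, and matching a dormant name against the last segment of provider_name is a simplification that ignores aliases and namespaces.

```python
def provider_type(provider_name):
    """Return the last path segment of a fully qualified provider name."""
    return provider_name.rsplit("/", 1)[-1]


def find_dormant_changes(plan, dormant):
    """List addresses with a non-no-op action whose provider is dormant."""
    blocked = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if actions == ["no-op"]:
            continue  # nothing to do, so the provider is not needed
        if provider_type(rc.get("provider_name", "")) in dormant:
            blocked.append(rc["address"])
    return blocked


def guard_apply(plan, dormant):
    """Raise before apply if any change would need a dormant provider."""
    blocked = find_dormant_changes(plan, dormant)
    if blocked:
        raise RuntimeError(
            "Cannot apply changes while provider is in dormant mode: "
            + ", ".join(blocked)
        )


if __name__ == "__main__":
    # Hand-written example data; the provider source address is made up.
    plan = {
        "resource_changes": [
            {
                "address": "saas_vendor_thing.a",
                "provider_name": "registry.opentofu.org/example/saas_vendor",
                "change": {"actions": ["update"]},
            },
            {
                "address": "aws_instance.web",
                "provider_name": "registry.opentofu.org/hashicorp/aws",
                "change": {"actions": ["create"]},
            },
        ]
    }
    print(find_dormant_changes(plan, {"saas_vendor"}))
```

In this sketch, changes for healthy providers pass through untouched, which matches the "Otherwise ask providers to perform CRUD operations" half of the Apply row.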

apparentlymart · 2025-11-18T23:09:24Z

apparentlymart
Nov 18, 2025
Maintainer

Thanks for sharing this, @webervin!

Overall this problem and solution is intuitive to me and I don't really have much to add to it.

The one part that stuck out to me as surprising was your proposal to represent this in the configuration... is the workflow you're imagining that you'd notice that one of your vendors is having an outage and then send the "dormant mode" enable through your pull request process to turn it on, and then again through the pull request process to turn it off again afterwards?

I can see how that would work, but in my experience I've typically wanted "incident-related" controls to be something I can do outside of the configuration, in case e.g. the version control system itself is degraded in some way, or if the outage only applies to one environment and so the setting only needs to be twiddled temporarily for that environment.

I ask this question mainly because during the discussion about the -exclude option we talked about potentially extending it to support excluding everything associated with a particular provider instance -- e.g. something like -exclude=provider.sass_vendor. As part of that discussion I remember us noting that someone responding to an incident might choose to temporarily reconfigure their execution system to set the environment variable TF_CLI_ARGS_plan=-exclude=provider.sass_vendor as the way to impose that setting only for the duration of the outage. I assume you'd consider that as equivalent to -target and so not an acceptable answer for similar reasons?

Thanks again for sharing this! It's something we've discussed in various forms before, and this is an interesting specific take on it.

4 replies

webervin Nov 19, 2025
Author

The one part that stuck out to me as surprising was your proposal to represent this in the configuration.

Yes. It is entirely possible that in some cases there is a break-the-glass procedure for when CI/CD is down, allowing deploys with elevated credentials. Being able to specify the "problem" in the configuration itself is portable outside CI/CD, and tools like TFLint can flag it as a warning: part of your system is disabled. So you get consistent behaviour no matter where you execute OpenTofu with a given state and configuration.

I've typically wanted "incident-related" controls to be something I can do outside of the configuration

I do feel that we would need to support both in-configuration and environment-based settings to cover the majority of users. It might be easiest to add a variable in CI/CD, but if it is a multi-hour outage you might want to propagate that knowledge outside CI/CD without relying on human memory.

I remember us noting that someone responding to an incident might choose to temporarily reconfigure their execution system to set the environment variable TF_CLI_ARGS_plan=-exclude=provider.sass_vendor as the way to impose that setting only for the duration of the outage. I assume you'd consider that as equivalent to -target and so not an acceptable answer for similar reasons?

I would consider any working option that allows mutating resources controlled by working providers during an outage, yet today https://opentofu.org/docs/cli/state/resource-addressing/#resource-addresses-on-the-command-line does not mention the ability to filter by provider. Typically, --exclude and --target work on resource addresses, and unfortunately those are not prefixed with the provider name, only with parent modules, so they require considerable knowledge of naming. If this filter worked on the provider name it would do the trick (but it would be rather confusing for resource targeting flags to accept more than resource addresses).
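
To illustrate the naming problem: with today's address-based flags, excluding one provider means first discovering every resource address that belongs to it. A rough sketch of that lookup, reading the state representation printed by tofu show -json (where each resource records address and provider_name), might look like the following. The helper names are hypothetical, and matching on the last segment of provider_name is a simplifying assumption.

```python
def iter_resources(module):
    """Yield resources from a module and, recursively, its child modules."""
    for res in module.get("resources", []):
        yield res
    for child in module.get("child_modules", []):
        yield from iter_resources(child)


def exclude_flags(state, provider):
    """Build one -exclude flag per resource that belongs to the provider."""
    root = (state.get("values") or {}).get("root_module", {})
    flags = []
    for res in iter_resources(root):
        if res.get("provider_name", "").rsplit("/", 1)[-1] == provider:
            flags.append("-exclude=" + res["address"])
    return flags


if __name__ == "__main__":
    # Hand-written example data; the provider source address is made up.
    state = {
        "values": {
            "root_module": {
                "resources": [
                    {
                        "address": "saas_vendor_thing.a",
                        "provider_name": "registry.opentofu.org/example/saas_vendor",
                    }
                ],
                "child_modules": [
                    {
                        "address": "module.app",
                        "resources": [
                            {
                                "address": "module.app.saas_vendor_thing.b",
                                "provider_name": "registry.opentofu.org/example/saas_vendor",
                            }
                        ],
                    }
                ],
            }
        }
    }
    print(" ".join(exclude_flags(state, "saas_vendor")))
```

The generated list has to be rebuilt whenever resources are added or moved, which is the kind of bookkeeping a provider-level switch would avoid.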

Worth noting that historically Terraform refused to do a targeted run when some dependency needed for the graph was not included, and required targeting all missing dependencies as well if they were needed to build the graph of changes. Potentially the solution could account for a bit of that muscle memory from ancient times as well. The current proposal implies different behaviour and different flags so as not to confuse the two: in the proposed scenario we know there is no way to use a given provider, and we switch to mocking it out using knowledge from the state file instead of treating it as part of the graph.

apparentlymart Nov 19, 2025
Maintainer

Thanks for the extra details!

And indeed I do want to be clear that the option to exclude by provider in the -exclude option is only something we considered as a future extension, not something we already implemented. My intention for mentioning it was only to test whether that earlier idea might be a reasonable way to solve the problem you presented, which you've answered clearly already. Thanks!

webervin Nov 19, 2025
Author

Thank you as well for the clarification.
Can we please link the previous discussion(s) here, so people who find this one can follow on to the "correct" one?

apparentlymart Nov 19, 2025
Maintainer

This was part of the design discussion that led to Exclude Flag for Planning and Applying being written and accepted, but it doesn't seem like that part of the discussion was recorded in the final RFC since the RFC focuses only on what was actually implemented and doesn't discuss the ideas that were eventually considered to be out of scope for the first project.
