A StackSet operation that touches 1,000 stack instances across 250 accounts and 4 regions can take hours and create cascading failure modes. CloudFormation gives you four levers to control the rollout: **maximum concurrent accounts**, **failure tolerance**, **region concurrency**, and **concurrency mode**. It also offers two safety features: **target account gates** and **parameter overrides**.

## The four core levers

### 1. Maximum concurrent accounts

How many target accounts to operate on at once, **per region**. Specified as a number or a percentage.

```
Maximum concurrent accounts = 50%, target accounts = 10
→ Up to 5 accounts deploy in parallel per region
```

For percentages, CFN **rounds down** when the result is not a whole number: `25% × 10 = 2.5 → 2`. The one exception is a result that would round down to zero, which CFN treats as 1.

### 2. Failure tolerance

The number of accounts, **per region**, in which the operation may fail before CFN stops it in that region. Specified as a number or a percentage.

```
Failure tolerance = 20%, 10 target accounts in 3 regions
→ Up to 2 failures per region tolerated
→ 3rd failure in any region → operation stops in that region
→ Remaining regions are not attempted (Sequential mode)
```

Failure tolerance interacts with maximum concurrent accounts depending on the concurrency mode (below).

### 3. Region concurrency

How regions are processed:

- **Sequential** (default): one region at a time, in the order you specify
- **Parallel**: all regions simultaneously

Sequential limits blast radius: a bad deploy fails in region 1 and stops before reaching region 2. Parallel finishes faster but risks identical failures across all regions before you can cancel.

### 4. Concurrency mode

How the actual concurrency level evolves as failures accumulate:

**Strict Failure Tolerance** (`STRICT_FAILURE_TOLERANCE`, the default):

- Initial concurrency = `min(MaxConcurrent, FailureTolerance + 1)`
- Each failure **reduces** active concurrency
- Operation stops when failures = `FailureTolerance + 1`
- Slower but safer: failures naturally throttle the rollout

**Soft Failure Tolerance** (`SOFT_FAILURE_TOLERANCE`):

- Concurrency stays at `MaxConcurrent` regardless of failures
- Decoupled from failure tolerance
- Faster, but failures can pile up: by the time you hit `FailureTolerance + 1` failures, the in-flight operations may push the actual count higher
- Useful when you want maximum throughput AND failures are likely benign (existing-resource collisions, expected permission gaps)

## A worked example: Strict mode at scale

Deploying 1,000 stack instances with `FailureTolerance = 100` and `MaxConcurrent = 250`. The outcome figures below are rough illustrations, not measurements.

- Initial actual concurrency = 101 (min of 250 and 100+1)
- After 50 failures → actual concurrency drops to ~51
- At 101 failures → operation stops
- Final state: ~150-200 stack instances created (101 failed, the rest succeeded before the stop)

Soft mode, same scenario:

- Actual concurrency = 250 throughout
- Stops when failures > 100, but in-flight operations continue
- Final state: ~300-400 stack instances created (~150 failed, the rest succeeded; the queue drains)

Soft mode is roughly 2-3x faster here (250 concurrent operations versus at most 101), at the cost of more failed stack instances.

## Parameter overrides: the per-instance variation lever

A StackSet uses one template with one set of parameter values by default. **Parameter overrides** let you set different parameter values per stack instance (per account+region):

```bash
aws cloudformation update-stack-instances \
  --stack-set-name my-baseline \
  --accounts 111111111111 \
  --regions us-east-1 \
  --parameter-overrides ParameterKey=Subnets,ParameterValue=subnet-1baa3351
```

This is the **only** way to pass different parameter values to individual stack instances. (The template itself can still branch on pseudo parameters such as `AWS::Region` and `AWS::AccountId`.)
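The same override can also be issued from Python. The sketch below only builds the keyword arguments for boto3's `update_stack_instances` call and leaves the call itself as a comment, so it runs without AWS credentials. The helper name `build_override_request` and the operation-preference values are illustrative assumptions, not something this note prescribes.

```python
from typing import Any


def build_override_request(
    stack_set_name: str,
    account_id: str,
    region: str,
    overrides: dict[str, str],
) -> dict[str, Any]:
    """Build kwargs for boto3's CloudFormation update_stack_instances call.

    Hypothetical helper for illustration; it performs no AWS calls itself.
    """
    return {
        "StackSetName": stack_set_name,
        "Accounts": [account_id],
        "Regions": [region],
        # One entry per overridden parameter; parameters not listed here
        # use the value defined on the StackSet.
        "ParameterOverrides": [
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in overrides.items()
        ],
        # The rollout levers travel with the same request (example values).
        "OperationPreferences": {
            "RegionConcurrencyType": "SEQUENTIAL",
            "MaxConcurrentPercentage": 50,
            "FailureTolerancePercentage": 20,
            "ConcurrencyMode": "STRICT_FAILURE_TOLERANCE",
        },
    }


request = build_override_request(
    "my-baseline", "111111111111", "us-east-1", {"Subnets": "subnet-1baa3351"}
)

# With credentials configured, the request could be sent like this:
#   import boto3
#   boto3.client("cloudformation").update_stack_instances(**request)
```

Keeping the request as plain data makes it easy to review or log the overrides and rollout preferences before anything is sent.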
Use cases for overrides:

- VPC IDs that differ per account
- Region-specific AMI IDs
- Account-specific cost-center tags
- Per-environment instance sizes (when one StackSet covers prod+nonprod)

Without overrides, every stack instance gets the same parameter values from the StackSet definition.

## Target account gates: the pre-deployment veto

Account gates are **Lambda functions in target accounts** that CFN invokes before a stack operation. The function returns `SUCCEEDED` (proceed) or `FAILED` (skip this account, count toward failure tolerance).

Strict requirements:

| Requirement | Detail |
|-------------|--------|
| Function name | Must be `AWSCloudFormationStackSetAccountGate` (literal, not configurable) |
| Location | In the target account, in the region being deployed to |
| Permissions | `AWSCloudFormationStackSetExecutionRole` must have `lambda:InvokeFunction` |
| Behavior on missing | If no function with that exact name exists, CFN skips the gate and proceeds |

Use cases:

- Block deployment when active CloudWatch alarms exist (don't deploy during incidents)
- Verify maintenance window
- Check account-specific feature flags
- Enforce compliance preconditions (correct tags exist, certain resources are present)

Because a failed gate counts toward failure tolerance, account gates pair well with Strict mode: gate failures slow the rollout proportionally.

This feature is **StackSets-only**; it is not available for normal stack operations.
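A minimal sketch of a gate function in Python, using the maintenance-window use case. The `{"Status": ...}` response shape is an assumption modeled on AWS's sample gate functions and should be checked against the current account gate documentation. The window hours, the `MAINTENANCE_HOURS_UTC` constant, the `in_maintenance_window` helper, and the extra `now` parameter are all hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical maintenance window: 02:00-04:59 UTC. Adjust per account.
MAINTENANCE_HOURS_UTC = range(2, 5)


def in_maintenance_window(now: datetime) -> bool:
    """Return True when `now` falls inside the allowed deployment hours."""
    return now.astimezone(timezone.utc).hour in MAINTENANCE_HOURS_UTC


def lambda_handler(event, context, now=None):
    """Gate entry point.

    Deploy the function under the name AWSCloudFormationStackSetAccountGate.
    `now` is an extra parameter added only to make the sketch testable;
    Lambda itself passes just `event` and `context`.
    """
    now = now or datetime.now(timezone.utc)
    status = "SUCCEEDED" if in_maintenance_window(now) else "FAILED"
    # Assumed response shape; verify against the AWS account gate docs.
    return {"Status": status}
```

Returning `FAILED` outside the window makes CFN skip the account and count it against failure tolerance, so a rollout started at the wrong time throttles itself under Strict mode.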
## What you cannot control

- **Cross-region rollback as a unit**: a failure in region 3 doesn't roll back regions 1 and 2
- **Cross-account rollback as a unit**: same; each stack instance is independent at the rollback level
- **Operation pause**: you can stop, but can't pause-and-resume from a specific point
- **Target ordering within a region**: you specify region order, but accounts within a region are arbitrary
- **Per-account permission scoping**: execution role permissions are uniform across all targets

## Decision heuristics

| If you... | Choose |
|-----------|--------|
| Are deploying to production | Sequential regions + Strict mode + low failure tolerance (1-5%) |
| Are pushing a security patch and need it everywhere fast | Parallel regions + Soft mode + moderate failure tolerance |
| Have stable templates with known-benign edge cases (existing resources) | Soft mode |
| Are testing a new template | Sequential + Strict + low concurrency (5-10) so you can cancel quickly |
| Have any meaningful chance of "this will break X% of accounts" | Strict mode (auto-throttle) |

## Related

- [[CFN StackSets Cross-Account Cross-Region]]
- [[CFN StackSet Permission Models Self-Managed vs Service-Managed]]
- [[CFN Failure Rollback Behavior]]
- [[CFN Drift Detection Mechanics and Limits]]