The multi-cloud pitch usually sounds like insurance: if AWS goes down, we fail over to Azure. If one vendor raises prices, we move. If a region burns, we're fine. It's a good story. It's also, in most deployments I've audited, wrong. The teams running true active-active multi-cloud aren't more resilient — they're more brittle, more expensive, and slower to ship.

Multi-cloud is a tool. Used for the right reasons, it works. Used for resilience, it usually makes the thing it was supposed to fix worse.

The egress math nobody does before the decision

AWS egress to the internet is roughly $0.09/GB after the first 100 GB. Cross-region traffic inside AWS is $0.02/GB. Azure and GCP land in similar territory. Move 50 TB of data between clouds in a month — a small analytics sync — and you're looking at north of $4,500 just for the wire. Every month. Forever.
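The arithmetic is worth making explicit. A minimal sketch, using the list prices cited above — treat the rates as assumptions and check your provider's current pricing page:

```python
# Back-of-envelope cross-cloud egress cost. Rates are the list prices
# cited in the text, assumed current; verify before budgeting.
INTERNET_EGRESS_PER_GB = 0.09   # AWS egress to internet, after free tier
CROSS_REGION_PER_GB = 0.02      # AWS cross-region transfer

def monthly_egress_cost(tb_per_month: float, rate_per_gb: float) -> float:
    """Monthly wire cost for moving tb_per_month terabytes at rate_per_gb."""
    return tb_per_month * 1024 * rate_per_gb

# The 50 TB analytics sync from the text:
cross_cloud = monthly_egress_cost(50, INTERNET_EGRESS_PER_GB)  # ≈ $4,608/mo
same_cloud = monthly_egress_cost(50, CROSS_REGION_PER_GB)      # ≈ $1,024/mo
```

Note the gap: the same sync inside one cloud costs roughly a quarter of the cross-cloud price, which is the quiet argument for multi-region over multi-cloud.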

Now multiply that by the reality of active-active: your primary dataset lives somewhere, and your secondary cloud needs a fresh copy. If it's not fresh, your failover is a fiction. If it is fresh, you're paying for continuous replication across a paid pipe. Tools like Aviatrix or Megaport help with the unit cost, not the underlying physics: data has gravity, and cross-cloud replication is a tax.
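To see how the replication tax scales, here's a hypothetical sketch — the dataset size, daily churn rate, and per-GB rate below are illustrative assumptions, not quotes:

```python
# Hypothetical: the recurring cost of keeping a secondary cloud's copy
# "fresh". Cross-cloud replication rides the internet-egress rate.
EGRESS_PER_GB = 0.09  # assumed rate, per the figures cited earlier

def replication_cost_per_month(dataset_gb: float, daily_change_rate: float,
                               days: int = 30) -> float:
    """Cost of shipping each day's changed bytes to the secondary cloud."""
    return dataset_gb * daily_change_rate * days * EGRESS_PER_GB

# A 10 TB dataset churning 5% a day:
cost = replication_cost_per_month(10_240, 0.05)  # ≈ $1,382/month to stay current
```

The lever that matters is the churn rate: halve the freshness requirement and the bill halves too — which is exactly the trade that makes a "warm" failover quietly drift toward fiction.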

Expertise duplication is the real budget line

Here's what nobody puts on the slide:

  • AWS IAM and Azure RBAC are different mental models. Your ops engineer who can debug a cross-account sts:AssumeRole issue at 2am is not, usually, the same person who can debug a managed identity token failure in AKS at 2am.
  • CloudWatch and Azure Monitor have different query languages, retention defaults, and cost curves.
  • Terraform modules for AWS and Azure aren't interchangeable — they're dialects. Your platform team writes both, tests both, upgrades both.

You aren't hiring one SRE team. You're hiring 1.7 teams and paying them both to be on call. I've watched a 20-person platform org drop from 12 deploys a day to 4 inside six months of a "multi-cloud initiative." Not because the tech was broken. Because context-switching is a tax paid in human hours.

When multi-cloud genuinely makes sense

This is not an argument for single-cloud dogma. Multi-cloud is the right answer in specific cases:

  • Regulation. A compliance or sovereignty regime that forces workloads across more than one cloud. This, not resilience, is the honest version of the pitch.
  • Data residency. Your European workloads run on a cloud with the right regional footprint; your US workloads run on another. These are isolated estates, not replicated ones.
  • Vendor-specific capability. You run analytics on BigQuery because the alternatives aren't close, and you run enterprise workloads on Azure because of your EA. Fine. That's polycloud, not multi-cloud.
  • Customer requirement. A hyperscaler customer won't run your SaaS on their competitor's cloud. Valid business reason. Accept the cost.
  • Acquisition inheritance. You bought a company that runs on GCP. You're multi-cloud for two years whether you wanted to be or not.

What's not on that list: "in case AWS has an outage." Design for multi-region within one cloud first. A well-architected us-east-1 + us-west-2 deployment survives all but the most catastrophic AWS failure modes — and when that catastrophe hits, half the internet is down too, and your customers aren't going to notice you're up.

The abstraction trap

Teams try to paper over multi-cloud complexity with abstraction layers — Crossplane, Anthos, Kubernetes everywhere. The pitch is "write once, run anywhere." The reality is you write to the lowest common denominator, lose the managed services that made each cloud worth using, and add a platform your team has to operate on top of the ones you already have.

The takeaway: Before you go multi-cloud, write down the specific failure mode you're protecting against and the dollar cost of protecting against it. If the answer isn't a regulatory or contractual requirement, you probably want multi-region, not multi-cloud. Resilience is an architectural property. It isn't a vendor count.
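One way to do that write-down — a toy cost model where every figure is a placeholder you'd replace with your own estimates:

```python
# Hedged sketch: the annual premium of multi-cloud over a multi-region
# baseline. The egress, headcount, and salary figures are placeholders.
def yearly_premium(egress_monthly: float, extra_headcount: float,
                   loaded_salary: float = 180_000) -> float:
    """Annual multi-cloud cost beyond what multi-region already buys you."""
    return egress_monthly * 12 + extra_headcount * loaded_salary

# The 50 TB sync from earlier plus the 0.7-of-a-team tax on a 5-person
# platform org:
premium = yearly_premium(4_608, 0.7 * 5)  # ≈ $685k/year
```

If that number isn't smaller than the expected annual loss from the failure mode you wrote down, the insurance costs more than the risk.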