It’s been a great month for exciting insights about the role that engineering must play in driving profitable innovation.
The ever-astute James Governor from RedMonk authored a post about Spotify open-sourcing their Cost Insights tool, and more broadly about how engineers should be “incentivised to take more responsibility for the costs associated with the products they’re building.”
And now, Sarah Wang and Martin Casado have released an article called “The Cost of Cloud, a Trillion Dollar Paradox.”
The piece is built around two main concepts:
- Cloud is the “perfect platform to optimize for innovation, agility, and growth, but it comes at a cost; there’s a ‘flexibility tax’ for the nimbleness the cloud provides.” Therefore, it’s critical to make cloud spend a first-class KPI and to empower engineering to optimize from day one.
- As a company grows, the pressure that cloud puts on its margins “can start to outweigh the benefits.” Therefore, Wang and Casado present the idea of “repatriating” cloud workloads: bringing all or most of them out of the cloud.
The first is spot on. It’s precisely why CloudZero exists.
The second may make sense when viewed solely through the narrow lens of “unlocked market capitalization,” but is otherwise a terrible idea.
Why You Should Empower Engineering To Tackle Cost
In line with the Monkchips post, Wang and Casado make a great case for empowering engineers to innovate profitably. They, too, mentioned Spotify’s Cost Insights project and how “the company enables engineers, and not just finance teams, to take ownership of cloud spend.”
Why engineers and not finance?
Because in the cloud, every engineering decision is a purchasing decision, so it’s important to “tie the pain directly to the folks who can fix the problem.” With well-informed, continuous cost-conscious development, margins can be improved substantially.
The article described a prominent CTO whose company used “short-term incentives like those used in sales (SPIFFs), so that any engineer who saved a certain amount of cloud spend by optimizing or shutting down workloads received a spot bonus.”
The spirit of this is good — rewarding engineers who tie their technical decisions to business outcomes. But it overemphasizes the absolute cloud cost. Even with the most cost-conscious architectural choices, a cloud bill will increase if volume increases.
We suggest rewarding cost-conscious behavior based on unit cost savings instead of cloud spend savings. As a team delivers a feature, the organization calculates the cost per customer for that feature, and that figure is what engineers should focus on.
Telling an engineer that her EC2 spend increased by 10% last month is meaningless, especially if the increase was driven by adding a thousand new customers. Telling her that she delivered the same value at the same or lower cost per customer, on the other hand, ties that win to gross margin improvements. Business outcomes, not just cloud spend, are what matter.
(This is why at CloudZero we’re focused on delivering an “intelligence” platform and not an “optimization” one.)
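To make the unit cost point concrete, here is a minimal Python sketch. Every figure in it is hypothetical and chosen only for illustration: a 10% rise in spend alongside a thousand new customers. The `MonthlySnapshot` class and `percent_change` helper are names invented for this example, not part of any real tool.

```python
from dataclasses import dataclass


@dataclass
class MonthlySnapshot:
    """Hypothetical monthly figures for one feature or product."""
    cloud_spend: float  # cloud spend attributed to the feature, in dollars
    customers: int      # customers served that month

    @property
    def cost_per_customer(self) -> float:
        # Unit cost: spend divided by the number of customers served.
        return self.cloud_spend / self.customers


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a percentage."""
    return (after - before) / before * 100.0


# Illustrative numbers only (assumptions, not real data).
last_month = MonthlySnapshot(cloud_spend=50_000.0, customers=5_000)
this_month = MonthlySnapshot(cloud_spend=55_000.0, customers=6_000)

spend_change = percent_change(last_month.cloud_spend, this_month.cloud_spend)
unit_change = percent_change(
    last_month.cost_per_customer, this_month.cost_per_customer
)

print(f"Cloud spend change:       {spend_change:+.1f}%")
print(f"Cost per customer change: {unit_change:+.1f}%")
```

With these made-up numbers, the bill is up 10% while cost per customer falls by roughly 8%. The first figure looks like bad news and the second looks like a margin win, even though both describe the same month.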
I fully agree with the article that “infrastructure spend should be a first-class metric,” as long as it’s viewed in the context of what matters to the business (the cost of features, transactions, teams, etc.) and not as the absolute cost of a service on the cloud provider’s monthly bill.
Why You Should Absolutely Not Repatriate
Now, on to Wang and Casado’s idea of “repatriating” cloud workloads.
They cited Dropbox, which “saved nearly $75M over two years by shifting the majority of their workloads from public cloud to ‘lower cost, custom-built infrastructure in co-location facilities’ directly leased and operated by Dropbox.” Results may vary, but they ultimately assert that “repatriation results in one-third to one-half the cost of running equivalent workloads in the cloud.”
But this misses the fact that the cloud is a fundamentally different way to build and operate software. As our CloudZero co-founder and CTO, Erik Peterson, says, “Cloud is not a data center.”
Cloud applications must be built differently to take full advantage of what the cloud offers. If you “lift and shift” a data center application to the cloud (or build in the cloud but with a data center mentality), you will overpay and underachieve on promised benefits of the cloud. Optimization and cost-conscious, cloud-native architecture drive profitable innovation in the cloud.
The effort required to repatriate a cloud app is enormous, and it essentially halts innovation while the migration is underway. Plus, the many negatives of operating on-prem, many of which drove companies to the cloud in the first place, remain after the migration:
- The distraction of investing in something that’s not your core business
- More time-consuming and costly experimentation with emerging technologies like AI and ML
- Less agility to meet rapid growth or contraction of usage
And the worst drawback of them all: the absolutely horrible environmental impact of on-prem computing. According to Microsoft, the Microsoft Cloud is between 72% and 98% more carbon efficient than traditional data centers, and the company is aiming for 100% renewable energy use for its cloud by 2025. That would be very difficult (and costly) for every “repatriating” business to achieve on its own.
“The Cost of Cloud, a Trillion Dollar Paradox” is a fascinating read, and a great addition to the fast-growing body of work emphasizing the importance of pushing engineering to take ownership of cloud spend.
I agree completely with the authors that “cloud is the perfect platform to optimize for innovation, agility, and growth,” but I have to disagree that growth invariably leads to a need to depart from the cloud.
Build cloud-native, optimize, and continue to grow in the cloud. As long as you’re thoughtfully architecting your systems instead of lifting and shifting, you’ll benefit from huge gains in speed, innovation, scale, and yes, better margins.