Breakthroughs in engineering best practices often stem from a handful of top tech companies.
Many of them share their behind-the-scenes stories at conferences, in blog posts and slide decks, or through open source code.
These companies invest millions of dollars and dedicated headcount in optimizing everything from uptime to engineering velocity — so why wouldn’t you look to them for inspiration?
CloudZero is a company that enables engineering to build more profitable applications, so we thought it would be interesting to investigate and share how these top companies think about cloud cost — and what kinds of cloud cost management tools they use and build. Luckily for us, a lot of them have blogged and spoken about it.
By the way, if you want to achieve results like these companies, without pulling your best engineers off your roadmap to build a homegrown system, check CloudZero out.
Part 1: The Common Threads
Before we dive into the specifics, here are a few of the patterns that emerged across these DIY cloud cost management tools:
They’ve built cultures of cost-conscious engineering.
- Top companies decentralize cost management to engineering teams. All of them have reported that when engineers have visibility into their spending, they make better decisions.
- These companies know it’s all about balance between cost and velocity. An engineer shouldn’t spend hours on something to save five dollars. Cost visibility is all about making better decisions and tradeoffs — not saving money at all costs.
- They want engineering teams to have autonomy, and they understand that autonomy is a key pillar of moving quickly. At the same time, many of them have built guardrails to control cost while they do.
They view cost in the context of the business.
- Their disruptive business models have been enabled by a strong command of cost and unit economics. They have revolutionized their respective categories by delivering innovative solutions to customers in cost-effective ways. To do this, they understand their cloud unit economics, like cost per ride or stream, and discuss cost in the context of their business.
- They have built custom ways to make the data speak to the different stakeholders, including leadership and individual dev teams.
- They’ve had to figure out ways to automate or supplement their tagging in order to be able to report on cost. They’ve also allocated container spend in custom ways.
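To make the tagging and unit economics ideas concrete, here is a minimal sketch of supplementing tags with fallback ownership rules and then computing a unit cost. The line items, the `FALLBACK_OWNERS` map, the team names, and the ride count are all hypothetical values made up for illustration. None of this is taken from any of the companies discussed here.

```python
from collections import defaultdict

# Hypothetical billing line items: (resource_id, tags, cost_usd).
LINE_ITEMS = [
    ("i-001", {"team": "rides"}, 120.0),
    ("i-002", {}, 80.0),  # untagged resource
    ("vol-9", {"team": "payments"}, 50.0),
]

# Hypothetical fallback rules for untagged resources: resource id -> team.
FALLBACK_OWNERS = {"i-002": "rides"}


def attribute_costs(line_items, fallback_owners):
    """Sum cost per team, using the tag first and the fallback map second."""
    totals = defaultdict(float)
    for resource_id, tags, cost in line_items:
        team = tags.get("team") or fallback_owners.get(resource_id, "unallocated")
        totals[team] += cost
    return dict(totals)


def unit_cost(total_cost, units):
    """Cost per business unit (for example, per ride or per stream)."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_cost / units


team_costs = attribute_costs(LINE_ITEMS, FALLBACK_OWNERS)
# Assumed volume of 1,000 rides for the same period.
rides_cost_per_ride = unit_cost(team_costs["rides"], units=1000)
```

Anything that matches neither a tag nor a fallback rule lands in an "unallocated" bucket, which keeps the gaps visible instead of hiding them.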
Their systems are complex and customized, but all achieve similar outcomes.
- Existing offerings weren’t enough. Cloud cost management tools — at least the players you might see listed in the Gartner Magic Quadrant or Forrester Wave — weren’t doing the trick. Their engineering teams have all adopted next generation practices and services — and they needed a cost solution that could keep up.
- These systems are an enormous amount of work and custom engineering. They have entire teams of full-time employees building these homegrown systems.
Part 2: The Homegrown Cloud Cost Management Tools
How We Know
The Lyft team spoke at re:Invent in 2019 in a session called “Managing Your Cloud Financials as you Scale on AWS.”
You can watch it here. Lyft starts talking at around the 35 minute mark.
How They Do It
- Lyft has built a system of customized dashboards for all of their stakeholders, including leadership, engineering, and capacity planning.
- Each engineering team lead has their own dashboard where they can drill in and investigate spend.
- They measure cost per ride to track unit cost.
- They have processing that sits on top of tags to be able to attribute spend to teams and projects.
- They had to build a way to allocate container costs — a project which was much more challenging than expected.
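Lyft did not share the details of how they allocate container costs, so the following is only a sketch of one common approach: splitting a node's cost across containers in proportion to their requested CPU and memory. The function name, the 50/50 CPU-to-memory weighting, and the sample numbers are assumptions for illustration.

```python
def allocate_node_cost(node_cost, containers, cpu_weight=0.5):
    """Split one node's cost across teams by requested CPU and memory.

    containers: list of (team, cpu_request, memory_request) tuples.
    cpu_weight: assumed share of node cost driven by CPU; the rest is memory.
    """
    total_cpu = sum(cpu for _, cpu, _ in containers)
    total_mem = sum(mem for _, _, mem in containers)
    if total_cpu <= 0 or total_mem <= 0:
        raise ValueError("containers must request some CPU and memory")

    allocation = {}
    for team, cpu, mem in containers:
        # Blend the container's share of CPU and memory requests.
        share = cpu_weight * cpu / total_cpu + (1 - cpu_weight) * mem / total_mem
        allocation[team] = allocation.get(team, 0.0) + node_cost * share
    return allocation


# Hypothetical node costing $100 and running three containers.
costs = allocate_node_cost(
    100.0,
    [("rides", 2.0, 4.0), ("maps", 1.0, 4.0), ("rides", 1.0, 8.0)],
)
```

This version spreads idle node capacity across whatever is running on the node. A real system would also need to decide how to handle idle capacity, shared system containers, and usage that differs from requests, which is part of why this kind of project tends to be harder than expected.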
Cost Culture
Lyft has said that once their engineers had visibility into what they were spending, they started to make better decisions around cost. Teams are now shown how much they spend compared to other teams, which has led to some good-spirited competition to reduce costs.
A slide from the re:Invent presentation detailing Lyft’s custom cost management solution.
How We Know
Netflix wrote a very detailed blog post about their homegrown efficiency and cost management system.
You can read it here.
How They Do It
- Netflix has “a custom dashboard that serves as a feedback loop to data producers and consumers — it is the single holistic source of truth for cost and usage trends for Netflix’s data users.”
- They break down cost into “meaningful resource unit (table, index, column family, job, etc).”
- They categorize AWS billing data by service, such as Amazon EC2 and Amazon S3. However, they have built custom ways to get further granularity into each.
- They found AWS billing data was not granular enough for them, so they have built custom methods to align cost to the business metrics they care about like teams and products.
- They provide optimization for some scenarios, such as storage.
- They deliver cost alerts directly to their engineers.
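As a rough illustration of aligning cost to resource units and then to teams, the sketch below rolls per-resource costs up an organizational tree. The org names, resource units, and dollar amounts are hypothetical, and this is not Netflix's implementation.

```python
from collections import defaultdict

# Hypothetical org tree: child -> parent (None marks the root).
PARENT = {
    "company": None,
    "data-platform": "company",
    "streaming": "company",
    "warehouse-team": "data-platform",
    "pipelines-team": "data-platform",
}

# Hypothetical direct costs: (owner, resource_unit, cost_usd).
RESOURCE_COSTS = [
    ("warehouse-team", "table:events", 40.0),
    ("warehouse-team", "table:sessions", 10.0),
    ("pipelines-team", "job:nightly_etl", 25.0),
    ("streaming", "index:titles", 30.0),
]


def roll_up(resource_costs, parent):
    """Return total cost per org node, including everything beneath it."""
    totals = defaultdict(float)
    for owner, _unit, cost in resource_costs:
        node = owner
        # Walk from the owning team up to the root, adding the cost at each level.
        while node is not None:
            totals[node] += cost
            node = parent[node]
    return dict(totals)


totals = roll_up(RESOURCE_COSTS, PARENT)
```

With totals available at every level, the same data can feed both a leadership view and a per-team view.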
Cost Culture
Netflix sums up their approach as: “At many other organizations, an effective way to manage data infrastructure costs is to set budgets and other heavy guardrails to limit spending. However, due to the highly distributed nature of our data infrastructure and our emphasis on freedom and responsibility, those processes are counter-cultural and ineffective.
Our efficiency approach, therefore, is to provide cost transparency and place the efficiency context as close to the decision-makers as possible.”
A picture of Netflix’s dashboard that shows cost by organizational hierarchy. This kind of reporting helps give every team ownership of their cost.
How We Know
Expedia spoke at re:Invent in 2017. This is a few years old at this point, but Expedia was quite sophisticated, even back then. You can watch it here.
How They Do It
At the time of this presentation, they were just embarking on building their own custom tool to get the metrics they needed, so this is a bit light on the details of what they eventually built.
However, they did share that their cost optimization practices are:
- Automation to tag all resources
- Visualization and monitoring tools
- Measure, measure, measure
- Leveraged RI pricing
- Decentralized forecasting and planning process
- Encouraged teams to share optimization best practices
Cost Culture
In 2017, Expedia wasn’t just focusing on cost optimization. They were building “cost transparency” for their engineering teams and decentralizing responsibility for cost management. One of the major changes they made was involving engineering teams in the forecasting and budgeting.
A slide from Expedia’s re:Invent talk in 2017.
How We Know
Slack published a job posting for a cloud economics engineer. While it is not quite as extensive as the blogs and re:Invent talks, it still gives us a glimpse into what Slack does for cost management. We suspect this posting won’t last forever, so we pasted it into a document here.
How They Do It
- Slack has a cloud economics engineering team composed of cloud engineers, financial analysts, and AWS subject matter experts working to make Slack more performant, available, and cost-efficient each day.
- They are developing a new platform to provide engineering teams visibility into their cloud spend and efficiency.
- They are building a home-grown chargeback system to ensure the correct service owners know the cost they place onto other systems.
- They monitor cloud spend, tracking and alerting on changes over time.
Cost Culture
This about sums it up: “We advise teams within Slack on how to maximize their value from the cloud and ultimately aim to build a culture where all our engineers are cost-conscious and building a business scalable for the long term. We get excited about making Slack cost-efficient whilst ensuring we use the right technology stack.”
How We Know
Segment has written two blogs about how they do cost management.
You can check them out here:
Both blogs focus on how they cut down existing costs to improve margins (which we’re guessing helped that really, really big acquisition number). We’re going to focus more on how they do ongoing monitoring and proactively reduce spend, which is covered in the 2019 blog.
How They Do It
- Today, they monitor their spend on an ongoing basis, so they won’t have to worry about their margins creeping up on them anymore.
- To get the ongoing visibility they need, Segment built a set of repeatable pricing drivers, calculated daily. The entire cost pipeline feeds into their Redshift instance, and they get daily monitoring on their “cost drivers”, visualized in Tableau. They have now built custom alerting to detect spikes and send teams an email.
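Segment's posts describe the outcome more than the alerting logic, so here is a minimal sketch of one way spike detection on daily cost drivers could work: compare the latest day to a trailing average. The seven-day window, the 30% threshold, and the driver names are assumptions, and the step that actually sends the email is left out.

```python
from statistics import mean


def detect_spike(daily_costs, window=7, threshold=0.3):
    """Return True when the latest day exceeds the trailing average by more than `threshold`."""
    if len(daily_costs) < window + 1:
        return False  # not enough history to form a baseline
    baseline = mean(daily_costs[-(window + 1):-1])  # the `window` days before the latest
    latest = daily_costs[-1]
    return baseline > 0 and (latest - baseline) / baseline > threshold


def drivers_to_alert(history_by_driver, window=7, threshold=0.3):
    """List the cost drivers whose latest daily value counts as a spike."""
    return sorted(
        name
        for name, costs in history_by_driver.items()
        if detect_spike(costs, window=window, threshold=threshold)
    )


# Hypothetical daily costs per driver, oldest first.
history = {
    "api_requests": [100.0] * 7 + [140.0],
    "storage_gb": [100.0] * 7 + [110.0],
}
alerts = drivers_to_alert(history)
```

A trailing average is deliberately simple. Drivers with strong weekly patterns would probably need a comparison against the same weekday instead.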
Cost Culture
Segment lists 36 people who are part of their “gross margin team.” It’s clear they’ve helped their engineering team understand the value of building cost-effective products.
Part 3: Intelligence Vs. Management
Each company uses slightly different terminology. Expedia and Netflix both say they’ve built “cost transparency,” for example. But what is more striking is how similar their approaches are.
Each team has built essentially the same solution to transform cloud cost from centralized and reactive to autonomous and proactive — while integrating cost as a key metric in their development process. They have also found metrics that align to their business, so everyone from their CEO down to an individual engineer can make better decisions based on cost.
At CloudZero, we call this cloud cost intelligence.
Cloud cost management is about reporting retroactively on how much you have spent. Cloud cost intelligence is about leveraging cost data to outperform your competition — or know exactly what levers you can pull when times get tough.
These companies have wielded this power to their advantage to generate resilient growth — and you can too.
Add Cloud Cost Intelligence The Easy Way
Here’s the thing …
You could go build a cloud cost management system like Netflix.
But an internal cost monitoring solution isn’t exactly going to help you hit those Q4 numbers — and do you really have five or six engineers you can put on the project?
CloudZero is a SaaS platform that does the hard work for you and delivers cloud cost intelligence to your engineering teams in real time.