Andreessen Horowitz recently published a blog post, The New Business of AI, describing the heavy cloud costs and scaling challenges AI companies face and how those costs are eating into their margins. As someone who used to manage a fully home-grown, on-site distributed speech recognition platform for an industry leader, I know firsthand that ML can be expensive and challenging to maintain. However, it doesn’t have to be. At CloudZero, we take cost into consideration throughout our engineering process, starting with planning and design. I wrote about it in another blog, which you can check out here. In this post, I’ll share an example of how we designed and built a fast, scalable, cheaper machine learning pipeline.

The Huge Cost of AI Research Doesn’t Have to Carry Over to Production

I think one of the reasons machine learning is considered irrevocably expensive is that companies see what it costs during the research phase and assume that price has to carry over to production. Most companies have ML somewhere in their system. I’m sure you’ve heard of, or are currently maintaining, a few EC2 instances tagged “research” or “experiment”; if you’re really “lucky,” you have an entire “research” AWS account filled with EC2 instances in various states of disrepair. In my experience, the story behind these sunken volcanoes of cloud cost is usually the same: first, the data science team ran some experiments on lots of data and found some promising algorithms. Then, the product team pointed the algorithms at production data at production scale and walked away.

Here’s the nuance I think the Andreessen Horowitz blog missed, and what any price-shocked company that gives up on machine learning is missing: machine learning doesn’t have to be that expensive. Once the research has been done, you can take the next step and productize it with cost in mind. Pretty much every other engineering discipline is more deliberate about this transition. For example, when I was an audio engineer, there was an entire team that took the research code and figured out how to make it fit on a speaker. Taking the time to make sure our machine learning code meets our non-functional requirements ensures that we can realize both great innovation and strong product margins.

Productizing Our Machine Learning Pipeline

When we set out to build our machine learning pipeline, the existing implementation in the research area was far too expensive for production. Our data science team had a single t3.2xlarge EC2 instance manually running an hours-long, per-customer process. That setup let them experiment quickly and build something that worked, but it would never run at the speed our customers required or at a price that would allow us to scale. My CTO challenged the team: productize the ML pipeline, and make it cheaper than the current EC2 instance.

How to Build a Cost-Effective Architecture

It’s fairly easy to calculate production costs when building on top of services like EC2, but other AWS services can be more challenging. The AWS pricing pages aren’t really built to answer the question “how much can I afford for a given budget?” Instead, they assume you’ve already built your architecture and need to calculate the total cost. Here’s the approach that worked for us.

The CloudZero production system has 99% of our compute on AWS Lambda. So I started by asking myself, “How much Lambda can I get for $7 per day?” (That target comes in just under the roughly $8-per-day on-demand cost of the existing t3.2xlarge.) The AWS Lambda Pricing Page breaks down the total cost:

Total Lambda Cost = Request Cost + Duration Cost

where

Request Cost = 0.0000002 * Number of Requests,

Duration Cost = 0.0000166667 * GB-Seconds
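
To make those unit prices concrete, one million invocations that each run for one second at 1 GB of memory would cost:

Total Lambda Cost = (0.0000002 * 1,000,000) + (0.0000166667 * 1,000,000) ≈ $0.20 + $16.67 = $16.87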

The first interesting observation is that total GB-Seconds scales with the Number of Requests. For example, if we have one Lambda function that completes in 1 second and is configured with 1 GB of memory, then GB-Seconds = Number of Requests. Knowing this, we can rewrite the cost equation to give us the Number of Requests for a given Total Cost Goal and a fixed GB-Seconds per request:

Number of Requests = Total Cost Goal / ((0.0000166667 * GB-Seconds per Request) + 0.0000002)

We fix our Total Cost Goal at $7 and input a range of per-request GB-Seconds into this formula to get a table of GB-Seconds per Request to Number of Requests.
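
If you want to reproduce the table below yourself, a throwaway script along these lines inverts the pricing formula (using the unit prices above; expect rounding to differ from the published figures by a handful of requests):

```python
# Invert the AWS Lambda pricing formula: how many requests does a daily
# budget buy at a given per-request GB-Seconds figure?
REQUEST_PRICE = 0.0000002        # $ per request ($0.20 per 1M requests)
GB_SECOND_PRICE = 0.0000166667   # $ per GB-Second

def requests_for_budget(total_cost, gb_seconds_per_request):
    return int(total_cost / (GB_SECOND_PRICE * gb_seconds_per_request + REQUEST_PRICE))

for gbs in (1, 32, 64, 96, 128, 160, 192, 224, 256):
    print(f"{gbs:>4} GBS -> {requests_for_budget(7.00, gbs):>8,} requests")
```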

 

| GB-Seconds per Request | Number of Requests |
|---:|---:|
| 1 | 414,937 |
| 32 | 13,117 |
| 64 | 6,558 |
| 96 | 4,373 |
| 128 | 3,280 |
| 160 | 2,624 |
| 192 | 2,186 |
| 224 | 1,874 |
| 256 | 1,640 |

Why look at these data by Number of Requests? Think of a request as an abstract unit of work your system needs to process in order to deliver some user value. Sometimes a request is a user action, like completing a shopping cart checkout. In our ML pipeline, a request is a prediction run that produces the cost anomalies we send to our customers. Since I know our customer goal for the year, I can work out how many total requests we need to support our customers.
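
The back-of-envelope math here is just multiplication. A sketch, where every number is a made-up stand-in rather than our real customer or run counts:

```python
# Purely illustrative numbers (not CloudZero's actual figures).
customers = 100            # hypothetical customer goal for the year
runs_per_customer = 30     # hypothetical daily trainings + predictions per customer

requests_per_day = customers * runs_per_customer
print(requests_per_day)    # 3000
```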

The data science team and I sat down and figured out that, given the average number of trainings and predictions we run per customer, we would need about 3,000 requests per day. Checking our table above, at 3,280 requests per day we can spend 128 GB-Seconds per request. So now we can ask: can we do training and prediction in 128 GB-Seconds? At first glance, this seems impossible! The existing EC2 instance uses 32 GB of memory and takes hours to run. There’s no way to slice and dice that into any kind of Lambda execution, right?

We started digging deeper together. Why do we need 32 GB of memory? Because we load all of the data up front. Why do we need to load all of those data? In order to group by certain attributes. Why do we need to compute this grouping in memory? Can we offload that grouping logic to SQL? Yes! Can we run trainings in parallel? Yes. Can we run predictions in parallel? Yes.

We worked backwards from a cost requirement to a service constraint. In this case, $7 per day gives us 128 GB-Seconds of Lambda per request. Using this service constraint, we asked 5 whys with the data science team. In the end, we found that we could decompose the existing EC2 ML algorithm into a series of steps, each of which needs only a fraction of the data and time of the original EC2 implementation. In addition, we can parallelize many of the steps via Lambda without worrying about the pitfalls of Python concurrency semantics. (ML code is usually Python, and friends don’t let friends write concurrent Python … am I right?)
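
To make that concrete, here’s a minimal sketch of the fan-out idea, with hypothetical table, column, and function names throughout: the grouping moves into SQL, and each group becomes an independent asynchronous Lambda invocation, so the parallelism lives in the platform instead of in Python.

```python
import json

import boto3

lambda_client = boto3.client("lambda")

# Hypothetical schema: let the database do the grouping we used to do in memory.
GROUPS_QUERY = """
    SELECT customer_id, service_name
    FROM daily_costs
    GROUP BY customer_id, service_name
"""

def fan_out_predictions(conn):
    """Invoke one prediction Lambda per group (assumes a DB-API-style connection)."""
    for customer_id, service_name in conn.execute(GROUPS_QUERY):
        lambda_client.invoke(
            FunctionName="predict-anomalies",   # hypothetical function name
            InvocationType="Event",             # async fire-and-forget invocation
            Payload=json.dumps({"customer_id": customer_id, "service": service_name}),
        )
```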

This example illustrates how you can use a cost requirement to estimate the number of requests of a single AWS service. You can extend this to estimate how much of a set of services you can afford. In our case, once we confirmed we could afford and fit our algorithms into Lambda, we did the same exercise for AWS Step Functions, Amazon EventBridge, and Amazon DynamoDB. Our final ML pipeline comprises all of these services, triggered by daily customer data drops.
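
As a sketch of what the trigger side of such a pipeline can look like (all names and ARNs below are placeholders, and this assumes the bucket publishes events to EventBridge): an EventBridge rule matches each daily data drop in S3 and starts a Step Functions execution.

```python
import json

import boto3

events = boto3.client("events")

# Match object uploads to a (hypothetical) customer data-drop bucket; the
# bucket must have EventBridge notifications enabled for this rule to fire.
events.put_rule(
    Name="customer-data-drop",
    EventPattern=json.dumps({
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": ["customer-data-drops"]}},
    }),
)

# Start the ML pipeline state machine for every matching event.
events.put_targets(
    Rule="customer-data-drop",
    Targets=[{
        "Id": "ml-pipeline",
        "Arn": "arn:aws:states:us-east-1:123456789012:stateMachine:ml-pipeline",
        "RoleArn": "arn:aws:iam::123456789012:role/eventbridge-start-sfn",
    }],
)
```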

How to Confirm a Cost-Effective Architecture

Once you’ve designed, implemented, and deployed your system, you should confirm your cost hypotheses. We used AWS Cost Explorer to group and filter by tags (if you’re using tags, of course). When I checked at daily granularity, I found that our final system actually costs about $2.15 per day. If you don’t have a robust tagging strategy, if it’s hard to enforce, or if you have lots of shared resources, here’s where I shamelessly plug the SaaS platform I’m building: CloudZero helps engineers gain insight into their AWS costs by, among other things, grouping resources into cost groups, features, and products … even if your tagging isn’t perfect.
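
If you’d rather script that check than click through the console, here’s a sketch using boto3’s Cost Explorer client (the tag key, tag value, and date window are assumptions):

```python
import boto3

ce = boto3.client("ce")

# Pull daily unblended cost for resources carrying our (hypothetical) feature tag.
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2020-06-01", "End": "2020-06-08"},
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
    Filter={"Tags": {"Key": "feature", "Values": ["ml-pipeline"]}},
)

for day in response["ResultsByTime"]:
    amount = float(day["Total"]["UnblendedCost"]["Amount"])
    print(day["TimePeriod"]["Start"], f"${amount:.2f}")
```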

tl;dr

Machine learning has a reputation for enormous compute costs. However, with a top-down cost approach, it’s possible to build machine learning-based products with strong margins. Invert the AWS pricing calculations to take a target price as input; you can then calculate how many requests of a given service you can afford. Use customer-centric metrics to figure out how many requests you need to deliver user value, then use the constraints of that service to ask 5 whys as you productize your ML algorithm. Repeat this process for the set of AWS services comprising your system. Once you’ve deployed, use a cost tool to confirm you’ve met your objectives.

A few cost tips:

If you only skimmed the article, here are a few cost tips I like to keep in mind as I’m building products. 

Tip 1: Rewrite AWS Pricing equations to estimate the Number of Requests you can support.

Tip 2: Work with your Data Science team to figure out customer-centric metrics for training and prediction.

Tip 3: Sit down with the Data Science team and ask 5 whys using a service (usually compute) constraint as your primary non-functional requirement. Productize accordingly.
