Cloud costs can come with significant sticker shock, especially since many businesses do not have an easy way to track or predict actual cost before the bill arrives.
However, there are several AWS cloud architecture changes businesses can make to help rein in cloud spend. In some cases, optimal engineering decisions can be made up-front; in others, certain areas should be monitored over time to identify opportunities to retool architecture and optimize cloud costs.
We consider either approach “proactive,” because ideally this happens long before a shocking cloud bill comes due.
Of course, engineering teams must have visibility into cloud spend to know what choices to make in the first place. With the appropriate data at hand on a timely basis, it becomes possible to make more cost-effective AWS cloud architecture decisions.
Let’s take a look at three key areas where proactive AWS cloud architecture decisions can prevent or correct excessive cloud spend and optimize cloud costs.
Non-Compute and Outdated EC2 Compute Ratios
The first area we recommend examining for potential cloud cost savings is compute. Since compute is typically the lion’s share of cost on a given account, it makes sense to examine two ratios related to compute to see whether anomalies are present and identify opportunities for cloud cost optimization.
Non-Compute
We find that compute is roughly 30% of cost in a typical account with an EC2-centric architecture. Keep an eye on non-compute costs: if an account has very little compute spend, something may be amiss.
Of course, audit accounts and smaller accounts used for a niche purpose may not necessarily conform to typical ratios. However, a typical AWS account with an EC2-centric architecture (e.g., one hosting applications) is likely to have a minimum of 30% compute.
Examine any accounts you have that are spending more than 70% on non-compute and make sure that is appropriate based on your goals and the purpose for that account.
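If you have programmatic access to your billing data, this check is easy to automate. Below is a minimal sketch using boto3's Cost Explorer API; the service label used to identify compute is an assumption, so adjust it to match how compute actually appears on your bill.

```python
# A minimal sketch of the non-compute ratio check, using boto3's Cost
# Explorer API. Assumes credentials with ce:GetCostAndUsage permission.
import boto3

ce = boto3.client("ce")

# Assumed label for EC2 compute on the bill -- verify against your own data.
COMPUTE_SERVICES = {"Amazon Elastic Compute Cloud - Compute"}

def compute_ratio(start: str, end: str) -> float:
    """Return compute spend as a fraction of total spend for [start, end)."""
    resp = ce.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )
    total = compute = 0.0
    for period in resp["ResultsByTime"]:
        for group in period["Groups"]:
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            total += amount
            if group["Keys"][0] in COMPUTE_SERVICES:
                compute += amount
    return compute / total if total else 0.0

ratio = compute_ratio("2024-05-01", "2024-06-01")
if ratio < 0.30:  # i.e., more than 70% non-compute
    print(f"Compute is only {ratio:.0%} of spend -- worth a closer look.")
```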
Allocating compute resources thoughtfully across your accounts is an example of an architectural choice that can be made at the outset — and one that should be monitored for anomalies later on and changed when appropriate to avoid wasteful spending in the cloud.
Outdated Compute
Another ratio we recommend tracking is Outdated Compute to Total Cost, which is more likely to become an issue down the road than to shape early architecture decisions.
It is a problem that inherently tends to arise over time, since it is a matter of upgrading resources (or failing to do so). As you know, AWS has many different instance families, and periodically it deprecates an older generation and brings a newer one online.
These newer generations typically perform better and cost less, so outdated compute is almost always an opportunity to save money. Because development environments do not receive the same focus as production, they are a common, pernicious source of outdated compute.
Migrating to the latest generation compute may or may not be a quick change, and often requires testing and script updates. The performance improvements and cost savings resulting from the improved architecture usually justify the effort.
If you set up cloud cost alerting, you may not want an alert for every instance of outdated compute. We do recommend keeping an eye on the ratio, though, and noting any time it rises above 10% for a given account; that is likely a valuable opportunity to upgrade.
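One way to keep an eye on this is a periodic script that flags previous-generation instances. The sketch below counts running instances rather than cost, which is only a rough proxy for the ratio, and the set of "outdated" families is an assumption; check AWS's previous-generation list for the current set.

```python
# A sketch of flagging previous-generation EC2 instances with boto3.
import boto3
from collections import Counter

ec2 = boto3.client("ec2")

# Assumed set of outdated families -- adjust for your environment.
OUTDATED_FAMILIES = {"t2", "m3", "m4", "c3", "c4", "r4"}

def outdated_instance_share() -> float:
    """Fraction of running instances on an outdated family (by count,
    not cost -- a rough proxy for the outdated-compute ratio)."""
    counts = Counter()
    paginator = ec2.get_paginator("describe_instances")
    for page in paginator.paginate(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    ):
        for reservation in page["Reservations"]:
            for inst in reservation["Instances"]:
                family = inst["InstanceType"].split(".")[0]
                key = "outdated" if family in OUTDATED_FAMILIES else "current"
                counts[key] += 1
    total = sum(counts.values())
    return counts["outdated"] / total if total else 0.0

share = outdated_instance_share()
if share > 0.10:
    print(f"{share:.0%} of running instances are previous-generation.")
```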
Storage & Snapshot Requirements
Next, take a look at your storage and snapshot requirements. These are common areas where businesses overspend in the cloud. The typical storage costs we see run around 25% of the total bill.
However, it’s common for teams to set up lifecycles for storage and databases improperly. It may feel like the “safe” choice to just hang onto everything forever, but storage costs can really add up over time.
When engineering a new component, it’s a good idea to define what the actual storage requirements are, rather than guessing or erring on the cautious side (which can, in fact, rack up major charges).
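Once those requirements are defined, encode them directly in the architecture. As one example, here is a hypothetical S3 lifecycle rule (the bucket name, prefix, and 30/90/365-day windows are all placeholders) that tiers objects down to cheaper storage classes and eventually expires them:

```python
# A sketch of encoding explicit storage requirements as an S3 lifecycle
# rule. Substitute your actual bucket, prefix, and retention windows.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-app-logs",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-and-expire-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```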
For architecture that has already been built, monitoring storage costs can help flag when spend ratios get too high (as we mentioned, a good rule of thumb is that storage should be 25% or less of the total bill). If it gets higher than that, find out what you really need to be storing.
Snapshotting is a similar issue to storage in that it may seem like no big deal when it is set up, but costs can quickly spiral out of control. This is another area, like outdated compute, that can also become a problem over time if teams do not upgrade infrastructure.
Before AWS had fully featured backup capabilities, engineers often used snapshots for the same purpose. For this reason, legacy architectures may spend 15% or even 20% of their cloud bill on snapshots.
In general, we recommend setting a threshold of 10%. If you are spending more than that on snapshots, examine whether this is necessary and look for opportunities to optimize cloud costs by reducing snapshotting.
Some useful questions to ask include:
- Do we need to use snapshots in this way?
- Should we keep these snapshots for a limited amount of time?
- Is there a better backup system we could use?
We have heard of businesses spending hundreds of thousands of dollars per year on snapshots, and there is rarely a case where the value of those snapshots is justified by the cost they incur.
Almost always, there is a way to better architect the system so that backups are conducted and stored efficiently and cloud costs can be optimized.
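As a starting point, here is a sketch of a retention script that prunes self-owned EBS snapshots older than a cutoff. The 30-day window is an assumption, and it defaults to a dry run so you can review what would be deleted before committing:

```python
# A sketch of enforcing a snapshot retention window with boto3.
from datetime import datetime, timedelta, timezone

import boto3

ec2 = boto3.client("ec2")
CUTOFF = datetime.now(timezone.utc) - timedelta(days=30)  # assumed window

def prune_snapshots(dry_run: bool = True) -> None:
    paginator = ec2.get_paginator("describe_snapshots")
    for page in paginator.paginate(OwnerIds=["self"]):
        for snap in page["Snapshots"]:
            if snap["StartTime"] < CUTOFF:
                print(f"Deleting {snap['SnapshotId']} from {snap['StartTime']:%Y-%m-%d}")
                if not dry_run:
                    # Note: snapshots backing registered AMIs will raise an
                    # error on delete; deregister the AMI first.
                    ec2.delete_snapshot(SnapshotId=snap["SnapshotId"])

prune_snapshots()  # dry run; pass dry_run=False to actually delete
```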
Data Transfer
Finally, as you likely know, AWS' data transfer pricing can be confusing. As a result, it's not uncommon for engineers to build architectures that unintentionally incur very high data transfer costs.
If you have an account with high data transfer costs (i.e., more than 25% of the cloud bill), it's worth at least asking whether this is reasonable. In the case of a video streaming application, this number may make perfect sense.
However, for a more typical application, anything higher than 25% is an opportunity to rethink how data transfer is structured and look for opportunities to optimize cloud costs.
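As with the compute ratio, this check can be automated with Cost Explorer. The sketch below groups costs by usage type and matches the substring "DataTransfer", which holds for common usage types (e.g., USE1-DataTransfer-Out-Bytes) but should be verified against your own bill:

```python
# A sketch of the data-transfer ratio check via Cost Explorer.
import boto3

ce = boto3.client("ce")

def data_transfer_ratio(start: str, end: str) -> float:
    """Return data transfer spend as a fraction of total spend."""
    resp = ce.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
    )
    total = transfer = 0.0
    for period in resp["ResultsByTime"]:
        for group in period["Groups"]:
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            total += amount
            # Substring match is an assumption -- verify your usage types.
            if "DataTransfer" in group["Keys"][0]:
                transfer += amount
    return transfer / total if total else 0.0

if data_transfer_ratio("2024-05-01", "2024-06-01") > 0.25:
    print("Data transfer exceeds 25% of the bill -- review the architecture.")
```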
Proactive Cloud Cost Optimization: The Way Forward
Any time an engineering team can architect a product or service from the get-go with cost in mind, there are massive cost savings to be realized.
Similarly, gaining full visibility into live cloud costs and setting alerts when thresholds get out of whack, as described above, can help engineers make proactive changes to AWS cloud architectures that are already in place.
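AWS Budgets offers a simple version of this kind of alerting. The sketch below (the account ID, limit, and email address are placeholders) sends an email when actual monthly spend crosses 80% of a fixed limit; ratio-based checks like those above would typically run as a scheduled job alongside it.

```python
# A sketch of a spend alert using AWS Budgets: email when actual monthly
# cost passes 80% of a fixed limit. All identifiers are placeholders.
import boto3

budgets = boto3.client("budgets")

budgets.create_budget(
    AccountId="123456789012",  # hypothetical account ID
    Budget={
        "BudgetName": "monthly-spend-guardrail",
        "BudgetLimit": {"Amount": "10000", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "finops@example.com"}
            ],
        }
    ],
)
```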
In both instances, teams can save significant cloud costs, while often increasing efficiency and performance at the same time. The more proactive you can be, the less surprising that AWS bill will be at the end of the month, and the less friction there will be between engineering and business priorities.