Most organizations today run their workloads on public clouds because of the many benefits they offer. Public cloud offerings have been a great solution for organizations that previously maintained their own datacenters: you pay only for the resources you use, not for the underlying infrastructure and its maintenance, so costs are often considerably lower. However, many organizations face bill shock caused by poorly managed resources. Cloud costs can creep up gradually until, before you know it, your bill is sky-high. This is why organizations try to cut costs wherever they can, and because compute resources offer a lot of pricing flexibility, they look for ways to do more with less. That’s what spot instances are all about, and we’re going to explore them in this article.
When it comes to compute, public clouds offer several purchasing options, which fall into the following categories.
- On-demand instances: This option lets organizations use cloud instances for as long as they need without any commitment. It suits high-priority workloads and production applications. On-demand is the most expensive option, and while most organizations start with it, they eventually need to move to more cost-effective purchasing models to save money.
- Reserved instances: In exchange for committing to a set amount of usage for one or three years, organizations get a significant discount, typically 45% to 75% off on-demand prices. They suit steady, predictable baseline workloads.
- Spot instances: These are the cheapest of the three options, available at discounts of up to around 90 percent off on-demand prices. In exchange, they can be interrupted, so they suit flexible, fault-tolerant workloads.
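To make these discounts concrete, the short calculation below compares the monthly cost of one instance under each option. It is only a sketch: the $0.10/hour on-demand rate is a hypothetical figure, 730 hours is a common approximation of one month, and the discounts are taken from the ranges above rather than from any real price list.

```python
# Hypothetical figures for illustration; real prices vary by instance type,
# region, and (for spot) over time.
HOURS_PER_MONTH = 730  # common approximation: 24 hours * ~30.4 days


def monthly_cost(on_demand_hourly: float, discount: float,
                 hours: float = HOURS_PER_MONTH) -> float:
    """Cost for `hours` of usage at the on-demand rate minus a fractional discount."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in the range [0, 1)")
    return on_demand_hourly * (1.0 - discount) * hours


ON_DEMAND_HOURLY = 0.10  # hypothetical $/hour for one instance

scenarios = {
    "on-demand": 0.00,
    "reserved, 45% off": 0.45,
    "reserved, 75% off": 0.75,
    "spot, 90% off": 0.90,
}

for label, discount in scenarios.items():
    cost = monthly_cost(ON_DEMAND_HOURLY, discount)
    print(f"{label:<18} ${cost:7.2f} per month")
```

With a 90 percent discount, the spot scenario costs one tenth of the on-demand scenario, and the absolute savings grow linearly with the number of instances.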
Why spot instances?
Spot instances are spare compute capacity sold at a steep discount, and they can save organizations a lot of money. Amazon EC2 spot instances are the same as on-demand or reserved EC2 instances when it comes to running workloads. The catch is that spot instances are ephemeral. AWS has plenty of unused EC2 capacity, and spot instances let it earn money from that capacity until on-demand or reserved customers need it back. Used correctly, spot instances can cut costs substantially without affecting workload performance. Lyft is a prime example: it has used spot instances to cut its cloud costs by 75 percent. Still, let’s look at why some organizations consider spot instances too risky despite their cost-effectiveness.
Too good to be true?
Spot instances can be a great fit if you are looking for cheaper compute for fault-tolerant, stateless, and flexible workloads. Getting compute capacity at a fraction of on-demand prices is a huge deal. However, there’s a catch: spot instances are idle EC2 capacity that Amazon can reclaim at any time. This can become a serious problem if your workloads need uninterrupted compute capacity.
To request spot instances, you specify the number and type of instances you need, the Availability Zone, and, optionally, the maximum price you are willing to pay. Organizations can review the spot price history to decide what maximum price to set. Amazon sets spot prices based on long-term trends in supply and demand for EC2 capacity, so prices change gradually rather than fluctuating sharply. Keep in mind, though, that no matter what maximum price you set, Amazon can interrupt your instances at any time.
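Price history can be pulled with the AWS CLI (`aws ec2 describe-spot-price-history`) or boto3 (`describe_spot_price_history`). The sketch below skips the API call and works on a list of hypothetical price samples. The rule it applies (the 95th percentile of observed prices plus 10 percent headroom, capped at the on-demand price) is an illustrative heuristic, not an AWS recommendation. The cap mirrors AWS's behavior of defaulting the maximum price to the on-demand price when you do not set one.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile (pct from 0 to 100) of a non-empty sequence."""
    if not values:
        raise ValueError("need at least one price sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def suggest_max_price(history, on_demand_price, pct=95, headroom=0.10):
    """Hypothetical heuristic: pct-th percentile of past prices plus headroom,
    never more than the on-demand price."""
    base = percentile(history, pct)
    return min(round(base * (1 + headroom), 4), on_demand_price)


# Hypothetical hourly spot prices ($/hour) for one instance type and zone.
history = [0.031, 0.032, 0.030, 0.033, 0.035, 0.031, 0.034, 0.032, 0.036, 0.033]
print(suggest_max_price(history, on_demand_price=0.10))
```

A higher percentile or more headroom trades some savings for fewer price-driven interruptions. It does nothing about interruptions caused by Amazon reclaiming capacity.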
What are your options?
Interruptions happen when Amazon EC2 needs the capacity back to fulfill on-demand or reserved requests, or when a specific Availability Zone runs short of capacity. In these cases, AWS gives a two-minute warning before the spot instance is interrupted. This can be damaging if an organization doesn’t account for this scenario when designing its workloads. AWS has offered two spot pricing options:
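On the instance itself, the two-minute warning appears in the instance metadata service at the `spot/instance-action` path. That path returns HTTP 404 until an interruption is scheduled, then returns a small JSON document with the action and its time. The sketch below polls that path using IMDSv2 session tokens and calls a handler when a notice appears. It assumes the code runs on a spot instance with IMDSv2 enabled. The helper names (`imds_get`, `parse_instance_action`, `watch_for_interruption`) are hypothetical, and the `fetch` parameter exists so the logic can be exercised off-instance.

```python
import json
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone

IMDS_BASE = "http://169.254.169.254/latest"


def imds_get(path, timeout=2.0):
    """Read an instance metadata path via IMDSv2; return None on HTTP 404."""
    # IMDSv2: obtain a short-lived session token with a PUT, then send it on reads.
    token_request = urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "300"},
    )
    with urllib.request.urlopen(token_request, timeout=timeout) as resp:
        token = resp.read().decode()
    request = urllib.request.Request(
        f"{IMDS_BASE}/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption scheduled yet
            return None
        raise


def parse_instance_action(body, now=None):
    """Turn an instance-action document into (action, seconds_left), or None."""
    if body is None:
        return None
    doc = json.loads(body)  # e.g. {"action": "terminate", "time": "...Z"}
    deadline = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    if now is None:
        now = datetime.now(timezone.utc)
    return doc["action"], max(0.0, (deadline - now).total_seconds())


def watch_for_interruption(on_notice, fetch=None, interval=5.0):
    """Poll until an interruption notice appears, then call on_notice(action, secs)."""
    if fetch is None:
        fetch = lambda: imds_get("spot/instance-action")
    while True:
        notice = parse_instance_action(fetch())
        if notice is not None:
            on_notice(*notice)
            return
        time.sleep(interval)
```

In practice you would run the watcher in a background thread and use `on_notice` to stop accepting work, flush checkpoints, or deregister the instance from a load balancer within the two-minute window.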
- Regular spot pricing: This model offers the deepest discounts, but instances can be interrupted with a two-minute warning.
- Defined duration pricing (Spot blocks): This model let organizations run spot instances for 1 to 6 hours without interruption. It was less of a bargain, at 30% to 60% off on-demand EC2 prices depending on the duration, and AWS has since retired it.
Spot instances can also be interrupted if the spot price rises above the maximum price you set. Since AWS replaced its bidding model with gradually adjusted prices in 2017, however, prices no longer move with other customers’ bids, so spot pricing is far less like a stock market than it once was.
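Because price-driven interruptions depend only on your maximum price, you can backtest a candidate maximum against historical prices. The sketch below, run here on hypothetical hourly samples, reports each stretch of time during which the price sat above the maximum. It models only price-driven interruptions. Interruptions caused by Amazon reclaiming capacity cannot be predicted from price history. `price_breaches` is a hypothetical helper, not an AWS API.

```python
from typing import List, Optional, Tuple


def price_breaches(
    history: List[Tuple[int, float]], max_price: float
) -> List[Tuple[int, int]]:
    """Return (first, last) timestamps of each run of samples priced above max_price.

    `history` holds (timestamp, price) pairs sorted by timestamp.
    """
    breaches: List[Tuple[int, int]] = []
    start: Optional[int] = None
    previous: Optional[int] = None
    for timestamp, price in history:
        if price > max_price:
            if start is None:
                start = timestamp  # a new run above the maximum begins
        elif start is not None:
            breaches.append((start, previous))  # run ended at the prior sample
            start = None
        previous = timestamp
    if start is not None:
        breaches.append((start, previous))  # run still open at the end of the data
    return breaches


# Hypothetical (hour, $/hour) samples.
hourly = [(0, 0.031), (1, 0.045), (2, 0.048), (3, 0.032), (4, 0.050)]
print(price_breaches(hourly, max_price=0.040))  # [(1, 2), (4, 4)]
```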
The downside of using spot instances
The biggest fear about spot instances is the obvious one: because they can be interrupted at any time, organizations worry that their workloads will become unreliable. Developers have to put in extra effort to implement failover mechanisms, which can be frustrating for developers and testers who could be working on more important things.
This fear is not misplaced. Constantly reworking older workloads is time-consuming, and developers cannot fully guarantee fault tolerance when a spot instance is interrupted. Mission-critical workloads are not something to experiment with: interruptions there can do irreversible damage to an organization’s reputation.
Addressing the concerns
Spot instances can be tricky, but there are ways to use them that give organizations more confidence in their workloads. First, start small. Build proofs of concept that are resilient to spot instances being interrupted at any time. You can then build on these POCs to create larger, more complex workloads that keep working despite interruptions.
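Checkpointing is a common building block for this kind of proof of concept. A batch job records its progress after each unit of work, and a replacement instance resumes from the last checkpoint instead of starting over. The sketch below writes the checkpoint to a local JSON file for simplicity. That is an assumption for illustration only: local instance storage does not survive termination, so a real job would checkpoint to durable storage such as Amazon S3. All function names are hypothetical. An item can be processed twice if the interruption lands between processing it and saving the checkpoint, so `process` should be idempotent.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the index of the next item to process (0 if no checkpoint exists)."""
    try:
        with open(path) as f:
            return json.load(f)["next_index"]
    except FileNotFoundError:
        return 0


def save_checkpoint(path, next_index):
    """Write the checkpoint atomically so an interruption can't leave a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp_path, path)  # atomic rename over the old checkpoint


def run_batch(items, process, checkpoint_path, stop_after=None):
    """Process items in order, resuming from the last checkpoint.

    `stop_after` simulates an interruption after that many items in this run.
    Returns True when every item has been processed.
    """
    start = load_checkpoint(checkpoint_path)
    done = 0
    for i in range(start, len(items)):
        process(items[i])
        save_checkpoint(checkpoint_path, i + 1)
        done += 1
        if stop_after is not None and done >= stop_after:
            return False  # simulated interruption
    return True
```

Running `run_batch` once with `stop_after=4` and then again without it processes every item exactly once, because the second run resumes at the saved index.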
Don’t underestimate the savings spot instances can offer, but know where they fit. Good candidates include containerized workloads, CI/CD pipelines, distributed databases, batch processing jobs, machine learning workloads, and applications running in orchestrated environments.
Automation is another important way to guard against failures caused by spot interruptions. Cloud cost optimization tools can help you run on spot instances and automatically fall back to on-demand capacity when your spot instances are interrupted. You can also act on EC2 instance rebalance recommendations, which AWS sends when a spot instance is at elevated risk of interruption and which can arrive before the two-minute notice, to move work off the instance early. Finally, automated analytics tools can help you adopt spot instances with more confidence.
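Managed features such as EC2 Auto Scaling groups with a mixed instances policy can combine spot and on-demand capacity for you. The sketch below shows only the underlying fallback idea. It assumes a hypothetical `launch(market, pool)` callable that wraps whatever provisioning API you use and raises a hypothetical `CapacityUnavailable` exception when a pool has no capacity. The function tries several spot pools (instance type and Availability Zone pairs) before falling back to on-demand, which reflects the common advice to diversify spot capacity.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


class CapacityUnavailable(Exception):
    """Hypothetical error raised by a launcher when a pool has no capacity."""


@dataclass(frozen=True)
class Pool:
    instance_type: str
    availability_zone: str


def launch_with_fallback(
    launch: Callable[[str, Pool], str],
    spot_pools: Sequence[Pool],
    on_demand_pool: Pool,
) -> Tuple[str, Pool, str]:
    """Try each spot pool in order; fall back to on-demand if all are exhausted.

    Returns (market, pool, instance_id).
    """
    for pool in spot_pools:
        try:
            return "spot", pool, launch("spot", pool)
        except CapacityUnavailable:
            continue  # this pool is out of spot capacity; try the next one
    # Every spot pool failed: pay on-demand rates rather than drop the workload.
    return "on-demand", on_demand_pool, launch("on-demand", on_demand_pool)
```

Keeping the launcher as a parameter makes the policy easy to test with a fake, as well as easy to point at a real provisioning call later.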
A big help — if implemented correctly
Organizations can end up spending extravagant amounts on the public cloud, so they need to find ways to cut costs wherever possible. Spot instances let them do just that, and a growing number of organizations are adopting them as an alternative to regular compute instances. However, spot instances must be implemented correctly, or they can do more harm than good. Many organizations have successfully run spot instances in production and now enjoy reduced costs with on-demand performance. The future is stateless, and spot instances are a natural fit for organizations running stateless, fault-tolerant, and highly flexible workloads.