Over the coming weeks we’ll dive into a few techniques that are critical to getting you the compute power you need, at a fraction of the cost. The areas we will focus on are:
Use of preemptible/low-priority/spot instances
Efficient on/off scaling of cloud resources
For this discussion, we'll focus on preemptible instances and their potential impact on cost. Each major cloud provider offers some form of low-cost resource: Google has Preemptible VMs, Microsoft has Low-Priority VMs, and Amazon has Spot Instances. These resources typically save you around 50% compared to standard on-demand pricing. They are inexpensive because they represent spare capacity, but you run the risk of being bumped at any time, without notice. Think of these resources as the Airbnb of compute: you can use the resource at a low rate, as long as no normal-rate request comes along. Preemption rates average around 10%, which means for every 100 frames, you'll need to manage restarting roughly 10 preempted tasks.
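The trade-off between the discount and the restart overhead can be sketched in a few lines of Python. The 50% discount and 10% preemption rate below are the averages cited above, and the model makes a worst-case assumption that every preempted frame re-runs once from scratch; the dollar figures are illustrative placeholders, not real pricing.

```python
# Rough cost model for preemptible rendering. The 50% discount and
# 10% preemption rate are the averages cited in the text; real
# numbers vary by provider, region, and time of day.

def expected_preemptions(frames, preemption_rate=0.10):
    """Expected number of tasks that will need a restart."""
    return frames * preemption_rate

def effective_cost(frames, on_demand_price, discount=0.50,
                   preemption_rate=0.10):
    """Total preemptible spend, counting re-run work.

    Worst-case assumption: a preempted frame is re-run once from
    scratch; progress-aware restarts would cost less.
    """
    spot_price = on_demand_price * (1 - discount)
    reruns = expected_preemptions(frames, preemption_rate)
    return (frames + reruns) * spot_price

# 100 frames at a hypothetical $1.00/frame on-demand price:
# roughly (100 + 10) * $0.50 = $55, versus $100 on demand.
total = effective_cost(100, 1.00)
```

Even with every tenth frame re-rendered in full, the run comes in at roughly half the on-demand price, which is why the restart overhead is usually worth managing rather than avoiding.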
How do you most effectively take advantage of these inexpensive compute resources, while mitigating the risk of missing your deadline? With a little planning on your part, Preemptible/Low-Priority/Spot Instances can be extremely effective tools to get your project done, at a fraction of the cost. As you begin your work, here are a few important areas to consider:
Time of Day/Week – Are you submitting your job to run overnight, right before leaving for work? Isn't everyone else? Cloud demand ebbs and flows like demand for any other resource, and we've seen preemption rates as much as double (to 20%) at peak times. Consider running at off hours relative to typical submission times; what counts as off hours will obviously depend on your location.
Render Time/Frame Chunking – How long does your render take locally? If, say, you have a heavily textured 3D render that takes 2 hours, you run a much greater risk of preemption than with a 1-2-minute 2D comp. Frame chunking compounds this issue. Typically, cloud services will spin up a single instance per frame, unless the user specifies frame chunking, whereby multiple frames are run sequentially on one instance. So that 2-hour 3D render with frame chunking becomes a 4-6-hour task, which increases preemption exposure. Frame chunking was more relevant in the days of compute-time minimums, but no longer. Depending on the scale your service can achieve, it's best to simply run all frames in parallel.
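One way to see why longer tasks are more exposed: if preemptions arrive at a roughly constant rate per hour, a task's chance of surviving falls off exponentially with its runtime. The sketch below uses an assumed (not measured) hourly preemption rate purely to illustrate the shape of the curve:

```python
import math

# Back-of-the-envelope exposure model: with a roughly constant
# hourly preemption rate, survival probability decays
# exponentially with runtime. The 0.05/hour rate is an
# illustrative assumption, not a measured figure.

def preemption_probability(runtime_hours, hourly_rate=0.05):
    """Probability a task of the given length is preempted at least once."""
    return 1 - math.exp(-hourly_rate * runtime_hours)

# A 2-hour render vs. the same render chunked 3 frames per instance:
single = preemption_probability(2)   # ~0.095
chunked = preemption_probability(6)  # ~0.259
```

Under this simple model, tripling the task length nearly triples the chance of losing the work in progress, which is the intuition behind running frames in parallel rather than chunking them.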
Automation, Where Possible – Many services support targeting these preemptible resources. At Conductor, we've also been focusing on automation techniques to ease the use of low-cost resources wherever possible. At submission, we allow you to specify the number of automatic retries to attempt if a preemption does, in fact, occur. Without this, a studio would need someone to monitor the job status and manually restart the failed tasks. We're also working on progress-optimized restarts after preemption, as well as preemption analytics based on submission time. To the extent you (or your service) can automate the process, your workflow will be streamlined.
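A hand-rolled version of that retry behavior might look like the sketch below. `submit_frame` and `PreemptedError` are hypothetical stand-ins for whatever your render service's API actually exposes; with a service that supports retries natively (as Conductor does), you would simply set a retry count at submission instead of writing this loop yourself.

```python
import time

# Minimal automatic-retry sketch. submit_frame and PreemptedError
# are hypothetical stand-ins for a real service's API; services
# with native retry support make this loop unnecessary.

class PreemptedError(Exception):
    """Raised when the instance running a task is reclaimed."""

def render_with_retries(submit_frame, frame, max_retries=3):
    """Re-submit a preempted frame up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return submit_frame(frame)
        except PreemptedError:
            if attempt == max_retries:
                raise
            time.sleep(0.1 * 2 ** attempt)  # brief backoff before retrying
```

The point of the wrapper is simply that no human has to watch the queue: a bumped task resubmits itself, and only a frame that fails repeatedly surfaces as an error.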
So, where do you go from here? First step: select a service (hopefully you'll give Conductor a look!) and begin some tests. Your individual situation will vary depending on global region, time of submission, scene complexity, etc., so run a few scout frames first to assess the situation. This will help you understand your specific environment and when you're likely to see the best outcomes. Getting good results from somewhat unreliable resources may seem daunting at first, but in reality, success is a recipe you can uncover with a bit of upfront effort. And with Preemptibles, success comes at significant savings.