Whether this is your first time delving into it, or you’ve already tested the waters, the mechanics of cloud rendering can be a bit overwhelming. Not all services or cloud providers operate the same way, and these differences in process and terminology can trip up the best of us. In an effort to clear up some of the murkiness, we’ve been keeping track of some of the most frequent points of confusion, and hope that this post can be a helpful resource to anyone looking to improve their understanding.
Types of Cloud
There are three fundamental cloud types that you will encounter in your cloud rendering research. We have laid out a quick overview of each, along with some examples for reference.
Public Cloud
Examples: AWS, GCP, Azure
Best for: Larger studios that already have in-house architecture or custom render management software built to connect them to the public cloud.
Public cloud refers to machines accessible to the public via a cloud connection. While they are available to anyone, the content you upload will be secure, and will remain so throughout the rendering process. Conductor leverages two of these providers for our managed service: Google Cloud Platform (GCP) and Amazon Web Services (AWS).
These resources are vast and highly scalable, but they require users to bring a fair amount of knowledge and effort to the table to access and manage them directly. This can be overwhelming to someone who is new to cloud rendering, and is most manageable with a knowledgeable team in place to handle the logistics and orchestration. You will likely require a render management package to aid in the orchestration process, such as OpenCue or AWS Thinkbox Deadline (with its AWS Portal feature).
Private Cloud/Render Farm
Examples: Privately leased colocated data centers (Colo), Rebus Render Farm
Best for: Small projects, flexible turnaround time
In essence, private cloud refers to privately reserved resources that are only available to a select group of users. In some ways, it is similar to public cloud—it will still require you to manage the logistics, setup, and software requirements of your rendering pipeline internally. The key benefit some find with private cloud is that it ensures resources are allocated without external competition. It can also be beneficial from a cost perspective if the need to render isn’t immediate, as you can pay less for lower priority timing on your jobs. The downside to this approach is that the capacity is much lower than public cloud, often capping out at several thousand machines.
Boutique regional render farms, which own and resell access to their own hardware, also fall into this category. These render farms differ a bit from leased data centers, as their compute resources may be available to their other customers as well (depending on their business model). This reintroduces resource competition into the equation. The upside for many is that these services may be able to charge customers a lower rate, because they own the compute resources and therefore control the price point. However, if the render farm has a large user base, its limited resource pool often leaves it unable to accommodate user needs in times of high demand. The cost savings can be worthwhile if you have a flexible turnaround time. As is the case with leased data centers, you may still need to bring your own licenses, depending on the render farm.
Orchestrated / Managed Services
Examples: Conductor, Zync, etc.
Best for: Most use cases, but especially if you wish to avoid building your own cloud pipeline and architecture, or for when you just want to hit the “go” button.
Different services will differ in their approach, but there are a few key points where they all tend to intersect, namely that the process of spinning up machines and connecting to them is handled for you. On top of this, other elements are usually managed as well. Some services require more manual intervention, such as requesting a specific number of machines or pre-defining the files required for the render. Others have tighter software plugins and management capabilities that automate the process.
As an example, at Conductor, we automatically detect scene dependencies and launch one machine for each frame in a submitted scene from our plugin. There are also real-time cost controls and user management features, as well as usage-based licensing bundled per-minute. The benefit to any of these offerings is a turnkey approach that lessens the need for in-house cloud expertise and allows artists to focus on the imagery. Each service will bring its own benefits to the table, so be sure to take note of what your priorities are and ask any necessary questions up front to ensure that your needs can be met.
Factors to consider
When deciding which approach is right for you, be sure to consider the fundamentals required for your workflow.
Storage
Examples: Object Storage, File Storage, Block Storage
Storage is the most critical and often misunderstood component of cloud workloads. When many people think about cloud storage, services such as Google Drive or Dropbox come to mind. The types of storage we are looking at in the context of this post aren’t about long term asset organization, as they are far more temporary in nature. Depending on your application and requirements, different types of storage are needed. Those differences generally center around read/write performance and frequency of access. For high-performance computing (HPC) workloads like rendering, storage should be mostly ephemeral, with the storage being the highest performance possible, but only active for the duration of the render time. On Conductor, we spin up these high-performance storage systems for each render job, then spin them down automatically. The resulting images are the only artifacts that remain, which we place into low-cost Object storage for eventual download.
As an incremental approach, some studios may also investigate a file caching strategy, where the local file storage is dynamically accessed by the cloud rendering machines. The benefits of this approach are that the render file structure doesn’t need to be statically determined when submitting the work, and that files are uploaded incrementally. However, this can also lead to longer render times and thus increased cost, as the cloud-based render instance is running (and billing) while it waits for files to become accessible.
In terms of storage on Conductor, the cost of the high-performance storage is included in our core-hour Conductor fee ($0.03/core-hr). The cheaper Object storage housing images and scene data is billed monthly and prorated by the day ($0.18/GB/month). Typically, studios will complete a project, then delete (purge) their Object storage. There is value in holding off on deleting your storage if the project is ongoing and you intend to keep using those assets, but otherwise, the best way to cut down on storage costs is to purge your storage promptly once all files are downloaded and the project is completed.
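As a rough illustration of how these two charges add up, the sketch below uses the rates quoted above ($0.03 per core-hour and $0.18 per GB per month). The 30-day month used for proration, the function names, and the sample job figures are assumptions made for the example, not a description of Conductor's actual billing logic, and the compute cost of the instances themselves is not included.

```python
CONDUCTOR_FEE_PER_CORE_HOUR = 0.03  # quoted above; covers the high-performance storage
OBJECT_STORAGE_PER_GB_MONTH = 0.18  # quoted above; billed monthly, prorated by the day
DAYS_PER_BILLING_MONTH = 30         # assumption: the proration basis is not stated


def conductor_fee(cores: int, hours: float) -> float:
    """Conductor fee for one machine with `cores` cores running for `hours` hours."""
    return cores * hours * CONDUCTOR_FEE_PER_CORE_HOUR


def object_storage_cost(gigabytes: float, days_stored: int) -> float:
    """Object storage charge for keeping `gigabytes` for `days_stored` days."""
    daily_rate_per_gb = OBJECT_STORAGE_PER_GB_MONTH / DAYS_PER_BILLING_MONTH
    return gigabytes * daily_rate_per_gb * days_stored


if __name__ == "__main__":
    # Hypothetical figures: one 16-core machine for 10 hours, 500 GB of stored data.
    print(f"Conductor fee:      ${conductor_fee(16, 10):.2f}")
    print(f"Storage, 10 days:   ${object_storage_cost(500, 10):.2f}")
    print(f"Storage, 30 days:   ${object_storage_cost(500, 30):.2f}")
```

Under these assumptions, purging 500 GB after 10 days rather than 30 reduces the storage charge from about $90 to about $30, which is why prompt purging matters more than the per-job fee for long-lived projects.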
Scale
When we refer to scale, or elasticity, we are specifically looking at the resources available for use at a given time. If you need a service with the (nearly unlimited) resource availability of public cloud, but want the ease of a managed service, your best bet is to look into the compute source of any managed service you are vetting. Conductor is multi-cloud, meaning that we are able to draw on the elastic resources of both AWS and GCP to ensure that our users’ compute needs are met.
Licensing
Licensing is a big factor, because when rendering, you will require a license for each parallel machine used. This can become a bit of a headache, as many providers will require you to bring your own licenses (BYOL) to the platform. Seeking a provider with metered licensing available will save you a lot of time and effort. At Conductor, our pricing already includes the metered, per-minute licensing charges. The charges are calculated either per core-hour or per instance (machine-hour), and then prorated to the minute. To learn more about licensing charges and how you can use this knowledge to reduce your render costs, check out this post on cost-efficient renders.
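To make the proration concrete, here is a minimal sketch of a metered license charge that is priced per hour but billed by the minute. The rates, the function name, and the choice to round partial minutes up are all assumptions for illustration; actual license rates vary by software and are not stated in this post.

```python
import math


def metered_license_charge(rate_per_hour: float, minutes_used: float, units: int = 1) -> float:
    """Hourly license rate prorated to the minute.

    units: number of cores for a per-core-hour license,
           or 1 for a per-instance (machine-hour) license.
    """
    if minutes_used < 0 or units < 1:
        raise ValueError("minutes_used must be >= 0 and units must be >= 1")
    billed_minutes = math.ceil(minutes_used)  # assumption: partial minutes round up
    return rate_per_hour * units * billed_minutes / 60


if __name__ == "__main__":
    # Hypothetical rates, chosen only to show the arithmetic.
    per_core = metered_license_charge(0.02, minutes_used=45, units=16)
    per_machine = metered_license_charge(0.50, minutes_used=45)
    print(f"Per-core-hour license, 16 cores, 45 min: ${per_core:.3f}")
    print(f"Per-instance license, 45 min:            ${per_machine:.3f}")
```

The same function covers both billing styles: a per-core-hour license scales with the instance size, while a per-instance license does not, so the cheaper option depends on how many cores you choose.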
Terms to know
When investigating cloud rendering for the first time, you may find there are an unfortunate number of terms that all mean the same thing, and some others that are likely to be unfamiliar. Here’s a glossary for some terms and concepts you may come across.
Instance type (Machine/Node type) – Specifies the hardware configuration used to run your job. Instances with more cores and memory are generally faster and able to handle heavier scenes, but cost more per hour. You are encouraged to run tests to find the most cost-efficient combination that meets your deadline.
Core – A machine (or instance) contains a given number of cores. The higher the number of cores, the faster the machine will generally render your content.
Scale/ Elasticity (“elastic resources”) – When we refer to scale, or elasticity, we are specifically looking at the resources available for use at a given time. Elastic resources allow you to spin up as many simultaneous machines as you need, which enables you to render your content more quickly.
Job – Jobs are user submissions. When you submit content to Conductor, whether through our plugins (located within your content creation software), command-line submission, or Conductor Companion’s Submission Kit, that submission will be classified as a job. Each submission is a unique job, and can be monitored and managed within the Conductor Dashboard. A user will submit multiple jobs throughout the life of a project.
Task – Tasks directly relate to frames, and exist within a job. For example, 96 frames assigned in a job will result in 96 tasks. Each task is assigned its own render node when the job runs.
Preemptible/ Spot vs. Standard Instances – “Preemptible” instances (as they are called on GCP), or “Spot” instances (as they are called on AWS) are priced lower than “standard” instances (“standard” refers to regularly priced instances). These low-cost instances are temporarily available in times of low cloud demand, and Conductor targets these machines by default through a selection in the plugin. These machines are roughly half the cost of standard machines but can be pulled from a render mid-cycle, losing all progress. While this happens infrequently (less than 5%), preemptions can cause cost overruns and delays if not correctly addressed. Depending on the urgency of your timeline, you will want to weigh the benefits of lower costs against the risk of preemption. We are happy to answer any additional questions you may have, and advise you, if needed.
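One way to weigh that trade-off is a first-order expected-cost estimate. The sketch below assumes a preempted task is retried once from scratch and that, on average, half of the task had already run when it was interrupted. The hourly rates are hypothetical; the 0.5 price ratio and the 5% preemption rate are taken from the rough figures above, read here as a per-task probability, which is an assumption.

```python
def expected_task_cost(
    hourly_rate: float,
    task_hours: float,
    preemption_rate: float = 0.0,
    lost_fraction: float = 0.5,
) -> float:
    """First-order expected cost of one task.

    preemption_rate: assumed probability that the task is preempted once.
    lost_fraction:   assumed average share of the task already rendered
                     (and paid for) when the preemption happens.
    Repeated preemptions of the same task are ignored in this simple model.
    """
    base_cost = hourly_rate * task_hours
    wasted_cost = preemption_rate * lost_fraction * base_cost
    return base_cost + wasted_cost


if __name__ == "__main__":
    standard_rate = 1.00                 # hypothetical $/hour
    preemptible_rate = standard_rate / 2  # "roughly half the cost"

    standard = expected_task_cost(standard_rate, task_hours=2.0)
    preemptible = expected_task_cost(preemptible_rate, task_hours=2.0, preemption_rate=0.05)
    print(f"Standard:    ${standard:.3f} per task")
    print(f"Preemptible: ${preemptible:.3f} per task (expected)")
```

In this simplified model the expected cost of preemptible instances stays close to half, so the bigger consideration is usually schedule risk: a preempted frame finishes later, which matters most for long-running tasks close to a deadline.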
Scout Jobs/ Scout Frames – This is a Conductor-specific feature, but it is one that we refer to frequently. While our site contains an online cost estimator, the most accurate way to project your render costs is to run some elements of the scene itself. The Conductor plugins have a built-in feature for this very purpose, called “scout job”.
When enabled, your entire scene uploads to Conductor, but only a select number of frames will render. These scout frames will render immediately, as a sort of test. When the submission reaches Conductor, only those tasks containing the specified scout frames are started. Other tasks are set to a holding state. This feature allows you to investigate your render’s correctness and evaluate per-frame cost before running the remaining frames. You can find more practical information about using Scout Jobs on our Tour page at conductortech.com/tour.
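The relationship between jobs, tasks, and scout frames can be sketched in a few lines. This is only an illustrative model: the `Task` class, the status names `"pending"` and `"holding"`, and the cost-projection helper are hypothetical and are not part of Conductor's API.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Task:
    frame: int
    status: str  # "pending" = starts right away, "holding" = waits to be released


def build_tasks(first_frame: int, last_frame: int, scout_frames: Iterable[int]) -> List[Task]:
    """Create one task per frame; only scout frames start immediately."""
    scouts = set(scout_frames)
    return [
        Task(frame=f, status="pending" if f in scouts else "holding")
        for f in range(first_frame, last_frame + 1)
    ]


def project_job_cost(scout_costs: List[float], total_frames: int) -> float:
    """Project the full job cost from the average cost of the scout frames."""
    if not scout_costs:
        raise ValueError("at least one scout frame cost is required")
    return sum(scout_costs) / len(scout_costs) * total_frames


if __name__ == "__main__":
    tasks = build_tasks(1, 96, scout_frames=[1, 48, 96])
    started = [t.frame for t in tasks if t.status == "pending"]
    print(f"{len(tasks)} tasks, scout frames started: {started}")

    # Hypothetical per-frame costs observed from the three scout frames.
    print(f"Projected job cost: ${project_job_cost([0.40, 0.50, 0.60], len(tasks)):.2f}")
```

Picking scout frames from the start, middle, and end of the range, as in this example, gives a more representative average than testing only the first frame, since per-frame render time often varies across a shot.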