AWS, already a popular cloud computing service among developers seeking the best-performing hardware for AI workloads, has announced a more flexible way to reserve GPU capacity for short-term needs.
Amazon is calling Amazon Elastic Compute Cloud (EC2) Capacity Blocks for ML an industry first. The service will allow customers to access GPUs on a consumption-based model.
The Seattle-based cloud giant hopes that more affordable options will give smaller organizations greater opportunities, helping to create a more diverse landscape.
AWS launches short-term, consumption-based GPU rental
In a press release, the company said: “With EC2 Capacity Blocks, customers can reserve hundreds of Nvidia GPUs colocated in Amazon EC2 UltraClusters designed for high-performance ML workloads.”
Customers can get access to the latest Nvidia H100 Tensor Core GPUs, which are suited to training foundation models and large language models, by specifying cluster size and duration, meaning they only pay for what they need.
Amazon noted that demand for GPUs is fast outpacing supply as more businesses get to grips with generative AI, and that many will find themselves either paying for more capacity than they need or leaving GPUs sitting dormant when they are not in use, or, worse still, both.
AWS users can reserve EC2 UltraClusters of P5 instances for between one and 14 days, up to eight weeks in advance. Cluster sizes are flexible, ranging from one to 64 instances, for a maximum of 512 GPUs.
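To make those limits concrete, here is a minimal Python sketch that checks a proposed reservation against the figures quoted above and works out how many GPUs it covers. It is an illustration only, not the AWS API: `BlockRequest`, `check_request`, `total_gpus` and `gpu_days` are hypothetical names, and the figure of eight GPUs per instance is inferred from the article's own numbers (512 GPUs across 64 instances).

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Limits as quoted in the article. GPUs per instance is an inference
# from "64 instances, or a maximum of 512 GPUs" (512 // 64 == 8).
MIN_INSTANCES, MAX_INSTANCES = 1, 64
MIN_DAYS, MAX_DAYS = 1, 14
MAX_LEAD_TIME = timedelta(weeks=8)
GPUS_PER_INSTANCE = 512 // 64


@dataclass(frozen=True)
class BlockRequest:
    """Hypothetical description of a reservation (not an AWS type)."""
    instances: int
    days: int
    start: date


def check_request(req: BlockRequest, today: date) -> list[str]:
    """Return a list of problems; an empty list means the request fits the limits."""
    problems = []
    if not MIN_INSTANCES <= req.instances <= MAX_INSTANCES:
        problems.append(f"instances must be {MIN_INSTANCES}-{MAX_INSTANCES}")
    if not MIN_DAYS <= req.days <= MAX_DAYS:
        problems.append(f"duration must be {MIN_DAYS}-{MAX_DAYS} days")
    if req.start < today:
        problems.append("start date is in the past")
    elif req.start - today > MAX_LEAD_TIME:
        problems.append("start date is more than eight weeks ahead")
    return problems


def total_gpus(req: BlockRequest) -> int:
    """Number of GPUs covered by the reservation."""
    return req.instances * GPUS_PER_INSTANCE


def gpu_days(req: BlockRequest) -> int:
    """GPU-days reserved, a rough measure of what the customer pays for."""
    return total_gpus(req) * req.days
```

Under these assumptions, a request for four instances over seven days would cover 32 GPUs, or 224 GPU-days, while a request for 65 instances or a 15-day block would be flagged as outside the published limits.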
AWS Compute and Networking VP David Brown commented: “With Amazon EC2 Capacity Blocks, we are adding a new way for enterprises and startups to predictably acquire Nvidia GPU capacity to build, train, and deploy their generative AI applications – without making long-term capital commitments. It’s one of the latest ways AWS is innovating to broaden access to generative AI capabilities.”
Pricing for the service can be found on the AWS website, where prospective users can also sign up to use the short-term, affordable option.