AWS Charges a High Premium for Graviton 4 Instances

Decades of improvements in server CPU performance and economics driven by Moore's Law have trained us all to expect a lower unit cost of performance with each successive processor generation, no matter what. But that doesn't always happen, especially in the 2020s, now that easy transistor shrinks and clock cranks are finally over.

That certainly hasn't happened with the street pricing of the Graviton 4 processors designed by Amazon Web Services, whose first R8g instances become generally available today. More Graviton 4-based instances will eventually launch on AWS, with variations in memory, local storage, and I/O capacity, but for now the basic R8g instances are only available in four regions.

The Graviton family of Arm-based CPUs designed by the cloud giant's Annapurna Labs division keeps getting bigger, and with the Graviton 4 generation, it's ready to take on even bigger tasks. The chip has more and better cores, and for the first time, dual-socket NUMA memory clustering, which brings 192 cores running at 2.8GHz backed by 1.5TB of main memory. The original Graviton 1 chip from November 2018 looks like a toy compared to the Graviton 4 that's available to rent today.

AWS launched the Graviton 4 in November of last year, and many details about the chip remain undisclosed. Ali Saidi, senior principal engineer at Annapurna Labs, filled in a few of the gaps in our salient spec sheet. Saidi explains that the Graviton 4 chip runs at 2.8GHz, close to the 2.7GHz we were expecting. By doubling the L2 cache per core to 2MB, the AWS team was able to reduce the amount of L3 cache on the processor, leaving room to expand the core count by 50 percent, to 96 per chip. The L3 cache works out to 384KB per core, roughly 5.3 times smaller than the 2MB of L2 each core gets, but at 36MB in total the L3 is shared across all 96 cores, providing a larger common memory pool than any single core's private L2.

"That's why each L2 has grown, so instead of one megabyte, it's two megabytes," Saidi says. "And the logic there is pretty simple. It takes ten cycles to reach that L2 cache, and it's still ten cycles at twice the capacity. It takes 80 to 90 cycles to reach that last-level cache. We want to put as much memory as possible as close as possible, and we put it about eight times closer."
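The per-core cache arithmetic is easy to check. A minimal sketch, using only the figures quoted above (96 cores, 2MB of L2 per core, 36MB of shared L3):

```python
# Per-core cache arithmetic for Graviton 4, from the figures in the article.
cores = 96
l2_per_core_kb = 2 * 1024           # 2MB of private L2 per core
l3_total_kb = 36 * 1024             # 36MB of L3 shared across all cores

l3_per_core_kb = l3_total_kb / cores
print(l3_per_core_kb)                       # 384.0 KB of L3 per core
print(l2_per_core_kb / l3_per_core_kb)      # each core has ~5.3x more L2 than its L3 share
```

This is why the trade makes sense: the doubled L2 sits at the same ten-cycle latency, while the shrunken per-core L3 share still adds up to a sizable shared pool.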

As we've previously reported, the Graviton 4 is based on Arm Ltd's "Demeter" V2 core. This is the same core that Nvidia uses in its 72-core "Grace" CPU and that many other chipmakers are opting to use. Among its many other features, the V2 core has four 128-bit SVE2 vector engines, which are useful for many HPC and AI workloads. We still don't know what process node AWS has chosen for the Graviton 4, the number of transistors in this beast, the number of PCI-Express 5.0 lanes it has, or its thermal design point.

We will learn these things eventually.

AWS has more than 2 million Graviton processors deployed across 33 regions and 100-plus availability zones, a key differentiator for the AWS cloud and a key resource for the Amazon conglomerate, with its diverse media, entertainment, retail, electronics, and cloud operations. Assume that Graviton 4 instances offer about 30 percent to 40 percent better price/performance than roughly equivalent X86 processors from Intel and AMD; we think it could be 20 percent to 25 percent this time around, but we would need some cross-architecture benchmarks to make a better assessment. On that assumption, the pricing we are seeing for the memory-optimized R8g instances at launch suggests that demand for Graviton 4 is high, so high that the customers who buy it could help parent company Amazon get its own Graviton 4 capacity for much less money than it otherwise would.

Feeds and speeds for the Graviton 4 instances, along with on-demand and reserved instance pricing, are as follows:

R8g instances range from 1 to 96 cores and 8GB to 768GB of memory for a single socket. There is a sliding scale of network bandwidth up to 40Gbps per instance, and Elastic Block Storage (EBS) bandwidth also scales, up to 30Gbps per socket. We consider the two-socket Graviton 4 instance to be a special case, given that there is only 50Gbps of network bandwidth and 40Gbps of EBS bandwidth for a two-socket machine. Furthermore, there is no instance size between 96 and 192 cores, as you would expect if all the physical machines Amazon builds were based on two-socket boxes.

On the other hand, this could just be how AWS allocates machines; for all we know, every Graviton 4 machine is a two-socket system. What is clear is that AWS, and therefore its customers, value NUMA memory sharing across processors, because with 192 cores and 1.5TB of memory, this is a node capable of running fairly large workloads, like the SAP HANA in-memory databases that will be certified on R8g instances.

Rahul Kulkarni, director of product management for the compute and AI/ML portfolio at AWS, says that in general, customers should expect at least a 30% increase in performance when moving from Graviton 3 to Graviton 4, but in many cases it’s 40% or higher. This depends on the nature of the workload and what integer or vector features the software uses.

The premium that AWS is demanding for Graviton 4 is quite high. Let's take a look at the Graviton 4 R8g instances, comparing them to the prior Graviton 2 and Graviton 3 instances:

Our estimated ECU (short for EC2 Compute Unit, a very old relative performance metric that AWS used in its early days) performance for Graviton 4 falls in line with the minimum 30 percent performance increase that Saidi and Kulkarni say you should expect. For the examples shown above, we assumed that the workloads were not memory-bound and applied the same relative performance to each CPU type regardless of memory. In the real world, we find that more memory sometimes means you get closer to the theoretical performance of a compute engine. If we had more data, we could estimate the performance impact of less memory on some of the smaller instance types. But we don't have much more data.

To get a relative price/performance, we calculated the cost of running each instance for a year at the current list price on AWS. Just for fun, we also estimated what the cost would be for R8gd instances that would have dedicated local flash storage like the other “gd” instances. As always, this is shown in bold red italics.
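The annualization itself is straightforward. Here is a minimal sketch of the arithmetic; the hourly rate shown is a placeholder for illustration, not an actual AWS list price:

```python
# Annualizing an hourly on-demand rate, as done for the price/performance comparison.
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, assuming the instance runs continuously

def annual_cost(hourly_rate: float) -> float:
    """Cost of running one instance around the clock for a year at list price."""
    return hourly_rate * HOURS_PER_YEAR

# Hypothetical example: an instance billed at $4.00 per hour on demand
print(annual_cost(4.00))  # 35040.0
```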

The bottom line is this: if you compare the top-end 64-core R7g instance to the top-end 96-core R8g instance, the R8g instance provides 30 percent more performance but costs 65 percent more, making its price/performance 26.9 percent worse.
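That 26.9 percent figure follows directly from the two ratios in the comparison, as this quick check shows:

```python
# Checking the bottom-line arithmetic: 30% more performance at a 65% higher price.
perf_ratio = 1.30   # top R8g performance relative to top R7g
price_ratio = 1.65  # top R8g annual cost relative to top R7g

# Cost per unit of performance, relative to the older instance
price_per_perf = price_ratio / perf_ratio
print(f"{(price_per_perf - 1) * 100:.1f}% worse")  # 26.9% worse
```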

We’ve heard echoes of this in past CPU launches. IBM’s ES/9000 mainframes in 1990. Sun Microsystems’ UltraSparc-III systems in 2001. Intel’s “Skylake” Xeon SP v1 processors from 2017. All of these cost more per unit of performance than their predecessors, and they did so at a particularly difficult time for their manufacturers, with competition about to heat up. With AWS, we suspect it’s more about pricing what the market can handle. But that’s exactly what IBM, Sun, and Intel would say. And in fact, they all said it at the time. We were there too.
