Following on from the general reveal at yesterday’s Data Center and AI Technology Premiere event in San Francisco, AMD has shared further details on the technology and implementation behind 4th Generation Epyc ‘Bergamo’ chips.
These processors use the all-new Zen 4c architecture which provides greater core-density opportunity than in-market Genoa, running all the way up to 128 cores and 256 threads per chip. Offering exactly the same ISA and software compatibility, changes between the two are actually minor, which speaks to the ingrained modularity between 4th Generation Epyc designs.
From a bare-bones architecture point of view, the only meaningful difference is a halving of L3 cache per core, from 4MB to 2MB, but this move alone doesn’t shrink the silicon enough to house 33 per cent more cores and threads whilst still staying within the desired 400W operating budget.
In fact, AMD outlines a 35 per cent area saving between the two designs, based on comparing an equal number of cores and attendant L2 cache, which, as the above graphic shows, is the same capacity on both chips. There’s some trickery afoot.
The first major change is with how AMD builds out Bergamo dies compared to incumbent Genoa. If you recall, Genoa uses up to 12 CCDs, each housing one CCX that’s home to eight cores, for a total of 96 on the 9654 processor. Bergamo, on the other hand, doubles the CCXes per CCD, thereby offering 16 cores each. Knowing there’s a maximum of eight CCDs in this instance, simple maths leads us to the top-line 128 cores. It’s worth repeating that though Bergamo has more cores overall, it carries fewer, denser CCDs. Is there a future possibility of a Genoa-like 12-CCD Bergamo chip with 192 cores and 384 threads? The mind boggles.
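The CCD arithmetic above can be checked with a short Python sketch. The `Topology` class and its field names are illustrative assumptions rather than anything AMD publishes, and the 12-CCD Bergamo entry is the purely hypothetical part mused about in the text, not a real product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """Hypothetical helper describing how a chip is built from CCDs and CCXes."""
    ccds: int            # core complex dies per processor
    ccx_per_ccd: int     # core complexes on each die
    cores_per_ccx: int   # cores in each complex
    smt: int = 2         # threads per core with SMT enabled

    @property
    def cores(self) -> int:
        return self.ccds * self.ccx_per_ccd * self.cores_per_ccx

    @property
    def threads(self) -> int:
        return self.cores * self.smt


# Figures taken from the article.
genoa_9654 = Topology(ccds=12, ccx_per_ccd=1, cores_per_ccx=8)
bergamo_9754 = Topology(ccds=8, ccx_per_ccd=2, cores_per_ccx=8)
# Speculative: a Genoa-like 12-CCD layout using Bergamo's 16-core CCDs.
hypothetical_12ccd = Topology(ccds=12, ccx_per_ccd=2, cores_per_ccx=8)

for name, chip in [
    ("Genoa 9654", genoa_9654),
    ("Bergamo 9754", bergamo_9754),
    ("Hypothetical 12-CCD Bergamo", hypothetical_12ccd),
]:
    print(f"{name}: {chip.cores} cores / {chip.threads} threads")
```

Fewer CCDs with twice the CCXes each is what gets Bergamo to 128 cores, and the same multiplication gives the speculative 192-core part.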
Going back to halving of L3 cache per core, do understand that it actually remains the same per traditional CCD, so each one is home to 32MB, as on Genoa, though the arrangement is slightly different insofar as it’s 2x16MB and not 1x32MB. Maximum power, too, remains at 400W, so the implication is that AMD reduces Bergamo frequencies to accommodate more cores.
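To make the cache arithmetic concrete, here is a minimal Python sketch using only the figures quoted in this article. The function names are hypothetical and chosen for illustration.

```python
def l3_per_core_mb(l3_per_ccd_mb: int, cores_per_ccd: int) -> float:
    """L3 cache available per core when a CCD's L3 is divided evenly."""
    return l3_per_ccd_mb / cores_per_ccd


def total_l3_mb(l3_per_ccd_mb: int, ccds: int) -> int:
    """Total L3 cache across all CCDs on the processor."""
    return l3_per_ccd_mb * ccds


# Both designs carry 32MB of L3 per CCD; only the core count sharing it differs.
genoa_per_core = l3_per_core_mb(32, cores_per_ccd=8)
bergamo_per_core = l3_per_core_mb(32, cores_per_ccd=16)

genoa_total = total_l3_mb(32, ccds=12)
bergamo_total = total_l3_mb(32, ccds=8)

print(f"Genoa:   {genoa_per_core:.0f}MB per core, {genoa_total}MB total")
print(f"Bergamo: {bergamo_per_core:.0f}MB per core, {bergamo_total}MB total")
```

The per-core halving comes entirely from packing 16 cores onto a CCD instead of eight. The lower total comes from using eight CCDs rather than 12.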
This relaxation of frequencies – which we’ll document below – enables AMD to remove some timing-related and buffering silicon. Furthermore, though L1 and L2 caches remain at the same capacity as on Genoa, AMD is using denser SRAM to save space. The upshot of this approach is likely to be slower peak performance. Last but not least, there is no 3D V-Cache-equipped model for Bergamo, enabling further saving of silicon through not provisioning for the requisite TSV technology.
Nevertheless, even if you add up these silicon savings, dropping the core-and-L2 cache area by 35 per cent is wholly impressive given AMD’s not using a smaller manufacturing node; it remains 5nm for compute and 6nm for IOD.
That’s a lot to take in, so here’s a simple table delineating the key characteristics between regular Genoa and Bergamo processors.
Specification | Genoa 9654 | Bergamo 9754 |
Max cores / threads | 96 / 192 | 128 / 256 |
CCXes per CCD | 1 | 2 |
Cores per CCX | 8 | 8 |
Total CCDs | 12 | 8 |
L3 cache per CCD | 32MB | 2x16MB |
Total L3 cache | 384MB | 256MB |
DDR5 channels | 12 | 12 |
PCIe 5.0 lanes | 128 | 128 |
Power budget range | 320 – 400W | 320 – 400W |
Base / boost frequencies | 2.4GHz / 3.7GHz | 2.25GHz / 3.1GHz |
Shipping to hyperscale customers now, Bergamo is productised as three processors. Epyc 9754 is the top-stack chip offering the full complement of 128 cores and 256 threads. Illustrating our power-saving point above, AMD reins in frequency from Genoa-based 9654’s 2.4GHz base and 3.7GHz boost to 2.25GHz and 3.1GHz here. That’s quite the drop, but as cloud-native applications tend to favour cores over frequency, this necessary frequency diminution – to keep within a 400W per-chip budget – makes sense.
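For a sense of scale, the sketch below works out the relative clock reductions from the quoted speeds: roughly 6 per cent at base and 16 per cent at boost. It also multiplies cores by base frequency, which puts Bergamo 25 per cent ahead on that measure. That product is a crude, assumed proxy for aggregate throughput, not a benchmark or an AMD-supplied metric, and it ignores cache, memory bandwidth and real-world boost behaviour.

```python
def pct_drop(old: float, new: float) -> float:
    """Percentage reduction going from old to new."""
    return (old - new) / old * 100


# Core counts and clocks as quoted in the article.
genoa = {"cores": 96, "base_ghz": 2.4, "boost_ghz": 3.7}
bergamo = {"cores": 128, "base_ghz": 2.25, "boost_ghz": 3.1}

base_drop = pct_drop(genoa["base_ghz"], bergamo["base_ghz"])
boost_drop = pct_drop(genoa["boost_ghz"], bergamo["boost_ghz"])


def aggregate_base_ghz(chip: dict) -> float:
    """Crude proxy (an assumption): cores multiplied by base clock."""
    return chip["cores"] * chip["base_ghz"]


uplift = aggregate_base_ghz(bergamo) / aggregate_base_ghz(genoa) - 1

print(f"Base clock drop:  {base_drop:.2f}%")
print(f"Boost clock drop: {boost_drop:.2f}%")
print(f"Cores x base GHz uplift: {uplift * 100:.1f}%")
```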
Bergamo also offers a non-SMT version of the 9754, suffixed with an S, running 128 cores and 128 threads, though it’s somewhat surprising it has the same 360W default TDP as the SMT-enabled chip. The baby of the bunch is the Epyc 9734, kitted out with 112 cores and 224 threads, presumably built by either turning off a CCD or by running 14 out of the 16 cores in each of the eight CCDs.
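The two guesses for how a 112-core part might be assembled can be checked by enumeration. The sketch below assumes every active CCD runs the same number of cores, and the `symmetric_configs` helper is hypothetical. AMD has not confirmed which layout the 9734 actually uses.

```python
def symmetric_configs(target_cores: int, max_ccds: int = 8,
                      cores_per_ccd: int = 16) -> list[tuple[int, int]]:
    """List (active CCDs, active cores per CCD) pairs that hit target_cores.

    Assumes an even split: every active CCD enables the same number of cores.
    """
    configs = []
    for ccds in range(1, max_ccds + 1):
        if target_cores % ccds != 0:
            continue  # cores would not divide evenly across this many CCDs
        active = target_cores // ccds
        if active <= cores_per_ccd:
            configs.append((ccds, active))
    return configs


options = symmetric_configs(112)
for ccds, active in options:
    print(f"{ccds} CCDs x {active} cores = {ccds * active} cores")
```

Under that even-split assumption, only two layouts reach 112 cores: seven fully enabled CCDs, or eight CCDs with 14 cores each. These are the same two possibilities suggested above.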
AMD’s proposed expansion into cloud-optimised processors is represented by the Zen 4c-powered Bergamo trio of chips. Using a tweaked design of baseline Genoa, core density and leanness have been achieved in a smart way. Rival Intel is not standing still in this area, however, and next year will see the release of Sierra Forest processors outfitted with up to 144 E-cores built on a leading-edge Intel 3 process.