At its Advancing AI event, AMD officially unveiled the Instinct MI300X, a dedicated compute accelerator designed to process massive amounts of data for generative AI and high-performance computing workloads. It is the most powerful compute accelerator in AMD’s lineup.
The Instinct MI300X is based on the CDNA 3 architecture and a chiplet layout, with dies manufactured on 5 nm and 6 nm process nodes. Advanced 3D packaging with through-silicon vias (TSVs) is used to assemble the chip. The base layer consists of four I/O dies that provide a 128-channel HBM3 memory interface, 256 MB of Infinity Cache, 64 PCIe 5.0 lanes, and 64 PCIe 4.0 lanes.
Stacked on top of the I/O dies are eight XCD compute dies, each containing 38 compute units based on the CDNA 3 architecture, for a total of 304 units. The accelerator is equipped with 192 GB of HBM3 memory with a bandwidth of 5.3 TB/s. Fourth-generation Infinity Fabric is supported for linking multiple MI300X accelerators into clusters. In total, the MI300X contains 153 billion transistors.
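The die counts above can be tallied with a few lines of Python. This is only an arithmetic sketch built from the figures quoted in the article; the even split of compute dies and cache across the four I/O dies is an assumption made for illustration, not something AMD's announcement states here.

```python
# Die-level figures for the MI300X as quoted in the article.
XCD_COUNT = 8             # compute dies stacked on the I/O dies
UNITS_PER_XCD = 38        # compute units per XCD
IO_DIE_COUNT = 4          # base-layer I/O dies
INFINITY_CACHE_MB = 256   # total Infinity Cache


def total_compute_units(xcd_count: int, units_per_xcd: int) -> int:
    """Return the number of compute units across all XCDs."""
    return xcd_count * units_per_xcd


total_units = total_compute_units(XCD_COUNT, UNITS_PER_XCD)

# Assumption (not stated in the article): resources are spread evenly
# across the four I/O dies.
xcds_per_io_die = XCD_COUNT // IO_DIE_COUNT
cache_per_io_die_mb = INFINITY_CACHE_MB // IO_DIE_COUNT

print(f"Total compute units: {total_units}")
print(f"XCDs per I/O die (assumed even): {xcds_per_io_die}")
print(f"Infinity Cache per I/O die (assumed even): {cache_per_io_die_mb} MB")
```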
AMD compares the MI300X with NVIDIA's H100 AI accelerator, which has 80 GB of HBM3 with a bandwidth of 3.35 TB/s. According to AMD, the MI300X is on par with the H100 in AI training but outperforms it by up to 1.6x in inference, that is, in running already trained AI models. For example, AMD promises 1.4 times the H100's performance on the Llama 2 large language model with 70 billion parameters, while the MI300X is said to offer 1.6 times higher throughput on the Bloom model with 176 billion parameters.
Up to eight Instinct MI300X accelerators can be combined on a single board. Such a platform will be able to compete with the NVIDIA HGX H100 system while offering higher performance and significantly more memory, the latter of which is very important for AI tasks.
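To illustrate why memory capacity matters, the sketch below estimates how much memory the weights of the two models named above would need in 16-bit precision and the minimum number of accelerators required just to hold them. It is a rough lower bound under stated assumptions: only the weights are counted (no KV cache, activations, or framework overhead), 1 GB is taken as 10^9 bytes, and the helper names are hypothetical.

```python
import math

BYTES_PER_PARAM_16BIT = 2  # FP16 and BF16 both use 2 bytes per value

# Parameter counts (in billions) and memory sizes quoted in the article.
MODEL_PARAMS_BILLIONS = {"Llama 2 70B": 70, "Bloom 176B": 176}
ACCELERATOR_MEMORY_GB = {"MI300X": 192, "H100": 80}


def weights_gb(params_billions: int, bytes_per_param: int = BYTES_PER_PARAM_16BIT) -> int:
    """Weight size in GB: (billions * 1e9 params * bytes) / 1e9 bytes per GB."""
    return params_billions * bytes_per_param


def min_accelerators(model_gb: int, accelerator_gb: int) -> int:
    """Fewest accelerators whose combined memory can hold the weights alone."""
    return math.ceil(model_gb / accelerator_gb)


for model, params in MODEL_PARAMS_BILLIONS.items():
    size = weights_gb(params)
    for name, capacity in ACCELERATOR_MEMORY_GB.items():
        count = min_accelerators(size, capacity)
        print(f"{model}: {size} GB of weights, at least {count} x {name}")
```

Real deployments need headroom beyond the weights, so the practical counts would be higher than these lower bounds.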
AMD calls the Instinct MI300X the most powerful AI compute accelerator. AMD’s eight-MI300X platform delivers 10.4 PFLOPS of FP16/BF16 performance. For comparison, the NVIDIA HGX H100 platform delivers 7.9 PFLOPS in the same formats. The HBM3 memory capacity of the AMD solution is also 2.4 times larger than that of the competing platform.
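The platform-level ratios can be checked from the per-accelerator figures quoted earlier. The sketch below assumes that the NVIDIA platform also uses eight accelerators with 80 GB each; the article gives the 80 GB figure for a single H100 but does not spell out the platform configuration.

```python
# Eight accelerators per board is stated for the MI300X platform and
# assumed here for the NVIDIA platform as well.
ACCELERATORS_PER_PLATFORM = 8

MI300X_GB = 192            # HBM3 per MI300X
H100_GB = 80               # HBM3 per H100
MI300X_PLATFORM_PFLOPS = 10.4   # FP16/BF16, as quoted by AMD
H100_PLATFORM_PFLOPS = 7.9      # FP16/BF16, as quoted by AMD


def platform_memory_gb(gb_per_accelerator: int,
                       count: int = ACCELERATORS_PER_PLATFORM) -> int:
    """Total HBM3 capacity of a platform with `count` accelerators."""
    return gb_per_accelerator * count


amd_gb = platform_memory_gb(MI300X_GB)
nvidia_gb = platform_memory_gb(H100_GB)
memory_ratio = amd_gb / nvidia_gb
compute_ratio = MI300X_PLATFORM_PFLOPS / H100_PLATFORM_PFLOPS

print(f"AMD platform memory:    {amd_gb} GB")
print(f"NVIDIA platform memory: {nvidia_gb} GB")
print(f"Memory ratio:  {memory_ratio:.1f}x")
print(f"Compute ratio: {compute_ratio:.2f}x")
```

Under these assumptions the memory ratio reproduces the 2.4x figure, and the quoted FP16/BF16 numbers work out to roughly a 1.3x compute advantage.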
AMD also introduced a new software platform, ROCm 6, optimized for Instinct MI300 series accelerators. According to the company, its specialized libraries for large language models improve accelerator performance by 2.6 times. Combined with other optimizations, this is said to make the MI300X with ROCm 6 about eight times faster overall than the MI250X running ROCm 5.
Together with the Instinct MI300X, AMD introduced the MI300A, a hybrid processor developed specifically for data centers and high-performance computing (HPC). It is essentially a server APU that combines a CPU and a powerful compute accelerator on one substrate.
Its structure is very similar to the MI300X layout, but the MI300A uses only six XCD dies on the CDNA 3 architecture and contains 146 billion transistors. The MI300A's three additional chiplets are CCD compute dies, each containing eight Zen 4 processor cores, for a total of 24 cores running 48 threads. The MI300A also carries 128 GB of integrated HBM3 memory with a bandwidth of 5.3 TB/s.
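The MI300A's makeup can be tallied in the same way. The core and thread counts come from the article; the GPU compute unit estimate assumes each XCD exposes 38 compute units, as quoted for the MI300X, which the article does not confirm for the MI300A.

```python
# MI300A figures quoted in the article.
CCD_COUNT = 3          # Zen 4 CPU dies
CORES_PER_CCD = 8
TOTAL_THREADS = 48
XCD_COUNT = 6          # CDNA 3 compute dies

# Assumption: same 38 compute units per XCD as quoted for the MI300X.
ASSUMED_UNITS_PER_XCD = 38

cpu_cores = CCD_COUNT * CORES_PER_CCD
threads_per_core = TOTAL_THREADS // cpu_cores      # implies 2-way SMT
compute_chiplets = XCD_COUNT + CCD_COUNT           # XCDs plus CCDs
estimated_gpu_units = XCD_COUNT * ASSUMED_UNITS_PER_XCD

print(f"CPU cores: {cpu_cores}")
print(f"Threads per core: {threads_per_core}")
print(f"Compute chiplets (XCD + CCD): {compute_chiplets}")
print(f"Estimated GPU compute units (assumed 38 per XCD): {estimated_gpu_units}")
```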
AMD specifies 61 TFLOPS of FP64 performance and 122 TFLOPS of FP32 performance for the Instinct MI300A. It also claims that the new product offers twice the performance per watt of the competing NVIDIA Grace Hopper solution. The latter, as a reminder, combines an NVIDIA accelerator with an Arm-based CPU.