In-depth interpretation of AMD Ryzen AI 300 series processors: comprehensive renewal

At Computex 2024, AMD launched the brand-new Ryzen AI 300 series processors for the mobile platform. In naming them, AMD skipped the 100 and 200 series and jumped straight to the 300 series, folding the trendy term "AI" into the name. Given this overhaul of the naming system, how should we read the Ryzen AI 300 series? And what changes and surprises do the new processors bring in terms of CPU microarchitecture, NPU, and GPU? Read on for our in-depth interpretation.

Ryzen AI 300 Series: New Naming, New Models, New AI

Previously, AMD's mobile processors were named "AMD Ryzen" followed by four digits, as in the Ryzen 7000 and Ryzen 8000 series. This time AMD has changed course and adopted a new convention: "AMD Ryzen AI" followed by a one-digit tier number, a letter suffix, and a three-digit model number, as in "AMD Ryzen AI 9 HX 375." The new scheme highlights the role of AI in the processor and makes it simpler for users to identify a given part.


We expect the Ryzen AI series to span multiple product tiers such as 9, 7, and 5, with suffixes such as "HX," "HS," or none at all. As for the numeric model, the higher the number, the higher the performance; the 375, for example, outranks the 365. We may therefore see models like "Ryzen AI 7 HX 350" in the future.

As for the letter suffix, so far we have only seen "HX," which has traditionally denoted high-performance parts. In the new products, however, HX combined with the tier number 9 indicates the brand level, positioning the part as high-end but not high-power. Which letter code will be used for future high-performance, high-power versions is not yet known.

Additionally, for the "300" in the Ryzen AI 300 series, AMD explains that this generation is its third generation of AI processors. So where are the first and second generations? Long-time readers of this publication will know that the Ryzen 7040 series pioneered the integration of an XDNA-architecture NPU in x86 processors, which we have covered in several articles, and that the Ryzen 8040 series further raised the NPU's compute throughput, making it AMD's second-generation AI PC processor.

Let's take a look at the product lineup. Since the family has just been announced, AMD has so far launched only three Ryzen AI 300 series processors: the Ryzen AI 9 HX 375, Ryzen AI 9 HX 370, and Ryzen AI 9 365. The first two both have 12 cores and 24 threads, a maximum boost clock of 5.1GHz, a default TDP of 28W that manufacturers can configure anywhere in the 15W to 54W range, and a built-in Radeon 890M GPU.

The only difference between the two is NPU throughput: the Ryzen AI 9 HX 375 delivers 55 TOPS versus 50 TOPS for the Ryzen AI 9 HX 370, giving the series the most powerful NPU in the current notebook market. The Ryzen AI 9 365 has slightly lower specifications, with 10 cores and 20 threads, a Radeon 880M GPU, and a maximum boost clock reduced to 5.0GHz.

Overall, the Ryzen AI 300 series is still at an early stage of its rollout: only high-end models are available so far, and the upper-mid-range and mainstream tiers have yet to be filled out. We look forward to AMD completing the full Ryzen AI 300 series lineup as soon as possible.

New architecture debut: Zen 5 + RDNA 3.5 + XDNA 2

After understanding the basic specifications and naming of the Ryzen AI 300 series processors, let's look at the content related to the architecture of these processors.

The AMD Ryzen AI 300 series processors use a single-chip design, with the product codename "Strix Point". Strix Point uses the TSMC N4P production process, which is the same as the Ryzen 9000 series desktop processors. We also briefly introduced the TSMC N4P process in the article about the Ryzen 9000 series desktop processors.

In its promotional material for N4P, TSMC notes that the process is derived from N5 and uses more EUV lithography layers. The "P" indicates a performance-oriented node: under the same conditions it improves performance by about 11% over N5 and by 6% over the original N4. In terms of energy efficiency, N4P is about 22% better than N5, and it also shrinks overall area by roughly 6% compared to N5, making it well suited to manufacturing high-performance processors.

The overall die area of Strix Point is about 232.5 square millimeters, considerably larger than the 178 square millimeters of the previous generation, the Ryzen 8000 series mobile processors, which suggests that Strix Point's overall performance uplift will be considerable. In terms of cache, thanks to the increased core count, Strix Point offers up to 12MB of L2 cache and 24MB of L3 cache, which is also one of the reasons for the significant growth in die area.

From an overall architectural perspective, Strix Point integrates a CPU, GPU, NPU, and numerous functional blocks, such as video processing, display output, PCIe controllers, memory controllers, and power management, making the overall structure quite complex. The block diagram provided by AMD shows that Strix Point includes a 4-core, 8-thread Zen 5 cluster with 16MB of L3 cache and an 8-core, 16-thread Zen 5c cluster with 8MB of L3 cache. In addition, it features an RDNA 3.5 GPU with 8 WGPs, an XDNA 2 NPU with 32 inference engines, video acceleration units, audio processing units, display control, the system bus, security units, and wireless connectivity units, among others.

In terms of external connectivity, Strix Point supports a 128-bit memory interface running LPDDR5X at 7500MT/s or DDR5 at 5600MT/s, 16 PCIe 4.0 lanes, 4 display output streams, and 8 USB ports, including 2 USB4, 1 USB-C 3.2, 2 USB-A 3.2 Gen 2, and 3 USB-A, along with I2C, SPI and eSPI, GPIO, and other functional blocks. It is worth noting that, among the units and modules listed above, while the Zen 5 CPU architecture has already appeared in the Ryzen 9000 series desktop processors, the RDNA 3.5 GPU architecture and the new-generation XDNA 2 NPU architecture are making their debut here.
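For readers who want to relate those memory options to actual bandwidth, the short sketch below shows the usual back-of-envelope calculation (bus width times transfer rate). It gives only the theoretical peak, and the function name is ours for illustration, not any AMD tool.

```python
# Theoretical peak bandwidth from bus width and transfer rate; illustrative only.
def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: int) -> float:
    """Peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mt_s * 1e6 / 1e9

# The two memory configurations Strix Point supports:
print(f"128-bit LPDDR5X-7500: {peak_bandwidth_gb_s(128, 7500):.0f} GB/s")   # ~120 GB/s
print(f"128-bit DDR5-5600:    {peak_bandwidth_gb_s(128, 5600):.1f} GB/s")   # ~89.6 GB/s
```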

Special mention should be made of the CPU portion of Strix Point. The chip integrates 12 CPU cores: 4 full-size Zen 5 cores and 8 compact Zen 5c cores. The Zen 5c core is a density-optimized variant, presumably following the same approach AMD took with Zen 4 and Zen 4c. On Zen 4c, AMD achieved a roughly 35% reduction in core area through a high-density compact design, streamlined modules, and optimized physical layout, while keeping performance largely intact and improving power consumption and performance per watt accordingly. We will discuss Zen 5 and Zen 5c in more detail in the CPU microarchitecture section below.

Zen 5 and Zen 5c: Homogeneous Hybrid Core Design Scheme

AMD has adopted the brand-new Zen 5 architecture in Strix Point. We analyzed the improvements of Zen 5 in detail in our earlier coverage of the Ryzen 9000 series desktop processors, so here is only a brief summary. AMD has reworked the front-end, execution, and back-end portions of Zen 5. The front end, for example, adopts a next-generation branch predictor that enables zero-bubble conditional branch prediction and pairs it with a larger TAGE predictor, raising overall efficiency. In terms of decode capability, the Zen 5 front end uses two 4-wide decode clusters that together can decode up to 8 x86 instructions per cycle; in SMT mode, each decode cluster serves one thread.

Overall, compared with Zen 4, Zen 5 brings significant changes across the architecture, especially in the floating-point unit and the front end, which translates into a considerable performance uplift over Zen 4. AMD has provided a table comparing Zen 5 with Zen 4, showing that the main improvements lie in a wider, deeper, and more parallel overall design, ultimately delivering a 16% IPC increase for Zen 5 over Zen 4.

Next, let's look at Zen 5c. Zen 5c is a compact core designed by AMD for high-density computing. Data released by AMD shows that, compared with Zen 5, the area of each Zen 5c core is reduced by about 25%. AMD has not disclosed how the reduction was achieved, but based on existing techniques it likely combines a high-density process library, the removal of many devices designed for very high frequencies, and smaller caches. The base frequencies of the two cores are the same, but while Zen 5 can boost up to 5.1GHz, Zen 5c is limited to 3.3GHz.

Looking at the specific product, AMD's diagram shows that Strix Point's 24MB of L3 cache is split 16+8: the 4 Zen 5 cores share 16MB of L3, and the 8 Zen 5c cores share 8MB. As a result, with less L3 cache and a lower maximum frequency, Zen 5c is geared more toward energy efficiency in actual use, while throughput characteristics and ISA support remain fully consistent with Zen 5.

Zen 5c should therefore be better suited to background tasks and to raising overall throughput in multi-threaded scenarios, improving energy-efficiency scaling. However, because the Zen 5 and Zen 5c cores sit in two separate clusters, transferring data between a Zen 5c core and a Zen 5 core should incur extra latency, which means task scheduling needs to be optimized. For the 4 Zen 5 cores, the L3 cache works out to the same 4MB per core as on desktop processors, and they also boost up to 5.1GHz. Tasks with high performance demands will therefore run very well on the 4 Zen 5 cores, especially cache-sensitive applications such as games, where the gap to desktop processors narrows further.

AMD has provided a comparison of Zen 5 and Zen 5c, which can be summarized as follows. First, Zen 5 is designed for the highest frequency and highest performance, so it can run at high clocks and has the largest per-core L3 allocation of 4MB, or 16MB shared among 4 cores. Second, Zen 5c is optimized for scalable performance, mainly by increasing the core count, so it runs at lower frequencies, offers better power efficiency, and has a reduced L3 capacity; after all, cache is one of the most transistor-hungry components.

Finally, in terms of software scheduling, unlike Intel's heterogeneous core design, Zen 5 and Zen 5c are homogeneous cores with an identical ISA, so scheduling is comparatively simple and there are no pitfalls such as "the big cores support AVX-512 but the small cores do not." Moreover, Zen 5c also supports SMT. AMD can balance performance and efficiency to make the end result more stable and reliable. However, whether scheduling tasks across the two core types introduces additional latency, and whether further optimization is needed, will only become clear as more details emerge.
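To make the scheduling discussion concrete, the sketch below shows how a developer could pin work to a chosen subset of logical CPUs on Linux. The index ranges are purely hypothetical assumptions for illustration; the actual mapping of logical CPUs to Zen 5 and Zen 5c cores must be read from the operating system's topology information.

```python
# Illustrative sketch: restrict the current process to a subset of logical CPUs.
# The index ranges below are HYPOTHETICAL -- the real mapping of logical CPUs to
# Zen 5 vs. Zen 5c cores depends on the platform and should be queried from the
# OS (e.g. /sys/devices/system/cpu on Linux). Linux-only API.
import os

ZEN5_CPUS = set(range(0, 8))     # assumed: 4 Zen 5 cores x 2 SMT threads
ZEN5C_CPUS = set(range(8, 24))   # assumed: 8 Zen 5c cores x 2 SMT threads

def pin_to(cpus: set) -> None:
    """Bind this process to the given logical CPUs."""
    os.sched_setaffinity(0, cpus)

if __name__ == "__main__":
    pin_to(ZEN5_CPUS)  # e.g. keep a latency-sensitive task on the full-size cores
    print("Now restricted to CPUs:", sorted(os.sched_getaffinity(0)))
```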

Zen 5c has a smaller area and better energy efficiency than Zen 5, but AMD has not provided further details for now, so we do not know how much Zen 5c improves on Zen 5's efficiency at the same frequency. When AMD introduced Zen 4c, however, it compared its efficiency against Zen 4 and showed that below 20W, Zen 4c had already overtaken Zen 4 in efficiency while delivering higher performance. Zen 5c, which follows the same design philosophy, should behave similarly, and we look forward to more details.

In addition, in terms of ISA, Zen 5 adds new instruction-set extensions over previous generations, including MOVDIRI/MOVDIR64B, VEX-encoded VNNI, VP2INTERSECT, and PREFETCH-related instructions, some of which build on AVX-512 while the rest mainly target AI computation. There are also new instructions for heterogeneous topology enumeration and PMC virtualization.
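As a practical aside, the presence of such extensions can be checked from software. The minimal sketch below uses the third-party py-cpuinfo package; the flag spellings follow Linux /proc/cpuinfo conventions and may differ on other operating systems, so treat the exact names as assumptions.

```python
# Check which of the instruction-set extensions mentioned above the running CPU
# reports. Requires the third-party py-cpuinfo package (pip install py-cpuinfo).
from cpuinfo import get_cpu_info

# Flag names as they typically appear in Linux /proc/cpuinfo; spellings may vary.
WANTED = ["avx512f", "avx512_vnni", "avx_vnni", "avx512_vp2intersect",
          "movdiri", "movdir64b"]

flags = set(get_cpu_info().get("flags", []))
for name in WANTED:
    print(f"{name:22s} {'supported' if name in flags else 'not reported'}")
```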

RDNA 3.5: The strongest integrated graphics goes a step further

In addition to the CPU microarchitecture, AMD has also deployed a brand-new GPU based on the RDNA 3.5 architecture in Strix Point, and it has shared some information about the design. In terms of overall scale, the GPU integrated in Strix Point is larger, containing a single module with 8 WGPs (Work Group Processors) for a total of 1024 stream processors, 32 AI accelerators, and 16 ray tracing accelerators. On the render back end, RDNA 3.5 provides 4 units with 16 ROPs (raster operations units).

The scale of Strix Point's GPU has increased significantly compared to its predecessor, and naturally, its performance has also seen a substantial boost. At a frequency of 2.9GHz, the GPU of Strix Point can deliver an FP32 throughput of 11 TFLOPS, which is about a 30% increase in computing power compared to its predecessor, Phoenix.
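As a back-of-envelope check on that figure, the sketch below reproduces the usual way such peak numbers are counted. The assumption that each stream processor performs one FMA (2 FLOPs) per clock, doubled by RDNA 3-style dual issue, is ours and is only one plausible accounting.

```python
# Rough peak FP32 estimate for the Strix Point GPU; the dual-issue factor is an
# assumption about how RDNA 3.x peak throughput is typically counted.
stream_processors = 1024
clock_ghz = 2.9
flops_per_sp_per_clock = 2 * 2   # FMA (2 FLOPs) x assumed dual issue

tflops = stream_processors * flops_per_sp_per_clock * clock_ghz / 1000
print(f"Theoretical FP32 peak: {tflops:.1f} TFLOPS")   # ~11.9 TFLOPS, in line with the ~11 TFLOPS quoted
```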

In terms of architectural improvements, RDNA 3.5 updates the texture subsystem with doubled texture sampling rates and point-sampling acceleration, so overall texture quality is better preserved. The shader subsystem doubles the interpolation rate and the comparison rate, which helps with fine detail in high-quality images. The new architecture also improves the shader SALUs (scalar ALUs) and VGPRs (vector general-purpose registers). In rasterization, it introduces batch processing capabilities that raise hardware efficiency. On the memory side, RDNA 3.5 supports more advanced memory compression techniques, which, especially when paired with LPDDR5, can bring performance gains and better efficiency.

AMD has provided some test data, for example, in 3DMark, Strix Point has achieved a 32% increase in the 3DMark Time Spy score and a 19% increase in the Night Raid score compared to the previous generation, both at the same 15W TDP, which is quite satisfactory.

That said, in full-featured or thin-and-light laptops without a discrete GPU, Strix Point's integrated graphics performs above the level of entry-level discrete cards and is sufficient for everyday 3D workloads. Its selling point is the balance between performance and battery life; a low-power device cannot be expected to match the performance and scale of a high-performance discrete GPU. Gamers may therefore want to wait for the high-performance Zen 5-based mobile chips AMD will release later.

XDNA 2 architecture: Larger scale, better energy efficiency

A notable feature of AMD's mobile SoCs is the inclusion of an NPU, a core dedicated to AI computation. From the first-generation Ryzen 7040 series to the second-generation Ryzen 8040 series and now to Strix Point, i.e. the Ryzen AI 300 series, AMD's AI PC processors have evolved to their third generation.

The NPU in Strix Point has been updated as well. Where previous products used the XDNA architecture, the new NPU employs the XDNA 2 architecture, which is larger in scale and offers a better energy-efficiency ratio, making its performance and user experience on mobile devices all the more anticipated.

AMD has outlined some of the architectural changes in XDNA 2. First, the architecture offers broader and richer support for generative AI, and AMD has made accompanying software optimizations, including for models such as Stable Diffusion. Second, the new NPU's compute throughput has increased significantly, reaching up to 55 TOPS at INT8.

XDNA 2 also introduces support for "Block FP16," a block floating-point technique that consumes roughly the compute of 8-bit operations, and runs at correspondingly high speed, while producing results close to 16-bit computation. AI workloads therefore no longer have to choose between speed and accuracy; they can have both. It is worth mentioning that AMD is the first vendor to bring block floating-point technology to an NPU.
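To illustrate the general idea of block floating point, here is a minimal numpy sketch in which a block of values shares a single exponent while each element stores only a small integer mantissa. This is a conceptual approximation of the technique, not AMD's actual Block FP16 bit layout; the block size and mantissa width are our own choices.

```python
# Conceptual sketch of block floating point (shared exponent per block).
# NOT AMD's actual Block FP16 format; block size and mantissa width are arbitrary.
import numpy as np

def block_quantize(x: np.ndarray, block: int = 16, mant_bits: int = 8):
    """Split x into blocks; store one shared scale per block and int8 mantissas."""
    x = x.reshape(-1, block)
    # Shared exponent chosen so the largest element in each block fits in mant_bits.
    exp = np.ceil(np.log2(np.abs(x).max(axis=1, keepdims=True) + 1e-30))
    scale = 2.0 ** (exp - (mant_bits - 1))
    mant = np.clip(np.round(x / scale), -(2 ** (mant_bits - 1)), 2 ** (mant_bits - 1) - 1)
    return mant.astype(np.int8), scale

def block_dequantize(mant: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (mant.astype(np.float32) * scale).reshape(-1)

vals = np.random.randn(64).astype(np.float32)
mant, scale = block_quantize(vals)
print("max reconstruction error:", np.abs(vals - block_dequantize(mant, scale)).max())
```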

Third, compared with the previous generation, XDNA 2 offers twice the concurrent spatial streams and 1.6 times the on-chip cache. "Concurrent spatial streams" refers to XDNA's own computing approach, which AMD calls spatial streams, as distinct from the traditional 2D computing approach. Looking at the compute units, the XDNA 2 NPU contains 32 AI engine tiles, 12 more than the previous generation, and each AI engine has twice as many MACs as before, which is where the doubled concurrent-spatial-stream figure comes from. As for cache, more on-chip memory means higher overall computing efficiency.
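Taking those figures at face value, the raw compute scaling can be estimated with simple arithmetic. The sketch below ignores clock-speed changes, so it is only a rough comparison of raw MAC capacity, not a measured TOPS ratio.

```python
# Rough scaling of raw MAC capacity implied by the figures above:
# AI engine tiles go from 20 to 32, and MACs per engine double.
prev_engines, new_engines = 20, 32
macs_per_engine_ratio = 2.0

raw_scaling = (new_engines / prev_engines) * macs_per_engine_ratio
print(f"Raw MAC capacity vs. previous generation: {raw_scaling:.1f}x")  # 3.2x
```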

Lastly, the XDNA 2 architecture adds support for non-linear functions and expands its sparse-computing capabilities. On the power side, XDNA 2 implements power gating for each column of compute units, which, combined with process and design improvements, yields an overall 2x improvement in performance per watt. Taken together, these factors make it the NPU with the strongest AI compute available today.

New Architecture, New Exploration: AMD's Leap Forward in Mobile Devices

This article provides an interpretation of the model naming, technological, and architectural improvements of the AMD Ryzen AI 300 series processors. As for performance, since this processor has been officially released and we have already tested it, we will not go through AMD's performance data one by one in this article. Readers who wish to understand the processor's performance are advised to check our review articles.

In general, the Ryzen AI 300 series represents AMD's most significant update and greatest improvement in mobile processors in recent years. With this series, we see AMD innovating boldly from the SoC level down to the CPU, GPU, and NPU microarchitectures, with some of the improvements appearing on a mobile platform for the first time.

In terms of CPU microarchitecture, the hybrid pairing of Zen 5 and Zen 5c also appears for the first time at the top of the product line; in the previous generation, the Zen 4 + Zen 4c pairing was seen only in the mid-range Ryzen 5 and entry-level Ryzen 3 series. The payoff of these upgrades is clear: a significant gain in energy efficiency and rapid support for today's popular AI workloads.

Laptops equipped with the Ryzen AI 300 series processors are already on the market. In terms of positioning, the series is taking an early lead in the high-end ultrabook and all-round laptop market, the same segment targeted by Intel's Lunar Lake, officially slated for release in September. We will therefore soon see a new round of competition between AMD and Intel, and as consumers we stand to get more value out of that fierce competition, which is something to look forward to.
