Author: Jamal Ayoubi
Published: 15 Jul 2021
Form Number: LP1492
PDF size: 5 pages, 50 KB
Abstract
Understanding how to configure AMD EPYC Gen 2 and Gen 3 based servers for minimal memory latency is essential for getting the highest possible performance. This paper explains the differences in how memory latency scales with memory speed on AMD EPYC Gen 2 and Gen 3 processors and provides guidance on which memory speed to select for both processor families. The intended audience is system administrators and technical professionals who are responsible for server performance.
Introduction
Configuring AMD EPYC Gen 2 and Gen 3 based servers for minimal memory latency is essential for getting the highest possible performance in applications that favor lower memory latency over peak memory bandwidth, such as OLTP database workloads running on SQL Server. Unlike on most servers, on AMD EPYC Gen 2 based systems the highest configurable memory speed does not always deliver both the lowest memory latency and the highest memory bandwidth. Understanding how memory latency scales with memory speed on these platforms is therefore necessary to achieve the highest levels of performance.
AMD EPYC Gen 2 and Gen 3 processors both support a maximum memory speed of 3200 MHz; however, the two generations implement memory speed differently, and as a result memory latency responds differently to the configured memory speed. This performance brief explains the differences in memory speed implementation on AMD EPYC Gen 2 and Gen 3 processors and provides guidance on how to set memory speed to achieve the lowest possible memory latency.
Understanding Memory Bus Speeds
AMD EPYC Gen 2 and Gen 3 processors support DDR4 memory. Double Data Rate (DDR) technology transfers data twice per clock cycle: once on the rising edge and once on the falling edge of the clock signal. Because of this, the reported memory speed is twice the true memory clock frequency. Strictly speaking, the reported speed is a data rate in megatransfers per second (MT/s), although it is conventionally quoted in MHz.
AMD EPYC processors also feature an Infinity Fabric that serves as an interconnect between the CPU cores and main memory. When optimally configured, the clock speed of this Infinity Fabric is typically equal to the true memory clock frequency, or half of the reported memory speed. For example, at 2933 MHz memory speed (1467 MHz memory clock frequency), the Infinity Fabric frequency is configured to 1467 MHz. Maintaining this 1:1 ratio between the memory clock frequency and the Infinity Fabric frequency yields the best memory latency.
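The clock arithmetic described above can be illustrated with a short Python sketch. It assumes the three nominal speed grades used in this paper and rounds half-up to whole MHz so that 2933 MHz maps to 1467 MHz, as in the example; the function names are hypothetical and do not correspond to any Lenovo or AMD tool.

```python
def memory_clock_mhz(memory_speed: int) -> int:
    """Return the true memory clock (MHz) for a reported DDR4 speed.

    DDR transfers data on both clock edges, so the true clock is half
    the reported speed. Integer (n + 1) // 2 rounds half-up, so
    2933 -> 1467, matching the example in the text.
    """
    return (memory_speed + 1) // 2


def is_one_to_one(memory_speed: int, fabric_mhz: int) -> bool:
    """True when the Infinity Fabric clock equals the memory clock (1:1)."""
    return memory_clock_mhz(memory_speed) == fabric_mhz


for speed in (2666, 2933, 3200):
    print(f"{speed} MHz memory speed -> {memory_clock_mhz(speed)} MHz memory clock")

print(is_one_to_one(2933, 1467))  # True
```

The rounding choice only matters for 2933 MHz; the other two grades divide to whole MHz (after truncating 2666.67 to 2666).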
On Lenovo ThinkSystem SR645 and SR665 servers with AMD EPYC Gen 2 or Gen 3 processors, the memory bus can run at 2666 MHz, 2933 MHz, or 3200 MHz. The speed depends on the RDIMM selection and the number of DIMMs installed in each memory channel. With Performance+ RDIMMs, the ThinkSystem SR645 and SR665 servers support a memory bus speed of up to 3200 MHz with two DIMMs installed in each memory channel.
For more information, see the ThinkSystem SR645 and SR665 product guides:
- SR645 Product Guide: https://lenovopress.com/lp1280
- SR665 Product Guide: https://lenovopress.com/lp1269
AMD EPYC Gen 2
On AMD EPYC Gen 2 processors, the maximum Infinity Fabric frequency is 1467 MHz. A configured memory speed of 3200 MHz corresponds to a 1600 MHz memory clock, which exceeds this limit, so the 1:1 ratio between the memory clock frequency and the Infinity Fabric frequency can no longer be maintained. This condition is called “decoupling”, and it adds roughly 12 ns of memory latency to synchronize the two different clock domains.
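The following Python sketch models when decoupling occurs on AMD EPYC Gen 2. It assumes, for illustration, that the Infinity Fabric runs at its 1467 MHz ceiling once the memory clock exceeds it, and it treats the roughly 12 ns penalty quoted above as a fixed value; real penalties are measured, not computed, and vary by platform. All names are hypothetical.

```python
# Values taken from this paper; the 12 ns penalty is approximate.
GEN2_MAX_FABRIC_MHZ = 1467
DECOUPLING_PENALTY_NS = 12


def memory_clock_mhz(memory_speed: int) -> int:
    # Half the reported DDR4 speed, rounded half-up (2933 -> 1467).
    return (memory_speed + 1) // 2


def fabric_clock_mhz(memory_speed: int, max_fabric_mhz: int) -> int:
    # Assumption: the fabric tracks the memory clock 1:1 up to its ceiling,
    # then stays at the ceiling.
    return min(memory_clock_mhz(memory_speed), max_fabric_mhz)


def is_decoupled(memory_speed: int, max_fabric_mhz: int) -> bool:
    # Decoupled whenever the fabric cannot match the memory clock.
    return fabric_clock_mhz(memory_speed, max_fabric_mhz) != memory_clock_mhz(memory_speed)


for speed in (2666, 2933, 3200):
    decoupled = is_decoupled(speed, GEN2_MAX_FABRIC_MHZ)
    penalty = DECOUPLING_PENALTY_NS if decoupled else 0
    print(f"{speed} MHz: memory clock {memory_clock_mhz(speed)} MHz, "
          f"fabric {fabric_clock_mhz(speed, GEN2_MAX_FABRIC_MHZ)} MHz, "
          f"decoupled={decoupled}, extra latency ~{penalty} ns")
```

Only the 3200 MHz row reports decoupling. Passing a 1600 MHz ceiling instead models the AMD EPYC Gen 3 behavior described in the next section, where none of the three speeds decouple.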
The following table shows the relationship between the different clock frequencies and the resulting memory latency and bandwidth on AMD EPYC Gen 2 processors.
About the measurements: Memory latency was measured with Multichase (average local-node latency reported), and memory bandwidth was measured with AMD stream-dynamic (STREAM Triad bandwidth reported). All measurements were made in a controlled lab environment. Actual customer measurements may vary depending on configuration and application workload.
Because of the decoupling latency penalty at 3200 MHz, 2933 MHz delivers the lowest memory latency, while 3200 MHz delivers the highest memory bandwidth. Understanding this latency-versus-bandwidth tradeoff between 2933 MHz and 3200 MHz is key to choosing the optimal memory speed for your workload. Some workloads benefit from higher peak memory throughput, while others are latency sensitive and perform better with lower memory latency, even at the cost of peak memory bandwidth.
AMD EPYC Gen 3
AMD EPYC Gen 3 processors support a maximum Infinity Fabric frequency of 1600 MHz, which matches the 1600 MHz memory clock at the 3200 MHz maximum memory speed. As a result, the memory clock and the Infinity Fabric are never decoupled at any supported memory speed, and memory latency decreases steadily as memory speed increases.
The following table shows the relationship between the different clock frequencies and the resulting memory latency and bandwidth on AMD EPYC Gen 3 processors.
At a memory speed of 3200 MHz, the Infinity Fabric frequency is set to 1600 MHz and the 1:1 ratio with the memory clock frequency is maintained. This results in the lowest memory latency as well as the highest memory bandwidth. Because there is no latency penalty at 3200 MHz, the best memory performance is achieved by simply setting the memory speed to the highest speed supported by the DIMM selection and memory population.
Conclusion
While both AMD EPYC Gen 2 and Gen 3 processors support a maximum memory speed of 3200 MHz, they differ in the maximum frequency the Infinity Fabric clock can support. As a result, memory latency behavior changes from one generation to the next. It is important to understand these differences, and the demands of different workloads, to ensure memory speed is set optimally:
- For AMD EPYC Gen 2 processors, 2933 MHz results in the lowest memory latency while 3200 MHz provides the highest memory bandwidth.
- For AMD EPYC Gen 3 processors, 3200 MHz provides the lowest memory latency and the highest memory bandwidth.
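The guidance above can be condensed into a small decision helper. This is a hypothetical sketch rather than a Lenovo or AMD tool: it assumes the caller already knows the highest memory speed permitted by the DIMM selection and memory population, and whether the workload is latency sensitive (such as OLTP) or bandwidth bound.

```python
from enum import Enum


class Generation(Enum):
    GEN2 = 2  # AMD EPYC Gen 2: Infinity Fabric limited to 1467 MHz
    GEN3 = 3  # AMD EPYC Gen 3: Infinity Fabric supports 1600 MHz


def recommended_memory_speed(generation: Generation,
                             max_supported_speed: int,
                             latency_sensitive: bool) -> int:
    """Return a memory speed (MHz) following the guidance in this paper.

    max_supported_speed is the highest speed the DIMM selection and
    memory population allow (2666, 2933, or 3200 MHz).
    """
    if (generation is Generation.GEN2
            and latency_sensitive
            and max_supported_speed > 2933):
        # Gen 2 at 3200 MHz decouples the fabric; 2933 MHz has lower latency.
        return 2933
    # Gen 3, or a bandwidth-bound Gen 2 workload: use the highest speed.
    return max_supported_speed


print(recommended_memory_speed(Generation.GEN2, 3200, latency_sensitive=True))   # 2933
print(recommended_memory_speed(Generation.GEN2, 3200, latency_sensitive=False))  # 3200
print(recommended_memory_speed(Generation.GEN3, 3200, latency_sensitive=True))   # 3200
```

Whether a given Gen 2 workload counts as latency sensitive is a judgment best confirmed by benchmarking that workload at both 2933 MHz and 3200 MHz.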
About the author
Jamal Ayoubi is a Systems Performance Verification Engineer in the Lenovo Infrastructure Solutions Group Performance Laboratory in Morrisville, NC. His current role includes CPU, memory, and PCIe subsystem analysis and performance validation against functional specifications and vendor targets. Jamal specializes in AMD EPYC architecture, the UEFI specification, and performance tuning. Jamal holds Bachelor of Science degrees in Electrical Engineering and Computer Engineering from North Carolina State University.
Trademarks
Lenovo and the Lenovo logo are trademarks or registered trademarks of Lenovo in the United States, other countries, or both. A current list of Lenovo trademarks is available on the Web at https://www.lenovo.com/us/en/legal/copytrade/.
The following terms are trademarks of Lenovo in the United States, other countries, or both:
Lenovo®
ThinkSystem®
The following terms are trademarks of other companies:
AMD, AMD EPYC™, and Infinity Fabric™ are trademarks of Advanced Micro Devices, Inc.
SQL Server® is a trademark of Microsoft Corporation in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.