Author
Sinper Liang
Published
2 Nov 2023
Form Number
LP1840
PDF size
13 pages, 435 KB
Abstract
SPEComp2012 is a SPEC benchmark that is one of the best ways to evaluate the performance of modern multi-processor servers. SPEComp2012 is based on the OpenMP 3.1 application programming interface, which is widely used in parallel programming to take advantage of multi-core CPU environments.
This paper describes how to improve the performance of AMD processor-based servers on the SPEComp2012 benchmark. We look at various elements, including hardware, firmware, OS, compiler, and environment settings. The paper uses the Lenovo ThinkSystem SR655 V3 server with a 4th Gen AMD EPYC processor as the test environment; however, the guidance presented here also applies to other AMD processor families and other ThinkSystem servers.
The paper is intended for administrators who are familiar with Linux and have C/C++ programming language experience.
Introduction to OpenMP
OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran. It consists of a set of compiler directives, library routines, and environment variables that influence run-time behavior. OpenMP uses a portable, scalable model that gives programmers a simple and flexible interface for developing parallel applications for platforms ranging from the standard desktop computer to the supercomputer.
To set the number of threads used by an OpenMP program, set the OpenMP environment variable OMP_NUM_THREADS. The value can be set and verified in the Linux Bash shell as shown below.
Figure 1. Setting the number of threads for parallel execution
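The commands behind this step can be sketched as follows. The value 4 is only an example; any positive integer can be used.

```shell
# Set the number of OpenMP threads for programs started from this shell.
export OMP_NUM_THREADS=4

# Verify the value that child processes will inherit.
echo "$OMP_NUM_THREADS"    # prints 4
```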
In OpenMP programming, the region to be executed in parallel is marked with the #pragma omp parallel directive. The directive forks additional threads to carry out the work enclosed in the construct in parallel. The original thread is the master thread and has thread ID 0.
The following C program uses the omp_get_thread_num() function to print “Hello, world” once per thread, together with the thread ID. The number of threads is taken from the user-defined value of the OMP_NUM_THREADS environment variable.
Compile the C program with the Intel oneAPI compiler option “-fopenmp” to generate a parallel executable:
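A minimal sketch of such a program is shown here, created as hello.c from the shell. The exact message format is an assumption for illustration and may differ from the original listing.

```shell
# Write a minimal OpenMP "Hello, world" program to hello.c (illustrative sketch).
cat > hello.c <<'EOF'
#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* Fork a team of threads; every thread executes the block below once. */
    #pragma omp parallel
    {
        /* omp_get_thread_num() returns this thread's ID (0 for the master thread). */
        printf("Hello, world from thread %d\n", omp_get_thread_num());
    }
    return 0;
}
EOF
```

Because the threads run concurrently, the lines may be printed in any order from run to run.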
$ icc -fopenmp hello.c -o hello
After setting four threads with the command export OMP_NUM_THREADS=4, the program prints “Hello, world” four times, with thread IDs from 0 to 3, as shown in the figure below:
Introduction to SPEComp2012
SPEComp2012 is a benchmark suite that measures parallel computing performance using the OpenMP 3.1 parallelism standard. SPEC OMP is not intended to stress other computer components such as networking, the operating system, graphics, or the I/O system. SPEComp2012 focuses on compute-intensive performance, which means these benchmarks emphasize the performance of:
- The computer processor (CPU)
- The memory architecture
- The parallel support libraries
- The compilers
SPEComp2012 is based on compute-intensive applications provided as source code. It contains 14 benchmarks: 8 use Fortran, 5 use C, and 1 uses C++.
The SPEComp2012 benchmark uses base and peak metrics to evaluate the performance of a server. The base metric is the geometric mean of the medians of the base ratios, and the peak metric is the geometric mean of the medians of the peak ratios.
- The base metrics are required for all reported results and have stricter guidelines for compilation. For example, the same flags must be used in the same order for all benchmarks of a given language. The base metrics are aimed at users who prefer a relatively simple build process.
- The peak metrics are optional and have less strict requirements. For example, different compiler options may be used on each benchmark, and feedback-directed optimization is allowed. The peak metrics are aimed at users who are willing to invest more time and effort in developing build procedures.
The following table is an example of SPECompG_base2012.
Notes about the table:
- For the given OMP2012 suite, the elapsed time in seconds for each of its benchmark runs is reported.
- The ratio of the reference system (Sun Fire X4140) time divided by the corresponding measured time is reported.
- Separately for base and peak, the median of three runs of these ratios is reported per benchmark.
In all cases, a higher ratio means “better performance” on the given workload.
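The calculation described above can be sketched with a small awk helper: for each benchmark, divide the reference time by each of the three measured times, take the median ratio, and then take the geometric mean of those medians. The function name geomean_of_medians, the input layout (name, reference time, three measured times), and all numbers are hypothetical values chosen for illustration, not measured results.

```shell
# Input lines: <benchmark> <reference_seconds> <run1> <run2> <run3>
geomean_of_medians() {
    awk '
    # Return the middle value of three numbers.
    function median3(a, b, c) {
        if ((a <= b && b <= c) || (c <= b && b <= a)) return b
        if ((b <= a && a <= c) || (c <= a && a <= b)) return a
        return c
    }
    NF == 5 {
        # Ratio = reference time / measured time, so higher is better.
        m = median3($2 / $3, $2 / $4, $2 / $5)
        printf "%s median ratio %.2f\n", $1, m
        sum_log += log(m); n++
    }
    END { if (n > 0) printf "geometric mean %.2f\n", exp(sum_log / n) }
    '
}

# Hypothetical example data for two benchmarks.
geomean_of_medians <<'EOF'
bench_a 1000 100 125 200
bench_b 600 300 200 400
EOF
```

With these example numbers the median ratios are 8.00 and 2.00, so the geometric mean is the square root of 16, which is 4.00.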
ThinkSystem SR655 V3
We used the Lenovo ThinkSystem SR655 V3 for our testing in the lab. The Lenovo ThinkSystem SR655 V3 is a 1-socket 2U server that features the AMD EPYC 9004 "Genoa" family of processors. With up to 96 cores per processor and support for the new PCIe 5.0 standard for I/O, the SR655 V3 offers the ultimate in one-socket server performance in a 2U form factor. The server also supports DDR5 memory DIMMs to maximize performance of the memory subsystem, with 12 memory channels (1 DIMM per channel) and DIMM speeds up to 4800 MHz.
Figure 4. Lenovo ThinkSystem SR655 V3
For information about the SR655 V3, see the Lenovo Press product guide:
https://lenovopress.lenovo.com/lp1610-thinksystem-sr655-v3-server
Our configuration to measure SPEComp2012 in our lab is as follows.
SPEComp2012 performance tuning
To determine the best-performance recipe for SPEComp2012 on the 4th Gen AMD EPYC processor, we examined the hardware, firmware, and software components of the ThinkSystem SR655 V3 server.
Topics in this section:
Processors
SPEComp2012 is designed for parallel computing, so its performance benefits from additional CPU cores and hardware threads. SMT (Simultaneous Multithreading) on AMD EPYC CPUs provides a second hardware thread for each physical core, which allows more OpenMP threads to run on the system and improves performance.
The following chart shows the SPEComp2012 scaling results from 1 to 96 cores with SMT enabled. The baseline result was measured with one core and two threads (1C2T); performance scales up to 2424.95% at 96 cores and 192 threads (96C192T).
Figure 5. SPEComp2012 result with different cores/thread
SMT
In the Lenovo UEFI Maximum Performance Operating Mode, SMT is enabled by default. SMT not only improves performance but also changes the number of hardware threads available. Before running the SPEComp2012 benchmark, check that the OpenMP environment variable OMP_NUM_THREADS is set to the correct number of threads for best performance.
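One way to derive the thread count from the running system is sketched below. It assumes a Linux system where getconf is available and, for the SMT check, a kernel that exposes /sys/devices/system/cpu/smt/active. Using one thread per online logical CPU is an assumption for illustration, not a rule of the benchmark.

```shell
# Number of logical CPUs currently online; with SMT enabled this is
# twice the number of physical cores (for example, 192 on a 96-core CPU).
logical_cpus=$(getconf _NPROCESSORS_ONLN)

# Report whether SMT is active, if the kernel exposes this file (1 = active).
smt_file=/sys/devices/system/cpu/smt/active
if [ -r "$smt_file" ]; then
    echo "SMT active: $(cat "$smt_file")"
fi

# Use one OpenMP thread per logical CPU.
export OMP_NUM_THREADS="$logical_cpus"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```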
Enabling SMT significantly improves the performance of the SPEComp2012 benchmark. The following chart compares the results with SMT enabled and disabled on the SR655 V3 server.
Memory
The Lenovo ThinkSystem SR655 V3 supports DDR5 memory frequencies of 4000 MHz, 4400 MHz, and 4800 MHz. Configuring the highest memory frequency improves performance by up to 5% compared to the lowest memory frequency, as shown in the figure below.
UEFI configuration
The Lenovo ThinkSystem SR655 V3 UEFI provides three presets for Operating Mode: Maximum Performance, Maximum Efficiency, and Custom Mode. We recommend loading the UEFI default settings first and then choosing the “Maximum Performance” preset in Operating Mode before running the SPEComp2012 benchmark.
Lenovo ThinkSystem UEFI firmware provides several operating modes for different purposes; the number of operating modes depends on the system design. We always chose the Maximum Performance Mode for best system performance.
Figure 8. UEFI setup menu for Operating Modes
The modes are as follows:
- Maximum Efficiency
Maximum Efficiency mode maximizes the performance/watt efficiency. It provides the best features for reducing power and increasing performance in applications where maximum bus speeds are not critical.
- Maximum Performance
Maximum Performance mode maximizes the absolute performance of the system; power consumption is not taken into consideration. Attributes like fan speed and heat output of the system may increase in addition to power consumption. Efficiency of the system may go down in this mode, but the absolute performance may increase depending on the workload that is running.
- Custom Mode
Custom Mode allows the user to individually modify any of the low-level settings that are preset and unchangeable in the other preset modes. Custom Mode inherits the UEFI settings from the previous preset operating mode.
For example, if the previous operating mode was Maximum Performance and Custom Mode is then selected, all the settings from the Maximum Performance operating mode are inherited. Note that certain settings may be mutually exclusive or interdependent. For those settings, an error is reported if one of the prerequisite or interrelated settings is set in a way that makes the configuration of the setting in question invalid.
Linux utilities
To achieve the best performance on the AMD Genoa processor, we suggest using a commercial enterprise Linux OS with the latest kernel, such as RHEL 9 or SLES 15. Besides the kernel version, cpupower is a user-level utility that provides predefined governors and the ability to tune CPU frequency and power features.
Use the following command to check processor state and list available governors:
cpupower frequency-info
Figure 9. cpupower frequency-info
Use the following command to switch to the “performance” governor to get better performance:
cpupower frequency-set -g performance
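To confirm that the governor change took effect on every CPU, the per-CPU sysfs entries can be summarized as sketched below. The helper name governor_summary and its optional directory argument are illustrative, and the sketch assumes the kernel exposes cpufreq under /sys/devices/system/cpu; on systems without cpufreq support it prints nothing.

```shell
# Print how many CPUs use each scaling governor.
# Optional argument: CPU sysfs root (defaults to the standard location).
governor_summary() {
    cpu_root=${1:-/sys/devices/system/cpu}
    for f in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip CPUs without a readable cpufreq entry.
        [ -r "$f" ] || continue
        cat "$f"
    done | sort | uniq -c
}

governor_summary
```

After switching governors, a single output line naming “performance” with a count equal to the number of logical CPUs indicates that the setting was applied everywhere.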
Compiler flags
The Intel oneAPI DPC++/C++ compiler toolset supports C/C++ and Fortran with OpenMP 3.1. It is one of the best commercial compiler toolsets, and we recommend it for building the SPEComp2012 benchmark on the x86_64 architecture.
Here are the suggested compiler flags for SPEComp2012 performance optimization.
- -O3: A high compiler optimization level that generates a performance-optimized binary.
- -fopenmp: Enables the OpenMP features based on the OpenMP directives in the source code.
- -march: Directs the compiler to generate a binary for a specific architecture. For example, “-march=core-avx2” generates a binary for processors that support Intel Advanced Vector Extensions 2 (Intel AVX2) instructions.
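Combined on one command line, the flags might be used as sketched below. The CC and CFLAGS variable names and the hello.c source file are assumptions for illustration; the compiler defaults to icc as used earlier in this paper, and the build step only runs if that compiler and source file are present.

```shell
# Compiler driver (override by setting CC in the environment).
CC=${CC:-icc}

# Suggested flags: optimization level, OpenMP support, target architecture.
CFLAGS="-O3 -fopenmp -march=core-avx2"

# Show the command that would be run.
echo "$CC $CFLAGS hello.c -o hello"

# Build only when the compiler and the source file are available.
if command -v "$CC" >/dev/null 2>&1 && [ -f hello.c ]; then
    # CFLAGS is intentionally unquoted so it splits into separate options.
    $CC $CFLAGS hello.c -o hello || echo "build failed" >&2
fi
```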
OpenMP environment variables
The OpenMP environment variables configure CPU resource allocation, CPU binding, memory binding, and the preferred runtime library when running OpenMP processes. The details can be found in the table below.
In the SPEComp2012 benchmark configuration file, add the prefix ENV_ to every OpenMP environment variable. For example:
ENV_OMP_NUM_THREADS=256
ENV_KMP_AFFINITY=compact,1
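When the variables have first been tried out in an interactive shell, the corresponding configuration lines can be generated from the environment, as in the sketch below. The helper name emit_spec_env_lines is hypothetical, and the sketch assumes that the relevant variables all start with OMP_ or KMP_.

```shell
# Print every exported OMP_* and KMP_* variable with the ENV_ prefix added.
emit_spec_env_lines() {
    env | grep -E '^(OMP|KMP)_' | sort | sed 's/^/ENV_/'
}

# Example values taken from the configuration lines above.
export OMP_NUM_THREADS=256
export KMP_AFFINITY=compact,1

emit_spec_env_lines
```

The output lines can then be copied into the benchmark configuration file.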
Conclusion
For the best SPEComp2012 performance, we provided tuning suggestions ranging from hardware configuration and firmware settings to compiler flags and OpenMP environment variables. The performance difference with and without these suggestions is also provided for reference.
By applying all the tuning steps described here, the ThinkSystem SR655 V3 set a new performance world record on SPEComp2012. More detailed information can be found at the published results URL:
https://lenovopress.lenovo.com/lp1758-sr655-v3-specompg-benchmark-result-2023-07-01
Author
Sinper Liang is a performance engineer in the Lenovo Infrastructure Solution Group laboratory located at Taipei, Taiwan. Sinper joined Lenovo in 2019 and focuses on system performance validation and the SPEC OMP2012 benchmark. Prior to Lenovo, he worked at the IBM Taiwan Systems and Technology Laboratory as a system UEFI firmware assurance and validation engineer, and at Wiwynn as the UEFI test leader.
Trademarks
Lenovo and the Lenovo logo are trademarks or registered trademarks of Lenovo in the United States, other countries, or both. A current list of Lenovo trademarks is available on the Web at https://www.lenovo.com/us/en/legal/copytrade/.
The following terms are trademarks of Lenovo in the United States, other countries, or both:
Lenovo®
ThinkSystem®
The following terms are trademarks of other companies:
AMD, AMD EPYC™, and Fire™ are trademarks of Advanced Micro Devices, Inc.
Intel® is a trademark of Intel Corporation or its subsidiaries.
Linux® is the trademark of Linus Torvalds in the U.S. and other countries.
SPEC® and SPEC OMP® are trademarks of the Standard Performance Evaluation Corporation (SPEC).
Other company, product, or service names may be trademarks or service marks of others.