SPEComp2012 is a SPEC benchmark that is one of the best ways to evaluate the performance of modern multi-processor servers. SPEComp2012 is based on the OpenMP 3.1 application programming interface, which is widely used in parallel programming to take advantage of multi-core CPU environments.
This paper describes how to improve the performance of AMD processor-based servers for the SPEComp2012 benchmark. We look at various elements, including hardware, firmware, OS, compiler, and environment settings. The paper uses the Lenovo ThinkSystem SR655 V3 AMD server with a 4th Gen AMD EPYC processor as the test environment; however, the guidance presented here applies to other AMD processor families and other ThinkSystem servers.
The paper is intended for administrators who are familiar with Linux and have C/C++ programming language experience.
Introduction to OpenMP
OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran. It consists of a set of compiler directives, library routines, and environment variables that influence run-time behavior. OpenMP uses a portable, scalable model that gives programmers a simple and flexible interface for developing parallel applications for platforms ranging from the standard desktop computer to the supercomputer.
To set the number of threads used by an OpenMP program, change the value of the OpenMP environment variable OMP_NUM_THREADS. In the Linux Bash shell, the value can be set with export and verified with echo.
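As a minimal sketch of the step described above, the following Bash commands set the variable and print it back. The value 4 is only an illustration that matches the example used later in this section.

```shell
# Set the number of OpenMP threads for programs started from this shell.
export OMP_NUM_THREADS=4

# Verify the value that child processes will inherit.
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

Because the variable is exported, it applies to every OpenMP program launched from the same shell session until it is changed or unset.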
In OpenMP programming, the region to be executed in parallel is marked with the #pragma omp parallel directive. The directive forks additional threads to carry out the work enclosed in the construct in parallel. The original thread is designated the master thread and has thread ID 0.
The following C program prints “Hello, world” once per thread, together with the thread ID returned by the omp_get_thread_num() function. The number of threads follows the user-defined value of the OMP_NUM_THREADS environment variable.
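The program listing is not present in this copy of the paper. The sketch below is an assumption that is consistent with the description above, not the author's original source. It is written as a shell here-document so that pasting it into a terminal creates hello.c, the file name used by the compile command in this section; the exact message format is hypothetical.

```shell
# Write a minimal OpenMP "Hello, world" program to hello.c.
# The quoted 'EOF' delimiter keeps the shell from expanding anything in the C source.
cat > hello.c <<'EOF'
#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* Every thread in the team executes the block below once. */
    #pragma omp parallel
    {
        printf("Hello, world from thread %d\n", omp_get_thread_num());
    }
    return 0;
}
EOF

echo "wrote hello.c"
```

The order of the printed lines is not deterministic, because the threads run concurrently.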
Compile the C program with the Intel oneAPI compiler, using the “-fopenmp” option to generate a parallel executable:
$ icc -fopenmp hello.c -o hello
After setting 4 threads with the command export OMP_NUM_THREADS=4, the program prints “Hello, world” 4 times with thread IDs from 0 to 3, as shown in the figure below:
Introduction to SPEComp2012
SPEComp2012 is a benchmark suite that focuses on parallel computing performance using the OpenMP 3.1 parallelism standard. It is not intended to stress other computer components such as networking, the operating system, graphics, or the I/O system. SPEComp2012 focuses on compute-intensive performance, which means these benchmarks emphasize the performance of:
- The computer processor (CPU)
- The memory architecture
- The parallel support libraries
- The compilers
SPEComp2012 is based on compute-intensive applications provided as source code. It contains 14 benchmarks: 8 use Fortran, 5 use C, and 1 uses C++.
|Physics: Molecular Dynamics
|Physics: Computational Fluid Dynamics (CFD)
|Physics: Computational Fluid Dynamics (CFD)
|Mechanical Response Simulation
|Physics: Computational Fluid Dynamics (CFD)
|Physics: Computational Fluid Dynamics (CFD)
|Optimal Pattern Matching
|Sorting and Searching
The SPEComp2012 benchmark uses base and peak metrics to evaluate the performance of a server. The base metric is the geometric mean of the medians of the base ratios, and the peak metric is the geometric mean of the medians of the peak ratios.
- The base metrics are required for all reported results and have stricter guidelines for compilation. For example, the same flags must be used in the same order for all benchmarks of a given language. The base metrics suit users who prefer a relatively simple build process.
- The peak metrics are optional and have less strict requirements. For example, different compiler options may be used on each benchmark, and feedback-directed optimization is allowed. The peak metrics suit users who are willing to invest more time and effort in developing build procedures.
The following table shows an example of a SPECompG_base2012 result.
|Base # Threads
|Base Run Time
|Peak # Threads
|Peak Run Time
Notes about the table:
- For the given OMP2012 suite, the elapsed time in seconds for each of its benchmark runs is reported.
- The ratio of the reference system (Sun Fire X4140) time divided by the corresponding measured time is reported.
- Separately for base and peak, the median of three runs of these ratios is reported per benchmark.
In all cases, a higher ratio means “better performance” on the given workload.
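The scoring rules above can be sketched with a short awk calculation: one ratio per run (reference time divided by measured time), the median of three ratios per benchmark, and the geometric mean of those medians. The benchmark names and all times below are hypothetical values chosen for easy checking; they are not SPEComp2012 results or real reference times.

```shell
# Hypothetical input: name, reference time (s), three measured run times (s).
results='bench_a 800 400 410 390
bench_b 1200 300 310 290
bench_c 1600 200 210 190'

score=$(printf '%s\n' "$results" | awk '
{
    # Ratio for each of the three runs = reference time / measured time.
    for (i = 0; i < 3; i++) r[i] = $2 / $(i + 3)

    # Median of three values = sum minus the smallest and the largest.
    min = r[0]; max = r[0]
    for (i = 1; i < 3; i++) {
        if (r[i] < min) min = r[i]
        if (r[i] > max) max = r[i]
    }
    median = r[0] + r[1] + r[2] - min - max

    # Sum the logarithms so the geometric mean can be taken at the end.
    log_sum += log(median); n++
}
END { printf "%.2f", exp(log_sum / n) }')

echo "geometric mean of median ratios: $score"
```

With these sample values the per-benchmark medians are 2, 4, and 8, so the geometric mean is 4. A geometric mean is used so that no single benchmark with a very large ratio dominates the overall metric.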
ThinkSystem SR655 V3
We used the Lenovo ThinkSystem SR655 V3 for our testing in the lab. The Lenovo ThinkSystem SR655 V3 is a 1-socket 2U server that features the AMD EPYC 9004 "Genoa" family of processors. With up to 96 cores per processor and support for the new PCIe 5.0 standard for I/O, the SR655 V3 offers the ultimate in one-socket server performance in a 2U form factor. The server also supports DDR5 memory DIMMs to maximize performance of the memory subsystem, with 12 memory channels (1 DIMM per channel) and DIMM speeds up to 4800 MHz.
For information about the SR655 V3, see the Lenovo Press product guide:
Our configuration to measure SPEComp2012 in our lab is as follows.
|1x AMD EPYC 9654P processor, 96 cores, 2.4GHz
|12x 64GB 2Rx4 DDR5 4800MHz
|1x 960 GB SATA SSD
SPEComp2012 performance tuning
To find the best-performing SPEComp2012 recipe for the 4th Gen AMD EPYC processor, we examined the hardware, firmware, and software components of the ThinkSystem SR655 V3 server.
SPEComp2012 is designed for parallel computing, so its performance benefits from additional CPU cores and hardware threads. Simultaneous Multithreading (SMT) on AMD EPYC CPUs provides a second hardware thread for each physical core, which allows more OpenMP threads to run on the system and improves performance.
The following chart shows the SPEComp2012 scaling result from 1 to 96 cores with SMT enabled. The baseline result was measured with one core and two threads (1C2T); performance scales up to 2424.95% on 96C192T.
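To put the scaling figure in context, the sketch below converts it into a speedup and a scaling efficiency. It assumes that 2424.95% means 24.2495 times the 1C2T baseline score; if the figure is instead an increase over the baseline, the numbers shift accordingly.

```shell
# Figures quoted in the text: 1C2T baseline, 96C192T result of 2424.95%.
baseline_cores=1
scaled_cores=96
scaled_percent=2424.95

# Assumption: the percentage is relative to a baseline score of 100%.
speedup=$(awk -v p="$scaled_percent" 'BEGIN { printf "%.2f", p / 100 }')

# Efficiency = achieved speedup divided by the ideal (linear) speedup.
efficiency=$(awk -v p="$scaled_percent" -v b="$baseline_cores" -v s="$scaled_cores" \
    'BEGIN { printf "%.1f", p / (s / b) }')

echo "speedup: ${speedup}x, scaling efficiency: ${efficiency}%"
```

Under that reading, 96 times the core count delivers about a quarter of ideal linear scaling, which is a common pattern for workloads that also depend on memory bandwidth and shared resources.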
In the Lenovo UEFI Maximum Performance Operating Mode, SMT is enabled by default. SMT not only improves performance but also changes the number of hardware threads. Before running the SPEComp2012 benchmark, check that the OpenMP environment variable OMP_NUM_THREADS matches the correct number of threads for best performance.
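One way to carry out the check described above is to read the number of online logical CPUs from the OS and derive OMP_NUM_THREADS from it. This is a sketch under the assumption that one OpenMP thread per hardware thread is wanted; the sysfs path for the SMT state is available on recent Linux kernels and is treated as optional here.

```shell
# Report whether SMT is active, if the kernel exposes that information.
smt_file=/sys/devices/system/cpu/smt/active
if [ -r "$smt_file" ]; then
    smt_active=$(cat "$smt_file")    # 1 = active, 0 = inactive
else
    smt_active=unknown
fi

# getconf is used instead of nproc because GNU nproc honours an
# already-exported OMP_NUM_THREADS and would then report that value.
logical_cpus=$(getconf _NPROCESSORS_ONLN)

# Assumption: run one OpenMP thread per online hardware thread.
export OMP_NUM_THREADS="$logical_cpus"
echo "SMT active: $smt_active, logical CPUs: $logical_cpus, OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

On a 96-core processor this should report 192 logical CPUs with SMT enabled and 96 with SMT disabled.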
Enabling SMT significantly improves the performance of the SPEComp2012 benchmark. The following chart compares the results with SMT enabled and disabled on the SR655 V3 server.
The Lenovo ThinkSystem SR655 V3 supports DDR5 memory frequencies of 4000 MHz, 4400 MHz, and 4800 MHz. Configuring the highest memory frequency improves performance by up to 5% compared to the lowest memory frequency, as shown in the picture below.
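For context, the theoretical peak memory bandwidth at each supported speed can be estimated as channels × bytes per transfer × transfer rate. The sketch below assumes a 64-bit (8-byte) data path per channel and treats the quoted frequencies as transfer rates in MT/s. These are theoretical peaks, not measured values.

```shell
channels=12            # memory channels per socket, from the server description
bytes_per_transfer=8   # assumption: 64-bit data path per memory channel

peak_gbps() {
    # $1 = transfer rate in MT/s; prints the theoretical peak in GB/s.
    awk -v c="$channels" -v b="$bytes_per_transfer" -v r="$1" \
        'BEGIN { printf "%.1f", c * b * r / 1000 }'
}

for rate in 4000 4400 4800; do
    echo "DDR5-$rate: $(peak_gbps "$rate") GB/s theoretical peak"
done
```

The theoretical peak rises by 20% from the lowest to the highest speed, while the benchmark gain reported above is up to 5%, which suggests that only part of the suite is limited by memory bandwidth.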
The Lenovo ThinkSystem SR655 V3 UEFI provides three presets for Operating Mode: Maximum Performance, Maximum Efficiency, and Custom Mode. We recommend loading the UEFI default settings first and then choosing the “Maximum Performance” preset in Operating Mode before running the SPEComp2012 benchmark.
Lenovo ThinkSystem UEFI firmware provides several operating modes for different purposes; the number of operating modes depends on the system design. We always choose Maximum Performance mode for best system performance.
The modes are as follows:
- Maximum Efficiency
Maximum Efficiency mode maximizes the performance/watt efficiency. It provides the best features for reducing power and increasing performance in applications where maximum bus speeds are not critical.
- Maximum Performance
Maximum Performance mode maximizes the absolute performance of the system without regard for power consumption. Attributes like fan speed and heat output of the system may increase in addition to power consumption. Efficiency of the system may go down in this mode, but the absolute performance may increase depending on the workload that is running.
- Custom Mode
Custom Mode allows the user to individually modify any of the low-level settings that are preset and unchangeable in any of the other preset modes. Custom Mode will inherit the UEFI settings from the previous preset operating mode.
For example, if the previous operating mode was the Maximum Performance operating mode and then Custom Mode was selected, all the settings from the Maximum Performance operating mode will be inherited. Note that there are certain settings that may be mutually exclusive or interdependent. For those settings an error will be surfaced if one of the pre-requisite or interrelated settings is set in such a way as to make configuration of the setting in question non-valid.
To achieve the best performance on the AMD Genoa processor, we suggest using a commercial enterprise Linux OS with the latest kernel, such as RHEL 9 or SLES 15. In addition to the kernel version, cpupower is a user-level utility that provides predefined governors and options for tuning CPU frequency and power features.
Use the following command to check processor state and list available governors:
Use the following command to switch to the “performance” governor to get better performance:
cpupower frequency-set -g performance
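The governor state can also be inspected from a script. The sketch below calls cpupower frequency-info when the tool is installed and then reads the standard cpufreq sysfs files for cpu0. Those files exist only when a cpufreq driver is loaded, so each step is guarded; the variable names are illustrative.

```shell
gov_file=/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
avail_file=/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors

# Show the frequency policy, driver, and available governors if cpupower is installed.
if command -v cpupower >/dev/null 2>&1; then
    cpupower frequency-info || true
fi

# Read the active governor for cpu0 directly from sysfs when available.
if [ -r "$gov_file" ]; then
    current_governor=$(cat "$gov_file")
else
    current_governor=unknown
fi
echo "cpu0 governor: $current_governor"

# List the governors the current driver offers, when the file is present.
if [ -r "$avail_file" ]; then
    echo "available governors: $(cat "$avail_file")"
fi
```

After switching governors, the first echo line should report performance on systems where the change succeeded. Changing the governor requires root privileges.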
The Intel oneAPI DPC++/C++ compiler toolset supports C/C++ and Fortran with OpenMP 3.1. It is one of the commercial compiler toolsets we recommend for building the SPEComp2012 benchmark on the x86_64 architecture.
Here are the suggested compiler flags for SPEComp2012 performance optimization.
- -O3: A higher compiler optimization level, which generates a performance-optimized binary (it optimizes for speed rather than for binary size).
- -fopenmp: Enables the OpenMP features based on the OpenMP directives in the source code.
- -march: Directs the compiler to generate a binary for a specific architecture. For example, “-march=core-avx2” generates a binary for processors that support Intel Advanced Vector Extensions 2 (Intel AVX2) instructions.
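The flags above can be combined on one command line. The sketch below builds that command as a string for the hello.c example used earlier and runs it only if the compiler and the source file are present. It is an illustration of how the flags fit together, not the flag set of the published result.

```shell
compiler=icc                               # compiler driver used earlier in this paper
opt_flags="-O3 -fopenmp -march=core-avx2"  # the three flags described above
source_file=hello.c                        # example source file from the OpenMP introduction

build_cmd="$compiler $opt_flags $source_file -o hello"
echo "$build_cmd"

# Run the build only when the compiler and the source file actually exist.
if command -v "$compiler" >/dev/null 2>&1 && [ -f "$source_file" ]; then
    $build_cmd
fi
```

Keeping the flags in a single variable mirrors the base-metric rule that the same flags are used in the same order for every benchmark of a given language.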
OpenMP environment variables
The OpenMP environment variables configure CPU resource allocation, CPU binding, memory binding, and the preferred run-time library behavior when running OpenMP processes. The details can be found in the table below.
|Setting Value Example
|Pin OpenMP threads to hardware threads
|Sets the run-time schedule type and an optional chunk size. The default is static.
|Selects the OpenMP run-time library execution mode. The values are serial, turnaround, or throughput.
|Sets the number of bytes to allocate for each OpenMP thread to use as its private stack.
|Use the optional character suffixes s (seconds), m (minutes), h (hours), or d (days) to specify the units; specify infinite for an unlimited wait time.
|Enables (TRUE) or disables (FALSE) the dynamic adjustment of the number of threads.
|Sets the number of threads to use for parallel regions.
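The variable names and example values from the table are not present in this copy. The sketch below is an assumption: each name is inferred from its description and corresponds to a standard OpenMP or Intel OpenMP run-time variable, and every value is illustrative rather than the tuned setting used for the published result.

```shell
# Assumed names, inferred from the descriptions in the table; values are examples only.
export KMP_AFFINITY=granularity=fine,compact   # pin OpenMP threads to hardware threads
export OMP_SCHEDULE=static                     # run-time schedule type (optional chunk size omitted)
export KMP_LIBRARY=turnaround                  # run-time library mode: serial, turnaround, or throughput
export KMP_STACKSIZE=128m                      # private stack size for each OpenMP thread
export KMP_BLOCKTIME=infinite                  # how long a thread waits before sleeping
export OMP_DYNAMIC=FALSE                       # no dynamic adjustment of the thread count
export OMP_NUM_THREADS=192                     # 96 cores x 2 hardware threads with SMT enabled

# Show what was set.
env | grep -E '^(OMP|KMP)_' | sort
```

The KMP_ variables are specific to the Intel OpenMP run-time library; other run-times ignore them, so the equivalent settings would need to be checked in that run-time's documentation.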
In the SPEComp2012 benchmark configuration file, prefix each OpenMP environment variable name with ENV_. For example:
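The original example is not present in this copy. As a sketch, the shell snippet below turns the OpenMP variables exported in the current shell into ENV_-prefixed lines. The two values and the "name = value" layout are assumptions; the exact syntax and any option needed to enable these settings should be checked against the SPEC OMP2012 configuration file documentation.

```shell
# Illustrative values only; not the settings of the published result.
export OMP_NUM_THREADS=192
export OMP_DYNAMIC=FALSE

# Prefix every OMP_ or KMP_ variable in the environment with ENV_ and
# rewrite NAME=value as "ENV_NAME = value".
config_lines=$(env | grep -E '^(OMP|KMP)_' | sort |
    sed -E 's/^([^=]+)=(.*)$/ENV_\1 = \2/')

printf '%s\n' "$config_lines"
```

The printed lines can then be reviewed and copied into the benchmark configuration file by hand.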
For the best SPEComp2012 performance, we provided tuning suggestions covering hardware configuration, firmware settings, compiler flags, and OpenMP environment variables. The performance difference with and without these suggestions is also provided for reference.
With all the tuning steps described here applied, the ThinkSystem SR655 V3 set a new performance world record on SPEComp2012. More detailed information can be found at the published results URL:
Sinper Liang is a performance engineer in the Lenovo Infrastructure Solution Group laboratory located at Taipei, Taiwan. Sinper joined Lenovo in 2019 and focuses on system performance validation and the SPEC OMP2012 benchmark. Prior to Lenovo, he worked at the IBM Taiwan Systems and Technology Laboratory as a system UEFI firmware assurance and validation engineer, and at Wiwynn as the UEFI test leader.
Lenovo and the Lenovo logo are trademarks or registered trademarks of Lenovo in the United States, other countries, or both. A current list of Lenovo trademarks is available on the Web at https://www.lenovo.com/us/en/legal/copytrade/.
The following terms are trademarks of Lenovo in the United States, other countries, or both:
The following terms are trademarks of other companies:
Intel® is a trademark of Intel Corporation or its subsidiaries.
Linux® is the trademark of Linus Torvalds in the U.S. and other countries.
SPEC® and SPEC OMP® are trademarks of the Standard Performance Evaluation Corporation (SPEC).
Other company, product, or service names may be trademarks or service marks of others.