
Deploy and Scale Generative AI in Enterprises with Lenovo ThinkSystem SR650 V3

Solution Brief

Published: 18 Dec 2023
Form Number: LP1859
PDF size: 6 pages, 258 KB

Abstract

Generative AI has the potential to transform and reinvent virtually every aspect of business, from customer experience to business operations to employee engagement. However, determining how to get started can be a daunting task. Lenovo and Intel have created a cost-effective solution that can help you achieve revolutionary business impact and scale as your AI needs grow.

Get Started with the Infrastructure You Know

Generative AI seems to be everywhere, and it represents an inflection point that companies can’t afford to miss. Although it can transform virtually every aspect of business, knowing how to get started can be daunting. It doesn’t have to be complicated. Companies starting their Generative AI journey can extend their existing infrastructure with the Lenovo ThinkSystem SR650 V3, accelerated by 4th Gen Intel® Xeon® processors, and achieve revolutionary business impact without investing in dedicated (and often costly) GPU accelerators.

Solution and Testing Overview

Intel testing has shown that the Lenovo ThinkSystem SR650 V3 with 4th Gen Intel Xeon processors delivers a high-performance, scalable solution for Generative AI inference. For most conversational AI and text summarization applications, a next-token latency of 100 ms or less is perceived as instantaneous. Intel’s testing demonstrated that this solution meets that target, providing the performance needed for a variety of use cases, including real-time chatbots.
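
To put the 100 ms target in perspective, the short Python sketch below converts a next-token latency into tokens per second, both per user and across a batch of concurrent users. It is plain arithmetic, not measured data. The 500 ms first-token latency used in the usage lines is an assumed, illustrative value (this brief reports only next-token latency); the 32 output tokens match the output token size in Intel's test configuration.

```python
# Back-of-the-envelope conversion between next-token latency and
# token throughput. Arithmetic only; no measured data.


def tokens_per_second(next_token_latency_ms: float) -> float:
    """Tokens generated per second for a single sequence."""
    return 1000.0 / next_token_latency_ms


def aggregate_tokens_per_second(batch_size: int, next_token_latency_ms: float) -> float:
    """Tokens per second across a batch, assuming every sequence in the
    batch receives one new token per decoding step."""
    return batch_size * tokens_per_second(next_token_latency_ms)


def generation_time_ms(first_token_ms: float, next_token_ms: float, output_tokens: int) -> float:
    """Time to produce `output_tokens` tokens: the first token, then
    (output_tokens - 1) further tokens at the next-token latency."""
    return first_token_ms + (output_tokens - 1) * next_token_ms


if __name__ == "__main__":
    target_ms = 100.0  # the latency target discussed in the text
    print(tokens_per_second(target_ms))                # 10.0 tokens/s per user
    print(aggregate_tokens_per_second(16, target_ms))  # 160.0 tokens/s at batch size 16
    # The 500 ms first-token latency is an illustrative assumption only.
    print(generation_time_ms(500.0, target_ms, 32))    # 3600.0 ms for 32 output tokens
```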

The Lenovo ThinkSystem SR650 V3 offers the high performance and the storage and memory capacity needed for complex workloads, such as Generative AI, that require an optimized hardware architecture. With flexible storage and networking options, the ThinkSystem SR650 V3 can easily scale as needs change. It supports one or two 4th Gen Intel Xeon processors, whose built-in Intel Advanced Matrix Extensions (Intel AMX) deliver high performance on cutting-edge AI models.
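
One way to confirm that a Linux host exposes Intel AMX is to look for the AMX feature flags in /proc/cpuinfo. The minimal Python sketch below does that. It assumes a Linux kernel recent enough to report the amx_tile, amx_bf16, and amx_int8 flags, and it only checks for CPU support, not whether a given framework actually uses AMX.

```python
from pathlib import Path

# CPU feature flags that recent Linux kernels report for Intel AMX.
AMX_FLAGS = ("amx_tile", "amx_bf16", "amx_int8")


def amx_flags_present(cpuinfo_text: str) -> dict:
    """Report which AMX flags appear in the first 'flags' line of
    /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            return {flag: flag in flags for flag in AMX_FLAGS}
    # No flags line found (e.g. a non-x86 host or unexpected format).
    return {flag: False for flag in AMX_FLAGS}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(amx_flags_present(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not found; this check assumes a Linux host")
```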

Enterprises may require multiple Generative AI models for different tasks, including image creation, synthetic data generation, and chatbots, and these models can require a large amount of storage. With its large storage capacity and flexibility, a single 2U ThinkSystem SR650 V3 server can host many Generative AI models: across three drive bay zones, it supports up to 20x 3.5-inch or 40x 2.5-inch hot-swap drive bays.
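
As a rough sense of scale, the weight footprint of a model is approximately its parameter count times the bytes per parameter (2 bytes at BF16, the precision used in Intel's testing). The Python sketch below applies that rule of thumb. The parameter counts are nominal values taken from the model names, and real checkpoints also include tokenizer and configuration files and may be stored at a different precision, so treat the results as approximations.

```python
# Rough size of model weights at a given numeric precision. Parameter
# counts are nominal (from the model names), so results are approximate.

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def weights_size_gb(num_params: float, precision: str = "bf16") -> float:
    """Approximate weight footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    models = {"Llama-2-7b": 7e9, "Llama-2-13b": 13e9}
    total = 0.0
    for name, params in models.items():
        size = weights_size_gb(params)
        total += size
        print(f"{name}: ~{size:.0f} GB at BF16")
    # The tested server (Table 1) had 10x 3.2 TB data SSDs, i.e. 32,000 GB raw.
    print(f"Both models: ~{total:.0f} GB of 32000 GB raw capacity")
```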

The ThinkSystem SR650 V3 also offers energy-efficiency features that save energy and reduce operational costs for Generative AI workloads. These features include advanced direct-water cooling (DWC) with the Lenovo Neptune Processor DWC Module, which removes heat from the processors and carries it out of the rack and data center through an open loop and coolant distribution units, resulting in lower energy costs; high-efficiency power supplies with 80 PLUS Platinum and Titanium certifications; and the optional Lenovo XClarity Energy Manager, which provides advanced data center power notification, analysis, and policy-based management to help lower heat output and reduce cooling needs.

 


Figure 1. Lenovo ThinkSystem SR650 V3

Results

The Generative AI testing on the Lenovo ThinkSystem SR650 V3 with 4th Gen Intel Xeon processors was performed by Intel and validated by Lenovo. Batch sizes from 1 to 16 were used to simulate concurrent users, and input token lengths from 32 to 1024 were used to represent a typical enterprise chatbot scenario.

As demonstrated with the 7B- and 13B-parameter Llama 2 models, the Lenovo ThinkSystem SR650 V3 with 4th Gen Intel Xeon processors helps achieve a next-token latency of less than 100 ms for Generative AI inference at batch sizes 1 through 16 and input token lengths 32 through 1024.
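
Next-token latency is conventionally the average time between successive output tokens after the first one; the first token is usually reported separately because it also includes processing the input prompt. The minimal Python sketch below derives both numbers from per-token timestamps and checks the result against the 100 ms target. It assumes the timestamps come from some generation loop (not shown), uses synthetic values, and is not Intel's benchmark script.

```python
from statistics import mean
from typing import Sequence, Tuple


def token_latencies_ms(timestamps_s: Sequence[float]) -> Tuple[float, float]:
    """Split per-token timing into first-token and mean next-token latency.

    `timestamps_s[0]` is when the request started; each later entry is the
    wall-clock time (seconds) at which one more output token was produced.
    """
    if len(timestamps_s) < 3:
        raise ValueError("need a start time and at least two output tokens")
    gaps_ms = [(later - earlier) * 1000.0
               for earlier, later in zip(timestamps_s, timestamps_s[1:])]
    first_token_ms = gaps_ms[0]        # includes prompt (prefill) processing
    next_token_ms = mean(gaps_ms[1:])  # average gap between later tokens
    return first_token_ms, next_token_ms


def meets_target(next_token_ms: float, target_ms: float = 100.0) -> bool:
    """True when the mean next-token latency is below the target."""
    return next_token_ms < target_ms


if __name__ == "__main__":
    # Synthetic timestamps for illustration only, not measured results.
    first, nxt = token_latencies_ms([0.0, 0.5, 0.5625, 0.625, 0.6875])
    print(first, nxt, meets_target(nxt))  # 500.0 62.5 True
```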

 


Figure 2. Llama 2 7B performance on Lenovo ThinkSystem SR650 V3 with 2x 4th Gen Intel Xeon processors using DeepSpeed (AutoTP)

 


Figure 3. Llama 2 13B performance on Lenovo ThinkSystem SR650 V3 with 2x 4th Gen Intel Xeon processors using DeepSpeed (AutoTP)
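
The Figure 2 and Figure 3 runs used DeepSpeed AutoTP, which applies tensor parallelism automatically: in broad terms, it shards a model's linear layers across processes and combines the partial results with an all-reduce. The toy, pure-Python sketch below illustrates that column/row split on a tiny two-layer block with made-up matrices. It is not DeepSpeed's implementation, and running one rank per socket on the two-socket test system is an assumption; the brief does not state how ranks were mapped.

```python
# Toy illustration of tensor parallelism for an MLP block
# Y = relu(X @ A) @ B. A is split by columns and B by rows across
# `world_size` ranks; each rank computes a partial Y, and summing the
# partials (the all-reduce step) reproduces the unsplit result.
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    return [[sum(x[i][k] * w[k][j] for k in range(len(w)))
             for j in range(len(w[0]))] for i in range(len(x))]


def relu(x: Matrix) -> Matrix:
    return [[max(v, 0) for v in row] for row in x]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def split_columns(w: Matrix, parts: int) -> List[Matrix]:
    n = len(w[0]) // parts  # assumes the column count divides evenly
    return [[row[p * n:(p + 1) * n] for row in w] for p in range(parts)]


def split_rows(w: Matrix, parts: int) -> List[Matrix]:
    n = len(w) // parts  # assumes the row count divides evenly
    return [w[p * n:(p + 1) * n] for p in range(parts)]


def reference_forward(x: Matrix, a: Matrix, b: Matrix) -> Matrix:
    return matmul(relu(matmul(x, a)), b)


def tensor_parallel_forward(x: Matrix, a: Matrix, b: Matrix, world_size: int) -> Matrix:
    # Each "rank" holds one column shard of A and the matching row shard of B.
    partials = [matmul(relu(matmul(x, a_shard)), b_shard)
                for a_shard, b_shard in zip(split_columns(a, world_size),
                                            split_rows(b, world_size))]
    out = partials[0]
    for partial in partials[1:]:
        out = add(out, partial)  # stands in for the cross-rank all-reduce
    return out


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    a = [[1, 0, 2, -1], [0, 1, -3, 3]]
    b = [[1, 2], [3, 4], [5, 6], [7, 8]]
    # Two ranks, e.g. one per socket on a two-socket server (an assumption).
    print(tensor_parallel_forward(x, a, b, 2) == reference_forward(x, a, b))  # True
```

The column-then-row split is what lets the elementwise ReLU run locally on each rank, so only one summation is needed per block.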

Configuration Details

Tested by Intel as of September 2023

Table 1. Hardware Configuration
Server: Lenovo ThinkSystem SR650 V3
Processor: 2x Intel Xeon Platinum 8462Y+ processors
Sockets: 2
Cores per Socket: 32
Hyperthreading: Intel® Hyper-Threading Technology enabled
CPUs: 128
Intel Turbo Boost: Enabled
Base Frequency: 2.8 GHz
NUMA Nodes: 2
Installed Memory: 1024 GB (16x 64 GB DDR5 4800 MT/s [4800 MT/s])
NIC: 1x ThinkSystem Broadcom 57508 100GbE QSFP56 2-Port PCIe Ethernet Adapter; 1x Ethernet Controller E810-XXV for SFP
Disk: 10x 3.2 TB Intel SSDPF2KE032T1O; 1x 1 TB Micron_7450_MTFDKBA960TFR
BIOS: ESE114R-2.14
Microcode: 0x2b0004b1
OS: Red Hat Enterprise Linux 8.8 (Ootpa)
Kernel: 4.18.0-477.21.1.el8_8.x86_64

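Several Table 1 entries follow from one another, and the Python sketch below makes that arithmetic explicit: 2 sockets x 32 cores x 2 threads gives the 128 CPUs, and 16x 64 GB DIMMs give 1024 GB. The peak memory bandwidth figure is a theoretical upper bound, not a measurement, and rests on an assumption not stated in the brief: 8 DDR5 channels per socket, each with a 64-bit (8-byte) data path.

```python
# Derived quantities for the tested configuration in Table 1.


def logical_cpus(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    return sockets * cores_per_socket * threads_per_core


def total_memory_gb(dimm_count: int, dimm_size_gb: int) -> int:
    return dimm_count * dimm_size_gb


def peak_dram_bandwidth_gb_s(channels: int, mega_transfers_per_s: int,
                             bytes_per_transfer: int = 8) -> float:
    """Theoretical peak (not measured) bandwidth in GB/s: channels x MT/s x bytes."""
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000


if __name__ == "__main__":
    # Hyper-Threading enabled -> 2 threads per core.
    print(logical_cpus(2, 32, 2))       # 128, matching "CPUs" in Table 1
    print(total_memory_gb(16, 64))      # 1024
    print(16 // 2, "DIMMs per socket")  # 8 DIMMs per socket
    # Assumption: 8 DDR5 channels per socket, 8 bytes per transfer each.
    print(peak_dram_bandwidth_gb_s(8, 4800), "GB/s per socket")  # 307.2 GB/s per socket
```
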
Table 2. Other Configuration Details
Software Configuration: PyTorch Llama 2 model, BF16 precision
Framework/Toolkit: Torch 2.2.0.dev20230911+cpu
IPEX: 2.2.0+git880fda9/llm_feature_branch
DeepSpeed: 0.10.2+f15e6d48
Transformers: 4.31.0
Topology or ML Algorithm: meta-llama/Llama-2-7b-hf, meta-llama/Llama-2-13b-hf
Dataset: LAMBADA (license: Creative Commons BY 4.0)
Compiler: gcc version 12.3.0 (GCC)
Libraries: oneDNN v3.2, oneccl-bind-pt 2.1.0+cpu
Dataset (size, shape): Token Length 32/128/1024/2048 (in); Token Length 32 (out)
Precision: BF16
Warmup Steps: 10
Number of Iterations: 100
Batch Size: 1, 2, 4, 8, 16
Beam Width: 1 (greedy search)
Input Token Size: 32, 256, 1024, 2048
Output Token Size: 32
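
The warmup and iteration counts in Table 2 describe a common benchmarking pattern: run a few untimed warm-up passes, then average many timed passes, repeated for each batch size and input length. The Python sketch below shows that pattern under stated assumptions. The factory `make_step` is hypothetical (it would return a zero-argument callable that runs one greedy generation), and this times whole calls; measuring next-token latency would additionally require per-token timestamps inside `step`.

```python
import time
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple


def benchmark(step: Callable[[], None], warmup: int = 10, iterations: int = 100) -> float:
    """Call `step` `warmup` times untimed, then time `iterations` calls.

    Returns the mean wall-clock latency per call in milliseconds.
    """
    for _ in range(warmup):
        step()  # warm-up runs are excluded from the measurement
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        step()
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)


def sweep(make_step: Callable[[int, int, int], Callable[[], None]],
          batch_sizes: Sequence[int],
          input_lengths: Sequence[int],
          output_tokens: int = 32,
          warmup: int = 10,
          iterations: int = 100) -> Dict[Tuple[int, int], float]:
    """Benchmark every (batch size, input length) pair.

    `make_step(batch_size, input_len, output_tokens)` is a hypothetical
    factory returning a zero-argument callable that runs one generation.
    """
    results = {}
    for batch_size in batch_sizes:
        for input_len in input_lengths:
            step = make_step(batch_size, input_len, output_tokens)
            results[(batch_size, input_len)] = benchmark(step, warmup, iterations)
    return results
```

For example, `sweep(make_step, batch_sizes=(1, 2, 4, 8, 16), input_lengths=(32, 1024))` would cover the tested batch sizes at two of the listed input lengths, with the default 10 warm-up steps, 100 iterations, and 32 output tokens.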

Accelerated by Intel

To deliver the best possible experience, Lenovo and Intel have optimized this solution to use Intel capabilities, such as built-in processor accelerators, that are not available in other systems. Accelerated by Intel means enhanced performance that can help you achieve new innovations and insights and give your company an edge.

Why Lenovo

Lenovo is a US$70 billion revenue Fortune Global 500 company serving customers in 180 markets around the world. Focused on a bold vision to deliver smarter technology for all, we are developing world-changing technologies that power (through devices and infrastructure) and empower (through solutions, services and software) millions of customers every day.

 

 

For More Information

To learn more about this Lenovo solution, contact your Lenovo Business Partner or visit: https://www.lenovo.com/ai


Trademarks

Lenovo and the Lenovo logo are trademarks or registered trademarks of Lenovo in the United States, other countries, or both. A current list of Lenovo trademarks is available on the Web at https://www.lenovo.com/us/en/legal/copytrade/.

The following terms are trademarks of Lenovo in the United States, other countries, or both:
Lenovo®
Lenovo Neptune®
ThinkSystem®
XClarity®

The following terms are trademarks of other companies:

Intel® and Xeon® are trademarks of Intel Corporation or its subsidiaries.

Linux® is the trademark of Linus Torvalds in the U.S. and other countries.

Other company, product, or service names may be trademarks or service marks of others.