This Lenovo reference architecture describes an entry-level cluster architecture that uses Lenovo ThinkSystem compute servers and ThinkSystem DM Series storage systems, optimized for GPU-accelerated Artificial Intelligence (AI) training workflows. The architecture targets small and medium-sized teams whose compute jobs are mostly single-node (single- or multi-GPU) or distributed over a few compute nodes.
This document covers testing and validation of a compute and storage configuration consisting of four accelerated ThinkSystem SR670 servers and an entry-level, 10GbE network-connected ThinkSystem DM storage system. The configuration provides an efficient and cost-effective solution for small and medium-sized organizations that are starting out with AI and require the enterprise-grade capabilities of ONTAP® cloud-connected data storage available with DM Series storage.
This document is intended for data scientists and data engineers who are looking for efficient ways to achieve deep learning (DL) and machine learning (ML) development goals; enterprise architects who design solutions for the development of AI models and software; and IT decision makers and business leaders who want to achieve the fastest possible time to market from AI initiatives.
Table of Contents
2 Technology Overview
3 Test Overview
4 Test Configuration
5 Test Procedure
6 AI Training Results
7 Architecture Adjustments
8 Deployment Considerations
Appendix: Lenovo Bill of Materials