
NVIDIA Run:ai on ThinkSystem Servers

Solution Brief

Published: 10 Jul 2025
Form Number: LP2254
PDF size: 7 pages, 216 KB

Abstract

As AI initiatives evolve from small-scale experimentation to full-scale production, enterprises must overcome challenges in resource allocation, team scalability, and infrastructure efficiency. Lenovo, in collaboration with NVIDIA, offers a unified solution—NVIDIA Run:ai on Lenovo AI Platforms—that streamlines AI operations by orchestrating GPU workloads, optimizing infrastructure usage, and enabling seamless collaboration between IT and data science teams. Leveraging Lenovo’s robust AI infrastructure and NVIDIA Run:ai’s dynamic orchestration capabilities, organizations can accelerate time-to-value, enhance ROI, and scale AI initiatives with confidence.

Introduction

As AI workloads mature from pilot experimentation to enterprise-scale production, organizations face increased pressure to operationalize machine learning efficiently, maximize infrastructure ROI, and support ever-expanding AI teams. In partnership with NVIDIA, Lenovo introduces a unified solution that accelerates this journey: NVIDIA Run:ai on Lenovo AI Platforms.

This powerful combination addresses common friction points across the AI lifecycle — from experimentation to deployment — by unifying GPU resource management, improving workload orchestration, and supporting cross-functional collaboration across IT and data science teams.

By combining Lenovo's 285 and 289 AI platforms with NVIDIA Run:ai's intelligent GPU orchestration platform, enterprises can fully unlock the value of their AI investments, scale operations with confidence, and reduce time-to-insight for data-driven outcomes.

Business and Technical Challenges

Despite substantial investment in AI hardware and software, many organizations struggle to efficiently scale their AI initiatives.

Key challenges include:

  • For AI Practitioners:
    • Inconsistent access to GPU resources hampers experimentation and training cycles.
    • Fragmented environments delay progress from proof-of-concept to deployment.
    • Contention between teams results in idle time and lost productivity.
  • For IT Leaders:
    • GPU infrastructure is often overprovisioned or underutilized due to lack of visibility.
    • Static resource allocation fails to align with dynamic AI workloads.
    • Difficulty enforcing usage policies across distributed teams and environments.
  • For Executives:
    • AI investments yield diminishing returns without centralized orchestration.
    • Lack of observability across workloads delays AI roadmap execution.
    • Cloud overspend and infrastructure inefficiencies erode competitive advantage.

Solution Overview: NVIDIA Run:ai on Lenovo Infrastructure

NVIDIA Run:ai is a Kubernetes-native AI workload orchestration platform designed to maximize the efficiency, agility, and governance of GPU resources in hybrid and on-prem environments. When deployed on Lenovo’s purpose-built AI platforms, it delivers a scalable and flexible foundation for production-grade AI.

Figure 1. Solution Overview

Core capabilities of the solution:

  • Fractional GPU allocation to optimize resource utilization (see the submission sketch after this list).
  • Priority-based workload scheduling to ensure mission-critical jobs are completed on time.
  • Elastic scaling of training and inference jobs across distributed compute clusters.
  • Lifecycle support for AI development, from Jupyter Notebooks to model serving.
  • Policy-based governance for access control, security, and compliance.
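
As an illustration of fractional GPU allocation, the following minimal sketch uses the Kubernetes Python client to submit a pod that asks the Run:ai scheduler for half a GPU. The scheduler name runai-scheduler, the gpu-fraction annotation, the project label, and the runai-team-a namespace are assumptions for illustration only; verify them against your Run:ai release before use.

```python
# Minimal sketch: submit a pod that requests half a GPU from the Run:ai scheduler.
# Assumptions (verify against your Run:ai version): the scheduler is named
# "runai-scheduler", fractional GPUs are requested with the "gpu-fraction"
# annotation, projects are selected with a "project" label, and "runai-team-a"
# is a placeholder namespace for a Run:ai project.
from kubernetes import client, config

def submit_fractional_gpu_pod() -> None:
    config.load_kube_config()  # or config.load_incluster_config() when running in-cluster

    pod = client.V1Pod(
        api_version="v1",
        kind="Pod",
        metadata=client.V1ObjectMeta(
            name="train-demo",
            annotations={"gpu-fraction": "0.5"},  # assumed fractional-GPU annotation
            labels={"project": "team-a"},         # assumed project label
        ),
        spec=client.V1PodSpec(
            scheduler_name="runai-scheduler",     # hand the pod to the Run:ai scheduler
            restart_policy="Never",
            containers=[
                client.V1Container(
                    name="trainer",
                    image="nvcr.io/nvidia/pytorch:24.05-py3",
                    command=["python", "train.py"],
                )
            ],
        ),
    )
    client.CoreV1Api().create_namespaced_pod(namespace="runai-team-a", body=pod)

if __name__ == "__main__":
    submit_fractional_gpu_pod()
```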

NVIDIA Run:ai System Components

NVIDIA Run:ai consists of two components, both installed on a Kubernetes cluster. The NVIDIA Run:ai control plane provides resource management, handles workload submission, and delivers cluster monitoring and analytics. The NVIDIA Run:ai cluster component provides scheduling and workload management, extending Kubernetes-native capabilities. A quick health-check sketch follows the component list below.

Figure 2. System components

The components are as follows:

  • Run:ai Control Plane: Centralized resource management, user access policies, and workload prioritization, built on Lenovo ThinkSystem servers. Refer to the Control Plane System Requirements for specifications and recommendations.
  • Run:ai Cluster: GPU scheduling, workload orchestration, and Kubernetes-native scalability, built on Lenovo AI servers. Refer to the Lenovo Hybrid AI 285 Platform Guide for specifications and recommendations.
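
As referenced above, a quick way to confirm that both components are healthy is to count running pods in their namespaces. The sketch below uses the Kubernetes Python client; the namespace names runai-backend (control plane) and runai (cluster component) are assumptions based on a typical self-hosted installation and may differ in your deployment.

```python
# Minimal health-check sketch for the two NVIDIA Run:ai components.
# Assumptions: the control plane runs in the "runai-backend" namespace and the
# cluster component runs in the "runai" namespace; adjust to your installation.
from kubernetes import client, config

def report(namespace: str) -> None:
    pods = client.CoreV1Api().list_namespaced_pod(namespace)
    running = sum(1 for p in pods.items if p.status.phase == "Running")
    print(f"{namespace}: {running}/{len(pods.items)} pods running")

if __name__ == "__main__":
    config.load_kube_config()
    for ns in ("runai-backend", "runai"):  # control plane, then cluster component
        report(ns)
```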

Role-Based Value Proposition

NVIDIA Run:ai software delivers distinct value to each stakeholder. This jointly tailored solution aligns with the priorities of AI practitioners, IT managers, and platform administrators, driving technical efficiency, operational control, and strategic impact.

For IT Managers:

  • Centralized Control: Manage multiple GPU clusters from a single console.
  • Usage Analytics: Gain insights into GPU allocation, job performance, and bottlenecks.
  • Policy Enforcement: Set consumption thresholds, scheduling rules, and user permissions.
  • Authentication & RBAC: Integrate with enterprise identity platforms (e.g., LDAP, SSO).
  • Kubernetes-Native Design: Install and manage using familiar cloud-native operations.

For AI Practitioners:

  • Self-Service GPU Access: Launch training, fine-tuning, or inference jobs on-demand.
  • Interactive Development: Run uninterrupted Jupyter Notebook sessions using fractional GPUs.
  • Model Lifecycle Integration: From data prep to deployment — with support for key tools (PyTorch, TensorFlow, Ray, Kubeflow).
  • Scalable Training & Serving: Leverage multiple GPUs with support for auto-scaling.

For Platform Admins:

  • Team Structuring: Map projects, teams, and departments for intelligent resource allocation.
  • User and Access Control: Assign permissions aligned to org structure and security policies.
  • Scheduling and Monitoring: Allocate resources based on workload priority and urgency.
  • Cost Optimization: Reduce idle GPU time and increase infrastructure ROI.

Subscription Model and Part Number Information

Run:ai is licensed per GPU with options for education, enterprise, and public sector usage. The following table lists the ordering part numbers from Lenovo.

Table 1. NVIDIA Run:ai
Part number   Feature code (7S02CTO1WW)   NVIDIA part number   Description

Software subscription
7S02004UWW   SDYT   744-RA7001+P3CMI12   NVIDIA Run:ai Subscription per GPU 1 Year
7S02004XWW   SDYW   744-RA7001+P3CMI36   NVIDIA Run:ai Subscription per GPU 3 Years
7S020050WW   SDYZ   744-RA7001+P3CMI60   NVIDIA Run:ai Subscription per GPU 5 Years
7S02004VWW   SDYU   744-RA7001+P3EDI12   NVIDIA Run:ai Subscription per GPU EDU 1 Year
7S02004YWW   SDYX   744-RA7001+P3EDI36   NVIDIA Run:ai Subscription per GPU EDU 3 Years
7S020051WW   SDZ0   744-RA7001+P3EDI60   NVIDIA Run:ai Subscription per GPU EDU 5 Years
7S02004WWW   SDYV   744-RA7001+P3INI12   NVIDIA Run:ai Subscription per GPU INC 1 Year
7S02004ZWW   SDYY   744-RA7001+P3INI36   NVIDIA Run:ai Subscription per GPU INC 3 Years
7S020052WW   SDZ1   744-RA7001+P3INI60   NVIDIA Run:ai Subscription per GPU INC 5 Years

Support Services subscription
7S020053WW   SDZ2   744-RA7002+P3CMI12   24x7 Support Services for NVIDIA Run:ai Subscription per GPU 1 Year
7S020056WW   SDZ5   744-RA7002+P3CMI36   24x7 Support Services for NVIDIA Run:ai Subscription per GPU 3 Years
7S020059WW   SDZ8   744-RA7002+P3CMI60   24x7 Support Services for NVIDIA Run:ai Subscription per GPU 5 Years
7S020054WW   SDZ3   744-RA7002+P3EDI12   24x7 Support Services for NVIDIA Run:ai Subscription per GPU EDU 1 Year
7S020057WW   SDZ6   744-RA7002+P3EDI36   24x7 Support Services for NVIDIA Run:ai Subscription per GPU EDU 3 Years
7S02005AWW   SDZ9   744-RA7002+P3EDI60   24x7 Support Services for NVIDIA Run:ai Subscription per GPU EDU 5 Years
7S020055WW   SDZ4   744-RA7002+P3INI12   24x7 Support Services for NVIDIA Run:ai Subscription per GPU INC 1 Year
7S020058WW   SDZ7   744-RA7002+P3INI36   24x7 Support Services for NVIDIA Run:ai Subscription per GPU INC 3 Years
7S02005BWW   SDZA   744-RA7002+P3INI60   24x7 Support Services for NVIDIA Run:ai Subscription per GPU INC 5 Years
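
Because licensing is per GPU, sizing an order reduces to a lookup and a multiplication. The following illustrative Python helper (not a Lenovo tool) maps a segment and subscription term to the subscription and 24x7 support part numbers from Table 1; the segment keys "commercial", "edu", and "inc" are shorthand introduced here for the standard, EDU, and INC rows of the table.

```python
# Illustrative helper: select per-GPU subscription and matching 24x7 support
# part numbers from Table 1, then quote one of each per GPU (licensing is per GPU).
SUBSCRIPTION = {
    ("commercial", 1): "7S02004UWW", ("commercial", 3): "7S02004XWW", ("commercial", 5): "7S020050WW",
    ("edu", 1): "7S02004VWW", ("edu", 3): "7S02004YWW", ("edu", 5): "7S020051WW",
    ("inc", 1): "7S02004WWW", ("inc", 3): "7S02004ZWW", ("inc", 5): "7S020052WW",
}
SUPPORT = {
    ("commercial", 1): "7S020053WW", ("commercial", 3): "7S020056WW", ("commercial", 5): "7S020059WW",
    ("edu", 1): "7S020054WW", ("edu", 3): "7S020057WW", ("edu", 5): "7S02005AWW",
    ("inc", 1): "7S020055WW", ("inc", 3): "7S020058WW", ("inc", 5): "7S02005BWW",
}

def order_lines(segment: str, years: int, gpus: int) -> list[tuple[str, int]]:
    """Return (part number, quantity) pairs: one subscription and one support line per GPU."""
    key = (segment.lower(), years)
    return [(SUBSCRIPTION[key], gpus), (SUPPORT[key], gpus)]

# Example: 16 GPUs on a 3-year commercial term.
print(order_lines("commercial", 3, 16))
```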

Author

Carlos Huescas is the Worldwide Product Manager for NVIDIA software at Lenovo. He specializes in High Performance Computing and AI solutions. He has more than 15 years of experience as an IT architect and in product management positions across several high-tech companies.


Trademarks

Lenovo and the Lenovo logo are trademarks or registered trademarks of Lenovo in the United States, other countries, or both. A current list of Lenovo trademarks is available on the Web at https://www.lenovo.com/us/en/legal/copytrade/.

The following terms are trademarks of Lenovo in the United States, other countries, or both:
Lenovo®
ThinkSystem®

Other company, product, or service names may be trademarks or service marks of others.