
NVIDIA Run:ai on ThinkSystem Servers

Solution Brief

Published: 10 Jul 2025
Form Number: LP2254
PDF size: 7 pages, 216 KB

Abstract

As AI initiatives evolve from small-scale experimentation to full-scale production, enterprises must overcome challenges in resource allocation, team scalability, and infrastructure efficiency. Lenovo, in collaboration with NVIDIA, offers a unified solution—NVIDIA Run:ai on Lenovo AI Platforms—that streamlines AI operations by orchestrating GPU workloads, optimizing infrastructure usage, and enabling seamless collaboration between IT and data science teams. Leveraging Lenovo’s robust AI infrastructure and NVIDIA Run:ai’s dynamic orchestration capabilities, organizations can accelerate time-to-value, enhance ROI, and scale AI initiatives with confidence.

Introduction

As AI workloads mature from pilot experimentation to enterprise-scale production, organizations face increased pressure to operationalize machine learning efficiently, maximize infrastructure ROI, and support ever-expanding AI teams. In partnership with NVIDIA, Lenovo introduces a unified solution that accelerates this journey: NVIDIA Run:ai on Lenovo AI Platforms.

This powerful combination addresses common friction points across the AI lifecycle — from experimentation to deployment — by unifying GPU resource management, improving workload orchestration, and supporting cross-functional collaboration across IT and data science teams.

By combining Lenovo's 285 and 289 AI platforms with NVIDIA Run:ai's intelligent GPU orchestration platform, enterprises can fully unlock the value of their AI investments, scale operations with confidence, and reduce time-to-insight for data-driven outcomes.

Business and Technical Challenges

Despite substantial investment in AI hardware and software, many organizations struggle to efficiently scale their AI initiatives.

Key challenges include:

  • For AI Practitioners:
    • Inconsistent access to GPU resources hampers experimentation and training cycles.
    • Fragmented environments delay progress from proof-of-concept to deployment.
    • Contention between teams results in idle time and lost productivity.
  • For IT Leaders:
    • GPU infrastructure is often overprovisioned or underutilized due to lack of visibility.
    • Static resource allocation fails to align with dynamic AI workloads.
    • Difficulty enforcing usage policies across distributed teams and environments.
  • For Executives:
    • AI investments yield diminishing returns without centralized orchestration.
    • Lack of observability across workloads delays AI roadmap execution.
    • Cloud overspend and infrastructure inefficiencies erode competitive advantage.

Solution Overview: NVIDIA Run:ai on Lenovo Infrastructure

NVIDIA Run:ai is a Kubernetes-native AI workload orchestration platform designed to maximize the efficiency, agility, and governance of GPU resources in hybrid and on-prem environments. When deployed on Lenovo’s purpose-built AI platforms, it delivers a scalable and flexible foundation for production-grade AI.

Figure 1. Solution Overview

Core capabilities of the solution:

  • Fractional GPU allocation to optimize resource utilization (see the submission sketch after this list).
  • Priority-based workload scheduling to ensure mission-critical jobs are completed on time.
  • Elastic scaling of training and inference jobs across distributed compute clusters.
  • Lifecycle support for AI development, from Jupyter Notebooks to model serving.
  • Policy-based governance for access control, security, and compliance.
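
As an illustration of fractional GPU allocation, the following minimal sketch uses the Kubernetes Python client to submit a pod that asks the Run:ai scheduler for half a GPU. The scheduler name runai-scheduler, the gpu-fraction annotation, the project label, and the runai-team-a namespace are assumptions for illustration only; verify them against your Run:ai release before use.

```python
# Minimal sketch: submit a pod that requests half a GPU from the Run:ai scheduler.
# Assumptions (verify against your Run:ai version): the scheduler is named
# "runai-scheduler", fractional GPUs are requested with the "gpu-fraction"
# annotation, projects are selected with a "project" label, and "runai-team-a"
# is a placeholder namespace for a Run:ai project.
from kubernetes import client, config

def submit_fractional_gpu_pod() -> None:
    config.load_kube_config()  # or config.load_incluster_config() when running in-cluster

    pod = client.V1Pod(
        api_version="v1",
        kind="Pod",
        metadata=client.V1ObjectMeta(
            name="train-demo",
            annotations={"gpu-fraction": "0.5"},  # assumed fractional-GPU annotation
            labels={"project": "team-a"},         # assumed project label
        ),
        spec=client.V1PodSpec(
            scheduler_name="runai-scheduler",     # hand the pod to the Run:ai scheduler
            restart_policy="Never",
            containers=[
                client.V1Container(
                    name="trainer",
                    image="nvcr.io/nvidia/pytorch:24.05-py3",
                    command=["python", "train.py"],
                )
            ],
        ),
    )
    client.CoreV1Api().create_namespaced_pod(namespace="runai-team-a", body=pod)

if __name__ == "__main__":
    submit_fractional_gpu_pod()
```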

NVIDIA Run:ai System Components

NVIDIA Run:ai consists of two components, both installed on a Kubernetes cluster. The NVIDIA Run:ai control plane provides resource management, handles workload submission, and delivers cluster monitoring and analytics. The NVIDIA Run:ai cluster component provides scheduling and workload management, extending Kubernetes-native capabilities. A quick health-check sketch follows the component list below.

Figure 2. System components

The components are as follows:

  • Run:ai Control Plane: Centralized resource management, user access policies, and workload prioritization, built on Lenovo ThinkSystem servers. Refer to the Control Plane System Requirements for specifications and recommendations.
  • Run:ai Cluster: GPU scheduling, workload orchestration, and Kubernetes-native scalability, built on Lenovo AI servers. Refer to the Lenovo Hybrid AI 285 Platform Guide for specifications and recommendations.
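
As referenced above, a quick way to confirm that both components are healthy is to count running pods in their namespaces. The sketch below uses the Kubernetes Python client; the namespace names runai-backend (control plane) and runai (cluster component) are assumptions based on a typical self-hosted installation and may differ in your deployment.

```python
# Minimal health-check sketch for the two NVIDIA Run:ai components.
# Assumptions: the control plane runs in the "runai-backend" namespace and the
# cluster component runs in the "runai" namespace; adjust to your installation.
from kubernetes import client, config

def report(namespace: str) -> None:
    pods = client.CoreV1Api().list_namespaced_pod(namespace)
    running = sum(1 for p in pods.items if p.status.phase == "Running")
    print(f"{namespace}: {running}/{len(pods.items)} pods running")

if __name__ == "__main__":
    config.load_kube_config()
    for ns in ("runai-backend", "runai"):  # control plane, then cluster component
        report(ns)
```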

Role-Based Value Proposition

NVIDIA Run:ai software delivers distinct value to each stakeholder. This jointly tailored solution aligns with the priorities of AI practitioners, IT managers, and platform administrators, driving technical efficiency, operational control, and strategic impact.

For IT Managers:

  • Centralized Control: Manage multiple GPU clusters from a single console.
  • Usage Analytics: Gain insights into GPU allocation, job performance, and bottlenecks.
  • Policy Enforcement: Set consumption thresholds, scheduling rules, and user permissions.
  • Authentication & RBAC: Integrate with enterprise identity platforms (e.g., LDAP, SSO).
  • Kubernetes-Native Design: Install and manage using familiar cloud-native operations.

For AI Practitioners:

  • Self-Service GPU Access: Launch training, fine-tuning, or inference jobs on-demand.
  • Interactive Development: Run uninterrupted Jupyter Notebook sessions using fractional GPUs.
  • Model Lifecycle Integration: From data prep to deployment — with support for key tools (PyTorch, TensorFlow, Ray, Kubeflow).
  • Scalable Training & Serving: Leverage multiple GPUs with support for auto-scaling.

For Platform Admins:

  • Team Structuring: Map projects, teams, and departments for intelligent resource allocation.
  • User and Access Control: Assign permissions aligned to org structure and security policies.
  • Scheduling and Monitoring: Allocate resources based on workload priority and urgency.
  • Cost Optimization: Reduce idle GPU time and increase infrastructure ROI.

Subscription Model and Part Number Information

Run:ai is licensed per GPU with options for education, enterprise, and public sector usage. The following table lists the ordering part numbers from Lenovo.

Table 1. NVIDIA Run:ai
Part number   Feature code (7S02CTO1WW)   NVIDIA part number   Description

Software subscription
7S02004UWW   SDYT   744-RA7001+P3CMI12   NVIDIA Run:ai Subscription per GPU 1 Year
7S02004XWW   SDYW   744-RA7001+P3CMI36   NVIDIA Run:ai Subscription per GPU 3 Years
7S020050WW   SDYZ   744-RA7001+P3CMI60   NVIDIA Run:ai Subscription per GPU 5 Years
7S02004VWW   SDYU   744-RA7001+P3EDI12   NVIDIA Run:ai Subscription per GPU EDU 1 Year
7S02004YWW   SDYX   744-RA7001+P3EDI36   NVIDIA Run:ai Subscription per GPU EDU 3 Years
7S020051WW   SDZ0   744-RA7001+P3EDI60   NVIDIA Run:ai Subscription per GPU EDU 5 Years
7S02004WWW   SDYV   744-RA7001+P3INI12   NVIDIA Run:ai Subscription per GPU INC 1 Year
7S02004ZWW   SDYY   744-RA7001+P3INI36   NVIDIA Run:ai Subscription per GPU INC 3 Years
7S020052WW   SDZ1   744-RA7001+P3INI60   NVIDIA Run:ai Subscription per GPU INC 5 Years

Support Services subscription
7S020053WW   SDZ2   744-RA7002+P3CMI12   24x7 Support Services for NVIDIA Run:ai Subscription per GPU 1 Year
7S020056WW   SDZ5   744-RA7002+P3CMI36   24x7 Support Services for NVIDIA Run:ai Subscription per GPU 3 Years
7S020059WW   SDZ8   744-RA7002+P3CMI60   24x7 Support Services for NVIDIA Run:ai Subscription per GPU 5 Years
7S020054WW   SDZ3   744-RA7002+P3EDI12   24x7 Support Services for NVIDIA Run:ai Subscription per GPU EDU 1 Year
7S020057WW   SDZ6   744-RA7002+P3EDI36   24x7 Support Services for NVIDIA Run:ai Subscription per GPU EDU 3 Years
7S02005AWW   SDZ9   744-RA7002+P3EDI60   24x7 Support Services for NVIDIA Run:ai Subscription per GPU EDU 5 Years
7S020055WW   SDZ4   744-RA7002+P3INI12   24x7 Support Services for NVIDIA Run:ai Subscription per GPU INC 1 Year
7S020058WW   SDZ7   744-RA7002+P3INI36   24x7 Support Services for NVIDIA Run:ai Subscription per GPU INC 3 Years
7S02005BWW   SDZA   744-RA7002+P3INI60   24x7 Support Services for NVIDIA Run:ai Subscription per GPU INC 5 Years
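
Because licensing is per GPU, sizing an order reduces to a lookup and a multiplication. The following illustrative Python helper (not a Lenovo tool) maps a segment and subscription term to the subscription and 24x7 support part numbers from Table 1; the segment keys "commercial", "edu", and "inc" are shorthand introduced here for the standard, EDU, and INC rows of the table.

```python
# Illustrative helper: select per-GPU subscription and matching 24x7 support
# part numbers from Table 1, then quote one of each per GPU (licensing is per GPU).
SUBSCRIPTION = {
    ("commercial", 1): "7S02004UWW", ("commercial", 3): "7S02004XWW", ("commercial", 5): "7S020050WW",
    ("edu", 1): "7S02004VWW", ("edu", 3): "7S02004YWW", ("edu", 5): "7S020051WW",
    ("inc", 1): "7S02004WWW", ("inc", 3): "7S02004ZWW", ("inc", 5): "7S020052WW",
}
SUPPORT = {
    ("commercial", 1): "7S020053WW", ("commercial", 3): "7S020056WW", ("commercial", 5): "7S020059WW",
    ("edu", 1): "7S020054WW", ("edu", 3): "7S020057WW", ("edu", 5): "7S02005AWW",
    ("inc", 1): "7S020055WW", ("inc", 3): "7S020058WW", ("inc", 5): "7S02005BWW",
}

def order_lines(segment: str, years: int, gpus: int) -> list[tuple[str, int]]:
    """Return (part number, quantity) pairs: one subscription and one support line per GPU."""
    key = (segment.lower(), years)
    return [(SUBSCRIPTION[key], gpus), (SUPPORT[key], gpus)]

# Example: 16 GPUs on a 3-year commercial term.
print(order_lines("commercial", 3, 16))
```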

Author

Carlos Huescas is the Worldwide Product Manager for NVIDIA software at Lenovo. He specializes in High Performance Computing and AI solutions. He has more than 15 years of experience as an IT architect and in product management positions across several high-tech companies.


Trademarks

Lenovo and the Lenovo logo are trademarks or registered trademarks of Lenovo in the United States, other countries, or both. A current list of Lenovo trademarks is available on the Web at https://www.lenovo.com/us/en/legal/copytrade/.

The following terms are trademarks of Lenovo in the United States, other countries, or both:
Lenovo®
ThinkSystem®

Other company, product, or service names may be trademarks or service marks of others.