Author
Published
5 Aug 2021
Form Number
LP1510
PDF size
5 pages, 195 KB
Abstract
Enterprises are eager to get more insights faster from all their data. Lenovo has partnered with NVIDIA and Cloudera to address the challenges of end-to-end data processing. The jointly validated data analytics solution accelerates machine learning and data science by using GPU-accelerated Apache Spark running on Lenovo ThinkSystem servers. The solution delivers more accurate insights faster and ultimately a better return on investment for data-driven enterprises.
Introduction
The world’s data is doubling each year, making the data science and machine learning applications that leverage this data the world’s largest computing segment. Enterprises are looking to incorporate AI and machine learning capabilities into a broad range of applications to deliver better performance to their end users. The desired improvements need to come in the form of faster and more accurate analytics and predictions from the machine learning models. Faster and better data analytics performance translates into billions in profits and significantly better return on investment (ROI).
Challenges of end-to-end data processing
The need to develop better data analytics and machine learning models cannot be overstated. Furthermore, this is not an isolated task. A comprehensive view of the end-to-end data processing flow is required to put all the necessary parts in place to address the need for improved models. Such data flow pipelines are made up of multiple tasks and stages, from initial data engineering to acquire, ingest, and store ever-increasing amounts of data, to data science for creating and evaluating models, and finally to deployment of the best models into production.
Depending on the type and amount of data, the intended application domain and the breadth of available skills, each stage can pose considerable challenges because of the complexity of underlying data engineering and data science workflows.
Lenovo data management solutions
At Lenovo, we are collaborating with industry-leading partners to create data processing and data management solutions suitable for various stages of the data flow. These solutions include:
- Data ingest and storage repositories for structured and unstructured data
- Compute clusters for machine learning and deep learning model training
- Tools for deploying trained models for inference across data centers and edge IT infrastructures
The Lenovo Big Data Reference Design for Cloudera Data Platform (CDP) on ThinkSystem Servers is one example of a solution building block. Lenovo has been delivering validated designs for big data clusters that combine CDP (and its predecessors) with targeted Lenovo systems to enable a whole host of big data use cases.
Using the NVIDIA platform with GPU-accelerated clusters for Data Analytics and AI model training is another example. Learn about the offerings at the Lenovo Analytics and AI page.
Parallel processing with GPUs
With ever-growing data volume, the need for accelerating data processing has continued to increase. In particular, using parallel computing capabilities of GPUs has been shown to speed up data processing, data management and deep learning model training. While some enterprise applications require deep learning models, a majority of them can be improved using a broad set of traditional and emerging machine learning techniques. This is where attention is required to broaden the applicability of acceleration.
How can teams of data engineers manage large big data platforms and enable their data scientist colleagues to develop and evaluate models faster?
Fortunately, there is good news on this front. Lenovo partners NVIDIA and Cloudera have integrated the NVIDIA RAPIDS accelerated data science libraries into Cloudera Data Platform (CDP), enabling GPU-accelerated Apache Spark 3.0 applications. Apache Spark, which is included in CDP, has been a workhorse for numerous data analytics tasks, such as batch and real-time stream processing, data warehousing, and machine learning. Accelerating Spark with GPU-enabled computation is the next leap forward in helping enterprises achieve the goal of faster and better model development.
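As a rough illustration of what enabling GPU acceleration involves on the Spark side, the sketch below assembles the kind of properties typically passed to a Spark 3.0 job that uses the RAPIDS Accelerator for Apache Spark. The helper names (`rapids_spark_conf`, `to_submit_args`) and the default values are hypothetical and chosen only for illustration. The property keys are those used by Spark 3 GPU resource scheduling and the RAPIDS Accelerator plugin; the settings actually validated for CDP on ThinkSystem servers are not stated in this document.

```python
from typing import Dict, List


def rapids_spark_conf(gpus_per_executor: int = 1,
                      tasks_per_gpu: int = 4) -> Dict[str, str]:
    """Build Spark properties that request GPUs and load the RAPIDS plugin.

    The defaults are illustrative assumptions, not validated settings.
    """
    if gpus_per_executor < 1 or tasks_per_gpu < 1:
        raise ValueError("gpus_per_executor and tasks_per_gpu must be >= 1")
    return {
        # Load the RAPIDS Accelerator SQL plugin.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # Allow supported SQL/DataFrame operations to run on the GPU.
        "spark.rapids.sql.enabled": "true",
        # Spark 3.0 resource scheduling: GPUs requested per executor.
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        # Share of a GPU each task claims, so tasks_per_gpu tasks share one GPU.
        "spark.task.resource.gpu.amount": str(1.0 / tasks_per_gpu),
    }


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render the properties as '--conf key=value' pairs for spark-submit."""
    args: List[str] = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(rapids_spark_conf())))
```

The same dictionary could instead be applied to a Spark session builder in application code. In either case the plugin jar must also be made available to the job, and how that is done depends on the deployment.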
Faster Training and ROI with GPU-Accelerated Apache Spark
Lenovo has been collaborating with NVIDIA and Cloudera on validating the performance of GPU-accelerated Apache Spark 3.0 running on CDP on Lenovo systems connected with NVIDIA Networking. The early results are very exciting. Not only are we seeing significant performance improvements, but also price-to-performance benefits.
Ultimately, price-performance is often the metric that customers care about most. Can I get a better return on my investment? That is, if I invest in more capable systems, do I get proportionally more performance? In fact, the results point to a 5X or greater reduction in training times and a 3X improvement in ROI.
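To make the relationship between speedup and price-performance concrete, here is a small sketch that models price-performance gain as speedup divided by relative system cost. This simple model and the function names are assumptions for illustration only. The document does not state how the 3X ROI figure was derived, and the relative cost computed below is an output of the model, not a measured or quoted price.

```python
def price_performance_gain(speedup: float, relative_cost: float) -> float:
    """Gain in work done per unit cost versus a baseline system.

    speedup: how many times faster the new system finishes the job.
    relative_cost: new system cost divided by baseline cost (assumed input).
    """
    if speedup <= 0 or relative_cost <= 0:
        raise ValueError("speedup and relative_cost must be positive")
    return speedup / relative_cost


def implied_relative_cost(speedup: float, gain: float) -> float:
    """Relative cost that would reconcile a given speedup and gain
    under the simple model above."""
    if speedup <= 0 or gain <= 0:
        raise ValueError("speedup and gain must be positive")
    return speedup / gain


if __name__ == "__main__":
    # Figures from the text: 5X shorter training time, 3X better ROI.
    cost = implied_relative_cost(5.0, 3.0)
    print(f"Implied relative cost under this model: {cost:.2f}x")
    print(f"Check: gain = {price_performance_gain(5.0, cost):.2f}x")
```

Under this model, any system that delivers a speedup larger than its relative cost improves price-performance, which is the question the paragraph above poses.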
All this is surely leading to faster and better insights, improved ROI and ultimately, a much better business outcome for enterprises and their end customers.
Lenovo is proud to enable a leadership hybrid cloud data analytics solution composed of Lenovo’s NVIDIA-certified ThinkSystem servers with NVIDIA A100 and A30 GPUs and Cloudera’s CDP Private Cloud Base 7.1.6. These servers have been validated for running accelerated workloads with optimum performance, manageability, scalability, and security.
Figure 2. Lenovo-NVIDIA-Cloudera hybrid cloud data analytics solution
More detailed performance validation work will be published soon. Stay tuned. This is just the start. We are planning to address the aforementioned broad spectrum of enterprise applications that are hungry for data analytics and ML acceleration.
Trademarks
Lenovo and the Lenovo logo are trademarks or registered trademarks of Lenovo in the United States, other countries, or both. A current list of Lenovo trademarks is available on the Web at https://www.lenovo.com/us/en/legal/copytrade/.
The following terms are trademarks of Lenovo in the United States, other countries, or both:
Lenovo®
ThinkSystem®
Other company, product, or service names may be trademarks or service marks of others.