Authors
Kelvin He, David Ellison
Published
17 Sep 2025
Form Number
LP2269
PDF size
7 pages, 79 KB
Abstract
Accelerating data science workflows means shortening every stage of the analytics pipeline, from data ingest and transformation through model training and inference, so teams iterate faster at lower infrastructure cost.
In this paper, we quantify generation‑to‑generation performance on common data‑science workloads running entirely on CPUs. We compare Modin data manipulation, scikit‑learn training (DBSCAN, K‑means, KNN‑Classifier, Logistic/Linear Regression, Random Forests), and inference (including LightGBM, XGBoost, and CatBoost) across 3rd, 5th, and 6th Gen Intel Xeon processors.
The results of this analysis show that 6th Gen Xeon improves the following:
- Data manipulation (Modin) by up to 6.13× vs 3rd Gen (at 1.6M rows) and 2.64× vs 5th Gen
- Training by up to 7.23× vs 3rd Gen and 3.00× vs 5th Gen across algorithms
- Inference by up to 4.37× vs 3rd Gen and 2.38× vs 5th Gen across algorithms
A balanced composite view yields an overall gain of ≈1.6×–2.6× (6th vs 5th Gen) and ≈2.5×–6.1× (6th vs 3rd Gen) for the end‑to‑end pipeline.
This paper targets data scientists, ML engineers, and performance‑minded architects who already understand core scikit‑learn APIs (fit/predict, pipelines) and basic ML concepts (train‑test split, common metrics). We assume readers are comfortable with Python and want practical guidance on extracting more performance from Intel Xeon‑based infrastructure without rewriting their code.
Introduction
Most enterprise analytics pipelines still lean on Python DataFrames (pandas/Modin) and classical ML libraries (scikit‑learn, XGBoost), so CPU‑only efficiency directly impacts cost, latency, and throughput. As Intel Xeon generations advance, pairing Modin with Intel Extension for Scikit‑learn turns architectural gains into real end‑to‑end time savings with minimal code change.
Three ingredients make this possible:
- Pandas ↔ Modin. Pandas is the de‑facto DataFrame API; Modin keeps that API while parallelizing execution across cores or cluster back‑ends such as Ray. This enables parallel I/O and compute with minimal code change (an import swap).
- Intel Extension for Scikit‑learn (sklearn‑intelex). A single call to patch_sklearn() dynamically patches popular estimators to highly optimized C++ kernels (oneDAL), accelerating both fit and predict without rewriting pipelines. A combined sketch of both changes follows this list.
- Xeon generations. We focus on realistic CPU‑only deployments comparing 3rd Gen Xeon Scalable, 5th Gen Xeon, and 6th Gen Xeon ("Xeon 6").
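As a minimal sketch of the two code changes involved (the CSV file name, label column, and estimator choice below are illustrative placeholders, not the benchmark code):

    # Minimal adoption sketch: swap the pandas import for Modin and patch
    # scikit-learn before importing estimators. File name, column name, and
    # estimator are illustrative placeholders.
    from sklearnex import patch_sklearn
    patch_sklearn()  # re-binds supported estimators to oneDAL kernels

    import modin.pandas as pd  # drop-in replacement for "import pandas as pd"
    from sklearn.linear_model import LogisticRegression  # now the patched class

    df = pd.read_csv("data.csv")  # parallel ingest across cores via Modin/Ray
    X, y = df.drop(columns=["label"]), df["label"]

    clf = LogisticRegression().fit(X, y)  # accelerated fit
    preds = clf.predict(X)                # accelerated predict

Everything else in an existing pandas/scikit‑learn code path stays the same, which is what keeps adoption low‑friction.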
Series of papers
This paper represents Part 3 of a series on Accelerating Data Science Workflows:
- Part 1: Modin vs. Pandas for data manipulation (I/O, transforms)
- Part 2: Intel Extension for Scikit‑learn (training & inference)
- Part 3 (this paper): End‑to‑end gains across 3rd, 5th, and 6th Gen Intel Xeon CPUs, covering data manipulation + model training + inference, with composite pipeline speed‑ups
Algorithms and Datasets
To ensure apples‑to‑apples comparisons, we reuse the workloads from Parts 1–2 and run them on identical software stacks across 3rd, 5th, and 6th Gen Intel Xeon CPUs. The list below specifies the workloads, metrics, and environment; a sketch of one representative training run follows it.
- Workloads:
- Data manipulation (Modin): CSV ingest + transforms at 400K / 800K / 1.6M rows.
- Training (sklearn‑intelex): DBSCAN, K‑means, KNeighborsClassifier, Logistic/Linear Regression, Random Forest Classifier/Regressor.
- Inference: CatBoost, LightGBM, XGBoost, plus the scikit‑learn models above.
- Metrics: Wall‑clock time for each operation; composite results use the median to reduce skew from outliers. We also report 6th‑vs‑5th and 6th‑vs‑3rd Gen speed‑up factors.
- Environment: Identical software stack on every platform; only the CPU generation (and its supporting platform) changes, as detailed in the Lab Configurations section.
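To make the training workloads concrete, here is an illustrative K‑means run under the patched scikit‑learn; the synthetic data shape and K‑means settings are assumptions, not the exact benchmark parameters from Part 2.

    # Illustrative training workload: K-means on synthetic data.
    from sklearnex import patch_sklearn
    patch_sklearn()  # must run before the sklearn import below

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)          # fixed seed, per the methodology
    X = rng.standard_normal((400_000, 16))  # synthetic stand-in dataset

    model = KMeans(n_clusters=8, n_init=10, random_state=0)
    model.fit(X)  # the fit dispatches to the optimized oneDAL kernel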
Methodology Notes & Reproducibility
To keep results fair and repeatable, we standardize seeds, force garbage collection between iterations, and repeat runs to smooth variance; a sketch of the timing harness follows this list. Use the notes below to replicate our setup and verify the numbers.
- All timings are wall‑clock.
- Each test was repeated for multiple iterations, with garbage collection (gc.collect()) forced between runs; the median is reported.
- Data manipulation used Modin; ML used scikit‑learn patched with Intel Extension for Scikit‑learn.
- The same software versions, BIOS settings, and datasets were used on all three platforms (see Lab Configurations).
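A minimal timing harness capturing this discipline might look as follows; run_workload and the iteration count are placeholders for the actual benchmark code.

    # Repeat each workload, collect garbage between runs, report the median.
    import gc
    import statistics
    import time

    def time_workload(run_workload, n_iter=5):
        timings = []
        for _ in range(n_iter):
            gc.collect()                 # clear garbage before each run
            start = time.perf_counter()  # wall-clock timer
            run_workload()
            timings.append(time.perf_counter() - start)
        return statistics.median(timings)  # median reduces outlier skew

    # Example: median_seconds = time_workload(lambda: model.fit(X))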
Results
In this section, we list the raw wall‑clock times and normalized speed‑ups (5th Gen to 6th Gen, and 3rd Gen to 6th Gen). Lower time values are better; the ratios in the two rightmost columns highlight the 6th Gen uplifts.
- Data Manipulation (Modin)
- Model Training (Intel Extension for Scikit‑learn)
- Model Inference (Intel Extension for Scikit‑learn)
- Composite “Full Pipeline” View
Data Manipulation (Modin)
The improvements for Modin across the three dataset sizes are shown in the following table:
- 6th vs 5th Gen: 1.88×–2.64×
- 6th vs 3rd Gen: 3.00×–6.13×
In the table headings, (s) denotes wall‑clock time in seconds; for example, "3rd Gen (s)" is the runtime on the 3rd Gen platform in seconds.
Model Training (Intel Extension for Scikit‑learn)
The following table lists per‑algorithm training times. Across algorithms, the training uplifts are:
- 6th vs 5th Gen: 1.11×–3.00×
- 6th vs 3rd Gen: 1.44×–7.23×
Model Inference (Intel Extension for Scikit‑learn)
The following table lists per‑algorithm inference times. Across algorithms, the inference uplifts are:
- 6th vs 5th Gen: 1.19×–2.38×
- 6th vs 3rd Gen: 1.63×–4.37×
Composite “Full Pipeline” View
To avoid over‑weighting any single stage, we compute a balanced median uplift across the three segments (Modin data manipulation, training, inference):
- 6th vs 5th Gen: ≈1.6×–2.6×
- 6th vs 3rd Gen: ≈2.5×–6.1×
Note: If your workload is training‑heavy or inference‑heavy, scale each segment by the appropriate share of wall‑clock time to obtain the scenario‑specific uplift.
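As a worked example of this scaling rule, the sketch below combines an assumed time profile with hypothetical mid‑range per‑stage uplifts; substitute your own measured shares and speed‑ups.

    # Hypothetical stage weights and mid-range 6th-vs-5th Gen uplifts.
    weights = {"prep": 0.40, "train": 0.35, "infer": 0.25}  # share of wall-clock time
    uplift = {"prep": 2.0, "train": 1.5, "infer": 1.8}      # per-stage speed-up factors

    # Each stage's time shrinks by 1/uplift, so the pipeline-level speed-up
    # is the time-weighted harmonic combination of the per-stage factors.
    new_time = sum(w / uplift[s] for s, w in weights.items())
    print(f"Scenario-specific uplift: {1 / new_time:.2f}x")  # ~1.75x for this profile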
Conclusions
6th Gen Intel Xeon consistently advances end‑to‑end, CPU‑only analytics versus 5th and 3rd Gen baselines using the same, familiar software stack (Modin + Intel Extension for Scikit‑learn). In our tests, 6th/5th uplifts typically land in the 1.88×–2.64× range for data manipulation, 1.11×–3.00× for model training, and 1.19×–2.38× for inference; against 3rd Gen, ranges widen to 3.00×–6.13×, 1.44×–7.23×, and 1.63×–4.37×, respectively. The full‑pipeline gain is workload‑dependent but commonly falls around ≈1.6×–2.6× vs 5th Gen (and ≈2.5×–6.1× vs 3rd Gen) when combining prep, fit, and predict.
Practical takeaways:
- Prioritize CPU upgrades when your pipelines are I/O‑heavy (large CSV/Parquet ingestion, wide group‑bys) or rely on tree ensembles, clustering, or distance‑based methods. These show the largest improvements from gen‑to‑gen and from the Intel‑optimized kernels.
- Mind training vs inference trade‑offs. Some algorithms may train modestly faster but infer dramatically faster (or vice versa). Choose CPU generation and algorithmic settings based on where your SLA or cost is constrained (e.g., batch‑training windows vs. online latency).
- Adoption is low‑friction. The improvements arrive with minimal code change: an import swap for Modin and a patch_sklearn() call for Intel Extension for Scikit‑learn, preserving APIs and model semantics.
- Right‑size with pipeline weights. Apply the per‑stage ranges to your own time profile (e.g., 40% prep / 35% train / 25% infer) to estimate business impact. Where inference dominates, favor gains in predict‑time algorithms; where training windows dominate, weight fit‑time uplifts more heavily.
Overall, upgrading to 6th Gen Intel Xeon turns many formerly multi‑second steps into sub‑second operations and materially compresses end‑to‑end latency, without abandoning the mainstream pandas/scikit‑learn ecosystem.
Lab Configurations
Our test server had the hardware and software configuration listed in the following table.
Note that the installed memory differs across the three servers, reflecting each generation's supported DIMM speeds and channel counts. The reported uplifts therefore characterize each platform as a whole (CPU plus memory subsystem) rather than the CPU in isolation; memory‑bandwidth‑sensitive stages such as large CSV ingest likely benefit in part from the newer platforms' faster memory.
References
For more information, see these web resources:
- Modin documentation
https://modin.readthedocs.io/
- Intel Extension for Scikit‑learn Overview
https://www.intel.com/content/www/us/en/developer/tools/oneapi/scikit-learn.html
- Intel Extension for Scikit‑learn Getting Started
https://www.intel.com/content/www/us/en/developer/articles/guide/intel-extension-for-scikit-learn-getting-started.html
- 3rd Gen Intel Xeon Scalable
https://www.intel.com/content/www/us/en/products/docs/processors/xeon-accelerated/3rd-gen-xeon-scalable-processors.html
- 5th Gen Intel Xeon Scalable
https://www.intel.com/content/www/us/en/products/docs/processors/xeon/5th-gen-xeon-scalable-processors.html
- Intel Xeon 6
https://www.intel.com/content/www/us/en/products/details/processors/xeon.html
Authors
Kelvin He is an AI Data Scientist at Lenovo. He is a seasoned AI and data science professional specializing in building machine learning frameworks and AI-driven solutions. Kelvin is experienced in leading end-to-end model development, with a focus on turning business challenges into data-driven strategies. He is passionate about AI benchmarks, optimization techniques, and LLM applications, enabling businesses to make informed technology decisions.
David Ellison is the Chief Data Scientist for Lenovo ISG. Through Lenovo's US and European AI Discover Centers, he leads a team that uses cutting-edge AI techniques to deliver solutions for external customers while internally supporting the overall AI strategy for the Worldwide Infrastructure Solutions Group. Before joining Lenovo, he ran an international scientific analysis and equipment company and worked as a Data Scientist for the US Postal Service. Prior to that, he received a PhD in Biomedical Engineering from Johns Hopkins University. He has numerous publications in top-tier journals, including two in the Proceedings of the National Academy of Sciences.
Trademarks
Lenovo and the Lenovo logo are trademarks or registered trademarks of Lenovo in the United States, other countries, or both. A current list of Lenovo trademarks is available on the Web at https://www.lenovo.com/us/en/legal/copytrade/.
The following terms are trademarks of Lenovo in the United States, other countries, or both:
Lenovo®
ThinkSystem®
The following terms are trademarks of other companies:
Intel® and Xeon® are trademarks of Intel Corporation or its subsidiaries.
Linux® is the trademark of Linus Torvalds in the U.S. and other countries.
Other company, product, or service names may be trademarks or service marks of others.