Open Access
REVIEW
A Survey of Spark Scheduling Strategy Optimization Techniques and Development Trends
Department of Computer Science and Technology, School of Computer Science, Xi’an University of Posts and Telecommunications, Xi’an, 710100, China
* Corresponding Author: Xuanlin Wen.
Computers, Materials & Continua 2025, 83(3), 3843-3875. https://doi.org/10.32604/cmc.2025.063047
Received 03 January 2025; Accepted 06 March 2025; Issue published 19 May 2025
Abstract
Spark performs excellently in large-scale data-parallel computing and iterative processing. However, as data size and program complexity grow, the default scheduling strategy struggles to meet the demands of resource utilization and performance optimization. Scheduling strategy optimization, a key direction for improving Spark’s execution efficiency, has attracted widespread attention. This paper first introduces the basic theory of Spark, compares several default scheduling strategies, and discusses common scheduling performance evaluation indicators and the factors affecting scheduling efficiency. It then summarizes existing scheduling optimization schemes under three scheduling modes: load characteristics, cluster characteristics, and matching of the two. Representative algorithms are analyzed in terms of performance indicators and applicable scenarios, and the advantages and disadvantages of the scheduling modes are compared. The paper also explores in detail the integration of Spark scheduling strategies with specific application scenarios and the challenges in production environments. Finally, the limitations of existing schemes are analyzed, and future prospects are outlined.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.