What's New in Spark 3.5
Apache Spark continues to evolve, and the latest release, Spark 3.5, brings a host of exciting new features and improvements that enhance performance, usability, and flexibility.
Whether you’re a data engineer, data scientist, or big data enthusiast, these updates are sure to streamline your workflows and empower you to handle even larger datasets with greater efficiency. Let’s dive into some of the standout features of Apache Spark 3.5.
Enhanced SQL and DataFrame API
Spark 3.5 introduces several enhancements to its SQL and DataFrame API, making it even more powerful and expressive. Spark SQL now offers improved compliance with ANSI SQL standards, including better support for complex queries, additional built-in functions, and more consistent error handling. New and improved window functions provide greater flexibility for complex analytics tasks, making it easier to perform advanced calculations over partitions of data.
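To make "calculations over partitions of data" concrete, here is a minimal sketch of a standard SQL window query that computes a running total and a rank within each region. The table, column names, and sample rows are hypothetical. SQLite (from the Python standard library) is used only as a stand-in engine so the sketch runs without a cluster; it is not Spark. The query uses only standard window syntax, so a Spark session should accept the same statement through spark.sql(...) once a sales table or view is registered.

```python
import sqlite3

# Hypothetical sample data: (region, sale_day, amount).
SALES = [
    ("east", 1, 100),
    ("east", 2, 50),
    ("east", 3, 70),
    ("west", 1, 30),
    ("west", 2, 90),
]

# Standard SQL window query: each OVER clause defines a window per region.
WINDOW_QUERY = """
SELECT region,
       sale_day,
       amount,
       SUM(amount) OVER (PARTITION BY region ORDER BY sale_day) AS running_total,
       RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank
FROM sales
ORDER BY region, sale_day
"""


def run_window_query(rows):
    """Load rows into an in-memory SQLite table and run the window query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sales (region TEXT, sale_day INTEGER, amount INTEGER)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        return conn.execute(WINDOW_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_window_query(SALES):
        print(row)
```

Each output row keeps its original columns and gains two values computed over its region only, which is what separates a window function from a GROUP BY aggregate that collapses rows.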
Improved Performance and Scalability
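Performance is a key focus in Spark 3.5, with several optimizations aimed at speeding up data processing. Adaptive Query Execution (AQE), introduced in Spark 3.0 and enabled by default since Spark 3.2, continues to be refined. Its core capabilities, dynamic skew join optimization and dynamic coalescing of shuffle partitions, re-optimize query execution plans at runtime using statistics collected from completed shuffle stages rather than relying only on estimates made before the query starts.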
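The two AQE behaviors can be illustrated with a simplified model in plain Python. This is a sketch of the idea, not Spark's implementation. The skew check follows the rule described in the Spark configuration docs (a partition is treated as skewed when it is larger than a factor times the median partition size and also larger than a byte threshold), and the coalescing step is a simple greedy merge of adjacent partitions toward an advisory size. The function names are hypothetical, and the default values are assumptions chosen to mirror spark.sql.adaptive.skewJoin.skewedPartitionFactor, spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes, and spark.sql.adaptive.advisoryPartitionSizeInBytes.

```python
from statistics import median

MB = 1024 * 1024


def find_skewed_partitions(sizes_in_bytes, factor=5.0, threshold_in_bytes=256 * MB):
    """Return indices of partitions flagged by the simplified skew rule.

    A partition is flagged when it exceeds both factor * median size
    and the absolute byte threshold.
    """
    if not sizes_in_bytes:
        return []
    med = median(sizes_in_bytes)
    return [
        i
        for i, size in enumerate(sizes_in_bytes)
        if size > factor * med and size > threshold_in_bytes
    ]


def coalesce_partitions(sizes_in_bytes, advisory_size_in_bytes=64 * MB):
    """Greedily group adjacent partitions without exceeding the advisory size.

    Returns a list of groups, each a list of original partition indices.
    A single partition larger than the advisory size stays in its own group.
    """
    groups, current, current_size = [], [], 0
    for i, size in enumerate(sizes_in_bytes):
        if current and current_size + size > advisory_size_in_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    shuffle_sizes = [40 * MB, 50 * MB, 45 * MB, 900 * MB, 48 * MB]
    print("skewed:", find_skewed_partitions(shuffle_sizes))

    small_sizes = [10 * MB, 20 * MB, 30 * MB, 50 * MB, 5 * MB]
    print("coalesced groups:", coalesce_partitions(small_sizes))
```

In a real job these decisions are made by Spark itself; the corresponding switches are spark.sql.adaptive.enabled, spark.sql.adaptive.skewJoin.enabled, and spark.sql.adaptive.coalescePartitions.enabled.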
Kubernetes Enhancements
Kubernetes has become a popular choice for deploying Spark applications, and Spark 3.5 brings several enhancements to its Kubernetes support. The Kubernetes scheduler now includes better handling of pod allocation and more efficient resource management, leading to improved stability and performance for Spark jobs running on Kubernetes clusters.
Spark applications can also mount Kubernetes persistent volumes for stateful storage, which makes it easier to keep data across pod restarts and to share it between jobs.
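As a rough illustration of how a persistent volume claim is wired into a Spark job, the sketch below builds the spark.kubernetes.*.volumes.persistentVolumeClaim.* properties described in the "Running Spark on Kubernetes" documentation and renders them as spark-submit arguments. The helper functions, the volume name data, the claim name shared-data-claim, and the mount path /mnt/data are all hypothetical, and the claim is assumed to already exist in the cluster.

```python
def pvc_conf(volume_name, claim_name, mount_path, role="executor", read_only=False):
    """Build Spark properties that mount an existing PVC into driver or executor pods."""
    if role not in ("driver", "executor"):
        raise ValueError("role must be 'driver' or 'executor'")
    prefix = f"spark.kubernetes.{role}.volumes.persistentVolumeClaim.{volume_name}"
    return {
        f"{prefix}.options.claimName": claim_name,
        f"{prefix}.mount.path": mount_path,
        f"{prefix}.mount.readOnly": str(read_only).lower(),
    }


def to_submit_args(conf):
    """Render a property dict as a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Hypothetical names: adjust to the volume and claim in your cluster.
    conf = pvc_conf("data", "shared-data-claim", "/mnt/data")
    print(" ".join(to_submit_args(conf)))
```

Generating the properties from a small helper keeps the driver and executor settings consistent; calling pvc_conf a second time with role="driver" yields the matching driver-side keys.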