Ingesting data into Delta Lake without using Spark

When setting up a modern data lake, it is important to have ACID guarantees in place, just as in a traditional data warehouse. They protect against cases where two jobs import data in parallel or a job fails halfway through a write.

There are currently two open table formats that provide atomicity, consistency, isolation, and durability, along with more advanced features like time travel: Apache Iceberg and Delta Lake. Comparing the two, the Iceberg specification defines two different types of manifest files without explaining how they differ, while the Delta specification is considerably more precise. For that reason, I recommend going with Delta as the format.

To take an Apache Arrow in-memory buffer and write it to a data lake as a Delta table, there is currently no support in the native Arrow libraries, which would be my preferred solution. That leaves two options: use Spark, which provides excellent support for Delta tables via the open source integration from Databricks, or integrate directly with one of the available standalone libraries.

delta-rs, written in Rust, is currently the only real option for importing data from Arrow tables or record batches. The package is fairly feature-complete and also supports writing data, something that other options like delta-kernel-rs or DuckDB are still missing, even in 2025. Because it uses Arrow internally, transferring data to it is straightforward.
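To make that concrete, here is a minimal sketch of writing an in-memory Arrow record batch to a Delta table with the deltalake crate (the Rust side of delta-rs). The table location, the schema, and the Cargo features mentioned in the comments are my own assumptions for the example, not requirements of the library.

```rust
// Assumed Cargo.toml dependencies for this sketch:
//   deltalake = { version = "0.x", features = ["datafusion"] }  // write support
//   tokio = { version = "1", features = ["full"] }
use std::sync::Arc;

use deltalake::arrow::array::{ArrayRef, Int64Array, StringArray};
use deltalake::arrow::datatypes::{DataType, Field, Schema};
use deltalake::arrow::record_batch::RecordBatch;
use deltalake::DeltaOps;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build a small Arrow record batch entirely in memory.
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int64, false),
        Field::new("name", DataType::Utf8, false),
    ]));
    let columns: Vec<ArrayRef> = vec![
        Arc::new(Int64Array::from(vec![1, 2, 3])),
        Arc::new(StringArray::from(vec!["alpha", "beta", "gamma"])),
    ];
    let batch = RecordBatch::try_new(schema, columns)?;

    // Point DeltaOps at a table location and commit the batch.
    // If no table exists there yet, the write creates one.
    let ops = DeltaOps::try_from_uri("./example_delta_table").await?;
    let table = ops.write(vec![batch]).await?;

    println!("Delta table is now at version {:?}", table.version());
    Ok(())
}
```

Each successful write commits a new version of the table, which is exactly where the ACID guarantees mentioned above come from.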

While Rust will never be my first choice of programming language, it is quite straightforward to write a small wrapper around delta-rs and expose it through a foreign function interface that can be called from a cool language like C++ or Swift.
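As an illustration, here is a rough sketch of what such a wrapper could look like: a single C-callable function that converts a local Parquet file into a Delta table. The function name, the 0/1 return codes, and the blocking Tokio runtime are my own choices for the example, and I assume the deltalake crate's re-exports of arrow and parquet; nothing here is prescribed by delta-rs itself.

```rust
// Build this as a cdylib or staticlib so the symbol can be linked from C++ or Swift.
use std::ffi::{c_char, CStr};
use std::fs::File;

use deltalake::arrow::record_batch::RecordBatch;
use deltalake::parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;
use deltalake::DeltaOps;

/// Convert a local Parquet file into a Delta table.
/// Returns 0 on success and 1 on any failure, which keeps the
/// interface trivial to check from the calling language.
#[no_mangle]
pub extern "C" fn convert_parquet_to_delta(
    parquet_path: *const c_char,
    table_uri: *const c_char,
) -> i32 {
    if parquet_path.is_null() || table_uri.is_null() {
        return 1;
    }

    // Translate the raw C strings into owned Rust strings.
    let (parquet_path, table_uri) = unsafe {
        match (
            CStr::from_ptr(parquet_path).to_str(),
            CStr::from_ptr(table_uri).to_str(),
        ) {
            (Ok(p), Ok(t)) => (p.to_owned(), t.to_owned()),
            _ => return 1,
        }
    };

    // delta-rs exposes an async API, so block on a small Tokio runtime here.
    let Ok(runtime) = tokio::runtime::Runtime::new() else {
        return 1;
    };
    match runtime.block_on(write_parquet_as_delta(&parquet_path, &table_uri)) {
        Ok(_) => 0,
        Err(_) => 1,
    }
}

async fn write_parquet_as_delta(
    parquet_path: &str,
    table_uri: &str,
) -> Result<(), Box<dyn std::error::Error>> {
    // Read the Parquet file into Arrow record batches.
    let file = File::open(parquet_path)?;
    let reader = ParquetRecordBatchReaderBuilder::try_new(file)?.build()?;
    let batches: Vec<RecordBatch> = reader.collect::<Result<_, _>>()?;

    // Create or append to the Delta table and commit the batches.
    let ops = DeltaOps::try_from_uri(table_uri).await?;
    ops.write(batches).await?;
    Ok(())
}
```

With a matching declaration in a C header, the function can be called directly from C++ or bridged into Swift.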

To show how easy it actually is to import Parquet data with delta-rs, there is a simple command line tool for converting data to Delta available at https://github.com/matt-do-it/DeltaConvert.