
Commit 043a39e

fix(deps): update dependency io.delta:delta-standalone_2.13 to v3 (#170)
[![Mend Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [io.delta:delta-standalone_2.13](https://delta.io/) ([source](https://togithub.com/delta-io/delta)) | `0.6.0` -> `3.0.0` | [![age](https://developer.mend.io/api/mc/badges/age/maven/io.delta:delta-standalone_2.13/3.0.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://developer.mend.io/api/mc/badges/adoption/maven/io.delta:delta-standalone_2.13/3.0.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://developer.mend.io/api/mc/badges/compatibility/maven/io.delta:delta-standalone_2.13/0.6.0/3.0.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://developer.mend.io/api/mc/badges/confidence/maven/io.delta:delta-standalone_2.13/0.6.0/3.0.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) |

---

### Release Notes

<details>
<summary>delta-io/delta (io.delta:delta-standalone_2.13)</summary>

### [`v3.0.0`](https://togithub.com/delta-io/delta/releases/tag/v3.0.0): Delta Lake 3.0.0

We are excited to announce the final release of Delta Lake 3.0.0. This release includes several exciting new features and artifacts.

#### Highlights

Here are the most important aspects of 3.0.0:

##### Spark 3.5 Support

Unlike the initial preview release, Delta Spark is now built on top of Apache Spark™ 3.5. See the Delta Spark section below for more details.

##### Delta Universal Format (UniForm)

- Documentation: https://docs.delta.io/3.0.0/delta-uniform.html
- Maven artifacts: [delta-iceberg_2.12](https://repo1.maven.org/maven2/io/delta/delta-iceberg_2.12/3.0.0/), [delta-iceberg_2.13](https://repo1.maven.org/maven2/io/delta/delta-iceberg_2.13/3.0.0/)

Delta Universal Format (UniForm) will allow you to read Delta tables with Hudi and Iceberg clients.
Iceberg support is available with this release. UniForm takes advantage of the fact that all table storage formats, such as Delta, Iceberg, and Hudi, actually consist of Parquet data files and a metadata layer. In this release, UniForm automatically generates Iceberg metadata and commits it to the Hive metastore, allowing Iceberg clients to read Delta tables as if they were Iceberg tables. Create a UniForm-enabled table using the following command:

```sql
CREATE TABLE T (c1 INT) USING DELTA
TBLPROPERTIES ('delta.universalFormat.enabledFormats' = 'iceberg');
```

Every write to this table will automatically keep the Iceberg metadata updated. See the documentation [here](https://docs.delta.io/3.0.0/delta-uniform.html) for more details, and the key implementations [here](https://togithub.com/delta-io/delta/commit/9b50cd206004ae28105846eee9d910f39019ab8b) and [here](https://togithub.com/delta-io/delta/commit/01fee68c).

##### Delta Kernel

- API documentation: https://docs.delta.io/3.0.0/api/java/kernel/index.html
- Maven artifacts: [delta-kernel-api](https://repo1.maven.org/maven2/io/delta/delta-kernel-api/3.0.0/), [delta-kernel-defaults](https://repo1.maven.org/maven2/io/delta/delta-kernel-defaults/3.0.0/)

The Delta Kernel project is a set of Java libraries (Rust will be coming soon!) for building Delta connectors that can read (and, soon, write to) Delta tables without the need to understand the [Delta protocol details](https://togithub.com/delta-io/delta/blob/master/PROTOCOL.md). You can use this library to do the following:

- Read data from Delta tables in a single thread in a single process.
- Read data from Delta tables using multiple threads in a single process.
- Build a complex connector for a distributed processing engine and read very large Delta tables.
- \[soon!] Write to Delta tables from multiple threads / processes / distributed engines.

Reading a Delta table with the Kernel APIs is as follows.
```java
TableClient myTableClient = DefaultTableClient.create();           // define a client
Table myTable = Table.forPath(myTableClient, "/delta/table/path"); // define what table to scan
Snapshot mySnapshot = myTable.getLatestSnapshot(myTableClient);    // define which version of the table to scan
Predicate scanFilter = ...                                         // define the predicate
Scan myScan = mySnapshot.getScanBuilder(myTableClient)             // specify the scan details
    .withFilters(scanFilter)
    .build();
Scan.readData(...)                                                 // returns the table data
```

Full example code can be found [here](https://togithub.com/delta-io/delta/blob/branch-3.0/kernel/examples/table-reader/src/main/java/io/delta/kernel/examples/SingleThreadedTableReader.java).

For more information, refer to:

- [User guide](https://togithub.com/delta-io/delta/blob/branch-3.0/kernel/USER_GUIDE.md) on the step-by-step process of using Kernel in a standalone Java program or in a distributed processing connector.
- [Slides](https://docs.google.com/presentation/d/1PGSSuJ8ndghucSF9GpYgCi9oeRpWolFyehjQbPh92-U/edit) explaining the rationale behind Kernel and the API design.
- Example [Java programs](https://togithub.com/delta-io/delta/tree/branch-3.0/kernel/examples/table-reader/src/main/java/io/delta/kernel/examples) that illustrate how to read Delta tables using the Kernel APIs.
- Table and default TableClient API Java [documentation](https://docs.delta.io/3.0.0/api/java/kernel/index.html)

This release of Delta contains the Kernel Table API and the default TableClient API definitions and implementation, which allow:

- Reading Delta tables with optional Deletion Vectors enabled or column mapping (name mode only) enabled.
- Partition pruning optimization to reduce the number of data files to read.

##### Welcome Delta Connectors to the Delta repository!

All previous connectors from https://github.com/delta-io/connectors have been moved to this repository (https://github.com/delta-io/delta) as we aim to unify our Delta connector ecosystem structure.
This includes Delta-Standalone, Delta-Flink, Delta-Hive, PowerBI, and SQL-Delta-Import. The repository https://github.com/delta-io/connectors is now deprecated.

#### Delta Spark

Delta Spark 3.0.0 is built on top of [Apache Spark™ 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html). Similar to Apache Spark, we have released Maven artifacts for both Scala 2.12 and Scala 2.13. Note that the Delta Spark Maven artifact has been renamed from **delta-core** to **delta-spark**.

- Documentation: https://docs.delta.io/3.0.0/index.html
- API documentation: https://docs.delta.io/3.0.0/delta-apidoc.html#delta-spark
- Maven artifacts: [delta-spark_2.12](https://repo1.maven.org/maven2/io/delta/delta-spark_2.12/3.0.0/), [delta-spark_2.13](https://repo1.maven.org/maven2/io/delta/delta-spark_2.13/3.0.0/), [delta-contribs_2.12](https://repo1.maven.org/maven2/io/delta/delta-contribs_2.12/3.0.0/), [delta-contribs_2.13](https://repo1.maven.org/maven2/io/delta/delta-contribs_2.13/3.0.0/), [delta-storage](https://repo1.maven.org/maven2/io/delta/delta-storage/3.0.0/), [delta-storage-s3-dynamodb](https://repo1.maven.org/maven2/io/delta/delta-storage-s3-dynamodb/3.0.0/), [delta-iceberg_2.12](https://repo1.maven.org/maven2/io/delta/delta-iceberg_2.12/3.0.0/), [delta-iceberg_2.13](https://repo1.maven.org/maven2/io/delta/delta-iceberg_2.13/3.0.0/)
- Python artifacts: https://pypi.org/project/delta-spark/3.0.0/

The key features of this release are:

- [Support for Apache Spark 3.5](https://togithub.com/delta-io/delta/commit/4f9c8b9cc294ec7b321847115bf87909c356bc5a)
- [Delta Universal Format](https://togithub.com/delta-io/delta/commit/9b50cd206004ae28105846eee9d910f39019ab8b) - Write as Delta, read as Iceberg! See the highlighted section above.
- [Up to 10x performance improvement of UPDATE using Deletion Vectors](https://togithub.com/delta-io/delta/commit/0a0ea97b) - Delta UPDATE operations now support writing Deletion Vectors. When enabled, the performance of UPDATEs receives a significant boost.
- [More than 2x performance improvement of DELETE using Deletion Vectors](https://togithub.com/delta-io/delta/commit/fc39f78d) - This fix improves the file path canonicalization logic by avoiding expensive `Path.toUri.toString` calls for each row in a table, resulting in a several-hundred-percent speedup of DELETE operations (only when Deletion Vectors have been [enabled](https://docs.delta.io/latest/delta-deletion-vectors.html#enable-deletion-vectors) on the table).
- [Up to 2x faster MERGE operation](https://togithub.com/delta-io/delta/issues/1827) - MERGE now better leverages data skipping, uses the insert-only code path in more cases, and has an overall improved execution, achieving up to 2x better performance in various scenarios.
- [Support streaming reads from column mapping enabled tables](https://togithub.com/delta-io/delta/commit/3441df16) when `DROP COLUMN` and `RENAME COLUMN` have been used. This includes streaming support for Change Data Feed. See the documentation [here](https://docs.delta.io/3.0.0/delta-streaming.html#tracking-non-additive-schema-changes) for more details.
- [Support specifying the columns for which Delta will collect file-skipping statistics](https://togithub.com/delta-io/delta/commit/8f2b532a) via the table property `delta.dataSkippingStatsColumns`. Previously, Delta would only collect file-skipping statistics for the first N columns in the table schema (32 by default). Now, users can easily customize this.
- [Support](https://togithub.com/delta-io/delta/commit/d9a5f9f9) zero-copy [convert to Delta from Iceberg](https://docs.delta.io/3.0.0/delta-utility.html#convert-an-iceberg-table-to-a-delta-table) tables on Apache Spark 3.5 using `CONVERT TO DELTA`. This feature was excluded from the Delta Lake 2.4 release since Iceberg did not yet support Apache Spark 3.4 (or 3.5). This command generates a Delta table in the same location and does not rewrite any Parquet files.
- [Checkpoint V2](https://togithub.com/delta-io/delta/blob/master/PROTOCOL.md#v2-checkpoint-table-feature) - Introduced a new [Checkpoint V2 format](https://togithub.com/delta-io/delta/blob/master/PROTOCOL.md#v2-checkpoint-table-feature) in the Delta Protocol Specification and implemented [read](https://togithub.com/delta-io/delta/commit/6859c863e88bfe7be6d5ccbb0c221bdde57a00c3)/[write](https://togithub.com/delta-io/delta/commit/7442ebfb8df1ae7ed8630d092abd617c110be5d6) support in Delta Spark. The new Checkpoint V2 format provides more reliability than the existing V1 checkpoint format.
- [Log Compactions](https://togithub.com/delta-io/delta/commit/5d43f1db5975dca31da29f714b1a155aa4367aee) - Introduced new log compaction files in the Delta Protocol Specification, which can be useful in reducing the frequency of Delta checkpoints. Added [read support](https://togithub.com/delta-io/delta/commit/0e05caf5c2124f61da69dc6671c8011450a6e831) for log compaction files in Delta Spark.
- [Safe casts enabled by default for UPDATE and MERGE operations](https://togithub.com/delta-io/delta/commit/6d78d434) - Delta UPDATE and MERGE operations now result in an error when values cannot be safely cast to the type in the target table schema. All implicit casts in Delta now follow `spark.sql.storeAssignmentPolicy` instead of `spark.sql.ansi.enabled`.
- [General Apache Spark catalog support for auxiliary commands](https://togithub.com/delta-io/delta/commit/4eb177eaf4c16080887d78407bb64a4183832686) - Several popular auxiliary commands now support general table resolution in Apache Spark. This simplifies the code and also makes it possible to use these commands with custom table catalogs based on Delta Lake tables. The following commands are now supported in this way: VACUUM, RESTORE TABLE, DESCRIBE DETAIL, DESCRIBE HISTORY, SHALLOW CLONE, OPTIMIZE.
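As an illustration of the data-skipping and Iceberg-conversion features above, the statements below sketch how they might be invoked. The table name, column names, and path are hypothetical; the `delta.dataSkippingStatsColumns` property and the `CONVERT TO DELTA` command follow the Delta documentation linked above:

```sql
-- Collect file-skipping statistics only for the listed columns
-- (table and column names are hypothetical).
ALTER TABLE events
  SET TBLPROPERTIES ('delta.dataSkippingStatsColumns' = 'event_time,user_id');

-- Zero-copy conversion of an Iceberg table to Delta (the path is hypothetical);
-- metadata is generated in place and no Parquet data files are rewritten.
CONVERT TO DELTA iceberg.`/data/warehouse/events_iceberg`;
```

Statistics are collected on subsequent writes, so recomputing statistics for existing files (for example via `ANALYZE TABLE`) may be needed to benefit immediately.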
Other notable changes include:

- [Fix](https://togithub.com/delta-io/delta/commit/7251507fd83518fd206e54574968054f77a11cc0) for a bug in MERGE statements that contain a scalar subquery with non-deterministic results. Such a subquery can return different results during source materialization, while finding matches, and while writing modified rows, which can cause rows to be either dropped or duplicated.
- [Fix](https://togithub.com/delta-io/delta/commit/2d922660) for a potential resource leak when a DV file is not found during a Parquet read
- [Support](https://togithub.com/delta-io/delta/commit/f0a38649) protocol version downgrade
- [Fix](https://togithub.com/delta-io/delta/commit/9a5eeb73) to the initial preview release to support converting null partition values in UniForm
- [Fix](https://togithub.com/delta-io/delta/commit/d9ba620c) to the WRITE command to not commit empty transactions, just like the DELETE, UPDATE, and MERGE commands already do
- [Support](https://togithub.com/delta-io/delta/commit/3ff4075d) 3-part table name identifiers. Now, commands like `OPTIMIZE <catalog>.<db>.<tbl>` will work.
- [Performance improvement](https://togithub.com/delta-io/delta/commit/d19e989e) to CDF read queries scanning in batch, to reduce the number of cloud requests and reduce Spark scheduler pressure
- [Fix](https://togithub.com/delta-io/delta/commit/8a2da73d) for an edge case in CDF read query optimization due to an incorrect statistic value
- [Fix](https://togithub.com/delta-io/delta/commit/d36623f0) for an edge case in streaming reads where having the same file with different DVs in the same batch would yield incorrect results, as the wrong file and DV pair would be read
- [Prevent](https://togithub.com/delta-io/delta/commit/d9070685) table corruption by disallowing `overwriteSchema` when `partitionOverwriteMode` is set to dynamic
- [Fix](https://togithub.com/delta-io/delta/commit/e41db5c1) a bug where DELETE with DVs would not work on Column Mapping-enabled tables
- [Support](https://togithub.com/delta-io/delta/commit/dbb22100) automatic schema evolution in structs that are inside maps
- [Minor fix](https://togithub.com/delta-io/delta/commit/7e51538d) to Delta table path URI concatenation
- [Support](https://togithub.com/delta-io/delta/commit/84c869c5) writing Parquet data files to the `data` subdirectory via the SQL configuration `spark.databricks.delta.write.dataFilesToSubdir`. This is used to add UniForm support on BigQuery.

#### Delta Flink

Delta-Flink 3.0.0 is built on top of Apache Flink™ 1.16.1.

- Documentation: https://github.com/delta-io/delta/tree/branch-3.0/connectors/flink
- API documentation: https://docs.delta.io/3.0.0/api/java/flink/index.html
- Maven artifact: [delta-flink](https://repo1.maven.org/maven2/io/delta/delta-flink/3.0.0/)

The key features of this release are:

- Support for [Flink SQL and Catalog](https://togithub.com/delta-io/delta/commit/47ae5a35). You can now use the Flink/Delta connector for Flink SQL jobs. You can CREATE Delta tables, SELECT data from them (using the Delta Source), and INSERT new data into them (using the Delta Sink).
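A minimal Flink SQL session using the new catalog support might look like the following sketch. The catalog name, table schema, and path are hypothetical, and the `'type' = 'delta-catalog'` and `'connector' = 'delta'` options are taken from the connector README, so check it for the exact set of supported options:

```sql
-- Register the Delta Catalog first; this is required before running
-- any SQL command on Delta tables.
CREATE CATALOG delta_catalog WITH (
  'type'         = 'delta-catalog',
  'catalog-type' = 'in-memory'
);
USE CATALOG delta_catalog;

-- Create a Delta table and read/write it through Flink SQL
-- (schema and path are hypothetical).
CREATE TABLE events (id BIGINT, payload STRING)
  WITH ('connector' = 'delta', 'table-path' = 'file:///tmp/delta/events');

INSERT INTO events VALUES (1, 'hello');
SELECT * FROM events;
```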
Note: for correct operation on Delta tables, you must first configure the Delta Catalog using CREATE CATALOG before running a SQL command on Delta tables. For more information, please see the documentation [here](https://togithub.com/delta-io/delta/blob/branch-3.0/connectors/flink/README.md).

- [Significant performance improvement](https://togithub.com/delta-io/delta/commit/5759de83) to Global Committer initialization - The last successfully committed Delta version by a given Flink application is now loaded lazily, significantly reducing CPU utilization in the most common scenarios.

Other notable changes include:

- [Fix](https://togithub.com/delta-io/delta/commit/23826a3b) a bug where Flink STRING types were incorrectly truncated to type VARCHAR with length 1

#### Delta Standalone

- Documentation: https://docs.delta.io/3.0.0/delta-standalone.html
- API documentation: https://docs.delta.io/3.0.0/api/java/standalone/index.html
- Maven artifacts: [delta-standalone_2.12](https://repo1.maven.org/maven2/io/delta/delta-standalone_2.12/3.0.0/), [delta-standalone_2.13](https://repo1.maven.org/maven2/io/delta/delta-standalone_2.13/3.0.0/)

The key features in this release are:

- [Support](https://togithub.com/delta-io/delta/commit/baf54ffd) for disabling Delta checkpointing during commits - For very large tables with millions of files, performing Delta checkpoints can become an expensive overhead during writes. Users can now disable this checkpointing by setting the Hadoop configuration property `io.delta.standalone.checkpointing.enabled` to `false`. This is only safe and suggested if another job periodically performs the checkpointing.
- [Performance](https://togithub.com/delta-io/delta/commit/f11c3556) improvement to snapshot initialization - When a Delta table is loaded at a particular version, the snapshot must contain, at a minimum, the latest protocol and metadata. This PR improves the snapshot load performance for repeated table changes.
- [Support adding absolute paths](https://togithub.com/delta-io/delta/commit/02a46d19) to the Delta log - This enables users to manually perform `SHALLOW CLONE`s and create Delta tables with external files.
- [Fix](https://togithub.com/delta-io/delta/commit/4dadc028) in schema evolution to prevent adding non-nullable columns to existing Delta tables

#### Credits

Adam Binford, Ahir Reddy, Ala Luszczak, Alex, Allen Reese, Allison Portis, Ami Oka, Andreas Chatzistergiou, Animesh Kashyap, Anonymous, Antoine Amend, Bart Samwel, Bo Gao, Boyang Jerry Peng, Burak Yavuz, CabbageCollector, Carmen Kwan, ChengJi-db, Christopher Watford, Christos Stavrakakis, Costas Zarifis, Denny Lee, Desmond Cheong, Dhruv Arya, Eric Maynard, Eric Ogren, Felipe Pessoto, Feng Zhu, Fredrik Klauss, Gengliang Wang, Gerhard Brueckl, Gopi Krishna Madabhushi, Grzegorz Kołakowski, Hang Jia, Hao Jiang, Herivelton Andreassa, Herman van Hovell, Jacek Laskowski, Jackie Zhang, Jiaan Geng, Jiaheng Tang, Jiawei Bao, Jing Wang, Johan Lasperas, Jonas Irgens Kylling, Jungtaek Lim, Junyong Lee, K.I. (Dennis) Jung, Kam Cheung Ting, Krzysztof Chmielewski, Lars Kroll, Lin Ma, Lin Zhou, Luca Menichetti, Lukas Rupprecht, Martin Grund, Min Yang, Ming DAI, Mohamed Zait, Neil Ramaswamy, Ole Sasse, Olivier NOUGUIER, Pablo Flores, Paddy Xu, Patrick Pichler, Paweł Kubit, Prakhar Jain, Pulkit Singhal, RunyaoChen, Ryan Johnson, Sabir Akhadov, Satya Valluri, Scott Sandre, Shixiong Zhu, Siying Dong, Son, Tathagata Das, Terry Kim, Tom van Bussel, Venki Korukanti, Wenchen Fan, Xinyi, Yann Byron, Yaohua Zhao, Yijia Cui, Yuhong Chen, Yuming Wang, Yuya Ebihara, Zhen Li, aokolnychyi, gurunath, jintao shen, maryannxue, noelo, panbingkun, windpiger, wwang-talend, sherlockbeard

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.
♻ **Rebasing**: Whenever the PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Mend Renovate](https://www.mend.io/free-developer-tools/renovate/). View the repository job log [here](https://developer.mend.io/github/agile-lab-dev/whitefox).

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
1 parent f1a8b9e commit 043a39e

File tree

1 file changed (+1, -1)

server/core/build.gradle.kts

Lines changed: 1 addition & 1 deletion
```diff
@@ -27,7 +27,7 @@ dependencies {
     testFixturesImplementation(String.format("org.eclipse.microprofile.config:microprofile-config-api:%s", microprofileConfigVersion))

     // DELTA
-    implementation("io.delta:delta-standalone_2.13:0.6.0")
+    implementation("io.delta:delta-standalone_2.13:3.0.0")
     implementation(String.format("org.apache.hadoop:hadoop-common:%s", hadoopVersion))

     //AWS
```

0 commit comments
