Downloads¶
The latest version of Iceberg is 0.11.0.
- 0.11.0 source tar.gz – signature – sha512
- 0.11.0 Spark 3.0 runtime Jar
- 0.11.0 Spark 2.4 runtime Jar
- 0.11.0 Flink runtime Jar
- 0.11.0 Hive runtime Jar
To use Iceberg in Spark, download the runtime Jar and add it to the jars folder of your Spark install. Use iceberg-spark3-runtime for Spark 3, and iceberg-spark-runtime for Spark 2.4.
To use Iceberg in Hive, download the iceberg-hive-runtime Jar and add it to Hive using ADD JAR
.
Gradle¶
To add a dependency on Iceberg in Gradle, add the following to build.gradle
:
dependencies {
compile 'org.apache.iceberg:iceberg-core:0.11.0'
}
You may also want to include iceberg-parquet
for Parquet file support.
Maven¶
To add a dependency on Iceberg in Maven, add the following to your pom.xml
:
<dependencies>
...
<dependency>
<groupId>org.apache.iceberg</groupId>
<artifactId>iceberg-core</artifactId>
<version>0.11.0</version>
</dependency>
...
</dependencies>
0.11.0 release notes¶
High-level features:
- Core API now supports partition spec and sort order evolution
- Spark 3 now supports the following SQL extensions:
- MERGE INTO (experimental)
- DELETE FROM (experimental)
- ALTER TABLE … ADD/DROP PARTITION
- ALTER TABLE … WRITE ORDERED BY
- Invoke stored procedures using CALL
- Flink now supports streaming reads, CDC writes (experimental), and filter pushdown
- AWS module is added to support better integration with AWS, with AWS Glue catalog support and dedicated S3 FileIO implementation
- Nessie module is added to support integration with project Nessie
Important bug fixes:
- #1981 fixes bug that date and timestamp transforms were producing incorrect values for dates and times before 1970. Before the fix, negative values were incorrectly transformed by date and timestamp transforms to 1 larger than the correct value. For example,
day(1969-12-31 10:00:00)
produced 0 instead of -1. The fix is backwards compatible, which means predicate projection can still work with the incorrectly transformed partitions written using older versions. - #2091 fixes
ClassCastException
for type promotionint
tolong
andfloat
todouble
during Parquet vectorized read. Now Arrow vector is created by looking at Parquet file schema instead of Iceberg schema forint
andfloat
fields. - #1998 fixes bug in
HiveTableOperation
thatunlock
is not called if new metadata cannot be deleted. Now it is guaranteed thatunlock
is always called for Hive catalog users. - #1979 fixes table listing failure in Hadoop catalog when user does not have permission to some tables. Now the tables with no permission are ignored in listing.
- #1798 fixes scan task failure when encountering duplicate entries of data files. Spark and Flink readers can now ignore duplicated entries in data files for each scan task.
- #1785 fixes invalidation of metadata tables in
CachingCatalog
. When a table is dropped, all the metadata tables associated with it are also invalidated in the cache. - #1960 fixes bug that ORC writer does not read metrics config and always use the default. Now customized metrics config is respected.
Other notable changes:
- NaN counts are now supported in metadata
- Shared catalog properties are added in core library to standardize catalog level configurations
- Spark and Flink now support dynamically loading customized
Catalog
andFileIO
implementations - Spark 2 now supports loading tables from other catalogs, like Spark 3
- Spark 3 now supports catalog names in DataFrameReader when using Iceberg as a format
- Flink now uses the number of Iceberg read splits as its job parallelism to improve performance and save resource.
- Hive (experimental) now supports INSERT INTO, case insensitive query, projection pushdown, create DDL with schema and auto type conversion
- ORC now supports reading tinyint, smallint, char, varchar types
- Avro to Iceberg schema conversion now preserves field docs
Past releases¶
0.10.0¶
- Git tag: 0.10.0
- 0.10.0 source tar.gz – signature – sha512
- 0.10.0 Spark 3.0 runtime Jar
- 0.10.0 Spark 2.4 runtime Jar
- 0.10.0 Flink runtime Jar
- 0.10.0 Hive runtime Jar
High-level features:
- Format v2 support for building row-level operations (
MERGE INTO
) in processing engines- Note: format v2 is not yet finalized and does not have a forward-compatibility guarantee
- Flink integration for writing to Iceberg tables and reading from Iceberg tables (reading supports batch mode only)
- Hive integration for reading from Iceberg tables, with filter pushdown (experimental; configuration may change)
Important bug fixes:
- #1706 fixes non-vectorized ORC reads in Spark that incorrectly skipped rows
- #1536 fixes ORC conversion of
notIn
andnotEqual
to match null values - #1722 fixes
Expressions.notNull
returning anisNull
predicate; API only, method was not used by processing engines - #1736 fixes
IllegalArgumentException
in vectorized Spark reads with negative decimal values - #1666 fixes file lengths returned by the ORC writer, using compressed size rather than uncompressed size
- #1674 removes catalog expiration in HiveCatalogs
- #1545 automatically refreshes tables in Spark when not caching table instances
Other notable changes:
- The
iceberg-hive
module has been renamed toiceberg-hive-metastore
to avoid confusion - Spark 3 is based on 3.0.1 that includes the fix for SPARK-32168
- Hadoop tables will recover from version hint corruption
- Tables can be configured with a required sort order
- Data file locations can be customized with a dynamically loaded
LocationProvider
- ORC file imports can apply a name mapping for stats
A more exhaustive list of changes is available under the 0.10.0 release milestone.
0.9.1¶
- Git tag: 0.9.1
- 0.9.1 source tar.gz – signature – sha512
- 0.9.1 Spark 3.0 runtime Jar
- 0.9.1 Spark 2.4 runtime Jar
0.9.0¶
- Git tag: 0.9.0
- 0.9.0 source tar.gz – signature – sha512
- 0.9.0 Spark 3.0 runtime Jar
- 0.9.0 Spark 2.4 runtime Jar
0.8.0¶
- Git tag: apache-iceberg-0.8.0-incubating
- 0.8.0-incubating source tar.gz – signature – sha512
- 0.8.0-incubating Spark 2.4 runtime Jar