Configuration

Table properties

Iceberg tables support table properties to configure table behavior, like the default split size for readers.

Read properties

| Property | Default | Description |
| --- | --- | --- |
| read.split.target-size | 134217728 (128 MB) | Target size when combining data input splits |
| read.split.metadata-target-size | 33554432 (32 MB) | Target size when combining metadata input splits |
| read.split.planning-lookback | 10 | Number of bins to consider when combining input splits |
| read.split.open-file-cost | 4194304 (4 MB) | The estimated cost to open a file, used as a minimum weight when combining splits |
| read.parquet.vectorization.enabled | false | Enables Parquet vectorized reads |
| read.parquet.vectorization.batch-size | 5000 | The batch size for Parquet vectorized reads |
| read.orc.vectorization.enabled | false | Enables ORC vectorized reads |
| read.orc.vectorization.batch-size | 5000 | The batch size for ORC vectorized reads |
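
To make the interaction between the target size and the open-file cost concrete, here is a minimal sketch. It is not Iceberg's actual planner: it assumes a simple sequential packing strategy, ignores read.split.planning-lookback, and uses a hypothetical function name and hypothetical file sizes.

```python
TARGET_SIZE = 134217728    # read.split.target-size default (128 MB)
OPEN_FILE_COST = 4194304   # read.split.open-file-cost default (4 MB)


def combine_splits(file_sizes, target_size=TARGET_SIZE, open_file_cost=OPEN_FILE_COST):
    """Greedily pack files into combined splits (simplified, no lookback)."""
    splits, current, current_weight = [], [], 0
    for size in file_sizes:
        # Each file weighs at least the open-file cost, so a large number of
        # tiny files is not packed into a single split.
        weight = max(size, open_file_cost)
        if current and current_weight + weight > target_size:
            splits.append(current)
            current, current_weight = [], 0
        current.append(size)
        current_weight += weight
    if current:
        splits.append(current)
    return splits


# 40 hypothetical 1 MB files: each weighs 4 MB, so 32 of them fill a 128 MB split.
small_files = [1024 * 1024] * 40
print([len(split) for split in combine_splits(small_files)])  # [32, 8]
```

Without the minimum weight, all 40 files (40 MB in total) would land in one split, and a single task would pay the cost of opening every file.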

Write properties

| Property | Default | Description |
| --- | --- | --- |
| write.format.default | parquet | Default file format for the table; parquet, avro, or orc |
| write.delete.format.default | data file format | Default delete file format for the table; parquet, avro, or orc |
| write.parquet.row-group-size-bytes | 134217728 (128 MB) | Parquet row group size |
| write.parquet.page-size-bytes | 1048576 (1 MB) | Parquet page size |
| write.parquet.page-row-limit | 20000 | Parquet page row limit |
| write.parquet.dict-size-bytes | 2097152 (2 MB) | Parquet dictionary page size |
| write.parquet.compression-codec | gzip | Parquet compression codec: zstd, brotli, lz4, gzip, snappy, uncompressed |
| write.parquet.compression-level | null | Parquet compression level |
| write.parquet.bloom-filter-enabled.column.col1 | (not set) | Enables writing a bloom filter for the column: col1 |
| write.parquet.bloom-filter-max-bytes | 1048576 (1 MB) | The maximum number of bytes for a bloom filter bitset |
| write.avro.compression-codec | gzip | Avro compression codec: gzip (deflate with level 9), zstd, snappy, uncompressed |
| write.avro.compression-level | null | Avro compression level |
| write.orc.stripe-size-bytes | 67108864 (64 MB) | Default ORC stripe size, in bytes |
| write.orc.block-size-bytes | 268435456 (256 MB) | Default file system block size for ORC files |
| write.orc.compression-codec | zlib | ORC compression codec: zstd, lz4, lzo, zlib, snappy, none |
| write.orc.compression-strategy | speed | ORC compression strategy: speed, compression |
| write.location-provider.impl | null | Optional custom implementation for LocationProvider |
| write.metadata.compression-codec | none | Metadata compression codec; none or gzip |
| write.metadata.metrics.default | truncate(16) | Default metrics mode for all columns in the table; none, counts, truncate(length), or full |
| write.metadata.metrics.column.col1 | (not set) | Metrics mode for column ‘col1’ to allow per-column tuning; none, counts, truncate(length), or full |
| write.target-file-size-bytes | 536870912 (512 MB) | Controls the size of files generated to target about this many bytes |
| write.delete.target-file-size-bytes | 67108864 (64 MB) | Controls the size of delete files generated to target about this many bytes |
| write.distribution-mode | none | Defines distribution of write data: none: don’t shuffle rows; hash: hash distribute by partition key; range: range distribute by partition key, or by sort key if the table has a SortOrder |
| write.delete.distribution-mode | hash | Defines distribution of write delete data |
| write.wap.enabled | false | Enables write-audit-publish writes |
| write.summary.partition-limit | 0 | Includes partition-level summary stats in snapshot summaries if the changed partition count is less than this limit |
| write.metadata.delete-after-commit.enabled | false | Controls whether to delete the oldest version metadata files after commit |
| write.metadata.previous-versions-max | 100 | The max number of previous version metadata files to keep before deleting after commit |
| write.spark.fanout.enabled | false | Enables the fanout writer in Spark that does not require data to be clustered; uses more memory |
| write.object-storage.enabled | false | Enables the object storage location provider that adds a hash component to file paths |
| write.data.path | table location + /data | Base location for data files |
| write.metadata.path | table location + /metadata | Base location for metadata files |
| write.delete.mode | copy-on-write | Mode used for delete commands: copy-on-write or merge-on-read (v2 only) |
| write.delete.isolation-level | serializable | Isolation level for delete commands: serializable or snapshot |
| write.update.mode | copy-on-write | Mode used for update commands: copy-on-write or merge-on-read (v2 only) |
| write.update.isolation-level | serializable | Isolation level for update commands: serializable or snapshot |
| write.merge.mode | copy-on-write | Mode used for merge commands: copy-on-write or merge-on-read (v2 only) |
| write.merge.isolation-level | serializable | Isolation level for merge commands: serializable or snapshot |
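
Table properties are typically changed through an engine. As a minimal sketch, assuming the table is managed through Spark SQL and its ALTER TABLE ... SET TBLPROPERTIES statement, the helper below renders such a statement from a dictionary of properties. The helper name and the table name db.sample are hypothetical.

```python
def set_tblproperties_sql(table, properties):
    """Render a Spark SQL statement that sets the given table properties."""
    if not properties:
        raise ValueError("at least one property is required")

    def quote(value):
        # Escape single quotes so a value cannot break out of the SQL literal.
        return "'" + str(value).replace("'", "\\'") + "'"

    pairs = ", ".join(f"{quote(k)} = {quote(v)}" for k, v in properties.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({pairs})"


statement = set_tblproperties_sql(
    "db.sample",  # hypothetical table name
    {"write.format.default": "orc", "write.target-file-size-bytes": 268435456},
)
print(statement)
```

The rendered statement switches the default file format to ORC and lowers the target file size to 256 MB; property values are always passed as quoted strings.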

Table behavior properties

| Property | Default | Description |
| --- | --- | --- |
| commit.retry.num-retries | 4 | Number of times to retry a commit before failing |
| commit.retry.min-wait-ms | 100 | Minimum time in milliseconds to wait before retrying a commit |
| commit.retry.max-wait-ms | 60000 (1 min) | Maximum time in milliseconds to wait before retrying a commit |
| commit.retry.total-timeout-ms | 1800000 (30 min) | Total retry timeout period in milliseconds for a commit |
| commit.status-check.num-retries | 3 | Number of times to check whether a commit succeeded after a connection is lost before failing due to an unknown commit state |
| commit.status-check.min-wait-ms | 1000 (1 s) | Minimum time in milliseconds to wait before retrying a status-check |
| commit.status-check.max-wait-ms | 60000 (1 min) | Maximum time in milliseconds to wait before retrying a status-check |
| commit.status-check.total-timeout-ms | 1800000 (30 min) | Total timeout period in which the commit status-check must succeed, in milliseconds |
| commit.manifest.target-size-bytes | 8388608 (8 MB) | Target size when merging manifest files |
| commit.manifest.min-count-to-merge | 100 | Minimum number of manifests to accumulate before merging |
| commit.manifest-merge.enabled | true | Controls whether to automatically merge manifests on writes |
| history.expire.max-snapshot-age-ms | 432000000 (5 days) | Default max age of snapshots to keep while expiring snapshots |
| history.expire.min-snapshots-to-keep | 1 | Default min number of snapshots to keep while expiring snapshots |
| history.expire.max-ref-age-ms | Long.MAX_VALUE (forever) | For snapshot references except the main branch, default max age of snapshot references to keep while expiring snapshots. The main branch never expires. |
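
The commit.retry.* properties bound how long a writer waits between commit attempts. The sketch below shows one way these bounds could combine, assuming the wait doubles after each failed attempt and no jitter is applied; the doubling is an assumption made for illustration, and the time spent on the commit attempts themselves is ignored.

```python
def retry_waits_ms(num_retries=4, min_wait_ms=100, max_wait_ms=60000,
                   total_timeout_ms=1800000):
    """Return the wait before each retry, assuming doubling without jitter."""
    waits, elapsed = [], 0
    for attempt in range(num_retries):
        # Start at the minimum wait, double each time, cap at the maximum wait.
        wait = min(min_wait_ms * 2 ** attempt, max_wait_ms)
        if elapsed + wait > total_timeout_ms:
            break  # the total timeout would be exceeded, so stop retrying
        waits.append(wait)
        elapsed += wait
    return waits


print(retry_waits_ms())  # [100, 200, 400, 800]
```

With the defaults, the four retries add up to 1.5 seconds of waiting, so commit.retry.max-wait-ms and commit.retry.total-timeout-ms only come into play when commit.retry.num-retries is raised substantially.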

Reserved table properties

Reserved table properties are only used to control behavior when creating or updating a table. The values of these properties are not persisted as part of the table metadata.

| Property | Default | Description |
| --- | --- | --- |
| format-version | 1 | Table’s format version (can be 1 or 2) as defined in the Spec |

Compatibility flags

| Property | Default | Description |
| --- | --- | --- |
| compatibility.snapshot-id-inheritance.enabled | false | Enables committing snapshots without explicit snapshot IDs |

Catalog properties

Iceberg catalogs support using catalog properties to configure catalog behaviors. Here is a list of commonly used catalog properties:

| Property | Default | Description |
| --- | --- | --- |
| catalog-impl | null | A custom Catalog implementation for an engine to use |
| io-impl | null | A custom FileIO implementation to use in a catalog |
| warehouse | null | The root path of the data warehouse |
| uri | null | A URI string, such as the Hive metastore URI |
| clients | 2 | Client pool size |
| cache-enabled | true | Whether to cache catalog entries |
| cache.expiration-interval-ms | 30000 | How long catalog entries are locally cached, in milliseconds; 0 disables caching, negative values disable expiration |

HadoopCatalog and HiveCatalog can access the properties in their constructors. Any other custom catalog can access the properties by implementing Catalog.initialize(catalogName, catalogProperties). The properties can be manually constructed or passed in from a compute engine like Spark or Flink. Spark uses its session properties as catalog properties; see the Spark configuration section for details. Flink passes in catalog properties through the CREATE CATALOG statement; see the Flink section for details.
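
As an illustration of passing catalog properties from Spark, the sketch below maps a dictionary of catalog properties to session configuration keys. It assumes Spark's convention of prefixing each property with spark.sql.catalog.<catalog-name> and the org.apache.iceberg.spark.SparkCatalog class from the Iceberg Spark integration; the helper name, catalog name, and metastore address are hypothetical, and options that select the catalog type are omitted.

```python
def spark_catalog_conf(catalog_name, catalog_class, properties):
    """Map catalog properties to Spark session configuration keys."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    # The bare prefix names the Spark catalog class; each catalog property
    # becomes "<prefix>.<property>".
    conf = {prefix: catalog_class}
    for key, value in properties.items():
        conf[f"{prefix}.{key}"] = str(value)
    return conf


conf = spark_catalog_conf(
    "my_catalog",  # hypothetical catalog name
    "org.apache.iceberg.spark.SparkCatalog",
    {
        "uri": "thrift://metastore-host:9083",  # hypothetical metastore address
        "clients": 5,
        "cache-enabled": "false",
    },
)
for key in sorted(conf):
    print(f"{key}={conf[key]}")
```

Each printed line can be supplied to Spark as a --conf argument or set on the session builder.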

Lock catalog properties

Here are the catalog properties related to locking. They are used by some catalog implementations to control the locking behavior during commits.

| Property | Default | Description |
| --- | --- | --- |
| lock-impl | null | A custom implementation of the lock manager; the actual interface depends on the catalog used |
| lock.table | null | An auxiliary table for locking, such as in the AWS DynamoDB lock manager |
| lock.acquire-interval-ms | 5000 (5 s) | The interval to wait between each attempt to acquire a lock |
| lock.acquire-timeout-ms | 180000 (3 min) | The maximum time to try acquiring a lock |
| lock.heartbeat-interval-ms | 3000 (3 s) | The interval to wait between each heartbeat after acquiring a lock |
| lock.heartbeat-timeout-ms | 15000 (15 s) | The maximum time without a heartbeat to consider a lock expired |
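
The heartbeat properties work as a pair: the holder refreshes the lock every heartbeat interval, and other writers treat the lock as expired once the heartbeat timeout passes without a refresh. The sketch below illustrates that rule under the default values; it is a simplified model with hypothetical function names, not the logic of any particular lock manager.

```python
HEARTBEAT_INTERVAL_MS = 3000   # lock.heartbeat-interval-ms default (3 s)
HEARTBEAT_TIMEOUT_MS = 15000   # lock.heartbeat-timeout-ms default (15 s)


def is_lock_expired(last_heartbeat_ms, now_ms, timeout_ms=HEARTBEAT_TIMEOUT_MS):
    """A lock counts as expired once no heartbeat arrived within the timeout."""
    return now_ms - last_heartbeat_ms > timeout_ms


def validate_heartbeat(interval_ms=HEARTBEAT_INTERVAL_MS, timeout_ms=HEARTBEAT_TIMEOUT_MS):
    """Reject settings where a healthy holder could not refresh in time."""
    if interval_ms >= timeout_ms:
        raise ValueError("heartbeat interval must be shorter than heartbeat timeout")


validate_heartbeat()
print(is_lock_expired(last_heartbeat_ms=0, now_ms=12000))  # False
print(is_lock_expired(last_heartbeat_ms=0, now_ms=15001))  # True
```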

Hadoop configuration

The following properties from the Hadoop configuration are used by the Hive Metastore connector.

| Property | Default | Description |
| --- | --- | --- |
| iceberg.hive.client-pool-size | 5 | The size of the Hive client pool when tracking tables in HMS |
| iceberg.hive.lock-timeout-ms | 180000 (3 min) | Maximum time in milliseconds to acquire a lock |
| iceberg.hive.lock-check-min-wait-ms | 50 | Minimum time in milliseconds to check back on the status of lock acquisition |
| iceberg.hive.lock-check-max-wait-ms | 5000 | Maximum time in milliseconds to check back on the status of lock acquisition |

Note: iceberg.hive.lock-check-max-wait-ms should be less than the transaction timeout of the Hive Metastore (hive.txn.timeout, or metastore.txn.timeout in newer versions). Otherwise, the heartbeats on the lock (which happen during the lock checks) would expire in the Hive Metastore before Iceberg retries the lock.
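
The constraint in the note can be checked before deploying a configuration. The sketch below is a hypothetical validation helper; it assumes both values have already been converted to milliseconds, and the 300000 ms transaction timeout is an illustrative value rather than one taken from a real metastore.

```python
def check_lock_wait(lock_check_max_wait_ms, hms_txn_timeout_ms):
    """Return the safety margin in ms, or raise if the lock could expire in HMS."""
    if lock_check_max_wait_ms >= hms_txn_timeout_ms:
        raise ValueError(
            "iceberg.hive.lock-check-max-wait-ms must be less than the "
            "Hive Metastore transaction timeout"
        )
    return hms_txn_timeout_ms - lock_check_max_wait_ms


# 5000 ms is the iceberg.hive.lock-check-max-wait-ms default; the transaction
# timeout of 300000 ms is an assumed value for illustration.
print(check_lock_wait(5000, 300000))  # 295000
```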