Class SparkReadConf

java.lang.Object
org.apache.iceberg.spark.SparkReadConf

public class SparkReadConf extends Object
A class that resolves common Iceberg configuration values for Spark reads.

If a config is set at multiple levels, the following order of precedence is used (top to bottom):

  1. Read options
  2. Session configuration
  3. Table metadata

Read options are the most specific level and take precedence over all other configs. If no read option is provided, this class checks the session configuration for overrides. If no applicable value is found in the session configuration, this class falls back to the table metadata.
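The lookup order above can be modeled as a chain of fallbacks. The following is a stdlib-only sketch, not Iceberg's implementation: plain maps stand in for the read options, session configuration, and table properties, every key name is hypothetical, and the caller-supplied fallback returned when no level sets a value is an assumption of the sketch (the description above stops at table metadata). Read options are held in a case-insensitive map to mirror the `CaseInsensitiveStringMap` constructor parameter.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical, stdlib-only model of the precedence rule:
// read options, then session configuration, then table metadata.
// Not Iceberg's code; all key names used here are invented for illustration.
public class ReadConfPrecedenceSketch {

  // Read options are modeled as case-insensitive, mirroring CaseInsensitiveStringMap.
  static Map<String, String> caseInsensitive(Map<String, String> source) {
    Map<String, String> copy = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    copy.putAll(source);
    return copy;
  }

  // Returns the first value found in precedence order, parsed to T.
  // ASSUMPTION: if no level sets a value, the caller-supplied fallback is returned.
  static <T> T resolve(
      Map<String, String> readOptions, String optionKey,
      Map<String, String> sessionConf, String sessionKey,
      Map<String, String> tableProps, String tablePropKey,
      Function<String, T> parser, T fallback) {
    String raw = readOptions.get(optionKey);              // 1. read option
    if (raw == null) raw = sessionConf.get(sessionKey);   // 2. session configuration
    if (raw == null) raw = tableProps.get(tablePropKey);  // 3. table metadata
    return raw == null ? fallback : parser.apply(raw);
  }

  public static void main(String[] args) {
    // The read option key differs in case from the lookup key on purpose.
    Map<String, String> options = caseInsensitive(Map.of("Vectorization-Enabled", "false"));
    Map<String, String> session = Map.of("example.session.vectorization", "true");
    Map<String, String> table = Map.of("example.table.vectorization", "true");

    boolean enabled = resolve(
        options, "vectorization-enabled",
        session, "example.session.vectorization",
        table, "example.table.vectorization",
        Boolean::parseBoolean, true);
    System.out.println(enabled);
  }
}
```

Running `main` prints `false`: the read option overrides the session and table values, even though its key is spelled with different capitalization.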

Note that this class is NOT meant to be serialized and sent to executors.

  • Constructor Details

    • SparkReadConf

      public SparkReadConf(org.apache.spark.sql.SparkSession spark, Table table)
    • SparkReadConf

      public SparkReadConf(org.apache.spark.sql.SparkSession spark, Table table, org.apache.spark.sql.util.CaseInsensitiveStringMap options)
    • SparkReadConf

      @Deprecated public SparkReadConf(org.apache.spark.sql.SparkSession spark, Table table, String branch, org.apache.spark.sql.util.CaseInsensitiveStringMap options)
      Deprecated.
      Since 1.11.0; will be removed in 1.12.0. Use SparkReadConf(SparkSession, Table, CaseInsensitiveStringMap) instead.
  • Method Details

    • caseSensitive

      public boolean caseSensitive()
    • localityEnabled

      public boolean localityEnabled()
    • startSnapshotId

      public Long startSnapshotId()
    • endSnapshotId

      public Long endSnapshotId()
    • streamingSkipDeleteSnapshots

      public boolean streamingSkipDeleteSnapshots()
    • streamingSkipOverwriteSnapshots

      public boolean streamingSkipOverwriteSnapshots()
    • parquetVectorizationEnabled

      public boolean parquetVectorizationEnabled()
    • parquetBatchSize

      public int parquetBatchSize()
    • orcVectorizationEnabled

      public boolean orcVectorizationEnabled()
    • orcBatchSize

      public int orcBatchSize()
    • splitSizeOption

      public Long splitSizeOption()
    • splitSize

      public long splitSize()
    • splitLookbackOption

      public Integer splitLookbackOption()
    • splitLookback

      public int splitLookback()
    • splitOpenFileCostOption

      public Long splitOpenFileCostOption()
    • splitOpenFileCost

      public long splitOpenFileCost()
    • streamFromTimestamp

      public long streamFromTimestamp()
    • startTimestamp

      public Long startTimestamp()
    • endTimestamp

      public Long endTimestamp()
    • maxFilesPerMicroBatch

      public int maxFilesPerMicroBatch()
    • maxRecordsPerMicroBatch

      public int maxRecordsPerMicroBatch()
    • asyncMicroBatchPlanningEnabled

      public boolean asyncMicroBatchPlanningEnabled()
    • streamingSnapshotPollingIntervalMs

      public long streamingSnapshotPollingIntervalMs()
    • asyncQueuePreloadFileLimit

      public long asyncQueuePreloadFileLimit()
    • asyncQueuePreloadRowLimit

      public long asyncQueuePreloadRowLimit()
    • preserveDataGrouping

      public boolean preserveDataGrouping()
    • aggregatePushDownEnabled

      public boolean aggregatePushDownEnabled()
    • adaptiveSplitSizeEnabled

      public boolean adaptiveSplitSizeEnabled()
    • parallelism

      public int parallelism()
    • splitParallelism

      public int splitParallelism()
    • distributedPlanningEnabled

      public boolean distributedPlanningEnabled()
    • dataPlanningMode

      public PlanningMode dataPlanningMode()
    • deletePlanningMode

      public PlanningMode deletePlanningMode()
    • executorCacheLocalityEnabled

      public boolean executorCacheLocalityEnabled()
    • cacheDeleteFilesOnExecutors

      public boolean cacheDeleteFilesOnExecutors()
    • reportColumnStats

      public boolean reportColumnStats()
    • identifierFieldsRely

      public boolean identifierFieldsRely()
    • incrementalAppendScanBoundaries

      public Pair<Long,Long> incrementalAppendScanBoundaries()
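Several accessors above come in pairs with a boxed and a primitive return type, such as `splitSizeOption()` (`Long`) and `splitSize()` (`long`), and likewise for split lookback and split open-file cost. One plausible reading, assumed here rather than stated on this page, is that the `...Option()` variant returns `null` when the value is not explicitly configured, while the primitive variant always returns a value by falling back to a default. The stdlib-only sketch below models that contract; the class name, key name, and default value are hypothetical and only illustrate the pattern.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a nullable "...Option()" accessor paired with a primitive
// accessor that falls back to a default. Not Iceberg's code.
public class OptionalAccessorSketch {
  static final String SPLIT_SIZE_KEY = "split-size";          // hypothetical key
  static final long SPLIT_SIZE_DEFAULT = 128L * 1024 * 1024;  // illustrative default (128 MiB)

  // Case-insensitive read options, as with CaseInsensitiveStringMap.
  private final Map<String, String> options = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

  OptionalAccessorSketch(Map<String, String> readOptions) {
    options.putAll(readOptions);
  }

  // Boxed return type: null means "not explicitly set", so callers can detect absence.
  Long splitSizeOption() {
    String raw = options.get(SPLIT_SIZE_KEY);
    return raw == null ? null : Long.parseLong(raw);
  }

  // Primitive return type: always yields a value, using the default when nothing is set.
  long splitSize() {
    Long explicit = splitSizeOption();
    return explicit != null ? explicit : SPLIT_SIZE_DEFAULT;
  }

  public static void main(String[] args) {
    OptionalAccessorSketch unset = new OptionalAccessorSketch(Map.of());
    OptionalAccessorSketch set = new OptionalAccessorSketch(Map.of("Split-Size", "67108864"));
    System.out.println(unset.splitSizeOption()); // null
    System.out.println(unset.splitSize());       // 134217728
    System.out.println(set.splitSizeOption());   // 67108864
  }
}
```

Returning a boxed `null` lets a caller tell "explicitly requested" apart from "defaulted", which a primitive return type cannot express.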