org.apache.iceberg.spark.SparkReadConf

public class SparkReadConf extends Object

A class for common Iceberg configs for Spark reads.

If a config is set at multiple levels, the following order of precedence applies, from highest to lowest:

1. Read options
2. Session configuration
3. Table metadata

The most specific value is set in read options and takes precedence over all other configs. If no read option is provided, this class checks the session configuration for any overrides. If no applicable value is found in the session configuration, this class uses the table metadata.
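The lookup order described above can be sketched in plain Java. This is a minimal illustration, not the class's actual implementation: `PrecedenceSketch`, `resolve`, the `example.*` keys, and the final default value are all hypothetical, and the sketch assumes the same key name at every level, which may not hold for real Iceberg configs.

```java
import java.util.HashMap;
import java.util.Map;

public class PrecedenceSketch {

  // Hypothetical helper: returns the first value found for the key, checking
  // read options, then session configuration, then table metadata.
  // The defaultValue fallback is an assumption added for illustration.
  public static String resolve(
      String key,
      Map<String, String> readOptions,
      Map<String, String> sessionConf,
      Map<String, String> tableMetadata,
      String defaultValue) {
    if (readOptions.containsKey(key)) {
      return readOptions.get(key);
    }
    if (sessionConf.containsKey(key)) {
      return sessionConf.get(key);
    }
    if (tableMetadata.containsKey(key)) {
      return tableMetadata.get(key);
    }
    return defaultValue;
  }

  public static void main(String[] args) {
    // Hypothetical keys and values, chosen only to exercise each level.
    Map<String, String> readOptions = new HashMap<>();
    readOptions.put("example.batch-size", "1000");

    Map<String, String> sessionConf = new HashMap<>();
    sessionConf.put("example.batch-size", "2000");
    sessionConf.put("example.locality", "true");

    Map<String, String> tableMetadata = new HashMap<>();
    tableMetadata.put("example.batch-size", "3000");
    tableMetadata.put("example.locality", "false");
    tableMetadata.put("example.case-sensitive", "false");

    // Set at all three levels: the read option wins.
    System.out.println(
        resolve("example.batch-size", readOptions, sessionConf, tableMetadata, "n/a"));
    // No read option: the session configuration wins.
    System.out.println(
        resolve("example.locality", readOptions, sessionConf, tableMetadata, "n/a"));
    // Only in table metadata: the table value is used.
    System.out.println(
        resolve("example.case-sensitive", readOptions, sessionConf, tableMetadata, "n/a"));
  }
}
```

Running `main` prints `1000`, `true`, and `false`, one per line, matching the three cases in the paragraph above.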

Note this class is NOT meant to be serialized and sent to executors.
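One way to work within this constraint is to read the needed values on the driver and hand only plain, serializable values to executor-side code. The sketch below is an assumption about usage, not a pattern this page prescribes: `ReadConfView` is a hypothetical stand-in for the two getters it borrows from the method summary (`caseSensitive()` and `splitSize()`), and `ResolvedReadSettings` and `roundTrip` are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class DriverSideSettingsSketch {

  /** Hypothetical stand-in exposing two getters with the signatures listed on this page. */
  public interface ReadConfView {
    boolean caseSensitive();

    long splitSize();
  }

  /** Hypothetical immutable holder: only primitive values, so it serializes safely. */
  public static final class ResolvedReadSettings implements Serializable {
    private static final long serialVersionUID = 1L;
    private final boolean caseSensitive;
    private final long splitSize;

    public ResolvedReadSettings(boolean caseSensitive, long splitSize) {
      this.caseSensitive = caseSensitive;
      this.splitSize = splitSize;
    }

    // Copies the values out of the conf object; the conf itself is not retained.
    public static ResolvedReadSettings from(ReadConfView conf) {
      return new ResolvedReadSettings(conf.caseSensitive(), conf.splitSize());
    }

    public boolean caseSensitive() {
      return caseSensitive;
    }

    public long splitSize() {
      return splitSize;
    }
  }

  // Serializes and deserializes the holder with standard Java serialization,
  // standing in for shipping it from the driver to an executor.
  public static ResolvedReadSettings roundTrip(ResolvedReadSettings settings) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(settings);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (ResolvedReadSettings) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("Serialization round trip failed", e);
    }
  }

  public static void main(String[] args) {
    // Illustrative values only; a real conf would resolve them from its three levels.
    ReadConfView conf = new ReadConfView() {
      @Override
      public boolean caseSensitive() {
        return true;
      }

      @Override
      public long splitSize() {
        return 128L * 1024 * 1024;
      }
    };

    ResolvedReadSettings copy = roundTrip(ResolvedReadSettings.from(conf));
    System.out.println(copy.caseSensitive() + " " + copy.splitSize());
  }
}
```

The holder keeps no reference to the conf object, the session, or the table, so only the resolved values cross the driver boundary.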

Constructor Summary

Constructors
SparkReadConf(org.apache.spark.sql.SparkSession spark, Table table, String branch, Map<String,String> readOptions)
SparkReadConf(org.apache.spark.sql.SparkSession spark, Table table, Map<String,String> readOptions)

Method Summary

Modifier and Type   Method
boolean             adaptiveSplitSizeEnabled()
boolean             aggregatePushDownEnabled()
Long                asOfTimestamp()
String              branch()
boolean             caseSensitive()
PlanningMode        dataPlanningMode()
PlanningMode        deletePlanningMode()
boolean             distributedPlanningEnabled()
Long                endSnapshotId()
Long                endTimestamp()
boolean             executorCacheLocalityEnabled()
boolean             localityEnabled()
int                 maxFilesPerMicroBatch()
int                 maxRecordsPerMicroBatch()
int                 orcBatchSize()
boolean             orcVectorizationEnabled()
int                 parallelism()
int                 parquetBatchSize()
boolean             parquetVectorizationEnabled()
boolean             preserveDataGrouping()
String              scanTaskSetId()
Long                snapshotId()
int                 splitLookback()
Integer             splitLookbackOption()
long                splitOpenFileCost()
Long                splitOpenFileCostOption()
long                splitSize()
Long                splitSizeOption()
Long                startSnapshotId()
Long                startTimestamp()
long                streamFromTimestamp()
boolean             streamingSkipDeleteSnapshots()
boolean             streamingSkipOverwriteSnapshots()
String              tag()

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- SparkReadConf
  
  public SparkReadConf(org.apache.spark.sql.SparkSession spark, Table table, Map<String,String> readOptions)
- SparkReadConf
  
  public SparkReadConf(org.apache.spark.sql.SparkSession spark, Table table, String branch, Map<String,String> readOptions)

Method Details
- caseSensitive
  
  public boolean caseSensitive()
- localityEnabled
  
  public boolean localityEnabled()
- snapshotId
  
  public Long snapshotId()
- asOfTimestamp
  
  public Long asOfTimestamp()
- startSnapshotId
  
  public Long startSnapshotId()
- endSnapshotId
  
  public Long endSnapshotId()
- branch
  
  public String branch()
- tag
  
  public String tag()
- scanTaskSetId
  
  public String scanTaskSetId()
- streamingSkipDeleteSnapshots
  
  public boolean streamingSkipDeleteSnapshots()
- streamingSkipOverwriteSnapshots
  
  public boolean streamingSkipOverwriteSnapshots()
- parquetVectorizationEnabled
  
  public boolean parquetVectorizationEnabled()
- parquetBatchSize
  
  public int parquetBatchSize()
- orcVectorizationEnabled
  
  public boolean orcVectorizationEnabled()
- orcBatchSize
  
  public int orcBatchSize()
- splitSizeOption
  
  public Long splitSizeOption()
- splitSize
  
  public long splitSize()
- splitLookbackOption
  
  public Integer splitLookbackOption()
- splitLookback
  
  public int splitLookback()
- splitOpenFileCostOption
  
  public Long splitOpenFileCostOption()
- splitOpenFileCost
  
  public long splitOpenFileCost()
- streamFromTimestamp
  
  public long streamFromTimestamp()
- startTimestamp
  
  public Long startTimestamp()
- endTimestamp
  
  public Long endTimestamp()
- maxFilesPerMicroBatch
  
  public int maxFilesPerMicroBatch()
- maxRecordsPerMicroBatch
  
  public int maxRecordsPerMicroBatch()
- preserveDataGrouping
  
  public boolean preserveDataGrouping()
- aggregatePushDownEnabled
  
  public boolean aggregatePushDownEnabled()
- adaptiveSplitSizeEnabled
  
  public boolean adaptiveSplitSizeEnabled()
- parallelism
  
  public int parallelism()
- distributedPlanningEnabled
  
  public boolean distributedPlanningEnabled()
- dataPlanningMode
  
  public PlanningMode dataPlanningMode()
- deletePlanningMode
  
  public PlanningMode deletePlanningMode()
- executorCacheLocalityEnabled
  
  public boolean executorCacheLocalityEnabled()
