Class SparkReadConf


  • public class SparkReadConf
    extends java.lang.Object
    A class for common Iceberg configs for Spark reads.

    If a config is set at multiple levels, the following order of precedence is used (top to bottom):

    1. Read options
    2. Session configuration
    3. Table metadata

    Read options are the most specific level and take precedence over all other configs. If no read option is provided, this class checks the session configuration for an override. If the session configuration has no applicable value, this class falls back to the table metadata.
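
    The lookup order described above can be sketched in plain Java. This is only an illustration of the precedence rule, not the actual implementation of this class: the class name ReadConfPrecedenceSketch, the resolve helper, and the key example.batch-size are hypothetical, and a single key is reused at all three levels for brevity, whereas real read options, session configs, and table properties may use different keys.

```java
import java.util.Map;

public class ReadConfPrecedenceSketch {

    // Hypothetical helper: returns the first value found, checking the most
    // specific level (read options) first and the least specific level last.
    static String resolve(
            String key,
            Map<String, String> readOptions,
            Map<String, String> sessionConf,
            Map<String, String> tableProperties,
            String defaultValue) {
        String value = readOptions.get(key);
        if (value != null) {
            return value; // 1. read options win
        }
        value = sessionConf.get(key);
        if (value != null) {
            return value; // 2. then the session configuration
        }
        // 3. finally the table metadata, or a default if nothing is set
        return tableProperties.getOrDefault(key, defaultValue);
    }

    public static void main(String[] args) {
        // "example.batch-size" is an illustrative key, not a real Iceberg config.
        Map<String, String> readOptions = Map.of();
        Map<String, String> sessionConf = Map.of("example.batch-size", "2048");
        Map<String, String> tableProperties = Map.of("example.batch-size", "4096");

        // No read option is set, so the session value overrides the table value.
        System.out.println(
                resolve("example.batch-size", readOptions, sessionConf, tableProperties, "5000"));
        // prints 2048
    }
}
```

    Adding the same key to readOptions would make that value win, regardless of what the session configuration or table metadata contain.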

    Note this class is NOT meant to be serialized and sent to executors.

    • Constructor Detail

      • SparkReadConf

        public SparkReadConf(org.apache.spark.sql.SparkSession spark,
                             Table table,
                             java.util.Map<java.lang.String,java.lang.String> readOptions)
      • SparkReadConf

        public SparkReadConf(org.apache.spark.sql.SparkSession spark,
                             Table table,
                             java.lang.String branch,
                             java.util.Map<java.lang.String,java.lang.String> readOptions)
    • Method Detail

      • caseSensitive

        public boolean caseSensitive()
      • localityEnabled

        public boolean localityEnabled()
      • snapshotId

        public java.lang.Long snapshotId()
      • asOfTimestamp

        public java.lang.Long asOfTimestamp()
      • startSnapshotId

        public java.lang.Long startSnapshotId()
      • endSnapshotId

        public java.lang.Long endSnapshotId()
      • branch

        public java.lang.String branch()
      • tag

        public java.lang.String tag()
      • fileScanTaskSetId

        @Deprecated
        public java.lang.String fileScanTaskSetId()
        Deprecated.
        Will be removed in 1.3.0; use scanTaskSetId() instead.
      • scanTaskSetId

        public java.lang.String scanTaskSetId()
      • streamingSkipDeleteSnapshots

        public boolean streamingSkipDeleteSnapshots()
      • streamingSkipOverwriteSnapshots

        public boolean streamingSkipOverwriteSnapshots()
      • parquetVectorizationEnabled

        public boolean parquetVectorizationEnabled()
      • parquetBatchSize

        public int parquetBatchSize()
      • orcVectorizationEnabled

        public boolean orcVectorizationEnabled()
      • orcBatchSize

        public int orcBatchSize()
      • splitSizeOption

        public java.lang.Long splitSizeOption()
      • splitSize

        public long splitSize()
      • splitLookbackOption

        public java.lang.Integer splitLookbackOption()
      • splitLookback

        public int splitLookback()
      • splitOpenFileCostOption

        public java.lang.Long splitOpenFileCostOption()
      • splitOpenFileCost

        public long splitOpenFileCost()
      • handleTimestampWithoutZone

        public boolean handleTimestampWithoutZone()
        Enables reading a timestamp without time zone as a timestamp with time zone.

        Generally, this is not safe. A timestamp without time zone represents wall-clock time: regardless of the reader or writer time zone, 3 PM is always read as 3 PM. A timestamp with time zone has instant semantics: the value is adjusted so that it is displayed as the corresponding time in the reader's time zone.

        When set to false (the default), reading a timestamp without time zone throws an exception.

        Returns:
        boolean indicating whether reading timestamps without time zone is allowed
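
        The difference between wall-clock and instant semantics described for handleTimestampWithoutZone can be illustrated with java.time alone. This is a sketch of the two semantics, not of how Spark or Iceberg convert values internally; the class name TimestampSemanticsSketch, its helper methods, and the chosen date and zones are assumptions made for the example.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

public class TimestampSemanticsSketch {

    // Wall-clock semantics: the value carries no zone, so every reader
    // sees the same date and time fields.
    static LocalDateTime wallClock() {
        return LocalDateTime.of(2023, 1, 15, 15, 0);
    }

    // Instant semantics: a fixed point on the timeline, rendered as the
    // corresponding local time in the reader's zone.
    static LocalDateTime renderInstant(Instant instant, String readerZone) {
        return instant.atZone(ZoneId.of(readerZone)).toLocalDateTime();
    }

    public static void main(String[] args) {
        Instant instant = Instant.parse("2023-01-15T15:00:00Z");

        // The wall-clock value is 3 PM for every reader.
        System.out.println(wallClock()); // 2023-01-15T15:00

        // The same instant is displayed differently depending on the reader zone.
        System.out.println(renderInstant(instant, "UTC"));                 // 2023-01-15T15:00
        System.out.println(renderInstant(instant, "America/Los_Angeles")); // 2023-01-15T07:00
        System.out.println(renderInstant(instant, "Asia/Tokyo"));          // 2023-01-16T00:00
    }
}
```

        Reading a wall-clock value as if it were an instant silently ties it to one zone, which is why doing so is treated as unsafe unless explicitly enabled.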
      • streamFromTimestamp

        public java.lang.Long streamFromTimestamp()
      • startTimestamp

        public java.lang.Long startTimestamp()
      • endTimestamp

        public java.lang.Long endTimestamp()
      • preserveDataGrouping

        public boolean preserveDataGrouping()
      • aggregatePushDownEnabled

        public boolean aggregatePushDownEnabled()