Class IcebergSource

java.lang.Object
org.apache.iceberg.spark.source.IcebergSource
All Implemented Interfaces:
org.apache.spark.sql.connector.catalog.SupportsCatalogOptions, org.apache.spark.sql.connector.catalog.TableProvider, org.apache.spark.sql.sources.DataSourceRegister

public class IcebergSource extends Object implements org.apache.spark.sql.sources.DataSourceRegister, org.apache.spark.sql.connector.catalog.SupportsCatalogOptions
IcebergSource reads and writes tables in the "iceberg" format. It can load a table either by path or by name.

How paths and tables are resolved when using spark.read().format("iceberg").load(table):

table = "file:///path/to/table" -> loads a HadoopTable at the given path
table = "tablename" -> loads currentCatalog.currentNamespace.tablename
table = "catalog.tablename" -> loads "tablename" from the specified catalog
table = "namespace.tablename" -> loads "namespace.tablename" from the current catalog
table = "catalog.namespace.tablename" -> loads "namespace.tablename" from the specified catalog
table = "namespace1.namespace2.tablename" -> loads "namespace1.namespace2.tablename" from the current catalog

The above list is in order of priority. For example, a matching catalog takes priority over any namespace resolution.
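
The priority order above can be modeled with a small, self-contained sketch. This is an illustration only, not the IcebergSource implementation: the class TableNameResolution, the resolve helper, the catalog name "prod", and the rule that any string containing "/" counts as a path are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Set;

// Illustrative model of the documented priority order; NOT the Iceberg implementation.
final class TableNameResolution {

  // Hypothetical helper: describes how the argument passed to load(...) would resolve.
  static String resolve(String table, Set<String> catalogs,
                        String currentCatalog, String currentNamespace) {
    // Assumption: anything containing '/' is treated as a path.
    if (table.contains("/")) {
      return "path: " + table;
    }

    String[] parts = table.split("\\.");

    // A bare name resolves against the current catalog and current namespace.
    if (parts.length == 1) {
      return "table: " + currentCatalog + "." + currentNamespace + "." + table;
    }

    // A matching catalog takes priority over namespace resolution.
    if (catalogs.contains(parts[0])) {
      String rest = String.join(".", Arrays.copyOfRange(parts, 1, parts.length));
      return "table: " + parts[0] + "." + rest;
    }

    // Otherwise every leading part is a namespace in the current catalog.
    return "table: " + currentCatalog + "." + table;
  }

  public static void main(String[] args) {
    Set<String> catalogs = Set.of("prod"); // hypothetical configured catalog
    String[] inputs = {
      "file:///path/to/table",
      "tablename",
      "prod.tablename",
      "db.tablename",
      "prod.db.tablename",
      "ns1.ns2.tablename"
    };
    for (String input : inputs) {
      System.out.println(input + " -> " + resolve(input, catalogs, "spark_catalog", "default"));
    }
  }
}
```

In this model, "prod.db.tablename" resolves through the "prod" catalog because the first part matches a known catalog, while "ns1.ns2.tablename" falls through to the current catalog with "ns1.ns2" as the namespace.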

  • Constructor Details

    • IcebergSource

      public IcebergSource()
  • Method Details

    • shortName

      public String shortName()
      Specified by:
      shortName in interface org.apache.spark.sql.sources.DataSourceRegister
    • inferSchema

      public org.apache.spark.sql.types.StructType inferSchema(org.apache.spark.sql.util.CaseInsensitiveStringMap options)
      Specified by:
      inferSchema in interface org.apache.spark.sql.connector.catalog.TableProvider
    • inferPartitioning

      public org.apache.spark.sql.connector.expressions.Transform[] inferPartitioning(org.apache.spark.sql.util.CaseInsensitiveStringMap options)
      Specified by:
      inferPartitioning in interface org.apache.spark.sql.connector.catalog.TableProvider
    • supportsExternalMetadata

      public boolean supportsExternalMetadata()
      Specified by:
      supportsExternalMetadata in interface org.apache.spark.sql.connector.catalog.TableProvider
    • getTable

      public org.apache.spark.sql.connector.catalog.Table getTable(org.apache.spark.sql.types.StructType schema, org.apache.spark.sql.connector.expressions.Transform[] partitioning, Map<String,String> options)
      Specified by:
      getTable in interface org.apache.spark.sql.connector.catalog.TableProvider
    • extractIdentifier

      public org.apache.spark.sql.connector.catalog.Identifier extractIdentifier(org.apache.spark.sql.util.CaseInsensitiveStringMap options)
      Specified by:
      extractIdentifier in interface org.apache.spark.sql.connector.catalog.SupportsCatalogOptions
    • extractCatalog

      public String extractCatalog(org.apache.spark.sql.util.CaseInsensitiveStringMap options)
      Specified by:
      extractCatalog in interface org.apache.spark.sql.connector.catalog.SupportsCatalogOptions
    • extractTimeTravelVersion

      public Optional<String> extractTimeTravelVersion(org.apache.spark.sql.util.CaseInsensitiveStringMap options)
      Specified by:
      extractTimeTravelVersion in interface org.apache.spark.sql.connector.catalog.SupportsCatalogOptions
    • extractTimeTravelTimestamp

      public Optional<String> extractTimeTravelTimestamp(org.apache.spark.sql.util.CaseInsensitiveStringMap options)
      Specified by:
      extractTimeTravelTimestamp in interface org.apache.spark.sql.connector.catalog.SupportsCatalogOptions