Class MigrateTableSparkAction

java.lang.Object
org.apache.iceberg.spark.actions.MigrateTableSparkAction
All Implemented Interfaces:
Action<MigrateTable,MigrateTable.Result>, MigrateTable

public class MigrateTableSparkAction extends Object implements MigrateTable
Takes a Spark table in the source catalog and attempts to transform it into an Iceberg table in the same location with the same identifier. Once complete, the identifier that previously referred to the non-Iceberg table refers to the newly migrated Iceberg table.
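For orientation, a minimal usage sketch via SparkActions (the identifier spark_catalog.db.sample is a placeholder):

  import org.apache.iceberg.actions.MigrateTable;
  import org.apache.iceberg.spark.actions.SparkActions;

  // Migrate the table in place; afterwards the same identifier
  // resolves to the newly created Iceberg table.
  MigrateTable.Result result = SparkActions.get()
      .migrateTable("spark_catalog.db.sample")
      .execute();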
  • Method Details

    • self

      protected MigrateTableSparkAction self()
    • destCatalog

      protected org.apache.spark.sql.connector.catalog.StagingTableCatalog destCatalog()
    • destTableIdent

      protected org.apache.spark.sql.connector.catalog.Identifier destTableIdent()
    • tableProperties

      public MigrateTableSparkAction tableProperties(Map<String,String> properties)
      Description copied from interface: MigrateTable
      Sets table properties in the newly created Iceberg table. Any properties with the same key name will be overwritten.
      Specified by:
      tableProperties in interface MigrateTable
      Parameters:
      properties - a map of properties to set
      Returns:
      this for method chaining
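      A minimal sketch (the property key and value are illustrative; Map.of requires Java 9+):

        import java.util.Map;

        // Carry over a hypothetical property map to the new Iceberg table.
        MigrateTableSparkAction action = SparkActions.get()
            .migrateTable("spark_catalog.db.sample")
            .tableProperties(Map.of("comment", "migrated from Hive"));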
    • tableProperty

      public MigrateTableSparkAction tableProperty(String property, String value)
      Description copied from interface: MigrateTable
      Sets a table property in the newly created Iceberg table. Any properties with the same key will be overwritten.
      Specified by:
      tableProperty in interface MigrateTable
      Parameters:
      property - a table property name
      value - a table property value
      Returns:
      this for method chaining
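      The single-property form, sketched here with the standard Iceberg property write.format.default:

        SparkActions.get()
            .migrateTable("spark_catalog.db.sample")
            .tableProperty("write.format.default", "parquet")
            .execute();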
    • dropBackup

      public MigrateTableSparkAction dropBackup()
      Description copied from interface: MigrateTable
      Drops the backup of the original table after a successful migration.
      Specified by:
      dropBackup in interface MigrateTable
      Returns:
      this for method chaining
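      A sketch chaining dropBackup so the renamed original table is removed once migration succeeds:

        // Without dropBackup(), the original table is kept under a backup name.
        SparkActions.get()
            .migrateTable("spark_catalog.db.sample")
            .dropBackup()
            .execute();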
    • backupTableName

      public MigrateTableSparkAction backupTableName(String tableName)
      Description copied from interface: MigrateTable
      Sets a table name for the backup of the original table.
      Specified by:
      backupTableName in interface MigrateTable
      Parameters:
      tableName - the table name for backup
      Returns:
      this for method chaining
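      A sketch supplying an explicit backup name (sample_backup is a placeholder):

        SparkActions.get()
            .migrateTable("spark_catalog.db.sample")
            .backupTableName("sample_backup")
            .execute();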
    • executeWith

      public MigrateTableSparkAction executeWith(ExecutorService service)
      Description copied from interface: MigrateTable
      Sets the executor service to use for parallel file reading. By default, no executor service is used and file reading is not parallelized.
      Specified by:
      executeWith in interface MigrateTable
      Parameters:
      service - executor service
      Returns:
      this for method chaining
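      A sketch with a fixed-size pool (the pool size of 8 is illustrative); the caller remains responsible for shutting the service down:

        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;

        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
          SparkActions.get()
              .migrateTable("spark_catalog.db.sample")
              .executeWith(pool)
              .execute();
        } finally {
          pool.shutdown();
        }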
    • execute

      public MigrateTable.Result execute()
      Description copied from interface: Action
      Executes this action.
      Specified by:
      execute in interface Action<MigrateTable,MigrateTable.Result>
      Returns:
      the result of this action
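      A sketch inspecting the result; migratedDataFilesCount() is declared on MigrateTable.Result:

        MigrateTable.Result result = SparkActions.get()
            .migrateTable("spark_catalog.db.sample")
            .execute();
        long migrated = result.migratedDataFilesCount();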
    • destTableProps

      protected Map<String,String> destTableProps()
    • checkSourceCatalog

      protected org.apache.spark.sql.connector.catalog.TableCatalog checkSourceCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
    • sourceTableLocation

      protected String sourceTableLocation()
    • v1SourceTable

      protected org.apache.spark.sql.catalyst.catalog.CatalogTable v1SourceTable()
    • sourceCatalog

      protected org.apache.spark.sql.connector.catalog.TableCatalog sourceCatalog()
    • sourceTableIdent

      protected org.apache.spark.sql.connector.catalog.Identifier sourceTableIdent()
    • setProperties

      protected void setProperties(Map<String,String> properties)
    • setProperty

      protected void setProperty(String key, String value)
    • additionalProperties

      protected Map<String,String> additionalProperties()
    • checkDestinationCatalog

      protected org.apache.spark.sql.connector.catalog.StagingTableCatalog checkDestinationCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
    • stageDestTable

      protected StagedSparkTable stageDestTable()
    • ensureNameMappingPresent

      protected void ensureNameMappingPresent(Table table)
    • getMetadataLocation

      protected String getMetadataLocation(Table table)
    • spark

      protected org.apache.spark.sql.SparkSession spark()
    • sparkContext

      protected org.apache.spark.api.java.JavaSparkContext sparkContext()
    • option

      public MigrateTableSparkAction option(String name, String value)
    • options

      public MigrateTableSparkAction options(Map<String,String> newOptions)
    • options

      protected Map<String,String> options()
    • withJobGroupInfo

      protected <T> T withJobGroupInfo(JobGroupInfo info, Supplier<T> supplier)
    • newJobGroupInfo

      protected JobGroupInfo newJobGroupInfo(String groupId, String desc)
    • newStaticTable

      protected Table newStaticTable(TableMetadata metadata, FileIO io)
    • contentFileDS

      protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table)
    • contentFileDS

      protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table, Set<Long> snapshotIds)
    • manifestDS

      protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table)
    • manifestDS

      protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table, Set<Long> snapshotIds)
    • manifestListDS

      protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table)
    • manifestListDS

      protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table, Set<Long> snapshotIds)
    • statisticsFileDS

      protected org.apache.spark.sql.Dataset<FileInfo> statisticsFileDS(Table table, Set<Long> snapshotIds)
    • otherMetadataFileDS

      protected org.apache.spark.sql.Dataset<FileInfo> otherMetadataFileDS(Table table)
    • allReachableOtherMetadataFileDS

      protected org.apache.spark.sql.Dataset<FileInfo> allReachableOtherMetadataFileDS(Table table)
    • loadMetadataTable

      protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type)
    • deleteFiles

      protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files)
      Deletes files and keeps track of how many files were removed for each file type.
      Parameters:
      executorService - an executor service to use for parallel deletes
      deleteFunc - a function that deletes a file, given its location
      files - an iterator of FileInfo records, each carrying a file path and its file type
      Returns:
      stats on which files were deleted
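      For code with access to these protected members, a hypothetical invocation (executorService, table, and filesIterator are assumed to be in scope):

        import java.util.function.Consumer;

        // Hypothetical: delete each file through the table's FileIO
        // and collect per-file-type counts in the returned summary.
        Consumer<String> deleteFunc = path -> table.io().deleteFile(path);
        deleteFiles(executorService, deleteFunc, filesIterator);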
    • deleteFiles

      protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io, Iterator<FileInfo> files)