Class MigrateTableSparkAction

  • All Implemented Interfaces:
    Action<MigrateTable,​MigrateTable.Result>, MigrateTable

    public class MigrateTableSparkAction
    extends java.lang.Object
    implements MigrateTable
    Takes a Spark table in the source catalog and attempts to transform it into an Iceberg table in the same location with the same identifier. Once complete, the identifier, which previously referred to a non-Iceberg table, will refer to the newly migrated Iceberg table.
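    A typical invocation goes through the fluent builder API. The sketch below assumes an active SparkSession with an Iceberg-enabled session catalog; the table identifier, property value, and backup name are illustrative:

    ```java
    import org.apache.iceberg.actions.MigrateTable;
    import org.apache.iceberg.spark.actions.SparkActions;

    public class MigrateExample {
      public static void main(String[] args) {
        // Migrate a non-Iceberg Spark table in place. "db.sample" is an
        // illustrative identifier; a live SparkSession is required.
        MigrateTable.Result result = SparkActions.get()
            .migrateTable("db.sample")
            .tableProperty("format-version", "2")   // property for the new Iceberg table
            .backupTableName("db.sample_backup_")   // custom name for the backup table
            .execute();

        System.out.println("Migrated data files: " + result.migratedDataFilesCount());
      }
    }
    ```

    After `execute()` returns, `db.sample` resolves to the new Iceberg table, and the original table is preserved under the backup name unless `dropBackup()` was requested.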
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected static org.apache.iceberg.relocated.com.google.common.base.Joiner COMMA_JOINER  
      protected static org.apache.iceberg.relocated.com.google.common.base.Splitter COMMA_SPLITTER  
      protected static java.util.List<java.lang.String> EXCLUDED_PROPERTIES  
      protected static java.lang.String FILE_PATH  
      protected static java.lang.String ICEBERG_METADATA_FOLDER  
      protected static java.lang.String LAST_MODIFIED  
      protected static java.lang.String LOCATION  
      protected static java.lang.String MANIFEST  
      protected static java.lang.String MANIFEST_LIST  
      protected static java.lang.String OTHERS  
      protected static java.lang.String STATISTICS_FILES  
    • Method Summary

      Modifier and Type Method Description
      protected java.util.Map<java.lang.String,​java.lang.String> additionalProperties()  
      protected org.apache.spark.sql.Dataset<FileInfo> allReachableOtherMetadataFileDS​(Table table)  
      MigrateTableSparkAction backupTableName​(java.lang.String tableName)
      Sets a table name for the backup of the original table.
      protected org.apache.spark.sql.connector.catalog.StagingTableCatalog checkDestinationCatalog​(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)  
      protected org.apache.spark.sql.connector.catalog.TableCatalog checkSourceCatalog​(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)  
      protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS​(Table table)  
      protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS​(Table table, java.util.Set<java.lang.Long> snapshotIds)  
      protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles​(java.util.concurrent.ExecutorService executorService, java.util.function.Consumer<java.lang.String> deleteFunc, java.util.Iterator<FileInfo> files)
      Deletes files and keeps track of how many files were removed for each file type.
      protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles​(SupportsBulkOperations io, java.util.Iterator<FileInfo> files)  
      protected org.apache.spark.sql.connector.catalog.StagingTableCatalog destCatalog()  
      protected org.apache.spark.sql.connector.catalog.Identifier destTableIdent()  
      protected java.util.Map<java.lang.String,​java.lang.String> destTableProps()  
      MigrateTableSparkAction dropBackup()
      Drops the backup of the original table after a successful migration.
      protected void ensureNameMappingPresent​(Table table)  
      MigrateTable.Result execute()
      Executes this action.
      MigrateTableSparkAction executeWith​(java.util.concurrent.ExecutorService service)
      Sets the executor service to use for parallel file reading.
      protected java.lang.String getMetadataLocation​(Table table)  
      protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable​(Table table, MetadataTableType type)  
      protected org.apache.spark.sql.Dataset<FileInfo> manifestDS​(Table table)  
      protected org.apache.spark.sql.Dataset<FileInfo> manifestDS​(Table table, java.util.Set<java.lang.Long> snapshotIds)  
      protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS​(Table table)  
      protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS​(Table table, java.util.Set<java.lang.Long> snapshotIds)  
      protected JobGroupInfo newJobGroupInfo​(java.lang.String groupId, java.lang.String desc)  
      protected Table newStaticTable​(TableMetadata metadata, FileIO io)  
      ThisT option​(java.lang.String name, java.lang.String value)  
      protected java.util.Map<java.lang.String,​java.lang.String> options()  
      ThisT options​(java.util.Map<java.lang.String,​java.lang.String> newOptions)  
      protected org.apache.spark.sql.Dataset<FileInfo> otherMetadataFileDS​(Table table)  
      protected MigrateTableSparkAction self()  
      protected void setProperties​(java.util.Map<java.lang.String,​java.lang.String> properties)  
      protected void setProperty​(java.lang.String key, java.lang.String value)  
      protected org.apache.spark.sql.connector.catalog.TableCatalog sourceCatalog()  
      protected org.apache.spark.sql.connector.catalog.Identifier sourceTableIdent()  
      protected java.lang.String sourceTableLocation()  
      protected org.apache.spark.sql.SparkSession spark()  
      protected org.apache.spark.api.java.JavaSparkContext sparkContext()  
      protected StagedSparkTable stageDestTable()  
      protected org.apache.spark.sql.Dataset<FileInfo> statisticsFileDS​(Table table, java.util.Set<java.lang.Long> snapshotIds)  
      MigrateTableSparkAction tableProperties​(java.util.Map<java.lang.String,​java.lang.String> properties)
      Sets table properties in the newly created Iceberg table.
      MigrateTableSparkAction tableProperty​(java.lang.String property, java.lang.String value)
      Sets a table property in the newly created Iceberg table.
      protected org.apache.spark.sql.catalyst.catalog.CatalogTable v1SourceTable()  
      protected <T> T withJobGroupInfo​(JobGroupInfo info, java.util.function.Supplier<T> supplier)  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • ICEBERG_METADATA_FOLDER

        protected static final java.lang.String ICEBERG_METADATA_FOLDER
        See Also:
        Constant Field Values
      • EXCLUDED_PROPERTIES

        protected static final java.util.List<java.lang.String> EXCLUDED_PROPERTIES
      • STATISTICS_FILES

        protected static final java.lang.String STATISTICS_FILES
        See Also:
        Constant Field Values
      • COMMA_SPLITTER

        protected static final org.apache.iceberg.relocated.com.google.common.base.Splitter COMMA_SPLITTER
      • COMMA_JOINER

        protected static final org.apache.iceberg.relocated.com.google.common.base.Joiner COMMA_JOINER
    • Method Detail

      • destCatalog

        protected org.apache.spark.sql.connector.catalog.StagingTableCatalog destCatalog()
      • destTableIdent

        protected org.apache.spark.sql.connector.catalog.Identifier destTableIdent()
      • tableProperties

        public MigrateTableSparkAction tableProperties​(java.util.Map<java.lang.String,​java.lang.String> properties)
        Description copied from interface: MigrateTable
        Sets table properties in the newly created Iceberg table. Any properties with the same key name will be overwritten.
        Specified by:
        tableProperties in interface MigrateTable
        Parameters:
        properties - a map of properties to set
        Returns:
        this for method chaining
      • tableProperty

        public MigrateTableSparkAction tableProperty​(java.lang.String property,
                                                     java.lang.String value)
        Description copied from interface: MigrateTable
        Sets a table property in the newly created Iceberg table. Any properties with the same key will be overwritten.
        Specified by:
        tableProperty in interface MigrateTable
        Parameters:
        property - a table property name
        value - a table property value
        Returns:
        this for method chaining
      • backupTableName

        public MigrateTableSparkAction backupTableName​(java.lang.String tableName)
        Description copied from interface: MigrateTable
        Sets a table name for the backup of the original table.
        Specified by:
        backupTableName in interface MigrateTable
        Parameters:
        tableName - the table name for backup
        Returns:
        this for method chaining
      • executeWith

        public MigrateTableSparkAction executeWith​(java.util.concurrent.ExecutorService service)
        Description copied from interface: MigrateTable
        Sets the executor service to use for parallel file reading. By default, no executor service is used.
        Specified by:
        executeWith in interface MigrateTable
        Parameters:
        service - executor service
        Returns:
        this for method chaining
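        For example, a fixed-size thread pool can be supplied to parallelize file reading during migration. This is a sketch under the same assumptions as above (live SparkSession, illustrative table name); the pool size of 8 is arbitrary:

        ```java
        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;

        import org.apache.iceberg.actions.MigrateTable;
        import org.apache.iceberg.spark.actions.SparkActions;

        public class ParallelMigrateExample {
          public static void main(String[] args) {
            // Thread pool used by the action for parallel file reading.
            ExecutorService pool = Executors.newFixedThreadPool(8);
            try {
              MigrateTable.Result result = SparkActions.get()
                  .migrateTable("db.sample")   // illustrative identifier
                  .executeWith(pool)
                  .execute();
              System.out.println("Migrated data files: " + result.migratedDataFilesCount());
            } finally {
              // The caller owns the executor service and must shut it down.
              pool.shutdown();
            }
          }
        }
        ```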
      • destTableProps

        protected java.util.Map<java.lang.String,​java.lang.String> destTableProps()
      • checkSourceCatalog

        protected org.apache.spark.sql.connector.catalog.TableCatalog checkSourceCatalog​(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
      • sourceTableLocation

        protected java.lang.String sourceTableLocation()
      • v1SourceTable

        protected org.apache.spark.sql.catalyst.catalog.CatalogTable v1SourceTable()
      • sourceCatalog

        protected org.apache.spark.sql.connector.catalog.TableCatalog sourceCatalog()
      • sourceTableIdent

        protected org.apache.spark.sql.connector.catalog.Identifier sourceTableIdent()
      • setProperties

        protected void setProperties​(java.util.Map<java.lang.String,​java.lang.String> properties)
      • setProperty

        protected void setProperty​(java.lang.String key,
                                   java.lang.String value)
      • additionalProperties

        protected java.util.Map<java.lang.String,​java.lang.String> additionalProperties()
      • checkDestinationCatalog

        protected org.apache.spark.sql.connector.catalog.StagingTableCatalog checkDestinationCatalog​(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
      • ensureNameMappingPresent

        protected void ensureNameMappingPresent​(Table table)
      • getMetadataLocation

        protected java.lang.String getMetadataLocation​(Table table)
      • spark

        protected org.apache.spark.sql.SparkSession spark()
      • sparkContext

        protected org.apache.spark.api.java.JavaSparkContext sparkContext()
      • option

        public ThisT option​(java.lang.String name,
                            java.lang.String value)
      • options

        public ThisT options​(java.util.Map<java.lang.String,​java.lang.String> newOptions)
      • options

        protected java.util.Map<java.lang.String,​java.lang.String> options()
      • withJobGroupInfo

        protected <T> T withJobGroupInfo​(JobGroupInfo info,
                                         java.util.function.Supplier<T> supplier)
      • newJobGroupInfo

        protected JobGroupInfo newJobGroupInfo​(java.lang.String groupId,
                                               java.lang.String desc)
      • contentFileDS

        protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS​(Table table)
      • contentFileDS

        protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS​(Table table,
                                                                       java.util.Set<java.lang.Long> snapshotIds)
      • manifestDS

        protected org.apache.spark.sql.Dataset<FileInfo> manifestDS​(Table table)
      • manifestDS

        protected org.apache.spark.sql.Dataset<FileInfo> manifestDS​(Table table,
                                                                    java.util.Set<java.lang.Long> snapshotIds)
      • manifestListDS

        protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS​(Table table)
      • manifestListDS

        protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS​(Table table,
                                                                        java.util.Set<java.lang.Long> snapshotIds)
      • statisticsFileDS

        protected org.apache.spark.sql.Dataset<FileInfo> statisticsFileDS​(Table table,
                                                                          java.util.Set<java.lang.Long> snapshotIds)
      • otherMetadataFileDS

        protected org.apache.spark.sql.Dataset<FileInfo> otherMetadataFileDS​(Table table)
      • allReachableOtherMetadataFileDS

        protected org.apache.spark.sql.Dataset<FileInfo> allReachableOtherMetadataFileDS​(Table table)
      • loadMetadataTable

        protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable​(Table table,
                                                                                           MetadataTableType type)
      • deleteFiles

        protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles​(java.util.concurrent.ExecutorService executorService,
                                                                                             java.util.function.Consumer<java.lang.String> deleteFunc,
                                                                                             java.util.Iterator<FileInfo> files)
        Deletes files and keeps track of how many files were removed for each file type.
        Parameters:
        executorService - an executor service to use for parallel deletes
        deleteFunc - a function that deletes a file, given its path
        files - an iterator of Spark rows of the structure (path: String, type: String)
        Returns:
        stats on which files were deleted
      • deleteFiles

        protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles​(SupportsBulkOperations io,
                                                                                             java.util.Iterator<FileInfo> files)