Class MigrateTableSparkAction

  • All Implemented Interfaces:
    Action<MigrateTable,MigrateTable.Result>, MigrateTable

    public class MigrateTableSparkAction
    extends java.lang.Object
    implements MigrateTable
    Takes a Spark table in the source catalog and attempts to transform it into an Iceberg table in the same location with the same identifier. Once the migration completes, the identifier that previously referred to a non-Iceberg table refers to the newly migrated Iceberg table.
    • Field Detail

      • ICEBERG_METADATA_FOLDER

        protected static final java.lang.String ICEBERG_METADATA_FOLDER
        See Also:
        Constant Field Values
      • EXCLUDED_PROPERTIES

        protected static final java.util.List<java.lang.String> EXCLUDED_PROPERTIES
      • STATISTICS_FILES

        protected static final java.lang.String STATISTICS_FILES
        See Also:
        Constant Field Values
      • COMMA_SPLITTER

        protected static final org.apache.iceberg.relocated.com.google.common.base.Splitter COMMA_SPLITTER
      • COMMA_JOINER

        protected static final org.apache.iceberg.relocated.com.google.common.base.Joiner COMMA_JOINER
    • Method Detail

      • destCatalog

        protected org.apache.spark.sql.connector.catalog.StagingTableCatalog destCatalog()
      • destTableIdent

        protected org.apache.spark.sql.connector.catalog.Identifier destTableIdent()
      • tableProperties

        public MigrateTableSparkAction tableProperties​(java.util.Map<java.lang.String,​java.lang.String> properties)
        Description copied from interface: MigrateTable
        Sets table properties in the newly created Iceberg table. Any existing property with the same key is overwritten.
        Specified by:
        tableProperties in interface MigrateTable
        Parameters:
        properties - a map of properties to set
        Returns:
        this for method chaining
      • tableProperty

        public MigrateTableSparkAction tableProperty​(java.lang.String property,
                                                     java.lang.String value)
        Description copied from interface: MigrateTable
        Sets a table property in the newly created Iceberg table. Any existing property with the same key is overwritten.
        Specified by:
        tableProperty in interface MigrateTable
        Parameters:
        property - a table property name
        value - a table property value
        Returns:
        this for method chaining
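
The overwrite-on-same-key and "returns this" behavior described for tableProperties and tableProperty can be illustrated with a small stand-in. This is a minimal sketch, not the Iceberg implementation: the class PropertyChainingSketch and its properties() accessor are hypothetical, and the property keys are used only as sample strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is NOT MigrateTableSparkAction.
public class PropertyChainingSketch {
  private final Map<String, String> props = new LinkedHashMap<>();

  // Sets one property; a later value for the same key replaces the earlier one.
  public PropertyChainingSketch tableProperty(String property, String value) {
    props.put(property, value);
    return this; // returning this is what allows calls to be chained
  }

  // Sets many properties at once, with the same overwrite rule per key.
  public PropertyChainingSketch tableProperties(Map<String, String> properties) {
    props.putAll(properties);
    return this;
  }

  // Hypothetical accessor so the accumulated properties can be inspected.
  public Map<String, String> properties() {
    return new LinkedHashMap<>(props);
  }

  public static void main(String[] args) {
    Map<String, String> extra = new LinkedHashMap<>();
    extra.put("write.format.default", "orc");
    extra.put("owner", "etl");

    Map<String, String> result = new PropertyChainingSketch()
        .tableProperty("write.format.default", "parquet")
        .tableProperties(extra) // overwrites the value set on the previous line
        .properties();

    System.out.println(result.get("write.format.default"));
  }
}
```

Because each setter returns the same object, the order of calls decides which value wins when a key is set more than once.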
      • destTableProps

        protected java.util.Map<java.lang.String,​java.lang.String> destTableProps()
      • checkSourceCatalog

        protected org.apache.spark.sql.connector.catalog.TableCatalog checkSourceCatalog​(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
      • sourceTableLocation

        protected java.lang.String sourceTableLocation()
      • v1SourceTable

        protected org.apache.spark.sql.catalyst.catalog.CatalogTable v1SourceTable()
      • sourceCatalog

        protected org.apache.spark.sql.connector.catalog.TableCatalog sourceCatalog()
      • sourceTableIdent

        protected org.apache.spark.sql.connector.catalog.Identifier sourceTableIdent()
      • setProperties

        protected void setProperties​(java.util.Map<java.lang.String,​java.lang.String> properties)
      • setProperty

        protected void setProperty​(java.lang.String key,
                                   java.lang.String value)
      • additionalProperties

        protected java.util.Map<java.lang.String,​java.lang.String> additionalProperties()
      • checkDestinationCatalog

        protected org.apache.spark.sql.connector.catalog.StagingTableCatalog checkDestinationCatalog​(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
      • ensureNameMappingPresent

        protected void ensureNameMappingPresent​(Table table)
      • getMetadataLocation

        protected java.lang.String getMetadataLocation​(Table table)
      • spark

        protected org.apache.spark.sql.SparkSession spark()
      • sparkContext

        protected org.apache.spark.api.java.JavaSparkContext sparkContext()
      • option

        public ThisT option​(java.lang.String name,
                            java.lang.String value)
      • options

        public ThisT options​(java.util.Map<java.lang.String,​java.lang.String> newOptions)
      • options

        protected java.util.Map<java.lang.String,​java.lang.String> options()
      • withJobGroupInfo

        protected <T> T withJobGroupInfo​(JobGroupInfo info,
                                         java.util.function.Supplier<T> supplier)
      • newJobGroupInfo

        protected JobGroupInfo newJobGroupInfo​(java.lang.String groupId,
                                               java.lang.String desc)
      • contentFileDS

        protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS​(Table table)
      • contentFileDS

        protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS​(Table table,
                                                                       java.util.Set<java.lang.Long> snapshotIds)
      • manifestDS

        protected org.apache.spark.sql.Dataset<FileInfo> manifestDS​(Table table)
      • manifestDS

        protected org.apache.spark.sql.Dataset<FileInfo> manifestDS​(Table table,
                                                                    java.util.Set<java.lang.Long> snapshotIds)
      • manifestListDS

        protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS​(Table table)
      • manifestListDS

        protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS​(Table table,
                                                                        java.util.Set<java.lang.Long> snapshotIds)
      • statisticsFileDS

        protected org.apache.spark.sql.Dataset<FileInfo> statisticsFileDS​(Table table,
                                                                          java.util.Set<java.lang.Long> snapshotIds)
      • otherMetadataFileDS

        protected org.apache.spark.sql.Dataset<FileInfo> otherMetadataFileDS​(Table table)
      • allReachableOtherMetadataFileDS

        protected org.apache.spark.sql.Dataset<FileInfo> allReachableOtherMetadataFileDS​(Table table)
      • loadMetadataTable

        protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable​(Table table,
                                                                                           MetadataTableType type)
      • deleteFiles

        protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles​(java.util.concurrent.ExecutorService executorService,
                                                                                             java.util.function.Consumer<java.lang.String> deleteFunc,
                                                                                             java.util.Iterator<FileInfo> files)
        Deletes files and keeps track of how many files were removed for each file type.
        Parameters:
        executorService - an executor service to use for parallel deletes
        deleteFunc - a function that deletes the file at the given path
        files - an iterator of FileInfo entries, each carrying a file path and a file type
        Returns:
        a summary of how many files of each type were deleted
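
The pattern this method describes, parallel deletes through an executor with a per-type tally, can be sketched in plain Java. This is a sketch under stated assumptions, not the code in BaseSparkAction: FileEntry is a hypothetical stand-in for FileInfo, the returned Map stands in for DeleteSummary, and skipping a failed delete without counting it is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

public class DeleteFilesSketch {

  // Hypothetical stand-in for FileInfo: a path plus a file-type label.
  public static final class FileEntry {
    final String path;
    final String type;

    public FileEntry(String path, String type) {
      this.path = path;
      this.type = type;
    }
  }

  // Runs deleteFunc for every file on the executor and counts successes per type.
  public static Map<String, Long> deleteFiles(
      ExecutorService executorService,
      Consumer<String> deleteFunc,
      Iterator<FileEntry> files) {
    Map<String, AtomicLong> counts = new ConcurrentHashMap<>();
    List<Future<?>> pending = new ArrayList<>();

    while (files.hasNext()) {
      FileEntry file = files.next();
      pending.add(executorService.submit(() -> {
        try {
          deleteFunc.accept(file.path);
          counts.computeIfAbsent(file.type, t -> new AtomicLong()).incrementAndGet();
        } catch (RuntimeException e) {
          // Assumption of this sketch: a failed delete is skipped and not counted.
        }
      }));
    }

    // Wait for every submitted delete before building the summary.
    for (Future<?> task : pending) {
      try {
        task.get();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while deleting files", e);
      } catch (ExecutionException e) {
        throw new IllegalStateException("Delete task failed", e);
      }
    }

    Map<String, Long> summary = new TreeMap<>();
    counts.forEach((type, count) -> summary.put(type, count.get()));
    return summary;
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      List<FileEntry> files = Arrays.asList(
          new FileEntry("data/a.parquet", "Data"),
          new FileEntry("metadata/m1.avro", "Manifest"),
          new FileEntry("data/b.parquet", "Data"));
      // The delete function here is a no-op; a real one would call a file system API.
      Map<String, Long> summary = deleteFiles(pool, path -> { }, files.iterator());
      System.out.println(summary);
    } finally {
      pool.shutdown();
    }
  }
}
```

The counters are kept in a ConcurrentHashMap of AtomicLong values so that worker threads can update them without extra locking.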
      • deleteFiles

        protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles​(SupportsBulkOperations io,
                                                                                             java.util.Iterator<FileInfo> files)