Package org.apache.iceberg.spark.actions

Class MigrateTableSparkAction

java.lang.Object
  org.apache.iceberg.spark.actions.MigrateTableSparkAction

All Implemented Interfaces:
  Action<MigrateTable, MigrateTable.Result>, MigrateTable

Takes a Spark table in the source catalog and attempts to transform it into an Iceberg table in the same location with the same identifier. Once complete, the identifier that previously referred to a non-Iceberg table refers to the newly migrated Iceberg table.
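A typical invocation goes through SparkActions rather than constructing this class directly. The following is a minimal sketch, assuming an active SparkSession whose catalog contains a non-Iceberg table named db.sample (a hypothetical identifier):

```java
import org.apache.iceberg.actions.MigrateTable;
import org.apache.iceberg.spark.actions.SparkActions;

public class MigrateExample {
  public static void main(String[] args) {
    // Migrate the Spark table "db.sample" (hypothetical name) in place.
    // SparkActions.get() binds to the active SparkSession.
    MigrateTable.Result result =
        SparkActions.get()
            .migrateTable("db.sample")
            .tableProperty("format-version", "2")
            .execute();

    System.out.println("Migrated data files: " + result.migratedDataFilesCount());
  }
}
```

After execute() returns, db.sample resolves to the migrated Iceberg table; the original table metadata is retained as a backup table unless dropBackup() was requested.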
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.iceberg.actions.MigrateTable
MigrateTable.Result
-
Field Summary
Modifier and TypeFieldDescriptionprotected static final org.apache.iceberg.relocated.com.google.common.base.Joiner
protected static final org.apache.iceberg.relocated.com.google.common.base.Splitter
protected static final String
protected static final String
protected static final String
protected static final String
protected static final String
protected static final String
protected static final String
protected static final String
-
Method Summary

protected Map<String, String> additionalProperties()
protected org.apache.spark.sql.Dataset<FileInfo> allReachableOtherMetadataFileDS(Table table)
MigrateTableSparkAction backupTableName(String tableName)
    Sets a table name for the backup of the original table.
protected org.apache.spark.sql.connector.catalog.StagingTableCatalog checkDestinationCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
protected org.apache.spark.sql.connector.catalog.TableCatalog checkSourceCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table)
protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table, Set<Long> snapshotIds)
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files)
    Deletes files and keeps track of how many files were removed for each file type.
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io, Iterator<FileInfo> files)
protected org.apache.spark.sql.connector.catalog.StagingTableCatalog destCatalog()
protected org.apache.spark.sql.connector.catalog.Identifier destTableIdent()
protected Map<String, String> destTableProps()
MigrateTableSparkAction dropBackup()
    Drops the backup of the original table after a successful migration.
protected void ensureNameMappingPresent(Table table)
MigrateTable.Result execute()
    Executes this action.
MigrateTableSparkAction executeWith(ExecutorService service)
    Sets the executor service to use for parallel file reading.
protected String getMetadataLocation(Table table)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type)
protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table)
protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table, Set<Long> snapshotIds)
protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table)
protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table, Set<Long> snapshotIds)
protected JobGroupInfo newJobGroupInfo(String groupId, String desc)
protected Table newStaticTable(TableMetadata metadata, FileIO io)
MigrateTableSparkAction option(String name, String value)
protected Map<String, String> options()
MigrateTableSparkAction options(Map<String, String> newOptions)
protected org.apache.spark.sql.Dataset<FileInfo> otherMetadataFileDS(Table table)
protected MigrateTableSparkAction self()
protected void setProperties(Map<String, String> properties)
protected void setProperty(String key, String value)
protected org.apache.spark.sql.connector.catalog.TableCatalog sourceCatalog()
protected org.apache.spark.sql.connector.catalog.Identifier sourceTableIdent()
protected String sourceTableLocation()
protected org.apache.spark.sql.SparkSession spark()
protected org.apache.spark.api.java.JavaSparkContext sparkContext()
protected StagedSparkTable stageDestTable()
protected org.apache.spark.sql.Dataset<FileInfo> statisticsFileDS(Table table, Set<Long> snapshotIds)
MigrateTableSparkAction tableProperties(Map<String, String> properties)
    Sets table properties in the newly created Iceberg table.
MigrateTableSparkAction tableProperty(String property, String value)
    Sets a table property in the newly created Iceberg table.
protected org.apache.spark.sql.catalyst.catalog.CatalogTable v1SourceTable()
protected <T> T withJobGroupInfo(JobGroupInfo info, Supplier<T> supplier)
-
Field Details

LOCATION
protected static final String LOCATION
- See Also:
  Constant Field Values

ICEBERG_METADATA_FOLDER
protected static final String ICEBERG_METADATA_FOLDER
- See Also:
  Constant Field Values

EXCLUDED_PROPERTIES

MANIFEST
protected static final String MANIFEST
- See Also:
  Constant Field Values

MANIFEST_LIST
protected static final String MANIFEST_LIST
- See Also:
  Constant Field Values

STATISTICS_FILES
protected static final String STATISTICS_FILES
- See Also:
  Constant Field Values

OTHERS
protected static final String OTHERS
- See Also:
  Constant Field Values

FILE_PATH
protected static final String FILE_PATH
- See Also:
  Constant Field Values

LAST_MODIFIED
protected static final String LAST_MODIFIED
- See Also:
  Constant Field Values

COMMA_SPLITTER
protected static final org.apache.iceberg.relocated.com.google.common.base.Splitter COMMA_SPLITTER

COMMA_JOINER
protected static final org.apache.iceberg.relocated.com.google.common.base.Joiner COMMA_JOINER
-
Method Details
-
self
protected MigrateTableSparkAction self()

destCatalog
protected org.apache.spark.sql.connector.catalog.StagingTableCatalog destCatalog()

destTableIdent
protected org.apache.spark.sql.connector.catalog.Identifier destTableIdent()
tableProperties
Description copied from interface: MigrateTable
Sets table properties in the newly created Iceberg table. Any properties with the same key name will be overwritten.
- Specified by:
  tableProperties in interface MigrateTable
- Parameters:
  properties - a map of properties to set
- Returns:
  this for method chaining
-
tableProperty
Description copied from interface: MigrateTable
Sets a table property in the newly created Iceberg table. Any properties with the same key will be overwritten.
- Specified by:
  tableProperty in interface MigrateTable
- Parameters:
  property - a table property name
  value - a table property value
- Returns:
  this for method chaining
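To illustrate how the two property setters combine, here is a sketch that applies several properties in bulk and one individually; the table identifier and property values are hypothetical examples:

```java
import java.util.Map;
import org.apache.iceberg.spark.actions.SparkActions;

// Bulk properties via tableProperties(Map), plus a single
// tableProperty(key, value); later calls with the same key overwrite.
SparkActions.get()
    .migrateTable("db.sample")                       // hypothetical table
    .tableProperties(Map.of(
        "write.format.default", "parquet",
        "comment", "migrated from a Spark table"))
    .tableProperty("format-version", "2")
    .execute();
```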
-
dropBackup
Description copied from interface: MigrateTable
Drops the backup of the original table after a successful migration.
- Specified by:
  dropBackup in interface MigrateTable
- Returns:
  this for method chaining
-
backupTableName
Description copied from interface: MigrateTable
Sets a table name for the backup of the original table.
- Specified by:
  backupTableName in interface MigrateTable
- Parameters:
  tableName - the table name for backup
- Returns:
  this for method chaining
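As a sketch of how the backup-related setters fit together (table names are hypothetical): backupTableName chooses the name under which the original table is preserved, while dropBackup instead removes the backup once the migration succeeds.

```java
import org.apache.iceberg.spark.actions.SparkActions;

// Keep the original table under an explicit backup name so it can be
// inspected or restored if the migration needs to be rolled back.
SparkActions.get()
    .migrateTable("db.sample")                       // hypothetical table
    .backupTableName("db.sample_backup_pre_iceberg") // hypothetical name
    .execute();

// Alternatively, discard the backup automatically after success:
// SparkActions.get().migrateTable("db.sample").dropBackup().execute();
```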
-
executeWith
Description copied from interface: MigrateTable
Sets the executor service to use for parallel file reading. By default, no executor service is used.
- Specified by:
  executeWith in interface MigrateTable
- Parameters:
  service - executor service
- Returns:
  this for method chaining
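A sketch of supplying an executor service for parallel file reading; the pool size and table name are arbitrary assumptions:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.iceberg.spark.actions.SparkActions;

ExecutorService pool = Executors.newFixedThreadPool(8); // size is arbitrary
try {
  SparkActions.get()
      .migrateTable("db.sample") // hypothetical table
      .executeWith(pool)         // read files in parallel on this pool
      .execute();
} finally {
  pool.shutdown(); // the caller owns the service and must shut it down
}
```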
-
execute
Description copied from interface: Action
Executes this action.
- Specified by:
  execute in interface Action<MigrateTable, MigrateTable.Result>
- Returns:
  the result of this action
-
destTableProps
protected Map<String, String> destTableProps()

checkSourceCatalog
protected org.apache.spark.sql.connector.catalog.TableCatalog checkSourceCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)

sourceTableLocation
protected String sourceTableLocation()

v1SourceTable
protected org.apache.spark.sql.catalyst.catalog.CatalogTable v1SourceTable()

sourceCatalog
protected org.apache.spark.sql.connector.catalog.TableCatalog sourceCatalog()

sourceTableIdent
protected org.apache.spark.sql.connector.catalog.Identifier sourceTableIdent()

setProperties
protected void setProperties(Map<String, String> properties)

setProperty
protected void setProperty(String key, String value)

additionalProperties
protected Map<String, String> additionalProperties()

checkDestinationCatalog
protected org.apache.spark.sql.connector.catalog.StagingTableCatalog checkDestinationCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)

stageDestTable
protected StagedSparkTable stageDestTable()

ensureNameMappingPresent
protected void ensureNameMappingPresent(Table table)

getMetadataLocation
protected String getMetadataLocation(Table table)

spark
protected org.apache.spark.sql.SparkSession spark()

sparkContext
protected org.apache.spark.api.java.JavaSparkContext sparkContext()

option
public MigrateTableSparkAction option(String name, String value)

options
public MigrateTableSparkAction options(Map<String, String> newOptions)

options
protected Map<String, String> options()

withJobGroupInfo
protected <T> T withJobGroupInfo(JobGroupInfo info, Supplier<T> supplier)

newJobGroupInfo
protected JobGroupInfo newJobGroupInfo(String groupId, String desc)

newStaticTable
protected Table newStaticTable(TableMetadata metadata, FileIO io)

contentFileDS
protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table)

contentFileDS
protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table, Set<Long> snapshotIds)

manifestDS
protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table)

manifestDS
protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table, Set<Long> snapshotIds)

manifestListDS
protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table)

manifestListDS
protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table, Set<Long> snapshotIds)

statisticsFileDS
protected org.apache.spark.sql.Dataset<FileInfo> statisticsFileDS(Table table, Set<Long> snapshotIds)

otherMetadataFileDS
protected org.apache.spark.sql.Dataset<FileInfo> otherMetadataFileDS(Table table)

allReachableOtherMetadataFileDS
protected org.apache.spark.sql.Dataset<FileInfo> allReachableOtherMetadataFileDS(Table table)

loadMetadataTable
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type)
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files)
Deletes files and keeps track of how many files were removed for each file type.
- Parameters:
  executorService - an executor service to use for parallel deletes
  deleteFunc - a delete function
  files - an iterator of Spark rows of the structure (path: String, type: String)
- Returns:
  stats on which files were deleted
-
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io, Iterator<FileInfo> files)
-