Package org.apache.iceberg.spark.actions
Class MigrateTableSparkAction
java.lang.Object
org.apache.iceberg.spark.actions.MigrateTableSparkAction
- All Implemented Interfaces:
- Action<MigrateTable, MigrateTable.Result>, MigrateTable
Takes a Spark table in the source catalog and attempts to transform it into an Iceberg table in
 the same location with the same identifier. Once complete, the identifier that previously
 referred to a non-Iceberg table will refer to the newly migrated Iceberg table.
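A typical invocation obtains this action from SparkActions and configures it fluently before calling execute(). The following sketch is illustrative only: spark is assumed to be an existing SparkSession and "db.sample" is a hypothetical table identifier.

    import org.apache.iceberg.actions.MigrateTable;
    import org.apache.iceberg.spark.actions.SparkActions;

    // Migrate the Spark table "db.sample" in place to Iceberg,
    // keeping the same location and identifier.
    MigrateTable.Result result =
        SparkActions.get(spark)
            .migrateTable("db.sample")
            .execute();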
- 
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.iceberg.actions.MigrateTable:
- MigrateTable.Result
- 
Field Summary
Fields:
- protected static final org.apache.iceberg.relocated.com.google.common.base.Joiner COMMA_JOINER
- protected static final org.apache.iceberg.relocated.com.google.common.base.Splitter COMMA_SPLITTER
- EXCLUDED_PROPERTIES
- protected static final String FILE_PATH
- protected static final String ICEBERG_METADATA_FOLDER
- protected static final String LAST_MODIFIED
- protected static final String LOCATION
- protected static final String MANIFEST
- protected static final String MANIFEST_LIST
- protected static final String OTHERS
- protected static final String STATISTICS_FILES
- 
Method Summary
- protected Map<String, String> additionalProperties()
- protected org.apache.spark.sql.Dataset<FileInfo> allReachableOtherMetadataFileDS(Table table)
- MigrateTableSparkAction backupTableName(String tableName): Sets a table name for the backup of the original table.
- protected org.apache.spark.sql.connector.catalog.StagingTableCatalog checkDestinationCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
- protected org.apache.spark.sql.connector.catalog.TableCatalog checkSourceCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
- protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table)
- protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table, Set<Long> snapshotIds)
- protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files): Deletes files and keeps track of how many files were removed for each file type.
- protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io, Iterator<FileInfo> files)
- protected org.apache.spark.sql.connector.catalog.StagingTableCatalog destCatalog()
- protected org.apache.spark.sql.connector.catalog.Identifier destTableIdent()
- protected Map<String, String> destTableProps()
- MigrateTableSparkAction dropBackup(): Drops the backup of the original table after a successful migration.
- protected void ensureNameMappingPresent(Table table)
- MigrateTable.Result execute(): Executes this action.
- MigrateTableSparkAction executeWith(ExecutorService service): Sets the executor service to use for parallel file reading.
- protected String getMetadataLocation(Table table)
- protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type)
- protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table)
- protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table, Set<Long> snapshotIds)
- protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table)
- protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table, Set<Long> snapshotIds)
- protected JobGroupInfo newJobGroupInfo(String groupId, String desc)
- protected Table newStaticTable(TableMetadata metadata, FileIO io)
- MigrateTableSparkAction option(String name, String value)
- protected Map<String, String> options()
- MigrateTableSparkAction options(Map<String, String> newOptions)
- protected org.apache.spark.sql.Dataset<FileInfo> otherMetadataFileDS(Table table)
- protected MigrateTableSparkAction self()
- protected void setProperties(Map<String, String> properties)
- protected void setProperty(String key, String value)
- protected org.apache.spark.sql.connector.catalog.TableCatalog sourceCatalog()
- protected org.apache.spark.sql.connector.catalog.Identifier sourceTableIdent()
- protected String sourceTableLocation()
- protected org.apache.spark.sql.SparkSession spark()
- protected org.apache.spark.api.java.JavaSparkContext sparkContext()
- protected StagedSparkTable stageDestTable()
- protected org.apache.spark.sql.Dataset<FileInfo> statisticsFileDS(Table table, Set<Long> snapshotIds)
- MigrateTableSparkAction tableProperties(Map<String, String> properties): Sets table properties in the newly created Iceberg table.
- MigrateTableSparkAction tableProperty(String property, String value): Sets a table property in the newly created Iceberg table.
- protected org.apache.spark.sql.catalyst.catalog.CatalogTable v1SourceTable()
- protected <T> T withJobGroupInfo(JobGroupInfo info, Supplier<T> supplier)
- 
Field Details
- LOCATION
protected static final String LOCATION
See Also: Constant Field Values
- ICEBERG_METADATA_FOLDER
protected static final String ICEBERG_METADATA_FOLDER
See Also: Constant Field Values
- EXCLUDED_PROPERTIES
- MANIFEST
protected static final String MANIFEST
See Also: Constant Field Values
- MANIFEST_LIST
protected static final String MANIFEST_LIST
See Also: Constant Field Values
- STATISTICS_FILES
protected static final String STATISTICS_FILES
See Also: Constant Field Values
- OTHERS
protected static final String OTHERS
See Also: Constant Field Values
- FILE_PATH
protected static final String FILE_PATH
See Also: Constant Field Values
- LAST_MODIFIED
protected static final String LAST_MODIFIED
See Also: Constant Field Values
- COMMA_SPLITTER
protected static final org.apache.iceberg.relocated.com.google.common.base.Splitter COMMA_SPLITTER
- COMMA_JOINER
protected static final org.apache.iceberg.relocated.com.google.common.base.Joiner COMMA_JOINER
Method Details
- 
self
protected MigrateTableSparkAction self()
- 
destCatalog
protected org.apache.spark.sql.connector.catalog.StagingTableCatalog destCatalog()
- 
destTableIdent
protected org.apache.spark.sql.connector.catalog.Identifier destTableIdent()
- 
tableProperties
Description copied from interface: MigrateTable
Sets table properties in the newly created Iceberg table. Any properties with the same key name will be overwritten.
- Specified by:
- tableProperties in interface MigrateTable
- Parameters:
- properties - a map of properties to set
- Returns:
- this for method chaining
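For example (a sketch; the identifier and property value are hypothetical):

    import java.util.Map;

    // apply several properties to the new Iceberg table in one call
    SparkActions.get(spark)
        .migrateTable("db.sample")
        .tableProperties(Map.of("write.format.default", "parquet"))
        .execute();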
 
- 
tableProperty
Description copied from interface: MigrateTable
Sets a table property in the newly created Iceberg table. Any properties with the same key will be overwritten.
- Specified by:
- tableProperty in interface MigrateTable
- Parameters:
- property - a table property name
- value - a table property value
- Returns:
- this for method chaining
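For example (a sketch; the identifier and property are hypothetical):

    // set one property per call; calls can be chained
    SparkActions.get(spark)
        .migrateTable("db.sample")
        .tableProperty("write.format.default", "parquet")
        .execute();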
 
- 
dropBackup
Description copied from interface: MigrateTable
Drops the backup of the original table after a successful migration.
- Specified by:
- dropBackup in interface MigrateTable
- Returns:
- this for method chaining
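For example (a sketch with a hypothetical identifier):

    // delete the automatically created backup once migration succeeds
    SparkActions.get(spark)
        .migrateTable("db.sample")
        .dropBackup()
        .execute();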
 
- 
backupTableName
Description copied from interface: MigrateTable
Sets a table name for the backup of the original table.
- Specified by:
- backupTableName in interface MigrateTable
- Parameters:
- tableName - the table name for backup
- Returns:
- this for method chaining
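For example (a sketch; both identifiers are hypothetical):

    // keep the original table under an explicit backup name
    SparkActions.get(spark)
        .migrateTable("db.sample")
        .backupTableName("db.sample_backup")
        .execute();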
 
- 
executeWith
Description copied from interface: MigrateTable
Sets the executor service to use for parallel file reading. By default, no executor service is used.
- Specified by:
- executeWith in interface MigrateTable
- Parameters:
- service - executor service
- Returns:
- this for method chaining
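For example (a sketch; the pool size and identifier are arbitrary choices):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    ExecutorService pool = Executors.newFixedThreadPool(8);
    try {
        SparkActions.get(spark)
            .migrateTable("db.sample")
            .executeWith(pool)       // read source files in parallel
            .execute();
    } finally {
        pool.shutdown();
    }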
 
- 
execute
Description copied from interface: Action
Executes this action.
- Specified by:
- execute in interface Action<MigrateTable, MigrateTable.Result>
- Returns:
- the result of this action
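For example (a sketch with a hypothetical identifier):

    // the result reports how many data files were imported into the new table
    MigrateTable.Result result =
        SparkActions.get(spark).migrateTable("db.sample").execute();
    long importedFiles = result.migratedDataFilesCount();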
 
- 
destTableProps
- 
checkSourceCatalog
protected org.apache.spark.sql.connector.catalog.TableCatalog checkSourceCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
- 
sourceTableLocation
- 
v1SourceTable
protected org.apache.spark.sql.catalyst.catalog.CatalogTable v1SourceTable()
- 
sourceCatalog
protected org.apache.spark.sql.connector.catalog.TableCatalog sourceCatalog()
- 
sourceTableIdent
protected org.apache.spark.sql.connector.catalog.Identifier sourceTableIdent()
- 
setProperties
- 
setProperty
- 
additionalProperties
- 
checkDestinationCatalog
protected org.apache.spark.sql.connector.catalog.StagingTableCatalog checkDestinationCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
- 
stageDestTable
- 
ensureNameMappingPresent
- 
getMetadataLocation
- 
spark
protected org.apache.spark.sql.SparkSession spark()
- 
sparkContext
protected org.apache.spark.api.java.JavaSparkContext sparkContext()
- 
option
- 
options
- 
options
- 
withJobGroupInfo
- 
newJobGroupInfo
- 
newStaticTable
- 
contentFileDS
- 
contentFileDS
- 
manifestDS
- 
manifestDS
- 
manifestListDS
- 
manifestListDS
- 
statisticsFileDS
- 
otherMetadataFileDS
- 
allReachableOtherMetadataFileDS
- 
loadMetadataTable
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type)
- 
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files)
Deletes files and keeps track of how many files were removed for each file type.
- Parameters:
- executorService - an executor service to use for parallel deletes
- deleteFunc - a function invoked to delete each file by its path
- files - an iterator of Spark rows of the structure (path: String, type: String)
- Returns:
- stats on which files were deleted
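A hypothetical subclass might call it as sketched below; table and fileInfos are assumed to exist in the surrounding code:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
        // delete each (path, type) row and tally removals per file type
        BaseSparkAction.DeleteSummary summary =
            deleteFiles(pool, path -> table.io().deleteFile(path), fileInfos);
    } finally {
        pool.shutdown();
    }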
 
- 
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io, Iterator<FileInfo> files)
 