Package org.apache.iceberg.spark.actions
Class SnapshotTableSparkAction
java.lang.Object
org.apache.iceberg.spark.actions.SnapshotTableSparkAction
- All Implemented Interfaces:
Action<SnapshotTable,
,SnapshotTable.Result> SnapshotTable
Creates a new Iceberg table based on a source Spark table. The new Iceberg table will have a
different data and metadata directory allowing it to exist independently of the source table.
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.iceberg.actions.SnapshotTable
SnapshotTable.Result
-
Field Summary
Modifier and TypeFieldDescriptionprotected static final org.apache.iceberg.relocated.com.google.common.base.Joiner
protected static final org.apache.iceberg.relocated.com.google.common.base.Splitter
protected static final String
protected static final String
protected static final String
protected static final String
protected static final String
protected static final String
protected static final String
protected static final String
-
Method Summary
Modifier and TypeMethodDescriptionprotected org.apache.spark.sql.Dataset
<FileInfo> Sets the table identifier for the newly created Iceberg table.protected org.apache.spark.sql.connector.catalog.StagingTableCatalog
checkDestinationCatalog
(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog) protected org.apache.spark.sql.connector.catalog.TableCatalog
checkSourceCatalog
(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog) protected org.apache.spark.sql.Dataset
<FileInfo> contentFileDS
(Table table) protected org.apache.spark.sql.Dataset
<FileInfo> contentFileDS
(Table table, Set<Long> snapshotIds) protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary
deleteFiles
(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files) Deletes files and keeps track of how many files were removed for each file type.protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary
deleteFiles
(SupportsBulkOperations io, Iterator<FileInfo> files) protected org.apache.spark.sql.connector.catalog.StagingTableCatalog
protected org.apache.spark.sql.connector.catalog.Identifier
protected void
ensureNameMappingPresent
(Table table) execute()
Executes this action.executeWith
(ExecutorService service) Sets the executor service to use for parallel file reading.protected String
getMetadataLocation
(Table table) protected org.apache.spark.sql.Dataset
<org.apache.spark.sql.Row> loadMetadataTable
(Table table, MetadataTableType type) protected org.apache.spark.sql.Dataset
<FileInfo> manifestDS
(Table table) protected org.apache.spark.sql.Dataset
<FileInfo> manifestDS
(Table table, Set<Long> snapshotIds) protected org.apache.spark.sql.Dataset
<FileInfo> manifestListDS
(Table table) protected org.apache.spark.sql.Dataset
<FileInfo> manifestListDS
(Table table, Set<Long> snapshotIds) protected JobGroupInfo
newJobGroupInfo
(String groupId, String desc) protected Table
newStaticTable
(TableMetadata metadata, FileIO io) options()
protected org.apache.spark.sql.Dataset
<FileInfo> otherMetadataFileDS
(Table table) protected SnapshotTableSparkAction
self()
protected void
setProperties
(Map<String, String> properties) protected void
setProperty
(String key, String value) protected org.apache.spark.sql.connector.catalog.TableCatalog
protected org.apache.spark.sql.connector.catalog.Identifier
protected String
protected org.apache.spark.sql.SparkSession
spark()
protected org.apache.spark.api.java.JavaSparkContext
protected StagedSparkTable
protected org.apache.spark.sql.Dataset
<FileInfo> statisticsFileDS
(Table table, Set<Long> snapshotIds) tableLocation
(String location) Sets the table location for the newly created Iceberg table.tableProperties
(Map<String, String> properties) Sets table properties in the newly created Iceberg table.tableProperty
(String property, String value) Sets a table property in the newly created Iceberg table.protected org.apache.spark.sql.catalyst.catalog.CatalogTable
protected <T> T
withJobGroupInfo
(JobGroupInfo info, Supplier<T> supplier)
-
Field Details
-
LOCATION
- See Also:
-
ICEBERG_METADATA_FOLDER
- See Also:
-
EXCLUDED_PROPERTIES
-
MANIFEST
- See Also:
-
MANIFEST_LIST
- See Also:
-
STATISTICS_FILES
- See Also:
-
OTHERS
- See Also:
-
FILE_PATH
- See Also:
-
LAST_MODIFIED
- See Also:
-
COMMA_SPLITTER
protected static final org.apache.iceberg.relocated.com.google.common.base.Splitter COMMA_SPLITTER -
COMMA_JOINER
protected static final org.apache.iceberg.relocated.com.google.common.base.Joiner COMMA_JOINER
-
-
Method Details
-
self
-
destCatalog
protected org.apache.spark.sql.connector.catalog.StagingTableCatalog destCatalog() -
destTableIdent
protected org.apache.spark.sql.connector.catalog.Identifier destTableIdent() -
as
Description copied from interface:SnapshotTable
Sets the table identifier for the newly created Iceberg table.- Specified by:
as
in interfaceSnapshotTable
- Parameters:
ident
- the destination table identifier- Returns:
- this for method chaining
-
tableProperties
Description copied from interface:SnapshotTable
Sets table properties in the newly created Iceberg table. Any properties with the same key name will be overwritten.- Specified by:
tableProperties
in interfaceSnapshotTable
- Parameters:
properties
- a map of properties to be included- Returns:
- this for method chaining
-
tableProperty
Description copied from interface:SnapshotTable
Sets a table property in the newly created Iceberg table. Any properties with the same key name will be overwritten.- Specified by:
tableProperty
in interfaceSnapshotTable
- Parameters:
property
- the key of the property to addvalue
- the value of the property to add- Returns:
- this for method chaining
-
executeWith
Description copied from interface:SnapshotTable
Sets the executor service to use for parallel file reading. The default is not using executor service.- Specified by:
executeWith
in interfaceSnapshotTable
- Parameters:
service
- executor service- Returns:
- this for method chaining
-
execute
Description copied from interface:Action
Executes this action.- Specified by:
execute
in interfaceAction<SnapshotTable,
SnapshotTable.Result> - Returns:
- the result of this action
-
destTableProps
-
checkSourceCatalog
protected org.apache.spark.sql.connector.catalog.TableCatalog checkSourceCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog) -
tableLocation
Description copied from interface:SnapshotTable
Sets the table location for the newly created Iceberg table.- Specified by:
tableLocation
in interfaceSnapshotTable
- Parameters:
location
- a table location- Returns:
- this for method chaining
-
sourceTableLocation
-
v1SourceTable
protected org.apache.spark.sql.catalyst.catalog.CatalogTable v1SourceTable() -
sourceCatalog
protected org.apache.spark.sql.connector.catalog.TableCatalog sourceCatalog() -
sourceTableIdent
protected org.apache.spark.sql.connector.catalog.Identifier sourceTableIdent() -
setProperties
-
setProperty
-
additionalProperties
-
checkDestinationCatalog
protected org.apache.spark.sql.connector.catalog.StagingTableCatalog checkDestinationCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog) -
stageDestTable
-
ensureNameMappingPresent
-
getMetadataLocation
-
spark
protected org.apache.spark.sql.SparkSession spark() -
sparkContext
protected org.apache.spark.api.java.JavaSparkContext sparkContext() -
option
-
options
-
options
-
withJobGroupInfo
-
newJobGroupInfo
-
newStaticTable
-
contentFileDS
-
contentFileDS
-
manifestDS
-
manifestDS
-
manifestListDS
-
manifestListDS
-
statisticsFileDS
-
otherMetadataFileDS
-
allReachableOtherMetadataFileDS
-
loadMetadataTable
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type) -
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files) Deletes files and keeps track of how many files were removed for each file type.- Parameters:
executorService
- an executor service to use for parallel deletesdeleteFunc
- a delete funcfiles
- an iterator of Spark rows of the structure (path: String, type: String)- Returns:
- stats on which files were deleted
-
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io, Iterator<FileInfo> files)
-