Package org.apache.iceberg.spark.actions
Class SnapshotTableSparkAction
java.lang.Object
org.apache.iceberg.spark.actions.SnapshotTableSparkAction
- All Implemented Interfaces:
- Action<SnapshotTable,,- SnapshotTable.Result> - SnapshotTable
Creates a new Iceberg table based on a source Spark table. The new Iceberg table will have a
 different data and metadata directory allowing it to exist independently of the source table.
- 
Nested Class SummaryNested classes/interfaces inherited from interface org.apache.iceberg.actions.SnapshotTableSnapshotTable.Result
- 
Field SummaryFieldsModifier and TypeFieldDescriptionprotected static final org.apache.iceberg.relocated.com.google.common.base.Joinerprotected static final org.apache.iceberg.relocated.com.google.common.base.Splitterprotected static final Stringprotected static final Stringprotected static final Stringprotected static final Stringprotected static final Stringprotected static final Stringprotected static final Stringprotected static final String
- 
Method SummaryModifier and TypeMethodDescriptionprotected org.apache.spark.sql.Dataset<FileInfo> Sets the table identifier for the newly created Iceberg table.protected org.apache.spark.sql.connector.catalog.StagingTableCatalogcheckDestinationCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog) protected org.apache.spark.sql.connector.catalog.TableCatalogcheckSourceCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog) protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table) protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table, Set<Long> snapshotIds) protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummarydeleteFiles(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files) Deletes files and keeps track of how many files were removed for each file type.protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummarydeleteFiles(SupportsBulkOperations io, Iterator<FileInfo> files) protected org.apache.spark.sql.connector.catalog.StagingTableCatalogprotected org.apache.spark.sql.connector.catalog.Identifierprotected voidensureNameMappingPresent(Table table) execute()Executes this action.executeWith(ExecutorService service) Sets the executor service to use for parallel file reading.protected StringgetMetadataLocation(Table table) protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type) protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table) protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table, Set<Long> snapshotIds) protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table) protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table, Set<Long> snapshotIds) protected JobGroupInfonewJobGroupInfo(String groupId, String desc) protected TablenewStaticTable(String metadataFileLocation, FileIO io) protected TablenewStaticTable(TableMetadata metadata, FileIO io) options()protected org.apache.spark.sql.Dataset<FileInfo> otherMetadataFileDS(Table table) protected SnapshotTableSparkActionself()protected voidsetProperties(Map<String, String> properties) protected voidsetProperty(String key, String value) protected org.apache.spark.sql.connector.catalog.TableCatalogprotected org.apache.spark.sql.connector.catalog.Identifierprotected Stringprotected org.apache.spark.sql.SparkSessionspark()protected org.apache.spark.api.java.JavaSparkContextprotected StagedSparkTableprotected org.apache.spark.sql.Dataset<FileInfo> statisticsFileDS(Table table, Set<Long> snapshotIds) tableLocation(String location) Sets the table location for the newly created Iceberg table.tableProperties(Map<String, String> properties) Sets table properties in the newly created Iceberg table.tableProperty(String property, String value) Sets a table property in the newly created Iceberg table.protected org.apache.spark.sql.catalyst.catalog.CatalogTableprotected <T> TwithJobGroupInfo(JobGroupInfo info, Supplier<T> supplier) 
- 
Field Details- 
LOCATION- See Also:
 
- 
ICEBERG_METADATA_FOLDER- See Also:
 
- 
EXCLUDED_PROPERTIES
- 
MANIFEST- See Also:
 
- 
MANIFEST_LIST- See Also:
 
- 
STATISTICS_FILES- See Also:
 
- 
OTHERS- See Also:
 
- 
FILE_PATH- See Also:
 
- 
LAST_MODIFIED- See Also:
 
- 
COMMA_SPLITTERprotected static final org.apache.iceberg.relocated.com.google.common.base.Splitter COMMA_SPLITTER
- 
COMMA_JOINERprotected static final org.apache.iceberg.relocated.com.google.common.base.Joiner COMMA_JOINER
 
- 
- 
Method Details- 
self
- 
destCatalogprotected org.apache.spark.sql.connector.catalog.StagingTableCatalog destCatalog()
- 
destTableIdentprotected org.apache.spark.sql.connector.catalog.Identifier destTableIdent()
- 
asDescription copied from interface:SnapshotTableSets the table identifier for the newly created Iceberg table.- Specified by:
- asin interface- SnapshotTable
- Parameters:
- ident- the destination table identifier
- Returns:
- this for method chaining
 
- 
tablePropertiesDescription copied from interface:SnapshotTableSets table properties in the newly created Iceberg table. Any properties with the same key name will be overwritten.- Specified by:
- tablePropertiesin interface- SnapshotTable
- Parameters:
- properties- a map of properties to be included
- Returns:
- this for method chaining
 
- 
tablePropertyDescription copied from interface:SnapshotTableSets a table property in the newly created Iceberg table. Any properties with the same key name will be overwritten.- Specified by:
- tablePropertyin interface- SnapshotTable
- Parameters:
- property- the key of the property to add
- value- the value of the property to add
- Returns:
- this for method chaining
 
- 
executeWithDescription copied from interface:SnapshotTableSets the executor service to use for parallel file reading. The default is not using executor service.- Specified by:
- executeWithin interface- SnapshotTable
- Parameters:
- service- executor service
- Returns:
- this for method chaining
 
- 
executeDescription copied from interface:ActionExecutes this action.- Specified by:
- executein interface- Action<SnapshotTable,- SnapshotTable.Result> 
- Returns:
- the result of this action
 
- 
destTableProps
- 
checkSourceCatalogprotected org.apache.spark.sql.connector.catalog.TableCatalog checkSourceCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog) 
- 
tableLocationDescription copied from interface:SnapshotTableSets the table location for the newly created Iceberg table.- Specified by:
- tableLocationin interface- SnapshotTable
- Parameters:
- location- a table location
- Returns:
- this for method chaining
 
- 
sourceTableLocation
- 
v1SourceTableprotected org.apache.spark.sql.catalyst.catalog.CatalogTable v1SourceTable()
- 
sourceCatalogprotected org.apache.spark.sql.connector.catalog.TableCatalog sourceCatalog()
- 
sourceTableIdentprotected org.apache.spark.sql.connector.catalog.Identifier sourceTableIdent()
- 
setProperties
- 
setProperty
- 
additionalProperties
- 
checkDestinationCatalogprotected org.apache.spark.sql.connector.catalog.StagingTableCatalog checkDestinationCatalog(org.apache.spark.sql.connector.catalog.CatalogPlugin catalog) 
- 
stageDestTable
- 
ensureNameMappingPresent
- 
getMetadataLocation
- 
sparkprotected org.apache.spark.sql.SparkSession spark()
- 
sparkContextprotected org.apache.spark.api.java.JavaSparkContext sparkContext()
- 
option
- 
options
- 
options
- 
withJobGroupInfo
- 
newJobGroupInfo
- 
newStaticTable
- 
newStaticTable
- 
contentFileDS
- 
contentFileDS
- 
manifestDS
- 
manifestDS
- 
manifestListDS
- 
manifestListDS
- 
statisticsFileDS
- 
otherMetadataFileDS
- 
allReachableOtherMetadataFileDS
- 
loadMetadataTableprotected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type) 
- 
deleteFilesprotected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files) Deletes files and keeps track of how many files were removed for each file type.- Parameters:
- executorService- an executor service to use for parallel deletes
- deleteFunc- a delete func
- files- an iterator of Spark rows of the structure (path: String, type: String)
- Returns:
- stats on which files were deleted
 
- 
deleteFilesprotected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io, Iterator<FileInfo> files) 
 
-