Package org.apache.iceberg.spark.actions
Class RewriteTablePathSparkAction
java.lang.Object
org.apache.iceberg.spark.actions.RewriteTablePathSparkAction
- All Implemented Interfaces:
Action<RewriteTablePath, RewriteTablePath.Result>, RewriteTablePath
-
Nested Class Summary
Nested Classes
static class
Nested classes/interfaces inherited from interface org.apache.iceberg.actions.RewriteTablePath:
RewriteTablePath.Result
-
Field Summary
Fields
protected static final org.apache.iceberg.relocated.com.google.common.base.Joiner COMMA_JOINER
protected static final org.apache.iceberg.relocated.com.google.common.base.Splitter COMMA_SPLITTER
protected static final String FILE_PATH
protected static final String LAST_MODIFIED
protected static final String MANIFEST
protected static final String MANIFEST_LIST
protected static final String OTHERS
protected static final String STATISTICS_FILES
-
Method Summary
protected org.apache.spark.sql.Dataset<FileInfo> allReachableOtherMetadataFileDS(Table table)
protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table)
protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table, Set<Long> snapshotIds)
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files)
Deletes files and keeps track of how many files were removed for each file type.
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io, Iterator<FileInfo> files)
RewriteTablePath endVersion(String eVersion)
Last metadata version to rewrite, identified by the name of a metadata.json file in the table's metadata log.
RewriteTablePath.Result execute()
Executes this action.
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type)
protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table)
protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table, Set<Long> snapshotIds)
protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table)
protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table, Set<Long> snapshotIds)
protected JobGroupInfo newJobGroupInfo(String groupId, String desc)
protected Table newStaticTable(String metadataFileLocation, FileIO io)
protected Table newStaticTable(TableMetadata metadata, FileIO io)
options()
protected org.apache.spark.sql.Dataset<FileInfo> otherMetadataFileDS(Table table)
RewriteTablePath rewriteLocationPrefix(String sPrefix, String tPrefix)
Configure a source prefix that will be replaced by the specified target prefix in all paths.
protected RewriteTablePath self()
protected org.apache.spark.sql.SparkSession spark()
protected org.apache.spark.api.java.JavaSparkContext sparkContext()
RewriteTablePath stagingLocation(String stagingLocation)
Custom staging location.
RewriteTablePath startVersion(String sVersion)
First metadata version to rewrite, identified by the name of a metadata.json file in the table's metadata log.
protected org.apache.spark.sql.Dataset<FileInfo> statisticsFileDS(Table table, Set<Long> snapshotIds)
protected <T> T withJobGroupInfo(JobGroupInfo info, Supplier<T> supplier)
-
Field Details
-
MANIFEST
protected static final String MANIFEST
-
MANIFEST_LIST
protected static final String MANIFEST_LIST
-
STATISTICS_FILES
protected static final String STATISTICS_FILES
-
OTHERS
protected static final String OTHERS
-
FILE_PATH
protected static final String FILE_PATH
-
LAST_MODIFIED
protected static final String LAST_MODIFIED
-
COMMA_SPLITTER
protected static final org.apache.iceberg.relocated.com.google.common.base.Splitter COMMA_SPLITTER
-
COMMA_JOINER
protected static final org.apache.iceberg.relocated.com.google.common.base.Joiner COMMA_JOINER
-
-
Method Details
-
self
-
rewriteLocationPrefix
Description copied from interface: RewriteTablePath
Configure a source prefix that will be replaced by the specified target prefix in all paths.
- Specified by:
rewriteLocationPrefix in interface RewriteTablePath
- Parameters:
sPrefix - the source prefix to be replaced
tPrefix - the target prefix
- Returns:
- this for method chaining
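For illustration, a minimal sketch of configuring the prefix rewrite; the "action" variable is assumed to be a RewriteTablePath obtained elsewhere (for example from the SparkActions factory), and both bucket paths are placeholders:

// Rewrite every absolute path starting with the source prefix so that it
// starts with the target prefix instead (placeholder locations).
action.rewriteLocationPrefix(
    "s3://source-bucket/warehouse/db/tbl",    // sPrefix: prefix to be replaced
    "s3://target-bucket/warehouse/db/tbl");   // tPrefix: replacement prefix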
-
startVersion
Description copied from interface: RewriteTablePath
First metadata version to rewrite, identified by the name of a metadata.json file in the table's metadata log. It is optional; if provided, this action will only rewrite metadata files added after this version.
- Specified by:
startVersion in interface RewriteTablePath
- Parameters:
sVersion - name of a metadata.json file. For example, "00001-8893aa9e-f92e-4443-80e7-cfa42238a654.metadata.json".
- Returns:
- this for method chaining
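A small sketch of limiting the rewrite to recent metadata, reusing the example file name above; the "action" variable is again an assumed RewriteTablePath instance:

// Only metadata files added after this version will be rewritten.
action.startVersion("00001-8893aa9e-f92e-4443-80e7-cfa42238a654.metadata.json");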
-
endVersion
Description copied from interface: RewriteTablePath
Last metadata version to rewrite, identified by the name of a metadata.json file in the table's metadata log. It is optional; if provided, this action will only rewrite metadata files added before this file, including the file itself.
- Specified by:
endVersion in interface RewriteTablePath
- Parameters:
eVersion - name of a metadata.json file. For example, "00001-8893aa9e-f92e-4443-80e7-cfa42238a654.metadata.json".
- Returns:
- this for method chaining
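A corresponding sketch for bounding the rewrite from the other end; the file name is purely illustrative and the "action" variable is an assumed RewriteTablePath instance:

// Metadata files added up to and including this version will be rewritten.
action.endVersion("00002-1f8b2c3d-aaaa-bbbb-cccc-444455556666.metadata.json");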
-
stagingLocation
Description copied from interface: RewriteTablePath
Custom staging location. It is optional. By default, the staging location is a subdirectory under the table's metadata directory.
- Specified by:
stagingLocation in interface RewriteTablePath
- Parameters:
stagingLocation - the staging location
- Returns:
- this for method chaining
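A brief sketch of overriding the default staging directory; the path is a placeholder and "action" is an assumed RewriteTablePath instance:

// Write rewritten metadata to an explicit staging directory instead of the
// default subdirectory under the table's metadata directory.
action.stagingLocation("s3://target-bucket/staging/db/tbl/");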
-
execute
Description copied from interface: Action
Executes this action.
- Specified by:
execute in interface Action<RewriteTablePath, RewriteTablePath.Result>
- Returns:
- the result of this action
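Putting the configuration methods together, a hedged end-to-end sketch; the SparkActions.get().rewriteTablePath(Table) entry point, the loadTable() helper, and the Result accessors latestVersion() and fileListLocation() are assumptions based on the RewriteTablePath interface rather than details stated on this page:

import org.apache.iceberg.Table;
import org.apache.iceberg.actions.RewriteTablePath;
import org.apache.iceberg.spark.actions.SparkActions;

public class RewriteTablePathExample {
  public static void main(String[] args) {
    Table table = loadTable(); // hypothetical helper: load the table from your catalog

    // Configure and run the path rewrite in one fluent chain (placeholder paths).
    RewriteTablePath.Result result =
        SparkActions.get()
            .rewriteTablePath(table)
            .rewriteLocationPrefix("s3://source-bucket/db/tbl", "s3://target-bucket/db/tbl")
            .stagingLocation("s3://target-bucket/staging/db/tbl/")
            .execute();

    // Inspect the result (accessor names assumed from the RewriteTablePath.Result interface).
    System.out.println("latest rewritten version: " + result.latestVersion());
    System.out.println("copy file list location: " + result.fileListLocation());
  }

  private static Table loadTable() {
    throw new UnsupportedOperationException("replace with a catalog lookup for the source table");
  }
}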
-
spark
protected org.apache.spark.sql.SparkSession spark()
-
sparkContext
protected org.apache.spark.api.java.JavaSparkContext sparkContext()
-
option
-
options
-
options
-
withJobGroupInfo
-
newJobGroupInfo
-
newStaticTable
-
newStaticTable
-
contentFileDS
-
contentFileDS
-
manifestDS
-
manifestDS
-
manifestListDS
-
manifestListDS
-
statisticsFileDS
-
otherMetadataFileDS
-
allReachableOtherMetadataFileDS
-
loadMetadataTable
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type)
-
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files)
Deletes files and keeps track of how many files were removed for each file type.
- Parameters:
executorService - an executor service to use for parallel deletes
deleteFunc - a delete function
files - an iterator of Spark rows of the structure (path: String, type: String)
- Returns:
- stats on which files were deleted
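Because deleteFiles is protected, it can only be called from an action subclass; the following hedged sketch shows how the three arguments are typically assembled (the pool size, the table handle, and the fileInfos iterator are illustrative, and ExecutorService/Executors come from java.util.concurrent):

// Inside an action subclass: delete the listed files in parallel and capture
// per-file-type counts from the returned DeleteSummary.
ExecutorService deleteService = Executors.newFixedThreadPool(8);
try {
  var summary = deleteFiles(
      deleteService,
      path -> table.io().deleteFile(path),  // delete function invoked for each file path
      fileInfos);                           // Iterator<FileInfo> of (path, type) rows
} finally {
  deleteService.shutdown();
}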
-
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io, Iterator<FileInfo> files)
-