Package org.apache.iceberg.actions
Interface FileRewriter<T extends ContentScanTask<F>,F extends ContentFile<F>>
- Type Parameters:
T
- the Java type of tasks to read content files
F
- the Java type of content files
- All Known Implementing Classes:
SizeBasedDataRewriter, SizeBasedFileRewriter, SizeBasedPositionDeletesRewriter
public interface FileRewriter<T extends ContentScanTask<F>,F extends ContentFile<F>>
An interface for rewriting content files.
The entire rewrite operation is broken down into pieces, first by partition and then into size-based groups within each partition. These subunits of the rewrite are referred to as file groups. Each file group is processed by a single framework "action". For example, in Spark this means that each group is rewritten in its own Spark job.
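The breakdown described above can be sketched as a small driver loop. This is only an illustration under stated assumptions: `SketchRewriter`, `SketchTask`, and `RewriteDriverSketch` are hypothetical stand-ins that use no Iceberg types, and the loop runs sequentially, whereas a real engine would submit each file group as its own unit of work.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for FileRewriter (no Iceberg types).
interface SketchRewriter<T, F> {
  Iterable<List<T>> planFileGroups(Iterable<T> tasks);

  Set<F> rewrite(List<T> group);
}

// Hypothetical scan task: only the partition label and file name are modeled.
final class SketchTask {
  final String partition;
  final String file;

  SketchTask(String partition, String file) {
    this.partition = partition;
    this.file = file;
  }
}

final class RewriteDriverSketch {
  // Splits tasks by partition, plans file groups per partition,
  // then rewrites each group and collects the newly written files.
  static <F> Set<F> run(SketchRewriter<SketchTask, F> rewriter, List<SketchTask> tasks) {
    Map<String, List<SketchTask>> byPartition = new LinkedHashMap<>();
    for (SketchTask task : tasks) {
      byPartition.computeIfAbsent(task.partition, key -> new ArrayList<>()).add(task);
    }

    Set<F> newFiles = new LinkedHashSet<>();
    for (List<SketchTask> partitionTasks : byPartition.values()) {
      for (List<SketchTask> group : rewriter.planFileGroups(partitionTasks)) {
        // In Spark, this call would correspond to one job per file group.
        newFiles.addAll(rewriter.rewrite(group));
      }
    }
    return newFiles;
  }
}
```

The point of the sketch is the nesting: partitions on the outside, file groups inside each partition, and one `rewrite` call per group.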
Method Summary
Modifier and Type | Method | Description
default java.lang.String | description() | Returns a description for this rewriter.
void | init(java.util.Map<java.lang.String,java.lang.String> options) | Initializes this rewriter using provided options.
java.lang.Iterable<java.util.List<T>> | planFileGroups(java.lang.Iterable<T> tasks) | Selects files which this rewriter believes are valid targets to be rewritten based on their scan tasks and groups those scan tasks into file groups.
java.util.Set<F> | rewrite(java.util.List<T> group) | Rewrite a group of files represented by the given list of scan tasks.
java.util.Set<java.lang.String> | validOptions() | Returns a set of supported options for this rewriter.
Method Detail
-
description
default java.lang.String description()
Returns a description for this rewriter.
-
validOptions
java.util.Set<java.lang.String> validOptions()
Returns a set of supported options for this rewriter. Only options contained in this set are accepted at runtime; any other option is rejected.
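A caller could enforce this contract with a check along the following lines. This is a minimal sketch, not the library's own validation code: `OptionValidationSketch` and its `validate` method are hypothetical names, and which component performs the check in practice is not stated here.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; not part of the Iceberg API.
final class OptionValidationSketch {
  // Throws if the caller supplied any option key outside the supported set.
  static void validate(Set<String> validOptions, Map<String, String> options) {
    Set<String> invalid = new TreeSet<>(options.keySet());
    invalid.removeAll(validOptions);
    if (!invalid.isEmpty()) {
      throw new IllegalArgumentException(
          "Unsupported options " + invalid + "; valid options are " + new TreeSet<>(validOptions));
    }
  }
}
```

Collecting all offending keys before throwing lets the error message list every unsupported option at once instead of only the first one encountered.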
-
init
void init(java.util.Map<java.lang.String,java.lang.String> options)
Initializes this rewriter using provided options.
- Parameters:
options
- options to initialize this rewriter
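Because options arrive as a string-to-string map, an implementation typically has to parse and check them during initialization. The sketch below shows one way this might look; the class `InitSketch`, the option key, and the default value are all assumptions made for illustration and should not be read as the behavior of any actual implementing class.

```java
import java.util.Map;

// Hypothetical rewriter fragment; key name and default are illustrative assumptions.
final class InitSketch {
  static final String TARGET_SIZE_KEY = "target-file-size-bytes";
  static final long TARGET_SIZE_DEFAULT = 512L * 1024 * 1024;

  private long targetSizeBytes;

  // Reads the option if present, falls back to the default, and rejects non-positive values.
  void init(Map<String, String> options) {
    String raw = options.get(TARGET_SIZE_KEY);
    long value = raw == null ? TARGET_SIZE_DEFAULT : Long.parseLong(raw);
    if (value <= 0) {
      throw new IllegalArgumentException(TARGET_SIZE_KEY + " must be positive, got " + value);
    }
    this.targetSizeBytes = value;
  }

  long targetSizeBytes() {
    return targetSizeBytes;
  }
}
```

Failing fast in `init` keeps bad configuration from surfacing later, in the middle of planning or rewriting.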
-
planFileGroups
java.lang.Iterable<java.util.List<T>> planFileGroups(java.lang.Iterable<T> tasks)
Selects files which this rewriter believes are valid targets to be rewritten based on their scan tasks and groups those scan tasks into file groups. Each file group is then rewritten as a single executable unit, such as a Spark job.
- Parameters:
tasks
- an iterable of scan tasks for files in a partition
- Returns:
- groups of scan tasks for files to be rewritten in a single executable unit
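To make the select-then-group idea concrete, here is a minimal sketch of one possible size-based strategy. It is an assumption-laden illustration, not the algorithm used by the implementing classes: `SizeTask`, `PlanSketch`, and the three thresholds (`minSize`, `maxSize`, `maxGroupSize`) are hypothetical, and the grouping is a simple greedy pass in input order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scan task that only carries a file name and its length in bytes.
final class SizeTask {
  final String file;
  final long length;

  SizeTask(String file, long length) {
    this.file = file;
    this.length = length;
  }
}

final class PlanSketch {
  // Selects files outside [minSize, maxSize], then packs them greedily, in input order,
  // into groups; a group is closed when adding the next file would exceed maxGroupSize.
  static List<List<SizeTask>> planFileGroups(
      Iterable<SizeTask> tasks, long minSize, long maxSize, long maxGroupSize) {
    List<List<SizeTask>> groups = new ArrayList<>();
    List<SizeTask> current = new ArrayList<>();
    long currentSize = 0;

    for (SizeTask task : tasks) {
      boolean needsRewrite = task.length < minSize || task.length > maxSize;
      if (!needsRewrite) {
        continue; // file is already an acceptable size, leave it alone
      }
      if (!current.isEmpty() && currentSize + task.length > maxGroupSize) {
        groups.add(current);
        current = new ArrayList<>();
        currentSize = 0;
      }
      current.add(task);
      currentSize += task.length;
    }
    if (!current.isEmpty()) {
      groups.add(current);
    }
    return groups;
  }
}
```

Note that a single file larger than `maxGroupSize` still ends up in a group of its own, so every selected file is assigned to exactly one group.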
-
rewrite
java.util.Set<F> rewrite(java.util.List<T> group)
Rewrite a group of files represented by the given list of scan tasks.
The implementation is expected to be engine-specific (e.g. Spark, Flink, Trino).
- Parameters:
group
- a group of scan tasks for files to be rewritten together
- Returns:
- a set of newly written files