T
- the Java type of tasks to read content files
F
- the Java type of content files
public interface FileRewriter<T extends ContentScanTask<F>, F extends ContentFile<F>>
The entire rewrite operation is broken down first by partition, and then into size-based groups within each partition. These subunits of the rewrite are referred to as file groups. Each file group is processed by a single framework "action". For example, in Spark this means that each group would be rewritten in its own Spark job.
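The partition-then-size split described above can be sketched as follows. This is only an illustration under stated assumptions: `SketchTask`, `FileGroupSketch`, `planGroups`, and the `maxGroupBytes` threshold are hypothetical names that are not part of this API, and the greedy packing shown here is one simple strategy, not necessarily the one a real rewriter uses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a scan task: only a partition key and a file size.
final class SketchTask {
  final String partition;
  final long sizeBytes;

  SketchTask(String partition, long sizeBytes) {
    this.partition = partition;
    this.sizeBytes = sizeBytes;
  }
}

final class FileGroupSketch {
  private FileGroupSketch() {}

  // Splits tasks by partition, then greedily packs each partition's tasks into
  // groups whose total size stays at or below maxGroupBytes. A single task that
  // is larger than maxGroupBytes still ends up in a group of its own.
  static List<List<SketchTask>> planGroups(List<SketchTask> tasks, long maxGroupBytes) {
    Map<String, List<SketchTask>> byPartition = new LinkedHashMap<>();
    for (SketchTask task : tasks) {
      byPartition.computeIfAbsent(task.partition, k -> new ArrayList<>()).add(task);
    }

    List<List<SketchTask>> groups = new ArrayList<>();
    for (List<SketchTask> partitionTasks : byPartition.values()) {
      List<SketchTask> current = new ArrayList<>();
      long currentBytes = 0L;
      for (SketchTask task : partitionTasks) {
        // Close the current group before it would exceed the size limit.
        if (!current.isEmpty() && currentBytes + task.sizeBytes > maxGroupBytes) {
          groups.add(current);
          current = new ArrayList<>();
          currentBytes = 0L;
        }
        current.add(task);
        currentBytes += task.sizeBytes;
      }
      if (!current.isEmpty()) {
        groups.add(current);
      }
    }
    return groups;
  }
}
```

In this sketch a group never spans two partitions, so each returned list could be handed to its own job.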
Modifier and Type | Method and Description |
---|---|
default java.lang.String | description(): Returns a description for this rewriter. |
void | init(java.util.Map<java.lang.String,java.lang.String> options): Initializes this rewriter using provided options. |
java.lang.Iterable<java.util.List<T>> | planFileGroups(java.lang.Iterable<T> tasks): Selects files which this rewriter believes are valid targets to be rewritten based on their scan tasks and groups those scan tasks into file groups. |
java.util.Set<F> | rewrite(java.util.List<T> group): Rewrite a group of files represented by the given list of scan tasks. |
java.util.Set<java.lang.String> | validOptions(): Returns a set of supported options for this rewriter. |
default java.lang.String description()
java.util.Set<java.lang.String> validOptions()
void init(java.util.Map<java.lang.String,java.lang.String> options)
options
- options to initialize this rewriter
java.lang.Iterable<java.util.List<T>> planFileGroups(java.lang.Iterable<T> tasks)
tasks
- an iterable of scan tasks for files in a partition
java.util.Set<F> rewrite(java.util.List<T> group)
The implementation is supposed to be engine-specific (e.g. Spark, Flink, Trino).
group
- a group of scan tasks for files to be rewritten together
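
One way the methods above might be called in sequence is sketched below. To keep the example compilable without Iceberg on the classpath, it uses a simplified stand-in interface, `SketchRewriter`, that mirrors the five methods but drops the `ContentScanTask` and `ContentFile` bounds. `ToyRewriter`, `RewriteDriverSketch`, the `group-size` option, and the choice to validate options in the driver are all assumptions made for illustration, not the behavior of any real engine integration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified stand-in mirroring the methods documented above (no Iceberg types).
interface SketchRewriter<T, F> {
  // Assumption: the default description is just the class name.
  default String description() { return getClass().getName(); }
  Set<String> validOptions();
  void init(Map<String, String> options);
  Iterable<List<T>> planFileGroups(Iterable<T> tasks);
  Set<F> rewrite(List<T> group);
}

// Toy implementation: tasks and output files are plain strings.
final class ToyRewriter implements SketchRewriter<String, String> {
  static final String GROUP_SIZE = "group-size"; // hypothetical option name
  private int groupSize = 2;

  @Override
  public Set<String> validOptions() { return Collections.singleton(GROUP_SIZE); }

  @Override
  public void init(Map<String, String> options) {
    groupSize = Integer.parseInt(options.getOrDefault(GROUP_SIZE, "2"));
  }

  @Override
  public Iterable<List<String>> planFileGroups(Iterable<String> tasks) {
    // Cut the incoming tasks into consecutive groups of groupSize elements.
    List<List<String>> groups = new ArrayList<>();
    List<String> current = new ArrayList<>();
    for (String task : tasks) {
      current.add(task);
      if (current.size() == groupSize) {
        groups.add(current);
        current = new ArrayList<>();
      }
    }
    if (!current.isEmpty()) {
      groups.add(current);
    }
    return groups;
  }

  @Override
  public Set<String> rewrite(List<String> group) {
    // Pretend each group is rewritten into exactly one new file.
    return Collections.singleton("merged(" + String.join("+", group) + ")");
  }
}

final class RewriteDriverSketch {
  private RewriteDriverSketch() {}

  // Validates options, initializes the rewriter, plans groups, rewrites each group.
  static <T, F> Set<F> run(SketchRewriter<T, F> rewriter, Map<String, String> options, Iterable<T> tasks) {
    Set<String> unknown = new HashSet<>(options.keySet());
    unknown.removeAll(rewriter.validOptions());
    if (!unknown.isEmpty()) {
      throw new IllegalArgumentException("Unsupported options for " + rewriter.description() + ": " + unknown);
    }
    rewriter.init(options);
    Set<F> newFiles = new LinkedHashSet<>();
    for (List<T> group : rewriter.planFileGroups(tasks)) {
      newFiles.addAll(rewriter.rewrite(group)); // in an engine, each call could be its own job
    }
    return newFiles;
  }
}
```

The driver loops over groups sequentially for simplicity; an engine-specific implementation could just as well submit each group concurrently.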