Package org.apache.iceberg.actions
Class BaseRewriteDataFilesAction<ThisT>
- java.lang.Object
-
- org.apache.iceberg.actions.BaseRewriteDataFilesAction<ThisT>
-
- All Implemented Interfaces:
Action<ThisT,RewriteDataFilesActionResult>, SnapshotUpdateAction<ThisT,RewriteDataFilesActionResult>
- Direct Known Subclasses:
RewriteDataFilesAction
public abstract class BaseRewriteDataFilesAction<ThisT> extends java.lang.Object
-
-
Constructor Summary
Modifier    Constructor    Description
protected
BaseRewriteDataFilesAction(Table table)
-
Method Summary
Modifier and Type    Method    Description
protected boolean
caseSensitive()
BaseRewriteDataFilesAction<ThisT>
caseSensitive(boolean newCaseSensitive)
Sets whether the action treats column names as case sensitive.
protected void
commit(SnapshotUpdate<?> update)
protected EncryptionManager
encryptionManager()
RewriteDataFilesActionResult
execute()
Executes this action.
protected abstract FileIO
fileIO()
BaseRewriteDataFilesAction<ThisT>
filter(Expression expr)
Pass a row Expression to filter DataFiles to be rewritten.
protected java.lang.String
metadataTableName(java.lang.String tableName, MetadataTableType type)
protected java.lang.String
metadataTableName(MetadataTableType type)
BaseRewriteDataFilesAction<ThisT>
outputSpecId(int specId)
Pass a PartitionSpec id to specify which PartitionSpec should be used when rewriting DataFiles.
protected abstract java.util.List<DataFile>
rewriteDataForTasks(java.util.List<CombinedScanTask> combinedScanTask)
protected abstract ThisT
self()
ThisT
set(java.lang.String property, java.lang.String value)
protected PartitionSpec
spec()
BaseRewriteDataFilesAction<ThisT>
splitLookback(int lookback)
Specify the number of "bins" considered when trying to pack the next file split into a task.
BaseRewriteDataFilesAction<ThisT>
splitOpenFileCost(long openFileCost)
Specify the minimum file size to count when packing files into one "bin".
protected Table
table()
BaseRewriteDataFilesAction<ThisT>
targetSizeInBytes(long targetSize)
Specify the target size, in bytes, of the rewritten data files.
BaseRewriteDataFilesAction<ThisT>
useStartingSequenceNumber(boolean useStarting)
Whether the compaction should use the sequence number of the snapshot at compaction start time for new data files, instead of the sequence number of the newly produced snapshot.
-
-
-
Constructor Detail
-
BaseRewriteDataFilesAction
protected BaseRewriteDataFilesAction(Table table)
-
-
Method Detail
-
table
protected Table table()
-
spec
protected PartitionSpec spec()
-
encryptionManager
protected EncryptionManager encryptionManager()
-
caseSensitive
protected boolean caseSensitive()
-
caseSensitive
public BaseRewriteDataFilesAction<ThisT> caseSensitive(boolean newCaseSensitive)
Sets whether the action treats column names as case sensitive.
- Parameters:
newCaseSensitive - true to match column names case-sensitively
- Returns:
this for method chaining
-
outputSpecId
public BaseRewriteDataFilesAction<ThisT> outputSpecId(int specId)
Pass a PartitionSpec id to specify which PartitionSpec should be used when rewriting DataFiles.
- Parameters:
specId - id of the PartitionSpec to use for the rewritten files
- Returns:
this for method chaining
-
targetSizeInBytes
public BaseRewriteDataFilesAction<ThisT> targetSizeInBytes(long targetSize)
Specify the target size, in bytes, of the rewritten data files.
- Parameters:
targetSize - target size in bytes of each rewritten data file
- Returns:
this for method chaining
-
splitLookback
public BaseRewriteDataFilesAction<ThisT> splitLookback(int lookback)
Specify the number of "bins" considered when trying to pack the next file split into a task. Increasing this usually makes tasks a bit more even by considering more ways to pack file regions into a single task, at the cost of extra planning time.
This configuration can reorder the incoming file regions. To preserve order for lower/upper bounds in file metadata, use a lookback of 1.
- Parameters:
lookback - number of "bins" considered when trying to pack the next file split into a task
- Returns:
this for method chaining
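The effect of the lookback can be seen in a small first-fit packer. The following is a minimal sketch, not Iceberg's own planner: the class LookbackPackingSketch, its pack method, and the sample weights are hypothetical, and lookback is assumed to be at least 1. It keeps at most `lookback` bins open and emits the oldest bin whenever a new one has to be opened.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: a simplified first-fit packer, not Iceberg's planner.
final class LookbackPackingSketch {

  /** Packs weights in arrival order while keeping at most {@code lookback} bins open. */
  static List<List<Long>> pack(List<Long> weights, long targetWeight, int lookback) {
    List<List<Long>> closed = new ArrayList<>();
    Deque<List<Long>> open = new ArrayDeque<>();
    for (long weight : weights) {
      List<Long> fit = null;
      // Only bins that are still open are candidates for the next item.
      for (List<Long> bin : open) {
        if (total(bin) + weight <= targetWeight) {
          fit = bin;
          break;
        }
      }
      if (fit == null) {
        fit = new ArrayList<>();
        open.addLast(fit);
        // Too many open bins: emit the oldest one as a finished task.
        if (open.size() > lookback) {
          closed.add(open.removeFirst());
        }
      }
      fit.add(weight);
    }
    closed.addAll(open);
    return closed;
  }

  private static long total(List<Long> bin) {
    long sum = 0;
    for (long w : bin) {
      sum += w;
    }
    return sum;
  }

  public static void main(String[] args) {
    List<Long> weights = List.of(6L, 5L, 4L, 5L);
    System.out.println(pack(weights, 10L, 1)); // [[6], [5, 4], [5]]
    System.out.println(pack(weights, 10L, 2)); // [[6, 4], [5, 5]]
  }
}
```

With a lookback of 1 the items stay in arrival order but need three bins. With a lookback of 2 the third item is placed into the first bin, which gives two evenly filled bins at the price of reordering.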
-
splitOpenFileCost
public BaseRewriteDataFilesAction<ThisT> splitOpenFileCost(long openFileCost)
Specify the minimum file size to count when packing files into one "bin". If a file being read is smaller than this threshold, Iceberg counts it at this value instead of its actual size.
This configuration controls the number of files compacted in each task: a smaller value packs more files into each task. The default value is 4 MB.
- Parameters:
openFileCost - minimum file size to count when packing files into one "bin"
- Returns:
this for method chaining
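The arithmetic behind this setting can be illustrated with a simplified model. The sketch below assumes that each file is weighted at the larger of its size and the open file cost, and that a bin is filled up to the target size. The class OpenFileCostSketch and its methods are hypothetical and do not belong to Iceberg.

```java
// Hypothetical sketch of how the open file cost limits the files packed per task.
final class OpenFileCostSketch {
  static final long MB = 1024L * 1024L;

  /** A file counts as at least openFileCostBytes, even when it is smaller. */
  static long weight(long fileSizeBytes, long openFileCostBytes) {
    return Math.max(fileSizeBytes, openFileCostBytes);
  }

  /** Rough number of equally sized files that fit into one bin of targetSizeBytes. */
  static long filesPerTask(long fileSizeBytes, long openFileCostBytes, long targetSizeBytes) {
    return Math.max(1L, targetSizeBytes / weight(fileSizeBytes, openFileCostBytes));
  }

  public static void main(String[] args) {
    // 1 MB files, 128 MB target: a 4 MB cost allows 32 files per task.
    System.out.println(filesPerTask(1 * MB, 4 * MB, 128 * MB)); // 32
    // Lowering the cost to 1 MB allows 128 files per task.
    System.out.println(filesPerTask(1 * MB, 1 * MB, 128 * MB)); // 128
  }
}
```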
-
filter
public BaseRewriteDataFilesAction<ThisT> filter(Expression expr)
Pass a row Expression to filter DataFiles to be rewritten. Note that all files that may contain data matching the filter may be rewritten.
- Parameters:
expr - Expression used to select the DataFiles to rewrite
- Returns:
this for method chaining
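The note about files that "may contain" matching data reflects file-level selection: a file is chosen as a whole when its contents could satisfy the row filter. The sketch below illustrates the idea for a single "value > threshold" predicate, under the assumption that each file exposes a min/max range for the column. FilterPruningSketch, FileStats, and the sample paths are hypothetical names, not Iceberg classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of file-level selection by a row filter.
final class FilterPruningSketch {

  /** Min/max range of one column in a data file. */
  static final class FileStats {
    final String path;
    final long min;
    final long max;

    FileStats(String path, long min, long max) {
      this.path = path;
      this.min = min;
      this.max = max;
    }
  }

  /** True when the file may hold at least one row with value > threshold. */
  static boolean mayMatchGreaterThan(FileStats file, long threshold) {
    return file.max > threshold;
  }

  /** Returns the paths of all files that may contain matching rows. */
  static List<String> selectForRewrite(List<FileStats> files, long threshold) {
    List<String> selected = new ArrayList<>();
    for (FileStats file : files) {
      if (mayMatchGreaterThan(file, threshold)) {
        selected.add(file.path);
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    List<FileStats> files = List.of(
        new FileStats("a.parquet", 0, 30),
        new FileStats("b.parquet", 10, 50),
        new FileStats("c.parquet", 60, 90));
    // b.parquet is selected whole, although some of its rows are <= 40.
    System.out.println(selectForRewrite(files, 40)); // [b.parquet, c.parquet]
  }
}
```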
-
useStartingSequenceNumber
public BaseRewriteDataFilesAction<ThisT> useStartingSequenceNumber(boolean useStarting)
Whether the compaction should use the sequence number of the snapshot at compaction start time for new data files, instead of the sequence number of the newly produced snapshot.
This avoids commit conflicts with updates that add newer equality deletes at a higher sequence number.
- Parameters:
useStarting - use the starting sequence number if set to true
- Returns:
this for method chaining
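Why the sequence number matters can be sketched with the rule from the Iceberg table spec that an equality delete applies to a data file only when the data file's sequence number is strictly lower than the delete's. The class SequenceNumberSketch, its methods, and the sequence numbers 5, 6, and 7 are hypothetical and chosen only for illustration.

```java
// Hypothetical sketch of how the chosen sequence number interacts with equality deletes.
final class SequenceNumberSketch {

  /** An equality delete applies only to data files with a strictly lower sequence number. */
  static boolean equalityDeleteApplies(long dataFileSeq, long deleteFileSeq) {
    return dataFileSeq < deleteFileSeq;
  }

  /** Sequence number given to rewritten files under each setting. */
  static long rewrittenFileSeq(boolean useStarting, long startingSeq, long newSnapshotSeq) {
    return useStarting ? startingSeq : newSnapshotSeq;
  }

  public static void main(String[] args) {
    long startingSeq = 5;    // snapshot read when the compaction started
    long concurrentDelete = 6; // equality delete committed while the compaction ran
    long newSnapshotSeq = 7; // snapshot produced by the compaction commit

    // Keeping the starting sequence number: the concurrent delete still applies.
    System.out.println(equalityDeleteApplies(
        rewrittenFileSeq(true, startingSeq, newSnapshotSeq), concurrentDelete)); // true
    // Using the new snapshot's number: the delete would no longer cover the rewritten rows.
    System.out.println(equalityDeleteApplies(
        rewrittenFileSeq(false, startingSeq, newSnapshotSeq), concurrentDelete)); // false
  }
}
```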
-
execute
public RewriteDataFilesActionResult execute()
Description copied from interface: Action
Executes this action.
- Returns:
the result of this action
-
fileIO
protected abstract FileIO fileIO()
-
rewriteDataForTasks
protected abstract java.util.List<DataFile> rewriteDataForTasks(java.util.List<CombinedScanTask> combinedScanTask)
-
self
protected abstract ThisT self()
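The ThisT type parameter together with self() follows the self-typed fluent pattern: shared setters in the base class return the subclass type, so a chain can continue with subclass methods. A minimal sketch of that pattern follows. FluentActionSketch, ConcreteActionSketch, the parallelism option, and the property name are hypothetical and are not part of Iceberg.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical base class: shared setters return ThisT through self().
abstract class FluentActionSketch<ThisT> {
  private final Map<String, String> options = new HashMap<>();

  protected abstract ThisT self();

  public ThisT set(String property, String value) {
    options.put(property, value);
    return self();
  }

  public String option(String property) {
    return options.get(property);
  }
}

// Hypothetical subclass: binds ThisT to itself and adds its own setter.
final class ConcreteActionSketch extends FluentActionSketch<ConcreteActionSketch> {
  private int parallelism = 1;

  @Override
  protected ConcreteActionSketch self() {
    return this;
  }

  public ConcreteActionSketch parallelism(int value) {
    this.parallelism = value;
    return this;
  }

  public int parallelism() {
    return parallelism;
  }

  public static void main(String[] args) {
    // set(...) is declared in the base class but returns ConcreteActionSketch,
    // so the subclass-only parallelism(...) call can follow it in the chain.
    ConcreteActionSketch action =
        new ConcreteActionSketch().set("example.property", "v").parallelism(4);
    System.out.println(action.option("example.property") + " " + action.parallelism());
  }
}
```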
-
set
public ThisT set(java.lang.String property, java.lang.String value)
- Specified by:
set in interface SnapshotUpdateAction<ThisT,R>
-
commit
protected void commit(SnapshotUpdate<?> update)
-
metadataTableName
protected java.lang.String metadataTableName(MetadataTableType type)
-
metadataTableName
protected java.lang.String metadataTableName(java.lang.String tableName, MetadataTableType type)
-
-