Interface RewriteStrategy

    • Method Summary

      Modifier and Type Method Description
      java.lang.String name()
      Returns the name of this rewrite strategy.
      RewriteStrategy options(java.util.Map<java.lang.String,java.lang.String> options)
      Sets options to be used with this strategy.
      java.lang.Iterable<java.util.List<FileScanTask>> planFileGroups(java.lang.Iterable<FileScanTask> dataFiles)
      Groups file scans into lists which will be processed in a single executable unit.
      java.util.Set<DataFile> rewriteFiles(java.util.List<FileScanTask> filesToRewrite)
      Rewrites files according to this RewriteStrategy's algorithm.
      java.lang.Iterable<FileScanTask> selectFilesToRewrite(java.lang.Iterable<FileScanTask> dataFiles)
      Selects files which this strategy believes are valid targets to be rewritten.
      Table table()
      Returns the table being modified by this rewrite strategy.
      java.util.Set<java.lang.String> validOptions()
      Returns a set of options which this rewrite strategy can use.
    • Method Detail

      • name

        java.lang.String name()
        Returns the name of this rewrite strategy.
      • table

        Table table()
        Returns the table being modified by this rewrite strategy.
      • validOptions

        java.util.Set<java.lang.String> validOptions()
        Returns a set of options which this rewrite strategy can use. This is an allowlist: any option not listed here will be rejected at runtime.
      • options

        RewriteStrategy options(java.util.Map<java.lang.String,java.lang.String> options)
        Sets options to be used with this strategy.
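
        The allowlist behaviour of validOptions and options can be illustrated with a minimal sketch. The class OptionValidationSketch and its validate method are hypothetical helpers written for this example and are not part of the library; the sketch only assumes that a caller has the strategy's name, its validOptions() set, and the user-supplied option map.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of the documented API.
final class OptionValidationSketch {
  private OptionValidationSketch() {}

  // Rejects any option key that the strategy does not list in validOptions().
  static void validate(String strategyName, Set<String> validOptions, Map<String, String> options) {
    // Copy the keys so the caller's map is left untouched; TreeSet gives a stable message order.
    Set<String> unknown = new TreeSet<>(options.keySet());
    unknown.removeAll(validOptions);
    if (!unknown.isEmpty()) {
      throw new IllegalArgumentException(
          "Options " + unknown + " are not valid for rewrite strategy " + strategyName);
    }
  }
}
```

        A caller would typically run a check of this kind before passing the map to options, so that an unsupported key fails fast instead of being silently ignored.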
      • selectFilesToRewrite

        java.lang.Iterable<FileScanTask> selectFilesToRewrite(java.lang.Iterable<FileScanTask> dataFiles)
        Selects files which this strategy believes are valid targets to be rewritten.
        Parameters:
        dataFiles - iterable of FileScanTasks for files in a given partition
        Returns:
        iterable containing only FileScanTasks to be rewritten
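
        As an illustration of what a selection rule might look like, the sketch below keeps only files whose size falls outside an acceptable range. This is one possible policy, not the one any particular strategy is documented to use. The nested Task class is a simplified stand-in for FileScanTask, and the minBytes and maxBytes parameters are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; Task is a stand-in for FileScanTask, not the real type.
final class SelectFilesSketch {
  private SelectFilesSketch() {}

  static final class Task {
    final String path;
    final long length; // size of the scanned file in bytes

    Task(String path, long length) {
      this.path = path;
      this.length = length;
    }
  }

  // Returns only the tasks whose file is smaller than minBytes or larger than maxBytes.
  static Iterable<Task> selectFilesToRewrite(Iterable<Task> dataFiles, long minBytes, long maxBytes) {
    List<Task> selected = new ArrayList<>();
    for (Task task : dataFiles) {
      if (task.length < minBytes || task.length > maxBytes) {
        selected.add(task);
      }
    }
    return selected;
  }
}
```

        Files already inside the range are left out of the result, so they are never handed to the later grouping and rewrite steps.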
      • planFileGroups

        java.lang.Iterable<java.util.List<FileScanTask>> planFileGroups(java.lang.Iterable<FileScanTask> dataFiles)
        Groups file scans into lists which will be processed in a single executable unit. Each group will end up being committed as an independent set of changes. This creates the jobs which will eventually be run by the underlying Action.
        Parameters:
        dataFiles - iterable of FileScanTasks to be rewritten
        Returns:
        iterable of lists of FileScanTasks which will be processed together
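
        One simple way to form such groups is a greedy pass that closes the current group once adding another file would exceed a size cap. The sketch below shows that approach under stated assumptions: Task is a simplified stand-in for FileScanTask, maxGroupBytes is a hypothetical parameter, and the greedy rule is chosen for brevity rather than taken from any existing strategy.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; Task is a stand-in for FileScanTask, not the real type.
final class PlanFileGroupsSketch {
  private PlanFileGroupsSketch() {}

  static final class Task {
    final String path;
    final long length; // size of the scanned file in bytes

    Task(String path, long length) {
      this.path = path;
      this.length = length;
    }
  }

  // Greedily packs tasks, in input order, into groups of at most maxGroupBytes.
  // A single task larger than the cap still gets a group of its own.
  static Iterable<List<Task>> planFileGroups(Iterable<Task> dataFiles, long maxGroupBytes) {
    List<List<Task>> groups = new ArrayList<>();
    List<Task> current = new ArrayList<>();
    long currentBytes = 0L;
    for (Task task : dataFiles) {
      if (!current.isEmpty() && currentBytes + task.length > maxGroupBytes) {
        groups.add(current);
        current = new ArrayList<>();
        currentBytes = 0L;
      }
      current.add(task);
      currentBytes += task.length;
    }
    if (!current.isEmpty()) {
      groups.add(current);
    }
    return groups;
  }
}
```

        Because each returned list is committed independently, a smaller cap yields more, smaller commits, while a larger cap yields fewer commits that each touch more files.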
      • rewriteFiles

        java.util.Set<DataFile> rewriteFiles(java.util.List<FileScanTask> filesToRewrite)
        Rewrites files according to this RewriteStrategy's algorithm. The implementation will most likely be specific to the Action framework (Spark, Presto, Flink, ...).
        Parameters:
        filesToRewrite - a group of files to be rewritten together
        Returns:
        a set of newly written files
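
        The parameter descriptions above suggest that the three file-handling methods are meant to be chained: select, then plan groups, then rewrite each group. The sketch below shows that flow with a reduced, generic stand-in interface. Strategy, run, and the type parameters T (playing the role of FileScanTask) and F (playing the role of DataFile) are assumptions made for this example; how a real Action drives a strategy, including commits and parallelism, is not specified here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative sketch; Strategy is a reduced stand-in for RewriteStrategy.
final class RewritePipelineSketch {
  private RewritePipelineSketch() {}

  interface Strategy<T, F> {
    Iterable<T> selectFilesToRewrite(Iterable<T> dataFiles);

    Iterable<List<T>> planFileGroups(Iterable<T> dataFiles);

    Set<F> rewriteFiles(List<T> filesToRewrite);
  }

  // Runs select -> plan -> rewrite for one partition and returns the new files per group.
  static <T, F> List<Set<F>> run(Strategy<T, F> strategy, Iterable<T> partitionFiles) {
    Iterable<T> selected = strategy.selectFilesToRewrite(partitionFiles);
    List<Set<F>> newFilesPerGroup = new ArrayList<>();
    for (List<T> group : strategy.planFileGroups(selected)) {
      // Each group is an independent unit of work; a real caller would commit it separately.
      newFilesPerGroup.add(strategy.rewriteFiles(group));
    }
    return newFilesPerGroup;
  }
}
```

        Keeping the result as one set per group mirrors the statement that each group is committed as an independent set of changes.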