Class SortStrategy

  • All Implemented Interfaces:
    java.io.Serializable, RewriteStrategy
    Direct Known Subclasses:
    SparkSortStrategy

    public abstract class SortStrategy
    extends BinPackStrategy
    A rewrite strategy for data files which aims to reorder data within data files to optimally lay them out in relation to a column. For example, if the Sort strategy is used with a sort order on column x, and the original files are File A (x: 0 - 50), File B (x: 10 - 40) and File C (x: 30 - 60), this strategy will attempt to rewrite those files into File A' (x: 0 - 20), File B' (x: 21 - 40) and File C' (x: 41 - 60).
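The following self-contained Java sketch illustrates the layout goal described above. It is not the Iceberg implementation: `DataFile` here is a hypothetical stand-in holding only the values of column x, not `org.apache.iceberg.DataFile`. The sketch pools every row from the input files, sorts the rows by x, and splits them into equally sized output files, so the output ranges no longer overlap. It assumes `outputCount` does not exceed the total row count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortRewriteSketch {

    // Hypothetical stand-in for a data file: only the values of column x it holds.
    record DataFile(String name, int[] xValues) {
        int min() { return Arrays.stream(xValues).min().orElseThrow(); }
        int max() { return Arrays.stream(xValues).max().orElseThrow(); }
    }

    // Pools every row from the inputs, sorts the rows by x, and splits the sorted
    // rows into outputCount files of (nearly) equal size.
    // Assumes outputCount <= total number of rows, so no output file is empty.
    static List<DataFile> sortAndRewrite(List<DataFile> inputs, int outputCount) {
        int[] all = inputs.stream()
            .flatMapToInt(f -> Arrays.stream(f.xValues()))
            .sorted()
            .toArray();
        List<DataFile> outputs = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < outputCount; i++) {
            // Integer arithmetic spreads any remainder so every row lands in exactly one file.
            int end = (int) ((long) all.length * (i + 1) / outputCount);
            String name = "File " + (char) ('A' + i) + "'";
            outputs.add(new DataFile(name, Arrays.copyOfRange(all, start, end)));
            start = end;
        }
        return outputs;
    }

    public static void main(String[] args) {
        // Overlapping inputs, in the spirit of Files A, B and C above (row values are made up).
        List<DataFile> inputs = List.of(
            new DataFile("File A", new int[] {0, 12, 25, 50}),
            new DataFile("File B", new int[] {10, 21, 33, 40}),
            new DataFile("File C", new int[] {30, 41, 55, 60}));
        for (DataFile f : sortAndRewrite(inputs, 3)) {
            System.out.println(f.name() + " (x: " + f.min() + " - " + f.max() + ")");
        }
        // prints:
        // File A' (x: 0 - 21)
        // File B' (x: 25 - 40)
        // File C' (x: 41 - 60)
    }
}
```

The exact boundaries depend on how the rows are distributed; the point is that each output file covers a disjoint slice of x, which lets readers skip whole files when filtering on that column.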

    Currently there is no file overlap detection. All files will be rewritten if BinPackStrategy.REWRITE_ALL is true (default: false); if this property is disabled, any files that would be chosen by BinPackStrategy will be rewrite candidates.
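A minimal sketch of the selection rule described above, written as plain Java rather than against the Iceberg API. `FileInfo` and `selectFilesToRewrite` here are hypothetical. It also assumes, for illustration only, that the bin-pack fallback picks files whose size falls outside a `[minFileSize, maxFileSize]` window; the real BinPackStrategy criteria may differ.

```java
import java.util.List;
import java.util.stream.Collectors;

public class FileSelectionSketch {

    // Hypothetical file descriptor with only the fields this sketch needs.
    record FileInfo(String path, long sizeInBytes) {}

    static List<FileInfo> selectFilesToRewrite(List<FileInfo> files, boolean rewriteAll,
                                               long minFileSize, long maxFileSize) {
        if (rewriteAll) {
            // No overlap detection: every file becomes a rewrite candidate.
            return files;
        }
        // Assumed bin-pack rule: files that are too small or too large are candidates.
        return files.stream()
            .filter(f -> f.sizeInBytes() < minFileSize || f.sizeInBytes() > maxFileSize)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<FileInfo> files = List.of(
            new FileInfo("a.parquet", 10L),
            new FileInfo("b.parquet", 500L),
            new FileInfo("c.parquet", 5_000L));
        System.out.println(selectFilesToRewrite(files, true, 100L, 1_000L).size());  // prints 3
        System.out.println(selectFilesToRewrite(files, false, 100L, 1_000L).size()); // prints 2
    }
}
```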

    In the future other algorithms for determining files to rewrite will be provided.

    See Also:
    Serialized Form
    • Constructor Detail

      • SortStrategy

        public SortStrategy()
    • Method Detail

      • sortOrder

        public SortStrategy sortOrder(SortOrder order)
        Sets the sort order to be used in this strategy when rewriting files.
        Parameters:
        order - the order to use
        Returns:
        this for method chaining
      • sortOrder

        protected SortOrder sortOrder()
      • validOptions

        public java.util.Set<java.lang.String> validOptions()
        Description copied from interface: RewriteStrategy
        Returns a set of options which this rewrite strategy can use. This is an allowed-list and any options not specified here will be rejected at runtime.
        Specified by:
        validOptions in interface RewriteStrategy
        Overrides:
        validOptions in class BinPackStrategy
      • validateOptions

        protected void validateOptions()
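The allowed-list contract of validOptions, and the way a subclass overrides it to extend its parent's list, can be sketched in plain Java. This is an illustrative sketch, not Iceberg code: `BaseStrategy`, `SortingStrategy` and all option names are hypothetical, and `validateOptions` takes the options as a parameter for simplicity, whereas the real protected `validateOptions()` takes no arguments.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class OptionValidationSketch {

    // Hypothetical parent strategy with its own allowed-list (option names are illustrative).
    static class BaseStrategy {
        Set<String> validOptions() {
            return Set.of("target-file-size-bytes", "rewrite-all");
        }

        // Rejects any option key that is not in the allowed-list.
        void validateOptions(Map<String, String> options) {
            Set<String> allowed = validOptions();
            for (String key : options.keySet()) {
                if (!allowed.contains(key)) {
                    throw new IllegalArgumentException("Unsupported option: " + key);
                }
            }
        }
    }

    // The subclass extends, rather than replaces, the parent's allowed-list.
    static class SortingStrategy extends BaseStrategy {
        @Override
        Set<String> validOptions() {
            Set<String> options = new HashSet<>(super.validOptions());
            options.add("hypothetical-sort-option");
            return options;
        }
    }

    public static void main(String[] args) {
        SortingStrategy strategy = new SortingStrategy();
        strategy.validateOptions(Map.of("rewrite-all", "true")); // accepted: inherited option
        try {
            strategy.validateOptions(Map.of("not-an-option", "1"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "Unsupported option: not-an-option"
        }
    }
}
```

Building the subclass set from `super.validOptions()` keeps every option the parent accepts valid, which mirrors how validOptions here overrides the BinPackStrategy version.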