Class SortStrategy

  • All Implemented Interfaces: RewriteStrategy
    Direct Known Subclasses:

    public abstract class SortStrategy
    extends BinPackStrategy
    A rewrite strategy for data files which aims to reorder data within data files to optimally lay them out in relation to a column. For example, if the Sort strategy is used on a set of files ordered by column x that originally contains File A (x: 0 - 50), File B (x: 10 - 40) and File C (x: 30 - 60), this strategy will attempt to rewrite those files into File A' (x: 0 - 20), File B' (x: 21 - 40) and File C' (x: 41 - 60).
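The effect described above can be sketched in plain Java. This is a minimal illustration, not the library's implementation: the class `SortRewriteSketch`, the method `rewrite`, and the representation of a file as a list of integer values of column x are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not part of the Iceberg API.
class SortRewriteSketch {

  // Merges the values of all input files, sorts them globally, and splits the
  // result into at most outputCount files of near-equal size.
  static List<List<Integer>> rewrite(List<List<Integer>> files, int outputCount) {
    List<Integer> all = new ArrayList<>();
    for (List<Integer> file : files) {
      all.addAll(file);
    }
    Collections.sort(all);

    List<List<Integer>> result = new ArrayList<>();
    int chunk = (all.size() + outputCount - 1) / outputCount; // ceiling division
    for (int start = 0; start < all.size(); start += chunk) {
      int end = Math.min(start + chunk, all.size());
      result.add(new ArrayList<>(all.subList(start, end)));
    }
    return result;
  }

  public static void main(String[] args) {
    List<List<Integer>> input = List.of(
        List.of(0, 25, 50),   // File A, x: 0 - 50
        List.of(10, 20, 40),  // File B, x: 10 - 40
        List.of(30, 45, 60)); // File C, x: 30 - 60
    // Print the value range covered by each rewritten file.
    for (List<Integer> file : rewrite(input, 3)) {
      System.out.println(file.get(0) + " - " + file.get(file.size() - 1));
    }
  }
}
```

With the sample input, the three rewritten files cover 0 - 20, 25 - 40 and 45 - 60, so their ranges on x no longer overlap.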

    Currently there is no file overlap detection, and all files are rewritten if REWRITE_ALL is true (default: false). If this property is disabled, any files that would be chosen by BinPackStrategy will be rewrite candidates.

    In the future other algorithms for determining files to rewrite will be provided.
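The candidate selection described above can be sketched as follows. This is only an illustration under stated assumptions: the class `FileSelectionSketch`, the method `selectCandidates`, and the `minSize` and `maxSize` thresholds are hypothetical, and "mis-sized" is assumed here to mean a file size outside that range.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the Iceberg API.
class FileSelectionSketch {

  // Returns the sizes of the files that would become rewrite candidates.
  // minSize and maxSize are assumed thresholds defining a "mis-sized" file.
  static List<Long> selectCandidates(
      List<Long> fileSizes, boolean rewriteAll, long minSize, long maxSize) {
    List<Long> candidates = new ArrayList<>();
    for (long size : fileSizes) {
      boolean misSized = size < minSize || size > maxSize;
      // With rewriteAll enabled every file is selected; otherwise only mis-sized ones.
      if (rewriteAll || misSized) {
        candidates.add(size);
      }
    }
    return candidates;
  }

  public static void main(String[] args) {
    List<Long> sizes = List.of(10L, 500L, 2000L);
    System.out.println(selectCandidates(sizes, false, 100L, 1000L));
    System.out.println(selectCandidates(sizes, true, 100L, 1000L));
  }
}
```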

    • Field Detail


        public static final java.lang.String REWRITE_ALL
        Rewrites all files, regardless of their size. Defaults to false, rewriting only mis-sized files.
    • Constructor Detail

      • SortStrategy

        public SortStrategy()
    • Method Detail

      • sortOrder

        public SortStrategy sortOrder(SortOrder order)
        Sets the sort order to be used in this strategy when rewriting files.
        Parameters:
        order - the order to use
        Returns:
        this for method chaining
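The method-chaining pattern mentioned above can be sketched with stand-in types. Everything in this example is hypothetical: `FluentSketch.Strategy` is not the real SortStrategy, the sort order is modelled as a plain String rather than a SortOrder, and the `rewriteAll` setter is invented for the illustration.

```java
// Hypothetical stand-in types; the real SortStrategy and SortOrder are not used here.
class FluentSketch {

  static class Strategy {
    private String sortOrder = "unsorted";
    private boolean rewriteAll = false;

    // Setter returns this so that calls can be chained.
    Strategy sortOrder(String order) {
      this.sortOrder = order;
      return this;
    }

    // Invented second setter, present only to show a chain of two calls.
    Strategy rewriteAll(boolean value) {
      this.rewriteAll = value;
      return this;
    }

    // No-argument overloads act as getters.
    String sortOrder() {
      return sortOrder;
    }

    boolean rewriteAll() {
      return rewriteAll;
    }
  }

  public static void main(String[] args) {
    Strategy strategy = new Strategy().sortOrder("x ASC").rewriteAll(true);
    System.out.println(strategy.sortOrder() + ", rewriteAll=" + strategy.rewriteAll());
  }
}
```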
      • sortOrder

        protected SortOrder sortOrder()
      • validOptions

        public java.util.Set<java.lang.String> validOptions()
      • planFileGroups

        public java.lang.Iterable<java.util.List<FileScanTask>> planFileGroups(java.lang.Iterable<FileScanTask> dataFiles)
        Parameters:
        dataFiles - iterable of FileScanTasks to be rewritten
        Returns:
        iterable of lists of FileScanTasks which will be processed together
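One way to picture the grouping of files into lists that are processed together is a simple greedy pass under a size cap. This is a sketch of that simplified technique, not a description of how planFileGroups actually groups tasks: the class `GroupPlanningSketch`, the method `planGroups`, the `maxGroupSize` parameter, and the use of file sizes in place of FileScanTasks are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the Iceberg API.
class GroupPlanningSketch {

  // Walks the files in order and starts a new group whenever adding the next
  // file would push the current group past maxGroupSize.
  static List<List<Long>> planGroups(List<Long> fileSizes, long maxGroupSize) {
    List<List<Long>> groups = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    long currentSize = 0;
    for (long size : fileSizes) {
      if (!current.isEmpty() && currentSize + size > maxGroupSize) {
        groups.add(current);
        current = new ArrayList<>();
        currentSize = 0;
      }
      current.add(size);
      currentSize += size;
    }
    // Emit the last, possibly partial, group.
    if (!current.isEmpty()) {
      groups.add(current);
    }
    return groups;
  }

  public static void main(String[] args) {
    System.out.println(planGroups(List.of(40L, 50L, 30L, 80L, 10L), 100L));
  }
}
```

A file larger than the cap still ends up in a group of its own in this sketch, so no input file is dropped.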
      • validateOptions

        protected void validateOptions()