Class SortStrategy

  • All Implemented Interfaces: java.io.Serializable, RewriteStrategy

    public abstract class SortStrategy
    extends BinPackStrategy
    Deprecated. Since 1.3.0, will be removed in 1.4.0; use SizeBasedFileRewriter instead. Note: this can only be removed once Spark 3.2 no longer uses this API.
    A rewrite strategy for data files which aims to reorder data within data files to optimally lay them out in relation to a column. For example, if the Sort strategy is used on a set of files ordered by column x, and the original files are File A (x: 0 - 50), File B (x: 10 - 40) and File C (x: 30 - 60), this strategy will attempt to rewrite those files into File A' (x: 0 - 20), File B' (x: 21 - 40) and File C' (x: 41 - 60).
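    The re-layout in the example above can be illustrated with a minimal sketch. It is not the Iceberg implementation: the class SortRewriteSketch and its methods are hypothetical, each file is modelled as a plain array of values of column x, and output files are simply equal-sized slices of the globally sorted values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not part of the Iceberg API. */
final class SortRewriteSketch {

  /** Sorts the values of all input files and splits them into contiguous output files. */
  static List<int[]> rewrite(List<int[]> inputFiles, int outputFiles) {
    // Gather every value of column x from all inputs and sort them globally.
    int[] all = inputFiles.stream().flatMapToInt(Arrays::stream).sorted().toArray();
    List<int[]> result = new ArrayList<>();
    for (int i = 0; i < outputFiles; i++) {
      // Slice the sorted values into roughly equal, contiguous chunks.
      int from = (int) ((long) all.length * i / outputFiles);
      int to = (int) ((long) all.length * (i + 1) / outputFiles);
      result.add(Arrays.copyOfRange(all, from, to));
    }
    return result;
  }

  /** Returns true if the [min, max] ranges of two non-empty files intersect. */
  static boolean overlaps(int[] a, int[] b) {
    int minA = Arrays.stream(a).min().getAsInt();
    int maxA = Arrays.stream(a).max().getAsInt();
    int minB = Arrays.stream(b).min().getAsInt();
    int maxB = Arrays.stream(b).max().getAsInt();
    return minA <= maxB && minB <= maxA;
  }

  public static void main(String[] args) {
    // Three input files whose x ranges overlap, loosely following the example above.
    List<int[]> input = List.of(
        new int[] {0, 25, 50}, new int[] {10, 20, 40}, new int[] {30, 45, 60});
    for (int[] file : rewrite(input, 3)) {
      System.out.println(Arrays.toString(file));
    }
  }
}
```

    Because the values are sorted before slicing, each output file covers a range that ends no later than the next one begins, which is the property the strategy aims for.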

    Currently there is no file overlap detection, and all files are rewritten if BinPackStrategy.REWRITE_ALL is true (default: false). If this property is disabled, any files that would be chosen by BinPackStrategy become rewrite candidates.
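    The selection rule described above can be sketched as follows. This is an assumption-laden illustration, not the library's code: RewriteCandidateSketch and FileInfo are hypothetical names, and a simple size window (minBytes, maxBytes) stands in for whatever criteria BinPackStrategy actually applies when choosing files.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical illustration only; not part of the Iceberg API. */
final class RewriteCandidateSketch {

  /** Hypothetical stand-in for a data file: a name and a size in bytes. */
  static final class FileInfo {
    final String name;
    final long sizeBytes;

    FileInfo(String name, long sizeBytes) {
      this.name = name;
      this.sizeBytes = sizeBytes;
    }
  }

  /**
   * Returns the names of the files to rewrite. With rewriteAll set, every file is a
   * candidate; otherwise only files outside the assumed size window are chosen.
   */
  static List<String> selectFiles(
      List<FileInfo> files, boolean rewriteAll, long minBytes, long maxBytes) {
    return files.stream()
        .filter(f -> rewriteAll || f.sizeBytes < minBytes || f.sizeBytes > maxBytes)
        .map(f -> f.name)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<FileInfo> files = List.of(
        new FileInfo("a", 10), new FileInfo("b", 100), new FileInfo("c", 1000));
    System.out.println(selectFiles(files, false, 50, 500));
    System.out.println(selectFiles(files, true, 50, 500));
  }
}
```

    Note that neither branch inspects value ranges, which mirrors the statement that no overlap detection is performed.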

    In the future other algorithms for determining files to rewrite will be provided.

    See Also:
    Serialized Form
    • Constructor Detail

      • SortStrategy

        public SortStrategy()
    • Method Detail

      • sortOrder

        public SortStrategy sortOrder(SortOrder order)
        Sets the sort order to be used in this strategy when rewriting files.
        Parameters:
        order - the order to use
        Returns:
        this for method chaining
      • sortOrder

        protected SortOrder sortOrder()
      • validOptions

        public java.util.Set<java.lang.String> validOptions()
        Description copied from interface: RewriteStrategy
        Returns a set of options which this rewrite strategy can use. This is an allow-list: any option not specified here will be rejected at runtime.
        Specified by:
        validOptions in interface RewriteStrategy
        Overrides:
        validOptions in class BinPackStrategy
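        The allow-list behaviour described for validOptions can be sketched roughly as below. The class OptionAllowListSketch, the method checkOptions and the option keys are hypothetical; how and where Iceberg performs this check is not stated here, so the exception type is also an assumption.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration only; not part of the Iceberg API. */
final class OptionAllowListSketch {

  /** Throws if any supplied option key is missing from the allow-list. */
  static void checkOptions(Set<String> validOptions, Map<String, String> options) {
    // Collect the supplied keys, then drop every key the strategy declares as valid.
    Set<String> invalid = new TreeSet<>(options.keySet());
    invalid.removeAll(validOptions);
    if (!invalid.isEmpty()) {
      throw new IllegalArgumentException("Unsupported options: " + invalid);
    }
  }

  public static void main(String[] args) {
    Set<String> valid = Set.of("known-option");
    checkOptions(valid, Map.of("known-option", "true")); // accepted
    try {
      checkOptions(valid, Map.of("unknown-option", "1"));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```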
      • validateOptions

        protected void validateOptions()