Class UnicodeUtil

java.lang.Object
org.apache.iceberg.util.UnicodeUtil

public class UnicodeUtil extends Object
  • Method Summary

    Modifier and Type
    Method
    Description
    static boolean
    isCharHighSurrogate(char ch)
    Determines whether the given char value is a Unicode high-surrogate code unit.
    static CharSequence
    truncateString(CharSequence input, int length)
    Truncates the input CharSequence so that the result is a valid Unicode string containing at most length Unicode characters.
    static String
    truncateStringMax(String input, int length)
    Returns a valid String that is greater than the given input and contains at most length Unicode characters.
    static Literal<CharSequence>
    truncateStringMax(Literal<CharSequence> input, int length)
    Returns a valid Unicode CharSequence that is greater than the given input and contains at most length Unicode characters.
    static String
    truncateStringMin(String input, int length)
    Returns a valid String that is lower than the given input and contains at most length Unicode characters.
    static Literal<CharSequence>
    truncateStringMin(Literal<CharSequence> input, int length)
    Returns a valid Unicode CharSequence that is lower than the given input and contains at most length Unicode characters.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Method Details

    • isCharHighSurrogate

      public static boolean isCharHighSurrogate(char ch)
      Determines whether the given char value is a Unicode high-surrogate code unit. The range of high surrogates is 0xD800 to 0xDBFF, inclusive.
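The range check described above can be illustrated with a small standalone sketch. The class and method names here (HighSurrogateSketch, isHighSurrogate) are hypothetical and are not part of UnicodeUtil; the sketch only assumes the documented range 0xD800 to 0xDBFF.

```java
// Standalone illustration of a high-surrogate range check.
// HighSurrogateSketch and isHighSurrogate are hypothetical names, not library code.
final class HighSurrogateSketch {

  // True when ch lies in the high-surrogate range 0xD800..0xDBFF.
  static boolean isHighSurrogate(char ch) {
    return ch >= 0xD800 && ch <= 0xDBFF;
  }

  public static void main(String[] args) {
    // U+1F600 is stored in a Java String as a pair of surrogate code units.
    String emoji = "\uD83D\uDE00";
    System.out.println(isHighSurrogate(emoji.charAt(0))); // prints true
    System.out.println(isHighSurrogate(emoji.charAt(1))); // prints false (low surrogate)
    System.out.println(isHighSurrogate('a'));             // prints false
  }
}
```

A check like this matters when cutting a string at a char index: if the last kept code unit is a high surrogate, the cut has split a surrogate pair.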
    • truncateString

      public static CharSequence truncateString(CharSequence input, int length)
      Truncates the input CharSequence so that the result is a valid Unicode string containing at most length Unicode characters.
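As a rough illustration of truncating by Unicode characters rather than by char units, the following sketch counts code points with standard String methods. It is not the library's implementation; TruncateSketch and truncate are hypothetical names, and treating "Unicode characters" as code points is an assumption.

```java
// Standalone sketch of code-point-aware truncation.
// TruncateSketch and truncate are hypothetical names, not library code.
final class TruncateSketch {

  // Keeps at most maxCodePoints code points and never splits a surrogate pair.
  static CharSequence truncate(CharSequence input, int maxCodePoints) {
    if (maxCodePoints < 0) {
      throw new IllegalArgumentException("maxCodePoints must not be negative");
    }
    String s = input.toString();
    if (s.codePointCount(0, s.length()) <= maxCodePoints) {
      return s; // already short enough
    }
    // offsetByCodePoints returns the char index just past the requested code points.
    int end = s.offsetByCodePoints(0, maxCodePoints);
    return s.substring(0, end);
  }

  public static void main(String[] args) {
    // "a", U+1F600 (two char units), "b": three code points, four char units.
    CharSequence result = truncate("a\uD83D\uDE00b", 2);
    System.out.println(result.length()); // prints 3: 'a' plus the intact surrogate pair
  }
}
```

Cutting with substring(0, 2) instead would keep 'a' and a lone high surrogate, which is not a valid Unicode string.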
    • truncateStringMin

      public static Literal<CharSequence> truncateStringMin(Literal<CharSequence> input, int length)
      Returns a valid Unicode CharSequence that is lower than the given input and contains at most length Unicode characters.
    • truncateStringMin

      public static String truncateStringMin(String input, int length)
      Returns a valid String that is lower than the given input and contains at most length Unicode characters.
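One simple way to obtain a shorter value that does not sort after the input is plain truncation, because a prefix never compares greater than the string it was taken from. The sketch below shows that property; whether UnicodeUtil computes its result this way is an assumption, and LowerBoundSketch and lowerBound are hypothetical names.

```java
// Standalone sketch: a code-point-aware prefix used as a lower bound.
// LowerBoundSketch and lowerBound are hypothetical names, not library code.
final class LowerBoundSketch {

  // Returns a prefix of input with at most maxCodePoints code points.
  static String lowerBound(String input, int maxCodePoints) {
    if (maxCodePoints <= 0) {
      throw new IllegalArgumentException("maxCodePoints must be positive");
    }
    if (input.codePointCount(0, input.length()) <= maxCodePoints) {
      return input; // nothing to cut
    }
    return input.substring(0, input.offsetByCodePoints(0, maxCodePoints));
  }

  public static void main(String[] args) {
    String bound = lowerBound("iceberg", 3);
    System.out.println(bound);                             // prints ice
    System.out.println(bound.compareTo("iceberg") <= 0);   // prints true
  }
}
```

Note that when the input already fits within the limit, this sketch returns it unchanged, so the result is equal to the input rather than strictly lower.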
    • truncateStringMax

      public static Literal<CharSequence> truncateStringMax(Literal<CharSequence> input, int length)
      Returns a valid Unicode CharSequence that is greater than the given input and contains at most length Unicode characters.
    • truncateStringMax

      public static String truncateStringMax(String input, int length)
      Returns a valid String that is greater than the given input and contains at most length Unicode characters.
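A common way to build a short value that sorts after the input is to truncate and then increment the last code point, backing up when that code point cannot be incremented. The following is a minimal sketch of that idea under several assumptions: ordering is by code point, the input is well-formed UTF-16, an input that already fits is returned unchanged, and null signals that no bound exists. UpperBoundSketch and upperBound are hypothetical names, and this is not presented as the library's implementation.

```java
// Standalone sketch: truncate, then bump the last code point to get an upper bound.
// UpperBoundSketch and upperBound are hypothetical names, not library code.
final class UpperBoundSketch {

  // Returns a string of at most maxCodePoints code points that is not lower than
  // input in code point order, or null when no such string can be built.
  static String upperBound(String input, int maxCodePoints) {
    if (maxCodePoints <= 0) {
      throw new IllegalArgumentException("maxCodePoints must be positive");
    }
    if (input.codePointCount(0, input.length()) <= maxCodePoints) {
      return input; // assumption: an input that already fits is its own bound
    }
    StringBuilder sb =
        new StringBuilder(input.substring(0, input.offsetByCodePoints(0, maxCodePoints)));
    while (sb.length() > 0) {
      int last = sb.codePointBefore(sb.length());
      sb.setLength(sb.length() - Character.charCount(last)); // drop the last code point
      int next = last + 1;
      if (next >= 0xD800 && next <= 0xDFFF) {
        next = 0xE000; // skip the surrogate block, which holds no valid characters
      }
      if (next <= Character.MAX_CODE_POINT) {
        return sb.appendCodePoint(next).toString();
      }
      // last was U+10FFFF and cannot be incremented; retry with the previous code point
    }
    return null; // every kept code point was U+10FFFF
  }

  public static void main(String[] args) {
    System.out.println(upperBound("abcdef", 3)); // prints abd
    System.out.println(upperBound("abc", 5));    // prints abc
  }
}
```

The bound holds in code point order. String.compareTo compares UTF-16 code units, so it can disagree once the incremented code point moves from the Basic Multilingual Plane into a surrogate pair.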