Interface ContentFile<F>

  • Type Parameters:
    F - the concrete Java class of a ContentFile instance.
    All Known Subinterfaces:
    DataFile, DeleteFile
    All Known Implementing Classes:
    SparkDataFile

    public interface ContentFile<F>
    Superinterface of DataFile and DeleteFile that exposes common methods.
    • Method Summary

      Modifier and Type Method Description
      java.util.Map<java.lang.Integer,java.lang.Long> columnSizes()
      Returns a map from column ID to the size of the column in bytes if collected, null otherwise.
      FileContent content()
      Returns type of content stored in the file; one of DATA, POSITION_DELETES, or EQUALITY_DELETES.
      F copy()
      Copies this file.
      default F copy(boolean withStats)
      Copies this file (potentially without file stats).
      F copyWithoutStats()
      Copies this file without file stats.
      java.util.List<java.lang.Integer> equalityFieldIds()
      Returns the set of field IDs used for equality comparison, in equality delete files.
      long fileSizeInBytes()
      Returns the file size in bytes.
      FileFormat format()
      Returns format of the file.
      java.nio.ByteBuffer keyMetadata()
      Returns metadata about how this file is encrypted, or null if the file is stored in plain text.
      java.util.Map<java.lang.Integer,java.nio.ByteBuffer> lowerBounds()
      Returns a map from column ID to value lower bounds if collected, null otherwise.
      java.util.Map<java.lang.Integer,java.lang.Long> nanValueCounts()
      Returns a map from column ID to its NaN value count if collected, null otherwise.
      java.util.Map<java.lang.Integer,java.lang.Long> nullValueCounts()
      Returns a map from column ID to its null value count if collected, null otherwise.
      StructLike partition()
      Returns partition for this file as a StructLike.
      java.lang.CharSequence path()
      Returns fully qualified path to the file, suitable for constructing a Hadoop Path.
      java.lang.Long pos()
      Returns the ordinal position of the file in a manifest, or null if it was not read from a manifest.
      long recordCount()
      Returns the number of top-level records in the file.
      default java.lang.Integer sortOrderId()
      Returns the sort order id of this file, which describes how the file is ordered.
      int specId()
      Returns id of the partition spec used for partition metadata.
      java.util.List<java.lang.Long> splitOffsets()
      Returns a list of recommended split locations if applicable, null otherwise.
      java.util.Map<java.lang.Integer,java.nio.ByteBuffer> upperBounds()
      Returns a map from column ID to value upper bounds if collected, null otherwise.
      java.util.Map<java.lang.Integer,java.lang.Long> valueCounts()
      Returns a map from column ID to the count of its non-null values if collected, null otherwise.
    • Method Detail

      • pos

        java.lang.Long pos()
        Returns the ordinal position of the file in a manifest, or null if it was not read from a manifest.
      • specId

        int specId()
        Returns id of the partition spec used for partition metadata.
      • content

        FileContent content()
        Returns type of content stored in the file; one of DATA, POSITION_DELETES, or EQUALITY_DELETES.
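Callers usually branch on this value to separate data files from delete files. The sketch below shows one way to do that. `FileContentKind` is a stand-in enum limited to the three values named above, and `ContentDispatch.isDeleteFile` is a hypothetical helper; neither is part of the documented API.

```java
// Stand-in for the FileContent type, limited to the three values named above,
// so that this sketch compiles without Iceberg on the classpath.
enum FileContentKind { DATA, POSITION_DELETES, EQUALITY_DELETES }

final class ContentDispatch {
  private ContentDispatch() {}

  /** Returns true when a file with this content type holds delete records rather than rows. */
  static boolean isDeleteFile(FileContentKind content) {
    switch (content) {
      case DATA:
        return false;
      case POSITION_DELETES:
      case EQUALITY_DELETES:
        return true;
      default:
        // Guards against values added to the enum after this code was written.
        throw new IllegalArgumentException("Unknown content type: " + content);
    }
  }
}
```

The explicit `default` branch is a design choice: it makes the helper fail loudly if a new content type appears instead of silently treating it as data.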
      • path

        java.lang.CharSequence path()
        Returns fully qualified path to the file, suitable for constructing a Hadoop Path.
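Because the return type is CharSequence rather than String, two paths with the same characters may be backed by different classes, and the CharSequence interface does not define content-based equals. A minimal sketch of a safe comparison follows; `PathCompare.samePath` is a hypothetical helper, not part of the documented API.

```java
final class PathCompare {
  private PathCompare() {}

  /** Compares two paths by characters, whatever CharSequence implementation backs them. */
  static boolean samePath(CharSequence left, CharSequence right) {
    // String.contentEquals accepts any CharSequence and compares character by character.
    return left.toString().contentEquals(right);
  }
}
```

The same reasoning suggests converting with `toString()` before using a path as a key in a hash-based collection.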
      • format

        FileFormat format()
        Returns format of the file.
      • recordCount

        long recordCount()
        Returns the number of top-level records in the file.
      • fileSizeInBytes

        long fileSizeInBytes()
        Returns the file size in bytes.
      • columnSizes

        java.util.Map<java.lang.Integer,java.lang.Long> columnSizes()
        Returns a map from column ID to the size of the column in bytes if collected, null otherwise.
      • valueCounts

        java.util.Map<java.lang.Integer,java.lang.Long> valueCounts()
        Returns a map from column ID to the count of its non-null values if collected, null otherwise.
      • nullValueCounts

        java.util.Map<java.lang.Integer,java.lang.Long> nullValueCounts()
        Returns a map from column ID to its null value count if collected, null otherwise.
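Since the statistics maps can be null, and may also lack an entry for a given column, callers need to treat "not collected" as "unknown" rather than as zero. The sketch below applies that rule to the null value counts. `NullStats.mayContainNull` is a hypothetical helper that takes the map returned by nullValueCounts() as a plain argument.

```java
import java.util.Map;

final class NullStats {
  private NullStats() {}

  /**
   * Returns false only when the stats prove the column has no nulls.
   * A null map or a missing entry means "not collected", so the answer stays conservative.
   */
  static boolean mayContainNull(Map<Integer, Long> nullValueCounts, int columnId) {
    if (nullValueCounts == null) {
      return true; // stats were not collected for this file
    }
    Long nullCount = nullValueCounts.get(columnId);
    return nullCount == null || nullCount > 0;
  }
}
```

The same guard pattern (check the map, then check the entry) applies to the other statistics maps in this interface.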
      • nanValueCounts

        java.util.Map<java.lang.Integer,java.lang.Long> nanValueCounts()
        Returns a map from column ID to its NaN value count if collected, null otherwise.
      • lowerBounds

        java.util.Map<java.lang.Integer,java.nio.ByteBuffer> lowerBounds()
        Returns a map from column ID to value lower bounds if collected, null otherwise.
      • upperBounds

        java.util.Map<java.lang.Integer,java.nio.ByteBuffer> upperBounds()
        Returns a map from column ID to value upper bounds if collected, null otherwise.
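Lower and upper bounds are typically used to skip files that cannot contain a value being searched for. The sketch below illustrates the idea for a single column of type long. It assumes the bound is serialized as an 8-byte little-endian value, which is how the Iceberg table specification describes long values; the class `BoundsPruning` and its methods are hypothetical, and real code would normally rely on Iceberg's own conversion utilities rather than decoding by hand.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Map;

final class BoundsPruning {
  private BoundsPruning() {}

  /** Decodes an 8-byte little-endian long without moving the caller's buffer position. */
  static long decodeLong(ByteBuffer bound) {
    return bound.duplicate().order(ByteOrder.LITTLE_ENDIAN).getLong();
  }

  /** Helper for this sketch only: encodes a long the same way decodeLong reads it. */
  static ByteBuffer encodeLong(long value) {
    ByteBuffer buffer = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
    buffer.putLong(0, value); // absolute put, so the position stays at 0
    return buffer;
  }

  /** Returns false only when the bounds prove that no row in the file equals the value. */
  static boolean mayContain(
      Map<Integer, ByteBuffer> lowerBounds,
      Map<Integer, ByteBuffer> upperBounds,
      int columnId,
      long value) {
    // A null map or a missing entry means the bound was not collected.
    ByteBuffer lower = lowerBounds == null ? null : lowerBounds.get(columnId);
    ByteBuffer upper = upperBounds == null ? null : upperBounds.get(columnId);
    if (lower != null && value < decodeLong(lower)) {
      return false;
    }
    if (upper != null && value > decodeLong(upper)) {
      return false;
    }
    return true;
  }
}
```

Decoding through `duplicate()` leaves the shared buffer untouched, which matters if the same file object is inspected more than once.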
      • keyMetadata

        java.nio.ByteBuffer keyMetadata()
        Returns metadata about how this file is encrypted, or null if the file is stored in plain text.
      • splitOffsets

        java.util.List<java.lang.Long> splitOffsets()
        Returns a list of recommended split locations if applicable, null otherwise.

        When available, this information is used for planning scan tasks whose boundaries are determined by these offsets. The returned list must be sorted in ascending order.
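To illustrate how ascending offsets can define task boundaries, the sketch below turns a list of offsets and a file size into (start, length) ranges. It is not the planner's actual algorithm: `SplitRanges.ranges` is a hypothetical helper, and the fallback to a single whole-file range when no offsets exist is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SplitRanges {
  private SplitRanges() {}

  /**
   * Turns ascending split offsets into {start, length} pairs that cover the file
   * from the first offset to its end. Falls back to one whole-file range when no
   * offsets were recorded.
   */
  static List<long[]> ranges(List<Long> splitOffsets, long fileSizeInBytes) {
    if (splitOffsets == null || splitOffsets.isEmpty()) {
      return Collections.singletonList(new long[] {0L, fileSizeInBytes});
    }
    List<long[]> result = new ArrayList<>();
    for (int i = 0; i < splitOffsets.size(); i++) {
      long start = splitOffsets.get(i);
      // Each range ends where the next one starts; the last range ends at the file size.
      long end = i + 1 < splitOffsets.size() ? splitOffsets.get(i + 1) : fileSizeInBytes;
      if (end <= start) {
        throw new IllegalArgumentException(
            "Split offsets must be strictly increasing and lie inside the file");
      }
      result.add(new long[] {start, end - start});
    }
    return result;
  }
}
```

For offsets 4, 100, 250 and a 400-byte file, this yields the ranges (4, 96), (100, 150), and (250, 150).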

      • equalityFieldIds

        java.util.List<java.lang.Integer> equalityFieldIds()
        Returns the set of field IDs used for equality comparison, in equality delete files.

        An equality delete file may contain additional data fields that are not used by equality comparison. The subset of columns in a delete file to be used in equality comparison is tracked by ID. Extra columns can be used to reconstruct changes, and metrics from extra columns are used during job planning.

        Returns:
        IDs of the fields used in equality comparison with the records in this delete file
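The role of the field ID list can be sketched with rows modeled as maps from field ID to value: only the listed fields take part in matching, and any extra columns in the delete record are ignored. The class `EqualityDeleteSketch` and the map-based row model are assumptions made for illustration, not how Iceberg represents rows.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class EqualityDeleteSketch {
  private EqualityDeleteSketch() {}

  /** Projects a row (field ID to value) onto the equality fields, in ID-list order. */
  static List<Object> key(Map<Integer, Object> row, List<Integer> equalityFieldIds) {
    List<Object> key = new ArrayList<>(equalityFieldIds.size());
    for (Integer fieldId : equalityFieldIds) {
      key.add(row.get(fieldId));
    }
    return key;
  }

  /** Returns the data rows that do not match any delete record on the equality fields. */
  static List<Map<Integer, Object>> applyDeletes(
      List<Map<Integer, Object>> dataRows,
      List<Map<Integer, Object>> deleteRows,
      List<Integer> equalityFieldIds) {
    // Index the delete records by their projected key for constant-time lookups.
    Set<List<Object>> deletedKeys = new HashSet<>();
    for (Map<Integer, Object> deleteRow : deleteRows) {
      deletedKeys.add(key(deleteRow, equalityFieldIds));
    }
    List<Map<Integer, Object>> live = new ArrayList<>();
    for (Map<Integer, Object> row : dataRows) {
      if (!deletedKeys.contains(key(row, equalityFieldIds))) {
        live.add(row);
      }
    }
    return live;
  }
}
```

With equality field IDs [1], a delete record {1: 7, 2: "extra"} removes every data row whose field 1 equals 7, regardless of what field 2 holds.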
      • sortOrderId

        default java.lang.Integer sortOrderId()
        Returns the sort order id of this file, which describes how the file is ordered. This information will be useful for merging data and equality delete files more efficiently when they share the same sort order id.
      • copy

        F copy()
        Copies this file. Manifest readers can reuse file instances; use this method to copy data when collecting files from tasks.
        Returns:
        a copy of this data file
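The reason for copying becomes clear with a small simulation of instance reuse. In the sketch below, `MutableFile` is a hypothetical stand-in for a file object that a reader refills for each entry; it is not an Iceberg class. Collecting the reused instance directly leaves every list element pointing at the last entry, while collecting copies preserves each one.

```java
import java.util.ArrayList;
import java.util.List;

final class ReusedFileSketch {
  private ReusedFileSketch() {}

  /** Hypothetical mutable stand-in for a file object that a reader refills for each entry. */
  static final class MutableFile {
    String path;

    MutableFile copy() {
      MutableFile copied = new MutableFile();
      copied.path = path;
      return copied;
    }
  }

  /** Simulates a reader that reuses one instance, collecting either the instance or a copy. */
  static List<String> collectPaths(List<String> paths, boolean copyEach) {
    MutableFile reused = new MutableFile();
    List<MutableFile> collected = new ArrayList<>();
    for (String path : paths) {
      reused.path = path; // the reader overwrites the same object on every step
      collected.add(copyEach ? reused.copy() : reused);
    }
    List<String> result = new ArrayList<>();
    for (MutableFile file : collected) {
      result.add(file.path);
    }
    return result;
  }
}
```

Calling `collectPaths` with the paths "a.parquet" and "b.parquet" returns "b.parquet" twice when `copyEach` is false, and both original paths when it is true.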
      • copyWithoutStats

        F copyWithoutStats()
        Copies this file without file stats. Manifest readers can reuse file instances; use this method to copy data without stats when collecting files.
        Returns:
        a copy of this data file, without lower bounds, upper bounds, value counts, null value counts, or NaN value counts
      • copy

        default F copy(boolean withStats)
        Copies this file (potentially without file stats). Manifest readers can reuse file instances; use this method to copy data when collecting files from tasks.
        Parameters:
        withStats - Will copy this file without file stats if set to false.
        Returns:
        a copy of this data file. If withStats is set to false, the file will not contain lower bounds, upper bounds, value counts, null value counts, or NaN value counts
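One plausible shape for this default method is a simple delegation to the two abstract copy methods. The sketch below shows that shape under the assumption that nothing else happens inside the method; the text above does not state the implementation. `CopyableFile` and `StatsFile` are hypothetical stand-ins, with a single stats map representing all file statistics.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for the three copy methods described above; not the Iceberg interface itself. */
interface CopyableFile<F> {
  F copy();

  F copyWithoutStats();

  /** Assumed delegation: keep stats when asked to, drop them otherwise. */
  default F copy(boolean withStats) {
    return withStats ? copy() : copyWithoutStats();
  }
}

/** Hypothetical implementation that carries a single stats map. */
final class StatsFile implements CopyableFile<StatsFile> {
  final String path;
  final Map<Integer, Long> valueCounts; // null once stats have been dropped

  StatsFile(String path, Map<Integer, Long> valueCounts) {
    this.path = path;
    this.valueCounts = valueCounts;
  }

  @Override
  public StatsFile copy() {
    // Copy the map as well, so the new instance does not share mutable state.
    Map<Integer, Long> copiedCounts = valueCounts == null ? null : new HashMap<>(valueCounts);
    return new StatsFile(path, copiedCounts);
  }

  @Override
  public StatsFile copyWithoutStats() {
    return new StatsFile(path, null);
  }
}
```

A caller that only needs paths and sizes can pass false to keep collected files small, since the per-column maps are usually the largest part of a file's metadata.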