Package org.apache.iceberg.spark
Class SparkContentFile<F>
- java.lang.Object
  - org.apache.iceberg.spark.SparkContentFile<F>
- All Implemented Interfaces:
ContentFile<F>
- Direct Known Subclasses:
SparkDataFile, SparkDeleteFile
public abstract class SparkContentFile<F> extends java.lang.Object implements ContentFile<F>
-
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| protected abstract F | asFile() | |
| java.util.Map<java.lang.Integer,java.lang.Long> | columnSizes() | Returns if collected, map from column ID to the size of the column in bytes, null otherwise. |
| FileContent | content() | Returns type of content stored in the file; one of DATA, POSITION_DELETES, or EQUALITY_DELETES. |
| F | copy() | Copies this file. |
| F | copyWithoutStats() | Copies this file without file stats. |
| java.util.List<java.lang.Integer> | equalityFieldIds() | Returns the set of field IDs used for equality comparison, in equality delete files. |
| long | fileSizeInBytes() | Returns the file size in bytes. |
| FileFormat | format() | Returns format of the file. |
| java.nio.ByteBuffer | keyMetadata() | Returns metadata about how this file is encrypted, or null if the file is stored in plain text. |
| java.util.Map<java.lang.Integer,java.nio.ByteBuffer> | lowerBounds() | Returns if collected, map from column ID to value lower bounds, null otherwise. |
| java.util.Map<java.lang.Integer,java.lang.Long> | nanValueCounts() | Returns if collected, map from column ID to its NaN value count, null otherwise. |
| java.util.Map<java.lang.Integer,java.lang.Long> | nullValueCounts() | Returns if collected, map from column ID to its null value count, null otherwise. |
| StructLike | partition() | Returns partition for this file as a StructLike. |
| java.lang.CharSequence | path() | Returns fully qualified path to the file, suitable for constructing a Hadoop Path. |
| java.lang.Long | pos() | Returns the ordinal position of the file in a manifest, or null if it was not read from a manifest. |
| long | recordCount() | Returns the number of top-level records in the file. |
| java.lang.Integer | sortOrderId() | Returns the sort order id of this file, which describes how the file is ordered. |
| int | specId() | Returns id of the partition spec used for partition metadata. |
| java.util.List<java.lang.Long> | splitOffsets() | Returns list of recommended split locations, if applicable, null otherwise. |
| java.util.Map<java.lang.Integer,java.nio.ByteBuffer> | upperBounds() | Returns if collected, map from column ID to value upper bounds, null otherwise. |
| java.util.Map<java.lang.Integer,java.lang.Long> | valueCounts() | Returns if collected, map from column ID to the count of its values (including null and NaN values), null otherwise. |
| F | wrap(org.apache.spark.sql.Row row) | |
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.iceberg.ContentFile
copy, copyWithStats, dataSequenceNumber, fileSequenceNumber
-
Method Detail
-
wrap
public F wrap(org.apache.spark.sql.Row row)
-
asFile
protected abstract F asFile()
-
pos
public java.lang.Long pos()
Description copied from interface: ContentFile
Returns the ordinal position of the file in a manifest, or null if it was not read from a manifest.
- Specified by:
pos in interface ContentFile<F>
-
specId
public int specId()
Description copied from interface: ContentFile
Returns id of the partition spec used for partition metadata.
- Specified by:
specId in interface ContentFile<F>
-
content
public FileContent content()
Description copied from interface: ContentFile
Returns type of content stored in the file; one of DATA, POSITION_DELETES, or EQUALITY_DELETES.
- Specified by:
content in interface ContentFile<F>
-
path
public java.lang.CharSequence path()
Description copied from interface: ContentFile
Returns fully qualified path to the file, suitable for constructing a Hadoop Path.
- Specified by:
path in interface ContentFile<F>
-
format
public FileFormat format()
Description copied from interface: ContentFile
Returns format of the file.
- Specified by:
format in interface ContentFile<F>
-
partition
public StructLike partition()
Description copied from interface: ContentFile
Returns partition for this file as a StructLike.
- Specified by:
partition in interface ContentFile<F>
-
recordCount
public long recordCount()
Description copied from interface: ContentFile
Returns the number of top-level records in the file.
- Specified by:
recordCount in interface ContentFile<F>
-
fileSizeInBytes
public long fileSizeInBytes()
Description copied from interface: ContentFile
Returns the file size in bytes.
- Specified by:
fileSizeInBytes in interface ContentFile<F>
-
columnSizes
public java.util.Map<java.lang.Integer,java.lang.Long> columnSizes()
Description copied from interface: ContentFile
Returns if collected, map from column ID to the size of the column in bytes, null otherwise.
- Specified by:
columnSizes in interface ContentFile<F>
-
valueCounts
public java.util.Map<java.lang.Integer,java.lang.Long> valueCounts()
Description copied from interface: ContentFile
Returns if collected, map from column ID to the count of its values (including null and NaN values), null otherwise.
- Specified by:
valueCounts in interface ContentFile<F>
-
nullValueCounts
public java.util.Map<java.lang.Integer,java.lang.Long> nullValueCounts()
Description copied from interface: ContentFile
Returns if collected, map from column ID to its null value count, null otherwise.
- Specified by:
nullValueCounts in interface ContentFile<F>
-
nanValueCounts
public java.util.Map<java.lang.Integer,java.lang.Long> nanValueCounts()
Description copied from interface: ContentFile
Returns if collected, map from column ID to its NaN value count, null otherwise.
- Specified by:
nanValueCounts in interface ContentFile<F>
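The count maps above are usually read together, and each of them may be null when the statistic was not collected. The following is a minimal sketch of null-safe access, assuming plain java.util maps standing in for the values returned by valueCounts() and nullValueCounts(). The class ColumnCounts and its methods are hypothetical helpers for illustration and are not part of Iceberg.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for illustration; not part of the Iceberg API.
final class ColumnCounts {
  private ColumnCounts() {}

  // Returns the count for a column, or empty when the map or the entry is missing.
  static Optional<Long> count(Map<Integer, Long> counts, int columnId) {
    if (counts == null) {
      return Optional.empty();
    }
    return Optional.ofNullable(counts.get(columnId));
  }

  // True only when both counts are known and every value in the column is null.
  // Missing statistics yield false, because nothing can be proven without them.
  static boolean containsOnlyNulls(
      Map<Integer, Long> valueCounts, Map<Integer, Long> nullValueCounts, int columnId) {
    Optional<Long> values = count(valueCounts, columnId);
    Optional<Long> nulls = count(nullValueCounts, columnId);
    return values.isPresent() && nulls.isPresent() && values.get().equals(nulls.get());
  }
}
```

Returning Optional rather than a default of zero keeps "not collected" distinct from "collected and zero", which matters when the result is used to skip files.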
-
lowerBounds
public java.util.Map<java.lang.Integer,java.nio.ByteBuffer> lowerBounds()
Description copied from interface: ContentFile
Returns if collected, map from column ID to value lower bounds, null otherwise.
- Specified by:
lowerBounds in interface ContentFile<F>
-
upperBounds
public java.util.Map<java.lang.Integer,java.nio.ByteBuffer> upperBounds()
Description copied from interface: ContentFile
Returns if collected, map from column ID to value upper bounds, null otherwise.
- Specified by:
upperBounds in interface ContentFile<F>
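Bounds are returned as raw ByteBuffer values, so they have to be decoded before they can be compared. The sketch below assumes a column of Iceberg type long, whose single-value serialization is 8 bytes in little-endian order, and uses plain java.util maps in place of the results of lowerBounds() and upperBounds(). The class BoundsCheck and its methods are hypothetical. Iceberg provides its own conversion utilities for this, and the sketch avoids them only to stay self-contained.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Iceberg API.
final class BoundsCheck {
  private BoundsCheck() {}

  // Decodes a bound of a long column. Works on a duplicate so the
  // caller's buffer position and byte order are left untouched.
  static long decodeLong(ByteBuffer bound) {
    return bound.duplicate().order(ByteOrder.LITTLE_ENDIAN).getLong();
  }

  // Encodes a long as 8 little-endian bytes (used here to build sample bounds).
  static ByteBuffer encodeLong(long value) {
    ByteBuffer buffer = ByteBuffer.allocate(Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
    buffer.putLong(0, value); // absolute put keeps the position at 0
    return buffer;
  }

  // Returns false only when the bounds prove the value cannot be in the file.
  // Missing maps or missing entries mean "might contain".
  static boolean mightContain(
      Map<Integer, ByteBuffer> lowerBounds,
      Map<Integer, ByteBuffer> upperBounds,
      int columnId,
      long value) {
    if (lowerBounds == null || upperBounds == null) {
      return true;
    }
    ByteBuffer lower = lowerBounds.get(columnId);
    ByteBuffer upper = upperBounds.get(columnId);
    if (lower == null || upper == null) {
      return true;
    }
    return value >= decodeLong(lower) && value <= decodeLong(upper);
  }
}
```

The check is deliberately conservative: a file is ruled out only when statistics exist and exclude the value.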
-
keyMetadata
public java.nio.ByteBuffer keyMetadata()
Description copied from interface: ContentFile
Returns metadata about how this file is encrypted, or null if the file is stored in plain text.
- Specified by:
keyMetadata in interface ContentFile<F>
-
copy
public F copy()
Description copied from interface: ContentFile
Copies this file. Manifest readers can reuse file instances; use this method to copy data when collecting files from tasks.
- Specified by:
copy in interface ContentFile<F>
- Returns:
a copy of this data file
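The reason for copying is easier to see with a reused instance. The sketch below assumes a wrapper that is re-pointed at each new record and returned as the same object, which is the reuse pattern the description warns about. ReusedFile is a hypothetical stand-in and not the Iceberg class, and its string path stands in for the wrapped row.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a reusable file wrapper; not the Iceberg class.
final class ReusedFile {
  private String path;

  // Points this instance at a new record and returns the same instance.
  ReusedFile wrap(String newPath) {
    this.path = newPath;
    return this;
  }

  String path() {
    return path;
  }

  // Creates an independent instance holding the current state.
  ReusedFile copy() {
    return new ReusedFile().wrap(path);
  }

  // Collects files while one wrapper is reused for every record,
  // then reports the paths seen through the collected references.
  static List<String> collectPaths(List<String> records, boolean copyEach) {
    ReusedFile reused = new ReusedFile();
    List<ReusedFile> collected = new ArrayList<>();
    for (String record : records) {
      ReusedFile current = reused.wrap(record);
      collected.add(copyEach ? current.copy() : current);
    }
    List<String> paths = new ArrayList<>();
    for (ReusedFile file : collected) {
      paths.add(file.path());
    }
    return paths;
  }
}
```

Without copying, every collected reference points at the same object, so all entries report the last record. With copying, each entry keeps the state it had when it was collected.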
-
copyWithoutStats
public F copyWithoutStats()
Description copied from interface: ContentFile
Copies this file without file stats. Manifest readers can reuse file instances; use this method to copy data without stats when collecting files.
- Specified by:
copyWithoutStats in interface ContentFile<F>
- Returns:
a copy of this data file, without lower bounds, upper bounds, value counts, null value counts, or NaN value counts
-
splitOffsets
public java.util.List<java.lang.Long> splitOffsets()
Description copied from interface: ContentFile
Returns list of recommended split locations, if applicable, null otherwise. When available, this information is used for planning scan tasks whose boundaries are determined by these offsets. The returned list must be sorted in ascending order.
- Specified by:
splitOffsets in interface ContentFile<F>
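As an illustration of how sorted offsets determine task boundaries, the sketch below turns a list of offsets and a file size into byte ranges. It is not Iceberg's planner: SplitRanges is a hypothetical helper, each range is modeled as a {start, length} pair, and a null or empty offset list is assumed to mean that the whole file is read as one range.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not Iceberg's scan planning code.
final class SplitRanges {
  private SplitRanges() {}

  // Returns {start, length} pairs in bytes. Each range runs from one offset
  // to the next, and the last range runs to the end of the file.
  static List<long[]> ranges(List<Long> splitOffsets, long fileSizeInBytes) {
    List<long[]> result = new ArrayList<>();
    if (splitOffsets == null || splitOffsets.isEmpty()) {
      // Assumption: no recommended splits means a single whole-file range.
      result.add(new long[] {0L, fileSizeInBytes});
      return result;
    }
    for (int i = 0; i < splitOffsets.size(); i++) {
      long start = splitOffsets.get(i);
      long end = (i + 1 < splitOffsets.size()) ? splitOffsets.get(i + 1) : fileSizeInBytes;
      if (end < start) {
        throw new IllegalArgumentException("split offsets must be sorted in ascending order");
      }
      result.add(new long[] {start, end - start});
    }
    return result;
  }
}
```

The ascending-order requirement is what makes the single pass valid: each range can be closed by the next offset without sorting or searching.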
-
sortOrderId
public java.lang.Integer sortOrderId()
Description copied from interface: ContentFile
Returns the sort order id of this file, which describes how the file is ordered. This information will be useful for merging data and equality delete files more efficiently when they share the same sort order id.
- Specified by:
sortOrderId in interface ContentFile<F>
-
equalityFieldIds
public java.util.List<java.lang.Integer> equalityFieldIds()
Description copied from interface: ContentFile
Returns the set of field IDs used for equality comparison, in equality delete files. An equality delete file may contain additional data fields that are not used by equality comparison. The subset of columns in a delete file to be used in equality comparison is tracked by ID. Extra columns can be used to reconstruct changes, and metrics from extra columns are used during job planning.
- Specified by:
equalityFieldIds in interface ContentFile<F>
- Returns:
IDs of the fields used in equality comparison with the records in this delete file
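The role of the ID subset can be sketched as follows. Rows are modeled here as maps from field ID to value, which is an assumption made for illustration and not how Iceberg represents records. EqualityMatch is a hypothetical helper. Only the fields listed in equalityFieldIds take part in the comparison, so extra columns in the delete row are ignored.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper for illustration; not part of the Iceberg API.
final class EqualityMatch {
  private EqualityMatch() {}

  // Returns true when the data row equals the delete row on every equality field.
  // Fields outside equalityFieldIds are never compared.
  static boolean matches(
      Map<Integer, Object> dataRow,
      Map<Integer, Object> deleteRow,
      List<Integer> equalityFieldIds) {
    if (equalityFieldIds == null || equalityFieldIds.isEmpty()) {
      // Without equality fields there is nothing to match on.
      return false;
    }
    for (int fieldId : equalityFieldIds) {
      // In this simplified model a missing entry and a null value are treated alike.
      if (!Objects.equals(dataRow.get(fieldId), deleteRow.get(fieldId))) {
        return false;
      }
    }
    return true;
  }
}
```

In this model, a delete row that carries field 3 as extra context still matches a data row on field 1 alone when equalityFieldIds is [1].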
-
-