Class TableMigrationUtil


  • public class TableMigrationUtil
    extends java.lang.Object
    • Method Summary

      Modifier and Type Method Description
      static java.util.List<DataFile> listPartition(java.util.Map<java.lang.String,java.lang.String> partition, java.lang.String uri, java.lang.String format, PartitionSpec spec, org.apache.hadoop.conf.Configuration conf, MetricsConfig metricsConfig, NameMapping mapping)
      Returns the data files in a partition by listing the partition location.
      static java.util.List<DataFile> listPartition(java.util.Map<java.lang.String,java.lang.String> partition, java.lang.String partitionUri, java.lang.String format, PartitionSpec spec, org.apache.hadoop.conf.Configuration conf, MetricsConfig metricsSpec, NameMapping mapping, int parallelism)
      Returns the data files in a partition by listing the partition location.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • listPartition

        public static java.util.List<DataFile> listPartition(java.util.Map<java.lang.String,java.lang.String> partition,
                                                             java.lang.String uri,
                                                             java.lang.String format,
                                                             PartitionSpec spec,
                                                             org.apache.hadoop.conf.Configuration conf,
                                                             MetricsConfig metricsConfig,
                                                             NameMapping mapping)
        Returns the data files in a partition by listing the partition location.

        For Parquet and ORC partitions, this method reads metrics from each file's footer. For Avro partitions, all metrics other than row count are set to null.

        Note: metrics that Iceberg file writers record but file footers do not contain, such as NaN counts, are not populated.

        Parameters:
        partition - map of column names to column values for the partition
        uri - partition location URI
        format - partition file format: avro, parquet, or orc
        spec - a partition spec
        conf - a Hadoop configuration
        metricsConfig - a metrics configuration
        mapping - a name mapping
        Returns:
        a List of DataFile
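
The partition parameter above is a map of column names to column values. As an illustration only, the following sketch builds such a map from a Hive-style relative path such as "dt=2021-01-01/region=us". The class PartitionPathExample and the method parsePartitionPath are hypothetical names and are not part of Iceberg. The column=value directory layout is an assumption about how the source table is stored, and the sketch does not decode escaped characters in values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Iceberg: builds a column-name to
// column-value map from a Hive-style partition path (assumed layout).
final class PartitionPathExample {
  private PartitionPathExample() {}

  static Map<String, String> parsePartitionPath(String relativePath) {
    // LinkedHashMap keeps the columns in the order they appear in the path.
    Map<String, String> partition = new LinkedHashMap<>();
    for (String segment : relativePath.split("/")) {
      if (segment.isEmpty()) {
        continue; // tolerate leading or doubled slashes
      }
      int eq = segment.indexOf('=');
      if (eq <= 0) {
        throw new IllegalArgumentException("Not a column=value segment: " + segment);
      }
      // Everything before the first '=' is the column, the rest is the value.
      partition.put(segment.substring(0, eq), segment.substring(eq + 1));
    }
    return partition;
  }
}
```

The resulting map could then be passed as the partition argument, provided its keys match the column names used by the PartitionSpec.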
      • listPartition

        public static java.util.List<DataFile> listPartition(java.util.Map<java.lang.String,java.lang.String> partition,
                                                             java.lang.String partitionUri,
                                                             java.lang.String format,
                                                             PartitionSpec spec,
                                                             org.apache.hadoop.conf.Configuration conf,
                                                             MetricsConfig metricsSpec,
                                                             NameMapping mapping,
                                                             int parallelism)
        Returns the data files in a partition by listing the partition location. Metrics are read from the files, and the files are read in parallel by the specified number of threads.

        For Parquet and ORC partitions, this method reads metrics from each file's footer. For Avro partitions, all metrics other than row count are set to null.

        Note: metrics that Iceberg file writers record but file footers do not contain, such as NaN counts, are not populated.

        Parameters:
        partition - map of column names to column values for the partition
        partitionUri - partition location URI
        format - partition file format: avro, parquet, or orc
        spec - a partition spec
        conf - a Hadoop configuration
        metricsSpec - a metrics configuration
        mapping - a name mapping
        parallelism - number of threads to use for file reading
        Returns:
        a List of DataFile
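
The parallelism parameter above sets how many threads read files. The following is a minimal sketch of that general pattern using a fixed-size thread pool from the Java standard library. It is not Iceberg's implementation: the class ParallelListingSketch, the method readAll, and the readOne function (standing in for reading one file's metrics) are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch, not Iceberg code: applies readOne to every path
// using at most `parallelism` threads and returns results in input order.
final class ParallelListingSketch {
  private ParallelListingSketch() {}

  static <T> List<T> readAll(List<String> paths, Function<String, T> readOne, int parallelism) {
    if (parallelism < 1) {
      throw new IllegalArgumentException("parallelism must be at least 1");
    }
    ExecutorService pool = Executors.newFixedThreadPool(parallelism);
    try {
      // Submit one task per file; the pool bounds how many run at once.
      List<Future<T>> futures = new ArrayList<>();
      for (String path : paths) {
        Callable<T> task = () -> readOne.apply(path);
        futures.add(pool.submit(task));
      }
      // Collect in submission order so the output matches the input order.
      List<T> results = new ArrayList<>(futures.size());
      for (Future<T> future : futures) {
        results.add(future.get());
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      throw new IllegalStateException("Interrupted while reading files", e);
    } catch (ExecutionException e) {
      throw new RuntimeException("Failed to read a file", e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

In this sketch, a parallelism of 1 reads files one after another, while larger values let several reads overlap, which tends to help most when each read is dominated by I/O latency.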