Class TableMigrationUtil

java.lang.Object
org.apache.iceberg.data.TableMigrationUtil

public class TableMigrationUtil extends Object
  • Method Details

    • listPartition

      public static List<DataFile> listPartition(Map<String,String> partition, String uri, String format, PartitionSpec spec, org.apache.hadoop.conf.Configuration conf, MetricsConfig metricsConfig, NameMapping mapping)
      Returns the data files in a partition by listing the partition location.

      For Parquet and ORC partitions, this will read metrics from the file footer. For Avro partitions, metrics other than row count are set to null.

      Note: metrics that only Iceberg file writers record and file footers do not, such as NaN counts, will not be populated.

      Parameters:
      partition - map of column names to column values for the partition
      uri - partition location URI
      format - partition format: avro, parquet, or orc
      spec - a partition spec
      conf - a Hadoop configuration
      metricsConfig - a metrics configuration
      mapping - a name mapping
      Returns:
      a List of DataFile
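
      The format rule described above can be pictured with a small JDK-only sketch. It is not Iceberg code: the Format enum, parse, and hasFooterMetrics are hypothetical names introduced for illustration, and case-insensitive matching of the format name is an assumption. It only encodes the documented behaviour that Parquet and ORC footers supply metrics beyond the row count, while Avro does not.

```java
import java.util.Locale;

public class FormatMetricsSketch {

  // Hypothetical enum for the three format names listed in the parameter docs.
  public enum Format { AVRO, PARQUET, ORC }

  // Assumption: names are matched case-insensitively. Any other name is
  // rejected with the IllegalArgumentException thrown by Enum.valueOf.
  public static Format parse(String format) {
    return Format.valueOf(format.trim().toUpperCase(Locale.ROOT));
  }

  // Documented rule: Parquet and ORC footers carry metrics beyond the row
  // count; for Avro those metrics are null.
  public static boolean hasFooterMetrics(Format format) {
    return format == Format.PARQUET || format == Format.ORC;
  }

  public static void main(String[] args) {
    for (String name : new String[] {"avro", "parquet", "orc"}) {
      Format format = parse(name);
      System.out.println(name + " -> footer metrics available: " + hasFooterMetrics(format));
    }
  }
}
```

      A caller that consumes the returned DataFile list may want a similar check before relying on column-level metrics for Avro partitions.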
    • listPartition

      public static List<DataFile> listPartition(Map<String,String> partition, String partitionUri, String format, PartitionSpec spec, org.apache.hadoop.conf.Configuration conf, MetricsConfig metricsSpec, NameMapping mapping, int parallelism)
      Returns the data files in a partition by listing the partition location. Metrics are read from the files and the file reading is done in parallel by a specified number of threads.

      For Parquet and ORC partitions, this will read metrics from the file footer. For Avro partitions, metrics other than row count are set to null.

      Note: metrics that only Iceberg file writers record and file footers do not, such as NaN counts, will not be populated.

      Parameters:
      partition - map of column names to column values for the partition
      partitionUri - partition location URI
      format - partition format: avro, parquet, or orc
      spec - a partition spec
      conf - a Hadoop configuration
      metricsSpec - a metrics configuration
      mapping - a name mapping
      parallelism - number of threads to use for file reading
      Returns:
      a List of DataFile
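
      Reading files in parallel on a fixed number of threads can be sketched with the JDK alone. This is a minimal illustration of the pattern, not the Iceberg implementation: FileEntry, readAll, and the rowCounter function are hypothetical stand-ins for a data file, the listing routine, and the footer reader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelReadSketch {

  // Hypothetical stand-in for a data file entry: a path plus its row count.
  public static final class FileEntry {
    public final String path;
    public final long rowCount;

    public FileEntry(String path, long rowCount) {
      this.path = path;
      this.rowCount = rowCount;
    }
  }

  // Reads every path on a pool of `parallelism` threads and returns the
  // entries in input order. The pool is created and shut down here.
  public static List<FileEntry> readAll(
      List<String> paths, Function<String, Long> rowCounter, int parallelism) {
    ExecutorService pool = Executors.newFixedThreadPool(parallelism);
    try {
      List<Future<FileEntry>> pending = new ArrayList<>();
      for (String path : paths) {
        Callable<FileEntry> task = () -> new FileEntry(path, rowCounter.apply(path));
        pending.add(pool.submit(task));
      }
      List<FileEntry> result = new ArrayList<>();
      for (Future<FileEntry> future : pending) {
        result.add(future.get()); // blocks until that file has been read
      }
      return result;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while reading files", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Failed to read a file", e.getCause());
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    List<String> paths = List.of("part-0.parquet", "part-1.parquet", "part-2.parquet");
    // Fake reader: uses the name length as the "row count".
    List<FileEntry> files = readAll(paths, p -> (long) p.length(), 2);
    files.forEach(f -> System.out.println(f.path + " rows=" + f.rowCount));
  }
}
```

      Collecting the futures in submission order keeps the result order stable regardless of which thread finishes first.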
    • listPartition

      public static List<DataFile> listPartition(Map<String,String> partition, String partitionUri, String format, PartitionSpec spec, org.apache.hadoop.conf.Configuration conf, MetricsConfig metricsSpec, NameMapping mapping, ExecutorService service)
      Returns the data files in a partition by listing the partition location. Metrics are read from the files, and the file reading is done in parallel on the given executor service.

      For Parquet and ORC partitions, this will read metrics from the file footer. For Avro partitions, metrics other than row count are set to null.

      Note: metrics that only Iceberg file writers record and file footers do not, such as NaN counts, will not be populated.

      Parameters:
      partition - map of column names to column values for the partition
      partitionUri - partition location URI
      format - partition format: avro, parquet, or orc
      spec - a partition spec
      conf - a Hadoop configuration
      metricsSpec - a metrics configuration
      mapping - a name mapping
      service - executor service to use for file reading
      Returns:
      a List of DataFile
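
      Passing an executor service lets one pool be reused across several partitions. The JDK-only sketch below illustrates that usage pattern under the assumption that the caller owns the executor's lifecycle. totalRows is a hypothetical stand-in for per-partition work and is not part of Iceberg.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedExecutorSketch {

  // Hypothetical per-partition work: "reads" each file on the supplied
  // executor and sums the per-file row counts. It never shuts the executor
  // down, so the same pool can serve the next partition.
  public static long totalRows(List<Long> rowsPerFile, ExecutorService service) {
    List<Callable<Long>> tasks = new ArrayList<>();
    for (Long rows : rowsPerFile) {
      tasks.add(() -> rows);
    }
    try {
      long total = 0;
      // invokeAll waits for every task and returns futures in task order.
      for (Future<Long> future : service.invokeAll(tasks)) {
        total += future.get();
      }
      return total;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while reading files", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Failed to read a file", e.getCause());
    }
  }

  public static void main(String[] args) {
    ExecutorService service = Executors.newFixedThreadPool(4);
    try {
      // Two "partitions" share the same pool.
      System.out.println(totalRows(List.of(10L, 20L), service));
      System.out.println(totalRows(List.of(5L), service));
    } finally {
      service.shutdown(); // the caller releases the threads once all partitions are done
    }
  }
}
```

      Reusing one pool avoids creating and tearing down threads for every partition, at the cost of the caller having to shut it down explicitly.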
    • migrationService

      public static ExecutorService migrationService(int parallelism)