java.lang.Object

org.apache.iceberg.data.TableMigrationUtil

public class TableMigrationUtil extends Object

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| static List<DataFile> | listPartition(Map<String,String> partition, String uri, String format, PartitionSpec spec, org.apache.hadoop.conf.Configuration conf, MetricsConfig metricsConfig, NameMapping mapping) | Returns the data files in a partition by listing the partition location. |
| static List<DataFile> | listPartition(Map<String,String> partition, String partitionUri, String format, PartitionSpec spec, org.apache.hadoop.conf.Configuration conf, MetricsConfig metricsSpec, NameMapping mapping, int parallelism) | Returns the data files in a partition by listing the partition location. |
| static List<DataFile> | listPartition(Map<String,String> partition, String partitionUri, String format, PartitionSpec spec, org.apache.hadoop.conf.Configuration conf, MetricsConfig metricsSpec, NameMapping mapping, ExecutorService service) | Returns the data files in a partition by listing the partition location. |
| static ExecutorService | migrationService(int parallelism) | |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Details
- listPartition
  
  public static List<DataFile> listPartition(Map<String,String> partition, String uri, String format, PartitionSpec spec, org.apache.hadoop.conf.Configuration conf, MetricsConfig metricsConfig, NameMapping mapping)
  
  Returns the data files in a partition by listing the partition location.
  For Parquet and ORC partitions, this will read metrics from the file footer. For Avro partitions, metrics other than row count are set to null.
  Note: certain metrics, like NaN counts, that are only supported by Iceberg file writers but not file footers, will not be populated.
  
  Parameters:
  
  partition - map of partition column names to column values
  
  uri - partition location URI
  
  format - partition file format: avro, parquet, or orc
  
  spec - a partition spec
  
  conf - a Hadoop configuration
  
  metricsConfig - a metrics configuration
  
  mapping - a name mapping
  
  Returns:
  
  a List of DataFile
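
To make the `partition` and `uri` arguments concrete, the sketch below builds a partition map and the Hive-style directory that such a map conventionally corresponds to. The class `PartitionArgsSketch`, the helper `hiveStylePath`, and the table root are hypothetical and used only for illustration. Nothing here calls Iceberg, and a real partition location need not follow this layout.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class PartitionArgsSketch {
  // Hypothetical helper: joins column=value pairs in insertion order
  // under the given table root (assumed Hive-style layout).
  static String hiveStylePath(String tableRoot, Map<String, String> partition) {
    String suffix = partition.entrySet().stream()
        .map(e -> e.getKey() + "=" + e.getValue())
        .collect(Collectors.joining("/"));
    return tableRoot + "/" + suffix;
  }

  public static void main(String[] args) {
    // Column names map to column values, both as strings.
    Map<String, String> partition = new LinkedHashMap<>();
    partition.put("dt", "2024-01-01");
    partition.put("region", "eu");

    // Hypothetical table root; the result is what would be passed as the URI.
    System.out.println(hiveStylePath("hdfs://nn/warehouse/db/events", partition));
  }
}
```

A `LinkedHashMap` is used so that the column order in the path matches the order in which the columns were added.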
- listPartition
  
  public static List<DataFile> listPartition(Map<String,String> partition, String partitionUri, String format, PartitionSpec spec, org.apache.hadoop.conf.Configuration conf, MetricsConfig metricsSpec, NameMapping mapping, int parallelism)
  
  Returns the data files in a partition by listing the partition location. Metrics are read from the files, and the files are read in parallel by the specified number of threads.
  For Parquet and ORC partitions, this will read metrics from the file footer. For Avro partitions, metrics other than row count are set to null.
  Note: certain metrics, like NaN counts, that are only supported by Iceberg file writers but not file footers, will not be populated.
  
  Parameters:
  
  partition - map of partition column names to column values
  
  partitionUri - partition location URI
  
  format - partition file format: avro, parquet, or orc
  
  spec - a partition spec
  
  conf - a Hadoop configuration
  
  metricsSpec - a metrics configuration
  
  mapping - a name mapping
  
  parallelism - number of threads to use for file reading
  
  Returns:
  
  a List of DataFile
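
The general pattern of listing a location and then reading each file on a fixed number of threads can be sketched with the JDK alone. This is an illustration of the idea, not Iceberg's implementation: the class `ParallelListingSketch` and the method `fileSizes` are hypothetical, and file size stands in for the metrics that would be read from a file footer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ParallelListingSketch {
  // Lists the regular files in dir, then reads each file's size on a
  // fixed-size pool. File size is a stand-in for footer metrics.
  static Map<String, Long> fileSizes(Path dir, int parallelism)
      throws IOException, InterruptedException, ExecutionException {
    List<Path> files;
    try (Stream<Path> listing = Files.list(dir)) {
      files = listing.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
    }

    ExecutorService pool = Executors.newFixedThreadPool(parallelism);
    try {
      // Submit one task per file; at most `parallelism` run at once.
      List<Future<Long>> pending = new ArrayList<>();
      for (Path file : files) {
        Callable<Long> task = () -> Files.size(file);
        pending.add(pool.submit(task));
      }
      // Collect results in listing order.
      Map<String, Long> sizes = new LinkedHashMap<>();
      for (int i = 0; i < files.size(); i++) {
        sizes.put(files.get(i).getFileName().toString(), pending.get(i).get());
      }
      return sizes;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    Path dir = Files.createTempDirectory("partition");
    Files.write(dir.resolve("part-0.parquet"), new byte[16]);
    Files.write(dir.resolve("part-1.parquet"), new byte[32]);
    System.out.println(fileSizes(dir, 2));
  }
}
```

In this sketch the pool is created and shut down inside the call, which keeps the caller's code simple but means a new pool is created for every partition.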
- listPartition
  
  public static List<DataFile> listPartition(Map<String,String> partition, String partitionUri, String format, PartitionSpec spec, org.apache.hadoop.conf.Configuration conf, MetricsConfig metricsSpec, NameMapping mapping, ExecutorService service)
  
  Returns the data files in a partition by listing the partition location. Metrics are read from the files, and the files are read in parallel using the supplied executor service.
  For Parquet and ORC partitions, this will read metrics from the file footer. For Avro partitions, metrics other than row count are set to null.
  Note: certain metrics, like NaN counts, that are only supported by Iceberg file writers but not file footers, will not be populated.
  
  Parameters:
  
  partition - map of partition column names to column values
  
  partitionUri - partition location URI
  
  format - partition file format: avro, parquet, or orc
  
  spec - a partition spec
  
  conf - a Hadoop configuration
  
  metricsSpec - a metrics configuration
  
  mapping - a name mapping
  
  service - executor service to use for file reading
  
  Returns:
  
  a List of DataFile
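
Accepting an `ExecutorService` lets one pool be shared across many partitions, with the caller responsible for shutting it down. The JDK-only sketch below illustrates that ownership pattern under stated assumptions: `SharedExecutorSketch` and `readAll` are hypothetical, and the length of each file name stands in for the per-file metric reading. It does not call Iceberg.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SharedExecutorSketch {
  // Hypothetical stand-in for per-file work: computes each name's length
  // on the caller-supplied executor and never shuts that executor down.
  static List<Integer> readAll(List<String> fileNames, ExecutorService service)
      throws InterruptedException, ExecutionException {
    List<Callable<Integer>> tasks = new ArrayList<>();
    for (String name : fileNames) {
      tasks.add(name::length);
    }
    List<Integer> results = new ArrayList<>();
    // invokeAll returns futures in the same order as the tasks.
    for (Future<Integer> f : service.invokeAll(tasks)) {
      results.add(f.get());
    }
    return results;
  }

  public static void main(String[] args) throws Exception {
    ExecutorService service = Executors.newFixedThreadPool(4); // caller-owned
    try {
      // The same pool serves two separate "partitions".
      System.out.println(readAll(List.of("a.orc", "bb.orc"), service));
      System.out.println(readAll(List.of("c.orc"), service));
    } finally {
      service.shutdown();
    }
  }
}
```

Compared with passing a thread count, this design avoids creating a pool per call and lets the caller bound total concurrency across partitions.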
- migrationService
  
  public static ExecutorService migrationService(int parallelism)
