Class TableMigrationUtil
-
Method Summary
Modifier and Type, Method, Description

static List<DataFile>
listPartition(Map<String, String> partition, String uri, String format, PartitionSpec spec, org.apache.hadoop.conf.Configuration conf, MetricsConfig metricsConfig, NameMapping mapping)
Returns the data files in a partition by listing the partition location.

static List<DataFile>
listPartition(Map<String, String> partition, String partitionUri, String format, PartitionSpec spec, org.apache.hadoop.conf.Configuration conf, MetricsConfig metricsSpec, NameMapping mapping, int parallelism)
Returns the data files in a partition by listing the partition location.

static List<DataFile>
listPartition(Map<String, String> partition, String partitionUri, String format, PartitionSpec spec, org.apache.hadoop.conf.Configuration conf, MetricsConfig metricsSpec, NameMapping mapping, ExecutorService service)
Returns the data files in a partition by listing the partition location.

static ExecutorService
migrationService(int parallelism)
-
Method Details
-
listPartition
public static List<DataFile> listPartition(Map<String, String> partition, String uri, String format, PartitionSpec spec, org.apache.hadoop.conf.Configuration conf, MetricsConfig metricsConfig, NameMapping mapping)
Returns the data files in a partition by listing the partition location.
For Parquet and ORC partitions, this will read metrics from the file footer. For Avro partitions, metrics other than row count are set to null.
Note: certain metrics, like NaN counts, that are only supported by Iceberg file writers but not file footers, will not be populated.
Parameters:
partition - map of column names to column values for the partition
uri - partition location URI
format - partition format: avro, parquet, or orc
spec - a partition spec
conf - a Hadoop conf
metricsConfig - a metrics conf
mapping - a name mapping
Returns:
a List of DataFile
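The partition argument is a map of column names to column values. As a minimal sketch of how a caller might build that map, the helper below parses a Hive-style relative path such as dt=2024-01-01/region=us. The class PartitionPathExample and the method parsePartitionPath are hypothetical and not part of this API, and the Hive-style key=value layout is an assumption about how the partition directory is named. Escaped characters in values are not decoded here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of TableMigrationUtil: builds a partition map
// from a Hive-style relative path (assumed layout: key=value/key=value).
class PartitionPathExample {

  static Map<String, String> parsePartitionPath(String relativePath) {
    // LinkedHashMap keeps the columns in the order they appear in the path.
    Map<String, String> partition = new LinkedHashMap<>();
    for (String segment : relativePath.split("/")) {
      if (segment.isEmpty()) {
        continue; // tolerate leading, trailing, or doubled slashes
      }
      int eq = segment.indexOf('=');
      if (eq <= 0) {
        throw new IllegalArgumentException("Not a key=value segment: " + segment);
      }
      partition.put(segment.substring(0, eq), segment.substring(eq + 1));
    }
    return partition;
  }

  public static void main(String[] args) {
    Map<String, String> partition = parsePartitionPath("dt=2024-01-01/region=us");
    System.out.println(partition); // {dt=2024-01-01, region=us}
  }
}
```

A map built this way could then be passed as the first argument of listPartition, together with the partition location URI and the format name.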
-
listPartition
public static List<DataFile> listPartition(Map<String, String> partition, String partitionUri, String format, PartitionSpec spec, org.apache.hadoop.conf.Configuration conf, MetricsConfig metricsSpec, NameMapping mapping, int parallelism)
Returns the data files in a partition by listing the partition location. Metrics are read from the files, and the file reading is done in parallel by a specified number of threads.
For Parquet and ORC partitions, this will read metrics from the file footer. For Avro partitions, metrics other than row count are set to null.
Note: certain metrics, like NaN counts, that are only supported by Iceberg file writers but not file footers, will not be populated.
Parameters:
partition - map of column names to column values for the partition
partitionUri - partition location URI
format - partition format: avro, parquet, or orc
spec - a partition spec
conf - a Hadoop conf
metricsSpec - a metrics conf
mapping - a name mapping
parallelism - number of threads to use for file reading
Returns:
a List of DataFile
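The parallelism argument sets how many threads read files at once. The sketch below illustrates that general pattern with the standard library only: a fixed-size pool, one task per file, results collected in input order. It is not the library's implementation. ParallelReadExample, readAll, and the reader function are hypothetical stand-ins, and the "metric" is just the length of the file name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Illustrative pattern only; the stand-in reader replaces real footer parsing.
class ParallelReadExample {

  static <T> List<T> readAll(List<String> files, Function<String, T> reader, int parallelism) {
    ExecutorService pool = Executors.newFixedThreadPool(parallelism);
    try {
      // Submit one task per file; at most `parallelism` run at the same time.
      List<Future<T>> pending = new ArrayList<>();
      for (String file : files) {
        Callable<T> task = () -> reader.apply(file);
        pending.add(pool.submit(task));
      }
      // Waiting on the futures in submission order keeps results in input order.
      List<T> results = new ArrayList<>();
      for (Future<T> future : pending) {
        results.add(future.get());
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while reading files", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Reading a file failed", e.getCause());
    } finally {
      pool.shutdown(); // this sketch owns the pool, so it also releases it
    }
  }

  public static void main(String[] args) {
    List<String> files = List.of("a.parquet", "bb.parquet", "ccc.parquet");
    List<Integer> lengths = readAll(files, String::length, 2);
    System.out.println(lengths); // [9, 10, 11]
  }
}
```

Because the pool is created and shut down inside the call, this shape suits a one-off listing; reusing threads across many partitions calls for a caller-supplied executor instead.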
-
listPartition
public static List<DataFile> listPartition(Map<String, String> partition, String partitionUri, String format, PartitionSpec spec, org.apache.hadoop.conf.Configuration conf, MetricsConfig metricsSpec, NameMapping mapping, ExecutorService service)
Returns the data files in a partition by listing the partition location. Metrics are read from the files, and the file reading is done in parallel using the provided executor service.
For Parquet and ORC partitions, this will read metrics from the file footer. For Avro partitions, metrics other than row count are set to null.
Note: certain metrics, like NaN counts, that are only supported by Iceberg file writers but not file footers, will not be populated.
Parameters:
partition - map of column names to column values for the partition
partitionUri - partition location URI
format - partition format: avro, parquet, or orc
spec - a partition spec
conf - a Hadoop conf
metricsSpec - a metrics conf
mapping - a name mapping
service - executor service to use for file reading
Returns:
a List of DataFile
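Passing an ExecutorService lets one pool serve several partitions. The sketch below shows that ownership pattern under stated assumptions: the caller creates the executor, lends it to each listing call, and shuts it down once at the end. SharedExecutorExample and its listFiles method are hypothetical stand-ins that return made-up file names; they do not call this API. Whether the callee leaves the executor open, as it does here, is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SharedExecutorExample {

  // Stand-in for a per-partition listing call. It borrows the executor and
  // never shuts it down; the caller stays responsible for its lifecycle.
  static List<String> listFiles(String partitionUri, int fileCount, ExecutorService service) {
    List<Callable<String>> tasks = new ArrayList<>();
    for (int i = 0; i < fileCount; i++) {
      String file = partitionUri + "/part-" + i; // hypothetical file name
      tasks.add(() -> file);
    }
    List<String> files = new ArrayList<>();
    try {
      // invokeAll returns futures in the same order as the submitted tasks.
      for (Future<String> future : service.invokeAll(tasks)) {
        files.add(future.get());
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while listing " + partitionUri, e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Listing failed for " + partitionUri, e.getCause());
    }
    return files;
  }

  public static void main(String[] args) {
    ExecutorService service = Executors.newFixedThreadPool(4);
    try {
      List<String> all = new ArrayList<>();
      all.addAll(listFiles("/warehouse/t/dt=1", 2, service));
      all.addAll(listFiles("/warehouse/t/dt=2", 3, service));
      System.out.println(all.size()); // 5
    } finally {
      service.shutdown(); // shut down once, after every partition is listed
    }
  }
}
```

The try/finally around the whole loop is the design point: threads are created once, reused for every partition, and released even if one listing throws.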
-
migrationService
-