Class TableMigrationUtil
- java.lang.Object
-
- org.apache.iceberg.data.TableMigrationUtil
-
public class TableMigrationUtil extends java.lang.Object
-
-
Method Summary
Modifier and Type / Method / Description
static java.util.List<DataFile>
listPartition(java.util.Map<java.lang.String,java.lang.String> partition, java.lang.String uri, java.lang.String format, PartitionSpec spec, org.apache.hadoop.conf.Configuration conf, MetricsConfig metricsConfig, NameMapping mapping)
Returns the data files in a partition by listing the partition location.
static java.util.List<DataFile>
listPartition(java.util.Map<java.lang.String,java.lang.String> partition, java.lang.String partitionUri, java.lang.String format, PartitionSpec spec, org.apache.hadoop.conf.Configuration conf, MetricsConfig metricsSpec, NameMapping mapping, int parallelism)
Returns the data files in a partition by listing the partition location.
static java.util.List<DataFile>
listPartition(java.util.Map<java.lang.String,java.lang.String> partition, java.lang.String partitionUri, java.lang.String format, PartitionSpec spec, org.apache.hadoop.conf.Configuration conf, MetricsConfig metricsSpec, NameMapping mapping, java.util.concurrent.ExecutorService service)
Returns the data files in a partition by listing the partition location.
static java.util.concurrent.ExecutorService
migrationService(int parallelism)
Returns an ExecutorService for table migration.
-
-
-
Method Detail
-
listPartition
public static java.util.List<DataFile> listPartition(java.util.Map<java.lang.String,java.lang.String> partition, java.lang.String uri, java.lang.String format, PartitionSpec spec, org.apache.hadoop.conf.Configuration conf, MetricsConfig metricsConfig, NameMapping mapping)
Returns the data files in a partition by listing the partition location.
For Parquet and ORC partitions, this will read metrics from the file footer. For Avro partitions, metrics other than row count are set to null.
Note: certain metrics, like NaN counts, that are only supported by Iceberg file writers but not file footers, will not be populated.
- Parameters:
partition - map of column names to column values for the partition
uri - partition location URI
format - partition format: avro, parquet, or orc
spec - a partition spec
conf - a Hadoop conf
metricsConfig - a metrics conf
mapping - a name mapping
- Returns:
a List of DataFile
-
listPartition
public static java.util.List<DataFile> listPartition(java.util.Map<java.lang.String,java.lang.String> partition, java.lang.String partitionUri, java.lang.String format, PartitionSpec spec, org.apache.hadoop.conf.Configuration conf, MetricsConfig metricsSpec, NameMapping mapping, int parallelism)
Returns the data files in a partition by listing the partition location. Metrics are read from the files and the file reading is done in parallel by a specified number of threads.
For Parquet and ORC partitions, this will read metrics from the file footer. For Avro partitions, metrics other than row count are set to null.
Note: certain metrics, like NaN counts, that are only supported by Iceberg file writers but not file footers, will not be populated.
- Parameters:
partition - map of column names to column values for the partition
partitionUri - partition location URI
format - partition format: avro, parquet, or orc
spec - a partition spec
conf - a Hadoop conf
metricsSpec - a metrics conf
mapping - a name mapping
parallelism - number of threads to use for file reading
- Returns:
a List of DataFile
-
listPartition
public static java.util.List<DataFile> listPartition(java.util.Map<java.lang.String,java.lang.String> partition, java.lang.String partitionUri, java.lang.String format, PartitionSpec spec, org.apache.hadoop.conf.Configuration conf, MetricsConfig metricsSpec, NameMapping mapping, java.util.concurrent.ExecutorService service)
Returns the data files in a partition by listing the partition location. Metrics are read from the files and the file reading is done in parallel on the provided executor service.
For Parquet and ORC partitions, this will read metrics from the file footer. For Avro partitions, metrics other than row count are set to null.
Note: certain metrics, like NaN counts, that are only supported by Iceberg file writers but not file footers, will not be populated.
- Parameters:
partition - map of column names to column values for the partition
partitionUri - partition location URI
format - partition format: avro, parquet, or orc
spec - a partition spec
conf - a Hadoop conf
metricsSpec - a metrics conf
mapping - a name mapping
service - executor service to use for file reading. If null, file reading will be performed on the current thread. If non-null, the provided ExecutorService will be shut down within this method after file reading is complete.
- Returns:
a List of DataFile
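The null-versus-non-null behavior of the service parameter can be illustrated with plain JDK classes. The following is a minimal sketch under stated assumptions, not Iceberg's implementation: the class NullableExecutorSketch and the helper runAll are hypothetical names, and the tasks stand in for per-file metric reads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class NullableExecutorSketch {

  // Hypothetical helper (not part of Iceberg): runs the tasks on the current
  // thread when service is null; otherwise runs them on the service and shuts
  // the service down once all tasks have completed.
  public static <T> List<T> runAll(List<Callable<T>> tasks, ExecutorService service)
      throws Exception {
    List<T> results = new ArrayList<>();
    if (service == null) {
      for (Callable<T> task : tasks) {
        results.add(task.call());
      }
      return results;
    }
    try {
      // invokeAll blocks until every task is done and keeps the task order.
      for (Future<T> future : service.invokeAll(tasks)) {
        results.add(future.get());
      }
      return results;
    } finally {
      service.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<Callable<String>> tasks = new ArrayList<>();
    tasks.add(() -> "a.parquet");
    tasks.add(() -> "b.parquet");

    ExecutorService service = Executors.newFixedThreadPool(2);
    List<String> names = runAll(tasks, service);
    System.out.println(names + " shutdown=" + service.isShutdown());
  }
}
```

Because a non-null service is shut down inside the call, a caller following this pattern would not be able to reuse the same executor for a second partition and would need a fresh one per call.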
-
migrationService
public static java.util.concurrent.ExecutorService migrationService(int parallelism)
Returns an ExecutorService for table migration.
If parallelism is 1, this method returns null, indicating that no executor service is needed. Otherwise, it returns a fixed-size thread pool with the given parallelism.
Important: Callers are responsible for shutting down the returned executor service when it is no longer needed to prevent resource leaks.
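The contract described above (null for a parallelism of 1, a fixed-size pool otherwise, and shutdown left to the caller) can be sketched with JDK classes alone. This is an illustration under assumptions, not the library's code: MigrationServiceSketch and migrationServiceSketch are hypothetical names, and Executors.newFixedThreadPool stands in for whatever pool the library actually creates.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MigrationServiceSketch {

  // Hypothetical stand-in for the documented contract: no executor is needed
  // for a parallelism of 1, otherwise a fixed-size thread pool is returned.
  public static ExecutorService migrationServiceSketch(int parallelism) {
    return parallelism == 1 ? null : Executors.newFixedThreadPool(parallelism);
  }

  public static void main(String[] args) throws InterruptedException {
    ExecutorService service = migrationServiceSketch(4);
    try {
      // Placeholder for migration work that uses the service.
      if (service != null) {
        service.submit(() -> System.out.println("reading file metrics"));
      }
    } finally {
      // The caller owns the executor, so it must handle the null case and
      // release the threads. Calling shutdown() on an executor that is
      // already shut down has no additional effect.
      if (service != null) {
        service.shutdown();
        service.awaitTermination(1, TimeUnit.MINUTES);
      }
    }
  }
}
```

Placing the shutdown in a finally block keeps the pool from leaking threads if the migration work throws.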
-
-