Class ComputeTableStatsSparkAction

java.lang.Object
org.apache.iceberg.spark.actions.ComputeTableStatsSparkAction
All Implemented Interfaces:
Action<ComputeTableStats,ComputeTableStats.Result>, ComputeTableStats

public class ComputeTableStatsSparkAction extends Object implements ComputeTableStats
Computes the statistics of the given columns and stores it as Puffin files.
  • Field Details

  • Method Details

    • self

      protected ComputeTableStatsSparkAction self()
    • columns

      public ComputeTableStats columns(String... newColumns)
      Description copied from interface: ComputeTableStats
      Choose the set of columns to collect stats, by default all columns are chosen.
      Specified by:
      columns in interface ComputeTableStats
      Parameters:
      newColumns - a set of column names to be analyzed
      Returns:
      this for method chaining
    • snapshot

      public ComputeTableStats snapshot(long newSnapshotId)
      Description copied from interface: ComputeTableStats
      Choose the table snapshot to compute stats, by default the current snapshot is used.
      Specified by:
      snapshot in interface ComputeTableStats
      Parameters:
      newSnapshotId - long ID of the snapshot for which stats need to be computed
      Returns:
      this for method chaining
    • execute

      public ComputeTableStats.Result execute()
      Description copied from interface: Action
      Executes this action.
      Specified by:
      execute in interface Action<ComputeTableStats,ComputeTableStats.Result>
      Returns:
      the result of this action
    • spark

      protected org.apache.spark.sql.SparkSession spark()
    • sparkContext

      protected org.apache.spark.api.java.JavaSparkContext sparkContext()
    • option

      public ComputeTableStatsSparkAction option(String name, String value)
    • options

      public ComputeTableStatsSparkAction options(Map<String,String> newOptions)
    • options

      protected Map<String,String> options()
    • withJobGroupInfo

      protected <T> T withJobGroupInfo(JobGroupInfo info, Supplier<T> supplier)
    • newJobGroupInfo

      protected JobGroupInfo newJobGroupInfo(String groupId, String desc)
    • newStaticTable

      protected Table newStaticTable(TableMetadata metadata, FileIO io)
    • contentFileDS

      protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table)
    • contentFileDS

      protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table, Set<Long> snapshotIds)
    • manifestDS

      protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table)
    • manifestDS

      protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table, Set<Long> snapshotIds)
    • manifestListDS

      protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table)
    • manifestListDS

      protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table, Set<Long> snapshotIds)
    • statisticsFileDS

      protected org.apache.spark.sql.Dataset<FileInfo> statisticsFileDS(Table table, Set<Long> snapshotIds)
    • otherMetadataFileDS

      protected org.apache.spark.sql.Dataset<FileInfo> otherMetadataFileDS(Table table)
    • allReachableOtherMetadataFileDS

      protected org.apache.spark.sql.Dataset<FileInfo> allReachableOtherMetadataFileDS(Table table)
    • loadMetadataTable

      protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type)
    • deleteFiles

      protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(ExecutorService executorService, Consumer<String> deleteFunc, Iterator<FileInfo> files)
      Deletes files and keeps track of how many files were removed for each file type.
      Parameters:
      executorService - an executor service to use for parallel deletes
      deleteFunc - a delete func
      files - an iterator of Spark rows of the structure (path: String, type: String)
      Returns:
      stats on which files were deleted
    • deleteFiles

      protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io, Iterator<FileInfo> files)