Package org.apache.iceberg.spark.actions
Class ComputeTableStatsSparkAction
- java.lang.Object
  - org.apache.iceberg.spark.actions.ComputeTableStatsSparkAction
- All Implemented Interfaces:
  Action<ComputeTableStats,ComputeTableStats.Result>, ComputeTableStats
public class ComputeTableStatsSparkAction extends java.lang.Object implements ComputeTableStats
Computes the statistics of the given columns and stores them as Puffin files.
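A minimal usage sketch: the action is normally obtained through SparkActions rather than constructed directly. The table variable and the column names below are illustrative assumptions, not part of this class's API.

import org.apache.iceberg.StatisticsFile;
import org.apache.iceberg.Table;
import org.apache.iceberg.actions.ComputeTableStats;
import org.apache.iceberg.spark.actions.SparkActions;

public class ComputeStatsExample {
  // Computes column statistics for an already-loaded table and returns
  // the metadata of the Puffin statistics file the action wrote.
  static StatisticsFile computeStats(Table table) {
    ComputeTableStats.Result result =
        SparkActions.get()               // uses the active SparkSession
            .computeTableStats(table)
            .columns("id", "data")       // illustrative; defaults to all columns
            .execute();
    return result.statisticsFile();
  }
}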
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface org.apache.iceberg.actions.ComputeTableStats
ComputeTableStats.Result
-
Field Summary
protected static org.apache.iceberg.relocated.com.google.common.base.Joiner
COMMA_JOINER

protected static org.apache.iceberg.relocated.com.google.common.base.Splitter
COMMA_SPLITTER

protected static java.lang.String
FILE_PATH

protected static java.lang.String
LAST_MODIFIED

protected static java.lang.String
MANIFEST

protected static java.lang.String
MANIFEST_LIST

protected static java.lang.String
OTHERS

protected static java.lang.String
STATISTICS_FILES
-
Method Summary
protected org.apache.spark.sql.Dataset<FileInfo>
allReachableOtherMetadataFileDS(Table table)

ComputeTableStats
columns(java.lang.String... newColumns)
Choose the set of columns to collect stats; by default all columns are chosen.

protected org.apache.spark.sql.Dataset<FileInfo>
contentFileDS(Table table)

protected org.apache.spark.sql.Dataset<FileInfo>
contentFileDS(Table table, java.util.Set<java.lang.Long> snapshotIds)

protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary
deleteFiles(java.util.concurrent.ExecutorService executorService, java.util.function.Consumer<java.lang.String> deleteFunc, java.util.Iterator<FileInfo> files)
Deletes files and keeps track of how many files were removed for each file type.

protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary
deleteFiles(SupportsBulkOperations io, java.util.Iterator<FileInfo> files)

ComputeTableStats.Result
execute()
Executes this action.

protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
loadMetadataTable(Table table, MetadataTableType type)

protected org.apache.spark.sql.Dataset<FileInfo>
manifestDS(Table table)

protected org.apache.spark.sql.Dataset<FileInfo>
manifestDS(Table table, java.util.Set<java.lang.Long> snapshotIds)

protected org.apache.spark.sql.Dataset<FileInfo>
manifestListDS(Table table)

protected org.apache.spark.sql.Dataset<FileInfo>
manifestListDS(Table table, java.util.Set<java.lang.Long> snapshotIds)

protected JobGroupInfo
newJobGroupInfo(java.lang.String groupId, java.lang.String desc)

protected Table
newStaticTable(TableMetadata metadata, FileIO io)

ThisT
option(java.lang.String name, java.lang.String value)

protected java.util.Map<java.lang.String,java.lang.String>
options()

ThisT
options(java.util.Map<java.lang.String,java.lang.String> newOptions)

protected org.apache.spark.sql.Dataset<FileInfo>
otherMetadataFileDS(Table table)

protected ComputeTableStatsSparkAction
self()

ComputeTableStats
snapshot(long newSnapshotId)
Choose the table snapshot to compute stats; by default the current snapshot is used.

protected org.apache.spark.sql.SparkSession
spark()

protected org.apache.spark.api.java.JavaSparkContext
sparkContext()

protected org.apache.spark.sql.Dataset<FileInfo>
statisticsFileDS(Table table, java.util.Set<java.lang.Long> snapshotIds)

protected <T> T
withJobGroupInfo(JobGroupInfo info, java.util.function.Supplier<T> supplier)
-
Field Detail
-
MANIFEST
protected static final java.lang.String MANIFEST
- See Also:
- Constant Field Values
-
MANIFEST_LIST
protected static final java.lang.String MANIFEST_LIST
- See Also:
- Constant Field Values
-
STATISTICS_FILES
protected static final java.lang.String STATISTICS_FILES
- See Also:
- Constant Field Values
-
OTHERS
protected static final java.lang.String OTHERS
- See Also:
- Constant Field Values
-
FILE_PATH
protected static final java.lang.String FILE_PATH
- See Also:
- Constant Field Values
-
LAST_MODIFIED
protected static final java.lang.String LAST_MODIFIED
- See Also:
- Constant Field Values
-
COMMA_SPLITTER
protected static final org.apache.iceberg.relocated.com.google.common.base.Splitter COMMA_SPLITTER
-
COMMA_JOINER
protected static final org.apache.iceberg.relocated.com.google.common.base.Joiner COMMA_JOINER
-
-
Method Detail
-
self
protected ComputeTableStatsSparkAction self()
-
columns
public ComputeTableStats columns(java.lang.String... newColumns)
Description copied from interface: ComputeTableStats
Choose the set of columns to collect stats; by default all columns are chosen.
- Specified by:
  columns in interface ComputeTableStats
- Parameters:
  newColumns - a set of column names to be analyzed
- Returns:
  this for method chaining
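For example, a sketch assuming an already-loaded Table named table:

// Collect stats only for the "id" and "data" columns (illustrative names)
// instead of all columns.
ComputeTableStats action =
    SparkActions.get().computeTableStats(table).columns("id", "data");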
-
snapshot
public ComputeTableStats snapshot(long newSnapshotId)
Description copied from interface: ComputeTableStats
Choose the table snapshot to compute stats; by default the current snapshot is used.
- Specified by:
  snapshot in interface ComputeTableStats
- Parameters:
  newSnapshotId - the long ID of the snapshot for which stats need to be computed
- Returns:
  this for method chaining
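For example, a sketch under the same assumptions as above; the snapshot ID is illustrative and would normally come from the table's history:

// Compute stats as of a historical snapshot rather than the current one.
long snapshotId = 5589240848086287732L; // illustrative ID
ComputeTableStats.Result result =
    SparkActions.get().computeTableStats(table).snapshot(snapshotId).execute();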
-
execute
public ComputeTableStats.Result execute()
Description copied from interface: Action
Executes this action.
- Specified by:
  execute in interface Action<ComputeTableStats,ComputeTableStats.Result>
- Returns:
  the result of this action
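A sketch of running the action and inspecting its result; the null check reflects an assumption, not a documented guarantee, that no statistics file is produced when there is nothing to analyze:

ComputeTableStats.Result result =
    SparkActions.get().computeTableStats(table).execute();
StatisticsFile statsFile = result.statisticsFile();
if (statsFile != null) { // assumption: may be null for a table with no snapshots
  System.out.println("Puffin stats written to " + statsFile.path());
}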
-
spark
protected org.apache.spark.sql.SparkSession spark()
-
sparkContext
protected org.apache.spark.api.java.JavaSparkContext sparkContext()
-
option
public ThisT option(java.lang.String name, java.lang.String value)
-
options
public ThisT options(java.util.Map<java.lang.String,java.lang.String> newOptions)
-
options
protected java.util.Map<java.lang.String,java.lang.String> options()
-
withJobGroupInfo
protected <T> T withJobGroupInfo(JobGroupInfo info, java.util.function.Supplier<T> supplier)
-
newJobGroupInfo
protected JobGroupInfo newJobGroupInfo(java.lang.String groupId, java.lang.String desc)
-
newStaticTable
protected Table newStaticTable(TableMetadata metadata, FileIO io)
-
contentFileDS
protected org.apache.spark.sql.Dataset<FileInfo> contentFileDS(Table table, java.util.Set<java.lang.Long> snapshotIds)
-
manifestDS
protected org.apache.spark.sql.Dataset<FileInfo> manifestDS(Table table, java.util.Set<java.lang.Long> snapshotIds)
-
manifestListDS
protected org.apache.spark.sql.Dataset<FileInfo> manifestListDS(Table table, java.util.Set<java.lang.Long> snapshotIds)
-
statisticsFileDS
protected org.apache.spark.sql.Dataset<FileInfo> statisticsFileDS(Table table, java.util.Set<java.lang.Long> snapshotIds)
-
otherMetadataFileDS
protected org.apache.spark.sql.Dataset<FileInfo> otherMetadataFileDS(Table table)
-
allReachableOtherMetadataFileDS
protected org.apache.spark.sql.Dataset<FileInfo> allReachableOtherMetadataFileDS(Table table)
-
loadMetadataTable
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadMetadataTable(Table table, MetadataTableType type)
-
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(java.util.concurrent.ExecutorService executorService, java.util.function.Consumer<java.lang.String> deleteFunc, java.util.Iterator<FileInfo> files)
Deletes files and keeps track of how many files were removed for each file type.
- Parameters:
  executorService - an executor service to use for parallel deletes
  deleteFunc - a delete function
  files - an iterator of Spark rows of the structure (path: String, type: String)
- Returns:
  stats on which files were deleted
-
deleteFiles
protected org.apache.iceberg.spark.actions.BaseSparkAction.DeleteSummary deleteFiles(SupportsBulkOperations io, java.util.Iterator<FileInfo> files)
-