All Classes and Interfaces
Position2Accessor and Position3Accessor here are an optimization.
An action performed on a table.
An API that should be implemented by query engine integrations for providing actions.
A scan task for inserts generated by adding a data file to the table.
FileIO implementation backed by Azure Data Lake Storage Gen2.
The aggregate functions that can be pushed down and evaluated in Iceberg.
A class for evaluating aggregates.
A Table implementation that exposes a table's valid data files as rows.
A Table implementation that exposes its valid delete files as rows.
A Table implementation that exposes a table's manifest entries as rows, for both delete and data files.
A Table implementation that exposes its valid files as rows.
A Table implementation that exposes a table's valid manifest files as rows.
Exception raised when attempting to create a table that already exists.
API for appending new files in a table.
An Avro Schema visitor to apply a name mapping to add Iceberg field IDs.
Vectorized reader that returns an iterator of ColumnarBatch.
The purpose of this class is to hold configuration options for OAuth2Util.AuthSession.
Deprecated since 1.7.0.
This util class converts Avro GenericRecord to Flink RowData.
Class for Avro-related utility methods.
An abstract Avro schema visitor with partner type.
Interface to customize AWS clients used by Iceberg.
Exception thrown on HTTP 400 - Bad Request
A base BatchReader class that contains common functionality
A base writer factory to be extended by query engine integrations.
Base class for metadata tables.
Base Table implementation.
A values reader for Parquet's run-length encoded data that reads column data in batches instead of one value at a time.
API for configuring a batch scan.
Rewrites expressions by replacing unbound named references with references to fields in a struct schema.
Metadata about a statistics or indices blob.
Represents a bound value expression.
Represents a bound term.
A transform expression.
A Spark function implementation for the Iceberg bucket transform.
Contains the logic for hashing various types for use with the bucket partition transformations.
A ClientPool that caches the underlying HiveClientPool instances.
Class that wraps an Iceberg Catalog to cache tables.
A Catalog API for table create, drop, and load operations.
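For orientation, a minimal sketch of how this Catalog API is typically used to create and load a table; the namespace, table name, and column definitions here are hypothetical:

import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.types.Types;

class CatalogUsageSketch {
  // Creates the table if it does not exist, then loads it through the Catalog API.
  static Table createAndLoad(Catalog catalog) {
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.optional(2, "data", Types.StringType.get()));
    TableIdentifier id = TableIdentifier.of("db", "events"); // hypothetical identifier
    if (!catalog.tableExists(id)) {
      catalog.createTable(id, schema, PartitionSpec.unpartitioned());
    }
    return catalog.loadTable(id);
  }
}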
A builder used to create valid tables or start create/replace transactions.
Serializable loader to load an Iceberg Catalog.
Class for catalog resolution and accessing the common functions for the Catalog API.
An iterator that transforms rows from changelog tables within a single Spark task.
An enum representing possible operations in a changelog.
A changelog scan task.
A map that uses char sequences as keys.
Wrapper class to adapt CharSequence for use in maps and sets.
This exception occurs when one cherrypicks an ancestor or when the picked snapshot is already
linked to a published ancestor.
A marker interface for commit exceptions where the state is known to be failure and uncommitted
metadata can be cleaned up.
This class acts as a helper for handling the closure of multiple resources.
A convenience wrapper around CloseableIterator, providing auto-close functionality when all of the elements in the iterator are consumed.
A data writer capable of writing to multiple specs and partitions that requires the incoming records to be properly clustered by partition spec and by partition within each spec.
An equality delete writer capable of writing to multiple specs and partitions that requires the
incoming delete records to be properly clustered by partition spec and by partition within each
spec.
A position delete writer capable of writing to multiple specs and partitions that requires the
incoming delete records to be properly clustered by partition spec and by partition within each
spec.
This class is inspired by Spark's ColumnarBatch.
VectorizedReader that returns Spark's ColumnarBatch to support Spark's vectorized read path.
SplitWatermarkExtractor implementation which uses Iceberg timestamp column statistics to get the watermarks for the IcebergSourceSplit.
This class is inspired by Spark's ColumnVector.
A scan task made of several ranges from files.
A control event payload for events sent by a coordinator that indicates it has completed a commit
cycle.
Exception raised when a commit fails because of out of date metadata.
Utility class to accept thread-local commit properties.
Carries all metrics for a particular commit.
A serializable version of CommitMetrics that carries its results.
A commit report that contains all relevant information from a Snapshot.
Exception for a failure to confirm either affirmatively or negatively that a commit was applied.
A control event payload for events sent by a coordinator that indicates it has completed a commit
cycle.
An action that collects statistics of an Iceberg table and writes to Puffin files.
The result of table statistics collection.
Computes the statistics of the given columns and stores it as Puffin files.
An iterator that finds delete/insert rows which represent an update, and converts them into
update records from changelog tables within a single Spark task.
Represents a response to requesting server-side provided configuration for the REST catalog.
Interface used to avoid runtime dependencies on Hadoop Configurable
A simple container of objects that you can get and set.
Class that provides file-content caching during reading.
Superinterface of DataFile and DeleteFile that exposes common methods.
A scan task over a range of bytes in a content file.
This interface is introduced so that we can plug in different split planners for unit tests.
An action for converting the equality delete files to position delete files.
The action result that contains a summary of the execution.
A strategy for the action to convert equality delete to position deletes.
Generalized Counter interface for creating telemetry-related instances when counting events.
A serializable version of a Counter that carries its result.
A procedure that creates a view for changed rows.
A REST request to create a namespace, with an optional set of properties.
Represents a REST response for a request to create a namespace / database.
A REST request to create a table, either via direct commit or staging the creation of the table
as part of a transaction.
Interface used to expose credentials held by a FileIO instance.
A control event payload for events sent by a worker that indicates it has finished sending all
data for a commit request.
Interface for data files listed in a table manifest.
A Table implementation that exposes a table's data files as rows.
Flink data iterator that reads CombinedScanTask into a CloseableIterator.
Batcher converts iterator of T into iterator of batched RecordsWithSplitIds<RecordAndPosition<T>>, as FLIP-27's SplitReader.fetch() returns batched records.
A ReaderFunction implementation that uses DataIterator.
Data operations that produce snapshots.
DataStatisticsCoordinatorProvider provides the method to create new DataStatisticsCoordinator.
DataStatisticsOperator collects traffic distribution statistics.
A task that returns data as rows instead of where to read data.
A result of writing data files.
A control event payload for events sent by a worker that contains the table data that has been
written and is ready to commit.
A Spark function implementation for the Iceberg day transform.
Resolver to resolve Decoder to a ResolvingDecoder.
A default Counter implementation that uses an AtomicLong to count events.
A default MetricsContext implementation that uses native Java counters/timers.
Since all methods are called in the source coordinator thread by enumerator, there is no need for locking.
A default Timer implementation that uses a Stopwatch instance internally to measure time.
This interface is intended as an extension for FileIO implementations that support being a delegate target.
A counter to be used to count deletes as they are applied.
A scan task for deletes generated by removing a data file from the table.
A scan task for deletes generated by adding delete files to the table.
Interface for delete files listed in a table delete manifest.
API for deleting files from a table.
A Table implementation that exposes a table's delete files as rows.
An enum that represents the granularity of deletes.
An API for loading delete file content into in-memory data structures.
An action that deletes orphan metadata, data and delete files in a table.
Defines the action behavior when location prefixes (scheme/authority) mismatch.
The action result that contains a summary of the execution.
An action that removes orphan metadata, data and delete files by listing a given location and
comparing the actual files in that location with content and metadata files referenced by all
valid snapshots.
An action that deletes all files referenced by a table metadata file.
The action result that contains a summary of the execution.
An implementation of DeleteReachableFiles that uses metadata tables in Spark to determine which files should be deleted.
A result of writing delete files.
An API that provides actions for migration from a Delta Lake table to an Iceberg table.
This converts dictionary encoded arrow vectors to a correctly typed arrow vector.
Enum of supported write distribution modes; it defines the write behavior of batch or streaming jobs.
Iceberg internally tracked field level metrics, used by Parquet and ORC writers only.
This exception occurs when the WAP workflow detects a duplicate wap commit.
DynamoDB implementation of Iceberg catalog
DynamoDB implementation for the lock manager.
Copied from parquet-common.
Convenience wrapper class around Field.
Copied from parquet-common.
Convenience wrapper class around Method.
FileIO implementation backed by Dell EMC ECS.
This gauge measures the elapsed time between now and last recorded time set by ElapsedTimeGauge.refreshLastRecordedTime().
Thin wrapper around an InputFile instance that is encrypted.
Thin wrapper around an OutputFile that is encrypting bytes written to the underlying file system, via an encryption key that is symbolized by the enclosed EncryptionKeyMetadata.
Algorithm supported for file encryption.
Light typedef over a ByteBuffer that indicates that the given bytes represent metadata about an
encrypted data file's encryption key.
Module for encrypting and decrypting table data files.
Holds an endpoint definition that consists of the HTTP method (GET, POST, DELETE, ...) and the
resource path as defined in the Iceberg OpenAPI REST specification without parameter
substitution, such as /v1/{prefix}/namespaces/{namespace}.
A writer capable of writing data and equality deletes that may belong to different specs and
partitions.
A set of consumers to handle errors for requests for table entities or for namespace entities, to
throw the correct exception.
Standard response body for all API errors
Evaluates an Expression for data described by a Types.StructType.
Class representing all events produced to the control topic.
An action that expires snapshots in a table.
API for removing old snapshots from a table.
The action result that contains a summary of the execution.
A procedure that expires snapshots in a table.
An action that performs the same operation as ExpireSnapshots but uses Spark to determine the delta in files between the pre and post-expiration table metadata.
Represents a boolean expression tree.
Factory methods for creating expressions.
Expression utility methods.
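As a small illustration of the expression factory methods above, a sketch that builds an unbound predicate; the column names are made up:

import org.apache.iceberg.expressions.Expression;
import org.apache.iceberg.expressions.Expressions;

class ExpressionSketch {
  // Combines two unbound predicates on hypothetical columns.
  static Expression openLargeOrders() {
    return Expressions.and(
        Expressions.greaterThanOrEqual("amount", 100),
        Expressions.equal("status", "OPEN"));
  }
}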
Utils for traversing expressions.
An expression that is always false.
A data writer capable of writing to multiple specs and partitions that keeps data writers for each seen spec/partition pair open until this writer is closed.
A position delete writer capable of writing to multiple specs and partitions if the incoming
stream of deletes is not ordered.
Iceberg internally tracked field level metrics.
Factory to create a new FileAppender to write records.
Content type stored in a file, one of DATA, POSITION_DELETES, or EQUALITY_DELETES.
Enum of supported file formats.
Pluggable module for reading, writing, and deleting files.
Extension of MetricsContext for use with FileIO to define standard metrics that should be
reported.
Keeps track of the FileIO instance of the given TableOperations instance and closes the FileIO when FileIOTracker.close() gets called.
A class for rewriting content files.
A scan task over a range of bytes in a single data file.
Read a FileScanTask into a CloseableIterator.
A position delete writer that produces a separate delete file for each referenced data file.
A Table implementation that exposes a table's files as rows.
A writer capable of writing files of a single type.
A factory for creating data and delete writers.
A Class for generic filters
An Iterator that filters another Iterator.
A Histogram implementation with reservoir sampling.
This is used to fix primitive types to match a table schema.
Deprecated.
will be removed in 1.8.0; use FlinkPlannedAvroReader instead.
A Flink Catalog implementation that wraps an Iceberg Catalog.
A Flink Catalog factory implementation that creates FlinkCatalog.
This is a small util class that tries to hide calls to Flink Internal or PublicEvolving interfaces, as Flink can change those APIs during minor version releases.
When constructing Flink Iceberg source via Java API, configs can be set in Configuration passed to source builder.
Flink InputFormat for Iceberg.
Flink source read options.
Converter between Flink types and Iceberg type.
Deprecated.
since 1.7.0, will be removed in 2.0.0.
Source builder to build DataStream.
A class for common Iceberg configs for Flink writes.
Flink sink write options
Iceberg internally tracked field level metrics, used by Parquet and ORC writers only.
Exception thrown on HTTP 403 Forbidden - Failed authorization checks.
FileIO Implementation backed by Google Cloud Storage (GCS)
Factory to create a new FileAppender to write Records.
This class creates typed ArrowVectorAccessor from VectorHolder.
Create an array value of type ArrayT from arrow vector value.
Create a decimal value of type DecimalT from arrow vector value.
Create a UTF8 String value of type Utf8StringT from arrow vector value.
Create a struct child vector of type ChildVectorT from arrow vector value.
Represents a REST response to fetch a namespace and its metadata properties.
HadoopCatalog provides a way to use table names like db.table to work with path-based tables
under a common location.
An interface that extends the Hadoop Configurable interface to offer better serialization support for customizable Iceberg objects such as FileIO.
InputFile implementation using the Hadoop FileSystem API.
FileIO Metrics implementation that delegates to Hadoop FileSystem statistics implementation using the provided scheme.
OutputFile implementation using the Hadoop FileSystem API.
Convenience methods to get Parquet abstractions for Hadoop data streams.
TableOperations implementation for file systems that support atomic rename.
Implementation of Iceberg tables that uses the Hadoop FileSystem to store metadata and manifests.
Used to expose a table's TableOperations.
A PathFilter that filters out hidden paths.
Table history entry.
A Table implementation that exposes a table's history as rows.
An Iceberg table committer for adding data files to the Iceberg tables.
TODO we should be able to extract some more commonalities to BaseMetastoreTableOperations to
avoid code duplication between this class and Metacat Tables.
A Spark function implementation for the Iceberg hour transform.
An HttpClient for usage with the REST catalog.
Implementation of Spark's ColumnVector interface.
Loads iceberg-version.properties with build information.
Enumerator state for checkpointing
Generic MRv2 InputFormat API for Iceberg.
Deprecated.
will be removed in 1.8.0
Flink v2 sink offers different hooks to insert custom topologies into the sink.
The IcebergSource loads/writes tables with format "iceberg".
This class provides an empty implementation of IcebergSqlExtensionsListener, which can be extended to create a listener which only needs to handle a subset of the available methods.
This class provides an empty implementation of IcebergSqlExtensionsVisitor, which can be extended to create a visitor which only needs to handle a subset of the available methods.
This interface defines a complete listener for a parse tree produced by IcebergSqlExtensionsParser.
This interface defines a complete generic visitor for a parse tree produced by IcebergSqlExtensionsParser.
Deprecated; will be removed in 1.8.0.
Flink Iceberg table source.
A function for use in SQL that returns the current Iceberg version.
Evaluates an Expression on a DataFile to test whether rows in the file may match.
API for configuring an incremental table scan for appends only snapshots.
API for configuring a scan for table changes.
API for configuring an incremental scan.
Event sent to listeners when an incremental table scan is planned.
Catalog implementation that uses in-memory data-structures to store the namespaces and tables.
An interface used to read input files using SeekableInputStream instances.
A reader that produces Iceberg's internal in-memory object model.
An isolation level in a table.
JDBC table backed implementation of the TriggerLockFactory.
JDBC implementation of Iceberg ViewOperations.
Captures information about the current job which is used for displaying on the UI
Deprecated.
the API will be removed in v2.0.0 (replaced with KeyManagementClient interface).
For KMS systems that support key generation, this class keeps the key generation result - the
raw secret key, and its wrap.
This implementation of AwsClientFactory is used by default if AwsProperties.GLUE_LAKEFORMATION_ENABLED is set to true.
A listener interface that can receive notifications.
Static registration and notification for listeners.
A list of table identifiers in a given namespace.
Represents a literal fixed value in an expression predicate
A REST response that is used when a table is successfully loaded.
Interface for providing data file locations to write tasks.
An interface for locking, used to ensure commit isolation.
Manages locks and collects Metric for the Maintenance Tasks.
A default MetricsReporter implementation that logs the MetricsReport to the log file.
API for managing snapshots.
Content type stored in a manifest file, either DATA or DELETES.
A Table implementation that exposes a table's manifest entries as rows, for both delete and data files.
Evaluates an Expression on a ManifestFile to test whether the file contains matching partitions.
Represents a manifest file that can be scanned to find files in a table.
Summarizes the values of one partition field stored in a manifest file.
A serializable bean that contains a bare minimum to read a manifest.
Base reader for data and delete manifest files.
A Table implementation that exposes a table's manifest files as rows.
Writer for manifest files.
An immutable mapping between a field ID and a set of names.
Generic MR v1 InputFormat API for Iceberg.
A scan task that can be potentially merged with other scan tasks.
Reading metadata tables (like snapshots, manifests, etc.)
Represents a change to table or view metadata.
Iceberg file format metrics.
Wrapper writer around DatumWriter with metrics support.
Generalized interface for creating telemetry related instances for tracking operations.
Deprecated; will be removed in 2.0.0, use Counter instead.
This class defines different metrics modes, which allow users to control the collection of value_counts, null_value_counts, nan_value_counts, lower_bounds, upper_bounds for different columns in metadata.
Under this mode, only value_counts, null_value_counts, nan_value_counts are persisted.
Under this mode, value_counts, null_value_counts, nan_value_counts and full lower_bounds,
upper_bounds are persisted.
A metrics calculation mode.
Under this mode, value_counts, null_value_counts, nan_value_counts, lower_bounds, upper_bounds
are not persisted.
Under this mode, value_counts, null_value_counts, nan_value_counts and truncated lower_bounds,
upper_bounds are persisted.
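A hedged sketch of how these metrics modes are usually chosen per column through table properties; the property keys are assumed to follow Iceberg's write.metadata.metrics.* convention and the column names are hypothetical:

import org.apache.iceberg.Table;

class MetricsModeSketch {
  static void configure(Table table) {
    table.updateProperties()
        .set("write.metadata.metrics.default", "truncate(16)")  // truncated bounds by default
        .set("write.metadata.metrics.column.payload", "none")   // skip metrics for a large column
        .set("write.metadata.metrics.column.id", "full")        // full bounds for a key column
        .commit();
  }
}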
This interface defines the basic API for reporting metrics for operations to a Table.
Utility class that allows combining two given MetricsReporter instances.
A struct of readable metric values for a primitive column.
Fixed definition of a readable metric column, i.e., a mapping of a raw metric to a readable metric.
A struct, consisting of all MetricsUtil.ReadableColMetricsStruct for all primitive columns of the table.
An action that migrates an existing table to Iceberg.
The action result that contains a summary of the execution.
Takes a Spark table in the source catalog and attempts to transform it into an Iceberg table in
the same location with the same identifier.
A Spark function implementation for the Iceberg month transform.
Represents a mapping from external schema names to Iceberg type IDs.
A delegating DatumReader that applies a name mapping to a file schema to enable reading Avro
files that were written without field IDs.
Parses external name mappings from a JSON representation.
A namespace in a Catalog.
Exception raised when attempting to drop a namespace that is not empty.
An EncryptedInputFile that can be used for format-native encryption.
EncryptionKeyMetadata for use with format-native encryption.
An EncryptedOutputFile that can be used for format-native encryption.
Barebone encryption parameters, one object per content file.
This interface is applied to OutputFile and InputFile implementations, in order to enable
delivery of crypto parameters (such as encryption keys etc) from the Iceberg key management
module to the writers/readers of file formats that support encryption natively (Parquet and ORC).
Nessie implementation of Iceberg Catalog.
Nessie implementation of Iceberg TableOperations.
NoSuchTableException thrown when a table is found but it is not an Iceberg table.
NoSuchIcebergViewException thrown when a view is found, but it is not an Iceberg view.
Exception raised when attempting to load a namespace that does not exist.
Exception raised when attempting to load a table that does not exist.
Exception raised when attempting to load a view that does not exist.
Exception thrown on HTTP 401 Unauthorized.
Exception raised when attempting to read a file that does not exist.
Instances of this class simply track whether a value at an index is null.
Class to handle authorization headers and token refresh.
Used for implementing ORC batch readers.
Used for implementing ORC row readers.
Write data value of a schema.
Utilities for mapping Iceberg to ORC schemas.
Generic visitor of an ORC Schema.
Create default assigner with a comparator that hands out splits where the order of the splits will be defined by the SerializableComparator.
FileIO implementation backed by OSS.
This class represents a fully qualified location in OSS for input/output operations expressed as a URI.
An interface used to create output files using PositionOutputStream instances.
Factory responsible for generating unique but recognizable data/delete file names.
API for overwriting files in a table.
This class implements a codec factory that is used when reading from Parquet.
Visitor for traversing a Parquet type with a companion Spark type.
Deprecated; use ParquetWriter.
Represents a single field in a PartitionSpec.
A writer capable of writing files of a single type.
A struct of partition values.
A map that uses a pair of spec ID and partition tuple as keys.
A scan task for data within a particular partition
Represents how to produce partition data for a table.
Used to create valid partition specs.
A Table implementation that exposes a table's partitions as rows.
Represents a partition statistics file that can be used to read table data more efficiently.
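As a rough illustration of the partition spec builder listed above, a sketch assuming a schema with hypothetical id and event_time columns:

import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.types.Types;

class PartitionSpecSketch {
  // Builds a spec with a day transform on a timestamp and a 16-way bucket on an id column.
  static PartitionSpec dailyBucketedSpec() {
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.required(2, "event_time", Types.TimestampType.withZone()));
    return PartitionSpec.builderFor(schema)
        .day("event_time")
        .bucket("id", 16)
        .build();
  }
}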
Interface for an element that is an event payload.
Control event types.
API for table metadata changes.
Deprecated.
will be removed in 1.8.0
A ScanTask for position delete files.
A Table implementation whose Scan provides PositionDeletesScanTask, for reading of position delete files.
A position delete writer that can handle deletes ordered by file and position.
A writer capable of writing data and position deletes that may belong to different specs and
partitions.
An interface representing a stored procedure available for execution.
A catalog API for working with stored procedures.
An input parameter of a stored procedure.
Utils to project expressions on rows to expressions on partitions.
A class that projects expressions for a table's data rows into expressions on the table's partition values, for a table's partition spec.
Convert Map properties to bytes.
Utility class for reading and writing Puffin files.
A builder for PuffinReader.
A builder for PuffinWriter.
This custom partitioner implements the DistributionMode.RANGE for Flink sink.
RangeReadable is an interface that allows for implementations of InputFile streams to perform positional, range-based reads, which are more efficient than unbounded reads in many cloud provider object stores.
A record along with the reader position to be stored in the checkpoint.
Represents a variable reference in an expression.
A Table implementation that exposes a table's known snapshot references as rows.
An action that removes dangling delete files from the current snapshot.
An action that removes dangling deletes.
This class computes the net changes across multiple snapshots.
A procedure that removes orphan files in a table.
A REST request to rename a table or a view.
API for overwriting files in a table by partition.
API for replacing table sort order with a newly created order.
API for replacing a view's version.
Finds the residuals for an Expression for the partitions in the given PartitionSpec.
FileIO implementation that uses location scheme to choose the correct FileIO implementation.
Interface for a basic HTTP Client for interfacing with the REST catalog.
Base class for REST client exceptions
Interface to mark both REST requests and responses.
Interface to mark a REST request.
Interface to mark a REST response
Provides a request interceptor for use with the HTTPClient that calculates the required signature
for the SigV4 protocol and adds the necessary headers for all requests created by the client.
Metrics are the only reliable way provided by the AWS SDK to determine if an API call was
retried.
An action for rewriting data files according to a rewrite strategy.
For a file group that failed to rewrite.
A description of a file group, when it was processed, and within which partition.
For a particular file group, the number of files which are newly created and the number of
files which were formerly part of the table but have been rewritten.
A map of file group information to the results of rewriting that file group.
Functionality used by RewriteDataFile Actions from different platforms to handle commits.
Container class representing a set of files to be rewritten by a RewriteAction and the new files
which have been written by the action.
API for replacing files in a table.
Enum of supported rewrite job order, it defines the order in which the file groups should be
written.
An action that rewrites manifests.
API for rewriting manifests for a table.
The action result that contains a summary of the execution.
An action that rewrites manifests in a distributed manner and co-locates metadata for partitions.
An action for rewriting position delete files.
A description of a position delete file group, when it was processed, and within which
partition.
For a particular position delete file group, the number of position delete files which are
newly created and the number of files which were formerly part of the table but have been
rewritten.
The action result that contains a summary of the execution.
A procedure that rewrites position delete files in a table.
Spark implementation of RewritePositionDeleteFiles.
Functionality used by RewritePositionDeleteFiles from different platforms to handle commits.
Container class representing a set of position delete files to be rewritten by a RewritePositionDeleteFiles and the new files which have been written by the action.
An action that rewrites the table's metadata files to a staging directory, replacing all source prefixes in absolute paths with a specified target prefix.
The action result that contains a summary of the execution.
An implementation of StagedTable that mimics the behavior of Spark's non-atomic CTAS and RTAS.
A rolling data writer that splits incoming data into multiple files within one spec/partition
based on the target file size.
A rolling equality delete writer that splits incoming deletes into multiple files within one
spec/partition based on the target file size.
As opposed to ManifestWriter, a rolling writer could produce multiple manifest files.
A rolling position delete writer that splits incoming deletes into multiple files within one spec/partition based on the target file size.
Convert RowData to a different output type.
This is not serializable because Avro Schema is not actually serializable, even though it implements the Serializable interface.
API for encoding row-level changes to a table.
Iceberg supports two ways to modify records in a table: copy-on-write and merge-on-read.
Exception used to wrap IOException as a RuntimeException and add context.
Exception used to wrap MetaException as a RuntimeException and add context.
FileIO implementation backed by S3.
Scan objects are immutable and can be shared between threads.
Context object with optional arguments for a Flink Scan.
Event sent to listeners when a table scan is planned.
Carries all metrics for a particular scan
A serializable version of ScanMetrics that carries its results.
A Table Scan report that contains all relevant information from a Table Scan.
A scan task.
A scan task that may include partial input files, multiple input files or both.
The schema of a data table.
Deprecated.
will be removed in 1.8.0
SeekableInputStream is an interface with the methods needed to read data from a file or Hadoop data stream.
Wraps a Configuration object in a Serializable layer.
A concrete transform function that applies a transform to values of a certain type.
A read-only serializable table that can be sent to other nodes in a cluster.
This class provides a serializable table with a known size estimate.
Exception thrown on HTTP 5XX Server Error.
Exception thrown on HTTP 503: service is unavailable
A Catalog API for table and namespace operations that includes session context.
Context for a session.
Create simple assigner that hands out splits without any guarantee in order or locality.
Implementation of the Source V2 API which uses an iterator to read the elements, and uses a
single thread to do so.
A file rewriter that determines which files to rewrite based on their size.
A snapshot of the data in a table at a point in time.
Snapshot an existing Delta Lake table to Iceberg in place.
The action result that contains a summary of the execution.
This is a common base class to share code between different BaseScan implementations that handle
scans of a particular snapshot.
A Table implementation that exposes a table's known snapshots as rows.
An action that creates an independent snapshot of an existing table.
The action result that contains a summary of the execution.
Creates a new Iceberg table based on a source Spark table.
An action that produces snapshots.
API for table changes that produce snapshots.
An Iterable that merges the items from other Iterables in order.
A field in a SortOrder.
A position delete writer that is capable of handling unordered deletes without rows.
A struct of flattened sort field values.
A sort order that defines how data and delete files should be ordered in a table.
A builder used to create valid sort orders.
Methods for building a sort order.
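A minimal sketch of the sort order builder described above, assuming the schema contains hypothetical event_time and id columns:

import org.apache.iceberg.NullOrder;
import org.apache.iceberg.Schema;
import org.apache.iceberg.SortOrder;

class SortOrderSketch {
  // Orders rows by event_time ascending, then by id descending with nulls last.
  static SortOrder orderFor(Schema schema) {
    return SortOrder.builderFor(schema)
        .asc("event_time")
        .desc("id", NullOrder.NULLS_LAST)
        .build();
  }
}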
This mimics a class inside of Spark which is private inside of LookupCatalog.
An implementation of ActionsProvider for Spark.
Deprecated; will be removed in 1.8.0; use SparkPlannedAvroReader instead.
An internal table catalog that is capable of loading tables from a cache.
A Spark TableCatalog implementation that wraps an Iceberg Catalog.
A batch data scan that can utilize Spark cluster resources for planning.
An executor cache for reducing the computation and IO overhead in tasks.
A function catalog that can be used to resolve Iceberg functions without a metastore connection.
Converts the OrcIterator, which returns ORC's VectorizedRowBatch to a set of Spark's UnsafeRows.
This class acts as an adaptor from an OrcFileAppender to a FileAppender<InternalRow>.
Write class for rewriting position delete files from Spark.
Builder class for rewrites of position delete files from Spark.
A class for common Iceberg configs for Spark reads.
Spark DF read options
Helper methods for working with Spark/Hive metadata.
A Spark catalog that can also load non-Iceberg tables.
Java version of the original SparkTableUtil.scala
https://github.com/apache/iceberg/blob/apache-iceberg-0.8.0-incubating/spark/src/main/scala/org/apache/iceberg/spark/SparkTableUtil.scala
Class representing a table partition.
A utility class that converts Spark values to Iceberg's internal representation.
A class for common Iceberg configs for Spark writes.
Spark DF write options
A set of requirements such as distribution and ordering reported to Spark during writes.
A utility that contains helper methods for working with Spark writes.
SplitAssigner interface is extracted out as a separate component so that we can plug in different
split assignment strategy for different requirements.
Provides implementations of SerializableComparator which could be used for ordering splits.
We can remove this class once FLINK-21364 is resolved.
A scan task that can be split into smaller scan tasks.
The interface used to extract watermarks from splits.
SQLViewRepresentation represents views in SQL with a given dialect
A control event payload for events sent by a coordinator to request workers to send back the
table data that has been written and is ready to commit.
One-time split enumeration at the start-up for batch execution
TableOperations implementation that provides access to metadata for a Table at some point in
time, using a table metadata location.
Represents a statistics file in the Puffin format, that can be used to read table data more
efficiently.
The wrapper class for data statistics and record.
Range distribution requires gathering statistics on the sort keys to determine proper range
boundaries to distribute/cluster rows before writer operators.
Delete implementation that avoids loading full manifests in memory.
This is the single (non-parallel) monitoring task which takes a FlinkInputFormat; it is responsible for monitoring snapshots of the Iceberg table.
The operator that reads the splits received from the preceding StreamingMonitorFunction.
Starting strategy for streaming execution.
Evaluates an Expression on a DataFile to test whether all rows in the file match.
Interface for accessing data by position in a schema.
Wrapper to adapt StructLike for use in maps and sets by implementing equals and hashCode.
Catalog methods for working with namespaces.
This interface is intended as an extension for FileIO implementations to provide additional
prefix based operations that may be useful in performing supporting operations.
This interface is intended as an extension for FileIO implementations to provide additional
best-effort recovery operations that can be useful for repairing corrupted tables where there are
reachable files missing from disk.
Interface for readers that accept a callback to determine the starting row position of an Avro
split.
Configuration properties that are controlled by Java system properties or environment variables.
Deprecated; use SystemConfigs instead; will be removed in 2.0.0.
Represents a table.
This represents a commit to be applied for a single table with UpdateRequirements to be validated and MetadataUpdates that have been applied.
Identifies a table in an Iceberg catalog.
Parses TableIdentifiers from a JSON representation, which is the JSON representation utilized in
the REST catalog.
Serializable loader to load an Iceberg Table.
Metadata for a table.
SPI interface to abstract table metadata access and updates.
Element representing a table identifier, with namespace and name.
Generic interface for creating and loading a table implementation.
API for configuring a table scan.
The result of a single Maintenance Task.
The writer interface could accept records and provide the generated data files.
Factory to create TaskWriter.
An expression that evaluates to a value.
Generalized Timer interface for creating telemetry related instances for measuring duration of
operations.
A timing sample that carries internal state about the Timer's start position.
A serializable version of a Timer that carries its result.
Element representing an offset, with topic name, partition number, and offset.
A transaction for performing multiple updates to a table.
A transform function used for partitioning.
Factory methods for transforms.
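A hedged sketch of these transform factory methods, assuming the width-only and no-argument overloads available in recent Iceberg releases:

import org.apache.iceberg.transforms.Transform;
import org.apache.iceberg.transforms.Transforms;

class TransformSketch {
  static void examples() {
    Transform<?, ?> bucket = Transforms.bucket(16);    // hash values into 16 buckets
    Transform<?, ?> day = Transforms.day();            // timestamp/date to day ordinal
    Transform<?, ?> truncate = Transforms.truncate(4); // truncate strings/numbers to width 4
  }
}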
Lock interface for handling locks for the Flink Table Maintenance jobs.
An expression that is always true.
A Spark function implementation for the Iceberg truncate transform.
Contains the logic for various truncate transformations for various types.
Interface for passing a function that assigns column IDs from the previous Id.
Interface for passing a function that assigns column IDs.
Visitor for traversing a Parquet type with a companion Iceberg type.
Represents an unbound expression node.
Represents an unbound term.
Visitor class that accumulates the set of changes needed to evolve an existing schema into the
union of the existing and a new schema.
REST exception thrown when a request is well-formed but cannot be applied.
API for setting a table's or view's base location.
A REST request to set and/or remove properties on a namespace.
A REST response to a request to set and/or remove properties on a namespace.
API for partition spec evolution.
API for updating partition statistics files in a table.
API for updating table properties.
Represents a requirement for a MetadataUpdate.
API for schema evolution.
API for updating statistics files in a table.
API for updating view properties.
Exception which is raised when the arguments are valid in isolation, but not in conjunction with other arguments or state, as opposed to IllegalArgumentException which is raised when an argument value is always invalid.
Implements a ValuesReader specifically to read given number of bytes from the underlying ByteBufferInputStream.
Container class for holding the Arrow vector storing a batch of values along with other state needed for reading values out of it.
A Vector Holder which does not actually produce values; consumers of this class should use the constantValue to populate their ColumnVector implementation.
VectorReader(s) that read in a batch of values into Arrow vectors.
A Dummy Vector Reader which doesn't actually read files; instead it returns a dummy VectorHolder which indicates the constant value which should be used for this column.
A Dummy Vector Reader which doesn't actually read files.
Vectorized version of the ColumnIterator that reads column values in data pages of a column in a
row group in a batched fashion.
This decoder reads Parquet dictionary encoded data in a vectorized fashion.
Interface for vectorized Iceberg readers.
An adaptor so that the ORC RecordReader can be used as an Iterator.
Copied here from Hive for compatibility
A vectorized implementation of the Iceberg reader that iterates over the table scan.
Interface for view definition.
A builder used to create or replace a SQL View.
A Catalog API for view create, drop, and load operations.
View history entry.
SPI interface to abstract view metadata access and updates.
View properties that can be set during CREATE/REPLACE view or using updateProperties API.
A session Catalog API for view create, drop, and load operations.
A version of the view at a point in time.
Interface for converting Hive primitive objects into objects which can be added to an Iceberg Record.
A Spark function implementation for the Iceberg year transform.
Within Z-Ordering the byte representations of objects being compared must be ordered; this requires several types to be transformed when converted to bytes.