Package org.apache.iceberg.spark
Class Spark3Util
java.lang.Object
org.apache.iceberg.spark.Spark3Util
-
Nested Class Summary
Modifier and TypeClassDescriptionstatic class
This mimics a class inside of Spark which is private inside of LookupCatalog.static class
-
Method Summary
Modifier and TypeMethodDescriptionstatic UpdateProperties
applyPropertyChanges
(UpdateProperties pendingUpdate, List<org.apache.spark.sql.connector.catalog.TableChange> changes) Applies a list of Spark table changes to anUpdateProperties
operation.static UpdateSchema
applySchemaChanges
(UpdateSchema pendingUpdate, List<org.apache.spark.sql.connector.catalog.TableChange> changes) Applies a list of Spark table changes to anUpdateSchema
operation.catalogAndIdentifier
(String description, org.apache.spark.sql.SparkSession spark, String name) catalogAndIdentifier
(String description, org.apache.spark.sql.SparkSession spark, String name, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog) catalogAndIdentifier
(org.apache.spark.sql.SparkSession spark, String name) catalogAndIdentifier
(org.apache.spark.sql.SparkSession spark, String name, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog) catalogAndIdentifier
(org.apache.spark.sql.SparkSession spark, List<String> nameParts) catalogAndIdentifier
(org.apache.spark.sql.SparkSession spark, List<String> nameParts, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog) A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply Attempts to find the catalog and identifier a multipart identifier representsstatic String
describe
(List<Expression> exprs) static String
describe
(Expression expr) static String
static String
static String
static boolean
extensionsEnabled
(org.apache.spark.sql.SparkSession spark) static List
<SparkTableUtil.SparkPartition> getPartitions
(org.apache.spark.sql.SparkSession spark, org.apache.hadoop.fs.Path rootPath, String format, Map<String, String> partitionFilter, PartitionSpec partitionSpec) Use Spark to list all partitions in the table.static TableIdentifier
identifierToTableIdentifier
(org.apache.spark.sql.connector.catalog.Identifier identifier) static Catalog
loadIcebergCatalog
(org.apache.spark.sql.SparkSession spark, String catalogName) Returns the underlying Iceberg Catalog object represented by a Spark Catalogstatic Table
loadIcebergTable
(org.apache.spark.sql.SparkSession spark, String name) Returns an Iceberg Table by its name from a Spark V2 Catalog.static String
quotedFullIdentifier
(String catalogName, org.apache.spark.sql.connector.catalog.Identifier identifier) rebuildCreateProperties
(Map<String, String> createProperties) static org.apache.spark.sql.util.CaseInsensitiveStringMap
static Table
toIcebergTable
(org.apache.spark.sql.connector.catalog.Table table) static Term
toIcebergTerm
(org.apache.spark.sql.connector.expressions.Expression expr) static org.apache.spark.sql.connector.expressions.NamedReference
toNamedReference
(String name) static org.apache.spark.sql.connector.expressions.SortOrder[]
toOrdering
(SortOrder sortOrder) static PartitionSpec
toPartitionSpec
(Schema schema, org.apache.spark.sql.connector.expressions.Transform[] partitioning) Converts Spark transforms into aPartitionSpec
.static org.apache.spark.sql.connector.expressions.Transform[]
toTransforms
(PartitionSpec spec) Converts a PartitionSpec to Spark transforms.static org.apache.spark.sql.connector.expressions.Transform[]
toTransforms
(Schema schema, List<PartitionField> fields) static org.apache.spark.sql.catalyst.TableIdentifier
toV1TableIdentifier
(org.apache.spark.sql.connector.catalog.Identifier identifier)
-
Method Details
-
setOption
-
rebuildCreateProperties
-
applyPropertyChanges
public static UpdateProperties applyPropertyChanges(UpdateProperties pendingUpdate, List<org.apache.spark.sql.connector.catalog.TableChange> changes) Applies a list of Spark table changes to anUpdateProperties
operation.- Parameters:
pendingUpdate
- an uncommitted UpdateProperties operation to configurechanges
- a list of Spark table changes- Returns:
- the UpdateProperties operation configured with the changes
-
applySchemaChanges
public static UpdateSchema applySchemaChanges(UpdateSchema pendingUpdate, List<org.apache.spark.sql.connector.catalog.TableChange> changes) Applies a list of Spark table changes to anUpdateSchema
operation.- Parameters:
pendingUpdate
- an uncommitted UpdateSchema operation to configurechanges
- a list of Spark table changes- Returns:
- the UpdateSchema operation configured with the changes
-
toIcebergTable
-
toOrdering
public static org.apache.spark.sql.connector.expressions.SortOrder[] toOrdering(SortOrder sortOrder) -
toTransforms
public static org.apache.spark.sql.connector.expressions.Transform[] toTransforms(Schema schema, List<PartitionField> fields) -
toTransforms
public static org.apache.spark.sql.connector.expressions.Transform[] toTransforms(PartitionSpec spec) Converts a PartitionSpec to Spark transforms.- Parameters:
spec
- a PartitionSpec- Returns:
- an array of Transforms
-
toNamedReference
public static org.apache.spark.sql.connector.expressions.NamedReference toNamedReference(String name) -
toIcebergTerm
-
toPartitionSpec
public static PartitionSpec toPartitionSpec(Schema schema, org.apache.spark.sql.connector.expressions.Transform[] partitioning) Converts Spark transforms into aPartitionSpec
.- Parameters:
schema
- the table schemapartitioning
- Spark Transforms- Returns:
- a PartitionSpec
-
describe
-
describe
-
describe
-
describe
-
describe
-
extensionsEnabled
public static boolean extensionsEnabled(org.apache.spark.sql.SparkSession spark) -
loadIcebergTable
public static Table loadIcebergTable(org.apache.spark.sql.SparkSession spark, String name) throws org.apache.spark.sql.catalyst.parser.ParseException, org.apache.spark.sql.catalyst.analysis.NoSuchTableException Returns an Iceberg Table by its name from a Spark V2 Catalog. If cache is enabled inSparkCatalog
, theTableOperations
of the table may be stale, please refresh the table to get the latest one.- Parameters:
spark
- SparkSession used for looking up catalog references and tablesname
- The multipart identifier of the Iceberg table- Returns:
- an Iceberg table
- Throws:
org.apache.spark.sql.catalyst.parser.ParseException
org.apache.spark.sql.catalyst.analysis.NoSuchTableException
-
loadIcebergCatalog
public static Catalog loadIcebergCatalog(org.apache.spark.sql.SparkSession spark, String catalogName) Returns the underlying Iceberg Catalog object represented by a Spark Catalog- Parameters:
spark
- SparkSession used for looking up catalog referencecatalogName
- The name of the Spark Catalog being referenced- Returns:
- the Iceberg catalog class being wrapped by the Spark Catalog
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, String name) throws org.apache.spark.sql.catalyst.parser.ParseException - Throws:
org.apache.spark.sql.catalyst.parser.ParseException
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, String name, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog) throws org.apache.spark.sql.catalyst.parser.ParseException - Throws:
org.apache.spark.sql.catalyst.parser.ParseException
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(String description, org.apache.spark.sql.SparkSession spark, String name) -
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(String description, org.apache.spark.sql.SparkSession spark, String name, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog) -
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, List<String> nameParts) -
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, List<String> nameParts, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog) A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply Attempts to find the catalog and identifier a multipart identifier represents- Parameters:
spark
- Spark session to use for resolutionnameParts
- Multipart identifier representing a tabledefaultCatalog
- Catalog to use if none is specified- Returns:
- The CatalogPlugin and Identifier for the table
-
identifierToTableIdentifier
public static TableIdentifier identifierToTableIdentifier(org.apache.spark.sql.connector.catalog.Identifier identifier) -
quotedFullIdentifier
-
getPartitions
public static List<SparkTableUtil.SparkPartition> getPartitions(org.apache.spark.sql.SparkSession spark, org.apache.hadoop.fs.Path rootPath, String format, Map<String, String> partitionFilter, PartitionSpec partitionSpec) Use Spark to list all partitions in the table.- Parameters:
spark
- a Spark sessionrootPath
- a table identifierformat
- format of the filepartitionFilter
- partitionFilter of the filepartitionSpec
- partitionSpec of the table- Returns:
- all table's partitions
-
toV1TableIdentifier
public static org.apache.spark.sql.catalyst.TableIdentifier toV1TableIdentifier(org.apache.spark.sql.connector.catalog.Identifier identifier)
-