Package org.apache.iceberg.spark
Class Spark3Util
java.lang.Object
  org.apache.iceberg.spark.Spark3Util

public class Spark3Util extends java.lang.Object
Nested Class Summary
static class Spark3Util.CatalogAndIdentifier
    This mimics a class inside of Spark which is private inside of LookupCatalog.
static class Spark3Util.DescribeSchemaVisitor
-
Method Summary
All methods are static and concrete.

static UpdateProperties applyPropertyChanges(UpdateProperties pendingUpdate, java.util.List<org.apache.spark.sql.connector.catalog.TableChange> changes)
    Applies a list of Spark table changes to an UpdateProperties operation.
static UpdateSchema applySchemaChanges(UpdateSchema pendingUpdate, java.util.List<org.apache.spark.sql.connector.catalog.TableChange> changes)
    Applies a list of Spark table changes to an UpdateSchema operation.
static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(java.lang.String description, org.apache.spark.sql.SparkSession spark, java.lang.String name)
static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(java.lang.String description, org.apache.spark.sql.SparkSession spark, java.lang.String name, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.lang.String name)
static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.lang.String name, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.util.List<java.lang.String> nameParts)
static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.util.List<java.lang.String> nameParts, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
    A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier that a multipart identifier represents.
static java.lang.String describe(java.util.List<Expression> exprs)
static java.lang.String describe(Expression expr)
static java.lang.String describe(Schema schema)
static java.lang.String describe(SortOrder order)
static java.lang.String describe(Type type)
static boolean extensionsEnabled(org.apache.spark.sql.SparkSession spark)
static java.util.List<SparkTableUtil.SparkPartition> getPartitions(org.apache.spark.sql.SparkSession spark, org.apache.hadoop.fs.Path rootPath, java.lang.String format, java.util.Map<java.lang.String,java.lang.String> partitionFilter, PartitionSpec partitionSpec)
    Use Spark to list all partitions in the table.
static TableIdentifier identifierToTableIdentifier(org.apache.spark.sql.connector.catalog.Identifier identifier)
static Catalog loadIcebergCatalog(org.apache.spark.sql.SparkSession spark, java.lang.String catalogName)
    Returns the underlying Iceberg Catalog object represented by a Spark Catalog.
static Table loadIcebergTable(org.apache.spark.sql.SparkSession spark, java.lang.String name)
    Returns an Iceberg Table by its name from a Spark V2 Catalog.
static java.lang.String quotedFullIdentifier(java.lang.String catalogName, org.apache.spark.sql.connector.catalog.Identifier identifier)
static java.util.Map<java.lang.String,java.lang.String> rebuildCreateProperties(java.util.Map<java.lang.String,java.lang.String> createProperties)
static org.apache.spark.sql.util.CaseInsensitiveStringMap setOption(java.lang.String key, java.lang.String value, org.apache.spark.sql.util.CaseInsensitiveStringMap options)
static Table toIcebergTable(org.apache.spark.sql.connector.catalog.Table table)
static Term toIcebergTerm(org.apache.spark.sql.connector.expressions.Expression expr)
static org.apache.spark.sql.connector.expressions.NamedReference toNamedReference(java.lang.String name)
static org.apache.spark.sql.connector.expressions.SortOrder[] toOrdering(SortOrder sortOrder)
static PartitionSpec toPartitionSpec(Schema schema, org.apache.spark.sql.connector.expressions.Transform[] partitioning)
    Converts Spark transforms into a PartitionSpec.
static org.apache.spark.sql.connector.expressions.Transform[] toTransforms(PartitionSpec spec)
    Converts a PartitionSpec to Spark transforms.
static org.apache.spark.sql.connector.expressions.Transform[] toTransforms(Schema schema, java.util.List<PartitionField> fields)
static org.apache.spark.sql.catalyst.TableIdentifier toV1TableIdentifier(org.apache.spark.sql.connector.catalog.Identifier identifier)
-
Method Detail
-
setOption
public static org.apache.spark.sql.util.CaseInsensitiveStringMap setOption(java.lang.String key, java.lang.String value, org.apache.spark.sql.util.CaseInsensitiveStringMap options)
-
rebuildCreateProperties
public static java.util.Map<java.lang.String,java.lang.String> rebuildCreateProperties(java.util.Map<java.lang.String,java.lang.String> createProperties)
-
applyPropertyChanges
public static UpdateProperties applyPropertyChanges(UpdateProperties pendingUpdate, java.util.List<org.apache.spark.sql.connector.catalog.TableChange> changes)
Applies a list of Spark table changes to an UpdateProperties operation.
Parameters:
    pendingUpdate - an uncommitted UpdateProperties operation to configure
    changes - a list of Spark table changes
Returns:
    the UpdateProperties operation configured with the changes
-
applySchemaChanges
public static UpdateSchema applySchemaChanges(UpdateSchema pendingUpdate, java.util.List<org.apache.spark.sql.connector.catalog.TableChange> changes)
Applies a list of Spark table changes to an UpdateSchema operation.
Parameters:
    pendingUpdate - an uncommitted UpdateSchema operation to configure
    changes - a list of Spark table changes
Returns:
    the UpdateSchema operation configured with the changes
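As a rough sketch of how this method might be used, the following applies Spark TableChange objects to an Iceberg table's pending schema update and commits the result. The table variable and the column names (comment, ts, event_ts) are hypothetical, and this assumes the Iceberg and Spark SQL connector libraries are on the classpath.

```java
import java.util.Arrays;
import java.util.List;
import org.apache.iceberg.Table;
import org.apache.iceberg.UpdateSchema;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.connector.catalog.TableChange;
import org.apache.spark.sql.types.DataTypes;

public class ApplySchemaChangesExample {
  // Applies a hypothetical set of column changes to an Iceberg table.
  public static void applyChanges(Table table) {
    List<TableChange> changes = Arrays.asList(
        // add a nullable string column named "comment"
        TableChange.addColumn(new String[] {"comment"}, DataTypes.StringType),
        // rename the "ts" column to "event_ts"
        TableChange.renameColumn(new String[] {"ts"}, "event_ts"));

    // configure the uncommitted UpdateSchema, then commit it
    UpdateSchema pending = table.updateSchema();
    Spark3Util.applySchemaChanges(pending, changes).commit();
  }
}
```

The same pattern applies to applyPropertyChanges with an UpdateProperties operation: both methods return the configured pending update so the caller decides when to commit.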
-
toIcebergTable
public static Table toIcebergTable(org.apache.spark.sql.connector.catalog.Table table)
-
toOrdering
public static org.apache.spark.sql.connector.expressions.SortOrder[] toOrdering(SortOrder sortOrder)
-
toTransforms
public static org.apache.spark.sql.connector.expressions.Transform[] toTransforms(Schema schema, java.util.List<PartitionField> fields)
-
toTransforms
public static org.apache.spark.sql.connector.expressions.Transform[] toTransforms(PartitionSpec spec)
Converts a PartitionSpec to Spark transforms.
Parameters:
    spec - a PartitionSpec
Returns:
    an array of Transforms
-
toNamedReference
public static org.apache.spark.sql.connector.expressions.NamedReference toNamedReference(java.lang.String name)
-
toIcebergTerm
public static Term toIcebergTerm(org.apache.spark.sql.connector.expressions.Expression expr)
-
toPartitionSpec
public static PartitionSpec toPartitionSpec(Schema schema, org.apache.spark.sql.connector.expressions.Transform[] partitioning)
Converts Spark transforms into a PartitionSpec.
Parameters:
    schema - the table schema
    partitioning - Spark Transforms
Returns:
    a PartitionSpec
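A minimal sketch of the conversion, assuming a hypothetical two-column schema; the field names and IDs are made up for illustration:

```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.connector.expressions.Expressions;
import org.apache.spark.sql.connector.expressions.Transform;

public class ToPartitionSpecExample {
  // Builds an Iceberg PartitionSpec from Spark DataSourceV2 transforms.
  public static PartitionSpec build() {
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.required(2, "ts", Types.TimestampType.withZone()));

    // partition by day(ts) and bucket(16, id), expressed as Spark transforms
    Transform[] partitioning = new Transform[] {
        Expressions.days("ts"),
        Expressions.bucket(16, "id")};

    return Spark3Util.toPartitionSpec(schema, partitioning);
  }
}
```

The inverse direction is covered by toTransforms(PartitionSpec), which turns an existing spec back into Spark Transform objects.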
-
describe
public static java.lang.String describe(java.util.List<Expression> exprs)
-
describe
public static java.lang.String describe(Expression expr)
-
describe
public static java.lang.String describe(Schema schema)
-
describe
public static java.lang.String describe(Type type)
-
describe
public static java.lang.String describe(SortOrder order)
-
extensionsEnabled
public static boolean extensionsEnabled(org.apache.spark.sql.SparkSession spark)
-
loadIcebergTable
public static Table loadIcebergTable(org.apache.spark.sql.SparkSession spark, java.lang.String name) throws org.apache.spark.sql.catalyst.parser.ParseException, org.apache.spark.sql.catalyst.analysis.NoSuchTableException
Returns an Iceberg Table by its name from a Spark V2 Catalog. If caching is enabled in SparkCatalog, the TableOperations of the table may be stale; refresh the table to get the latest state.
Parameters:
    spark - SparkSession used for looking up catalog references and tables
    name - The multipart identifier of the Iceberg table
Returns:
    an Iceberg table
Throws:
    org.apache.spark.sql.catalyst.parser.ParseException
    org.apache.spark.sql.catalyst.analysis.NoSuchTableException
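A hedged usage sketch; the catalog and table names (my_catalog, db.events) are hypothetical and must match a catalog configured in the Spark session:

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.SparkSession;

public class LoadTableExample {
  // Loads an Iceberg table through its Spark V2 catalog by multipart name.
  public static Table load(SparkSession spark) throws Exception {
    Table table = Spark3Util.loadIcebergTable(spark, "my_catalog.db.events");

    // when SparkCatalog caching is enabled, the cached metadata may be stale,
    // so refresh before reading the latest snapshot
    table.refresh();
    return table;
  }
}
```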
-
loadIcebergCatalog
public static Catalog loadIcebergCatalog(org.apache.spark.sql.SparkSession spark, java.lang.String catalogName)
Returns the underlying Iceberg Catalog object represented by a Spark Catalog.
Parameters:
    spark - SparkSession used for looking up the catalog reference
    catalogName - The name of the Spark Catalog being referenced
Returns:
    the Iceberg catalog class being wrapped by the Spark Catalog
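For illustration, the sketch below unwraps the Iceberg Catalog behind a Spark catalog and uses it directly; my_catalog and db.events are hypothetical names, and the catalog is assumed to be configured via spark.sql.catalog.my_catalog:

```java
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.SparkSession;

public class LoadCatalogExample {
  // Unwraps the Iceberg Catalog from a Spark catalog and checks for a table.
  public static boolean tableExists(SparkSession spark) {
    Catalog catalog = Spark3Util.loadIcebergCatalog(spark, "my_catalog");
    return catalog.tableExists(TableIdentifier.of("db", "events"));
  }
}
```

Unwrapping is useful when an operation (such as listing namespaces or dropping a table with purge) is easier through the native Iceberg Catalog API than through Spark SQL.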
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.lang.String name) throws org.apache.spark.sql.catalyst.parser.ParseException
- Throws:
org.apache.spark.sql.catalyst.parser.ParseException
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.lang.String name, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog) throws org.apache.spark.sql.catalyst.parser.ParseException
- Throws:
org.apache.spark.sql.catalyst.parser.ParseException
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(java.lang.String description, org.apache.spark.sql.SparkSession spark, java.lang.String name)
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(java.lang.String description, org.apache.spark.sql.SparkSession spark, java.lang.String name, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.util.List<java.lang.String> nameParts)
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.util.List<java.lang.String> nameParts, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier that a multipart identifier represents.
Parameters:
    spark - Spark session to use for resolution
    nameParts - Multipart identifier representing a table
    defaultCatalog - Catalog to use if none is specified
Returns:
    The CatalogPlugin and Identifier for the table
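A sketch of resolving a multipart name into its catalog and identifier parts; the name parts are hypothetical, and the session's current catalog is used as the fallback:

```java
import java.util.Arrays;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.connector.catalog.CatalogPlugin;
import org.apache.spark.sql.connector.catalog.Identifier;

public class CatalogAndIdentifierExample {
  // Splits ["my_catalog", "db", "events"] into a CatalogPlugin and an Identifier.
  public static Identifier resolve(SparkSession spark) {
    // fall back to the session's current catalog when no catalog part matches
    CatalogPlugin defaultCatalog =
        spark.sessionState().catalogManager().currentCatalog();

    Spark3Util.CatalogAndIdentifier resolved = Spark3Util.catalogAndIdentifier(
        spark, Arrays.asList("my_catalog", "db", "events"), defaultCatalog);

    CatalogPlugin catalog = resolved.catalog();   // resolved catalog plugin
    return resolved.identifier();                 // namespace + table name
  }
}
```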
-
identifierToTableIdentifier
public static TableIdentifier identifierToTableIdentifier(org.apache.spark.sql.connector.catalog.Identifier identifier)
-
quotedFullIdentifier
public static java.lang.String quotedFullIdentifier(java.lang.String catalogName, org.apache.spark.sql.connector.catalog.Identifier identifier)
-
getPartitions
public static java.util.List<SparkTableUtil.SparkPartition> getPartitions(org.apache.spark.sql.SparkSession spark, org.apache.hadoop.fs.Path rootPath, java.lang.String format, java.util.Map<java.lang.String,java.lang.String> partitionFilter, PartitionSpec partitionSpec)
Use Spark to list all partitions in the table.
Parameters:
    spark - a Spark session
    rootPath - a table identifier
    format - format of the file
    partitionFilter - partitionFilter of the file
    partitionSpec - partitionSpec of the table
Returns:
    all table's partitions
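A hedged sketch of a call site; the warehouse path is hypothetical, and the file format string must match the source table's data files:

```java
import java.util.Collections;
import java.util.List;
import org.apache.hadoop.fs.Path;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.iceberg.spark.SparkTableUtil;
import org.apache.spark.sql.SparkSession;

public class GetPartitionsExample {
  // Lists partitions under a table root path using Spark's file listing.
  public static List<SparkTableUtil.SparkPartition> list(
      SparkSession spark, PartitionSpec spec) {
    Path rootPath = new Path("/warehouse/db/events");  // hypothetical location

    // empty partitionFilter means no partitions are filtered out
    return Spark3Util.getPartitions(
        spark, rootPath, "parquet", Collections.emptyMap(), spec);
  }
}
```

This is typically used when migrating an existing path-based table into Iceberg, where the partitions must be discovered from the directory layout.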
-
toV1TableIdentifier
public static org.apache.spark.sql.catalyst.TableIdentifier toV1TableIdentifier(org.apache.spark.sql.connector.catalog.Identifier identifier)