Package org.apache.iceberg.spark
Class Spark3Util
- java.lang.Object
  - org.apache.iceberg.spark.Spark3Util
-
public class Spark3Util extends java.lang.Object
-
-
Nested Class Summary
Nested Classes
- static class Spark3Util.CatalogAndIdentifier
  This mimics a class inside of Spark which is private inside of LookupCatalog.
- static class Spark3Util.DescribeSchemaVisitor
-
Method Summary
All methods are static and concrete.

- static UpdateProperties applyPropertyChanges(UpdateProperties pendingUpdate, java.util.List<org.apache.spark.sql.connector.catalog.TableChange> changes)
  Applies a list of Spark table changes to an UpdateProperties operation.
- static UpdateSchema applySchemaChanges(UpdateSchema pendingUpdate, java.util.List<org.apache.spark.sql.connector.catalog.TableChange> changes)
  Applies a list of Spark table changes to an UpdateSchema operation.
- static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(java.lang.String description, org.apache.spark.sql.SparkSession spark, java.lang.String name)
- static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(java.lang.String description, org.apache.spark.sql.SparkSession spark, java.lang.String name, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
- static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.lang.String name)
- static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.lang.String name, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
- static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.util.List<java.lang.String> nameParts)
- static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.util.List<java.lang.String> nameParts, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
  A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply; attempts to find the catalog and identifier that a multipart identifier represents.
- static java.lang.String describe(java.util.List<Expression> exprs)
- static java.lang.String describe(Expression expr)
- static java.lang.String describe(Schema schema)
- static java.lang.String describe(SortOrder order)
- static java.lang.String describe(Type type)
- static boolean extensionsEnabled(org.apache.spark.sql.SparkSession spark)
- static java.util.List<SparkTableUtil.SparkPartition> getPartitions(org.apache.spark.sql.SparkSession spark, org.apache.hadoop.fs.Path rootPath, java.lang.String format, java.util.Map<java.lang.String,java.lang.String> partitionFilter, PartitionSpec partitionSpec)
  Use Spark to list all partitions in the table.
- static TableIdentifier identifierToTableIdentifier(org.apache.spark.sql.connector.catalog.Identifier identifier)
- static Catalog loadIcebergCatalog(org.apache.spark.sql.SparkSession spark, java.lang.String catalogName)
  Returns the underlying Iceberg Catalog object represented by a Spark Catalog.
- static Table loadIcebergTable(org.apache.spark.sql.SparkSession spark, java.lang.String name)
  Returns an Iceberg Table by its name from a Spark V2 Catalog.
- static java.lang.String quotedFullIdentifier(java.lang.String catalogName, org.apache.spark.sql.connector.catalog.Identifier identifier)
- static java.util.Map<java.lang.String,java.lang.String> rebuildCreateProperties(java.util.Map<java.lang.String,java.lang.String> createProperties)
- static org.apache.spark.sql.util.CaseInsensitiveStringMap setOption(java.lang.String key, java.lang.String value, org.apache.spark.sql.util.CaseInsensitiveStringMap options)
- static Table toIcebergTable(org.apache.spark.sql.connector.catalog.Table table)
- static Term toIcebergTerm(org.apache.spark.sql.connector.expressions.Expression expr)
- static org.apache.spark.sql.connector.expressions.NamedReference toNamedReference(java.lang.String name)
- static org.apache.spark.sql.connector.expressions.SortOrder[] toOrdering(SortOrder sortOrder)
- static PartitionSpec toPartitionSpec(Schema schema, org.apache.spark.sql.connector.expressions.Transform[] partitioning)
  Converts Spark transforms into a PartitionSpec.
- static org.apache.spark.sql.connector.expressions.Transform[] toTransforms(PartitionSpec spec)
  Converts a PartitionSpec to Spark transforms.
- static org.apache.spark.sql.connector.expressions.Transform[] toTransforms(Schema schema, java.util.List<PartitionField> fields)
- static org.apache.spark.sql.catalyst.TableIdentifier toV1TableIdentifier(org.apache.spark.sql.connector.catalog.Identifier identifier)
-
-
-
Method Detail
-
setOption
public static org.apache.spark.sql.util.CaseInsensitiveStringMap setOption(java.lang.String key, java.lang.String value, org.apache.spark.sql.util.CaseInsensitiveStringMap options)
-
rebuildCreateProperties
public static java.util.Map<java.lang.String,java.lang.String> rebuildCreateProperties(java.util.Map<java.lang.String,java.lang.String> createProperties)
-
applyPropertyChanges
public static UpdateProperties applyPropertyChanges(UpdateProperties pendingUpdate, java.util.List<org.apache.spark.sql.connector.catalog.TableChange> changes)
Applies a list of Spark table changes to an UpdateProperties operation.
- Parameters:
  pendingUpdate - an uncommitted UpdateProperties operation to configure
  changes - a list of Spark table changes
- Returns:
  the UpdateProperties operation configured with the changes
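A minimal usage sketch of applying Spark property changes to an Iceberg table. The table handle and property keys below are hypothetical; `TableChange.setProperty` and `TableChange.removeProperty` are Spark's factory methods for property changes.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.iceberg.Table;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.connector.catalog.TableChange;

public class PropertyChangeExample {
  // 'table' is an already-loaded Iceberg table; the property keys are hypothetical
  static void applyChanges(Table table) {
    List<TableChange> changes = Arrays.asList(
        TableChange.setProperty("write.format.default", "parquet"),
        TableChange.removeProperty("obsolete.key"));

    // Configure the pending update with the Spark changes, then commit
    Spark3Util.applyPropertyChanges(table.updateProperties(), changes).commit();
  }
}
```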
-
applySchemaChanges
public static UpdateSchema applySchemaChanges(UpdateSchema pendingUpdate, java.util.List<org.apache.spark.sql.connector.catalog.TableChange> changes)
Applies a list of Spark table changes to an UpdateSchema operation.
- Parameters:
  pendingUpdate - an uncommitted UpdateSchema operation to configure
  changes - a list of Spark table changes
- Returns:
  the UpdateSchema operation configured with the changes
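A minimal usage sketch of applying Spark schema changes to an Iceberg table. The table handle and column names below are hypothetical; `TableChange.addColumn` and `TableChange.renameColumn` are Spark's factory methods for schema changes.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.iceberg.Table;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.connector.catalog.TableChange;
import org.apache.spark.sql.types.DataTypes;

public class SchemaChangeExample {
  // 'table' is an already-loaded Iceberg table; the column names are hypothetical
  static void applyChanges(Table table) {
    List<TableChange> changes = Arrays.asList(
        TableChange.addColumn(new String[] {"event_ts"}, DataTypes.TimestampType),
        TableChange.renameColumn(new String[] {"id"}, "event_id"));

    // Configure the pending schema update with the Spark changes, then commit
    Spark3Util.applySchemaChanges(table.updateSchema(), changes).commit();
  }
}
```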
-
toIcebergTable
public static Table toIcebergTable(org.apache.spark.sql.connector.catalog.Table table)
-
toOrdering
public static org.apache.spark.sql.connector.expressions.SortOrder[] toOrdering(SortOrder sortOrder)
-
toTransforms
public static org.apache.spark.sql.connector.expressions.Transform[] toTransforms(Schema schema, java.util.List<PartitionField> fields)
-
toTransforms
public static org.apache.spark.sql.connector.expressions.Transform[] toTransforms(PartitionSpec spec)
Converts a PartitionSpec to Spark transforms.
- Parameters:
  spec - a PartitionSpec
- Returns:
  an array of Transforms
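A sketch of the conversion, using a small hypothetical schema. Each Iceberg partition field (here a day transform on `ts` and a 16-way bucket on `id`) should map to a corresponding Spark Transform.

```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.connector.expressions.Transform;

public class ToTransformsExample {
  public static void main(String[] args) {
    // Hypothetical two-column schema
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.required(2, "ts", Types.TimestampType.withZone()));

    PartitionSpec spec = PartitionSpec.builderFor(schema)
        .day("ts")
        .bucket("id", 16)
        .build();

    // One Spark Transform per partition field, e.g. days(ts) and bucket(16, id)
    Transform[] transforms = Spark3Util.toTransforms(spec);
  }
}
```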
-
toNamedReference
public static org.apache.spark.sql.connector.expressions.NamedReference toNamedReference(java.lang.String name)
-
toIcebergTerm
public static Term toIcebergTerm(org.apache.spark.sql.connector.expressions.Expression expr)
-
toPartitionSpec
public static PartitionSpec toPartitionSpec(Schema schema, org.apache.spark.sql.connector.expressions.Transform[] partitioning)
Converts Spark transforms into a PartitionSpec.
- Parameters:
  schema - the table schema
  partitioning - Spark Transforms
- Returns:
  a PartitionSpec
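The inverse direction can be sketched like this, again with a hypothetical schema. `Expressions.days` and `Expressions.bucket` are Spark's factory methods for transform expressions.

```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.iceberg.types.Types;
import org.apache.spark.sql.connector.expressions.Expressions;
import org.apache.spark.sql.connector.expressions.Transform;

public class ToPartitionSpecExample {
  public static void main(String[] args) {
    // Hypothetical two-column schema
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.required(2, "ts", Types.TimestampType.withZone()));

    // Spark-side partitioning: daily on ts, 16 buckets on id
    Transform[] partitioning = new Transform[] {
        Expressions.days("ts"),
        Expressions.bucket(16, "id")
    };

    PartitionSpec spec = Spark3Util.toPartitionSpec(schema, partitioning);
  }
}
```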
-
describe
public static java.lang.String describe(java.util.List<Expression> exprs)
-
describe
public static java.lang.String describe(Expression expr)
-
describe
public static java.lang.String describe(Schema schema)
-
describe
public static java.lang.String describe(Type type)
-
describe
public static java.lang.String describe(SortOrder order)
-
extensionsEnabled
public static boolean extensionsEnabled(org.apache.spark.sql.SparkSession spark)
-
loadIcebergTable
public static Table loadIcebergTable(org.apache.spark.sql.SparkSession spark, java.lang.String name) throws org.apache.spark.sql.catalyst.parser.ParseException, org.apache.spark.sql.catalyst.analysis.NoSuchTableException
Returns an Iceberg Table by its name from a Spark V2 Catalog. If caching is enabled in SparkCatalog, the TableOperations of the table may be stale; refresh the table to get the latest metadata.
- Parameters:
  spark - SparkSession used for looking up catalog references and tables
  name - the multipart identifier of the Iceberg table
- Returns:
  an Iceberg table
- Throws:
  org.apache.spark.sql.catalyst.parser.ParseException
  org.apache.spark.sql.catalyst.analysis.NoSuchTableException
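A sketch of loading a table by name. It assumes an active session with an Iceberg catalog already configured; the catalog, namespace, and table names are hypothetical.

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.SparkSession;

public class LoadTableExample {
  public static void main(String[] args) throws Exception {
    // Assumes an active session; "my_catalog.db.events" is a hypothetical name
    SparkSession spark = SparkSession.active();
    Table table = Spark3Util.loadIcebergTable(spark, "my_catalog.db.events");

    // If caching is enabled in SparkCatalog, refresh to avoid stale metadata
    table.refresh();
  }
}
```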
-
loadIcebergCatalog
public static Catalog loadIcebergCatalog(org.apache.spark.sql.SparkSession spark, java.lang.String catalogName)
Returns the underlying Iceberg Catalog object represented by a Spark Catalog.
- Parameters:
  spark - SparkSession used for looking up the catalog reference
  catalogName - the name of the Spark Catalog being referenced
- Returns:
  the Iceberg catalog class being wrapped by the Spark Catalog
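A sketch of unwrapping the native Iceberg catalog. The catalog name "my_catalog" is hypothetical and assumes a session configured with `spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog`.

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.SparkSession;

public class LoadCatalogExample {
  public static void main(String[] args) {
    // "my_catalog" is a hypothetical Spark catalog backed by SparkCatalog
    SparkSession spark = SparkSession.active();
    Catalog icebergCatalog = Spark3Util.loadIcebergCatalog(spark, "my_catalog");

    // The unwrapped catalog can then be used with the native Iceberg API
    Table table = icebergCatalog.loadTable(TableIdentifier.of("db", "events"));
  }
}
```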
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.lang.String name) throws org.apache.spark.sql.catalyst.parser.ParseException
- Throws:
org.apache.spark.sql.catalyst.parser.ParseException
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.lang.String name, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog) throws org.apache.spark.sql.catalyst.parser.ParseException
- Throws:
org.apache.spark.sql.catalyst.parser.ParseException
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(java.lang.String description, org.apache.spark.sql.SparkSession spark, java.lang.String name)
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(java.lang.String description, org.apache.spark.sql.SparkSession spark, java.lang.String name, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.util.List<java.lang.String> nameParts)
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, java.util.List<java.lang.String> nameParts, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier that a multipart identifier represents.
- Parameters:
  spark - Spark session to use for resolution
  nameParts - multipart identifier representing a table
  defaultCatalog - catalog to use if none is specified
- Returns:
  the CatalogPlugin and Identifier for the table
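A sketch of resolving a multipart name into its catalog and identifier, using the overload that falls back to the session's current catalog. The name parts "my_catalog", "db", and "events" are hypothetical.

```java
import java.util.Arrays;

import org.apache.iceberg.spark.Spark3Util;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.connector.catalog.CatalogPlugin;
import org.apache.spark.sql.connector.catalog.Identifier;

public class ResolveExample {
  public static void main(String[] args) {
    // Hypothetical name parts for "my_catalog.db.events"
    SparkSession spark = SparkSession.active();
    Spark3Util.CatalogAndIdentifier resolved = Spark3Util.catalogAndIdentifier(
        spark, Arrays.asList("my_catalog", "db", "events"));

    // The resolved pair: which catalog plugin owns the table, and its identifier
    CatalogPlugin catalog = resolved.catalog();
    Identifier identifier = resolved.identifier();
  }
}
```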
-
identifierToTableIdentifier
public static TableIdentifier identifierToTableIdentifier(org.apache.spark.sql.connector.catalog.Identifier identifier)
-
quotedFullIdentifier
public static java.lang.String quotedFullIdentifier(java.lang.String catalogName, org.apache.spark.sql.connector.catalog.Identifier identifier)
-
getPartitions
public static java.util.List<SparkTableUtil.SparkPartition> getPartitions(org.apache.spark.sql.SparkSession spark, org.apache.hadoop.fs.Path rootPath, java.lang.String format, java.util.Map<java.lang.String,java.lang.String> partitionFilter, PartitionSpec partitionSpec)
Use Spark to list all partitions in the table.
- Parameters:
  spark - a Spark session
  rootPath - the root path of the table
  format - format of the data files
  partitionFilter - a partition filter to apply
  partitionSpec - the PartitionSpec of the table
- Returns:
  all of the table's partitions
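A sketch of listing partitions under a table root. The path, file format, and filter values below are hypothetical; `spec` is assumed to be the PartitionSpec of the target table.

```java
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.spark.Spark3Util;
import org.apache.iceberg.spark.SparkTableUtil;
import org.apache.spark.sql.SparkSession;

public class ListPartitionsExample {
  // 'spec' is the PartitionSpec of the target table; the path, format,
  // and filter values are hypothetical
  static List<SparkTableUtil.SparkPartition> list(SparkSession spark, PartitionSpec spec) {
    return Spark3Util.getPartitions(
        spark,
        new Path("hdfs://namenode/warehouse/db/events"),
        "parquet",
        Collections.singletonMap("dt", "2024-01-01"),
        spec);
  }
}
```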
-
toV1TableIdentifier
public static org.apache.spark.sql.catalyst.TableIdentifier toV1TableIdentifier(org.apache.spark.sql.connector.catalog.Identifier identifier)
-
-