Package org.apache.iceberg.spark
Class Spark3Util
java.lang.Object
org.apache.iceberg.spark.Spark3Util
-
Nested Class Summary
Nested Classes
static class  Spark3Util.CatalogAndIdentifier
    This mimics a class inside of Spark which is private inside of LookupCatalog.
static class
Method Summary
static UpdateProperties applyPropertyChanges(UpdateProperties pendingUpdate, List<org.apache.spark.sql.connector.catalog.TableChange> changes)
    Applies a list of Spark table changes to an UpdateProperties operation.
static UpdateSchema applySchemaChanges(UpdateSchema pendingUpdate, List<org.apache.spark.sql.connector.catalog.TableChange> changes)
    Applies a list of Spark table changes to an UpdateSchema operation.
static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(String description, org.apache.spark.sql.SparkSession spark, String name)
static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(String description, org.apache.spark.sql.SparkSession spark, String name, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, String name)
static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, String name, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, List<String> nameParts)
static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, List<String> nameParts, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
    A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier that a multipart identifier represents.
static String describe(List<Expression> exprs)
static String describe(Expression expr)
static String describe(…)
static String describe(…)
static String describe(…)
static boolean extensionsEnabled(org.apache.spark.sql.SparkSession spark)
static List<SparkTableUtil.SparkPartition> getPartitions(org.apache.spark.sql.SparkSession spark, org.apache.hadoop.fs.Path rootPath, String format, Map<String, String> partitionFilter, PartitionSpec partitionSpec)
    Use Spark to list all partitions in the table.
static TableIdentifier identifierToTableIdentifier(org.apache.spark.sql.connector.catalog.Identifier identifier)
static Catalog loadIcebergCatalog(org.apache.spark.sql.SparkSession spark, String catalogName)
    Returns the underlying Iceberg Catalog object represented by a Spark Catalog.
static Table loadIcebergTable(org.apache.spark.sql.SparkSession spark, String name)
    Returns an Iceberg Table by its name from a Spark V2 Catalog.
static String quotedFullIdentifier(String catalogName, org.apache.spark.sql.connector.catalog.Identifier identifier)
rebuildCreateProperties(Map<String, String> createProperties)
static org.apache.spark.sql.util.CaseInsensitiveStringMap setOption(…)
static Table toIcebergTable(org.apache.spark.sql.connector.catalog.Table table)
static Term toIcebergTerm(org.apache.spark.sql.connector.expressions.Expression expr)
static org.apache.spark.sql.connector.expressions.NamedReference toNamedReference(String name)
static org.apache.spark.sql.connector.expressions.SortOrder[] toOrdering(SortOrder sortOrder)
static PartitionSpec toPartitionSpec(Schema schema, org.apache.spark.sql.connector.expressions.Transform[] partitioning)
    Converts Spark transforms into a PartitionSpec.
static org.apache.spark.sql.connector.expressions.Transform[] toTransforms(PartitionSpec spec)
    Converts a PartitionSpec to Spark transforms.
static org.apache.spark.sql.connector.expressions.Transform[] toTransforms(Schema schema, List<PartitionField> fields)
static org.apache.spark.sql.catalyst.TableIdentifier toV1TableIdentifier(org.apache.spark.sql.connector.catalog.Identifier identifier)
-
Method Details
-
setOption
-
rebuildCreateProperties
-
applyPropertyChanges
public static UpdateProperties applyPropertyChanges(UpdateProperties pendingUpdate, List<org.apache.spark.sql.connector.catalog.TableChange> changes)
Applies a list of Spark table changes to an UpdateProperties operation.
Parameters:
pendingUpdate - an uncommitted UpdateProperties operation to configure
changes - a list of Spark table changes
Returns:
the UpdateProperties operation configured with the changes
-
applySchemaChanges
public static UpdateSchema applySchemaChanges(UpdateSchema pendingUpdate, List<org.apache.spark.sql.connector.catalog.TableChange> changes)
Applies a list of Spark table changes to an UpdateSchema operation.
Parameters:
pendingUpdate - an uncommitted UpdateSchema operation to configure
changes - a list of Spark table changes
Returns:
the UpdateSchema operation configured with the changes
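Both helpers walk the list of Spark table changes and translate each one into a call on the pending Iceberg update. The dispatch pattern can be sketched with simplified stand-in types; the Change, SetProperty, and RemoveProperty types below are illustrative only, not the real org.apache.spark.sql.connector.catalog.TableChange hierarchy:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ApplyChangesSketch {
    // Illustrative stand-ins for Spark's TableChange.SetProperty / RemoveProperty.
    public sealed interface Change permits SetProperty, RemoveProperty {}
    public record SetProperty(String key, String value) implements Change {}
    public record RemoveProperty(String key) implements Change {}

    // Stand-in for an uncommitted UpdateProperties operation, modeled as a map.
    // Each change is dispatched by type onto the pending update; unknown change
    // types are rejected, mirroring how an unsupported TableChange would fail.
    public static Map<String, String> applyPropertyChanges(Map<String, String> pending, List<Change> changes) {
        for (Change change : changes) {
            if (change instanceof SetProperty set) {
                pending.put(set.key(), set.value());
            } else if (change instanceof RemoveProperty remove) {
                pending.remove(remove.key());
            } else {
                throw new UnsupportedOperationException("Cannot apply unknown change: " + change);
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        Map<String, String> pending = new HashMap<>(Map.of("format-version", "1"));
        applyPropertyChanges(pending, List.of(
            new SetProperty("commit.retry.num-retries", "10"),
            new RemoveProperty("format-version")));
        System.out.println(pending); // {commit.retry.num-retries=10}
    }
}
```

The real methods work the same way but mutate the Iceberg UpdateProperties / UpdateSchema operation, which is only applied when commit() is called on it.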
-
toIcebergTable
public static Table toIcebergTable(org.apache.spark.sql.connector.catalog.Table table) -
toOrdering
public static org.apache.spark.sql.connector.expressions.SortOrder[] toOrdering(SortOrder sortOrder) -
toTransforms
public static org.apache.spark.sql.connector.expressions.Transform[] toTransforms(Schema schema, List<PartitionField> fields) -
toTransforms
public static org.apache.spark.sql.connector.expressions.Transform[] toTransforms(PartitionSpec spec)
Converts a PartitionSpec to Spark transforms.
Parameters:
spec - a PartitionSpec
Returns:
an array of Transforms
-
toNamedReference
public static org.apache.spark.sql.connector.expressions.NamedReference toNamedReference(String name) -
toIcebergTerm
public static Term toIcebergTerm(org.apache.spark.sql.connector.expressions.Expression expr) -
toPartitionSpec
public static PartitionSpec toPartitionSpec(Schema schema, org.apache.spark.sql.connector.expressions.Transform[] partitioning)
Converts Spark transforms into a PartitionSpec.
Parameters:
schema - the table schema
partitioning - Spark Transforms
Returns:
a PartitionSpec
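Conceptually, toPartitionSpec walks the Spark transforms and adds one matching Iceberg partition field per transform. The name mapping between the two systems can be sketched at the string level; this is a simplified illustration using an informal textual notation for the resulting fields, not the real method, which dispatches on Transform objects against the table schema:

```java
public class PartitionTransformSketch {
    // Maps a Spark transform name plus source column to an informal textual
    // description of the corresponding Iceberg partition field.
    // Spark uses plural time transforms (years, days, ...) while Iceberg
    // uses singular names (year, day, ...); bucket and truncate carry a
    // width argument; anything else is unsupported.
    public static String toIcebergField(String sparkTransform, String column, int... arg) {
        return switch (sparkTransform) {
            case "identity" -> column;
            case "years", "months", "days", "hours" ->
                sparkTransform.substring(0, sparkTransform.length() - 1) + "(" + column + ")";
            case "bucket" -> "bucket[" + arg[0] + "](" + column + ")";
            case "truncate" -> "truncate[" + arg[0] + "](" + column + ")";
            default -> throw new UnsupportedOperationException(
                "Transform is not supported: " + sparkTransform);
        };
    }

    public static void main(String[] args) {
        System.out.println(toIcebergField("days", "ts"));      // day(ts)
        System.out.println(toIcebergField("bucket", "id", 16)); // bucket[16](id)
    }
}
```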
-
describe
-
describe
-
describe
-
describe
-
describe
-
extensionsEnabled
public static boolean extensionsEnabled(org.apache.spark.sql.SparkSession spark) -
loadIcebergTable
public static Table loadIcebergTable(org.apache.spark.sql.SparkSession spark, String name) throws org.apache.spark.sql.catalyst.parser.ParseException, org.apache.spark.sql.catalyst.analysis.NoSuchTableException
Returns an Iceberg Table by its name from a Spark V2 Catalog. If caching is enabled in SparkCatalog, the table's TableOperations may be stale; refresh the table to get the latest state.
Parameters:
spark - SparkSession used for looking up catalog references and tables
name - The multipart identifier of the Iceberg table
Returns:
an Iceberg table
Throws:
org.apache.spark.sql.catalyst.parser.ParseException
org.apache.spark.sql.catalyst.analysis.NoSuchTableException
-
loadIcebergCatalog
public static Catalog loadIcebergCatalog(org.apache.spark.sql.SparkSession spark, String catalogName)
Returns the underlying Iceberg Catalog object represented by a Spark Catalog.
Parameters:
spark - SparkSession used for looking up the catalog reference
catalogName - The name of the Spark Catalog being referenced
Returns:
the Iceberg catalog class being wrapped by the Spark Catalog
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, String name) throws org.apache.spark.sql.catalyst.parser.ParseException
Throws:
org.apache.spark.sql.catalyst.parser.ParseException
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, String name, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog) throws org.apache.spark.sql.catalyst.parser.ParseException
Throws:
org.apache.spark.sql.catalyst.parser.ParseException
-
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(String description, org.apache.spark.sql.SparkSession spark, String name) -
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(String description, org.apache.spark.sql.SparkSession spark, String name, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog) -
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, List<String> nameParts) -
catalogAndIdentifier
public static Spark3Util.CatalogAndIdentifier catalogAndIdentifier(org.apache.spark.sql.SparkSession spark, List<String> nameParts, org.apache.spark.sql.connector.catalog.CatalogPlugin defaultCatalog)
A modified version of Spark's LookupCatalog.CatalogAndIdentifier.unapply. Attempts to find the catalog and identifier that a multipart identifier represents.
Parameters:
spark - Spark session to use for resolution
nameParts - Multipart identifier representing a table
defaultCatalog - Catalog to use if none is specified
Returns:
The CatalogPlugin and Identifier for the table
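The resolution rule can be mimicked in a self-contained sketch: if the first name part names a registered catalog, it becomes the catalog and is stripped off; otherwise the default catalog is used and all parts form the identifier. The resolve helper and the nested record below are illustrative stand-ins, not the actual Spark or Iceberg types:

```java
import java.util.List;
import java.util.Set;

public class CatalogAndIdentifierSketch {
    // Stand-in for Spark3Util.CatalogAndIdentifier: a catalog name plus an
    // identifier split into namespace parts and a final table name.
    public record CatalogAndIdentifier(String catalog, List<String> namespace, String name) {}

    // Mimics LookupCatalog.CatalogAndIdentifier.unapply's rule:
    // "prod.db.events" -> catalog "prod", identifier db.events (if "prod" is registered)
    // "db.events"      -> default catalog, identifier db.events
    public static CatalogAndIdentifier resolve(
            Set<String> registeredCatalogs, String defaultCatalog, List<String> nameParts) {
        String catalog = defaultCatalog;
        List<String> rest = nameParts;
        if (nameParts.size() > 1 && registeredCatalogs.contains(nameParts.get(0))) {
            catalog = nameParts.get(0);
            rest = nameParts.subList(1, nameParts.size());
        }
        return new CatalogAndIdentifier(
            catalog, rest.subList(0, rest.size() - 1), rest.get(rest.size() - 1));
    }

    public static void main(String[] args) {
        Set<String> catalogs = Set.of("prod", "spark_catalog");
        System.out.println(resolve(catalogs, "spark_catalog", List.of("prod", "db", "events")));
        System.out.println(resolve(catalogs, "spark_catalog", List.of("db", "events")));
    }
}
```

The real method additionally consults Spark's CatalogManager for registered catalogs, which is why a SparkSession is required.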
-
identifierToTableIdentifier
public static TableIdentifier identifierToTableIdentifier(org.apache.spark.sql.connector.catalog.Identifier identifier) -
quotedFullIdentifier
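A quoted full identifier joins the catalog name and identifier parts with dots, quoting any part that is not a plain identifier. The sketch below assumes Spark SQL's backtick convention (wrap in backticks, double any embedded backticks); the helper names are illustrative, not the real method:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class QuotedIdentifierSketch {
    // Backtick-quotes a part only when it contains characters beyond
    // letters, digits, and underscores; embedded backticks are doubled.
    public static String quote(String part) {
        if (part.matches("[a-zA-Z_][a-zA-Z0-9_]*")) {
            return part;
        }
        return "`" + part.replace("`", "``") + "`";
    }

    // Joins catalog, namespace parts, and table name into one dotted identifier.
    public static String quotedFullIdentifier(String catalogName, List<String> namespace, String name) {
        return Stream.concat(Stream.of(catalogName),
                Stream.concat(namespace.stream(), Stream.of(name)))
            .map(QuotedIdentifierSketch::quote)
            .collect(Collectors.joining("."));
    }

    public static void main(String[] args) {
        System.out.println(quotedFullIdentifier("prod", List.of("my db"), "events"));
        // prod.`my db`.events
    }
}
```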
public static String quotedFullIdentifier(String catalogName, org.apache.spark.sql.connector.catalog.Identifier identifier) -
getPartitions
public static List<SparkTableUtil.SparkPartition> getPartitions(org.apache.spark.sql.SparkSession spark, org.apache.hadoop.fs.Path rootPath, String format, Map<String, String> partitionFilter, PartitionSpec partitionSpec)
Use Spark to list all partitions in the table.
Parameters:
spark - a Spark session
rootPath - the root path of the table
format - the file format of the table's data files
partitionFilter - a filter on partition values
partitionSpec - the partition spec of the table
Returns:
all of the table's partitions
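Partition discovery over a path-based table boils down to walking Hive-style key=value directories under the root path and keeping those that match the filter. A minimal self-contained sketch of that idea, with illustrative helpers rather than the real Spark-backed implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionPathSketch {
    // Parses a relative Hive-style partition path such as "year=2024/month=05"
    // into ordered partition key/value pairs.
    public static Map<String, String> parsePartitionPath(String relativePath) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String segment : relativePath.split("/")) {
            String[] kv = segment.split("=", 2);
            values.put(kv[0], kv[1]);
        }
        return values;
    }

    // A partition matches the filter when it contains every filter entry.
    public static boolean matches(Map<String, String> partition, Map<String, String> filter) {
        return partition.entrySet().containsAll(filter.entrySet());
    }

    public static void main(String[] args) {
        Map<String, String> p = parsePartitionPath("year=2024/month=05");
        System.out.println(p);                                  // {year=2024, month=05}
        System.out.println(matches(p, Map.of("year", "2024"))); // true
    }
}
```

The real method delegates the directory listing to Spark so it can parallelize discovery over large tables, and returns SparkTableUtil.SparkPartition objects carrying the values, path, and format.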
-
toV1TableIdentifier
public static org.apache.spark.sql.catalyst.TableIdentifier toV1TableIdentifier(org.apache.spark.sql.connector.catalog.Identifier identifier)
-