Interface RowLevelOperation
-
All Known Subinterfaces:
SupportsDelta
public interface RowLevelOperation
A logical representation of a data source DELETE, UPDATE, or MERGE operation that requires rewriting data.
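A minimal sketch of one possible implementation, assuming a group-based source that can only replace whole partitions and a DELETE command. PartitionScanBuilder and PartitionReplaceWriteBuilder are hypothetical connector-specific classes, not part of the Spark API.

    import org.apache.spark.sql.connector.read.ScanBuilder;
    import org.apache.spark.sql.connector.write.ExtendedLogicalWriteInfo;
    import org.apache.spark.sql.connector.write.RowLevelOperation;
    import org.apache.spark.sql.connector.write.WriteBuilder;
    import org.apache.spark.sql.util.CaseInsensitiveStringMap;

    // Sketch only: a DELETE that rewrites whole partitions. The two builders
    // referenced below are hypothetical connector classes.
    public class PartitionBasedDeleteOperation implements RowLevelOperation {

      @Override
      public String description() {
        return "partition-based DELETE";
      }

      @Override
      public Command command() {
        return Command.DELETE;
      }

      @Override
      public ScanBuilder newScanBuilder(CaseInsensitiveStringMap options) {
        // The scan must return every row of each affected partition,
        // not only the rows matching the DELETE condition.
        return new PartitionScanBuilder(options);
      }

      @Override
      public WriteBuilder newWriteBuilder(ExtendedLogicalWriteInfo info) {
        // The write replaces the partitions produced by the scan above.
        return new PartitionReplaceWriteBuilder(info);
      }
    }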
-
Nested Class Summary
static class RowLevelOperation.Command
    The SQL operation being performed.
-
Method Summary
RowLevelOperation.Command command()
    Returns the actual SQL operation being performed.
default java.lang.String description()
    Returns the description associated with this row-level operation.
org.apache.spark.sql.connector.read.ScanBuilder newScanBuilder(org.apache.spark.sql.util.CaseInsensitiveStringMap options)
    Returns a scan builder to configure a scan for this row-level operation.
org.apache.spark.sql.connector.write.WriteBuilder newWriteBuilder(ExtendedLogicalWriteInfo info)
    Returns a write builder to configure a write for this row-level operation.
default org.apache.spark.sql.connector.expressions.NamedReference[] requiredMetadataAttributes()
    Returns metadata attributes that are required to perform this row-level operation.
-
Method Detail
-
description
default java.lang.String description()
Returns the description associated with this row-level operation.
-
command
RowLevelOperation.Command command()
Returns the actual SQL operation being performed.
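As an illustration only, connector code can branch on the returned command; the helper below is hypothetical and assumes the Command constants DELETE, UPDATE, and MERGE.

    import org.apache.spark.sql.connector.write.RowLevelOperation;
    import org.apache.spark.sql.connector.write.RowLevelOperation.Command;

    // Hypothetical helper, not part of the Spark API.
    final class RowLevelOperationUtil {
      // DELETE only removes rows; UPDATE and MERGE also produce new row values
      // that must be written back alongside any retained rows.
      static boolean producesNewRows(RowLevelOperation operation) {
        return operation.command() != Command.DELETE;
      }
    }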
-
newScanBuilder
org.apache.spark.sql.connector.read.ScanBuilder newScanBuilder(org.apache.spark.sql.util.CaseInsensitiveStringMap options)
Returns a scan builder to configure a scan for this row-level operation.

Sources fall into two categories: those that can handle a delta of rows and those that need to replace groups (e.g. partitions, files). Sources that handle deltas allow Spark to quickly discard unchanged rows and have no requirements for input scans. Sources that replace groups of rows can discard deleted rows but need to keep unchanged rows so they can be passed back into the source. This means that scans for such data sources must produce all rows in a group if any are returned. Some sources will avoid pushing filters into files (file granularity), while others will avoid pruning files within a partition (partition granularity).

For example, if a source can only replace partitions, the scan must return all rows of a partition even when a filter could narrow the set of changes to a single file in that partition. Similarly, a source that can swap individual files must produce all rows of any file that contains at least one record to be changed, not just the changed rows.
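To make the partition-granularity case concrete, here is one possible shape of such a scan builder; PartitionScan is a hypothetical connector class, and retaining every filter as residual is just one way to guarantee that whole partitions are returned.

    import org.apache.spark.sql.connector.read.Scan;
    import org.apache.spark.sql.connector.read.SupportsPushDownFilters;
    import org.apache.spark.sql.sources.Filter;
    import org.apache.spark.sql.util.CaseInsensitiveStringMap;

    // Sketch of a scan builder for a source that can only replace whole partitions.
    // PartitionScan is a hypothetical connector class; SupportsPushDownFilters is a
    // ScanBuilder mix-in.
    class PartitionScanBuilder implements SupportsPushDownFilters {
      private final CaseInsensitiveStringMap options;
      private Filter[] filters = new Filter[0];

      PartitionScanBuilder(CaseInsensitiveStringMap options) {
        this.options = options;
      }

      @Override
      public Filter[] pushFilters(Filter[] filters) {
        // Use the predicates only to prune whole partitions; return all of them as
        // residual filters because no row inside a matched partition may be skipped.
        this.filters = filters;
        return filters;
      }

      @Override
      public Filter[] pushedFilters() {
        // Pushed only in the inexact, partition-pruning sense.
        return filters;
      }

      @Override
      public Scan build() {
        // The resulting scan returns every row of every matched partition.
        return new PartitionScan(options, filters);
      }
    }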
-
newWriteBuilder
org.apache.spark.sql.connector.write.WriteBuilder newWriteBuilder(ExtendedLogicalWriteInfo info)
Returns a write builder to configure a write for this row-level operation.

Note that Spark will first configure the scan and then the write, allowing data sources to pass information from the scan to the write. For example, the scan can report which condition was used to read the data, and the write may need that condition under certain isolation levels.
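A sketch of that hand-off, assuming the connector simply caches the scan builder it returned to Spark; ConditionAwareScanBuilder (with its hypothetical readCondition() accessor) and GroupReplaceWriteBuilder are not Spark classes.

    import org.apache.spark.sql.connector.read.ScanBuilder;
    import org.apache.spark.sql.connector.write.ExtendedLogicalWriteInfo;
    import org.apache.spark.sql.connector.write.RowLevelOperation;
    import org.apache.spark.sql.connector.write.WriteBuilder;
    import org.apache.spark.sql.util.CaseInsensitiveStringMap;

    // Sketch only: the operation remembers the scan builder it handed to Spark so
    // the write can see what was read. ConditionAwareScanBuilder and
    // GroupReplaceWriteBuilder are hypothetical connector classes.
    class ConditionAwareUpdateOperation implements RowLevelOperation {
      private ConditionAwareScanBuilder scanBuilder;

      @Override
      public Command command() {
        return Command.UPDATE;
      }

      @Override
      public ScanBuilder newScanBuilder(CaseInsensitiveStringMap options) {
        // Configured first: this builder records the condition used to read the data.
        this.scanBuilder = new ConditionAwareScanBuilder(options);
        return scanBuilder;
      }

      @Override
      public WriteBuilder newWriteBuilder(ExtendedLogicalWriteInfo info) {
        // Configured second: the recorded condition can be validated again at commit
        // time under stricter isolation levels.
        return new GroupReplaceWriteBuilder(info, scanBuilder.readCondition());
      }
    }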
-
requiredMetadataAttributes
default org.apache.spark.sql.connector.expressions.NamedReference[] requiredMetadataAttributes()
Returns metadata attributes that are required to perform this row-level operation.

Data sources can use this method to project metadata columns needed for writing the data back (e.g. metadata columns for grouping data).
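For example, a group-based implementation could request a hidden partition metadata column so rewritten rows can be routed back to their groups; the "_partition" column name is a made-up example, and the reference is built with the Expressions.column helper.

    // Sketch only: inside a RowLevelOperation implementation (such as the
    // partition-based DELETE sketched earlier), ask Spark to project a hidden
    // "_partition" metadata column (a hypothetical name) used to group rows on write.
    @Override
    public org.apache.spark.sql.connector.expressions.NamedReference[] requiredMetadataAttributes() {
      return new org.apache.spark.sql.connector.expressions.NamedReference[] {
        org.apache.spark.sql.connector.expressions.Expressions.column("_partition")
      };
    }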