public interface RowLevelOperation
Modifier and Type | Interface and Description |
---|---|
static class | RowLevelOperation.Command: The SQL operation being performed. |
Modifier and Type | Method and Description |
---|---|
RowLevelOperation.Command | command(): Returns the actual SQL operation being performed. |
default java.lang.String | description(): Returns the description associated with this row-level operation. |
org.apache.spark.sql.connector.read.ScanBuilder | newScanBuilder(org.apache.spark.sql.util.CaseInsensitiveStringMap options): Returns a scan builder to configure a scan for this row-level operation. |
org.apache.spark.sql.connector.write.WriteBuilder | newWriteBuilder(ExtendedLogicalWriteInfo info): Returns a write builder to configure a write for this row-level operation. |
default org.apache.spark.sql.connector.expressions.NamedReference[] | requiredMetadataAttributes(): Returns metadata attributes that are required to perform this row-level operation. |
default java.lang.String description()
Returns the description associated with this row-level operation.
RowLevelOperation.Command command()
Returns the actual SQL operation being performed.
org.apache.spark.sql.connector.read.ScanBuilder newScanBuilder(org.apache.spark.sql.util.CaseInsensitiveStringMap options)
Returns a scan builder to configure a scan for this row-level operation.
Sources fall into two categories: those that can handle a delta of rows and those that need to replace groups (e.g. partitions, files). Sources that handle deltas allow Spark to quickly discard unchanged rows and have no requirements for input scans. Sources that replace groups of rows can discard deleted rows but need to keep unchanged rows to be passed back into the source. This means that scans for such data sources must produce all rows in a group if any are returned. Some sources will avoid pushing filters into files (file granularity), while others will avoid pruning files within a partition (partition granularity).
For example, if a source can only replace partitions, all rows from a partition must be returned by the scan, even if a filter can narrow the set of changes to a single file in the partition. Similarly, a source that can swap individual files must produce all rows from files where at least one record must be changed, not just the rows that must be changed.
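For illustration only, here is a minimal sketch of a scan builder that a group-replacing source with partition granularity might return from newScanBuilder. It deliberately implements no filter or column pushdown below its replacement granularity, so every row of an affected partition is produced, as the contract above requires. The class name and planning details are hypothetical; only the Spark connector types (ScanBuilder, Scan, Batch, CaseInsensitiveStringMap) are real.

```java
import org.apache.spark.sql.connector.read.Batch;
import org.apache.spark.sql.connector.read.Scan;
import org.apache.spark.sql.connector.read.ScanBuilder;
import org.apache.spark.sql.types.StructType;
import org.apache.spark.sql.util.CaseInsensitiveStringMap;

// Hypothetical scan builder for a source that replaces whole partitions.
// It intentionally does NOT implement SupportsPushDownFilters or
// SupportsPushDownRequiredColumns, so Spark cannot prune rows or columns
// within a partition that has to be rewritten.
class PartitionGranularityScanBuilder implements ScanBuilder {
  private final StructType schema;
  private final CaseInsensitiveStringMap options;

  PartitionGranularityScanBuilder(StructType schema, CaseInsensitiveStringMap options) {
    this.schema = schema;
    this.options = options;
  }

  @Override
  public Scan build() {
    return new Scan() {
      @Override
      public StructType readSchema() {
        return schema;  // full rows, so unchanged rows can be written back
      }

      @Override
      public Batch toBatch() {
        // A real implementation would plan one input split per table partition
        // that may contain matching rows; omitted from this sketch.
        throw new UnsupportedOperationException("not part of this sketch");
      }
    };
  }
}
```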
org.apache.spark.sql.connector.write.WriteBuilder newWriteBuilder(ExtendedLogicalWriteInfo info)
Returns a write builder to configure a write for this row-level operation.
Note that Spark will first configure the scan and then the write, allowing data sources to pass information from the scan to the write. For example, the scan can report which condition was used to read the data, which the write may need under certain isolation levels.
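A rough sketch of how that hand-off could look on the scan side, assuming standard Spark connector classes: the scan builder returned by newScanBuilder records the filters it was given in a holder that the operation can also pass to the write builder it creates later. The class and field names are illustrative, and the write side is not shown; this is not a mechanism required by the API, just one way to exploit the scan-then-write ordering.

```java
import java.util.concurrent.atomic.AtomicReference;

import org.apache.spark.sql.connector.read.Scan;
import org.apache.spark.sql.connector.read.ScanBuilder;
import org.apache.spark.sql.connector.read.SupportsPushDownFilters;
import org.apache.spark.sql.sources.Filter;
import org.apache.spark.sql.types.StructType;

// Hypothetical scan builder created by newScanBuilder(). It records the filters
// it saw in a holder shared with the operation, so the write builder created by
// newWriteBuilder() afterwards can see the same condition (e.g. to validate it
// under the source's isolation level).
class ConditionRecordingScanBuilder implements ScanBuilder, SupportsPushDownFilters {
  private final StructType schema;
  private final AtomicReference<Filter[]> sharedCondition;  // also visible to the write side
  private Filter[] pushed = new Filter[0];

  ConditionRecordingScanBuilder(StructType schema, AtomicReference<Filter[]> sharedCondition) {
    this.schema = schema;
    this.sharedCondition = sharedCondition;
  }

  @Override
  public Filter[] pushFilters(Filter[] filters) {
    // Remember the read condition for the write side; return the filters so Spark
    // still evaluates them on the rows this scan produces (best-effort pushdown).
    this.pushed = filters;
    this.sharedCondition.set(filters);
    return filters;
  }

  @Override
  public Filter[] pushedFilters() {
    return pushed;
  }

  @Override
  public Scan build() {
    return new Scan() {
      @Override
      public StructType readSchema() {
        return schema;  // actual reading logic is out of scope for this sketch
      }
    };
  }
}
```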
default org.apache.spark.sql.connector.expressions.NamedReference[] requiredMetadataAttributes()
Returns metadata attributes that are required to perform this row-level operation.
Data sources can use this method to project metadata columns needed for writing the data back (e.g. metadata columns for grouping data).
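As a hedged example, a source that rewrites whole files might request a file-name metadata column here so rewritten rows can be regrouped by the file they came from. The class name and the "_file" column below are illustrative only and not part of this API; in a real connector this method would be the override of RowLevelOperation.requiredMetadataAttributes().

```java
import org.apache.spark.sql.connector.expressions.Expressions;
import org.apache.spark.sql.connector.expressions.NamedReference;

// Fragment of a hypothetical file-granularity implementation: it asks Spark to
// project a source-defined "_file" metadata column through the scan so the write
// can group rows back into their original files.
abstract class FileGroupingOperationSketch {

  public NamedReference[] requiredMetadataAttributes() {
    return new NamedReference[] { Expressions.column("_file") };
  }
}
```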