Interface RowLevelOperation

  • All Known Subinterfaces:
    SupportsDelta

    public interface RowLevelOperation
    A logical representation of a data source DELETE, UPDATE, or MERGE operation that requires rewriting data.
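    As a rough, hypothetical sketch (not part of the Spark docs), a connector might implement this interface along the lines below. MyDeleteOperation is an invented name; only the simple methods are filled in here, and the scan and write builders are sketched under the method details further down.

      import org.apache.spark.sql.connector.write.RowLevelOperation;

      // Hypothetical skeleton of a connector's DELETE operation; all names here
      // are placeholders, not Spark-provided classes.
      abstract class MyDeleteOperation implements RowLevelOperation {

        @Override
        public Command command() {
          // The SQL command this operation rewrites data for: DELETE, UPDATE, or MERGE.
          return Command.DELETE;
        }

        @Override
        public String description() {
          // Shown in query plans; the interface also provides a default implementation.
          return "my-source DELETE (group-based rewrite)";
        }
      }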
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  RowLevelOperation.Command
      The SQL operation being performed.
    • Method Summary

      Modifier and Type Method Description
      RowLevelOperation.Command command()
      Returns the actual SQL operation being performed.
      default java.lang.String description()
      Returns the description associated with this row-level operation.
      org.apache.spark.sql.connector.read.ScanBuilder newScanBuilder(org.apache.spark.sql.util.CaseInsensitiveStringMap options)
      Returns a scan builder to configure a scan for this row-level operation.
      org.apache.spark.sql.connector.write.WriteBuilder newWriteBuilder(ExtendedLogicalWriteInfo info)
      Returns a write builder to configure a write for this row-level operation.
      default org.apache.spark.sql.connector.expressions.NamedReference[] requiredMetadataAttributes()
      Returns metadata attributes that are required to perform this row-level operation.
    • Method Detail

      • description

        default java.lang.String description()
        Returns the description associated with this row-level operation.
      • newScanBuilder

        org.apache.spark.sql.connector.read.ScanBuilder newScanBuilder(org.apache.spark.sql.util.CaseInsensitiveStringMap options)
        Returns a scan builder to configure a scan for this row-level operation.

        Data sources fall into two categories: those that can handle a delta of rows and those that need to replace groups of rows (e.g. partitions, files). Sources that handle deltas allow Spark to discard unchanged rows early, and Spark places no special requirements on their input scans. Sources that replace groups of rows can discard deleted rows but must receive unchanged rows back from Spark, so their scans must produce every row in a group if any row from that group is returned. In practice, a source with file granularity will avoid applying pushed filters within a file, and a source with partition granularity will avoid pruning files within a partition.

        For example, if a source can only replace partitions, all rows from a partition must be returned by the scan, even if a filter can narrow the set of changes to a single file in the partition. Similarly, a source that can swap individual files must produce all rows from files where at least one record must be changed, not just the rows that must be changed.
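
        As a hedged illustration of the group-replacing case, the sketch below shows a scan builder for a hypothetical file-granularity source that supports ordinary filter pushdown: pushed filters are only remembered for deciding which files to rewrite, and each matching file is read in full so unchanged rows can be passed back. GroupBasedScanBuilder is an invented name, and construction of the actual Scan is omitted.

          import org.apache.spark.sql.connector.read.Scan;
          import org.apache.spark.sql.connector.read.ScanBuilder;
          import org.apache.spark.sql.connector.read.SupportsPushDownFilters;
          import org.apache.spark.sql.sources.Filter;
          import org.apache.spark.sql.util.CaseInsensitiveStringMap;

          // Hypothetical scan builder for a source that replaces whole files.
          class GroupBasedScanBuilder implements ScanBuilder, SupportsPushDownFilters {

            private final CaseInsensitiveStringMap options;
            private Filter[] pushedFilters = new Filter[0];

            GroupBasedScanBuilder(CaseInsensitiveStringMap options) {
              this.options = options;
            }

            @Override
            public Filter[] pushFilters(Filter[] filters) {
              // Keep the filters for file-level pruning, but report them all as not
              // fully handled: non-matching rows may still be returned because each
              // selected file is read in full (file granularity).
              this.pushedFilters = filters;
              return filters;
            }

            @Override
            public Filter[] pushedFilters() {
              return pushedFilters;
            }

            @Override
            public Scan build() {
              // Placeholder: build a Scan that lists the files matching pushedFilters
              // and reads each of them completely (no per-row filtering inside a file).
              throw new UnsupportedOperationException("source-specific Scan omitted");
            }
          }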

      • newWriteBuilder

        org.apache.spark.sql.connector.write.WriteBuilder newWriteBuilder(ExtendedLogicalWriteInfo info)
        Returns a write builder to configure a write for this row-level operation.

        Note that Spark will first configure the scan and then the write, allowing data sources to pass information from the scan to the write. For example, the scan can report which condition was used to read the data, which the write may need under certain isolation levels.
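
        Continuing the hypothetical sketch, an operation can hold on to the scan builder it returned and feed what was configured there into the write builder. GroupBasedScanBuilder is the sketch from newScanBuilder above; GroupReplacingWriteBuilder is another placeholder for a connector-specific write builder, so only the handoff itself is shown.

          import org.apache.spark.sql.connector.read.ScanBuilder;
          import org.apache.spark.sql.connector.write.ExtendedLogicalWriteInfo;
          import org.apache.spark.sql.connector.write.RowLevelOperation;
          import org.apache.spark.sql.connector.write.WriteBuilder;
          import org.apache.spark.sql.util.CaseInsensitiveStringMap;

          // Hypothetical operation that remembers the configured scan so the write
          // can see what was read; GroupReplacingWriteBuilder is a placeholder.
          class MyGroupBasedDelete implements RowLevelOperation {

            // Spark configures the scan before the write, so this field is set
            // by the time newWriteBuilder is called.
            private GroupBasedScanBuilder configuredScan;

            @Override
            public Command command() {
              return Command.DELETE;
            }

            @Override
            public ScanBuilder newScanBuilder(CaseInsensitiveStringMap options) {
              this.configuredScan = new GroupBasedScanBuilder(options);
              return configuredScan;
            }

            @Override
            public WriteBuilder newWriteBuilder(ExtendedLogicalWriteInfo info) {
              // Hand the read condition (the filters pushed to the scan) to the
              // write, e.g. to detect conflicting concurrent changes at commit time.
              return new GroupReplacingWriteBuilder(info, configuredScan.pushedFilters());
            }
          }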

      • requiredMetadataAttributes

        default org.apache.spark.sql.connector.expressions.NamedReference[] requiredMetadataAttributes()
        Returns metadata attributes that are required to perform this row-level operation.

        Data sources can use this method to project metadata columns that are needed for writing the data back (e.g. metadata columns for grouping data).
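
        For instance, a source that replaces whole files might ask for a file-path metadata column so replacement rows can be grouped back into the files they came from. The column name "_file" below is an assumption, since real connectors expose their own metadata columns; the override could be added to the hypothetical MyGroupBasedDelete sketch above.

          import org.apache.spark.sql.connector.expressions.Expressions;
          import org.apache.spark.sql.connector.expressions.NamedReference;

          // Hypothetical override: project an assumed "_file" metadata column so the
          // write can group replacement rows by the file they were read from.
          @Override
          public NamedReference[] requiredMetadataAttributes() {
            return new NamedReference[] { Expressions.column("_file") };
          }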