Class HiveIcebergStorageHandler

java.lang.Object
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.hive.ql.metadata.HiveStorageHandler, org.apache.hadoop.hive.ql.metadata.HiveStoragePredicateHandler

public class HiveIcebergStorageHandler extends Object implements org.apache.hadoop.hive.ql.metadata.HiveStoragePredicateHandler, org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
  • Nested Class Summary

    Nested classes/interfaces inherited from interface org.apache.hadoop.hive.ql.metadata.HiveStoragePredicateHandler

    org.apache.hadoop.hive.ql.metadata.HiveStoragePredicateHandler.DecomposedPredicate
  • Constructor Summary

    Constructors
    Constructor
    Description
    HiveIcebergStorageHandler()
     
  • Method Summary

    Modifier and Type
    Method
    Description
    static String
    catalogName(org.apache.hadoop.conf.Configuration config, String name)
    Returns the catalog name serialized to the configuration.
    static void
    checkAndSetIoConfig(org.apache.hadoop.conf.Configuration config, Table table)
    If enabled, populates the FileIO's Hadoop configuration with the input config object.
    static void
    checkAndSkipIoConfigSerialization(org.apache.hadoop.conf.Configuration config, Table table)
    If enabled, ensures that the FileIO's Hadoop configuration will not be serialized.
    void
    configureInputJobCredentials(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, Map<String,String> secrets)
     
    void
    configureInputJobProperties(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, Map<String,String> map)
     
    void
    configureJobConf(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, org.apache.hadoop.mapred.JobConf jobConf)
     
    void
    configureOutputJobProperties(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, Map<String,String> map)
     
    void
    configureTableJobProperties(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, Map<String,String> map)
     
    org.apache.hadoop.hive.ql.metadata.HiveStoragePredicateHandler.DecomposedPredicate
    decomposePredicate(org.apache.hadoop.mapred.JobConf jobConf, org.apache.hadoop.hive.serde2.Deserializer deserializer, org.apache.hadoop.hive.ql.plan.ExprNodeDesc exprNodeDesc)
     
    org.apache.hadoop.hive.ql.security.authorization.HiveAuthorizationProvider
    getAuthorizationProvider()
     
    org.apache.hadoop.conf.Configuration
    getConf()
     
    Class<? extends org.apache.hadoop.mapred.InputFormat>
    getInputFormatClass()
     
    org.apache.hadoop.hive.metastore.HiveMetaHook
    getMetaHook()
     
    Class<? extends org.apache.hadoop.mapred.OutputFormat>
    getOutputFormatClass()
     
    Class<? extends org.apache.hadoop.hive.serde2.AbstractSerDe>
    getSerDeClass()
     
    static Collection<String>
    outputTables(org.apache.hadoop.conf.Configuration config)
    Returns the names of the output tables stored in the configuration.
    static Schema
    schema(org.apache.hadoop.conf.Configuration config)
    Returns the Table Schema serialized to the configuration.
    void
    setConf(org.apache.hadoop.conf.Configuration conf)
     
    static Table
    table(org.apache.hadoop.conf.Configuration config, String name)
    Returns the Table serialized to the configuration based on the table name.
    String
    toString()
     

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Constructor Details

    • HiveIcebergStorageHandler

      public HiveIcebergStorageHandler()
  • Method Details

    • getInputFormatClass

      public Class<? extends org.apache.hadoop.mapred.InputFormat> getInputFormatClass()
      Specified by:
      getInputFormatClass in interface org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
    • getOutputFormatClass

      public Class<? extends org.apache.hadoop.mapred.OutputFormat> getOutputFormatClass()
      Specified by:
      getOutputFormatClass in interface org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
    • getSerDeClass

      public Class<? extends org.apache.hadoop.hive.serde2.AbstractSerDe> getSerDeClass()
      Specified by:
      getSerDeClass in interface org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
    • getMetaHook

      public org.apache.hadoop.hive.metastore.HiveMetaHook getMetaHook()
      Specified by:
      getMetaHook in interface org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
    • getAuthorizationProvider

      public org.apache.hadoop.hive.ql.security.authorization.HiveAuthorizationProvider getAuthorizationProvider()
      Specified by:
      getAuthorizationProvider in interface org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
    • configureInputJobProperties

      public void configureInputJobProperties(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, Map<String,String> map)
      Specified by:
      configureInputJobProperties in interface org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
    • configureOutputJobProperties

      public void configureOutputJobProperties(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, Map<String,String> map)
      Specified by:
      configureOutputJobProperties in interface org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
    • configureTableJobProperties

      public void configureTableJobProperties(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, Map<String,String> map)
      Specified by:
      configureTableJobProperties in interface org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
    • configureInputJobCredentials

      public void configureInputJobCredentials(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, Map<String,String> secrets)
    • configureJobConf

      public void configureJobConf(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, org.apache.hadoop.mapred.JobConf jobConf)
      Specified by:
      configureJobConf in interface org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
    • getConf

      public org.apache.hadoop.conf.Configuration getConf()
      Specified by:
      getConf in interface org.apache.hadoop.conf.Configurable
    • setConf

      public void setConf(org.apache.hadoop.conf.Configuration conf)
      Specified by:
      setConf in interface org.apache.hadoop.conf.Configurable
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • decomposePredicate

      public org.apache.hadoop.hive.ql.metadata.HiveStoragePredicateHandler.DecomposedPredicate decomposePredicate(org.apache.hadoop.mapred.JobConf jobConf, org.apache.hadoop.hive.serde2.Deserializer deserializer, org.apache.hadoop.hive.ql.plan.ExprNodeDesc exprNodeDesc)
      Specified by:
      decomposePredicate in interface org.apache.hadoop.hive.ql.metadata.HiveStoragePredicateHandler
      Parameters:
      jobConf - Job configuration for InputFormat to access
      deserializer - Deserializer
      exprNodeDesc - Filter expression extracted by Hive
      Returns:
      The entire filter, so that both Hive's pruning and Iceberg's pruning can be applied.
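      The return description above can be read as "hand the whole filter to both sides instead of splitting it". The sketch below illustrates that idea with plain Java predicates. It assumes the same expression is used as both the pushed and the residual part, which this page does not spell out. The class EntireFilterSketch, the Decomposed type, and the scan helper are hypothetical stand-ins, not part of the Hive or Iceberg API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class EntireFilterSketch {

  /** Hypothetical stand-in for DecomposedPredicate: one part for storage, one for the engine. */
  static final class Decomposed<T> {
    final Predicate<T> pushed;
    final Predicate<T> residual;

    Decomposed(Predicate<T> pushed, Predicate<T> residual) {
      this.pushed = pushed;
      this.residual = residual;
    }
  }

  /** Assumption: the entire filter is handed to both sides rather than being split. */
  static <T> Decomposed<T> decomposeEntireFilter(Predicate<T> filter) {
    return new Decomposed<>(filter, filter);
  }

  /** Applies the pushed part (storage side) and then the residual part (engine side). */
  static <T> List<T> scan(List<T> rows, Decomposed<T> decomposed) {
    List<T> result = new ArrayList<>();
    for (T row : rows) {
      if (decomposed.pushed.test(row) && decomposed.residual.test(row)) {
        result.add(row);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Predicate<Integer> filter = x -> x > 3;
    List<Integer> rows = Arrays.asList(1, 2, 3, 4, 5, 6);
    System.out.println(scan(rows, decomposeEntireFilter(filter))); // prints [4, 5, 6]
  }
}
```

      Evaluating the same filter on both sides is redundant but safe: the storage side may prune coarsely, and the engine side still removes any row that slipped through.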
    • table

      public static Table table(org.apache.hadoop.conf.Configuration config, String name)
      Returns the Table serialized to the configuration based on the table name. If configuration is missing from the FileIO of the table, it will be populated with the input config.
      Parameters:
      config - The configuration used to get the data from
      name - The name of the table we need as returned by TableDesc.getTableName()
      Returns:
      The Table
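      To make "serialized to the configuration based on the table name" concrete, here is a minimal, self-contained sketch of the general pattern: an object is serialized, Base64-encoded, and stored under a key derived from the table name, then looked up again by that name. A Map<String, String> stands in for the Hadoop Configuration, a list of column names stands in for the Table, and the key prefix and class name are hypothetical. The actual key layout and serialization format used by HiveIcebergStorageHandler are not described on this page.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

public class SerializedTableLookupSketch {

  // Hypothetical key prefix; the real configuration key is an implementation detail.
  static final String KEY_PREFIX = "sketch.serialized.table.";

  /** Serializes the object and stores it as a Base64 string under a name-derived key. */
  static void put(Map<String, String> config, String name, Serializable table) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(table);
    }
    config.put(KEY_PREFIX + name, Base64.getEncoder().encodeToString(bytes.toByteArray()));
  }

  /** Looks up the entry for the given name and deserializes it. */
  static Object get(Map<String, String> config, String name)
      throws IOException, ClassNotFoundException {
    String encoded = config.get(KEY_PREFIX + name);
    if (encoded == null) {
      throw new IllegalArgumentException("No serialized table found for " + name);
    }
    byte[] bytes = Base64.getDecoder().decode(encoded);
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    Map<String, String> config = new HashMap<>();
    // Stand-in payload: a list of column names instead of a real Table.
    ArrayList<String> columns = new ArrayList<>(Arrays.asList("id", "data"));
    put(config, "default.customers", columns);
    System.out.println(get(config, "default.customers")); // prints [id, data]
  }
}
```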
    • checkAndSetIoConfig

      public static void checkAndSetIoConfig(org.apache.hadoop.conf.Configuration config, Table table)
      If enabled, populates the FileIO's Hadoop configuration with the input config object. This might be necessary when the table object was serialized without the FileIO config.
      Parameters:
      config - Configuration to set for FileIO, if enabled
      table - The Iceberg table object
    • checkAndSkipIoConfigSerialization

      public static void checkAndSkipIoConfigSerialization(org.apache.hadoop.conf.Configuration config, Table table)
      If enabled, ensures that the FileIO's Hadoop configuration will not be serialized. This might be desirable for decreasing the overall size of serialized table objects.

      Note: Skipping FileIO config serialization in this fashion might in turn necessitate calling checkAndSetIoConfig(Configuration, Table) on the deserializer-side to enable subsequent use of the FileIO.

      Parameters:
      config - Configuration to set for FileIO in a transient manner, if enabled
      table - The Iceberg table object
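      The interplay between checkAndSkipIoConfigSerialization and checkAndSetIoConfig follows a common pattern: leave a bulky configuration out of the serialized form, then restore it on the reading side before first use. The sketch below shows that pattern with plain Java serialization and a transient field. SketchTable, setIoConfigIfMissing, and the Map-based configuration are hypothetical stand-ins. The real methods operate on an Iceberg Table and a Hadoop Configuration and are additionally gated by an "if enabled" check that is not modeled here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class TransientIoConfigSketch {

  /** Hypothetical stand-in for a table whose FileIO carries a Hadoop configuration. */
  static class SketchTable implements Serializable {
    private static final long serialVersionUID = 1L;

    final String name;
    // transient: excluded from serialization, which keeps the serialized table small.
    transient Map<String, String> ioConfig;

    SketchTable(String name, Map<String, String> ioConfig) {
      this.name = name;
      this.ioConfig = ioConfig;
    }
  }

  static byte[] serialize(SketchTable table) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(table);
    }
    return bytes.toByteArray();
  }

  static SketchTable deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return (SketchTable) in.readObject();
    }
  }

  /** Reader-side step: repopulate the configuration only when it is missing. */
  static void setIoConfigIfMissing(Map<String, String> config, SketchTable table) {
    if (table.ioConfig == null) {
      table.ioConfig = new HashMap<>(config);
    }
  }

  public static void main(String[] args) throws Exception {
    Map<String, String> conf = new HashMap<>();
    conf.put("fs.defaultFS", "hdfs://example:8020");

    SketchTable copy = deserialize(serialize(new SketchTable("db.tbl", conf)));
    System.out.println(copy.ioConfig == null); // prints true: the config was not serialized

    setIoConfigIfMissing(conf, copy);
    System.out.println(copy.ioConfig.get("fs.defaultFS")); // prints hdfs://example:8020
  }
}
```

      The trade-off is the one the note above describes: the serialized table is smaller, but the reader must restore the configuration before the FileIO can be used.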
    • outputTables

      public static Collection<String> outputTables(org.apache.hadoop.conf.Configuration config)
      Returns the names of the output tables stored in the configuration.
      Parameters:
      config - The configuration used to get the data from
      Returns:
      The collection of the table names as returned by TableDesc.getTableName()
    • catalogName

      public static String catalogName(org.apache.hadoop.conf.Configuration config, String name)
      Returns the catalog name serialized to the configuration.
      Parameters:
      config - The configuration used to get the data from
      name - The name of the table we need as returned by TableDesc.getTableName()
      Returns:
      The catalog name
    • schema

      public static Schema schema(org.apache.hadoop.conf.Configuration config)
      Returns the Table Schema serialized to the configuration.
      Parameters:
      config - The configuration used to get the data from
      Returns:
      The Table Schema object