Package org.apache.iceberg.mr.hive
Class HiveIcebergStorageHandler
java.lang.Object
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler
- All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.hive.ql.metadata.HiveStorageHandler, org.apache.hadoop.hive.ql.metadata.HiveStoragePredicateHandler
public class HiveIcebergStorageHandler
extends Object
implements org.apache.hadoop.hive.ql.metadata.HiveStoragePredicateHandler, org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.hadoop.hive.ql.metadata.HiveStoragePredicateHandler
org.apache.hadoop.hive.ql.metadata.HiveStoragePredicateHandler.DecomposedPredicate
-
Constructor Summary
Constructors
HiveIcebergStorageHandler()
-
Method Summary
Modifier and Type, Method, Description
static String catalogName(org.apache.hadoop.conf.Configuration config, String name)
    Returns the catalog name serialized to the configuration.
static void checkAndSetIoConfig(org.apache.hadoop.conf.Configuration config, Table table)
    If enabled, it populates the FileIO's hadoop configuration with the input config object.
static void checkAndSkipIoConfigSerialization(org.apache.hadoop.conf.Configuration config, Table table)
    If enabled, it ensures that the FileIO's hadoop configuration will not be serialized.
void configureInputJobCredentials(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, Map<String, String> secrets)
void configureInputJobProperties(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, Map<String, String> map)
void configureJobConf(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, org.apache.hadoop.mapred.JobConf jobConf)
void configureOutputJobProperties(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, Map<String, String> map)
void configureTableJobProperties(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, Map<String, String> map)
org.apache.hadoop.hive.ql.metadata.HiveStoragePredicateHandler.DecomposedPredicate decomposePredicate(org.apache.hadoop.mapred.JobConf jobConf, org.apache.hadoop.hive.serde2.Deserializer deserializer, org.apache.hadoop.hive.ql.plan.ExprNodeDesc exprNodeDesc)
org.apache.hadoop.hive.ql.security.authorization.HiveAuthorizationProvider getAuthorizationProvider()
org.apache.hadoop.conf.Configuration getConf()
Class<? extends org.apache.hadoop.mapred.InputFormat> getInputFormatClass()
org.apache.hadoop.hive.metastore.HiveMetaHook getMetaHook()
Class<? extends org.apache.hadoop.mapred.OutputFormat> getOutputFormatClass()
Class<? extends org.apache.hadoop.hive.serde2.AbstractSerDe> getSerDeClass()
static Collection<String> outputTables(org.apache.hadoop.conf.Configuration config)
    Returns the names of the output tables stored in the configuration.
static Schema schema(org.apache.hadoop.conf.Configuration config)
    Returns the Table Schema serialized to the configuration.
void setConf(org.apache.hadoop.conf.Configuration conf)
static Table table(org.apache.hadoop.conf.Configuration config, String name)
    Returns the Table serialized to the configuration based on the table name.
String toString()
-
Constructor Details
-
HiveIcebergStorageHandler
public HiveIcebergStorageHandler()
-
-
Method Details
-
getInputFormatClass
public Class<? extends org.apache.hadoop.mapred.InputFormat> getInputFormatClass()
- Specified by:
getInputFormatClass in interface org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
-
getOutputFormatClass
public Class<? extends org.apache.hadoop.mapred.OutputFormat> getOutputFormatClass()
- Specified by:
getOutputFormatClass in interface org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
-
getSerDeClass
public Class<? extends org.apache.hadoop.hive.serde2.AbstractSerDe> getSerDeClass()
- Specified by:
getSerDeClass in interface org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
-
getMetaHook
public org.apache.hadoop.hive.metastore.HiveMetaHook getMetaHook()
- Specified by:
getMetaHook in interface org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
-
getAuthorizationProvider
public org.apache.hadoop.hive.ql.security.authorization.HiveAuthorizationProvider getAuthorizationProvider()
- Specified by:
getAuthorizationProvider in interface org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
-
configureInputJobProperties
public void configureInputJobProperties(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, Map<String, String> map)
- Specified by:
configureInputJobProperties in interface org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
-
configureOutputJobProperties
public void configureOutputJobProperties(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, Map<String, String> map)
- Specified by:
configureOutputJobProperties in interface org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
-
configureTableJobProperties
public void configureTableJobProperties(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, Map<String, String> map)
- Specified by:
configureTableJobProperties in interface org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
-
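configureInputJobCredentials
public void configureInputJobCredentials(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, Map<String, String> secrets)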
-
configureJobConf
public void configureJobConf(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, org.apache.hadoop.mapred.JobConf jobConf)
- Specified by:
configureJobConf in interface org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
-
getConf
public org.apache.hadoop.conf.Configuration getConf()
- Specified by:
getConf in interface org.apache.hadoop.conf.Configurable
-
setConf
public void setConf(org.apache.hadoop.conf.Configuration conf)
- Specified by:
setConf in interface org.apache.hadoop.conf.Configurable
-
toString
public String toString()
- Overrides:
toString in class Object
-
decomposePredicate
public org.apache.hadoop.hive.ql.metadata.HiveStoragePredicateHandler.DecomposedPredicate decomposePredicate(org.apache.hadoop.mapred.JobConf jobConf, org.apache.hadoop.hive.serde2.Deserializer deserializer, org.apache.hadoop.hive.ql.plan.ExprNodeDesc exprNodeDesc)
- Specified by:
decomposePredicate in interface org.apache.hadoop.hive.ql.metadata.HiveStoragePredicateHandler
- Parameters:
jobConf - Job configuration for InputFormat to access
deserializer - Deserializer
exprNodeDesc - Filter expression extracted by Hive
- Returns:
Entire filter to take advantage of Hive's pruning as well as Iceberg's pruning.
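The return description says the entire filter is handed back so that both Hive and Iceberg can prune with it. The following is a minimal illustration of why a filter might be applied at two levels, not the handler's actual implementation. StubFile, pruneFiles and scan are hypothetical stand-ins, and the use of per-file min/max statistics for storage-side pruning is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TwoLevelFilterSketch {

  // Hypothetical stand-in for a data file that tracks min/max of one int column.
  static class StubFile {
    final int min;
    final int max;
    final int[] rows;

    StubFile(int[] rows) {
      this.rows = rows;
      this.min = Arrays.stream(rows).min().orElse(Integer.MAX_VALUE);
      this.max = Arrays.stream(rows).max().orElse(Integer.MIN_VALUE);
    }
  }

  // Storage-side pruning for "value > threshold": drop files whose max cannot match.
  // This is coarse: a surviving file may still contain rows that do not match.
  static List<StubFile> pruneFiles(List<StubFile> files, int threshold) {
    List<StubFile> kept = new ArrayList<>();
    for (StubFile file : files) {
      if (file.max > threshold) {
        kept.add(file);
      }
    }
    return kept;
  }

  // Engine-side filtering: the same predicate is applied again to each surviving row.
  static List<Integer> scan(List<StubFile> files, int threshold) {
    List<Integer> result = new ArrayList<>();
    for (StubFile file : pruneFiles(files, threshold)) {
      for (int row : file.rows) {
        if (row > threshold) {
          result.add(row);
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<StubFile> files = Arrays.asList(
        new StubFile(new int[] {1, 3}),
        new StubFile(new int[] {2, 9}));
    System.out.println("files kept: " + pruneFiles(files, 5).size());
    System.out.println("rows returned: " + scan(files, 5));
  }
}
```

In this sketch the first file is skipped entirely, while the second is read and then filtered row by row, which is why the same filter is useful on both sides.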
-
table
public static Table table(org.apache.hadoop.conf.Configuration config, String name)
Returns the Table serialized to the configuration based on the table name. If the configuration is missing from the FileIO of the table, it will be populated with the input config.
- Parameters:
config - The configuration to read the data from
name - The name of the table we need, as returned by TableDesc.getTableName()
- Returns:
The Table
-
checkAndSetIoConfig
public static void checkAndSetIoConfig(org.apache.hadoop.conf.Configuration config, Table table)
If enabled, populates the FileIO's Hadoop configuration with the input config object. This might be necessary when the table object was serialized without the FileIO config.
- Parameters:
config - Configuration to set for FileIO, if enabled
table - The Iceberg table object
-
checkAndSkipIoConfigSerialization
public static void checkAndSkipIoConfigSerialization(org.apache.hadoop.conf.Configuration config, Table table)
If enabled, ensures that the FileIO's Hadoop configuration will not be serialized. This might be desirable for decreasing the overall size of serialized table objects.
Note: Skipping FileIO config serialization in this fashion might in turn necessitate calling checkAndSetIoConfig(Configuration, Table) on the deserializer side to enable subsequent use of the FileIO.
- Parameters:
config - Configuration to set for FileIO in a transient manner, if enabled
table - The Iceberg table object
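The interplay between skipping config serialization and restoring the config on the deserializer side can be sketched with plain Java serialization. This is a rough analogy under stated assumptions, not Iceberg code: StubTable, its ioConfig field, and setIoConfigIfMissing are hypothetical stand-ins for the Table, the FileIO's Hadoop configuration, and checkAndSetIoConfig respectively.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class IoConfigRoundTrip {

  // Hypothetical stand-in for a table whose FileIO carries a configuration.
  static class StubTable implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    // transient: left out of the serialized form, which keeps the payload small.
    transient Map<String, String> ioConfig;

    StubTable(String name, Map<String, String> ioConfig) {
      this.name = name;
      this.ioConfig = ioConfig;
    }
  }

  // Stand-in for the "set" step: repopulate the config only when it is missing.
  static void setIoConfigIfMissing(Map<String, String> config, StubTable table) {
    if (table.ioConfig == null) {
      table.ioConfig = new HashMap<>(config);
    }
  }

  static byte[] serialize(StubTable table) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(table);
    }
    return bytes.toByteArray();
  }

  static StubTable deserialize(byte[] data) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return (StubTable) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    Map<String, String> conf = new HashMap<>();
    conf.put("fs.defaultFS", "hdfs://example:8020");

    // Serializer side: the config travels separately, not inside the table object.
    StubTable restored = deserialize(serialize(new StubTable("db.tbl", conf)));
    System.out.println("config present after deserialization: " + (restored.ioConfig != null));

    // Deserializer side: restore the config before the table's IO is used.
    setIoConfigIfMissing(conf, restored);
    System.out.println("fs.defaultFS = " + restored.ioConfig.get("fs.defaultFS"));
  }
}
```

The sketch shows the trade-off the note describes: the serialized object is smaller, but the receiving side has to supply the configuration again before doing any IO.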
-
outputTables
public static Collection<String> outputTables(org.apache.hadoop.conf.Configuration config)
Returns the names of the output tables stored in the configuration.
- Parameters:
config - The configuration to read the data from
- Returns:
The collection of the table names, as returned by TableDesc.getTableName()
-
catalogName
public static String catalogName(org.apache.hadoop.conf.Configuration config, String name)
Returns the catalog name serialized to the configuration.
- Parameters:
config - The configuration to read the data from
name - The name of the table we need, as returned by TableDesc.getTableName()
- Returns:
The catalog name
-
schema
public static Schema schema(org.apache.hadoop.conf.Configuration config)
Returns the Table Schema serialized to the configuration.
- Parameters:
config - The configuration to read the data from
- Returns:
The Table Schema object
-