Package org.apache.iceberg.mr.hive
Class HiveIcebergStorageHandler
java.lang.Object
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler
- All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.hive.ql.metadata.HiveStorageHandler, org.apache.hadoop.hive.ql.metadata.HiveStoragePredicateHandler
public class HiveIcebergStorageHandler
extends Object
implements org.apache.hadoop.hive.ql.metadata.HiveStoragePredicateHandler, org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.hadoop.hive.ql.metadata.HiveStoragePredicateHandler
org.apache.hadoop.hive.ql.metadata.HiveStoragePredicateHandler.DecomposedPredicate
-
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
static String | catalogName(org.apache.hadoop.conf.Configuration config, String name) | Returns the catalog name serialized to the configuration.
static void | checkAndSetIoConfig(org.apache.hadoop.conf.Configuration config, Table table) | If enabled, it populates the FileIO's hadoop configuration with the input config object.
static void | checkAndSkipIoConfigSerialization(org.apache.hadoop.conf.Configuration config, Table table) | If enabled, it ensures that the FileIO's hadoop configuration will not be serialized.
void | configureInputJobCredentials(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, Map<String, String> secrets) |
void | configureInputJobProperties(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, Map<String, String> map) |
void | configureJobConf(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, org.apache.hadoop.mapred.JobConf jobConf) |
void | configureOutputJobProperties(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, Map<String, String> map) |
void | configureTableJobProperties(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, Map<String, String> map) |
org.apache.hadoop.hive.ql.metadata.HiveStoragePredicateHandler.DecomposedPredicate | decomposePredicate(org.apache.hadoop.mapred.JobConf jobConf, org.apache.hadoop.hive.serde2.Deserializer deserializer, org.apache.hadoop.hive.ql.plan.ExprNodeDesc exprNodeDesc) |
org.apache.hadoop.hive.ql.security.authorization.HiveAuthorizationProvider | getAuthorizationProvider() |
org.apache.hadoop.conf.Configuration | getConf() |
Class<? extends org.apache.hadoop.mapred.InputFormat> | getInputFormatClass() |
org.apache.hadoop.hive.metastore.HiveMetaHook | getMetaHook() |
Class<? extends org.apache.hadoop.mapred.OutputFormat> | getOutputFormatClass() |
Class<? extends org.apache.hadoop.hive.serde2.AbstractSerDe> | getSerDeClass() |
static Collection<String> | outputTables(org.apache.hadoop.conf.Configuration config) | Returns the names of the output tables stored in the configuration.
static Schema | schema(org.apache.hadoop.conf.Configuration config) | Returns the Table Schema serialized to the configuration.
void | setConf(org.apache.hadoop.conf.Configuration conf) |
static Table | table(org.apache.hadoop.conf.Configuration config, String name) | Returns the Table serialized to the configuration based on the table name.
String | toString() |
-
Constructor Details
-
HiveIcebergStorageHandler
public HiveIcebergStorageHandler()
-
-
Method Details
-
getInputFormatClass
- Specified by:
getInputFormatClass
in interface org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
-
getOutputFormatClass
- Specified by:
getOutputFormatClass
in interface org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
-
getSerDeClass
- Specified by:
getSerDeClass
in interface org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
-
getMetaHook
public org.apache.hadoop.hive.metastore.HiveMetaHook getMetaHook()
- Specified by:
getMetaHook
in interface org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
-
getAuthorizationProvider
public org.apache.hadoop.hive.ql.security.authorization.HiveAuthorizationProvider getAuthorizationProvider()
- Specified by:
getAuthorizationProvider
in interface org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
-
configureInputJobProperties
public void configureInputJobProperties(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, Map<String, String> map)
- Specified by:
configureInputJobProperties
in interface org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
-
configureOutputJobProperties
public void configureOutputJobProperties(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, Map<String, String> map)
- Specified by:
configureOutputJobProperties
in interface org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
-
configureTableJobProperties
public void configureTableJobProperties(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, Map<String, String> map)
- Specified by:
configureTableJobProperties
in interface org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
-
configureInputJobCredentials
-
configureJobConf
public void configureJobConf(org.apache.hadoop.hive.ql.plan.TableDesc tableDesc, org.apache.hadoop.mapred.JobConf jobConf)
- Specified by:
configureJobConf
in interface org.apache.hadoop.hive.ql.metadata.HiveStorageHandler
-
getConf
public org.apache.hadoop.conf.Configuration getConf()
- Specified by:
getConf
in interface org.apache.hadoop.conf.Configurable
-
setConf
public void setConf(org.apache.hadoop.conf.Configuration conf)
- Specified by:
setConf
in interface org.apache.hadoop.conf.Configurable
-
toString
-
decomposePredicate
public org.apache.hadoop.hive.ql.metadata.HiveStoragePredicateHandler.DecomposedPredicate decomposePredicate(org.apache.hadoop.mapred.JobConf jobConf, org.apache.hadoop.hive.serde2.Deserializer deserializer, org.apache.hadoop.hive.ql.plan.ExprNodeDesc exprNodeDesc)
- Specified by:
decomposePredicate
in interface org.apache.hadoop.hive.ql.metadata.HiveStoragePredicateHandler
- Parameters:
jobConf
- Job configuration for InputFormat to access
deserializer
- Deserializer
exprNodeDesc
- Filter expression extracted by Hive
- Returns:
- Entire filter to take advantage of Hive's pruning as well as Iceberg's pruning.
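The return description suggests that the handler hands the whole filter back instead of splitting it into a pushed part and a leftover part. The following is a minimal sketch of that idea, assuming the full expression is used both as the predicate pushed to the storage layer and as the residual predicate that the engine re-checks. SketchDecomposed and DecomposeSketch are hypothetical stand-ins built on java.util.function.Predicate; they are not Hive's DecomposedPredicate or ExprNodeDesc classes.

```java
import java.util.function.Predicate;

// Hypothetical model of a decomposed predicate; NOT Hive's DecomposedPredicate.
class SketchDecomposed<T> {
  // Part handed to the storage layer, which can use it to prune data early.
  final Predicate<T> pushed;
  // Part the query engine still evaluates on every row it receives.
  final Predicate<T> residual;

  SketchDecomposed(Predicate<T> pushed, Predicate<T> residual) {
    this.pushed = pushed;
    this.residual = residual;
  }
}

class DecomposeSketch {
  // Assumption: the entire filter is returned in both roles, so neither
  // side loses the chance to prune with it.
  static <T> SketchDecomposed<T> decompose(Predicate<T> filter) {
    return new SketchDecomposed<>(filter, filter);
  }

  public static void main(String[] args) {
    Predicate<Integer> filter = value -> value > 5;
    SketchDecomposed<Integer> decomposed = decompose(filter);
    System.out.println("pushed accepts 6: " + decomposed.pushed.test(6));
    System.out.println("residual accepts 5: " + decomposed.residual.test(5));
  }
}
```

Keeping the full filter as the residual means the engine still applies it row by row, so results stay correct even if storage-level pruning is only approximate.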
-
table
Returns the Table serialized to the configuration based on the table name. If configuration is missing from the FileIO of the table, it will be populated with the input config.
- Parameters:
config
- The configuration used to get the data from
name
- The name of the table we need as returned by TableDesc.getTableName()
- Returns:
- The Table
-
checkAndSetIoConfig
If enabled, populates the FileIO's Hadoop configuration with the input config object. This might be necessary when the table object was serialized without the FileIO config.
- Parameters:
config
- Configuration to set for FileIO, if enabled
table
- The Iceberg table object
-
checkAndSkipIoConfigSerialization
public static void checkAndSkipIoConfigSerialization(org.apache.hadoop.conf.Configuration config, Table table)
If enabled, ensures that the FileIO's Hadoop configuration will not be serialized. This might be desirable for decreasing the overall size of serialized table objects.
Note: Skipping FileIO config serialization in this fashion might in turn necessitate calling checkAndSetIoConfig(Configuration, Table) on the deserializer side to enable subsequent use of the FileIO.
- Parameters:
config
- Configuration to set for FileIO in a transient manner, if enabled
table
- The Iceberg table object
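The interplay between checkAndSkipIoConfigSerialization and checkAndSetIoConfig can be illustrated with plain Java serialization. This is a sketch under stated assumptions, not the Iceberg implementation: SketchFileIO, SketchTable and IoConfigSketch are hypothetical stand-ins, a Map replaces the Hadoop Configuration, the "if enabled" switch is left out, and a transient field models "do not serialize the config".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a FileIO; NOT an Iceberg class.
class SketchFileIO implements Serializable {
  private static final long serialVersionUID = 1L;
  // transient: Java serialization skips this field, which keeps the
  // serialized table small but leaves the config null after deserialization.
  private transient Map<String, String> conf;

  SketchFileIO(Map<String, String> conf) { this.conf = conf; }
  Map<String, String> conf() { return conf; }
  void setConf(Map<String, String> conf) { this.conf = conf; }
}

// Hypothetical stand-in for a table that carries its FileIO.
class SketchTable implements Serializable {
  private static final long serialVersionUID = 1L;
  final String name;
  final SketchFileIO io;

  SketchTable(String name, SketchFileIO io) { this.name = name; this.io = io; }
}

class IoConfigSketch {
  static byte[] serialize(SketchTable table) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(table);
    }
    return bytes.toByteArray();
  }

  static SketchTable deserialize(byte[] data) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return (SketchTable) in.readObject();
    }
  }

  // Deserializer-side repair step: fill in the config only when it is missing.
  static void setIoConfigIfMissing(Map<String, String> config, SketchTable table) {
    if (table.io.conf() == null) {
      table.io.setConf(config);
    }
  }

  public static void main(String[] args) throws Exception {
    Map<String, String> config = new HashMap<>();
    config.put("fs.defaultFS", "file:///");
    SketchTable original = new SketchTable("db.tbl", new SketchFileIO(config));

    SketchTable copy = deserialize(serialize(original));
    System.out.println("config present after deserialization: " + (copy.io.conf() != null));

    setIoConfigIfMissing(config, copy);
    System.out.println("config present after repair: " + (copy.io.conf() != null));
  }
}
```

The trade-off mirrors the note above: the serialized object shrinks, but the receiving side must supply a configuration before the FileIO can be used again.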
-
outputTables
Returns the names of the output tables stored in the configuration.
- Parameters:
config
- The configuration used to get the data from
- Returns:
- The collection of the table names as returned by TableDesc.getTableName()
-
catalogName
Returns the catalog name serialized to the configuration.
- Parameters:
config
- The configuration used to get the data from
name
- The name of the table we need as returned by TableDesc.getTableName()
- Returns:
- The catalog name
-
schema
Returns the Table Schema serialized to the configuration.
- Parameters:
config
- The configuration used to get the data from
- Returns:
- The Table Schema object
-