Class HiveIcebergOutputCommitter


  • public class HiveIcebergOutputCommitter
    extends org.apache.hadoop.mapred.OutputCommitter
    An Iceberg table committer that adds data files to Iceberg tables. It is currently independent of Hive ACID transactions.
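The committer is driven by the execution framework through the `OutputCommitter` callbacks listed in the method summary. The sketch below is a simplified, self-contained model of the usual call order, not Hive or Hadoop code. `SimpleCommitter`, `RecordingCommitter` and `CommitterLifecycleSketch` are hypothetical names. The job and task contexts are replaced by integer task ids, and retries and speculative attempts are ignored, so any failed task fails the whole job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for the OutputCommitter callbacks;
// the real methods take JobContext / TaskAttemptContext arguments instead of ints.
interface SimpleCommitter {
  void setupJob();
  void setupTask(int task);
  boolean needsTaskCommit(int task);
  void commitTask(int task);
  void abortTask(int task);
  void commitJob();
  void abortJob();
}

// Records each callback so the invocation order can be inspected.
class RecordingCommitter implements SimpleCommitter {
  final List<String> calls = new ArrayList<>();

  @Override public void setupJob() { calls.add("setupJob"); }
  @Override public void setupTask(int task) { calls.add("setupTask:" + task); }
  @Override public boolean needsTaskCommit(int task) {
    calls.add("needsTaskCommit:" + task);
    return true; // assume every task produced output worth committing
  }
  @Override public void commitTask(int task) { calls.add("commitTask:" + task); }
  @Override public void abortTask(int task) { calls.add("abortTask:" + task); }
  @Override public void commitJob() { calls.add("commitJob"); }
  @Override public void abortJob() { calls.add("abortJob"); }
}

class CommitterLifecycleSketch {
  // Simplified driver: no retries or speculative attempts; any failed task fails the job.
  static void runJob(SimpleCommitter committer, int taskCount, Set<Integer> failedTasks) {
    committer.setupJob();
    boolean jobFailed = false;
    for (int task = 0; task < taskCount; task++) {
      committer.setupTask(task);
      if (failedTasks.contains(task)) {
        committer.abortTask(task); // discard this task's partial output
        jobFailed = true;
      } else if (committer.needsTaskCommit(task)) {
        committer.commitTask(task); // publish this task's output list
      }
    }
    if (jobFailed) {
      committer.abortJob();
    } else {
      committer.commitJob();
    }
  }

  public static void main(String[] args) {
    RecordingCommitter committer = new RecordingCommitter();
    runJob(committer, 2, Set.of());
    System.out.println(committer.calls);
  }
}
```

In this model, task-level callbacks only prepare or discard per-task state; the table itself is touched once, in `commitJob` or `abortJob`.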
    • Method Summary

      Modifier and Type Method Description
      void abortJob(org.apache.hadoop.mapred.JobContext originalContext, int status)
      Removes the generated data files if there is a commit file already generated for them.
      void abortTask(org.apache.hadoop.mapred.TaskAttemptContext originalContext)
      Removes files generated by this task.
      void commitJob(org.apache.hadoop.mapred.JobContext originalContext)
      Reads the commit files stored in the temp directories and collects the generated committed data files.
      void commitTask(org.apache.hadoop.mapred.TaskAttemptContext originalContext)
      Collects the generated data files and creates a commit file storing the data file list.
      boolean needsTaskCommit(org.apache.hadoop.mapred.TaskAttemptContext context)
      void setupJob(org.apache.hadoop.mapred.JobContext jobContext)
      void setupTask(org.apache.hadoop.mapred.TaskAttemptContext taskAttemptContext)
      • Methods inherited from class org.apache.hadoop.mapred.OutputCommitter

        abortJob, abortTask, cleanupJob, cleanupJob, commitJob, commitTask, isCommitJobRepeatable, isCommitJobRepeatable, isRecoverySupported, isRecoverySupported, isRecoverySupported, needsTaskCommit, recoverTask, recoverTask, setupJob, setupTask
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • HiveIcebergOutputCommitter

        public HiveIcebergOutputCommitter()
    • Method Detail

      • setupJob

        public void setupJob(org.apache.hadoop.mapred.JobContext jobContext)
        Specified by:
        setupJob in class org.apache.hadoop.mapred.OutputCommitter
      • setupTask

        public void setupTask(org.apache.hadoop.mapred.TaskAttemptContext taskAttemptContext)
        Specified by:
        setupTask in class org.apache.hadoop.mapred.OutputCommitter
      • needsTaskCommit

        public boolean needsTaskCommit(org.apache.hadoop.mapred.TaskAttemptContext context)
        Specified by:
        needsTaskCommit in class org.apache.hadoop.mapred.OutputCommitter
      • commitTask

        public void commitTask(org.apache.hadoop.mapred.TaskAttemptContext originalContext)
                        throws java.io.IOException
        Collects the generated data files and creates a commit file storing the data file list.
        Specified by:
        commitTask in class org.apache.hadoop.mapred.OutputCommitter
        Parameters:
        originalContext - The task attempt context
        Throws:
        java.io.IOException - Thrown if there is an error writing the commit file
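As a rough illustration of the commit-file idea, the sketch below writes the list of data files produced by one task attempt into a per-attempt file inside a job temporary directory. The file name (`<taskAttemptId>.commit`), the plain-text one-path-per-line format, and the class `CommitTaskSketch` are assumptions made for this example; the actual committer's file layout and serialization are not described on this page.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

class CommitTaskSketch {
  // Hypothetical layout: one "<taskAttemptId>.commit" file per task attempt
  // inside the job's temporary directory.
  static Path commitFileFor(Path jobTempDir, String taskAttemptId) {
    return jobTempDir.resolve(taskAttemptId + ".commit");
  }

  // Writes the list of data files produced by a task attempt, one path per line
  // (a plain-text format chosen for illustration only).
  static Path commitTask(Path jobTempDir, String taskAttemptId, List<String> dataFiles)
      throws IOException {
    Files.createDirectories(jobTempDir);
    Path commitFile = commitFileFor(jobTempDir, taskAttemptId);
    // Write to a temporary file first, then rename it into place.
    Path tmp = Files.createTempFile(jobTempDir, taskAttemptId, ".tmp");
    Files.write(tmp, dataFiles, StandardCharsets.UTF_8);
    Files.move(tmp, commitFile, StandardCopyOption.ATOMIC_MOVE);
    return commitFile;
  }

  public static void main(String[] args) throws IOException {
    Path jobTempDir = Files.createTempDirectory("job");
    Path commitFile =
        commitTask(jobTempDir, "attempt_0", List.of("data/a.parquet", "data/b.parquet"));
    System.out.println(Files.readAllLines(commitFile));
  }
}
```

Writing to a temporary file and renaming it means a job-level reader sees either the complete list or no commit file at all, assuming the file system supports atomic renames within a directory.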
      • abortTask

        public void abortTask(org.apache.hadoop.mapred.TaskAttemptContext originalContext)
                       throws java.io.IOException
        Removes files generated by this task.
        Specified by:
        abortTask in class org.apache.hadoop.mapred.OutputCommitter
        Parameters:
        originalContext - The task attempt context
        Throws:
        java.io.IOException - Thrown if there is an error closing the writer
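The important property of `abortTask` is that it only touches output from the failing task attempt. A minimal sketch, assuming a hypothetical `TrackingTaskWriter` that records each data file it creates; it does not model closing an open writer, which is where the real method's `IOException` can come from.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-task writer that remembers every data file it creates.
class TrackingTaskWriter {
  private final Path dataDir;
  private final String taskAttemptId;
  private final List<Path> written = new ArrayList<>();

  TrackingTaskWriter(Path dataDir, String taskAttemptId) {
    this.dataDir = dataDir;
    this.taskAttemptId = taskAttemptId;
  }

  // Creates a new data file whose name is prefixed with the task attempt id.
  Path write(String contents) throws IOException {
    Files.createDirectories(dataDir);
    Path file = dataDir.resolve(taskAttemptId + "-" + written.size() + ".data");
    Files.writeString(file, contents);
    written.add(file);
    return file;
  }

  List<Path> writtenFiles() {
    return List.copyOf(written);
  }

  // abortTask analogue: delete only the files this task attempt created,
  // leaving other tasks' output untouched.
  void abort() throws IOException {
    for (Path file : written) {
      Files.deleteIfExists(file);
    }
    written.clear();
  }
}

class AbortTaskSketch {
  public static void main(String[] args) throws IOException {
    Path dataDir = Files.createTempDirectory("data");
    TrackingTaskWriter failed = new TrackingTaskWriter(dataDir, "attempt_0");
    TrackingTaskWriter healthy = new TrackingTaskWriter(dataDir, "attempt_1");
    failed.write("partial rows");
    Path kept = healthy.write("complete rows");
    failed.abort();
    System.out.println(Files.exists(kept));
  }
}
```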
      • commitJob

        public void commitJob(org.apache.hadoop.mapred.JobContext originalContext)
                       throws java.io.IOException
        Reads the commit files stored in the temp directories and collects the generated committed data files. Appends the data files to the tables. Finally, removes the temporary directories.
        Overrides:
        commitJob in class org.apache.hadoop.mapred.OutputCommitter
        Parameters:
        originalContext - The job context
        Throws:
        java.io.IOException - if there is a failure accessing the files
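The three steps of `commitJob` (read the commit files, append the data files, remove the temporary directory) can be sketched with plain `java.nio.file` calls. `CommitJobSketch`, its `AppendTarget` interface, and the `.commit` suffix are hypothetical. In Iceberg itself, an append of this kind would typically go through `Table.newAppend()` followed by `commit()`; this sketch does not use the Iceberg API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CommitJobSketch {
  // Hypothetical stand-in for the target table; the sketch only needs "append these files".
  interface AppendTarget {
    void appendAll(List<String> dataFiles);
  }

  static List<String> commitJob(Path jobTempDir, AppendTarget table) throws IOException {
    // 1. Find the commit files left by successful task commits (".commit" suffix assumed).
    List<Path> commitFiles;
    try (Stream<Path> entries = Files.list(jobTempDir)) {
      commitFiles = entries
          .filter(p -> p.getFileName().toString().endsWith(".commit"))
          .sorted()
          .collect(Collectors.toList());
    }
    // 2. Collect every data file listed in them.
    List<String> dataFiles = new ArrayList<>();
    for (Path commitFile : commitFiles) {
      dataFiles.addAll(Files.readAllLines(commitFile));
    }
    // 3. Hand all files to the table in a single call.
    table.appendAll(dataFiles);
    // 4. Remove the temporary directory.
    deleteRecursively(jobTempDir);
    return dataFiles;
  }

  // Deletes children before parents by visiting paths in reverse order.
  static void deleteRecursively(Path root) throws IOException {
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(root)) {
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
  }
}
```

Only files ending in `.commit` are read, so in this sketch a task attempt that never completed its task commit contributes nothing to the table.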
      • abortJob

        public void abortJob(org.apache.hadoop.mapred.JobContext originalContext,
                             int status)
                      throws java.io.IOException
        Removes the generated data files if there is a commit file already generated for them. The cleanup at the end removes the temporary directories as well.
        Overrides:
        abortJob in class org.apache.hadoop.mapred.OutputCommitter
        Parameters:
        originalContext - The job context
        status - The status of the job
        Throws:
        java.io.IOException - if there is a failure deleting the files
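For `abortJob`, the sketch below deletes only the data files named in existing commit files and then removes the temporary directory, following the description above. `AbortJobSketch` and its file layout are hypothetical (one `.commit` file per task attempt, holding one data-file path per line), and the `status` argument of the real method is not modeled.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class AbortJobSketch {
  // Deletes every data file named in a commit file, then removes the temporary directory.
  // Returns how many data files were actually deleted.
  static int abortJob(Path jobTempDir) throws IOException {
    if (!Files.isDirectory(jobTempDir)) {
      return 0; // no temporary directory means no commit files to read
    }
    List<Path> commitFiles;
    try (Stream<Path> entries = Files.list(jobTempDir)) {
      commitFiles = entries
          .filter(p -> p.getFileName().toString().endsWith(".commit"))
          .collect(Collectors.toList());
    }
    int deleted = 0;
    for (Path commitFile : commitFiles) {
      for (String dataFile : Files.readAllLines(commitFile)) {
        if (Files.deleteIfExists(Path.of(dataFile))) {
          deleted++;
        }
      }
    }
    // Final cleanup: remove the temporary directory, children before parents.
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(jobTempDir)) {
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
    return deleted;
  }
}
```

Data files that were written but never listed in a commit file are left alone here; in the documented design, removing those appears to be the job of `abortTask`.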