Class HiveIcebergOutputCommitter

java.lang.Object
org.apache.hadoop.mapreduce.OutputCommitter
org.apache.hadoop.mapred.OutputCommitter
org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter

public class HiveIcebergOutputCommitter extends org.apache.hadoop.mapred.OutputCommitter
An Iceberg table committer that adds data files to Iceberg tables. It currently operates independently of Hive ACID transactions.
  • Constructor Summary

    Constructors
    Constructor
    Description
    HiveIcebergOutputCommitter()
     
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    abortJob(org.apache.hadoop.mapred.JobContext originalContext, int status)
    Removes the generated data files if there is a commit file already generated for them.
    void
    abortTask(org.apache.hadoop.mapred.TaskAttemptContext originalContext)
    Removes files generated by this task.
    void
    commitJob(org.apache.hadoop.mapred.JobContext originalContext)
    Reads the commit files stored in the temp directories and collects the generated committed data files.
    void
    commitTask(org.apache.hadoop.mapred.TaskAttemptContext originalContext)
    Collects the generated data files and creates a commit file storing the data file list.
    boolean
    needsTaskCommit(org.apache.hadoop.mapred.TaskAttemptContext context)
     
    void
    setupJob(org.apache.hadoop.mapred.JobContext jobContext)
     
    void
    setupTask(org.apache.hadoop.mapred.TaskAttemptContext taskAttemptContext)
     

    Methods inherited from class org.apache.hadoop.mapred.OutputCommitter

    abortJob, abortTask, cleanupJob, cleanupJob, commitJob, commitTask, isCommitJobRepeatable, isCommitJobRepeatable, isRecoverySupported, isRecoverySupported, isRecoverySupported, needsTaskCommit, recoverTask, recoverTask, setupJob, setupTask

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • HiveIcebergOutputCommitter

      public HiveIcebergOutputCommitter()
  • Method Details

    • setupJob

      public void setupJob(org.apache.hadoop.mapred.JobContext jobContext)
      Specified by:
      setupJob in class org.apache.hadoop.mapred.OutputCommitter
    • setupTask

      public void setupTask(org.apache.hadoop.mapred.TaskAttemptContext taskAttemptContext)
      Specified by:
      setupTask in class org.apache.hadoop.mapred.OutputCommitter
    • needsTaskCommit

      public boolean needsTaskCommit(org.apache.hadoop.mapred.TaskAttemptContext context)
      Specified by:
      needsTaskCommit in class org.apache.hadoop.mapred.OutputCommitter
    • commitTask

      public void commitTask(org.apache.hadoop.mapred.TaskAttemptContext originalContext) throws IOException
      Collects the generated data files and creates a commit file storing the data file list.
      Specified by:
      commitTask in class org.apache.hadoop.mapred.OutputCommitter
      Parameters:
      originalContext - The task attempt context
      Throws:
      IOException - Thrown if there is an error writing the commit file
    • abortTask

      public void abortTask(org.apache.hadoop.mapred.TaskAttemptContext originalContext) throws IOException
      Removes files generated by this task.
      Specified by:
      abortTask in class org.apache.hadoop.mapred.OutputCommitter
      Parameters:
      originalContext - The task attempt context
      Throws:
      IOException - Thrown if there is an error closing the writer
    • commitJob

      public void commitJob(org.apache.hadoop.mapred.JobContext originalContext) throws IOException
      Reads the commit files stored in the temp directories and collects the generated committed data files. Appends the data files to the tables. Finally, removes the temporary directories.
      Overrides:
      commitJob in class org.apache.hadoop.mapred.OutputCommitter
      Parameters:
      originalContext - The job context
      Throws:
      IOException - if there is a failure accessing the files
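The commitTask and commitJob descriptions above outline a two-step protocol: each task records its data files in a commit file, and the job later reads those commit files back. The following is a minimal sketch of that idea using only java.nio. It is not the actual implementation: the class name CommitFileSketch, the ".commit" suffix, the flat temp directory layout, and the one-path-per-line file format are all assumptions made for illustration, and the step that appends files to the table is left as a comment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; it does not use Hadoop or Iceberg classes.
final class CommitFileSketch {

  // Task side: record the data files this task produced in a per-task commit file.
  // The ".commit" suffix and one-path-per-line format are assumptions.
  static Path commitTask(Path tempDir, String taskId, List<String> dataFiles) throws IOException {
    Files.createDirectories(tempDir);
    return Files.write(tempDir.resolve(taskId + ".commit"), dataFiles);
  }

  // Job side: read every commit file, gather the listed data files,
  // then remove the temp directory (assumed to contain only regular files).
  static List<String> commitJob(Path tempDir) throws IOException {
    List<Path> entries;
    try (Stream<Path> stream = Files.list(tempDir)) {
      entries = stream.sorted().collect(Collectors.toList());
    }
    List<String> committed = new ArrayList<>();
    for (Path entry : entries) {
      if (entry.getFileName().toString().endsWith(".commit")) {
        committed.addAll(Files.readAllLines(entry));
      }
    }
    // A real committer would append the collected data files to the table here.
    for (Path entry : entries) {
      Files.delete(entry);
    }
    Files.delete(tempDir);
    return committed;
  }

  public static void main(String[] args) throws IOException {
    Path tempDir = Files.createTempDirectory("commit-sketch");
    commitTask(tempDir, "task-0", List.of("a.parquet", "b.parquet"));
    commitTask(tempDir, "task-1", List.of("c.parquet"));
    System.out.println(commitJob(tempDir));
  }
}
```

Keeping the task-side write and the job-side read separate mirrors the split between commitTask and commitJob: nothing reaches the table until the job-level step has seen every task's commit file.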
    • abortJob

      public void abortJob(org.apache.hadoop.mapred.JobContext originalContext, int status) throws IOException
      Removes the generated data files if there is a commit file already generated for them. The final cleanup also removes the temporary directories.
      Overrides:
      abortJob in class org.apache.hadoop.mapred.OutputCommitter
      Parameters:
      originalContext - The job context
      status - The status of the job
      Throws:
      IOException - if there is a failure deleting the files
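The abortJob description above can be sketched in the same spirit: data files are deleted only when a commit file names them, and the temporary directory is removed afterwards. This is a hedged illustration with plain java.nio, not the committer's real code. The class name AbortJobSketch, the ".commit" suffix, and the one-path-per-line commit file format are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; it does not use Hadoop or Iceberg classes.
final class AbortJobSketch {

  // Deletes each data file named in a commit file, then removes the temp directory
  // (assumed to contain only regular files). Returns the data files that were deleted.
  static List<Path> abortJob(Path tempDir) throws IOException {
    List<Path> entries;
    try (Stream<Path> stream = Files.list(tempDir)) {
      entries = stream.sorted().collect(Collectors.toList());
    }
    List<Path> deleted = new ArrayList<>();
    for (Path entry : entries) {
      if (entry.getFileName().toString().endsWith(".commit")) {
        for (String line : Files.readAllLines(entry)) {
          Path dataFile = Paths.get(line);
          // Only files listed in a commit file are touched.
          if (Files.deleteIfExists(dataFile)) {
            deleted.add(dataFile);
          }
        }
      }
      Files.delete(entry);
    }
    Files.delete(tempDir);
    return deleted;
  }

  public static void main(String[] args) throws IOException {
    Path dataDir = Files.createTempDirectory("abort-data");
    Path tempDir = Files.createTempDirectory("abort-temp");
    Path tracked = Files.createFile(dataDir.resolve("a.parquet"));
    Files.write(tempDir.resolve("task-0.commit"), List.of(tracked.toString()));
    System.out.println(abortJob(tempDir));
  }
}
```

In this sketch a data file that no commit file mentions is left in place, which matches the condition in the description ("if there is a commit file already generated for them").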