Package org.apache.iceberg.mr.hive
Class HiveIcebergOutputCommitter
java.lang.Object
org.apache.hadoop.mapreduce.OutputCommitter
org.apache.hadoop.mapred.OutputCommitter
org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter
public class HiveIcebergOutputCommitter
extends org.apache.hadoop.mapred.OutputCommitter
An output committer that adds data files to Iceberg tables. It is currently independent of
Hive ACID transactions.
Constructor Summary
HiveIcebergOutputCommitter()
Method Summary
Modifier and Type, Method, Description:
void abortJob(org.apache.hadoop.mapred.JobContext originalContext, int status)
    Removes the generated data files if there is a commit file already generated for them.
void abortTask(org.apache.hadoop.mapred.TaskAttemptContext originalContext)
    Removes files generated by this task.
void commitJob(org.apache.hadoop.mapred.JobContext originalContext)
    Reads the commit files stored in the temp directories and collects the generated committed data files.
void commitTask(org.apache.hadoop.mapred.TaskAttemptContext originalContext)
    Collects the generated data files and creates a commit file storing the data file list.
boolean needsTaskCommit(org.apache.hadoop.mapred.TaskAttemptContext context)
void setupJob(org.apache.hadoop.mapred.JobContext jobContext)
void setupTask(org.apache.hadoop.mapred.TaskAttemptContext taskAttemptContext)
Methods inherited from class org.apache.hadoop.mapred.OutputCommitter
abortJob, abortTask, cleanupJob, cleanupJob, commitJob, commitTask, isCommitJobRepeatable, isCommitJobRepeatable, isRecoverySupported, isRecoverySupported, isRecoverySupported, needsTaskCommit, recoverTask, recoverTask, setupJob, setupTask
Constructor Details
HiveIcebergOutputCommitter
public HiveIcebergOutputCommitter()
Method Details
setupJob
public void setupJob(org.apache.hadoop.mapred.JobContext jobContext)
Specified by:
setupJob in class org.apache.hadoop.mapred.OutputCommitter
setupTask
public void setupTask(org.apache.hadoop.mapred.TaskAttemptContext taskAttemptContext)
Specified by:
setupTask in class org.apache.hadoop.mapred.OutputCommitter
needsTaskCommit
public boolean needsTaskCommit(org.apache.hadoop.mapred.TaskAttemptContext context)
Specified by:
needsTaskCommit in class org.apache.hadoop.mapred.OutputCommitter
commitTask
public void commitTask(org.apache.hadoop.mapred.TaskAttemptContext originalContext) throws IOException
Collects the generated data files and creates a commit file storing the data file list.
Specified by:
commitTask in class org.apache.hadoop.mapred.OutputCommitter
Parameters:
originalContext - The task attempt context
Throws:
IOException - Thrown if there is an error writing the commit file
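
The commit-file idea described for commitTask can be illustrated with a minimal, self-contained sketch. It uses only the JDK and the local file system instead of the Hadoop or Iceberg APIs. The class TaskCommitSketch, its two methods, the ".commit" suffix, and the one-path-per-line layout are assumptions made for illustration; they are not the committer's actual implementation or file format.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class TaskCommitSketch {

    // Hypothetical helper, not part of Iceberg: records the data files written by
    // one task attempt in a commit file, one path per line.
    public static Path writeCommitFile(Path jobTempDir, String taskId, List<String> dataFiles) {
        try {
            Files.createDirectories(jobTempDir);
            Path commitFile = jobTempDir.resolve(taskId + ".commit");
            Files.write(commitFile, dataFiles);
            return commitFile;
        } catch (IOException e) {
            // The documented commitTask declares IOException; the sketch wraps it for brevity.
            throw new UncheckedIOException(e);
        }
    }

    // Reads a commit file back into the list of data file paths it stores.
    public static List<String> readCommitFile(Path commitFile) {
        try {
            return Files.readAllLines(commitFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Writing the list to a separate file decouples the task from the table: the task only has to persist what it produced, and a later step decides whether those files become part of the table.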
abortTask
public void abortTask(org.apache.hadoop.mapred.TaskAttemptContext originalContext) throws IOException
Removes files generated by this task.
Specified by:
abortTask in class org.apache.hadoop.mapred.OutputCommitter
Parameters:
originalContext - The task attempt context
Throws:
IOException - Thrown if there is an error closing the writer
commitJob
public void commitJob(org.apache.hadoop.mapred.JobContext originalContext) throws IOException
Reads the commit files stored in the temp directories and collects the generated committed data files. Appends the data files to the tables. At the end, removes the temporary directories.
Overrides:
commitJob in class org.apache.hadoop.mapred.OutputCommitter
Parameters:
originalContext - The job context
Throws:
IOException - if there is a failure accessing the files
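
The three steps described for commitJob (read the commit files, append the data files, remove the temporary directory) can be sketched with the JDK alone. Everything named here is hypothetical: the class JobCommitSketch, the ".commit" suffix, the one-path-per-line layout, and the in-memory list that stands in for an Iceberg table. The sketch only mirrors the order of operations in the description above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class JobCommitSketch {

    // Hypothetical job commit: gathers the data files named in every "*.commit" file
    // of the job temp directory, appends them to the table, then removes the directory.
    public static void commitJob(Path jobTempDir, List<String> table) {
        try {
            List<Path> commitFiles;
            try (Stream<Path> entries = Files.list(jobTempDir)) {
                commitFiles = entries
                        .filter(p -> p.getFileName().toString().endsWith(".commit"))
                        .sorted() // deterministic order for the example
                        .collect(Collectors.toList());
            }
            List<String> dataFiles = new ArrayList<>();
            for (Path commitFile : commitFiles) {
                dataFiles.addAll(Files.readAllLines(commitFile));
            }
            table.addAll(dataFiles); // stand-in for appending the files to a table
            deleteRecursively(jobTempDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deletes a directory tree; reverse order visits children before parents.
    static void deleteRecursively(Path dir) throws IOException {
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(dir)) {
            entries = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path entry : entries) {
            Files.delete(entry);
        }
    }

    // Usage example: two task commit files are written, then the job is committed.
    public static List<String> demo() {
        try {
            Path jobTempDir = Files.createTempDirectory("job-commit-sketch");
            Files.write(jobTempDir.resolve("task_0.commit"), Arrays.asList("a.parquet", "b.parquet"));
            Files.write(jobTempDir.resolve("task_1.commit"), Arrays.asList("c.parquet"));
            List<String> table = new ArrayList<>();
            commitJob(jobTempDir, table);
            if (Files.exists(jobTempDir)) {
                throw new IllegalStateException("temp directory was not removed");
            }
            return table;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Collecting all data files before touching the table means the append happens once per job rather than once per task, which is the usual reason for deferring the table update to the job-level commit.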
abortJob
public void abortJob(org.apache.hadoop.mapred.JobContext originalContext, int status) throws IOException
Removes the generated data files if there is a commit file already generated for them. The cleanup at the end removes the temporary directories as well.
Overrides:
abortJob in class org.apache.hadoop.mapred.OutputCommitter
Parameters:
originalContext - The job context
status - The status of the job
Throws:
IOException - if there is a failure deleting the files
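
The abort path can be sketched in the same simplified setting: data files are deleted only when a commit file names them, and the temporary directory is removed afterwards. The class JobAbortSketch, the ".commit" suffix, and the one-path-per-line layout are assumptions for illustration and do not reflect the committer's real on-disk format.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class JobAbortSketch {

    // Hypothetical job abort: deletes every data file named in a "*.commit" file,
    // then removes the job temp directory. Returns the number of data files deleted.
    public static int abortJob(Path jobTempDir) {
        try {
            List<Path> entries;
            try (Stream<Path> walk = Files.walk(jobTempDir)) {
                // Reverse order puts children before their parent directories.
                entries = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            int removed = 0;
            for (Path entry : entries) {
                if (entry.getFileName().toString().endsWith(".commit")) {
                    for (String dataFile : Files.readAllLines(entry)) {
                        if (Files.deleteIfExists(Paths.get(dataFile))) {
                            removed++;
                        }
                    }
                }
                Files.deleteIfExists(entry); // cleanup of the temp directory itself
            }
            return removed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Usage example: one data file is recorded in a commit file, then the job is aborted.
    public static boolean demo() {
        try {
            Path dataDir = Files.createTempDirectory("abort-sketch-data");
            Path dataFile = Files.createFile(dataDir.resolve("a.parquet"));
            Path jobTempDir = Files.createTempDirectory("abort-sketch-job");
            Files.write(jobTempDir.resolve("task_0.commit"),
                    Collections.singletonList(dataFile.toString()));
            int removed = abortJob(jobTempDir);
            Files.delete(dataDir); // succeeds only if the data file is gone
            return removed == 1 && !Files.exists(dataFile) && !Files.exists(jobTempDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```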