Hive Table Migration

Apache Hive supports the ORC, Parquet, and Avro file formats, all of which can be migrated to Iceberg. Because an Iceberg table provides its own versioning and transactional updates, only the most recent data files need to be migrated.

Iceberg supports all three migration actions for moving from Hive tables to Iceberg tables: Snapshot Table, Migrate Table, and Add Files. Since Hive tables do not maintain snapshots, the migration process essentially involves creating a new Iceberg table with the existing schema and committing all data files across all partitions to that table. After the initial migration, any new data files are added to the new Iceberg table using the Add Files action.

Enabling Migration from Hive to Iceberg

The Hive table migration actions are supported by the Spark Integration module via Spark Procedures. The procedures are bundled in the Spark runtime jar, which is available in the Iceberg Release Downloads.

Snapshot Hive Table to Iceberg

To snapshot a Hive table, users can run the following Spark SQL:

CALL catalog_name.system.snapshot('db.source', 'db.dest')

See Spark Procedure: snapshot for more details.
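
As a hedged illustration of how a snapshot might be used for a trial run, the sketch below passes the same tables as named arguments, adds the optional location argument, and then queries and drops the result. The path '/tmp/temptable/' is a placeholder, and the fully qualified name catalog_name.db.dest assumes that the destination identifier resolves inside catalog_name.

```sql
-- Create an independent Iceberg table that references the Hive table's
-- existing data files; the source table is left unchanged.
CALL catalog_name.system.snapshot(
  source_table => 'db.source',
  table        => 'db.dest',
  location     => '/tmp/temptable/'  -- placeholder location for the new table
);

-- Sanity-check that the snapshot table is readable.
SELECT count(*) FROM catalog_name.db.dest;

-- Clean up the snapshot table once testing is finished.
DROP TABLE catalog_name.db.dest;
```

Because the snapshot table does not solely own the source's data files, it is best treated as a short-lived test table rather than a long-term replacement.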

Migrate Hive Table to Iceberg

To migrate a Hive table to Iceberg, users can run the following Spark SQL:

CALL catalog_name.system.migrate('db.sample')

See Spark Procedure: migrate for more details.
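
A minimal sketch of a migration that also sets a table property and then inspects the result is shown below. The property key and value ('foo', 'bar') are purely illustrative, and the metadata-table query assumes the migrated table is addressable as catalog_name.db.sample.

```sql
-- Replace the Hive table in place with an Iceberg table,
-- attaching an illustrative table property during the migration.
CALL catalog_name.system.migrate(
  table      => 'db.sample',
  properties => map('foo', 'bar')  -- hypothetical property, for illustration only
);

-- Inspect the snapshots recorded for the migrated table.
SELECT * FROM catalog_name.db.sample.snapshots;
```

Unlike snapshot, migrate replaces the source table, so it is usually worth validating with a snapshot first. How the original table is backed up varies by Iceberg version, so check the procedure reference for the release in use.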

Add Files From Hive Table to Iceberg

To add data files from a Hive table to a given Iceberg table, users can run the following Spark SQL:

CALL spark_catalog.system.add_files(
  table => 'db.tbl',
  source_table => 'db.src_tbl'
)

See Spark Procedure: add_files for more details.
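
As a sketch of an incremental load, the call below restricts add_files to a single partition of the source table and then lists the files now tracked by the Iceberg table. It assumes db.tbl already exists as an Iceberg table with a compatible schema. The partition column part_col_1 and the value 'A' are hypothetical.

```sql
-- Register only the data files from one partition of the Hive table.
CALL spark_catalog.system.add_files(
  table            => 'db.tbl',
  source_table     => 'db.src_tbl',
  partition_filter => map('part_col_1', 'A')  -- hypothetical partition column and value
);

-- List the data files that the Iceberg table now references.
SELECT file_path, record_count FROM spark_catalog.db.tbl.files;
```

add_files registers the existing files without copying them, so both tables point at the same files afterwards.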