Apache Iceberg C++ 0.3.0 Release
The Apache Iceberg community is pleased to announce the 0.3.0 release of Apache Iceberg C++. This release includes over 140 pull requests from 23 contributors, including 11 first-time contributors.
iceberg-cpp is a native C++ implementation of the Apache Iceberg table format, providing libraries for reading, writing, and managing Iceberg tables in C++ applications.
Release Highlights
Scan Planning and Data Access
- Incremental scan APIs, incremental append scans, and incremental changelog scans for planning table changes between snapshots
- Merge-on-read data access with a MOR file scan task reader, delete filter support, and a DeleteLoader for v2 position and equality delete files
- Column selection in table scan planning and ManifestGroup file filtering
- Roaring-based position bitmaps, a position delete index, and range coalescing for position deletes
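To illustrate what range coalescing for position deletes can look like, here is a minimal sketch. It assumes the technique means collapsing consecutive deleted row positions into half-open ranges so they can be applied in bulk. `PositionRange` and `CoalescePositions` are hypothetical names for illustration only and are not part of the iceberg-cpp API.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical type for illustration; not an iceberg-cpp API.
// Half-open range [first, last) of deleted row positions.
struct PositionRange {
  int64_t first;
  int64_t last;  // exclusive
};

// Sorts the deleted positions and merges runs of consecutive values
// into ranges. Duplicate positions are ignored.
std::vector<PositionRange> CoalescePositions(std::vector<int64_t> positions) {
  std::sort(positions.begin(), positions.end());
  std::vector<PositionRange> ranges;
  for (int64_t pos : positions) {
    if (!ranges.empty() && pos < ranges.back().last) {
      continue;  // duplicate of a position already covered
    }
    if (!ranges.empty() && pos == ranges.back().last) {
      ranges.back().last = pos + 1;  // extend the current run
    } else {
      ranges.push_back({pos, pos + 1});  // start a new run
    }
  }
  return ranges;
}
```

With this sketch, the positions 1, 2, 3, 5, 9, 10 collapse into three ranges: [1, 4), [5, 6), and [9, 11). A bitmap that supports bulk range insertion can then take a few range updates instead of one update per position.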
Table Operations and Maintenance
- MergingSnapshotUpdate, which lays the groundwork for table overwrite, delete, update, and various maintenance operations
- SnapshotManager support and retried transaction commits
- Snapshot expiration cleanup strategies for reachable file cleanup and incremental file cleanup
- Partition statistics updates and schema update mapping
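As a rough picture of what retried transaction commits involve, the sketch below retries a commit attempt with exponential backoff whenever it fails with a conflict. Everything here is an assumption for illustration: `CommitStatus`, `RetryPolicy`, `CommitWithRetry`, and the default values are hypothetical and do not reflect the actual iceberg-cpp interfaces.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical types for illustration; not iceberg-cpp APIs.
enum class CommitStatus { kSuccess, kConflict, kFatal };

struct RetryPolicy {
  int max_retries = 4;                          // illustrative default
  std::chrono::milliseconds min_wait{100};      // first backoff interval
  std::chrono::milliseconds max_wait{60000};    // cap on the backoff interval
};

// Runs `attempt` until it succeeds, fails with a non-retryable status,
// or the retry budget is exhausted. Only conflicts are retried.
CommitStatus CommitWithRetry(const std::function<CommitStatus()>& attempt,
                             const RetryPolicy& policy) {
  auto wait = policy.min_wait;
  for (int retry = 0;; ++retry) {
    CommitStatus status = attempt();
    if (status != CommitStatus::kConflict || retry >= policy.max_retries) {
      return status;
    }
    std::this_thread::sleep_for(wait);
    wait = std::min(wait * 2, policy.max_wait);  // exponential backoff
  }
}
```

In a real commit path, each attempt would typically refresh the table metadata and reapply the pending changes before trying again, so that the retry works against the latest table state rather than the stale one that caused the conflict.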
Catalogs and Integrations
- REST catalog improvements including initial OAuth2 support, OAuth2 token auto-refresh, basic authentication, snapshot loading mode, namespace separators, and server-side scan planning endpoints
- S3 FileIO integration built on Arrow filesystem support
- FileIO interface enrichment with new InputFile and OutputFile interfaces and bulk delete support
- SQL catalog support backed by SQLite, PostgreSQL, and MySQL stores
Metrics and Observability
- Metrics reporter support with report JSON serialization and reporter loading
- Avro writer metrics and Parquet writer metrics
Metadata and File Format Support
- Puffin support with basic data structures, format constants and JSON serialization, and file reader/writer support
- Iceberg v3 support for the unknown type and nanosecond timestamp types
- Expression serialization with operation JSON serialization, expression JSON serialization, and typed literal binding after serialization
Contributors
$ git shortlog --perl-regexp --author='^((?!dependabot\[bot\]).*)$' -sn v0.2.0..v0.3.0
26 Gang Wu
26 Junwang Zhao
11 wzhuo
8 Kevin Liu
8 Xinli Shang
7 Feiyang Li
7 Zehua Zou
6 lishuxu
4 Innocent Djiofack
4 liuxiaoyu
3 Guotao Yu
3 ZhaoXuan
3 slfan1989
2 Manu Zhang
2 Minh Vu
2 Sebastian Baunsgaard
2 SkylerLin
2 姚军
1 Jiajia Li
1 Maxim Zibitsker
1 Sandeep Gottimukkala
1 Sung Yun
1 Yingfan Guo
This release welcomes 11 first-time contributors to Apache Iceberg C++: @evindj, @manuzhang, @mzibitsker, @fallintoplace, @gsandeep1241, @Baunsgaard, @linguoxuan, @sungwy, @sentomk, @zhaoxuan1994, and @SYaoJun.
We thank all contributors for their efforts in making this release possible!
Roadmap for 0.4.0
The community is actively tracking the next release in #637, with a focus on filling out Iceberg v3 support and expanding table maintenance APIs.
Getting Involved
We welcome questions and contributions from everyone. Issues can be filed on GitHub, and questions can be asked on GitHub or the Iceberg dev mailing list.