Vendors

Vendors Supporting Iceberg Tables

This page lists some of the vendors that ship and support Apache Iceberg in their products.

Amazon Web Services (AWS)

AWS provides a comprehensive suite of services to support Apache Iceberg as part of its modern data architecture. Amazon S3 offers virtually unlimited, highly durable storage for data lakes, while Amazon S3 Tables deliver fully managed Iceberg tables with automated maintenance, optimization, and cost management. Amazon Athena offers a serverless, interactive query engine with native Iceberg support, enabling fast and cost-efficient querying of large-scale datasets. Amazon EMR integrates Iceberg with Apache Spark, Apache Flink, Apache Hive, Presto, and Trino, making it easy to process and analyze data at scale. AWS Glue provides fully managed data integration capabilities, including schema evolution, maintenance, optimizations, and partition management for Iceberg tables. Together, these AWS services enable a high-performance and cost-effective data lakehouse solution powered by Iceberg.

BladePipe

BladePipe is a real-time end-to-end data integration tool, offering 40+ out-of-the-box connectors. It provides a one-stop data movement solution, including schema evolution, data migration and sync, verification and correction, and monitoring and alerting. With sub-second latency, BladePipe captures change data from MySQL, Oracle, PostgreSQL, and other sources and streams it to Iceberg and other destinations, all without writing a single line of code. It offers on-premises and BYOC deployment options. Learn more about how to build a pipeline with BladePipe here.

Bodo

Bodo is a high-performance SQL & Python compute engine that brings HPC and supercomputing techniques to data analytics. Bodo supports Apache Iceberg tables as a first-class table format and storage, enabling users to read and write Iceberg tables with Bodo's high-performance data processing engine. Bodo is available as a cloud service on AWS and Azure, as well as an on-premises solution.

CelerData

CelerData provides commercial offerings for StarRocks, a distributed MPP SQL engine for enterprise analytics on Iceberg. With its fully vectorized technology, local caching, and intelligent materialized views, StarRocks delivers sub-second query latency for both batch and real-time analytics. CelerData offers both an enterprise deployment and a cloud service to help customers use StarRocks more smoothly. Learn more about how to query Iceberg with StarRocks here.

ClickHouse

ClickHouse is a column-oriented database that enables its users to generate powerful analytics in real time using SQL queries. ClickHouse integrates well with Iceberg and offers two options to work with it:

1. Via the Iceberg table function: provides a read-only, table-like interface to Apache Iceberg tables in Amazon S3.
2. Via the Iceberg table engine: provides a read-only integration with existing Apache Iceberg tables in Amazon S3.

Cloudera

Cloudera's data lakehouse enables customers to store and manage their data in open table formats like Apache Iceberg for running large-scale, multi-function analytics and AI. Organizations rely on Cloudera's Iceberg support because it is easy to use, easy to integrate into any data ecosystem, and easy to run with multiple engines, both Cloudera and non-Cloudera, regardless of where the data resides. It provides a common standard for all data with unified security, governance, metadata management, and fine-grained access control across the data.

Cloudera provides an integrated, end-to-end open data lakehouse with the ability to ingest batch and streaming data using NiFi, Flink, and Kafka, then process the same copy of data using Spark and run analytics or AI with our Data Visualization, Data Warehouse, and Machine Learning tools on private or any public cloud.

Collate

Collate is an AI-native data catalog and governance platform built on OpenMetadata, the open-source (Apache 2.0) context layer. It brings together every Apache Iceberg table into a single source of governed context — with column-level lineage, data profiling, and no-code data quality tests — by connecting through the engines that already query Iceberg (e.g., Trino, Snowflake, BigQuery).

Confluent

Confluent provides Tableflow, a managed service for streaming data from Apache Kafka topics into Apache Iceberg tables. Tableflow automates schema evolution through Schema Registry integration and automatically handles table maintenance tasks, including compaction.

Crunchy Data

Crunchy Data Warehouse is a modern data warehouse built on PostgreSQL. Crunchy Data Warehouse extends unmodified PostgreSQL to provide support for fully transactional Iceberg tables and high-performance analytics. Crunchy Data Warehouse is available as a managed service on AWS via Crunchy Bridge, a fully managed PostgreSQL service. Crunchy Data Warehouse can create Iceberg tables directly from a PostgreSQL database or external URLs and can read, query, and update Iceberg tables using PostgreSQL syntax. Using a hybrid query engine that combines PostgreSQL and DuckDB, Crunchy Data Warehouse enables high-performance analytical queries of Iceberg tables.

Databricks

Databricks uses an open lakehouse architecture to power its Data Intelligence Platform and provide a unified foundation for all data and governance, combined with AI models tuned to an organization’s unique characteristics. Through Unity Catalog, users can manage and govern all structured data, unstructured data, business metrics and AI models across open data formats like Delta Lake, Apache Iceberg, Hudi, Parquet and more.

Dataddo

Dataddo is a fully managed data integration platform for moving enterprise data across cloud, on-prem, and hybrid environments, with first-class support for open table formats including Apache Iceberg.

* Architecture - Dataddo runs a single control plane that orchestrates data planes on AWS, Azure, GCP, sovereign clouds or on-premises Kubernetes and OpenShift, so sensitive data never leaves your network.
* Transport patterns - Dataddo supports ETL/ELT, CDC, streaming, reverse ETL, and batch delivery from hundreds of fully maintained connectors.
* Iceberg support - Dataddo writes to Apache Iceberg via catalogs such as AWS Glue, REST and others, and automatically detects source schema drift so downstream tables don't silently break.

See the Dataddo Platform for details.

dltHub

dlt is an open-source Python library for building production-grade extract & load pipelines. It automates the tedious parts of ELT, letting you load any data source into Apache Iceberg with minimal code. dlt eliminates boilerplate and makes data ingestion robust against evolving and unpredictable data sources.

Pythonic pipelines: Define Iceberg ingestion with simple Python functions that are testable and CI/CD-friendly.
Automated schema management: Infers and evolves Iceberg table schemas on the fly, adapting automatically to source changes.
Catalog support: Works with SQL-based (SQLite, PostgreSQL), REST (Lakekeeper, Polaris), and cloud-native options like AWS Glue, Databricks Unity Catalog and Snowflake Open Catalog.
Flexible deployment: Run Iceberg pipelines anywhere - local, Docker, Airflow, or serverless.

Go from an API to a versioned Iceberg table in minutes here.

Dremio

With Dremio, an organization can easily build and manage a data lakehouse in which data is stored in open formats like Apache Iceberg and can be processed with Dremio’s interactive SQL query engine and non-Dremio processing engines. Dremio Cloud provides these capabilities in a fully managed offering.

Dremio Sonar is a lakehouse query engine that provides interactive performance and DML on Apache Iceberg, as well as other formats and data sources.
Dremio Arctic is a lakehouse catalog and optimization service for Apache Iceberg. Arctic automatically optimizes tables in the background to ensure high-performance access for any engine. Arctic also simplifies experimentation, data engineering, and data governance by providing Git concepts like branches and tags on Apache Iceberg tables.

DuckDB

DuckDB is an open-source, in-process SQL database optimized for fast analytical queries. DuckDB is not only lightweight (just a small binary) but also extensible. One of the core extensions supported by DuckDB is the duckdb-iceberg extension, which allows DuckDB users to attach to an Iceberg catalog, query data, and write to Iceberg tables. This functionality is all natively implemented in DuckDB with no external dependencies.

Estuary

A low-latency, high-fidelity data movement platform, Estuary lets developers quickly set up pipelines to connect their entire data architecture. Intelligent schema inference and evolution determines field data types based on usage and keeps pipelines running when fields change. Flexible deployment options include public, private, and BYOC (Bring Your Own Cloud) for a range of compliance and privacy-oriented use cases.

Estuary's catalog of pre-built data connectors provides integrations with databases, APIs, event logs, and more. Apache Iceberg is a primary destination option with two configurable materializations: one that merges new updates and one that simply appends them.

Firebolt

Firebolt is a cloud data warehouse built to power data-intensive applications that demand low latency and high concurrency. It is optimized for reading Apache Iceberg tables with sub-second performance and integrates seamlessly with major Iceberg catalogs.

Firebolt is also available as Firebolt Core, a free, self-hosted edition of its distributed query engine.

Learn more about querying Iceberg with Firebolt here.

Fivetran

Fivetran, the global leader in data movement, is trusted by enterprises to centralize data from SaaS applications and databases into cloud destinations, including Managed Data Lakes. Fivetran Managed Data Lakes provides a fully managed Iceberg data lake for users. Users can connect any of the 700+ connections that Fivetran supports and write them directly into a storage location of their choice. Fivetran Managed Data Lake Service handles the ingestion and maintenance of their Iceberg tables and hosts an Iceberg REST-compliant catalog endpoint for downstream consumption.

Google Cloud

Google Cloud offers first-class support for Apache Iceberg through BigLake to help you build an open, managed, and high-performance Iceberg lakehouse so you can enable advanced analytics and data science with automated data management and built-in governance. BigLake metastore is a serverless metastore for all your Iceberg tables that works across engines like Apache Spark, BigQuery, and third-party platforms to create and manage tables, giving you a consistent view of your data and unified access controls. BigLake metastore supports the Apache Iceberg REST Catalog for easy integration with OSS and third-party engines. BigLake tables for Apache Iceberg offer an enterprise-ready, fully managed Iceberg experience when used with BigQuery.

IBM watsonx.data

IBM watsonx.data is an open data lakehouse for AI and analytics. It uses Apache Iceberg as a core table format, providing features like schema evolution, time travel, and partitioning. This allows developers to easily work with large, complex data sets while ensuring efficient performance and flexibility. watsonx.data simplifies the integration of Iceberg tables, making it easy to manage data across different environments and query historical data without disruption.

Developers can leverage the benefits of Iceberg tables and take advantage of high-performance compute capabilities such as Velox, Presto, and Apache Gluten, which are part of the watsonx.data ecosystem.

IOMETE

IOMETE is a fully managed, ready-to-use, batteries-included data platform. IOMETE optimizes clustering, compaction, and access control for Apache Iceberg tables. Customer data remains in the customer's account to prevent vendor lock-in. The core of the IOMETE platform is a serverless lakehouse that leverages Apache Iceberg as its core table format. The IOMETE platform also includes serverless Spark, a SQL editor, a data catalog, and granular data access control. IOMETE supports hybrid multi-cloud setups.

Microsoft OneLake

Microsoft OneLake is a single unified data lake that brings together your entire data estate into an open, secure foundation for analytics across the organization. Built into Microsoft Fabric, OneLake delivers two powerful APIs: the Tables API and the Files API. The OneLake Tables API supports the Apache Iceberg REST Catalog (IRC) specification, making it simple to create, manage, and integrate Iceberg tables with existing tools and workflows. The OneLake Files API offers full Azure Data Lake Storage (ADLS) compatibility, enabling seamless file operations and interoperability with familiar ADLS tools. Together, these APIs make OneLake a truly open and interoperable data lake, delivering flexibility and connectivity for modern analytics and AI-driven pipelines.

Oracle

As a fully managed Oracle AI Database service, Oracle Autonomous AI Lakehouse combines the openness of Apache Iceberg with the performance, automation, and security of Oracle Autonomous Database and Oracle Exadata. Available across Oracle Cloud Infrastructure (OCI), Microsoft Azure, Google Cloud, AWS, and on-premises, Oracle AI Database provides a multicloud and hybrid open lakehouse architecture with high-performance access to Iceberg tables through integration with existing catalogs and support for the Apache Iceberg REST Catalog specification. Oracle enables interoperability across engines such as Apache Spark, Trino, and Apache Flink while minimizing data movement and preserving vendor independence. Built-in AI, vector search, graph analytics, and JSON-relational capabilities allow organizations to run advanced analytics and AI workloads directly on Iceberg data with enterprise-grade governance, availability, and serverless scalability.

PuppyGraph

PuppyGraph is a cloud-native graph analytics engine that enables users to query one or more relational data stores as a unified graph model. This eliminates the overhead of deploying and maintaining a siloed graph database system, with no ETL required. PuppyGraph’s native Apache Iceberg integration adds native graph capabilities to your existing data lake in an easy and performant way.

Redpanda

Redpanda is both a cloud-native and self-hosted streaming platform whose Iceberg topics automatically transform Kafka messages into Iceberg tables in real time. This allows users to query their Kafka data as part of an established Iceberg deployment, with no connectors or additional technology required. Redpanda Iceberg integrates with an expanding list of Iceberg catalogs and query engines, including many listed here.

RisingWave

RisingWave is a cloud-native streaming database for real-time data ingestion, processing, and management. It integrates with Iceberg to read from and write to Iceberg tables, enabling efficient file compaction across sources like message queues, databases (via Change Data Capture), data lakes, and files. RisingWave is available as open source, a managed cloud service (RisingWave Cloud) with BYOC support, and an enterprise on-premises edition (RisingWave Premium).

Ryft

Ryft is a fully automated Iceberg management platform. Ryft helps data teams create an open, automated and cost-effective Iceberg lakehouse, by maintaining and optimizing Iceberg tables in real time, based on actual usage patterns. The Ryft engine runs compaction intelligently, adapting to different use cases like streaming, batch jobs, CDC, and more. Ryft also automates compliance, disaster recovery and data lifecycle management for Iceberg tables, to ensure your lakehouse stays secure and compliant. It directly integrates with your existing catalog, storage and query engines, allowing for a very simple deployment.

Sail

Sail is an open-source multimodal distributed compute framework, built in Rust, unifying batch, streaming, and AI workloads. For seamless adoption, Sail offers a drop-in replacement for the Spark SQL and DataFrame APIs in both single-host and distributed settings. Learn more about using Sail with Iceberg in the Sail Iceberg guide.

SingleStore

SingleStore is a high‑performance, scalable, distributed SQL platform that makes real‑time analytics and transactional processing available at scale. Its native Apache Iceberg integration removes costly ETL steps and powers intelligent, millisecond‑response applications.

By directly reading and managing data from Iceberg tables, SingleStore unlocks enterprises' dormant data, boosts generative AI development, and ensures seamless schema evolution with low‑latency queries. Available self-managed or in the cloud, it bridges the gap between traditional data lakes and real‑time analytics.

Snowflake

Snowflake is a single, cross-cloud platform that enables every organization to mobilize their data with Snowflake’s Data Cloud. Snowflake Horizon Catalog is the universal catalog built into every Snowflake account, providing governance, metadata, and interoperability for Apache Iceberg across engines and clouds. Snowflake supports Snowflake-managed Iceberg Tables with full DDL and DML support and an Iceberg REST API endpoint that lets external query engines such as Apache Spark, Trino, and Apache Flink read and write those tables directly. For federation, catalog-linked databases provide automatic table discovery and synchronization with remote Iceberg REST catalogs, and read and write access through catalog integrations for externally managed Iceberg Tables.

Stackable

Stackable is the provider of the Stackable Data Platform - a modular, open source data platform for innovative data applications.

True to the philosophy of 'your data, your platform', Stackable enables the creation of flexible, scalable data architectures for data meshes, data lakehouses, event streaming, and machine learning, with seamless integration of Apache Iceberg via Trino, Apache NiFi, and Apache Spark.

The Stackable Data Platform is completely open source, providing maximum portability without vendor lock-in. It also enables true data sovereignty - whether in the private or public cloud. With 24/7 support and strict SLAs, Stackable guarantees stability and efficiency - modern, flexible and secure.

Starburst

Starburst is a commercial offering for the Trino query engine. Trino is a distributed MPP SQL query engine that can query data in Iceberg at interactive speeds. Trino also enables you to join Iceberg tables with an array of other systems. Starburst offers both an enterprise deployment and a fully managed service to make managing and scaling Trino a flawless experience. Starburst also provides customer support and houses many of the original contributors to the open-source project who know Trino best. Learn more about the Starburst Iceberg connector.

StarTree

StarTree is a real-time analytics platform that is able to deliver consistently fast, highly concurrent queries on data stored in Apache Iceberg. Built on the indexing capabilities of Apache Pinot, StarTree can precisely fetch page-level data from Parquet files, reducing unnecessary scanning and data transfer.

This makes it practical and cost-effective to support SLA-driven analytics on the lakehouse. StarTree can power observability, customer-facing analytics, anomaly detection, and interactive business intelligence workloads without requiring data to be duplicated, pre-aggregated, or materialized into a separate serving system. StarTree is available as a managed cloud service or can be deployed within an enterprise cloud environment. Learn more in the StarTree Docs.

StreamNative

StreamNative provides a data streaming platform powered by Ursa, a Kafka‑compatible, leaderless, lakehouse‑native streaming engine. Ursa writes directly to Apache Iceberg tables on cloud object storage—removing the need for bespoke connectors—and automatically compacts and commits data, so it’s immediately queryable by engines such as Spark, Trino, and Flink. Learn more in the Ursa VLDB paper.

Tinybird

Tinybird is a real-time data platform that lets developers and data teams build fast APIs on top of analytical data using SQL. It now offers native support for Apache Iceberg through ClickHouse’s iceberg() table function, allowing seamless querying of Iceberg tables stored in S3.

This integration enables low-latency, high-concurrency access to Iceberg data, with Tinybird handling ingestion, transformation, and API publishing. Developers can now leverage Iceberg for open storage and governance, while using Tinybird for blazing-fast query performance and real-time delivery.

Learn more in the Tinybird documentation.

Upsolver

Upsolver is a streaming data ingestion and table management solution for Apache Iceberg. With Upsolver, users can easily ingest batch and streaming data from files, streams and databases (CDC) into Iceberg tables. In addition, Upsolver connects to your existing REST and Hive catalogs, and analyzes the health of your tables. Use Upsolver to continuously optimize tables by compacting small files, sorting and compressing, repartitioning, and cleaning up dangling files and expired manifests. Upsolver is available from the Upsolver Cloud or can be deployed in your AWS VPC.

VeloDB

VeloDB is a commercial data warehouse powered by Apache Doris, an open-source, real-time data warehouse. It also provides powerful query acceleration for Iceberg tables and efficient data writeback. VeloDB offers an enterprise version and a cloud service, both of which are fully compatible with open-source Apache Doris. Get started quickly with Apache Doris and Apache Iceberg here.

OLake

OLake is an open-source ELT tool to facilitate the replication of databases into Apache Iceberg™ data lakehouses. It offers native integration with PostgreSQL, MySQL, MongoDB, Oracle, and Kafka, enabling real-time data ingestion without the need for intermediary layers like Debezium, Kafka or Spark. The platform's modular architecture supports full-load operations, continuous Change Data Capture (CDC), and incremental synchronization with bookmark/cursor column support, with resumable syncs and schema evolution handling. By employing a parallelized chunking strategy, OLake accelerates initial syncs, while CDC cursor preservation ensures that incremental updates capture all events.

Learn more in the OLake documentation and explore the GitHub repository.