Apache Iceberg

Overview

Apache Iceberg is an open table format for large analytic datasets. Rill supports reading Iceberg tables directly from object storage through compatible query engine integrations. Today, this is powered by DuckDB's native Iceberg extension.

Direct file access only

Rill reads Iceberg tables by scanning the table's metadata and data files directly from object storage. Catalog-based access (e.g., through a Hive Metastore, AWS Glue, or REST catalog) is not currently supported.

Storage Backends

Iceberg tables can be read from any of the following storage backends:

| Backend | URI format | Authentication |
| --- | --- | --- |
| Amazon S3 | s3://bucket/path/to/table | Requires an S3 connector |
| Google Cloud Storage | gs://bucket/path/to/table | Requires a GCS connector with HMAC keys |
| Azure Blob Storage | azure://container/path/to/table | Requires an Azure connector |
| Local filesystem | /path/to/table | No authentication needed |

For cloud storage backends, you must first configure the corresponding storage connector with valid credentials. Rill uses these credentials to authenticate when reading the Iceberg table files.
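As a rough sketch of what such a connector might look like for S3, the file below assumes a connector defined at connectors/s3.yaml, so that its name (s3) matches the create_secrets_from_connectors value used in the models on this page. The property names aws_access_key_id and aws_secret_access_key and the .env templating are assumptions that this page does not state; check the S3 connector reference for the exact keys your Rill version expects.

```yaml
# connectors/s3.yaml (assumed path; the file name determines the connector name "s3")
type: connector
driver: s3

# Assumed property names and templating; credentials are read from the
# project's environment variables rather than hard-coded in the file.
aws_access_key_id: "{{ .env.connector.s3.aws_access_key_id }}"
aws_secret_access_key: "{{ .env.connector.s3.aws_secret_access_key }}"
```

Keeping the secrets in environment variables rather than in the YAML file means the connector definition can be committed to version control without exposing credentials.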

Using the UI

  1. Click Add Data in your Rill project
  2. Select Apache Iceberg as the data source type
  3. Choose your storage backend (S3, GCS, Azure, or Local)
  4. Enter the path to your Iceberg table directory
  5. Optionally configure advanced parameters (allow moved paths, snapshot version)
  6. Enter a model name and click Create

For cloud storage backends, the UI will prompt you to set up the corresponding storage connector if one doesn't already exist.

Manual Configuration

Create a model that uses DuckDB's iceberg_scan() function to read the table.

Reading from S3

Create models/iceberg_data.yaml:

type: model
connector: duckdb
create_secrets_from_connectors: s3
materialize: true

sql: |
  SELECT *
  FROM iceberg_scan('s3://my-bucket/path/to/iceberg_table')

Reading from GCS

HMAC keys required

DuckDB's iceberg_scan() authenticates to GCS using HMAC keys, not JSON service account credentials. When configuring your GCS connector, use the key_id and secret (HMAC) properties instead of google_application_credentials.
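A GCS connector using HMAC keys might look like the following sketch. The key_id and secret properties are the ones named above; the file path connectors/gcs.yaml, the driver value, and the environment-variable templating are assumptions for illustration.

```yaml
# connectors/gcs.yaml (assumed path; the file name determines the connector name "gcs")
type: connector
driver: gcs

# HMAC credentials, used instead of google_application_credentials.
# The .env variable names are hypothetical placeholders.
key_id: "{{ .env.connector.gcs.key_id }}"
secret: "{{ .env.connector.gcs.secret }}"
```

The model below then picks up these credentials through create_secrets_from_connectors: gcs.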

type: model
connector: duckdb
create_secrets_from_connectors: gcs
materialize: true

sql: |
  SELECT *
  FROM iceberg_scan('gs://my-bucket/path/to/iceberg_table')

Reading from Azure

type: model
connector: duckdb
create_secrets_from_connectors: azure
materialize: true

sql: |
  SELECT *
  FROM iceberg_scan('azure://my-container/path/to/iceberg_table')

Reading from local filesystem

type: model
connector: duckdb
materialize: true

sql: |
  SELECT *
  FROM iceberg_scan('/path/to/iceberg_table')

Optional Parameters

The iceberg_scan() function accepts additional parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| allow_moved_paths | boolean | Allow reading tables where data files have been moved from their original location. Defaults to true in the UI. |
| version | string | Read a specific Iceberg snapshot version instead of the latest. |

Example with optional parameters:

SELECT *
FROM iceberg_scan('s3://my-bucket/path/to/iceberg_table',
    allow_moved_paths = true,
    version = '2')
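To show where these parameters sit in a complete model, here is a minimal sketch that combines the S3 model from earlier on this page with the query above. The file name models/iceberg_snapshot.yaml is hypothetical; the bucket path and version value are the same placeholders used in the example above.

```yaml
# models/iceberg_snapshot.yaml (hypothetical file name)
type: model
connector: duckdb
create_secrets_from_connectors: s3
materialize: true

# The SQL body must be indented under the block scalar (|).
sql: |
  SELECT *
  FROM iceberg_scan(
    's3://my-bucket/path/to/iceberg_table',
    allow_moved_paths = true,  -- tolerate data files moved from their original location
    version = '2'              -- read this snapshot version instead of the latest
  )
```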

Deploy to Rill Cloud

Because Iceberg tables are read through DuckDB using your existing storage connector credentials, deploying to Rill Cloud follows the same process as deploying the underlying storage connector (S3, GCS, or Azure).

Ensure your storage connector credentials are configured in your Rill Cloud project before deploying.

Limitations

  • Direct file access only: Rill reads Iceberg metadata and data files directly from storage. Catalog integrations (Hive Metastore, AWS Glue, REST catalog) are not supported.
  • DuckDB engine: Iceberg support is currently provided through DuckDB's Iceberg extension. Additional engine support (e.g., ClickHouse) is planned.
  • GCS requires HMAC keys: DuckDB's iceberg_scan() only supports HMAC authentication for GCS, not JSON service account credentials.
  • Read-only: Rill reads from Iceberg tables but does not write to them.