Apache Iceberg

Overview

Apache Iceberg is an open table format for large analytic datasets. Rill supports reading Iceberg tables directly from object storage through compatible query engine integrations. Today, this is powered by DuckDB's native Iceberg extension.

Direct file access only

Rill reads Iceberg tables by scanning the table's metadata and data files directly from storage (cloud object storage or the local filesystem). Catalog-based access (e.g., through a Hive Metastore, AWS Glue, or a REST catalog) is not currently supported.

Storage Backends

Iceberg tables can be read from any of the following storage backends:

Backend	URI format	Authentication
Amazon S3	`s3://bucket/path/to/table`	Requires an S3 connector
Google Cloud Storage	`gs://bucket/path/to/table`	Requires a GCS connector with HMAC keys
Azure Blob Storage	`azure://container/path/to/table`	Requires an Azure connector
Local filesystem	`/path/to/table`	No authentication needed

For cloud storage backends, you must first configure the corresponding storage connector with valid credentials. Rill uses these credentials to authenticate when reading the Iceberg table files.
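As a rough illustration, a storage connector is typically defined in its own YAML file. The sketch below shows what an S3 connector might look like. The file name, the property names, and the environment variable names are assumptions, not taken from this page, so check the S3 connector guide for the exact fields.

```yaml
# connectors/s3.yaml (hypothetical file; the connector name "s3" is assumed to come from the file name)
type: connector
driver: s3

# Assumed property and variable names. Keep the real keys in your
# environment (for example a .env file), not in this file.
aws_access_key_id: "{{ .env.connector.s3.aws_access_key_id }}"
aws_secret_access_key: "{{ .env.connector.s3.aws_secret_access_key }}"
```

With a connector like this in place, a model can refer to it by name, as the `create_secrets_from_connectors: s3` line does in the Manual Configuration examples.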

Using the UI

1. Click Add Data in your Rill project
2. Select Apache Iceberg as the data source type
3. Choose your storage backend (S3, GCS, Azure, or Local)
4. Enter the path to your Iceberg table directory
5. Optionally configure advanced parameters (allow moved paths, snapshot version)
6. Enter a model name and click Create

For cloud storage backends, the UI prompts you to set up the corresponding storage connector if one doesn't already exist.

Manual Configuration

Create a model that uses DuckDB's iceberg_scan() function to read the table.

Reading from S3

Create models/iceberg_data.yaml:

type: model
connector: duckdb
create_secrets_from_connectors: s3
materialize: true

sql: |
  SELECT *
  FROM iceberg_scan('s3://my-bucket/path/to/iceberg_table')

Reading from GCS

HMAC keys required

DuckDB's iceberg_scan() authenticates to GCS using HMAC keys, not JSON service account credentials. When configuring your GCS connector, use the key_id and secret (HMAC) properties instead of google_application_credentials.

type: model
connector: duckdb
create_secrets_from_connectors: gcs
materialize: true

sql: |
  SELECT *
  FROM iceberg_scan('gs://my-bucket/path/to/iceberg_table')
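The GCS connector that the model above references could be sketched as follows. The `key_id` and `secret` properties are the HMAC properties named in the note above. The file name and the environment variable names are assumptions made for illustration.

```yaml
# connectors/gcs.yaml (hypothetical file name)
type: connector
driver: gcs

# HMAC credentials, not a JSON service account key.
# The .env variable names below are assumed; use whatever names your project defines.
key_id: "{{ .env.connector.gcs.key_id }}"
secret: "{{ .env.connector.gcs.secret }}"
```

Leaving `google_application_credentials` out of this connector keeps the configuration limited to the HMAC credentials that iceberg_scan() can use.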

Reading from Azure

type: model
connector: duckdb
create_secrets_from_connectors: azure
materialize: true

sql: |
  SELECT *
  FROM iceberg_scan('azure://my-container/path/to/iceberg_table')

Reading from local filesystem

type: model
connector: duckdb
materialize: true

sql: |
  SELECT *
  FROM iceberg_scan('/path/to/iceberg_table')

Optional Parameters

The iceberg_scan() function accepts additional parameters:

Parameter	Type	Description
`allow_moved_paths`	boolean	Allow reading tables where data files have been moved from their original location. Defaults to `true` in the UI.
`version`	string	Read a specific Iceberg snapshot version instead of the latest.

Example with optional parameters:

SELECT *
FROM iceberg_scan('s3://my-bucket/path/to/iceberg_table',
  allow_moved_paths = true,
  version = '2')
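In a model file, these parameters go inside the `sql` block. The sketch below combines the S3 model shown earlier with the optional parameters. It reuses the same placeholder bucket path and the same example values, so adjust both to match your table.

```yaml
# models/iceberg_data.yaml (same example file as in "Reading from S3")
type: model
connector: duckdb
create_secrets_from_connectors: s3
materialize: true

sql: |
  SELECT *
  FROM iceberg_scan('s3://my-bucket/path/to/iceberg_table',
    allow_moved_paths = true,  -- tolerate data files moved from their original location
    version = '2')             -- read this version instead of the latest
```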

Deploy to Rill Cloud

Since Iceberg tables are read through DuckDB using your existing storage connector credentials, deploying to Rill Cloud follows the same process as the underlying storage connector:

S3: Follow the S3 deployment guide
GCS: Follow the GCS deployment guide
Azure: Follow the Azure deployment guide

Ensure your storage connector credentials are configured in your Rill Cloud project before deploying.

Limitations

Direct file access only: Rill reads Iceberg metadata and data files directly from storage. Catalog integrations (Hive Metastore, AWS Glue, REST catalog) are not supported.
DuckDB engine: Iceberg support is currently provided through DuckDB's Iceberg extension. Additional engine support (e.g., ClickHouse) is planned.
GCS requires HMAC keys: DuckDB's iceberg_scan() only supports HMAC authentication for GCS, not JSON service account credentials.
Read-only: Rill reads from Iceberg tables but does not write to them.
