Models YAML
Model YAML files are used to define models in YAML. For more information on SQL models, see the SQL models documentation.
Both regular models and source models can use the Model YAML specification described on this page. While SQL models are well suited to simple transformations, Model YAML files provide advanced capabilities for complex data processing scenarios.
When to use Model YAML:
- Partitions - Optimize performance with data partitioning strategies
- Incremental models - Process only new or changed data efficiently
- Pre/post execution hooks - Run custom logic before or after model execution
- Staging - Create intermediate tables for complex transformations
- Output configuration - Define specific output formats and destinations
Model YAML files give you fine-grained control over how your data is processed and transformed, making them ideal for production workloads and complex analytics pipelines.
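For reference, a minimal model YAML file might look like the following sketch (the file and table names are placeholders):

```yaml
# models/my_model.yaml -- hypothetical file name
type: model
materialize: true
# my_source_table is a placeholder for an upstream source or model
sql: SELECT * FROM my_source_table
```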
Properties
type
[string] - Refers to the resource type and must be `model` (required)
refresh
[object] - Specifies the refresh schedule that Rill should follow to re-ingest and update the underlying model data
- `cron` - [string] - A cron expression that defines the execution schedule
- `time_zone` - [string] - Time zone to interpret the schedule in (e.g., 'UTC', 'America/Los_Angeles')
- `disable` - [boolean] - If true, disables the resource without deleting it
- `ref_update` - [boolean] - If true, allows the resource to run when a dependency updates
- `run_in_dev` - [boolean] - If true, allows the schedule to run in development mode
```yaml
refresh:
  cron: "* * * * *"
```
connector
[string] - Specifies the connector to use and is needed if setting an explicit OLAP engine, e.g. `clickhouse`
sql
[string] - Raw SQL query to run against source (required)
pre_exec
[string] - Refers to SQL queries to run before the main query, available for DuckDB-based models (optional). Ensure `pre_exec` queries are idempotent. Use `IF NOT EXISTS` statements when applicable.
```yaml
pre_exec: ATTACH IF NOT EXISTS 'dbname=postgres host=localhost port=5432 user=postgres password=postgres' AS postgres_db (TYPE POSTGRES)
```
post_exec
[string] - Refers to a SQL query that is run after the main query, available for DuckDB-based models (optional). Ensure `post_exec` queries are idempotent. Use `IF EXISTS` statements when applicable.
```yaml
post_exec: DETACH DATABASE IF EXISTS postgres_db
```
retry
[object] - Refers to the retry configuration for the model (optional)
- `attempts` - [integer] - The number of attempts to retry the model
- `delay` - [string] - The delay between attempts
- `exponential_backoff` - [boolean] - Whether to use exponential backoff
- `if_error_matches` - [array of string] - The error messages to match
```yaml
retry:
  attempts: 5
  delay: 10s
  exponential_backoff: true
  if_error_matches:
    - ".*OvercommitTracker.*"
    - ".*Timeout.*"
    - ".*Bad Gateway.*"
```
timeout
[string] - The maximum time to wait for model ingestion
incremental
[boolean] - Whether incremental modeling is required (optional)
change_mode
[string] - Configure how changes to the model specifications are applied (optional). 'reset' will drop and recreate the model automatically, 'manual' will require a manual full or incremental refresh to apply changes, and 'patch' will switch to the new logic without re-processing historical data (only applies for incremental models).
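As an illustrative sketch, an incremental model that should adopt updated SQL logic without reprocessing history could combine `incremental` and `change_mode` as follows (the table name is a placeholder):

```yaml
type: model
incremental: true
change_mode: patch  # apply the new logic without re-processing historical data
sql: SELECT * FROM events
```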
state
[oneOf] - Refers to the explicitly defined state of your model, cannot be used with partitions (optional)
- option 1 - [object] - Executes a raw SQL query against the project's data models.
  - `sql` - [string] - Raw SQL query to run against existing models in the project (required)
  - `connector` - [string] - Specifies the connector to use when running SQL or glob queries
- option 2 - [object] - Executes a SQL query that targets a defined metrics view.
  - `metrics_sql` - [string] - SQL query that targets a metrics view in the project (required)
- option 3 - [object] - Calls a custom API defined in the project to compute data.
  - `api` - [string] - Name of a custom API defined in the project (required)
  - `args` - [object] - Arguments to pass to the custom API
- option 4 - [object] - Uses a file-matching pattern (glob) to query data from a connector.
  - `glob` - [anyOf] - Defines the file path or pattern to query from the specified connector (required)
    - option 1 - [string] - A simple file path/glob pattern as a string
    - option 2 - [object] - An object-based configuration for specifying a file path/glob pattern with advanced options
  - `connector` - [string] - Specifies the connector to use with the glob input
- option 5 - [object] - Uses the status of a resource as data.
  - `resource_status` - [object] - Based on resource status (required)
    - `where_error` - [boolean] - Indicates whether the condition should trigger when the resource is in an error state
```yaml
state:
  sql: SELECT MAX(date) as max_date
```
partitions
[oneOf] - Refers to how your data is partitioned, cannot be used with state (optional)
- option 1 - [object] - Executes a raw SQL query against the project's data models.
  - `sql` - [string] - Raw SQL query to run against existing models in the project (required)
  - `connector` - [string] - Specifies the connector to use when running SQL or glob queries
- option 2 - [object] - Executes a SQL query that targets a defined metrics view.
  - `metrics_sql` - [string] - SQL query that targets a metrics view in the project (required)
- option 3 - [object] - Calls a custom API defined in the project to compute data.
  - `api` - [string] - Name of a custom API defined in the project (required)
  - `args` - [object] - Arguments to pass to the custom API
- option 4 - [object] - Uses a file-matching pattern (glob) to query data from a connector.
  - `glob` - [anyOf] - Defines the file path or pattern to query from the specified connector (required)
    - option 1 - [string] - A simple file path/glob pattern as a string
    - option 2 - [object] - An object-based configuration for specifying a file path/glob pattern with advanced options
  - `connector` - [string] - Specifies the connector to use with the glob input
- option 5 - [object] - Uses the status of a resource as data.
  - `resource_status` - [object] - Based on resource status (required)
    - `where_error` - [boolean] - Indicates whether the condition should trigger when the resource is in an error state
```yaml
partitions:
  glob: gcs://my_bucket/y=*/m=*/d=*/*.parquet
```
```yaml
partitions:
  connector: duckdb
  sql: SELECT range AS num FROM range(0,10)
```
materialize
[boolean] - Whether the model will be materialized in the OLAP engine
partitions_watermark
[string] - Refers to a customizable timestamp that can be set to check if an object has been updated (optional).
partitions_concurrency
[integer] - Refers to the number of concurrent partitions that can be read at the same time (optional).
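For example, a partitioned model could cap how many partitions are read in parallel. This is a sketch with a hypothetical bucket path:

```yaml
partitions:
  glob: s3://my-bucket/data/*.parquet  # hypothetical bucket
partitions_concurrency: 4  # read up to 4 partitions at the same time
```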
stage
[object] - Used for staging models, where an input source does not support direct writes to the output and a staging table is required
- `connector` - [string] - Refers to the connector type for the staging table (required)
- `path` - [string] - Refers to the path to the staging table
```yaml
stage:
  connector: s3
  path: s3://my_bucket/my_staging_table
```
output
[object] - Defines the properties of the model output
- `table` - [string] - Name of the output table. If not specified, the model name is used
- `materialize` - [boolean] - Whether to materialize the model as a table or view
- `connector` - [string] - Refers to the connector type for the output table. Can be `clickhouse` or `duckdb`, or their named connectors
- `incremental_strategy` - [string] - Strategy to use for incremental updates. Can be 'append', 'merge' or 'partition_overwrite'
- `unique_key` - [array of string] - List of columns that uniquely identify a row for the merge strategy
- `partition_by` - [string] - Column or expression to partition the table by
Additional properties for `output` when `connector` is `clickhouse`
type
[string] - Type to materialize the model into. Can be 'TABLE', 'VIEW' or 'DICTIONARY'
columns
[string] - Column names and types. Can also include indexes. If unspecified, detected from the query.
engine_full
[string] - Full engine definition in SQL format. Can include partition keys, order, TTL, etc.
engine
[string] - Table engine to use. Default is MergeTree
order_by
[string] - ORDER BY clause.
partition_by
[string] - PARTITION BY clause.
primary_key
[string] - PRIMARY KEY clause.
sample_by
[string] - SAMPLE BY clause.
ttl
[string] - TTL settings for the table or columns.
table_settings
[string] - Table-specific settings.
query_settings
[string] - Settings used in insert/create table as select queries.
distributed_settings
[string] - Settings for distributed table.
distributed_sharding_key
[string] - Sharding key for distributed table.
dictionary_source_user
[string] - User for accessing the source dictionary table (used if type is DICTIONARY).
dictionary_source_password
[string] - Password for the dictionary source user.
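As a sketch of how these ClickHouse-specific output properties fit together (the column names and TTL value are hypothetical):

```yaml
output:
  connector: clickhouse
  type: TABLE
  engine: MergeTree
  order_by: event_time
  partition_by: toYYYYMM(event_time)
  ttl: event_time + INTERVAL 90 DAY
```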
Common Properties
name
[string] - Name is usually inferred from the filename, but can be specified manually.
refs
[array of string] - List of resource references
dev
[object] - Overrides any properties in the development environment.
prod
[object] - Overrides any properties in the production environment.
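For example, the `dev` override can point the model at a smaller sample during development (the table name and row limit are illustrative):

```yaml
type: model
sql: SELECT * FROM events
dev:
  sql: SELECT * FROM events LIMIT 10000  # smaller result set in development
```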
Additional properties when `connector` is `athena` or a named connector for Athena
output_location
[string] - Output location for query results in S3.
workgroup
[string] - AWS Athena workgroup to use for queries.
region
[string] - AWS region to connect to Athena and the output location.
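A sketch of an Athena-backed model using these properties (the table, bucket, and workgroup values are hypothetical):

```yaml
type: model
connector: athena
sql: SELECT * FROM my_athena_table
output_location: s3://my-bucket/athena-results/
workgroup: primary
region: us-east-1
```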
Additional properties when `connector` is `azure` or a named connector for Azure
path
[string] - Path to the source
account
[string] - Account identifier
uri
[string] - Source URI
extract
[object] - Arbitrary key-value pairs for extraction settings
glob
[object] - Settings related to glob file matching.
- `max_total_size` - [integer] - Maximum total size (in bytes) matched by glob
- `max_objects_matched` - [integer] - Maximum number of objects matched by glob
- `max_objects_listed` - [integer] - Maximum number of objects listed in glob
- `page_size` - [integer] - Page size for glob listing
batch_size
[string] - Size of a batch (e.g., '100MB')
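A sketch of an Azure-backed model using these properties (the container and path are hypothetical):

```yaml
type: model
connector: azure
path: azure://my-container/events/*.parquet
glob:
  max_objects_matched: 1000  # stop after matching 1000 objects
```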
Additional properties when `connector` is `bigquery` or a named connector for BigQuery
project_id
[string] - ID of the BigQuery project.
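A sketch of a BigQuery-backed model (the project and dataset names are hypothetical):

```yaml
type: model
connector: bigquery
project_id: my-gcp-project
sql: SELECT * FROM my_dataset.events
```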
Additional properties when `connector` is `duckdb` or a named connector for DuckDB
path
[string] - Path to the data source.
format
[string] - Format of the data source (e.g., csv, json, parquet).
pre_exec
[string] - Refers to SQL queries to run before the main query, available for DuckDB-based models (optional). Ensure `pre_exec` queries are idempotent. Use `IF NOT EXISTS` statements when applicable.
post_exec
[string] - Refers to a SQL query that is run after the main query, available for DuckDB-based models (optional). Ensure `post_exec` queries are idempotent. Use `IF EXISTS` statements when applicable.
```yaml
pre_exec: ATTACH IF NOT EXISTS 'dbname=postgres host=localhost port=5432 user=postgres password=postgres' AS postgres_db (TYPE POSTGRES);
sql: SELECT * FROM postgres_query('postgres_db', 'SELECT * FROM USERS')
post_exec: DETACH DATABASE IF EXISTS postgres_db
```
Additional properties when `connector` is `gcs` or a named connector for GCS
path
[string] - Path to the source
uri
[string] - Source URI
extract
[object] - key-value pairs for extraction settings
glob
[object] - Settings related to glob file matching.
- `max_total_size` - [integer] - Maximum total size (in bytes) matched by glob
- `max_objects_matched` - [integer] - Maximum number of objects matched by glob
- `max_objects_listed` - [integer] - Maximum number of objects listed in glob
- `page_size` - [integer] - Page size for glob listing
batch_size
[string] - Size of a batch (e.g., '100MB')
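A sketch of a GCS-backed model using these properties (the bucket and path are hypothetical):

```yaml
type: model
connector: gcs
path: gs://my-bucket/events/*.parquet
glob:
  max_total_size: 10737418240  # limit matched data to ~10 GB
```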
Additional properties when `connector` is `local_file` or a named connector for `local_file`
path
[string] - Path to the data source.
format
[string] - Format of the data source (e.g., csv, json, parquet).
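A sketch of a local-file model (the file path is hypothetical):

```yaml
type: model
connector: local_file
path: data/events.csv
format: csv
```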
Additional properties when `connector` is `redshift` or a named connector for Redshift
output_location
[string] - S3 location where query results are stored.
workgroup
[string] - Redshift Serverless workgroup to use.
database
[string] - Name of the Redshift database.
cluster_identifier
[string] - Identifier of the Redshift cluster.
role_arn
[string] - ARN of the IAM role to assume for Redshift access.
region
[string] - AWS region of the Redshift deployment.
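A sketch of a Redshift Serverless-backed model (the database, bucket, and workgroup values are hypothetical):

```yaml
type: model
connector: redshift
sql: SELECT * FROM public.events
database: dev
workgroup: primary  # use cluster_identifier instead for a provisioned cluster
output_location: s3://my-bucket/redshift-results/
region: us-east-1
```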
Additional properties when `connector` is `s3` or a named connector for S3
region
[string] - AWS region
endpoint
[string] - AWS Endpoint
path
[string] - Path to the source
uri
[string] - Source URI
extract
[object] - key-value pairs for extraction settings
glob
[object] - Settings related to glob file matching.
- `max_total_size` - [integer] - Maximum total size (in bytes) matched by glob
- `max_objects_matched` - [integer] - Maximum number of objects matched by glob
- `max_objects_listed` - [integer] - Maximum number of objects listed in glob
- `page_size` - [integer] - Page size for glob listing
batch_size
[string] - Size of a batch (e.g., '100MB')
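A sketch of an S3-backed model (the bucket and region are hypothetical):

```yaml
type: model
connector: s3
path: s3://my-bucket/y=*/m=*/d=*/*.parquet
region: us-east-1
```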
Additional properties when `connector` is `salesforce` or a named connector for Salesforce
soql
[string] - SOQL query to execute against the Salesforce instance.
sobject
[string] - Salesforce object (e.g., Account, Contact) targeted by the query.
queryAll
[boolean] - Whether to include deleted and archived records in the query (uses queryAll API).
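A sketch of a Salesforce-backed model (the SOQL query is illustrative):

```yaml
type: model
connector: salesforce
sobject: Account
soql: SELECT Id, Name, CreatedDate FROM Account
queryAll: false  # set to true to include deleted and archived records
```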
Examples
### Incremental model
```yaml
type: model
incremental: true
connector: bigquery
state:
  sql: SELECT MAX(date) as max_date
sql: |
  SELECT ... FROM events
  {{ if incremental }}
  WHERE event_time > '{{.state.max_date}}'
  {{end}}
output:
  connector: duckdb
```
### Partitioned model
```yaml
type: model
partitions:
  glob:
    connector: gcs
    path: gs://rilldata-public/github-analytics/Clickhouse/2025/*/commits_*.parquet
sql: SELECT * FROM read_parquet('{{ .partition.uri }}')
output:
  connector: duckdb
  incremental_strategy: append
```
### Partitioned Incremental model
```yaml
type: model
incremental: true
refresh:
  cron: "0 8 * * *"
partitions:
  glob:
    path: gs://rilldata-public/github-analytics/Clickhouse/2025/*/*
    partition: directory
sql: |
  SELECT *
  FROM read_parquet('gs://rilldata-public/{{ .partition.path }}/commits_*.parquet')
  WHERE '{{ .partition.path }}' IS NOT NULL
output:
  connector: duckdb
  incremental_strategy: append
```
### Staging model
```yaml
type: model
connector: snowflake

# Use DuckDB to generate a range of days from 1st Jan to today
partitions:
  connector: duckdb
  sql: SELECT range as day FROM range(TIMESTAMPTZ '2024-01-01', now(), INTERVAL 1 DAY)

# Don't reload previously ingested partitions on every refresh
incremental: true

# Query Snowflake for all events belonging to the current partition
sql: SELECT * FROM events WHERE date_trunc('day', event_time) = '{{ .partition.day }}'

# Since ClickHouse can't ingest from Snowflake or vice versa, we use S3 as a temporary staging connector
stage:
  connector: s3
  path: s3://bucket/temp-data

# Produce the final output into ClickHouse; requires a clickhouse.yaml connector to be defined
output:
  connector: clickhouse
```