Advanced Model YAML

In some cases, advanced models will be required when implementing advanced features such as incremental partitioned models or staging models.

Properties

type - refers to the resource type and must be 'model'(required)

refresh - Specifies the refresh schedule that Rill should follow to re-ingest and update the underlying source data (optional).

cron - a cron schedule expression, which should be encapsulated in single quotes, e.g. '* * * * *' (optional)
every - a Go duration string, such as 24h (docs) (optional)

refresh:
    cron: "0 8 * * *"

timeout — The maximum time to wait for model ingestion (optional).

incremental - set to true or false whether incremental modeling is required (optional)

state - refers to the explicitly defined state of your model, cannot be used with partitions (optional).

sql/glob - refers to the location of the data depending if the data is cloud storage or a data warehouse.

partitions - refers to the how your data is partitioned, cannot be used with state. (optional).

connector - refers to the connector that the partitions is using (optional).
sql - refers to the SQL query used to access the data in your data warehouse, use sql or glob (optional).
glob - refers to the location of the data in your cloud warehouse, use sql or glob (optional).
- path - in the case glob is selected, you will need to set the path of your source (optional).
- partition - in the case glob is selected, you can defined how to partition the table. directory or hive (optional).

partitions:
  connector: duckdb
  sql: SELECT range AS num FROM range(0,10)

partitions:
  glob:
    connector: [s3/gcs]
    path: [s3/gs]://path/to/file/**/*.parquet[.csv]

pre_exec – refers to SQL queries to run before the main query, available for DuckDB-based models. (optional). Ensure pre_exec queries are idempotent. Use IF NOT EXISTS statements when applicable.

sql - refers to the SQL query for your model. (required).

post_exec – refers to a SQL query that is run after the main query, available for DuckDB-based models. (optional). Ensure post_exec queries are dempotent. Use IF EXISTS statements when applicable.

pre_exec: ATTACH IF NOT EXISTS 'dbname=postgres host=localhost port=5432 user=postgres password=postgres' AS postgres_db (TYPE POSTGRES)

sql: SELECT * FROM postgres_query('postgres_db', 'SELECT * FROM USERS')

post_exec: DETACH DATABASE IF EXISTS postgres_db

partitions_watermark - refers to a customizable timestamp that can be set to check if an object has been updated (optional).

partitions_concurrency - refers to the number of concurrent partitions that can be read at the same time (optional).

stage - in the case of staging models, where an input source does not support direct write to the output and a staging table is required (optional).

connector - refers to the connector type for the staging table
path - path of the temporary staging table

output - in the case of staging models, where the output needs to be defined where the staging table will write the temporary data (optional).

connector - refers to the connector type for the staging table (optional).
incremental_strategy - refers to how the incremental refresh will behave, (merge or append) (optional).
unique_key - required if incremental_strategy is defined, refers to the unique column to use to merge (optional).
materialize - refers to the output table being materialized (optional).
columns - refers to a list of columns if you required to manually define column name and types (optional).
engine_full - refers to the ClickHouse engine specifications, (ENGINE = ... PARTITION BY ... ORDER BY ... SETTINGS ...) (optional).

materialize - refers to the model being materialized as a table or not (optional).

Properties​

Properties