Advanced Model YAML
Advanced models are required when implementing features such as incremental partitioned models or staging models.
Properties
type
- refers to the resource type and must be 'model' (required)
refresh
- Specifies the refresh schedule that Rill should follow to re-ingest and update the underlying source data (optional).
cron
- a cron schedule expression, which should be enclosed in single quotes, e.g. '* * * * *' (optional)
every
- a Go duration string, such as 24h (optional)
refresh:
  cron: "0 8 * * *"
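As an alternative to a cron expression, the every property takes a Go duration string. A minimal sketch, assuming a once-a-day refresh is wanted (the 24h value is only illustrative):

```yaml
# Re-ingest the model on a fixed interval instead of a cron schedule.
refresh:
  every: 24h
```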
timeout
- the maximum time to wait for model ingestion (optional).
incremental
- set to true or false to indicate whether incremental modeling is required (optional)
state
- refers to the explicitly defined state of your model; cannot be used with partitions (optional).
sql/glob
- refers to the location of the data, depending on whether the data is in cloud storage or a data warehouse.
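The state property can be combined with incremental so that each run only picks up new rows. The following is a rough sketch, not a definitive configuration: the table names (events, events_model) and the column updated_on are hypothetical, and the template variables used in the model SQL ({{ if incremental }} and {{ .state.<column> }}) are an assumption about how Rill exposes the stored state to the query.

```yaml
# Hypothetical incremental model; table and column names are placeholders.
type: model
incremental: true

# Query whose result is kept as the model's state between runs.
state:
  sql: SELECT MAX(updated_on) AS max_updated_on FROM events_model

# Assumption: the stored state is available to the model SQL through templating,
# so only rows newer than the last recorded value are read on incremental runs.
sql: >
  SELECT * FROM events
  {{ if incremental }} WHERE updated_on > '{{ .state.max_updated_on }}' {{ end }}
```

Because state cannot be combined with partitions, a model like this tracks progress through a single stored value rather than per-partition bookkeeping.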
partitions
- refers to how your data is partitioned; cannot be used with state (optional).
connector
- refers to the connector that the partitions use (optional).
sql
- refers to the SQL query used to access the data in your data warehouse; use either sql or glob (optional).
glob
- refers to the location of the data in your cloud storage; use either sql or glob (optional).
path
- if glob is selected, you will need to set the path of your source (optional).
partition
- if glob is selected, you can define how to partition the table: directory or hive (optional).
partitions:
  connector: duckdb
  sql: SELECT range AS num FROM range(0,10)

partitions:
  glob:
    connector: [s3/gcs]
    path: [s3/gs]://path/to/file/**/*.parquet[.csv]
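To show how a partition definition feeds the model query, here is a minimal sketch built on the DuckDB range example above. It assumes that each row returned by the partitions query is exposed to the model SQL as {{ .partition.<column> }}; treat that template name as an assumption to verify against the Rill documentation.

```yaml
# Sketch of a partitioned model: one partition per row of the partitions query.
type: model

partitions:
  connector: duckdb
  sql: SELECT range AS num FROM range(0,10)

# Assumption: the current partition's columns are available as {{ .partition.<column> }}.
sql: |
  SELECT {{ .partition.num }} AS num, now() AS inserted_on
```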
sql
- refers to the SQL query for your model (required).
partitions_watermark
- refers to a customizable timestamp that can be set to check if an object has been updated (optional).
partitions_concurrency
- refers to the number of concurrent partitions that can be read at the same time (optional).
stage
- used for staging models, where an input source does not support writing directly to the output and a staging table is required (optional).
connector
- refers to the connector type for the staging table
path
- path of the temporary staging table
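A staging model might be laid out roughly as follows. This is an illustrative sketch only: the connector names (snowflake, s3, clickhouse), the bucket path, and the events table are hypothetical placeholders, and the top-level connector key naming the input source is an assumption that is not described in this section.

```yaml
# Hypothetical staging model: data is read from the input connector,
# written temporarily to the stage location, then loaded into the output.
type: model
connector: snowflake          # assumption: the input source connector
sql: SELECT * FROM events     # placeholder query

stage:
  connector: s3
  path: s3://my-bucket/tmp/staging   # placeholder path for the temporary staging data

output:
  connector: clickhouse
```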
output
- used for staging models to define the output to which the staging table writes the temporary data (optional).
connector
- refers to the connector type for the staging table (optional).
incremental_strategy
- refers to how the incremental refresh will behave (merge or append) (optional).
unique_key
- required if incremental_strategy is defined; refers to the unique column to use to merge (optional).
materialize
- refers to the output table being materialized (optional).
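The output properties above could be combined along these lines. This is a sketch under stated assumptions: the clickhouse connector and the event_id column are placeholders, and writing unique_key as a list of column names is an assumption about the accepted value format.

```yaml
# Hypothetical output block for an incremental model that merges on a key.
type: model
incremental: true
sql: SELECT * FROM events       # placeholder query

output:
  connector: clickhouse         # placeholder output connector
  incremental_strategy: merge   # merge or append
  unique_key: [event_id]        # assumption: list form; column name is a placeholder
  materialize: true
```

With append, new rows are added on each incremental run without matching against existing ones, so unique_key only matters for the merge case.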
materialize
- refers to whether the model is materialized as a table (optional).