Advanced Model YAML

Advanced features, such as incremental partitioned models or staging models, require a model defined in YAML.

Properties

type - refers to the resource type and must be 'model' (required).

refresh - Specifies the refresh schedule that Rill should follow to re-ingest and update the underlying source data (optional).

  • cron - a cron schedule expression, which should be enclosed in quotes, e.g. '* * * * *' (optional)
  • every - a Go duration string, such as 24h (optional)
refresh:
  cron: "0 8 * * *"
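
As an alternative to a cron expression, the every property described above takes a Go duration string. A minimal sketch, assuming a daily refresh is wanted:

```yaml
# Re-ingest the model once every 24 hours (Go duration string).
refresh:
  every: 24h
```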

timeout - the maximum time to wait for model ingestion (optional).

incremental - set to true to enable incremental modeling, or false to disable it (optional).

state - refers to the explicitly defined state of your model; cannot be used with partitions (optional).

  • sql/glob - refers to the location of the data, depending on whether the data is in a data warehouse (sql) or cloud storage (glob).
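
To illustrate how state might drive an incremental load, here is a hedged sketch. The table events and the column updated_at are hypothetical, and the template expressions ({{ if incremental }} and {{ .state.max_updated_at }}) are assumptions about Rill's templating that are not described on this page, so check them against the incremental model documentation before relying on them.

```yaml
type: model
incremental: true

# Hypothetical state query: records the latest timestamp loaded so far.
state:
  sql: SELECT MAX(updated_at) AS max_updated_at FROM events

# Assumption: the state result is exposed to the model query as
# .state.<column>, and `incremental` is true on incremental runs.
sql: >
  SELECT * FROM events
  {{ if incremental }} WHERE updated_at > '{{ .state.max_updated_at }}' {{ end }}
```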

partitions - refers to how your data is partitioned; cannot be used with state (optional).

  • connector - refers to the connector that the partitions use (optional).
  • sql - refers to the SQL query used to access the data in your data warehouse; use either sql or glob (optional).
  • glob - refers to the location of the data in your cloud storage; use either sql or glob (optional).
    • path - if glob is selected, you need to set the path of your source (optional).
    • partition - if glob is selected, you can define how to partition the table: directory or hive (optional).
partitions:
  connector: duckdb
  sql: SELECT range AS num FROM range(0,10)

partitions:
  glob:
    connector: [s3/gcs]
    path: [s3/gs]://path/to/file/**/*.parquet[.csv]
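
The glob form can also carry the partition setting described above. The following is a sketch only; the bucket name and folder layout are hypothetical.

```yaml
partitions:
  glob:
    connector: gcs
    # Hypothetical bucket and layout; replace with your own path.
    path: gs://my-bucket/events/**/*.parquet
    # One partition per directory; hive is the other documented option.
    partition: directory
```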

sql - refers to the SQL query for your model. (required).

partitions_watermark - refers to a customizable timestamp that can be set to check if an object has been updated (optional).

partitions_concurrency - refers to the number of concurrent partitions that can be read at the same time (optional).
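
A partitioned model that limits parallel reads could be sketched as follows. It reuses the range query from the partitions example; the {{ .partition.num }} template reference is an assumption about how Rill exposes partition columns to the model query, and the concurrency value is arbitrary.

```yaml
type: model
incremental: true

partitions:
  sql: SELECT range AS num FROM range(0,10)

# Process at most two partitions at the same time (illustrative value).
partitions_concurrency: 2

# Assumption: each partition row is available as .partition.<column>.
sql: SELECT {{ .partition.num }} AS num, now() AS inserted_on
```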

stage - for staging models, used when an input source does not support writing directly to the output and a staging table is required (optional).

  • connector - refers to the connector type for the staging table
  • path - path of the temporary staging table

output - for staging models, defines the output to which the staging table writes the temporary data (optional).

  • connector - refers to the connector type for the staging table (optional).
  • incremental_strategy - refers to how the incremental refresh will behave: merge or append (optional).
  • unique_key - required if incremental_strategy is defined; refers to the unique column to use to merge (optional).
  • materialize - refers to the output table being materialized (optional).
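
Put together, stage and output might be used as in the sketch below. The connector names, query, and bucket path are hypothetical, and the top-level connector key naming the input source is an assumption that is not listed among the properties on this page.

```yaml
type: model

# Assumption: a top-level connector key selects the input source.
connector: snowflake
sql: SELECT * FROM events   # hypothetical source table

# Temporary location used because the source cannot write to the output directly.
stage:
  connector: s3
  path: s3://my-bucket/rill-staging/   # hypothetical path

# Final destination of the model data.
output:
  connector: clickhouse
```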

materialize - refers to the model being materialized as a table or not (optional).
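
As a closing illustration, several of the properties above can be combined into one incremental model that merges new rows on a key. This is a sketch under stated assumptions: the table events and column event_id are hypothetical, and writing unique_key as a list is an assumption about the accepted format.

```yaml
type: model
incremental: true

# Refresh every day at 08:00.
refresh:
  cron: '0 8 * * *'

sql: SELECT * FROM events   # hypothetical source table

output:
  connector: duckdb
  # Update existing rows that match unique_key instead of appending duplicates.
  incremental_strategy: merge
  unique_key: [event_id]    # assumption: list form; hypothetical column
  materialize: true
```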