Model YAML

Properties

type

[string] - Refers to the resource type and must be model (required)

refresh

[object] - Specifies the refresh schedule that Rill should follow to re-ingest and update the underlying model data

  • cron - [string] - A cron expression that defines the execution schedule

  • time_zone - [string] - Time zone to interpret the schedule in (e.g., 'UTC', 'America/Los_Angeles').

  • disable - [boolean] - If true, disables the resource without deleting it.

  • ref_update - [boolean] - If true, allows the resource to run when a dependency updates.

  • run_in_dev - [boolean] - If true, allows the schedule to run in development mode.
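
As an illustration of the refresh properties above, a minimal sketch of a scheduled model might look like the following. The file name, the table raw_events, and the schedule are hypothetical values chosen for the example.

```yaml
# Hypothetical model file, e.g. models/daily_events.yaml
type: model
sql: SELECT * FROM raw_events  # raw_events is an assumed upstream table

refresh:
  cron: "0 6 * * *"               # every day at 06:00
  time_zone: America/Los_Angeles  # interpret the cron schedule in this zone
  ref_update: true                # also run when a dependency updates
  run_in_dev: false               # do not run the schedule in development
```

Quoting the cron expression keeps YAML from trying to interpret the asterisks.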

connector

[string] - Refers to the connector type or named connector for the source.

sql

[string] - Raw SQL query to run against the source (required)

timeout

[string] - The maximum time to wait for model ingestion

incremental

[boolean] - Whether incremental modeling is enabled (optional)

change_mode

[string] - Configure how changes to the model specifications are applied (optional). 'reset' will drop and recreate the model automatically, 'manual' will require a manual full or incremental refresh to apply changes, and 'patch' will switch to the new logic without re-processing historical data (only applies for incremental models).
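
A short sketch of how incremental and change_mode could be combined, assuming a hypothetical upstream table named raw_events. Here 'patch' is chosen so that edits to the SQL apply to new data only.

```yaml
type: model
sql: SELECT * FROM raw_events  # raw_events is an assumed upstream table

incremental: true   # process new data incrementally
change_mode: patch  # switch to new logic without re-processing history
```

Using 'reset' instead would drop and recreate the model when its specification changes, and 'manual' would wait for a manually triggered refresh.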

state

[oneOf] - Refers to the explicitly defined state of your model. Cannot be used with partitions (optional)

  • option 1 - [object] - Executes a raw SQL query against the project's data models.

    • sql - [string] - Raw SQL query to run against existing models in the project. (required)

    • connector - [string] - Specifies the connector to use when running SQL or glob queries.

  • option 2 - [object] - Executes a SQL query that targets a defined metrics view.

    • metrics_sql - [string] - SQL query that targets a metrics view in the project (required)

  • option 3 - [object] - Calls a custom API defined in the project to compute data.

    • api - [string] - Name of a custom API defined in the project. (required)

    • args - [object] - Arguments to pass to the custom API.

  • option 4 - [object] - Uses a file-matching pattern (glob) to query data from a connector.

    • glob - [anyOf] - Defines the file path or pattern to query from the specified connector. (required)

      • option 1 - [string] - A simple file path/glob pattern as a string.

      • option 2 - [object] - An object-based configuration for specifying a file path/glob pattern with advanced options.

    • connector - [string] - Specifies the connector to use with the glob input.

  • option 5 - [object] - Uses the status of a resource as data.

    • resource_status - [object] - Based on resource status (required)

      • where_error - [boolean] - Indicates whether the condition should trigger when the resource is in an error state.
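
The sql option of state can be sketched as below. The model name events_incremental, the column event_time, and the template expressions inside the main query are assumptions made for illustration; the reference above does not describe how state values are consumed in the query, so check the templating syntax before relying on it.

```yaml
# Hypothetical file: models/events_incremental.yaml
type: model
incremental: true

state:
  # Compute a high-water mark from the data already loaded (assumed names)
  sql: SELECT MAX(event_time) AS max_event_time FROM events_incremental

sql: |
  SELECT * FROM raw_events
  -- Assumed templating: filter to rows newer than the stored state
  {{ if incremental }} WHERE event_time > '{{ .state.max_event_time }}' {{ end }}
```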

partitions

[oneOf] - Refers to how your data is partitioned. Cannot be used with state (optional)

  • option 1 - [object] - Executes a raw SQL query against the project's data models.

    • sql - [string] - Raw SQL query to run against existing models in the project. (required)

    • connector - [string] - Specifies the connector to use when running SQL or glob queries.

  • option 2 - [object] - Executes a SQL query that targets a defined metrics view.

    • metrics_sql - [string] - SQL query that targets a metrics view in the project (required)

  • option 3 - [object] - Calls a custom API defined in the project to compute data.

    • api - [string] - Name of a custom API defined in the project. (required)

    • args - [object] - Arguments to pass to the custom API.

  • option 4 - [object] - Uses a file-matching pattern (glob) to query data from a connector.

    • glob - [anyOf] - Defines the file path or pattern to query from the specified connector. (required)

      • option 1 - [string] - A simple file path/glob pattern as a string.

      • option 2 - [object] - An object-based configuration for specifying a file path/glob pattern with advanced options.

    • connector - [string] - Specifies the connector to use with the glob input.

  • option 5 - [object] - Uses the status of a resource as data.

    • resource_status - [object] - Based on resource status (required)

      • where_error - [boolean] - Indicates whether the condition should trigger when the resource is in an error state.
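
A sketch of the glob option of partitions, using the simple string form. The bucket path is hypothetical, and the template expression that reads each partition's URI is an assumption not documented in this reference.

```yaml
type: model
incremental: true

partitions:
  glob: "s3://example-bucket/events/**/*.parquet"  # hypothetical path
  connector: s3                                    # connector used to list files

sql: |
  -- Assumed templating: one execution per matched file
  SELECT * FROM read_parquet('{{ .partition.uri }}')
```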

materialize

[boolean] - Whether the model is materialized in the OLAP engine.

partitions_watermark

[string] - Refers to a customizable timestamp that can be set to check if an object has been updated (optional).

partitions_concurrency

[integer] - Refers to the number of concurrent partitions that can be read at the same time (optional).

stage

[object] - Used for staging models, where the input source does not support writing directly to the output and a staging table is required.

  • connector - [string] - Refers to the connector type for the staging table (required)
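
A rough sketch of a staged model follows. The particular combination of connectors (bigquery as input, gcs as staging, clickhouse as output) is an assumption for illustration; whether a given combination is supported depends on the connectors involved.

```yaml
type: model
connector: bigquery  # input connector (assumed)
sql: SELECT * FROM example_dataset.events  # hypothetical table

stage:
  connector: gcs     # intermediate location the data is written to first

output:
  connector: clickhouse  # final destination of the model
```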

output

[object] - Defines the properties of the output.

  • table - [string] - Name of the output table. If not specified, the model name is used.

  • materialize - [boolean] - Whether to materialize the model as a table or view

  • connector - [string] - Refers to the connector type for the output table. Can be clickhouse or duckdb, or a named connector of either type.

  • incremental_strategy - [string] - Strategy to use for incremental updates. Can be 'append', 'merge' or 'partition_overwrite'

  • unique_key - [array of string] - List of columns that uniquely identify a row for merge strategy

  • partition_by - [string] - Column or expression to partition the table by
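
The output properties above might be combined as in this sketch of a merge-based incremental model. The table and column names are hypothetical.

```yaml
type: model
incremental: true
sql: SELECT * FROM raw_orders  # raw_orders is an assumed upstream table

output:
  connector: duckdb
  table: orders_latest          # overrides the default (the model name)
  materialize: true             # write a table rather than a view
  incremental_strategy: merge   # update existing rows instead of appending
  unique_key:                   # columns that identify a row for the merge
    - order_id
```

With 'append' or 'partition_overwrite', unique_key would not apply, since the reference describes it only for the merge strategy.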

Additional properties for output when connector is clickhouse

  • type - [string] - Type to materialize the model into. Can be 'TABLE', 'VIEW' or 'DICTIONARY'

  • columns - [string] - Column names and types. Can also include indexes. If unspecified, detected from the query.

  • engine_full - [string] - Full engine definition in SQL format. Can include partition keys, order, TTL, etc.

  • engine - [string] - Table engine to use. Default is MergeTree

  • order_by - [string] - ORDER BY clause.

  • partition_by - [string] - PARTITION BY clause.

  • primary_key - [string] - PRIMARY KEY clause.

  • sample_by - [string] - SAMPLE BY clause.

  • ttl - [string] - TTL settings for the table or columns.

  • table_settings - [string] - Table-specific settings.

  • query_settings - [string] - Settings used in insert/create table as select queries.

  • distributed_settings - [string] - Settings for distributed table.

  • distributed_sharding_key - [string] - Sharding key for distributed table.

  • dictionary_source_user - [string] - User for accessing the source dictionary table (used if type is DICTIONARY).

  • dictionary_source_password - [string] - Password for the dictionary source user.
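
For a ClickHouse output, the additional properties could be set as in the sketch below. Column names, the retention period, and the upstream table are illustrative assumptions.

```yaml
type: model
sql: SELECT event_date, user_id, amount FROM raw_events  # assumed table

output:
  connector: clickhouse
  type: TABLE
  engine: MergeTree                      # the documented default, shown explicitly
  order_by: "(event_date, user_id)"      # ORDER BY clause
  partition_by: toYYYYMM(event_date)     # PARTITION BY clause
  ttl: event_date + INTERVAL 90 DAY      # drop rows older than 90 days
```

When the engine definition needs more than these individual clauses, engine_full accepts the whole definition as one SQL string instead.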

Common Properties

name

[string] - Name is usually inferred from the filename, but can be specified manually.

refs

[array of string] - List of resource references

dev

[object] - Overrides any properties in the development environment.

prod

[object] - Overrides any properties in the production environment.
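
A sketch of environment overrides, assuming a hypothetical table raw_events: development reads a limited sample, while production adds an hourly refresh.

```yaml
type: model
sql: SELECT * FROM raw_events  # default query (assumed table)

dev:
  sql: SELECT * FROM raw_events LIMIT 10000  # smaller sample in development

prod:
  refresh:
    cron: "0 * * * *"  # refresh hourly in production only
```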

Additional properties when connector is athena or named connector of athena

output_location

[string] - Output location for query results in S3.

workgroup

[string] - AWS Athena workgroup to use for queries.

region

[string] - AWS region to connect to Athena and the output location.
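
An Athena-backed model might be configured as in the sketch below, assuming the connector-specific properties sit at the top level of the model file. The bucket, workgroup, and table names are hypothetical.

```yaml
type: model
connector: athena
sql: SELECT * FROM example_db.page_views  # hypothetical Athena table

output_location: s3://example-bucket/athena-results/  # where results are written
workgroup: primary                                    # assumed workgroup name
region: us-east-1
```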

Additional properties when connector is azure or named connector of azure

path

[string] - Path to the source

account

[string] - Account identifier

uri

[string] - Source URI

extract

[object] - Arbitrary key-value pairs for extraction settings

glob

[object] - Settings related to glob file matching.

  • max_total_size - [integer] - Maximum total size (in bytes) matched by glob

  • max_objects_matched - [integer] - Maximum number of objects matched by glob

  • max_objects_listed - [integer] - Maximum number of objects listed in glob

  • page_size - [integer] - Page size for glob listing

batch_size

[string] - Size of a batch (e.g., '100MB')

Additional properties when connector is bigquery or named connector of bigquery

project_id

[string] - ID of the BigQuery project.

Additional properties when connector is duckdb or named connector of duckdb

path

[string] - Path to the data source.

format

[string] - Format of the data source (e.g., csv, json, parquet).

pre_exec

[string] - Refers to SQL queries to run before the main query. Available for DuckDB-based models.

post_exec

[string] - Refers to a SQL query to run after the main query. Available for DuckDB-based models.
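
One way pre_exec and post_exec could be used is to attach a database before the main query and detach it afterwards. The file data/reference.db and the table customers are hypothetical.

```yaml
type: model
connector: duckdb

# Attach a second DuckDB file before the main query runs (assumed path)
pre_exec: ATTACH IF NOT EXISTS 'data/reference.db' AS ref_db (READ_ONLY)

sql: SELECT * FROM ref_db.customers  # hypothetical table in the attached file

# Clean up once the main query has finished
post_exec: DETACH DATABASE IF EXISTS ref_db
```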

Additional properties when connector is gcs or named connector of gcs

path

[string] - Path to the source

uri

[string] - Source URI

extract

[object] - Key-value pairs for extraction settings

glob

[object] - Settings related to glob file matching.

  • max_total_size - [integer] - Maximum total size (in bytes) matched by glob

  • max_objects_matched - [integer] - Maximum number of objects matched by glob

  • max_objects_listed - [integer] - Maximum number of objects listed in glob

  • page_size - [integer] - Page size for glob listing

batch_size

[string] - Size of a batch (e.g., '100MB')

Additional properties when connector is local_file or named connector of local_file

path

[string] - Path to the data source.

format

[string] - Format of the data source (e.g., csv, json, parquet).

Additional properties when connector is redshift or named connector of redshift

output_location

[string] - S3 location where query results are stored.

workgroup

[string] - Redshift Serverless workgroup to use.

database

[string] - Name of the Redshift database.

cluster_identifier

[string] - Identifier of the Redshift cluster.

role_arn

[string] - ARN of the IAM role to assume for Redshift access.

region

[string] - AWS region of the Redshift deployment.

Additional properties when connector is s3 or named connector of s3

region

[string] - AWS region

endpoint

[string] - AWS endpoint

path

[string] - Path to the source

uri

[string] - Source URI

extract

[object] - Key-value pairs for extraction settings

glob

[object] - Settings related to glob file matching.

  • max_total_size - [integer] - Maximum total size (in bytes) matched by glob

  • max_objects_matched - [integer] - Maximum number of objects matched by glob

  • max_objects_listed - [integer] - Maximum number of objects listed in glob

  • page_size - [integer] - Page size for glob listing

batch_size

[string] - Size of a batch (e.g., '100MB')
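
The fragment below sketches the S3-specific keys only, assuming they sit at the top level of the model file. The bucket path and limits are illustrative, and the general properties (such as sql, which the reference marks as required) are omitted here.

```yaml
type: model
connector: s3
path: s3://example-bucket/events/*.parquet  # hypothetical path
region: us-east-1

glob:
  max_total_size: 10737418240   # 10 GiB cap on total matched bytes
  max_objects_matched: 1000     # stop after 1000 matching objects
  max_objects_listed: 100000    # cap on objects listed while matching
  page_size: 500                # objects per listing page

batch_size: 100MB
```

The same glob and batch_size keys are listed for the azure and gcs connectors.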

Additional properties when connector is salesforce or named connector of salesforce

soql

[string] - SOQL query to execute against the Salesforce instance.

sobject

[string] - Salesforce object (e.g., Account, Contact) targeted by the query.

queryAll

[boolean] - Whether to include deleted and archived records in the query (uses queryAll API).
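
A Salesforce-backed model could be sketched as follows, assuming these properties sit at the top level of the model file. The query and object are illustrative.

```yaml
type: model
connector: salesforce

soql: SELECT Id, Name, CreatedDate FROM Account  # illustrative SOQL query
sobject: Account                                 # object targeted by the query
queryAll: true                                   # include deleted and archived records
```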