Create Advanced Models

Unlike SQL models, YAML file models allow the ability to fine tune a model to perform additional capabilities such as pre-exec, post-exec SQL, partitioning, and incremental modeling. This is an important addition to modeling as it allows the user to customize the model's method of building. In the case of partitions and incremental modeling, this will reduce the amount of data ingested into Rill at each interval and allow insight into specific issues per partition. Another use case is when using multiple OLAP engines, this allows you to define where a SQL query is run.
When to use Advanced Models?
For the majority of users on a DuckDB backed Rill project, advanced models are not required. When a project gets larger and refreshing the whole datasets becomes a time-consuming and costly task, we introduce incremental ingestion to help alleviate the problem. Along with incremental modeling, we use partitions to divide a dataset into smaller to manage datasets. When enabling partitions, you are able to refresh single sections of data if required.
Another use case is when using multiple OLAP engines. This allows you to specify where you SQL is running. While we do not officially support ClickHouse modeling, this is available behind a feature flag clickhouseModeling
. When both DuckDB and ClickHouse are enabled in a single environment, you will need to define connector: duckdb/clickhouse
in the YAML to tell Rill where to run the SQL query.
Types of Advanced Models
Creating an Advanced Model
You can get started with an advanced model with the following code block:
#Model YAML
#Reference documentation: https://docs.rilldata.com/reference/project-files/advanced-models
type: model
connector: duckdb
sql: select * from <source>
Please refer to our reference documentation linked above for the available parameters to set in your model.
Currently, there isn't a UI button to start off with an advanced model YAML. Creating a model in Rill will always create a model.sql file.
DuckDB Model's pre-exec, SQL, post-exec
While we install a set of core libraries and extensions with our embed DuckDB, there might be specific use-cases where you might want to add a different one. In order to do this, you will need to use the pre-exec parameter to ensure that everything is loaded before running your SQL query.
Take the example of gsheets
community extension. In order to use this extension in Rill, you'll need to install and load the plugin. Once that's done you can define the secret and finally run the SQL.
pre_exec: INSTALL gsheets FROM community; LOAD gsheets; CREATE SECRET (TYPE gsheet, PROVIDER access_token, TOKEN '<your_token>');
sql: SELECT * FROM read_gsheet('https://docs.google.com/spreadsheets/d/<your_unique_ID>', headers=false);
Like any SQL query, you can divide the queries with a semicolon to run multiple queries. This is available for both pre_exec
and post_exec
. The default sql
parameter requires a single SELECT statement to run.
Another example is attaching a database to DuckDB, running some queries against it then detaching said database.
pre_exec: ATTACH 'dbname=postgres host=localhost port=5432 user=postgres password=postgres' AS postgres_db (TYPE POSTGRES);
sql: SELECT * FROM postgres_query('postgres_db', 'SELECT * FROM USERS')
post_exec: DETACH postgres_db # Note : this is not mandatory but nice to have
Similar Considerations to Note
As with normal SQL models, materialization will be disabled by default and depending on your use-case setting this parameter to true may improve performance. For more information, check out our model materialization notes.