Templating
Overview
Rill uses the Go programming language's native templating engine, known as text/template
, which you might know from projects such as Helm or Hugo. It additionally includes the Sprig library of utility functions.
Templating can be a powerful tool to help introduce dynamic conditional statements based on local variables that have been passed in to Rill or based on the environment being used. Some common use cases may include but are not limited to:
- Pointing to different source data endpoints / databases between your development and production environments, i.e. Rill Developer vs Rill Cloud
- Working with a sample or subset of data during local development (but making sure the full dataset is being used in production dashboards)
- Applying filters or other if/else predefined logic to run different SQL whether a model is being run locally or in production
- There are many more possibilities!
For the most part, templating should be used in SQL models and when defining source proprties. If you have further questions about templating, please don't hesitate to reach out and we'd love to assist you further!
Setting variables in Rill
Templating can be used in conjunction with variables to apply more advanced logic to your sources and models.
Variables can be set in Rill through one of the following methods:
- Defining the corresponding key-value pair under the top-level
env
key inrill.yaml
- Manually passing in the variable when starting Rill (i.e.
rill start --env <var_name>=<value>
) - Specifying the key-value pair for your variable in your
<RILL_PROJECT_HOME>/.env
file and/or usingrill env set
to set the variable via the CLI (and then pushing / pulling as appropriate)
For more information, please check our reference documentation.
Environments and Rill
Templating can be used in conjunction with environments to apply environment-specific logic based on whether the project is running locally on Rill Developer or when deployed to Rill Cloud. By default, Rill comes with two built-in environments defined, dev
and prod
, which correspond to Rill Developer and Rill Cloud respectively. For more details about using environments in Rill Cloud, please see our environments page.
Referencing other tables or models in SQL when using templating
When you use templating in a SQL model, Rill loses the ability to analyze the SQL for references to other sources and models in the project. This can lead to reconcile errors where Rill tries to create a model before the sources (or other models) it depends upon have finished being ingested.
To avoid this scenario, whenever you use templating in a model's SQL, it is strongly recommended to incorporate ref
tags whenever you need to reference another resource in your project in SQL. For those familiar with dbt's ref() function, the concept is very similar in nature. As an example:
# models/my_model.sql
SELECT *
FROM {{ ref "my_source" }}
WHERE my_value = '{{ .env.my_value }}'
In this example, the ref
tag ensures that the model my_model
will not be created until after a source named my_source
has finished ingesting.
Examples
Let's walk through a few example scenarios to illustrate the power of templating and how it can be used within Rill.
Changing the database user based on dev / prod
Let's say that we have a Snowflake source created that is being used in some downstream models and dashboards. In this case, we want the following logic applied:
- In local development, we want Rill Developer to use our dev user (e.g.
SUPPORT_TEST
) to connect to Snowflake - In local development, we want to limit the size of our source data in Rill Developer (in this case just a simple
LIMIT 1
to illustrate the point) - In production, we want Rill Cloud to use our provisioned service account (e.g.
PROD_USER
) to connect to Snowflake - In production, we want to make sure that Rill Cloud is using the entire source data for our downstream models and dashboards
In this hypothetical scenario, our source.yaml
might look something like the following:
type: source
connector: "snowflake"
sql: "select * from <table_name> {{if dev}} limit 1 {{end}}"
dsn: "{{if dev}}SUPPORT_TEST{{else}}PROD_USER{{end}}@<account_identifier>/<database>/<schema>?warehouse=<warehouse>&role=<role>N&authenticator=SNOWFLAKE_JWT&privateKey=..."
Changing the bucket location based on dev / prod
Let's say that we have a GCS source created where Rill is reading in some CSV data (in this case we have some sample Citi Bike trip data loaded onto both a "test" and "prod" GCS bucket). In this case, let's imagine that we want to connect to this "test" bucket for local development purposes but we want to make sure that our production data hosted on our "prod" bucket is what's being used to power this same source once the project has been deployed to Rill Cloud. In such a scenario, our source.yaml
might look like:
type: source
connector: "duckdb"
sql: "select * from read_csv('gs://{{if dev}}<test_bucket>{{else}}<prod_bucket>{{end}}/201306-citibike-tripdata.csv', auto_detect=true, ignore_errors=1, header=true)"
Applying a one week sample to the source bucket for local development
In another example, let's say we had a S3 source defined that happens to be reading a very large amount of parquet data. Following best practices, we'll want to read in a subset of this source data for local modeling in Rill Developer rather than using the full dataset for development purposes. Furthermore, we'll make the assumption that the upstream data is not partitioned and thus the S3 bucket is not partitioned (where we could then simply filter the path
by using a glob pattern potentially in conjunction with environment specific logic). So what can we do?
Fortunately, we can leverage DuckDB's ability to read from S3 files directly and apply a filter post-download using templating logic in the SQL. In this case, because there is an existing updated_at
timestamp column, we can use it to filter and retrieve only one week's worth of data. For example, our source.yaml
file may end up looking something like:
type: source
connector: "duckdb"
sql: SELECT * FROM read_parquet('s3://bucket/path/*.parquet') {{ if dev }} where updated_at >= '2024-03-01' AND updated_at < '2024-03-07' {{ end }}
In this case, we are using the embedded DuckDB engine to execute a SELECT statement while leveraging DuckDB's native read_parquet function. Therefore, the connector
type ends up being duckdb
instead of s3
. For more details, see our Source YAML reference documentation.
Limiting the number of rows in a model only for local development
Following a similar vein to our previous example, let's say that we wanted to apply a limit (or other custom SQL) to our models that only came into effect when used in development (i.e. Rill Developer), but not in production (i.e. Rill Cloud). A very straightforward example would be that perhaps we have some complex SQL written that is computationally intensive and our source data is quite large. For local modeling purposes, we don't need to work with the full dataset but need to only return the first 1000 rows to validate that the logic is correct and that results are returning as expected.
In our model.sql
file, we could leverage templating logic to check the environment is dev
and apply a LIMIT 1000
to the query:
-- A bunch of CTEs, complex joins, etc.
SELECT * FROM final
{{if dev}} LIMIT 1000 {{end}}
Running the same model above in Rill Cloud, the full dataset will be used and this limit will only apply to local development with Rill Developer!
Leveraging variables to apply a filter and row limit dynamically to a model
Our last example will highlight how the same templating concepts can be applied with variables instead of environments. In this case, we have a source dataset about horror movies that came out in the past 50 years, which includes various characteristics, attributes, and metrics about each horror movie as separate columns. For example, we know the release date, how many people saw a movie, what the budget was, it's popularity, the original language of the movie, the genres, and much more.
Let's say that we wanted to apply a filter on the resulting model based on the original_language
of the movie and also limit the number of records that we retrieve, which will be based on the language
and local_limit
variables we have defined. Taking a quick look at our project's rill.yaml
file, we can see the following configuration (to return only English movies and apply a limit of 5):
env:
local_limit: 5
language: "en"
Furthermore, our model.sql
file contains the following SQL:
SELECT * FROM {{ ref "data_source" }}
WHERE original_language = '{{ .env.language }}'
{{if dev}} LIMIT {{ .env.local_limit }} {{end}}
ref
functionIf you use templating in SQL models, you must replace references to tables / models created by other sources or models with ref
tags. See this section on "Referencing other tables or models in SQL when using templating". This ensures that the native Go templating engine used by Rill is able to resolve and correctly compile the SQL syntax during runtime (to avoid any potential downstream errors).
If we simply run Rill Developer using rill start
, our model will look like the following (this will also reflect our data model in production, i.e. Rill Cloud, after we've pushed the changes for the project to Github):
Now, just to illustrate what a local override might look like, let's say we stop Rill Developer and then restart Rill via the CLI with the following command:
rill start --env language="es" --env local_limit=100
Even though we have defaults set in rill.yaml
(and this will be used by any downstream models and dashboards on Rill Cloud), we will instead see these local overrides come into effect with our templated logic to return Spanish movies and the model limit is now 100 rows.
Voila!
Additional resources
- Official docs (Go)
- Learn Go Template Syntax (HashiCorp)
- Sprig Function Documentation