Incremental Partitioned Models
Putting the two concepts together, you can create an incremental partitioned model. This partitions the model, and it also lets you refresh only the partitions you need and ingest new partitions incrementally.
If you need help setting up an incremental partitioned model, reach out to us!
If you're looking for a working example, take a look at my-rill-tutorial in our examples repository.
Since we already know how to set up each of these separately, let's see what changes in the UI when we enable both on a single model. In the following example, incremental is enabled and partitions are defined by the directories in a Google Cloud Storage bucket.
type: model
incremental: true
refresh:
  cron: "0 8 * * *"
partitions:
  glob:
    path: gs://rilldata-public/github-analytics/Clickhouse/2024/*/*
    partition: directory
sql: |
  SELECT *
  FROM read_parquet('{{ .partition.uri }}/commits_*.parquet')
  WHERE '{{ .partition.uri }}' IS NOT NULL
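The `{{ .partition.uri }}` placeholder is what connects the SQL to each partition. Here is a minimal Python sketch of the idea, assuming that `partition: directory` yields one URI per matched directory and that the template is rendered once per partition. Plain string replacement stands in for Rill's real template engine. The `render_partition_sql` helper and the example URIs are hypothetical.

```python
# Hypothetical sketch: render the model's SQL once per partition.
# Plain string replacement stands in for Rill's actual template engine.

PLACEHOLDER = "{{ .partition.uri }}"

SQL_TEMPLATE = (
    "SELECT *\n"
    "FROM read_parquet('{{ .partition.uri }}/commits_*.parquet')\n"
    "WHERE '{{ .partition.uri }}' IS NOT NULL"
)


def render_partition_sql(template: str, partition_uri: str) -> str:
    """Substitute one partition's URI into every placeholder in the template."""
    return template.replace(PLACEHOLDER, partition_uri)


if __name__ == "__main__":
    # Hypothetical directory URIs, one per partition, as the glob might match them.
    uris = [
        "gs://rilldata-public/github-analytics/Clickhouse/2024/01",
        "gs://rilldata-public/github-analytics/Clickhouse/2024/02",
    ]
    for uri in uris:
        print(render_partition_sql(SQL_TEMPLATE, uri))
```

Each rendered query reads only the Parquet files inside its own directory, so each partition can be ingested and refreshed independently.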
Refreshing Partitions in Incremental Models
When this model loads, you can view its partitions and select a specific partition to refresh from the UI in Rill Developer. Unlike models that are only partitioned, a new button is added to each partition.
You can also refresh a specific partition from the CLI:
rill project refresh --model CH_incremental_commits_directory --local --partition ba9f71625de8e042cabf3333576d502c
Refresh initiated. Check the project logs for status updates.
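If you need to refresh several partitions in a row, you can script the CLI call. The following is a minimal Python sketch that reuses exactly the flags from the command above. It assumes `rill` is on your `PATH` and that you already have the partition keys; the helper names are hypothetical.

```python
import subprocess


def build_refresh_command(model: str, partition_key: str, local: bool = True) -> list[str]:
    """Build argv for `rill project refresh`, using the flags from the example above."""
    cmd = ["rill", "project", "refresh", "--model", model]
    if local:
        cmd.append("--local")  # same flag as in the example command
    cmd += ["--partition", partition_key]
    return cmd


def refresh_partitions(model: str, partition_keys: list[str], local: bool = True) -> None:
    """Request a refresh for each partition key in turn.

    The CLI only initiates each refresh, so a zero exit status means the
    request was accepted. Check the project logs for the actual outcome.
    """
    for key in partition_keys:
        subprocess.run(build_refresh_command(model, key, local), check=True)
```

For example, `refresh_partitions("CH_incremental_commits_directory", ["ba9f71625de8e042cabf3333576d502c"])` issues the same command shown above.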
How Incremental Partitioned Models Work
Initial Ingestion:
When a model is first created, an initial ingestion brings in all of the data. The same thing happens when you run a Full Refresh. Note in the image below that all gray portions of the partitioned source are saved as separate partitions in the partitioned model.
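As a rough mental model, the initial ingestion (or a Full Refresh) can be sketched in Python as follows. This is a simplified illustration and not Rill's implementation. The `PartitionedModel` class is hypothetical. It represents both the source and the model as a mapping from partition URI to the last-modified time of that partition's data.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class PartitionedModel:
    """Hypothetical in-memory stand-in for a partitioned model.

    Maps each partition URI to the last-modified time of the data ingested for it.
    """

    partitions: dict[str, datetime] = field(default_factory=dict)

    def full_refresh(self, source: dict[str, datetime]) -> list[str]:
        """Discard existing state, then ingest every source partition separately."""
        self.partitions.clear()
        ingested = []
        for uri in sorted(source):
            # Each source partition lands in its own model partition.
            self.partitions[uri] = source[uri]
            ingested.append(uri)
        return ingested
```

Recording a per-partition timestamp at ingestion time is what later lets an incremental refresh tell new and changed partitions apart from untouched ones.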
Additional Partition:
If you add a partition to the source table, Rill detects it on the next refresh and adds only that partition to the model. As the diagram shows, the blue additional partition is stored as its own partition in the partitioned model. Partitions that have not been modified are not touched.
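The detection step can be sketched in Python as a set difference between the source listing and what the model already holds. This is a hypothetical illustration under the same assumption as before: both sides are represented as mappings from partition URI to last-modified time. The function only handles partitions the model has never seen and deliberately ignores changes to existing ones.

```python
from datetime import datetime


def add_new_partitions(source: dict[str, datetime], model: dict[str, datetime]) -> list[str]:
    """Ingest only partitions the model has never seen, leaving the rest untouched.

    Both dicts map partition URI -> last-modified time (hypothetical representation).
    Returns the URIs that were added.
    """
    new = sorted(uri for uri in source if uri not in model)
    for uri in new:
        model[uri] = source[uri]  # existing entries are never rewritten here
    return new
```

Running it a second time against the same source returns an empty list, because the new partition is now recorded in the model.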
Modify Existing Partition:
If you modify an existing partition (shown in yellow), Rill reingests just the modified file during the scheduled refresh. It detects the change by checking the last_modified_date parameter.
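Putting the three cases together, a scheduled refresh can be sketched in Python as a simple classification. This is a hypothetical model of the behavior described above, not Rill's code. It assumes the only change signal is a per-partition last-modified timestamp, and it represents source and model as mappings from partition URI to that timestamp. The partition names are placeholders.

```python
from datetime import datetime


def plan_incremental_refresh(
    source: dict[str, datetime], model: dict[str, datetime]
) -> tuple[list[str], list[str], list[str]]:
    """Classify source partitions as new, modified, or unchanged.

    A partition counts as modified when its source last-modified time is
    newer than the one recorded when it was last ingested.
    """
    new, modified, unchanged = [], [], []
    for uri in sorted(source):
        if uri not in model:
            new.append(uri)
        elif source[uri] > model[uri]:
            modified.append(uri)
        else:
            unchanged.append(uri)
    return new, modified, unchanged


def apply_refresh(source: dict[str, datetime], model: dict[str, datetime]) -> list[str]:
    """Reingest new and modified partitions; unchanged partitions are skipped."""
    new, modified, _ = plan_incremental_refresh(source, model)
    for uri in new + modified:
        model[uri] = source[uri]
    return new + modified
```

After `apply_refresh` runs, every recorded timestamp matches the source, so an immediate second refresh classifies every partition as unchanged and reingests nothing.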