Partition Models
As mentioned, we can define a partitioning scheme by adding the partitions:
key with some defining parameters. Partitions are a special case of incremental model states. Let's look at the following example.
type: model
partitions:
sql: SELECT range AS num FROM range(0,10)
sql: SELECT {{ .partition.num }} AS num, now() AS inserted_on
In this simple example, we set up 10 partitions [range(0,10)] that have a single row with the same now() function as defined earlier. To confirm this we can run the following:
rill project partitions partitions_range --local
KEY (10) DATA EXECUTED ON ELAPSED ERROR
---------------------------------- ----------- ---------------------- --------- -------
ff7416f774dfb086006d0b4696c214e1 {"num":0} 2024-09-18T02:32:01Z 145ms
69401118e166742864f35f1a77ffe07d {"num":1} 2024-09-18T02:32:01Z 0s
555ef019f87b5a57ec7b057476fe9d38 {"num":2} 2024-09-18T02:32:01Z 0s
09a5530e62c87e41848a02680f4422d4 {"num":3} 2024-09-18T02:32:01Z 1ms
ecc2e2f9deb13509547ebcc2e5c55116 {"num":4} 2024-09-18T02:32:01Z 1ms
1e3ddd76525fefd5ac9989d9b6c4727e {"num":5} 2024-09-18T02:32:01Z 1ms
25684ad5ef8f1965b597edeeb8004afa {"num":6} 2024-09-18T02:32:01Z 0s
8142d250d75d0a20883333c00c3962d5 {"num":7} 2024-09-18T02:32:01Z 0s
0d6962a0746cb896ce87250808a50051 {"num":8} 2024-09-18T02:32:01Z 1ms
727d91a916260837579d5e42ad696dd9 {"num":9} 2024-09-18T02:32:01Z 0s
If you try to refresh a single partition, you'll receive the following error:
rill project refresh --model partitions_range --partition ff7416f774dfb086006d0b4696c214e1 --local
Error: can't refresh partitions on model "partitions_range" because it is not incremental
Incremental Partition Model
type: model
partitions:
sql: SELECT range AS num FROM range(0,10)
sql: SELECT {{ .partition.num }} AS num, now() AS inserted_on
incremental: true
output:
incremental_strategy: merge
unique_key: [num]
Similarily to the above, let's run rill project partitions <model_name>
to get the key_ids.
rill project partitions partitions_range_incremental --local
KEY (10) DATA EXECUTED ON ELAPSED ERROR
---------------------------------- ----------- ---------------------- --------- -------
ff7416f774dfb086006d0b4696c214e1 {"num":0} 2024-09-18T03:17:16Z 103ms
69401118e166742864f35f1a77ffe07d {"num":1} 2024-09-18T03:17:16Z 1ms
555ef019f87b5a57ec7b057476fe9d38 {"num":2} 2024-09-18T03:17:16Z 0s
09a5530e62c87e41848a02680f4422d4 {"num":3} 2024-09-18T03:17:16Z 0s
ecc2e2f9deb13509547ebcc2e5c55116 {"num":4} 2024-09-18T03:17:16Z 0s
1e3ddd76525fefd5ac9989d9b6c4727e {"num":5} 2024-09-18T03:17:16Z 0s
25684ad5ef8f1965b597edeeb8004afa {"num":6} 2024-09-18T03:17:16Z 0s
8142d250d75d0a20883333c00c3962d5 {"num":7} 2024-09-18T03:17:16Z 0s
0d6962a0746cb896ce87250808a50051 {"num":8} 2024-09-18T03:17:16Z 0s
727d91a916260837579d5e42ad696dd9 {"num":9} 2024-09-18T03:17:16Z 0s
Using the above information, we'll refresh the top partition.
rill project refresh --model partitions_range_incremental --partition ff7416f774dfb086006d0b4696c214e1 --local
Refresh initiated. Check the project logs for status updates.
Then, rerun the partitions command to see that the EXECUTED ON columns has been updated.
royendo@Roys-MacBook-Pro-2 modeling % rill project partitions partitions_range_incremental --local
KEY (10) DATA EXECUTED ON ELAPSED ERROR
---------------------------------- ----------- ---------------------- --------- -------
ff7416f774dfb086006d0b4696c214e1 {"num":0} 2024-09-18T03:17:58Z 1ms
69401118e166742864f35f1a77ffe07d {"num":1} 2024-09-18T03:17:16Z 1ms
555ef019f87b5a57ec7b057476fe9d38 {"num":2} 2024-09-18T03:17:16Z 0s
09a5530e62c87e41848a02680f4422d4 {"num":3} 2024-09-18T03:17:16Z 0s
ecc2e2f9deb13509547ebcc2e5c55116 {"num":4} 2024-09-18T03:17:16Z 0s
1e3ddd76525fefd5ac9989d9b6c4727e {"num":5} 2024-09-18T03:17:16Z 0s
25684ad5ef8f1965b597edeeb8004afa {"num":6} 2024-09-18T03:17:16Z 0s
8142d250d75d0a20883333c00c3962d5 {"num":7} 2024-09-18T03:17:16Z 0s
0d6962a0746cb896ce87250808a50051 {"num":8} 2024-09-18T03:17:16Z 0s
727d91a916260837579d5e42ad696dd9 {"num":9} 2024-09-18T03:17:16Z 0s
The above is a static model. The partition is defined by a set range(0,10) so there is no reason for us to put a refresh key-pair on this. However, real data is likely not static and will require some sort of refresh when you push to Rill Cloud.
Let's take a look at a more realistic example, our ClickHouse project.
Was this content helpful?