Let's get back to our project
Before we discuss the advanced features, we'll go over how to make changes in Rill Developer and push them to Rill Cloud.
When we first ingested the data, we brought in only a month's worth so that we wouldn't try to download everything from the source. In Rill Cloud, however, we want our dashboards to display all of the data. We can do this by defining environment-specific behavior for the source in its YAML. See our reference documentation for more information.
gs://rilldata-public/github-analytics/Clickhouse/2025/03/modified_files_*.parquet
gs://rilldata-public/github-analytics/Clickhouse/2025/03/commits_*.parquet
Rill Developer will always run as `dev` unless explicitly defined when starting Rill via `rill start --environment prod`.
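For example, starting Rill as usual keeps you in the `dev` environment, while the flag from the note above switches to `prod`; a quick sketch:

```bash
# Default: Rill Developer runs with the dev environment
rill start

# Explicitly start in the prod environment to test production ingestion locally
rill start --environment prod
```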
Modifying the YAML
commits.yaml
Your source file should look something like:
# Source YAML
# Reference documentation: https://docs.rilldata.com/reference/project-files/sources
type: source
connector: "duckdb"
sql: "select * from read_parquet('gs://rilldata-public/github-analytics/Clickhouse/2025/03/commits_*.parquet')"
As currently set up, the source only ingests data from March 2025. Let's make that SQL our dev SQL and add a new line for the full data. That way, when we push the source to Rill Cloud, it won't ingest just the month of March but all of the historical data, too!
dev:
  sql: "select * from read_parquet('gs://rilldata-public/github-analytics/Clickhouse/2025/03/commits_*.parquet')"
sql: "select * from read_parquet('gs://rilldata-public/github-analytics/Clickhouse/*/*/commits_*.parquet')"
modified_files.yaml
Now let's do the same for `modified_files`.
# Source YAML
# Reference documentation: https://docs.rilldata.com/reference/project-files/sources
type: source
connector: "duckdb"
dev:
  sql: "select * from read_parquet('gs://rilldata-public/github-analytics/Clickhouse/2025/03/modified*.parquet')"
sql: "select * from read_parquet('gs://rilldata-public/github-analytics/Clickhouse/*/*/modified*.parquet')"
{{if dev}} {{end}}
Similar to separating the SQL into two separate keys, you can also use `{{if dev}}` templating to define a dev-only rule for the source data directly in the SQL:
sql: "select * from read_parquet('gs://rilldata-public/github-analytics/Clickhouse/*/*/commits_*.parquet')
{{if dev}} where author_date > '2025-01-01' {{end}}"```
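The same templating works for other dev-only rules; for instance, a hypothetical variant that caps the number of rows ingested during development (the limit value is just illustrative):

```yaml
sql: "select * from read_parquet('gs://rilldata-public/github-analytics/Clickhouse/*/*/commits_*.parquet')
  {{if dev}} limit 100000 {{end}}"
```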
Adding automatic source refresh
To keep the data from going stale, you'll need to add a refresh schedule to the source!
refresh:
  every: 24h
  #cron: "0 8 * * *"
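Note that the `refresh` key sits at the top level of the source YAML, alongside the keys we defined earlier; roughly:

```yaml
type: source
connector: "duckdb"
# ...dev and default SQL as defined above...

# Re-ingest the source every 24 hours; swap in the cron line for a fixed daily time instead
refresh:
  every: 24h
  #cron: "0 8 * * *"
```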
Each day, this source will refresh on its own, and you'll have fresh data. Next, let's take a look at the model.