Troubleshooting Performance in Rill
On this page, we've gathered a running list of recommendations and general guidelines to keep your experience of using Rill performant and optimized. These best practices will help ensure that your dashboards stay responsive and that things continue to "just work" (in both Rill Developer and Rill Cloud), even as your underlying data and deployment grow. These guidelines will continue to evolve, so please don't hesitate to reach out if you start hitting bottlenecks or have questions about ways to improve your Rill experience!
If you're looking for connector-specific optimizations, see Dev/Prod Connector Environments.
If you're looking for model-specific optimizations, see Performance Optimization.
Generally speaking, Rill's embedded DuckDB OLAP engine works very well out of the box for datasets up to around 50GB in size. If you plan to work with and ingest more than 50GB of data, please get in touch and we can explore using one of our other enterprise-grade OLAP engine options.
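If you do move beyond the embedded engine, switching a project's default OLAP connector is a one-line change in rill.yaml. The snippet below is a minimal sketch that assumes a ClickHouse connector named `clickhouse` has already been defined and credentialed for the project:

```yaml
# rill.yaml — project-level configuration (sketch)
# Use ClickHouse instead of the embedded DuckDB engine; assumes a
# `clickhouse` connector is already defined with valid credentials.
olap_connector: clickhouse
```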
Dashboard and Model Performance
Depending on the complexity of your models and the size of your underlying data, there are several things you can do to improve performance.
Consider which models to materialize
By default, models are materialized as views (in DuckDB). This allows for a dynamic and highly interactive modeling experience, such as keystroke-by-keystroke profiling. However, because views are logical in nature, as the complexity and size of your data models continue to grow (especially if the underlying data is very large), performance can degrade significantly: these complex queries must be continuously re-executed, along with the profiling queries that the Rill runtime sends in the background.
In such scenarios, we recommend materializing these models as tables. However, there are some tradeoffs to consider.
- Pros: Materializing a model will generally ensure significantly improved performance for downstream dependent models and dashboards.
- Cons: Enabling materialization for a model can severely impact or break the "keystroke-by-keystroke" experience and these models may also take longer to update (because the results are being written to a table vs remaining a view). It can also lead to degraded performance for very specific operations, such as when you need to perform cross joins.
We strongly recommend materializing final models that are being used directly in dashboards to ensure this data is served more quickly.
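For illustration, the sketch below enables materialization on a YAML model; the file name, upstream model, and columns are placeholders:

```yaml
# models/dashboard_model.yaml — hypothetical final model feeding a dashboard
type: model
materialize: true   # persist results as a table instead of a view

sql: |
  SELECT order_date, customer_id, SUM(revenue) AS revenue
  FROM upstream_orders_model
  GROUP BY 1, 2
```

A common pattern is to keep intermediate models as views for fast iteration and materialize only the final models that dashboards query directly.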
Refreshing Source Models
Another area to review when your data source starts getting larger is the ingestion performance. By default, when refreshing a source model in Rill, it drops and re-ingests the entire table/file. When your data is small, this isn't an issue, but it's not appropriate for larger datasets. In these cases, we recommend using partitions and incremental models.
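For context, a scheduled refresh on a source model looks roughly like the sketch below (the bucket path and cron expression are placeholders). On its own, this still drops and re-ingests the full table on every run, which is exactly the behavior that partitions and incremental models help you avoid:

```yaml
# models/events.yaml — hypothetical source model with a scheduled full refresh
type: model
refresh:
  cron: "0 * * * *"   # re-ingest every hour

sql: |
  SELECT *
  FROM read_parquet('gs://my-bucket/events/*.parquet')
```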
Partitioned Models
Partitioned models divide your data into logical segments based on specific criteria, typically time-based columns like dates. This approach allows you to selectively refresh a partition where you know data has been altered.
Example partition configuration:
```yaml
partitions:
  glob:
    path: 'gs://my-bucket/**/*.parquet'
```
Incremental Models
Incremental models only process new or changed data since the last refresh, rather than reprocessing the entire dataset. This dramatically improves performance for large datasets:
- Faster Refresh Times: Process only delta changes instead of full datasets
- Reduced Resource Usage: Lower CPU, memory, and storage requirements
- Frequent Updates: Enable near real-time data updates without performance degradation
- Cost Efficiency: Minimize compute costs for large-scale data processing
Example incremental model configuration, combining partitions with incremental processing:

```yaml
type: model
incremental: true

partitions:
  glob:
    path: gs://rilldata-public/github-analytics/Clickhouse/2024/*/*
    partition: directory

sql: |
  SELECT *
  FROM read_parquet('{{ .partition.uri }}/commits_*.parquet')
  WHERE '{{ .partition.uri }}' IS NOT NULL
```
By combining partitioning and incremental processing, you'll significantly reduce model refresh times and ensure your dashboards display the most current information.
Local Development / Rill Developer
When used in conjunction, Rill Developer and Rill Cloud serve two different but complementary purposes. For larger and distributed teams, Rill Developer is primarily meant for local development, allowing developers to quickly model their data and validate logic. Rill Cloud then enables shared collaboration at scale and is where production consumption of dashboards should happen (against your full data).
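One common way to keep local development fast is to limit how much data Rill Developer ingests while Rill Cloud continues to process the full dataset. The sketch below assumes Rill's `dev` templating variable is available in the SQL; the bucket path and row limit are placeholder values (see Dev/Prod Connector Environments for more):

```yaml
# models/events.yaml — hypothetical model that samples data locally
type: model
sql: |
  SELECT *
  FROM read_parquet('gs://my-bucket/events/*.parquet')
  {{ if dev }} LIMIT 100000 {{ end }} -- row limit applied only in Rill Developer
```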