Many customers start by having Rill load data from cloud storage locations such as Amazon S3 or Google Cloud Storage (GCS).
Data is loaded into Rill at a regular interval (usually hourly), either once client-side processing is complete or as raw data.
If your data is already in aggregate or final format, you can load directly into Rill:
- test your data load manually (more details here), which will create your Druid ingestion spec
- using that ingestion spec, add an orchestration step after your processing completes to load data into Druid. Any scheduling tool will work; we typically use Airflow. See this example Airflow DAG for more details
- if you plan to use Rill Explore, contact [email protected] to create your first staging dashboard for review
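The orchestration step above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Rill's reference implementation: it assumes a native batch spec with an S3 input source, and the function names, the Druid URL, and the storage prefix are hypothetical. The `/druid/indexer/v1/task` endpoint is Druid's standard task submission API; authentication is omitted and will depend on your cluster.

```python
import copy
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def spec_for_hour(template: dict, hour_start: datetime, prefix: str) -> dict:
    """Return a copy of the ingestion spec narrowed to one hour of data.

    Assumes a native batch spec whose input source accepts `prefixes`
    (for example an S3 or GCS input source).
    """
    spec = copy.deepcopy(template)  # never mutate the saved template
    hour_end = hour_start + timedelta(hours=1)
    interval = f"{hour_start.isoformat()}/{hour_end.isoformat()}"
    spec["spec"]["dataSchema"]["granularitySpec"]["intervals"] = [interval]
    spec["spec"]["ioConfig"]["inputSource"]["prefixes"] = [prefix]
    return spec


def build_task_request(spec: dict, druid_url: str) -> urllib.request.Request:
    """Build the HTTP POST that submits the spec as a Druid task."""
    return urllib.request.Request(
        url=f"{druid_url.rstrip('/')}/druid/indexer/v1/task",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_task(spec: dict, druid_url: str) -> str:
    """Submit the spec and return the task id reported by Druid."""
    with urllib.request.urlopen(build_task_request(spec, druid_url)) as resp:
        return json.load(resp)["task"]
```

In a scheduler such as Airflow, a function like `submit_task` would typically run as the last task of the hourly job, for example wrapped in a `PythonOperator`, with the hour and storage prefix derived from the run's logical date.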
For more details on Druid ingestion, visit:
For customers with more complex joins or transformations that require Rill-managed pipelines:
- grant Rill access to the storage location (usually Amazon S3 or Google Cloud Storage)
- Rill develops the pipeline logic as required
- review the sample output with the Rill team to confirm layout and values
- Rill polls the source locations at regular intervals
Rill Managed Pipelines
Email [email protected] if you're interested in having the Rill team build and manage ingestion for your data pipelines