Source YAML

Deprecated Feature

Sources have been deprecated and are now considered "source models." While sources remain backward compatible, we recommend migrating to the new source model format for access to the latest features and improvements.

Next steps:

Continue using sources if needed (backward compatible)
Migrate existing projects to source models by setting type: model
See our model YAML reference for current documentation and best practices

Did you know?

Files that are nested at any level under your native sources directory will be assumed to be sources (unless otherwise specified by the type property).

Properties

type - Refers to the resource type and must be model (required).

connector — Refers to the connector type for the source (required).

https - public files accessible over the web through an HTTP/HTTPS URL endpoint
s3 - a file available on Amazon S3
- Note: Rill also supports ingesting data from other storage providers that support the S3 API. Refer to the endpoint property below.
gcs - a file available on Google Cloud Storage
local_file - a locally available file in a supported format (e.g. Parquet or CSV)
motherduck - data stored in MotherDuck
athena - a data store defined in Amazon Athena
redshift - a data store in Amazon Redshift
postgres - data stored in Postgres
sqlite - data stored in SQLite
snowflake - data stored in Snowflake
bigquery - data stored in BigQuery
duckdb - use the embedded DuckDB engine to submit a DuckDB-supported native SELECT query (should be used in conjunction with the sql property)

type - Deprecated legacy alias for connector. It can specify the source connector (rather than the resource type described above) only when the source YAML file lives in the <RILL_PROJECT_DIRECTORY>/sources/ directory. It is preserved primarily for backwards compatibility.

uri - Refers to the URI of the remote file or location to use for the source. Rill also supports glob patterns as part of the URI for S3 and GCS (required for the https, s3, and gcs connectors).

s3://your-org/bucket/file.parquet — the s3 URI of your file
gs://your-org/bucket/file.parquet — the gsutil URI of your file
https://data.example.org/path/to/file.parquet — the web address of your file
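As an illustration, a source file that reads Parquet files from S3 might look like the sketch below. It assumes that the type, connector, and uri properties combine as described above; the file name and bucket path are hypothetical, and the glob is only one way to match several files.

```yaml
# sources/flights.yaml (hypothetical file name)
type: model
connector: s3
# The glob matches every Parquet file directly under the (hypothetical) prefix.
uri: s3://your-org/bucket/*.parquet
```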

path - Refers to the local path of the file to use for the source, relative to your project's root directory (required for the local_file connector).

/path/to/file.csv — the path to your file
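A minimal sketch of a local file source, assuming a CSV file stored in a hypothetical data/ folder inside the project:

```yaml
type: model
connector: local_file
# Hypothetical path, resolved relative to the project's root directory.
path: data/file.csv
```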

sql - Sets the SQL query to extract data from a SQL source: DuckDB/MotherDuck/Athena/BigQuery/Postgres/SQLite/Snowflake (optional).
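For example, pairing the duckdb connector with the sql property could look roughly like this. The file path is hypothetical; read_parquet is a standard DuckDB table function.

```yaml
type: model
connector: duckdb
# A native DuckDB SELECT query; the Parquet path is a placeholder.
sql: |
  SELECT *
  FROM read_parquet('data/file.parquet')
```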

region — Sets the cloud region of the S3 bucket or Athena you want to connect to using the cloud region identifier (e.g. us-east-1). Only available for S3 and Athena (optional).

endpoint — Overrides the S3 endpoint to connect to. This should only be used to connect to S3-compatible services, such as Cloudflare R2 or MinIO (optional).
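A sketch of a source that reads from an S3-compatible service follows. The endpoint value and bucket path are placeholders, and the exact endpoint format your provider expects (for instance, whether the scheme is included) is an assumption to verify against the provider's documentation.

```yaml
type: model
connector: s3
uri: s3://your-org/bucket/file.parquet
# Placeholder endpoint for an S3-compatible service such as MinIO or Cloudflare R2.
endpoint: https://s3.example.com
```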

output_location - Sets the S3 location where Athena writes query output and result files. Rill removes the result files itself, but setting an S3 file retention rule on the output location ensures that no orphaned files are left behind (optional).

workgroup - Sets a workgroup for the Athena connector. The workgroup is also used to determine an output location. A workgroup may override output_location if "Override client-side settings" is turned on for the workgroup (optional).
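Putting the Athena-related properties together, a source might be sketched as follows. The database, table, and bucket names are hypothetical; primary is the name of Athena's default workgroup and is used here only as an example value.

```yaml
type: model
connector: athena
# Hypothetical database and table.
sql: SELECT * FROM my_database.my_table
region: us-east-1
# Hypothetical S3 prefix for Athena query results.
output_location: s3://your-org/athena-results/
workgroup: primary
```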

project_id - Sets the project ID used to run BigQuery jobs (required for the bigquery connector).
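A minimal BigQuery sketch, with a hypothetical project ID, dataset, and table:

```yaml
type: model
connector: bigquery
# Hypothetical Google Cloud project that runs the query job.
project_id: my-gcp-project
sql: SELECT * FROM my_dataset.my_table
```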

timeout - The maximum time to wait for source ingestion (optional).

refresh - Specifies the refresh schedule that Rill should follow to re-ingest and update the underlying source data (optional).

cron - a cron schedule expression, which should be encapsulated in single quotes, e.g. '* * * * *' (optional)
every - a Go duration string, such as 24h (optional)
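As a sketch, a refresh block nested under a source could use either form. The schedule values below are illustrative only.

```yaml
refresh:
  # Re-ingest every day at 06:00; note the single quotes around the cron expression.
  cron: '0 6 * * *'

# Alternatively, using a Go duration string instead of cron:
# refresh:
#   every: 24h
```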

db - Sets the database for MotherDuck connections and/or the path to the DuckDB/SQLite database file (optional).

For DuckDB / SQLite, if deploying to Rill Cloud, this db file will need to be accessible from the root directory of your project on GitHub.
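For instance, a SQLite source might be sketched like this, assuming the database file is committed at a hypothetical path inside the project and that the table name exists in it:

```yaml
type: model
connector: sqlite
# Hypothetical path to the SQLite file within the project.
db: data/app.db
sql: SELECT * FROM my_table
```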

database_url — Postgres connection string that should be used. Refer to Postgres documentation for more details (optional).

If not specified in the source YAML, the connector.postgres.database_url connection string will need to be set when deploying the project to Rill Cloud.
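A sketch of a Postgres source follows, using the standard Postgres connection URI format. The credentials, host, database, and table are placeholders.

```yaml
type: model
connector: postgres
# Placeholder connection string in the standard Postgres URI form.
database_url: "postgresql://user:password@localhost:5432/mydb"
sql: SELECT * FROM my_table
```

Omitting database_url from the file and setting connector.postgres.database_url at deploy time keeps credentials out of the project files.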

duckdb - Specifies the raw parameters to inject into the DuckDB read_csv, read_json, or read_parquet statement that Rill generates internally (optional).

See the DuckDB docs for a full list of available parameters.

Example #1: Define all column data mappings

duckdb:
  header: True
  delim: "'|'"
  columns: "columns={'FlightDate': 'DATE', 'UniqueCarrier': 'VARCHAR', 'OriginCityName': 'VARCHAR', 'DestCityName': 'VARCHAR'}"

Example #2: Define a column type

duckdb:
  header: True
  delim: "'|'"
  columns: "types={'UniqueCarrier': 'VARCHAR'}"

dsn - Used to set the Snowflake connection string. For more information, refer to our Snowflake connector page and the official Go Snowflake Driver documentation (optional).

If not specified in the source YAML, the connector.snowflake.dsn connection string will need to be set when deploying the project to Rill Cloud.
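A Snowflake source could be sketched as below. The DSN follows the general user:password@account/database/schema?param=value shape accepted by the Go Snowflake Driver; every value in it is a placeholder.

```yaml
type: model
connector: snowflake
# Placeholder DSN; replace each segment with your own account details.
dsn: "user:password@account_identifier/my_database/my_schema?warehouse=my_warehouse"
sql: SELECT * FROM my_table
```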

Note: For new projects, use type: model in your source YAML files. The legacy type: source is still supported for backward compatibility, but may be removed in the future.
