Connector YAML
Connector YAML files define how Rill connects to external data sources and OLAP engines. Each connector specifies a driver type and its required connection parameters.
Available Connector Types
OLAP Engines
- ClickHouse - ClickHouse analytical database
- Databricks - Databricks SQL warehouse
- Druid - Apache Druid
- DuckDB - Embedded DuckDB engine (default)
- External DuckDB - External DuckDB database
- MotherDuck - MotherDuck cloud database
- Pinot - Apache Pinot
- StarRocks - StarRocks analytical database
Data Warehouses
- Athena - Amazon Athena
- BigQuery - Google BigQuery
- Databricks - Databricks SQL warehouse
- Redshift - Amazon Redshift
- Snowflake - Snowflake data warehouse
Databases
- MySQL - MySQL databases
- PostgreSQL - PostgreSQL databases
- Supabase - Supabase (managed PostgreSQL)
Object Storage
Service Integrations
- Claude - Claude connector for chat with your own API key
- OpenAI - OpenAI connector for chat with your own API key
- Gemini - Gemini connector for chat with your own API key
- Slack - Slack data
Other
- HTTPS - Public files via HTTP/HTTPS
For all credential parameters (passwords, tokens, keys), use environment variables with the syntax {{ .env.KEY_NAME }}. This keeps sensitive data out of your YAML files and version control. See our credentials documentation for complete setup instructions.
Properties
type
[string] - Refers to the resource type and must be connector (required)
Common Properties
name
[string] - Name is usually inferred from the filename, but can be specified manually.
refs
[array of string] - List of resource references
dev
[object] - Overrides any properties in development environment.
prod
[object] - Overrides any properties in production environment.
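As an illustration of the dev and prod overrides, the sketch below points a ClickHouse connector at a local instance in development and at a remote host in production. The hostname and the environment variable name are placeholders, and nesting the overridden properties directly under dev and prod is an assumption based on the [object] type rather than something this page spells out.

```yaml
# Hypothetical connector with per-environment overrides
type: connector
driver: clickhouse
host: "localhost"   # Base value, used unless an environment overrides it
port: 9000
dev:
  ssl: false        # Assumed: overridden properties nest directly under `dev`
prod:
  host: "clickhouse.example.com"   # Placeholder hostname
  ssl: true
  password: "{{ .env.CLICKHOUSE_PASSWORD }}"
```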
Athena
driver
[string] - Refers to the driver type and must be athena (required)
aws_access_key_id
[string] - AWS Access Key ID used for authentication. Required when using static credentials directly or as base credentials for assuming a role.
aws_secret_access_key
[string] - AWS Secret Access Key paired with the Access Key ID. Required when using static credentials directly or as base credentials for assuming a role.
aws_access_token
[string] - AWS session token used with temporary credentials. Required only if the Access Key and Secret Key are part of temporary session credentials.
role_arn
[string] - ARN of the IAM role to assume. When specified, the SDK uses the base credentials to call STS AssumeRole and obtain temporary credentials scoped to this role.
role_session_name
[string] - Session name to associate with the STS AssumeRole session. Used only if 'role_arn' is specified. Useful for identifying and auditing the session.
external_id
[string] - External ID required by some roles when assuming them, typically for cross-account access. Used only if 'role_arn' is specified and the role's trust policy requires it.
workgroup
[string] - Athena workgroup to use for query execution. Defaults to 'primary' if not specified.
output_location
[string] - S3 URI where Athena query results should be stored (e.g., s3://your-bucket/athena/results/). Optional if the selected workgroup has a default result configuration.
region
[string] - AWS region where Athena and the result S3 bucket are located (e.g., us-east-1). Defaults to 'us-east-1' if not specified.
allow_host_access
[boolean] - Allow the Athena client to access host environment configurations such as environment variables or local AWS credential files. Defaults to true, enabling use of credentials and settings from the host environment unless explicitly disabled.
# Example: Athena connector configuration
type: connector # Must be `connector` (required)
driver: athena # Must be `athena` _(required)_
aws_access_key_id: "{{ .env.AWS_ACCESS_KEY_ID }}" # AWS Access Key ID for authentication
aws_secret_access_key: "{{ .env.AWS_SECRET_ACCESS_KEY }}" # AWS Secret Access Key for authentication
aws_access_token: "{{ .env.AWS_ACCESS_TOKEN }}" # AWS session token for temporary credentials
role_arn: "arn:aws:iam::123456789012:role/MyRole" # ARN of the IAM role to assume
role_session_name: "MySession" # Session name for STS AssumeRole
external_id: "MyExternalID" # External ID for cross-account access
workgroup: "primary" # Athena workgroup (defaults to 'primary')
output_location: "s3://my-bucket/athena-output/" # S3 URI for query results
region: "us-east-1" # AWS region (defaults to 'us-east-1')
allow_host_access: true # Allow host environment access _(default: true)_
Azure
driver
[string] - Refers to the driver type and must be azure (required)
azure_storage_account
[string] - Azure storage account name (required)
azure_storage_key
[string] - Azure storage access key (required)
azure_storage_sas_token
[string] - Optional Azure SAS token for authentication
azure_storage_connection_string
[string] - Optional Azure connection string for the storage account
path_prefixes
[string, array] - A list of container or virtual directory prefixes that this connector is allowed to access.
Useful when different containers or paths use different credentials, allowing the system
to route access through the appropriate connector based on the blob path.
Example: azure://my-bucket/, azure://my-bucket/path/, azure://my-bucket/path/prefix
allow_host_access
[boolean] - Allow access to host environment configuration
# Example: Azure connector configuration
type: connector # Must be `connector` (required)
driver: azure # Must be `azure` _(required)_
azure_storage_account: "mystorageaccount" # Azure storage account name _(required)_
azure_storage_key: "{{ .env.AZURE_STORAGE_KEY }}" # Azure storage access key _(required)_
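The path_prefixes property can be sketched as follows, restricting a connector to two paths so that other paths can be served by a different connector. The container and path names are placeholders, and writing the prefixes as a YAML list is an assumption based on the [string, array] type.

```yaml
# Hypothetical Azure connector limited to specific path prefixes
type: connector
driver: azure
azure_storage_account: "mystorageaccount"
azure_storage_key: "{{ .env.AZURE_STORAGE_KEY }}"
path_prefixes:                       # Assumed list form of [string, array]
  - "azure://my-bucket/path/"
  - "azure://my-bucket/other-path/"  # Placeholder prefix
allow_host_access: false             # Do not read configuration from the host environment
```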
BigQuery
driver
[string] - Refers to the driver type and must be bigquery (required)
google_application_credentials
[string] - Raw contents of the Google Cloud service account key (in JSON format) used for authentication.
project_id
[string] - Google Cloud project ID
allow_host_access
[boolean] - Enable the BigQuery client to use credentials from the host environment when no service account JSON is provided. This includes Application Default Credentials from environment variables, local credential files, or the Google Compute Engine metadata server. Defaults to true, allowing seamless authentication in GCP environments.
log_queries
[boolean] - Controls whether to log raw SQL queries
max_bytes_billed
[integer] - Maximum number of bytes billed for a query. Queries that exceed this limit fail with an error, which helps prevent unexpectedly high costs from large queries. Setting this is highly recommended when using the on-demand pricing model. The default value is 0, meaning Rill enforces no limit.
allow_standard_api
[boolean] - Allow querying BigQuery using the standard API instead of the Storage Read API. This is less efficient and may lead to higher latency, but can be used as a fallback if the Storage Read API is not available due to insufficient permissions or other issues.
# Example: BigQuery connector configuration
type: connector # Must be `connector` (required)
driver: bigquery # Must be `bigquery` _(required)_
google_application_credentials: "{{ .env.GOOGLE_APPLICATION_CREDENTIALS }}" # Google Cloud service account JSON
project_id: "my-project-id" # Google Cloud project ID
allow_host_access: true # Allow host environment access _(default: true)_
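To make the cost-control properties concrete, here is a sketch that caps billing at 10 GiB per query and permits the standard API. The 10 GiB figure is an arbitrary illustration, not a recommended value.

```yaml
# Hypothetical BigQuery connector with a per-query billing cap
type: connector
driver: bigquery
google_application_credentials: "{{ .env.GOOGLE_APPLICATION_CREDENTIALS }}"
project_id: "my-project-id"
max_bytes_billed: 10737418240   # 10 GiB = 10 * 1024^3 bytes; queries above this fail
allow_standard_api: true        # Permit the standard API when the Storage Read API cannot be used
log_queries: true               # Log raw SQL queries
```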
ClickHouse
driver
[string] - Refers to the driver type and must be clickhouse (required)
managed
[boolean] - true means Rill will provision the connector using the default provisioner. false disables automatic provisioning.
mode
[string] - Controls the operation mode for the ClickHouse connection. Defaults to 'read' for safe operation with external databases. Set to 'readwrite' to enable model creation and table mutations. Note: When 'managed: true', this is automatically set to 'readwrite'.
dsn
[string] - DSN (Data Source Name) for the ClickHouse connection
username
[string] - Username for authentication
password
[string] - Password for authentication
host
[string] - Host where the ClickHouse instance is running
port
[integer] - Port where the ClickHouse instance is accessible
database
[string] - Name of the ClickHouse database within the cluster
ssl
[boolean] - Indicates whether a secured SSL connection is required
cluster
[string] - Cluster name, required for running distributed queries
sync_replicas
[boolean] - Controls whether to run SYSTEM SYNC REPLICA before replacing partitions on a replicated table in a cluster, ensuring all inserted parts are visible across replicas before the partition swap. Defaults to true
write_dsn
[string] - Separate connection string for write operations
database_whitelist
[string] - Comma-separated list of databases to show
log_queries
[boolean] - Controls whether to log raw SQL queries
query_settings_override
[string] - Overrides the default settings used in queries. Changing the default settings can lead to incorrect query results and is generally not recommended. If you need to add settings, use query_settings instead.
query_settings
[string] - Settings applied to dashboard queries. Separate multiple settings with commas, for example: max_threads = 8, max_memory_usage = 10000000000. If query_settings_override is set, it takes precedence and these settings are ignored.
embed_port
[integer] - Port to run ClickHouse locally (0 for random port)
can_scale_to_zero
[boolean] - Indicates if the database can scale to zero
max_open_conns
[integer] - Maximum number of open connections to the database
max_idle_conns
[integer] - Maximum number of idle connections in the pool
dial_timeout
[string] - Timeout for dialing the ClickHouse server
conn_max_lifetime
[string] - Maximum time a connection may be reused
read_timeout
[string] - Maximum time for a connection to read data
# Example: ClickHouse connector configuration
type: connector # Must be `connector` (required)
driver: clickhouse # Must be `clickhouse` _(required)_
managed: false # Set to true to have Rill provision the connector; false disables automatic provisioning
mode: "readwrite" # Enable model creation and table mutations
username: "myusername" # Username for authentication
password: "{{ .env.CLICKHOUSE_PASSWORD }}" # Password for authentication
host: "localhost" # Hostname of the ClickHouse server
port: 9000 # Port number of the ClickHouse server
database: "mydatabase" # Name of the ClickHouse database
ssl: true # Enable SSL for secure connection
cluster: "mycluster" # Cluster name
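A DSN-based variant can be sketched as below, combined with the query_settings example values quoted above. The CLICKHOUSE_DSN variable name is hypothetical, and keeping the whole DSN in an environment variable is a design choice that keeps the embedded credentials out of the YAML file.

```yaml
# Hypothetical read-only ClickHouse connector configured through a DSN
type: connector
driver: clickhouse
dsn: "{{ .env.CLICKHOUSE_DSN }}"   # Hypothetical variable holding the full connection string
mode: "read"                        # Default mode; no model creation or table mutations
query_settings: "max_threads = 8, max_memory_usage = 10000000000"   # Comma-separated settings
log_queries: true
```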
Databricks
driver
[string] - Refers to the driver type and must be databricks (required)
host
[string] - Host where the Databricks instance is running
http_path
[string] - HTTP path of the Databricks SQL warehouse endpoint
token
[string] - Personal Access Token used for authentication
catalog
[string] - Default catalog name. Optional.
schema
[string] - Default schema name. Optional.
dsn
[string] - DSN (Data Source Name) for the Databricks connection.
This is intended for advanced configuration where you want to specify
properties that are not explicitly defined above.
It can only be used when the other connection fields (host, http_path, token, catalog, schema) are not used.
Refer to https://github.com/databricks/databricks-sql-go for the full list of supported DSN parameters and their formats.
log_queries
[boolean] - Controls whether to log raw SQL queries
# Example: Databricks connector configuration
type: connector # Must be `connector` (required)
driver: databricks # Must be `databricks` _(required)_
host: "my-databricks-instance.cloud.databricks.com" # Hostname of the Databricks instance
http_path: "/sql/1.0/endpoints/1234567890abcdef" # HTTP path for the Databricks SQL warehouse endpoint
token: "{{ .env.DATABRICKS_TOKEN }}" # Personal Access Token for authentication
catalog: "my_catalog" # Default catalog name (optional)
schema: "my_schema" # Default schema name (optional)
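Because dsn cannot be combined with host, http_path, token, catalog, or schema, a DSN-based connector omits those fields entirely. The sketch below assumes a hypothetical DATABRICKS_DSN variable whose value follows the databricks-sql-go DSN format.

```yaml
# Hypothetical Databricks connector configured through a DSN only
type: connector
driver: databricks
dsn: "{{ .env.DATABRICKS_DSN }}"   # Hypothetical variable; do not set host, http_path, token, catalog, or schema
log_queries: false
```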
Druid
driver
[string] - Refers to the driver type and must be druid (required)
dsn
[string] - Data Source Name (DSN) for connecting to Druid
username
[string] - Username for authenticating with Druid
password
[string] - Password for authenticating with Druid
host
[string] - Hostname of the Druid coordinator or broker
port
[integer] - Port number of the Druid service
ssl
[boolean] - Enable SSL for secure connection
log_queries
[boolean] - Log raw SQL queries sent to Druid
max_open_conns
[integer] - Maximum number of open database connections (0 = default, -1 = unlimited)
skip_version_check
[boolean] - Skip checking Druid version compatibility
# Example: Druid connector configuration
type: connector # Must be `connector` (required)
driver: druid # Must be `druid` _(required)_
username: "myusername" # Username for authentication
password: "{{ .env.DRUID_PASSWORD }}" # Password for authentication
host: "localhost" # Hostname of the Druid coordinator or broker
port: 8082 # Port number of the Druid service
ssl: true # Enable SSL for secure connection
DuckDB
driver
[string] - Must be "duckdb" (required)
mode
[string] - Set the mode for the DuckDB connection.
path
[string] - Path to external DuckDB database
attach
[string] - Full ATTACH statement to attach a DuckDB database
pool_size
[integer] - Number of concurrent connections and queries allowed
cpu
[integer] - Number of CPU cores available to the database
memory_limit_gb
[integer] - Amount of memory in GB available to the database
read_write_ratio
[number] - Ratio of resources allocated to read vs write operations
allow_host_access
[boolean] - Whether access to local environment and file system is allowed
init_sql
[string] - SQL executed during database initialization
conn_init_sql
[string] - SQL executed when a new connection is initialized
boot_queries
[string] - Deprecated - Use init_sql instead
log_queries
[boolean] - Whether to log raw SQL queries executed through OLAP
create_secrets_from_connectors
[string, array] - List of connector names for which temporary secrets should be created before executing the SQL.
database_name
[string] - Name of the attached DuckDB database (auto-detected if not set)
schema_name
[string] - Default schema used by the DuckDB database
# Example: DuckDB connector configuration
type: connector # Must be `connector` (required)
driver: duckdb # Must be `duckdb` _(required)_
mode: "readwrite" # Set the mode for the DuckDB connection.
allow_host_access: true # Whether access to the local environment and file system is allowed
cpu: 4 # Number of CPU cores available to the database
memory_limit_gb: 16 # Amount of memory in GB available to the database
pool_size: 5 # Number of concurrent connections and queries allowed
read_write_ratio: 0.7 # Ratio of resources allocated to read vs write operations
init_sql: "INSTALL httpfs; LOAD httpfs;" # SQL executed during database initialization
log_queries: true # Whether to log raw SQL queries executed through OLAP
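The create_secrets_from_connectors property can be sketched as follows. The connector names my_gcs and my_azure are hypothetical and stand for connectors defined elsewhere in the same project; the list form is assumed from the [string, array] type.

```yaml
# Hypothetical DuckDB connector that creates temporary secrets from other connectors
type: connector
driver: duckdb
create_secrets_from_connectors:   # Assumed list form of [string, array]
  - my_gcs                        # Hypothetical connector name
  - my_azure                      # Hypothetical connector name
init_sql: "INSTALL httpfs; LOAD httpfs;"   # SQL executed during database initialization
```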
External DuckDB
driver
[string] - Refers to the driver type and must be duckdb (required)
path
[string] - Path to the DuckDB database
mode
[string] - Set the mode for the DuckDB connection.
# Example: DuckDB as a source connector configuration
type: connector # Must be `connector` (required)
driver: duckdb # Must be `duckdb` _(required)_
path: "/path/to/my-duckdb-database.db" # Path to the DuckDB database
mode: "read" # Set the mode for the DuckDB connection.
GCS
driver
[string] - Refers to the driver type and must be gcs (required)
google_application_credentials
[string] - Google Cloud credentials JSON string
key_id
[string] - Optional S3-compatible Key ID when used in compatibility mode
secret
[string] - Optional S3-compatible Secret when used in compatibility mode
path_prefixes
[string, array] - A list of bucket path prefixes that this connector is allowed to access.
Useful when different buckets or bucket prefixes use different credentials,
allowing the system to select the appropriate connector based on the bucket path.
Example: gs://my-bucket/, gs://my-bucket/path/, gs://my-bucket/path/prefix
allow_host_access
[boolean] - Allow access to host environment configuration
# Example: GCS connector configuration
type: connector # Must be `connector` (required)
driver: gcs # Must be `gcs` _(required)_
google_application_credentials: "{{ .env.GOOGLE_APPLICATION_CREDENTIALS }}" # Google Cloud credentials JSON string
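As a sketch of the S3-compatibility properties, the connector below authenticates with a key ID and secret instead of a service account JSON and is scoped to a single bucket path. The environment variable names are hypothetical, and the list form of path_prefixes is assumed from its [string, array] type.

```yaml
# Hypothetical GCS connector using S3-compatible credentials
type: connector
driver: gcs
key_id: "{{ .env.GCS_KEY_ID }}"    # Hypothetical variable name
secret: "{{ .env.GCS_SECRET }}"    # Hypothetical variable name
path_prefixes:
  - "gs://my-bucket/path/"         # Only this prefix is routed through this connector
```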
HTTPS
driver
[string] - Refers to the driver type and must be https (required)
headers
[object] - HTTP headers to include in the request
path_prefixes
[string, array] - A list of HTTP/HTTPS URL prefixes that this connector is allowed to access.
Useful when different URL namespaces use different credentials, enabling the
system to choose the appropriate connector based on the URL path.
Example: https://example.com/, https://example.com/path/, https://example.com/path/prefix
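The headers and path_prefixes properties can be combined so that a token is only sent to URLs under a given prefix. This is a sketch under that reading of path_prefixes; the URL is a placeholder and the list form is assumed from the [string, array] type.

```yaml
# Hypothetical HTTPS connector whose credentials apply to one URL prefix
type: connector
driver: https
headers:
  Authorization: "Bearer {{ .env.HTTPS_TOKEN }}"   # Nested under `headers` as a key-value pair
path_prefixes:
  - "https://example.com/path/"                    # Placeholder prefix
```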
# Example: HTTPS connector configuration
type: connector # Must be `connector` (required)
driver: https # Must be `https` _(required)_
headers:
  "Authorization": 'Bearer {{ .env.HTTPS_TOKEN }}' # HTTP headers to include in the request
MotherDuck
driver
[string] - Refers to the driver type and must be duckdb (required)
path
[string] - Path to your MD database (required)
schema_name
[string] - Define your schema if not main, uses main by default
token
[string] - MotherDuck token (required)
init_sql
[string] - SQL executed during database initialization.
mode
[string] - Set the mode for the MotherDuck connection. By default, it is set to 'read' which allows only read operations. Set to 'readwrite' to enable model creation and table mutations.
create_secrets_from_connectors
[string, array] - List of connector names for which temporary secrets should be created before executing the SQL.
# Example: MotherDuck connector configuration
type: connector # Must be `connector` (required)
driver: duckdb # Must be `duckdb` _(required)_
token: "{{ .env.MOTHERDUCK_TOKEN }}" # Set the MotherDuck token from your .env file _(required)_
path: "md:my_database" # Path to your MD database
schema_name: "my_schema" # Define your schema if not main, uses main by default
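Since the MotherDuck connection is read-only by default, enabling model creation requires setting mode explicitly. A minimal sketch, reusing the placeholder database name from the example above:

```yaml
# Hypothetical MotherDuck connector with write access enabled
type: connector
driver: duckdb
token: "{{ .env.MOTHERDUCK_TOKEN }}"
path: "md:my_database"
mode: "readwrite"   # Default is 'read'; 'readwrite' enables model creation and table mutations
```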
MySQL
driver
[string] - Refers to the driver type and must be mysql (required)
dsn
[string] - Data Source Name (DSN) for the MySQL connection, provided in MySQL URI format. The DSN must follow the standard MySQL URI scheme:
mysql://user:password@host:3306/my-db
Rules for special characters in password:
- The following characters are allowed unescaped in the URI: ~ . _ -
- All other special characters must be percent-encoded (%XX format).
Examples:
mysql://user:pa%40ss@localhost:3306/my-db # password contains '@'
mysql://user:pa%3Ass@localhost:3306/my-db # password contains ':'
host
[string] - Hostname of the MySQL server
port
[integer] - Port number for the MySQL server
database
[string] - Name of the MySQL database
user
[string] - Username for authentication
password
[string] - Password for authentication
ssl-mode
[string] - SSL mode. Options: disabled, preferred, or required.
log_queries
[boolean] - Controls whether to log raw SQL queries
# Example: MySQL connector configured using individual properties
type: connector
driver: mysql
host: localhost
port: 3306
database: mydb
user: user
password: "{{ .env.MYSQL_PASSWORD }}"
ssl-mode: preferred
# Example: MySQL connector configured using dsn
type: connector
driver: mysql
dsn: "{{ .env.MYSQL_DSN }}" # Define DSN in .env file
OpenAI
driver
[string] - The driver type, must be set to "openai"
api_key
[string] - API key for connecting to OpenAI (required)
model
[string] - The OpenAI model to use (e.g., 'gpt-4o')
max_output_tokens
[number] - Maximum number of tokens to generate in the completion (default: 8192)
reasoning_effort
[string] - Constrains effort on reasoning for reasoning models (e.g., 'low', 'medium', 'high')
base_url
[string] - The base URL for the OpenAI API (e.g., 'https://api.openai.com/v1')
api_type
[string] - The type of OpenAI API to use
api_version
[string] - The version of the OpenAI API to use (e.g., '2023-05-15'). Required when API Type is AZURE or AZURE_AD
# Example: OpenAI connector configuration
type: connector # Must be `connector` (required)
driver: openai # Must be `openai` _(required)_
api_key: "{{ .env.OPENAI_API_KEY }}" # API key for connecting to OpenAI
model: "gpt-4o" # The OpenAI model to use (e.g., 'gpt-4o')
max_output_tokens: 8192 # Maximum number of tokens to generate in the completion (default: 8192)
reasoning_effort: "medium" # Constrains effort on reasoning for reasoning models (e.g., 'low', 'medium', 'high')
base_url: "https://api.openai.com/v1" # The base URL for the OpenAI API (e.g., 'https://api.openai.com/v1')
api_type: "openai" # The type of OpenAI API to use
api_version: "2023-05-15" # The version of the OpenAI API to use (e.g., '2023-05-15'). Required when API Type is AZURE or AZURE_AD
Claude
driver
[string] - The driver type, must be set to "claude"
api_key
[string] - API key for connecting to Claude (required)
model
[string] - The Claude model to use (e.g., 'claude-opus-4-5')
max_tokens
[number] - Maximum number of tokens in the response (e.g., 8192)
temperature
[number] - Sampling temperature to use (e.g., 0.0)
base_url
[string] - The base URL for the Claude API
# Example: Claude connector configuration
type: connector
driver: claude
api_key: "{{ .env.claude_api_key }}"
model: claude-opus-4-5
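The optional tuning properties can be added to the same configuration. The values below are the example values quoted in the property descriptions, not recommended defaults.

```yaml
# Hypothetical Claude connector with optional response settings
type: connector
driver: claude
api_key: "{{ .env.claude_api_key }}"
model: claude-opus-4-5
max_tokens: 8192    # Maximum number of tokens in the response
temperature: 0.0    # Sampling temperature
```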
Gemini
driver
[string] - The driver type, must be set to "gemini"
api_key
[string] - API key for connecting to Gemini (required)
model
[string] - The Gemini model to use (e.g., 'gemini-2.5-pro-preview-05-06')
include_thoughts
[boolean] - Whether to include thinking/reasoning in the response
thinking_level
[string] - Level of 'thinking' for the model's response (e.g., 'MINIMAL', 'LOW', 'MEDIUM', 'HIGH'). Default is 'LOW'.
max_output_tokens
[number] - Maximum number of tokens in the response (e.g., 8192)