Amazon S3

Overview

Amazon S3 is a scalable, fully managed, and highly reliable object storage solution offered by AWS, designed to store and access data from anywhere in the world. It provides a secure and cost-effective way to store data, including common storage formats such as CSV and Parquet. Rill supports S3 natively: given the S3 URI of your bucket, it retrieves and reads the files.

Authentication Methods

To connect to Amazon S3, you can choose from four authentication options:

  1. Access Key/Secret Key (recommended for cloud deployment)
  2. IAM Role Assumption (enhanced security with temporary credentials)
  3. Public (for publicly accessible buckets - no authentication required)
  4. Local AWS credentials (local development only - not recommended for production)

S3-Compatible Storage

You can also connect to S3-compatible storage services by specifying a custom endpoint in your connector configuration.
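
As a rough sketch of what such a connector file might look like, the snippet below writes a hypothetical connectors/my_s3_compatible.yaml. The endpoint property name, the file name, and the example URL are assumptions rather than confirmed values, so check the connector YAML reference for the exact keys your Rill version supports.

```shell
# Sketch only: the `endpoint` key and the URL below are assumptions.
mkdir -p connectors
cat > connectors/my_s3_compatible.yaml <<'EOF'
type: connector
driver: s3

# Assumed property name for the custom endpoint of an S3-compatible service
endpoint: "https://storage.example.com"
aws_access_key_id: "{{ .env.connector.s3.aws_access_key_id }}"
aws_secret_access_key: "{{ .env.connector.s3.aws_secret_access_key }}"
EOF
```

The credential lines reuse the same .env templating shown in the Access Key/Secret Key connector, since S3-compatible services typically issue an access key pair as well.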

Using the Add Data UI

When you add an S3 data model through the Rill UI, you'll see two authentication options:

  • Access Key/Secret Key: The process follows two steps:

    1. Configure Authentication - Set up your S3 connector with credentials
    2. Configure Data Model - Define which bucket and objects to ingest

    The UI will automatically create both the connector file and model file for you.
  • Public: For publicly accessible buckets, you skip the connector creation step and go directly to:

    1. Configure Data Model - Define which bucket and objects to ingest

    The UI will only create the model file (no connector file is needed).

Manual Configuration Only

IAM Role Assumption and Local AWS credentials are only available through manual configuration. See Method 2: IAM Role Assumption and Method 4: Local AWS Credentials for setup instructions.


Method 1: Access Key/Secret Key

Access Key/Secret Key credentials provide reliable authentication for S3. This method works for both local development and Rill Cloud deployments.

Using the UI

  1. Click Add Data in your Rill project
  2. Select Amazon S3 as the data model type
  3. In the authentication step:
    • Choose Access Key/Secret Key
    • Enter your Access Key ID
    • Enter your Secret Access Key
    • Name your connector (e.g., my_s3)
  4. In the data model configuration step:
    • Enter your bucket name and object path
    • Configure other model settings as needed
  5. Click Create to finalize

The UI will automatically create both the connector file and model file for you.

Manual Configuration

If you prefer to configure manually, create a connector file and a model file, then add your credentials to .env:

Step 1: Create connector configuration

Create connectors/my_s3.yaml:

type: connector
driver: s3

aws_access_key_id: "{{ .env.connector.s3.aws_access_key_id }}"
aws_secret_access_key: "{{ .env.connector.s3.aws_secret_access_key }}"

Step 2: Create model configuration

Create models/my_s3_data.yaml:

type: model
connector: duckdb
create_secrets_from_connectors: my_s3

sql: SELECT * FROM read_parquet('s3://my-bucket/path/to/data/*.parquet')

# Add a refresh schedule
refresh:
  cron: "0 */6 * * *"

Step 3: Add credentials to .env

connector.s3.aws_access_key_id=your_access_key_id
connector.s3.aws_secret_access_key=your_secret_access_key

Did you know?

If this project has already been deployed to Rill Cloud and credentials have been set for this connector, you can use rill env pull to pull these cloud credentials into your local .env file. Note that this may overwrite any credentials you have set locally for this connector.


Method 2: IAM Role Assumption

Rill supports AWS IAM role assumption for enhanced security. This method allows Rill to temporarily assume an IAM role to access S3 resources. This method is only available through manual configuration.

Benefits of Using IAM Roles

  • Temporary Credentials: No need to manage long-lived access keys.
  • Enhanced Security: Follows the principle of least privilege.
  • Cross-Account Access: Access S3 resources in different AWS accounts.
  • Centralized Control: Manage permissions through IAM roles and policies.

Manual Configuration

Step 1: Create connector configuration

Create connectors/my_s3_role.yaml:

type: connector
driver: s3

aws_role_arn: "{{ .env.connector.s3.aws_role_arn }}"
aws_external_id: "{{ .env.connector.s3.aws_external_id }}"

Step 2: Create model configuration

Create models/my_s3_data.yaml:

type: model
connector: duckdb
create_secrets_from_connectors: my_s3_role

sql: SELECT * FROM read_parquet('s3://my-bucket/path/to/data/*.parquet')

refresh:
  cron: "0 */6 * * *"

Step 3: Add credentials to .env

connector.s3.aws_role_arn=arn:aws:iam::123456789012:role/RillDataAccess
connector.s3.aws_external_id=your_external_id

Method 3: Public Buckets

For publicly accessible S3 buckets, you don't need to create a connector. Simply use the S3 URI directly in your model configuration.

Using the UI

  1. Click Add Data in your Rill project
  2. Select Amazon S3 as the data model type
  3. In the authentication step:
    • Choose Public
    • The UI will skip connector creation and proceed directly to data model configuration
  4. In the data model configuration step:
    • Enter your bucket name and object path
    • Configure other model settings as needed
  5. Click Create to finalize

The UI will only create the model file (no connector file is created).

Manual Configuration

For public buckets, you only need to create a model file. No connector configuration is required.

Create models/my_s3_data.yaml:

type: model
connector: duckdb

sql: SELECT * FROM read_parquet('s3://my-public-bucket/path/to/data/*.parquet')

refresh:
  cron: "0 */6 * * *"

Public Access Only

This method only works with publicly accessible buckets. Most production S3 buckets are private and require authentication.


Method 4: Local AWS Credentials (Local Development Only)

For local development, you can use credentials from the AWS CLI. This method is not suitable for production or Rill Cloud deployments. This method is only available through manual configuration, and you don't need to create a connector file.

Setup

  1. Install the AWS CLI if not already installed
  2. Authenticate with your AWS account through the AWS CLI
  3. Create your model file (no connector needed)
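
Step 2 above does not show a command. As a hedged sketch: IAM users with access keys commonly run aws configure, and profiles backed by IAM Identity Center commonly run aws sso login. Which one applies depends on how your AWS account is set up. The helper below, check_aws_login, is a hypothetical name introduced here; it only reports whether the AWS CLI can currently resolve working credentials.

```shell
# Hypothetical helper: report whether the AWS CLI can resolve working credentials.
check_aws_login() {
  if ! command -v aws >/dev/null 2>&1; then
    echo "aws-cli-missing"
    return 1
  fi
  # sts get-caller-identity succeeds only if valid credentials are found
  if aws sts get-caller-identity >/dev/null 2>&1; then
    echo "authenticated"
  else
    echo "not-authenticated"
    return 1
  fi
}

# To authenticate, run one of these interactively first:
#   aws configure    # prompts for an IAM user's access key ID and secret
#   aws sso login    # for profiles set up with IAM Identity Center (SSO)
check_aws_login || echo "Authenticate with the AWS CLI, then retry."
```

If the helper prints authenticated, the model file in the next section should be able to read from buckets that identity can access.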

Model Configuration

Create models/my_s3_data.yaml:

type: model
connector: duckdb

sql: SELECT * FROM read_parquet('s3://my-bucket/path/to/data/*.parquet')

refresh:
  cron: "0 */6 * * *"

Rill will automatically detect and use your local AWS CLI credentials when no connector is specified.

Warning

This method only works for local development. Deploying to Rill Cloud with this configuration will fail because the cloud environment doesn't have access to your local credentials. Always use Access Key/Secret Key or IAM Role Assumption for production deployments.

Using S3 Data in Models

Once your connector is configured (public buckets need no connector), you can reference S3 paths in your model SQL queries using DuckDB's file-reading functions, such as read_parquet.

Basic Example

With a connector (authenticated):

type: model
connector: duckdb

sql: SELECT * FROM read_parquet('s3://my-bucket/data/*.parquet')

refresh:
  cron: "0 */6 * * *"

Public bucket (no connector needed):

type: model
connector: duckdb

sql: SELECT * FROM read_parquet('s3://my-public-bucket/data/*.parquet')

refresh:
  cron: "0 */6 * * *"

Path Patterns

You can use wildcards to read multiple files:

-- Single file
SELECT * FROM read_parquet('s3://my-bucket/data/file.parquet')

-- All files in a directory
SELECT * FROM read_parquet('s3://my-bucket/data/*.parquet')

-- All files in nested directories
SELECT * FROM read_parquet('s3://my-bucket/data/**/*.parquet')

-- Files matching a pattern
SELECT * FROM read_parquet('s3://my-bucket/data/2024-*.parquet')

Deploy to Rill Cloud

When deploying a project to Rill Cloud, Rill requires you to explicitly provide an access key and secret for an AWS service account with access to the S3 buckets used in your project. Please refer to our connector YAML reference docs for more information.

If you subsequently add sources that require new credentials (or if you entered the wrong credentials during the initial deploy), you can update the credentials by clicking the Deploy button to update your project or by running the following command in the CLI:

rill env push

Appendix

How to create an AWS service account using the AWS Management Console

Here is a step-by-step guide on how to create an AWS service account with read-only access to S3:

  1. Log in to the AWS Management Console and navigate to the IAM dashboard.

  2. In the sidebar, select "Users" and click the "Add users" button.

  3. Enter a username for the service account and click "Next".

  4. Select "Attach policies directly" and grant the service account read access to data in S3:

    • To grant access to data in all buckets, search for the "AmazonS3ReadOnlyAccess" policy. Check the box next to the policy to select it.
    • To only grant access to data in a specific bucket, follow these steps:
      1. Click the "Create policy" button in the top right corner of the "Permissions policies" box.
      2. Select the "JSON" tab in the top right corner of the "Policy editor".
      3. Paste the following policy and replace [BUCKET_NAME] with the name of your bucket:
        {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Action": [
                "s3:GetObject",
                "s3:ListBucket"
              ],
              "Resource": [
                "arn:aws:s3:::[BUCKET_NAME]",
                "arn:aws:s3:::[BUCKET_NAME]/*"
              ]
            }
          ]
        }
      4. Click "Next".
      5. Give the policy a name and click "Create policy".
      6. Go back to the service account creation flow. Click the refresh button next to the "Create policy" button.
      7. Search for the policy you just created. Check the box next to the policy to select it.
  5. After attaching a policy, click "Next". Then, under "Set permissions boundaries and tags", click the "Create user" button.

  6. On the "Users" page, navigate to the newly created user and go to the "Security credentials" tab.

  7. Under the "Access keys" section, click "Create access key".

  8. On the "Access key best practices & alternatives" screen, select "Third-party service", confirm the checkbox, and click "Next".

  9. On the "Set description tag" screen, optionally enter a description, and click "Create access key".

  10. Note down the "Access key" and "Secret access key" values for the service account. (Hint: Click the ❐ icon next to the secrets to copy them to the clipboard.)

How to create an AWS service account using the aws CLI

Here is a step-by-step guide on how to create an AWS service account with read-only access to S3 using the AWS CLI:

  1. Open a terminal window and install the AWS CLI if it is not already installed on your system.

  2. Run the following command to create a new user (optionally replace rill-service-account with a name of your choice):

    aws iam create-user --no-cli-pager --user-name rill-service-account
  3. Grant the user read access to data in S3:

    • To grant access to data in all buckets, run the following command:

      aws iam attach-user-policy \
      --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess \
      --user-name rill-service-account
    • To only grant access to data in a specific bucket:

      1. Create a custom policy by running the following command, replacing [POLICY_NAME] with a custom name and [BUCKET_NAME] with the bucket name:

        aws iam create-policy \
          --policy-name [POLICY_NAME] \
          --policy-document \
          '{
            "Version": "2012-10-17",
            "Statement": [
              {
                "Effect": "Allow",
                "Action": [
                  "s3:GetObject",
                  "s3:ListBucket"
                ],
                "Resource": [
                  "arn:aws:s3:::[BUCKET_NAME]",
                  "arn:aws:s3:::[BUCKET_NAME]/*"
                ]
              }
            ]
          }'
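      2. Attach the custom policy to the user by running the following command, replacing [ACCOUNT_ID] with your AWS account ID and [POLICY_NAME] with the custom name set in the previous step. Customer-managed policies live under your own account ID rather than under aws; the full ARN also appears in the output of the create-policy command:

        aws iam attach-user-policy \
        --policy-arn arn:aws:iam::[ACCOUNT_ID]:policy/[POLICY_NAME] \
        --user-name rill-service-account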
  4. Run the following command to create an access key pair for the user:

    aws iam create-access-key --user-name rill-service-account
  5. Note down the AccessKeyId and SecretAccessKey values in the returned JSON object. If the output opens in a pager, press "q" to exit it.
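
The two values from the last step are the ones that go into the .env entries used by the Access Key/Secret Key connector. A minimal sketch of that mapping follows. The helper name to_rill_env is hypothetical, and the key values are placeholders; it assumes the key pair arrives as a single tab-separated line, which is how the AWS CLI prints a two-element --query result with --output text.

```shell
# Hypothetical helper: turn "ACCESS_KEY_ID<TAB>SECRET" into Rill .env lines.
to_rill_env() {
  printf '%s\n' "$1" | awk -F'\t' '{
    print "connector.s3.aws_access_key_id=" $1
    print "connector.s3.aws_secret_access_key=" $2
  }'
}

# Placeholder values for illustration. In practice the input could come from:
#   aws iam create-access-key --user-name rill-service-account \
#     --query 'AccessKey.[AccessKeyId,SecretAccessKey]' --output text
to_rill_env "$(printf 'AKIAEXAMPLE\texampleSecret')"
```

Redirecting the output with >> .env appends both lines to your local environment file. Keep that file out of version control, since it holds a long-lived secret.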

How to create an IAM role for cross-account access with a Rill-provided AWS account

To set up an IAM role that grants Rill's AWS account access to your S3 buckets:

  1. Log in to the AWS Management Console of your account that owns the S3 bucket (your resource account).

  2. Navigate to the IAM dashboard.

  3. In the sidebar, select "Roles" and click the "Create role" button.

  4. For the trusted entity type, select "AWS account" and choose "Another AWS account".

  5. Enter Rill's AWS account ID that was provided to you by your Rill representative.

  6. Select "Require external ID" and enter the External ID provided by Rill. This helps prevent the confused deputy problem.

  7. Click "Next: Permissions".

  8. Attach policies that grant the necessary S3 access permissions:

    • For read-only access to all buckets, select "AmazonS3ReadOnlyAccess"
    • For more restricted access, create a custom policy similar to the one described in the previous sections, limiting access to specific buckets.
  9. Click "Next: Tags", add optional tags if desired, and then click "Next: Review".

  10. Give the role a descriptive name (e.g., "RillDataAccess") and an optional description.

  11. Click "Create role".

  12. After creating the role, click on it to view its details.

  13. Note the "Role ARN" value which looks like: arn:aws:iam::123456789012:role/RillDataAccess

  14. Share this Role ARN with your Rill representative to complete the setup. Rill will configure its systems to assume this role when accessing your data.
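
For readers who prefer the AWS CLI, the console steps above could be approximated as follows. This is a sketch under stated assumptions, not a procedure confirmed by Rill: the bracketed placeholders stand for the account ID and External ID you receive from your Rill representative, the APPLY switch is a hypothetical safeguard introduced here, and the trust policy mirrors what the console typically generates for "Another AWS account" with "Require external ID" selected.

```shell
# Placeholders: replace with the values provided by your Rill representative.
RILL_ACCOUNT_ID="${RILL_ACCOUNT_ID:-[RILL_ACCOUNT_ID]}"
EXTERNAL_ID="${EXTERNAL_ID:-[EXTERNAL_ID]}"
ROLE_NAME="${ROLE_NAME:-RillDataAccess}"

# Trust policy: lets the other account assume the role only with the external ID.
cat > trust-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::${RILL_ACCOUNT_ID}:root" },
      "Action": "sts:AssumeRole",
      "Condition": { "StringEquals": { "sts:ExternalId": "${EXTERNAL_ID}" } }
    }
  ]
}
EOF

# Only call AWS when explicitly requested, so the file can be reviewed first.
if [ "${APPLY:-0}" = "1" ]; then
  # Create the role with the trust policy written above
  aws iam create-role \
    --role-name "$ROLE_NAME" \
    --assume-role-policy-document file://trust-policy.json
  # Grant read-only access to all buckets; swap in a custom policy to restrict it
  aws iam attach-role-policy \
    --role-name "$ROLE_NAME" \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
fi
```

Running the script without APPLY=1 only writes trust-policy.json, which lets you inspect the policy before anything is created. The create-role output includes the role's Arn, which is the value to share in step 14.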