Skip to main content

Amazon S3

Overview

Amazon S3 is a scalable, fully managed, and highly reliable object storage solution offered by AWS, designed to store and access data from anywhere in the world. It provides a secure and cost-effective way to store data, including common storage formats for data such as CSV and parquet. Rill supports natively connecting to S3 using the provided S3 URI of your bucket to retrieve and read files.

Connecting to S3

Local credentials

When using Rill Developer on your local machine (i.e. rill start), Rill uses the credentials configured in your local environment using the AWS CLI.

To check if you already have the AWS CLI installed and authenticated, open a terminal window and run:

aws iam get-user --no-cli-pager
note

The above command only works with AWS CLI version 2 and above.

If it prints information about your user, there is nothing more to do. Rill will be able to connect to any data in S3 that you have access to.

If you do not have the AWS CLI installed and authenticated, follow these steps:

  1. Open a terminal window and install the AWS CLI if it is not already installed on your system.

  2. If your organization has SSO configured, reach out to your admin for instructions on how to authenticate using aws sso login.

  3. If your organization does not have SSO configured:

    a. Follow the steps described under How to create an AWS service account using the AWS Management Console, which you will find below on this page.

    b. Run the following command and provide the access key, access secret, and default region when prompted (you can leave the "Default output format" blank):

    aws configure

You have now configured AWS access from your local environment. Rill will detect and use your credentials next time you try to ingest a source.

Did you know?

If this project has already been deployed to Rill Cloud and credentials have been set for this source, you can use rill env pull to pull these cloud credentials locally (into your local .env file). Please note that this may override any credentials that you have set locally for this source.

Cloud deployment

When deploying a project to Rill Cloud (i.e. rill deploy), Rill requires an access and secret key to be explicitly provided for an AWS service account with appropriate read access / permissions to the S3 buckets used in your project.

When you first deploy a project using rill deploy, you will be prompted to provide credentials for the remote sources in your project that require authentication.

If you subsequently add sources that require new credentials (or if you input the wrong credentials during the initial deploy), you can update the credentials used by Rill Cloud by running:

rill env configure
info

Note that you must cd into the Git repository that your project was deployed from before running rill env configure.

Did you know?

If you've configured credentials locally already (in your <RILL_HOME>/.home file), you can use rill env push to push these credentials to your Rill Cloud project. This will allow other users to retrieve / reuse the same credentials automatically by running rill env pull.

Appendix

How to create an AWS service account using the AWS Management Console

Here is a step-by-step guide on how to create an AWS service account with read-only access to S3:

  1. Log in to the AWS Management Console and navigate to the IAM dashboard.

  2. In the sidebar, select "Users" and click the "Add users" button.

  3. Enter a username for the service account and click "Next".

  4. Select "Attach policies directly" and grant the service account read access to data in S3:

    • To grant access to data in all buckets, search for the "AmazonS3ReadOnlyAccess" policy. Check the box next to the policy to select it.
    • To only grant access to data in a specific bucket, follow these steps:
      1. Click the "Create policy" button in the top right corner of the "Permissions policies" box.
      2. Select the "JSON" tab in the top right corner of the "Policy editor".
      3. Paste the following policy and replace [BUCKET_NAME] with the name of your bucket:
        {
        "Version": "2012-10-17",
        "Statement": [
        {
        "Effect": "Allow",
        "Action": [
        "s3:GetObject",
        "s3:ListBucket"
        ],
        "Resource": [
        "arn:aws:s3:::[BUCKET_NAME]",
        "arn:aws:s3:::[BUCKET_NAME]/*"
        ]
        }
        ]
        }
      4. Click "Next".
      5. Give the policy a name and click "Create policy".
      6. Go back to the service account creation flow. Click the refresh button next to the "Create policy" button.
      7. Search for the policy you just created. Check the box next to the policy to select it.
  5. After attaching a policy, click "Next". Then, under "Set permissions boundaries and tags", click the "Create user" button.

  6. On the "Users" page, navigate to the newly created user and go to the "Security credentials" tab.

  7. Under the "Access keys" section, click "Create access key".

  8. On the "Access key best practices & alternatives" screen, select "Third-party service", confirm the checkbox, and click "Next".

  9. On the "Set description tag" screen, optionally enter a description, and click "Create access key".

  10. Note down the "Access key" and "Secret access key" values for the service account. (Hint: Click the ❐ icon next to the secrets to copy them to the clipboard.)

How to create an AWS service account using the aws CLI

Here is a step-by-step guide on how to create an AWS service account with read-only access to S3 using the AWS CLI:

  1. Open a terminal window and install the AWS CLI if it is not already installed on your system.

  2. Run the following command to create a new user (optionally replace rill-service-account with a name of your choice):

    aws iam create-user --no-cli-pager --user-name rill-service-account
  3. Grant the user read access to data in S3:

    • To grant access to data in all buckets, run the following command:

      aws iam attach-user-policy \
      --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess \
      --user-name rill-service-account
    • To only grant access to data in a specific bucket:

      1. Create a custom policy by running the following command, replacing [POLICY_NAME] with a custom name and [BUCKET_NAME] with the bucket name:

        aws iam create-policy \
        --policy-name [POLICY_NAME] \
        --policy-document \
        '{
        "Version": "2012-10-17",
        "Statement": [
        {
        "Effect": "Allow",
        "Action": [
        "s3:GetObject",
        "s3:ListBucket"
        ],
        "Resource": [
        "arn:aws:s3:::[BUCKET_NAME]",
        "arn:aws:s3:::[BUCKET_NAME]/*"
        ]
        }
        ]
        }'
      2. Attach the custom policy to the user by running the following command, replacing [POLICY_NAME] with the custom name set in the previous step:

        aws iam attach-user-policy \
        --policy-arn arn:aws:iam::aws:policy/[POLICY_NAME] \
        --user-name rill-service-account
  4. Run the following command to create an access key pair for the user:

    aws iam create-access-key --user-name rill-service-account
  5. Note down the AccessKeyId and SecretAccessKey values in the returned JSON object. Click "q" to exit the page.