Google Cloud Storage (GCS)
Overview
Google Cloud Storage (GCS) is a scalable, fully managed, and highly reliable object storage service offered by Google Cloud, designed to store and access data from anywhere in the world. It provides a secure and cost-effective way to store data, including common data storage formats such as CSV and Parquet. Rill connects to GCS using the URI of your bucket (for example, gs://my-bucket/path/to/data) to retrieve and read files.
Authentication Methods
To connect to Google Cloud Storage, you need to provide authentication credentials (or skip for public buckets). Rill supports three methods:
- Use Service Account JSON (recommended for production)
- Use HMAC Keys (alternative authentication method)
- Use Local Google Cloud CLI credentials (local development only - not recommended for production)
Choose the method that best fits your setup. For production deployments to Rill Cloud, use Service Account JSON or HMAC Keys. Local Google Cloud CLI credentials exist only on your machine, so a project that relies on them will fail when deployed to Rill Cloud.
Using the Add Data UI
When you add a GCS data model through the Rill UI, the process follows two steps:
- Configure Authentication - Set up your GCS connector with credentials (Service Account JSON or HMAC keys)
- Configure Data Model - Define which bucket and objects to ingest
This two-step flow ensures your credentials are securely stored in the connector configuration, while your data model references remain clean and portable.
Method 1: Service Account JSON (Recommended)
Service Account JSON credentials provide the most secure and reliable authentication for GCS. This method works for both local development and Rill Cloud deployments.
Using the UI
- Click Add Data in your Rill project
- Select Google Cloud Storage (GCS) as the data model type
- In the authentication step:
- Choose Service Account JSON
- Upload your JSON key file or paste its contents
- Name your connector (e.g., my_gcs)
- In the data model configuration step:
- Enter your bucket name and object path
- Configure other model settings as needed
- Click Create to finalize
The UI will automatically create both the connector file and model file for you.
Manual Configuration
If you prefer to configure manually, create a connector file and a model file, then add your credentials to .env:
Step 1: Create connector configuration
Create connectors/my_gcs.yaml:
type: connector
driver: gcs
google_application_credentials: "{{ .env.connector.gcs.google_application_credentials }}"
Step 2: Create model configuration
Create models/my_gcs_data.yaml:
type: model
connector: duckdb
sql: SELECT * FROM read_parquet('gs://my-bucket/path/to/data/*.parquet')
# Add a refresh schedule
refresh:
  cron: "0 */6 * * *"
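To make the refresh schedule concrete, the following sketch expands the hour field of the cron expression used above into the times of day it fires. It is only an illustration: `cron_hours` and `daily_run_times` are hypothetical helper names, and the code handles just the simple forms `*`, `*/N`, and a single number, assuming the day, month, and weekday fields are all `*`.

```python
def cron_hours(hour_field: str) -> list[int]:
    """Expand a simple cron hour field ("*", "*/N", or "N") into matching hours."""
    if hour_field == "*":
        return list(range(24))
    if hour_field.startswith("*/"):
        step = int(hour_field[2:])
        return list(range(0, 24, step))
    return [int(hour_field)]


def daily_run_times(expr: str) -> list[str]:
    """Return HH:MM run times for a cron expression with a numeric minute field."""
    minute, hour, *_ = expr.split()
    return [f"{h:02d}:{int(minute):02d}" for h in cron_hours(hour)]


print(daily_run_times("0 */6 * * *"))
# prints ['00:00', '06:00', '12:00', '18:00']
```

In other words, "0 */6 * * *" refreshes the model four times a day, at minute 0 of every sixth hour.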
Step 3: Add credentials to .env
connector.gcs.google_application_credentials=<json_credentials>
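A downloaded key file is multi-line JSON, while a .env entry occupies a single line. The sketch below shows one way to compact the key into a single-line entry. It assumes that Rill accepts the compacted JSON as the value; `to_env_line` is a hypothetical helper, and the sample key is truncated, made-up content used only for illustration.

```python
import json


def to_env_line(
    key_json: str,
    name: str = "connector.gcs.google_application_credentials",
) -> str:
    """Compact a service-account JSON document onto one line for a .env entry."""
    # Re-serialising removes the line breaks and indentation between fields;
    # newlines inside string values (such as the private key) stay escaped as \n.
    compact = json.dumps(json.loads(key_json), separators=(",", ":"))
    return f"{name}={compact}"


# Hypothetical, truncated key content used only for illustration.
sample = """{
  "type": "service_account",
  "project_id": "my-project"
}"""

print(to_env_line(sample))
```

Append the printed line to .env, and keep that file out of version control.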
Method 2: HMAC Keys
HMAC keys provide S3-compatible authentication for GCS. This method is useful when you need compatibility with S3-style access patterns.
Using the UI
- Click Add Data in your Rill project
- Select Google Cloud Storage (GCS) as the data model type
- In the authentication step:
- Choose HMAC Keys
- Enter your Access Key ID
- Enter your Secret Access Key
- Name your connector (e.g., my_gcs_hmac)
- In the data model configuration step:
- Enter your bucket name and object path
- Configure other model settings as needed
- Click Create to finalize
Manual Configuration
Step 1: Create connector configuration
Create connectors/my_gcs_hmac.yaml:
type: connector
driver: gcs
key_id: "{{ .env.connector.gcs.key_id }}"
secret: "{{ .env.connector.gcs.secret }}"
Step 2: Create model configuration
Create models/my_gcs_data.yaml:
type: model
connector: duckdb
sql: SELECT * FROM read_parquet('gs://my-bucket/path/to/data/*.parquet')
# Add a refresh schedule
refresh:
  cron: "0 */6 * * *"
Step 3: Add credentials to .env
connector.gcs.key_id=GOOG1234567890ABCDEFG
connector.gcs.secret=your-secret-access-key
Notice that the connector uses key_id and secret. HMAC keys use S3-compatible authentication with GCS.
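Because the connector template references two separate .env entries, a quick check that both are present can catch a missing secret before Rill reports an authentication error. The following is a minimal sketch of such a check; `parse_env` and `missing_keys` are hypothetical helpers, and the parser handles only plain `key=value` lines and `#` comments.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # comments."""
    env: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def missing_keys(env_text: str, required: list[str]) -> list[str]:
    """Return the required keys that are absent or empty in the .env text."""
    env = parse_env(env_text)
    return [key for key in required if not env.get(key)]


REQUIRED = ["connector.gcs.key_id", "connector.gcs.secret"]

# Example .env content with the secret not yet filled in.
env_text = "connector.gcs.key_id=GOOG1234567890ABCDEFG\n# secret not set yet\n"
print(missing_keys(env_text, REQUIRED))
# prints ['connector.gcs.secret']
```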
Method 3: Local Google Cloud CLI Credentials
For local development, you can use credentials from the Google Cloud CLI. This method is not suitable for production or Rill Cloud deployments.
Setup
- Install the Google Cloud CLI
- Authenticate with your Google account:
gcloud auth application-default login
- Create your connector and model files
Connector Configuration
Create connectors/my_gcs.yaml:
type: connector
driver: gcs
Model Configuration
Create models/my_gcs_data.yaml:
type: model
connector: duckdb
sql: SELECT * FROM read_parquet('gs://my-bucket/path/to/data/*.parquet')
# Add a refresh schedule
refresh:
  cron: "0 */6 * * *"
When no explicit credentials are provided in the connector, Rill will automatically use your local Google Cloud CLI credentials.
This method only works for local development. Deploying to Rill Cloud with this configuration will fail because the cloud environment doesn't have access to your local credentials. Always use Service Account JSON or HMAC keys for production deployments.
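If a local model fails to authenticate, it can help to confirm that `gcloud auth application-default login` actually wrote a credentials file. The sketch below checks the default location that the Google Cloud CLI uses for Application Default Credentials. It assumes that Rill follows the standard lookup and that the gcloud configuration directory has not been relocated; `adc_path` is a hypothetical helper.

```python
import os
from pathlib import Path


def adc_path() -> Path:
    """Return the default Application Default Credentials file location."""
    if os.name == "nt":
        # Windows: %APPDATA%\gcloud\application_default_credentials.json
        base = Path(os.environ.get("APPDATA", ""))
    else:
        # Linux and macOS: ~/.config/gcloud/application_default_credentials.json
        base = Path.home() / ".config"
    return base / "gcloud" / "application_default_credentials.json"


path = adc_path()
print(f"{path} exists: {path.exists()}")
```

This file lives only on your machine, which is why a project relying on it cannot authenticate once deployed.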
Using GCS Data in Models
Once your connector is configured, you can reference GCS paths in your model SQL by passing gs:// URIs to DuckDB's file-reading functions, such as read_parquet and read_csv.
Basic Example
type: model
connector: duckdb
sql: SELECT * FROM read_parquet('gs://my-bucket/data/*.parquet')
refresh:
  cron: "0 */6 * * *"
Reading Multiple File Types
Because the two sources are combined with UNION ALL, both queries must return the same number of columns with compatible types.
type: model
connector: duckdb
sql: |
  -- Read Parquet files
  SELECT * FROM read_parquet('gs://my-bucket/parquet-data/*.parquet')
  UNION ALL
  -- Read CSV files
  SELECT * FROM read_csv('gs://my-bucket/csv-data/*.csv', AUTO_DETECT=TRUE)
refresh:
  cron: "0 */6 * * *"
Path Patterns
You can use wildcards to read multiple files:
-- Single file
SELECT * FROM read_parquet('gs://my-bucket/data/file.parquet')
-- All files in a directory
SELECT * FROM read_parquet('gs://my-bucket/data/*.parquet')
-- All files in nested directories
SELECT * FROM read_parquet('gs://my-bucket/data/**/*.parquet')
-- Files matching a pattern
SELECT * FROM read_parquet('gs://my-bucket/data/2024-*.parquet')
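To preview which objects a pattern would pick up, the sketch below approximates the wildcard rules shown above: `*` matches within a single path segment and `**/` spans any number of directories, including none. This is an approximation written for illustration, not DuckDB's own matcher; `glob_to_regex` and `matches` are hypothetical helpers, and other glob syntax such as `?` or `[abc]` is not handled.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob where * stays in one segment and **/ spans directories."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more nested directories
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]*")  # any characters except a path separator
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matches(pattern: str, path: str) -> bool:
    """Return True when the whole path matches the glob pattern."""
    return glob_to_regex(pattern).fullmatch(path) is not None


# Hypothetical object names used only for illustration.
objects = [
    "gs://my-bucket/data/file.parquet",
    "gs://my-bucket/data/2024-01.parquet",
    "gs://my-bucket/data/2024/01/file.parquet",
]
for pattern in ("gs://my-bucket/data/*.parquet", "gs://my-bucket/data/**/*.parquet"):
    print(pattern, [o for o in objects if matches(pattern, o)])
```

With these sample names, the single-star pattern selects only the two objects directly under data/, while the double-star pattern selects all three.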
Deploying to Rill Cloud
When deploying your project to Rill Cloud, you must use either Service Account JSON or HMAC Keys. Local Google Cloud CLI credentials will not work in the cloud environment.
To manually configure your environment variables, run:
rill env configure
The CLI will interactively walk you through configuring all required credentials for your connectors.
Appendix
How to create a service account using the Google Cloud Console
Here is a step-by-step guide on how to create a Google Cloud service account with read-only access to GCS:
- Navigate to the Service Accounts page under "IAM & Admin" in the Google Cloud Console.
- Click the "Create Service Account" button at the top of the page.
- In the "Create Service Account" window, enter a name for the service account, then click "Create and continue".
- In the "Role" field, search for and select the "Storage Object Viewer" role. Click "Continue", then click "Done".
- This grants the service account access to data in all buckets. To only grant access to data in a specific bucket, leave the "Role" field blank, click "Done", then follow the steps described in Add a principal to a bucket-level policy.
- On the "Service Accounts" page, locate the service account you just created and click on the three dots on the right-hand side. Select "Manage Keys" from the dropdown menu.
- On the "Keys" page, click the "Add key" button and select "Create new key".
- Choose the "JSON" key type and click "Create".
- Download and save the JSON key file to a secure location on your computer.
If you don't have permission to create service accounts or keys, contact your internal cloud admin to create the Service Account JSON key for you.
How to create a service account using the gcloud CLI
- Open a terminal window and follow the steps on Install the Google Cloud CLI if you haven't already done so.
- You will need your Google Cloud project ID to complete this tutorial. Run the following command to show it:
gcloud config get project
- Replace [PROJECT_ID] with your project ID in the following command, and run it to create a new service account (optionally also replace rill-service-account with a name of your choice):
gcloud iam service-accounts create rill-service-account --project [PROJECT_ID]
- Grant the service account access to data in Google Cloud Storage:
  - To grant access to data in all buckets, replace [PROJECT_ID] with your project ID in the following command, and run it:
gcloud projects add-iam-policy-binding [PROJECT_ID] \
  --member="serviceAccount:rill-service-account@[PROJECT_ID].iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"
  - To only grant access to data in a specific bucket, replace [BUCKET_NAME] and [PROJECT_ID] with your details in the following command, and run it:
gcloud storage buckets add-iam-policy-binding gs://[BUCKET_NAME] \
  --member="serviceAccount:rill-service-account@[PROJECT_ID].iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"
- Replace [PROJECT_ID] with your project ID in the following command, and run it to create a key file for the service account:
gcloud iam service-accounts keys create rill-service-account.json \
  --iam-account rill-service-account@[PROJECT_ID].iam.gserviceaccount.com
- You have now created a JSON key file named rill-service-account.json in your current working directory.
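Before pasting a key into Rill, it can be worth confirming that the file really is a service-account key and not, for example, the user credentials written by the Google Cloud CLI. The sketch below checks a few fields that service-account key files contain. It is a sanity check only and does not verify that the key is valid or has access to your bucket; `check_key` is a hypothetical helper and the sample values are placeholders.

```python
import json

REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key(key_json: str) -> list[str]:
    """Return a list of problems found in a service-account key document."""
    try:
        data = json.loads(key_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]
    if data.get("type") and data["type"] != "service_account":
        problems.append(f"unexpected type: {data['type']}")
    return problems


# Placeholder values used only for illustration; a real key holds a PEM private key.
sample = json.dumps({
    "type": "service_account",
    "project_id": "my-project",
    "private_key": "<redacted>",
    "client_email": "rill-service-account@my-project.iam.gserviceaccount.com",
})
print(check_key(sample))
# prints []
```

To check a real file, pass its contents, for example `check_key(open("rill-service-account.json").read())`.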
How to create HMAC keys using the Google Cloud Console
- Go to the Google Cloud Console
- Navigate to Cloud Storage → Settings → Interoperability
- If not already enabled, click Enable Interoperability Access
- Scroll to Service account HMAC section
- Select a service account or create a new one
- Click Create a key for a service account
- Copy the Access Key and Secret - you won't be able to view the secret again
How to create HMAC keys using the gcloud CLI
# Create HMAC keys for a service account
gcloud storage hmac create SERVICE_ACCOUNT_EMAIL
# List existing HMAC keys
gcloud storage hmac list
Replace SERVICE_ACCOUNT_EMAIL with your service account's email address.
Keep your HMAC keys secure and never commit them to version control.