Skip to main content

Snowflake Destination Connection

Follow our setup guide to connect your Snowflake data warehouse to Crux.

Jon Tam avatar
Written by Jon Tam
Updated over 2 years ago

Overview

When creating a data product on the Crux External Data Platform, you may configure your data product to deliver to a target Snowflake destination.

The Snowflake destination delivery process is as follows:

  1. Crux files for a data product are processed and stored in Crux's internal data warehouse (GCS).

  2. The Crux Delivery Dispatch Service handles the data load into the Snowflake ecosystem. Files are delivered to one of the following staging areas:

    1. Snowflake's local file system

    2. AWS S3

    3. Future staging areas: Azure Blob and GCS

  3. Data then gets copied from this staging area to Snowflake tables.

Concepts

Snowflake follows a specific entity hierarchy:

  1. Select a Database

  2. Select a Schema

  3. Create a Table

Naming Convention

A fully qualified table name is referenced as <DatabaseName>.<SchemaName>.<TableName>

  • DatabaseName - Snowflake database created by a client

  • SchemaName - Snowflake schema created by a client

  • TableName - Crux-created table name:

    • Format: <Crux Frame Name>--<Legacy Dataset Id> (Example: CruxTestFrameName--AQLGSv-3bh123cRdVtHdNePvbw)


Requirements and Permissions

In order to configure the Crux External Data Platform to use Snowflake as a target destination, there are a number of setup activities that must be done.


Step 1: Infrastructure and Access Layer

As currently implemented, it is necessary to create an Amazon S3 bucket to be used by Snowflake as a staging area. In the future, we will support other staging vendors.

Creating an Amazon S3 Bucket

The following instructions describe how to create an Amazon S3 bucket used by Snowflake.

  1. Log into the AWS Management Console.

  2. From the home dashboard, choose Simple Storage Service, also known as S3.

  3. Click Create bucket.

  4. Enter the bucket name and select the region from the dropdown menu. The rest of the default settings should be unchanged. Note: it is recommended to use the same region as the one you are planning to use for Snowflake.

Creating an IAM Policy for Dispatch

The following instructions describe how to configure access permissions for the Crux service account to dispatch data into your S3 bucket.

  1. Log into the AWS Management Console.

  2. From the home dashboard, choose Identity & Access Management (IAM)

  3. Choose Policies from the left-hand navigation pane.

  4. Click the Create Policy button.

  5. Click the JSON tab.

  6. Add a policy document that will allow Crux to access the S3 bucket. The following policy (in JSON format) provides Crux's dispatch delivery service with the required permissions to load data using a single bucket.

    Copy and paste the text into the policy editor:

    {
    "Version": "2012-10-17",
    "Statement": [
    {
    "Effect": "Allow",
    "Action": [
    "s3:PutObject",
    "s3:GetObject",
    "s3:GetObjectVersion",
    "s3:DeleteObject",
    "s3:DeleteObjectVersion"
    ],
    "Resource": "arn:aws:s3:::<bucket>/*"
    },
    {
    "Effect": "Allow",
    "Action": [
    "s3:ListBucket",
    "s3:GetBucketLocation"
    ],
    "Resource": "arn:aws:s3:::<bucket>"
    }
    ]
    }

  7. Click Review policy.

  8. Enter the policy name and an optional description. Click the Create policy button.


Create an IAM Role in AWS

In the AWS Management console, create an AWS IAM role to grant privileges on the S3 bucket.

  1. Log into the AWS Management Console.

  2. From the home dashboard, choose IAM.

  3. Choose Roles from the left-hand navigation pane.

  4. Click the Create role button.

  5. Select the AWS account as the trusted entity type.

  6. In the AWS account field, select Another AWS account radio button and enter the following Crux AWS Account ID: 019717318677. Later, you will modify the trusted relationship and grant access to Crux.

  7. Click the Next button.

  8. Locate the policy you created in the above section and select this policy.

  9. Click the Next button.

  10. Enter a name and description for this role, and click the Create role button.

  11. Now you need to update this role with the ARN role that you created above. Click on the role you just created and select Trust relationship.

  12. Click the Edit trust policy.

  13. Update the existing policy with the following document using the following ARN role:

    • arn:aws:iam::019717318677:role/DeliveryDispatchServiceRole

    {
    "Version": "2012-10-17",
    "Statement": [
    {
    "Effect": "Allow",
    "Principal": {
    "AWS": "arn:aws:iam::107167841506:role/DeliveryDispatchServiceRole"
    },
    "Action": "sts:AssumeRole"
    }
    ]
    }

  14. Click Update policy.

  15. Now you need to provide Crux with this role ARN. In order to do that, you need to copy the role ARN from the summary section (e.g. arn:aws:iam::000000000000:role/CruxDispatchS3RWRole).

  16. Crux would need to update the Delivery Dispatch Service Account policy with this role ARN.


Step 2: Setup Storage Integration From Snowflake to S3

On this step, you will need to follow the Snowflake guide to set up direct Snowflake access to your S3 bucket. Note: this step is only necessary while using external S3 stage.

Follow the guide here to create a storage integration ID that you will need for the Snowflake destination form in the Crux External Data Platform.


Step 3: Setup Your Key Pair Authentication

Snowflake supports using key pair authentication for enhanced authentication security. Please refer to the Snowflake guide on this topic.

Here is a handy script to generate and print public and private keys in a format required by Snowflake and Crux's dispatch services:

#!/bin/bash

# Snowflake Docs: https://docs.snowflake.com/en/user-guide/key-pair-auth

# Variables
ENCRYPT=true

if $ENCRYPT ; then
# generate rsa private key (passphrase is MANDATORY)
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8
else
# generate rsa private key (passphrase is OPTIONAL)
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt
fi

# generate rsa public key
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub

key=$(awk '/-END PUBLIC KEY-/ { p = 0 }; p; /-BEGIN PUBLIC KEY-/ { p = 1 }' rsa_key.pub)
echo -e '\nPUBLIC KEY---------------------------------------------\n'
echo $key | tr -d ' '
echo -e '\n---------------------------------------------\n'

key=$(awk '{printf "%s\\n", $0}' rsa_key.p8)
echo -e '\nPRIVATE KEY---------------------------------------------\n'
echo $key
echo -e '\n---------------------------------------------\n'

Make note of the following key attributes:

  • privateKeyFileContent

  • privateKeyFilePassphrase (optional)


Step 4: Setup Snowflake Entities

In order to set up the appropriate Snowflake entities, you will need to be an admin in your Snowflake account in order to assume all security and admin roles.

The following parameters should be updated to set up the Snowflake entities before delivering data:

role_name
user_name
user_password
database_name
database_comment
database_schema_name

warehouse_name
warehouse_size
warehouse_type
warehouse_comment

storage_integration_name === From Step 2

<User Public Key> - Not set as a variable === From Step 3

Run Script in Snowflake Warehouse

  1. Log in to your Snowflake data warehouse.

  2. Copy the following script to a new Snowflake worksheet:

    begin;
    -- create variables
    set role_name = 'CRUX_DISPATCHER';
    set user_name = 'CRUX_USER';
    set user_password = 'password123';
    set database_name = 'CRUX_DATABASE';
    set database_comment = 'Snowflake delivery powered by Crux';
    set database_schema_name = 'PRIVATE';

    -- create variables for warehouse spec
    set warehouse_name = 'CRUX_WAREHOUSE';
    set warehouse_size = 'small';
    set warehouse_type = 'standard';
    set warehouse_comment = 'Warehouse for Crux Data loads';

    -- create variables for storage integration with S3
    set storage_integration_name = 'S3_INT_TEST_1';

    -- public key cannot be set as variable so update the value on line #35

    -- change ROLE to securityadmin
    USE ROLE securityadmin;

    -- create ROLE for Crux
    CREATE ROLE if not exists identifier($role_name);
    GRANT ROLE identifier($role_name) to ROLE SYSADMIN;

    -- create a USER for Crux
    create USER if not exists identifier($user_name)
    password = $user_password
    default_ROLE = $role_name
    default_warehouse = $warehouse_name;

    GRANT ROLE identifier($role_name) to USER identifier($user_name);

    -- grant public key to user
    -- Exclude the public key delimiters in the SQL statement.
    ALTER USER identifier($user_name) SET RSA_PUBLIC_KEY='MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAk6L4BusRpdlnpV6kmZzXdcs7nG+iFyY39e/3dXexYLFMlnB8G8zsaXhN+FYFEyIjWA+VDc0R9GGTPIDiJwNI8vexUTrH0ymuKC3NQVRm5EjKXKnUydyESKIoIv+lyut6L/8vXfSQGPZgCIVi9G+dUHOmMTn57dSmECRlIMsKnljHFY6NkQ4i/U/RSNC6Km8ZS/7L5CHqDpyfbvBL6HYZx2wPqf4w+TQGyH2U76h+geigerOgzTwwZqV06ihSOsko19gaIew1skvsz5fj3lBBoPJCFLWOOw5pkEcl+obYCijHLp8u0T9GKCbMVz2XBnBqcsLxEdTs/OuqBPI5VxdQfwIDAQAB';

    -- change ROLE to sysadmin
    USE ROLE sysadmin;

    -- create a warehouse for Crux
    -- Snowflake Doc: https://docs.snowflake.com/en/sql-reference/sql/create-warehouse
    CREATE WAREHOUSE if not exists identifier($warehouse_name)
    WITH WAREHOUSE_SIZE = $warehouse_size
    WAREHOUSE_TYPE = $warehouse_type
    AUTO_SUSPEND = 300
    AUTO_RESUME = TRUE
    COMMENT = $warehouse_comment;

    -- Create Database for Crux
    CREATE DATABASE if not exists identifier($database_name) COMMENT = $database_comment;

    -- grant Crux role access to warehouse
    GRANT USAGE
    on warehouse identifier($warehouse_name) to ROLE identifier($role_name);

    -- grant Crux access to database
    GRANT CREATE SCHEMA, MONITOR, USAGE
    on database identifier($database_name) to ROLE identifier($role_name);

    -- change ROLE to CRUX
    USE ROLE identifier($role_name);

    -- Create Schema for Crux
    USE DATABASE identifier($database_name);
    CREATE SCHEMA if not exists identifier($database_schema_name);

    -- change ROLE to AccountAdmin
    USE ROLE ACCOUNTADMIN;
    -- grant access to storage integration for external S3
    GRANT USAGE ON INTEGRATION identifier($storage_integration_name) TO ROLE identifier($role_name);

    commit;

  3. Depending on whether you want to create a new warehouse or use a shared warehouse do either:

    • If you want to create a new exclusive warehouse, don't make any changes to the CRUX_WAREHOUSE value in the script.

    • If you want Crux to use a shared warehouse to ingest data, change the CRUX_WAREHOUSE value in the script to the name of the shared warehouse.

  4. Replace the default CRUX_DISPATCHER, CRUX_DATABASE, CRUX_USER, and password123 values with values that pertain to your specific naming conventions for those resources. Important: Do not use this username for any other purpose!

  5. Make note of the values that replace the default CRUX_DATABASE, CRUX_USER, and password123. You will need them during the setup process to configure Crux.

  6. Run the script.


Connect to Your Destination

To connect to your destination:

  1. Provide a unique connection name that you can refer to at a later time.

  2. Select Snowflake in the destination connection method.

  3. Provide the Account Identifier that is located on the bottom left of your Snowflake databases page. This will be in the form such as xy12345.

  4. Provide the Region in the form that is expressed in your Snowflake URL (e.g. us-east-1).

  5. Provide the Username in the form that is located on the bottom left of your Snowflake databases page.

  6. When region is included in the Account Identifier, select the checkbox Include region name in account.

  7. Provide the Private Key and, optionally, the Private Key Passphrase that was obtained in the bash script above.

  8. Select the Vendor that you would like to use. Currently we support Amazon S3.

  9. Provide the Warehouse name associated with where Crux will send data (e.g. CRUX_WAREHOUSE). Note: this value is case sensitive.

  10. Provide the Database name within the warehouse. Note: this value is case sensitive.

  11. Provide the Schema name to be used within the Database. Note: this value is case sensitive.

  12. If you are using a vendor for staging, toggle ON the Use vendor for staging option.

    1. When toggled on, you will see additional fields displayed that are specific to the vendor.

    2. Provide the External bucket name associated with the staging vendor.

    3. Provide the External ARN role for this Amazon S3 bucket.

    4. Provide the Storage Integration ID from the above section.

    5. Optionally, provide the External folder path for the staging activity.

  13. Click Add connection to connect and create the Snowflake destination in the Crux External Data Platform.


Connection fields

To create a Snowflake destination connection, configure the following connection settings:

Field

Description

Destination name

Name for your destination.

Connection method

Type of connection.

Account

Unique identifier for your organization.

Username

Snowflake user account.

Region

(Optional) Cloud region in use.

Include region name in account

If checked, the specified region is included in your account name. Example account name: “sr80563", "sr80563.us-east-1"

Private key

Private key used to authenticate a client when it connects.

Private key passphrase

(Optional) Word or phrase that protects private key files.

Vendor

Vendor of the cloud region.

Warehouse

The cluster of compute resources used for Snowflake queries.

Database

Database name. Example: CX_DATA_LAKE

Schema

Schema name. Example: PRIVATE

Folder path

(Optional) Destination folder path for the files.

Use vendor for staging

Boolean toggle to use vendor. When enabled, specify:

  • External ARN role: Role to access the external S3 bucket.

  • External bucket name: name for the bucket (container for the objects).

  • External folder path (optional): Prefix for the bucket.

Did this answer your question?