Overview
When creating a data product on the Crux External Data Platform, you may configure your data product to deliver to a target Snowflake destination.
The Snowflake destination delivery process is as follows:
Crux files for a data product are processed and stored in Crux's internal data warehouse (GCS).
The Crux Delivery Dispatch Service handles the data load into the Snowflake ecosystem. Files are delivered to one of the following staging areas:
Snowflake's local file system
AWS S3
Future staging areas: Azure Blob and GCS
Data then gets copied from this staging area to Snowflake tables.
Concepts
Snowflake follows a specific entity hierarchy:
Select a Database
Select a Schema
Create a Table
Naming Convention
A fully qualified table name is referenced as <DatabaseName>.<SchemaName>.<TableName>
DatabaseName - Snowflake database created by a client
SchemaName - Snowflake schema created by a client
TableName - Crux-created table name:
Format:
<Crux Frame Name>--<Legacy Dataset Id>
(Example:CruxTestFrameName--AQLGSv-3bh123cRdVtHdNePvbw
)
Requirements and Permissions
In order to configure the Crux External Data Platform to use Snowflake as a target destination, there are a number of setup activities that must be done.
Step 1: Infrastructure and Access Layer
As currently implemented, it is necessary to create an Amazon S3 bucket to be used by Snowflake as a staging area. In the future, we will support other staging vendors.
Creating an Amazon S3 Bucket
The following instructions describe how to create an Amazon S3 bucket used by Snowflake.
Log into the AWS Management Console.
From the home dashboard, choose Simple Storage Service, also known as S3.
Click Create bucket.
Enter the bucket name and select the region from the dropdown menu. The rest of the default settings should be unchanged. Note: it is recommended to use the same region as the one you are planning to use for Snowflake.
Creating an IAM Policy for Dispatch
The following instructions describe how to configure access permissions for the Crux service account to dispatch data into your S3 bucket.
Log into the AWS Management Console.
From the home dashboard, choose Identity & Access Management (IAM)
Choose Policies from the left-hand navigation pane.
Click the Create Policy button.
Click the JSON tab.
Add a policy document that will allow Crux to access the S3 bucket. The following policy (in JSON format) provides Crux's dispatch delivery service with the required permissions to load data using a single bucket.
Copy and paste the text into the policy editor:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:GetObjectVersion",
"s3:DeleteObject",
"s3:DeleteObjectVersion"
],
"Resource": "arn:aws:s3:::<bucket>/*"
},
{
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetBucketLocation"
],
"Resource": "arn:aws:s3:::<bucket>"
}
]
}Click Review policy.
Enter the policy name and an optional description. Click the Create policy button.
Create an IAM Role in AWS
In the AWS Management console, create an AWS IAM role to grant privileges on the S3 bucket.
Log into the AWS Management Console.
From the home dashboard, choose IAM.
Choose Roles from the left-hand navigation pane.
Click the Create role button.
Select the AWS account as the trusted entity type.
In the AWS account field, select Another AWS account radio button and enter the following Crux AWS Account ID:
019717318677
. Later, you will modify the trusted relationship and grant access to Crux.Click the Next button.
Locate the policy you created in the above section and select this policy.
Click the Next button.
Enter a name and description for this role, and click the Create role button.
Now you need to update this role with the ARN role that you created above. Click on the role you just created and select Trust relationship.
Click the Edit trust policy.
Update the existing policy with the following document using the following ARN role:
arn:aws:iam::019717318677:role/DeliveryDispatchServiceRole
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::107167841506:role/DeliveryDispatchServiceRole"
},
"Action": "sts:AssumeRole"
}
]
}Click Update policy.
Now you need to provide Crux with this role ARN. In order to do that, you need to copy the role ARN from the summary section (e.g.
arn:aws:iam::000000000000:role/CruxDispatchS3RWRole
).Crux would need to update the Delivery Dispatch Service Account policy with this role ARN.
Step 2: Setup Storage Integration From Snowflake to S3
On this step, you will need to follow the Snowflake guide to set up direct Snowflake access to your S3 bucket. Note: this step is only necessary while using external S3 stage.
Follow the guide here to create a storage integration ID that you will need for the Snowflake destination form in the Crux External Data Platform.
Step 3: Setup Your Key Pair Authentication
Snowflake supports using key pair authentication for enhanced authentication security. Please refer to the Snowflake guide on this topic.
Here is a handy script to generate and print public and private keys in a format required by Snowflake and Crux's dispatch services:
#!/bin/bash
# Snowflake Docs: https://docs.snowflake.com/en/user-guide/key-pair-auth
# Variables
ENCRYPT=true
if $ENCRYPT ; then
# generate rsa private key (passphrase is MANDATORY)
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8
else
# generate rsa private key (passphrase is OPTIONAL)
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt
fi
# generate rsa public key
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub
key=$(awk '/-END PUBLIC KEY-/ { p = 0 }; p; /-BEGIN PUBLIC KEY-/ { p = 1 }' rsa_key.pub)
echo -e '\nPUBLIC KEY---------------------------------------------\n'
echo $key | tr -d ' '
echo -e '\n---------------------------------------------\n'
key=$(awk '{printf "%s\\n", $0}' rsa_key.p8)
echo -e '\nPRIVATE KEY---------------------------------------------\n'
echo $key
echo -e '\n---------------------------------------------\n'
Make note of the following key attributes:
privateKeyFileContent
privateKeyFilePassphrase (optional)
Step 4: Setup Snowflake Entities
In order to set up the appropriate Snowflake entities, you will need to be an admin in your Snowflake account in order to assume all security and admin roles.
The following parameters should be updated to set up the Snowflake entities before delivering data:
role_name
user_name
user_password
database_name
database_comment
database_schema_name
warehouse_name
warehouse_size
warehouse_type
warehouse_comment
storage_integration_name === From Step 2
<User Public Key> - Not set as a variable === From Step 3
Run Script in Snowflake Warehouse
Log in to your Snowflake data warehouse.
Copy the following script to a new Snowflake worksheet:
begin;
-- create variables
set role_name = 'CRUX_DISPATCHER';
set user_name = 'CRUX_USER';
set user_password = 'password123';
set database_name = 'CRUX_DATABASE';
set database_comment = 'Snowflake delivery powered by Crux';
set database_schema_name = 'PRIVATE';
-- create variables for warehouse spec
set warehouse_name = 'CRUX_WAREHOUSE';
set warehouse_size = 'small';
set warehouse_type = 'standard';
set warehouse_comment = 'Warehouse for Crux Data loads';
-- create variables for storage integration with S3
set storage_integration_name = 'S3_INT_TEST_1';
-- public key cannot be set as variable so update the value on line #35
-- change ROLE to securityadmin
USE ROLE securityadmin;
-- create ROLE for Crux
CREATE ROLE if not exists identifier($role_name);
GRANT ROLE identifier($role_name) to ROLE SYSADMIN;
-- create a USER for Crux
create USER if not exists identifier($user_name)
password = $user_password
default_ROLE = $role_name
default_warehouse = $warehouse_name;
GRANT ROLE identifier($role_name) to USER identifier($user_name);
-- grant public key to user
-- Exclude the public key delimiters in the SQL statement.
ALTER USER identifier($user_name) SET RSA_PUBLIC_KEY='MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAk6L4BusRpdlnpV6kmZzXdcs7nG+iFyY39e/3dXexYLFMlnB8G8zsaXhN+FYFEyIjWA+VDc0R9GGTPIDiJwNI8vexUTrH0ymuKC3NQVRm5EjKXKnUydyESKIoIv+lyut6L/8vXfSQGPZgCIVi9G+dUHOmMTn57dSmECRlIMsKnljHFY6NkQ4i/U/RSNC6Km8ZS/7L5CHqDpyfbvBL6HYZx2wPqf4w+TQGyH2U76h+geigerOgzTwwZqV06ihSOsko19gaIew1skvsz5fj3lBBoPJCFLWOOw5pkEcl+obYCijHLp8u0T9GKCbMVz2XBnBqcsLxEdTs/OuqBPI5VxdQfwIDAQAB';
-- change ROLE to sysadmin
USE ROLE sysadmin;
-- create a warehouse for Crux
-- Snowflake Doc: https://docs.snowflake.com/en/sql-reference/sql/create-warehouse
CREATE WAREHOUSE if not exists identifier($warehouse_name)
WITH WAREHOUSE_SIZE = $warehouse_size
WAREHOUSE_TYPE = $warehouse_type
AUTO_SUSPEND = 300
AUTO_RESUME = TRUE
COMMENT = $warehouse_comment;
-- Create Database for Crux
CREATE DATABASE if not exists identifier($database_name) COMMENT = $database_comment;
-- grant Crux role access to warehouse
GRANT USAGE
on warehouse identifier($warehouse_name) to ROLE identifier($role_name);
-- grant Crux access to database
GRANT CREATE SCHEMA, MONITOR, USAGE
on database identifier($database_name) to ROLE identifier($role_name);
-- change ROLE to CRUX
USE ROLE identifier($role_name);
-- Create Schema for Crux
USE DATABASE identifier($database_name);
CREATE SCHEMA if not exists identifier($database_schema_name);
-- change ROLE to AccountAdmin
USE ROLE ACCOUNTADMIN;
-- grant access to storage integration for external S3
GRANT USAGE ON INTEGRATION identifier($storage_integration_name) TO ROLE identifier($role_name);
commit;Depending on whether you want to create a new warehouse or use a shared warehouse do either:
If you want to create a new exclusive warehouse, don't make any changes to the
CRUX_WAREHOUSE
value in the script.If you want Crux to use a shared warehouse to ingest data, change the
CRUX_WAREHOUSE
value in the script to the name of the shared warehouse.
Replace the default
CRUX_DISPATCHER
,CRUX_DATABASE
,CRUX_USER
, andpassword123
values with values that pertain to your specific naming conventions for those resources. Important: Do not use this username for any other purpose!Make note of the values that replace the default
CRUX_DATABASE
,CRUX_USER
, andpassword123
. You will need them during the setup process to configure Crux.Run the script.
Connect to Your Destination
To connect to your destination:
Provide a unique connection name that you can refer to at a later time.
Select Snowflake in the destination connection method.
Provide the Account Identifier that is located on the bottom left of your Snowflake databases page. This will be in the form such as
xy12345
.Provide the Region in the form that is expressed in your Snowflake URL (e.g.
us-east-1
).Provide the Username in the form that is located on the bottom left of your Snowflake databases page.
When region is included in the Account Identifier, select the checkbox
Include region name in account
.Provide the Private Key and, optionally, the Private Key Passphrase that was obtained in the bash script above.
Select the Vendor that you would like to use. Currently we support Amazon S3.
Provide the Warehouse name associated with where Crux will send data (e.g.
CRUX_WAREHOUSE
). Note: this value is case sensitive.Provide the Database name within the warehouse. Note: this value is case sensitive.
Provide the Schema name to be used within the Database. Note: this value is case sensitive.
If you are using a vendor for staging, toggle ON the Use vendor for staging option.
When toggled on, you will see additional fields displayed that are specific to the vendor.
Provide the External bucket name associated with the staging vendor.
Provide the External ARN role for this Amazon S3 bucket.
Provide the Storage Integration ID from the above section.
Optionally, provide the External folder path for the staging activity.
Click Add connection to connect and create the Snowflake destination in the Crux External Data Platform.
Connection fields
To create a Snowflake destination connection, configure the following connection settings:
Field | Description |
Destination name | Name for your destination. |
Connection method | Type of connection. |
Account | Unique identifier for your organization. |
Username | Snowflake user account. |
Region | (Optional) Cloud region in use. |
Include region name in account | If checked, the specified region is included in your account name. Example account name: “sr80563", "sr80563.us-east-1" |
Private key | Private key used to authenticate a client when it connects. |
Private key passphrase | (Optional) Word or phrase that protects private key files. |
Vendor | Vendor of the cloud region. |
Warehouse | The cluster of compute resources used for Snowflake queries. |
Database | Database name. Example: CX_DATA_LAKE |
Schema | Schema name. Example: PRIVATE |
Folder path | (Optional) Destination folder path for the files. |
Use vendor for staging | Boolean toggle to use vendor. When enabled, specify:
|