Building your first data product

Step-by-step guide to connect, profile, and deliver data to your target destinations with Crux.

Written by Jon Tam
Updated over 3 months ago

This guide walks you through the essentials of building your first data product, from connection to final delivery with Crux. Follow these simple steps to create, configure, and activate your first data product.

1. Go to Crux Studio

Click the Crux Studio icon in the navigation menu to build your new data product.

2. Name your data product

Give your data product a unique name and description so you can easily find it later. All your data products are accessible in the My Data menu, where you can search, browse, and filter through your inventory.

3. Connect

Establish a connection to your data source using FTP, SFTP, S3, GCS, or another supported option. Remember, after this step, the source connection cannot be changed. You must create a new data product if you need to connect to a different data source later. Learn more about the Connect step.
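Conceptually, each supported source maps a URI-style location to a connection type and a path. The sketch below is illustrative only (the helper and hostnames are invented, and Crux's actual connection setup is UI-driven), but it shows how a scheme such as `sftp://` or `s3://` identifies the source type:

```python
from urllib.parse import urlparse

# Hypothetical helper, not Crux's API: illustrates how a source URI
# breaks down into a connection type, host/bucket, and path.
SUPPORTED_SCHEMES = {"ftp", "sftp", "s3", "gs"}

def parse_source_uri(uri: str) -> dict:
    """Split a source URI into its connection type, host/bucket, and path."""
    parsed = urlparse(uri)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"Unsupported source scheme: {parsed.scheme!r}")
    return {
        "type": parsed.scheme,
        "host": parsed.netloc,  # hostname for (S)FTP, bucket for S3/GCS
        "path": parsed.path.lstrip("/"),
    }

print(parse_source_uri("sftp://vendor.example.com/daily/prices"))
# {'type': 'sftp', 'host': 'vendor.example.com', 'path': 'daily/prices'}
```

Because the source connection is fixed after this step, it is worth double-checking the scheme, host, and path before proceeding.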

4. Select data

Explore the source directory and choose the files and file patterns you wish to include in your data product. Crux will normalize your selection and automatically structure it into datasets and tables, organizing data for one-time or ongoing deliveries. Learn more about the Select Data step.
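To see how a file pattern groups related source files into one table, consider this small sketch. The file names and helper are made up; Crux's real pattern matching may differ, but glob-style wildcards behave like this:

```python
from fnmatch import fnmatch

# Invented example file listing for illustration.
source_files = [
    "prices_2024-01-01.csv",
    "prices_2024-01-02.csv",
    "reference/holidays.csv",
    "readme.txt",
]

def select_files(files, pattern):
    """Return the files matching a glob-style pattern."""
    return [f for f in files if fnmatch(f, pattern)]

print(select_files(source_files, "prices_*.csv"))
# ['prices_2024-01-01.csv', 'prices_2024-01-02.csv']
```

A pattern like `prices_*.csv` captures both existing files and future deliveries that follow the same naming convention, which is what enables ongoing ingestion into a single table.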

5. Model

Crux will profile and analyze the selected data to determine its schema (field names and data types, shown alongside sample data) and set an ingestion schedule. You can either:

  • Deliver data “as is” (raw) without processing to a file-based destination, or

  • Profile the data to identify its schema and ingestion schedule and customize normalization settings if needed.

The output is a structured and ready-to-use data product or raw data for unprocessed delivery. Learn more about the Model step.
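As a rough intuition for what profiling does, the toy example below guesses each column's data type from sample rows. Crux's actual profiler is far more sophisticated; this only illustrates the idea of deriving a schema from sample data:

```python
import csv
import io

# Invented sample data; not output from Crux.
SAMPLE = """symbol,price,volume
AAPL,189.91,51234000
MSFT,415.10,22987000
"""

def infer_type(values):
    """Guess a column type from its sample values."""
    try:
        if all("." not in v for v in values):
            [int(v) for v in values]
            return "integer"
        [float(v) for v in values]
        return "float"
    except ValueError:
        return "string"

def infer_schema(text):
    """Map each header name to a guessed type for its column."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    columns = list(zip(*data))
    return {name: infer_type(col) for name, col in zip(header, columns)}

print(infer_schema(SAMPLE))
# {'symbol': 'string', 'price': 'float', 'volume': 'integer'}
```

If you deliver "as is" instead, this profiling step is skipped and the files flow through to a file-based destination unprocessed.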

6. Schedule

Configure the data extraction schedule. You can use the recommended settings or tailor the schedule to meet your business needs. Learn more about the Schedule step.

Configure the deadline for when you want your data to arrive at your destination. Setting a deadline that matches your business needs lets Crux know when to alert you about late or missing deliveries. Learn more about setting deadlines.
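The deadline logic amounts to comparing the arrival time (if any) against the configured cutoff. This sketch is a simplified stand-in for Crux's internal alerting, with invented status names:

```python
from datetime import datetime
from typing import Optional

# Hypothetical helper: illustrates when a deadline triggers a late alert.
def delivery_status(deadline: datetime, delivered_at: Optional[datetime], now: datetime) -> str:
    """Classify a delivery relative to its deadline."""
    if delivered_at is not None and delivered_at <= deadline:
        return "on_time"
    if now > deadline:
        return "late" if delivered_at is None else "delivered_late"
    return "pending"

deadline = datetime(2024, 6, 3, 9, 0)  # expect data by 09:00
print(delivery_status(deadline, None, datetime(2024, 6, 3, 9, 30)))
# late
```

In other words, nothing fires before the deadline passes; once it does, a missing delivery becomes an alert.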

7. Test your deploy before activating

Before deploying and activating your production pipeline for regular deliveries, you can test the deploy against a test destination of your choosing. Create a destination used strictly for deploy tests, start the deploy test, and wait for it to succeed. If an issue arises, consult the Troubleshooting Guide to investigate what occurred and remediate.

8. Specify destinations

Select your data’s production destination for regular deliveries, such as Snowflake, BigQuery, S3, GCS, FTP, SFTP, or Azure Blob. You can add multiple destinations but must select at least one to proceed. Learn more about the Destinations step.

9. Review and activate

Review the configuration of your data product and deploy to production.

Additionally, you may export your Odin spec and deploy from the cruxctl CLI tool. Once exported, you can modify the Odin configuration to adjust ingestion, normalization, processing, and delivery settings. From there, validate the spec to ensure it complies with Odin standards, then deploy from the CLI. Learn more about the Review & Activate step.
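The validate-before-deploy pattern can be sketched as below. The field names here are invented and are not the real Odin schema, and cruxctl's actual checks are much richer; this only conveys the idea of catching spec problems before deployment:

```python
# Hypothetical required keys; the real Odin spec defines its own schema.
REQUIRED_KEYS = {"name", "source", "schedule", "destinations"}

def validate_spec(spec: dict) -> list:
    """Return a list of problems; an empty list means the spec passes."""
    problems = [f"missing required key: {k}" for k in sorted(REQUIRED_KEYS - spec.keys())]
    if not spec.get("destinations"):
        problems.append("at least one destination is required")
    return problems

spec = {"name": "vendor-prices", "source": "sftp://vendor.example.com/daily", "schedule": "daily"}
print(validate_spec(spec))
# ['missing required key: destinations', 'at least one destination is required']
```

Running validation locally before deploying keeps misconfigured specs from ever reaching production.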

10. Deploy datasets

Congratulations on completing your first data product setup! You should expect to receive your first data delivery based on the configured schedule. You can go to the Health Dashboard to view your active datasets. Note: the dataset will appear on the Health Dashboard after your first run has completed.
