Overview
The Open Data Integration Nomenclature (ODIN) is Crux's standard for declarative data delivery. It defines a delivery nomenclature that encourages industry-standard GitOps practices. ODIN specs are abstracted from their underlying control planes and workflow frameworks, but they work natively with the Sphere by Crux when you onboard data through the Crux platform.
ODIN specification components
As you build a data product in Crux Studio, the Crux platform generates a standard YAML data format with the following features:
Metadata and specifications, in YAML, for the steps that run in data pipelines
Versioned YAML files that are backward compatible, so that a later version supports all statements in an earlier version
Routines that validate the YAML and confirm that the fields and structure are set correctly
YAML files organized in a tree, where a child YAML file can point to its parent with the parent: field, as shown in the sketch below
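For illustration, here is a minimal sketch of a child specification pointing to its parent; the IDs and layout are hypothetical:

# Hypothetical child spec; parent: references the id of the parent ODIN spec
id: org_name_file_pattern_child_v1
parent: org_name_file_pattern_v1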
As you connect to a data source, select file patterns, model data, identify its tables and schemas, and specify the desired ingestion schedule, deadlines, and destinations, Crux dynamically builds and updates the ODIN specification for each dataset in your data product.
ODIN framework
The ODIN YAML file stores information including:
Pipeline ID (Airflow-specific)
Dataset ID
Data Product ID
Org ID
Source connection and extraction information
Normalizer specification (if applicable)
Null value types
Schema history and schema validations
Context and environment variables
Schedule interval
Availability deadline
Destination information
Sample ODIN file
id: org_name_file_pattern_v1
version: 1.x
annotations:
  ENV: PRODUCTION
metadata:
  org_id: Orabcde
  data_product_id: Pr1234abcde
  dataset_id: Ds123456
  run_uber_step: true
global:
  extract:
    action_class: pipeline.crux_pdk.actions.extract.extractor.ShortCircuitExtractor
    connection:
      type: SFTP
      conf: ${INGESTION-ENGINE-SFTP_TYPE}
  process:
    action_class: pipeline.crux_pdk.actions.process.java_processor.JavaProcessor
    crux_api_conf: ${ORG_NAME_FILE_PATTERN_V1_API}
    endpoint: ${API_HOST}
pipelines:
  - id: org_name_file_pattern_csv_ca8d
    global:
      global:
        file_available_start_date: 2025-02-14
        supplier_implied_date_regex: file_pattern_(?P<YYYY>\d{4})(?P<MM>\d{2})(?P<DD>\d{2})
        provenance_file_patterns:
          origin_patterns:
            - file_pattern_(?P<YYYY>\d{4})(?P<MM>\d{2})(?P<DD>\d{2})
          return_patterns:
            - file_pattern_(?P<YYYY>\d{4})(?P<MM>\d{2})(?P<DD>\d{2})
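        # The named groups YYYY, MM, and DD in the patterns above capture the
        # four-digit year and two-digit month and day from each file name,
        # which supply the supplier-implied date for the file.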
      schema_def:
        na_values:
          - ""
          - -1.#IND
          - NONE
          - -1.#QNAN
          - "#N/A"
          - +nan
          - NA
          - 1.#IND
          - "#N/A N/A"
          - "#NA"
          - N/A
          - NaN
          - +NaN
          - "NULL"
          - None
          - -nan
          - n/a
          - -NaN
          - "null"
          - nan
          - 1.#QNAN
          - none
          - "Null"
        fields:
          - name: Date
            data_type: DATE
            date_format: "%Y%m%d"
          - name: Date_Effective
            data_type: DATE
            date_format: "%Y%m%d"
          - name: Index_Symbol
            data_type: STRING
          - name: Internal_Key
            data_type: STRING
          - name: ISIN
            data_type: STRING
          - name: Company_Name
            data_type: STRING
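        # The date_format values above are strftime-style directives;
        # "%Y%m%d" parses values such as 20250214.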
    steps:
      - id: extract
        category: short_circuit
        conf:
          fetch_method: fetch_all_files_in_dir_with_state
          remote_path: /source/folder/sub-folder/
          file_patterns:
            - file_pattern_((202502(1[4-9]|2\d)|2025(0[3-9]|1\d)\d\d)|(202[6-9]|20[3-9]\d|2[1-2]\d\d)\d\d\d\d).csv
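            # The pattern above loosely matches file dates from the
            # file_available_start_date (2025-02-14) onward: the rest of
            # February 2025, March through December 2025, and any date in
            # 2026 or later.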
      - id: process
        conf:
          encoding: utf-8
          delimiter: ;
          escapechar: "\0"
          error_bad_lines: true
          skip_footer: false
          file_has_header: true
          strip_trailing_delimiter: false
          handle_extended_whitespace_chars: false
availability_deadlines:
  - deadline_minute: "30"
    deadline_hour: "21"
    deadline_day_of_month: "*"
    deadline_month: "*"
    deadline_day_of_week: "*"
    deadline_year: "*"
    file_frequency: daily
    timezone: UTC
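    # The deadline_* fields follow cron conventions: this daily dataset is
    # expected to be available by 21:30 UTC each day.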
destinations:
  - destination_id: AQZ2paE57OqePz7K5sG6nsunYC
    name: My Snowflake Destination
dag:
  max_active_runs: 1
  owner: CruxInformatics
  schedule_interval: "*/15 15-16 * * *"
  priority_weight: 1
  dag_start_date: 2025-03-06
  dag_catchup: false
  enable_delivery_cache: false
  queue: kubernetes
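  # schedule_interval uses standard cron syntax: "*/15 15-16 * * *" runs
  # the pipeline every 15 minutes during the 15:00 and 16:00 hours.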
Managing ODIN with cruxctl
When reviewing your data product in Crux Studio, you can view the ODIN specification file for each dataset directly in the Crux app. Additionally, you can use the cruxctl command-line interface (CLI) to verify the dataset specification before deploying it to production.
You can manage and make changes to the ODIN specification for each dataset by performing the following operations:
View ODIN in the Review and deploy step as you onboard your data product.
Export ODIN to download the YAML file.
Open the YAML file in your favorite text editor.
Make any needed changes and save your updates.
Run the following command to validate your updated YAML file:
cruxctl dataset validate [your_file.yaml]
When this validation check passes, run the following command to deploy:
cruxctl dataset apply [your_file.yaml]
Note: Running the cruxctl dataset apply command shortcuts the deploy action and applies any changes directly to your DAG. The deployment is reflected in the details of your data product and in the Review and deploy step in Crux Studio.