Understanding the Model step

Learn about the steps involved in the modeling process.

Written by Jon Tam
Updated over 5 months ago

In the Model step, Crux scans the selected tables and reports the delivery date range and file sizes of all underlying files. This lets you confirm the delivery schedule and verify that the data is what you intend to work with. A chart shows the distribution of file sizes, which can help you decide whether Crux should further model the data into a cleaner structure. Note that larger files can lengthen the modeling process.
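As a rough illustration of what this scan summarizes, the sketch below computes a delivery date range and a simple file-size histogram from a list of file records. It is not how Crux builds its chart: the `FileInfo` record, the `summarize` and `size_bucket` helpers, and the bucket boundaries are all hypothetical assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FileInfo:
    # Hypothetical file record; Crux's internal representation is not documented.
    path: str
    modified: datetime
    size_bytes: int


def size_bucket(size_bytes: int) -> str:
    # Illustrative power-of-1024 buckets (assumed, not Crux's actual bins).
    for label, limit in (("<1 KB", 1024), ("<1 MB", 1024**2), ("<1 GB", 1024**3)):
        if size_bytes < limit:
            return label
    return ">=1 GB"


def summarize(files):
    """Return (first delivery, last delivery, histogram of size buckets)."""
    if not files:
        raise ValueError("no files to summarize")
    first = min(f.modified for f in files)
    last = max(f.modified for f in files)
    histogram = Counter(size_bucket(f.size_bytes) for f in files)
    return first, last, histogram
```

A table whose histogram is dominated by the largest buckets is the case the article flags as likely to take longer to model.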

The modeling process at the table level includes five steps:

  1. Prepare. Crux checks the source data to ensure it’s ready for processing.

  2. Download files. Crux downloads files from the source and starts preparing them for profiling.

  3. Profile contents. This is the core of the modeling process, where Crux:

    • Analyzes file formats, encodings, delimiters, and other structural details of the raw files.

    • Identifies patterns in the raw files.

    • Infers the delivery schedule based on the frequency of file modifications.

  4. Create schemas. If the table is to be delivered as “raw,” this step is skipped. Otherwise, Crux:

    • Defines the schema for standardized files, determining field types and formats based on the data.

    • Estimates key statistics about the field values.

    • Refines and groups file patterns based on content.

  5. Generate data pipelines. The Crux workflow manager initiates the pipeline generation process, defining the extraction schedule and setting up data source parameters.
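The profiling and schema steps can be pictured with a small, self-contained sketch. It is not Crux's implementation: the function names (`profile_sample`, `infer_type`, `infer_cadence`), the candidate delimiters, the integer/float/string type ladder, the per-column statistics, and the cadence thresholds are hypothetical assumptions chosen for illustration. It uses Python's standard `csv.Sniffer` to guess a delimiter, infers a field type plus null and distinct counts per column, and estimates a delivery cadence from the median gap between file modification times.

```python
import csv
import io
from datetime import datetime
from statistics import median


def infer_type(values):
    """Pick the narrowest type that parses every non-empty value (assumed ladder)."""
    non_null = [v for v in values if v != ""]
    if not non_null:
        return "unknown"
    for name, cast in (("integer", int), ("float", float)):
        try:
            for v in non_null:
                cast(v)
        except ValueError:
            continue  # at least one value failed; try the next, wider type
        return name
    return "string"


def profile_sample(sample: str):
    """Guess the delimiter, then infer a simple schema from a text sample."""
    dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
    rows = list(csv.reader(io.StringIO(sample), dialect))
    header, body = rows[0], rows[1:]
    schema = {}
    for i, name in enumerate(header):
        # Treat missing trailing fields as empty (null) values.
        column = [row[i] if i < len(row) else "" for row in body]
        schema[name] = {
            "type": infer_type(column),
            "nulls": sum(1 for v in column if v == ""),
            "distinct": len(set(column)),
        }
    return dialect.delimiter, schema


def infer_cadence(timestamps):
    """Label a delivery cadence from the median gap between modification times."""
    if len(timestamps) < 2:
        return "unknown"
    ordered = sorted(timestamps)
    gaps_days = [(b - a).total_seconds() / 86400 for a, b in zip(ordered, ordered[1:])]
    typical = median(gaps_days)
    # Arbitrary illustrative thresholds, in days.
    if typical <= 1.5:
        return "daily"
    if typical <= 8:
        return "weekly"
    if typical <= 32:
        return "monthly"
    return "other"
```

For example, `profile_sample("id;name;price\n1;apple;0.5\n2;pear;0.75\n")` detects `;` as the delimiter and types `id` as integer, `name` as string, and `price` as float. Using the median gap rather than the mean keeps a single late or missed delivery from skewing the inferred schedule.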