A table is a core component of a dataset within Crux. It represents a structured collection of data that shares a consistent schema and contains information on a specific topic. Tables can be considered a time series of snapshots, capturing data at different points in time. Each dataset within a data product comprises one or more tables, making them essential for organizing data in a time-ordered manner.
Components
Ordered collection of schemas: A table is an organized sequence of schema versions, each pertaining to a set of related files. These schemas evolve over time to reflect changes in the data's structure.
Static file collection: Each table is a static group of files that share the same input schema, ensuring consistent ingestion and processing.
Source file patterns: Tables are composed of file patterns that define how files are named and grouped at the data source. For example, a pattern like
/path/to/files/stock-prices-YYYY-mm-dd.csv
may represent a series of daily stock prices.
Schema evolution: The format of files within a table may change over time. Crux automatically detects these changes and helps organize them into distinct schema versions, ensuring that the historical structure of data is preserved.
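To make the relationship between file patterns, snapshots, and schema versions concrete, here is a minimal Python sketch. It is not Crux's implementation or API: the names `pattern_to_regex`, `snapshot_date`, `SchemaVersion`, and `Table` are hypothetical, the reading of `YYYY`, `mm`, and `dd` as year, month, and day placeholders is an assumption, and comparing column headers stands in for whatever change detection Crux actually performs.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a pattern such as stock-prices-YYYY-mm-dd.csv into a regex.

    Assumption: YYYY, mm, and dd are date placeholders. Plain string
    replacement is used, so literal "mm" or "dd" elsewhere in the path
    would also be treated as placeholders.
    """
    escaped = re.escape(pattern)
    escaped = escaped.replace("YYYY", r"(?P<year>\d{4})")
    escaped = escaped.replace("mm", r"(?P<month>\d{2})")
    escaped = escaped.replace("dd", r"(?P<day>\d{2})")
    return re.compile(f"^{escaped}$")


def snapshot_date(pattern: str, path: str) -> Optional[date]:
    """Return the date encoded in a file name, or None if it does not match."""
    match = pattern_to_regex(pattern).match(path)
    if match is None:
        return None
    return date(int(match["year"]), int(match["month"]), int(match["day"]))


@dataclass
class SchemaVersion:
    """One schema version and the files that were delivered with it."""
    columns: tuple
    files: list = field(default_factory=list)


@dataclass
class Table:
    """A table modeled as a file pattern plus an ordered list of schema versions."""
    pattern: str
    versions: list = field(default_factory=list)

    def add_file(self, path: str, header: tuple) -> bool:
        """Attach a file to the table; start a new version if the header changed."""
        if snapshot_date(self.pattern, path) is None:
            return False  # the file does not belong to this table
        if not self.versions or self.versions[-1].columns != header:
            self.versions.append(SchemaVersion(columns=header))
        self.versions[-1].files.append(path)
        return True


if __name__ == "__main__":
    table = Table("/path/to/files/stock-prices-YYYY-mm-dd.csv")
    table.add_file("/path/to/files/stock-prices-2024-01-02.csv", ("ticker", "close"))
    table.add_file("/path/to/files/stock-prices-2024-01-03.csv", ("ticker", "close", "volume"))
    for number, version in enumerate(table.versions, start=1):
        print(number, version.columns, version.files)
```

In this sketch, files are assumed to arrive in date order, so a changed header simply opens a new schema version while earlier versions keep their files. That mirrors the idea that the historical structure of the data is preserved rather than overwritten.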
Summary
A table in Crux is a time-ordered series of snapshots, composed of related files that share a common schema. Tables are essential components of a dataset, providing a structured way to manage evolving data over time. With automatic schema detection and file pattern organization, Crux ensures that tables can accommodate changes in data structure, which makes them well suited to managing complex data ingestion processes.
Learn more
Learn about the structure and organization of data products built with Crux.