Supported file formats

Learn about supported file formats and data processing available with Crux.

Written by Jon Tam
Updated over 8 months ago

Crux automatically detects the schema and creates a data representation for external data in the following formats.

| File Format | Compressed File Format | Overview | Considerations |
| --- | --- | --- | --- |
| CSV | ZIP | A text-based file where data is separated by commas or other delimiters. It's widely used for tabular data. | Ensure consistent use of delimiters. Large CSV files can be resource-intensive and take longer to profile. |
| Avro | TAR | A data serialization system designed to support schemas and binary encoding. | Each file must include a schema. Avro enables fast serialization and compact schema definitions, making batch processing efficient. |
| Parquet | TAR.GZ | A columnar storage file format optimized for efficient compression and encoding, especially for large datasets. | Ideal for processing large datasets. Preserves schema details, making it suitable for complex data analysis and queries. |
| TXT (Plain Text) | N/A | A basic text file containing unstructured or semi-structured data without any formatting. | Requires custom parsing to extract structured data. Best suited for simple, raw text but less ideal for complex datasets without preprocessing. |
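
The advice to use delimiters consistently in CSV files can be checked locally before a file is handed to Crux. The sketch below is only an illustration using Python's standard `csv` module; it is not part of Crux, and the helper names `field_counts` and `has_consistent_delimiters` are hypothetical. It assumes the delimiter is known in advance and treats a file as consistent when every non-empty row splits into the same number of fields.

```python
import csv
import io
from collections import Counter


def field_counts(text: str, delimiter: str = ",") -> Counter:
    """Map each observed field count to the number of rows that have it."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # csv.reader yields an empty list for blank lines; skip those.
    return Counter(len(row) for row in reader if row)


def has_consistent_delimiters(text: str, delimiter: str = ",") -> bool:
    """True when every non-empty row has the same number of fields."""
    return len(field_counts(text, delimiter)) == 1


# Hypothetical sample data: the second sample switches to ';' on its last row.
good = "id,name,price\n1,apple,0.50\n2,pear,0.75\n"
bad = "id,name,price\n1,apple,0.50\n2;pear;0.75\n"

print(has_consistent_delimiters(good))  # True
print(has_consistent_delimiters(bad))   # False
```

Because the rows are parsed with `csv.reader` rather than split on the delimiter character, a quoted value that contains a comma still counts as a single field. An empty input is reported as inconsistent, since there are no rows to compare.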

With Crux, you can easily import and process data from these file formats, ensuring seamless integration and delivery from your external data sources. However, additional details may be required if a file contains a unique or complex structure that Crux cannot process automatically. In such cases, you may need to provide settings such as normalization parameters to help Crux accurately identify the schema and structure the dataset for further processing.

💡 Crux also supports "raw" deliveries, where you can select files in any format and deliver them to the target destination as-is, without profiling.

