Supported file formats | Sphere by Crux

Crux automatically detects the schema and creates a data representation for external data in the following formats.

File Format	Compressed File Format	Overview	Considerations
CSV	ZIP	A text-based file where data is separated by commas or other delimiters. It’s widely used for tabular data.	Ensure consistent use of delimiters. Large CSV files can be resource-intensive and take longer to profile.
Avro	TAR	A data serialization system designed to support schemas and binary encoding.	Each file must include a schema. Avro enables fast serialization and compact schema definitions, making batch processing efficient.
Parquet	TAR.GZ	A columnar storage file format optimized for efficient compression and encoding, especially for large datasets.	Ideal for processing large datasets. Preserves schema details, making it suitable for complex data analysis and queries.
TXT (Plain Text)	N/A	A basic text file containing unstructured or semi-structured data without any formatting.	Requires custom parsing to extract structured data. Best suited for simple, raw text but less ideal for complex datasets without preprocessing.
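
As a rough illustration of how the formats in the table differ at the byte level, the sketch below guesses a file's type from its well-known signature bytes. It is not how Crux performs detection; the function name `sniff_container` is hypothetical, and the check only looks at signatures (it does not validate the file or read any schema).

```python
def sniff_container(data: bytes) -> str:
    """Guess a file's format from its signature bytes (illustrative only)."""
    # Avro object container files start with the bytes 'O', 'b', 'j', 1.
    if data[:4] == b"Obj\x01":
        return "avro"
    # Parquet files start and end with the 4-byte marker 'PAR1'.
    if len(data) >= 8 and data[:4] == b"PAR1" and data[-4:] == b"PAR1":
        return "parquet"
    # A ZIP archive with at least one entry starts with a local file header.
    if data[:4] == b"PK\x03\x04":
        return "zip"
    # gzip streams (including TAR.GZ) start with 0x1f 0x8b.
    if data[:2] == b"\x1f\x8b":
        return "gzip"
    # ustar-style TAR headers carry 'ustar' at byte offset 257.
    if data[257:262] == b"ustar":
        return "tar"
    # CSV and TXT have no signature; they need content-based inspection.
    return "unknown"
```

CSV and plain-text files fall through to `"unknown"` because they carry no signature, which is one reason text formats tend to need more care than Avro or Parquet, where the schema travels with the file.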

With Crux, you can import and process data from these file formats and deliver it from your external data sources. If a file has an unusual or complex structure that Crux cannot process automatically, you may need to provide additional settings, such as normalization parameters, so that Crux can accurately identify the schema and structure the dataset for further processing.
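
Before delivering a CSV file, it can help to check locally whether its delimiter and column count are consistent, since inconsistent rows are a common reason a file needs extra settings. The following is a minimal sketch using only the Python standard library; it is independent of Crux, and the function name `check_csv_consistency` and the candidate delimiter list are assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def check_csv_consistency(text, candidates=(",", ";", "\t", "|")):
    """Return (delimiter, expected_width, bad_rows) for a CSV sample.

    bad_rows holds 1-based positions, counted among non-empty rows,
    of rows whose column count differs from the most common one.
    """
    best = None
    for delim in candidates:
        # Parse the sample with this delimiter, skipping blank lines.
        rows = [r for r in csv.reader(io.StringIO(text), delimiter=delim) if r]
        if not rows:
            continue
        width, count = Counter(len(r) for r in rows).most_common(1)[0]
        if width < 2:
            # This delimiter does not split most rows into columns.
            continue
        score = count / len(rows)  # share of rows with the common width
        if best is None or score > best[0]:
            best = (score, delim, width, rows)
    if best is None:
        raise ValueError("no candidate delimiter splits the sample into columns")
    _, delim, width, rows = best
    bad_rows = [i for i, r in enumerate(rows, start=1) if len(r) != width]
    return delim, width, bad_rows


sample = "id,name,price\n1,apple,0.5\n2;pear;0.7\n3,fig,1.2\n"
print(check_csv_consistency(sample))  # → (',', 3, [3])
```

In the sample, the third row uses semicolons instead of commas, so it parses as a single column and is reported as inconsistent.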

💡 Crux also supports "raw" deliveries, where you can select files in any format and deliver them to the target destination as-is, without profiling.

Learn more

Crux Domain Model