B2D Storage, Data File Creation, and Upload Process

Question
Answer

What is the data file that is generated and what is it used for?

The data file is generated in parquet or CSV format, depending on the configuration (the default is parquet), and contains the data used to build the table. Each file contains data for one table only; however, more than one file can be generated for the same table.

The generated files are cut based on the max file size configuration for the file. Once the file reaches the configured size, the file is finalized, and a new file is generated.
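The cutting behavior described above can be sketched as follows. This is a minimal illustration, not the Sisense implementation: the function name plan_file_cuts, the max_file_size parameter, and the use of per-row byte sizes are all assumptions made for the example.

```python
def plan_file_cuts(row_sizes, max_file_size):
    """Group row sizes (in bytes) into files.

    Hypothetical helper: a file is finalized as soon as its accumulated
    size reaches max_file_size, and the next row starts a new file.
    """
    files = []
    current = []
    current_size = 0
    for size in row_sizes:
        current.append(size)
        current_size += size
        if current_size >= max_file_size:
            # The file reached the configured size: finalize it.
            files.append(current)
            current, current_size = [], 0
    if current:
        # Remaining rows form the last, smaller file.
        files.append(current)
    return files


if __name__ == "__main__":
    # Five rows of 40 bytes each with a 100-byte limit yield two files.
    print(plan_file_cuts([40, 40, 40, 40, 40], 100))
```

Note that in this sketch a file can slightly exceed the limit, because the size check happens after a row is written; whether the real writer behaves this way is not stated in the source.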

By default, the data files are created compressed.

What is the naming convention for the file?

The file name is composed of the name of the table being built (converted to a valid file name), the number of the file writer used to write the file, and a serial number for the file. The serial number increases every time a new file is written for the given table.

The convention is:

<Table Name>.<Writer #>.<File #>.parquet or .csv

For example, two files could be created for a given table. The names below show that two file writers worked in parallel, each writing one file for the table:

Employees.1.1.parquet

Employees.2.2.parquet
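To make the convention concrete, the sketch below builds and parses names of the form <Table Name>.<Writer #>.<File #>.parquet or .csv. The function names and the sanitization rule (replacing anything other than letters, digits, underscore, and hyphen with an underscore) are assumptions for illustration; the source does not specify how a table name is converted to a valid file name.

```python
import re

# Pattern for <Table Name>.<Writer #>.<File #>.parquet or .csv
_NAME_RE = re.compile(
    r"(?P<table>.+)\.(?P<writer>\d+)\.(?P<file>\d+)\.(?P<fmt>parquet|csv)"
)


def build_file_name(table_name, writer_no, file_no, fmt="parquet"):
    """Compose a data file name (hypothetical helper)."""
    if fmt not in ("parquet", "csv"):
        raise ValueError("fmt must be 'parquet' or 'csv'")
    # Assumed sanitization rule; the real conversion may differ.
    safe = re.sub(r"[^A-Za-z0-9_-]", "_", table_name)
    return f"{safe}.{writer_no}.{file_no}.{fmt}"


def parse_file_name(file_name):
    """Return (table, writer #, file #, format), or None if no match."""
    m = _NAME_RE.fullmatch(file_name)
    if m is None:
        return None
    return (m["table"], int(m["writer"]), int(m["file"]), m["fmt"])


if __name__ == "__main__":
    print(build_file_name("Employees", 1, 1))
    print(parse_file_name("Employees.2.2.parquet"))
```

Parsing the names this way can help when inspecting a bucket, for instance to count how many writers and files were involved for a table.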

How would we know if the problem is in the uploading of the output data file?

An upload of a data file to S3 can fail or time out, and as a result the build can fail. Such a failure is typically caused by incorrect permissions configured for Sisense on S3, or by connectivity issues between the Sisense instance and S3.

Note that a returned error from Amazon will be captured.

What is the naming convention for the folder created in the bucket?

The folder that is created inside the configured bucket during the upload to S3 is named after the cube title.
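Because S3 has no real directories, a "folder" is simply a key prefix ending in a slash. The sketch below shows how an object key for an uploaded data file might look under that layout. The function name, the sample cube title, and the assumption that the cube title is used verbatim as the prefix are illustrative only.

```python
def object_key(cube_title, file_name):
    """Build an S3 object key of the form <cube title>/<file name>.

    Hypothetical helper: assumes the cube title is used as-is for the
    folder name, which the source does not confirm.
    """
    return f"{cube_title}/{file_name}"


if __name__ == "__main__":
    # "Sales Cube" is a made-up cube title for the example.
    print(object_key("Sales Cube", "Employees.1.1.parquet"))
```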

Is the Sisense storage still being utilized during the B2D process?

Yes, Sisense storage is used to create the data files before they are uploaded to S3.

Based on the B2D design, does Sisense recommend utilizing local storage to improve build performance?

This recommendation is not applicable when this feature is used.