Google BigQuery Live

This topic describes how you can add live connections to your Google BigQuery databases in Sisense. For information about live connections, see Live Connect.

Sisense’s Google BigQuery connector is a powerful tool allowing you to retrieve from BigQuery only the data that you require, based on timestamp, date range or other parameters defined by you, thus reducing the import times and the costs associated with querying a large number of tables in BigQuery.

Sisense’s Google BigQuery connector supports complex data types used in BigQuery. Complex data types, including Nested and Nested Repeated, are ingested automatically without any need to write SQL and functions.

In this topic:

Connecting to BigQuery

Sisense enables easy and quick access to tables contained within Google BigQuery. To connect to your Google BigQuery database, you need to provide a Project ID. The Project ID is a unique identifier for your BigQuery project. You receive the Project ID when you create a project in Google BigQuery.

After you provide the Project ID, you can sign into your BigQuery account with your Google credentials. Once Google has authenticated your account, you can select what tables are to be imported into Sisense.

Note: If you are connecting to a Google service remotely, and the address of Sisense is something other than localhost, Google requires that you connect using the OAuth 2.0 protocol. See Google Authentication for more information.

To add Google BigQuery live connection:

  1. In the Data page, open a live model or click to create a new live model.
  2. In the Model Editor, click . The Add Data dialog box is displayed.
  3. In the Add Data dialog box, select Google BigQuery.

  4. In Project ID, enter your BigQuery Project ID. This project ID is only used for authentication. Once you log in, you will have access to all the projects to which you have Google Big Query permissions.
    If you have already created a project and can’t find the Project ID, click here for help.
  5. Click Sign in with Google and enter your email, and then select Next to enter your password. If you have multiple accounts, select the account that has the Google BigQuery data you want to access and enter the password, if you're not already signed in. You must be signed out of all other Google accounts.
  6. Click Accept to allow your Sisense Server to access your Google BigQuery data.
  7. Close the browser window when notified to do so.
  8. In Record Fields Flattening Level, enter the number of nested levels you want to flatten. For example, in a table with nested data depth of seven, you may choose to flatten only the first two levels. The data on the other levels will not be imported into Sisense. See Single Record Objects for more information.
  9. Click Allow Large Results if your BigQuery table is larger than 128 MB. Note that when importing query results larger than 128 MB, BigQuery creates a temporary table (click here for more information). For this table to be created, you must be granted the ‘Bigquery.tables.create’ permission in BigQuery (click here for more information). If you don’t have this permission, the query results will not be imported into Sisense. The message that Sisense issues in this case is “Response too large to return. Consider setting allowLargeResults to true in your job configuration.”
  10. Click Next. All tables and views associated with Google BigQuery are displayed.
  11. From the Tables list, select the relevant table or view you want to work with. You can click next to the relevant table or view to see a preview of the data inside it.
  12. After you have selected all the relevant tables, click Done. The tables are added to your schema.

Retrieving the Parent Table from Sharded Tables

Sisense provides a unique solution to rebuilding the parent table from its constituent sharded tables (grouped view). For this, Sisense provides an external file, where you can indicate the custom prefixes or regular expressions (regex) by which to group the shared tables into one parent table.

To group the shared tables into a single parent table:

  1. Open the prefixes.json file (You can open it in Notepad). This file is located in the directory:
    ...ProgramData\Sisense\DataConnectors\JVMContainer\Connectors\GoogleBigQuery

    This file contains default prefix and regex parameters that group all Google Analytics daily tables of the format "ga_sessions_YYYYMMDD” into one parent table.
  2. To refine the grouping, modify the default parameters or add additional parameters as relevant to your data, and save the file. Make sure that the prefixes row is separated by a comma from the regex row.
  3. To update your data model based on the new parameters, do one of the following:
    • from the Connection Wizard, jump to a previous step and return to the step you were on.
    • from the data model, open the Connector Wizard to add another table.

In both cases, The Connector Wizard will display the parent table, grouped from the sharded tables based on the indicated parameters.

Controlling the Partitions to Import with Custom Queries

Google BigQuery supports partitions and sharded tables to improve performance, availability, and maintainability.

The Sisense BigQuery connector provides the following abilities to run queries on BigQuery's table partitions and sharded tables:

You can import subsets of your partitioned and sharded tables into Sisense using SQL. Starting from Sisense version 8.0.1, you can create custom SQL queries in Live data models. With Live table queries, you can now add and transform data and build custom business logic for analytics from Live models using custom SQL. Using this feature you can add a WHERE clause on your specific partition. For details, see here.

Querying Nested and Array Objects

BigQuery supports nested records and arrays within tables. Nested records in BigQuery can be Single or Repeated records.

These record types are imported into Sisense as:

For example: A BigQuery schema that has Single and Repeated records:

Would be imported into Sisense as:

Click here for more information.

Sisense ingests both nested records and arrays automatically without any need to write SQL and functions:

Note: Sisense uses the standard SQL dialect, and not legacy SQL (also known as the BigQuery SQL).

Single Record Objects

BigQuery supports Nested data as objects of Record data type.

When importing data into Sisense, you need to indicate how many levels of nested data you want to flatten (see Connecting to BigQuery). In Sisense, data on these levels will be flattened to columns using the dot operator (.).

Data on the nested levels that you chose not to flatten is not imported into Sisense.

Repeated Record Objects

BigQuery supports Nested Repeated data as arrays of Repeated Record data type.

When importing data into Sisense, you can use the UNNEST operator to retrieve nested and repeated data as flattened rows via custom query (see here for more information).

Data in Array (primitive) data type is imported into Sisense. Sisense converts this data to un-indexed strings.

Source Limitations

Using Google BigQuery Storage API

The Sisense Big Query connector allows you to use the BigQuery Storage API. The BigQuery Storage API provides fast access to data stored in BigQuery. The BigQuery Storage API is a paid BigQuery service. For details about this service, click here.

Once the Storage API is enabled in BigQuery, it becomes available in Sisense and helps speed up the build times. On each import data request from Sisense, the executed query will be automatically optimized to work with the Storage API to import the data from BigQuery. In some use cases, using Storage API for large imports, you can achieve up to x5 faster build time.

When enabled, the driver checks the number of rows in an incoming result set table and the number of pages needed to retrieve all the results. If the number of rows and pages exceeds the threshold defined by Google Big Query, the driver switches to using the BigQuery Storage API. The Storage API feature allows for additional parallelism among multiple consumers for a set of results, thus enabling the driver to handle large result sets more efficiently.

Note: Storage API is relevant for use in ElastiCube models only (not in Live models).

Enabling Storage API in Google BigQuery

Enable the below service in the BigQuery API Library:

Storage API is a beta feature by BigQuery.

To learn more: