Using Python and R in Notebooks
  • 18 May 2022
  • 6 Minutes to read
  • Contributors
  • Dark
    Light
  • PDF

Using Python and R in Notebooks

  • Dark
    Light
  • PDF

Python Libraries

The libraries provided for the code language selected for the kernel are listed in the left navigation panel. The libraries need to be imported from a code block at least once and can be used throughout the Notebook.
providedlibraries.png

Adding Python Libraries

You can add external Python libraries from the code blocks using the pip install commands.

Once an installation of a library is complete, a note displays at the bottom of the console output from pip stating that you may need to restart the kernel to use the updated packages. This note can be ignored as you will not need to restart the kernel to use the package.
ipiinstall.png

pipinstall.png

Watch this video about adding Python libraries:

Starting the Kernel

The kernel is a virtual machine that performs the computations required to execute code. It can run in the background to speed up computing.

Note:

Running the kernel can increase the CPU and memory use of your instance.

It has three statuses:

  • Stopped
  • Starting
  • Running with Elapsed Time

The kernel does not automatically start when a Notebook is created. The kernel status will read “Stopped” until the very first Code block has been triggered to run.

When the first Code Block is triggered to run by a user clicking “Run Code”, the kernel status changes to “Starting” and the code block should show a status stating “Starting kernel” with a loading icon.

Note:

Starting the first code block run may take a few minutes. Subsequent code runs will be quicker.

Once the kernel has been started, the Kernel Status shows a timer with the Elapsed Time to indicate how long the kernel has been running. You can click the square stop icon to stop the kernel.You will need to run a code block to start the kernel again.

Switching languages, for example from Python 3.9 to Python 3.7, stops the current kernel and starts a new one.
switchlang.png

The kernel shuts down for 5 minutes after the notebook has been closed or after the browser tab has been closed.

Note:

If the kernel is left running and the notebook is left open, the kernel continues to run and can increase the CPU and memory use of your instance.

Once the Python output has been saved using SisenseHelper.save_dataframe(), the output can be used to create a Sisense Chart by clicking the Chart button.

Visualizing with Python and R Plotting Libraries

Adding a Code Cell

To gain further insights from the analysis done using SQL, you can use Python to do more complicated manipulations. The outputs of the SQL block can be loaded into the Code blocks using the SisenseHelper.load_dataframe() function with the name of the SQL block as the input to continue the analysis using the preferred coding language.

There are 2 Sisense Helper functions that help load in the data from another SQL cell and save the final output as a dataframe.

  • Click + Code. The code block appears and already shows the Sisense Helpers for reference. The load_dataframe() function already has the most recent SQL block name as the input. In the screenshot below, that is “data_year”
  • To create a Sisense chart using the output of the code, the SisenseHelper.save_dataframe() must be used with the desired dataframe used as the input.
  • SQL block outputs can be loaded into and used in Code blocks but Code blocks cannot be loaded into other Code blocks as dataframes at this time.
    addcodecell.png

Watch this video on how to add Python code:

Referencing Other Cells

If a single query (cell) is refreshed, it will not rerun any other cell that it depends on. Rather, it incorporates the SQL of the cell it depends on and rewrites the query in the backend to return an up-to-date result. For example, assume that you have queryA which depends on queryB. The following would be the sequence of events:

  1. Run queryB.
  2. The latest version of queryA is temporarily injected into queryB.
  3. QueryB runs.
  4. The queryB cell updates, but queryA's cell doesn’t update.

Cell Functions

You can perform various functions on a cell:
cellfunctions.png

  1. Duplicate the cell.
  2. Click to break down CTEs into individual cells. This automatically breaks a cell with multiple common table expressions into multiple cells so they’re easier to understand and work with. For more information see About Refactoring CTEs, below.
  3. Click to reference this cell in another cell.
  4. Click the up and down arrows to change the position of this cell among the other cells in this Notebook.
  5. Click to delete the cell.
  6. Select to limit the number of results in the preview, and set the maximum number of rows (the default is 5,000 rows).
  7. Click to run this cell.
  8. Click to create a chart based on this cell.
  9. Download the results to a CSV file.
  10. Click and edit the name of the cell. Doing so will make it easier to refer to the query in the cell from other cells. Click the down arrow to contract the cell.
  11. Add new cells:
    • Click + SQL to add a new cell for entering SQL queries
    • Click + Code to add a code block for Python, or R
    • Click + Text to add a text cell for entering free text to annotate your code
      addcell.png

Watch this video about adding SQL cells:

Watch this video about adding text boxes:

Refactoring CTEs

Common Table Expressions (CTEs) can be very long. There’s no limit on how many CTEs can be defined at the beginning of a query, as long as the database has the capacity to run it without timing out. The problem with having lots of CTEs is that it can be difficult to debug.

Consider the following example:
code.png

The result of this query is 1527416. However, based on other data, we happen to know that this result is too low to be correct. It’s very hard to investigate the reason for this result just by looking at the query. Further debugging is required.
Use the CTE Refactoring feature in Notebooks to save time when trying to pinpoint issues by debugging your code.

Without the refactoring feature, you'd have to copy and paste each SQL statement into its own cell and see what the results are and determine if there’s anything wrong with each CTE.

With CTE refactoring, it's done in a single click. You can run all of the cells again to investigate the results of each.

In our example, the issue is in fourth_cte. As you can see, it doesn’t return results, unlike the third_cte above it. Upon closer inspection, it becomes clear that the reason for returning an empty result is that ‘ios” is case-sensitive and should be entered as iOS, instead.
query.png

Image Charts with Plotting Libraries

In addition to creating Sisense provided charts with the SisenseHelper.save_dataframe() function, analysts can take advantage of the plotting libraries available to Python and R by using a 3rd Sisense Helper function that will output a static image of the custom plot.

  • After creating a custom visualization using a plotting library like matplotlib, seaborn, ggplot and more.
  • Use SisenseHelper.save_image() to save and output your image chart

Only 1 image can be output per code block.
If the SisenseHelper.save_dataframe() function and the SisenseHelper.save_image() function are both found in the code cell, the save_dataframe() function will be the output.

Here is an example using the Chloroplethr plotting library with R:
imageplottinglibraries.png

Limitations

  • Interactive libraries, such as Plot.ly, are not supported in the Notebooks tab for Python or R.
  • Any chart created by Python or R cannot be used with the Sharing SQL Charts to Dashboard feature, which takes a chart from a Notebook and adds it directly to a Dashboard in the Analytics tab. This includes the Sisense provided charts created with Python or R and also the custom plot images.

Was this article helpful?