Jupyter Notebook Server

The skein.recipes.jupyter_notebook module provides a command-line recipe for starting a Jupyter notebook server on a YARN container. The intended use is to execute the module in a service, using the command:

$ python -m skein.recipes.jupyter_notebook

The executing Python environment must contain the following dependencies to work properly:

  • skein

  • notebook

After launching the service, the notebook server connection information will be stored in the key-value store under the key 'jupyter.notebook.info'. This key name is configurable with the command-line flag --notebook-info-key.

Note

This recipe is not production ready, and is mostly provided for educational reasons. Port forwarding the running notebook server off a container may be forbidden by your cluster admins, and in general is not a nice workflow. For a better way of running Jupyter on a YARN cluster see Jupyter Enterprise Gateway.

Example

Here we provide a complete walkthrough of launching and connecting to a Jupyter notebook server. This example assumes you’re logged into and running on an edge node.

Kinit (optional)

See Kinit (optional).

Start the Skein Driver (optional)

See Start the Skein Driver (optional).

Package the Python Environment

To distribute Python environments we’ll make use of conda-pack, a tool for packaging and distributing conda environments. As mentioned above, we need to make sure we have the following packages installed in the remote environment:

  • skein

  • notebook

we’ll also install numpy to have an example library for doing some computation.

# Create a new demo environment (output trimmed for brevity)
$ conda create -n jupyter-demo
...

# Activate the environment
$ conda activate jupyter-demo

# Install the needed packages (output trimmed for brevity)
$ conda install conda-pack skein notebook numpy -c conda-forge
...

# Package the environment into environment.tar
$ conda pack -o environment.tar
Collecting packages...
Packing environment at '/home/jcrist/miniconda/envs/jupyter-demo' to 'environment.tar'
[########################################] | 100% Completed | 16.1s

Fix the Packaged Kernelspec

When conda-pack packages an environment, it trys to fix as many absolute paths found in the environment to be as relocatable as possible. Those that can’t be fixed before packaging can be fixed after unarchiving using the conda-unpack command. However, this mutates the files in place, which is not allowed on YARN (the application cache directory is read-only).

To get around this, for now we need to manually patch the packaged kernelspec file to remove the absolute path.

# The patched kernel.json file
$ cat kernel.json
{
 "display_name": "Python 3",
 "language": "python",
  "argv": [
  "python",
  "-m",
  "ipykernel_launcher",
  "-f",
  "{connection_file}"
 ]
}

# Append the patch to the environment.tar file
$ tar --append \
      --file=environment.tar \
      --transform 's,^,share/jupyter/kernels/python3/,S' \
      kernel.json

# gzip the environment.tar file
$ gzip -c6 environment.tar > environment.tar.gz

Write the Application Specification

Next we need to write the application specification. For more information see the specification docs.

# stored in jupyter-demo.yaml

name: jupyter-demo

services:
  jupyter:
    resources:
      memory: 1 GiB
      vcores: 1
    files:
      # Distribute the bundled environment as part of the application.
      # This will be automatically extracted by YARN to the directory
      # ``environment`` during resource localization.
      environment: environment.tar.gz
    script: |
      # Activate our environment
      source environment/bin/activate
      # Start the jupyter notebook server
      python -m skein.recipes.jupyter_notebook

Start the Jupyter Notebook Server

Now we have everything needed to start the Jupyter notebook server. The following bash command starts the application and stores the application id in the environment variable APPID.

$ APPID=`skein application submit jupyter-demo.yaml`

Retrieve the Notebook Server Information

To connect to a notebook server we need the address. If not specified explicitly in the startup command, this address (and other connection info) will be stored in the key 'jupyter.notebook.info'. We can retrieve this information and store it in a file using the following bash command:

$ skein kv get $APPID --key jupyter.notebook.info --wait > notebook.json
$ cat notebook.json
{"protocol": "http", "host": "worker.example.com", "port": 8888, "base_url": "/", "token": "fafa55f112e6ca7a73717933c3d4a03859df799f48f54c92"}

Connect to the Notebook Server

Since the notebook server is running on a container, you’ll (probably) need to use an SSH tunnel to access the server. This can be done by ssh-ing into the edge node again, and forwarding the host and port from the notebook.json information above out.

# Start an SSH tunnel to forward out the host:port of the notebook server,
# as found above
$ ssh jcrist@edge.example.com -L 8888:worker.example.com:8888

At this point you can login to the notebook server at localhost:8888 on your browser. On first login you’ll need to provide the security token (also in the notebook.json information above).

Once logged in, you can create new notebooks and interact with Jupyter as you normally would. All notebooks run inside the remote container, and have access to the distributed Python environment.

_images/jupyter_demo.png

Shutdown the Application

Once you’re done, you can shutdown the application in one of two ways.

  • Hit the Quit button on the Jupyter Web UI at localhost:8888/tree. This will shutdown the notebook server, causing the container to exit.

  • Manually shutdown the application using the Skein CLI

    $ skein application shutdown $APPID
    

Confirm that the Application Completed

We can check that application shutdown properly using skein application status

$ skein application status $APPID
APPLICATION_ID                    NAME            STATE       STATUS       CONTAINERS    VCORES    MEMORY    RUNTIME
application_1533143063639_0021    jupyter-demo    FINISHED    SUCCEEDED    0             0         0         10m