Development¶
This page provides information on how to build, test, and develop Skein.
Building Skein¶
Install Dependencies (Conda)¶
We recommend using the Conda package manager to set up your development environment. The commands below create a conda environment that contains the build dependencies.
# Create a new conda environment
$ conda create -n skein
# Activate environment
$ conda activate skein
# Install dependencies
$ conda install -c conda-forge grpcio protobuf cryptography pyyaml
# Install grpcio-tools (not on conda-forge currently)
$ pip install grpcio-tools
Besides the above dependencies, you’ll also need Maven. You can install Maven using your system package manager, from the Maven website, or with Conda:
$ conda install -c conda-forge maven
Install Dependencies (Pip)¶
You can also set up the development environment using pip:
$ pip install grpcio protobuf cryptography pyyaml grpcio-tools
Besides the above dependencies, you’ll also need Maven. You can install Maven using your system package manager or from the Maven website.
Build and Install Skein¶
You can build and install Skein either as an editable package or as a regular package.
# Build and install skein as an editable package
$ python setup.py develop
# or, build and install as a regular package
$ python setup.py install
Running the Tests¶
The test suite is designed to run in a specific Hadoop setup, provided by the
hadoop-test-cluster package. This is a CLI tool for setting up a Hadoop
cluster using docker compose. It requires that docker compose is installed and
that the docker daemon is already running. Please follow the docker compose
install instructions for your system.
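The prerequisites above can be checked before going further. The following is a minimal sketch, not part of Skein or hadoop-test-cluster: the helper names have_cmd and check_docker_prereqs are hypothetical, and it assumes the standalone docker-compose executable rather than a docker plugin.

```shell
#!/bin/sh
# Hypothetical helper (not part of Skein or hadoop-test-cluster):
# succeed if the named command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: verify the prerequisites named above.
# Assumes the standalone `docker-compose` executable is the one in use.
check_docker_prereqs() {
    have_cmd docker || { echo "docker not found on PATH" >&2; return 1; }
    have_cmd docker-compose || { echo "docker-compose not found on PATH" >&2; return 1; }
    # `docker info` exits non-zero when the daemon cannot be reached
    docker info >/dev/null 2>&1 || { echo "docker daemon is not reachable" >&2; return 1; }
    echo "docker prerequisites look OK"
}
```

Running check_docker_prereqs before starting the cluster reports the first missing prerequisite instead of a less direct failure later on.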
Install hadoop-test-cluster¶
You can install hadoop-test-cluster using pip. This assumes you already have
docker and docker-compose installed.
The hadoop-test-cluster repository README documents its usage; below we provide only the commands needed to run the tests with the cluster.
$ pip install hadoop-test-cluster
Start Up the Test Cluster¶
The following command starts up a tiny Hadoop cluster with simple security, and
mounts the current directory as ~/skein on every node. To create a cluster
with kerberos security enabled, add --config kerberos to the command.
$ htcluster startup --image cdh5 --mount .:skein
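The two startup variants described above can be sketched as a small shell helper that prints the command for a given security mode. The function name startup_cmd is hypothetical, and placing --config kerberos before --mount is an assumption; the text only says to add the flag to the command.

```shell
#!/bin/sh
# Hypothetical helper: print the htcluster startup command for a security mode.
# The flag order in the kerberos case is an assumption, not documented above.
startup_cmd() {
    mode="${1:-simple}"
    case "$mode" in
        simple)
            echo "htcluster startup --image cdh5 --mount .:skein" ;;
        kerberos)
            echo "htcluster startup --image cdh5 --config kerberos --mount .:skein" ;;
        *)
            echo "unknown security mode: $mode" >&2
            return 1 ;;
    esac
}
```

Printing the command rather than running it keeps the sketch safe to try without docker; copy the printed line into your shell to start the cluster.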
Log In to the Edge Node¶
$ htcluster login
Set Up the Development Environment¶
The docker image already has Conda installed. After startup, you only need to install the runtime and test dependencies (see Install Dependencies (Conda)). Maven is also already installed on the docker image, so you can skip the local build instructions above and do everything on the docker image.
You also need pytest to run the tests, and flake8 to run the lint checks.
$ conda install -c conda-forge pytest flake8
Build and Install Skein¶
$ python setup.py develop
Run the Tests¶
$ pytest skein
Run the Linter¶
$ flake8 skein
Shut Down the Cluster¶
When you are done developing, you can shut down the cluster with the following command:
$ htcluster shutdown
Building the Documentation¶
Skein uses Sphinx for documentation. The source files are located in
skein/docs/source. To build the documentation locally, first install the
documentation build requirements:
$ pip install sphinx numpydoc sphinxcontrib.autoprogram
Then build the documentation with make:
# Running from the skein/docs folder
$ make html
The resulting HTML files end up in the build/html directory.
Submitting a Documentation-Only Pull Request¶
If your pull request only contains documentation changes, you can tell
Travis-CI to skip running the tests (and speed up our CI process) by including
the string "skip-tests" somewhere in your commit message. For example:
Note how to skip tests on travis-ci [skip-tests]
Add a note to the develop.rst docs on how to skip running the tests in
travis.
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# On branch conditional-docs-build
# Changes to be committed:
# modified: docs/source/develop.rst
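As an illustration of the check described above, the following sketch tests whether a commit message contains the string. The helper should_skip_tests is hypothetical and is not taken from Skein's actual Travis-CI configuration.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the given commit message
# contains the string "skip-tests" anywhere in its text.
should_skip_tests() {
    printf '%s\n' "$1" | grep -q 'skip-tests'
}

# Possible usage against the most recent commit (requires a git checkout):
#   should_skip_tests "$(git log -1 --pretty=%B)" && echo "skipping tests"
```

Because the match is a plain substring search, the surrounding brackets in the example message are optional; any commit message containing skip-tests would match.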