First make sure that Skein is installed.
Skein is intended to be used either as a command-line application or programmatically using the Python API. This document provides a brief example using the command line; for more information, see the API or CLI documentation.
If your system is configured to use Kerberos for authentication, make sure you have an active ticket-granting ticket before continuing, for example by running kinit.
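One way to check for a valid ticket from a script is `klist -s`, which in MIT Kerberos exits with status 0 when the credentials cache is readable and unexpired. The sketch below wraps that check. The function name `has_kerberos_ticket` and its injectable `runner` parameter are illustrative and not part of Skein.

```python
import subprocess


def has_kerberos_ticket(runner=subprocess.run):
    """Return True if `klist -s` reports a readable, unexpired credentials cache."""
    try:
        result = runner(["klist", "-s"])
    except FileNotFoundError:
        # klist is not installed, so no ticket can be confirmed.
        return False
    return result.returncode == 0
```

Calling `has_kerberos_ticket()` before any Skein command lets a script fail early with a clear message instead of a less obvious authentication error later on.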
Start the Skein Driver (optional)
To communicate with the YARN ResourceManager, Skein uses a background driver process written in Java. Because this process can be slow to start, it can be convenient to start it once and have it persist across all CLI calls. The workflow then looks like:
Start the Skein driver
Run one or more YARN applications
Shut down the Skein driver
To do this from the command-line, use skein driver start:
$ skein driver start
localhost:12345
Note that if you don’t start the driver process, one will be started for you, but not persisted between calls.
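The start, run, stop pattern above can be sketched as a Python context manager that shells out to the CLI. This is only an illustration: `skein_driver` and the `runner` parameter are hypothetical names, and the sketch assumes the `skein` executable is on the PATH.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def skein_driver(runner=subprocess.run):
    """Start the Skein driver on entry and stop it on exit, even on errors."""
    runner(["skein", "driver", "start"], check=True)
    try:
        yield
    finally:
        # Always shut the driver down so it does not linger on the edge node.
        runner(["skein", "driver", "stop"], check=True)


# Usage (requires Skein and access to a cluster):
# with skein_driver():
#     subprocess.run(["skein", "application", "ls"], check=True)
```

Putting the stop call in a `finally` block means the driver is shut down even if one of the commands in between fails.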
Write an Application Specification
Skein applications are written declaratively as specifications. These can be provided as YAML or JSON files, or created programmatically using the specification API. For more information, see the specification docs.
Here we create a simple “Hello World” application as a YAML file:
name: hello_world
queue: default

master:
  resources:
    vcores: 1
    memory: 512 MiB
  script: |
    sleep 60
    echo "Hello World!"
Walking through this specification:
The application name is specified as hello_world, and the application will be deployed in the YARN queue default.
name: hello_world
queue: default
The application requests a single container for the Application Master with 1 virtual core and 512 MiB of memory.
...
master:
  resources:
    vcores: 1
    memory: 512 MiB
The Application Master then runs a bash script. Both stdout and stderr of this script will be written to the container logs.
...
  script: |
    sleep 60
    echo "Hello World!"
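Since specifications can also be provided as JSON, the same "Hello World" application can be built as a plain Python dictionary and serialized with the standard library. This is a minimal sketch that assumes the JSON form uses the same field names as the YAML above. It only prints the document, and saving it to a file is left to the reader.

```python
import json

# Mirrors the YAML specification field for field.
spec = {
    "name": "hello_world",
    "queue": "default",
    "master": {
        "resources": {"vcores": 1, "memory": "512 MiB"},
        # A YAML block scalar (|) keeps line breaks and a trailing newline.
        "script": 'sleep 60\necho "Hello World!"\n',
    },
}

spec_json = json.dumps(spec, indent=2)
print(spec_json)
```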
Submit the Application
Applications are submitted to be run on the cluster using the skein application submit command:
$ skein application submit hello_world.yaml
application_1526497750451_0009
This uploads any necessary files to HDFS, and submits the application to the YARN scheduler. Depending on current cluster usage this could start immediately or at a later time. The command outputs the Application ID, which is needed for subsequent commands.
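Because later commands need the Application ID, a script will usually want to capture it. The sketch below runs the submit command and extracts the ID from its output. It assumes the ID follows YARN's usual `application_<cluster timestamp>_<sequence>` form. `submit` and `runner` are illustrative names, not part of Skein.

```python
import re
import subprocess

# YARN application IDs look like application_<cluster timestamp>_<sequence>.
APP_ID = re.compile(r"application_\d+_\d+")


def submit(spec_path, runner=subprocess.run):
    """Submit a specification file and return the application ID it prints."""
    result = runner(
        ["skein", "application", "submit", spec_path],
        check=True,
        capture_output=True,
        text=True,
    )
    match = APP_ID.search(result.stdout)
    if match is None:
        raise ValueError(f"no application ID in output: {result.stdout!r}")
    return match.group(0)
```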
Query existing applications
As YARN processes applications, they work through several states, enumerated by ApplicationState. The status of all Skein applications can be queried using the skein application ls command. By default this shows all applications that are either SUBMITTED, ACCEPTED, or RUNNING:
$ skein application ls
APPLICATION_ID                    NAME           STATE      STATUS       CONTAINERS    VCORES    MEMORY
application_1526497750451_0009    hello_world    RUNNING    UNDEFINED    1             1         512
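For scripting, the tabular output can be turned into dictionaries keyed by column name. The parser below is a rough sketch that assumes columns are separated by whitespace and that no value (such as an application name) contains spaces. `parse_applications` is a hypothetical helper, not a Skein function.

```python
def parse_applications(output):
    """Parse whitespace-separated `skein application ls` output into dicts."""
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    # Pair each row's values with the column names from the header line.
    return [dict(zip(header, line.split())) for line in lines[1:]]


sample = """\
APPLICATION_ID                    NAME           STATE      STATUS       CONTAINERS    VCORES    MEMORY
application_1526497750451_0009    hello_world    RUNNING    UNDEFINED    1             1         512
"""

apps = parse_applications(sample)
print(apps[0]["APPLICATION_ID"], apps[0]["STATE"])
# application_1526497750451_0009 RUNNING
```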
You can also filter by application state. Here we show all KILLED and FAILED applications:
$ skein application ls -s KILLED -s FAILED
APPLICATION_ID                    NAME           STATE     STATUS    CONTAINERS    VCORES    MEMORY
application_1526497750451_0002    hello_world    KILLED    KILLED    0             0         0
application_1526497750451_0004    hello_world    KILLED    KILLED    0             0         0
application_1526497750451_0005    hello_world    FAILED    FAILED    0             0         0
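The `-s` flag is repeated once per state, which is easy to get wrong when building the command in a script. The following sketch assembles the argument list and rejects names that are not YARN application states. `ls_command` and `YARN_STATES` are illustrative names. The state list is taken from YARN's `YarnApplicationState`.

```python
# Application states defined by YARN (YarnApplicationState).
YARN_STATES = (
    "NEW", "NEW_SAVING", "SUBMITTED", "ACCEPTED",
    "RUNNING", "FINISHED", "FAILED", "KILLED",
)


def ls_command(states=()):
    """Build the argument list for `skein application ls` with state filters."""
    command = ["skein", "application", "ls"]
    for state in states:
        name = state.upper()
        if name not in YARN_STATES:
            raise ValueError(f"unknown application state: {state!r}")
        # Each state gets its own -s flag.
        command += ["-s", name]
    return command


print(" ".join(ls_command(["killed", "failed"])))
# skein application ls -s KILLED -s FAILED
```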
To get the status of a specific application, use the skein application status command:
$ skein application status application_1526497750451_0009
APPLICATION_ID                    NAME           STATE      STATUS       CONTAINERS    VCORES    MEMORY
application_1526497750451_0009    hello_world    RUNNING    UNDEFINED    1             1         512
Kill a running application
By default, applications shut down once all of their containers have exited or any container exits with a non-zero exit code. To explicitly kill an application, use the skein application kill command:
$ skein application kill application_1526497750451_0009

# See that the application was killed
$ skein application status application_1526497750451_0009
APPLICATION_ID                    NAME           STATE     STATUS    CONTAINERS    VCORES    MEMORY
application_1526497750451_0009    hello_world    KILLED    KILLED    0             0         0
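The kill-then-verify sequence can also be scripted. The sketch below runs both commands and reports whether the application has reached a state from which it will not run again. It assumes the status output has the whitespace-separated layout shown above. `kill_and_check`, `application_state`, and `runner` are hypothetical names.

```python
import subprocess

# States after which YARN will not run the application any further.
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def application_state(app_id, runner=subprocess.run):
    """Return the STATE column reported by `skein application status`."""
    result = runner(
        ["skein", "application", "status", app_id],
        check=True,
        capture_output=True,
        text=True,
    )
    lines = [line for line in result.stdout.splitlines() if line.strip()]
    header, row = lines[0].split(), lines[1].split()
    return dict(zip(header, row))["STATE"]


def kill_and_check(app_id, runner=subprocess.run):
    """Kill an application, then report whether it is in a terminal state."""
    runner(["skein", "application", "kill", app_id], check=True)
    return application_state(app_id, runner) in TERMINAL_STATES
```

If the state has not yet changed when the status is read, `kill_and_check` returns False, so a caller may want to retry the status check after a short delay.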
Stop the Skein Driver (optional)
If you started the Driver process (see Start the Skein Driver (optional) above), you’ll probably want to shut it down when you’re done. This isn’t strictly necessary (the driver can run for long periods), but helps keep resource usage on the edge node low.
To do this from the command-line, use skein driver stop:
$ skein driver stop