Debugging

Debugging YARN applications can be tricky, as you often don’t have access to the remote nodes to debug failures. This page provides several tips for what to do when you encounter failures.

Accessing the Skein Web UI

For running applications, the Skein Web UI can provide useful information including:

  • What services are currently running in the application

  • Status of all current and past containers

  • Live links to logs for each service (these are especially useful)

  • Key-value pairs in the Key-Value Store

For more information, see the Web UI docs.

Accessing the Application Logs

When an application finishes, its logs are (usually) aggregated and made available. They can be accessed using the yarn logs CLI command.

$ yarn logs -applicationId <Application ID>

The logs contain the stdout and stderr for each service, as well as the application master. This is a good first place to look when encountering an unexpected failure or bug.
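When the aggregated logs are large, it can help to pull them into Python and scan for likely failure lines. The following is a minimal sketch, not part of Skein: it shells out to the yarn logs command shown above, and the helper names (build_yarn_logs_command, fetch_logs, error_lines) and the marker strings are assumptions chosen for illustration.

```python
import subprocess


def build_yarn_logs_command(app_id):
    """Return the argv list for fetching one application's aggregated logs."""
    return ["yarn", "logs", "-applicationId", app_id]


def fetch_logs(app_id):
    """Run `yarn logs` and return its stdout as text.

    Assumes the `yarn` executable is on PATH. Raises
    subprocess.CalledProcessError if the command exits with a non-zero status.
    """
    result = subprocess.run(
        build_yarn_logs_command(app_id),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def error_lines(log_text, markers=("ERROR", "Exception", "Traceback")):
    """Return the lines of `log_text` that contain any of the marker strings.

    The default markers are illustrative guesses; adjust them to your services.
    """
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line for marker in markers)
    ]
```

With a real application ID in hand, `error_lines(fetch_logs(app_id))` returns the suspicious lines, which is often a quicker starting point than reading the full output.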

Configuring Logging in the Application Master

Skein’s Application Master uses Log4j for logging. When debugging issues in Skein itself, it can be useful to increase the log level to provide more information. This can be done in two ways:

  • Change the logging level with the log_level field (debug is a good option).

  • Override the default log configuration by specifying a custom log4j.properties file in the specification. This allows you to increase the logging level for component libraries as well. See the Log4j documentation for more information on configuration files.

master:
  # Change the log level to debug
  log_level: debug
  # OR provide a custom log configuration file
  log_config: path/to/my/log4j.properties

Configuring Logging in the Client Driver

The skein.Client uses a Java driver process to communicate with services like YARN and HDFS. If you run into issues communicating with these services (submitting, querying, or killing applications), it may be useful to increase logging verbosity for the skein driver process (debug is a good option). There are a few ways to do this:

  • Using the log_level keyword when creating a skein.Client.

  • Setting the SKEIN_LOG_LEVEL environment variable (e.g. SKEIN_LOG_LEVEL=DEBUG).

  • Using the --log-level flag when starting a persistent driver using the CLI (e.g. skein driver start --log-level debug).
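The environment variable option can also be applied from inside a Python script. The sketch below assumes the variable is read when the driver process is launched, so it must be set before the client is created. The make_client helper is hypothetical and imports skein lazily so that the snippet itself only needs the standard library.

```python
import os

# Set the driver log level for this process and its children.
# Assumption: this must happen before any skein.Client is created.
os.environ["SKEIN_LOG_LEVEL"] = "DEBUG"


def make_client():
    """Create a client that should pick up SKEIN_LOG_LEVEL from the environment."""
    import skein  # imported lazily; requires skein to be installed

    return skein.Client()
```

This can be convenient when the code that creates the client lives in a library you do not want to modify.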

Additionally, you may want to log to a file instead of to the terminal. There are two ways to do this:

  • Using the log keyword when creating a skein.Client.

  • Using the --log flag when starting a persistent driver using the CLI.

Example

# Create a client, logging to `driver.log` with "debug" log level
import skein
client = skein.Client(log_level='debug', log='driver.log')