Administrative Tasks and Troubleshooting

General Information

Users

cloud

The cloud account is the Linux user account to use for general administrative tasks. The Linux UI desktop automatically logs in to this account. This account has sudo access.

tomcat

The tomcat account owns the Tomcat web server process, which runs the web application and consequently owns all the Wolfram Engine (wolfram and WolframKernel) and background Wolfram user interface (Mathematica) processes.

apache

The apache account owns the Apache HTTP Server process (httpd), which serves static content and runs the authentication system.

Cloud User Home Directory

The cloud user account is the one to use for most administrative tasks.

Dot files

Both the .bashrc and .bash_profile files contain the following:
. /home/cloud/updater/epc-bashinit.sh
The Wolfram Cloud expects this line to remain there for various cloud tools to function correctly. Otherwise, the bash initialization files can be customized as desired.
Among other things, the epc-bashinit.sh script runs a script that ensures an ssh-agent process is running and sets the environment variables so that SSH can obtain keys from it automatically. When Wolfram Enterprise Private Cloud (EPC) runs as a cluster, the cloud admin must still provision the master node with the private key and arrange to run ssh-add to load it into the SSH agent.
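The ssh-add step can be sketched as a small shell function. This is an illustration rather than part of EPC: the function name load_cluster_key and the default key path are assumptions, and the logic relies only on the exit codes of ssh-add -l (0 when the agent holds at least one key, 1 when it holds none, 2 when no agent can be reached).

```shell
# Hypothetical key location; substitute the private key provisioned for the cluster.
KEY_FILE="${KEY_FILE:-$HOME/.ssh/id_rsa}"

# Load the given private key into the running ssh-agent if the agent is empty.
load_cluster_key() {
    key="$1"
    if [ ! -r "$key" ]; then
        echo "private key not readable: $key" >&2
        return 1
    fi
    ssh-add -l >/dev/null 2>&1
    case $? in
        0) echo "agent already holds at least one key; list them with ssh-add -l" ;;
        1) ssh-add "$key" ;;
        *) echo "no ssh-agent reachable; check that epc-bashinit.sh was sourced" >&2
           return 2 ;;
    esac
}

# Usage (on the master node, as the cloud user):
#   load_cluster_key "$KEY_FILE"
```

Because the function only adds a key when the agent is empty, it does not confirm that a key already in the agent is the one the cluster nodes accept.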

Updater

The updater directory holds a software component specific to EPC for installing and updating the Wolfram Cloud, as well as providing administration tools.

Configuration

The configuration directory holds configuration data as well as logs from the configuration tools. The files and directories contained here are:

Wolfram Cloud Directory

The /wolframcloud directory contains key parts of the Wolfram Cloud environment. This section describes them.

Log Files

Apache

/www/logs
The standard Apache log files, access and errors, are contained in directories named for the domain. The primary directory is the one beginning with "www" (this may be different if you have customized the default subdomain), e.g. /www/logs/www.wolfram-epc.example.com.
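As a convenience, the primary directory can be located with a short shell function. This is a sketch only: primary_log_dir is a hypothetical helper name, and it assumes the default subdomain, so a customized subdomain would need a different prefix than "www".

```shell
# Print the first directory under the logs root whose name begins with "www".
primary_log_dir() {
    root="${1:-/www/logs}"
    for d in "$root"/www*/; do
        if [ -d "$d" ]; then
            printf '%s\n' "${d%/}"
            return 0
        fi
    done
    echo "no www* directory under $root" >&2
    return 1
}

# Usage:
#   ls -l "$(primary_log_dir)"
```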

Tomcat (Wolfram Cloud web app and Wolfram Engine logs)

/www/tomcat/logs/wolfram
This is the primary log file directory for the Wolfram Cloud, as it contains logs for the Wolfram Cloud server-side web application, as well as some of the Wolfram Engine logs. Logs in this directory are rotated on a daily basis at midnight, e.g. CloudPlatform.log would be renamed to CloudPlatform.log.2020-12-31.log at midnight on New Year's Eve. Some commonly used files in this directory include:
/www/tomcat/base/current/log
Lower-level Tomcat logs are located here, so if the Wolfram Cloud web app does not appear to be starting or appears to have stopped, this is where information can be found. Log files here are always named with the date, even for the current day. Useful files here include:

Administrative Tools

Fabric Commands

The Wolfram Cloud provides a number of administrative commands using fab, the Fabric remote execution tool. A Fabric command is run by changing to the updater directory and using the fab command followed by its command name and any arguments. For example, this runs the service_status command with no arguments:
cd ~/updater
fab service_status
If a command takes arguments, they are specified by adding :arg=value immediately after the command name (no space), with commas separating multiple arguments. This runs the service_status command for just the tomcat service:
fab service_status:service=tomcat
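When calling Fabric commands from scripts, the :arg=value syntax can be assembled by a small helper. This is a minimal sketch; fab_task is a hypothetical name, not something shipped with the updater.

```shell
# Build "command:arg=value,arg2=value2" from a command name and arg=value pairs.
fab_task() {
    task="$1"
    shift
    sep=":"
    for pair in "$@"; do
        task="${task}${sep}${pair}"
        sep=","
    done
    printf '%s\n' "$task"
}

# Usage (from the updater directory):
#   fab "$(fab_task service_status service=tomcat)"
```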
Note that the update command must be run with sudo, since the script needs elevated permissions to update installed files:
sudo fab update
The following subsections describe the available Fabric commands in more detail.

configure

The configure command applies settings in configvalues.json to the Wolfram Cloud. It does this by applying the settings to templates of configuration files, resulting in final concrete configuration files stored in /wolframcloud/configuration, and then installs the updated configuration files in their respective locations. The updated configurations will not take effect until the Wolfram Cloud services are restarted:
fab configure
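Since configuration changes only take effect after a restart, the two steps can be combined. The following is a sketch under assumptions: apply_configuration, FAB and UPDATER_DIR are hypothetical names introduced here, with FAB overridable (for example FAB=echo) to preview the commands without running them.

```shell
FAB="${FAB:-fab}"                            # assumption: fab is on the PATH
UPDATER_DIR="${UPDATER_DIR:-$HOME/updater}"  # the updater directory described above

# Apply configvalues.json, then restart services so the new files take effect.
apply_configuration() {
    # Run in a subshell so the caller's working directory is unchanged.
    (
        cd "$UPDATER_DIR" || exit 1
        "$FAB" configure || exit 1   # stop here if configuration fails
        "$FAB" restart_all
    )
}

# Usage:
#   apply_configuration
```

Stopping after a failed configure avoids restarting services with partially installed configuration files.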

restart_all

The restart_all command restarts Wolfram Cloud services by first stopping them all and then starting them. It is equivalent to using stop_all followed by start_all:
fab restart_all
If new initialization code is installed, you will need to restart the Tomcat service in order for cloud kernels to reinitialize.

backup

The backup command starts an interactive session that allows you to select which data to back up. The backup is written to the same machine, so the operation may fail if there is insufficient disk space. In that case, the backup may need to be done over the network, pulling the data onto another machine with sufficient storage:
fab backup
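Before starting a backup, it can help to compare the size of the data with the free space on the target filesystem. This is a rough sketch: the helper names are hypothetical, the directories are passed in because the backup destination is not specified here, and the comparison ignores any compression the backup may apply.

```shell
# Free space, in kilobytes, on the filesystem holding the given path.
free_kb() {
    df -Pk "$1" | awk 'NR==2 { print $4 }'
}

# Disk usage, in kilobytes, of the given directory tree.
used_kb() {
    du -sk "$1" 2>/dev/null | awk '{ print $1 }'
}

# Succeed only if the target filesystem has more free space than the data uses.
enough_room() {
    need=$(used_kb "$1")
    have=$(free_kb "$2")
    [ -n "$need" ] && [ -n "$have" ] && [ "$have" -gt "$need" ]
}

# Usage (both paths are examples):
#   enough_room /wolframcloud/userfiles /home/cloud || echo "not enough space"
```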

restore

The restore command starts an interactive session that allows you to select which backup file to use:
fab restore

update

The update command performs an upgrade from an update file provided by Wolfram. It can also be used to reinstall the same version. Unlike other Fabric commands, this one must be run with sudo:
sudo fab update
This starts an interactive session that allows you to select the update file to use.

Troubleshooting

This section covers common issues and what to do if you encounter them.

Troubleshooting Administrative Tasks

Configuration notebook button not responding

If the Install Configuration Files button appears to be hanging when using a cluster, open a terminal and confirm that passwordless SSH is working on each node. The hang may be due to a hidden prompt from SSH (e.g. for a host key mismatch or password prompt). Resolve these issues on the command line, then relaunch the configuration notebook.
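The passwordless SSH check can be scripted so that a hidden prompt shows up as a visible failure. This is a minimal sketch: check_nodes is a hypothetical helper and the node names in the usage line are placeholders. It uses the OpenSSH BatchMode option, which makes ssh fail instead of asking for a password or host key confirmation.

```shell
# Try a non-interactive SSH login to each node and report the result.
check_nodes() {
    status=0
    for node in "$@"; do
        # BatchMode=yes disables prompts; ConnectTimeout bounds unreachable hosts.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" true </dev/null; then
            echo "$node: ok"
        else
            echo "$node: FAILED"
            status=1
        fi
    done
    return "$status"
}

# Usage (placeholder hostnames):
#   check_nodes compute1.example.com compute2.example.com
```

For any node reported as FAILED, running ssh to that node interactively shows the prompt or error that needs to be resolved.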

Configuration notebook cannot be opened

If Wolfram Desktop cannot be opened because all licenses are in use on a single-node EPC, run fab stop_all in a terminal window to temporarily shut down EPC and free up kernel licenses. Alternatively, you can edit configvalues.json directly and use the command-line tools only.

Mouse not working over VNC

If mouse clicks stop working over a VNC client, the VNC server may need to be restarted. Run vncserver -list in a terminal to see how many servers are running and which X displays they are listening on. Use the vncserver -kill command to kill the server on the given X display. Run vncserver to start a new server, note the display number and try to connect to this new display over the VNC client. This shows an example of these steps:
[cloud@master ~]$ vncserver -list

TigerVNC server sessions:

X DISPLAY #     PROCESS ID
:2              15507
[cloud@master ~]$ vncserver -kill :2
Killing Xvnc process ID 15507
[cloud@master ~]$ vncserver

New 'master.wolframepc.example.com:2 (cloud)' desktop is master.wolframepc.example.com:2

Starting applications specified in /home/cloud/.vnc/xstartup
Log file is /home/cloud/.vnc/master.wolframepc.example.com:2.log

Troubleshooting User Actions

Cloud web app shows a 503 page

If the Apache httpd and haproxy services are working but there are no compute nodes responding (or more specifically, the Wolfram Cloud application at port 8080 is not responding on any compute nodes), the haproxy load balancer will respond with a 503 page. If you just restarted all services, you may see this page until the Wolfram Cloud application is ready on at least one compute node. Things to try include:
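One way to check whether the application is answering on a given compute node is to request port 8080 directly. This is a sketch under assumptions: app_responding is a hypothetical helper, it assumes the application speaks plain HTTP on that port, and it treats any HTTP response at all as "responding" without judging whether the application is fully initialized.

```shell
# Succeed if an HTTP response of any status comes back from port 8080 on the host.
app_responding() {
    host="$1"
    # curl reports 000 for %{http_code} when it could not get any HTTP response.
    code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "http://$host:8080/")
    [ -n "$code" ] && [ "$code" != "000" ]
}

# Usage (placeholder hostname):
#   app_responding compute1.example.com && echo "application is answering"
```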

Request timeouts

Requests to cloud objects that require Wolfram Language evaluations, including deployed APIFunction, FormFunction, Delayed and deployed-view notebooks, can fail with a 503 status (or other 5xx status) if a kernel is not available within a configurable timeout period, or if there are more than 10 waiting requests. This can happen through a combination of how many requests are made (the load on the cloud) and the time spent in kernel evaluations (the speed of your application code).
If such requests give a 503 status (or a status in the 503–599 range), try one of the following:
If all other options are exhausted and the load is not diminishing, it may be time to scale up and add compute nodes.

Evaluation timeouts

All user evaluations are subject to both time and memory limits. If a scheduled task, APIFunction or notebook is aborted, it may be tripping these limits. If the limits are too restrictive, they can be adjusted on slide 7, "Wolfram Engine Settings", of the configuration notebook under "Evaluation Limits".

Kernel initialization slow or not completing

If there are problems getting kernels, examine KernelInit.log in /www/tomcat/logs/wolfram, looking for the final entry for a kernel (beginning with "Total:"). If kernels are finishing more slowly than they had been, it could be that a request to Wolfram servers is not working correctly. Report this to Wolfram Support if the condition continues. If kernels are not finishing at all, check that the license is not expired by running wolfram at the command line and confirming it gives the In[1] prompt (use Quit[] to exit this).
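The most recent "Total:" entry can be pulled out of the log with grep. This is a minimal sketch: last_total is a hypothetical helper, and because the exact layout of KernelInit.log lines is not described here, it matches "Total:" anywhere on the line.

```shell
KERNEL_INIT_LOG="${KERNEL_INIT_LOG:-/www/tomcat/logs/wolfram/KernelInit.log}"

# Print the last line containing "Total:" from the given log file.
last_total() {
    grep 'Total:' "${1:-$KERNEL_INIT_LOG}" | tail -n 1
}

# Usage:
#   last_total
```

Comparing this entry over time gives a quick indication of whether kernel initialization is slowing down.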

Notebooks will not save

If a notebook being edited indicates it cannot be saved, confirm there is available storage on the /wolframcloud/userfiles partition on the master node, and that it is mounted and functioning on compute nodes.
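Both checks can be run from a terminal on each node. The following is a sketch only: use_percent and check_userfiles are hypothetical helpers, and it assumes the mountpoint utility from util-linux is installed.

```shell
USERFILES="${USERFILES:-/wolframcloud/userfiles}"

# Percentage of space used on the filesystem holding the given path.
use_percent() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Confirm the directory is a mount point and report how full it is.
check_userfiles() {
    dir="${1:-$USERFILES}"
    if ! mountpoint -q "$dir"; then
        echo "$dir is not a mount point" >&2
        return 1
    fi
    echo "$dir is mounted, $(use_percent "$dir")% used"
}

# Usage:
#   check_userfiles
```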