Administrative Tasks and Troubleshooting
The cloud account is the Linux user account used for general administrative tasks. The Linux UI desktop automatically logs in to this account. This account has sudo access.
The tomcat account owns the Tomcat web server process, which runs the web application and consequently owns all the Wolfram Engine (wolfram and WolframKernel) and background Wolfram user interface (Mathematica) processes.
The apache account owns the Apache HTTP Server process (httpd), which serves static content and runs the authentication system.
The Wolfram Cloud expects this line to remain there for various cloud tools to function correctly. Otherwise, the bash initialization files can be customized as desired.
One of the things the epc-bashinit.sh script does is to run a script that ensures that an ssh-agent process is running and sets the environment variables appropriately so that SSH can get keys from it automatically. When using Wolfram Enterprise Private Cloud (EPC) as a cluster, the cloud admin still needs to provision the master node with the private key and arrange to run ssh-add to provide it to the SSH agent.
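The provisioning step on the master node can be sketched as follows; this is a sketch only, and the key path is an assumption to be replaced with your actual cluster key:

```shell
# Ensure an ssh-agent is reachable, then load the cluster private key.
# The key path below is an assumption -- substitute your own.
KEY="$HOME/.ssh/id_rsa"

# ssh-add -l exits with status 2 when no agent can be contacted.
ssh-add -l >/dev/null 2>&1
if [ $? -eq 2 ]; then
    eval "$(ssh-agent -s)" >/dev/null
fi

# Load the private key so compute nodes can be reached without a password.
if [ -f "$KEY" ]; then
    ssh-add "$KEY"
fi
```

On a provisioned cluster, ssh-add -l should then list the key's fingerprint.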
The updater directory holds a software component specific to EPC for installing and updating the Wolfram Cloud, as well as providing administration tools.
The configuration directory holds configuration data as well as logs from the configuration tools. The files and directories contained here are:
- configvalues.json — the configuration data file that holds settings from PrivateCloudConfiguration.nb as well as any other settings that have been customized
- prevconfigvalues.json — a backup of configvalues.json taken when using the Generate Configuration Files button on the final slide of the configuration notebook
The /wolframcloud directory contains key parts of the Wolfram Cloud environment. This section describes them.
- backups — the directory where the backup operation places its result, and where the restore operation looks for a backup file
- configuration — the directory where generated configuration files are placed before installing them; these files can be inspected prior to installation
- userfiles — the network-shared directory managed by the Wolfram Cloud where user cloud files are stored
The standard Apache access and error log files are contained in directories named for the domain. The primary directory is the one beginning with "www" (this may differ if you have customized the default subdomain), e.g. /www/logs/www.wolfram-epc.example.com.
This is the primary log file directory for the Wolfram Cloud, as it contains logs for the Wolfram Cloud server-side web application, as well as some of the Wolfram Engine logs. Logs in this directory are rotated on a daily basis at midnight, e.g. CloudPlatform.log would be renamed to CloudPlatform.log.2020-12-31.log at midnight on New Year's Eve. Some commonly used files in this directory include:
- CloudPlatform.log — the main log file for the application, which contains informational and error logs across most of the cloud features
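As a quick triage step, recent errors can be pulled from CloudPlatform.log. The directory path below is an assumption based on the KernelInit.log location mentioned later in this document and may differ on your installation:

```shell
# Assumed log directory (matches the KernelInit.log path used elsewhere).
LOGDIR="${LOGDIR:-/www/tomcat/logs/wolfram}"
LOG="$LOGDIR/CloudPlatform.log"

if [ -f "$LOG" ]; then
    # Show the 20 most recent lines mentioning errors or exceptions.
    grep -iE 'error|exception' "$LOG" | tail -n 20
else
    echo "No $LOG on this machine"
fi
```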
Lower-level Tomcat logs are located here, so if the Wolfram Cloud web app does not appear to be starting or appears to have stopped, this is where information can be found. Log files here are always named with the date, even for the current day. Useful files here include:
- catalina.yyyy-mm-dd.log — logs from Tomcat, including errors encountered in starting the Wolfram Cloud web app (named ROOT in the logs)
- localhost_access.yyyy-mm-dd.log — entries for each HTTP request, which can provide basic information about who is accessing what resources
The Wolfram Cloud provides a number of administrative commands using fab, the Fabric remote execution tool. A Fabric command is run by changing to the updater directory and using the fab command followed by its command name and any arguments. For example, this runs the service_status command with no arguments:

fab service_status
If a command takes arguments, they are specified by adding :arg=value immediately after the command name (no space), with commas separating multiple arguments. This runs the service_status command for just the tomcat service:
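A sketch of that syntax, assuming the service-selection argument is named service (the actual argument name can be confirmed with fab --list):

```shell
# Hypothetical argument name "service" -- confirm with `fab --list`.
fab service_status:service=tomcat
```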
Note that the update command must be run with sudo, since the script needs elevated permissions to update installed files:
sudo fab update
The configure command applies settings in configvalues.json to the Wolfram Cloud. It does this by applying the settings to templates of configuration files, resulting in final concrete configuration files stored in /wolframcloud/configuration, and then installs the updated configuration files in their respective locations. The updated configurations will not take effect until the Wolfram Cloud services are restarted:

fab configure
The restart_all command restarts Wolfram Cloud services by first stopping them all and then starting them. It is equivalent to using stop_all followed by start_all:

fab restart_all
If new initialization code is installed, you will need to restart the Tomcat service in order for cloud kernels to reinitialize.
The backup command starts an interactive session that allows you to select which data to back up. The backup command makes a backup onto the same machine, so if there is insufficient disk storage, the operation may fail and the backup may need to be done over the network, to pull data onto another machine with sufficient storage:

fab backup
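If local disk is insufficient, one way to pull a completed backup onto another machine is over SSH with rsync. This is a sketch: the host name follows the example host used elsewhere in this document and is an assumption, as is the backups path:

```shell
# Run on a DIFFERENT machine with enough free storage.
# Host name is an assumption -- use your cloud's master node.
SRC="cloud@master.wolframepc.example.com:/wolframcloud/backups/"
DEST="./epc-backups/"

mkdir -p "$DEST"
if command -v rsync >/dev/null 2>&1; then
    # --dry-run previews what would transfer; drop it to actually copy.
    rsync -av --dry-run "$SRC" "$DEST" || echo "transfer failed; check SSH access"
else
    echo "rsync not installed on this machine"
fi
```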
The restore command starts an interactive session that allows you to select which backup file to use:

fab restore
The update command performs an upgrade from an update file provided by Wolfram. It can also be used to reinstall the same version. Unlike other Fabric commands, this one must be run as sudo:
sudo fab update
If the Install Configuration Files button appears to be hanging when using a cluster, open a terminal and confirm that passwordless SSH is working on each node. The hang may be due to a hidden prompt from SSH (e.g. for a host key mismatch or password prompt). Resolve these issues on the command line, then relaunch the configuration notebook.
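One way to surface a hidden prompt is to attempt a non-interactive connection to each node. The node names below are placeholders:

```shell
# Node names are placeholders -- substitute your compute/master nodes.
checked=0
for node in compute1 compute2; do
    # BatchMode refuses interactive prompts, so a password or host-key
    # prompt becomes an immediate failure instead of a silent hang.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" true 2>/dev/null; then
        echo "$node: passwordless SSH OK"
    else
        echo "$node: passwordless SSH NOT working"
    fi
    checked=$((checked + 1))
done
```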
If Wolfram Desktop cannot be opened because all licenses are in use on a single-node EPC, run fab stop_all in a terminal window to temporarily shut down EPC and free up kernel licenses. Alternatively, you can edit configvalues.json directly and use the command-line tools only.
If mouse clicks stop working in a VNC client, the VNC server may need to be restarted. Run vncserver -list in a terminal to see how many servers are running and which X displays they are listening on. Use vncserver -kill to kill the server on a given X display. Run vncserver to start a new server, note the new display number and connect to that display from the VNC client. This shows an example of these steps:
[cloud@master ~]$ vncserver -list
TigerVNC server sessions:
X DISPLAY #     PROCESS ID
:2              15507
[cloud@master ~]$ vncserver -kill :2
Killing Xvnc process ID 15507
[cloud@master ~]$ vncserver
New 'master.wolframepc.example.com:2 (cloud)' desktop is master.wolframepc.example.com:2
Starting applications specified in /home/cloud/.vnc/xstartup
Log file is /home/cloud/.vnc/master.wolframepc.example.com:2.log
If the Apache httpd and haproxy services are working but there are no compute nodes responding (or more specifically, the Wolfram Cloud application at port 8080 is not responding on any compute nodes), the haproxy load balancer will respond with a 503 page. If you just restarted all services, you may see this page until the Wolfram Cloud application is ready on at least one compute node. Things to try include:
- Checking basic vital signs on compute nodes: CPU load, memory in use, storage capacity and use, networking load
- Checking if the Wolfram Cloud web app is running; does ps -u tomcat --forest show the expected number of Wolfram kernel processes?
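The checks above can be scripted. This sketch probes the local compute node; the port-8080 endpoint is the Wolfram Cloud web application mentioned above:

```shell
# Count Wolfram kernel processes owned by the tomcat account.
kernels=$(ps -u tomcat -o comm= 2>/dev/null | grep -c WolframKernel)
echo "WolframKernel processes: $kernels"

# Probe the Wolfram Cloud web app on port 8080 ("000" means no answer).
status=000
if command -v curl >/dev/null 2>&1; then
    status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 http://localhost:8080/ 2>/dev/null)
    [ -n "$status" ] || status=000
fi
echo "HTTP status on :8080: $status"
```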
Requests to cloud objects that require Wolfram Language evaluations, including deployed APIFunction, FormFunction, Delayed and deployed-view notebooks, can fail with a 503 status (or other 5xx status) if a kernel is not available within a configurable timeout period, or if there are more than 10 waiting requests. This can happen through a combination of how many requests are made (the load on the cloud) and the time spent in kernel evaluations (the speed of your application code).
- Increase the DeploymentRequestQueueSize setting in the configuration notebook (the default is 10)
- Increase the KernelPoolAcquireTimeLimit property in configvalues.json (its value is in milliseconds and defaults to 30000, i.e. 30 seconds)
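Before changing either setting, the current values can be checked in configvalues.json. The path below is an assumption, since configvalues.json lives in the updater's configuration directory described earlier; point CFG at its actual location:

```shell
# Path is an assumption -- set CFG to your configvalues.json location.
CFG="${CFG:-/wolframcloud/updater/configuration/configvalues.json}"

if [ -f "$CFG" ]; then
    # Show the current queue-size and kernel-acquisition settings, if set.
    grep -E 'DeploymentRequestQueueSize|KernelPoolAcquireTimeLimit' "$CFG"
else
    echo "configvalues.json not found; set CFG to its location"
fi
```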
If all other options are exhausted and the load is not diminishing, it may be time to scale up and add compute nodes.
All user evaluations are subject to both time and memory limits. If a scheduled task, APIFunction or notebook is aborted, it may be tripping these limits. If the limits are too restrictive, these can be configured on slide 7, "Wolfram Engine Settings", of the configuration notebook under "Evaluation Limits".
If there are problems getting kernels, examine KernelInit.log in /www/tomcat/logs/wolfram, looking for the final entry for a kernel (beginning with "Total:"). If kernels are finishing slower than they had been, it could be that a request to Wolfram servers is not working correctly. Report this to Wolfram Support if the condition continues. If kernels are not finishing at all, check that the license is not expired by running wolfram at the command line and confirming that it gives the In prompt (use Quit to exit).
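These two checks can be sketched as shell commands. Sending Quit[] on stdin exits the kernel, and the timeout guard is an added safety measure, not part of the documented procedure:

```shell
# Inspect the last few kernel-init summaries ("Total:" marks the final
# entry for each kernel).
LOG=/www/tomcat/logs/wolfram/KernelInit.log
if [ -f "$LOG" ]; then
    grep 'Total:' "$LOG" | tail -n 5
else
    echo "KernelInit.log not found at $LOG"
fi

# License check: a licensed kernel reaches the In[1]:= prompt.
licensed=unknown
if command -v wolfram >/dev/null 2>&1; then
    if echo 'Quit[]' | timeout 60 wolfram | grep -q 'In\[1\]'; then
        licensed=yes
    else
        licensed=no
    fi
fi
echo "license check: $licensed"
```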