Wolfram Cloud Setup
This section describes how to set up Wolfram Enterprise Private Cloud (EPC). This document will walk through each step and link to workflow pages for some of the specific steps. You can see the guide to all EPC workflows here.
This section examines each of these setup areas in turn. The decisions for each should be made in consultation with Wolfram to ensure their suitability for your intended application.
Hosting and Hypervisor Options
You can host the Wolfram Cloud either on your own hardware or in Amazon Web Services (AWS). It is possible to have a Wolfram-hosted cloud through AWS. In this case, all installation and configuration, including setup, network access and machine specs, are handled by Wolfram EPC support. Additional hosting fees apply.
If you host the Wolfram Cloud on your own hardware, you will need to decide which hypervisor to use. The supported hypervisors are VMware, VirtualBox and Kernel-based Virtual Machine (KVM). However, due to the wide variety of KVM configurations in circulation, Wolfram cannot provide instructions for every scenario; direct any questions about your own KVM setup to the Technical Support team.
Cluster Planning
Determining the cluster size (how many nodes) and the hardware specifications of each node (what resources each node has) can be a complex decision, and this section only aims to give some generic advice. This decision is mainly a function of determining the capacity needs of the cluster, based on requirements and expected load, combined with cost–benefit considerations. Part of the planning involves benchmarking, running the Wolfram Language code and notebook features you plan to use on similar hardware to determine things such as wall clock time, CPU time and memory use. The following are the minimum hardware requirements for a single node:
An EPC comes with a base level of eight kernels, suitable for a single node. Here are some use cases and how their usage fundamentals drive capacity planning:
- Users with session notebooks (editing and evaluating) need one session kernel for the duration of their work session
- Web-based deployments such as APIs and forms use a deployment kernel for each request; benchmark the evaluations
- Background tasks such as ScheduledTask, AutoRefreshed and DocumentGenerator use a service kernel; benchmark the evaluation and multiply by the frequency
For on-demand usage of deployed cloud objects, you will want to consider the expected load patterns, and there are mathematical tools to arrive at design points. As a working example, say an investment firm wants an EPC to serve forms that perform a complex calculation for analysts. It is estimated the calculation will be needed approximately three thousand times during the eight-hour work day, averaging around six requests per minute (3000/480 ≈ 6.25). Benchmarking the complex calculation shows it takes half a minute to compute its result. One performance goal you could have is that 99% of the 480 minutes in the work day have enough kernels to handle all requests. You can estimate this by simulating one million random minutes with a Poisson distribution based on the average request rate, and asking how many requests fall within the 99th percentile:
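A sketch of that estimate might look like the following (variable names are illustrative; applying Quantile to the distribution directly gives the same answer as the simulation):

```wolfram
(* ~3000 requests over a 480-minute day, i.e. about 6.25 per minute *)
requestsPerMinute = 3000/480.;

(* simulate one million random minutes of Poisson request arrivals *)
sample = RandomVariate[PoissonDistribution[requestsPerMinute], 10^6];

(* the 99th percentile of requests arriving in a minute *)
Quantile[sample, 0.99]
(* 13 *)
```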
This tells us 13 kernels will suffice for this request rate to achieve 99% likelihood that all requests have a kernel. If a standard server allocated to the cluster has enough RAM and CPU to host eight kernels, this translates to needing two compute nodes plus the master node to run services for a total cluster size of three servers.
However, this does not use the information about how long the form takes to compute. The ErlangC function can show the likelihood of experiencing a wait, given the number of processes serving requests and the ratio of the request rate to the response rate (how many requests per unit time could be handled by each server if it was 100% utilized):
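The Erlang C formula can be written down directly in the Wolfram Language; this sketch defines a helper following the description above (the function name and argument order are assumptions here):

```wolfram
(* probability that a request waits, with m kernels and offered load u =
   (request rate)/(response rate); standard Erlang C formula *)
ErlangC[m_, u_] :=
 (u^m/m!) (m/(m - u))/(Sum[u^k/k!, {k, 0, m - 1}] + (u^m/m!) (m/(m - u)))

(* 6.25 requests per minute; each takes 0.5 minute, so one kernel can serve
   2 requests per minute, giving an offered load of 6.25/2 = 3.125 *)
ErlangC[13, 6.25/2.]
(* ≈ 2.5*10^-5, i.e. about a 0.0025% chance of waiting *)
```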
This shows a 0.0025% chance that an analyst will experience at least some waiting time. If you would rather base your performance goal on the chance of a wait, you could calculate a table of wait probabilities for different numbers of kernels:
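For example, a sketch of such a table (the Erlang C definition is repeated here so the snippet is self-contained):

```wolfram
ErlangC[m_, u_] :=
 (u^m/m!) (m/(m - u))/(Sum[u^k/k!, {k, 0, m - 1}] + (u^m/m!) (m/(m - u)))

(* wait probability for 8 through 16 kernels at an offered load of 3.125 *)
TableForm[
 Table[{m, ErlangC[m, 6.25/2.]}, {m, 8, 16}],
 TableHeadings -> {None, {"kernels", "P(wait)"}}]
```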
Now that you have a quantitative basis to evaluate availability, you can also explore what-if scenarios for using hardware that is more or less powerful, or to see if the code could be optimized to run 10% faster, and so on.
Calculations like these would be needed for all the various use cases EPC needs to arrive at a final capacity estimate, possibly with some reserve capacity built in for future needs.
More information on queueing theory for capacity planning can be found in Devendra Kapadia's blog post, "The Mathematics of Queues".
Pre-installation
After acquiring the hardware but before installing the software, a number of things need to be arranged:
- Generate TLS/SSL certificates for the cloud base (www.wolfram-epc.example.com) and master host if there is more than one node in the cluster (e.g. master.wolfram-epc.example.com)
- Install the hypervisor on each VM host in the cluster, if not using a third-party cloud provider (e.g. AWS)
Once the cluster hardware is running the selected hypervisor, or your third-party cloud provider nodes are online, you are ready to create the virtual machines from either the OVA file or the AMI (for AWS). The following instructions cover each scenario and should be repeated for each node in your cluster:
- EPC can be hosted on a Kernel-based Virtual Machine (KVM). However, due to the variety of KVMs, Wolfram cannot provide instructions for every scenario. Users should work with the Technical Support team directly for KVM installations.
Once the VMs are created, contact Technical Support for login assistance to access the cluster nodes (the VMs you just created).
Next, place the SSL certificates and keys in the /wolframcloud/ssl directory on the master node. These will be installed into their final location during the configuration step.
Finally, if your EPC uses more than a single node, create and provision SSH keys so the cloud account on the master node can reach the other nodes over SSH without a password. If you do not already have a keypair you can create one using ssh-keygen. Once the keys are created, install the private key (typically named id_rsa) into the cloud user's account on the master node and the public key into the cloud account on the other nodes. The EPC code will automatically start an ssh-agent on the master node, but you may need to add your key using the ssh-add command before using the configuration notebook.
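The key provisioning described above might look like the following (the paths and node hostname are illustrative, not EPC requirements; the agent is started explicitly here only so the example is self-contained):

```shell
# Ensure the cloud user's .ssh directory exists on the master node
mkdir -p "$HOME/.ssh"

# Generate a keypair for the cloud user (no passphrase here; adjust to policy)
ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa" -q

# Install the public key into the cloud account on each other node, e.g.:
#   ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" cloud@node2.wolfram-epc.example.com

# Load the key into the ssh-agent before using the configuration notebook
eval "$(ssh-agent -s)" > /dev/null
ssh-add "$HOME/.ssh/id_rsa"
```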
In order to configure the cloud for the first time, you will access the graphical desktop on the master node and use the configuration notebook, as shown in the Configure Wolfram Enterprise Private Cloud workflow.
This workflow shows the basic steps, but does not cover what settings to use. Instead, the configuration notebook itself has documentation for each setting. While the configuration notebook covers everything for a typical initial setup, there is a configuration file, /home/cloud/configuration/configvalues.json, that holds everything set in the configuration notebook, and is the place to make additional settings. These additional settings are documented in a later section. Note you can save the progress in the notebook and come back to it later, without applying the changes. Feel free to reach out to Technical Support during the process of choosing the configuration, because it is a critical step.
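As an illustrative sketch only, the file mirrors settings from the configuration notebook as key/value pairs; the values below are placeholders, the exact structure may differ in your version, and the real file will contain many more entries:

```json
{
  "CloudBaseDomain": "wolfram-epc.example.com",
  "SendMailHost": "mail.example.com",
  "AllowedEmailDomains": ["wolfram-epc.example.com"]
}
```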
Required Settings
While most settings have a default value, there are a few settings in the configuration notebook that must be provided. These settings are highlighted with a rectangular placeholder area in the notebook.
CloudBaseDomain
While your cloud installation is generally known by its $CloudBase, the cloud base domain is the hostname setting from which many other hostnames and URLs, including $CloudBase, are derived. Because only you know how name routing is planned on your network, you must provide this core hostname setting.
SendMailHost
Many features within the Wolfram Cloud use email to communicate (e.g. password-reset emails, scheduled task notifications and CloudShare notifications), and this setting lets the cloud forward those emails to a mail server you provide. There is no sensible default, as only you know which mail server you will use.
DatabasePassword
Changing Configuration
Generally, the process for changing configuration is no different from setting it on initial install. Any setting, including the CloudBaseDomain, can be changed and reapplied.
Once the configuration is applied and the services restarted, you can begin testing the new installation. As you use the cloud, you may find things to reconfigure. You can go back to the configuration notebook (or the configvalues.json file), change settings, and restart services.
First, go to the landing page at the $CloudBase. If the CloudBaseDomain setting is wolfram-epc.example.com, this URL will be https://www.wolfram-epc.example.com. If this works, networking is generally functional, the Apache web service is up and at least one compute node is running. Also check whether the browser reports the page as secure, which indicates the certificates are installed and were issued by a trusted authority.
Next, click the Sign in button. If this page returns a login page, it means the authentication system is functional. In a new installation, there are no accounts yet, so click the link that invites you to create an account. The default settings of the Wolfram Cloud only allow accounts to be created with an email address matching the CloudBaseDomain. If you want to expand the set of domains allowed for user accounts, go back to the configuration step and use the "AllowedEmailDomains" setting to expand this list. Once an account is created, you should be logged in to the web app.
Once logged in to the Wolfram Cloud web app, create a new notebook and type a simple input such as 2+2, followed by Shift+Enter to evaluate it. If you get an output cell, the Wolfram Engine is working as well. To run the self-test, evaluate the following in the cloud notebook:
This should run for a moment or two and return a result that indicates a 100% success rate. If there are failures, these can be reported to Wolfram Support. The self-test is useful after changing configuration of any kind.
If you will use APIs or forms, you can test this functionality by creating a simple deployment and using it. For example, deploy a simple APIFunction:
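A minimal sketch might be (the permissions setting is an assumption so the URL can be opened without signing in):

```wolfram
(* deploy an API that returns a fixed greeting *)
obj = CloudDeploy[APIFunction[{}, "Hello!" &], Permissions -> "Public"]
```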
Click the URL from the resulting CloudObject, and it should open a browser page with the "Hello!" greeting. You may want to remove the deployment after testing:
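Assuming the deployment's CloudObject was saved in a variable, say obj, cleanup is a single call:

```wolfram
DeleteObject[obj]  (* removes the deployed cloud object *)
```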
If you will use background tasks, create a simple task and run it. First, deploy a simple ScheduledTask:
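A minimal sketch might be (the task body here is illustrative; None as the schedule specification requests no automatic runs):

```wolfram
(* a task that records the current time in a cloud object when it runs *)
task = CloudDeploy[ScheduledTask[CloudPut[Now, "task-last-run"], None]]
```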
This task was specified to have no schedule, so it will only run on demand. You can make the task run:
The task should be picked up for execution almost immediately, though it can take up to five minutes depending on what else the task system is doing. To check its status, evaluate the following:
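Assuming the deployed task's CloudObject is in a variable such as task, one way to inspect it is with ScheduledTaskInformation, which returns status fields for the task:

```wolfram
ScheduledTaskInformation[task]
```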
The "LastRunDate" field will be None if the task has not run, but it will be set to a date/time if it has run. If it has run, find the "Log" field and click the URL to see the task's log file, which will show the output for each run, among other things.
If you will connect to the cloud from Wolfram desktop applications such as Wolfram Desktop and Mathematica, open the application, set $CloudBase to point to your new cloud and then attempt to connect:
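For example (the hostname follows the CloudBaseDomain used earlier in this section; the account name is a placeholder):

```wolfram
$CloudBase = "https://www.wolfram-epc.example.com";
CloudConnect["analyst@wolfram-epc.example.com"]  (* prompts for the password *)
```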
If the connection is successful, you can use any cloud function in the Wolfram Language. For example, evaluate an expression on EPC using CloudEvaluate:
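For example, a quick check that evaluation happens remotely:

```wolfram
CloudEvaluate[$Version]  (* returns the version string of the cloud's kernel *)
```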
When a new version of the Wolfram Cloud is available, you will download an update .tar file. You can find specific instructions for upgrading here:
After upgrading, you can follow the same general steps for testing a new cloud install, though you will already have an account created, and may have test deployments as well unless you deleted them.
If you find the need to rebuild a Wolfram Cloud node, or even perform an upgrade, you can accomplish this by first making a backup, installing the Wolfram Cloud as if it is on a new node, then performing a restore. The backup contains the configuration settings, user files and database content that comprise the Wolfram Cloud's data. See the documentation for backup and restore tasks for more information.
RLink
- Step 1: in the configuration notebook, slide 8, set "JLinkUseSandbox" to False
- Step 2: set "JLinkAdditionalReadDirectories" to include the directory where the R client is installed—for example, "JLinkAdditionalReadDirectories" -> {...,"/usr/lib64/R",...}
- Step 3: load the package by evaluating Needs["RLink"]
- Step 4: evaluate RLinkResourcesInstall[] (if you do not have these resources yet)
- Step 5: evaluate InstallR[]
Python through ExternalEvaluate
- Step 1: install the Wolfram Client Library for Python
- Step 3: configure Python
- Step 4: test Python from a notebook using ExternalEvaluate
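A minimal check from a cloud notebook might be:

```wolfram
ExternalEvaluate["Python", "1 + 1"]
(* 2 *)
```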