Wolfram Cloud Theory of Operation

To operate and troubleshoot the Wolfram Cloud, it helps to understand how each of its major features works.
Editing Notebooks
When you open a notebook in the webpages environment view (the URL should have "/env/" in it, such as https://www.wolframcloud.com/env/aero101/homework1.nb), the system first checks permissions on it, as it does when handling any request for a cloud object. If you are the owner or have write permissions, you will be given an exclusive editing lock on that notebook. The notebook is loaded and parsed in the application to create a model of its contents, and the model is sent to the JavaScript-powered notebook editor in the browser. More about the environment view URL can be found in the CloudObjectURLType option reference page.
As you edit the notebook, or as the notebook changes in response to evaluations (e.g. Print output will add a new cell even while the evaluation is still running), edits are sent between the browser and server to keep everything in sync. Changes to the notebook are sent to a central thread on the compute node that writes changes to the actual notebook file. When you close a browser tab, or even hit refresh (which sends a close), a final write of any unsaved changes is triggered.
If you only have read (viewing) permission for the notebook, the interaction is much simpler: the notebook is still loaded and parsed into a model and sent to your browser for viewing, but nothing further happens because no changes are allowed.
Evaluations done in a notebook in the environment view are handled by a session (general) kernel, which is assigned exclusively to you. After some extended period of inactivity, the kernel will be terminated and any information it was holding will be lost. (Of course, any results shown in the notebook are still there, as are any files or cloud objects written to during the kernel session.)
Viewing a Deployed Notebook
Notebooks opened in the deployed view (the URL has "/obj/" in it, such as https://www.wolframcloud.com/obj/aero101/homework1.nb) have similarities and differences compared with the environment view. As with the environment view, or indeed any cloud object request, the system checks permissions first, but instead of "Write", the highest applicable capability is "Interact", which permits interaction with user interface controls. If a user opening a deployed view notebook does not have "Interact" permissions but does have "Read", the notebook will be displayed for reading, with UI controls disabled. In both cases, the notebook is loaded and parsed into a model, and sent to the browser for display. Even though there are no input cells where the user can enter arbitrary Wolfram Language code, the use of dynamic evaluation features requires a kernel, and for the deployed notebook this is a deployment (or "Public") kernel. In general, a deployment kernel is used for one quick evaluation and then discarded. While this allows the cloud to handle a higher number of viewers for published content, it can result in somewhat sluggish interactivity, so the system attempts to reuse the same kernel for subsequent requests from the same client. This strikes a balance between server capacity and responsiveness. More about the deployed view URL can be found in the CloudObjectURLType option reference page.
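The permission handling described for the two notebook views can be summarized in a small Python sketch. This is only an illustrative model, not the cloud's actual implementation: the function name, the returned mode strings and the "denied" outcome for a requester with no applicable capability are assumptions made for the example, and ownership is modeled simply as holding the "Write" capability.

```python
def resolve_view_mode(url_path, capabilities):
    """Return how a notebook request would be handled in this simplified model.

    url_path     -- path portion of the notebook URL
    capabilities -- set of capability names the requester holds on the object
    """
    if "/env/" in url_path:
        # Environment view: writers get the editor, readers get a static view.
        if "Write" in capabilities:
            return "edit (exclusive lock)"
        if "Read" in capabilities:
            return "view only"
        return "denied"  # assumption: no applicable capability means no access
    if "/obj/" in url_path:
        # Deployed view: "Interact" is the highest applicable capability.
        if "Interact" in capabilities:
            return "interactive"
        if "Read" in capabilities:
            return "read only (controls disabled)"
        return "denied"  # assumption, as above
    return "not a notebook view"


print(resolve_view_mode("/env/aero101/homework1.nb", {"Read", "Write"}))
print(resolve_view_mode("/obj/aero101/homework1.nb", {"Read"}))
```

Note that holding "Write" alone does not make a deployed view interactive in this model; only "Interact" does, which mirrors the description above.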
Serving an API Request
When the Wolfram Cloud gets a request for an APIFunction, as with all cloud object requests, it first checks its permissions. "Execute" is the capability needed to serve an API, and there is no fallback; if the requester does not have the "Execute" capability, the request is denied. If the request is permitted, a deployment (or "Public") kernel is used to load the cloud object and run it via GenerateHTTPResponse, which determines the response to send back. If the evaluation exceeds the time or memory limits, the request will be terminated. If request traffic is sufficiently heavy, all the configured deployment kernels may be busy handling requests, and requests that arrive during this time are placed into a queue. If the queue exceeds a configured length limit, new requests are rejected immediately. If a request spends longer than a configurable amount of time waiting to be serviced, it will also be rejected.
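The queueing behavior under heavy load can be illustrated with a minimal Python sketch. The class, its parameter names (kernels, max_queue, max_wait) and the status strings are hypothetical; the real limits are configuration settings of the cloud, and this model ignores permission checks and the time and memory limits on the evaluation itself. As a simplification, waiting requests that have timed out are only detected when a kernel becomes free.

```python
from collections import deque


class DeploymentQueue:
    """Toy model of request admission for a pool of deployment kernels."""

    def __init__(self, kernels, max_queue, max_wait):
        self.free_kernels = kernels   # kernels not currently serving a request
        self.max_queue = max_queue    # hypothetical queue length limit
        self.max_wait = max_wait      # hypothetical maximum wait, in seconds
        self.waiting = deque()        # (request_id, arrival_time) pairs

    def submit(self, request_id, now):
        """Handle a newly arrived request."""
        if self.free_kernels > 0:
            self.free_kernels -= 1
            return "running"
        if len(self.waiting) >= self.max_queue:
            return "rejected: queue full"
        self.waiting.append((request_id, now))
        return "queued"

    def release(self, now):
        """A kernel finished its request; hand it to the next live request.

        Returns (started_request_id_or_None, list_of_expired_request_ids).
        """
        expired = []
        while self.waiting:
            request_id, arrived = self.waiting.popleft()
            if now - arrived > self.max_wait:
                expired.append(request_id)  # waited too long: rejected
                continue
            return request_id, expired      # kernel goes straight to this request
        self.free_kernels += 1              # nothing waiting: kernel is idle again
        return None, expired


queue = DeploymentQueue(kernels=1, max_queue=1, max_wait=5.0)
print(queue.submit("a", now=0.0))  # running
print(queue.submit("b", now=1.0))  # queued
print(queue.submit("c", now=2.0))  # rejected: queue full
```

When troubleshooting rejected API requests, the two rejection paths in this model correspond to the two limits described above: a full queue rejects on arrival, while an overlong wait rejects a request that was already accepted into the queue.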
Background Tasks
The scheduler runs on each compute node, polling MySQL tables for information about what jobs to run next. It is possible that the combination of a large cluster with a very large number of configured jobs will cause degradation in MySQL performance. When the background task system picks up a job to run, it acquires a kernel from the service pool, and loads and evaluates the code in the cloud object. As with other kernel evaluations, these are subject to time and memory constraints. The background task is a little different from other evaluation scenarios since it runs "headlessly," that is, without any manner of user interface; the only tool available for debugging it is its log files. The task system maintains a log file for each task, in which the specific job starts and stops are recorded, along with error messages and console output.
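As a rough illustration of one polling pass and of why the per-task log is the main debugging tool, consider the following Python sketch. Everything here is a stand-in: the task dictionaries replace the MySQL tables, a Python callable replaces the code in the cloud object, an in-memory list replaces the log file, and the field names (next_run, interval, code) are invented for the example. Kernel acquisition and resource limits are not modeled.

```python
def run_due_tasks(tasks, now, logs):
    """Run every task that is due at time `now` and record what happened.

    tasks -- list of dicts with keys 'id', 'next_run', 'interval', 'code'
    logs  -- dict mapping task id to a list of log lines (one log per task)
    """
    for task in tasks:
        if task["next_run"] > now:
            continue  # not due yet
        log = logs.setdefault(task["id"], [])
        log.append(f"{now}: start")
        try:
            task["code"]()  # headless: no user interface to report errors to
        except Exception as exc:
            log.append(f"{now}: error: {exc}")
        log.append(f"{now}: stop")
        task["next_run"] = now + task["interval"]


logs = {}
tasks = [
    {"id": "nightly-report", "next_run": 0, "interval": 60, "code": lambda: None},
    {"id": "broken-job", "next_run": 0, "interval": 60, "code": lambda: 1 / 0},
]
run_due_tasks(tasks, now=10, logs=logs)
for line in logs["broken-job"]:
    print(line)
```

In this model, the failure of the second task is visible only in its log, which is the situation an administrator faces with a real background task.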
Managing Cloud Objects from the Wolfram Language
Most of the Wolfram Language functions that are related to the cloud, including CloudDeploy and CloudObjects, as well as any function that takes a cloud object as an argument and therefore has to talk to the cloud, make one or more calls to the Wolfram Cloud web app. These calls generally involve the MySQL database and often also the cloud object content stored on the NFS server. The requests are generally very lightweight, unless a large content payload is uploaded or downloaded, or the total number of objects being addressed is large.
Wolfram Language Kernel Lifecycle
In order to effectively configure and use the Wolfram Language in the Wolfram Cloud, it is important to understand the lifecycle of a Wolfram Language kernel process.
The Wolfram Cloud's kernel manager maintains pools of kernels, each of a fixed size. When the Tomcat web server starts on a compute node, the configured number of kernels is launched and initialized, ready to be assigned to a user. Any time a kernel process exits for any reason, the kernel manager launches and initializes a new one to take its place. During this initialization time, systemwide initialization files are loaded. If the initialization files are updated, they will not be loaded into already initialized kernels, but only loaded into kernels launched after the point of the update. The time to initialize a kernel can vary, but it is significant, on the order of a minute.
At the point when a kernel is needed, it is assigned and configured to run as the cloud object's owner. This means that Wolfram Language metadata and settings such as $CloudUserID and $HomeDirectory are assigned to be those for the assigned user. It is at this point that initialization files for the user are loaded. The kernel sandbox is also configured so that the Wolfram Language will only be able to operate on the files for that user, such as the user's home directory. Certain metadata variables, including $GeoLocationCountry, $TimeZone and $DateStringFormat, are localized based on the detected geolocation of the requester, typically calculated using a GeoIP database. If the request is made from a private IP address, as is often the case for private cloud installations, a default IP address is fed into the GeoIP database as a proxy for a default location. This default must be configured to a public IP address for the feature to work correctly, and it should be noted that various Wolfram Language features depend on settings such as $GeoLocationCountry being set sensibly.
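The fallback for private IP addresses can be sketched in Python with the standard ipaddress module. This is an illustrative model only: the function name and the way the default is passed in are assumptions, and the actual GeoIP lookup is not shown. The sketch also demonstrates why the configured default must itself be public, since a private default would give the GeoIP database nothing to resolve.

```python
import ipaddress


def ip_for_geolocation(request_ip, default_public_ip):
    """Choose which address to hand to a GeoIP lookup (hypothetical helper)."""
    # A private default defeats the purpose of the fallback, so refuse it.
    if ipaddress.ip_address(default_public_ip).is_private:
        raise ValueError("default IP address must be a public address")
    if ipaddress.ip_address(request_ip).is_private:
        return default_public_ip  # request came from inside a private network
    return request_ip


# 8.8.8.8 is used here purely as an example of a public address.
print(ip_for_geolocation("10.1.2.3", "8.8.8.8"))       # falls back to the default
print(ip_for_geolocation("93.184.216.34", "8.8.8.8"))  # public address is used as is
```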
Session and service kernels are killed once they are done being used. Deployment kernels are cleaned and returned to the pool for reuse. Deployment kernels will be killed once they have been used a configurable number of times.
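The lifecycle rules above (fixed-size pools, replacement of exited kernels, single use for session and service kernels, and bounded reuse for deployment kernels) can be modeled with a short Python sketch. The class and its parameter names are hypothetical, kernels are represented by plain dictionaries, and launching is treated as instantaneous even though real initialization takes on the order of a minute.

```python
import itertools


class KernelPool:
    """Toy model of one fixed-size kernel pool."""

    def __init__(self, kind, size, max_uses=1):
        self.kind = kind          # "session", "service" or "deployment"
        self.max_uses = max_uses  # hypothetical reuse limit (deployment only)
        self._ids = itertools.count(1)
        self.idle = [self._launch() for _ in range(size)]

    def _launch(self):
        # Stands in for launching a kernel and loading systemwide init files.
        return {"id": next(self._ids), "uses": 0}

    def acquire(self):
        """Take an idle kernel, or None if every kernel is busy."""
        return self.idle.pop(0) if self.idle else None

    def release(self, kernel):
        """Return a kernel after use, reusing or replacing it as appropriate."""
        kernel["uses"] += 1
        reusable = self.kind == "deployment" and kernel["uses"] < self.max_uses
        if reusable:
            self.idle.append(kernel)          # cleaned and returned to the pool
        else:
            self.idle.append(self._launch())  # killed; a fresh kernel replaces it


pool = KernelPool("deployment", size=1, max_uses=2)
first = pool.acquire()
pool.release(first)
second = pool.acquire()   # same kernel, reused once
pool.release(second)
third = pool.acquire()    # a freshly launched replacement
print(first["id"], second["id"], third["id"])
```

Because replacements are launched fresh, this model also reflects the point made earlier that updated initialization files only take effect in kernels launched after the update.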