"AzureBatch" (Batch Computation Provider)

Details

Azure Batch is a cloud-based batch computation service that schedules containerized jobs across pools of managed virtual machine nodes running on the Microsoft Azure platform.
To configure the "AzureBatch" batch computation provider for use in the Wolfram Language, follow the instructions in the Set Up the Azure Batch Computation Provider workflow.
The "AzureBatch" batch computation provider packs jobs into pool nodes based on each job's vCPU count requirement. Multiple jobs can execute concurrently on a single node, with each job running in a dedicated container.
The "AzureBatch" batch computation provider stores input and output data and console logs for jobs in an Azure Blob Storage account linked to the Azure Batch account that the job runs in. Files are organized within Azure Blob Storage containers according to Microsoft's Batch File Conventions standard.
The "AzureBatch" batch computation provider supports Linux-based jobs only.

Environment Properties

The following properties are supported in a RemoteBatchSubmissionEnvironment object for the "AzureBatch" provider:
  • "PoolID" (required): ID of the Azure Batch pool to which jobs are submitted
  • The "PoolID" property is required to construct a valid RemoteBatchSubmissionEnvironment object.
    When first evaluating RemoteBatchSubmissionEnvironment["AzureBatch",…], a dialog will be presented to collect API credentials for the Azure Batch and Azure Blob Storage services. If "Save Connection" is checked in the dialog, the supplied credentials will be saved persistently and used automatically in future sessions.
    To disregard saved credentials and force the authentication dialog to display again, specify the environment setting "ServiceObject" → "New". »
    The Set Up the Azure Batch Computation Provider workflow provides instructions for creating all of the required environment resources in your Microsoft Azure subscription using an automated Azure Resource Manager template.
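    As a minimal sketch, constructing an environment might look like the following; the pool ID is a placeholder for one from your own Azure Batch account, and the first evaluation will prompt for credentials:

```wolfram
(* "poolwolfram" is a placeholder pool ID *)
env = RemoteBatchSubmissionEnvironment["AzureBatch", <|"PoolID" -> "poolwolfram"|>]
```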

    Job Settings

    The following settings are supported by the RemoteProviderSettings option when using the "AzureBatch" provider:
  • "ContainerImage": name/URL of the image to use for the job container
    "JobPriority" (default 0): priority of the job, between -1 and +1
    "VCPUCount" (default 1): integer number of vCPUs »
  • An Azure Batch pool executes "jobs", with each "job" containing one or more "tasks". The terminology used in RemoteBatchSubmit and RemoteBatchMapSubmit corresponds to this structure as follows:
  • RemoteBatchSubmit job ("Single" job): an Azure Batch job containing a single task
    RemoteBatchMapSubmit array job: an Azure Batch job containing multiple tasks
    RemoteBatchMapSubmit array child job: a task within an Azure Batch job
  • Azure Batch packs tasks into pool nodes based on each task's "slot" requirement. Multiple tasks can execute concurrently on a single node, with each task running in a dedicated container.
    The "AzureBatch" batch computation provider sets each task's slot requirement equal to the "VCPUCount" setting, with the assumption that the pool-level "task slots per node" setting is configured equal to the number of vCPUs in the pool's node VM size. For more information, see Microsoft's documentation on parallel task execution and pool node sizes.
    The value of the "VCPUCount" setting is used only for job scheduling and for setting $ProcessorCount and $DefaultKernels within the job container. It does not affect CPU scheduling or the amount of CPU resources available to the remote kernel process. »
    The value of the "JobPriority" setting ranges from -1 (lowest priority) to +1 (highest priority). Within a given pool, higher-priority jobs have scheduling precedence over lower-priority jobs. For more information, see Microsoft's documentation on job priorities.
    The value of the "ContainerImage" setting is a Docker-compatible image reference string. If not running GPU-based jobs, you can reduce the size of the image downloaded to pool nodes by changing "wolframresearch/wolframengine:cuda" to "wolframresearch/wolframengine:latest".
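    As one hedged sketch of these settings in use, assuming env is an "AzureBatch" environment created earlier, a submission with explicit (illustrative) settings might look like:

```wolfram
(* all setting values here are illustrative, not defaults *)
job = RemoteBatchSubmit[env, Integrate[1/(1 + x^4), x],
  RemoteProviderSettings -> <|
    "VCPUCount" -> 2,
    "JobPriority" -> 0.5,
    "ContainerImage" -> "wolframresearch/wolframengine:latest"
  |>]
```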

    Job Statuses

    The following are possible values of the "JobStatus" job property for a "Single" or "ArrayChild" job when using the "AzureBatch" provider, listed in the order through which a typical job will pass:
  • "Active": the job's task is queued and able to run, but has not yet been scheduled to a compute node
    "Running": the job's task is running on a compute node
    "Succeeded": the job's task has executed successfully and its output has been uploaded
    "Failed": the job's task's execution has failed
  • A job in the "Running" state may be in the process of downloading input files, evaluating job code or uploading output data.
    The following are typical values of the "JobStatus" job property for an "Array" job when using the "AzureBatch" provider:
  • "Active": the array job is running child job tasks or waiting for child job tasks to be submitted
    "Completed": all child job tasks have completed or been terminated
  • For more information, see Microsoft's documentation on task states and job states.

    Job Properties

    When using the "AzureBatch" provider, the following properties are available from "Single"-type job objects, in addition to the standard properties supported by RemoteBatchJobObject:
  • "JobExitCode": exit code returned by the kernel within the job container
    "JobLog": standard output console log from the job container
    "JobStandardErrorLog": standard error console log from the job container
    "JobStatusReason": string describing the reason for which the job is in its current state
    "ProviderJobID": unique identifier for the job in Azure Batch
    "ProviderTaskID": unique identifier for the job's task in Azure Batch
  • When using the "AzureBatch" provider, the following properties are available from "Array"-type job objects, in addition to the standard properties supported by RemoteBatchJobObject:
  • "ChildJobExitCodes": "JobExitCode" property of each array child job
    "ChildJobStatusReasons": "JobStatusReason" property of each array child job
    "JobStatusReason": string describing the reason for which the array job is in its current state
    "ProviderFirstTaskID": sequential numeric identifier of the first child job
    "ProviderJobID": unique identifier for the array job in Azure Batch
  • When using the "AzureBatch" provider, the following properties are available from "ArrayChild"-type job objects, in addition to the standard properties supported by RemoteBatchJobObject:
  • "JobExitCode": exit code returned by the kernel within the child job container
    "JobLog": standard output console log from the child job container
    "JobStandardErrorLog": standard error console log from the child job container
    "JobStatusReason": string describing the reason for which the child job is in its current state
    "ProviderJobID": unique identifier for the parent array job in Azure Batch
    "ProviderTaskID": unique identifier for the child job's task in Azure Batch
  • The meanings of some possible values of the "JobExitCode" property are listed on the reference page for Exit.
    The value of the "JobStatusReason" property is based on information supplied by Azure Batch and may be absent from a given job, depending on its current state.
    Azure Batch stores the standard output and standard error streams of a task's process separately, hence the separate "JobLog" and "JobStandardErrorLog" properties. In most circumstances, the "JobStandardErrorLog" property will be empty.
    The values of the "JobLog" and "JobStandardErrorLog" properties are typically retrieved from the job's compute node for running jobs and from Azure Blob Storage for completed jobs.
    The values of the "ProviderJobID" and "ProviderTaskID"/"ProviderFirstTaskID" properties correspond to job and task IDs, respectively, in Azure Batch.
    The task ID for a single job is typically the string "10000000". The task IDs for array child jobs are typically sequential numeric strings starting at "10000001".
    Stored job output data and logs remain in Azure Blob Storage until either manually deleted or automatically expired by the lifecycle management policy on the storage account.
    In addition to a job expression's return value available in the "EvaluationResult" property, extra output files exported to the directory ./JobOutput/ within a job container's working directory will be uploaded to the job's Azure Blob Storage container according to Microsoft's Batch File Conventions standard. However, such extra output files cannot be downloaded from within the Wolfram Language.

    Examples


    Basic Examples  (2)

    Create an "AzureBatch" RemoteBatchSubmissionEnvironment object, after configuring the "AzureBatch" batch computation provider as described in the Set Up the Azure Batch Computation Provider workflow:
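    A minimal sketch of this evaluation; the pool ID is a placeholder for one from your own Azure Batch account:

```wolfram
env = RemoteBatchSubmissionEnvironment["AzureBatch", <|"PoolID" -> "poolwolfram"|>]
```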

    Query the job's status:
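    Assuming job is a RemoteBatchJobObject returned by an earlier RemoteBatchSubmit evaluation (that submission does not survive in this text), the status can be queried as:

```wolfram
job["JobStatus"]
```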

    Query the job's status again, after it has completed:
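    Repeating the query on the same assumed job object after completion:

```wolfram
job["JobStatus"]
```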

    Download the job's output:
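    A sketch using RemoteBatchJobReceive on the assumed job object from the earlier submission:

```wolfram
RemoteBatchJobReceive[job]
```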

    Submit an array job to Azure Batch that uses GPU computation to perform inference with a pretrained neural net:
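    One hedged sketch of such a submission, assuming env is an "AzureBatch" environment whose pool uses a GPU-equipped VM size and imgs is a placeholder list of images:

```wolfram
(* net name is a real NetModel entry; imgs is a placeholder *)
net = NetModel["Wolfram ImageIdentify Net V1"];
arrayJob = RemoteBatchMapSubmit[env,
  net[#, TargetDevice -> "GPU"] &,
  imgs,
  RemoteProviderSettings -> <|"ContainerImage" -> "wolframresearch/wolframengine:cuda"|>]
```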

    Download the array job output:
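    For an array job, RemoteBatchJobReceive returns the result of each child job; arrayJob is the assumed object from the earlier submission:

```wolfram
RemoteBatchJobReceive[arrayJob]
```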

    Job Settings  (1)

    "VCPUCount"  (1)

    Instruct Azure Batch to schedule a submitted job to a node with at least four vCPUs available:
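    A sketch, assuming env is an "AzureBatch" environment created earlier:

```wolfram
job = RemoteBatchSubmit[env, $ProcessorCount,
  RemoteProviderSettings -> <|"VCPUCount" -> 4|>]
```

    Within the job container, $ProcessorCount reflects the requested vCPU count.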

    Job Properties  (2)

    "Single" Jobs  (1)

    Submit a batch job to Azure Batch using RemoteBatchSubmit:
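    A sketch of a simple submission, assuming env is an "AzureBatch" environment created earlier; the printed text will later appear in the job's console log:

```wolfram
job = RemoteBatchSubmit[env, (Print["Hello from Azure Batch"]; 2^100)]
```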

    Query the job's status and exit code:
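    Querying the assumed job object for both properties:

```wolfram
job["JobStatus"]
job["JobExitCode"]
```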

    Obtain the console output from the job container:
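    The standard output log of the assumed job object:

```wolfram
job["JobLog"]
```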

    "Array" and "ArrayChild" Jobs  (1)

    Submit an array job to Azure Batch using RemoteBatchMapSubmit:
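    A sketch of a small array submission, assuming env is an "AzureBatch" environment created earlier:

```wolfram
arrayJob = RemoteBatchMapSubmit[env, Prime, Range[5]]
```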

    Query the status of each child job:
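    One way to sketch this, assuming the standard "ChildJobs" property of RemoteBatchJobObject returns the child job objects of the assumed arrayJob:

```wolfram
#["JobStatus"] & /@ arrayJob["ChildJobs"]
```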

    Obtain a RemoteBatchJobObject expression representing the first child job:
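    Again assuming the standard "ChildJobs" property on the arrayJob object:

```wolfram
child = First[arrayJob["ChildJobs"]]
```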

    Query the child job's status and exit code:
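    Querying the assumed child job object:

```wolfram
child["JobStatus"]
child["JobExitCode"]
```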

    Obtain the console output from the child job container:
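    The standard output log of the assumed child job object:

```wolfram
child["JobLog"]
```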

    Properties & Relations  (1)

    If a job was terminated with RemoteBatchJobAbort, the value of the "JobStatusReason" property will indicate that the termination request originated from the Wolfram Language:
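    A sketch on an assumed running job object:

```wolfram
RemoteBatchJobAbort[job];
job["JobStatusReason"]
```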

    Possible Issues  (1)

    If "Save Connection" was previously checked in the Azure Batch authentication dialog, the supplied credentials will be saved persistently and used automatically in future sessions. Specify the environment property "ServiceObject" → "New" to disregard saved credentials and force the authentication dialog to display again:
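    A sketch of such an evaluation; the pool ID is a placeholder:

```wolfram
env = RemoteBatchSubmissionEnvironment["AzureBatch",
  <|"PoolID" -> "poolwolfram", "ServiceObject" -> "New"|>]
```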