Creating Job Definitions#

About#

When creating a new job to be distributed on Omniverse Farm, one of the first steps you may wish to take is creating a job definition for it.

Omniverse Farm job definitions act as the point of entry for the work to be executed, and provide information about the requirements and dependencies necessary for their operations. Using this information, the services bundled in Farm Agents are then able to select the next task they can execute when querying the Farm Queue about awaiting tasks.

In the following sections, we will look at job definitions in greater detail, so you will have the information you need to start creating your own distributed jobs, whether they are implemented as system executables or as Omniverse services.

Job Definition Schema: System Executables#

Job definitions are nothing more than KIT files you should already be familiar with if you have previously created an extension for an Omniverse application. If you have not yet had the opportunity to get acquainted with the development of extensions, you may be interested in looking at some of the resources available on that topic to get started.

Note

Kit token expansion is not supported in Farm 2.x.

Let’s start with a simple example printing a mandatory “Hello Omniverse!” message, in order to provide an overview of what we will be describing in greater detail:

minimal-job-definition.kit (Linux)#
# Standard KIT metadata about the package for the job, providing information about what the feature accomplishes so
# it can be made visible to Users in Omniverse applications:
[package]
title = "Minimal Omniverse Farm job definition"
description = "A simple job definition for an Omniverse Farm job, printing a welcoming message."
category = "jobs"
version = "1.0.0"
authors = ["Omniverse Team"]
keywords = ["job"]

# Schema for the job definition of a system command or executable:
[job.hello-omniverse]
# Type of the job. Using "base" makes it possible to run executable files:
job_type = "base"
# User-friendly display name for the job:
name = "simple-hello-omniverse-job"
# The command or application that will be executed by the job:
command = "echo"
# Arguments to supply to the command specified above:
args = ["Hello Omniverse!"]
# Capture information from `stdout` and `stderr` for the job's logs:
log_to_stdout = true
minimal-job-definition.kit (Windows)#
# Standard KIT metadata about the package for the job, providing information about what the feature accomplishes so
# it can be made visible to Users in Omniverse applications:
[package]
title = "Minimal Omniverse Farm job definition"
description = "A simple job definition for an Omniverse Farm job, printing a welcoming message."
category = "jobs"
version = "1.0.0"
authors = ["Omniverse Team"]
keywords = ["job"]

# Schema for the job definition of a system command or executable:
[job.hello-omniverse]
# Type of the job. Using "base" makes it possible to run executable files:
job_type = "base"
# User-friendly display name for the job:
name = "simple-hello-omniverse-job"
# The command or application that will be executed by the job:
command = "cmd"
# Arguments to supply to the command specified above:
args = ["/c","echo","Hello Omniverse!"]
# Capture information from `stdout` and `stderr` for the job's logs:
log_to_stdout = true

Note

Normally, you would call your executable directly in the command parameter, such as an Omniverse Kit application. On Windows, however, built-in terminal commands like echo are not standalone executables, so they must be invoked through the cmd /c syntax.
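The two variants of the job definition above differ only in their command and args values. As a minimal sketch of that difference, the hypothetical helper below (hello_command is not part of Farm) returns the pair to use for a given operating system name, as reported by Python's platform.system():

```python
import platform


def hello_command(system: str) -> tuple[str, list[str]]:
    """Return the (command, args) pair of the "Hello Omniverse!" job for an OS name."""
    if system == "Windows":
        # `echo` is a shell built-in on Windows, so it is routed through `cmd /c`.
        return "cmd", ["/c", "echo", "Hello Omniverse!"]
    # Elsewhere, `echo` is called directly, as in the first job definition.
    return "echo", ["Hello Omniverse!"]


command, args = hello_command(platform.system())
print(command, args)
```

The same pattern could be used when generating job definitions for agents running on different platforms.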

As you may have noticed, we have included some comments and annotations in the file. For more details about the job definition properties, refer to the Job Definition Schema Reference.

As a best practice, we encourage you to provide documentation in the job definition, as it acts as the entry point for the work that will be executed. This not only makes it easier to maintain your work over time, but also makes it easier to share it with others so they can reuse the service you created, and build even larger workflows thanks to the fruit of your labor.

While this is a simplistic example for demonstration purposes, you could envision using the same ability for any platform-specific configuration of your job, such as:

  • Declaring environment variables

  • Enabling/disabling extensions

  • Setting default task arguments

  • etc.

Adding Job Definitions#

Omniverse Farm can be deployed as a Standalone install (refer to the Farm standalone installation guide) or in a Kubernetes environment via Helm. How you can define and add job definitions varies slightly between them.

Note

In Kubernetes, there is additional support for supplying Kubernetes specific properties under the capacity_requirements setting. Refer to the Job Definition Schema Reference for more details.

Standalone Deployments#

Job definitions can be stored in local directories or remote services, allowing you to choose the configuration that works best for your specific needs. Depending on how the controller and jobs services are configured, job definitions are either managed directly by working with the files contained in a local directory, or by using the appropriate endpoints of the jobs service or a Redis server.

The default Farm Standalone deployment uses the jobs service to manage job definitions in a configurable directory accessed by the Farm Job service.

This defaults to the following locations:

  • Windows: $HOME/AppData/Local/nvidia/nv-svc-farm/job-definitions

  • Linux: $HOME/.local/share/nv-svc-farm/job-definitions
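As a small sketch, the default locations listed above can be resolved with pathlib. The function name is hypothetical, and only the two documented platforms are handled:

```python
import platform
from pathlib import Path


def default_job_definition_dir(system: str, home: Path) -> Path:
    """Return the default job definition directory documented for the given OS."""
    if system == "Windows":
        return home / "AppData" / "Local" / "nvidia" / "nv-svc-farm" / "job-definitions"
    if system == "Linux":
        return home / ".local" / "share" / "nv-svc-farm" / "job-definitions"
    raise ValueError(f"No default location documented for {system!r}")


# On a supported platform, resolve the directory for the current user:
if platform.system() in ("Windows", "Linux"):
    print(default_job_definition_dir(platform.system(), Path.home()))
```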

The jobs service’s /queue/management/jobs/save endpoint can be used to upload new or modified job definitions, and the /queue/management/jobs/remove endpoint to remove them. The job_definition_upload.py Python script can be used to simplify uploading job definition KIT files to the jobs service.

The default job definition directory can be changed via a setting in a TOML file specified via the --config option.

farm-config.toml snippet#
[settings.nv.svc.farm.jobs]

store_args.new_job_definition_save_location = "/CUSTOM/JOB-DEFINITION-PATH"

Specify the farm-config.toml file when launching Farm:

farm --config /PATH-TO/farm-config.toml

Refer to the Job Stores guide for more information.

Kubernetes Deployments#

For Kubernetes deployments, cluster access is required.

You will first need to retrieve the API key from the jobs service’s config map, then use the job_definition_upload.py script to upload the job definitions.

To retrieve the jobs service’s API key, issue the following command:

kubectl get configmap omniverse-farm-jobs -o yaml -n <<farm namespace>> | grep api_key | head -n 1

The API key is unique per Farm instance and must be kept private.

The job_definition_upload.py script can be retrieved from NGC.

The script requires two Python dependencies, requests and toml, which must be installed before using it:

pip install requests
pip install toml

You are now ready to upload job definitions:

python job_definition_upload.py <Job Definition Filepath> --farm-url=<Omniverse Farm URL> --api-key=<API Key>

Here’s a quick usage example:

python /opt/scripts/job_definition_upload.py /home/foobar/df.kit --farm-url=http://my-awesome-farm.com --api-key="123shh-s3cr3t"
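To avoid typing the API key on the command line, the same invocation can be assembled from a script. The following is a sketch under assumptions: build_upload_command is a hypothetical helper, and FARM_API_KEY is a hypothetical environment variable name chosen for this example. Only the positional argument and the two options shown above are used:

```python
import os
import sys


def build_upload_command(script: str, job_definition: str, farm_url: str, api_key: str) -> list[str]:
    """Build the argument list for invoking job_definition_upload.py."""
    return [
        sys.executable,  # run the script with the current Python interpreter
        script,
        job_definition,
        f"--farm-url={farm_url}",
        f"--api-key={api_key}",
    ]


# FARM_API_KEY is a hypothetical variable name; set it outside of version control.
api_key = os.environ.get("FARM_API_KEY", "")
command = build_upload_command(
    "/opt/scripts/job_definition_upload.py",
    "/home/foobar/df.kit",
    "http://my-awesome-farm.com",
    api_key,
)
# The command could then be launched with: subprocess.run(command, check=True)
```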

The job definition may take up to about 1 minute to propagate to the various services in the cluster.

Note

To get a list of job definitions currently in Farm, the /queue/management/jobs/load endpoint can be utilized.
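Since propagation can take up to about a minute, an automated workflow may want to poll the list of job definitions before submitting tasks. The sketch below keeps the listing call abstract: fetch_names is a caller-supplied function (for example, one querying the load endpoint), because the response format of that endpoint is not described here. The timeout and interval defaults are illustrative choices:

```python
import time
from typing import Callable, Iterable


def wait_for_job_definition(
    name: str,
    fetch_names: Callable[[], Iterable[str]],
    timeout: float = 60.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `fetch_names` until `name` is listed or `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        if name in fetch_names():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Demonstration with a fake listing in which the job appears on the third poll.
_listings = iter([[], [], ["simple-hello-omniverse-job"]])
found = wait_for_job_definition(
    "simple-hello-omniverse-job",
    fetch_names=lambda: next(_listings),
    sleep=lambda seconds: None,  # skip real waiting in the demonstration
)
print(found)
```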

Job Definition Schema: Omniverse Services#

Now that you know how to define a simple job definition and launch a command on the system, let’s see how to launch an Omniverse application to start building larger workflows.

In this example, we will go one step beyond our earlier example and introduce a few additional properties of the job definition to let you create more complex workflows:

omniverse-application-job-definition.kit#
# Standard KIT metadata about the package for the job, providing information about what the feature accomplishes so
# it can be made visible to Users in Omniverse applications:
[package]
title = "Sample Omniverse job definition"
description = "Sample job definition showcasing how to launch Omniverse applications."
version = "1.0.0"
authors = ["Omniverse Team"]
category = "jobs"
keywords = ["job"]

# Schema for the job definition of an Omniverse application:
[job.sample-omniverse-application-job]
# Type of the job. Using "kit-service" makes it possible to execute services exposed by Omniverse applications:
job_type = "kit-service"
# User-friendly display name for the job:
name = "sample-omniverse-application-job"
# Set the path to the Composer shell script
command = "/opt/nvidia/omniverse/composer/nv_internal.usd_composer.kit.sh"
# List of arguments to provide to the Omniverse application when started:
args = [
    # Make sure the Omniverse application can be closed when the processing of the job has completed, and that the
    # notification asking if the USD stage from the active session should be saved prior to closing does not prevent
    # the application from shutting down:
    "--/app/file/ignoreUnsavedOnExit=true",
    # Make sure the Omniverse application is active upon launch, and that the notification prompt asking for User
    # inputs is not preventing the session from being interactive:
    "--/app/extensions/excluded/0='omni.kit.window.privacy'",
    # Add any additional setting required by the Omniverse application, or your own extensions:
    # [...]
]
# Path of the service endpoint where to route arguments in order to start the processing of the job:
task_function = "sample-processing-extension.run"
# Flag indicating whether to execute the Omniverse application in headless mode (i.e. the equivalent of supplying it
# with the `--no-window` command-line option):
headless = true
# Capture information from `stdout` and `stderr` for the job's logs:
log_to_stdout = true

# Supply a list of folders where extensions on which the job depends can be found:
[settings.app.exts.folders]
"++" = [
    "${job}/exts-sample-omniverse-application-job",
    # ...
]

# List of extensions on which the job depends in order to execute:
[dependencies]
"omni.services.farm.agent.runner" = {}
# ...

# When running the job, enable the following extensions:
[settings.app.exts]
enabled = [
    # Extension exposing a "run" endpoint, which will receive the arguments of the task as payload, and start the
    # job process:
    "sample-processing-extension",
    # ...
]

Fundamentally, jobs implemented as Omniverse application services declare a set of extensions which should be enabled by the application, and the path to the endpoint that one of them exposes in order to fulfill the task.

The process should be familiar to you if you have already created an Omniverse extension, as it follows the typical development workflow. For clarity, here are a few details about the example above, where we:

  1. Provide configuration options to the Omniverse application, so it can launch in a state that will allow it to perform the work it will receive.

  2. Specify the location of the extension(s) that we expect the Omniverse application to load for us.

  3. Enable any extension we require from the Omniverse application, along with the one that will act as the entrypoint for incoming requests to kickstart the execution of the task.

This entrypoint extension is expected to expose an endpoint at the location defined by the task_function option of the schema. This endpoint, implemented using the Service stack, will be called by the Agent tasked with performing the job, which will supply the endpoint with any information it needs in order to execute the work.

A few additional notes about the layout of this job definition for Omniverse services:

  • We used Omniverse USD Composer for demonstration purposes in this sample. However, you are free to use any Kit application by supplying its startup command to the command property of the job definition.

  • For convenience, the headless flag can be disabled during development to display the application window and inspect the progress of the operations performed by the service. Once deployed in a production context, running the application in headless mode makes it both more performant and easier to scale: batch workflows typically do not require a user interface to perform actions, which makes an entire desktop environment optional.

Schema Reference#

For reference, the following is a brief list of properties available for job definitions:

| Property | Type | Description |
|---|---|---|
| job_type | string | Type of the job, can be either base or kit-service. |
| name | string | User-friendly name uniquely identifying the job. |
| task_function | string | Module to execute when specifying a kit-service. |
| command | string | Application or command to be executed by the job. |
| working_directory | string | Directory where the command should be executed. |
| success_return_codes | Array<int> | List of return codes from the command that should be considered as successful executions. |
| args | Array<string> | List of arguments to supply to the command, identical for all instances of the job. |
| allowed_args | Dict<string,Dict> | Dictionary of arguments which may be unique to each execution of a job, including default values. An example follows this table. |
| env | Dict<string,string> | Dictionary of environment variables to supply to the command. |
| extension_paths | Array<string> | List of extension paths. |
| log_to_stdout | boolean | Flag indicating whether to capture information from stdout and stderr in the task’s logs. |
| headless | boolean | Flag indicating whether the application should be run in headless mode. |
| active | boolean | Flag indicating whether the task is enabled. |
| container | string | Image location of a Docker container to execute. |
| capacity_requirements | Dict<string,any> | See Capacity Requirements Schema Reference below. |

Arguments for allowed_args can be defined as:

[job.sample-job.allowed_args]
source      = { arg = "--source",      default = "" }
destination = { arg = "--destination", default = "" }
ratio       = { arg = "--ratio",       default = "0.5" }
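To illustrate the role of allowed_args, the sketch below merges per-task values with the declared defaults and rejects undeclared names. It is an illustration only: how Farm itself assembles the final command line is not described here, and resolve_args is a hypothetical helper that places each flag and its value as separate arguments:

```python
ALLOWED_ARGS = {
    "source": {"arg": "--source", "default": ""},
    "destination": {"arg": "--destination", "default": ""},
    "ratio": {"arg": "--ratio", "default": "0.5"},
}


def resolve_args(allowed_args: dict[str, dict], task_args: dict[str, str]) -> list[str]:
    """Merge task-specific values with defaults, in declaration order."""
    unknown = set(task_args) - set(allowed_args)
    if unknown:
        raise ValueError(f"Arguments not declared in allowed_args: {sorted(unknown)}")
    resolved: list[str] = []
    for name, spec in allowed_args.items():
        # Fall back to the declared default when the task does not set a value.
        value = task_args.get(name, spec["default"])
        resolved += [spec["arg"], str(value)]
    return resolved


print(resolve_args(ALLOWED_ARGS, {"source": "in.usd", "ratio": "0.25"}))
```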

Capacity Requirements Schema Reference (Kubernetes)#

The following contains a list of capacity_requirements properties available if deployed within a Kubernetes environment.

The following properties are specific to the container-v1-core and podspec-v1-core from Kubernetes version 1.24.

Two special properties, container_spec_field_overrides and pod_spec_field_overrides, are provided for specifying fields that may come in future Kubernetes specs.
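Most property names in the tables of this section are snake_case spellings of the camelCase field names used by Kubernetes. The sketch below illustrates that naming correspondence only; it is not Farm's implementation, and properties such as resource_limits, pod_security_context, and the two *_field_overrides properties do not follow this simple rule:

```python
def to_kubernetes_field(name: str) -> str:
    """Convert a snake_case property name to its camelCase Kubernetes spelling."""
    head, *rest = name.split("_")
    # Upper-case only the first letter of each part so acronyms such as IPC survive.
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


for prop in ("image_pull_policy", "host_IPC", "set_hostname_as_FQDN"):
    print(prop, "->", to_kubernetes_field(prop))
```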

Container Core Properties#

Property

Type

Description

container_spec_field_overrides

Dict<string,any>

Special property that does not apply to any particular Kubernetes field. Instead, this can be used to inject fields that may be added in future Kubernetes releases.

[job.sample-job.capacity_requirements.container_spec_field_overrides]
futureKuberneteContainerCoreField = "foobar"

env

Array<Dict<string,any>>

List of environment variables to set in the job’s container pod.

[[job.sample-job.capacity_requirements.env]]
name = "foo"
value = "bar"

env_from

Array<Dict<string,any>>

List of sources from which to populate environment variables in the job’s container pod.

[[job.sample-job.capacity_requirements.env_from]]
[job.sample-job.capacity_requirements.env_from.configMapRef]
name = "sample-config"

image_pull_policy

string

The image pull policy for the job’s container image.

[job.sample-job.capacity_requirements]
image_pull_policy = "Always"

lifecycle

Dict<string,any>

Specify the job’s container lifecycle.

[job.sample-job.capacity_requirements.lifecycle.postStart.exec]
command = [
 "/bin/sh",
 "-c",
 "echo Hello from the postStart handler > /usr/share/message"
]

[job.sample-job.capacity_requirements.lifecycle.preStop.exec]
command = [
 "/bin/sh",
 "-c",
 "sleep 1"
]

liveness_probe

Dict<string,any>

Specify the job’s container pod liveness probe.

[job.sample-job.capacity_requirements.liveness_probe]
  [job.sample-job.capacity_requirements.liveness_probe.httpGet]
  path = "/status"
  port = "http"

ports

Array<Dict<string,any>>

Specify the job’s container pod container ports.

[[job.sample-job.capacity_requirements.ports]]
name = "http"
containerPort = 80
protocol = "TCP"

resource_limits

Dict<string,any>

Specify the job’s container pod resource limits. Refer to the Kubernetes resource units documentation for acceptable units.

[job.sample-job.capacity_requirements.resource_limits]
cpu = 1
memory = "4096Mi"
"nvidia.com/gpu" = 1

readiness_probe

Dict<string,any>

Specify the job’s container pod readiness probe.

[job.sample-job.capacity_requirements.readiness_probe]
  [job.sample-job.capacity_requirements.readiness_probe.httpGet]
  path = "/status"
  port = "http"

security_context

Dict<string,any>

Specify the job’s container pod security context.

[job.sample-job.capacity_requirements.security_context]
runAsUser = 2000
allowPrivilegeEscalation = false

startup_probe

Dict<string,any>

Specify the job’s container pod startup probe.

[job.sample-job.capacity_requirements.startup_probe]
  [job.sample-job.capacity_requirements.startup_probe.httpGet]
  path = "/status"
  port = "http"

stdin

boolean

Control whether the job’s container should allocate a buffer for stdin in the container runtime.

[job.sample-job.capacity_requirements]
stdin = true

stdin_once

boolean

Control whether the job’s container runtime should close the stdin channel after it has been opened by a single attach.

[job.sample-job.capacity_requirements]
stdin_once = false

termination_message_path

string

Path at which the file to which the container’s termination message will be written is mounted into the container’s filesystem.

[job.sample-job.capacity_requirements]
termination_message_path = "/dev/termination-log"

termination_message_policy

string

Indicate how the termination message should be populated.

[job.sample-job.capacity_requirements]
termination_message_policy = "File"

tty

boolean

Control whether the job’s container should allocate a TTY for itself. This also requires stdin to be true.

[job.sample-job.capacity_requirements]
tty = true

volume_devices

Array<Dict<string,any>>

Specify the job’s container pod volume devices.

[[job.sample-job.capacity_requirements.volume_devices]]
devicePath = "/myrawblockdevice"
name = "blockDevicePvc"

volume_mounts

Array<Dict<string,any>>

Specify the job’s container pod volume mounts.

[[job.sample-job.capacity_requirements.volume_mounts]]
mountPath = "/root/.provider/"
name = "creds"
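The resource_limits example above expresses memory as "4096Mi". As a simplified sketch of how such Kubernetes quantities translate to bytes, the hypothetical helper below handles integer values with the binary (Ki, Mi, Gi, ...) and decimal (k, M, G, ...) suffixes only; fractional values and exponent notation are not covered:

```python
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18}


def memory_to_bytes(quantity: str) -> int:
    """Convert an integer memory quantity such as "4096Mi" to a number of bytes."""
    for suffix, factor in {**_BINARY, **_DECIMAL}.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    # No suffix: the quantity is already expressed in bytes.
    return int(quantity)


print(memory_to_bytes("4096Mi"))
```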

Pod Spec Properties#

Property

Type

Description

pod_spec_field_overrides

Dict<string,any>

Special property that does not apply to any particular Kubernetes field. Instead, this can be used to inject fields that may be added in future Kubernetes releases.

[job.sample-job.capacity_requirements.pod_spec_field_overrides]
futureKubernetesPodSpecField = "foobar"

active_deadline_seconds

integer

Duration in seconds the pod may be active on the node relative to StartTime before the system will actively try to mark it failed and kill associated containers.

[job.sample-job.capacity_requirements]
active_deadline_seconds = 30

affinity

Dict<string,any>

Specify the job’s container pod affinity.

[[job.sample-job.capacity_requirements.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms]]
[[job.sample-job.capacity_requirements.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms.matchExpressions]]
key = "name"
operator = "In"
values = [ "worker-node" ]

[[job.sample-job.capacity_requirements.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution]]
weight = 1

[[job.sample-job.capacity_requirements.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution.preference.matchExpressions]]
key = "type"
operator = "In"
values = [ "01" ]

automount_service_account_token

boolean

Indicate whether a service account token should be automatically mounted.

[job.sample-job.capacity_requirements]
automount_service_account_token = false

dns_config

Dict<string,any>

Specifies the DNS parameters of the job’s container pod.

[job.sample-job.capacity_requirements.dns_config]
nameservers = [ "1.2.3.4" ]
searches = [ "ns1.svc.cluster-domain.example", "my.dns.search.suffix" ]

[[job.sample-job.capacity_requirements.dns_config.options]]
name = "ndots"
value = "2"

[[job.sample-job.capacity_requirements.dns_config.options]]
name = "edns0"

dns_policy

string

Set the DNS policy for the job’s container pod.

[job.sample-job.capacity_requirements]
dns_policy = "ClusterFirst"

enable_service_links

boolean

Indicates whether information about services should be injected into the pod’s environment variables.

[job.sample-job.capacity_requirements]
enable_service_links = true

ephemeral_containers

Array<Dict<string,any>>

List of ephemeral containers run in the job’s container pod.

host_aliases

Array<Dict<string,any>>

List of hosts and IPs that will be injected into the pod’s hosts file if specified. This is only valid for non-hostNetwork pods.

host_IPC

boolean

Use the host’s IPC namespace.

host_network

boolean

Host networking requested for the job’s container pod.

host_PID

boolean

Use the host’s PID namespace.

hostname

string

Specifies the hostname of the pod.

image_pull_secrets

Array<Dict<string,any>>

List of references to secrets in the same namespace to use for pulling any of the images.

[[job.sample-job.capacity_requirements.image_pull_secrets]]
name = "registry-secret"

init_containers

Array<Dict<string,any>>

List of initialization containers.

node_name

string

Request to schedule this pod onto a specific node.

node_selector

Dict<string,string>

Selector which must be true for the pod to fit on a node.

[job.sample-job.capacity_requirements.node_selector]
"beta.kubernetes.io/instance-type" = "worker"
"beta.kubernetes.io/os" = "linux"

os

Dict<string,string>

Specifies the OS of the containers in the pod.

overhead

Dict<string,any>

Resource overhead associated with running a pod for a given RuntimeClass.

preemption_policy

string

Policy for preempting pods with lower priority.

priority

string

Priority value of the pod.

priority_class_name

string

Indicate the pod’s priority class name.

readiness_gates

Array<Dict<string,any>>

Pod’s readiness gates.

runtime_class_name

string

Set the pod’s runtime class name.

scheduler_name

string

Name of the specific scheduler used to dispatch the pod.

pod_security_context

Dict<string,any>

Specify the pod security context of the job’s container pod.

[job.sample-job.capacity_requirements.pod_security_context]
runAsUser = 1000

service_account

string

Set the pod’s service account.

service_account_name

string

Name of the service account to use to run this pod.

set_hostname_as_FQDN

boolean

Configure the pod’s hostname as the pod’s FQDN.

share_process_namespace

boolean

Share a single process namespace between all of the containers in a pod.

subdomain

string

Specify the pod’s subdomain.

termination_grace_period_seconds

integer

Duration in seconds the pod needs to terminate gracefully.

tolerations

Array<Dict<string,any>>

Specify the job’s container pod tolerations.

[[job.sample-job.capacity_requirements.tolerations]]
key = "key1"
operator = "Equal"
value = "value1"
effect = "NoSchedule"

topology_spread_constraints

Array<Dict<string,any>>

Constraints describing how the pod should be spread across topology domains.

volumes

Array<Dict<string,any>>

Specify the job’s container pod volumes. Refer to the Kubernetes volumes documentation for more examples and valid fields. The following is an example of mounting a config map.

[[job.sample-job.capacity_requirements.volumes]]
name = "creds"
  [job.sample-job.capacity_requirements.volumes.configMap]
  name = "credentials-cm"

Additional Resources#