Creating Job Definitions#

About#

When creating a new job to be distributed on Omniverse Farm, one of the first steps you may wish to take is creating a job definition for it.

Omniverse Farm job definitions act as the point of entry for the work to be executed, and provide information about the requirements and dependencies necessary for their operations. Using this information, the services bundled in Farm Agents can select the next task they are able to execute when querying the Farm Queue for pending tasks.

In the following sections, we will look at job definitions in greater detail, so that you have the information you need to start creating your own distributed jobs, whether they are implemented as:

Scripts or executables
Omniverse application Services

Job Definition Schema: System Executables#

Job definitions are nothing more than KIT files, which you should already be familiar with if you have previously created an extension for an Omniverse application. If you have not yet had the opportunity to get acquainted with the development of extensions, you may be interested in looking at some of the resources available on that topic to get started.

Note

Kit token expansion is not supported in Farm 2.x.

Let’s start with a simple example printing a mandatory “Hello Omniverse!” message, in order to provide an overview of what we will be describing in greater detail:

Linux

minimal-job-definition.kit#

# Standard KIT metadata about the package for the job, providing information about what the feature accomplishes so
# it can be made visible to Users in Omniverse applications:
[package]
title = "Minimal Omniverse Farm job definition"
description = "A simple job definition for an Omniverse Farm job, printing a welcoming message."
category = "jobs"
version = "1.0.0"
authors = ["Omniverse Team"]
keywords = ["job"]

# Schema for the job definition of a system command or executable:
[job.hello-omniverse]
# Type of the job. Using "base" makes it possible to run executable files:
job_type = "base"
# User-friendly display name for the job:
name = "simple-hello-omniverse-job"
# The command or application that will be executed by the job:
command = "echo"
# Arguments to supply to the command specified above:
args = ["Hello Omniverse!"]
# Capture information from `stdout` and `stderr` for the job's logs:
log_to_stdout = true

Windows

minimal-job-definition.kit#

# Standard KIT metadata about the package for the job, providing information about what the feature accomplishes so
# it can be made visible to Users in Omniverse applications:
[package]
title = "Minimal Omniverse Farm job definition"
description = "A simple job definition for an Omniverse Farm job, printing a welcoming message."
category = "jobs"
version = "1.0.0"
authors = ["Omniverse Team"]
keywords = ["job"]

# Schema for the job definition of a system command or executable:
[job.hello-omniverse]
# Type of the job. Using "base" makes it possible to run executable files:
job_type = "base"
# User-friendly display name for the job:
name = "simple-hello-omniverse-job"
# The command or application that will be executed by the job:
command = "cmd"
# Arguments to supply to the command specified above:
args = ["/c","echo","Hello Omniverse!"]
# Capture information from `stdout` and `stderr` for the job's logs:
log_to_stdout = true

Note

Normally, you would call your executable directly in the command parameter (for example, an Omniverse Kit application). On Windows, however, built-in terminal commands such as echo are not standalone executables, so they must be invoked using the cmd /c syntax.

As you may have noticed, we have included some comments and annotations in the file. For more details about the job definition properties, refer to the Job Definition Schema Reference.

As a best practice, we encourage you to provide documentation in the job definition, as it acts as the entry point for the work that will be executed. This not only makes it easier to maintain your work over time, but also makes it easier to share it with others so they can reuse the service you created, and build even larger workflows thanks to the fruit of your labor.

While this is a simplistic example for demonstration purposes, you could use the same approach for any platform-specific configuration of your job, such as:

Declaring environment variables
Enabling/disabling extensions
Setting default task arguments
etc.
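
As a rough sketch of the first and third items in that list, the job definition below declares an environment variable and a default task argument using the env and allowed_args properties described in the Schema Reference. The job key, script path, variable name, and argument are hypothetical placeholders rather than part of any shipped job:

```toml
# Hypothetical job definition: every name and path below is a placeholder.
[job.sample-configured-job]
job_type = "base"
name = "sample-configured-job"
# Hypothetical script, assumed to be available on the machine running the Agent:
command = "/opt/scripts/process.sh"
log_to_stdout = true

# Environment variables supplied to the command:
[job.sample-configured-job.env]
SAMPLE_OUTPUT_DIR = "/tmp/sample-output"

# Arguments that may differ for each task, along with their default values:
[job.sample-configured-job.allowed_args]
ratio = { arg = "--ratio", default = "0.5" }
```

A Windows variant of the same job would likely differ only in the command and in the paths it references.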

Adding Job Definitions#

Omniverse Farm can be deployed as a Standalone install (refer to the Farm standalone installation guide) or in a Kubernetes environment via Helm. How you can define and add job definitions varies slightly between them.

Note

In Kubernetes, there is additional support for supplying Kubernetes-specific properties under the capacity_requirements setting. Refer to the Job Definition Schema Reference for more details.

Standalone Deployments#

Job definitions can be stored in local directories or remote services, allowing you to choose the configuration that works best for your specific needs. Depending on how the controller and jobs services are configured, job definitions are managed either by working directly with the files contained in a local directory, or by using the appropriate endpoints of the jobs service or a Redis server.

The default Farm Standalone deployment uses the jobs service to manage job definitions in a configurable directory accessed by the Farm Job service.

This defaults to the following locations:

Windows: $HOME/AppData/Local/nvidia/nv-svc-farm/job-definitions
Linux: $HOME/.local/share/nv-svc-farm/job-definitions

The jobs service’s /queue/management/jobs/save endpoint can be used to upload new or modified job definitions, and the /queue/management/jobs/remove endpoint to remove them. The job_definition_upload.py Python script can be used to simplify uploading job definition kit files to the jobs service.

The default job definition directory can be changed via a setting in a toml file specified via the --config option.

farm-config.toml snippet#

[settings.nv.svc.farm.jobs]

store_args.new_job_definition_save_location = "/CUSTOM/JOB-DEFINITION-PATH"

Specify the farm-config.toml:

farm --config /PATH-TO/farm-config.toml

Refer to the Job Stores guide for more information.

Kubernetes Deployments#

For Kubernetes deployments, cluster access is required.

You will first need to retrieve the API key from the jobs service’s config map, and then use a custom job_definition_upload.py script to upload the job definitions.

To retrieve the jobs service’s API key, issue the following command:

kubectl get configmap omniverse-farm-jobs -o yaml -n <<farm namespace>> | grep api_key | head -n 1

The API key is unique per Farm instance and must be kept private.

The job_definition_upload.py script can be retrieved from NGC.

The script requires two Python dependencies, requests and toml, which must be installed before using it:

pip install requests
pip install toml

You are now ready to upload job definitions:

python job_definition_upload.py <Job Definition Filepath> --farm-url=<Omniverse Farm URL> --api-key=<API Key>

Here’s a quick usage example:

python /opt/scripts/job_definition_upload.py /home/foobar/df.kit --farm-url=http://my-awesome-farm.com --api-key="123shh-s3cr3t"

The job definition may take up to a minute to propagate to the various services in the cluster.

Note

To get a list of job definitions currently in Farm, the /queue/management/jobs/load endpoint can be utilized.

Job Definition Schema: Omniverse Services#

Now that you know how to define a simple job definition and launch a command on the system, let’s see how to launch an Omniverse application to start building larger workflows.

In this example, we will go one step beyond our earlier example and introduce a few additional properties of the job definition to let you create more complex workflows:

omniverse-application-job-definition.kit#

# Standard KIT metadata about the package for the job, providing information about what the feature accomplishes so
# it can be made visible to Users in Omniverse applications:
[package]
title = "Sample Omniverse job definition"
description = "Sample job definition showcasing how to launch Omniverse applications."
version = "1.0.0"
authors = ["Omniverse Team"]
category = "jobs"
keywords = ["job"]

# Schema for the job definition of an Omniverse application:
[job.sample-omniverse-application-job]
# Type of the job. Using "kit-service" makes it possible to execute services exposed by Omniverse applications:
job_type = "kit-service"
# User-friendly display name for the job:
name = "sample-omniverse-application-job"
# Set the path to the Composer shell script
command = "/opt/nvidia/omniverse/composer/nv_internal.usd_composer.kit.sh"
# List of arguments to provide to the Omniverse application when started:
args = [
    # Make sure the Omniverse application can be closed when the processing of the job has completed, and that the
    # notification asking if the USD stage from the active session should be saved prior to closing does not prevent
    # the application from shutting down:
    "--/app/file/ignoreUnsavedOnExit=true",
    # Make sure the Omniverse application is active upon launch, and that the notification prompt asking for User
    # input does not prevent the session from being interactive:
    "--/app/extensions/excluded/0='omni.kit.window.privacy'",
    # Add any additional setting required by the Omniverse application, or your own extensions:
    # [...]
]
# Path of the service endpoint where to route arguments in order to start the processing of the job:
task_function = "sample-processing-extension.run"
# Flag indicating whether to execute the Omniverse application in headless mode (i.e. the equivalent of supplying it
# with the `--no-window` command-line option):
headless = true
# Capture information from `stdout` and `stderr` for the job's logs:
log_to_stdout = true

# Supply a list of folders where extensions on which the job depends can be found:
[settings.app.exts.folders]
"++" = [
    "${job}/exts-sample-omniverse-application-job",
    # ...
]

# List of extensions on which the job depends in order to execute:
[dependencies]
"omni.services.farm.agent.runner" = {}
# ...

# When running the job, enable the following extensions:
[settings.app.exts]
enabled = [
    # Extension exposing a "run" endpoint, which will receive the arguments of the task as payload, and start the
    # job process:
    "sample-processing-extension",
    # ...
]

Fundamentally, jobs implemented as Omniverse application services declare a set of extensions which should be enabled by the application, and the path to the endpoint that one of them exposes in order to fulfill the task.

The process should be familiar to you if you have already created an Omniverse extension, as it follows the typical development workflow. For clarity, here are a few details about the example above, where we:

Provide configuration options to the Omniverse application, so it can launch in a state that will allow it to perform the work it will receive.
Specify the location of the extension(s) that we expect the Omniverse application to load for us.
Enable any extension we require from the Omniverse application, along with the one that will act as the entrypoint for incoming requests to kickstart the execution of the task.

This entrypoint extension is expected to expose an endpoint at the location defined by the task_function option of the schema. This endpoint, implemented using the Service stack, will be called by the Agent tasked with performing the job, which will supply the endpoint with any information it needs in order to execute the work.

A few additional notes about the layout of this job definition for Omniverse services:

We used Omniverse USD Composer for demonstration purposes in this sample. However, you are free to use any Kit application by supplying its startup command to the command property of the job definition.
For convenience, the headless flag can be toggled during development as a way of inspecting the service and following the progress of its operations. Once deployed in a production context, running the application in headless mode makes it both more performant and easier to scale, as batch workflows typically do not require a user interface to perform actions, which makes an entire desktop environment optional.
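
For instance, a development copy of the sample job might differ from the production one only in the value of its headless property. The fragment below is a sketch under that assumption; it repeats the properties of the sample above and changes nothing else:

```toml
# Development variant of the sample job (sketch): only `headless` differs.
[job.sample-omniverse-application-job]
job_type = "kit-service"
name = "sample-omniverse-application-job"
command = "/opt/nvidia/omniverse/composer/nv_internal.usd_composer.kit.sh"
args = [
    "--/app/file/ignoreUnsavedOnExit=true",
]
task_function = "sample-processing-extension.run"
# Show the application window while developing, so the operations performed
# by the service can be observed. Set back to `true` before deploying:
headless = false
log_to_stdout = true
```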

Schema Reference#

For reference, the following is a brief list of properties available for job definitions:

Property	Type	Description
`job_type`	`string`	Type of the job, can be either `base` or `kit-service`.
`name`	`string`	User-friendly name uniquely identifying the job.
`task_function`	`string`	Module to execute when specifying a `kit-service` job.
`command`	`string`	Application or command to be executed by the job.
`working_directory`	`string`	Directory where the `command` should be executed.
`success_return_codes`	`Array<int>`	List of return codes from the `command` that should be considered as successful executions.
`args`	`Array<string>`	List of arguments to supply to the `command`, identical for all instances of the job.
`allowed_args`	`Dict<string,Dict>`	Dictionary of arguments which may be unique to each execution of a job, including default values. Arguments can be defined as: [job.sample-job.allowed_args] source = { arg = "--source", default = "" } destination = { arg = "--destination", default = "" } ratio = { arg = "--ratio", default = "0.5" }
`env`	`Dict<string,string>`	Dictionary of environment variables to supply to the `command`.
`extension_paths`	`Array<string>`	List of extension paths.
`log_to_stdout`	`boolean`	Flag indicating whether to capture information from `stdout` and `stderr` in the task’s logs.
`headless`	`boolean`	Flag indicating whether the application should be run in headless mode.
`active`	`boolean`	Flag indicating whether the task is enabled.
`container`	`string`	Image location of a Docker container to execute.
`capacity_requirements`	`Dict<string,any>`	See Capacity Requirements Schema Reference below.
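
To illustrate how several of these properties combine, here is a sketch of a base job that runs from a specific directory and treats more than one return code as a success. The job key, script name, and directory are hypothetical, and the return codes are only illustrative values:

```toml
# Hypothetical job definition: names, paths, and return codes are placeholders.
[job.sample-convert-job]
job_type = "base"
name = "sample-convert-job"
command = "python"
# Arguments identical for every instance of the job:
args = ["convert.py"]
# Directory where the command should be executed:
working_directory = "/opt/scripts"
# Return codes that should be considered successful executions:
success_return_codes = [0, 2]
# Keep the job enabled:
active = true
# Capture `stdout` and `stderr` in the task's logs:
log_to_stdout = true
```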

Capacity Requirements Schema Reference (Kubernetes)#

The following capacity_requirements properties are available when Farm is deployed within a Kubernetes environment.

These properties are specific to the container-v1-core and podspec-v1-core specifications from Kubernetes version 1.24.

Two special properties, container_spec_field_overrides and pod_spec_field_overrides, are provided for specifying fields that may be introduced in future Kubernetes specs.

Container Core Properties#

Property	Type	Description
`container_spec_field_overrides`	`Dict<string,any>`	Special property that does not apply to any particular Kubernetes field. Instead, this can be used to inject fields that may be added in future Kubernetes releases. [job.sample-job.capacity_requirements.container_spec_field_overrides] futureKuberneteContainerCoreField = "foobar"
`env`	`Array<Dict<string,any>>`	List of environment variables to set in the job’s container pod env. [[job.sample-job.capacity_requirements.env]] name = "foo" value = "bar"
`env_from`	`Array<Dict<string,any>>`	List of sources to populate environment variables in the job’s container pod env from. [[job.sample-job.capacity_requirements.envFrom]] [job.sample-job.capacity_requirements.envFrom.configMapRef] name = "sample-config"
`image_pull_policy`	`string`	The image pull policy for the job’s container image. [job.sample-job.capacity_requirements] image_pull_policy = "Always"
`lifecycle`	`Dict<string,any>`	Specify the job’s container lifecycle. [job.sample-job.capacity_requirements.lifecycle.postStart.exec] command = [ "/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message" ] [job.sample-job.capacity_requirements.lifecycle.preStop.exec] command = [ "/bin/sh", "-c", "sleep 1" ]
`liveness_probe`	`Dict<string,any>`	Specify the job’s container pod liveness probe. [job.sample-job.capacity_requirements.liveness_probe] [job.sample-job.capacity_requirements.liveness_probe.httpGet] path = "/status" port = "http"
`ports`	`Array<Dict<string,any>>`	Specify the job’s container pod container ports. [[job.sample-job.capacity_requirements.ports]] name = "http" containerPort = 80 protocol = "TCP"
`resource_limits`	`Dict<string,any>`	Specify the job’s container pod resource limits. Refer to the Kubernetes documentation on resource units for acceptable values. [job.sample-job.capacity_requirements.resource_limits] cpu = 1 memory = "4096Mi" "nvidia.com/gpu" = 1
`readiness_probe`	`Dict<string,any>`	Specify the job’s container pod readiness probe. [job.sample-job.capacity_requirements.readiness_probe] [job.sample-job.capacity_requirements.readiness_probe.httpGet] path = "/status" port = "http"
`security_context`	`Dict<string,any>`	Specify the job’s container pod security context. [job.sample-job.capacity_requirements.security_context] runAsUser = 2000 allowPrivilegeEscalation = false
`startup_probe`	`Dict<string,any>`	Specify the job’s container pod startup probe. [job.sample-job.capacity_requirements.startup_probe] [job.sample-job.capacity_requirements.startup_probe.httpGet] path = "/status" port = "http"
`stdin`	`boolean`	Control whether the job’s container should allocate a buffer for stdin in the container runtime. [job.sample-job.capacity_requirements] stdin = true
`stdin_once`	`boolean`	Control whether the job’s container runtime should close the stdin channel after it has been opened by a single attach. [job.sample-job.capacity_requirements] stdin_once = false
`termination_message_path`	`string`	Path at which the file to which the container’s termination message will be written is mounted into the container’s filesystem. [job.sample-job.capacity_requirements] termination_message_path = "/dev/termination-log"
`termination_message_policy`	`string`	Indicate how the termination message should be populated. [job.sample-job.capacity_requirements] termination_message_policy = "File"
`tty`	`boolean`	Control whether the job’s container should allocate a TTY for itself; this also requires `stdin` to be true. [job.sample-job.capacity_requirements] tty = true
`volume_devices`	`Array<Dict<string,any>>`	Specify the job’s container pod volume devices. [[job.sample-job.capacity_requirements.volume_devices]] devicePath = "/myrawblockdevice" name = "blockDevicePvc"
`volume_mounts`	`Array<Dict<string,any>>`	Specify the job’s container pod volume mounts. [[job.sample-job.capacity_requirements.volume_mounts]] mountPath = "/root/.provider/" name = "creds"
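
Put together, several of the container properties above might appear in a single job definition as follows. This is a sketch assembled from the per-property examples in the table; the job key, command, and container image location are hypothetical:

```toml
# Hypothetical job definition: the job key, command, and image are placeholders.
[job.sample-container-job]
job_type = "base"
name = "sample-container-job"
command = "/opt/scripts/process.sh"
# Image location of the container to execute:
container = "registry.example.com/sample/image:1.0.0"
log_to_stdout = true

[job.sample-container-job.capacity_requirements]
image_pull_policy = "Always"

# Resource limits for the job's container pod:
[job.sample-container-job.capacity_requirements.resource_limits]
cpu = 1
memory = "4096Mi"
"nvidia.com/gpu" = 1

# Environment variables to set in the job's container pod:
[[job.sample-container-job.capacity_requirements.env]]
name = "foo"
value = "bar"
```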

Pod Spec Properties#

Property	Type	Description
`pod_spec_field_overrides`	`Dict<string,any>`	Special property that does not apply to any particular Kubernetes field. Instead, this can be used to inject fields that may be added in future Kubernetes releases. [job.sample-job.capacity_requirements.pod_spec_field_overrides] futureKubernetesPodSpecField = "foobar"
`active_deadline_seconds`	`integer`	Duration in seconds the pod may be active on the node relative to StartTime before the system will actively try to mark it failed and kill associated containers. [job.sample-job.capacity_requirements] active_deadline_seconds = 30
`affinity`	`Dict<string,any>`	Specify the job’s container pod affinity. [[job.sample-job.capacity_requirements.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms]] [[job.sample-job.capacity_requirements.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms.matchExpressions]] key = "name" operator = "In" values = [ "worker-node" ] [[job.sample-job.capacity_requirements.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution]] weight = 1 [[job.sample-job.capacity_requirements.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution.preference.matchExpressions]] key = "type" operator = "In" values = [ "01" ]
`automount_service_account_token`	`boolean`	Indicate whether a service account token should be automatically mounted. [job.sample-job.capacity_requirements] automount_service_account_token = true
`dns_config`	`Dict<string,any>`	Specifies the DNS parameters of the job’s container pod. [job.sample-job.capacity_requirements.dnsConfig] nameservers = [ "1.2.3.4" ] searches = [ "ns1.svc.cluster-domain.example", "my.dns.search.suffix" ] [[job.sample-job.capacity_requirements.dnsConfig.options]] name = "ndots" value = "2" [[job.sample-job.capacity_requirements.dnsConfig.options]] name = "edns0"
`dns_policy`	`string`	Set the DNS policy for the job’s container pod. [job.sample-job.capacity_requirements] dns_policy = "ClusterFirst"
`enable_service_links`	`boolean`	Indicates whether information about services should be injected into the pod’s environment variables. [job.sample-job.capacity_requirements] enable_service_links = true
`ephemeral_containers`	`Array<Dict<string,any>>`	List of ephemeral containers run in the job’s container pod.
`host_aliases`	`Array<Dict<string,any>>`	List of hosts and IPs that will be injected into the pod’s hosts file if specified. This is only valid for non-hostNetwork pods.
`host_IPC`	`boolean`	Use the host’s IPC namespace.
`host_network`	`boolean`	Request host networking for the job’s container pod.
`host_PID`	`boolean`	Use the host’s PID namespace.
`hostname`	`string`	Specifies the hostname of the pod.
`image_pull_secrets`	`Array<Dict<string,any>>`	List of references to secrets in the same namespace to use for pulling any of the images. [[job.sample-job.capacity_requirements.imagePullSecrets]] name = "registry-secret"
`init_containers`	`Array<Dict<string,any>>`	List of initialization containers.
`node_name`	`string`	Request to schedule this pod onto a specific node.
`node_selector`	`Dict<string,string>`	Selector which must be true for the pod to fit on a node. [job.sample-job.capacity_requirements.node_selector] "beta.kubernetes.io/instance-type" = "worker" "beta.kubernetes.io/os" = "linux"
`os`	`Dict<string,string>`	Specifies the OS of the containers in the pod.
`overhead`	`Dict<string,any>`	Resource overhead associated with running a pod for a given RuntimeClass.
`preemption_policy`	`string`	Policy for preempting pods with lower priority.
`priority`	`string`	Priority value.
`priority_class_name`	`string`	Indicate the pod’s priority class.
`readiness_gates`	`Array<Dict<string,any>>`	Pod’s readiness gates.
`runtime_class_name`	`string`	Set the pod’s runtime class name.
`scheduler_name`	`string`	Specific scheduler used to dispatch the pod.
`pod_security_context`	`Dict<string,any>`	Specify the job’s container pod security context. [job.sample-job.capacity_requirements.pod_security_context] runAsUser = 1000
`service_account`	`string`	Set the pod’s service account.
`service_account_name`	`string`	Name of the service account to use to run this pod.
`set_hostname_as_FQDN`	`boolean`	Configure the pod’s hostname as the pod’s FQDN.
`share_process_namespace`	`boolean`	Share a single process namespace between all of the containers in a pod.
`subdomain`	`string`	Specify the pod’s subdomain.
`termination_grace_period_seconds`	`integer`	Duration in seconds the pod needs to terminate gracefully.
`tolerations`	`Array<Dict<string,any>>`	Specify the job’s container pod tolerations. [[job.sample-job.capacity_requirements.tolerations]] key = "key1" operator = "Equal" value = "value1" effect = "NoSchedule"
`topology_spread_constraints`	`Array<Dict<string,any>>`	Topology domain constraints. Refer to the Kubernetes documentation for details.
`volumes`	`Array<Dict<string,any>>`	Specify the job’s container pod volumes. Refer to the Kubernetes volumes documentation for more examples and valid fields. The following is an example of mounting a config map. [[job.sample-job.capacity_requirements.volumes]] name = "creds" [job.sample-job.capacity_requirements.volumes.configMap] name = "credentials-cm"
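
As a sketch of how the pod-level properties relate to the container-level ones, the job definition below schedules its pod on Linux nodes, tolerates a taint, and mounts a config map by pairing a volumes entry with a volume_mounts entry of the same name. The values are taken from the per-property examples in the tables; the job key and command are hypothetical:

```toml
# Hypothetical job definition: the job key and command are placeholders.
[job.sample-scheduled-job]
job_type = "base"
name = "sample-scheduled-job"
command = "/opt/scripts/process.sh"
log_to_stdout = true

# Only schedule the pod on nodes matching these labels:
[job.sample-scheduled-job.capacity_requirements.node_selector]
"beta.kubernetes.io/os" = "linux"

# Allow the pod to be scheduled on nodes carrying this taint:
[[job.sample-scheduled-job.capacity_requirements.tolerations]]
key = "key1"
operator = "Equal"
value = "value1"
effect = "NoSchedule"

# Pod-level volume backed by a config map:
[[job.sample-scheduled-job.capacity_requirements.volumes]]
name = "creds"
[job.sample-scheduled-job.capacity_requirements.volumes.configMap]
name = "credentials-cm"

# Container-level mount referring to the volume above by name:
[[job.sample-scheduled-job.capacity_requirements.volume_mounts]]
mountPath = "/root/.provider/"
name = "creds"
```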
