Portal Sample Metrics
Overview
The Omniverse on DGX Cloud Portal Sample includes built-in OpenTelemetry (OTel) metrics export for session monitoring and observability. This guide covers the exported metrics, the collector setup, and how to observe the metrics in Azure Monitor.
Metrics Exported
The Omniverse on DGX Cloud Portal Sample exports the following OpenTelemetry session metrics:
sessions.active.count (UpDownCounter)
  Description: Current number of active streaming sessions
  Use Case: Real-time capacity monitoring, concurrent user tracking
sessions.start.count (Counter)
  Description: Total number of sessions started
  Use Case: Usage analytics, growth tracking
sessions.end.count (Counter)
  Description: Total number of sessions ended
  Use Case: Completion rate analysis, session lifecycle tracking
sessions.duration (Histogram)
  Description: Session duration in seconds with histogram buckets
  Use Case: Performance analysis, user engagement metrics
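To make the relationship between the four instruments concrete, here is a minimal in-memory sketch of how they might move over a session's lifecycle. It is a toy model, not the Portal Sample's implementation: the `SessionMetrics` class and the `DURATION_BUCKETS` boundaries are hypothetical, and the real instruments are created through the OpenTelemetry SDK.

```python
from bisect import bisect_left
from dataclasses import dataclass, field

# Hypothetical bucket upper bounds in seconds; the Portal Sample's actual
# histogram boundaries are not stated in this guide.
DURATION_BUCKETS = (60, 300, 900, 3600)


@dataclass
class SessionMetrics:
    """Toy model of the four session instruments."""

    active: int = 0   # sessions.active.count (UpDownCounter: rises and falls)
    started: int = 0  # sessions.start.count (Counter: only increases)
    ended: int = 0    # sessions.end.count (Counter: only increases)
    # sessions.duration (Histogram): one slot per bucket plus an overflow slot
    duration_buckets: list = field(
        default_factory=lambda: [0] * (len(DURATION_BUCKETS) + 1)
    )

    def record_start(self) -> None:
        self.started += 1
        self.active += 1

    def record_end(self, duration_seconds: float) -> None:
        self.ended += 1
        self.active -= 1
        # bisect_left treats each boundary as an inclusive upper bound
        self.duration_buckets[bisect_left(DURATION_BUCKETS, duration_seconds)] += 1


metrics = SessionMetrics()
metrics.record_start()
metrics.record_start()
metrics.record_end(duration_seconds=120.0)
print(metrics.active, metrics.started, metrics.ended, metrics.duration_buckets)
```

In this model the active count always equals starts minus ends, which is why an UpDownCounter is the natural fit for sessions.active.count while the other two counts only ever grow.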
Dimensional Data
Each metric includes the following attributes for filtering and analysis:
session.id - Unique session identifier
session.username - User name
session.app - Name of the application being streamed
session.user - User ID
nvcf.function_id - NVIDIA Cloud Function (NVCF) ID
nvcf.function_version_id - NVCF function version
session.duration.seconds - Session duration in seconds (end events only)
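As an illustration of how these attributes fit together, the following sketch assembles an attribute dictionary using the names listed above. The helper `build_session_attributes` and its parameters are hypothetical and are not part of the Portal Sample; the only behavior taken from this guide is that the duration attribute applies to end events.

```python
from typing import Optional


def build_session_attributes(
    session_id: str,
    username: str,
    app: str,
    user_id: str,
    function_id: str,
    function_version_id: str,
    duration_seconds: Optional[float] = None,
) -> dict:
    """Return the attribute set attached to a session metric (hypothetical helper)."""
    attributes = {
        "session.id": session_id,
        "session.username": username,
        "session.app": app,
        "session.user": user_id,
        "nvcf.function_id": function_id,
        "nvcf.function_version_id": function_version_id,
    }
    # The duration attribute is only meaningful once a session has ended.
    if duration_seconds is not None:
        attributes["session.duration.seconds"] = duration_seconds
    return attributes


# Placeholder values for illustration only
start_attrs = build_session_attributes("s-1", "alice", "demo-app", "u-1", "fn-1", "v-1")
end_attrs = build_session_attributes("s-1", "alice", "demo-app", "u-1", "fn-1", "v-1", 42.5)
print(sorted(set(end_attrs) - set(start_attrs)))
```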
Prerequisites
Docker installed on the collector instance.
Ports 4317 (gRPC) and 4318 (HTTP) available on the collector instance.
Network connectivity between the Portal Sample and the collector.
Observability backend: This guide uses Azure Monitor as the example backend for Portal Sample metrics. Steps to configure Grafana Cloud and Datadog are provided in the NVCF Observability Guide.
OTel Config For Azure Monitor
The OTel collector can be set up on the same instance as the portal or a different instance. Configure the collector to receive metrics from the portal, process them (add labels, batch them), and forward them to Azure Monitor.
Create a directory for the OTel collector’s configuration and change into it:
mkdir observability && cd observability
Create the OTel configuration file:
touch otel-collector-config.yaml
Copy the following receiver, processor, exporter, and pipeline configuration into the otel-collector-config.yaml file:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  # Add resource attributes to identify the source
  resource:
    attributes:
      - key: service.name
        value: "ov-dgxc-portal"
        action: upsert
      - key: service.version
        value: "1.0.0"
        action: upsert
      - key: deployment.environment
        value: "production"
        action: upsert
  # Batch processor for efficient export
  batch:
    timeout: 1s
    send_batch_size: 1024
    send_batch_max_size: 2048
  # Memory limiter to prevent OOM
  memory_limiter:
    limit_mib: 256
    check_interval: 1s
exporters:
  debug:
    verbosity: detailed
  azuremonitor:
    # The variable holds a full connection string, so it is passed as
    # connection_string rather than instrumentation_key
    connection_string: "${APPLICATIONINSIGHTS_CONNECTION_STRING}"
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [debug, azuremonitor]
Additional documentation for OTel collector configuration can be found here.
Azure Monitor Setup
Use an existing Application Insights instance in Azure Monitor, or create a new one.
In the Application Insights instance, navigate to Overview -> JSON View.
Copy the ConnectionString value from the Azure portal.
It has the following format:
"InstrumentationKey=xxxxxxxxxx;IngestionEndpoint=https://xxxxx.applicationinsights.azure.com/;LiveEndpoint=https://xxxxx.monitor.azure.com/;ApplicationId=xxxxxxx"
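Because a truncated or mis-pasted connection string is an easy mistake to make, a quick local check before starting the collector can help. The sketch below splits the string into its `key=value` segments. The helper names are hypothetical, and treating `InstrumentationKey` and `IngestionEndpoint` as the fields worth checking is an assumption of this example rather than a rule stated by this guide.

```python
# Fields this sketch checks for; an assumption, not an official requirement list
CHECKED_KEYS = ("InstrumentationKey", "IngestionEndpoint")


def parse_connection_string(connection_string: str) -> dict:
    """Split 'Key=value;Key=value' into a dictionary."""
    parts = {}
    for segment in connection_string.strip().strip('"').split(";"):
        if not segment:
            continue  # tolerate a trailing semicolon
        key, separator, value = segment.partition("=")
        if not separator:
            raise ValueError(f"Malformed segment: {segment!r}")
        parts[key.strip()] = value.strip()
    return parts


def missing_keys(connection_string: str) -> list:
    """Return the checked keys that are absent or empty."""
    parsed = parse_connection_string(connection_string)
    return [key for key in CHECKED_KEYS if not parsed.get(key)]


# Placeholder value for illustration only; substitute the string copied from Azure
example = "InstrumentationKey=abc;IngestionEndpoint=https://example.invalid/;ApplicationId=xyz"
print(parse_connection_string(example))
print(missing_keys("InstrumentationKey=abc"))
```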
Create OTel Collector
Launch the collector as a Docker container so that it can receive metrics from the Portal Sample and forward them to Azure Monitor. Run the command from the directory that contains otel-collector-config.yaml:
docker run -d \
  --name otel-collector \
  -p 4317:4317 \
  -p 4318:4318 \
  -e APPLICATIONINSIGHTS_CONNECTION_STRING="InstrumentationKey=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx;IngestionEndpoint=https://xxxx.in.applicationinsights.azure.com/;LiveEndpoint=https://xxxxxx.livediagnostics.monitor.azure.com/;ApplicationId=xxxxxx-xxxxx-xxxxx-xxxxx-xxxxxxxxxxxx" \
  -v "$(pwd)/otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml" \
  otel/opentelemetry-collector-contrib:latest
Export Environment Variables
Configure the Portal Sample to send its built-in OpenTelemetry metrics to your collector. On the Omniverse on DGX Cloud Portal Sample instance, set the following environment variables:
export OTEL_EXPORTER_OTLP_ENDPOINT="http://<IP_OF_OTEL_INSTANCE>:4317"
export OTEL_SERVICE_NAME="web-streaming-backend"
Note
Either an IP address or a service name can be used in OTEL_EXPORTER_OTLP_ENDPOINT.
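Since the prerequisites call for network connectivity between the Portal Sample and the collector, it can be useful to confirm that the configured endpoint resolves to a reachable host and port. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, the endpoint is expected to include a scheme such as `http://`, and falling back to port 4317 when none is given is an assumption of this example.

```python
import os
import socket
from urllib.parse import urlparse


def endpoint_host_port(endpoint: str, default_port: int = 4317) -> tuple:
    """Extract (host, port) from an endpoint URL such as http://host:4317."""
    parsed = urlparse(endpoint)
    if not parsed.hostname:
        raise ValueError(f"Endpoint must look like http://host:port, got {endpoint!r}")
    return parsed.hostname, parsed.port or default_port


def collector_reachable(endpoint: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the collector port succeeds."""
    host, port = endpoint_host_port(endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Read the same variable the Portal Sample uses; the fallback is a placeholder
endpoint = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317")
host, port = endpoint_host_port(endpoint)
print(f"Collector target: {host}:{port}")
```

Calling `collector_reachable(endpoint)` from the Portal Sample instance then reports whether the port accepts TCP connections. A successful connection only shows that the port is open; it does not prove that metrics are being accepted.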
Metrics Export Verification
Check the collector status:
docker logs -f otel-collector
Verify the metrics export:
Navigate to the backend directory on the Portal Sample instance:
cd ov-dgxc-portal-sample/backend
Run the metrics export test:
poetry run test-metrics
The expected output is:
Testing OpenTelemetry metrics...
Recording session start...
Incrementing active sessions...
Recording session end...
Decrementing active sessions...
Metrics recorded. Check your collector/backend for the data.
Waiting 10 seconds to ensure export...
To generate session activity from the Portal Sample, start streaming sessions from within it.
Confirm Telemetry on Azure Monitor
Log into the Azure Portal and navigate to Monitor -> Application Insights -> Metrics.
Select the appropriate metric namespace.
Sample Azure Monitor Queries
Active Sessions Monitoring:
customMetrics
| where name == "sessions.active.count"
| extend session_app = tostring(customDimensions.session_app)
| extend session_user = tostring(customDimensions.session_user)
| extend nvcf_function_id = tostring(customDimensions.nvcf_function_id)
| project timestamp, name, value, session_app, session_user, nvcf_function_id
Session Duration:
customMetrics
| where name == "sessions.duration"
| extend session_app = tostring(customDimensions.session_app)
| extend session_user = tostring(customDimensions.session_user)
| extend nvcf_function_id = tostring(customDimensions.nvcf_function_id)
| project timestamp, name, value, session_app, session_user, nvcf_function_id
Usage Trends:
customMetrics
| where name == "sessions.start.count"
| extend session_app = tostring(customDimensions.session_app)
| summarize session_starts = count() by bin(timestamp, 1h), session_app
| render timechart
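For readers less familiar with the query language, the Usage Trends query groups session-start records into one-hour bins per application and counts the records in each bin. The sketch below reproduces that grouping in plain Python over a few made-up rows. The function name and the sample data are hypothetical and only illustrate the binning logic.

```python
from collections import Counter
from datetime import datetime


def hourly_session_starts(rows) -> dict:
    """Count rows per (hour, session_app), like bin(timestamp, 1h) plus count()."""
    counts = Counter()
    for timestamp, session_app in rows:
        # Truncate the timestamp to the start of its hour
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[(hour, session_app)] += 1
    return dict(counts)


# Made-up records: (timestamp, session_app)
sample_rows = [
    (datetime(2025, 1, 1, 9, 5), "app-a"),
    (datetime(2025, 1, 1, 9, 40), "app-a"),
    (datetime(2025, 1, 1, 9, 59), "app-b"),
    (datetime(2025, 1, 1, 10, 1), "app-a"),
]
for (hour, app), starts in sorted(hourly_session_starts(sample_rows).items()):
    print(hour.isoformat(), app, starts)
```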