
TensorFlow

Run a TensorFlow workload on a Kubernetes cluster.

Prerequisites

  1. A Kubernetes cluster.
  2. A Helm 3 client. Please refer to the official setup guide.
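
You can check both prerequisites from your workstation with standard kubectl and Helm commands:

kubectl get nodes          # the cluster is reachable
helm version --short       # should report a v3.x client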

Install this chart

Add the repo

If you haven't done so yet, add this repository to Helm.
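
A minimal sketch of adding the repository; the URL is a placeholder, so use the actual sconeapps repository location (and credentials) you received:

helm repo add sconeapps <sconeapps-repo-url>
helm repo update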

Install the chart

Use Helm to run TensorFlow on your cluster. The following command deploys a Helm release called my-tensorflow with default parameters (i.e., a simple model-training workload):

helm install my-tensorflow sconeapps/tensorflow

Have a look at the Parameters section for a complete list of parameters this chart supports.
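
To verify the deployment, inspect the release and its pod. The label selector below assumes the chart follows the common app.kubernetes.io/instance labeling convention:

helm status my-tensorflow
kubectl get pods -l app.kubernetes.io/instance=my-tensorflow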

SGX device

By default, this Helm chart uses the SCONE SGX Plugin. Hence, it sets the resource limits of the TensorFlow container as follows:

resources:
  limits:
    sgx.k8s.io/sgx: 1

Alternatively, set useSGXDevPlugin to azure (e.g., --set useSGXDevPlugin=azure) to support Azure's SGX Device Plugin. Since Azure requires the amount of EPC memory allocated to your application to be specified, the parameter sgxEpcMem (SGX EPC memory in MiB) becomes required too (e.g., --set useSGXDevPlugin=azure --set sgxEpcMem=16).
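
For example, a complete install command for a node using Azure's SGX Device Plugin would look like this:

helm install my-tensorflow sconeapps/tensorflow \
  --set useSGXDevPlugin=azure \
  --set sgxEpcMem=16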

If you do not want to use the SGX plugin, you can remove the resource limit and explicitly mount the local SGX device into your container by setting:

extraVolumes:
  - name: dev-isgx
    hostPath:
      path: /dev/isgx

extraVolumeMounts:
  - name: dev-isgx
    mountPath: /dev/isgx

Please note that mounting the local SGX device into your container requires privileged mode, which will grant your container access to ALL host devices. To enable privileged mode, set securityContext:

securityContext:
  privileged: true
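
The three snippets above (extraVolumes, extraVolumeMounts, and securityContext) can be collected into one custom values file, here hypothetically named sgx-device-values.yaml, and applied in a single install:

helm install my-tensorflow sconeapps/tensorflow -f sgx-device-values.yaml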

Before you begin

Attestation

This chart does not submit any sessions to a CAS, so you have to do that beforehand, from a trusted computer. If you need to pass remote attestation information to your container, such as SCONE_CONFIG_ID and SCONE_CAS_ADDR, use the extraEnv parameter in values.yaml.
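
A sketch of such an override in a custom values file; the session name and CAS address are placeholders for values from your own CAS session. Note that overriding extraEnv replaces the default list, so the default SCONE_LAS_ADDR entry is repeated here:

extraEnv:
  - name: SCONE_CONFIG_ID
    value: my-session/tensorflow   # placeholder: your CAS session/service
  - name: SCONE_CAS_ADDR
    value: cas.example.com         # placeholder: your CAS address
  - name: SCONE_LAS_ADDR           # keep the default LAS lookup
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP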

Data output and external volumes

If your workload produces output artifacts that need to be saved before the container is gone, consider adding an auxiliary task that uploads them to external storage after the main task finishes.

For now, any output volumes are considered to be either hostPath (meaning that their respective directories have to exist on the worker node) or emptyDir, which maps everything to a randomly generated directory under /tmp.
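
For instance, a hostPath output volume could be declared as follows (the paths are illustrative):

extraVolumes:
  - name: output
    hostPath:
      path: /data/tf-output   # must already exist on the worker node

extraVolumeMounts:
  - name: output
    mountPath: /output        # where the workload writes its artifacts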

Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| image | TensorFlow image | registry.scontain.com:5050/sconecuratedimages/datasystems:tensorflow-1.15 |
| imagePullPolicy | TensorFlow image pull policy | IfNotPresent |
| imagePullSecrets | TensorFlow image pull secrets, in case of private repositories | [{"name": "sconeapps"}] |
| nameOverride | String to partially override the tensorflow.fullname template (will prepend the release name) | nil |
| fullNameOverride | String to fully override the tensorflow.fullname template | nil |
| podAnnotations | Additional pod annotations | {} |
| securityContext | Security context for the TensorFlow container | {} |
| extraVolumes | Extra volume definitions | [] |
| extraVolumeMounts | Extra volume mounts for the TensorFlow pod | [] |
| extraEnv | Additional environment variables for the TensorFlow container | [{"name": "SCONE_LAS_ADDR", "valueFrom": {"fieldRef": {"fieldPath": "status.hostIP"}}}] |
| resources | CPU/memory resource requests/limits for the node | {} |
| nodeSelector | Node labels for pod assignment (this value is evaluated as a template) | {} |
| tolerations | List of node taints to tolerate (this value is evaluated as a template) | [] |
| affinity | Map of node/pod affinities (this value is evaluated as a template) | {} |
| useSGXDevPlugin | Use an SGX device plugin to access SGX resources | "scone" |
| sgxEpcMem | Required for Azure's SGX Device Plugin. Protected EPC memory in MiB | nil |
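
Any of these parameters can be overridden at install time with --set (the values below are illustrative):

helm install my-tensorflow sconeapps/tensorflow \
  --set imagePullPolicy=Always \
  --set resources.limits.cpu=2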