# TensorFlow

Run a TensorFlow workload on a Kubernetes cluster.
## Prerequisites

- A Kubernetes cluster;
- A Helm 3 client. Please refer to the official setup guide.
## Install this chart

### Add the repo

If you haven't yet, please add this repo to Helm.
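As a sketch of what that looks like (the repo URL is a placeholder for the one provided to you; `sconeapps` is the repo name used by the install command below):

```sh
# Add the sconeapps repo under the name used in this guide,
# then refresh the local chart index
helm repo add sconeapps <repo-url>
helm repo update
```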
### Install the chart

Use Helm to run TensorFlow on your cluster. The following deploys a Helm release called `my-tensorflow` with default parameters (i.e., a simple model training):

```sh
helm install my-tensorflow sconeapps/tensorflow
```

Have a look at the Parameters section for a complete list of parameters this chart supports.
## SGX device

By default, this Helm chart uses the SCONE SGX Plugin. Hence, it sets the resource limits of the TensorFlow container as follows:

```yaml
resources:
  limits:
    sgx.intel.com/enclave: 1
```
Alternatively, set `useSGXDevPlugin` to `azure` (e.g., `--set useSGXDevPlugin=azure`) to support Azure's SGX Device Plugin. Since Azure requires the amount of EPC memory allocated to your application to be specified, the parameter `sgxEpcMem` (SGX EPC memory in MiB) becomes required too (e.g., `--set useSGXDevPlugin=azure --set sgxEpcMem=16`).
In case you do not want to use the SGX plugin, you can remove the resource limit and explicitly mount the local SGX device into your container by setting:

```yaml
extraVolumes:
  - name: dev-isgx
    hostPath:
      path: /dev/isgx

extraVolumeMounts:
  - name: dev-isgx
    mountPath: /dev/isgx
```
Please note that mounting the local SGX device into your container requires privileged mode, which will grant your container access to ALL host devices. To enable privileged mode, set `securityContext`:

```yaml
securityContext:
  privileged: true
```
## Before you begin

### Attestation

This chart does not submit any sessions to a CAS, so you have to do that beforehand, from a trusted computer. If you need to pass remote attestation information to your container, such as `SCONE_CONFIG_ID` and `SCONE_CAS_ADDR`, use the `extraEnv` parameter in `values.yaml`.
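For instance, a minimal sketch of such an entry in `values.yaml` (the CAS address and session name below are placeholders, not actual values):

```yaml
extraEnv:
  # Placeholder: address of the CAS that holds your session
  - name: SCONE_CAS_ADDR
    value: "cas.example.com"
  # Placeholder: the session/service name submitted beforehand
  - name: SCONE_CONFIG_ID
    value: "my-session/tensorflow"
```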
### Data output and external volumes

If your workload produces output artifacts that must be saved before the container is gone, consider adding an auxiliary task that uploads them to external storage after the main task finishes.

For now, any output volumes are expected to be either `hostPath` (meaning that their respective directories must already exist on the worker node) or `emptyDir`, which maps everything to a randomly generated directory under `/tmp`.
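As an illustration, a `hostPath` output volume could be declared through `extraVolumes` and `extraVolumeMounts` (the volume name and paths below are assumptions for the example):

```yaml
extraVolumes:
  - name: output
    hostPath:
      # Assumed path; must already exist on the worker node
      path: /data/tensorflow-output

extraVolumeMounts:
  - name: output
    # Assumed mount point inside the container
    mountPath: /output
```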
## Parameters

| Parameter | Description | Default |
|---|---|---|
| `image` | TensorFlow image | `registry.scontain.com/sconecuratedimages/datasystems:tensorflow-1.15` |
| `imagePullPolicy` | TensorFlow image pull policy | `IfNotPresent` |
| `imagePullSecrets` | TensorFlow image pull secrets, in case of private repositories | `[{"name": "sconeapps"}]` |
| `nameOverride` | String to partially override the `tensorflow.fullname` template (will prepend the release name) | `nil` |
| `fullNameOverride` | String to fully override the `tensorflow.fullname` template | `nil` |
| `podAnnotations` | Additional pod annotations | `{}` |
| `securityContext` | Security context for the TensorFlow container | `{}` |
| `extraVolumes` | Extra volume definitions | `[]` |
| `extraVolumeMounts` | Extra volume mounts for the TensorFlow pod | `[]` |
| `extraEnv` | Additional environment variables for the TensorFlow container | `[{"name": "SCONE_LAS_ADDR", "valueFrom": {"fieldRef": {"fieldPath": "status.hostIP"}}}]` |
| `resources` | CPU/memory resource requests/limits for the pod | `{}` |
| `nodeSelector` | Node labels for pod assignment (this value is evaluated as a template) | `{}` |
| `tolerations` | List of node taints to tolerate (this value is evaluated as a template) | `[]` |
| `affinity` | Map of node/pod affinities (this value is evaluated as a template) | `{}` |
| `useSGXDevPlugin` | SGX Device Plugin used to access SGX resources (`scone` or `azure`) | `"scone"` |
| `sgxEpcMem` | Required by Azure's SGX Device Plugin. Protected EPC memory in MiB | `nil` |
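Putting several of these parameters together, an install targeting Azure's SGX Device Plugin might look like the following sketch (the EPC size and pull policy are illustrative values, not recommendations):

```sh
helm install my-tensorflow sconeapps/tensorflow \
  --set useSGXDevPlugin=azure \
  --set sgxEpcMem=16 \
  --set imagePullPolicy=Always
```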