Run a Tensorflow workload on a Kubernetes cluster.
- A Kubernetes cluster;
- Helm 3 client. Please refer to the official setup guide.
Upload the policy for Tensorflow
If you need remote attestation, you can use the upload_policy command to upload the Tensorflow SCONE template, which runs the hello world example from the TensorFlow overview. You can edit the template to run your confidential workload.
The upload_policy command generates a chart values file (tensorflow_chart_values.yml) that you can use to deploy your confidential workload with Helm.
Run upload_policy with the default options to create the chart values, with the default policy uploaded to the latest public CAS.
Create an alias for the
alias scone="docker run -it --rm \
    -v \"$HOME/.docker/config.json:/root/.docker/config.json\" \
    -v \"\$PWD:/root\" \
    -v \"\$PWD:/values\" \
    registry.scontain.com:5050/sconecuratedimages/experimental:upload-policy "
upload_policy comes with the sconeapps templates preloaded. Assuming that you have created the alias above, run the following command to use the default policy for Tensorflow:
scone upload_policy templates/tensorflow
If everything went well, upload_policy created the tensorflow_chart_values.yml file in your current directory.
If you do not want to use the default policy maintained by Scontain, you can upload a custom policy. See scone upload_policy --help for more information. The command always generates chart values in which the CAS address (SCONE_CAS_ADDR) and SCONE_CONFIG_ID for remote attestation point to the policies it created.
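Before installing, it can help to confirm that the generated file is present and mentions both attestation settings. The following is a minimal sketch: check_values is a hypothetical helper, not part of upload_policy, and because the exact layout of the generated file is not described here, it only searches for the two names anywhere in the file.

```shell
#!/bin/sh
# check_values FILE: succeed only if FILE exists and mentions both
# attestation settings named in the text (SCONE_CAS_ADDR, SCONE_CONFIG_ID).
# Hypothetical helper; it does not validate the YAML structure.
check_values() {
    file="$1"
    if [ ! -f "$file" ]; then
        echo "missing: $file" >&2
        return 1
    fi
    for key in SCONE_CAS_ADDR SCONE_CONFIG_ID; do
        if ! grep -q "$key" "$file"; then
            echo "$file does not mention $key" >&2
            return 1
        fi
    done
    echo "$file looks complete"
}

# Usage: check_values tensorflow_chart_values.yml
```

A non-zero exit status suggests that upload_policy did not finish or was run from a different directory.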
Install the chart
Add the sconeapps repo
If you haven't yet, please add the sconeapps repo to Helm.
Install the chart
Use Helm to run Tensorflow on your cluster. We are deploying a Helm release called my-tensorflow, with default parameters (i.e., a simple model training):
# Using the tensorflow_chart_values.yml file generated by the upload_policy command
helm install my-tensorflow sconeapps/tensorflow -f tensorflow_chart_values.yml
Have a look at the Parameters section for a complete list of parameters this chart supports.
By default, this helm chart uses the SCONE SGX Plugin. Hence, it sets the resource limits of CAS as follows:
resources:
  limits:
    sgx.intel.com/enclave: 1
Alternatively, set useSGXDevPlugin to azure (e.g., --set useSGXDevPlugin=azure) to support Azure's SGX Device Plugin. Since Azure requires the amount of EPC memory allocated to your application to be specified, the parameter sgxEpcMem (SGX EPC memory in MiB) then becomes required too (e.g., --set useSGXDevPlugin=azure --set sgxEpcMem=16).
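The rule that sgxEpcMem must accompany the Azure plugin can be sketched as a small wrapper that assembles the install command. This is only an illustration: install_cmd is a hypothetical helper, the label "default" is a placeholder argument rather than a chart value, and the function prints the command instead of running it.

```shell
#!/bin/sh
# install_cmd PLUGIN [EPC_MIB]: print the helm command for the chosen
# device plugin. It only prints; it does not contact a cluster.
# Any PLUGIN other than "azure" yields the plain default command.
install_cmd() {
    plugin="$1"
    epc="${2:-}"
    cmd="helm install my-tensorflow sconeapps/tensorflow -f tensorflow_chart_values.yml"
    if [ "$plugin" = "azure" ]; then
        # The text states that sgxEpcMem is required with the Azure plugin.
        if [ -z "$epc" ]; then
            echo "sgxEpcMem (MiB) is required with useSGXDevPlugin=azure" >&2
            return 1
        fi
        cmd="$cmd --set useSGXDevPlugin=azure --set sgxEpcMem=$epc"
    fi
    echo "$cmd"
}

# Example: print the Azure variant with 16 MiB of EPC, as in the text.
install_cmd azure 16
```

Printing first makes it easy to review the flags before pasting the command into a shell that has access to the cluster.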
In case you do not want to use the SGX plugin, you can remove the resource limit and explicitly mount the local SGX device into your container by setting:
extraVolumes:
  - name: dev-isgx
    hostPath:
      path: /dev/isgx
extraVolumeMounts:
  - name: dev-isgx
    path: /dev/isgx
Please note that mounting the local SGX device into your container requires privileged mode, which will grant your container access to ALL host devices. To enable privileged mode, set
securityContext:
  privileged: true
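One way to apply the device mount and the privileged setting together is to keep them in a separate values file and pass it after the generated one. The sketch below assumes a hypothetical file name, sgx-device-values.yml, and does not cover removing the resource limit, since how to do that depends on the chart.

```shell
#!/bin/sh
# Write a values overlay that mounts /dev/isgx and enables privileged mode.
# sgx-device-values.yml is a hypothetical file name chosen for this sketch;
# set OVERLAY to write somewhere else.
overlay="${OVERLAY:-sgx-device-values.yml}"

cat > "$overlay" <<'EOF'
extraVolumes:
  - name: dev-isgx
    hostPath:
      path: /dev/isgx
extraVolumeMounts:
  - name: dev-isgx
    path: /dev/isgx
securityContext:
  privileged: true
EOF

echo "wrote $overlay"
# Then pass it after the generated values file, for example:
#   helm install my-tensorflow sconeapps/tensorflow \
#       -f tensorflow_chart_values.yml -f "$overlay"
```

Helm applies values files in order, with later files taking precedence, and it replaces lists rather than merging them. If the generated file already defines extraVolumes or extraVolumeMounts, copy those entries into the overlay as well.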
Before you begin
This chart does not submit any sessions to a CAS, so you have to do it beforehand. If you need to pass remote attestation information to your container, use the upload_policy container. It will add information such as SCONE_CAS_ADDR, using the extraEnv parameter on
Data output and external volumes
If your workload produces output artifacts that need to be saved before the container is gone, consider having an auxiliary task upload them elsewhere after the main task has finished.
For now, any output volumes are considered to be either hostPath (meaning that their respective directories have to exist on the worker node) or emptyDir, which maps everything to a randomly generated directory under
||Tensorflow pull policy||
||Tensorflow pull secrets, in case of private repositories||
||String to partially override tensorflow.fullname template with a string (will prepend the release name)||
||String to fully override tensorflow.fullname template with a string||
||Additional pod annotations||
|securityContext|Security context for Tensorflow container||
|extraVolumes|Extra volume definitions||
|extraVolumeMounts|Extra volume mounts for Tensorflow pod||
|extraEnv|Additional environment variables for Tensorflow container||
|resources|CPU/Memory resource requests/limits for node.||
||Node labels for pod assignment (this value is evaluated as a template)||
||List of node taints to tolerate (this value is evaluated as a template)||
||Map of node/pod affinities (The value is evaluated as a template)||
|useSGXDevPlugin|Use SGX Device Plugin to access SGX resources.||
|sgxEpcMem|Required for the Azure SGX Device Plugin. Protected EPC memory in MiB||