
# Deploying & Reconciling the SCONE Operator

The SCONE Kubernetes operator facilitates a declarative description of SCONE-related custom resources. These custom resources include:

- SCONE CAS (`cas.services.scone.cloud`): deploys a high-availability CAS using a primary/backup approach,
- SCONE LAS (`las.base.scone.cloud`): deploys a local attestation service on all SGX-capable nodes of a Kubernetes cluster,
- SCONE SGXPlugin (`sgxplugins.base.scone.cloud`): identifies and labels all SGX-capable Kubernetes nodes and ensures that containers can use SGX on these nodes,
- SCONE signed policies (`signedpolicies.cas.scone.cloud`): security policies that are signed and uploaded to CAS,
- SCONE signed and encrypted policies (`encryptedpolicies.cas.scone.cloud`), and
- confidential Vault (`vaults.services.scone.cloud`).

These custom resources are associated with controllers that bring these resources into, and keep them in, their target state. For each of these custom resources, a custom resource definition (CRD) specifies how to define the custom resource.

## Installation

We maintain a script and a helm chart to install the SCONE Operator.

The SCONE operator consists of:

- a controller manager, and
- for each SCONE service and the SCONE policies:
    - a custom resource definition (CRD) of a custom resource (CR), and
    - a controller.

When the operator is deployed, its CRDs, controllers, and other Kubernetes objects are deployed, and the controller manager is started. The controller manager runs as a Kubernetes deployment and deploys mutating and validating webhooks at start-up. Once they have started up, a custom resource of each kind can be deployed. This is made possible by the previous deployment of the corresponding CRDs.

Once a CR is created, deleted, or updated, the corresponding controller is notified and starts reconciling the CR. To summarize, the reconciler, i.e., the controller, checks the current state of the CR, compares it to the desired state (i.e., its configuration), and takes action to change the current state into the desired state. This action triggers the reconciliation again, either automatically (for example, when it updates the CR) or explicitly. The process continues until the two states are equal.
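The reconciliation loop can be sketched in a few lines of shell. This is an illustrative model with stubbed state functions, not the operator's actual implementation — `get_current_state` and `apply_changes` are stand-ins invented for this sketch:

```bash
# Illustrative model of a reconciliation loop: compare the current state of a
# resource with its desired state and act until they converge.
# get_current_state/apply_changes are stand-in stubs, not real operator code.
STATE="pending"                         # simulated current state of a CR
get_current_state() { echo "$STATE"; }
apply_changes()     { STATE="$1"; }     # the action that moves the CR along

reconcile() {
  desired="$1"
  while [ "$(get_current_state)" != "$desired" ]; do
    apply_changes "$desired"            # triggers another comparison round
  done
}

reconcile "running"
echo "state: $STATE"                    # -> state: running
```

In the real operator, the comparison is re-triggered by Kubernetes watch events rather than a busy loop, but the convergence logic is the same.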

### TL;DR

A simple `operator_controller` script deploys and updates the SCONE Operator. If you use the SCONE image registry, you need to deploy image pull secrets. For this, set the following environment variables:

```bash
export REGISTRY_USERNAME=<your-name> # username of the registry service account
export REGISTRY_ACCESS_TOKEN=<your-access-token> # read access token
export REGISTRY_EMAIL=<your-email> # email address of the service account
export DCAP_KEY="00000000000000000000000000000000" # replace with your Intel DCAP API key - keep as is on Azure
```

You can get the DCAP API Key from Intel. Your access token needs to be able to read the registry: for more details, please see the section Create an Access Token.
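As a sanity check before running the installer, you can verify that all four variables are set. `check_vars` below is a small convenience helper written for this document, not part of the official `operator_controller`:

```bash
# Convenience helper (not part of the official tooling): fail early with a
# clear message if any of the required variables is missing or empty.
check_vars() {
  for var in REGISTRY_USERNAME REGISTRY_ACCESS_TOKEN REGISTRY_EMAIL DCAP_KEY; do
    eval "value=\${$var}"            # indirect lookup of the variable's value
    [ -n "$value" ] || { echo "error: $var is not set" >&2; return 1; }
  done
  echo "all registry variables are set"
}
```

Call `check_vars` right before the `curl … | bash` invocation.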

You can now deploy or update the SCONE operator as follows:

```bash
curl -fsSL https://raw.githubusercontent.com/scontain/SH/master/operator_controller | bash -s - --reconcile --update --plugin --verbose --dcap-api "$DCAP_KEY" --secret-operator  --username $REGISTRY_USERNAME --access-token $REGISTRY_ACCESS_TOKEN --email $REGISTRY_EMAIL
```

Flags:

- `--reconcile`: even without any of the above flags, the `operator_controller` outputs warnings if the operator or any of its dependencies is not in the desired state. With flag `--reconcile`, it also deploys and reconciles the SCONE operator in the Kubernetes cluster.

- `--update`: redeploys the operator and updates its image.

- `--secret-operator`: deploys a secrets operator (see below) that injects the SCONE Operator and CAS pull secrets into all namespaces.

- `--username`, `--access-token`, `--email`: required if you want to `--reconcile` or `--update` the pull secrets.

- `--plugin`: deploys the SCONE `kubectl` plugin on your local machine. This finds the last directory on the `PATH` that is writable. If there is no writable directory on the `PATH`, this fails; in that case, please set `--plugin-path` to define a directory where the operator controller should write the plugin. The name of the plugin is `kubectl-provision`.

- `--verbose`: displays progress information.

- `--only-operator`: by default, the `operator_controller` installs a LAS and an SGXPlugin. Set flag `--only-operator` to install only the operator.

- `--no-sgxplugin`: in case another SGX plugin is already installed in the cluster, set flag `--no-sgxplugin`.
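The writable-directory lookup that `--plugin` performs can be sketched as follows. `last_writable_dir` is a hypothetical helper written for illustration, not the script's actual code:

```bash
# Sketch of the --plugin install-path logic (illustrative only): walk a
# PATH-like string and remember the last directory that exists and is writable.
last_writable_dir() {
  search_path="$1"
  result=""
  old_ifs="$IFS"; IFS=:               # split on ':' like the shell does for PATH
  for dir in $search_path; do
    [ -d "$dir" ] && [ -w "$dir" ] && result="$dir"
  done
  IFS="$old_ifs"
  [ -n "$result" ] && echo "$result"
}
```

For example, `last_writable_dir "$PATH"` prints the candidate install directory; if it prints nothing, you would pass `--plugin-path` explicitly.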

Typically, you want to keep all versions the same. Upgrading the SCONE operator, SCONE SGXPlugin, and SCONE LAS can be done via the operator controller using flag `--update`. A SCONE CAS needs to be updated using our `kubectl provision` plugin.

## Operator Controller Reference

```text
Usage:
  operator_controller [OPTIONS]

Objectives:
  - Checks if the SCONE operator and all its dependencies are available.
  - Tries to fix any issues it discovers if flag '--reconcile' is set.
  - Tries to update all components in case flag '--update' is set (even if everything is ok).
  - Creates a namespace for a service if flag --create NAMESPACE is set.
  - If the --verify-image-signatures is provided, or if the Scontain container image repository
    is used, the signatures of the images used are verified.

Options:
    --reconcile | -r
                  Try to fix all warnings that we discover.
                  The default is to warn about potential issues only.
    --update | -u
                  Try to update all dependencies of the SCONE operator,
                  independently of whether they need fixing.
    -n | --namespace NAMESPACE
                  The Kubernetes namespace in which the SCONE operator should be deployed on the cluster.
                  Default value: "scone-system"
    -c | --create NAMESPACE
                  Create namespace "NAMESPACE" for provisioning SCONE CAS (or another service).
    --username REGISTRY_USERNAME
                  To create/update/fix the pull secrets ('sconeapps' and 'scone-operator-pull'),
                  one needs to specify the user name, access token, and email of the registry.
                  Signup for an account: https://sconedocs.github.io/registry/
    --access-token REGISTRY_ACCESS_TOKEN
                  The access token of the pull secret.
    --email REGISTRY_EMAIL
                  The email address belonging to the pull secret.
    --plugin
                  Include the kubectl plugin in the reconciliation and updates.
    --plugin-path PATH
                  Path where we should write the kubectl plugin binary. The path must be writeable.
                  Default value: "/Users/christoffetzer/.local/bin/kubectl-provision"
                  The prefix of the default value is the last path on your shell $PATH that is writeable.
                  If none is writeable and you set --plugin, you must specify --plugin-path PATH.
    --secret-operator
                  Check/Reconcile/Update the Secret Operator (used to inject Kubernetes Secrets into Kubernetes namespaces)
    --only-operator
                  Only install the SCONE Operator (but no LAS, SGXPlugin, kubectl plugin)
    -v | --verbose
                  Enable verbose output
    --debug | debug_short_flag
                  Create debug image instead of a production image
    --set-version VERSION
                  Set the version of the helm chart
    --no-sgxplugin
                  Set this flag in case you do not want to install the SGXPlugin.
    --verify-image-signatures PUBLIC_KEY_PATH
                  Path to the public key to use for verification of signed images.
                  For the verification of signed images in the registry.scontain.com/scone.cloud
                  repository, the public key does not need to be provided, and this
                  option is ignored.
    --dcap-api | -d <DCAP API Key>
                  DCAP API Key - required when provisioning LAS.
    --help
                  Output this usage information and exit.

Default Configuration:
  - CERT_MANAGER=https://github.com/cert-manager/cert-manager/releases/download/v1.10.1/cert-manager.yaml
  - DEFAULT_NAMESPACE=scone-system
  - HELM_CHART=https://raw.githubusercontent.com/scontain/operator/main/scone-operator-5.8.0.tgz
  - LAS_MANIFEST=https://raw.githubusercontent.com/scontain/manifests/main/5.8.0/las.yaml
  - SGXPLUGIN_MANIFEST=https://raw.githubusercontent.com/scontain/manifests/main/5.8.0/sgxplugin.yaml
  - REGISTRY=registry.scontain.com
  - IMAGE_REPO=registry.scontain.com/scone.cloud
  - KUBECTLPLUGIN=https://raw.githubusercontent.com/scontain/SH/master/5.8.0/kubectl-provision
  - SECRET_OPERATOR_MANIFEST=https://raw.githubusercontent.com/scontain/manifests/main/5.8.0/secrets_operator.yaml
  - IMPS_HELM_CHART=banzaicloud-stable/imagepullsecrets
  - IMPS_HELM_REPO=banzaicloud-stable https://kubernetes-charts.banzaicloud.com
  - VERSION=5.8.0
You can overwrite the defaults by exporting these environment variables before executing this script.
```

## Manual Deployment of the Operator

!!! note "Use the Operator Controller to deploy the SCONE Operator"
    We do not recommend installing the SCONE Operator manually. This description explains the individual steps that need to be performed in case you want to customize the Operator Controller.

We need to install a set of prerequisites to install the SCONE Operator. After the SCONE operator is up and running, each of the custom resources `SGXPlugin`, `LAS`, and `CAS` can be installed separately by deploying and creating a custom resource.

### Kubernetes Config

The first step is to ensure you have access to your Kubernetes cluster.

!!! note "KUBECONFIG"
    We assume you can access your Kubernetes cluster through your `$HOME/.kube/config` file or the `KUBECONFIG` environment variable.

### `cert-manager`

The `cert-manager` is a prerequisite of the SCONE operator. You can check if the `cert-manager` is installed using `kubectl`:

```bash
kubectl get pods -A | grep cert-manager
```

If no `cert-manager` pod is running, you can install `cert-manager` using `kubectl` or `helm`. Our recommendation, however, is to install the latest release using `kubectl` as follows:

```bash
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.10.1/cert-manager.yaml
```

Please refer to the official installation instructions for up-to-date version information.

Alternatively, you can install `cert-manager` together with the SCONE operator using helm, via command-line flags of the `helm install` command (see below).

### Operator Namespace

Now you are ready to deploy the SCONE operator. This can be done using helm; with helm, you can specify the desired namespace on the command line.

By default, we use the namespace `scone-system`. You can create the namespace by executing the following:

```bash
kubectl apply -f https://raw.githubusercontent.com/scontain/operator-samples/main/namespace.yaml
```

### Pull Secret

Authorization is required to pull container images from private registries. You need to create the Kubernetes secrets `scone-operator-pull` and `sconeapps` with the required credentials in the namespace `scone-system`.

The SCONE operator image is stored in a private repository. Hence, to deploy this image, one needs to pass the image pull credentials to Kubernetes. To do so, you must create the Kubernetes secret `scone-operator-pull`.

First, define your credentials. This includes generating an access token with scope `read_registry`. Set the following environment variables:

```bash
export REGISTRY_USERNAME=<your-name> # username of the registry service account
export REGISTRY_ACCESS_TOKEN=<your-access-token> # read access token
export REGISTRY_EMAIL=<your-email> # email address of the service account
```

Then create the Kubernetes secrets `scone-operator-pull` and `sconeapps`. In a simple setup, these secrets contain the same token. If you use multiple registries, you need to define a secret per registry: often, one uses a different registry for the base SCONE images (SGXPlugin, LAS, CAS) than for the application images (e.g., MariaDB, Nginx, etc.).

```bash
kubectl create secret docker-registry scone-operator-pull \
   --docker-server=registry.scontain.com \
   --docker-username=$REGISTRY_USERNAME \
   --docker-password=$REGISTRY_ACCESS_TOKEN \
   --docker-email=$REGISTRY_EMAIL \
   --namespace scone-system
```

and

```bash
kubectl create secret docker-registry sconeapps \
   --docker-server=registry.scontain.com \
   --docker-username=$REGISTRY_USERNAME \
   --docker-password=$REGISTRY_ACCESS_TOKEN \
   --docker-email=$REGISTRY_EMAIL \
   --namespace scone-system
```
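For reference, a `docker-registry` secret stores these credentials as a `.dockerconfigjson` payload. The sketch below reconstructs that JSON in plain shell, assuming the standard dockerconfigjson layout; it is for understanding only, since `kubectl create secret docker-registry` does this for you:

```bash
# Reconstruct the .dockerconfigjson payload of a docker-registry secret.
# Per the dockerconfigjson format, "auth" is base64("username:password").
make_dockerconfigjson() {
  server="$1"; user="$2"; token="$3"; email="$4"
  auth=$(printf '%s:%s' "$user" "$token" | base64 | tr -d '\n')
  printf '{"auths":{"%s":{"username":"%s","password":"%s","email":"%s","auth":"%s"}}}' \
    "$server" "$user" "$token" "$email" "$auth"
}
```

For example, `make_dockerconfigjson registry.scontain.com "$REGISTRY_USERNAME" "$REGISTRY_ACCESS_TOKEN" "$REGISTRY_EMAIL"` prints the JSON that Kubernetes stores (base64-encoded) in the secret.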

### Deploy the SCONE Operator

!!! note
    You can find the up-to-date list of SCONE operator releases on the SCONE operator releases page.

Now deploy the SCONE operator using helm:

```bash
helm install scone-operator https://github.com/scontain/operator/archive/refs/tags/v0.0.7.tar.gz --namespace scone-system
```

### Automatically injecting Pull Secrets

The SCONE images require defining a pull secret. This can be inconvenient since a user would need to add the correct pull secret in each namespace that needs access to one of these images. Since the operator already defines this pull secret, one can automate the distribution of this secret to other namespaces with the help of a secrets operator.

```bash
helm repo add banzaicloud-stable https://kubernetes-charts.banzaicloud.com
helm install imps banzaicloud-stable/imagepullsecrets -n scone-system
```

Create a secret injector for secret `sconeapps`, i.e., inject it into all namespaces that request this secret by copying it from namespace `scone-system`:

```bash
kubectl apply -f https://raw.githubusercontent.com/scontain/operator-samples/main/secrets_operator.yaml
```

### Using helm to install the cert-manager

`cert-manager` can be installed by adding the flag `--set cert-manager.enabled=true` to the above `helm install` command. However, take care not to run more than one instance of `cert-manager` in the same cluster, since it also manages non-namespaced resources. You can customize where `cert-manager` is installed using the command-line flag `--set cert-manager.namespace=somenamespace` and include the CRDs of the `cert-manager` in the installation using `--set cert-manager.installCRDs=true`. We recommend installing it as described above under Prerequisites and refer to the official `cert-manager` documentation for further information.

You can verify that the operator is running as it should using the following commands:

```bash
# Check the state of the deployment of the operator
kubectl get deployments -n scone-system scone-controller-manager
# Check the state of the pod of the deployment
kubectl get pods -n scone-system -l control-plane=controller-manager
# Check the log of the pod (use the name of the pod from the previous command)
export CONTROLLERPOD=$(kubectl get pods -n scone-system -l control-plane=controller-manager | grep scone-controller-manager | awk '{ print $1 }')
kubectl logs -n scone-system $CONTROLLERPOD
```
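Right after deployment, these checks can fail simply because the pod is not ready yet. A small generic retry helper (a convenience sketch, not part of the SCONE tooling) lets you poll any of the commands above until it succeeds:

```bash
# Generic polling helper (convenience sketch, not part of the SCONE tooling):
# retry a command until it succeeds or the attempt budget is exhausted.
retry() {
  attempts="$1"; shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0                 # command succeeded
    i=$((i + 1))
    sleep 1                          # wait a bit before the next attempt
  done
  return 1                           # gave up
}
```

For example, `retry 30 kubectl get pods -n scone-system -l control-plane=controller-manager` polls for roughly 30 seconds.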

## Default Images

The SCONE Operator uses the following default images:

| Component | Image | Tag |
| --- | --- | --- |
| SCONE Operator | `registry.scontain.com/scone.cloud/k8soperators` | latest |
| CAS | `registry.scontain.com/scone.cloud/cas` | latest |
| LAS | `registry.scontain.com/scone.cloud/las` | latest |
| SGXPlugin | `registry.scontain.com/scone.cloud/sgx-plugin` | latest |
| CAS Backup Controller | `registry.scontain.com/scone.cloud/backup-controller` | latest |

## Troubleshooting

We have seen a set of issues that cannot be solved automatically by the SCONE Operator.

### Image Pull Error

Sometimes, we see image pull errors when starting SCONE-related container images. For example, executing the command

```bash
kubectl get pods -n scone-system
```

might result in the following output:

```text
NAME                                               READY   STATUS             RESTARTS      AGE
las-lxk95                                          0/2     ImagePullBackOff   0             25m
```

Please check that your pull secrets exist by executing

```bash
kubectl get secrets sconeapps scone-operator-pull -n scone-system
```

This should result in output like this:

```text
NAME                  TYPE                             DATA   AGE
sconeapps             kubernetes.io/dockerconfigjson   1      37m
scone-operator-pull   kubernetes.io/dockerconfigjson   1      37m
```

Most issues we have seen so far were caused by expired pull secrets, i.e., the access token stored in the secrets had expired. You can check the validity of your token by logging into the registry using the command `docker login registry.scontain.com`.

In case the two pull secrets do not exist, please check if the secret operator exists and is healthy.

You can retrieve a new token with scope `read_registry` by visiting `gitlab.scontain.com`. The easiest way to update the token in a cluster is to reconcile the SCONE operator. Please ensure the following environment variables are defined (see above for more details):

```bash
export REGISTRY_USERNAME=<your-name> # username of the registry service account
export REGISTRY_ACCESS_TOKEN=<your-access-token> # read access token
export REGISTRY_EMAIL=<your-email> # email address of the service account
export DCAP_KEY="00000000000000000000000000000000" # replace with your Intel DCAP API key - keep as is on Azure
```

and execute

```bash
curl -fsSL https://raw.githubusercontent.com/scontain/SH/master/operator_controller | bash -s - --reconcile --update --plugin --verbose --dcap-api "$DCAP_KEY" --secret-operator  --username $REGISTRY_USERNAME --access-token $REGISTRY_ACCESS_TOKEN --email $REGISTRY_EMAIL
```

### SCONE Webhook Issues

One of the common problems during deployment of the SCONE Operator is that the Kubernetes cert-manager is not installed or is installed incorrectly. This can result in errors related to the SCONE operator's webhooks, like this one:

```text
Error from server (InternalError): error when creating ".sgxplugin-manifest.yaml": Internal error occurred: failed calling webhook "msgxplugin.kb.io": failed to call webhook: Post "https://scone-webhook-service.scone-system.svc:443/mutate-base-scone-cloud-v1beta1-sgxplugin?timeout=10s": context deadline exceeded
```

In case you are using minikube, please start it with embedded certificates: `minikube start --embed-certs`. By default, the `operator_controller` script deploys the cert-manager and waits until it becomes ready. It has a built-in default manifest, e.g., `https://github.com/cert-manager/cert-manager/releases/download/v1.12.4/cert-manager.yaml`. One can overwrite the default cert-manager by defining the environment variable `CERT_MANAGER`.

If the script is interrupted before the cert-manager is properly running, one might need to wait a few minutes for the cert-manager to become ready before starting `operator_controller` again. Set flag `--update` to reinstall the cert-manager. The `operator_controller` script detects an existing cert-manager by searching for a pod whose name contains the string `cert-manager`; it might wrongly reinstall the cert-manager if no such pod is found.

### LAS Unhealthy

Some applications change their user ID during startup. To provide such applications with access to SGX, we periodically probe that all users can access the SGX device from within containers. The custom resource `las` becomes unhealthy in case access to SGX is restricted. You can determine the status of LAS by executing:

```bash
kubectl get las
```

In case `las` is unhealthy, please check the diagnostic output of `kubectl describe las` for events and conditions that might have caused `las` to become unhealthy.

Sometimes, reboots of servers and VMs can cause this if the default permissions are not set properly. To fix this issue, you might need to change the permissions of device `/dev/sgx_enclave` to `0666` on the hosts and the Kubernetes VMs. Pods that have the wrong permissions set should be restarted. Alternatively, one could also change the permissions of `/dev/sgx_enclave` inside the pods.
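A quick way to verify the device mode described above is the following convenience sketch (assuming GNU/busybox `stat -c`; `check_device_mode` is not part of the SCONE tooling):

```bash
# Check whether a device node is world-readable/writable (mode 0666), which
# in-container SGX access requires. Convenience sketch, assumes `stat -c`.
check_device_mode() {
  dev="$1"
  mode=$(stat -c '%a' "$dev" 2>/dev/null) || { echo "missing: $dev"; return 1; }
  if [ "$mode" = "666" ]; then
    echo "ok: $dev is $mode"
  else
    echo "bad: $dev is $mode, expected 666 (try: chmod 0666 $dev)"
    return 1
  fi
}
```

Run `check_device_mode /dev/sgx_enclave` on each host and inside an affected pod.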