# Deploying & Reconciling the SCONE Operator
The SCONE Kubernetes operator facilitates a declarative description of SCONE-related custom resources. These custom resources include:

- SCONE CAS (`cas.services.scone.cloud`): deploys a high-availability CAS using a primary/backup approach,
- SCONE LAS (`las.base.scone.cloud`): deploys a local attestation service on all SGX-capable nodes of a Kubernetes cluster,
- SCONE SGXPlugin (`sgxplugins.base.scone.cloud`): identifies and labels all SGX-capable Kubernetes nodes and ensures that containers can use SGX on these nodes,
- SCONE signed policies (`signedpolicies.cas.scone.cloud`): security policies uploaded to CAS,
- SCONE signed and encrypted policies (`encryptedpolicies.cas.scone.cloud`), and
- confidential Vault (`vaults.services.scone.cloud`).
These custom resources are associated with controllers that bring these resources into, or keep them in, their target state. For each of these custom resources, a custom resource definition (CRD) specifies how to define a custom resource.
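To illustrate, a minimal custom resource might look as follows. The group and version are inferred from the CRD names above and the operator's webhook paths (`v1beta1`); the product-specific `spec` fields are omitted:

```yaml
# Illustrative shape only: a minimal SGXPlugin custom resource.
# apiVersion is inferred from the CRD name sgxplugins.base.scone.cloud
# and the operator's v1beta1 webhook paths; spec fields are omitted.
apiVersion: base.scone.cloud/v1beta1
kind: SGXPlugin
metadata:
  name: sgxplugin
```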
## Installation
We maintain a script and a Helm chart to install the SCONE Operator. The script installs

- the SCONE operator,
- LAS and the SGXPlugin, as well as
- our `kubectl provision` plugin.
The SCONE operator consists of

- a controller manager, and
- for each SCONE service and the SCONE policies,
    - a custom resource definition (CRD) of a custom resource (CR), and
    - a controller.
When the operator is deployed, its CRDs, controllers, and other Kubernetes objects are created, and the controller manager is started. The controller manager runs as a Kubernetes deployment and deploys mutating and validating webhooks at start-up. Once these have started, a custom resource of each kind can be deployed, which is made possible by the prior deployment of the corresponding CRDs.
Once a CR is created, deleted, or updated, the corresponding controller is notified and starts reconciling the CR. The reconciler, i.e., the controller, checks the current state of the CR, compares it to the desired state (i.e., its configuration), and takes the actions necessary to move the current state toward the desired state. Each such action triggers the reconciliation again, either automatically (for example, when the action updates the CR) or explicitly. This process continues until the two states are equal.
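The reconciliation loop just described can be sketched in a few lines of shell. This is purely illustrative: real controllers watch the Kubernetes API and issue API calls, not shell variables.

```bash
# Conceptual sketch of a reconcile loop (illustrative only; real controllers
# observe and mutate Kubernetes objects rather than shell variables).
current="Pending"    # observed state of the CR
desired="Running"    # target state derived from the CR's configuration
reconcile() {
  while [ "$current" != "$desired" ]; do
    # take one corrective action; here it simply moves the state to the
    # target, which corresponds to reconciliation being triggered again
    current="$desired"
  done
}
reconcile
echo "state: $current"
```

The loop terminates exactly when the observed state equals the desired state, mirroring the process described above.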
## TL;DR
A simple `operator_controller` script deploys and updates the SCONE Operator. If you use the SCONE image registry, you need to deploy image pull secrets. For this, set the following environment variables:

```bash
export REGISTRY_USERNAME=<your-name> # username of the registry service account
export REGISTRY_ACCESS_TOKEN=<your-access-token> # read access token
export REGISTRY_EMAIL=<your-email> # email address of the service account
export DCAP_KEY="00000000000000000000000000000000" # replace with your Intel DCAP API key - keep as is on Azure
```

You can get the DCAP API key from Intel. Your access token needs read access to the registry; for more details, please see the section Create an Access Token.
You can now deploy or update the SCONE operator as follows:
```bash
curl -fsSL https://raw.githubusercontent.com/scontain/SH/master/operator_controller | bash -s - --reconcile --update --plugin --verbose --dcap-api "$DCAP_KEY" --secret-operator --username $REGISTRY_USERNAME --access-token $REGISTRY_ACCESS_TOKEN --email $REGISTRY_EMAIL
```
Flags:

- `--reconcile`: Even without any of the above flags, the `operator_controller` outputs warnings if the operator or any of its dependencies is not in the desired state. With flag `--reconcile`, the script deploys and reconciles the SCONE operator in the Kubernetes cluster.
- `--update`: redeploys the operator and updates its image.
- `--secret-operator`: deploys a secrets operator (see below) that injects the SCONE Operator and CAS pull secrets into all namespaces.
- `--username`, `--access-token`, `--email`: define these if you want to `--reconcile` or `--update` the pull secrets.
- `--plugin`: deploys the SCONE `kubectl` plugin on your local machine. The script picks the last writable directory on the `PATH`; if there is no writable directory on the `PATH`, this fails. In that case, please set `--plugin-path` to define a directory where the operator controller should write the plugin. The name of the plugin is `kubectl-provision`.
- `--verbose`: displays progress information.
- `--only-operator`: by default, the `operator_controller` installs a LAS and SGXPlugin. Set flag `--only-operator` to install only the operator.
- `--no-sgxplugin`: set this flag in case another SGX plugin is already installed in the cluster.
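The directory search behind `--plugin` can be sketched in a few lines of shell. This is an illustrative reimplementation, not the actual `operator_controller` code; the function name is made up:

```bash
# Return the last writable directory on a PATH-like, colon-separated string,
# mimicking how --plugin chooses where to write kubectl-provision.
find_plugin_dir() {
  local IFS=':' d result=""
  for d in $1; do
    if [ -d "$d" ] && [ -w "$d" ]; then
      result="$d"   # keep overwriting: the last writable directory wins
    fi
  done
  printf '%s\n' "$result"
}

# Example search over a hypothetical PATH; only /tmp exists and is writable.
find_plugin_dir "/nonexistent:/tmp:/also-missing"
```

If the function prints an empty string, no writable directory was found, which corresponds to the case where you must pass `--plugin-path`.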
Typically, you want to keep all versions the same. The SCONE operator, SCONE SGXPlugin, and SCONE LAS can be upgraded via the operator controller using flag `--update`. A SCONE CAS needs to be updated using our `kubectl provision` plugin.
## Operator Controller Reference
```text
Usage: operator_controller [OPTIONS]

Objectives:
  - Checks if the SCONE operator and all its dependencies are available.
  - Tries to fix any issues it discovers if flag '--reconcile' is set.
  - Tries to update all components in case flag '--update' is set (even if everything is ok).
  - Creates a namespace for a service if flag '--create NAMESPACE' is set.
  - If '--verify-image-signatures' is provided, or if the Scontain container image repository
    is used, the signatures of the images used are verified.

Options:
  --reconcile | -r
    Try to fix all warnings that we discover.
    The default is to warn about potential issues only.
  --update | -u
    Try to update all dependencies of the SCONE operator,
    independently of whether they need fixing.
  -n | --namespace NAMESPACE
    The Kubernetes namespace in which the SCONE operator should be deployed on the cluster.
    Default value: "scone-system"
  -c | --create NAMESPACE
    Create namespace "NAMESPACE" for provisioning SCONE CAS (or another service).
  --username REGISTRY_USERNAME
    To create/update/fix the pull secrets ('sconeapps' and 'scone-operator-pull'),
    one needs to specify the user name, access token, and email of the registry.
    Signup for an account: https://sconedocs.github.io/registry/
  --access-token REGISTRY_ACCESS_TOKEN
    The access token of the pull secret.
  --email REGISTRY_EMAIL
    The email address belonging to the pull secret.
  --plugin
    Include the kubectl plugin in the reconciliation and updates.
  --plugin-path PATH
    Path where we should write the kubectl plugin binary. The path must be writeable.
    Default value: "/Users/christoffetzer/.local/bin/kubectl-provision"
    The prefix of the default value is the last path on your shell $PATH that is writeable.
    If none is writeable and you set --plugin, you must specify --plugin-path PATH.
  --secret-operator
    Check/Reconcile/Update the Secret Operator (used to inject Kubernetes Secrets into Kubernetes namespaces)
  --only-operator
    Only install the SCONE Operator (but no LAS, SGXPlugin, kubectl plugin)
  -v | --verbose
    Enable verbose output
  --debug
    Create debug image instead of a production image
  --set-version VERSION
    Set the version of the helm chart
  --no-sgxplugin
    Set this flag in case you do not want to install the SGXPlugin.
  --verify-image-signatures PUBLIC_KEY_PATH
    Path to the public key to use for verification of signed images.
    For the verification of signed images in the registry.scontain.com/scone.cloud
    repository, the public key does not need to be provided, and this
    option is ignored.
  --dcap-api | -d <DCAP API Key>
    DCAP API Key - required when provisioning LAS.
  --help
    Output this usage information and exit.

Default Configuration:
  - CERT_MANAGER=https://github.com/cert-manager/cert-manager/releases/download/v1.10.1/cert-manager.yaml
  - DEFAULT_NAMESPACE=scone-system
  - HELM_CHART=https://raw.githubusercontent.com/scontain/operator/main/scone-operator-5.8.0.tgz
  - LAS_MANIFEST=https://raw.githubusercontent.com/scontain/manifests/main/5.8.0/las.yaml
  - SGXPLUGIN_MANIFEST=https://raw.githubusercontent.com/scontain/manifests/main/5.8.0/sgxplugin.yaml
  - REGISTRY=registry.scontain.com
  - IMAGE_REPO=registry.scontain.com/scone.cloud
  - KUBECTLPLUGIN=https://raw.githubusercontent.com/scontain/SH/master/5.8.0/kubectl-provision
  - SECRET_OPERATOR_MANIFEST=https://raw.githubusercontent.com/scontain/manifests/main/5.8.0/secrets_operator.yaml
  - IMPS_HELM_CHART=banzaicloud-stable/imagepullsecrets
  - IMPS_HELM_REPO=banzaicloud-stable https://kubernetes-charts.banzaicloud.com
  - VERSION=5.8.0

You can overwrite the defaults by exporting these environment variables before executing this script.
```
## Manual Deployment of the Operator
!!! note "Use the Operator Controller to deploy the SCONE Operator"
    We do not recommend installing the SCONE Operator manually. This description explains the individual steps that need to be performed in case you want to customize the Operator Controller.
A set of prerequisites needs to be installed before the SCONE Operator itself. After the SCONE operator is up and running, each of the custom resources `SGXPlugin`, `LAS`, and `CAS` can be installed separately by deploying and creating a custom resource.
### Kubernetes Config
The first step is to ensure you have access to your Kubernetes cluster.
!!! note "KUBECONFIG"
    We assume you can access your Kubernetes cluster through your `$HOME/.kube/config` file or the `KUBECONFIG` environment variable.
### `cert-manager`
The `cert-manager` is a prerequisite of the SCONE operator. You can check if the `cert-manager` is installed using `kubectl`:
```bash
kubectl get pods -A | grep cert-manager
```

If no `cert-manager` pod is running, you can install `cert-manager` using `kubectl` or `helm`. Our recommendation, however, is to install the latest release using `kubectl` as follows:

```bash
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.10.1/cert-manager.yaml
```
Please refer to the official installation instructions for up-to-date version information.
Alternatively, you can install it together with the SCONE operator using `helm`: it can be enabled via command-line flags of the `helm install` command.
### Operator Namespace

Now you are ready to deploy the SCONE operator. This can be done using `helm`. With `helm`, you can specify the desired namespace on the command line. By default, we use the namespace `scone-system`. You can create the namespace by executing the following:

```bash
kubectl apply -f https://raw.githubusercontent.com/scontain/operator-samples/main/namespace.yaml
```
### Pull Secret

Authorization is required to pull container images from private registries. You need to create the Kubernetes secrets `scone-operator-pull` and `sconeapps` with the required credentials in the namespace `scone-system`.

The SCONE operator image is stored in a private repository. Hence, to deploy this image, one needs to pass the image pull credentials to Kubernetes. To do so, you must create the Kubernetes secret `scone-operator-pull`.
First, define your credentials, which includes generating an access token with scope `read_registry`, and set the environment variables:

```bash
export REGISTRY_USERNAME=<your-name> # username of the registry service account
export REGISTRY_ACCESS_TOKEN=<your-access-token> # read access token
export REGISTRY_EMAIL=<your-email> # email address of the service account
```

Then, create the Kubernetes secrets `scone-operator-pull` and `sconeapps`. In a simple setup, these secrets contain the same token. If you use multiple registries, you need to define a secret per registry: often, one might use different registries for the base SCONE images (SGXPlugin, LAS, CAS) and the application images (e.g., MariaDB, Nginx, etc.):
```bash
kubectl create secret docker-registry scone-operator-pull \
  --docker-server=registry.scontain.com \
  --docker-username=$REGISTRY_USERNAME \
  --docker-password=$REGISTRY_ACCESS_TOKEN \
  --docker-email=$REGISTRY_EMAIL \
  --namespace scone-system
```
and
```bash
kubectl create secret docker-registry sconeapps \
  --docker-server=registry.scontain.com \
  --docker-username=$REGISTRY_USERNAME \
  --docker-password=$REGISTRY_ACCESS_TOKEN \
  --docker-email=$REGISTRY_EMAIL \
  --namespace scone-system
```
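Under the hood, a `docker-registry` secret stores a base64-encoded `.dockerconfigjson` document. This sketch builds the core of that JSON locally with made-up credentials (illustrative only; the secret created by `kubectl` additionally records the username, password, and email):

```bash
# Build the "auth" entry of a .dockerconfigjson payload (illustrative
# credentials; Kubernetes base64-encodes the whole document once more).
REGISTRY_USERNAME="jane"
REGISTRY_ACCESS_TOKEN="glpat-example"
auth=$(printf '%s:%s' "$REGISTRY_USERNAME" "$REGISTRY_ACCESS_TOKEN" | base64)
printf '{"auths":{"registry.scontain.com":{"auth":"%s"}}}\n' "$auth"
```

Decoding the `auth` field yields `username:token`, which is what the kubelet presents to the registry when pulling images.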
### Deploy the SCONE Operator

!!! note
    You can find the up-to-date list of SCONE operator releases here: SCONE operator releases.

Now deploy the SCONE operator using `helm`:

```bash
helm install scone-operator https://github.com/scontain/operator/archive/refs/tags/v0.0.7.tar.gz --namespace scone-system
```
### Automatically Injecting Pull Secrets
The SCONE images require defining a pull secret. This can be inconvenient since a user would need to add the correct pull secret in each namespace that needs access to one of these images. Since the operator already defines this pull secret, one can automate the distribution of this secret to other namespaces with the help of a secrets operator.
```bash
helm repo add banzaicloud-stable https://kubernetes-charts.banzaicloud.com
helm install imps banzaicloud-stable/imagepullsecrets -n scone-system
```

Create a secret injector for secret `sconeapps`, i.e., inject it into all namespaces that request this secret by copying it from namespace `scone-system`:

```bash
kubectl apply -f https://raw.githubusercontent.com/scontain/operator-samples/main/secrets_operator.yaml
```
### Using `helm` to Install `cert-manager`

`cert-manager` can be installed together with the SCONE operator by adding the flag `--set cert-manager.enabled=true` to the above `helm install` command. However, take care not to run more than one instance of cert-manager in the same cluster, since it also manages non-namespaced resources. You can customize where cert-manager is installed using the command-line flag `--set cert-manager.namespace=somenamespace` and include the CRDs of cert-manager in the installation using `--set cert-manager.installCRDs=true`. We recommend installing it as described above and refer to the official cert-manager documentation for further information.
You can verify that the operator is running as it should using the following commands:
```bash
# Check the state of the deployment of the operator
kubectl get deployments -n scone-system scone-controller-manager

# Check the state of the pod of the deployment
kubectl get pods -n scone-system -l control-plane=controller-manager

# Check the log of the pod (use the name of the pod from the previous command)
export CONTROLLERPOD=$(kubectl get pods -n scone-system -l control-plane=controller-manager | grep scone-controller-manager | awk '{ print $1 }')
kubectl logs -n scone-system $CONTROLLERPOD
```
## Default Images

The SCONE Operator uses the following default images:

| Component | Image | Tag |
|---|---|---|
| SCONE Operator | `registry.scontain.com/scone.cloud/k8soperators` | `latest` |
| CAS | `registry.scontain.com/scone.cloud/cas` | `latest` |
| LAS | `registry.scontain.com/scone.cloud/las` | `latest` |
| SGXPlugin | `registry.scontain.com/scone.cloud/sgx-plugin` | `latest` |
| CAS Backup Controller | `registry.scontain.com/scone.cloud/backup-controller` | `latest` |
## Troubleshooting

We have seen a set of issues that cannot be resolved automatically by the SCONE Operator.
### Image Pull Error

Sometimes, we see image pull errors when starting SCONE-related container images. For example, executing the command

```bash
kubectl get pods -n scone-system
```

might result in the following output:

```text
NAME        READY   STATUS             RESTARTS   AGE
las-lxk95   0/2     ImagePullBackOff   0          25m
```
Please check that your pull secrets exist by executing

```bash
kubectl get secrets sconeapps scone-operator-pull -n scone-system
```

This should result in output like this:

```text
NAME                  TYPE                             DATA   AGE
sconeapps             kubernetes.io/dockerconfigjson   1      37m
scone-operator-pull   kubernetes.io/dockerconfigjson   1      37m
```

Most issues we have seen so far were caused by expired pull secrets, i.e., the access token stored in the secrets had expired. You can check the validity of your token by logging into the registry with the command `docker login registry.scontain.com`.
In case the two pull secrets do not exist, please check if the secret operator exists and is healthy.
You can retrieve a new token with scope `read_registry` by visiting `gitlab.scontain.com`. The easiest way to update the token in a cluster is to reconcile the SCONE operator. Please ensure the following environment variables are defined (see above for more details):

```bash
export REGISTRY_USERNAME=<your-name> # username of the registry service account
export REGISTRY_ACCESS_TOKEN=<your-access-token> # read access token
export REGISTRY_EMAIL=<your-email> # email address of the service account
export DCAP_KEY="00000000000000000000000000000000" # replace with your Intel DCAP API key - keep as is on Azure
```

and execute

```bash
curl -fsSL https://raw.githubusercontent.com/scontain/SH/master/operator_controller | bash -s - --reconcile --update --plugin --verbose --dcap-api "$DCAP_KEY" --secret-operator --username $REGISTRY_USERNAME --access-token $REGISTRY_ACCESS_TOKEN --email $REGISTRY_EMAIL
```
### SCONE Webhook Issues

One of the common problems during deployment of the SCONE Operator is that the Kubernetes cert-manager is not installed or is installed incorrectly. This can result in errors related to the SCONE operator's webhooks, like this one:

```text
Error from server (InternalError): error when creating ".sgxplugin-manifest.yaml": Internal error occurred: failed calling webhook "msgxplugin.kb.io": failed to call webhook: Post "https://scone-webhook-service.scone-system.svc:443/mutate-base-scone-cloud-v1beta1-sgxplugin?timeout=10s": context deadline exceeded
```
In case you are using `minikube`, please start it with embedded certificates: `minikube start --embed-certs`. By default, the `operator_controller` script deploys the cert-manager and waits until it becomes ready. The script has a built-in default version, such as `https://github.com/cert-manager/cert-manager/releases/download/v1.12.4/cert-manager.yaml`; you can override it by defining the environment variable `CERT_MANAGER`.

If the script is interrupted before the cert-manager is properly running, you might need to wait a few minutes for the cert-manager to become ready before starting `operator_controller` again. Please set flag `--update` to reinstall the cert-manager. The `operator_controller` script searches for an existing cert-manager by looking for a pod whose name contains the string `cert-manager`; if that pod is not found, the script might wrongly update the cert-manager.
### LAS Unhealthy

Some applications change their user ID during startup. To provide such applications with access to SGX, we periodically probe that all users can access the SGX device from within containers. The custom resource `las` becomes unhealthy in case access to SGX is restricted. You can determine the status of LAS by executing:

```bash
kubectl get las
```

In case `las` is unhealthy, please check the diagnostic output of `kubectl describe las` for events and conditions that might have caused `las` to become unhealthy.
Sometimes, reboots of servers and VMs can cause this if the default permissions are not set properly. To fix this issue, you might need to change the permissions of the device `/dev/sgx_enclave` to `0666` on the hosts and the Kubernetes VMs. Pods that have the wrong permissions set should be restarted. Alternatively, one could also change the permissions of `/dev/sgx_enclave` inside the pods.
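The permission fix can be scripted; here is a sketch that uses a stand-in file so it can run anywhere. On a real node, the path would be `/dev/sgx_enclave` and the command must run as root; the helper function name is made up:

```bash
# Ensure a device node is readable and writable by all users (mode 0666).
ensure_sgx_perms() {
  if [ -e "$1" ]; then
    chmod 0666 "$1"
  fi
}

# Demonstrate with a stand-in file instead of the real /dev/sgx_enclave.
touch /tmp/fake_sgx_enclave
chmod 0600 /tmp/fake_sgx_enclave
ensure_sgx_perms /tmp/fake_sgx_enclave
stat -c '%a' /tmp/fake_sgx_enclave
```

After the fix, remember to restart any pods that still see the old permissions.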