
Deploying CAS and Vault

We automated the steps to deploy CAS and Vault in production mode using a kubectl plugin.

kubectl provision is a plugin for kubectl which can be used to

  • create CAS instances
  • upgrade CAS to a new version
  • create a snapshot of a CAS database
  • create a CAS from a snapshot

and to

  • create a new confidential Vault instance
  • create a snapshot of the Vault database
  • create a Vault from a snapshot
  • upgrade Vault to a new version

To install this plugin, please follow these instructions.

Client State

When provisioning CAS and Vault, kubectl provision creates files, including some sensitive credentials, in directory $HOME/.cas. If you run kubectl provision on an SGX-capable computer, you can protect these files with the help of SCONE. Contact us if you need support with this.
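The CAS upgrade procedure needs the credentials stored in $HOME/.cas, so keeping a copy of this directory is worthwhile. The following bash sketch shows one way to archive it with owner-only permissions. The function name backup_cas_state and the archive naming scheme are hypothetical, and the sketch does not use SCONE to protect the files.

```shell
# Hypothetical helper: archive the client state of kubectl provision.
# $HOME/.cas contains sensitive credentials, so the archive is created
# with owner-only permissions (umask 077 results in mode 0600).
backup_cas_state() {
  local src="${1:-$HOME/.cas}"
  local dest_dir="${2:-.}"
  local archive
  if [ ! -d "$src" ]; then
    echo "error: $src does not exist" >&2
    return 1
  fi
  archive="$dest_dir/cas-client-state-$(date +%Y%m%d-%H%M%S).tar.gz"
  # The subshell keeps the restrictive umask from leaking into the caller.
  (
    umask 077
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")"
  ) || return 1
  echo "$archive"
}
```

For example, backup_cas_state "$HOME/.cas" /path/to/secure/location prints the path of the new archive. Store the archive somewhere at least as well protected as the original directory.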

TL;DR

CAS is provisioned using a kubectl plugin. This plugin is deployed or reconciled as part of the SCONE operator_controller deployment. To ensure that it is deployed, execute:

curl -fsSL https://raw.githubusercontent.com/scontain/SH/master/operator_controller | bash -s - \
--reconcile --plugin --verbose
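To check afterwards that the plugin is available, you can rely on how kubectl discovers plugins: `kubectl provision` resolves to an executable named kubectl-provision in your PATH. The following bash sketch checks for kubectl and the plugin and then prints the plugin version with --version. The function name check_provision_plugin is hypothetical.

```shell
# Hypothetical helper: check that kubectl and the provision plugin are available.
check_provision_plugin() {
  if ! command -v kubectl >/dev/null 2>&1; then
    echo "error: kubectl not found in PATH" >&2
    return 1
  fi
  # kubectl resolves `kubectl provision` to an executable named
  # kubectl-provision somewhere in PATH.
  if ! command -v kubectl-provision >/dev/null 2>&1; then
    echo "error: kubectl-provision not found in PATH; rerun the operator_controller command" >&2
    return 1
  fi
  # --version prints the plugin version and exits.
  kubectl provision --version
}
```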

To deploy a CAS instance named mycas in production mode in the default namespace, execute the following:

kubectl provision cas mycas --verbose

This will deploy a CAS release in the default Kubernetes namespace. It starts

  • a primary CAS instance with the name 'mycas',
  • backup CAS instances on each node of the Kubernetes cluster (but pods run only temporarily),
  • startup and liveness probes that monitor the primary CAS,
  • snapshots of the CAS database are periodically created and stored in a separate persistent volume /var/mnt/cas-database-snapshots-mycas-0,
  • an audit log of all CAS transactions, stored in /var/log/cas/audit/cas_audit.log, which can be used to verify these transactions.

Deploying SCONE CAS

SCONE CAS is a Configuration and Attestation Service, i.e., it attests and verifies a service before it provides the service with its configuration information.

If you do not specify a manifest, the plugin creates a manifest in which

  • a primary CAS with a given name CAS_NAME executes all requests,
  • any node in the Kubernetes cluster with SGX support serves as a backup, i.e., is permitted to start a new primary CAS instance in case the old primary CAS has failed,
  • liveness probes detect CAS failures and trigger a fail-over to another node in case of a CAS or node failure,
  • the database is stored in a persistent volume, and
  • database snapshots are created automatically.

To deploy a CAS with automatic database snapshots and liveness probes, use the SCONE kubectl plugin:

kubectl provision cas CAS_NAME [--namespace NAMESPACE] [--dcap-api APIKEY] [--verbose]
  • If --namespace NAMESPACE is provided, the CAS is installed in Kubernetes namespace NAMESPACE. Otherwise, CAS is installed in the default namespace. If the specified namespace does not exist, the plugin terminates with an error. Ensure that the namespace contains the required image pull secret.

  • You should always specify the DCAP API key. If it is not set, the plugin uses a default debug API key and issues a warning. This debug API key might stop working at any point in time. You can subscribe to the DCAP API via the Intel website.

  • When running on Azure, one can set APIKEY="00000000000000000000000000000000".

  • Use option --help to get more detailed usage information.

  • Use flag --verbose to print more detailed output.

This command deploys and provisions a CAS in the Kubernetes cluster that automatically fails over if the CAS crashes or does not start up on a particular node.
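The following bash sketch wraps the command above. It checks that the namespace exists before calling the plugin, which would otherwise terminate with an error, and it passes a DCAP API key when one is available. The function name provision_cas and the environment variable DCAP_API_KEY are hypothetical; only the plugin flags come from this page.

```shell
# Hypothetical wrapper around `kubectl provision cas`.
# Usage: provision_cas NAME [NAMESPACE] [DCAP_API_KEY]
provision_cas() {
  local name="$1" ns="${2:-default}" key="${3:-${DCAP_API_KEY:-}}"
  if [ -z "$name" ]; then
    echo "usage: provision_cas NAME [NAMESPACE] [DCAP_API_KEY]" >&2
    return 2
  fi
  # The plugin terminates with an error if the namespace does not exist;
  # checking first gives a clearer message.
  if ! kubectl get namespace "$ns" >/dev/null 2>&1; then
    echo "error: namespace '$ns' does not exist" >&2
    return 1
  fi
  if [ -z "$key" ]; then
    echo "warning: no DCAP API key given; the plugin will fall back to its default debug key" >&2
    kubectl provision cas "$name" --namespace "$ns" --verbose
  else
    kubectl provision cas "$name" --namespace "$ns" --dcap-api "$key" --verbose
  fi
}
```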

CAS Upgrade: --upgrade <VERSION>

Upgrading a CAS to a new software version requires a sequence of upgrade steps. The kubectl provision plugin performs these steps when the flag --upgrade <VERSION> is set. Suppose that you run CAS instance mycas of version 5.8.0 and want to upgrade to version 5.8.1. You would execute:

kubectl provision cas mycas --upgrade 5.8.1 --verbose

This upgrades mycas to version 5.8.1 and restarts it.
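Because the upgrade needs the owner credentials that were stored in $HOME/.cas when this CAS was provisioned, a script might check for this directory before starting. The following bash sketch does that and then passes the directory explicitly with --target. The function name upgrade_cas and the variable CAS_TARGET_DIR are hypothetical, and the check only confirms that the directory exists, not that it holds the right credentials.

```shell
# Hypothetical wrapper around `kubectl provision cas NAME --upgrade VERSION`.
# Usage: upgrade_cas NAME VERSION [NAMESPACE]
upgrade_cas() {
  local name="$1" version="$2" ns="${3:-default}"
  local target="${CAS_TARGET_DIR:-$HOME/.cas}"
  if [ -z "$name" ] || [ -z "$version" ]; then
    echo "usage: upgrade_cas NAME VERSION [NAMESPACE]" >&2
    return 2
  fi
  # The backup controller policy can only be updated with the owner
  # credentials created when this CAS was provisioned.
  if [ ! -d "$target" ]; then
    echo "error: $target not found; the upgrade needs the owner credentials of this CAS" >&2
    return 1
  fi
  kubectl provision cas "$name" --namespace "$ns" --target "$target" \
    --upgrade "$version" --verbose
}
```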

Details of the Upgrade Process

To upgrade a CAS instance, we need to perform the following steps:

  • We first need to register the new CAS version as a backup CAS. The CAS service uses a primary/backup scheme: one primary CAS executes the commands, and one of the backup CAS instances can take over in case the primary CAS crashes. Software upgrades are performed by first registering a new CAS as a backup:

  • We first update the backup controller policy, which is stored in the CAS. This requires that we have access to the credentials in $HOME/.cas that were used to provision this CAS instance, i.e., to take ownership of this CAS.

  • If the primary CAS fails during this process, a new CAS is created using the current version of the CAS. Therefore, the upgrade can only be performed if the CAS is healthy. If it is unhealthy, wait until the CAS becomes healthy again, or make sure that it does.
  • To register a new CAS version, we change the custom resource manifest of the CAS so that the backup image points to the new CAS version. The SCONE operator registers the new backup CAS instances with the primary CAS. The CAS instance remains unhealthy until all SGX-capable nodes in the Kubernetes cluster have registered the new CAS version.
  • After successfully registering the new backup CAS instances, i.e., the CAS is healthy again, we can switch to the new CAS version. To do so, we change the CAS custom resource's CAS image to point to the new CAS image. The SCONE operator will perform a rolling update of the CAS. We will wait until the CAS is healthy again, i.e., the CAS is updated.
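The plugin itself waits for the CAS to become healthy. If you script additional steps around an upgrade, a generic retry helper like the following bash sketch may be useful. It is not part of kubectl provision, and the function name wait_until is hypothetical.

```shell
# Generic retry helper (not part of kubectl provision):
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND every INTERVAL seconds until it succeeds or TIMEOUT is reached.
wait_until() {
  local timeout="$1" interval="$2" waited=0
  shift 2
  if [ "$interval" -lt 1 ]; then
    echo "error: interval must be at least 1 second" >&2
    return 2
  fi
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "error: timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

For example, wait_until 600 10 kubectl provision cas mycas --verify would retry for up to ten minutes. Using --verify as a health check is an assumption: this page does not state that --verify exits with a non-zero status while the CAS is unhealthy.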

Background: Database Recovery Snapshotting

Enabling snapshots instructs the CAS to regularly snapshot its database into a separate directory for disaster recovery purposes. Note that this feature is only intended to protect against storage (data-loss) failures! For protection against platform failures, e.g., a VM shutting down, we enable backup CAS instances by default.

Protecting against data loss in confidential computing requires synchronization with storage that guarantees durability. This typically happens via a network and introduces new synchronization concerns. Synchronizing the CAS database files directly may lead to a corrupt database snapshot, as CAS continuously modifies the database. Database Recovery Snapshotting makes it possible to synchronize a consistent copy of the CAS database.

The default snapshot period (if no databaseSnapshots.interval is specified) is 60 seconds. Snapshots are stored in databaseSnapshots.targetDirectory; the default is /var/mnt/cas-database-snapshots. In each interval, CAS copies a consistent snapshot of its database (cas.db) and database key store (cas.key-store) into a new subdirectory. If the CAS database remains unchanged, no snapshot is created.
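To pick the most recent snapshot from the target directory, e.g., before copying it elsewhere, you could use a bash sketch like the following. It assumes that it runs where the snapshot volume is mounted. Since this page does not specify how the subdirectories are named, it selects the newest subdirectory by modification time that contains both cas.db and cas.key-store. The function name latest_cas_snapshot is hypothetical.

```shell
# Hypothetical helper: print the newest complete snapshot subdirectory.
# Usage: latest_cas_snapshot [TARGET_DIRECTORY]
latest_cas_snapshot() {
  local dir="${1:-/var/mnt/cas-database-snapshots}" newest="" snap
  for snap in "$dir"/*/; do
    # Skip anything that is not a complete snapshot.
    [ -f "${snap}cas.db" ] && [ -f "${snap}cas.key-store" ] || continue
    # -nt compares modification times; subdirectory names are not interpreted.
    if [ -z "$newest" ] || [ "$snap" -nt "$newest" ]; then
      newest="$snap"
    fi
  done
  if [ -z "$newest" ]; then
    echo "error: no complete snapshot found in $dir" >&2
    return 1
  fi
  echo "${newest%/}"
}
```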

Warning

The Database Recovery Snapshotting feature is not a replacement for the CAS backup feature! The database snapshots cannot be restored unless the machine you intend to restore them on has previously been registered as a backup target using the SCONE CLI. Database snapshots are a safer option than copying the database files directly, as CAS puts the database into a read-only mode while creating a snapshot, ensuring consistency of the backed-up files. You should back up the latest snapshots, e.g., by pushing them to an object store.
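As a starting point for backing up snapshots, the following bash sketch packs one snapshot directory into a compressed archive and records a SHA-256 checksum so that the copy can be verified after transfer. The function and archive names are hypothetical, and the upload to an object store is left to your own tooling.

```shell
# Hypothetical helper: archive one snapshot directory plus a checksum file.
# Usage: archive_cas_snapshot SNAPSHOT_DIR [OUTPUT_DIR]
archive_cas_snapshot() {
  local snap="$1" out_dir="${2:-.}"
  local name archive
  if [ ! -f "$snap/cas.db" ] || [ ! -f "$snap/cas.key-store" ]; then
    echo "error: $snap is not a complete snapshot" >&2
    return 1
  fi
  name="$(basename "$snap")"
  archive="$out_dir/cas-snapshot-$name.tar.gz"
  tar -czf "$archive" -C "$snap" cas.db cas.key-store || return 1
  # Record a checksum (relative file name) so the copy can be verified later.
  if command -v sha256sum >/dev/null 2>&1; then
    (cd "$out_dir" && sha256sum "$(basename "$archive")") > "$archive.sha256"
  else
    (cd "$out_dir" && shasum -a 256 "$(basename "$archive")") > "$archive.sha256"
  fi
  echo "$archive"
}
```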

Note

If snapshot creation fails, CAS will be stopped with an error and return a non-zero exit code.
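Since CAS stops with a non-zero exit code when snapshot creation fails, checking the last termination state of the CAS container is one way to notice this. The following bash sketch reads the exit codes from the pod status with a kubectl JSONPath expression and, if a non-zero code is recorded, prints the tail of the previous container's log. The pod name mycas-0 is an assumption based on the snapshot volume name shown in the TL;DR, and the function name is hypothetical.

```shell
# Hypothetical helper: report a non-zero last exit code of the CAS pod.
# Usage: check_cas_termination [POD] [NAMESPACE]
check_cas_termination() {
  local pod="${1:-mycas-0}" ns="${2:-default}" codes code
  codes="$(kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.exitCode}')" || return 1
  for code in $codes; do
    if [ "$code" != "0" ]; then
      echo "a container in $pod last terminated with exit code $code" >&2
      # Show the log of the previous (terminated) container instance.
      kubectl logs "$pod" -n "$ns" --previous --tail=50
      return 2
    fi
  done
  echo "no non-zero termination recorded for $pod"
}
```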

Create a Vault Instance

To deploy a Vault instance myvault in a namespace mynamespace, you could first define environment variables, and then create the vault with the help of the kubectl provision plugin:

export VAULT_NAME="myvault"
export NAMESPACE="mynamespace"

kubectl provision vault $VAULT_NAME -n $NAMESPACE --verbose

This command assigns a random OWNER_ID to this Vault. This mechanism ensures that each Vault instance is associated with a unique CAS namespace, even if a Vault with the same name was already provisioned in the same namespace.
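If a script should continue only once the Vault server is running, it can wait for the Vault pod to become ready after provisioning. The following bash sketch assumes that the Vault server pod is named <name>-0, as in the kubectl exec example further down this page. Because kubectl wait fails for a pod that does not exist yet, it first polls for the pod. The function name is hypothetical.

```shell
# Hypothetical helper: provision a Vault and wait until its pod is Ready.
# Usage: create_vault_and_wait NAME [NAMESPACE] [TIMEOUT]
create_vault_and_wait() {
  local name="$1" ns="${2:-default}" timeout="${3:-300s}" i
  if [ -z "$name" ]; then
    echo "usage: create_vault_and_wait NAME [NAMESPACE] [TIMEOUT]" >&2
    return 2
  fi
  kubectl provision vault "$name" -n "$ns" --verbose || return 1
  # Assumption: the Vault server pod is named <name>-0. Poll for it
  # (up to 60 x 5 seconds) because kubectl wait fails if it does not exist.
  for ((i = 0; i < 60; i++)); do
    kubectl get "pod/$name-0" -n "$ns" >/dev/null 2>&1 && break
    sleep 5
  done
  kubectl wait --for=condition=Ready "pod/$name-0" -n "$ns" --timeout="$timeout"
}
```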

You can query the state of your Vault by executing the following:

kubectl get vault $VAULT_NAME -n $NAMESPACE

For a more detailed description of the Vault, execute the following:

kubectl describe vault $VAULT_NAME -n $NAMESPACE

Vault Demo Client

To give a confidential Vault client access to Vault, you first need to determine your owner ID. We can query Kubernetes to determine the owner ID:

export OWNER_ID=$(kubectl get vault $VAULT_NAME -n $NAMESPACE -ojsonpath='{.spec.server.extraEnvironmentVars.OWNER_ID}')
echo export OWNER_ID=$OWNER_ID

We can now create a client CAS policy using the kubectl plugin:

kubectl provision vault $VAULT_NAME -n $NAMESPACE --vault-client

This will use the policy from https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.5/vault-demo-client-policy.yaml. You can download this policy and modify it for the confidential Vault clients that need to get access to this Vault instance.
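The download-and-modify workflow can be scripted with the documented --vault-client and --filename flags. The following bash sketch splits it into two hypothetical helpers: one fetches the demo policy (the default URL is the one given above), and the other uploads your edited copy. The function names and the default local file name are assumptions.

```shell
# Hypothetical helper: download the demo client policy for editing.
# Usage: fetch_vault_client_policy [OUTPUT_FILE] [URL]
fetch_vault_client_policy() {
  local out="${1:-vault-client-policy.yaml}"
  local url="${2:-https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.5/vault-demo-client-policy.yaml}"
  curl -fsSL "$url" -o "$out"
}

# Hypothetical helper: upload an edited client policy for a Vault instance.
# Usage: upload_vault_client_policy VAULT_NAME NAMESPACE POLICY_FILE
upload_vault_client_policy() {
  local name="$1" ns="$2" policy="$3"
  if [ ! -s "$policy" ]; then
    echo "error: policy file '$policy' is missing or empty" >&2
    return 1
  fi
  kubectl provision vault "$name" -n "$ns" --vault-client --filename "$policy"
}
```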

To inspect the Vault container interactively, you can log into it:

kubectl exec -it $VAULT_NAME-0 -n $NAMESPACE -- bash

To execute the demo client from your workstation, first make sure that OWNER_ID, i.e., the random ID that identifies this Vault, is set in your shell:

export OWNER_ID=$(kubectl get vault $VAULT_NAME -n $NAMESPACE -ojsonpath='{.spec.server.extraEnvironmentVars.OWNER_ID}')
echo "export OWNER_ID=$OWNER_ID"

Then run the Vault CLI inside the Vault container:

kubectl exec -it $VAULT_NAME-0 -n $NAMESPACE -- bash -c "SCONE_CONFIG_ID=owner-$OWNER_ID/demo-client/get-kv-secret SCONE_HEAP=1G vault"

This will output a key/value pair retrieved from this Vault instance.

Upgrade of Vault

Upgrading a Vault instance to a new software version requires a sequence of upgrade steps. The kubectl provision plugin performs these steps when the flag --upgrade <VERSION> is set. Suppose that you run Vault instance myvault of version 5.8.0 and want to upgrade to version 5.8.1. You would execute the following:

kubectl provision vault myvault --upgrade 5.8.1 --verbose

This upgrades myvault to version 5.8.1 and restarts it.
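After an upgrade, you may want to check the set-up of the Vault instance with the documented --verify option. The following bash sketch runs the upgrade and, only if it succeeds, the verification. The function name upgrade_vault is hypothetical.

```shell
# Hypothetical wrapper: upgrade a Vault instance, then verify its set-up.
# Usage: upgrade_vault NAME VERSION [NAMESPACE]
upgrade_vault() {
  local name="$1" version="$2" ns="${3:-default}"
  if [ -z "$name" ] || [ -z "$version" ]; then
    echo "usage: upgrade_vault NAME VERSION [NAMESPACE]" >&2
    return 2
  fi
  kubectl provision vault "$name" -n "$ns" --upgrade "$version" --verbose || return 1
  # --verify checks the set-up of the specified Vault instance.
  kubectl provision vault "$name" -n "$ns" --verify
}
```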

Configuration Options

Attestation: --dcap-api <API KEY>

Attestation ensures that

  • the correct application code executes in an encrypted memory region that only the application code can access.
  • the CPU hardware and firmware are up-to-date, and that the encrypted memory region is indeed provided by a CPU (and not by some simulation).

To attest CAS and Vault, we might need to access the Intel API directly in some clouds. In other clouds, we get all the required information directly from the cloud. In all cases, we verify that the information is trusted, i.e., signed by a key of the CPU manufacturer.

CAS Owner: --owner-config <FILE>

kubectl provision defines a default CAS owner config. We substitute the shell variables in this file before provisioning the CAS with this owner config. If you have specific requirements regarding the owner config, provide your own file via option --owner-config <FILE>.
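Since shell variables in the owner config are substituted, it can help to know which variables a custom owner config references before passing it via --owner-config. The following bash sketch lists variable names written as $VAR or ${VAR} and reports those not exported in your current environment. The function names are hypothetical, and some listed variables may be set by the plugin itself, so a "not exported" report is informational only.

```shell
# Hypothetical helper: list variable names referenced as $VAR or ${VAR}.
owner_config_vars() {
  grep -oE '[$][{]?[A-Za-z_][A-Za-z0-9_]*' "$1" \
    | sed -E 's/^[$][{]?//' \
    | LC_ALL=C sort -u
}

# Hypothetical helper: report referenced variables missing from the environment.
check_owner_config_vars() {
  local var missing=0
  while read -r var; do
    # printenv only sees exported variables, i.e., what a child process sees.
    if ! printenv "$var" >/dev/null 2>&1; then
      echo "not exported: $var"
      missing=1
    fi
  done < <(owner_config_vars "$1")
  return "$missing"
}
```

For example, run check_owner_config_vars my-owner-config.yaml before kubectl provision cas mycas --owner-config my-owner-config.yaml.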

CAS Provisioning: --is-provisioned

When a CAS starts up for the first time, it is unprovisioned, i.e., it has no owner yet. This CAS does not accept any requests until it is provisioned. During provisioning, one sets the owner of the CAS. A CAS can have only one owner, i.e., only the first provisioning of a CAS succeeds.

The owner of a CAS cannot access any secrets or policies of the CAS. The owner can determine which nodes can take over the CAS in case of a failure. The owner can also decide when to upgrade the CAS, i.e., install a new version of the CAS. kubectl provision supports upgrades with the help of option --upgrade.
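Because only the first provisioning of a CAS succeeds, a script that may run repeatedly can check --is-provisioned first. The following bash sketch provisions the CAS only if that check fails. The function name ensure_cas is hypothetical.

```shell
# Hypothetical helper: provision a CAS only if it is not provisioned yet.
# Usage: ensure_cas NAME [NAMESPACE]
ensure_cas() {
  local name="$1" ns="${2:-default}"
  if [ -z "$name" ]; then
    echo "usage: ensure_cas NAME [NAMESPACE]" >&2
    return 2
  fi
  # --is-provisioned exits with an error if the CAS was not yet provisioned.
  if kubectl provision cas "$name" -n "$ns" --is-provisioned >/dev/null 2>&1; then
    echo "CAS $name in namespace $ns is already provisioned"
  else
    kubectl provision cas "$name" -n "$ns" --verbose
  fi
}
```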

Reference: kubectl provision

Usage:
  kubectl provision SVC [NAME] [--namespace <kubernetes-namespace>] [--dcap-api <API Key>] [--owner-config <owner config>] [--verbose] [--help]

Arguments:
  Service to provision: SVC = cas | vault
    - cas:   provision CAS instance using the SCONE operator
    - vault: provision a confidential Vault instance using the SCONE operator. 
             Uses by default CAS instance cas. If no cas named cas exists, it is
             also created and provisioned, together with the vault. If such a cas
             already exists, it is not provisioned.

  Name of the service: NAME
    - If no name is specified, we set NAME=SVC

  Find more information at: https://sconedocs.github.io/5_kubectl/

Options:
    -n | --namespace
                  The Kubernetes namespace in which the service should be deployed on the cluster.
                  Default value: "default"
    --dcap-api | -d
                  DCAP API Key - define this if your cloud provider does not provide DCAP caching service. 
                  The default value is "00000000000000000000000000000000", i.e., this only works if your cloud provider provides us with the necessary attestation collaterals.
    --owner-config | -o
                  Provide a specific owner config when provisioning the CAS instance.
                  By default, we provision for a NodePort. We currently do not support
                  providing an owner config for LoadBalancer services.
    --target
                  Specify target directory for generated manifests and owner IDs. Default path="$HOME/.cas".
    --no-backup
                  Create and provision a cas with the backup-controller disabled.
    -v | --verbose
                  Enable verbose output
    --debug | debug_short_flag
                  Enable debug mode
    --webhook <URL>
                  Forward entries of the CAS audit log to the given URL
    --manifests-dir <FILE/URL>
                  File or url of a directory that contains the default files to apply
                  Default: https://raw.githubusercontent.com/scontain/manifests/main
    --image-registry <IMAGE REGISTRY URL>
                  Url of an image registry containing the images to be used
                  Default: registry.scontain.com/scone.cloud
    --filename | -f <FILE>
                  file or URL that contains the manifest to apply
                  - default Vault manifest is https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.8/vault.yaml
                  - default CAS manifest is https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.8/cas.yaml
                  - default Vault verifier manifest is https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.8/vault-verifier.yaml
    --is-provisioned
                  Checks if CAS is already provisioned and exists: Exits with an error in case it was not yet provisioned.
    --vault-client
                  Upload Vault client policy to CAS: specify policy with flag --filename. Default policy is specified by VAULT_DEMO_CLIENT_POLICY_URL.
                  Default policy is https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.8/vault-demo-client-policy.yaml
    --verify
                  Verify the set up of the specified CAS or Vault instance.
    --print-public-keys
                  - SVC==cas, it prints the CAS Key, the CAS Software Key and the CAS encryption key.
                  - SVC==vault, it prints the public key of the Vault.
    --cas
                  When provisioning vault, we use the specified cas. If not specified, we use CAS 'cas'.
                  For now, the CAS must be in the same Kubernetes cluster and the same namespace as the vault.
    --image-overwrite <IMAGE>
                  Replace the CAS image by the given image - mainly used for testing.
    --set-version <VERSION>
                  Set the version of CAS
    --local-backup
                  Take a snapshot of the encrypted CAS database and store it in the local filesystem.
    --cas-database-recovery <SNAPSHOT>
                  Create a new CAS instance and start with existing CAS database in directory <SNAPSHOT>.
    --set-tolerations "<TOLERATIONS>"
                  Sets the tolerations, separated by spaces, that we permit when attesting SCONE CAS.
                  Overwrites environment variable SGX_TOLERATIONS. Default is --accept-configuration-needed --accept-group-out-of-date --accept-sw-hardening-needed
                  Example: "--accept-group-out-of-date --accept-sw-hardening-needed --accept-configuration-needed"
                  See https://sconedocs.github.io/CAS_cli/#scone-cas-attest for more details.
    --upgrade <VERSION>
                  Perform software upgrade of CAS or Vault. 
                  For CAS, this will perform the following steps:
                  1. Update the policy of the backup controller (requires owner credentials)
                  2. Upgrade the backup controller by updating the CAS custom resource manifest.
                  3. Upgrade the CAS service by updating the CAS image.
    --help
                  Output this usage information and exit.
    --version
                  Print version (5.8.0-rc.8) and exit.

Current Configuration: 
  - VERSION="5.8.0-rc.8"
  - MANIFESTS_URL="https://raw.githubusercontent.com/scontain/manifests/main"
  - IMAGE_REPO="registry.scontain.com/scone.cloud"
  - IMAGE_PREFIX=""
  - NAMESPACE="default"
  - DCAP_KEY="aecd5ebb682346028d60c36131eb2d92"
  - TARGET_DIR="$HOME/.cas"
  - VAULT_MANIFEST_URL="https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.8/vault.yaml" # Vault Manifest
  - VAULT_VERIFIER_MANIFEST_URL="https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.8/vault-verifier.yaml" # Vault Verifier Manifest
  - VAULT_POLICY_URL="https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.8/vault-policy.yaml" # CAS policy for Vault
  - VAULT_VERIFY_POLICY_URL="https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.8/vault-verify-policy.yaml" # CAS verification policy for Vault
  - VAULT_DEMO_CLIENT_POLICY_URL="https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.8/vault-demo-client-policy.yaml" # demo policy for a Vault client
  - CAS_MANIFEST_URL="https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.8/cas.yaml"
  - CAS_PROVISIONING_URL="https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.8/cas_provisioning.yaml"
  - CAS_BACKUP_POLICY_URL="https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.8/backup_policy.yaml"
  - SGX_TOLERATIONS="--accept-configuration-needed --accept-group-out-of-date --accept-sw-hardening-needed --isvprodid 41316 --isvsvn 5 --mrsigner 195e5a6df987d6a515dd083750c1ea352283f8364d3ec9142b0d593988c6ed2d"
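The --local-backup and --cas-database-recovery options can be combined into a simple backup and recovery workflow. The following bash sketch wraps both. The function names are hypothetical. This reference does not state where --local-backup stores the snapshot in the local filesystem, so the restore function takes the snapshot directory as an explicit argument.

```shell
# Hypothetical helper: take a local snapshot of the encrypted CAS database.
# Usage: cas_local_backup NAME [NAMESPACE]
cas_local_backup() {
  local name="$1" ns="${2:-default}"
  kubectl provision cas "$name" -n "$ns" --local-backup --verbose
}

# Hypothetical helper: start a new CAS instance from an existing snapshot.
# Usage: cas_restore_from_snapshot NEW_NAME SNAPSHOT_DIR [NAMESPACE]
cas_restore_from_snapshot() {
  local new_name="$1" snapshot="$2" ns="${3:-default}"
  if [ ! -d "$snapshot" ]; then
    echo "error: snapshot directory '$snapshot' not found" >&2
    return 1
  fi
  kubectl provision cas "$new_name" -n "$ns" --cas-database-recovery "$snapshot" --verbose
}
```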