Deploying CAS and Vault
We automated the steps to deploy CAS and Vault in production mode using a kubectl plugin.
kubectl provision is a plugin for kubectl that can be used to

- create CAS instances,
- upgrade CAS to a new version,
- create a snapshot of a CAS database, and
- create a CAS from a snapshot,

and to

- create a new confidential Vault instance,
- create a snapshot of the Vault database,
- create a Vault from a snapshot, and
- upgrade Vault to a new version.
To install this plugin, please follow these instructions.
Client State
When provisioning CAS and Vault, the kubectl provision plugin will create files - including some sensitive credentials - in directory $HOME/.cas. In case you use an SGX-capable computer to execute kubectl provision, you can protect these files with the help of SCONE. Contact us in case you need support with this.
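If you do not protect these files with SCONE, you should at least restrict access to them. A minimal, generic sketch (standard shell commands, not SCONE-specific):

# Restrict the credentials created by kubectl provision to the current user.
chmod -R go-rwx "$HOME/.cas"
ls -la "$HOME/.cas"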
TL;DR
The provisioning of CAS is performed using a kubectl plugin. This plugin is deployed or reconciled as part of the SCONE operator_controller deployment. You can ensure that it is deployed by executing:
curl -fsSL https://raw.githubusercontent.com/scontain/SH/master/operator_controller | bash -s - \
--reconcile --plugin --verbose
To deploy a CAS instance named mycas in production mode in the default namespace, execute the following:
kubectl provision cas mycas --verbose
This will deploy a CAS release in the default Kubernetes namespace. It starts

- a primary CAS instance with the name mycas,
- backup CAS instances on each node of the Kubernetes cluster (their pods run only temporarily), and
- startup and liveness probes that monitor the primary CAS.

In addition,

- snapshots of the CAS database are periodically created and stored in a separate persistent volume (/var/mnt/cas-database-snapshots-mycas-0), and
- an audit log that permits verifying all CAS transactions is stored in /var/log/cas/audit/cas_audit.log.
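To verify that the instance came up, you can inspect the CAS custom resource and its pods. A quick sketch (it assumes the CAS custom resource is exposed to kubectl as cas; adjust the names to your setup):

# Check the CAS custom resource and the pods that belong to this instance.
kubectl get cas mycas
kubectl get pods | grep mycas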
Deploying SCONE CAS
SCONE CAS is a Configuration and Attestation Service, i.e., it attests and verifies a service before it provides the service with its configuration information.
If you do not specify a manifest, the plugin will create a manifest in which

- a primary CAS with a given name CAS_NAME executes all requests,
- any node in the Kubernetes cluster with SGX support serves as a backup, i.e., is permitted to start a new primary CAS instance in case the old primary CAS has failed,
- liveness probes detect CAS failures and trigger a fail-over to another node in case of a CAS or node failure,
- the database is stored in a persistent volume, and
- automatic database snapshots are created.

To deploy such a CAS, you can use the SCONE kubectl plugin:
kubectl provision cas CAS_NAME [--namespace NAMESPACE] [--dcap-api APIKEY] [--verbose]
- If a --namespace NAMESPACE is provided, the CAS is installed in Kubernetes namespace NAMESPACE. Otherwise, CAS is installed in the default namespace. If the specified namespace does not exist, the plugin will terminate with an error. Ensure that the namespace has the pull secret.
- The DCAP API key should always be specified. We use a default API key and issue a warning if it is not set. This debug API key might stop working at any point in time. One can subscribe to the DCAP API via the Intel website.
- When running on Azure, one can set APIKEY="00000000000000000000000000000000".
- Use option --help to get more detailed usage information.
- Use flag --verbose to print more verbose output.
This command will deploy and provision a CAS in the Kubernetes cluster that automatically fails over in case the CAS crashes or the CAS does not start up on a particular node.
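For example, to deploy a CAS named mycas into an existing namespace scone-system with your own DCAP API key (all three values are placeholders), you could run:

# The namespace must already exist and contain the required pull secret.
export APIKEY="<your DCAP API key>"
kubectl provision cas mycas --namespace scone-system --dcap-api "$APIKEY" --verbose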
CAS Upgrade: --upgrade <VERSION>
Upgrading a CAS to a new software version requires a set of upgrade steps. These steps are performed by the kubectl provision plugin when the flag --upgrade <VERSION> is set. Consider that you run CAS instance mycas of version 5.8.0 and you want to upgrade to version 5.8.1. You would execute:
kubectl provision cas mycas --upgrade 5.8.1 --verbose
This upgrades the CAS to the new version and restarts it.
Details of the Upgrade Process
To upgrade a CAS instance, we need to perform the following steps:

- We first need to register the new CAS as a backup CAS. The CAS service uses a primary/backup scheme: one primary executes the commands, and one of the backup CAS instances can take over in case the primary CAS crashes. Software upgrades are performed by registering the new CAS as a backup first:
    - We first update the backup controller policy, which is stored in the CAS. This requires access to the credentials in $HOME/.cas that were used to provision this CAS instance, i.e., to take ownership of this CAS.
    - If the primary CAS fails during this process, a new CAS is created using the current version of the CAS. Therefore, the upgrade can only be performed if the CAS is healthy. If it is unhealthy, please wait until / ensure that the CAS becomes healthy.
- To register a new CAS version, we change the custom resource manifest of the CAS to update the backup image to point to the new CAS version. The SCONE operator will register the new backup CAS instances with the primary CAS. The CAS instance will be unhealthy until all SGX-capable nodes in the Kubernetes cluster have registered the new CAS version.
- After successfully registering the new backup CAS instances, i.e., when the CAS is healthy again, we can switch to the new CAS version. To do so, we change the CAS image in the CAS custom resource to point to the new CAS image. The SCONE operator will perform a rolling update of the CAS. We wait until the CAS is healthy again, i.e., the CAS is updated.
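Because the upgrade only proceeds while the CAS is healthy, it can help to watch the CAS custom resource during the upgrade. A sketch (it assumes the CAS custom resource is exposed to kubectl as cas; the exact status fields depend on the operator version):

# Watch the CAS custom resource while the upgrade is in progress.
kubectl get cas mycas -w
# Inspect the full resource, e.g., to see which images are currently configured.
kubectl get cas mycas -o yaml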
Background: Database Recovery Snapshotting
Enabling snapshots instructs the CAS to regularly snapshot its database into a separate directory for disaster recovery purposes. Note that this feature is supposed to protect only against storage (data-loss) failures! For protection against platform failures, e.g., a VM shutting down, we enable backup CAS instances by default.
Protecting against data loss in confidential computing requires synchronization with storage that guarantees durability. This will typically happen via a network and introduces new synchronization concerns. Synchronizing the CAS database files directly may lead to a corrupt database snapshot as CAS continuously modifies the database. Database Recovery Snapshotting allows the synchronization of a consistent CAS database.
The default period (in case no databaseSnapshots.interval is specified) is 60 seconds.
The database snapshots are stored in the directory specified by databaseSnapshots.targetDirectory. The default is /var/mnt/cas-database-snapshots.
For each interval, CAS will copy a consistent snapshot of its database (cas.db) and database key store (cas.key-store) into a new subdirectory. If the CAS database remains unchanged, no snapshot will be created.
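These settings are part of the CAS custom resource. As a sketch only (the field path under .spec and the value format are assumptions; check the manifest generated by kubectl provision before using this), the interval and target directory could be adjusted like this:

# Hypothetical field path and value format - verify against your generated CAS manifest.
kubectl patch cas mycas --type merge \
  -p '{"spec":{"databaseSnapshots":{"interval":"60s","targetDirectory":"/var/mnt/cas-database-snapshots"}}}'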
Warning
The Database Recovery Snapshotting feature is not a replacement for the CAS backup feature! The database snapshots cannot be restored unless the machine you intend to restore it on has previously been registered as a backup target using the SCONE CLI. Database snapshots are a safer option than copying the database files directly, as CAS will put the database into a read-only mode when creating a snapshot, ensuring consistency of the backed-up files. You should back up the latest snapshots, e.g., by pushing them to an object store.
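For example, a periodic job could push the most recent snapshot directory to an object store. A sketch using the AWS CLI (the snapshot path, bucket name, and tooling are placeholders; any durable object store works):

# Copy the most recent database snapshot to an object store.
SNAPSHOT_DIR=/var/mnt/cas-database-snapshots-mycas-0      # see databaseSnapshots.targetDirectory
LATEST=$(ls -1dt "$SNAPSHOT_DIR"/*/ | head -n 1)          # newest snapshot subdirectory
aws s3 sync "$LATEST" "s3://my-cas-backups/$(basename "$LATEST")"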
Note
If snapshot creation fails, CAS will be stopped with an error and return a non-zero exit code.
Create a Vault Instance
To deploy a Vault instance myvault in a namespace mynamespace, you could first define environment variables and then create the Vault with the help of the kubectl provision plugin:
export VAULT_NAME="myvault"
export NAMESPACE="mynamespace"
kubectl provision vault $VAULT_NAME -n $NAMESPACE --verbose
This command will assign this Vault a random OWNER_ID - this is a mechanism to ensure that each Vault instance is associated with a unique CAS namespace (even if we had already provisioned a Vault with the same name in the same namespace).
You can query the state of your Vault by executing the following:
kubectl get vault $VAULT_NAME -n $NAMESPACE
For a more detailed description of the Vault, you can, for example, describe the resource:
kubectl describe vault $VAULT_NAME -n $NAMESPACE
Vault Demo Client
To give a confidential Vault client access to Vault, you first need to determine your owner ID. We can query Kubernetes to determine it:
export OWNER_ID=$(kubectl get vault $VAULT_NAME -n $NAMESPACE -ojsonpath='{.spec.server.extraEnvironmentVars.OWNER_ID}')
echo export OWNER_ID=$OWNER_ID
We can now create a client CAS policy using the kubectl plugin:
kubectl provision vault $VAULT_NAME -n $NAMESPACE --vault-client
This will use the policy from https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.8/vault-demo-client-policy.yaml. You can download this policy and modify it for the confidential Vault clients that need to get access to this Vault instance.
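For example, you could download the demo policy, adapt it to your own clients, and upload the adapted version with the documented --vault-client and --filename flags (the local file name is a placeholder):

# Download the demo client policy, adapt it, and upload the adapted version.
curl -fsSL https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.8/vault-demo-client-policy.yaml -o my-vault-client-policy.yaml
# ... edit my-vault-client-policy.yaml for your confidential Vault clients ...
kubectl provision vault $VAULT_NAME -n $NAMESPACE --vault-client --filename my-vault-client-policy.yaml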
You can execute the demo client as follows. Log into the Vault container:
kubectl exec -it $VAULT_NAME-0 -n $NAMESPACE -- bash
You can determine the "OWNER_ID", i.e., a random ID that identifies this vault as follows:
export OWNER_ID=$(kubectl get vault $VAULT_NAME -n $NAMESPACE -ojsonpath='{.spec.server.extraEnvironmentVars.OWNER_ID}')
echo "export OWNER_ID=$OWNER_ID"
And run the Vault CLI inside of this container:
kubectl exec -it $VAULT_NAME-0 -n $NAMESPACE -- bash -c "SCONE_CONFIG_ID=owner-$OWNER_ID/demo-client/get-kv-secret SCONE_HEAP=1G vault"
This will output a key/value pair retrieved from this Vault instance.
Upgrade of Vault
Upgrading a Vault instance to a new software version requires to follow a set of upgrade steps. These steps are performed by the kubectl provision
plugin when the flag --upgrade <VERSION>
is set. Consider that you run Vault instance myvault
of version 5.8.0
, and you want to upgrade to version 5.8.1
. You would execute the following:
kubectl provision vault myvault --upgrade 5.8.1 --verbose
This upgrades the Vault to the new version and restarts it.
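After the upgrade, you can check that the Vault instance is healthy again, for example:

# Inspect the Vault custom resource after the upgrade (add -n <namespace> if needed).
kubectl get vault myvault
kubectl describe vault myvault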
Configuration Options
Attestation: --dcap-api <API KEY>
Attestation ensures that
- the correct application code executes in an encrypted memory region that only the application code can access.
- the CPU hardware and firmware are up-to-date, and that the encrypted memory region is indeed provided by a CPU (and not by some simulation).
To attest CAS and Vault, we might need to access the Intel API directly in some clouds. In other clouds, we get all the required information directly from the cloud provider. In all cases, we verify that the information is trusted, i.e., signed by a key of the CPU manufacturer.
CAS Owner: --owner-config <FILE>
kubectl provision defines a default CAS owner config. We substitute the shell variables in this file before provisioning the CAS with this owner config. If you have specific requirements regarding the owner config, specify it via the option --owner-config <FILE>.
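A usage sketch (the file name is a placeholder; start from the default owner config generated by the plugin and adapt it to your needs):

# Placeholder file name: provide your own owner config when provisioning the CAS.
kubectl provision cas mycas --owner-config my-owner-config --verbose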
CAS Provisioning: --is-provisioned
When a CAS starts up for the first time, it is unprovisioned, i.e., it has no owner yet. This CAS does not accept any requests until it is provisioned. During provisioning, one sets the owner of the CAS. A CAS can have only one owner, i.e., only the first provisioning of a CAS will succeed.
The owner of a CAS cannot access any secrets or policies of the CAS. The owner can determine which nodes can take over the CAS in case of a failure. The owner can also decide when to upgrade the CAS, i.e., install a new version of the CAS. kubectl provision supports upgrades with the help of option --upgrade.
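For example, to check whether a CAS instance has already been provisioned, you can run:

# Exits with an error in case mycas has not been provisioned yet.
kubectl provision cas mycas --is-provisioned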
Reference: kubectl provision
Usage:
kubectl provision SVC [NAME] [--namespace <kubernetes-namespace>] [--dcap-api <API Key>] [--owner-config <owner config>] [--verbose] [--help]
Arguments:
Service to provision: SVC = cas | vault
- cas: provision CAS instance using the SCONE operator
- vault: provision a confidential Vault instance using the SCONE operator.
Uses by default CAS instance cas. If no cas named cas exists, it is
also created and provisioned, together with the vault. If such a cas
already exists, it is not provisioned.
Name of the service: NAME
- If no name is specified, we set NAME=SVC
Find more information at: https://sconedocs.github.io/5_kubectl/
Options:
-n | --namespace
The Kubernetes namespace in which the service should be deployed on the cluster.
Default value: "default"
--dcap-api | -d
DCAP API Key - define this if your cloud provider does not provide DCAP caching service.
The default value is "00000000000000000000000000000000", i.e., this only works if your cloud provider provides us with the necessary attestation collaterals.
--owner-config | -o
Provide a specific owner config when provisioning the CAS instance.
By default, we provision for a NodePort. We currently do not support
providing an owner config for LoadBalancer services.
--target
Specify target directory for generated manifests and owner IDs. Default path="$HOME/.cas".
--no-backup
Create and provision a cas with the backup-controller disabled.
-v | --verbose
Enable verbose output
--debug | debug_short_flag
Enable debug mode
--webhook <URL>
Forward entries of the CAS audit log to the given URL
--manifests-dir <FILE/URL>
File or url of a directory that contains the default files to apply
Default: https://raw.githubusercontent.com/scontain/manifests/main
--image-registry <IMAGE REGISTRY URL>
Url of an image registry containing the images to be used
Default: registry.scontain.com/scone.cloud
--filename | -f <FILE>
file or url that contains the manifest to apply
- default Vault manifest is https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.8/vault.yaml
- default CAS manifest is https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.8/cas.yaml
- default Vault verifier manifest is https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.8/vault-verifier.yaml
--is-provisioned
Checks if CAS is already provisioned and exists: Exits with an error in case it was not yet provisioned.
--vault-client
Upload Vault client policy to CAS: specify policy with flag --filename. Default policy is specified by VAULT_DEMO_CLIENT_POLICY_URL.
Default policy is https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.8/vault-demo-client-policy.yaml
--verify
Verify the set up of the specified CAS or Vault instance.
--print-public-keys
- If SVC==cas, it prints the CAS Key, the CAS Software Key, and the CAS encryption key.
- If SVC==vault, it prints the public key of the Vault.
--cas
When provisioning vault, we use the specified cas. If not specified, we use CAS 'cas'.
For now, the CAS must be in the same Kubernetes cluster and the same namespace as the vault.
--image-overwrite <IMAGE>
Replace the CAS image by the given image - mainly used for testing.
--set-version <VERSION>
Set the version of CAS
--local-backup
Take a snapshot of the encrypted CAS database and store in local filesystem.
--cas-database-recovery <SNAPSHOT>
Create a new CAS instance and start with existing CAS database in directory <SNAPSHOT>.
--set-tolerations "<TOLERATIONS>"
Sets the tolerations, separated by spaces, that we permit when attesting SCONE CAS.
Overwrites environment variable SGX_TOLERATIONS. Default is --accept-configuration-needed --accept-group-out-of-date --accept-sw-hardening-needed
Example: "--accept-group-out-of-date --accept-sw-hardening-needed --accept-configuration-needed"
See https://sconedocs.github.io/CAS_cli/#scone-cas-attest for more details.
--upgrade <VERSION>
Perform software upgrade of CAS or Vault.
For CAS, this will perform the following steps:
1. Update the policy of the backup controller (requires owner credentials)
2. Upgrade the backup controller by updating the CAS custom resource manifest.
3. Upgrade the CAS service by updating the CAS image.
--help
Output this usage information and exit.
--version
Print version (5.8.0-rc.8) and exit.
Current Configuration:
- VERSION="5.8.0-rc.8"
- MANIFESTS_URL="https://raw.githubusercontent.com/scontain/manifests/main"
- IMAGE_REPO="registry.scontain.com/scone.cloud"
- IMAGE_PREFIX=""
- NAMESPACE="default"
- DCAP_KEY="aecd5ebb682346028d60c36131eb2d92"
- TARGET_DIR="$HOME/.cas"
- VAULT_MANIFEST_URL="https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.8/vault.yaml" # Vault Manifest
- VAULT_VERIFIER_MANIFEST_URL="https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.8/vault-verifier.yaml" # Vault Verifier Manifest
- VAULT_POLICY_URL="https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.8/vault-policy.yaml" # CAS policy for Vault
- VAULT_VERIFY_POLICY_URL="https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.8/vault-verify-policy.yaml" # CAS verification policy for Vault
- VAULT_DEMO_CLIENT_POLICY_URL="https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.8/vault-demo-client-policy.yaml" # demo policy for a Vault client
- CAS_MANIFEST_URL="https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.8/cas.yaml"
- CAS_PROVISIONING_URL="https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.8/cas_provisioning.yaml"
- CAS_BACKUP_POLICY_URL="https://raw.githubusercontent.com/scontain/manifests/main/5.8.0-rc.8/backup_policy.yaml"
- SGX_TOLERATIONS="--accept-configuration-needed --accept-group-out-of-date --accept-sw-hardening-needed --isvprodid 41316 --isvsvn 5 --mrsigner 195e5a6df987d6a515dd083750c1ea352283f8364d3ec9142b0d593988c6ed2d"