Production CAS and Vault

We automated the steps to deploy CAS and Vault in production mode using a kubectl plugin.

Outline

  • First, we describe how to deploy CAS with the kubectl plugin.
  • Second, we describe some background regarding CAS provisioning.

TL;DR

CAS is provisioned using a kubectl plugin. The plugin is installed or reconciled as part of the SCONE operator_controller deployment. You can ensure that it is installed by executing:

curl -fsSL https://raw.githubusercontent.com/scontain/SH/master/operator_controller | bash -s - \
--reconcile --plugin --verbose
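To confirm that the plugin is visible to kubectl afterwards, a small check like the following may help. It is a minimal sketch that relies only on kubectl's general plugin mechanism (an executable named kubectl-<name> on the PATH); the function names plugin_installed and check_plugin are hypothetical and not part of SCONE.

```shell
#!/usr/bin/env bash
# kubectl discovers plugins as executables named kubectl-<name> on PATH,
# so `kubectl provision` needs an executable called kubectl-provision.
# Function names are illustrative, not part of the SCONE tooling.

plugin_installed() {
    command -v kubectl-provision >/dev/null 2>&1
}

check_plugin() {
    if plugin_installed; then
        echo "kubectl provision plugin found"
    else
        echo "kubectl provision plugin missing; rerun the reconcile command" >&2
        return 1
    fi
}
```

Calling check_plugin before provisioning gives an early, readable error instead of kubectl's generic "unknown command" message.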

To deploy a CAS instance named mycas in production mode in the default namespace, execute the following:

kubectl provision cas mycas --verbose

This deploys a CAS release in the default Kubernetes namespace. The release includes:

  • a primary CAS instance named 'mycas',
  • backup CAS instances on each node of the Kubernetes cluster (their pods run only temporarily),
  • startup and liveness probes that monitor the primary CAS,
  • periodic snapshots of the CAS database, stored in a separate persistent volume at /var/mnt/cas-database-snapshots-mycas-0,
  • an audit log for verifying all CAS transactions, stored in /var/log/cas/audit/cas_audit.log.

Deploying SCONE CAS

SCONE CAS is a Configuration and Attestation Service, i.e., it attests and verifies a service before it provides the service with its configuration information.

If you do not specify a manifest, the plugin creates one with the following properties:

  • a primary CAS with the given name CAS_NAME executes all requests,
  • any node in the Kubernetes cluster with SGX support serves as a backup, i.e., it is permitted to start a new primary CAS instance if the old primary CAS has failed,
  • liveness probes detect CAS failures and trigger a fail-over to another node in case of a CAS or node failure,
  • the database is stored in a persistent volume,
  • database snapshots are created automatically.

To deploy a CAS with fail-over, automatic database snapshots, and liveness probes, you can use the SCONE kubectl plugin:

kubectl provision cas CAS_NAME [--namespace NAMESPACE] [--dcap-api APIKEY] [--verbose]

  • If --namespace NAMESPACE is provided, the CAS is installed in the Kubernetes namespace NAMESPACE. Otherwise, the CAS is installed in the default namespace. If the specified namespace does not exist, the plugin terminates with an error. Ensure that the namespace has the pull secret.

  • Always specify the DCAP API key with --dcap-api APIKEY. If it is not set, the plugin falls back to a default debug API key and issues a warning. This debug API key might stop working at any time. You can subscribe to the DCAP API via the Intel website.

  • When running on Azure, one can set APIKEY="00000000000000000000000000000000".

  • Use option --help to get more detailed usage information.

  • Use the --verbose flag to print more detailed output.

This command will deploy and provision a CAS in the Kubernetes cluster that automatically fails over in case the CAS crashes, or the CAS does not start up on a particular node.
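To make the optional flags concrete, the following sketch assembles the command line from a CAS name, an optional namespace, and an optional DCAP API key. It only uses the flags listed in the synopsis above; the helper name provision_cmd is hypothetical, and the function prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Print the `kubectl provision cas` command for the given arguments.
# Usage: provision_cmd CAS_NAME [NAMESPACE] [APIKEY]
# provision_cmd is an illustrative helper, not part of the SCONE plugin.
# The body runs in a subshell so its variables do not leak to the caller.
provision_cmd() (
    name="${1:-}"
    namespace="${2:-}"
    apikey="${3:-}"
    if [ -z "$name" ]; then
        echo "CAS_NAME is required" >&2
        return 1
    fi
    cmd="kubectl provision cas $name"
    # Omitting the namespace installs the CAS in the default namespace.
    if [ -n "$namespace" ]; then
        cmd="$cmd --namespace $namespace"
    fi
    # Omitting the key makes the plugin fall back to its debug API key.
    if [ -n "$apikey" ]; then
        cmd="$cmd --dcap-api $apikey"
    fi
    echo "$cmd --verbose"
)
```

For example, provision_cmd mycas prints "kubectl provision cas mycas --verbose". Printing the command first makes it easy to review before executing it against a production cluster.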

Background: Database Recovery Snapshotting

Enabling snapshots instructs the CAS to regularly snapshot its database into a separate directory for disaster recovery purposes. Note that this feature protects only against storage (data-loss) failures. For protection against platform failures, e.g., a VM shutting down, Backup CAS instances are enabled by default.

Protecting against data loss in confidential computing requires synchronization with storage that guarantees durability. This will typically happen via a network and introduces new synchronization concerns. Synchronizing the CAS database files directly may lead to a corrupt database snapshot as CAS continuously modifies the database. Database Recovery Snapshotting allows the synchronization of a consistent CAS database.

The default snapshot interval (if databaseSnapshots.interval is not specified) is 60 seconds. Snapshots are stored in the directory given by databaseSnapshots.targetDirectory; the default is /var/mnt/cas-database-snapshots. At each interval, CAS copies a consistent snapshot of its database (cas.db) and database key store (cas.key-store) into a new subdirectory. If the CAS database remains unchanged, no snapshot is created.
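When synchronizing snapshots to durable storage, one typically wants the most recent complete snapshot. The sketch below picks the newest subdirectory that contains both cas.db and cas.key-store. It assumes that the newest snapshot is the subdirectory with the latest modification time, since the naming scheme of the subdirectories is not described here; the function name latest_snapshot is hypothetical.

```shell
#!/usr/bin/env bash
# Print the newest snapshot subdirectory that holds both snapshot files.
# Usage: latest_snapshot [TARGET_DIRECTORY]
# Assumption: "newest" means latest directory modification time.
# latest_snapshot is an illustrative helper, not part of CAS.
latest_snapshot() (
    target="${1:-/var/mnt/cas-database-snapshots}"
    newest=""
    for dir in "$target"/*/; do
        # Skip entries that lack either of the two snapshot files.
        [ -f "${dir}cas.db" ] && [ -f "${dir}cas.key-store" ] || continue
        if [ -z "$newest" ] || [ "$dir" -nt "$newest" ]; then
            newest="$dir"
        fi
    done
    # Fail if no complete snapshot was found.
    [ -n "$newest" ] || return 1
    echo "${newest%/}"
)
```

The function returns a non-zero status when the target directory holds no complete snapshot, so a sync job can stop instead of uploading nothing.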

Warning

The Database Recovery Snapshotting feature is not a replacement for the CAS backup feature! Database snapshots cannot be restored unless the machine you intend to restore them on has previously been registered as a backup target using the SCONE CLI. Database snapshots are a safer option than copying the database files directly, as CAS puts the database into a read-only mode while creating a snapshot, which ensures consistency of the backed-up files. You should back up the latest snapshots, e.g., by pushing them to an object store.
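One way to prepare a snapshot for an object store is to pack its two files into a single archive first. The following is a minimal sketch under that assumption; the helper name archive_snapshot is hypothetical, and the upload step is left out because it depends on the object store and client you use.

```shell
#!/usr/bin/env bash
# Pack cas.db and cas.key-store of one snapshot into a tar.gz archive
# and print the archive path.
# Usage: archive_snapshot SNAPSHOT_DIR OUTPUT_DIR
# archive_snapshot is an illustrative helper, not part of CAS.
archive_snapshot() (
    snapshot="$1"
    out_dir="$2"
    # Refuse to archive an incomplete snapshot.
    for f in cas.db cas.key-store; do
        [ -f "$snapshot/$f" ] || { echo "missing $snapshot/$f" >&2; return 1; }
    done
    archive="$out_dir/$(basename "$snapshot").tar.gz"
    tar -czf "$archive" -C "$snapshot" cas.db cas.key-store || return 1
    echo "$archive"
    # Upload "$archive" with the client of your object store (not shown).
)
```

Archiving both files together keeps the database and its key store from the same snapshot paired, which matters because they are copied as one consistent unit.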

Note

If snapshot creation fails, CAS will be stopped with an error and return a non-zero exit code.

Deployment of Vault

..coming soon..