Production CAS and Vault
We automated the steps to deploy CAS and Vault in production mode using a kubectl plugin.
Outline
- First, we describe how to deploy CAS with the kubectl plugin.
- Second, we describe some background regarding CAS provisioning.
TL;DR
CAS is provisioned using a kubectl plugin. The plugin script is deployed or reconciled as part of the SCONE operator_controller deployment. You can ensure that it is deployed by executing:
curl -fsSL https://raw.githubusercontent.com/scontain/SH/master/operator_controller | bash -s - \
--reconcile --plugin --verbose
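As a quick local check, the following sketch tests whether the plugin executable is on your PATH. It relies on kubectl's plugin mechanism, which resolves a subcommand to an executable named kubectl-<name>. It assumes the SCONE plugin is installed as kubectl-provision, which this page does not state. The function names plugin_installed and check_provision_plugin are made up for this example.

```shell
# Sketch: kubectl discovers plugins as executables named kubectl-<name> on PATH.
# Assumption: the SCONE plugin behind `kubectl provision` is installed as
# an executable called kubectl-provision.
plugin_installed() {
  plugin_name="$1"
  command -v "kubectl-$plugin_name" >/dev/null 2>&1
}

# Print a short status message; returns non-zero if the plugin is missing.
check_provision_plugin() {
  if plugin_installed provision; then
    echo "kubectl provision plugin found"
  else
    echo "kubectl provision plugin not found; run the operator_controller script" >&2
    return 1
  fi
}
```

Calling check_provision_plugin before provisioning gives an early, readable failure if the reconcile step has not been run on this machine.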
To deploy a CAS instance named mycas in production mode in the default namespace, execute the following:
kubectl provision cas mycas --verbose
This will deploy a CAS release in the default Kubernetes namespace. The release includes:
- a primary CAS instance with the name 'mycas',
- backup CAS instances on each node of the Kubernetes cluster (their pods run only temporarily),
- startup and liveness probes that monitor the primary CAS,
- periodic snapshots of the CAS database, stored in a separate persistent volume (/var/mnt/cas-database-snapshots-mycas-0),
- an audit log for verifying all CAS transactions, stored in /var/log/cas/audit/cas_audit.log.
Deploying SCONE CAS
SCONE CAS is a Configuration and Attestation Service, i.e., it attests and verifies a service before it provides the service with its configuration information.
If you do not specify a manifest, the plugin will create a manifest in which
- a primary CAS with a given name CAS_NAME executes all requests,
- any node in the Kubernetes cluster with SGX support serves as a backup, i.e., is permitted to start a new primary CAS instance in case the old primary CAS has failed,
- liveness probes detect CAS failures and trigger a fail-over to another node in case of a CAS or node failure,
- the database is stored in a persistent volume,
- database snapshots are created automatically.
To deploy such a CAS, use the SCONE kubectl plugin:
kubectl provision cas CAS_NAME [--namespace NAMESPACE] [--dcap-api APIKEY] [--verbose]
- If --namespace NAMESPACE is provided, the CAS is installed in the Kubernetes namespace NAMESPACE. Otherwise, the CAS is installed in the default namespace. If the specified namespace does not exist, the plugin will terminate with an error. Ensure that the namespace has the pull secret.
- The DCAP API key should always be specified with --dcap-api APIKEY. If it is not set, the plugin uses a default API key and issues a warning. This debug API key might stop working at any point in time. One can subscribe to the DCAP API via the Intel website.
- When running on Azure, one can set APIKEY="00000000000000000000000000000000".
- Use option --help to get more detailed usage information.
- Use flag --verbose to print more verbose output.
This command will deploy and provision a CAS in the Kubernetes cluster that automatically fails over in case the CAS crashes, or the CAS does not start up on a particular node.
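The options above can be combined in a small wrapper. The following is a minimal sketch, not part of the SCONE tooling: it checks that the namespace exists before calling the plugin and passes --dcap-api only when a key is given. The function name provision_cas and the KUBECTL override variable are assumptions of this example. The plugin flags are the ones listed in the usage line above.

```shell
# Sketch: wrapper around `kubectl provision cas` that checks the namespace first.
# KUBECTL can be overridden (e.g. KUBECTL=echo) for a dry run; this variable is
# an assumption of this sketch, not something the plugin itself reads.
KUBECTL="${KUBECTL:-kubectl}"

provision_cas() {
  cas_name="${1:-}"
  namespace="${2:-default}"
  api_key="${3:-}"

  if [ -z "$cas_name" ]; then
    echo "usage: provision_cas CAS_NAME [NAMESPACE] [APIKEY]" >&2
    return 2
  fi

  # The plugin terminates with an error if the namespace is missing,
  # so fail early with a clearer message.
  if ! "$KUBECTL" get namespace "$namespace" >/dev/null 2>&1; then
    echo "namespace '$namespace' does not exist" >&2
    return 1
  fi

  # Build the argument list; only pass --dcap-api when a key was supplied.
  set -- provision cas "$cas_name" --namespace "$namespace" --verbose
  if [ -n "$api_key" ]; then
    set -- "$@" --dcap-api "$api_key"
  fi

  "$KUBECTL" "$@"
}
```

For example, provision_cas mycas default "$APIKEY" deploys mycas into the default namespace with your own DCAP API key. Running with KUBECTL=echo prints the arguments that would be passed to kubectl without touching the cluster.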
Background: Database Recovery Snapshotting
Enabling snapshots instructs the CAS to regularly snapshot its database into a separate directory for disaster recovery purposes. Note that this feature protects only against storage (data-loss) failures! For protection against platform failures, e.g., a VM shutting down, we enable backup CAS instances by default.
Protecting against data loss in confidential computing requires synchronization with storage that guarantees durability. This will typically happen via a network and introduces new synchronization concerns. Synchronizing the CAS database files directly may lead to a corrupt database snapshot as CAS continuously modifies the database.
Database Recovery Snapshotting allows the synchronization of a consistent CAS database.
The default period (in case no databaseSnapshots.interval is specified) is 60 seconds.
The snapshots are stored in the directory given by databaseSnapshots.targetDirectory. The default is /var/mnt/cas-database-snapshots.
For each interval
, CAS will copy a consistent snapshot of its database (cas.db
) and database key store (cas.key-store
) into a new subdirectory. If the CAS database remains unchanged, no snapshot will be created.
Warning
The Database Recovery Snapshotting feature is not a replacement for the CAS backup feature! The database snapshots cannot be restored unless the machine you intend to restore them on has previously been registered as a backup target using the SCONE CLI. Database snapshots are a safer option than copying the database files directly, as CAS will put the database into a read-only mode when creating a snapshot, ensuring consistency of the backed-up files. You should back up the latest snapshots, e.g., by pushing them to an object store.
Note
If snapshot creation fails, CAS will be stopped with an error and return a non-zero exit code.
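Backing up the latest snapshot could be scripted roughly as follows. This is a sketch under stated assumptions: it picks the newest snapshot subdirectory by modification time (the naming scheme of the subdirectories is not documented here), assumes that subdirectory is complete when the script runs, and packs it into a tar archive. The function names are made up for this example, and the upload to an object store is left to whatever tool you use.

```shell
# Sketch: find the most recently modified snapshot subdirectory.
# Assumption: the newest subdirectory (by mtime) is the latest snapshot.
latest_snapshot_dir() {
  snapshot_root="$1"
  newest=""
  for d in "$snapshot_root"/*/; do
    [ -d "$d" ] || continue
    if [ -z "$newest" ] || [ "$d" -nt "$newest" ]; then
      newest="$d"
    fi
  done
  [ -n "$newest" ] || return 1
  # Strip the trailing slash left by the glob.
  printf '%s\n' "${newest%/}"
}

# Pack the latest snapshot (cas.db and cas.key-store) into OUT_DIR/<name>.tar.gz
# and print the archive path, ready to be pushed to durable storage.
archive_latest_snapshot() {
  snapshot_root="$1"
  out_dir="$2"
  latest="$(latest_snapshot_dir "$snapshot_root")" || {
    echo "no snapshot found in $snapshot_root" >&2
    return 1
  }
  name="$(basename "$latest")"
  archive="$out_dir/$name.tar.gz"
  tar -czf "$archive" -C "$snapshot_root" "$name"
  printf '%s\n' "$archive"
}
```

With the default target directory, archive_latest_snapshot /var/mnt/cas-database-snapshots /tmp would write the archive to /tmp and print its path. Remember that restoring such a snapshot still requires the target machine to be registered as a backup target.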
Deployment of Vault
..coming soon..