SCONE Kubernetes Operator
Introduction
The SCONE Kubernetes Operator automates the management of SCONE-related services. It monitors the behavior of these services and ensures that the services stay in the desired state. This state is described with the help of Kubernetes custom resources.
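The idea of keeping services in a desired state can be illustrated with a minimal reconciliation sketch. This is not the SCONE operator's code: the `ServiceState` type, the `reconcile` function, and the image names are hypothetical and only show how a controller might compare a desired state (as described by custom resources) with the observed state.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceState:
    """Hypothetical, simplified view of one managed service."""
    image: str
    replicas: int


def reconcile(desired: dict[str, ServiceState],
              observed: dict[str, ServiceState]) -> list[str]:
    """Return the actions needed to move `observed` towards `desired`."""
    actions = []
    for name, want in sorted(desired.items()):
        have = observed.get(name)
        if have is None:
            actions.append(f"create {name}")      # service is missing
        elif have != want:
            actions.append(f"update {name}")      # service has drifted
    for name in sorted(observed.keys() - desired.keys()):
        actions.append(f"delete {name}")          # service is no longer desired
    return actions


# Illustrative names only; these are not real SCONE image references.
desired = {"cas": ServiceState("cas:v2", 1), "las": ServiceState("las:v2", 1)}
observed = {"cas": ServiceState("cas:v1", 1), "old-plugin": ServiceState("plugin:v1", 1)}
plan = reconcile(desired, observed)
print(plan)
```

A real operator would run such a comparison repeatedly and apply the resulting actions through the Kubernetes API, so that the cluster converges on the state declared in the custom resources.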
The SCONE Operator defines a set of custom resources to manage the following SCONE resources:
- SCONE CAS: the SCONE Configuration and Attestation Service,
- SCONE LAS: the SCONE Local Attestation Service,
- SCONE SGX Plugin: a Kubernetes plugin that provides containers with access to SGX,
- a confidential Vault, and
- signed and/or encrypted SCONE CAS policies.
A Kubernetes custom resource definition (CRD) specifies how to define a custom resource. We introduce these CRDs in more detail below. SCONE CAS policies (aka sessions) are also defined as custom resources, so we can securely upload policies as Kubernetes manifests or as part of helm charts. Confidential applications can therefore be deployed and operated using standard tools like helm and GitOps.
GitOps means that the ground truth for the deployment state of a confidential application is the git repository. We will show in a later section that we do not need to trust the git repo or Kubernetes regarding integrity, confidentiality, or consistency. By excluding git repositories, Kubernetes, Linux, operations teams, etc., from the trusted computing base (TCB), we improve the security of our applications. This also simplifies the security argument for applications: one does not need a detailed argument about the security of components that are not in the TCB.
The SCONE Kubernetes operator manages the SCONE custom resources.
Custom Resources
We define custom resources for the following SCONE services and resources:
- Configuration and Attestation Service (CAS):
  SCONE CAS is a central component of the SCONE infrastructure. Programs executing in enclaves connect to CAS to obtain their confidential configuration. CAS provisions this configuration only after it has verified the integrity and authenticity of the requesting enclave using remote attestation. Additionally, CAS checks that the requesting enclave is authorized to obtain the confidential configuration. One can run CAS instances on the same node as the application, in the same cluster, or in a different cluster. The CAS operator enables us to configure CAS policies remotely using kubectl or helm without exposing CAS to any external network.
- Local Attestation Service (LAS):
  A LAS instance must run on each Kubernetes node that supports confidential computing. Developers will not have to know about LAS as long as the SCONE operator keeps LAS running. In conjunction with CAS, LAS enables remote attestation of enclaves by performing a local attestation. Currently, LAS supports DCAP and EPID-based quoting enclaves. Additionally, it provides an independent SCONE quoting enclave (QE): the SCONE QE decouples the availability of an application from Intel's attestation services. A use case of the SCONE QE is the air-gapped deployment of applications without Internet connectivity.
- SGX Device Plugin Service (SGX Plugin):
  The SCONE SGX Plugin simplifies the deployment of confidential applications by providing any unprivileged container access to the SGX devices on hosts that support SGX. Developers will not have to know about the SGX Plugin as long as the SCONE operator keeps the SGX Plugin running. A confidential application can only run on hosts with SGX support. When running in a Kubernetes cluster, you must ensure your application's workload is scheduled to such nodes and that the application gets access to the corresponding SGX devices. A device plugin advertises hardware resources to the Kubelet.
  The SCONE SGX Plugin provides access to the SGX devices without requiring the container to run in privileged mode. The SCONE SGX Plugin
  - provides access to the SGX devices on exactly those nodes in the cluster that support the SGX version that you need,
  - labels these nodes with the label sgx.intel.com/capable=true, and
  - allows your containers to run in non-privileged mode.
  The SGX Plugin itself must have permission to access the SGX devices.
NOTE: CAS requires both LAS and the SGX Plugin to run, and LAS requires the SGX Plugin.
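The dependencies in this note imply a start-up order, which can be derived with a topological sort. The following is a small sketch using Python's standard `graphlib` module (Python 3.9 or later); the `REQUIRES` table and the `startup_order` helper are illustrative names, not part of the SCONE operator.

```python
from graphlib import TopologicalSorter

# Each service maps to the set of services it requires (taken from the note above).
REQUIRES = {
    "CAS": {"LAS", "SGXPlugin"},
    "LAS": {"SGXPlugin"},
    "SGXPlugin": set(),
}


def startup_order(requires):
    """Return the services ordered so that every requirement comes first."""
    return list(TopologicalSorter(requires).static_order())


order = startup_order(REQUIRES)
print(order)  # ['SGXPlugin', 'LAS', 'CAS']
```

Because the dependencies form a single chain, the order is unique: the SGX Plugin first, then LAS, then CAS.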
The SCONE operator includes controllers to enable the submission of SCONE policies as custom resources. Using such custom resources, we delegate the policy submission to a controller, which hides some technical details of the interactions with CAS, and allows policies to be managed through kubectl or helm. Since we do not trust Kubernetes or any cluster admins, we must protect the policies.
SCONE provides two ways of protecting policies: Signed Policies and Encrypted Policies. In both modes, a Kubernetes admin or the controller submitting them to SCONE CAS cannot modify the contents of the policies. The following controllers and custom resources for SCONE policies are available:
- Signed Policies:
  Signed Policies are integrity-protected. A policy can be signed through the SCONE CLI, which uses a signing key pair to sign the contents of the policy. Please note that even though the policy cannot be modified once signed, its content is still visible to anyone, meaning that secrets or sensitive information must not be included in Signed Policies. Instead, you would ask CAS to generate secrets as part of a policy or import secrets from other policies.
- Encrypted Policies:
  Encrypted Policies are integrity- and confidentiality-protected. A policy can be encrypted through the SCONE CLI, which uses the public encryption key of the target SCONE CAS to encrypt the policy contents. This ensures that only the intended target SCONE CAS can decrypt the policy contents, making Encrypted Policies suitable for secrets or sensitive information. Encrypted Policies are also signed.
- Vault:
  We can deploy a confidential variant of Vault in a Kubernetes cluster with the help of the kubectl provision plugin.
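The property of Signed Policies described above (tampering is detected, but the content stays readable) can be illustrated with a short sketch. Note the assumptions: this does not use the SCONE CLI or SCONE's signing key pair. An HMAC with a shared demo key from Python's standard library serves as a stand-in for a real signature, and the policy text is a made-up placeholder.

```python
import hashlib
import hmac

# Hypothetical demo key; SCONE uses an asymmetric signing key pair instead.
DEMO_KEY = b"demo-key-not-a-real-scone-key"


def sign_policy(policy: bytes, key: bytes) -> bytes:
    """Compute an integrity tag over the policy (HMAC stand-in for a signature)."""
    return hmac.new(key, policy, hashlib.sha256).digest()


def verify_policy(policy: bytes, tag: bytes, key: bytes) -> bool:
    """Check in constant time that the policy still matches its tag."""
    return hmac.compare_digest(sign_policy(policy, key), tag)


policy = b"name: demo-policy\n"            # placeholder, not a real SCONE policy
tag = sign_policy(policy, DEMO_KEY)

untouched_ok = verify_policy(policy, tag, DEMO_KEY)
tampered = policy.replace(b"demo", b"evil")
tampered_ok = verify_policy(tampered, tag, DEMO_KEY)
still_readable = b"demo-policy" in policy  # signing does not hide the content

print(untouched_ok, tampered_ok, still_readable)
```

The sketch shows why secrets must not be placed in a policy that is only signed: the integrity tag detects the modification, but anyone who can read the manifest can read the policy. Encrypted Policies add confidentiality on top, which this sketch does not attempt to reproduce.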
In what follows, we will describe how to deploy the operator and the custom resources and configure them.
General SCONE Operator
The SCONE operator controls the lifecycle of the CAS, LAS, and SGXPlugin custom resources. We aim to ensure a maintenance-free operation of all three services in a Kubernetes cluster. The general objectives of the SCONE operator are as follows:
- effortless provisioning and configuration management:
  - deploy and configure each of the SCONE services effortlessly by applying the configured manifest of the respective custom resource to the cluster,
  - permit pulling the OCI image from a private repo instead of the default image repo, and
  - allow specification of resource limitations.
- seamless upgrades:
  - automatically update the OCI images to the newest version without shutting down existing applications.
- full lifecycle support:
  - support the full application lifecycle, including backup of the encrypted CAS database,
  - automatic failure recovery in case of node failures, and
  - disaster recovery in case a whole cluster or region becomes unavailable.
- deep insights:
  - provide metrics of current resource usage and status of the services,
  - alert the user of any problems, and
  - provide the user with access to the state of the reconciliation process.
- auto pilot:
  - automatic scaling, configuration, and scheduling tuning.
- Kubernetes support:
  - support different versions of Kubernetes.
Specific CAS Properties
The functional objectives specific to the CAS operator are as follows:
- ensure that CAS is healthy and accepting new policies and attestation requests from confidential applications, and
- ensure that upgrades, database backups, and database encryption key backups, which ultimately allow for highly available CAS setups, are performed seamlessly.
Specific LAS Properties
The functional objectives specific to the LAS operator are as follows:
- ensure that LAS is deployed to all (or a subset of) SGX-enabled nodes of the cluster, thus allowing confidential applications to be attested on these nodes, and
- ensure that there is never more than one LAS service running.
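These two objectives can be sketched as a small placement check. The sketch below is not the operator's implementation: `plan_las_placement` and its data shapes are hypothetical, and it assumes that "never more than one" is meant per node, since a LAS instance runs on each node. The label `sgx.intel.com/capable=true` is the one set by the SGX Plugin.

```python
SGX_LABEL = ("sgx.intel.com/capable", "true")


def plan_las_placement(nodes, las_pods, selected=None):
    """Plan LAS placement (hypothetical helper).

    nodes:    dict mapping node name -> dict of node labels
    las_pods: list of (pod name, node name) for running LAS instances
    selected: optional subset of node names that should run LAS
    Returns (nodes that still need a LAS, pods that should be removed).
    """
    key, value = SGX_LABEL
    eligible = {name for name, labels in nodes.items() if labels.get(key) == value}
    if selected is not None:
        eligible &= set(selected)

    covered = set()
    to_remove = []
    for pod, node in las_pods:
        if node in eligible and node not in covered:
            covered.add(node)        # first LAS on an eligible node is kept
        else:
            to_remove.append(pod)    # duplicate, or running on a non-eligible node
    to_add = sorted(eligible - covered)
    return to_add, to_remove


nodes = {
    "node-a": {"sgx.intel.com/capable": "true"},
    "node-b": {"sgx.intel.com/capable": "true"},
    "node-c": {},
}
las_pods = [("las-1", "node-a"), ("las-2", "node-a"), ("las-3", "node-c")]

to_add, to_remove = plan_las_placement(nodes, las_pods)
print(to_add, to_remove)
```

In this example, node-b still needs a LAS instance, while las-2 (a duplicate on node-a) and las-3 (on a node without SGX support) would be removed. Passing `selected` restricts the deployment to a subset of the SGX-enabled nodes.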
Specific SGXPlugin Properties
The functional objectives specific to the SGXPlugin operator are as follows:
- ensure that the SGX Plugin provides all enclaves access to an SGX device,
- allow the SGX Plugin to be configured so that only a subset of the SGX-enabled nodes are considered when scheduling,
- ensure that the SGX Plugin allows enclaves to be scheduled to all nodes where SGX is enabled (if desired), and
- ensure there is never more than one SGX Plugin service running.