Proposal: Add Security Context #867
OCI Security Context
Summary
[...] containerd) offer their default Seccomp profiles that are allowlists of system calls to make containers secure.
[...] securitycontext media type in the Image Media Types and adding SecurityContext as a new field to the config section of the Image Configuration.
[...] Kubernetes.
[...] ImageDefault to the orchestration's configuration as extra work.
Background
Containers offer weaker isolation than virtual machines because all containers running on the same host share the same OS kernel. Therefore, it is important to reduce the kernel attack surface exposed to containers. Secure computing mode (Seccomp) reduces that surface by restricting the system calls available to each container. Additionally, Linux capabilities and Mandatory Access Control (MAC) systems such as SELinux and AppArmor provide defense in depth.
The existing high-level container runtimes such as containerd and CRI-O apply their default Seccomp profiles when the user requests them in the Kubernetes configuration.
The default Seccomp profiles are allowlists that drop potentially dangerous system calls such as pivot_root and ptrace.
Thanks to the default profiles, users can apply Seccomp to containers easily, without analyzing the system calls the containers use.
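A Kubernetes configuration that requests the runtime default profile could look like the following. This is a minimal sketch, assuming a Kubernetes version that supports the seccompProfile field in a Pod's securityContext; the Pod name and image reference are placeholders, not taken from the proposal.

```python
import json


def pod_with_runtime_default(name: str, image: str) -> dict:
    """Build a Pod manifest that asks the container runtime
    (for example containerd or CRI-O) for its default seccomp profile."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Pod-level setting: applies to every container in the Pod.
            "securityContext": {"seccompProfile": {"type": "RuntimeDefault"}},
            "containers": [{"name": name, "image": image}],
        },
    }


if __name__ == "__main__":
    # Placeholder name and image; JSON manifests are accepted alongside YAML.
    manifest = pod_with_runtime_default("demo", "example.com/demo:latest")
    print(json.dumps(manifest, indent=2))
```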
However, the profiles still include many system calls that the containers never actually use. Users who want to deny those system calls need to inspect the containers and identify the required system calls using DockerSlim [1] or other dynamic analysis tools [2] [3]. Unfortunately, dynamic analysis is incomplete because it cannot observe code paths that execute rarely, such as error handling routines. Identifying system calls reliably requires static analysis, but correctly inspecting system calls inside containerized applications poses many challenges.
Motivation
Recently, research papers have proposed various state-of-the-art system call analysis techniques to tackle the above issues. Typical examples include Confine [4] and Sysfilter [5]. Confine is a static analysis-based system for automatically extracting and enforcing system call policies on containers. Confine inspects containerized applications and all their dependencies, identifies the superset of system calls required for the correct operation of the containers, and generates corresponding Seccomp system call policies that can be readily enforced while loading the containers. Compared to the existing system call analysis tools, Confine can extract system calls more accurately by analyzing containers statically. The authors' evaluation of Confine on 150 publicly available Docker images shows that Confine can reduce their attack surface by disabling 145 or more system calls for more than half of the containers, neutralizing 51 disclosed kernel vulnerabilities.
If container image developers can use Confine or other new static analysis-based systems to extract the system calls that container images use, they can generate default profiles for the container image that are more accurate than the runtime default profiles. The image default profiles can drop more system calls in the containers, with other services and functionality disabled. As a result, attack surfaces are typically much smaller than they would be with general-purpose containers, so there are fewer opportunities to attack and compromise the containers.
Proposal
The goal of this proposal is to allow users to choose the image default security context, including the default Seccomp profiles and Capability settings, from container orchestration software such as Kubernetes. This makes containers more secure and saves users the time and effort of configuring container security. To achieve this, we propose defining a security context media type in the OCI Image Media Types and adding a security context field to the OCI Image Configuration.
The media type is named securitycontext so that security information such as Capability can be added in the future. Recently, research papers have proposed various techniques that measure Linux container security [6] [7]. If image developers can use those tools to accurately measure the Capabilities that applications in container images use, they can set the default Capabilities in the image config. Considering this, we think it is better to add general security settings to the Image Configuration, not limited to Seccomp.
Each change is described below.
Image Media Type
We propose defining the new securitycontext media type in the Image Media Types.
application/vnd.oci.image.securitycontext.v1+json
This contains information about the security context, which includes Seccomp and Linux Capability settings. We expect container image developers to create this information. For example, the image developer analyzes a container image in advance using system call analysis tools such as Confine and writes the Seccomp profiles into this securitycontext JSON file.
The high-level container runtimes pass the information to the corresponding sections of the OCI runtime specification. Hence, all the contents of the securitycontext follow the runtime specification configurations.
Here is an example:
application/vnd.oci.image.securitycontext.v1+json
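The exact schema of the securitycontext document is not reproduced in this text. As a minimal sketch, assuming the document simply nests runtime-spec-shaped objects under two hypothetical top-level keys, "seccomp" and "capabilities", such a file could be generated as follows. The action and architecture constants and the capability set names come from the OCI runtime specification; the top-level key names, the syscall list, and the capability list are illustrative assumptions.

```python
import json


def build_security_context(allowed_syscalls, capabilities):
    """Build a securitycontext document from analysis results.

    The top-level keys "seccomp" and "capabilities" are assumptions made
    for illustration; the nested objects follow the OCI runtime spec.
    """
    seccomp = {
        # Allowlist: anything not listed below fails with an error.
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": ["SCMP_ARCH_X86_64"],
        "syscalls": [
            {"names": sorted(set(allowed_syscalls)), "action": "SCMP_ACT_ALLOW"}
        ],
    }
    caps = sorted(set(capabilities))
    return {
        "seccomp": seccomp,
        "capabilities": {"bounding": caps, "effective": caps, "permitted": caps},
    }


if __name__ == "__main__":
    # Illustrative inputs, standing in for the output of an analysis tool.
    doc = build_security_context(
        ["read", "write", "openat", "close", "exit_group"],
        ["CAP_NET_BIND_SERVICE"],
    )
    print(json.dumps(doc, indent=2))
```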
Image Configuration
We propose adding SecurityContext as a new field to the config section of the Image Configuration. This field points to a specific security context that includes information about security configurations. SecurityContext includes a set of descriptor properties.
Here is an example:
application/vnd.oci.image.config.v1+json
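Because SecurityContext carries descriptor properties, one plausible reading is that it holds a standard OCI content descriptor (mediaType, digest, size) for the securitycontext blob. The sketch below is written under that assumption and is not the proposal's confirmed layout; the helper names and the sample blob are hypothetical.

```python
import hashlib
import json

SECURITY_CONTEXT_MEDIA_TYPE = "application/vnd.oci.image.securitycontext.v1+json"


def descriptor_for(blob: bytes) -> dict:
    """Return an OCI-style content descriptor for a securitycontext blob."""
    return {
        "mediaType": SECURITY_CONTEXT_MEDIA_TYPE,
        "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
        "size": len(blob),
    }


def add_security_context(image_config: dict, blob: bytes) -> dict:
    """Return a copy of the image configuration whose config section
    has a SecurityContext field pointing at the blob."""
    updated = dict(image_config)
    config_section = dict(updated.get("config", {}))
    config_section["SecurityContext"] = descriptor_for(blob)
    updated["config"] = config_section
    return updated


if __name__ == "__main__":
    # Hypothetical blob contents; any serialized securitycontext works.
    blob = json.dumps({"seccomp": {"defaultAction": "SCMP_ACT_ERRNO"}}).encode()
    image_config = {"architecture": "amd64", "os": "linux", "config": {"Cmd": ["/app"]}}
    print(json.dumps(add_security_context(image_config, blob), indent=2))
```

Referencing the blob by digest, rather than embedding it, keeps the image configuration small and lets a runtime verify the security context before applying it.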
Expected Use Cases
User Side:
An example of Seccomp for Kubernetes users is described below.
Set default Seccomp profiles for a container image.
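A minimal sketch of such a setting, assuming the proposed ImageDefault Seccomp type were added to the Kubernetes security context. Kubernetes does not accept this value today; the helper name and the sample Pod are hypothetical.

```python
import copy
import json

# "ImageDefault" is the type proposed in this issue, not an existing
# Kubernetes seccomp profile type.
PROPOSED_SECCOMP_TYPE = "ImageDefault"


def use_image_default_seccomp(pod: dict) -> dict:
    """Return a copy of a Pod manifest that requests the image's own
    default Seccomp profile instead of the runtime's."""
    updated = copy.deepcopy(pod)
    security_context = updated.setdefault("spec", {}).setdefault("securityContext", {})
    security_context["seccompProfile"] = {"type": PROPOSED_SECCOMP_TYPE}
    return updated


if __name__ == "__main__":
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "demo"},
        "spec": {"containers": [{"name": "demo", "image": "example.com/demo:latest"}]},
    }
    print(json.dumps(use_image_default_seccomp(pod), indent=2))
```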
With the above configuration, Kubernetes enforces the image default profiles on the container.
Image Developer Side:
An example for image developers is described below.
[...] Confine.
[...] application/vnd.oci.image.securitycontext.v1+json and add the information to the SecurityContext in the Image Configuration.
Limitations
The default security context only provides default settings for a container image that the image developer analyzed in advance. Therefore, if the user adds binaries to the image, the user cannot rely on the default security context because it does not account for the system calls those binaries use.
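To illustrate the limitation, the sketch below checks whether the system calls needed by an added binary are covered by a runtime-spec-shaped Seccomp allowlist. It is a simplified check under stated assumptions: the required system call list is assumed to come from some analysis tool, the function names are hypothetical, and argument-based rules and non-allow actions are ignored.

```python
def allowed_syscalls(seccomp: dict) -> set:
    """Collect the syscall names explicitly allowed by a seccomp profile."""
    names = set()
    for rule in seccomp.get("syscalls", []):
        if rule.get("action") == "SCMP_ACT_ALLOW":
            names.update(rule.get("names", []))
    return names


def missing_syscalls(seccomp: dict, required) -> list:
    """Return the required syscalls that the profile does not allow."""
    return sorted(set(required) - allowed_syscalls(seccomp))


if __name__ == "__main__":
    image_default = {
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [{"names": ["read", "write", "close"], "action": "SCMP_ACT_ALLOW"}],
    }
    # Hypothetical syscalls needed by a binary the user added to the image.
    added_binary_needs = ["read", "openat", "socket"]
    print(missing_syscalls(image_default, added_binary_needs))
```

A non-empty result means the added binary would fail under the image default profile, so the image would need to be analyzed again.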
Future Work
Currently, we plan to develop a tool that allows image developers to easily analyze containerized applications inside an image using Confine and create an OCI image configuration including the image default Seccomp profiles. We're also thinking about adding support for Kubernetes to Confine because the current implementation of Confine can extract system calls only from Docker containers. Additionally, we need to add a new Seccomp type ImageDefault to the security context of Kubernetes and modify the high-level container runtimes such as containerd to extract the Seccomp profiles from the Image Configuration when users choose the image default Seccomp profiles.
Backward Compatibility
There is no formal definition for backward-compatible changes in this new feature.
References
[1] DockerSlim. https://dockersl.im
[2] strace. https://strace.io
[3] oci-seccomp-bpf-hook. https://github.com/containers/oci-seccomp-bpf-hook
[4] Seyedhamed Ghavamnia, Tapti Palit, Azzedine Benameur, and Michalis Polychronakis. Confine: Automated System Call Policy Generation for Container Attack Surface Reduction. In International Symposium on Research in Attacks, Intrusions and Defenses (RAID), 2020.
[5] Nicholas DeMarinis, Kent Williams-King, Di Jin, Rodrigo Fonseca, and Vasileios P. Kemerlis. sysfilter: Automated System Call Filtering for Commodity Software. In International Symposium on Research in Attacks, Intrusions and Defenses (RAID), 2020.
[6] J. Criswell, J. Zhou, S. Gravani, and X. Hu. PrivAnalyzer: Measuring the Efficacy of Linux Privilege Use. In 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2019.
[7] Xin Lin, Lingguang Lei, Yuewu Wang, Jiwu Jing, Kun Sun, and Quan Zhou. A measurement study on Linux container security: Attacks and countermeasures. In Proceedings of the 34th Annual Computer Security Applications Conference (ACSAC), 2018.