summaryrefslogtreecommitdiffstats
path: root/fs/io-wq.h
diff options
context:
space:
mode:
authorYiFei Zhu <yifeifz2@illinois.edu>2020-10-11 10:47:42 -0500
committerKees Cook <keescook@chromium.org>2020-11-20 11:16:34 -0800
commitf9d480b6ffbeb336bf7f6ce44825c00f61b3abae (patch)
tree867da40111155af509560ef88831ccf9c73a6615 /fs/io-wq.h
parent09162bc32c880a791c6c0668ce0745cf7958f576 (diff)
downloadlinux-f9d480b6ffbeb336bf7f6ce44825c00f61b3abae.tar.bz2
seccomp/cache: Lookup syscall allowlist bitmap for fast path
The overhead of running Seccomp filters has been part of some past discussions [1][2][3]. Oftentimes, the filters have a large number of instructions that check syscall numbers one by one and jump based on that. Some users chain BPF filters which further enlarge the overhead. A recent work [6] comprehensively measures the Seccomp overhead and shows that the overhead is non-negligible and has a non-trivial impact on application performance. We observed some common filters, such as docker's [4] or systemd's [5], will make most decisions based only on the syscall numbers, and as past discussions considered, a bitmap where each bit represents a syscall makes most sense for these filters. The fast (common) path for seccomp should be that the filter permits the syscall to pass through, and failing seccomp is expected to be an exceptional case; it is not expected for userspace to call a denylisted syscall over and over. When it can be concluded that an allow must occur for the given architecture and syscall pair (this determination is introduced in the next commit), seccomp will immediately allow the syscall, bypassing further BPF execution. Each architecture number has its own bitmap. The architecture number in seccomp_data is checked against the defined architecture number constant before proceeding to test the bit against the bitmap with the syscall number as the index of the bit in the bitmap, and if the bit is set, seccomp returns allow. The bitmaps are all clear in this patch and will be initialized in the next commit. When only one architecture exists, the check against architecture number is skipped, suggested by Kees Cook [7]. [1] https://lore.kernel.org/linux-security-module/c22a6c3cefc2412cad00ae14c1371711@huawei.com/T/ [2] https://lore.kernel.org/lkml/202005181120.971232B7B@keescook/T/ [3] https://github.com/seccomp/libseccomp/issues/116 [4] https://github.com/moby/moby/blob/ae0ef82b90356ac613f329a8ef5ee42ca923417d/profiles/seccomp/default.json [5] https://github.com/systemd/systemd/blob/6743a1caf4037f03dc51a1277855018e4ab61957/src/shared/seccomp-util.c#L270 [6] Draco: Architectural and Operating System Support for System Call Security https://tianyin.github.io/pub/draco.pdf, MICRO-53, Oct. 2020 [7] https://lore.kernel.org/bpf/202010091614.8BB0EB64@keescook/ Co-developed-by: Dimitrios Skarlatos <dskarlat@cs.cmu.edu> Signed-off-by: Dimitrios Skarlatos <dskarlat@cs.cmu.edu> Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu> Reviewed-by: Jann Horn <jannh@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/10f91a367ec4fcdea7fc3f086de3f5f13a4a7436.1602431034.git.yifeifz2@illinois.edu
Diffstat (limited to 'fs/io-wq.h')
0 files changed, 0 insertions, 0 deletions