kfence: default to dynamic branch instead of static keys mode

We have observed that on very large machines with newer CPUs, the static key/branch switching delay is on the order of milliseconds. This is due to the required broadcast IPIs, which simply does not scale well to hundreds of CPUs (cores). If done too frequently, this can adversely affect tail latencies of various workloads. One workaround is to increase the sample interval to several seconds, while decreasing sampled allocation coverage, but the problem still exists and could still increase tail latencies. As already noted in the Kconfig help text, there are trade-offs: at lower sample intervals the dynamic branch results in better performance; however, at very large sample intervals, the static keys mode can result in better performance -- careful benchmarking is recommended. Our initial benchmarking showed that with large enough sample intervals and workloads stressing the allocator, the static keys mode was slightly better. Evaluating and observing the possible system-wide side-effects of the static-key-switching induced broadcast IPIs, however, was a blind spot (in particular on large machines with 100s of cores). Therefore, a major downside of the static keys mode is, unfortunately, that it is hard to predict performance on new system architectures and topologies, but also making conclusions about performance of new workloads based on a limited set of benchmarks. Most distributions will simply select the defaults, while targeting a large variety of different workloads and system architectures. As such, the better default is CONFIG_KFENCE_STATIC_KEYS=n, and re-enabling it is only recommended after careful evaluation. For reference, on x86-64 the condition in kfence_alloc() generates exactly 2 instructions in the kmem_cache_alloc() fast-path: | ... | cmpl $0x0,0x1a8021c(%rip) # ffffffff82d560d0 <kfence_allocation_gate> | je ffffffff812d6003 <kmem_cache_alloc+0x243> | ... which, given kfence_allocation_gate is infrequently modified, should be well predicted by most CPUs. Link: https://lkml.kernel.org/r/20211019102524.2807208-2-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
author: Marco Elver <elver@google.com> 2021-11-05 13:45:49 -0700
committer: Linus Torvalds <torvalds@linux-foundation.org> 2021-11-06 13:30:43 -0700
commit: 4f612ed3f748962cbef1316ff3d323e2b9055b6e (patch)
tree: e169a92246e0e93b1a4e6ac6be8bccf1e552db1e /Documentation/dev-tools
parent: 07e8481d3c38f461d7b79c1d5c9afe013b162b0c (diff)
download: linux-4f612ed3f748962cbef1316ff3d323e2b9055b6e.tar.bz2
1 files changed, 8 insertions, 4 deletions
diff --git a/Documentation/dev-tools/kfence.rst b/Documentation/dev-tools/kfence.rst
index d45f952986ae..ac6b89d1a8c3 100644
--- a/Documentation/dev-tools/kfence.rst
+++ b/Documentation/dev-tools/kfence.rst
@@ -231,10 +231,14 @@ Guarded allocations are set up based on the sample interval. After expiration
 of the sample interval, the next allocation through the main allocator (SLAB or
 SLUB) returns a guarded allocation from the KFENCE object pool (allocation
 sizes up to PAGE_SIZE are supported). At this point, the timer is reset, and
-the next allocation is set up after the expiration of the interval. To "gate" a
-KFENCE allocation through the main allocator's fast-path without overhead,
-KFENCE relies on static branches via the static keys infrastructure. The static
-branch is toggled to redirect the allocation to KFENCE.
+the next allocation is set up after the expiration of the interval.
+
+When using ``CONFIG_KFENCE_STATIC_KEYS=y``, KFENCE allocations are "gated"
+through the main allocator's fast-path by relying on static branches via the
+static keys infrastructure. The static branch is toggled to redirect the
+allocation to KFENCE. Depending on sample interval, target workloads, and
+system architecture, this may perform better than the simple dynamic branch.
+Careful benchmarking is recommended.
 
 KFENCE objects each reside on a dedicated page, at either the left or right
 page boundaries selected at random. The pages to the left and right of the
author	Marco Elver <elver@google.com>	2021-11-05 13:45:49 -0700
committer	Linus Torvalds <torvalds@linux-foundation.org>	2021-11-06 13:30:43 -0700
commit	4f612ed3f748962cbef1316ff3d323e2b9055b6e (patch)
tree	e169a92246e0e93b1a4e6ac6be8bccf1e552db1e /Documentation/dev-tools
parent	07e8481d3c38f461d7b79c1d5c9afe013b162b0c (diff)
download	linux-4f612ed3f748962cbef1316ff3d323e2b9055b6e.tar.bz2