summaryrefslogtreecommitdiffstats
path: root/kernel/sched/sched.h
diff options
context:
space:
mode:
authorXunlei Pang <xlpang@linux.alibaba.com>2018-06-20 18:18:33 +0800
committerIngo Molnar <mingo@kernel.org>2018-07-03 09:17:29 +0200
commit512ac999d2755d2b7109e996a76b6fb8b888631d (patch)
treedc62292f785137cf4f606fcd9af7c1b23ebb4627 /kernel/sched/sched.h
parent296b2ffe7fa9ed756c41415c6b1512bc4ad687b1 (diff)
downloadlinux-512ac999d2755d2b7109e996a76b6fb8b888631d.tar.bz2
sched/fair: Fix bandwidth timer clock drift condition
I noticed that cgroup task groups constantly get throttled even if they have low CPU usage, this causes some jitters on the response time to some of our business containers when enabling CPU quotas. It's very simple to reproduce: mkdir /sys/fs/cgroup/cpu/test cd /sys/fs/cgroup/cpu/test echo 100000 > cpu.cfs_quota_us echo $$ > tasks then repeat: cat cpu.stat | grep nr_throttled # nr_throttled will increase steadily After some analysis, we found that cfs_rq::runtime_remaining will be cleared by expire_cfs_rq_runtime() due to two equal but stale "cfs_{b|q}->runtime_expires" after period timer is re-armed. The current condition to judge clock drift in expire_cfs_rq_runtime() is wrong, the two runtime_expires are actually the same when clock drift happens, so this condtion can never hit. The orginal design was correctly done by this commit: a9cf55b28610 ("sched: Expire invalid runtime") ... but was changed to be the current implementation due to its locking bug. This patch introduces another way, it adds a new field in both structures cfs_rq and cfs_bandwidth to record the expiration update sequence, and uses them to figure out if clock drift happens (true if they are equal). Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ben Segall <bsegall@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 51f2176d74ac ("sched/fair: Fix unlocked reads of some cfs_b->quota/period") Link: http://lkml.kernel.org/r/20180620101834.24455-1-xlpang@linux.alibaba.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Diffstat (limited to 'kernel/sched/sched.h')
-rw-r--r--kernel/sched/sched.h6
1 files changed, 4 insertions, 2 deletions
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 27ddec334601..c7742dcc136c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -334,9 +334,10 @@ struct cfs_bandwidth {
u64 runtime;
s64 hierarchical_quota;
u64 runtime_expires;
+ int expires_seq;
- int idle;
- int period_active;
+ short idle;
+ short period_active;
struct hrtimer period_timer;
struct hrtimer slack_timer;
struct list_head throttled_cfs_rq;
@@ -551,6 +552,7 @@ struct cfs_rq {
#ifdef CONFIG_CFS_BANDWIDTH
int runtime_enabled;
+ int expires_seq;
u64 runtime_expires;
s64 runtime_remaining;