diff options
author | Eric Dumazet <edumazet@google.com> | 2016-06-06 09:37:16 -0700 |
---|---|---|
committer | David S. Miller <davem@davemloft.net> | 2016-06-07 16:37:14 -0700 |
commit | edb09eb17ed89eaa82a52dd306beac93e292b485 (patch) | |
tree | 1f241506d6b781b65d1033925f1c1ce6a39c3394 /net/sched/act_api.c | |
parent | f9eb8aea2a1e12fc2f584d1627deeb957435a801 (diff) | |
download | linux-edb09eb17ed89eaa82a52dd306beac93e292b485.tar.bz2 |
net: sched: do not acquire qdisc spinlock in qdisc/class stats dump
Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by Google BwE host
agent [1] are problematic at scale :
For each qdisc/class found in the dump, we currently lock the root qdisc
spinlock in order to get stats. Sampling stats every 5 seconds from
thousands of HTB classes is a challenge when the root qdisc spinlock is
under high pressure. Not only the dumps take time, they also slow
down the fast path (queue/dequeue packets) by 10 % to 20 % in some cases.
An audit of existing qdiscs showed that sch_fq_codel is the only qdisc
that might need the qdisc lock in fq_codel_dump_stats() and
fq_codel_dump_class_stats()
In v2 of this patch, I now use the Qdisc running seqcount to provide
consistent reads of packets/bytes counters, regardless of 32/64 bit arches.
I also changed rate estimators to use the same infrastructure
so that they no longer need to lock root qdisc lock.
[1]
http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdf
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Kevin Athey <kda@google.com>
Cc: Xiaotian Pei <xiaotian@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'net/sched/act_api.c')
-rw-r--r-- | net/sched/act_api.c | 4 |
1 files changed, 2 insertions, 2 deletions
diff --git a/net/sched/act_api.c b/net/sched/act_api.c index 719bc2e85852..b6db56ec8117 100644 --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -287,7 +287,7 @@ err2: if (est) { err = gen_new_estimator(&p->tcfc_bstats, p->cpu_bstats, &p->tcfc_rate_est, - &p->tcfc_lock, est); + &p->tcfc_lock, NULL, est); if (err) { free_percpu(p->cpu_qstats); goto err2; @@ -671,7 +671,7 @@ int tcf_action_copy_stats(struct sk_buff *skb, struct tc_action *a, if (err < 0) goto errout; - if (gnet_stats_copy_basic(&d, p->cpu_bstats, &p->tcfc_bstats) < 0 || + if (gnet_stats_copy_basic(NULL, &d, p->cpu_bstats, &p->tcfc_bstats) < 0 || gnet_stats_copy_rate_est(&d, &p->tcfc_bstats, &p->tcfc_rate_est) < 0 || gnet_stats_copy_queue(&d, p->cpu_qstats, |