summaryrefslogtreecommitdiffstats
path: root/net/sched
AgeCommit message (Collapse)AuthorFilesLines
2011-02-24netem: define NETEM_DIST_MAXstephen hemminger1-1/+1
Rather than magic constant in code, expose the maximum size of packet distribution table in API. In iproute2, q_netem defines MAX_DIST as 16K already. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-24netem: use vmalloc for distribution tablestephen hemminger1-4/+18
The netem probability table can be large (up to 64K bytes) which may be too large to allocate in one contiguous chunk. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-24netem: cleanup dump codestephen hemminger1-6/+3
Use nla_put_nested to update netlink attribute value. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-23em_meta: fix sparse warningstephen hemminger1-1/+1
gfp_t needs to be cast to integer. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-23mqprio: cleanupsstephen hemminger1-2/+4
* make qdisc_ops local * add sparse annotation about expected unlock/unlock in dump_class_stats * fix indentation Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-23net_sched: SFB flow schedulerEric Dumazet3-0/+721
This is the Stochastic Fair Blue scheduler, based on work from : W. Feng, D. Kandlur, D. Saha, K. Shin. Blue: A New Class of Active Queue Management Algorithms. U. Michigan CSE-TR-387-99, April 1999. http://www.thefengs.com/wuchang/blue/CSE-TR-387-99.pdf This implementation is based on work done by Juliusz Chroboczek General SFB algorithm can be found in figure 14, page 15: B[l][n] : L x N array of bins (L levels, N bins per level) enqueue() Calculate hash function values h{0}, h{1}, .. h{L-1} Update bins at each level for i = 0 to L - 1 if (B[i][h{i}].qlen > bin_size) B[i][h{i}].p_mark += p_increment; else if (B[i][h{i}].qlen == 0) B[i][h{i}].p_mark -= p_decrement; p_min = min(B[0][h{0}].p_mark ... B[L-1][h{L-1}].p_mark); if (p_min == 1.0) ratelimit(); else mark/drop with probabilty p_min; I did the adaptation of Juliusz code to meet current kernel standards, and various changes to address previous comments : http://thread.gmane.org/gmane.linux.network/90225 http://thread.gmane.org/gmane.linux.network/90375 Default flow classifier is the rxhash introduced by RPS in 2.6.35, but we can use an external flow classifier if wanted. tc qdisc add dev $DEV parent 1:11 handle 11: \ est 0.5sec 2sec sfb limit 128 tc filter add dev $DEV protocol ip parent 11: handle 3 \ flow hash keys dst divisor 1024 Notes: 1) SFB default child qdisc is pfifo_fast. It can be changed by another qdisc but a child qdisc MUST not drop a packet previously queued. This is because SFB needs to handle a dequeued packet in order to maintain its virtual queue states. pfifo_head_drop or CHOKe should not be used. 2) ECN is enabled by default, unlike RED/CHOKe/GRED With help from Patrick McHardy & Andi Kleen Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Juliusz Chroboczek <Juliusz.Chroboczek@pps.jussieu.fr> CC: Stephen Hemminger <shemminger@vyatta.com> CC: Patrick McHardy <kaber@trash.net> CC: Andi Kleen <andi@firstfloor.org> CC: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22cls_u32: fix sparse warningsstephen hemminger1-6/+6
The variable _data is used in asm-generic to define sections which causes sparse warnings, so just rename the variable. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-14sch_mqprio: Always set num_tc to 0 in mqprio_destroy()Ben Hutchings1-7/+7
All the cleanup code in mqprio_destroy() is currently conditional on priv->qdiscs being non-null, but that condition should only apply to the per-queue qdisc cleanup. We should always set the number of traffic classes back to 0 here. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2011-02-02sch_choke: Need linux/vmalloc.hDavid S. Miller1-0/+1
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-02sched: CHOKe flow schedulerstephen hemminger3-0/+689
CHOKe ("CHOose and Kill" or "CHOose and Keep") is an alternative packet scheduler based on the Random Exponential Drop (RED) algorithm. The core idea is: For every packet arrival: Calculate Qave if (Qave < minth) Queue the new packet else Select randomly a packet from the queue if (both packets from same flow) then Drop both the packets else if (Qave > maxth) Drop packet else Admit packet with proability p (same as RED) See also: Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active queue management scheme for approximating fair bandwidth allocation", Proceeding of INFOCOM'2000, March 2000. Help from: Eric Dumazet <eric.dumazet@gmail.com> Patrick McHardy <kaber@trash.net> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-02sfq: deadlock in error pathstephen hemminger1-4/+5
The change to allow divisor to be a parameter (in 2.6.38-rc1) commit 817fb15dfd988d8dda916ee04fa506f0c466b9d6 introduced a possible deadlock caught by sparse. The scheduler tree lock was left locked in the case of an incorrect divisor value. Simplest fix is to move test outside of lock which also solves problem of partial update. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-26net_sched: sch_mqprio: dont leak kernel memoryEric Dumazet1-1/+1
mqprio_dump() should make sure all fields of struct tc_mqprio_qopt are initialized. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24Merge branch 'master' of ↵David S. Miller13-30/+24
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/sched/sch_hfsc.c net/sched/sch_htb.c net/sched/sch_tbf.c
2011-01-24Merge branch 'master' of ↵David S. Miller1-1/+1
master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
2011-01-21net_sched: TCQ_F_CAN_BYPASS generalizationEric Dumazet5-6/+20
Now qdisc stab is handled before TCQ_F_CAN_BYPASS test in __dev_xmit_skb(), we can generalize TCQ_F_CAN_BYPASS to other qdiscs than pfifo_fast : pfifo, bfifo, pfifo_head_drop and sfq SFQ is special because it can have external classifiers, and in these cases, we cannot bypass queue discipline (packet could be dropped by classifier) without admin asking it, or further changes. Its worth doing this, especially for SFQ, avoiding dirtying memory in case no packets are already waiting in queue. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20net_sched: accurate bytes/packets stats/ratesEric Dumazet13-30/+24
In commit 44b8288308ac9d (net_sched: pfifo_head_drop problem), we fixed a problem with pfifo_head drops that incorrectly decreased sch->bstats.bytes and sch->bstats.packets Several qdiscs (CHOKe, SFQ, pfifo_head, ...) are able to drop a previously enqueued packet, and bstats cannot be changed, so bstats/rates are not accurate (over estimated) This patch changes the qdisc_bstats updates to be done at dequeue() time instead of enqueue() time. bstats counters no longer account for dropped frames, and rates are more correct, since enqueue() bursts dont have effect on dequeue() rate. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20net_sched: RCU conversion of stabEric Dumazet2-10/+18
This patch converts stab qdisc management to RCU, so that we can perform the qdisc_calculate_pkt_len() call before getting qdisc lock. This shortens the lock's held time in __dev_xmit_skb(). This permits more qdiscs to get TCQ_F_CAN_BYPASS status, avoiding lot of cache misses and so reducing latencies. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McHardy <kaber@trash.net> CC: Jesper Dangaard Brouer <hawk@diku.dk> CC: Jarek Poplawski <jarkao2@gmail.com> CC: Jamal Hadi Salim <hadi@cyberus.ca> CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20net_sched: move TCQ_F_THROTTLED flagEric Dumazet6-11/+11
In commit 371121057607e (net: QDISC_STATE_RUNNING dont need atomic bit ops) I moved QDISC_STATE_RUNNING flag to __state container, located in the cache line containing qdisc lock and often dirtied fields. I now move TCQ_F_THROTTLED bit too, so that we let first cache line read mostly, and shared by all cpus. This should speedup HTB/CBQ for example. Not using test_bit()/__clear_bit()/__test_and_set_bit allows to use an "unsigned int" for __state container, reducing by 8 bytes Qdisc size. Introduce helpers to hide implementation details. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McHardy <kaber@trash.net> CC: Jesper Dangaard Brouer <hawk@diku.dk> CC: Jarek Poplawski <jarkao2@gmail.com> CC: Jamal Hadi Salim <hadi@cyberus.ca> CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20net_sched: sfq: allow divisor to be a parameterEric Dumazet1-12/+30
SFQ currently uses a 1024 slots hash table, and its internal structure (sfq_sched_data) allocation needs order-1 page on x86_64 Allow tc command to specify a divisor value (hash table size), between 1 and 65536. If no value is provided, assume the 1024 default size. This allows admins to setup smaller (or bigger) SFQ for specific needs. This also brings back sfq_sched_data allocations to order-0 ones, saving 3KB per SFQ qdisc. Jesper uses ~55.000 SFQ in one machine, this patch should free 165 MB of memory. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McHardy <kaber@trash.net> CC: Jesper Dangaard Brouer <hawk@diku.dk> CC: Jarek Poplawski <jarkao2@gmail.com> CC: Jamal Hadi Salim <hadi@cyberus.ca> CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20Merge branch 'master' of ↵David S. Miller3-6/+3
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
2011-01-19net_sched: cleanupsEric Dumazet41-801/+842
Cleanup net/sched code to current CodingStyle and practices. Reduce inline abuse Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19net_sched: implement a root container qdisc sch_mqprioJohn Fastabend4-0/+434
This implements a mqprio queueing discipline that by default creates a pfifo_fast qdisc per tx queue and provides the needed configuration interface. Using the mqprio qdisc the number of tcs currently in use along with the range of queues alloted to each class can be configured. By default skbs are mapped to traffic classes using the skb priority. This mapping is configurable. Configurable parameters, struct tc_mqprio_qopt { __u8 num_tc; __u8 prio_tc_map[TC_BITMASK + 1]; __u8 hw; __u16 count[TC_MAX_QUEUE]; __u16 offset[TC_MAX_QUEUE]; }; Here the count/offset pairing give the queue alignment and the prio_tc_map gives the mapping from skb->priority to tc. The hw bit determines if the hardware should configure the count and offset values. If the hardware bit is set then the operation will fail if the hardware does not implement the ndo_setup_tc operation. This is to avoid undetermined states where the hardware may or may not control the queue mapping. Also minimal bounds checking is done on the count/offset to verify a queue does not exceed num_tx_queues and that queue ranges do not overlap. Otherwise it is left to user policy or hardware configuration to create useful mappings. It is expected that hardware QOS schemes can be implemented by creating appropriate mappings of queues in ndo_tc_setup(). One expected use case is drivers will use the ndo_setup_tc to map queue ranges onto 802.1Q traffic classes. This provides a generic mechanism to map network traffic onto these traffic classes and removes the need for lower layer drivers to know specifics about traffic types. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19Merge branch 'master' of /repos/git/net-next-2.6Patrick McHardy23-71/+54
2011-01-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds1-5/+21
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits) GRETH: resolve SMP issues and other problems GRETH: handle frame error interrupts GRETH: avoid writing bad speed/duplex when setting transfer mode GRETH: fixed skb buffer memory leak on frame errors GRETH: GBit transmit descriptor handling optimization GRETH: fix opening/closing GRETH: added raw AMBA vendor/device number to match against. cassini: Fix build bustage on x86. e1000e: consistent use of Rx/Tx vs. RX/TX/rx/tx in comments/logs e1000e: update Copyright for 2011 e1000: Avoid unhandled IRQ r8169: keep firmware in memory. netdev: tilepro: Use is_unicast_ether_addr helper etherdevice.h: Add is_unicast_ether_addr function ks8695net: Use default implementation of ethtool_ops::get_link ks8695net: Disable non-working ethtool operations USB CDC NCM: Don't deref NULL in cdc_ncm_rx_fixup() and don't use uninitialized variable. vxge: Remember to release firmware after upgrading firmware netdev: bfin_mac: Remove is_multicast_ether_addr use in netdev_for_each_mc_addr ipsec: update MAX_AH_AUTH_LEN to support sha512 ...
2011-01-14Merge branch 'master' of git://1984.lsi.us.es/net-next-2.6Patrick McHardy5-120/+218
Conflicts: net/ipv4/route.c Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-14netfilter: fix Kconfig dependenciesPatrick McHardy3-6/+3
Fix dependencies of netfilter realm match: it depends on NET_CLS_ROUTE, which itself depends on NET_SCHED; this dependency is missing from netfilter. Since matching on realms is also useful without having NET_SCHED enabled and the option really only controls whether the tclassid member is included in route and dst entries, rename the config option to IP_ROUTE_CLASSID and move it outside of traffic scheduling context to get rid of the NET_SCHED dependeny. Reported-by: Vladis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-13net: remove dev_txq_stats_fold()Eric Dumazet1-5/+21
After recent changes, (percpu stats on vlan/tunnels...), we dont need anymore per struct netdev_queue tx_bytes/tx_packets/tx_dropped counters. Only remaining users are ixgbe, sch_teql, gianfar & macvlan : 1) ixgbe can be converted to use existing tx_ring counters. 2) macvlan incremented txq->tx_dropped, it can use the dev->stats.tx_dropped counter. 3) sch_teql : almost revert ab35cd4b8f42 (Use net_device internal stats) Now we have ndo_get_stats64(), use it, even for "unsigned long" fields (No need to bring back a struct net_device_stats) 4) gianfar adds a stats structure per tx queue to hold tx_bytes/tx_packets This removes a lockdep warning (and possible lockup) in rndis gadget, calling dev_get_stats() from hard IRQ context. Ref: http://www.spinics.net/lists/netdev/msg149202.html Reported-by: Neil Jones <neiljay@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jarek Poplawski <jarkao2@gmail.com> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Sandeep Gopalpet <sandeep.kumar@freescale.com> CC: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-13Merge branch 'for-next' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits) Documentation/trace/events.txt: Remove obsolete sched_signal_send. writeback: fix global_dirty_limits comment runtime -> real-time ppc: fix comment typo singal -> signal drivers: fix comment typo diable -> disable. m68k: fix comment typo diable -> disable. wireless: comment typo fix diable -> disable. media: comment typo fix diable -> disable. remove doc for obsolete dynamic-printk kernel-parameter remove extraneous 'is' from Documentation/iostats.txt Fix spelling milisec -> ms in snd_ps3 module parameter description Fix spelling mistakes in comments Revert conflicting V4L changes i7core_edac: fix typos in comments mm/rmap.c: fix comment sound, ca0106: Fix assignment to 'channel'. hrtimer: fix a typo in comment init/Kconfig: fix typo anon_inodes: fix wrong function name in comment fix comment typos concerning "consistent" poll: fix a typo in comment ... Fix up trivial conflicts in: - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c) - fs/ext4/ext4.h Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.
2011-01-10net_sched: factorize qdisc stats handlingEric Dumazet22-65/+32
HTB takes into account skb is segmented in stats updates. Generalize this to all schedulers. They should use qdisc_bstats_update() helper instead of manipulating bstats.bytes and bstats.packets Add bstats_update() helper too for classes that use gnet_stats_basic_packed fields. Note : Right now, TCQ_F_CAN_BYPASS shortcurt can be taken only if no stab is setup on qdisc. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-05net_sched: pfifo_head_drop problemEric Dumazet1-2/+0
commit 57dbb2d83d100ea (sched: add head drop fifo queue) introduced pfifo_head_drop, and broke the invariant that sch->bstats.bytes and sch->bstats.packets are COUNTER (increasing counters only) This can break estimators because est_timer() handles unsigned deltas only. A decreasing counter can then give a huge unsigned delta. My mid term suggestion would be to change things so that sch->bstats.bytes and sch->bstats.packets are incremented in dequeue() only, not at enqueue() time. We also could add drop_bytes/drop_packets and provide estimations of drop rates. It would be more sensible anyway for very low speeds, and big bursts. Right now, if we drop packets, they still are accounted in byte/packets abolute counters and rate estimators. Before this mid term change, this patch makes pfifo_head_drop behavior similar to other qdiscs in case of drops : Dont decrement sch->bstats.bytes and sch->bstats.packets Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-03sch_red: report backlog informationEric Dumazet1-0/+1
Provide child qdisc backlog (byte count) information so that "tc -s qdisc" can report it to user. packet count is already correctly provided. qdisc red 11: parent 1:11 limit 60Kb min 15Kb max 45Kb ecn Sent 3116427684 bytes 1415782 pkt (dropped 8, overlimits 7866 requeues 0) rate 242385Kbit 13630pps backlog 13560b 8p requeues 0 marked 7865 early 1 pdrop 7 other 0 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-31sfq: fix slot_dequeue_head()Eric Dumazet1-2/+4
slot_dequeue_head() should make sure slot skb chain is correct in both ways, or we can crash if all possible flows are in use. Jarek pointed out slot_queue_init() can now be done in sfq_init() once, instead each time a flow is setup. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-31sch_sfq: allow big packets and be fairEric Dumazet1-7/+19
SFQ is currently 'limited' to small packets, because it uses a 15bit allotment number per flow. Introduce a scale by 8, so that we can handle full size TSO/GRO packets. Use appropriate handling to make sure allot is positive before a new packet is dequeued, so that fairness is respected. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Jarek Poplawski <jarkao2@gmail.com> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-22sfq: fix sfq class stats handlingEric Dumazet1-5/+11
sfq_walk() runs without qdisc lock. By the time it selects a non empty hash slot and sfq_dump_class_stats() is run (with lock held), slot might have been freed : We then access q->slots[SFQ_EMPTY_SLOT], out of bounds, and crash in slot_queue_walk() On previous kernels, bug is here but out of bounds qs[SFQ_DEPTH] and allot[SFQ_DEPTH] are located in struct sfq_sched_data, so no illegal memory access happens, only possibly wrong data reported to user. Also, slot_dequeue_tail() should make sure slot skb chain is correctly terminated, or sfq_dump_class_stats() can access freed skbs. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-22Merge branch 'master' into for-nextJiri Kosina3-3/+6
Conflicts: MAINTAINERS arch/arm/mach-omap2/pm24xx.c drivers/scsi/bfa/bfa_fcpim.c Needed to update to apply fixes for which the old branch was too outdated.
2010-12-20net_sched: sch_sfq: better struct layoutsEric Dumazet1-98/+162
Here is a respin of patch. I'll send a short patch to make SFQ more fair in presence of large packets as well. Thanks [PATCH v3 net-next-2.6] net_sched: sch_sfq: better struct layouts This patch shrinks sizeof(struct sfq_sched_data) from 0x14f8 (or more if spinlocks are bigger) to 0x1180 bytes, and reduce text size as well. text data bss dec hex filename 4821 152 0 4973 136d old/net/sched/sch_sfq.o 4627 136 0 4763 129b new/net/sched/sch_sfq.o All data for a slot/flow is now grouped in a compact and cache friendly structure, instead of being spreaded in many different points. struct sfq_slot { struct sk_buff *skblist_next; struct sk_buff *skblist_prev; sfq_index qlen; /* number of skbs in skblist */ sfq_index next; /* next slot in sfq chain */ struct sfq_head dep; /* anchor in dep[] chains */ unsigned short hash; /* hash value (index in ht[]) */ short allot; /* credit for this slot */ }; Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jarek Poplawski <jarkao2@gmail.com> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-20Merge branch 'master' of ↵David S. Miller1-12/+8
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
2010-12-20net_sched: sch_sfq: fix allot handlingEric Dumazet1-12/+8
When deploying SFQ/IFB here at work, I found the allot management was pretty wrong in sfq, even changing allot from short to int... We should init allot for each new flow, not using a previous value found in slot. Before patch, I saw bursts of several packets per flow, apparently denying the default "quantum 1514" limit I had on my SFQ class. class sfq 11:1 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 7p requeues 0 allot 11546 class sfq 11:46 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 1p requeues 0 allot -23873 class sfq 11:78 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 5p requeues 0 allot 11393 After patch, better fairness among each flow, allot limit being respected, allot is positive : class sfq 11:e parent 11: (dropped 0, overlimits 0 requeues 86) backlog 0b 3p requeues 86 allot 596 class sfq 11:94 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 3p requeues 0 allot 1468 class sfq 11:a4 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 4p requeues 0 allot 650 class sfq 11:bb parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 3p requeues 0 allot 596 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-20net_sched: sch_sfq: add backlog info in sfq_dump_class_stats()Eric Dumazet1-1/+6
We currently return for each active SFQ slot the number of packets in queue. We can also give number of bytes accounted for these packets. tc -s class show dev ifb0 Before patch : class sfq 11:3d9 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 3p requeues 0 allot 1266 After patch : class sfq 11:3e4 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 4380b 3p requeues 0 allot 1212 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-16net: factorize sync-rcu call in unregister_netdevice_manyOctavian Purdila1-7/+22
Add dev_close_many and dev_deactivate_many to factorize another sync-rcu operation on the netdevice unregister path. $ modprobe dummy numdummies=10000 $ ip link set dev dummy* up $ time rmmod dummy Without the patch With the patch real 0m 24.63s real 0m 5.15s user 0m 0.00s user 0m 0.00s sys 0m 6.05s sys 0m 5.14s Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-01net sched: use xps information for qdisc NUMA affinityEric Dumazet1-1/+3
Allocate qdisc memory according to NUMA properties of cpus included in xps map. To be effective, qdisc should be (re)setup after changes of /sys/class/net/eth<n>/queues/tx-<n>/xps_cpus I added a numa_node field in struct netdev_queue, containing NUMA node if all cpus included in xps_cpus share same node, else -1. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28net: add netif_tx_queue_frozen_or_stoppedEric Dumazet2-7/+4
When testing struct netdev_queue state against FROZEN bit, we also test XOFF bit. We can test both bits at once and save some cycles. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15Docs/Kconfig: Update: osdl.org -> linuxfoundation.orgMichael Witten1-1/+1
Some of the documentation refers to web pages under the domain `osdl.org'. However, `osdl.org' now redirects to `linuxfoundation.org'. Rather than rely on redirections, this patch updates the addresses appropriately; for the most part, only documentation that is meant to be current has been updated. The patch should be pretty quick to scan and check; each new web-page url was gotten by trying out the original URL in a browser and then simply copying the the redirected URL (formatting as necessary). There is some conflict as to which one of these domain names is preferred: linuxfoundation.org linux-foundation.org So, I wrote: info@linuxfoundation.org and got this reply: Message-ID: <4CE17EE6.9040807@linuxfoundation.org> Date: Mon, 15 Nov 2010 10:41:42 -0800 From: David Ames <david@linuxfoundation.org> ... linuxfoundation.org is preferred. The canonical name for our web site is www.linuxfoundation.org. Our list site is actually lists.linux-foundation.org. Regarding email linuxfoundation.org is preferred there are a few people who choose to use linux-foundation.org for their own reasons. Consequently, I used `linuxfoundation.org' for web pages and `lists.linux-foundation.org' for mailing-list web pages and email addresses; the only personal email address I updated from `@osdl.org' was that of Andrew Morton, who prefers `linux-foundation.org' according `git log'. Signed-off-by: Michael Witten <mfwitten@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-11-08classifier: report statistics for basic classifierstephen hemminger1-0/+4
The basic classifier keeps statistics but does not report it to user space. This showed up when using basic classifier (with police) as a default catch all on ingress; no statistics were reported. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-03cls_cgroup: Fix crash on module unloadHerbert Xu1-2/+0
Somewhere along the lines net_cls_subsys_id became a macro when cls_cgroup is built as a module. Not only did it make cls_cgroup completely useless, it also causes it to crash on module unload. This patch fixes this by removing that macro. Thanks to Eric Dumazet for diagnosing this problem. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-31text ematch: check for NULL pointer before destroying textsearch configThomas Graf1-1/+2
While validating the configuration em_ops is already set, thus the individual destroy functions are called, but the ematch data has not been allocated and associated with the ematch yet. Signed-off-by: Thomas Graf <tgraf@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds21-117/+752
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1699 commits) bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL. vlan: Calling vlan_hwaccel_do_receive() is always valid. tproxy: use the interface primary IP address as a default value for --on-ip tproxy: added IPv6 support to the socket match cxgb3: function namespace cleanup tproxy: added IPv6 support to the TPROXY target tproxy: added IPv6 socket lookup function to nf_tproxy_core be2net: Changes to use only priority codes allowed by f/w tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled tproxy: added tproxy sockopt interface in the IPV6 layer tproxy: added udp6_lib_lookup function tproxy: added const specifiers to udp lookup functions tproxy: split off ipv6 defragmentation to a separate module l2tp: small cleanup nf_nat: restrict ICMP translation for embedded header can: mcp251x: fix generation of error frames can: mcp251x: fix endless loop in interrupt handler if CANINTF_MERRF is set can-raw: add msg_flags to distinguish local traffic 9p: client code cleanup rds: make local functions/variables static ... Fix up conflicts in net/core/dev.c, drivers/net/pcmcia/smc91c92_cs.c and drivers/net/wireless/ath/ath9k/debug.c as per David
2010-10-21Merge branch 'master' of ↵David S. Miller1-7/+7
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
2010-10-21net_sched: remove the unused parameter of qdisc_create_dflt()Changli Gao12-39/+27
The first parameter dev isn't in use in qdisc_create_dflt(). Signed-off-by: Changli Gao <xiaosuo@gmail.com> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-21net/sched: fix missing spinlock initEric Dumazet1-0/+2
Under network load, doing : tc qdisc del dev eth0 root triggers : [ 167.193087] BUG: spinlock bad magic on CPU#3, udpflood/4928 [ 167.193139] lock: c15bc324, .magic: 00000000, .owner: <none>/-1, .owner_cpu: -1 [ 167.193193] Pid: 4928, comm: udpflood Not tainted 2.6.36-rc7-11417-g215340c-dirty #323 [ 167.193245] Call Trace: [ 167.193292] [<c13abaa0>] ? printk+0x18/0x20 [ 167.193342] [<c11afb53>] spin_bug+0xa3/0xf0 [ 167.193389] [<c11afcdd>] do_raw_spin_lock+0x7d/0x160 [ 167.193440] [<c1313d4e>] ? __dev_xmit_skb+0x27e/0x2b0 [ 167.193496] [<c107382b>] ? trace_hardirqs_on+0xb/0x10 [ 167.193545] [<c13ae99a>] _raw_spin_lock+0x3a/0x40 [ 167.193593] [<c1313d4e>] ? __dev_xmit_skb+0x27e/0x2b0 [ 167.193641] [<c1313d4e>] __dev_xmit_skb+0x27e/0x2b0 commit 79640a4ca695 (add additional lock to qdisc to increase throughput) forgot to initialize noop_qdisc and noqueue_qdisc busylock Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>