summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2013-04-07decnet: remove duplicated include from dn_table.cWei Yongjun1-1/+0
Remove duplicated include. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07net_cls: remove duplicated include from cls_api.cWei Yongjun1-1/+0
Remove duplicated include. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-076lowpan: handle dev_queue_xmit() error code properlyAlan Ott1-2/+2
dev_queue_xmit() will return a positive value if the packet could not be queued, often because the real network device (in our case the mac802154 wpan device) has its queue stopped. lowpan_xmit() should handle the positive return code (for the debug statement) and return that value to the higher layer so the higher layer will retry sending the packet. Signed-off-by: Alan Ott <alan@signal11.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07mac802154: Increase tx_buffer_lenAlan Ott1-1/+1
Increase the buffer length from 10 to 300 packets. Consider that traffic on mac802154 devices will often be 6LoWPAN, and a full-length (1280 octet) IPv6 packet will fragment into 15 6LoWPAN fragments (because the MTU of IEEE 802.15.4 is 127). A 300-packet queue is really 20 full-length IPv6 packets. With a queue length of 10, an entire IPv6 packet was unable to get queued at one time, causing fragments to be dropped, and making reassembly impossible. Signed-off-by: Alan Ott <alan@signal11.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07mac802154: Use netif flow controlAlan Ott1-0/+14
Use netif_stop_queue() and netif_wake_queue() to control the flow of packets to mac802154 devices. Since many IEEE 802.15.4 devices have no output buffer, and since the mac802154 xmit() function is designed to block, netif_stop_queue() is called after each packet. Signed-off-by: Alan Ott <alan@signal11.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07mac802154: Do not try to resend failed packetsAlan Ott2-12/+2
When ops->xmit() fails, drop the packet. Devices which support hardware ack and retry (which include all devices currently supported by mainline), will automatically retry sending the packet (in the hardware) up to 3 times, per the 802.15.4 spec. There is no need, and it is incorrect to try to do it in mac802154. Signed-off-by: Alan Ott <alan@signal11.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07sctp: fix error return code in __sctp_connect()Wei Yongjun1-2/+3
Fix to return a negative error code from the error handling case instead of 0, as returned elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07802: fix a possible race conditionCong Wang1-0/+4
(Resend with a better changelog) garp_pdu_queue() should ways be called with this spin lock. garp_uninit_applicant() only holds rtnl lock which is not enough here. A possible race can happen as garp_pdu_rcv() is called in BH context: garp_pdu_rcv() |->garp_pdu_parse_msg() |->garp_pdu_parse_attr() |-> garp_gid_event() Found by code inspection. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: David Ward <david.ward@ll.mit.edu> Cc: "Jorge Boncompte [DTI2]" <jorge@dti2.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-07Merge branch 'master' of git://1984.lsi.us.es/nf-nextDavid S. Miller46-1946/+2273
Pablo Neira Ayuso says: ==================== The following patchset contains Netfilter and IPVS updates for your net-next tree, most relevantly they are: * Add net namespace support to NFLOG, ULOG and ebt_ulog and NFQUEUE. The LOG and ebt_log target has been also adapted, but they still depend on the syslog netnamespace that seems to be missing, from Gao Feng. * Don't lose indications of congestion in IPv6 fragmentation handling, from Hannes Frederic Sowa.i * IPVS conversion to use RCU, including some code consolidation patches and optimizations, also some from Julian Anastasov. * cpu fanout support for NFQUEUE, from Holger Eitzenberger. * Better error reporting to userspace when dropping packets from all our _*_[xfrm|route]_me_harder functions, from Patrick McHardy. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-06netfilter: implement RFC3168 5.3 (ecn protection) for ipv6 fragmentation ↵Hannes Frederic Sowa1-2/+20
handling This change brings netfilter reassembly logic on par with reassembly.c. The corresponding change in net-next is (eec2e61 ipv6: implement RFC3168 5.3 (ecn protection) for ipv6 fragmentation handling) Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jesper Dangaard Brouer <jbrouer@redhat.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05netfilter: remove unneeded variable proc_net_netfilterPablo Neira Ayuso1-12/+4
Now that this supports net namespace for nflog and nfqueue, we can remove the global proc_net_netfilter which has no clients anymore. Based on patch from Gao feng. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05netfilter: nfnetlink_queue: add net namespace support for nfnetlink_queueGao feng1-60/+113
This patch makes /proc/net/netfilter/nfnetlink_queue pernet. Moreover, there's a pernet instance table and lock. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05netfilter: enable per netns support for nf_loggersGao feng1-21/+0
After this patch, all nf_loggers support net namespace. Still xt_LOG and ebt_log require syslog netns support. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05netfilter: nfnetlink_log: add net namespace support for nfnetlink_logGao feng1-60/+117
This patch makes /proc/net/netfilter/nfnetlink_log pernet. Moreover, there's a pernet instance table and lock. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05netfilter: ipt_ULOG: add net namespace support for ipt_ULOGGao feng1-40/+89
Add pernet support to ipt_ULOG by means of the new nf_log_set function added in (30e0c6a netfilter: nf_log: prepare net namespace support for loggers). This patch also make ulog_buffers and netlink socket nflognl per netns. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05netfilter: ebt_ulog: add net namespace support for ebt_ulogGao feng1-37/+88
Add pernet support to ebt_ulog by means of the new nf_log_set function added in (30e0c6a netfilter: nf_log: prepare net namespace support for loggers). This patch also make ulog_buffers and netlink socket ebtulognl per netns. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05netfilter: xt_LOG: add net namespace support for xt_LOGGao feng1-3/+49
Add pernet support to xt_LOG by means of the new nf_log_set function added in (30e0c6a netfilter: nf_log: prepare net namespace support for loggers). Since syslog ns has yet not been implemented, we don't want the containers to DDOS host's syslogd. So only enable ebt_log only from init_net and wait for syslog ns support Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05netfilter: ebt_log: add net namespace support for ebt_logGao feng1-2/+35
Add pernet support to ebt_log by means of the new nf_log_set function added in (30e0c6a netfilter: nf_log: prepare net namespace support for loggers). Since syslog ns has yet not been implemented, we don't want the containers to DDOS host's syslogd. So only enable ebt_log only from init_net and wait for syslog ns support. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05netfilter: nf_log: prepare net namespace support for loggersGao feng14-96/+216
This patch adds netns support to nf_log and it prepares netns support for existing loggers. It is composed of four major changes. 1) nf_log_register has been split to two functions: nf_log_register and nf_log_set. The new nf_log_register is used to globally register the nf_logger and nf_log_set is used for enabling pernet support from nf_loggers. Per netns is not yet complete after this patch, it comes in separate follow up patches. 2) Add net as a parameter of nf_log_bind_pf. Per netns is not yet complete after this patch, it only allows to bind the nf_logger to the protocol family from init_net and it skips other cases. 3) Adapt all nf_log_packet callers to pass netns as parameter. After this patch, this function only works for init_net. 4) Make the sysctl net/netfilter/nf_log pernet. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-05netfilter: make /proc/net/netfilter pernetGao feng1-4/+29
This patch makes this proc dentry pernet. So far only init_net had a /proc/net/netfilter directory. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-04net: frag queue per hash bucket lockingJesper Dangaard Brouer1-13/+44
This patch implements per hash bucket locking for the frag queue hash. This removes two write locks, and the only remaining write lock is for protecting hash rebuild. This essentially reduce the readers-writer lock to a rebuild lock. This patch is part of "net: frag performance followup" http://thread.gmane.org/gmane.linux.network/263644 of which two patches have already been accepted: Same test setup as previous: (http://thread.gmane.org/gmane.linux.network/257155) Two 10G interfaces, on seperate NUMA nodes, are under-test, and uses Ethernet flow-control. A third interface is used for generating the DoS attack (with trafgen). Notice, I have changed the frag DoS generator script to be more efficient/deadly. Before it would only hit one RX queue, now its sending packets causing multi-queue RX, due to "better" RX hashing. Test types summary (netperf UDP_STREAM): Test-20G64K == 2x10G with 65K fragments Test-20G3F == 2x10G with 3x fragments (3*1472 bytes) Test-20G64K+DoS == Same as 20G64K with frag DoS Test-20G3F+DoS == Same as 20G3F with frag DoS Test-20G64K+MQ == Same as 20G64K with Multi-Queue frag DoS Test-20G3F+MQ == Same as 20G3F with Multi-Queue frag DoS When I rebased this-patch(03) (on top of net-next commit a210576c) and removed the _bh spinlock, I saw a performance regression. BUT this was caused by some unrelated change in-between. See tests below. Test (A) is what I reported before for patch-02, accepted in commit 1b5ab0de. Test (B) verifying-retest of commit 1b5ab0de corrospond to patch-02. Test (C) is what I reported before for this-patch Test (D) is net-next master HEAD (commit a210576c), which reveals some (unknown) performance regression (compared against test (B)). Test (D) function as a new base-test. Performance table summary (in Mbit/s): (#) Test-type: 20G64K 20G3F 20G64K+DoS 20G3F+DoS 20G64K+MQ 20G3F+MQ ---------- ------- ------- ---------- --------- -------- ------- (A) Patch-02 : 18848.7 13230.1 4103.04 5310.36 130.0 440.2 (B) 1b5ab0de : 18841.5 13156.8 4101.08 5314.57 129.0 424.2 (C) Patch-03v1: 18838.0 13490.5 4405.11 6814.72 196.6 461.6 (D) a210576c : 18321.5 11250.4 3635.34 5160.13 119.1 405.2 (E) with _bh : 17247.3 11492.6 3994.74 6405.29 166.7 413.6 (F) without bh: 17471.3 11298.7 3818.05 6102.11 165.7 406.3 Test (E) and (F) is this-patch(03), with(V1) and without(V2) the _bh spinlocks. I cannot explain the slow down for 20G64K (but its an artificial "lab-test" so I'm not worried). But the other results does show improvements. And test (E) "with _bh" version is slightly better. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> ---- V2: - By analysis from Hannes Frederic Sowa and Eric Dumazet, we don't need the spinlock _bh versions, as Netfilter currently does a local_bh_disable() before entering inet_fragment. - Fold-in desc from cover-mail V3: - Drop the chain_len counter per hash bucket. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller6-27/+54
Pull net into net-next to get the synchronize_net() bug fix in bonding. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-02net: fix smatch warnings inside datagram_pollJacob Keller5-5/+5
Commit 7d4c04fc170087119727119074e72445f2bb192b ("net: add option to enable error queue packets waking select") has an issue due to operator precedence causing the bit-wise OR to bind to the sock_flags call instead of the result of the terniary conditional. This fixes the *_poll functions to work properly. The old code results in "mask |= POLLPRI" instead of what was intended, which is to only include POLLPRI when the socket option is enabled. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-02VSOCK: Handle changes to the VMCI context ID.Reilly Grant4-26/+23
The VMCI context ID of a virtual machine may change at any time. There is a VMCI event which signals this but datagrams may be processed before this is handled. It is therefore necessary to be flexible about the destination context ID of any datagrams received. (It can be assumed to be correct because it is provided by the hypervisor.) The context ID on existing sockets should be updated to reflect how the hypervisor is currently referring to the system. Signed-off-by: Reilly Grant <grantr@vmware.com> Acked-by: Andy King <acking@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-02net IPv6 : Fix broken IPv6 routing table after loopback down-upBalakumaran Kannan1-0/+27
IPv6 Routing table becomes broken once we do ifdown, ifup of the loopback(lo) interface. After down-up, routes of other interface's IPv6 addresses through 'lo' are lost. IPv6 addresses assigned to all interfaces are routed through 'lo' for internal communication. Once 'lo' is down, those routing entries are removed from routing table. But those removed entries are not being re-created properly when 'lo' is brought up. So IPv6 addresses of other interfaces becomes unreachable from the same machine. Also this breaks communication with other machines because of NDISC packet processing failure. This patch fixes this issue by reading all interface's IPv6 addresses and adding them to IPv6 routing table while bringing up 'lo'. ==Testing== Before applying the patch: $ route -A inet6 Kernel IPv6 routing table Destination Next Hop Flag Met Ref Use If 2000::20/128 :: U 256 0 0 eth0 fe80::/64 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo ::1/128 :: Un 0 1 0 lo 2000::20/128 :: Un 0 1 0 lo fe80::xxxx:xxxx:xxxx:xxxx/128 :: Un 0 1 0 lo ff00::/8 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo $ sudo ifdown lo $ sudo ifup lo $ route -A inet6 Kernel IPv6 routing table Destination Next Hop Flag Met Ref Use If 2000::20/128 :: U 256 0 0 eth0 fe80::/64 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo ::1/128 :: Un 0 1 0 lo ff00::/8 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo $ After applying the patch: $ route -A inet6 Kernel IPv6 routing table Destination Next Hop Flag Met Ref Use If 2000::20/128 :: U 256 0 0 eth0 fe80::/64 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo ::1/128 :: Un 0 1 0 lo 2000::20/128 :: Un 0 1 0 lo fe80::xxxx:xxxx:xxxx:xxxx/128 :: Un 0 1 0 lo ff00::/8 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo $ sudo ifdown lo $ sudo ifup lo $ route -A inet6 Kernel IPv6 routing table Destination Next Hop Flag Met Ref Use If 2000::20/128 :: U 256 0 0 eth0 fe80::/64 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo ::1/128 :: Un 0 1 0 lo 2000::20/128 :: Un 0 1 0 lo fe80::xxxx:xxxx:xxxx:xxxx/128 :: Un 0 1 0 lo ff00::/8 :: U 256 0 0 eth0 ::/0 :: !n -1 1 1 lo $ Signed-off-by: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com> Signed-off-by: Maruthi Thotad <Maruthi.Thotad@ap.sony.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-02ipconfig: add informative timeout messages while waiting for carrierPaul Gortmaker1-1/+12
Commit 3fb72f1e6e6165c5f495e8dc11c5bbd14c73385c ("ipconfig wait for carrier") added a "wait for carrier on at least one interface" policy, with a worst case maximum wait of two minutes. However, if you encounter this, you won't get any feedback from the console as to the nature of what is going on. You just see the booting process hang for two minutes and then continue. Here we add a message so the user knows what is going on, and hence can take action to rectify the situation (e.g. fix network cable or whatever.) After the 1st 10s pause, output now begins that looks like this: Waiting up to 110 more seconds for network. Waiting up to 100 more seconds for network. Waiting up to 90 more seconds for network. Waiting up to 80 more seconds for network. ... Since most systems will have no problem getting link/carrier in the 1st 10s, the only people who will see these messages are people with genuine issues that need to be resolved. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-02cbq: incorrect processing of high limitsVasily Averin1-1/+4
currently cbq works incorrectly for limits > 10% real link bandwidth, and practically does not work for limits > 50% real link bandwidth. Below are results of experiments taken on 1 Gbit link In shaper | Actual Result -----------+--------------- 100M | 108 Mbps 200M | 244 Mbps 300M | 412 Mbps 500M | 893 Mbps This happen because of q->now changes incorrectly in cbq_dequeue(): when it is called before real end of packet transmitting, L2T is greater than real time delay, q_now gets an extra boost but never compensate it. To fix this problem we prevent change of q->now until its synchronization with real time. Signed-off-by: Vasily Averin <vvs@openvz.org> Reviewed-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-02bridge: remove a redundant synchronize_net()Eric Dumazet1-1/+0
commit 00cfec37484761 (net: add a synchronize_net() in netdev_rx_handler_unregister()) allows us to remove the synchronized_net() call from del_nbp() Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Veaceslav Falico <vfalico@redhat.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-02netfilter: xt_NFQUEUE: coalesce IPv4 and IPv6 hashingholger@eitzenberger.org1-20/+22
Because rev1 and rev3 of the target share the same hashing generalize it by introduing nfqueue_hash(). Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-02netfilter: xt_NFQUEUE: introduce CPU fanoutholger@eitzenberger.org1-2/+39
Current NFQUEUE target uses a hash, computed over source and destination address (and other parameters), for steering the packet to the actual NFQUEUE. This, however forgets about the fact that the packet eventually is handled by a particular CPU on user request. If E. g. 1) IRQ affinity is used to handle packets on a particular CPU already (both single-queue or multi-queue case) and/or 2) RPS is used to steer packets to a specific softirq the target easily chooses an NFQUEUE which is not handled by a process pinned to the same CPU. The idea is therefore to use the CPU index for determining the NFQUEUE handling the packet. E. g. when having a system with 4 CPUs, 4 MQ queues and 4 NFQUEUEs it looks like this: +-----+ +-----+ +-----+ +-----+ |NFQ#0| |NFQ#1| |NFQ#2| |NFQ#3| +-----+ +-----+ +-----+ +-----+ ^ ^ ^ ^ | |NFQUEUE | | + + + + +-----+ +-----+ +-----+ +-----+ |rx-0 | |rx-1 | |rx-2 | |rx-3 | +-----+ +-----+ +-----+ +-----+ The NFQUEUEs not necessarily have to start with number 0, setups with less NFQUEUEs than packet-handling CPUs are not a problem as well. This patch extends the NFQUEUE target to accept a new NFQ_FLAG_CPU_FANOUT flag. If this is specified the target uses the CPU index for determining the NFQUEUE being used. I have to introduce rev3 for this. The 'flags' are folded into _v2 'bypass'. By changing the way which queue is assigned, I'm able to improve the performance if the processes reading on the NFQUEUs are pinned correctly. Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-02netfilter: use IS_ENABLE to replace if defined in TRACE targetGao feng2-6/+3
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-02ipvs: do not disable bh for long timeJulian Anastasov12-87/+64
We used a global BH disable in LOCAL_OUT hook. Add _bh suffix to all places that need it and remove the disabling from LOCAL_OUT and sync code. Functions like ip_defrag need protection from BH, so add it. As for nf_nat_mangle_tcp_packet, it needs RCU lock. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02ipvs: convert services to rcuJulian Anastasov17-240/+177
This is the final step in RCU conversion. Things that are removed: - svc->usecnt: now svc is accessed under RCU read lock - svc->inc: and some unused code - ip_vs_bind_pe and ip_vs_unbind_pe: no ability to replace PE - __ip_vs_svc_lock: replaced with RCU - IP_VS_WAIT_WHILE: now readers lookup svcs and dests under RCU and work in parallel with configuration Other changes: - before now, a RCU read-side critical section included the calling of the schedule method, now it is extended to include service lookup - ip_vs_svc_table and ip_vs_svc_fwm_table are now using hlist - svc->pe and svc->scheduler remain to the end (of grace period), the schedulers are prepared for such RCU readers even after done_service is called but they need to use synchronize_rcu because last ip_vs_scheduler_put can happen while RCU read-side critical sections use an outdated svc->scheduler pointer - as planned, update_service is removed - empty services can be freed immediately after grace period. If dests were present, the services are freed from the dest trash code Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02ipvs: convert dests to rcuJulian Anastasov3-25/+26
In previous commits the schedulers started to access svc->destinations with _rcu list traversal primitives because the IP_VS_WAIT_WHILE macro still plays the role of grace period. Now it is time to finish the updating part, i.e. adding and deleting of dests with _rcu suffix before removing the IP_VS_WAIT_WHILE in next commit. We use the same rule for conns as for the schedulers: dests can be searched in RCU read-side critical section where ip_vs_dest_hold can be called by ip_vs_bind_dest. Some things are not perfect, for example, calling functions like ip_vs_lookup_dest from updating code under RCU, just because we use some function both from reader and from updater. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02ipvs: convert sched_lock to spin lockJulian Anastasov5-32/+32
As all read_locks are gone spin lock is preferred. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02ipvs: do not expect result from done_serviceJulian Anastasov7-28/+10
This method releases the scheduler state, it can not fail. Such change will help to properly replace the scheduler in following patch. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02ipvs: reorganize dest trashJulian Anastasov1-74/+104
All dests will go to trash, no exceptions. But we have to use new list node t_list for this, due to RCU changes in following patches. Dests will wait there initial grace period and later all conns and schedulers to put their reference. The dests don't get reference for staying in dest trash as before. As result, we do not load ip_vs_dest_put with extra checks for last refcnt and the schedulers do not need to play games with atomic_inc_not_zero while selecting best destination. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02ipvs: convert wrr scheduler to rcuJulian Anastasov1-64/+103
The schedule method now needs _rcu list-traversal primitive for svc->destinations. As the weight for some dest can be reduced during dest selection, change the algorithm to check weights by using minimum weights in the 1 .. max_weight-(di-1) range, with the same step (di). By this way we ensure that there will be always a weight >= 1 check before claiming that all destinations are overloaded. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02ipvs: convert wlc scheduler to rcuJulian Anastasov1-2/+2
The schedule method now needs _rcu list-traversal primitive for svc->destinations. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02ipvs: convert sh scheduler to rcuJulian Anastasov1-36/+45
Use the 3 new methods to reassign dests. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02ipvs: convert sed scheduler to rcuJulian Anastasov1-2/+2
The schedule method now needs _rcu list-traversal primitive for svc->destinations. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02ipvs: convert rr scheduler to rcuJulian Anastasov1-20/+35
The schedule method now needs _rcu list-traversal primitive for svc->destinations. As the previous entry could be unlinked, limit the list traversals to 2 when lookup started from previous entry. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02ipvs: convert nq scheduler to rcuJulian Anastasov1-1/+1
The schedule method now needs _rcu list-traversal primitive for svc->destinations. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02ipvs: convert lc scheduler to rcuJulian Anastasov1-1/+1
The schedule method now needs _rcu list-traversal primitive for svc->destinations. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02ipvs: convert lblcr scheduler to rcuJulian Anastasov1-81/+90
The schedule method now needs _rcu list-traversal primitive for svc->destinations. The read_lock for sched_lock is removed. The set.lock is removed because now it is used in rare cases, mostly under sched_lock. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02ipvs: convert lblc scheduler to rcuJulian Anastasov1-41/+55
The schedule method now needs _rcu list-traversal primitive for svc->destinations. The read_lock for sched_lock is removed. Use a dead flag to prevent new entries to be created while scheduler is reclaimed. Use hlist for the hash table. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02ipvs: convert dh scheduler to rcuJulian Anastasov1-36/+45
Use the new add_dest and del_dest methods to reassign dests. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02ipvs: add ip_vs_dest_hold and ip_vs_dest_putJulian Anastasov3-11/+6
ip_vs_dest_hold will be used under RCU lock while ip_vs_dest_put can be called even after dest is removed from service, as it happens for conns and some schedulers. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02ipvs: preparations for using rcu in schedulersJulian Anastasov2-0/+14
Allow schedulers to use rcu_dereference when returning destination on lookup. The RCU read-side critical section will allow ip_vs_bind_dest to get dest refcnt as preparation for the step where destinations will be deleted without an IP_VS_WAIT_WHILE guard that holds the packet processing during update. Add new optional scheduler methods add_dest, del_dest and upd_dest. For now the methods are called together with update_service but update_service will be removed in a following change. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2013-04-02ipvs: change ip_vs_sched_lock to mutexJulian Anastasov1-12/+12
The global list with schedulers ip_vs_schedulers is accessed only from user context - configuration and scheduler module [un]registration. Use ip_vs_sched_mutex instead. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>