summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2009-12-11net: Handle NETREG_UNINITIALIZED devices correctlyKrishna Kumar1-4/+6
Fix two problems: 1. If unregister_netdevice_many() is called with both registered and unregistered devices, rollback_registered_many() bails out when it reaches the first unregistered device. The processing of the prior registered devices is unfinished, and the remaining devices are skipped, and possible registered netdev's are leaked/unregistered. 2. System hangs or panics depending on how the devices are passed, since when netdev_run_todo() runs, some devices were not fully processed. Tested by passing intermingled unregistered and registered vlan devices to unregister_netdevice_many() as follows: 1. dev, fake_dev1, fake_dev2: hangs in run_todo ("unregister_netdevice: waiting for eth1.100 to become free. Usage count = 1") 2. fake_dev1, dev, fake_dev2: failure during de-registration and next registration, followed by a vlan driver Oops during subsequent registration. Confirmed that the patch fixes both cases. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-11xfrm: Fix truncation length of authentication algorithms installed via PF_KEYMartin Willi1-0/+1
Commit 4447bb33f09444920a8f1d89e1540137429351b6 ("xfrm: Store aalg in xfrm_state with a user specified truncation length") breaks installation of authentication algorithms via PF_KEY, as the state specific truncation length is not installed with the algorithms default truncation length. This patch initializes state properly to the default if installed via PF_KEY. Signed-off-by: Martin Willi <martin@strongswan.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-11net: use compat helper functions in compat_sys_recvmmsgHeiko Carstens1-5/+2
Use (get|put)_compat_timespec helper functions to simplify the code. Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-11net: fix compat_sys_recvmmsg parameter typeHeiko Carstens1-7/+5
compat_sys_recvmmsg has a compat_timespec parameter and not a timespec parameter. This way we also get rid of an odd cast. Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-10wireless: update old static regulatory domain rulesJohn W. Linville1-50/+25
Update "US" and "JP" for current rules, and replace "EU" rules with the world roaming domain (since it was only a pseudo-domain anyway). Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-10mac80211: Revert 'Use correct sign for mesh active path refresh'Javier Cardona2-3/+4
The patch ("mac80211: Use correct sign for mesh active path refresh.") was actually a bug. Reverted it and improved the explanation of how mesh path refresh works. Signed-off-by: Javier Cardona <javier@cozybit.com> Signed-off-by: Andrey Yurovsky <andrey@cozybit.com> Cc: stable@kernel.org Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-10mac80211: Fixed bug in mesh portal pathsJavier Cardona1-1/+0
Paths to mesh portals were being timed out immediately after each use in intermediate forwarding nodes. mppath->exp_time is set to the expiration time so assigning it to jiffies was marking the path as expired. Signed-off-by: Javier Cardona <javier@cozybit.com> Signed-off-by: Andrey Yurovsky <andrey@cozybit.com> Cc: stable@kernel.org Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-10net/mac80211: Correct size given to memsetJulia Lawall1-1/+1
Memset should be given the size of the structure, not the size of the pointer. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ type T; T *x; expression E; @@ memset(x, E, sizeof( + * x)) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-09wireless: correctly report signal value for IEEE80211_HW_SIGNAL_UNSPECJohn W. Linville1-1/+2
This part was missed in "cfg80211: implement get_wireless_stats", probably because sta_set_sinfo already existed and was only handling dBm signals. Cc: stable@kernel.org Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-09cfg80211: Clear encryption privacy when key off is done.Vivek Natarajan1-0/+1
When the current_bss is not set, 'iwconfig <iface> key off' does not clear the private flag. Hence after we connect with WEP to an AP and then try to connect with another non-WEP AP, it does not work. This issue will not be seen if supplicant is used. Signed-off-by: Vivek Natarajan <vnatarajan@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-08tcp: fix retrans_stamp advancing in error casesIlpo Järvinen1-3/+32
It can happen, that tcp_retransmit_skb fails due to some error. In such cases we might end up into a state where tp->retrans_out is zero but that's only because we removed the TCPCB_SACKED_RETRANS bit from a segment but couldn't retransmit it because of the error that happened. Therefore some assumptions that retrans_out checks are based do not necessarily hold, as there still can be an old retransmission but that is only visible in TCPCB_EVER_RETRANS bit. As retransmission happen in sequential order (except for some very rare corner cases), it's enough to check the head skb for that bit. Main reason for all this complexity is the fact that connection dying time now depends on the validity of the retrans_stamp, in particular, that successive retransmissions of a segment must not advance retrans_stamp under any conditions. It seems after quick thinking that this has relatively low impact as eventually TCP will go into CA_Loss and either use the existing check for !retrans_stamp case or send a retransmission successfully, setting a new base time for the dying timer (can happen only once). At worst, the dying time will be approximately the double of the intented time. In addition, tcp_packet_delayed() will return wrong result (has some cc aspects but due to rarity of these errors, it's hardly an issue). One of retrans_stamp clearing happens indirectly through first going into CA_Open state and then a later ACK lets the clearing to happen. Thus tcp_try_keep_open has to be modified too. Thanks to Damian Lukowski <damian@tvk.rwth-aachen.de> for hinting that this possibility exists (though the particular case discussed didn't after all have it happening but was just a debug patch artifact). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-08tcp: Stalling connections: Move timeout calculation routineDamian Lukowski1-0/+29
This patch moves retransmits_timed_out() from include/net/tcp.h to tcp_timer.c, where it is used. Reported-by: Frederic Leroy <fredo@starox.org> Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-08atm: [br2684] allow routed mode operation againchas williams - CONTRACTOR1-3/+8
in routed mode, we don't have a hardware address so netdev_ops doesnt need to validate our hardware address via .ndo_validate_addr Reported-by: Manuel Fuentes <mfuentes@agenciaefe.com> Signed-off-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-08atm: [lec] initialize .netdev_ops before calling register_netdev()chas williams - CONTRACTOR1-9/+1
fix oops when initializing lane interfaces. lec should probably be changed to use alloc_netdev() instead. Signed-off-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-08[PATCH] tcp: documents timewait refcnt tricks Eric Dumazet1-14/+24
Adds kerneldoc for inet_twsk_unhash() & inet_twsk_bind_unhash(). With help from Randy Dunlap. Suggested-by: Evgeniy Polyakov <zbr@ioremap.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-08tcp: Fix a connect() race with timewait socketsEric Dumazet2-8/+23
When we find a timewait connection in __inet_hash_connect() and reuse it for a new connection request, we have a race window, releasing bind list lock and reacquiring it in __inet_twsk_kill() to remove timewait socket from list. Another thread might find the timewait socket we already chose, leading to list corruption and crashes. Fix is to remove timewait socket from bind list before releasing the bind lock. Note: This problem happens if sysctl_tcp_tw_reuse is set. Reported-by: kapil dakhane <kdakhane@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-08tcp: Fix a connect() race with timewait socketsEric Dumazet6-13/+29
First patch changes __inet_hash_nolisten() and __inet6_hash() to get a timewait parameter to be able to unhash it from ehash at same time the new socket is inserted in hash. This makes sure timewait socket wont be found by a concurrent writer in __inet_check_established() Reported-by: kapil dakhane <kdakhane@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-08tcp: Remove runtime check that can never be true.David S. Miller1-5/+0
GCC even warns about it, as reported by Andrew Morton: net/ipv4/tcp.c: In function 'do_tcp_getsockopt': net/ipv4/tcp.c:2544: warning: comparison is always false due to limited range of data type Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-08Merge branch 'master' of ↵David S. Miller5-9/+28
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
2009-12-08sctp: fix compile error due to sysctl mismergeLinus Torvalds1-1/+0
I messed up the merge in d7fc02c7bae7b1cf69269992cf880a43a350cdaa, where the conflict in question wasn't just about CTL_UNNUMBERED being removed, but the 'strategy' field is too (sysctl handling is now done through the /proc interface, with no duplicate protocols for reading the data). Reported-by: Larry Finger <Larry.Finger@lwfinger.net> Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds346-8345/+13381
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits) mac80211: fix reorder buffer release iwmc3200wifi: Enable wimax core through module parameter iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter iwmc3200wifi: Coex table command does not expect a response iwmc3200wifi: Update wiwi priority table iwlwifi: driver version track kernel version iwlwifi: indicate uCode type when fail dump error/event log iwl3945: remove duplicated event logging code b43: fix two warnings ipw2100: fix rebooting hang with driver loaded cfg80211: indent regulatory messages with spaces iwmc3200wifi: fix NULL pointer dereference in pmkid update mac80211: Fix TX status reporting for injected data frames ath9k: enable 2GHz band only if the device supports it airo: Fix integer overflow warning rt2x00: Fix padding bug on L2PAD devices. WE: Fix set events not propagated b43legacy: avoid PPC fault during resume b43: avoid PPC fault during resume tcp: fix a timewait refcnt race ... Fix up conflicts due to sysctl cleanups (dead sysctl_check code and CTL_UNNUMBERED removed) in kernel/sysctl_check.c net/ipv4/sysctl_net_ipv4.c net/ipv6/addrconf.c net/sctp/sysctl.c
2009-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6Linus Torvalds60-1172/+207
* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits) security/tomoyo: Remove now unnecessary handling of security_sysctl. security/tomoyo: Add a special case to handle accesses through the internal proc mount. sysctl: Drop & in front of every proc_handler. sysctl: Remove CTL_NONE and CTL_UNNUMBERED sysctl: kill dead ctl_handler definitions. sysctl: Remove the last of the generic binary sysctl support sysctl net: Remove unused binary sysctl code sysctl security/tomoyo: Don't look at ctl_name sysctl arm: Remove binary sysctl support sysctl x86: Remove dead binary sysctl support sysctl sh: Remove dead binary sysctl support sysctl powerpc: Remove dead binary sysctl support sysctl ia64: Remove dead binary sysctl support sysctl s390: Remove dead sysctl binary support sysctl frv: Remove dead binary sysctl support sysctl mips/lasat: Remove dead binary sysctl support sysctl drivers: Remove dead binary sysctl support sysctl crypto: Remove dead binary sysctl support sysctl security/keys: Remove dead binary sysctl support sysctl kernel: Remove binary sysctl logic ...
2009-12-07mac80211: Fix bug in computing crc over dynamic IEs in beaconVasanthakumar Thiagarajan1-1/+1
On a 32-bit machine, BIT() macro does not give the required bit value if the bit is mroe than 31. In ieee802_11_parse_elems_crc(), BIT() is suppossed to get the bit value more than 31 (42 (id of ERP_INFO_IE), 37 (CHANNEL_SWITCH_IE), (42), 32 (POWER_CONSTRAINT_IE), 45 (HT_CAP_IE), 61 (HT_INFO_IE)). As we do not get the required bit value for the above IEs, crc over these IEs are never calculated, so any dynamic change in these IEs after the association is not really handled on 32-bit platforms. This patch fixes this issue. Cc: stable@kernel.org Signed-off-by: Vasanthakumar Thiagarajan <vasanth@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-07net/rfkill/core.c: work around gcc-4.0.2 sillinessAndrew Morton1-2/+2
net/rfkill/core.c: In function 'rfkill_type_show': net/rfkill/core.c:610: warning: control may reach end of non-void function 'rfkill_get_type_str' being inlined A gcc bug, but simple enough to squish. Cc: John W. Linville <linville@tuxdriver.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-07mac80211: Fix dynamic power save for scanning.Vivek Natarajan2-4/+17
Not only ps_sdata but also IEEE80211_CONF_PS is to be considered before restoring PS in scan_ps_disable(). For instance, when ps_sdata is set but CONF_PS is not set just because the dynamic timer is still running, a sw scan leads to setting of CONF_PS in scan_ps_disable instead of restarting the dynamic PS timer. Also for the above case, a null data frame is to be sent after returning to operating channel which was not happening with the current implementation. This patch fixes this too. Signed-off-by: Vivek Natarajan <vnatarajan@atheros.com> Reviewed-by: Kalle Valo <kalle.valo@nokia.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-07mac80211: recalculate idle later in MLMEJohannes Berg1-2/+8
hwsim testing has revealed that when the MLME recalculates the idle state of the device, it sometimes does so before sending the final deauthentication or disassociation frame. This patch changes the place where the idle state is recalculated, but of course driver transmit is typically asynchronous while configuration is expected to be synchronous, so it doesn't fix all possible cases yet. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-05Merge branch 'master' of /home/davem/src/GIT/linux-2.6/David S. Miller2-0/+4
Conflicts: drivers/net/pcmcia/fmvj18x_cs.c drivers/net/pcmcia/nmclan_cs.c drivers/net/pcmcia/xirc2ps_cs.c drivers/net/wireless/ray_cs.c
2009-12-05Merge branch 'core-printk-for-linus' of ↵Linus Torvalds2-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: ratelimit: Make suppressed output messages more useful printk: Remove ratelimit.h from kernel.h ratelimit: Fix/allow use in atomic contexts ratelimit: Use per ratelimit context locking
2009-12-04mac80211: fix reorder buffer releaseJohannes Berg1-1/+1
My patch "mac80211: correctly place aMPDU RX reorder code" uses an skb queue for MPDUs that were released from the buffer. I intentially didn't initialise and use the skb queue's spinlock, but in this place forgot that the code variant that doesn't touch the spinlock is needed. Thanks to Christian Lamparter for quickly spotting the bug in the backtrace Reinette reported. Reported-by: Reinette Chatre <reinette.chatre@intel.com> Bug-identified-by: Christian Lamparter <chunkeey@googlemail.com> Tested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-04Merge branch 'master' of ↵David S. Miller14-308/+526
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
2009-12-04cfg80211: indent regulatory messages with spacesKalle Valo1-3/+3
The regulatory messages in syslog look weird: kernel: cfg80211: Regulatory domain: US kernel: ^I(start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp) kernel: ^I(2402000 KHz - 2472000 KHz @ 40000 KHz), (600 mBi, 2700 mBm) kernel: ^I(5170000 KHz - 5190000 KHz @ 40000 KHz), (600 mBi, 2300 mBm) kernel: ^I(5190000 KHz - 5210000 KHz @ 40000 KHz), (600 mBi, 2300 mBm) kernel: ^I(5210000 KHz - 5230000 KHz @ 40000 KHz), (600 mBi, 2300 mBm) kernel: ^I(5230000 KHz - 5330000 KHz @ 40000 KHz), (600 mBi, 2300 mBm) kernel: ^I(5735000 KHz - 5835000 KHz @ 40000 KHz), (600 mBi, 3000 mBm) Indent them with four spaces instead of the tab character to get prettier output. Signed-off-by: Kalle Valo <kalle.valo@nokia.com> Acked: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-04mac80211: Fix TX status reporting for injected data framesJouni Malinen1-1/+5
An earlier optimization on removing unnecessary traffic on cooked monitor interfaces ("mac80211: reduce the amount of unnecessary traffic on cooked monitor interfaces ") ended up removing quite a bit more than just unnecessary traffic. It was not supposed to remove TX status reporting for injected frames, but ended up doing it by checking the injected flag in skb->cb only after that field had been cleared with memset.. Fix this by taking a local copy of the injected flag before skb->cb is cleared. This broke user space applications that depend on getting TX status notifications for injected data frames. For example, STA inactivity poll from hostapd did not work and ended up kicking out stations even if they were still present. Signed-off-by: Jouni Malinen <j@w1.fi> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-04WE: Fix set events not propagatedJean Tourrilhes1-1/+2
I've just noticed that some events are no longer propagated for some wireless drivers. Basically, SET request with a extra payload for driver without commit handler. The fix is pretty simple, see attached. Actually, a few lines below this line, you will see that the event generation for simple SET (iwpoint-less ?) is done properly, and this other event generation does not need fixing. Signed-off-by: Jean Tourrilhes <jt@hpl.hp.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-12-03tcp: fix a timewait refcnt raceEric Dumazet1-3/+16
After TCP RCU conversion, tw->tw_refcnt should not be set to 1 in inet_twsk_alloc(). It allows a RCU reader to get this timewait socket, while we not yet stabilized it. Only choice we have is to set tw_refcnt to 0 in inet_twsk_alloc(), then atomic_add() it later, once everything is done. Location of this atomic_add() is tricky, because we dont want another writer to find this timewait in ehash, while tw_refcnt is still zero ! Thanks to Kapil Dakhane tests and reports. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03tcp: connect() race with timewait reuseEric Dumazet3-18/+45
Its currently possible that several threads issuing a connect() find the same timewait socket and try to reuse it, leading to list corruptions. Condition for bug is that these threads bound their socket on same address/port of to-be-find timewait socket, and connected to same target. (SO_REUSEADDR needed) To fix this problem, we could unhash timewait socket while holding ehash lock, to make sure lookups/changes will be serialized. Only first thread finds the timewait socket, other ones find the established socket and return an EADDRNOTAVAIL error. This second version takes into account Evgeniy's review and makes sure inet_twsk_put() is called outside of locked sections. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03tcp: diag: Dont report negative values for rx queueEric Dumazet2-3/+11
Both netlink and /proc/net/tcp interfaces can report transient negative values for rx queue. ss -> State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB -6 6 127.0.0.1:45956 127.0.0.1:3333 netstat -> tcp 4294967290 6 127.0.0.1:37784 127.0.0.1:3333 ESTABLISHED This is because we dont lock socket while computing tp->rcv_nxt - tp->copied_seq, and another CPU can update copied_seq before rcv_next in RX path. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03netdevice: provide common routine for macvlan and vlan operstate managementPatrick Mullaney2-25/+31
Provide common routine for the transition of operational state for a leaf device during a root device transition. Signed-off-by: Patrick Mullaney <pmullaney@novell.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03Merge branch 'master' of ↵David S. Miller7-37/+149
git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-next-2.6
2009-12-03Merge branch 'master' of ↵David S. Miller31-257/+250
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
2009-12-03net: Batch inet_twsk_purgeEric W. Biederman3-11/+21
This function walks the whole hashtable so there is no point in passing it a network namespace. Instead I purge all timewait sockets from dead network namespaces that I find. If the namespace is one of the once I am trying to purge I am guaranteed no new timewait sockets can be formed so this will get them all. If the namespace is one I am not acting for it might form a few more but I will call inet_twsk_purge again and shortly to get rid of them. In any even if the network namespace is dead timewait sockets are useless. Move the calls of inet_twsk_purge into batch_exit routines so that if I am killing a bunch of namespaces at once I will just call inet_twsk_purge once and save a lot of redundant unnecessary work. My simple 4k network namespace exit test the cleanup time dropped from roughly 8.2s to 1.6s. While the time spent running inet_twsk_purge fell to about 2ms. 1ms for ipv4 and 1ms for ipv6. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03net: Use rcu lookups in inet_twsk_purge.Eric W. Biederman1-15/+24
While we are looking up entries to free there is no reason to take the lock in inet_twsk_purge. We have to drop locks and restart occassionally anyway so adding a few more in case we get on the wrong list because of a timewait move is no big deal. At the same time not taking the lock for long periods of time is much more polite to the rest of the users of the hash table. In my test configuration of killing 4k network namespaces this change causes 4k back to back runs of inet_twsk_purge on an empty hash table to go from roughly 20.7s to 3.3s, and the total time to destroy 4k network namespaces goes from roughly 44s to 3.3s. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03net: Allow fib_rule_unregister to batchEric W. Biederman4-37/+55
Refactor the code so fib_rules_register always takes a template instead of the actual fib_rules_ops structure that will be used. This is required for network namespace support so 2 out of the 3 callers already do this, it allows the error handling to be made common, and it allows fib_rules_unregister to free the template for hte caller. Modify fib_rules_unregister to use call_rcu instead of syncrhonize_rcu to allw multiple namespaces to be cleaned up in the same rcu grace period. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03netns: Add an explicit rcu_barrier to unregister_pernet_{device|subsys}Eric W. Biederman1-2/+6
This allows namespace exit methods to batch work that comes requires an rcu barrier using call_rcu without having to treat the unregister_pernet_operations cases specially. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03net: Allow xfrm_user_net_exit to batch efficiently.Eric W. Biederman1-8/+10
xfrm.nlsk is provided by the xfrm_user module and is access via rcu from other parts of the xfrm code. Add xfrm.nlsk_stash a copy of xfrm.nlsk that will never be set to NULL. This allows the synchronize_net and netlink_kernel_release to be deferred until a whole batch of xfrm.nlsk sockets have been set to NULL. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03net: Move network device exit batchingEric W. Biederman2-24/+25
Move network device exit batching from a special case in net_namespace.c to using common mechanisms in dev.c Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03net: Add support for batching network namespace cleanupsEric W. Biederman1-61/+61
- Add exit_list to struct net to support building lists of network namespaces to cleanup. - Add exit_batch to pernet_operations to allow running operations only once during a network namespace exit. Instead of once per network namespace. - Factor opt ops_exit_list and ops_exit_free so the logic with cleanup up a network namespace does not need to be duplicated. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03ipv4 05/05: add sysctl to accept packets with local source addressesPatrick McHardy2-4/+8
commit 8ec1e0ebe26087bfc5c0394ada5feb5758014fc8 Author: Patrick McHardy <kaber@trash.net> Date: Thu Dec 3 12:16:35 2009 +0100 ipv4: add sysctl to accept packets with local source addresses Change fib_validate_source() to accept packets with a local source address when the "accept_local" sysctl is set for the incoming inet device. Combined with the previous patches, this allows to communicate between multiple local interfaces over the wire. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03net 04/05: fib_rules: allow to delete local rulePatrick McHardy3-3/+3
commit d124356ce314fff22a047ea334379d5105b2d834 Author: Patrick McHardy <kaber@trash.net> Date: Thu Dec 3 12:16:35 2009 +0100 net: fib_rules: allow to delete local rule Allow to delete the local rule and recreate it with a higher priority. This can be used to force packets with a local destination out on the wire instead of routing them to loopback. Additionally this patch allows to recreate rules with a priority of 0. Combined with the previous patch to allow oif classification, a socket can be bound to the desired interface and packets routed to the wire like this: # move local rule to lower priority ip rule add pref 1000 lookup local ip rule del pref 0 # route packets of sockets bound to eth0 to the wire independant # of the destination address ip rule add pref 100 oif eth0 lookup 100 ip route add default dev eth0 table 100 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03net 03/05: fib_rules: add oif classificationPatrick McHardy1-1/+32
commit 68144d350f4f6c348659c825cde6a82b34c27a91 Author: Patrick McHardy <kaber@trash.net> Date: Thu Dec 3 12:05:25 2009 +0100 net: fib_rules: add oif classification Support routing table lookup based on the flow's oif. This is useful to classify packets originating from sockets bound to interfaces differently. The route cache already includes the oif and needs no changes. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03net 02/05: fib_rules: rename ifindex/ifname/FRA_IFNAME to ↵Patrick McHardy1-18/+18
iifindex/iifname/FRA_IIFNAME commit 229e77eec406ad68662f18e49fda8b5d366768c5 Author: Patrick McHardy <kaber@trash.net> Date: Thu Dec 3 12:05:23 2009 +0100 net: fib_rules: rename ifindex/ifname/FRA_IFNAME to iifindex/iifname/FRA_IIFNAME The next patch will add oif classification, rename interface related members and attributes to reflect that they're used for iif classification. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>