summaryrefslogtreecommitdiffstats
path: root/include/net
AgeCommit message (Collapse)AuthorFilesLines
2009-12-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds5-4/+12
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (26 commits) net: sh_eth alignment fix for sh7724 using NET_IP_ALIGN V2 ixgbe: allow tx of pre-formatted vlan tagged packets ixgbe: Fix 82598 premature copper PHY link indicatation ixgbe: Fix tx_restart_queue/non_eop_desc statistics counters bcm63xx_enet: fix compilation failure after get_stats_count removal packet: dont call sleeping functions while holding rcu_read_lock() tcp: Revert per-route SACK/DSACK/TIMESTAMP changes. ipvs: zero usvc and udest netfilter: fix crashes in bridge netfilter caused by fragment jumps ipv6: reassembly: use seperate reassembly queues for conntrack and local delivery sky2: leave PCI config space writeable sky2: print Optima chip name x25: Update maintainer. ipvs: fix synchronization on connection close netfilter: xtables: document minimal required version drivers/net/bonding/: : use pr_fmt can: CAN_MCP251X should depend on HAS_DMA drivers/net/usb: Correct code taking the size of a pointer drivers/net/cpmac.c: Correct code taking the size of a pointer drivers/net/sfc: Correct code taking the size of a pointer ...
2009-12-15Merge branch 'master' of ↵David S. Miller3-1/+10
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6
2009-12-15tcp: Revert per-route SACK/DSACK/TIMESTAMP changes.David S. Miller2-3/+2
It creates a regression, triggering badness for SYN_RECV sockets, for example: [19148.022102] Badness at net/ipv4/inet_connection_sock.c:293 [19148.022570] NIP: c02a0914 LR: c02a0904 CTR: 00000000 [19148.023035] REGS: eeecbd30 TRAP: 0700 Not tainted (2.6.32) [19148.023496] MSR: 00029032 <EE,ME,CE,IR,DR> CR: 24002442 XER: 00000000 [19148.024012] TASK = eee9a820[1756] 'privoxy' THREAD: eeeca000 This is likely caused by the change in the 'estab' parameter passed to tcp_parse_options() when invoked by the functions in net/ipv4/tcp_minisocks.c But even if that is fixed, the ->conn_request() changes made in this patch series is fundamentally wrong. They try to use the listening socket's 'dst' to probe the route settings. The listening socket doesn't even have a route, and you can't get the right route (the child request one) until much later after we setup all of the state, and it must be done by hand. This stuff really isn't ready, so the best thing to do is a full revert. This reverts the following commits: f55017a93f1a74d50244b1254b9a2bd7ac9bbf7d 022c3f7d82f0f1c68018696f2f027b87b9bb45c2 1aba721eba1d84a2defce45b950272cee1e6c72a cda42ebd67ee5fdf09d7057b5a4584d36fe8a335 345cda2fd695534be5a4494f1b59da9daed33663 dc343475ed062e13fc260acccaab91d7d80fd5b2 05eaade2782fb0c90d3034fd7a7d5a16266182bb 6a2a2d6bf8581216e08be15fcb563cfd6c430e1e Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-15netfilter: fix crashes in bridge netfilter caused by fragment jumpsPatrick McHardy2-0/+2
When fragments from bridge netfilter are passed to IPv4 or IPv6 conntrack and a reassembly queue with the same fragment key already exists from reassembling a similar packet received on a different device (f.i. with multicasted fragments), the reassembled packet might continue on a different codepath than where the head fragment originated. This can cause crashes in bridge netfilter when a fragment received on a non-bridge device (and thus with skb->nf_bridge == NULL) continues through the bridge netfilter code. Add a new reassembly identifier for packets originating from bridge netfilter and use it to put those packets in insolated queues. Fixes http://bugzilla.kernel.org/show_bug.cgi?id=14805 Reported-and-Tested-by: Chong Qiao <qiaochong@loongson.cn> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-12-15ipv6: reassembly: use seperate reassembly queues for conntrack and local ↵Patrick McHardy2-1/+8
delivery Currently the same reassembly queue might be used for packets reassembled by conntrack in different positions in the stack (PREROUTING/LOCAL_OUT), as well as local delivery. This can cause "packet jumps" when the fragment completing a reassembled packet is queued from a different position in the stack than the previous ones. Add a "user" identifier to the reassembly queue key to seperate the queues of each caller, similar to what we do for IPv4. Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-12-14Merge branch 'for-linus' of ↵Linus Torvalds3-40/+21
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits) m68k: rename global variable vmalloc_end to m68k_vmalloc_end percpu: add missing per_cpu_ptr_to_phys() definition for UP percpu: Fix kdump failure if booted with percpu_alloc=page percpu: make misc percpu symbols unique percpu: make percpu symbols in ia64 unique percpu: make percpu symbols in powerpc unique percpu: make percpu symbols in x86 unique percpu: make percpu symbols in xen unique percpu: make percpu symbols in cpufreq unique percpu: make percpu symbols in oprofile unique percpu: make percpu symbols in tracer unique percpu: make percpu symbols under kernel/ and mm/ unique percpu: remove some sparse warnings percpu: make alloc_percpu() handle array types vmalloc: fix use of non-existent percpu variable in put_cpu_var() this_cpu: Use this_cpu_xx in trace_functions_graph.c this_cpu: Use this_cpu_xx for ftrace this_cpu: Use this_cpu_xx in nmi handling this_cpu: Use this_cpu operations in RCU this_cpu: Use this_cpu ops for VM statistics ... Fix up trivial (famous last words) global per-cpu naming conflicts in arch/x86/kvm/svm.c mm/slab.c
2009-12-11Merge branch 'master' of /home/davem/src/GIT/linux-2.6/David S. Miller3-3/+2
Conflicts: include/net/tcp.h
2009-12-11net: fix compat_sys_recvmmsg parameter typeHeiko Carstens1-1/+1
compat_sys_recvmmsg has a compat_timespec parameter and not a timespec parameter. This way we also get rid of an odd cast. Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-09Merge branch 'for-linus' of ↵Linus Torvalds3-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits) tree-wide: fix misspelling of "definition" in comments reiserfs: fix misspelling of "journaled" doc: Fix a typo in slub.txt. inotify: remove superfluous return code check hdlc: spelling fix in find_pvc() comment doc: fix regulator docs cut-and-pasteism mtd: Fix comment in Kconfig doc: Fix IRQ chip docs tree-wide: fix assorted typos all over the place drivers/ata/libata-sff.c: comment spelling fixes fix typos/grammos in Documentation/edac.txt sysctl: add missing comments fs/debugfs/inode.c: fix comment typos sgivwfb: Make use of ARRAY_SIZE. sky2: fix sky2_link_down copy/paste comment error tree-wide: fix typos "couter" -> "counter" tree-wide: fix typos "offest" -> "offset" fix kerneldoc for set_irq_msi() spidev: fix double "of of" in comment comment typo fix: sybsystem -> subsystem ...
2009-12-08tcp: Stalling connections: Move timeout calculation routineDamian Lukowski1-28/+0
This patch moves retransmits_timed_out() from include/net/tcp.h to tcp_timer.c, where it is used. Reported-by: Frederic Leroy <fredo@starox.org> Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-08tcp: Stalling connections: Fix timeout calculation routineDamian Lukowski1-2/+8
This patch fixes a problem in the TCP connection timeout calculation. Currently, timeout decisions are made on the basis of the current tcp_time_stamp and retrans_stamp, which is usually set at the first retransmission. However, if the retransmission fails in tcp_retransmit_skb(), retrans_stamp is not updated and remains zero. This leads to wrong decisions in retransmits_timed_out() if tcp_time_stamp is larger than the specified timeout, which is very likely. In this case, the TCP connection dies after the first attempted (and unsuccessful) retransmission. With this patch, tcp_skb_cb->when is used instead, when retrans_stamp is not available. This bug has been introduced together with retransmits_timed_out() in 2.6.32, as the number of retransmissions has been used for timeout decisions before. The corresponding commit was 6fa12c85031485dff38ce550c24f10da23b0adaa (Revert Backoff [v3]: Calculate TCP's connection close threshold as a time value.). Thanks to Ilpo Järvinen for code suggestions and Frederic Leroy for testing. Reported-by: Frederic Leroy <fredo@starox.org> Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-08tcp: Fix a connect() race with timewait socketsEric Dumazet1-0/+3
When we find a timewait connection in __inet_hash_connect() and reuse it for a new connection request, we have a race window, releasing bind list lock and reacquiring it in __inet_twsk_kill() to remove timewait socket from list. Another thread might find the timewait socket we already chose, leading to list corruption and crashes. Fix is to remove timewait socket from bind list before releasing the bind lock. Note: This problem happens if sysctl_tcp_tw_reuse is set. Reported-by: kapil dakhane <kdakhane@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-08tcp: Fix a connect() race with timewait socketsEric Dumazet2-4/+6
First patch changes __inet_hash_nolisten() and __inet6_hash() to get a timewait parameter to be able to unhash it from ehash at same time the new socket is inserted in hash. This makes sure timewait socket wont be found by a concurrent writer in __inet_check_established() Reported-by: kapil dakhane <kdakhane@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds68-629/+944
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits) mac80211: fix reorder buffer release iwmc3200wifi: Enable wimax core through module parameter iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter iwmc3200wifi: Coex table command does not expect a response iwmc3200wifi: Update wiwi priority table iwlwifi: driver version track kernel version iwlwifi: indicate uCode type when fail dump error/event log iwl3945: remove duplicated event logging code b43: fix two warnings ipw2100: fix rebooting hang with driver loaded cfg80211: indent regulatory messages with spaces iwmc3200wifi: fix NULL pointer dereference in pmkid update mac80211: Fix TX status reporting for injected data frames ath9k: enable 2GHz band only if the device supports it airo: Fix integer overflow warning rt2x00: Fix padding bug on L2PAD devices. WE: Fix set events not propagated b43legacy: avoid PPC fault during resume b43: avoid PPC fault during resume tcp: fix a timewait refcnt race ... Fix up conflicts due to sysctl cleanups (dead sysctl_check code and CTL_UNNUMBERED removed) in kernel/sysctl_check.c net/ipv4/sysctl_net_ipv4.c net/ipv6/addrconf.c net/sctp/sysctl.c
2009-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6Linus Torvalds2-3/+1
* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits) security/tomoyo: Remove now unnecessary handling of security_sysctl. security/tomoyo: Add a special case to handle accesses through the internal proc mount. sysctl: Drop & in front of every proc_handler. sysctl: Remove CTL_NONE and CTL_UNNUMBERED sysctl: kill dead ctl_handler definitions. sysctl: Remove the last of the generic binary sysctl support sysctl net: Remove unused binary sysctl code sysctl security/tomoyo: Don't look at ctl_name sysctl arm: Remove binary sysctl support sysctl x86: Remove dead binary sysctl support sysctl sh: Remove dead binary sysctl support sysctl powerpc: Remove dead binary sysctl support sysctl ia64: Remove dead binary sysctl support sysctl s390: Remove dead sysctl binary support sysctl frv: Remove dead binary sysctl support sysctl mips/lasat: Remove dead binary sysctl support sysctl drivers: Remove dead binary sysctl support sysctl crypto: Remove dead binary sysctl support sysctl security/keys: Remove dead binary sysctl support sysctl kernel: Remove binary sysctl logic ...
2009-12-07Merge branch 'for-next' into for-linusJiri Kosina3-3/+3
Conflicts: kernel/irq/chip.c
2009-12-04Merge branch 'master' of ↵David S. Miller2-0/+32
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
2009-12-04tree-wide: fix assorted typos all over the placeAndré Goddard Rosa3-3/+3
That is "success", "unknown", "through", "performance", "[re|un]mapping" , "access", "default", "reasonable", "[con]currently", "temperature" , "channel", "[un]used", "application", "example","hierarchy", "therefore" , "[over|under]flow", "contiguous", "threshold", "enough" and others. Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-12-03tcp: connect() race with timewait reuseEric Dumazet1-0/+2
Its currently possible that several threads issuing a connect() find the same timewait socket and try to reuse it, leading to list corruptions. Condition for bug is that these threads bound their socket on same address/port of to-be-find timewait socket, and connected to same target. (SO_REUSEADDR needed) To fix this problem, we could unhash timewait socket while holding ehash lock, to make sure lookups/changes will be serialized. Only first thread finds the timewait socket, other ones find the established socket and return an EADDRNOTAVAIL error. This second version takes into account Evgeniy's review and makes sure inet_twsk_put() is called outside of locked sections. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03Merge branch 'master' of ↵David S. Miller2-39/+2
git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-next-2.6
2009-12-03net: Batch inet_twsk_purgeEric W. Biederman1-3/+3
This function walks the whole hashtable so there is no point in passing it a network namespace. Instead I purge all timewait sockets from dead network namespaces that I find. If the namespace is one of the once I am trying to purge I am guaranteed no new timewait sockets can be formed so this will get them all. If the namespace is one I am not acting for it might form a few more but I will call inet_twsk_purge again and shortly to get rid of them. In any even if the network namespace is dead timewait sockets are useless. Move the calls of inet_twsk_purge into batch_exit routines so that if I am killing a bunch of namespaces at once I will just call inet_twsk_purge once and save a lot of redundant unnecessary work. My simple 4k network namespace exit test the cleanup time dropped from roughly 8.2s to 1.6s. While the time spent running inet_twsk_purge fell to about 2ms. 1ms for ipv4 and 1ms for ipv6. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03net: Allow fib_rule_unregister to batchEric W. Biederman1-1/+2
Refactor the code so fib_rules_register always takes a template instead of the actual fib_rules_ops structure that will be used. This is required for network namespace support so 2 out of the 3 callers already do this, it allows the error handling to be made common, and it allows fib_rules_unregister to free the template for hte caller. Modify fib_rules_unregister to use call_rcu instead of syncrhonize_rcu to allw multiple namespaces to be cleaned up in the same rcu grace period. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03net: Allow xfrm_user_net_exit to batch efficiently.Eric W. Biederman1-0/+1
xfrm.nlsk is provided by the xfrm_user module and is access via rcu from other parts of the xfrm code. Add xfrm.nlsk_stash a copy of xfrm.nlsk that will never be set to NULL. This allows the synchronize_net and netlink_kernel_release to be deferred until a whole batch of xfrm.nlsk sockets have been set to NULL. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03net: Add support for batching network namespace cleanupsEric W. Biederman1-0/+2
- Add exit_list to struct net to support building lists of network namespaces to cleanup. - Add exit_batch to pernet_operations to allow running operations only once during a network namespace exit. Instead of once per network namespace. - Factor opt ops_exit_list and ops_exit_free so the logic with cleanup up a network namespace does not need to be duplicated. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03net 03/05: fib_rules: add oif classificationPatrick McHardy1-0/+3
commit 68144d350f4f6c348659c825cde6a82b34c27a91 Author: Patrick McHardy <kaber@trash.net> Date: Thu Dec 3 12:05:25 2009 +0100 net: fib_rules: add oif classification Support routing table lookup based on the flow's oif. This is useful to classify packets originating from sockets bound to interfaces differently. The route cache already includes the oif and needs no changes. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03net 02/05: fib_rules: rename ifindex/ifname/FRA_IFNAME to ↵Patrick McHardy1-3/+3
iifindex/iifname/FRA_IIFNAME commit 229e77eec406ad68662f18e49fda8b5d366768c5 Author: Patrick McHardy <kaber@trash.net> Date: Thu Dec 3 12:05:23 2009 +0100 net: fib_rules: rename ifindex/ifname/FRA_IFNAME to iifindex/iifname/FRA_IIFNAME The next patch will add oif classification, rename interface related members and attributes to reflect that they're used for iif classification. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03net 01/05: fib_rules: rearrange struct fib_rulePatrick McHardy1-1/+1
commit b8952893d5d86f69c4e499d191b98c6658f64b0f Author: Patrick McHardy <kaber@trash.net> Date: Thu Dec 3 12:05:22 2009 +0100 net: fib_rules: rearrange struct fib_rule The ifname member is only used to resolve interface names and is not needed during rule lookups. The target and ctarget members however are used during rule lookups and are currently located in a second cacheline. Move ifname further to the end to make sure both target and ctarget are located in the same cacheline as other members used during rule lookups. The layout on 64 bit changes from: struct fib_rule { ... u32 table; /* 56 4 */ u8 action; /* 60 1 */ /* XXX 3 bytes hole, try to pack */ /* --- cacheline 1 boundary (64 bytes) --- */ u32 target; /* 64 4 */ /* XXX 4 bytes hole, try to pack */ struct fib_rule * ctarget; /* 72 8 */ struct rcu_head rcu; /* 80 16 */ struct net * fr_net; /* 96 8 */ }; to: struct fib_rule { ... u32 table; /* 40 4 */ u8 action; /* 44 1 */ /* XXX 3 bytes hole, try to pack */ u32 target; /* 48 4 */ /* XXX 4 bytes hole, try to pack */ struct fib_rule * ctarget; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ char ifname[16]; /* 64 16 */ struct rcu_head rcu; /* 80 16 */ struct net * fr_net; /* 96 8 */ }; Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03Bluetooth: Implement RejActioned flagGustavo F. Padovan1-0/+1
RejActioned is used to prevent retransmission when a entity is on the WAIT_F state, i.e., waiting for a frame with F-bit set due local busy condition or a expired retransmission timer. (When these two events raise they send a frame with the Poll bit set and enters in the WAIT_F state to wait for a frame with the Final bit set.) The local entity doesn't send I-frames(the data frames) until the receipt of a frame with F-bit set. When that happens it also set RejActioned to false. RejActioned is a mandatory feature of ERTM spec. Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-12-03Bluetooth: Fix sending ReqSeq on I-framesGustavo F. Padovan1-1/+0
As specified by ERTM spec an ERTM channel can acknowledge received I-frames(the data frames) by sending an I-frame with the proper ReqSeq value (i.e. ReqSeq is set to BufferSeq). Until now we aren't setting the ReqSeq value on I-frame control bits. That way we can save sending S-frames(Supervise frames) only to acknowledge receipt of I-frames. It is very helpful to the full-duplex channel. ReqSeq is the packet sequence number sent in an acknowledgement frame to acknowledge receipt of frames up to (ReqSeq - 1). BufferSeq controls the receiver buffer, it is used to delay acknowledgement of new frames to not cause buffer overflow. BufferSeq value is not increased until frames are pulled by reassembly function. Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-12-03Bluetooth: Unobfuscate tasklet_schedule usageMarcel Holtmann1-16/+0
The tasklet schedule function helpers are just an obfuscation. So remove them and call the schedule functions directly. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-12-03Bluetooth: Turn hci_recv_frame into an exported functionMarcel Holtmann1-22/+1
For future simplification it is important that the hci_recv_frame function is no longer an inline function. So move it into the module itself and export it. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-12-02tcp: clear hints to avoid a stale one (nfs only affected?)Ilpo Järvinen1-0/+1
Eric Dumazet mentioned in a context of another problem: "Well, it seems NFS reuses its socket, so maybe we miss some cleaning as spotted in this old patch" I've not check under which conditions that actually happens but if true, we need to make sure we don't accidently leave stale hints behind when the write queue had to be purged (whether reusing with NFS can actually happen if purging took place is something I'm not sure of). ...At least it compiles. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-02TCPCT part 1g: Responder Cookie => InitiatorWilliam Allen Simpson1-0/+1
Parse incoming TCP_COOKIE option(s). Calculate <SYN,ACK> TCP_COOKIE option. Send optional <SYN,ACK> data. This is a significantly revised implementation of an earlier (year-old) patch that no longer applies cleanly, with permission of the original author (Adam Langley): http://thread.gmane.org/gmane.linux.network/102586 Requires: TCPCT part 1a: add request_values parameter for sending SYNACK TCPCT part 1b: generate Responder Cookie secret TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS TCPCT part 1d: define TCP cookie option, extend existing struct's TCPCT part 1e: implement socket option TCP_COOKIE_TRANSACTIONS TCPCT part 1f: Initiator Cookie => Responder Signed-off-by: William.Allen.Simpson@gmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-02TCPCT part 1d: define TCP cookie option, extend existing struct'sWilliam Allen Simpson1-0/+83
Data structures are carefully composed to require minimal additions. For example, the struct tcp_options_received cookie_plus variable fits between existing 16-bit and 8-bit variables, requiring no additional space (taking alignment into consideration). There are no additions to tcp_request_sock, and only 1 pointer in tcp_sock. This is a significantly revised implementation of an earlier (year-old) patch that no longer applies cleanly, with permission of the original author (Adam Langley): http://thread.gmane.org/gmane.linux.network/102586 The principle difference is using a TCP option to carry the cookie nonce, instead of a user configured offset in the data. This is more flexible and less subject to user configuration error. Such a cookie option has been suggested for many years, and is also useful without SYN data, allowing several related concepts to use the same extension option. "Re: SYN floods (was: does history repeat itself?)", September 9, 1996. http://www.merit.net/mail.archives/nanog/1996-09/msg00235.html "Re: what a new TCP header might look like", May 12, 1998. ftp://ftp.isi.edu/end2end/end2end-interest-1998.mail These functions will also be used in subsequent patches that implement additional features. Requires: TCPCT part 1a: add request_values parameter for sending SYNACK TCPCT part 1b: generate Responder Cookie secret TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS Signed-off-by: William.Allen.Simpson@gmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-02TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONSWilliam Allen Simpson1-5/+1
Define sysctl (tcp_cookie_size) to turn on and off the cookie option default globally, instead of a compiled configuration option. Define per socket option (TCP_COOKIE_TRANSACTIONS) for setting constant data values, retrieving variable cookie values, and other facilities. Move inline tcp_clear_options() unchanged from net/tcp.h to linux/tcp.h, near its corresponding struct tcp_options_received (prior to changes). This is a straightforward re-implementation of an earlier (year-old) patch that no longer applies cleanly, with permission of the original author (Adam Langley): http://thread.gmane.org/gmane.linux.network/102586 These functions will also be used in subsequent patches that implement additional features. Requires: net: TCP_MSS_DEFAULT, TCP_MSS_DESIRED Signed-off-by: William.Allen.Simpson@gmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-02TCPCT part 1b: generate Responder Cookie secretWilliam Allen Simpson1-0/+8
Define (missing) hash message size for SHA1. Define hashing size constants specific to TCP cookies. Add new function: tcp_cookie_generator(). Maintain global secret values for tcp_cookie_generator(). This is a significantly revised implementation of earlier (15-year-old) Photuris [RFC-2522] code for the KA9Q cooperative multitasking platform. Linux RCU technique appears to be well-suited to this application, though neither of the circular queue items are freed. These functions will also be used in subsequent patches that implement additional features. Signed-off-by: William.Allen.Simpson@gmail.com Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-02TCPCT part 1a: add request_values parameter for sending SYNACKWilliam Allen Simpson2-2/+9
Add optional function parameters associated with sending SYNACK. These parameters are not needed after sending SYNACK, and are not used for retransmission. Avoids extending struct tcp_request_sock, and avoids allocating kernel memory. Also affects DCCP as it uses common struct request_sock_ops, but this parameter is currently reserved for future use. Signed-off-by: William.Allen.Simpson@gmail.com Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-01Merge branch 'master' of ↵David S. Miller1-0/+6
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/mac80211/ht.c
2009-12-01net: remove [un]register_pernet_gen_... and update the docs.Eric W. Biederman2-25/+5
No that all of the callers have been updated to set fields in struct pernet_operations, and simplified to let the network namespace core handle the allocation and freeing of the storage for them, remove the surpurpflous methods and update the docs to the new style. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-01net: Automatically allocate per namespace data.Eric W. Biederman1-4/+24
To get the full benefit of batched network namespace cleanup netowrk device deletion needs to be performed by the generic code. When using register_pernet_gen_device and freeing the data in exit_net it is impossible to delay allocation until after exit_net has called as the device uninit methods are no longer safe. To correct this, and to simplify working with per network namespace data I have moved allocation and deletion of per network namespace data into the network namespace core. The core now frees the data only after all of the network namespace exit routines have run. Now it is only required to set the new fields .id and .size in the pernet_operations structure if you want network namespace data to be managed for you automatically. This makes the current register_pernet_gen_device and register_pernet_gen_subsys routines unnecessary. For the moment I have left them as compatibility wrappers in net_namespace.h They will be removed once all of the users have been updated. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-01net: Batch network namespace destruction.Eric W. Biederman1-1/+1
It is fairly common to kill several network namespaces at once. Either because they are nested one inside the other or because they are cooperating in multiple machine networking experiments. As the network stack control logic does not parallelize easily batch up multiple network namespaces existing together. To get the full benefit of batching the virtual network devices to be removed must be all removed in one batch. For that purpose I have added a loop after the last network device operations have run that batches up all remaining network devices and deletes them. An extra benefit is that the reorganization slightly shrinks the size of the per network namespace data structures replaceing a work_struct with a list_head. In a trivial test with 4K namespaces this change reduced the cost of a destroying 4K namespaces from 7+ minutes (at 12% cpu) to 44 seconds (at 60% cpu). The bulk of that 44s was spent in inet_twsk_purge. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-01net: NETDEV_UNREGISTER_PERNET -> NETDEV_UNREGISTER_BATCHEric W. Biederman1-0/+1
The motivation for an additional notifier in batched netdevice notification (rt_do_flush) only needs to be called once per batch not once per namespace. For further batching improvements I need a guarantee that the netdevices are unregistered in order allowing me to unregister an all of the network devices in a network namespace at the same time with the guarantee that the loopback device is really and truly unregistered last. Additionally it appears that we moved the route cache flush after the final synchronize_net, which seems wrong and there was no explanation. So I have restored the original location of the final synchronize_net. Cc: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-30Merge branch 'security' of ↵Linus Torvalds1-0/+6
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 * 'security' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6: mac80211: fix spurious delBA handling mac80211: fix two remote exploits
2009-11-30mac80211: fix spurious delBA handlingJohannes Berg1-0/+6
Lennert Buytenhek noticed that delBA handling in mac80211 was broken and has remotely triggerable problems, some of which are due to some code shuffling I did that ended up changing the order in which things were done -- this was commit d75636ef9c1af224f1097941879d5a8db7cd04e5 Author: Johannes Berg <johannes@sipsolutions.net> Date: Tue Feb 10 21:25:53 2009 +0100 mac80211: RX aggregation: clean up stop session and other parts were already present in the original commit d92684e66091c0f0101819619b315b4bb8b5bcc5 Author: Ron Rindjunsky <ron.rindjunsky@intel.com> Date: Mon Jan 28 14:07:22 2008 +0200 mac80211: A-MPDU Tx add delBA from recipient support The first problem is that I moved a BUG_ON before various checks -- thereby making it possible to hit. As the comment indicates, the BUG_ON can be removed since the ampdu_action callback must already exist when the state is != IDLE. The second problem isn't easily exploitable but there's a race condition due to unconditionally setting the state to OPERATIONAL when a delBA frame is received, even when no aggregation session was ever initiated. All the drivers accept stopping the session even then, but that opens a race window where crashes could happen before the driver accepts it. Right now, a WARN_ON may happen with non-HT drivers, while the race opens only for HT drivers. For this case, there are two things necessary to fix it: 1) don't process spurious delBA frames, and be more careful about the session state; don't drop the lock 2) HT drivers need to be prepared to handle a session stop even before the session was really started -- this is true for all drivers (that support aggregation) but iwlwifi which can be fixed easily. The other HT drivers (ath9k and ar9170) are behaving properly already. Reported-by: Lennert Buytenhek <buytenh@marvell.com> Cc: stable@kernel.org Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-29Merge branch 'master' of ↵David S. Miller1-1/+0
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/ieee802154/fakehard.c drivers/net/e1000e/ich8lan.c drivers/net/e1000e/phy.c drivers/net/netxen/netxen_nic_init.c drivers/net/wireless/ath/ath9k/main.c
2009-11-29X25: Move SYSCTL ifdefs into headerandrew hendry1-0/+6
Moves the CONFIG_SYSCTL ifdefs in x25_init into header. Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-29Merge branch 'net-next' of ↵David S. Miller4-97/+60
git://git.kernel.org/pub/scm/linux/kernel/git/vxy/lksctp-dev
2009-11-29sctp: on T3_RTX retransmit all the in-flight chunksAndrei Pelinescu-Onciul1-1/+0
When retransmitting due to T3 timeout, retransmit all the in-flight chunks for the corresponding transport/path, including chunks sent less then 1 rto ago. This is the correct behaviour according to rfc4960 section 6.3.3 E3 and "Note: Any DATA chunks that were sent to the address for which the T3-rtx timer expired but did not fit in one MTU (rule E3 above) should be marked for retransmission and sent as soon as cwnd allows (normally, when a SACK arrives). ". This fixes problems when more then one path is present and the T3 retransmission of the first chunk that timeouts stops the T3 timer for the initial active path, leaving all the other in-flight chunks waiting forever or until a new chunk is transmitted on the same path and timeouts (and this will happen only if the cwnd allows sending new chunks, but since cwnd was dropped to MTU by the timeout => it will wait until the first heartbeat). Example: 10 packets in flight, sent at 0.1 s intervals on the primary path. The primary path is down and the first packet timeouts. The first packet is retransmitted on another path, the T3 timer for the primary path is stopped and cwnd is set to MTU. All the other 9 in-flight packets will not be retransmitted (unless more new packets are sent on the primary path which depend on cwnd allowing it, and even in this case the 9 packets will be retransmitted only after a new packet timeouts which even in the best case would be more then RTO). This commit reverts d0ce92910bc04e107b2f3f2048f07e94f570035d and also removes the now unused transport->last_rto, introduced in b6157d8e03e1e780660a328f7183bcbfa4a93a19. p.s The problem is not only when multiple paths are there. It can happen in a single homed environment. If the application stops sending data, it possible to have a hung association. Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-28nl80211: PMKSA caching supportSamuel Ortiz1-0/+28
This is an interface to set, delete and flush PMKIDs through nl80211. Main users would be fullmac devices which firmwares are capable of generating the RSN IEs for the re-association requests, e.g. iwmc3200wifi. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-28mac80211: move cmntr flag out of rx flagsJohannes Berg1-0/+4
The RX flags should soon be used only for flags that cannot change within an a-MPDU, so move the cooked monitor flag into the RX status flags. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>