summaryrefslogtreecommitdiffstats
path: root/include/net/sctp
AgeCommit message (Collapse)AuthorFilesLines
2010-05-06sctp: Fix a race between ICMP protocol unreachable and connect()Vlad Yasevich2-0/+4
ICMP protocol unreachable handling completely disregarded the fact that the user may have locked the socket. It proceeded to destroy the association, even though the user may have held the lock and had a ref on the association. This resulted in the following: Attempt to release alive inet socket f6afcc00 ========================= [ BUG: held lock freed! ] ------------------------- somenu/2672 is freeing memory f6afcc00-f6afcfff, with a lock still held there! (sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c 1 lock held by somenu/2672: #0: (sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c stack backtrace: Pid: 2672, comm: somenu Not tainted 2.6.32-telco #55 Call Trace: [<c1232266>] ? printk+0xf/0x11 [<c1038553>] debug_check_no_locks_freed+0xce/0xff [<c10620b4>] kmem_cache_free+0x21/0x66 [<c1185f25>] __sk_free+0x9d/0xab [<c1185f9c>] sk_free+0x1c/0x1e [<c1216e38>] sctp_association_put+0x32/0x89 [<c1220865>] __sctp_connect+0x36d/0x3f4 [<c122098a>] ? sctp_connect+0x13/0x4c [<c102d073>] ? autoremove_wake_function+0x0/0x33 [<c12209a8>] sctp_connect+0x31/0x4c [<c11d1e80>] inet_dgram_connect+0x4b/0x55 [<c11834fa>] sys_connect+0x54/0x71 [<c103a3a2>] ? lock_release_non_nested+0x88/0x239 [<c1054026>] ? might_fault+0x42/0x7c [<c1054026>] ? might_fault+0x42/0x7c [<c11847ab>] sys_socketcall+0x6d/0x178 [<c10da994>] ? trace_hardirqs_on_thunk+0xc/0x10 [<c1002959>] syscall_call+0x7/0xb This was because the sctp_wait_for_connect() would aqcure the socket lock and then proceed to release the last reference count on the association, thus cause the fully destruction path to finish freeing the socket. The simplest solution is to start a very short timer in case the socket is owned by user. When the timer expires, we can do some verification and be able to do the release properly. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28sctp: Fix skb_over_panic resulting from multiple invalid parameter errors ↵Neil Horman1-0/+1
(CVE-2010-1173) (v4) Ok, version 4 Change Notes: 1) Minor cleanups, from Vlads notes Summary: Hey- Recently, it was reported to me that the kernel could oops in the following way: <5> kernel BUG at net/core/skbuff.c:91! <5> invalid operand: 0000 [#1] <5> Modules linked in: sctp netconsole nls_utf8 autofs4 sunrpc iptable_filter ip_tables cpufreq_powersave parport_pc lp parport vmblock(U) vsock(U) vmci(U) vmxnet(U) vmmemctl(U) vmhgfs(U) acpiphp dm_mirror dm_mod button battery ac md5 ipv6 uhci_hcd ehci_hcd snd_ens1371 snd_rawmidi snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_ac97_codec snd soundcore pcnet32 mii floppy ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod <5> CPU: 0 <5> EIP: 0060:[<c02bff27>] Not tainted VLI <5> EFLAGS: 00010216 (2.6.9-89.0.25.EL) <5> EIP is at skb_over_panic+0x1f/0x2d <5> eax: 0000002c ebx: c033f461 ecx: c0357d96 edx: c040fd44 <5> esi: c033f461 edi: df653280 ebp: 00000000 esp: c040fd40 <5> ds: 007b es: 007b ss: 0068 <5> Process swapper (pid: 0, threadinfo=c040f000 task=c0370be0) <5> Stack: c0357d96 e0c29478 00000084 00000004 c033f461 df653280 d7883180 e0c2947d <5> 00000000 00000080 df653490 00000004 de4f1ac0 de4f1ac0 00000004 df653490 <5> 00000001 e0c2877a 08000800 de4f1ac0 df653490 00000000 e0c29d2e 00000004 <5> Call Trace: <5> [<e0c29478>] sctp_addto_chunk+0xb0/0x128 [sctp] <5> [<e0c2947d>] sctp_addto_chunk+0xb5/0x128 [sctp] <5> [<e0c2877a>] sctp_init_cause+0x3f/0x47 [sctp] <5> [<e0c29d2e>] sctp_process_unk_param+0xac/0xb8 [sctp] <5> [<e0c29e90>] sctp_verify_init+0xcc/0x134 [sctp] <5> [<e0c20322>] sctp_sf_do_5_1B_init+0x83/0x28e [sctp] <5> [<e0c25333>] sctp_do_sm+0x41/0x77 [sctp] <5> [<c01555a4>] cache_grow+0x140/0x233 <5> [<e0c26ba1>] sctp_endpoint_bh_rcv+0xc5/0x108 [sctp] <5> [<e0c2b863>] sctp_inq_push+0xe/0x10 [sctp] <5> [<e0c34600>] sctp_rcv+0x454/0x509 [sctp] <5> [<e084e017>] ipt_hook+0x17/0x1c [iptable_filter] <5> [<c02d005e>] nf_iterate+0x40/0x81 <5> [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151 <5> [<c02e0c7f>] ip_local_deliver_finish+0xc6/0x151 <5> [<c02d0362>] nf_hook_slow+0x83/0xb5 <5> [<c02e0bb2>] ip_local_deliver+0x1a2/0x1a9 <5> [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151 <5> [<c02e103e>] ip_rcv+0x334/0x3b4 <5> [<c02c66fd>] netif_receive_skb+0x320/0x35b <5> [<e0a0928b>] init_stall_timer+0x67/0x6a [uhci_hcd] <5> [<c02c67a4>] process_backlog+0x6c/0xd9 <5> [<c02c690f>] net_rx_action+0xfe/0x1f8 <5> [<c012a7b1>] __do_softirq+0x35/0x79 <5> [<c0107efb>] handle_IRQ_event+0x0/0x4f <5> [<c01094de>] do_softirq+0x46/0x4d Its an skb_over_panic BUG halt that results from processing an init chunk in which too many of its variable length parameters are in some way malformed. The problem is in sctp_process_unk_param: if (NULL == *errp) *errp = sctp_make_op_error_space(asoc, chunk, ntohs(chunk->chunk_hdr->length)); if (*errp) { sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM, WORD_ROUND(ntohs(param.p->length))); sctp_addto_chunk(*errp, WORD_ROUND(ntohs(param.p->length)), param.v); When we allocate an error chunk, we assume that the worst case scenario requires that we have chunk_hdr->length data allocated, which would be correct nominally, given that we call sctp_addto_chunk for the violating parameter. Unfortunately, we also, in sctp_init_cause insert a sctp_errhdr_t structure into the error chunk, so the worst case situation in which all parameters are in violation requires chunk_hdr->length+(sizeof(sctp_errhdr_t)*param_count) bytes of data. The result of this error is that a deliberately malformed packet sent to a listening host can cause a remote DOS, described in CVE-2010-1173: http://cve.mitre.org/cgi-bin/cvename.cgi?name=2010-1173 I've tested the below fix and confirmed that it fixes the issue. We move to a strategy whereby we allocate a fixed size error chunk and ignore errors we don't have space to report. Tested by me successfully Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28sctp: Fix oops when sending queued ASCONF chunksVlad Yasevich1-0/+1
When we finish processing ASCONF_ACK chunk, we try to send the next queued ASCONF. This action runs the sctp state machine recursively and it's not prepared to do so. kernel BUG at kernel/timer.c:790! invalid opcode: 0000 [#1] SMP last sysfs file: /sys/module/ipv6/initstate Modules linked in: sha256_generic sctp libcrc32c ipv6 dm_multipath uinput 8139too i2c_piix4 8139cp mii i2c_core pcspkr virtio_net joydev floppy virtio_blk virtio_pci [last unloaded: scsi_wait_scan] Pid: 0, comm: swapper Not tainted 2.6.34-rc4 #15 /Bochs EIP: 0060:[<c044a2ef>] EFLAGS: 00010286 CPU: 0 EIP is at add_timer+0xd/0x1b EAX: cecbab14 EBX: 000000f0 ECX: c0957b1c EDX: 03595cf4 ESI: cecba800 EDI: cf276f00 EBP: c0957aa0 ESP: c0957aa0 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process swapper (pid: 0, ti=c0956000 task=c0988ba0 task.ti=c0956000) Stack: c0957ae0 d1851214 c0ab62e4 c0ab5f26 0500ffff 00000004 00000005 00000004 <0> 00000000 d18694fd 00000004 1666b892 cecba800 cecba800 c0957b14 00000004 <0> c0957b94 d1851b11 ceda8b00 cecba800 cf276f00 00000001 c0957b14 000000d0 Call Trace: [<d1851214>] ? sctp_side_effects+0x607/0xdfc [sctp] [<d1851b11>] ? sctp_do_sm+0x108/0x159 [sctp] [<d1863386>] ? sctp_pname+0x0/0x1d [sctp] [<d1861a56>] ? sctp_primitive_ASCONF+0x36/0x3b [sctp] [<d185657c>] ? sctp_process_asconf_ack+0x2a4/0x2d3 [sctp] [<d184e35c>] ? sctp_sf_do_asconf_ack+0x1dd/0x2b4 [sctp] [<d1851ac1>] ? sctp_do_sm+0xb8/0x159 [sctp] [<d1863334>] ? sctp_cname+0x0/0x52 [sctp] [<d1854377>] ? sctp_assoc_bh_rcv+0xac/0xe1 [sctp] [<d1858f0f>] ? sctp_inq_push+0x2d/0x30 [sctp] [<d186329d>] ? sctp_rcv+0x797/0x82e [sctp] Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Yuansong Qiao <ysqiao@research.ait.ie> Signed-off-by: Shuaijun Zhang <szhang@research.ait.ie> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28sctp: avoid irq lock inversion while call sk->sk_data_ready()Wei Yongjun1-0/+1
sk->sk_data_ready() of sctp socket can be called from both BH and non-BH contexts, but the default sk->sk_data_ready(), sock_def_readable(), can not be used in this case. Therefore, we have to make a new function sctp_data_ready() to grab sk->sk_data_ready() with BH disabling. ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.33-rc6 #129 --------------------------------------------------------- sctp_darn/1517 just changed the state of lock: (clock-AF_INET){++.?..}, at: [<c06aab60>] sock_def_readable+0x20/0x80 but this lock took another, SOFTIRQ-unsafe lock in the past: (slock-AF_INET){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 1 lock held by sctp_darn/1517: #0: (sk_lock-AF_INET){+.+.+.}, at: [<cdfe363d>] sctp_sendmsg+0x23d/0xc00 [sctp] Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-09Merge branch 'for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits) tree-wide: fix misspelling of "definition" in comments reiserfs: fix misspelling of "journaled" doc: Fix a typo in slub.txt. inotify: remove superfluous return code check hdlc: spelling fix in find_pvc() comment doc: fix regulator docs cut-and-pasteism mtd: Fix comment in Kconfig doc: Fix IRQ chip docs tree-wide: fix assorted typos all over the place drivers/ata/libata-sff.c: comment spelling fixes fix typos/grammos in Documentation/edac.txt sysctl: add missing comments fs/debugfs/inode.c: fix comment typos sgivwfb: Make use of ARRAY_SIZE. sky2: fix sky2_link_down copy/paste comment error tree-wide: fix typos "couter" -> "counter" tree-wide: fix typos "offest" -> "offset" fix kerneldoc for set_irq_msi() spidev: fix double "of of" in comment comment typo fix: sybsystem -> subsystem ...
2009-12-07Merge branch 'for-next' into for-linusJiri Kosina1-1/+1
Conflicts: kernel/irq/chip.c
2009-12-04tree-wide: fix assorted typos all over the placeAndré Goddard Rosa1-1/+1
That is "success", "unknown", "through", "performance", "[re|un]mapping" , "access", "default", "reasonable", "[con]currently", "temperature" , "channel", "[un]used", "application", "example","hierarchy", "therefore" , "[over|under]flow", "contiguous", "threshold", "enough" and others. Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-11-29Merge branch 'master' of ↵David S. Miller1-1/+0
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/ieee802154/fakehard.c drivers/net/e1000e/ich8lan.c drivers/net/e1000e/phy.c drivers/net/netxen/netxen_nic_init.c drivers/net/wireless/ath/ath9k/main.c
2009-11-29sctp: on T3_RTX retransmit all the in-flight chunksAndrei Pelinescu-Onciul1-1/+0
When retransmitting due to T3 timeout, retransmit all the in-flight chunks for the corresponding transport/path, including chunks sent less then 1 rto ago. This is the correct behaviour according to rfc4960 section 6.3.3 E3 and "Note: Any DATA chunks that were sent to the address for which the T3-rtx timer expired but did not fit in one MTU (rule E3 above) should be marked for retransmission and sent as soon as cwnd allows (normally, when a SACK arrives). ". This fixes problems when more then one path is present and the T3 retransmission of the first chunk that timeouts stops the T3 timer for the initial active path, leaving all the other in-flight chunks waiting forever or until a new chunk is transmitted on the same path and timeouts (and this will happen only if the cwnd allows sending new chunks, but since cwnd was dropped to MTU by the timeout => it will wait until the first heartbeat). Example: 10 packets in flight, sent at 0.1 s intervals on the primary path. The primary path is down and the first packet timeouts. The first packet is retransmitted on another path, the T3 timer for the primary path is stopped and cwnd is set to MTU. All the other 9 in-flight packets will not be retransmitted (unless more new packets are sent on the primary path which depend on cwnd allowing it, and even in this case the 9 packets will be retransmitted only after a new packet timeouts which even in the best case would be more then RTO). This commit reverts d0ce92910bc04e107b2f3f2048f07e94f570035d and also removes the now unused transport->last_rto, introduced in b6157d8e03e1e780660a328f7183bcbfa4a93a19. p.s The problem is not only when multiple paths are there. It can happen in a single homed environment. If the application stops sending data, it possible to have a hung association. Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-23sctp: Update max.burst implementationVlad Yasevich1-0/+4
Current implementation of max.burst ends up limiting new data during cwnd decay period. The decay is happening becuase the connection is idle and we are allowed to fill the congestion window. The point of max.burst is to limit micro-bursts in response to large acks. This still happens, as max.burst is still applied to each transmit opportunity. It will also apply if a very large send is made (greater then allowed by burst). Tested-by: Florian Niederbacher <florian.niederbacher@student.uibk.ac.at> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-11-23sctp: Turn the enum socket options into definesVlad Yasevich1-82/+43
Recent attempt to remove deprecated socket options demonstrated that removing options from the enum space will have severe binary compatibility issues. The reason is that it changes the subsequent enum space and causes option values to be redefined. To solve this, and to get rid of the ugly double statements for every option, we simply convert to the #define scheme. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-11-23sctp: Remove useless last_time_used variableVlad Yasevich1-6/+0
The transport last_time_used variable is rather useless. It was only used when determining if CWND needs to be updated due to idle transport. However, idle transport detection was based on a Heartbeat timer and last_time_used was not incremented when sending Heartbeats. As a result the check for cwnd reduction was always true. We can get rid of the variable and just base our cwnd manipulation on the HB timer (like the code comment sais). We also have to call into the cwnd manipulation function regardless of whether HBs are enabled or not. That way we will detect idle transports if the user has disabled Heartbeats. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-11-23sctp: remove deprecated SCTP_GET_*_OLD stuffsAmerigo Wang1-8/+0
SCTP_GET_*_OLD stuffs are schedlued to be removed. Cc: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-11-23sctp: Update SWS avaoidance receiver side algorithmVlad Yasevich2-0/+10
We currently send window update SACKs every time we free up 1 PMTU worth of data. That a lot more SACKs then necessary. Instead, we'll now send back the actuall window every time we send a sack, and do window-update SACKs when a fraction of the receive buffer has been opened. The fraction is controlled with a sysctl. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-11-23sctp: Fix malformed "Invalid Stream Identifier" errorVlad Yasevich1-1/+2
The "Invalid Stream Identifier" error has a 16 bit reserved field at the end, thus making the parameter length be 8 bytes. We've never supplied that reserved field making wireshark tag the packet as malformed. Reported-by: Chris Dischino <cdischino@sonusnet.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-11-23sctp: implement the sender side for SACK-IMMEDIATELY extensionWei Yongjun1-0/+1
This patch implement the sender side for SACK-IMMEDIATELY extension. Section 4.1. Sender Side Considerations Whenever the sender of a DATA chunk can benefit from the corresponding SACK chunk being sent back without delay, the sender MAY set the I-bit in the DATA chunk header. Reasons for setting the I-bit include o The sender is in the SHUTDOWN-PENDING state. o The application requests to set the I-bit of the last DATA chunk of a user message when providing the user message to the SCTP implementation. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-11-17Merge branch 'master' of ↵David S. Miller1-1/+1
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/can/Kconfig
2009-11-13sctp: Set source addresses on the association before adding transportsVlad Yasevich1-1/+1
Recent commit 8da645e101a8c20c6073efda3c7cc74eec01b87f sctp: Get rid of an extra routing lookup when adding a transport introduced a regression in the connection setup. The behavior was different between IPv4 and IPv6. IPv4 case ended up working because the route lookup routing returned a NULL route, which triggered another route lookup later in the output patch that succeeded. In the IPv6 case, a valid route was returned for first call, but we could not find a valid source address at the time since the source addresses were not set on the association yet. Thus resulted in a hung connection. The solution is to set the source addresses on the association prior to adding peers. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-04net: cleanup include/netEric Dumazet1-2/+1
This cleanup patch puts struct/union/enum opening braces, in first line to ease grep games. struct something { becomes : struct something { Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-30net: Make setsockopt() optlen be unsigned.David S. Miller1-2/+2
This provides safety against negative optlen at the type level instead of depending upon (sometimes non-trivial) checks against this sprinkled all over the the place, in each and every implementation. Based upon work done by Arjan van de Ven and feedback from Linus Torvalds. Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-04sctp: turn flags in 'struct sctp_association' into bit fieldsWei Yongjun1-12/+9
This shrinks the size of struct sctp_association a little. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: Sysctl configuration for IPv4 Address ScopingBhaskar Dutta2-0/+17
This patch introduces a new sysctl option to make IPv4 Address Scoping configurable <draft-stewart-tsvwg-sctp-ipv4-00.txt>. In networking environments where DNAT rules in iptables prerouting chains convert destination IP's to link-local/private IP addresses, SCTP connections fail to establish as the INIT chunk is dropped by the kernel due to address scope match failure. For example to support overlapping IP addresses (same IP address with different vlan id) a Layer-5 application listens on link local IP's, and there is a DNAT rule that maps the destination IP to a link local IP. Such applications never get the SCTP INIT if the address-scoping draft is strictly followed. This sysctl configuration allows SCTP to function in such unconventional networking environments. Sysctl options: 0 - Disable IPv4 address scoping draft altogether 1 - Enable IPv4 address scoping (default, current behavior) 2 - Enable address scoping but allow IPv4 private addresses in init/init-ack 3 - Enable address scoping but allow IPv4 link local address in init/init-ack Signed-off-by: Bhaskar Dutta <bhaskar.dutta@globallogic.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: Turn flags in 'sctp_packet' into bit fieldsVlad Yasevich1-16/+6
This shrinks the size of sctp_packet a little. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: Fix SCTP_MAXSEG socket option to comply to spec.Vlad Yasevich2-3/+5
We had a bug that we never stored the user-defined value for MAXSEG when setting the value on an association. Thus future PMTU events ended up re-writing the frag point and increasing it past user limit. Additionally, when setting the option on the socket/endpoint, we effect all current associations, which is against spec. Now, we store the user 'maxseg' value along with the computed 'frag_point'. We inherit 'maxseg' from the socket at association creation and use it as an upper limit for 'frag_point' when its set. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: Don't do NAGLE delay on large writes that were fragmented smallVlad Yasevich1-1/+1
SCTP will delay the last part of a large write due to NAGLE, if that part is smaller then MTU. Since we are doing large writes, we might as well send the last portion now instead of waiting untill the next large write happens. The small portion will be sent as is regardless, so it's better to not delay it. This is a result of much discussions with Wei Yongjun <yjwei@cn.fujitsu.com> and Doug Graham <dgraham@nortel.com>. Many thanks go out to them. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: drop a_rwnd to 0 when receive buffer overflows.Vlad Yasevich1-0/+6
SCTP has a problem that when small chunks are used, it is possible to exhaust the receiver buffer without fully closing receive window. This happens due to all overhead that we have account for with small messages. To fix this, when receive buffer is exceeded, we'll drop the window to 0 and save the 'drop' portion. When application starts reading data and freeing up recevie buffer space, we'll wait until we've reached the 'drop' window and then add back this 'drop' one mtu at a time. This worked well in testing and under stress produced rather even recovery. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: Send user messages to the lower layer as oneVlad Yasevich2-0/+4
Currenlty, sctp breaks up user messages into fragments and sends each fragment to the lower layer by itself. This means that for each fragment we go all the way down the stack and back up. This also discourages bundling of multiple fragments when they can fit into a sigle packet (ex: due to user setting a low fragmentation threashold). We introduce a new command SCTP_CMD_SND_MSG and hand the whole message down state machine. The state machine and the side-effect parser will cork the queue, add all chunks from the message to the queue, and then un-cork the queue thus causing the chunks to get transmitted. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: Disallow new connection on a closing socketVlad Yasevich1-1/+1
If a socket has a lot of association that are in the process of of being closed/aborted, it is possible for a remote to establish new associations during the time period that the old ones are shutting down. If this was a result of a close() call, there will be no socket and will cause a memory leak. We'll prevent this by setting the socket state to CLOSING and disallow new associations when in this state. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-09-04sctp: remove unused union (sctp_cmsg_data_t) definitionRami Rosen1-6/+0
This patch removes an unused union definition (sctp_cmsg_data_t) from include/net/sctp/user.h. Signed-off-by: Rami Rosen <rosenrami@gmail.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-08-05net: mark read-only arrays as constJan Engelhardt1-1/+3
String literals are constant, and usually, we can also tag the array of pointers const too, moving it to the .rodata section. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-23net: Move rx skb_orphan call to where neededHerbert Xu1-0/+1
In order to get the tun driver to account packets, we need to be able to receive packets with destructors set. To be on the safe side, I added an skb_orphan call for all protocols by default since some of them (IP in particular) cannot handle receiving packets destructors properly. Now it seems that at least one protocol (CAN) expects to be able to pass skb->sk through the rx path without getting clobbered. So this patch attempts to fix this properly by moving the skb_orphan call to where it's actually needed. In particular, I've added it to skb_set_owner_[rw] which is what most users of skb->destructor call. This is actually an improvement for tun too since it means that we only give back the amount charged to the socket when the skb is passed to another socket that will also be charged accordingly. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Oliver Hartkopp <olver@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-03sctp: support non-blocking version of the new sctp_connectx() APIVlad Yasevich1-0/+2
Prior implementation of the new sctp_connectx() call that returns an association ID did not work correctly on non-blocking socket. This is because we could not return both a EINPROGRESS error and an association id. This is a new implementation that supports this. Originally from Ivan Skytte Jørgensen <isj-sctp@i1.dk Signed-off-by: Ivan Skytte Jørgensen <isj-sctp@i1.dk Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-06-03sctp: fix to choose alternate destination when retransmit ASCONF chunkWei Yongjun1-4/+2
RFC 5061 Section 5.1 ASCONF Chunk Procedures said: B4) Re-transmit the ASCONF Chunk last sent and if possible choose an alternate destination address (please refer to [RFC4960], Section 6.4.1). An endpoint MUST NOT add new parameters to this chunk; it MUST be the same (including its Sequence Number) as the last ASCONF sent. An endpoint MAY, however, bundle an additional ASCONF with new ASCONF parameters with the next Sequence Number. For details, see Section 5.5. This patch fix to choose an alternate destination address when re-transmit the ASCONF chunk, with some dup codes cleanup. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-03-21sctp: Clean up TEST_FRAME hacks.Vlad Yasevich1-7/+0
Remove 2 TEST_FRAME hacks that are no longer needed. These allowed sctp regression tests to compile before, but are no longer needed. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02sctp: Fix broken RTO-doubling for data retransmitsVlad Yasevich1-1/+2
Commit faee47cdbfe8d74a1573c2f81ea6dbb08d735be6 (sctp: Fix the RTO-doubling on idle-link heartbeats) broke the RTO doubling for data retransmits. If the heartbeat was sent before the data T3-rtx time, the the RTO will not double upon the T3-rtx expiration. Distingish between the operations by passing an argument to the function. Additionally, Wei Youngjun pointed out that our treatment of requested HEARTBEATS and timer HEARTBEATS is the same wrt resetting congestion window. That needs to be separated, since user requested HEARTBEATS should not treat the link as idle. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-16sctp: Inherit all socket options from parent correctly.Vlad Yasevich1-0/+2
During peeloff/accept() sctp needs to save the parent socket state into the new socket so that any options set on the parent are inherited by the child socket. This was found when the parent/listener socket issues SO_BINDTODEVICE, but the data was misrouted after a route cache flush. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-16sctp: Fix the RTO-doubling on idle-link heartbeatsVlad Yasevich1-0/+2
SCTP incorrectly doubles rto ever time a Hearbeat chunk is generated. However RFC 4960 states: On an idle destination address that is allowed to heartbeat, it is recommended that a HEARTBEAT chunk is sent once per RTO of that destination address plus the protocol parameter 'HB.interval', with jittering of +/- 50% of the RTO value, and exponential backoff of the RTO if the previous HEARTBEAT is unanswered. Essentially, of if the heartbean is unacknowledged, do we double the RTO. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-16sctp: Clean up sctp checksumming codeVlad Yasevich1-7/+7
The sctp crc32c checksum is always generated in little endian. So, we clean up the code to treat it as little endian and remove all the __force casts. Suggested by Herbert Xu. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-16sctp: Allow to disable SCTP checksums via module parameterLucas Nussbaum1-0/+5
This is a new version of my patch, now using a module parameter instead of a sysctl, so that the option is harder to find. Please note that, once the module is loaded, it is still possible to change the value of the parameter in /sys/module/sctp/parameters/, which is useful if you want to do performance comparisons without rebooting. Computation of SCTP checksums significantly affects the performance of SCTP. For example, using two dual-Opteron 246 connected using a Gbe network, it was not possible to achieve more than ~730 Mbps, compared to 941 Mbps after disabling SCTP checksums. Unfortunately, SCTP checksum offloading in NICs is not commonly available (yet). By default, checksums are still enabled, of course. Signed-off-by: Lucas Nussbaum <lucas.nussbaum@ens-lyon.fr> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-22sctp: Fix crc32c calculations on big-endian arhes.Vlad Yasevich1-1/+1
crc32c algorithm provides a byteswaped result. On little-endian arches, the result ends up in big-endian/network byte order. On big-endinan arches, the result ends up in little-endian order and needs to be byte swapped again. Thus calling cpu_to_le32 gives the right output. Tested-by: Jukka Taimisto <jukka.taimisto@mail.suomi.net> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-25sctp: Implement socket option SCTP_GET_ASSOC_NUMBERWei Yongjun1-0/+2
Implement socket option SCTP_GET_ASSOC_NUMBER of the latest ietf socket extensions API draft. 8.2.5. Get the Current Number of Associations (SCTP_GET_ASSOC_NUMBER) This option gets the current number of associations that are attached to a one-to-many style socket. The option value is an uint32_t. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-25net: Use a percpu_counter for sockets_allocatedEric Dumazet1-0/+1
Instead of using one atomic_t per protocol, use a percpu_counter for "sockets_allocated", to reduce cache line contention on heavy duty network servers. Note : We revert commit (248969ae31e1b3276fc4399d67ce29a5d81e6fd9 net: af_unix can make unix_nr_socks visbile in /proc), since it is not anymore used after sock_prot_inuse_add() addition Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31misc: replace NIPQUAD()Harvey Harrison1-2/+2
Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u can be replaced with %pI4 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29net: replace %p6 with %pI6Harvey Harrison1-1/+1
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28net: replace uses of NIP6_FMT with %p6Harvey Harrison1-2/+2
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-23sctp: Fix to handle SHUTDOWN in SHUTDOWN_RECEIVED stateWei Yongjun1-0/+1
Once an endpoint has reached the SHUTDOWN-RECEIVED state, it MUST NOT send a SHUTDOWN in response to a ULP request. The Cumulative TSN Ack of the received SHUTDOWN chunk MUST be processed. This patch fix to process Cumulative TSN Ack of the received SHUTDOWN chunk in SHUTDOWN_RECEIVED state. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-16include: replace __FUNCTION__ with __func__Harvey Harrison1-1/+1
__FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-08sctp: shrink sctp_tsnmap some more by removing gabs arrayVlad Yasevich1-11/+3
The gabs array in the sctp_tsnmap structure is only used in one place, sctp_make_sack(). As such, carrying the array around in the sctp_tsnmap and thus directly in the sctp_association is rather pointless since most of the time it's just taking up space. Now, let sctp_make_sack create and populate it and then throw it away when it's done. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08sctp: Rework the tsn map to use generic bitmap.Vlad Yasevich3-32/+12
The tsn map currently use is 4K large and is stuck inside the sctp_association structure making memory references REALLY expensive. What we really need is at most 4K worth of bits so the biggest map we would have is 512 bytes. Also, the map is only really usefull when we have gaps to store and report. As such, starting with minimal map of say 32 TSNs (bits) should be enough for normal low-loss operations. We can grow the map by some multiple of 32 along with some extra room any time we receive the TSN which would put us outside of the map boundry. As we close gaps, we can shift the map to rebase it on the latest TSN we've seen. This saves 4088 bytes per association just in the map alone along savings from the now unnecessary structure members. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-01sctp: try harder to figure out address family when checking wildcardsVlad Yasevich1-1/+1
sctp_is_any() function that is used to check for wildcard addresses only looks at the address itself to determine the address family. This function is used in the API to check the address passed in from the user. If the user simply zerroes out the sockaddr_storage and pass that in, we'll end up failing. So, let's try harder to determine the address family by also checking the socket if it's possible. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>