summaryrefslogtreecommitdiffstats
path: root/net/rds/tcp.c
AgeCommit message (Collapse)AuthorFilesLines
2018-08-21rds: tcp: remove duplicated include from tcp.cYue Haibing1-1/+0
Remove duplicated include. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01rds: Remove IPv6 dependencyKa-Cheong Poon1-0/+25
This patch removes the IPv6 dependency from RDS. Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-24rds: remove trailing whitespace and blank linesStephen Hemminger1-1/+0
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23rds: Extend RDS API for IPv6 supportKa-Cheong Poon1-0/+44
There are many data structures (RDS socket options) used by RDS apps which use a 32 bit integer to store IP address. To support IPv6, struct in6_addr needs to be used. To ensure backward compatibility, a new data structure is introduced for each of those data structures which use a 32 bit integer to represent an IP address. And new socket options are introduced to use those new structures. This means that existing apps should work without a problem with the new RDS module. For apps which want to use IPv6, those new data structures and socket options can be used. IPv4 mapped address is used to represent IPv4 address in the new data structures. v4: Revert changes to SO_RDS_TRANSPORT Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23rds: Enable RDS IPv6 supportKa-Cheong Poon1-20/+34
This patch enables RDS to use IPv6 addresses. For RDS/TCP, the listener is now an IPv6 endpoint which accepts both IPv4 and IPv6 connection requests. RDS/RDMA/IB uses a private data (struct rds_ib_connect_private) exchange between endpoints at RDS connection establishment time to support RDMA. This private data exchange uses a 32 bit integer to represent an IP address. This needs to be changed in order to support IPv6. A new private data struct rds6_ib_connect_private is introduced to handle this. To ensure backward compatibility, an IPv6 capable RDS stack uses another RDMA listener port (RDS_CM_PORT) to accept IPv6 connection. And it continues to use the original RDS_PORT for IPv4 RDS connections. When it needs to communicate with an IPv6 peer, it uses the RDS_CM_PORT to send the connection set up request. v5: Fixed syntax problem (David Miller). v4: Changed port history comments in rds.h (Sowmini Varadhan). v3: Added support to set up IPv4 connection using mapped address (David Miller). Added support to set up connection between link local and non-link addresses. Various review comments from Santosh Shilimkar and Sowmini Varadhan. v2: Fixed bound and peer address scope mismatched issue. Added back rds_connect() IPv6 changes. Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23rds: Changing IP address internal representation to struct in6_addrKa-Cheong Poon1-3/+29
This patch changes the internal representation of an IP address to use struct in6_addr. IPv4 address is stored as an IPv4 mapped address. All the functions which take an IP address as argument are also changed to use struct in6_addr. But RDS socket layer is not modified such that it still does not accept IPv6 address from an application. And RDS layer does not accept nor initiate IPv6 connections. v2: Fixed sparse warnings. Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27net: Drop pernet_operations::asyncKirill Tkhai1-1/+0
Synchronous pernet_operations are not allowed anymore. All are asynchronous. So, drop the structure member. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22rds: tcp: remove register_netdevice_notifier infrastructure.Sowmini Varadhan1-70/+23
The netns deletion path does not need to wait for all net_devices to be unregistered before dismantling rds_tcp state for the netns (we are able to dismantle this state on module unload even when all net_devices are active so there is no dependency here). This patch removes code related to netdevice notifiers and refactors all the code needed to dismantle rds_tcp state into a ->exit callback for the pernet_operations used with register_pernet_device(). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-17rds: tcp: must use spin_lock_irq* and not spin_lock_bh with rds_tcp_conn_lockSowmini Varadhan1-8/+9
rds_tcp_connection allocation/free management has the potential to be called from __rds_conn_create after IRQs have been disabled, so spin_[un]lock_bh cannot be used with rds_tcp_conn_lock. Bottom-halves that need to synchronize for critical sections protected by rds_tcp_conn_lock should instead use rds_destroy_pending() correctly. Reported-by: syzbot+c68e51bb5e699d3f8d91@syzkaller.appspotmail.com Fixes: ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and rds connection/workq management") Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-13net: Convert rds_tcp_net_opsKirill Tkhai1-0/+1
These pernet_operations create and destroy sysctl table and listen socket. Also, exit method flushes global workqueue and work. Everything looks per-net safe, so we can mark them async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-12net: make getname() functions return length rather than use int* parameterDenys Vlasenko1-5/+2
Changes since v1: Added changes in these files: drivers/infiniband/hw/usnic/usnic_transport.c drivers/staging/lustre/lnet/lnet/lib-socket.c drivers/target/iscsi/iscsi_target_login.c drivers/vhost/net.c fs/dlm/lowcomms.c fs/ocfs2/cluster/tcp.c security/tomoyo/network.c Before: All these functions either return a negative error indicator, or store length of sockaddr into "int *socklen" parameter and return zero on success. "int *socklen" parameter is awkward. For example, if caller does not care, it still needs to provide on-stack storage for the value it does not need. None of the many FOO_getname() functions of various protocols ever used old value of *socklen. They always just overwrite it. This change drops this parameter, and makes all these functions, on success, return length of sockaddr. It's always >= 0 and can be differentiated from an error. Tests in callers are changed from "if (err)" to "if (err < 0)", where needed. rpc_sockname() lost "int buflen" parameter, since its only use was to be passed to kernel_getsockname() as &buflen and subsequently not used in any way. Userspace API is not changed. text data bss dec hex filename 30108430 2633624 873672 33615726 200ef6e vmlinux.before.o 30108109 2633612 873672 33615393 200ee21 vmlinux.o Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: David S. Miller <davem@davemloft.net> CC: linux-kernel@vger.kernel.org CC: netdev@vger.kernel.org CC: linux-bluetooth@vger.kernel.org CC: linux-decnet-user@lists.sourceforge.net CC: linux-wireless@vger.kernel.org CC: linux-rdma@vger.kernel.org CC: linux-sctp@vger.kernel.org CC: linux-nfs@vger.kernel.org CC: linux-x25@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-08rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and ↵Sowmini Varadhan1-12/+30
rds connection/workq management An rds_connection can get added during netns deletion between lines 528 and 529 of 506 static void rds_tcp_kill_sock(struct net *net) : /* code to pull out all the rds_connections that should be destroyed */ : 528 spin_unlock_irq(&rds_tcp_conn_lock); 529 list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node) 530 rds_conn_destroy(tc->t_cpath->cp_conn); Such an rds_connection would miss out the rds_conn_destroy() loop (that cancels all pending work) and (if it was scheduled after netns deletion) could trigger the use-after-free. A similar race-window exists for the module unload path in rds_tcp_exit -> rds_tcp_destroy_conns Concurrency with netns deletion (rds_tcp_kill_sock()) must be handled by checking check_net() before enqueuing new work or adding new connections. Concurrency with module-unload is handled by maintaining a module specific flag that is set at the start of the module exit function, and must be checked before enqueuing new work or adding new connections. This commit refactors existing RDS_DESTROY_PENDING checks added by commit 3db6e0d172c9 ("rds: use RCU to synchronize work-enqueue with connection teardown") and consolidates all the concurrency checks listed above into the function rds_destroy_pending(). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-2/+3
en_rx_am.c was deleted in 'net-next' but had a bug fixed in it in 'net'. The esp{4,6}_offload.c conflicts were overlapping changes. The 'out' label is removed so we just return ERR_PTR(-EINVAL) directly. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-22rds: tcp: compute m_ack_seq as offset from ->write_seqSowmini Varadhan1-2/+3
rds-tcp uses m_ack_seq to track the tcp ack# that indicates that the peer has received a rds_message. The m_ack_seq is used in rds_tcp_is_acked() to figure out when it is safe to drop the rds_message from the RDS retransmit queue. The m_ack_seq must be calculated as an offset from the right edge of the in-flight tcp buffer, i.e., it should be based on the ->write_seq, not the ->snd_nxt. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27rds: tcp: cleanup if kmem_cache_alloc fails in rds_tcp_conn_alloc()Sowmini Varadhan1-20/+26
If kmem_cache_alloc() fails in the middle of the for() loop, cleanup anything that might have been allocated so far. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27rds: tcp: initialize t_tcp_detached to falseSowmini Varadhan1-0/+1
Commit f10b4cff98c6 ("rds: tcp: atomically purge entries from rds_tcp_conn_list during netns delete") adds the field t_tcp_detached, but this needs to be initialized explicitly to false. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01rds: tcp: atomically purge entries from rds_tcp_conn_list during netns deleteSowmini Varadhan1-2/+7
The rds_tcp_kill_sock() function parses the rds_tcp_conn_list to find the rds_connection entries marked for deletion as part of the netns deletion under the protection of the rds_tcp_conn_lock. Since the rds_tcp_conn_list tracks rds_tcp_connections (which have a 1:1 mapping with rds_conn_path), multiple tc entries in the rds_tcp_conn_list will map to a single rds_connection, and will be deleted as part of the rds_conn_destroy() operation that is done outside the rds_tcp_conn_lock. The rds_tcp_conn_list traversal done under the protection of rds_tcp_conn_lock should not leave any doomed tc entries in the list after the rds_tcp_conn_lock is released, else another concurrently executiong netns delete (for a differnt netns) thread may trip on these entries. Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01rds: tcp: correctly sequence cleanup on netns deletion.Sowmini Varadhan1-2/+2
Commit 8edc3affc077 ("rds: tcp: Take explicit refcounts on struct net") introduces a regression in rds-tcp netns cleanup. The cleanup_net(), (and thus rds_tcp_dev_event notification) is only called from put_net() when all netns refcounts go to 0, but this cannot happen if the rds_connection itself is holding a c_net ref that it expects to release in rds_tcp_kill_sock. Instead, the rds_tcp_kill_sock callback should make sure to tear down state carefully, ensuring that the socket teardown is only done after all data-structures and workqs that depend on it are quiesced. The original motivation for commit 8edc3affc077 ("rds: tcp: Take explicit refcounts on struct net") was to resolve a race condition reported by syzkaller where workqs for tx/rx/connect were triggered after the namespace was deleted. Those worker threads should have been cancelled/flushed before socket tear-down and indeed, rds_conn_path_destroy() does try to sequence this by doing /* cancel cp_send_w */ /* cancel cp_recv_w */ /* flush cp_down_w */ /* free data structures */ Here the "flush cp_down_w" will trigger rds_conn_shutdown and thus invoke rds_tcp_conn_path_shutdown() to close the tcp socket, so that we ought to have satisfied the requirement that "socket-close is done after all other dependent state is quiesced". However, rds_conn_shutdown has a bug in that it *always* triggers the reconnect workq (and if connection is successful, we always restart tx/rx workqs so with the right timing, we risk the race conditions reported by syzkaller). Netns deletion is like module teardown- no need to restart a reconnect in this case. We can use the c_destroy_in_prog bit to avoid restarting the reconnect. Fixes: 8edc3affc077 ("rds: tcp: Take explicit refcounts on struct net") Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-01rds: tcp: remove redundant function rds_tcp_conn_paths_destroy()Sowmini Varadhan1-24/+1
A side-effect of Commit c14b0366813a ("rds: tcp: set linger to 1 when unloading a rds-tcp") is that we always send a RST on the tcp connection for rds_conn_destroy(), so rds_tcp_conn_paths_destroy() is not needed any more and is removed in this patch. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-16rds: cancel send/recv work before queuing connection shutdownSowmini Varadhan1-1/+1
We could end up executing rds_conn_shutdown before the rds_recv_worker thread, then rds_conn_shutdown -> rds_tcp_conn_shutdown can do a sock_release and set sock->sk to null, which may interleave in bad ways with rds_recv_worker, e.g., it could result in: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000078" [ffff881769f6fd70] release_sock at ffffffff815f337b [ffff881769f6fd90] rds_tcp_recv at ffffffffa043c888 [rds_tcp] [ffff881769f6fdb0] rds_recv_worker at ffffffffa04a4810 [rds] [ffff881769f6fde0] process_one_work at ffffffff810a14c1 [ffff881769f6fe40] worker_thread at ffffffff810a1940 [ffff881769f6fec0] kthread at ffffffff810a6b1e Also, do not enqueue any new shutdown workq items when the connection is shutting down (this may happen for rds-tcp in softirq mode, if a FIN or CLOSE is received while the modules is in the middle of an unload) Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26Merge branches 'uaccess.alpha', 'uaccess.arc', 'uaccess.arm', ↵Al Viro1-17/+21
'uaccess.arm64', 'uaccess.avr32', 'uaccess.bfin', 'uaccess.c6x', 'uaccess.cris', 'uaccess.frv', 'uaccess.h8300', 'uaccess.hexagon', 'uaccess.ia64', 'uaccess.m32r', 'uaccess.m68k', 'uaccess.metag', 'uaccess.microblaze', 'uaccess.mips', 'uaccess.mn10300', 'uaccess.nios2', 'uaccess.openrisc', 'uaccess.parisc', 'uaccess.powerpc', 'uaccess.s390', 'uaccess.score', 'uaccess.sh', 'uaccess.sparc', 'uaccess.tile', 'uaccess.um', 'uaccess.unicore32', 'uaccess.x86' and 'uaccess.xtensa' into work.uaccess
2017-04-06don't open-code kernel_setsockopt()Al Viro1-4/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-03-07rds: tcp: Sequence teardown of listen and acceptor sockets to avoid racesSowmini Varadhan1-5/+10
Commit a93d01f5777e ("RDS: TCP: avoid bad page reference in rds_tcp_listen_data_ready") added the function rds_tcp_listen_sock_def_readable() to handle the case when a partially set-up acceptor socket drops into rds_tcp_listen_data_ready(). However, if the listen socket (rtn->rds_tcp_listen_sock) is itself going through a tear-down via rds_tcp_listen_stop(), the (*ready)() will be null and we would hit a panic of the form BUG: unable to handle kernel NULL pointer dereference at (null) IP: (null) : ? rds_tcp_listen_data_ready+0x59/0xb0 [rds_tcp] tcp_data_queue+0x39d/0x5b0 tcp_rcv_established+0x2e5/0x660 tcp_v4_do_rcv+0x122/0x220 tcp_v4_rcv+0x8b7/0x980 : In the above case, it is not fatal to encounter a NULL value for ready- we should just drop the packet and let the flush of the acceptor thread finish gracefully. In general, the tear-down sequence for listen() and accept() socket that is ensured by this commit is: rtn->rds_tcp_listen_sock = NULL; /* prevent any new accepts */ In rds_tcp_listen_stop(): serialize with, and prevent, further callbacks using lock_sock() flush rds_wq flush acceptor workq sock_release(listen socket) Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07rds: tcp: Reorder initialization sequence in rds_tcp_init to avoid racesSowmini Varadhan1-10/+9
Order of initialization in rds_tcp_init needs to be done so that resources are set up and destroyed in the correct synchronization sequence with both the data path, as well as netns create/destroy path. Specifically, - we must call register_pernet_subsys and get the rds_tcp_netid before calling register_netdevice_notifier, otherwise we risk the sequence 1. register_netdevice_notifier sets up netdev notifier callback 2. rds_tcp_dev_event -> rds_tcp_kill_sock uses netid 0, and finds the wrong rtn, resulting in a panic with string that is of the form: BUG: unable to handle kernel NULL pointer dereference at 000000000000000d IP: rds_tcp_kill_sock+0x3a/0x1d0 [rds_tcp] : - the rds_tcp_incoming_slab kmem_cache must be initialized before the datapath starts up. The latter can happen any time after the pernet_subsys registration of rds_tcp_net_ops, whose -> init function sets up the listen socket. If the rds_tcp_incoming_slab has not been set up at that time, a panic of the form below may be encountered BUG: unable to handle kernel NULL pointer dereference at 0000000000000014 IP: kmem_cache_alloc+0x90/0x1c0 : rds_tcp_data_recv+0x1e7/0x370 [rds_tcp] tcp_read_sock+0x96/0x1c0 rds_tcp_recv_path+0x65/0x80 [rds_tcp] : Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07rds: tcp: Take explicit refcounts on struct netSowmini Varadhan1-2/+2
It is incorrect for the rds_connection to piggyback on the sock_net() refcount for the netns because this gives rise to a chicken-and-egg problem during rds_conn_destroy. Instead explicitly take a ref on the net, and hold the netns down till the connection tear-down is complete. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-03rds: remove unnecessary returned value checkZhu Yanjun1-5/+1
The function rds_trans_register always returns 0. As such, it is not necessary to check the returned value. Cc: Joe Jin <joe.jin@oracle.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-24rds: fix memory leak errorZhu Yanjun1-3/+4
When the function register_netdevice_notifier fails, the memory allocated by kmem_cache_create should be freed by the function kmem_cache_destroy. Cc: Joe Jin <joe.jin@oracle.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-0/+2
Couple conflicts resolved here: 1) In the MACB driver, a bug fix to properly initialize the RX tail pointer properly overlapped with some changes to support variable sized rings. 2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix overlapping with a reorganization of the driver to support ACPI, OF, as well as PCI variants of the chip. 3) In 'net' we had several probe error path bug fixes to the stmmac driver, meanwhile a lot of this code was cleaned up and reorganized in 'net-next'. 4) The cls_flower classifier obtained a helper function in 'net-next' called __fl_delete() and this overlapped with Daniel Borkamann's bug fix to use RCU for object destruction in 'net'. It also overlapped with Jiri's change to guard the rhashtable_remove_fast() call with a check against tc_skip_sw(). 5) In mlx4, a revert bug fix in 'net' overlapped with some unrelated changes in 'net-next'. 6) In geneve, a stale header pointer after pskb_expand_head() bug fix in 'net' overlapped with a large reorganization of the same code in 'net-next'. Since the 'net-next' code no longer had the bug in question, there was nothing to do other than to simply take the 'net-next' hunks. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02RDS: TCP: unregister_netdevice_notifier() in error path of rds_tcp_init_netSowmini Varadhan1-0/+2
If some error is encountered in rds_tcp_init_net, make sure to unregister_netdevice_notifier(), else we could trigger a panic later on, when the modprobe from a netns fails. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-18netns: make struct pernet_operations::id unsigned intAlexey Dobriyan1-1/+1
Make struct pernet_operations::id unsigned. There are 2 reasons to do so: 1) This field is really an index into an zero based array and thus is unsigned entity. Using negative value is out-of-bound access by definition. 2) On x86_64 unsigned 32-bit data which are mixed with pointers via array indexing or offsets added or subtracted to pointers are preffered to signed 32-bit data. "int" being used as an array index needs to be sign-extended to 64-bit before being used. void f(long *p, int i) { g(p[i]); } roughly translates to movsx rsi, esi mov rdi, [rsi+...] call g MOVSX is 3 byte instruction which isn't necessary if the variable is unsigned because x86_64 is zero extending by default. Now, there is net_generic() function which, you guessed it right, uses "int" as an array index: static inline void *net_generic(const struct net *net, int id) { ... ptr = ng->ptr[id - 1]; ... } And this function is used a lot, so those sign extensions add up. Patch snipes ~1730 bytes on allyesconfig kernel (without all junk messing with code generation): add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) Unfortunately some functions actually grow bigger. This is a semmingly random artefact of code generation with register allocator being used differently. gcc decides that some variable needs to live in new r8+ registers and every access now requires REX prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be used which is longer than [r8] However, overall balance is in negative direction: add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) function old new delta nfsd4_lock 3886 3959 +73 tipc_link_build_proto_msg 1096 1140 +44 mac80211_hwsim_new_radio 2776 2808 +32 tipc_mon_rcv 1032 1058 +26 svcauth_gss_legacy_init 1413 1429 +16 tipc_bcbase_select_primary 379 392 +13 nfsd4_exchange_id 1247 1260 +13 nfsd4_setclientid_confirm 782 793 +11 ... put_client_renew_locked 494 480 -14 ip_set_sockfn_get 730 716 -14 geneve_sock_add 829 813 -16 nfsd4_sequence_done 721 703 -18 nlmclnt_lookup_host 708 686 -22 nfsd4_lockt 1085 1063 -22 nfs_get_client 1077 1050 -27 tcf_bpf_init 1106 1076 -30 nfsd4_encode_fattr 5997 5930 -67 Total: Before=154856051, After=154854321, chg -0.00% Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09RDS: TCP: report addr/port info based on TCP socket in rds-infoSowmini Varadhan1-7/+13
The socket argument passed to rds_tcp_tc_info() is a PF_RDS socket, so it is incorrect to report the address port info based on rds_getname() as part of TCP state report. Invoke inet_getname() for the t_sock associated with the rds_tcp_connection instead. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15RDS: TCP: Enable multipath RDS for TCPSowmini Varadhan1-1/+1
Use RDS probe-ping to compute how many paths may be used with the peer, and to synchronously start the multiple paths. If mprds is supported, hash outgoing traffic to one of multiple paths in rds_sendmsg() when multipath RDS is supported by the transport. CC: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15RDS: TCP: Reduce code duplication in rds_tcp_reset_callbacks()Sowmini Varadhan1-18/+4
Some code duplication in rds_tcp_reset_callbacks() can be avoided by having the function call rds_tcp_restore_callbacks() and rds_tcp_set_callbacks(). Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15RDS: TCP: avoid bad page reference in rds_tcp_listen_data_readySowmini Varadhan1-0/+7
As the existing comments in rds_tcp_listen_data_ready() indicate, it is possible under some race-windows to get to this function with the accept() socket. If that happens, we could run into a sequence whereby thread 1 thread 2 rds_tcp_accept_one() thread sets up new_sock via ->accept(). The sk_user_data is now sock_def_readable data comes in for new_sock, ->sk_data_ready is called, and we land in rds_tcp_listen_data_ready rds_tcp_set_callbacks() takes the sk_callback_lock and sets up sk_user_data to be the cp read_lock sk_callback_lock ready = cp unlock sk_callback_lock page fault on ready In the above sequence, we end up with a panic on a bad page reference when trying to execute (*ready)(). Instead we need to call sock_def_readable() safely, which is what this patch achieves. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-2/+3
Conflicts: drivers/net/ethernet/mellanox/mlx5/core/en.h drivers/net/ethernet/mellanox/mlx5/core/en_main.c drivers/net/usb/r8152.c All three conflicts were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04RDS: fix rds_tcp_init() error pathVegard Nossum1-2/+3
If register_pernet_subsys() fails, we shouldn't try to call unregister_pernet_subsys(). Fixes: 467fa15356 ("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.") Cc: stable@vger.kernel.org Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: TCP: Hooks to set up a single connection pathSowmini Varadhan1-1/+1
This patch adds ->conn_path_connect callbacks in the rds_transport that are used to set up a single connection path. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: TCP: make receive path use the rds_conn_pathSowmini Varadhan1-1/+1
The ->sk_user_data contains a pointer to the rds_conn_path for the socket. Use this consistently in the rds_tcp_data_ready callbacks to get the rds_conn_path for rds_recv_incoming. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: TCP: make ->sk_user_data point to a rds_conn_pathSowmini Varadhan1-12/+13
The socket callbacks should all operate on a struct rds_conn_path, in preparation for a MP capable RDS-TCP. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: TCP: Refactor connection destruction to handle multiple pathsSowmini Varadhan1-7/+39
A single rds_connection may have multiple rds_conn_paths that have to be carefully and correctly destroyed, for both rmmod and netns-delete cases. For both cases, we extract a single rds_tcp_connection for each conn into a temporary list, and then invoke rds_conn_destroy() which iteratively dismantles every path in the rds_connection. For the netns deletion case, we additionally have to make sure that we do not leave a socket in TIME_WAIT state, as this will hold up the netns deletion. Thus we call rds_tcp_conn_paths_destroy() to reset state quickly. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: TCP: Make rds_tcp_connection track the rds_conn_pathSowmini Varadhan1-19/+25
The struct rds_tcp_connection is the transport-specific private data structure that tracks TCP information per rds_conn_path. Modify this structure to have a back-pointer to the rds_conn_path for which it is the ->cp_transport_data. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: TCP: Remove dead logic around c_passive in rds-tcpSowmini Varadhan1-6/+1
The c_passive bit is only intended for the IB transport and will never be encountered in rds-tcp, so remove the dead logic that predicates on this bit. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: Rework path specific indirectionsSowmini Varadhan1-3/+3
Refactor code to avoid separate indirections for single-path and multipath transports. All transports (both single and mp-capable) will get a pointer to the rds_conn_path, and can trivially derive the rds_connection from the ->cp_conn. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-17RDS: TCP: Fix non static symbol warningsWei Yongjun1-2/+2
Fixes the following sparse warnings: net/rds/tcp.c:59:5: warning: symbol 'rds_tcp_min_sndbuf' was not declared. Should it be static? net/rds/tcp.c:60:5: warning: symbol 'rds_tcp_min_rcvbuf' was not declared. Should it be static? Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-14RDS: Update rds_conn_shutdown to work with rds_conn_pathSowmini Varadhan1-1/+1
This commit changes rds_conn_shutdown to take a rds_conn_path * argument, allowing it to shutdown paths other than c_path[0] for MP-capable transports. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-14RDS: split out connection specific state from rds_connection to rds_conn_pathSowmini Varadhan1-0/+1
In preparation for multipath RDS, split the rds_connection structure into a base structure, and a per-path struct rds_conn_path. The base structure tracks information and locks common to all paths. The workqs for send/recv/shutdown etc are tracked per rds_conn_path. Thus the workq callbacks now work with rds_conn_path. This commit allows for one rds_conn_path per rds_connection, and will be extended into multiple conn_paths in subsequent commits. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07RDS: TCP: fix race windows in send-path quiescence by rds_tcp_accept_one()Sowmini Varadhan1-1/+13
The send path needs to be quiesced before resetting callbacks from rds_tcp_accept_one(), and commit eb192840266f ("RDS:TCP: Synchronize rds_tcp_accept_one with rds_send_xmit when resetting t_sock") achieves this using the c_state and RDS_IN_XMIT bit following the pattern used by rds_conn_shutdown(). However this leaves the possibility of a race window as shown in the sequence below take t_conn_lock in rds_tcp_conn_connect send outgoing syn to peer drop t_conn_lock in rds_tcp_conn_connect incoming from peer triggers rds_tcp_accept_one, conn is marked CONNECTING wait for RDS_IN_XMIT to quiesce any rds_send_xmit threads call rds_tcp_reset_callbacks [.. race-window where incoming syn-ack can cause the conn to be marked UP from rds_tcp_state_change ..] lock_sock called from rds_tcp_reset_callbacks, and we set t_sock to null As soon as the conn is marked UP in the race-window above, rds_send_xmit() threads will proceed to rds_tcp_xmit and may encounter a null-pointer deref on the t_sock. Given that rds_tcp_state_change() is invoked in softirq context, whereas rds_tcp_reset_callbacks() is in workq context, and testing for RDS_IN_XMIT after lock_sock could result in a deadlock with tcp_sendmsg, this commit fixes the race by using a new c_state, RDS_TCP_RESETTING, which will prevent a transition to RDS_CONN_UP from rds_tcp_state_change(). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07RDS: TCP: Retransmit half-sent datagrams when switching sockets in ↵Sowmini Varadhan1-0/+1
rds_tcp_reset_callbacks When we switch a connection's sockets in rds_tcp_rest_callbacks, any partially sent datagram must be retransmitted on the new socket so that the receiver can correctly reassmble the RDS datagram. Use rds_send_reset() which is designed for this purpose. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07RDS: TCP: Add/use rds_tcp_reset_callbacks to reset tcp socket safelySowmini Varadhan1-3/+62
When rds_tcp_accept_one() has to replace the existing tcp socket with a newer tcp socket (duelling-syn resolution), it must lock_sock() to suppress the rds_tcp_data_recv() path while callbacks are being changed. Also, existing RDS datagram reassembly state must be reset, so that the next datagram on the new socket does not have corrupted state. Similarly when resetting the newly accepted socket, appropriate locks and synchronization is needed. This commit ensures correct synchronization by invoking kernel_sock_shutdown to reset a newly accepted sock, and by taking appropriate lock_sock()s (for old and new sockets) when resetting existing callbacks. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-03RDS: TCP: Synchronize accept() and connect() paths on t_conn_lock.Sowmini Varadhan1-0/+1
An arbitration scheme for duelling SYNs is implemented as part of commit 241b271952eb ("RDS-TCP: Reset tcp callbacks if re-using an outgoing socket in rds_tcp_accept_one()") which ensures that both nodes involved will arrive at the same arbitration decision. However, this needs to be synchronized with an outgoing SYN to be generated by rds_tcp_conn_connect(). This commit achieves the synchronization through the t_conn_lock mutex in struct rds_tcp_connection. The rds_conn_state is checked in rds_tcp_conn_connect() after acquiring the t_conn_lock mutex. A SYN is sent out only if the RDS connection is not already UP (an UP would indicate that rds_tcp_accept_one() has completed 3WH, so no SYN needs to be generated). Similarly, the rds_conn_state is checked in rds_tcp_accept_one() after acquiring the t_conn_lock mutex. The only acceptable states (to allow continuation of the arbitration logic) are UP (i.e., outgoing SYN was SYN-ACKed by peer after it sent us the SYN) or CONNECTING (we sent outgoing SYN before we saw incoming SYN). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>