summaryrefslogtreecommitdiffstats
path: root/fs/dlm
AgeCommit message (Collapse)AuthorFilesLines
2014-04-11net: Fix use after free by removing length arg from sk_data_ready callbacks.David S. Miller1-1/+1
Several spots in the kernel perform a sequence like: skb_queue_tail(&sk->s_receive_queue, skb); sk->sk_data_ready(sk, skb->len); But at the moment we place the SKB onto the socket receive queue it can be consumed and freed up. So this skb->len access is potentially to freed up memory. Furthermore, the skb->len can be modified by the consumer so it is possible that the value isn't accurate. And finally, no actual implementation of this callback actually uses the length argument. And since nobody actually cared about it's value, lots of call sites pass arbitrary values in such as '0' and even '1'. So just remove the length argument from the callback, that way there is no confusion whatsoever and all of these use-after-free cases get fixed as a side effect. Based upon a patch by Eric Dumazet and his suggestion to audit this issue tree-wide. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-14dlm: use INFO for recovery messagesDavid Teigland8-47/+46
The log messages relating to the progress of recovery are minimal and very often useful. Change these to the KERN_INFO level so they are always available. Signed-off-by: David Teigland <teigland@redhat.com>
2014-02-12fs: Include appropriate header file in dlm/ast.cRashika Kheria1-0/+1
Include appropriate header file fs/dlm/ast.h in fs/dlm/ast.c because it contains function prototypes of some functions defined in fs/dlm/ast.c. This also eliminates the following warning in fs/dlm/ast: fs/dlm/ast.c:52:5: warning: no previous prototype for ‘dlm_add_lkb_callback’ [-Wmissing-prototypes] fs/dlm/ast.c:113:5: warning: no previous prototype for ‘dlm_rem_lkb_callback’ [-Wmissing-prototypes] fs/dlm/ast.c:174:6: warning: no previous prototype for ‘dlm_add_cb’ [-Wmissing-prototypes] fs/dlm/ast.c:212:6: warning: no previous prototype for ‘dlm_callback_work’ [-Wmissing-prototypes] fs/dlm/ast.c:267:5: warning: no previous prototype for ‘dlm_callback_start’ [-Wmissing-prototypes] fs/dlm/ast.c:278:6: warning: no previous prototype for ‘dlm_callback_stop’ [-Wmissing-prototypes] fs/dlm/ast.c:284:6: warning: no previous prototype for ‘dlm_callback_suspend’ [-Wmissing-prototypes] fs/dlm/ast.c:292:6: warning: no previous prototype for ‘dlm_callback_resume’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: David Teigland <teigland@redhat.com>
2014-02-12dlm: silence a harmless use after free warningDan Carpenter1-0/+1
We pass the freed "r" pointer back to the caller. It's harmless but it upsets the static checkers. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Teigland <teigland@redhat.com>
2014-01-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1-2/+2
Pull networking updates from David Miller: 1) BPF debugger and asm tool by Daniel Borkmann. 2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann. 3) Correct reciprocal_divide and update users, from Hannes Frederic Sowa and Daniel Borkmann. 4) Currently we only have a "set" operation for the hw timestamp socket ioctl, add a "get" operation to match. From Ben Hutchings. 5) Add better trace events for debugging driver datapath problems, also from Ben Hutchings. 6) Implement auto corking in TCP, from Eric Dumazet. Basically, if we have a small send and a previous packet is already in the qdisc or device queue, defer until TX completion or we get more data. 7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko. 8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel Borkmann. 9) Share IP header compression code between Bluetooth and IEEE802154 layers, from Jukka Rissanen. 10) Fix ipv6 router reachability probing, from Jiri Benc. 11) Allow packets to be captured on macvtap devices, from Vlad Yasevich. 12) Support tunneling in GRO layer, from Jerry Chu. 13) Allow bonding to be configured fully using netlink, from Scott Feldman. 14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can already get the TCI. From Atzm Watanabe. 15) New "Heavy Hitter" qdisc, from Terry Lam. 16) Significantly improve the IPSEC support in pktgen, from Fan Du. 17) Allow ipv4 tunnels to cache routes, just like sockets. From Tom Herbert. 18) Add Proportional Integral Enhanced packet scheduler, from Vijay Subramanian. 19) Allow openvswitch to mmap'd netlink, from Thomas Graf. 20) Key TCP metrics blobs also by source address, not just destination address. From Christoph Paasch. 21) Support 10G in generic phylib. From Andy Fleming. 22) Try to short-circuit GRO flow compares using device provided RX hash, if provided. From Tom Herbert. The wireless and netfilter folks have been busy little bees too. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits) net/cxgb4: Fix referencing freed adapter ipv6: reallocate addrconf router for ipv6 address when lo device up fib_frontend: fix possible NULL pointer dereference rtnetlink: remove IFLA_BOND_SLAVE definition rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info qlcnic: update version to 5.3.55 qlcnic: Enhance logic to calculate msix vectors. qlcnic: Refactor interrupt coalescing code for all adapters. qlcnic: Update poll controller code path qlcnic: Interrupt code cleanup qlcnic: Enhance Tx timeout debugging. qlcnic: Use bool for rx_mac_learn. bonding: fix u64 division rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC sfc: Use the correct maximum TX DMA ring size for SFC9100 Add Shradha Shah as the sfc driver maintainer. net/vxlan: Share RX skb de-marking and checksum checks with ovs tulip: cleanup by using ARRAY_SIZE() ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called net/cxgb4: Don't retrieve stats during recovery ...
2014-01-21sctp: remove macros sctp_{lock|release}_sockwangweidong1-2/+2
Redefined {lock|release}_sock to sctp_{lock|release}_sock for user space friendly code which we haven't use in years, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-16dlm: set zero linger time on sctp socketDongmao Zhang1-0/+8
The recovery time for a failed node was taking a long time because the failed node could not perform the full shutdown process. Removing the linger time speeds this up. The dlm does not care what happens to messages to or from the failed node. Signed-off-by: Dongmao Zhang <dmzhang@suse.com> Signed-off-by: David Teigland <teigland@redhat.com>
2013-11-19genetlink: only pass array to genl_register_family_with_ops()Johannes Berg1-4/+6
As suggested by David Miller, make genl_register_family_with_ops() a macro and pass only the array, evaluating ARRAY_SIZE() in the macro, this is a little safer. The openvswitch has some indirection, assing ops/n_ops directly in that code. This might ultimately just assign the pointers in the family initializations, saving the struct genl_family_and_ops and code (once mcast groups are handled differently.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-16dlm: Avoid that dlm_release_lockspace() incorrectly returns -EBUSYBart Van Assche1-3/+1
When dlm_release_lockspace(ls, 1) is invoked on a busy system immediately after the last dlm_unlock() AST has finished it can occur that lkb_idr_is_local() is invoked for the unlocked LKB since removal from ls_lkbidr only occurs after the AST has returned. If that happens dlm_release_lockspace(ls, 1) will return -EBUSY instead of releasing the lockspace. Fix this race condition by changing lkb_idr_is_local() such that it only returns true for LKB's that have not yet been unlocked. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: David Teigland <teigland@redhat.com>
2013-08-12dlm: remove signal blockingDavid Teigland1-19/+6
The signal blocking was incorrect and unnecessary so just remove it. Signed-off-by: David Teigland <teigland@redhat.com>
2013-07-30dlm: WQ_NON_REENTRANT is meaningless and going awayTejun Heo1-4/+1
dbf2576e37 ("workqueue: make all workqueues non-reentrant") made WQ_NON_REENTRANT no-op and the flag is going away. Remove its usages. This patch doesn't introduce any behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: David Teigland <teigland@redhat.com>
2013-06-26dlm: Avoid LVB truncationBart Van Assche1-4/+4
For lockspaces with an LVB length above 64 bytes, avoid truncating the LVB while exchanging it with another node in the cluster. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: David Teigland <teigland@redhat.com>
2013-06-25dlm: log an error for unmanaged lockspacesDavid Teigland1-1/+8
Log an error message if the dlm user daemon exits before all the lockspaces have been removed. Signed-off-by: David Teigland <teigland@redhat.com>
2013-06-25dlm: config: using strlcpy instead of strncpyZhao Hongjiang1-2/+3
for NUL terminated string, need alway set '\0' in the end. Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Signed-off-by: David Teigland <teigland@redhat.com>
2013-06-19dlm: remove duplicated include from lowcomms.cWei Yongjun1-1/+0
Remove duplicated include. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David Teigland <teigland@redhat.com>
2013-06-14dlm: disable nagle for SCTPMike Christie1-0/+6
For TCP we disable Nagle and I cannot think of why it would be needed for SCTP. When disabled it seems to improve dlm_lock operations like it does for TCP. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: David Teigland <teigland@redhat.com>
2013-06-14dlm: retry failed SCTP sendsMike Christie1-29/+75
Currently if a SCTP send fails, we lose the data we were trying to send because the writequeue_entry is released when we do the send. When this happens other nodes will then hang waiting for a reply. This adds support for SCTP to retry the send operation. I also removed the retry limit for SCTP use, because we want to make sure we try every path during init time and for longer failures we want to continually retry in case paths come back up while trying other paths. We will do this until userspace tells us to stop. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: David Teigland <teigland@redhat.com>
2013-06-14dlm: try other IPs when sctp init assoc failsMike Christie1-11/+50
Currently, if we cannot create a association to the first IP addr that is added to DLM, the SCTP init assoc code will just retry the same IP. This patch adds a simple failover schemes where we will try one of the addresses that was passed into DLM. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: David Teigland <teigland@redhat.com>
2013-06-14dlm: clear correct bit during sctp init failure handlingMike Christie1-1/+1
We should be testing and cleaing the init pending bit because later when sctp_init_assoc is recalled it will be checking that it is not set and set the bit. We do not want to touch CF_CONNECT_PENDING here because we will queue swork and process_send_sockets will then call the connect_action function. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: David Teigland <teigland@redhat.com>
2013-06-14dlm: set sctp assoc id during setupMike Christie1-0/+1
sctp_assoc was not getting set so later lookups failed. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: David Teigland <teigland@redhat.com>
2013-06-14dlm: clear correct init bit during sctp setupMike Christie1-1/+1
We were clearing the base con's init pending flags, but the con for the node was the one with the pending bit set. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: David Teigland <teigland@redhat.com>
2013-05-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1-1/+1
Pull networking updates from David Miller: "Highlights (1721 non-merge commits, this has to be a record of some sort): 1) Add 'random' mode to team driver, from Jiri Pirko and Eric Dumazet. 2) Make it so that any driver that supports configuration of multiple MAC addresses can provide the forwarding database add and del calls by providing a default implementation and hooking that up if the driver doesn't have an explicit set of handlers. From Vlad Yasevich. 3) Support GSO segmentation over tunnels and other encapsulating devices such as VXLAN, from Pravin B Shelar. 4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton. 5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita Dukkipati. 6) In the PHY layer, allow supporting wake-on-lan in situations where the PHY registers have to be written for it to be configured. Use it to support wake-on-lan in mv643xx_eth. From Michael Stapelberg. 7) Significantly improve firewire IPV6 support, from YOSHIFUJI Hideaki. 8) Allow multiple packets to be sent in a single transmission using network coding in batman-adv, from Martin Hundebøll. 9) Add support for T5 cxgb4 chips, from Santosh Rastapur. 10) Generalize the VXLAN forwarding tables so that there is more flexibility in configurating various aspects of the endpoints. From David Stevens. 11) Support RSS and TSO in hardware over GRE tunnels in bxn2x driver, from Dmitry Kravkov. 12) Zero copy support in nfnelink_queue, from Eric Dumazet and Pablo Neira Ayuso. 13) Start adding networking selftests. 14) In situations of overload on the same AF_PACKET fanout socket, or per-cpu packet receive queue, minimize drop by distributing the load to other cpus/fanouts. From Willem de Bruijn and Eric Dumazet. 15) Add support for new payload offset BPF instruction, from Daniel Borkmann. 16) Convert several drivers over to mdoule_platform_driver(), from Sachin Kamat. 17) Provide a minimal BPF JIT image disassembler userspace tool, from Daniel Borkmann. 18) Rewrite F-RTO implementation in TCP to match the final specification of it in RFC4138 and RFC5682. From Yuchung Cheng. 19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear you like netlink, so I implemented netlink dumping of netlink sockets.") From Andrey Vagin. 20) Remove ugly passing of rtnetlink attributes into rtnl_doit functions, from Thomas Graf. 21) Allow userspace to be able to see if a configuration change occurs in the middle of an address or device list dump, from Nicolas Dichtel. 22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes Frederic Sowa. 23) Increase accuracy of packet length used by packet scheduler, from Jason Wang. 24) Beginning set of changes to make ipv4/ipv6 fragment handling more scalable and less susceptible to overload and locking contention, from Jesper Dangaard Brouer. 25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_*() instead. From Hong Zhiguo. 26) Optimize route usage in IPVS by avoiding reference counting where possible, from Julian Anastasov. 27) Convert IPVS schedulers to RCU, also from Julian Anastasov. 28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger Eitzenberger. 29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG, nfnetlink_log, and nfnetlink_queue. From Gao feng. 30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa. 31) Support several new r8169 chips, from Hayes Wang. 32) Support tokenized interface identifiers in ipv6, from Daniel Borkmann. 33) Use usbnet_link_change() helper in USB net driver, from Ming Lei. 34) Add 802.1ad vlan offload support, from Patrick McHardy. 35) Support mmap() based netlink communication, also from Patrick McHardy. 36) Support HW timestamping in mlx4 driver, from Amir Vadai. 37) Rationalize AF_PACKET packet timestamping when transmitting, from Willem de Bruijn and Daniel Borkmann. 38) Bring parity to what's provided by /proc/net/packet socket dumping and the info provided by netlink socket dumping of AF_PACKET sockets. From Nicolas Dichtel. 39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin Poirier" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits) filter: fix va_list build error af_unix: fix a fatal race with bit fields bnx2x: Prevent memory leak when cnic is absent bnx2x: correct reading of speed capabilities net: sctp: attribute printl with __printf for gcc fmt checks netlink: kconfig: move mmap i/o into netlink kconfig netpoll: convert mutex into a semaphore netlink: Fix skb ref counting. net_sched: act_ipt forward compat with xtables mlx4_en: fix a build error on 32bit arches Revert "bnx2x: allow nvram test to run when device is down" bridge: avoid OOPS if root port not found drivers: net: cpsw: fix kernel warn on cpsw irq enable sh_eth: use random MAC address if no valid one supplied 3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA) tg3: fix to append hardware time stamping flags unix/stream: fix peeking with an offset larger than data in queue unix/dgram: fix peeking with an offset larger than data in queue unix/dgram: peek beyond 0-sized skbs openvswitch: Remove unneeded ovs_netdev_get_ifindex() ...
2013-04-09net: sctp: introduce uapi header for sctpDaniel Borkmann1-1/+1
This patch introduces an UAPI header for the SCTP protocol, so that we can facilitate the maintenance and development of user land applications or libraries, in particular in terms of header synchronization. To not break compatibility, some fragments from lksctp-tools' netinet/sctp.h have been carefully included, while taking care that neither kernel nor user land breaks, so both compile fine with this change (for lksctp-tools I tested with the old netinet/sctp.h header and with a newly adapted one that includes the uapi sctp header). lksctp-tools smoke test run through successfully as well in both cases. Suggested-by: Neil Horman <nhorman@tuxdriver.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-08dlm: avoid unnecessary posix unlockDavid Teigland1-3/+15
When the kernel clears flocks/plocks during close, it calls posix unlock when there are flocks but no posix locks. Without this patch, that unnecessary posix unlock is passed to userland (dlm_controld), across the cluster, and back to the kernel. This can create a lot of plock activity, even when no posix locks had been used. This patch copies the nfs approach, and skips the full posix unlock if there is no plock found during the vfs unlock phase. Signed-off-by: David Teigland <teigland@redhat.com>
2013-02-27hlist: drop the node parameter from iteratorsSasha Levin1-7/+4
I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S | hlist_for_each_entry_continue(a, - b, c) S | hlist_for_each_entry_from(a, - b, c) S | hlist_for_each_entry_rcu(a, - b, c, d) S | hlist_for_each_entry_rcu_bh(a, - b, c, d) S | hlist_for_each_entry_continue_rcu_bh(a, - b, c) S | for_each_busy_worker(a, c, - b, d) S | ax25_uid_for_each(a, - b, c) S | ax25_for_each(a, - b, c) S | inet_bind_bucket_for_each(a, - b, c) S | sctp_for_each_hentry(a, - b, c) S | sk_for_each(a, - b, c) S | sk_for_each_rcu(a, - b, c) S | sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S | sk_for_each_safe(a, - b, c, d) S | sk_for_each_bound(a, - b, c) S | hlist_for_each_entry_safe(a, - b, c, d, e) S | hlist_for_each_entry_continue_rcu(a, - b, c) S | nr_neigh_for_each(a, - b, c) S | nr_neigh_for_each_safe(a, - b, c, d) S | nr_node_for_each(a, - b, c) S | nr_node_for_each_safe(a, - b, c, d) S | - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S | - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S | for_each_host(a, - b, c) S | for_each_host_safe(a, - b, c, d) S | for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27dlm: convert to idr_alloc()Tejun Heo2-26/+19
Convert to the much saner new idr interface. Error return values from recover_idr_add() mix -1 and -errno. The conversion doesn't change that but it looks iffy. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27dlm: don't use idr_remove_all()Tejun Heo2-2/+1
idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. The conversion isn't completely trivial for recover_idr_clear() as it's the only place in kernel which makes legitimate use of idr_remove_all() w/o idr_destroy(). Replace it with idr_remove() call inside idr_for_each_entry() loop. It goes on top so that it matches the operation order in recover_idr_del(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Christine Caulfield <ccaulfie@redhat.com> Cc: David Teigland <teigland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27dlm: use idr_for_each_entry() in recover_idr_clear() error pathTejun Heo1-13/+10
Convert recover_idr_clear() to use idr_for_each_entry() instead of idr_for_each(). It's somewhat less efficient this way but it shouldn't matter in an error path. This is to help with deprecation of idr_remove_all(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Christine Caulfield <ccaulfie@redhat.com> Cc: David Teigland <teigland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-26Merge branch 'for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile (part one) from Al Viro: "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent locking violations, etc. The most visible changes here are death of FS_REVAL_DOT (replaced with "has ->d_weak_revalidate()") and a new helper getting from struct file to inode. Some bits of preparation to xattr method interface changes. Misc patches by various people sent this cycle *and* ocfs2 fixes from several cycles ago that should've been upstream right then. PS: the next vfs pile will be xattr stuff." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits) saner proc_get_inode() calling conventions proc: avoid extra pde_put() in proc_fill_super() fs: change return values from -EACCES to -EPERM fs/exec.c: make bprm_mm_init() static ocfs2/dlm: use GFP_ATOMIC inside a spin_lock ocfs2: fix possible use-after-free with AIO ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero target: writev() on single-element vector is pointless export kernel_write(), convert open-coded instances fs: encode_fh: return FILEID_INVALID if invalid fid_type kill f_vfsmnt vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op nfsd: handle vfs_getattr errors in acl protocol switch vfs_getattr() to struct path default SET_PERSONALITY() in linux/elf.h ceph: prepopulate inodes only when request is aborted d_hash_and_lookup(): export, switch open-coded instances 9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate() 9p: split dropping the acls from v9fs_set_create_acl() ...
2013-02-26fs: change return values from -EACCES to -EPERMZhao Hongjiang1-1/+1
According to SUSv3: [EACCES] Permission denied. An attempt was made to access a file in a way forbidden by its file access permissions. [EPERM] Operation not permitted. An attempt was made to perform an operation limited to processes with appropriate privileges or to the owner of a file or other resource. So -EPERM should be returned if capability checks fails. Strictly speaking this is an API change since the error code user sees is altered. Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-21Merge tag 'dlm-3.9' of ↵Linus Torvalds2-0/+18
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm update from David Teigland: "This includes a single patch to avoid excessive and unnecessary scanning of rsbs to free." * tag 'dlm-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: avoid scanning unchanged toss lists
2013-02-04dlm: check the write size from userDavid Teigland1-4/+4
Return EINVAL from write if the size is larger than allowed. Do this before allocating kernel memory for the bogus size, which could lead to OOM. Reported-by: Sasha Levin <levinsasha928@gmail.com> Tested-by: Jana Saout <jana@saout.de> Signed-off-by: David Teigland <teigland@redhat.com>
2013-01-07dlm: avoid scanning unchanged toss listsDavid Teigland2-0/+18
Keep track of whether a toss list contains any shrinkable rsbs. If not, dlm_scand can avoid scanning the list for rsbs to shrink. Unnecessary scanning can otherwise waste a lot of time because the toss lists can contain a large number of rsbs that are non-shrinkable (directory records). Signed-off-by: David Teigland <teigland@redhat.com>
2012-11-16dlm: fix lvb invalidation conditionsDavid Teigland3-10/+44
When a node is removed that held a PW/EX lock, the existing master node should invalidate the lvb on the resource due to the purged lock. Previously, the existing master node was invalidating the lvb if it found only NL/CR locks on the resource during recovery for the removed node. This could lead to cases where it invalidated the lvb and shouldn't have, or cases where it should have invalidated and didn't. When recovery selects a *new* master node for a resource, and that new master finds only NL/CR locks on the resource after lock recovery, it should invalidate the lvb. This case was handled correctly (but was incorrectly applied to the existing master case also.) When a process exits while holding a PW/EX lock, the lvb on the resource should be invalidated. This was not happening. The lvb contents and VALNOTVALID flag should be recovered before granting locks in recovery so that the recovered lvb state is provided in the callback. The lvb was being recovered after the lock was granted. Signed-off-by: David Teigland <teigland@redhat.com>
2012-11-01fs/dlm: remove CONFIG_EXPERIMENTALKees Cook1-1/+1
This config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it. CC: Christine Caulfield <ccaulfie@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Teigland <teigland@redhat.com>
2012-11-01dlm: remove unused variable in *dlm_lowcomms_get_buffer()Wei Yongjun1-3/+2
The variable users is initialized but never used otherwise, so remove the unused variable. dpatch engine is used to auto generate this patch. (https://github.com/weiyj/dpatch) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David Teigland <teigland@redhat.com>
2012-10-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1-4/+4
Pull networking changes from David Miller: 1) GRE now works over ipv6, from Dmitry Kozlov. 2) Make SCTP more network namespace aware, from Eric Biederman. 3) TEAM driver now works with non-ethernet devices, from Jiri Pirko. 4) Make openvswitch network namespace aware, from Pravin B Shelar. 5) IPV6 NAT implementation, from Patrick McHardy. 6) Server side support for TCP Fast Open, from Jerry Chu and others. 7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel Borkmann. 8) Increate the loopback default MTU to 64K, from Eric Dumazet. 9) Use a per-task rather than per-socket page fragment allocator for outgoing networking traffic. This benefits processes that have very many mostly idle sockets, which is quite common. From Eric Dumazet. 10) Use up to 32K for page fragment allocations, with fallbacks to smaller sizes when higher order page allocations fail. Benefits are a) less segments for driver to process b) less calls to page allocator c) less waste of space. From Eric Dumazet. 11) Allow GRO to be used on GRE tunnels, from Eric Dumazet. 12) VXLAN device driver, one way to handle VLAN issues such as the limitation of 4096 VLAN IDs yet still have some level of isolation. From Stephen Hemminger. 13) As usual there is a large boatload of driver changes, with the scale perhaps tilted towards the wireless side this time around. Fix up various fairly trivial conflicts, mostly caused by the user namespace changes. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits) hyperv: Add buffer for extended info after the RNDIS response message. hyperv: Report actual status in receive completion packet hyperv: Remove extra allocated space for recv_pkt_list elements hyperv: Fix page buffer handling in rndis_filter_send_request() hyperv: Fix the missing return value in rndis_filter_set_packet_filter() hyperv: Fix the max_xfer_size in RNDIS initialization vxlan: put UDP socket in correct namespace vxlan: Depend on CONFIG_INET sfc: Fix the reported priorities of different filter types sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP sfc: Fix loopback self-test with separate_tx_channels=1 sfc: Fix MCDI structure field lookup sfc: Add parentheses around use of bitfield macro arguments sfc: Fix null function pointer in efx_sriov_channel_type vxlan: virtual extensible lan igmp: export symbol ip_mc_leave_group netlink: add attributes to fdb interface tg3: unconditionally select HWMON support when tg3 is enabled. Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT" gre: fix sparse warning ...
2012-09-10netlink: Rename pid to portid to avoid confusionEric W. Biederman1-4/+4
It is a frequent mistake to confuse the netlink port identifier with a process identifier. Try to reduce this confusion by renaming fields that hold port identifiers portid instead of pid. I have carefully avoided changing the structures exported to userspace to avoid changing the userspace API. I have successfully built an allyesconfig kernel with this change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-10dlm: check the maximum size of a request from userSasha Levin1-0/+7
device_write only checks whether the request size is big enough, but it doesn't check if the size is too big. At that point, it also tries to allocate as much memory as the user has requested even if it's too much. This can lead to OOM killer kicking in, or memory corruption if (count + 1) overflows. Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: David Teigland <teigland@redhat.com>
2012-08-13dlm: cleanup send_to_sock routineYing Xue1-4/+1
Remove unnecessary code form send_to_sock routine. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David Teigland <teigland@redhat.com>
2012-08-10dlm: convert add_sock routine return value type to voidYing Xue1-2/+1
Since add_sock() always returns a success code - 0, its return value type should be changed from integer to void. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David Teigland <teigland@redhat.com>
2012-08-10dlm: remove redundant variable assignmentsXue Ying1-2/+0
Once the tcp_create_listen_sock() is returned successfully, we will invoke add_sock() immediately. In add_sock(), the 'con' variable is assigned to 'sk_user_data', meanwhile, the 'sock' is also set to 'con->sock'. So it's unnecessary to do the same thing in tcp_create_listen_sock(). Signed-off-by: Xue Ying <ying.xue@windriver.com> Signed-off-by: David Teigland <teigland@redhat.com>
2012-08-08dlm: fix unlock balance warningsDavid Teigland6-30/+78
The in_recovery rw_semaphore has always been acquired and released by different threads by design. To work around the "BUG: bad unlock balance detected!" messages, adjust things so the dlm_recoverd thread always does both down_write and up_write. Signed-off-by: David Teigland <teigland@redhat.com>
2012-08-08dlm: fix uninitialized spinlockDavid Teigland1-2/+2
Use DEFINE_SPINLOCK for global dlm_cb_seq_spin. Signed-off-by: David Teigland <teigland@redhat.com>
2012-08-08dlm: fix deadlock between dlm_send and dlm_controldDavid Teigland5-90/+200
A deadlock sometimes occurs between dlm_controld closing a lowcomms connection through configfs and dlm_send looking up the address for a new connection in configfs. dlm_controld does a configfs rmdir which calls dlm_lowcomms_close which waits for dlm_send to cancel work on the workqueues. The dlm_send workqueue thread has called tcp_connect_to_sock which calls dlm_nodeid_to_addr which does a configfs lookup and blocks on a lock held by dlm_controld in the rmdir path. The solution here is to save the node addresses within the lowcomms code so that the lowcomms workqueue does not need to step through configfs to get a node address. dlm_controld: wait_for_completion+0x1d/0x20 __cancel_work_timer+0x1b3/0x1e0 cancel_work_sync+0x10/0x20 dlm_lowcomms_close+0x4c/0xb0 [dlm] drop_comm+0x22/0x60 [dlm] client_drop_item+0x26/0x50 [configfs] configfs_rmdir+0x180/0x230 [configfs] vfs_rmdir+0xbd/0xf0 do_rmdir+0x103/0x120 sys_rmdir+0x16/0x20 dlm_send: mutex_lock+0x2b/0x50 get_comm+0x34/0x140 [dlm] dlm_nodeid_to_addr+0x18/0xd0 [dlm] tcp_connect_to_sock+0xf4/0x2d0 [dlm] process_send_sockets+0x1d2/0x260 [dlm] worker_thread+0x170/0x2a0 Signed-off-by: David Teigland <teigland@redhat.com>
2012-07-16dlm: fix missing dir removeDavid Teigland1-2/+68
I don't know exactly how, but in some cases, a dir record is not removed, or a new one is created when it shouldn't be. The result is that the dir node lookup returns a master node where the rsb does not exist. In this case, The master node will repeatedly return -EBADR for requests, and the lock requests will be stuck. Until all possible ways for this to happen can be eliminated, a simple and effective way to recover from this situation is for the supposed master node to send a standard remove message to the dir node when it receives a request for a resource it has no rsb for. Signed-off-by: David Teigland <teigland@redhat.com>
2012-07-16dlm: fix conversion deadlock from recoveryDavid Teigland2-17/+48
The process of rebuilding locks on a new master during recovery could re-order the locks on the convert queue, creating an "in place" conversion deadlock that would not be resolved. Fix this by not considering queue order when granting conversions after recovery. Signed-off-by: David Teigland <teigland@redhat.com>
2012-07-16dlm: use wait_event_timeoutDavid Teigland1-18/+11
Use wait_event_timeout to avoid using a timer directly. Signed-off-by: David Teigland <teigland@redhat.com>
2012-07-16dlm: fix race between remove and lookupDavid Teigland3-39/+176
It was possible for a remove message on an old rsb to be sent after a lookup message on a new rsb, where the rsbs were for the same resource name. This could lead to a missing directory entry for the new rsb. It is fixed by keeping a copy of the resource name being removed until after the remove has been sent. A lookup checks if this in-progress remove matches the name it is looking up. Signed-off-by: David Teigland <teigland@redhat.com>
2012-07-16dlm: use idr instead of list for recovered rsbsDavid Teigland4-23/+101
When a large number of resources are being recovered, a linear search of the recover_list takes a long time. Use an idr in place of a list. Signed-off-by: David Teigland <teigland@redhat.com>