summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2016-12-16ovl: split super.cMiklos Szeredi6-546/+572
fs/overlayfs/super.c is the biggest of the overlayfs source files and it contains various utility functions as well as the rather complicated lookup code. Split these parts out to separate files. Before: 1446 fs/overlayfs/super.c After: 919 fs/overlayfs/super.c 267 fs/overlayfs/namei.c 235 fs/overlayfs/util.c 51 fs/overlayfs/ovl_entry.h Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-16ovl: use d_is_dir()Miklos Szeredi1-2/+2
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-16ovl: simplify lookupMiklos Szeredi1-29/+25
If encountering a non-directory, then stop looking at lower layers. In this case the oe->opaque flag is not set anymore, which doesn't matter since existence of lower file is now checked at remove/rename time. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-16ovl: check lower existence of rename targetMiklos Szeredi1-52/+11
Check if something exists on the lower layer(s) under the target or rename to decide if directory needs to be marked "opaque". Marking opaque is done before the rename, and on failure the marking was undone. Also the opaque xattr was removed if the target didn't cover anything. This patch changes behavior so that removal of "opaque" is not done in either of the above cases. This means that directory may have the opaque flag even if it doesn't cover anything. However this shouldn't affect the performance or semantics of the overalay, while simplifying the code. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-16ovl: rename: simplify handling of lower/merged directoryMiklos Szeredi2-21/+12
d_is_dir() is safe to call on a negative dentry. Use this fact to simplify handling of the lower or merged directories. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-16ovl: get rid of PURE typeMiklos Szeredi3-13/+6
The remainging uses of __OVL_PATH_PURE can be replaced by ovl_dentry_is_opaque(). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-16ovl: check lower existence when removingMiklos Szeredi3-3/+56
Currently ovl_lookup() checks existence of lower file even if there's a non-directory on upper (which is always opaque). This is done so that remove can decide whether a whiteout is needed or not. It would be better to defer this check to unlink, since most of the time the gathered information about opaqueness will be unused. This adds a helper ovl_lower_positive() that checks if there's anything on the lower layer(s). The following patches also introduce changes to how the "opaque" attribute is updated on directories: this attribute is added when the directory is creted or moved over a whiteout or object covering something on the lower layer. However following changes will allow the attribute to remain on the directory after being moved, even if the new location doesn't cover anything. Because of this, we need to check lower layers even for opaque directories, so that whiteout is only created when necessary. This function will later be also used to decide about marking a directory opaque, so deal with negative dentries as well. When dealing with negative, it's enough to check for being a whiteout If the dentry is positive but not upper then it also obviously needs whiteout/opaque. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-16ovl: add ovl_dentry_is_whiteout()Miklos Szeredi3-3/+9
And use it instead of ovl_dentry_is_opaque() where appropriate. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-16ovl: don't check stickyMiklos Szeredi1-24/+0
Since commit 07a2daab49c5 ("ovl: Copy up underlying inode's ->i_mode to overlay inode") sticky checking on overlay inode is performed by the vfs, so checking against sticky on underlying inode is not needed. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-16ovl: don't check rename to selfMiklos Szeredi1-12/+3
This is redundant, the vfs already performed this check (and was broken, see commit 9409e22acdfc ("vfs: rename: check backing inode being equal")). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-16ovl: treat special files like a regular fsMiklos Szeredi4-19/+16
No sense in opening special files on the underlying layers, they work just as well if opened on the overlay. Side effect is that it's no longer possible to connect one side of a pipe opened on overlayfs with the other side opened on the underlying layer. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-16ovl: rename ovl_rename2() to ovl_rename()Miklos Szeredi1-4/+4
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-16ovl: use vfs_clone_file_range() for copy up if possibleAmir Goldstein1-1/+8
When copying up within the same fs, try to use vfs_clone_file_range(). This is very efficient when lower and upper are on the same fs with file reflink support. If vfs_clone_file_range() fails for any reason, copy up falls back to the regular data copy code. Tested correct behavior when lower and upper are on: 1. same ext4 (copy) 2. same xfs + reflink patches + mkfs.xfs (copy) 3. same xfs + reflink patches + mkfs.xfs -m reflink=1 (reflink) 4. different xfs + reflink patches + mkfs.xfs -m reflink=1 (copy) For comparison, on my laptop, xfstest overlay/001 (copy up of large sparse files) takes less than 1 second in the xfs reflink setup vs. 25 seconds on the rest of the setups. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-16Revert "ovl: get_write_access() in truncate"Miklos Szeredi1-21/+0
This reverts commit 03bea60409328de54e4ff7ec41672e12a9cb0908. Commit 4d0c5ba2ff79 ("vfs: do get_write_access() on upper layer of overlayfs") makes the writecount checks inside overlayfs superfluous, the file is already copied up and write access acquired on the upper inode when ovl_setattr is called with ATTR_SIZE. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-16ovl: update docMiklos Szeredi1-3/+3
The quirk for file locks and leases no longer applies. Add missing info about renaming directory residing on lower layer. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-16vfs: fix vfs_clone_file_range() for overlayfs filesAmir Goldstein1-5/+5
With overlayfs, it is wrong to compare file_inode(inode)->i_sb of regular files with those of non-regular files, because the former reference the real (upper/lower) sb and the latter reference the overlayfs sb. Move the test for same super block after the sanity tests for clone range of directory and non-regular file. This change fixes xfstest generic/157, which returned EXDEV instead of EISDIR/EINVAL in the following test cases over overlayfs: echo "Try to reflink a dir" _reflink_range $testdir1/dir1 0 $testdir1/file2 0 $blksz echo "Try to reflink a device" _reflink_range $testdir1/dev1 0 $testdir1/file2 0 $blksz Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-16vfs: call vfs_clone_file_range() under freeze protectionAmir Goldstein4-6/+15
Move sb_start_write()/sb_end_write() out of the vfs helper and up into the ioctl handler. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-16vfs: allow vfs_clone_file_range() across mount pointsAmir Goldstein2-2/+10
FICLONE/FICLONERANGE ioctls return -EXDEV if src and dest files are not on the same mount point. Practically, clone only requires that src and dest files are on the same file system. Move the check for same mount point to ioctl handler and keep only the check for same super block in the vfs helper. A following patch is going to use the vfs_clone_file_range() helper in overlayfs to copy up between lower and upper mount points on the same file system. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-16vfs: no mnt_want_write_file() in vfs_{copy,clone}_file_range()Miklos Szeredi1-8/+4
We've checked for file_out being opened for write. This ensures that we already have mnt_want_write() on target. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-16Revert "vfs: rename: check backing inode being equal"Miklos Szeredi1-5/+1
This reverts commit 9409e22acdfc9153f88d9b1ed2bd2a5b34d2d3ca. Since commit 51f7e52dc943 ("ovl: share inode for hard link") there's no need to call d_real_inode() to check two overlay inodes for equality. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-16Revert "af_unix: fix hard linked sockets on overlay"Miklos Szeredi1-3/+3
This reverts commit eb0a4a47ae89aaa0674ab3180de6a162f3be2ddf. Since commit 51f7e52dc943 ("ovl: share inode for hard link") there's no need to call d_real_inode() to check two overlay inodes for equality. Side effect of this revert is that it's no longer possible to connect one socket on overlayfs to one on the underlying layer (something which didn't make sense anyway). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-04Linux 4.9-rc8v4.9-rc8Linus Torvalds1-1/+1
2016-12-03Merge tag 'drm-fixes-for-v4.9-rc8' of ↵Linus Torvalds7-13/+34
git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "A pretty small pull request: a couple of AMD powerxpress regression fixes and a power management fix, a couple of i915 fixes and one hdlcd fix, along with one core don't oops because of incorrect API usage fix" * tag 'drm-fixes-for-v4.9-rc8' of git://people.freedesktop.org/~airlied/linux: drm/i915: drop the struct_mutex when wedged or trying to reset drm/i915: Don't touch NULL sg on i915_gem_object_get_pages_gtt() error drm: Don't call drm_for_each_crtc with a non-KMS driver drm/radeon: fix check for port PM availability drm/amdgpu: fix check for port PM availability drm/amd/powerplay: initialize the soft_regs offset in struct smu7_hwmgr drm: hdlcd: Fix cleanup order
2016-12-04Merge tag 'drm-intel-fixes-2016-12-01' of ↵Dave Airlie2-3/+5
git://anongit.freedesktop.org/git/drm-intel into drm-fixes 2 intel fixes. * tag 'drm-intel-fixes-2016-12-01' of git://anongit.freedesktop.org/git/drm-intel: drm/i915: drop the struct_mutex when wedged or trying to reset drm/i915: Don't touch NULL sg on i915_gem_object_get_pages_gtt() error
2016-12-02Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-1/+3
Merge more fixes from Andrew Morton: "2 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm, vmscan: add cond_resched() into shrink_node_memcg() mm: workingset: fix NULL ptr in count_shadow_nodes
2016-12-02mm, vmscan: add cond_resched() into shrink_node_memcg()Michal Hocko1-0/+2
Boris Zhmurov has reported RCU stalls during the kswapd reclaim: INFO: rcu_sched detected stalls on CPUs/tasks: 23-...: (22 ticks this GP) idle=92f/140000000000000/0 softirq=2638404/2638404 fqs=23 (detected by 4, t=6389 jiffies, g=786259, c=786258, q=42115) Task dump for CPU 23: kswapd1 R running task 0 148 2 0x00000008 Call Trace: shrink_node+0xd2/0x2f0 kswapd+0x2cb/0x6a0 mem_cgroup_shrink_node+0x160/0x160 kthread+0xbd/0xe0 __switch_to+0x1fa/0x5c0 ret_from_fork+0x1f/0x40 kthread_create_on_node+0x180/0x180 a closer code inspection has shown that we might indeed miss all the scheduling points in the reclaim path if no pages can be isolated from the LRU list. This is a pathological case but other reports from Donald Buczek have shown that we might indeed hit such a path: clusterd-989 [009] .... 118023.654491: mm_vmscan_direct_reclaim_end: nr_reclaimed=193 kswapd1-86 [001] dN.. 118023.987475: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239830 nr_taken=0 file=1 kswapd1-86 [001] dN.. 118024.320968: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239844 nr_taken=0 file=1 kswapd1-86 [001] dN.. 118024.654375: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239858 nr_taken=0 file=1 kswapd1-86 [001] dN.. 118024.987036: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239872 nr_taken=0 file=1 kswapd1-86 [001] dN.. 118025.319651: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239886 nr_taken=0 file=1 kswapd1-86 [001] dN.. 118025.652248: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239900 nr_taken=0 file=1 kswapd1-86 [001] dN.. 118025.984870: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4239914 nr_taken=0 file=1 [...] kswapd1-86 [001] dN.. 118084.274403: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=4241133 nr_taken=0 file=1 this is minute long snapshot which didn't take a single page from the LRU. It is not entirely clear why only 1303 pages have been scanned during that time (maybe there was a heavy IRQ activity interfering). In any case it looks like we can really hit long periods without scheduling on non preemptive kernels so an explicit cond_resched() in shrink_node_memcg which is independent on the reclaim operation is due. Link: http://lkml.kernel.org/r/20161202095841.16648-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Boris Zhmurov <bb@kernelpanic.ru> Tested-by: Boris Zhmurov <bb@kernelpanic.ru> Reported-by: Donald Buczek <buczek@molgen.mpg.de> Reported-by: "Christopher S. Aker" <caker@theshore.net> Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-02mm: workingset: fix NULL ptr in count_shadow_nodesMichal Hocko1-1/+1
Commit 0a6b76dd23fa ("mm: workingset: make shadow node shrinker memcg aware") has made the workingset shadow nodes shrinker memcg aware. The implementation is not correct though because memcg_kmem_enabled() might become true while we are doing a global reclaim when the sc->memcg might be NULL which is exactly what Marek has seen: BUG: unable to handle kernel NULL pointer dereference at 0000000000000400 IP: [<ffffffff8122d520>] mem_cgroup_node_nr_lru_pages+0x20/0x40 PGD 0 Oops: 0000 [#1] SMP CPU: 0 PID: 60 Comm: kswapd0 Tainted: G O 4.8.10-12.pvops.qubes.x86_64 #1 task: ffff880011863b00 task.stack: ffff880011868000 RIP: mem_cgroup_node_nr_lru_pages+0x20/0x40 RSP: e02b:ffff88001186bc70 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff88001186bd20 RCX: 0000000000000002 RDX: 000000000000000c RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88001186bc70 R08: 28f5c28f5c28f5c3 R09: 0000000000000000 R10: 0000000000006c34 R11: 0000000000000333 R12: 00000000000001f6 R13: ffffffff81c6f6a0 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff880013c00000(0000) knlGS:ffff880013d00000 CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000400 CR3: 00000000122f2000 CR4: 0000000000042660 Call Trace: count_shadow_nodes+0x9a/0xa0 shrink_slab.part.42+0x119/0x3e0 shrink_node+0x22c/0x320 kswapd+0x32c/0x700 kthread+0xd8/0xf0 ret_from_fork+0x1f/0x40 Code: 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 3b 35 dd eb b1 00 55 48 89 e5 73 2c 89 d2 31 c9 31 c0 4c 63 ce 48 0f a3 ca 73 13 <4a> 8b b4 cf 00 04 00 00 41 89 c8 4a 03 84 c6 80 00 00 00 83 c1 RIP mem_cgroup_node_nr_lru_pages+0x20/0x40 RSP <ffff88001186bc70> CR2: 0000000000000400 ---[ end trace 100494b9edbdfc4d ]--- This patch fixes the issue by checking sc->memcg rather than memcg_kmem_enabled() which is sufficient because shrink_slab makes sure that only memcg aware shrinkers will get non-NULL memcgs and only if memcg_kmem_enabled is true. Fixes: 0a6b76dd23fa ("mm: workingset: make shadow node shrinker memcg aware") Link: http://lkml.kernel.org/r/20161201132156.21450-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Marek Marczykowski-Górecki <marmarek@mimuw.edu.pl> Tested-by: Marek Marczykowski-Górecki <marmarek@mimuw.edu.pl> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Balbir Singh <bsingharora@gmail.com> Cc: <stable@vger.kernel.org> [4.6+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-02kbuild: fix building bzImage with CONFIG_TRIM_UNUSED_KSYMS enabledNicolas Pitre1-1/+8
When building a specific target such as bzImage, modules aren't normally built. However if CONFIG_TRIM_UNUSED_KSYMS is enabled, no built modules means none of the exported symbols are used and therefore they will all be trimmed away from the final kernel. A subsequent "make modules" will fail because modpost cannot find the needed symbols for those modules in the kernel binary. Let's make sure modules are also built whenever CONFIG_TRIM_UNUSED_KSYMS is enabled and that the kernel binary is properly rebuilt accordingly. Signed-off-by: Nicolas Pitre <nico@linaro.org> Tested-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-02Merge tag 'fixes-for-linus' of ↵Linus Torvalds8-6/+22
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Arnd Bergmann: "This should be the last set of bugfixes for arm-soc in v4.9. None of these are critical regressions, but it would be nice to still get them merged. - On the Juno platform, the idle latency was described wrong, leading to suboptimal cpuidle tuning. - Also on the same platform, PCI I/O space was set up incorrectly and could not work. - On the sti platform, a syntactically incorrect DT entry caused warnings. - The newly added 'gr8' platform has somewhat confusing file names, which we rename for consistency" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: arm64: dts: juno: fix cluster sleep state entry latency on all SoC versions arm64: dts: juno: Correct PCI IO window ARM: dts: STiH407-family: fix i2c nodes ARM: gr8: Rename the DTSI and relevant DTS
2016-12-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds121-450/+1064
Pull networking fixes from David Miller: 1) Lots more phydev and probe error path leaks in various drivers by Johan Hovold. 2) Fix race in packet_set_ring(), from Philip Pettersson. 3) Use after free in dccp_invalid_packet(), from Eric Dumazet. 4) Signnedness overflow in SO_{SND,RCV}BUFFORCE, also from Eric Dumazet. 5) When tunneling between ipv4 and ipv6 we can be left with the wrong skb->protocol value as we enter the IPSEC engine and this causes all kinds of problems. Set it before the output path does any dst_output() calls, from Eli Cooper. 6) bcmgenet uses wrong device struct pointer in DMA API calls, fix from Florian Fainelli. 7) Various netfilter nat bug fixes from FLorian Westphal. 8) Fix memory leak in ipvlan_link_new(), from Gao Feng. 9) Locking fixes, particularly wrt. socket lookups, in l2tp from Guillaume Nault. 10) Avoid invoking rhash teardowns in atomic context by moving netlink cb->done() dump completion from a worker thread. Fix from Herbert Xu. 11) Buffer refcount problems in tun and macvtap on errors, from Jason Wang. 12) We don't set Kconfig symbol DEFAULT_TCP_CONG properly when the user selects BBR. Fix from Julian Wollrath. 13) Fix deadlock in transmit path on altera TSE driver, from Lino Sanfilippo. 14) Fix unbalanced reference counting in dsa_switch_tree, from Nikita Yushchenko. 15) tc_tunnel_key needs to be properly exported to userspace via uapi, fix from Roi Dayan. 16) rds_tcp_init_net() doesn't unregister notifier in error path, fix from Sowmini Varadhan. 17) Stale packet header pointer access after pskb_expand_head() in genenve driver, fix from Sabrina Dubroca. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits) net: avoid signed overflows for SO_{SND|RCV}BUFFORCE geneve: avoid use-after-free of skb->data tipc: check minimum bearer MTU net: renesas: ravb: unintialized return value sh_eth: remove unchecked interrupts for RZ/A1 net: bcmgenet: Utilize correct struct device for all DMA operations NET: usb: qmi_wwan: add support for Telit LE922A PID 0x1040 cdc_ether: Fix handling connection notification ip6_offload: check segs for NULL in ipv6_gso_segment. RDS: TCP: unregister_netdevice_notifier() in error path of rds_tcp_init_net Revert: "ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()" ipv6: Set skb->protocol properly for local output ipv4: Set skb->protocol properly for local output packet: fix race condition in packet_set_ring net: ethernet: altera: TSE: do not use tx queue lock in tx completion handler net: ethernet: altera: TSE: Remove unneeded dma sync for tx buffers net: ethernet: stmmac: fix of-node and fixed-link-phydev leaks net: ethernet: stmmac: platform: fix outdated function header net: ethernet: stmmac: dwmac-meson8b: fix probe error path net: ethernet: stmmac: dwmac-generic: fix probe error path ...
2016-12-02net: avoid signed overflows for SO_{SND|RCV}BUFFORCEEric Dumazet1-2/+2
CAP_NET_ADMIN users should not be allowed to set negative sk_sndbuf or sk_rcvbuf values, as it can lead to various memory corruptions, crashes, OOM... Note that before commit 82981930125a ("net: cleanups in sock_setsockopt()"), the bug was even more serious, since SO_SNDBUF and SO_RCVBUF were vulnerable. This needs to be backported to all known linux kernels. Again, many thanks to syzkaller team for discovering this gem. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02geneve: avoid use-after-free of skb->dataSabrina Dubroca1-10/+4
geneve{,6}_build_skb can end up doing a pskb_expand_head(), which makes the ip_hdr(skb) reference we stashed earlier stale. Since it's only needed as an argument to ip_tunnel_ecn_encap(), move this directly in the function call. Fixes: 08399efc6319 ("geneve: ensure ECN info is handled properly in all tx/rx paths") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02tipc: check minimum bearer MTUMichal Kubeček3-2/+27
Qian Zhang (张谦) reported a potential socket buffer overflow in tipc_msg_build() which is also known as CVE-2016-8632: due to insufficient checks, a buffer overflow can occur if MTU is too short for even tipc headers. As anyone can set device MTU in a user/net namespace, this issue can be abused by a regular user. As agreed in the discussion on Ben Hutchings' original patch, we should check the MTU at the moment a bearer is attached rather than for each processed packet. We also need to repeat the check when bearer MTU is adjusted to new device MTU. UDP case also needs a check to avoid overflow when calculating bearer MTU. Fixes: b97bf3fd8f6a ("[TIPC] Initial merge") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reported-by: Qian Zhang (张谦) <zhangqian-c@360.cn> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02Merge tag 'linux-can-fixes-for-4.9-20161201' of ↵David S. Miller4-24/+121
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2016-12-02 this is a pull request for net/master. There are two patches by Stephane Grosjean, who adds support for the new PCAN-USB X6 USB interface to the pcan_usb driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02net: renesas: ravb: unintialized return valueDan Carpenter1-2/+0
We want to set the other "err" variable here so that we can return it later. My version of GCC misses this issue but I caught it with a static checker. Fixes: 9f70eb339f52 ("net: ethernet: renesas: ravb: fix fixed-link phydev leaks") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02sh_eth: remove unchecked interrupts for RZ/A1Chris Brandt1-1/+1
When streaming a lot of data and the RZ/A1 can't keep up, some status bits will get set that are not being checked or cleared which cause the following messages and the Ethernet driver to stop working. This patch fixes that issue. irq 21: nobody cared (try booting with the "irqpoll" option) handlers: [<c036b71c>] sh_eth_interrupt Disabling IRQ #21 Fixes: db893473d313a4ad ("sh_eth: Add support for r7s72100") Signed-off-by: Chris Brandt <chris.brandt@renesas.com> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02net: bcmgenet: Utilize correct struct device for all DMA operationsFlorian Fainelli1-3/+5
__bcmgenet_tx_reclaim() and bcmgenet_free_rx_buffers() are not using the same struct device during unmap that was used for the map operation, which makes DMA-API debugging warn about it. Fix this by always using &priv->pdev->dev throughout the driver, using an identical device reference for all map/unmap calls. Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02Fix up a couple of field names in the CREDITS fileLinus Torvalds1-4/+4
Ozgur Karatas reported that the very first entry in the CREDITS file had the wrong tag for name (M: instead of N: - it happened when moving the entry from the MAINTAINERS file, where 'M:' stands for "Maintainer"). And when I went looking, I found a couple of other cases of wrong tagging too. Reported-by: Ozgur Karatas <mueddib@yandex.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-02NET: usb: qmi_wwan: add support for Telit LE922A PID 0x1040Daniele Palmas1-0/+1
This patch adds support for PID 0x1040 of Telit LE922A. The qmi adapter requires to have DTR set for proper working, so QMI_WWAN_QUIRK_DTR has been enabled. Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02cdc_ether: Fix handling connection notificationKristian Evensen1-7/+31
Commit bfe9b9d2df66 ("cdc_ether: Improve ZTE MF823/831/910 handling") introduced a work-around in usbnet_cdc_status() for devices that exported cdc carrier on twice on connect. Before the commit, this behavior caused the link state to be incorrect. It was assumed that all CDC Ethernet devices would either export this behavior, or send one off and then one on notification (which seems to be the default behavior). Unfortunately, it turns out multiple devices sends a connection notification multiple times per second (via an interrupt), even when connection state does not change. This has been observed with several different USB LAN dongles (at least), for example 13b1:0041 (Linksys). After bfe9b9d2df66, the link state has been set as down and then up for each notification. This has caused a flood of Netlink NEWLINK messages and syslog to be flooded with messages similar to: cdc_ether 2-1:2.0 eth1: kevent 12 may have been dropped This commit fixes the behavior by reverting usbnet_cdc_status() to how it was before bfe9b9d2df66. The work-around has been moved to a separate status-function which is only called when a known, affect device is detected. v1->v2: * Do not open-code netif_carrier_ok() (thanks Henning Schild). * Call netif_carrier_off() instead of usb_link_change(). This prevents calling schedule_work() twice without giving the work queue a chance to be processed (thanks Bjørn Mork). Fixes: bfe9b9d2df66 ("cdc_ether: Improve ZTE MF823/831/910 handling") Reported-by: Henning Schild <henning.schild@siemens.com> Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02ip6_offload: check segs for NULL in ipv6_gso_segment.Artem Savkov1-1/+1
segs needs to be checked for being NULL in ipv6_gso_segment() before calling skb_shinfo(segs), otherwise kernel can run into a NULL-pointer dereference: [ 97.811262] BUG: unable to handle kernel NULL pointer dereference at 00000000000000cc [ 97.819112] IP: [<ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0 [ 97.825214] PGD 0 [ 97.827047] [ 97.828540] Oops: 0000 [#1] SMP [ 97.831678] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel snd_hda_codec edac_mce_amd snd_hda_core edac_core snd_hwdep kvm_amd snd_seq kvm snd_seq_device snd_pcm irqbypass snd_timer ppdev parport_serial snd parport_pc k10temp pcspkr soundcore parport sp5100_tco shpchp sg wmi i2c_piix4 acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic pata_acpi amdkfd amd_iommu_v2 radeon broadcom bcm_phy_lib i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci serio_raw tg3 firewire_ohci libahci pata_atiixp drm ptp libata firewire_core pps_core i2c_core crc_itu_t fjes dm_mirror dm_region_hash dm_log dm_mod [ 97.927721] CPU: 1 PID: 3504 Comm: vhost-3495 Not tainted 4.9.0-7.el7.test.x86_64 #1 [ 97.935457] Hardware name: AMD Snook/Snook, BIOS ESK0726A 07/26/2010 [ 97.941806] task: ffff880129a1c080 task.stack: ffffc90001bcc000 [ 97.947720] RIP: 0010:[<ffffffff816e52f9>] [<ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0 [ 97.956251] RSP: 0018:ffff88012fc43a10 EFLAGS: 00010207 [ 97.961557] RAX: 0000000000000000 RBX: ffff8801292c8700 RCX: 0000000000000594 [ 97.968687] RDX: 0000000000000593 RSI: ffff880129a846c0 RDI: 0000000000240000 [ 97.975814] RBP: ffff88012fc43a68 R08: ffff880129a8404e R09: 0000000000000000 [ 97.982942] R10: 0000000000000000 R11: ffff880129a84076 R12: 00000020002949b3 [ 97.990070] R13: ffff88012a580000 R14: 0000000000000000 R15: ffff88012a580000 [ 97.997198] FS: 0000000000000000(0000) GS:ffff88012fc40000(0000) knlGS:0000000000000000 [ 98.005280] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 98.011021] CR2: 00000000000000cc CR3: 0000000126c5d000 CR4: 00000000000006e0 [ 98.018149] Stack: [ 98.020157] 00000000ffffffff ffff88012fc43ac8 ffffffffa017ad0a 000000000000000e [ 98.027584] 0000001300000000 0000000077d59998 ffff8801292c8700 00000020002949b3 [ 98.035010] ffff88012a580000 0000000000000000 ffff88012a580000 ffff88012fc43a98 [ 98.042437] Call Trace: [ 98.044879] <IRQ> [ 98.046803] [<ffffffffa017ad0a>] ? tg3_start_xmit+0x84a/0xd60 [tg3] [ 98.053156] [<ffffffff815eeee0>] skb_mac_gso_segment+0xb0/0x130 [ 98.059158] [<ffffffff815eefd3>] __skb_gso_segment+0x73/0x110 [ 98.064985] [<ffffffff815ef40d>] validate_xmit_skb+0x12d/0x2b0 [ 98.070899] [<ffffffff815ef5d2>] validate_xmit_skb_list+0x42/0x70 [ 98.077073] [<ffffffff81618560>] sch_direct_xmit+0xd0/0x1b0 [ 98.082726] [<ffffffff815efd86>] __dev_queue_xmit+0x486/0x690 [ 98.088554] [<ffffffff8135c135>] ? cpumask_next_and+0x35/0x50 [ 98.094380] [<ffffffff815effa0>] dev_queue_xmit+0x10/0x20 [ 98.099863] [<ffffffffa09ce057>] br_dev_queue_push_xmit+0xa7/0x170 [bridge] [ 98.106907] [<ffffffffa09ce161>] br_forward_finish+0x41/0xc0 [bridge] [ 98.113430] [<ffffffff81627cf2>] ? nf_iterate+0x52/0x60 [ 98.118735] [<ffffffff81627d6b>] ? nf_hook_slow+0x6b/0xc0 [ 98.124216] [<ffffffffa09ce32c>] __br_forward+0x14c/0x1e0 [bridge] [ 98.130480] [<ffffffffa09ce120>] ? br_dev_queue_push_xmit+0x170/0x170 [bridge] [ 98.137785] [<ffffffffa09ce4bd>] br_forward+0x9d/0xb0 [bridge] [ 98.143701] [<ffffffffa09cfbb7>] br_handle_frame_finish+0x267/0x560 [bridge] [ 98.150834] [<ffffffffa09d0064>] br_handle_frame+0x174/0x2f0 [bridge] [ 98.157355] [<ffffffff8102fb89>] ? sched_clock+0x9/0x10 [ 98.162662] [<ffffffff810b63b2>] ? sched_clock_cpu+0x72/0xa0 [ 98.168403] [<ffffffff815eccf5>] __netif_receive_skb_core+0x1e5/0xa20 [ 98.174926] [<ffffffff813659f9>] ? timerqueue_add+0x59/0xb0 [ 98.180580] [<ffffffff815ed548>] __netif_receive_skb+0x18/0x60 [ 98.186494] [<ffffffff815ee625>] process_backlog+0x95/0x140 [ 98.192145] [<ffffffff815edccd>] net_rx_action+0x16d/0x380 [ 98.197713] [<ffffffff8170cff1>] __do_softirq+0xd1/0x283 [ 98.203106] [<ffffffff8170b2bc>] do_softirq_own_stack+0x1c/0x30 [ 98.209107] <EOI> [ 98.211029] [<ffffffff8108a5c0>] do_softirq+0x50/0x60 [ 98.216166] [<ffffffff815ec853>] netif_rx_ni+0x33/0x80 [ 98.221386] [<ffffffffa09eeff7>] tun_get_user+0x487/0x7f0 [tun] [ 98.227388] [<ffffffffa09ef3ab>] tun_sendmsg+0x4b/0x60 [tun] [ 98.233129] [<ffffffffa0b68932>] handle_tx+0x282/0x540 [vhost_net] [ 98.239392] [<ffffffffa0b68c25>] handle_tx_kick+0x15/0x20 [vhost_net] [ 98.245916] [<ffffffffa0abacfe>] vhost_worker+0x9e/0xf0 [vhost] [ 98.251919] [<ffffffffa0abac60>] ? vhost_umem_alloc+0x40/0x40 [vhost] [ 98.258440] [<ffffffff81003a47>] ? do_syscall_64+0x67/0x180 [ 98.264094] [<ffffffff810a44d9>] kthread+0xd9/0xf0 [ 98.268965] [<ffffffff810a4400>] ? kthread_park+0x60/0x60 [ 98.274444] [<ffffffff8170a4d5>] ret_from_fork+0x25/0x30 [ 98.279836] Code: 8b 93 d8 00 00 00 48 2b 93 d0 00 00 00 4c 89 e6 48 89 df 66 89 93 c2 00 00 00 ff 10 48 3d 00 f0 ff ff 49 89 c2 0f 87 52 01 00 00 <41> 8b 92 cc 00 00 00 48 8b 80 d0 00 00 00 44 0f b7 74 10 06 66 [ 98.299425] RIP [<ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0 [ 98.305612] RSP <ffff88012fc43a10> [ 98.309094] CR2: 00000000000000cc [ 98.312406] ---[ end trace 726a2c7a2d2d78d0 ]--- Signed-off-by: Artem Savkov <asavkov@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02RDS: TCP: unregister_netdevice_notifier() in error path of rds_tcp_init_netSowmini Varadhan1-0/+2
If some error is encountered in rds_tcp_init_net, make sure to unregister_netdevice_notifier(), else we could trigger a panic later on, when the modprobe from a netns fails. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02Revert: "ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()"Eli Cooper1-1/+0
This reverts commit ae148b085876fa771d9ef2c05f85d4b4bf09ce0d ("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()"). skb->protocol is now set in __ip_local_out() and __ip6_local_out() before dst_output() is called. It is no longer necessary to do it for each tunnel. Cc: stable@vger.kernel.org Signed-off-by: Eli Cooper <elicooper@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02ipv6: Set skb->protocol properly for local outputEli Cooper1-0/+2
When xfrm is applied to TSO/GSO packets, it follows this path: xfrm_output() -> xfrm_output_gso() -> skb_gso_segment() where skb_gso_segment() relies on skb->protocol to function properly. This patch sets skb->protocol to ETH_P_IPV6 before dst_output() is called, fixing a bug where GSO packets sent through an ipip6 tunnel are dropped when xfrm is involved. Cc: stable@vger.kernel.org Signed-off-by: Eli Cooper <elicooper@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02ipv4: Set skb->protocol properly for local outputEli Cooper1-0/+2
When xfrm is applied to TSO/GSO packets, it follows this path: xfrm_output() -> xfrm_output_gso() -> skb_gso_segment() where skb_gso_segment() relies on skb->protocol to function properly. This patch sets skb->protocol to ETH_P_IP before dst_output() is called, fixing a bug where GSO packets sent through a sit tunnel are dropped when xfrm is involved. Cc: stable@vger.kernel.org Signed-off-by: Eli Cooper <elicooper@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02packet: fix race condition in packet_set_ringPhilip Pettersson1-6/+12
When packet_set_ring creates a ring buffer it will initialize a struct timer_list if the packet version is TPACKET_V3. This value can then be raced by a different thread calling setsockopt to set the version to TPACKET_V1 before packet_set_ring has finished. This leads to a use-after-free on a function pointer in the struct timer_list when the socket is closed as the previously initialized timer will not be deleted. The bug is fixed by taking lock_sock(sk) in packet_setsockopt when changing the packet version while also taking the lock at the start of packet_set_ring. Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.") Signed-off-by: Philip Pettersson <philip.pettersson@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds3-5/+9
Pull KVM fixes from Radim Krčmář: "All architectures avoid memory corruption in an error path. ARM prevents bogus acknowledgement of interrupts" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: use after free in kvm_ioctl_create_device() KVM: arm/arm64: vgic: Don't notify EOI for non-SPIs
2016-12-02Merge branch 'i2c/for-current' of ↵Linus Torvalds2-19/+12
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fix from Wolfram Sang: "Here is the revert for the regression of the i2c-octeon driver I mentioned last time. I wished for a bit more feedback, but all people working actively on it are in need of this patch, so here it goes" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: Revert "i2c: octeon: thunderx: Limit register access retries"
2016-12-02net: ethernet: altera: TSE: do not use tx queue lock in tx completion handlerLino Sanfilippo1-2/+0
The driver already uses its private lock for synchronization between xmit and xmit completion handler making the additional use of the xmit_lock unnecessary. Furthermore the driver does not set NETIF_F_LLTX resulting in xmit to be called with the xmit_lock held and then taking the private lock while xmit completion handler does the reverse, first take the private lock, then the xmit_lock. Fix these issues by not taking the xmit_lock in the tx completion handler. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02net: ethernet: altera: TSE: Remove unneeded dma sync for tx buffersLino Sanfilippo1-10/+0
An explicit dma sync for device directly after mapping as well as an explicit dma sync for cpu directly before unmapping is unnecessary and costly on the hotpath. So remove these calls. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>