summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2020-01-18Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull rseq fixes from Ingo Molnar: "Two rseq bugfixes: - CLONE_VM !CLONE_THREAD didn't work properly, the kernel would end up corrupting the TLS of the parent. Technically a change in the ABI but the previous behavior couldn't resonably have been relied on by applications so this looks like a valid exception to the ABI rule. - Make the RSEQ_FLAG_UNREGISTER ABI behavior consistent with the handling of other flags. This is not thought to impact any applications either" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rseq: Unregister rseq for clone CLONE_VM rseq: Reject unknown flags on rseq unregister
2020-01-17Merge tag 'block-5.5-2020-01-16' of git://git.kernel.dk/linux-blockLinus Torvalds1-4/+4
Pull block fixes from Jens Axboe: "Three fixes that should go into this release: - The 32-bit segment size fix that I mentioned last week (Ming) - Use uint for the block size (Mikulas) - A null_blk zone write handling fix (Damien)" * tag 'block-5.5-2020-01-16' of git://git.kernel.dk/linux-block: block: fix an integer overflow in logical block size null_blk: Fix zone write handling block: fix get_max_segment_size() overflow on 32bit arch
2020-01-16Merge tag 'armsoc-fixes' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Olof Johansson: "I've been sitting on these longer than I meant, so the patch count is a bit higher than ideal for this part of the release. There's also some reverts of double-applied patches that brings the diffstat up a bit. With that said, the biggest changes are: - Revert of duplicate i2c device addition on two Aspeed (BMC) Devicetrees. - Move of two device nodes that got applied to the wrong part of the tree on ASpeed G6. - Regulator fix for Beaglebone X15 (adding 12/5V supplies) - Use interrupts for keys on Amlogic SM1 to avoid missed polls In addition to that, there is a collection of smaller DT fixes: - Power supply assignment fixes for i.MX6 - Fix of interrupt line for magnetometer on i.MX8 Librem5 devkit - Build fixlets (selects) for davinci/omap2+ - More interrupt number fixes for Stratix10, Amlogic SM1, etc. - ... and more similar fixes across different platforms And some non-DT stuff: - optee fix to register multiple shared pages properly - Clock calculation fixes for MMP3 - Clock fixes for OMAP as well" * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (42 commits) MAINTAINERS: Add myself as the co-maintainer for Actions Semi platforms ARM: dts: imx7: Fix Toradex Colibri iMX7S 256MB NAND flash support ARM: dts: imx6sll-evk: Remove incorrect power supply assignment ARM: dts: imx6sl-evk: Remove incorrect power supply assignment ARM: dts: imx6sx-sdb: Remove incorrect power supply assignment ARM: dts: imx6qdl-sabresd: Remove incorrect power supply assignment ARM: dts: imx6q-icore-mipi: Use 1.5 version of i.Core MX6DL ARM: omap2plus: select RESET_CONTROLLER ARM: davinci: select CONFIG_RESET_CONTROLLER ARM: dts: aspeed: rainier: Fix fan fault and presence ARM: dts: aspeed: rainier: Remove duplicate i2c busses ARM: dts: aspeed: tacoma: Remove duplicate flash nodes ARM: dts: aspeed: tacoma: Remove duplicate i2c busses ARM: dts: aspeed: tacoma: Fix fsi master node ARM: dts: aspeed-g6: Fix FSI master location ARM: dts: mmp3: Fix the TWSI ranges clk: mmp2: Fix the order of timer mux parents ARM: mmp: do not divide the clock rate arm64: dts: rockchip: Fix IR on Beelink A1 optee: Fix multi page dynamic shm pool alloc ...
2020-01-15block: fix an integer overflow in logical block sizeMikulas Patocka1-4/+4
Logical block size has type unsigned short. That means that it can be at most 32768. However, there are architectures that can run with 64k pages (for example arm64) and on these architectures, it may be possible to create block devices with 64k block size. For exmaple (run this on an architecture with 64k pages): Mount will fail with this error because it tries to read the superblock using 2-sector access: device-mapper: writecache: I/O is not aligned, sector 2, size 1024, block size 65536 EXT4-fs (dm-0): unable to read superblock This patch changes the logical block size from unsigned short to unsigned int to avoid the overflow. Cc: stable@vger.kernel.org Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-15Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-1/+0
Pull vfs fixes from Al Viro: "Fixes for mountpoint_last() bugs (by converting to use of lookup_last()) and an autofs regression fix from this cycle (caused by follow_managed() breakage introduced in barrier fixes series)" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix autofs regression caused by follow_managed() changes reimplement path_mountpoint() with less magic
2020-01-15reimplement path_mountpoint() with less magicAl Viro1-1/+0
... and get rid of a bunch of bugs in it. Background: the reason for path_mountpoint() is that umount() really doesn't want attempts to revalidate the root of what it's trying to umount. The thing we want to avoid actually happen from complete_walk(); solution was to do something parallel to normal path_lookupat() and it both went overboard and got the boilerplate subtly (and not so subtly) wrong. A better solution is to do pretty much what the normal path_lookupat() does, but instead of complete_walk() do unlazy_walk(). All it takes to avoid that ->d_weak_revalidate() call... mountpoint_last() goes away, along with everything it got wrong, and so does the magic around LOOKUP_NO_REVAL. Another source of bugs is that when we traverse mounts at the final location (and we need to do that - umount . expects to get whatever's overmounting ., if any, out of the lookup) we really ought to take care of ->d_manage() - as it is, manual umount of autofs automount in progress can lead to unpleasant surprises for the daemon. Easily solved by using handle_lookup_down() instead of follow_mount(). Tested-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-01-14Merge tag 'asm-generic-5.5' of ↵Linus Torvalds1-1/+32
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground Pull asm-generic fixes from Arnd Bergmann: "Here are two bugfixes from Mike Rapoport, both fixing compile-time errors for the nds32 architecture that were recently introduced" * tag 'asm-generic-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground: nds32: fix build failure caused by page table folding updates asm-generic/nds32: don't redefine cacheflush primitives
2020-01-14Merge branch 'dhowells' (patches from DavidH)Linus Torvalds1-9/+3
Merge misc fixes from David Howells. Two afs fixes and a key refcounting fix. * dhowells: afs: Fix afs_lookup() to not clobber the version on a new dentry afs: Fix use-after-loss-of-ref keys: Fix request_key() cache
2020-01-14afs: Fix use-after-loss-of-refDavid Howells1-9/+3
afs_lookup() has a tracepoint to indicate the outcome of d_splice_alias(), passing it the inode to retrieve the fid from. However, the function gave up its ref on that inode when it called d_splice_alias(), which may have failed and dropped the inode. Fix this by caching the fid. Fixes: 80548b03991f ("afs: Add more tracepoints") Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-13mm: khugepaged: add trace status description for SCAN_PAGE_HAS_PRIVATEYang Shi1-1/+2
Commit 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") introduced a new khugepaged scan result: SCAN_PAGE_HAS_PRIVATE, but the corresponding description for trace events were not added. Link: http://lkml.kernel.org/r/1574793844-2914-1-git-send-email-yang.shi@linux.alibaba.com Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Cc: Song Liu <songliubraving@fb.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-13mm, debug_pagealloc: don't rely on static keys too earlyVlastimil Babka1-3/+15
Commit 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable debugging") has introduced a static key to reduce overhead when debug_pagealloc is compiled in but not enabled. It relied on the assumption that jump_label_init() is called before parse_early_param() as in start_kernel(), so when the "debug_pagealloc=on" option is parsed, it is safe to enable the static key. However, it turns out multiple architectures call parse_early_param() earlier from their setup_arch(). x86 also calls jump_label_init() even earlier, so no issue was found while testing the commit, but same is not true for e.g. ppc64 and s390 where the kernel would not boot with debug_pagealloc=on as found by our QA. To fix this without tricky changes to init code of multiple architectures, this patch partially reverts the static key conversion from 96a2b03f281d. Init-time and non-fastpath calls (such as in arch code) of debug_pagealloc_enabled() will again test a simple bool variable. Fastpath mm code is converted to a new debug_pagealloc_enabled_static() variant that relies on the static key, which is enabled in a well-defined point in mm_init() where it's guaranteed that jump_label_init() has been called, regardless of architecture. [sfr@canb.auug.org.au: export _debug_pagealloc_enabled_early] Link: http://lkml.kernel.org/r/20200106164944.063ac07b@canb.auug.org.au Link: http://lkml.kernel.org/r/20191219130612.23171-1-vbabka@suse.cz Fixes: 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable debugging") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Qian Cai <cai@lca.pw> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-13mm: memcg/slab: fix percpu slab vmstats flushingRoman Gushchin1-3/+2
Currently slab percpu vmstats are flushed twice: during the memcg offlining and just before freeing the memcg structure. Each time percpu counters are summed, added to the atomic counterparts and propagated up by the cgroup tree. The second flushing is required due to how recursive vmstats are implemented: counters are batched in percpu variables on a local level, and once a percpu value is crossing some predefined threshold, it spills over to atomic values on the local and each ascendant levels. It means that without flushing some numbers cached in percpu variables will be dropped on floor each time a cgroup is destroyed. And with uptime the error on upper levels might become noticeable. The first flushing aims to make counters on ancestor levels more precise. Dying cgroups may resume in the dying state for a long time. After kmem_cache reparenting which is performed during the offlining slab counters of the dying cgroup don't have any chances to be updated, because any slab operations will be performed on the parent level. It means that the inaccuracy caused by percpu batching will not decrease up to the final destruction of the cgroup. By the original idea flushing slab counters during the offlining should minimize the visible inaccuracy of slab counters on the parent level. The problem is that percpu counters are not zeroed after the first flushing. So every cached percpu value is summed twice. It creates a small error (up to 32 pages per cpu, but usually less) which accumulates on parent cgroup level. After creating and destroying of thousands of child cgroups, slab counter on parent level can be way off the real value. For now, let's just stop flushing slab counters on memcg offlining. It can't be done correctly without scheduling a work on each cpu: reading and zeroing it during css offlining can race with an asynchronous update, which doesn't expect values to be changed underneath. With this change, slab counters on parent level will become eventually consistent. Once all dying children are gone, values are correct. And if not, the error is capped by 32 * NR_CPUS pages per dying cgroup. It's not perfect, as slab are reparented, so any updates after the reparenting will happen on the parent level. It means that if a slab page was allocated, a counter on child level was bumped, then the page was reparented and freed, the annihilation of positive and negative counter values will not happen until the child cgroup is released. It makes slab counters different from others, and it might want us to implement flushing in a correct form again. But it's also a question of performance: scheduling a work on each cpu isn't free, and it's an open question if the benefit of having more accurate counters is worth it. We might also consider flushing all counters on offlining, not only slab counters. So let's fix the main problem now: make the slab counters eventually consistent, so at least the error won't grow with uptime (or more precisely the number of created and destroyed cgroups). And think about the accuracy of counters separately. Link: http://lkml.kernel.org/r/20191220042728.1045881-1-guro@fb.com Fixes: bee07b33db78 ("mm: memcontrol: flush percpu slab vmstats on kmem offlining") Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-12Merge tag 'riscv/for-v5.5-rc6' of ↵Linus Torvalds1-0/+16
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Paul Walmsley: "Two fixes for RISC-V: - Clear FP registers during boot when FP support is present, rather than when they aren't present - Move the header files associated with the SiFive L2 cache controller to drivers/soc (where the code was recently moved)" * tag 'riscv/for-v5.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Fixup obvious bug for fp-regs reset riscv: move sifive_l2_cache.h to include/soc
2020-01-12riscv: move sifive_l2_cache.h to include/socYash Shah1-0/+16
The commit 9209fb51896f ("riscv: move sifive_l2_cache.c to drivers/soc") moves the sifive L2 cache driver to driver/soc. It did not move the header file along with the driver. Therefore this patch moves the header file to driver/soc Signed-off-by: Yash Shah <yash.shah@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> [paul.walmsley@sifive.com: updated to fix the include guard] Fixes: 9209fb51896f ("riscv: move sifive_l2_cache.c to drivers/soc") Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2020-01-10Merge tag 'amlogic-fixes' of ↵Olof Johansson1-3/+3
https://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic into arm/fixes arm-soc: Amlogic fixes for v5.5-rc * tag 'amlogic-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic: arm64: dts: meson-sm1-sei610: add gpio bluetooth interrupt dt-bindings: reset: meson8b: fix duplicate reset IDs soc: amlogic: meson-ee-pwrc: propagate errors from pm_genpd_init() soc: amlogic: meson-ee-pwrc: propagate PD provider registration errors ARM: dts: meson8: fix the size of the PMU registers arm64: dts: meson-sm1-sei610: gpio-keys: switch to IRQs Link: https://lore.kernel.org/r/7hmuaweavi.fsf@baylibre.com Signed-off-by: Olof Johansson <olof@lixom.net>
2020-01-10Merge tag 'block-5.5-2020-01-10' of git://git.kernel.dk/linux-blockLinus Torvalds1-22/+0
Pull block fixes from Jens Axboe: "A few fixes that should go into this round. This pull request contains two NVMe fixes via Keith, removal of a dead function, and a fix for the bio op for read truncates (Ming)" * tag 'block-5.5-2020-01-10' of git://git.kernel.dk/linux-block: nvmet: fix per feat data len for get_feature nvme: Translate more status codes to blk_status_t fs: move guard_bio_eod() after bio_set_op_attrs block: remove unused mp_bvec_last_segment
2020-01-10Merge tag 'mtd/fixes-for-5.5-rc6' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux Pull MTD fixes from Miquel Raynal: "MTD: - sm_ftl: Fix NULL pointer warning. Raw NAND: - Cadence: fix compile testing. - STM32: Avoid locking. Onenand: - Fix several sparse/build warnings. SPI-NOR: - Add a flag to fix interaction with Micron parts" * tag 'mtd/fixes-for-5.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: mtd: spi-nor: Fix the writing of the Status Register on micron flashes mtd: sm_ftl: fix NULL pointer warning mtd: onenand: omap2: Pass correct flags for prep_dma_memcpy mtd: onenand: samsung: Fix iomem access with regular memcpy mtd: onenand: omap2: Fix errors in style mtd: cadence: Fix cast to pointer from integer of different size warning mtd: rawnand: stm32_fmc2: avoid to lock the CPU bus
2020-01-09Merge branch 'for-linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: "Just a few small fixups here" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: imx_sc_key - only take the valid data from SCU firmware as key state Input: add safety guards to input_set_keycode() Input: input_event - fix struct padding on sparc64 Input: uinput - always report EPOLLOUT
2020-01-09mtd: onenand: omap2: Fix errors in styleAmir Mahdi Ghorbanian1-1/+1
Correct mispelling, spacing, and coding style flaws caught by checkpatch.pl script in the Omap2 Onenand driver . Signed-off-by: Amir Mahdi Ghorbanian <indigoomega021@gmail.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2020-01-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds4-1/+49
Pull networking fixes from David Miller: 1) Missing netns pointer init in arp_tables, from Florian Westphal. 2) Fix normal tcp SACK being treated as D-SACK, from Pengcheng Yang. 3) Fix divide by zero in sch_cake, from Wen Yang. 4) Len passed to skb_put_padto() is wrong in qrtr code, from Carl Huang. 5) cmd->obj.chunk is leaked in sctp code error paths, from Xin Long. 6) cgroup bpf programs can be released out of order, fix from Roman Gushchin. 7) Make sure stmmac debugfs entry name is changed when device name changes, from Jiping Ma. 8) Fix memory leak in vlan_dev_set_egress_priority(), from Eric Dumazet. 9) SKB leak in lan78xx usb driver, also from Eric Dumazet. 10) Ridiculous TCA_FQ_QUANTUM values configured can cause loops in fq packet scheduler, reject them. From Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits) tipc: fix wrong connect() return code tipc: fix link overflow issue at socket shutdown netfilter: ipset: avoid null deref when IPSET_ATTR_LINENO is present netfilter: conntrack: dccp, sctp: handle null timeout argument atm: eni: fix uninitialized variable warning macvlan: do not assume mac_header is set in macvlan_broadcast() net: sch_prio: When ungrafting, replace with FIFO mlxsw: spectrum_qdisc: Ignore grafting of invisible FIFO MAINTAINERS: Remove myself as co-maintainer for qcom-ethqos gtp: fix bad unlock balance in gtp_encap_enable_socket pkt_sched: fq: do not accept silly TCA_FQ_QUANTUM tipc: remove meaningless assignment in Makefile tipc: do not add socket.o to tipc-y twice net: stmmac: dwmac-sun8i: Allow all RGMII modes net: stmmac: dwmac-sunxi: Allow all RGMII modes net: usb: lan78xx: fix possible skb leak net: stmmac: Fixed link does not need MDIO Bus vlan: vlan_changelink() should propagate errors vlan: fix memory leak in vlan_dev_set_egress_priority stmmac: debugfs entry name is not be changed when udev rename device name. ...
2020-01-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-0/+6
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Missing netns context in arp_tables, from Florian Westphal. 2) Underflow in flowtable reference counter, from wenxu. 3) Fix incorrect ethernet destination address in flowtable offload, from wenxu. 4) Check for status of neighbour entry, from wenxu. 5) Fix NAT port mangling, from wenxu. 6) Unbind callbacks from destroy path to cleanup hardware properly on flowtable removal. 7) Fix missing casting statistics timestamp, add nf_flowtable_time_stamp and use it. 8) NULL pointer exception when timeout argument is null in conntrack dccp and sctp protocol helpers, from Florian Westphal. 9) Possible nul-dereference in ipset with IPSET_ATTR_LINENO, also from Florian. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-08macvlan: do not assume mac_header is set in macvlan_broadcast()Eric Dumazet1-0/+8
Use of eth_hdr() in tx path is error prone. Many drivers call skb_reset_mac_header() before using it, but others do not. Commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()") attempted to fix this generically, but commit d346a3fae3ff ("packet: introduce PACKET_QDISC_BYPASS socket option") brought back the macvlan bug. Lets add a new helper, so that tx paths no longer have to call skb_reset_mac_header() only to get a pointer to skb->data. Hopefully we will be able to revert 6d1ccff62780 ("net: reset mac header in dev_start_xmit()") and save few cycles in transmit fast path. BUG: KASAN: use-after-free in __get_unaligned_cpu32 include/linux/unaligned/packed_struct.h:19 [inline] BUG: KASAN: use-after-free in mc_hash drivers/net/macvlan.c:251 [inline] BUG: KASAN: use-after-free in macvlan_broadcast+0x547/0x620 drivers/net/macvlan.c:277 Read of size 4 at addr ffff8880a4932401 by task syz-executor947/9579 CPU: 0 PID: 9579 Comm: syz-executor947 Not tainted 5.5.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x197/0x210 lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374 __kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506 kasan_report+0x12/0x20 mm/kasan/common.c:639 __asan_report_load_n_noabort+0xf/0x20 mm/kasan/generic_report.c:145 __get_unaligned_cpu32 include/linux/unaligned/packed_struct.h:19 [inline] mc_hash drivers/net/macvlan.c:251 [inline] macvlan_broadcast+0x547/0x620 drivers/net/macvlan.c:277 macvlan_queue_xmit drivers/net/macvlan.c:520 [inline] macvlan_start_xmit+0x402/0x77f drivers/net/macvlan.c:559 __netdev_start_xmit include/linux/netdevice.h:4447 [inline] netdev_start_xmit include/linux/netdevice.h:4461 [inline] dev_direct_xmit+0x419/0x630 net/core/dev.c:4079 packet_direct_xmit+0x1a9/0x250 net/packet/af_packet.c:240 packet_snd net/packet/af_packet.c:2966 [inline] packet_sendmsg+0x260d/0x6220 net/packet/af_packet.c:2991 sock_sendmsg_nosec net/socket.c:639 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:659 __sys_sendto+0x262/0x380 net/socket.c:1985 __do_sys_sendto net/socket.c:1997 [inline] __se_sys_sendto net/socket.c:1993 [inline] __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1993 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x442639 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 5b 10 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffc13549e08 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000442639 RDX: 000000000000000e RSI: 0000000020000080 RDI: 0000000000000003 RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000403bb0 R14: 0000000000000000 R15: 0000000000000000 Allocated by task 9389: save_stack+0x23/0x90 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] __kasan_kmalloc mm/kasan/common.c:513 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:486 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:527 __do_kmalloc mm/slab.c:3656 [inline] __kmalloc+0x163/0x770 mm/slab.c:3665 kmalloc include/linux/slab.h:561 [inline] tomoyo_realpath_from_path+0xc5/0x660 security/tomoyo/realpath.c:252 tomoyo_get_realpath security/tomoyo/file.c:151 [inline] tomoyo_path_perm+0x230/0x430 security/tomoyo/file.c:822 tomoyo_inode_getattr+0x1d/0x30 security/tomoyo/tomoyo.c:129 security_inode_getattr+0xf2/0x150 security/security.c:1222 vfs_getattr+0x25/0x70 fs/stat.c:115 vfs_statx_fd+0x71/0xc0 fs/stat.c:145 vfs_fstat include/linux/fs.h:3265 [inline] __do_sys_newfstat+0x9b/0x120 fs/stat.c:378 __se_sys_newfstat fs/stat.c:375 [inline] __x64_sys_newfstat+0x54/0x80 fs/stat.c:375 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 9389: save_stack+0x23/0x90 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] kasan_set_free_info mm/kasan/common.c:335 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:474 kasan_slab_free+0xe/0x10 mm/kasan/common.c:483 __cache_free mm/slab.c:3426 [inline] kfree+0x10a/0x2c0 mm/slab.c:3757 tomoyo_realpath_from_path+0x1a7/0x660 security/tomoyo/realpath.c:289 tomoyo_get_realpath security/tomoyo/file.c:151 [inline] tomoyo_path_perm+0x230/0x430 security/tomoyo/file.c:822 tomoyo_inode_getattr+0x1d/0x30 security/tomoyo/tomoyo.c:129 security_inode_getattr+0xf2/0x150 security/security.c:1222 vfs_getattr+0x25/0x70 fs/stat.c:115 vfs_statx_fd+0x71/0xc0 fs/stat.c:145 vfs_fstat include/linux/fs.h:3265 [inline] __do_sys_newfstat+0x9b/0x120 fs/stat.c:378 __se_sys_newfstat fs/stat.c:375 [inline] __x64_sys_newfstat+0x54/0x80 fs/stat.c:375 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff8880a4932000 which belongs to the cache kmalloc-4k of size 4096 The buggy address is located 1025 bytes inside of 4096-byte region [ffff8880a4932000, ffff8880a4933000) The buggy address belongs to the page: page:ffffea0002924c80 refcount:1 mapcount:0 mapping:ffff8880aa402000 index:0x0 compound_mapcount: 0 raw: 00fffe0000010200 ffffea0002846208 ffffea00028f3888 ffff8880aa402000 raw: 0000000000000000 ffff8880a4932000 0000000100000001 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8880a4932300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880a4932380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff8880a4932400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8880a4932480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880a4932500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: b863ceb7ddce ("[NET]: Add macvlan driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06Merge tag 'trace-v5.5-rc5' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Various tracing fixes: - kbuild found missing define of MCOUNT_INSN_SIZE for various build configs - Initialize variable to zero as gcc thinks it is used undefined (it really isn't but the code is subtle enough that this doesn't hurt) - Convert from do_div() to div64_ull() to prevent potential divide by zero - Unregister a trace point on error path in sched_wakeup tracer - Use signed offset for archs that can have stext not be first - A simple indentation fix (whitespace error)" * tag 'trace-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix indentation issue kernel/trace: Fix do not unregister tracepoints when register sched_migrate_task fail tracing: Change offset type to s32 in preempt/irq tracepoints ftrace: Avoid potential division by zero in function profiler tracing: Have stack tracer compile when MCOUNT_INSN_SIZE is not defined tracing: Define MCOUNT_INSN_SIZE when not defined without direct calls tracing: Initialize val to zero in parse_entry of inject code
2020-01-06net: ethernet: sxgbe: Rename Samsung to lowercaseKrzysztof Kozlowski1-1/+1
Fix up inconsistent usage of upper and lowercase letters in "Samsung" name. "SAMSUNG" is not an abbreviation but a regular trademarked name. Therefore it should be written with lowercase letters starting with capital letter. Although advertisement materials usually use uppercase "SAMSUNG", the lowercase version is used in all legal aspects (e.g. on Wikipedia and in privacy/legal statements on https://www.samsung.com/semiconductor/privacy-global/). Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-06Merge tag 'spi-fix-v5.5-rc5' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A small collection of fixes here, one to make the newly added PTP timestamping code more accurate, a few driver fixes and a fix for the core DT binding to document the fact that we support eight wire buses" * tag 'spi-fix-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: Document Octal mode as valid SPI bus width spi: spi-dw: Add lock protect dw_spi rx/tx to prevent concurrent calls spi: spi-fsl-dspi: Fix 16-bit word order in 32-bit XSPI mode spi: Don't look at TX buffer for PTP system timestamping spi: uniphier: Fix FIFO threshold
2020-01-06Merge tag 'rtc-5.5-2' of ↵Linus Torvalds1-0/+8
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux Pull RTC fixes from Alexandre Belloni: "A few fixes for this cycle. The CMOS AltCentury support broke a few platforms with a recent BIOS so I reverted it. The mt6397 fix is not that critical but good to have. And finally, the sun6i fix repairs WiFi and BT on a few platforms. Summary: - cmos: revert AltCentury support on AMD/Hygon - mt6397: fix alarm register overwrite - sun6i: ensure clock is working on R40" * tag 'rtc-5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: rtc: cmos: Revert "rtc: Fix the AltCentury value on AMD/Hygon platform" rtc: mt6397: fix alarm register overwrite rtc: sun6i: Add support for RTC clocks on R40
2020-01-06netfilter: flowtable: add nf_flowtable_time_stampPablo Neira Ayuso1-0/+6
This patch adds nf_flowtable_time_stamp and updates the existing code to use it. This patch is also implicitly fixing up hardware statistic fetching via nf_flow_offload_stats() where casting to u32 is missing. Use nf_flow_timeout_delta() to fix this. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: wenxu <wenxu@ucloud.cn>
2020-01-04block: remove unused mp_bvec_last_segmentJens Axboe1-22/+0
After commit 85a8ce62c2ea ("block: add bio_truncate to fix guard_bio_eod") this function is unused, remove it. Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-04kcov: fix struct layout for kcov_remote_argAndrey Konovalov1-5/+5
Make the layout of kcov_remote_arg the same for 32-bit and 64-bit code. This makes it more convenient to write userspace apps that can be compiled into 32-bit or 64-bit binaries and still work with the same 64-bit kernel. Also use proper __u32 types in uapi headers instead of unsigned ints. Link: http://lkml.kernel.org/r/9e91020876029cfefc9211ff747685eba9536426.1575638983.git.andreyknvl@google.com Fixes: eec028c9386ed1a ("kcov: remote coverage support") Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Felipe Balbi <balbi@kernel.org> Cc: Chunfeng Yun <chunfeng.yun@mediatek.com> Cc: "Jacky . Cao @ sony . com" <Jacky.Cao@sony.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-04mm/memory_hotplug: shrink zones when offlining memoryDavid Hildenbrand1-2/+5
We currently try to shrink a single zone when removing memory. We use the zone of the first page of the memory we are removing. If that memmap was never initialized (e.g., memory was never onlined), we will read garbage and can trigger kernel BUGs (due to a stale pointer): BUG: unable to handle page fault for address: 000000000000353d #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] SMP PTI CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 5.3.0-rc5-next-20190820+ #317 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4 Workqueue: kacpi_hotplug acpi_hotplug_work_fn RIP: 0010:clear_zone_contiguous+0x5/0x10 Code: 48 89 c6 48 89 c3 e8 2a fe ff ff 48 85 c0 75 cf 5b 5d c3 c6 85 fd 05 00 00 01 5b 5d c3 0f 1f 840 RSP: 0018:ffffad2400043c98 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000200000000 RCX: 0000000000000000 RDX: 0000000000200000 RSI: 0000000000140000 RDI: 0000000000002f40 RBP: 0000000140000000 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000 R13: 0000000000140000 R14: 0000000000002f40 R15: ffff9e3e7aff3680 FS: 0000000000000000(0000) GS:ffff9e3e7bb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000353d CR3: 0000000058610000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __remove_pages+0x4b/0x640 arch_remove_memory+0x63/0x8d try_remove_memory+0xdb/0x130 __remove_memory+0xa/0x11 acpi_memory_device_remove+0x70/0x100 acpi_bus_trim+0x55/0x90 acpi_device_hotplug+0x227/0x3a0 acpi_hotplug_work_fn+0x1a/0x30 process_one_work+0x221/0x550 worker_thread+0x50/0x3b0 kthread+0x105/0x140 ret_from_fork+0x3a/0x50 Modules linked in: CR2: 000000000000353d Instead, shrink the zones when offlining memory or when onlining failed. Introduce and use remove_pfn_range_from_zone(() for that. We now properly shrink the zones, even if we have DIMMs whereby - Some memory blocks fall into no zone (never onlined) - Some memory blocks fall into multiple zones (offlined+re-onlined) - Multiple memory blocks that fall into different zones Drop the zone parameter (with a potential dubious value) from __remove_pages() and __remove_section(). Link: http://lkml.kernel.org/r/20191006085646.5768-6-david@redhat.com Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319] Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: <stable@vger.kernel.org> [5.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-04Merge tag 'dmaengine-fix-5.5-rc5' of ↵Linus Torvalds1-1/+4
git://git.infradead.org/users/vkoul/slave-dma Pull dmaengine fixes from Vinod Koul: "A bunch of fixes for: - uninitialized dma_slave_caps access - virt-dma use after free in vchan_complete() - driver fixes for ioat, k3dma and jz4780" * tag 'dmaengine-fix-5.5-rc5' of git://git.infradead.org/users/vkoul/slave-dma: ioat: ioat_alloc_ring() failure handling. dmaengine: virt-dma: Fix access after free in vchan_complete() dmaengine: k3dma: Avoid null pointer traversal dmaengine: dma-jz4780: Also break descriptor chains on JZ4725B dmaengine: Fix access to uninitialized dma_slave_caps
2020-01-03Merge tag 'block-5.5-20200103' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+1
Pull block fixes from Jens Axboe: "Three fixes in here: - Fix for a missing split on default memory boundary mask (4G) (Ming) - Fix for multi-page read bio truncate (Ming) - Fix for null_blk zone close request handling (Damien)" * tag 'block-5.5-20200103' of git://git.kernel.dk/linux-block: null_blk: Fix REQ_OP_ZONE_CLOSE handling block: fix splitting segments on boundary masks block: add bio_truncate to fix guard_bio_eod
2020-01-03tracing: Change offset type to s32 in preempt/irq tracepointsJoel Fernandes (Google)1-4/+4
Discussion in the below link reported that symbols in modules can appear to be before _stext on ARM architecture, causing wrapping with the offsets of this tracepoint. Change the offset type to s32 to fix this. Link: http://lore.kernel.org/r/20191127154428.191095-1-antonio.borneo@st.com Link: http://lkml.kernel.org/r/20200102194625.226436-1-joel@joelfernandes.org Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: David Sterba <dsterba@suse.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Antonio Borneo <antonio.borneo@st.com> Cc: stable@vger.kernel.org Fixes: d59158162e032 ("tracing: Add support for preempt and irq enable/disable events") Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-02Merge tag 'sizeof_field-v5.5-rc5' of ↵Linus Torvalds1-9/+0
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull final sizeof_field conversion from Kees Cook: "Remove now unused FIELD_SIZEOF() macro (Kees Cook)" * tag 'sizeof_field-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: kernel.h: Remove unused FIELD_SIZEOF()
2020-01-02Revert "fs: remove ksys_dup()"Dominik Brodowski1-0/+1
This reverts commit 8243186f0cc7 ("fs: remove ksys_dup()") and the subsequent fix for it in commit 2d3145f8d280 ("early init: fix error handling when opening /dev/console"). Trying to use filp_open() and f_dupfd() instead of pseudo-syscalls caused more trouble than what is worth it: it requires accessing vfs internals and it turns out there were other bugs in it too. In particular, the file reference counting was wrong - because unlike the original "open+2*dup" sequence it used "filp_open+3*f_dupfd" and thus had an extra leaked file reference. That in turn then caused odd problems with Androidx86 long after boot becaue of how the extra reference to the console kept the session active even after all file descriptors had been closed. Reported-by: youling 257 <youling257@gmail.com> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-02can: can_dropped_invalid_skb(): ensure an initialized headroom in outgoing ↵Oliver Hartkopp1-0/+34
CAN sk_buffs KMSAN sysbot detected a read access to an untinitialized value in the headroom of an outgoing CAN related sk_buff. When using CAN sockets this area is filled appropriately - but when using a packet socket this initialization is missing. The problematic read access occurs in the CAN receive path which can only be triggered when the sk_buff is sent through a (virtual) CAN interface. So we check in the sending path whether we need to perform the missing initializations. Fixes: d3b58c47d330d ("can: replace timestamp as unique skb attribute") Reported-by: syzbot+b02ff0707a97e4e79ebb@syzkaller.appspotmail.com Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Tested-by: Oliver Hartkopp <socketcan@hartkopp.net> Cc: linux-stable <stable@vger.kernel.org> # >= v4.1 Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2019-12-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds5-12/+30
Pull networking fixes from David Miller: 1) Fix big endian overflow in nf_flow_table, from Arnd Bergmann. 2) Fix port selection on big endian in nft_tproxy, from Phil Sutter. 3) Fix precision tracking for unbound scalars in bpf verifier, from Daniel Borkmann. 4) Fix integer overflow in socket rcvbuf check in UDP, from Antonio Messina. 5) Do not perform a neigh confirmation during a pmtu update over a tunnel, from Hangbin Liu. 6) Fix DMA mapping leak in dpaa_eth driver, from Madalin Bucur. 7) Various PTP fixes for sja1105 dsa driver, from Vladimir Oltean. 8) Add missing to dummy definition of of_mdiobus_child_is_phy(), from Geert Uytterhoeven * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits) hsr: fix slab-out-of-bounds Read in hsr_debugfs_rename() net/sched: add delete_empty() to filters and use it in cls_flower tcp: Fix highest_sack and highest_sack_seq ptp: fix the race between the release of ptp_clock and cdev net: dsa: sja1105: Reconcile the meaning of TPID and TPID2 for E/T and P/Q/R/S Documentation: net: dsa: sja1105: Remove text about taprio base-time limitation net: dsa: sja1105: Remove restriction of zero base-time for taprio offload net: dsa: sja1105: Really make the PTP command read-write net: dsa: sja1105: Take PTP egress timestamp by port, not mgmt slot cxgb4/cxgb4vf: fix flow control display for auto negotiation mlxsw: spectrum: Use dedicated policer for VRRP packets mlxsw: spectrum_router: Skip loopback RIFs during MAC validation net: stmmac: dwmac-meson8b: Fix the RGMII TX delay on Meson8b/8m2 SoCs net/sched: act_mirred: Pull mac prior redir to non mac_header_xmit device net_sched: sch_fq: properly set sk->sk_pacing_status bnx2x: Fix accounting of vlan resources among the PFs bnx2x: Use appropriate define for vlan credit of: mdio: Add missing inline to of_mdiobus_child_is_phy() dummy net: phy: aquantia: add suspend / resume ops for AQR105 dpaa_eth: fix DMA mapping leak ...
2019-12-30net/sched: add delete_empty() to filters and use it in cls_flowerDavide Caratti1-0/+5
Revert "net/sched: cls_u32: fix refcount leak in the error path of u32_change()", and fix the u32 refcount leak in a more generic way that preserves the semantic of rule dumping. On tc filters that don't support lockless insertion/removal, there is no need to guard against concurrent insertion when a removal is in progress. Therefore, for most of them we can avoid a full walk() when deleting, and just decrease the refcount, like it was done on older Linux kernels. This fixes situations where walk() was wrongly detecting a non-empty filter, like it happened with cls_u32 in the error path of change(), thus leading to failures in the following tdc selftests: 6aa7: (filter, u32) Add/Replace u32 with source match and invalid indev 6658: (filter, u32) Add/Replace u32 with custom hash table and invalid handle 74c2: (filter, u32) Add/Replace u32 filter with invalid hash table id On cls_flower, and on (future) lockless filters, this check is necessary: move all the check_empty() logic in a callback so that each filter can have its own implementation. For cls_flower, it's sufficient to check if no IDRs have been allocated. This reverts commit 275c44aa194b7159d1191817b20e076f55f0e620. Changes since v1: - document the need for delete_empty() when TCF_PROTO_OPS_DOIT_UNLOCKED is used, thanks to Vlad Buslov - implement delete_empty() without doing fl_walk(), thanks to Vlad Buslov - squash revert and new fix in a single patch, to be nice with bisect tests that run tdc on u32 filter, thanks to Dave Miller Fixes: 275c44aa194b ("net/sched: cls_u32: fix refcount leak in the error path of u32_change()") Fixes: 6676d5e416ee ("net: sched: set dedicated tcf_walker flag when tp is empty") Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com> Suggested-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Tested-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-30ptp: fix the race between the release of ptp_clock and cdevVladis Dronov1-8/+11
In a case when a ptp chardev (like /dev/ptp0) is open but an underlying device is removed, closing this file leads to a race. This reproduces easily in a kvm virtual machine: ts# cat openptp0.c int main() { ... fp = fopen("/dev/ptp0", "r"); ... sleep(10); } ts# uname -r 5.5.0-rc3-46cf053e ts# cat /proc/cmdline ... slub_debug=FZP ts# modprobe ptp_kvm ts# ./openptp0 & [1] 670 opened /dev/ptp0, sleeping 10s... ts# rmmod ptp_kvm ts# ls /dev/ptp* ls: cannot access '/dev/ptp*': No such file or directory ts# ...woken up [ 48.010809] general protection fault: 0000 [#1] SMP [ 48.012502] CPU: 6 PID: 658 Comm: openptp0 Not tainted 5.5.0-rc3-46cf053e #25 [ 48.014624] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ... [ 48.016270] RIP: 0010:module_put.part.0+0x7/0x80 [ 48.017939] RSP: 0018:ffffb3850073be00 EFLAGS: 00010202 [ 48.018339] RAX: 000000006b6b6b6b RBX: 6b6b6b6b6b6b6b6b RCX: ffff89a476c00ad0 [ 48.018936] RDX: fffff65a08d3ea08 RSI: 0000000000000247 RDI: 6b6b6b6b6b6b6b6b [ 48.019470] ... ^^^ a slub poison [ 48.023854] Call Trace: [ 48.024050] __fput+0x21f/0x240 [ 48.024288] task_work_run+0x79/0x90 [ 48.024555] do_exit+0x2af/0xab0 [ 48.024799] ? vfs_write+0x16a/0x190 [ 48.025082] do_group_exit+0x35/0x90 [ 48.025387] __x64_sys_exit_group+0xf/0x10 [ 48.025737] do_syscall_64+0x3d/0x130 [ 48.026056] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 48.026479] RIP: 0033:0x7f53b12082f6 [ 48.026792] ... [ 48.030945] Modules linked in: ptp i6300esb watchdog [last unloaded: ptp_kvm] [ 48.045001] Fixing recursive fault but reboot is needed! This happens in: static void __fput(struct file *file) { ... if (file->f_op->release) file->f_op->release(inode, file); <<< cdev is kfree'd here if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL && !(mode & FMODE_PATH))) { cdev_put(inode->i_cdev); <<< cdev fields are accessed here Namely: __fput() posix_clock_release() kref_put(&clk->kref, delete_clock) <<< the last reference delete_clock() delete_ptp_clock() kfree(ptp) <<< cdev is embedded in ptp cdev_put module_put(p->owner) <<< *p is kfree'd, bang! Here cdev is embedded in posix_clock which is embedded in ptp_clock. The race happens because ptp_clock's lifetime is controlled by two refcounts: kref and cdev.kobj in posix_clock. This is wrong. Make ptp_clock's sysfs device a parent of cdev with cdev_device_add() created especially for such cases. This way the parent device with its ptp_clock is not released until all references to the cdev are released. This adds a requirement that an initialized but not exposed struct device should be provided to posix_clock_register() by a caller instead of a simple dev_t. This approach was adopted from the commit 72139dfa2464 ("watchdog: Fix the race between the release of watchdog_core_data and cdev"). See details of the implementation in the commit 233ed09d7fda ("chardev: add helper function to register char devs with a struct device"). Link: https://lore.kernel.org/linux-fsdevel/20191125125342.6189-1-vdronov@redhat.com/T/#u Analyzed-by: Stephen Johnston <sjohnsto@redhat.com> Analyzed-by: Vern Lovejoy <vlovejoy@redhat.com> Signed-off-by: Vladis Dronov <vdronov@redhat.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-30kernel.h: Remove unused FIELD_SIZEOF()Kees Cook1-9/+0
Now that all callers of FIELD_SIZEOF() have been converted to sizeof_field(), remove the unused prior macro. Signed-off-by: Kees Cook <keescook@chromium.org>
2019-12-30asm-generic/nds32: don't redefine cacheflush primitivesMike Rapoport1-1/+32
The commit c296d4dc13ae ("asm-generic: fix a compilation warning") changed asm-generic/cachflush.h to use static inlines instead of macros and as a result the nds32 build with CONFIG_CPU_CACHE_ALIASING=n fails: CC init/main.o In file included from arch/nds32/include/asm/cacheflush.h:43, from include/linux/highmem.h:12, from include/linux/pagemap.h:11, from include/linux/blkdev.h:16, from include/linux/blk-cgroup.h:23, from include/linux/writeback.h:14, from init/main.c:44: include/asm-generic/cacheflush.h:50:20: error: static declaration of 'flush_icache_range' follows non-static declaration static inline void flush_icache_range(unsigned long start, unsigned long end) ^~~~~~~~~~~~~~~~~~ In file included from include/linux/highmem.h:12, from include/linux/pagemap.h:11, from include/linux/blkdev.h:16, from include/linux/blk-cgroup.h:23, from include/linux/writeback.h:14, from init/main.c:44: arch/nds32/include/asm/cacheflush.h:11:6: note: previous declaration of 'flush_icache_range' was here void flush_icache_range(unsigned long start, unsigned long end); ^~~~~~~~~~~~~~~~~~ Surround the inline functions in asm-generic/cacheflush.h by ifdef's so that architectures could override them and add the required overrides to nds32. Fixes: c296d4dc13ae ("asm-generic: fix a compilation warning") Link: https://lore.kernel.org/lkml/201912212139.yptX8CsV%25lkp@intel.com/ Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Greentime Hu <green.hu@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-12-28block: add bio_truncate to fix guard_bio_eodMing Lei1-0/+1
Some filesystem, such as vfat, may send bio which crosses device boundary, and the worse thing is that the IO request starting within device boundaries can contain more than one segment past EOD. Commit dce30ca9e3b6 ("fs: fix guard_bio_eod to check for real EOD errors") tries to fix this issue by returning -EIO for this situation. However, this way lets fs user code lose chance to handle -EIO, then sync_inodes_sb() may hang for ever. Also the current truncating on last segment is dangerous by updating the last bvec, given bvec table becomes not immutable any more, and fs bio users may not retrieve the truncated pages via bio_for_each_segment_all() in its .end_io callback. Fixes this issue by supporting multi-segment truncating. And the approach is simpler: - just update bio size since block layer can make correct bvec with the updated bio size. Then bvec table becomes really immutable. - zero all truncated segments for read bio Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: linux-fsdevel@vger.kernel.org Fixed-by: dce30ca9e3b6 ("fs: fix guard_bio_eod to check for real EOD errors") Reported-by: syzbot+2b9e54155c8c25d8d165@syzkaller.appspotmail.com Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-27spi: Don't look at TX buffer for PTP system timestampingVladimir Oltean1-2/+2
The API for PTP system timestamping (associating a SPI transaction with the system time at which it was transferred) is flawed: it assumes that the xfer->tx_buf pointer will always be present. This is, of course, not always the case. So introduce a "progress" variable that denotes how many word have been transferred. Fix the Freescale DSPI driver, the only user of the API so far, in the same patch. Fixes: b42faeee718c ("spi: Add a PTP system timestamp to the transfer structure") Fixes: d6b71dfaeeba ("spi: spi-fsl-dspi: Implement the PTP system timestamping for TCFQ mode") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20191227012417.1057-1-olteanv@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2019-12-26of: mdio: Add missing inline to of_mdiobus_child_is_phy() dummyGeert Uytterhoeven1-1/+1
If CONFIG_OF_MDIO=n: drivers/net/phy/mdio_bus.c:23: include/linux/of_mdio.h:58:13: warning: ‘of_mdiobus_child_is_phy’ defined but not used [-Wunused-function] static bool of_mdiobus_child_is_phy(struct device_node *child) ^~~~~~~~~~~~~~~~~~~~~~~ Fix this by adding the missing "inline" keyword. Fixes: 0aa4d016c043d16a ("of: mdio: export of_mdiobus_child_is_phy") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25ata: libahci_platform: Export again ahci_platform_<en/dis>able_phys()Florian Fainelli1-0/+2
This reverts commit 6bb86fefa086faba7b60bb452300b76a47cde1a5 ("libahci_platform: Staticize ahci_platform_<en/dis>able_phys()") we are going to need ahci_platform_{enable,disable}_phys() in a subsequent commit for ahci_brcm.c in order to properly control the PHY initialization order. Also make sure the function prototypes are declared in include/linux/ahci_platform.h as a result. Cc: stable@vger.kernel.org Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-25libata: Fix retrieving of active qcsSascha Hauer1-0/+1
ata_qc_complete_multiple() is called with a mask of the still active tags. mv_sata doesn't have this information directly and instead calculates the still active tags from the started tags (ap->qc_active) and the finished tags as (ap->qc_active ^ done_mask) Since 28361c40368 the hw_tag and tag are no longer the same and the equation is no longer valid. In ata_exec_internal_sg() ap->qc_active is initialized as 1ULL << ATA_TAG_INTERNAL, but in hardware tag 0 is started and this will be in done_mask on completion. ap->qc_active ^ done_mask becomes 0x100000000 ^ 0x1 = 0x100000001 and thus tag 0 used as the internal tag will never be reported as completed. This is fixed by introducing ata_qc_get_active() which returns the active hardware tags and calling it where appropriate. This is tested on mv_sata, but sata_fsl and sata_nv suffer from the same problem. There is another case in sata_nv that most likely needs fixing as well, but this looks a little different, so I wasn't confident enough to change that. Fixes: 28361c403683 ("libata: add extra internal command") Cc: stable@vger.kernel.org Tested-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Add missing export of ata_qc_get_active(), as per Pali. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-25rseq: Unregister rseq for clone CLONE_VMMathieu Desnoyers1-2/+2
It has been reported by Google that rseq is not behaving properly with respect to clone when CLONE_VM is used without CLONE_THREAD. It keeps the prior thread's rseq TLS registered when the TLS of the thread has moved, so the kernel can corrupt the TLS of the parent. The approach of clearing the per task-struct rseq registration on clone with CLONE_THREAD flag is incomplete. It does not cover the use-case of clone with CLONE_VM set, but without CLONE_THREAD. Here is the rationale for unregistering rseq on clone with CLONE_VM flag set: 1) CLONE_THREAD requires CLONE_SIGHAND, which requires CLONE_VM to be set. Therefore, just checking for CLONE_VM covers all CLONE_THREAD uses. There is no point in checking for both CLONE_THREAD and CLONE_VM, 2) There is the possibility of an unlikely scenario where CLONE_SETTLS is used without CLONE_VM. In order to be an issue, it would require that the rseq TLS is in a shared memory area. I do not plan on adding CLONE_SETTLS to the set of clone flags which unregister RSEQ, because it would require that we also unregister RSEQ on set_thread_area(2) and arch_prctl(2) ARCH_SET_FS for completeness. So rather than doing a partial solution, it appears better to let user-space explicitly perform rseq unregistration across clone if needed in scenarios where CLONE_VM is not set. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191211161713.4490-3-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-24net/dst: do not confirm neighbor for vxlan and geneve pmtu updateHangbin Liu1-1/+1
When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end, we should not call dst_confirm_neigh() as there is no two-way communication. So disable the neigh confirm for vxlan and geneve pmtu update. v5: No change. v4: No change. v3: Do not remove dst_confirm_neigh, but add a new bool parameter in dst_ops.update_pmtu to control whether we should do neighbor confirm. Also split the big patch to small ones for each area. v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu. Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path") Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path") Reviewed-by: Guillaume Nault <gnault@redhat.com> Tested-by: Guillaume Nault <gnault@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-24net/dst: add new function skb_dst_update_pmtu_no_confirmHangbin Liu1-0/+9
Add a new function skb_dst_update_pmtu_no_confirm() for callers who need update pmtu but should not do neighbor confirm. v5: No change. v4: No change. v3: Do not remove dst_confirm_neigh, but add a new bool parameter in dst_ops.update_pmtu to control whether we should do neighbor confirm. Also split the big patch to small ones for each area. v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu. Reviewed-by: Guillaume Nault <gnault@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-24net: add bool confirm_neigh parameter for dst_ops.update_pmtuHangbin Liu2-2/+3
The MTU update code is supposed to be invoked in response to real networking events that update the PMTU. In IPv6 PMTU update function __ip6_rt_update_pmtu() we called dst_confirm_neigh() to update neighbor confirmed time. But for tunnel code, it will call pmtu before xmit, like: - tnl_update_pmtu() - skb_dst_update_pmtu() - ip6_rt_update_pmtu() - __ip6_rt_update_pmtu() - dst_confirm_neigh() If the tunnel remote dst mac address changed and we still do the neigh confirm, we will not be able to update neigh cache and ping6 remote will failed. So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we should not be invoking dst_confirm_neigh() as we have no evidence of successful two-way communication at this point. On the other hand it is also important to keep the neigh reachability fresh for TCP flows, so we cannot remove this dst_confirm_neigh() call. To fix the issue, we have to add a new bool parameter for dst_ops.update_pmtu to choose whether we should do neigh update or not. I will add the parameter in this patch and set all the callers to true to comply with the previous way, and fix the tunnel code one by one on later patches. v5: No change. v4: No change. v3: Do not remove dst_confirm_neigh, but add a new bool parameter in dst_ops.update_pmtu to control whether we should do neighbor confirm. Also split the big patch to small ones for each area. v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu. Suggested-by: David Miller <davem@davemloft.net> Reviewed-by: Guillaume Nault <gnault@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>