summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2022-08-20mm/shmem: fix chattr fsflags support in tmpfsHugh Dickins2-32/+35
ext[234] have always allowed unimplemented chattr flags to be set, but other filesystems have tended to be stricter. Follow the stricter approach for tmpfs: I don't want to have to explain why csu attributes don't actually work, and we won't need to update the chattr(1) manpage; and it's never wrong to start off strict, relaxing later if persuaded. Allow only a (append only) i (immutable) A (no atime) and d (no dump). Although lsattr showed 'A' inherited, the NOATIME behavior was not being inherited: because nothing sync'ed FS_NOATIME_FL to S_NOATIME. Add shmem_set_inode_flags() to sync the flags, using inode_set_flags() to avoid that instant of lost immutablility during fileattr_set(). But that change switched generic/079 from passing to failing: because FS_IMMUTABLE_FL and FS_APPEND_FL had been unconventionally included in the INHERITED fsflags: remove them and generic/079 is back to passing. Link: https://lkml.kernel.org/r/2961dcb0-ddf3-b9f0-3268-12a4ff996856@google.com Fixes: e408e695f5f1 ("mm/shmem: support FS_IOC_[SG]ETFLAGS in tmpfs") Signed-off-by: Hugh Dickins <hughd@google.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Radoslaw Burny <rburny@google.com> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-20mm/hugetlb: support write-faults in shared mappingsDavid Hildenbrand1-7/+19
If we ever get a write-fault on a write-protected page in a shared mapping, we'd be in trouble (again). Instead, we can simply map the page writable. And in fact, there is even a way right now to trigger that code via uffd-wp ever since we stared to support it for shmem in 5.19: -------------------------------------------------------------------------- #include <stdio.h> #include <stdlib.h> #include <string.h> #include <fcntl.h> #include <unistd.h> #include <errno.h> #include <sys/mman.h> #include <sys/syscall.h> #include <sys/ioctl.h> #include <linux/userfaultfd.h> #define HUGETLB_SIZE (2 * 1024 * 1024u) static char *map; int uffd; static int temp_setup_uffd(void) { struct uffdio_api uffdio_api; struct uffdio_register uffdio_register; struct uffdio_writeprotect uffd_writeprotect; struct uffdio_range uffd_range; uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY); if (uffd < 0) { fprintf(stderr, "syscall() failed: %d\n", errno); return -errno; } uffdio_api.api = UFFD_API; uffdio_api.features = UFFD_FEATURE_PAGEFAULT_FLAG_WP; if (ioctl(uffd, UFFDIO_API, &uffdio_api) < 0) { fprintf(stderr, "UFFDIO_API failed: %d\n", errno); return -errno; } if (!(uffdio_api.features & UFFD_FEATURE_PAGEFAULT_FLAG_WP)) { fprintf(stderr, "UFFD_FEATURE_WRITEPROTECT missing\n"); return -ENOSYS; } /* Register UFFD-WP */ uffdio_register.range.start = (unsigned long) map; uffdio_register.range.len = HUGETLB_SIZE; uffdio_register.mode = UFFDIO_REGISTER_MODE_WP; if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) < 0) { fprintf(stderr, "UFFDIO_REGISTER failed: %d\n", errno); return -errno; } /* Writeprotect a single page. */ uffd_writeprotect.range.start = (unsigned long) map; uffd_writeprotect.range.len = HUGETLB_SIZE; uffd_writeprotect.mode = UFFDIO_WRITEPROTECT_MODE_WP; if (ioctl(uffd, UFFDIO_WRITEPROTECT, &uffd_writeprotect)) { fprintf(stderr, "UFFDIO_WRITEPROTECT failed: %d\n", errno); return -errno; } /* Unregister UFFD-WP without prior writeunprotection. */ uffd_range.start = (unsigned long) map; uffd_range.len = HUGETLB_SIZE; if (ioctl(uffd, UFFDIO_UNREGISTER, &uffd_range)) { fprintf(stderr, "UFFDIO_UNREGISTER failed: %d\n", errno); return -errno; } return 0; } int main(int argc, char **argv) { int fd; fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT); if (!fd) { fprintf(stderr, "open() failed\n"); return -errno; } if (ftruncate(fd, HUGETLB_SIZE)) { fprintf(stderr, "ftruncate() failed\n"); return -errno; } map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); if (map == MAP_FAILED) { fprintf(stderr, "mmap() failed\n"); return -errno; } *map = 0; if (temp_setup_uffd()) return 1; *map = 0; return 0; } -------------------------------------------------------------------------- Above test fails with SIGBUS when there is only a single free hugetlb page. # echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages # ./test Bus error (core dumped) And worse, with sufficient free hugetlb pages it will map an anonymous page into a shared mapping, for example, messing up accounting during unmap and breaking MAP_SHARED semantics: # echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages # ./test # cat /proc/meminfo | grep HugePages_ HugePages_Total: 2 HugePages_Free: 1 HugePages_Rsvd: 18446744073709551615 HugePages_Surp: 0 Reason is that uffd-wp doesn't clear the uffd-wp PTE bit when unregistering and consequently keeps the PTE writeprotected. Reason for this is to avoid the additional overhead when unregistering. Note that this is the case also for !hugetlb and that we will end up with writable PTEs that still have the uffd-wp PTE bit set once we return from hugetlb_wp(). I'm not touching the uffd-wp PTE bit for now, because it seems to be a generic thing -- wp_page_reuse() also doesn't clear it. VM_MAYSHARE handling in hugetlb_fault() for FAULT_FLAG_WRITE indicates that MAP_SHARED handling was at least envisioned, but could never have worked as expected. While at it, make sure that we never end up in hugetlb_wp() on write faults without VM_WRITE, because we don't support maybe_mkwrite() semantics as commonly used in the !hugetlb case -- for example, in wp_page_reuse(). Note that there is no need to do any kind of reservation in hugetlb_fault() in this case ... because we already have a hugetlb page mapped R/O that we will simply map writable and we are not dealing with COW/unsharing. Link: https://lkml.kernel.org/r/20220811103435.188481-3-david@redhat.com Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Hugh Dickins <hughd@google.com> Cc: Jamie Liu <jamieliu@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Peter Feiner <pfeiner@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: <stable@vger.kernel.org> [5.19] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-20mm/hugetlb: fix hugetlb not supporting softdirty trackingDavid Hildenbrand1-2/+5
Patch series "mm/hugetlb: fix write-fault handling for shared mappings", v2. I observed that hugetlb does not support/expect write-faults in shared mappings that would have to map the R/O-mapped page writable -- and I found two case where we could currently get such faults and would erroneously map an anon page into a shared mapping. Reproducers part of the patches. I propose to backport both fixes to stable trees. The first fix needs a small adjustment. This patch (of 2): Staring at hugetlb_wp(), one might wonder where all the logic for shared mappings is when stumbling over a write-protected page in a shared mapping. In fact, there is none, and so far we thought we could get away with that because e.g., mprotect() should always do the right thing and map all pages directly writable. Looks like we were wrong: -------------------------------------------------------------------------- #include <stdio.h> #include <stdlib.h> #include <string.h> #include <fcntl.h> #include <unistd.h> #include <errno.h> #include <sys/mman.h> #define HUGETLB_SIZE (2 * 1024 * 1024u) static void clear_softdirty(void) { int fd = open("/proc/self/clear_refs", O_WRONLY); const char *ctrl = "4"; int ret; if (fd < 0) { fprintf(stderr, "open(clear_refs) failed\n"); exit(1); } ret = write(fd, ctrl, strlen(ctrl)); if (ret != strlen(ctrl)) { fprintf(stderr, "write(clear_refs) failed\n"); exit(1); } close(fd); } int main(int argc, char **argv) { char *map; int fd; fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT); if (!fd) { fprintf(stderr, "open() failed\n"); return -errno; } if (ftruncate(fd, HUGETLB_SIZE)) { fprintf(stderr, "ftruncate() failed\n"); return -errno; } map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); if (map == MAP_FAILED) { fprintf(stderr, "mmap() failed\n"); return -errno; } *map = 0; if (mprotect(map, HUGETLB_SIZE, PROT_READ)) { fprintf(stderr, "mmprotect() failed\n"); return -errno; } clear_softdirty(); if (mprotect(map, HUGETLB_SIZE, PROT_READ|PROT_WRITE)) { fprintf(stderr, "mmprotect() failed\n"); return -errno; } *map = 0; return 0; } -------------------------------------------------------------------------- Above test fails with SIGBUS when there is only a single free hugetlb page. # echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages # ./test Bus error (core dumped) And worse, with sufficient free hugetlb pages it will map an anonymous page into a shared mapping, for example, messing up accounting during unmap and breaking MAP_SHARED semantics: # echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages # ./test # cat /proc/meminfo | grep HugePages_ HugePages_Total: 2 HugePages_Free: 1 HugePages_Rsvd: 18446744073709551615 HugePages_Surp: 0 Reason in this particular case is that vma_wants_writenotify() will return "true", removing VM_SHARED in vma_set_page_prot() to map pages write-protected. Let's teach vma_wants_writenotify() that hugetlb does not support softdirty tracking. Link: https://lkml.kernel.org/r/20220811103435.188481-1-david@redhat.com Link: https://lkml.kernel.org/r/20220811103435.188481-2-david@redhat.com Fixes: 64e455079e1b ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Peter Feiner <pfeiner@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Jamie Liu <jamieliu@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Peter Xu <peterx@redhat.com> Cc: <stable@vger.kernel.org> [3.18+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-20mm/uffd: reset write protection when unregister with wp-modePeter Xu3-11/+24
The motivation of this patch comes from a recent report and patchfix from David Hildenbrand on hugetlb shared handling of wr-protected page [1]. With the reproducer provided in commit message of [1], one can leverage the uffd-wp lazy-reset of ptes to trigger a hugetlb issue which can affect not only the attacker process, but also the whole system. The lazy-reset mechanism of uffd-wp was used to make unregister faster, meanwhile it has an assumption that any leftover pgtable entries should only affect the process on its own, so not only the user should be aware of anything it does, but also it should not affect outside of the process. But it seems that this is not true, and it can also be utilized to make some exploit easier. So far there's no clue showing that the lazy-reset is important to any userfaultfd users because normally the unregister will only happen once for a specific range of memory of the lifecycle of the process. Considering all above, what this patch proposes is to do explicit pte resets when unregister an uffd region with wr-protect mode enabled. It should be the same as calling ioctl(UFFDIO_WRITEPROTECT, wp=false) right before ioctl(UFFDIO_UNREGISTER) for the user. So potentially it'll make the unregister slower. From that pov it's a very slight abi change, but hopefully nothing should break with this change either. Regarding to the change itself - core of uffd write [un]protect operation is moved into a separate function (uffd_wp_range()) and it is reused in the unregister code path. Note that the new function will not check for anything, e.g. ranges or memory types, because they should have been checked during the previous UFFDIO_REGISTER or it should have failed already. It also doesn't check mmap_changing because we're with mmap write lock held anyway. I added a Fixes upon introducing of uffd-wp shmem+hugetlbfs because that's the only issue reported so far and that's the commit David's reproducer will start working (v5.19+). But the whole idea actually applies to not only file memories but also anonymous. It's just that we don't need to fix anonymous prior to v5.19- because there's no known way to exploit. IOW, this patch can also fix the issue reported in [1] as the patch 2 does. [1] https://lore.kernel.org/all/20220811103435.188481-3-david@redhat.com/ Link: https://lkml.kernel.org/r/20220811201340.39342-1-peterx@redhat.com Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs") Signed-off-by: Peter Xu <peterx@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-20mm/smaps: don't access young/dirty bit if pte unpresentPeter Xu1-3/+4
These bits should only be valid when the ptes are present. Introducing two booleans for it and set it to false when !pte_present() for both pte and pmd accountings. The bug is found during code reading and no real world issue reported, but logically such an error can cause incorrect readings for either smaps or smaps_rollup output on quite a few fields. For example, it could cause over-estimate on values like Shared_Dirty, Private_Dirty, Referenced. Or it could also cause under-estimate on values like LazyFree, Shared_Clean, Private_Clean. Link: https://lkml.kernel.org/r/20220805160003.58929-1-peterx@redhat.com Fixes: b1d4d9e0cbd0 ("proc/smaps: carefully handle migration entries") Fixes: c94b6923fa0a ("/proc/PID/smaps: Add PMD migration entry parsing") Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-20mm: add DEVICE_ZONE to FOR_ALL_ZONESHao Lee2-5/+19
FOR_ALL_ZONES should be consistent with enum zone_type. Otherwise, __count_zid_vm_events have the potential to add count to wrong item when zid is ZONE_DEVICE. Link: https://lkml.kernel.org/r/20220807154442.GA18167@haolee.io Signed-off-by: Hao Lee <haolee.swjtu@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-20kernel/sys_ni: add compat entry for fadvise64_64Randy Dunlap1-0/+1
When CONFIG_ADVISE_SYSCALLS is not set/enabled and CONFIG_COMPAT is set/enabled, the riscv compat_syscall_table references 'compat_sys_fadvise64_64', which is not defined: riscv64-linux-ld: arch/riscv/kernel/compat_syscall_table.o:(.rodata+0x6f8): undefined reference to `compat_sys_fadvise64_64' Add 'fadvise64_64' to kernel/sys_ni.c as a conditional COMPAT function so that when CONFIG_ADVISE_SYSCALLS is not set, there is a fallback function available. Link: https://lkml.kernel.org/r/20220807220934.5689-1-rdunlap@infradead.org Fixes: d3ac21cacc24 ("mm: Support compiling out madvise and fadvise") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Suggested-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-20mm/gup: fix FOLL_FORCE COW security issue and remove FOLL_COWDavid Hildenbrand3-44/+89
Ever since the Dirty COW (CVE-2016-5195) security issue happened, we know that FOLL_FORCE can be possibly dangerous, especially if there are races that can be exploited by user space. Right now, it would be sufficient to have some code that sets a PTE of a R/O-mapped shared page dirty, in order for it to erroneously become writable by FOLL_FORCE. The implications of setting a write-protected PTE dirty might not be immediately obvious to everyone. And in fact ever since commit 9ae0f87d009c ("mm/shmem: unconditionally set pte dirty in mfill_atomic_install_pte"), we can use UFFDIO_CONTINUE to map a shmem page R/O while marking the pte dirty. This can be used by unprivileged user space to modify tmpfs/shmem file content even if the user does not have write permissions to the file, and to bypass memfd write sealing -- Dirty COW restricted to tmpfs/shmem (CVE-2022-2590). To fix such security issues for good, the insight is that we really only need that fancy retry logic (FOLL_COW) for COW mappings that are not writable (!VM_WRITE). And in a COW mapping, we really only broke COW if we have an exclusive anonymous page mapped. If we have something else mapped, or the mapped anonymous page might be shared (!PageAnonExclusive), we have to trigger a write fault to break COW. If we don't find an exclusive anonymous page when we retry, we have to trigger COW breaking once again because something intervened. Let's move away from this mandatory-retry + dirty handling and rely on our PageAnonExclusive() flag for making a similar decision, to use the same COW logic as in other kernel parts here as well. In case we stumble over a PTE in a COW mapping that does not map an exclusive anonymous page, COW was not properly broken and we have to trigger a fake write-fault to break COW. Just like we do in can_change_pte_writable() added via commit 64fe24a3e05e ("mm/mprotect: try avoiding write faults for exclusive anonymous pages when changing protection") and commit 76aefad628aa ("mm/mprotect: fix soft-dirty check in can_change_pte_writable()"), take care of softdirty and uffd-wp manually. For example, a write() via /proc/self/mem to a uffd-wp-protected range has to fail instead of silently granting write access and bypassing the userspace fault handler. Note that FOLL_FORCE is not only used for debug access, but also triggered by applications without debug intentions, for example, when pinning pages via RDMA. This fixes CVE-2022-2590. Note that only x86_64 and aarch64 are affected, because only those support CONFIG_HAVE_ARCH_USERFAULTFD_MINOR. Fortunately, FOLL_COW is no longer required to handle FOLL_FORCE. So let's just get rid of it. Thanks to Nadav Amit for pointing out that the pte_dirty() check in FOLL_FORCE code is problematic and might be exploitable. Note 1: We don't check for the PTE being dirty because it doesn't matter for making a "was COWed" decision anymore, and whoever modifies the page has to set the page dirty either way. Note 2: Kernels before extended uffd-wp support and before PageAnonExclusive (< 5.19) can simply revert the problematic commit instead and be safe regarding UFFDIO_CONTINUE. A backport to v5.19 requires minor adjustments due to lack of vma_soft_dirty_enabled(). Link: https://lkml.kernel.org/r/20220809205640.70916-1-david@redhat.com Fixes: 9ae0f87d009c ("mm/shmem: unconditionally set pte dirty in mfill_atomic_install_pte") Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: <stable@vger.kernel.org> [5.16] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-20Revert "zram: remove double compression logic"Jiri Slaby2-10/+33
This reverts commit e7be8d1dd983156b ("zram: remove double compression logic") as it causes zram failures. It does not revert cleanly, PTR_ERR handling was introduced in the meantime. This is handled by appropriate IS_ERR. When under memory pressure, zs_malloc() can fail. Before the above commit, the allocation was retried with direct reclaim enabled (GFP_NOIO). After the commit, it is not -- only __GFP_KSWAPD_RECLAIM is tried. So when the failure occurs under memory pressure, the overlaying filesystem such as ext2 (mounted by ext4 module in this case) can emit failures, making the (file)system unusable: EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing to inode 16386 starting block 159744) Buffer I/O error on device zram0, logical block 159744 With direct reclaim, memory is really reclaimed and allocation succeeds, eventually. In the worst case, the oom killer is invoked, which is proper outcome if user sets up zram too large (in comparison to available RAM). This very diff doesn't apply to 5.19 (stable) cleanly (see PTR_ERR note above). Use revert of e7be8d1dd983 directly. Link: https://bugzilla.suse.com/show_bug.cgi?id=1202203 Link: https://lkml.kernel.org/r/20220810070609.14402-1-jslaby@suse.cz Fixes: e7be8d1dd983 ("zram: remove double compression logic") Signed-off-by: Jiri Slaby <jslaby@suse.cz> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Alexey Romanov <avromanov@sberdevices.ru> Cc: Dmitry Rokosov <ddrokosov@sberdevices.ru> Cc: Lukas Czerner <lczerner@redhat.com> Cc: <stable@vger.kernel.org> [5.19] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-20get_maintainer: add Alan to .get_maintainer.ignoreDan Carpenter1-0/+2
Alan asked to be added to the .get_maintainer.ignore list. Link: https://lkml.kernel.org/r/YvN30KhO9aD5Sza9@kili Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-14Linux 6.0-rc1v6.0-rc1Linus Torvalds1-4/+4
2022-08-14radix-tree: replace gfp.h inclusion with gfp_types.hYury Norov1-1/+1
Radix tree header includes gfp.h for __GFP_BITS_SHIFT only. Now we have gfp_types.h for this. Fixes powerpc allmodconfig build: In file included from include/linux/nodemask.h:97, from include/linux/mmzone.h:17, from include/linux/gfp.h:7, from include/linux/radix-tree.h:12, from include/linux/idr.h:15, from include/linux/kernfs.h:12, from include/linux/sysfs.h:16, from include/linux/kobject.h:20, from include/linux/pci.h:35, from arch/powerpc/kernel/prom_init.c:24: include/linux/random.h: In function 'add_latent_entropy': >> include/linux/random.h:25:46: error: 'latent_entropy' undeclared (first use in this function); did you mean 'add_latent_entropy'? 25 | add_device_randomness((const void *)&latent_entropy, sizeof(latent_entropy)); | ^~~~~~~~~~~~~~ | add_latent_entropy include/linux/random.h:25:46: note: each undeclared identifier is reported only once for each function it appears in Reported-by: kernel test robot <lkp@intel.com> CC: Andy Shevchenko <andriy.shevchenko@linux.intel.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-08-14Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-0/+3
Pull vfs lseek fix from Al Viro: "Fix proc_reg_llseek() breakage. Always had been possible if somebody left NULL ->proc_lseek, became a practical issue now" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: take care to handle NULL ->proc_lseek()
2022-08-14take care to handle NULL ->proc_lseek()Al Viro1-0/+3
Easily done now, just by clearing FMODE_LSEEK in ->f_mode during proc_reg_open() for such entries. Fixes: 868941b14441 "fs: remove no_llseek" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-14Merge tag 'for-linus-6.0-rc1b-tag' of ↵Linus Torvalds16-36/+116
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull more xen updates from Juergen Gross: - fix the handling of the "persistent grants" feature negotiation between Xen blkfront and Xen blkback drivers - a cleanup of xen.config and adding xen.config to Xen section in MAINTAINERS - support HVMOP_set_evtchn_upcall_vector, which is more compliant to "normal" interrupt handling than the global callback used up to now - further small cleanups * tag 'for-linus-6.0-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: MAINTAINERS: add xen config fragments to XEN HYPERVISOR sections xen: remove XEN_SCRUB_PAGES in xen.config xen/pciback: Fix comment typo xen/xenbus: fix return type in xenbus_file_read() xen-blkfront: Apply 'feature_persistent' parameter when connect xen-blkback: Apply 'feature_persistent' parameter when connect xen-blkback: fix persistent grants negotiation x86/xen: Add support for HVMOP_set_evtchn_upcall_vector
2022-08-14Merge tag 'perf-tools-fixes-for-v6.0-2022-08-13' of ↵Linus Torvalds112-4871/+93678
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull more perf tool updates from Arnaldo Carvalho de Melo: - 'perf c2c' now supports ARM64, adjust its output to cope with differences with what is in x86_64. Now go find false sharing on ARM64 (at least Neoverse) as well! - Refactor the JSON processing, making the output more compact and thus reducing the size of the resulting perf binary - Improvements for 'perf offcpu' profiling, including tracking child processes - Update Intel JSON metrics and events files for broadwellde, broadwellx, cascadelakex, haswellx, icelakex, ivytown, jaketown, knightslanding, sapphirerapids, skylakex and snowridgex - Add 'perf stat' JSON output and a 'perf test' entry for it - Ignore memfd and anonymous mmap events if jitdump present - Refactor 'perf test' shell tests allowing subdirs - Fix an error handling path in 'parse_perf_probe_command()' - Fixes for the guest Intel PT tracing patchkit in the 1st batch of this merge window - Print debuginfod queries if -v option is used, to explain delays in processing when debuginfo servers are enabled to fetch DSOs with richer symbol tables - Improve error message for 'perf record -p not_existing_pid' - Fix openssl and libbpf feature detection - Add PMU pai_crypto event description for IBM z16 on 'perf list' - Fix typos and duplicated words on comments in various places * tag 'perf-tools-fixes-for-v6.0-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (81 commits) perf test: Refactor shell tests allowing subdirs perf vendor events: Update events for snowridgex perf vendor events: Update events and metrics for skylakex perf vendor events: Update metrics for sapphirerapids perf vendor events: Update events for knightslanding perf vendor events: Update metrics for jaketown perf vendor events: Update metrics for ivytown perf vendor events: Update events and metrics for icelakex perf vendor events: Update events and metrics for haswellx perf vendor events: Update events and metrics for cascadelakex perf vendor events: Update events and metrics for broadwellx perf vendor events: Update metrics for broadwellde perf jevents: Fold strings optimization perf jevents: Compress the pmu_events_table perf metrics: Copy entire pmu_event in find metric perf pmu-events: Hide the pmu_events perf pmu-events: Don't assume pmu_event is an array perf pmu-events: Move test events/metrics to JSON perf test: Use full metric resolution perf pmu-events: Hide pmu_events_map ...
2022-08-14Merge tag 'powerpc-6.0-2' of ↵Linus Torvalds6-30/+25
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Ensure we never emit lwarx with EH=1 on 32-bit, because some 32-bit CPUs trap on it rather than ignoring it as they should. - Fix ftrace when building with clang, which was broken by some refactoring. - A couple of other minor fixes. Thanks to Christophe Leroy, Naveen N. Rao, Nick Desaulniers, Ondrej Mosnacek, Pali Rohár, Russell Currey, and Segher Boessenkool. * tag 'powerpc-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/kexec: Fix build failure from uninitialised variable powerpc/ppc-opcode: Fix PPC_RAW_TW() powerpc64/ftrace: Fix ftrace for clang builds powerpc: Make eh value more explicit when using lwarx powerpc: Don't hide eh field of lwarx behind a macro powerpc: Fix eh field when calling lwarx on PPC32
2022-08-13Merge tag 'pull-work.misc' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull /proc/mounts fix from Al Viro: "Fix for /proc/mounts escaping - escape the '#' character too" * tag 'pull-work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: escape hash as well
2022-08-13Merge tag '5.20-rc-smb3-client-fixes-part2' of ↵Linus Torvalds19-445/+528
git://git.samba.org/sfrench/cifs-2.6 Pull more cifs updates from Steve French: - two fixes for stable, one for a lock length miscalculation, and another fixes a lease break timeout bug - improvement to handle leases, allows the close timeout to be configured more safely - five restructuring/cleanup patches * tag '5.20-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: cifs: Do not access tcon->cfids->cfid directly from is_path_accessible cifs: Add constructor/destructors for tcon->cfid SMB3: fix lease break timeout when multiple deferred close handles for the same file. smb3: allow deferred close timeout to be configurable cifs: Do not use tcon->cfid directly, use the cfid we get from open_cached_dir cifs: Move cached-dir functions into a separate file cifs: Remove {cifs,nfs}_fscache_release_page() cifs: fix lock length calculation
2022-08-13afs: Enable multipage folio supportDavid Howells2-1/+3
Enable multipage folio support for the afs filesystem. Support has already been implemented in netfslib, fscache and cachefiles and in most of afs, but I've waited for Matthew Wilcox's latest folio changes. Note that it does require a change to afs_write_begin() to return the correct subpage. This is a "temporary" change as we're working on getting rid of the need for ->write_begin() and ->write_end() completely, at least as far as network filesystems are concerned - but it doesn't prevent afs from making use of the capability. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: kafs-testing@auristor.com Cc: Marc Dionne <marc.dionne@auristor.com> Cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/lkml/2274528.1645833226@warthog.procyon.org.uk/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-08-13Merge tag 'timers-urgent-2022-08-13' of ↵Linus Torvalds4-4/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Ingo Molnar: "Misc timer fixes: - fix a potential use-after-free bug in posix timers - correct a prototype - address a build warning" * tag 'timers-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: posix-cpu-timers: Cleanup CPU timers before freeing them during exec time: Correct the prototype of ns_to_kernel_old_timeval and ns_to_timespec64 posix-timers: Make do_clock_gettime() static
2022-08-13Merge tag 'x86-urgent-2022-08-13' of ↵Linus Torvalds2-12/+27
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Ingo Molnar: "Fix the 'IBPB mitigated RETBleed' mode of operation on AMD CPUs (not turned on by default), which also need STIBP enabled (if available) to be '100% safe' on even the shortest speculation windows" * tag 'x86-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/bugs: Enable STIBP for IBPB mitigated RETBleed
2022-08-13Merge tag 'i2c-for-5.20-part2' of ↵Linus Torvalds59-181/+370
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull more i2c updates from Wolfram Sang: - two driver fixes for issues introduced this cycle - one trivial driver improvement regarding ACPI - more DTS conversion and additions - documentation updates - subsystem-wide move from strlcpy to strscpy * tag 'i2c-for-5.20-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: docs: i2c: i2c-sysfs: fix hyperlinks docs: i2c: i2c-sysfs: improve wording docs: i2c: instantiating-devices: add syntax coloring to dts and C blocks docs: i2c: smbus-protocol: improve DataLow/DataHigh definition docs: i2c: i2c-protocol: remove unused legend items docs: i2c: i2c-protocol,smbus-protocol: remove nonsense words docs: i2c: i2c-protocol: update introductory paragraph i2c: move core from strlcpy to strscpy i2c: move drivers from strlcpy to strscpy i2c: kempld: Support ACPI I2C device declaration i2c: mediatek: add i2c compatible for MT8188 dt-bindings: i2c: update bindings for mt8188 soc i2c: microchip-corei2c: fix erroneous late ack send dt-bindings: i2c: qcom,i2c-cci: convert to dtschema i2c: qcom-geni: Fix GPI DMA buffer sync-back
2022-08-13Merge tag 'ntb-5.20' of https://github.com/jonmason/ntbLinus Torvalds14-25/+1822
Pull NTB updates from Jon Mason: "Non-Transparent Bridge updates. Fix of heap data and clang warnings, support for a new Intel NTB device, and NTB EndPoint Function (EPF) support and the various fixes for that" * tag 'ntb-5.20' of https://github.com/jonmason/ntb: MAINTAINERS: add PCI Endpoint NTB drivers to NTB files NTB: EPF: Tidy up some bounds checks NTB: EPF: Fix error code in epf_ntb_bind() PCI: endpoint: pci-epf-vntb: reduce several globals to statics PCI: endpoint: pci-epf-vntb: fix error handle in epf_ntb_mw_bar_init() PCI: endpoint: Fix Kconfig dependency NTB: EPF: set pointer addr to null using NULL rather than 0 Documentation: PCI: extend subheading underline for "lspci output" section Documentation: PCI: Use code-block block for scratchpad registers diagram Documentation: PCI: Add specification for the PCI vNTB function device PCI: endpoint: Support NTB transfer between RC and EP NTB: epf: Allow more flexibility in the memory BAR map method PCI: designware-ep: Allow pci_epc_set_bar() update inbound map address ntb: intel: add GNR support for Intel PCIe gen5 NTB NTB: ntb_tool: uninitialized heap data in tool_fn_write() ntb: idt: fix clang -Wformat warnings
2022-08-13Merge tag 'xfs-5.20-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds5-46/+193
Pull more xfs updates from Darrick Wong: "There's not a lot this time around, just the usual bug fixes and corrections for missing error returns. - Return error codes from block device flushes to userspace - Fix a deadlock between reclaim and mount time quotacheck - Fix an unnecessary ENOSPC return when doing COW on a filesystem with severe free space fragmentation - Fix a miscalculation in the transaction reservation computations for file removal operations" * tag 'xfs-5.20-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix inode reservation space for removing transaction xfs: Fix false ENOSPC when performing direct write on a delalloc extent in cow fork xfs: fix intermittent hang during quotacheck xfs: check return codes when flushing block devices
2022-08-13Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds21-99/+124
Pull more SCSI updates from James Bottomley: "Mostly small bug fixes and trivial updates. The major new core update is a change to the way device, target and host reference counting is done to try to make it more robust (this change has soaked for a while to try to winkle out any bugs)" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: pm8001: Fix typo 'the the' in comment scsi: megaraid_sas: Remove redundant variable cmd_type scsi: FlashPoint: Remove redundant variable bm_int_st scsi: zfcp: Fix missing auto port scan and thus missing target ports scsi: core: Call blk_mq_free_tag_set() earlier scsi: core: Simplify LLD module reference counting scsi: core: Make sure that hosts outlive targets scsi: core: Make sure that targets outlive devices scsi: ufs: ufs-pci: Correct check for RESET DSM scsi: target: core: De-RCU of se_lun and se_lun acl scsi: target: core: Fix race during ACL removal scsi: ufs: core: Correct ufshcd_shutdown() flow scsi: ufs: core: Increase the maximum data buffer size scsi: lpfc: Check the return value of alloc_workqueue()
2022-08-13Merge tag 'block-6.0-2022-08-12' of git://git.kernel.dk/linux-blockLinus Torvalds6-7/+19
Pull block fixes from Jens Axboe: - NVMe pull request - print nvme connect Linux error codes properly (Amit Engel) - fix the fc_appid_store return value (Christoph Hellwig) - fix a typo in an error message (Christophe JAILLET) - add another non-unique identifier quirk (Dennis P. Kliem) - check if the queue is allocated before stopping it in nvme-tcp (Maurizio Lombardi) - restart admin queue if the caller needs to restart queue in nvme-fc (Ming Lei) - use kmemdup instead of kmalloc + memcpy in nvme-auth (Zhang Xiaoxu) - __alloc_disk_node() error handling fix (Rafael) * tag 'block-6.0-2022-08-12' of git://git.kernel.dk/linux-block: block: Do not call blk_put_queue() if gendisk allocation fails nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA XPG GAMMIX S70 nvme-tcp: check if the queue is allocated before stopping it nvme-fabrics: Fix a typo in an error message nvme-fabrics: parse nvme connect Linux error codes nvmet-auth: use kmemdup instead of kmalloc + memcpy nvme-fc: fix the fc_appid_store return value nvme-fc: restart admin queue if the caller needs to restart queue
2022-08-13Merge tag 'io_uring-6.0-2022-08-13' of git://git.kernel.dk/linux-blockLinus Torvalds26-183/+178
Pull io_uring fixes from Jens Axboe: - Regression fix for this merge window, fixing a wrong order of arguments for io_req_set_res() for passthru (Dylan) - Fix for the audit code leaking context memory (Peilin) - Ensure that provided buffers are memcg accounted (Pavel) - Correctly handle short zero-copy sends (Pavel) - Sparse warning fixes for the recvmsg multishot command (Dylan) - Error handling fix for passthru (Anuj) - Remove randomization of struct kiocb fields, to avoid it growing in size if re-arranged in such a fashion that it grows more holes or padding (Keith, Linus) - Small series improving type safety of the sqe fields (Stefan) * tag 'io_uring-6.0-2022-08-13' of git://git.kernel.dk/linux-block: io_uring: add missing BUILD_BUG_ON() checks for new io_uring_sqe fields io_uring: make io_kiocb_to_cmd() typesafe fs: don't randomize struct kiocb fields io_uring: consistently make use of io_notif_to_data() io_uring: fix error handling for io_uring_cmd io_uring: fix io_recvmsg_prep_multishot sparse warnings io_uring/net: send retry for zerocopy io_uring: mem-account pbuf buckets audit, io_uring, io-wq: Fix memory leak in io_sq_thread() and io_wqe_worker() io_uring: pass correct parameters to io_req_set_res
2022-08-13perf test: Refactor shell tests allowing subdirsCarsten Haitzler4-134/+238
This is a prelude to adding more tests to shell tests and in order to support putting those tests into subdirectories, I need to change the test code that scans/finds and runs them. To support subdirs I have to recurse so it's time to refactor the code to allow this and centralize the shell script finding into one location and only one single scan that builds a list of all the found tests in memory instead of it being duplicated in 3 places. This code also optimizes things like knowing the max width of desciption strings (as we can do that while we scan instead of a whole new pass of opening files). It also more cleanly filters scripts to see only *.sh files thus skipping random other files in directories like *~ backup files, other random junk/data files that may appear and the scripts must be executable to make the cut (this ensures the script lib dir is not seen as scripts to run). This avoids perf test running previous older versions of test scripts that are editor backup files as well as skipping perf.data files that may appear and so on. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Carsten Haitzler <carsten.haitzler@arm.com> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: coresight@lists.linaro.org Link: https://lore.kernel.org/r/20220812121641.336465-2-carsten.haitzler@foss.arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-13perf vendor events: Update events for snowridgexZhengjun Xing1-84/+27
Update the events to v1.20, update events for snowridgex by the latest event converter tools. Use script at: https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py to download and generate the latest events and metrics. Manually copy the snowridgex files into perf. Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Tested-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220812085239.3089231-12-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-13perf vendor events: Update events and metrics for skylakexZhengjun Xing4-1157/+24918
Update the events to v1.28, the metrics are based on TMA 4.4 full, update events and metrics for skylakex by the latest event converter tools. Use script at: https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py to download and generate the latest events and metrics. Manually copy the skylakex files into perf. Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Tested-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220812085239.3089231-11-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-13perf vendor events: Update metrics for sapphirerapidsZhengjun Xing1-0/+6
The metrics are based on TMA 4.4 full, add new metrics “UNCORE_FREQ” for sapphirerapids. Use script at: https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py to download and generate the latest events and metrics. Manually copy the sapphirerapids files into perf. Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Tested-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220812085239.3089231-10-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-13perf vendor events: Update events for knightslandingZhengjun Xing1-0/+213
Update the events to v9, update events for knightslanding by the latest event converter tools. Use script at: https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py to download and generate the latest events and metrics. Manually copy the knightslanding files into perf. Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Tested-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220812085239.3089231-9-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-13perf vendor events: Update metrics for jaketownZhengjun Xing1-0/+6
The metrics are based on TMA 4.4 full, add new metrics “UNCORE_FREQ” for jaketown. Use script at: https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py to download and generate the latest events and metrics. Manually copy the jaketown files into perf. Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Tested-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220812085239.3089231-8-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-13perf vendor events: Update metrics for ivytownZhengjun Xing1-0/+6
The metrics are based on TMA 4.4 full, add new metrics “UNCORE_FREQ” for ivytown. Use script at: https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py to download and generate the latest events and metrics. Manually copy the ivytown files into perf. Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Tested-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220812085239.3089231-7-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-13perf vendor events: Update events and metrics for icelakexZhengjun Xing4-556/+38332
Update the events to v1.15, the metrics are based on TMA 4.4 full, update events and metrics for icelakex by the latest event converter tools. Use script at: https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py to download and generate the latest events and metrics. Manually copy the icelakex files into perf. Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Tested-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220812085239.3089231-6-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-13perf vendor events: Update events and metrics for haswellxZhengjun Xing2-171/+413
Update the events to v25, the metrics are based on TMA 4.4 full, update events and metrics for haswellx by the latest event converter tools. Use script at: https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py to download and generate the latest events and metrics. Manually copy the haswellx files into perf. Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Tested-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220812085239.3089231-5-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-13perf vendor events: Update events and metrics for cascadelakexZhengjun Xing4-1241/+25848
Update to v16, the metrics are based on TMA 4.4 full, update events and add new metrics “UNCORE_FREQ” for cascadelakex. Use script at: https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py to download and generate the latest events and metrics. Manually copy the cascadelakex files into perf. Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Tested-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220812085239.3089231-4-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-13perf vendor events: Update events and metrics for broadwellxZhengjun Xing2-159/+10
Update to v19, the metrics are based on TMA 4.4 full, update events and add new metrics “UNCORE_FREQ” for broadwellx. Use script at: https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py to download and generate the latest events and metrics. Manually copy the broadwellx files into perf. Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Tested-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220812085239.3089231-3-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-13perf vendor events: Update metrics for broadwelldeZhengjun Xing1-0/+6
The metrics are based on TMA 4.4 full, add new metrics “UNCORE_FREQ” for broadwellde. Use script at: https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py to download and generate the latest events and metrics. Manually copy the broadwellde files into perf. Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Tested-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220812085239.3089231-2-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-13perf jevents: Fold strings optimizationIan Rogers1-7/+48
If a shorter string ends a longer string then the shorter string may reuse the longer string at an offset. For example, on x86 the event arith.cycles_div_busy and cycles_div_busy can be folded, even though they have difference names the strings are identical after 6 characters. cycles_div_busy can reuse the arith.cycles_div_busy string at an offset of 6. In pmu-events.c this looks like the following where the 'also:' lists folded strings: /* offset=177541 */ "arith.cycles_div_busy\000\000pipeline\000Cycles the divider is busy\000\000\000event=0x14,period=2000000,umask=0x1\000\000\000\000\000\000\000\000\000" /* also: cycles_div_busy\000\000pipeline\000Cycles the divider is busy\000\000\000event=0x14,period=2000000,umask=0x1\000\000\000\000\000\000\000\000\000 */ As jevents.py combines multiple strings for an event into a larger string, the amount of folding is minimal as all parts of the event must align. Other organizations can benefit more from folding, but lose space by say recording more offsets. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220812230949.683239-15-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-13perf jevents: Compress the pmu_events_tableIan Rogers1-45/+162
The pmu_events array requires 15 pointers per entry which in position independent code need relocating. Change the array to be an array of offsets within a big C string. Only the offset of the first variable is required, subsequent variables are stored in order after the \0 terminator (requiring a byte per variable rather than 4 bytes per offset). The file size savings are: no jevents - the same 19,788,464bytes x86 jevents - ~16.7% file size saving 23,744,288bytes vs 28,502,632bytes all jevents - ~19.5% file size saving 24,469,056bytes vs 30,379,920bytes default build options plus NO_LIBBFD=1. For example, the x86 build savings come from .rela.dyn and .data.rel.ro becoming smaller by 3,157,032bytes and 3,042,016bytes respectively. .rodata increases by 1,432,448bytes, giving an overall 4,766,600bytes saving. To make metric strings more shareable, the topic is changed from say 'skx metrics' to just 'metrics'. To try to help with the memory layout the pmu_events are ordered as used by perf qsort comparator functions. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220812230949.683239-14-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-13perf metrics: Copy entire pmu_event in find metricIan Rogers2-17/+18
The pmu_event passed to the pmu_events_table_for_each_event is invalid after the loop. Copy the entire struct in metricgroup__find_metric. Reduce the scope of this function to static. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220812230949.683239-13-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-13perf pmu-events: Hide the pmu_eventsIan Rogers12-91/+101
Hide that the pmu_event structs are an array with a new wrapper struct. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220812230949.683239-12-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-13perf pmu-events: Don't assume pmu_event is an arrayIan Rogers7-182/+313
The current code assumes that a struct pmu_event can be iterated over forward until a NULL pmu_event is encountered. This makes it difficult to refactor pmu_event. Add a loop function taking a callback function that's passed the struct pmu_event. This way the pmu_event is only needed for one element and not an entire array. Switch existing code iterating over the pmu_event arrays to use the new loop function pmu_events_table_for_each_event. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220812230949.683239-11-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-13perf pmu-events: Move test events/metrics to JSONIan Rogers4-81/+132
Move arrays of pmu_events into the JSON code so that it may be regenerated and modified by the jevents.py script. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220812230949.683239-10-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-13perf test: Use full metric resolutionIan Rogers1-145/+77
The simple metric resolution doesn't handle recursion properly, switch to use the full resolution as with the parse-metric tests which also increases coverage. Don't set the values for the metric backward as failures to generate a result are ignored. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220812230949.683239-9-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-13perf pmu-events: Hide pmu_events_mapIan Rogers7-180/+280
Move usage of the table to pmu-events.c so it may be hidden. By abstracting the table the implementation can later be changed. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220812230949.683239-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-13perf pmu-events: Avoid passing pmu_events_mapIan Rogers9-120/+101
Preparation for hiding pmu_events_map as an implementation detail. While the map is passed, the table of events is all that is normally wanted. While modifying the function's types, rename pmu_events_map__find to pmu_events_table__find to match later encapsulation. Similarly rename pmu_add_cpu_aliases_map to pmu_add_cpu_aliases_table. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220812230949.683239-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-13perf pmu-events: Hide pmu_sys_event_tablesIan Rogers6-52/+84
Move usage of the table to pmu-events.c so it may be hidden. By abstracting the table the implementation can later be changed. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220812230949.683239-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>