summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2018-12-14PM / Domains: Save OPP table pointer in genpdViresh Kumar2-2/+23
dev_pm_genpd_set_performance_state() will be required to call dev_pm_opp_xlate_performance_state() going forward to translate from performance state of a sub-domain to performance state of its master. And dev_pm_opp_xlate_performance_state() needs pointers to the OPP tables of both genpd and its master. Lets fetch and save them while the OPP tables are added. Fetching the OPP tables should never fail as we just added the OPP tables and so add a WARN_ON() for such a bug instead of full error paths. Tested-by: Rajendra Nayak <rnayak@codeaurora.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2018-12-14OPP: Don't return 0 on error from of_get_required_opp_performance_state()Viresh Kumar2-8/+8
of_get_required_opp_performance_state() returns 0 on errors currently and a positive performance state otherwise. Since 0 is a valid performance state (representing off), it would be better if this routine returns negative values on error. That will also make it behave similar to dev_pm_opp_xlate_performance_state(), which also returns performance states and returns negative values on error. Change the return type of the function to "int" in order to return negative values. This doesn't have any users for now and so no other part of the kernel will be impacted with this change. Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2018-12-14OPP: Add dev_pm_opp_xlate_performance_state() helperViresh Kumar2-0/+70
dev_pm_genpd_set_performance_state() needs to handle performance state propagation going forward. Currently this routine only gets the required performance state of the device's genpd as an argument, but it doesn't know how to translate that to master genpd(s) of the device's genpd. Introduce a new helper dev_pm_opp_xlate_performance_state() which will be used to translate from performance state of a device (or genpd sub-domain) to another device (or master genpd). Normally the src_table (of genpd sub-domain) will have the "required_opps" property set to point to one of the OPPs in the dst_table (of master genpd), but in some cases the genpd and its master have one to one mapping of performance states and so none of them have the "required-opps" property set. Return the performance state of the src_table as it is in such cases. Tested-by: Rajendra Nayak <rnayak@codeaurora.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2018-12-14OPP: Improve _find_table_of_opp_np()Viresh Kumar1-4/+10
Make _find_table_of_opp_np() more efficient by using of_get_parent() to find the parent OPP table node. Tested-by: Rajendra Nayak <rnayak@codeaurora.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2018-12-14PM / Domains: Make genpd performance states orthogonal to the idlestatesUlf Hansson1-15/+4
It's quite questionable whether genpd internally should care about if the corresponding PM domain for a device is powered on, as to allow setting a new performance state for it. The assumptions creates an unnecessary limitation at this point, for both consumers and providers, but more importantly it also makes the code more complicated. Therefore, let's simplify the code to allow setting a performance state, by invoking the ->set_performance_state() callback, no matter whether the PM domain is powered on or off. Do note, this change means genpd providers needs to restore the performance state themselves during power on, via the ->power_on() callback. Moreover, they may also need to check that the PM domain is powered on, from their ->set_performance_state() callback, before deciding to update the state. Tested-by: Rajendra Nayak <rnayak@codeaurora.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2018-11-05OPP: Remove of_dev_pm_opp_find_required_opp()Viresh Kumar2-59/+0
This isn't used anymore, remove it. Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2018-11-05OPP: Rename and relocate of_genpd_opp_to_performance_state()Viresh Kumar4-57/+49
The OPP core already has the performance state values for each of the genpd's OPPs and there is no need to call the genpd callback again to get the performance state for the case where the end device doesn't have an OPP table and has the "required-opps" property directly in its node. This commit renames of_genpd_opp_to_performance_state() as of_get_required_opp_performance_state() and moves it to the OPP core, as it is all about OPP stuff now. Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2018-11-05OPP: Configure all required OPPsViresh Kumar2-50/+68
Now that all the infrastructure is in place to support multiple required OPPs, lets switch over to using it. A new internal routine _set_required_opps() takes care of updating performance state for all the required OPPs. With this the performance state updates are supported even when the end device needs to configure regulators as well, that wasn't the case earlier. The pstates were earlier stored in the end device's OPP structures, that also changes now as those values are stored in the genpd's OPP structures. And so we switch over to using pm_genpd_opp_to_performance_state() instead of of_genpd_opp_to_performance_state() to get performance state for the genpd OPPs. The routine _generic_set_opp_domain() is not required anymore and is removed. On errors we don't try to recover by reverting to old settings as things are really complex now and the calls here should never really fail unless there is a bug. There is no point increasing the complexity, for code which will never be executed. Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2018-11-05OPP: Add dev_pm_opp_{set|put}_genpd_virt_dev() helperViresh Kumar4-1/+115
Multiple generic power domains for a consumer device are supported with the help of virtual devices, which are created for each consumer device - genpd pair. These are the device structures which are attached to the power domain and are required by the OPP core to set the performance state of the genpd. The helpers added by this commit are required to be called once for each of these virtual devices. These are required only if multiple domains are available for a device, otherwise the actual device structure will be used instead by the OPP core. The new helpers also support the complex cases where the consumer device wouldn't always require all the domains. For example, a camera may require only one power domain during normal operations but two during high resolution operations. The consumer driver can call dev_pm_opp_put_genpd_virt_dev(high_resolution_genpd_virt_dev) if it is currently operating in the normal mode and doesn't have any performance requirements from the genpd which manages high resolution power requirements. The consumer driver can later call dev_pm_opp_set_genpd_virt_dev(high_resolution_genpd_virt_dev) once it switches back to the high resolution mode. The new helpers differ from other OPP set/put helpers as the new ones can be called with OPPs initialized for the table as we may need to call them on the fly because of the complex case explained above. For this reason it is possible that the genpd virt_dev structure may be used in parallel while the new helpers are running and a new mutex is added to protect against that. We didn't use the existing opp_table->lock mutex as that is widely used in the OPP core and we will need this lock in the dev_pm_opp_set_rate() helper while changing OPP and we need to make sure there is not much contention while doing that as that's the hotpath. Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2018-11-05PM / Domains: Add genpd_opp_to_performance_state()Viresh Kumar2-0/+41
The OPP core currently stores the performance state in the consumer device's OPP table, but that is going to change going forward and performance state will rather be set directly in the genpd's OPP table. For that we need to get the performance state for genpd's device structure (genpd->dev) instead of the consumer device's structure. Add a new helper to do that. Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2018-11-05OPP: Populate OPPs from "required-opps" propertyViresh Kumar3-2/+86
An earlier commit populated the OPP tables from the "required-opps" property, this commit populates the individual OPPs. This is repeated for each OPP in the OPP table and these populated OPPs will be used by later commits. Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2018-11-05OPP: Populate required opp tables from "required-opps" propertyViresh Kumar3-0/+157
The current implementation works only for the case where a single phandle is present in the "required-opps" property, while DT allows multiple phandles to be present there. This patch adds new infrastructure to parse all the phandles present in "required-opps" property and save pointers of the required OPP's OPP tables. These will be used by later commits. Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2018-11-05OPP: Separate out custom OPP handler specific codeViresh Kumar1-27/+40
Create a separate routine to take care of custom set_opp() handler specific stuff. Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2018-11-05OPP: Identify and mark genpd OPP tablesViresh Kumar2-2/+6
We need to handle genpd OPP tables differently, this is already the case at one location and will be extended going forward. Add another field to the OPP table to check if the table belongs to a genpd or not. Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2018-11-05PM / Domains: Rename genpd virtual devices as virt_devViresh Kumar1-13/+13
There are several struct device instances that genpd core handles. The most common one is the consumer device structure, which is named (correctly) as "dev" within genpd core. The second one is the genpd's device structure, referenced as genpd->dev. The third one is the virtual device structures created by the genpd core to represent the consumer device for multiple power domain case, currently named as genpd_dev. The naming of these virtual devices isn't very clear or readable and it looks more like the genpd->dev. Rename the virtual device instances within the genpd core as "virt_dev". Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2018-11-04Linux 4.20-rc1v4.20-rc1Linus Torvalds1-2/+2
2018-11-04Merge tag 'tags/upstream-4.20-rc1' of git://git.infradead.org/linux-ubifsLinus Torvalds25-292/+2418
Pull UBIFS updates from Richard Weinberger: - Full filesystem authentication feature, UBIFS is now able to have the whole filesystem structure authenticated plus user data encrypted and authenticated. - Minor cleanups * tag 'tags/upstream-4.20-rc1' of git://git.infradead.org/linux-ubifs: (26 commits) ubifs: Remove unneeded semicolon Documentation: ubifs: Add authentication whitepaper ubifs: Enable authentication support ubifs: Do not update inode size in-place in authenticated mode ubifs: Add hashes and HMACs to default filesystem ubifs: authentication: Authenticate super block node ubifs: Create hash for default LPT ubfis: authentication: Authenticate master node ubifs: authentication: Authenticate LPT ubifs: Authenticate replayed journal ubifs: Add auth nodes to garbage collector journal head ubifs: Add authentication nodes to journal ubifs: authentication: Add hashes to index nodes ubifs: Add hashes to the tree node cache ubifs: Create functions to embed a HMAC in a node ubifs: Add helper functions for authentication support ubifs: Add separate functions to init/crc a node ubifs: Format changes for authentication support ubifs: Store read superblock node ubifs: Drop write_node ...
2018-11-04Merge tag 'nfs-for-4.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds5-40/+17
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: Bugfix: - Fix build issues on architectures that don't provide 64-bit cmpxchg Cleanups: - Fix a spelling mistake" * tag 'nfs-for-4.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: fix spelling mistake, EACCESS -> EACCES SUNRPC: Use atomic(64)_t for seq_send(64)
2018-11-04Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds7-0/+432
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more timer updates from Thomas Gleixner: "A set of commits for the new C-SKY architecture timers" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: dt-bindings: timer: gx6605s SOC timer clocksource/drivers/c-sky: Add gx6605s SOC system timer dt-bindings: timer: C-SKY Multi-processor timer clocksource/drivers/c-sky: Add C-SKY SMP timer
2018-11-04Merge tag 'ntb-4.20' of git://github.com/jonmason/ntbLinus Torvalds6-110/+429
Pull NTB updates from Jon Mason: "Fairly minor changes and bug fixes: NTB IDT thermal changes and hook into hwmon, ntb_netdev clean-up of private struct, and a few bug fixes" * tag 'ntb-4.20' of git://github.com/jonmason/ntb: ntb: idt: Alter the driver info comments ntb: idt: Discard temperature sensor IRQ handler ntb: idt: Add basic hwmon sysfs interface ntb: idt: Alter temperature read method ntb_netdev: Simplify remove with client device drvdata NTB: transport: Try harder to alloc an aligned MW buffer ntb: ntb_transport: Mark expected switch fall-throughs ntb: idt: Set PCIe bus address to BARLIMITx NTB: ntb_hw_idt: replace IS_ERR_OR_NULL with regular NULL checks ntb: intel: fix return value for ndev_vec_mask() ntb_netdev: fix sleep time mismatch
2018-11-03Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "A memory (under-)allocation fix and a comment fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/topology: Fix off by one bug sched/rt: Update comment in pick_next_task_rt()
2018-11-03Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds47-171/+198
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "A number of fixes and some late updates: - make in_compat_syscall() behavior on x86-32 similar to other platforms, this touches a number of generic files but is not intended to impact non-x86 platforms. - objtool fixes - PAT preemption fix - paravirt fixes/cleanups - cpufeatures updates for new instructions - earlyprintk quirk - make microcode version in sysfs world-readable (it is already world-readable in procfs) - minor cleanups and fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: compat: Cleanup in_compat_syscall() callers x86/compat: Adjust in_compat_syscall() to generic code under !COMPAT objtool: Support GCC 9 cold subfunction naming scheme x86/numa_emulation: Fix uniform-split numa emulation x86/paravirt: Remove unused _paravirt_ident_32 x86/mm/pat: Disable preemption around __flush_tlb_all() x86/paravirt: Remove GPL from pv_ops export x86/traps: Use format string with panic() call x86: Clean up 'sizeof x' => 'sizeof(x)' x86/cpufeatures: Enumerate MOVDIR64B instruction x86/cpufeatures: Enumerate MOVDIRI instruction x86/earlyprintk: Add a force option for pciserial device objtool: Support per-function rodata sections x86/microcode: Make revision and processor flags world-readable
2018-11-03Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds102-539/+3616
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates and fixes from Ingo Molnar: "These are almost all tooling updates: 'perf top', 'perf trace' and 'perf script' fixes and updates, an UAPI header sync with the merge window versions, license marker updates, much improved Sparc support from David Miller, and a number of fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (66 commits) perf intel-pt/bts: Calculate cpumode for synthesized samples perf intel-pt: Insert callchain context into synthesized callchains perf tools: Don't clone maps from parent when synthesizing forks perf top: Start display thread earlier tools headers uapi: Update linux/if_link.h header copy tools headers uapi: Update linux/netlink.h header copy tools headers: Sync the various kvm.h header copies tools include uapi: Update linux/mmap.h copy perf trace beauty: Use the mmap flags table generated from headers perf beauty: Wire up the mmap flags table generator to the Makefile perf beauty: Add a generator for MAP_ mmap's flag constants tools include uapi: Update asound.h copy tools arch uapi: Update asm-generic/unistd.h and arm64 unistd.h copies tools include uapi: Update linux/fs.h copy perf callchain: Honour the ordering of PERF_CONTEXT_{USER,KERNEL,etc} perf cs-etm: Correct CPU mode for samples perf unwind: Take pgoff into account when reporting elf to libdwfl perf top: Do not use overwrite mode by default perf top: Allow disabling the overwrite mode perf trace: Beautify mount's first pathname arg ...
2018-11-03Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds2-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Ingo Molnar: "An irqchip driver fix and a memory (over-)allocation fix" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/irq-mvebu-sei: Fix a NULL vs IS_ERR() bug in probe function irq/matrix: Fix memory overallocation
2018-11-04sched/topology: Fix off by one bugPeter Zijlstra1-1/+1
With the addition of the NUMA identity level, we increased @level by one and will run off the end of the array in the distance sort loop. Fixed: 051f3ca02e46 ("sched/topology: Introduce NUMA identity node sched domain") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-03Merge branch 'core/urgent' into x86/urgent, to pick up objtool fixIngo Molnar4068-73501/+261778
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-03Merge tag 'armsoc-fixes' of ↵Linus Torvalds8-22/+35
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Olof Johansson: "A few fixes who have come in near or during the merge window: - Removal of a VLA usage in Marvell mpp platform code - Enable some IPMI options for ARM64 servers by default, helps testing - Enable PREEMPT on 32-bit ARMv7 defconfig - Minor fix for stm32 DT (removal of an unused DMA property) - Bugfix for TI OMAP1-based ams-delta (-EINVAL -> IRQ_NOTCONNECTED)" * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: dts: stm32: update HASH1 dmas property on stm32mp157c ARM: orion: avoid VLA in orion_mpp_conf ARM: defconfig: Update multi_v7 to use PREEMPT arm64: defconfig: Enable some IPMI configs soc: ti: QMSS: Fix usage of irq_set_affinity_hint ARM: OMAP1: ams-delta: Fix impossible .irq < 0
2018-11-03Merge tag 'arm64-upstream' of ↵Linus Torvalds3-8/+23
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull more arm64 updates from Catalin Marinas: - fix W+X page (mark RO) allocated by the arm64 kprobes code - Makefile fix for .i files in out of tree modules * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: kprobe: make page to RO mode when allocate it arm64: kdump: fix small typo arm64: makefile fix build of .i file in external module case
2018-11-03Merge tag 'dma-mapping-4.20-2' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds1-0/+2
Pull dma-mapping fix from Christoph Hellwig: "Avoid compile warnings on non-default arm64 configs" * tag 'dma-mapping-4.20-2' of git://git.infradead.org/users/hch/dma-mapping: arm64: fix warnings without CONFIG_IOMMU_DMA
2018-11-03Merge tag 'kbuild-v4.20-2' of ↵Linus Torvalds17-78/+28
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - clean-up leftovers in Kconfig files - remove stale oldnoconfig and silentoldconfig targets - remove unneeded cc-fullversion and cc-name variables - improve merge_config script to allow overriding option prefix * tag 'kbuild-v4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: remove cc-name variable kbuild: replace cc-name test with CONFIG_CC_IS_CLANG merge_config.sh: Allow to define config prefix kbuild: remove unused cc-fullversion variable kconfig: remove silentoldconfig target kconfig: remove oldnoconfig target powerpc: PCI_MSI needs PCI powerpc: remove CONFIG_MCA leftovers powerpc: remove CONFIG_PCI_QSPAN scsi: aha152x: rename the PCMCIA define
2018-11-03Merge tag '4.20-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds12-99/+530
Pull cifs fixes and updates from Steve French: "Three small fixes (one Kerberos related, one for stable, and another fixes an oops in xfstest 377), two helpful debugging improvements, three patches for cifs directio and some minor cleanup" * tag '4.20-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix signed/unsigned mismatch on aio_read patch cifs: don't dereference smb_file_target before null check CIFS: Add direct I/O functions to file_operations CIFS: Add support for direct I/O write CIFS: Add support for direct I/O read smb3: missing defines and structs for reparse point handling smb3: allow more detailed protocol info on open files for debugging smb3: on kerberos mount if server doesn't specify auth type use krb5 smb3: add trace point for tree connection cifs: fix spelling mistake, EACCESS -> EACCES cifs: fix return value for cifs_listxattr
2018-11-03Merge branch 'work.afs' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull 9p fix from Al Viro: "Regression fix for net/9p handling of iov_iter; broken by braino when switching to iov_iter_is_kvec() et.al., spotted and fixed by Marc" * 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: iov_iter: Fix 9p virtio breakage
2018-11-03Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds16-50/+52
Pull more SCSI updates from James Bottomley: "This is a set of minor small (and safe changes) that didn't make the initial pull request plus some bug fixes" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: mvsas: Remove set but not used variable 'id' scsi: qla2xxx: Remove two arguments from qlafx00_error_entry() scsi: qla2xxx: Make sure that qlafx00_ioctl_iosb_entry() initializes 'res' scsi: qla2xxx: Remove a set-but-not-used variable scsi: qla2xxx: Make qla2x00_sysfs_write_nvram() easier to analyze scsi: qla2xxx: Declare local functions 'static' scsi: qla2xxx: Improve several kernel-doc headers scsi: qla2xxx: Modify fall-through annotations scsi: 3w-sas: 3w-9xxx: Use unsigned char for cdb scsi: mvsas: Use dma_pool_zalloc scsi: target: Don't request modules that aren't even built scsi: target: Set response length for REPORT TARGET PORT GROUPS
2018-11-03Merge branch 'akpm' (patches from Andrew)Linus Torvalds19-124/+172
Merge more updates from Andrew Morton: - more ocfs2 work - various leftovers * emailed patches from Andrew Morton <akpm@linux-foundation.org>: memory_hotplug: cond_resched in __remove_pages bfs: add sanity check at bfs_fill_super() kernel/sysctl.c: remove duplicated include kernel/kexec_file.c: remove some duplicated includes mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask ocfs2: fix clusters leak in ocfs2_defrag_extent() ocfs2: dlmglue: clean up timestamp handling ocfs2: don't put and assigning null to bh allocated outside ocfs2: fix a misuse a of brelse after failing ocfs2_check_dir_entry ocfs2: don't use iocb when EIOCBQUEUED returns ocfs2: without quota support, avoid calling quota recovery ocfs2: remove ocfs2_is_o2cb_active() mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings include/linux/notifier.h: SRCU: fix ctags mm: handle no memcg case in memcg_kmem_charge() properly
2018-11-03memory_hotplug: cond_resched in __remove_pagesMichal Hocko1-0/+1
We have received a bug report that unbinding a large pmem (>1TB) can result in a soft lockup: NMI watchdog: BUG: soft lockup - CPU#9 stuck for 23s! [ndctl:4365] [...] Supported: Yes CPU: 9 PID: 4365 Comm: ndctl Not tainted 4.12.14-94.40-default #1 SLE12-SP4 Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.01.00.0833.051120182255 05/11/2018 task: ffff9cce7d4410c0 task.stack: ffffbe9eb1bc4000 RIP: 0010:__put_page+0x62/0x80 Call Trace: devm_memremap_pages_release+0x152/0x260 release_nodes+0x18d/0x1d0 device_release_driver_internal+0x160/0x210 unbind_store+0xb3/0xe0 kernfs_fop_write+0x102/0x180 __vfs_write+0x26/0x150 vfs_write+0xad/0x1a0 SyS_write+0x42/0x90 do_syscall_64+0x74/0x150 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x7fd13166b3d0 It has been reported on an older (4.12) kernel but the current upstream code doesn't cond_resched in the hot remove code at all and the given range to remove might be really large. Fix the issue by calling cond_resched once per memory section. Link: http://lkml.kernel.org/r/20181031125840.23982-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: Dan Williams <dan.j.williams@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03bfs: add sanity check at bfs_fill_super()Tetsuo Handa1-3/+6
syzbot is reporting too large memory allocation at bfs_fill_super() [1]. Since file system image is corrupted such that bfs_sb->s_start == 0, bfs_fill_super() is trying to allocate 8MB of continuous memory. Fix this by adding a sanity check on bfs_sb->s_start, __GFP_NOWARN and printf(). [1] https://syzkaller.appspot.com/bug?id=16a87c236b951351374a84c8a32f40edbc034e96 Link: http://lkml.kernel.org/r/1525862104-3407-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: syzbot <syzbot+71c6b5d68e91149fc8a4@syzkaller.appspotmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Tigran Aivazian <aivazian.tigran@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03kernel/sysctl.c: remove duplicated includeMichael Schupikov1-1/+0
Remove one include of <linux/pipe_fs_i.h>. No functional changes. Link: http://lkml.kernel.org/r/20181004134223.17735-1-michael@schupikov.de Signed-off-by: Michael Schupikov <michael@schupikov.de> Reviewed-by: Richard Weinberger <richard@nod.at> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03kernel/kexec_file.c: remove some duplicated includeszhong jiang1-2/+0
We include kexec.h and slab.h twice in kexec_file.c. It's unnecessary. hence just remove them. Link: http://lkml.kernel.org/r/1537498098-19171-1-git-send-email-zhongjiang@huawei.com Signed-off-by: zhong jiang <zhongjiang@huawei.com> Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmaskMichal Hocko5-77/+40
THP allocation mode is quite complex and it depends on the defrag mode. This complexity is hidden in alloc_hugepage_direct_gfpmask from a large part currently. The NUMA special casing (namely __GFP_THISNODE) is however independent and placed in alloc_pages_vma currently. This both adds an unnecessary branch to all vma based page allocation requests and it makes the code more complex unnecessarily as well. Not to mention that e.g. shmem THP used to do the node reclaiming unconditionally regardless of the defrag mode until recently. This was not only unexpected behavior but it was also hardly a good default behavior and I strongly suspect it was just a side effect of the code sharing more than a deliberate decision which suggests that such a layering is wrong. Get rid of the thp special casing from alloc_pages_vma and move the logic to alloc_hugepage_direct_gfpmask. __GFP_THISNODE is applied to the resulting gfp mask only when the direct reclaim is not requested and when there is no explicit numa binding to preserve the current logic. Please note that there's also a slight difference wrt MPOL_BIND now. The previous code would avoid using __GFP_THISNODE if the local node was outside of policy_nodemask(). After this patch __GFP_THISNODE is avoided for all MPOL_BIND policies. So there's a difference that if local node is actually allowed by the bind policy's nodemask, previously __GFP_THISNODE would be added, but now it won't be. From the behavior POV this is still correct because the policy nodemask is used. Link: http://lkml.kernel.org/r/20180925120326.24392-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03ocfs2: fix clusters leak in ocfs2_defrag_extent()Larry Chen1-0/+17
ocfs2_defrag_extent() might leak allocated clusters. When the file system has insufficient space, the number of claimed clusters might be less than the caller wants. If that happens, the original code might directly commit the transaction without returning clusters. This patch is based on code in ocfs2_add_clusters_in_btree(). [akpm@linux-foundation.org: include localalloc.h, reduce scope of data_ac] Link: http://lkml.kernel.org/r/20180904041621.16874-3-lchen@suse.com Signed-off-by: Larry Chen <lchen@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03ocfs2: dlmglue: clean up timestamp handlingArnd Bergmann1-17/+9
The handling of timestamps outside of the 1970..2038 range in the dlm glue is rather inconsistent: on 32-bit architectures, this has always wrapped around to negative timestamps in the 1902..1969 range, while on 64-bit kernels all timestamps are interpreted as positive 34 bit numbers in the 1970..2514 year range. Now that the VFS code handles 64-bit timestamps on all architectures, we can make the behavior more consistent here, and return the same result that we had on 64-bit already, making the file system y2038 safe in the process. Outside of dlmglue, it already uses 64-bit on-disk timestamps anway, so that part is fine. For consistency, I'm changing ocfs2_pack_timespec() to clamp anything outside of the supported range to the minimum and maximum values. This avoids a possible ambiguity of values before 1970 in particular, which used to be interpreted as times at the end of the 2514 range previously. Link: http://lkml.kernel.org/r/20180619155826.4106487-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03ocfs2: don't put and assigning null to bh allocated outsideChangwei Ge1-18/+59
ocfs2_read_blocks() and ocfs2_read_blocks_sync() are both used to read several blocks from disk. Currently, the input argument *bhs* can be NULL or NOT. It depends on the caller's behavior. If the function fails in reading blocks from disk, the corresponding bh will be assigned to NULL and put. Obviously, above process for non-NULL input bh is not appropriate. Because the caller doesn't even know its bhs are put and re-assigned. If buffer head is managed by caller, ocfs2_read_blocks and ocfs2_read_blocks_sync() should not evaluate it to NULL. It will cause caller accessing illegal memory, thus crash. Link: http://lkml.kernel.org/r/HK2PR06MB045285E0F4FBB561F9F2F9B3D5680@HK2PR06MB0452.apcprd06.prod.outlook.com Signed-off-by: Changwei Ge <ge.changwei@h3c.com> Reviewed-by: Guozhonghua <guozhonghua@h3c.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03ocfs2: fix a misuse a of brelse after failing ocfs2_check_dir_entryChangwei Ge1-2/+1
Somehow, file system metadata was corrupted, which causes ocfs2_check_dir_entry() to fail in function ocfs2_dir_foreach_blk_el(). According to the original design intention, if above happens we should skip the problematic block and continue to retrieve dir entry. But there is obviouse misuse of brelse around related code. After failure of ocfs2_check_dir_entry(), current code just moves to next position and uses the problematic buffer head again and again during which the problematic buffer head is released for multiple times. I suppose, this a serious issue which is long-lived in ocfs2. This may cause other file systems which is also used in a the same host insane. So we should also consider about bakcporting this patch into linux -stable. Link: http://lkml.kernel.org/r/HK2PR06MB045211675B43EED794E597B6D56E0@HK2PR06MB0452.apcprd06.prod.outlook.com Signed-off-by: Changwei Ge <ge.changwei@h3c.com> Suggested-by: Changkuo Shi <shi.changkuo@h3c.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03ocfs2: don't use iocb when EIOCBQUEUED returnsChangwei Ge1-2/+2
When -EIOCBQUEUED returns, it means that aio_complete() will be called from dio_complete(), which is an asynchronous progress against write_iter. Generally, IO is a very slow progress than executing instruction, but we still can't take the risk to access a freed iocb. And we do face a BUG crash issue. Using the crash tool, iocb is obviously freed already. crash> struct -x kiocb ffff881a350f5900 struct kiocb { ki_filp = 0xffff881a350f5a80, ki_pos = 0x0, ki_complete = 0x0, private = 0x0, ki_flags = 0x0 } And the backtrace shows: ocfs2_file_write_iter+0xcaa/0xd00 [ocfs2] aio_run_iocb+0x229/0x2f0 do_io_submit+0x291/0x540 SyS_io_submit+0x10/0x20 system_call_fastpath+0x16/0x75 Link: http://lkml.kernel.org/r/1523361653-14439-1-git-send-email-ge.changwei@h3c.com Signed-off-by: Changwei Ge <ge.changwei@h3c.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03ocfs2: without quota support, avoid calling quota recoveryGuozhonghua1-17/+34
During one dead node's recovery by other node, quota recovery work will be queued. We should avoid calling quota when it is not supported, so check the quota flags. Link: http://lkml.kernel.org/r/71604351584F6A4EBAE558C676F37CA401071AC9FB@H3CMLB12-EX.srv.huawei-3com.com Signed-off-by: guozhonghua <guozhonghua@h3c.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03ocfs2: remove ocfs2_is_o2cb_active()Gang He3-10/+1
Remove ocfs2_is_o2cb_active(). We have similar functions to identify which cluster stack is being used via osb->osb_cluster_stack. Secondly, the current implementation of ocfs2_is_o2cb_active() is not totally safe. Based on the design of stackglue, we need to get ocfs2_stack_lock before using ocfs2_stack related data structures, and that active_stack pointer can be NULL in the case of mount failure. Link: http://lkml.kernel.org/r/1495441079-11708-1-git-send-email-ghe@suse.com Signed-off-by: Gang He <ghe@suse.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Reviewed-by: Eric Ren <zren@suse.com> Acked-by: Changwei Ge <ge.changwei@h3c.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappingsAndrea Arcangeli1-2/+30
THP allocation might be really disruptive when allocated on NUMA system with the local node full or hard to reclaim. Stefan has posted an allocation stall report on 4.12 based SLES kernel which suggests the same issue: kvm: page allocation stalls for 194572ms, order:9, mode:0x4740ca(__GFP_HIGHMEM|__GFP_IO|__GFP_FS|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_THISNODE|__GFP_MOVABLE|__GFP_DIRECT_RECLAIM), nodemask=(null) kvm cpuset=/ mems_allowed=0-1 CPU: 10 PID: 84752 Comm: kvm Tainted: G W 4.12.0+98-ph <a href="/view.php?id=1" title="[geschlossen] Integration Ramdisk" class="resolved">0000001</a> SLE15 (unreleased) Hardware name: Supermicro SYS-1029P-WTRT/X11DDW-NT, BIOS 2.0 12/05/2017 Call Trace: dump_stack+0x5c/0x84 warn_alloc+0xe0/0x180 __alloc_pages_slowpath+0x820/0xc90 __alloc_pages_nodemask+0x1cc/0x210 alloc_pages_vma+0x1e5/0x280 do_huge_pmd_wp_page+0x83f/0xf00 __handle_mm_fault+0x93d/0x1060 handle_mm_fault+0xc6/0x1b0 __do_page_fault+0x230/0x430 do_page_fault+0x2a/0x70 page_fault+0x7b/0x80 [...] Mem-Info: active_anon:126315487 inactive_anon:1612476 isolated_anon:5 active_file:60183 inactive_file:245285 isolated_file:0 unevictable:15657 dirty:286 writeback:1 unstable:0 slab_reclaimable:75543 slab_unreclaimable:2509111 mapped:81814 shmem:31764 pagetables:370616 bounce:0 free:32294031 free_pcp:6233 free_cma:0 Node 0 active_anon:254680388kB inactive_anon:1112760kB active_file:240648kB inactive_file:981168kB unevictable:13368kB isolated(anon):0kB isolated(file):0kB mapped:280240kB dirty:1144kB writeback:0kB shmem:95832kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 81225728kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no Node 1 active_anon:250583072kB inactive_anon:5337144kB active_file:84kB inactive_file:0kB unevictable:49260kB isolated(anon):20kB isolated(file):0kB mapped:47016kB dirty:0kB writeback:4kB shmem:31224kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 31897600kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no The defrag mode is "madvise" and from the above report it is clear that the THP has been allocated for MADV_HUGEPAGA vma. Andrea has identified that the main source of the problem is __GFP_THISNODE usage: : The problem is that direct compaction combined with the NUMA : __GFP_THISNODE logic in mempolicy.c is telling reclaim to swap very : hard the local node, instead of failing the allocation if there's no : THP available in the local node. : : Such logic was ok until __GFP_THISNODE was added to the THP allocation : path even with MPOL_DEFAULT. : : The idea behind the __GFP_THISNODE addition, is that it is better to : provide local memory in PAGE_SIZE units than to use remote NUMA THP : backed memory. That largely depends on the remote latency though, on : threadrippers for example the overhead is relatively low in my : experience. : : The combination of __GFP_THISNODE and __GFP_DIRECT_RECLAIM results in : extremely slow qemu startup with vfio, if the VM is larger than the : size of one host NUMA node. This is because it will try very hard to : unsuccessfully swapout get_user_pages pinned pages as result of the : __GFP_THISNODE being set, instead of falling back to PAGE_SIZE : allocations and instead of trying to allocate THP on other nodes (it : would be even worse without vfio type1 GUP pins of course, except it'd : be swapping heavily instead). Fix this by removing __GFP_THISNODE for THP requests which are requesting the direct reclaim. This effectivelly reverts 5265047ac301 on the grounds that the zone/node reclaim was known to be disruptive due to premature reclaim when there was memory free. While it made sense at the time for HPC workloads without NUMA awareness on rare machines, it was ultimately harmful in the majority of cases. The existing behaviour is similar, if not as widespare as it applies to a corner case but crucially, it cannot be tuned around like zone_reclaim_mode can. The default behaviour should always be to cause the least harm for the common case. If there are specialised use cases out there that want zone_reclaim_mode in specific cases, then it can be built on top. Longterm we should consider a memory policy which allows for the node reclaim like behavior for the specific memory ranges which would allow a [1] http://lkml.kernel.org/r/20180820032204.9591-1-aarcange@redhat.com Mel said: : Both patches look correct to me but I'm responding to this one because : it's the fix. The change makes sense and moves further away from the : severe stalling behaviour we used to see with both THP and zone reclaim : mode. : : I put together a basic experiment with usemem configured to reference a : buffer multiple times that is 80% the size of main memory on a 2-socket : box with symmetric node sizes and defrag set to "always". The defrag : setting is not the default but it would be functionally similar to : accessing a buffer with madvise(MADV_HUGEPAGE). Usemem is configured to : reference the buffer multiple times and while it's not an interesting : workload, it would be expected to complete reasonably quickly as it fits : within memory. The results were; : : usemem : vanilla noreclaim-v1 : Amean Elapsd-1 42.78 ( 0.00%) 26.87 ( 37.18%) : Amean Elapsd-3 27.55 ( 0.00%) 7.44 ( 73.00%) : Amean Elapsd-4 5.72 ( 0.00%) 5.69 ( 0.45%) : : This shows the elapsed time in seconds for 1 thread, 3 threads and 4 : threads referencing buffers 80% the size of memory. With the patches : applied, it's 37.18% faster for the single thread and 73% faster with two : threads. Note that 4 threads showing little difference does not indicate : the problem is related to thread counts. It's simply the case that 4 : threads gets spread so their workload mostly fits in one node. : : The overall view from /proc/vmstats is more startling : : 4.19.0-rc1 4.19.0-rc1 : vanillanoreclaim-v1r1 : Minor Faults 35593425 708164 : Major Faults 484088 36 : Swap Ins 3772837 0 : Swap Outs 3932295 0 : : Massive amounts of swap in/out without the patch : : Direct pages scanned 6013214 0 : Kswapd pages scanned 0 0 : Kswapd pages reclaimed 0 0 : Direct pages reclaimed 4033009 0 : : Lots of reclaim activity without the patch : : Kswapd efficiency 100% 100% : Kswapd velocity 0.000 0.000 : Direct efficiency 67% 100% : Direct velocity 11191.956 0.000 : : Mostly from direct reclaim context as you'd expect without the patch. : : Page writes by reclaim 3932314.000 0.000 : Page writes file 19 0 : Page writes anon 3932295 0 : Page reclaim immediate 42336 0 : : Writes from reclaim context is never good but the patch eliminates it. : : We should never have default behaviour to thrash the system for such a : basic workload. If zone reclaim mode behaviour is ever desired but on a : single task instead of a global basis then the sensible option is to build : a mempolicy that enforces that behaviour. This was a severe regression compared to previous kernels that made important workloads unusable and it starts when __GFP_THISNODE was added to THP allocations under MADV_HUGEPAGE. It is not a significant risk to go to the previous behavior before __GFP_THISNODE was added, it worked like that for years. This was simply an optimization to some lucky workloads that can fit in a single node, but it ended up breaking the VM for others that can't possibly fit in a single node, so going back is safe. [mhocko@suse.com: rewrote the changelog based on the one from Andrea] Link: http://lkml.kernel.org/r/20180925120326.24392-2-mhocko@kernel.org Fixes: 5265047ac301 ("mm, thp: really limit transparent hugepage allocation to local node") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Stefan Priebe <s.priebe@profihost.ag> Debugged-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Mel Gorman <mgorman@techsingularity.net> Tested-by: Mel Gorman <mgorman@techsingularity.net> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: <stable@vger.kernel.org> [4.1+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03include/linux/notifier.h: SRCU: fix ctagsSam Protsenko1-2/+1
ctags indexing ("make tags" command) throws this warning: ctags: Warning: include/linux/notifier.h:125: null expansion of name pattern "\1" This is the result of DEFINE_PER_CPU() macro expansion. Fix that by getting rid of line break. Similar fix was already done in commit 25528213fe9f ("tags: Fix DEFINE_PER_CPU expansions"), but this one probably wasn't noticed. Link: http://lkml.kernel.org/r/20181030202808.28027-1-semen.protsenko@linaro.org Fixes: 9c80172b902d ("kernel/SRCU: provide a static initializer") Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03mm: handle no memcg case in memcg_kmem_charge() properlyRoman Gushchin1-1/+1
Mike Galbraith reported a regression caused by the commit 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting") on a system with "cgroup_disable=memory" boot option: the system panics with the following stack trace: BUG: unable to handle kernel NULL pointer dereference at 00000000000000f8 PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP PTI CPU: 0 PID: 1 Comm: systemd Not tainted 4.19.0-preempt+ #410 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fed4 RIP: 0010:page_counter_try_charge+0x22/0xc0 Code: 41 5d c3 c3 0f 1f 40 00 0f 1f 44 00 00 48 85 ff 0f 84 a7 00 00 00 41 56 48 89 f8 49 89 fe 49 Call Trace: try_charge+0xcb/0x780 memcg_kmem_charge_memcg+0x28/0x80 memcg_kmem_charge+0x8b/0x1d0 copy_process.part.41+0x1ca/0x2070 _do_fork+0xd7/0x3d0 do_syscall_64+0x5a/0x180 entry_SYSCALL_64_after_hwframe+0x49/0xbe The problem occurs because get_mem_cgroup_from_current() returns the NULL pointer if memory controller is disabled. Let's check if this is a case at the beginning of memcg_kmem_charge() and just return 0 if mem_cgroup_disabled() returns true. This is how we handle this case in many other places in the memory controller code. Link: http://lkml.kernel.org/r/20181029215123.17830-1-guro@fb.com Fixes: 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting") Signed-off-by: Roman Gushchin <guro@fb.com> Reported-by: Mike Galbraith <efault@gmx.de> Acked-by: Rik van Riel <riel@surriel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-02Merge tag 'omap-for-v4.20/omap1-fix-signed' of ↵Olof Johansson1-6/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes Fix for omap1 ams-delta irq We need to use IRQ_NOTCONNECTED instead of -EINVAL for ams_delta_modem_ports irq. * tag 'omap-for-v4.20/omap1-fix-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: OMAP1: ams-delta: Fix impossible .irq < 0 Signed-off-by: Olof Johansson <olof@lixom.net>