summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2016-03-09KVM: arm/arm64: vgic-v2: Avoid accessing GICH registersMarc Zyngier1-0/+2
GICv2 registers are *slow*. As in "terrifyingly slow". Which is bad. But we're equaly bad, as we make a point in accessing them even if we don't have any interrupt in flight. A good solution is to first find out if we have anything useful to write into the GIC, and if we don't, to simply not do it. This involves tracking which LRs actually have something valid there. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-29KVM: arm/arm64: timer: Add active state cachingMarc Zyngier1-0/+5
Programming the active state in the (re)distributor can be an expensive operation so it makes some sense to try and reduce the number of accesses as much as possible. So far, we program the active state on each VM entry, but there is some opportunity to do less. An obvious solution is to cache the active state in memory, and only program it in the HW when conditions change. But because the HW can also change things under our feet (the active state can transition from 1 to 0 when the guest does an EOI), some precautions have to be taken, which amount to only caching an "inactive" state, and always programing it otherwise. With this in place, we observe a reduction of around 700 cycles on a 2GHz GICv2 platform for a NULL hypercall. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-29arm64: KVM: Add a new vcpu device control group for PMUv3Shannon Zhao1-0/+23
To configure the virtual PMUv3 overflow interrupt number, we use the vcpu kvm_device ioctl, encapsulating the KVM_ARM_VCPU_PMU_V3_IRQ attribute within the KVM_ARM_VCPU_PMU_V3_CTRL group. After configuring the PMUv3, call the vcpu ioctl with attribute KVM_ARM_VCPU_PMU_V3_INIT to initialize the PMUv3. Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Acked-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-29arm64: KVM: Introduce per-vcpu kvm device controlsShannon Zhao1-0/+1
In some cases it needs to get/set attributes specific to a vcpu and so needs something else than ONE_REG. Let's copy the KVM_DEVICE approach, and define the respective ioctls for the vcpu file descriptor. Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Acked-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-29arm64: KVM: Add a new feature bit for PMUv3Shannon Zhao2-0/+3
To support guest PMUv3, use one bit of the VCPU INIT feature array. Initialize the PMU when initialzing the vcpu with that bit and PMU overflow interrupt set. Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Acked-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-29arm64: KVM: Free perf event of PMU when destroying vcpuShannon Zhao1-0/+2
When KVM frees VCPU, it needs to free the perf_event of PMU. Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-29arm64: KVM: Reset PMU state when resetting vcpuShannon Zhao1-0/+2
When resetting vcpu, it needs to reset the PMU state to initial status. Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-29arm64: KVM: Add PMU overflow interrupt routingShannon Zhao1-0/+5
When calling perf_event_create_kernel_counter to create perf_event, assign a overflow handler. Then when the perf event overflows, set the corresponding bit of guest PMOVSSET register. If this counter is enabled and its interrupt is enabled as well, kick the vcpu to sync the interrupt. On VM entry, if there is counter overflowed and interrupt level is changed, inject the interrupt with corresponding level. On VM exit, sync the interrupt level as well if it has been changed. Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-29arm64: KVM: Add helper to handle PMCR register bitsShannon Zhao1-0/+2
According to ARMv8 spec, when writing 1 to PMCR.E, all counters are enabled by PMCNTENSET, while writing 0 to PMCR.E, all counters are disabled. When writing 1 to PMCR.P, reset all event counters, not including PMCCNTR, to zero. When writing 1 to PMCR.C, reset PMCCNTR to zero. Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-29arm64: KVM: Add access handler for PMSWINC registerShannon Zhao1-0/+2
Add access handler which emulates writing and reading PMSWINC register and add support for creating software increment event. Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-29arm64: KVM: Add access handler for PMOVSSET and PMOVSCLR registerShannon Zhao1-0/+2
Since the reset value of PMOVSSET and PMOVSCLR is UNKNOWN, use reset_unknown for its reset handler. Add a handler to emulate writing PMOVSSET or PMOVSCLR register. When writing non-zero value to PMOVSSET, the counter and its interrupt is enabled, kick this vcpu to sync PMU interrupt. Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-29arm64: KVM: PMU: Add perf event map and introduce perf event creating functionShannon Zhao1-0/+4
When we use tools like perf on host, perf passes the event type and the id of this event type category to kernel, then kernel will map them to hardware event number and write this number to PMU PMEVTYPER<n>_EL0 register. When getting the event number in KVM, directly use raw event type to create a perf_event for it. Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-29arm64: KVM: Add access handler for PMCNTENSET and PMCNTENCLR registerShannon Zhao1-0/+9
Since the reset value of PMCNTENSET and PMCNTENCLR is UNKNOWN, use reset_unknown for its reset handler. Add a handler to emulate writing PMCNTENSET or PMCNTENCLR register. When writing to PMCNTENSET, call perf_event_enable to enable the perf event. When writing to PMCNTENCLR, call perf_event_disable to disable the perf event. Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-29arm64: KVM: Add access handler for event counter registerShannon Zhao1-0/+11
These kind of registers include PMEVCNTRn, PMCCNTR and PMXEVCNTR which is mapped to PMEVCNTRn. The access handler translates all aarch32 register offsets to aarch64 ones and uses vcpu_sys_reg() to access their values to avoid taking care of big endian. When reading these registers, return the sum of register value and the value perf event counts. Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-29arm64: KVM: Add access handler for PMCR registerShannon Zhao1-0/+4
Add reset handler which gets host value of PMCR_EL0 and make writable bits architecturally UNKNOWN except PMCR.E which is zero. Add an access handler for PMCR. Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-29arm64: KVM: Define PMU data structure for each vcpuShannon Zhao1-0/+42
Here we plan to support virtual PMU for guest by full software emulation, so define some basic structs and functions preparing for futher steps. Define struct kvm_pmc for performance monitor counter and struct kvm_pmu for performance monitor unit for each vcpu. According to ARMv8 spec, the PMU contains at most 32(ARMV8_PMU_MAX_COUNTERS) counters. Since this only supports ARM64 (or PMUv3), add a separate config symbol for it. Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-02-28Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds1-3/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "A rather largish series of 12 patches addressing a maze of race conditions in the perf core code from Peter Zijlstra" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Robustify task_function_call() perf: Fix scaling vs. perf_install_in_context() perf: Fix scaling vs. perf_event_enable() perf: Fix scaling vs. perf_event_enable_on_exec() perf: Fix ctx time tracking by introducing EVENT_TIME perf: Cure event->pending_disable race perf: Fix race between event install and jump_labels perf: Fix cloning perf: Only update context time when active perf: Allow perf_release() with !event->ctx perf: Do not double free perf: Close install vs. exit race
2016-02-27Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-3/+6
Merge fixes from Andrew Morton: "10 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: dax: move writeback calls into the filesystems dax: give DAX clearing code correct bdev ext4: online defrag not supported with DAX ext2, ext4: only set S_DAX for regular inodes block: disable block device DAX by default ocfs2: unlock inode if deleting inode from orphan fails mm: ASLR: use get_random_long() drivers: char: random: add get_random_long() mm: numa: quickly fail allocations for NUMA balancing on full nodes mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED
2016-02-27Merge tag 'pci-v4.5-fixes-3' of ↵Linus Torvalds1-17/+0
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: "Enumeration: Revert x86 pcibios_alloc_irq() to fix regression (Bjorn Helgaas) Marvell MVEBU host bridge driver: Restrict build to 32-bit ARM (Thierry Reding)" * tag 'pci-v4.5-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: mvebu: Restrict build to 32-bit ARM Revert "PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()" Revert "PCI: Add helpers to manage pci_dev->irq and pci_dev->irq_managed" Revert "x86/PCI: Don't alloc pcibios-irq when MSI is enabled"
2016-02-27dax: move writeback calls into the filesystemsRoss Zwisler1-2/+4
Previously calls to dax_writeback_mapping_range() for all DAX filesystems (ext2, ext4 & xfs) were centralized in filemap_write_and_wait_range(). dax_writeback_mapping_range() needs a struct block_device, and it used to get that from inode->i_sb->s_bdev. This is correct for normal inodes mounted on ext2, ext4 and XFS filesystems, but is incorrect for DAX raw block devices and for XFS real-time files. Instead, call dax_writeback_mapping_range() directly from the filesystem ->writepages function so that it can supply us with a valid block device. This also fixes DAX code to properly flush caches in response to sync(2). Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jens Axboe <axboe@fb.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27dax: give DAX clearing code correct bdevRoss Zwisler1-1/+1
dax_clear_blocks() needs a valid struct block_device and previously it was using inode->i_sb->s_bdev in all cases. This is correct for normal inodes on mounted ext2, ext4 and XFS filesystems, but is incorrect for DAX raw block devices and for XFS real-time devices. Instead, rename dax_clear_blocks() to dax_clear_sectors(), and change its arguments to take a bdev and a sector instead of an inode and a block. This better reflects what the function does, and it allows the filesystem and raw block device code to pass in an appropriate struct block_device. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Suggested-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Dave Chinner <david@fromorbit.com> Cc: Jens Axboe <axboe@fb.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27drivers: char: random: add get_random_long()Daniel Cashman1-0/+1
Commit d07e22597d1d ("mm: mmap: add new /proc tunable for mmap_base ASLR") added the ability to choose from a range of values to use for entropy count in generating the random offset to the mmap_base address. The maximum value on this range was set to 32 bits for 64-bit x86 systems, but this value could be increased further, requiring more than the 32 bits of randomness provided by get_random_int(), as is already possible for arm64. Add a new function: get_random_long() which more naturally fits with the mmap usage of get_random_int() but operates exactly the same as get_random_int(). Also, fix the shifting constant in mmap_rnd() to be an unsigned long so that values greater than 31 bits generate an appropriate mask without overflow. This is especially important on x86, as its shift instruction uses a 5-bit mask for the shift operand, which meant that any value for mmap_rnd_bits over 31 acts as a no-op and effectively disables mmap_base randomization. Finally, replace calls to get_random_int() with get_random_long() where appropriate. This patch (of 2): Add get_random_long(). Signed-off-by: Daniel Cashman <dcashman@android.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: David S. Miller <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Nick Kralevich <nnk@google.com> Cc: Jeff Vander Stoep <jeffv@google.com> Cc: Mark Salyzyn <salyzyn@android.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-26Merge tag 'sound-4.5-rc6' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Things got calmed down for rc6, as it seems, and we have only a few HD-audio fixes at this time: a fix for Skylake codec probe errors, a fix for missing interrupt handling, and a few Dell and HP quirks" * tag 'sound-4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda - Loop interrupt handling until really cleared ALSA: hda - Fix headset support and noise on HP EliteBook 755 G2 ALSA: hda - Fixup speaker pass-through control for nid 0x14 on ALC225 ALSA: hda - Fixing background noise on Dell Inspiron 3162 ALSA: hda - Apply clock gate workaround to Skylake, too
2016-02-26ALSA: hda - Loop interrupt handling until really clearedTakashi Iwai1-1/+1
Currently the interrupt handler of HD-audio driver assumes that no irq update is needed while processing the irq. But in reality, it has been confirmed that the HW irq is issued even during the irq handling. Since we clear the irq status at the beginning, process the interrupt, then exits from the handler, the lately issued interrupt is left untouched without being properly processed. This patch changes the interrupt handler code to loop over the check-and-process. The handler tries repeatedly as long as the IRQ status are turned on, and either stream or CORB/RIRB is handled. For checking the stream handling, snd_hdac_bus_handle_stream_irq() returns a value indicating the stream indices bits. Other than that, the change is only in the irq handler itself. Reported-by: Libin Yang <libin.yang@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2016-02-25Merge branch 'libnvdimm-fixes' of ↵Linus Torvalds2-4/+10
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: - Two fixes for compatibility with the ACPI 6.1 specification. Without these fixes multi-interface DIMMs will fail to be probed, and address range scrub commands to find memory errors will give results that the kernel will mis-interpret. For multi-interface DIMMs Linux will accept either the original 6.0 implementation or 6.1. For address range scrub we'll only support 6.1 since ACPI formalized this DSM differently than the original example [1] implemented in v4.2. The expectation is that production systems will only ever ship the ACPI 6.1 address range scrub command definition. - The wider async address range scrub work targeting 4.6 discovered that the original synchronous implementation in 4.5 is not sizing its return buffer correctly. - Arnd caught that my recent fix to the size of the pfn_t flags missed updating the flags variable used in the pmem driver. - Toshi found that we mishandle the memremap() return value in devm_memremap(). * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: nvdimm: use 'u64' for pfn flags devm_memremap: Fix error value when memremap failed nfit: update address range scrub commands to the acpi 6.1 format libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing nfit: fix multi-interface dimm handling, acpi6.1 compatibility
2016-02-25Merge tag 'for-v4.5-rc' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply Pull power supply fixes from Sebastian Reichel: "Add a regression fix for changed sysfs path of bq27xxx_battery and update MAINTAINERS file" * tag 'for-v4.5-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: power: bq27xxx_battery: Restore device name MAINTAINERS: update bq27xxx driver
2016-02-25perf: Fix race between event install and jump_labelsPeter Zijlstra1-3/+3
perf_install_in_context() relies upon the context switch hooks to have scheduled in events when the IPI misses its target -- after all, if the task has moved from the CPU (or wasn't running at all), it will have to context switch to run elsewhere. This however doesn't appear to be happening. It is possible for the IPI to not happen (task wasn't running) only to later observe the task running with an inactive context. The only possible explanation is that the context switch hooks are not called. Therefore put in a sync_sched() after toggling the jump_label to guarantee all CPUs will have them enabled before we install an event. A simple if (0->1) sync_sched() will not in fact work, because any further increment can race and complete before the sync_sched(). Therefore we must jump through some hoops. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dvyukov@google.com Cc: eranian@google.com Cc: oleg@redhat.com Cc: panand@redhat.com Cc: sasha.levin@oracle.com Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160224174947.980211985@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-25perf: Fix cloningPeter Zijlstra1-0/+1
Alexander reported that when the 'original' context gets destroyed, no new clones happen. This can happen irrespective of the ctx switch optimization, any task can die, even the parent, and we want to continue monitoring the task hierarchy until we either close the event or no tasks are left in the hierarchy. perf_event_init_context() will attempt to pin the 'parent' context during clone(). At that point current is the parent, and since current cannot have exited while executing clone(), its context cannot have passed through perf_event_exit_task_context(). Therefore perf_pin_task_context() cannot observe ctx->task == TASK_TOMBSTONE. However, since inherit_event() does: if (parent_event->parent) parent_event = parent_event->parent; it looks at the 'original' event when it does: is_orphaned_event(). This can return true if the context that contains the this event has passed through perf_event_exit_task_context(). And thus we'll fail to clone the perf context. Fix this by adding a new state: STATE_DEAD, which is set by perf_release() to indicate that the filedesc (or kernel reference) is dead and there are no observers for our data left. Only for STATE_DEAD will is_orphaned_event() be true and inhibit cloning. STATE_EXIT is otherwise preserved such that is_event_hup() remains functional and will report when the observed task hierarchy becomes empty. Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dvyukov@google.com Cc: eranian@google.com Cc: oleg@redhat.com Cc: panand@redhat.com Cc: sasha.levin@oracle.com Cc: vince@deater.net Fixes: c6e5b73242d2 ("perf: Synchronously clean up child events") Link: http://lkml.kernel.org/r/20160224174947.919845295@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-23nfit: update address range scrub commands to the acpi 6.1 formatDan Williams2-3/+10
The original format of these commands from the "NVDIMM DSM Interface Example" [1] are superseded by the ACPI 6.1 definition of the "NVDIMM Root Device _DSMs" [2]. [1]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf [2]: http://www.uefi.org/sites/default/files/resources/ACPI_6_1.pdf "9.20.7 NVDIMM Root Device _DSMs" Changes include: 1/ New 'restart' fields in ars_status, unfortunately these are implemented in the middle of the existing definition so this change is not backwards compatible. The expectation is that shipping platforms will only ever support the ACPI 6.1 definition. 2/ New status values for ars_start ('busy') and ars_status ('overflow'). Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Linda Knippers <linda.knippers@hpe.com> Cc: <stable@vger.kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-02-23Merge tag 'nfs-for-4.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2-3/+2
Pull NFS client bugfixes from Trond Myklebust: "Stable bugfixes: - Fix nfs_size_to_loff_t - NFSv4: Fix a dentry leak on alias use Other bugfixes: - Don't schedule a layoutreturn if the layout segment can be freed immediately. - Always set NFS_LAYOUT_RETURN_REQUESTED with lo->plh_return_iomode - rpcrdma_bc_receive_call() should init rq_private_buf.len - fix stateid handling for the NFS v4.2 operations - pnfs/blocklayout: fix a memeory leak when using,vmalloc_to_page - fix panic in gss_pipe_downcall() in fips mode - Fix a race between layoutget and pnfs_destroy_layout - Fix a race between layoutget and bulk recalls" * tag 'nfs-for-4.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4.x/pnfs: Fix a race between layoutget and bulk recalls NFSv4.x/pnfs: Fix a race between layoutget and pnfs_destroy_layout auth_gss: fix panic in gss_pipe_downcall() in fips mode pnfs/blocklayout: fix a memeory leak when using,vmalloc_to_page nfs4: fix stateid handling for the NFS v4.2 operations NFSv4: Fix a dentry leak on alias use xprtrdma: rpcrdma_bc_receive_call() should init rq_private_buf.len pNFS: Always set NFS_LAYOUT_RETURN_REQUESTED with lo->plh_return_iomode pNFS: Fix pnfs_mark_matching_lsegs_return() nfs: fix nfs_size_to_loff_t
2016-02-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds6-1488/+1505
Pull networking fixes from David Miller: "Looks like a lot, but mostly driver fixes scattered all over as usual. Of note: 1) Add conditional sched in nf conntrack in cleanup to avoid NMI watchdogs. From Florian Westphal. 2) Fix deadlock in nfnetlink cttimeout, also from Floarian. 3) Fix handling of slaves in bonding ARP monitor validation, from Jay Vosburgh. 4) Callers of ip_cmsg_send() are responsible for freeing IP options, some were not doing so. Fix from Eric Dumazet. 5) Fix per-cpu bugs in mvneta driver, from Gregory CLEMENT. 6) Fix vlan handling in mv88e6xxx DSA driver, from Vivien Didelot. 7) bcm7xxx PHY driver bug fixes from Florian Fainelli. 8) Avoid unaligned accesses to protocol headers wrt. GRE, from Alexander Duyck. 9) SKB leaks and other problems in arc_emac driver, from Alexander Kochetkov. 10) tcp_v4_inbound_md5_hash() releases listener socket instead of request socket on error path, oops. Fix from Eric Dumazet. 11) Missing socket release in pppoe_rcv_core() that seems to have existed basically forever. From Guillaume Nault. 12) Missing slave_dev unregister in dsa_slave_create() error path, from Florian Fainelli. 13) crypto_alloc_hash() never returns NULL, fix return value check in __tcp_alloc_md5sig_pool. From Insu Yun. 14) Properly expire exception route entries in ipv4, from Xin Long. 15) Fix races in tcp/dccp listener socket dismantle, from Eric Dumazet. 16) Don't set IFF_TX_SKB_SHARING in vxlan, geneve, or GRE, it's not legal. These drivers modify the SKB on transmit. From Jiri Benc. 17) Fix regression in the initialziation of netdev->tx_queue_len. From Phil Sutter. 18) Missing unlock in tipc_nl_add_bc_link() error path, from Insu Yun. 19) SCTP port hash sizing does not properly ensure that table is a power of two in size. From Neil Horman. 20) Fix initializing of software copy of MAC address in fmvj18x_cs driver, from Ken Kawasaki" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (129 commits) bnx2x: Fix 84833 phy command handler bnx2x: Fix led setting for 84858 phy. bnx2x: Correct 84858 PHY fw version bnx2x: Fix 84833 RX CRC bnx2x: Fix link-forcing for KR2 net: ethernet: davicom: fix devicetree irq resource fmvj18x_cs: fix incorrect indexing of dev->dev_addr[] when copying the MAC address Driver: Vmxnet3: Update Rx ring 2 max size net: netcp: rework the code for get/set sw_data in dma desc soc: ti: knav_dma: rename pad in struct knav_dma_desc to sw_data net: ti: netcp: restore get/set_pad_info() functionality MAINTAINERS: Drop myself as xen netback maintainer sctp: Fix port hash table size computation can: ems_usb: Fix possible tx overflow Bluetooth: hci_core: Avoid mixing up req_complete and req_complete_skb net: bcmgenet: Fix internal PHY link state af_unix: Don't use continue to re-execute unix_stream_read_generic loop unix_diag: fix incorrect sign extension in unix_lookup_by_ino bnxt_en: Failure to update PHY is not fatal condition. bnxt_en: Remove unnecessary call to update PHY settings. ...
2016-02-21soc: ti: knav_dma: rename pad in struct knav_dma_desc to sw_dataKaricheri, Muralidharan1-1/+3
Rename the pad to sw_data as per description of this field in the hardware spec(refer sprugr9 from www.ti.com). Latest version of the document is at http://www.ti.com/lit/ug/sprugr9h/sprugr9h.pdf and section 3.1 Host Packet Descriptor describes this field. Define and use a constant for the size of sw_data field similar to other fields in the struct for desc and document the sw_data field in the header. As the sw_data is not touched by hw, it's type can be changed to u32. Rename the helpers to match with the updated dma desc field sw_data. Cc: Wingman Kwok <w-kwok2@ti.com> Cc: Mugunthan V N <mugunthanvnm@ti.com> CC: Arnd Bergmann <arnd@arndb.de> CC: Grygorii Strashko <grygorii.strashko@ti.com> CC: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-21power: bq27xxx_battery: Restore device nameIvaylo Dimitrov1-0/+1
Patch <703df6c09795> ("power: bq27xxx_battery: Reorganize I2C into a module") has removed the device name numbering from bq27xxx_battery_i2c_probe. Fix that by restoring the code. Fixes: 703df6c097956d17a818e63961c82e8e9eef9fef Signed-off-by: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Tested-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Sebastian Reichel <sre@kernel.org>
2016-02-20Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2-1/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "This is unusually large, partly due to the EFI fixes that prevent accidental deletion of EFI variables through efivarfs that may brick machines. These fixes are somewhat involved to maintain compatibility with existing install methods and other usage modes, while trying to turn off the 'rm -rf' bricking vector. Other fixes are for large page ioremap()s and for non-temporal user-memcpy()s" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Fix vmalloc_fault() to handle large pages properly hpet: Drop stale URLs x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache() x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable lib/ucs2_string: Correct ucs2 -> utf8 conversion efi: Add pstore variables to the deletion whitelist efi: Make efivarfs entries immutable by default efi: Make our variable validation list include the guid efi: Do variable name validation tests in utf8 efi: Use ucs2_as_utf8 in efivarfs instead of open coding a bad version lib/ucs2_string: Add ucs2 -> utf8 helper functions
2016-02-20Merge tag 'powerpc-4.5-3' of ↵Linus Torvalds1-0/+8
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix build error on 32-bit with checkpoint restart from Aneesh Kumar - Fix dedotify for binutils >= 2.26 from Andreas Schwab - Don't trace hcalls on offline CPUs from Denis Kirjanov - eeh: Fix stale cached primary bus from Gavin Shan - eeh: Fix stale PE primary bus from Gavin Shan - mm: Fix Multi hit ERAT cause by recent THP update from Aneesh Kumar K.V - ioda: Set "read" permission when "write" is set from Alexey Kardashevskiy * tag 'powerpc-4.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/ioda: Set "read" permission when "write" is set powerpc/mm: Fix Multi hit ERAT cause by recent THP update powerpc/powernv: Fix stale PE primary bus powerpc/eeh: Fix stale cached primary bus powerpc/pseries: Don't trace hcalls on offline CPUs powerpc: Fix dedotify for binutils >= 2.26 powerpc/book3s_32: Fix build error with checkpoint restart
2016-02-20Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk driver fixes from Stephen Boyd: "An assortment of vendor specific clk drivers fixes, most notably fallout from adding Tegra210 and rockchip rk3036/rk3368 drivers this cycle. There's also the random smattering of sparse/checker fixes, a build "fix" to get the Tango clk driver to compile because the Kconfig symbol was renamed after the fact, and a clk gpio fix for a patch mismerge" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (28 commits) clk: gpio: Really allow an optional clock= DT property Revert "clk: qcom: Specify LE device endianness" clk: versatile: mask VCO bits before writing clk: tegra: super: Fix sparse warnings for functions not declared as static clk: tegra: Fix sparse warnings for functions not declared as static clk: tegra: Fix sparse warning for pll_m clk: tegra: Use definition for pll_u override bit clk: tegra: Fix warning caused by pll_u failing to lock clk: tegra: Fix clock sources for Tegra210 EMC clk: tegra: Add the APB2APE audio clock on Tegra210 clk: tegra: Add missing of_node_put() clk: tegra: Fix PLLE SS coefficients clk: tegra: Fix typos around clearing PLLE bits during enable clk: tegra: Do not disable PLLE when under hardware control clk: tegra: Fix pllx dyn step calculation clk: tegra: pll: Fix potential sleeping-while-atomic clk: tegra: Fix the misnaming of nvenc from msenc clk: tegra: Fix naming of MISC registers clk: tango4: rename ARCH_TANGOX to ARCH_TANGO clk: scpi: Fix checking return value of platform_device_register_simple() ...
2016-02-20Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds1-1/+7
Pull more drm fixes from Dave Airlie: "Some more fixes trickled in: A bunch of VC4 ones since it's a pretty new driver not much chance of regressions, and it fixes GPU resets. Also one atomic fix, one set of fixes for a common bug in TTM cleanup, and one i915 hotplug fix" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm/nouveau: use post-decrement in error handling drm/atomic: Allow for holes in connector state, v2. drm/i915: Fix hpd live status bits for g4x drm/vc4: Use runtime PM to power cycle the device when the GPU hangs. drm/vc4: Enable runtime PM. drm/vc4: Fix spurious GPU resets due to BO reuse. drm/vc4: Drop error message on seqno wait timeouts. drm/vc4: Fix -ERESTARTSYS error return from BO waits. drm/vc4: Return an ERR_PTR from BO creation instead of NULL. drm/vc4: Fix the clear color for the first tile rendered. drm/vc4: Validate that WAIT_BO padding is cleared. drm/radeon: use post-decrement in error handling drm/amdgpu: use post-decrement in error handling
2016-02-19libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizingDan Williams1-1/+0
Use the output length specified in the command to size the receive buffer rather than the arbitrary 4K limit. This bug was hiding the fact that the ndctl implementation of ndctl_bus_cmd_new_ars_status() was not specifying an output buffer size. Cc: <stable@vger.kernel.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-02-19Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-4/+1
Merge fixes from Andrew Morton: "10 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: slab: free kmem_cache_node after destroy sysfs file ipc/shm: handle removed segments gracefully in shm_mmap() MAINTAINERS: update Kselftest Framework mailing list devm_memremap_release(): fix memremap'd addr handling mm/hugetlb.c: fix incorrect proc nr_hugepages value mm, x86: fix pte_page() crash in gup_pte_range() fsnotify: turn fsnotify reaper thread into a workqueue job Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread" mm: fix regression in remap_file_pages() emulation thp, dax: do not try to withdraw pgtable from non-anon VMA
2016-02-19net: make netdev_for_each_lower_dev safe for device removalNikolay Aleksandrov1-1/+1
When I used netdev_for_each_lower_dev in commit bad531623253 ("vrf: remove slave queue and private slave struct") I thought that it acts like netdev_for_each_lower_private and can be used to remove the current device from the list while walking, but unfortunately it acts more like netdev_for_each_lower_private_rcu and doesn't allow it. The difference is where the "iter" points to, right now it points to the current element and that makes it impossible to remove it. Change the logic to be similar to netdev_for_each_lower_private and make it point to the "next" element so we can safely delete the current one. VRF is the only such user right now, there's no change for the read-only users. Here's what can happen now: [98423.249858] general protection fault: 0000 [#1] SMP [98423.250175] Modules linked in: vrf bridge(O) stp llc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace sunrpc crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel jitterentropy_rng sha256_generic hmac drbg ppdev aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd evdev serio_raw pcspkr virtio_balloon parport_pc parport i2c_piix4 i2c_core virtio_console acpi_cpufreq button 9pnet_virtio 9p 9pnet fscache ipv6 autofs4 ext4 crc16 mbcache jbd2 sg virtio_blk virtio_net sr_mod cdrom e1000 ata_generic ehci_pci uhci_hcd ehci_hcd usbcore usb_common virtio_pci ata_piix libata floppy virtio_ring virtio scsi_mod [last unloaded: bridge] [98423.255040] CPU: 1 PID: 14173 Comm: ip Tainted: G O 4.5.0-rc2+ #81 [98423.255386] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014 [98423.255777] task: ffff8800547f5540 ti: ffff88003428c000 task.ti: ffff88003428c000 [98423.256123] RIP: 0010:[<ffffffff81514f3e>] [<ffffffff81514f3e>] netdev_lower_get_next+0x1e/0x30 [98423.256534] RSP: 0018:ffff88003428f940 EFLAGS: 00010207 [98423.256766] RAX: 0002000100000004 RBX: ffff880054ff9000 RCX: 0000000000000000 [98423.257039] RDX: ffff88003428f8b8 RSI: ffff88003428f950 RDI: ffff880054ff90c0 [98423.257287] RBP: ffff88003428f940 R08: 0000000000000000 R09: 0000000000000000 [98423.257537] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88003428f9e0 [98423.257802] R13: ffff880054a5fd00 R14: ffff88003428f970 R15: 0000000000000001 [98423.258055] FS: 00007f3d76881700(0000) GS:ffff88005d000000(0000) knlGS:0000000000000000 [98423.258418] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [98423.258650] CR2: 00007ffe5951ffa8 CR3: 0000000052077000 CR4: 00000000000406e0 [98423.258902] Stack: [98423.259075] ffff88003428f960 ffffffffa0442636 0002000100000004 ffff880054ff9000 [98423.259647] ffff88003428f9b0 ffffffff81518205 ffff880054ff9000 ffff88003428f978 [98423.260208] ffff88003428f978 ffff88003428f9e0 ffff88003428f9e0 ffff880035b35f00 [98423.260739] Call Trace: [98423.260920] [<ffffffffa0442636>] vrf_dev_uninit+0x76/0xa0 [vrf] [98423.261156] [<ffffffff81518205>] rollback_registered_many+0x205/0x390 [98423.261401] [<ffffffff815183ec>] unregister_netdevice_many+0x1c/0x70 [98423.261641] [<ffffffff8153223c>] rtnl_delete_link+0x3c/0x50 [98423.271557] [<ffffffff815335bb>] rtnl_dellink+0xcb/0x1d0 [98423.271800] [<ffffffff811cd7da>] ? __inc_zone_state+0x4a/0x90 [98423.272049] [<ffffffff815337b4>] rtnetlink_rcv_msg+0x84/0x200 [98423.272279] [<ffffffff810cfe7d>] ? trace_hardirqs_on+0xd/0x10 [98423.272513] [<ffffffff8153370b>] ? rtnetlink_rcv+0x1b/0x40 [98423.272755] [<ffffffff81533730>] ? rtnetlink_rcv+0x40/0x40 [98423.272983] [<ffffffff8155d6e7>] netlink_rcv_skb+0x97/0xb0 [98423.273209] [<ffffffff8153371a>] rtnetlink_rcv+0x2a/0x40 [98423.273476] [<ffffffff8155ce8b>] netlink_unicast+0x11b/0x1a0 [98423.273710] [<ffffffff8155d2f1>] netlink_sendmsg+0x3e1/0x610 [98423.273947] [<ffffffff814fbc98>] sock_sendmsg+0x38/0x70 [98423.274175] [<ffffffff814fc253>] ___sys_sendmsg+0x2e3/0x2f0 [98423.274416] [<ffffffff810d841e>] ? do_raw_spin_unlock+0xbe/0x140 [98423.274658] [<ffffffff811e1bec>] ? handle_mm_fault+0x26c/0x2210 [98423.274894] [<ffffffff811e19cd>] ? handle_mm_fault+0x4d/0x2210 [98423.275130] [<ffffffff81269611>] ? __fget_light+0x91/0xb0 [98423.275365] [<ffffffff814fcd42>] __sys_sendmsg+0x42/0x80 [98423.275595] [<ffffffff814fcd92>] SyS_sendmsg+0x12/0x20 [98423.275827] [<ffffffff81611bb6>] entry_SYSCALL_64_fastpath+0x16/0x7a [98423.276073] Code: c3 31 c0 5d c3 0f 1f 84 00 00 00 00 00 66 66 66 66 90 48 8b 06 55 48 81 c7 c0 00 00 00 48 89 e5 48 8b 00 48 39 f8 74 09 48 89 06 <48> 8b 40 e8 5d c3 31 c0 5d c3 0f 1f 84 00 00 00 00 00 66 66 66 [98423.279639] RIP [<ffffffff81514f3e>] netdev_lower_get_next+0x1e/0x30 [98423.279920] RSP <ffff88003428f940> CC: David Ahern <dsa@cumulusnetworks.com> CC: David S. Miller <davem@davemloft.net> CC: Roopa Prabhu <roopa@cumulusnetworks.com> CC: Vlad Yasevich <vyasevic@redhat.com> Fixes: bad531623253 ("vrf: remove slave queue and private slave struct") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: David Ahern <dsa@cumulusnetworks.com> Tested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-19drm/atomic: Allow for holes in connector state, v2.Maarten Lankhorst1-1/+7
Because we record connector_mask using 1 << drm_connector_index now the connector_mask should stay the same even when other connectors are removed. This was not the case with MST, in that case when removing a connector all other connectors may change their index. This is fixed by waiting until the first get_connector_state to allocate connector_state, and force reallocation when state is too small. As a side effect connector arrays no longer have to be preallocated, and can be allocated on first use which means a less allocations in the page flip only path. Changes since v1: - Whitespace. (Ville) - Call ida_remove when destroying the connector. (Ville) - u32 alloc -> int. (Ville) Fixes: 14de6c44d149 ("drm/atomic: Remove drm_atomic_connectors_for_crtc.") Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Lyude <cpaul@redhat.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-02-18Merge branch 'for-linus' of ↵Linus Torvalds1-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatching fixes from Jiri Kosina: - regression (from 4.4) fix for ordering issue, introduced by an earlier ftrace change, that broke live patching of modules. The fix replaces the ftrace module notifier by direct call in order to make the ordering guaranteed and well-defined. The patch, from Jessica Yu, has been acked both by Steven and Rusty - error message fix from Miroslav Benes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: ftrace/module: remove ftrace module notifier livepatch: change the error message in asm/livepatch.h header files
2016-02-18Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread"Jeff Layton1-4/+1
This reverts commit c510eff6beba ("fsnotify: destroy marks with call_srcu instead of dedicated thread"). Eryu reported that he was seeing some OOM kills kick in when running a testcase that adds and removes inotify marks on a file in a tight loop. The above commit changed the code to use call_srcu to clean up the marks. While that does (in principle) work, the srcu callback job is limited to cleaning up entries in small batches and only once per jiffy. It's easily possible to overwhelm that machinery with too many call_srcu callbacks, and Eryu's reproduer did just that. There's also another potential problem with using call_srcu here. While you can obviously sleep while holding the srcu_read_lock, the callbacks run under local_bh_disable, so you can't sleep there. It's possible when putting the last reference to the fsnotify_mark that we'll end up putting a chain of references including the fsnotify_group, uid, and associated keys. While I don't see any obvious ways that that could occurs, it's probably still best to avoid using call_srcu here after all. This patch reverts the above patch. A later patch will take a different approach to eliminated the dedicated thread here. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reported-by: Eryu Guan <guaneryu@gmail.com> Tested-by: Eryu Guan <guaneryu@gmail.com> Cc: Jan Kara <jack@suse.com> Cc: Eric Paris <eparis@parisplace.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-18tcp/dccp: fix another race at listener dismantleEric Dumazet1-2/+3
Ilya reported following lockdep splat: kernel: ========================= kernel: [ BUG: held lock freed! ] kernel: 4.5.0-rc1-ceph-00026-g5e0a311 #1 Not tainted kernel: ------------------------- kernel: swapper/5/0 is freeing memory ffff880035c9d200-ffff880035c9dbff, with a lock still held there! kernel: (&(&queue->rskq_lock)->rlock){+.-...}, at: [<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0 kernel: 4 locks held by swapper/5/0: kernel: #0: (rcu_read_lock){......}, at: [<ffffffff8169ef6b>] netif_receive_skb_internal+0x4b/0x1f0 kernel: #1: (rcu_read_lock){......}, at: [<ffffffff816e977f>] ip_local_deliver_finish+0x3f/0x380 kernel: #2: (slock-AF_INET){+.-...}, at: [<ffffffff81685ffb>] sk_clone_lock+0x19b/0x440 kernel: #3: (&(&queue->rskq_lock)->rlock){+.-...}, at: [<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0 To properly fix this issue, inet_csk_reqsk_queue_add() needs to return to its callers if the child as been queued into accept queue. We also need to make sure listener is still there before calling sk->sk_data_ready(), by holding a reference on it, since the reference carried by the child can disappear as soon as the child is put on accept queue. Reported-by: Ilya Dryomov <idryomov@gmail.com> Fixes: ebb516af60e1 ("tcp/dccp: fix race at listener dismantle phase") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18route: check and remove route cache when we get routeXin Long1-0/+1
Since the gc of ipv4 route was removed, the route cached would has no chance to be removed, and even it has been timeout, it still could be used, cause no code to check it's expires. Fix this issue by checking and removing route cache when we get route. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17Revert "PCI: Add helpers to manage pci_dev->irq and pci_dev->irq_managed"Bjorn Helgaas1-17/+0
Revert 811a4e6fce09 ("PCI: Add helpers to manage pci_dev->irq and pci_dev->irq_managed"). This is part of reverting 991de2e59090 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()") to fix regressions it introduced. Link: https://bugzilla.kernel.org/show_bug.cgi?id=111211 Fixes: 991de2e59090 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> CC: Jiang Liu <jiang.liu@linux.intel.com>
2016-02-17ftrace/module: remove ftrace module notifierJessica Yu1-2/+4
Remove the ftrace module notifier in favor of directly calling ftrace_module_enable() and ftrace_release_mod() in the module loader. Hard-coding the function calls directly in the module loader removes dependence on the module notifier call chain and provides better visibility and control over what gets called when, which is important to kernel utilities such as livepatch. This fixes a notifier ordering issue in which the ftrace module notifier (and hence ftrace_module_enable()) for coming modules was being called after klp_module_notify(), which caused livepatch modules to initialize incorrectly. This patch removes dependence on the module notifier call chain in favor of hard coding the corresponding function calls in the module loader. This ensures that ftrace and livepatch code get called in the correct order on patch module load and unload. Fixes: 5156dca34a3e ("ftrace: Fix the race between ftrace and insmod") Signed-off-by: Jessica Yu <jeyu@redhat.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Petr Mladek <pmladek@suse.cz> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-02-17Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+4
Pull block fixes from Jens Axboe: "A collection of fixes from the past few weeks that should go into 4.5. This contains: - Overflow fix for sysfs discard show function from Alan. - A stacking limit init fix for max_dev_sectors, so we don't end up artificially capping some use cases. From Keith. - Have blk-mq proper end unstarted requests on a dying queue, instead of pushing that to the driver. From Keith. - NVMe: - Update to Kconfig description for NVME_SCSI, since it was vague and having it on is important for some SUSE distros. From Christoph. - Set of fixes from Keith, around surprise removal. Also kills the no-merge flag, so it supports merging. - Set of fixes for lightnvm from Matias, Javier, and Wenwei. - Fix null_blk oops when asked for lightnvm, but not available. From Matias. - Copy-to-user EINTR fix from Hannes, fixing a case where SG_IO fails if interrupted by a signal. - Two floppy fixes from Jiri, fixing signal handling and blocking open. - A use-after-free fix for O_DIRECT, from Mike Krinkin. - A block module ref count fix from Roman Pen. - An fs IO wait accounting fix for O_DSYNC from Stephane Gasparini. - Smaller reallo fix for xen-blkfront from Bob Liu. - Removal of an unused struct member in the deadline IO scheduler, from Tahsin. - Also from Tahsin, properly initialize inode struct members associated with cgroup writeback, if enabled. - From Tejun, ensure that we keep the superblock pinned during cgroup writeback" * 'for-linus' of git://git.kernel.dk/linux-block: (25 commits) blk: fix overflow in queue_discard_max_hw_show writeback: initialize inode members that track writeback history writeback: keep superblock pinned during cgroup writeback association switches bio: return EINTR if copying to user space got interrupted NVMe: Rate limit nvme IO warnings NVMe: Poll device while still active during remove NVMe: Requeue requests on suspended queues NVMe: Allow request merges NVMe: Fix io incapable return values blk-mq: End unstarted requests on dying queue block: Initialize max_dev_sectors to 0 null_blk: oops when initializing without lightnvm block: fix module reference leak on put_disk() call for cgroups throttle nvme: fix Kconfig description for BLK_DEV_NVME_SCSI kernel/fs: fix I/O wait not accounted for RW O_DSYNC floppy: refactor open() flags handling lightnvm: allow to force mm initialization lightnvm: check overflow and correct mlc pairs lightnvm: fix request intersection locking in rrpc lightnvm: warn if irqs are disabled in lock laddr ...
2016-02-17Merge tag 'trace-fixes-v4.5-rc4' of ↵Linus Torvalds2-1/+6
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "This includes two fixes. The first is something that has come up a few times and has been worked out individually, but it's come up now enough that the problem should be generic. Tracepoints are protected by RCU sched. There are several tracepoints within core infrastructure like kfree(). If a tracepoint is called when the CPU is going down, or when it's coming up but has yet to be recognized by RCU, a RCU warning is triggered. This is a true bug as that tracepoint is not protected by RCU. Usually, this is taken care of by testing for cpu online as a tracepoint condition. But as this is happening more often, moving it from a individual tracepoint to a check in the tracepoint infrastructure is more robust. Note, there is now a duplicate of a cpu online test, because this update does not remove the individual checks. But the overhead is small enough that the removal can be done in another release. The second change is strange linker breakage due to the branch tracer's builtin_constant_p() check failing, and treating the condition as a variable instead of a constant. Arnd Bergmann found that this can be fixed by testing !!(cond) instead of just (cond)" * tag 'trace-fixes-v4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix freak link error caused by branch tracer tracepoints: Do not trace when cpu is offline
2016-02-17pnfs/blocklayout: fix a memeory leak when using,vmalloc_to_pageKinglong Mee1-0/+1
unreferenced object 0xffffc90000abf000 (size 16900): comm "fsync02", pid 15765, jiffies 4297431627 (age 423.772s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 a0 c2 19 00 88 ff ff ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8174d54e>] kmemleak_alloc+0x4e/0xb0 [<ffffffff811b9b91>] __vmalloc_node_range+0x231/0x280 [<ffffffff811b9c2a>] __vmalloc+0x4a/0x50 [<ffffffffa02c9ec1>] ext_tree_prepare_commit+0x231/0x2e0 [blocklayoutdriver] [<ffffffffa02c700e>] bl_prepare_layoutcommit+0xe/0x10 [blocklayoutdriver] [<ffffffffa0596a6c>] pnfs_layoutcommit_inode+0x29c/0x330 [nfsv4] [<ffffffffa0596b13>] pnfs_generic_sync+0x13/0x20 [nfsv4] [<ffffffffa0585188>] nfs4_file_fsync+0x58/0x150 [nfsv4] [<ffffffff81228e5b>] vfs_fsync_range+0x4b/0xb0 [<ffffffff81228f1d>] do_fsync+0x3d/0x70 [<ffffffff812291d0>] SyS_fsync+0x10/0x20 [<ffffffff81757def>] entry_SYSCALL_64_fastpath+0x12/0x76 [<ffffffffffffffff>] 0xffffffffffffffff v2, add missing include header Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>