summaryrefslogtreecommitdiffstats
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2017-05-01Merge branch 'x86-boot-for-linus' of ↵Linus Torvalds1-52/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: "The biggest changes in this cycle were: - reworking of the e820 code: separate in-kernel and boot-ABI data structures and apply a whole range of cleanups to the kernel side. No change in functionality. - enable KASLR by default: it's used by all major distros and it's out of the experimental stage as well. - ... misc fixes and cleanups" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits) x86/KASLR: Fix kexec kernel boot crash when KASLR randomization fails x86/reboot: Turn off KVM when halting a CPU x86/boot: Fix BSS corruption/overwrite bug in early x86 kernel startup x86: Enable KASLR by default boot/param: Move next_arg() function to lib/cmdline.c for later reuse x86/boot: Fix Sparse warning by including required header file x86/boot/64: Rename start_cpu() x86/xen: Update e820 table handling to the new core x86 E820 code x86/boot: Fix pr_debug() API braindamage xen, x86/headers: Add <linux/device.h> dependency to <asm/xen/page.h> x86/boot/e820: Simplify e820__update_table() x86/boot/e820: Separate the E820 ABI structures from the in-kernel structures x86/boot/e820: Fix and clean up e820_type switch() statements x86/boot/e820: Rename the remaining E820 APIs to the e820__*() prefix x86/boot/e820: Remove unnecessary #include's x86/boot/e820: Rename e820_mark_nosave_regions() to e820__register_nosave_regions() x86/boot/e820: Rename e820_reserve_resources*() to e820__reserve_resources*() x86/boot/e820: Use bool in query APIs x86/boot/e820: Document e820__reserve_setup_data() x86/boot/e820: Clean up __e820__update_table() et al ...
2017-05-01Merge branch 'perf-core-for-linus' of ↵Linus Torvalds8-28/+208
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "The main changes in this cycle were: Kernel side changes: - Kprobes and uprobes changes: - Make their trampolines read-only while they are used - Make UPROBES_EVENTS default-y which is the distro practice - Apply misc fixes and robustization to probe point insertion. - add support for AMD IOMMU events - extend hw events on Intel Goldmont CPUs - ... plus misc fixes and updates. Tooling side changes: - support s390 jump instructions in perf annotate (Christian Borntraeger) - vendor hardware events updates (Andi Kleen) - add argument support for SDT events in powerpc (Ravi Bangoria) - beautify the statx syscall arguments in 'perf trace' (Arnaldo Carvalho de Melo) - handle inline functions in callchains (Jin Yao) - enable sorting by srcline as key (Milian Wolff) - add 'brstackinsn' field in 'perf script' to reuse the x86 instruction decoder used in the Intel PT code to study hot paths to samples (Andi Kleen) - add PERF_RECORD_NAMESPACES so that the kernel can record information required to associate samples to namespaces, helping in container problem characterization. (Hari Bathini) - allow sorting by symbol_size in 'perf report' and 'perf top' (Charles Baylis) - in perf stat, make system wide (-a) the default option if no target was specified and one of following conditions is met: - no workload specified (current behaviour) - a workload is specified but all requested events are system wide ones, like uncore ones. (Jiri Olsa) - ... plus lots of other updates, enhancements, cleanups and fixes" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (235 commits) perf tools: Fix the code to strip command name tools arch x86: Sync cpufeatures.h tools arch: Sync arch/x86/lib/memcpy_64.S with the kernel tools: Update asm-generic/mman-common.h copy from the kernel perf tools: Use just forward declarations for struct thread where possible perf tools: Add the right header to obtain PERF_ALIGN() perf tools: Remove poll.h and wait.h from util.h perf tools: Remove string.h, unistd.h and sys/stat.h from util.h perf tools: Remove stale prototypes from builtin.h perf tools: Remove string.h from util.h perf tools: Remove sys/ioctl.h from util.h perf tools: Remove a few more needless includes from util.h perf tools: Include sys/param.h where needed perf callchain: Move callchain specific routines from util.[ch] perf tools: Add compress.h for the *_decompress_to_file() headers perf mem: Fix display of data source snoop indication perf debug: Move dump_stack() and sighandler_dump_stack() to debug.h perf kvm: Make function only used by 'perf kvm' static perf tools: Move timestamp routines from util.h to time-utils.h perf tools: Move units conversion/formatting routines to separate object ...
2017-05-01Merge branch 'locking-core-for-linus' of ↵Linus Torvalds13-485/+852
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The main changes in this cycle were: - a big round of FUTEX_UNLOCK_PI improvements, fixes, cleanups and general restructuring - lockdep updates such as new checks for lock_downgrade() - introduce the new atomic_try_cmpxchg() locking API and use it to optimize refcount code generation - ... plus misc fixes, updates and cleanups" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits) MAINTAINERS: Add FUTEX SUBSYSTEM futex: Clarify mark_wake_futex memory barrier usage futex: Fix small (and harmless looking) inconsistencies futex: Avoid freeing an active timer rtmutex: Plug preempt count leak in rt_mutex_futex_unlock() rtmutex: Fix more prio comparisons rtmutex: Fix PI chain order integrity sched,tracing: Update trace_sched_pi_setprio() sched/rtmutex: Refactor rt_mutex_setprio() rtmutex: Clean up sched/deadline/rtmutex: Dont miss the dl_runtime/dl_period update sched/rtmutex/deadline: Fix a PI crash for deadline tasks rtmutex: Deboost before waking up the top waiter locking/ww-mutex: Limit stress test to 2 seconds locking/atomic: Fix atomic_try_cmpxchg() semantics lockdep: Fix per-cpu static objects futex: Drop hb->lock before enqueueing on the rtmutex futex: Futex_unlock_pi() determinism futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock() futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock() ...
2017-05-01Merge branch 'sched-core-for-linus' of ↵Linus Torvalds8-292/+518
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this cycle were: - another round of rq-clock handling debugging, robustization and fixes - PELT accounting improvements - CPU hotplug related ->cpus_allowed affinity handling fixes all around the tree - ... plus misc fixes, cleanups and updates" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits) sched/x86: Update reschedule warning text crypto: N2 - Replace racy task affinity logic cpufreq/sparc-us2e: Replace racy task affinity logic cpufreq/sparc-us3: Replace racy task affinity logic cpufreq/sh: Replace racy task affinity logic cpufreq/ia64: Replace racy task affinity logic ACPI/processor: Replace racy task affinity logic ACPI/processor: Fix error handling in __acpi_processor_start() sparc/sysfs: Replace racy task affinity logic powerpc/smp: Replace open coded task affinity logic ia64/sn/hwperf: Replace racy task affinity logic ia64/salinfo: Replace racy task affinity logic workqueue: Provide work_on_cpu_safe() ia64/topology: Remove cpus_allowed manipulation sched/fair: Move the PELT constants into a generated header sched/fair: Increase PELT accuracy for small tasks sched/fair: Fix comments sched/Documentation: Add 'sched-pelt' tool sched/fair: Fix corner case in __accumulate_sum() sched/core: Remove 'task' parameter and rename tsk_restore_flags() to current_restore_flags() ...
2017-05-01Merge branch 'timers-core-for-linus' of ↵Linus Torvalds14-117/+161
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "The timer departement delivers: - more year 2038 rework - a massive rework of the arm achitected timer - preparatory patches to allow NTP correction of clock event devices to avoid early expiry - the usual pile of fixes and enhancements all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (91 commits) timer/sysclt: Restrict timer migration sysctl values to 0 and 1 arm64/arch_timer: Mark errata handlers as __maybe_unused Clocksource/mips-gic: Remove redundant non devicetree init MIPS/Malta: Probe gic-timer via devicetree clocksource: Use GENMASK_ULL in definition of CLOCKSOURCE_MASK acpi/arm64: Add SBSA Generic Watchdog support in GTDT driver clocksource: arm_arch_timer: add GTDT support for memory-mapped timer acpi/arm64: Add memory-mapped timer support in GTDT driver clocksource: arm_arch_timer: simplify ACPI support code. acpi/arm64: Add GTDT table parse driver clocksource: arm_arch_timer: split MMIO timer probing. clocksource: arm_arch_timer: add structs to describe MMIO timer clocksource: arm_arch_timer: move arch_timer_needs_of_probing into DT init call clocksource: arm_arch_timer: refactor arch_timer_needs_probing clocksource: arm_arch_timer: split dt-only rate handling x86/uv/time: Set ->min_delta_ticks and ->max_delta_ticks unicore32/time: Set ->min_delta_ticks and ->max_delta_ticks um/time: Set ->min_delta_ticks and ->max_delta_ticks tile/time: Set ->min_delta_ticks and ->max_delta_ticks score/time: Set ->min_delta_ticks and ->max_delta_ticks ...
2017-05-01Merge branch 'irq-core-for-linus' of ↵Linus Torvalds2-3/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Nothing exciting from the irq side for this merge window: - a new driver for a Mediatek SoC - ACPI support for ARM GICV3 - support for shared nested interrupts - the usual pile of fixes and updates all over te place" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits) irqchip/mbigen: Fix return value check in mbigen_device_probe() irqchip/mips-gic: Replace static map with dynamic irqchip/mips-gic: Remove device IRQ domain irqchip/mips-gic: Separate IPI reservation & usage tracking genirq: Use irqd_get_trigger_type to compare the trigger type for shared IRQs genirq: Use cpumask_available() for check of cpumask variable cpumask: Add helper cpumask_available() irqchip/irq-imx-gpcv2: Clear OF_POPULATED flag irqchip/atmel-aic5: Handle suspend to RAM irqchip: Add Mediatek mtk-cirq driver dt-bindings: mtk-cirq: Add binding document irqchip/gic-v3-its: Add IORT hook for platform MSI support irqchip/mbigen: Add ACPI support irqchip/mbigen: Introduce mbigen_of_create_domain() irqchip/mbigen: Drop module owner platform-msi: Make platform_msi_create_device_domain() ACPI aware irqchip/gicv3-its: platform-msi: Scan MADT to create platform msi domain irqchip/gicv3-its: platform-msi: Refactor its_pmsi_init() to prepare for ACPI irqchip/gicv3-its: platform-msi: Refactor its_pmsi_prepare() irqchip/gic-v3-its: Keep the include header files in alphabetic order ...
2017-05-01Merge branch 'work.uaccess' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess unification updates from Al Viro: "This is the uaccess unification pile. It's _not_ the end of uaccess work, but the next batch of that will go into the next cycle. This one mostly takes copy_from_user() and friends out of arch/* and gets the zero-padding behaviour in sync for all architectures. Dealing with the nocache/writethrough mess is for the next cycle; fortunately, that's x86-only. Same for cleanups in iov_iter.c (I am sold on access_ok() in there, BTW; just not in this pile), same for reducing __copy_... callsites, strn*... stuff, etc. - there will be a pile about as large as this one in the next merge window. This one sat in -next for weeks. -3KLoC" * 'work.uaccess' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (96 commits) HAVE_ARCH_HARDENED_USERCOPY is unconditional now CONFIG_ARCH_HAS_RAW_COPY_USER is unconditional now m32r: switch to RAW_COPY_USER hexagon: switch to RAW_COPY_USER microblaze: switch to RAW_COPY_USER get rid of padding, switch to RAW_COPY_USER ia64: get rid of copy_in_user() ia64: sanitize __access_ok() ia64: get rid of 'segment' argument of __do_{get,put}_user() ia64: get rid of 'segment' argument of __{get,put}_user_check() ia64: add extable.h powerpc: get rid of zeroing, switch to RAW_COPY_USER esas2r: don't open-code memdup_user() alpha: fix stack smashing in old_adjtimex(2) don't open-code kernel_setsockopt() mips: switch to RAW_COPY_USER mips: get rid of tail-zeroing in primitives mips: make copy_from_user() zero tail explicitly mips: clean and reorder the forest of macros... mips: consolidate __invoke_... wrappers ...
2017-05-01Merge tag 'pm-4.12-rc1' of ↵Linus Torvalds2-28/+66
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "This time the majority of changes go to the cpufreq subsystem (and to the intel_pstate driver in particular) and there are some updates in the generic power domains framework, cpuidle, tools and a couple of other places. One thing worth mentioning is that the intel_pstate's sysfs interface has been reworked to be more consistent with the general expectations of the cpufreq core and less confusing, hopefully for the better. Also, we have a new cpufreq driver for Tegra186 and new hardware support in intel_pstata and the Mediatek cpufreq driver. Apart from that, the AnalyzeSuspend utility for system suspend profiling gets a companion called AnalyzeBoot for the analogous profiling of system boot and they both go into one place under tools/power/pm-graph/. The rest is mostly fixes, cleanups and code reorganization. Specifics: - Rework the intel_pstate driver's sysfs interface to make it more straightforward and more intuitive (Rafael Wysocki). - Make intel_pstate support all processors which advertise HWP (hardware-managed P-states) to the kernel in all operation modes and make it use the load-based P-state selection algorithm on a wider range of systems in the active mode (Rafael Wysocki). - Add cpufreq driver for Tegra186 (Mikko Perttunen). - Add support for Gemini Lake SoCs to intel_pstate (David Box). - Add support for MT8176 and MT817x to the Mediatek cpufreq driver and clean up that driver a bit (Daniel Kurtz). - Clean up intel_pstate and optimize it slightly (Rafael Wysocki). - Update the schedutil cpufreq governor, mostly to fix a couple of issues with it related to specific workloads, and rework its sysfs tunable and initialization a bit (Rafael Wysocki, Viresh Kumar). - Fix minor issues in the imx6q, dbx500 and qoriq cpufreq drivers (Christophe Jaillet, Irina Tirdea, Leonard Crestez, Viresh Kumar, YuanTian Tang). - Add file patterns for cpufreq DT bindings to MAINTAINERS (Geert Uytterhoeven). - Add support for "always on" power domains to the genpd (generic power domains) framework and clean up that code somewhat (Ulf Hansson, Lina Iyer, Viresh Kumar). - Fix minor issues in the powernv cpuidle driver and clean it up (Anton Blanchard, Gautham Shenoy). - Move the AnalyzeSuspend utility under tools/power/pm-graph/ and add an analogous boot-profiling utility called AnalyzeBoot to it (Todd Brandt). - Add rk3328 support to the rockchip-io AVS (Adaptive Voltage Scaling) driver (David Wu). - Fix minor issues in the cpuidle core, the intel_pstate_tracer utility, the devfreq framework and the PM core documentation (Chanwoo Choi, Doug Smythies, Johan Hovold, Marcin Nowakowski)" * tag 'pm-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (56 commits) PM / runtime: Document autosuspend-helper side effects PM / runtime: Fix autosuspend documentation tools: power: pm-graph: Package makefile and man pages tools: power: pm-graph: AnalyzeBoot v2.0 tools: power: pm-graph: AnalyzeSuspend v4.6 cpufreq: Add Tegra186 cpufreq driver cpufreq: imx6q: Fix error handling code cpufreq: imx6q: Set max suspend_freq to avoid changes during suspend cpufreq: imx6q: Fix handling EPROBE_DEFER from regulator cpuidle: powernv: Avoid a branch in the core snooze_loop() loop cpuidle: powernv: Don't continually set thread priority in snooze_loop() cpuidle: powernv: Don't bounce between low and very low thread priority cpuidle: cpuidle-cps: remove unused variable tools/power/x86/intel_pstate_tracer: Adjust directory ownership cpufreq: schedutil: Use policy-dependent transition delays cpufreq: schedutil: Reduce frequencies slower PM / devfreq: Move struct devfreq_governor to devfreq directory PM / Domains: Ignore domain-idle-states that are not compatible cpufreq: intel_pstate: Add support for Gemini Lake powernv-cpuidle: Validate DT property array size ...
2017-05-01Merge branch 'for-4.12' of ↵Linus Torvalds6-31/+53
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "Nothing major. Two notable fixes are Li's second stab at fixing the long-standing race condition in the mount path and suppression of spurious warning from cgroup_get(). All other changes are trivial" * 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: mark cgroup_get() with __maybe_unused cgroup: avoid attaching a cgroup root to two different superblocks, take 2 cgroup: fix spurious warnings on cgroup_is_dead() from cgroup_sk_alloc() cgroup: move cgroup_subsys_state parent field for cache locality cpuset: Remove cpuset_update_active_cpus()'s parameter. cgroup: switch to BUG_ON() cgroup: drop duplicate header nsproxy.h kernel: convert css_set.refcount from atomic_t to refcount_t kernel: convert cgroup_namespace.count from atomic_t to refcount_t
2017-05-01Merge branch 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds1-3/+2
Pull workqueue update from Tejun Heo: "One trivial patch to use setup_deferrable_timer() instead of open-coding the initialization" * 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: use setup_deferrable_timer
2017-05-01cgroup: mark cgroup_get() with __maybe_unusedTejun Heo1-1/+1
a590b90d472f ("cgroup: fix spurious warnings on cgroup_is_dead() from cgroup_sk_alloc()") converted most cgroup_get() usages to cgroup_get_live() leaving cgroup_sk_alloc() the sole user of cgroup_get(). When !CONFIG_SOCK_CGROUP_DATA, this ends up triggering unused warning for cgroup_get(). Silence the warning by adding __maybe_unused to cgroup_get(). Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: http://lkml.kernel.org/r/20170501145340.17e8ef86@canb.auug.org.au Signed-off-by: Tejun Heo <tj@kernel.org>
2017-05-01Merge branch 'for-4.12/block' of git://git.kernel.dk/linux-blockLinus Torvalds1-23/+12
Pull block layer updates from Jens Axboe: - Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ was initially a fork of CFQ, but subsequently changed to implement fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant to be used on desktop type single drives, providing good fairness. From Paolo. - Add Kyber IO scheduler. This is a full multiqueue aware scheduler, using a scalable token based algorithm that throttles IO based on live completion IO stats, similary to blk-wbt. From Omar. - A series from Jan, moving users to separately allocated backing devices. This continues the work of separating backing device life times, solving various problems with hot removal. - A series of updates for lightnvm, mostly from Javier. Includes a 'pblk' target that exposes an open channel SSD as a physical block device. - A series of fixes and improvements for nbd from Josef. - A series from Omar, removing queue sharing between devices on mostly legacy drivers. This helps us clean up other bits, if we know that a queue only has a single device backing. This has been overdue for more than a decade. - Fixes for the blk-stats, and improvements to unify the stats and user windows. This both improves blk-wbt, and enables other users to register a need to receive IO stats for a device. From Omar. - blk-throttle improvements from Shaohua. This provides a scalable framework for implementing scalable priotization - particularly for blk-mq, but applicable to any type of block device. The interface is marked experimental for now. - Bucketized IO stats for IO polling from Stephen Bates. This improves efficiency of polled workloads in the presence of mixed block size IO. - A few fixes for opal, from Scott. - A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics. From a variety of folks, mostly Sagi and James Smart. - A series from Bart, improving our exposed info and capabilities from the blk-mq debugfs support. - A series from Christoph, cleaning up how handle WRITE_ZEROES. - A series from Christoph, cleaning up the block layer handling of how we track errors in a request. On top of being a nice cleanup, it also shrinks the size of struct request a bit. - Removal of mg_disk and hd (sorry Linus) by Christoph. The former was never used by platforms, and the latter has outlived it's usefulness. - Various little bug fixes and cleanups from a wide variety of folks. * 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits) block: hide badblocks attribute by default blk-mq: unify hctx delay_work and run_work block: add kblock_mod_delayed_work_on() blk-mq: unify hctx delayed_run_work and run_work nbd: fix use after free on module unload MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler blk-mq-sched: alloate reserved tags out of normal pool mtip32xx: use runtime tag to initialize command header scsi: Implement blk_mq_ops.show_rq() blk-mq: Add blk_mq_ops.show_rq() blk-mq: Show operation, cmd_flags and rq_flags names blk-mq: Make blk_flags_show() callers append a newline character blk-mq: Move the "state" debugfs attribute one level down blk-mq: Unregister debugfs attributes earlier blk-mq: Only unregister hctxs for which registration succeeded blk-mq-debugfs: Rename functions for registering and unregistering the mq directory blk-mq: Let blk_mq_debugfs_register() look up the queue name blk-mq: Register <dev>/queue/mq after having registered <dev>/queue ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset ide-pm: always pass 0 error to __blk_end_request_all ..
2017-04-28cgroup: avoid attaching a cgroup root to two different superblocks, take 2Zefan Li3-6/+20
Commit bfb0b80db5f9 ("cgroup: avoid attaching a cgroup root to two different superblocks") is broken. Now we try to fix the race by delaying the initialization of cgroup root refcnt until a superblock has been allocated. Reported-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Andrei Vagin <avagin@virtuozzo.com> Tested-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2017-04-28Merge branch 'pm-cpufreq'Rafael J. Wysocki2-28/+66
* pm-cpufreq: (37 commits) cpufreq: Add Tegra186 cpufreq driver cpufreq: imx6q: Fix error handling code cpufreq: imx6q: Set max suspend_freq to avoid changes during suspend cpufreq: imx6q: Fix handling EPROBE_DEFER from regulator cpufreq: schedutil: Use policy-dependent transition delays cpufreq: schedutil: Reduce frequencies slower cpufreq: intel_pstate: Add support for Gemini Lake cpufreq: intel_pstate: Eliminate intel_pstate_get_min_max() cpufreq: intel_pstate: Do not walk policy->cpus cpufreq: intel_pstate: Introduce pid_in_use() cpufreq: intel_pstate: Drop struct cpu_defaults cpufreq: intel_pstate: Move cpu_defaults definitions cpufreq: intel_pstate: Add update_util callback to pstate_funcs cpufreq: intel_pstate: Use different utilization update callbacks cpufreq: intel_pstate: Modify check in intel_pstate_update_status() cpufreq: intel_pstate: Drop driver_registered variable cpufreq: intel_pstate: Skip unnecessary PID resets on init cpufreq: intel_pstate: Set HWP sampling interval once cpufreq: intel_pstate: Clean up intel_pstate_busy_pid_reset() cpufreq: intel_pstate: Fold intel_pstate_reset_all_pid() into the caller ...
2017-04-28Merge schedutil governor updates for v4.12.Rafael J. Wysocki2-28/+66
2017-04-28cgroup: fix spurious warnings on cgroup_is_dead() from cgroup_sk_alloc()Tejun Heo1-6/+16
cgroup_get() expected to be called only on live cgroups and triggers warning on a dead cgroup; however, cgroup_sk_alloc() may be called while cloning a socket which is left in an empty and removed cgroup and thus may legitimately duplicate its reference on a dead cgroup. This currently triggers the following warning spuriously. WARNING: CPU: 14 PID: 0 at kernel/cgroup.c:490 cgroup_get+0x55/0x60 ... [<ffffffff8107e123>] __warn+0xd3/0xf0 [<ffffffff8107e20e>] warn_slowpath_null+0x1e/0x20 [<ffffffff810ff465>] cgroup_get+0x55/0x60 [<ffffffff81106061>] cgroup_sk_alloc+0x51/0xe0 [<ffffffff81761beb>] sk_clone_lock+0x2db/0x390 [<ffffffff817cce06>] inet_csk_clone_lock+0x16/0xc0 [<ffffffff817e8173>] tcp_create_openreq_child+0x23/0x4b0 [<ffffffff818601a1>] tcp_v6_syn_recv_sock+0x91/0x670 [<ffffffff817e8b16>] tcp_check_req+0x3a6/0x4e0 [<ffffffff81861ba3>] tcp_v6_rcv+0x693/0xa00 [<ffffffff81837429>] ip6_input_finish+0x59/0x3e0 [<ffffffff81837cb2>] ip6_input+0x32/0xb0 [<ffffffff81837387>] ip6_rcv_finish+0x57/0xa0 [<ffffffff81837ac8>] ipv6_rcv+0x318/0x4d0 [<ffffffff817778c7>] __netif_receive_skb_core+0x2d7/0x9a0 [<ffffffff81777fa6>] __netif_receive_skb+0x16/0x70 [<ffffffff81778023>] netif_receive_skb_internal+0x23/0x80 [<ffffffff817787d8>] napi_gro_frags+0x208/0x270 [<ffffffff8168a9ec>] mlx4_en_process_rx_cq+0x74c/0xf40 [<ffffffff8168b270>] mlx4_en_poll_rx_cq+0x30/0x90 [<ffffffff81778b30>] net_rx_action+0x210/0x350 [<ffffffff8188c426>] __do_softirq+0x106/0x2c7 [<ffffffff81082bad>] irq_exit+0x9d/0xa0 [<ffffffff8188c0e4>] do_IRQ+0x54/0xd0 [<ffffffff8188a63f>] common_interrupt+0x7f/0x7f <EOI> [<ffffffff8173d7e7>] cpuidle_enter+0x17/0x20 [<ffffffff810bdfd9>] cpu_startup_entry+0x2a9/0x2f0 [<ffffffff8103edd1>] start_secondary+0xf1/0x100 This patch renames the existing cgroup_get() with the dead cgroup warning to cgroup_get_live() after cgroup_kn_lock_live() and introduces the new cgroup_get() which doesn't check whether the cgroup is live or dead. All existing cgroup_get() users except for cgroup_sk_alloc() are converted to use cgroup_get_live(). Fixes: d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets") Cc: stable@vger.kernel.org # v4.5+ Cc: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Chris Mason <clm@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2017-04-27sched/cputime: Fix ksoftirqd cputime accounting regressionFrederic Weisbecker2-13/+23
irq_time_read() returns the irqtime minus the ksoftirqd time. This is necessary because irq_time_read() is used to substract the IRQ time from the sum_exec_runtime of a task. If we were to include the softirq time of ksoftirqd, this task would substract its own CPU time everytime it updates ksoftirqd->sum_exec_runtime which would therefore never progress. But this behaviour got broken by: a499a5a14db ("sched/cputime: Increment kcpustat directly on irqtime account") ... which now includes ksoftirqd softirq time in the time returned by irq_time_read(). This has resulted in wrong ksoftirqd cputime reported to userspace through /proc/stat and thus "top" not showing ksoftirqd when it should after intense networking load. ksoftirqd->stime happens to be correct but it gets scaled down by sum_exec_runtime through task_cputime_adjusted(). To fix this, just account the strict IRQ time in a separate counter and use it to report the IRQ time. Reported-and-tested-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1493129448-5356-1-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-23Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Thomas Gleixner: "The (hopefully) final fix for the irq affinity spreading logic" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/affinity: Fix calculating vectors to assign
2017-04-20Merge tag 'trace-v4.11-rc5-5' of ↵Linus Torvalds2-5/+19
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull two more ftrace fixes from Steven Rostedt: "While continuing my development, I uncovered two more small bugs. One is a race condition when enabling the snapshot function probe trigger. It enables the probe before allocating the snapshot, and if the probe triggers first, it stops tracing with a warning that the snapshot buffer was not allocated. The seconds is that the snapshot file should show how to use it when it is empty. But a bug fix from long ago broke the "is empty" test and the snapshot file no longer displays the help message" * tag 'trace-v4.11-rc5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ring-buffer: Have ring_buffer_iter_empty() return true when empty tracing: Allocate the snapshot buffer before enabling probe
2017-04-20block: remove the errors field from struct requestChristoph Hellwig1-14/+12
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20blktrace: remove the unused block_rq_abort tracepointChristoph Hellwig1-9/+0
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20Merge branch 'linus' into irq/coreThomas Gleixner18-131/+199
Pick up upstream fixes to avoid conflicts with pending patches.
2017-04-20genirq/affinity: Fix calculating vectors to assignKeith Busch1-1/+1
The vectors_per_node is calculated from the remaining available vectors. The current vector starts after pre_vectors, so we need to subtract that from the current to properly account for the number of remaining vectors to assign. Fixes: 3412386b531 ("irq/affinity: Fix extra vecs calculation") Reported-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Link: http://lkml.kernel.org/r/1492645870-13019-1-git-send-email-keith.busch@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-20timer/sysclt: Restrict timer migration sysctl values to 0 and 1Myungho Jung2-1/+3
timer_migration sysctl acts as a boolean switch, so the allowed values should be restricted to 0 and 1. Add the necessary extra fields to the sysctl table entry to enforce that. [ tglx: Rewrote changelog ] Signed-off-by: Myungho Jung <mhjungk@gmail.com> Link: http://lkml.kernel.org/r/1492640690-3550-1-git-send-email-mhjungk@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-19ring-buffer: Have ring_buffer_iter_empty() return true when emptySteven Rostedt (VMware)1-2/+14
I noticed that reading the snapshot file when it is empty no longer gives a status. It suppose to show the status of the snapshot buffer as well as how to allocate and use it. For example: ># cat snapshot # tracer: nop # # # * Snapshot is allocated * # # Snapshot commands: # echo 0 > snapshot : Clears and frees snapshot buffer # echo 1 > snapshot : Allocates snapshot buffer, if not already allocated. # Takes a snapshot of the main buffer. # echo 2 > snapshot : Clears snapshot buffer (but does not allocate or free) # (Doesn't have to be '2' works with any number that # is not a '0' or '1') But instead it just showed an empty buffer: ># cat snapshot # tracer: nop # # entries-in-buffer/entries-written: 0/0 #P:4 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | What happened was that it was using the ring_buffer_iter_empty() function to see if it was empty, and if it was, it showed the status. But that function was returning false when it was empty. The reason was that the iter header page was on the reader page, and the reader page was empty, but so was the buffer itself. The check only tested to see if the iter was on the commit page, but the commit page was no longer pointing to the reader page, but as all pages were empty, the buffer is also. Cc: stable@vger.kernel.org Fixes: 651e22f2701b ("ring-buffer: Always reset iterator to reader page") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-19tracing: Allocate the snapshot buffer before enabling probeSteven Rostedt (VMware)1-3/+5
Currently the snapshot trigger enables the probe and then allocates the snapshot. If the probe triggers before the allocation, it could cause the snapshot to fail and turn tracing off. It's best to allocate the snapshot buffer first, and then enable the trigger. If something goes wrong in the enabling of the trigger, the snapshot buffer is still allocated, but it can also be freed by the user by writting zero into the snapshot buffer file. Also add a check of the return status of alloc_snapshot(). Cc: stable@vger.kernel.org Fixes: 77fd5c15e3 ("tracing: Add snapshot trigger to function probes") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds1-3/+3
Pull sparc fixes from David Miller: "Two Sparc bug fixes from Daniel Jordan and Nitin Gupta" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc64: Fix hugepage page table free sparc64: Use LOCKDEP_SMALL, not PROVE_LOCKING_SMALL
2017-04-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-0/+8
Pull networking fixes from David Miller: 1) BPF tail call handling bug fixes from Daniel Borkmann. 2) Fix allowance of too many rx queues in sfc driver, from Bert Kenward. 3) Non-loopback ipv6 packets claiming src of ::1 should be dropped, from Florian Westphal. 4) Statistics requests on KSZ9031 can crash, fix from Grygorii Strashko. 5) TX ring handling fixes in mediatek driver, from Sean Wang. 6) ip_ra_control can deadlock, fix lock acquisition ordering to fix, from Cong WANG. 7) Fix use after free in ip_recv_error(), from Willem de Buijn. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: bpf: fix checking xdp_adjust_head on tail calls bpf: fix cb access in socket filter programs on tail calls ipv6: drop non loopback packets claiming to originate from ::1 net: ethernet: mediatek: fix inconsistency of port number carried in TXD net: ethernet: mediatek: fix inconsistency between TXD and the used buffer net: phy: micrel: fix crash when statistic requested for KSZ9031 phy net: vrf: Fix setting NLM_F_EXCL flag when adding l3mdev rule net: thunderx: Fix set_max_bgx_per_node for 81xx rgx net-timestamp: avoid use-after-free in ip_recv_error ipv4: fix a deadlock in ip_ra_control sfc: limit the number of receive queues
2017-04-18sparc64: Use LOCKDEP_SMALL, not PROVE_LOCKING_SMALLDaniel Jordan1-3/+3
CONFIG_PROVE_LOCKING_SMALL shrinks the memory usage of lockdep so the kernel text, data, and bss fit in the required 32MB limit, but this option is not set for every config that enables lockdep. A 4.10 kernel fails to boot with the console output Kernel: Using 8 locked TLB entries for main kernel image. hypervisor_tlb_lock[2000000:0:8000000071c007c3:1]: errors with f Program terminated with these config options CONFIG_LOCKDEP=y CONFIG_LOCK_STAT=y CONFIG_PROVE_LOCKING=n To fix, rename CONFIG_PROVE_LOCKING_SMALL to CONFIG_LOCKDEP_SMALL, and enable this option with CONFIG_LOCKDEP=y so we get the reduced memory usage every time lockdep is turned on. Tested that CONFIG_LOCKDEP_SMALL is set to 'y' if and only if CONFIG_LOCKDEP is set to 'y'. When other lockdep-related config options that select CONFIG_LOCKDEP are enabled (e.g. CONFIG_LOCK_STAT or CONFIG_PROVE_LOCKING), verified that CONFIG_LOCKDEP_SMALL is also enabled. Fixes: e6b5f1be7afe ("config: Adding the new config parameter CONFIG_PROVE_LOCKING_SMALL for sparc") Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Babu Moger <babu.moger@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-18Merge tag 'trace-v4.11-rc5-3' of ↵Linus Torvalds3-0/+12
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull ftrace fix from Steven Rostedt: "Namhyung Kim discovered a use after free bug. It has to do with adding a pid filter to function tracing in an instance, and then freeing the instance" * tag 'trace-v4.11-rc5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Fix function pid filter on instances
2017-04-18boot/param: Move next_arg() function to lib/cmdline.c for later reuseBaoquan He1-52/+0
next_arg() will be used to parse boot parameters in the x86/boot/compressed code, so move it to lib/cmdline.c for better code reuse. No change in functionality. Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Cc: Jens Axboe <axboe@fb.com> Cc: Jessica Yu <jeyu@redhat.com> Cc: Johannes Berg <johannes.berg@intel.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Larry Finger <Larry.Finger@lwfinger.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dan.j.williams@intel.com Cc: dave.jiang@intel.com Cc: dyoung@redhat.com Cc: keescook@chromium.org Cc: zijun_hu <zijun_hu@htc.com> Link: http://lkml.kernel.org/r/1492436099-4017-2-git-send-email-bhe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-17ftrace: Fix function pid filter on instancesNamhyung Kim3-0/+12
When function tracer has a pid filter, it adds a probe to sched_switch to track if current task can be ignored. The probe checks the ftrace_ignore_pid from current tr to filter tasks. But it misses to delete the probe when removing an instance so that it can cause a crash due to the invalid tr pointer (use-after-free). This is easily reproducible with the following: # cd /sys/kernel/debug/tracing # mkdir instances/buggy # echo $$ > instances/buggy/set_ftrace_pid # rmdir instances/buggy ============================================================================ BUG: KASAN: use-after-free in ftrace_filter_pid_sched_switch_probe+0x3d/0x90 Read of size 8 by task kworker/0:1/17 CPU: 0 PID: 17 Comm: kworker/0:1 Tainted: G B 4.11.0-rc3 #198 Call Trace: dump_stack+0x68/0x9f kasan_object_err+0x21/0x70 kasan_report.part.1+0x22b/0x500 ? ftrace_filter_pid_sched_switch_probe+0x3d/0x90 kasan_report+0x25/0x30 __asan_load8+0x5e/0x70 ftrace_filter_pid_sched_switch_probe+0x3d/0x90 ? fpid_start+0x130/0x130 __schedule+0x571/0xce0 ... To fix it, use ftrace_clear_pids() to unregister the probe. As instance_rmdir() already updated ftrace codes, it can just free the filter safely. Link: http://lkml.kernel.org/r/20170417024430.21194-2-namhyung@kernel.org Fixes: 0c8916c34203 ("tracing: Add rmdir to remove multibuffer instances") Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-17bpf: fix checking xdp_adjust_head on tail callsDaniel Borkmann1-0/+1
Commit 17bedab27231 ("bpf: xdp: Allow head adjustment in XDP prog") added the xdp_adjust_head bit to the BPF prog in order to tell drivers that the program that is to be attached requires support for the XDP bpf_xdp_adjust_head() helper such that drivers not supporting this helper can reject the program. There are also drivers that do support the helper, but need to check for xdp_adjust_head bit in order to move packet metadata prepended by the firmware away for making headroom. For these cases, the current check for xdp_adjust_head bit is insufficient since there can be cases where the program itself does not use the bpf_xdp_adjust_head() helper, but tail calls into another program that uses bpf_xdp_adjust_head(). As such, the xdp_adjust_head bit is still set to 0. Since the first program has no control over which program it calls into, we need to assume that bpf_xdp_adjust_head() helper is used upon tail calls. Thus, for the very same reasons in cb_access, set the xdp_adjust_head bit to 1 when the main program uses tail calls. Fixes: 17bedab27231 ("bpf: xdp: Allow head adjustment in XDP prog") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17bpf: fix cb access in socket filter programs on tail callsDaniel Borkmann1-0/+7
Commit ff936a04e5f2 ("bpf: fix cb access in socket filter programs") added a fix for socket filter programs such that in i) AF_PACKET the 20 bytes of skb->cb[] area gets zeroed before use in order to not leak data, and ii) socket filter programs attached to TCP/UDP sockets need to save/restore these 20 bytes since they are also used by protocol layers at that time. The problem is that bpf_prog_run_save_cb() and bpf_prog_run_clear_cb() only look at the actual attached program to determine whether to zero or save/restore the skb->cb[] parts. There can be cases where the actual attached program does not access the skb->cb[], but the program tail calls into another program which does access this area. In such a case, the zero or save/restore is currently not performed. Since the programs we tail call into are unknown at verification time and can dynamically change, we need to assume that whenever the attached program performs a tail call, that later programs could access the skb->cb[], and therefore we need to always set cb_access to 1. Fixes: ff936a04e5f2 ("bpf: fix cb access in socket filter programs") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17cpufreq: schedutil: Use policy-dependent transition delaysRafael J. Wysocki1-5/+10
Make the schedutil governor take the initial (default) value of the rate_limit_us sysfs attribute from the (new) transition_delay_us policy parameter (to be set by the scaling driver). That will allow scaling drivers to make schedutil use smaller default values of rate_limit_us and reduce the default average time interval between consecutive frequency changes. Make intel_pstate set transition_delay_us to 500. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2017-04-16Merge branch 'for-4.11-fixes' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fix from Tejun Heo: "Unfortunately, the commit to fix the cgroup mount race in the previous pull request can lead to hangs. The original bug has been around for a while and isn't too likely to be triggered in usual use cases. Revert the commit for now" * 'for-4.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: Revert "cgroup: avoid attaching a cgroup root to two different superblocks"
2017-04-16Merge tag 'trace-v4.11-rc5-2' of ↵Linus Torvalds1-4/+16
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull ftrace fix from Steven Rostedt: "While rewriting the function probe code, I stumbled over a long standing bug. This bug has been there sinc function tracing was added way back when. But my new development depends on this bug being fixed, and it should be fixed regardless as it causes ftrace to disable itself when triggered, and a reboot is required to enable it again. The bug is that the function probe does not disable itself properly if there's another probe of its type still enabled. For example: # cd /sys/kernel/debug/tracing # echo schedule:traceoff > set_ftrace_filter # echo do_IRQ:traceoff > set_ftrace_filter # echo \!do_IRQ:traceoff > /debug/tracing/set_ftrace_filter # echo do_IRQ:traceoff > set_ftrace_filter The above registers two traceoff probes (one for schedule and one for do_IRQ, and then removes do_IRQ. But since there still exists one for schedule, it is not done properly. When adding do_IRQ back, the breakage in the accounting is noticed by the ftrace self tests, and it causes a warning and disables ftrace" * tag 'trace-v4.11-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Fix removing of second function probe
2017-04-16Revert "cgroup: avoid attaching a cgroup root to two different superblocks"Tejun Heo1-1/+1
This reverts commit bfb0b80db5f9dca5ac0a5fd0edb765ee555e5a8e. Andrei reports CRIU test hangs with the patch applied. The bug fixed by the patch isn't too likely to trigger in actual uses. Revert the patch for now. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Andrei Vagin <avagin@virtuozzo.com> Link: http://lkml.kernel.org/r/20170414232737.GC20350@outlook.office365.com
2017-04-15futex: Clarify mark_wake_futex memory barrier usageDarren Hart (VMware)1-4/+5
Clarify the scenario described in mark_wake_futex requiring the smp_store_release(). Update the comment to explicitly refer to the plist_del now under __unqueue_futex() (previously plist_del was in the same function as the comment). Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170414223138.GA4222@fury Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-15genirq: Use irqd_get_trigger_type to compare the trigger type for shared IRQsHans de Goede1-1/+3
When requesting a shared irq with IRQF_TRIGGER_NONE then the irqaction flags get filled with the trigger type from the irq_data: if (!(new->flags & IRQF_TRIGGER_MASK)) new->flags |= irqd_get_trigger_type(&desc->irq_data); On the first setup_irq() the trigger type in irq_data is NONE when the above code executes, then the irq is started up for the first time and then the actual trigger type gets established, but that's too late to fix up new->flags. When then a second user of the irq requests the irq with IRQF_TRIGGER_NONE its irqaction's triggertype gets set to the actual trigger type and the following check fails: if (!((old->flags ^ new->flags) & IRQF_TRIGGER_MASK)) Resulting in the request_irq failing with -EBUSY even though both users requested the irq with IRQF_SHARED | IRQF_TRIGGER_NONE Fix this by comparing the new irqaction's trigger type to the trigger type stored in the irq_data which correctly reflects the actual trigger type being used for the irq. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Link: http://lkml.kernel.org/r/20170415100831.17073-1-hdegoede@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-15Merge tag 'irqchip-4.12' of ↵Thomas Gleixner18-448/+721
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip updates from Marc Zyngier - Unify gemini and moxa irqchips under the faraday banner - Extend mtk-sysirq to deal with multiple MMIO regions - ACPI/IORT support for GICv3 ITS platform MSI - ACPI support for mbigen - Add mtk-cirq wakeup interrupt controller driver - Atmel aic5 suspend support - Allow GPCv2 to be probed both as an irqchip and a device
2017-04-15workqueue: Provide work_on_cpu_safe()Thomas Gleixner1-0/+23
work_on_cpu() is not protected against CPU hotplug. For code which requires to be either executed on an online CPU or to fail if the CPU is not available the callsite would have to protect against CPU hotplug. Provide a function which does get/put_online_cpus() around the call to work_on_cpu() and fails the call with -ENODEV if the target CPU is not online. Preparatory patch to convert several racy task affinity manipulations. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Tejun Heo <tj@kernel.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Len Brown <lenb@kernel.org> Link: http://lkml.kernel.org/r/20170412201042.262610721@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-6/+6
Pull networking fixes from David Miller: "Things seem to be settling down as far as networking is concerned, let's hope this trend continues... 1) Add iov_iter_revert() and use it to fix the behavior of skb_copy_datagram_msg() et al., from Al Viro. 2) Fix the protocol used in the synthetic SKB we cons up for the purposes of doing a simulated route lookup for RTM_GETROUTE requests. From Florian Larysch. 3) Don't add noop_qdisc to the per-device qdisc hashes, from Cong Wang. 4) Don't call netdev_change_features with the team lock held, from Xin Long. 5) Revert TCP F-RTO extension to catch more spurious timeouts because it interacts very badly with some middle-boxes. From Yuchung Cheng. 6) Fix the loss of error values in l2tp {s,g}etsockopt calls, from Guillaume Nault. 7) ctnetlink uses bit positions where it should be using bit masks, fix from Liping Zhang. 8) Missing RCU locking in netfilter helper code, from Gao Feng. 9) Avoid double frees and use-after-frees in tcp_disconnect(), from Eric Dumazet. 10) Don't do a changelink before we register the netdevice in bridging, from Ido Schimmel. 11) Lock the ipv6 device address list properly, from Rabin Vincent" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits) netfilter: ipt_CLUSTERIP: Fix wrong conntrack netns refcnt usage netfilter: nft_hash: do not dump the auto generated seed drivers: net: usb: qmi_wwan: add QMI_QUIRK_SET_DTR for Telit PID 0x1201 ipv6: Fix idev->addr_list corruption net: xdp: don't export dev_change_xdp_fd() bridge: netlink: register netdevice before executing changelink bridge: implement missing ndo_uninit() bpf: reference may_access_skb() from __bpf_prog_run() tcp: clear saved_syn in tcp_disconnect() netfilter: nf_ct_expect: use proper RCU list traversal/update APIs netfilter: ctnetlink: skip dumping expect when nfct_help(ct) is NULL netfilter: make it safer during the inet6_dev->addr_list traversal netfilter: ctnetlink: make it safer when checking the ct helper name netfilter: helper: Add the rcu lock when call __nf_conntrack_helper_find netfilter: ctnetlink: using bit to represent the ct event netfilter: xt_TCPMSS: add more sanity tests on tcph->doff net: tcp: Increase TCP_MIB_OUTRSTS even though fail to alloc skb l2tp: don't mask errors in pppol2tp_getsockopt() l2tp: don't mask errors in pppol2tp_setsockopt() tcp: restrict F-RTO to work-around broken middle-boxes ...
2017-04-14Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds1-9/+11
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "The irq department provides: - two fixes for the CPU affinity spread infrastructure to prevent unbalanced spreading in corner cases which leads to horrible performance, because interrupts are rather aggregated than spread - add a missing spinlock initializer in the imx-gpcv2 init code" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/irq-imx-gpcv2: Fix spinlock initialization irq/affinity: Fix extra vecs calculation irq/affinity: Fix CPU spread for unbalanced nodes
2017-04-14ftrace: Fix removing of second function probeSteven Rostedt (VMware)1-4/+16
When two function probes are added to set_ftrace_filter, and then one of them is removed, the update to the function locations is not performed, and the record keeping of the function states are corrupted, and causes an ftrace_bug() to occur. This is easily reproducable by adding two probes, removing one, and then adding it back again. # cd /sys/kernel/debug/tracing # echo schedule:traceoff > set_ftrace_filter # echo do_IRQ:traceoff > set_ftrace_filter # echo \!do_IRQ:traceoff > /debug/tracing/set_ftrace_filter # echo do_IRQ:traceoff > set_ftrace_filter Causes: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 1098 at kernel/trace/ftrace.c:2369 ftrace_get_addr_curr+0x143/0x220 Modules linked in: [...] CPU: 2 PID: 1098 Comm: bash Not tainted 4.10.0-test+ #405 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012 Call Trace: dump_stack+0x68/0x9f __warn+0x111/0x130 ? trace_irq_work_interrupt+0xa0/0xa0 warn_slowpath_null+0x1d/0x20 ftrace_get_addr_curr+0x143/0x220 ? __fentry__+0x10/0x10 ftrace_replace_code+0xe3/0x4f0 ? ftrace_int3_handler+0x90/0x90 ? printk+0x99/0xb5 ? 0xffffffff81000000 ftrace_modify_all_code+0x97/0x110 arch_ftrace_update_code+0x10/0x20 ftrace_run_update_code+0x1c/0x60 ftrace_run_modify_code.isra.48.constprop.62+0x8e/0xd0 register_ftrace_function_probe+0x4b6/0x590 ? ftrace_startup+0x310/0x310 ? debug_lockdep_rcu_enabled.part.4+0x1a/0x30 ? update_stack_state+0x88/0x110 ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320 ? preempt_count_sub+0x18/0xd0 ? mutex_lock_nested+0x104/0x800 ? ftrace_regex_write.isra.43.part.44+0x1d3/0x320 ? __unwind_start+0x1c0/0x1c0 ? _mutex_lock_nest_lock+0x800/0x800 ftrace_trace_probe_callback.isra.3+0xc0/0x130 ? func_set_flag+0xe0/0xe0 ? __lock_acquire+0x642/0x1790 ? __might_fault+0x1e/0x20 ? trace_get_user+0x398/0x470 ? strcmp+0x35/0x60 ftrace_trace_onoff_callback+0x48/0x70 ftrace_regex_write.isra.43.part.44+0x251/0x320 ? match_records+0x420/0x420 ftrace_filter_write+0x2b/0x30 __vfs_write+0xd7/0x330 ? do_loop_readv_writev+0x120/0x120 ? locks_remove_posix+0x90/0x2f0 ? do_lock_file_wait+0x160/0x160 ? __lock_is_held+0x93/0x100 ? rcu_read_lock_sched_held+0x5c/0xb0 ? preempt_count_sub+0x18/0xd0 ? __sb_start_write+0x10a/0x230 ? vfs_write+0x222/0x240 vfs_write+0xef/0x240 SyS_write+0xab/0x130 ? SyS_read+0x130/0x130 ? trace_hardirqs_on_caller+0x182/0x280 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL_64_fastpath+0x18/0xad RIP: 0033:0x7fe61c157c30 RSP: 002b:00007ffe87890258 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: ffffffff8114a410 RCX: 00007fe61c157c30 RDX: 0000000000000010 RSI: 000055814798f5e0 RDI: 0000000000000001 RBP: ffff8800c9027f98 R08: 00007fe61c422740 R09: 00007fe61ca53700 R10: 0000000000000073 R11: 0000000000000246 R12: 0000558147a36400 R13: 00007ffe8788f160 R14: 0000000000000024 R15: 00007ffe8788f15c ? trace_hardirqs_off_caller+0xc0/0x110 ---[ end trace 99fa09b3d9869c2c ]--- Bad trampoline accounting at: ffffffff81cc3b00 (do_IRQ+0x0/0x150) Cc: stable@vger.kernel.org Fixes: 59df055f1991 ("ftrace: trace different functions with a different tracer") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-04-14time: Change k_clock nsleep() to use timespec64Deepa Dinamani6-33/+42
struct timespec is not y2038 safe on 32 bit machines. Replace uses of struct timespec with struct timespec64 in the kernel. The syscall interfaces themselves will be changed in a separate series. Note that the restart_block parameter for nanosleep has also been left unchanged and will be part of syscall series noted above. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: y2038@lists.linaro.org Cc: john.stultz@linaro.org Cc: arnd@arndb.de Link: http://lkml.kernel.org/r/1490555058-4603-8-git-send-email-deepa.kernel@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14time: Change k_clock timer_set() and timer_get() to use timespec64Deepa Dinamani4-49/+51
struct timespec is not y2038 safe on 32 bit machines. Replace uses of struct timespec with struct timespec64 in the kernel. struct itimerspec internally uses struct timespec. Use struct itimerspec64 which uses struct timespec64. The syscall interfaces themselves will be changed in a separate series. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: y2038@lists.linaro.org Cc: john.stultz@linaro.org Cc: arnd@arndb.de Link: http://lkml.kernel.org/r/1490555058-4603-7-git-send-email-deepa.kernel@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14time: Change k_clock clock_set() to use timespec64Deepa Dinamani3-10/+8
struct timespec is not y2038 safe on 32 bit machines. Replace uses of struct timespec with struct timespec64 in the kernel. The syscall interfaces themselves will be changed in a separate series. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: y2038@lists.linaro.org Cc: john.stultz@linaro.org Cc: arnd@arndb.de Link: http://lkml.kernel.org/r/1490555058-4603-6-git-send-email-deepa.kernel@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14time: Change k_clock clock_getres() to use timespec64Deepa Dinamani4-14/+13
struct timespec is not y2038 safe on 32 bit machines. Replace uses of struct timespec with struct timespec64 in the kernel. The syscall interfaces themselves will be changed in a separate series. The clock_getres() interface has also been changed to use timespec64 even though this particular interface is not affected by the y2038 problem. This helps verification for internal kernel code for y2038 readiness by getting rid of time_t/ timeval/ timespec completely. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: y2038@lists.linaro.org Cc: john.stultz@linaro.org Cc: arnd@arndb.de Link: http://lkml.kernel.org/r/1490555058-4603-5-git-send-email-deepa.kernel@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-14time: Change k_clock clock_get() to use timespec64Deepa Dinamani5-31/+33
struct timespec is not y2038 safe on 32 bit machines. Replace uses of struct timespec with struct timespec64 in the kernel. The syscall interfaces themselves will be changed in a separate series. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: y2038@lists.linaro.org Cc: john.stultz@linaro.org Cc: arnd@arndb.de Link: http://lkml.kernel.org/r/1490555058-4603-4-git-send-email-deepa.kernel@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>