summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2013-04-29Merge tag 'pci-v3.10-changes' of ↵Linus Torvalds10-115/+156
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "PCI changes for the v3.10 merge window: PCI device hotplug - Remove ACPI PCI subdrivers (Jiang Liu, Myron Stowe) - Make acpiphp builtin only, not modular (Jiang Liu) - Add acpiphp mutual exclusion (Jiang Liu) Power management - Skip "PME enabled/disabled" messages when not supported (Rafael Wysocki) - Fix fallback to PCI_D0 (Rafael Wysocki) Miscellaneous - Factor quirk_io_region (Yinghai Lu) - Cache MSI capability offsets & cleanup (Gavin Shan, Bjorn Helgaas) - Clean up EISA resource initialization and logging (Bjorn Helgaas) - Fix prototype warnings (Andy Shevchenko, Bjorn Helgaas) - MIPS: Initialize of_node before scanning bus (Gabor Juhos) - Fix pcibios_get_phb_of_node() declaration "weak" annotation (Gabor Juhos) - Add MSI INTX_DISABLE quirks for AR8161/AR8162/etc (Xiong Huang) - Fix aer_inject return values (Prarit Bhargava) - Remove PME/ACPI dependency (Andrew Murray) - Use shared PCI_BUS_NUM() and PCI_DEVID() (Shuah Khan)" * tag 'pci-v3.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (63 commits) vfio-pci: Use cached MSI/MSI-X capabilities vfio-pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK PCI: Remove "extern" from function declarations PCI: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK PCI: Drop msi_mask_reg() and remove drivers/pci/msi.h PCI: Use msix_table_size() directly, drop multi_msix_capable() PCI: Drop msix_table_offset_reg() and msix_pba_offset_reg() macros PCI: Drop is_64bit_address() and is_mask_bit_support() macros PCI: Drop msi_data_reg() macro PCI: Drop msi_lower_address_reg() and msi_upper_address_reg() macros PCI: Drop msi_control_reg() macro and use PCI_MSI_FLAGS directly PCI: Use cached MSI/MSI-X offsets from dev, not from msi_desc PCI: Clean up MSI/MSI-X capability #defines PCI: Use cached MSI-X cap while enabling MSI-X PCI: Use cached MSI cap while enabling MSI interrupts PCI: Remove MSI/MSI-X cap check in pci_msi_check_device() PCI: Cache MSI/MSI-X capability offsets in struct pci_dev PCI: Use u8, not int, for PM capability offset [SCSI] megaraid_sas: Use correct #define for MSI-X capability PCI: Remove "extern" from function declarations ...
2013-04-29Merge branch 'core-locking-for-linus' of ↵Linus Torvalds2-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking changes from Ingo Molnar: "The most noticeable change are mutex speedups from Waiman Long, for higher loads. These scalability changes should be most noticeable on larger server systems. There are also cleanups, fixes and debuggability improvements." * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lockdep: Consolidate bug messages into a single print_lockdep_off() function lockdep: Print out additional debugging advice when we hit lockdep BUGs mutex: Back out architecture specific check for negative mutex count mutex: Queue mutex spinners with MCS lock to reduce cacheline contention mutex: Make more scalable by doing less atomic operations mutex: Move mutex spinning code from sched/core.c back to mutex.c locking/rtmutex/tester: Set correct permissions on sysfs files lockdep: Remove unnecessary 'hlock_next' variable
2013-04-29Merge tag 'stable/for-linus-3.10-rc0-tag' of ↵Linus Torvalds1-2/+1
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen updates from Konrad Rzeszutek Wilk: "Features: - Populate the boot_params with EDD data. - Cleanups in the IRQ code. Bug-fixes: - CPU hotplug offline/online in PVHVM mode. - Re-upload processor PM data after ACPI S3 suspend/resume cycle." And Konrad gets a gold star for sending the pull request early when he thought he'd be away for the first week of the merge window (but because of 3.9 dragging out to -rc8 he then re-sent the reminder on the first day of the merge window anyway) * tag 'stable/for-linus-3.10-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: resolve section mismatch warnings in xen-acpi-processor xen: Re-upload processor PM data to hypervisor after S3 resume (v2) xen/smp: Unifiy some of the PVs and PVHVM offline CPU path xen/smp/pvhvm: Don't initialize IRQ_WORKER as we are using the native one. xen/spinlock: Disable IRQ spinlock (PV) allocation on PVHVM xen/spinlock: Check against default value of -1 for IRQ line. xen/time: Add default value of -1 for IRQ and check for that. xen/events: Check that IRQ value passed in is valid. xen/time: Fix kasprintf splat when allocating timer%d IRQ line. xen/smp/spinlock: Fix leakage of the spinlock interrupt line for every CPU online/offline xen/smp: Fix leakage of timer interrupt line for every CPU online/offline. xen kconfig: fix select INPUT_XEN_KBDDEV_FRONTEND xen: drop tracking of IRQ vector x86/xen: populate boot_params with EDD data
2013-04-24Merge branch 'pci/gavin-msi-cleanup' into nextBjorn Helgaas3-27/+31
* pci/gavin-msi-cleanup: vfio-pci: Use cached MSI/MSI-X capabilities vfio-pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK PCI: Remove "extern" from function declarations PCI: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK PCI: Drop msi_mask_reg() and remove drivers/pci/msi.h PCI: Use msix_table_size() directly, drop multi_msix_capable() PCI: Drop msix_table_offset_reg() and msix_pba_offset_reg() macros PCI: Drop is_64bit_address() and is_mask_bit_support() macros PCI: Drop msi_data_reg() macro PCI: Drop msi_lower_address_reg() and msi_upper_address_reg() macros PCI: Drop msi_control_reg() macro and use PCI_MSI_FLAGS directly PCI: Use cached MSI/MSI-X offsets from dev, not from msi_desc PCI: Clean up MSI/MSI-X capability #defines PCI: Use cached MSI-X cap while enabling MSI-X PCI: Use cached MSI cap while enabling MSI interrupts PCI: Remove MSI/MSI-X cap check in pci_msi_check_device() PCI: Cache MSI/MSI-X capability offsets in struct pci_dev PCI: Use u8, not int, for PM capability offset [SCSI] megaraid_sas: Use correct #define for MSI-X capability
2013-04-23PCI: Remove "extern" from function declarationsBjorn Helgaas1-12/+11
We had an inconsistent mix of using and omitting the "extern" keyword on function declarations in header files. This removes them all. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-04-23PCI: Clean up MSI/MSI-X capability #definesBjorn Helgaas1-13/+17
This doesn't change any existing symbols, but it puts them in logical order and uses explicit masks instead of shifts, like the rest of the file. It also adds new symbols for PCI_MSIX_TABLE_BIR, PCI_MSIX_TABLE_OFFSET, PCI_MSIX_PBA_BIR, and PCI_MSIX_PBA_OFFSET to replace the mis-named PCI_MSIX_FLAGS_BIRMASK (the BAR index fields are part of the Table and PBA registers, not the flags register). Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-04-23PCI: Cache MSI/MSI-X capability offsets in struct pci_devGavin Shan1-0/+2
The patch caches the MSI and MSI-X capability offset in PCI device (struct pci_dev) so that we needn't read it from the config space upon enabling or disabling MSI or MSI-X interrupts. [bhelgaas: moved pm_cap size change to separate patch] Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-04-23PCI: Use u8, not int, for PM capability offsetBjorn Helgaas1-2/+1
The Power Management Capability (PCI_CAP_ID_PM == 0x01) is defined by PCI and must appear in the 256-byte PCI Configuration Space from 0-0xff. It cannot be in the PCIe Extended Configuration space from 0x100-0xfff, so we only need a u8 to hold its offset. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-20Merge branch 'x86-kdump-for-linus' of ↵Linus Torvalds2-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull kdump fixes from Peter Anvin: "The kexec/kdump people have found several problems with the support for loading over 4 GiB that was introduced in this merge cycle. This is partly due to a number of design problems inherent in the way the various pieces of kdump fit together (it is pretty horrifically manual in many places.) After a *lot* of iterations this is the patchset that was agreed upon, but of course it is now very late in the cycle. However, because it changes both the syntax and semantics of the crashkernel option, it would be desirable to avoid a stable release with the broken interfaces." I'm not happy with the timing, since originally the plan was to release the final 3.9 tomorrow. But apparently I'm doing an -rc8 instead... * 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kexec: use Crash kernel for Crash kernel low x86, kdump: Change crashkernel_high/low= to crashkernel=,high/low x86, kdump: Retore crashkernel= to allocate under 896M x86, kdump: Set crashkernel_low automatically
2013-04-20Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2-1/+22
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "Three groups of fixes: 1. Make sure we don't execute the early microcode patching if family < 6, since it would touch MSRs which don't exist on those families, causing crashes. 2. The Xen partial emulation of HyperV can be dealt with more gracefully than just disabling the driver. 3. More EFI variable space magic. In particular, variables hidden from runtime code need to be taken into account too." * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, microcode: Verify the family before dispatching microcode patching x86, hyperv: Handle Xen emulation of Hyper-V more gracefully x86,efi: Implement efi_no_storage_paranoia parameter efi: Export efi_query_variable_store() for efivars.ko x86/Kconfig: Make EFI select UCS2_STRING efi: Distinguish between "remaining space" and actually used space efi: Pass boot services variable info to runtime code Move utf16 functions to kernel core and rename x86,efi: Check max_size only if it is non-zero. x86, efivars: firmware bug workarounds should be in platform code
2013-04-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds3-3/+6
Pull networking updates from David Miller: 1) ax88796 does 64-bit divides which causes link errors on ARM, fix from Arnd Bergmann. 2) Once an improper offload setting is detected on an SKB we don't rate limit the log message so we can very easily live lock. From Ben Greear. 3) Openvswitch cannot report vport configuration changes reliably because it didn't preallocate the netlink notification message before changing state. From Jesse Gross. 4) The effective UID/GID SCM credentials fix, from Linus. 5) When a user explicitly asks for wireless authentication, cfg80211 isn't told about the AP detachment leaving inconsistent state. Fix from Johannes Berg. 6) Fix self-MAC checks in batman-adv on multi-mesh nodes, from Antonio Quartulli. 7) Revert build_skb() change sin IGB driver, can result in memory corruption. From Alexander Duyck. 8) Fix setting VLANs on virtual functions in IXGBE, from Greg Rose. 9) Fix TSO races in qlcnic driver, from Sritej Velaga. 10) In bnx2x the kernel driver and UNDI firmware can try to program the chip at the same time, resulting in corruption. Add proper synchronization. From Dmitry Kravkov. 11) Fix corruption of status block in firmware ram in bxn2x, from Ariel Elior. 12) Fix load balancing hash regression of bonding driver in forwarding configurations, from Eric Dumazet. 13) Fix TS ECR regression in TCP by calling tcp_replace_ts_recent() in all the right spots, from Eric Dumazet. 14) Fix several bonding bugs having to do with address manintainence, including not removing address when configuration operations encounter errors, missed locking on the address lists, missing refcounting on VLAN objects, etc. All from Nikolay Aleksandrov. 15) Add workarounds for firmware bugs in LTE qmi_wwan devices, wherein the devices fail to add a proper ethernet header while on LTE networks but otherwise properly do so on 2G and 3G ones. From Bjørn Mork. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits) net: fix incorrect credentials passing net: rate-limit warn-bad-offload splats. net: ax88796: avoid 64 bit arithmetic qlge: Update version to 1.00.00.32. qlge: Fix ethtool autoneg advertising. qlge: Fix receive path to drop error frames net: qmi_wwan: prevent duplicate mac address on link (firmware bug workaround) net: qmi_wwan: fixup destination address (firmware bug workaround) net: qmi_wwan: fixup missing ethernet header (firmware bug workaround) bonding: in bond_mc_swap() bond's mc addr list is walked without lock bonding: disable netpoll on enslave failure bonding: primary_slave & curr_active_slave are not cleaned on enslave failure bonding: vlans don't get deleted on enslave failure bonding: mc addresses don't get deleted on enslave failure pkt_sched: fix error return code in fw_change_attrs() irda: small read past the end of array in debug code tcp: call tcp_replace_ts_recent() from tcp_ack() netfilter: xt_rpfilter: skip locally generated broadcast/multicast, too netfilter: ipset: bitmap:ip,mac: fix listing with timeout bonding: fix l23 and l34 load balancing in forwarding path ...
2013-04-20net: fix incorrect credentials passingLinus Torvalds1-2/+2
Commit 257b5358b32f ("scm: Capture the full credentials of the scm sender") changed the credentials passing code to pass in the effective uid/gid instead of the real uid/gid. Obviously this doesn't matter most of the time (since normally they are the same), but it results in differences for suid binaries when the wrong uid/gid ends up being used. This just undoes that (presumably unintentional) part of the commit. Reported-by: Andy Lutomirski <luto@amacapital.net> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge E. Hallyn <serge@hallyn.com> Cc: David S. Miller <davem@davemloft.net> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19Merge remote-tracking branch 'efi/urgent' into x86/urgentH. Peter Anvin2-1/+22
Matt Fleming (1): x86, efivars: firmware bug workarounds should be in platform code Matthew Garrett (3): Move utf16 functions to kernel core and rename efi: Pass boot services variable info to runtime code efi: Distinguish between "remaining space" and actually used space Richard Weinberger (2): x86,efi: Check max_size only if it is non-zero. x86,efi: Implement efi_no_storage_paranoia parameter Sergey Vlasov (2): x86/Kconfig: Make EFI select UCS2_STRING efi: Export efi_query_variable_store() for efivars.ko Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-19irda: small read past the end of array in debug codeDan Carpenter1-1/+2
The "reason" can come from skb->data[] and it hasn't been capped so it can be from 0-255 instead of just 0-6. For example in irlmp_state_dtr() the code does: reason = skb->data[3]; ... irlmp_disconnect_indication(self, reason, skb); Also LMREASON has a couple other values which don't have entries in the irlmp_reasons[] array. And 0xff is a valid reason as well which means "unknown". So far as I can see we don't actually care about "reason" except for in the debug code. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19Merge branch 'for-linus' of ↵Linus Torvalds1-220/+216
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse build fix from Miklos Szeredi: "This fixes android builds. The patch appears large, but is just search & replace." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: fix type definitions in uapi header
2013-04-19mutex: Queue mutex spinners with MCS lock to reduce cacheline contentionWaiman Long1-0/+3
The current mutex spinning code (with MUTEX_SPIN_ON_OWNER option turned on) allow multiple tasks to spin on a single mutex concurrently. A potential problem with the current approach is that when the mutex becomes available, all the spinning tasks will try to acquire the mutex more or less simultaneously. As a result, there will be a lot of cacheline bouncing especially on systems with a large number of CPUs. This patch tries to reduce this kind of contention by putting the mutex spinners into a queue so that only the first one in the queue will try to acquire the mutex. This will reduce contention and allow all the tasks to move forward faster. The queuing of mutex spinners is done using an MCS lock based implementation which will further reduce contention on the mutex cacheline than a similar ticket spinlock based implementation. This patch will add a new field into the mutex data structure for holding the MCS lock. This expands the mutex size by 8 bytes for 64-bit system and 4 bytes for 32-bit system. This overhead will be avoid if the MUTEX_SPIN_ON_OWNER option is turned off. The following table shows the jobs per minute (JPM) scalability data on an 8-node 80-core Westmere box with a 3.7.10 kernel. The numactl command is used to restrict the running of the fserver workloads to 1/2/4/8 nodes with hyperthreading off. +-----------------+-----------+-----------+-------------+----------+ | Configuration | Mean JPM | Mean JPM | Mean JPM | % Change | | | w/o patch | patch 1 | patches 1&2 | 1->1&2 | +-----------------+------------------------------------------------+ | | User Range 1100 - 2000 | +-----------------+------------------------------------------------+ | 8 nodes, HT off | 227972 | 227237 | 305043 | +34.2% | | 4 nodes, HT off | 393503 | 381558 | 394650 | +3.4% | | 2 nodes, HT off | 334957 | 325240 | 338853 | +4.2% | | 1 node , HT off | 198141 | 197972 | 198075 | +0.1% | +-----------------+------------------------------------------------+ | | User Range 200 - 1000 | +-----------------+------------------------------------------------+ | 8 nodes, HT off | 282325 | 312870 | 332185 | +6.2% | | 4 nodes, HT off | 390698 | 378279 | 393419 | +4.0% | | 2 nodes, HT off | 336986 | 326543 | 340260 | +4.2% | | 1 node , HT off | 197588 | 197622 | 197582 | 0.0% | +-----------------+-----------+-----------+-------------+----------+ At low user range 10-100, the JPM differences were within +/-1%. So they are not that interesting. The fserver workload uses mutex spinning extensively. With just the mutex change in the first patch, there is no noticeable change in performance. Rather, there is a slight drop in performance. This mutex spinning patch more than recovers the lost performance and show a significant increase of +30% at high user load with the full 8 nodes. Similar improvements were also seen in a 3.8 kernel. The table below shows the %time spent by different kernel functions as reported by perf when running the fserver workload at 1500 users with all 8 nodes. +-----------------------+-----------+---------+-------------+ | Function | % time | % time | % time | | | w/o patch | patch 1 | patches 1&2 | +-----------------------+-----------+---------+-------------+ | __read_lock_failed | 34.96% | 34.91% | 29.14% | | __write_lock_failed | 10.14% | 10.68% | 7.51% | | mutex_spin_on_owner | 3.62% | 3.42% | 2.33% | | mspin_lock | N/A | N/A | 9.90% | | __mutex_lock_slowpath | 1.46% | 0.81% | 0.14% | | _raw_spin_lock | 2.25% | 2.50% | 1.10% | +-----------------------+-----------+---------+-------------+ The fserver workload for an 8-node system is dominated by the contention in the read/write lock. Mutex contention also plays a role. With the first patch only, mutex contention is down (as shown by the __mutex_lock_slowpath figure) which help a little bit. We saw only a few percents improvement with that. By applying patch 2 as well, the single mutex_spin_on_owner figure is now split out into an additional mspin_lock figure. The time increases from 3.42% to 11.23%. It shows a great reduction in contention among the spinners leading to a 30% improvement. The time ratio 9.9/2.33=4.3 indicates that there are on average 4+ spinners waiting in the spin_lock loop for each spinner in the mutex_spin_on_owner loop. Contention in other locking functions also go down by quite a lot. The table below shows the performance change of both patches 1 & 2 over patch 1 alone in other AIM7 workloads (at 8 nodes, hyperthreading off). +--------------+---------------+----------------+-----------------+ | Workload | mean % change | mean % change | mean % change | | | 10-100 users | 200-1000 users | 1100-2000 users | +--------------+---------------+----------------+-----------------+ | alltests | 0.0% | -0.8% | +0.6% | | five_sec | -0.3% | +0.8% | +0.8% | | high_systime | +0.4% | +2.4% | +2.1% | | new_fserver | +0.1% | +14.1% | +34.2% | | shared | -0.5% | -0.3% | -0.4% | | short | -1.7% | -9.8% | -8.3% | +--------------+---------------+----------------+-----------------+ The short workload is the only one that shows a decline in performance probably due to the spinner locking and queuing overhead. Signed-off-by: Waiman Long <Waiman.Long@hp.com> Reviewed-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Chandramouleeswaran Aswin <aswin@hp.com> Cc: Norton Scott J <scott.norton@hp.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1366226594-5506-4-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-19mutex: Move mutex spinning code from sched/core.c back to mutex.cWaiman Long1-1/+0
As mentioned by Ingo, the SCHED_FEAT_OWNER_SPIN scheduler feature bit was really just an early hack to make with/without mutex-spinning testable. So it is no longer necessary. This patch removes the SCHED_FEAT_OWNER_SPIN feature bit and move the mutex spinning code from kernel/sched/core.c back to kernel/mutex.c which is where they should belong. Signed-off-by: Waiman Long <Waiman.Long@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Chandramouleeswaran Aswin <aswin@hp.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Norton Scott J <scott.norton@hp.com> Cc: Rik van Riel <riel@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1366226594-5506-2-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-18Merge branch 'master' of ↵John W. Linville1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2013-04-18Revert "block: add missing block_bio_complete() tracepoint"Linus Torvalds2-5/+4
This reverts commit 3a366e614d0837d9fc23f78cdb1a1186ebc3387f. Wanlong Gao reports that it causes a kernel panic on his machine several minutes after boot. Reverting it removes the panic. Jens says: "It's not quite clear why that is yet, so I think we should just revert the commit for 3.9 final (which I'm assuming is pretty close). The wifi is crap at the LSF hotel, so sending this email instead of queueing up a revert and pull request." Reported-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Requested-by: Jens Axboe <axboe@kernel.dk> Cc: Tejun Heo <tj@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-17x86, kdump: Retore crashkernel= to allocate under 896MYinghai Lu1-0/+2
Vivek found old kexec-tools does not work new kernel anymore. So change back crashkernel= back to old behavoir, and add crashkernel_high= to let user decide if buffer could be above 4G, and also new kexec-tools will be needed. -v2: let crashkernel=X override crashkernel_high= update description about _high will be ignored by crashkernel=X -v3: update description about kernel-parameters.txt according to Vivek. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1366089828-19692-4-git-send-email-yinghai@kernel.org Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-17x86, kdump: Set crashkernel_low automaticallyYinghai Lu1-0/+1
Chao said that kdump does does work well on his system on 3.8 without extra parameter, even iommu does not work with kdump. And now have to append crashkernel_low=Y in first kernel to make kdump work. We have now modified crashkernel=X to allocate memory beyong 4G (if available) and do not allocate low range for crashkernel if the user does not specify that with crashkernel_low=Y. This causes regression if iommu is not enabled. Without iommu, swiotlb needs to be setup in first 4G and there is no low memory available to second kernel. Set crashkernel_low automatically if the user does not specify that. For system that does support IOMMU with kdump properly, user could specify crashkernel_low=0 to save that 72M low ram. -v3: add swiotlb_size() according to Konrad. -v4: add comments what 8M is for according to hpa. also update more crashkernel_low= in kernel-parameters.txt -v5: update changelog according to Vivek. -v6: Change description about swiotlb referring according to HATAYAMA. Reported-by: WANG Chao <chaowang@redhat.com> Tested-by: WANG Chao <chaowang@redhat.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1366089828-19692-2-git-send-email-yinghai@kernel.org Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-17Merge branch 'pci/cleanup' into nextBjorn Helgaas5-79/+78
* pci/cleanup: PCI: Remove "extern" from function declarations PCI: Warn about failures instead of "must_check" functions PCI: Remove __must_check from definitions PCI: Remove unused variables PCI: Move cpci_hotplug_init() proto to header file PCI: Make local functions/structs static PCI: Fix missing prototype for pcie_port_acpi_setup() Conflicts: drivers/pci/hotplug/acpiphp.h include/linux/pci.h
2013-04-17PCI: Remove "extern" from function declarationsBjorn Helgaas5-79/+78
We had an inconsistent mix of using and omitting the "extern" keyword on function declarations in header files. This removes them all. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-04-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2-7/+24
Pull networking fixes from David Miller: 1) Fix erroneous netfilter drop of SIP packets generated by some Cisco phones, from Patrick McHardy. 2) Fix netfilter IPSET refcounting in list_set_add(), from Jozsef Kadlecsik. 3) Fix TCP syncookies route lookup key, we don't use the same values we would use for the usual SYN receive processing, from Dmitry Popov. 4) Fix NULL deref in bond_slave_netdev_event(), from Nikolay Aleksandrov. 5) When bonding enslave fails, we can forget to clear the IFF_BONDING bit, fix also from Nikolay Aleksandrov. 6) skb->csum_start is 16-bits, which is almost always just fine. But if we reallocate the headroom of an SKB this can push the skb->csum_start value outside of it's valid range. This can easily happen when collapsing multiple SKBs from the retransmit queue together. Fix from Thomas Graf. 7) Fix NULL deref in be2net driver due to missing check of __vlan_put_tag() return value, from Ivan Vecera. 8) tun_set_iff() returns zero instead of error code on failure, fix from Wei Yongjun. 9) Like GARP, 802 MRP needs to hold the app->lock when adding MAD events and queueing PDUs. Fix from David Ward. 10) Build fix, MVMDIO needs PHYLIB, from Thomas Petazzoni.. 11) Fix mac80211 static with ipv6 modular build, from Cong Wang. 12) If userland specifies a path cost explicitly, do not override it when the carrier state changes. From Stephen Hemminger. 13) mvnets calculates the TX queue to use incorrectly resulting in garbage pointer derefs and crashes, fix from Willy Tarreau. 14) cdc_mbim does erroneous sizeof(ETH_HLEN). Fix from Bjorn Mork. 15) IP fragmentation can leak a refcount-less route out from an RCU protected section. This results in crashes and all sorts of hard to diagnose behavior. Fix from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (24 commits) qlcnic: fix beaconing test for 82xx adapter net: drop dst before queueing fragments net: fec: fix regression in link change accounting net: cdc_mbim: remove bogus sizeof() drivers: net: ethernet: cpsw: get slave VLAN id from slave node instead of cpsw node net: mvneta: fix improper tx queue usage in mvneta_tx() esp4: fix error return code in esp_output() bridge: make user modified path cost sticky ipv6: statically link register_inet6addr_notifier() net: mvmdio: add select PHYLIB net/802/mrp: fix possible race condition when calling mrp_pdu_queue() tuntap: fix error return code in tun_set_iff() be2net: take care of __vlan_put_tag return value can: sja1000: fix handling on dt properties on little endian systems can: mcp251x: add missing IRQF_ONESHOT to request_threaded_irq netfilter: nf_nat: fix race when unloading protocol modules tcp: Reallocate headroom if it would overflow csum_start stmmac: prevent interrupt loop with MMC RX IPC Counter bonding: IFF_BONDING is not stripped on enslave failure bonding: fix netdev event NULL pointer dereference ...
2013-04-17fuse: fix type definitions in uapi headerMiklos Szeredi1-220/+216
Commit 7e98d53086d18c877cb44e9065219335184024de (Synchronize fuse header with one used in library) added #ifdef __linux__ around defines if it is not set. The kernel build is self-contained and can be built on non-Linux toolchains. After the mentioned commit builds on non-Linux toolchains will try to include stdint.h and fail due to -nostdinc, and then fail with a bunch of undefined type errors. Fix by checking for __KERNEL__ instead of __linux__ and using the standard int types instead of the linux specific ones. Reported-by: Arve Hjønnevåg <arve@android.com> Reported-by: Colin Cross <ccross@android.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-04-16vm: add vm_iomap_memory() helper functionLinus Torvalds1-0/+2
Various drivers end up replicating the code to mmap() their memory buffers into user space, and our core memory remapping function may be very flexible but it is unnecessarily complicated for the common cases to use. Our internal VM uses pfn's ("page frame numbers") which simplifies things for the VM, and allows us to pass physical addresses around in a denser and more efficient format than passing a "phys_addr_t" around, and having to shift it up and down by the page size. But it just means that drivers end up doing that shifting instead at the interface level. It also means that drivers end up mucking around with internal VM things like the vma details (vm_pgoff, vm_start/end) way more than they really need to. So this just exports a function to map a certain physical memory range into user space (using a phys_addr_t based interface that is much more natural for a driver) and hides all the complexity from the driver. Some drivers will still end up tweaking the vm_page_prot details for things like prefetching or cacheability etc, but that's actually relevant to the driver, rather than caring about what the page offset of the mapping is into the particular IO memory region. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-16xen: drop tracking of IRQ vectorJan Beulich1-2/+1
For quite a few Xen versions, this wasn't the IRQ vector anymore anyway, and it is not being used by the kernel for anything. Hence drop the field from struct irq_info, and respective function parameters. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-04-16Merge branch 'pci/jiang-subdrivers' into nextBjorn Helgaas3-9/+31
* pci/jiang-subdrivers: PCI/ACPI: Remove support of ACPI PCI subdrivers PCI: acpiphp: Protect acpiphp data structures from concurrent updates PCI: acpiphp: Use normal list to simplify implementation PCI: acpiphp: Do not use ACPI PCI subdriver mechanism PCI: acpiphp: Convert acpiphp to be builtin only, not modular PCI/ACPI: Handle PCI slot devices when creating/destroying PCI buses x86/PCI: Implement pcibios_{add|remove}_bus() hooks ia64/PCI: Implement pcibios_{add|remove}_bus() hooks PCI/ACPI: Prepare stub functions to handle ACPI PCI (hotplug) slots PCI: Add pcibios hooks for adding and removing PCI buses PCI: acpiphp: Replace local macros with standard ACPI macros PCI: acpiphp: Remove all functions even if function 0 doesn't exist PCI: acpiphp: Use list_for_each_entry_safe() in acpiphp_sanitize_bus() PCI: Clean up usages of pci_bus->is_added PCI: When removing bus, always remove legacy files & unregister
2013-04-16PCI/ACPI: Remove support of ACPI PCI subdriversMyron Stowe1-9/+0
Both sub-drivers of the "PCI Root Bridge ("pci_bridge")" driver, "acpiphp" and "pci_slot", have been converted to hook directly into the PCI core. With the conversions there are no remaining usages of the 'struct acpi_pci_driver' list based infrastructure. This patch removes it. Signed-off-by: Myron Stowe <myron.stowe@redhat.com> Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Yinghai Lu <yinghai@kernel.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Toshi Kani <toshi.kani@hp.com>
2013-04-15Merge branch 'pci/misc' into nextBjorn Helgaas1-0/+1
* pci/misc: PCI: Clean up quirk_io_region PCI: Use vma_pages() to replace (vm_end - vm_start) >> PAGE_SHIFT PCI: Use PCI_EXP_SLTCAP_PSN mask when extracting slot number PCI: Remove unnecessary dependencies between PME and ACPI [SCSI] mvumi: Use PCI_VENDOR_ID_MARVELL_EXT for 0x1b4b [SCSI] mvsas: Use PCI_VENDOR_ID_MARVELL_EXT for 0x1b4b ahci: Use PCI_VENDOR_ID_MARVELL_EXT for 0x1b4b PCI: Define macro for Marvell vendor ID PCI: Add MSI INTX_DISABLE quirks for AR8161/AR8162/AR8171/AR8172/E210X PCI: aer_inject: Fix return values when device not found
2013-04-15PCI: Define macro for Marvell vendor IDXiangliang Yu1-0/+1
Define PCI_VENDOR_ID_MARVELL_EXT macro for 0x1b4b vendor ID Signed-off-by: Xiangliang Yu <yuxiangl@marvell.com> Signed-off-by: Myron Stowe <myron.stowe@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-04-15Merge branch 'pci/gabor-get-of-node' into nextBjorn Helgaas1-1/+1
* pci/gabor-get-of-node: MIPS/PCI: Implement pcibios_get_phb_of_node PCI: Remove __weak annotation from pcibios_get_phb_of_node decl
2013-04-15Move utf16 functions to kernel core and renameMatthew Garrett1-0/+14
We want to be able to use the utf16 functions that are currently present in the EFI variables code in platform-specific code as well. Move them to the kernel core, and in the process rename them to accurately describe what they do - they don't handle UTF16, only UCS2. Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-04-15Merge branches 'timers-urgent-for-linus', 'irq-urgent-for-linus' and ↵Linus Torvalds2-3/+4
'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull {timer,irq,core} fixes from Thomas Gleixner: - timer: bug fix for a cpu hotplug race. - irq: single bugfix for a wrong return value, which prevents the calling function to invoke the software fallback. - core: bugfix which plugs two race confitions which can cause hotplug per cpu threads to end up on the wrong cpu. * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hrtimer: Don't reinitialize a cpu_base lock on CPU_UP * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip: gic: fix irq_trigger return * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kthread: Prevent unpark race which puts threads on the wrong cpu
2013-04-14ipv6: statically link register_inet6addr_notifier()Cong Wang1-0/+1
Tomas reported the following build error: net/built-in.o: In function `ieee80211_unregister_hw': (.text+0x10f0e1): undefined reference to `unregister_inet6addr_notifier' net/built-in.o: In function `ieee80211_register_hw': (.text+0x10f610): undefined reference to `register_inet6addr_notifier' make: *** [vmlinux] Error 1 when built IPv6 as a module. So we have to statically link these symbols. Reported-by: Tomas Melin <tomas.melin@iki.fi> Cc: Tomas Melin <tomas.melin@iki.fi> Cc: "David S. Miller" <davem@davemloft.net> Cc: YOSHIFUJI Hidaki <yoshfuji@linux-ipv6.org> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-14Merge tag 'trace-fixes-v3.9-rc-v3' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull ftrace fixes from Steven Rostedt: "Namhyung Kim found and fixed a bug that can crash the kernel by simply doing: echo 1234 | tee -a /sys/kernel/debug/tracing/set_ftrace_pid Luckily, this can only be done by root, but still is a nasty bug." * tag 'trace-fixes-v3.9-rc-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Move ftrace_filter_lseek out of CONFIG_DYNAMIC_FTRACE section tracing: Fix possible NULL pointer dereferences
2013-04-14Add file_ns_capable() helper function for open-time capability checkingLinus Torvalds1-0/+2
Nothing is using it yet, but this will allow us to delay the open-time checks to use time, without breaking the normal UNIX permission semantics where permissions are determined by the opener (and the file descriptor can then be passed to a different process, or the process can drop capabilities). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-12x86-32: Fix possible incomplete TLB invalidate with PAE pagetablesDave Hansen1-1/+6
This patch attempts to fix: https://bugzilla.kernel.org/show_bug.cgi?id=56461 The symptom is a crash and messages like this: chrome: Corrupted page table at address 34a03000 *pdpt = 0000000000000000 *pde = 0000000000000000 Bad pagetable: 000f [#1] PREEMPT SMP Ingo guesses this got introduced by commit 611ae8e3f520 ("x86/tlb: enable tlb flush range support for x86") since that code started to free unused pagetables. On x86-32 PAE kernels, that new code has the potential to free an entire PMD page and will clear one of the four page-directory-pointer-table (aka pgd_t entries). The hardware aggressively "caches" these top-level entries and invlpg does not actually affect the CPU's copy. If we clear one we *HAVE* to do a full TLB flush, otherwise we might continue using a freed pmd page. (note, we do this properly on the population side in pud_populate()). This patch tracks whenever we clear one of these entries in the 'struct mmu_gather', and ensures that we follow up with a full tlb flush. BTW, I disassembled and checked that: if (tlb->fullmm == 0) and if (!tlb->fullmm && !tlb->need_flush_all) generate essentially the same code, so there should be zero impact there to the !PAE case. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Artem S Tashkinov <t.artem@mailcity.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-12PCI: acpiphp: Do not use ACPI PCI subdriver mechanismJiang Liu1-0/+11
Previously the acpiphp driver registered itself as an ACPI PCI subdriver, so its callbacks were invoked when creating/destroying PCI root buses to manage ACPI-based PCI hotplug slots. But it doesn't handle P2P bridge hotplug events, so it will cause strange behaviour if there are hotplug slots associated with a hot-removed P2P bridge. This patch fixes this issue by: 1) Directly hooking into PCI core to update hotplug slot devices when creating/destroying PCI buses through: pci_{add|remove}_bus() -> acpi_pci_{add|remove}_bus() 2) Getting rid of unused ACPI PCI subdriver-related code It also cleans up unused code in the acpiphp driver. [bhelgaas: keep acpi_pci_add_bus() stub for CONFIG_ACPI=n] Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Yinghai Lu <yinghai@kernel.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Toshi Kani <toshi.kani@hp.com>
2013-04-12PCI/ACPI: Handle PCI slot devices when creating/destroying PCI busesJiang Liu1-0/+12
Currently the pci_slot driver doesn't update PCI slot devices when PCI device hotplug event happens, which may cause memory leak and returning stale information to user. Now the pci_slot driver has been changed as built-in driver, so invoke PCI slot enumeration and destroy routines directly from the PCI core. And remove ACPI PCI sub-driver related code because it isn't needed any more. [bhelgas: removed "extern" from function declarations] Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Toshi Kani <toshi.kani@hp.com>
2013-04-12PCI/ACPI: Prepare stub functions to handle ACPI PCI (hotplug) slotsJiang Liu1-1/+7
Prepare two stub functions to handle ACPI PCI slots and ACPI PCI hotplug slots, which will be invoked by the PCI core when creating/destroying PCI buses. It will be used to get rid of ACPI PCI subdrivers for pci_slot and acpiphp, and eventually remove the ACPI PCI subdriver mechanism. And it will also be used to handle ACPI PCI (hotplug) slots in a unified way, both at boot time and for PCI hotplug operations. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Yinghai Lu <yinghai@kernel.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Myron Stowe <myron.stowe@redhat.com>
2013-04-12PCI: Add pcibios hooks for adding and removing PCI busesJiang Liu1-0/+2
On ACPI-based platforms, the pci_slot driver creates PCI slot devices according to information from ACPI tables by registering an ACPI PCI subdriver. The ACPI PCI subdriver will only be called when creating/ destroying PCI root buses, and it won't be called when hot-plugging P2P bridges. It may cause stale PCI slot devices after hot-removing a P2P bridge if that bridge has associated PCI slots. And the acpiphp driver has the same issue too. This patch introduces two hook points into the PCI core, which will be invoked when creating/destroying PCI buses for PCI host and P2P bridges. They could be used to setup/destroy platform dependent stuff in a unified way, both at boot time and for PCI hotplug operations. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Yinghai Lu <yinghai@kernel.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Myron Stowe <myron.stowe@redhat.com>
2013-04-12ftrace: Move ftrace_filter_lseek out of CONFIG_DYNAMIC_FTRACE sectionSteven Rostedt (Red Hat)1-1/+2
As ftrace_filter_lseek is now used with ftrace_pid_fops, it needs to be moved out of the #ifdef CONFIG_DYNAMIC_FTRACE section as the ftrace_pid_fops is defined when DYNAMIC_FTRACE is not. Cc: stable@vger.kernel.org Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-12tracing: Fix possible NULL pointer dereferencesNamhyung Kim1-1/+1
Currently set_ftrace_pid and set_graph_function files use seq_lseek for their fops. However seq_open() is called only for FMODE_READ in the fops->open() so that if an user tries to seek one of those file when she open it for writing, it sees NULL seq_file and then panic. It can be easily reproduced with following command: $ cd /sys/kernel/debug/tracing $ echo 1234 | sudo tee -a set_ftrace_pid In this example, GNU coreutils' tee opens the file with fopen(, "a") and then the fopen() internally calls lseek(). Link: http://lkml.kernel.org/r/1365663302-2170-1-git-send-email-namhyung@kernel.org Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: stable@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-12Merge branch 'master' of ↵David S. Miller1-7/+23
git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf into netfilter Pablo Neira Ayuso says: ==================== The following patchset contains late netfilter fixes for your net tree, they are: * Don't drop segmented TCP packets in the SIP helper, we've got reports from users that this was breaking communications when the SIP phone messages are larger than the MTU, from Patrick McHardy. * Fix refcount leak in the ipset list set, from Jozsef Kadlecsik. * On hash set resizing, the nomatch flag was lost, thus entirely inverting the logic of the set matching, from Jozsef Kadlecsik. * Fix crash on NAT modules removal. Timer expiration may race with the module cleanup exit path while deleting conntracks, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-12kthread: Prevent unpark race which puts threads on the wrong cpuThomas Gleixner2-3/+4
The smpboot threads rely on the park/unpark mechanism which binds per cpu threads on a particular core. Though the functionality is racy: CPU0 CPU1 CPU2 unpark(T) wake_up_process(T) clear(SHOULD_PARK) T runs leave parkme() due to !SHOULD_PARK bind_to(CPU2) BUG_ON(wrong CPU) We cannot let the tasks move themself to the target CPU as one of those tasks is actually the migration thread itself, which requires that it starts running on the target cpu right away. The solution to this problem is to prevent wakeups in park mode which are not from unpark(). That way we can guarantee that the association of the task to the target cpu is working correctly. Add a new task state (TASK_PARKED) which prevents other wakeups and use this state explicitly for the unpark wakeup. Peter noticed: Also, since the task state is visible to userspace and all the parked tasks are still in the PID space, its a good hint in ps and friends that these tasks aren't really there for the moment. The migration thread has another related issue. CPU0 CPU1 Bring up CPU2 create_thread(T) park(T) wait_for_completion() parkme() complete() sched_set_stop_task() schedule(TASK_PARKED) The sched_set_stop_task() call is issued while the task is on the runqueue of CPU1 and that confuses the hell out of the stop_task class on that cpu. So we need the same synchronizaion before sched_set_stop_task(). Reported-by: Dave Jones <davej@redhat.com> Reported-and-tested-by: Dave Hansen <dave@sr71.net> Reported-and-tested-by: Borislav Petkov <bp@alien8.de> Acked-by: Peter Ziljstra <peterz@infradead.org> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: dhillf@gmail.com Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304091635430.21884@ionos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-04-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2-0/+20
Pull networking fixes from David Miller: 1) cfg80211_conn_scan() must be called with the sched_scan_mutex, fix from Artem Savkov. 2) Fix regression in TCP ICMPv6 processing, we do not want to treat redirects as socket errors, from Christoph Paasch. 3) Fix several recvmsg() msg_name kernel memory leaks into userspace, in ATM, AX25, Bluetooth, CAIF, IRDA, s390 IUCV, L2TP, LLC, Netrom, NFC, Rose, TIPC, and VSOCK. From Mathias Krause and Wei Yongjun. 4) Fix AF_IUCV handling of segmented SKBs in recvmsg(), from Ursula Braun and Eric Dumazet. 5) CAN gw.c code does kfree() on SLAB cache memory, use kmem_cache_free() instead. Fix from Wei Yongjun. 6) Fix LSM regression on TCP SYN/ACKs, some LSMs such as SELINUX want an skb->sk socket context available for these packets, but nothing else requires it. From Eric Dumazet and Paul Moore. 7) Fix ipv4 address lifetime processing so that we don't perform sleepable acts inside of rcu_read_lock() sections, do them in an rtnl_lock() section instead. From Jiri Pirko. 8) mvneta driver accidently sets HW features after device registry, it should do so beforehand. Fix from Willy Tarreau. 9) Fix bonding unload races more correctly, from Nikolay Aleksandrov and Veaceslav Falico. 10) rtnl_dump_ifinfo() and rtnl_calcit() invoke nlmsg_parse() with wrong header size argument. Fix from Michael Riesch. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits) lsm: add the missing documentation for the security_skb_owned_by() hook bnx2x: Prevent null pointer dereference in AFEX mode e100: Add dma mapping error check selinux: add a skb_owned_by() hook can: gw: use kmem_cache_free() instead of kfree() netrom: fix invalid use of sizeof in nr_recvmsg() qeth: fix qeth_wait_for_threads() deadlock for OSN devices af_iucv: fix recvmsg by replacing skb_pull() function rtnetlink: Call nlmsg_parse() with correct header length bonding: fix bonding_masters race condition in bond unloading Revert "bonding: remove sysfs before removing devices" net: mvneta: enable features before registering the driver hyperv: Fix RNDIS send_completion code path hyperv: Fix a kernel warning from netvsc_linkstatus_callback() net: ipv4: fix schedule while atomic bug in check_lifetime() net: ipv4: reset check_lifetime_work after changing lifetime bnx2x: Fix KR2 rapid link flap sctp: remove 'sridhar' from maintainers list VSOCK: Fix missing msg_namelen update in vsock_stream_recvmsg() VSOCK: vmci - fix possible info leak in vmci_transport_dgram_dequeue() ...
2013-04-10lsm: add the missing documentation for the security_skb_owned_by() hookPaul Moore1-0/+4
Unfortunately we didn't catch the missing comments earlier when the patch was merged. Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-10PCI: Remove __weak annotation from pcibios_get_phb_of_node declBjorn Helgaas1-1/+1
The __weak annotation on the pcibios_get_phb_of_node() declaration causes *every* definition to be marked "weak." The linker then selects one based on link order, which may be the wrong one. Gabor found that on MIPS, the linker selected the generic implementation from drivers/pci even though arch/mips supplied a definition without the __weak annotation: $ mipsel-openwrt-linux-readelf -s arch/mips/pci/built-in.o \ drivers/pci/built-in.o vmlinux.o | grep pcibios_get_phb_of_node 86: 0000046c 12 FUNC WEAK DEFAULT 2 pcibios_get_phb_of_node 1430: 00012e2c 104 FUNC WEAK DEFAULT 2 pcibios_get_phb_of_node 31898: 0017e4ec 104 FUNC WEAK DEFAULT 2 pcibios_get_phb_of_node This removes the __weak annotation from the pcibios_get_phb_of_node() declaration so arch-specific non-weak implementations work reliably. Suggested-by: Gabor Juhos <juhosg@openwrt.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-04-10ssb: implement spurious tone avoidanceRafał Miłecki1-0/+2
And make use of it in b43. This fixes a regression introduced with 49d55cef5b1925a5c1efb6aaddaa40fc7c693335 b43: N-PHY: implement spurious tone avoidance This commit made BCM4322 use only MCS 0 on channel 13, which of course resulted in performance drop (down to 0.7Mb/s). Reported-by: Stefan Brüns <stefan.bruens@rwth-aachen.de> Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Cc: Stable <stable@vger.kernel.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>