summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2012-05-24NFSv4.1 mdsthreshold attribute xdrAndy Adamson2-0/+17
We only support one layout type per file system, so one threshold_item4 per mdsthreshold4. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22NFS: EXCHANGE_ID should save the server major and minor IDChuck Lever2-1/+3
Save the server major and minor ID results from EXCHANGE_ID, as they are needed for detecting server trunking. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22NFS: Add nfs_client behavior flagsChuck Lever2-1/+4
"noresvport" and "discrtry" can be passed to nfs_create_rpc_client() by setting flags in the passed-in nfs_client. This change makes it easy to add new flags. Note that these settings are now "sticky" over the lifetime of a struct nfs_client, and may even be copied when an nfs_client is cloned. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22NFS: Refactor nfs_get_client(): initialize nfs_clientChuck Lever1-1/+2
Clean up: Continue to rationalize the locking in nfs_get_client() by moving the logic that handles the case where a matching server IP address is not found. When we support server trunking detection, client initialization may return a different nfs_client struct than was passed to it. Change the synopsis of the init_client methods to return an nfs_client. The client initialization logic in nfs_get_client() is not much more than a wrapper around ->init_client. It's simpler to keep the little bits of error handling in the version-specific init_client methods. No behavior change is expected. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22NFS: Always use the same SETCLIENTID boot verifierChuck Lever1-3/+0
Currently our NFS client assigns a unique SETCLIENTID boot verifier for each server IP address it knows about. It's set to CURRENT_TIME when the struct nfs_client for that server IP is created. During the SETCLIENTID operation, our client also presents an nfs_client_id4 string to servers, as an identifier on which the server can hang all of this client's NFSv4 state. Our client's nfs_client_id4 string is unique for each server IP address. An NFSv4 server is obligated to wipe all NFSv4 state associated with an nfs_client_id4 string when the client presents the same nfs_client_id4 string along with a changed SETCLIENTID boot verifier. When our client unmounts the last of a server's shares, it destroys that server's struct nfs_client. The next time the client mounts that NFS server, it creates a fresh struct nfs_client with a fresh boot verifier. On seeing the fresh verifer, the server wipes any previous NFSv4 state associated with that nfs_client_id4. However, NFSv4.1 clients are supposed to present the same nfs_client_id4 string to all servers. And, to support Transparent State Migration, the same nfs_client_id4 string should be presented to all NFSv4.0 servers so they recognize that migrated state for this client belongs with state a server may already have for this client. (This is known as the Uniform Client String model). If the nfs_client_id4 string is the same but the boot verifier changes for each server IP address, SETCLIENTID and EXCHANGE_ID operations from such a client could unintentionally result in a server wiping a client's previously obtained lease. Thus, if our NFS client is going to use a fixed nfs_client_id4 string, either for NFSv4.0 or NFSv4.1 mounts, our NFS client should use a boot verifier that does not change depending on server IP address. Replace our current per-nfs_client boot verifier with a per-nfs_net boot verifier. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22NFS: Use proper naming conventions for the nfs_client.net fieldChuck Lever1-1/+1
Clean up: When naming fields and data types, follow established conventions to facilitate accurate grep/cscope searches. Introduced by commit e50a7a1a "NFS: make NFS client allocated per network namespace context," Tue Jan 10, 2012. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22NFS: Use proper naming conventions for nfs_client.impl_id fieldChuck Lever1-1/+1
Clean up: When naming fields and data types, follow established conventions to facilitate accurate grep/cscope searches. Additionally, for consistency, move the impl_id field into the NFSv4- specific part of the nfs_client, and free that memory in the logic that shuts down NFSv4 nfs_clients. Introduced by commit 7d2ed9ac "NFSv4: parse and display server implementation ids," Fri Feb 17, 2012. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22NFS: Use proper naming conventions for NFSv4.1 server scope fieldsChuck Lever2-4/+4
Clean up: When naming fields and data types, follow established conventions to facilitate accurate grep/cscope searches. Additionally, for consistency, move the scope field into the NFSv4- specific part of the nfs_client, and free that memory in the logic that shuts down NFSv4 nfs_clients. Introduced by commit 99fe60d0 "nfs41: exchange_id operation", April 1 2009. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22NFS: Fix comment misspelling in struct nfs_client definitionChuck Lever1-1/+1
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22NFS: Add NFSDBG_STATEChuck Lever1-0/+1
fs/nfs/nfs4state.c does not yet have any dprintk() call sites, and I'm about to introduce some. We will need a new flag for enabling them. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-21Merge branch 'bugfixes' into nfs-for-nextTrond Myklebust21-36/+128
2012-05-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds3-14/+19
Pull networking fixes from David S. Miller: 1) Since we do RCU lookups on ipv4 FIB entries, we have to test if the entry is dead before returning it to our caller. 2) openvswitch locking and packet validation fixes from Ansis Atteka, Jesse Gross, and Pravin B Shelar. 3) Fix PM resume locking in IGB driver, from Benjamin Poirier. 4) Fix VLAN header handling in vhost-net and macvtap, from Basil Gor. 5) Revert a bogus network namespace isolation change that was causing regressions on S390 networking devices. 6) If bonding decides to process and handle a LACPDU frame, we shouldn't bump the rx_dropped counter. From Jiri Bohac. 7) Fix mis-calculation of available TX space in r8169 driver when doing TSO, which can lead to crashes and/or hung device. From Julien Ducourthial. 8) SCTP does not validate cached routes properly in all cases, from Nicolas Dichtel. 9) Link status interrupt needs to be handled in ks8851 driver, from Stephen Boyd. 10) Use capable(), not cap_raised(), in connector/userns netlink code. From Eric W. Biederman via Andrew Morton. 11) Fix pktgen OOPS on module unload, from Eric Dumazet. 12) iwlwifi under-estimates SKB truesizes, also from Eric Dumazet. 13) Cure division by zero in SFC driver, from Ben Hutchings. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits) ks8851: Update link status during link change interrupt macvtap: restore vlan header on user read vhost-net: fix handle_rx buffer size bonding: don't increase rx_dropped after processing LACPDUs connector/userns: replace netlink uses of cap_raised() with capable() sctp: check cached dst before using it pktgen: fix crash at module unload Revert "net: maintain namespace isolation between vlan and real device" ehea: fix losing of NEQ events when one event occurred early igb: fix rtnl race in PM resume path ipv4: Do not use dead fib_info entries. r8169: fix unsigned int wraparound with TSO sfc: Fix division by zero when using one RX channel and no SR-IOV openvswitch: Validation of IPv6 set port action uses IPv4 header net: compare_ether_addr[_64bits]() has no ordering cdc_ether: Ignore bogus union descriptor for RNDIS devices bnx2x: bug fix when loading after SAN boot e1000: Silence sparse warnings by correcting type igb, ixgbe: netdev_tx_reset_queue incorrectly called from tx init path openvswitch: Release rtnl_lock if ovs_vport_cmd_build_info() failed. ...
2012-05-10sctp: check cached dst before using itNicolas Dichtel1-0/+13
dst_check() will take care of SA (and obsolete field), hence IPsec rekeying scenario is taken into account. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Vlad Yaseivch <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-10Revert "net: maintain namespace isolation between vlan and real device"David S. Miller1-9/+0
This reverts commit 8a83a00b0735190384a348156837918271034144. It causes regressions for S390 devices, because it does an unconditional DST drop on SKBs for vlans and the QETH device needs the neighbour entry hung off the DST for certain things on transmit. Arnd can't remember exactly why he even needed this change. Conflicts: drivers/net/macvlan.c net/8021q/vlan_dev.c net/core/dev.c Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-09NFS: Clean up - Rename nfs_unlock_request and nfs_unlock_request_dont_releaseTrond Myklebust1-1/+1
Function rename to ensure that the functionality of nfs_unlock_request() mirrors that of nfs_lock_request(). Then let nfs_unlock_and_release_request() do the work of what used to be called nfs_unlock_request()... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Fred Isaman <iisaman@netapp.com>
2012-05-09NFS: Clean up - simplify nfs_lock_request()Trond Myklebust1-12/+2
We only have two places where we need to grab a reference when trying to lock the nfs_page. We're better off making that explicit. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Fred Isaman <iisaman@netapp.com>
2012-05-09NFS: Prevent a deadlock in the new writeback codeTrond Myklebust1-0/+1
We have to unlock the nfs_page before we call nfs_end_page_writeback to avoid races with functions that expect the page to be unlocked when PG_locked and PG_writeback are not set. The problem is that nfs_unlock_request also releases the nfs_page, causing a deadlock if the release of the nfs_open_context triggers an iput() while the PG_writeback flag is still set... The solution is to separate the unlocking and release of the nfs_page, so that we can do the former before nfs_end_page_writeback and the latter after. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Fred Isaman <iisaman@netapp.com>
2012-05-07net: compare_ether_addr[_64bits]() has no orderingJohannes Berg1-5/+6
Neither compare_ether_addr() nor compare_ether_addr_64bits() (as it can fall back to the former) have comparison semantics like memcmp() where the sign of the return value indicates sort order. We had a bug in the wireless code due to a blind memcmp replacement because of this. A cursory look suggests that the wireless bug was the only one due to this semantic difference. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-06Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes form Peter Anvin * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: intel_mid_powerbtn: mark irq as IRQF_NO_SUSPEND arch/x86/platform/geode/net5501.c: change active_low to 0 for LED driver x86, relocs: Remove an unused variable asm-generic: Use __BITS_PER_LONG in statfs.h x86/amd: Re-enable CPU topology extensions in case BIOS has disabled it
2012-05-05Merge branch 'release' of ↵Linus Torvalds1-3/+4
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull an ACPI patch from Len Brown: "It fixes a D3 issue new in 3.4-rc1." By Lin Ming via Len Brown: * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: ACPI: Fix D3hot v D3cold confusion
2012-05-05ACPI: Fix D3hot v D3cold confusionLin Ming1-3/+4
Before this patch, ACPI_STATE_D3 incorrectly referenced D3hot in some places, but D3cold in other places. After this patch, ACPI_STATE_D3 always means ACPI_STATE_D3_COLD; and all references to D3hot use ACPI_STATE_D3_HOT. ACPI's _PR3 method is used to enter both D3hot and D3cold states. What distinguishes D3hot from D3cold is the presence _PR3 (Power Resources for D3hot) If these resources are all ON, then the state is D3hot. If _PR3 is not present, or all _PR0 resources for the devices are OFF, then the state is D3cold. This patch applies after Linux-3.4-rc1. A future syntax cleanup may remove ACPI_STATE_D3 to emphasize that it always means ACPI_STATE_D3_COLD. Signed-off-by: Lin Ming <ming.m.lin@intel.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Reviewed-by: Aaron Lu <aaron.lu@amd.com> Signed-off-by: Len Brown <len.brown@intel.com>
2012-05-04seqlock: add 'raw_seqcount_begin()' functionLinus Torvalds1-0/+21
The normal read_seqcount_begin() function will wait for any current writers to exit their critical region by looping until the sequence count is even. That "wait for sequence count to stabilize" is the right thing to do if the read-locker will just retry the whole operation on contention: no point in doing a potentially expensive reader sequence if we know at the beginning that we'll just end up re-doing it all. HOWEVER. Some users don't actually retry the operation, but instead will abort and do the operation with proper locking. So the sequence count case may be the optimistic quick case, but in the presense of writers you may want to do full locking in order to guarantee forward progress. The prime example of this would be the RCU name lookup. And in that case, you may well be better off without the "retry early", and are in a rush to instead get to the failure handling. Thus this "raw" interface that just returns the sequence number without testing it - it just forces the low bit to zero so that read_seqcount_retry() will always fail such a "active concurrent writer" scenario. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-04Fix __read_seqcount_begin() to use ACCESS_ONCE for sequence value readLinus Torvalds1-1/+1
We really need to use a ACCESS_ONCE() on the sequence value read in __read_seqcount_begin(), because otherwise the compiler might end up reloading the value in between the test and the return of it. As a result, it might end up returning an odd value (which means that a write is in progress). If the reader is then fast enough that that odd value is still the current one when the read_seqcount_retry() is done, we might end up with a "successful" read sequence, even despite the concurrent write being active. In practice this probably never really happens - there just isn't anything else going on around the read of the sequence count, and the common case is that we end up having a read barrier immediately afterwards. So the code sequence in which gcc might decide to reaload from memory is small, and there's no reason to believe it would ever actually do the reload. But if the compiler ever were to decide to do so, it would be incredibly annoying to debug. Let's just make sure. Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds6-6/+19
Pull networking fixes from David Miller: 1) Transfer padding was wrong for full-speed USB in ASIX driver, fix from Ingo van Lil. 2) Propagate the negative packet offset fix into the PowerPC BPF JIT. From Jan Seiffert. 3) dl2k driver's private ioctls were letting unprivileged tasks make MII writes and other ugly bits like that. Fix from Jeff Mahoney. 4) Fix TX VLAN and RX packet drops in ucc_geth, from Joakim Tjernlund. 5) OOPS and network namespace fixes in IPVS from Hans Schillstrom and Julian Anastasov. 6) Fix races and sleeping in locked context bugs in drop_monitor, from Neil Horman. 7) Fix link status indication in smsc95xx driver, from Paolo Pisati. 8) Fix bridge netfilter OOPS, from Peter Huang. 9) L2TP sendmsg can return on error conditions with the socket lock held, oops. Fix from Sasha Levin. 10) udp_diag should return meaningful values for socket memory usage, from Shan Wei. 11) Eric Dumazet is so awesome he gets his own section: Socket memory cgroup code (I never should have applied those patches, grumble...) made erroneous changes to sk_sockets_allocated_read_positive(). It was changed to use percpu_counter_sum_positive (which requires BH disabling) instead of percpu_counter_read_positive (which does not). Revert back to avoid crashes and lockdep warnings. Adjust the default tcp_adv_win_scale and tcp_rmem[2] values to fix throughput regressions. This is necessary as a result of our more precise skb->truesize tracking. Fix SKB leak in netem packet scheduler. 12) New device IDs for various bluetooth devices, from Manoj Iyer, AceLan Kao, and Steven Harms. 13) Fix command completion race in ipw2200, from Stanislav Yakovlev. 14) Fix rtlwifi oops on unload, from Larry Finger. 15) Fix hard_mtu when adjusting hard_header_len in smsc95xx driver. From Stephane Fillod. 16) ehea driver registers it's IRQ before all the necessary state is setup, resulting in crashes. Fix from Thadeu Lima de Souza Cascardo. 17) Fix PHY connection failures in davinci_emac driver, from Anatolij Gustschin. 18) Missing break; in switch statement in bluetooth's hci_cmd_complete_evt(). Fix from Szymon Janc. 19) Fix queue programming in iwlwifi, from Johannes Berg. 20) Interrupt throttling defaults not being actually programmed into the hardware, fix from Jeff Kirsher and Ying Cai. 21) TLAN driver SKB encoding in descriptor busted on 64-bit, fix from Benjamin Poirier. 22) Fix blind status block RX producer pointer deref in TG3 driver, from Matt Carlson. 23) Promisc and multicast are busted on ehea, fixes from Thadeu Lima de Souza Cascardo. 24) Fix crashes in 6lowpan, from Alexander Smirnov. 25) tcp_complete_cwr() needs to be careful to not rewind the CWND to ssthresh if ssthresh has the "infinite" value. Fix from Yuchung Cheng. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (81 commits) sungem: Fix WakeOnLan tcp: change tcp_adv_win_scale and tcp_rmem[2] net: l2tp: unlock socket lock before returning from l2tp_ip_sendmsg drop_monitor: prevent init path from scheduling on the wrong cpu usbnet: fix failure handling in usbnet_probe usbnet: fix leak of transfer buffer of dev->interrupt ucc_geth: Add 16 bytes to max TX frame for VLANs net: ucc_geth, increase no. of HW RX descriptors netem: fix possible skb leak sky2: fix receive length error in mixed non-VLAN/VLAN traffic sky2: propogate rx hash when packet is copied net: fix two typos in skbuff.h cxgb3: Don't call cxgb_vlan_mode until q locks are initialized ixgbe: fix calling skb_put on nonlinear skb assertion bug ixgbe: Fix a memory leak in IEEE DCB igbvf: fix the bug when initializing the igbvf smsc75xx: enable mac to detect speed/duplex from phy smsc75xx: declare smsc75xx's MII as GMII capable smsc75xx: fix phy interrupt acknowledge smsc75xx: fix phy init reset loop ...
2012-05-01NFS: Adapt readdirplus to application usage patternsTrond Myklebust1-5/+0
While the use of READDIRPLUS is significantly more efficient than READDIR followed by many LOOKUP calls, it is still less efficient than just READDIR if the attributes are not required. This patch tracks when lookups are attempted on the directory, and uses that information to selectively disable READDIRPLUS on that directory. The first 'readdir' call is always served using READDIRPLUS. Subsequent calls only use READDIRPLUS if there was a successful lookup or revalidation on a child in the mean time. Credit for the original idea should go to Neil Brown. See: http://www.spinics.net/lists/linux-nfs/msg19996.html However, the implementation in this patch differs from Neil's in that it focuses on tracking lookups rather than calls to stat(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Neil Brown <neilb@suse.de>
2012-05-01NFSv4: Simplify the NFSv4 REMOVE, LINK and RENAME compoundsTrond Myklebust1-2/+0
Get rid of the post-op GETATTR on the directory in order to reduce the amount of processing done on the server. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-01NFSv4: Simplify the NFSv4 CREATE compoundTrond Myklebust1-1/+0
Get rid of the post-op GETATTR on the directory in order to reduce the amount of processing done on the server. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-01NFSv4: Simplify the NFSv4 OPEN compoundTrond Myklebust1-2/+0
Get rid of the post-op GETATTR on the directory in order to reduce the amount of processing done on the server. The cost is that if we later need to stat() the directory, then we know that the ctime and mtime are likely to be invalid. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-01NFSv2/v3: Simulate the change attributeTrond Myklebust1-3/+3
Use the ctime to simulate a change attribute for NFSv2 and NFSv3. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-01Merge branch 'master' of ↵John W. Linville1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2012-05-01net: fix two typos in skbuff.hEric Dumazet1-2/+2
fix kernel doc typos in function names Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-30NFS: Fix a compile issue when CONFIG_NFS_V4_1 is undefinedTrond Myklebust1-0/+5
struct nfs_direct_req can't compile when struct pnfs_ds_commit_info is undefined. Reported-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Fred Isaman <iisaman@netapp.com>
2012-04-30Merge tag 'scsi-fixes' of ↵Linus Torvalds3-7/+40
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "This is a set of SAS and SATA fixes; there are one or two longstanding bug fixes, but most of this is regression fixes." * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: [SCSI] libfc: update mfs boundry checking [SCSI] Revert "[SCSI] libsas: fix sas port naming" [SCSI] libsas: fix false positive 'device attached' conditions [SCSI] libsas, libata: fix start of life for a sas ata_port [SCSI] libsas: fix ata_eh clobbering ex_phys via smp_ata_check_ready [SCSI] libsas: unify domain_device sas_rphy lifetimes [SCSI] libsas: fix sas_get_port_device regression [SCSI] libsas: fix sas_find_bcast_phy() in the presence of 'vacant' phys [SCSI] libsas: introduce sas_work to fix sas_drain_work vs sas_queue_work [SCSI] libata: Pass correct DMA device to scsi host [SCSI] scsi_lib: use correct DMA device in __scsi_alloc_queue
2012-04-30efi: Add new variable attributesMatthew Garrett1-1/+12
More recent versions of the UEFI spec have added new attributes for variables. Add them. Signed-off-by: Matthew Garrett <mjg@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-30asm-generic: Use __BITS_PER_LONG in statfs.hH. Peter Anvin1-1/+1
<asm-generic/statfs.h> is exported to userspace, so using BITS_PER_LONG is invalid. We need to use __BITS_PER_LONG instead. This is kernel bugzilla 43165. Reported-by: H.J. Lu <hjl.tools@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1335465916-16965-1-git-send-email-hpa@linux.intel.com Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: <stable@vger.kernel.org>
2012-04-30net: fix sk_sockets_allocated_read_positiveEric Dumazet1-2/+2
Denys Fedoryshchenko reported frequent crashes on a proxy server and kindly provided a lockdep report that explains it all : [ 762.903868] [ 762.903880] ================================= [ 762.903890] [ INFO: inconsistent lock state ] [ 762.903903] 3.3.4-build-0061 #8 Not tainted [ 762.904133] --------------------------------- [ 762.904344] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. [ 762.904542] squid/1603 [HC0[0]:SC0[0]:HE1:SE1] takes: [ 762.904542] (key#3){+.?...}, at: [<c0232cc4>] __percpu_counter_sum+0xd/0x58 [ 762.904542] {IN-SOFTIRQ-W} state was registered at: [ 762.904542] [<c0158b84>] __lock_acquire+0x284/0xc26 [ 762.904542] [<c01598e8>] lock_acquire+0x71/0x85 [ 762.904542] [<c0349765>] _raw_spin_lock+0x33/0x40 [ 762.904542] [<c0232c93>] __percpu_counter_add+0x58/0x7c [ 762.904542] [<c02cfde1>] sk_clone_lock+0x1e5/0x200 [ 762.904542] [<c0303ee4>] inet_csk_clone_lock+0xe/0x78 [ 762.904542] [<c0315778>] tcp_create_openreq_child+0x1b/0x404 [ 762.904542] [<c031339c>] tcp_v4_syn_recv_sock+0x32/0x1c1 [ 762.904542] [<c031615a>] tcp_check_req+0x1fd/0x2d7 [ 762.904542] [<c0313f77>] tcp_v4_do_rcv+0xab/0x194 [ 762.904542] [<c03153bb>] tcp_v4_rcv+0x3b3/0x5cc [ 762.904542] [<c02fc0c4>] ip_local_deliver_finish+0x13a/0x1e9 [ 762.904542] [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d [ 762.904542] [<c02fc652>] ip_local_deliver+0x41/0x45 [ 762.904542] [<c02fc4d1>] ip_rcv_finish+0x31a/0x33c [ 762.904542] [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d [ 762.904542] [<c02fc857>] ip_rcv+0x201/0x23e [ 762.904542] [<c02daa3a>] __netif_receive_skb+0x319/0x368 [ 762.904542] [<c02dac07>] netif_receive_skb+0x4e/0x7d [ 762.904542] [<c02dacf6>] napi_skb_finish+0x1e/0x34 [ 762.904542] [<c02db122>] napi_gro_receive+0x20/0x24 [ 762.904542] [<f85d1743>] e1000_receive_skb+0x3f/0x45 [e1000e] [ 762.904542] [<f85d3464>] e1000_clean_rx_irq+0x1f9/0x284 [e1000e] [ 762.904542] [<f85d3926>] e1000_clean+0x62/0x1f4 [e1000e] [ 762.904542] [<c02db228>] net_rx_action+0x90/0x160 [ 762.904542] [<c012a445>] __do_softirq+0x7b/0x118 [ 762.904542] irq event stamp: 156915469 [ 762.904542] hardirqs last enabled at (156915469): [<c019b4f4>] __slab_alloc.clone.58.clone.63+0xc4/0x2de [ 762.904542] hardirqs last disabled at (156915468): [<c019b452>] __slab_alloc.clone.58.clone.63+0x22/0x2de [ 762.904542] softirqs last enabled at (156915466): [<c02ce677>] lock_sock_nested+0x64/0x6c [ 762.904542] softirqs last disabled at (156915464): [<c0349914>] _raw_spin_lock_bh+0xe/0x45 [ 762.904542] [ 762.904542] other info that might help us debug this: [ 762.904542] Possible unsafe locking scenario: [ 762.904542] [ 762.904542] CPU0 [ 762.904542] ---- [ 762.904542] lock(key#3); [ 762.904542] <Interrupt> [ 762.904542] lock(key#3); [ 762.904542] [ 762.904542] *** DEADLOCK *** [ 762.904542] [ 762.904542] 1 lock held by squid/1603: [ 762.904542] #0: (sk_lock-AF_INET){+.+.+.}, at: [<c03055c0>] lock_sock+0xa/0xc [ 762.904542] [ 762.904542] stack backtrace: [ 762.904542] Pid: 1603, comm: squid Not tainted 3.3.4-build-0061 #8 [ 762.904542] Call Trace: [ 762.904542] [<c0347b73>] ? printk+0x18/0x1d [ 762.904542] [<c015873a>] valid_state+0x1f6/0x201 [ 762.904542] [<c0158816>] mark_lock+0xd1/0x1bb [ 762.904542] [<c015876b>] ? mark_lock+0x26/0x1bb [ 762.904542] [<c015805d>] ? check_usage_forwards+0x77/0x77 [ 762.904542] [<c0158bf8>] __lock_acquire+0x2f8/0xc26 [ 762.904542] [<c0159b8e>] ? mark_held_locks+0x5d/0x7b [ 762.904542] [<c0159cf6>] ? trace_hardirqs_on+0xb/0xd [ 762.904542] [<c0158dd4>] ? __lock_acquire+0x4d4/0xc26 [ 762.904542] [<c01598e8>] lock_acquire+0x71/0x85 [ 762.904542] [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58 [ 762.904542] [<c0349765>] _raw_spin_lock+0x33/0x40 [ 762.904542] [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58 [ 762.904542] [<c0232cc4>] __percpu_counter_sum+0xd/0x58 [ 762.904542] [<c02cebc4>] __sk_mem_schedule+0xdd/0x1c7 [ 762.904542] [<c02d178d>] ? __alloc_skb+0x76/0x100 [ 762.904542] [<c0305e8e>] sk_wmem_schedule+0x21/0x2d [ 762.904542] [<c0306370>] sk_stream_alloc_skb+0x42/0xaa [ 762.904542] [<c0306567>] tcp_sendmsg+0x18f/0x68b [ 762.904542] [<c031f3dc>] ? ip_fast_csum+0x30/0x30 [ 762.904542] [<c0320193>] inet_sendmsg+0x53/0x5a [ 762.904542] [<c02cb633>] sock_aio_write+0xd2/0xda [ 762.904542] [<c015876b>] ? mark_lock+0x26/0x1bb [ 762.904542] [<c01a1017>] do_sync_write+0x9f/0xd9 [ 762.904542] [<c01a2111>] ? file_free_rcu+0x2f/0x2f [ 762.904542] [<c01a17a1>] vfs_write+0x8f/0xab [ 762.904542] [<c01a284d>] ? fget_light+0x75/0x7c [ 762.904542] [<c01a1900>] sys_write+0x3d/0x5e [ 762.904542] [<c0349ec9>] syscall_call+0x7/0xb [ 762.904542] [<c0340000>] ? rp_sidt+0x41/0x83 Bug is that sk_sockets_allocated_read_positive() calls percpu_counter_sum_positive() without BH being disabled. This bug was added in commit 180d8cd942ce33 (foundations of per-cgroup memory pressure controlling.), since previous code was using percpu_counter_read_positive() which is IRQ safe. In __sk_mem_schedule() we dont need the precise count of allocated sockets and can revert to previous behavior. Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Sined-off-by: Eric Dumazet <edumazet@google.com> Cc: Glauber Costa <glommer@parallels.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-30Merge branch 'master' of git://1984.lsi.us.es/netDavid S. Miller1-1/+3
2012-04-30ipvs: kernel oops - do_ip_vs_get_ctlHans Schillstrom1-0/+2
Change order of init so netns init is ready when register ioctl and netlink. Ver2 Whitespace fixes and __init added. Reported-by: "Ryan O'Hara" <rohara@redhat.com> Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Simon Horman <horms@verge.net.au>
2012-04-30ipvs: take care of return value from protocol init_netnsHans Schillstrom1-1/+1
ip_vs_create_timeout_table() can return NULL All functions protocol init_netns is affected of this patch. Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2012-04-29pipes: add a "packetized pipe" mode for writingLinus Torvalds1-0/+1
The actual internal pipe implementation is already really about individual packets (called "pipe buffers"), and this simply exposes that as a special packetized mode. When we are in the packetized mode (marked by O_DIRECT as suggested by Alan Cox), a write() on a pipe will not merge the new data with previous writes, so each write will get a pipe buffer of its own. The pipe buffer is then marked with the PIPE_BUF_FLAG_PACKET flag, which in turn will tell the reader side to break the read at that boundary (and throw away any partial packet contents that do not fit in the read buffer). End result: as long as you do writes less than PIPE_BUF in size (so that the pipe doesn't have to split them up), you can now treat the pipe as a packet interface, where each read() system call will read one packet at a time. You can just use a sufficiently big read buffer (PIPE_BUF is sufficient, since bigger than that doesn't guarantee atomicity anyway), and the return value of the read() will naturally give you the size of the packet. NOTE! We do not support zero-sized packets, and zero-sized reads and writes to a pipe continue to be no-ops. Also note that big packets will currently be split at write time, but that the size at which that happens is not really specified (except that it's bigger than PIPE_BUF). Currently that limit is the system page size, but we might want to explicitly support bigger packets some day. The main user for this is going to be the autofs packet interface, allowing us to stop having to care so deeply about exact packet sizes (which have had bugs with 32/64-bit compatibility modes). But user space can create packetized pipes with "pipe2(fd, O_DIRECT)", which will fail with an EINVAL on kernels that do not support this interface. Tested-by: Michael Tokarev <mjt@tls.msk.ru> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Miller <davem@davemloft.net> Cc: Ian Kent <raven@themaw.net> Cc: Thomas Meyer <thomas@m3y3r.de> Cc: stable@kernel.org # needed for systemd/autofs interaction fix Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-29Merge tag 'usb-3.4-rc5' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg Kroah-Hartman: "Here are a number of small USB fixes for 3.4-rc5. Nothing major, as before, some USB gadget fixes. There's a crash fix for a number of ASUS laptops on resume that had been reported by a number of different people. We think the fix might also pertain to other machines, as this was a BIOS bug, and they seem to travel to different models and manufacturers quite easily. Other than that, some other reported problems fixed as well." * tag 'usb-3.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: gadget: udc-core: fix incompatibility with dummy-hcd usb: gadget: udc-core: fix wrong call order USB: cdc-wdm: fix race leading leading to memory corruption USB: EHCI: fix crash during suspend on ASUS computers usb gadget: uvc: uvc_request_data::length field must be signed usb: gadget: dummy: do not call pullup() on udc_stop() usb: musb: davinci.c: add missing unregister usb: musb: drop __deprecated flag USB: gadget: storage gadgets send wrong error code for unknown commands usb: otg: gpio_vbus: Add otg transceiver events and notifiers
2012-04-28Merge tag 'fixes-for-linus' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Olof Johansson: "Nothing controversial, just another batch of fixes: - Samsung/exynos fixes for more merge window fallout: build errors and warnings mostly, but also some clock/device setup issues on exynos4/5 - PXA bug and warning fixes related to gpio and pinmux - IRQ domain conversion bugfixes for U300 and MSM - A regulator setup fix for U300" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: PXA2xx: MFP: fix potential direction bug ARM: PXA2xx: MFP: fix bug with MFP_LPM_KEEP_OUTPUT arm/sa1100: fix sa1100-rtc memory resource ARM: pxa: fix gpio wakeup setting ARM: SAMSUNG: add missing MMC_CAP2_BROKEN_VOLTAGE capability ARM: EXYNOS: Fix compilation error when CONFIG_OF is not defined ARM: EXYNOS: Fix resource on dev-dwmci.c ARM: S3C24XX: Fix build warning for S3C2410_PM ARM: mini2440_defconfig: Fix build error ARM: msm: Fix gic irqdomain support ARM: EXYNOS: Fix incorrect initialization of GIC ARM: EXYNOS: use 'exynos4-sdhci' as device name for sdhci controllers ARM: u300: bump all IRQ numbers by one ARM: ux300: Fix unimplementable regulation constraints
2012-04-27Merge tag 'spi-for-linus' of git://git.secretlab.ca/git/linux-2.6Linus Torvalds1-1/+1
Pull misc SPI device driver bug fixes from Grant Likely. * tag 'spi-for-linus' of git://git.secretlab.ca/git/linux-2.6: spi/spi-bfin5xx: Fix flush of last bit after each spi transfer spi/spi-bfin5xx: fix reversed if condition in interrupt mode spi/spi_bfin_sport: drop bits_per_word from client data spi/bfin_spi: drop bits_per_word from client data spi/spi-bfin-sport: move word length setup to transfer handler spi/bfin5xx: rename config macro name for bfin5xx spi controller driver spi/pl022: Allow request for higher frequency than maximum possible spi/bcm63xx: set master driver mode_bits. spi/bcm63xx: don't use the stopping state spi/bcm63xx: convert to the pump message infrastructure spi/spi-ep93xx.c: use dma_transfer_direction instead of dma_data_direction spi: fix spi.h kernel-doc warning spi/pl022: Fix calculate_effective_freq() spi/pl022: Fix range checking for bits per word
2012-04-27Merge branch 'for-upstream' of ↵John W. Linville1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
2012-04-27NFS: Remove extra rpc_clnt argument to proc_lookupBryan Schumaker1-1/+1
Now that I'm doing secinfo automatically in the v4 code this extra argument isn't needed. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27NFS: Create a submount rpc_opBryan Schumaker1-0/+2
This simplifies the code for v2 and v3 and gives v4 a chance to decide on referrals without needing to modify the generic client. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27NFS: Remove secinfo knowledge out of the generic clientBryan Schumaker1-1/+0
And also remove the unneeded rpc_op. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27NFS: add dreq to nfs_commit_infoFred Isaman1-0/+1
Need this to pass into nfs_commitdata_init, in order to keep data->dreq accurate. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27NFS: create nfs_commit_completion_opsFred Isaman1-0/+9
Factors out the code that needs to change when directio starts using these code paths. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27NFS: create struct nfs_commit_infoFred Isaman2-4/+28
It is COMMIT that is handled the most differently between the paged and direct paths. Create a structure that encapsulates everything either path needs to know about the commit state. We could use void to hide some of the layout driver stuff, but Trond suggests pulling it out to ensure type checking, given the huge changes being made, and the fact that it doesn't interfere with other drivers. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>