summaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2007-11-29proc: fix NULL ->i_fop oopsAlexey Dobriyan3-40/+1
proc_kill_inodes() can clear ->i_fop in the middle of vfs_readdir resulting in NULL dereference during "file->f_op->readdir(file, buf, filler)". The solution is to remove proc_kill_inodes() completely: a) we don't have tricky modules implementing their tricky readdir hooks which could keeping this revoke from hell. b) In a situation when module is gone but PDE still alive, standard readdir will return only "." and "..", because pde->next was cleared by remove_proc_entry(). c) the race proc_kill_inode() destined to prevent is not completely fixed, just race window made smaller, because vfs_readdir() is run without sb_lock held and without file_list_lock held. Effectively, ->i_fop is cleared at random moment, which can't fix properly anything. BUG: unable to handle kernel NULL pointer dereference at virtual address 00000018 printing eip: c1061205 *pdpt = 0000000005b22001 *pde = 0000000000000000 Oops: 0000 [#1] PREEMPT SMP Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw sr_mod k8temp cdrom hwmon amd_rng Pid: 2033, comm: find Not tainted (2.6.24-rc1-b1d08ac064268d0ae2281e98bf5e82627e0f0c56 #2) EIP: 0060:[<c1061205>] EFLAGS: 00010246 CPU: 0 EIP is at vfs_readdir+0x47/0x74 EAX: c6b6a780 EBX: 00000000 ECX: c1061040 EDX: c5decf94 ESI: c6b6a780 EDI: fffffffe EBP: c9797c54 ESP: c5decf78 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process find (pid: 2033, ti=c5dec000 task=c64bba90 task.ti=c5dec000) Stack: c5decf94 c1061040 fffffff7 0805ffbc 00000000 c6b6a780 c1061295 0805ffbc 00000000 00000400 00000000 00000004 0805ffbc 4588eff4 c5dec000 c10026ba 00000004 0805ffbc 00000400 0805ffbc 4588eff4 bfdc6c70 000000dc 0000007b Call Trace: [<c1061040>] filldir64+0x0/0xc5 [<c1061295>] sys_getdents64+0x63/0xa5 [<c10026ba>] sysenter_past_esp+0x5f/0x85 ======================= Code: 49 83 78 18 00 74 43 8d 6b 74 bf fe ff ff ff 89 e8 e8 b8 c0 12 00 f6 83 2c 01 00 00 10 75 22 8b 5e 10 8b 4c 24 04 89 f0 8b 14 24 <ff> 53 18 f6 46 1a 04 89 c7 75 0b 8b 56 0c 8b 46 08 e8 c8 66 00 EIP: [<c1061205>] vfs_readdir+0x47/0x74 SS:ESP 0068:c5decf78 hch: "Nice, getting rid of this is a very good step formwards. Unfortunately we have another copy of this junk in security/selinux/selinuxfs.c:sel_remove_entries() which would need the same treatment." Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Acked-by: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6Linus Torvalds1-1/+5
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6: sysfs: fix off-by-one error in fill_read_buffer() kobject: two typo fixes UIO: add UIO documentation target to DocBook Makefile UIO: fix up the UIO documentation create /sys/.../power when CONFIG_PM is set allow LEGACY_PTYS to be set to 0
2007-11-28sysfs: fix off-by-one error in fill_read_buffer()Miao Xie1-1/+5
I found that there is a off-by-one problem in the following code. Version: 2.6.24-rc2 File: fs/sysfs/file.c:118-122 Function: fill_read_buffer -------------------------------------------------------------------- count = ops->show(kobj, attr_sd->s_attr.attr, buffer->page); sysfs_put_active_two(attr_sd); BUG_ON(count > (ssize_t)PAGE_SIZE); -------------------------------------------------------------------- Because according to the specification of the sysfs and the implement of the show methods, the show methods return the number of bytes which would be generated for the given input, excluding the trailing null.So if the return value of the show methods equals PAGE_SIZE - 1, the buffer is full in fact. And if the return value equals PAGE_SIZE, the resulting string was already truncated,or buffer overflow occurred. This patch fixes an off-by-one error in fill_read_buffer. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Tejun Heo <teheo@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-28vfs: coredumping fixIngo Molnar1-0/+6
fix: http://bugzilla.kernel.org/show_bug.cgi?id=3043 only allow coredumping to the same uid that the coredumping task runs under. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Alan Cox <alan@redhat.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Al Viro <viro@ftp.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-27ocfs2: reverse inline-data truncate argsMark Fasheh1-4/+15
ocfs2_truncate() and ocfs2_remove_inode_range() had reversed their "set i_size" arguments to ocfs2_truncate_inline(). Fix things so that truncate sets i_size, and punching a hole ignores it. This exposed a problem where punching a hole in an inline-data file wasn't updating the page cache, so fix that too. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27ocfs2: Fix comparison in ocfs2_size_fits_inline_data()Mark Fasheh1-1/+1
This was causing us to prematurely push out inline data by one byte. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27ocfs2: Remove bug statement in ocfs2_dentry_iput()Mark Fasheh1-4/+16
The existing bug statement didn't take into account unhashed dentries which might not have a cluster lock on them. This could happen if a node exporting the file system via NFS is rebooted, re-exported to nfs clients and then unmounted. It's fine in this case to not have a dentry cluster lock. Just remove the bug statement and replace it with an error print, which does the proper checks. Though we want to know if something has happened which might have prevented a cluster lock from being created, it's definitely not necessary to panic the machine for this. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27[PATCH] ocfs2: Remove expensive bitmap scanningJan Kara2-2/+12
Enable expensive bitmap scanning only if DEBUG option is enabled. The bitmap scanning quite loads the CPU and on my machine the write throughput of dd if=/dev/zero of=/ocfs2/file bs=1M count=500 conv=sync improves from 37 MB/s to 45.4 MB/s in local mode... Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27ocfs2: log valid inode # on bad inodeMark Fasheh1-2/+2
If the inode block isn't valid then we don't want to print the value from that, instead print the block number which was passed in (which should always be correct). Also, turn this into a debug print for now - folks who hit an actual problem always have other logs indicating what the source is. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27ocfs2: Filter -ENOSPC in mlog_errno()Mark Fasheh1-1/+1
It's almost never worth printing in that situation and we keep forgetting to manually filter it out. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27[PATCH] fs/ocfs2: Add missing "space"Joe Perches2-3/+3
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27ocfs2: Reset journal parameters after s_mount_opt updateMark Fasheh1-3/+3
Right now we're just setting them from the existing parameters, not the new ones that a remount specified. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-26Merge git://git.linux-nfs.org/pub/linux/nfs-2.6Linus Torvalds5-100/+138
* git://git.linux-nfs.org/pub/linux/nfs-2.6: NFS: Clean up new multi-segment direct I/O changes NFS: Ensure we return zero if applications attempt to write zero bytes NFS: Support multiple segment iovecs in the NFS direct I/O path NFS: Introduce iovec I/O helpers to fs/nfs/direct.c SUNRPC: Add missing "space" to net/sunrpc/auth_gss.c SUNRPC: make sunrpc/xprtsock.c:xs_setup_{udp,tcp}() static NFS: fs/nfs/dir.c should #include "internal.h" NFS: make nfs_wb_page_priority() static NFS: mount failure causes bad page state SUNRPC: remove NFS/RDMA client's binary sysctls kernel BUG at fs/nfs/namespace.c:108! - can be triggered by bad server sunrpc: rpc_pipe_poll may miss available data in some cases sunrpc: return error if unsupported enctype or cksumtype is encountered sunrpc: gss_pipe_downcall(), don't assume all errors are transient NFS: Fix the ustat() regression
2007-11-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-schedLinus Torvalds1-1/+3
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: sched: bump version of kernel/sched_debug.c sched: fix minimum granularity tunings sched: fix RLIMIT_CPU comment sched: fix kernel/acct.c comment sched: fix prev_stime calculation sched: don't forget to unlock uids_mutex on error paths
2007-11-26NFS: Clean up new multi-segment direct I/O changesChuck Lever1-9/+13
Simplify calling sequence of nfs_direct_{read,write}_schedule(), and rename them to reflect their new role. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26NFS: Ensure we return zero if applications attempt to write zero bytesChuck Lever1-0/+2
A zero byte count direct write request should be a successful no-op, not an error. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26NFS: Support multiple segment iovecs in the NFS direct I/O pathChuck Lever1-44/+23
Allow applications to perform asynchronous scatter-gather direct I/O to NFS files. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26NFS: Introduce iovec I/O helpers to fs/nfs/direct.cChuck Lever1-0/+71
Add helpers that iterate over multi-segment iovecs. These will be used to support multi-segment scatter/gather direct I/O in a later patch. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26NFS: fs/nfs/dir.c should #include "internal.h"Adrian Bunk1-0/+1
Every file should include the headers containing the prototypes for its global functions (in this case nfs_access_cache_shrinker()). Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26NFS: make nfs_wb_page_priority() staticAdrian Bunk1-1/+2
nfs_wb_page_priority() can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26NFS: mount failure causes bad page stateRussell King1-2/+4
While testing a kernel based upon ecd744eec3aa8bbc949ec04ed3fbf7ecb2958a0e (with wrong boot arguments), I got the following bad page state entry while NFS was trying to mount it's rootfs: IP-Config: Complete: device=eth0, addr=192.168.1.101, mask=255.255.255.0, gw=255.255.255.255, host=192.168.1.101, domain=, nis-domain=(none), bootserver=192.168.1.100, rootserver=192.168.1.100, rootpath= Looking up port of RPC 100003/2 on 192.168.1.100 rpcbind: server 192.168.1.100 not responding, timed out Root-NFS: Unable to get nfsd port number from server, using default Looking up port of RPC 100005/1 on 192.168.1.100 rpcbind: server 192.168.1.100 not responding, timed out Root-NFS: Unable to get mountd port number from server, using default mount: server 192.168.1.100 not responding, timed out Root-NFS: Server returned error -5 while mounting /nfs/rootfs/ VFS: Unable to mount root fs via NFS, trying floppy. Bad page state in process 'swapper' page:c02b1260 flags:0x00000400 mapping:00000000 mapcount:0 count:0 Trying to fix it up, but a reboot is needed Backtrace: [<c0023e34>] (dump_stack+0x0/0x14) from [<c0062570>] (bad_page+0x70/0xac) [<c0062500>] (bad_page+0x0/0xac) from [<c0064914>] (free_hot_cold_page+0x80/0x178) [<c0064894>] (free_hot_cold_page+0x0/0x178) from [<c0064a74>] (free_hot_page+0x14/0x18) [<c0064a60>] (free_hot_page+0x0/0x18) from [<c0067078>] (put_page+0xf8/0x154) [<c0066f80>] (put_page+0x0/0x154) from [<c007dbc8>] (kfree+0xc8/0xd0) [<c007db00>] (kfree+0x0/0xd0) from [<c00cbb54>] (nfs_get_sb+0x230/0x710) [<c00cb924>] (nfs_get_sb+0x0/0x710) from [<c0084334>] (vfs_kern_mount+0x58/0xac)[<c00842dc>] (vfs_kern_mount+0x0/0xac) from [<c00843c0>] (do_kern_mount+0x38/0xf4) [<c0084388>] (do_kern_mount+0x0/0xf4) from [<c0099c7c>] (do_mount+0x1e8/0x614) ... This seems to be caused by use of an uninitialised structure due to NULL options being passed to nfs_validate_mount_data(). Ensure that the parsed mount data is always initialised. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> (Trond: added fix for the same bug in nfs4_validate_mount_data()). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26sched: fix prev_stime calculationIngo Molnar1-1/+3
Srivatsa Vaddagiri noticed occasionally incorrect CPU usage values in top and tracked it down to stime going below 0 in task_stime(). Negative values are possible there due to the sampled nature of stime/utime. Fix suggested by Balbir Singh. Signed-off-by: Ingo Molnar <mingo@elte.hu> Tested-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Reviewed-by: Balbir Singh <balbir@linux.vnet.ibm.com>
2007-11-25[CIFS] Fix check after use error in ACL codeSteve French1-6/+7
Spotted by the coverity scanner. CC: Adrian Bunk <bunk@kernel.org> Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-25Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6Steve French25-254/+304
2007-11-20[CIFS] Fix potential data corruption when writing out cached dirty pagesJeff Layton5-30/+58
Fix RedHat bug 329431 The idea here is separate "conscious" from "unconscious" flushes. Conscious flushes are those due to a fsync() or close(). Unconscious ones are flushes that occur as a side effect of some other operation or due to memory pressure. Currently, when an error occurs during an unconscious flush (ENOSPC or EIO), we toss out the page and don't preserve that error to report to the user when a conscious flush occurs. If after the unconscious flush, there are no more dirty pages for the inode, the conscious flush will simply return success even though there were previous errors when writing out pages. This can lead to data corruption. The easiest way to reproduce this is to mount up a CIFS share that's very close to being full or where the user is very close to quota. mv a file to the share that's slightly larger than the quota allows. The writes will all succeed (since they go to pagecache). The mv will do a setattr to set the new file's attributes. This calls filemap_write_and_wait, which will return an error since all of the pages can't be written out. Then later, when the flush and release ops occur, there are no more dirty pages in pagecache for the file and those operations return 0. mv then assumes that the file was written out correctly and deletes the original. CIFS already has a write_behind_rc variable where it stores the results from earlier flushes, but that value is only reported in cifs_close. Since the VFS ignores the return value from the release operation, this isn't helpful. We should be reporting this error during the flush operation. This patch does the following: 1) changes cifs_fsync to use filemap_write_and_wait and cifs_flush and also sync to check its return code. If it returns successful, they then check the value of write_behind_rc to see if an earlier flush had reported any errors. If so, they return that error and clear write_behind_rc. 2) sets write_behind_rc in a few other places where pages are written out as a side effect of other operations and the code waits on them. 3) changes cifs_setattr to only call filemap_write_and_wait for ATTR_SIZE changes. 4) makes cifs_writepages accurately distinguish between EIO and ENOSPC errors when writing out pages. Some simple testing indicates that the patch works as expected and that it fixes the reproduceable known problem. Acked-by: Dave Kleikamp <shaggy@austin.rr.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-20[CIFS] Fix spurious reconnect on 2nd peek from read of SMB lengthPetr Tesarik1-3/+3
When retrying kernel_recvmsg() because of a short read, check returned length against the remaining length, not against total length. This avoids unneeded session reconnects which would otherwise occur when kernel_recvmsg() finally returns zero when asked to read zero bytes. Signed-off-by: Petr Tesarik <ptesarik@suse.cz> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-17kernel BUG at fs/nfs/namespace.c:108! - can be triggered by bad serverNeil Brown1-0/+5
Hi Trond, I have discovered that the BUG_ON in nfs_follow_mountpoint: BUG_ON(IS_ROOT(dentry)); can be triggered by a misbehaving server. What happens is the client does a lookup and discoveres that the named directory has a different fsid, so it initiates a mount. It then performs a GETATTR on the mounted directory and gets a different fsid again (due to a bug in the NFS server). This causes nfs_follow_mountpoint to be called on the newly mounted root, which triggers the BUG_ON. To duplicate this, have a directory which contains some mountpoints, and export that directory with the "crossmnt" flag using nfs-utils 1.1.1 (or 1.1.0 I think) The GETATTR on the root of the mounted filesystem will return the information for the top exportpoint, while a lookup will return the correct information. This difference causes the NFS client to BUG. I think the best way to fix this is to trap this possibility early, so just before completing the mount in the NFS client, check that it isn't going to use nfs_mountpoint_inode_operations. As long as i_op will never change once set (is that true?), this should be adequately safe. The following patch shows a possible approach, and it works for me. i.e. when the NFS server is misbehaving, I get ESTALE on those mountpoints, while when the NFS server is working correctly, I get correct behaviour on the client. NeilBrown Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-17NFS: Fix the ustat() regressionTrond Myklebust1-54/+27
Since 2.6.18, the superblock sb->s_root has been a dummy dentry with a dummy inode. This breaks ustat(), which actually uses sb->s_root in a vfstat() call. Fix this by making the s_root a dummy alias to the directory inode that was used when creating the superblock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-17[CIFS] remove build warningSteve French3-2/+2
CC: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-16[CIFS] Have CIFS_SessSetup build correct SPNEGO SessionSetup requestSteve French4-18/+77
Have CIFS_SessSetup call cifs_get_spnego_key when Kerberos is negotiated. Use the info in the key payload to build a session setup request packet. Also clean up how the request buffer in the function is freed on error. With appropriate user space helper (in samba/source/client). Kerberos support (secure session establishment can be done now via Kerberos, previously users would have to use NTLMv2 instead for more secure session setup). Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-16[CIFS] minor checkpatch cleanupSteve French3-9/+9
Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-16[CIFS] have cifs_get_spnego_key get the hostname from TCP_Server_InfoJeff Layton2-3/+3
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-16[CIFS] add hostname field to TCP_Server_Info structJeff Layton2-0/+37
...and populate it with the hostname portion of the UNC string. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-16[CIFS] clean up error handling in cifs_mountJeff Layton1-58/+30
Move all of the kfree's sprinkled in the middle of the function to the end, and have the code set rc and just goto there on error. Also zero out the password string before freeing it. Looks like this should also fix a potential memory leak of the prepath string if an error occurs near the end of the function. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-16[CIFS] add ver= prefix to upcall format versionSteve French1-6/+11
Acked-by: Jeff Layton <jlayton@redhat.com> Acked-by: Igor Mammedov <niallan@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-14Fix 64KB blocksize in ext3 directoriesJan Kara2-52/+50
With 64KB blocksize, a directory entry can have size 64KB which does not fit into 16 bits we have for entry lenght. So we store 0xffff instead and convert value when read from / written to disk. The patch also converts some places to use ext3_next_entry() when we are changing them anyway. [akpm@linux-foundation.org: coding-style cleanups] Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14smbfs: fix debug buildsJeff Layton4-6/+7
Fix some warnings with SMBFS_DEBUG_* builds. This patch makes it so that builds with -Werror don't fail. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14mark sys_open/sys_read exports unusedArjan van de Ven2-2/+2
sys_open / sys_read were used in the early 1.2 days to load firmware from disk inside drivers. Since 2.0 or so this was deprecated behavior, but several drivers still were using this. Since a few years we have a request_firmware() API that implements this in a nice, consistent way. Only some old ISA sound drivers (pre-ALSA) still straggled along for some time.... however with commit c2b1239a9f22f19c53543b460b24507d0e21ea0c the last user is now gone. This is a good thing, since using sys_open / sys_read etc for firmware is a very buggy to dangerous thing to do; these operations put an fd in the process file descriptor table.... which then can be tampered with from other threads for example. For those who don't want the firmware loader, filp_open()/vfs_read are the better APIs to use, without this security issue. The patch below marks sys_open and sys_read unused now that they're really not used anymore, and for deletion in the 2.6.25 timeframe. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14proc: simplify and correct proc_flush_taskEric W. Biederman1-9/+6
Currently we special case when we have only the initial pid namespace. Unfortunately in doing so the copied case for the other namespaces was broken so we don't properly flush the thread directories :( So this patch removes the unnecessary special case (removing a usage of proc_mnt) and corrects the flushing of the thread directories. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14fuse_file_alloc(): fix NULL dereferencesAdrian Bunk1-2/+3
Fix obvious NULL dereferences spotted by the Coverity checker. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14reiserfs: don't drop PG_dirty when releasing sub-page-sized dirty fileFengguang Wu1-3/+0
This is not a new problem in 2.6.23-git17. 2.6.22/2.6.23 is buggy in the same way. Reiserfs could accumulate dirty sub-page-size files until umount time. They cannot be synced to disk by pdflush routines or explicit `sync' commands. Only `umount' can do the trick. The direct cause is: the dirty page's PG_dirty is wrongly _cleared_. Call trace: [<ffffffff8027e920>] cancel_dirty_page+0xd0/0xf0 [<ffffffff8816d470>] :reiserfs:reiserfs_cut_from_item+0x660/0x710 [<ffffffff8816d791>] :reiserfs:reiserfs_do_truncate+0x271/0x530 [<ffffffff8815872d>] :reiserfs:reiserfs_truncate_file+0xfd/0x3b0 [<ffffffff8815d3d0>] :reiserfs:reiserfs_file_release+0x1e0/0x340 [<ffffffff802a187c>] __fput+0xcc/0x1b0 [<ffffffff802a1ba6>] fput+0x16/0x20 [<ffffffff8029e676>] filp_close+0x56/0x90 [<ffffffff8029fe0d>] sys_close+0xad/0x110 [<ffffffff8020c41e>] system_call+0x7e/0x83 Fix the bug by removing the cancel_dirty_page() call. Tests show that it causes no bad behaviors on various write sizes. === for the patient === Here are more detailed demonstrations of the problem. 1) the page has both PG_dirty(D)/PAGECACHE_TAG_DIRTY(d) after being written to; and then only PAGECACHE_TAG_DIRTY(d) remains after the file is closed. ------------------------------ screen 0 ------------------------------ [T0] root /home/wfg# cat > /test/tiny [T1] hi [T2] root /home/wfg# ------------------------------ screen 1 ------------------------------ [T1] root /home/wfg# echo /test/tiny > /proc/filecache [T1] root /home/wfg# cat /proc/filecache # file /test/tiny # flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback # idx len state refcnt 0 1 ___UD__Bd_ 2 [T2] root /home/wfg# cat /proc/filecache # file /test/tiny # flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback # idx len state refcnt 0 1 ___U___Bd_ 2 2) note the non-zero 'cancelled_write_bytes' after /tmp/hi is copied. ------------------------------ screen 0 ------------------------------ [T0] root /home/wfg# echo hi > /tmp/hi [T1] root /home/wfg# cp /tmp/hi /dev/stdin /test [T2] hi [T3] root /home/wfg# ------------------------------ screen 1 ------------------------------ [T1] root /proc/4397# cd /proc/`pidof cp` [T1] root /proc/4713# cat io rchar: 8396 wchar: 3 syscr: 20 syscw: 1 read_bytes: 0 write_bytes: 20480 cancelled_write_bytes: 4096 [T2] root /proc/4713# cat io rchar: 8399 wchar: 6 syscr: 21 syscw: 2 read_bytes: 0 write_bytes: 24576 cancelled_write_bytes: 4096 //Question: the 'write_bytes' is a bit more than expected ;-) Tested-by: Maxim Levitsky <maximlevitsky@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Reviewed-by: Chris Mason <chris.mason@oracle.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14Fixes to the BFS filesystem driverDmitri Vorobiev4-155/+184
I found a few bugs in the BFS driver. Detailed description of the bugs as well as the steps to reproduce the errors are given in the kernel bugzilla. Please follow these links for more information: http://bugzilla.kernel.org/show_bug.cgi?id=9363 http://bugzilla.kernel.org/show_bug.cgi?id=9364 http://bugzilla.kernel.org/show_bug.cgi?id=9365 http://bugzilla.kernel.org/show_bug.cgi?id=9366 This patch fixes the bugs described above. Besides, the patch introduces coding style changes to make the BFS driver conform to the requirements specified for Linux kernel code. Finally, I made a few cosmetic changes such as removal of trivial debug output. Also, the patch removes the fields `si_lf_ioff' and `si_lf_sblk' of the in-core superblock structure. These fields are initialized but never actually used. If you are wondering why I need BFS, here is the answer: I am using this driver in the context of Linux kernel classes I am teaching in the Moscow State University and in the International Institute of Information Technology in Pune, India. Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com> Cc: Tigran Aivazian <tigran@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14hugetlb: allow bulk updating in hugetlb_*_quota()Adam Litke1-5/+5
Add a second parameter 'delta' to hugetlb_get_quota and hugetlb_put_quota to allow bulk updating of the sbinfo->free_blocks counter. This will be used by the next patch in the series. Signed-off-by: Adam Litke <agl@us.ibm.com> Cc: Ken Chen <kenchen@google.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: David Gibson <hermes@gibson.dropbear.id.au> Cc: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14hugetlb: fix quota management for private mappingsAdam Litke1-1/+0
The hugetlbfs quota management system was never taught to handle MAP_PRIVATE mappings when that support was added. Currently, quota is debited at page instantiation and credited at file truncation. This approach works correctly for shared pages but is incomplete for private pages. In addition to hugetlb_no_page(), private pages can be instantiated by hugetlb_cow(); but this function does not respect quotas. Private huge pages are treated very much like normal, anonymous pages. They are not "backed" by the hugetlbfs file and are not stored in the mapping's radix tree. This means that private pages are invisible to truncate_hugepages() so that function will not credit the quota. This patch (based on a prototype provided by Ken Chen) moves quota crediting for all pages into free_huge_page(). page->private is used to store a pointer to the mapping to which this page belongs. This is used to credit quota on the appropriate hugetlbfs instance. Signed-off-by: Adam Litke <agl@us.ibm.com> Cc: Ken Chen <kenchen@google.com> Cc: Ken Chen <kenchen@google.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: David Gibson <hermes@gibson.dropbear.id.au> Cc: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14proc: fix proc_kill_inodes to kill dentries on all proc superblocksEric W. Biederman3-18/+25
It appears we overlooked support for removing generic proc files when we added support for multiple proc super blocks. Handle that now. [akpm@linux-foundation.org: coding-style cleanups] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Cc: Alexey Dobriyan <adobriyan@sw.ru> Acked-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14Forbid user to change file flags on quota filesJan Kara5-0/+21
Forbid user from changing file flags on quota files. User has no bussiness in playing with these flags when quota is on. Furthermore there is a remote possibility of deadlock due to a lock inversion between quota file's i_mutex and transaction's start (i_mutex for quota file is locked only when trasaction is started in quota operations) in ext3 and ext4. Signed-off-by: Jan Kara <jack@suse.cz> Cc: LIOU Payphone <lioupayphone@gmail.com> Cc: <linux-ext4@vger.kernel.org> Acked-by: Dave Kleikamp <shaggy@austin.ibm.com> Cc: <reiserfs-dev@namesys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14eCryptfs: cast page->index to loff_t instead of off_tMichael Halcrow1-1/+1
page->index should be cast to loff_t instead of off_t. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Reported-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-13[CIFS] Fix buffer overflow if server sends corrupt response to smallSteve French7-96/+133
request In SendReceive() function in transport.c - it memcpy's message payload into a buffer passed via out_buf param. The function assumes that all buffers are of size (CIFSMaxBufSize + MAX_CIFS_HDR_SIZE) , unfortunately it is also called with smaller (MAX_CIFS_SMALL_BUFFER_SIZE) buffers. There are eight callers (SMB worker functions) which are primarily affected by this change: TreeDisconnect, uLogoff, Close, findClose, SetFileSize, SetFileTimes, Lock and PosixLock CC: Dave Kleikamp <shaggy@austin.ibm.com> CC: Przemyslaw Wegrzyn <czajnik@czajsoft.pl> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2007-11-13Merge branch 'master' of ↵Linus Torvalds3-4/+4
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (45 commits) [NETFILTER]: xt_time should not assume CONFIG_KTIME_SCALAR [NET]: Move unneeded data to initdata section. [NET]: Cleanup pernet operation without CONFIG_NET_NS [TEHUTI]: Fix incorrect usage of strncat in bdx_get_drvinfo() [MYRI_SBUS]: Prevent that myri_do_handshake lies about ticks. [NETFILTER]: bridge: fix double POSTROUTING hook invocation [NETFILTER]: Consolidate nf_sockopt and compat_nf_sockopt [NETFILTER]: nf_nat: fix memset error [INET]: Use list_head-s in inetpeer.c [IPVS]: Remove unused exports. [NET]: Unexport sysctl_{r,w}mem_max. [TG3]: Update version to 3.86 [TG3]: MII => TP [TG3]: Add A1 revs [TG3]: Increase the PCI MRRS [TG3]: Prescaler fix [TG3]: Limit 5784 / 5764 to MAC LED mode [TG3]: Disable GPHY autopowerdown [TG3]: CPMU adjustments for loopback tests [TG3]: Fix nvram selftest failures ...
2007-11-13Revert "ext2/ext3/ext4: add block bitmap validation"Linus Torvalds3-130/+9
This reverts commit 7c9e69faa28027913ee059c285a5ea8382e24b5d, fixing up conflicts in fs/ext4/balloc.c manually. The cost of doing the bitmap validation on each lookup - even when the bitmap is cached - is absolutely prohibitive. We could, and probably should, do it only when adding the bitmap to the buffer cache. However, right now we are better off just reverting it. Peter Zijlstra measured the cost of this extra validation as a 85% decrease in cached iozone, and while I had a patch that took it down to just 17% by not being _quite_ so stupid in the validation, it was still a big slowdown that could have been avoided by just doing it right. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Andreas Dilger <adilger@clusterfs.com> Cc: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>