<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/fs/compat.c, branch v2.6.31-rc2</title>
<subtitle>Linux Kernel (branches are rebased on master from time to time)</subtitle>
<id>https://sre.ring0.de/linux/atom?h=v2.6.31-rc2</id>
<link rel='self' href='https://sre.ring0.de/linux/atom?h=v2.6.31-rc2'/>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/'/>
<updated>2009-06-12T16:01:44Z</updated>
<entry>
<title>trivial: fix comment typo in fs/compat.c</title>
<updated>2009-06-12T16:01:44Z</updated>
<author>
<name>Nikanth Karthikesan</name>
<email>knikanth@suse.de</email>
</author>
<published>2009-04-01T09:10:51Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=ff677f8d10a7b7dea6fbfc48d5ceeb3018cabb23'/>
<id>urn:sha1:ff677f8d10a7b7dea6fbfc48d5ceeb3018cabb23</id>
<content type='text'>
Fix a typo in fs/compat.c

Signed-off-by: Nikanth Karthikesan &lt;knikanth@suse.de&gt;
Signed-off-by: Jiri Kosina &lt;jkosina@suse.cz&gt;
</content>
</entry>
<entry>
<title>Push BKL into do_mount()</title>
<updated>2009-06-12T01:36:08Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2009-05-08T17:31:17Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=6fac98dd218653c6aff8a0f56305c424930cea2a'/>
<id>urn:sha1:6fac98dd218653c6aff8a0f56305c424930cea2a</id>
<content type='text'>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>CRED: Rename cred_exec_mutex to reflect that it's a guard against ptrace</title>
<updated>2009-05-10T22:15:36Z</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2009-05-08T12:55:22Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=5e751e992f3fb08ba35e1ca8095ec8fbf9eda523'/>
<id>urn:sha1:5e751e992f3fb08ba35e1ca8095ec8fbf9eda523</id>
<content type='text'>
Rename cred_exec_mutex to reflect that it's a guard against foreign
intervention on a process's credential state, such as is made by ptrace().  The
attachment of a debugger to a process affects execve()'s calculation of the new
credential state - _and_ also setprocattr()'s calculation of that state.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Signed-off-by: James Morris &lt;jmorris@namei.org&gt;
</content>
</entry>
<entry>
<title>do_execve() must not clear fs-&gt;in_exec if it was set by another thread</title>
<updated>2009-04-24T14:39:45Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2009-04-23T23:01:56Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=8c652f96d3852b97a49c331cd0bb02d22f3cb31b'/>
<id>urn:sha1:8c652f96d3852b97a49c331cd0bb02d22f3cb31b</id>
<content type='text'>
If do_execve() fails after check_unsafe_exec(), it clears fs-&gt;in_exec
unconditionally. This is wrong if we race with our sub-thread which
also does do_execve:

	Two threads T1 and T2 and another process P, all share the same
	-&gt;fs.

	T1 starts do_execve(BAD_FILE). It calls check_unsafe_exec(); since
	-&gt;fs is shared, we set LSM_UNSAFE_SHARE but not -&gt;in_exec.

	P exits and decrements fs-&gt;users.

	T2 starts do_execve(), calls check_unsafe_exec(), now -&gt;fs is not
	shared, we set fs-&gt;in_exec.

	T1 continues, open_exec(BAD_FILE) fails, we clear -&gt;in_exec and
	return to the user-space.

	T1 does clone(CLONE_FS /* without CLONE_THREAD */).

	T2 continues without LSM_UNSAFE_SHARE while -&gt;fs is shared with
	another process.

Change check_unsafe_exec() to return res = 1 if we set -&gt;in_exec, and change
do_execve() to clear -&gt;in_exec depending on res.
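
<![CDATA[
The rule above can be replayed in a small user-space model. This is only a
sketch under stated assumptions: struct fs_model, check_unsafe_exec_model(),
exec_failed_model() and replay_race() are hypothetical names, not kernel code,
and the -1 return merely stands in for whatever error the kernel reports when
another exec already holds ->in_exec.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's fs_struct. */
struct fs_model {
    int users;     /* tasks currently sharing this fs */
    bool in_exec;  /* set while one exec has claimed this fs */
};

/*
 * Model of check_unsafe_exec(). n_threads counts the threads of the calling
 * process that share fs. Returns 1 only if this call set in_exec, so the
 * caller knows whether it may clear the flag again on failure.
 */
int check_unsafe_exec_model(struct fs_model *fs, int n_threads, bool *unsafe_share)
{
    assert(fs->users >= 1);
    if (fs->users > n_threads) {   /* fs is shared with another process */
        *unsafe_share = true;
        return 0;                  /* in_exec left untouched */
    }
    *unsafe_share = false;
    if (fs->in_exec)
        return -1;                 /* placeholder for the kernel's error return */
    fs->in_exec = true;
    return 1;
}

/* Failure path of the exec: clear in_exec only if this exec set it. */
void exec_failed_model(struct fs_model *fs, int res)
{
    if (res > 0)
        fs->in_exec = false;
}

/* Replays the T1/T2/P interleaving; returns true if T2's claim survives. */
bool replay_race(void)
{
    struct fs_model fs = { .users = 3, .in_exec = false };  /* T1, T2, P */
    bool t1_unsafe, t2_unsafe;

    int t1_res = check_unsafe_exec_model(&fs, 2, &t1_unsafe); /* shared with P */
    fs.users--;                                               /* P exits */
    int t2_res = check_unsafe_exec_model(&fs, 2, &t2_unsafe); /* T2 sets in_exec */
    exec_failed_model(&fs, t1_res);                           /* T1's exec fails */

    return t1_unsafe && !t2_unsafe && t1_res == 0 && t2_res == 1 && fs.in_exec;
}
```

In this model replay_race() returns true: T1 never set in_exec, so its failure
path leaves T2's flag alone. With an unconditional clear in
exec_failed_model(), the flag would be lost, which is the bug described above.
]]>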

When do_execve() succeeds, it is safe to clear -&gt;in_exec unconditionally.
It can be set only if we don't share -&gt;fs with another process, and since
we already killed all sub-threads either -&gt;in_exec == 0 or we are the
only user of this -&gt;fs.

Also, we do not need fs-&gt;lock to clear fs-&gt;in_exec.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Acked-by: Hugh Dickins &lt;hugh@veritas.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>kill vfs_stat_fd / vfs_lstat_fd</title>
<updated>2009-04-21T03:02:52Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@infradead.org</email>
</author>
<published>2009-04-08T20:34:03Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=2eae7a1874ca5be3232765d89e0250a449f1bc90'/>
<id>urn:sha1:2eae7a1874ca5be3232765d89e0250a449f1bc90</id>
<content type='text'>
There's really no reason to keep vfs_stat_fd and vfs_lstat_fd with
Oleg's vfs_fstatat.  Use vfs_fstatat for the few cases having the
directory fd, and switch all others to vfs_stat / vfs_lstat.

Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>Separate out common fstatat code into vfs_fstatat</title>
<updated>2009-04-21T03:02:51Z</updated>
<author>
<name>Oleg Drokin</name>
<email>green@linuxhacker.ru</email>
</author>
<published>2009-04-08T16:05:42Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=0112fc2229847feb6c4eb011e6833d8f1742a375'/>
<id>urn:sha1:0112fc2229847feb6c4eb011e6833d8f1742a375</id>
<content type='text'>
This is a version incorporating Christoph's suggestion.

Separate out common *fstatat functionality into a single function
instead of duplicating it all over the code.
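
<![CDATA[
The shape of this consolidation can be illustrated with a user-space analogue
built on the POSIX fstatat() call. This is a sketch only: stat_common(),
stat_follow() and stat_nofollow() are hypothetical names chosen for the
example and are not the kernel's vfs_fstatat() interface.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical common helper: every path-based stat variant funnels through here. */
int stat_common(int dfd, const char *path, struct stat *st, int flags)
{
    assert(path != NULL && st != NULL);
    return fstatat(dfd, path, st, flags);
}

/* stat()-like wrapper: relative to the current directory, follows symlinks. */
int stat_follow(const char *path, struct stat *st)
{
    return stat_common(AT_FDCWD, path, st, 0);
}

/* lstat()-like wrapper: same helper, but does not follow a final symlink. */
int stat_nofollow(const char *path, struct stat *st)
{
    return stat_common(AT_FDCWD, path, st, AT_SYMLINK_NOFOLLOW);
}
```

The wrappers differ only in the arguments they pass, so the lookup logic lives
in one place instead of being duplicated per variant.
]]>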

Signed-off-by: Oleg Drokin &lt;green@linuxhacker.ru&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>Make non-compat preadv/pwritev use native register size</title>
<updated>2009-04-04T21:20:34Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2009-04-03T15:03:22Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=601cc11d054ae4b5e9b5babec3d8e4667a2cb9b5'/>
<id>urn:sha1:601cc11d054ae4b5e9b5babec3d8e4667a2cb9b5</id>
<content type='text'>
Instead of always splitting the file offset into 32-bit 'high' and 'low'
parts, just split them into the largest natural word-size - which in C
terms is 'unsigned long'.

This allows 64-bit architectures to avoid the unnecessary 32-bit
shifting and masking for native format (while the compat interfaces will
obviously always have to do it).

This also changes the order of 'high' and 'low' to be "low first".  Why?
Because when we have it like this, the 64-bit system calls now don't use
the "pos_high" argument at all, and it makes more sense for the native
system call to simply match the user-mode prototype.

This results in a much more natural calling convention, and allows the
compiler to generate much more straightforward code.  On x86-64, we now
generate

        testq   %rcx, %rcx      # pos_l
        js      .L122   #,
        movq    %rcx, -48(%rbp) # pos_l, pos

from the C source

        loff_t pos = pos_from_hilo(pos_h, pos_l);
	...
        if (pos &lt; 0)
                return -EINVAL;

and the 'pos_h' register isn't even touched.  It used to generate code
like

        mov     %r8d, %r8d      # pos_low, pos_low
        salq    $32, %rcx       #, tmp71
        movq    %r8, %rax       # pos_low, pos.386
        orq     %rcx, %rax      # tmp71, pos.386
        js      .L122   #,
        movq    %rax, -48(%rbp) # pos.386, pos

which isn't _that_ horrible, but it does show how the natural word size
is just a more sensible interface (same arguments will hold in the user
level glibc wrapper function, of course, so the kernel side is just half
of the equation!)

Note: in all cases the user code wrapper can again be the same. You can
just do

	#define HALF_BITS (sizeof(unsigned long)*4)
	__syscall(PWRITEV, fd, iov, count, offset, (offset &gt;&gt; HALF_BITS) &gt;&gt; HALF_BITS);

or something like that.  That way the user mode wrapper will also be
nicely passing in a zero (it won't actually have to do the shifts, the
compiler will understand what is going on) for the last argument.

And that is a good idea, even if nobody will necessarily ever care: if
we ever do move to a 128-bit lloff_t, this particular system call might
be left alone.  Of course, that will be the least of our worries if we
really ever need to care, so this may not be worth really caring about.
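
<![CDATA[
The split and recombination described above can be sketched in portable C.
This is an illustration under assumptions, not the kernel source: int64_t
stands in for loff_t, and pos_low(), pos_high() and check_roundtrip() are
hypothetical helper names for the user-side half of the convention.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Half the width of 'unsigned long' in bits: 32 on LP64, 16 on ILP32. */
#define HALF_LONG_BITS (sizeof(unsigned long) * CHAR_BIT / 2)

/*
 * Kernel-side recombination. Shifting twice by half the word size avoids an
 * undefined full-width shift when 'unsigned long' is already 64 bits wide;
 * in that case 'high' is shifted out entirely and only 'low' matters.
 */
int64_t pos_from_hilo(unsigned long high, unsigned long low)
{
    uint64_t pos = (((uint64_t)high << HALF_LONG_BITS) << HALF_LONG_BITS) | low;
    return (int64_t)pos;
}

/* User-side split: the low word is just the offset truncated to a long. */
unsigned long pos_low(int64_t offset)
{
    return (unsigned long)offset;
}

/* User-side split: whatever does not fit in one long (zero on 64-bit). */
unsigned long pos_high(int64_t offset)
{
    return (unsigned long)(((uint64_t)offset >> HALF_LONG_BITS) >> HALF_LONG_BITS);
}

/* Splitting and recombining must give back the original offset. */
void check_roundtrip(int64_t offset)
{
    assert(pos_from_hilo(pos_high(offset), pos_low(offset)) == offset);
}
```

The same source works for both word sizes: on a 64-bit build pos_high() is
always zero and the compiler can drop the shifts, while a 32-bit build carries
the upper half of the offset in the second argument.
]]>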

[ Fixed for lost 'loff_t' cast noticed by Andrew Morton ]

Acked-by: Gerd Hoffmann &lt;kraxel@redhat.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6</title>
<updated>2009-04-03T04:09:10Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2009-04-03T04:09:10Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=8fe74cf053de7ad2124a894996f84fa890a81093'/>
<id>urn:sha1:8fe74cf053de7ad2124a894996f84fa890a81093</id>
<content type='text'>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  Remove two unneeded exports and make two symbols static in fs/mpage.c
  Cleanup after commit 585d3bc06f4ca57f975a5a1f698f65a45ea66225
  Trim includes of fdtable.h
  Don't crap into descriptor table in binfmt_som
  Trim includes in binfmt_elf
  Don't mess with descriptor table in load_elf_binary()
  Get rid of indirect include of fs_struct.h
  New helper - current_umask()
  check_unsafe_exec() doesn't care about signal handlers sharing
  New locking/refcounting for fs_struct
  Take fs_struct handling to new file (fs/fs_struct.c)
  Get rid of bumping fs_struct refcount in pivot_root(2)
  Kill unsharing fs_struct in __set_personality()
</content>
</entry>
<entry>
<title>preadv/pwritev: switch compat readv/preadv/writev/pwritev from fget to fget_light</title>
<updated>2009-04-03T02:05:08Z</updated>
<author>
<name>Gerd Hoffmann</name>
<email>kraxel@redhat.com</email>
</author>
<published>2009-04-02T23:59:25Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=10c7db279218eda4b19d29ee17db8a815b18d564'/>
<id>urn:sha1:10c7db279218eda4b19d29ee17db8a815b18d564</id>
<content type='text'>
Signed-off-by: Gerd Hoffmann &lt;kraxel@redhat.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: &lt;linux-api@vger.kernel.org&gt;
Cc: &lt;linux-arch@vger.kernel.org&gt;
Cc: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>preadv/pwritev: Add preadv and pwritev system calls.</title>
<updated>2009-04-03T02:05:08Z</updated>
<author>
<name>Gerd Hoffmann</name>
<email>kraxel@redhat.com</email>
</author>
<published>2009-04-02T23:59:23Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=f3554f4bc69803ac2baaf7cf2aa4339e1f4b693e'/>
<id>urn:sha1:f3554f4bc69803ac2baaf7cf2aa4339e1f4b693e</id>
<content type='text'>
This patch adds preadv and pwritev system calls.  These syscalls are a
pretty straightforward combination of pread and readv (same for write).
They are quite useful for doing vectored I/O in threaded applications.
Using lseek+readv instead opens race windows you'll have to plug with
locking.
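
<![CDATA[
As a minimal sketch of the benefit, assuming a C library that exposes a
preadv() wrapper with the BSD-style signature (glibc does on current systems),
a record can be read into two buffers at a fixed offset without touching the
shared file position. read_record() is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Reads hdr_len bytes into hdr and then body_len bytes into body, starting at
 * 'offset'. The file position of fd is neither used nor modified, so several
 * threads can call this on the same descriptor without a lock around an
 * lseek()+readv() pair. Returns the number of bytes read, or -1 on error.
 */
ssize_t read_record(int fd, off_t offset,
                    char *hdr, size_t hdr_len,
                    char *body, size_t body_len)
{
    assert(hdr != NULL && body != NULL);

    struct iovec iov[2] = {
        { .iov_base = hdr,  .iov_len = hdr_len  },
        { .iov_base = body, .iov_len = body_len },
    };

    return preadv(fd, iov, 2, offset);
}
```

A short read is possible, as with readv(), so a real caller would compare the
return value against hdr_len + body_len.
]]>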

Other systems have such system calls too, for example NetBSD, check
here: http://www.daemon-systems.org/man/preadv.2.html

The application-visible interface provided by glibc should look like
this to be compatible with the existing implementations in the *BSD family:

  ssize_t preadv(int d, const struct iovec *iov, int iovcnt, off_t offset);
  ssize_t pwritev(int d, const struct iovec *iov, int iovcnt, off_t offset);

This prototype has one problem though: on 32-bit archs the (64-bit)
offset argument is unaligned, which the syscall ABI of several archs
doesn't allow.  At least s390 needs a wrapper in glibc to handle this.
As we'll need wrappers in glibc anyway, I've decided to push the problem
to glibc entirely and use a syscall prototype which works without
arch-specific wrappers inside the kernel: the offset argument is
explicitly split into two 32-bit values.

The patch contains the actual system call implementation and the wire-up
in the x86 system call tables.  Other archs follow as separate patches.

Signed-off-by: Gerd Hoffmann &lt;kraxel@redhat.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: &lt;linux-api@vger.kernel.org&gt;
Cc: &lt;linux-arch@vger.kernel.org&gt;
Cc: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
