Age | Commit message (Collapse) | Author | Files | Lines |
|
Enable FORTIFY_SOURCE so running Kunit tests can test fortified
functions.
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: David Gow <davidgow@google.com>
Link: https://lore.kernel.org/r/20220210003224.773957-1-keescook@chromium.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- KASAN support for x86_64
- noreboot command line option, just like qemu's -no-reboot
- Various fixes and cleanups
* tag 'for-linus-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: include sys/types.h for size_t
um: Replace to_phys() and to_virt() with less generic function names
um: Add missing apply_returns()
um: add "noreboot" command line option for PANIC_TIMEOUT=-1 setups
um: include linux/stddef.h for __always_inline
UML: add support for KASAN under x86_64
mm: Add PAGE_ALIGN_DOWN macro
um: random: Don't initialise hwrng struct with zero
um: remove unused mm_copy_segments
um: remove unused variable
um: Remove straying parenthesis
um: x86: print RIP with symbol
arch: um: Fix build for statically linked UML w/ constructors
x86/um: Kconfig: Fix indentation
um/drivers: Kconfig: Fix indentation
um: Kconfig: Fix indentation
|
|
UML generally does not provide access to special CPU instructions like
RDRAND, and execution tends to be rather deterministic, with no real
hardware interrupts, making good randomness really very hard, if not
all together impossible. Not only is this a security eyebrow raiser, but
it's also quite annoying when trying to do various pieces of UML-based
automation that takes a long time to boot, if ever.
Fix this by trivially calling getrandom() in the host and using that
seed as "bootloader randomness", which initializes the rng immediately
at UML boot.
The old behavior can be restored the same way as on any other arch, by
way of CONFIG_TRUST_BOOTLOADER_RANDOMNESS=n or
random.trust_bootloader=0. So seen from that perspective, this just
makes UML act like other archs, which is positive in its own right.
Additionally, wire up arch_get_random_{int,long}() in the same way, so
that reseeds can also make use of the host RNG, controllable by
CONFIG_TRUST_CPU_RANDOMNESS and random.trust_cpu, per usual.
Cc: stable@vger.kernel.org
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
to_virt() and to_phys() are very generic and may be defined by drivers.
As it turns out, commit 9409c9b6709e ("pmem: refactor pmem_clear_poison()")
did exactly that. This results in build errors such as the following
when trying to build um:allmodconfig.
drivers/nvdimm/pmem.c: In function ‘pmem_dax_zero_page_range’:
./arch/um/include/asm/page.h:105:20: error:
too few arguments to function ‘to_phys’
105 | #define __pa(virt) to_phys((void *) (unsigned long) (virt))
| ^~~~~~~
Use less generic function names for the um specific to_phys() and to_virt()
functions to fix the problem and to avoid similar problems in the future.
Fixes: 9409c9b6709e ("pmem: refactor pmem_clear_poison()")
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
QEMU has a -no-reboot option, which halts instead of reboots when the
guest asks to reboot. This is invaluable when used with
CONFIG_PANIC_TIMEOUT=-1 (and panic_on_warn), because it allows panics
and warnings to be caught immediately in CI. Implement this in UML too,
by way of a basic setup param.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
Make KASAN run on User Mode Linux on x86_64.
The UML-specific KASAN initializer uses mmap to map the ~16TB of shadow
memory to the location defined by KASAN_SHADOW_OFFSET. kasan_init()
utilizes constructors to initialize KASAN before main().
The location of the KASAN shadow memory, starting at
KASAN_SHADOW_OFFSET, can be configured using the KASAN_SHADOW_OFFSET
option. The default location of this offset is 0x100000000000, which
keeps it out-of-the-way even on UML setups with more "physical" memory.
For low-memory setups, 0x7fff8000 can be used instead, which fits in an
immediate and is therefore faster, as suggested by Dmitry Vyukov. There
is usually enough free space at this location; however, it is a config
option so that it can be easily changed if needed.
Note that, unlike KASAN on other architectures, vmalloc allocations
still use the shadow memory allocated upfront, rather than allocating
and free-ing it per-vmalloc allocation.
If another architecture chooses to go down the same path, we should
replace the checks for CONFIG_UML with something more generic, such
as:
- A CONFIG_KASAN_NO_SHADOW_ALLOC option, which architectures could set
- or, a way of having architecture-specific versions of these vmalloc
and module shadow memory allocation options.
Also note that, while UML supports both KASAN in inline mode
(CONFIG_KASAN_INLINE) and static linking (CONFIG_STATIC_LINK), it does
not support both at the same time.
Signed-off-by: Patricia Alfonso <trishalfonso@google.com>
Co-developed-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
The variable dead is initialized but never used otherwise.
Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
The UML function names to_virt() and to_phys() are exposed by UML
headers, and are very generic and may be defined by drivers. As it
turns out, commit 9409c9b6709e ("pmem: refactor pmem_clear_poison()")
did exactly that.
This results in build errors such as the following when trying to build
um:allmodconfig:
drivers/nvdimm/pmem.c: In function ‘pmem_dax_zero_page_range’:
./arch/um/include/asm/page.h:105:20: error: too few arguments to function ‘to_phys’
105 | #define __pa(virt) to_phys((void *) (unsigned long) (virt))
| ^~~~~~~
Use less generic function names for the um specific to_phys() and
to_virt() functions to fix the problem and to avoid similar problems in
the future.
Fixes: 9409c9b6709e ("pmem: refactor pmem_clear_poison()")
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- Devicetree support (for testing)
- Various cleanups and fixes: UBD, port_user, uml_mconsole
- Maintainer update
* tag 'for-linus-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: run_helper: Write error message to kernel log on exec failure on host
um: port_user: Improve error handling when port-helper is not found
um: port_user: Allow setting path to port-helper using UML_PORT_HELPER envvar
um: port_user: Search for in.telnetd in PATH
um: clang: Strip out -mno-global-merge from USER_CFLAGS
docs: UML: Mention telnetd for port channel
um: Remove unused timeval_to_ns() function
um: Fix uml_mconsole stop/go
um: Cleanup syscall_handler_t definition/cast, fix warning
uml: net: vector: fix const issue
um: Fix WRITE_ZEROES in the UBD Driver
um: Migrate vector drivers to NAPI
um: Fix order of dtb unflatten/early init
um: fix and optimize xor select template for CONFIG64 and timetravel mode
um: Document dtb command line option
lib/logic_iomem: correct fallback config references
um: Remove duplicated include in syscalls_64.c
MAINTAINERS: Update UserModeLinux entry
|
|
Add SUBARCH target for Clang+um (which must go last, not alphabetically,
so the other SUBARCHes are assigned). Remove open-coded "DEFINE"
macro, instead using linux/kbuild.h's version which was updated to use
Clang-friendly assembly in commit cf0c3e68aa81 ("kbuild: fix asm-offset
generation to work with clang"). Redefine "DEFINE_LONGS" in terms of
"COMMENT" and "DEFINE" so that the intended coment actually has useful
content. Add a missed "break" to avoid implicit fall-through warnings.
This lets me run KUnit tests with Clang:
$ ./tools/testing/kunit/kunit.py run --make_options LLVM=1
...
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: David Gow <davidgow@google.com>
Cc: linux-um@lists.infradead.org
Cc: linux-kbuild@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: kunit-dev@googlegroups.com
Cc: llvm@lists.linux.dev
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/lkml/Yg2YubZxvYvx7%2Fnm@dev-arch.archlinux-ax161/
Tested-by: David Gow <davidgow@google.com>
Link: https://lore.kernel.org/lkml/CABVgOSk=oFxsbSbQE-v65VwR2+mXeGXDDjzq8t7FShwjJ3+kUg@mail.gmail.com/
Signed-off-by: Kees Cook <keescook@chromium.org>
---
v1: https://lore.kernel.org/lkml/20220217002843.2312603-1-keescook@chromium.org
v2: https://lore.kernel.org/lkml/20220224055831.1854786-1-keescook@chromium.org
v3:
- use kbuild.h to avoid duplication (Masahiro)
- fix intended comments (Masahiro)
- use SUBARCH (Nathan)
|
|
The best place to log errors from the host side is in the kernel log within
the UML guest. Letting the user now that exec() failed and why is very
helpful when the user is trying to determine why some aspect of UML is not
working. For instance, when telneting into the UML instance, if the
connection is established and then immediately dropped, this may be due to
exec() failing because in.telnetd is not found.
Signed-off-by: Glenn Washburn <development@efficientek.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
The timeval_to_ns() function doesn't appear to be used anywhere, as far
as I (or git grep) can tell, and clang throws up a warning about it:
../arch/um/os-Linux/time.c:21:25: warning: unused function 'timeval_to_ns' [-Wunused-function]
static inline long long timeval_to_ns(const struct timeval *tv)
^
1 warning generated.
Signed-off-by: David Gow <davidgow@google.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
Call to fallocate with FALLOC_FL_PUNCH_HOLE on a device backed by a sparse
file can end up by missing data, zeroes data range, if the underlying file
is used with a tool like bmaptool which will referenced only used spaces.
Signed-off-by: Frédéric Danis <frederic.danis@collabora.com>
Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
Use the macro 'swap()' defined in 'include/linux/minmax.h' to avoid
opencoding it.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Yang Guang <yang.guang5@zte.com.cn>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
The function names init_registers() and restore_registers() are used
in several net/ethernet/ and gpu/drm/ drivers for other purposes (not
calls to UML functions), so rename them.
This fixes multiple build errors.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: linux-um@lists.infradead.org
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
Rename set_signals() as there's at least one driver that
uses the same name and can now be built on UM due to PCI
support, and thus we can get symbol conflicts.
Also rename set_signals_trace() to be consistent.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 68f5d3f3b654 ("um: add PCI over virtio emulation driver")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
Delete/fixup few includes in anticipation of global -isystem compile
option removal.
Note: crypto/aegis128-neon-inner.c keeps <stddef.h> due to redefinition
of uintptr_t error (one definition comes from <stddef.h>, another from
<linux/types.h>).
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
GCC assumes that stack is aligned to 16-byte on call sites [1].
Since GCC 8, GCC began using 16-byte aligned SSE instructions to
implement assignments to structs on stack. When
CC_OPTIMIZE_FOR_PERFORMANCE is enabled, this affects
os-Linux/sigio.c, write_sigio_thread:
struct pollfds *fds, tmp;
tmp = current_poll;
Note that struct pollfds is exactly 16 bytes in size.
GCC 8+ generates assembly similar to:
movdqa (%rdi),%xmm0
movaps %xmm0,-0x50(%rbp)
This is an issue, because movaps will #GP if -0x50(%rbp) is not
aligned to 16 bytes [2], and how rbp gets assigned to is via glibc
clone thread_start, then function prologue, going though execution
trace similar to (showing only relevant instructions):
sub $0x10,%rsi
mov %rcx,0x8(%rsi)
mov %rdi,(%rsi)
syscall
pop %rax
pop %rdi
callq *%rax
push %rbp
mov %rsp,%rbp
The stack pointer always points to the topmost element on stack,
rather then the space right above the topmost. On push, the
pointer decrements first before writing to the memory pointed to
by it. Therefore, there is no need to have the stack pointer
pointer always point to valid memory unless the stack is poped;
so the `- sizeof(void *)` in the code is unnecessary.
On the other hand, glibc reserves the 16 bytes it needs on stack
and pops itself, so by the call instruction the stack pointer
is exactly the caller-supplied sp. It then push the 16 bytes of
the return address and the saved stack pointer, so the base
pointer will be 16-byte aligned if and only if the caller
supplied sp is 16-byte aligned. Therefore, the caller must supply
a 16-byte aligned pointer, which `stack + UM_KERN_PAGE_SIZE`
already satisfies.
On a side note, musl is unaffected by this issue because it forces
16 byte alignment via `and $-16,%rsi` in its clone wrapper.
Similarly, glibc i386 is also unaffected because it has
`andl $0xfffffff0, %ecx`.
To reproduce this bug, enable CONFIG_UML_RTC and
CC_OPTIMIZE_FOR_PERFORMANCE. uml_rtc will call
add_sigio_fd which will then cause write_sigio_thread to either go
into segfault loop or panic with "Segfault with no mm".
Similarly, signal stacks will be aligned by the host kernel upon
signal delivery. `- sizeof(void *)` to sigaltstack is
unconventional and extraneous.
On a related note, initialization of longjmp buffers do require
`- sizeof(void *)`. This is to account for the return address
that would have been pushed to the stack at the call site.
The reason for uml to respect 16-byte alignment, rather than
telling GCC to assume 8-byte alignment like the host kernel since
commit d9b0cde91c60 ("x86-64, gcc: Use
-mpreferred-stack-boundary=3 if supported"), is because uml links
against libc. There is no reason to assume libc is also compiled
with that flag and assumes 8-byte alignment rather than 16-byte.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[2] https://c9x.me/x86/html/file_module_x86_id_180.html
Signed-off-by: YiFei Zhu <zhuyifei1999@gmail.com>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
1. Reflect host cpu flags into the UML instance so they can
be used to select the correct implementations for xor, crypto, etc.
2. Reflect host cache alignment into UML instance. This is
important when running 32 bit on a 64 bit host as 32 bit by
default aligns to 32 while the actual alignment should be 64.
Ditto for some Xeons which align at 128.
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
We should be able to ndelay() from any context, even from an
interrupt context! However, this is broken (not functionally,
but locking-wise) in time-travel because we'll get into the
time-travel code and enable interrupts to handle messages on
other time-travel aware subsystems (only virtio for now).
Luckily, I've already reworked the time-travel aware signal
(interrupt) delivery for suspend/resume to have a time travel
handler, which runs directly in the context of the signal and
not from the Linux interrupt.
In order to fix this time-travel issue then, we need to do a
few things:
1) rework the signal handling code to call time-travel handlers
(only) if interrupts are disabled but signals aren't blocked,
instead of marking it only pending there. This is needed to
not deadlock other communication.
2) rework time-travel to not enable interrupts while it's
waiting for a message;
3) rework time-travel to not (just) disable interrupts but
rather block signals at a lower level while it needs them
disabled for communicating with the controller.
Finally, since now we can actually spend even virtual time
in interrupts-disabled sections, the delay warning when we
deliver a time-travel delayed interrupt is no longer valid,
things can (and should) now get delayed.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
Use signals_enabled instead of always jumping through
a function call to read it, there's not much point in
that.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
This mostly reverts the old commit 3963333fe676 ("uml: cover stubs
with a VMA") which had added a VMA to the existing PTEs. However,
there's no real reason to have the PTEs in the first place and the
VMA cannot be 'fixed' in place, which leads to bugs that userspace
could try to unmap them and be forcefully killed, or such. Also,
there's a bit of an ugly hole in userspace's address space.
Simplify all this: just install the stub code/page at the top of
the (inner) address space, i.e. put it just above TASK_SIZE. The
pages are simply hard-coded to be mapped in the userspace process
we use to implement an mm context, and they're out of reach of the
inner mmap/munmap/mprotect etc. since they're above TASK_SIZE.
Getting rid of the VMA also makes vma_merge() no longer hit one of
the VM_WARN_ON()s there because we installed a VMA while the code
assumes the stack VMA is the first one.
It also removes a lockdep warning about mmap_sem usage since we no
longer have uml_setup_stubs() and thus no longer need to do any
manipulation that would require mmap_sem in activate_mm().
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
The userspace stacks mostly have a stack (and in the case of the
syscall stub we can just set their stack pointer) that points to
the location of the stub data page already.
Rework the stubs to use the stack pointer to derive the start of
the data page, rather than requiring it to be hard-coded.
In the clone stub, also integrate the int3 into the stack remap,
since we really must not use the stack while we remap it.
This prepares for putting the stub at a variable location that's
not part of the normal address space of the userspace processes
running inside the UML machine.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
If the two are mixed up, then it looks as though the parent
returned an error if the child failed (before) the mmap(),
and then the resulting process never gets killed. Fix this
by splitting the child and parent errors, reporting and
using them appropriately.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
In some cases we can get to fix_range_common() with mmap_sem held,
and in others we get there without it being held. For example, we
get there with it held from sys_mprotect(), and without it held
from fork_handler().
Avoid any issues in this and simply defer killing the task until
it runs the next time. Do it on the mm so that another task that
shares the same mm can't continue running afterwards.
Cc: stable@vger.kernel.org
Fixes: 468f65976a8d ("um: Fix hung task in fix_range_common()")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
Since we're basically debugging the userspace (it runs in ptrace)
it's useful to dump out the registers - but they're not readable,
so if something goes wrong it's hard to say what. Print the names
of registers in the register dump so it's easier to look at.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
Changing os_idle_sleep() to use pause() (I accidentally described
it as an empty select() in the commit log because I had changed it
from that to pause() in a later revision) exposed a race condition
in the idle code. The following can happen:
timer_settime(0, 0, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=0, tv_nsec=624017}}, NULL) = 0
...
<SIGALRM is delivered but we're already on the way to idle>
pause()
and we now hang forever. This was previously possible as well, but
it could never cause UML to hang for more than a second since we
could only sleep for that much, so at most you'd notice a "hiccup"
in the UML. Obviously, any sort of external interrupt also "saves"
it and interrupts pause().
Fix this by properly handling the race, rather than papering over
it again:
- first, block SIGALRM, and obtain the old signal set
- check the timer
- suspend, waiting for any signal out of the old set, if, and only
if, the timer will fire in the future
- restore the old signal mask
This ensures race-free operation: as it's blocked, the signal won't
be delivered while we're looking at the timer even if it were to be
triggered right _after_ we've returned from timer_gettime() with a
non-zero value (telling us the timer will trigger). Thus, despite
getting to sigsuspend() because timer_gettime() told us we're still
waiting, we'll not hang because sigsuspend() will return immediately
due to the pending signal.
Fixes: 49da38a3ef33 ("um: Simplify os_idle_sleep() and sleep longer")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
This reverts commit ef4459a6da09 ("um: allocate a guard page to
helper threads"), it's broken in multiple ways:
1) the free no longer matches the alloc; and
2) more importantly, the set_memory_ro() causes allocation of
page tables for the normal memory that doesn't have any,
and that later causes corruption and crashes (usually but
not always in vfree()).
We could fix the first bug and use vmalloc() to work around the
second, but set_memory_ro() actually doesn't do anything either
so I'll just revert that as well.
Reported-by: Benjamin Berg <benjamin@sipsolutions.net>
Fixes: ef4459a6da09 ("um: allocate a guard page to helper threads")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
Lockdep (on 5.10-rc) points out that we're delivering IRQs while IRQs
are not even enabled, which clearly shouldn't happen. Defer the time
event IRQ delivery until they actually are enabled.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
If the sigio workaround needed to be applied to a file descriptor,
set_irq_wake() wouldn't work for it since it would get polled by
the thread instead of causing SIGIO, and thus could never really
cause a wakeup, since the thread notification FD wasn't marked as
being able to wake up the system.
Fix this by marking the thread's notification FD explicitly as a
wake source FD, i.e. not suppressing SIGIO for it in suspend. In
order to not cause spurious wakeups, we then need to remove all
FDs that shouldn't wake up the system from the polling thread. In
order to do this, add unlocked versions of ignore_sigio_fd() and
add_sigio_fd() (nothing else is happening in suspend, so this is
fine), and also modify ignore_sigio_fd() to return -ENOENT if the
FD wasn't originally in there. This doesn't matter because nothing
else currently checks the return value, but the irq code needs to
know which ones to restore the workaround for.
All told, this lets us use a timerfd for the RTC clock in the next
patch, which doesn't send SIGIO.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
Ensure that file closes, connection closes, etc are propagated
as interrupts in the interrupt controller.
Fixes: ff6a17989c08 ("Epoll based IRQ controller")
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
We've been running into stack overflows in helper threads
corrupting memory (e.g. because somebody put printf() or
os_info() there), so to avoid those causing hard-to-debug
issues later on, allocate a guard page for helper thread
stacks and mark it read-only.
Unfortunately, the crash dump at that point is useless as
the stack tracer will try to backtrace the *kernel* thread,
not the helper thread, but at least we don't survive to a
random issue caused by corruption.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
UML userspace fetches siginfo and passes it to signal handlers
in UML. This is needed only for some of the signals, because
key handlers like SIGIO make no use of this variable.
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
With all the previous bits in place, we can now also support
suspend to RAM, in the sense that everything is suspended,
not just most, including userspace, processes like in s2idle.
Since um_idle_sleep() now waits forever, we can simply call
that to "suspend" the system.
As before, you can wake it up using SIGUSR1 since we're just
in a pause() call that only needs to return.
In order to implement selective resume from certain devices,
and not have any arbitrary device interrupt wake up, suspend
interrupts by removing SIGIO notification (O_ASYNC) from all
the FDs that are not supposed to wake up the system. However,
swap out the handler so we don't actually handle the SIGIO as
an interrupt.
Since we're in pause(), the mere act of receiving SIGIO wakes
us up, and then after things have been restored enough, re-set
O_ASYNC for all previously suspended FDs, reinstall the proper
SIGIO handler, and send SIGIO to self to process anything that
might now be pending.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
In order to be able to experiment with suspend in UML, add the
minimal work to be able to suspend (s2idle) an instance of UML,
and be able to wake it back up from that state with the USR1
signal sent to the main UML process.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
There really is no reason to pass the amount of time we should
sleep, especially since it's just hard-coded to one second.
Additionally, one second isn't really all that long, and as we
are expecting to be woken up by a signal, we can sleep longer
and avoid doing some work every second, so replace the current
clock_nanosleep() with just an empty select() that can _only_
be woken up by a signal.
We can also remove the deliver_alarm() since we don't need to
do that when we got e.g. SIGIO that woke us up, and if we got
SIGALRM the signal handler will actually (have) run, so it's
just unnecessary extra work.
Similarly, in time-travel mode, just program the wakeup event
from idle to be S64_MAX, which is basically the most you could
ever simulate to. Of course, you should already have an event
in the list that's earlier and will cause a wakeup, normally
that's the regular timer interrupt, though in suspend it may
(later) also be an RTC event. Since actually getting to this
point would be a bug and you can't ever get out again, panic()
on it in the time control code.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
We don't actually use this in um_request_irq(), so it can
never be assigned. It's also not clear what that would be
useful for, so just remove it.
This results in quite a number of cleanups, all the way to
removing the "SIGIO on close" startup check, since the data
it assigns (pty_close_sigio) is not used anymore.
While at it, also make this an enum so we get a minimum of
type checking, and remove the IRQ_NONE hack in virtio since
we now no longer have the name twice.
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
If we run out of space, return an error instead of 0.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
The signal.c can't use heap for bit data located on stack. However,
by default a compiler warns us about overstepping stack frame size
threshold:
arch/um/os-Linux/signal.c: In function ‘sig_handler_common’:
arch/um/os-Linux/signal.c:51:1: warning: the frame size of 2960 bytes is larger than 2048 bytes [-Wframe-larger-than=]
51 | }
| ^
arch/um/os-Linux/signal.c: In function ‘timer_real_alarm_handler’:
arch/um/os-Linux/signal.c:95:1: warning: the frame size of 2960 bytes is larger than 2048 bytes [-Wframe-larger-than=]
95 | }
| ^
Due to above increase stack frame size threshold explicitly for signal.c
to avoid unnecessary warning.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: David Gow <davidgow@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
asprintf is not compatible with the existing uml memory allocation
mechanism. Its use on the "user" side of UML results in a corrupt slab
state.
Fixes: 0d4e5ac7e780 ("um: remove uses of variable length arrays")
Cc: stable@vger.kernel.org
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
pids are no longer limited to 16-bits, bump to 32-bits,
ie. 9 decimal characters. Additionally sizeof("/") already
returns 2 - ie. it already accounts for trailing zero.
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Linux UM Mailing List <linux-um@lists.infradead.org>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
musl toolchain and headers are a bit more strict. These fixes enable building
UML with musl as well as seem not to break on glibc.
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Tested-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
We do not need to update the metadata (atime, mtime, etc)
on the UBD file and/or the COW file until UML exits.
UBD image mtime is checked in UML only when opening
the files. After that they are locked and used
exclusively by a single UML instance, so there is
no point wasting resources on updating metadata on
every sync. We can sync data only. The host will
always update mtime if a file has been modified upon
closing it.
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
These two functions are otherwise unknown to the pedantic compiler.
Include the correct header to enable the build to succeed.
Signed-off-by: Zach van Rijn <me@zv.io>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
This implements synchronized time-travel mode which - using a special
application on a unix socket - lets multiple machines take part in a
time-travelling simulation together.
The protocol for the unix domain socket is defined in the new file
include/uapi/linux/um_timetravel.h.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
This file isn't really shared, it's only used on the kernel side,
not on the user side. Remove the include from the user-side and
move the file to a better place.
While at it, rename it to time-internal.h, it's not really just
timers but all kinds of things related to timekeeping.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
When building UML with glibc 2.17 installed, compilation
of arch/um/os-Linux/file.c fails due to failure to find
FALLOC_FL_PUNCH_HOLE and FALLOC_FL_KEEP_SIZE definitions.
It appears that /usr/include/bits/fcntl-linux.h (indirectly
included by /usr/include/fcntl.h) does not include falloc.h
with an older glibc, whereas a more up-to-date version
does.
Adding the direct include to file.c resolves the issue
and does not cause problems for more recent glibc.
Fixes: 50109b5a03b4 ("um: Add support for DISCARD in the UBD Driver")
Cc: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
sizeof gives us the size of the pointer variable, not of the
area it points to. So the number of bytes copied by umid_file_name()
is 8.
We should pass in the correct length of the file buffer.
Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
The ubd code suffers from a possible y2038 overflow on 32-bit
architectures, both for the cow header and the os_file_modtime()
function.
Replace time_t with time64_t to extend the ubd_kern side as much
as possible.
Whether this makes a difference for the user side depends on
the host libc implementation that may use either 32-bit or 64-bit
time_t.
For the cow file format, the header contains an unsigned 32-bit
timestamp, which is good until y2106, passing this through a
'long long' gives us a consistent interpretation between 32-bit
and 64-bit um kernels.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
In the main() code, we eventually enable signals just before
exec() or exit(), in order to to not have signals pending and
delivered *after* the exec().
I've observed SIGSEGV loops at this point, and the reason seems
to be the irqflags tracing; this makes sense as the kernel is
no longer really functional at this point. Since there's really
no reason to use unblock_signals_trace() here (I had just done
a global search & replace), use the plain unblock_signals() in
this case to avoid going into the no longer functional kernel.
Fixes: 0dafcbe128d2 ("um: Implement TRACE_IRQFLAGS_SUPPORT")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|