Age | Commit message (Collapse) | Author | Files | Lines |
|
This change adds a procfs connector event, which is emitted on every
successful process tracer attach or detach.
If some process connects to other one, kernelspace connector reports
process id and thread group id of both these involved processes. On
disconnection null process id is returned.
Such an event allows to create a simple automated userspace mechanism
to be aware about processes connecting to others, therefore predefined
process policies can be applied to them if needed.
Note, a detach signal is emitted only in case, if a tracer process
explicitly executes PTRACE_DETACH request. In other cases like tracee
or tracer exit detach event from proc connector is not reported.
Signed-off-by: Vladimir Zapolskiy <vzapolskiy@gmail.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
|
|
`make headers_check` complains that
linux-2.6/usr/include/linux/sdla.h:116: userspace cannot reference
function or variable defined in the kernel
this is due to that there is no such a kernel function,
void sdla(void *cfg_info, char *dev, struct frad_conf *conf, int quiet);
I don't know why we have it in a kernel header, so remove it.
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds new field 'force_sf_dma_mode' to plat_stmmacenet_data
struct to allow users to specify if they want to use force store forward
eventhough tx_coe is not available in hw.
without this flag stmmac driver will use cut-thru mode not use
store-forward mode.
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Goal of this patch is to permit nfnetlink providers not mandate
nfnl_mutex being held while nfnetlink_rcv_msg() calls them.
If struct nfnl_callback contains a non NULL call_rcu(), then
nfnetlink_rcv_msg() will use it instead of call() field, holding
rcu_read_lock instead of nfnl_mutex
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Florian Westphal <fw@strlen.de>
CC: Eric Leblond <eric@regit.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
|
|
In the future dst entries will be neigh-less. In that environment we
need to have an easy transition point for current users of
dst->neighbour outside of the packet output fast path.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
dst_{get,set}_neighbour()
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It just makes it harder to see 1) what the code is doing
and 2) grep for all users of dst{->,.}neighbour
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This will get us closer to being able to do "neigh stuff"
completely independent of the underlying dst_entry for
protocols (ipv4/ipv6) that wish to do so.
We will also be able to make dst entries neigh-less.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
Bluetooth: Fix crash with incoming L2CAP connections
Bluetooth: Fix regression in L2CAP connection procedure
gianfar: rx parser
r6040: only disable RX interrupt if napi_schedule_prep is successful
net: remove NETIF_F_ALL_TX_OFFLOADS
net: sctp: fix checksum marking for outgoing packets
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
ACPI: Fixes device power states array overflow
ACPI, APEI, HEST, Detect duplicated hardware error source ID
ACPI: Fix lockdep false positives in acpi_power_off()
|
|
there is only one user of vlan_find_dev outside of the actual vlan code:
qlcnic uses it to iterate over some VLANs it knows.
let's just make vlan_find_dev private to the VLAN code and have the
iteration in qlcnic be a bit more direct. (a few rcu dereferences less
too)
Signed-off-by: David Lamparter <equinox@diac24.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Amit Kumar Salecha <amit.salecha@qlogic.com>
Cc: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Cc: linux-driver@qlogic.com
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
define ETH_P_8021AD to 88a8 (assigned by IEEE) and add ETH_P_QINQ{1,2,3}
for the pre-standard 9{1,2,3}00 types. all of them use 802.1q frame
format, with 1 bit used differently in some cases.
also define ETH_P_8021AH to 88e7 (assigned by IEEE). this is Mac-in-Mac
and uses a different, 16-byte header.
Signed-off-by: David Lamparter <equinox@diac24.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The fake SIGSTOP during attach has numerous problems. PTRACE_SEIZE
is already fine, but we have basically the same problems is SIGSTOP
is sent on auto-attach, the tracer can't know if this signal signal
should be cancelled or not.
Change ptrace_event() to set JOBCTL_TRAP_STOP if the new child is
PT_SEIZED, this triggers the PTRACE_EVENT_STOP report.
Thereafter a PT_SEIZED task can never report the bogus SIGSTOP.
Test-case:
#define PTRACE_SEIZE 0x4206
#define PTRACE_SEIZE_DEVEL 0x80000000
#define PTRACE_EVENT_STOP 7
#define WEVENT(s) ((s & 0xFF0000) >> 16)
int main(void)
{
int child, grand_child, status;
long message;
child = fork();
if (!child) {
kill(getpid(), SIGSTOP);
fork();
assert(0);
return 0x23;
}
assert(ptrace(PTRACE_SEIZE, child, 0,PTRACE_SEIZE_DEVEL) == 0);
assert(wait(&status) == child);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP);
assert(ptrace(PTRACE_SETOPTIONS, child, 0, PTRACE_O_TRACEFORK) == 0);
assert(ptrace(PTRACE_CONT, child, 0,0) == 0);
assert(waitpid(child, &status, 0) == child);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
assert(WEVENT(status) == PTRACE_EVENT_FORK);
assert(ptrace(PTRACE_GETEVENTMSG, child, 0, &message) == 0);
grand_child = message;
assert(waitpid(grand_child, &status, 0) == grand_child);
assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
assert(WEVENT(status) == PTRACE_EVENT_STOP);
kill(child, SIGKILL);
kill(grand_child, SIGKILL);
return 0;
}
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
|
|
If the new child is traced, do_fork() adds the pending SIGSTOP.
It assumes that either it is traced because of auto-attach or the
tracer attached later, in both cases sigaddset/set_thread_flag is
correct even if SIGSTOP is already pending.
Now that we have PTRACE_SEIZE this is no longer right in the latter
case. If the tracer does PTRACE_SEIZE after copy_process() makes the
child visible the queued SIGSTOP is wrong.
We could check PT_SEIZED bit and change ptrace_attach() to set both
PT_PTRACED and PT_SEIZED bits simultaneously but see the next patch,
we need to know whether this child was auto-attached or not anyway.
So this patch simply moves this code to ptrace_init_task(), this
way we can never race with ptrace_attach().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
|
|
new_child->jobctl is not initialized during the fork, it is copied
from parent->jobctl. Currently this is harmless, the forking task
is running and copy_process() can't succeed if signal_pending() is
true, so only JOBCTL_STOP_DEQUEUED can be copied. Still this is a
bit fragile, it would be more clean to set ->jobctl = 0 explicitly.
Also, check ->ptrace != 0 instead of PT_PTRACED, move the
CONFIG_HAVE_HW_BREAKPOINT code up.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
|
|
It is always dev_queue_xmit().
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It's just taking on one of two possible values, either
neigh_ops->output or dev_queue_xmit(). And this is purely depending
upon whether nud_state has NUD_CONNECTED set or not.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It's always dev_queue_xmit().
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add overview documentation in Documentation/ABI/stable/firewire-cdev.
Improve the inline reference documentation in firewire-cdev.h:
- Add /* available since kernel... */ comments to event numbers
consistent with the comments on ioctl numbers.
- Shorten some documentation on an event and an ioctl that are
less interesting to current programming because there are newer
preferable variants.
- Spell Configuration ROM (name of an IEEE 1212 register) in
upper case.
- Move the dummy FW_CDEV_VERSION out of the reader's field of
vision. We should remove it from the header next year or so.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
|
|
event queuing
Between open(2) of a /dev/fw* and the first FW_CDEV_IOC_GET_INFO
ioctl(2) on it, the kernel already queues FW_CDEV_EVENT_BUS_RESET events
to be read(2) by the client. The get_info ioctl is practically always
issued right away after open, hence this condition only occurs if the
client opens during a bus reset, especially during a rapid series of bus
resets.
The problem with this condition is twofold:
- These bus reset events carry the (as yet undocumented) @closure
value of 0. But it is not the kernel's place to choose closures;
they are privat to the client. E.g., this 0 value forced from the
kernel makes it unsafe for clients to dereference it as a pointer to
a closure object without NULL pointer check.
- It is impossible for clients to determine the relative order of bus
reset events from get_info ioctl(2) versus those from read(2),
except in one way: By comparison of closure values. Again, such a
procedure imposes complexity on clients and reduces freedom in use
of the bus reset closure.
So, change the ABI to suppress queuing of bus reset events before the
first FW_CDEV_IOC_GET_INFO ioctl was issued by the client.
Note, this ABI change cannot be version-controlled. The kernel cannot
distinguish old from new clients before the first FW_CDEV_IOC_GET_INFO
ioctl.
We will try to back-merge this change into currently maintained stable/
longterm series, and we only document the new behaviour. The old
behavior is now considered a kernel bug, which it basically is.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: <stable@kernel.org>
|
|
|
|
* pm-runtime:
OMAP: PM: disable idle on suspend for GPIO and UART
OMAP: PM: omap_device: add API to disable idle on suspend
OMAP: PM: omap_device: add system PM methods for PM domain handling
OMAP: PM: omap_device: conditionally use PM domain runtime helpers
PM / Runtime: Add new helper function: pm_runtime_status_suspended()
PM / Runtime: Consistent utilization of deferred_resume
PM / Runtime: Prevent runtime_resume from racing with probe
PM / Runtime: Replace "run-time" with "runtime" in documentation
PM / Runtime: Improve documentation of enable, disable and barrier
PM: Limit race conditions between runtime PM and system sleep (v2)
PCI / PM: Detect early wakeup in pci_pm_prepare()
PM / Runtime: Return special error code if runtime PM is disabled
PM / Runtime: Update documentation of interactions with system sleep
|
|
* pm-domains: (33 commits)
ARM / shmobile: Return -EBUSY from A4LC power off if A3RV is active
PM / Domains: Take .power_off() error code into account
ARM / shmobile: Use genpd_queue_power_off_work()
ARM / shmobile: Use pm_genpd_poweroff_unused()
PM / Domains: Introduce function to power off all unused PM domains
PM / Domains: Queue up power off work only if it is not pending
PM / Domains: Improve handling of wakeup devices during system suspend
PM / Domains: Do not restore all devices on power off error
PM / Domains: Allow callbacks to execute all runtime PM helpers
PM / Domains: Do not execute device callbacks under locks
PM / Domains: Make failing pm_genpd_prepare() clean up properly
PM / Domains: Set device state to "active" during system resume
ARM: mach-shmobile: sh7372 A3RV requires A4LC
PM / Domains: Export pm_genpd_poweron() in header
ARM: mach-shmobile: sh7372 late pm domain off
ARM: mach-shmobile: Runtime PM late init callback
ARM: mach-shmobile: sh7372 D4 support
ARM: mach-shmobile: sh7372 A4MP support
ARM: mach-shmobile: sh7372: make sure that fsi is peripheral of spu2
ARM: mach-shmobile: sh7372 A3SG support
...
|
|
A system or a device may need to control suspend/wakeup events. It may
want to wakeup the system after a predefined amount of time or at a
predefined event decided while entering suspend for polling or delayed
work. Then, it may want to enter suspend again if its predefined wakeup
condition is the only wakeup reason and there is no outstanding events;
thus, it does not wakeup the userspace unnecessary or unnecessary
devices and keeps suspended as long as possible (saving the power).
Enabling a system to wakeup after a specified time can be easily
achieved by using RTC. However, to enter suspend again immediately
without invoking userland and unrelated devices, we need additional
features in the suspend framework.
Such need comes from:
1. Monitoring a critical device status without interrupts that can
wakeup the system. (in-suspend polling)
An example is ambient temperature monitoring that needs to shut down
the system or a specific device function if it is too hot or cold. The
temperature of a specific device may be needed to be monitored as well;
e.g., a charger monitors battery temperature in order to stop charging
if overheated.
2. Execute critical "delayed work" at suspend.
A driver or a system/board may have a delayed work (or any similar
things) that it wants to execute at the requested time.
For example, some chargers want to check the battery voltage some
time (e.g., 30 seconds) after the battery is fully charged and the
charger has stopped. Then, the charger restarts charging if the voltage
has dropped more than a threshold, which is smaller than "restart-charger"
voltage, which is a threshold to restart charging regardless of the
time passed.
This patch allows to add "suspend_again" callback at struct
platform_suspend_ops and let the "suspend_again" callback return true if
the system is required to enter suspend again after the current instance
of wakeup. Device-wise suspend_again implemented at dev_pm_ops or
syscore is not done because: a) suspend_again feature is usually under
platform-wise decision and controls the behavior of the whole platform
and b) There are very limited devices related to the usage cases of
suspend_again; chargers and temperature sensors are mentioned so far.
With suspend_again callback registered at struct platform_suspend_ops
suspend_ops in kernel/power/suspend.c with suspend_set_ops by the
platform, the suspend framework tries to enter suspend again by
looping suspend_enter() if suspend_again has returned true and there has
been no errors in the suspending sequence or pending wakeups (by
pm_wakeup_pending).
Tested at Exynos4-NURI.
[rjw: Fixed up kerneldoc comment for suspend_enter().]
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
|
cpufreq table allocated by opp_init_cpufreq_table is better
freed by OPP layer itself. This allows future modifications to
the table handling to be transparent to the users.
Signed-off-by: Nishanth Menon <nm@ti.com>
Acked-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
|
There's no in-tree users, and bus notifiers are more generic anyway.
Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
|
|
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
Conflicts:
net/bluetooth/l2cap_core.c
|
|
There is no software fallback implemented for SCTP or FCoE checksumming,
and so it should not be passed on by software devices like bridge or bonding.
For VLAN devices, this is different. First, the driver for underlying device
should be prepared to get offloaded packets even when the feature is disabled
(especially if it advertises it in vlan_features). Second, devices under
VLANs do not get replaced without tearing down the VLAN first.
This fixes a mess I accidentally introduced while converting bonding to
ndo_fix_features.
NETIF_F_SOFT_FEATURES are removed from BOND_VLAN_FEATURES because they
are unused as of commit 712ae51afd.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove it, as it indirectly exposes netdev features. It's not used in
iproute2 (2.6.38) - is anything else using its interface?
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It is not used anywhere except net/core/dev.c now.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently the stack trace per event in ftace is only 8 frames.
This can be quite limiting and sometimes useless. Especially when
the "ignore frames" is wrong and we also use up stack frames for
the event processing itself.
Change this to be dynamic by adding a percpu buffer that we can
write a large stack frame into and then copy into the ring buffer.
For interrupts and NMIs that come in while another event is being
process, will only get to use the 8 frame stack. That should be enough
as the task that it interrupted will have the full stack frame anyway.
Requested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Make pd_power_down_a3rv() use genpd_queue_power_off_work() to queue
up the powering off of the A4LC domain to avoid queuing it up when
it is pending.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Magnus Damm <damm@opensource.se>
|
|
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
net/bluetooth/l2cap_core.c
|
|
Now that there is a one-to-one correspondance between neighbour
and hh_cache entries, we no longer need:
1) dynamic allocation
2) attachment to dst->hh
3) refcounting
Initialization of the hh_cache entry is indicated by hh_len
being non-zero, and such initialization is always done with
the neighbour's lock held as a writer.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
Commit 28c2103 added new state ACPI_STATE_D3_COLD, so the device power
states array must be expanded by one also.
v2: Use ACPI_D_STATE_COUNT instead of number 5 for the array size.
Reported-by: Dan Carpenter <error27@gmail.com>
Suggested-by: Oldřich Jedlička <oldium.pro@seznam.cz>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
mmc: core: Bus width testing needs to handle suspend/resume
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)
slip: fix wrong SLIP6 ifdef-endif placing
natsemi: fix another dma-debug report
sctp: ABORT if receive, reassmbly, or reodering queue is not empty while closing socket
net: Fix default in docs for tcp_orphan_retries.
hso: fix a use after free condition
net/natsemi: Fix module parameter permissions
XFRM: Fix memory leak in xfrm_state_update
sctp: Enforce retransmission limit during shutdown
mac80211: fix TKIP replay vulnerability
mac80211: fix ie memory allocation for scheduled scans
ssb: fix init regression of hostmode PCI core
rtlwifi: rtl8192cu: Add new USB ID for Netgear WNA1000M
ath9k: Fix tx throughput drops for AR9003 chips with AES encryption
carl9170: add NEC WL300NU-AG usbid
cfg80211: fix deadlock with rfkill/sched_scan by adding new mutex
ath5k: fix incorrect use of drvdata in PCI suspend/resume code
ath5k: fix incorrect use of drvdata in sysfs code
Bluetooth: Fix memory leak under page timeouts
Bluetooth: Fix regression with incoming L2CAP connections
Bluetooth: Fix hidp disconnect deadlocks and lost wakeup
...
|
|
On reading the ext_csd for the first time (in 1 bit mode), save the
ext_csd information needed for bus width compare.
On every pass we make re-reading the ext_csd, compare the data
against the saved ext_csd data.
This fixes a regression introduced in 3.0-rc1 by 08ee80cc397ac1a3
("mmc: core: eMMC bus width may not work on all platforms"), which
incorrectly assumed we would be re-reading the ext_csd at resume-
time.
Signed-off-by: Philip Rakity <prakity@marvell.com>
Tested-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
|
|
In WoWLAN, devices may use crypto keys for TX/RX
and could also implement GTK rekeying. If the
driver isn't able to retrieve replay counters and
similar information from the device upon resume,
or if the device isn't responsive due to platform
issues, it isn't safe to keep the connection up
as GTK rekey messages from during the sleep time
could be replayed against it.
The only protection against that is disconnecting
from the AP. Modifying mac80211 to do that while
it is resuming would be very complex and invasive
in the case that the driver requires a reconfig,
so do it after it has resumed completely. In that
case, however, packets might be replayed since it
can then only happen after TX/RX are up again, so
mark keys for interfaces that need to disconnect
as "tainted" and drop all packets that are sent
or received with those keys.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
All ACPICA locks are allocated by the same function,
acpi_os_create_lock(), with the help of a local variable called
"lock". Thus, when lockdep is enabled, it uses "lock" as the
name of all those locks and regards them as instances of the same
lock, which causes it to report possible locking problems with them
when there aren't any.
To work around this problem, define acpi_os_create_lock() as a macro
and make it pass its argument to spin_lock_init(), so that lockdep
uses it as the name of the new lock. Define this macron in a
Linux-specific file, to minimize the resulting modifications of
the OS-independent ACPICA parts.
This change is based on an earlier patch from Andrea Righi and it
addresses a regression from 2.6.39 tracked as
https://bugzilla.kernel.org/show_bug.cgi?id=38152
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Andrea Righi <andrea@betterlinux.com>
Reviewed-by: Florian Mickler <florian@mickler.org>
Signed-off-by: Len Brown <len.brown@intel.com>
|
|
Add a new function pm_genpd_poweroff_unused() queuing up the
execution of pm_genpd_poweroff() for every initialized generic PM
domain. Calling it will cause every generic PM domain without
devices in use to be powered off.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Magnus Damm <damm@opensource.se>
|
|
This never, ever, happens.
Neighbour entries are always tied to one address family, and therefore
one set of dst_ops, and therefore one dst_ops->protocol "hh_type"
value.
This capability was blindly imported by Alexey Kuznetsov when he wrote
the neighbour layer.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Get rid of all of the useless and costly indirection
by doing the neigh hash table lookup directly inside
of the neighbour binding.
Rename from arp_bind_neighbour to rt_bind_neighbour.
Use new helpers {__,}ipv4_neigh_lookup()
In rt_bind_neighbour() get rid of useless tests which
are never true in the context this function is called,
namely dev is never NULL and the dst->neighbour is
always NULL.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
SPARSEMEM w/o VMEMMAP and DISCONTIGMEM, both used only on 32bit, use
sections array to map pfn to nid which is limited in granularity. If
NUMA nodes are laid out such that the mapping cannot be accurate, boot
will fail triggering BUG_ON() in mminit_verify_page_links().
On 32bit, it's 512MiB w/ PAE and SPARSEMEM. This seems to have been
granular enough until commit 2706a0bf7b (x86, NUMA: Enable
CONFIG_AMD_NUMA on 32bit too). Apparently, there is a machine which
aligns NUMA nodes to 128MiB and has only AMD NUMA but not SRAT. This
led to the following BUG_ON().
On node 0 totalpages: 2096615
DMA zone: 32 pages used for memmap
DMA zone: 0 pages reserved
DMA zone: 3927 pages, LIFO batch:0
Normal zone: 1740 pages used for memmap
Normal zone: 220978 pages, LIFO batch:31
HighMem zone: 16405 pages used for memmap
HighMem zone: 1853533 pages, LIFO batch:31
BUG: Int 6: CR2 (null)
EDI (null) ESI 00000002 EBP 00000002 ESP c1543ecc
EBX f2400000 EDX 00000006 ECX (null) EAX 00000001
err (null) EIP c16209aa CS 00000060 flg 00010002
Stack: f2400000 00220000 f7200800 c1620613 00220000 01000000 04400000 00238000
(null) f7200000 00000002 f7200b58 f7200800 c1620929 000375fe (null)
f7200b80 c16395f0 00200a02 f7200a80 (null) 000375fe 00000002 (null)
Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00181-g2706a0b #17
Call Trace:
[<c136b1e5>] ? early_fault+0x2e/0x2e
[<c16209aa>] ? mminit_verify_page_links+0x12/0x42
[<c1620613>] ? memmap_init_zone+0xaf/0x10c
[<c1620929>] ? free_area_init_node+0x2b9/0x2e3
[<c1607e99>] ? free_area_init_nodes+0x3f2/0x451
[<c1601d80>] ? paging_init+0x112/0x118
[<c15f578d>] ? setup_arch+0x791/0x82f
[<c15f43d9>] ? start_kernel+0x6a/0x257
This patch implements node_map_pfn_alignment() which determines
maximum internode alignment and update numa_register_memblks() to
reject NUMA configuration if alignment exceeds the pfn -> nid mapping
granularity of the memory model as determined by PAGES_PER_SECTION.
This makes the problematic machine boot w/ flatmem by rejecting the
NUMA config and provides protection against crazy NUMA configurations.
Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20110712074534.GB2872@htj.dyndns.org
LKML-Reference: <20110628174613.GP478@escobedo.osrc.amd.com>
Reported-and-Tested-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Conny Seidel <conny.seidel@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/mm: Fix memory_block_size_bytes() for non-pseries
mm: Move definition of MIN_MEMORY_BLOCK_SIZE to a header
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc:
pcmcia: pxa2xx/vpac270: free gpios on exist rather than requesting
ARM: pxa/raumfeld: fix device name for codec ak4104
ARM: pxa/raumfeld: display initialisation fixes
ARM: pxa/raumfeld: adapt to upcoming hardware change
ARM: pxa: fix gpio_to_chip() clash with gpiolib namespace
genirq: replace irq_gc_ack() with {set,clr}_bit variants (fwd)
arm: mach-vt8500: add forgotten irq_data conversion
ARM: pxa168: correct nand pmu setting
ARM: pxa910: correct nand pmu setting
ARM: pxa: fix PGSR register address calculation
|