Age | Commit message (Collapse) | Author | Files | Lines |
|
Daniel Borkmann says:
====================
pull-request: bpf-next 2020-08-04
The following pull-request contains BPF updates for your *net-next* tree.
We've added 73 non-merge commits during the last 9 day(s) which contain
a total of 135 files changed, 4603 insertions(+), 1013 deletions(-).
The main changes are:
1) Implement bpf_link support for XDP. Also add LINK_DETACH operation for the BPF
syscall allowing processes with BPF link FD to force-detach, from Andrii Nakryiko.
2) Add BPF iterator for map elements and to iterate all BPF programs for efficient
in-kernel inspection, from Yonghong Song and Alexei Starovoitov.
3) Separate bpf_get_{stack,stackid}() helpers for perf events in BPF to avoid
unwinder errors, from Song Liu.
4) Allow cgroup local storage map to be shared between programs on the same
cgroup. Also extend BPF selftests with coverage, from YiFei Zhu.
5) Add BPF exception tables to ARM64 JIT in order to be able to JIT BPF_PROBE_MEM
load instructions, from Jean-Philippe Brucker.
6) Follow-up fixes on BPF socket lookup in combination with reuseport group
handling. Also add related BPF selftests, from Jakub Sitnicki.
7) Allow to use socket storage in BPF_PROG_TYPE_CGROUP_SOCK-typed programs for
socket create/release as well as bind functions, from Stanislav Fomichev.
8) Fix an info leak in xsk_getsockopt() when retrieving XDP stats via old struct
xdp_statistics, from Peilin Ye.
9) Fix PT_REGS_RC{,_CORE}() macros in libbpf for MIPS arch, from Jerry Crunchtime.
10) Extend BPF kernel test infra with skb->family and skb->{local,remote}_ip{4,6}
fields and allow user space to specify skb->dev via ifindex, from Dmitry Yakunin.
11) Fix a bpftool segfault due to missing program type name and make it more robust
to prevent them in future gaps, from Quentin Monnet.
12) Consolidate cgroup helper functions across selftests and fix a v6 localhost
resolver issue, from John Fastabend.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A later patch will refuse to set the action of certain traps in mlxsw
and also to change the policer binding of certain groups. Pass extack so
that failure could be communicated clearly to user space.
Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that BPF program/link management is centralized in generic net_device
code, kernel code never queries program id from drivers, so
XDP_QUERY_PROG/XDP_QUERY_PROG_HW commands are unnecessary.
This patch removes all the implementations of those commands in kernel, along
the xdp_attachment_query().
This patch was compile-tested on allyesconfig.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-10-andriin@fb.com
|
|
The UDP reuseport conflict was a little bit tricky.
The net-next code, via bpf-next, extracted the reuseport handling
into a helper so that the BPF sk lookup code could invoke it.
At the same time, the logic for reuseport handling of unconnected
sockets changed via commit efc6b6f6c3113e8b203b9debfb72d81e0f3dcace
which changed the logic to carry on the reuseport result into the
rest of the lookup loop if we do not return immediately.
This requires moving the reuseport_has_conns() logic into the callers.
While we are here, get rid of inline directives as they do not belong
in foo.c files.
The other changes were cases of more straightforward overlapping
modifications.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the nsim_create(), rtnl_lock() is called before nsim_bpf_init().
If nsim_bpf_init() is failed, rtnl_unlock() should be called,
but it isn't called.
So, unbalanced locking would occur.
Fixes: e05b2d141fef ("netdevsim: move netdev creation/destruction to dev probe")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add UDP tunnel port handlers to our fake driver so we can test
the core infra.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, devlink_port_attrs_set accepts a long list of parameters,
that most of them are devlink port's attributes.
Use the devlink_port_attrs struct to replace the relevant parameters.
Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Register two control traps with devlink. The existing selftest at
tools/testing/selftests/drivers/net/netdevsim/devlink_trap.sh iterates
over all registered traps and checks that the action of non-drop traps
cannot be changed. Up until now only exception traps were tested, now
control traps will be tested as well.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The layer 3 exceptions are still subject to the same trap policer, so
nothing changes, but user space can choose to assign a different one.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In case the policer drop counter is retrieved when the jiffies value is
a multiple of 64, the counter will not be incremented.
This randomly breaks a selftest [1] the reads the counter twice and
checks that it was incremented:
```
TEST: Trap policer [FAIL]
Policer drop counter was not incremented
```
Fix by always incrementing the counter by 1.
[1] tools/testing/selftests/drivers/net/netdevsim/devlink_trap.sh
Fixes: ad188458d012 ("netdevsim: Add devlink-trap policer support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In case memory resources for dummy_data were allocated, release them
before return.
Addresses-Coverity-ID: 1491997 ("Resource leak")
Fixes: 7ef19d3b1d5e ("devlink: report error once U32_MAX snapshot ids have been used")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a dummy callback to set trap group parameters. Return an error when
the 'fail_trap_group_set' debugfs file is set in order to exercise error
paths and verify that error is propagated to user space when should.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Packet trap groups are used to aggregate logically related packet traps.
Currently, these groups allow user space to batch operations such as
setting the trap action of all member traps.
In order to prevent the CPU from being overwhelmed by too many trapped
packets, it is desirable to bind a packet trap policer to these groups.
For example, to limit all the packets that encountered an exception
during routing to 10Kpps.
Allow device drivers to bind default packet trap policers to packet trap
groups when the latter are registered with devlink.
The next patch will enable user space to change this default binding.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Register three dummy packet trap policers with devlink and implement
callbacks to change their parameters and read their counters.
This will be used later on in the series to test the devlink-trap
policer infrastructure.
v2:
* Remove check about burst size being a power of 2 and instead add a
debugfs knob to fail the operation
* Provide max/min rate/burst size when registering policers and remove
the validity checks from nsim_dev_devlink_trap_policer_set()
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When health reporter is registered to devlink, devlink will implicitly set
auto recover if and only if the reporter has a recover method. No reason
to explicitly get the auto recover flag from the driver.
Remove this flag from all drivers that called
devlink_health_reporter_create.
All existing health reporters set auto recovery to true if they have a
recover method.
Yet, administrator can unset auto recover via netlink command as prior to
this patch.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Health reporters should be registered with auto recover set to true.
Align dummy reporter behaviour with that, as in later patch the option to
set auto recover behaviour will be removed.
In addition, align netdevsim selftest to the new default value.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implement the .snapshot region operation for the dummy data region. This
enables a region snapshot to be taken upon request via the new
DEVLINK_CMD_REGION_SNAPSHOT command.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Each snapshot created for a devlink region must have an id. These ids
are supposed to be unique per "event" that caused the snapshot to be
created. Drivers call devlink_region_snapshot_id_get to obtain a new id
to use for a new event trigger. The id values are tracked per devlink,
so that the same id number can be used if a triggering event creates
multiple snapshots on different regions.
There is no mechanism for snapshot ids to ever be reused. Introduce an
xarray to store the count of how many snapshots are using a given id,
replacing the snapshot_id field previously used for picking the next id.
The devlink_region_snapshot_id_get() function will use xa_alloc to
insert an initial value of 1 value at an available slot between 0 and
U32_MAX.
The new __devlink_snapshot_id_increment() and
__devlink_snapshot_id_decrement() functions will be used to track how
many snapshots currently use an id.
Drivers must now call devlink_snapshot_id_put() in order to release
their reference of the snapshot id after adding region snapshots.
By tracking the total number of snapshots using a given id, it is
possible for the decrement() function to erase the id from the xarray
when it is not in use.
With this method, a snapshot id can become reused again once all
snapshots that referred to it have been deleted via
DEVLINK_CMD_REGION_DEL, and the driver has finished adding snapshots.
This work also paves the way to introduce a mechanism for userspace to
request a snapshot.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The devlink_snapshot_id_get() function returns a snapshot id. The
snapshot id is a u32, so there is no way to indicate an error code.
A future change is going to possibly add additional cases where this
function could fail. Refactor the function to return the snapshot id in
an argument, so that it can return zero or an error value.
This ensures that snapshot ids cannot be confused with error values, and
aids in the future refactor of snapshot id allocation management.
Because there is no current way to release previously used snapshot ids,
add a simple check ensuring that an error is reported in case the
snapshot_id would over flow.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It does not makes sense that two snapshots for a given region would use
different destructors. Simplify snapshot creation by adding
a .destructor op for regions.
This operation will replace the data_destructor for the snapshot
creation, and makes snapshot creation easier.
Noticed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Modify the devlink region code in preparation for adding new operations
on regions.
Create a devlink_region_ops structure, and move the name pointer from
within the devlink_region structure into the ops structure (similar to
the devlink_health_reporter_ops).
This prepares the regions to enable support of additional operations in
the future such as requesting snapshots, or accessing the region
directly without a snapshot.
In order to re-use the constant strings in the mlx4 driver their
declaration must be changed to 'const char * const' to ensure the
compiler realizes that both the data and the pointer cannot change.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Overlapping header include additions in macsec.c
A bug fix in 'net' overlapping with the removal of 'version'
string in ena_netdev.c
Overlapping test additions in selftests Makefile
Overlapping PCI ID table adjustments in iwlwifi driver.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Packet trap groups are now explicitly registered by drivers and not
implicitly registered when the packet traps are registered. Therefore,
there is no need to encode entire group structure the trap is associated
with inside the trap structure.
Instead, only pass the group identifier. Refer to it as initial group
identifier, as future patches will allow user space to move traps
between groups.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use the previously added API to explicitly register / unregister
supported packet trap groups. This is in preparation for future patches
that will enable drivers to pass additional group attributes, such as
associated policer identifier.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since snprintf() returns the would-be-output size instead of the
actual output size, the succeeding calls may go beyond the given
buffer limit. Fix it by replacing with scnprintf().
Cc: "David S . Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add new trap ACL which reports flow action cookie in a metadata. Allow
used to setup the cookie using debugfs file.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add cookie argument to devlink_trap_report() allowing driver to pass in
the user cookie. Pass on the cookie down to drop monitor code.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
drivers/net/netdevsim/dev.c:937:1-3: WARNING: PTR_ERR_OR_ZERO can be used
Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
Generated by: scripts/coccinelle/api/ptr_ret.cocci
Fixes: 6556ff32f12d ("netdevsim: use IS_ERR instead of IS_ERR_OR_NULL for debugfs")
CC: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: kbuild test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
sdev.c code is merged into dev.c and is not used anymore.
it would be removed.
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
vfnum buffer size and binary_len buffer size is received by user-space.
So, this buffer size could be too large. If so, kmalloc will internally
print a warning message.
This warning message is actually not necessary for the netdevsim module.
So, this patch adds __GFP_NOWARN.
Test commands:
modprobe netdevsim
echo 1 > /sys/bus/netdevsim/new_device
echo 1000000000 > /sys/devices/netdevsim1/sriov_numvfs
Splat looks like:
[ 357.847266][ T1000] WARNING: CPU: 0 PID: 1000 at mm/page_alloc.c:4738 __alloc_pages_nodemask+0x2f3/0x740
[ 357.850273][ T1000] Modules linked in: netdevsim veth openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrx
[ 357.852989][ T1000] CPU: 0 PID: 1000 Comm: bash Tainted: G B 5.5.0-rc5+ #270
[ 357.854334][ T1000] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 357.855703][ T1000] RIP: 0010:__alloc_pages_nodemask+0x2f3/0x740
[ 357.856669][ T1000] Code: 64 fe ff ff 65 48 8b 04 25 c0 0f 02 00 48 05 f0 12 00 00 41 be 01 00 00 00 49 89 47 0
[ 357.860272][ T1000] RSP: 0018:ffff8880b7f47bd8 EFLAGS: 00010246
[ 357.861009][ T1000] RAX: ffffed1016fe8f80 RBX: 1ffff11016fe8fae RCX: 0000000000000000
[ 357.861843][ T1000] RDX: 0000000000000000 RSI: 0000000000000017 RDI: 0000000000000000
[ 357.862661][ T1000] RBP: 0000000000040dc0 R08: 1ffff11016fe8f67 R09: dffffc0000000000
[ 357.863509][ T1000] R10: ffff8880b7f47d68 R11: fffffbfff2798180 R12: 1ffff11016fe8f80
[ 357.864355][ T1000] R13: 0000000000000017 R14: 0000000000000017 R15: ffff8880c2038d68
[ 357.865178][ T1000] FS: 00007fd9a5b8c740(0000) GS:ffff8880d9c00000(0000) knlGS:0000000000000000
[ 357.866248][ T1000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 357.867531][ T1000] CR2: 000055ce01ba8100 CR3: 00000000b7dbe005 CR4: 00000000000606f0
[ 357.868972][ T1000] Call Trace:
[ 357.869423][ T1000] ? lock_contended+0xcd0/0xcd0
[ 357.870001][ T1000] ? __alloc_pages_slowpath+0x21d0/0x21d0
[ 357.870673][ T1000] ? _kstrtoull+0x76/0x160
[ 357.871148][ T1000] ? alloc_pages_current+0xc1/0x1a0
[ 357.871704][ T1000] kmalloc_order+0x22/0x80
[ 357.872184][ T1000] kmalloc_order_trace+0x1d/0x140
[ 357.872733][ T1000] __kmalloc+0x302/0x3a0
[ 357.873204][ T1000] nsim_bus_dev_numvfs_store+0x1ab/0x260 [netdevsim]
[ 357.873919][ T1000] ? kernfs_get_active+0x12c/0x180
[ 357.874459][ T1000] ? new_device_store+0x450/0x450 [netdevsim]
[ 357.875111][ T1000] ? kernfs_get_parent+0x70/0x70
[ 357.875632][ T1000] ? sysfs_file_ops+0x160/0x160
[ 357.876152][ T1000] kernfs_fop_write+0x276/0x410
[ 357.876680][ T1000] ? __sb_start_write+0x1ba/0x2e0
[ 357.877225][ T1000] vfs_write+0x197/0x4a0
[ 357.877671][ T1000] ksys_write+0x141/0x1d0
[ ... ]
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Fixes: 79579220566c ("netdevsim: add SR-IOV functionality")
Fixes: 82c93a87bf8b ("netdevsim: implement couple of testing devlink health reporters")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Debugfs APIs return valid pointer or error pointer. it doesn't return NULL.
So, using IS_ERR is enough, not using IS_ERR_OR_NULL.
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When netdevsim dev is being created, a debugfs directory is created.
The variable "dev_ddir_name" is 16bytes device name pointer and device
name is "netdevsim<dev id>".
The maximum dev id length is 10.
So, 16bytes for device name isn't enough.
Test commands:
modprobe netdevsim
echo "1000000000 0" > /sys/bus/netdevsim/new_device
Splat looks like:
[ 249.622710][ T900] BUG: KASAN: stack-out-of-bounds in number+0x824/0x880
[ 249.623658][ T900] Write of size 1 at addr ffff88804c527988 by task bash/900
[ 249.624521][ T900]
[ 249.624830][ T900] CPU: 1 PID: 900 Comm: bash Not tainted 5.5.0+ #322
[ 249.625691][ T900] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 249.626712][ T900] Call Trace:
[ 249.627103][ T900] dump_stack+0x96/0xdb
[ 249.627639][ T900] ? number+0x824/0x880
[ 249.628173][ T900] print_address_description.constprop.5+0x1be/0x360
[ 249.629022][ T900] ? number+0x824/0x880
[ 249.629569][ T900] ? number+0x824/0x880
[ 249.630105][ T900] __kasan_report+0x12a/0x170
[ 249.630717][ T900] ? number+0x824/0x880
[ 249.631201][ T900] kasan_report+0xe/0x20
[ 249.631723][ T900] number+0x824/0x880
[ 249.632235][ T900] ? put_dec+0xa0/0xa0
[ 249.632716][ T900] ? rcu_read_lock_sched_held+0x90/0xc0
[ 249.633392][ T900] vsnprintf+0x63c/0x10b0
[ 249.633983][ T900] ? pointer+0x5b0/0x5b0
[ 249.634543][ T900] ? mark_lock+0x11d/0xc40
[ 249.635200][ T900] sprintf+0x9b/0xd0
[ 249.635750][ T900] ? scnprintf+0xe0/0xe0
[ 249.636370][ T900] nsim_dev_probe+0x63c/0xbf0 [netdevsim]
[ ... ]
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Fixes: ab1d0cc004d7 ("netdevsim: change debugfs tree topology")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
nsim_dev_take_snapshot_write() uses nsim_dev and nsim_dev->dummy_region.
So, during this function, these data shouldn't be removed.
But there is no protecting stuff in this function.
There are two similar cases.
1. reload case
reload could be called during nsim_dev_take_snapshot_write().
When reload is being executed, nsim_dev_reload_down() is called and it
calls nsim_dev_reload_destroy(). nsim_dev_reload_destroy() calls
devlink_region_destroy() to destroy nsim_dev->dummy_region.
So, during nsim_dev_take_snapshot_write(), nsim_dev->dummy_region()
would be removed.
At this point, snapshot_write() would access freed pointer.
In order to fix this case, take_snapshot file will be removed before
devlink_region_destroy().
The take_snapshot file will be re-created by ->reload_up().
2. del_device_store case
del_device_store() also could call nsim_dev_reload_destroy()
during nsim_dev_take_snapshot_write(). If so, panic would occur.
This problem is actually the same problem with the first case.
So, this problem will be fixed by the first case's solution.
Test commands:
modprobe netdevsim
while :
do
echo 1 > /sys/bus/netdevsim/new_device &
echo 1 > /sys/bus/netdevsim/del_device &
devlink dev reload netdevsim/netdevsim1 &
echo 1 > /sys/kernel/debug/netdevsim/netdevsim1/take_snapshot &
done
Splat looks like:
[ 45.564513][ T975] general protection fault, probably for non-canonical address 0xdffffc000000003a: 0000 [#1] SMP DEI
[ 45.566131][ T975] KASAN: null-ptr-deref in range [0x00000000000001d0-0x00000000000001d7]
[ 45.566135][ T975] CPU: 1 PID: 975 Comm: bash Not tainted 5.5.0+ #322
[ 45.569020][ T975] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 45.569026][ T975] RIP: 0010:__mutex_lock+0x10a/0x14b0
[ 45.570518][ T975] Code: 08 84 d2 0f 85 7f 12 00 00 44 8b 0d 10 23 65 02 45 85 c9 75 29 49 8d 7f 68 48 b8 00 00 00 0f
[ 45.570522][ T975] RSP: 0018:ffff888046ccfbf0 EFLAGS: 00010206
[ 45.572305][ T975] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 45.572308][ T975] RDX: 000000000000003a RSI: ffffffffac926440 RDI: 00000000000001d0
[ 45.576843][ T975] RBP: ffff888046ccfd70 R08: ffffffffab610645 R09: 0000000000000000
[ 45.576847][ T975] R10: ffff888046ccfd90 R11: ffffed100d6360ad R12: 0000000000000000
[ 45.578471][ T975] R13: dffffc0000000000 R14: ffffffffae1976c0 R15: 0000000000000168
[ 45.578475][ T975] FS: 00007f614d6e7740(0000) GS:ffff88806c400000(0000) knlGS:0000000000000000
[ 45.581492][ T975] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 45.582942][ T975] CR2: 00005618677d1cf0 CR3: 000000005fb9c002 CR4: 00000000000606e0
[ 45.584543][ T975] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 45.586633][ T975] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 45.589889][ T975] Call Trace:
[ 45.591445][ T975] ? devlink_region_snapshot_create+0x55/0x4a0
[ 45.601250][ T975] ? mutex_lock_io_nested+0x1380/0x1380
[ 45.602817][ T975] ? mutex_lock_io_nested+0x1380/0x1380
[ 45.603875][ T975] ? mark_held_locks+0xa5/0xe0
[ 45.604769][ T975] ? _raw_spin_unlock_irqrestore+0x2d/0x50
[ 45.606147][ T975] ? __mutex_unlock_slowpath+0xd0/0x670
[ 45.607723][ T975] ? crng_backtrack_protect+0x80/0x80
[ 45.613530][ T975] ? wait_for_completion+0x390/0x390
[ 45.615152][ T975] ? devlink_region_snapshot_create+0x55/0x4a0
[ 45.616834][ T975] devlink_region_snapshot_create+0x55/0x4a0
[ ... ]
Fixes: 4418f862d675 ("netdevsim: implement support for devlink region and snapshots")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
devlink reload destroys resources and allocates resources again.
So, when devices and ports resources are being used, devlink reload
function should not be executed. In order to avoid this race, a new
lock is added and new_port() and del_port() call devlink_reload_disable()
and devlink_reload_enable().
Thread0 Thread1
{new/del}_port() {new/del}_port()
devlink_reload_disable()
devlink_reload_disable()
devlink_reload_enable()
//here
devlink_reload_enable()
Before Thread1's devlink_reload_enable(), the devlink is already allowed
to execute reload because Thread0 allows it. devlink reload disable/enable
variable type is bool. So the above case would exist.
So, disable/enable should be executed atomically.
In order to do that, a new lock is used.
Test commands:
modprobe netdevsim
echo 1 > /sys/bus/netdevsim/new_device
while :
do
echo 1 > /sys/devices/netdevsim1/new_port &
echo 1 > /sys/devices/netdevsim1/del_port &
devlink dev reload netdevsim/netdevsim1 &
done
Splat looks like:
[ 23.342145][ T932] DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock))
[ 23.342159][ T932] WARNING: CPU: 0 PID: 932 at kernel/locking/mutex-debug.c:103 mutex_destroy+0xc7/0xf0
[ 23.344182][ T932] Modules linked in: netdevsim openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_dx
[ 23.346485][ T932] CPU: 0 PID: 932 Comm: devlink Not tainted 5.5.0+ #322
[ 23.347696][ T932] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 23.348893][ T932] RIP: 0010:mutex_destroy+0xc7/0xf0
[ 23.349505][ T932] Code: e0 07 83 c0 03 38 d0 7c 04 84 d2 75 2e 8b 05 00 ac b0 02 85 c0 75 8b 48 c7 c6 00 5e 07 96 40
[ 23.351887][ T932] RSP: 0018:ffff88806208f810 EFLAGS: 00010286
[ 23.353963][ T932] RAX: dffffc0000000008 RBX: ffff888067f6f2c0 RCX: ffffffff942c4bd4
[ 23.355222][ T932] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff96dac5b4
[ 23.356169][ T932] RBP: ffff888067f6f000 R08: fffffbfff2d235a5 R09: fffffbfff2d235a5
[ 23.357160][ T932] R10: 0000000000000001 R11: fffffbfff2d235a4 R12: ffff888067f6f208
[ 23.358288][ T932] R13: ffff88806208fa70 R14: ffff888067f6f000 R15: ffff888069ce3800
[ 23.359307][ T932] FS: 00007fe2a3876740(0000) GS:ffff88806c000000(0000) knlGS:0000000000000000
[ 23.360473][ T932] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 23.361319][ T932] CR2: 00005561357aa000 CR3: 000000005227a006 CR4: 00000000000606f0
[ 23.362323][ T932] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 23.363417][ T932] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 23.364414][ T932] Call Trace:
[ 23.364828][ T932] nsim_dev_reload_destroy+0x77/0xb0 [netdevsim]
[ 23.365655][ T932] nsim_dev_reload_down+0x84/0xb0 [netdevsim]
[ 23.366433][ T932] devlink_reload+0xb1/0x350
[ 23.367010][ T932] genl_rcv_msg+0x580/0xe90
[ ...]
[ 23.531729][ T1305] kernel BUG at lib/list_debug.c:53!
[ 23.532523][ T1305] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 23.533467][ T1305] CPU: 2 PID: 1305 Comm: bash Tainted: G W 5.5.0+ #322
[ 23.534962][ T1305] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 23.536503][ T1305] RIP: 0010:__list_del_entry_valid+0xe6/0x150
[ 23.538346][ T1305] Code: 89 ea 48 c7 c7 00 73 1e 96 e8 df f7 4c ff 0f 0b 48 c7 c7 60 73 1e 96 e8 d1 f7 4c ff 0f 0b 44
[ 23.541068][ T1305] RSP: 0018:ffff888047c27b58 EFLAGS: 00010282
[ 23.542001][ T1305] RAX: 0000000000000054 RBX: ffff888067f6f318 RCX: 0000000000000000
[ 23.543051][ T1305] RDX: 0000000000000054 RSI: 0000000000000008 RDI: ffffed1008f84f61
[ 23.544072][ T1305] RBP: ffff88804aa0fca0 R08: ffffed100d940539 R09: ffffed100d940539
[ 23.545085][ T1305] R10: 0000000000000001 R11: ffffed100d940538 R12: ffff888047c27cb0
[ 23.546422][ T1305] R13: ffff88806208b840 R14: ffffffff981976c0 R15: ffff888067f6f2c0
[ 23.547406][ T1305] FS: 00007f76c0431740(0000) GS:ffff88806c800000(0000) knlGS:0000000000000000
[ 23.548527][ T1305] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 23.549389][ T1305] CR2: 00007f5048f1a2f8 CR3: 000000004b310006 CR4: 00000000000606e0
[ 23.550636][ T1305] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 23.551578][ T1305] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 23.552597][ T1305] Call Trace:
[ 23.553004][ T1305] mutex_remove_waiter+0x101/0x520
[ 23.553646][ T1305] __mutex_lock+0xac7/0x14b0
[ 23.554218][ T1305] ? nsim_dev_port_del+0x4e/0x140 [netdevsim]
[ 23.554908][ T1305] ? mutex_lock_io_nested+0x1380/0x1380
[ 23.555570][ T1305] ? _parse_integer+0xf0/0xf0
[ 23.556043][ T1305] ? kstrtouint+0x86/0x110
[ 23.556504][ T1305] ? nsim_dev_port_del+0x4e/0x140 [netdevsim]
[ 23.557133][ T1305] nsim_dev_port_del+0x4e/0x140 [netdevsim]
[ 23.558024][ T1305] del_port_store+0xcc/0xf0 [netdevsim]
[ ... ]
Fixes: 75ba029f3c07 ("netdevsim: implement proper devlink reload")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When module is being initialized, __init() calls bus_register() and
driver_register().
These functions internally create various resources and sysfs files.
The sysfs files are used for basic operations(add/del device).
/sys/bus/netdevsim/new_device
/sys/bus/netdevsim/del_device
These sysfs files use netdevsim resources, they are mostly allocated
and initialized in ->probe() function, which is nsim_dev_probe().
But, sysfs files could be executed before ->probe() is finished.
So, accessing uninitialized data would occur.
Another problem is very similar.
/sys/bus/netdevsim/new_device internally creates sysfs files.
/sys/devices/netdevsim<id>/new_port
/sys/devices/netdevsim<id>/del_port
These sysfs files also use netdevsim resources, they are mostly allocated
and initialized in creating device routine, which is nsim_bus_dev_new().
But they also could be executed before nsim_bus_dev_new() is finished.
So, accessing uninitialized data would occur.
To fix these problems, this patch adds flags, which means whether the
operation is finished or not.
The flag variable 'nsim_bus_enable' means whether netdevsim bus was
initialized or not.
This is protected by nsim_bus_dev_list_lock.
The flag variable 'nsim_bus_dev->init' means whether nsim_bus_dev was
initialized or not.
This could be used in {new/del}_port_store() with no lock.
Test commands:
#SHELL1
modprobe netdevsim
while :
do
echo "1 1" > /sys/bus/netdevsim/new_device
echo "1 1" > /sys/bus/netdevsim/del_device
done
#SHELL2
while :
do
echo 1 > /sys/devices/netdevsim1/new_port
echo 1 > /sys/devices/netdevsim1/del_port
done
Splat looks like:
[ 47.508954][ T1008] general protection fault, probably for non-canonical address 0xdffffc0000000021: 0000 I
[ 47.510793][ T1008] KASAN: null-ptr-deref in range [0x0000000000000108-0x000000000000010f]
[ 47.511963][ T1008] CPU: 2 PID: 1008 Comm: bash Not tainted 5.5.0+ #322
[ 47.512823][ T1008] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 47.514041][ T1008] RIP: 0010:__mutex_lock+0x10a/0x14b0
[ 47.514699][ T1008] Code: 08 84 d2 0f 85 7f 12 00 00 44 8b 0d 10 23 65 02 45 85 c9 75 29 49 8d 7f 68 48 b8 00 00 00 0f
[ 47.517163][ T1008] RSP: 0018:ffff888059b4fbb0 EFLAGS: 00010206
[ 47.517802][ T1008] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 47.518941][ T1008] RDX: 0000000000000021 RSI: ffffffff85926440 RDI: 0000000000000108
[ 47.519732][ T1008] RBP: ffff888059b4fd30 R08: ffffffffc073fad0 R09: 0000000000000000
[ 47.520729][ T1008] R10: ffff888059b4fd50 R11: ffff88804bb38040 R12: 0000000000000000
[ 47.521702][ T1008] R13: dffffc0000000000 R14: ffffffff871976c0 R15: 00000000000000a0
[ 47.522760][ T1008] FS: 00007fd4be05a740(0000) GS:ffff88806c800000(0000) knlGS:0000000000000000
[ 47.523877][ T1008] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 47.524627][ T1008] CR2: 0000561c82b69cf0 CR3: 0000000065dd6004 CR4: 00000000000606e0
[ 47.527662][ T1008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 47.528604][ T1008] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 47.529531][ T1008] Call Trace:
[ 47.529874][ T1008] ? nsim_dev_port_add+0x50/0x150 [netdevsim]
[ 47.530470][ T1008] ? mutex_lock_io_nested+0x1380/0x1380
[ 47.531018][ T1008] ? _kstrtoull+0x76/0x160
[ 47.531449][ T1008] ? _parse_integer+0xf0/0xf0
[ 47.531874][ T1008] ? kernfs_fop_write+0x1cf/0x410
[ 47.532330][ T1008] ? sysfs_file_ops+0x160/0x160
[ 47.532773][ T1008] ? kstrtouint+0x86/0x110
[ 47.533168][ T1008] ? nsim_dev_port_add+0x50/0x150 [netdevsim]
[ 47.533721][ T1008] nsim_dev_port_add+0x50/0x150 [netdevsim]
[ 47.534336][ T1008] ? sysfs_file_ops+0x160/0x160
[ 47.534858][ T1008] new_port_store+0x99/0xb0 [netdevsim]
[ 47.535439][ T1008] ? del_port_store+0xb0/0xb0 [netdevsim]
[ 47.536035][ T1008] ? sysfs_file_ops+0x112/0x160
[ 47.536544][ T1008] ? sysfs_kf_write+0x3b/0x180
[ 47.537029][ T1008] kernfs_fop_write+0x276/0x410
[ 47.537548][ T1008] ? __sb_start_write+0x215/0x2e0
[ 47.538110][ T1008] vfs_write+0x197/0x4a0
[ ... ]
Fixes: f9d9db47d3ba ("netdevsim: add bus attributes to add new and delete devices")
Fixes: 794b2c05ca1c ("netdevsim: extend device attrs to support port addition and deletion")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
|
|
It seems nsim_fib6_rt_create() intent was to return
either a valid pointer or an embedded error code.
BUG: unable to handle page fault for address: fffffffffffffff4
PGD 9870067 P4D 9870067 PUD 9872067 PMD 0
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 22851 Comm: syz-executor.1 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:jhash2 include/linux/jhash.h:125 [inline]
RIP: 0010:rhashtable_jhash2+0x76/0x2c0 lib/rhashtable.c:963
Code: b9 00 00 00 00 00 fc ff df 48 c1 e8 03 0f b6 14 08 4c 89 f0 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 30 02 00 00 49 8d 7e 04 <41> 8b 06 48 be 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 0f b6
RSP: 0018:ffffc90016127190 EFLAGS: 00010246
RAX: 0000000000000007 RBX: 00000000dfb3ab49 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: ffffffff839ba7c8 RDI: fffffffffffffff8
RBP: ffffc900161271c0 R08: ffff8880951f8640 R09: ffffed1015d0703d
R10: ffffed1015d0703c R11: ffff8880ae8381e3 R12: 00000000dfb3ab49
R13: 00000000dfb3ab49 R14: fffffffffffffff4 R15: 0000000000000007
FS: 00007f40bfbc6700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffffffffffffff4 CR3: 0000000093660000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
rht_key_get_hash include/linux/rhashtable.h:133 [inline]
rht_key_hashfn include/linux/rhashtable.h:159 [inline]
rht_head_hashfn include/linux/rhashtable.h:174 [inline]
__rhashtable_insert_fast.constprop.0+0xe15/0x1180 include/linux/rhashtable.h:723
rhashtable_insert_fast include/linux/rhashtable.h:832 [inline]
nsim_fib6_rt_add drivers/net/netdevsim/fib.c:603 [inline]
nsim_fib6_rt_insert drivers/net/netdevsim/fib.c:658 [inline]
nsim_fib6_event drivers/net/netdevsim/fib.c:719 [inline]
nsim_fib_event drivers/net/netdevsim/fib.c:744 [inline]
nsim_fib_event_nb+0x1b16/0x2600 drivers/net/netdevsim/fib.c:772
notifier_call_chain+0xc2/0x230 kernel/notifier.c:83
__atomic_notifier_call_chain+0xa6/0x1a0 kernel/notifier.c:173
atomic_notifier_call_chain+0x2e/0x40 kernel/notifier.c:183
call_fib_notifiers+0x173/0x2a0 net/core/fib_notifier.c:35
call_fib6_notifiers+0x4b/0x60 net/ipv6/fib6_notifier.c:22
call_fib6_entry_notifiers+0xfb/0x150 net/ipv6/ip6_fib.c:399
fib6_add_rt2node net/ipv6/ip6_fib.c:1216 [inline]
fib6_add+0x20cd/0x3ec0 net/ipv6/ip6_fib.c:1471
__ip6_ins_rt+0x54/0x80 net/ipv6/route.c:1315
ip6_ins_rt+0x96/0xd0 net/ipv6/route.c:1325
__ipv6_dev_ac_inc+0x76f/0xb20 net/ipv6/anycast.c:324
ipv6_sock_ac_join+0x4c1/0x790 net/ipv6/anycast.c:139
do_ipv6_setsockopt.isra.0+0x3908/0x4290 net/ipv6/ipv6_sockglue.c:670
ipv6_setsockopt+0xff/0x180 net/ipv6/ipv6_sockglue.c:944
udpv6_setsockopt+0x68/0xb0 net/ipv6/udp.c:1564
sock_common_setsockopt+0x94/0xd0 net/core/sock.c:3149
__sys_setsockopt+0x261/0x4c0 net/socket.c:2130
__do_sys_setsockopt net/socket.c:2146 [inline]
__se_sys_setsockopt net/socket.c:2143 [inline]
__x64_sys_setsockopt+0xbe/0x150 net/socket.c:2143
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x45aff9
Fixes: 48bb9eb47b27 ("netdevsim: fib: Add dummy implementation for FIB offload")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implement dummy IPv4 and IPv6 FIB "offload" in the driver by storing
currently "programmed" routes in a hash table. Each route in the hash
table is marked with "trap" indication. The indication is cleared when
the route is replaced or when the netdevsim instance is deleted.
This will later allow us to test the route offload API on top of
netdevsim.
v2:
* Convert to new fib_alias_hw_flags_set() interface
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The function to obtain a unique snapshot id was mistakenly typo'd as
devlink_region_shapshot_id_get. Fix this typo by renaming the function
and all of its users.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Rename the trap-specific netdevimsim.rst file, and expand it to include
documentation of all the devlink features currently implemented by the
netdevsim driver code.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Combine the documentation for devlink into a subfolder, and provide an
index.rst file that can be used to generally describe devlink.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that mlxsw is converted to use the new FIB notifications it is
possible to delete the old ones and use the new replace / append /
delete notifications.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Unlike mlxsw, the other listeners to the FIB notification chain do not
require any special modifications as they never considered multiple
identical routes.
This patch removes the old route notifications and converts all the
listeners to use the new replace / delete notifications.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Update dummy reporter's output to use updated devlink interface of
binary fmsg pair.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is a race between driver code that does setup/cleanup of device
and devlink reload operation that in some drivers works with the same
code. Use after free could we easily obtained by running:
while true; do
echo 10 > /sys/bus/netdevsim/new_device
devlink dev reload netdevsim/netdevsim10 &
echo 10 > /sys/bus/netdevsim/del_device
done
Fix this by enabling reload only after setup of device is complete and
disabling it at the beginning of the cleanup process.
Reported-by: Ido Schimmel <idosch@mellanox.com>
Fixes: 2d8dc5bbf4e7 ("devlink: Add support for reload")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Looks like the port adding loop makes a re-appearance on net-next
after net was merged back into it (even though it doesn't feature
in the merge diff).
The ports are already added in nsim_dev_create() so when we try
to add them again get EEXIST, and see:
netdevsim: probe of netdevsim0 failed with error -17
in the logs. When we remove the loop again the nsim_dev_probe()
and nsim_dev_remove() become a wrapper of nsim_dev_create() and
nsim_dev_destroy(). Remove this layer of indirection.
Fixes: d31e95585ca6 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The only slightly tricky merge conflict was the netdevsim because the
mutex locking fix overlapped a lot of driver reload reorganization.
The rest were (relatively) trivial in nature.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit da58f90f11f5 ("netdevsim: Add devlink-trap support") added
delayed work to netdevsim that periodically iterates over the registered
netdevsim ports and reports various packet traps via devlink.
While the delayed work takes the 'port_list_lock' mutex to protect
against concurrent addition / deletion of ports, during device creation
/ dismantle ports are added / deleted without this lock, which can
result in a use-after-free [1].
Fix this by making sure that the ports list is always modified under the
lock.
[1]
[ 59.205543] ==================================================================
[ 59.207748] BUG: KASAN: use-after-free in nsim_dev_trap_report_work+0xa67/0xad0
[ 59.210247] Read of size 8 at addr ffff8883cbdd3398 by task kworker/3:1/38
[ 59.212584]
[ 59.213148] CPU: 3 PID: 38 Comm: kworker/3:1 Not tainted 5.4.0-rc3-custom-16119-ge6abb5f0261e #2013
[ 59.215896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180724_192412-buildhw-07.phx2.fedoraproject.org-1.fc29 04/01/2014
[ 59.218384] Workqueue: events nsim_dev_trap_report_work
[ 59.219428] Call Trace:
[ 59.219924] dump_stack+0xa9/0x10e
[ 59.220623] print_address_description.constprop.4+0x21/0x340
[ 59.221976] ? vprintk_func+0x66/0x240
[ 59.222752] __kasan_report.cold.8+0x78/0x91
[ 59.223602] ? nsim_dev_trap_report_work+0xa67/0xad0
[ 59.224603] kasan_report+0xe/0x20
[ 59.225296] nsim_dev_trap_report_work+0xa67/0xad0
[ 59.226435] ? rcu_read_lock_sched_held+0xaf/0xe0
[ 59.227512] ? trace_event_raw_event_rcu_quiescent_state_report+0x360/0x360
[ 59.228851] process_one_work+0x98f/0x1760
[ 59.229684] ? pwq_dec_nr_in_flight+0x330/0x330
[ 59.230656] worker_thread+0x91/0xc40
[ 59.231587] ? process_one_work+0x1760/0x1760
[ 59.232451] kthread+0x34a/0x410
[ 59.233104] ? __kthread_queue_delayed_work+0x240/0x240
[ 59.234141] ret_from_fork+0x3a/0x50
[ 59.234982]
[ 59.235371] Allocated by task 187:
[ 59.236189] save_stack+0x19/0x80
[ 59.236853] __kasan_kmalloc.constprop.5+0xc1/0xd0
[ 59.237822] kmem_cache_alloc_trace+0x14c/0x380
[ 59.238769] __nsim_dev_port_add+0xaf/0x5c0
[ 59.239627] nsim_dev_probe+0x4fc/0x1140
[ 59.240550] really_probe+0x264/0xc00
[ 59.241418] driver_probe_device+0x208/0x2e0
[ 59.242255] __device_attach_driver+0x215/0x2d0
[ 59.243150] bus_for_each_drv+0x154/0x1d0
[ 59.243944] __device_attach+0x1ba/0x2b0
[ 59.244923] bus_probe_device+0x1dd/0x290
[ 59.245805] device_add+0xbac/0x1550
[ 59.246528] new_device_store+0x1f4/0x400
[ 59.247306] bus_attr_store+0x7b/0xa0
[ 59.248047] sysfs_kf_write+0x10f/0x170
[ 59.248941] kernfs_fop_write+0x283/0x430
[ 59.249843] __vfs_write+0x81/0x100
[ 59.250546] vfs_write+0x1ce/0x510
[ 59.251190] ksys_write+0x104/0x200
[ 59.251873] do_syscall_64+0xa4/0x4e0
[ 59.252642] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 59.253837]
[ 59.254203] Freed by task 187:
[ 59.254811] save_stack+0x19/0x80
[ 59.255463] __kasan_slab_free+0x125/0x170
[ 59.256265] kfree+0x100/0x440
[ 59.256870] nsim_dev_remove+0x98/0x100
[ 59.257651] nsim_bus_remove+0x16/0x20
[ 59.258382] device_release_driver_internal+0x20b/0x4d0
[ 59.259588] bus_remove_device+0x2e9/0x5a0
[ 59.260551] device_del+0x410/0xad0
[ 59.263777] device_unregister+0x26/0xc0
[ 59.264616] nsim_bus_dev_del+0x16/0x60
[ 59.265381] del_device_store+0x2d6/0x3c0
[ 59.266295] bus_attr_store+0x7b/0xa0
[ 59.267192] sysfs_kf_write+0x10f/0x170
[ 59.267960] kernfs_fop_write+0x283/0x430
[ 59.268800] __vfs_write+0x81/0x100
[ 59.269551] vfs_write+0x1ce/0x510
[ 59.270252] ksys_write+0x104/0x200
[ 59.270910] do_syscall_64+0xa4/0x4e0
[ 59.271680] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 59.272812]
[ 59.273211] The buggy address belongs to the object at ffff8883cbdd3200
[ 59.273211] which belongs to the cache kmalloc-512 of size 512
[ 59.275838] The buggy address is located 408 bytes inside of
[ 59.275838] 512-byte region [ffff8883cbdd3200, ffff8883cbdd3400)
[ 59.278151] The buggy address belongs to the page:
[ 59.279215] page:ffffea000f2f7400 refcount:1 mapcount:0 mapping:ffff8883ecc0ce00 index:0x0 compound_mapcount: 0
[ 59.281449] flags: 0x200000000010200(slab|head)
[ 59.282356] raw: 0200000000010200 ffffea000f2f3a08 ffffea000f2fd608 ffff8883ecc0ce00
[ 59.283949] raw: 0000000000000000 0000000000150015 00000001ffffffff 0000000000000000
[ 59.285608] page dumped because: kasan: bad access detected
[ 59.286981]
[ 59.287337] Memory state around the buggy address:
[ 59.288310] ffff8883cbdd3280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 59.289763] ffff8883cbdd3300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 59.291452] >ffff8883cbdd3380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 59.292945] ^
[ 59.293815] ffff8883cbdd3400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 59.295220] ffff8883cbdd3480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 59.296872] ==================================================================
Fixes: da58f90f11f5 ("netdevsim: Add devlink-trap support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: syzbot+9ed8f68ab30761f3678e@syzkaller.appspotmail.com
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In nsim_fib_init(), if register_fib_notifier failed, nsim_fib_net_ops
should be unregistered before return.
In nsim_fib_exit(), unregister_fib_notifier should be called before
nsim_fib_net_ops be unregistered, otherwise may cause use-after-free:
BUG: KASAN: use-after-free in nsim_fib_event_nb+0x342/0x570 [netdevsim]
Read of size 8 at addr ffff8881daaf4388 by task kworker/0:3/3499
CPU: 0 PID: 3499 Comm: kworker/0:3 Not tainted 5.3.0-rc7+ #30
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
Workqueue: ipv6_addrconf addrconf_dad_work [ipv6]
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xa9/0x10e lib/dump_stack.c:113
print_address_description+0x65/0x380 mm/kasan/report.c:351
__kasan_report+0x149/0x18d mm/kasan/report.c:482
kasan_report+0xe/0x20 mm/kasan/common.c:618
nsim_fib_event_nb+0x342/0x570 [netdevsim]
notifier_call_chain+0x52/0xf0 kernel/notifier.c:95
__atomic_notifier_call_chain+0x78/0x140 kernel/notifier.c:185
call_fib_notifiers+0x30/0x60 net/core/fib_notifier.c:30
call_fib6_entry_notifiers+0xc1/0x100 [ipv6]
fib6_add+0x92e/0x1b10 [ipv6]
__ip6_ins_rt+0x40/0x60 [ipv6]
ip6_ins_rt+0x84/0xb0 [ipv6]
__ipv6_ifa_notify+0x4b6/0x550 [ipv6]
ipv6_ifa_notify+0xa5/0x180 [ipv6]
addrconf_dad_completed+0xca/0x640 [ipv6]
addrconf_dad_work+0x296/0x960 [ipv6]
process_one_work+0x5c0/0xc00 kernel/workqueue.c:2269
worker_thread+0x5c/0x670 kernel/workqueue.c:2415
kthread+0x1d7/0x200 kernel/kthread.c:255
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
Allocated by task 3388:
save_stack+0x19/0x80 mm/kasan/common.c:69
set_track mm/kasan/common.c:77 [inline]
__kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:493
kmalloc include/linux/slab.h:557 [inline]
kzalloc include/linux/slab.h:748 [inline]
ops_init+0xa9/0x220 net/core/net_namespace.c:127
__register_pernet_operations net/core/net_namespace.c:1135 [inline]
register_pernet_operations+0x1d4/0x420 net/core/net_namespace.c:1212
register_pernet_subsys+0x24/0x40 net/core/net_namespace.c:1253
nsim_fib_init+0x12/0x70 [netdevsim]
veth_get_link_ksettings+0x2b/0x50 [veth]
do_one_initcall+0xd4/0x454 init/main.c:939
do_init_module+0xe0/0x330 kernel/module.c:3490
load_module+0x3c2f/0x4620 kernel/module.c:3841
__do_sys_finit_module+0x163/0x190 kernel/module.c:3931
do_syscall_64+0x72/0x2e0 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 3534:
save_stack+0x19/0x80 mm/kasan/common.c:69
set_track mm/kasan/common.c:77 [inline]
__kasan_slab_free+0x130/0x180 mm/kasan/common.c:455
slab_free_hook mm/slub.c:1423 [inline]
slab_free_freelist_hook mm/slub.c:1474 [inline]
slab_free mm/slub.c:3016 [inline]
kfree+0xe9/0x2d0 mm/slub.c:3957
ops_free net/core/net_namespace.c:151 [inline]
ops_free_list.part.7+0x156/0x220 net/core/net_namespace.c:184
ops_free_list net/core/net_namespace.c:182 [inline]
__unregister_pernet_operations net/core/net_namespace.c:1165 [inline]
unregister_pernet_operations+0x221/0x2a0 net/core/net_namespace.c:1224
unregister_pernet_subsys+0x1d/0x30 net/core/net_namespace.c:1271
nsim_fib_exit+0x11/0x20 [netdevsim]
nsim_module_exit+0x16/0x21 [netdevsim]
__do_sys_delete_module kernel/module.c:1015 [inline]
__se_sys_delete_module kernel/module.c:958 [inline]
__x64_sys_delete_module+0x244/0x330 kernel/module.c:958
do_syscall_64+0x72/0x2e0 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 59c84b9fcf42 ("netdevsim: Restore per-network namespace accounting for fib entries")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implement "empty" and "dummy" reporters. The first one is really simple
and does nothing. The other one has debugfs files to trigger breakage
and it is able to do recovery. The ops also implement dummy fmsg
content.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|