summaryrefslogtreecommitdiffstats
path: root/net/openvswitch/datapath.c
AgeCommit message (Collapse)AuthorFilesLines
2022-12-21net: openvswitch: release vport resources on failureAaron Conole1-2/+6
A recent commit introducing upcall packet accounting failed to properly release the vport object when the per-cpu stats struct couldn't be allocated. This can cause dangling pointers to dp objects long after they've been released. Cc: wangchuanlei <wangchuanlei@inspur.com> Fixes: 1933ea365aa7 ("net: openvswitch: Add support to count upcall packets") Reported-by: syzbot+8f4e2dcfcb3209ac35f9@syzkaller.appspotmail.com Signed-off-by: Aaron Conole <aconole@redhat.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://lore.kernel.org/r/20221220212717.526780-1-aconole@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-16openvswitch: Fix flow lookup to use unmasked keyEelco Chaudron1-9/+16
The commit mentioned below causes the ovs_flow_tbl_lookup() function to be called with the masked key. However, it's supposed to be called with the unmasked key. This due to the fact that the datapath supports installing wider flows, and OVS relies on this behavior. For example if ipv4(src=1.1.1.1/192.0.0.0, dst=1.1.1.2/192.0.0.0) exists, a wider flow (smaller mask) of ipv4(src=192.1.1.1/128.0.0.0,dst=192.1.1.2/ 128.0.0.0) is allowed to be added. However, if we try to add a wildcard rule, the installation fails: $ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \ ipv4(src=1.1.1.1/192.0.0.0,dst=1.1.1.2/192.0.0.0,frag=no)" 2 $ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \ ipv4(src=192.1.1.1/0.0.0.0,dst=49.1.1.2/0.0.0.0,frag=no)" 2 ovs-vswitchd: updating flow table (File exists) The reason is that the key used to determine if the flow is already present in the system uses the original key ANDed with the mask. This results in the IP address not being part of the (miniflow) key, i.e., being substituted with an all-zero value. When doing the actual lookup, this results in the key wrongfully matching the first flow, and therefore the flow does not get installed. This change reverses the commit below, but rather than having the key on the stack, it's allocated. Fixes: 190aa3e77880 ("openvswitch: Fix Frame-size larger than 1024 bytes warning.") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-09net: openvswitch: Add support to count upcall packetswangchuanlei1-0/+41
Add support to count upall packets, when kmod of openvswitch upcall to count the number of packets for upcall succeed and failed, which is a better way to see how many packets upcalled on every interfaces. Signed-off-by: wangchuanlei <wangchuanlei@inspur.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-0/+1
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: openvswitch: add missing .resv_start_opJakub Kicinski1-0/+1
I missed one of the families in OvS when annotating .resv_start_op. This triggers the warning added in commit ce48ebdd5651 ("genetlink: limit the use of validation workarounds to old ops"). Reported-by: syzbot+40eb8c0447c0e47a7e9b@syzkaller.appspotmail.com Fixes: 9c5d03d36251 ("genetlink: start to validate reserved header bytes") Link: https://lore.kernel.org/r/20221028032501.2724270-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28net: Remove the obsolte u64_stats_fetch_*_irq() users (net).Thomas Gleixner1-2/+2
Now that the 32bit UP oddity is gone and 32bit uses always a sequence count, there is no need for the fetch_irq() variants anymore. Convert to the regular interface. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27openvswitch: switch from WARN to pr_warnAaron Conole1-1/+2
As noted by Paolo Abeni, pr_warn doesn't generate any splat and can still preserve the warning to the user that feature downgrade occurred. We likely cannot introduce other kinds of checks / enforcement here because syzbot can generate different genl versions to the datapath. Reported-by: syzbot+31cde0bef4bbf8ba2d86@syzkaller.appspotmail.com Fixes: 44da5ae5fbea ("openvswitch: Drop user features if old user space attempted to create datapath") Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Aaron Conole <aconole@redhat.com> Acked-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+3
tools/testing/selftests/net/.gitignore sort the net-next version and use it Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-29genetlink: start to validate reserved header bytesJakub Kicinski1-0/+3
We had historically not checked that genlmsghdr.reserved is 0 on input which prevents us from using those precious bytes in the future. One use case would be to extend the cmd field, which is currently just 8 bits wide and 256 is not a lot of commands for some core families. To make sure that new families do the right thing by default put the onus of opting out of validation on existing families. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> (NetLabel) Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-26openvswitch: add OVS_DP_ATTR_PER_CPU_PIDS to get requestsAndrey Zhadchenko1-1/+9
CRIU needs OVS_DP_ATTR_PER_CPU_PIDS to checkpoint/restore newest openvswitch versions. Add pids to generic datapath reply. Limit exported pids amount to nr_cpu_ids. Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-26openvswitch: allow specifying ifindex of new interfacesAndrey Zhadchenko1-2/+9
CRIU is preserving ifindexes of net devices after restoration. However, current Open vSwitch API does not allow to target ifindex, so we cannot correctly restore OVS configuration. Add new OVS_DP_ATTR_IFINDEX for OVS_DP_CMD_NEW and use it as desired ifindex. Use OVS_VPORT_ATTR_IFINDEX during OVS_VPORT_CMD_NEW to specify new netdev ifindex. Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-26openvswitch: fix memory leak at failed datapath creationAndrey Zhadchenko1-1/+3
ovs_dp_cmd_new()->ovs_dp_change()->ovs_dp_set_upcall_portids() allocates array via kmalloc. If for some reason new_vport() fails during ovs_dp_cmd_new() dp->upcall_portids must be freed. Add missing kfree. Kmemleak example: unreferenced object 0xffff88800c382500 (size 64): comm "dump_state", pid 323, jiffies 4294955418 (age 104.347s) hex dump (first 32 bytes): 5e c2 79 e4 1f 7a 38 c7 09 21 38 0c 80 88 ff ff ^.y..z8..!8..... 03 00 00 00 0a 00 00 00 14 00 00 00 28 00 00 00 ............(... backtrace: [<0000000071bebc9f>] ovs_dp_set_upcall_portids+0x38/0xa0 [<000000000187d8bd>] ovs_dp_change+0x63/0xe0 [<000000002397e446>] ovs_dp_cmd_new+0x1f0/0x380 [<00000000aa06f36e>] genl_family_rcv_msg_doit+0xea/0x150 [<000000008f583bc4>] genl_rcv_msg+0xdc/0x1e0 [<00000000fa10e377>] netlink_rcv_skb+0x50/0x100 [<000000004959cece>] genl_rcv+0x24/0x40 [<000000004699ac7f>] netlink_unicast+0x23e/0x360 [<00000000c153573e>] netlink_sendmsg+0x24e/0x4b0 [<000000006f4aa380>] sock_sendmsg+0x62/0x70 [<00000000d0068654>] ____sys_sendmsg+0x230/0x270 [<0000000012dacf7d>] ___sys_sendmsg+0x88/0xd0 [<0000000011776020>] __sys_sendmsg+0x59/0xa0 [<000000002e8f2dc1>] do_syscall_64+0x3b/0x90 [<000000003243e7cb>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: b83d23a2a38b ("openvswitch: Introduce per-cpu upcall dispatch") Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> Link: https://lore.kernel.org/r/20220825020326.664073-1-andrey.zhadchenko@virtuozzo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-22openvswitch: Fix overreporting of drops in dropwatchMike Pattrick1-2/+3
Currently queue_userspace_packet will call kfree_skb for all frames, whether or not an error occurred. This can result in a single dropped frame being reported as multiple drops in dropwatch. This functions caller may also call kfree_skb in case of an error. This patch will consume the skbs instead and allow caller's to use kfree_skb. Signed-off-by: Mike Pattrick <mkp@redhat.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2109957 Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-22openvswitch: Fix double reporting of drops in dropwatchMike Pattrick1-3/+10
Frames sent to userspace can be reported as dropped in ovs_dp_process_packet, however, if they are dropped in the netlink code then netlink_attachskb will report the same frame as dropped. This patch checks for error codes which indicate that the frame has already been freed. Signed-off-by: Mike Pattrick <mkp@redhat.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2109946 Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-05net/sched: Enable tc skb ext allocation on chain miss only when neededPaul Blakey1-7/+11
Currently tc skb extension is used to send miss info from tc to ovs datapath module, and driver to tc. For the tc to ovs miss it is currently always allocated even if it will not be used by ovs datapath (as it depends on a requested feature). Export the static key which is used by openvswitch module to guard this code path as well, so it will be skipped if ovs datapath doesn't need it. Enable this code path once ovs datapath needs it. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27openvswitch: fix sparse warning incorrect typeMark Gray1-1/+1
fix incorrect type in argument 1 (different address spaces) ../net/openvswitch/datapath.c:169:17: warning: incorrect type in argument 1 (different address spaces) ../net/openvswitch/datapath.c:169:17: expected void const * ../net/openvswitch/datapath.c:169:17: got struct dp_nlsk_pids [noderef] __rcu *upcall_portids Found at: https://patchwork.kernel.org/project/netdevbpf/patch/20210630095350.817785-1-mark.d.gray@redhat.com/#24285159 Signed-off-by: Mark Gray <mark.d.gray@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27openvswitch: fix alignment issuesMark Gray1-6/+10
Signed-off-by: Mark Gray <mark.d.gray@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-16openvswitch: Introduce per-cpu upcall dispatchMark Gray1-2/+70
The Open vSwitch kernel module uses the upcall mechanism to send packets from kernel space to user space when it misses in the kernel space flow table. The upcall sends packets via a Netlink socket. Currently, a Netlink socket is created for every vport. In this way, there is a 1:1 mapping between a vport and a Netlink socket. When a packet is received by a vport, if it needs to be sent to user space, it is sent via the corresponding Netlink socket. This mechanism, with various iterations of the corresponding user space code, has seen some limitations and issues: * On systems with a large number of vports, there is a correspondingly large number of Netlink sockets which can limit scaling. (https://bugzilla.redhat.com/show_bug.cgi?id=1526306) * Packet reordering on upcalls. (https://bugzilla.redhat.com/show_bug.cgi?id=1844576) * A thundering herd issue. (https://bugzilla.redhat.com/show_bug.cgi?id=1834444) This patch introduces an alternative, feature-negotiated, upcall mode using a per-cpu dispatch rather than a per-vport dispatch. In this mode, the Netlink socket to be used for the upcall is selected based on the CPU of the thread that is executing the upcall. In this way, it resolves the issues above as: a) The number of Netlink sockets scales with the number of CPUs rather than the number of vports. b) Ordering per-flow is maintained as packets are distributed to CPUs based on mechanisms such as RSS and flows are distributed to a single user space thread. c) Packets from a flow can only wake up one user space thread. The corresponding user space code can be found at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385139.html Bugzilla: https://bugzilla.redhat.com/1844576 Signed-off-by: Mark Gray <mark.d.gray@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22openvswitch: add trace pointsAaron Conole1-0/+4
This makes openvswitch module use the event tracing framework to log the upcall interface and action execution pipeline. When using openvswitch as the packet forwarding engine, some types of debugging are made possible simply by using the ovs-vswitchd's ofproto/trace command. However, such a command has some limitations: 1. When trying to trace packets that go through the CT action, the state of the packet can't be determined, and probably would be potentially wrong. 2. Deducing problem packets can sometimes be difficult as well even if many of the flows are known 3. It's possible to use the openvswitch module even without the ovs-vswitchd (although, not common use). Introduce the event tracing points here to make it possible for working through these problems in kernel space. The style is copied from the mac80211 driver-trace / trace code for consistency - this creates some checkpatch splats, but the official 'guide' for adding tracepoints, as well as the existing examples all add the same splats so it seems acceptable. Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-11-03net: openvswitch: silence suspicious RCU usage warningEelco Chaudron1-7/+7
Silence suspicious RCU usage warning in ovs_flow_tbl_masks_cache_resize() by replacing rcu_dereference() with rcu_dereference_ovsl(). In addition, when creating a new datapath, make sure it's configured under the ovs_lock. Fixes: 9bf24f594c6a ("net: openvswitch: make masks cache size configurable") Reported-by: syzbot+9a8f8bfcc56e8578016c@syzkaller.appspotmail.com Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Link: https://lore.kernel.org/r/160439190002.56943.1418882726496275961.stgit@ebuild Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-02genetlink: move to smaller ops wherever possibleJakub Kicinski1-12/+12
Bulk of the genetlink users can use smaller ops, move them. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-01net: openvswitch: fixes crash if nf_conncount_init() failsEelco Chaudron1-1/+7
If nf_conncount_init fails currently the dispatched work is not canceled, causing problems when the timer fires. This change fixes this by not scheduling the work until all initialization is successful. Fixes: a65878d6f00b ("net: openvswitch: fixes potential deadlock in dp cleanup code") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-01net: openvswitch: improve the coding styleTonghao Zhang1-16/+22
Not change the logic, just improve the coding style. Cc: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-13net: openvswitch: introduce common code for flushing flowsTonghao Zhang1-1/+9
To avoid some issues, for example RCU usage warning and double free, we should flush the flows under ovs_lock. This patch refactors table_instance_destroy and introduces table_instance_flow_flush which can be invoked by __dp_destroy or ovs_flow_tbl_flush. Fixes: 50b0e61b32ee ("net: openvswitch: fix possible memleak on destroy flow-table") Reported-by: Johan Knöös <jknoos@google.com> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-August/050489.html Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03net: openvswitch: make masks cache size configurableEelco Chaudron1-0/+17
This patch makes the masks cache size configurable, or with a size of 0, disable it. Reviewed-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03net: openvswitch: add masks cache hit counterEelco Chaudron1-1/+4
Add a counter that counts the number of masks cache hits, and export it through the megaflow netlink statistics. Reviewed-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24net: openvswitch: fixes potential deadlock in dp cleanup codeEelco Chaudron1-11/+12
The previous patch introduced a deadlock, this patch fixes it by making sure the work is canceled without holding the global ovs lock. This is done by moving the reorder processing one layer up to the netns level. Fixes: eac87c413bf9 ("net: openvswitch: reorder masks array based on usage") Reported-by: syzbot+2c4ff3614695f75ce26c@syzkaller.appspotmail.com Reported-by: syzbot+bad6507e5db05017b008@syzkaller.appspotmail.com Reviewed-by: Paolo <pabeni@redhat.com> Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-17net: openvswitch: reorder masks array based on usageEelco Chaudron1-0/+22
This patch reorders the masks array every 4 seconds based on their usage count. This greatly reduces the masks per packet hit, and hence the overall performance. Especially in the OVS/OVN case for OpenShift. Here are some results from the OVS/OVN OpenShift test, which use 8 pods, each pod having 512 uperf connections, each connection sends a 64-byte request and gets a 1024-byte response (TCP). All uperf clients are on 1 worker node while all uperf servers are on the other worker node. Kernel without this patch : 7.71 Gbps Kernel with this patch applied: 14.52 Gbps We also run some tests to verify the rebalance activity does not lower the flow insertion rate, which does not. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Tested-by: Andrew Theurer <atheurer@redhat.com> Reviewed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-20net: openvswitch: ovs_ct_exit to be done under ovs_lockTonghao Zhang1-1/+3
syzbot wrote: | ============================= | WARNING: suspicious RCU usage | 5.7.0-rc1+ #45 Not tainted | ----------------------------- | net/openvswitch/conntrack.c:1898 RCU-list traversed in non-reader section!! | | other info that might help us debug this: | rcu_scheduler_active = 2, debug_locks = 1 | ... | | stack backtrace: | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014 | Workqueue: netns cleanup_net | Call Trace: | ... | ovs_ct_exit | ovs_exit_net | ops_exit_list.isra.7 | cleanup_net | process_one_work | worker_thread To avoid that warning, invoke the ovs_ct_exit under ovs_lock and add lockdep_ovsl_is_held as optional lockdep expression. Link: https://lore.kernel.org/lkml/000000000000e642a905a0cbee6e@google.com Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit") Cc: Pravin B Shelar <pshelar@ovn.org> Cc: Yi-Hung Wei <yihung.wei@gmail.com> Reported-by: syzbot+7ef50afd3a211f879112@syzkaller.appspotmail.com Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29net: Fix typo of SKB_SGO_CB_OFFSETCambda Zhu1-1/+1
The SKB_SGO_CB_OFFSET should be SKB_GSO_CB_OFFSET which means the offset of the GSO in skb cb. This patch fixes the typo. Fixes: 9207f9d45b0a ("net: preserve IP control block during GSO segmentation") Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-03openvswitch: add missing attribute validation for hashJakub Kicinski1-0/+1
Add missing attribute validation for OVS_PACKET_ATTR_HASH to the netlink policy. Fixes: bd1903b7c459 ("net: openvswitch: add hash info to upcall") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-18datapath.c: Use built-in RCU list checkingMadhuparna Bhowmik1-3/+6
hlist_for_each_entry_rcu() has built-in RCU and lock checking. Pass cond argument to list_for_each_entry_rcu() to silence false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled by default. Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14net: openvswitch: use skb_list_walk_safe helper for gso segmentsJason A. Donenfeld1-7/+4
This is a straight-forward conversion case for the new function, keeping the flow of the existing code as intact as possible. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-09treewide: Use sizeof_field() macroPankaj Bharadiya1-1/+1
Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except at places where these are defined. Later patches will remove the unused definition of FIELD_SIZEOF(). This patch is generated using following script: EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h" git grep -l -e "\bFIELD_SIZEOF\b" | while read file; do if [[ "$file" =~ $EXCLUDE_FILES ]]; then continue fi sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file; done Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: David Miller <davem@davemloft.net> # for net
2019-12-01openvswitch: remove another BUG_ON()Paolo Abeni1-1/+5
If we can't build the flow del notification, we can simply delete the flow, no need to crash the kernel. Still keep a WARN_ON to preserve debuggability. Note: the BUG_ON() predates the Fixes tag, but this change can be applied only after the mentioned commit. v1 -> v2: - do not leak an skb on error Fixes: aed067783e50 ("openvswitch: Minimize ovs_flow_cmd_del critical section.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-01openvswitch: drop unneeded BUG_ON() in ovs_flow_cmd_build_info()Paolo Abeni1-1/+4
All the callers of ovs_flow_cmd_build_info() already deal with error return code correctly, so we can handle the error condition in a more gracefull way. Still dump a warning to preserve debuggability. v1 -> v2: - clarify the commit message - clean the skb and report the error (DaveM) Fixes: ccb1352e76cf ("net: Add Open vSwitch kernel components.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-26openvswitch: fix flow command message sizePaolo Abeni1-1/+5
When user-space sets the OVS_UFID_F_OMIT_* flags, and the relevant flow has no UFID, we can exceed the computed size, as ovs_nla_put_identifier() will always dump an OVS_FLOW_ATTR_KEY attribute. Take the above in account when computing the flow command message size. Fixes: 74ed7ab9264c ("openvswitch: Add support for unique flow IDs.") Reported-by: Qi Jun Ding <qding@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net: openvswitch: don't call pad_packet if not necessaryTonghao Zhang1-14/+8
The nla_put_u16/nla_put_u32 makes sure that *attrlen is align. The call tree is that: nla_put_u16/nla_put_u32 -> nla_put attrlen = sizeof(u16) or sizeof(u32) -> __nla_put attrlen -> __nla_reserve attrlen -> skb_put(skb, nla_total_size(attrlen)) nla_total_size returns the total length of attribute including padding. Cc: Joe Stringer <joe@ovn.org> Cc: William Tu <u9012063@gmail.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14net: openvswitch: add hash info to upcallTonghao Zhang1-1/+25
When using the kernel datapath, the upcall don't include skb hash info relatived. That will introduce some problem, because the hash of skb is important in kernel stack. For example, VXLAN module uses it to select UDP src port. The tx queue selection may also use the hash in stack. Hash is computed in different ways. Hash is random for a TCP socket, and hash may be computed in hardware, or software stack. Recalculation hash is not easy. Hash of TCP socket is computed: tcp_v4_connect -> sk_set_txhash (is random) __tcp_transmit_skb -> skb_set_hash_from_sk There will be one upcall, without information of skb hash, to ovs-vswitchd, for the first packet of a TCP session. The rest packets will be processed in Open vSwitch modules, hash kept. If this tcp session is forward to VXLAN module, then the UDP src port of first tcp packet is different from rest packets. TCP packets may come from the host or dockers, to Open vSwitch. To fix it, we store the hash info to upcall, and restore hash when packets sent back. +---------------+ +-------------------------+ | Docker/VMs | | ovs-vswitchd | +----+----------+ +-+--------------------+--+ | ^ | | | | | | upcall v restore packet hash (not recalculate) | +-+--------------------+--+ | tap netdev | | vxlan module +---------------> +--> Open vSwitch ko +--> or internal type | | +-------------------------+ Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03net: openvswitch: simplify the ovs_dp_cmd_newTonghao Zhang1-22/+38
use the specified functions to init resource. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03net: openvswitch: don't unlock mutex when changing the user_features failsTonghao Zhang1-1/+1
Unlocking of a not locked mutex is not allowed. Other kernel thread may be in critical section while we unlock it because of setting user_feature fail. Fixes: 95a7233c4 ("net: openvswitch: Set OvS recirc_id from tc chain index") Cc: Paul Blakey <paulb@mellanox.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03net: openvswitch: add flow-mask cache for performanceTonghao Zhang1-1/+2
The idea of this optimization comes from a patch which is committed in 2014, openvswitch community. The author is Pravin B Shelar. In order to get high performance, I implement it again. Later patches will use it. Pravin B Shelar, says: | On every packet OVS needs to lookup flow-table with every | mask until it finds a match. The packet flow-key is first | masked with mask in the list and then the masked key is | looked up in flow-table. Therefore number of masks can | affect packet processing performance. Link: https://github.com/openvswitch/ovs/commit/5604935e4e1cbc16611d2d97f50b717aa31e8ec5 Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-25netns: fix GFP flags in rtnl_net_notifyid()Guillaume Nault1-9/+11
In rtnl_net_notifyid(), we certainly can't pass a null GFP flag to rtnl_notify(). A GFP_KERNEL flag would be fine in most circumstances, but there are a few paths calling rtnl_net_notifyid() from atomic context or from RCU critical sections. The later also precludes the use of gfp_any() as it wouldn't detect the RCU case. Also, the nlmsg_new() call is wrong too, as it uses GFP_KERNEL unconditionally. Therefore, we need to pass the GFP flags as parameter and propagate it through function calls until the proper flags can be determined. In most cases, GFP_KERNEL is fine. The exceptions are: * openvswitch: ovs_vport_cmd_get() and ovs_vport_cmd_dump() indirectly call rtnl_net_notifyid() from RCU critical section, * rtnetlink: rtmsg_ifinfo_build_skb() already receives GFP flags as parameter. Also, in ovs_vport_cmd_build_info(), let's change the GFP flags used by nlmsg_new(). The function is allowed to sleep, so better make the flags consistent with the ones used in the following ovs_vport_cmd_fill_info() call. Found by code inspection. Fixes: 9a9634545c70 ("netns: notify netns id events") Signed-off-by: Guillaume Nault <gnault@redhat.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-26openvswitch: change type of UPCALL_PID attribute to NLA_UNSPECLi RongQing1-1/+1
userspace openvswitch patch "(dpif-linux: Implement the API functions to allow multiple handler threads read upcall)" changes its type from U32 to UNSPEC, but leave the kernel unchanged and after kernel 6e237d099fac "(netlink: Relax attr validation for fixed length types)", this bug is exposed by the below warning [ 57.215841] netlink: 'ovs-vswitchd': attribute type 5 has an invalid length. Fixes: 5cd667b0a456 ("openvswitch: Allow each vport to have an array of 'port_id's") Signed-off-by: Li RongQing <lirongqing@baidu.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-06net: openvswitch: Set OvS recirc_id from tc chain indexPaul Blakey1-5/+33
Offloaded OvS datapath rules are translated one to one to tc rules, for example the following simplified OvS rule: recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2) Will be translated to the following tc rule: $ tc filter add dev dev1 ingress \ prio 1 chain 0 proto ip \ flower tcp ct_state -trk \ action ct pipe \ action goto chain 2 Received packets will first travel though tc, and if they aren't stolen by it, like in the above rule, they will continue to OvS datapath. Since we already did some actions (action ct in this case) which might modify the packets, and updated action stats, we would like to continue the proccessing with the correct recirc_id in OvS (here recirc_id(2)) where we left off. To support this, introduce a new skb extension for tc, which will be used for translating tc chain to ovs recirc_id to handle these miss cases. Last tc chain index will be set by tc goto chain action and read by OvS datapath. Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-7/+8
Just minor overlapping changes in the conflicts here. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06openvswitch: Print error when ovs_execute_actions() failsYifeng Sun1-2/+5
Currently in function ovs_dp_process_packet(), return values of ovs_execute_actions() are silently discarded. This patch prints out an debug message when error happens so as to provide helpful hints for debugging. Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-24ovs: datapath: hide clang frame-overflow warningsArnd Bergmann1-7/+8
Some functions in the datapath code are factored out so that each one has a stack frame smaller than 1024 bytes with gcc. However, when compiling with clang, the functions are inlined more aggressively and combined again so we get net/openvswitch/datapath.c:1124:12: error: stack frame size of 1528 bytes in function 'ovs_flow_cmd_set' [-Werror,-Wframe-larger-than=] Marking both get_flow_actions() and ovs_nla_init_match_and_action() as 'noinline_for_stack' gives us the same behavior that we see with gcc, and no warning. Note that this does not mean we actually use less stack, as the functions call each other, and we still get three copies of the large 'struct sw_flow_key' type on the stack. The comment tells us that this was previously considered safe, presumably since the netlink parsing functions are called with a known backchain that does not also use a lot of stack space. Fixes: 9cc9a5cb176c ("datapath: Avoid using stack larger than 1024.") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-12net: openvswitch: do not update max_headroom if new headroom is equal to old ↵Taehee Yoo1-11/+28
headroom When a vport is deleted, the maximum headroom size would be changed. If the vport which has the largest headroom is deleted, the new max_headroom would be set. But, if the new headroom size is equal to the old headroom size, updating routine is unnecessary. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-14/+1
Honestly all the conflicts were simple overlapping changes, nothing really interesting to report. Signed-off-by: David S. Miller <davem@davemloft.net>