Age | Commit message (Collapse) | Author | Files | Lines |
|
Ideally, we would need to generate IP ID using a per destination IP
generator.
linux kernels used inet_peer cache for this purpose, but this had a huge
cost on servers disabling MTU discovery.
1) each inet_peer struct consumes 192 bytes
2) inetpeer cache uses a binary tree of inet_peer structs,
with a nominal size of ~66000 elements under load.
3) lookups in this tree are hitting a lot of cache lines, as tree depth
is about 20.
4) If server deals with many tcp flows, we have a high probability of
not finding the inet_peer, allocating a fresh one, inserting it in
the tree with same initial ip_id_count, (cf secure_ip_id())
5) We garbage collect inet_peer aggressively.
IP ID generation do not have to be 'perfect'
Goal is trying to avoid duplicates in a short period of time,
so that reassembly units have a chance to complete reassembly of
fragments belonging to one message before receiving other fragments
with a recycled ID.
We simply use an array of generators, and a Jenkin hash using the dst IP
as a key.
ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
belongs (it is only used from this file)
secure_ip_id() and secure_ipv6_id() no longer are needed.
Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
unnecessary decrement/increment of the number of segments.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This change provides a function to be used in order to break the
ndo_set_rx_mode call into a set of address add and remove calls. The code
is based on the implementation of dev_uc_sync/dev_mc_sync. Since they
essentially do the same thing but with only one dev I simply named my
functions __dev_uc_sync/__dev_mc_sync.
I also implemented an unsync version of the functions as well to allow for
cleanup on close.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 70a640d0dae3a9b1b222ce673eb5d92c263ddd61.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 9739eef13c92 ("net: filter: make BPF conversion more readable")
started to introduce helper macros similar to BPF_STMT()/BPF_JUMP()
macros from classic BPF.
However, quite some statements in the filter conversion functions
remained in the old style which gives a mixture of block macros and
non block macros in the code. This patch makes the block macros itself
more readable by using explicit member initialization, and converts
the remaining ones where possible to remain in a more consistent state.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch finally allows us to get rid of the BPF_S_* enum.
Currently, the code performs unnecessary encode and decode
workarounds in seccomp and filter migration itself when a filter
is being attached in order to overcome BPF_S_* encoding which
is not used anymore by the new interpreter resp. JIT compilers.
Keeping it around would mean that also in future we would need
to extend and maintain this enum and related encoders/decoders.
We can get rid of all that and save us these operations during
filter attaching. Naturally, also JIT compilers need to be updated
by this.
Before JIT conversion is being done, each compiler checks if A
is being loaded at startup to obtain information if it needs to
emit instructions to clear A first. Since BPF extensions are a
subset of BPF_LD | BPF_{W,H,B} | BPF_ABS variants, case statements
for extensions can be removed at that point. To ease and minimalize
code changes in the classic JITs, we have introduced bpf_anc_helper().
Tested with test_bpf on x86_64 (JIT, int), s390x (JIT, int),
arm (JIT, int), i368 (int), ppc64 (JIT, int); for sparc we
unfortunately didn't have access, but changes are analogous to
the rest.
Joint work with Alexei Starovoitov.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mircea Gherzan <mgherzan@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Acked-by: Chema Gonzalez <chemag@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 70a640d0dae3a9b1b222ce673eb5d92c263ddd61
("net/mlx4_en: Use affinity hint") and commit
c8865b64b05b2f4eeefd369373e9c8aeb069e7a1 ("cpumask: Utility function
to set n'th cpu - local cpu first") because these changes break
the build when SMP is disabled amongst other things.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The “affinity hint” mechanism is used by the user space
daemon, irqbalancer, to indicate a preferred CPU mask for irqs.
Irqbalancer can use this hint to balance the irqs between the
cpus indicated by the mask.
We wish the HCA to preferentially map the IRQs it uses to numa cores
close to it. To accomplish this, we use cpumask_set_cpu_local_first(), that
sets the affinity hint according the following policy:
First it maps IRQs to “close” numa cores. If these are exhausted, the
remaining IRQs are mapped to “far” numa cores.
Signed-off-by: Yuval Atias <yuvala@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This function sets the n'th cpu - local cpu's first.
For example: in a 16 cores server with even cpu's local, will get the
following values:
cpumask_set_cpu_local_first(0, numa, cpumask) => cpu 0 is set
cpumask_set_cpu_local_first(1, numa, cpumask) => cpu 2 is set
...
cpumask_set_cpu_local_first(7, numa, cpumask) => cpu 14 is set
cpumask_set_cpu_local_first(8, numa, cpumask) => cpu 1 is set
cpumask_set_cpu_local_first(9, numa, cpumask) => cpu 3 is set
...
cpumask_set_cpu_local_first(15, numa, cpumask) => cpu 15 is set
Curently this function will be used by multi queue networking devices to
calculate the irq affinity mask, such that as many local cpu's as
possible will be utilized to handle the mq device irq's.
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
This small patchset contains three accumulated Netfilter/IPVS updates,
they are:
1) Refactorize common NAT code by encapsulating it into a helper
function, similarly to what we do in other conntrack extensions,
from Florian Westphal.
2) A minor format string mismatch fix for IPVS, from Masanari Iida.
3) Add quota support to the netfilter accounting infrastructure, now
you can add quotas to accounting objects via the nfnetlink interface
and use them from iptables. You can also listen to quota
notifications from userspace. This enhancement from Mathieu Poirier.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:
====================
Please pull this batch of updates intended for 3.16...
For the mac80211 bits, Johannes says:
"Here I just have Heikki's rfkill GPIO cleanups.
The ARM/tegra patch is OK with the maintainer (Stephen). Let me know of
any problems."
and;
"We have a whole bunch of work on CSA by Andrei, Luca and Michal, but
unfortunately it doesn't seem quite complete yet so it's still disabled.
There's some TDLS work from Arik, and the rest is mostly minor fixes and
cleanups."
For the NFC bits, Samuel says:
"This is the NFC pull request for 3.16. We have:
- STMicroeectronics st21nfca support. The st21nfca is an HCI chipset and
thus relies on the HCI stack. This submission provides support for tag
redaer/writer mode (including Type 5) and device tree bindings.
- PM runtime support and a bunch of bug fixes for TI's trf7970a.
- Device tree support for NXP's pn544. Legacy platform data support is
obviously kept intact.
- NFC Tag type 4B support to the NFC Digital stack.
- SOCK_RAW type support to the raw NFC socket, and allow NCI
sniffing from that. This can be extended to report HCI frames and also
proprietarry ones like e.g. the pn533 ones."
For the iwlwifi bits, Emmanuel says:
"Eran continues to work on new devices, Eyal is still digging in
the rate control stuff, and Johannes added new functionality to the
debug system we have in place now along with a few cleanups he made
on the way. That's pretty much it."
and;
"Avri continues to work on the power code and Eran is improving the
NVM handling as a preparations for new devices on which he works
with Liad. Luca cleans up a bit the code while working on CSA. I have
the regular BT Coex stuff and a small lockdep fix. Johannes has his
regular amount of clean ups and improvements, the main one is the
ability to leave 2 chains open to improve diversity and hence the
throughput in high attenuation scenarios."
and;
"The regular amount of housekeeping here. I merged iwlwifi-fixes.git to
be able to add the patch you didn't want in wireless.git at that stage
of the -rc cycle. Luca has a few preparations for CSA implementation
and also what seems to be a bugfix for P2P but hasn't caused issues
we could notice."
For the Atheros bits, Kalle says:
"For ath10k Michal did various small fixes on how we handle
hardware/firmware problems and he also fixed two memory leaks."
Also included are a couple of pulls from the wireless tree to
avoid/resolve merge issues...
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a function to walk the list of subnodes of a mdio bus and look for
a node that matches the phy's address with its 'reg' property. If found,
set the of_node pointer for the phy. This allows auto-probed pyh
devices to be augmented by information passed in via DT.
Signed-off-by: Daniel Mack <zonque@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
|
|
Conflicts:
drivers/net/bonding/bond_alb.c
drivers/net/ethernet/altera/altera_msgdma.c
drivers/net/ethernet/altera/altera_sgdma.c
net/ipv6/xfrm6_output.c
Several cases of overlapping changes.
The xfrm6_output.c has a bug fix which overlaps the renaming
of skb->local_df to skb->ignore_df.
In the Altera TSE driver cases, the register access cleanups
in net-next overlapped with bug fixes done in net.
Similarly a bug fix to send ALB packets in the bonding driver using
the right source address overlaps with cleanups in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull networking fixes from David Miller:
"It looks like a sizeble collection but this is nearly 3 weeks of bug
fixing while you were away.
1) Fix crashes over IPSEC tunnels with NAT, the latter can reroute
the packet through a non-IPSEC protected path and the code has to
be able to handle SKBs attached to routes lacking an attached xfrm
state. From Steffen Klassert.
2) Fix OOPSs in ipv4 and ipv6 ipsec layers for unsupported
sub-protocols, also from Steffen Klassert.
3) Set local_df on fragmented netfilter skbs otherwise we won't be
able to forward successfully, from Florian Westphal.
4) cdc_mbim ipv6 neighbour code does __vlan_find_dev_deep without
holding RCU lock, from Bjorn Mork.
5) local_df test in ip_may_fragment is inverted, from Florian
Westphal.
6) jme driver doesn't check for DMA mapping failures, from Neil
Horman.
7) qlogic driver doesn't calculate number of TX queues properly, from
Shahed Shaikh.
8) fib_info_cnt can drift irreversibly positive if we fail to
allocate the fi->fib_metrics array, from Sergey Popovich.
9) Fix use after free in ip6_route_me_harder(), also from Sergey
Popovich.
10) When SYSCTL is disabled, we don't handle local_port_range and
ping_group_range defaults properly at all, from Cong Wang.
11) Unaccelerated VLAN tagged frames improperly handled by cdc_mbim
driver, fix from Bjorn Mork.
12) cassini driver needs nested lock annotations for TX locking, from
Emil Goode.
13) On init error ipv6 VTI driver can unregister pernet ops twice,
oops. Fix from Mahtias Krause.
14) If macvlan device is down, don't propagate IFF_ALLMULTI changes,
from Peter Christensen.
15) Missing NULL pointer check while parsing netlink config options in
ip6_tnl_validate(). From Susant Sahani.
16) Fix handling of neighbour entries during ipv6 router reachability
probing, from Duan Jiong.
17) x86 and s390 JIT address randomization has some address
calculation bugs leading to crashes, from Alexei Starovoitov and
Heiko Carstens.
18) Clear up those uglies with nop patching and net_get_random_once(),
from Hannes Frederic Sowa.
19) Option length miscalculated in ip6_append_data(), fix also from
Hannes Frederic Sowa.
20) A while ago we fixed a race during device unregistry when a
namespace went down, turns out there is a second place that needs
similar protection. From Cong Wang.
21) In the new Altera TSE driver multicast filtering isn't working,
disable it and just use promisc mode until the cause is found.
From Vince Bridgers.
22) When we disable router enabling in ipv6 we have to flush the
cached routes explicitly, from Duan Jiong.
23) NBMA tunnels should not cache routes on the tunnel object because
the key is variable, from Timo Teräs.
24) With stacked devices GRO information in skb->cb[] can be not setup
properly, make sure it is in all code paths. From Eric Dumazet.
25) Really fix stacked vlan locking, multiple levels of nesting with
intervening non-vlan devices are possible. From Vlad Yasevich.
26) Fallback ipip tunnel device's mtu is not setup properly, from
Steffen Klassert.
27) The packet scheduler's tcindex filter can crash because we
structure copy objects with list_head's inside, oops. From Cong
Wang.
28) Fix CHECKSUM_COMPLETE handling for ipv6 GRE tunnels, from Eric
Dumazet.
29) In some configurations 'itag' in __mkroute_input() can end up
being used uninitialized because of how fib_validate_source()
works. Fix it by explitly initializing itag to zero like all the
other fib_validate_source() callers do, from Li RongQing"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
batman: fix a bogus warning from batadv_is_on_batman_iface()
ipv4: initialise the itag variable in __mkroute_input
bonding: Send ALB learning packets using the right source
bonding: Don't assume 802.1Q when sending alb learning packets.
net: doc: Update references to skb->rxhash
stmmac: Remove unbalanced clk_disable call
ipv6: gro: fix CHECKSUM_COMPLETE support
net_sched: fix an oops in tcindex filter
can: peak_pci: prevent use after free at netdev removal
ip_tunnel: Initialize the fallback device properly
vlan: Fix build error wth vlan_get_encap_level()
can: c_can: remove obsolete STRICT_FRAME_ORDERING Kconfig option
MAINTAINERS: Pravin Shelar is Open vSwitch maintainer.
bnx2x: Convert return 0 to return rc
bonding: Fix alb mode to only use first level vlans.
bonding: Fix stacked device detection in arp monitoring
macvlan: Fix lockdep warnings with stacked macvlan devices
vlan: Fix lockdep warning with stacked vlan devices.
net: Allow for more then a single subclass for netif_addr_lock
net: Find the nesting level of a given device by type.
...
|
|
The sk_unattached_filter_create() API is used by BPF filters that
are not directly attached or related to sockets, and are used in
team, ptp, xt_bpf, cls_bpf, etc. As such all users do their own
internal managment of obtaining filter blocks and thus already
have them in kernel memory and set up before calling into
sk_unattached_filter_create(). As a result, due to __user annotation
in sock_fprog, sparse triggers false positives (incorrect type in
assignment [different address space]) when filters are set up before
passing them to sk_unattached_filter_create(). Therefore, let
sk_unattached_filter_create() API use sock_fprog_kern to overcome
this issue.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Lets get rid of this macro. After commit 5bcfedf06f7f ("net: filter:
simplify label names from jump-table"), labels have become more
readable due to omission of BPF_ prefix but at the same time more
generic, so that things like `git grep -n` would not find them. As
a middle path, lets get rid of the DL macro as it's not strictly
needed and would otherwise just hide the full name.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Added new L2TP configuration options to allow TX and RX of
zero checksums in IPv6. Default is not to use them.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
RFC 6935 permits zero checksums to be used in IPv6 however this is
recommended only for certain tunnel protocols, it does not make
checksums completely optional like they are in IPv4.
This patch restricts the use of IPv6 zero checksums that was previously
intoduced. no_check6_tx and no_check6_rx have been added to control
the use of checksums in UDP6 RX and TX path. The normal
sk_no_check_{rx,tx} settings are not used (this avoids ambiguity when
dealing with a dual stack socket).
A helper function has been added (udp_set_no_check6) which can be
called by tunnel impelmentations to all zero checksums (send on the
socket, and accept them as valid).
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Define separate fields in the sock structure for configuring disabling
checksums in both TX and RX-- sk_no_check_tx and sk_no_check_rx.
The SO_NO_CHECK socket option only affects sk_no_check_tx. Also,
removed UDP_CSUM_* defines since they are no longer necessary.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It doesn't seem like an protocols are setting anything other
than the default, and allowing to arbitrarily disable checksums
for a whole protocol seems dangerous. This can be done on a per
socket basis.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
through ip tool.
o min_tx_rate puts lower limit on the VF bandwidth. VF is guaranteed
to have a bandwidth of at least this value.
max_tx_rate puts cap on the VF bandwidth. VF can have a bandwidth
of up to this value.
o A new handler set_vf_rate for attr IFLA_VF_RATE has been introduced
which takes 4 arguments:
netdev, VF number, min_tx_rate, max_tx_rate
o ndo_set_vf_rate replaces ndo_set_vf_tx_rate handler.
o Drivers that currently implement ndo_set_vf_tx_rate should now call
ndo_set_vf_rate instead and reject attempt to set a minimum bandwidth
greater than 0 for IFLA_VF_TX_RATE when IFLA_VF_RATE is not yet
implemented by driver.
o If user enters only one of either min_tx_rate or max_tx_rate, then,
userland should read back the other value from driver and set both
for IFLA_VF_RATE.
Drivers that have not yet implemented IFLA_VF_RATE should always
return min_tx_rate as 0 when read from ip tool.
o If both IFLA_VF_TX_RATE and IFLA_VF_RATE options are specified, then
IFLA_VF_RATE should override.
o Idea is to have consistent display of rate values to user.
o Usage example: -
./ip link set p4p1 vf 0 rate 900
./ip link show p4p1
32: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
DEFAULT qlen 1000
link/ether 00:0e:1e:08:b0:f0 brd ff:ff:ff:ff:ff:ff
vf 0 MAC 3e:a0:ca:bd:ae:5a, tx rate 900 (Mbps), max_tx_rate 900Mbps
vf 1 MAC f6:c6:7c:3f:3d:6c
vf 2 MAC 56:32:43:98:d7:71
vf 3 MAC d6:be:c3:b5:85:ff
vf 4 MAC ee:a9:9a:1e:19:14
vf 5 MAC 4a:d0:4c:07:52:18
vf 6 MAC 3a:76:44:93:62:f9
vf 7 MAC 82:e9:e7:e3:15:1a
./ip link set p4p1 vf 0 max_tx_rate 300 min_tx_rate 200
./ip link show p4p1
32: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
DEFAULT qlen 1000
link/ether 00:0e:1e:08:b0:f0 brd ff:ff:ff:ff:ff:ff
vf 0 MAC 3e:a0:ca:bd:ae:5a, tx rate 300 (Mbps), max_tx_rate 300Mbps,
min_tx_rate 200Mbps
vf 1 MAC f6:c6:7c:3f:3d:6c
vf 2 MAC 56:32:43:98:d7:71
vf 3 MAC d6:be:c3:b5:85:ff
vf 4 MAC ee:a9:9a:1e:19:14
vf 5 MAC 4a:d0:4c:07:52:18
vf 6 MAC 3a:76:44:93:62:f9
vf 7 MAC 82:e9:e7:e3:15:1a
./ip link set p4p1 vf 0 max_tx_rate 600 rate 300
./ip link show p4p1
32: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
DEFAULT qlen 1000
link/ether 00:0e:1e:08:b0:f brd ff:ff:ff:ff:ff:ff
vf 0 MAC 3e:a0:ca:bd:ae:5, tx rate 600 (Mbps), max_tx_rate 600Mbps,
min_tx_rate 200Mbps
vf 1 MAC f6:c6:7c:3f:3d:6c
vf 2 MAC 56:32:43:98:d7:71
vf 3 MAC d6:be:c3:b5:85:ff
vf 4 MAC ee:a9:9a:1e:19:14
vf 5 MAC 4a:d0:4c:07:52:18
vf 6 MAC 3a:76:44:93:62:f9
vf 7 MAC 82:e9:e7:e3:15:1a
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"The biggest commit is an irqtime accounting loop latency fix, the rest
are misc fixes all over the place: deadline scheduling, docs, numa,
balancer and a bad to-idle latency fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/numa: Initialize newidle balance stats in sd_numa_init()
sched: Fix updating rq->max_idle_balance_cost and rq->next_balance in idle_balance()
sched: Skip double execution of pick_next_task_fair()
sched: Use CPUPRI_NR_PRIORITIES instead of MAX_RT_PRIO in cpupri check
sched/deadline: Fix memory leak
sched/deadline: Fix sched_yield() behavior
sched: Sanitize irq accounting madness
sched/docbook: Fix 'make htmldocs' warnings caused by missing description
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"The biggest changes are fixes for races that kept triggering Trinity
crashes, plus liblockdep build fixes and smaller misc fixes.
The liblockdep bits in perf/urgent are a pull mistake - they should
have been in locking/urgent - but by the time I noticed other commits
were added and testing was done :-/ Sorry about that"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix a race between ring_buffer_detach() and ring_buffer_attach()
perf: Prevent false warning in perf_swevent_add
perf: Limit perf_event_attr::sample_period to 63 bits
tools/liblockdep: Remove all build files when doing make clean
tools/liblockdep: Build liblockdep from tools/Makefile
perf/x86/intel: Fix Silvermont's event constraints
perf: Fix perf_event_init_context()
perf: Fix race in removing an event
|
|
In commit ad86622b478e ("wait: swap EXIT_ZOMBIE and EXIT_DEAD to hide
EXIT_TRACE from user-space") the order of task state definitions were
changed: EXIT_DEAD and EXIT_ZOMBIE were swapped. Though the charterers
for the states in TASK_STATE_TO_CHAR_STR string were not updated. This
patch synchronizes the string to the order of definitions.
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Flow SET can accept an empty set of actions, with the intended
semantics of leaving existing actions unmodified. This seems to have
been brokin after OVS 1.7, as we have assigned the flow's actions
pointer to NULL in this case, but we never check for the NULL pointer
later on. This patch restores the intended behavior and documents it
in the include/linux/openvswitch.h.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
|
|
use_prio was added as part of an infrastructure for running FCoE in A0 mode.
FCoE didn't get into Mellanox Upstream driver, and when it will, it won't be
using A0 steering mode.
Therefore we can safely deprecate this module parameter without hurting any
existing user.
CC: Carol Soto <clsoto@linux.vnet.ibm.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2014-05-22
This is the last ipsec pull request before I leave for
a three weeks vacation tomorrow. David, can you please
take urgent ipsec patches directly into net/net-next
during this time?
I'll continue to run the ipsec/ipsec-next trees as soon
as I'm back.
1) Simplify the xfrm audit handling, from Tetsuo Handa.
2) Codingstyle cleanup for xfrm_output, from abian Frederick.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
All in-tree drivers have been converted to use the new pair of
functions: of_is_fixed_phy_link() plus of_phy_register_fixed_link(), we
can now safely remove of_phy_connect_fixed_link.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When combining real_dev's features and vlan_features, simple
bitwise AND is used. This doesn't work well for checksum
offloading features as if one set has NETIF_F_HW_CSUM and the
other NETIF_F_IP_CSUM and/or NETIF_F_IPV6_CSUM, we end up with
no checksum offloading. However, from the logical point of view
(how can_checksum_protocol() works), NETIF_F_HW_CSUM contains
the functionality of NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM so
that the result should be IP/IPV6.
Add helper function netdev_intersect_features() implementing
this logic and use it in vlan_dev_fix_features().
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Although the implementation probably needs a lot of work, this initial API
allows to implement software TSO in mvneta and mv643xx_eth drivers in a not
so intrusive way.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next
Samuel Ortiz <sameo@linux.intel.com> says:
"NFC: 3.16: First pull request
This is the NFC pull request for 3.16. We have:
- STMicroeectronics st21nfca support. The st21nfca is an HCI chipset and
thus relies on the HCI stack. This submission provides support for tag
redaer/writer mode (including Type 5) and device tree bindings.
- PM runtime support and a bunch of bug fixes for TI's trf7970a.
- Device tree support for NXP's pn544. Legacy platform data support is
obviously kept intact.
- NFC Tag type 4B support to the NFC Digital stack.
- SOCK_RAW type support to the raw NFC socket, and allow NCI
sniffing from that. This can be extended to report HCI frames and also
proprietarry ones like e.g. the pn533 ones."
Signed-off-by: John W. Linville <linville@tuxdriver.com>
|
|
Pablo Neira Ayuso says:
====================
Netfilter/nftables updates for net-next
The following patchset contains Netfilter/nftables updates for net-next,
most relevantly they are:
1) Add set element update notification via netlink, from Arturo Borrero.
2) Put all object updates in one single message batch that is sent to
kernel-space. Before this patch only rules where included in the batch.
This series also introduces the generic transaction infrastructure so
updates to all objects (tables, chains, rules and sets) are applied in
an all-or-nothing fashion, these series from me.
3) Defer release of objects via call_rcu to reduce the time required to
commit changes. The assumption is that all objects are destroyed in
reverse order to ensure that dependencies betweem them are fulfilled
(ie. rules and sets are destroyed first, then chains, and finally
tables).
4) Allow to match by bridge port name, from Tomasz Bursztyka. This series
include two patches to prepare this new feature.
5) Implement the proper set selection based on the characteristics of the
data. The new infrastructure also allows you to specify your preferences
in terms of memory and computational complexity so the underlying set
type is also selected according to your needs, from Patrick McHardy.
6) Several cleanup patches for nft expressions, including one minor possible
compilation breakage due to missing mark support, also from Patrick.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Experience with the recent e114a710aa50 ("tcp: fix cwnd limited
checking to improve congestion control") has shown that there are
common cases where that commit can cause cwnd to be much larger than
necessary. This leads to TSO autosizing cooking skbs that are too
large, among other things.
The main problems seemed to be:
(1) That commit attempted to predict the future behavior of the
connection by looking at the write queue (if TSO or TSQ limit
sending). That prediction sometimes overestimated future outstanding
packets.
(2) That commit always allowed cwnd to grow to twice the number of
outstanding packets (even in congestion avoidance, where this is not
needed).
This commit improves both of these, by:
(1) Switching to a measurement-based approach where we explicitly
track the largest number of packets in flight during the past window
("max_packets_out"), and remember whether we were cwnd-limited at the
moment we finished sending that flight.
(2) Only allowing cwnd to grow to twice the number of outstanding
packets ("max_packets_out") in slow start. In congestion avoidance
mode we now only allow cwnd to grow if it was fully utilized.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Channels in 2.4GHz band overlap, this means that if we
send a probe request on channel 1 and then move to channel
2, we will hear the probe response on channel 2. In this
case, the RSSI will be lower than if we had heard it on
the channel on which it was sent (1 in this case).
The firmware / low level driver can parse the channel in
the DS IE or HT IE and compensate the RSSI so that it will
still have a valid value even if we heard the frame on an
adjacent channel. This can be done up to a certain offset.
Add this offset as a configuration for the low level driver.
A low level driver that can compensate the low RSSI in this
case should assign the maximal offset for which the RSSI
value is still valid.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Kernel API for classic BPF socket filters is:
sk_unattached_filter_create() - validate classic BPF, convert, JIT
SK_RUN_FILTER() - run it
sk_unattached_filter_destroy() - destroy socket filter
Cleanup internal BPF kernel API as following:
sk_filter_select_runtime() - final step of internal BPF creation.
Try to JIT internal BPF program, if JIT is not available select interpreter
SK_RUN_FILTER() - run it
sk_filter_free() - free internal BPF program
Disallow direct calls to BPF interpreter. Execution of the BPF program should
be done with SK_RUN_FILTER() macro.
Example of internal BPF create, run, destroy:
struct sk_filter *fp;
fp = kzalloc(sk_filter_size(prog_len), GFP_KERNEL);
memcpy(fp->insni, prog, prog_len * sizeof(fp->insni[0]));
fp->len = prog_len;
sk_filter_select_runtime(fp);
SK_RUN_FILTER(fp, ctx);
sk_filter_free(fp);
Sockets, seccomp, testsuite, tracing are using different ways to populate
sk_filter, so first steps of program creation are not common.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull renameat2 arch support from Miklos Szeredi:
"I've collected architecture patches for the renameat2 syscall that
maintainers acked and/or asked me to queue.
This adds architecture support for the renameat2 syscall to m68k,
parisc, ia64 and through asm-generic to arc, arm64, c6x, hexagon,
metag, openrisc, score, tile, unicore32"
* 'renameat2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
scripts/checksyscalls.sh: Make renameat optional
asm-generic: Add renameat2 syscall
ia64: add renameat2 syscall
parisc: add renameat2 syscall
m68k: add renameat2 syscall
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are two driver core (well, sysfs) fixes for 3.15-rc6 that resolve
some reported issues and a regression from 3.13"
* tag 'driver-core-3.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
sysfs: make sure read buffer is zeroed
kernfs, sysfs, cgroup: restrict extra perm check on open to sysfs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull more cgroup fixes from Tejun Heo:
"Three more patches to fix cgroup_freezer breakage due to the recent
cgroup internal locking changes - an operation cgroup_freezer was
using now requires sleepable context and cgroup_freezer was invoking
that while holding a spin lock. cgroup_freezer was using an overly
elaborate hierarchical locking scheme.
While it's possible to convert the hierarchical spinlocks directly to
mutexes, this patch simplifies the overall locking so that it uses a
global mutex. This has the added benefit of avoiding iterating
potentially huge number of tasks under a spinlock. While the patch is
on the larger side in the devel cycle, the changes made are mostly
straight-forward and the locking logic is a lot simpler afterwards"
* 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: fix rcu_read_lock() leak in update_if_frozen()
cgroup_freezer: replace freezer->lock with freezer_mutex
cgroup: introduce task_css_is_root()
|
|
Pull device tree fixes from Grant Likely:
"Drivercore bugfixes for v3.15
This branch contains bug fixes important to get into v3.15. There is
a fix for modifying properties seen during early boot, a fix for an
incorrect prototype when CONFIG_OF=n, and a couple of corrections to
device tree memory nodes on a few platforms"
* tag 'dt-for-linus' of git://git.secretlab.ca/git/linux:
mips: dts: Fix missing device_type="memory" property in memory nodes
arm: dts: Fix missing device_type="memory" for ste-ccu8540
of: fix CONFIG_OF=n prototype of of_node_full_name()
of: make of_update_property() usable earlier in the boot process
|
|
Implement and export the new cfg80211_get_station() API.
This utility can be used by other kernel modules to obtain
detailed information about a given wireless station.
It will be in particular useful to batman-adv which will
implement a wireless rate based metric.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Add get_expected_throughput() API to mac80211 so that each
driver can implement its own version based on the RC
algorithm they are using (might be using an HW RC algo).
The API returns a value expressed in Kbps.
Also, add the new get_expected_throughput() member
to the rate_control_ops structure in order to be
able to query the RC algorithm (this patch provides an
implementation of this API for both minstrel and
minstrel_ht).
The related member in the station_info object is now
filled accordingly when dumping a station.
Cc: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
git://gitorious.org/linux-can/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2014-05-19
this is a pull request of 13 patches for net-next/master.
A patch by Dan Carpenter fixes a coccinelle warning in the mcp251x
driver. Jean Delvare contributes three patches to tightening the
Kconfig dependencies for some drivers. Then come three patches by Pavel
Machek that improve the c_can driver support on the socfpga platform.
Sergei Shtylyov's patch brings support for the CAN hardware found on
Renesas R-Car CAN controllers. Four patches by Oliver Hartkopp, the
first cleans up the guard macros in the CAN headers the other three
improve the EFF frame filtering. Maximilian Schneider's patch adds
support for the GS_USB CAN devices.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The upper timer_interval limit is arbitrary and much higher
than anything usable in the real world. Reducing it from 15s
to ~4s to make the timer_interval fit in an u32 does not make
much difference. The limit is still outside the practical
bounds.
This eliminates the need for a 64bit timer_interval, fixing a
build error related to 64bit division:
drivers/built-in.o: In function `cdc_ncm_get_coalesce':
ak8975.c:(.text+0x1ac994): undefined reference to `__aeabi_uldivmod'
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The new function vlan_get_encap_level() uses vlan_dev_priv()
which is only conditionally avaialble when VLAN support is
enabled. Make vlan_get_encap_level() conditionally available
as well.
Fixes: 44a4085538c8 ("bonding: Fix stacked device detection in arp monitoring")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
CC: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Users may need information about the expected throughput
towards a given peer.
This value is supposed to consider the size overhead
generated by the 802.11 header.
This value is exported in kbps through the get_station() API
by including it into the station_info object.
Moreover, it is sent to user space when replying to the
nl80211 GET_STATION command.
This information will be useful to the batman-adv module
which will use it for its new metric computation.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Add the renameat2 syscall to the generic syscall list, which is used by the
following architectures: arc, arm64, c6x, hexagon, metag, openrisc, score,
tile, unicore32.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: linux-arch@vger.kernel.org
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: linux-hexagon@vger.kernel.org
Cc: linux-metag@vger.kernel.org
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
|
|
Pull MIPS fixes from Ralf Baechle:
"MIPS fixes for various loose ends:
- Fix workarounds for R4000 erratum.
- Patch up DEC, Siemens-Nixdorf and Loongson hardware support.
- Wire up renameat2 syscall.
- Delete unused file - it was causing false warnings from maintenance
scripts.
- Revert a patch because it's functionality is now implemented twice
which causes superfluous /proc/cpuinfo output.
- Fix a microMIPS regression"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: mm: Fix broken microMIPS kernel regression.
MIPS: Add new AUDIT_ARCH token for the N32 ABI on MIPS64
MIPS: Wire up renameat2 syscall.
MIPS: inst.h: Rename BITFIELD_FIELD to __BITFIELD_FIELD.
MIPS: Remove file missed when removing rm9k support a while ago.
MIPS/loongson2_cpufreq: Fix CPU clock rate setting
MIPS: Loongson: No need to select GENERIC_HARDIRQS_NO__DO_IRQ
MIPS: csum_partial.S CPU_DADDI_WORKAROUNDS bug fix
MIPS: __strncpy_from_user_asm CPU_DADDI_WORKAROUNDS bug fix
MIPS: __delay CPU_DADDI_WORKAROUNDS bug fix
MIPS: DEC/SNI: O32 wrapper stack switching fixes
MIPS: DEC: Bus error handler <asm/cpu-type.h> fixes
MAINTAINERS: TURBOchannel: Update entry
Revert "MIPS: MT: proc: Add support for printing VPE and TC ids"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag
Pull Metag architecture and related fixes from James Hogan:
"Mostly fixes for metag and parisc relating to upgrowing stacks.
- Fix missing compiler barriers in metag memory barriers.
- Fix BUG_ON on metag when RLIMIT_STACK hard limit is increased
beyond safe value.
- Make maximum stack size configurable. This reduces the default
user stack size back to 80MB (especially on parisc after their
removal of _STK_LIM_MAX override). This only affects metag and
parisc.
- Remove metag _STK_LIM_MAX override to match other arches and follow
parisc, now that it is safe to do so (due to the BUG_ON fix
mentioned above).
- Finally now that both metag and parisc _STK_LIM_MAX overrides have
been removed, it makes sense to remove _STK_LIM_MAX altogether"
* tag 'metag-for-v3.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag:
asm-generic: remove _STK_LIM_MAX
metag: Remove _STK_LIM_MAX override
parisc,metag: Do not hardcode maximum userspace stack size
metag: Reduce maximum stack size to 256MB
metag: fix memory barriers
|