summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2014-09-05enic: implement rx_copybreakGovindarajulu Varadarajan2-3/+48
Calling dma_map_single()/dma_unmap_single() is quite expensive compared to copying a small packet. So let's copy short frames and keep the buffers mapped. Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-05dev_ioctl: remove dev_load() CAP_SYS_MODULE messageDaniel Borkmann1-5/+2
Marcel reported to see the following message when autoloading is being triggered when adding nlmon device: Loading kernel module for a network device with CAP_SYS_MODULE (deprecated). Use CAP_NET_ADMIN and alias netdev-nlmon instead. This false-positive happens despite with having correct capabilities set, e.g. through issuing `ip link del dev nlmon` more than once on a valid device with name nlmon, but Marcel has also seen it on creation time when no nlmon module is previously compiled-in or loaded as module and the device name equals a link type name (e.g. nlmon, vxlan, team). Stephen says: The netdev module alias is a hold over from the past. For normal devices, people used to create a alias eth0 to and point it to the type of network device used, that was back in the bad old ISA days before real discovery. Also, the tunnels create module alias for the control device and ip used to use this to autoload the tunnel device. The message is bogus and should just be removed, I also see it in a couple of other cases where tap devices are renamed for other usese. As mentioned in 8909c9ad8ff0 ("net: don't allow CAP_NET_ADMIN to load non-netdev kernel modules"), we nevertheless still might want to leave the old autoloading behaviour in place as it could break old scripts, so for now, lets just remove the log message as Stephen suggests. Reference: http://thread.gmane.org/gmane.linux.kernel/1105168 Reported-by: Marcel Holtmann <marcel@holtmann.org> Suggested-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-05net: bpf: make eBPF interpreter images read-onlyDaniel Borkmann11-32/+144
With eBPF getting more extended and exposure to user space is on it's way, hardening the memory range the interpreter uses to steer its command flow seems appropriate. This patch moves the to be interpreted bytecode to read-only pages. In case we execute a corrupted BPF interpreter image for some reason e.g. caused by an attacker which got past a verifier stage, it would not only provide arbitrary read/write memory access but arbitrary function calls as well. After setting up the BPF interpreter image, its contents do not change until destruction time, thus we can setup the image on immutable made pages in order to mitigate modifications to that code. The idea is derived from commit 314beb9bcabf ("x86: bpf_jit_comp: secure bpf jit against spraying attacks"). This is possible because bpf_prog is not part of sk_filter anymore. After setup bpf_prog cannot be altered during its life-time. This prevents any modifications to the entire bpf_prog structure (incl. function/JIT image pointer). Every eBPF program (including classic BPF that are migrated) have to call bpf_prog_select_runtime() to select either interpreter or a JIT image as a last setup step, and they all are being freed via bpf_prog_free(), including non-JIT. Therefore, we can easily integrate this into the eBPF life-time, plus since we directly allocate a bpf_prog, we have no performance penalty. Tested with seccomp and test_bpf testsuite in JIT/non-JIT mode and manual inspection of kernel_page_tables. Brad Spengler proposed the same idea via Twitter during development of this patch. Joint work with Hannes Frederic Sowa. Suggested-by: Brad Spengler <spender@grsecurity.net> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Kees Cook <keescook@chromium.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-04net: systemport: update UMAC_CMD only when link is detectedFlorian Fainelli1-3/+6
When we bring the interface down, phy_stop() will schedule the PHY state machine to call our link adjustment callback. By the time we do so, we may have clock gated off the SYSTEMPORT hardware block, and this will cause bus errors to happen in bcm_sysport_adj_link(): Make sure that we only touch the UMAC_CMD register when there is an actual link. This is safe to do for two reasons: - updating the Ethernet MAC registers only make sense when a physical link is present - the PHY library state machine first set phydev->link = 0 before invoking phydev->adjust_link in the PHY_HALTED case This is a similar fix to the GENET one: c677ba8b3c47650358572091ed8a6af50bfca877 ("net: bcmgenet: update UMAC_CMD only when link is detected"). Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-04ipv4: implement igmp_qrv sysctl to tune igmp robustness variableHannes Frederic Sowa4-16/+31
As in IPv6 people might increase the igmp query robustness variable to make sure unsolicited state change reports aren't lost on the network. Add and document this new knob to igmp code. RFCs allow tuning this parameter back to first IGMP RFC, so we also use this setting for all counters, including source specific multicast. Also take over sysctl value when upping the interface and don't reuse the last one seen on the interface. Cc: Flavio Leitner <fbl@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-04ipv6: add sysctl_mld_qrv to configure query robustness variableHannes Frederic Sowa4-10/+31
This patch adds a new sysctl_mld_qrv knob to configure the mldv1/v2 query robustness variable. It specifies how many retransmit of unsolicited mld retransmit should happen. Admins might want to tune this on lossy links. Also reset mld state on interface down/up, so we pick up new sysctl settings during interface up event. IPv6 certification requests this knob to be available. I didn't make this knob netns specific, as it is mostly a setting in a physical environment and should be per host. Cc: Flavio Leitner <fbl@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-03mlx4_en: Convert the normal skb free path to dev_consume_skb_any()Rick Jones1-1/+1
It would appear the mlx4_en driver was still making a call to dev_kfree_skb_any() where dev_consume_skb_any() would be more appropriate. This should make dropped packet profiling/tracking easier/better over a NIC driven by mlx4_en. Signed-off-by: Rick Jones <rick.jones2@hp.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-03lib/rhashtable: allow user to set the minimum shifts of shrinkingYing Xue2-4/+10
Although rhashtable library allows user to specify a quiet big size for user's created hash table, the table may be shrunk to a very small size - HASH_MIN_SIZE(4) after object is removed from the table at the first time. Subsequently, even if the total amount of objects saved in the table is quite lower than user's initial setting in a long time, the hash table size is still dynamically adjusted by rhashtable_shrink() or rhashtable_expand() each time object is inserted or removed from the table. However, as synchronize_rcu() has to be called when table is shrunk or expanded by the two functions, we should permit user to set the minimum table size through configuring the minimum number of shifts according to user specific requirement, avoiding these expensive actions of shrinking or expanding because of calling synchronize_rcu(). Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-03qdisc: validate frames going through the direct_xmit pathJesper Dangaard Brouer1-2/+8
In commit 50cbe9ab5f8d ("net: Validate xmit SKBs right when we pull them out of the qdisc") the validation code was moved out of dev_hard_start_xmit and into dequeue_skb. However this overlooked the fact that we do not always enqueue the skb onto a qdisc. First situation is if qdisc have flag TCQ_F_CAN_BYPASS and qdisc is empty. Second situation is if there is no qdisc on the device, which is a common case for software devices. Originally spotted and inital patch by Alexander Duyck. As a result Alex was seeing issues trying to connect to a vhost_net interface after commit 50cbe9ab5f8d was applied. Added a call to validate_xmit_skb() in __dev_xmit_skb(), in the code path for qdiscs with TCQ_F_CAN_BYPASS flag, and in __dev_queue_xmit() when no qdisc. Also handle the error situation where dev_hard_start_xmit() could return a skb list, and does not return dev_xmit_complete(rc) and falls through to the kfree_skb(), in that situation it should call kfree_skb_list(). Fixes: 50cbe9ab5f8d ("net: Validate xmit SKBs right when we pull them out of the qdisc") Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-03qdisc: exit case fixes for skb list handling in qdisc layerJesper Dangaard Brouer1-2/+2
More minor fixes to merge commit 53fda7f7f9e (Merge branch 'xmit_list') that allows us to work with a list of SKBs. Fixing exit cases in qdisc_reset() and qdisc_destroy(), where a leftover requeued SKB (qdisc->gso_skb) can have the potential of being a skb list, thus use kfree_skb_list(). This is a followup to commit 10770bc2d1 ("qdisc: adjustments for API allowing skb list xmits"). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-02qdisc: adjustments for API allowing skb list xmitsJesper Dangaard Brouer1-4/+4
Minor adjustments for merge commit 53fda7f7f9e (Merge branch 'xmit_list') that allows us to work with a list of SKBs. Update code doc to function sch_direct_xmit(). In handle_dev_cpu_collision() use kfree_skb_list() in error handling. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-02ethernet: arc: remove unused devJingoo Han1-1/+0
Remove unused 'dev' variable from arc_emac_remove(), since it's not being used any more. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-02ethernet: nvidia: Remove extra parensDavid Wood1-1/+1
Remove unnecessary double parenthesis around if statement. Signed-off-by: David Wood <devel@dtwood.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-02Merge branch 'netdev_modified'David S. Miller1-24/+42
Nicolas Dichtel says: ==================== rtnl: send notification in do_setlink() This series ensures to call the notifier chain and to send a netlink message when a change is done by do_setlink(). The three first patches mainly prepare the last one, which do this change. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-02rtnl/do_setlink(): notify when a netdev is modifiedNicolas Dichtel1-14/+21
Depending on which parameters were updated, the changes were not propagated via the notifier chain and netlink. The new flag has been set only when the change did not cause a call to the notifier chain and/or to the netlink notification functions. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-02rtnl/do_setlink(): last arg is now a set of flagsNicolas Dichtel1-21/+22
There is no functional changes with this commit, it only prepares the next one. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-02rtnl/do_setlink(): set modified when IFLA_LINKMODE is updatedNicolas Dichtel1-1/+5
The only effect of this patch is to print a warning if IFLA_LINKMODE is updated and a following change fails. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-02rtnl/do_setlink(): set modified when IFLA_TXQLEN is updatedNicolas Dichtel1-2/+8
The only effect of this patch is to print a warning if IFLA_TXQLEN is updated and a following change fails. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-02Merge branch 'be2net-next'David S. Miller6-141/+233
Sathya Perla says: ==================== be2net: patch set v2 changes: add a new line after variable declaration in patch 12. *** Patch 1 adds a few new log messages to help debugging in failure cases. Patch 2 uses new macros for parsing RX/TX completions and TX wrbs to help shorten the lines. Patch 3 adds a description for the RX counter rx_input_fifo_overflow_drop. Patch 4 adds TX completion error statistics reporting via ethtool. Patch 5 adds a dma_mapping_error counter and its reporting via ethtool. Patch 6 fixes up log messages in the Lancer FW download path. Patch 7 replaces gotos with direct return statements. Patch 8 cleans up be_change_mtu() code by using a new macro BE_MAX_MTU Patch 9 makes be_cmd_get_regs() routine to return an integer status similar to other FW cmd routines in be_cmds.c Patch 10 gets rid of TX budget as enforcing a budget on TX completion processing in NAPI is neither suggested nor it provides a performance benefit. Patch 11 defines and uses a new macro for_all_tx_queues_on_eq() similar to the RX processing code. Patch 12 queries max_tx_qs from the FW for BE3 super-nic profiles. For those profiles, the driver cannot assume a constant BE3_MAX_TX_QS value, as the value may change for each function. Please consider applying this patch set to the net-next tree. Thanks! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-02be2net: query max_tx_qs for BE3 super-nic profile from FWSuresh Reddy1-2/+12
In the BE3 super-nic profile, the max_tx_qs value can vary for each function. So the driver needs to query this value from FW instead of using the pre-defined constant BE3_MAX_TX_QS. Signed-off-by: Suresh Reddy <Suresh.Reddy@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-02be2net: define macro for_all_tx_queues_on_eq()Sathya Perla2-3/+7
Replace the for() loop that traverses all the TX queues on an EQ with the macro for_all_tx_queues_on_eq(). With this expalnatory name, the one line comment is not required anymore. Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-02be2net: get rid of TX budgetSathya Perla2-21/+11
Enforcing a budget on the TX completion processing in NAPI doesn't benefit performance in anyway. Just get rid of it. Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-02be2net: make be_cmd_get_regs() return a statusVasundhara Volam2-6/+6
There are a few failure cases in be_cmd_get_regs() that ideally must return an error value. This style is used across all the routines in be_cmds.c with this routine being an exception. This patch fixes this. Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-02be2net: define BE_MAX_MTUKalesh AP2-7/+9
This patch defines a new macro BE_MAX_MTU to make the code in be_change_mtu() more readable. Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-02be2net: remove unncessary gotosKalesh AP1-10/+6
In cases where there is no extra code to handle an error, this patch replaces gotos with a direct return statement. Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-02be2net: fix log messages in lancer FW download pathKalesh AP1-19/+11
Log messages in the Lancer FW download path have issues such as: - a single message spanning multiple lines - the success message is logged even in failure cases - status codes are already logged in the FW cmd routines This patch fixes these issues. Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-02be2net: Add a dma_mapping_error counter in ethtoolVasundhara Volam3-1/+5
Add a dma_mapping_error counter to count the number of packets dropped due to DMA mapping errors. Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-02be2net: Add TX completion error statistics in ethtoolKalesh AP4-0/+92
HW reports TX completion errors in TX completion. This patch adds these counters to ethtool statistics. Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-02be2net: add a description for counter rx_input_fifo_overflow_dropSathya Perla1-0/+5
Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-02be2net: shorten AMAP_GET/SET_BITS() macro callsSathya Perla2-69/+58
The AMAP_GET/SET_BITS() macro calls take structure name as a parameter and hence are long and span more than one line. Replace these calls with a wrapper macros for RX/Tx compls and TX wrb. This results in fewer lines and more readable code in be_main.c Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-02be2net: add a few log messagesSathya Perla2-8/+16
This patch adds the following log messages to help debugging failure cases: 1) log FW version number: this is useful when driver initialization fails and the FW version number cannot be queried via ethtool 2) per function resource limits for BEx chips: these values are currently being printed only for Skyhawk and Lancer 3) PCI BAR mapping failure 4) function_mode/caps queried from FW: this helps catch any FW bugs that could advertise wrong capabilities to the driver Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01sock: deduplicate errqueue dequeueWillem de Bruijn6-51/+28
sk->sk_error_queue is dequeued in four locations. All share the exact same logic. Deduplicate. Also collapse the two critical sections for dequeue (at the top of the recv handler) and signal (at the bottom). This moves signal generation for the next packet forward, which should be harmless. It also changes the behavior if the recv handler exits early with an error. Previously, a signal for follow-up packets on the errqueue would then not be scheduled. The new behavior, to always signal, is arguably a bug fix. For rxrpc, the change causes the same function to be called repeatedly for each queued packet (because the recv handler == sk_error_report). It is likely that all packets will fail for the same reason (e.g., memory exhaustion). This code runs without sk_lock held, so it is not safe to trust that sk->sk_err is immutable inbetween releasing q->lock and the subsequent test. Introduce int err just to avoid this potential race. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01net-timestamp: expand documentationWillem de Bruijn3-84/+764
Expand Documentation/networking/timestamping.txt with new interfaces and bytestream timestamping. Also minor cleanup of the other text. Import txtimestamp.c test of the new features. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01Merge branch 'csums-next'David S. Miller12-21/+135
Tom Herbert says: ==================== net: Checksum offload changes - Part VI I am working on overhauling RX checksum offload. Goals of this effort are: - Specify what exactly it means when driver returns CHECKSUM_UNNECESSARY - Preserve CHECKSUM_COMPLETE through encapsulation layers - Don't do skb_checksum more than once per packet - Unify GRO and non-GRO csum verification as much as possible - Unify the checksum functions (checksum_init) - Simplify code What is in this seventh patch set: - Add skb->csum. This allows a device or GRO to indicate that an invalid checksum was detected. - Checksum unncessary to checksum complete conversions. With these changes, I believe that the third goal of the overhaul is now mostly achieved. In the case of no encapsulation or one layer of encapsulation, there should only be at most one skb_checksum over each packet (between GRO and normal path). In the case of two layers of encapsulation, it is still possible with the right combination of non-zero and zero UDP checksums to have >1 skb_checksum. For instance: IP>GRE(with csum)>IP>UDP(zero csum)>VXLAN>IP>UDP(non-zero csum), would likely necessiate an skb_checksum in GRO and normal path. This doesn't seem like a common scenario at all so I'm inclined to not address this now, if multiple layers of encapsulation becomes popular we can reassess. Note that checksum conversion shows a nice improvement for RX VXLAN when outer UDP checksum is enabled (12.65% CPU compared to 20.94%). This is not only from the fact that we don't need checksum calculation on the host, but also allows GRO for VXLAN in this case. Checksum conversion does not help send side (which still needs to perform a checksum on host). For that we will implement remote checksum offload in a later patch (http://tools.ietf.org/html/draft-herbert-remotecsumoffload-00). Please review carefully and test if possible, mucking with basic checksum functions is always a little precarious :-) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01l2tp: Enable checksum unnecessary conversions for l2tp/UDP socketsTom Herbert1-0/+2
Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01vxlan: Enable checksum unnecessary conversions for vxlan/UDP socketsTom Herbert1-0/+2
Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01gre: Add support for checksum unnecessary conversionsTom Herbert2-2/+10
Call skb_checksum_try_convert and skb_gro_checksum_try_convert after checksum is found present and validated in the GRE header for normal and GRO paths respectively. In GRO path, call skb_gro_checksum_try_convert Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01udp: Add support for doing checksum unnecessary conversionTom Herbert5-16/+57
Add support for doing CHECKSUM_UNNECESSARY to CHECKSUM_COMPLETE conversion in UDP tunneling path. In the normal UDP path, we call skb_checksum_try_convert after locating the UDP socket. The check is that checksum conversion is enabled for the socket (new flag in UDP socket) and that checksum field is non-zero. In the UDP GRO path, we call skb_gro_checksum_try_convert after checksum is validated and checksum field is non-zero. Since this is already in GRO we assume that checksum conversion is always wanted. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01net: Infrastructure for checksum unnecessary conversionsTom Herbert2-0/+40
For normal path, added skb_checksum_try_convert which is called to attempt to convert CHECKSUM_UNNECESSARY to CHECKSUM_COMPLETE. The primary condition to allow this is that ip_summed is CHECKSUM_NONE and csum_valid is true, which will be the state after consuming a CHECKSUM_UNNECESSARY. For GRO path, added skb_gro_checksum_try_convert which is the GRO analogue of skb_checksum_try_convert. The primary condition to allow this is that NAPI_GRO_CB(skb)->csum_cnt == 0 and NAPI_GRO_CB(skb)->csum_valid is set. This implies that we have consumed all available CHECKSUM_UNNECESSARY checksums in the GRO path. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01net: Support for csum_bad in skbuffTom Herbert3-3/+24
This flag indicates that an invalid checksum was detected in the packet. __skb_mark_checksum_bad helper function was added to set this. Checksums can be marked bad from a driver or the GRO path (the latter is implemented in this patch). csum_bad is checked in __skb_checksum_validate_complete (i.e. calling that when ip_summed == CHECKSUM_NONE). csum_bad works in conjunction with ip_summed value. In the case that ip_summed is CHECKSUM_NONE and csum_bad is set, this implies that the first (or next) checksum encountered in the packet is bad. When ip_summed is CHECKSUM_UNNECESSARY, the first checksum after the last one validated is bad. For example, if ip_summed == CHECKSUM_UNNECESSARY, csum_level == 1, and csum_bad is set-- then the third checksum in the packet is bad. In the normal path, the packet will be dropped when processing the protocol layer of the bad checksum: __skb_decr_checksum_unnecessary called twice for the good checksums changing ip_summed to CHECKSUM_NONE so that __skb_checksum_validate_complete is called to validate the third checksum and that will fail since csum_bad is set. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01r8152: rename rx_buf_szhayeswang1-11/+11
The variable "rx_buf_sz" is used by both tx and rx buffers. Replace it with "agg_buf_sz". Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01net: phy: mdio-bcm-unimac: NULL-terminate unimac_mdio_idsFlorian Fainelli1-0/+1
drivers/net/phy/mdio-bcm-unimac.c:195:37-38: unimac_mdio_ids is not NULL terminated at line 195 Make sure of_device_id tables are NULL terminated Generated by: scripts/coccinelle/misc/of_table.cocci Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01net: dsa: make dsa_pack_type staticFlorian Fainelli1-1/+1
net/dsa/dsa.c:624:20: sparse: symbol 'dsa_pack_type' was not declared. Should it be static? Fixes: 3e8a72d1dae374 ("net: dsa: reduce number of protocol hooks") Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01bonding: add slave_changelink support and use it for queue_idNikolay Aleksandrov1-0/+29
This patch adds support for slave_changelink to the bonding and uses it to give the ability to change the queue_id of the enslaved devices via netlink. It sets slave_maxtype and uses bond_changelink as a prototype for bond_slave_changelink. Example/test command after the iproute2 patch: ip link set eth0 type bond_slave queue_id 10 CC: David S. Miller <davem@davemloft.net> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> Suggested-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01tcp: whitespace fixesstephen hemminger15-123/+104
Fix places where there is space before tab, long lines, and awkward if(){, double spacing etc. Add blank line after declaration/initialization. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01net: systemport: tell RXCHK if we are using Broadcom tagsFlorian Fainelli1-0/+9
When Broadcom tags are enabled, e.g: when interfaced to an Ethernet switch, make sure that we tell the RXCHK engine that it should be expecting a 4-bytes Broadcom tag after the Ethernet MAC Source Address. Use netdev_uses_dsa() to check for that condition since that will tell us if a switch is attached to our network interface. Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01pktgen: add flag NO_TIMESTAMP to disable timestampingJesper Dangaard Brouer1-3/+16
Then testing the TX limits of the stack, then it is useful to be-able to disable the do_gettimeofday() timetamping on every packet. This implements a pktgen flag NO_TIMESTAMP which will disable this call to do_gettimeofday(). The performance change on (my system E5-2695) with skb_clone=0, goes from TX 2,423,751 pps to 2,567,165 pps with flag NO_TIMESTAMP. Thus, the cost of do_gettimeofday() or saving is approx 23 nanosec. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01bnx2x: fix tunneled GSO over IPv6Dmitry Kravkov1-1/+1
Set correct bit for packed description. Introduced in e42780b66aab88d3a82b6087bcd6095b90eecde7 bnx2x: Utilize FW 7.10.51 Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01bnx2x: prevent incorrect byte-swap in BEDmitry Kravkov1-20/+0
Fixes incorrectly defined struct in FW HSI for BE platform. Affects tunneling, tx-switching and anti-spoofing. Introduced in e42780b66aab88d3a82b6087bcd6095b90eecde7 bnx2x: Utilize FW 7.10.51 Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01tipc: add name distributor resiliency queueErik Hugne6-7/+95
TIPC name table updates are distributed asynchronously in a cluster, entailing a risk of certain race conditions. E.g., if two nodes simultaneously issue conflicting (overlapping) publications, this may not be detected until both publications have reached a third node, in which case one of the publications will be silently dropped on that node. Hence, we end up with an inconsistent name table. In most cases this conflict is just a temporary race, e.g., one node is issuing a publication under the assumption that a previous, conflicting, publication has already been withdrawn by the other node. However, because of the (rtt related) distributed update delay, this may not yet hold true on all nodes. The symptom of this failure is a syslog message: "tipc: Cannot publish {%u,%u,%u}, overlap error". In this commit we add a resiliency queue at the receiving end of the name table distributor. When insertion of an arriving publication fails, we retain it in this queue for a short amount of time, assuming that another update will arrive very soon and clear the conflict. If so happens, we insert the publication, otherwise we drop it. The (configurable) retention value defaults to 2000 ms. Knowing from experience that the situation described above is extremely rare, there is no risk that the queue will accumulate any large number of items. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>