summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2015-03-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller3-0/+12
Conflicts: net/netfilter/nf_tables_core.c The nf_tables_core.c conflict was resolved using a conflict resolution from Stephen Rothwell as a guide. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23rhashtable: Fix sleeping inside RCU critical section in walk_stopHerbert Xu1-0/+2
The commit 963ecbd41a1026d99ec7537c050867428c397b89 ("rhashtable: Fix use-after-free in rhashtable_walk_stop") fixed a real bug but created another one because we may end up sleeping inside an RCU critical section. This patch fixes it properly by replacing the mutex with a spin lock that specifically protects the walker lists. Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23ipv6: introduce idgen_delay and idgen_retries knobsHannes Frederic Sowa1-0/+2
This is specified by RFC 7217. Cc: Erik Kline <ek@google.com> Cc: Fernando Gont <fgont@si6networks.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23ipv6: do retries on stable privacy addressesHannes Frederic Sowa1-0/+1
If a DAD conflict is detected, we want to retry privacy stable address generation up to idgen_retries (= 3) times with a delay of idgen_delay (= 1 second). Add the logic to addrconf_dad_failure. By design, we don't clean up dad failed permanent addresses. Cc: Erik Kline <ek@google.com> Cc: Fernando Gont <fgont@si6networks.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23ipv6: collapse state_lock and lockHannes Frederic Sowa1-2/+1
Cc: Erik Kline <ek@google.com> Cc: Fernando Gont <fgont@si6networks.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23ipv6: introduce IFA_F_STABLE_PRIVACY flagHannes Frederic Sowa1-0/+1
We need to mark appropriate addresses so we can do retries in case their DAD failed. Cc: Erik Kline <ek@google.com> Cc: Fernando Gont <fgont@si6networks.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23ipv6: generation of stable privacy addresses for link-local and autoconfHannes Frederic Sowa1-0/+1
This patch implements the stable privacy address generation for link-local and autoconf addresses as specified in RFC7217. RID = F(Prefix, Net_Iface, Network_ID, DAD_Counter, secret_key) is the RID (random identifier). As the hash function F we chose one round of sha1. Prefix will be either the link-local prefix or the router advertised one. As Net_Iface we use the MAC address of the device. DAD_Counter and secret_key are implemented as specified. We don't use Network_ID, as it couples the code too closely to other subsystems. It is specified as optional in the RFC. As Net_Iface we only use the MAC address: we simply have no stable identifier in the kernel we could possibly use: because this code might run very early, we cannot depend on names, as they might be changed by user space early on during the boot process. A new address generation mode is introduced, IN6_ADDR_GEN_MODE_STABLE_PRIVACY. With iproute2 one can switch back to none or eui64 address configuration mode although the stable_secret is already set. We refuse writes to ipv6/conf/all/stable_secret but only allow ipv6/conf/default/stable_secret and the interface specific file to be written to. The default stable_secret is used as the parameter for the namespace, the interface specific can overwrite the secret, e.g. when switching a network configuration from one system to another while inheriting the secret. Cc: Erik Kline <ek@google.com> Cc: Fernando Gont <fgont@si6networks.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23ipv6: introduce secret_stable to ipv6_devconfHannes Frederic Sowa2-0/+5
This patch implements the procfs logic for the stable_address knob: The secret is formatted as an ipv6 address and will be stored per interface and per namespace. We track initialized flag and return EIO errors until the secret is set. We don't inherit the secret to newly created namespaces. Cc: Erik Kline <ek@google.com> Cc: Fernando Gont <fgont@si6networks.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23rhashtable: Add immediate rehash during insertionHerbert Xu1-5/+37
This patch reintroduces immediate rehash during insertion. If we find during insertion that the table is full or the chain length exceeds a set limit (currently 16 but may be disabled with insecure_elasticity) then we will force an immediate rehash. The rehash will contain an expansion if the table utilisation exceeds 75%. If this rehash fails then the insertion will fail. Otherwise the insertion will be reattempted in the new hash table. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23rhashtable: Add multiple rehash supportHerbert Xu1-12/+14
This patch adds the missing bits to allow multiple rehashes. The read-side as well as remove already handle this correctly. So it's only the rehasher and insertion that need modification to handle this. Note that this patch doesn't actually enable it so for now rehashing is still only performed by the worker thread. This patch also disables the explicit expand/shrink interface because the table is meant to expand and shrink automatically, and continuing to export these interfaces unnecessarily complicates the life of the rehasher since the rehash process is now composed of two parts. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23rhashtable: Allow hashfn to be unsetHerbert Xu1-6/+27
Since every current rhashtable user uses jhash as their hash function, the fact that jhash is an inline function causes each user to generate a copy of its code. This function provides a solution to this problem by allowing hashfn to be unset. In which case rhashtable will automatically set it to jhash. Furthermore, if the key length is a multiple of 4, we will switch over to jhash2. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23rhashtable: Eliminate unnecessary branch in rht_key_hashfnHerbert Xu1-2/+6
When rht_key_hashfn is called from rhashtable itself and params is equal to ht->p, there is no point in checking params.key_len and falling back to ht->p.key_len. For some reason gcc couldn't figure out that params is the same as ht->p. So let's help it by only checking params.key_len when it's a constant. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23Merge tag 'linux-can-next-for-4.1-20150323' of ↵David S. Miller3-8/+7
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2015-03-23 this is a pull request of 6 patches for net-next/master. A patch by Florian Westphal, converts the skb->destructor to use sock_efree() instead of own destructor. Ahmed S. Darwish's patch converts the kvaser_usb driver to use unregister_candev(). A patch by me removes a return from a void function in the m_can driver. Yegor Yefremov contributes a patch for combined rx/tx LED trigger support. A sparse warning in the esd_usb2 driver was fixes by Thomas Körper. Ben Dooks converts the at91_can driver to use endian agnostic IO accessors. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller1-29/+0
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next. Basically, more incremental updates for br_netfilter from Florian Westphal, small nf_tables updates (including one fix for rb-tree locking) and small two-liner to add extra validation for the REJECT6 target. More specifically, they are: 1) Use the conntrack status flags from br_netfilter to know that DNAT is happening. Patch for Florian Westphal. 2) nf_bridge->physoutdev == NULL already indicates that the traffic is bridged, so let's get rid of the BRNF_BRIDGED flag. Also from Florian. 3) Another patch to prepare voidization of seq_printf/seq_puts/seq_putc, from Joe Perches. 4) Consolidation of nf_tables_newtable() error path. 5) Kill nf_bridge_pad used by br_netfilter from ip_fragment(), from Florian Westphal. 6) Access rb-tree root node inside the lock and remove unnecessary locking from the get path (we already hold nfnl_lock there), from Patrick McHardy. 7) You cannot use a NFT_SET_ELEM_INTERVAL_END when the set doesn't support interval, also from Patrick. 8) Enforce IP6T_F_PROTO from ip6t_REJECT to make sure the core is actually restricting matches to TCP. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23af_packet: pass checksum validation status to the userAlexander Drozdov1-0/+1
Introduce TP_STATUS_CSUM_VALID tp_status flag to tell the af_packet user that at least the transport header checksum has been already validated. For now, the flag may be set for incoming packets only. Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23ipv4: dccp: handle ICMP messages on DCCP_NEW_SYN_RECV request socketsEric Dumazet1-0/+2
dccp_v4_err() can restrict lookups to ehash table, and not to listeners. Note this patch creates the infrastructure, but this means that ICMP messages for request sockets are ignored until complete conversion. New dccp_req_err() helper is exported so that we can use it in IPv6 in following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23ipv4: tcp: handle ICMP messages on TCP_NEW_SYN_RECV request socketsEric Dumazet1-0/+1
tcp_v4_err() can restrict lookups to ehash table, and not to listeners. Note this patch creates the infrastructure, but this means that ICMP messages for request sockets are ignored until complete conversion. New tcp_req_err() helper is exported so that we can use it in IPv6 in following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23net: convert syn_wait_lock to a spinlockEric Dumazet1-8/+3
This is a low hanging fruit, as we'll get rid of syn_wait_lock eventually. We hold syn_wait_lock for such small sections, that it makes no sense to use a read/write lock. A spin lock is simply faster. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23inet: remove sk_listener parameter from syn_ack_timeout()Eric Dumazet3-4/+3
It is not needed, and req->sk_listener points to the listener anyway. request_sock argument can be const. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23crypto: af_alg - Allow to link sglTadeusz Struk1-1/+3
Allow to link af_alg sgls. Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23net: socket: add support for async operationstadeusz.struk@intel.com1-0/+1
Add support for async operations. Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-0/+10
Pull networking fixes from David Miller: 1) Validate iov ranges before feeding them into iov_iter_init(), from Al Viro. 2) We changed copy_from_msghdr_from_user() to zero out the msg_namelen is a NULL pointer is given for the msg_name. Do the same in the compat code too. From Catalin Marinas. 3) Fix partially initialized tuples in netfilter conntrack helper, from Ian Wilson. 4) Missing continue; statement in nft_hash walker can lead to crashes, from Herbert Xu. 5) tproxy_tg6_check looks for IP6T_INV_PROTO in ->flags instead of ->invflags, fix from Pablo Neira Ayuso. 6) Incorrect memory account of TCP FINs can result in negative socket memory accounting values. Fix from Josh Hunt. 7) Don't allow virtual functions to enable VLAN promiscuous mode in be2net driver, from Vasundhara Volam. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: netfilter: nft_compat: set IP6T_F_PROTO flag if protocol is set cx82310_eth: wait for firmware to become ready net: validate the range we feed to iov_iter_init() in sys_sendto/sys_recvfrom net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() behaviour be2net: use PCI MMIO read instead of config read for errors be2net: restrict MODIFY_EQ_DELAY cmd to a max of 8 EQs be2net: Prevent VFs from enabling VLAN promiscuous mode tcp: fix tcp fin memory accounting ipv6: fix backtracking for throw routes net: ethernet: pcnet32: Setup the SRAM and NOUFLO on Am79C97{3, 5} ipv6: call ipv6_proxy_select_ident instead of ipv6_select_ident in udp6_ufo_fragment netfilter: xt_TPROXY: fix invflags check in tproxy_tg6_check() netfilter: restore rule tracing via nfnetlink_log netfilter: nf_tables: allow to change chain policy without hook if it exists netfilter: Fix potential crash in nft_hash walker netfilter: Zero the tuple in nfnl_cthelper_parse_tuple()
2015-03-22can: add combined rx/tx LED trigger supportYegor Yefremov2-2/+6
Add <ifname>-rxtx trigger, that will be activated both for tx as rx events. This trigger mimics "activity" LED for Ethernet devices. Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2015-03-22can: use sock_efree instead of own destructorFlorian Westphal1-6/+1
It is identical to the can destructor. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2015-03-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-0/+10
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Fix missing initialization of tuple structure in nfnetlink_cthelper to avoid mismatches when looking up to attach userspace helpers to flows, from Ian Wilson. 2) Fix potential crash in nft_hash when we hit -EAGAIN in nft_hash_walk(), from Herbert Xu. 3) We don't need to indicate the hook information to update the basechain default policy in nf_tables. 4) Restore tracing over nfnetlink_log due to recent rework to accomodate logging infrastructure into nf_tables. 5) Fix wrong IP6T_INV_PROTO check in xt_TPROXY. 6) Set IP6T_F_PROTO flag in nft_compat so we can use SYNPROXY6 and REJECT6 from xt over nftables. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-22netfilter: bridge: kill nf_bridge_padFlorian Westphal1-22/+0
The br_netfilter frag output function calls skb_cow_head() so in case it needs a larger headroom to e.g. re-add a previously stripped PPPOE or VLAN header things will still work (at cost of reallocation). We can then move nf_bridge_encap_header_len to br_netfilter. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pendingLinus Torvalds1-0/+1
Pull SCSI target fixes from Nicholas Bellinger: "Here are current target-pending fixes for v4.0-rc5 code that have made their way into the queue over the last weeks. The fixes this round include: - Fix long-standing iser-target logout bug related to early conn_logout_comp completion, resulting in iscsi_conn use-after-tree OOpsen. (Sagi + nab) - Fix long-standing tcm_fc bug in ft_invl_hw_context() failure handing for DDP hw offload. (DanC) - Fix incorrect use of unprotected __transport_register_session() in tcm_qla2xxx + other single local se_node_acl fabrics. (Bart) - Fix reference leak in target_submit_cmd() -> target_get_sess_cmd() for ack_kref=1 failure path. (Bart) - Fix pSCSI backend ->get_device_type() statistics OOPs with un-configured device. (Olaf + nab) - Fix virtual LUN=0 target_configure_device failure OOPs at modprobe time. (Claudio + nab) - Fix FUA write false positive failure regression in v4.0-rc1 code. (Christophe Vu-Brugier + HCH)" * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: target: do not reject FUA CDBs when write cache is enabled but emulate_write_cache is 0 target: Fix virtual LUN=0 target_configure_device failure OOPs target/pscsi: Fix NULL pointer dereference in get_device_type tcm_fc: missing curly braces in ft_invl_hw_context() target: Fix reference leak in target_get_sess_cmd() error path loop/usb/vhost-scsi/xen-scsiback: Fix use of __transport_register_session tcm_qla2xxx: Fix incorrect use of __transport_register_session iscsi-target: Avoid early conn_logout_comp for iser connections Revert "iscsi-target: Avoid IN_LOGOUT failure case for iser-target" target: Disallow changing of WRITE cache/FUA attrs after export
2015-03-21Merge tag 'dm-4.0-fixes' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull devicemapper fixes from Mike Snitzer: "A handful of stable fixes for DM: - fix thin target to always zero-fill reads to unprovisioned blocks - fix to interlock device destruction's suspend from internal suspends - fix 2 snapshot exception store handover bugs - fix dm-io to cope with DISCARD and WRITE_SAME capabilities changing" * tag 'dm-4.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm io: deal with wandering queue limits when handling REQ_DISCARD and REQ_WRITE_SAME dm snapshot: suspend merging snapshot when doing exception handover dm snapshot: suspend origin when doing exception handover dm: hold suspend_lock while suspending device during device deletion dm thin: fix to consistently zero-fill reads to unprovisioned blocks
2015-03-20net: neighbour: Add mcast_resolicit to configure the number of multicast ↵YOSHIFUJI Hideaki/吉藤英明2-0/+2
resolicitations in PROBE state. We send unicast neighbor (ARP or NDP) solicitations ucast_probes times in PROBE state. Zhu Yanjun reported that some implementation does not reply against them and the entry will become FAILED, which is undesirable. We had been dealt with such nodes by sending multicast probes mcast_ solicit times after unicast probes in PROBE state. In 2003, I made a change not to send them to improve compatibility with IPv6 NDP. Let's introduce per-protocol per-interface sysctl knob "mcast_ reprobe" to configure the number of multicast (re)solicitation for reconfirmation in PROBE state. The default is 0, since we have been doing so for 10+ years. Reported-by: Zhu Yanjun <Yanjun.Zhu@windriver.com> CC: Ulf Samuelsson <ulf.samuelsson@ericsson.com> Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20switchdev: kernel-doc cleanup on swithdev opsScott Feldman1-16/+7
Suggested-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20Revert "selinux: add a skb_owned_by() hook"Eric Dumazet1-8/+0
This reverts commit ca10b9e9a8ca7342ee07065289cbe74ac128c169. No longer needed after commit eb8895debe1baba41fcb62c78a16f0c63c21662a ("tcp: tcp_make_synack() should use sock_wmalloc") When under SYNFLOOD, we build lot of SYNACK and hit false sharing because of multiple modifications done on sk_listener->sk_wmem_alloc Since tcp_make_synack() uses sock_wmalloc(), there is no need to call skb_set_owner_w() again, as this adds two atomic operations. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20act_bpf: add initial eBPF support for actionsDaniel Borkmann2-1/+7
This work extends the "classic" BPF programmable tc action by extending its scope also to native eBPF code! Together with commit e2e9b6541dd4 ("cls_bpf: add initial eBPF support for programmable classifiers") this adds the facility to implement fully flexible classifier and actions for tc that can be implemented in a C subset in user space, "safely" loaded into the kernel, and being run in native speed when JITed. Also, since eBPF maps can be shared between eBPF programs, it offers the possibility that cls_bpf and act_bpf can share data 1) between themselves and 2) between user space applications. That means that, f.e. customized runtime statistics can be collected in user space, but also more importantly classifier and action behaviour could be altered based on map input from the user space application. For the remaining details on the workflow and integration, see the cls_bpf commit e2e9b6541dd4. Preliminary iproute2 part can be found under [1]. [1] http://git.breakpoint.cc/cgit/dborkman/iproute2.git/log/?h=ebpf-act Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Acked-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20ebpf: add sched_act_type and map it to sk_filter's verifier opsDaniel Borkmann1-0/+1
In order to prepare eBPF support for tc action, we need to add sched_act_type, so that the eBPF verifier is aware of what helper function act_bpf may use, that it can load skb data and read out currently available skb fields. This is bascially analogous to 96be4325f443 ("ebpf: add sched_cls_type and map it to sk_filter's verifier ops"). BPF_PROG_TYPE_SCHED_CLS and BPF_PROG_TYPE_SCHED_ACT need to be separate since both will have a different set of functionality in future (classifier vs action), thus we won't run into ABI troubles when the point in time comes to diverge functionality from the classifier. The future plan for act_bpf would be that it will be able to write into skb->data and alter selected fields mirrored in struct __sk_buff. For an initial support, it's sufficient to map it to sk_filter_ops. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Pirko <jiri@resnulli.us> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller20-20/+82
Conflicts: drivers/net/ethernet/emulex/benet/be_main.c net/core/sysctl_net_core.c net/ipv4/inet_diag.c The be_main.c conflict resolution was really tricky. The conflict hunks generated by GIT were very unhelpful, to say the least. It split functions in half and moved them around, when the real actual conflict only existed solely inside of one function, that being be_map_pci_bars(). So instead, to resolve this, I checked out be_main.c from the top of net-next, then I applied the be_main.c changes from 'net' since the last time I merged. And this worked beautifully. The inet_diag.c and sysctl_net_core.c conflicts were simple overlapping changes, and were easily to resolve. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20rhashtable: Fix undeclared EEXIST build error on ia64Herbert Xu1-0/+1
We need to include linux/errno.h in rhashtable.h since it doesn't always get included otherwise. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20rhashtable: Rip out obsolete out-of-line interfaceHerbert Xu1-16/+3
Now that all rhashtable users have been converted over to the inline interface, this patch removes the unused out-of-line interface. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20rhashtable: Allow hash/comparison functions to be inlinedHerbert Xu1-0/+386
This patch deals with the complaint that we make indirect function calls on the fast paths unnecessarily in rhashtable. We resolve it by moving the fast paths into inline functions that take struct rhashtable_param (which obviously must be the same set of parameters supplied to rhashtable_init) as an argument. The only remaining indirect call is to obj_hashfn (or key_hashfn it obj_hashfn is unset) on the rehash as well as the insert-during- rehash slow path. This patch also extends the support of vairable-length keys to include those where the key is fixed but scattered in the object. For example, in netlink we want to key off the namespace and the portid but they're not next to each other. This patch does this by directly using the object hash function as the indicator of whether the key is accessible or not. It also adds a new function obj_cmpfn to compare a key against an object. This means that the caller no longer needs to supply explicit compare functions. All this is done in a backwards compatible manner so no existing users are affected until they convert to the new interface. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20rhashtable: Make rhashtable_init params argument constHerbert Xu1-1/+2
This patch marks the rhashtable_init params argument const as there is no reason to modify it since we will always make a copy of it in the rhashtable. This patch also fixes a bug where we don't actually round up the value of min_size unless it is less than HASH_MIN_SIZE. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20net: increase sk_[max_]ack_backlogEric Dumazet1-2/+2
sk_ack_backlog & sk_max_ack_backlog were 16bit fields, meaning listen() backlog was limited to 65535. It is time to increase the width to allow much bigger backlog, if admins change /proc/sys/net/core/somaxconn & /proc/sys/net/ipv4/tcp_max_syn_backlog default values. Tested: echo 5000000 >/proc/sys/net/core/somaxconn echo 5000000 >/proc/sys/net/ipv4/tcp_max_syn_backlog Ran a SYNFLOOD test against a listener using listen(fd, 5000000) myhost~# grep request_sock_TCP /proc/slabinfo request_sock_TCP 4185642 4411940 304 13 1 : tunables 54 27 8 : slabdata 339380 339380 0 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20inet: get rid of central tcp/dccp listener timerEric Dumazet3-58/+46
One of the major issue for TCP is the SYNACK rtx handling, done by inet_csk_reqsk_queue_prune(), fired by the keepalive timer of a TCP_LISTEN socket. This function runs for awful long times, with socket lock held, meaning that other cpus needing this lock have to spin for hundred of ms. SYNACK are sent in huge bursts, likely to cause severe drops anyway. This model was OK 15 years ago when memory was very tight. We now can afford to have a timer per request sock. Timer invocations no longer need to lock the listener, and can be run from all cpus in parallel. With following patch increasing somaxconn width to 32 bits, I tested a listener with more than 4 million active request sockets, and a steady SYNFLOOD of ~200,000 SYN per second. Host was sending ~830,000 SYNACK per second. This is ~100 times more what we could achieve before this patch. Later, we will get rid of the listener hash and use ehash instead. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20inet: drop prev pointer handling in request sockEric Dumazet4-14/+16
When request sock are put in ehash table, the whole notion of having a previous request to update dl_next is pointless. Also, following patch will get rid of big purge timer, so we want to delete a request sock without holding listener lock. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-19target: do not reject FUA CDBs when write cache is enabled but ↵Christophe Vu-Brugier1-0/+1
emulate_write_cache is 0 A check that rejects a CDB with FUA bit set if no write cache is emulated was added by the following commit: fde9f50 target: Add sanity checks for DPO/FUA bit usage The condition is as follows: if (!dev->dev_attrib.emulate_fua_write || !dev->dev_attrib.emulate_write_cache) However, this check is wrong if the backend device supports WCE but "emulate_write_cache" is disabled. This patch uses se_dev_check_wce() (previously named spc_check_dev_wce) to invoke transport->get_write_cache() if the device has a write cache or check the "emulate_write_cache" attribute otherwise. Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christophe Vu-Brugier <cvubrugier@fastmail.fm> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-03-19Merge tag 'pinctrl-v4.0-2' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: "Here is a slew of pin control fixes I've accumulated for the v4.0 kernel. Nothing special, just driver fixes (mainly embedded Intel it seems) and a misunderstanding regarding the stub functions was reverted: - Fix up consumer return values on pin control stubs. - Four patches fixing up the interrupt handling and sleep context save in the Baytrail driver. - Make default output directions work properly in the Cherryview driver. - Fix interrupt locking in the AT91 driver. - Fix setting interrupt generating lines as input in the sunxi driver" * tag 'pinctrl-v4.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: sun4i: GPIOs configured as irq must be set to input before reading pinctrl: at91: move lock/unlock_as_irq calls into request/release pinctrl: update direction_output function of cherryview driver pinctrl: baytrail: Save pin context over system sleep pinctrl: baytrail: Rework interrupt handling pinctrl: baytrail: Clear interrupt triggering from pins that are in GPIO mode pinctrl: baytrail: Relax GPIO request rules Revert "pinctrl: consumer: use correct retval for placeholder functions"
2015-03-19Merge branch 'for-upstream' of ↵David S. Miller5-8/+84
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2015-03-19 This wont the last 4.1 bluetooth-next pull request, but we've piled up enough patches in less than a week that I wanted to save you from a single huge "last-minute" pull somewhere closer to the merge window. The main changes are: - Simultaneous LE & BR/EDR discovery support for HW that can do it - Complete LE OOB pairing support - More fine-grained mgmt-command access control (normal user can now do harmless read-only operations). - Added RF power amplifier support in cc2520 ieee802154 driver - Some cleanups/fixes in ieee802154 code Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds4-1/+13
Pull networking fixes from David Miller: 1) Fix packet header offset calculation in _decode_session6(), from Hajime Tazaki. 2) Fix route leak in error paths of xfrm_lookup(), from Huaibin Wang. 3) Be sure to clear state properly when scans fail in iwlwifi mvm code, from Luciano Coelho. 4) iwlwifi tries to stop scans that aren't actually running, also from Luciano Coelho. 5) mac80211 should drop mesh frames that are not encrypted, fix from Bob Copeland. 6) Add new device ID to b43 wireless driver for BCM432228 chips, from Rafał Miłecki. 7) Fix accidental addition of members after variable sized array in struct tc_u_hnode, from WANG Cong. 8) Don't re-enable interrupts until after we call napi_complete() in ibmveth and WIZnet drivers, frm Yongbae Park. 9) Fix regression in vlan tag handling of fec driver, from Fugang Duan. 10) If a network namespace change fails during rtnl_newlink(), we don't unwind the device registry properly. 11) Fix two TCP regressions, from Neal Cardwell: - Don't allow snd_cwnd_cnt to accumulate huge values due to missing test in tcp_cong_avoid_ai(). - Restore CUBIC back to advancing cwnd by 1.5x packets per RTT. 12) Fix performance regression in xne-netback involving push TX notifications, from David Vrabel. 13) __skb_tstamp_tx() can be called with a NULL sk pointer, do not dereference blindly. From Willem de Bruijn. 14) Fix potential stack overflow in RDS protocol stack, from Arnd Bergmann. 15) VXLAN_VID_MASK used incorrectly in new remote checksum offload support of VXLAN driver. Fix from Alexey Kodanev. 16) Fix too small netlink SKB allocation in inet_diag layer, from Eric Dumazet. 17) ieee80211_check_combinations() does not count interfaces correctly, from Andrei Otcheretianski. 18) Hardware feature determination in bxn2x driver references a piece of software state that actually isn't initialized yet, fix from Michal Schmidt. 19) inet_csk_wait_for_connect() needs a sched_annotate_sleep() annoation, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (56 commits) Revert "net: cx82310_eth: use common match macro" net/mlx4_en: Set statistics bitmap at port init IB/mlx4: Saturate RoCE port PMA counters in case of overflow net/mlx4_en: Fix off-by-one in ethtool statistics display IB/mlx4: Verify net device validity on port change event act_bpf: allow non-default TC_ACT opcodes as BPF exec outcome Revert "smc91x: retrieve IRQ and trigger flags in a modern way" inet: Clean up inet_csk_wait_for_connect() vs. might_sleep() ip6_tunnel: fix error code when tunnel exists netdevice.h: fix ndo_bridge_* comments bnx2x: fix encapsulation features on 57710/57711 mac80211: ignore CSA to same channel nl80211: ignore HT/VHT capabilities without QoS/WMM mac80211: ask for ECSA IE to be considered for beacon parse CRC mac80211: count interfaces correctly for combination checks isdn: icn: use strlcpy() when parsing setup options rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg() caif: fix MSG_OOB test in caif_seqpkt_recvmsg() bridge: reset bridge mtu after deleting an interface can: kvaser_usb: Fix tx queue start/stop race conditions ...
2015-03-19netfilter: restore rule tracing via nfnetlink_logPablo Neira Ayuso1-0/+10
Since fab4085 ("netfilter: log: nf_log_packet() as real unified interface"), the loginfo structure that is passed to nf_log_packet() is used to explicitly indicate the logger type you want to use. This is a problem for people tracing rules through nfnetlink_log since packets are always routed to the NF_LOG_TYPE logger after the aforementioned patch. We can fix this by removing the trace loginfo structures, but that still changes the log level from 4 to 5 for tracing messages and there may be someone relying on this outthere. So let's just introduce a new nf_log_trace() function that restores the former behaviour. Reported-by: Markus Kötter <koetter@rrzn.uni-hannover.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-03-18bridge: add ageing_time, stp_state, priority over netlinkJörg Thalheim1-0/+3
Signed-off-by: Jörg Thalheim <joerg@higgsboson.tk> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18net: Fix high overhead of vlan sub-device teardown.David S. Miller1-0/+1
When a networking device is taken down that has a non-trivial number of VLAN devices configured under it, we eat a full synchronize_net() for every such VLAN device. This is because of the call chain: NETDEV_DOWN notifier --> vlan_device_event() --> dev_change_flags() --> __dev_change_flags() --> __dev_close() --> __dev_close_many() --> dev_deactivate_many() --> synchronize_net() This is kind of rediculous because we already have infrastructure for batching doing operation X to a list of net devices so that we only incur one sync. So make use of that by exporting dev_close_many() and adjusting it's interfaace so that the caller can fully manage the batch list. Use this in vlan_device_event() and all the overhead goes away. Reported-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18net: add support for phys_port_nameDavid Ahern2-0/+5
Similar to port id allow netdevices to specify port names and export the name via sysfs. Drivers can implement the netdevice operation to assist udev in having sane default names for the devices using the rule: $ cat /etc/udev/rules.d/80-net-setup-link.rules SUBSYSTEM=="net", ACTION=="add", ATTR{phys_port_name}!="", NAME="$attr{phys_port_name}" Use of phys_name versus phys_id was suggested-by Jiri Pirko. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18ipv4, ipv6: kill ip_mc_{join, leave}_group and ipv6_sock_mc_{join, drop}Marcelo Ricardo Leitner2-6/+0
in favor of their inner __ ones, which doesn't grab rtnl. As these functions need to operate on a locked socket, we can't be grabbing rtnl by then. It's too late and doing so causes reversed locking. So this patch: - move rtnl handling to callers instead while already fixing some reversed locking situations, like on vxlan and ipvs code. - renames __ ones to not have the __ mark: __ip_mc_{join,leave}_group -> ip_mc_{join,leave}_group __ipv6_sock_mc_{join,drop} -> ipv6_sock_mc_{join,drop} Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>