summaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/ulp/ipoib/ipoib_main.c
AgeCommit message (Collapse)AuthorFilesLines
2012-08-14IB/ipoib: Fix RCU pointer dereference of wrong objectShlomo Pongratz1-1/+1
Commit b63b70d87741 ("IPoIB: Use a private hash table for path lookup in xmit path") introduced a bug where in ipoib_neigh_free() (which is called from a few errors flows in the driver), rcu_dereference() is invoked with the wrong pointer object, which results in a crash. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-30IPoIB: Use a private hash table for path lookup in xmit pathShlomo Pongratz1-167/+479
Dave Miller <davem@davemloft.net> provided a detailed description of why the way IPoIB is using neighbours for its own ipoib_neigh struct is buggy: Any time an ipoib_neigh is changed, a sequence like the following is made: spin_lock_irqsave(&priv->lock, flags); /* * It's safe to call ipoib_put_ah() inside * priv->lock here, because we know that * path->ah will always hold one more reference, * so ipoib_put_ah() will never do more than * decrement the ref count. */ if (neigh->ah) ipoib_put_ah(neigh->ah); list_del(&neigh->list); ipoib_neigh_free(dev, neigh); spin_unlock_irqrestore(&priv->lock, flags); ipoib_path_lookup(skb, n, dev); This doesn't work, because you're leaving a stale pointer to the freed up ipoib_neigh in the special neigh->ha pointer cookie. Yes, it even fails with all the locking done to protect _changes_ to *ipoib_neigh(n), and with the code in ipoib_neigh_free() that NULLs out the pointer. The core issue is that read side calls to *to_ipoib_neigh(n) are not being synchronized at all, they are performed without any locking. So whether we hold the lock or not when making changes to *ipoib_neigh(n) you still can have threads see references to freed up ipoib_neigh objects. cpu 1 cpu 2 n = *ipoib_neigh() *ipoib_neigh() = NULL kfree(n) n->foo == OOPS [..] Perhaps the ipoib code can have a private path database it manages entirely itself, which holds all the necessary information and is looked up by some generic key which is available easily at transmit time and does not involve generic neighbour entries. See <http://marc.info/?l=linux-rdma&m=132812793105624&w=2> and <http://marc.info/?l=linux-rdma&w=2&r=1&s=allows+references+to+freed+memory&q=b> for the full discussion. This patch aims to solve the race conditions found in the IPoIB driver. The patch removes the connection between the core networking neighbour structure and the ipoib_neigh structure. In addition to avoiding the race described above, it allows us to handle SKBs carrying IP packets that don't have any associated neighbour. We add an ipoib_neigh hash table with N buckets where the key is the destination hardware address. The ipoib_neigh is fetched from the hash table and instead of the stashed location in the neighbour structure. The hash table uses both RCU and reference counting to guarantee that no ipoib_neigh instance is ever deleted while in use. Fetching the ipoib_neigh structure instance from the hash also makes the special code in ipoib_start_xmit that handles remote and local bonding failover redundant. Aged ipoib_neigh instances are deleted by a garbage collection task that runs every M seconds and deletes every ipoib_neigh instance that was idle for at least 2*M seconds. The deletion is safe since the ipoib_neigh instances are protected using RCU and reference count mechanisms. The number of buckets (N) and frequency of running the GC thread (M), are taken from the exported arb_tbl. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-05ipoib: Convert over to dev_lookup_neigh_skb().David S. Miller1-1/+3
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-08IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addressesRoland Dreier1-36/+19
Commit a0417fa3a18a ("net: Make qdisc_skb_cb upper size bound explicit.") made it possible for a netdev driver to use skb->cb between its header_ops.create method and its .ndo_start_xmit method. Use this in ipoib_hard_header() to stash away the LL address (GID + QPN), instead of the "ipoib_pseudoheader" hack. This allows IPoIB to stop lying about its hard_header_len, which will let us fix the L2 check for GRO. Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-05infiniband: ipoib: Sanitize neighbour handling in ipoib_main.cDavid Miller1-12/+12
Reduce the number of dst_get_neighbour_noref() calls within a single call chain. Primarily by passing the neighbour pointer down to the helper functions. Handle dst_get_neighbour_noref() returning NULL in ipoib_start_xmit() by incrementing the dropped counter and freeing the packet. We don't want it to fall through into the ARP/RARP/multicast handling, since that should only happen when skb_dst() is NULL. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Roland Dreier <roland@purestorage.com>
2011-12-05net: Rename dst_get_neighbour{, _raw} to dst_get_neighbour_noref{, _raw}.David Miller1-4/+4
To reflect the fact that a refrence is not obtained to the resulting neighbour entry. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Roland Dreier <roland@purestorage.com>
2011-12-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-8/+12
2011-11-30neigh: Add infrastructure for allocating device neigh privates.David Miller1-0/+2
netdev->neigh_priv_len records the private area length. This will trigger for neigh_table objects which set tbl->entry_size to zero, and the first instances of this will be forthcoming. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29Merge branches 'cxgb4', 'ipoib', 'misc' and 'qib' into for-nextRoland Dreier1-8/+12
2011-11-29IB: Fix RCU lockdep splatsEric Dumazet1-7/+11
Commit f2c31e32b37 ("net: fix NULL dereferences in check_peer_redir()") forgot to take care of infiniband uses of dst neighbours. Many thanks to Marc Aurele who provided a nice bug report and feedback. Reported-by: Marc Aurele La France <tsi@ualberta.ca> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: <stable@kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-11-29IB/ipoib: Prevent hung task or softlockup processing multicast responseMike Marciniszyn1-1/+1
This following can occur with ipoib when processing a multicast reponse: BUG: soft lockup - CPU#0 stuck for 67s! [ib_mad1:982] Modules linked in: ... CPU 0: Modules linked in: ... Pid: 982, comm: ib_mad1 Not tainted 2.6.32-131.0.15.el6.x86_64 #1 ProLiant DL160 G5 RIP: 0010:[<ffffffff814ddb27>] [<ffffffff814ddb27>] _spin_unlock_irqrestore+0x17/0x20 RSP: 0018:ffff8802119ed860 EFLAGS: 00000246 0000000000000004 RBX: ffff8802119ed860 RCX: 000000000000a299 RDX: ffff88021086c700 RSI: 0000000000000246 RDI: 0000000000000246 RBP: ffffffff8100bc8e R08: ffff880210ac229c R09: 0000000000000000 R10: ffff88021278aab8 R11: 0000000000000000 R12: ffff8802119ed860 R13: ffffffff8100be6e R14: 0000000000000001 R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00000000006d4840 CR3: 0000000209aa5000 CR4: 00000000000406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: [<ffffffffa032c247>] ? ipoib_mcast_send+0x157/0x480 [ib_ipoib] [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20 [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20 [<ffffffffa03283d4>] ? ipoib_path_lookup+0x124/0x2d0 [ib_ipoib] [<ffffffffa03286fc>] ? ipoib_start_xmit+0x17c/0x430 [ib_ipoib] [<ffffffff8141e758>] ? dev_hard_start_xmit+0x2c8/0x3f0 [<ffffffff81439d0a>] ? sch_direct_xmit+0x15a/0x1c0 [<ffffffff81423098>] ? dev_queue_xmit+0x388/0x4d0 [<ffffffffa032d6b7>] ? ipoib_mcast_join_finish+0x2c7/0x510 [ib_ipoib] [<ffffffffa032dab8>] ? ipoib_mcast_sendonly_join_complete+0x1b8/0x1f0 [ib_ipoib] [<ffffffffa02a0946>] ? mcast_work_handler+0x1a6/0x710 [ib_sa] [<ffffffffa015f01e>] ? ib_send_mad+0xfe/0x3c0 [ib_mad] [<ffffffffa00f6c93>] ? ib_get_cached_lmc+0xa3/0xb0 [ib_core] [<ffffffffa02a0f9b>] ? join_handler+0xeb/0x200 [ib_sa] [<ffffffffa029e4fc>] ? ib_sa_mcmember_rec_callback+0x5c/0xa0 [ib_sa] [<ffffffffa029e79c>] ? recv_handler+0x3c/0x70 [ib_sa] [<ffffffffa01603a4>] ? ib_mad_completion_handler+0x844/0x9d0 [ib_mad] [<ffffffffa015fb60>] ? ib_mad_completion_handler+0x0/0x9d0 [ib_mad] [<ffffffff81088830>] ? worker_thread+0x170/0x2a0 [<ffffffff8108e160>] ? autoremove_wake_function+0x0/0x40 [<ffffffff810886c0>] ? worker_thread+0x0/0x2a0 [<ffffffff8108ddf6>] ? kthread+0x96/0xa0 [<ffffffff8100c1ca>] ? child_rip+0xa/0x20 Coinciding with stack trace is the following message: ib0: ib_address_create failed The code below in ipoib_mcast_join_finish() will note the above failure in the address handle but otherwise continue: ah = ipoib_create_ah(dev, priv->pd, &av); if (!ah) { ipoib_warn(priv, "ib_address_create failed\n"); } else { The while loop at the bottom of ipoib_mcast_join_finish() will attempt to send queued multicast packets in mcast->pkt_queue and eventually end up in ipoib_mcast_send(): if (!mcast->ah) { if (skb_queue_len(&mcast->pkt_queue) < IPOIB_MAX_MCAST_QUEUE) skb_queue_tail(&mcast->pkt_queue, skb); else { ++dev->stats.tx_dropped; dev_kfree_skb_any(skb); } My read is that the code will requeue the packet and return to the ipoib_mcast_join_finish() while loop and the stage is set for the "hung" task diagnostic as the while loop never sees a non-NULL ah, and will do nothing to resolve. There are GFP_ATOMIC allocates in the provider routines, so this is possible and should be dealt with. The test that induced the failure is associated with a host SM on the same server during a shutdown. This patch causes ipoib_mcast_join_finish() to exit with an error which will flush the queued mcast packets. Nothing is done to unwind the QP attached state so that subsequent sends from above will retry the join. Reviewed-by: Ram Vepa <ram.vepa@qlogic.com> Reviewed-by: Gary Leshner <gary.leshner@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-11-16infiniband: Update net drivers for netdev_features_t changes.David S. Miller1-1/+1
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-22Merge branch 'master' of github.com:davem330/netDavid S. Miller1-3/+5
Conflicts: MAINTAINERS drivers/net/Kconfig drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c drivers/net/ethernet/broadcom/tg3.c drivers/net/wireless/iwlwifi/iwl-pci.c drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c drivers/net/wireless/rt2x00/rt2800usb.c drivers/net/wireless/wl12xx/main.c
2011-08-17net: remove use of ndo_set_multicast_list in driversJiri Pirko1-1/+1
replace it by ndo_set_rx_mode Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-16IPoIB: Fix possible NULL dereference in ipoib_start_xmit()Bernd Schubert1-3/+5
Fix a bug introduced in 69cce1d14049 ("net: Abstract dst->neighbour accesses behind helpers.") where we might dereference skb_dst(skb) even if it is NULL, which causes: [ 240.944030] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040 [ 240.948007] IP: [<ffffffffa0366ce9>] ipoib_start_xmit+0x39/0x280 [ib_ipoib] [...] [ 240.948007] Call Trace: [ 240.948007] <IRQ> [ 240.948007] [<ffffffff812cd5e0>] dev_hard_start_xmit+0x2a0/0x590 [ 240.948007] [<ffffffff8131f680>] ? arp_create+0x70/0x200 [ 240.948007] [<ffffffff812e8e1f>] sch_direct_xmit+0xef/0x1c0 Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=41212 Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-17net: Abstract dst->neighbour accesses behind helpers.David S. Miller1-14/+27
dst_{get,set}_neighbour() Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-20net: infiniband/ulp/ipoib: convert to hw_featuresMichał Mirosław1-7/+17
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-12RDMA: Use vzalloc() to replace vmalloc()+memset(0)Joe Perches1-2/+1
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10IPoIB: Add GRO supportOr Gerlitz1-0/+2
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-10IPoIB: Remove LRO supportOr Gerlitz1-62/+0
As a first step in moving from LRO to GRO, revert commit af40da894e9 ("IPoIB: add LRO support"). Also eliminate the ethtool set_flags callback which isn't needed anymore. Finally, we need to include <linux/sched.h> directly to get the declaration of restart_syscall() (which used to be included implicitly through <linux/inet_lro.h>). Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Vladimir Sokolovsky <vlad@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-26Merge branch 'for-linus' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (63 commits) IB/qib: clean up properly if pci_set_consistent_dma_mask() fails IB/qib: Allow driver to load if PCIe AER fails IB/qib: Fix uninitialized pointer if CONFIG_PCI_MSI not set IB/qib: Fix extra log level in qib_early_err() RDMA/cxgb4: Remove unnecessary KERN_<level> use RDMA/cxgb3: Remove unnecessary KERN_<level> use IB/core: Add link layer type information to sysfs IB/mlx4: Add VLAN support for IBoE IB/core: Add VLAN support for IBoE IB/mlx4: Add support for IBoE mlx4_en: Change multicast promiscuous mode to support IBoE mlx4_core: Update data structures and constants for IBoE mlx4_core: Allow protocol drivers to find corresponding interfaces IB/uverbs: Return link layer type to userspace for query port operation IB/srp: Sync buffer before posting send IB/srp: Use list_first_entry() IB/srp: Reduce number of BUSY conditions IB/srp: Eliminate two forward declarations IB/mlx4: Signal node desc changes to SM by using FW to generate trap 144 IB: Replace EXTRA_CFLAGS with ccflags-y ...
2010-10-26replace nested max/min macros with {max,min}3 macroHagen Paul Pfeifer1-2/+1
Use the new {max,min}3 macros to save some cycles and bytes on the stack. This patch substitutes trivial nested macros with their counterpart. Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Cc: Joe Perches <joe@perches.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Hartley Sweeten <hsweeten@visionengravers.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Roland Dreier <rolandd@cisco.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26Merge branches 'amso1100', 'cma', 'cxgb3', 'cxgb4', 'ehca', 'iboe', 'ipoib', ↵Roland Dreier1-0/+3
'misc', 'mlx4', 'nes', 'qib' and 'srp' into for-next
2010-10-23IPoIB: Set dev_id field of net_deviceEli Cohen1-0/+1
Use the net device's dev_id field to encode the port number of the pci device. This can be used to to associate a net device with the pci device's port. The encoding is: dev_id = port - 1. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-10-13IPoIB: Skip IBoE portsEli Cohen1-0/+2
IPoIB is IP-over-Infiniband link layer. In the case of IBoE, the link layer is Ethernet and IP can work directly over Ethernet, so disable IPoIB for non-IB_LINK_LAYER_INFINIBAND ports. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2010-07-06IPoIB: Fix world-writable child interface control sysfs attributesOr Gerlitz1-2/+2
Sumeet Lahorani <sumeet.lahorani@oracle.com> reported that the IPoIB child entries are world-writable; however we don't want ordinary users to be able to create and destroy child interfaces, so fix them to be writable only by root. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Cc: <stable@kernel.org> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-12-09IPoIB: Clear ipoib_neigh.dgid in ipoib_neigh_alloc()David J. Wilder1-0/+1
IPoIB can miss a change in destination GID under some conditions. The problem is caused when ipoib_neigh->dgid contains a stale address. The fix is to set ipoib_neigh->dgid to zero in ipoib_neigh_alloc(). This can happen when a system using bonding on its IPoIB interfaces has switched its active interface from interface A to B and back to A. The system that fails over will not correctly processes the 2nd address change, as described below. When an address has changed neighbor->ha is updated with the new address. Each neighbor has an associated ipoib_neigh. ipoib_neigh->dgid also holds a copy of the remote node's hardware address. When an address changes neighbor->ha is updated by the network layer (arp code) with the new address. IPoIB detects this change in ipoib_start_xmit() by comparing neighbor->ha with ipoib_neigh->dgid. The bug is that ipoib_neigh->dgid may already contain the new address (A) thus the change from B to A is missed by ipoib. Here is the sequence of events: ipoib_neigh->dgid = A and neighbor->ha = A The address is switched to B (the first switch) neighbor->ha = B The change is seen in ipoib_start_xmit() -- neighbor->ha != ipoib_neigh->dgid so ipoib_neigh is released, and a new one is allocated. The allocator may return the same chunk of memory that was just released, therefore ipoib_neigh->dgid still contains A at this point. ipoib_neigh->dgid should be updated in neigh_add_path(), but if the following conditions are true dgid is not updated: 1) __path_find() returns a path 2) path->ah is NULL The remote system now switches from address B to A, neighbor->ha is updated to A. Now we have again : ipoib_neigh->dgid = A and neighbor->ha = A Since the addresses are the same ipoib won't process the change in address. Fix this by zeroing out the dgid field when allocating a new struct ipoib_neigh. Signed-off-by: David Wilder <dwilder@us.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-05IPoIB: Drop priv->lock before calling ipoib_send()Roland Dreier1-1/+6
IPoIB currently must use irqsave locking for priv->lock, since it is taken from interrupt context in one path. However, ipoib_send() does skb_orphan(), and the network stack locking is not IRQ-safe. Therefore we need to make sure we don't hold priv->lock when calling ipoib_send() to avoid lockdep warnings (the code was almost certainly safe in practice, since the only code path that takes priv->lock from interrupt context would never call into the network stack). Addresses: http://bugzilla.kernel.org/show_bug.cgi?id=13757 Reported-by: Bart Van Assche <bart.vanassche@gmail.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-06-03net: skb->dst accessorsEric Dumazet1-15/+15
Define three accessors to get/set dst attached to a skb struct dst_entry *skb_dst(const struct sk_buff *skb) void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst) void skb_dst_drop(struct sk_buff *skb) This one should replace occurrences of : dst_release(skb->dst) skb->dst = NULL; Delete skb->dst field Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-30net: unset IFF_XMIT_DST_RELEASE for qeth and ipoibEric Dumazet1-0/+1
Last two drivers that need skb->dst in their start_xmit() function Tell dev_hard_start_xmit() to no release it by unsetting IFF_XMIT_DST_RELEASE Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-20IPoIB: Disable NAPI while CQ is being drainedYossi Etigin1-4/+1
If NAPI is enabled while IPoIB's CQ is being drained, it creates a race on priv->ibwc between ipoib_poll() and ipoib_drain_cq(), leading to memory corruption. The solution is to enable/disable NAPI in ipoib_ib_dev_{open/stop}() instead of in ipoib_{open/stop}(), and sync NAPI on the INITIALIZED flag instead on the ADMIN_UP flag. This way NAPI will be disabled when ipoib_drain_cq() is called. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1587>. Signed-off-by: Yossi Etigin <yosefe@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-03-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds1-7/+11
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1750 commits) ixgbe: Allow Priority Flow Control settings to survive a device reset net: core: remove unneeded include in net/core/utils.c. e1000e: update version number e1000e: fix close interrupt race e1000e: fix loss of multicast packets e1000e: commonize tx cleanup routine to match e1000 & igb netfilter: fix nf_logger name in ebt_ulog. netfilter: fix warning in ebt_ulog init function. netfilter: fix warning about invalid const usage e1000: fix close race with interrupt e1000: cleanup clean_tx_irq routine so that it completely cleans ring e1000: fix tx hang detect logic and address dma mapping issues bridge: bad error handling when adding invalid ether address bonding: select current active slave when enslaving device for mode tlb and alb gianfar: reallocate skb when headroom is not enough for fcb Bump release date to 25Mar2009 and version to 0.22 r6040: Fix second PHY address qeth: fix wait_event_timeout handling qeth: check for completion of a running recovery qeth: unregister MAC addresses during recovery. ... Manually fixed up conflicts in: drivers/infiniband/hw/cxgb3/cxio_hal.h drivers/infiniband/hw/nes/nes_nic.c
2009-03-21infiniband: convert ipoib to net_device_opsStephen Hemminger1-7/+11
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-17IPoIB: In unicast_arp_send(), only free newly-created pathsJack Morgenstein1-2/+7
If path_rec_start() returns error, call path_free() only if the path was newly-created. If we free an existing path whose valid flag was zero, (but do not detach it from the list) we cause corruption of the path list (of which it is a member), and get a kernel crash. The simplest solution is to not free an existing path -- just leave it in the list as-is (i.e., with its valid flag cleared). Thanks to Yossi Etigin of Voltaire for identifying the problem flow which caused the kernel crash. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Moni Shua <monis@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-01-14IPoIB: Fix hang in napi_disable() if P_Key is never foundRoland Dreier1-12/+15
After commit fe25c561 ("IPoIB: Don't enable NAPI when it's already enabled"), if an interface is brought up but the corresponding P_Key never appears, then ipoib_stop() will hang in napi_disable(), because ipoib_open() returns before it does napi_enable(). Fix this by changing ipoib_open() to call napi_enable() even if the P_Key isn't present. Reported-by: Yossi Etigin <yosefe@Voltaire.COM> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-01-09IPoIB: Fix loss of connectivity after bonding failover on both sidesYossi Etigin1-19/+19
Fix bonding failover in the case both peers failover and the gratuitous ARP is lost. In that case, the sender side will create an ipoib_neigh and issue a path request with the old GID first. When skb->dst->neighbour->ha changes due to ARP refresh, this ipoib_neigh will not be added to the path->list of the path of the new GID, because the ipoib_neigh already exists. It will not have an AH either, because of sender-side failover. Therefore, it will not get an AH when the path is resolved. The solution here is to compare GIDs in ipoib_start_xmit() even if neigh->ah is invalid. Comparing with an uninitialized value of neigh->dgid should be fine, since a spurious match is harmless (and astronomically unlikely too). Signed-off-by: Moni Shoua <monis@voltaire.com> Signed-off-by: Yossi Etigin <yosefe@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-11-18Merge branch 'master' of ↵David S. Miller1-2/+4
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/isdn/i4l/isdn_net.c fs/cifs/connect.c
2008-11-12IPoIB: Fix crash in path_rec_completion()Yossi Etigin1-1/+1
Fix a crash in path_rec_completion() during an SM up/down loop. If more than one path record request is issued, the first completion releases path->done, allowing ipoib_flush_paths() to free the path, and thus corrupting it for the second completion. Commit ee1e2c82 ("IPoIB: Refresh paths instead of flushing them on SM change events") added the field path->valid and changed the test "if (!path)" to "if (!path || !path->valid)". This change made it possible for a path with an outstanding query to pass the test and issue another query on the same path. Having two queries on the same path leads to a crash. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1325>. Signed-off-by: Yossi Etigin <yosefe@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-11-12IPoIB: Fix hang in ipoib_flush_paths()Yossi Etigin1-0/+1
ipoib_flush_paths() can hang during an SM up/down loop: if path_rec_start() fails (for instance, because there is no sm_ah), the path is still added to the path list by neigh_add_path(). Then, ipoib_flush_paths() will wait for path->done, but it will never complete because the request was not issued at all. Fix this by completing path->done if issuing the query fails. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1329>. Signed-off-by: Yossi Etigin <yosefe@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-11-12IPoIB: Don't enable NAPI when it's already enabledYossi Etigin1-1/+2
If a P_Key is not present when an interface is created, ipoib_open() will return after doing napi_enable(). ipoib_open() will be called again from ipoib_pkey_poll() when the P_Key appears, after NAPI has already been enabled, and try to enable it again. This triggers a BUG_ON() in napi_enable(). Fix this by moving the call to napi_enable() to after the test for P_Key presence. Signed-off-by: Yossi Etigin <yosefe@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-10-29net: replace %p6 with %pI6Harvey Harrison1-6/+6
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28infiniband: ipoib replace IPOIB_GID_FMT with %p6Harvey Harrison1-13/+12
Replace all uses of IPOIB_GID_FMT, IPOIB_GID_RAW_ARG() and IPOIB_GID_ARG() Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-22IPoIB: Set netdev offload features properly for child (VLAN) interfacesOr Gerlitz1-28/+39
Child devices were created without any offload features set, fix this by moving the code that computes the features into generic function which is now called through non-child and child device creation. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> -- v1 has a bug where the 'result' flag in ipoib_vlan_add may be used uninitialized Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-09-30IPoIB: Use netif_tx_lock() and get rid of private tx_lock, LLTXRoland Dreier1-40/+28
Currently, IPoIB is an LLTX driver that uses its own IRQ-disabling tx_lock. Not only do we want to get rid of LLTX, this actually causes problems because of the skb_orphan() done with this tx_lock held: some skb destructors expect to be run with interrupts enabled. The simplest fix for this is to get rid of the driver-private tx_lock and stop using LLTX. We kill off priv->tx_lock and use netif_tx_lock[_bh]() instead; the patch to do this is a tiny bit tricky because we need to update places that take priv->lock inside the tx_lock to disable IRQs, rather than relying on tx_lock having already disabled IRQs. Also, there are a couple of places where we need to disable BHs to make sure we have a consistent context to call netif_tx_lock() (since we no longer can use _irqsave() variants), and we also have to change ipoib_send_comp_handler() to call drain_tx_cq() through a timer rather than directly, because ipoib_send_comp_handler() runs in interrupt context and drain_tx_cq() must run in BH context so it can call netif_tx_lock(). Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-09-25IPoIB: Fix crash when path record fails after path flushRoland Dreier1-4/+4
Commit ee1e2c82 ("IPoIB: Refresh paths instead of flushing them on SM change events") changed how paths are flushed on an SM event. This change introduces a problem if the path record query triggered by fails, causing path->ah to become NULL. A later successful path query will then trigger WARN_ON() in path_rec_completion(), and crash because path->ah has already been freed, so the ipoib_put_ah() inside the lock in path_rec_completion() may actually drop the last reference (contrary to the comment that claims this is safe). Fix this by updating path->ah and freeing old_ah only when the path record query is successful. This prevents the neighbour AH and that path AH from getting out of sync. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1194> Reported-by: Rabah Salem <ravah@mellanox.com> Debugged-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-09-16IPoIB: Fix deadlock on RTNL between bcast join comp and ipoib_stop()Yossi Etigin1-0/+1
Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with ipoib_stop(). We avoid it by scheduling the piece of code that takes the lock on ipoib_workqueue instead of executing it directly. This works because we only flush the ipoib_workqueue with the RTNL not held. The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down() which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(), which calls ipoib_mcast_leave(). The latter calls ib_sa_free_multicast(), and this waits until the multicast completion handler finishes. This handler is ipoib_mcast_join_complete(), which waits for the rtnl_lock(), which was already taken by ipoib_stop(). This bug was introduced in commit a77a57a1 ("IPoIB: Fix deadlock on RTNL in ipoib_stop()"). Signed-off-by: Yossi Etigin <yosefe@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-08-19IPoIB: Fix deadlock on RTNL in ipoib_stop()Roland Dreier1-10/+9
Commit c8c2afe3 ("IPoIB: Use rtnl lock/unlock when changing device flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which is run from the ipoib_workqueue. However, ipoib_stop() (which is run inside rtnl_lock()) flushes this workqueue, which leads to a deadlock if the join task is pending. Fix this by simply not flushing the workqueue from ipoib_stop(). It turns out that we really don't care about workqueue tasks running during or after ipoib_stop(), as long as we make sure to flush the workqueue before unregistering a netdev. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1114>. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-22IPoIB: Include err code in trace message for ib_sa_path_rec_get() failuresOr Gerlitz1-1/+1
Print the return code of ib_sa_path_rec_get() if it fails to help debug errors. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14IPoIB: Remove priv->mcast_mutexEli Cohen1-1/+0
No need for a mutex around calls to ib_attach_mcast/ib_detach_mcast since these operations are synchronized at the HW driver layer. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14IPoIB: Refresh paths instead of flushing them on SM change eventsMoni Shoua1-4/+40
The patch tries to solve the problem of device going down and paths being flushed on an SM change event. The method is to mark the paths as candidates for refresh (by setting the new valid flag to 0), and wait for an ARP probe a new path record query. The solution requires a different and less intrusive handling of SM change event. For that, the second argument of the flush function changes its meaning from a boolean flag to a level. In most cases, SM failover doesn't cause LID change so traffic won't stop. In the rare cases of LID change, the remote host (the one that hadn't changed its LID) will lose connectivity until paths are refreshed. This is no worse than the current state. In fact, preventing the device from going down saves packets that otherwise would be lost. Signed-off-by: Moni Levy <monil@voltaire.com> Signed-off-by: Moni Shoua <monis@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>