summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2011-01-19net_sched: cleanupsEric Dumazet41-801/+842
Cleanup net/sched code to current CodingStyle and practices. Reduce inline abuse Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19af_unix: coding style: remove one level of indentation in unix_shutdown()Alban Crequy1-29/+31
Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk> Reviewed-by: Ian Molton <ian.molton@collabora.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19net_sched: implement a root container qdisc sch_mqprioJohn Fastabend4-0/+434
This implements a mqprio queueing discipline that by default creates a pfifo_fast qdisc per tx queue and provides the needed configuration interface. Using the mqprio qdisc the number of tcs currently in use along with the range of queues alloted to each class can be configured. By default skbs are mapped to traffic classes using the skb priority. This mapping is configurable. Configurable parameters, struct tc_mqprio_qopt { __u8 num_tc; __u8 prio_tc_map[TC_BITMASK + 1]; __u8 hw; __u16 count[TC_MAX_QUEUE]; __u16 offset[TC_MAX_QUEUE]; }; Here the count/offset pairing give the queue alignment and the prio_tc_map gives the mapping from skb->priority to tc. The hw bit determines if the hardware should configure the count and offset values. If the hardware bit is set then the operation will fail if the hardware does not implement the ndo_setup_tc operation. This is to avoid undetermined states where the hardware may or may not control the queue mapping. Also minimal bounds checking is done on the count/offset to verify a queue does not exceed num_tx_queues and that queue ranges do not overlap. Otherwise it is left to user policy or hardware configuration to create useful mappings. It is expected that hardware QOS schemes can be implemented by creating appropriate mappings of queues in ndo_tc_setup(). One expected use case is drivers will use the ndo_setup_tc to map queue ranges onto 802.1Q traffic classes. This provides a generic mechanism to map network traffic onto these traffic classes and removes the need for lower layer drivers to know specifics about traffic types. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19net: implement mechanism for HW based QOSJohn Fastabend1-1/+54
This patch provides a mechanism for lower layer devices to steer traffic using skb->priority to tx queues. This allows for hardware based QOS schemes to use the default qdisc without incurring the penalties related to global state and the qdisc lock. While reliably receiving skbs on the correct tx ring to avoid head of line blocking resulting from shuffling in the LLD. Finally, all the goodness from txq caching and xps/rps can still be leveraged. Many drivers and hardware exist with the ability to implement QOS schemes in the hardware but currently these drivers tend to rely on firmware to reroute specific traffic, a driver specific select_queue or the queue_mapping action in the qdisc. By using select_queue for this drivers need to be updated for each and every traffic type and we lose the goodness of much of the upstream work. Firmware solutions are inherently inflexible. And finally if admins are expected to build a qdisc and filter rules to steer traffic this requires knowledge of how the hardware is currently configured. The number of tx queues and the queue offsets may change depending on resources. Also this approach incurs all the overhead of a qdisc with filters. With the mechanism in this patch users can set skb priority using expected methods ie setsockopt() or the stack can set the priority directly. Then the skb will be steered to the correct tx queues aligned with hardware QOS traffic classes. In the normal case with single traffic class and all queues in this class everything works as is until the LLD enables multiple tcs. To steer the skb we mask out the lower 4 bits of the priority and allow the hardware to configure upto 15 distinct classes of traffic. This is expected to be sufficient for most applications at any rate it is more then the 8021Q spec designates and is equal to the number of prio bands currently implemented in the default qdisc. This in conjunction with a userspace application such as lldpad can be used to implement 8021Q transmission selection algorithms one of these algorithms being the extended transmission selection algorithm currently being used for DCB. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19netlink: support setting devgroup parametersVlad Dogaru1-4/+28
If a rtnetlink request specifies a negative or zero ifindex and has no interface name attribute, but has a group attribute, then the chenges are made to all the interfaces belonging to the specified group. Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19net_device: add support for network device groupsVlad Dogaru2-0/+18
Net devices can now be grouped, enabling simpler manipulation from userspace. This patch adds a group field to the net_device structure, as well as rtnetlink support to query and modify it. Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19net: cleanup unused macros in net directoryShan Wei10-12/+2
Clean up some unused macros in net/*. 1. be left for code change. e.g. PGV_FROM_VMALLOC, PGV_FROM_VMALLOC, KMEM_SAFETYZONE. 2. never be used since introduced to kernel. e.g. P9_RDMA_MAX_SGE, UTIL_CTRL_PKT_SIZE. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-18net: filter: dont block softirqs in sk_run_filter()Eric Dumazet2-6/+6
Packet filter (BPF) doesnt need to disable softirqs, being fully re-entrant and lock-less. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-18af_unix: implement socket filterAlban Crequy1-0/+6
Linux Socket Filters can already be successfully attached and detached on unix sockets with setsockopt(sockfd, SOL_SOCKET, SO_{ATTACH,DETACH}_FILTER, ...). See: Documentation/networking/filter.txt But the filter was never used in the unix socket code so it did not work. This patch uses sk_filter() to filter buffers before delivery. This short program demonstrates the problem on SOCK_DGRAM. int main(void) { int i, j, ret; int sv[2]; struct pollfd fds[2]; char *message = "Hello world!"; char buffer[64]; struct sock_filter ins[32] = {{0,},}; struct sock_fprog filter; socketpair(AF_UNIX, SOCK_DGRAM, 0, sv); for (i = 0 ; i < 2 ; i++) { fds[i].fd = sv[i]; fds[i].events = POLLIN; fds[i].revents = 0; } for(j = 1 ; j < 13 ; j++) { /* Set a socket filter to truncate the message */ memset(ins, 0, sizeof(ins)); ins[0].code = BPF_RET|BPF_K; ins[0].k = j; filter.len = 1; filter.filter = ins; setsockopt(sv[1], SOL_SOCKET, SO_ATTACH_FILTER, &filter, sizeof(filter)); /* send a message */ send(sv[0], message, strlen(message) + 1, 0); /* The filter should let the message pass but truncated. */ poll(fds, 2, 0); /* Receive the truncated message*/ ret = recv(sv[1], buffer, 64, 0); printf("received %d bytes, expected %d\n", ret, j); } for (i = 0 ; i < 2 ; i++) close(sv[i]); return 0; } Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk> Reviewed-by: Ian Molton <ian.molton@collabora.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-18Merge branch 'master' of ↵David S. Miller11-33/+42
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
2011-01-18net offloading: Do not mask out NETIF_F_HW_VLAN_TX for vlan.Jesse Gross1-2/+2
In netif_skb_features() we return only the features that are valid for vlans if we have a vlan packet. However, we should not mask out NETIF_F_HW_VLAN_TX since it enables transmission of vlan tags and is obviously valid. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-18ipv6: Silence privacy extensions initializationRomain Francoise1-3/+0
When a network namespace is created (via CLONE_NEWNET), the loopback interface is automatically added to the new namespace, triggering a printk in ipv6_add_dev() if CONFIG_IPV6_PRIVACY is set. This is problematic for applications which use CLONE_NEWNET as part of a sandbox, like Chromium's suid sandbox or recent versions of vsftpd. On a busy machine, it can lead to thousands of useless "lo: Disabled Privacy Extensions" messages appearing in dmesg. It's easy enough to check the status of privacy extensions via the use_tempaddr sysctl, so just removing the printk seems like the most sensible solution. Signed-off-by: Romain Francoise <romain@orebokech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-18Merge branch 'master' of ↵David S. Miller2-10/+13
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
2011-01-15caif: checking the wrong variableDan Carpenter1-4/+5
In the original code we check if (servl == NULL) twice. The first time should print the message that cfmuxl_remove_uplayer() failed and set "ret" correctly, but instead it just returns success. The second check should be checking the value of "ret" instead of "servl". Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-15can: test size of struct sockaddr in sendmsgKurt Van Dijck2-0/+6
This patch makes the CAN socket code conform to the manpage of sendmsg. Signed-off-by: Kurt Van Dijck <kurt.van.dijck@eia.be> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-15Merge branch 'for-david' of git://git.open-mesh.org/ecsv/linux-mergeDavid S. Miller4-14/+16
2011-01-16batman-adv: Use "__attribute__" shortcut macrosSven Eckelmann3-12/+12
Linux 2.6.21 defines different macros for __attribute__ which are also used inside batman-adv. The next version of checkpatch.pl warns about the usage of __attribute__((packed))). Linux 2.6.33 defines an extra macro __always_unused which is used to assist source code analyzers and can be used to removed the last existing __attribute__ inside the source code. Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds8-38/+37
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits) GRETH: resolve SMP issues and other problems GRETH: handle frame error interrupts GRETH: avoid writing bad speed/duplex when setting transfer mode GRETH: fixed skb buffer memory leak on frame errors GRETH: GBit transmit descriptor handling optimization GRETH: fix opening/closing GRETH: added raw AMBA vendor/device number to match against. cassini: Fix build bustage on x86. e1000e: consistent use of Rx/Tx vs. RX/TX/rx/tx in comments/logs e1000e: update Copyright for 2011 e1000: Avoid unhandled IRQ r8169: keep firmware in memory. netdev: tilepro: Use is_unicast_ether_addr helper etherdevice.h: Add is_unicast_ether_addr function ks8695net: Use default implementation of ethtool_ops::get_link ks8695net: Disable non-working ethtool operations USB CDC NCM: Don't deref NULL in cdc_ncm_rx_fixup() and don't use uninitialized variable. vxge: Remember to release firmware after upgrading firmware netdev: bfin_mac: Remove is_multicast_ether_addr use in netdev_for_each_mc_addr ipsec: update MAX_AH_AUTH_LEN to support sha512 ...
2011-01-14Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linuxLinus Torvalds10-94/+141
* 'for-2.6.38' of git://linux-nfs.org/~bfields/linux: (62 commits) nfsd4: fix callback restarting nfsd: break lease on unlink, link, and rename nfsd4: break lease on nfsd setattr nfsd: don't support msnfs export option nfsd4: initialize cb_per_client nfsd4: allow restarting callbacks nfsd4: simplify nfsd4_cb_prepare nfsd4: give out delegations more quickly in 4.1 case nfsd4: add helper function to run callbacks nfsd4: make sure sequence flags are set after destroy_session nfsd4: re-probe callback on connection loss nfsd4: set sequence flag when backchannel is down nfsd4: keep finer-grained callback status rpc: allow xprt_class->setup to return a preexisting xprt rpc: keep backchannel xprt as long as server connection rpc: move sk_bc_xprt to svc_xprt nfsd4: allow backchannel recovery nfsd4: support BIND_CONN_TO_SESSION nfsd4: modify session list under cl_lock Documentation: fl_mylease no longer exists ... Fix up conflicts in fs/nfsd/vfs.c with the vfs-scale work. The vfs-scale work touched some msnfs cases, and this merge removes support for that entirely, so the conflict was trivial to resolve.
2011-01-14rxrpc: rxrpc_workqueue isn't used during memory reclaimTejun Heo1-1/+1
rxrpc_workqueue isn't depended upon while reclaiming memory. Convert to alloc_workqueue() without WQ_MEM_RECLAIM. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Cc: linux-afs@lists.infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13net: remove dev_txq_stats_fold()Eric Dumazet2-34/+21
After recent changes, (percpu stats on vlan/tunnels...), we dont need anymore per struct netdev_queue tx_bytes/tx_packets/tx_dropped counters. Only remaining users are ixgbe, sch_teql, gianfar & macvlan : 1) ixgbe can be converted to use existing tx_ring counters. 2) macvlan incremented txq->tx_dropped, it can use the dev->stats.tx_dropped counter. 3) sch_teql : almost revert ab35cd4b8f42 (Use net_device internal stats) Now we have ndo_get_stats64(), use it, even for "unsigned long" fields (No need to bring back a struct net_device_stats) 4) gianfar adds a stats structure per tx queue to hold tx_bytes/tx_packets This removes a lockdep warning (and possible lockup) in rndis gadget, calling dev_get_stats() from hard IRQ context. Ref: http://www.spinics.net/lists/netdev/msg149202.html Reported-by: Neil Jones <neiljay@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jarek Poplawski <jarkao2@gmail.com> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Sandeep Gopalpet <sandeep.kumar@freescale.com> CC: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-13batman-adv: Even Batman should not dereference NULL pointersJesper Juhl1-2/+4
There's a problem in net/batman-adv/unicast.c::frag_send_skb(). dev_alloc_skb() allocates memory and may fail, thus returning NULL. If this happens we'll pass a NULL pointer on to skb_split() which in turn hands it to skb_split_inside_header() from where it gets passed to skb_put() that lets skb_tail_pointer() play with it and that function dereferences it. And thus the bat dies. While I was at it I also moved the call to dev_alloc_skb() above the assignment to 'unicast_packet' since there's no reason to do that assignment if the memory allocation fails. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-13mac80211: use maximum number of AMPDU frames as default in BA RXLuciano Coelho1-9/+2
When the buffer size is set to zero in the block ack parameter set field, we should use the maximum supported number of subframes. The existing code was bogus and was doing some unnecessary calculations that lead to wrong values. Thanks Johannes for helping me figure this one out. Cc: stable@kernel.org Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Luciano Coelho <coelho@ti.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-13mac80211: fix lockdep warningJohannes Berg1-1/+11
Since the introduction of the fixes for the reorder timer, mac80211 will cause lockdep warnings because lockdep confuses local->skb_queue and local->rx_skb_queue and treats their lock as the same. However, their locks are different, and are valid in different contexts (the former is used in IRQ context, the latter in BH only) and the only thing to be done is mark the former as a different lock class so that lockdep can tell the difference. Reported-by: Larry Finger <Larry.Finger@lwfinger.net> Reported-by: Sujith <m.sujith@gmail.com> Reported-by: Miles Lane <miles.lane@gmail.com> Tested-by: Sujith <m.sujith@gmail.com> Tested-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-13Merge branch 'master' of git://1984.lsi.us.es/net-2.6David S. Miller1-1/+2
2011-01-13Merge branch 'for-linus' of ↵Linus Torvalds1-15/+15
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (41 commits) fs: add documentation on fallocate hole punching Gfs2: fail if we try to use hole punch Btrfs: fail if we try to use hole punch Ext4: fail if we try to use hole punch Ocfs2: handle hole punching via fallocate properly XFS: handle hole punching via fallocate properly fs: add hole punching to fallocate vfs: pass struct file to do_truncate on O_TRUNC opens (try #2) fix signedness mess in rw_verify_area() on 64bit architectures fs: fix kernel-doc for dcache::prepend_path fs: fix kernel-doc for dcache::d_validate sanitize ecryptfs ->mount() switch afs move internal-only parts of ncpfs headers to fs/ncpfs switch ncpfs switch 9p pass default dentry_operations to mount_pseudo() switch hostfs switch affs switch configfs ...
2011-01-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds1-3/+3
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (46 commits) hwrng: via_rng - Fix memory scribbling on some CPUs crypto: padlock - Move padlock.h into include/crypto hwrng: via_rng - Fix asm constraints crypto: n2 - use __devexit not __exit in n2_unregister_algs crypto: mark crypto workqueues CPU_INTENSIVE crypto: mv_cesa - dont return PTR_ERR() of wrong pointer crypto: ripemd - Set module author and update email address crypto: omap-sham - backlog handling fix crypto: gf128mul - Remove experimental tag crypto: af_alg - fix af_alg memory_allocated data type crypto: aesni-intel - Fixed build with binutils 2.16 crypto: af_alg - Make sure sk_security is initialized on accept()ed sockets net: Add missing lockdep class names for af_alg include: Install linux/if_alg.h for user-space crypto API crypto: omap-aes - checkpatch --file warning fixes crypto: omap-aes - initialize aes module once per request crypto: omap-aes - unnecessary code removed crypto: omap-aes - error handling implementation improved crypto: omap-aes - redundant locking is removed crypto: omap-aes - DMA initialization fixes for OMAP off mode ...
2011-01-13Merge branch 'for-linus' of ↵Linus Torvalds3-45/+8
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: fix cleanup when trying to mount inexistent image net/ceph: make ceph_msgr_wq non-reentrant ceph: fsc->*_wq's aren't used in memory reclaim path ceph: Always free allocated memory in osdmap_decode() ceph: Makefile: Remove unnessary code ceph: associate requests with opening sessions ceph: drop redundant r_mds field ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS ceph: add dir_layout to inode
2011-01-13Merge branch 'for-next' of ↵Linus Torvalds8-8/+14
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits) Documentation/trace/events.txt: Remove obsolete sched_signal_send. writeback: fix global_dirty_limits comment runtime -> real-time ppc: fix comment typo singal -> signal drivers: fix comment typo diable -> disable. m68k: fix comment typo diable -> disable. wireless: comment typo fix diable -> disable. media: comment typo fix diable -> disable. remove doc for obsolete dynamic-printk kernel-parameter remove extraneous 'is' from Documentation/iostats.txt Fix spelling milisec -> ms in snd_ps3 module parameter description Fix spelling mistakes in comments Revert conflicting V4L changes i7core_edac: fix typos in comments mm/rmap.c: fix comment sound, ca0106: Fix assignment to 'channel'. hrtimer: fix a typo in comment init/Kconfig: fix typo anon_inodes: fix wrong function name in comment fix comment typos concerning "consistent" poll: fix a typo in comment ... Fix up trivial conflicts in: - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c) - fs/ext4/ext4.h Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.
2011-01-13netfilter: ctnetlink: fix loop in ctnetlink_get_conntrack()Pablo Neira Ayuso1-1/+2
This patch fixes a loop in ctnetlink_get_conntrack() that can be triggered if you use the same socket to receive events and to perform a GET operation. Under heavy load, netlink_unicast() may return -EAGAIN, this error code is reserved in nfnetlink for the module load-on-demand. Instead, we return -ENOBUFS which is the appropriate error code that has to be propagated to user-space. Reported-by: Holger Eitzenberger <holger@eitzenberger.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-01-12eth: fix new kernel-doc warningRandy Dunlap1-1/+1
Fix new kernel-doc warning (copy-paste typo): Warning(net/ethernet/eth.c:366): No description found for parameter 'rxqs' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-12Merge branch 'master' of git://1984.lsi.us.es/net-2.6David S. Miller2-1/+9
2011-01-12inet6: prevent network storms caused by linux IPv6 routersAlexey Kuznetsov1-0/+3
Linux IPv6 forwards unicast packets, which are link layer multicasts... The hole was present since day one. I was 100% this check is there, but it is not. The problem shows itself, f.e. when Microsoft Network Load Balancer runs on a network. This software resolves IPv6 unicast addresses to multicast MAC addresses. Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-12pass default dentry_operations to mount_pseudo()Al Viro1-15/+15
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12net/ceph: make ceph_msgr_wq non-reentrantTejun Heo1-44/+2
ceph messenger code does a rather complex dancing around multithread workqueue to make sure the same work item isn't executed concurrently on different CPUs. This restriction can be provided by workqueue with WQ_NON_REENTRANT. Make ceph_msgr_wq non-reentrant workqueue with the default concurrency level and remove the QUEUED/BUSY logic. * This removes backoff handling in con_work() but it couldn't reliably block execution of con_work() to begin with - queue_con() can be called after the work started but before BUSY is set. It seems that it was an optimization for a rather cold path and can be safely removed. * The number of concurrent work items is bound by the number of connections and connetions are independent from each other. With the default concurrency level, different connections will be executed independently. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Sage Weil <sage@newdream.net> Cc: ceph-devel@vger.kernel.org Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-12ceph: Always free allocated memory in osdmap_decode()Jesper Juhl1-1/+3
Always free memory allocated to 'pi' in net/ceph/osdmap.c::osdmap_decode(). Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-12ceph: add dir_layout to inodeSage Weil1-0/+3
Add a ceph_dir_layout to the inode, and calculate dentry hash values based on the parent directory's specified dir_hash function. This is needed because the old default Linux dcache hash function is extremely week and leads to a poor distribution of files among dir fragments. Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-12netfilter: fix compilation when conntrack is disabled but tproxy is enabledKOVACS Krisztian2-1/+9
The IPv6 tproxy patches split IPv6 defragmentation off of conntrack, but failed to update the #ifdef stanzas guarding the defragmentation related fields and code in skbuff and conntrack related code in nf_defrag_ipv6.c. This patch adds the required #ifdefs so that IPv6 tproxy can truly be used without connection tracking. Original report: http://marc.info/?l=linux-netdev&m=129010118516341&w=2 Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-01-12net: ax25: fix information leak to userland harderKees Cook1-1/+1
Commit fe10ae53384e48c51996941b7720ee16995cbcb7 adds a memset() to clear the structure being sent back to userspace, but accidentally used the wrong size. Reported-by: Brad Spengler <spender@grsecurity.net> Signed-off-by: Kees Cook <kees.cook@canonical.com> Cc: stable@kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds45-279/+235
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (67 commits) cxgb4vf: recover from failure in cxgb4vf_open() netfilter: ebtables: make broute table work again netfilter: fix race in conntrack between dump_table and destroy ah: reload pointers to skb data after calling skb_cow_data() ah: update maximum truncated ICV length xfrm: check trunc_len in XFRMA_ALG_AUTH_TRUNC ehea: Increase the skb array usage net/fec: remove config FEC2 as it's used nowhere pcnet_cs: add new_id tcp: disallow bind() to reuse addr/port net/r8169: Update the function of parsing firmware net: ppp: use {get,put}_unaligned_be{16,32} CAIF: Fix IPv6 support in receive path for GPRS/3G arp: allow to invalidate specific ARP entries net_sched: factorize qdisc stats handling mlx4: Call alloc_etherdev to allocate RX and TX queues net: Add alloc_netdev_mqs function caif: don't set connection request param size before copying data cxgb4vf: fix mailbox data/control coherency domain race qlcnic: change module parameter permissions ...
2011-01-11Merge branch 'master' of git://1984.lsi.us.es/net-2.6David S. Miller5-103/+49
2011-01-11Merge branch 'nfs-for-2.6.38' of ↵Linus Torvalds9-196/+345
git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'nfs-for-2.6.38' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (89 commits) NFS fix the setting of exchange id flag NFS: Don't use vm_map_ram() in readdir NFSv4: Ensure continued open and lockowner name uniqueness NFS: Move cl_delegations to the nfs_server struct NFS: Introduce nfs_detach_delegations() NFS: Move cl_state_owners and related fields to the nfs_server struct NFS: Allow walking nfs_client.cl_superblocks list outside client.c pnfs: layout roc code pnfs: update nfs4_callback_recallany to handle layouts pnfs: add CB_LAYOUTRECALL handling pnfs: CB_LAYOUTRECALL xdr code pnfs: change lo refcounting to atomic_t pnfs: check that partial LAYOUTGET return is ignored pnfs: add layout to client list before sending rpc pnfs: serialize LAYOUTGET(openstateid) pnfs: layoutget rpc code cleanup pnfs: change how lsegs are removed from layout list pnfs: change layout state seqlock to a spinlock pnfs: add prefix to struct pnfs_layout_hdr fields pnfs: add prefix to struct pnfs_layout_segment fields ...
2011-01-11netfilter: fix race in conntrack between dump_table and destroyStephen Hemminger1-9/+5
The netlink interface to dump the connection tracking table has a race when entries are deleted at the same time. A customer reported a crash and the backtrace showed thatctnetlink_dump_table was running while a conntrack entry was being destroyed. (see https://bugzilla.vyatta.com/show_bug.cgi?id=6402). According to RCU documentation, when using hlist_nulls the reader must handle the case of seeing a deleted entry and not proceed further down the linked list. The old code would continue which caused the scan to walk into the free list. This patch uses locking (rather than RCU) for this operation which is guaranteed safe, and no longer requires getting reference while doing dump operation. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-01-11ah: reload pointers to skb data after calling skb_cow_data()Dang Hongwu2-6/+9
skb_cow_data() may allocate a new data buffer, so pointers on skb should be set after this function. Bug was introduced by commit dff3bb06 ("ah4: convert to ahash") and 8631e9bd ("ah6: convert to ahash"). Signed-off-by: Wang Xuefu <xuefu.wang@6wind.com> Acked-by: Krzysztof Witek <krzysztof.witek@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-11xfrm: check trunc_len in XFRMA_ALG_AUTH_TRUNCNicolas Dichtel1-1/+3
Maximum trunc length is defined by MAX_AH_AUTH_LEN (in bytes) and need to be checked when this value is set (in bits) by the user. In ah4.c and ah6.c a BUG_ON() checks this condiftion. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-11tcp: disallow bind() to reuse addr/portEric Dumazet2-3/+4
inet_csk_bind_conflict() logic currently disallows a bind() if it finds a friend socket (a socket bound on same address/port) satisfying a set of conditions : 1) Current (to be bound) socket doesnt have sk_reuse set OR 2) other socket doesnt have sk_reuse set OR 3) other socket is in LISTEN state We should add the CLOSE state in the 3) condition, in order to avoid two REUSEADDR sockets in CLOSE state with same local address/port, since this can deny further operations. Note : a prior patch tried to address the problem in a different (and buggy) way. (commit fda48a0d7a8412ced tcp: bind() fix when many ports are bound). Reported-by: Gaspar Chilingarov <gasparch@gmail.com> Reported-by: Daniel Baluta <daniel.baluta@gmail.com> Tested-by: Daniel Baluta <daniel.baluta@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-11rpc: allow xprt_class->setup to return a preexisting xprtJ. Bruce Fields2-9/+12
This allows us to reuse the xprt associated with a server connection if one has already been set up. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-11rpc: keep backchannel xprt as long as server connectionJ. Bruce Fields3-11/+29
Multiple backchannels can share the same tcp connection; from rfc 5661 section 2.10.3.1: A connection's association with a session is not exclusive. A connection associated with the channel(s) of one session may be simultaneously associated with the channel(s) of other sessions including sessions associated with other client IDs. However, multiple backchannels share a connection, they must all share the same xid stream (hence the same rpc_xprt); the only way we have to match replies with calls at the rpc layer is using the xid. So, keep the rpc_xprt around as long as the connection lasts, in case we're asked to use the connection as a backchannel again. Requests to create new backchannel clients over a given server connection should results in creating new clients that reuse the existing rpc_xprt. But to start, just reject attempts to associate multiple rpc_xprt's with the same underlying bc_xprt. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-11rpc: move sk_bc_xprt to svc_xprtJ. Bruce Fields2-5/+7
This seems obviously transport-level information even if it's currently used only by the server socket code. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-11Merge commit 'v2.6.37' into for-2.6.38-incomingJ. Bruce Fields25-82/+154
I made a slight mess of Documentation/filesystems/Locking; resolve conflicts with upstream before fixing it up.