summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2011-12-03vlan: Move vlan_set_encap_proto() to vlan header filePravin B Shelar1-33/+0
Open vSwitch needs this function for vlan handling. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2011-12-03genetlink: Add lockdep_genl_is_held().Pravin B Shelar1-0/+8
Open vSwitch uses genl_mutex locking to protect datapath data-structures like flow-table, flow-actions. Following patch adds lockdep_genl_is_held() which is used for rcu annotation to prove locking. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2011-12-03genetlink: Add genl_notify()Pravin B Shelar1-0/+13
Open vSwitch uses Generic Netlink interface for communication between userspace and kernel module. genl_notify() is used for sending notification back to userspace. genl_notify() is analogous to rtnl_notify() but uses genl_sock instead of rtnl. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2011-12-02atm: clip: Remove code commented out since eternity.David S. Miller1-9/+0
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller27-132/+253
2011-12-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds44-186/+337
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (73 commits) netfilter: Remove ADVANCED dependency from NF_CONNTRACK_NETBIOS_NS ipv4: flush route cache after change accept_local sch_red: fix red_change Revert "udp: remove redundant variable" bridge: master device stuck in no-carrier state forever when in user-stp mode ipv4: Perform peer validation on cached route lookup. net/core: fix rollback handler in register_netdevice_notifier sch_red: fix red_calc_qavg_from_idle_time bonding: only use primary address for ARP ipv4: fix lockdep splat in rt_cache_seq_show sch_teql: fix lockdep splat net: fec: Select the FEC driver by default for i.MX SoCs isdn: avoid copying too long drvid isdn: make sure strings are null terminated netlabel: Fix build problems when IPv6 is not enabled sctp: better integer overflow check in sctp_auth_create_key() sctp: integer overflow in sctp_auth_create_key() ipv6: Set mcast_hops to IPV6_DEFAULT_MCASTHOPS when -1 was given. net: Fix corruption in /proc/*/net/dev_mcast mac80211: fix race between the AGG SM and the Tx data path ...
2011-12-01netfilter: Remove ADVANCED dependency from NF_CONNTRACK_NETBIOS_NSDavid S. Miller1-1/+0
firewalld in Fedora 16 needs this. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-01ipv4: flush route cache after change accept_localPeter Pan(潘卫平)1-0/+5
After reset ipv4_devconf->data[IPV4_DEVCONF_ACCEPT_LOCAL] to 0, we should flush route cache, or it will continue receive packets with local source address, which should be dropped. Signed-off-by: Weiping Pan <panweiping3@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-01sch_red: fix red_changeEric Dumazet1-2/+2
Le mercredi 30 novembre 2011 à 14:36 -0800, Stephen Hemminger a écrit : > (Almost) nobody uses RED because they can't figure it out. > According to Wikipedia, VJ says that: > "there are not one, but two bugs in classic RED." RED is useful for high throughput routers, I doubt many linux machines act as such devices. I was considering adding Adaptative RED (Sally Floyd, Ramakrishna Gummadi, Scott Shender), August 2001 In this version, maxp is dynamic (from 1% to 50%), and user only have to setup min_th (target average queue size) (max_th and wq (burst in linux RED) are automatically setup) By the way it seems we have a small bug in red_change() if (skb_queue_empty(&sch->q)) red_end_of_idle_period(&q->parms); First, if queue is empty, we should call red_start_of_idle_period(&q->parms); Second, since we dont use anymore sch->q, but q->qdisc, the test is meaningless. Oh well... [PATCH] sch_red: fix red_change() Now RED is classful, we must check q->qdisc->q.qlen, and if queue is empty, we start an idle period, not end it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-01dccp: Fix compile warning in probe code.David S. Miller1-2/+12
Commit 1386be55e32a3c5d8ef4a2b243c530a7b664c02c ("dccp: fix auto-loading of dccp(_probe)") fixed a bug but created a new compiler warning: net/dccp/probe.c: In function ‘dccpprobe_init’: net/dccp/probe.c:166:2: warning: the omitted middle operand in ?: will always be ‘true’, suggest explicit middle operand [-Wparentheses] try_then_request_module() is built for situations where the "existence" test is some lookup function that returns a non-NULL object on success, and with a reference count of some kind held. Here we're looking for a success return of zero from the jprobe registry. Instead of fighting the way try_then_request_module() works, simply open code what we want to happen in a local helper function. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-01Revert "udp: remove redundant variable"David S. Miller2-14/+16
This reverts commit 81d54ec8479a2c695760da81f05b5a9fb2dbe40a. If we take the "try_again" goto, due to a checksum error, the 'len' has already been truncated. So we won't compute the same values as the original code did. Reported-by: paul bilke <fsmail@conspiracy.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-01bridge: master device stuck in no-carrier state forever when in user-stp modeVitalii Demianets2-15/+20
When in user-stp mode, bridge master do not follow state of its slaves, so after the following sequence of events it can stuck forever in no-carrier state: 1) turn stp off 2) put all slaves down - master device will follow their state and also go in no-carrier state 3) turn stp on with bridge-stp script returning 0 (go to the user-stp mode) Now bridge master won't follow slaves' state and will never reach running state. This patch solves the problem by making user-stp and kernel-stp behavior similar regarding master following slaves' states. Signed-off-by: Vitalii Demianets <vitas@nppfactor.kiev.ua> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-01ipv4: Perform peer validation on cached route lookup.David S. Miller1-7/+19
Otherwise we won't notice the peer GENID change. Reported-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-01ipv4: use a 64bit load/store in output pathEric Dumazet1-4/+17
gcc compiler is smart enough to use a single load/store if we memcpy(dptr, sptr, 8) on x86_64, regardless of CONFIG_CC_OPTIMIZE_FOR_SIZE In IP header, daddr immediately follows saddr, this wont change in the future. We only need to make sure our flowi4 (saddr,daddr) fields wont break the rule. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-01dccp: Evaluate ip_hdr() only once in dccp_v4_route_skb().David S. Miller1-2/+3
This also works around a bogus gcc warning generated by an upcoming patch from Eric Dumazet that rearranges the layout of struct flowi4. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-01net: net_device flags is an unsigned intEric Dumazet1-7/+7
commit b00055aacdb ([NET] core: add RFC2863 operstate) changed net_device flags from unsigned short to unsigned int. Some core functions still assume its an unsigned short. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-01netem: fix build error on 32bit archesEric Dumazet1-1/+4
ERROR: "__udivdi3" [net/sched/sch_netem.ko] undefined! Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-30net/core: fix rollback handler in register_netdevice_notifierRongQing.Li1-1/+2
Within nested statements, the break statement terminates only the do, for, switch, or while statement that immediately encloses it, So replace the break with goto. Signed-off-by: RongQing.Li <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-30caif: Remove unused enum and parameter in cfserlsjur.brandeland@stericsson.com2-3/+2
Remove unused enum cfcnfg_phy_type and the parameter to cfserl_create. Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-30caif: Restructure how link caif link layer enrollsjur.brandeland@stericsson.com2-86/+106
Enrolling CAIF link layers are refactored. Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-30caif: Allow cfpkt_extr_head to process empty messagesjur.brandeland@stericsson.com1-1/+2
Allow NULL pointer in cfpkt_extr_head in order to skip past header data. Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-30netem: rate extensionHagen Paul Pfeifer1-0/+40
Currently netem is not in the ability to emulate channel bandwidth. Only static delay (and optional random jitter) can be configured. To emulate the channel rate the token bucket filter (sch_tbf) can be used. But TBF has some major emulation flaws. The buffer (token bucket depth/rate) cannot be 0. Also the idea behind TBF is that the credit (token in buckets) fills if no packet is transmitted. So that there is always a "positive" credit for new packets. In real life this behavior contradicts the law of nature where nothing can travel faster as speed of light. E.g.: on an emulated 1000 byte/s link a small IPv4/TCP SYN packet with ~50 byte require ~0.05 seconds - not 0 seconds. Netem is an excellent place to implement a rate limiting feature: static delay is already implemented, tfifo already has time information and the user can skip TBF configuration completely. This patch implement rate feature which can be configured via tc. e.g: tc qdisc add dev eth0 root netem rate 10kbit To emulate a link of 5000byte/s and add an additional static delay of 10ms: tc qdisc add dev eth0 root netem delay 10ms rate 5KBps Note: similar to TBF the rate extension is bounded to the kernel timing system. Depending on the architecture timer granularity, higher rates (e.g. 10mbit/s and higher) tend to transmission bursts. Also note: further queues living in network adaptors; see ethtool(8). Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@drr.davemloft.net>
2011-11-30ipv6 : mcast : Delete useless parameter in ip6_mc_add1_src()Jun Zhao1-2/+2
Need not to used 'delta' flag when add single-source to interface filter source list. Signed-off-by: Jun Zhao <mypopydev@gmail.com> Signed-off-by: David S. Miller <davem@drr.davemloft.net>
2011-11-30ipv4 : igmp : Delete useless parameter in ip_mc_add1_src()Jun Zhao1-2/+2
Need not to used 'delta' flag when add single-source to interface filter source list. Signed-off-by: Jun Zhao <mypopydev@gmail.com> Signed-off-by: David S. Miller <davem@drr.davemloft.net>
2011-11-30atm: clip: Use device neigh support on top of "arp_tbl".David Miller3-85/+16
Instead of instantiating an entire new neigh_table instance just for ATM handling, use the neigh device private facility. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-30neigh: Add device constructor/destructor capability.David Miller1-1/+14
If the neigh entry has device private state, it will need constructor/destructor ops. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-30atm: clip: Convert over to neighbour_priv()David Miller1-13/+15
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-30neigh: Do not set tbl->entry_size in ipv4/ipv6 neigh tables.David Miller3-3/+0
Let the core self-size the neigh entry based upon the key length. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-30neigh: Add infrastructure for allocating device neigh privates.David Miller2-3/+12
netdev->neigh_priv_len records the private area length. This will trigger for neigh_table objects which set tbl->entry_size to zero, and the first instances of this will be forthcoming. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-30neigh: Get rid of neigh_table->kmem_cachepDavid Miller1-16/+2
We are going to alloc for device specific private areas for neighbour entries, and in order to do that we have to move away from the fixed allocation size enforced by using neigh_table->kmem_cachep As a nice side effect we can now use kfree_rcu(). Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-30ipv4: fix lockdep splat in rt_cache_seq_showEric Dumazet1-2/+6
After commit f2c31e32b378 (fix NULL dereferences in check_peer_redir()), dst_get_neighbour() should be guarded by rcu_read_lock() / rcu_read_unlock() section. Reported-by: Miles Lane <miles.lane@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-30sch_teql: fix lockdep splatEric Dumazet1-11/+20
We need rcu_read_lock() protection before using dst_get_neighbour(), and we must cache its value (pass it to __teql_resolve()) teql_master_xmit() is called under rcu_read_lock_bh() protection, its not enough. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-30tcp: inherit listener congestion control for passive cnxEric Dumazet2-1/+4
Rick Jones reported that TCP_CONGESTION sockopt performed on a listener was ignored for its children sockets : right after accept() the congestion control for new socket is the system default one. This seems an oversight of the initial design (quoted from Stephen) Based on prior investigation and patch from Rick. Reported-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Stephen Hemminger <shemminger@vyatta.com> CC: Yuchung Cheng <ycheng@google.com> Tested-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-30Merge branch 'master' of ↵John W. Linville2-5/+41
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2011-11-29ipv4: remove useless codes in ipmr_device_event()RongQing.Li1-3/+1
Commit 7dc00c82 added a 'notify' parameter for vif_delete() to distinguish whether to unregister the device. When notify=1 means we does not need to unregister the device, so calling unregister_netdevice_many is useless. Signed-off-by: RongQing.Li <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29net: Fix skb_update_prio RCU usage.Igor Maravic1-1/+1
Change function rcu_dereference to rcu_dereference_bh to avoid warning [ INFO: suspicious RCU usage. ] ------------------------------- net/core/dev.c:2459 suspicious rcu_dereference_check() usage! because we are locking with rcu_read_lock_bh(); in function dev_queue_xmit(struct sk_buff *skb) Signed-off-by: Igor Maravic <igorm@etf.rs> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29netlabel: Fix build problems when IPv6 is not enabledPaul Moore1-8/+14
A recent fix to the the NetLabel code caused build problem with configurations that did not have IPv6 enabled; see below: netlabel_kapi.c: In function 'netlbl_cfg_unlbl_map_add': netlabel_kapi.c:165:4: error: implicit declaration of function 'netlbl_af6list_add' This patch fixes this problem by making the IPv6 specific code conditional on the IPv6 configuration flags as we done in the rest of NetLabel and the network stack as a whole. We have to move some variable declarations around as a result so things may not be quite as pretty, but at least it builds cleanly now. Some additional IPv6 conditionals were added to the NetLabel code as well for the sake of consistency. Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Paul Moore <pmoore@redhat.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29sctp: better integer overflow check in sctp_auth_create_key()Xi Wang1-1/+1
The check from commit 30c2235c is incomplete and cannot prevent cases like key_len = 0x80000000 (INT_MAX + 1). In that case, the left-hand side of the check (INT_MAX - key_len), which is unsigned, becomes 0xffffffff (UINT_MAX) and bypasses the check. However this shouldn't be a security issue. The function is called from the following two code paths: 1) setsockopt() 2) sctp_auth_asoc_set_secret() In case (1), sca_keylength is never going to exceed 65535 since it's bounded by a u16 from the user API. As such, the key length will never overflow. In case (2), sca_keylength is computed based on the user key (1 short) and 2 * key_vector (3 shorts) for a total of 7 * USHRT_MAX, which still will not overflow. In other words, this overflow check is not really necessary. Just make it more correct. Signed-off-by: Xi Wang <xi.wang@gmail.com> Cc: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29sch_choke: use skb_flow_dissect()Eric Dumazet1-89/+31
Instead of using a custom flow dissector, use skb_flow_dissect() and benefit from tunnelling support. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29sch_sfq: use skb_flow_dissect()Eric Dumazet1-55/+10
Instead of using a custom flow dissector, use skb_flow_dissect() and benefit from tunnelling support. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29tcp: avoid frag allocation for small framesEric Dumazet1-3/+6
tcp_sendmsg() uses select_size() helper to choose skb head size when a new skb must be allocated. If GSO is enabled for the socket, current strategy is to force all payload data to be outside of headroom, in PAGE fragments. This strategy is not welcome for small packets, wasting memory. Experiments show that best results are obtained when using 2048 bytes for skb head (This includes the skb overhead and various headers) This patch provides better len/truesize ratios for packets sent to loopback device, and reduce memory needs for in-flight loopback packets, particularly on arches with big pages. If a sender sends many 1-byte packets to an unresponsive application, receiver rmem_alloc will grow faster and will stop queuing these packets sooner, or will collapse its receive queue to free excess memory. netperf -t TCP_RR results are improved by ~4 %, and many workloads are improved as well (tbench, mysql...) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29flow_dissector: use a 64bit load/storeEric Dumazet1-2/+11
Le lundi 28 novembre 2011 à 19:06 -0500, David Miller a écrit : > From: Dimitris Michailidis <dm@chelsio.com> > Date: Mon, 28 Nov 2011 08:25:39 -0800 > > >> +bool skb_flow_dissect(const struct sk_buff *skb, struct flow_keys > >> *flow) > >> +{ > >> + int poff, nhoff = skb_network_offset(skb); > >> + u8 ip_proto; > >> + u16 proto = skb->protocol; > > > > __be16 instead of u16 for proto? > > I'll take care of this when I apply these patches. ( CC trimmed ) Thanks David ! Here is a small patch to use one 64bit load/store on x86_64 instead of two 32bit load/stores. [PATCH net-next] flow_dissector: use a 64bit load/store gcc compiler is smart enough to use a single load/store if we memcpy(dptr, sptr, 8) on x86_64, regardless of CONFIG_CC_OPTIMIZE_FOR_SIZE In IP header, daddr immediately follows saddr, this wont change in the future. We only need to make sure our flow_keys (src,dst) fields wont break the rule. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29bql: Byte queue limitsTom Herbert3-8/+141
Networking stack support for byte queue limits, uses dynamic queue limits library. Byte queue limits are maintained per transmit queue, and a dql structure has been added to netdev_queue structure for this purpose. Configuration of bql is in the tx-<n> sysfs directory for the queue under the byte_queue_limits directory. Configuration includes: limit_min, bql minimum limit limit_max, bql maximum limit hold_time, bql slack hold time Also under the directory are: limit, current byte limit inflight, current number of bytes on the queue Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29xps: Add xps_queue_release functionTom Herbert1-42/+47
This patch moves the xps specific parts in netdev_queue_release into its own function which netdev_queue_release can call. This allows netdev_queue_release to be more generic (for adding new attributes to tx queues). Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29net: Add queue state xoff flag for stackTom Herbert6-14/+16
Create separate queue state flags so that either the stack or drivers can turn on XOFF. Added a set of functions used in the stack to determine if a queue is really stopped (either by stack or driver) Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29Merge branch 'nf' of git://1984.lsi.us.es/netDavid S. Miller6-44/+75
2011-11-29tcp: do not scale TSO segment size with reordering degreeNeal Cardwell2-2/+2
Since 2005 (c1b4a7e69576d65efc31a8cea0714173c2841244) tcp_tso_should_defer has been using tcp_max_burst() as a target limit for deciding how large to make outgoing TSO packets when not using sysctl_tcp_tso_win_divisor. But since 2008 (dd9e0dda66ba38a2ddd1405ac279894260dc5c36) tcp_max_burst() returns the reordering degree. We should not have tcp_tso_should_defer attempt to build larger segments just because there is more reordering. This commit splits the notion of deferral size used in TSO from the notion of burst size used in cwnd moderation, and returns the TSO deferral limit to its original value. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29atm: br2684: Avoid alignment issuesPascal Hambourg1-1/+2
Use memcmp() instead of cast to u16 when checking the PAD field. Signed-off-by: Pascal Hambourg <pascal@plouf.fr.eu.org> Signed-off-by: chas williams - CONTRACTOR <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29atm: br2684: Make headroom and hard_header_len depend on the payload typePascal Hambourg1-2/+6
Routed payload requires less headroom than bridged payload. So do not reallocate headroom if not needed. Also, add worst case AAL5 overhead to netdev->hard_header_len. Signed-off-by: Pascal Hambourg <pascal@plouf.fr.eu.org> Signed-off-by: chas williams - CONTRACTOR <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29net: optimize socket timestampingEric Dumazet1-15/+12
We can test/set multiple bits from sk_flags at once, to shorten a bit socket setup/dismantle phase. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>