diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2013-11-13 17:40:34 +0900 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2013-11-13 17:40:34 +0900 |
commit | 42a2d923cc349583ebf6fdd52a7d35e1c2f7e6bd (patch) | |
tree | 2b2b0c03b5389c1301800119333967efafd994ca /net | |
parent | 5cbb3d216e2041700231bcfc383ee5f8b7fc8b74 (diff) | |
parent | 75ecab1df14d90e86cebef9ec5c76befde46e65f (diff) | |
download | linux-42a2d923cc349583ebf6fdd52a7d35e1c2f7e6bd.tar.bz2 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
1) The addition of nftables. No longer will we need protocol aware
firewall filtering modules, it can all live in userspace.
At the core of nftables is a, for lack of a better term, virtual
machine that executes byte codes to inspect packet or metadata
(arriving interface index, etc.) and make verdict decisions.
Besides support for loading packet contents and comparing them, the
interpreter supports lookups in various datastructures as
fundamental operations. For example sets are supports, and
therefore one could create a set of whitelist IP address entries
which have ACCEPT verdicts attached to them, and use the appropriate
byte codes to do such lookups.
Since the interpreted code is composed in userspace, userspace can
do things like optimize things before giving it to the kernel.
Another major improvement is the capability of atomically updating
portions of the ruleset. In the existing netfilter implementation,
one has to update the entire rule set in order to make a change and
this is very expensive.
Userspace tools exist to create nftables rules using existing
netfilter rule sets, but both kernel implementations will need to
co-exist for quite some time as we transition from the old to the
new stuff.
Kudos to Patrick McHardy, Pablo Neira Ayuso, and others who have
worked so hard on this.
2) Daniel Borkmann and Hannes Frederic Sowa made several improvements
to our pseudo-random number generator, mostly used for things like
UDP port randomization and netfitler, amongst other things.
In particular the taus88 generater is updated to taus113, and test
cases are added.
3) Support 64-bit rates in HTB and TBF schedulers, from Eric Dumazet
and Yang Yingliang.
4) Add support for new 577xx tigon3 chips to tg3 driver, from Nithin
Sujir.
5) Fix two fatal flaws in TCP dynamic right sizing, from Eric Dumazet,
Neal Cardwell, and Yuchung Cheng.
6) Allow IP_TOS and IP_TTL to be specified in sendmsg() ancillary
control message data, much like other socket option attributes.
From Francesco Fusco.
7) Allow applications to specify a cap on the rate computed
automatically by the kernel for pacing flows, via a new
SO_MAX_PACING_RATE socket option. From Eric Dumazet.
8) Make the initial autotuned send buffer sizing in TCP more closely
reflect actual needs, from Eric Dumazet.
9) Currently early socket demux only happens for TCP sockets, but we
can do it for connected UDP sockets too. Implementation from Shawn
Bohrer.
10) Refactor inet socket demux with the goal of improving hash demux
performance for listening sockets. With the main goals being able
to use RCU lookups on even request sockets, and eliminating the
listening lock contention. From Eric Dumazet.
11) The bonding layer has many demuxes in it's fast path, and an RCU
conversion was started back in 3.11, several changes here extend the
RCU usage to even more locations. From Ding Tianhong and Wang
Yufen, based upon suggestions by Nikolay Aleksandrov and Veaceslav
Falico.
12) Allow stackability of segmentation offloads to, in particular, allow
segmentation offloading over tunnels. From Eric Dumazet.
13) Significantly improve the handling of secret keys we input into the
various hash functions in the inet hashtables, TCP fast open, as
well as syncookies. From Hannes Frederic Sowa. The key fundamental
operation is "net_get_random_once()" which uses static keys.
Hannes even extended this to ipv4/ipv6 fragmentation handling and
our generic flow dissector.
14) The generic driver layer takes care now to set the driver data to
NULL on device removal, so it's no longer necessary for drivers to
explicitly set it to NULL any more. Many drivers have been cleaned
up in this way, from Jingoo Han.
15) Add a BPF based packet scheduler classifier, from Daniel Borkmann.
16) Improve CRC32 interfaces and generic SKB checksum iterators so that
SCTP's checksumming can more cleanly be handled. Also from Daniel
Borkmann.
17) Add a new PMTU discovery mode, IP_PMTUDISC_INTERFACE, which forces
using the interface MTU value. This helps avoid PMTU attacks,
particularly on DNS servers. From Hannes Frederic Sowa.
18) Use generic XPS for transmit queue steering rather than internal
(re-)implementation in virtio-net. From Jason Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
random32: add test cases for taus113 implementation
random32: upgrade taus88 generator to taus113 from errata paper
random32: move rnd_state to linux/random.h
random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized
random32: add periodic reseeding
random32: fix off-by-one in seeding requirement
PHY: Add RTL8201CP phy_driver to realtek
xtsonic: add missing platform_set_drvdata() in xtsonic_probe()
macmace: add missing platform_set_drvdata() in mace_probe()
ethernet/arc/arc_emac: add missing platform_set_drvdata() in arc_emac_probe()
ipv6: protect for_each_sk_fl_rcu in mem_check with rcu_read_lock_bh
vlan: Implement vlan_dev_get_egress_qos_mask as an inline.
ixgbe: add warning when max_vfs is out of range.
igb: Update link modes display in ethtool
netfilter: push reasm skb through instead of original frag skbs
ip6_output: fragment outgoing reassembled skb properly
MAINTAINERS: mv643xx_eth: take over maintainership from Lennart
net_sched: tbf: support of 64bit rates
ixgbe: deleting dfwd stations out of order can cause null ptr deref
ixgbe: fix build err, num_rx_queues is only available with CONFIG_RPS
...
Diffstat (limited to 'net')
428 files changed, 35440 insertions, 13910 deletions
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index 61fc573f1142..b3d17d1c49c3 100644 --- a/net/8021q/vlan.c +++ b/net/8021q/vlan.c @@ -98,14 +98,14 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head) vlan_gvrp_request_leave(dev); vlan_group_set_device(grp, vlan->vlan_proto, vlan_id, NULL); + + netdev_upper_dev_unlink(real_dev, dev); /* Because unregister_netdevice_queue() makes sure at least one rcu * grace period is respected before device freeing, * we dont need to call synchronize_net() here. */ unregister_netdevice_queue(dev, head); - netdev_upper_dev_unlink(real_dev, dev); - if (grp->nr_vlan_devs == 0) { vlan_mvrp_uninit_applicant(real_dev); vlan_gvrp_uninit_applicant(real_dev); @@ -169,13 +169,13 @@ int register_vlan_dev(struct net_device *dev) if (err < 0) goto out_uninit_mvrp; - err = netdev_upper_dev_link(real_dev, dev); - if (err) - goto out_uninit_mvrp; - err = register_netdevice(dev); if (err < 0) - goto out_upper_dev_unlink; + goto out_uninit_mvrp; + + err = netdev_upper_dev_link(real_dev, dev); + if (err) + goto out_unregister_netdev; /* Account for reference in struct vlan_dev_priv */ dev_hold(real_dev); @@ -191,8 +191,8 @@ int register_vlan_dev(struct net_device *dev) return 0; -out_upper_dev_unlink: - netdev_upper_dev_unlink(real_dev, dev); +out_unregister_netdev: + unregister_netdevice(dev); out_uninit_mvrp: if (grp->nr_vlan_devs == 0) vlan_mvrp_uninit_applicant(real_dev); diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h index ba5983f34c42..5704ed9c3a23 100644 --- a/net/8021q/vlan.h +++ b/net/8021q/vlan.h @@ -5,83 +5,6 @@ #include <linux/u64_stats_sync.h> #include <linux/list.h> - -/** - * struct vlan_priority_tci_mapping - vlan egress priority mappings - * @priority: skb priority - * @vlan_qos: vlan priority: (skb->priority << 13) & 0xE000 - * @next: pointer to next struct - */ -struct vlan_priority_tci_mapping { - u32 priority; - u16 vlan_qos; - struct vlan_priority_tci_mapping *next; -}; - - -/** - * struct vlan_pcpu_stats - VLAN percpu rx/tx stats - * @rx_packets: number of received packets - * @rx_bytes: number of received bytes - * @rx_multicast: number of received multicast packets - * @tx_packets: number of transmitted packets - * @tx_bytes: number of transmitted bytes - * @syncp: synchronization point for 64bit counters - * @rx_errors: number of rx errors - * @tx_dropped: number of tx drops - */ -struct vlan_pcpu_stats { - u64 rx_packets; - u64 rx_bytes; - u64 rx_multicast; - u64 tx_packets; - u64 tx_bytes; - struct u64_stats_sync syncp; - u32 rx_errors; - u32 tx_dropped; -}; - -struct netpoll; - -/** - * struct vlan_dev_priv - VLAN private device data - * @nr_ingress_mappings: number of ingress priority mappings - * @ingress_priority_map: ingress priority mappings - * @nr_egress_mappings: number of egress priority mappings - * @egress_priority_map: hash of egress priority mappings - * @vlan_proto: VLAN encapsulation protocol - * @vlan_id: VLAN identifier - * @flags: device flags - * @real_dev: underlying netdevice - * @real_dev_addr: address of underlying netdevice - * @dent: proc dir entry - * @vlan_pcpu_stats: ptr to percpu rx stats - */ -struct vlan_dev_priv { - unsigned int nr_ingress_mappings; - u32 ingress_priority_map[8]; - unsigned int nr_egress_mappings; - struct vlan_priority_tci_mapping *egress_priority_map[16]; - - __be16 vlan_proto; - u16 vlan_id; - u16 flags; - - struct net_device *real_dev; - unsigned char real_dev_addr[ETH_ALEN]; - - struct proc_dir_entry *dent; - struct vlan_pcpu_stats __percpu *vlan_pcpu_stats; -#ifdef CONFIG_NET_POLL_CONTROLLER - struct netpoll *netpoll; -#endif -}; - -static inline struct vlan_dev_priv *vlan_dev_priv(const struct net_device *dev) -{ - return netdev_priv(dev); -} - /* if this changes, algorithm will have to be reworked because this * depends on completely exhausting the VLAN identifier space. Thus * it gives constant time look-up, but in many cases it wastes memory. @@ -196,12 +119,12 @@ static inline u32 vlan_get_ingress_priority(struct net_device *dev, } #ifdef CONFIG_VLAN_8021Q_GVRP -extern int vlan_gvrp_request_join(const struct net_device *dev); -extern void vlan_gvrp_request_leave(const struct net_device *dev); -extern int vlan_gvrp_init_applicant(struct net_device *dev); -extern void vlan_gvrp_uninit_applicant(struct net_device *dev); -extern int vlan_gvrp_init(void); -extern void vlan_gvrp_uninit(void); +int vlan_gvrp_request_join(const struct net_device *dev); +void vlan_gvrp_request_leave(const struct net_device *dev); +int vlan_gvrp_init_applicant(struct net_device *dev); +void vlan_gvrp_uninit_applicant(struct net_device *dev); +int vlan_gvrp_init(void); +void vlan_gvrp_uninit(void); #else static inline int vlan_gvrp_request_join(const struct net_device *dev) { return 0; } static inline void vlan_gvrp_request_leave(const struct net_device *dev) {} @@ -212,12 +135,12 @@ static inline void vlan_gvrp_uninit(void) {} #endif #ifdef CONFIG_VLAN_8021Q_MVRP -extern int vlan_mvrp_request_join(const struct net_device *dev); -extern void vlan_mvrp_request_leave(const struct net_device *dev); -extern int vlan_mvrp_init_applicant(struct net_device *dev); -extern void vlan_mvrp_uninit_applicant(struct net_device *dev); -extern int vlan_mvrp_init(void); -extern void vlan_mvrp_uninit(void); +int vlan_mvrp_request_join(const struct net_device *dev); +void vlan_mvrp_request_leave(const struct net_device *dev); +int vlan_mvrp_init_applicant(struct net_device *dev); +void vlan_mvrp_uninit_applicant(struct net_device *dev); +int vlan_mvrp_init(void); +void vlan_mvrp_uninit(void); #else static inline int vlan_mvrp_request_join(const struct net_device *dev) { return 0; } static inline void vlan_mvrp_request_leave(const struct net_device *dev) {} @@ -229,8 +152,8 @@ static inline void vlan_mvrp_uninit(void) {} extern const char vlan_fullname[]; extern const char vlan_version[]; -extern int vlan_netlink_init(void); -extern void vlan_netlink_fini(void); +int vlan_netlink_init(void); +void vlan_netlink_fini(void); extern struct rtnl_link_ops vlan_link_ops; diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c index 09bf1c38805b..8db1b985dbf1 100644 --- a/net/8021q/vlan_dev.c +++ b/net/8021q/vlan_dev.c @@ -68,25 +68,6 @@ static int vlan_dev_rebuild_header(struct sk_buff *skb) return 0; } -static inline u16 -vlan_dev_get_egress_qos_mask(struct net_device *dev, struct sk_buff *skb) -{ - struct vlan_priority_tci_mapping *mp; - - smp_rmb(); /* coupled with smp_wmb() in vlan_dev_set_egress_priority() */ - - mp = vlan_dev_priv(dev)->egress_priority_map[(skb->priority & 0xF)]; - while (mp) { - if (mp->priority == skb->priority) { - return mp->vlan_qos; /* This should already be shifted - * to mask correctly with the - * VLAN's TCI */ - } - mp = mp->next; - } - return 0; -} - /* * Create the VLAN header for an arbitrary protocol layer * @@ -111,7 +92,7 @@ static int vlan_dev_hard_header(struct sk_buff *skb, struct net_device *dev, vhdr = (struct vlan_hdr *) skb_push(skb, VLAN_HLEN); vlan_tci = vlan->vlan_id; - vlan_tci |= vlan_dev_get_egress_qos_mask(dev, skb); + vlan_tci |= vlan_dev_get_egress_qos_mask(dev, skb->priority); vhdr->h_vlan_TCI = htons(vlan_tci); /* @@ -168,7 +149,7 @@ static netdev_tx_t vlan_dev_hard_start_xmit(struct sk_buff *skb, vlan->flags & VLAN_FLAG_REORDER_HDR) { u16 vlan_tci; vlan_tci = vlan->vlan_id; - vlan_tci |= vlan_dev_get_egress_qos_mask(dev, skb); + vlan_tci |= vlan_dev_get_egress_qos_mask(dev, skb->priority); skb = __vlan_hwaccel_put_tag(skb, vlan->vlan_proto, vlan_tci); } diff --git a/net/Kconfig b/net/Kconfig index b50dacc072f0..0715db64a5c3 100644 --- a/net/Kconfig +++ b/net/Kconfig @@ -220,6 +220,7 @@ source "net/openvswitch/Kconfig" source "net/vmw_vsock/Kconfig" source "net/netlink/Kconfig" source "net/mpls/Kconfig" +source "net/hsr/Kconfig" config RPS boolean diff --git a/net/Makefile b/net/Makefile index 9492e8cb64e9..8fa2f91517f1 100644 --- a/net/Makefile +++ b/net/Makefile @@ -71,3 +71,4 @@ obj-$(CONFIG_NFC) += nfc/ obj-$(CONFIG_OPENVSWITCH) += openvswitch/ obj-$(CONFIG_VSOCKETS) += vmw_vsock/ obj-$(CONFIG_NET_MPLS_GSO) += mpls/ +obj-$(CONFIG_HSR) += hsr/ diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c index 4b4d2b779ec1..a00123ebb0ae 100644 --- a/net/ax25/af_ax25.c +++ b/net/ax25/af_ax25.c @@ -1735,7 +1735,7 @@ static int ax25_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) res = -EFAULT; break; } - if (amount > AX25_NOUID_BLOCK) { + if (amount < 0 || amount > AX25_NOUID_BLOCK) { res = -EINVAL; break; } diff --git a/net/batman-adv/Makefile b/net/batman-adv/Makefile index 489bb36f1b94..4f4aabbd8eab 100644 --- a/net/batman-adv/Makefile +++ b/net/batman-adv/Makefile @@ -24,6 +24,7 @@ batman-adv-y += bitarray.o batman-adv-$(CONFIG_BATMAN_ADV_BLA) += bridge_loop_avoidance.o batman-adv-y += debugfs.o batman-adv-$(CONFIG_BATMAN_ADV_DAT) += distributed-arp-table.o +batman-adv-y += fragmentation.o batman-adv-y += gateway_client.o batman-adv-y += gateway_common.o batman-adv-y += hard-interface.o @@ -37,5 +38,3 @@ batman-adv-y += send.o batman-adv-y += soft-interface.o batman-adv-y += sysfs.o batman-adv-y += translation-table.o -batman-adv-y += unicast.o -batman-adv-y += vis.o diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c index 0a8a80cd4bf1..a2b480a90872 100644 --- a/net/batman-adv/bat_iv_ogm.c +++ b/net/batman-adv/bat_iv_ogm.c @@ -87,22 +87,198 @@ static uint8_t batadv_ring_buffer_avg(const uint8_t lq_recv[]) return (uint8_t)(sum / count); } +/** + * batadv_iv_ogm_orig_free - free the private resources allocated for this + * orig_node + * @orig_node: the orig_node for which the resources have to be free'd + */ +static void batadv_iv_ogm_orig_free(struct batadv_orig_node *orig_node) +{ + kfree(orig_node->bat_iv.bcast_own); + kfree(orig_node->bat_iv.bcast_own_sum); +} + +/** + * batadv_iv_ogm_orig_add_if - change the private structures of the orig_node to + * include the new hard-interface + * @orig_node: the orig_node that has to be changed + * @max_if_num: the current amount of interfaces + * + * Returns 0 on success, a negative error code otherwise. + */ +static int batadv_iv_ogm_orig_add_if(struct batadv_orig_node *orig_node, + int max_if_num) +{ + void *data_ptr; + size_t data_size, old_size; + int ret = -ENOMEM; + + spin_lock_bh(&orig_node->bat_iv.ogm_cnt_lock); + + data_size = max_if_num * sizeof(unsigned long) * BATADV_NUM_WORDS; + old_size = (max_if_num - 1) * sizeof(unsigned long) * BATADV_NUM_WORDS; + data_ptr = kmalloc(data_size, GFP_ATOMIC); + if (!data_ptr) + goto unlock; + + memcpy(data_ptr, orig_node->bat_iv.bcast_own, old_size); + kfree(orig_node->bat_iv.bcast_own); + orig_node->bat_iv.bcast_own = data_ptr; + + data_ptr = kmalloc(max_if_num * sizeof(uint8_t), GFP_ATOMIC); + if (!data_ptr) { + kfree(orig_node->bat_iv.bcast_own); + goto unlock; + } + + memcpy(data_ptr, orig_node->bat_iv.bcast_own_sum, + (max_if_num - 1) * sizeof(uint8_t)); + kfree(orig_node->bat_iv.bcast_own_sum); + orig_node->bat_iv.bcast_own_sum = data_ptr; + + ret = 0; + +unlock: + spin_unlock_bh(&orig_node->bat_iv.ogm_cnt_lock); + + return ret; +} + +/** + * batadv_iv_ogm_orig_del_if - change the private structures of the orig_node to + * exclude the removed interface + * @orig_node: the orig_node that has to be changed + * @max_if_num: the current amount of interfaces + * @del_if_num: the index of the interface being removed + * + * Returns 0 on success, a negative error code otherwise. + */ +static int batadv_iv_ogm_orig_del_if(struct batadv_orig_node *orig_node, + int max_if_num, int del_if_num) +{ + int chunk_size, ret = -ENOMEM, if_offset; + void *data_ptr = NULL; + + spin_lock_bh(&orig_node->bat_iv.ogm_cnt_lock); + + /* last interface was removed */ + if (max_if_num == 0) + goto free_bcast_own; + + chunk_size = sizeof(unsigned long) * BATADV_NUM_WORDS; + data_ptr = kmalloc(max_if_num * chunk_size, GFP_ATOMIC); + if (!data_ptr) + goto unlock; + + /* copy first part */ + memcpy(data_ptr, orig_node->bat_iv.bcast_own, del_if_num * chunk_size); + + /* copy second part */ + memcpy((char *)data_ptr + del_if_num * chunk_size, + orig_node->bat_iv.bcast_own + ((del_if_num + 1) * chunk_size), + (max_if_num - del_if_num) * chunk_size); + +free_bcast_own: + kfree(orig_node->bat_iv.bcast_own); + orig_node->bat_iv.bcast_own = data_ptr; + + if (max_if_num == 0) + goto free_own_sum; + + data_ptr = kmalloc(max_if_num * sizeof(uint8_t), GFP_ATOMIC); + if (!data_ptr) { + kfree(orig_node->bat_iv.bcast_own); + goto unlock; + } + + memcpy(data_ptr, orig_node->bat_iv.bcast_own_sum, + del_if_num * sizeof(uint8_t)); + + if_offset = (del_if_num + 1) * sizeof(uint8_t); + memcpy((char *)data_ptr + del_if_num * sizeof(uint8_t), + orig_node->bat_iv.bcast_own_sum + if_offset, + (max_if_num - del_if_num) * sizeof(uint8_t)); + +free_own_sum: + kfree(orig_node->bat_iv.bcast_own_sum); + orig_node->bat_iv.bcast_own_sum = data_ptr; + + ret = 0; +unlock: + spin_unlock_bh(&orig_node->bat_iv.ogm_cnt_lock); + + return ret; +} + +/** + * batadv_iv_ogm_orig_get - retrieve or create (if does not exist) an originator + * @bat_priv: the bat priv with all the soft interface information + * @addr: mac address of the originator + * + * Returns the originator object corresponding to the passed mac address or NULL + * on failure. + * If the object does not exists it is created an initialised. + */ +static struct batadv_orig_node * +batadv_iv_ogm_orig_get(struct batadv_priv *bat_priv, const uint8_t *addr) +{ + struct batadv_orig_node *orig_node; + int size, hash_added; + + orig_node = batadv_orig_hash_find(bat_priv, addr); + if (orig_node) + return orig_node; + + orig_node = batadv_orig_node_new(bat_priv, addr); + if (!orig_node) + return NULL; + + spin_lock_init(&orig_node->bat_iv.ogm_cnt_lock); + + size = bat_priv->num_ifaces * sizeof(unsigned long) * BATADV_NUM_WORDS; + orig_node->bat_iv.bcast_own = kzalloc(size, GFP_ATOMIC); + if (!orig_node->bat_iv.bcast_own) + goto free_orig_node; + + size = bat_priv->num_ifaces * sizeof(uint8_t); + orig_node->bat_iv.bcast_own_sum = kzalloc(size, GFP_ATOMIC); + if (!orig_node->bat_iv.bcast_own_sum) + goto free_bcast_own; + + hash_added = batadv_hash_add(bat_priv->orig_hash, batadv_compare_orig, + batadv_choose_orig, orig_node, + &orig_node->hash_entry); + if (hash_added != 0) + goto free_bcast_own; + + return orig_node; + +free_bcast_own: + kfree(orig_node->bat_iv.bcast_own); +free_orig_node: + batadv_orig_node_free_ref(orig_node); + + return NULL; +} + static struct batadv_neigh_node * batadv_iv_ogm_neigh_new(struct batadv_hard_iface *hard_iface, const uint8_t *neigh_addr, struct batadv_orig_node *orig_node, struct batadv_orig_node *orig_neigh) { + struct batadv_priv *bat_priv = netdev_priv(hard_iface->soft_iface); struct batadv_neigh_node *neigh_node; - neigh_node = batadv_neigh_node_new(hard_iface, neigh_addr); + neigh_node = batadv_neigh_node_new(hard_iface, neigh_addr, orig_node); if (!neigh_node) goto out; - INIT_LIST_HEAD(&neigh_node->bonding_list); + spin_lock_init(&neigh_node->bat_iv.lq_update_lock); - neigh_node->orig_node = orig_neigh; - neigh_node->if_incoming = hard_iface; + batadv_dbg(BATADV_DBG_BATMAN, bat_priv, + "Creating new neighbor %pM for orig_node %pM on interface %s\n", + neigh_addr, orig_node->orig, hard_iface->net_dev->name); spin_lock_bh(&orig_node->neigh_list_lock); hlist_add_head_rcu(&neigh_node->list, &orig_node->neigh_list); @@ -135,9 +311,8 @@ static int batadv_iv_ogm_iface_enable(struct batadv_hard_iface *hard_iface) batadv_ogm_packet->header.version = BATADV_COMPAT_VERSION; batadv_ogm_packet->header.ttl = 2; batadv_ogm_packet->flags = BATADV_NO_FLAGS; + batadv_ogm_packet->reserved = 0; batadv_ogm_packet->tq = BATADV_TQ_MAX_VALUE; - batadv_ogm_packet->tt_num_changes = 0; - batadv_ogm_packet->ttvn = 0; res = 0; @@ -207,12 +382,12 @@ static uint8_t batadv_hop_penalty(uint8_t tq, /* is there another aggregated packet here? */ static int batadv_iv_ogm_aggr_packet(int buff_pos, int packet_len, - int tt_num_changes) + __be16 tvlv_len) { int next_buff_pos = 0; next_buff_pos += buff_pos + BATADV_OGM_HLEN; - next_buff_pos += batadv_tt_len(tt_num_changes); + next_buff_pos += ntohs(tvlv_len); return (next_buff_pos <= packet_len) && (next_buff_pos <= BATADV_MAX_AGGREGATION_BYTES); @@ -240,7 +415,7 @@ static void batadv_iv_ogm_send_to_if(struct batadv_forw_packet *forw_packet, /* adjust all flags and log packets */ while (batadv_iv_ogm_aggr_packet(buff_pos, forw_packet->packet_len, - batadv_ogm_packet->tt_num_changes)) { + batadv_ogm_packet->tvlv_len)) { /* we might have aggregated direct link packets with an * ordinary base packet */ @@ -256,18 +431,18 @@ static void batadv_iv_ogm_send_to_if(struct batadv_forw_packet *forw_packet, fwd_str = "Sending own"; batadv_dbg(BATADV_DBG_BATMAN, bat_priv, - "%s %spacket (originator %pM, seqno %u, TQ %d, TTL %d, IDF %s, ttvn %d) on interface %s [%pM]\n", + "%s %spacket (originator %pM, seqno %u, TQ %d, TTL %d, IDF %s) on interface %s [%pM]\n", fwd_str, (packet_num > 0 ? "aggregated " : ""), batadv_ogm_packet->orig, ntohl(batadv_ogm_packet->seqno), batadv_ogm_packet->tq, batadv_ogm_packet->header.ttl, (batadv_ogm_packet->flags & BATADV_DIRECTLINK ? "on" : "off"), - batadv_ogm_packet->ttvn, hard_iface->net_dev->name, + hard_iface->net_dev->name, hard_iface->net_dev->dev_addr); buff_pos += BATADV_OGM_HLEN; - buff_pos += batadv_tt_len(batadv_ogm_packet->tt_num_changes); + buff_pos += ntohs(batadv_ogm_packet->tvlv_len); packet_num++; packet_pos = forw_packet->skb->data + buff_pos; batadv_ogm_packet = (struct batadv_ogm_packet *)packet_pos; @@ -601,7 +776,7 @@ static void batadv_iv_ogm_forward(struct batadv_orig_node *orig_node, struct batadv_hard_iface *if_incoming) { struct batadv_priv *bat_priv = netdev_priv(if_incoming->soft_iface); - uint8_t tt_num_changes; + uint16_t tvlv_len; if (batadv_ogm_packet->header.ttl <= 1) { batadv_dbg(BATADV_DBG_BATMAN, bat_priv, "ttl exceeded\n"); @@ -621,7 +796,7 @@ static void batadv_iv_ogm_forward(struct batadv_orig_node *orig_node, return; } - tt_num_changes = batadv_ogm_packet->tt_num_changes; + tvlv_len = ntohs(batadv_ogm_packet->tvlv_len); batadv_ogm_packet->header.ttl--; memcpy(batadv_ogm_packet->prev_sender, ethhdr->h_source, ETH_ALEN); @@ -642,7 +817,7 @@ static void batadv_iv_ogm_forward(struct batadv_orig_node *orig_node, batadv_ogm_packet->flags &= ~BATADV_DIRECTLINK; batadv_iv_ogm_queue_add(bat_priv, (unsigned char *)batadv_ogm_packet, - BATADV_OGM_HLEN + batadv_tt_len(tt_num_changes), + BATADV_OGM_HLEN + tvlv_len, if_incoming, 0, batadv_iv_ogm_fwd_send_time()); } @@ -662,20 +837,22 @@ batadv_iv_ogm_slide_own_bcast_window(struct batadv_hard_iface *hard_iface) uint32_t i; size_t word_index; uint8_t *w; + int if_num; for (i = 0; i < hash->size; i++) { head = &hash->table[i]; rcu_read_lock(); hlist_for_each_entry_rcu(orig_node, head, hash_entry) { - spin_lock_bh(&orig_node->ogm_cnt_lock); + spin_lock_bh(&orig_node->bat_iv.ogm_cnt_lock); word_index = hard_iface->if_num * BATADV_NUM_WORDS; - word = &(orig_node->bcast_own[word_index]); + word = &(orig_node->bat_iv.bcast_own[word_index]); batadv_bit_get_packet(bat_priv, word, 1, 0); - w = &orig_node->bcast_own_sum[hard_iface->if_num]; + if_num = hard_iface->if_num; + w = &orig_node->bat_iv.bcast_own_sum[if_num]; *w = bitmap_weight(word, BATADV_TQ_LOCAL_WINDOW_SIZE); - spin_unlock_bh(&orig_node->ogm_cnt_lock); + spin_unlock_bh(&orig_node->bat_iv.ogm_cnt_lock); } rcu_read_unlock(); } @@ -688,43 +865,29 @@ static void batadv_iv_ogm_schedule(struct batadv_hard_iface *hard_iface) struct batadv_ogm_packet *batadv_ogm_packet; struct batadv_hard_iface *primary_if; int *ogm_buff_len = &hard_iface->bat_iv.ogm_buff_len; - int vis_server, tt_num_changes = 0; uint32_t seqno; - uint8_t bandwidth; + uint16_t tvlv_len = 0; - vis_server = atomic_read(&bat_priv->vis_mode); primary_if = batadv_primary_if_get_selected(bat_priv); - if (hard_iface == primary_if) - tt_num_changes = batadv_tt_append_diff(bat_priv, ogm_buff, - ogm_buff_len, - BATADV_OGM_HLEN); + if (hard_iface == primary_if) { + /* tt changes have to be committed before the tvlv data is + * appended as it may alter the tt tvlv container + */ + batadv_tt_local_commit_changes(bat_priv); + tvlv_len = batadv_tvlv_container_ogm_append(bat_priv, ogm_buff, + ogm_buff_len, + BATADV_OGM_HLEN); + } batadv_ogm_packet = (struct batadv_ogm_packet *)(*ogm_buff); + batadv_ogm_packet->tvlv_len = htons(tvlv_len); /* change sequence number to network order */ seqno = (uint32_t)atomic_read(&hard_iface->bat_iv.ogm_seqno); batadv_ogm_packet->seqno = htonl(seqno); atomic_inc(&hard_iface->bat_iv.ogm_seqno); - batadv_ogm_packet->ttvn = atomic_read(&bat_priv->tt.vn); - batadv_ogm_packet->tt_crc = htons(bat_priv->tt.local_crc); - if (tt_num_changes >= 0) - batadv_ogm_packet->tt_num_changes = tt_num_changes; - - if (vis_server == BATADV_VIS_TYPE_SERVER_SYNC) - batadv_ogm_packet->flags |= BATADV_VIS_SERVER; - else - batadv_ogm_packet->flags &= ~BATADV_VIS_SERVER; - - if (hard_iface == primary_if && - atomic_read(&bat_priv->gw_mode) == BATADV_GW_MODE_SERVER) { - bandwidth = (uint8_t)atomic_read(&bat_priv->gw_bandwidth); - batadv_ogm_packet->gw_flags = bandwidth; - } else { - batadv_ogm_packet->gw_flags = BATADV_NO_FLAGS; - } - batadv_iv_ogm_slide_own_bcast_window(hard_iface); batadv_iv_ogm_queue_add(bat_priv, hard_iface->bat_iv.ogm_buff, hard_iface->bat_iv.ogm_buff_len, hard_iface, 1, @@ -770,18 +933,18 @@ batadv_iv_ogm_orig_update(struct batadv_priv *bat_priv, if (dup_status != BATADV_NO_DUP) continue; - spin_lock_bh(&tmp_neigh_node->lq_update_lock); - batadv_ring_buffer_set(tmp_neigh_node->tq_recv, - &tmp_neigh_node->tq_index, 0); - tq_avg = batadv_ring_buffer_avg(tmp_neigh_node->tq_recv); - tmp_neigh_node->tq_avg = tq_avg; - spin_unlock_bh(&tmp_neigh_node->lq_update_lock); + spin_lock_bh(&tmp_neigh_node->bat_iv.lq_update_lock); + batadv_ring_buffer_set(tmp_neigh_node->bat_iv.tq_recv, + &tmp_neigh_node->bat_iv.tq_index, 0); + tq_avg = batadv_ring_buffer_avg(tmp_neigh_node->bat_iv.tq_recv); + tmp_neigh_node->bat_iv.tq_avg = tq_avg; + spin_unlock_bh(&tmp_neigh_node->bat_iv.lq_update_lock); } if (!neigh_node) { struct batadv_orig_node *orig_tmp; - orig_tmp = batadv_get_orig_node(bat_priv, ethhdr->h_source); + orig_tmp = batadv_iv_ogm_orig_get(bat_priv, ethhdr->h_source); if (!orig_tmp) goto unlock; @@ -798,80 +961,55 @@ batadv_iv_ogm_orig_update(struct batadv_priv *bat_priv, rcu_read_unlock(); - orig_node->flags = batadv_ogm_packet->flags; neigh_node->last_seen = jiffies; - spin_lock_bh(&neigh_node->lq_update_lock); - batadv_ring_buffer_set(neigh_node->tq_recv, - &neigh_node->tq_index, + spin_lock_bh(&neigh_node->bat_iv.lq_update_lock); + batadv_ring_buffer_set(neigh_node->bat_iv.tq_recv, + &neigh_node->bat_iv.tq_index, batadv_ogm_packet->tq); - neigh_node->tq_avg = batadv_ring_buffer_avg(neigh_node->tq_recv); - spin_unlock_bh(&neigh_node->lq_update_lock); + tq_avg = batadv_ring_buffer_avg(neigh_node->bat_iv.tq_recv); + neigh_node->bat_iv.tq_avg = tq_avg; + spin_unlock_bh(&neigh_node->bat_iv.lq_update_lock); if (dup_status == BATADV_NO_DUP) { orig_node->last_ttl = batadv_ogm_packet->header.ttl; neigh_node->last_ttl = batadv_ogm_packet->header.ttl; } - batadv_bonding_candidate_add(orig_node, neigh_node); + batadv_bonding_candidate_add(bat_priv, orig_node, neigh_node); /* if this neighbor already is our next hop there is nothing * to change */ router = batadv_orig_node_get_router(orig_node); if (router == neigh_node) - goto update_tt; + goto out; /* if this neighbor does not offer a better TQ we won't consider it */ - if (router && (router->tq_avg > neigh_node->tq_avg)) - goto update_tt; + if (router && (router->bat_iv.tq_avg > neigh_node->bat_iv.tq_avg)) + goto out; /* if the TQ is the same and the link not more symmetric we * won't consider it either */ - if (router && (neigh_node->tq_avg == router->tq_avg)) { + if (router && (neigh_node->bat_iv.tq_avg == router->bat_iv.tq_avg)) { orig_node_tmp = router->orig_node; - spin_lock_bh(&orig_node_tmp->ogm_cnt_lock); + spin_lock_bh(&orig_node_tmp->bat_iv.ogm_cnt_lock); if_num = router->if_incoming->if_num; - sum_orig = orig_node_tmp->bcast_own_sum[if_num]; - spin_unlock_bh(&orig_node_tmp->ogm_cnt_lock); + sum_orig = orig_node_tmp->bat_iv.bcast_own_sum[if_num]; + spin_unlock_bh(&orig_node_tmp->bat_iv.ogm_cnt_lock); orig_node_tmp = neigh_node->orig_node; - spin_lock_bh(&orig_node_tmp->ogm_cnt_lock); + spin_lock_bh(&orig_node_tmp->bat_iv.ogm_cnt_lock); if_num = neigh_node->if_incoming->if_num; - sum_neigh = orig_node_tmp->bcast_own_sum[if_num]; - spin_unlock_bh(&orig_node_tmp->ogm_cnt_lock); + sum_neigh = orig_node_tmp->bat_iv.bcast_own_sum[if_num]; + spin_unlock_bh(&orig_node_tmp->bat_iv.ogm_cnt_lock); if (sum_orig >= sum_neigh) - goto update_tt; + goto out; } batadv_update_route(bat_priv, orig_node, neigh_node); - -update_tt: - /* I have to check for transtable changes only if the OGM has been - * sent through a primary interface - */ - if (((batadv_ogm_packet->orig != ethhdr->h_source) && - (batadv_ogm_packet->header.ttl > 2)) || - (batadv_ogm_packet->flags & BATADV_PRIMARIES_FIRST_HOP)) - batadv_tt_update_orig(bat_priv, orig_node, tt_buff, - batadv_ogm_packet->tt_num_changes, - batadv_ogm_packet->ttvn, - ntohs(batadv_ogm_packet->tt_crc)); - - if (orig_node->gw_flags != batadv_ogm_packet->gw_flags) - batadv_gw_node_update(bat_priv, orig_node, - batadv_ogm_packet->gw_flags); - - orig_node->gw_flags = batadv_ogm_packet->gw_flags; - - /* restart gateway selection if fast or late switching was enabled */ - if ((orig_node->gw_flags) && - (atomic_read(&bat_priv->gw_mode) == BATADV_GW_MODE_CLIENT) && - (atomic_read(&bat_priv->gw_sel_class) > 2)) - batadv_gw_check_election(bat_priv, orig_node); - goto out; unlock: @@ -893,7 +1031,7 @@ static int batadv_iv_ogm_calc_tq(struct batadv_orig_node *orig_node, uint8_t total_count; uint8_t orig_eq_count, neigh_rq_count, neigh_rq_inv, tq_own; unsigned int neigh_rq_inv_cube, neigh_rq_max_cube; - int tq_asym_penalty, inv_asym_penalty, ret = 0; + int tq_asym_penalty, inv_asym_penalty, if_num, ret = 0; unsigned int combined_tq; /* find corresponding one hop neighbor */ @@ -931,10 +1069,11 @@ static int batadv_iv_ogm_calc_tq(struct batadv_orig_node *orig_node, orig_node->last_seen = jiffies; /* find packet count of corresponding one hop neighbor */ - spin_lock_bh(&orig_node->ogm_cnt_lock); - orig_eq_count = orig_neigh_node->bcast_own_sum[if_incoming->if_num]; - neigh_rq_count = neigh_node->real_packet_count; - spin_unlock_bh(&orig_node->ogm_cnt_lock); + spin_lock_bh(&orig_node->bat_iv.ogm_cnt_lock); + if_num = if_incoming->if_num; + orig_eq_count = orig_neigh_node->bat_iv.bcast_own_sum[if_num]; + neigh_rq_count = neigh_node->bat_iv.real_packet_count; + spin_unlock_bh(&orig_node->bat_iv.ogm_cnt_lock); /* pay attention to not get a value bigger than 100 % */ if (orig_eq_count > neigh_rq_count) @@ -1016,12 +1155,13 @@ batadv_iv_ogm_update_seqnos(const struct ethhdr *ethhdr, uint32_t seqno = ntohl(batadv_ogm_packet->seqno); uint8_t *neigh_addr; uint8_t packet_count; + unsigned long *bitmap; - orig_node = batadv_get_orig_node(bat_priv, batadv_ogm_packet->orig); + orig_node = batadv_iv_ogm_orig_get(bat_priv, batadv_ogm_packet->orig); if (!orig_node) return BATADV_NO_DUP; - spin_lock_bh(&orig_node->ogm_cnt_lock); + spin_lock_bh(&orig_node->bat_iv.ogm_cnt_lock); seq_diff = seqno - orig_node->last_real_seqno; /* signalize caller that the packet is to be dropped. */ @@ -1036,7 +1176,7 @@ batadv_iv_ogm_update_seqnos(const struct ethhdr *ethhdr, hlist_for_each_entry_rcu(tmp_neigh_node, &orig_node->neigh_list, list) { neigh_addr = tmp_neigh_node->addr; - is_dup = batadv_test_bit(tmp_neigh_node->real_bits, + is_dup = batadv_test_bit(tmp_neigh_node->bat_iv.real_bits, orig_node->last_real_seqno, seqno); @@ -1052,13 +1192,13 @@ batadv_iv_ogm_update_seqnos(const struct ethhdr *ethhdr, } /* if the window moved, set the update flag. */ - need_update |= batadv_bit_get_packet(bat_priv, - tmp_neigh_node->real_bits, + bitmap = tmp_neigh_node->bat_iv.real_bits; + need_update |= batadv_bit_get_packet(bat_priv, bitmap, seq_diff, set_mark); - packet_count = bitmap_weight(tmp_neigh_node->real_bits, + packet_count = bitmap_weight(tmp_neigh_node->bat_iv.real_bits, BATADV_TQ_LOCAL_WINDOW_SIZE); - tmp_neigh_node->real_packet_count = packet_count; + tmp_neigh_node->bat_iv.real_packet_count = packet_count; } rcu_read_unlock(); @@ -1070,7 +1210,7 @@ batadv_iv_ogm_update_seqnos(const struct ethhdr *ethhdr, } out: - spin_unlock_bh(&orig_node->ogm_cnt_lock); + spin_unlock_bh(&orig_node->bat_iv.ogm_cnt_lock); batadv_orig_node_free_ref(orig_node); return ret; } @@ -1082,7 +1222,7 @@ static void batadv_iv_ogm_process(const struct ethhdr *ethhdr, { struct batadv_priv *bat_priv = netdev_priv(if_incoming->soft_iface); struct batadv_hard_iface *hard_iface; - struct batadv_orig_node *orig_neigh_node, *orig_node; + struct batadv_orig_node *orig_neigh_node, *orig_node, *orig_node_tmp; struct batadv_neigh_node *router = NULL, *router_router = NULL; struct batadv_neigh_node *orig_neigh_router = NULL; int has_directlink_flag; @@ -1122,13 +1262,11 @@ static void batadv_iv_ogm_process(const struct ethhdr *ethhdr, is_single_hop_neigh = true; batadv_dbg(BATADV_DBG_BATMAN, bat_priv, - "Received BATMAN packet via NB: %pM, IF: %s [%pM] (from OG: %pM, via prev OG: %pM, seqno %u, ttvn %u, crc %#.4x, changes %u, tq %d, TTL %d, V %d, IDF %d)\n", + "Received BATMAN packet via NB: %pM, IF: %s [%pM] (from OG: %pM, via prev OG: %pM, seqno %u, tq %d, TTL %d, V %d, IDF %d)\n", ethhdr->h_source, if_incoming->net_dev->name, if_incoming->net_dev->dev_addr, batadv_ogm_packet->orig, batadv_ogm_packet->prev_sender, - ntohl(batadv_ogm_packet->seqno), batadv_ogm_packet->ttvn, - ntohs(batadv_ogm_packet->tt_crc), - batadv_ogm_packet->tt_num_changes, batadv_ogm_packet->tq, + ntohl(batadv_ogm_packet->seqno), batadv_ogm_packet->tq, batadv_ogm_packet->header.ttl, batadv_ogm_packet->header.version, has_directlink_flag); @@ -1168,8 +1306,8 @@ static void batadv_iv_ogm_process(const struct ethhdr *ethhdr, int16_t if_num; uint8_t *weight; - orig_neigh_node = batadv_get_orig_node(bat_priv, - ethhdr->h_source); + orig_neigh_node = batadv_iv_ogm_orig_get(bat_priv, + ethhdr->h_source); if (!orig_neigh_node) return; @@ -1183,15 +1321,15 @@ static void batadv_iv_ogm_process(const struct ethhdr *ethhdr, if_num = if_incoming->if_num; offset = if_num * BATADV_NUM_WORDS; - spin_lock_bh(&orig_neigh_node->ogm_cnt_lock); - word = &(orig_neigh_node->bcast_own[offset]); + spin_lock_bh(&orig_neigh_node->bat_iv.ogm_cnt_lock); + word = &(orig_neigh_node->bat_iv.bcast_own[offset]); bit_pos = if_incoming_seqno - 2; bit_pos -= ntohl(batadv_ogm_packet->seqno); batadv_set_bit(word, bit_pos); - weight = &orig_neigh_node->bcast_own_sum[if_num]; + weight = &orig_neigh_node->bat_iv.bcast_own_sum[if_num]; *weight = bitmap_weight(word, BATADV_TQ_LOCAL_WINDOW_SIZE); - spin_unlock_bh(&orig_neigh_node->ogm_cnt_lock); + spin_unlock_bh(&orig_neigh_node->bat_iv.ogm_cnt_lock); } batadv_dbg(BATADV_DBG_BATMAN, bat_priv, @@ -1214,7 +1352,7 @@ static void batadv_iv_ogm_process(const struct ethhdr *ethhdr, return; } - orig_node = batadv_get_orig_node(bat_priv, batadv_ogm_packet->orig); + orig_node = batadv_iv_ogm_orig_get(bat_priv, batadv_ogm_packet->orig); if (!orig_node) return; @@ -1235,10 +1373,12 @@ static void batadv_iv_ogm_process(const struct ethhdr *ethhdr, } router = batadv_orig_node_get_router(orig_node); - if (router) - router_router = batadv_orig_node_get_router(router->orig_node); + if (router) { + orig_node_tmp = router->orig_node; + router_router = batadv_orig_node_get_router(orig_node_tmp); + } - if ((router && router->tq_avg != 0) && + if ((router && router->bat_iv.tq_avg != 0) && (batadv_compare_eth(router->addr, ethhdr->h_source))) is_from_best_next_hop = true; @@ -1254,14 +1394,16 @@ static void batadv_iv_ogm_process(const struct ethhdr *ethhdr, goto out; } + batadv_tvlv_ogm_receive(bat_priv, batadv_ogm_packet, orig_node); + /* if sender is a direct neighbor the sender mac equals * originator mac */ if (is_single_hop_neigh) orig_neigh_node = orig_node; else - orig_neigh_node = batadv_get_orig_node(bat_priv, - ethhdr->h_source); + orig_neigh_node = batadv_iv_ogm_orig_get(bat_priv, + ethhdr->h_source); if (!orig_neigh_node) goto out; @@ -1350,9 +1492,9 @@ static int batadv_iv_ogm_receive(struct sk_buff *skb, struct batadv_ogm_packet *batadv_ogm_packet; struct ethhdr *ethhdr; int buff_pos = 0, packet_len; - unsigned char *tt_buff, *packet_buff; - bool ret; + unsigned char *tvlv_buff, *packet_buff; uint8_t *packet_pos; + bool ret; ret = batadv_check_management_packet(skb, if_incoming, BATADV_OGM_HLEN); if (!ret) @@ -1375,14 +1517,14 @@ static int batadv_iv_ogm_receive(struct sk_buff *skb, /* unpack the aggregated packets and process them one by one */ while (batadv_iv_ogm_aggr_packet(buff_pos, packet_len, - batadv_ogm_packet->tt_num_changes)) { - tt_buff = packet_buff + buff_pos + BATADV_OGM_HLEN; + batadv_ogm_packet->tvlv_len)) { + tvlv_buff = packet_buff + buff_pos + BATADV_OGM_HLEN; - batadv_iv_ogm_process(ethhdr, batadv_ogm_packet, tt_buff, - if_incoming); + batadv_iv_ogm_process(ethhdr, batadv_ogm_packet, + tvlv_buff, if_incoming); buff_pos += BATADV_OGM_HLEN; - buff_pos += batadv_tt_len(batadv_ogm_packet->tt_num_changes); + buff_pos += ntohs(batadv_ogm_packet->tvlv_len); packet_pos = packet_buff + buff_pos; batadv_ogm_packet = (struct batadv_ogm_packet *)packet_pos; @@ -1392,6 +1534,106 @@ static int batadv_iv_ogm_receive(struct sk_buff *skb, return NET_RX_SUCCESS; } +/** + * batadv_iv_ogm_orig_print - print the originator table + * @bat_priv: the bat priv with all the soft interface information + * @seq: debugfs table seq_file struct + */ +static void batadv_iv_ogm_orig_print(struct batadv_priv *bat_priv, + struct seq_file *seq) +{ + struct batadv_neigh_node *neigh_node, *neigh_node_tmp; + struct batadv_hashtable *hash = bat_priv->orig_hash; + int last_seen_msecs, last_seen_secs; + struct batadv_orig_node *orig_node; + unsigned long last_seen_jiffies; + struct hlist_head *head; + int batman_count = 0; + uint32_t i; + + seq_printf(seq, " %-15s %s (%s/%i) %17s [%10s]: %20s ...\n", + "Originator", "last-seen", "#", BATADV_TQ_MAX_VALUE, + "Nexthop", "outgoingIF", "Potential nexthops"); + + for (i = 0; i < hash->size; i++) { + head = &hash->table[i]; + + rcu_read_lock(); + hlist_for_each_entry_rcu(orig_node, head, hash_entry) { + neigh_node = batadv_orig_node_get_router(orig_node); + if (!neigh_node) + continue; + + if (neigh_node->bat_iv.tq_avg == 0) + goto next; + + last_seen_jiffies = jiffies - orig_node->last_seen; + last_seen_msecs = jiffies_to_msecs(last_seen_jiffies); + last_seen_secs = last_seen_msecs / 1000; + last_seen_msecs = last_seen_msecs % 1000; + + seq_printf(seq, "%pM %4i.%03is (%3i) %pM [%10s]:", + orig_node->orig, last_seen_secs, + last_seen_msecs, neigh_node->bat_iv.tq_avg, + neigh_node->addr, + neigh_node->if_incoming->net_dev->name); + + hlist_for_each_entry_rcu(neigh_node_tmp, + &orig_node->neigh_list, list) { + seq_printf(seq, " %pM (%3i)", + neigh_node_tmp->addr, + neigh_node_tmp->bat_iv.tq_avg); + } + + seq_puts(seq, "\n"); + batman_count++; + +next: + batadv_neigh_node_free_ref(neigh_node); + } + rcu_read_unlock(); + } + + if (batman_count == 0) + seq_puts(seq, "No batman nodes in range ...\n"); +} + +/** + * batadv_iv_ogm_neigh_cmp - compare the metrics of two neighbors + * @neigh1: the first neighbor object of the comparison + * @neigh2: the second neighbor object of the comparison + * + * Returns a value less, equal to or greater than 0 if the metric via neigh1 is + * lower, the same as or higher than the metric via neigh2 + */ +static int batadv_iv_ogm_neigh_cmp(struct batadv_neigh_node *neigh1, + struct batadv_neigh_node *neigh2) +{ + uint8_t tq1, tq2; + + tq1 = neigh1->bat_iv.tq_avg; + tq2 = neigh2->bat_iv.tq_avg; + + return tq1 - tq2; +} + +/** + * batadv_iv_ogm_neigh_is_eob - check if neigh1 is equally good or better than + * neigh2 from the metric prospective + * @neigh1: the first neighbor object of the comparison + * @neigh2: the second neighbor object of the comparison + * + * Returns true if the metric via neigh1 is equally good or better than the + * metric via neigh2, false otherwise. + */ +static bool batadv_iv_ogm_neigh_is_eob(struct batadv_neigh_node *neigh1, + struct batadv_neigh_node *neigh2) +{ + int diff = batadv_iv_ogm_neigh_cmp(neigh1, neigh2); + + return diff > -BATADV_TQ_SIMILARITY_THRESHOLD; +} + static struct batadv_algo_ops batadv_batman_iv __read_mostly = { .name = "BATMAN_IV", .bat_iface_enable = batadv_iv_ogm_iface_enable, @@ -1400,6 +1642,12 @@ static struct batadv_algo_ops batadv_batman_iv __read_mostly = { .bat_primary_iface_set = batadv_iv_ogm_primary_iface_set, .bat_ogm_schedule = batadv_iv_ogm_schedule, .bat_ogm_emit = batadv_iv_ogm_emit, + .bat_neigh_cmp = batadv_iv_ogm_neigh_cmp, + .bat_neigh_is_equiv_or_better = batadv_iv_ogm_neigh_is_eob, + .bat_orig_print = batadv_iv_ogm_orig_print, + .bat_orig_free = batadv_iv_ogm_orig_free, + .bat_orig_add_if = batadv_iv_ogm_orig_add_if, + .bat_orig_del_if = batadv_iv_ogm_orig_del_if, }; int __init batadv_iv_init(void) diff --git a/net/batman-adv/bridge_loop_avoidance.c b/net/batman-adv/bridge_loop_avoidance.c index 264de88db320..28eb5e6d0a02 100644 --- a/net/batman-adv/bridge_loop_avoidance.c +++ b/net/batman-adv/bridge_loop_avoidance.c @@ -411,10 +411,10 @@ batadv_bla_get_backbone_gw(struct batadv_priv *bat_priv, uint8_t *orig, return NULL; } - /* this is a gateway now, remove any tt entries */ + /* this is a gateway now, remove any TT entry on this VLAN */ orig_node = batadv_orig_hash_find(bat_priv, orig); if (orig_node) { - batadv_tt_global_del_orig(bat_priv, orig_node, + batadv_tt_global_del_orig(bat_priv, orig_node, vid, "became a backbone gateway"); batadv_orig_node_free_ref(orig_node); } @@ -858,30 +858,28 @@ static int batadv_bla_process_claim(struct batadv_priv *bat_priv, struct batadv_hard_iface *primary_if, struct sk_buff *skb) { - struct ethhdr *ethhdr; + struct batadv_bla_claim_dst *bla_dst; + uint8_t *hw_src, *hw_dst; struct vlan_ethhdr *vhdr; + struct ethhdr *ethhdr; struct arphdr *arphdr; - uint8_t *hw_src, *hw_dst; - struct batadv_bla_claim_dst *bla_dst; - uint16_t proto; + unsigned short vid; + __be16 proto; int headlen; - unsigned short vid = BATADV_NO_FLAGS; int ret; + vid = batadv_get_vid(skb, 0); ethhdr = eth_hdr(skb); - if (ntohs(ethhdr->h_proto) == ETH_P_8021Q) { + proto = ethhdr->h_proto; + headlen = ETH_HLEN; + if (vid & BATADV_VLAN_HAS_TAG) { vhdr = (struct vlan_ethhdr *)ethhdr; - vid = ntohs(vhdr->h_vlan_TCI) & VLAN_VID_MASK; - vid |= BATADV_VLAN_HAS_TAG; - proto = ntohs(vhdr->h_vlan_encapsulated_proto); - headlen = sizeof(*vhdr); - } else { - proto = ntohs(ethhdr->h_proto); - headlen = ETH_HLEN; + proto = vhdr->h_vlan_encapsulated_proto; + headlen += VLAN_HLEN; } - if (proto != ETH_P_ARP) + if (proto != htons(ETH_P_ARP)) return 0; /* not a claim frame */ /* this must be a ARP frame. check if it is a claim. */ @@ -1317,12 +1315,14 @@ out: /* @bat_priv: the bat priv with all the soft interface information * @orig: originator mac address + * @vid: VLAN identifier * - * check if the originator is a gateway for any VLAN ID. + * Check if the originator is a gateway for the VLAN identified by vid. * - * returns 1 if it is found, 0 otherwise + * Returns true if orig is a backbone for this vid, false otherwise. */ -int batadv_bla_is_backbone_gw_orig(struct batadv_priv *bat_priv, uint8_t *orig) +bool batadv_bla_is_backbone_gw_orig(struct batadv_priv *bat_priv, uint8_t *orig, + unsigned short vid) { struct batadv_hashtable *hash = bat_priv->bla.backbone_hash; struct hlist_head *head; @@ -1330,25 +1330,26 @@ int batadv_bla_is_backbone_gw_orig(struct batadv_priv *bat_priv, uint8_t *orig) int i; if (!atomic_read(&bat_priv->bridge_loop_avoidance)) - return 0; + return false; if (!hash) - return 0; + return false; for (i = 0; i < hash->size; i++) { head = &hash->table[i]; rcu_read_lock(); hlist_for_each_entry_rcu(backbone_gw, head, hash_entry) { - if (batadv_compare_eth(backbone_gw->orig, orig)) { + if (batadv_compare_eth(backbone_gw->orig, orig) && + backbone_gw->vid == vid) { rcu_read_unlock(); - return 1; + return true; } } rcu_read_unlock(); } - return 0; + return false; } @@ -1365,10 +1366,8 @@ int batadv_bla_is_backbone_gw_orig(struct batadv_priv *bat_priv, uint8_t *orig) int batadv_bla_is_backbone_gw(struct sk_buff *skb, struct batadv_orig_node *orig_node, int hdr_size) { - struct ethhdr *ethhdr; - struct vlan_ethhdr *vhdr; struct batadv_bla_backbone_gw *backbone_gw; - unsigned short vid = BATADV_NO_FLAGS; + unsigned short vid; if (!atomic_read(&orig_node->bat_priv->bridge_loop_avoidance)) return 0; @@ -1377,16 +1376,7 @@ int batadv_bla_is_backbone_gw(struct sk_buff *skb, if (!pskb_may_pull(skb, hdr_size + ETH_HLEN)) return 0; - ethhdr = (struct ethhdr *)(((uint8_t *)skb->data) + hdr_size); - - if (ntohs(ethhdr->h_proto) == ETH_P_8021Q) { - if (!pskb_may_pull(skb, hdr_size + sizeof(struct vlan_ethhdr))) - return 0; - - vhdr = (struct vlan_ethhdr *)(skb->data + hdr_size); - vid = ntohs(vhdr->h_vlan_TCI) & VLAN_VID_MASK; - vid |= BATADV_VLAN_HAS_TAG; - } + vid = batadv_get_vid(skb, hdr_size); /* see if this originator is a backbone gw for this VLAN */ backbone_gw = batadv_backbone_hash_find(orig_node->bat_priv, diff --git a/net/batman-adv/bridge_loop_avoidance.h b/net/batman-adv/bridge_loop_avoidance.h index 4b102e71e5bd..da173e760e77 100644 --- a/net/batman-adv/bridge_loop_avoidance.h +++ b/net/batman-adv/bridge_loop_avoidance.h @@ -30,7 +30,8 @@ int batadv_bla_is_backbone_gw(struct sk_buff *skb, int batadv_bla_claim_table_seq_print_text(struct seq_file *seq, void *offset); int batadv_bla_backbone_table_seq_print_text(struct seq_file *seq, void *offset); -int batadv_bla_is_backbone_gw_orig(struct batadv_priv *bat_priv, uint8_t *orig); +bool batadv_bla_is_backbone_gw_orig(struct batadv_priv *bat_priv, uint8_t *orig, + unsigned short vid); int batadv_bla_check_bcast_duplist(struct batadv_priv *bat_priv, struct sk_buff *skb); void batadv_bla_update_orig_address(struct batadv_priv *bat_priv, @@ -74,10 +75,11 @@ static inline int batadv_bla_backbone_table_seq_print_text(struct seq_file *seq, return 0; } -static inline int batadv_bla_is_backbone_gw_orig(struct batadv_priv *bat_priv, - uint8_t *orig) +static inline bool batadv_bla_is_backbone_gw_orig(struct batadv_priv *bat_priv, + uint8_t *orig, + unsigned short vid) { - return 0; + return false; } static inline int diff --git a/net/batman-adv/debugfs.c b/net/batman-adv/debugfs.c index f186a55b23c3..049a7a2ac5b6 100644 --- a/net/batman-adv/debugfs.c +++ b/net/batman-adv/debugfs.c @@ -28,7 +28,6 @@ #include "gateway_common.h" #include "gateway_client.h" #include "soft-interface.h" -#include "vis.h" #include "icmp_socket.h" #include "bridge_loop_avoidance.h" #include "distributed-arp-table.h" @@ -300,12 +299,6 @@ static int batadv_transtable_local_open(struct inode *inode, struct file *file) return single_open(file, batadv_tt_local_seq_print_text, net_dev); } -static int batadv_vis_data_open(struct inode *inode, struct file *file) -{ - struct net_device *net_dev = (struct net_device *)inode->i_private; - return single_open(file, batadv_vis_seq_print_text, net_dev); -} - struct batadv_debuginfo { struct attribute attr; const struct file_operations fops; @@ -356,7 +349,6 @@ static BATADV_DEBUGINFO(dat_cache, S_IRUGO, batadv_dat_cache_open); #endif static BATADV_DEBUGINFO(transtable_local, S_IRUGO, batadv_transtable_local_open); -static BATADV_DEBUGINFO(vis_data, S_IRUGO, batadv_vis_data_open); #ifdef CONFIG_BATMAN_ADV_NC static BATADV_DEBUGINFO(nc_nodes, S_IRUGO, batadv_nc_nodes_open); #endif @@ -373,7 +365,6 @@ static struct batadv_debuginfo *batadv_mesh_debuginfos[] = { &batadv_debuginfo_dat_cache, #endif &batadv_debuginfo_transtable_local, - &batadv_debuginfo_vis_data, #ifdef CONFIG_BATMAN_ADV_NC &batadv_debuginfo_nc_nodes, #endif diff --git a/net/batman-adv/distributed-arp-table.c b/net/batman-adv/distributed-arp-table.c index 06345d401588..6c8c3934bd7b 100644 --- a/net/batman-adv/distributed-arp-table.c +++ b/net/batman-adv/distributed-arp-table.c @@ -19,6 +19,7 @@ #include <linux/if_ether.h> #include <linux/if_arp.h> +#include <linux/if_vlan.h> #include <net/arp.h> #include "main.h" @@ -29,7 +30,6 @@ #include "send.h" #include "types.h" #include "translation-table.h" -#include "unicast.h" static void batadv_dat_purge(struct work_struct *work); @@ -206,15 +206,11 @@ static __be32 batadv_arp_ip_dst(struct sk_buff *skb, int hdr_size) */ static uint32_t batadv_hash_dat(const void *data, uint32_t size) { - const unsigned char *key = data; uint32_t hash = 0; - size_t i; + const struct batadv_dat_entry *dat = data; - for (i = 0; i < 4; i++) { - hash += key[i]; - hash += (hash << 10); - hash ^= (hash >> 6); - } + hash = batadv_hash_bytes(hash, &dat->ip, sizeof(dat->ip)); + hash = batadv_hash_bytes(hash, &dat->vid, sizeof(dat->vid)); hash += (hash << 3); hash ^= (hash >> 11); @@ -228,21 +224,26 @@ static uint32_t batadv_hash_dat(const void *data, uint32_t size) * table * @bat_priv: the bat priv with all the soft interface information * @ip: search key + * @vid: VLAN identifier * * Returns the dat_entry if found, NULL otherwise. */ static struct batadv_dat_entry * -batadv_dat_entry_hash_find(struct batadv_priv *bat_priv, __be32 ip) +batadv_dat_entry_hash_find(struct batadv_priv *bat_priv, __be32 ip, + unsigned short vid) { struct hlist_head *head; - struct batadv_dat_entry *dat_entry, *dat_entry_tmp = NULL; + struct batadv_dat_entry to_find, *dat_entry, *dat_entry_tmp = NULL; struct batadv_hashtable *hash = bat_priv->dat.hash; uint32_t index; if (!hash) return NULL; - index = batadv_hash_dat(&ip, hash->size); + to_find.ip = ip; + to_find.vid = vid; + + index = batadv_hash_dat(&to_find, hash->size); head = &hash->table[index]; rcu_read_lock(); @@ -266,22 +267,24 @@ batadv_dat_entry_hash_find(struct batadv_priv *bat_priv, __be32 ip) * @bat_priv: the bat priv with all the soft interface information * @ip: ipv4 to add/edit * @mac_addr: mac address to assign to the given ipv4 + * @vid: VLAN identifier */ static void batadv_dat_entry_add(struct batadv_priv *bat_priv, __be32 ip, - uint8_t *mac_addr) + uint8_t *mac_addr, unsigned short vid) { struct batadv_dat_entry *dat_entry; int hash_added; - dat_entry = batadv_dat_entry_hash_find(bat_priv, ip); + dat_entry = batadv_dat_entry_hash_find(bat_priv, ip, vid); /* if this entry is already known, just update it */ if (dat_entry) { if (!batadv_compare_eth(dat_entry->mac_addr, mac_addr)) memcpy(dat_entry->mac_addr, mac_addr, ETH_ALEN); dat_entry->last_update = jiffies; batadv_dbg(BATADV_DBG_DAT, bat_priv, - "Entry updated: %pI4 %pM\n", &dat_entry->ip, - dat_entry->mac_addr); + "Entry updated: %pI4 %pM (vid: %d)\n", + &dat_entry->ip, dat_entry->mac_addr, + BATADV_PRINT_VID(vid)); goto out; } @@ -290,12 +293,13 @@ static void batadv_dat_entry_add(struct batadv_priv *bat_priv, __be32 ip, goto out; dat_entry->ip = ip; + dat_entry->vid = vid; memcpy(dat_entry->mac_addr, mac_addr, ETH_ALEN); dat_entry->last_update = jiffies; atomic_set(&dat_entry->refcount, 2); hash_added = batadv_hash_add(bat_priv->dat.hash, batadv_compare_dat, - batadv_hash_dat, &dat_entry->ip, + batadv_hash_dat, dat_entry, &dat_entry->hash_entry); if (unlikely(hash_added != 0)) { @@ -304,8 +308,8 @@ static void batadv_dat_entry_add(struct batadv_priv *bat_priv, __be32 ip, goto out; } - batadv_dbg(BATADV_DBG_DAT, bat_priv, "New entry added: %pI4 %pM\n", - &dat_entry->ip, dat_entry->mac_addr); + batadv_dbg(BATADV_DBG_DAT, bat_priv, "New entry added: %pI4 %pM (vid: %d)\n", + &dat_entry->ip, dat_entry->mac_addr, BATADV_PRINT_VID(vid)); out: if (dat_entry) @@ -419,6 +423,10 @@ static bool batadv_is_orig_node_eligible(struct batadv_dat_candidate *res, bool ret = false; int j; + /* check if orig node candidate is running DAT */ + if (!(candidate->capabilities & BATADV_ORIG_CAPA_HAS_DAT)) + goto out; + /* Check if this node has already been selected... */ for (j = 0; j < select; j++) if (res[j].orig_node == candidate) @@ -588,9 +596,9 @@ static bool batadv_dat_send_data(struct batadv_priv *bat_priv, goto free_orig; tmp_skb = pskb_copy(skb, GFP_ATOMIC); - if (!batadv_unicast_4addr_prepare_skb(bat_priv, tmp_skb, - cand[i].orig_node, - packet_subtype)) { + if (!batadv_send_skb_prepare_unicast_4addr(bat_priv, tmp_skb, + cand[i].orig_node, + packet_subtype)) { kfree_skb(tmp_skb); goto free_neigh; } @@ -626,6 +634,59 @@ out: } /** + * batadv_dat_tvlv_container_update - update the dat tvlv container after dat + * setting change + * @bat_priv: the bat priv with all the soft interface information + */ +static void batadv_dat_tvlv_container_update(struct batadv_priv *bat_priv) +{ + char dat_mode; + + dat_mode = atomic_read(&bat_priv->distributed_arp_table); + + switch (dat_mode) { + case 0: + batadv_tvlv_container_unregister(bat_priv, BATADV_TVLV_DAT, 1); + break; + case 1: + batadv_tvlv_container_register(bat_priv, BATADV_TVLV_DAT, 1, + NULL, 0); + break; + } +} + +/** + * batadv_dat_status_update - update the dat tvlv container after dat + * setting change + * @net_dev: the soft interface net device + */ +void batadv_dat_status_update(struct net_device *net_dev) +{ + struct batadv_priv *bat_priv = netdev_priv(net_dev); + batadv_dat_tvlv_container_update(bat_priv); +} + +/** + * batadv_gw_tvlv_ogm_handler_v1 - process incoming dat tvlv container + * @bat_priv: the bat priv with all the soft interface information + * @orig: the orig_node of the ogm + * @flags: flags indicating the tvlv state (see batadv_tvlv_handler_flags) + * @tvlv_value: tvlv buffer containing the gateway data + * @tvlv_value_len: tvlv buffer length + */ +static void batadv_dat_tvlv_ogm_handler_v1(struct batadv_priv *bat_priv, + struct batadv_orig_node *orig, + uint8_t flags, + void *tvlv_value, + uint16_t tvlv_value_len) +{ + if (flags & BATADV_TVLV_HANDLER_OGM_CIFNOTFND) + orig->capabilities &= ~BATADV_ORIG_CAPA_HAS_DAT; + else + orig->capabilities |= BATADV_ORIG_CAPA_HAS_DAT; +} + +/** * batadv_dat_hash_free - free the local DAT hash table * @bat_priv: the bat priv with all the soft interface information */ @@ -657,6 +718,10 @@ int batadv_dat_init(struct batadv_priv *bat_priv) batadv_dat_start_timer(bat_priv); + batadv_tvlv_handler_register(bat_priv, batadv_dat_tvlv_ogm_handler_v1, + NULL, BATADV_TVLV_DAT, 1, + BATADV_TVLV_HANDLER_OGM_CIFNOTFND); + batadv_dat_tvlv_container_update(bat_priv); return 0; } @@ -666,6 +731,9 @@ int batadv_dat_init(struct batadv_priv *bat_priv) */ void batadv_dat_free(struct batadv_priv *bat_priv) { + batadv_tvlv_container_unregister(bat_priv, BATADV_TVLV_DAT, 1); + batadv_tvlv_handler_unregister(bat_priv, BATADV_TVLV_DAT, 1); + cancel_delayed_work_sync(&bat_priv->dat.work); batadv_dat_hash_free(bat_priv); @@ -693,8 +761,8 @@ int batadv_dat_cache_seq_print_text(struct seq_file *seq, void *offset) goto out; seq_printf(seq, "Distributed ARP Table (%s):\n", net_dev->name); - seq_printf(seq, " %-7s %-13s %5s\n", "IPv4", "MAC", - "last-seen"); + seq_printf(seq, " %-7s %-9s %4s %11s\n", "IPv4", + "MAC", "VID", "last-seen"); for (i = 0; i < hash->size; i++) { head = &hash->table[i]; @@ -707,8 +775,9 @@ int batadv_dat_cache_seq_print_text(struct seq_file *seq, void *offset) last_seen_msecs = last_seen_msecs % 60000; last_seen_secs = last_seen_msecs / 1000; - seq_printf(seq, " * %15pI4 %14pM %6i:%02i\n", + seq_printf(seq, " * %15pI4 %14pM %4i %6i:%02i\n", &dat_entry->ip, dat_entry->mac_addr, + BATADV_PRINT_VID(dat_entry->vid), last_seen_mins, last_seen_secs); } rcu_read_unlock(); @@ -795,6 +864,31 @@ out: } /** + * batadv_dat_get_vid - extract the VLAN identifier from skb if any + * @skb: the buffer containing the packet to extract the VID from + * @hdr_size: the size of the batman-adv header encapsulating the packet + * + * If the packet embedded in the skb is vlan tagged this function returns the + * VID with the BATADV_VLAN_HAS_TAG flag. Otherwise BATADV_NO_FLAGS is returned. + */ +static unsigned short batadv_dat_get_vid(struct sk_buff *skb, int *hdr_size) +{ + unsigned short vid; + + vid = batadv_get_vid(skb, *hdr_size); + + /* ARP parsing functions jump forward of hdr_size + ETH_HLEN. + * If the header contained in the packet is a VLAN one (which is longer) + * hdr_size is updated so that the functions will still skip the + * correct amount of bytes. + */ + if (vid & BATADV_VLAN_HAS_TAG) + *hdr_size += VLAN_HLEN; + + return vid; +} + +/** * batadv_dat_snoop_outgoing_arp_request - snoop the ARP request and try to * answer using DAT * @bat_priv: the bat priv with all the soft interface information @@ -813,26 +907,31 @@ bool batadv_dat_snoop_outgoing_arp_request(struct batadv_priv *bat_priv, bool ret = false; struct batadv_dat_entry *dat_entry = NULL; struct sk_buff *skb_new; + int hdr_size = 0; + unsigned short vid; if (!atomic_read(&bat_priv->distributed_arp_table)) goto out; - type = batadv_arp_get_type(bat_priv, skb, 0); + vid = batadv_dat_get_vid(skb, &hdr_size); + + type = batadv_arp_get_type(bat_priv, skb, hdr_size); /* If the node gets an ARP_REQUEST it has to send a DHT_GET unicast * message to the selected DHT candidates */ if (type != ARPOP_REQUEST) goto out; - batadv_dbg_arp(bat_priv, skb, type, 0, "Parsing outgoing ARP REQUEST"); + batadv_dbg_arp(bat_priv, skb, type, hdr_size, + "Parsing outgoing ARP REQUEST"); - ip_src = batadv_arp_ip_src(skb, 0); - hw_src = batadv_arp_hw_src(skb, 0); - ip_dst = batadv_arp_ip_dst(skb, 0); + ip_src = batadv_arp_ip_src(skb, hdr_size); + hw_src = batadv_arp_hw_src(skb, hdr_size); + ip_dst = batadv_arp_ip_dst(skb, hdr_size); - batadv_dat_entry_add(bat_priv, ip_src, hw_src); + batadv_dat_entry_add(bat_priv, ip_src, hw_src, vid); - dat_entry = batadv_dat_entry_hash_find(bat_priv, ip_dst); + dat_entry = batadv_dat_entry_hash_find(bat_priv, ip_dst, vid); if (dat_entry) { /* If the ARP request is destined for a local client the local * client will answer itself. DAT would only generate a @@ -842,7 +941,8 @@ bool batadv_dat_snoop_outgoing_arp_request(struct batadv_priv *bat_priv, * additional DAT answer may trigger kernel warnings about * a packet coming from the wrong port. */ - if (batadv_is_my_client(bat_priv, dat_entry->mac_addr)) { + if (batadv_is_my_client(bat_priv, dat_entry->mac_addr, + BATADV_NO_FLAGS)) { ret = true; goto out; } @@ -853,11 +953,15 @@ bool batadv_dat_snoop_outgoing_arp_request(struct batadv_priv *bat_priv, if (!skb_new) goto out; + if (vid & BATADV_VLAN_HAS_TAG) + skb_new = vlan_insert_tag(skb_new, htons(ETH_P_8021Q), + vid & VLAN_VID_MASK); + skb_reset_mac_header(skb_new); skb_new->protocol = eth_type_trans(skb_new, bat_priv->soft_iface); bat_priv->stats.rx_packets++; - bat_priv->stats.rx_bytes += skb->len + ETH_HLEN; + bat_priv->stats.rx_bytes += skb->len + ETH_HLEN + hdr_size; bat_priv->soft_iface->last_rx = jiffies; netif_rx(skb_new); @@ -892,11 +996,14 @@ bool batadv_dat_snoop_incoming_arp_request(struct batadv_priv *bat_priv, struct sk_buff *skb_new; struct batadv_dat_entry *dat_entry = NULL; bool ret = false; + unsigned short vid; int err; if (!atomic_read(&bat_priv->distributed_arp_table)) goto out; + vid = batadv_dat_get_vid(skb, &hdr_size); + type = batadv_arp_get_type(bat_priv, skb, hdr_size); if (type != ARPOP_REQUEST) goto out; @@ -908,9 +1015,9 @@ bool batadv_dat_snoop_incoming_arp_request(struct batadv_priv *bat_priv, batadv_dbg_arp(bat_priv, skb, type, hdr_size, "Parsing incoming ARP REQUEST"); - batadv_dat_entry_add(bat_priv, ip_src, hw_src); + batadv_dat_entry_add(bat_priv, ip_src, hw_src, vid); - dat_entry = batadv_dat_entry_hash_find(bat_priv, ip_dst); + dat_entry = batadv_dat_entry_hash_find(bat_priv, ip_dst, vid); if (!dat_entry) goto out; @@ -921,17 +1028,22 @@ bool batadv_dat_snoop_incoming_arp_request(struct batadv_priv *bat_priv, if (!skb_new) goto out; + if (vid & BATADV_VLAN_HAS_TAG) + skb_new = vlan_insert_tag(skb_new, htons(ETH_P_8021Q), + vid & VLAN_VID_MASK); + /* To preserve backwards compatibility, the node has choose the outgoing * format based on the incoming request packet type. The assumption is * that a node not using the 4addr packet format doesn't support it. */ if (hdr_size == sizeof(struct batadv_unicast_4addr_packet)) - err = batadv_unicast_4addr_send_skb(bat_priv, skb_new, - BATADV_P_DAT_CACHE_REPLY); + err = batadv_send_skb_via_tt_4addr(bat_priv, skb_new, + BATADV_P_DAT_CACHE_REPLY, + vid); else - err = batadv_unicast_send_skb(bat_priv, skb_new); + err = batadv_send_skb_via_tt(bat_priv, skb_new, vid); - if (!err) { + if (err != NET_XMIT_DROP) { batadv_inc_counter(bat_priv, BATADV_CNT_DAT_CACHED_REPLY_TX); ret = true; } @@ -954,23 +1066,28 @@ void batadv_dat_snoop_outgoing_arp_reply(struct batadv_priv *bat_priv, uint16_t type; __be32 ip_src, ip_dst; uint8_t *hw_src, *hw_dst; + int hdr_size = 0; + unsigned short vid; if (!atomic_read(&bat_priv->distributed_arp_table)) return; - type = batadv_arp_get_type(bat_priv, skb, 0); + vid = batadv_dat_get_vid(skb, &hdr_size); + + type = batadv_arp_get_type(bat_priv, skb, hdr_size); if (type != ARPOP_REPLY) return; - batadv_dbg_arp(bat_priv, skb, type, 0, "Parsing outgoing ARP REPLY"); + batadv_dbg_arp(bat_priv, skb, type, hdr_size, + "Parsing outgoing ARP REPLY"); - hw_src = batadv_arp_hw_src(skb, 0); - ip_src = batadv_arp_ip_src(skb, 0); - hw_dst = batadv_arp_hw_dst(skb, 0); - ip_dst = batadv_arp_ip_dst(skb, 0); + hw_src = batadv_arp_hw_src(skb, hdr_size); + ip_src = batadv_arp_ip_src(skb, hdr_size); + hw_dst = batadv_arp_hw_dst(skb, hdr_size); + ip_dst = batadv_arp_ip_dst(skb, hdr_size); - batadv_dat_entry_add(bat_priv, ip_src, hw_src); - batadv_dat_entry_add(bat_priv, ip_dst, hw_dst); + batadv_dat_entry_add(bat_priv, ip_src, hw_src, vid); + batadv_dat_entry_add(bat_priv, ip_dst, hw_dst, vid); /* Send the ARP reply to the candidates for both the IP addresses that * the node obtained from the ARP reply @@ -992,10 +1109,13 @@ bool batadv_dat_snoop_incoming_arp_reply(struct batadv_priv *bat_priv, __be32 ip_src, ip_dst; uint8_t *hw_src, *hw_dst; bool ret = false; + unsigned short vid; if (!atomic_read(&bat_priv->distributed_arp_table)) goto out; + vid = batadv_dat_get_vid(skb, &hdr_size); + type = batadv_arp_get_type(bat_priv, skb, hdr_size); if (type != ARPOP_REPLY) goto out; @@ -1011,13 +1131,13 @@ bool batadv_dat_snoop_incoming_arp_reply(struct batadv_priv *bat_priv, /* Update our internal cache with both the IP addresses the node got * within the ARP reply */ - batadv_dat_entry_add(bat_priv, ip_src, hw_src); - batadv_dat_entry_add(bat_priv, ip_dst, hw_dst); + batadv_dat_entry_add(bat_priv, ip_src, hw_src, vid); + batadv_dat_entry_add(bat_priv, ip_dst, hw_dst, vid); /* if this REPLY is directed to a client of mine, let's deliver the * packet to the interface */ - ret = !batadv_is_my_client(bat_priv, hw_dst); + ret = !batadv_is_my_client(bat_priv, hw_dst, vid); out: if (ret) kfree_skb(skb); @@ -1040,7 +1160,8 @@ bool batadv_dat_drop_broadcast_packet(struct batadv_priv *bat_priv, __be32 ip_dst; struct batadv_dat_entry *dat_entry = NULL; bool ret = false; - const size_t bcast_len = sizeof(struct batadv_bcast_packet); + int hdr_size = sizeof(struct batadv_bcast_packet); + unsigned short vid; if (!atomic_read(&bat_priv->distributed_arp_table)) goto out; @@ -1051,12 +1172,14 @@ bool batadv_dat_drop_broadcast_packet(struct batadv_priv *bat_priv, if (forw_packet->num_packets) goto out; - type = batadv_arp_get_type(bat_priv, forw_packet->skb, bcast_len); + vid = batadv_dat_get_vid(forw_packet->skb, &hdr_size); + + type = batadv_arp_get_type(bat_priv, forw_packet->skb, hdr_size); if (type != ARPOP_REQUEST) goto out; - ip_dst = batadv_arp_ip_dst(forw_packet->skb, bcast_len); - dat_entry = batadv_dat_entry_hash_find(bat_priv, ip_dst); + ip_dst = batadv_arp_ip_dst(forw_packet->skb, hdr_size); + dat_entry = batadv_dat_entry_hash_find(bat_priv, ip_dst, vid); /* check if the node already got this entry */ if (!dat_entry) { batadv_dbg(BATADV_DBG_DAT, bat_priv, diff --git a/net/batman-adv/distributed-arp-table.h b/net/batman-adv/distributed-arp-table.h index 125c8c6fcfad..60d853beb8d8 100644 --- a/net/batman-adv/distributed-arp-table.h +++ b/net/batman-adv/distributed-arp-table.h @@ -29,6 +29,7 @@ #define BATADV_DAT_ADDR_MAX ((batadv_dat_addr_t)~(batadv_dat_addr_t)0) +void batadv_dat_status_update(struct net_device *net_dev); bool batadv_dat_snoop_outgoing_arp_request(struct batadv_priv *bat_priv, struct sk_buff *skb); bool batadv_dat_snoop_incoming_arp_request(struct batadv_priv *bat_priv, @@ -98,6 +99,10 @@ static inline void batadv_dat_inc_counter(struct batadv_priv *bat_priv, #else +static inline void batadv_dat_status_update(struct net_device *net_dev) +{ +} + static inline bool batadv_dat_snoop_outgoing_arp_request(struct batadv_priv *bat_priv, struct sk_buff *skb) diff --git a/net/batman-adv/fragmentation.c b/net/batman-adv/fragmentation.c new file mode 100644 index 000000000000..271d321b3a04 --- /dev/null +++ b/net/batman-adv/fragmentation.c @@ -0,0 +1,491 @@ +/* Copyright (C) 2013 B.A.T.M.A.N. contributors: + * + * Martin Hundebøll <martin@hundeboll.net> + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA + * 02110-1301, USA + */ + +#include "main.h" +#include "fragmentation.h" +#include "send.h" +#include "originator.h" +#include "routing.h" +#include "hard-interface.h" +#include "soft-interface.h" + + +/** + * batadv_frag_clear_chain - delete entries in the fragment buffer chain + * @head: head of chain with entries. + * + * Free fragments in the passed hlist. Should be called with appropriate lock. + */ +static void batadv_frag_clear_chain(struct hlist_head *head) +{ + struct batadv_frag_list_entry *entry; + struct hlist_node *node; + + hlist_for_each_entry_safe(entry, node, head, list) { + hlist_del(&entry->list); + kfree_skb(entry->skb); + kfree(entry); + } +} + +/** + * batadv_frag_purge_orig - free fragments associated to an orig + * @orig_node: originator to free fragments from + * @check_cb: optional function to tell if an entry should be purged + */ +void batadv_frag_purge_orig(struct batadv_orig_node *orig_node, + bool (*check_cb)(struct batadv_frag_table_entry *)) +{ + struct batadv_frag_table_entry *chain; + uint8_t i; + + for (i = 0; i < BATADV_FRAG_BUFFER_COUNT; i++) { + chain = &orig_node->fragments[i]; + spin_lock_bh(&orig_node->fragments[i].lock); + + if (!check_cb || check_cb(chain)) { + batadv_frag_clear_chain(&orig_node->fragments[i].head); + orig_node->fragments[i].size = 0; + } + + spin_unlock_bh(&orig_node->fragments[i].lock); + } +} + +/** + * batadv_frag_size_limit - maximum possible size of packet to be fragmented + * + * Returns the maximum size of payload that can be fragmented. + */ +static int batadv_frag_size_limit(void) +{ + int limit = BATADV_FRAG_MAX_FRAG_SIZE; + + limit -= sizeof(struct batadv_frag_packet); + limit *= BATADV_FRAG_MAX_FRAGMENTS; + + return limit; +} + +/** + * batadv_frag_init_chain - check and prepare fragment chain for new fragment + * @chain: chain in fragments table to init + * @seqno: sequence number of the received fragment + * + * Make chain ready for a fragment with sequence number "seqno". Delete existing + * entries if they have an "old" sequence number. + * + * Caller must hold chain->lock. + * + * Returns true if chain is empty and caller can just insert the new fragment + * without searching for the right position. + */ +static bool batadv_frag_init_chain(struct batadv_frag_table_entry *chain, + uint16_t seqno) +{ + if (chain->seqno == seqno) + return false; + + if (!hlist_empty(&chain->head)) + batadv_frag_clear_chain(&chain->head); + + chain->size = 0; + chain->seqno = seqno; + + return true; +} + +/** + * batadv_frag_insert_packet - insert a fragment into a fragment chain + * @orig_node: originator that the fragment was received from + * @skb: skb to insert + * @chain_out: list head to attach complete chains of fragments to + * + * Insert a new fragment into the reverse ordered chain in the right table + * entry. The hash table entry is cleared if "old" fragments exist in it. + * + * Returns true if skb is buffered, false on error. If the chain has all the + * fragments needed to merge the packet, the chain is moved to the passed head + * to avoid locking the chain in the table. + */ +static bool batadv_frag_insert_packet(struct batadv_orig_node *orig_node, + struct sk_buff *skb, + struct hlist_head *chain_out) +{ + struct batadv_frag_table_entry *chain; + struct batadv_frag_list_entry *frag_entry_new = NULL, *frag_entry_curr; + struct batadv_frag_packet *frag_packet; + uint8_t bucket; + uint16_t seqno, hdr_size = sizeof(struct batadv_frag_packet); + bool ret = false; + + /* Linearize packet to avoid linearizing 16 packets in a row when doing + * the later merge. Non-linear merge should be added to remove this + * linearization. + */ + if (skb_linearize(skb) < 0) + goto err; + + frag_packet = (struct batadv_frag_packet *)skb->data; + seqno = ntohs(frag_packet->seqno); + bucket = seqno % BATADV_FRAG_BUFFER_COUNT; + + frag_entry_new = kmalloc(sizeof(*frag_entry_new), GFP_ATOMIC); + if (!frag_entry_new) + goto err; + + frag_entry_new->skb = skb; + frag_entry_new->no = frag_packet->no; + + /* Select entry in the "chain table" and delete any prior fragments + * with another sequence number. batadv_frag_init_chain() returns true, + * if the list is empty at return. + */ + chain = &orig_node->fragments[bucket]; + spin_lock_bh(&chain->lock); + if (batadv_frag_init_chain(chain, seqno)) { + hlist_add_head(&frag_entry_new->list, &chain->head); + chain->size = skb->len - hdr_size; + chain->timestamp = jiffies; + ret = true; + goto out; + } + + /* Find the position for the new fragment. */ + hlist_for_each_entry(frag_entry_curr, &chain->head, list) { + /* Drop packet if fragment already exists. */ + if (frag_entry_curr->no == frag_entry_new->no) + goto err_unlock; + + /* Order fragments from highest to lowest. */ + if (frag_entry_curr->no < frag_entry_new->no) { + hlist_add_before(&frag_entry_new->list, + &frag_entry_curr->list); + chain->size += skb->len - hdr_size; + chain->timestamp = jiffies; + ret = true; + goto out; + } + } + + /* Reached the end of the list, so insert after 'frag_entry_curr'. */ + if (likely(frag_entry_curr)) { + hlist_add_after(&frag_entry_curr->list, &frag_entry_new->list); + chain->size += skb->len - hdr_size; + chain->timestamp = jiffies; + ret = true; + } + +out: + if (chain->size > batadv_frag_size_limit() || + ntohs(frag_packet->total_size) > batadv_frag_size_limit()) { + /* Clear chain if total size of either the list or the packet + * exceeds the maximum size of one merged packet. + */ + batadv_frag_clear_chain(&chain->head); + chain->size = 0; + } else if (ntohs(frag_packet->total_size) == chain->size) { + /* All fragments received. Hand over chain to caller. */ + hlist_move_list(&chain->head, chain_out); + chain->size = 0; + } + +err_unlock: + spin_unlock_bh(&chain->lock); + +err: + if (!ret) + kfree(frag_entry_new); + + return ret; +} + +/** + * batadv_frag_merge_packets - merge a chain of fragments + * @chain: head of chain with fragments + * @skb: packet with total size of skb after merging + * + * Expand the first skb in the chain and copy the content of the remaining + * skb's into the expanded one. After doing so, clear the chain. + * + * Returns the merged skb or NULL on error. + */ +static struct sk_buff * +batadv_frag_merge_packets(struct hlist_head *chain, struct sk_buff *skb) +{ + struct batadv_frag_packet *packet; + struct batadv_frag_list_entry *entry; + struct sk_buff *skb_out = NULL; + int size, hdr_size = sizeof(struct batadv_frag_packet); + + /* Make sure incoming skb has non-bogus data. */ + packet = (struct batadv_frag_packet *)skb->data; + size = ntohs(packet->total_size); + if (size > batadv_frag_size_limit()) + goto free; + + /* Remove first entry, as this is the destination for the rest of the + * fragments. + */ + entry = hlist_entry(chain->first, struct batadv_frag_list_entry, list); + hlist_del(&entry->list); + skb_out = entry->skb; + kfree(entry); + + /* Make room for the rest of the fragments. */ + if (pskb_expand_head(skb_out, 0, size - skb->len, GFP_ATOMIC) < 0) { + kfree_skb(skb_out); + skb_out = NULL; + goto free; + } + + /* Move the existing MAC header to just before the payload. (Override + * the fragment header.) + */ + skb_pull_rcsum(skb_out, hdr_size); + memmove(skb_out->data - ETH_HLEN, skb_mac_header(skb_out), ETH_HLEN); + skb_set_mac_header(skb_out, -ETH_HLEN); + skb_reset_network_header(skb_out); + skb_reset_transport_header(skb_out); + + /* Copy the payload of the each fragment into the last skb */ + hlist_for_each_entry(entry, chain, list) { + size = entry->skb->len - hdr_size; + memcpy(skb_put(skb_out, size), entry->skb->data + hdr_size, + size); + } + +free: + /* Locking is not needed, because 'chain' is not part of any orig. */ + batadv_frag_clear_chain(chain); + return skb_out; +} + +/** + * batadv_frag_skb_buffer - buffer fragment for later merge + * @skb: skb to buffer + * @orig_node_src: originator that the skb is received from + * + * Add fragment to buffer and merge fragments if possible. + * + * There are three possible outcomes: 1) Packet is merged: Return true and + * set *skb to merged packet; 2) Packet is buffered: Return true and set *skb + * to NULL; 3) Error: Return false and leave skb as is. + */ +bool batadv_frag_skb_buffer(struct sk_buff **skb, + struct batadv_orig_node *orig_node_src) +{ + struct sk_buff *skb_out = NULL; + struct hlist_head head = HLIST_HEAD_INIT; + bool ret = false; + + /* Add packet to buffer and table entry if merge is possible. */ + if (!batadv_frag_insert_packet(orig_node_src, *skb, &head)) + goto out_err; + + /* Leave if more fragments are needed to merge. */ + if (hlist_empty(&head)) + goto out; + + skb_out = batadv_frag_merge_packets(&head, *skb); + if (!skb_out) + goto out_err; + +out: + *skb = skb_out; + ret = true; +out_err: + return ret; +} + +/** + * batadv_frag_skb_fwd - forward fragments that would exceed MTU when merged + * @skb: skb to forward + * @recv_if: interface that the skb is received on + * @orig_node_src: originator that the skb is received from + * + * Look up the next-hop of the fragments payload and check if the merged packet + * will exceed the MTU towards the next-hop. If so, the fragment is forwarded + * without merging it. + * + * Returns true if the fragment is consumed/forwarded, false otherwise. + */ +bool batadv_frag_skb_fwd(struct sk_buff *skb, + struct batadv_hard_iface *recv_if, + struct batadv_orig_node *orig_node_src) +{ + struct batadv_priv *bat_priv = netdev_priv(recv_if->soft_iface); + struct batadv_orig_node *orig_node_dst = NULL; + struct batadv_neigh_node *neigh_node = NULL; + struct batadv_frag_packet *packet; + uint16_t total_size; + bool ret = false; + + packet = (struct batadv_frag_packet *)skb->data; + orig_node_dst = batadv_orig_hash_find(bat_priv, packet->dest); + if (!orig_node_dst) + goto out; + + neigh_node = batadv_find_router(bat_priv, orig_node_dst, recv_if); + if (!neigh_node) + goto out; + + /* Forward the fragment, if the merged packet would be too big to + * be assembled. + */ + total_size = ntohs(packet->total_size); + if (total_size > neigh_node->if_incoming->net_dev->mtu) { + batadv_inc_counter(bat_priv, BATADV_CNT_FRAG_FWD); + batadv_add_counter(bat_priv, BATADV_CNT_FRAG_FWD_BYTES, + skb->len + ETH_HLEN); + + packet->header.ttl--; + batadv_send_skb_packet(skb, neigh_node->if_incoming, + neigh_node->addr); + ret = true; + } + +out: + if (orig_node_dst) + batadv_orig_node_free_ref(orig_node_dst); + if (neigh_node) + batadv_neigh_node_free_ref(neigh_node); + return ret; +} + +/** + * batadv_frag_create - create a fragment from skb + * @skb: skb to create fragment from + * @frag_head: header to use in new fragment + * @mtu: size of new fragment + * + * Split the passed skb into two fragments: A new one with size matching the + * passed mtu and the old one with the rest. The new skb contains data from the + * tail of the old skb. + * + * Returns the new fragment, NULL on error. + */ +static struct sk_buff *batadv_frag_create(struct sk_buff *skb, + struct batadv_frag_packet *frag_head, + unsigned int mtu) +{ + struct sk_buff *skb_fragment; + unsigned header_size = sizeof(*frag_head); + unsigned fragment_size = mtu - header_size; + + skb_fragment = netdev_alloc_skb(NULL, mtu + ETH_HLEN); + if (!skb_fragment) + goto err; + + skb->priority = TC_PRIO_CONTROL; + + /* Eat the last mtu-bytes of the skb */ + skb_reserve(skb_fragment, header_size + ETH_HLEN); + skb_split(skb, skb_fragment, skb->len - fragment_size); + + /* Add the header */ + skb_push(skb_fragment, header_size); + memcpy(skb_fragment->data, frag_head, header_size); + +err: + return skb_fragment; +} + +/** + * batadv_frag_send_packet - create up to 16 fragments from the passed skb + * @skb: skb to create fragments from + * @orig_node: final destination of the created fragments + * @neigh_node: next-hop of the created fragments + * + * Returns true on success, false otherwise. + */ +bool batadv_frag_send_packet(struct sk_buff *skb, + struct batadv_orig_node *orig_node, + struct batadv_neigh_node *neigh_node) +{ + struct batadv_priv *bat_priv; + struct batadv_hard_iface *primary_if; + struct batadv_frag_packet frag_header; + struct sk_buff *skb_fragment; + unsigned mtu = neigh_node->if_incoming->net_dev->mtu; + unsigned header_size = sizeof(frag_header); + unsigned max_fragment_size, max_packet_size; + + /* To avoid merge and refragmentation at next-hops we never send + * fragments larger than BATADV_FRAG_MAX_FRAG_SIZE + */ + mtu = min_t(unsigned, mtu, BATADV_FRAG_MAX_FRAG_SIZE); + max_fragment_size = (mtu - header_size - ETH_HLEN); + max_packet_size = max_fragment_size * BATADV_FRAG_MAX_FRAGMENTS; + + /* Don't even try to fragment, if we need more than 16 fragments */ + if (skb->len > max_packet_size) + goto out_err; + + bat_priv = orig_node->bat_priv; + primary_if = batadv_primary_if_get_selected(bat_priv); + if (!primary_if) + goto out_err; + + /* Create one header to be copied to all fragments */ + frag_header.header.packet_type = BATADV_UNICAST_FRAG; + frag_header.header.version = BATADV_COMPAT_VERSION; + frag_header.header.ttl = BATADV_TTL; + frag_header.seqno = htons(atomic_inc_return(&bat_priv->frag_seqno)); + frag_header.reserved = 0; + frag_header.no = 0; + frag_header.total_size = htons(skb->len); + memcpy(frag_header.orig, primary_if->net_dev->dev_addr, ETH_ALEN); + memcpy(frag_header.dest, orig_node->orig, ETH_ALEN); + + /* Eat and send fragments from the tail of skb */ + while (skb->len > max_fragment_size) { + skb_fragment = batadv_frag_create(skb, &frag_header, mtu); + if (!skb_fragment) + goto out_err; + + batadv_inc_counter(bat_priv, BATADV_CNT_FRAG_TX); + batadv_add_counter(bat_priv, BATADV_CNT_FRAG_TX_BYTES, + skb_fragment->len + ETH_HLEN); + batadv_send_skb_packet(skb_fragment, neigh_node->if_incoming, + neigh_node->addr); + frag_header.no++; + + /* The initial check in this function should cover this case */ + if (frag_header.no == BATADV_FRAG_MAX_FRAGMENTS - 1) + goto out_err; + } + + /* Make room for the fragment header. */ + if (batadv_skb_head_push(skb, header_size) < 0 || + pskb_expand_head(skb, header_size + ETH_HLEN, 0, GFP_ATOMIC) < 0) + goto out_err; + + memcpy(skb->data, &frag_header, header_size); + + /* Send the last fragment */ + batadv_inc_counter(bat_priv, BATADV_CNT_FRAG_TX); + batadv_add_counter(bat_priv, BATADV_CNT_FRAG_TX_BYTES, + skb->len + ETH_HLEN); + batadv_send_skb_packet(skb, neigh_node->if_incoming, neigh_node->addr); + + return true; +out_err: + return false; +} diff --git a/net/batman-adv/fragmentation.h b/net/batman-adv/fragmentation.h new file mode 100644 index 000000000000..ca029e2676e7 --- /dev/null +++ b/net/batman-adv/fragmentation.h @@ -0,0 +1,50 @@ +/* Copyright (C) 2013 B.A.T.M.A.N. contributors: + * + * Martin Hundebøll <martin@hundeboll.net> + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA + * 02110-1301, USA + */ + +#ifndef _NET_BATMAN_ADV_FRAGMENTATION_H_ +#define _NET_BATMAN_ADV_FRAGMENTATION_H_ + +void batadv_frag_purge_orig(struct batadv_orig_node *orig, + bool (*check_cb)(struct batadv_frag_table_entry *)); +bool batadv_frag_skb_fwd(struct sk_buff *skb, + struct batadv_hard_iface *recv_if, + struct batadv_orig_node *orig_node_src); +bool batadv_frag_skb_buffer(struct sk_buff **skb, + struct batadv_orig_node *orig_node); +bool batadv_frag_send_packet(struct sk_buff *skb, + struct batadv_orig_node *orig_node, + struct batadv_neigh_node *neigh_node); + +/** + * batadv_frag_check_entry - check if a list of fragments has timed out + * @frags_entry: table entry to check + * + * Returns true if the frags entry has timed out, false otherwise. + */ +static inline bool +batadv_frag_check_entry(struct batadv_frag_table_entry *frags_entry) +{ + if (!hlist_empty(&frags_entry->head) && + batadv_has_timed_out(frags_entry->timestamp, BATADV_FRAG_TIMEOUT)) + return true; + else + return false; +} + +#endif /* _NET_BATMAN_ADV_FRAGMENTATION_H_ */ diff --git a/net/batman-adv/gateway_client.c b/net/batman-adv/gateway_client.c index 1ce4b8763ef2..2449afaa7638 100644 --- a/net/batman-adv/gateway_client.c +++ b/net/batman-adv/gateway_client.c @@ -118,7 +118,6 @@ batadv_gw_get_best_gw_node(struct batadv_priv *bat_priv) uint32_t max_gw_factor = 0, tmp_gw_factor = 0; uint32_t gw_divisor; uint8_t max_tq = 0; - int down, up; uint8_t tq_avg; struct batadv_orig_node *orig_node; @@ -138,14 +137,13 @@ batadv_gw_get_best_gw_node(struct batadv_priv *bat_priv) if (!atomic_inc_not_zero(&gw_node->refcount)) goto next; - tq_avg = router->tq_avg; + tq_avg = router->bat_iv.tq_avg; switch (atomic_read(&bat_priv->gw_sel_class)) { case 1: /* fast connection */ - batadv_gw_bandwidth_to_kbit(orig_node->gw_flags, - &down, &up); - - tmp_gw_factor = tq_avg * tq_avg * down * 100 * 100; + tmp_gw_factor = tq_avg * tq_avg; + tmp_gw_factor *= gw_node->bandwidth_down; + tmp_gw_factor *= 100 * 100; tmp_gw_factor /= gw_divisor; if ((tmp_gw_factor > max_gw_factor) || @@ -223,11 +221,6 @@ void batadv_gw_election(struct batadv_priv *bat_priv) struct batadv_neigh_node *router = NULL; char gw_addr[18] = { '\0' }; - /* The batman daemon checks here if we already passed a full originator - * cycle in order to make sure we don't choose the first gateway we - * hear about. This check is based on the daemon's uptime which we - * don't have. - */ if (atomic_read(&bat_priv->gw_mode) != BATADV_GW_MODE_CLIENT) goto out; @@ -258,16 +251,22 @@ void batadv_gw_election(struct batadv_priv *bat_priv) NULL); } else if ((!curr_gw) && (next_gw)) { batadv_dbg(BATADV_DBG_BATMAN, bat_priv, - "Adding route to gateway %pM (gw_flags: %i, tq: %i)\n", + "Adding route to gateway %pM (bandwidth: %u.%u/%u.%u MBit, tq: %i)\n", next_gw->orig_node->orig, - next_gw->orig_node->gw_flags, router->tq_avg); + next_gw->bandwidth_down / 10, + next_gw->bandwidth_down % 10, + next_gw->bandwidth_up / 10, + next_gw->bandwidth_up % 10, router->bat_iv.tq_avg); batadv_throw_uevent(bat_priv, BATADV_UEV_GW, BATADV_UEV_ADD, gw_addr); } else { batadv_dbg(BATADV_DBG_BATMAN, bat_priv, - "Changing route to gateway %pM (gw_flags: %i, tq: %i)\n", + "Changing route to gateway %pM (bandwidth: %u.%u/%u.%u MBit, tq: %i)\n", next_gw->orig_node->orig, - next_gw->orig_node->gw_flags, router->tq_avg); + next_gw->bandwidth_down / 10, + next_gw->bandwidth_down % 10, + next_gw->bandwidth_up / 10, + next_gw->bandwidth_up % 10, router->bat_iv.tq_avg); batadv_throw_uevent(bat_priv, BATADV_UEV_GW, BATADV_UEV_CHANGE, gw_addr); } @@ -306,8 +305,8 @@ void batadv_gw_check_election(struct batadv_priv *bat_priv, if (!router_orig) goto out; - gw_tq_avg = router_gw->tq_avg; - orig_tq_avg = router_orig->tq_avg; + gw_tq_avg = router_gw->bat_iv.tq_avg; + orig_tq_avg = router_orig->bat_iv.tq_avg; /* the TQ value has to be better */ if (orig_tq_avg < gw_tq_avg) @@ -337,12 +336,20 @@ out: return; } +/** + * batadv_gw_node_add - add gateway node to list of available gateways + * @bat_priv: the bat priv with all the soft interface information + * @orig_node: originator announcing gateway capabilities + * @gateway: announced bandwidth information + */ static void batadv_gw_node_add(struct batadv_priv *bat_priv, struct batadv_orig_node *orig_node, - uint8_t new_gwflags) + struct batadv_tvlv_gateway_data *gateway) { struct batadv_gw_node *gw_node; - int down, up; + + if (gateway->bandwidth_down == 0) + return; gw_node = kzalloc(sizeof(*gw_node), GFP_ATOMIC); if (!gw_node) @@ -356,73 +363,116 @@ static void batadv_gw_node_add(struct batadv_priv *bat_priv, hlist_add_head_rcu(&gw_node->list, &bat_priv->gw.list); spin_unlock_bh(&bat_priv->gw.list_lock); - batadv_gw_bandwidth_to_kbit(new_gwflags, &down, &up); batadv_dbg(BATADV_DBG_BATMAN, bat_priv, - "Found new gateway %pM -> gw_class: %i - %i%s/%i%s\n", - orig_node->orig, new_gwflags, - (down > 2048 ? down / 1024 : down), - (down > 2048 ? "MBit" : "KBit"), - (up > 2048 ? up / 1024 : up), - (up > 2048 ? "MBit" : "KBit")); + "Found new gateway %pM -> gw bandwidth: %u.%u/%u.%u MBit\n", + orig_node->orig, + ntohl(gateway->bandwidth_down) / 10, + ntohl(gateway->bandwidth_down) % 10, + ntohl(gateway->bandwidth_up) / 10, + ntohl(gateway->bandwidth_up) % 10); } -void batadv_gw_node_update(struct batadv_priv *bat_priv, - struct batadv_orig_node *orig_node, - uint8_t new_gwflags) +/** + * batadv_gw_node_get - retrieve gateway node from list of available gateways + * @bat_priv: the bat priv with all the soft interface information + * @orig_node: originator announcing gateway capabilities + * + * Returns gateway node if found or NULL otherwise. + */ +static struct batadv_gw_node * +batadv_gw_node_get(struct batadv_priv *bat_priv, + struct batadv_orig_node *orig_node) { - struct batadv_gw_node *gw_node, *curr_gw; - - /* Note: We don't need a NULL check here, since curr_gw never gets - * dereferenced. If curr_gw is NULL we also should not exit as we may - * have this gateway in our list (duplication check!) even though we - * have no currently selected gateway. - */ - curr_gw = batadv_gw_get_selected_gw_node(bat_priv); + struct batadv_gw_node *gw_node_tmp, *gw_node = NULL; rcu_read_lock(); - hlist_for_each_entry_rcu(gw_node, &bat_priv->gw.list, list) { - if (gw_node->orig_node != orig_node) + hlist_for_each_entry_rcu(gw_node_tmp, &bat_priv->gw.list, list) { + if (gw_node_tmp->orig_node != orig_node) continue; - batadv_dbg(BATADV_DBG_BATMAN, bat_priv, - "Gateway class of originator %pM changed from %i to %i\n", - orig_node->orig, gw_node->orig_node->gw_flags, - new_gwflags); + if (gw_node_tmp->deleted) + continue; - gw_node->deleted = 0; + if (!atomic_inc_not_zero(&gw_node_tmp->refcount)) + continue; - if (new_gwflags == BATADV_NO_FLAGS) { - gw_node->deleted = jiffies; - batadv_dbg(BATADV_DBG_BATMAN, bat_priv, - "Gateway %pM removed from gateway list\n", - orig_node->orig); + gw_node = gw_node_tmp; + break; + } + rcu_read_unlock(); - if (gw_node == curr_gw) - goto deselect; - } + return gw_node; +} - goto unlock; +/** + * batadv_gw_node_update - update list of available gateways with changed + * bandwidth information + * @bat_priv: the bat priv with all the soft interface information + * @orig_node: originator announcing gateway capabilities + * @gateway: announced bandwidth information + */ +void batadv_gw_node_update(struct batadv_priv *bat_priv, + struct batadv_orig_node *orig_node, + struct batadv_tvlv_gateway_data *gateway) +{ + struct batadv_gw_node *gw_node, *curr_gw = NULL; + + gw_node = batadv_gw_node_get(bat_priv, orig_node); + if (!gw_node) { + batadv_gw_node_add(bat_priv, orig_node, gateway); + goto out; } - if (new_gwflags == BATADV_NO_FLAGS) - goto unlock; + if ((gw_node->bandwidth_down == ntohl(gateway->bandwidth_down)) && + (gw_node->bandwidth_up == ntohl(gateway->bandwidth_up))) + goto out; - batadv_gw_node_add(bat_priv, orig_node, new_gwflags); - goto unlock; + batadv_dbg(BATADV_DBG_BATMAN, bat_priv, + "Gateway bandwidth of originator %pM changed from %u.%u/%u.%u MBit to %u.%u/%u.%u MBit\n", + orig_node->orig, + gw_node->bandwidth_down / 10, + gw_node->bandwidth_down % 10, + gw_node->bandwidth_up / 10, + gw_node->bandwidth_up % 10, + ntohl(gateway->bandwidth_down) / 10, + ntohl(gateway->bandwidth_down) % 10, + ntohl(gateway->bandwidth_up) / 10, + ntohl(gateway->bandwidth_up) % 10); + + gw_node->bandwidth_down = ntohl(gateway->bandwidth_down); + gw_node->bandwidth_up = ntohl(gateway->bandwidth_up); + + gw_node->deleted = 0; + if (ntohl(gateway->bandwidth_down) == 0) { + gw_node->deleted = jiffies; + batadv_dbg(BATADV_DBG_BATMAN, bat_priv, + "Gateway %pM removed from gateway list\n", + orig_node->orig); -deselect: - batadv_gw_deselect(bat_priv); -unlock: - rcu_read_unlock(); + /* Note: We don't need a NULL check here, since curr_gw never + * gets dereferenced. + */ + curr_gw = batadv_gw_get_selected_gw_node(bat_priv); + if (gw_node == curr_gw) + batadv_gw_deselect(bat_priv); + } +out: if (curr_gw) batadv_gw_node_free_ref(curr_gw); + if (gw_node) + batadv_gw_node_free_ref(gw_node); } void batadv_gw_node_delete(struct batadv_priv *bat_priv, struct batadv_orig_node *orig_node) { - batadv_gw_node_update(bat_priv, orig_node, 0); + struct batadv_tvlv_gateway_data gateway; + + gateway.bandwidth_down = 0; + gateway.bandwidth_up = 0; + + batadv_gw_node_update(bat_priv, orig_node, &gateway); } void batadv_gw_node_purge(struct batadv_priv *bat_priv) @@ -467,9 +517,7 @@ static int batadv_write_buffer_text(struct batadv_priv *bat_priv, { struct batadv_gw_node *curr_gw; struct batadv_neigh_node *router; - int down, up, ret = -1; - - batadv_gw_bandwidth_to_kbit(gw_node->orig_node->gw_flags, &down, &up); + int ret = -1; router = batadv_orig_node_get_router(gw_node->orig_node); if (!router) @@ -477,16 +525,15 @@ static int batadv_write_buffer_text(struct batadv_priv *bat_priv, curr_gw = batadv_gw_get_selected_gw_node(bat_priv); - ret = seq_printf(seq, "%s %pM (%3i) %pM [%10s]: %3i - %i%s/%i%s\n", + ret = seq_printf(seq, "%s %pM (%3i) %pM [%10s]: %u.%u/%u.%u MBit\n", (curr_gw == gw_node ? "=>" : " "), gw_node->orig_node->orig, - router->tq_avg, router->addr, + router->bat_iv.tq_avg, router->addr, router->if_incoming->net_dev->name, - gw_node->orig_node->gw_flags, - (down > 2048 ? down / 1024 : down), - (down > 2048 ? "MBit" : "KBit"), - (up > 2048 ? up / 1024 : up), - (up > 2048 ? "MBit" : "KBit")); + gw_node->bandwidth_down / 10, + gw_node->bandwidth_down % 10, + gw_node->bandwidth_up / 10, + gw_node->bandwidth_up % 10); batadv_neigh_node_free_ref(router); if (curr_gw) @@ -508,7 +555,7 @@ int batadv_gw_client_seq_print_text(struct seq_file *seq, void *offset) goto out; seq_printf(seq, - " %-12s (%s/%i) %17s [%10s]: gw_class ... [B.A.T.M.A.N. adv %s, MainIF/MAC: %s/%pM (%s)]\n", + " %-12s (%s/%i) %17s [%10s]: advertised uplink bandwidth ... [B.A.T.M.A.N. adv %s, MainIF/MAC: %s/%pM (%s)]\n", "Gateway", "#", BATADV_TQ_MAX_VALUE, "Nexthop", "outgoingIF", BATADV_SOURCE_VERSION, primary_if->net_dev->name, primary_if->net_dev->dev_addr, net_dev->name); @@ -603,24 +650,29 @@ bool batadv_gw_is_dhcp_target(struct sk_buff *skb, unsigned int *header_len) struct iphdr *iphdr; struct ipv6hdr *ipv6hdr; struct udphdr *udphdr; + struct vlan_ethhdr *vhdr; + __be16 proto; /* check for ethernet header */ if (!pskb_may_pull(skb, *header_len + ETH_HLEN)) return false; ethhdr = (struct ethhdr *)skb->data; + proto = ethhdr->h_proto; *header_len += ETH_HLEN; /* check for initial vlan header */ - if (ntohs(ethhdr->h_proto) == ETH_P_8021Q) { + if (proto == htons(ETH_P_8021Q)) { if (!pskb_may_pull(skb, *header_len + VLAN_HLEN)) return false; - ethhdr = (struct ethhdr *)(skb->data + VLAN_HLEN); + + vhdr = (struct vlan_ethhdr *)skb->data; + proto = vhdr->h_vlan_encapsulated_proto; *header_len += VLAN_HLEN; } /* check for ip header */ - switch (ntohs(ethhdr->h_proto)) { - case ETH_P_IP: + switch (proto) { + case htons(ETH_P_IP): if (!pskb_may_pull(skb, *header_len + sizeof(*iphdr))) return false; iphdr = (struct iphdr *)(skb->data + *header_len); @@ -631,7 +683,7 @@ bool batadv_gw_is_dhcp_target(struct sk_buff *skb, unsigned int *header_len) return false; break; - case ETH_P_IPV6: + case htons(ETH_P_IPV6): if (!pskb_may_pull(skb, *header_len + sizeof(*ipv6hdr))) return false; ipv6hdr = (struct ipv6hdr *)(skb->data + *header_len); @@ -658,28 +710,44 @@ bool batadv_gw_is_dhcp_target(struct sk_buff *skb, unsigned int *header_len) *header_len += sizeof(*udphdr); /* check for bootp port */ - if ((ntohs(ethhdr->h_proto) == ETH_P_IP) && - (ntohs(udphdr->dest) != 67)) + if ((proto == htons(ETH_P_IP)) && + (udphdr->dest != htons(67))) return false; - if ((ntohs(ethhdr->h_proto) == ETH_P_IPV6) && - (ntohs(udphdr->dest) != 547)) + if ((proto == htons(ETH_P_IPV6)) && + (udphdr->dest != htons(547))) return false; return true; } -/* this call might reallocate skb data */ +/** + * batadv_gw_out_of_range - check if the dhcp request destination is the best gw + * @bat_priv: the bat priv with all the soft interface information + * @skb: the outgoing packet + * + * Check if the skb is a DHCP request and if it is sent to the current best GW + * server. Due to topology changes it may be the case that the GW server + * previously selected is not the best one anymore. + * + * Returns true if the packet destination is unicast and it is not the best gw, + * false otherwise. + * + * This call might reallocate skb data. + */ bool batadv_gw_out_of_range(struct batadv_priv *bat_priv, struct sk_buff *skb) { struct batadv_neigh_node *neigh_curr = NULL, *neigh_old = NULL; struct batadv_orig_node *orig_dst_node = NULL; - struct batadv_gw_node *curr_gw = NULL; + struct batadv_gw_node *gw_node = NULL, *curr_gw = NULL; struct ethhdr *ethhdr; bool ret, out_of_range = false; unsigned int header_len = 0; uint8_t curr_tq_avg; + unsigned short vid; + + vid = batadv_get_vid(skb, 0); ret = batadv_gw_is_dhcp_target(skb, &header_len); if (!ret) @@ -687,11 +755,12 @@ bool batadv_gw_out_of_range(struct batadv_priv *bat_priv, ethhdr = (struct ethhdr *)skb->data; orig_dst_node = batadv_transtable_search(bat_priv, ethhdr->h_source, - ethhdr->h_dest); + ethhdr->h_dest, vid); if (!orig_dst_node) goto out; - if (!orig_dst_node->gw_flags) + gw_node = batadv_gw_node_get(bat_priv, orig_dst_node); + if (!gw_node->bandwidth_down == 0) goto out; ret = batadv_is_type_dhcprequest(skb, header_len); @@ -723,7 +792,7 @@ bool batadv_gw_out_of_range(struct batadv_priv *bat_priv, if (!neigh_curr) goto out; - curr_tq_avg = neigh_curr->tq_avg; + curr_tq_avg = neigh_curr->bat_iv.tq_avg; break; case BATADV_GW_MODE_OFF: default: @@ -734,7 +803,7 @@ bool batadv_gw_out_of_range(struct batadv_priv *bat_priv, if (!neigh_old) goto out; - if (curr_tq_avg - neigh_old->tq_avg > BATADV_GW_THRESHOLD) + if (curr_tq_avg - neigh_old->bat_iv.tq_avg > BATADV_GW_THRESHOLD) out_of_range = true; out: @@ -742,6 +811,8 @@ out: batadv_orig_node_free_ref(orig_dst_node); if (curr_gw) batadv_gw_node_free_ref(curr_gw); + if (gw_node) + batadv_gw_node_free_ref(gw_node); if (neigh_old) batadv_neigh_node_free_ref(neigh_old); if (neigh_curr) diff --git a/net/batman-adv/gateway_client.h b/net/batman-adv/gateway_client.h index ceef4ebe8bcd..d95c2d23195e 100644 --- a/net/batman-adv/gateway_client.h +++ b/net/batman-adv/gateway_client.h @@ -29,7 +29,7 @@ void batadv_gw_check_election(struct batadv_priv *bat_priv, struct batadv_orig_node *orig_node); void batadv_gw_node_update(struct batadv_priv *bat_priv, struct batadv_orig_node *orig_node, - uint8_t new_gwflags); + struct batadv_tvlv_gateway_data *gateway); void batadv_gw_node_delete(struct batadv_priv *bat_priv, struct batadv_orig_node *orig_node); void batadv_gw_node_purge(struct batadv_priv *bat_priv); diff --git a/net/batman-adv/gateway_common.c b/net/batman-adv/gateway_common.c index 84bb2b18d711..b211b0f9cb78 100644 --- a/net/batman-adv/gateway_common.c +++ b/net/batman-adv/gateway_common.c @@ -21,64 +21,23 @@ #include "gateway_common.h" #include "gateway_client.h" -/* calculates the gateway class from kbit */ -static void batadv_kbit_to_gw_bandwidth(int down, int up, long *gw_srv_class) -{ - int mdown = 0, tdown, tup, difference; - uint8_t sbit, part; - - *gw_srv_class = 0; - difference = 0x0FFFFFFF; - - /* test all downspeeds */ - for (sbit = 0; sbit < 2; sbit++) { - for (part = 0; part < 16; part++) { - tdown = 32 * (sbit + 2) * (1 << part); - - if (abs(tdown - down) < difference) { - *gw_srv_class = (sbit << 7) + (part << 3); - difference = abs(tdown - down); - mdown = tdown; - } - } - } - - /* test all upspeeds */ - difference = 0x0FFFFFFF; - - for (part = 0; part < 8; part++) { - tup = ((part + 1) * (mdown)) / 8; - - if (abs(tup - up) < difference) { - *gw_srv_class = (*gw_srv_class & 0xF8) | part; - difference = abs(tup - up); - } - } -} - -/* returns the up and downspeeds in kbit, calculated from the class */ -void batadv_gw_bandwidth_to_kbit(uint8_t gw_srv_class, int *down, int *up) -{ - int sbit = (gw_srv_class & 0x80) >> 7; - int dpart = (gw_srv_class & 0x78) >> 3; - int upart = (gw_srv_class & 0x07); - - if (!gw_srv_class) { - *down = 0; - *up = 0; - return; - } - - *down = 32 * (sbit + 2) * (1 << dpart); - *up = ((upart + 1) * (*down)) / 8; -} - +/** + * batadv_parse_gw_bandwidth - parse supplied string buffer to extract download + * and upload bandwidth information + * @net_dev: the soft interface net device + * @buff: string buffer to parse + * @down: pointer holding the returned download bandwidth information + * @up: pointer holding the returned upload bandwidth information + * + * Returns false on parse error and true otherwise. + */ static bool batadv_parse_gw_bandwidth(struct net_device *net_dev, char *buff, - int *up, int *down) + uint32_t *down, uint32_t *up) { - int ret, multi = 1; + enum batadv_bandwidth_units bw_unit_type = BATADV_BW_UNIT_KBIT; char *slash_ptr, *tmp_ptr; long ldown, lup; + int ret; slash_ptr = strchr(buff, '/'); if (slash_ptr) @@ -88,10 +47,10 @@ static bool batadv_parse_gw_bandwidth(struct net_device *net_dev, char *buff, tmp_ptr = buff + strlen(buff) - 4; if (strnicmp(tmp_ptr, "mbit", 4) == 0) - multi = 1024; + bw_unit_type = BATADV_BW_UNIT_MBIT; if ((strnicmp(tmp_ptr, "kbit", 4) == 0) || - (multi > 1)) + (bw_unit_type == BATADV_BW_UNIT_MBIT)) *tmp_ptr = '\0'; } @@ -103,20 +62,28 @@ static bool batadv_parse_gw_bandwidth(struct net_device *net_dev, char *buff, return false; } - *down = ldown * multi; + switch (bw_unit_type) { + case BATADV_BW_UNIT_MBIT: + *down = ldown * 10; + break; + case BATADV_BW_UNIT_KBIT: + default: + *down = ldown / 100; + break; + } /* we also got some upload info */ if (slash_ptr) { - multi = 1; + bw_unit_type = BATADV_BW_UNIT_KBIT; if (strlen(slash_ptr + 1) > 4) { tmp_ptr = slash_ptr + 1 - 4 + strlen(slash_ptr + 1); if (strnicmp(tmp_ptr, "mbit", 4) == 0) - multi = 1024; + bw_unit_type = BATADV_BW_UNIT_MBIT; if ((strnicmp(tmp_ptr, "kbit", 4) == 0) || - (multi > 1)) + (bw_unit_type == BATADV_BW_UNIT_MBIT)) *tmp_ptr = '\0'; } @@ -128,52 +95,149 @@ static bool batadv_parse_gw_bandwidth(struct net_device *net_dev, char *buff, return false; } - *up = lup * multi; + switch (bw_unit_type) { + case BATADV_BW_UNIT_MBIT: + *up = lup * 10; + break; + case BATADV_BW_UNIT_KBIT: + default: + *up = lup / 100; + break; + } } return true; } +/** + * batadv_gw_tvlv_container_update - update the gw tvlv container after gateway + * setting change + * @bat_priv: the bat priv with all the soft interface information + */ +void batadv_gw_tvlv_container_update(struct batadv_priv *bat_priv) +{ + struct batadv_tvlv_gateway_data gw; + uint32_t down, up; + char gw_mode; + + gw_mode = atomic_read(&bat_priv->gw_mode); + + switch (gw_mode) { + case BATADV_GW_MODE_OFF: + case BATADV_GW_MODE_CLIENT: + batadv_tvlv_container_unregister(bat_priv, BATADV_TVLV_GW, 1); + break; + case BATADV_GW_MODE_SERVER: + down = atomic_read(&bat_priv->gw.bandwidth_down); + up = atomic_read(&bat_priv->gw.bandwidth_up); + gw.bandwidth_down = htonl(down); + gw.bandwidth_up = htonl(up); + batadv_tvlv_container_register(bat_priv, BATADV_TVLV_GW, 1, + &gw, sizeof(gw)); + break; + } +} + ssize_t batadv_gw_bandwidth_set(struct net_device *net_dev, char *buff, size_t count) { struct batadv_priv *bat_priv = netdev_priv(net_dev); - long gw_bandwidth_tmp = 0; - int up = 0, down = 0; + uint32_t down_curr, up_curr, down_new = 0, up_new = 0; bool ret; - ret = batadv_parse_gw_bandwidth(net_dev, buff, &up, &down); + down_curr = (unsigned int)atomic_read(&bat_priv->gw.bandwidth_down); + up_curr = (unsigned int)atomic_read(&bat_priv->gw.bandwidth_up); + + ret = batadv_parse_gw_bandwidth(net_dev, buff, &down_new, &up_new); if (!ret) goto end; - if ((!down) || (down < 256)) - down = 2000; - - if (!up) - up = down / 5; + if (!down_new) + down_new = 1; - batadv_kbit_to_gw_bandwidth(down, up, &gw_bandwidth_tmp); + if (!up_new) + up_new = down_new / 5; - /* the gw bandwidth we guessed above might not match the given - * speeds, hence we need to calculate it back to show the number - * that is going to be propagated - */ - batadv_gw_bandwidth_to_kbit((uint8_t)gw_bandwidth_tmp, &down, &up); + if (!up_new) + up_new = 1; - if (atomic_read(&bat_priv->gw_bandwidth) == gw_bandwidth_tmp) + if ((down_curr == down_new) && (up_curr == up_new)) return count; batadv_gw_deselect(bat_priv); batadv_info(net_dev, - "Changing gateway bandwidth from: '%i' to: '%ld' (propagating: %d%s/%d%s)\n", - atomic_read(&bat_priv->gw_bandwidth), gw_bandwidth_tmp, - (down > 2048 ? down / 1024 : down), - (down > 2048 ? "MBit" : "KBit"), - (up > 2048 ? up / 1024 : up), - (up > 2048 ? "MBit" : "KBit")); + "Changing gateway bandwidth from: '%u.%u/%u.%u MBit' to: '%u.%u/%u.%u MBit'\n", + down_curr / 10, down_curr % 10, up_curr / 10, up_curr % 10, + down_new / 10, down_new % 10, up_new / 10, up_new % 10); - atomic_set(&bat_priv->gw_bandwidth, gw_bandwidth_tmp); + atomic_set(&bat_priv->gw.bandwidth_down, down_new); + atomic_set(&bat_priv->gw.bandwidth_up, up_new); + batadv_gw_tvlv_container_update(bat_priv); end: return count; } + +/** + * batadv_gw_tvlv_ogm_handler_v1 - process incoming gateway tvlv container + * @bat_priv: the bat priv with all the soft interface information + * @orig: the orig_node of the ogm + * @flags: flags indicating the tvlv state (see batadv_tvlv_handler_flags) + * @tvlv_value: tvlv buffer containing the gateway data + * @tvlv_value_len: tvlv buffer length + */ +static void batadv_gw_tvlv_ogm_handler_v1(struct batadv_priv *bat_priv, + struct batadv_orig_node *orig, + uint8_t flags, + void *tvlv_value, + uint16_t tvlv_value_len) +{ + struct batadv_tvlv_gateway_data gateway, *gateway_ptr; + + /* only fetch the tvlv value if the handler wasn't called via the + * CIFNOTFND flag and if there is data to fetch + */ + if ((flags & BATADV_TVLV_HANDLER_OGM_CIFNOTFND) || + (tvlv_value_len < sizeof(gateway))) { + gateway.bandwidth_down = 0; + gateway.bandwidth_up = 0; + } else { + gateway_ptr = tvlv_value; + gateway.bandwidth_down = gateway_ptr->bandwidth_down; + gateway.bandwidth_up = gateway_ptr->bandwidth_up; + if ((gateway.bandwidth_down == 0) || + (gateway.bandwidth_up == 0)) { + gateway.bandwidth_down = 0; + gateway.bandwidth_up = 0; + } + } + + batadv_gw_node_update(bat_priv, orig, &gateway); + + /* restart gateway selection if fast or late switching was enabled */ + if ((gateway.bandwidth_down != 0) && + (atomic_read(&bat_priv->gw_mode) == BATADV_GW_MODE_CLIENT) && + (atomic_read(&bat_priv->gw_sel_class) > 2)) + batadv_gw_check_election(bat_priv, orig); +} + +/** + * batadv_gw_init - initialise the gateway handling internals + * @bat_priv: the bat priv with all the soft interface information + */ +void batadv_gw_init(struct batadv_priv *bat_priv) +{ + batadv_tvlv_handler_register(bat_priv, batadv_gw_tvlv_ogm_handler_v1, + NULL, BATADV_TVLV_GW, 1, + BATADV_TVLV_HANDLER_OGM_CIFNOTFND); +} + +/** + * batadv_gw_free - free the gateway handling internals + * @bat_priv: the bat priv with all the soft interface information + */ +void batadv_gw_free(struct batadv_priv *bat_priv) +{ + batadv_tvlv_container_unregister(bat_priv, BATADV_TVLV_GW, 1); + batadv_tvlv_handler_unregister(bat_priv, BATADV_TVLV_GW, 1); +} diff --git a/net/batman-adv/gateway_common.h b/net/batman-adv/gateway_common.h index 509b2bf8c2f4..56384a4cd18c 100644 --- a/net/batman-adv/gateway_common.h +++ b/net/batman-adv/gateway_common.h @@ -26,12 +26,24 @@ enum batadv_gw_modes { BATADV_GW_MODE_SERVER, }; +/** + * enum batadv_bandwidth_units - bandwidth unit types + * @BATADV_BW_UNIT_KBIT: unit type kbit + * @BATADV_BW_UNIT_MBIT: unit type mbit + */ +enum batadv_bandwidth_units { + BATADV_BW_UNIT_KBIT, + BATADV_BW_UNIT_MBIT, +}; + #define BATADV_GW_MODE_OFF_NAME "off" #define BATADV_GW_MODE_CLIENT_NAME "client" #define BATADV_GW_MODE_SERVER_NAME "server" -void batadv_gw_bandwidth_to_kbit(uint8_t gw_class, int *down, int *up); ssize_t batadv_gw_bandwidth_set(struct net_device *net_dev, char *buff, size_t count); +void batadv_gw_tvlv_container_update(struct batadv_priv *bat_priv); +void batadv_gw_init(struct batadv_priv *bat_priv); +void batadv_gw_free(struct batadv_priv *bat_priv); #endif /* _NET_BATMAN_ADV_GATEWAY_COMMON_H_ */ diff --git a/net/batman-adv/hard-interface.c b/net/batman-adv/hard-interface.c index c478e6bcf89b..57c2a19dcb5c 100644 --- a/net/batman-adv/hard-interface.c +++ b/net/batman-adv/hard-interface.c @@ -28,6 +28,7 @@ #include "originator.h" #include "hash.h" #include "bridge_loop_avoidance.h" +#include "gateway_client.h" #include <linux/if_arp.h> #include <linux/if_ether.h> @@ -124,8 +125,11 @@ static int batadv_is_valid_iface(const struct net_device *net_dev) * * Returns true if the net device is a 802.11 wireless device, false otherwise. */ -static bool batadv_is_wifi_netdev(struct net_device *net_device) +bool batadv_is_wifi_netdev(struct net_device *net_device) { + if (!net_device) + return false; + #ifdef CONFIG_WIRELESS_EXT /* pre-cfg80211 drivers have to implement WEXT, so it is possible to * check for wireless_handlers != NULL @@ -141,34 +145,6 @@ static bool batadv_is_wifi_netdev(struct net_device *net_device) return false; } -/** - * batadv_is_wifi_iface - check if the given interface represented by ifindex - * is a wifi interface - * @ifindex: interface index to check - * - * Returns true if the interface represented by ifindex is a 802.11 wireless - * device, false otherwise. - */ -bool batadv_is_wifi_iface(int ifindex) -{ - struct net_device *net_device = NULL; - bool ret = false; - - if (ifindex == BATADV_NULL_IFINDEX) - goto out; - - net_device = dev_get_by_index(&init_net, ifindex); - if (!net_device) - goto out; - - ret = batadv_is_wifi_netdev(net_device); - -out: - if (net_device) - dev_put(net_device); - return ret; -} - static struct batadv_hard_iface * batadv_hardif_get_active(const struct net_device *soft_iface) { @@ -194,22 +170,13 @@ out: static void batadv_primary_if_update_addr(struct batadv_priv *bat_priv, struct batadv_hard_iface *oldif) { - struct batadv_vis_packet *vis_packet; struct batadv_hard_iface *primary_if; - struct sk_buff *skb; primary_if = batadv_primary_if_get_selected(bat_priv); if (!primary_if) goto out; batadv_dat_init_own_addr(bat_priv, primary_if); - - skb = bat_priv->vis.my_info->skb_packet; - vis_packet = (struct batadv_vis_packet *)skb->data; - memcpy(vis_packet->vis_orig, primary_if->net_dev->dev_addr, ETH_ALEN); - memcpy(vis_packet->sender_orig, - primary_if->net_dev->dev_addr, ETH_ALEN); - batadv_bla_update_orig_address(bat_priv, primary_if, oldif); out: if (primary_if) @@ -275,16 +242,10 @@ static void batadv_check_known_mac_addr(const struct net_device *net_dev) int batadv_hardif_min_mtu(struct net_device *soft_iface) { - const struct batadv_priv *bat_priv = netdev_priv(soft_iface); + struct batadv_priv *bat_priv = netdev_priv(soft_iface); const struct batadv_hard_iface *hard_iface; - /* allow big frames if all devices are capable to do so - * (have MTU > 1500 + BAT_HEADER_LEN) - */ int min_mtu = ETH_DATA_LEN; - if (atomic_read(&bat_priv->fragmentation)) - goto out; - rcu_read_lock(); list_for_each_entry_rcu(hard_iface, &batadv_hardif_list, list) { if ((hard_iface->if_status != BATADV_IF_ACTIVE) && @@ -294,23 +255,40 @@ int batadv_hardif_min_mtu(struct net_device *soft_iface) if (hard_iface->soft_iface != soft_iface) continue; - min_mtu = min_t(int, - hard_iface->net_dev->mtu - BATADV_HEADER_LEN, - min_mtu); + min_mtu = min_t(int, hard_iface->net_dev->mtu, min_mtu); } rcu_read_unlock(); + + atomic_set(&bat_priv->packet_size_max, min_mtu); + + if (atomic_read(&bat_priv->fragmentation) == 0) + goto out; + + /* with fragmentation enabled the maximum size of internally generated + * packets such as translation table exchanges or tvlv containers, etc + * has to be calculated + */ + min_mtu = min_t(int, min_mtu, BATADV_FRAG_MAX_FRAG_SIZE); + min_mtu -= sizeof(struct batadv_frag_packet); + min_mtu *= BATADV_FRAG_MAX_FRAGMENTS; + atomic_set(&bat_priv->packet_size_max, min_mtu); + + /* with fragmentation enabled we can fragment external packets easily */ + min_mtu = min_t(int, min_mtu, ETH_DATA_LEN); + out: - return min_mtu; + return min_mtu - batadv_max_header_len(); } /* adjusts the MTU if a new interface with a smaller MTU appeared. */ void batadv_update_min_mtu(struct net_device *soft_iface) { - int min_mtu; + soft_iface->mtu = batadv_hardif_min_mtu(soft_iface); - min_mtu = batadv_hardif_min_mtu(soft_iface); - if (soft_iface->mtu != min_mtu) - soft_iface->mtu = min_mtu; + /* Check if the local translate table should be cleaned up to match a + * new (and smaller) MTU. + */ + batadv_tt_local_resize_to_mtu(soft_iface); } static void @@ -388,7 +366,8 @@ int batadv_hardif_enable_interface(struct batadv_hard_iface *hard_iface, { struct batadv_priv *bat_priv; struct net_device *soft_iface, *master; - __be16 ethertype = __constant_htons(ETH_P_BATMAN); + __be16 ethertype = htons(ETH_P_BATMAN); + int max_header_len = batadv_max_header_len(); int ret; if (hard_iface->if_status != BATADV_IF_NOT_IN_USE) @@ -453,23 +432,22 @@ int batadv_hardif_enable_interface(struct batadv_hard_iface *hard_iface, hard_iface->batman_adv_ptype.dev = hard_iface->net_dev; dev_add_pack(&hard_iface->batman_adv_ptype); - atomic_set(&hard_iface->frag_seqno, 1); batadv_info(hard_iface->soft_iface, "Adding interface: %s\n", hard_iface->net_dev->name); if (atomic_read(&bat_priv->fragmentation) && - hard_iface->net_dev->mtu < ETH_DATA_LEN + BATADV_HEADER_LEN) + hard_iface->net_dev->mtu < ETH_DATA_LEN + max_header_len) batadv_info(hard_iface->soft_iface, - "The MTU of interface %s is too small (%i) to handle the transport of batman-adv packets. Packets going over this interface will be fragmented on layer2 which could impact the performance. Setting the MTU to %zi would solve the problem.\n", + "The MTU of interface %s is too small (%i) to handle the transport of batman-adv packets. Packets going over this interface will be fragmented on layer2 which could impact the performance. Setting the MTU to %i would solve the problem.\n", hard_iface->net_dev->name, hard_iface->net_dev->mtu, - ETH_DATA_LEN + BATADV_HEADER_LEN); + ETH_DATA_LEN + max_header_len); if (!atomic_read(&bat_priv->fragmentation) && - hard_iface->net_dev->mtu < ETH_DATA_LEN + BATADV_HEADER_LEN) + hard_iface->net_dev->mtu < ETH_DATA_LEN + max_header_len) batadv_info(hard_iface->soft_iface, - "The MTU of interface %s is too small (%i) to handle the transport of batman-adv packets. If you experience problems getting traffic through try increasing the MTU to %zi.\n", + "The MTU of interface %s is too small (%i) to handle the transport of batman-adv packets. If you experience problems getting traffic through try increasing the MTU to %i.\n", hard_iface->net_dev->name, hard_iface->net_dev->mtu, - ETH_DATA_LEN + BATADV_HEADER_LEN); + ETH_DATA_LEN + max_header_len); if (batadv_hardif_is_iface_up(hard_iface)) batadv_hardif_activate_interface(hard_iface); @@ -533,8 +511,12 @@ void batadv_hardif_disable_interface(struct batadv_hard_iface *hard_iface, dev_put(hard_iface->soft_iface); /* nobody uses this interface anymore */ - if (!bat_priv->num_ifaces && autodel == BATADV_IF_CLEANUP_AUTO) - batadv_softif_destroy_sysfs(hard_iface->soft_iface); + if (!bat_priv->num_ifaces) { + batadv_gw_check_client_stop(bat_priv); + + if (autodel == BATADV_IF_CLEANUP_AUTO) + batadv_softif_destroy_sysfs(hard_iface->soft_iface); + } netdev_upper_dev_unlink(hard_iface->net_dev, hard_iface->soft_iface); hard_iface->soft_iface = NULL; @@ -652,6 +634,8 @@ static int batadv_hard_if_event(struct notifier_block *this, if (batadv_softif_is_valid(net_dev) && event == NETDEV_REGISTER) { batadv_sysfs_add_meshif(net_dev); + bat_priv = netdev_priv(net_dev); + batadv_softif_create_vlan(bat_priv, BATADV_NO_FLAGS); return NOTIFY_DONE; } diff --git a/net/batman-adv/hard-interface.h b/net/batman-adv/hard-interface.h index 49892881a7c5..df4c8bd45c40 100644 --- a/net/batman-adv/hard-interface.h +++ b/net/batman-adv/hard-interface.h @@ -41,6 +41,7 @@ enum batadv_hard_if_cleanup { extern struct notifier_block batadv_hard_if_notifier; +bool batadv_is_wifi_netdev(struct net_device *net_device); struct batadv_hard_iface* batadv_hardif_get_by_netdev(const struct net_device *net_dev); int batadv_hardif_enable_interface(struct batadv_hard_iface *hard_iface, @@ -51,7 +52,6 @@ void batadv_hardif_remove_interfaces(void); int batadv_hardif_min_mtu(struct net_device *soft_iface); void batadv_update_min_mtu(struct net_device *soft_iface); void batadv_hardif_free_rcu(struct rcu_head *rcu); -bool batadv_is_wifi_iface(int ifindex); static inline void batadv_hardif_free_ref(struct batadv_hard_iface *hard_iface) diff --git a/net/batman-adv/icmp_socket.c b/net/batman-adv/icmp_socket.c index 5a99bb4b6b82..29ae4efe3543 100644 --- a/net/batman-adv/icmp_socket.c +++ b/net/batman-adv/icmp_socket.c @@ -29,7 +29,7 @@ static struct batadv_socket_client *batadv_socket_client_hash[256]; static void batadv_socket_add_packet(struct batadv_socket_client *socket_client, - struct batadv_icmp_packet_rr *icmp_packet, + struct batadv_icmp_header *icmph, size_t icmp_len); void batadv_socket_init(void) @@ -155,13 +155,13 @@ static ssize_t batadv_socket_write(struct file *file, const char __user *buff, struct batadv_priv *bat_priv = socket_client->bat_priv; struct batadv_hard_iface *primary_if = NULL; struct sk_buff *skb; - struct batadv_icmp_packet_rr *icmp_packet; - + struct batadv_icmp_packet_rr *icmp_packet_rr; + struct batadv_icmp_header *icmp_header; struct batadv_orig_node *orig_node = NULL; struct batadv_neigh_node *neigh_node = NULL; size_t packet_len = sizeof(struct batadv_icmp_packet); - if (len < sizeof(struct batadv_icmp_packet)) { + if (len < sizeof(struct batadv_icmp_header)) { batadv_dbg(BATADV_DBG_BATMAN, bat_priv, "Error - can't send packet from char device: invalid packet size\n"); return -EINVAL; @@ -174,8 +174,10 @@ static ssize_t batadv_socket_write(struct file *file, const char __user *buff, goto out; } - if (len >= sizeof(struct batadv_icmp_packet_rr)) - packet_len = sizeof(struct batadv_icmp_packet_rr); + if (len >= BATADV_ICMP_MAX_PACKET_SIZE) + packet_len = BATADV_ICMP_MAX_PACKET_SIZE; + else + packet_len = len; skb = netdev_alloc_skb_ip_align(NULL, packet_len + ETH_HLEN); if (!skb) { @@ -185,67 +187,78 @@ static ssize_t batadv_socket_write(struct file *file, const char __user *buff, skb->priority = TC_PRIO_CONTROL; skb_reserve(skb, ETH_HLEN); - icmp_packet = (struct batadv_icmp_packet_rr *)skb_put(skb, packet_len); + icmp_header = (struct batadv_icmp_header *)skb_put(skb, packet_len); - if (copy_from_user(icmp_packet, buff, packet_len)) { + if (copy_from_user(icmp_header, buff, packet_len)) { len = -EFAULT; goto free_skb; } - if (icmp_packet->header.packet_type != BATADV_ICMP) { + if (icmp_header->header.packet_type != BATADV_ICMP) { batadv_dbg(BATADV_DBG_BATMAN, bat_priv, "Error - can't send packet from char device: got bogus packet type (expected: BAT_ICMP)\n"); len = -EINVAL; goto free_skb; } - if (icmp_packet->msg_type != BATADV_ECHO_REQUEST) { + switch (icmp_header->msg_type) { + case BATADV_ECHO_REQUEST: + if (len < sizeof(struct batadv_icmp_packet)) { + batadv_dbg(BATADV_DBG_BATMAN, bat_priv, + "Error - can't send packet from char device: invalid packet size\n"); + len = -EINVAL; + goto free_skb; + } + + if (atomic_read(&bat_priv->mesh_state) != BATADV_MESH_ACTIVE) + goto dst_unreach; + + orig_node = batadv_orig_hash_find(bat_priv, icmp_header->dst); + if (!orig_node) + goto dst_unreach; + + neigh_node = batadv_orig_node_get_router(orig_node); + if (!neigh_node) + goto dst_unreach; + + if (!neigh_node->if_incoming) + goto dst_unreach; + + if (neigh_node->if_incoming->if_status != BATADV_IF_ACTIVE) + goto dst_unreach; + + icmp_packet_rr = (struct batadv_icmp_packet_rr *)icmp_header; + if (packet_len == sizeof(*icmp_packet_rr)) + memcpy(icmp_packet_rr->rr, + neigh_node->if_incoming->net_dev->dev_addr, + ETH_ALEN); + + break; + default: batadv_dbg(BATADV_DBG_BATMAN, bat_priv, - "Error - can't send packet from char device: got bogus message type (expected: ECHO_REQUEST)\n"); + "Error - can't send packet from char device: got unknown message type\n"); len = -EINVAL; goto free_skb; } - icmp_packet->uid = socket_client->index; + icmp_header->uid = socket_client->index; - if (icmp_packet->header.version != BATADV_COMPAT_VERSION) { - icmp_packet->msg_type = BATADV_PARAMETER_PROBLEM; - icmp_packet->header.version = BATADV_COMPAT_VERSION; - batadv_socket_add_packet(socket_client, icmp_packet, + if (icmp_header->header.version != BATADV_COMPAT_VERSION) { + icmp_header->msg_type = BATADV_PARAMETER_PROBLEM; + icmp_header->header.version = BATADV_COMPAT_VERSION; + batadv_socket_add_packet(socket_client, icmp_header, packet_len); goto free_skb; } - if (atomic_read(&bat_priv->mesh_state) != BATADV_MESH_ACTIVE) - goto dst_unreach; - - orig_node = batadv_orig_hash_find(bat_priv, icmp_packet->dst); - if (!orig_node) - goto dst_unreach; - - neigh_node = batadv_orig_node_get_router(orig_node); - if (!neigh_node) - goto dst_unreach; - - if (!neigh_node->if_incoming) - goto dst_unreach; - - if (neigh_node->if_incoming->if_status != BATADV_IF_ACTIVE) - goto dst_unreach; - - memcpy(icmp_packet->orig, - primary_if->net_dev->dev_addr, ETH_ALEN); - - if (packet_len == sizeof(struct batadv_icmp_packet_rr)) - memcpy(icmp_packet->rr, - neigh_node->if_incoming->net_dev->dev_addr, ETH_ALEN); + memcpy(icmp_header->orig, primary_if->net_dev->dev_addr, ETH_ALEN); batadv_send_skb_packet(skb, neigh_node->if_incoming, neigh_node->addr); goto out; dst_unreach: - icmp_packet->msg_type = BATADV_DESTINATION_UNREACHABLE; - batadv_socket_add_packet(socket_client, icmp_packet, packet_len); + icmp_header->msg_type = BATADV_DESTINATION_UNREACHABLE; + batadv_socket_add_packet(socket_client, icmp_header, packet_len); free_skb: kfree_skb(skb); out: @@ -298,27 +311,40 @@ err: return -ENOMEM; } +/** + * batadv_socket_receive_packet - schedule an icmp packet to be sent to userspace + * on an icmp socket. + * @socket_client: the socket this packet belongs to + * @icmph: pointer to the header of the icmp packet + * @icmp_len: total length of the icmp packet + */ static void batadv_socket_add_packet(struct batadv_socket_client *socket_client, - struct batadv_icmp_packet_rr *icmp_packet, + struct batadv_icmp_header *icmph, size_t icmp_len) { struct batadv_socket_packet *socket_packet; + size_t len; socket_packet = kmalloc(sizeof(*socket_packet), GFP_ATOMIC); if (!socket_packet) return; + len = icmp_len; + /* check the maximum length before filling the buffer */ + if (len > sizeof(socket_packet->icmp_packet)) + len = sizeof(socket_packet->icmp_packet); + INIT_LIST_HEAD(&socket_packet->list); - memcpy(&socket_packet->icmp_packet, icmp_packet, icmp_len); - socket_packet->icmp_len = icmp_len; + memcpy(&socket_packet->icmp_packet, icmph, len); + socket_packet->icmp_len = len; spin_lock_bh(&socket_client->lock); /* while waiting for the lock the socket_client could have been * deleted */ - if (!batadv_socket_client_hash[icmp_packet->uid]) { + if (!batadv_socket_client_hash[icmph->uid]) { spin_unlock_bh(&socket_client->lock); kfree(socket_packet); return; @@ -342,12 +368,18 @@ static void batadv_socket_add_packet(struct batadv_socket_client *socket_client, wake_up(&socket_client->queue_wait); } -void batadv_socket_receive_packet(struct batadv_icmp_packet_rr *icmp_packet, +/** + * batadv_socket_receive_packet - schedule an icmp packet to be received + * locally and sent to userspace. + * @icmph: pointer to the header of the icmp packet + * @icmp_len: total length of the icmp packet + */ +void batadv_socket_receive_packet(struct batadv_icmp_header *icmph, size_t icmp_len) { struct batadv_socket_client *hash; - hash = batadv_socket_client_hash[icmp_packet->uid]; + hash = batadv_socket_client_hash[icmph->uid]; if (hash) - batadv_socket_add_packet(hash, icmp_packet, icmp_len); + batadv_socket_add_packet(hash, icmph, icmp_len); } diff --git a/net/batman-adv/icmp_socket.h b/net/batman-adv/icmp_socket.h index 1fcca37b6223..6665080dff79 100644 --- a/net/batman-adv/icmp_socket.h +++ b/net/batman-adv/icmp_socket.h @@ -24,7 +24,7 @@ void batadv_socket_init(void); int batadv_socket_setup(struct batadv_priv *bat_priv); -void batadv_socket_receive_packet(struct batadv_icmp_packet_rr *icmp_packet, +void batadv_socket_receive_packet(struct batadv_icmp_header *icmph, size_t icmp_len); #endif /* _NET_BATMAN_ADV_ICMP_SOCKET_H_ */ diff --git a/net/batman-adv/main.c b/net/batman-adv/main.c index 1356af660b5b..c51a5e568f0a 100644 --- a/net/batman-adv/main.c +++ b/net/batman-adv/main.c @@ -36,10 +36,11 @@ #include "gateway_client.h" #include "bridge_loop_avoidance.h" #include "distributed-arp-table.h" -#include "vis.h" +#include "gateway_common.h" #include "hash.h" #include "bat_algo.h" #include "network-coding.h" +#include "fragmentation.h" /* List manipulations on hardif_list have to be rtnl_lock()'ed, @@ -109,9 +110,11 @@ int batadv_mesh_init(struct net_device *soft_iface) spin_lock_init(&bat_priv->tt.req_list_lock); spin_lock_init(&bat_priv->tt.roam_list_lock); spin_lock_init(&bat_priv->tt.last_changeset_lock); + spin_lock_init(&bat_priv->tt.commit_lock); spin_lock_init(&bat_priv->gw.list_lock); - spin_lock_init(&bat_priv->vis.hash_lock); - spin_lock_init(&bat_priv->vis.list_lock); + spin_lock_init(&bat_priv->tvlv.container_list_lock); + spin_lock_init(&bat_priv->tvlv.handler_list_lock); + spin_lock_init(&bat_priv->softif_vlan_list_lock); INIT_HLIST_HEAD(&bat_priv->forw_bat_list); INIT_HLIST_HEAD(&bat_priv->forw_bcast_list); @@ -119,6 +122,9 @@ int batadv_mesh_init(struct net_device *soft_iface) INIT_LIST_HEAD(&bat_priv->tt.changes_list); INIT_LIST_HEAD(&bat_priv->tt.req_list); INIT_LIST_HEAD(&bat_priv->tt.roam_list); + INIT_HLIST_HEAD(&bat_priv->tvlv.container_list); + INIT_HLIST_HEAD(&bat_priv->tvlv.handler_list); + INIT_HLIST_HEAD(&bat_priv->softif_vlan_list); ret = batadv_originator_init(bat_priv); if (ret < 0) @@ -128,13 +134,6 @@ int batadv_mesh_init(struct net_device *soft_iface) if (ret < 0) goto err; - batadv_tt_local_add(soft_iface, soft_iface->dev_addr, - BATADV_NULL_IFINDEX); - - ret = batadv_vis_init(bat_priv); - if (ret < 0) - goto err; - ret = batadv_bla_init(bat_priv); if (ret < 0) goto err; @@ -147,6 +146,8 @@ int batadv_mesh_init(struct net_device *soft_iface) if (ret < 0) goto err; + batadv_gw_init(bat_priv); + atomic_set(&bat_priv->gw.reselect, 0); atomic_set(&bat_priv->mesh_state, BATADV_MESH_ACTIVE); @@ -165,8 +166,6 @@ void batadv_mesh_free(struct net_device *soft_iface) batadv_purge_outstanding_packets(bat_priv, NULL); - batadv_vis_quit(bat_priv); - batadv_gw_node_purge(bat_priv); batadv_nc_mesh_free(bat_priv); batadv_dat_free(bat_priv); @@ -185,6 +184,8 @@ void batadv_mesh_free(struct net_device *soft_iface) */ batadv_originator_free(bat_priv); + batadv_gw_free(bat_priv); + free_percpu(bat_priv->bat_counters); bat_priv->bat_counters = NULL; @@ -255,6 +256,31 @@ out: } /** + * batadv_max_header_len - calculate maximum encapsulation overhead for a + * payload packet + * + * Return the maximum encapsulation overhead in bytes. + */ +int batadv_max_header_len(void) +{ + int header_len = 0; + + header_len = max_t(int, header_len, + sizeof(struct batadv_unicast_packet)); + header_len = max_t(int, header_len, + sizeof(struct batadv_unicast_4addr_packet)); + header_len = max_t(int, header_len, + sizeof(struct batadv_bcast_packet)); + +#ifdef CONFIG_BATMAN_ADV_NC + header_len = max_t(int, header_len, + sizeof(struct batadv_coded_packet)); +#endif + + return header_len; +} + +/** * batadv_skb_set_priority - sets skb priority according to packet content * @skb: the packet to be sent * @offset: offset to the packet content @@ -392,22 +418,31 @@ static void batadv_recv_handler_init(void) for (i = 0; i < ARRAY_SIZE(batadv_rx_handler); i++) batadv_rx_handler[i] = batadv_recv_unhandled_packet; - /* batman icmp packet */ - batadv_rx_handler[BATADV_ICMP] = batadv_recv_icmp_packet; + for (i = BATADV_UNICAST_MIN; i <= BATADV_UNICAST_MAX; i++) + batadv_rx_handler[i] = batadv_recv_unhandled_unicast_packet; + + /* compile time checks for struct member offsets */ + BUILD_BUG_ON(offsetof(struct batadv_unicast_4addr_packet, src) != 10); + BUILD_BUG_ON(offsetof(struct batadv_unicast_packet, dest) != 4); + BUILD_BUG_ON(offsetof(struct batadv_unicast_tvlv_packet, dst) != 4); + BUILD_BUG_ON(offsetof(struct batadv_frag_packet, dest) != 4); + BUILD_BUG_ON(offsetof(struct batadv_icmp_packet, icmph.dst) != 4); + BUILD_BUG_ON(offsetof(struct batadv_icmp_packet_rr, icmph.dst) != 4); + + /* broadcast packet */ + batadv_rx_handler[BATADV_BCAST] = batadv_recv_bcast_packet; + + /* unicast packets ... */ /* unicast with 4 addresses packet */ batadv_rx_handler[BATADV_UNICAST_4ADDR] = batadv_recv_unicast_packet; /* unicast packet */ batadv_rx_handler[BATADV_UNICAST] = batadv_recv_unicast_packet; - /* fragmented unicast packet */ - batadv_rx_handler[BATADV_UNICAST_FRAG] = batadv_recv_ucast_frag_packet; - /* broadcast packet */ - batadv_rx_handler[BATADV_BCAST] = batadv_recv_bcast_packet; - /* vis packet */ - batadv_rx_handler[BATADV_VIS] = batadv_recv_vis_packet; - /* Translation table query (request or response) */ - batadv_rx_handler[BATADV_TT_QUERY] = batadv_recv_tt_query; - /* Roaming advertisement */ - batadv_rx_handler[BATADV_ROAM_ADV] = batadv_recv_roam_adv; + /* unicast tvlv packet */ + batadv_rx_handler[BATADV_UNICAST_TVLV] = batadv_recv_unicast_tvlv; + /* batman icmp packet */ + batadv_rx_handler[BATADV_ICMP] = batadv_recv_icmp_packet; + /* Fragmented packets */ + batadv_rx_handler[BATADV_UNICAST_FRAG] = batadv_recv_frag_packet; } int @@ -415,7 +450,12 @@ batadv_recv_handler_register(uint8_t packet_type, int (*recv_handler)(struct sk_buff *, struct batadv_hard_iface *)) { - if (batadv_rx_handler[packet_type] != &batadv_recv_unhandled_packet) + int (*curr)(struct sk_buff *, + struct batadv_hard_iface *); + curr = batadv_rx_handler[packet_type]; + + if ((curr != batadv_recv_unhandled_packet) && + (curr != batadv_recv_unhandled_unicast_packet)) return -EBUSY; batadv_rx_handler[packet_type] = recv_handler; @@ -461,7 +501,9 @@ int batadv_algo_register(struct batadv_algo_ops *bat_algo_ops) !bat_algo_ops->bat_iface_update_mac || !bat_algo_ops->bat_primary_iface_set || !bat_algo_ops->bat_ogm_schedule || - !bat_algo_ops->bat_ogm_emit) { + !bat_algo_ops->bat_ogm_emit || + !bat_algo_ops->bat_neigh_cmp || + !bat_algo_ops->bat_neigh_is_equiv_or_better) { pr_info("Routing algo '%s' does not implement required ops\n", bat_algo_ops->name); ret = -EINVAL; @@ -536,6 +578,601 @@ __be32 batadv_skb_crc32(struct sk_buff *skb, u8 *payload_ptr) return htonl(crc); } +/** + * batadv_tvlv_handler_free_ref - decrement the tvlv handler refcounter and + * possibly free it + * @tvlv_handler: the tvlv handler to free + */ +static void +batadv_tvlv_handler_free_ref(struct batadv_tvlv_handler *tvlv_handler) +{ + if (atomic_dec_and_test(&tvlv_handler->refcount)) + kfree_rcu(tvlv_handler, rcu); +} + +/** + * batadv_tvlv_handler_get - retrieve tvlv handler from the tvlv handler list + * based on the provided type and version (both need to match) + * @bat_priv: the bat priv with all the soft interface information + * @type: tvlv handler type to look for + * @version: tvlv handler version to look for + * + * Returns tvlv handler if found or NULL otherwise. + */ +static struct batadv_tvlv_handler +*batadv_tvlv_handler_get(struct batadv_priv *bat_priv, + uint8_t type, uint8_t version) +{ + struct batadv_tvlv_handler *tvlv_handler_tmp, *tvlv_handler = NULL; + + rcu_read_lock(); + hlist_for_each_entry_rcu(tvlv_handler_tmp, + &bat_priv->tvlv.handler_list, list) { + if (tvlv_handler_tmp->type != type) + continue; + + if (tvlv_handler_tmp->version != version) + continue; + + if (!atomic_inc_not_zero(&tvlv_handler_tmp->refcount)) + continue; + + tvlv_handler = tvlv_handler_tmp; + break; + } + rcu_read_unlock(); + + return tvlv_handler; +} + +/** + * batadv_tvlv_container_free_ref - decrement the tvlv container refcounter and + * possibly free it + * @tvlv_handler: the tvlv container to free + */ +static void batadv_tvlv_container_free_ref(struct batadv_tvlv_container *tvlv) +{ + if (atomic_dec_and_test(&tvlv->refcount)) + kfree(tvlv); +} + +/** + * batadv_tvlv_container_get - retrieve tvlv container from the tvlv container + * list based on the provided type and version (both need to match) + * @bat_priv: the bat priv with all the soft interface information + * @type: tvlv container type to look for + * @version: tvlv container version to look for + * + * Has to be called with the appropriate locks being acquired + * (tvlv.container_list_lock). + * + * Returns tvlv container if found or NULL otherwise. + */ +static struct batadv_tvlv_container +*batadv_tvlv_container_get(struct batadv_priv *bat_priv, + uint8_t type, uint8_t version) +{ + struct batadv_tvlv_container *tvlv_tmp, *tvlv = NULL; + + hlist_for_each_entry(tvlv_tmp, &bat_priv->tvlv.container_list, list) { + if (tvlv_tmp->tvlv_hdr.type != type) + continue; + + if (tvlv_tmp->tvlv_hdr.version != version) + continue; + + if (!atomic_inc_not_zero(&tvlv_tmp->refcount)) + continue; + + tvlv = tvlv_tmp; + break; + } + + return tvlv; +} + +/** + * batadv_tvlv_container_list_size - calculate the size of the tvlv container + * list entries + * @bat_priv: the bat priv with all the soft interface information + * + * Has to be called with the appropriate locks being acquired + * (tvlv.container_list_lock). + * + * Returns size of all currently registered tvlv containers in bytes. + */ +static uint16_t batadv_tvlv_container_list_size(struct batadv_priv *bat_priv) +{ + struct batadv_tvlv_container *tvlv; + uint16_t tvlv_len = 0; + + hlist_for_each_entry(tvlv, &bat_priv->tvlv.container_list, list) { + tvlv_len += sizeof(struct batadv_tvlv_hdr); + tvlv_len += ntohs(tvlv->tvlv_hdr.len); + } + + return tvlv_len; +} + +/** + * batadv_tvlv_container_remove - remove tvlv container from the tvlv container + * list + * @tvlv: the to be removed tvlv container + * + * Has to be called with the appropriate locks being acquired + * (tvlv.container_list_lock). + */ +static void batadv_tvlv_container_remove(struct batadv_tvlv_container *tvlv) +{ + if (!tvlv) + return; + + hlist_del(&tvlv->list); + + /* first call to decrement the counter, second call to free */ + batadv_tvlv_container_free_ref(tvlv); + batadv_tvlv_container_free_ref(tvlv); +} + +/** + * batadv_tvlv_container_unregister - unregister tvlv container based on the + * provided type and version (both need to match) + * @bat_priv: the bat priv with all the soft interface information + * @type: tvlv container type to unregister + * @version: tvlv container type to unregister + */ +void batadv_tvlv_container_unregister(struct batadv_priv *bat_priv, + uint8_t type, uint8_t version) +{ + struct batadv_tvlv_container *tvlv; + + spin_lock_bh(&bat_priv->tvlv.container_list_lock); + tvlv = batadv_tvlv_container_get(bat_priv, type, version); + batadv_tvlv_container_remove(tvlv); + spin_unlock_bh(&bat_priv->tvlv.container_list_lock); +} + +/** + * batadv_tvlv_container_register - register tvlv type, version and content + * to be propagated with each (primary interface) OGM + * @bat_priv: the bat priv with all the soft interface information + * @type: tvlv container type + * @version: tvlv container version + * @tvlv_value: tvlv container content + * @tvlv_value_len: tvlv container content length + * + * If a container of the same type and version was already registered the new + * content is going to replace the old one. + */ +void batadv_tvlv_container_register(struct batadv_priv *bat_priv, + uint8_t type, uint8_t version, + void *tvlv_value, uint16_t tvlv_value_len) +{ + struct batadv_tvlv_container *tvlv_old, *tvlv_new; + + if (!tvlv_value) + tvlv_value_len = 0; + + tvlv_new = kzalloc(sizeof(*tvlv_new) + tvlv_value_len, GFP_ATOMIC); + if (!tvlv_new) + return; + + tvlv_new->tvlv_hdr.version = version; + tvlv_new->tvlv_hdr.type = type; + tvlv_new->tvlv_hdr.len = htons(tvlv_value_len); + + memcpy(tvlv_new + 1, tvlv_value, ntohs(tvlv_new->tvlv_hdr.len)); + INIT_HLIST_NODE(&tvlv_new->list); + atomic_set(&tvlv_new->refcount, 1); + + spin_lock_bh(&bat_priv->tvlv.container_list_lock); + tvlv_old = batadv_tvlv_container_get(bat_priv, type, version); + batadv_tvlv_container_remove(tvlv_old); + hlist_add_head(&tvlv_new->list, &bat_priv->tvlv.container_list); + spin_unlock_bh(&bat_priv->tvlv.container_list_lock); +} + +/** + * batadv_tvlv_realloc_packet_buff - reallocate packet buffer to accomodate + * requested packet size + * @packet_buff: packet buffer + * @packet_buff_len: packet buffer size + * @packet_min_len: requested packet minimum size + * @additional_packet_len: requested additional packet size on top of minimum + * size + * + * Returns true of the packet buffer could be changed to the requested size, + * false otherwise. + */ +static bool batadv_tvlv_realloc_packet_buff(unsigned char **packet_buff, + int *packet_buff_len, + int min_packet_len, + int additional_packet_len) +{ + unsigned char *new_buff; + + new_buff = kmalloc(min_packet_len + additional_packet_len, GFP_ATOMIC); + + /* keep old buffer if kmalloc should fail */ + if (new_buff) { + memcpy(new_buff, *packet_buff, min_packet_len); + kfree(*packet_buff); + *packet_buff = new_buff; + *packet_buff_len = min_packet_len + additional_packet_len; + return true; + } + + return false; +} + +/** + * batadv_tvlv_container_ogm_append - append tvlv container content to given + * OGM packet buffer + * @bat_priv: the bat priv with all the soft interface information + * @packet_buff: ogm packet buffer + * @packet_buff_len: ogm packet buffer size including ogm header and tvlv + * content + * @packet_min_len: ogm header size to be preserved for the OGM itself + * + * The ogm packet might be enlarged or shrunk depending on the current size + * and the size of the to-be-appended tvlv containers. + * + * Returns size of all appended tvlv containers in bytes. + */ +uint16_t batadv_tvlv_container_ogm_append(struct batadv_priv *bat_priv, + unsigned char **packet_buff, + int *packet_buff_len, + int packet_min_len) +{ + struct batadv_tvlv_container *tvlv; + struct batadv_tvlv_hdr *tvlv_hdr; + uint16_t tvlv_value_len; + void *tvlv_value; + bool ret; + + spin_lock_bh(&bat_priv->tvlv.container_list_lock); + tvlv_value_len = batadv_tvlv_container_list_size(bat_priv); + + ret = batadv_tvlv_realloc_packet_buff(packet_buff, packet_buff_len, + packet_min_len, tvlv_value_len); + + if (!ret) + goto end; + + if (!tvlv_value_len) + goto end; + + tvlv_value = (*packet_buff) + packet_min_len; + + hlist_for_each_entry(tvlv, &bat_priv->tvlv.container_list, list) { + tvlv_hdr = tvlv_value; + tvlv_hdr->type = tvlv->tvlv_hdr.type; + tvlv_hdr->version = tvlv->tvlv_hdr.version; + tvlv_hdr->len = tvlv->tvlv_hdr.len; + tvlv_value = tvlv_hdr + 1; + memcpy(tvlv_value, tvlv + 1, ntohs(tvlv->tvlv_hdr.len)); + tvlv_value = (uint8_t *)tvlv_value + ntohs(tvlv->tvlv_hdr.len); + } + +end: + spin_unlock_bh(&bat_priv->tvlv.container_list_lock); + return tvlv_value_len; +} + +/** + * batadv_tvlv_call_handler - parse the given tvlv buffer to call the + * appropriate handlers + * @bat_priv: the bat priv with all the soft interface information + * @tvlv_handler: tvlv callback function handling the tvlv content + * @ogm_source: flag indicating wether the tvlv is an ogm or a unicast packet + * @orig_node: orig node emitting the ogm packet + * @src: source mac address of the unicast packet + * @dst: destination mac address of the unicast packet + * @tvlv_value: tvlv content + * @tvlv_value_len: tvlv content length + * + * Returns success if handler was not found or the return value of the handler + * callback. + */ +static int batadv_tvlv_call_handler(struct batadv_priv *bat_priv, + struct batadv_tvlv_handler *tvlv_handler, + bool ogm_source, + struct batadv_orig_node *orig_node, + uint8_t *src, uint8_t *dst, + void *tvlv_value, uint16_t tvlv_value_len) +{ + if (!tvlv_handler) + return NET_RX_SUCCESS; + + if (ogm_source) { + if (!tvlv_handler->ogm_handler) + return NET_RX_SUCCESS; + + if (!orig_node) + return NET_RX_SUCCESS; + + tvlv_handler->ogm_handler(bat_priv, orig_node, + BATADV_NO_FLAGS, + tvlv_value, tvlv_value_len); + tvlv_handler->flags |= BATADV_TVLV_HANDLER_OGM_CALLED; + } else { + if (!src) + return NET_RX_SUCCESS; + + if (!dst) + return NET_RX_SUCCESS; + + if (!tvlv_handler->unicast_handler) + return NET_RX_SUCCESS; + + return tvlv_handler->unicast_handler(bat_priv, src, + dst, tvlv_value, + tvlv_value_len); + } + + return NET_RX_SUCCESS; +} + +/** + * batadv_tvlv_containers_process - parse the given tvlv buffer to call the + * appropriate handlers + * @bat_priv: the bat priv with all the soft interface information + * @ogm_source: flag indicating wether the tvlv is an ogm or a unicast packet + * @orig_node: orig node emitting the ogm packet + * @src: source mac address of the unicast packet + * @dst: destination mac address of the unicast packet + * @tvlv_value: tvlv content + * @tvlv_value_len: tvlv content length + * + * Returns success when processing an OGM or the return value of all called + * handler callbacks. + */ +int batadv_tvlv_containers_process(struct batadv_priv *bat_priv, + bool ogm_source, + struct batadv_orig_node *orig_node, + uint8_t *src, uint8_t *dst, + void *tvlv_value, uint16_t tvlv_value_len) +{ + struct batadv_tvlv_handler *tvlv_handler; + struct batadv_tvlv_hdr *tvlv_hdr; + uint16_t tvlv_value_cont_len; + uint8_t cifnotfound = BATADV_TVLV_HANDLER_OGM_CIFNOTFND; + int ret = NET_RX_SUCCESS; + + while (tvlv_value_len >= sizeof(*tvlv_hdr)) { + tvlv_hdr = tvlv_value; + tvlv_value_cont_len = ntohs(tvlv_hdr->len); + tvlv_value = tvlv_hdr + 1; + tvlv_value_len -= sizeof(*tvlv_hdr); + + if (tvlv_value_cont_len > tvlv_value_len) + break; + + tvlv_handler = batadv_tvlv_handler_get(bat_priv, + tvlv_hdr->type, + tvlv_hdr->version); + + ret |= batadv_tvlv_call_handler(bat_priv, tvlv_handler, + ogm_source, orig_node, + src, dst, tvlv_value, + tvlv_value_cont_len); + if (tvlv_handler) + batadv_tvlv_handler_free_ref(tvlv_handler); + tvlv_value = (uint8_t *)tvlv_value + tvlv_value_cont_len; + tvlv_value_len -= tvlv_value_cont_len; + } + + if (!ogm_source) + return ret; + + rcu_read_lock(); + hlist_for_each_entry_rcu(tvlv_handler, + &bat_priv->tvlv.handler_list, list) { + if ((tvlv_handler->flags & BATADV_TVLV_HANDLER_OGM_CIFNOTFND) && + !(tvlv_handler->flags & BATADV_TVLV_HANDLER_OGM_CALLED)) + tvlv_handler->ogm_handler(bat_priv, orig_node, + cifnotfound, NULL, 0); + + tvlv_handler->flags &= ~BATADV_TVLV_HANDLER_OGM_CALLED; + } + rcu_read_unlock(); + + return NET_RX_SUCCESS; +} + +/** + * batadv_tvlv_ogm_receive - process an incoming ogm and call the appropriate + * handlers + * @bat_priv: the bat priv with all the soft interface information + * @batadv_ogm_packet: ogm packet containing the tvlv containers + * @orig_node: orig node emitting the ogm packet + */ +void batadv_tvlv_ogm_receive(struct batadv_priv *bat_priv, + struct batadv_ogm_packet *batadv_ogm_packet, + struct batadv_orig_node *orig_node) +{ + void *tvlv_value; + uint16_t tvlv_value_len; + + if (!batadv_ogm_packet) + return; + + tvlv_value_len = ntohs(batadv_ogm_packet->tvlv_len); + if (!tvlv_value_len) + return; + + tvlv_value = batadv_ogm_packet + 1; + + batadv_tvlv_containers_process(bat_priv, true, orig_node, NULL, NULL, + tvlv_value, tvlv_value_len); +} + +/** + * batadv_tvlv_handler_register - register tvlv handler based on the provided + * type and version (both need to match) for ogm tvlv payload and/or unicast + * payload + * @bat_priv: the bat priv with all the soft interface information + * @optr: ogm tvlv handler callback function. This function receives the orig + * node, flags and the tvlv content as argument to process. + * @uptr: unicast tvlv handler callback function. This function receives the + * source & destination of the unicast packet as well as the tvlv content + * to process. + * @type: tvlv handler type to be registered + * @version: tvlv handler version to be registered + * @flags: flags to enable or disable TVLV API behavior + */ +void batadv_tvlv_handler_register(struct batadv_priv *bat_priv, + void (*optr)(struct batadv_priv *bat_priv, + struct batadv_orig_node *orig, + uint8_t flags, + void *tvlv_value, + uint16_t tvlv_value_len), + int (*uptr)(struct batadv_priv *bat_priv, + uint8_t *src, uint8_t *dst, + void *tvlv_value, + uint16_t tvlv_value_len), + uint8_t type, uint8_t version, uint8_t flags) +{ + struct batadv_tvlv_handler *tvlv_handler; + + tvlv_handler = batadv_tvlv_handler_get(bat_priv, type, version); + if (tvlv_handler) { + batadv_tvlv_handler_free_ref(tvlv_handler); + return; + } + + tvlv_handler = kzalloc(sizeof(*tvlv_handler), GFP_ATOMIC); + if (!tvlv_handler) + return; + + tvlv_handler->ogm_handler = optr; + tvlv_handler->unicast_handler = uptr; + tvlv_handler->type = type; + tvlv_handler->version = version; + tvlv_handler->flags = flags; + atomic_set(&tvlv_handler->refcount, 1); + INIT_HLIST_NODE(&tvlv_handler->list); + + spin_lock_bh(&bat_priv->tvlv.handler_list_lock); + hlist_add_head_rcu(&tvlv_handler->list, &bat_priv->tvlv.handler_list); + spin_unlock_bh(&bat_priv->tvlv.handler_list_lock); +} + +/** + * batadv_tvlv_handler_unregister - unregister tvlv handler based on the + * provided type and version (both need to match) + * @bat_priv: the bat priv with all the soft interface information + * @type: tvlv handler type to be unregistered + * @version: tvlv handler version to be unregistered + */ +void batadv_tvlv_handler_unregister(struct batadv_priv *bat_priv, + uint8_t type, uint8_t version) +{ + struct batadv_tvlv_handler *tvlv_handler; + + tvlv_handler = batadv_tvlv_handler_get(bat_priv, type, version); + if (!tvlv_handler) + return; + + batadv_tvlv_handler_free_ref(tvlv_handler); + spin_lock_bh(&bat_priv->tvlv.handler_list_lock); + hlist_del_rcu(&tvlv_handler->list); + spin_unlock_bh(&bat_priv->tvlv.handler_list_lock); + batadv_tvlv_handler_free_ref(tvlv_handler); +} + +/** + * batadv_tvlv_unicast_send - send a unicast packet with tvlv payload to the + * specified host + * @bat_priv: the bat priv with all the soft interface information + * @src: source mac address of the unicast packet + * @dst: destination mac address of the unicast packet + * @type: tvlv type + * @version: tvlv version + * @tvlv_value: tvlv content + * @tvlv_value_len: tvlv content length + */ +void batadv_tvlv_unicast_send(struct batadv_priv *bat_priv, uint8_t *src, + uint8_t *dst, uint8_t type, uint8_t version, + void *tvlv_value, uint16_t tvlv_value_len) +{ + struct batadv_unicast_tvlv_packet *unicast_tvlv_packet; + struct batadv_tvlv_hdr *tvlv_hdr; + struct batadv_orig_node *orig_node; + struct sk_buff *skb = NULL; + unsigned char *tvlv_buff; + unsigned int tvlv_len; + ssize_t hdr_len = sizeof(*unicast_tvlv_packet); + bool ret = false; + + orig_node = batadv_orig_hash_find(bat_priv, dst); + if (!orig_node) + goto out; + + tvlv_len = sizeof(*tvlv_hdr) + tvlv_value_len; + + skb = netdev_alloc_skb_ip_align(NULL, ETH_HLEN + hdr_len + tvlv_len); + if (!skb) + goto out; + + skb->priority = TC_PRIO_CONTROL; + skb_reserve(skb, ETH_HLEN); + tvlv_buff = skb_put(skb, sizeof(*unicast_tvlv_packet) + tvlv_len); + unicast_tvlv_packet = (struct batadv_unicast_tvlv_packet *)tvlv_buff; + unicast_tvlv_packet->header.packet_type = BATADV_UNICAST_TVLV; + unicast_tvlv_packet->header.version = BATADV_COMPAT_VERSION; + unicast_tvlv_packet->header.ttl = BATADV_TTL; + unicast_tvlv_packet->reserved = 0; + unicast_tvlv_packet->tvlv_len = htons(tvlv_len); + unicast_tvlv_packet->align = 0; + memcpy(unicast_tvlv_packet->src, src, ETH_ALEN); + memcpy(unicast_tvlv_packet->dst, dst, ETH_ALEN); + + tvlv_buff = (unsigned char *)(unicast_tvlv_packet + 1); + tvlv_hdr = (struct batadv_tvlv_hdr *)tvlv_buff; + tvlv_hdr->version = version; + tvlv_hdr->type = type; + tvlv_hdr->len = htons(tvlv_value_len); + tvlv_buff += sizeof(*tvlv_hdr); + memcpy(tvlv_buff, tvlv_value, tvlv_value_len); + + if (batadv_send_skb_to_orig(skb, orig_node, NULL) != NET_XMIT_DROP) + ret = true; + +out: + if (skb && !ret) + kfree_skb(skb); + if (orig_node) + batadv_orig_node_free_ref(orig_node); +} + +/** + * batadv_get_vid - extract the VLAN identifier from skb if any + * @skb: the buffer containing the packet + * @header_len: length of the batman header preceding the ethernet header + * + * If the packet embedded in the skb is vlan tagged this function returns the + * VID with the BATADV_VLAN_HAS_TAG flag. Otherwise BATADV_NO_FLAGS is returned. + */ +unsigned short batadv_get_vid(struct sk_buff *skb, size_t header_len) +{ + struct ethhdr *ethhdr = (struct ethhdr *)(skb->data + header_len); + struct vlan_ethhdr *vhdr; + unsigned short vid; + + if (ethhdr->h_proto != htons(ETH_P_8021Q)) + return BATADV_NO_FLAGS; + + if (!pskb_may_pull(skb, header_len + VLAN_ETH_HLEN)) + return BATADV_NO_FLAGS; + + vhdr = (struct vlan_ethhdr *)(skb->data + header_len); + vid = ntohs(vhdr->h_vlan_TCI) & VLAN_VID_MASK; + vid |= BATADV_VLAN_HAS_TAG; + + return vid; +} + static int batadv_param_set_ra(const char *val, const struct kernel_param *kp) { struct batadv_algo_ops *bat_algo_ops; diff --git a/net/batman-adv/main.h b/net/batman-adv/main.h index 24675523930f..f94f287b8670 100644 --- a/net/batman-adv/main.h +++ b/net/batman-adv/main.h @@ -20,13 +20,13 @@ #ifndef _NET_BATMAN_ADV_MAIN_H_ #define _NET_BATMAN_ADV_MAIN_H_ -#define BATADV_DRIVER_AUTHOR "Marek Lindner <lindner_marek@yahoo.de>, " \ - "Simon Wunderlich <siwu@hrz.tu-chemnitz.de>" +#define BATADV_DRIVER_AUTHOR "Marek Lindner <mareklindner@neomailbox.ch>, " \ + "Simon Wunderlich <sw@simonwunderlich.de>" #define BATADV_DRIVER_DESC "B.A.T.M.A.N. advanced" #define BATADV_DRIVER_DEVICE "batman-adv" #ifndef BATADV_SOURCE_VERSION -#define BATADV_SOURCE_VERSION "2013.4.0" +#define BATADV_SOURCE_VERSION "2013.5.0" #endif /* B.A.T.M.A.N. parameters */ @@ -86,7 +86,11 @@ /* numbers of originator to contact for any PUT/GET DHT operation */ #define BATADV_DAT_CANDIDATES_NUM 3 -#define BATADV_VIS_INTERVAL 5000 /* 5 seconds */ +/** + * BATADV_TQ_SIMILARITY_THRESHOLD - TQ points that a secondary metric can differ + * at most from the primary one in order to be still considered acceptable + */ +#define BATADV_TQ_SIMILARITY_THRESHOLD 50 /* how much worse secondary interfaces may be to be considered as bonding * candidates @@ -133,6 +137,15 @@ enum batadv_uev_type { #define BATADV_GW_THRESHOLD 50 +/* Number of fragment chains for each orig_node */ +#define BATADV_FRAG_BUFFER_COUNT 8 +/* Maximum number of fragments for one packet */ +#define BATADV_FRAG_MAX_FRAGMENTS 16 +/* Maxumim size of each fragment */ +#define BATADV_FRAG_MAX_FRAG_SIZE 1400 +/* Time to keep fragments while waiting for rest of the fragments */ +#define BATADV_FRAG_TIMEOUT 10000 + #define BATADV_DAT_CANDIDATE_NOT_FOUND 0 #define BATADV_DAT_CANDIDATE_ORIG 1 @@ -160,15 +173,9 @@ enum batadv_uev_type { #include <net/rtnetlink.h> #include <linux/jiffies.h> #include <linux/seq_file.h> -#include "types.h" +#include <linux/if_vlan.h> -/** - * batadv_vlan_flags - flags for the four MSB of any vlan ID field - * @BATADV_VLAN_HAS_TAG: whether the field contains a valid vlan tag or not - */ -enum batadv_vlan_flags { - BATADV_VLAN_HAS_TAG = BIT(15), -}; +#include "types.h" #define BATADV_PRINT_VID(vid) (vid & BATADV_VLAN_HAS_TAG ? \ (int)(vid & VLAN_VID_MASK) : -1) @@ -184,6 +191,7 @@ void batadv_mesh_free(struct net_device *soft_iface); int batadv_is_my_mac(struct batadv_priv *bat_priv, const uint8_t *addr); struct batadv_hard_iface * batadv_seq_print_text_primary_if_get(struct seq_file *seq); +int batadv_max_header_len(void); void batadv_skb_set_priority(struct sk_buff *skb, int offset); int batadv_batman_skb_recv(struct sk_buff *skb, struct net_device *dev, struct packet_type *ptype, @@ -326,4 +334,40 @@ static inline uint64_t batadv_sum_counter(struct batadv_priv *bat_priv, */ #define BATADV_SKB_CB(__skb) ((struct batadv_skb_cb *)&((__skb)->cb[0])) +void batadv_tvlv_container_register(struct batadv_priv *bat_priv, + uint8_t type, uint8_t version, + void *tvlv_value, uint16_t tvlv_value_len); +uint16_t batadv_tvlv_container_ogm_append(struct batadv_priv *bat_priv, + unsigned char **packet_buff, + int *packet_buff_len, + int packet_min_len); +void batadv_tvlv_ogm_receive(struct batadv_priv *bat_priv, + struct batadv_ogm_packet *batadv_ogm_packet, + struct batadv_orig_node *orig_node); +void batadv_tvlv_container_unregister(struct batadv_priv *bat_priv, + uint8_t type, uint8_t version); + +void batadv_tvlv_handler_register(struct batadv_priv *bat_priv, + void (*optr)(struct batadv_priv *bat_priv, + struct batadv_orig_node *orig, + uint8_t flags, + void *tvlv_value, + uint16_t tvlv_value_len), + int (*uptr)(struct batadv_priv *bat_priv, + uint8_t *src, uint8_t *dst, + void *tvlv_value, + uint16_t tvlv_value_len), + uint8_t type, uint8_t version, uint8_t flags); +void batadv_tvlv_handler_unregister(struct batadv_priv *bat_priv, + uint8_t type, uint8_t version); +int batadv_tvlv_containers_process(struct batadv_priv *bat_priv, + bool ogm_source, + struct batadv_orig_node *orig_node, + uint8_t *src, uint8_t *dst, + void *tvlv_buff, uint16_t tvlv_buff_len); +void batadv_tvlv_unicast_send(struct batadv_priv *bat_priv, uint8_t *src, + uint8_t *dst, uint8_t type, uint8_t version, + void *tvlv_value, uint16_t tvlv_value_len); +unsigned short batadv_get_vid(struct sk_buff *skb, size_t header_len); + #endif /* _NET_BATMAN_ADV_MAIN_H_ */ diff --git a/net/batman-adv/network-coding.c b/net/batman-adv/network-coding.c index 4ecc0b6bf8ab..351e199bc0af 100644 --- a/net/batman-adv/network-coding.c +++ b/net/batman-adv/network-coding.c @@ -59,6 +59,59 @@ static void batadv_nc_start_timer(struct batadv_priv *bat_priv) } /** + * batadv_nc_tvlv_container_update - update the network coding tvlv container + * after network coding setting change + * @bat_priv: the bat priv with all the soft interface information + */ +static void batadv_nc_tvlv_container_update(struct batadv_priv *bat_priv) +{ + char nc_mode; + + nc_mode = atomic_read(&bat_priv->network_coding); + + switch (nc_mode) { + case 0: + batadv_tvlv_container_unregister(bat_priv, BATADV_TVLV_NC, 1); + break; + case 1: + batadv_tvlv_container_register(bat_priv, BATADV_TVLV_NC, 1, + NULL, 0); + break; + } +} + +/** + * batadv_nc_status_update - update the network coding tvlv container after + * network coding setting change + * @net_dev: the soft interface net device + */ +void batadv_nc_status_update(struct net_device *net_dev) +{ + struct batadv_priv *bat_priv = netdev_priv(net_dev); + batadv_nc_tvlv_container_update(bat_priv); +} + +/** + * batadv_nc_tvlv_ogm_handler_v1 - process incoming nc tvlv container + * @bat_priv: the bat priv with all the soft interface information + * @orig: the orig_node of the ogm + * @flags: flags indicating the tvlv state (see batadv_tvlv_handler_flags) + * @tvlv_value: tvlv buffer containing the gateway data + * @tvlv_value_len: tvlv buffer length + */ +static void batadv_nc_tvlv_ogm_handler_v1(struct batadv_priv *bat_priv, + struct batadv_orig_node *orig, + uint8_t flags, + void *tvlv_value, + uint16_t tvlv_value_len) +{ + if (flags & BATADV_TVLV_HANDLER_OGM_CIFNOTFND) + orig->capabilities &= ~BATADV_ORIG_CAPA_HAS_NC; + else + orig->capabilities |= BATADV_ORIG_CAPA_HAS_NC; +} + +/** * batadv_nc_mesh_init - initialise coding hash table and start house keeping * @bat_priv: the bat priv with all the soft interface information */ @@ -87,6 +140,10 @@ int batadv_nc_mesh_init(struct batadv_priv *bat_priv) INIT_DELAYED_WORK(&bat_priv->nc.work, batadv_nc_worker); batadv_nc_start_timer(bat_priv); + batadv_tvlv_handler_register(bat_priv, batadv_nc_tvlv_ogm_handler_v1, + NULL, BATADV_TVLV_NC, 1, + BATADV_TVLV_HANDLER_OGM_CIFNOTFND); + batadv_nc_tvlv_container_update(bat_priv); return 0; err: @@ -802,6 +859,10 @@ void batadv_nc_update_nc_node(struct batadv_priv *bat_priv, if (!atomic_read(&bat_priv->network_coding)) goto out; + /* check if orig node is network coding enabled */ + if (!(orig_node->capabilities & BATADV_ORIG_CAPA_HAS_NC)) + goto out; + /* accept ogms from 'good' neighbors and single hop neighbors */ if (!batadv_can_nc_with_orig(bat_priv, orig_node, ogm_packet) && !is_single_hop_neigh) @@ -942,7 +1003,7 @@ static bool batadv_nc_code_packets(struct batadv_priv *bat_priv, struct batadv_nc_packet *nc_packet, struct batadv_neigh_node *neigh_node) { - uint8_t tq_weighted_neigh, tq_weighted_coding; + uint8_t tq_weighted_neigh, tq_weighted_coding, tq_tmp; struct sk_buff *skb_dest, *skb_src; struct batadv_unicast_packet *packet1; struct batadv_unicast_packet *packet2; @@ -967,8 +1028,10 @@ static bool batadv_nc_code_packets(struct batadv_priv *bat_priv, if (!router_coding) goto out; - tq_weighted_neigh = batadv_nc_random_weight_tq(router_neigh->tq_avg); - tq_weighted_coding = batadv_nc_random_weight_tq(router_coding->tq_avg); + tq_tmp = batadv_nc_random_weight_tq(router_neigh->bat_iv.tq_avg); + tq_weighted_neigh = tq_tmp; + tq_tmp = batadv_nc_random_weight_tq(router_coding->bat_iv.tq_avg); + tq_weighted_coding = tq_tmp; /* Select one destination for the MAC-header dst-field based on * weighted TQ-values. @@ -1735,6 +1798,8 @@ free_nc_packet: */ void batadv_nc_mesh_free(struct batadv_priv *bat_priv) { + batadv_tvlv_container_unregister(bat_priv, BATADV_TVLV_NC, 1); + batadv_tvlv_handler_unregister(bat_priv, BATADV_TVLV_NC, 1); cancel_delayed_work_sync(&bat_priv->nc.work); batadv_nc_purge_paths(bat_priv, bat_priv->nc.coding_hash, NULL); diff --git a/net/batman-adv/network-coding.h b/net/batman-adv/network-coding.h index ddfa618e80bf..d4fd315b5261 100644 --- a/net/batman-adv/network-coding.h +++ b/net/batman-adv/network-coding.h @@ -22,6 +22,7 @@ #ifdef CONFIG_BATMAN_ADV_NC +void batadv_nc_status_update(struct net_device *net_dev); int batadv_nc_init(void); int batadv_nc_mesh_init(struct batadv_priv *bat_priv); void batadv_nc_mesh_free(struct batadv_priv *bat_priv); @@ -47,6 +48,10 @@ int batadv_nc_init_debugfs(struct batadv_priv *bat_priv); #else /* ifdef CONFIG_BATMAN_ADV_NC */ +static inline void batadv_nc_status_update(struct net_device *net_dev) +{ +} + static inline int batadv_nc_init(void) { return 0; diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c index f50553a7de62..8ab14340d10f 100644 --- a/net/batman-adv/originator.c +++ b/net/batman-adv/originator.c @@ -25,10 +25,10 @@ #include "routing.h" #include "gateway_client.h" #include "hard-interface.h" -#include "unicast.h" #include "soft-interface.h" #include "bridge_loop_avoidance.h" #include "network-coding.h" +#include "fragmentation.h" /* hash class keys */ static struct lock_class_key batadv_orig_hash_lock_class_key; @@ -36,7 +36,7 @@ static struct lock_class_key batadv_orig_hash_lock_class_key; static void batadv_purge_orig(struct work_struct *work); /* returns 1 if they are the same originator */ -static int batadv_compare_orig(const struct hlist_node *node, const void *data2) +int batadv_compare_orig(const struct hlist_node *node, const void *data2) { const void *data1 = container_of(node, struct batadv_orig_node, hash_entry); @@ -44,6 +44,88 @@ static int batadv_compare_orig(const struct hlist_node *node, const void *data2) return (memcmp(data1, data2, ETH_ALEN) == 0 ? 1 : 0); } +/** + * batadv_orig_node_vlan_get - get an orig_node_vlan object + * @orig_node: the originator serving the VLAN + * @vid: the VLAN identifier + * + * Returns the vlan object identified by vid and belonging to orig_node or NULL + * if it does not exist. + */ +struct batadv_orig_node_vlan * +batadv_orig_node_vlan_get(struct batadv_orig_node *orig_node, + unsigned short vid) +{ + struct batadv_orig_node_vlan *vlan = NULL, *tmp; + + rcu_read_lock(); + list_for_each_entry_rcu(tmp, &orig_node->vlan_list, list) { + if (tmp->vid != vid) + continue; + + if (!atomic_inc_not_zero(&tmp->refcount)) + continue; + + vlan = tmp; + + break; + } + rcu_read_unlock(); + + return vlan; +} + +/** + * batadv_orig_node_vlan_new - search and possibly create an orig_node_vlan + * object + * @orig_node: the originator serving the VLAN + * @vid: the VLAN identifier + * + * Returns NULL in case of failure or the vlan object identified by vid and + * belonging to orig_node otherwise. The object is created and added to the list + * if it does not exist. + * + * The object is returned with refcounter increased by 1. + */ +struct batadv_orig_node_vlan * +batadv_orig_node_vlan_new(struct batadv_orig_node *orig_node, + unsigned short vid) +{ + struct batadv_orig_node_vlan *vlan; + + spin_lock_bh(&orig_node->vlan_list_lock); + + /* first look if an object for this vid already exists */ + vlan = batadv_orig_node_vlan_get(orig_node, vid); + if (vlan) + goto out; + + vlan = kzalloc(sizeof(*vlan), GFP_ATOMIC); + if (!vlan) + goto out; + + atomic_set(&vlan->refcount, 2); + vlan->vid = vid; + + list_add_rcu(&vlan->list, &orig_node->vlan_list); + +out: + spin_unlock_bh(&orig_node->vlan_list_lock); + + return vlan; +} + +/** + * batadv_orig_node_vlan_free_ref - decrement the refcounter and possibly free + * the originator-vlan object + * @orig_vlan: the originator-vlan object to release + */ +void batadv_orig_node_vlan_free_ref(struct batadv_orig_node_vlan *orig_vlan) +{ + if (atomic_dec_and_test(&orig_vlan->refcount)) + kfree_rcu(orig_vlan, rcu); +} + int batadv_originator_init(struct batadv_priv *bat_priv) { if (bat_priv->orig_hash) @@ -90,11 +172,20 @@ batadv_orig_node_get_router(struct batadv_orig_node *orig_node) return router; } +/** + * batadv_neigh_node_new - create and init a new neigh_node object + * @hard_iface: the interface where the neighbour is connected to + * @neigh_addr: the mac address of the neighbour interface + * @orig_node: originator object representing the neighbour + * + * Allocates a new neigh_node object and initialises all the generic fields. + * Returns the new object or NULL on failure. + */ struct batadv_neigh_node * batadv_neigh_node_new(struct batadv_hard_iface *hard_iface, - const uint8_t *neigh_addr) + const uint8_t *neigh_addr, + struct batadv_orig_node *orig_node) { - struct batadv_priv *bat_priv = netdev_priv(hard_iface->soft_iface); struct batadv_neigh_node *neigh_node; neigh_node = kzalloc(sizeof(*neigh_node), GFP_ATOMIC); @@ -104,15 +195,14 @@ batadv_neigh_node_new(struct batadv_hard_iface *hard_iface, INIT_HLIST_NODE(&neigh_node->list); memcpy(neigh_node->addr, neigh_addr, ETH_ALEN); - spin_lock_init(&neigh_node->lq_update_lock); + neigh_node->if_incoming = hard_iface; + neigh_node->orig_node = orig_node; + + INIT_LIST_HEAD(&neigh_node->bonding_list); /* extra reference for return */ atomic_set(&neigh_node->refcount, 2); - batadv_dbg(BATADV_DBG_BATMAN, bat_priv, - "Creating new neighbor %pM on interface %s\n", neigh_addr, - hard_iface->net_dev->name); - out: return neigh_node; } @@ -146,13 +236,15 @@ static void batadv_orig_node_free_rcu(struct rcu_head *rcu) /* Free nc_nodes */ batadv_nc_purge_orig(orig_node->bat_priv, orig_node, NULL); - batadv_frag_list_free(&orig_node->frag_list); - batadv_tt_global_del_orig(orig_node->bat_priv, orig_node, + batadv_frag_purge_orig(orig_node, NULL); + + batadv_tt_global_del_orig(orig_node->bat_priv, orig_node, -1, "originator timed out"); + if (orig_node->bat_priv->bat_algo_ops->bat_orig_free) + orig_node->bat_priv->bat_algo_ops->bat_orig_free(orig_node); + kfree(orig_node->tt_buff); - kfree(orig_node->bcast_own); - kfree(orig_node->bcast_own_sum); kfree(orig_node); } @@ -210,20 +302,22 @@ void batadv_originator_free(struct batadv_priv *bat_priv) batadv_hash_destroy(hash); } -/* this function finds or creates an originator entry for the given - * address if it does not exits +/** + * batadv_orig_node_new - creates a new orig_node + * @bat_priv: the bat priv with all the soft interface information + * @addr: the mac address of the originator + * + * Creates a new originator object and initialise all the generic fields. + * The new object is not added to the originator list. + * Returns the newly created object or NULL on failure. */ -struct batadv_orig_node *batadv_get_orig_node(struct batadv_priv *bat_priv, +struct batadv_orig_node *batadv_orig_node_new(struct batadv_priv *bat_priv, const uint8_t *addr) { struct batadv_orig_node *orig_node; - int size; - int hash_added; + struct batadv_orig_node_vlan *vlan; unsigned long reset_time; - - orig_node = batadv_orig_hash_find(bat_priv, addr); - if (orig_node) - return orig_node; + int i; batadv_dbg(BATADV_DBG_BATMAN, bat_priv, "Creating new originator: %pM\n", addr); @@ -234,10 +328,12 @@ struct batadv_orig_node *batadv_get_orig_node(struct batadv_priv *bat_priv, INIT_HLIST_HEAD(&orig_node->neigh_list); INIT_LIST_HEAD(&orig_node->bond_list); - spin_lock_init(&orig_node->ogm_cnt_lock); + INIT_LIST_HEAD(&orig_node->vlan_list); spin_lock_init(&orig_node->bcast_seqno_lock); spin_lock_init(&orig_node->neigh_list_lock); spin_lock_init(&orig_node->tt_buff_lock); + spin_lock_init(&orig_node->tt_lock); + spin_lock_init(&orig_node->vlan_list_lock); batadv_nc_init_orig(orig_node); @@ -249,43 +345,32 @@ struct batadv_orig_node *batadv_get_orig_node(struct batadv_priv *bat_priv, memcpy(orig_node->orig, addr, ETH_ALEN); batadv_dat_init_orig_node_addr(orig_node); orig_node->router = NULL; - orig_node->tt_crc = 0; atomic_set(&orig_node->last_ttvn, 0); orig_node->tt_buff = NULL; orig_node->tt_buff_len = 0; - atomic_set(&orig_node->tt_size, 0); reset_time = jiffies - 1 - msecs_to_jiffies(BATADV_RESET_PROTECTION_MS); orig_node->bcast_seqno_reset = reset_time; orig_node->batman_seqno_reset = reset_time; atomic_set(&orig_node->bond_candidates, 0); - size = bat_priv->num_ifaces * sizeof(unsigned long) * BATADV_NUM_WORDS; - - orig_node->bcast_own = kzalloc(size, GFP_ATOMIC); - if (!orig_node->bcast_own) + /* create a vlan object for the "untagged" LAN */ + vlan = batadv_orig_node_vlan_new(orig_node, BATADV_NO_FLAGS); + if (!vlan) goto free_orig_node; + /* batadv_orig_node_vlan_new() increases the refcounter. + * Immediately release vlan since it is not needed anymore in this + * context + */ + batadv_orig_node_vlan_free_ref(vlan); - size = bat_priv->num_ifaces * sizeof(uint8_t); - orig_node->bcast_own_sum = kzalloc(size, GFP_ATOMIC); - - INIT_LIST_HEAD(&orig_node->frag_list); - orig_node->last_frag_packet = 0; - - if (!orig_node->bcast_own_sum) - goto free_bcast_own; - - hash_added = batadv_hash_add(bat_priv->orig_hash, batadv_compare_orig, - batadv_choose_orig, orig_node, - &orig_node->hash_entry); - if (hash_added != 0) - goto free_bcast_own_sum; + for (i = 0; i < BATADV_FRAG_BUFFER_COUNT; i++) { + INIT_HLIST_HEAD(&orig_node->fragments[i].head); + spin_lock_init(&orig_node->fragments[i].lock); + orig_node->fragments[i].size = 0; + } return orig_node; -free_bcast_own_sum: - kfree(orig_node->bcast_own_sum); -free_bcast_own: - kfree(orig_node->bcast_own); free_orig_node: kfree(orig_node); return NULL; @@ -294,15 +379,16 @@ free_orig_node: static bool batadv_purge_orig_neighbors(struct batadv_priv *bat_priv, struct batadv_orig_node *orig_node, - struct batadv_neigh_node **best_neigh_node) + struct batadv_neigh_node **best_neigh) { + struct batadv_algo_ops *bao = bat_priv->bat_algo_ops; struct hlist_node *node_tmp; struct batadv_neigh_node *neigh_node; bool neigh_purged = false; unsigned long last_seen; struct batadv_hard_iface *if_incoming; - *best_neigh_node = NULL; + *best_neigh = NULL; spin_lock_bh(&orig_node->neigh_list_lock); @@ -335,9 +421,12 @@ batadv_purge_orig_neighbors(struct batadv_priv *bat_priv, batadv_bonding_candidate_del(orig_node, neigh_node); batadv_neigh_node_free_ref(neigh_node); } else { - if ((!*best_neigh_node) || - (neigh_node->tq_avg > (*best_neigh_node)->tq_avg)) - *best_neigh_node = neigh_node; + /* store the best_neighbour if this is the first + * iteration or if a better neighbor has been found + */ + if (!*best_neigh || + bao->bat_neigh_cmp(neigh_node, *best_neigh) > 0) + *best_neigh = neigh_node; } } @@ -388,17 +477,14 @@ static void _batadv_purge_orig(struct batadv_priv *bat_priv) hlist_for_each_entry_safe(orig_node, node_tmp, head, hash_entry) { if (batadv_purge_orig_node(bat_priv, orig_node)) { - if (orig_node->gw_flags) - batadv_gw_node_delete(bat_priv, - orig_node); + batadv_gw_node_delete(bat_priv, orig_node); hlist_del_rcu(&orig_node->hash_entry); batadv_orig_node_free_ref(orig_node); continue; } - if (batadv_has_timed_out(orig_node->last_frag_packet, - BATADV_FRAG_TIMEOUT)) - batadv_frag_list_free(&orig_node->frag_list); + batadv_frag_purge_orig(orig_node, + batadv_frag_check_entry); } spin_unlock_bh(list_lock); } @@ -429,100 +515,26 @@ int batadv_orig_seq_print_text(struct seq_file *seq, void *offset) { struct net_device *net_dev = (struct net_device *)seq->private; struct batadv_priv *bat_priv = netdev_priv(net_dev); - struct batadv_hashtable *hash = bat_priv->orig_hash; - struct hlist_head *head; struct batadv_hard_iface *primary_if; - struct batadv_orig_node *orig_node; - struct batadv_neigh_node *neigh_node, *neigh_node_tmp; - int batman_count = 0; - int last_seen_secs; - int last_seen_msecs; - unsigned long last_seen_jiffies; - uint32_t i; primary_if = batadv_seq_print_text_primary_if_get(seq); if (!primary_if) - goto out; + return 0; - seq_printf(seq, "[B.A.T.M.A.N. adv %s, MainIF/MAC: %s/%pM (%s)]\n", + seq_printf(seq, "[B.A.T.M.A.N. adv %s, MainIF/MAC: %s/%pM (%s %s)]\n", BATADV_SOURCE_VERSION, primary_if->net_dev->name, - primary_if->net_dev->dev_addr, net_dev->name); - seq_printf(seq, " %-15s %s (%s/%i) %17s [%10s]: %20s ...\n", - "Originator", "last-seen", "#", BATADV_TQ_MAX_VALUE, - "Nexthop", "outgoingIF", "Potential nexthops"); - - for (i = 0; i < hash->size; i++) { - head = &hash->table[i]; - - rcu_read_lock(); - hlist_for_each_entry_rcu(orig_node, head, hash_entry) { - neigh_node = batadv_orig_node_get_router(orig_node); - if (!neigh_node) - continue; - - if (neigh_node->tq_avg == 0) - goto next; - - last_seen_jiffies = jiffies - orig_node->last_seen; - last_seen_msecs = jiffies_to_msecs(last_seen_jiffies); - last_seen_secs = last_seen_msecs / 1000; - last_seen_msecs = last_seen_msecs % 1000; - - seq_printf(seq, "%pM %4i.%03is (%3i) %pM [%10s]:", - orig_node->orig, last_seen_secs, - last_seen_msecs, neigh_node->tq_avg, - neigh_node->addr, - neigh_node->if_incoming->net_dev->name); - - hlist_for_each_entry_rcu(neigh_node_tmp, - &orig_node->neigh_list, list) { - seq_printf(seq, " %pM (%3i)", - neigh_node_tmp->addr, - neigh_node_tmp->tq_avg); - } + primary_if->net_dev->dev_addr, net_dev->name, + bat_priv->bat_algo_ops->name); - seq_puts(seq, "\n"); - batman_count++; + batadv_hardif_free_ref(primary_if); -next: - batadv_neigh_node_free_ref(neigh_node); - } - rcu_read_unlock(); + if (!bat_priv->bat_algo_ops->bat_orig_print) { + seq_puts(seq, + "No printing function for this routing protocol\n"); + return 0; } - if (batman_count == 0) - seq_puts(seq, "No batman nodes in range ...\n"); - -out: - if (primary_if) - batadv_hardif_free_ref(primary_if); - return 0; -} - -static int batadv_orig_node_add_if(struct batadv_orig_node *orig_node, - int max_if_num) -{ - void *data_ptr; - size_t data_size, old_size; - - data_size = max_if_num * sizeof(unsigned long) * BATADV_NUM_WORDS; - old_size = (max_if_num - 1) * sizeof(unsigned long) * BATADV_NUM_WORDS; - data_ptr = kmalloc(data_size, GFP_ATOMIC); - if (!data_ptr) - return -ENOMEM; - - memcpy(data_ptr, orig_node->bcast_own, old_size); - kfree(orig_node->bcast_own); - orig_node->bcast_own = data_ptr; - - data_ptr = kmalloc(max_if_num * sizeof(uint8_t), GFP_ATOMIC); - if (!data_ptr) - return -ENOMEM; - - memcpy(data_ptr, orig_node->bcast_own_sum, - (max_if_num - 1) * sizeof(uint8_t)); - kfree(orig_node->bcast_own_sum); - orig_node->bcast_own_sum = data_ptr; + bat_priv->bat_algo_ops->bat_orig_print(bat_priv, seq); return 0; } @@ -531,6 +543,7 @@ int batadv_orig_hash_add_if(struct batadv_hard_iface *hard_iface, int max_if_num) { struct batadv_priv *bat_priv = netdev_priv(hard_iface->soft_iface); + struct batadv_algo_ops *bao = bat_priv->bat_algo_ops; struct batadv_hashtable *hash = bat_priv->orig_hash; struct hlist_head *head; struct batadv_orig_node *orig_node; @@ -545,10 +558,10 @@ int batadv_orig_hash_add_if(struct batadv_hard_iface *hard_iface, rcu_read_lock(); hlist_for_each_entry_rcu(orig_node, head, hash_entry) { - spin_lock_bh(&orig_node->ogm_cnt_lock); - ret = batadv_orig_node_add_if(orig_node, max_if_num); - spin_unlock_bh(&orig_node->ogm_cnt_lock); - + ret = 0; + if (bao->bat_orig_add_if) + ret = bao->bat_orig_add_if(orig_node, + max_if_num); if (ret == -ENOMEM) goto err; } @@ -562,54 +575,6 @@ err: return -ENOMEM; } -static int batadv_orig_node_del_if(struct batadv_orig_node *orig_node, - int max_if_num, int del_if_num) -{ - void *data_ptr = NULL; - int chunk_size; - - /* last interface was removed */ - if (max_if_num == 0) - goto free_bcast_own; - - chunk_size = sizeof(unsigned long) * BATADV_NUM_WORDS; - data_ptr = kmalloc(max_if_num * chunk_size, GFP_ATOMIC); - if (!data_ptr) - return -ENOMEM; - - /* copy first part */ - memcpy(data_ptr, orig_node->bcast_own, del_if_num * chunk_size); - - /* copy second part */ - memcpy((char *)data_ptr + del_if_num * chunk_size, - orig_node->bcast_own + ((del_if_num + 1) * chunk_size), - (max_if_num - del_if_num) * chunk_size); - -free_bcast_own: - kfree(orig_node->bcast_own); - orig_node->bcast_own = data_ptr; - - if (max_if_num == 0) - goto free_own_sum; - - data_ptr = kmalloc(max_if_num * sizeof(uint8_t), GFP_ATOMIC); - if (!data_ptr) - return -ENOMEM; - - memcpy(data_ptr, orig_node->bcast_own_sum, - del_if_num * sizeof(uint8_t)); - - memcpy((char *)data_ptr + del_if_num * sizeof(uint8_t), - orig_node->bcast_own_sum + ((del_if_num + 1) * sizeof(uint8_t)), - (max_if_num - del_if_num) * sizeof(uint8_t)); - -free_own_sum: - kfree(orig_node->bcast_own_sum); - orig_node->bcast_own_sum = data_ptr; - - return 0; -} - int batadv_orig_hash_del_if(struct batadv_hard_iface *hard_iface, int max_if_num) { @@ -618,6 +583,7 @@ int batadv_orig_hash_del_if(struct batadv_hard_iface *hard_iface, struct hlist_head *head; struct batadv_hard_iface *hard_iface_tmp; struct batadv_orig_node *orig_node; + struct batadv_algo_ops *bao = bat_priv->bat_algo_ops; uint32_t i; int ret; @@ -629,11 +595,11 @@ int batadv_orig_hash_del_if(struct batadv_hard_iface *hard_iface, rcu_read_lock(); hlist_for_each_entry_rcu(orig_node, head, hash_entry) { - spin_lock_bh(&orig_node->ogm_cnt_lock); - ret = batadv_orig_node_del_if(orig_node, max_if_num, - hard_iface->if_num); - spin_unlock_bh(&orig_node->ogm_cnt_lock); - + ret = 0; + if (bao->bat_orig_del_if) + ret = bao->bat_orig_del_if(orig_node, + max_if_num, + hard_iface->if_num); if (ret == -ENOMEM) goto err; } diff --git a/net/batman-adv/originator.h b/net/batman-adv/originator.h index 7887b84a9af4..6f77d808a916 100644 --- a/net/batman-adv/originator.h +++ b/net/batman-adv/originator.h @@ -22,16 +22,18 @@ #include "hash.h" +int batadv_compare_orig(const struct hlist_node *node, const void *data2); int batadv_originator_init(struct batadv_priv *bat_priv); void batadv_originator_free(struct batadv_priv *bat_priv); void batadv_purge_orig_ref(struct batadv_priv *bat_priv); void batadv_orig_node_free_ref(struct batadv_orig_node *orig_node); void batadv_orig_node_free_ref_now(struct batadv_orig_node *orig_node); -struct batadv_orig_node *batadv_get_orig_node(struct batadv_priv *bat_priv, +struct batadv_orig_node *batadv_orig_node_new(struct batadv_priv *bat_priv, const uint8_t *addr); struct batadv_neigh_node * batadv_neigh_node_new(struct batadv_hard_iface *hard_iface, - const uint8_t *neigh_addr); + const uint8_t *neigh_addr, + struct batadv_orig_node *orig_node); void batadv_neigh_node_free_ref(struct batadv_neigh_node *neigh_node); struct batadv_neigh_node * batadv_orig_node_get_router(struct batadv_orig_node *orig_node); @@ -40,6 +42,13 @@ int batadv_orig_hash_add_if(struct batadv_hard_iface *hard_iface, int max_if_num); int batadv_orig_hash_del_if(struct batadv_hard_iface *hard_iface, int max_if_num); +struct batadv_orig_node_vlan * +batadv_orig_node_vlan_new(struct batadv_orig_node *orig_node, + unsigned short vid); +struct batadv_orig_node_vlan * +batadv_orig_node_vlan_get(struct batadv_orig_node *orig_node, + unsigned short vid); +void batadv_orig_node_vlan_free_ref(struct batadv_orig_node_vlan *orig_vlan); /* hashfunction to choose an entry in a hash table of given size diff --git a/net/batman-adv/packet.h b/net/batman-adv/packet.h index a51ccfc39da4..207459b62966 100644 --- a/net/batman-adv/packet.h +++ b/net/batman-adv/packet.h @@ -20,17 +20,34 @@ #ifndef _NET_BATMAN_ADV_PACKET_H_ #define _NET_BATMAN_ADV_PACKET_H_ +/** + * enum batadv_packettype - types for batman-adv encapsulated packets + * @BATADV_IV_OGM: originator messages for B.A.T.M.A.N. IV + * @BATADV_BCAST: broadcast packets carrying broadcast payload + * @BATADV_CODED: network coded packets + * + * @BATADV_UNICAST: unicast packets carrying unicast payload traffic + * @BATADV_UNICAST_FRAG: unicast packets carrying a fragment of the original + * payload packet + * @BATADV_UNICAST_4ADDR: unicast packet including the originator address of + * the sender + * @BATADV_ICMP: unicast packet like IP ICMP used for ping or traceroute + * @BATADV_UNICAST_TVLV: unicast packet carrying TVLV containers + */ enum batadv_packettype { - BATADV_IV_OGM = 0x01, - BATADV_ICMP = 0x02, - BATADV_UNICAST = 0x03, - BATADV_BCAST = 0x04, - BATADV_VIS = 0x05, - BATADV_UNICAST_FRAG = 0x06, - BATADV_TT_QUERY = 0x07, - BATADV_ROAM_ADV = 0x08, - BATADV_UNICAST_4ADDR = 0x09, - BATADV_CODED = 0x0a, + /* 0x00 - 0x3f: local packets or special rules for handling */ + BATADV_IV_OGM = 0x00, + BATADV_BCAST = 0x01, + BATADV_CODED = 0x02, + /* 0x40 - 0x7f: unicast */ +#define BATADV_UNICAST_MIN 0x40 + BATADV_UNICAST = 0x40, + BATADV_UNICAST_FRAG = 0x41, + BATADV_UNICAST_4ADDR = 0x42, + BATADV_ICMP = 0x43, + BATADV_UNICAST_TVLV = 0x44, +#define BATADV_UNICAST_MAX 0x7f + /* 0x80 - 0xff: reserved */ }; /** @@ -48,13 +65,21 @@ enum batadv_subtype { }; /* this file is included by batctl which needs these defines */ -#define BATADV_COMPAT_VERSION 14 +#define BATADV_COMPAT_VERSION 15 +/** + * enum batadv_iv_flags - flags used in B.A.T.M.A.N. IV OGM packets + * @BATADV_NOT_BEST_NEXT_HOP: flag is set when ogm packet is forwarded and was + * previously received from someone else than the best neighbor. + * @BATADV_PRIMARIES_FIRST_HOP: flag is set when the primary interface address + * is used, and the packet travels its first hop. + * @BATADV_DIRECTLINK: flag is for the first hop or if rebroadcasted from a + * one hop neighbor on the interface where it was originally received. + */ enum batadv_iv_flags { - BATADV_NOT_BEST_NEXT_HOP = BIT(3), - BATADV_PRIMARIES_FIRST_HOP = BIT(4), - BATADV_VIS_SERVER = BIT(5), - BATADV_DIRECTLINK = BIT(6), + BATADV_NOT_BEST_NEXT_HOP = BIT(0), + BATADV_PRIMARIES_FIRST_HOP = BIT(1), + BATADV_DIRECTLINK = BIT(2), }; /* ICMP message types */ @@ -66,43 +91,44 @@ enum batadv_icmp_packettype { BATADV_PARAMETER_PROBLEM = 12, }; -/* vis defines */ -enum batadv_vis_packettype { - BATADV_VIS_TYPE_SERVER_SYNC = 0, - BATADV_VIS_TYPE_CLIENT_UPDATE = 1, -}; - -/* fragmentation defines */ -enum batadv_unicast_frag_flags { - BATADV_UNI_FRAG_HEAD = BIT(0), - BATADV_UNI_FRAG_LARGETAIL = BIT(1), -}; +/* tt data subtypes */ +#define BATADV_TT_DATA_TYPE_MASK 0x0F -/* TT_QUERY subtypes */ -#define BATADV_TT_QUERY_TYPE_MASK 0x3 - -enum batadv_tt_query_packettype { - BATADV_TT_REQUEST = 0, - BATADV_TT_RESPONSE = 1, -}; - -/* TT_QUERY flags */ -enum batadv_tt_query_flags { - BATADV_TT_FULL_TABLE = BIT(2), +/** + * enum batadv_tt_data_flags - flags for tt data tvlv + * @BATADV_TT_OGM_DIFF: TT diff propagated through OGM + * @BATADV_TT_REQUEST: TT request message + * @BATADV_TT_RESPONSE: TT response message + * @BATADV_TT_FULL_TABLE: contains full table to replace existing table + */ +enum batadv_tt_data_flags { + BATADV_TT_OGM_DIFF = BIT(0), + BATADV_TT_REQUEST = BIT(1), + BATADV_TT_RESPONSE = BIT(2), + BATADV_TT_FULL_TABLE = BIT(4), }; /* BATADV_TT_CLIENT flags. * Flags from BIT(0) to BIT(7) are sent on the wire, while flags from BIT(8) to - * BIT(15) are used for local computation only + * BIT(15) are used for local computation only. + * Flags from BIT(4) to BIT(7) are kept in sync with the rest of the network. */ enum batadv_tt_client_flags { BATADV_TT_CLIENT_DEL = BIT(0), BATADV_TT_CLIENT_ROAM = BIT(1), - BATADV_TT_CLIENT_WIFI = BIT(2), - BATADV_TT_CLIENT_TEMP = BIT(3), + BATADV_TT_CLIENT_WIFI = BIT(4), BATADV_TT_CLIENT_NOPURGE = BIT(8), BATADV_TT_CLIENT_NEW = BIT(9), BATADV_TT_CLIENT_PENDING = BIT(10), + BATADV_TT_CLIENT_TEMP = BIT(11), +}; + +/** + * batadv_vlan_flags - flags for the four MSB of any vlan ID field + * @BATADV_VLAN_HAS_TAG: whether the field contains a valid vlan tag or not + */ +enum batadv_vlan_flags { + BATADV_VLAN_HAS_TAG = BIT(15), }; /* claim frame types for the bridge loop avoidance */ @@ -113,6 +139,22 @@ enum batadv_bla_claimframe { BATADV_CLAIM_TYPE_REQUEST = 0x03, }; +/** + * enum batadv_tvlv_type - tvlv type definitions + * @BATADV_TVLV_GW: gateway tvlv + * @BATADV_TVLV_DAT: distributed arp table tvlv + * @BATADV_TVLV_NC: network coding tvlv + * @BATADV_TVLV_TT: translation table tvlv + * @BATADV_TVLV_ROAM: roaming advertisement tvlv + */ +enum batadv_tvlv_type { + BATADV_TVLV_GW = 0x01, + BATADV_TVLV_DAT = 0x02, + BATADV_TVLV_NC = 0x03, + BATADV_TVLV_TT = 0x04, + BATADV_TVLV_ROAM = 0x05, +}; + /* the destination hardware field in the ARP frame is used to * transport the claim type and the group id */ @@ -131,47 +173,74 @@ struct batadv_header { */ }; +/** + * struct batadv_ogm_packet - ogm (routing protocol) packet + * @header: common batman packet header + * @flags: contains routing relevant flags - see enum batadv_iv_flags + * @tvlv_len: length of tvlv data following the ogm header + */ struct batadv_ogm_packet { struct batadv_header header; - uint8_t flags; /* 0x40: DIRECTLINK flag, 0x20 VIS_SERVER flag... */ + uint8_t flags; __be32 seqno; uint8_t orig[ETH_ALEN]; uint8_t prev_sender[ETH_ALEN]; - uint8_t gw_flags; /* flags related to gateway class */ + uint8_t reserved; uint8_t tq; - uint8_t tt_num_changes; - uint8_t ttvn; /* translation table version number */ - __be16 tt_crc; -} __packed; + __be16 tvlv_len; + /* __packed is not needed as the struct size is divisible by 4, + * and the largest data type in this struct has a size of 4. + */ +}; #define BATADV_OGM_HLEN sizeof(struct batadv_ogm_packet) -struct batadv_icmp_packet { +/** + * batadv_icmp_header - common ICMP header + * @header: common batman header + * @msg_type: ICMP packet type + * @dst: address of the destination node + * @orig: address of the source node + * @uid: local ICMP socket identifier + */ +struct batadv_icmp_header { struct batadv_header header; uint8_t msg_type; /* see ICMP message types above */ uint8_t dst[ETH_ALEN]; uint8_t orig[ETH_ALEN]; - __be16 seqno; uint8_t uid; +}; + +/** + * batadv_icmp_packet - ICMP packet + * @icmph: common ICMP header + * @reserved: not used - useful for alignment + * @seqno: ICMP sequence number + */ +struct batadv_icmp_packet { + struct batadv_icmp_header icmph; uint8_t reserved; + __be16 seqno; }; #define BATADV_RR_LEN 16 -/* icmp_packet_rr must start with all fields from imcp_packet - * as this is assumed by code that handles ICMP packets +/** + * batadv_icmp_packet_rr - ICMP RouteRecord packet + * @icmph: common ICMP header + * @rr_cur: number of entries the rr array + * @seqno: ICMP sequence number + * @rr: route record array */ struct batadv_icmp_packet_rr { - struct batadv_header header; - uint8_t msg_type; /* see ICMP message types above */ - uint8_t dst[ETH_ALEN]; - uint8_t orig[ETH_ALEN]; - __be16 seqno; - uint8_t uid; + struct batadv_icmp_header icmph; uint8_t rr_cur; + __be16 seqno; uint8_t rr[BATADV_RR_LEN][ETH_ALEN]; }; +#define BATADV_ICMP_MAX_PACKET_SIZE sizeof(struct batadv_icmp_packet_rr) + /* All packet headers in front of an ethernet header have to be completely * divisible by 2 but not by 4 to make the payload after the ethernet * header again 4 bytes boundary aligned. @@ -209,15 +278,32 @@ struct batadv_unicast_4addr_packet { */ }; -struct batadv_unicast_frag_packet { - struct batadv_header header; - uint8_t ttvn; /* destination translation table version number */ - uint8_t dest[ETH_ALEN]; - uint8_t flags; - uint8_t align; - uint8_t orig[ETH_ALEN]; - __be16 seqno; -} __packed; +/** + * struct batadv_frag_packet - fragmented packet + * @header: common batman packet header with type, compatversion, and ttl + * @dest: final destination used when routing fragments + * @orig: originator of the fragment used when merging the packet + * @no: fragment number within this sequence + * @reserved: reserved byte for alignment + * @seqno: sequence identification + * @total_size: size of the merged packet + */ +struct batadv_frag_packet { + struct batadv_header header; +#if defined(__BIG_ENDIAN_BITFIELD) + uint8_t no:4; + uint8_t reserved:4; +#elif defined(__LITTLE_ENDIAN_BITFIELD) + uint8_t reserved:4; + uint8_t no:4; +#else +#error "unknown bitfield endianess" +#endif + uint8_t dest[ETH_ALEN]; + uint8_t orig[ETH_ALEN]; + __be16 seqno; + __be16 total_size; +}; struct batadv_bcast_packet { struct batadv_header header; @@ -231,54 +317,6 @@ struct batadv_bcast_packet { #pragma pack() -struct batadv_vis_packet { - struct batadv_header header; - uint8_t vis_type; /* which type of vis-participant sent this? */ - __be32 seqno; /* sequence number */ - uint8_t entries; /* number of entries behind this struct */ - uint8_t reserved; - uint8_t vis_orig[ETH_ALEN]; /* originator reporting its neighbors */ - uint8_t target_orig[ETH_ALEN]; /* who should receive this packet */ - uint8_t sender_orig[ETH_ALEN]; /* who sent or forwarded this packet */ -}; - -struct batadv_tt_query_packet { - struct batadv_header header; - /* the flag field is a combination of: - * - TT_REQUEST or TT_RESPONSE - * - TT_FULL_TABLE - */ - uint8_t flags; - uint8_t dst[ETH_ALEN]; - uint8_t src[ETH_ALEN]; - /* the ttvn field is: - * if TT_REQUEST: ttvn that triggered the - * request - * if TT_RESPONSE: new ttvn for the src - * orig_node - */ - uint8_t ttvn; - /* tt_data field is: - * if TT_REQUEST: crc associated with the - * ttvn - * if TT_RESPONSE: table_size - */ - __be16 tt_data; -} __packed; - -struct batadv_roam_adv_packet { - struct batadv_header header; - uint8_t reserved; - uint8_t dst[ETH_ALEN]; - uint8_t src[ETH_ALEN]; - uint8_t client[ETH_ALEN]; -} __packed; - -struct batadv_tt_change { - uint8_t flags; - uint8_t addr[ETH_ALEN]; -} __packed; - /** * struct batadv_coded_packet - network coded packet * @header: common batman packet header and ttl of first included packet @@ -311,4 +349,96 @@ struct batadv_coded_packet { __be16 coded_len; }; +/** + * struct batadv_unicast_tvlv - generic unicast packet with tvlv payload + * @header: common batman packet header + * @reserved: reserved field (for packet alignment) + * @src: address of the source + * @dst: address of the destination + * @tvlv_len: length of tvlv data following the unicast tvlv header + * @align: 2 bytes to align the header to a 4 byte boundry + */ +struct batadv_unicast_tvlv_packet { + struct batadv_header header; + uint8_t reserved; + uint8_t dst[ETH_ALEN]; + uint8_t src[ETH_ALEN]; + __be16 tvlv_len; + uint16_t align; +}; + +/** + * struct batadv_tvlv_hdr - base tvlv header struct + * @type: tvlv container type (see batadv_tvlv_type) + * @version: tvlv container version + * @len: tvlv container length + */ +struct batadv_tvlv_hdr { + uint8_t type; + uint8_t version; + __be16 len; +}; + +/** + * struct batadv_tvlv_gateway_data - gateway data propagated through gw tvlv + * container + * @bandwidth_down: advertised uplink download bandwidth + * @bandwidth_up: advertised uplink upload bandwidth + */ +struct batadv_tvlv_gateway_data { + __be32 bandwidth_down; + __be32 bandwidth_up; +}; + +/** + * struct batadv_tvlv_tt_data - tt data propagated through the tt tvlv container + * @flags: translation table flags (see batadv_tt_data_flags) + * @ttvn: translation table version number + * @vlan_num: number of announced VLANs. In the TVLV this struct is followed by + * one batadv_tvlv_tt_vlan_data object per announced vlan + */ +struct batadv_tvlv_tt_data { + uint8_t flags; + uint8_t ttvn; + __be16 num_vlan; +}; + +/** + * struct batadv_tvlv_tt_vlan_data - vlan specific tt data propagated through + * the tt tvlv container + * @crc: crc32 checksum of the entries belonging to this vlan + * @vid: vlan identifier + * @reserved: unused, useful for alignment purposes + */ +struct batadv_tvlv_tt_vlan_data { + __be32 crc; + __be16 vid; + uint16_t reserved; +}; + +/** + * struct batadv_tvlv_tt_change - translation table diff data + * @flags: status indicators concerning the non-mesh client (see + * batadv_tt_client_flags) + * @reserved: reserved field + * @addr: mac address of non-mesh client that triggered this tt change + * @vid: VLAN identifier + */ +struct batadv_tvlv_tt_change { + uint8_t flags; + uint8_t reserved; + uint8_t addr[ETH_ALEN]; + __be16 vid; +}; + +/** + * struct batadv_tvlv_roam_adv - roaming advertisement + * @client: mac address of roaming client + * @vid: VLAN identifier + */ +struct batadv_tvlv_roam_adv { + uint8_t client[ETH_ALEN]; + __be16 vid; +}; + #endif /* _NET_BATMAN_ADV_PACKET_H_ */ diff --git a/net/batman-adv/routing.c b/net/batman-adv/routing.c index 0439395d7ba5..d4114d775ad6 100644 --- a/net/batman-adv/routing.c +++ b/net/batman-adv/routing.c @@ -25,11 +25,12 @@ #include "icmp_socket.h" #include "translation-table.h" #include "originator.h" -#include "vis.h" -#include "unicast.h" #include "bridge_loop_avoidance.h" #include "distributed-arp-table.h" #include "network-coding.h" +#include "fragmentation.h" + +#include <linux/if_vlan.h> static int batadv_route_unicast_packet(struct sk_buff *skb, struct batadv_hard_iface *recv_if); @@ -46,7 +47,7 @@ static void _batadv_update_route(struct batadv_priv *bat_priv, if ((curr_router) && (!neigh_node)) { batadv_dbg(BATADV_DBG_ROUTES, bat_priv, "Deleting route towards: %pM\n", orig_node->orig); - batadv_tt_global_del_orig(bat_priv, orig_node, + batadv_tt_global_del_orig(bat_priv, orig_node, -1, "Deleted route towards originator"); /* route added */ @@ -114,9 +115,19 @@ out: return; } -void batadv_bonding_candidate_add(struct batadv_orig_node *orig_node, +/** + * batadv_bonding_candidate_add - consider a new link for bonding mode towards + * the given originator + * @bat_priv: the bat priv with all the soft interface information + * @orig_node: the target node + * @neigh_node: the neighbor representing the new link to consider for bonding + * mode + */ +void batadv_bonding_candidate_add(struct batadv_priv *bat_priv, + struct batadv_orig_node *orig_node, struct batadv_neigh_node *neigh_node) { + struct batadv_algo_ops *bao = bat_priv->bat_algo_ops; struct batadv_neigh_node *tmp_neigh_node, *router = NULL; uint8_t interference_candidate = 0; @@ -131,8 +142,9 @@ void batadv_bonding_candidate_add(struct batadv_orig_node *orig_node, if (!router) goto candidate_del; + /* ... and is good enough to be considered */ - if (neigh_node->tq_avg < router->tq_avg - BATADV_BONDING_TQ_THRESHOLD) + if (bao->bat_neigh_is_equiv_or_better(neigh_node, router)) goto candidate_del; /* check if we have another candidate with the same mac address or @@ -248,46 +260,65 @@ bool batadv_check_management_packet(struct sk_buff *skb, return true; } +/** + * batadv_recv_my_icmp_packet - receive an icmp packet locally + * @bat_priv: the bat priv with all the soft interface information + * @skb: icmp packet to process + * + * Returns NET_RX_SUCCESS if the packet has been consumed or NET_RX_DROP + * otherwise. + */ static int batadv_recv_my_icmp_packet(struct batadv_priv *bat_priv, - struct sk_buff *skb, size_t icmp_len) + struct sk_buff *skb) { struct batadv_hard_iface *primary_if = NULL; struct batadv_orig_node *orig_node = NULL; - struct batadv_icmp_packet_rr *icmp_packet; - int ret = NET_RX_DROP; + struct batadv_icmp_header *icmph; + int res, ret = NET_RX_DROP; - icmp_packet = (struct batadv_icmp_packet_rr *)skb->data; + icmph = (struct batadv_icmp_header *)skb->data; - /* add data to device queue */ - if (icmp_packet->msg_type != BATADV_ECHO_REQUEST) { - batadv_socket_receive_packet(icmp_packet, icmp_len); - goto out; - } + switch (icmph->msg_type) { + case BATADV_ECHO_REPLY: + case BATADV_DESTINATION_UNREACHABLE: + case BATADV_TTL_EXCEEDED: + /* receive the packet */ + if (skb_linearize(skb) < 0) + break; - primary_if = batadv_primary_if_get_selected(bat_priv); - if (!primary_if) - goto out; + batadv_socket_receive_packet(icmph, skb->len); + break; + case BATADV_ECHO_REQUEST: + /* answer echo request (ping) */ + primary_if = batadv_primary_if_get_selected(bat_priv); + if (!primary_if) + goto out; - /* answer echo request (ping) */ - /* get routing information */ - orig_node = batadv_orig_hash_find(bat_priv, icmp_packet->orig); - if (!orig_node) - goto out; + /* get routing information */ + orig_node = batadv_orig_hash_find(bat_priv, icmph->orig); + if (!orig_node) + goto out; - /* create a copy of the skb, if needed, to modify it. */ - if (skb_cow(skb, ETH_HLEN) < 0) - goto out; + /* create a copy of the skb, if needed, to modify it. */ + if (skb_cow(skb, ETH_HLEN) < 0) + goto out; - icmp_packet = (struct batadv_icmp_packet_rr *)skb->data; + icmph = (struct batadv_icmp_header *)skb->data; - memcpy(icmp_packet->dst, icmp_packet->orig, ETH_ALEN); - memcpy(icmp_packet->orig, primary_if->net_dev->dev_addr, ETH_ALEN); - icmp_packet->msg_type = BATADV_ECHO_REPLY; - icmp_packet->header.ttl = BATADV_TTL; + memcpy(icmph->dst, icmph->orig, ETH_ALEN); + memcpy(icmph->orig, primary_if->net_dev->dev_addr, ETH_ALEN); + icmph->msg_type = BATADV_ECHO_REPLY; + icmph->header.ttl = BATADV_TTL; - if (batadv_send_skb_to_orig(skb, orig_node, NULL) != NET_XMIT_DROP) - ret = NET_RX_SUCCESS; + res = batadv_send_skb_to_orig(skb, orig_node, NULL); + if (res != NET_XMIT_DROP) + ret = NET_RX_SUCCESS; + break; + default: + /* drop unknown type */ + goto out; + } out: if (primary_if) batadv_hardif_free_ref(primary_if); @@ -307,9 +338,9 @@ static int batadv_recv_icmp_ttl_exceeded(struct batadv_priv *bat_priv, icmp_packet = (struct batadv_icmp_packet *)skb->data; /* send TTL exceeded if packet is an echo request (traceroute) */ - if (icmp_packet->msg_type != BATADV_ECHO_REQUEST) { + if (icmp_packet->icmph.msg_type != BATADV_ECHO_REQUEST) { pr_debug("Warning - can't forward icmp packet from %pM to %pM: ttl exceeded\n", - icmp_packet->orig, icmp_packet->dst); + icmp_packet->icmph.orig, icmp_packet->icmph.dst); goto out; } @@ -318,7 +349,7 @@ static int batadv_recv_icmp_ttl_exceeded(struct batadv_priv *bat_priv, goto out; /* get routing information */ - orig_node = batadv_orig_hash_find(bat_priv, icmp_packet->orig); + orig_node = batadv_orig_hash_find(bat_priv, icmp_packet->icmph.orig); if (!orig_node) goto out; @@ -328,10 +359,11 @@ static int batadv_recv_icmp_ttl_exceeded(struct batadv_priv *bat_priv, icmp_packet = (struct batadv_icmp_packet *)skb->data; - memcpy(icmp_packet->dst, icmp_packet->orig, ETH_ALEN); - memcpy(icmp_packet->orig, primary_if->net_dev->dev_addr, ETH_ALEN); - icmp_packet->msg_type = BATADV_TTL_EXCEEDED; - icmp_packet->header.ttl = BATADV_TTL; + memcpy(icmp_packet->icmph.dst, icmp_packet->icmph.orig, ETH_ALEN); + memcpy(icmp_packet->icmph.orig, primary_if->net_dev->dev_addr, + ETH_ALEN); + icmp_packet->icmph.msg_type = BATADV_TTL_EXCEEDED; + icmp_packet->icmph.header.ttl = BATADV_TTL; if (batadv_send_skb_to_orig(skb, orig_node, NULL) != NET_XMIT_DROP) ret = NET_RX_SUCCESS; @@ -349,16 +381,13 @@ int batadv_recv_icmp_packet(struct sk_buff *skb, struct batadv_hard_iface *recv_if) { struct batadv_priv *bat_priv = netdev_priv(recv_if->soft_iface); - struct batadv_icmp_packet_rr *icmp_packet; + struct batadv_icmp_header *icmph; + struct batadv_icmp_packet_rr *icmp_packet_rr; struct ethhdr *ethhdr; struct batadv_orig_node *orig_node = NULL; - int hdr_size = sizeof(struct batadv_icmp_packet); + int hdr_size = sizeof(struct batadv_icmp_header); int ret = NET_RX_DROP; - /* we truncate all incoming icmp packets if they don't match our size */ - if (skb->len >= sizeof(struct batadv_icmp_packet_rr)) - hdr_size = sizeof(struct batadv_icmp_packet_rr); - /* drop packet if it has not necessary minimum size */ if (unlikely(!pskb_may_pull(skb, hdr_size))) goto out; @@ -377,26 +406,39 @@ int batadv_recv_icmp_packet(struct sk_buff *skb, if (!batadv_is_my_mac(bat_priv, ethhdr->h_dest)) goto out; - icmp_packet = (struct batadv_icmp_packet_rr *)skb->data; + icmph = (struct batadv_icmp_header *)skb->data; /* add record route information if not full */ - if ((hdr_size == sizeof(struct batadv_icmp_packet_rr)) && - (icmp_packet->rr_cur < BATADV_RR_LEN)) { - memcpy(&(icmp_packet->rr[icmp_packet->rr_cur]), + if ((icmph->msg_type == BATADV_ECHO_REPLY || + icmph->msg_type == BATADV_ECHO_REQUEST) && + (skb->len >= sizeof(struct batadv_icmp_packet_rr))) { + if (skb_linearize(skb) < 0) + goto out; + + /* create a copy of the skb, if needed, to modify it. */ + if (skb_cow(skb, ETH_HLEN) < 0) + goto out; + + icmph = (struct batadv_icmp_header *)skb->data; + icmp_packet_rr = (struct batadv_icmp_packet_rr *)icmph; + if (icmp_packet_rr->rr_cur >= BATADV_RR_LEN) + goto out; + + memcpy(&(icmp_packet_rr->rr[icmp_packet_rr->rr_cur]), ethhdr->h_dest, ETH_ALEN); - icmp_packet->rr_cur++; + icmp_packet_rr->rr_cur++; } /* packet for me */ - if (batadv_is_my_mac(bat_priv, icmp_packet->dst)) - return batadv_recv_my_icmp_packet(bat_priv, skb, hdr_size); + if (batadv_is_my_mac(bat_priv, icmph->dst)) + return batadv_recv_my_icmp_packet(bat_priv, skb); /* TTL exceeded */ - if (icmp_packet->header.ttl < 2) + if (icmph->header.ttl < 2) return batadv_recv_icmp_ttl_exceeded(bat_priv, skb); /* get routing information */ - orig_node = batadv_orig_hash_find(bat_priv, icmp_packet->dst); + orig_node = batadv_orig_hash_find(bat_priv, icmph->dst); if (!orig_node) goto out; @@ -404,10 +446,10 @@ int batadv_recv_icmp_packet(struct sk_buff *skb, if (skb_cow(skb, ETH_HLEN) < 0) goto out; - icmp_packet = (struct batadv_icmp_packet_rr *)skb->data; + icmph = (struct batadv_icmp_header *)skb->data; /* decrement ttl */ - icmp_packet->header.ttl--; + icmph->header.ttl--; /* route it */ if (batadv_send_skb_to_orig(skb, orig_node, recv_if) != NET_XMIT_DROP) @@ -474,18 +516,25 @@ out: return router; } -/* Interface Alternating: Use the best of the - * remaining candidates which are not using - * this interface. +/** + * batadv_find_ifalter_router - find the best of the remaining candidates which + * are not using this interface + * @bat_priv: the bat priv with all the soft interface information + * @primary_orig: the destination + * @recv_if: the interface that the router returned by this function has to not + * use * - * Increases the returned router's refcount + * Returns the best candidate towards primary_orig that is not using recv_if. + * Increases the returned neighbor's refcount */ static struct batadv_neigh_node * -batadv_find_ifalter_router(struct batadv_orig_node *primary_orig, +batadv_find_ifalter_router(struct batadv_priv *bat_priv, + struct batadv_orig_node *primary_orig, const struct batadv_hard_iface *recv_if) { - struct batadv_neigh_node *tmp_neigh_node; struct batadv_neigh_node *router = NULL, *first_candidate = NULL; + struct batadv_algo_ops *bao = bat_priv->bat_algo_ops; + struct batadv_neigh_node *tmp_neigh_node; rcu_read_lock(); list_for_each_entry_rcu(tmp_neigh_node, &primary_orig->bond_list, @@ -497,7 +546,7 @@ batadv_find_ifalter_router(struct batadv_orig_node *primary_orig, if (tmp_neigh_node->if_incoming == recv_if) continue; - if (router && tmp_neigh_node->tq_avg <= router->tq_avg) + if (router && bao->bat_neigh_cmp(tmp_neigh_node, router)) continue; if (!atomic_inc_not_zero(&tmp_neigh_node->refcount)) @@ -557,126 +606,6 @@ static int batadv_check_unicast_packet(struct batadv_priv *bat_priv, return 0; } -int batadv_recv_tt_query(struct sk_buff *skb, struct batadv_hard_iface *recv_if) -{ - struct batadv_priv *bat_priv = netdev_priv(recv_if->soft_iface); - struct batadv_tt_query_packet *tt_query; - uint16_t tt_size; - int hdr_size = sizeof(*tt_query); - char tt_flag; - size_t packet_size; - - if (batadv_check_unicast_packet(bat_priv, skb, hdr_size) < 0) - return NET_RX_DROP; - - /* I could need to modify it */ - if (skb_cow(skb, sizeof(struct batadv_tt_query_packet)) < 0) - goto out; - - tt_query = (struct batadv_tt_query_packet *)skb->data; - - switch (tt_query->flags & BATADV_TT_QUERY_TYPE_MASK) { - case BATADV_TT_REQUEST: - batadv_inc_counter(bat_priv, BATADV_CNT_TT_REQUEST_RX); - - /* If we cannot provide an answer the tt_request is - * forwarded - */ - if (!batadv_send_tt_response(bat_priv, tt_query)) { - if (tt_query->flags & BATADV_TT_FULL_TABLE) - tt_flag = 'F'; - else - tt_flag = '.'; - - batadv_dbg(BATADV_DBG_TT, bat_priv, - "Routing TT_REQUEST to %pM [%c]\n", - tt_query->dst, - tt_flag); - return batadv_route_unicast_packet(skb, recv_if); - } - break; - case BATADV_TT_RESPONSE: - batadv_inc_counter(bat_priv, BATADV_CNT_TT_RESPONSE_RX); - - if (batadv_is_my_mac(bat_priv, tt_query->dst)) { - /* packet needs to be linearized to access the TT - * changes - */ - if (skb_linearize(skb) < 0) - goto out; - /* skb_linearize() possibly changed skb->data */ - tt_query = (struct batadv_tt_query_packet *)skb->data; - - tt_size = batadv_tt_len(ntohs(tt_query->tt_data)); - - /* Ensure we have all the claimed data */ - packet_size = sizeof(struct batadv_tt_query_packet); - packet_size += tt_size; - if (unlikely(skb_headlen(skb) < packet_size)) - goto out; - - batadv_handle_tt_response(bat_priv, tt_query); - } else { - if (tt_query->flags & BATADV_TT_FULL_TABLE) - tt_flag = 'F'; - else - tt_flag = '.'; - batadv_dbg(BATADV_DBG_TT, bat_priv, - "Routing TT_RESPONSE to %pM [%c]\n", - tt_query->dst, - tt_flag); - return batadv_route_unicast_packet(skb, recv_if); - } - break; - } - -out: - /* returning NET_RX_DROP will make the caller function kfree the skb */ - return NET_RX_DROP; -} - -int batadv_recv_roam_adv(struct sk_buff *skb, struct batadv_hard_iface *recv_if) -{ - struct batadv_priv *bat_priv = netdev_priv(recv_if->soft_iface); - struct batadv_roam_adv_packet *roam_adv_packet; - struct batadv_orig_node *orig_node; - - if (batadv_check_unicast_packet(bat_priv, skb, - sizeof(*roam_adv_packet)) < 0) - goto out; - - batadv_inc_counter(bat_priv, BATADV_CNT_TT_ROAM_ADV_RX); - - roam_adv_packet = (struct batadv_roam_adv_packet *)skb->data; - - if (!batadv_is_my_mac(bat_priv, roam_adv_packet->dst)) - return batadv_route_unicast_packet(skb, recv_if); - - /* check if it is a backbone gateway. we don't accept - * roaming advertisement from it, as it has the same - * entries as we have. - */ - if (batadv_bla_is_backbone_gw_orig(bat_priv, roam_adv_packet->src)) - goto out; - - orig_node = batadv_orig_hash_find(bat_priv, roam_adv_packet->src); - if (!orig_node) - goto out; - - batadv_dbg(BATADV_DBG_TT, bat_priv, - "Received ROAMING_ADV from %pM (client %pM)\n", - roam_adv_packet->src, roam_adv_packet->client); - - batadv_tt_global_add(bat_priv, orig_node, roam_adv_packet->client, - BATADV_TT_CLIENT_ROAM, - atomic_read(&orig_node->last_ttvn) + 1); - - batadv_orig_node_free_ref(orig_node); -out: - /* returning NET_RX_DROP will make the caller function kfree the skb */ - return NET_RX_DROP; -} - /* find a suitable router for this originator, and use * bonding if possible. increases the found neighbors * refcount. @@ -751,7 +680,8 @@ batadv_find_router(struct batadv_priv *bat_priv, if (bonding_enabled) router = batadv_find_bond_router(primary_orig_node, recv_if); else - router = batadv_find_ifalter_router(primary_orig_node, recv_if); + router = batadv_find_ifalter_router(bat_priv, primary_orig_node, + recv_if); return_router: if (router && router->if_incoming->if_status != BATADV_IF_ACTIVE) @@ -772,11 +702,9 @@ static int batadv_route_unicast_packet(struct sk_buff *skb, { struct batadv_priv *bat_priv = netdev_priv(recv_if->soft_iface); struct batadv_orig_node *orig_node = NULL; - struct batadv_neigh_node *neigh_node = NULL; struct batadv_unicast_packet *unicast_packet; struct ethhdr *ethhdr = eth_hdr(skb); int res, hdr_len, ret = NET_RX_DROP; - struct sk_buff *new_skb; unicast_packet = (struct batadv_unicast_packet *)skb->data; @@ -793,46 +721,12 @@ static int batadv_route_unicast_packet(struct sk_buff *skb, if (!orig_node) goto out; - /* find_router() increases neigh_nodes refcount if found. */ - neigh_node = batadv_find_router(bat_priv, orig_node, recv_if); - - if (!neigh_node) - goto out; - /* create a copy of the skb, if needed, to modify it. */ if (skb_cow(skb, ETH_HLEN) < 0) goto out; - unicast_packet = (struct batadv_unicast_packet *)skb->data; - - if (unicast_packet->header.packet_type == BATADV_UNICAST && - atomic_read(&bat_priv->fragmentation) && - skb->len > neigh_node->if_incoming->net_dev->mtu) { - ret = batadv_frag_send_skb(skb, bat_priv, - neigh_node->if_incoming, - neigh_node->addr); - goto out; - } - - if (unicast_packet->header.packet_type == BATADV_UNICAST_FRAG && - batadv_frag_can_reassemble(skb, - neigh_node->if_incoming->net_dev->mtu)) { - ret = batadv_frag_reassemble_skb(skb, bat_priv, &new_skb); - - if (ret == NET_RX_DROP) - goto out; - - /* packet was buffered for late merge */ - if (!new_skb) { - ret = NET_RX_SUCCESS; - goto out; - } - - skb = new_skb; - unicast_packet = (struct batadv_unicast_packet *)skb->data; - } - /* decrement ttl */ + unicast_packet = (struct batadv_unicast_packet *)skb->data; unicast_packet->header.ttl--; switch (unicast_packet->header.packet_type) { @@ -867,8 +761,6 @@ static int batadv_route_unicast_packet(struct sk_buff *skb, } out: - if (neigh_node) - batadv_neigh_node_free_ref(neigh_node); if (orig_node) batadv_orig_node_free_ref(orig_node); return ret; @@ -879,6 +771,7 @@ out: * @bat_priv: the bat priv with all the soft interface information * @unicast_packet: the unicast header to be updated * @dst_addr: the payload destination + * @vid: VLAN identifier * * Search the translation table for dst_addr and update the unicast header with * the new corresponding information (originator address where the destination @@ -889,21 +782,22 @@ out: static bool batadv_reroute_unicast_packet(struct batadv_priv *bat_priv, struct batadv_unicast_packet *unicast_packet, - uint8_t *dst_addr) + uint8_t *dst_addr, unsigned short vid) { struct batadv_orig_node *orig_node = NULL; struct batadv_hard_iface *primary_if = NULL; bool ret = false; uint8_t *orig_addr, orig_ttvn; - if (batadv_is_my_client(bat_priv, dst_addr)) { + if (batadv_is_my_client(bat_priv, dst_addr, vid)) { primary_if = batadv_primary_if_get_selected(bat_priv); if (!primary_if) goto out; orig_addr = primary_if->net_dev->dev_addr; orig_ttvn = (uint8_t)atomic_read(&bat_priv->tt.vn); } else { - orig_node = batadv_transtable_search(bat_priv, NULL, dst_addr); + orig_node = batadv_transtable_search(bat_priv, NULL, dst_addr, + vid); if (!orig_node) goto out; @@ -930,11 +824,12 @@ out: static int batadv_check_unicast_ttvn(struct batadv_priv *bat_priv, struct sk_buff *skb, int hdr_len) { - uint8_t curr_ttvn, old_ttvn; + struct batadv_unicast_packet *unicast_packet; + struct batadv_hard_iface *primary_if; struct batadv_orig_node *orig_node; + uint8_t curr_ttvn, old_ttvn; struct ethhdr *ethhdr; - struct batadv_hard_iface *primary_if; - struct batadv_unicast_packet *unicast_packet; + unsigned short vid; int is_old_ttvn; /* check if there is enough data before accessing it */ @@ -946,6 +841,7 @@ static int batadv_check_unicast_ttvn(struct batadv_priv *bat_priv, return 0; unicast_packet = (struct batadv_unicast_packet *)skb->data; + vid = batadv_get_vid(skb, hdr_len); ethhdr = (struct ethhdr *)(skb->data + hdr_len); /* check if the destination client was served by this node and it is now @@ -953,9 +849,9 @@ static int batadv_check_unicast_ttvn(struct batadv_priv *bat_priv, * message and that it knows the new destination in the mesh to re-route * the packet to */ - if (batadv_tt_local_client_is_roaming(bat_priv, ethhdr->h_dest)) { + if (batadv_tt_local_client_is_roaming(bat_priv, ethhdr->h_dest, vid)) { if (batadv_reroute_unicast_packet(bat_priv, unicast_packet, - ethhdr->h_dest)) + ethhdr->h_dest, vid)) net_ratelimited_function(batadv_dbg, BATADV_DBG_TT, bat_priv, "Rerouting unicast packet to %pM (dst=%pM): Local Roaming\n", @@ -1001,7 +897,7 @@ static int batadv_check_unicast_ttvn(struct batadv_priv *bat_priv, * target host */ if (batadv_reroute_unicast_packet(bat_priv, unicast_packet, - ethhdr->h_dest)) { + ethhdr->h_dest, vid)) { net_ratelimited_function(batadv_dbg, BATADV_DBG_TT, bat_priv, "Rerouting unicast packet to %pM (dst=%pM): TTVN mismatch old_ttvn=%u new_ttvn=%u\n", unicast_packet->dest, ethhdr->h_dest, @@ -1013,7 +909,7 @@ static int batadv_check_unicast_ttvn(struct batadv_priv *bat_priv, * currently served by this node or there is no destination at all and * it is possible to drop the packet */ - if (!batadv_is_my_client(bat_priv, ethhdr->h_dest)) + if (!batadv_is_my_client(bat_priv, ethhdr->h_dest, vid)) return 0; /* update the header in order to let the packet be delivered to this @@ -1032,6 +928,34 @@ static int batadv_check_unicast_ttvn(struct batadv_priv *bat_priv, return 1; } +/** + * batadv_recv_unhandled_unicast_packet - receive and process packets which + * are in the unicast number space but not yet known to the implementation + * @skb: unicast tvlv packet to process + * @recv_if: pointer to interface this packet was received on + * + * Returns NET_RX_SUCCESS if the packet has been consumed or NET_RX_DROP + * otherwise. + */ +int batadv_recv_unhandled_unicast_packet(struct sk_buff *skb, + struct batadv_hard_iface *recv_if) +{ + struct batadv_unicast_packet *unicast_packet; + struct batadv_priv *bat_priv = netdev_priv(recv_if->soft_iface); + int check, hdr_size = sizeof(*unicast_packet); + + check = batadv_check_unicast_packet(bat_priv, skb, hdr_size); + if (check < 0) + return NET_RX_DROP; + + /* we don't know about this type, drop it. */ + unicast_packet = (struct batadv_unicast_packet *)skb->data; + if (batadv_is_my_mac(bat_priv, unicast_packet->dest)) + return NET_RX_DROP; + + return batadv_route_unicast_packet(skb, recv_if); +} + int batadv_recv_unicast_packet(struct sk_buff *skb, struct batadv_hard_iface *recv_if) { @@ -1094,51 +1018,112 @@ rx_success: return batadv_route_unicast_packet(skb, recv_if); } -int batadv_recv_ucast_frag_packet(struct sk_buff *skb, - struct batadv_hard_iface *recv_if) +/** + * batadv_recv_unicast_tvlv - receive and process unicast tvlv packets + * @skb: unicast tvlv packet to process + * @recv_if: pointer to interface this packet was received on + * @dst_addr: the payload destination + * + * Returns NET_RX_SUCCESS if the packet has been consumed or NET_RX_DROP + * otherwise. + */ +int batadv_recv_unicast_tvlv(struct sk_buff *skb, + struct batadv_hard_iface *recv_if) { struct batadv_priv *bat_priv = netdev_priv(recv_if->soft_iface); - struct batadv_unicast_frag_packet *unicast_packet; - int hdr_size = sizeof(*unicast_packet); - struct sk_buff *new_skb = NULL; - int ret; + struct batadv_unicast_tvlv_packet *unicast_tvlv_packet; + unsigned char *tvlv_buff; + uint16_t tvlv_buff_len; + int hdr_size = sizeof(*unicast_tvlv_packet); + int ret = NET_RX_DROP; if (batadv_check_unicast_packet(bat_priv, skb, hdr_size) < 0) return NET_RX_DROP; - if (!batadv_check_unicast_ttvn(bat_priv, skb, hdr_size)) + /* the header is likely to be modified while forwarding */ + if (skb_cow(skb, hdr_size) < 0) return NET_RX_DROP; - unicast_packet = (struct batadv_unicast_frag_packet *)skb->data; + /* packet needs to be linearized to access the tvlv content */ + if (skb_linearize(skb) < 0) + return NET_RX_DROP; - /* packet for me */ - if (batadv_is_my_mac(bat_priv, unicast_packet->dest)) { - ret = batadv_frag_reassemble_skb(skb, bat_priv, &new_skb); + unicast_tvlv_packet = (struct batadv_unicast_tvlv_packet *)skb->data; - if (ret == NET_RX_DROP) - return NET_RX_DROP; + tvlv_buff = (unsigned char *)(skb->data + hdr_size); + tvlv_buff_len = ntohs(unicast_tvlv_packet->tvlv_len); - /* packet was buffered for late merge */ - if (!new_skb) - return NET_RX_SUCCESS; + if (tvlv_buff_len > skb->len - hdr_size) + return NET_RX_DROP; - if (batadv_dat_snoop_incoming_arp_request(bat_priv, new_skb, - hdr_size)) - goto rx_success; - if (batadv_dat_snoop_incoming_arp_reply(bat_priv, new_skb, - hdr_size)) - goto rx_success; + ret = batadv_tvlv_containers_process(bat_priv, false, NULL, + unicast_tvlv_packet->src, + unicast_tvlv_packet->dst, + tvlv_buff, tvlv_buff_len); - batadv_interface_rx(recv_if->soft_iface, new_skb, recv_if, - sizeof(struct batadv_unicast_packet), NULL); + if (ret != NET_RX_SUCCESS) + ret = batadv_route_unicast_packet(skb, recv_if); -rx_success: - return NET_RX_SUCCESS; + return ret; +} + +/** + * batadv_recv_frag_packet - process received fragment + * @skb: the received fragment + * @recv_if: interface that the skb is received on + * + * This function does one of the three following things: 1) Forward fragment, if + * the assembled packet will exceed our MTU; 2) Buffer fragment, if we till + * lack further fragments; 3) Merge fragments, if we have all needed parts. + * + * Return NET_RX_DROP if the skb is not consumed, NET_RX_SUCCESS otherwise. + */ +int batadv_recv_frag_packet(struct sk_buff *skb, + struct batadv_hard_iface *recv_if) +{ + struct batadv_priv *bat_priv = netdev_priv(recv_if->soft_iface); + struct batadv_orig_node *orig_node_src = NULL; + struct batadv_frag_packet *frag_packet; + int ret = NET_RX_DROP; + + if (batadv_check_unicast_packet(bat_priv, skb, + sizeof(*frag_packet)) < 0) + goto out; + + frag_packet = (struct batadv_frag_packet *)skb->data; + orig_node_src = batadv_orig_hash_find(bat_priv, frag_packet->orig); + if (!orig_node_src) + goto out; + + /* Route the fragment if it is not for us and too big to be merged. */ + if (!batadv_is_my_mac(bat_priv, frag_packet->dest) && + batadv_frag_skb_fwd(skb, recv_if, orig_node_src)) { + ret = NET_RX_SUCCESS; + goto out; } - return batadv_route_unicast_packet(skb, recv_if); -} + batadv_inc_counter(bat_priv, BATADV_CNT_FRAG_RX); + batadv_add_counter(bat_priv, BATADV_CNT_FRAG_RX_BYTES, skb->len); + + /* Add fragment to buffer and merge if possible. */ + if (!batadv_frag_skb_buffer(&skb, orig_node_src)) + goto out; + /* Deliver merged packet to the appropriate handler, if it was + * merged + */ + if (skb) + batadv_batman_skb_recv(skb, recv_if->net_dev, + &recv_if->batman_adv_ptype, NULL); + + ret = NET_RX_SUCCESS; + +out: + if (orig_node_src) + batadv_orig_node_free_ref(orig_node_src); + + return ret; +} int batadv_recv_bcast_packet(struct sk_buff *skb, struct batadv_hard_iface *recv_if) @@ -1240,53 +1225,3 @@ out: batadv_orig_node_free_ref(orig_node); return ret; } - -int batadv_recv_vis_packet(struct sk_buff *skb, - struct batadv_hard_iface *recv_if) -{ - struct batadv_vis_packet *vis_packet; - struct ethhdr *ethhdr; - struct batadv_priv *bat_priv = netdev_priv(recv_if->soft_iface); - int hdr_size = sizeof(*vis_packet); - - /* keep skb linear */ - if (skb_linearize(skb) < 0) - return NET_RX_DROP; - - if (unlikely(!pskb_may_pull(skb, hdr_size))) - return NET_RX_DROP; - - vis_packet = (struct batadv_vis_packet *)skb->data; - ethhdr = eth_hdr(skb); - - /* not for me */ - if (!batadv_is_my_mac(bat_priv, ethhdr->h_dest)) - return NET_RX_DROP; - - /* ignore own packets */ - if (batadv_is_my_mac(bat_priv, vis_packet->vis_orig)) - return NET_RX_DROP; - - if (batadv_is_my_mac(bat_priv, vis_packet->sender_orig)) - return NET_RX_DROP; - - switch (vis_packet->vis_type) { - case BATADV_VIS_TYPE_SERVER_SYNC: - batadv_receive_server_sync_packet(bat_priv, vis_packet, - skb_headlen(skb)); - break; - - case BATADV_VIS_TYPE_CLIENT_UPDATE: - batadv_receive_client_update_packet(bat_priv, vis_packet, - skb_headlen(skb)); - break; - - default: /* ignore unknown packet */ - break; - } - - /* We take a copy of the data in the packet, so we should - * always free the skbuf. - */ - return NET_RX_DROP; -} diff --git a/net/batman-adv/routing.h b/net/batman-adv/routing.h index 72a29bde2010..19544ddb81b5 100644 --- a/net/batman-adv/routing.h +++ b/net/batman-adv/routing.h @@ -30,23 +30,26 @@ int batadv_recv_icmp_packet(struct sk_buff *skb, struct batadv_hard_iface *recv_if); int batadv_recv_unicast_packet(struct sk_buff *skb, struct batadv_hard_iface *recv_if); -int batadv_recv_ucast_frag_packet(struct sk_buff *skb, - struct batadv_hard_iface *recv_if); +int batadv_recv_frag_packet(struct sk_buff *skb, + struct batadv_hard_iface *iface); int batadv_recv_bcast_packet(struct sk_buff *skb, struct batadv_hard_iface *recv_if); -int batadv_recv_vis_packet(struct sk_buff *skb, - struct batadv_hard_iface *recv_if); int batadv_recv_tt_query(struct sk_buff *skb, struct batadv_hard_iface *recv_if); int batadv_recv_roam_adv(struct sk_buff *skb, struct batadv_hard_iface *recv_if); +int batadv_recv_unicast_tvlv(struct sk_buff *skb, + struct batadv_hard_iface *recv_if); +int batadv_recv_unhandled_unicast_packet(struct sk_buff *skb, + struct batadv_hard_iface *recv_if); struct batadv_neigh_node * batadv_find_router(struct batadv_priv *bat_priv, struct batadv_orig_node *orig_node, const struct batadv_hard_iface *recv_if); void batadv_bonding_candidate_del(struct batadv_orig_node *orig_node, struct batadv_neigh_node *neigh_node); -void batadv_bonding_candidate_add(struct batadv_orig_node *orig_node, +void batadv_bonding_candidate_add(struct batadv_priv *bat_priv, + struct batadv_orig_node *orig_node, struct batadv_neigh_node *neigh_node); void batadv_bonding_save_primary(const struct batadv_orig_node *orig_node, struct batadv_orig_node *orig_neigh_node, diff --git a/net/batman-adv/send.c b/net/batman-adv/send.c index 0266edd0fa7f..c83be5ebaa28 100644 --- a/net/batman-adv/send.c +++ b/net/batman-adv/send.c @@ -24,12 +24,11 @@ #include "translation-table.h" #include "soft-interface.h" #include "hard-interface.h" -#include "vis.h" #include "gateway_common.h" +#include "gateway_client.h" #include "originator.h" #include "network-coding.h" - -#include <linux/if_ether.h> +#include "fragmentation.h" static void batadv_send_outstanding_bcast_packet(struct work_struct *work); @@ -64,10 +63,10 @@ int batadv_send_skb_packet(struct sk_buff *skb, ethhdr = eth_hdr(skb); memcpy(ethhdr->h_source, hard_iface->net_dev->dev_addr, ETH_ALEN); memcpy(ethhdr->h_dest, dst_addr, ETH_ALEN); - ethhdr->h_proto = __constant_htons(ETH_P_BATMAN); + ethhdr->h_proto = htons(ETH_P_BATMAN); skb_set_network_header(skb, ETH_HLEN); - skb->protocol = __constant_htons(ETH_P_BATMAN); + skb->protocol = htons(ETH_P_BATMAN); skb->dev = hard_iface->net_dev; @@ -109,7 +108,19 @@ int batadv_send_skb_to_orig(struct sk_buff *skb, /* batadv_find_router() increases neigh_nodes refcount if found. */ neigh_node = batadv_find_router(bat_priv, orig_node, recv_if); if (!neigh_node) - return ret; + goto out; + + /* Check if the skb is too large to send in one piece and fragment + * it if needed. + */ + if (atomic_read(&bat_priv->fragmentation) && + skb->len > neigh_node->if_incoming->net_dev->mtu) { + /* Fragment and send packet. */ + if (batadv_frag_send_packet(skb, orig_node, neigh_node)) + ret = NET_XMIT_SUCCESS; + + goto out; + } /* try to network code the packet, if it is received on an interface * (i.e. being forwarded). If the packet originates from this node or if @@ -123,11 +134,225 @@ int batadv_send_skb_to_orig(struct sk_buff *skb, ret = NET_XMIT_SUCCESS; } - batadv_neigh_node_free_ref(neigh_node); +out: + if (neigh_node) + batadv_neigh_node_free_ref(neigh_node); + + return ret; +} + +/** + * batadv_send_skb_push_fill_unicast - extend the buffer and initialize the + * common fields for unicast packets + * @skb: the skb carrying the unicast header to initialize + * @hdr_size: amount of bytes to push at the beginning of the skb + * @orig_node: the destination node + * + * Returns false if the buffer extension was not possible or true otherwise. + */ +static bool +batadv_send_skb_push_fill_unicast(struct sk_buff *skb, int hdr_size, + struct batadv_orig_node *orig_node) +{ + struct batadv_unicast_packet *unicast_packet; + uint8_t ttvn = (uint8_t)atomic_read(&orig_node->last_ttvn); + + if (batadv_skb_head_push(skb, hdr_size) < 0) + return false; + + unicast_packet = (struct batadv_unicast_packet *)skb->data; + unicast_packet->header.version = BATADV_COMPAT_VERSION; + /* batman packet type: unicast */ + unicast_packet->header.packet_type = BATADV_UNICAST; + /* set unicast ttl */ + unicast_packet->header.ttl = BATADV_TTL; + /* copy the destination for faster routing */ + memcpy(unicast_packet->dest, orig_node->orig, ETH_ALEN); + /* set the destination tt version number */ + unicast_packet->ttvn = ttvn; + + return true; +} + +/** + * batadv_send_skb_prepare_unicast - encapsulate an skb with a unicast header + * @skb: the skb containing the payload to encapsulate + * @orig_node: the destination node + * + * Returns false if the payload could not be encapsulated or true otherwise. + */ +static bool batadv_send_skb_prepare_unicast(struct sk_buff *skb, + struct batadv_orig_node *orig_node) +{ + size_t uni_size = sizeof(struct batadv_unicast_packet); + + return batadv_send_skb_push_fill_unicast(skb, uni_size, orig_node); +} + +/** + * batadv_send_skb_prepare_unicast_4addr - encapsulate an skb with a + * unicast 4addr header + * @bat_priv: the bat priv with all the soft interface information + * @skb: the skb containing the payload to encapsulate + * @orig_node: the destination node + * @packet_subtype: the unicast 4addr packet subtype to use + * + * Returns false if the payload could not be encapsulated or true otherwise. + */ +bool batadv_send_skb_prepare_unicast_4addr(struct batadv_priv *bat_priv, + struct sk_buff *skb, + struct batadv_orig_node *orig, + int packet_subtype) +{ + struct batadv_hard_iface *primary_if; + struct batadv_unicast_4addr_packet *uc_4addr_packet; + bool ret = false; + + primary_if = batadv_primary_if_get_selected(bat_priv); + if (!primary_if) + goto out; + + /* Pull the header space and fill the unicast_packet substructure. + * We can do that because the first member of the uc_4addr_packet + * is of type struct unicast_packet + */ + if (!batadv_send_skb_push_fill_unicast(skb, sizeof(*uc_4addr_packet), + orig)) + goto out; + + uc_4addr_packet = (struct batadv_unicast_4addr_packet *)skb->data; + uc_4addr_packet->u.header.packet_type = BATADV_UNICAST_4ADDR; + memcpy(uc_4addr_packet->src, primary_if->net_dev->dev_addr, ETH_ALEN); + uc_4addr_packet->subtype = packet_subtype; + uc_4addr_packet->reserved = 0; + + ret = true; +out: + if (primary_if) + batadv_hardif_free_ref(primary_if); + return ret; +} + +/** + * batadv_send_skb_unicast - encapsulate and send an skb via unicast + * @bat_priv: the bat priv with all the soft interface information + * @skb: payload to send + * @packet_type: the batman unicast packet type to use + * @packet_subtype: the unicast 4addr packet subtype (only relevant for unicast + * 4addr packets) + * @orig_node: the originator to send the packet to + * @vid: the vid to be used to search the translation table + * + * Wrap the given skb into a batman-adv unicast or unicast-4addr header + * depending on whether BATADV_UNICAST or BATADV_UNICAST_4ADDR was supplied + * as packet_type. Then send this frame to the given orig_node and release a + * reference to this orig_node. + * + * Returns NET_XMIT_DROP in case of error or NET_XMIT_SUCCESS otherwise. + */ +static int batadv_send_skb_unicast(struct batadv_priv *bat_priv, + struct sk_buff *skb, int packet_type, + int packet_subtype, + struct batadv_orig_node *orig_node, + unsigned short vid) +{ + struct ethhdr *ethhdr = (struct ethhdr *)skb->data; + struct batadv_unicast_packet *unicast_packet; + int ret = NET_XMIT_DROP; + + if (!orig_node) + goto out; + + switch (packet_type) { + case BATADV_UNICAST: + if (!batadv_send_skb_prepare_unicast(skb, orig_node)) + goto out; + break; + case BATADV_UNICAST_4ADDR: + if (!batadv_send_skb_prepare_unicast_4addr(bat_priv, skb, + orig_node, + packet_subtype)) + goto out; + break; + default: + /* this function supports UNICAST and UNICAST_4ADDR only. It + * should never be invoked with any other packet type + */ + goto out; + } + + unicast_packet = (struct batadv_unicast_packet *)skb->data; + + /* inform the destination node that we are still missing a correct route + * for this client. The destination will receive this packet and will + * try to reroute it because the ttvn contained in the header is less + * than the current one + */ + if (batadv_tt_global_client_is_roaming(bat_priv, ethhdr->h_dest, vid)) + unicast_packet->ttvn = unicast_packet->ttvn - 1; + if (batadv_send_skb_to_orig(skb, orig_node, NULL) != NET_XMIT_DROP) + ret = NET_XMIT_SUCCESS; + +out: + if (orig_node) + batadv_orig_node_free_ref(orig_node); + if (ret == NET_XMIT_DROP) + kfree_skb(skb); return ret; } +/** + * batadv_send_skb_via_tt_generic - send an skb via TT lookup + * @bat_priv: the bat priv with all the soft interface information + * @skb: payload to send + * @packet_type: the batman unicast packet type to use + * @packet_subtype: the unicast 4addr packet subtype (only relevant for unicast + * 4addr packets) + * @vid: the vid to be used to search the translation table + * + * Look up the recipient node for the destination address in the ethernet + * header via the translation table. Wrap the given skb into a batman-adv + * unicast or unicast-4addr header depending on whether BATADV_UNICAST or + * BATADV_UNICAST_4ADDR was supplied as packet_type. Then send this frame + * to the according destination node. + * + * Returns NET_XMIT_DROP in case of error or NET_XMIT_SUCCESS otherwise. + */ +int batadv_send_skb_via_tt_generic(struct batadv_priv *bat_priv, + struct sk_buff *skb, int packet_type, + int packet_subtype, unsigned short vid) +{ + struct ethhdr *ethhdr = (struct ethhdr *)skb->data; + struct batadv_orig_node *orig_node; + + orig_node = batadv_transtable_search(bat_priv, ethhdr->h_source, + ethhdr->h_dest, vid); + return batadv_send_skb_unicast(bat_priv, skb, packet_type, + packet_subtype, orig_node, vid); +} + +/** + * batadv_send_skb_via_gw - send an skb via gateway lookup + * @bat_priv: the bat priv with all the soft interface information + * @skb: payload to send + * @vid: the vid to be used to search the translation table + * + * Look up the currently selected gateway. Wrap the given skb into a batman-adv + * unicast header and send this frame to this gateway node. + * + * Returns NET_XMIT_DROP in case of error or NET_XMIT_SUCCESS otherwise. + */ +int batadv_send_skb_via_gw(struct batadv_priv *bat_priv, struct sk_buff *skb, + unsigned short vid) +{ + struct batadv_orig_node *orig_node; + + orig_node = batadv_gw_get_selected_orig(bat_priv); + return batadv_send_skb_unicast(bat_priv, skb, BATADV_UNICAST, 0, + orig_node, vid); +} + void batadv_schedule_bat_ogm(struct batadv_hard_iface *hard_iface) { struct batadv_priv *bat_priv = netdev_priv(hard_iface->soft_iface); diff --git a/net/batman-adv/send.h b/net/batman-adv/send.h index e7b17880fca4..aa2e2537a739 100644 --- a/net/batman-adv/send.h +++ b/net/batman-adv/send.h @@ -34,5 +34,58 @@ void batadv_send_outstanding_bat_ogm_packet(struct work_struct *work); void batadv_purge_outstanding_packets(struct batadv_priv *bat_priv, const struct batadv_hard_iface *hard_iface); +bool batadv_send_skb_prepare_unicast_4addr(struct batadv_priv *bat_priv, + struct sk_buff *skb, + struct batadv_orig_node *orig_node, + int packet_subtype); +int batadv_send_skb_via_tt_generic(struct batadv_priv *bat_priv, + struct sk_buff *skb, int packet_type, + int packet_subtype, unsigned short vid); +int batadv_send_skb_via_gw(struct batadv_priv *bat_priv, struct sk_buff *skb, + unsigned short vid); + +/** + * batadv_send_skb_via_tt - send an skb via TT lookup + * @bat_priv: the bat priv with all the soft interface information + * @skb: the payload to send + * @vid: the vid to be used to search the translation table + * + * Look up the recipient node for the destination address in the ethernet + * header via the translation table. Wrap the given skb into a batman-adv + * unicast header. Then send this frame to the according destination node. + * + * Returns NET_XMIT_DROP in case of error or NET_XMIT_SUCCESS otherwise. + */ +static inline int batadv_send_skb_via_tt(struct batadv_priv *bat_priv, + struct sk_buff *skb, + unsigned short vid) +{ + return batadv_send_skb_via_tt_generic(bat_priv, skb, BATADV_UNICAST, 0, + vid); +} + +/** + * batadv_send_skb_via_tt_4addr - send an skb via TT lookup + * @bat_priv: the bat priv with all the soft interface information + * @skb: the payload to send + * @packet_subtype: the unicast 4addr packet subtype to use + * @vid: the vid to be used to search the translation table + * + * Look up the recipient node for the destination address in the ethernet + * header via the translation table. Wrap the given skb into a batman-adv + * unicast-4addr header. Then send this frame to the according destination + * node. + * + * Returns NET_XMIT_DROP in case of error or NET_XMIT_SUCCESS otherwise. + */ +static inline int batadv_send_skb_via_tt_4addr(struct batadv_priv *bat_priv, + struct sk_buff *skb, + int packet_subtype, + unsigned short vid) +{ + return batadv_send_skb_via_tt_generic(bat_priv, skb, + BATADV_UNICAST_4ADDR, + packet_subtype, vid); +} #endif /* _NET_BATMAN_ADV_SEND_H_ */ diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c index 813db4e64602..36f050876f82 100644 --- a/net/batman-adv/soft-interface.c +++ b/net/batman-adv/soft-interface.c @@ -34,8 +34,6 @@ #include <linux/ethtool.h> #include <linux/etherdevice.h> #include <linux/if_vlan.h> -#include <linux/if_ether.h> -#include "unicast.h" #include "bridge_loop_avoidance.h" #include "network-coding.h" @@ -120,9 +118,10 @@ static int batadv_interface_set_mac_addr(struct net_device *dev, void *p) /* only modify transtable if it has been initialized before */ if (atomic_read(&bat_priv->mesh_state) == BATADV_MESH_ACTIVE) { - batadv_tt_local_remove(bat_priv, old_addr, + batadv_tt_local_remove(bat_priv, old_addr, BATADV_NO_FLAGS, "mac address changed", false); - batadv_tt_local_add(dev, addr->sa_data, BATADV_NULL_IFINDEX); + batadv_tt_local_add(dev, addr->sa_data, BATADV_NO_FLAGS, + BATADV_NULL_IFINDEX); } return 0; @@ -139,36 +138,48 @@ static int batadv_interface_change_mtu(struct net_device *dev, int new_mtu) return 0; } +/** + * batadv_interface_set_rx_mode - set the rx mode of a device + * @dev: registered network device to modify + * + * We do not actually need to set any rx filters for the virtual batman + * soft interface. However a dummy handler enables a user to set static + * multicast listeners for instance. + */ +static void batadv_interface_set_rx_mode(struct net_device *dev) +{ +} + static int batadv_interface_tx(struct sk_buff *skb, struct net_device *soft_iface) { - struct ethhdr *ethhdr = (struct ethhdr *)skb->data; + struct ethhdr *ethhdr; struct batadv_priv *bat_priv = netdev_priv(soft_iface); struct batadv_hard_iface *primary_if = NULL; struct batadv_bcast_packet *bcast_packet; - struct vlan_ethhdr *vhdr; - __be16 ethertype = __constant_htons(ETH_P_BATMAN); + __be16 ethertype = htons(ETH_P_BATMAN); static const uint8_t stp_addr[ETH_ALEN] = {0x01, 0x80, 0xC2, 0x00, 0x00, 0x00}; static const uint8_t ectp_addr[ETH_ALEN] = {0xCF, 0x00, 0x00, 0x00, 0x00, 0x00}; + struct vlan_ethhdr *vhdr; unsigned int header_len = 0; int data_len = skb->len, ret; - unsigned short vid __maybe_unused = BATADV_NO_FLAGS; - bool do_bcast = false; - uint32_t seqno; unsigned long brd_delay = 1; + bool do_bcast = false, client_added; + unsigned short vid; + uint32_t seqno; if (atomic_read(&bat_priv->mesh_state) != BATADV_MESH_ACTIVE) goto dropped; soft_iface->trans_start = jiffies; + vid = batadv_get_vid(skb, 0); + ethhdr = (struct ethhdr *)skb->data; switch (ntohs(ethhdr->h_proto)) { case ETH_P_8021Q: vhdr = (struct vlan_ethhdr *)skb->data; - vid = ntohs(vhdr->h_vlan_TCI) & VLAN_VID_MASK; - vid |= BATADV_VLAN_HAS_TAG; if (vhdr->h_vlan_encapsulated_proto != ethertype) break; @@ -185,8 +196,12 @@ static int batadv_interface_tx(struct sk_buff *skb, ethhdr = (struct ethhdr *)skb->data; /* Register the client MAC in the transtable */ - if (!is_multicast_ether_addr(ethhdr->h_source)) - batadv_tt_local_add(soft_iface, ethhdr->h_source, skb->skb_iif); + if (!is_multicast_ether_addr(ethhdr->h_source)) { + client_added = batadv_tt_local_add(soft_iface, ethhdr->h_source, + vid, skb->skb_iif); + if (!client_added) + goto dropped; + } /* don't accept stp packets. STP does not help in meshes. * better use the bridge loop avoidance ... @@ -286,8 +301,12 @@ static int batadv_interface_tx(struct sk_buff *skb, batadv_dat_snoop_outgoing_arp_reply(bat_priv, skb); - ret = batadv_unicast_send_skb(bat_priv, skb); - if (ret != 0) + if (is_multicast_ether_addr(ethhdr->h_dest)) + ret = batadv_send_skb_via_gw(bat_priv, skb, vid); + else + ret = batadv_send_skb_via_tt(bat_priv, skb, vid); + + if (ret == NET_XMIT_DROP) goto dropped_freed; } @@ -309,12 +328,12 @@ void batadv_interface_rx(struct net_device *soft_iface, struct sk_buff *skb, struct batadv_hard_iface *recv_if, int hdr_size, struct batadv_orig_node *orig_node) { + struct batadv_header *batadv_header = (struct batadv_header *)skb->data; struct batadv_priv *bat_priv = netdev_priv(soft_iface); - struct ethhdr *ethhdr; + __be16 ethertype = htons(ETH_P_BATMAN); struct vlan_ethhdr *vhdr; - struct batadv_header *batadv_header = (struct batadv_header *)skb->data; - unsigned short vid __maybe_unused = BATADV_NO_FLAGS; - __be16 ethertype = __constant_htons(ETH_P_BATMAN); + struct ethhdr *ethhdr; + unsigned short vid; bool is_bcast; is_bcast = (batadv_header->packet_type == BATADV_BCAST); @@ -326,13 +345,12 @@ void batadv_interface_rx(struct net_device *soft_iface, skb_pull_rcsum(skb, hdr_size); skb_reset_mac_header(skb); + vid = batadv_get_vid(skb, hdr_size); ethhdr = eth_hdr(skb); switch (ntohs(ethhdr->h_proto)) { case ETH_P_8021Q: vhdr = (struct vlan_ethhdr *)skb->data; - vid = ntohs(vhdr->h_vlan_TCI) & VLAN_VID_MASK; - vid |= BATADV_VLAN_HAS_TAG; if (vhdr->h_vlan_encapsulated_proto != ethertype) break; @@ -368,9 +386,10 @@ void batadv_interface_rx(struct net_device *soft_iface, if (orig_node) batadv_tt_add_temporary_global_entry(bat_priv, orig_node, - ethhdr->h_source); + ethhdr->h_source, vid); - if (batadv_is_ap_isolated(bat_priv, ethhdr->h_source, ethhdr->h_dest)) + if (batadv_is_ap_isolated(bat_priv, ethhdr->h_source, ethhdr->h_dest, + vid)) goto dropped; netif_rx(skb); @@ -382,6 +401,177 @@ out: return; } +/** + * batadv_softif_vlan_free_ref - decrease the vlan object refcounter and + * possibly free it + * @softif_vlan: the vlan object to release + */ +void batadv_softif_vlan_free_ref(struct batadv_softif_vlan *softif_vlan) +{ + if (atomic_dec_and_test(&softif_vlan->refcount)) + kfree_rcu(softif_vlan, rcu); +} + +/** + * batadv_softif_vlan_get - get the vlan object for a specific vid + * @bat_priv: the bat priv with all the soft interface information + * @vid: the identifier of the vlan object to retrieve + * + * Returns the private data of the vlan matching the vid passed as argument or + * NULL otherwise. The refcounter of the returned object is incremented by 1. + */ +struct batadv_softif_vlan *batadv_softif_vlan_get(struct batadv_priv *bat_priv, + unsigned short vid) +{ + struct batadv_softif_vlan *vlan_tmp, *vlan = NULL; + + rcu_read_lock(); + hlist_for_each_entry_rcu(vlan_tmp, &bat_priv->softif_vlan_list, list) { + if (vlan_tmp->vid != vid) + continue; + + if (!atomic_inc_not_zero(&vlan_tmp->refcount)) + continue; + + vlan = vlan_tmp; + break; + } + rcu_read_unlock(); + + return vlan; +} + +/** + * batadv_create_vlan - allocate the needed resources for a new vlan + * @bat_priv: the bat priv with all the soft interface information + * @vid: the VLAN identifier + * + * Returns 0 on success, a negative error otherwise. + */ +int batadv_softif_create_vlan(struct batadv_priv *bat_priv, unsigned short vid) +{ + struct batadv_softif_vlan *vlan; + int err; + + vlan = batadv_softif_vlan_get(bat_priv, vid); + if (vlan) { + batadv_softif_vlan_free_ref(vlan); + return -EEXIST; + } + + vlan = kzalloc(sizeof(*vlan), GFP_ATOMIC); + if (!vlan) + return -ENOMEM; + + vlan->vid = vid; + atomic_set(&vlan->refcount, 1); + + atomic_set(&vlan->ap_isolation, 0); + + err = batadv_sysfs_add_vlan(bat_priv->soft_iface, vlan); + if (err) { + kfree(vlan); + return err; + } + + /* add a new TT local entry. This one will be marked with the NOPURGE + * flag + */ + batadv_tt_local_add(bat_priv->soft_iface, + bat_priv->soft_iface->dev_addr, vid, + BATADV_NULL_IFINDEX); + + spin_lock_bh(&bat_priv->softif_vlan_list_lock); + hlist_add_head_rcu(&vlan->list, &bat_priv->softif_vlan_list); + spin_unlock_bh(&bat_priv->softif_vlan_list_lock); + + return 0; +} + +/** + * batadv_softif_destroy_vlan - remove and destroy a softif_vlan object + * @bat_priv: the bat priv with all the soft interface information + * @vlan: the object to remove + */ +static void batadv_softif_destroy_vlan(struct batadv_priv *bat_priv, + struct batadv_softif_vlan *vlan) +{ + spin_lock_bh(&bat_priv->softif_vlan_list_lock); + hlist_del_rcu(&vlan->list); + spin_unlock_bh(&bat_priv->softif_vlan_list_lock); + + batadv_sysfs_del_vlan(bat_priv, vlan); + + /* explicitly remove the associated TT local entry because it is marked + * with the NOPURGE flag + */ + batadv_tt_local_remove(bat_priv, bat_priv->soft_iface->dev_addr, + vlan->vid, "vlan interface destroyed", false); + + batadv_softif_vlan_free_ref(vlan); +} + +/** + * batadv_interface_add_vid - ndo_add_vid API implementation + * @dev: the netdev of the mesh interface + * @vid: identifier of the new vlan + * + * Set up all the internal structures for handling the new vlan on top of the + * mesh interface + * + * Returns 0 on success or a negative error code in case of failure. + */ +static int batadv_interface_add_vid(struct net_device *dev, __be16 proto, + unsigned short vid) +{ + struct batadv_priv *bat_priv = netdev_priv(dev); + + /* only 802.1Q vlans are supported. + * batman-adv does not know how to handle other types + */ + if (proto != htons(ETH_P_8021Q)) + return -EINVAL; + + vid |= BATADV_VLAN_HAS_TAG; + + return batadv_softif_create_vlan(bat_priv, vid); +} + +/** + * batadv_interface_kill_vid - ndo_kill_vid API implementation + * @dev: the netdev of the mesh interface + * @vid: identifier of the deleted vlan + * + * Destroy all the internal structures used to handle the vlan identified by vid + * on top of the mesh interface + * + * Returns 0 on success, -EINVAL if the specified prototype is not ETH_P_8021Q + * or -ENOENT if the specified vlan id wasn't registered. + */ +static int batadv_interface_kill_vid(struct net_device *dev, __be16 proto, + unsigned short vid) +{ + struct batadv_priv *bat_priv = netdev_priv(dev); + struct batadv_softif_vlan *vlan; + + /* only 802.1Q vlans are supported. batman-adv does not know how to + * handle other types + */ + if (proto != htons(ETH_P_8021Q)) + return -EINVAL; + + vlan = batadv_softif_vlan_get(bat_priv, vid | BATADV_VLAN_HAS_TAG); + if (!vlan) + return -ENOENT; + + batadv_softif_destroy_vlan(bat_priv, vlan); + + /* finally free the vlan object */ + batadv_softif_vlan_free_ref(vlan); + + return 0; +} + /* batman-adv network devices have devices nesting below it and are a special * "super class" of normal network devices; split their locks off into a * separate class since they always nest. @@ -421,6 +611,7 @@ static void batadv_set_lockdep_class(struct net_device *dev) */ static void batadv_softif_destroy_finish(struct work_struct *work) { + struct batadv_softif_vlan *vlan; struct batadv_priv *bat_priv; struct net_device *soft_iface; @@ -428,6 +619,13 @@ static void batadv_softif_destroy_finish(struct work_struct *work) cleanup_work); soft_iface = bat_priv->soft_iface; + /* destroy the "untagged" VLAN */ + vlan = batadv_softif_vlan_get(bat_priv, BATADV_NO_FLAGS); + if (vlan) { + batadv_softif_destroy_vlan(bat_priv, vlan); + batadv_softif_vlan_free_ref(vlan); + } + batadv_sysfs_del_meshif(soft_iface); rtnl_lock(); @@ -444,6 +642,7 @@ static void batadv_softif_destroy_finish(struct work_struct *work) static int batadv_softif_init_late(struct net_device *dev) { struct batadv_priv *bat_priv; + uint32_t random_seqno; int ret; size_t cnt_len = sizeof(uint64_t) * BATADV_CNT_NUM; @@ -468,17 +667,17 @@ static int batadv_softif_init_late(struct net_device *dev) #ifdef CONFIG_BATMAN_ADV_DAT atomic_set(&bat_priv->distributed_arp_table, 1); #endif - atomic_set(&bat_priv->ap_isolation, 0); - atomic_set(&bat_priv->vis_mode, BATADV_VIS_TYPE_CLIENT_UPDATE); atomic_set(&bat_priv->gw_mode, BATADV_GW_MODE_OFF); atomic_set(&bat_priv->gw_sel_class, 20); - atomic_set(&bat_priv->gw_bandwidth, 41); + atomic_set(&bat_priv->gw.bandwidth_down, 100); + atomic_set(&bat_priv->gw.bandwidth_up, 20); atomic_set(&bat_priv->orig_interval, 1000); atomic_set(&bat_priv->hop_penalty, 30); #ifdef CONFIG_BATMAN_ADV_DEBUG atomic_set(&bat_priv->log_level, 0); #endif atomic_set(&bat_priv->fragmentation, 1); + atomic_set(&bat_priv->packet_size_max, ETH_DATA_LEN); atomic_set(&bat_priv->bcast_queue_left, BATADV_BCAST_QUEUE_LEN); atomic_set(&bat_priv->batman_queue_left, BATADV_BATMAN_QUEUE_LEN); @@ -493,6 +692,10 @@ static int batadv_softif_init_late(struct net_device *dev) bat_priv->tt.last_changeset = NULL; bat_priv->tt.last_changeset_len = 0; + /* randomize initial seqno to avoid collision */ + get_random_bytes(&random_seqno, sizeof(random_seqno)); + atomic_set(&bat_priv->frag_seqno, random_seqno); + bat_priv->primary_if = NULL; bat_priv->num_ifaces = 0; @@ -578,8 +781,11 @@ static const struct net_device_ops batadv_netdev_ops = { .ndo_open = batadv_interface_open, .ndo_stop = batadv_interface_release, .ndo_get_stats = batadv_interface_stats, + .ndo_vlan_rx_add_vid = batadv_interface_add_vid, + .ndo_vlan_rx_kill_vid = batadv_interface_kill_vid, .ndo_set_mac_address = batadv_interface_set_mac_addr, .ndo_change_mtu = batadv_interface_change_mtu, + .ndo_set_rx_mode = batadv_interface_set_rx_mode, .ndo_start_xmit = batadv_interface_tx, .ndo_validate_addr = eth_validate_addr, .ndo_add_slave = batadv_softif_slave_add, @@ -616,6 +822,7 @@ static void batadv_softif_init_early(struct net_device *dev) dev->netdev_ops = &batadv_netdev_ops; dev->destructor = batadv_softif_free; + dev->features |= NETIF_F_HW_VLAN_CTAG_FILTER; dev->tx_queue_len = 0; /* can't call min_mtu, because the needed variables @@ -623,7 +830,7 @@ static void batadv_softif_init_early(struct net_device *dev) */ dev->mtu = ETH_DATA_LEN; /* reserve more space in the skbuff for our header */ - dev->hard_header_len = BATADV_HEADER_LEN; + dev->hard_header_len = batadv_max_header_len(); /* generate random address */ eth_hw_addr_random(dev); @@ -760,6 +967,12 @@ static const struct { { "mgmt_tx_bytes" }, { "mgmt_rx" }, { "mgmt_rx_bytes" }, + { "frag_tx" }, + { "frag_tx_bytes" }, + { "frag_rx" }, + { "frag_rx_bytes" }, + { "frag_fwd" }, + { "frag_fwd_bytes" }, { "tt_request_tx" }, { "tt_request_rx" }, { "tt_response_tx" }, diff --git a/net/batman-adv/soft-interface.h b/net/batman-adv/soft-interface.h index 2f2472c2ea0d..06fc91ff5a02 100644 --- a/net/batman-adv/soft-interface.h +++ b/net/batman-adv/soft-interface.h @@ -28,5 +28,9 @@ struct net_device *batadv_softif_create(const char *name); void batadv_softif_destroy_sysfs(struct net_device *soft_iface); int batadv_softif_is_valid(const struct net_device *net_dev); extern struct rtnl_link_ops batadv_link_ops; +int batadv_softif_create_vlan(struct batadv_priv *bat_priv, unsigned short vid); +void batadv_softif_vlan_free_ref(struct batadv_softif_vlan *softif_vlan); +struct batadv_softif_vlan *batadv_softif_vlan_get(struct batadv_priv *bat_priv, + unsigned short vid); #endif /* _NET_BATMAN_ADV_SOFT_INTERFACE_H_ */ diff --git a/net/batman-adv/sysfs.c b/net/batman-adv/sysfs.c index 4114b961bc2c..6335433310af 100644 --- a/net/batman-adv/sysfs.c +++ b/net/batman-adv/sysfs.c @@ -21,11 +21,12 @@ #include "sysfs.h" #include "translation-table.h" #include "distributed-arp-table.h" +#include "network-coding.h" #include "originator.h" #include "hard-interface.h" +#include "soft-interface.h" #include "gateway_common.h" #include "gateway_client.h" -#include "vis.h" static struct net_device *batadv_kobj_to_netdev(struct kobject *obj) { @@ -39,6 +40,53 @@ static struct batadv_priv *batadv_kobj_to_batpriv(struct kobject *obj) return netdev_priv(net_dev); } +/** + * batadv_vlan_kobj_to_batpriv - convert a vlan kobj in the associated batpriv + * @obj: kobject to covert + * + * Returns the associated batadv_priv struct. + */ +static struct batadv_priv *batadv_vlan_kobj_to_batpriv(struct kobject *obj) +{ + /* VLAN specific attributes are located in the root sysfs folder if they + * refer to the untagged VLAN.. + */ + if (!strcmp(BATADV_SYSFS_IF_MESH_SUBDIR, obj->name)) + return batadv_kobj_to_batpriv(obj); + + /* ..while the attributes for the tagged vlans are located in + * the in the corresponding "vlan%VID" subfolder + */ + return batadv_kobj_to_batpriv(obj->parent); +} + +/** + * batadv_kobj_to_vlan - convert a kobj in the associated softif_vlan struct + * @obj: kobject to covert + * + * Returns the associated softif_vlan struct if found, NULL otherwise. + */ +static struct batadv_softif_vlan * +batadv_kobj_to_vlan(struct batadv_priv *bat_priv, struct kobject *obj) +{ + struct batadv_softif_vlan *vlan_tmp, *vlan = NULL; + + rcu_read_lock(); + hlist_for_each_entry_rcu(vlan_tmp, &bat_priv->softif_vlan_list, list) { + if (vlan_tmp->kobj != obj) + continue; + + if (!atomic_inc_not_zero(&vlan_tmp->refcount)) + continue; + + vlan = vlan_tmp; + break; + } + rcu_read_unlock(); + + return vlan; +} + #define BATADV_UEV_TYPE_VAR "BATTYPE=" #define BATADV_UEV_ACTION_VAR "BATACTION=" #define BATADV_UEV_DATA_VAR "BATDATA=" @@ -53,6 +101,15 @@ static char *batadv_uev_type_str[] = { "gw" }; +/* Use this, if you have customized show and store functions for vlan attrs */ +#define BATADV_ATTR_VLAN(_name, _mode, _show, _store) \ +struct batadv_attribute batadv_attr_vlan_##_name = { \ + .attr = {.name = __stringify(_name), \ + .mode = _mode }, \ + .show = _show, \ + .store = _store, \ +}; + /* Use this, if you have customized show and store functions */ #define BATADV_ATTR(_name, _mode, _show, _store) \ struct batadv_attribute batadv_attr_##_name = { \ @@ -122,6 +179,41 @@ ssize_t batadv_show_##_name(struct kobject *kobj, \ static BATADV_ATTR(_name, _mode, batadv_show_##_name, \ batadv_store_##_name) +#define BATADV_ATTR_VLAN_STORE_BOOL(_name, _post_func) \ +ssize_t batadv_store_vlan_##_name(struct kobject *kobj, \ + struct attribute *attr, char *buff, \ + size_t count) \ +{ \ + struct batadv_priv *bat_priv = batadv_vlan_kobj_to_batpriv(kobj);\ + struct batadv_softif_vlan *vlan = batadv_kobj_to_vlan(bat_priv, \ + kobj); \ + size_t res = __batadv_store_bool_attr(buff, count, _post_func, \ + attr, &vlan->_name, \ + bat_priv->soft_iface); \ + batadv_softif_vlan_free_ref(vlan); \ + return res; \ +} + +#define BATADV_ATTR_VLAN_SHOW_BOOL(_name) \ +ssize_t batadv_show_vlan_##_name(struct kobject *kobj, \ + struct attribute *attr, char *buff) \ +{ \ + struct batadv_priv *bat_priv = batadv_vlan_kobj_to_batpriv(kobj);\ + struct batadv_softif_vlan *vlan = batadv_kobj_to_vlan(bat_priv, \ + kobj); \ + size_t res = sprintf(buff, "%s\n", \ + atomic_read(&vlan->_name) == 0 ? \ + "disabled" : "enabled"); \ + batadv_softif_vlan_free_ref(vlan); \ + return res; \ +} + +/* Use this, if you are going to turn a [name] in the vlan struct on or off */ +#define BATADV_ATTR_VLAN_BOOL(_name, _mode, _post_func) \ + static BATADV_ATTR_VLAN_STORE_BOOL(_name, _post_func) \ + static BATADV_ATTR_VLAN_SHOW_BOOL(_name) \ + static BATADV_ATTR_VLAN(_name, _mode, batadv_show_vlan_##_name, \ + batadv_store_vlan_##_name) static int batadv_store_bool_attr(char *buff, size_t count, struct net_device *net_dev, @@ -230,74 +322,6 @@ __batadv_store_uint_attr(const char *buff, size_t count, return ret; } -static ssize_t batadv_show_vis_mode(struct kobject *kobj, - struct attribute *attr, char *buff) -{ - struct batadv_priv *bat_priv = batadv_kobj_to_batpriv(kobj); - int vis_mode = atomic_read(&bat_priv->vis_mode); - const char *mode; - - if (vis_mode == BATADV_VIS_TYPE_CLIENT_UPDATE) - mode = "client"; - else - mode = "server"; - - return sprintf(buff, "%s\n", mode); -} - -static ssize_t batadv_store_vis_mode(struct kobject *kobj, - struct attribute *attr, char *buff, - size_t count) -{ - struct net_device *net_dev = batadv_kobj_to_netdev(kobj); - struct batadv_priv *bat_priv = netdev_priv(net_dev); - unsigned long val; - int ret, vis_mode_tmp = -1; - const char *old_mode, *new_mode; - - ret = kstrtoul(buff, 10, &val); - - if (((count == 2) && (!ret) && - (val == BATADV_VIS_TYPE_CLIENT_UPDATE)) || - (strncmp(buff, "client", 6) == 0) || - (strncmp(buff, "off", 3) == 0)) - vis_mode_tmp = BATADV_VIS_TYPE_CLIENT_UPDATE; - - if (((count == 2) && (!ret) && - (val == BATADV_VIS_TYPE_SERVER_SYNC)) || - (strncmp(buff, "server", 6) == 0)) - vis_mode_tmp = BATADV_VIS_TYPE_SERVER_SYNC; - - if (vis_mode_tmp < 0) { - if (buff[count - 1] == '\n') - buff[count - 1] = '\0'; - - batadv_info(net_dev, - "Invalid parameter for 'vis mode' setting received: %s\n", - buff); - return -EINVAL; - } - - if (atomic_read(&bat_priv->vis_mode) == vis_mode_tmp) - return count; - - if (atomic_read(&bat_priv->vis_mode) == BATADV_VIS_TYPE_CLIENT_UPDATE) - old_mode = "client"; - else - old_mode = "server"; - - if (vis_mode_tmp == BATADV_VIS_TYPE_CLIENT_UPDATE) - new_mode = "client"; - else - new_mode = "server"; - - batadv_info(net_dev, "Changing vis mode from: %s to: %s\n", old_mode, - new_mode); - - atomic_set(&bat_priv->vis_mode, (unsigned int)vis_mode_tmp); - return count; -} - static ssize_t batadv_show_bat_algo(struct kobject *kobj, struct attribute *attr, char *buff) { @@ -390,6 +414,7 @@ static ssize_t batadv_store_gw_mode(struct kobject *kobj, */ batadv_gw_check_client_stop(bat_priv); atomic_set(&bat_priv->gw_mode, (unsigned int)gw_mode_tmp); + batadv_gw_tvlv_container_update(bat_priv); return count; } @@ -397,15 +422,13 @@ static ssize_t batadv_show_gw_bwidth(struct kobject *kobj, struct attribute *attr, char *buff) { struct batadv_priv *bat_priv = batadv_kobj_to_batpriv(kobj); - int down, up; - int gw_bandwidth = atomic_read(&bat_priv->gw_bandwidth); - - batadv_gw_bandwidth_to_kbit(gw_bandwidth, &down, &up); - return sprintf(buff, "%i%s/%i%s\n", - (down > 2048 ? down / 1024 : down), - (down > 2048 ? "MBit" : "KBit"), - (up > 2048 ? up / 1024 : up), - (up > 2048 ? "MBit" : "KBit")); + uint32_t down, up; + + down = atomic_read(&bat_priv->gw.bandwidth_down); + up = atomic_read(&bat_priv->gw.bandwidth_up); + + return sprintf(buff, "%u.%u/%u.%u MBit\n", down / 10, + down % 10, up / 10, up % 10); } static ssize_t batadv_store_gw_bwidth(struct kobject *kobj, @@ -426,12 +449,10 @@ BATADV_ATTR_SIF_BOOL(bonding, S_IRUGO | S_IWUSR, NULL); BATADV_ATTR_SIF_BOOL(bridge_loop_avoidance, S_IRUGO | S_IWUSR, NULL); #endif #ifdef CONFIG_BATMAN_ADV_DAT -BATADV_ATTR_SIF_BOOL(distributed_arp_table, S_IRUGO | S_IWUSR, NULL); +BATADV_ATTR_SIF_BOOL(distributed_arp_table, S_IRUGO | S_IWUSR, + batadv_dat_status_update); #endif BATADV_ATTR_SIF_BOOL(fragmentation, S_IRUGO | S_IWUSR, batadv_update_min_mtu); -BATADV_ATTR_SIF_BOOL(ap_isolation, S_IRUGO | S_IWUSR, NULL); -static BATADV_ATTR(vis_mode, S_IRUGO | S_IWUSR, batadv_show_vis_mode, - batadv_store_vis_mode); static BATADV_ATTR(routing_algo, S_IRUGO, batadv_show_bat_algo, NULL); static BATADV_ATTR(gw_mode, S_IRUGO | S_IWUSR, batadv_show_gw_mode, batadv_store_gw_mode); @@ -447,7 +468,8 @@ static BATADV_ATTR(gw_bandwidth, S_IRUGO | S_IWUSR, batadv_show_gw_bwidth, BATADV_ATTR_SIF_UINT(log_level, S_IRUGO | S_IWUSR, 0, BATADV_DBG_ALL, NULL); #endif #ifdef CONFIG_BATMAN_ADV_NC -BATADV_ATTR_SIF_BOOL(network_coding, S_IRUGO | S_IWUSR, NULL); +BATADV_ATTR_SIF_BOOL(network_coding, S_IRUGO | S_IWUSR, + batadv_nc_status_update); #endif static struct batadv_attribute *batadv_mesh_attrs[] = { @@ -460,8 +482,6 @@ static struct batadv_attribute *batadv_mesh_attrs[] = { &batadv_attr_distributed_arp_table, #endif &batadv_attr_fragmentation, - &batadv_attr_ap_isolation, - &batadv_attr_vis_mode, &batadv_attr_routing_algo, &batadv_attr_gw_mode, &batadv_attr_orig_interval, @@ -477,6 +497,16 @@ static struct batadv_attribute *batadv_mesh_attrs[] = { NULL, }; +BATADV_ATTR_VLAN_BOOL(ap_isolation, S_IRUGO | S_IWUSR, NULL); + +/** + * batadv_vlan_attrs - array of vlan specific sysfs attributes + */ +static struct batadv_attribute *batadv_vlan_attrs[] = { + &batadv_attr_vlan_ap_isolation, + NULL, +}; + int batadv_sysfs_add_meshif(struct net_device *dev) { struct kobject *batif_kobject = &dev->dev.kobj; @@ -527,6 +557,80 @@ void batadv_sysfs_del_meshif(struct net_device *dev) bat_priv->mesh_obj = NULL; } +/** + * batadv_sysfs_add_vlan - add all the needed sysfs objects for the new vlan + * @dev: netdev of the mesh interface + * @vlan: private data of the newly added VLAN interface + * + * Returns 0 on success and -ENOMEM if any of the structure allocations fails. + */ +int batadv_sysfs_add_vlan(struct net_device *dev, + struct batadv_softif_vlan *vlan) +{ + char vlan_subdir[sizeof(BATADV_SYSFS_VLAN_SUBDIR_PREFIX) + 5]; + struct batadv_priv *bat_priv = netdev_priv(dev); + struct batadv_attribute **bat_attr; + int err; + + if (vlan->vid & BATADV_VLAN_HAS_TAG) { + sprintf(vlan_subdir, BATADV_SYSFS_VLAN_SUBDIR_PREFIX "%hu", + vlan->vid & VLAN_VID_MASK); + + vlan->kobj = kobject_create_and_add(vlan_subdir, + bat_priv->mesh_obj); + if (!vlan->kobj) { + batadv_err(dev, "Can't add sysfs directory: %s/%s\n", + dev->name, vlan_subdir); + goto out; + } + } else { + /* the untagged LAN uses the root folder to store its "VLAN + * specific attributes" + */ + vlan->kobj = bat_priv->mesh_obj; + kobject_get(bat_priv->mesh_obj); + } + + for (bat_attr = batadv_vlan_attrs; *bat_attr; ++bat_attr) { + err = sysfs_create_file(vlan->kobj, + &((*bat_attr)->attr)); + if (err) { + batadv_err(dev, "Can't add sysfs file: %s/%s/%s\n", + dev->name, vlan_subdir, + ((*bat_attr)->attr).name); + goto rem_attr; + } + } + + return 0; + +rem_attr: + for (bat_attr = batadv_vlan_attrs; *bat_attr; ++bat_attr) + sysfs_remove_file(vlan->kobj, &((*bat_attr)->attr)); + + kobject_put(vlan->kobj); + vlan->kobj = NULL; +out: + return -ENOMEM; +} + +/** + * batadv_sysfs_del_vlan - remove all the sysfs objects for a given VLAN + * @bat_priv: the bat priv with all the soft interface information + * @vlan: the private data of the VLAN to destroy + */ +void batadv_sysfs_del_vlan(struct batadv_priv *bat_priv, + struct batadv_softif_vlan *vlan) +{ + struct batadv_attribute **bat_attr; + + for (bat_attr = batadv_vlan_attrs; *bat_attr; ++bat_attr) + sysfs_remove_file(vlan->kobj, &((*bat_attr)->attr)); + + kobject_put(vlan->kobj); + vlan->kobj = NULL; +} + static ssize_t batadv_show_mesh_iface(struct kobject *kobj, struct attribute *attr, char *buff) { diff --git a/net/batman-adv/sysfs.h b/net/batman-adv/sysfs.h index 479acf4c16f4..c7d725de50ad 100644 --- a/net/batman-adv/sysfs.h +++ b/net/batman-adv/sysfs.h @@ -22,6 +22,12 @@ #define BATADV_SYSFS_IF_MESH_SUBDIR "mesh" #define BATADV_SYSFS_IF_BAT_SUBDIR "batman_adv" +/** + * BATADV_SYSFS_VLAN_SUBDIR_PREFIX - prefix of the subfolder that will be + * created in the sysfs hierarchy for each VLAN interface. The subfolder will + * be named "BATADV_SYSFS_VLAN_SUBDIR_PREFIX%vid". + */ +#define BATADV_SYSFS_VLAN_SUBDIR_PREFIX "vlan" struct batadv_attribute { struct attribute attr; @@ -36,6 +42,10 @@ void batadv_sysfs_del_meshif(struct net_device *dev); int batadv_sysfs_add_hardif(struct kobject **hardif_obj, struct net_device *dev); void batadv_sysfs_del_hardif(struct kobject **hardif_obj); +int batadv_sysfs_add_vlan(struct net_device *dev, + struct batadv_softif_vlan *vlan); +void batadv_sysfs_del_vlan(struct batadv_priv *bat_priv, + struct batadv_softif_vlan *vlan); int batadv_throw_uevent(struct batadv_priv *bat_priv, enum batadv_uev_type type, enum batadv_uev_action action, const char *data); diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c index 34510f38708f..4add57d4857f 100644 --- a/net/batman-adv/translation-table.c +++ b/net/batman-adv/translation-table.c @@ -27,13 +27,14 @@ #include "routing.h" #include "bridge_loop_avoidance.h" -#include <linux/crc16.h> +#include <linux/crc32c.h> /* hash class keys */ static struct lock_class_key batadv_tt_local_hash_lock_class_key; static struct lock_class_key batadv_tt_global_hash_lock_class_key; static void batadv_send_roam_adv(struct batadv_priv *bat_priv, uint8_t *client, + unsigned short vid, struct batadv_orig_node *orig_node); static void batadv_tt_purge(struct work_struct *work); static void @@ -41,7 +42,8 @@ batadv_tt_global_del_orig_list(struct batadv_tt_global_entry *tt_global_entry); static void batadv_tt_global_del(struct batadv_priv *bat_priv, struct batadv_orig_node *orig_node, const unsigned char *addr, - const char *message, bool roaming); + unsigned short vid, const char *message, + bool roaming); /* returns 1 if they are the same mac addr */ static int batadv_compare_tt(const struct hlist_node *node, const void *data2) @@ -52,43 +54,93 @@ static int batadv_compare_tt(const struct hlist_node *node, const void *data2) return (memcmp(data1, data2, ETH_ALEN) == 0 ? 1 : 0); } +/** + * batadv_choose_tt - return the index of the tt entry in the hash table + * @data: pointer to the tt_common_entry object to map + * @size: the size of the hash table + * + * Returns the hash index where the object represented by 'data' should be + * stored at. + */ +static inline uint32_t batadv_choose_tt(const void *data, uint32_t size) +{ + struct batadv_tt_common_entry *tt; + uint32_t hash = 0; + + tt = (struct batadv_tt_common_entry *)data; + hash = batadv_hash_bytes(hash, &tt->addr, ETH_ALEN); + hash = batadv_hash_bytes(hash, &tt->vid, sizeof(tt->vid)); + + hash += (hash << 3); + hash ^= (hash >> 11); + hash += (hash << 15); + + return hash % size; +} + +/** + * batadv_tt_hash_find - look for a client in the given hash table + * @hash: the hash table to search + * @addr: the mac address of the client to look for + * @vid: VLAN identifier + * + * Returns a pointer to the tt_common struct belonging to the searched client if + * found, NULL otherwise. + */ static struct batadv_tt_common_entry * -batadv_tt_hash_find(struct batadv_hashtable *hash, const void *data) +batadv_tt_hash_find(struct batadv_hashtable *hash, const uint8_t *addr, + unsigned short vid) { struct hlist_head *head; - struct batadv_tt_common_entry *tt_common_entry; - struct batadv_tt_common_entry *tt_common_entry_tmp = NULL; + struct batadv_tt_common_entry to_search, *tt, *tt_tmp = NULL; uint32_t index; if (!hash) return NULL; - index = batadv_choose_orig(data, hash->size); + memcpy(to_search.addr, addr, ETH_ALEN); + to_search.vid = vid; + + index = batadv_choose_tt(&to_search, hash->size); head = &hash->table[index]; rcu_read_lock(); - hlist_for_each_entry_rcu(tt_common_entry, head, hash_entry) { - if (!batadv_compare_eth(tt_common_entry, data)) + hlist_for_each_entry_rcu(tt, head, hash_entry) { + if (!batadv_compare_eth(tt, addr)) continue; - if (!atomic_inc_not_zero(&tt_common_entry->refcount)) + if (tt->vid != vid) continue; - tt_common_entry_tmp = tt_common_entry; + if (!atomic_inc_not_zero(&tt->refcount)) + continue; + + tt_tmp = tt; break; } rcu_read_unlock(); - return tt_common_entry_tmp; + return tt_tmp; } +/** + * batadv_tt_local_hash_find - search the local table for a given client + * @bat_priv: the bat priv with all the soft interface information + * @addr: the mac address of the client to look for + * @vid: VLAN identifier + * + * Returns a pointer to the corresponding tt_local_entry struct if the client is + * found, NULL otherwise. + */ static struct batadv_tt_local_entry * -batadv_tt_local_hash_find(struct batadv_priv *bat_priv, const void *data) +batadv_tt_local_hash_find(struct batadv_priv *bat_priv, const uint8_t *addr, + unsigned short vid) { struct batadv_tt_common_entry *tt_common_entry; struct batadv_tt_local_entry *tt_local_entry = NULL; - tt_common_entry = batadv_tt_hash_find(bat_priv->tt.local_hash, data); + tt_common_entry = batadv_tt_hash_find(bat_priv->tt.local_hash, addr, + vid); if (tt_common_entry) tt_local_entry = container_of(tt_common_entry, struct batadv_tt_local_entry, @@ -96,13 +148,24 @@ batadv_tt_local_hash_find(struct batadv_priv *bat_priv, const void *data) return tt_local_entry; } +/** + * batadv_tt_global_hash_find - search the global table for a given client + * @bat_priv: the bat priv with all the soft interface information + * @addr: the mac address of the client to look for + * @vid: VLAN identifier + * + * Returns a pointer to the corresponding tt_global_entry struct if the client + * is found, NULL otherwise. + */ static struct batadv_tt_global_entry * -batadv_tt_global_hash_find(struct batadv_priv *bat_priv, const void *data) +batadv_tt_global_hash_find(struct batadv_priv *bat_priv, const uint8_t *addr, + unsigned short vid) { struct batadv_tt_common_entry *tt_common_entry; struct batadv_tt_global_entry *tt_global_entry = NULL; - tt_common_entry = batadv_tt_hash_find(bat_priv->tt.global_hash, data); + tt_common_entry = batadv_tt_hash_find(bat_priv->tt.global_hash, addr, + vid); if (tt_common_entry) tt_global_entry = container_of(tt_common_entry, struct batadv_tt_global_entry, @@ -117,25 +180,17 @@ batadv_tt_local_entry_free_ref(struct batadv_tt_local_entry *tt_local_entry) kfree_rcu(tt_local_entry, common.rcu); } -static void batadv_tt_global_entry_free_rcu(struct rcu_head *rcu) -{ - struct batadv_tt_common_entry *tt_common_entry; - struct batadv_tt_global_entry *tt_global_entry; - - tt_common_entry = container_of(rcu, struct batadv_tt_common_entry, rcu); - tt_global_entry = container_of(tt_common_entry, - struct batadv_tt_global_entry, common); - - kfree(tt_global_entry); -} - +/** + * batadv_tt_global_entry_free_ref - decrement the refcounter for a + * tt_global_entry and possibly free it + * @tt_global_entry: the object to free + */ static void batadv_tt_global_entry_free_ref(struct batadv_tt_global_entry *tt_global_entry) { if (atomic_dec_and_test(&tt_global_entry->common.refcount)) { batadv_tt_global_del_orig_list(tt_global_entry); - call_rcu(&tt_global_entry->common.rcu, - batadv_tt_global_entry_free_rcu); + kfree_rcu(tt_global_entry, common.rcu); } } @@ -153,13 +208,107 @@ static void batadv_tt_orig_list_entry_free_rcu(struct rcu_head *rcu) kfree(orig_entry); } +/** + * batadv_tt_local_size_mod - change the size by v of the local table identified + * by vid + * @bat_priv: the bat priv with all the soft interface information + * @vid: the VLAN identifier of the sub-table to change + * @v: the amount to sum to the local table size + */ +static void batadv_tt_local_size_mod(struct batadv_priv *bat_priv, + unsigned short vid, int v) +{ + struct batadv_softif_vlan *vlan; + + vlan = batadv_softif_vlan_get(bat_priv, vid); + if (!vlan) + return; + + atomic_add(v, &vlan->tt.num_entries); + + batadv_softif_vlan_free_ref(vlan); +} + +/** + * batadv_tt_local_size_inc - increase by one the local table size for the given + * vid + * @bat_priv: the bat priv with all the soft interface information + * @vid: the VLAN identifier + */ +static void batadv_tt_local_size_inc(struct batadv_priv *bat_priv, + unsigned short vid) +{ + batadv_tt_local_size_mod(bat_priv, vid, 1); +} + +/** + * batadv_tt_local_size_dec - decrease by one the local table size for the given + * vid + * @bat_priv: the bat priv with all the soft interface information + * @vid: the VLAN identifier + */ +static void batadv_tt_local_size_dec(struct batadv_priv *bat_priv, + unsigned short vid) +{ + batadv_tt_local_size_mod(bat_priv, vid, -1); +} + +/** + * batadv_tt_global_size_mod - change the size by v of the local table + * identified by vid + * @bat_priv: the bat priv with all the soft interface information + * @vid: the VLAN identifier + * @v: the amount to sum to the global table size + */ +static void batadv_tt_global_size_mod(struct batadv_orig_node *orig_node, + unsigned short vid, int v) +{ + struct batadv_orig_node_vlan *vlan; + + vlan = batadv_orig_node_vlan_new(orig_node, vid); + if (!vlan) + return; + + if (atomic_add_return(v, &vlan->tt.num_entries) == 0) { + spin_lock_bh(&orig_node->vlan_list_lock); + list_del_rcu(&vlan->list); + spin_unlock_bh(&orig_node->vlan_list_lock); + batadv_orig_node_vlan_free_ref(vlan); + } + + batadv_orig_node_vlan_free_ref(vlan); +} + +/** + * batadv_tt_global_size_inc - increase by one the global table size for the + * given vid + * @orig_node: the originator which global table size has to be decreased + * @vid: the vlan identifier + */ +static void batadv_tt_global_size_inc(struct batadv_orig_node *orig_node, + unsigned short vid) +{ + batadv_tt_global_size_mod(orig_node, vid, 1); +} + +/** + * batadv_tt_global_size_dec - decrease by one the global table size for the + * given vid + * @orig_node: the originator which global table size has to be decreased + * @vid: the vlan identifier + */ +static void batadv_tt_global_size_dec(struct batadv_orig_node *orig_node, + unsigned short vid) +{ + batadv_tt_global_size_mod(orig_node, vid, -1); +} + static void batadv_tt_orig_list_entry_free_ref(struct batadv_tt_orig_list_entry *orig_entry) { if (!atomic_dec_and_test(&orig_entry->refcount)) return; - /* to avoid race conditions, immediately decrease the tt counter */ - atomic_dec(&orig_entry->orig_node->tt_size); + call_rcu(&orig_entry->rcu, batadv_tt_orig_list_entry_free_rcu); } @@ -180,12 +329,13 @@ static void batadv_tt_local_event(struct batadv_priv *bat_priv, bool del_op_requested, del_op_entry; tt_change_node = kmalloc(sizeof(*tt_change_node), GFP_ATOMIC); - if (!tt_change_node) return; tt_change_node->change.flags = flags; + tt_change_node->change.reserved = 0; memcpy(tt_change_node->change.addr, common->addr, ETH_ALEN); + tt_change_node->change.vid = htons(common->vid); del_op_requested = flags & BATADV_TT_CLIENT_DEL; @@ -208,6 +358,13 @@ static void batadv_tt_local_event(struct batadv_priv *bat_priv, goto del; if (del_op_requested && !del_op_entry) goto del; + + /* this is a second add in the same originator interval. It + * means that flags have been changed: update them! + */ + if (!del_op_requested && !del_op_entry) + entry->change.flags = flags; + continue; del: list_del(&entry->list); @@ -229,9 +386,55 @@ unlock: atomic_inc(&bat_priv->tt.local_changes); } -int batadv_tt_len(int changes_num) +/** + * batadv_tt_len - compute length in bytes of given number of tt changes + * @changes_num: number of tt changes + * + * Returns computed length in bytes. + */ +static int batadv_tt_len(int changes_num) { - return changes_num * sizeof(struct batadv_tt_change); + return changes_num * sizeof(struct batadv_tvlv_tt_change); +} + +/** + * batadv_tt_entries - compute the number of entries fitting in tt_len bytes + * @tt_len: available space + * + * Returns the number of entries. + */ +static uint16_t batadv_tt_entries(uint16_t tt_len) +{ + return tt_len / batadv_tt_len(1); +} + +/** + * batadv_tt_local_table_transmit_size - calculates the local translation table + * size when transmitted over the air + * @bat_priv: the bat priv with all the soft interface information + * + * Returns local translation table size in bytes. + */ +static int batadv_tt_local_table_transmit_size(struct batadv_priv *bat_priv) +{ + uint16_t num_vlan = 0, tt_local_entries = 0; + struct batadv_softif_vlan *vlan; + int hdr_size; + + rcu_read_lock(); + hlist_for_each_entry_rcu(vlan, &bat_priv->softif_vlan_list, list) { + num_vlan++; + tt_local_entries += atomic_read(&vlan->tt.num_entries); + } + rcu_read_unlock(); + + /* header size of tvlv encapsulated tt response payload */ + hdr_size = sizeof(struct batadv_unicast_tvlv_packet); + hdr_size += sizeof(struct batadv_tvlv_hdr); + hdr_size += sizeof(struct batadv_tvlv_tt_data); + hdr_size += num_vlan * sizeof(struct batadv_tvlv_tt_vlan_data); + + return hdr_size + batadv_tt_len(tt_local_entries); } static int batadv_tt_local_init(struct batadv_priv *bat_priv) @@ -255,33 +458,51 @@ static void batadv_tt_global_free(struct batadv_priv *bat_priv, const char *message) { batadv_dbg(BATADV_DBG_TT, bat_priv, - "Deleting global tt entry %pM: %s\n", - tt_global->common.addr, message); + "Deleting global tt entry %pM (vid: %d): %s\n", + tt_global->common.addr, + BATADV_PRINT_VID(tt_global->common.vid), message); batadv_hash_remove(bat_priv->tt.global_hash, batadv_compare_tt, - batadv_choose_orig, tt_global->common.addr); + batadv_choose_tt, &tt_global->common); batadv_tt_global_entry_free_ref(tt_global); } -void batadv_tt_local_add(struct net_device *soft_iface, const uint8_t *addr, - int ifindex) +/** + * batadv_tt_local_add - add a new client to the local table or update an + * existing client + * @soft_iface: netdev struct of the mesh interface + * @addr: the mac address of the client to add + * @vid: VLAN identifier + * @ifindex: index of the interface where the client is connected to (useful to + * identify wireless clients) + * + * Returns true if the client was successfully added, false otherwise. + */ +bool batadv_tt_local_add(struct net_device *soft_iface, const uint8_t *addr, + unsigned short vid, int ifindex) { struct batadv_priv *bat_priv = netdev_priv(soft_iface); struct batadv_tt_local_entry *tt_local; struct batadv_tt_global_entry *tt_global; + struct net_device *in_dev = NULL; struct hlist_head *head; struct batadv_tt_orig_list_entry *orig_entry; - int hash_added; - bool roamed_back = false; + int hash_added, table_size, packet_size_max; + bool ret = false, roamed_back = false; + uint8_t remote_flags; + + if (ifindex != BATADV_NULL_IFINDEX) + in_dev = dev_get_by_index(&init_net, ifindex); - tt_local = batadv_tt_local_hash_find(bat_priv, addr); - tt_global = batadv_tt_global_hash_find(bat_priv, addr); + tt_local = batadv_tt_local_hash_find(bat_priv, addr, vid); + tt_global = batadv_tt_global_hash_find(bat_priv, addr, vid); if (tt_local) { tt_local->last_seen = jiffies; if (tt_local->common.flags & BATADV_TT_CLIENT_PENDING) { batadv_dbg(BATADV_DBG_TT, bat_priv, - "Re-adding pending client %pM\n", addr); + "Re-adding pending client %pM (vid: %d)\n", + addr, BATADV_PRINT_VID(vid)); /* whatever the reason why the PENDING flag was set, * this is a client which was enqueued to be removed in * this orig_interval. Since it popped up again, the @@ -293,8 +514,8 @@ void batadv_tt_local_add(struct net_device *soft_iface, const uint8_t *addr, if (tt_local->common.flags & BATADV_TT_CLIENT_ROAM) { batadv_dbg(BATADV_DBG_TT, bat_priv, - "Roaming client %pM came back to its original location\n", - addr); + "Roaming client %pM (vid: %d) came back to its original location\n", + addr, BATADV_PRINT_VID(vid)); /* the ROAM flag is set because this client roamed away * and the node got a roaming_advertisement message. Now * that the client popped up again at its original @@ -306,12 +527,24 @@ void batadv_tt_local_add(struct net_device *soft_iface, const uint8_t *addr, goto check_roaming; } + /* Ignore the client if we cannot send it in a full table response. */ + table_size = batadv_tt_local_table_transmit_size(bat_priv); + table_size += batadv_tt_len(1); + packet_size_max = atomic_read(&bat_priv->packet_size_max); + if (table_size > packet_size_max) { + net_ratelimited_function(batadv_info, soft_iface, + "Local translation table size (%i) exceeds maximum packet size (%i); Ignoring new local tt entry: %pM\n", + table_size, packet_size_max, addr); + goto out; + } + tt_local = kmalloc(sizeof(*tt_local), GFP_ATOMIC); if (!tt_local) goto out; batadv_dbg(BATADV_DBG_TT, bat_priv, - "Creating new local tt entry: %pM (ttvn: %d)\n", addr, + "Creating new local tt entry: %pM (vid: %d, ttvn: %d)\n", + addr, BATADV_PRINT_VID(vid), (uint8_t)atomic_read(&bat_priv->tt.vn)); memcpy(tt_local->common.addr, addr, ETH_ALEN); @@ -320,7 +553,8 @@ void batadv_tt_local_add(struct net_device *soft_iface, const uint8_t *addr, * (consistency check) */ tt_local->common.flags = BATADV_TT_CLIENT_NEW; - if (batadv_is_wifi_iface(ifindex)) + tt_local->common.vid = vid; + if (batadv_is_wifi_netdev(in_dev)) tt_local->common.flags |= BATADV_TT_CLIENT_WIFI; atomic_set(&tt_local->common.refcount, 2); tt_local->last_seen = jiffies; @@ -331,7 +565,7 @@ void batadv_tt_local_add(struct net_device *soft_iface, const uint8_t *addr, tt_local->common.flags |= BATADV_TT_CLIENT_NOPURGE; hash_added = batadv_hash_add(bat_priv->tt.local_hash, batadv_compare_tt, - batadv_choose_orig, &tt_local->common, + batadv_choose_tt, &tt_local->common, &tt_local->common.hash_entry); if (unlikely(hash_added != 0)) { @@ -353,6 +587,7 @@ check_roaming: rcu_read_lock(); hlist_for_each_entry_rcu(orig_entry, head, list) { batadv_send_roam_adv(bat_priv, tt_global->common.addr, + tt_global->common.vid, orig_entry->orig_node); } rcu_read_unlock(); @@ -369,78 +604,219 @@ check_roaming: } } + /* store the current remote flags before altering them. This helps + * understanding is flags are changing or not + */ + remote_flags = tt_local->common.flags & BATADV_TT_REMOTE_MASK; + + if (batadv_is_wifi_netdev(in_dev)) + tt_local->common.flags |= BATADV_TT_CLIENT_WIFI; + else + tt_local->common.flags &= ~BATADV_TT_CLIENT_WIFI; + + /* if any "dynamic" flag has been modified, resend an ADD event for this + * entry so that all the nodes can get the new flags + */ + if (remote_flags ^ (tt_local->common.flags & BATADV_TT_REMOTE_MASK)) + batadv_tt_local_event(bat_priv, tt_local, BATADV_NO_FLAGS); + + ret = true; out: + if (in_dev) + dev_put(in_dev); if (tt_local) batadv_tt_local_entry_free_ref(tt_local); if (tt_global) batadv_tt_global_entry_free_ref(tt_global); + return ret; } -static void batadv_tt_realloc_packet_buff(unsigned char **packet_buff, - int *packet_buff_len, - int min_packet_len, - int new_packet_len) +/** + * batadv_tt_prepare_tvlv_global_data - prepare the TVLV TT header to send + * within a TT Response directed to another node + * @orig_node: originator for which the TT data has to be prepared + * @tt_data: uninitialised pointer to the address of the TVLV buffer + * @tt_change: uninitialised pointer to the address of the area where the TT + * changed can be stored + * @tt_len: pointer to the length to reserve to the tt_change. if -1 this + * function reserves the amount of space needed to send the entire global TT + * table. In case of success the value is updated with the real amount of + * reserved bytes + + * Allocate the needed amount of memory for the entire TT TVLV and write its + * header made up by one tvlv_tt_data object and a series of tvlv_tt_vlan_data + * objects, one per active VLAN served by the originator node. + * + * Return the size of the allocated buffer or 0 in case of failure. + */ +static uint16_t +batadv_tt_prepare_tvlv_global_data(struct batadv_orig_node *orig_node, + struct batadv_tvlv_tt_data **tt_data, + struct batadv_tvlv_tt_change **tt_change, + int32_t *tt_len) { - unsigned char *new_buff; + uint16_t num_vlan = 0, num_entries = 0, change_offset, tvlv_len; + struct batadv_tvlv_tt_vlan_data *tt_vlan; + struct batadv_orig_node_vlan *vlan; + uint8_t *tt_change_ptr; - new_buff = kmalloc(new_packet_len, GFP_ATOMIC); + rcu_read_lock(); + list_for_each_entry_rcu(vlan, &orig_node->vlan_list, list) { + num_vlan++; + num_entries += atomic_read(&vlan->tt.num_entries); + } - /* keep old buffer if kmalloc should fail */ - if (new_buff) { - memcpy(new_buff, *packet_buff, min_packet_len); - kfree(*packet_buff); - *packet_buff = new_buff; - *packet_buff_len = new_packet_len; + change_offset = sizeof(**tt_data); + change_offset += num_vlan * sizeof(*tt_vlan); + + /* if tt_len is negative, allocate the space needed by the full table */ + if (*tt_len < 0) + *tt_len = batadv_tt_len(num_entries); + + tvlv_len = *tt_len; + tvlv_len += change_offset; + + *tt_data = kmalloc(tvlv_len, GFP_ATOMIC); + if (!*tt_data) { + *tt_len = 0; + goto out; } + + (*tt_data)->flags = BATADV_NO_FLAGS; + (*tt_data)->ttvn = atomic_read(&orig_node->last_ttvn); + (*tt_data)->num_vlan = htons(num_vlan); + + tt_vlan = (struct batadv_tvlv_tt_vlan_data *)(*tt_data + 1); + list_for_each_entry_rcu(vlan, &orig_node->vlan_list, list) { + tt_vlan->vid = htons(vlan->vid); + tt_vlan->crc = htonl(vlan->tt.crc); + + tt_vlan++; + } + + tt_change_ptr = (uint8_t *)*tt_data + change_offset; + *tt_change = (struct batadv_tvlv_tt_change *)tt_change_ptr; + +out: + rcu_read_unlock(); + return tvlv_len; } -static void batadv_tt_prepare_packet_buff(struct batadv_priv *bat_priv, - unsigned char **packet_buff, - int *packet_buff_len, - int min_packet_len) -{ - int req_len; +/** + * batadv_tt_prepare_tvlv_local_data - allocate and prepare the TT TVLV for this + * node + * @bat_priv: the bat priv with all the soft interface information + * @tt_data: uninitialised pointer to the address of the TVLV buffer + * @tt_change: uninitialised pointer to the address of the area where the TT + * changes can be stored + * @tt_len: pointer to the length to reserve to the tt_change. if -1 this + * function reserves the amount of space needed to send the entire local TT + * table. In case of success the value is updated with the real amount of + * reserved bytes + * + * Allocate the needed amount of memory for the entire TT TVLV and write its + * header made up by one tvlv_tt_data object and a series of tvlv_tt_vlan_data + * objects, one per active VLAN. + * + * Return the size of the allocated buffer or 0 in case of failure. + */ +static uint16_t +batadv_tt_prepare_tvlv_local_data(struct batadv_priv *bat_priv, + struct batadv_tvlv_tt_data **tt_data, + struct batadv_tvlv_tt_change **tt_change, + int32_t *tt_len) +{ + struct batadv_tvlv_tt_vlan_data *tt_vlan; + struct batadv_softif_vlan *vlan; + uint16_t num_vlan = 0, num_entries = 0, tvlv_len; + uint8_t *tt_change_ptr; + int change_offset; - req_len = min_packet_len; - req_len += batadv_tt_len(atomic_read(&bat_priv->tt.local_changes)); + rcu_read_lock(); + hlist_for_each_entry_rcu(vlan, &bat_priv->softif_vlan_list, list) { + num_vlan++; + num_entries += atomic_read(&vlan->tt.num_entries); + } - /* if we have too many changes for one packet don't send any - * and wait for the tt table request which will be fragmented - */ - if (req_len > bat_priv->soft_iface->mtu) - req_len = min_packet_len; + change_offset = sizeof(**tt_data); + change_offset += num_vlan * sizeof(*tt_vlan); + + /* if tt_len is negative, allocate the space needed by the full table */ + if (*tt_len < 0) + *tt_len = batadv_tt_len(num_entries); + + tvlv_len = *tt_len; + tvlv_len += change_offset; + + *tt_data = kmalloc(tvlv_len, GFP_ATOMIC); + if (!*tt_data) { + tvlv_len = 0; + goto out; + } + + (*tt_data)->flags = BATADV_NO_FLAGS; + (*tt_data)->ttvn = atomic_read(&bat_priv->tt.vn); + (*tt_data)->num_vlan = htons(num_vlan); + + tt_vlan = (struct batadv_tvlv_tt_vlan_data *)(*tt_data + 1); + hlist_for_each_entry_rcu(vlan, &bat_priv->softif_vlan_list, list) { + tt_vlan->vid = htons(vlan->vid); + tt_vlan->crc = htonl(vlan->tt.crc); - batadv_tt_realloc_packet_buff(packet_buff, packet_buff_len, - min_packet_len, req_len); + tt_vlan++; + } + + tt_change_ptr = (uint8_t *)*tt_data + change_offset; + *tt_change = (struct batadv_tvlv_tt_change *)tt_change_ptr; + +out: + rcu_read_unlock(); + return tvlv_len; } -static int batadv_tt_changes_fill_buff(struct batadv_priv *bat_priv, - unsigned char **packet_buff, - int *packet_buff_len, - int min_packet_len) +/** + * batadv_tt_tvlv_container_update - update the translation table tvlv container + * after local tt changes have been committed + * @bat_priv: the bat priv with all the soft interface information + */ +static void batadv_tt_tvlv_container_update(struct batadv_priv *bat_priv) { struct batadv_tt_change_node *entry, *safe; - int count = 0, tot_changes = 0, new_len; - unsigned char *tt_buff; + struct batadv_tvlv_tt_data *tt_data; + struct batadv_tvlv_tt_change *tt_change; + int tt_diff_len, tt_change_len = 0; + int tt_diff_entries_num = 0, tt_diff_entries_count = 0; + uint16_t tvlv_len; - batadv_tt_prepare_packet_buff(bat_priv, packet_buff, - packet_buff_len, min_packet_len); + tt_diff_entries_num = atomic_read(&bat_priv->tt.local_changes); + tt_diff_len = batadv_tt_len(tt_diff_entries_num); - new_len = *packet_buff_len - min_packet_len; - tt_buff = *packet_buff + min_packet_len; + /* if we have too many changes for one packet don't send any + * and wait for the tt table request which will be fragmented + */ + if (tt_diff_len > bat_priv->soft_iface->mtu) + tt_diff_len = 0; + + tvlv_len = batadv_tt_prepare_tvlv_local_data(bat_priv, &tt_data, + &tt_change, &tt_diff_len); + if (!tvlv_len) + return; - if (new_len > 0) - tot_changes = new_len / batadv_tt_len(1); + tt_data->flags = BATADV_TT_OGM_DIFF; + + if (tt_diff_len == 0) + goto container_register; spin_lock_bh(&bat_priv->tt.changes_list_lock); atomic_set(&bat_priv->tt.local_changes, 0); list_for_each_entry_safe(entry, safe, &bat_priv->tt.changes_list, list) { - if (count < tot_changes) { - memcpy(tt_buff + batadv_tt_len(count), - &entry->change, sizeof(struct batadv_tt_change)); - count++; + if (tt_diff_entries_count < tt_diff_entries_num) { + memcpy(tt_change + tt_diff_entries_count, + &entry->change, + sizeof(struct batadv_tvlv_tt_change)); + tt_diff_entries_count++; } list_del(&entry->list); kfree(entry); @@ -452,20 +828,25 @@ static int batadv_tt_changes_fill_buff(struct batadv_priv *bat_priv, kfree(bat_priv->tt.last_changeset); bat_priv->tt.last_changeset_len = 0; bat_priv->tt.last_changeset = NULL; + tt_change_len = batadv_tt_len(tt_diff_entries_count); /* check whether this new OGM has no changes due to size problems */ - if (new_len > 0) { + if (tt_diff_entries_count > 0) { /* if kmalloc() fails we will reply with the full table * instead of providing the diff */ - bat_priv->tt.last_changeset = kmalloc(new_len, GFP_ATOMIC); + bat_priv->tt.last_changeset = kzalloc(tt_diff_len, GFP_ATOMIC); if (bat_priv->tt.last_changeset) { - memcpy(bat_priv->tt.last_changeset, tt_buff, new_len); - bat_priv->tt.last_changeset_len = new_len; + memcpy(bat_priv->tt.last_changeset, + tt_change, tt_change_len); + bat_priv->tt.last_changeset_len = tt_diff_len; } } spin_unlock_bh(&bat_priv->tt.last_changeset_lock); - return count; +container_register: + batadv_tvlv_container_register(bat_priv, BATADV_TVLV_TT, 1, tt_data, + tvlv_len); + kfree(tt_data); } int batadv_tt_local_seq_print_text(struct seq_file *seq, void *offset) @@ -476,7 +857,9 @@ int batadv_tt_local_seq_print_text(struct seq_file *seq, void *offset) struct batadv_tt_common_entry *tt_common_entry; struct batadv_tt_local_entry *tt_local; struct batadv_hard_iface *primary_if; + struct batadv_softif_vlan *vlan; struct hlist_head *head; + unsigned short vid; uint32_t i; int last_seen_secs; int last_seen_msecs; @@ -489,11 +872,10 @@ int batadv_tt_local_seq_print_text(struct seq_file *seq, void *offset) goto out; seq_printf(seq, - "Locally retrieved addresses (from %s) announced via TT (TTVN: %u CRC: %#.4x):\n", - net_dev->name, (uint8_t)atomic_read(&bat_priv->tt.vn), - bat_priv->tt.local_crc); - seq_printf(seq, " %-13s %-7s %-10s\n", "Client", "Flags", - "Last seen"); + "Locally retrieved addresses (from %s) announced via TT (TTVN: %u):\n", + net_dev->name, (uint8_t)atomic_read(&bat_priv->tt.vn)); + seq_printf(seq, " %-13s %s %-7s %-9s (%-10s)\n", "Client", "VID", + "Flags", "Last seen", "CRC"); for (i = 0; i < hash->size; i++) { head = &hash->table[i]; @@ -504,6 +886,7 @@ int batadv_tt_local_seq_print_text(struct seq_file *seq, void *offset) tt_local = container_of(tt_common_entry, struct batadv_tt_local_entry, common); + vid = tt_common_entry->vid; last_seen_jiffies = jiffies - tt_local->last_seen; last_seen_msecs = jiffies_to_msecs(last_seen_jiffies); last_seen_secs = last_seen_msecs / 1000; @@ -511,8 +894,17 @@ int batadv_tt_local_seq_print_text(struct seq_file *seq, void *offset) no_purge = tt_common_entry->flags & np_flag; - seq_printf(seq, " * %pM [%c%c%c%c%c] %3u.%03u\n", + vlan = batadv_softif_vlan_get(bat_priv, vid); + if (!vlan) { + seq_printf(seq, "Cannot retrieve VLAN %d\n", + BATADV_PRINT_VID(vid)); + continue; + } + + seq_printf(seq, + " * %pM %4i [%c%c%c%c%c] %3u.%03u (%#.8x)\n", tt_common_entry->addr, + BATADV_PRINT_VID(tt_common_entry->vid), (tt_common_entry->flags & BATADV_TT_CLIENT_ROAM ? 'R' : '.'), no_purge ? 'P' : '.', @@ -523,7 +915,10 @@ int batadv_tt_local_seq_print_text(struct seq_file *seq, void *offset) (tt_common_entry->flags & BATADV_TT_CLIENT_WIFI ? 'W' : '.'), no_purge ? 0 : last_seen_secs, - no_purge ? 0 : last_seen_msecs); + no_purge ? 0 : last_seen_msecs, + vlan->tt.crc); + + batadv_softif_vlan_free_ref(vlan); } rcu_read_unlock(); } @@ -547,27 +942,29 @@ batadv_tt_local_set_pending(struct batadv_priv *bat_priv, tt_local_entry->common.flags |= BATADV_TT_CLIENT_PENDING; batadv_dbg(BATADV_DBG_TT, bat_priv, - "Local tt entry (%pM) pending to be removed: %s\n", - tt_local_entry->common.addr, message); + "Local tt entry (%pM, vid: %d) pending to be removed: %s\n", + tt_local_entry->common.addr, + BATADV_PRINT_VID(tt_local_entry->common.vid), message); } /** * batadv_tt_local_remove - logically remove an entry from the local table * @bat_priv: the bat priv with all the soft interface information * @addr: the MAC address of the client to remove + * @vid: VLAN identifier * @message: message to append to the log on deletion * @roaming: true if the deletion is due to a roaming event * * Returns the flags assigned to the local entry before being deleted */ uint16_t batadv_tt_local_remove(struct batadv_priv *bat_priv, - const uint8_t *addr, const char *message, - bool roaming) + const uint8_t *addr, unsigned short vid, + const char *message, bool roaming) { struct batadv_tt_local_entry *tt_local_entry; uint16_t flags, curr_flags = BATADV_NO_FLAGS; - tt_local_entry = batadv_tt_local_hash_find(bat_priv, addr); + tt_local_entry = batadv_tt_local_hash_find(bat_priv, addr, vid); if (!tt_local_entry) goto out; @@ -603,8 +1000,16 @@ out: return curr_flags; } +/** + * batadv_tt_local_purge_list - purge inactive tt local entries + * @bat_priv: the bat priv with all the soft interface information + * @head: pointer to the list containing the local tt entries + * @timeout: parameter deciding whether a given tt local entry is considered + * inactive or not + */ static void batadv_tt_local_purge_list(struct batadv_priv *bat_priv, - struct hlist_head *head) + struct hlist_head *head, + int timeout) { struct batadv_tt_local_entry *tt_local_entry; struct batadv_tt_common_entry *tt_common_entry; @@ -622,8 +1027,7 @@ static void batadv_tt_local_purge_list(struct batadv_priv *bat_priv, if (tt_local_entry->common.flags & BATADV_TT_CLIENT_PENDING) continue; - if (!batadv_has_timed_out(tt_local_entry->last_seen, - BATADV_TT_LOCAL_TIMEOUT)) + if (!batadv_has_timed_out(tt_local_entry->last_seen, timeout)) continue; batadv_tt_local_set_pending(bat_priv, tt_local_entry, @@ -631,7 +1035,14 @@ static void batadv_tt_local_purge_list(struct batadv_priv *bat_priv, } } -static void batadv_tt_local_purge(struct batadv_priv *bat_priv) +/** + * batadv_tt_local_purge - purge inactive tt local entries + * @bat_priv: the bat priv with all the soft interface information + * @timeout: parameter deciding whether a given tt local entry is considered + * inactive or not + */ +static void batadv_tt_local_purge(struct batadv_priv *bat_priv, + int timeout) { struct batadv_hashtable *hash = bat_priv->tt.local_hash; struct hlist_head *head; @@ -643,7 +1054,7 @@ static void batadv_tt_local_purge(struct batadv_priv *bat_priv) list_lock = &hash->list_locks[i]; spin_lock_bh(list_lock); - batadv_tt_local_purge_list(bat_priv, head); + batadv_tt_local_purge_list(bat_priv, head, timeout); spin_unlock_bh(list_lock); } } @@ -784,7 +1195,7 @@ batadv_tt_global_orig_entry_add(struct batadv_tt_global_entry *tt_global, INIT_HLIST_NODE(&orig_entry->list); atomic_inc(&orig_node->refcount); - atomic_inc(&orig_node->tt_size); + batadv_tt_global_size_inc(orig_node, tt_global->common.vid); orig_entry->orig_node = orig_node; orig_entry->ttvn = ttvn; atomic_set(&orig_entry->refcount, 2); @@ -803,6 +1214,7 @@ out: * @bat_priv: the bat priv with all the soft interface information * @orig_node: the originator announcing the client * @tt_addr: the mac address of the non-mesh client + * @vid: VLAN identifier * @flags: TT flags that have to be set for this non-mesh client * @ttvn: the tt version number ever announcing this non-mesh client * @@ -813,21 +1225,28 @@ out: * If a TT local entry exists for this non-mesh client remove it. * * The caller must hold orig_node refcount. + * + * Return true if the new entry has been added, false otherwise */ -int batadv_tt_global_add(struct batadv_priv *bat_priv, - struct batadv_orig_node *orig_node, - const unsigned char *tt_addr, uint16_t flags, - uint8_t ttvn) +static bool batadv_tt_global_add(struct batadv_priv *bat_priv, + struct batadv_orig_node *orig_node, + const unsigned char *tt_addr, + unsigned short vid, uint16_t flags, + uint8_t ttvn) { struct batadv_tt_global_entry *tt_global_entry; struct batadv_tt_local_entry *tt_local_entry; - int ret = 0; + bool ret = false; int hash_added; struct batadv_tt_common_entry *common; uint16_t local_flags; - tt_global_entry = batadv_tt_global_hash_find(bat_priv, tt_addr); - tt_local_entry = batadv_tt_local_hash_find(bat_priv, tt_addr); + /* ignore global entries from backbone nodes */ + if (batadv_bla_is_backbone_gw_orig(bat_priv, orig_node->orig, vid)) + return true; + + tt_global_entry = batadv_tt_global_hash_find(bat_priv, tt_addr, vid); + tt_local_entry = batadv_tt_local_hash_find(bat_priv, tt_addr, vid); /* if the node already has a local client for this entry, it has to wait * for a roaming advertisement instead of manually messing up the global @@ -844,6 +1263,7 @@ int batadv_tt_global_add(struct batadv_priv *bat_priv, common = &tt_global_entry->common; memcpy(common->addr, tt_addr, ETH_ALEN); + common->vid = vid; common->flags = flags; tt_global_entry->roam_at = 0; @@ -861,7 +1281,7 @@ int batadv_tt_global_add(struct batadv_priv *bat_priv, hash_added = batadv_hash_add(bat_priv->tt.global_hash, batadv_compare_tt, - batadv_choose_orig, common, + batadv_choose_tt, common, &common->hash_entry); if (unlikely(hash_added != 0)) { @@ -920,14 +1340,15 @@ add_orig_entry: batadv_tt_global_orig_entry_add(tt_global_entry, orig_node, ttvn); batadv_dbg(BATADV_DBG_TT, bat_priv, - "Creating new global tt entry: %pM (via %pM)\n", - common->addr, orig_node->orig); - ret = 1; + "Creating new global tt entry: %pM (vid: %d, via %pM)\n", + common->addr, BATADV_PRINT_VID(common->vid), + orig_node->orig); + ret = true; out_remove: /* remove address from local hash if present */ - local_flags = batadv_tt_local_remove(bat_priv, tt_addr, + local_flags = batadv_tt_local_remove(bat_priv, tt_addr, vid, "global tt received", flags & BATADV_TT_CLIENT_ROAM); tt_global_entry->common.flags |= local_flags & BATADV_TT_CLIENT_WIFI; @@ -947,18 +1368,20 @@ out: } /* batadv_transtable_best_orig - Get best originator list entry from tt entry + * @bat_priv: the bat priv with all the soft interface information * @tt_global_entry: global translation table entry to be analyzed * * This functon assumes the caller holds rcu_read_lock(). * Returns best originator list entry or NULL on errors. */ static struct batadv_tt_orig_list_entry * -batadv_transtable_best_orig(struct batadv_tt_global_entry *tt_global_entry) +batadv_transtable_best_orig(struct batadv_priv *bat_priv, + struct batadv_tt_global_entry *tt_global_entry) { - struct batadv_neigh_node *router = NULL; + struct batadv_neigh_node *router, *best_router = NULL; + struct batadv_algo_ops *bao = bat_priv->bat_algo_ops; struct hlist_head *head; struct batadv_tt_orig_list_entry *orig_entry, *best_entry = NULL; - int best_tq = 0; head = &tt_global_entry->orig_list; hlist_for_each_entry_rcu(orig_entry, head, list) { @@ -966,64 +1389,104 @@ batadv_transtable_best_orig(struct batadv_tt_global_entry *tt_global_entry) if (!router) continue; - if (router->tq_avg > best_tq) { - best_entry = orig_entry; - best_tq = router->tq_avg; + if (best_router && + bao->bat_neigh_cmp(router, best_router) <= 0) { + batadv_neigh_node_free_ref(router); + continue; } - batadv_neigh_node_free_ref(router); + /* release the refcount for the "old" best */ + if (best_router) + batadv_neigh_node_free_ref(best_router); + + best_entry = orig_entry; + best_router = router; } + if (best_router) + batadv_neigh_node_free_ref(best_router); + return best_entry; } /* batadv_tt_global_print_entry - print all orig nodes who announce the address * for this global entry + * @bat_priv: the bat priv with all the soft interface information * @tt_global_entry: global translation table entry to be printed * @seq: debugfs table seq_file struct * * This functon assumes the caller holds rcu_read_lock(). */ static void -batadv_tt_global_print_entry(struct batadv_tt_global_entry *tt_global_entry, +batadv_tt_global_print_entry(struct batadv_priv *bat_priv, + struct batadv_tt_global_entry *tt_global_entry, struct seq_file *seq) { - struct hlist_head *head; struct batadv_tt_orig_list_entry *orig_entry, *best_entry; struct batadv_tt_common_entry *tt_common_entry; - uint16_t flags; + struct batadv_orig_node_vlan *vlan; + struct hlist_head *head; uint8_t last_ttvn; + uint16_t flags; tt_common_entry = &tt_global_entry->common; flags = tt_common_entry->flags; - best_entry = batadv_transtable_best_orig(tt_global_entry); + best_entry = batadv_transtable_best_orig(bat_priv, tt_global_entry); if (best_entry) { + vlan = batadv_orig_node_vlan_get(best_entry->orig_node, + tt_common_entry->vid); + if (!vlan) { + seq_printf(seq, + " * Cannot retrieve VLAN %d for originator %pM\n", + BATADV_PRINT_VID(tt_common_entry->vid), + best_entry->orig_node->orig); + goto print_list; + } + last_ttvn = atomic_read(&best_entry->orig_node->last_ttvn); seq_printf(seq, - " %c %pM (%3u) via %pM (%3u) (%#.4x) [%c%c%c]\n", + " %c %pM %4i (%3u) via %pM (%3u) (%#.8x) [%c%c%c]\n", '*', tt_global_entry->common.addr, + BATADV_PRINT_VID(tt_global_entry->common.vid), best_entry->ttvn, best_entry->orig_node->orig, - last_ttvn, best_entry->orig_node->tt_crc, + last_ttvn, vlan->tt.crc, (flags & BATADV_TT_CLIENT_ROAM ? 'R' : '.'), (flags & BATADV_TT_CLIENT_WIFI ? 'W' : '.'), (flags & BATADV_TT_CLIENT_TEMP ? 'T' : '.')); + + batadv_orig_node_vlan_free_ref(vlan); } +print_list: head = &tt_global_entry->orig_list; hlist_for_each_entry_rcu(orig_entry, head, list) { if (best_entry == orig_entry) continue; + vlan = batadv_orig_node_vlan_get(orig_entry->orig_node, + tt_common_entry->vid); + if (!vlan) { + seq_printf(seq, + " + Cannot retrieve VLAN %d for originator %pM\n", + BATADV_PRINT_VID(tt_common_entry->vid), + orig_entry->orig_node->orig); + continue; + } + last_ttvn = atomic_read(&orig_entry->orig_node->last_ttvn); - seq_printf(seq, " %c %pM (%3u) via %pM (%3u) [%c%c%c]\n", + seq_printf(seq, + " %c %pM %4d (%3u) via %pM (%3u) (%#.8x) [%c%c%c]\n", '+', tt_global_entry->common.addr, + BATADV_PRINT_VID(tt_global_entry->common.vid), orig_entry->ttvn, orig_entry->orig_node->orig, - last_ttvn, + last_ttvn, vlan->tt.crc, (flags & BATADV_TT_CLIENT_ROAM ? 'R' : '.'), (flags & BATADV_TT_CLIENT_WIFI ? 'W' : '.'), (flags & BATADV_TT_CLIENT_TEMP ? 'T' : '.')); + + batadv_orig_node_vlan_free_ref(vlan); } } @@ -1045,9 +1508,9 @@ int batadv_tt_global_seq_print_text(struct seq_file *seq, void *offset) seq_printf(seq, "Globally announced TT entries received via the mesh %s\n", net_dev->name); - seq_printf(seq, " %-13s %s %-15s %s (%-6s) %s\n", - "Client", "(TTVN)", "Originator", "(Curr TTVN)", "CRC", - "Flags"); + seq_printf(seq, " %-13s %s %s %-15s %s (%-10s) %s\n", + "Client", "VID", "(TTVN)", "Originator", "(Curr TTVN)", + "CRC", "Flags"); for (i = 0; i < hash->size; i++) { head = &hash->table[i]; @@ -1058,7 +1521,7 @@ int batadv_tt_global_seq_print_text(struct seq_file *seq, void *offset) tt_global = container_of(tt_common_entry, struct batadv_tt_global_entry, common); - batadv_tt_global_print_entry(tt_global, seq); + batadv_tt_global_print_entry(bat_priv, tt_global, seq); } rcu_read_unlock(); } @@ -1080,6 +1543,8 @@ batadv_tt_global_del_orig_list(struct batadv_tt_global_entry *tt_global_entry) head = &tt_global_entry->orig_list; hlist_for_each_entry_safe(orig_entry, safe, head, list) { hlist_del_rcu(&orig_entry->list); + batadv_tt_global_size_dec(orig_entry->orig_node, + tt_global_entry->common.vid); batadv_tt_orig_list_entry_free_ref(orig_entry); } spin_unlock_bh(&tt_global_entry->list_lock); @@ -1094,16 +1559,21 @@ batadv_tt_global_del_orig_entry(struct batadv_priv *bat_priv, struct hlist_head *head; struct hlist_node *safe; struct batadv_tt_orig_list_entry *orig_entry; + unsigned short vid; spin_lock_bh(&tt_global_entry->list_lock); head = &tt_global_entry->orig_list; hlist_for_each_entry_safe(orig_entry, safe, head, list) { if (orig_entry->orig_node == orig_node) { + vid = tt_global_entry->common.vid; batadv_dbg(BATADV_DBG_TT, bat_priv, - "Deleting %pM from global tt entry %pM: %s\n", + "Deleting %pM from global tt entry %pM (vid: %d): %s\n", orig_node->orig, - tt_global_entry->common.addr, message); + tt_global_entry->common.addr, + BATADV_PRINT_VID(vid), message); hlist_del_rcu(&orig_entry->list); + batadv_tt_global_size_dec(orig_node, + tt_global_entry->common.vid); batadv_tt_orig_list_entry_free_ref(orig_entry); } } @@ -1150,17 +1620,25 @@ batadv_tt_global_del_roaming(struct batadv_priv *bat_priv, orig_node, message); } - - +/** + * batadv_tt_global_del - remove a client from the global table + * @bat_priv: the bat priv with all the soft interface information + * @orig_node: an originator serving this client + * @addr: the mac address of the client + * @vid: VLAN identifier + * @message: a message explaining the reason for deleting the client to print + * for debugging purpose + * @roaming: true if the deletion has been triggered by a roaming event + */ static void batadv_tt_global_del(struct batadv_priv *bat_priv, struct batadv_orig_node *orig_node, - const unsigned char *addr, + const unsigned char *addr, unsigned short vid, const char *message, bool roaming) { struct batadv_tt_global_entry *tt_global_entry; struct batadv_tt_local_entry *local_entry = NULL; - tt_global_entry = batadv_tt_global_hash_find(bat_priv, addr); + tt_global_entry = batadv_tt_global_hash_find(bat_priv, addr, vid); if (!tt_global_entry) goto out; @@ -1189,7 +1667,8 @@ static void batadv_tt_global_del(struct batadv_priv *bat_priv, * the global entry, since it is useless now. */ local_entry = batadv_tt_local_hash_find(bat_priv, - tt_global_entry->common.addr); + tt_global_entry->common.addr, + vid); if (local_entry) { /* local entry exists, case 2: client roamed to us. */ batadv_tt_global_del_orig_list(tt_global_entry); @@ -1207,8 +1686,18 @@ out: batadv_tt_local_entry_free_ref(local_entry); } +/** + * batadv_tt_global_del_orig - remove all the TT global entries belonging to the + * given originator matching the provided vid + * @bat_priv: the bat priv with all the soft interface information + * @orig_node: the originator owning the entries to remove + * @match_vid: the VLAN identifier to match. If negative all the entries will be + * removed + * @message: debug message to print as "reason" + */ void batadv_tt_global_del_orig(struct batadv_priv *bat_priv, struct batadv_orig_node *orig_node, + int32_t match_vid, const char *message) { struct batadv_tt_global_entry *tt_global; @@ -1218,6 +1707,7 @@ void batadv_tt_global_del_orig(struct batadv_priv *bat_priv, struct hlist_node *safe; struct hlist_head *head; spinlock_t *list_lock; /* protects write access to the hash lists */ + unsigned short vid; if (!hash) return; @@ -1229,6 +1719,10 @@ void batadv_tt_global_del_orig(struct batadv_priv *bat_priv, spin_lock_bh(list_lock); hlist_for_each_entry_safe(tt_common_entry, safe, head, hash_entry) { + /* remove only matching entries */ + if (match_vid >= 0 && tt_common_entry->vid != match_vid) + continue; + tt_global = container_of(tt_common_entry, struct batadv_tt_global_entry, common); @@ -1237,9 +1731,11 @@ void batadv_tt_global_del_orig(struct batadv_priv *bat_priv, orig_node, message); if (hlist_empty(&tt_global->orig_list)) { + vid = tt_global->common.vid; batadv_dbg(BATADV_DBG_TT, bat_priv, - "Deleting global tt entry %pM: %s\n", - tt_global->common.addr, message); + "Deleting global tt entry %pM (vid: %d): %s\n", + tt_global->common.addr, + BATADV_PRINT_VID(vid), message); hlist_del_rcu(&tt_common_entry->hash_entry); batadv_tt_global_entry_free_ref(tt_global); } @@ -1297,8 +1793,10 @@ static void batadv_tt_global_purge(struct batadv_priv *bat_priv) continue; batadv_dbg(BATADV_DBG_TT, bat_priv, - "Deleting global tt entry (%pM): %s\n", - tt_global->common.addr, msg); + "Deleting global tt entry %pM (vid: %d): %s\n", + tt_global->common.addr, + BATADV_PRINT_VID(tt_global->common.vid), + msg); hlist_del_rcu(&tt_common->hash_entry); @@ -1357,23 +1855,49 @@ _batadv_is_ap_isolated(struct batadv_tt_local_entry *tt_local_entry, return ret; } +/** + * batadv_transtable_search - get the mesh destination for a given client + * @bat_priv: the bat priv with all the soft interface information + * @src: mac address of the source client + * @addr: mac address of the destination client + * @vid: VLAN identifier + * + * Returns a pointer to the originator that was selected as destination in the + * mesh for contacting the client 'addr', NULL otherwise. + * In case of multiple originators serving the same client, the function returns + * the best one (best in terms of metric towards the destination node). + * + * If the two clients are AP isolated the function returns NULL. + */ struct batadv_orig_node *batadv_transtable_search(struct batadv_priv *bat_priv, const uint8_t *src, - const uint8_t *addr) + const uint8_t *addr, + unsigned short vid) { struct batadv_tt_local_entry *tt_local_entry = NULL; struct batadv_tt_global_entry *tt_global_entry = NULL; struct batadv_orig_node *orig_node = NULL; struct batadv_tt_orig_list_entry *best_entry; + bool ap_isolation_enabled = false; + struct batadv_softif_vlan *vlan; - if (src && atomic_read(&bat_priv->ap_isolation)) { - tt_local_entry = batadv_tt_local_hash_find(bat_priv, src); + /* if the AP isolation is requested on a VLAN, then check for its + * setting in the proper VLAN private data structure + */ + vlan = batadv_softif_vlan_get(bat_priv, vid); + if (vlan) { + ap_isolation_enabled = atomic_read(&vlan->ap_isolation); + batadv_softif_vlan_free_ref(vlan); + } + + if (src && ap_isolation_enabled) { + tt_local_entry = batadv_tt_local_hash_find(bat_priv, src, vid); if (!tt_local_entry || (tt_local_entry->common.flags & BATADV_TT_CLIENT_PENDING)) goto out; } - tt_global_entry = batadv_tt_global_hash_find(bat_priv, addr); + tt_global_entry = batadv_tt_global_hash_find(bat_priv, addr, vid); if (!tt_global_entry) goto out; @@ -1385,7 +1909,7 @@ struct batadv_orig_node *batadv_transtable_search(struct batadv_priv *bat_priv, goto out; rcu_read_lock(); - best_entry = batadv_transtable_best_orig(tt_global_entry); + best_entry = batadv_transtable_best_orig(bat_priv, tt_global_entry); /* found anything? */ if (best_entry) orig_node = best_entry->orig_node; @@ -1402,17 +1926,40 @@ out: return orig_node; } -/* Calculates the checksum of the local table of a given orig_node */ -static uint16_t batadv_tt_global_crc(struct batadv_priv *bat_priv, - struct batadv_orig_node *orig_node) +/** + * batadv_tt_global_crc - calculates the checksum of the local table belonging + * to the given orig_node + * @bat_priv: the bat priv with all the soft interface information + * @orig_node: originator for which the CRC should be computed + * @vid: VLAN identifier for which the CRC32 has to be computed + * + * This function computes the checksum for the global table corresponding to a + * specific originator. In particular, the checksum is computed as follows: For + * each client connected to the originator the CRC32C of the MAC address and the + * VID is computed and then all the CRC32Cs of the various clients are xor'ed + * together. + * + * The idea behind is that CRC32C should be used as much as possible in order to + * produce a unique hash of the table, but since the order which is used to feed + * the CRC32C function affects the result and since every node in the network + * probably sorts the clients differently, the hash function cannot be directly + * computed over the entire table. Hence the CRC32C is used only on + * the single client entry, while all the results are then xor'ed together + * because the XOR operation can combine them all while trying to reduce the + * noise as much as possible. + * + * Returns the checksum of the global table of a given originator. + */ +static uint32_t batadv_tt_global_crc(struct batadv_priv *bat_priv, + struct batadv_orig_node *orig_node, + unsigned short vid) { - uint16_t total = 0, total_one; struct batadv_hashtable *hash = bat_priv->tt.global_hash; struct batadv_tt_common_entry *tt_common; struct batadv_tt_global_entry *tt_global; struct hlist_head *head; - uint32_t i; - int j; + uint32_t i, crc_tmp, crc = 0; + uint8_t flags; for (i = 0; i < hash->size; i++) { head = &hash->table[i]; @@ -1422,6 +1969,12 @@ static uint16_t batadv_tt_global_crc(struct batadv_priv *bat_priv, tt_global = container_of(tt_common, struct batadv_tt_global_entry, common); + /* compute the CRC only for entries belonging to the + * VLAN identified by the vid passed as parameter + */ + if (tt_common->vid != vid) + continue; + /* Roaming clients are in the global table for * consistency only. They don't have to be * taken into account while computing the @@ -1443,48 +1996,74 @@ static uint16_t batadv_tt_global_crc(struct batadv_priv *bat_priv, orig_node)) continue; - total_one = 0; - for (j = 0; j < ETH_ALEN; j++) - total_one = crc16_byte(total_one, - tt_common->addr[j]); - total ^= total_one; + crc_tmp = crc32c(0, &tt_common->vid, + sizeof(tt_common->vid)); + + /* compute the CRC on flags that have to be kept in sync + * among nodes + */ + flags = tt_common->flags & BATADV_TT_SYNC_MASK; + crc_tmp = crc32c(crc_tmp, &flags, sizeof(flags)); + + crc ^= crc32c(crc_tmp, tt_common->addr, ETH_ALEN); } rcu_read_unlock(); } - return total; + return crc; } -/* Calculates the checksum of the local table */ -static uint16_t batadv_tt_local_crc(struct batadv_priv *bat_priv) +/** + * batadv_tt_local_crc - calculates the checksum of the local table + * @bat_priv: the bat priv with all the soft interface information + * @vid: VLAN identifier for which the CRC32 has to be computed + * + * For details about the computation, please refer to the documentation for + * batadv_tt_global_crc(). + * + * Returns the checksum of the local table + */ +static uint32_t batadv_tt_local_crc(struct batadv_priv *bat_priv, + unsigned short vid) { - uint16_t total = 0, total_one; struct batadv_hashtable *hash = bat_priv->tt.local_hash; struct batadv_tt_common_entry *tt_common; struct hlist_head *head; - uint32_t i; - int j; + uint32_t i, crc_tmp, crc = 0; + uint8_t flags; for (i = 0; i < hash->size; i++) { head = &hash->table[i]; rcu_read_lock(); hlist_for_each_entry_rcu(tt_common, head, hash_entry) { + /* compute the CRC only for entries belonging to the + * VLAN identified by vid + */ + if (tt_common->vid != vid) + continue; + /* not yet committed clients have not to be taken into * account while computing the CRC */ if (tt_common->flags & BATADV_TT_CLIENT_NEW) continue; - total_one = 0; - for (j = 0; j < ETH_ALEN; j++) - total_one = crc16_byte(total_one, - tt_common->addr[j]); - total ^= total_one; + + crc_tmp = crc32c(0, &tt_common->vid, + sizeof(tt_common->vid)); + + /* compute the CRC on flags that have to be kept in sync + * among nodes + */ + flags = tt_common->flags & BATADV_TT_SYNC_MASK; + crc_tmp = crc32c(crc_tmp, &flags, sizeof(flags)); + + crc ^= crc32c(crc_tmp, tt_common->addr, ETH_ALEN); } rcu_read_unlock(); } - return total; + return crc; } static void batadv_tt_req_list_free(struct batadv_priv *bat_priv) @@ -1503,11 +2082,9 @@ static void batadv_tt_req_list_free(struct batadv_priv *bat_priv) static void batadv_tt_save_orig_buffer(struct batadv_priv *bat_priv, struct batadv_orig_node *orig_node, - const unsigned char *tt_buff, - uint8_t tt_num_changes) + const void *tt_buff, + uint16_t tt_buff_len) { - uint16_t tt_buff_len = batadv_tt_len(tt_num_changes); - /* Replace the old buffer only if I received something in the * last OGM (the OGM could carry no changes) */ @@ -1569,9 +2146,14 @@ unlock: return tt_req_node; } -/* data_ptr is useless here, but has to be kept to respect the prototype */ -static int batadv_tt_local_valid_entry(const void *entry_ptr, - const void *data_ptr) +/** + * batadv_tt_local_valid - verify that given tt entry is a valid one + * @entry_ptr: to be checked local tt entry + * @data_ptr: not used but definition required to satisfy the callback prototype + * + * Returns 1 if the entry is a valid, 0 otherwise. + */ +static int batadv_tt_local_valid(const void *entry_ptr, const void *data_ptr) { const struct batadv_tt_common_entry *tt_common_entry = entry_ptr; @@ -1598,41 +2180,30 @@ static int batadv_tt_global_valid(const void *entry_ptr, return batadv_tt_global_entry_has_orig(tt_global_entry, orig_node); } -static struct sk_buff * -batadv_tt_response_fill_table(uint16_t tt_len, uint8_t ttvn, - struct batadv_hashtable *hash, - struct batadv_priv *bat_priv, - int (*valid_cb)(const void *, const void *), - void *cb_data) +/** + * batadv_tt_tvlv_generate - fill the tvlv buff with the tt entries from the + * specified tt hash + * @bat_priv: the bat priv with all the soft interface information + * @hash: hash table containing the tt entries + * @tt_len: expected tvlv tt data buffer length in number of bytes + * @tvlv_buff: pointer to the buffer to fill with the TT data + * @valid_cb: function to filter tt change entries + * @cb_data: data passed to the filter function as argument + */ +static void batadv_tt_tvlv_generate(struct batadv_priv *bat_priv, + struct batadv_hashtable *hash, + void *tvlv_buff, uint16_t tt_len, + int (*valid_cb)(const void *, const void *), + void *cb_data) { struct batadv_tt_common_entry *tt_common_entry; - struct batadv_tt_query_packet *tt_response; - struct batadv_tt_change *tt_change; + struct batadv_tvlv_tt_change *tt_change; struct hlist_head *head; - struct sk_buff *skb = NULL; - uint16_t tt_tot, tt_count; - ssize_t tt_query_size = sizeof(struct batadv_tt_query_packet); + uint16_t tt_tot, tt_num_entries = 0; uint32_t i; - size_t len; - - if (tt_query_size + tt_len > bat_priv->soft_iface->mtu) { - tt_len = bat_priv->soft_iface->mtu - tt_query_size; - tt_len -= tt_len % sizeof(struct batadv_tt_change); - } - tt_tot = tt_len / sizeof(struct batadv_tt_change); - - len = tt_query_size + tt_len; - skb = netdev_alloc_skb_ip_align(NULL, len + ETH_HLEN); - if (!skb) - goto out; - - skb->priority = TC_PRIO_CONTROL; - skb_reserve(skb, ETH_HLEN); - tt_response = (struct batadv_tt_query_packet *)skb_put(skb, len); - tt_response->ttvn = ttvn; - tt_change = (struct batadv_tt_change *)(skb->data + tt_query_size); - tt_count = 0; + tt_tot = batadv_tt_entries(tt_len); + tt_change = (struct batadv_tvlv_tt_change *)tvlv_buff; rcu_read_lock(); for (i = 0; i < hash->size; i++) { @@ -1640,7 +2211,7 @@ batadv_tt_response_fill_table(uint16_t tt_len, uint8_t ttvn, hlist_for_each_entry_rcu(tt_common_entry, head, hash_entry) { - if (tt_count == tt_tot) + if (tt_tot == tt_num_entries) break; if ((valid_cb) && (!valid_cb(tt_common_entry, cb_data))) @@ -1649,33 +2220,123 @@ batadv_tt_response_fill_table(uint16_t tt_len, uint8_t ttvn, memcpy(tt_change->addr, tt_common_entry->addr, ETH_ALEN); tt_change->flags = tt_common_entry->flags; + tt_change->vid = htons(tt_common_entry->vid); + tt_change->reserved = 0; - tt_count++; + tt_num_entries++; tt_change++; } } rcu_read_unlock(); +} - /* store in the message the number of entries we have successfully - * copied - */ - tt_response->tt_data = htons(tt_count); +/** + * batadv_tt_global_check_crc - check if all the CRCs are correct + * @orig_node: originator for which the CRCs have to be checked + * @tt_vlan: pointer to the first tvlv VLAN entry + * @num_vlan: number of tvlv VLAN entries + * @create: if true, create VLAN objects if not found + * + * Return true if all the received CRCs match the locally stored ones, false + * otherwise + */ +static bool batadv_tt_global_check_crc(struct batadv_orig_node *orig_node, + struct batadv_tvlv_tt_vlan_data *tt_vlan, + uint16_t num_vlan) +{ + struct batadv_tvlv_tt_vlan_data *tt_vlan_tmp; + struct batadv_orig_node_vlan *vlan; + int i; -out: - return skb; + /* check if each received CRC matches the locally stored one */ + for (i = 0; i < num_vlan; i++) { + tt_vlan_tmp = tt_vlan + i; + + /* if orig_node is a backbone node for this VLAN, don't check + * the CRC as we ignore all the global entries over it + */ + if (batadv_bla_is_backbone_gw_orig(orig_node->bat_priv, + orig_node->orig, + ntohs(tt_vlan_tmp->vid))) + continue; + + vlan = batadv_orig_node_vlan_get(orig_node, + ntohs(tt_vlan_tmp->vid)); + if (!vlan) + return false; + + if (vlan->tt.crc != ntohl(tt_vlan_tmp->crc)) + return false; + } + + return true; +} + +/** + * batadv_tt_local_update_crc - update all the local CRCs + * @bat_priv: the bat priv with all the soft interface information + */ +static void batadv_tt_local_update_crc(struct batadv_priv *bat_priv) +{ + struct batadv_softif_vlan *vlan; + + /* recompute the global CRC for each VLAN */ + rcu_read_lock(); + hlist_for_each_entry_rcu(vlan, &bat_priv->softif_vlan_list, list) { + vlan->tt.crc = batadv_tt_local_crc(bat_priv, vlan->vid); + } + rcu_read_unlock(); } +/** + * batadv_tt_global_update_crc - update all the global CRCs for this orig_node + * @bat_priv: the bat priv with all the soft interface information + * @orig_node: the orig_node for which the CRCs have to be updated + */ +static void batadv_tt_global_update_crc(struct batadv_priv *bat_priv, + struct batadv_orig_node *orig_node) +{ + struct batadv_orig_node_vlan *vlan; + uint32_t crc; + + /* recompute the global CRC for each VLAN */ + rcu_read_lock(); + list_for_each_entry_rcu(vlan, &orig_node->vlan_list, list) { + /* if orig_node is a backbone node for this VLAN, don't compute + * the CRC as we ignore all the global entries over it + */ + if (batadv_bla_is_backbone_gw_orig(bat_priv, orig_node->orig, + vlan->vid)) + continue; + + crc = batadv_tt_global_crc(bat_priv, orig_node, vlan->vid); + vlan->tt.crc = crc; + } + rcu_read_unlock(); +} + +/** + * batadv_send_tt_request - send a TT Request message to a given node + * @bat_priv: the bat priv with all the soft interface information + * @dst_orig_node: the destination of the message + * @ttvn: the version number that the source of the message is looking for + * @tt_vlan: pointer to the first tvlv VLAN object to request + * @num_vlan: number of tvlv VLAN entries + * @full_table: ask for the entire translation table if true, while only for the + * last TT diff otherwise + */ static int batadv_send_tt_request(struct batadv_priv *bat_priv, struct batadv_orig_node *dst_orig_node, - uint8_t ttvn, uint16_t tt_crc, - bool full_table) + uint8_t ttvn, + struct batadv_tvlv_tt_vlan_data *tt_vlan, + uint16_t num_vlan, bool full_table) { - struct sk_buff *skb = NULL; - struct batadv_tt_query_packet *tt_request; - struct batadv_hard_iface *primary_if; + struct batadv_tvlv_tt_data *tvlv_tt_data = NULL; struct batadv_tt_req_node *tt_req_node = NULL; - int ret = 1; - size_t tt_req_len; + struct batadv_tvlv_tt_vlan_data *tt_vlan_req; + struct batadv_hard_iface *primary_if; + bool ret = false; + int i, size; primary_if = batadv_primary_if_get_selected(bat_priv); if (!primary_if) @@ -1688,157 +2349,171 @@ static int batadv_send_tt_request(struct batadv_priv *bat_priv, if (!tt_req_node) goto out; - skb = netdev_alloc_skb_ip_align(NULL, sizeof(*tt_request) + ETH_HLEN); - if (!skb) + size = sizeof(*tvlv_tt_data) + sizeof(*tt_vlan_req) * num_vlan; + tvlv_tt_data = kzalloc(size, GFP_ATOMIC); + if (!tvlv_tt_data) goto out; - skb->priority = TC_PRIO_CONTROL; - skb_reserve(skb, ETH_HLEN); + tvlv_tt_data->flags = BATADV_TT_REQUEST; + tvlv_tt_data->ttvn = ttvn; + tvlv_tt_data->num_vlan = htons(num_vlan); - tt_req_len = sizeof(*tt_request); - tt_request = (struct batadv_tt_query_packet *)skb_put(skb, tt_req_len); + /* send all the CRCs within the request. This is needed by intermediate + * nodes to ensure they have the correct table before replying + */ + tt_vlan_req = (struct batadv_tvlv_tt_vlan_data *)(tvlv_tt_data + 1); + for (i = 0; i < num_vlan; i++) { + tt_vlan_req->vid = tt_vlan->vid; + tt_vlan_req->crc = tt_vlan->crc; - tt_request->header.packet_type = BATADV_TT_QUERY; - tt_request->header.version = BATADV_COMPAT_VERSION; - memcpy(tt_request->src, primary_if->net_dev->dev_addr, ETH_ALEN); - memcpy(tt_request->dst, dst_orig_node->orig, ETH_ALEN); - tt_request->header.ttl = BATADV_TTL; - tt_request->ttvn = ttvn; - tt_request->tt_data = htons(tt_crc); - tt_request->flags = BATADV_TT_REQUEST; + tt_vlan_req++; + tt_vlan++; + } if (full_table) - tt_request->flags |= BATADV_TT_FULL_TABLE; + tvlv_tt_data->flags |= BATADV_TT_FULL_TABLE; batadv_dbg(BATADV_DBG_TT, bat_priv, "Sending TT_REQUEST to %pM [%c]\n", - dst_orig_node->orig, (full_table ? 'F' : '.')); + dst_orig_node->orig, full_table ? 'F' : '.'); batadv_inc_counter(bat_priv, BATADV_CNT_TT_REQUEST_TX); - - if (batadv_send_skb_to_orig(skb, dst_orig_node, NULL) != NET_XMIT_DROP) - ret = 0; + batadv_tvlv_unicast_send(bat_priv, primary_if->net_dev->dev_addr, + dst_orig_node->orig, BATADV_TVLV_TT, 1, + tvlv_tt_data, size); + ret = true; out: if (primary_if) batadv_hardif_free_ref(primary_if); - if (ret) - kfree_skb(skb); if (ret && tt_req_node) { spin_lock_bh(&bat_priv->tt.req_list_lock); list_del(&tt_req_node->list); spin_unlock_bh(&bat_priv->tt.req_list_lock); kfree(tt_req_node); } + kfree(tvlv_tt_data); return ret; } -static bool -batadv_send_other_tt_response(struct batadv_priv *bat_priv, - struct batadv_tt_query_packet *tt_request) +/** + * batadv_send_other_tt_response - send reply to tt request concerning another + * node's translation table + * @bat_priv: the bat priv with all the soft interface information + * @tt_data: tt data containing the tt request information + * @req_src: mac address of tt request sender + * @req_dst: mac address of tt request recipient + * + * Returns true if tt request reply was sent, false otherwise. + */ +static bool batadv_send_other_tt_response(struct batadv_priv *bat_priv, + struct batadv_tvlv_tt_data *tt_data, + uint8_t *req_src, uint8_t *req_dst) { struct batadv_orig_node *req_dst_orig_node; struct batadv_orig_node *res_dst_orig_node = NULL; - uint8_t orig_ttvn, req_ttvn, ttvn; - int res, ret = false; - unsigned char *tt_buff; - bool full_table; - uint16_t tt_len, tt_tot; - struct sk_buff *skb = NULL; - struct batadv_tt_query_packet *tt_response; - uint8_t *packet_pos; - size_t len; + struct batadv_tvlv_tt_change *tt_change; + struct batadv_tvlv_tt_data *tvlv_tt_data = NULL; + struct batadv_tvlv_tt_vlan_data *tt_vlan; + bool ret = false, full_table; + uint8_t orig_ttvn, req_ttvn; + uint16_t tvlv_len; + int32_t tt_len; batadv_dbg(BATADV_DBG_TT, bat_priv, "Received TT_REQUEST from %pM for ttvn: %u (%pM) [%c]\n", - tt_request->src, tt_request->ttvn, tt_request->dst, - (tt_request->flags & BATADV_TT_FULL_TABLE ? 'F' : '.')); + req_src, tt_data->ttvn, req_dst, + (tt_data->flags & BATADV_TT_FULL_TABLE ? 'F' : '.')); /* Let's get the orig node of the REAL destination */ - req_dst_orig_node = batadv_orig_hash_find(bat_priv, tt_request->dst); + req_dst_orig_node = batadv_orig_hash_find(bat_priv, req_dst); if (!req_dst_orig_node) goto out; - res_dst_orig_node = batadv_orig_hash_find(bat_priv, tt_request->src); + res_dst_orig_node = batadv_orig_hash_find(bat_priv, req_src); if (!res_dst_orig_node) goto out; orig_ttvn = (uint8_t)atomic_read(&req_dst_orig_node->last_ttvn); - req_ttvn = tt_request->ttvn; + req_ttvn = tt_data->ttvn; - /* I don't have the requested data */ + tt_vlan = (struct batadv_tvlv_tt_vlan_data *)(tt_data + 1); + /* this node doesn't have the requested data */ if (orig_ttvn != req_ttvn || - tt_request->tt_data != htons(req_dst_orig_node->tt_crc)) + !batadv_tt_global_check_crc(req_dst_orig_node, tt_vlan, + ntohs(tt_data->num_vlan))) goto out; /* If the full table has been explicitly requested */ - if (tt_request->flags & BATADV_TT_FULL_TABLE || + if (tt_data->flags & BATADV_TT_FULL_TABLE || !req_dst_orig_node->tt_buff) full_table = true; else full_table = false; - /* In this version, fragmentation is not implemented, then - * I'll send only one packet with as much TT entries as I can + /* TT fragmentation hasn't been implemented yet, so send as many + * TT entries fit a single packet as possible only */ if (!full_table) { spin_lock_bh(&req_dst_orig_node->tt_buff_lock); tt_len = req_dst_orig_node->tt_buff_len; - tt_tot = tt_len / sizeof(struct batadv_tt_change); - len = sizeof(*tt_response) + tt_len; - skb = netdev_alloc_skb_ip_align(NULL, len + ETH_HLEN); - if (!skb) + tvlv_len = batadv_tt_prepare_tvlv_global_data(req_dst_orig_node, + &tvlv_tt_data, + &tt_change, + &tt_len); + if (!tt_len) goto unlock; - skb->priority = TC_PRIO_CONTROL; - skb_reserve(skb, ETH_HLEN); - packet_pos = skb_put(skb, len); - tt_response = (struct batadv_tt_query_packet *)packet_pos; - tt_response->ttvn = req_ttvn; - tt_response->tt_data = htons(tt_tot); - - tt_buff = skb->data + sizeof(*tt_response); /* Copy the last orig_node's OGM buffer */ - memcpy(tt_buff, req_dst_orig_node->tt_buff, + memcpy(tt_change, req_dst_orig_node->tt_buff, req_dst_orig_node->tt_buff_len); - spin_unlock_bh(&req_dst_orig_node->tt_buff_lock); } else { - tt_len = (uint16_t)atomic_read(&req_dst_orig_node->tt_size); - tt_len *= sizeof(struct batadv_tt_change); - ttvn = (uint8_t)atomic_read(&req_dst_orig_node->last_ttvn); - - skb = batadv_tt_response_fill_table(tt_len, ttvn, - bat_priv->tt.global_hash, - bat_priv, - batadv_tt_global_valid, - req_dst_orig_node); - if (!skb) + /* allocate the tvlv, put the tt_data and all the tt_vlan_data + * in the initial part + */ + tt_len = -1; + tvlv_len = batadv_tt_prepare_tvlv_global_data(req_dst_orig_node, + &tvlv_tt_data, + &tt_change, + &tt_len); + if (!tt_len) goto out; - tt_response = (struct batadv_tt_query_packet *)skb->data; + /* fill the rest of the tvlv with the real TT entries */ + batadv_tt_tvlv_generate(bat_priv, bat_priv->tt.global_hash, + tt_change, tt_len, + batadv_tt_global_valid, + req_dst_orig_node); } - tt_response->header.packet_type = BATADV_TT_QUERY; - tt_response->header.version = BATADV_COMPAT_VERSION; - tt_response->header.ttl = BATADV_TTL; - memcpy(tt_response->src, req_dst_orig_node->orig, ETH_ALEN); - memcpy(tt_response->dst, tt_request->src, ETH_ALEN); - tt_response->flags = BATADV_TT_RESPONSE; + /* Don't send the response, if larger than fragmented packet. */ + tt_len = sizeof(struct batadv_unicast_tvlv_packet) + tvlv_len; + if (tt_len > atomic_read(&bat_priv->packet_size_max)) { + net_ratelimited_function(batadv_info, bat_priv->soft_iface, + "Ignoring TT_REQUEST from %pM; Response size exceeds max packet size.\n", + res_dst_orig_node->orig); + goto out; + } + + tvlv_tt_data->flags = BATADV_TT_RESPONSE; + tvlv_tt_data->ttvn = req_ttvn; if (full_table) - tt_response->flags |= BATADV_TT_FULL_TABLE; + tvlv_tt_data->flags |= BATADV_TT_FULL_TABLE; batadv_dbg(BATADV_DBG_TT, bat_priv, - "Sending TT_RESPONSE %pM for %pM (ttvn: %u)\n", - res_dst_orig_node->orig, req_dst_orig_node->orig, req_ttvn); + "Sending TT_RESPONSE %pM for %pM [%c] (ttvn: %u)\n", + res_dst_orig_node->orig, req_dst_orig_node->orig, + full_table ? 'F' : '.', req_ttvn); batadv_inc_counter(bat_priv, BATADV_CNT_TT_RESPONSE_TX); - res = batadv_send_skb_to_orig(skb, res_dst_orig_node, NULL); - if (res != NET_XMIT_DROP) - ret = true; + batadv_tvlv_unicast_send(bat_priv, req_dst_orig_node->orig, + req_src, BATADV_TVLV_TT, 1, tvlv_tt_data, + tvlv_len); + ret = true; goto out; unlock: @@ -1849,37 +2524,43 @@ out: batadv_orig_node_free_ref(res_dst_orig_node); if (req_dst_orig_node) batadv_orig_node_free_ref(req_dst_orig_node); - if (!ret) - kfree_skb(skb); + kfree(tvlv_tt_data); return ret; } -static bool -batadv_send_my_tt_response(struct batadv_priv *bat_priv, - struct batadv_tt_query_packet *tt_request) +/** + * batadv_send_my_tt_response - send reply to tt request concerning this node's + * translation table + * @bat_priv: the bat priv with all the soft interface information + * @tt_data: tt data containing the tt request information + * @req_src: mac address of tt request sender + * + * Returns true if tt request reply was sent, false otherwise. + */ +static bool batadv_send_my_tt_response(struct batadv_priv *bat_priv, + struct batadv_tvlv_tt_data *tt_data, + uint8_t *req_src) { - struct batadv_orig_node *orig_node; + struct batadv_tvlv_tt_data *tvlv_tt_data = NULL; struct batadv_hard_iface *primary_if = NULL; - uint8_t my_ttvn, req_ttvn, ttvn; - int ret = false; - unsigned char *tt_buff; + struct batadv_tvlv_tt_change *tt_change; + struct batadv_orig_node *orig_node; + uint8_t my_ttvn, req_ttvn; + uint16_t tvlv_len; bool full_table; - uint16_t tt_len, tt_tot; - struct sk_buff *skb = NULL; - struct batadv_tt_query_packet *tt_response; - uint8_t *packet_pos; - size_t len; + int32_t tt_len; batadv_dbg(BATADV_DBG_TT, bat_priv, "Received TT_REQUEST from %pM for ttvn: %u (me) [%c]\n", - tt_request->src, tt_request->ttvn, - (tt_request->flags & BATADV_TT_FULL_TABLE ? 'F' : '.')); + req_src, tt_data->ttvn, + (tt_data->flags & BATADV_TT_FULL_TABLE ? 'F' : '.')); + spin_lock_bh(&bat_priv->tt.commit_lock); my_ttvn = (uint8_t)atomic_read(&bat_priv->tt.vn); - req_ttvn = tt_request->ttvn; + req_ttvn = tt_data->ttvn; - orig_node = batadv_orig_hash_find(bat_priv, tt_request->src); + orig_node = batadv_orig_hash_find(bat_priv, req_src); if (!orig_node) goto out; @@ -1890,103 +2571,104 @@ batadv_send_my_tt_response(struct batadv_priv *bat_priv, /* If the full table has been explicitly requested or the gap * is too big send the whole local translation table */ - if (tt_request->flags & BATADV_TT_FULL_TABLE || my_ttvn != req_ttvn || + if (tt_data->flags & BATADV_TT_FULL_TABLE || my_ttvn != req_ttvn || !bat_priv->tt.last_changeset) full_table = true; else full_table = false; - /* In this version, fragmentation is not implemented, then - * I'll send only one packet with as much TT entries as I can + /* TT fragmentation hasn't been implemented yet, so send as many + * TT entries fit a single packet as possible only */ if (!full_table) { spin_lock_bh(&bat_priv->tt.last_changeset_lock); - tt_len = bat_priv->tt.last_changeset_len; - tt_tot = tt_len / sizeof(struct batadv_tt_change); - len = sizeof(*tt_response) + tt_len; - skb = netdev_alloc_skb_ip_align(NULL, len + ETH_HLEN); - if (!skb) + tt_len = bat_priv->tt.last_changeset_len; + tvlv_len = batadv_tt_prepare_tvlv_local_data(bat_priv, + &tvlv_tt_data, + &tt_change, + &tt_len); + if (!tt_len) goto unlock; - skb->priority = TC_PRIO_CONTROL; - skb_reserve(skb, ETH_HLEN); - packet_pos = skb_put(skb, len); - tt_response = (struct batadv_tt_query_packet *)packet_pos; - tt_response->ttvn = req_ttvn; - tt_response->tt_data = htons(tt_tot); - - tt_buff = skb->data + sizeof(*tt_response); - memcpy(tt_buff, bat_priv->tt.last_changeset, + /* Copy the last orig_node's OGM buffer */ + memcpy(tt_change, bat_priv->tt.last_changeset, bat_priv->tt.last_changeset_len); spin_unlock_bh(&bat_priv->tt.last_changeset_lock); } else { - tt_len = (uint16_t)atomic_read(&bat_priv->tt.local_entry_num); - tt_len *= sizeof(struct batadv_tt_change); - ttvn = (uint8_t)atomic_read(&bat_priv->tt.vn); - - skb = batadv_tt_response_fill_table(tt_len, ttvn, - bat_priv->tt.local_hash, - bat_priv, - batadv_tt_local_valid_entry, - NULL); - if (!skb) + req_ttvn = (uint8_t)atomic_read(&bat_priv->tt.vn); + + /* allocate the tvlv, put the tt_data and all the tt_vlan_data + * in the initial part + */ + tt_len = -1; + tvlv_len = batadv_tt_prepare_tvlv_local_data(bat_priv, + &tvlv_tt_data, + &tt_change, + &tt_len); + if (!tt_len) goto out; - tt_response = (struct batadv_tt_query_packet *)skb->data; + /* fill the rest of the tvlv with the real TT entries */ + batadv_tt_tvlv_generate(bat_priv, bat_priv->tt.local_hash, + tt_change, tt_len, + batadv_tt_local_valid, NULL); } - tt_response->header.packet_type = BATADV_TT_QUERY; - tt_response->header.version = BATADV_COMPAT_VERSION; - tt_response->header.ttl = BATADV_TTL; - memcpy(tt_response->src, primary_if->net_dev->dev_addr, ETH_ALEN); - memcpy(tt_response->dst, tt_request->src, ETH_ALEN); - tt_response->flags = BATADV_TT_RESPONSE; + tvlv_tt_data->flags = BATADV_TT_RESPONSE; + tvlv_tt_data->ttvn = req_ttvn; if (full_table) - tt_response->flags |= BATADV_TT_FULL_TABLE; + tvlv_tt_data->flags |= BATADV_TT_FULL_TABLE; batadv_dbg(BATADV_DBG_TT, bat_priv, - "Sending TT_RESPONSE to %pM [%c]\n", - orig_node->orig, - (tt_response->flags & BATADV_TT_FULL_TABLE ? 'F' : '.')); + "Sending TT_RESPONSE to %pM [%c] (ttvn: %u)\n", + orig_node->orig, full_table ? 'F' : '.', req_ttvn); batadv_inc_counter(bat_priv, BATADV_CNT_TT_RESPONSE_TX); - if (batadv_send_skb_to_orig(skb, orig_node, NULL) != NET_XMIT_DROP) - ret = true; + batadv_tvlv_unicast_send(bat_priv, primary_if->net_dev->dev_addr, + req_src, BATADV_TVLV_TT, 1, tvlv_tt_data, + tvlv_len); + goto out; unlock: spin_unlock_bh(&bat_priv->tt.last_changeset_lock); out: + spin_unlock_bh(&bat_priv->tt.commit_lock); if (orig_node) batadv_orig_node_free_ref(orig_node); if (primary_if) batadv_hardif_free_ref(primary_if); - if (!ret) - kfree_skb(skb); - /* This packet was for me, so it doesn't need to be re-routed */ + kfree(tvlv_tt_data); + /* The packet was for this host, so it doesn't need to be re-routed */ return true; } -bool batadv_send_tt_response(struct batadv_priv *bat_priv, - struct batadv_tt_query_packet *tt_request) +/** + * batadv_send_tt_response - send reply to tt request + * @bat_priv: the bat priv with all the soft interface information + * @tt_data: tt data containing the tt request information + * @req_src: mac address of tt request sender + * @req_dst: mac address of tt request recipient + * + * Returns true if tt request reply was sent, false otherwise. + */ +static bool batadv_send_tt_response(struct batadv_priv *bat_priv, + struct batadv_tvlv_tt_data *tt_data, + uint8_t *req_src, uint8_t *req_dst) { - if (batadv_is_my_mac(bat_priv, tt_request->dst)) { - /* don't answer backbone gws! */ - if (batadv_bla_is_backbone_gw_orig(bat_priv, tt_request->src)) - return true; - - return batadv_send_my_tt_response(bat_priv, tt_request); - } else { - return batadv_send_other_tt_response(bat_priv, tt_request); - } + if (batadv_is_my_mac(bat_priv, req_dst)) + return batadv_send_my_tt_response(bat_priv, tt_data, req_src); + else + return batadv_send_other_tt_response(bat_priv, tt_data, + req_src, req_dst); } static void _batadv_tt_update_changes(struct batadv_priv *bat_priv, struct batadv_orig_node *orig_node, - struct batadv_tt_change *tt_change, + struct batadv_tvlv_tt_change *tt_change, uint16_t tt_num_changes, uint8_t ttvn) { int i; @@ -1997,11 +2679,13 @@ static void _batadv_tt_update_changes(struct batadv_priv *bat_priv, roams = (tt_change + i)->flags & BATADV_TT_CLIENT_ROAM; batadv_tt_global_del(bat_priv, orig_node, (tt_change + i)->addr, + ntohs((tt_change + i)->vid), "tt removed by changes", roams); } else { if (!batadv_tt_global_add(bat_priv, orig_node, (tt_change + i)->addr, + ntohs((tt_change + i)->vid), (tt_change + i)->flags, ttvn)) /* In case of problem while storing a * global_entry, we stop the updating @@ -2016,21 +2700,22 @@ static void _batadv_tt_update_changes(struct batadv_priv *bat_priv, } static void batadv_tt_fill_gtable(struct batadv_priv *bat_priv, - struct batadv_tt_query_packet *tt_response) + struct batadv_tvlv_tt_change *tt_change, + uint8_t ttvn, uint8_t *resp_src, + uint16_t num_entries) { struct batadv_orig_node *orig_node; - orig_node = batadv_orig_hash_find(bat_priv, tt_response->src); + orig_node = batadv_orig_hash_find(bat_priv, resp_src); if (!orig_node) goto out; /* Purge the old table first.. */ - batadv_tt_global_del_orig(bat_priv, orig_node, "Received full table"); + batadv_tt_global_del_orig(bat_priv, orig_node, -1, + "Received full table"); - _batadv_tt_update_changes(bat_priv, orig_node, - (struct batadv_tt_change *)(tt_response + 1), - ntohs(tt_response->tt_data), - tt_response->ttvn); + _batadv_tt_update_changes(bat_priv, orig_node, tt_change, num_entries, + ttvn); spin_lock_bh(&orig_node->tt_buff_lock); kfree(orig_node->tt_buff); @@ -2038,7 +2723,7 @@ static void batadv_tt_fill_gtable(struct batadv_priv *bat_priv, orig_node->tt_buff = NULL; spin_unlock_bh(&orig_node->tt_buff_lock); - atomic_set(&orig_node->last_ttvn, tt_response->ttvn); + atomic_set(&orig_node->last_ttvn, ttvn); out: if (orig_node) @@ -2048,22 +2733,31 @@ out: static void batadv_tt_update_changes(struct batadv_priv *bat_priv, struct batadv_orig_node *orig_node, uint16_t tt_num_changes, uint8_t ttvn, - struct batadv_tt_change *tt_change) + struct batadv_tvlv_tt_change *tt_change) { _batadv_tt_update_changes(bat_priv, orig_node, tt_change, tt_num_changes, ttvn); - batadv_tt_save_orig_buffer(bat_priv, orig_node, - (unsigned char *)tt_change, tt_num_changes); + batadv_tt_save_orig_buffer(bat_priv, orig_node, tt_change, + batadv_tt_len(tt_num_changes)); atomic_set(&orig_node->last_ttvn, ttvn); } -bool batadv_is_my_client(struct batadv_priv *bat_priv, const uint8_t *addr) +/** + * batadv_is_my_client - check if a client is served by the local node + * @bat_priv: the bat priv with all the soft interface information + * @addr: the mac adress of the client to check + * @vid: VLAN identifier + * + * Returns true if the client is served by this node, false otherwise. + */ +bool batadv_is_my_client(struct batadv_priv *bat_priv, const uint8_t *addr, + unsigned short vid) { struct batadv_tt_local_entry *tt_local_entry; bool ret = false; - tt_local_entry = batadv_tt_local_hash_find(bat_priv, addr); + tt_local_entry = batadv_tt_local_hash_find(bat_priv, addr, vid); if (!tt_local_entry) goto out; /* Check if the client has been logically deleted (but is kept for @@ -2079,72 +2773,68 @@ out: return ret; } -void batadv_handle_tt_response(struct batadv_priv *bat_priv, - struct batadv_tt_query_packet *tt_response) +/** + * batadv_handle_tt_response - process incoming tt reply + * @bat_priv: the bat priv with all the soft interface information + * @tt_data: tt data containing the tt request information + * @resp_src: mac address of tt reply sender + * @num_entries: number of tt change entries appended to the tt data + */ +static void batadv_handle_tt_response(struct batadv_priv *bat_priv, + struct batadv_tvlv_tt_data *tt_data, + uint8_t *resp_src, uint16_t num_entries) { struct batadv_tt_req_node *node, *safe; struct batadv_orig_node *orig_node = NULL; - struct batadv_tt_change *tt_change; + struct batadv_tvlv_tt_change *tt_change; + uint8_t *tvlv_ptr = (uint8_t *)tt_data; + uint16_t change_offset; batadv_dbg(BATADV_DBG_TT, bat_priv, "Received TT_RESPONSE from %pM for ttvn %d t_size: %d [%c]\n", - tt_response->src, tt_response->ttvn, - ntohs(tt_response->tt_data), - (tt_response->flags & BATADV_TT_FULL_TABLE ? 'F' : '.')); - - /* we should have never asked a backbone gw */ - if (batadv_bla_is_backbone_gw_orig(bat_priv, tt_response->src)) - goto out; + resp_src, tt_data->ttvn, num_entries, + (tt_data->flags & BATADV_TT_FULL_TABLE ? 'F' : '.')); - orig_node = batadv_orig_hash_find(bat_priv, tt_response->src); + orig_node = batadv_orig_hash_find(bat_priv, resp_src); if (!orig_node) goto out; - if (tt_response->flags & BATADV_TT_FULL_TABLE) { - batadv_tt_fill_gtable(bat_priv, tt_response); + spin_lock_bh(&orig_node->tt_lock); + + change_offset = sizeof(struct batadv_tvlv_tt_vlan_data); + change_offset *= ntohs(tt_data->num_vlan); + change_offset += sizeof(*tt_data); + tvlv_ptr += change_offset; + + tt_change = (struct batadv_tvlv_tt_change *)tvlv_ptr; + if (tt_data->flags & BATADV_TT_FULL_TABLE) { + batadv_tt_fill_gtable(bat_priv, tt_change, tt_data->ttvn, + resp_src, num_entries); } else { - tt_change = (struct batadv_tt_change *)(tt_response + 1); - batadv_tt_update_changes(bat_priv, orig_node, - ntohs(tt_response->tt_data), - tt_response->ttvn, tt_change); + batadv_tt_update_changes(bat_priv, orig_node, num_entries, + tt_data->ttvn, tt_change); } + /* Recalculate the CRC for this orig_node and store it */ + batadv_tt_global_update_crc(bat_priv, orig_node); + + spin_unlock_bh(&orig_node->tt_lock); + /* Delete the tt_req_node from pending tt_requests list */ spin_lock_bh(&bat_priv->tt.req_list_lock); list_for_each_entry_safe(node, safe, &bat_priv->tt.req_list, list) { - if (!batadv_compare_eth(node->addr, tt_response->src)) + if (!batadv_compare_eth(node->addr, resp_src)) continue; list_del(&node->list); kfree(node); } - spin_unlock_bh(&bat_priv->tt.req_list_lock); - /* Recalculate the CRC for this orig_node and store it */ - orig_node->tt_crc = batadv_tt_global_crc(bat_priv, orig_node); + spin_unlock_bh(&bat_priv->tt.req_list_lock); out: if (orig_node) batadv_orig_node_free_ref(orig_node); } -int batadv_tt_init(struct batadv_priv *bat_priv) -{ - int ret; - - ret = batadv_tt_local_init(bat_priv); - if (ret < 0) - return ret; - - ret = batadv_tt_global_init(bat_priv); - if (ret < 0) - return ret; - - INIT_DELAYED_WORK(&bat_priv->tt.work, batadv_tt_purge); - queue_delayed_work(batadv_event_workqueue, &bat_priv->tt.work, - msecs_to_jiffies(BATADV_TT_WORK_PERIOD)); - - return 1; -} - static void batadv_tt_roam_list_free(struct batadv_priv *bat_priv) { struct batadv_tt_roam_node *node, *safe; @@ -2225,14 +2915,28 @@ unlock: return ret; } +/** + * batadv_send_roam_adv - send a roaming advertisement message + * @bat_priv: the bat priv with all the soft interface information + * @client: mac address of the roaming client + * @vid: VLAN identifier + * @orig_node: message destination + * + * Send a ROAMING_ADV message to the node which was previously serving this + * client. This is done to inform the node that from now on all traffic destined + * for this particular roamed client has to be forwarded to the sender of the + * roaming message. + */ static void batadv_send_roam_adv(struct batadv_priv *bat_priv, uint8_t *client, + unsigned short vid, struct batadv_orig_node *orig_node) { - struct sk_buff *skb = NULL; - struct batadv_roam_adv_packet *roam_adv_packet; - int ret = 1; struct batadv_hard_iface *primary_if; - size_t len = sizeof(*roam_adv_packet); + struct batadv_tvlv_roam_adv tvlv_roam; + + primary_if = batadv_primary_if_get_selected(bat_priv); + if (!primary_if) + goto out; /* before going on we have to check whether the client has * already roamed to us too many times @@ -2240,40 +2944,22 @@ static void batadv_send_roam_adv(struct batadv_priv *bat_priv, uint8_t *client, if (!batadv_tt_check_roam_count(bat_priv, client)) goto out; - skb = netdev_alloc_skb_ip_align(NULL, len + ETH_HLEN); - if (!skb) - goto out; - - skb->priority = TC_PRIO_CONTROL; - skb_reserve(skb, ETH_HLEN); - - roam_adv_packet = (struct batadv_roam_adv_packet *)skb_put(skb, len); - - roam_adv_packet->header.packet_type = BATADV_ROAM_ADV; - roam_adv_packet->header.version = BATADV_COMPAT_VERSION; - roam_adv_packet->header.ttl = BATADV_TTL; - roam_adv_packet->reserved = 0; - primary_if = batadv_primary_if_get_selected(bat_priv); - if (!primary_if) - goto out; - memcpy(roam_adv_packet->src, primary_if->net_dev->dev_addr, ETH_ALEN); - batadv_hardif_free_ref(primary_if); - memcpy(roam_adv_packet->dst, orig_node->orig, ETH_ALEN); - memcpy(roam_adv_packet->client, client, ETH_ALEN); - batadv_dbg(BATADV_DBG_TT, bat_priv, - "Sending ROAMING_ADV to %pM (client %pM)\n", - orig_node->orig, client); + "Sending ROAMING_ADV to %pM (client %pM, vid: %d)\n", + orig_node->orig, client, BATADV_PRINT_VID(vid)); batadv_inc_counter(bat_priv, BATADV_CNT_TT_ROAM_ADV_TX); - if (batadv_send_skb_to_orig(skb, orig_node, NULL) != NET_XMIT_DROP) - ret = 0; + memcpy(tvlv_roam.client, client, sizeof(tvlv_roam.client)); + tvlv_roam.vid = htons(vid); + + batadv_tvlv_unicast_send(bat_priv, primary_if->net_dev->dev_addr, + orig_node->orig, BATADV_TVLV_ROAM, 1, + &tvlv_roam, sizeof(tvlv_roam)); out: - if (ret && skb) - kfree_skb(skb); - return; + if (primary_if) + batadv_hardif_free_ref(primary_if); } static void batadv_tt_purge(struct work_struct *work) @@ -2286,7 +2972,7 @@ static void batadv_tt_purge(struct work_struct *work) priv_tt = container_of(delayed_work, struct batadv_priv_tt, work); bat_priv = container_of(priv_tt, struct batadv_priv, tt); - batadv_tt_local_purge(bat_priv); + batadv_tt_local_purge(bat_priv, BATADV_TT_LOCAL_TIMEOUT); batadv_tt_global_purge(bat_priv); batadv_tt_req_purge(bat_priv); batadv_tt_roam_purge(bat_priv); @@ -2297,6 +2983,9 @@ static void batadv_tt_purge(struct work_struct *work) void batadv_tt_free(struct batadv_priv *bat_priv) { + batadv_tvlv_container_unregister(bat_priv, BATADV_TVLV_TT, 1); + batadv_tvlv_handler_unregister(bat_priv, BATADV_TVLV_TT, 1); + cancel_delayed_work_sync(&bat_priv->tt.work); batadv_tt_local_table_free(bat_priv); @@ -2308,19 +2997,25 @@ void batadv_tt_free(struct batadv_priv *bat_priv) kfree(bat_priv->tt.last_changeset); } -/* This function will enable or disable the specified flags for all the entries - * in the given hash table and returns the number of modified entries +/** + * batadv_tt_local_set_flags - set or unset the specified flags on the local + * table and possibly count them in the TT size + * @bat_priv: the bat priv with all the soft interface information + * @flags: the flag to switch + * @enable: whether to set or unset the flag + * @count: whether to increase the TT size by the number of changed entries */ -static uint16_t batadv_tt_set_flags(struct batadv_hashtable *hash, - uint16_t flags, bool enable) +static void batadv_tt_local_set_flags(struct batadv_priv *bat_priv, + uint16_t flags, bool enable, bool count) { - uint32_t i; + struct batadv_hashtable *hash = bat_priv->tt.local_hash; + struct batadv_tt_common_entry *tt_common_entry; uint16_t changed_num = 0; struct hlist_head *head; - struct batadv_tt_common_entry *tt_common_entry; + uint32_t i; if (!hash) - goto out; + return; for (i = 0; i < hash->size; i++) { head = &hash->table[i]; @@ -2338,11 +3033,15 @@ static uint16_t batadv_tt_set_flags(struct batadv_hashtable *hash, tt_common_entry->flags &= ~flags; } changed_num++; + + if (!count) + continue; + + batadv_tt_local_size_inc(bat_priv, + tt_common_entry->vid); } rcu_read_unlock(); } -out: - return changed_num; } /* Purge out all the tt local entries marked with BATADV_TT_CLIENT_PENDING */ @@ -2370,10 +3069,11 @@ static void batadv_tt_local_purge_pending_clients(struct batadv_priv *bat_priv) continue; batadv_dbg(BATADV_DBG_TT, bat_priv, - "Deleting local tt entry (%pM): pending\n", - tt_common->addr); + "Deleting local tt entry (%pM, vid: %d): pending\n", + tt_common->addr, + BATADV_PRINT_VID(tt_common->vid)); - atomic_dec(&bat_priv->tt.local_entry_num); + batadv_tt_local_size_dec(bat_priv, tt_common->vid); hlist_del_rcu(&tt_common->hash_entry); tt_local = container_of(tt_common, struct batadv_tt_local_entry, @@ -2384,22 +3084,25 @@ static void batadv_tt_local_purge_pending_clients(struct batadv_priv *bat_priv) } } -static int batadv_tt_commit_changes(struct batadv_priv *bat_priv, - unsigned char **packet_buff, - int *packet_buff_len, int packet_min_len) +/** + * batadv_tt_local_commit_changes_nolock - commit all pending local tt changes + * which have been queued in the time since the last commit + * @bat_priv: the bat priv with all the soft interface information + * + * Caller must hold tt->commit_lock. + */ +static void batadv_tt_local_commit_changes_nolock(struct batadv_priv *bat_priv) { - uint16_t changed_num = 0; - - if (atomic_read(&bat_priv->tt.local_changes) < 1) - return -ENOENT; + if (atomic_read(&bat_priv->tt.local_changes) < 1) { + if (!batadv_atomic_dec_not_zero(&bat_priv->tt.ogm_append_cnt)) + batadv_tt_tvlv_container_update(bat_priv); + return; + } - changed_num = batadv_tt_set_flags(bat_priv->tt.local_hash, - BATADV_TT_CLIENT_NEW, false); + batadv_tt_local_set_flags(bat_priv, BATADV_TT_CLIENT_NEW, false, true); - /* all reset entries have to be counted as local entries */ - atomic_add(changed_num, &bat_priv->tt.local_entry_num); batadv_tt_local_purge_pending_clients(bat_priv); - bat_priv->tt.local_crc = batadv_tt_local_crc(bat_priv); + batadv_tt_local_update_crc(bat_priv); /* Increment the TTVN only once per OGM interval */ atomic_inc(&bat_priv->tt.vn); @@ -2409,49 +3112,38 @@ static int batadv_tt_commit_changes(struct batadv_priv *bat_priv, /* reset the sending counter */ atomic_set(&bat_priv->tt.ogm_append_cnt, BATADV_TT_OGM_APPEND_MAX); - - return batadv_tt_changes_fill_buff(bat_priv, packet_buff, - packet_buff_len, packet_min_len); + batadv_tt_tvlv_container_update(bat_priv); } -/* when calling this function (hard_iface == primary_if) has to be true */ -int batadv_tt_append_diff(struct batadv_priv *bat_priv, - unsigned char **packet_buff, int *packet_buff_len, - int packet_min_len) +/** + * batadv_tt_local_commit_changes - commit all pending local tt changes which + * have been queued in the time since the last commit + * @bat_priv: the bat priv with all the soft interface information + */ +void batadv_tt_local_commit_changes(struct batadv_priv *bat_priv) { - int tt_num_changes; - - /* if at least one change happened */ - tt_num_changes = batadv_tt_commit_changes(bat_priv, packet_buff, - packet_buff_len, - packet_min_len); - - /* if the changes have been sent often enough */ - if ((tt_num_changes < 0) && - (!batadv_atomic_dec_not_zero(&bat_priv->tt.ogm_append_cnt))) { - batadv_tt_realloc_packet_buff(packet_buff, packet_buff_len, - packet_min_len, packet_min_len); - tt_num_changes = 0; - } - - return tt_num_changes; + spin_lock_bh(&bat_priv->tt.commit_lock); + batadv_tt_local_commit_changes_nolock(bat_priv); + spin_unlock_bh(&bat_priv->tt.commit_lock); } bool batadv_is_ap_isolated(struct batadv_priv *bat_priv, uint8_t *src, - uint8_t *dst) + uint8_t *dst, unsigned short vid) { struct batadv_tt_local_entry *tt_local_entry = NULL; struct batadv_tt_global_entry *tt_global_entry = NULL; + struct batadv_softif_vlan *vlan; bool ret = false; - if (!atomic_read(&bat_priv->ap_isolation)) + vlan = batadv_softif_vlan_get(bat_priv, vid); + if (!vlan || !atomic_read(&vlan->ap_isolation)) goto out; - tt_local_entry = batadv_tt_local_hash_find(bat_priv, dst); + tt_local_entry = batadv_tt_local_hash_find(bat_priv, dst, vid); if (!tt_local_entry) goto out; - tt_global_entry = batadv_tt_global_hash_find(bat_priv, src); + tt_global_entry = batadv_tt_global_hash_find(bat_priv, src, vid); if (!tt_global_entry) goto out; @@ -2461,6 +3153,8 @@ bool batadv_is_ap_isolated(struct batadv_priv *bat_priv, uint8_t *src, ret = true; out: + if (vlan) + batadv_softif_vlan_free_ref(vlan); if (tt_global_entry) batadv_tt_global_entry_free_ref(tt_global_entry); if (tt_local_entry) @@ -2468,19 +3162,29 @@ out: return ret; } -void batadv_tt_update_orig(struct batadv_priv *bat_priv, - struct batadv_orig_node *orig_node, - const unsigned char *tt_buff, uint8_t tt_num_changes, - uint8_t ttvn, uint16_t tt_crc) +/** + * batadv_tt_update_orig - update global translation table with new tt + * information received via ogms + * @bat_priv: the bat priv with all the soft interface information + * @orig: the orig_node of the ogm + * @tt_vlan: pointer to the first tvlv VLAN entry + * @tt_num_vlan: number of tvlv VLAN entries + * @tt_change: pointer to the first entry in the TT buffer + * @tt_num_changes: number of tt changes inside the tt buffer + * @ttvn: translation table version number of this changeset + * @tt_crc: crc32 checksum of orig node's translation table + */ +static void batadv_tt_update_orig(struct batadv_priv *bat_priv, + struct batadv_orig_node *orig_node, + const void *tt_buff, uint16_t tt_num_vlan, + struct batadv_tvlv_tt_change *tt_change, + uint16_t tt_num_changes, uint8_t ttvn) { uint8_t orig_ttvn = (uint8_t)atomic_read(&orig_node->last_ttvn); + struct batadv_tvlv_tt_vlan_data *tt_vlan; bool full_table = true; - struct batadv_tt_change *tt_change; - - /* don't care about a backbone gateways updates. */ - if (batadv_bla_is_backbone_gw_orig(bat_priv, orig_node->orig)) - return; + tt_vlan = (struct batadv_tvlv_tt_vlan_data *)tt_buff; /* orig table not initialised AND first diff is in the OGM OR the ttvn * increased by one -> we can apply the attached changes */ @@ -2496,7 +3200,9 @@ void batadv_tt_update_orig(struct batadv_priv *bat_priv, goto request_table; } - tt_change = (struct batadv_tt_change *)tt_buff; + spin_lock_bh(&orig_node->tt_lock); + + tt_change = (struct batadv_tvlv_tt_change *)tt_buff; batadv_tt_update_changes(bat_priv, orig_node, tt_num_changes, ttvn, tt_change); @@ -2504,7 +3210,9 @@ void batadv_tt_update_orig(struct batadv_priv *bat_priv, * prefer to recompute it to spot any possible inconsistency * in the global table */ - orig_node->tt_crc = batadv_tt_global_crc(bat_priv, orig_node); + batadv_tt_global_update_crc(bat_priv, orig_node); + + spin_unlock_bh(&orig_node->tt_lock); /* The ttvn alone is not enough to guarantee consistency * because a single value could represent different states @@ -2515,37 +3223,46 @@ void batadv_tt_update_orig(struct batadv_priv *bat_priv, * checking the CRC value is mandatory to detect the * inconsistency */ - if (orig_node->tt_crc != tt_crc) + if (!batadv_tt_global_check_crc(orig_node, tt_vlan, + tt_num_vlan)) goto request_table; } else { /* if we missed more than one change or our tables are not * in sync anymore -> request fresh tt data */ if (!orig_node->tt_initialised || ttvn != orig_ttvn || - orig_node->tt_crc != tt_crc) { + !batadv_tt_global_check_crc(orig_node, tt_vlan, + tt_num_vlan)) { request_table: batadv_dbg(BATADV_DBG_TT, bat_priv, - "TT inconsistency for %pM. Need to retrieve the correct information (ttvn: %u last_ttvn: %u crc: %#.4x last_crc: %#.4x num_changes: %u)\n", - orig_node->orig, ttvn, orig_ttvn, tt_crc, - orig_node->tt_crc, tt_num_changes); + "TT inconsistency for %pM. Need to retrieve the correct information (ttvn: %u last_ttvn: %u num_changes: %u)\n", + orig_node->orig, ttvn, orig_ttvn, + tt_num_changes); batadv_send_tt_request(bat_priv, orig_node, ttvn, - tt_crc, full_table); + tt_vlan, tt_num_vlan, + full_table); return; } } } -/* returns true whether we know that the client has moved from its old - * originator to another one. This entry is kept is still kept for consistency - * purposes +/** + * batadv_tt_global_client_is_roaming - check if a client is marked as roaming + * @bat_priv: the bat priv with all the soft interface information + * @addr: the mac address of the client to check + * @vid: VLAN identifier + * + * Returns true if we know that the client has moved from its old originator + * to another one. This entry is still kept for consistency purposes and will be + * deleted later by a DEL or because of timeout */ bool batadv_tt_global_client_is_roaming(struct batadv_priv *bat_priv, - uint8_t *addr) + uint8_t *addr, unsigned short vid) { struct batadv_tt_global_entry *tt_global_entry; bool ret = false; - tt_global_entry = batadv_tt_global_hash_find(bat_priv, addr); + tt_global_entry = batadv_tt_global_hash_find(bat_priv, addr, vid); if (!tt_global_entry) goto out; @@ -2558,19 +3275,20 @@ out: /** * batadv_tt_local_client_is_roaming - tells whether the client is roaming * @bat_priv: the bat priv with all the soft interface information - * @addr: the MAC address of the local client to query + * @addr: the mac address of the local client to query + * @vid: VLAN identifier * * Returns true if the local client is known to be roaming (it is not served by * this node anymore) or not. If yes, the client is still present in the table * to keep the latter consistent with the node TTVN */ bool batadv_tt_local_client_is_roaming(struct batadv_priv *bat_priv, - uint8_t *addr) + uint8_t *addr, unsigned short vid) { struct batadv_tt_local_entry *tt_local_entry; bool ret = false; - tt_local_entry = batadv_tt_local_hash_find(bat_priv, addr); + tt_local_entry = batadv_tt_local_hash_find(bat_priv, addr, vid); if (!tt_local_entry) goto out; @@ -2582,26 +3300,268 @@ out: bool batadv_tt_add_temporary_global_entry(struct batadv_priv *bat_priv, struct batadv_orig_node *orig_node, - const unsigned char *addr) + const unsigned char *addr, + unsigned short vid) { bool ret = false; - /* if the originator is a backbone node (meaning it belongs to the same - * LAN of this node) the temporary client must not be added because to - * reach such destination the node must use the LAN instead of the mesh - */ - if (batadv_bla_is_backbone_gw_orig(bat_priv, orig_node->orig)) - goto out; - - if (!batadv_tt_global_add(bat_priv, orig_node, addr, + if (!batadv_tt_global_add(bat_priv, orig_node, addr, vid, BATADV_TT_CLIENT_TEMP, atomic_read(&orig_node->last_ttvn))) goto out; batadv_dbg(BATADV_DBG_TT, bat_priv, - "Added temporary global client (addr: %pM orig: %pM)\n", - addr, orig_node->orig); + "Added temporary global client (addr: %pM, vid: %d, orig: %pM)\n", + addr, BATADV_PRINT_VID(vid), orig_node->orig); ret = true; out: return ret; } + +/** + * batadv_tt_local_resize_to_mtu - resize the local translation table fit the + * maximum packet size that can be transported through the mesh + * @soft_iface: netdev struct of the mesh interface + * + * Remove entries older than 'timeout' and half timeout if more entries need + * to be removed. + */ +void batadv_tt_local_resize_to_mtu(struct net_device *soft_iface) +{ + struct batadv_priv *bat_priv = netdev_priv(soft_iface); + int packet_size_max = atomic_read(&bat_priv->packet_size_max); + int table_size, timeout = BATADV_TT_LOCAL_TIMEOUT / 2; + bool reduced = false; + + spin_lock_bh(&bat_priv->tt.commit_lock); + + while (true) { + table_size = batadv_tt_local_table_transmit_size(bat_priv); + if (packet_size_max >= table_size) + break; + + batadv_tt_local_purge(bat_priv, timeout); + batadv_tt_local_purge_pending_clients(bat_priv); + + timeout /= 2; + reduced = true; + net_ratelimited_function(batadv_info, soft_iface, + "Forced to purge local tt entries to fit new maximum fragment MTU (%i)\n", + packet_size_max); + } + + /* commit these changes immediately, to avoid synchronization problem + * with the TTVN + */ + if (reduced) + batadv_tt_local_commit_changes_nolock(bat_priv); + + spin_unlock_bh(&bat_priv->tt.commit_lock); +} + +/** + * batadv_tt_tvlv_ogm_handler_v1 - process incoming tt tvlv container + * @bat_priv: the bat priv with all the soft interface information + * @orig: the orig_node of the ogm + * @flags: flags indicating the tvlv state (see batadv_tvlv_handler_flags) + * @tvlv_value: tvlv buffer containing the gateway data + * @tvlv_value_len: tvlv buffer length + */ +static void batadv_tt_tvlv_ogm_handler_v1(struct batadv_priv *bat_priv, + struct batadv_orig_node *orig, + uint8_t flags, void *tvlv_value, + uint16_t tvlv_value_len) +{ + struct batadv_tvlv_tt_vlan_data *tt_vlan; + struct batadv_tvlv_tt_change *tt_change; + struct batadv_tvlv_tt_data *tt_data; + uint16_t num_entries, num_vlan; + + if (tvlv_value_len < sizeof(*tt_data)) + return; + + tt_data = (struct batadv_tvlv_tt_data *)tvlv_value; + tvlv_value_len -= sizeof(*tt_data); + + num_vlan = ntohs(tt_data->num_vlan); + + if (tvlv_value_len < sizeof(*tt_vlan) * num_vlan) + return; + + tt_vlan = (struct batadv_tvlv_tt_vlan_data *)(tt_data + 1); + tt_change = (struct batadv_tvlv_tt_change *)(tt_vlan + num_vlan); + tvlv_value_len -= sizeof(*tt_vlan) * num_vlan; + + num_entries = batadv_tt_entries(tvlv_value_len); + + batadv_tt_update_orig(bat_priv, orig, tt_vlan, num_vlan, tt_change, + num_entries, tt_data->ttvn); +} + +/** + * batadv_tt_tvlv_unicast_handler_v1 - process incoming (unicast) tt tvlv + * container + * @bat_priv: the bat priv with all the soft interface information + * @src: mac address of tt tvlv sender + * @dst: mac address of tt tvlv recipient + * @tvlv_value: tvlv buffer containing the tt data + * @tvlv_value_len: tvlv buffer length + * + * Returns NET_RX_DROP if the tt tvlv is to be re-routed, NET_RX_SUCCESS + * otherwise. + */ +static int batadv_tt_tvlv_unicast_handler_v1(struct batadv_priv *bat_priv, + uint8_t *src, uint8_t *dst, + void *tvlv_value, + uint16_t tvlv_value_len) +{ + struct batadv_tvlv_tt_data *tt_data; + uint16_t tt_vlan_len, tt_num_entries; + char tt_flag; + bool ret; + + if (tvlv_value_len < sizeof(*tt_data)) + return NET_RX_SUCCESS; + + tt_data = (struct batadv_tvlv_tt_data *)tvlv_value; + tvlv_value_len -= sizeof(*tt_data); + + tt_vlan_len = sizeof(struct batadv_tvlv_tt_vlan_data); + tt_vlan_len *= ntohs(tt_data->num_vlan); + + if (tvlv_value_len < tt_vlan_len) + return NET_RX_SUCCESS; + + tvlv_value_len -= tt_vlan_len; + tt_num_entries = batadv_tt_entries(tvlv_value_len); + + switch (tt_data->flags & BATADV_TT_DATA_TYPE_MASK) { + case BATADV_TT_REQUEST: + batadv_inc_counter(bat_priv, BATADV_CNT_TT_REQUEST_RX); + + /* If this node cannot provide a TT response the tt_request is + * forwarded + */ + ret = batadv_send_tt_response(bat_priv, tt_data, src, dst); + if (!ret) { + if (tt_data->flags & BATADV_TT_FULL_TABLE) + tt_flag = 'F'; + else + tt_flag = '.'; + + batadv_dbg(BATADV_DBG_TT, bat_priv, + "Routing TT_REQUEST to %pM [%c]\n", + dst, tt_flag); + /* tvlv API will re-route the packet */ + return NET_RX_DROP; + } + break; + case BATADV_TT_RESPONSE: + batadv_inc_counter(bat_priv, BATADV_CNT_TT_RESPONSE_RX); + + if (batadv_is_my_mac(bat_priv, dst)) { + batadv_handle_tt_response(bat_priv, tt_data, + src, tt_num_entries); + return NET_RX_SUCCESS; + } + + if (tt_data->flags & BATADV_TT_FULL_TABLE) + tt_flag = 'F'; + else + tt_flag = '.'; + + batadv_dbg(BATADV_DBG_TT, bat_priv, + "Routing TT_RESPONSE to %pM [%c]\n", dst, tt_flag); + + /* tvlv API will re-route the packet */ + return NET_RX_DROP; + } + + return NET_RX_SUCCESS; +} + +/** + * batadv_roam_tvlv_unicast_handler_v1 - process incoming tt roam tvlv container + * @bat_priv: the bat priv with all the soft interface information + * @src: mac address of tt tvlv sender + * @dst: mac address of tt tvlv recipient + * @tvlv_value: tvlv buffer containing the tt data + * @tvlv_value_len: tvlv buffer length + * + * Returns NET_RX_DROP if the tt roam tvlv is to be re-routed, NET_RX_SUCCESS + * otherwise. + */ +static int batadv_roam_tvlv_unicast_handler_v1(struct batadv_priv *bat_priv, + uint8_t *src, uint8_t *dst, + void *tvlv_value, + uint16_t tvlv_value_len) +{ + struct batadv_tvlv_roam_adv *roaming_adv; + struct batadv_orig_node *orig_node = NULL; + + /* If this node is not the intended recipient of the + * roaming advertisement the packet is forwarded + * (the tvlv API will re-route the packet). + */ + if (!batadv_is_my_mac(bat_priv, dst)) + return NET_RX_DROP; + + if (tvlv_value_len < sizeof(*roaming_adv)) + goto out; + + orig_node = batadv_orig_hash_find(bat_priv, src); + if (!orig_node) + goto out; + + batadv_inc_counter(bat_priv, BATADV_CNT_TT_ROAM_ADV_RX); + roaming_adv = (struct batadv_tvlv_roam_adv *)tvlv_value; + + batadv_dbg(BATADV_DBG_TT, bat_priv, + "Received ROAMING_ADV from %pM (client %pM)\n", + src, roaming_adv->client); + + batadv_tt_global_add(bat_priv, orig_node, roaming_adv->client, + ntohs(roaming_adv->vid), BATADV_TT_CLIENT_ROAM, + atomic_read(&orig_node->last_ttvn) + 1); + +out: + if (orig_node) + batadv_orig_node_free_ref(orig_node); + return NET_RX_SUCCESS; +} + +/** + * batadv_tt_init - initialise the translation table internals + * @bat_priv: the bat priv with all the soft interface information + * + * Return 0 on success or negative error number in case of failure. + */ +int batadv_tt_init(struct batadv_priv *bat_priv) +{ + int ret; + + /* synchronized flags must be remote */ + BUILD_BUG_ON(!(BATADV_TT_SYNC_MASK & BATADV_TT_REMOTE_MASK)); + + ret = batadv_tt_local_init(bat_priv); + if (ret < 0) + return ret; + + ret = batadv_tt_global_init(bat_priv); + if (ret < 0) + return ret; + + batadv_tvlv_handler_register(bat_priv, batadv_tt_tvlv_ogm_handler_v1, + batadv_tt_tvlv_unicast_handler_v1, + BATADV_TVLV_TT, 1, BATADV_NO_FLAGS); + + batadv_tvlv_handler_register(bat_priv, NULL, + batadv_roam_tvlv_unicast_handler_v1, + BATADV_TVLV_ROAM, 1, BATADV_NO_FLAGS); + + INIT_DELAYED_WORK(&bat_priv->tt.work, batadv_tt_purge); + queue_delayed_work(batadv_event_workqueue, &bat_priv->tt.work, + msecs_to_jiffies(BATADV_TT_WORK_PERIOD)); + + return 1; +} diff --git a/net/batman-adv/translation-table.h b/net/batman-adv/translation-table.h index 659a3bb759ce..026b1ffa6746 100644 --- a/net/batman-adv/translation-table.h +++ b/net/batman-adv/translation-table.h @@ -20,49 +20,35 @@ #ifndef _NET_BATMAN_ADV_TRANSLATION_TABLE_H_ #define _NET_BATMAN_ADV_TRANSLATION_TABLE_H_ -int batadv_tt_len(int changes_num); int batadv_tt_init(struct batadv_priv *bat_priv); -void batadv_tt_local_add(struct net_device *soft_iface, const uint8_t *addr, - int ifindex); +bool batadv_tt_local_add(struct net_device *soft_iface, const uint8_t *addr, + unsigned short vid, int ifindex); uint16_t batadv_tt_local_remove(struct batadv_priv *bat_priv, - const uint8_t *addr, const char *message, - bool roaming); + const uint8_t *addr, unsigned short vid, + const char *message, bool roaming); int batadv_tt_local_seq_print_text(struct seq_file *seq, void *offset); -void batadv_tt_global_add_orig(struct batadv_priv *bat_priv, - struct batadv_orig_node *orig_node, - const unsigned char *tt_buff, int tt_buff_len); -int batadv_tt_global_add(struct batadv_priv *bat_priv, - struct batadv_orig_node *orig_node, - const unsigned char *addr, uint16_t flags, - uint8_t ttvn); int batadv_tt_global_seq_print_text(struct seq_file *seq, void *offset); void batadv_tt_global_del_orig(struct batadv_priv *bat_priv, struct batadv_orig_node *orig_node, - const char *message); + int32_t match_vid, const char *message); struct batadv_orig_node *batadv_transtable_search(struct batadv_priv *bat_priv, const uint8_t *src, - const uint8_t *addr); + const uint8_t *addr, + unsigned short vid); void batadv_tt_free(struct batadv_priv *bat_priv); -bool batadv_send_tt_response(struct batadv_priv *bat_priv, - struct batadv_tt_query_packet *tt_request); -bool batadv_is_my_client(struct batadv_priv *bat_priv, const uint8_t *addr); -void batadv_handle_tt_response(struct batadv_priv *bat_priv, - struct batadv_tt_query_packet *tt_response); +bool batadv_is_my_client(struct batadv_priv *bat_priv, const uint8_t *addr, + unsigned short vid); bool batadv_is_ap_isolated(struct batadv_priv *bat_priv, uint8_t *src, - uint8_t *dst); -void batadv_tt_update_orig(struct batadv_priv *bat_priv, - struct batadv_orig_node *orig_node, - const unsigned char *tt_buff, uint8_t tt_num_changes, - uint8_t ttvn, uint16_t tt_crc); -int batadv_tt_append_diff(struct batadv_priv *bat_priv, - unsigned char **packet_buff, int *packet_buff_len, - int packet_min_len); + uint8_t *dst, unsigned short vid); +void batadv_tt_local_commit_changes(struct batadv_priv *bat_priv); bool batadv_tt_global_client_is_roaming(struct batadv_priv *bat_priv, - uint8_t *addr); + uint8_t *addr, unsigned short vid); bool batadv_tt_local_client_is_roaming(struct batadv_priv *bat_priv, - uint8_t *addr); + uint8_t *addr, unsigned short vid); +void batadv_tt_local_resize_to_mtu(struct net_device *soft_iface); bool batadv_tt_add_temporary_global_entry(struct batadv_priv *bat_priv, struct batadv_orig_node *orig_node, - const unsigned char *addr); + const unsigned char *addr, + unsigned short vid); #endif /* _NET_BATMAN_ADV_TRANSLATION_TABLE_H_ */ diff --git a/net/batman-adv/types.h b/net/batman-adv/types.h index b2c94e139319..91dd369b0ff2 100644 --- a/net/batman-adv/types.h +++ b/net/batman-adv/types.h @@ -24,13 +24,6 @@ #include "bitarray.h" #include <linux/kernel.h> -/** - * Maximum overhead for the encapsulation for a payload packet - */ -#define BATADV_HEADER_LEN \ - (ETH_HLEN + max(sizeof(struct batadv_unicast_packet), \ - sizeof(struct batadv_bcast_packet))) - #ifdef CONFIG_BATMAN_ADV_DAT /* batadv_dat_addr_t is the type used for all DHT addresses. If it is changed, @@ -43,6 +36,18 @@ #endif /* CONFIG_BATMAN_ADV_DAT */ /** + * BATADV_TT_REMOTE_MASK - bitmask selecting the flags that are sent over the + * wire only + */ +#define BATADV_TT_REMOTE_MASK 0x00FF + +/** + * BATADV_TT_SYNC_MASK - bitmask of the flags that need to be kept in sync + * among the nodes. These flags are used to compute the global/local CRC + */ +#define BATADV_TT_SYNC_MASK 0x00F0 + +/** * struct batadv_hard_iface_bat_iv - per hard interface B.A.T.M.A.N. IV data * @ogm_buff: buffer holding the OGM packet * @ogm_buff_len: length of the OGM packet buffer @@ -60,7 +65,6 @@ struct batadv_hard_iface_bat_iv { * @if_num: identificator of the interface * @if_status: status of the interface for batman-adv * @net_dev: pointer to the net_device - * @frag_seqno: last fragment sequence number sent by this interface * @num_bcasts: number of payload re-broadcasts on this interface (ARQ) * @hardif_obj: kobject of the per interface sysfs "mesh" directory * @refcount: number of contexts the object is used @@ -76,7 +80,6 @@ struct batadv_hard_iface { int16_t if_num; char if_status; struct net_device *net_dev; - atomic_t frag_seqno; uint8_t num_bcasts; struct kobject *hardif_obj; atomic_t refcount; @@ -88,28 +91,97 @@ struct batadv_hard_iface { }; /** + * struct batadv_frag_table_entry - head in the fragment buffer table + * @head: head of list with fragments + * @lock: lock to protect the list of fragments + * @timestamp: time (jiffie) of last received fragment + * @seqno: sequence number of the fragments in the list + * @size: accumulated size of packets in list + */ +struct batadv_frag_table_entry { + struct hlist_head head; + spinlock_t lock; /* protects head */ + unsigned long timestamp; + uint16_t seqno; + uint16_t size; +}; + +/** + * struct batadv_frag_list_entry - entry in a list of fragments + * @list: list node information + * @skb: fragment + * @no: fragment number in the set + */ +struct batadv_frag_list_entry { + struct hlist_node list; + struct sk_buff *skb; + uint8_t no; +}; + +/** + * struct batadv_vlan_tt - VLAN specific TT attributes + * @crc: CRC32 checksum of the entries belonging to this vlan + * @num_entries: number of TT entries for this VLAN + */ +struct batadv_vlan_tt { + uint32_t crc; + atomic_t num_entries; +}; + +/** + * batadv_orig_node_vlan - VLAN specific data per orig_node + * @vid: the VLAN identifier + * @tt: VLAN specific TT attributes + * @list: list node for orig_node::vlan_list + * @refcount: number of context where this object is currently in use + * @rcu: struct used for freeing in a RCU-safe manner + */ +struct batadv_orig_node_vlan { + unsigned short vid; + struct batadv_vlan_tt tt; + struct list_head list; + atomic_t refcount; + struct rcu_head rcu; +}; + +/** + * struct batadv_orig_bat_iv - B.A.T.M.A.N. IV private orig_node members + * @bcast_own: bitfield containing the number of our OGMs this orig_node + * rebroadcasted "back" to us (relative to last_real_seqno) + * @bcast_own_sum: counted result of bcast_own + * @ogm_cnt_lock: lock protecting bcast_own, bcast_own_sum, + * neigh_node->bat_iv.real_bits & neigh_node->bat_iv.real_packet_count + */ +struct batadv_orig_bat_iv { + unsigned long *bcast_own; + uint8_t *bcast_own_sum; + /* ogm_cnt_lock protects: bcast_own, bcast_own_sum, + * neigh_node->bat_iv.real_bits & neigh_node->bat_iv.real_packet_count + */ + spinlock_t ogm_cnt_lock; +}; + +/** * struct batadv_orig_node - structure for orig_list maintaining nodes of mesh * @orig: originator ethernet address * @primary_addr: hosts primary interface address * @router: router that should be used to reach this originator * @batadv_dat_addr_t: address of the orig node in the distributed hash - * @bcast_own: bitfield containing the number of our OGMs this orig_node - * rebroadcasted "back" to us (relative to last_real_seqno) - * @bcast_own_sum: counted result of bcast_own * @last_seen: time when last packet from this node was received * @bcast_seqno_reset: time when the broadcast seqno window was reset * @batman_seqno_reset: time when the batman seqno window was reset - * @gw_flags: flags related to gateway class - * @flags: for now only VIS_SERVER flag + * @capabilities: announced capabilities of this originator * @last_ttvn: last seen translation table version number - * @tt_crc: CRC of the translation table * @tt_buff: last tt changeset this node received from the orig node * @tt_buff_len: length of the last tt changeset this node received from the * orig node * @tt_buff_lock: lock that protects tt_buff and tt_buff_len - * @tt_size: number of global TT entries announced by the orig node * @tt_initialised: bool keeping track of whether or not this node have received * any translation table information from the orig node yet + * @tt_lock: prevents from updating the table while reading it. Table update is + * made up by two operations (data structure update and metdata -CRC/TTVN- + * recalculation) and they have to be executed atomically in order to avoid + * another thread to read the table/metadata between those. * @last_real_seqno: last and best known sequence number * @last_ttl: ttl of last received packet * @bcast_bits: bitfield containing the info which payload broadcast originated @@ -117,14 +189,9 @@ struct batadv_hard_iface { * last_bcast_seqno) * @last_bcast_seqno: last broadcast sequence number received by this host * @neigh_list: list of potential next hop neighbor towards this orig node - * @frag_list: fragmentation buffer list for fragment re-assembly - * @last_frag_packet: time when last fragmented packet from this node was - * received * @neigh_list_lock: lock protecting neigh_list, router and bonding_list * @hash_entry: hlist node for batadv_priv::orig_hash * @bat_priv: pointer to soft_iface this orig node belongs to - * @ogm_cnt_lock: lock protecting bcast_own, bcast_own_sum, - * neigh_node->real_bits & neigh_node->real_packet_count * @bcast_seqno_lock: lock protecting bcast_bits & last_bcast_seqno * @bond_candidates: how many candidates are available * @bond_list: list of bonding candidates @@ -134,6 +201,11 @@ struct batadv_hard_iface { * @out_coding_list: list of nodes that can hear this orig * @in_coding_list_lock: protects in_coding_list * @out_coding_list_lock: protects out_coding_list + * @fragments: array with heads for fragment chains + * @vlan_list: a list of orig_node_vlan structs, one per VLAN served by the + * originator represented by this object + * @vlan_list_lock: lock protecting vlan_list + * @bat_iv: B.A.T.M.A.N. IV private structure */ struct batadv_orig_node { uint8_t orig[ETH_ALEN]; @@ -142,35 +214,26 @@ struct batadv_orig_node { #ifdef CONFIG_BATMAN_ADV_DAT batadv_dat_addr_t dat_addr; #endif - unsigned long *bcast_own; - uint8_t *bcast_own_sum; unsigned long last_seen; unsigned long bcast_seqno_reset; unsigned long batman_seqno_reset; - uint8_t gw_flags; - uint8_t flags; + uint8_t capabilities; atomic_t last_ttvn; - uint16_t tt_crc; unsigned char *tt_buff; int16_t tt_buff_len; spinlock_t tt_buff_lock; /* protects tt_buff & tt_buff_len */ - atomic_t tt_size; bool tt_initialised; + /* prevents from changing the table while reading it */ + spinlock_t tt_lock; uint32_t last_real_seqno; uint8_t last_ttl; DECLARE_BITMAP(bcast_bits, BATADV_TQ_LOCAL_WINDOW_SIZE); uint32_t last_bcast_seqno; struct hlist_head neigh_list; - struct list_head frag_list; - unsigned long last_frag_packet; /* neigh_list_lock protects: neigh_list, router & bonding_list */ spinlock_t neigh_list_lock; struct hlist_node hash_entry; struct batadv_priv *bat_priv; - /* ogm_cnt_lock protects: bcast_own, bcast_own_sum, - * neigh_node->real_bits & neigh_node->real_packet_count - */ - spinlock_t ogm_cnt_lock; /* bcast_seqno_lock protects: bcast_bits & last_bcast_seqno */ spinlock_t bcast_seqno_lock; atomic_t bond_candidates; @@ -183,12 +246,28 @@ struct batadv_orig_node { spinlock_t in_coding_list_lock; /* Protects in_coding_list */ spinlock_t out_coding_list_lock; /* Protects out_coding_list */ #endif + struct batadv_frag_table_entry fragments[BATADV_FRAG_BUFFER_COUNT]; + struct list_head vlan_list; + spinlock_t vlan_list_lock; /* protects vlan_list */ + struct batadv_orig_bat_iv bat_iv; +}; + +/** + * enum batadv_orig_capabilities - orig node capabilities + * @BATADV_ORIG_CAPA_HAS_DAT: orig node has distributed arp table enabled + * @BATADV_ORIG_CAPA_HAS_NC: orig node has network coding enabled + */ +enum batadv_orig_capabilities { + BATADV_ORIG_CAPA_HAS_DAT = BIT(0), + BATADV_ORIG_CAPA_HAS_NC = BIT(1), }; /** * struct batadv_gw_node - structure for orig nodes announcing gw capabilities * @list: list node for batadv_priv_gw::list * @orig_node: pointer to corresponding orig node + * @bandwidth_down: advertised uplink download bandwidth + * @bandwidth_up: advertised uplink upload bandwidth * @deleted: this struct is scheduled for deletion * @refcount: number of contexts the object is used * @rcu: struct used for freeing in an RCU-safe manner @@ -196,46 +275,57 @@ struct batadv_orig_node { struct batadv_gw_node { struct hlist_node list; struct batadv_orig_node *orig_node; + uint32_t bandwidth_down; + uint32_t bandwidth_up; unsigned long deleted; atomic_t refcount; struct rcu_head rcu; }; /** - * struct batadv_neigh_node - structure for single hop neighbors - * @list: list node for batadv_orig_node::neigh_list - * @addr: mac address of neigh node + * struct batadv_neigh_bat_iv - B.A.T.M.A.N. IV specific structure for single + * hop neighbors * @tq_recv: ring buffer of received TQ values from this neigh node * @tq_index: ring buffer index * @tq_avg: averaged tq of all tq values in the ring buffer (tq_recv) - * @last_ttl: last received ttl from this neigh node - * @bonding_list: list node for batadv_orig_node::bond_list - * @last_seen: when last packet via this neighbor was received * @real_bits: bitfield containing the number of OGMs received from this neigh * node (relative to orig_node->last_real_seqno) * @real_packet_count: counted result of real_bits + * @lq_update_lock: lock protecting tq_recv & tq_index + */ +struct batadv_neigh_bat_iv { + uint8_t tq_recv[BATADV_TQ_GLOBAL_WINDOW_SIZE]; + uint8_t tq_index; + uint8_t tq_avg; + DECLARE_BITMAP(real_bits, BATADV_TQ_LOCAL_WINDOW_SIZE); + uint8_t real_packet_count; + spinlock_t lq_update_lock; /* protects tq_recv & tq_index */ +}; + +/** + * struct batadv_neigh_node - structure for single hops neighbors + * @list: list node for batadv_orig_node::neigh_list * @orig_node: pointer to corresponding orig_node + * @addr: the MAC address of the neighboring interface * @if_incoming: pointer to incoming hard interface - * @lq_update_lock: lock protecting tq_recv & tq_index + * @last_seen: when last packet via this neighbor was received + * @last_ttl: last received ttl from this neigh node + * @bonding_list: list node for batadv_orig_node::bond_list * @refcount: number of contexts the object is used * @rcu: struct used for freeing in an RCU-safe manner + * @bat_iv: B.A.T.M.A.N. IV private structure */ struct batadv_neigh_node { struct hlist_node list; + struct batadv_orig_node *orig_node; uint8_t addr[ETH_ALEN]; - uint8_t tq_recv[BATADV_TQ_GLOBAL_WINDOW_SIZE]; - uint8_t tq_index; - uint8_t tq_avg; + struct batadv_hard_iface *if_incoming; + unsigned long last_seen; uint8_t last_ttl; struct list_head bonding_list; - unsigned long last_seen; - DECLARE_BITMAP(real_bits, BATADV_TQ_LOCAL_WINDOW_SIZE); - uint8_t real_packet_count; - struct batadv_orig_node *orig_node; - struct batadv_hard_iface *if_incoming; - spinlock_t lq_update_lock; /* protects tq_recv & tq_index */ atomic_t refcount; struct rcu_head rcu; + struct batadv_neigh_bat_iv bat_iv; }; /** @@ -265,6 +355,12 @@ struct batadv_bcast_duplist_entry { * @BATADV_CNT_MGMT_TX_BYTES: transmitted routing protocol traffic bytes counter * @BATADV_CNT_MGMT_RX: received routing protocol traffic packet counter * @BATADV_CNT_MGMT_RX_BYTES: received routing protocol traffic bytes counter + * @BATADV_CNT_FRAG_TX: transmitted fragment traffic packet counter + * @BATADV_CNT_FRAG_TX_BYTES: transmitted fragment traffic bytes counter + * @BATADV_CNT_FRAG_RX: received fragment traffic packet counter + * @BATADV_CNT_FRAG_RX_BYTES: received fragment traffic bytes counter + * @BATADV_CNT_FRAG_FWD: forwarded fragment traffic packet counter + * @BATADV_CNT_FRAG_FWD_BYTES: forwarded fragment traffic bytes counter * @BATADV_CNT_TT_REQUEST_TX: transmitted tt req traffic packet counter * @BATADV_CNT_TT_REQUEST_RX: received tt req traffic packet counter * @BATADV_CNT_TT_RESPONSE_TX: transmitted tt resp traffic packet counter @@ -302,6 +398,12 @@ enum batadv_counters { BATADV_CNT_MGMT_TX_BYTES, BATADV_CNT_MGMT_RX, BATADV_CNT_MGMT_RX_BYTES, + BATADV_CNT_FRAG_TX, + BATADV_CNT_FRAG_TX_BYTES, + BATADV_CNT_FRAG_RX, + BATADV_CNT_FRAG_RX_BYTES, + BATADV_CNT_FRAG_FWD, + BATADV_CNT_FRAG_FWD_BYTES, BATADV_CNT_TT_REQUEST_TX, BATADV_CNT_TT_REQUEST_RX, BATADV_CNT_TT_RESPONSE_TX, @@ -343,11 +445,14 @@ enum batadv_counters { * @changes_list_lock: lock protecting changes_list * @req_list_lock: lock protecting req_list * @roam_list_lock: lock protecting roam_list - * @local_entry_num: number of entries in the local hash table - * @local_crc: Checksum of the local table, recomputed before sending a new OGM * @last_changeset: last tt changeset this host has generated * @last_changeset_len: length of last tt changeset this host has generated * @last_changeset_lock: lock protecting last_changeset & last_changeset_len + * @commit_lock: prevents from executing a local TT commit while reading the + * local table. The local TT commit is made up by two operations (data + * structure update and metdata -CRC/TTVN- recalculation) and they have to be + * executed atomically in order to avoid another thread to read the + * table/metadata between those. * @work: work queue callback item for translation table purging */ struct batadv_priv_tt { @@ -362,12 +467,12 @@ struct batadv_priv_tt { spinlock_t changes_list_lock; /* protects changes */ spinlock_t req_list_lock; /* protects req_list */ spinlock_t roam_list_lock; /* protects roam_list */ - atomic_t local_entry_num; - uint16_t local_crc; unsigned char *last_changeset; int16_t last_changeset_len; /* protects last_changeset & last_changeset_len */ spinlock_t last_changeset_lock; + /* prevents from executing a commit while reading the table */ + spinlock_t commit_lock; struct delayed_work work; }; @@ -420,31 +525,31 @@ struct batadv_priv_debug_log { * @list: list of available gateway nodes * @list_lock: lock protecting gw_list & curr_gw * @curr_gw: pointer to currently selected gateway node + * @bandwidth_down: advertised uplink download bandwidth (if gw_mode server) + * @bandwidth_up: advertised uplink upload bandwidth (if gw_mode server) * @reselect: bool indicating a gateway re-selection is in progress */ struct batadv_priv_gw { struct hlist_head list; spinlock_t list_lock; /* protects gw_list & curr_gw */ struct batadv_gw_node __rcu *curr_gw; /* rcu protected pointer */ + atomic_t bandwidth_down; + atomic_t bandwidth_up; atomic_t reselect; }; /** - * struct batadv_priv_vis - per mesh interface vis data - * @send_list: list of batadv_vis_info packets to sent - * @hash: hash table containing vis data from other nodes in the network - * @hash_lock: lock protecting the hash table - * @list_lock: lock protecting my_info::recv_list - * @work: work queue callback item for vis packet sending - * @my_info: holds this node's vis data sent on a regular basis + * struct batadv_priv_tvlv - per mesh interface tvlv data + * @container_list: list of registered tvlv containers to be sent with each OGM + * @handler_list: list of the various tvlv content handlers + * @container_list_lock: protects tvlv container list access + * @handler_list_lock: protects handler list access */ -struct batadv_priv_vis { - struct list_head send_list; - struct batadv_hashtable *hash; - spinlock_t hash_lock; /* protects hash */ - spinlock_t list_lock; /* protects my_info::recv_list */ - struct delayed_work work; - struct batadv_vis_info *my_info; +struct batadv_priv_tvlv { + struct hlist_head container_list; + struct hlist_head handler_list; + spinlock_t container_list_lock; /* protects container_list */ + spinlock_t handler_list_lock; /* protects handler_list */ }; /** @@ -491,6 +596,26 @@ struct batadv_priv_nc { }; /** + * struct batadv_softif_vlan - per VLAN attributes set + * @vid: VLAN identifier + * @kobj: kobject for sysfs vlan subdirectory + * @ap_isolation: AP isolation state + * @tt: TT private attributes (VLAN specific) + * @list: list node for bat_priv::softif_vlan_list + * @refcount: number of context where this object is currently in use + * @rcu: struct used for freeing in a RCU-safe manner + */ +struct batadv_softif_vlan { + unsigned short vid; + struct kobject *kobj; + atomic_t ap_isolation; /* boolean */ + struct batadv_vlan_tt tt; + struct hlist_node list; + atomic_t refcount; + struct rcu_head rcu; +}; + +/** * struct batadv_priv - per mesh interface data * @mesh_state: current status of the mesh (inactive/active/deactivating) * @soft_iface: net device which holds this struct as private data @@ -499,15 +624,15 @@ struct batadv_priv_nc { * @aggregated_ogms: bool indicating whether OGM aggregation is enabled * @bonding: bool indicating whether traffic bonding is enabled * @fragmentation: bool indicating whether traffic fragmentation is enabled - * @ap_isolation: bool indicating whether ap isolation is enabled + * @packet_size_max: max packet size that can be transmitted via + * multiple fragmented skbs or a single frame if fragmentation is disabled + * @frag_seqno: incremental counter to identify chains of egress fragments * @bridge_loop_avoidance: bool indicating whether bridge loop avoidance is * enabled * @distributed_arp_table: bool indicating whether distributed ARP table is * enabled - * @vis_mode: vis operation: client or server (see batadv_vis_packettype) * @gw_mode: gateway operation: off, client or server (see batadv_gw_modes) * @gw_sel_class: gateway selection class (applies if gw_mode client) - * @gw_bandwidth: gateway announced bandwidth (applies if gw_mode server) * @orig_interval: OGM broadcast interval in milliseconds * @hop_penalty: penalty which will be applied to an OGM's tq-field on every hop * @log_level: configured log level (see batadv_dbg_level) @@ -527,11 +652,14 @@ struct batadv_priv_nc { * @primary_if: one of the hard interfaces assigned to this mesh interface * becomes the primary interface * @bat_algo_ops: routing algorithm used by this mesh interface + * @softif_vlan_list: a list of softif_vlan structs, one per VLAN created on top + * of the mesh interface represented by this object + * @softif_vlan_list_lock: lock protecting softif_vlan_list * @bla: bridge loope avoidance data * @debug_log: holding debug logging relevant data * @gw: gateway data * @tt: translation table data - * @vis: vis data + * @tvlv: type-version-length-value data * @dat: distributed arp table data * @network_coding: bool indicating whether network coding is enabled * @batadv_priv_nc: network coding data @@ -544,17 +672,16 @@ struct batadv_priv { atomic_t aggregated_ogms; atomic_t bonding; atomic_t fragmentation; - atomic_t ap_isolation; + atomic_t packet_size_max; + atomic_t frag_seqno; #ifdef CONFIG_BATMAN_ADV_BLA atomic_t bridge_loop_avoidance; #endif #ifdef CONFIG_BATMAN_ADV_DAT atomic_t distributed_arp_table; #endif - atomic_t vis_mode; atomic_t gw_mode; atomic_t gw_sel_class; - atomic_t gw_bandwidth; atomic_t orig_interval; atomic_t hop_penalty; #ifdef CONFIG_BATMAN_ADV_DEBUG @@ -575,6 +702,8 @@ struct batadv_priv { struct work_struct cleanup_work; struct batadv_hard_iface __rcu *primary_if; /* rcu protected pointer */ struct batadv_algo_ops *bat_algo_ops; + struct hlist_head softif_vlan_list; + spinlock_t softif_vlan_list_lock; /* protects softif_vlan_list */ #ifdef CONFIG_BATMAN_ADV_BLA struct batadv_priv_bla bla; #endif @@ -583,7 +712,7 @@ struct batadv_priv { #endif struct batadv_priv_gw gw; struct batadv_priv_tt tt; - struct batadv_priv_vis vis; + struct batadv_priv_tvlv tvlv; #ifdef CONFIG_BATMAN_ADV_DAT struct batadv_priv_dat dat; #endif @@ -620,7 +749,7 @@ struct batadv_socket_client { struct batadv_socket_packet { struct list_head list; size_t icmp_len; - struct batadv_icmp_packet_rr icmp_packet; + uint8_t icmp_packet[BATADV_ICMP_MAX_PACKET_SIZE]; }; /** @@ -677,6 +806,7 @@ struct batadv_bla_claim { /** * struct batadv_tt_common_entry - tt local & tt global common data * @addr: mac address of non-mesh client + * @vid: VLAN identifier * @hash_entry: hlist node for batadv_priv_tt::local_hash or for * batadv_priv_tt::global_hash * @flags: various state handling flags (see batadv_tt_client_flags) @@ -686,6 +816,7 @@ struct batadv_bla_claim { */ struct batadv_tt_common_entry { uint8_t addr[ETH_ALEN]; + unsigned short vid; struct hlist_node hash_entry; uint16_t flags; unsigned long added_at; @@ -740,7 +871,7 @@ struct batadv_tt_orig_list_entry { */ struct batadv_tt_change_node { struct list_head list; - struct batadv_tt_change change; + struct batadv_tvlv_tt_change change; }; /** @@ -866,78 +997,6 @@ struct batadv_forw_packet { }; /** - * struct batadv_frag_packet_list_entry - storage for fragment packet - * @list: list node for orig_node::frag_list - * @seqno: sequence number of the fragment - * @skb: fragment's skb buffer - */ -struct batadv_frag_packet_list_entry { - struct list_head list; - uint16_t seqno; - struct sk_buff *skb; -}; - -/** - * struct batadv_vis_info - local data for vis information - * @first_seen: timestamp used for purging stale vis info entries - * @recv_list: List of server-neighbors we have received this packet from. This - * packet should not be re-forward to them again. List elements are struct - * batadv_vis_recvlist_node - * @send_list: list of packets to be forwarded - * @refcount: number of contexts the object is used - * @hash_entry: hlist node for batadv_priv_vis::hash - * @bat_priv: pointer to soft_iface this orig node belongs to - * @skb_packet: contains the vis packet - */ -struct batadv_vis_info { - unsigned long first_seen; - struct list_head recv_list; - struct list_head send_list; - struct kref refcount; - struct hlist_node hash_entry; - struct batadv_priv *bat_priv; - struct sk_buff *skb_packet; -} __packed; - -/** - * struct batadv_vis_info_entry - contains link information for vis - * @src: source MAC of the link, all zero for local TT entry - * @dst: destination MAC of the link, client mac address for local TT entry - * @quality: transmission quality of the link, or 0 for local TT entry - */ -struct batadv_vis_info_entry { - uint8_t src[ETH_ALEN]; - uint8_t dest[ETH_ALEN]; - uint8_t quality; -} __packed; - -/** - * struct batadv_vis_recvlist_node - list entry for batadv_vis_info::recv_list - * @list: list node for batadv_vis_info::recv_list - * @mac: MAC address of the originator from where the vis_info was received - */ -struct batadv_vis_recvlist_node { - struct list_head list; - uint8_t mac[ETH_ALEN]; -}; - -/** - * struct batadv_vis_if_list_entry - auxiliary data for vis data generation - * @addr: MAC address of the interface - * @primary: true if this interface is the primary interface - * @list: list node the interface list - * - * While scanning for vis-entries of a particular vis-originator - * this list collects its interfaces to create a subgraph/cluster - * out of them later - */ -struct batadv_vis_if_list_entry { - uint8_t addr[ETH_ALEN]; - bool primary; - struct hlist_node list; -}; - -/** * struct batadv_algo_ops - mesh algorithm callbacks * @list: list node for the batadv_algo_list * @name: name of the algorithm @@ -948,6 +1007,16 @@ struct batadv_vis_if_list_entry { * @bat_primary_iface_set: called when primary interface is selected / changed * @bat_ogm_schedule: prepare a new outgoing OGM for the send queue * @bat_ogm_emit: send scheduled OGM + * @bat_neigh_cmp: compare the metrics of two neighbors + * @bat_neigh_is_equiv_or_better: check if neigh1 is equally good or + * better than neigh2 from the metric prospective + * @bat_orig_print: print the originator table (optional) + * @bat_orig_free: free the resources allocated by the routing algorithm for an + * orig_node object + * @bat_orig_add_if: ask the routing algorithm to apply the needed changes to + * the orig_node due to a new hard-interface being added into the mesh + * @bat_orig_del_if: ask the routing algorithm to apply the needed changes to + * the orig_node due to an hard-interface being removed from the mesh */ struct batadv_algo_ops { struct hlist_node list; @@ -958,6 +1027,17 @@ struct batadv_algo_ops { void (*bat_primary_iface_set)(struct batadv_hard_iface *hard_iface); void (*bat_ogm_schedule)(struct batadv_hard_iface *hard_iface); void (*bat_ogm_emit)(struct batadv_forw_packet *forw_packet); + int (*bat_neigh_cmp)(struct batadv_neigh_node *neigh1, + struct batadv_neigh_node *neigh2); + bool (*bat_neigh_is_equiv_or_better)(struct batadv_neigh_node *neigh1, + struct batadv_neigh_node *neigh2); + /* orig_node handling API */ + void (*bat_orig_print)(struct batadv_priv *priv, struct seq_file *seq); + void (*bat_orig_free)(struct batadv_orig_node *orig_node); + int (*bat_orig_add_if)(struct batadv_orig_node *orig_node, + int max_if_num); + int (*bat_orig_del_if)(struct batadv_orig_node *orig_node, + int max_if_num, int del_if_num); }; /** @@ -965,6 +1045,7 @@ struct batadv_algo_ops { * is used to stored ARP entries needed for the global DAT cache * @ip: the IPv4 corresponding to this DAT/ARP entry * @mac_addr: the MAC address associated to the stored IPv4 + * @vid: the vlan ID associated to this entry * @last_update: time in jiffies when this entry was refreshed last time * @hash_entry: hlist node for batadv_priv_dat::hash * @refcount: number of contexts the object is used @@ -973,6 +1054,7 @@ struct batadv_algo_ops { struct batadv_dat_entry { __be32 ip; uint8_t mac_addr[ETH_ALEN]; + unsigned short vid; unsigned long last_update; struct hlist_node hash_entry; atomic_t refcount; @@ -992,4 +1074,60 @@ struct batadv_dat_candidate { struct batadv_orig_node *orig_node; }; +/** + * struct batadv_tvlv_container - container for tvlv appended to OGMs + * @list: hlist node for batadv_priv_tvlv::container_list + * @tvlv_hdr: tvlv header information needed to construct the tvlv + * @value_len: length of the buffer following this struct which contains + * the actual tvlv payload + * @refcount: number of contexts the object is used + */ +struct batadv_tvlv_container { + struct hlist_node list; + struct batadv_tvlv_hdr tvlv_hdr; + atomic_t refcount; +}; + +/** + * struct batadv_tvlv_handler - handler for specific tvlv type and version + * @list: hlist node for batadv_priv_tvlv::handler_list + * @ogm_handler: handler callback which is given the tvlv payload to process on + * incoming OGM packets + * @unicast_handler: handler callback which is given the tvlv payload to process + * on incoming unicast tvlv packets + * @type: tvlv type this handler feels responsible for + * @version: tvlv version this handler feels responsible for + * @flags: tvlv handler flags + * @refcount: number of contexts the object is used + * @rcu: struct used for freeing in an RCU-safe manner + */ +struct batadv_tvlv_handler { + struct hlist_node list; + void (*ogm_handler)(struct batadv_priv *bat_priv, + struct batadv_orig_node *orig, + uint8_t flags, + void *tvlv_value, uint16_t tvlv_value_len); + int (*unicast_handler)(struct batadv_priv *bat_priv, + uint8_t *src, uint8_t *dst, + void *tvlv_value, uint16_t tvlv_value_len); + uint8_t type; + uint8_t version; + uint8_t flags; + atomic_t refcount; + struct rcu_head rcu; +}; + +/** + * enum batadv_tvlv_handler_flags - tvlv handler flags definitions + * @BATADV_TVLV_HANDLER_OGM_CIFNOTFND: tvlv ogm processing function will call + * this handler even if its type was not found (with no data) + * @BATADV_TVLV_HANDLER_OGM_CALLED: interval tvlv handling flag - the API marks + * a handler as being called, so it won't be called if the + * BATADV_TVLV_HANDLER_OGM_CIFNOTFND flag was set + */ +enum batadv_tvlv_handler_flags { + BATADV_TVLV_HANDLER_OGM_CIFNOTFND = BIT(1), + BATADV_TVLV_HANDLER_OGM_CALLED = BIT(2), +}; + #endif /* _NET_BATMAN_ADV_TYPES_H_ */ diff --git a/net/batman-adv/unicast.c b/net/batman-adv/unicast.c deleted file mode 100644 index 48b31d33ce6b..000000000000 --- a/net/batman-adv/unicast.c +++ /dev/null @@ -1,491 +0,0 @@ -/* Copyright (C) 2010-2013 B.A.T.M.A.N. contributors: - * - * Andreas Langer - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of version 2 of the GNU General Public - * License as published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, but - * WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - * General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA - * 02110-1301, USA - */ - -#include "main.h" -#include "unicast.h" -#include "send.h" -#include "soft-interface.h" -#include "gateway_client.h" -#include "originator.h" -#include "hash.h" -#include "translation-table.h" -#include "routing.h" -#include "hard-interface.h" - - -static struct sk_buff * -batadv_frag_merge_packet(struct list_head *head, - struct batadv_frag_packet_list_entry *tfp, - struct sk_buff *skb) -{ - struct batadv_unicast_frag_packet *up; - struct sk_buff *tmp_skb; - struct batadv_unicast_packet *unicast_packet; - int hdr_len = sizeof(*unicast_packet); - int uni_diff = sizeof(*up) - hdr_len; - uint8_t *packet_pos; - - up = (struct batadv_unicast_frag_packet *)skb->data; - /* set skb to the first part and tmp_skb to the second part */ - if (up->flags & BATADV_UNI_FRAG_HEAD) { - tmp_skb = tfp->skb; - } else { - tmp_skb = skb; - skb = tfp->skb; - } - - if (skb_linearize(skb) < 0 || skb_linearize(tmp_skb) < 0) - goto err; - - skb_pull(tmp_skb, sizeof(*up)); - if (pskb_expand_head(skb, 0, tmp_skb->len, GFP_ATOMIC) < 0) - goto err; - - /* move free entry to end */ - tfp->skb = NULL; - tfp->seqno = 0; - list_move_tail(&tfp->list, head); - - memcpy(skb_put(skb, tmp_skb->len), tmp_skb->data, tmp_skb->len); - kfree_skb(tmp_skb); - - memmove(skb->data + uni_diff, skb->data, hdr_len); - packet_pos = skb_pull(skb, uni_diff); - unicast_packet = (struct batadv_unicast_packet *)packet_pos; - unicast_packet->header.packet_type = BATADV_UNICAST; - - return skb; - -err: - /* free buffered skb, skb will be freed later */ - kfree_skb(tfp->skb); - return NULL; -} - -static void batadv_frag_create_entry(struct list_head *head, - struct sk_buff *skb) -{ - struct batadv_frag_packet_list_entry *tfp; - struct batadv_unicast_frag_packet *up; - - up = (struct batadv_unicast_frag_packet *)skb->data; - - /* free and oldest packets stand at the end */ - tfp = list_entry((head)->prev, typeof(*tfp), list); - kfree_skb(tfp->skb); - - tfp->seqno = ntohs(up->seqno); - tfp->skb = skb; - list_move(&tfp->list, head); - return; -} - -static int batadv_frag_create_buffer(struct list_head *head) -{ - int i; - struct batadv_frag_packet_list_entry *tfp; - - for (i = 0; i < BATADV_FRAG_BUFFER_SIZE; i++) { - tfp = kmalloc(sizeof(*tfp), GFP_ATOMIC); - if (!tfp) { - batadv_frag_list_free(head); - return -ENOMEM; - } - tfp->skb = NULL; - tfp->seqno = 0; - INIT_LIST_HEAD(&tfp->list); - list_add(&tfp->list, head); - } - - return 0; -} - -static struct batadv_frag_packet_list_entry * -batadv_frag_search_packet(struct list_head *head, - const struct batadv_unicast_frag_packet *up) -{ - struct batadv_frag_packet_list_entry *tfp; - struct batadv_unicast_frag_packet *tmp_up = NULL; - bool is_head_tmp, is_head; - uint16_t search_seqno; - - if (up->flags & BATADV_UNI_FRAG_HEAD) - search_seqno = ntohs(up->seqno)+1; - else - search_seqno = ntohs(up->seqno)-1; - - is_head = up->flags & BATADV_UNI_FRAG_HEAD; - - list_for_each_entry(tfp, head, list) { - if (!tfp->skb) - continue; - - if (tfp->seqno == ntohs(up->seqno)) - goto mov_tail; - - tmp_up = (struct batadv_unicast_frag_packet *)tfp->skb->data; - - if (tfp->seqno == search_seqno) { - is_head_tmp = tmp_up->flags & BATADV_UNI_FRAG_HEAD; - if (is_head_tmp != is_head) - return tfp; - else - goto mov_tail; - } - } - return NULL; - -mov_tail: - list_move_tail(&tfp->list, head); - return NULL; -} - -void batadv_frag_list_free(struct list_head *head) -{ - struct batadv_frag_packet_list_entry *pf, *tmp_pf; - - if (!list_empty(head)) { - list_for_each_entry_safe(pf, tmp_pf, head, list) { - kfree_skb(pf->skb); - list_del(&pf->list); - kfree(pf); - } - } - return; -} - -/* frag_reassemble_skb(): - * returns NET_RX_DROP if the operation failed - skb is left intact - * returns NET_RX_SUCCESS if the fragment was buffered (skb_new will be NULL) - * or the skb could be reassembled (skb_new will point to the new packet and - * skb was freed) - */ -int batadv_frag_reassemble_skb(struct sk_buff *skb, - struct batadv_priv *bat_priv, - struct sk_buff **new_skb) -{ - struct batadv_orig_node *orig_node; - struct batadv_frag_packet_list_entry *tmp_frag_entry; - int ret = NET_RX_DROP; - struct batadv_unicast_frag_packet *unicast_packet; - - unicast_packet = (struct batadv_unicast_frag_packet *)skb->data; - *new_skb = NULL; - - orig_node = batadv_orig_hash_find(bat_priv, unicast_packet->orig); - if (!orig_node) - goto out; - - orig_node->last_frag_packet = jiffies; - - if (list_empty(&orig_node->frag_list) && - batadv_frag_create_buffer(&orig_node->frag_list)) { - pr_debug("couldn't create frag buffer\n"); - goto out; - } - - tmp_frag_entry = batadv_frag_search_packet(&orig_node->frag_list, - unicast_packet); - - if (!tmp_frag_entry) { - batadv_frag_create_entry(&orig_node->frag_list, skb); - ret = NET_RX_SUCCESS; - goto out; - } - - *new_skb = batadv_frag_merge_packet(&orig_node->frag_list, - tmp_frag_entry, skb); - /* if not, merge failed */ - if (*new_skb) - ret = NET_RX_SUCCESS; - -out: - if (orig_node) - batadv_orig_node_free_ref(orig_node); - return ret; -} - -int batadv_frag_send_skb(struct sk_buff *skb, struct batadv_priv *bat_priv, - struct batadv_hard_iface *hard_iface, - const uint8_t dstaddr[]) -{ - struct batadv_unicast_packet tmp_uc, *unicast_packet; - struct batadv_hard_iface *primary_if; - struct sk_buff *frag_skb; - struct batadv_unicast_frag_packet *frag1, *frag2; - int uc_hdr_len = sizeof(*unicast_packet); - int ucf_hdr_len = sizeof(*frag1); - int data_len = skb->len - uc_hdr_len; - int large_tail = 0, ret = NET_RX_DROP; - uint16_t seqno; - - primary_if = batadv_primary_if_get_selected(bat_priv); - if (!primary_if) - goto dropped; - - frag_skb = dev_alloc_skb(data_len - (data_len / 2) + ucf_hdr_len); - if (!frag_skb) - goto dropped; - - skb->priority = TC_PRIO_CONTROL; - skb_reserve(frag_skb, ucf_hdr_len); - - unicast_packet = (struct batadv_unicast_packet *)skb->data; - memcpy(&tmp_uc, unicast_packet, uc_hdr_len); - skb_split(skb, frag_skb, data_len / 2 + uc_hdr_len); - - if (batadv_skb_head_push(skb, ucf_hdr_len - uc_hdr_len) < 0 || - batadv_skb_head_push(frag_skb, ucf_hdr_len) < 0) - goto drop_frag; - - frag1 = (struct batadv_unicast_frag_packet *)skb->data; - frag2 = (struct batadv_unicast_frag_packet *)frag_skb->data; - - memcpy(frag1, &tmp_uc, sizeof(tmp_uc)); - - frag1->header.ttl--; - frag1->header.version = BATADV_COMPAT_VERSION; - frag1->header.packet_type = BATADV_UNICAST_FRAG; - - memcpy(frag1->orig, primary_if->net_dev->dev_addr, ETH_ALEN); - memcpy(frag2, frag1, sizeof(*frag2)); - - if (data_len & 1) - large_tail = BATADV_UNI_FRAG_LARGETAIL; - - frag1->flags = BATADV_UNI_FRAG_HEAD | large_tail; - frag2->flags = large_tail; - - seqno = atomic_add_return(2, &hard_iface->frag_seqno); - frag1->seqno = htons(seqno - 1); - frag2->seqno = htons(seqno); - - batadv_send_skb_packet(skb, hard_iface, dstaddr); - batadv_send_skb_packet(frag_skb, hard_iface, dstaddr); - ret = NET_RX_SUCCESS; - goto out; - -drop_frag: - kfree_skb(frag_skb); -dropped: - kfree_skb(skb); -out: - if (primary_if) - batadv_hardif_free_ref(primary_if); - return ret; -} - -/** - * batadv_unicast_push_and_fill_skb - extends the buffer and initializes the - * common fields for unicast packets - * @skb: packet - * @hdr_size: amount of bytes to push at the beginning of the skb - * @orig_node: the destination node - * - * Returns false if the buffer extension was not possible or true otherwise - */ -static bool batadv_unicast_push_and_fill_skb(struct sk_buff *skb, int hdr_size, - struct batadv_orig_node *orig_node) -{ - struct batadv_unicast_packet *unicast_packet; - uint8_t ttvn = (uint8_t)atomic_read(&orig_node->last_ttvn); - - if (batadv_skb_head_push(skb, hdr_size) < 0) - return false; - - unicast_packet = (struct batadv_unicast_packet *)skb->data; - unicast_packet->header.version = BATADV_COMPAT_VERSION; - /* batman packet type: unicast */ - unicast_packet->header.packet_type = BATADV_UNICAST; - /* set unicast ttl */ - unicast_packet->header.ttl = BATADV_TTL; - /* copy the destination for faster routing */ - memcpy(unicast_packet->dest, orig_node->orig, ETH_ALEN); - /* set the destination tt version number */ - unicast_packet->ttvn = ttvn; - - return true; -} - -/** - * batadv_unicast_prepare_skb - encapsulate an skb with a unicast header - * @skb: the skb containing the payload to encapsulate - * @orig_node: the destination node - * - * Returns false if the payload could not be encapsulated or true otherwise. - * - * This call might reallocate skb data. - */ -static bool batadv_unicast_prepare_skb(struct sk_buff *skb, - struct batadv_orig_node *orig_node) -{ - size_t uni_size = sizeof(struct batadv_unicast_packet); - return batadv_unicast_push_and_fill_skb(skb, uni_size, orig_node); -} - -/** - * batadv_unicast_4addr_prepare_skb - encapsulate an skb with a unicast4addr - * header - * @bat_priv: the bat priv with all the soft interface information - * @skb: the skb containing the payload to encapsulate - * @orig_node: the destination node - * @packet_subtype: the batman 4addr packet subtype to use - * - * Returns false if the payload could not be encapsulated or true otherwise. - * - * This call might reallocate skb data. - */ -bool batadv_unicast_4addr_prepare_skb(struct batadv_priv *bat_priv, - struct sk_buff *skb, - struct batadv_orig_node *orig, - int packet_subtype) -{ - struct batadv_hard_iface *primary_if; - struct batadv_unicast_4addr_packet *unicast_4addr_packet; - bool ret = false; - - primary_if = batadv_primary_if_get_selected(bat_priv); - if (!primary_if) - goto out; - - /* pull the header space and fill the unicast_packet substructure. - * We can do that because the first member of the unicast_4addr_packet - * is of type struct unicast_packet - */ - if (!batadv_unicast_push_and_fill_skb(skb, - sizeof(*unicast_4addr_packet), - orig)) - goto out; - - unicast_4addr_packet = (struct batadv_unicast_4addr_packet *)skb->data; - unicast_4addr_packet->u.header.packet_type = BATADV_UNICAST_4ADDR; - memcpy(unicast_4addr_packet->src, primary_if->net_dev->dev_addr, - ETH_ALEN); - unicast_4addr_packet->subtype = packet_subtype; - unicast_4addr_packet->reserved = 0; - - ret = true; -out: - if (primary_if) - batadv_hardif_free_ref(primary_if); - return ret; -} - -/** - * batadv_unicast_generic_send_skb - send an skb as unicast - * @bat_priv: the bat priv with all the soft interface information - * @skb: payload to send - * @packet_type: the batman unicast packet type to use - * @packet_subtype: the batman packet subtype. It is ignored if packet_type is - * not BATADV_UNICAT_4ADDR - * - * Returns 1 in case of error or 0 otherwise - */ -int batadv_unicast_generic_send_skb(struct batadv_priv *bat_priv, - struct sk_buff *skb, int packet_type, - int packet_subtype) -{ - struct ethhdr *ethhdr = (struct ethhdr *)skb->data; - struct batadv_unicast_packet *unicast_packet; - struct batadv_orig_node *orig_node; - struct batadv_neigh_node *neigh_node; - int data_len = skb->len; - int ret = NET_RX_DROP; - unsigned int dev_mtu, header_len; - - /* get routing information */ - if (is_multicast_ether_addr(ethhdr->h_dest)) { - orig_node = batadv_gw_get_selected_orig(bat_priv); - if (orig_node) - goto find_router; - } - - /* check for tt host - increases orig_node refcount. - * returns NULL in case of AP isolation - */ - orig_node = batadv_transtable_search(bat_priv, ethhdr->h_source, - ethhdr->h_dest); - -find_router: - /* find_router(): - * - if orig_node is NULL it returns NULL - * - increases neigh_nodes refcount if found. - */ - neigh_node = batadv_find_router(bat_priv, orig_node, NULL); - - if (!neigh_node) - goto out; - - switch (packet_type) { - case BATADV_UNICAST: - if (!batadv_unicast_prepare_skb(skb, orig_node)) - goto out; - - header_len = sizeof(struct batadv_unicast_packet); - break; - case BATADV_UNICAST_4ADDR: - if (!batadv_unicast_4addr_prepare_skb(bat_priv, skb, orig_node, - packet_subtype)) - goto out; - - header_len = sizeof(struct batadv_unicast_4addr_packet); - break; - default: - /* this function supports UNICAST and UNICAST_4ADDR only. It - * should never be invoked with any other packet type - */ - goto out; - } - - ethhdr = (struct ethhdr *)(skb->data + header_len); - unicast_packet = (struct batadv_unicast_packet *)skb->data; - - /* inform the destination node that we are still missing a correct route - * for this client. The destination will receive this packet and will - * try to reroute it because the ttvn contained in the header is less - * than the current one - */ - if (batadv_tt_global_client_is_roaming(bat_priv, ethhdr->h_dest)) - unicast_packet->ttvn = unicast_packet->ttvn - 1; - - dev_mtu = neigh_node->if_incoming->net_dev->mtu; - /* fragmentation mechanism only works for UNICAST (now) */ - if (packet_type == BATADV_UNICAST && - atomic_read(&bat_priv->fragmentation) && - data_len + sizeof(*unicast_packet) > dev_mtu) { - /* send frag skb decreases ttl */ - unicast_packet->header.ttl++; - ret = batadv_frag_send_skb(skb, bat_priv, - neigh_node->if_incoming, - neigh_node->addr); - goto out; - } - - if (batadv_send_skb_to_orig(skb, orig_node, NULL) != NET_XMIT_DROP) - ret = 0; - -out: - if (neigh_node) - batadv_neigh_node_free_ref(neigh_node); - if (orig_node) - batadv_orig_node_free_ref(orig_node); - if (ret == NET_RX_DROP) - kfree_skb(skb); - return ret; -} diff --git a/net/batman-adv/unicast.h b/net/batman-adv/unicast.h deleted file mode 100644 index 429cf8a4a31e..000000000000 --- a/net/batman-adv/unicast.h +++ /dev/null @@ -1,92 +0,0 @@ -/* Copyright (C) 2010-2013 B.A.T.M.A.N. contributors: - * - * Andreas Langer - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of version 2 of the GNU General Public - * License as published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, but - * WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - * General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA - * 02110-1301, USA - */ - -#ifndef _NET_BATMAN_ADV_UNICAST_H_ -#define _NET_BATMAN_ADV_UNICAST_H_ - -#include "packet.h" - -#define BATADV_FRAG_TIMEOUT 10000 /* purge frag list entries after time in ms */ -#define BATADV_FRAG_BUFFER_SIZE 6 /* number of list elements in buffer */ - -int batadv_frag_reassemble_skb(struct sk_buff *skb, - struct batadv_priv *bat_priv, - struct sk_buff **new_skb); -void batadv_frag_list_free(struct list_head *head); -int batadv_frag_send_skb(struct sk_buff *skb, struct batadv_priv *bat_priv, - struct batadv_hard_iface *hard_iface, - const uint8_t dstaddr[]); -bool batadv_unicast_4addr_prepare_skb(struct batadv_priv *bat_priv, - struct sk_buff *skb, - struct batadv_orig_node *orig_node, - int packet_subtype); -int batadv_unicast_generic_send_skb(struct batadv_priv *bat_priv, - struct sk_buff *skb, int packet_type, - int packet_subtype); - - -/** - * batadv_unicast_send_skb - send the skb encapsulated in a unicast packet - * @bat_priv: the bat priv with all the soft interface information - * @skb: the payload to send - */ -static inline int batadv_unicast_send_skb(struct batadv_priv *bat_priv, - struct sk_buff *skb) -{ - return batadv_unicast_generic_send_skb(bat_priv, skb, BATADV_UNICAST, - 0); -} - -/** - * batadv_unicast_send_skb - send the skb encapsulated in a unicast4addr packet - * @bat_priv: the bat priv with all the soft interface information - * @skb: the payload to send - * @packet_subtype: the batman 4addr packet subtype to use - */ -static inline int batadv_unicast_4addr_send_skb(struct batadv_priv *bat_priv, - struct sk_buff *skb, - int packet_subtype) -{ - return batadv_unicast_generic_send_skb(bat_priv, skb, - BATADV_UNICAST_4ADDR, - packet_subtype); -} - -static inline int batadv_frag_can_reassemble(const struct sk_buff *skb, int mtu) -{ - const struct batadv_unicast_frag_packet *unicast_packet; - int uneven_correction = 0; - unsigned int merged_size; - - unicast_packet = (struct batadv_unicast_frag_packet *)skb->data; - - if (unicast_packet->flags & BATADV_UNI_FRAG_LARGETAIL) { - if (unicast_packet->flags & BATADV_UNI_FRAG_HEAD) - uneven_correction = 1; - else - uneven_correction = -1; - } - - merged_size = (skb->len - sizeof(*unicast_packet)) * 2; - merged_size += sizeof(struct batadv_unicast_packet) + uneven_correction; - - return merged_size <= mtu; -} - -#endif /* _NET_BATMAN_ADV_UNICAST_H_ */ diff --git a/net/batman-adv/vis.c b/net/batman-adv/vis.c deleted file mode 100644 index d8ea31a58457..000000000000 --- a/net/batman-adv/vis.c +++ /dev/null @@ -1,938 +0,0 @@ -/* Copyright (C) 2008-2013 B.A.T.M.A.N. contributors: - * - * Simon Wunderlich - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of version 2 of the GNU General Public - * License as published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, but - * WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - * General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA - * 02110-1301, USA - */ - -#include "main.h" -#include "send.h" -#include "translation-table.h" -#include "vis.h" -#include "soft-interface.h" -#include "hard-interface.h" -#include "hash.h" -#include "originator.h" - -#define BATADV_MAX_VIS_PACKET_SIZE 1000 - -/* hash class keys */ -static struct lock_class_key batadv_vis_hash_lock_class_key; - -/* free the info */ -static void batadv_free_info(struct kref *ref) -{ - struct batadv_vis_info *info; - struct batadv_priv *bat_priv; - struct batadv_vis_recvlist_node *entry, *tmp; - - info = container_of(ref, struct batadv_vis_info, refcount); - bat_priv = info->bat_priv; - - list_del_init(&info->send_list); - spin_lock_bh(&bat_priv->vis.list_lock); - list_for_each_entry_safe(entry, tmp, &info->recv_list, list) { - list_del(&entry->list); - kfree(entry); - } - - spin_unlock_bh(&bat_priv->vis.list_lock); - kfree_skb(info->skb_packet); - kfree(info); -} - -/* Compare two vis packets, used by the hashing algorithm */ -static int batadv_vis_info_cmp(const struct hlist_node *node, const void *data2) -{ - const struct batadv_vis_info *d1, *d2; - const struct batadv_vis_packet *p1, *p2; - - d1 = container_of(node, struct batadv_vis_info, hash_entry); - d2 = data2; - p1 = (struct batadv_vis_packet *)d1->skb_packet->data; - p2 = (struct batadv_vis_packet *)d2->skb_packet->data; - return batadv_compare_eth(p1->vis_orig, p2->vis_orig); -} - -/* hash function to choose an entry in a hash table of given size - * hash algorithm from http://en.wikipedia.org/wiki/Hash_table - */ -static uint32_t batadv_vis_info_choose(const void *data, uint32_t size) -{ - const struct batadv_vis_info *vis_info = data; - const struct batadv_vis_packet *packet; - const unsigned char *key; - uint32_t hash = 0; - size_t i; - - packet = (struct batadv_vis_packet *)vis_info->skb_packet->data; - key = packet->vis_orig; - for (i = 0; i < ETH_ALEN; i++) { - hash += key[i]; - hash += (hash << 10); - hash ^= (hash >> 6); - } - - hash += (hash << 3); - hash ^= (hash >> 11); - hash += (hash << 15); - - return hash % size; -} - -static struct batadv_vis_info * -batadv_vis_hash_find(struct batadv_priv *bat_priv, const void *data) -{ - struct batadv_hashtable *hash = bat_priv->vis.hash; - struct hlist_head *head; - struct batadv_vis_info *vis_info, *vis_info_tmp = NULL; - uint32_t index; - - if (!hash) - return NULL; - - index = batadv_vis_info_choose(data, hash->size); - head = &hash->table[index]; - - rcu_read_lock(); - hlist_for_each_entry_rcu(vis_info, head, hash_entry) { - if (!batadv_vis_info_cmp(&vis_info->hash_entry, data)) - continue; - - vis_info_tmp = vis_info; - break; - } - rcu_read_unlock(); - - return vis_info_tmp; -} - -/* insert interface to the list of interfaces of one originator, if it - * does not already exist in the list - */ -static void batadv_vis_data_insert_interface(const uint8_t *interface, - struct hlist_head *if_list, - bool primary) -{ - struct batadv_vis_if_list_entry *entry; - - hlist_for_each_entry(entry, if_list, list) { - if (batadv_compare_eth(entry->addr, interface)) - return; - } - - /* it's a new address, add it to the list */ - entry = kmalloc(sizeof(*entry), GFP_ATOMIC); - if (!entry) - return; - memcpy(entry->addr, interface, ETH_ALEN); - entry->primary = primary; - hlist_add_head(&entry->list, if_list); -} - -static void batadv_vis_data_read_prim_sec(struct seq_file *seq, - const struct hlist_head *if_list) -{ - struct batadv_vis_if_list_entry *entry; - - hlist_for_each_entry(entry, if_list, list) { - if (entry->primary) - seq_puts(seq, "PRIMARY, "); - else - seq_printf(seq, "SEC %pM, ", entry->addr); - } -} - -/* read an entry */ -static ssize_t -batadv_vis_data_read_entry(struct seq_file *seq, - const struct batadv_vis_info_entry *entry, - const uint8_t *src, bool primary) -{ - if (primary && entry->quality == 0) - return seq_printf(seq, "TT %pM, ", entry->dest); - else if (batadv_compare_eth(entry->src, src)) - return seq_printf(seq, "TQ %pM %d, ", entry->dest, - entry->quality); - - return 0; -} - -static void -batadv_vis_data_insert_interfaces(struct hlist_head *list, - struct batadv_vis_packet *packet, - struct batadv_vis_info_entry *entries) -{ - int i; - - for (i = 0; i < packet->entries; i++) { - if (entries[i].quality == 0) - continue; - - if (batadv_compare_eth(entries[i].src, packet->vis_orig)) - continue; - - batadv_vis_data_insert_interface(entries[i].src, list, false); - } -} - -static void batadv_vis_data_read_entries(struct seq_file *seq, - struct hlist_head *list, - struct batadv_vis_packet *packet, - struct batadv_vis_info_entry *entries) -{ - int i; - struct batadv_vis_if_list_entry *entry; - - hlist_for_each_entry(entry, list, list) { - seq_printf(seq, "%pM,", entry->addr); - - for (i = 0; i < packet->entries; i++) - batadv_vis_data_read_entry(seq, &entries[i], - entry->addr, entry->primary); - - /* add primary/secondary records */ - if (batadv_compare_eth(entry->addr, packet->vis_orig)) - batadv_vis_data_read_prim_sec(seq, list); - - seq_puts(seq, "\n"); - } -} - -static void batadv_vis_seq_print_text_bucket(struct seq_file *seq, - const struct hlist_head *head) -{ - struct batadv_vis_info *info; - struct batadv_vis_packet *packet; - uint8_t *entries_pos; - struct batadv_vis_info_entry *entries; - struct batadv_vis_if_list_entry *entry; - struct hlist_node *n; - - HLIST_HEAD(vis_if_list); - - hlist_for_each_entry_rcu(info, head, hash_entry) { - packet = (struct batadv_vis_packet *)info->skb_packet->data; - entries_pos = (uint8_t *)packet + sizeof(*packet); - entries = (struct batadv_vis_info_entry *)entries_pos; - - batadv_vis_data_insert_interface(packet->vis_orig, &vis_if_list, - true); - batadv_vis_data_insert_interfaces(&vis_if_list, packet, - entries); - batadv_vis_data_read_entries(seq, &vis_if_list, packet, - entries); - - hlist_for_each_entry_safe(entry, n, &vis_if_list, list) { - hlist_del(&entry->list); - kfree(entry); - } - } -} - -int batadv_vis_seq_print_text(struct seq_file *seq, void *offset) -{ - struct batadv_hard_iface *primary_if; - struct hlist_head *head; - struct net_device *net_dev = (struct net_device *)seq->private; - struct batadv_priv *bat_priv = netdev_priv(net_dev); - struct batadv_hashtable *hash = bat_priv->vis.hash; - uint32_t i; - int ret = 0; - int vis_server = atomic_read(&bat_priv->vis_mode); - - primary_if = batadv_primary_if_get_selected(bat_priv); - if (!primary_if) - goto out; - - if (vis_server == BATADV_VIS_TYPE_CLIENT_UPDATE) - goto out; - - spin_lock_bh(&bat_priv->vis.hash_lock); - for (i = 0; i < hash->size; i++) { - head = &hash->table[i]; - batadv_vis_seq_print_text_bucket(seq, head); - } - spin_unlock_bh(&bat_priv->vis.hash_lock); - -out: - if (primary_if) - batadv_hardif_free_ref(primary_if); - return ret; -} - -/* add the info packet to the send list, if it was not - * already linked in. - */ -static void batadv_send_list_add(struct batadv_priv *bat_priv, - struct batadv_vis_info *info) -{ - if (list_empty(&info->send_list)) { - kref_get(&info->refcount); - list_add_tail(&info->send_list, &bat_priv->vis.send_list); - } -} - -/* delete the info packet from the send list, if it was - * linked in. - */ -static void batadv_send_list_del(struct batadv_vis_info *info) -{ - if (!list_empty(&info->send_list)) { - list_del_init(&info->send_list); - kref_put(&info->refcount, batadv_free_info); - } -} - -/* tries to add one entry to the receive list. */ -static void batadv_recv_list_add(struct batadv_priv *bat_priv, - struct list_head *recv_list, const char *mac) -{ - struct batadv_vis_recvlist_node *entry; - - entry = kmalloc(sizeof(*entry), GFP_ATOMIC); - if (!entry) - return; - - memcpy(entry->mac, mac, ETH_ALEN); - spin_lock_bh(&bat_priv->vis.list_lock); - list_add_tail(&entry->list, recv_list); - spin_unlock_bh(&bat_priv->vis.list_lock); -} - -/* returns 1 if this mac is in the recv_list */ -static int batadv_recv_list_is_in(struct batadv_priv *bat_priv, - const struct list_head *recv_list, - const char *mac) -{ - const struct batadv_vis_recvlist_node *entry; - - spin_lock_bh(&bat_priv->vis.list_lock); - list_for_each_entry(entry, recv_list, list) { - if (batadv_compare_eth(entry->mac, mac)) { - spin_unlock_bh(&bat_priv->vis.list_lock); - return 1; - } - } - spin_unlock_bh(&bat_priv->vis.list_lock); - return 0; -} - -/* try to add the packet to the vis_hash. return NULL if invalid (e.g. too old, - * broken.. ). vis hash must be locked outside. is_new is set when the packet - * is newer than old entries in the hash. - */ -static struct batadv_vis_info * -batadv_add_packet(struct batadv_priv *bat_priv, - struct batadv_vis_packet *vis_packet, int vis_info_len, - int *is_new, int make_broadcast) -{ - struct batadv_vis_info *info, *old_info; - struct batadv_vis_packet *search_packet, *old_packet; - struct batadv_vis_info search_elem; - struct batadv_vis_packet *packet; - struct sk_buff *tmp_skb; - int hash_added; - size_t len; - size_t max_entries; - - *is_new = 0; - /* sanity check */ - if (!bat_priv->vis.hash) - return NULL; - - /* see if the packet is already in vis_hash */ - search_elem.skb_packet = dev_alloc_skb(sizeof(*search_packet)); - if (!search_elem.skb_packet) - return NULL; - len = sizeof(*search_packet); - tmp_skb = search_elem.skb_packet; - search_packet = (struct batadv_vis_packet *)skb_put(tmp_skb, len); - - memcpy(search_packet->vis_orig, vis_packet->vis_orig, ETH_ALEN); - old_info = batadv_vis_hash_find(bat_priv, &search_elem); - kfree_skb(search_elem.skb_packet); - - if (old_info) { - tmp_skb = old_info->skb_packet; - old_packet = (struct batadv_vis_packet *)tmp_skb->data; - if (!batadv_seq_after(ntohl(vis_packet->seqno), - ntohl(old_packet->seqno))) { - if (old_packet->seqno == vis_packet->seqno) { - batadv_recv_list_add(bat_priv, - &old_info->recv_list, - vis_packet->sender_orig); - return old_info; - } else { - /* newer packet is already in hash. */ - return NULL; - } - } - /* remove old entry */ - batadv_hash_remove(bat_priv->vis.hash, batadv_vis_info_cmp, - batadv_vis_info_choose, old_info); - batadv_send_list_del(old_info); - kref_put(&old_info->refcount, batadv_free_info); - } - - info = kmalloc(sizeof(*info), GFP_ATOMIC); - if (!info) - return NULL; - - len = sizeof(*packet) + vis_info_len; - info->skb_packet = netdev_alloc_skb_ip_align(NULL, len + ETH_HLEN); - if (!info->skb_packet) { - kfree(info); - return NULL; - } - info->skb_packet->priority = TC_PRIO_CONTROL; - skb_reserve(info->skb_packet, ETH_HLEN); - packet = (struct batadv_vis_packet *)skb_put(info->skb_packet, len); - - kref_init(&info->refcount); - INIT_LIST_HEAD(&info->send_list); - INIT_LIST_HEAD(&info->recv_list); - info->first_seen = jiffies; - info->bat_priv = bat_priv; - memcpy(packet, vis_packet, len); - - /* initialize and add new packet. */ - *is_new = 1; - - /* Make it a broadcast packet, if required */ - if (make_broadcast) - memcpy(packet->target_orig, batadv_broadcast_addr, ETH_ALEN); - - /* repair if entries is longer than packet. */ - max_entries = vis_info_len / sizeof(struct batadv_vis_info_entry); - if (packet->entries > max_entries) - packet->entries = max_entries; - - batadv_recv_list_add(bat_priv, &info->recv_list, packet->sender_orig); - - /* try to add it */ - hash_added = batadv_hash_add(bat_priv->vis.hash, batadv_vis_info_cmp, - batadv_vis_info_choose, info, - &info->hash_entry); - if (hash_added != 0) { - /* did not work (for some reason) */ - kref_put(&info->refcount, batadv_free_info); - info = NULL; - } - - return info; -} - -/* handle the server sync packet, forward if needed. */ -void batadv_receive_server_sync_packet(struct batadv_priv *bat_priv, - struct batadv_vis_packet *vis_packet, - int vis_info_len) -{ - struct batadv_vis_info *info; - int is_new, make_broadcast; - int vis_server = atomic_read(&bat_priv->vis_mode); - - make_broadcast = (vis_server == BATADV_VIS_TYPE_SERVER_SYNC); - - spin_lock_bh(&bat_priv->vis.hash_lock); - info = batadv_add_packet(bat_priv, vis_packet, vis_info_len, - &is_new, make_broadcast); - if (!info) - goto end; - - /* only if we are server ourselves and packet is newer than the one in - * hash. - */ - if (vis_server == BATADV_VIS_TYPE_SERVER_SYNC && is_new) - batadv_send_list_add(bat_priv, info); -end: - spin_unlock_bh(&bat_priv->vis.hash_lock); -} - -/* handle an incoming client update packet and schedule forward if needed. */ -void batadv_receive_client_update_packet(struct batadv_priv *bat_priv, - struct batadv_vis_packet *vis_packet, - int vis_info_len) -{ - struct batadv_vis_info *info; - struct batadv_vis_packet *packet; - int is_new; - int vis_server = atomic_read(&bat_priv->vis_mode); - int are_target = 0; - - /* clients shall not broadcast. */ - if (is_broadcast_ether_addr(vis_packet->target_orig)) - return; - - /* Are we the target for this VIS packet? */ - if (vis_server == BATADV_VIS_TYPE_SERVER_SYNC && - batadv_is_my_mac(bat_priv, vis_packet->target_orig)) - are_target = 1; - - spin_lock_bh(&bat_priv->vis.hash_lock); - info = batadv_add_packet(bat_priv, vis_packet, vis_info_len, - &is_new, are_target); - - if (!info) - goto end; - /* note that outdated packets will be dropped at this point. */ - - packet = (struct batadv_vis_packet *)info->skb_packet->data; - - /* send only if we're the target server or ... */ - if (are_target && is_new) { - packet->vis_type = BATADV_VIS_TYPE_SERVER_SYNC; /* upgrade! */ - batadv_send_list_add(bat_priv, info); - - /* ... we're not the recipient (and thus need to forward). */ - } else if (!batadv_is_my_mac(bat_priv, packet->target_orig)) { - batadv_send_list_add(bat_priv, info); - } - -end: - spin_unlock_bh(&bat_priv->vis.hash_lock); -} - -/* Walk the originators and find the VIS server with the best tq. Set the packet - * address to its address and return the best_tq. - * - * Must be called with the originator hash locked - */ -static int batadv_find_best_vis_server(struct batadv_priv *bat_priv, - struct batadv_vis_info *info) -{ - struct batadv_hashtable *hash = bat_priv->orig_hash; - struct batadv_neigh_node *router; - struct hlist_head *head; - struct batadv_orig_node *orig_node; - struct batadv_vis_packet *packet; - int best_tq = -1; - uint32_t i; - - packet = (struct batadv_vis_packet *)info->skb_packet->data; - - for (i = 0; i < hash->size; i++) { - head = &hash->table[i]; - - rcu_read_lock(); - hlist_for_each_entry_rcu(orig_node, head, hash_entry) { - router = batadv_orig_node_get_router(orig_node); - if (!router) - continue; - - if ((orig_node->flags & BATADV_VIS_SERVER) && - (router->tq_avg > best_tq)) { - best_tq = router->tq_avg; - memcpy(packet->target_orig, orig_node->orig, - ETH_ALEN); - } - batadv_neigh_node_free_ref(router); - } - rcu_read_unlock(); - } - - return best_tq; -} - -/* Return true if the vis packet is full. */ -static bool batadv_vis_packet_full(const struct batadv_vis_info *info) -{ - const struct batadv_vis_packet *packet; - size_t num; - - packet = (struct batadv_vis_packet *)info->skb_packet->data; - num = BATADV_MAX_VIS_PACKET_SIZE / sizeof(struct batadv_vis_info_entry); - - if (num < packet->entries + 1) - return true; - return false; -} - -/* generates a packet of own vis data, - * returns 0 on success, -1 if no packet could be generated - */ -static int batadv_generate_vis_packet(struct batadv_priv *bat_priv) -{ - struct batadv_hashtable *hash = bat_priv->orig_hash; - struct hlist_head *head; - struct batadv_orig_node *orig_node; - struct batadv_neigh_node *router; - struct batadv_vis_info *info = bat_priv->vis.my_info; - struct batadv_vis_packet *packet; - struct batadv_vis_info_entry *entry; - struct batadv_tt_common_entry *tt_common_entry; - uint8_t *packet_pos; - int best_tq = -1; - uint32_t i; - - info->first_seen = jiffies; - packet = (struct batadv_vis_packet *)info->skb_packet->data; - packet->vis_type = atomic_read(&bat_priv->vis_mode); - - memcpy(packet->target_orig, batadv_broadcast_addr, ETH_ALEN); - packet->header.ttl = BATADV_TTL; - packet->seqno = htonl(ntohl(packet->seqno) + 1); - packet->entries = 0; - packet->reserved = 0; - skb_trim(info->skb_packet, sizeof(*packet)); - - if (packet->vis_type == BATADV_VIS_TYPE_CLIENT_UPDATE) { - best_tq = batadv_find_best_vis_server(bat_priv, info); - - if (best_tq < 0) - return best_tq; - } - - for (i = 0; i < hash->size; i++) { - head = &hash->table[i]; - - rcu_read_lock(); - hlist_for_each_entry_rcu(orig_node, head, hash_entry) { - router = batadv_orig_node_get_router(orig_node); - if (!router) - continue; - - if (!batadv_compare_eth(router->addr, orig_node->orig)) - goto next; - - if (router->if_incoming->if_status != BATADV_IF_ACTIVE) - goto next; - - if (router->tq_avg < 1) - goto next; - - /* fill one entry into buffer. */ - packet_pos = skb_put(info->skb_packet, sizeof(*entry)); - entry = (struct batadv_vis_info_entry *)packet_pos; - memcpy(entry->src, - router->if_incoming->net_dev->dev_addr, - ETH_ALEN); - memcpy(entry->dest, orig_node->orig, ETH_ALEN); - entry->quality = router->tq_avg; - packet->entries++; - -next: - batadv_neigh_node_free_ref(router); - - if (batadv_vis_packet_full(info)) - goto unlock; - } - rcu_read_unlock(); - } - - hash = bat_priv->tt.local_hash; - - for (i = 0; i < hash->size; i++) { - head = &hash->table[i]; - - rcu_read_lock(); - hlist_for_each_entry_rcu(tt_common_entry, head, - hash_entry) { - packet_pos = skb_put(info->skb_packet, sizeof(*entry)); - entry = (struct batadv_vis_info_entry *)packet_pos; - memset(entry->src, 0, ETH_ALEN); - memcpy(entry->dest, tt_common_entry->addr, ETH_ALEN); - entry->quality = 0; /* 0 means TT */ - packet->entries++; - - if (batadv_vis_packet_full(info)) - goto unlock; - } - rcu_read_unlock(); - } - - return 0; - -unlock: - rcu_read_unlock(); - return 0; -} - -/* free old vis packets. Must be called with this vis_hash_lock - * held - */ -static void batadv_purge_vis_packets(struct batadv_priv *bat_priv) -{ - uint32_t i; - struct batadv_hashtable *hash = bat_priv->vis.hash; - struct hlist_node *node_tmp; - struct hlist_head *head; - struct batadv_vis_info *info; - - for (i = 0; i < hash->size; i++) { - head = &hash->table[i]; - - hlist_for_each_entry_safe(info, node_tmp, - head, hash_entry) { - /* never purge own data. */ - if (info == bat_priv->vis.my_info) - continue; - - if (batadv_has_timed_out(info->first_seen, - BATADV_VIS_TIMEOUT)) { - hlist_del(&info->hash_entry); - batadv_send_list_del(info); - kref_put(&info->refcount, batadv_free_info); - } - } - } -} - -static void batadv_broadcast_vis_packet(struct batadv_priv *bat_priv, - struct batadv_vis_info *info) -{ - struct batadv_hashtable *hash = bat_priv->orig_hash; - struct hlist_head *head; - struct batadv_orig_node *orig_node; - struct batadv_vis_packet *packet; - struct sk_buff *skb; - uint32_t i, res; - - - packet = (struct batadv_vis_packet *)info->skb_packet->data; - - /* send to all routers in range. */ - for (i = 0; i < hash->size; i++) { - head = &hash->table[i]; - - rcu_read_lock(); - hlist_for_each_entry_rcu(orig_node, head, hash_entry) { - /* if it's a vis server and reachable, send it. */ - if (!(orig_node->flags & BATADV_VIS_SERVER)) - continue; - - /* don't send it if we already received the packet from - * this node. - */ - if (batadv_recv_list_is_in(bat_priv, &info->recv_list, - orig_node->orig)) - continue; - - memcpy(packet->target_orig, orig_node->orig, ETH_ALEN); - skb = skb_clone(info->skb_packet, GFP_ATOMIC); - if (!skb) - continue; - - res = batadv_send_skb_to_orig(skb, orig_node, NULL); - if (res == NET_XMIT_DROP) - kfree_skb(skb); - } - rcu_read_unlock(); - } -} - -static void batadv_unicast_vis_packet(struct batadv_priv *bat_priv, - struct batadv_vis_info *info) -{ - struct batadv_orig_node *orig_node; - struct sk_buff *skb; - struct batadv_vis_packet *packet; - - packet = (struct batadv_vis_packet *)info->skb_packet->data; - - orig_node = batadv_orig_hash_find(bat_priv, packet->target_orig); - if (!orig_node) - goto out; - - skb = skb_clone(info->skb_packet, GFP_ATOMIC); - if (!skb) - goto out; - - if (batadv_send_skb_to_orig(skb, orig_node, NULL) == NET_XMIT_DROP) - kfree_skb(skb); - -out: - if (orig_node) - batadv_orig_node_free_ref(orig_node); -} - -/* only send one vis packet. called from batadv_send_vis_packets() */ -static void batadv_send_vis_packet(struct batadv_priv *bat_priv, - struct batadv_vis_info *info) -{ - struct batadv_hard_iface *primary_if; - struct batadv_vis_packet *packet; - - primary_if = batadv_primary_if_get_selected(bat_priv); - if (!primary_if) - goto out; - - packet = (struct batadv_vis_packet *)info->skb_packet->data; - if (packet->header.ttl < 2) { - pr_debug("Error - can't send vis packet: ttl exceeded\n"); - goto out; - } - - memcpy(packet->sender_orig, primary_if->net_dev->dev_addr, ETH_ALEN); - packet->header.ttl--; - - if (is_broadcast_ether_addr(packet->target_orig)) - batadv_broadcast_vis_packet(bat_priv, info); - else - batadv_unicast_vis_packet(bat_priv, info); - packet->header.ttl++; /* restore TTL */ - -out: - if (primary_if) - batadv_hardif_free_ref(primary_if); -} - -/* called from timer; send (and maybe generate) vis packet. */ -static void batadv_send_vis_packets(struct work_struct *work) -{ - struct delayed_work *delayed_work; - struct batadv_priv *bat_priv; - struct batadv_priv_vis *priv_vis; - struct batadv_vis_info *info; - - delayed_work = container_of(work, struct delayed_work, work); - priv_vis = container_of(delayed_work, struct batadv_priv_vis, work); - bat_priv = container_of(priv_vis, struct batadv_priv, vis); - spin_lock_bh(&bat_priv->vis.hash_lock); - batadv_purge_vis_packets(bat_priv); - - if (batadv_generate_vis_packet(bat_priv) == 0) { - /* schedule if generation was successful */ - batadv_send_list_add(bat_priv, bat_priv->vis.my_info); - } - - while (!list_empty(&bat_priv->vis.send_list)) { - info = list_first_entry(&bat_priv->vis.send_list, - typeof(*info), send_list); - - kref_get(&info->refcount); - spin_unlock_bh(&bat_priv->vis.hash_lock); - - batadv_send_vis_packet(bat_priv, info); - - spin_lock_bh(&bat_priv->vis.hash_lock); - batadv_send_list_del(info); - kref_put(&info->refcount, batadv_free_info); - } - spin_unlock_bh(&bat_priv->vis.hash_lock); - - queue_delayed_work(batadv_event_workqueue, &bat_priv->vis.work, - msecs_to_jiffies(BATADV_VIS_INTERVAL)); -} - -/* init the vis server. this may only be called when if_list is already - * initialized (e.g. bat0 is initialized, interfaces have been added) - */ -int batadv_vis_init(struct batadv_priv *bat_priv) -{ - struct batadv_vis_packet *packet; - int hash_added; - unsigned int len; - unsigned long first_seen; - struct sk_buff *tmp_skb; - - if (bat_priv->vis.hash) - return 0; - - spin_lock_bh(&bat_priv->vis.hash_lock); - - bat_priv->vis.hash = batadv_hash_new(256); - if (!bat_priv->vis.hash) { - pr_err("Can't initialize vis_hash\n"); - goto err; - } - - batadv_hash_set_lock_class(bat_priv->vis.hash, - &batadv_vis_hash_lock_class_key); - - bat_priv->vis.my_info = kmalloc(BATADV_MAX_VIS_PACKET_SIZE, GFP_ATOMIC); - if (!bat_priv->vis.my_info) - goto err; - - len = sizeof(*packet) + BATADV_MAX_VIS_PACKET_SIZE + ETH_HLEN; - bat_priv->vis.my_info->skb_packet = netdev_alloc_skb_ip_align(NULL, - len); - if (!bat_priv->vis.my_info->skb_packet) - goto free_info; - - bat_priv->vis.my_info->skb_packet->priority = TC_PRIO_CONTROL; - skb_reserve(bat_priv->vis.my_info->skb_packet, ETH_HLEN); - tmp_skb = bat_priv->vis.my_info->skb_packet; - packet = (struct batadv_vis_packet *)skb_put(tmp_skb, sizeof(*packet)); - - /* prefill the vis info */ - first_seen = jiffies - msecs_to_jiffies(BATADV_VIS_INTERVAL); - bat_priv->vis.my_info->first_seen = first_seen; - INIT_LIST_HEAD(&bat_priv->vis.my_info->recv_list); - INIT_LIST_HEAD(&bat_priv->vis.my_info->send_list); - kref_init(&bat_priv->vis.my_info->refcount); - bat_priv->vis.my_info->bat_priv = bat_priv; - packet->header.version = BATADV_COMPAT_VERSION; - packet->header.packet_type = BATADV_VIS; - packet->header.ttl = BATADV_TTL; - packet->seqno = 0; - packet->reserved = 0; - packet->entries = 0; - - INIT_LIST_HEAD(&bat_priv->vis.send_list); - - hash_added = batadv_hash_add(bat_priv->vis.hash, batadv_vis_info_cmp, - batadv_vis_info_choose, - bat_priv->vis.my_info, - &bat_priv->vis.my_info->hash_entry); - if (hash_added != 0) { - pr_err("Can't add own vis packet into hash\n"); - /* not in hash, need to remove it manually. */ - kref_put(&bat_priv->vis.my_info->refcount, batadv_free_info); - goto err; - } - - spin_unlock_bh(&bat_priv->vis.hash_lock); - - INIT_DELAYED_WORK(&bat_priv->vis.work, batadv_send_vis_packets); - queue_delayed_work(batadv_event_workqueue, &bat_priv->vis.work, - msecs_to_jiffies(BATADV_VIS_INTERVAL)); - - return 0; - -free_info: - kfree(bat_priv->vis.my_info); - bat_priv->vis.my_info = NULL; -err: - spin_unlock_bh(&bat_priv->vis.hash_lock); - batadv_vis_quit(bat_priv); - return -ENOMEM; -} - -/* Decrease the reference count on a hash item info */ -static void batadv_free_info_ref(struct hlist_node *node, void *arg) -{ - struct batadv_vis_info *info; - - info = container_of(node, struct batadv_vis_info, hash_entry); - batadv_send_list_del(info); - kref_put(&info->refcount, batadv_free_info); -} - -/* shutdown vis-server */ -void batadv_vis_quit(struct batadv_priv *bat_priv) -{ - if (!bat_priv->vis.hash) - return; - - cancel_delayed_work_sync(&bat_priv->vis.work); - - spin_lock_bh(&bat_priv->vis.hash_lock); - /* properly remove, kill timers ... */ - batadv_hash_delete(bat_priv->vis.hash, batadv_free_info_ref, NULL); - bat_priv->vis.hash = NULL; - bat_priv->vis.my_info = NULL; - spin_unlock_bh(&bat_priv->vis.hash_lock); -} diff --git a/net/batman-adv/vis.h b/net/batman-adv/vis.h deleted file mode 100644 index ad92b0e3c230..000000000000 --- a/net/batman-adv/vis.h +++ /dev/null @@ -1,36 +0,0 @@ -/* Copyright (C) 2008-2013 B.A.T.M.A.N. contributors: - * - * Simon Wunderlich, Marek Lindner - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of version 2 of the GNU General Public - * License as published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, but - * WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - * General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA - * 02110-1301, USA - */ - -#ifndef _NET_BATMAN_ADV_VIS_H_ -#define _NET_BATMAN_ADV_VIS_H_ - -/* timeout of vis packets in milliseconds */ -#define BATADV_VIS_TIMEOUT 200000 - -int batadv_vis_seq_print_text(struct seq_file *seq, void *offset); -void batadv_receive_server_sync_packet(struct batadv_priv *bat_priv, - struct batadv_vis_packet *vis_packet, - int vis_info_len); -void batadv_receive_client_update_packet(struct batadv_priv *bat_priv, - struct batadv_vis_packet *vis_packet, - int vis_info_len); -int batadv_vis_init(struct batadv_priv *bat_priv); -void batadv_vis_quit(struct batadv_priv *bat_priv); - -#endif /* _NET_BATMAN_ADV_VIS_H_ */ diff --git a/net/bluetooth/Makefile b/net/bluetooth/Makefile index dea6a287daca..6a791e73e39d 100644 --- a/net/bluetooth/Makefile +++ b/net/bluetooth/Makefile @@ -11,3 +11,5 @@ obj-$(CONFIG_BT_HIDP) += hidp/ bluetooth-y := af_bluetooth.o hci_core.o hci_conn.o hci_event.o mgmt.o \ hci_sock.o hci_sysfs.o l2cap_core.o l2cap_sock.o smp.o sco.o lib.o \ a2mp.o amp.o + +subdir-ccflags-y += -D__CHECK_ENDIAN__ diff --git a/net/bluetooth/a2mp.c b/net/bluetooth/a2mp.c index 17f33a62f6db..efcd108822c4 100644 --- a/net/bluetooth/a2mp.c +++ b/net/bluetooth/a2mp.c @@ -15,8 +15,9 @@ #include <net/bluetooth/bluetooth.h> #include <net/bluetooth/hci_core.h> #include <net/bluetooth/l2cap.h> -#include <net/bluetooth/a2mp.h> -#include <net/bluetooth/amp.h> + +#include "a2mp.h" +#include "amp.h" /* Global AMP Manager list */ LIST_HEAD(amp_mgr_list); @@ -75,33 +76,26 @@ u8 __next_ident(struct amp_mgr *mgr) return mgr->ident; } -static inline void __a2mp_cl_bredr(struct a2mp_cl *cl) -{ - cl->id = 0; - cl->type = 0; - cl->status = 1; -} - /* hci_dev_list shall be locked */ -static void __a2mp_add_cl(struct amp_mgr *mgr, struct a2mp_cl *cl, u8 num_ctrl) +static void __a2mp_add_cl(struct amp_mgr *mgr, struct a2mp_cl *cl) { - int i = 0; struct hci_dev *hdev; + int i = 1; - __a2mp_cl_bredr(cl); + cl[0].id = AMP_ID_BREDR; + cl[0].type = AMP_TYPE_BREDR; + cl[0].status = AMP_STATUS_BLUETOOTH_ONLY; list_for_each_entry(hdev, &hci_dev_list, list) { - /* Iterate through AMP controllers */ - if (hdev->id == HCI_BREDR_ID) - continue; - - /* Starting from second entry */ - if (++i >= num_ctrl) - return; - - cl[i].id = hdev->id; - cl[i].type = hdev->amp_type; - cl[i].status = hdev->amp_status; + if (hdev->dev_type == HCI_AMP) { + cl[i].id = hdev->id; + cl[i].type = hdev->amp_type; + if (test_bit(HCI_UP, &hdev->flags)) + cl[i].status = hdev->amp_status; + else + cl[i].status = AMP_STATUS_POWERED_DOWN; + i++; + } } } @@ -129,6 +123,7 @@ static int a2mp_discover_req(struct amp_mgr *mgr, struct sk_buff *skb, struct a2mp_discov_rsp *rsp; u16 ext_feat; u8 num_ctrl; + struct hci_dev *hdev; if (len < sizeof(*req)) return -EINVAL; @@ -152,7 +147,14 @@ static int a2mp_discover_req(struct amp_mgr *mgr, struct sk_buff *skb, read_lock(&hci_dev_list_lock); - num_ctrl = __hci_num_ctrl(); + /* at minimum the BR/EDR needs to be listed */ + num_ctrl = 1; + + list_for_each_entry(hdev, &hci_dev_list, list) { + if (hdev->dev_type == HCI_AMP) + num_ctrl++; + } + len = num_ctrl * sizeof(struct a2mp_cl) + sizeof(*rsp); rsp = kmalloc(len, GFP_ATOMIC); if (!rsp) { @@ -163,7 +165,7 @@ static int a2mp_discover_req(struct amp_mgr *mgr, struct sk_buff *skb, rsp->mtu = __constant_cpu_to_le16(L2CAP_A2MP_DEFAULT_MTU); rsp->ext_feat = 0; - __a2mp_add_cl(mgr, rsp->cl, num_ctrl); + __a2mp_add_cl(mgr, rsp->cl); read_unlock(&hci_dev_list_lock); @@ -208,7 +210,7 @@ static int a2mp_discover_rsp(struct amp_mgr *mgr, struct sk_buff *skb, BT_DBG("Remote AMP id %d type %d status %d", cl->id, cl->type, cl->status); - if (cl->id != HCI_BREDR_ID && cl->type == HCI_AMP) { + if (cl->id != AMP_ID_BREDR && cl->type != AMP_TYPE_BREDR) { struct a2mp_info_req req; found = true; @@ -344,7 +346,7 @@ static int a2mp_getampassoc_req(struct amp_mgr *mgr, struct sk_buff *skb, tmp = amp_mgr_lookup_by_state(READ_LOC_AMP_ASSOC); hdev = hci_dev_get(req->id); - if (!hdev || hdev->amp_type == HCI_BREDR || tmp) { + if (!hdev || hdev->amp_type == AMP_TYPE_BREDR || tmp) { struct a2mp_amp_assoc_rsp rsp; rsp.id = req->id; @@ -451,7 +453,7 @@ static int a2mp_createphyslink_req(struct amp_mgr *mgr, struct sk_buff *skb, rsp.remote_id = req->local_id; hdev = hci_dev_get(req->remote_id); - if (!hdev || hdev->amp_type != HCI_AMP) { + if (!hdev || hdev->amp_type == AMP_TYPE_BREDR) { rsp.status = A2MP_STATUS_INVALID_CTRL_ID; goto send_rsp; } @@ -535,7 +537,8 @@ static int a2mp_discphyslink_req(struct amp_mgr *mgr, struct sk_buff *skb, goto send_rsp; } - hcon = hci_conn_hash_lookup_ba(hdev, AMP_LINK, mgr->l2cap_conn->dst); + hcon = hci_conn_hash_lookup_ba(hdev, AMP_LINK, + &mgr->l2cap_conn->hcon->dst); if (!hcon) { BT_ERR("No phys link exist"); rsp.status = A2MP_STATUS_NO_PHYSICAL_LINK_EXISTS; @@ -669,7 +672,8 @@ static void a2mp_chan_close_cb(struct l2cap_chan *chan) l2cap_chan_put(chan); } -static void a2mp_chan_state_change_cb(struct l2cap_chan *chan, int state) +static void a2mp_chan_state_change_cb(struct l2cap_chan *chan, int state, + int err) { struct amp_mgr *mgr = chan->data; @@ -706,6 +710,9 @@ static struct l2cap_ops a2mp_chan_ops = { .teardown = l2cap_chan_no_teardown, .ready = l2cap_chan_no_ready, .defer = l2cap_chan_no_defer, + .resume = l2cap_chan_no_resume, + .set_shutdown = l2cap_chan_no_set_shutdown, + .get_sndtimeo = l2cap_chan_no_get_sndtimeo, }; static struct l2cap_chan *a2mp_chan_open(struct l2cap_conn *conn, bool locked) @@ -829,6 +836,9 @@ struct l2cap_chan *a2mp_channel_create(struct l2cap_conn *conn, { struct amp_mgr *mgr; + if (conn->hcon->type != ACL_LINK) + return NULL; + mgr = amp_mgr_create(conn, false); if (!mgr) { BT_ERR("Could not create AMP manager"); @@ -871,7 +881,7 @@ void a2mp_send_getinfo_rsp(struct hci_dev *hdev) rsp.id = hdev->id; rsp.status = A2MP_STATUS_INVALID_CTRL_ID; - if (hdev->amp_type != HCI_BREDR) { + if (hdev->amp_type != AMP_TYPE_BREDR) { rsp.status = 0; rsp.total_bw = cpu_to_le32(hdev->amp_total_bw); rsp.max_bw = cpu_to_le32(hdev->amp_max_bw); diff --git a/net/bluetooth/a2mp.h b/net/bluetooth/a2mp.h new file mode 100644 index 000000000000..487b54c1308f --- /dev/null +++ b/net/bluetooth/a2mp.h @@ -0,0 +1,150 @@ +/* + Copyright (c) 2010,2011 Code Aurora Forum. All rights reserved. + Copyright (c) 2011,2012 Intel Corp. + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License version 2 and + only version 2 as published by the Free Software Foundation. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. +*/ + +#ifndef __A2MP_H +#define __A2MP_H + +#include <net/bluetooth/l2cap.h> + +#define A2MP_FEAT_EXT 0x8000 + +enum amp_mgr_state { + READ_LOC_AMP_INFO, + READ_LOC_AMP_ASSOC, + READ_LOC_AMP_ASSOC_FINAL, + WRITE_REMOTE_AMP_ASSOC, +}; + +struct amp_mgr { + struct list_head list; + struct l2cap_conn *l2cap_conn; + struct l2cap_chan *a2mp_chan; + struct l2cap_chan *bredr_chan; + struct kref kref; + __u8 ident; + __u8 handle; + unsigned long state; + unsigned long flags; + + struct list_head amp_ctrls; + struct mutex amp_ctrls_lock; +}; + +struct a2mp_cmd { + __u8 code; + __u8 ident; + __le16 len; + __u8 data[0]; +} __packed; + +/* A2MP command codes */ +#define A2MP_COMMAND_REJ 0x01 +struct a2mp_cmd_rej { + __le16 reason; + __u8 data[0]; +} __packed; + +#define A2MP_DISCOVER_REQ 0x02 +struct a2mp_discov_req { + __le16 mtu; + __le16 ext_feat; +} __packed; + +struct a2mp_cl { + __u8 id; + __u8 type; + __u8 status; +} __packed; + +#define A2MP_DISCOVER_RSP 0x03 +struct a2mp_discov_rsp { + __le16 mtu; + __le16 ext_feat; + struct a2mp_cl cl[0]; +} __packed; + +#define A2MP_CHANGE_NOTIFY 0x04 +#define A2MP_CHANGE_RSP 0x05 + +#define A2MP_GETINFO_REQ 0x06 +struct a2mp_info_req { + __u8 id; +} __packed; + +#define A2MP_GETINFO_RSP 0x07 +struct a2mp_info_rsp { + __u8 id; + __u8 status; + __le32 total_bw; + __le32 max_bw; + __le32 min_latency; + __le16 pal_cap; + __le16 assoc_size; +} __packed; + +#define A2MP_GETAMPASSOC_REQ 0x08 +struct a2mp_amp_assoc_req { + __u8 id; +} __packed; + +#define A2MP_GETAMPASSOC_RSP 0x09 +struct a2mp_amp_assoc_rsp { + __u8 id; + __u8 status; + __u8 amp_assoc[0]; +} __packed; + +#define A2MP_CREATEPHYSLINK_REQ 0x0A +#define A2MP_DISCONNPHYSLINK_REQ 0x0C +struct a2mp_physlink_req { + __u8 local_id; + __u8 remote_id; + __u8 amp_assoc[0]; +} __packed; + +#define A2MP_CREATEPHYSLINK_RSP 0x0B +#define A2MP_DISCONNPHYSLINK_RSP 0x0D +struct a2mp_physlink_rsp { + __u8 local_id; + __u8 remote_id; + __u8 status; +} __packed; + +/* A2MP response status */ +#define A2MP_STATUS_SUCCESS 0x00 +#define A2MP_STATUS_INVALID_CTRL_ID 0x01 +#define A2MP_STATUS_UNABLE_START_LINK_CREATION 0x02 +#define A2MP_STATUS_NO_PHYSICAL_LINK_EXISTS 0x02 +#define A2MP_STATUS_COLLISION_OCCURED 0x03 +#define A2MP_STATUS_DISCONN_REQ_RECVD 0x04 +#define A2MP_STATUS_PHYS_LINK_EXISTS 0x05 +#define A2MP_STATUS_SECURITY_VIOLATION 0x06 + +extern struct list_head amp_mgr_list; +extern struct mutex amp_mgr_list_lock; + +struct amp_mgr *amp_mgr_get(struct amp_mgr *mgr); +int amp_mgr_put(struct amp_mgr *mgr); +u8 __next_ident(struct amp_mgr *mgr); +struct l2cap_chan *a2mp_channel_create(struct l2cap_conn *conn, + struct sk_buff *skb); +struct amp_mgr *amp_mgr_lookup_by_state(u8 state); +void a2mp_send(struct amp_mgr *mgr, u8 code, u8 ident, u16 len, void *data); +void a2mp_discover_amp(struct l2cap_chan *chan); +void a2mp_send_getinfo_rsp(struct hci_dev *hdev); +void a2mp_send_getampassoc_rsp(struct hci_dev *hdev, u8 status); +void a2mp_send_create_phy_link_req(struct hci_dev *hdev, u8 status); +void a2mp_send_create_phy_link_rsp(struct hci_dev *hdev, u8 status); + +#endif /* __A2MP_H */ diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c index 9096137c889c..f6a1671ea2ff 100644 --- a/net/bluetooth/af_bluetooth.c +++ b/net/bluetooth/af_bluetooth.c @@ -25,12 +25,13 @@ /* Bluetooth address family and sockets. */ #include <linux/module.h> +#include <linux/debugfs.h> #include <asm/ioctls.h> #include <net/bluetooth/bluetooth.h> #include <linux/proc_fs.h> -#define VERSION "2.16" +#define VERSION "2.17" /* Bluetooth sockets */ #define BT_MAX_PROTO 8 @@ -221,12 +222,12 @@ int bt_sock_recvmsg(struct kiocb *iocb, struct socket *sock, if (flags & (MSG_OOB)) return -EOPNOTSUPP; - msg->msg_namelen = 0; - skb = skb_recv_datagram(sk, flags, noblock, &err); if (!skb) { - if (sk->sk_shutdown & RCV_SHUTDOWN) + if (sk->sk_shutdown & RCV_SHUTDOWN) { + msg->msg_namelen = 0; return 0; + } return err; } @@ -238,9 +239,16 @@ int bt_sock_recvmsg(struct kiocb *iocb, struct socket *sock, skb_reset_transport_header(skb); err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied); - if (err == 0) + if (err == 0) { sock_recv_ts_and_drops(msg, sk, skb); + if (bt_sk(sk)->skb_msg_name) + bt_sk(sk)->skb_msg_name(skb, msg->msg_name, + &msg->msg_namelen); + else + msg->msg_namelen = 0; + } + skb_free_datagram(sk, skb); return err ? : copied; @@ -490,6 +498,7 @@ int bt_sock_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) } EXPORT_SYMBOL(bt_sock_ioctl); +/* This function expects the sk lock to be held when called */ int bt_sock_wait_state(struct sock *sk, int state, unsigned long timeo) { DECLARE_WAITQUEUE(wait, current); @@ -525,6 +534,46 @@ int bt_sock_wait_state(struct sock *sk, int state, unsigned long timeo) } EXPORT_SYMBOL(bt_sock_wait_state); +/* This function expects the sk lock to be held when called */ +int bt_sock_wait_ready(struct sock *sk, unsigned long flags) +{ + DECLARE_WAITQUEUE(wait, current); + unsigned long timeo; + int err = 0; + + BT_DBG("sk %p", sk); + + timeo = sock_sndtimeo(sk, flags & O_NONBLOCK); + + add_wait_queue(sk_sleep(sk), &wait); + set_current_state(TASK_INTERRUPTIBLE); + while (test_bit(BT_SK_SUSPEND, &bt_sk(sk)->flags)) { + if (!timeo) { + err = -EAGAIN; + break; + } + + if (signal_pending(current)) { + err = sock_intr_errno(timeo); + break; + } + + release_sock(sk); + timeo = schedule_timeout(timeo); + lock_sock(sk); + set_current_state(TASK_INTERRUPTIBLE); + + err = sock_error(sk); + if (err) + break; + } + __set_current_state(TASK_RUNNING); + remove_wait_queue(sk_sleep(sk), &wait); + + return err; +} +EXPORT_SYMBOL(bt_sock_wait_ready); + #ifdef CONFIG_PROC_FS struct bt_seq_state { struct bt_sock_list *l; @@ -563,7 +612,7 @@ static int bt_seq_show(struct seq_file *seq, void *v) struct bt_sock_list *l = s->l; if (v == SEQ_START_TOKEN) { - seq_puts(seq ,"sk RefCnt Rmem Wmem User Inode Src Dst Parent"); + seq_puts(seq ,"sk RefCnt Rmem Wmem User Inode Parent"); if (l->custom_seq_show) { seq_putc(seq, ' '); @@ -576,15 +625,13 @@ static int bt_seq_show(struct seq_file *seq, void *v) struct bt_sock *bt = bt_sk(sk); seq_printf(seq, - "%pK %-6d %-6u %-6u %-6u %-6lu %pMR %pMR %-6lu", + "%pK %-6d %-6u %-6u %-6u %-6lu %-6lu", sk, atomic_read(&sk->sk_refcnt), sk_rmem_alloc_get(sk), sk_wmem_alloc_get(sk), from_kuid(seq_user_ns(seq), sock_i_uid(sk)), sock_i_ino(sk), - &bt->src, - &bt->dst, bt->parent? sock_i_ino(bt->parent): 0LU); if (l->custom_seq_show) { @@ -662,12 +709,17 @@ static struct net_proto_family bt_sock_family_ops = { .create = bt_sock_create, }; +struct dentry *bt_debugfs; +EXPORT_SYMBOL_GPL(bt_debugfs); + static int __init bt_init(void) { int err; BT_INFO("Core ver %s", VERSION); + bt_debugfs = debugfs_create_dir("bluetooth", NULL); + err = bt_sysfs_init(); if (err < 0) return err; @@ -708,7 +760,6 @@ error: static void __exit bt_exit(void) { - sco_exit(); l2cap_exit(); @@ -718,6 +769,8 @@ static void __exit bt_exit(void) sock_unregister(PF_BLUETOOTH); bt_sysfs_cleanup(); + + debugfs_remove_recursive(bt_debugfs); } subsys_initcall(bt_init); diff --git a/net/bluetooth/amp.c b/net/bluetooth/amp.c index d459ed43c779..bb39509b3f06 100644 --- a/net/bluetooth/amp.c +++ b/net/bluetooth/amp.c @@ -14,10 +14,11 @@ #include <net/bluetooth/bluetooth.h> #include <net/bluetooth/hci.h> #include <net/bluetooth/hci_core.h> -#include <net/bluetooth/a2mp.h> -#include <net/bluetooth/amp.h> #include <crypto/hash.h> +#include "a2mp.h" +#include "amp.h" + /* Remote AMP Controllers interface */ void amp_ctrl_get(struct amp_ctrl *ctrl) { @@ -110,7 +111,7 @@ static u8 __next_handle(struct amp_mgr *mgr) struct hci_conn *phylink_add(struct hci_dev *hdev, struct amp_mgr *mgr, u8 remote_id, bool out) { - bdaddr_t *dst = mgr->l2cap_conn->dst; + bdaddr_t *dst = &mgr->l2cap_conn->hcon->dst; struct hci_conn *hcon; hcon = hci_conn_add(hdev, AMP_LINK, dst); @@ -409,7 +410,8 @@ void amp_create_logical_link(struct l2cap_chan *chan) struct hci_cp_create_accept_logical_link cp; struct hci_dev *hdev; - BT_DBG("chan %p hs_hcon %p dst %pMR", chan, hs_hcon, chan->conn->dst); + BT_DBG("chan %p hs_hcon %p dst %pMR", chan, hs_hcon, + &chan->conn->hcon->dst); if (!hs_hcon) return; diff --git a/net/bluetooth/amp.h b/net/bluetooth/amp.h new file mode 100644 index 000000000000..7ea3db77ba89 --- /dev/null +++ b/net/bluetooth/amp.h @@ -0,0 +1,54 @@ +/* + Copyright (c) 2011,2012 Intel Corp. + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License version 2 and + only version 2 as published by the Free Software Foundation. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. +*/ + +#ifndef __AMP_H +#define __AMP_H + +struct amp_ctrl { + struct list_head list; + struct kref kref; + __u8 id; + __u16 assoc_len_so_far; + __u16 assoc_rem_len; + __u16 assoc_len; + __u8 *assoc; +}; + +int amp_ctrl_put(struct amp_ctrl *ctrl); +void amp_ctrl_get(struct amp_ctrl *ctrl); +struct amp_ctrl *amp_ctrl_add(struct amp_mgr *mgr, u8 id); +struct amp_ctrl *amp_ctrl_lookup(struct amp_mgr *mgr, u8 id); +void amp_ctrl_list_flush(struct amp_mgr *mgr); + +struct hci_conn *phylink_add(struct hci_dev *hdev, struct amp_mgr *mgr, + u8 remote_id, bool out); + +int phylink_gen_key(struct hci_conn *hcon, u8 *data, u8 *len, u8 *type); + +void amp_read_loc_info(struct hci_dev *hdev, struct amp_mgr *mgr); +void amp_read_loc_assoc_frag(struct hci_dev *hdev, u8 phy_handle); +void amp_read_loc_assoc(struct hci_dev *hdev, struct amp_mgr *mgr); +void amp_read_loc_assoc_final_data(struct hci_dev *hdev, + struct hci_conn *hcon); +void amp_create_phylink(struct hci_dev *hdev, struct amp_mgr *mgr, + struct hci_conn *hcon); +void amp_accept_phylink(struct hci_dev *hdev, struct amp_mgr *mgr, + struct hci_conn *hcon); +void amp_write_remote_assoc(struct hci_dev *hdev, u8 handle); +void amp_write_rem_assoc_continue(struct hci_dev *hdev, u8 handle); +void amp_physical_cfm(struct hci_conn *bredr_hcon, struct hci_conn *hs_hcon); +void amp_create_logical_link(struct l2cap_chan *chan); +void amp_disconnect_logical_link(struct hci_chan *hchan); +void amp_destroy_logical_link(struct hci_chan *hchan, u8 reason); + +#endif /* __AMP_H */ diff --git a/net/bluetooth/bnep/core.c b/net/bluetooth/bnep/core.c index e430b1abcd2f..a841d3e776c5 100644 --- a/net/bluetooth/bnep/core.c +++ b/net/bluetooth/bnep/core.c @@ -32,6 +32,7 @@ #include <asm/unaligned.h> #include <net/bluetooth/bluetooth.h> +#include <net/bluetooth/l2cap.h> #include <net/bluetooth/hci_core.h> #include "bnep.h" @@ -510,20 +511,13 @@ static int bnep_session(void *arg) static struct device *bnep_get_device(struct bnep_session *session) { - bdaddr_t *src = &bt_sk(session->sock->sk)->src; - bdaddr_t *dst = &bt_sk(session->sock->sk)->dst; - struct hci_dev *hdev; struct hci_conn *conn; - hdev = hci_get_route(dst, src); - if (!hdev) + conn = l2cap_pi(session->sock->sk)->chan->conn->hcon; + if (!conn) return NULL; - conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, dst); - - hci_dev_put(hdev); - - return conn ? &conn->dev : NULL; + return &conn->dev; } static struct device_type bnep_type = { @@ -539,8 +533,8 @@ int bnep_add_connection(struct bnep_connadd_req *req, struct socket *sock) BT_DBG(""); - baswap((void *) dst, &bt_sk(sock->sk)->dst); - baswap((void *) src, &bt_sk(sock->sk)->src); + baswap((void *) dst, &l2cap_pi(sock->sk)->chan->dst); + baswap((void *) src, &l2cap_pi(sock->sk)->chan->src); /* session struct allocated as private part of net_device */ dev = alloc_netdev(sizeof(struct bnep_session), diff --git a/net/bluetooth/cmtp/core.c b/net/bluetooth/cmtp/core.c index e0a6ebf2baa6..67fe5e84e68f 100644 --- a/net/bluetooth/cmtp/core.c +++ b/net/bluetooth/cmtp/core.c @@ -340,20 +340,20 @@ int cmtp_add_connection(struct cmtp_connadd_req *req, struct socket *sock) down_write(&cmtp_session_sem); - s = __cmtp_get_session(&bt_sk(sock->sk)->dst); + s = __cmtp_get_session(&l2cap_pi(sock->sk)->chan->dst); if (s && s->state == BT_CONNECTED) { err = -EEXIST; goto failed; } - bacpy(&session->bdaddr, &bt_sk(sock->sk)->dst); + bacpy(&session->bdaddr, &l2cap_pi(sock->sk)->chan->dst); session->mtu = min_t(uint, l2cap_pi(sock->sk)->chan->omtu, l2cap_pi(sock->sk)->chan->imtu); BT_DBG("mtu %d", session->mtu); - sprintf(session->name, "%pMR", &bt_sk(sock->sk)->dst); + sprintf(session->name, "%pMR", &session->bdaddr); session->sock = sock; session->state = BT_CONFIG; diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c index f0817121ec5e..ba5366c320da 100644 --- a/net/bluetooth/hci_conn.c +++ b/net/bluetooth/hci_conn.c @@ -28,8 +28,9 @@ #include <net/bluetooth/bluetooth.h> #include <net/bluetooth/hci_core.h> -#include <net/bluetooth/a2mp.h> -#include <net/bluetooth/smp.h> + +#include "smp.h" +#include "a2mp.h" struct sco_param { u16 pkt_type; @@ -49,30 +50,6 @@ static const struct sco_param sco_param_wideband[] = { { EDR_ESCO_MASK | ESCO_EV3, 0x0008 }, /* T1 */ }; -static void hci_le_create_connection(struct hci_conn *conn) -{ - struct hci_dev *hdev = conn->hdev; - struct hci_cp_le_create_conn cp; - - conn->state = BT_CONNECT; - conn->out = true; - conn->link_mode |= HCI_LM_MASTER; - conn->sec_level = BT_SECURITY_LOW; - - memset(&cp, 0, sizeof(cp)); - cp.scan_interval = __constant_cpu_to_le16(0x0060); - cp.scan_window = __constant_cpu_to_le16(0x0030); - bacpy(&cp.peer_addr, &conn->dst); - cp.peer_addr_type = conn->dst_type; - cp.conn_interval_min = __constant_cpu_to_le16(0x0028); - cp.conn_interval_max = __constant_cpu_to_le16(0x0038); - cp.supervision_timeout = __constant_cpu_to_le16(0x002a); - cp.min_ce_len = __constant_cpu_to_le16(0x0000); - cp.max_ce_len = __constant_cpu_to_le16(0x0000); - - hci_send_cmd(hdev, HCI_OP_LE_CREATE_CONN, sizeof(cp), &cp); -} - static void hci_le_create_connection_cancel(struct hci_conn *conn) { hci_send_cmd(conn->hdev, HCI_OP_LE_CREATE_CONN_CANCEL, 0, NULL); @@ -340,8 +317,10 @@ static void hci_conn_timeout(struct work_struct *work) } /* Enter sniff mode */ -static void hci_conn_enter_sniff_mode(struct hci_conn *conn) +static void hci_conn_idle(struct work_struct *work) { + struct hci_conn *conn = container_of(work, struct hci_conn, + idle_work.work); struct hci_dev *hdev = conn->hdev; BT_DBG("hcon %p mode %d", conn, conn->mode); @@ -375,21 +354,12 @@ static void hci_conn_enter_sniff_mode(struct hci_conn *conn) } } -static void hci_conn_idle(unsigned long arg) -{ - struct hci_conn *conn = (void *) arg; - - BT_DBG("hcon %p mode %d", conn, conn->mode); - - hci_conn_enter_sniff_mode(conn); -} - -static void hci_conn_auto_accept(unsigned long arg) +static void hci_conn_auto_accept(struct work_struct *work) { - struct hci_conn *conn = (void *) arg; - struct hci_dev *hdev = conn->hdev; + struct hci_conn *conn = container_of(work, struct hci_conn, + auto_accept_work.work); - hci_send_cmd(hdev, HCI_OP_USER_CONFIRM_REPLY, sizeof(conn->dst), + hci_send_cmd(conn->hdev, HCI_OP_USER_CONFIRM_REPLY, sizeof(conn->dst), &conn->dst); } @@ -404,6 +374,7 @@ struct hci_conn *hci_conn_add(struct hci_dev *hdev, int type, bdaddr_t *dst) return NULL; bacpy(&conn->dst, dst); + bacpy(&conn->src, &hdev->bdaddr); conn->hdev = hdev; conn->type = type; conn->mode = HCI_CM_ACTIVE; @@ -437,9 +408,8 @@ struct hci_conn *hci_conn_add(struct hci_dev *hdev, int type, bdaddr_t *dst) INIT_LIST_HEAD(&conn->chan_list); INIT_DELAYED_WORK(&conn->disc_work, hci_conn_timeout); - setup_timer(&conn->idle_timer, hci_conn_idle, (unsigned long)conn); - setup_timer(&conn->auto_accept_timer, hci_conn_auto_accept, - (unsigned long) conn); + INIT_DELAYED_WORK(&conn->auto_accept_work, hci_conn_auto_accept); + INIT_DELAYED_WORK(&conn->idle_work, hci_conn_idle); atomic_set(&conn->refcnt, 0); @@ -460,11 +430,9 @@ int hci_conn_del(struct hci_conn *conn) BT_DBG("%s hcon %p handle %d", hdev->name, conn, conn->handle); - del_timer(&conn->idle_timer); - cancel_delayed_work_sync(&conn->disc_work); - - del_timer(&conn->auto_accept_timer); + cancel_delayed_work_sync(&conn->auto_accept_work); + cancel_delayed_work_sync(&conn->idle_work); if (conn->type == ACL_LINK) { struct hci_conn *sco = conn->link; @@ -518,6 +486,7 @@ struct hci_dev *hci_get_route(bdaddr_t *dst, bdaddr_t *src) list_for_each_entry(d, &hci_dev_list, list) { if (!test_bit(HCI_UP, &d->flags) || test_bit(HCI_RAW, &d->flags) || + test_bit(HCI_USER_CHANNEL, &d->dev_flags) || d->dev_type != HCI_BREDR) continue; @@ -545,34 +514,124 @@ struct hci_dev *hci_get_route(bdaddr_t *dst, bdaddr_t *src) } EXPORT_SYMBOL(hci_get_route); +static void create_le_conn_complete(struct hci_dev *hdev, u8 status) +{ + struct hci_conn *conn; + + if (status == 0) + return; + + BT_ERR("HCI request failed to create LE connection: status 0x%2.2x", + status); + + hci_dev_lock(hdev); + + conn = hci_conn_hash_lookup_state(hdev, LE_LINK, BT_CONNECT); + if (!conn) + goto done; + + conn->state = BT_CLOSED; + + mgmt_connect_failed(hdev, &conn->dst, conn->type, conn->dst_type, + status); + + hci_proto_connect_cfm(conn, status); + + hci_conn_del(conn); + +done: + hci_dev_unlock(hdev); +} + +static int hci_create_le_conn(struct hci_conn *conn) +{ + struct hci_dev *hdev = conn->hdev; + struct hci_cp_le_create_conn cp; + struct hci_request req; + int err; + + hci_req_init(&req, hdev); + + memset(&cp, 0, sizeof(cp)); + cp.scan_interval = cpu_to_le16(hdev->le_scan_interval); + cp.scan_window = cpu_to_le16(hdev->le_scan_window); + bacpy(&cp.peer_addr, &conn->dst); + cp.peer_addr_type = conn->dst_type; + cp.own_address_type = conn->src_type; + cp.conn_interval_min = cpu_to_le16(hdev->le_conn_min_interval); + cp.conn_interval_max = cpu_to_le16(hdev->le_conn_max_interval); + cp.supervision_timeout = __constant_cpu_to_le16(0x002a); + cp.min_ce_len = __constant_cpu_to_le16(0x0000); + cp.max_ce_len = __constant_cpu_to_le16(0x0000); + + hci_req_add(&req, HCI_OP_LE_CREATE_CONN, sizeof(cp), &cp); + + err = hci_req_run(&req, create_le_conn_complete); + if (err) { + hci_conn_del(conn); + return err; + } + + return 0; +} + static struct hci_conn *hci_connect_le(struct hci_dev *hdev, bdaddr_t *dst, u8 dst_type, u8 sec_level, u8 auth_type) { - struct hci_conn *le; + struct hci_conn *conn; + int err; - if (test_bit(HCI_LE_PERIPHERAL, &hdev->flags)) + if (test_bit(HCI_ADVERTISING, &hdev->flags)) return ERR_PTR(-ENOTSUPP); - le = hci_conn_hash_lookup_ba(hdev, LE_LINK, dst); - if (!le) { - le = hci_conn_hash_lookup_state(hdev, LE_LINK, BT_CONNECT); - if (le) - return ERR_PTR(-EBUSY); + /* Some devices send ATT messages as soon as the physical link is + * established. To be able to handle these ATT messages, the user- + * space first establishes the connection and then starts the pairing + * process. + * + * So if a hci_conn object already exists for the following connection + * attempt, we simply update pending_sec_level and auth_type fields + * and return the object found. + */ + conn = hci_conn_hash_lookup_ba(hdev, LE_LINK, dst); + if (conn) { + conn->pending_sec_level = sec_level; + conn->auth_type = auth_type; + goto done; + } - le = hci_conn_add(hdev, LE_LINK, dst); - if (!le) - return ERR_PTR(-ENOMEM); + /* Since the controller supports only one LE connection attempt at a + * time, we return -EBUSY if there is any connection attempt running. + */ + conn = hci_conn_hash_lookup_state(hdev, LE_LINK, BT_CONNECT); + if (conn) + return ERR_PTR(-EBUSY); - le->dst_type = bdaddr_to_le(dst_type); - hci_le_create_connection(le); - } + conn = hci_conn_add(hdev, LE_LINK, dst); + if (!conn) + return ERR_PTR(-ENOMEM); - le->pending_sec_level = sec_level; - le->auth_type = auth_type; + if (dst_type == BDADDR_LE_PUBLIC) + conn->dst_type = ADDR_LE_DEV_PUBLIC; + else + conn->dst_type = ADDR_LE_DEV_RANDOM; - hci_conn_hold(le); + conn->src_type = hdev->own_addr_type; - return le; + conn->state = BT_CONNECT; + conn->out = true; + conn->link_mode |= HCI_LM_MASTER; + conn->sec_level = BT_SECURITY_LOW; + conn->pending_sec_level = sec_level; + conn->auth_type = auth_type; + + err = hci_create_le_conn(conn); + if (err) + return ERR_PTR(err); + +done: + hci_conn_hold(conn); + return conn; } static struct hci_conn *hci_connect_acl(struct hci_dev *hdev, bdaddr_t *dst, @@ -580,6 +639,9 @@ static struct hci_conn *hci_connect_acl(struct hci_dev *hdev, bdaddr_t *dst, { struct hci_conn *acl; + if (!test_bit(HCI_BREDR_ENABLED, &hdev->dev_flags)) + return ERR_PTR(-ENOTSUPP); + acl = hci_conn_hash_lookup_ba(hdev, ACL_LINK, dst); if (!acl) { acl = hci_conn_add(hdev, ACL_LINK, dst); @@ -846,8 +908,8 @@ void hci_conn_enter_active_mode(struct hci_conn *conn, __u8 force_active) timer: if (hdev->idle_timeout > 0) - mod_timer(&conn->idle_timer, - jiffies + msecs_to_jiffies(hdev->idle_timeout)); + queue_delayed_work(hdev->workqueue, &conn->idle_work, + msecs_to_jiffies(hdev->idle_timeout)); } /* Drop all connection on the device */ diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index fb7356fcfe51..6ccc4eb9e55e 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -27,8 +27,9 @@ #include <linux/export.h> #include <linux/idr.h> - #include <linux/rfkill.h> +#include <linux/debugfs.h> +#include <asm/unaligned.h> #include <net/bluetooth/bluetooth.h> #include <net/bluetooth/hci_core.h> @@ -55,6 +56,586 @@ static void hci_notify(struct hci_dev *hdev, int event) hci_sock_dev_event(hdev, event); } +/* ---- HCI debugfs entries ---- */ + +static ssize_t dut_mode_read(struct file *file, char __user *user_buf, + size_t count, loff_t *ppos) +{ + struct hci_dev *hdev = file->private_data; + char buf[3]; + + buf[0] = test_bit(HCI_DUT_MODE, &hdev->dev_flags) ? 'Y': 'N'; + buf[1] = '\n'; + buf[2] = '\0'; + return simple_read_from_buffer(user_buf, count, ppos, buf, 2); +} + +static ssize_t dut_mode_write(struct file *file, const char __user *user_buf, + size_t count, loff_t *ppos) +{ + struct hci_dev *hdev = file->private_data; + struct sk_buff *skb; + char buf[32]; + size_t buf_size = min(count, (sizeof(buf)-1)); + bool enable; + int err; + + if (!test_bit(HCI_UP, &hdev->flags)) + return -ENETDOWN; + + if (copy_from_user(buf, user_buf, buf_size)) + return -EFAULT; + + buf[buf_size] = '\0'; + if (strtobool(buf, &enable)) + return -EINVAL; + + if (enable == test_bit(HCI_DUT_MODE, &hdev->dev_flags)) + return -EALREADY; + + hci_req_lock(hdev); + if (enable) + skb = __hci_cmd_sync(hdev, HCI_OP_ENABLE_DUT_MODE, 0, NULL, + HCI_CMD_TIMEOUT); + else + skb = __hci_cmd_sync(hdev, HCI_OP_RESET, 0, NULL, + HCI_CMD_TIMEOUT); + hci_req_unlock(hdev); + + if (IS_ERR(skb)) + return PTR_ERR(skb); + + err = -bt_to_errno(skb->data[0]); + kfree_skb(skb); + + if (err < 0) + return err; + + change_bit(HCI_DUT_MODE, &hdev->dev_flags); + + return count; +} + +static const struct file_operations dut_mode_fops = { + .open = simple_open, + .read = dut_mode_read, + .write = dut_mode_write, + .llseek = default_llseek, +}; + +static int features_show(struct seq_file *f, void *ptr) +{ + struct hci_dev *hdev = f->private; + u8 p; + + hci_dev_lock(hdev); + for (p = 0; p < HCI_MAX_PAGES && p <= hdev->max_page; p++) { + seq_printf(f, "%2u: 0x%2.2x 0x%2.2x 0x%2.2x 0x%2.2x " + "0x%2.2x 0x%2.2x 0x%2.2x 0x%2.2x\n", p, + hdev->features[p][0], hdev->features[p][1], + hdev->features[p][2], hdev->features[p][3], + hdev->features[p][4], hdev->features[p][5], + hdev->features[p][6], hdev->features[p][7]); + } + if (lmp_le_capable(hdev)) + seq_printf(f, "LE: 0x%2.2x 0x%2.2x 0x%2.2x 0x%2.2x " + "0x%2.2x 0x%2.2x 0x%2.2x 0x%2.2x\n", + hdev->le_features[0], hdev->le_features[1], + hdev->le_features[2], hdev->le_features[3], + hdev->le_features[4], hdev->le_features[5], + hdev->le_features[6], hdev->le_features[7]); + hci_dev_unlock(hdev); + + return 0; +} + +static int features_open(struct inode *inode, struct file *file) +{ + return single_open(file, features_show, inode->i_private); +} + +static const struct file_operations features_fops = { + .open = features_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + +static int blacklist_show(struct seq_file *f, void *p) +{ + struct hci_dev *hdev = f->private; + struct bdaddr_list *b; + + hci_dev_lock(hdev); + list_for_each_entry(b, &hdev->blacklist, list) + seq_printf(f, "%pMR (type %u)\n", &b->bdaddr, b->bdaddr_type); + hci_dev_unlock(hdev); + + return 0; +} + +static int blacklist_open(struct inode *inode, struct file *file) +{ + return single_open(file, blacklist_show, inode->i_private); +} + +static const struct file_operations blacklist_fops = { + .open = blacklist_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + +static int uuids_show(struct seq_file *f, void *p) +{ + struct hci_dev *hdev = f->private; + struct bt_uuid *uuid; + + hci_dev_lock(hdev); + list_for_each_entry(uuid, &hdev->uuids, list) { + u8 i, val[16]; + + /* The Bluetooth UUID values are stored in big endian, + * but with reversed byte order. So convert them into + * the right order for the %pUb modifier. + */ + for (i = 0; i < 16; i++) + val[i] = uuid->uuid[15 - i]; + + seq_printf(f, "%pUb\n", val); + } + hci_dev_unlock(hdev); + + return 0; +} + +static int uuids_open(struct inode *inode, struct file *file) +{ + return single_open(file, uuids_show, inode->i_private); +} + +static const struct file_operations uuids_fops = { + .open = uuids_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + +static int inquiry_cache_show(struct seq_file *f, void *p) +{ + struct hci_dev *hdev = f->private; + struct discovery_state *cache = &hdev->discovery; + struct inquiry_entry *e; + + hci_dev_lock(hdev); + + list_for_each_entry(e, &cache->all, all) { + struct inquiry_data *data = &e->data; + seq_printf(f, "%pMR %d %d %d 0x%.2x%.2x%.2x 0x%.4x %d %d %u\n", + &data->bdaddr, + data->pscan_rep_mode, data->pscan_period_mode, + data->pscan_mode, data->dev_class[2], + data->dev_class[1], data->dev_class[0], + __le16_to_cpu(data->clock_offset), + data->rssi, data->ssp_mode, e->timestamp); + } + + hci_dev_unlock(hdev); + + return 0; +} + +static int inquiry_cache_open(struct inode *inode, struct file *file) +{ + return single_open(file, inquiry_cache_show, inode->i_private); +} + +static const struct file_operations inquiry_cache_fops = { + .open = inquiry_cache_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + +static int link_keys_show(struct seq_file *f, void *ptr) +{ + struct hci_dev *hdev = f->private; + struct list_head *p, *n; + + hci_dev_lock(hdev); + list_for_each_safe(p, n, &hdev->link_keys) { + struct link_key *key = list_entry(p, struct link_key, list); + seq_printf(f, "%pMR %u %*phN %u\n", &key->bdaddr, key->type, + HCI_LINK_KEY_SIZE, key->val, key->pin_len); + } + hci_dev_unlock(hdev); + + return 0; +} + +static int link_keys_open(struct inode *inode, struct file *file) +{ + return single_open(file, link_keys_show, inode->i_private); +} + +static const struct file_operations link_keys_fops = { + .open = link_keys_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + +static ssize_t use_debug_keys_read(struct file *file, char __user *user_buf, + size_t count, loff_t *ppos) +{ + struct hci_dev *hdev = file->private_data; + char buf[3]; + + buf[0] = test_bit(HCI_DEBUG_KEYS, &hdev->dev_flags) ? 'Y': 'N'; + buf[1] = '\n'; + buf[2] = '\0'; + return simple_read_from_buffer(user_buf, count, ppos, buf, 2); +} + +static const struct file_operations use_debug_keys_fops = { + .open = simple_open, + .read = use_debug_keys_read, + .llseek = default_llseek, +}; + +static int dev_class_show(struct seq_file *f, void *ptr) +{ + struct hci_dev *hdev = f->private; + + hci_dev_lock(hdev); + seq_printf(f, "0x%.2x%.2x%.2x\n", hdev->dev_class[2], + hdev->dev_class[1], hdev->dev_class[0]); + hci_dev_unlock(hdev); + + return 0; +} + +static int dev_class_open(struct inode *inode, struct file *file) +{ + return single_open(file, dev_class_show, inode->i_private); +} + +static const struct file_operations dev_class_fops = { + .open = dev_class_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + +static int voice_setting_get(void *data, u64 *val) +{ + struct hci_dev *hdev = data; + + hci_dev_lock(hdev); + *val = hdev->voice_setting; + hci_dev_unlock(hdev); + + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(voice_setting_fops, voice_setting_get, + NULL, "0x%4.4llx\n"); + +static int auto_accept_delay_set(void *data, u64 val) +{ + struct hci_dev *hdev = data; + + hci_dev_lock(hdev); + hdev->auto_accept_delay = val; + hci_dev_unlock(hdev); + + return 0; +} + +static int auto_accept_delay_get(void *data, u64 *val) +{ + struct hci_dev *hdev = data; + + hci_dev_lock(hdev); + *val = hdev->auto_accept_delay; + hci_dev_unlock(hdev); + + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(auto_accept_delay_fops, auto_accept_delay_get, + auto_accept_delay_set, "%llu\n"); + +static int ssp_debug_mode_set(void *data, u64 val) +{ + struct hci_dev *hdev = data; + struct sk_buff *skb; + __u8 mode; + int err; + + if (val != 0 && val != 1) + return -EINVAL; + + if (!test_bit(HCI_UP, &hdev->flags)) + return -ENETDOWN; + + hci_req_lock(hdev); + mode = val; + skb = __hci_cmd_sync(hdev, HCI_OP_WRITE_SSP_DEBUG_MODE, sizeof(mode), + &mode, HCI_CMD_TIMEOUT); + hci_req_unlock(hdev); + + if (IS_ERR(skb)) + return PTR_ERR(skb); + + err = -bt_to_errno(skb->data[0]); + kfree_skb(skb); + + if (err < 0) + return err; + + hci_dev_lock(hdev); + hdev->ssp_debug_mode = val; + hci_dev_unlock(hdev); + + return 0; +} + +static int ssp_debug_mode_get(void *data, u64 *val) +{ + struct hci_dev *hdev = data; + + hci_dev_lock(hdev); + *val = hdev->ssp_debug_mode; + hci_dev_unlock(hdev); + + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(ssp_debug_mode_fops, ssp_debug_mode_get, + ssp_debug_mode_set, "%llu\n"); + +static int idle_timeout_set(void *data, u64 val) +{ + struct hci_dev *hdev = data; + + if (val != 0 && (val < 500 || val > 3600000)) + return -EINVAL; + + hci_dev_lock(hdev); + hdev->idle_timeout = val; + hci_dev_unlock(hdev); + + return 0; +} + +static int idle_timeout_get(void *data, u64 *val) +{ + struct hci_dev *hdev = data; + + hci_dev_lock(hdev); + *val = hdev->idle_timeout; + hci_dev_unlock(hdev); + + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(idle_timeout_fops, idle_timeout_get, + idle_timeout_set, "%llu\n"); + +static int sniff_min_interval_set(void *data, u64 val) +{ + struct hci_dev *hdev = data; + + if (val == 0 || val % 2 || val > hdev->sniff_max_interval) + return -EINVAL; + + hci_dev_lock(hdev); + hdev->sniff_min_interval = val; + hci_dev_unlock(hdev); + + return 0; +} + +static int sniff_min_interval_get(void *data, u64 *val) +{ + struct hci_dev *hdev = data; + + hci_dev_lock(hdev); + *val = hdev->sniff_min_interval; + hci_dev_unlock(hdev); + + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(sniff_min_interval_fops, sniff_min_interval_get, + sniff_min_interval_set, "%llu\n"); + +static int sniff_max_interval_set(void *data, u64 val) +{ + struct hci_dev *hdev = data; + + if (val == 0 || val % 2 || val < hdev->sniff_min_interval) + return -EINVAL; + + hci_dev_lock(hdev); + hdev->sniff_max_interval = val; + hci_dev_unlock(hdev); + + return 0; +} + +static int sniff_max_interval_get(void *data, u64 *val) +{ + struct hci_dev *hdev = data; + + hci_dev_lock(hdev); + *val = hdev->sniff_max_interval; + hci_dev_unlock(hdev); + + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(sniff_max_interval_fops, sniff_max_interval_get, + sniff_max_interval_set, "%llu\n"); + +static int static_address_show(struct seq_file *f, void *p) +{ + struct hci_dev *hdev = f->private; + + hci_dev_lock(hdev); + seq_printf(f, "%pMR\n", &hdev->static_addr); + hci_dev_unlock(hdev); + + return 0; +} + +static int static_address_open(struct inode *inode, struct file *file) +{ + return single_open(file, static_address_show, inode->i_private); +} + +static const struct file_operations static_address_fops = { + .open = static_address_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + +static int own_address_type_set(void *data, u64 val) +{ + struct hci_dev *hdev = data; + + if (val != 0 && val != 1) + return -EINVAL; + + hci_dev_lock(hdev); + hdev->own_addr_type = val; + hci_dev_unlock(hdev); + + return 0; +} + +static int own_address_type_get(void *data, u64 *val) +{ + struct hci_dev *hdev = data; + + hci_dev_lock(hdev); + *val = hdev->own_addr_type; + hci_dev_unlock(hdev); + + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(own_address_type_fops, own_address_type_get, + own_address_type_set, "%llu\n"); + +static int long_term_keys_show(struct seq_file *f, void *ptr) +{ + struct hci_dev *hdev = f->private; + struct list_head *p, *n; + + hci_dev_lock(hdev); + list_for_each_safe(p, n, &hdev->link_keys) { + struct smp_ltk *ltk = list_entry(p, struct smp_ltk, list); + seq_printf(f, "%pMR (type %u) %u %u %u %.4x %*phN %*phN\\n", + <k->bdaddr, ltk->bdaddr_type, ltk->authenticated, + ltk->type, ltk->enc_size, __le16_to_cpu(ltk->ediv), + 8, ltk->rand, 16, ltk->val); + } + hci_dev_unlock(hdev); + + return 0; +} + +static int long_term_keys_open(struct inode *inode, struct file *file) +{ + return single_open(file, long_term_keys_show, inode->i_private); +} + +static const struct file_operations long_term_keys_fops = { + .open = long_term_keys_open, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + +static int conn_min_interval_set(void *data, u64 val) +{ + struct hci_dev *hdev = data; + + if (val < 0x0006 || val > 0x0c80 || val > hdev->le_conn_max_interval) + return -EINVAL; + + hci_dev_lock(hdev); + hdev->le_conn_min_interval = val; + hci_dev_unlock(hdev); + + return 0; +} + +static int conn_min_interval_get(void *data, u64 *val) +{ + struct hci_dev *hdev = data; + + hci_dev_lock(hdev); + *val = hdev->le_conn_min_interval; + hci_dev_unlock(hdev); + + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(conn_min_interval_fops, conn_min_interval_get, + conn_min_interval_set, "%llu\n"); + +static int conn_max_interval_set(void *data, u64 val) +{ + struct hci_dev *hdev = data; + + if (val < 0x0006 || val > 0x0c80 || val < hdev->le_conn_min_interval) + return -EINVAL; + + hci_dev_lock(hdev); + hdev->le_conn_max_interval = val; + hci_dev_unlock(hdev); + + return 0; +} + +static int conn_max_interval_get(void *data, u64 *val) +{ + struct hci_dev *hdev = data; + + hci_dev_lock(hdev); + *val = hdev->le_conn_max_interval; + hci_dev_unlock(hdev); + + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(conn_max_interval_fops, conn_max_interval_get, + conn_max_interval_set, "%llu\n"); + /* ---- HCI requests ---- */ static void hci_req_sync_complete(struct hci_dev *hdev, u8 result) @@ -307,11 +888,23 @@ static void amp_init(struct hci_request *req) /* Read Local Version */ hci_req_add(req, HCI_OP_READ_LOCAL_VERSION, 0, NULL); + /* Read Local Supported Commands */ + hci_req_add(req, HCI_OP_READ_LOCAL_COMMANDS, 0, NULL); + + /* Read Local Supported Features */ + hci_req_add(req, HCI_OP_READ_LOCAL_FEATURES, 0, NULL); + /* Read Local AMP Info */ hci_req_add(req, HCI_OP_READ_LOCAL_AMP_INFO, 0, NULL); /* Read Data Blk size */ hci_req_add(req, HCI_OP_READ_DATA_BLOCK_SIZE, 0, NULL); + + /* Read Flow Control Mode */ + hci_req_add(req, HCI_OP_READ_FLOW_CONTROL_MODE, 0, NULL); + + /* Read Location Data */ + hci_req_add(req, HCI_OP_READ_LOCATION_DATA, 0, NULL); } static void hci_init1_req(struct hci_request *req, unsigned long opt) @@ -341,6 +934,8 @@ static void hci_init1_req(struct hci_request *req, unsigned long opt) static void bredr_setup(struct hci_request *req) { + struct hci_dev *hdev = req->hdev; + __le16 param; __u8 flt_type; @@ -356,6 +951,12 @@ static void bredr_setup(struct hci_request *req) /* Read Voice Setting */ hci_req_add(req, HCI_OP_READ_VOICE_SETTING, 0, NULL); + /* Read Number of Supported IAC */ + hci_req_add(req, HCI_OP_READ_NUM_SUPPORTED_IAC, 0, NULL); + + /* Read Current IAC LAP */ + hci_req_add(req, HCI_OP_READ_CURRENT_IAC_LAP, 0, NULL); + /* Clear Event Filters */ flt_type = HCI_FLT_CLEAR_ALL; hci_req_add(req, HCI_OP_SET_EVENT_FLT, 1, &flt_type); @@ -364,8 +965,10 @@ static void bredr_setup(struct hci_request *req) param = __constant_cpu_to_le16(0x7d00); hci_req_add(req, HCI_OP_WRITE_CA_TIMEOUT, 2, ¶m); - /* Read page scan parameters */ - if (req->hdev->hci_ver > BLUETOOTH_VER_1_1) { + /* AVM Berlin (31), aka "BlueFRITZ!", reports version 1.2, + * but it does not support page scan related HCI commands. + */ + if (hdev->manufacturer != 31 && hdev->hci_ver > BLUETOOTH_VER_1_1) { hci_req_add(req, HCI_OP_READ_PAGE_SCAN_ACTIVITY, 0, NULL); hci_req_add(req, HCI_OP_READ_PAGE_SCAN_TYPE, 0, NULL); } @@ -519,6 +1122,8 @@ static void hci_init2_req(struct hci_request *req, unsigned long opt) if (lmp_bredr_capable(hdev)) bredr_setup(req); + else + clear_bit(HCI_BREDR_ENABLED, &hdev->dev_flags); if (lmp_le_capable(hdev)) le_setup(req); @@ -532,6 +1137,14 @@ static void hci_init2_req(struct hci_request *req, unsigned long opt) hci_req_add(req, HCI_OP_READ_LOCAL_COMMANDS, 0, NULL); if (lmp_ssp_capable(hdev)) { + /* When SSP is available, then the host features page + * should also be available as well. However some + * controllers list the max_page as 0 as long as SSP + * has not been enabled. To achieve proper debugging + * output, force the minimum max_page to 1 at least. + */ + hdev->max_page = 0x01; + if (test_bit(HCI_SSP_ENABLED, &hdev->dev_flags)) { u8 mode = 0x01; hci_req_add(req, HCI_OP_WRITE_SSP_MODE, @@ -607,6 +1220,34 @@ static void hci_set_le_support(struct hci_request *req) &cp); } +static void hci_set_event_mask_page_2(struct hci_request *req) +{ + struct hci_dev *hdev = req->hdev; + u8 events[8] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }; + + /* If Connectionless Slave Broadcast master role is supported + * enable all necessary events for it. + */ + if (hdev->features[2][0] & 0x01) { + events[1] |= 0x40; /* Triggered Clock Capture */ + events[1] |= 0x80; /* Synchronization Train Complete */ + events[2] |= 0x10; /* Slave Page Response Timeout */ + events[2] |= 0x20; /* CSB Channel Map Change */ + } + + /* If Connectionless Slave Broadcast slave role is supported + * enable all necessary events for it. + */ + if (hdev->features[2][0] & 0x02) { + events[2] |= 0x01; /* Synchronization Train Received */ + events[2] |= 0x02; /* CSB Receive */ + events[2] |= 0x04; /* CSB Timeout */ + events[2] |= 0x08; /* Truncated Page Complete */ + } + + hci_req_add(req, HCI_OP_SET_EVENT_MASK_PAGE_2, sizeof(events), events); +} + static void hci_init3_req(struct hci_request *req, unsigned long opt) { struct hci_dev *hdev = req->hdev; @@ -634,8 +1275,17 @@ static void hci_init3_req(struct hci_request *req, unsigned long opt) hci_setup_link_policy(req); if (lmp_le_capable(hdev)) { + /* If the controller has a public BD_ADDR, then by + * default use that one. If this is a LE only + * controller without one, default to the random + * address. + */ + if (bacmp(&hdev->bdaddr, BDADDR_ANY)) + hdev->own_addr_type = ADDR_LE_DEV_PUBLIC; + else + hdev->own_addr_type = ADDR_LE_DEV_RANDOM; + hci_set_le_support(req); - hci_update_ad(req); } /* Read features beyond page 1 if available */ @@ -648,6 +1298,19 @@ static void hci_init3_req(struct hci_request *req, unsigned long opt) } } +static void hci_init4_req(struct hci_request *req, unsigned long opt) +{ + struct hci_dev *hdev = req->hdev; + + /* Set event mask page 2 if the HCI command for it is supported */ + if (hdev->commands[22] & 0x04) + hci_set_event_mask_page_2(req); + + /* Check for Synchronization Train support */ + if (hdev->features[2][0] & 0x04) + hci_req_add(req, HCI_OP_READ_SYNC_TRAIN_PARAMS, 0, NULL); +} + static int __hci_init(struct hci_dev *hdev) { int err; @@ -656,6 +1319,14 @@ static int __hci_init(struct hci_dev *hdev) if (err < 0) return err; + /* The Device Under Test (DUT) mode is special and available for + * all controller types. So just create it early on. + */ + if (test_bit(HCI_SETUP, &hdev->dev_flags)) { + debugfs_create_file("dut_mode", 0644, hdev->debugfs, hdev, + &dut_mode_fops); + } + /* HCI_BREDR covers both single-mode LE, BR/EDR and dual-mode * BR/EDR/LE type controllers. AMP controllers only need the * first stage init. @@ -667,7 +1338,75 @@ static int __hci_init(struct hci_dev *hdev) if (err < 0) return err; - return __hci_req_sync(hdev, hci_init3_req, 0, HCI_INIT_TIMEOUT); + err = __hci_req_sync(hdev, hci_init3_req, 0, HCI_INIT_TIMEOUT); + if (err < 0) + return err; + + err = __hci_req_sync(hdev, hci_init4_req, 0, HCI_INIT_TIMEOUT); + if (err < 0) + return err; + + /* Only create debugfs entries during the initial setup + * phase and not every time the controller gets powered on. + */ + if (!test_bit(HCI_SETUP, &hdev->dev_flags)) + return 0; + + debugfs_create_file("features", 0444, hdev->debugfs, hdev, + &features_fops); + debugfs_create_u16("manufacturer", 0444, hdev->debugfs, + &hdev->manufacturer); + debugfs_create_u8("hci_version", 0444, hdev->debugfs, &hdev->hci_ver); + debugfs_create_u16("hci_revision", 0444, hdev->debugfs, &hdev->hci_rev); + debugfs_create_file("blacklist", 0444, hdev->debugfs, hdev, + &blacklist_fops); + debugfs_create_file("uuids", 0444, hdev->debugfs, hdev, &uuids_fops); + + if (lmp_bredr_capable(hdev)) { + debugfs_create_file("inquiry_cache", 0444, hdev->debugfs, + hdev, &inquiry_cache_fops); + debugfs_create_file("link_keys", 0400, hdev->debugfs, + hdev, &link_keys_fops); + debugfs_create_file("use_debug_keys", 0444, hdev->debugfs, + hdev, &use_debug_keys_fops); + debugfs_create_file("dev_class", 0444, hdev->debugfs, + hdev, &dev_class_fops); + debugfs_create_file("voice_setting", 0444, hdev->debugfs, + hdev, &voice_setting_fops); + } + + if (lmp_ssp_capable(hdev)) { + debugfs_create_file("auto_accept_delay", 0644, hdev->debugfs, + hdev, &auto_accept_delay_fops); + debugfs_create_file("ssp_debug_mode", 0644, hdev->debugfs, + hdev, &ssp_debug_mode_fops); + } + + if (lmp_sniff_capable(hdev)) { + debugfs_create_file("idle_timeout", 0644, hdev->debugfs, + hdev, &idle_timeout_fops); + debugfs_create_file("sniff_min_interval", 0644, hdev->debugfs, + hdev, &sniff_min_interval_fops); + debugfs_create_file("sniff_max_interval", 0644, hdev->debugfs, + hdev, &sniff_max_interval_fops); + } + + if (lmp_le_capable(hdev)) { + debugfs_create_u8("white_list_size", 0444, hdev->debugfs, + &hdev->le_white_list_size); + debugfs_create_file("static_address", 0444, hdev->debugfs, + hdev, &static_address_fops); + debugfs_create_file("own_address_type", 0644, hdev->debugfs, + hdev, &own_address_type_fops); + debugfs_create_file("long_term_keys", 0400, hdev->debugfs, + hdev, &long_term_keys_fops); + debugfs_create_file("conn_min_interval", 0644, hdev->debugfs, + hdev, &conn_min_interval_fops); + debugfs_create_file("conn_max_interval", 0644, hdev->debugfs, + hdev, &conn_max_interval_fops); + } + + return 0; } static void hci_scan_req(struct hci_request *req, unsigned long opt) @@ -984,6 +1723,21 @@ int hci_inquiry(void __user *arg) if (!hdev) return -ENODEV; + if (test_bit(HCI_USER_CHANNEL, &hdev->dev_flags)) { + err = -EBUSY; + goto done; + } + + if (hdev->dev_type != HCI_BREDR) { + err = -EOPNOTSUPP; + goto done; + } + + if (!test_bit(HCI_BREDR_ENABLED, &hdev->dev_flags)) { + err = -EOPNOTSUPP; + goto done; + } + hci_dev_lock(hdev); if (inquiry_cache_age(hdev) > INQUIRY_CACHE_AGE_MAX || inquiry_cache_empty(hdev) || ir.flags & IREQ_CACHE_FLUSH) { @@ -1043,100 +1797,10 @@ done: return err; } -static u8 create_ad(struct hci_dev *hdev, u8 *ptr) +static int hci_dev_do_open(struct hci_dev *hdev) { - u8 ad_len = 0, flags = 0; - size_t name_len; - - if (test_bit(HCI_LE_PERIPHERAL, &hdev->dev_flags)) - flags |= LE_AD_GENERAL; - - if (!lmp_bredr_capable(hdev)) - flags |= LE_AD_NO_BREDR; - - if (lmp_le_br_capable(hdev)) - flags |= LE_AD_SIM_LE_BREDR_CTRL; - - if (lmp_host_le_br_capable(hdev)) - flags |= LE_AD_SIM_LE_BREDR_HOST; - - if (flags) { - BT_DBG("adv flags 0x%02x", flags); - - ptr[0] = 2; - ptr[1] = EIR_FLAGS; - ptr[2] = flags; - - ad_len += 3; - ptr += 3; - } - - if (hdev->adv_tx_power != HCI_TX_POWER_INVALID) { - ptr[0] = 2; - ptr[1] = EIR_TX_POWER; - ptr[2] = (u8) hdev->adv_tx_power; - - ad_len += 3; - ptr += 3; - } - - name_len = strlen(hdev->dev_name); - if (name_len > 0) { - size_t max_len = HCI_MAX_AD_LENGTH - ad_len - 2; - - if (name_len > max_len) { - name_len = max_len; - ptr[1] = EIR_NAME_SHORT; - } else - ptr[1] = EIR_NAME_COMPLETE; - - ptr[0] = name_len + 1; - - memcpy(ptr + 2, hdev->dev_name, name_len); - - ad_len += (name_len + 2); - ptr += (name_len + 2); - } - - return ad_len; -} - -void hci_update_ad(struct hci_request *req) -{ - struct hci_dev *hdev = req->hdev; - struct hci_cp_le_set_adv_data cp; - u8 len; - - if (!lmp_le_capable(hdev)) - return; - - memset(&cp, 0, sizeof(cp)); - - len = create_ad(hdev, cp.data); - - if (hdev->adv_data_len == len && - memcmp(cp.data, hdev->adv_data, len) == 0) - return; - - memcpy(hdev->adv_data, cp.data, sizeof(cp.data)); - hdev->adv_data_len = len; - - cp.length = len; - - hci_req_add(req, HCI_OP_LE_SET_ADV_DATA, sizeof(cp), &cp); -} - -/* ---- HCI ioctl helpers ---- */ - -int hci_dev_open(__u16 dev) -{ - struct hci_dev *hdev; int ret = 0; - hdev = hci_dev_get(dev); - if (!hdev) - return -ENODEV; - BT_DBG("%s %p", hdev->name, hdev); hci_req_lock(hdev); @@ -1146,13 +1810,29 @@ int hci_dev_open(__u16 dev) goto done; } - /* Check for rfkill but allow the HCI setup stage to proceed - * (which in itself doesn't cause any RF activity). - */ - if (test_bit(HCI_RFKILLED, &hdev->dev_flags) && - !test_bit(HCI_SETUP, &hdev->dev_flags)) { - ret = -ERFKILL; - goto done; + if (!test_bit(HCI_SETUP, &hdev->dev_flags)) { + /* Check for rfkill but allow the HCI setup stage to + * proceed (which in itself doesn't cause any RF activity). + */ + if (test_bit(HCI_RFKILLED, &hdev->dev_flags)) { + ret = -ERFKILL; + goto done; + } + + /* Check for valid public address or a configured static + * random adddress, but let the HCI setup proceed to + * be able to determine if there is a public address + * or not. + * + * This check is only valid for BR/EDR controllers + * since AMP controllers do not have an address. + */ + if (hdev->dev_type == HCI_BREDR && + !bacmp(&hdev->bdaddr, BDADDR_ANY) && + !bacmp(&hdev->static_addr, BDADDR_ANY)) { + ret = -EADDRNOTAVAIL; + goto done; + } } if (test_bit(HCI_UP, &hdev->flags)) { @@ -1172,16 +1852,11 @@ int hci_dev_open(__u16 dev) ret = hdev->setup(hdev); if (!ret) { - /* Treat all non BR/EDR controllers as raw devices if - * enable_hs is not set. - */ - if (hdev->dev_type != HCI_BREDR && !enable_hs) - set_bit(HCI_RAW, &hdev->flags); - if (test_bit(HCI_QUIRK_RAW_DEVICE, &hdev->quirks)) set_bit(HCI_RAW, &hdev->flags); - if (!test_bit(HCI_RAW, &hdev->flags)) + if (!test_bit(HCI_RAW, &hdev->flags) && + !test_bit(HCI_USER_CHANNEL, &hdev->dev_flags)) ret = __hci_init(hdev); } @@ -1192,7 +1867,8 @@ int hci_dev_open(__u16 dev) set_bit(HCI_UP, &hdev->flags); hci_notify(hdev, HCI_DEV_UP); if (!test_bit(HCI_SETUP, &hdev->dev_flags) && - mgmt_valid_hdev(hdev)) { + !test_bit(HCI_USER_CHANNEL, &hdev->dev_flags) && + hdev->dev_type == HCI_BREDR) { hci_dev_lock(hdev); mgmt_powered(hdev, 1); hci_dev_unlock(hdev); @@ -1220,10 +1896,41 @@ int hci_dev_open(__u16 dev) done: hci_req_unlock(hdev); - hci_dev_put(hdev); return ret; } +/* ---- HCI ioctl helpers ---- */ + +int hci_dev_open(__u16 dev) +{ + struct hci_dev *hdev; + int err; + + hdev = hci_dev_get(dev); + if (!hdev) + return -ENODEV; + + /* We need to ensure that no other power on/off work is pending + * before proceeding to call hci_dev_do_open. This is + * particularly important if the setup procedure has not yet + * completed. + */ + if (test_and_clear_bit(HCI_AUTO_OFF, &hdev->dev_flags)) + cancel_delayed_work(&hdev->power_off); + + /* After this call it is guaranteed that the setup procedure + * has finished. This means that error conditions like RFKILL + * or no valid public or static random address apply. + */ + flush_workqueue(hdev->req_workqueue); + + err = hci_dev_do_open(hdev); + + hci_dev_put(hdev); + + return err; +} + static int hci_dev_do_close(struct hci_dev *hdev) { BT_DBG("%s %p", hdev->name, hdev); @@ -1247,6 +1954,7 @@ static int hci_dev_do_close(struct hci_dev *hdev) cancel_delayed_work(&hdev->discov_off); hdev->discov_timeout = 0; clear_bit(HCI_DISCOVERABLE, &hdev->dev_flags); + clear_bit(HCI_LIMITED_DISCOVERABLE, &hdev->dev_flags); } if (test_and_clear_bit(HCI_SERVICE_CACHE, &hdev->dev_flags)) @@ -1268,6 +1976,7 @@ static int hci_dev_do_close(struct hci_dev *hdev) skb_queue_purge(&hdev->cmd_q); atomic_set(&hdev->cmd_cnt, 1); if (!test_bit(HCI_RAW, &hdev->flags) && + !test_bit(HCI_AUTO_OFF, &hdev->dev_flags) && test_bit(HCI_QUIRK_RESET_ON_CLOSE, &hdev->quirks)) { set_bit(HCI_INIT, &hdev->flags); __hci_req_sync(hdev, hci_reset_req, 0, HCI_CMD_TIMEOUT); @@ -1300,15 +2009,16 @@ static int hci_dev_do_close(struct hci_dev *hdev) hdev->flags = 0; hdev->dev_flags &= ~HCI_PERSISTENT_MASK; - if (!test_and_clear_bit(HCI_AUTO_OFF, &hdev->dev_flags) && - mgmt_valid_hdev(hdev)) { - hci_dev_lock(hdev); - mgmt_powered(hdev, 0); - hci_dev_unlock(hdev); + if (!test_and_clear_bit(HCI_AUTO_OFF, &hdev->dev_flags)) { + if (hdev->dev_type == HCI_BREDR) { + hci_dev_lock(hdev); + mgmt_powered(hdev, 0); + hci_dev_unlock(hdev); + } } /* Controller radio is available but is currently powered down */ - hdev->amp_status = 0; + hdev->amp_status = AMP_STATUS_POWERED_DOWN; memset(hdev->eir, 0, sizeof(hdev->eir)); memset(hdev->dev_class, 0, sizeof(hdev->dev_class)); @@ -1328,11 +2038,17 @@ int hci_dev_close(__u16 dev) if (!hdev) return -ENODEV; + if (test_bit(HCI_USER_CHANNEL, &hdev->dev_flags)) { + err = -EBUSY; + goto done; + } + if (test_and_clear_bit(HCI_AUTO_OFF, &hdev->dev_flags)) cancel_delayed_work(&hdev->power_off); err = hci_dev_do_close(hdev); +done: hci_dev_put(hdev); return err; } @@ -1348,8 +2064,15 @@ int hci_dev_reset(__u16 dev) hci_req_lock(hdev); - if (!test_bit(HCI_UP, &hdev->flags)) + if (!test_bit(HCI_UP, &hdev->flags)) { + ret = -ENETDOWN; goto done; + } + + if (test_bit(HCI_USER_CHANNEL, &hdev->dev_flags)) { + ret = -EBUSY; + goto done; + } /* Drop queues */ skb_queue_purge(&hdev->rx_q); @@ -1384,10 +2107,15 @@ int hci_dev_reset_stat(__u16 dev) if (!hdev) return -ENODEV; + if (test_bit(HCI_USER_CHANNEL, &hdev->dev_flags)) { + ret = -EBUSY; + goto done; + } + memset(&hdev->stat, 0, sizeof(struct hci_dev_stats)); +done: hci_dev_put(hdev); - return ret; } @@ -1404,6 +2132,21 @@ int hci_dev_cmd(unsigned int cmd, void __user *arg) if (!hdev) return -ENODEV; + if (test_bit(HCI_USER_CHANNEL, &hdev->dev_flags)) { + err = -EBUSY; + goto done; + } + + if (hdev->dev_type != HCI_BREDR) { + err = -EOPNOTSUPP; + goto done; + } + + if (!test_bit(HCI_BREDR_ENABLED, &hdev->dev_flags)) { + err = -EOPNOTSUPP; + goto done; + } + switch (cmd) { case HCISETAUTH: err = hci_req_sync(hdev, hci_auth_req, dr.dev_opt, @@ -1462,6 +2205,7 @@ int hci_dev_cmd(unsigned int cmd, void __user *arg) break; } +done: hci_dev_put(hdev); return err; } @@ -1534,7 +2278,7 @@ int hci_get_dev_info(void __user *arg) strcpy(di.name, hdev->name); di.bdaddr = hdev->bdaddr; - di.type = (hdev->bus & 0x0f) | (hdev->dev_type << 4); + di.type = (hdev->bus & 0x0f) | ((hdev->dev_type & 0x03) << 4); di.flags = hdev->flags; di.pkt_type = hdev->pkt_type; if (lmp_bredr_capable(hdev)) { @@ -1570,6 +2314,9 @@ static int hci_rfkill_set_block(void *data, bool blocked) BT_DBG("%p name %s blocked %d", hdev, hdev->name, blocked); + if (test_bit(HCI_USER_CHANNEL, &hdev->dev_flags)) + return -EBUSY; + if (blocked) { set_bit(HCI_RFKILLED, &hdev->dev_flags); if (!test_bit(HCI_SETUP, &hdev->dev_flags)) @@ -1592,13 +2339,20 @@ static void hci_power_on(struct work_struct *work) BT_DBG("%s", hdev->name); - err = hci_dev_open(hdev->id); + err = hci_dev_do_open(hdev); if (err < 0) { mgmt_set_powered_failed(hdev, err); return; } - if (test_bit(HCI_RFKILLED, &hdev->dev_flags)) { + /* During the HCI setup phase, a few error conditions are + * ignored and they need to be checked now. If they are still + * valid, it is important to turn the device back off. + */ + if (test_bit(HCI_RFKILLED, &hdev->dev_flags) || + (hdev->dev_type == HCI_BREDR && + !bacmp(&hdev->bdaddr, BDADDR_ANY) && + !bacmp(&hdev->static_addr, BDADDR_ANY))) { clear_bit(HCI_AUTO_OFF, &hdev->dev_flags); hci_dev_do_close(hdev); } else if (test_bit(HCI_AUTO_OFF, &hdev->dev_flags)) { @@ -1623,19 +2377,12 @@ static void hci_power_off(struct work_struct *work) static void hci_discov_off(struct work_struct *work) { struct hci_dev *hdev; - u8 scan = SCAN_PAGE; hdev = container_of(work, struct hci_dev, discov_off.work); BT_DBG("%s", hdev->name); - hci_dev_lock(hdev); - - hci_send_cmd(hdev, HCI_OP_WRITE_SCAN_ENABLE, sizeof(scan), &scan); - - hdev->discov_timeout = 0; - - hci_dev_unlock(hdev); + mgmt_discoverable_timeout(hdev); } int hci_uuids_clear(struct hci_dev *hdev) @@ -1958,13 +2705,15 @@ int hci_add_remote_oob_data(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 *hash, return 0; } -struct bdaddr_list *hci_blacklist_lookup(struct hci_dev *hdev, bdaddr_t *bdaddr) +struct bdaddr_list *hci_blacklist_lookup(struct hci_dev *hdev, + bdaddr_t *bdaddr, u8 type) { struct bdaddr_list *b; - list_for_each_entry(b, &hdev->blacklist, list) - if (bacmp(bdaddr, &b->bdaddr) == 0) + list_for_each_entry(b, &hdev->blacklist, list) { + if (!bacmp(&b->bdaddr, bdaddr) && b->bdaddr_type == type) return b; + } return NULL; } @@ -1974,9 +2723,7 @@ int hci_blacklist_clear(struct hci_dev *hdev) struct list_head *p, *n; list_for_each_safe(p, n, &hdev->blacklist) { - struct bdaddr_list *b; - - b = list_entry(p, struct bdaddr_list, list); + struct bdaddr_list *b = list_entry(p, struct bdaddr_list, list); list_del(p); kfree(b); @@ -1989,10 +2736,10 @@ int hci_blacklist_add(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 type) { struct bdaddr_list *entry; - if (bacmp(bdaddr, BDADDR_ANY) == 0) + if (!bacmp(bdaddr, BDADDR_ANY)) return -EBADF; - if (hci_blacklist_lookup(hdev, bdaddr)) + if (hci_blacklist_lookup(hdev, bdaddr, type)) return -EEXIST; entry = kzalloc(sizeof(struct bdaddr_list), GFP_KERNEL); @@ -2000,6 +2747,7 @@ int hci_blacklist_add(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 type) return -ENOMEM; bacpy(&entry->bdaddr, bdaddr); + entry->bdaddr_type = type; list_add(&entry->list, &hdev->blacklist); @@ -2010,10 +2758,10 @@ int hci_blacklist_del(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 type) { struct bdaddr_list *entry; - if (bacmp(bdaddr, BDADDR_ANY) == 0) + if (!bacmp(bdaddr, BDADDR_ANY)) return hci_blacklist_clear(hdev); - entry = hci_blacklist_lookup(hdev, bdaddr); + entry = hci_blacklist_lookup(hdev, bdaddr, type); if (!entry) return -ENOENT; @@ -2111,13 +2859,19 @@ struct hci_dev *hci_alloc_dev(void) hdev->pkt_type = (HCI_DM1 | HCI_DH1 | HCI_HV1); hdev->esco_type = (ESCO_HV1); hdev->link_mode = (HCI_LM_ACCEPT); - hdev->io_capability = 0x03; /* No Input No Output */ + hdev->num_iac = 0x01; /* One IAC support is mandatory */ + hdev->io_capability = 0x03; /* No Input No Output */ hdev->inq_tx_power = HCI_TX_POWER_INVALID; hdev->adv_tx_power = HCI_TX_POWER_INVALID; hdev->sniff_max_interval = 800; hdev->sniff_min_interval = 80; + hdev->le_scan_interval = 0x0060; + hdev->le_scan_window = 0x0030; + hdev->le_conn_min_interval = 0x0028; + hdev->le_conn_max_interval = 0x0038; + mutex_init(&hdev->lock); mutex_init(&hdev->req_lock); @@ -2206,7 +2960,12 @@ int hci_register_dev(struct hci_dev *hdev) goto err; } - error = hci_add_sysfs(hdev); + if (!IS_ERR_OR_NULL(bt_debugfs)) + hdev->debugfs = debugfs_create_dir(hdev->name, bt_debugfs); + + dev_set_name(&hdev->dev, "%s", hdev->name); + + error = device_add(&hdev->dev); if (error < 0) goto err_wqueue; @@ -2224,9 +2983,14 @@ int hci_register_dev(struct hci_dev *hdev) set_bit(HCI_RFKILLED, &hdev->dev_flags); set_bit(HCI_SETUP, &hdev->dev_flags); + set_bit(HCI_AUTO_OFF, &hdev->dev_flags); - if (hdev->dev_type != HCI_AMP) - set_bit(HCI_AUTO_OFF, &hdev->dev_flags); + if (hdev->dev_type == HCI_BREDR) { + /* Assume BR/EDR support until proven otherwise (such as + * through reading supported features during init. + */ + set_bit(HCI_BREDR_ENABLED, &hdev->dev_flags); + } write_lock(&hci_dev_list_lock); list_add(&hdev->list, &hci_dev_list); @@ -2289,7 +3053,9 @@ void hci_unregister_dev(struct hci_dev *hdev) rfkill_destroy(hdev->rfkill); } - hci_del_sysfs(hdev); + device_del(&hdev->dev); + + debugfs_remove_recursive(hdev->debugfs); destroy_workqueue(hdev->workqueue); destroy_workqueue(hdev->req_workqueue); @@ -2325,9 +3091,8 @@ int hci_resume_dev(struct hci_dev *hdev) EXPORT_SYMBOL(hci_resume_dev); /* Receive frame from HCI drivers */ -int hci_recv_frame(struct sk_buff *skb) +int hci_recv_frame(struct hci_dev *hdev, struct sk_buff *skb) { - struct hci_dev *hdev = (struct hci_dev *) skb->dev; if (!hdev || (!test_bit(HCI_UP, &hdev->flags) && !test_bit(HCI_INIT, &hdev->flags))) { kfree_skb(skb); @@ -2386,7 +3151,6 @@ static int hci_reassembly(struct hci_dev *hdev, int type, void *data, scb->expect = hlen; scb->pkt_type = type; - skb->dev = (void *) hdev; hdev->reassembly[index] = skb; } @@ -2446,7 +3210,7 @@ static int hci_reassembly(struct hci_dev *hdev, int type, void *data, /* Complete frame */ bt_cb(skb)->pkt_type = type; - hci_recv_frame(skb); + hci_recv_frame(hdev, skb); hdev->reassembly[index] = NULL; return remain; @@ -2537,15 +3301,8 @@ int hci_unregister_cb(struct hci_cb *cb) } EXPORT_SYMBOL(hci_unregister_cb); -static int hci_send_frame(struct sk_buff *skb) +static void hci_send_frame(struct hci_dev *hdev, struct sk_buff *skb) { - struct hci_dev *hdev = (struct hci_dev *) skb->dev; - - if (!hdev) { - kfree_skb(skb); - return -ENODEV; - } - BT_DBG("%s type %d len %d", hdev->name, bt_cb(skb)->pkt_type, skb->len); /* Time stamp */ @@ -2562,7 +3319,8 @@ static int hci_send_frame(struct sk_buff *skb) /* Get rid of skb owner, prior to sending to the driver. */ skb_orphan(skb); - return hdev->send(skb); + if (hdev->send(hdev, skb) < 0) + BT_ERR("%s sending frame failed", hdev->name); } void hci_req_init(struct hci_request *req, struct hci_dev *hdev) @@ -2625,7 +3383,6 @@ static struct sk_buff *hci_prepare_cmd(struct hci_dev *hdev, u16 opcode, BT_DBG("skb len %d", skb->len); bt_cb(skb)->pkt_type = HCI_COMMAND_PKT; - skb->dev = (void *) hdev; return skb; } @@ -2769,7 +3526,6 @@ static void hci_queue_acl(struct hci_chan *chan, struct sk_buff_head *queue, do { skb = list; list = list->next; - skb->dev = (void *) hdev; bt_cb(skb)->pkt_type = HCI_ACLDATA_PKT; hci_add_acl_hdr(skb, conn->handle, flags); @@ -2788,8 +3544,6 @@ void hci_send_acl(struct hci_chan *chan, struct sk_buff *skb, __u16 flags) BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags); - skb->dev = (void *) hdev; - hci_queue_acl(chan, &chan->data_q, skb, flags); queue_work(hdev->workqueue, &hdev->tx_work); @@ -2810,7 +3564,6 @@ void hci_send_sco(struct hci_conn *conn, struct sk_buff *skb) skb_reset_transport_header(skb); memcpy(skb_transport_header(skb), &hdr, HCI_SCO_HDR_SIZE); - skb->dev = (void *) hdev; bt_cb(skb)->pkt_type = HCI_SCODATA_PKT; skb_queue_tail(&conn->data_q, skb); @@ -3075,7 +3828,7 @@ static void hci_sched_acl_pkt(struct hci_dev *hdev) hci_conn_enter_active_mode(chan->conn, bt_cb(skb)->force_active); - hci_send_frame(skb); + hci_send_frame(hdev, skb); hdev->acl_last_tx = jiffies; hdev->acl_cnt--; @@ -3127,7 +3880,7 @@ static void hci_sched_acl_blk(struct hci_dev *hdev) hci_conn_enter_active_mode(chan->conn, bt_cb(skb)->force_active); - hci_send_frame(skb); + hci_send_frame(hdev, skb); hdev->acl_last_tx = jiffies; hdev->block_cnt -= blocks; @@ -3180,7 +3933,7 @@ static void hci_sched_sco(struct hci_dev *hdev) while (hdev->sco_cnt && (conn = hci_low_sent(hdev, SCO_LINK, "e))) { while (quote-- && (skb = skb_dequeue(&conn->data_q))) { BT_DBG("skb %p len %d", skb, skb->len); - hci_send_frame(skb); + hci_send_frame(hdev, skb); conn->sent++; if (conn->sent == ~0) @@ -3204,7 +3957,7 @@ static void hci_sched_esco(struct hci_dev *hdev) "e))) { while (quote-- && (skb = skb_dequeue(&conn->data_q))) { BT_DBG("skb %p len %d", skb, skb->len); - hci_send_frame(skb); + hci_send_frame(hdev, skb); conn->sent++; if (conn->sent == ~0) @@ -3246,7 +3999,7 @@ static void hci_sched_le(struct hci_dev *hdev) skb = skb_dequeue(&chan->data_q); - hci_send_frame(skb); + hci_send_frame(hdev, skb); hdev->le_last_tx = jiffies; cnt--; @@ -3272,19 +4025,17 @@ static void hci_tx_work(struct work_struct *work) BT_DBG("%s acl %d sco %d le %d", hdev->name, hdev->acl_cnt, hdev->sco_cnt, hdev->le_cnt); - /* Schedule queues and send stuff to HCI driver */ - - hci_sched_acl(hdev); - - hci_sched_sco(hdev); - - hci_sched_esco(hdev); - - hci_sched_le(hdev); + if (!test_bit(HCI_USER_CHANNEL, &hdev->dev_flags)) { + /* Schedule queues and send stuff to HCI driver */ + hci_sched_acl(hdev); + hci_sched_sco(hdev); + hci_sched_esco(hdev); + hci_sched_le(hdev); + } /* Send next queued raw (unknown type) packet */ while ((skb = skb_dequeue(&hdev->raw_q))) - hci_send_frame(skb); + hci_send_frame(hdev, skb); } /* ----- HCI RX task (incoming data processing) ----- */ @@ -3471,7 +4222,8 @@ static void hci_rx_work(struct work_struct *work) hci_send_to_sock(hdev, skb); } - if (test_bit(HCI_RAW, &hdev->flags)) { + if (test_bit(HCI_RAW, &hdev->flags) || + test_bit(HCI_USER_CHANNEL, &hdev->dev_flags)) { kfree_skb(skb); continue; } @@ -3526,10 +4278,10 @@ static void hci_cmd_work(struct work_struct *work) kfree_skb(hdev->sent_cmd); - hdev->sent_cmd = skb_clone(skb, GFP_ATOMIC); + hdev->sent_cmd = skb_clone(skb, GFP_KERNEL); if (hdev->sent_cmd) { atomic_dec(&hdev->cmd_cnt); - hci_send_frame(skb); + hci_send_frame(hdev, skb); if (test_bit(HCI_RESET, &hdev->flags)) del_timer(&hdev->cmd_timer); else @@ -3541,15 +4293,3 @@ static void hci_cmd_work(struct work_struct *work) } } } - -u8 bdaddr_to_le(u8 bdaddr_type) -{ - switch (bdaddr_type) { - case BDADDR_LE_PUBLIC: - return ADDR_LE_DEV_PUBLIC; - - default: - /* Fallback to LE Random address type */ - return ADDR_LE_DEV_RANDOM; - } -} diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c index 8db3e89fae35..5935f748c0f9 100644 --- a/net/bluetooth/hci_event.c +++ b/net/bluetooth/hci_event.c @@ -29,8 +29,9 @@ #include <net/bluetooth/bluetooth.h> #include <net/bluetooth/hci_core.h> #include <net/bluetooth/mgmt.h> -#include <net/bluetooth/a2mp.h> -#include <net/bluetooth/amp.h> + +#include "a2mp.h" +#include "amp.h" /* Handle HCI Event packets */ @@ -194,6 +195,11 @@ static void hci_cc_reset(struct hci_dev *hdev, struct sk_buff *skb) memset(hdev->adv_data, 0, sizeof(hdev->adv_data)); hdev->adv_data_len = 0; + + memset(hdev->scan_rsp_data, 0, sizeof(hdev->scan_rsp_data)); + hdev->scan_rsp_data_len = 0; + + hdev->ssp_debug_mode = 0; } static void hci_cc_write_local_name(struct hci_dev *hdev, struct sk_buff *skb) @@ -297,6 +303,11 @@ static void hci_cc_write_scan_enable(struct hci_dev *hdev, struct sk_buff *skb) goto done; } + /* We need to ensure that we set this back on if someone changed + * the scan mode through a raw HCI socket. + */ + set_bit(HCI_BREDR_ENABLED, &hdev->dev_flags); + old_pscan = test_and_clear_bit(HCI_PSCAN, &hdev->flags); old_iscan = test_and_clear_bit(HCI_ISCAN, &hdev->flags); @@ -304,11 +315,6 @@ static void hci_cc_write_scan_enable(struct hci_dev *hdev, struct sk_buff *skb) set_bit(HCI_ISCAN, &hdev->flags); if (!old_iscan) mgmt_discoverable(hdev, 1); - if (hdev->discov_timeout > 0) { - int to = msecs_to_jiffies(hdev->discov_timeout * 1000); - queue_delayed_work(hdev->workqueue, &hdev->discov_off, - to); - } } else if (old_iscan) mgmt_discoverable(hdev, 0); @@ -412,6 +418,21 @@ static void hci_cc_write_voice_setting(struct hci_dev *hdev, hdev->notify(hdev, HCI_NOTIFY_VOICE_SETTING); } +static void hci_cc_read_num_supported_iac(struct hci_dev *hdev, + struct sk_buff *skb) +{ + struct hci_rp_read_num_supported_iac *rp = (void *) skb->data; + + BT_DBG("%s status 0x%2.2x", hdev->name, rp->status); + + if (rp->status) + return; + + hdev->num_iac = rp->num_iac; + + BT_DBG("%s num iac %d", hdev->name, hdev->num_iac); +} + static void hci_cc_write_ssp_mode(struct hci_dev *hdev, struct sk_buff *skb) { __u8 status = *((__u8 *) skb->data); @@ -449,14 +470,13 @@ static void hci_cc_read_local_version(struct hci_dev *hdev, struct sk_buff *skb) if (rp->status) return; - hdev->hci_ver = rp->hci_ver; - hdev->hci_rev = __le16_to_cpu(rp->hci_rev); - hdev->lmp_ver = rp->lmp_ver; - hdev->manufacturer = __le16_to_cpu(rp->manufacturer); - hdev->lmp_subver = __le16_to_cpu(rp->lmp_subver); - - BT_DBG("%s manufacturer 0x%4.4x hci ver %d:%d", hdev->name, - hdev->manufacturer, hdev->hci_ver, hdev->hci_rev); + if (test_bit(HCI_SETUP, &hdev->dev_flags)) { + hdev->hci_ver = rp->hci_ver; + hdev->hci_rev = __le16_to_cpu(rp->hci_rev); + hdev->lmp_ver = rp->lmp_ver; + hdev->manufacturer = __le16_to_cpu(rp->manufacturer); + hdev->lmp_subver = __le16_to_cpu(rp->lmp_subver); + } } static void hci_cc_read_local_commands(struct hci_dev *hdev, @@ -536,7 +556,8 @@ static void hci_cc_read_local_ext_features(struct hci_dev *hdev, if (rp->status) return; - hdev->max_page = rp->max_page; + if (hdev->max_page < rp->max_page) + hdev->max_page = rp->max_page; if (rp->page < HCI_MAX_PAGES) memcpy(hdev->features[rp->page], rp->features, 8); @@ -913,17 +934,9 @@ static void hci_cc_le_set_adv_enable(struct hci_dev *hdev, struct sk_buff *skb) if (!status) { if (*sent) - set_bit(HCI_LE_PERIPHERAL, &hdev->dev_flags); + set_bit(HCI_ADVERTISING, &hdev->dev_flags); else - clear_bit(HCI_LE_PERIPHERAL, &hdev->dev_flags); - } - - if (!test_bit(HCI_INIT, &hdev->flags)) { - struct hci_request req; - - hci_req_init(&req, hdev); - hci_update_ad(&req); - hci_req_run(&req, NULL); + clear_bit(HCI_ADVERTISING, &hdev->dev_flags); } hci_dev_unlock(hdev); @@ -994,20 +1007,20 @@ static void hci_cc_write_le_host_supported(struct hci_dev *hdev, return; if (!status) { - if (sent->le) + if (sent->le) { hdev->features[1][0] |= LMP_HOST_LE; - else + set_bit(HCI_LE_ENABLED, &hdev->dev_flags); + } else { hdev->features[1][0] &= ~LMP_HOST_LE; + clear_bit(HCI_LE_ENABLED, &hdev->dev_flags); + clear_bit(HCI_ADVERTISING, &hdev->dev_flags); + } if (sent->simul) hdev->features[1][0] |= LMP_HOST_LE_BREDR; else hdev->features[1][0] &= ~LMP_HOST_LE_BREDR; } - - if (test_bit(HCI_MGMT, &hdev->dev_flags) && - !test_bit(HCI_INIT, &hdev->flags)) - mgmt_le_enable_complete(hdev, sent->le, status); } static void hci_cc_write_remote_amp_assoc(struct hci_dev *hdev, @@ -1291,9 +1304,11 @@ static void hci_cs_remote_name_req(struct hci_dev *hdev, __u8 status) goto unlock; if (!test_and_set_bit(HCI_CONN_AUTH_PEND, &conn->flags)) { - struct hci_cp_auth_requested cp; - cp.handle = __cpu_to_le16(conn->handle); - hci_send_cmd(hdev, HCI_OP_AUTH_REQUESTED, sizeof(cp), &cp); + struct hci_cp_auth_requested auth_cp; + + auth_cp.handle = __cpu_to_le16(conn->handle); + hci_send_cmd(hdev, HCI_OP_AUTH_REQUESTED, + sizeof(auth_cp), &auth_cp); } unlock: @@ -1465,33 +1480,6 @@ static void hci_cs_disconnect(struct hci_dev *hdev, u8 status) hci_dev_unlock(hdev); } -static void hci_cs_le_create_conn(struct hci_dev *hdev, __u8 status) -{ - struct hci_conn *conn; - - BT_DBG("%s status 0x%2.2x", hdev->name, status); - - if (status) { - hci_dev_lock(hdev); - - conn = hci_conn_hash_lookup_state(hdev, LE_LINK, BT_CONNECT); - if (!conn) { - hci_dev_unlock(hdev); - return; - } - - BT_DBG("%s bdaddr %pMR conn %p", hdev->name, &conn->dst, conn); - - conn->state = BT_CLOSED; - mgmt_connect_failed(hdev, &conn->dst, conn->type, - conn->dst_type, status); - hci_proto_connect_cfm(conn, status); - hci_conn_del(conn); - - hci_dev_unlock(hdev); - } -} - static void hci_cs_create_phylink(struct hci_dev *hdev, u8 status) { struct hci_cp_create_phy_link *cp; @@ -1706,7 +1694,7 @@ static void hci_conn_request_evt(struct hci_dev *hdev, struct sk_buff *skb) &flags); if ((mask & HCI_LM_ACCEPT) && - !hci_blacklist_lookup(hdev, &ev->bdaddr)) { + !hci_blacklist_lookup(hdev, &ev->bdaddr, BDADDR_BREDR)) { /* Connection accepted */ struct inquiry_entry *ie; struct hci_conn *conn; @@ -1821,10 +1809,25 @@ static void hci_disconn_complete_evt(struct hci_dev *hdev, struct sk_buff *skb) } if (ev->status == 0) { - if (conn->type == ACL_LINK && conn->flush_key) + u8 type = conn->type; + + if (type == ACL_LINK && conn->flush_key) hci_remove_link_key(hdev, &conn->dst); hci_proto_disconn_cfm(conn, ev->reason); hci_conn_del(conn); + + /* Re-enable advertising if necessary, since it might + * have been disabled by the connection. From the + * HCI_LE_Set_Advertise_Enable command description in + * the core specification (v4.0): + * "The Controller shall continue advertising until the Host + * issues an LE_Set_Advertise_Enable command with + * Advertising_Enable set to 0x00 (Advertising is disabled) + * or until a connection is created or until the Advertising + * is timed out due to Directed Advertising." + */ + if (type == LE_LINK) + mgmt_reenable_advertising(hdev); } unlock: @@ -2139,6 +2142,10 @@ static void hci_cmd_complete_evt(struct hci_dev *hdev, struct sk_buff *skb) hci_cc_write_voice_setting(hdev, skb); break; + case HCI_OP_READ_NUM_SUPPORTED_IAC: + hci_cc_read_num_supported_iac(hdev, skb); + break; + case HCI_OP_WRITE_SSP_MODE: hci_cc_write_ssp_mode(hdev, skb); break; @@ -2342,10 +2349,6 @@ static void hci_cmd_status_evt(struct hci_dev *hdev, struct sk_buff *skb) hci_cs_disconnect(hdev, ev->status); break; - case HCI_OP_LE_CREATE_CONN: - hci_cs_le_create_conn(hdev, ev->status); - break; - case HCI_OP_CREATE_PHY_LINK: hci_cs_create_phylink(hdev, ev->status); break; @@ -2548,7 +2551,6 @@ static void hci_mode_change_evt(struct hci_dev *hdev, struct sk_buff *skb) conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(ev->handle)); if (conn) { conn->mode = ev->mode; - conn->interval = __le16_to_cpu(ev->interval); if (!test_and_clear_bit(HCI_CONN_MODE_CHANGE_PEND, &conn->flags)) { @@ -2930,6 +2932,23 @@ unlock: hci_dev_unlock(hdev); } +static inline size_t eir_get_length(u8 *eir, size_t eir_len) +{ + size_t parsed = 0; + + while (parsed < eir_len) { + u8 field_len = eir[0]; + + if (field_len == 0) + return parsed; + + parsed += field_len + 1; + eir += field_len + 1; + } + + return eir_len; +} + static void hci_extended_inquiry_result_evt(struct hci_dev *hdev, struct sk_buff *skb) { @@ -3170,7 +3189,8 @@ static void hci_user_confirm_request_evt(struct hci_dev *hdev, if (hdev->auto_accept_delay > 0) { int delay = msecs_to_jiffies(hdev->auto_accept_delay); - mod_timer(&conn->auto_accept_timer, jiffies + delay); + queue_delayed_work(conn->hdev->workqueue, + &conn->auto_accept_work, delay); goto unlock; } @@ -3485,6 +3505,17 @@ static void hci_le_conn_complete_evt(struct hci_dev *hdev, struct sk_buff *skb) conn->dst_type = ev->bdaddr_type; + /* The advertising parameters for own address type + * define which source address and source address + * type this connections has. + */ + if (bacmp(&conn->src, BDADDR_ANY)) { + conn->src_type = ADDR_LE_DEV_PUBLIC; + } else { + bacpy(&conn->src, &hdev->static_addr); + conn->src_type = ADDR_LE_DEV_RANDOM; + } + if (ev->role == LE_CONN_ROLE_MASTER) { conn->out = true; conn->link_mode |= HCI_LM_MASTER; @@ -3640,8 +3671,8 @@ void hci_event_packet(struct hci_dev *hdev, struct sk_buff *skb) skb_pull(skb, HCI_EVENT_HDR_SIZE); if (hdev->sent_cmd && bt_cb(hdev->sent_cmd)->req.event == event) { - struct hci_command_hdr *hdr = (void *) hdev->sent_cmd->data; - u16 opcode = __le16_to_cpu(hdr->opcode); + struct hci_command_hdr *cmd_hdr = (void *) hdev->sent_cmd->data; + u16 opcode = __le16_to_cpu(cmd_hdr->opcode); hci_req_cmd_complete(hdev, opcode, 0); } diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c index 9bd7d959e384..71f0be173080 100644 --- a/net/bluetooth/hci_sock.c +++ b/net/bluetooth/hci_sock.c @@ -66,6 +66,46 @@ static struct bt_sock_list hci_sk_list = { .lock = __RW_LOCK_UNLOCKED(hci_sk_list.lock) }; +static bool is_filtered_packet(struct sock *sk, struct sk_buff *skb) +{ + struct hci_filter *flt; + int flt_type, flt_event; + + /* Apply filter */ + flt = &hci_pi(sk)->filter; + + if (bt_cb(skb)->pkt_type == HCI_VENDOR_PKT) + flt_type = 0; + else + flt_type = bt_cb(skb)->pkt_type & HCI_FLT_TYPE_BITS; + + if (!test_bit(flt_type, &flt->type_mask)) + return true; + + /* Extra filter for event packets only */ + if (bt_cb(skb)->pkt_type != HCI_EVENT_PKT) + return false; + + flt_event = (*(__u8 *)skb->data & HCI_FLT_EVENT_BITS); + + if (!hci_test_bit(flt_event, &flt->event_mask)) + return true; + + /* Check filter only when opcode is set */ + if (!flt->opcode) + return false; + + if (flt_event == HCI_EV_CMD_COMPLETE && + flt->opcode != get_unaligned((__le16 *)(skb->data + 3))) + return true; + + if (flt_event == HCI_EV_CMD_STATUS && + flt->opcode != get_unaligned((__le16 *)(skb->data + 4))) + return true; + + return false; +} + /* Send frame to RAW socket */ void hci_send_to_sock(struct hci_dev *hdev, struct sk_buff *skb) { @@ -77,7 +117,6 @@ void hci_send_to_sock(struct hci_dev *hdev, struct sk_buff *skb) read_lock(&hci_sk_list.lock); sk_for_each(sk, &hci_sk_list.head) { - struct hci_filter *flt; struct sk_buff *nskb; if (sk->sk_state != BT_BOUND || hci_pi(sk)->hdev != hdev) @@ -87,31 +126,19 @@ void hci_send_to_sock(struct hci_dev *hdev, struct sk_buff *skb) if (skb->sk == sk) continue; - if (hci_pi(sk)->channel != HCI_CHANNEL_RAW) - continue; - - /* Apply filter */ - flt = &hci_pi(sk)->filter; - - if (!test_bit((bt_cb(skb)->pkt_type == HCI_VENDOR_PKT) ? - 0 : (bt_cb(skb)->pkt_type & HCI_FLT_TYPE_BITS), - &flt->type_mask)) - continue; - - if (bt_cb(skb)->pkt_type == HCI_EVENT_PKT) { - int evt = (*(__u8 *)skb->data & HCI_FLT_EVENT_BITS); - - if (!hci_test_bit(evt, &flt->event_mask)) + if (hci_pi(sk)->channel == HCI_CHANNEL_RAW) { + if (is_filtered_packet(sk, skb)) continue; - - if (flt->opcode && - ((evt == HCI_EV_CMD_COMPLETE && - flt->opcode != - get_unaligned((__le16 *)(skb->data + 3))) || - (evt == HCI_EV_CMD_STATUS && - flt->opcode != - get_unaligned((__le16 *)(skb->data + 4))))) + } else if (hci_pi(sk)->channel == HCI_CHANNEL_USER) { + if (!bt_cb(skb)->incoming) + continue; + if (bt_cb(skb)->pkt_type != HCI_EVENT_PKT && + bt_cb(skb)->pkt_type != HCI_ACLDATA_PKT && + bt_cb(skb)->pkt_type != HCI_SCODATA_PKT) continue; + } else { + /* Don't send frame to other channel types */ + continue; } if (!skb_copy) { @@ -360,7 +387,6 @@ static void hci_si_event(struct hci_dev *hdev, int type, int dlen, void *data) __net_timestamp(skb); bt_cb(skb)->pkt_type = HCI_EVENT_PKT; - skb->dev = (void *) hdev; hci_send_to_sock(hdev, skb); kfree_skb(skb); } @@ -426,6 +452,12 @@ static int hci_sock_release(struct socket *sock) bt_sock_unlink(&hci_sk_list, sk); if (hdev) { + if (hci_pi(sk)->channel == HCI_CHANNEL_USER) { + mgmt_index_added(hdev); + clear_bit(HCI_USER_CHANNEL, &hdev->dev_flags); + hci_dev_close(hdev->id); + } + atomic_dec(&hdev->promisc); hci_dev_put(hdev); } @@ -449,7 +481,7 @@ static int hci_sock_blacklist_add(struct hci_dev *hdev, void __user *arg) hci_dev_lock(hdev); - err = hci_blacklist_add(hdev, &bdaddr, 0); + err = hci_blacklist_add(hdev, &bdaddr, BDADDR_BREDR); hci_dev_unlock(hdev); @@ -466,7 +498,7 @@ static int hci_sock_blacklist_del(struct hci_dev *hdev, void __user *arg) hci_dev_lock(hdev); - err = hci_blacklist_del(hdev, &bdaddr, 0); + err = hci_blacklist_del(hdev, &bdaddr, BDADDR_BREDR); hci_dev_unlock(hdev); @@ -482,6 +514,12 @@ static int hci_sock_bound_ioctl(struct sock *sk, unsigned int cmd, if (!hdev) return -EBADFD; + if (test_bit(HCI_USER_CHANNEL, &hdev->dev_flags)) + return -EBUSY; + + if (hdev->dev_type != HCI_BREDR) + return -EOPNOTSUPP; + switch (cmd) { case HCISETRAW: if (!capable(CAP_NET_ADMIN)) @@ -512,23 +550,29 @@ static int hci_sock_bound_ioctl(struct sock *sk, unsigned int cmd, if (!capable(CAP_NET_ADMIN)) return -EPERM; return hci_sock_blacklist_del(hdev, (void __user *) arg); - - default: - if (hdev->ioctl) - return hdev->ioctl(hdev, cmd, arg); - return -EINVAL; } + + return -ENOIOCTLCMD; } static int hci_sock_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) { - struct sock *sk = sock->sk; void __user *argp = (void __user *) arg; + struct sock *sk = sock->sk; int err; BT_DBG("cmd %x arg %lx", cmd, arg); + lock_sock(sk); + + if (hci_pi(sk)->channel != HCI_CHANNEL_RAW) { + err = -EBADFD; + goto done; + } + + release_sock(sk); + switch (cmd) { case HCIGETDEVLIST: return hci_get_dev_list(argp); @@ -573,13 +617,15 @@ static int hci_sock_ioctl(struct socket *sock, unsigned int cmd, case HCIINQUIRY: return hci_inquiry(argp); - - default: - lock_sock(sk); - err = hci_sock_bound_ioctl(sk, cmd, arg); - release_sock(sk); - return err; } + + lock_sock(sk); + + err = hci_sock_bound_ioctl(sk, cmd, arg); + +done: + release_sock(sk); + return err; } static int hci_sock_bind(struct socket *sock, struct sockaddr *addr, @@ -629,6 +675,56 @@ static int hci_sock_bind(struct socket *sock, struct sockaddr *addr, hci_pi(sk)->hdev = hdev; break; + case HCI_CHANNEL_USER: + if (hci_pi(sk)->hdev) { + err = -EALREADY; + goto done; + } + + if (haddr.hci_dev == HCI_DEV_NONE) { + err = -EINVAL; + goto done; + } + + if (!capable(CAP_NET_ADMIN)) { + err = -EPERM; + goto done; + } + + hdev = hci_dev_get(haddr.hci_dev); + if (!hdev) { + err = -ENODEV; + goto done; + } + + if (test_bit(HCI_UP, &hdev->flags) || + test_bit(HCI_INIT, &hdev->flags) || + test_bit(HCI_SETUP, &hdev->dev_flags)) { + err = -EBUSY; + hci_dev_put(hdev); + goto done; + } + + if (test_and_set_bit(HCI_USER_CHANNEL, &hdev->dev_flags)) { + err = -EUSERS; + hci_dev_put(hdev); + goto done; + } + + mgmt_index_removed(hdev); + + err = hci_dev_open(hdev->id); + if (err) { + clear_bit(HCI_USER_CHANNEL, &hdev->dev_flags); + hci_dev_put(hdev); + goto done; + } + + atomic_inc(&hdev->promisc); + + hci_pi(sk)->hdev = hdev; + break; + case HCI_CHANNEL_CONTROL: if (haddr.hci_dev != HCI_DEV_NONE) { err = -EINVAL; @@ -677,22 +773,30 @@ static int hci_sock_getname(struct socket *sock, struct sockaddr *addr, { struct sockaddr_hci *haddr = (struct sockaddr_hci *) addr; struct sock *sk = sock->sk; - struct hci_dev *hdev = hci_pi(sk)->hdev; + struct hci_dev *hdev; + int err = 0; BT_DBG("sock %p sk %p", sock, sk); - if (!hdev) - return -EBADFD; + if (peer) + return -EOPNOTSUPP; lock_sock(sk); + hdev = hci_pi(sk)->hdev; + if (!hdev) { + err = -EBADFD; + goto done; + } + *addr_len = sizeof(*haddr); haddr->hci_family = AF_BLUETOOTH; haddr->hci_dev = hdev->id; - haddr->hci_channel= 0; + haddr->hci_channel= hci_pi(sk)->channel; +done: release_sock(sk); - return 0; + return err; } static void hci_sock_cmsg(struct sock *sk, struct msghdr *msg, @@ -767,6 +871,7 @@ static int hci_sock_recvmsg(struct kiocb *iocb, struct socket *sock, case HCI_CHANNEL_RAW: hci_sock_cmsg(sk, msg, skb); break; + case HCI_CHANNEL_USER: case HCI_CHANNEL_CONTROL: case HCI_CHANNEL_MONITOR: sock_recv_timestamp(msg, sk, skb); @@ -801,6 +906,7 @@ static int hci_sock_sendmsg(struct kiocb *iocb, struct socket *sock, switch (hci_pi(sk)->channel) { case HCI_CHANNEL_RAW: + case HCI_CHANNEL_USER: break; case HCI_CHANNEL_CONTROL: err = mgmt_control(sk, msg, len); @@ -835,9 +941,9 @@ static int hci_sock_sendmsg(struct kiocb *iocb, struct socket *sock, bt_cb(skb)->pkt_type = *((unsigned char *) skb->data); skb_pull(skb, 1); - skb->dev = (void *) hdev; - if (bt_cb(skb)->pkt_type == HCI_COMMAND_PKT) { + if (hci_pi(sk)->channel == HCI_CHANNEL_RAW && + bt_cb(skb)->pkt_type == HCI_COMMAND_PKT) { u16 opcode = get_unaligned_le16(skb->data); u16 ogf = hci_opcode_ogf(opcode); u16 ocf = hci_opcode_ocf(opcode); @@ -868,6 +974,14 @@ static int hci_sock_sendmsg(struct kiocb *iocb, struct socket *sock, goto drop; } + if (hci_pi(sk)->channel == HCI_CHANNEL_USER && + bt_cb(skb)->pkt_type != HCI_COMMAND_PKT && + bt_cb(skb)->pkt_type != HCI_ACLDATA_PKT && + bt_cb(skb)->pkt_type != HCI_SCODATA_PKT) { + err = -EINVAL; + goto drop; + } + skb_queue_tail(&hdev->raw_q, skb); queue_work(hdev->workqueue, &hdev->tx_work); } @@ -895,7 +1009,7 @@ static int hci_sock_setsockopt(struct socket *sock, int level, int optname, lock_sock(sk); if (hci_pi(sk)->channel != HCI_CHANNEL_RAW) { - err = -EINVAL; + err = -EBADFD; goto done; } @@ -981,7 +1095,7 @@ static int hci_sock_getsockopt(struct socket *sock, int level, int optname, lock_sock(sk); if (hci_pi(sk)->channel != HCI_CHANNEL_RAW) { - err = -EINVAL; + err = -EBADFD; goto done; } diff --git a/net/bluetooth/hci_sysfs.c b/net/bluetooth/hci_sysfs.c index edf623a29043..0b61250cfdf9 100644 --- a/net/bluetooth/hci_sysfs.c +++ b/net/bluetooth/hci_sysfs.c @@ -1,17 +1,12 @@ /* Bluetooth HCI driver model support. */ -#include <linux/debugfs.h> #include <linux/module.h> -#include <asm/unaligned.h> #include <net/bluetooth/bluetooth.h> #include <net/bluetooth/hci_core.h> static struct class *bt_class; -struct dentry *bt_debugfs; -EXPORT_SYMBOL_GPL(bt_debugfs); - static inline char *link_typetostr(int type) { switch (type) { @@ -42,29 +37,15 @@ static ssize_t show_link_address(struct device *dev, return sprintf(buf, "%pMR\n", &conn->dst); } -static ssize_t show_link_features(struct device *dev, - struct device_attribute *attr, char *buf) -{ - struct hci_conn *conn = to_hci_conn(dev); - - return sprintf(buf, "0x%02x%02x%02x%02x%02x%02x%02x%02x\n", - conn->features[0][0], conn->features[0][1], - conn->features[0][2], conn->features[0][3], - conn->features[0][4], conn->features[0][5], - conn->features[0][6], conn->features[0][7]); -} - #define LINK_ATTR(_name, _mode, _show, _store) \ struct device_attribute link_attr_##_name = __ATTR(_name, _mode, _show, _store) static LINK_ATTR(type, S_IRUGO, show_link_type, NULL); static LINK_ATTR(address, S_IRUGO, show_link_address, NULL); -static LINK_ATTR(features, S_IRUGO, show_link_features, NULL); static struct attribute *bt_link_attrs[] = { &link_attr_type.attr, &link_attr_address.attr, - &link_attr_features.attr, NULL }; @@ -150,28 +131,6 @@ void hci_conn_del_sysfs(struct hci_conn *conn) hci_dev_put(hdev); } -static inline char *host_bustostr(int bus) -{ - switch (bus) { - case HCI_VIRTUAL: - return "VIRTUAL"; - case HCI_USB: - return "USB"; - case HCI_PCCARD: - return "PCCARD"; - case HCI_UART: - return "UART"; - case HCI_RS232: - return "RS232"; - case HCI_PCI: - return "PCI"; - case HCI_SDIO: - return "SDIO"; - default: - return "UNKNOWN"; - } -} - static inline char *host_typetostr(int type) { switch (type) { @@ -184,13 +143,6 @@ static inline char *host_typetostr(int type) } } -static ssize_t show_bus(struct device *dev, - struct device_attribute *attr, char *buf) -{ - struct hci_dev *hdev = to_hci_dev(dev); - return sprintf(buf, "%s\n", host_bustostr(hdev->bus)); -} - static ssize_t show_type(struct device *dev, struct device_attribute *attr, char *buf) { @@ -212,14 +164,6 @@ static ssize_t show_name(struct device *dev, return sprintf(buf, "%s\n", name); } -static ssize_t show_class(struct device *dev, - struct device_attribute *attr, char *buf) -{ - struct hci_dev *hdev = to_hci_dev(dev); - return sprintf(buf, "0x%.2x%.2x%.2x\n", hdev->dev_class[2], - hdev->dev_class[1], hdev->dev_class[0]); -} - static ssize_t show_address(struct device *dev, struct device_attribute *attr, char *buf) { @@ -227,150 +171,14 @@ static ssize_t show_address(struct device *dev, return sprintf(buf, "%pMR\n", &hdev->bdaddr); } -static ssize_t show_features(struct device *dev, - struct device_attribute *attr, char *buf) -{ - struct hci_dev *hdev = to_hci_dev(dev); - - return sprintf(buf, "0x%02x%02x%02x%02x%02x%02x%02x%02x\n", - hdev->features[0][0], hdev->features[0][1], - hdev->features[0][2], hdev->features[0][3], - hdev->features[0][4], hdev->features[0][5], - hdev->features[0][6], hdev->features[0][7]); -} - -static ssize_t show_manufacturer(struct device *dev, - struct device_attribute *attr, char *buf) -{ - struct hci_dev *hdev = to_hci_dev(dev); - return sprintf(buf, "%d\n", hdev->manufacturer); -} - -static ssize_t show_hci_version(struct device *dev, - struct device_attribute *attr, char *buf) -{ - struct hci_dev *hdev = to_hci_dev(dev); - return sprintf(buf, "%d\n", hdev->hci_ver); -} - -static ssize_t show_hci_revision(struct device *dev, - struct device_attribute *attr, char *buf) -{ - struct hci_dev *hdev = to_hci_dev(dev); - return sprintf(buf, "%d\n", hdev->hci_rev); -} - -static ssize_t show_idle_timeout(struct device *dev, - struct device_attribute *attr, char *buf) -{ - struct hci_dev *hdev = to_hci_dev(dev); - return sprintf(buf, "%d\n", hdev->idle_timeout); -} - -static ssize_t store_idle_timeout(struct device *dev, - struct device_attribute *attr, - const char *buf, size_t count) -{ - struct hci_dev *hdev = to_hci_dev(dev); - unsigned int val; - int rv; - - rv = kstrtouint(buf, 0, &val); - if (rv < 0) - return rv; - - if (val != 0 && (val < 500 || val > 3600000)) - return -EINVAL; - - hdev->idle_timeout = val; - - return count; -} - -static ssize_t show_sniff_max_interval(struct device *dev, - struct device_attribute *attr, char *buf) -{ - struct hci_dev *hdev = to_hci_dev(dev); - return sprintf(buf, "%d\n", hdev->sniff_max_interval); -} - -static ssize_t store_sniff_max_interval(struct device *dev, - struct device_attribute *attr, - const char *buf, size_t count) -{ - struct hci_dev *hdev = to_hci_dev(dev); - u16 val; - int rv; - - rv = kstrtou16(buf, 0, &val); - if (rv < 0) - return rv; - - if (val == 0 || val % 2 || val < hdev->sniff_min_interval) - return -EINVAL; - - hdev->sniff_max_interval = val; - - return count; -} - -static ssize_t show_sniff_min_interval(struct device *dev, - struct device_attribute *attr, char *buf) -{ - struct hci_dev *hdev = to_hci_dev(dev); - return sprintf(buf, "%d\n", hdev->sniff_min_interval); -} - -static ssize_t store_sniff_min_interval(struct device *dev, - struct device_attribute *attr, - const char *buf, size_t count) -{ - struct hci_dev *hdev = to_hci_dev(dev); - u16 val; - int rv; - - rv = kstrtou16(buf, 0, &val); - if (rv < 0) - return rv; - - if (val == 0 || val % 2 || val > hdev->sniff_max_interval) - return -EINVAL; - - hdev->sniff_min_interval = val; - - return count; -} - -static DEVICE_ATTR(bus, S_IRUGO, show_bus, NULL); static DEVICE_ATTR(type, S_IRUGO, show_type, NULL); static DEVICE_ATTR(name, S_IRUGO, show_name, NULL); -static DEVICE_ATTR(class, S_IRUGO, show_class, NULL); static DEVICE_ATTR(address, S_IRUGO, show_address, NULL); -static DEVICE_ATTR(features, S_IRUGO, show_features, NULL); -static DEVICE_ATTR(manufacturer, S_IRUGO, show_manufacturer, NULL); -static DEVICE_ATTR(hci_version, S_IRUGO, show_hci_version, NULL); -static DEVICE_ATTR(hci_revision, S_IRUGO, show_hci_revision, NULL); - -static DEVICE_ATTR(idle_timeout, S_IRUGO | S_IWUSR, - show_idle_timeout, store_idle_timeout); -static DEVICE_ATTR(sniff_max_interval, S_IRUGO | S_IWUSR, - show_sniff_max_interval, store_sniff_max_interval); -static DEVICE_ATTR(sniff_min_interval, S_IRUGO | S_IWUSR, - show_sniff_min_interval, store_sniff_min_interval); static struct attribute *bt_host_attrs[] = { - &dev_attr_bus.attr, &dev_attr_type.attr, &dev_attr_name.attr, - &dev_attr_class.attr, &dev_attr_address.attr, - &dev_attr_features.attr, - &dev_attr_manufacturer.attr, - &dev_attr_hci_version.attr, - &dev_attr_hci_revision.attr, - &dev_attr_idle_timeout.attr, - &dev_attr_sniff_max_interval.attr, - &dev_attr_sniff_min_interval.attr, NULL }; @@ -396,141 +204,6 @@ static struct device_type bt_host = { .release = bt_host_release, }; -static int inquiry_cache_show(struct seq_file *f, void *p) -{ - struct hci_dev *hdev = f->private; - struct discovery_state *cache = &hdev->discovery; - struct inquiry_entry *e; - - hci_dev_lock(hdev); - - list_for_each_entry(e, &cache->all, all) { - struct inquiry_data *data = &e->data; - seq_printf(f, "%pMR %d %d %d 0x%.2x%.2x%.2x 0x%.4x %d %d %u\n", - &data->bdaddr, - data->pscan_rep_mode, data->pscan_period_mode, - data->pscan_mode, data->dev_class[2], - data->dev_class[1], data->dev_class[0], - __le16_to_cpu(data->clock_offset), - data->rssi, data->ssp_mode, e->timestamp); - } - - hci_dev_unlock(hdev); - - return 0; -} - -static int inquiry_cache_open(struct inode *inode, struct file *file) -{ - return single_open(file, inquiry_cache_show, inode->i_private); -} - -static const struct file_operations inquiry_cache_fops = { - .open = inquiry_cache_open, - .read = seq_read, - .llseek = seq_lseek, - .release = single_release, -}; - -static int blacklist_show(struct seq_file *f, void *p) -{ - struct hci_dev *hdev = f->private; - struct bdaddr_list *b; - - hci_dev_lock(hdev); - - list_for_each_entry(b, &hdev->blacklist, list) - seq_printf(f, "%pMR\n", &b->bdaddr); - - hci_dev_unlock(hdev); - - return 0; -} - -static int blacklist_open(struct inode *inode, struct file *file) -{ - return single_open(file, blacklist_show, inode->i_private); -} - -static const struct file_operations blacklist_fops = { - .open = blacklist_open, - .read = seq_read, - .llseek = seq_lseek, - .release = single_release, -}; - -static void print_bt_uuid(struct seq_file *f, u8 *uuid) -{ - u32 data0, data5; - u16 data1, data2, data3, data4; - - data5 = get_unaligned_le32(uuid); - data4 = get_unaligned_le16(uuid + 4); - data3 = get_unaligned_le16(uuid + 6); - data2 = get_unaligned_le16(uuid + 8); - data1 = get_unaligned_le16(uuid + 10); - data0 = get_unaligned_le32(uuid + 12); - - seq_printf(f, "%.8x-%.4x-%.4x-%.4x-%.4x%.8x\n", - data0, data1, data2, data3, data4, data5); -} - -static int uuids_show(struct seq_file *f, void *p) -{ - struct hci_dev *hdev = f->private; - struct bt_uuid *uuid; - - hci_dev_lock(hdev); - - list_for_each_entry(uuid, &hdev->uuids, list) - print_bt_uuid(f, uuid->uuid); - - hci_dev_unlock(hdev); - - return 0; -} - -static int uuids_open(struct inode *inode, struct file *file) -{ - return single_open(file, uuids_show, inode->i_private); -} - -static const struct file_operations uuids_fops = { - .open = uuids_open, - .read = seq_read, - .llseek = seq_lseek, - .release = single_release, -}; - -static int auto_accept_delay_set(void *data, u64 val) -{ - struct hci_dev *hdev = data; - - hci_dev_lock(hdev); - - hdev->auto_accept_delay = val; - - hci_dev_unlock(hdev); - - return 0; -} - -static int auto_accept_delay_get(void *data, u64 *val) -{ - struct hci_dev *hdev = data; - - hci_dev_lock(hdev); - - *val = hdev->auto_accept_delay; - - hci_dev_unlock(hdev); - - return 0; -} - -DEFINE_SIMPLE_ATTRIBUTE(auto_accept_delay_fops, auto_accept_delay_get, - auto_accept_delay_set, "%llu\n"); - void hci_init_sysfs(struct hci_dev *hdev) { struct device *dev = &hdev->dev; @@ -542,52 +215,8 @@ void hci_init_sysfs(struct hci_dev *hdev) device_initialize(dev); } -int hci_add_sysfs(struct hci_dev *hdev) -{ - struct device *dev = &hdev->dev; - int err; - - BT_DBG("%p name %s bus %d", hdev, hdev->name, hdev->bus); - - dev_set_name(dev, "%s", hdev->name); - - err = device_add(dev); - if (err < 0) - return err; - - if (!bt_debugfs) - return 0; - - hdev->debugfs = debugfs_create_dir(hdev->name, bt_debugfs); - if (!hdev->debugfs) - return 0; - - debugfs_create_file("inquiry_cache", 0444, hdev->debugfs, - hdev, &inquiry_cache_fops); - - debugfs_create_file("blacklist", 0444, hdev->debugfs, - hdev, &blacklist_fops); - - debugfs_create_file("uuids", 0444, hdev->debugfs, hdev, &uuids_fops); - - debugfs_create_file("auto_accept_delay", 0444, hdev->debugfs, hdev, - &auto_accept_delay_fops); - return 0; -} - -void hci_del_sysfs(struct hci_dev *hdev) -{ - BT_DBG("%p name %s bus %d", hdev, hdev->name, hdev->bus); - - debugfs_remove_recursive(hdev->debugfs); - - device_del(&hdev->dev); -} - int __init bt_sysfs_init(void) { - bt_debugfs = debugfs_create_dir("bluetooth", NULL); - bt_class = class_create(THIS_MODULE, "bluetooth"); return PTR_ERR_OR_ZERO(bt_class); @@ -596,6 +225,4 @@ int __init bt_sysfs_init(void) void bt_sysfs_cleanup(void) { class_destroy(bt_class); - - debugfs_remove_recursive(bt_debugfs); } diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c index bdc35a7a7fee..292e619db896 100644 --- a/net/bluetooth/hidp/core.c +++ b/net/bluetooth/hidp/core.c @@ -767,10 +767,10 @@ static int hidp_setup_hid(struct hidp_session *session, strncpy(hid->name, req->name, sizeof(req->name) - 1); snprintf(hid->phys, sizeof(hid->phys), "%pMR", - &bt_sk(session->ctrl_sock->sk)->src); + &l2cap_pi(session->ctrl_sock->sk)->chan->src); snprintf(hid->uniq, sizeof(hid->uniq), "%pMR", - &bt_sk(session->ctrl_sock->sk)->dst); + &l2cap_pi(session->ctrl_sock->sk)->chan->dst); hid->dev.parent = &session->conn->hcon->dev; hid->ll_driver = &hidp_hid_driver; @@ -1283,23 +1283,29 @@ static int hidp_session_thread(void *arg) static int hidp_verify_sockets(struct socket *ctrl_sock, struct socket *intr_sock) { + struct l2cap_chan *ctrl_chan, *intr_chan; struct bt_sock *ctrl, *intr; struct hidp_session *session; if (!l2cap_is_socket(ctrl_sock) || !l2cap_is_socket(intr_sock)) return -EINVAL; + ctrl_chan = l2cap_pi(ctrl_sock->sk)->chan; + intr_chan = l2cap_pi(intr_sock->sk)->chan; + + if (bacmp(&ctrl_chan->src, &intr_chan->src) || + bacmp(&ctrl_chan->dst, &intr_chan->dst)) + return -ENOTUNIQ; + ctrl = bt_sk(ctrl_sock->sk); intr = bt_sk(intr_sock->sk); - if (bacmp(&ctrl->src, &intr->src) || bacmp(&ctrl->dst, &intr->dst)) - return -ENOTUNIQ; if (ctrl->sk.sk_state != BT_CONNECTED || intr->sk.sk_state != BT_CONNECTED) return -EBADFD; /* early session check, we check again during session registration */ - session = hidp_session_find(&ctrl->dst); + session = hidp_session_find(&ctrl_chan->dst); if (session) { hidp_session_put(session); return -EEXIST; @@ -1332,7 +1338,7 @@ int hidp_connection_add(struct hidp_connadd_req *req, if (!conn) return -EBADFD; - ret = hidp_session_new(&session, &bt_sk(ctrl_sock->sk)->dst, ctrl_sock, + ret = hidp_session_new(&session, &chan->dst, ctrl_sock, intr_sock, req, conn); if (ret) goto out_conn; diff --git a/net/bluetooth/hidp/hidp.h b/net/bluetooth/hidp/hidp.h index 9e6cc3553105..ab5241400cf7 100644 --- a/net/bluetooth/hidp/hidp.h +++ b/net/bluetooth/hidp/hidp.h @@ -182,7 +182,7 @@ struct hidp_session { }; /* HIDP init defines */ -extern int __init hidp_init_sockets(void); -extern void __exit hidp_cleanup_sockets(void); +int __init hidp_init_sockets(void); +void __exit hidp_cleanup_sockets(void); #endif /* __HIDP_H */ diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index 63fa11109a1c..0cef67707838 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -36,14 +36,15 @@ #include <net/bluetooth/bluetooth.h> #include <net/bluetooth/hci_core.h> #include <net/bluetooth/l2cap.h> -#include <net/bluetooth/smp.h> -#include <net/bluetooth/a2mp.h> -#include <net/bluetooth/amp.h> + +#include "smp.h" +#include "a2mp.h" +#include "amp.h" bool disable_ertm; -static u32 l2cap_feat_mask = L2CAP_FEAT_FIXED_CHAN; -static u8 l2cap_fixed_chan[8] = { L2CAP_FC_L2CAP, }; +static u32 l2cap_feat_mask = L2CAP_FEAT_FIXED_CHAN | L2CAP_FEAT_UCD; +static u8 l2cap_fixed_chan[8] = { L2CAP_FC_L2CAP | L2CAP_FC_CONNLESS, }; static LIST_HEAD(chan_list); static DEFINE_RWLOCK(chan_list_lock); @@ -58,6 +59,18 @@ static void l2cap_send_disconn_req(struct l2cap_chan *chan, int err); static void l2cap_tx(struct l2cap_chan *chan, struct l2cap_ctrl *control, struct sk_buff_head *skbs, u8 event); +static inline __u8 bdaddr_type(struct hci_conn *hcon, __u8 type) +{ + if (hcon->type == LE_LINK) { + if (type == ADDR_LE_DEV_PUBLIC) + return BDADDR_LE_PUBLIC; + else + return BDADDR_LE_RANDOM; + } + + return BDADDR_BREDR; +} + /* ---- L2CAP channels ---- */ static struct l2cap_chan *__l2cap_get_chan_by_dcid(struct l2cap_conn *conn, @@ -148,7 +161,7 @@ static struct l2cap_chan *__l2cap_global_chan_by_addr(__le16 psm, bdaddr_t *src) struct l2cap_chan *c; list_for_each_entry(c, &chan_list, global_l) { - if (c->sport == psm && !bacmp(&bt_sk(c->sk)->src, src)) + if (c->sport == psm && !bacmp(&c->src, src)) return c; } return NULL; @@ -210,38 +223,25 @@ static u16 l2cap_alloc_cid(struct l2cap_conn *conn) return 0; } -static void __l2cap_state_change(struct l2cap_chan *chan, int state) +static void l2cap_state_change(struct l2cap_chan *chan, int state) { BT_DBG("chan %p %s -> %s", chan, state_to_string(chan->state), state_to_string(state)); chan->state = state; - chan->ops->state_change(chan, state); -} - -static void l2cap_state_change(struct l2cap_chan *chan, int state) -{ - struct sock *sk = chan->sk; - - lock_sock(sk); - __l2cap_state_change(chan, state); - release_sock(sk); + chan->ops->state_change(chan, state, 0); } -static inline void __l2cap_chan_set_err(struct l2cap_chan *chan, int err) +static inline void l2cap_state_change_and_error(struct l2cap_chan *chan, + int state, int err) { - struct sock *sk = chan->sk; - - sk->sk_err = err; + chan->state = state; + chan->ops->state_change(chan, chan->state, err); } static inline void l2cap_chan_set_err(struct l2cap_chan *chan, int err) { - struct sock *sk = chan->sk; - - lock_sock(sk); - __l2cap_chan_set_err(chan, err); - release_sock(sk); + chan->ops->state_change(chan, chan->state, err); } static void __set_retrans_timer(struct l2cap_chan *chan) @@ -620,10 +620,8 @@ void l2cap_chan_del(struct l2cap_chan *chan, int err) void l2cap_chan_close(struct l2cap_chan *chan, int reason) { struct l2cap_conn *conn = chan->conn; - struct sock *sk = chan->sk; - BT_DBG("chan %p state %s sk %p", chan, state_to_string(chan->state), - sk); + BT_DBG("chan %p state %s", chan, state_to_string(chan->state)); switch (chan->state) { case BT_LISTEN: @@ -634,7 +632,7 @@ void l2cap_chan_close(struct l2cap_chan *chan, int reason) case BT_CONFIG: if (chan->chan_type == L2CAP_CHAN_CONN_ORIENTED && conn->hcon->type == ACL_LINK) { - __set_chan_timer(chan, sk->sk_sndtimeo); + __set_chan_timer(chan, chan->ops->get_sndtimeo(chan)); l2cap_send_disconn_req(chan, reason); } else l2cap_chan_del(chan, reason); @@ -646,10 +644,11 @@ void l2cap_chan_close(struct l2cap_chan *chan, int reason) struct l2cap_conn_rsp rsp; __u16 result; - if (test_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags)) + if (test_bit(FLAG_DEFER_SETUP, &chan->flags)) result = L2CAP_CR_SEC_BLOCK; else result = L2CAP_CR_BAD_PSM; + l2cap_state_change(chan, BT_DISCONN); rsp.scid = cpu_to_le16(chan->dcid); @@ -676,7 +675,8 @@ void l2cap_chan_close(struct l2cap_chan *chan, int reason) static inline u8 l2cap_get_auth_type(struct l2cap_chan *chan) { - if (chan->chan_type == L2CAP_CHAN_RAW) { + switch (chan->chan_type) { + case L2CAP_CHAN_RAW: switch (chan->sec_level) { case BT_SECURITY_HIGH: return HCI_AT_DEDICATED_BONDING_MITM; @@ -685,15 +685,29 @@ static inline u8 l2cap_get_auth_type(struct l2cap_chan *chan) default: return HCI_AT_NO_BONDING; } - } else if (chan->psm == __constant_cpu_to_le16(L2CAP_PSM_SDP)) { - if (chan->sec_level == BT_SECURITY_LOW) - chan->sec_level = BT_SECURITY_SDP; - + break; + case L2CAP_CHAN_CONN_LESS: + if (chan->psm == __constant_cpu_to_le16(L2CAP_PSM_3DSP)) { + if (chan->sec_level == BT_SECURITY_LOW) + chan->sec_level = BT_SECURITY_SDP; + } if (chan->sec_level == BT_SECURITY_HIGH) return HCI_AT_NO_BONDING_MITM; else return HCI_AT_NO_BONDING; - } else { + break; + case L2CAP_CHAN_CONN_ORIENTED: + if (chan->psm == __constant_cpu_to_le16(L2CAP_PSM_SDP)) { + if (chan->sec_level == BT_SECURITY_LOW) + chan->sec_level = BT_SECURITY_SDP; + + if (chan->sec_level == BT_SECURITY_HIGH) + return HCI_AT_NO_BONDING_MITM; + else + return HCI_AT_NO_BONDING; + } + /* fall through */ + default: switch (chan->sec_level) { case BT_SECURITY_HIGH: return HCI_AT_GENERAL_BONDING_MITM; @@ -702,6 +716,7 @@ static inline u8 l2cap_get_auth_type(struct l2cap_chan *chan) default: return HCI_AT_NO_BONDING; } + break; } } @@ -1015,14 +1030,29 @@ static inline int __l2cap_no_conn_pending(struct l2cap_chan *chan) static bool __amp_capable(struct l2cap_chan *chan) { struct l2cap_conn *conn = chan->conn; + struct hci_dev *hdev; + bool amp_available = false; - if (enable_hs && - hci_amp_capable() && - chan->chan_policy == BT_CHANNEL_POLICY_AMP_PREFERRED && - conn->fixed_chan_mask & L2CAP_FC_A2MP) - return true; - else + if (!conn->hs_enabled) return false; + + if (!(conn->fixed_chan_mask & L2CAP_FC_A2MP)) + return false; + + read_lock(&hci_dev_list_lock); + list_for_each_entry(hdev, &hci_dev_list, list) { + if (hdev->amp_type != AMP_TYPE_BREDR && + test_bit(HCI_UP, &hdev->flags)) { + amp_available = true; + break; + } + } + read_unlock(&hci_dev_list_lock); + + if (chan->chan_policy == BT_CHANNEL_POLICY_AMP_PREFERRED) + return amp_available; + + return false; } static bool l2cap_check_efs(struct l2cap_chan *chan) @@ -1186,7 +1216,6 @@ static inline int l2cap_mode_supported(__u8 mode, __u32 feat_mask) static void l2cap_send_disconn_req(struct l2cap_chan *chan, int err) { - struct sock *sk = chan->sk; struct l2cap_conn *conn = chan->conn; struct l2cap_disconn_req req; @@ -1209,10 +1238,7 @@ static void l2cap_send_disconn_req(struct l2cap_chan *chan, int err) l2cap_send_cmd(conn, l2cap_get_ident(conn), L2CAP_DISCONN_REQ, sizeof(req), &req); - lock_sock(sk); - __l2cap_state_change(chan, BT_DISCONN); - __l2cap_chan_set_err(chan, err); - release_sock(sk); + l2cap_state_change_and_error(chan, BT_DISCONN, err); } /* ---- L2CAP connections ---- */ @@ -1225,8 +1251,6 @@ static void l2cap_conn_start(struct l2cap_conn *conn) mutex_lock(&conn->chan_lock); list_for_each_entry_safe(chan, tmp, &conn->chan_l, list) { - struct sock *sk = chan->sk; - l2cap_chan_lock(chan); if (chan->chan_type != L2CAP_CHAN_CONN_ORIENTED) { @@ -1258,19 +1282,16 @@ static void l2cap_conn_start(struct l2cap_conn *conn) rsp.dcid = cpu_to_le16(chan->scid); if (l2cap_chan_check_security(chan)) { - lock_sock(sk); - if (test_bit(BT_SK_DEFER_SETUP, - &bt_sk(sk)->flags)) { + if (test_bit(FLAG_DEFER_SETUP, &chan->flags)) { rsp.result = __constant_cpu_to_le16(L2CAP_CR_PEND); rsp.status = __constant_cpu_to_le16(L2CAP_CS_AUTHOR_PEND); chan->ops->defer(chan); } else { - __l2cap_state_change(chan, BT_CONFIG); + l2cap_state_change(chan, BT_CONFIG); rsp.result = __constant_cpu_to_le16(L2CAP_CR_SUCCESS); rsp.status = __constant_cpu_to_le16(L2CAP_CS_NO_INFO); } - release_sock(sk); } else { rsp.result = __constant_cpu_to_le16(L2CAP_CR_PEND); rsp.status = __constant_cpu_to_le16(L2CAP_CS_AUTHEN_PEND); @@ -1309,8 +1330,6 @@ static struct l2cap_chan *l2cap_global_chan_by_scid(int state, u16 cid, read_lock(&chan_list_lock); list_for_each_entry(c, &chan_list, global_l) { - struct sock *sk = c->sk; - if (state && c->state != state) continue; @@ -1319,16 +1338,16 @@ static struct l2cap_chan *l2cap_global_chan_by_scid(int state, u16 cid, int src_any, dst_any; /* Exact match. */ - src_match = !bacmp(&bt_sk(sk)->src, src); - dst_match = !bacmp(&bt_sk(sk)->dst, dst); + src_match = !bacmp(&c->src, src); + dst_match = !bacmp(&c->dst, dst); if (src_match && dst_match) { read_unlock(&chan_list_lock); return c; } /* Closest match */ - src_any = !bacmp(&bt_sk(sk)->src, BDADDR_ANY); - dst_any = !bacmp(&bt_sk(sk)->dst, BDADDR_ANY); + src_any = !bacmp(&c->src, BDADDR_ANY); + dst_any = !bacmp(&c->dst, BDADDR_ANY); if ((src_match && dst_any) || (src_any && dst_match) || (src_any && dst_any)) c1 = c; @@ -1342,14 +1361,15 @@ static struct l2cap_chan *l2cap_global_chan_by_scid(int state, u16 cid, static void l2cap_le_conn_ready(struct l2cap_conn *conn) { - struct sock *parent; + struct hci_conn *hcon = conn->hcon; struct l2cap_chan *chan, *pchan; + u8 dst_type; BT_DBG(""); /* Check if we have socket listening on cid */ pchan = l2cap_global_chan_by_scid(BT_LISTEN, L2CAP_CID_ATT, - conn->src, conn->dst); + &hcon->src, &hcon->dst); if (!pchan) return; @@ -1357,9 +1377,13 @@ static void l2cap_le_conn_ready(struct l2cap_conn *conn) if (__l2cap_get_chan_by_dcid(conn, L2CAP_CID_ATT)) return; - parent = pchan->sk; + dst_type = bdaddr_type(hcon, hcon->dst_type); + + /* If device is blocked, do not create a channel for it */ + if (hci_blacklist_lookup(hcon->hdev, &hcon->dst, dst_type)) + return; - lock_sock(parent); + l2cap_chan_lock(pchan); chan = pchan->ops->new_connection(pchan); if (!chan) @@ -1367,13 +1391,15 @@ static void l2cap_le_conn_ready(struct l2cap_conn *conn) chan->dcid = L2CAP_CID_ATT; - bacpy(&bt_sk(chan->sk)->src, conn->src); - bacpy(&bt_sk(chan->sk)->dst, conn->dst); + bacpy(&chan->src, &hcon->src); + bacpy(&chan->dst, &hcon->dst); + chan->src_type = bdaddr_type(hcon, hcon->src_type); + chan->dst_type = dst_type; __l2cap_chan_add(conn, chan); clean: - release_sock(parent); + l2cap_chan_unlock(pchan); } static void l2cap_conn_ready(struct l2cap_conn *conn) @@ -1408,12 +1434,7 @@ static void l2cap_conn_ready(struct l2cap_conn *conn) l2cap_chan_ready(chan); } else if (chan->chan_type != L2CAP_CHAN_CONN_ORIENTED) { - struct sock *sk = chan->sk; - __clear_chan_timer(chan); - lock_sock(sk); - __l2cap_state_change(chan, BT_CONNECTED); - sk->sk_state_change(sk); - release_sock(sk); + l2cap_chan_ready(chan); } else if (chan->state == BT_CONNECT) { l2cap_do_start(chan); @@ -1633,11 +1654,12 @@ static struct l2cap_conn *l2cap_conn_add(struct hci_conn *hcon) break; } - conn->src = &hcon->hdev->bdaddr; - conn->dst = &hcon->dst; - conn->feat_mask = 0; + if (hcon->type == ACL_LINK) + conn->hs_enabled = test_bit(HCI_HS_ENABLED, + &hcon->hdev->dev_flags); + spin_lock_init(&conn->lock); mutex_init(&conn->chan_lock); @@ -1688,8 +1710,6 @@ static struct l2cap_chan *l2cap_global_chan_by_psm(int state, __le16 psm, read_lock(&chan_list_lock); list_for_each_entry(c, &chan_list, global_l) { - struct sock *sk = c->sk; - if (state && c->state != state) continue; @@ -1698,16 +1718,16 @@ static struct l2cap_chan *l2cap_global_chan_by_psm(int state, __le16 psm, int src_any, dst_any; /* Exact match. */ - src_match = !bacmp(&bt_sk(sk)->src, src); - dst_match = !bacmp(&bt_sk(sk)->dst, dst); + src_match = !bacmp(&c->src, src); + dst_match = !bacmp(&c->dst, dst); if (src_match && dst_match) { read_unlock(&chan_list_lock); return c; } /* Closest match */ - src_any = !bacmp(&bt_sk(sk)->src, BDADDR_ANY); - dst_any = !bacmp(&bt_sk(sk)->dst, BDADDR_ANY); + src_any = !bacmp(&c->src, BDADDR_ANY); + dst_any = !bacmp(&c->dst, BDADDR_ANY); if ((src_match && dst_any) || (src_any && dst_match) || (src_any && dst_any)) c1 = c; @@ -1722,18 +1742,16 @@ static struct l2cap_chan *l2cap_global_chan_by_psm(int state, __le16 psm, int l2cap_chan_connect(struct l2cap_chan *chan, __le16 psm, u16 cid, bdaddr_t *dst, u8 dst_type) { - struct sock *sk = chan->sk; - bdaddr_t *src = &bt_sk(sk)->src; struct l2cap_conn *conn; struct hci_conn *hcon; struct hci_dev *hdev; __u8 auth_type; int err; - BT_DBG("%pMR -> %pMR (type %u) psm 0x%2.2x", src, dst, + BT_DBG("%pMR -> %pMR (type %u) psm 0x%2.2x", &chan->src, dst, dst_type, __le16_to_cpu(psm)); - hdev = hci_get_route(dst, src); + hdev = hci_get_route(dst, &chan->src); if (!hdev) return -EHOSTUNREACH; @@ -1790,9 +1808,8 @@ int l2cap_chan_connect(struct l2cap_chan *chan, __le16 psm, u16 cid, } /* Set destination address and psm */ - lock_sock(sk); - bacpy(&bt_sk(sk)->dst, dst); - release_sock(sk); + bacpy(&chan->dst, dst); + chan->dst_type = dst_type; chan->psm = psm; chan->dcid = cid; @@ -1825,7 +1842,8 @@ int l2cap_chan_connect(struct l2cap_chan *chan, __le16 psm, u16 cid, } /* Update source addr of the socket */ - bacpy(src, conn->src); + bacpy(&chan->src, &hcon->src); + chan->src_type = bdaddr_type(hcon, hcon->src_type); l2cap_chan_unlock(chan); l2cap_chan_add(conn, chan); @@ -1835,7 +1853,7 @@ int l2cap_chan_connect(struct l2cap_chan *chan, __le16 psm, u16 cid, hci_conn_drop(hcon); l2cap_state_change(chan, BT_CONNECT); - __set_chan_timer(chan, sk->sk_sndtimeo); + __set_chan_timer(chan, chan->ops->get_sndtimeo(chan)); if (hcon->state == BT_CONNECTED) { if (chan->chan_type != L2CAP_CHAN_CONN_ORIENTED) { @@ -1855,38 +1873,6 @@ done: return err; } -int __l2cap_wait_ack(struct sock *sk) -{ - struct l2cap_chan *chan = l2cap_pi(sk)->chan; - DECLARE_WAITQUEUE(wait, current); - int err = 0; - int timeo = HZ/5; - - add_wait_queue(sk_sleep(sk), &wait); - set_current_state(TASK_INTERRUPTIBLE); - while (chan->unacked_frames > 0 && chan->conn) { - if (!timeo) - timeo = HZ/5; - - if (signal_pending(current)) { - err = sock_intr_errno(timeo); - break; - } - - release_sock(sk); - timeo = schedule_timeout(timeo); - lock_sock(sk); - set_current_state(TASK_INTERRUPTIBLE); - - err = sock_error(sk); - if (err) - break; - } - set_current_state(TASK_RUNNING); - remove_wait_queue(sk_sleep(sk), &wait); - return err; -} - static void l2cap_monitor_timeout(struct work_struct *work) { struct l2cap_chan *chan = container_of(work, struct l2cap_chan, @@ -2263,7 +2249,8 @@ static struct sk_buff *l2cap_create_connless_pdu(struct l2cap_chan *chan, int err, count, hlen = L2CAP_HDR_SIZE + L2CAP_PSMLEN_SIZE; struct l2cap_hdr *lh; - BT_DBG("chan %p len %zu priority %u", chan, len, priority); + BT_DBG("chan %p psm 0x%2.2x len %zu priority %u", chan, + __le16_to_cpu(chan->psm), len, priority); count = min_t(unsigned int, (conn->mtu - hlen), len); @@ -2278,7 +2265,7 @@ static struct sk_buff *l2cap_create_connless_pdu(struct l2cap_chan *chan, lh = (struct l2cap_hdr *) skb_put(skb, L2CAP_HDR_SIZE); lh->cid = cpu_to_le16(chan->dcid); lh->len = cpu_to_le16(len + L2CAP_PSMLEN_SIZE); - put_unaligned(chan->psm, skb_put(skb, L2CAP_PSMLEN_SIZE)); + put_unaligned(chan->psm, (__le16 *) skb_put(skb, L2CAP_PSMLEN_SIZE)); err = l2cap_skbuff_fromiovec(chan, msg, len, count, skb); if (unlikely(err < 0)) { @@ -2826,17 +2813,16 @@ static void l2cap_raw_recv(struct l2cap_conn *conn, struct sk_buff *skb) mutex_lock(&conn->chan_lock); list_for_each_entry(chan, &conn->chan_l, list) { - struct sock *sk = chan->sk; if (chan->chan_type != L2CAP_CHAN_RAW) continue; - /* Don't send frame to the socket it came from */ - if (skb->sk == sk) + /* Don't send frame to the channel it came from */ + if (bt_cb(skb)->chan == chan) continue; + nskb = skb_clone(skb, GFP_KERNEL); if (!nskb) continue; - if (chan->ops->recv(chan, nskb)) kfree_skb(nskb); } @@ -3043,8 +3029,8 @@ int l2cap_ertm_init(struct l2cap_chan *chan) skb_queue_head_init(&chan->tx_q); - chan->local_amp_id = 0; - chan->move_id = 0; + chan->local_amp_id = AMP_ID_BREDR; + chan->move_id = AMP_ID_BREDR; chan->move_state = L2CAP_MOVE_STABLE; chan->move_role = L2CAP_MOVE_ROLE_NONE; @@ -3084,20 +3070,20 @@ static inline __u8 l2cap_select_mode(__u8 mode, __u16 remote_feat_mask) } } -static inline bool __l2cap_ews_supported(struct l2cap_chan *chan) +static inline bool __l2cap_ews_supported(struct l2cap_conn *conn) { - return enable_hs && chan->conn->feat_mask & L2CAP_FEAT_EXT_WINDOW; + return conn->hs_enabled && conn->feat_mask & L2CAP_FEAT_EXT_WINDOW; } -static inline bool __l2cap_efs_supported(struct l2cap_chan *chan) +static inline bool __l2cap_efs_supported(struct l2cap_conn *conn) { - return enable_hs && chan->conn->feat_mask & L2CAP_FEAT_EXT_FLOW; + return conn->hs_enabled && conn->feat_mask & L2CAP_FEAT_EXT_FLOW; } static void __l2cap_set_ertm_timeouts(struct l2cap_chan *chan, struct l2cap_conf_rfc *rfc) { - if (chan->local_amp_id && chan->hs_hcon) { + if (chan->local_amp_id != AMP_ID_BREDR && chan->hs_hcon) { u64 ertm_to = chan->hs_hcon->hdev->amp_be_flush_to; /* Class 1 devices have must have ERTM timeouts @@ -3135,7 +3121,7 @@ static void __l2cap_set_ertm_timeouts(struct l2cap_chan *chan, static inline void l2cap_txwin_setup(struct l2cap_chan *chan) { if (chan->tx_win > L2CAP_DEFAULT_TX_WINDOW && - __l2cap_ews_supported(chan)) { + __l2cap_ews_supported(chan->conn)) { /* use extended control field */ set_bit(FLAG_EXT_CTRL, &chan->flags); chan->tx_win_max = L2CAP_DEFAULT_EXT_WINDOW; @@ -3165,7 +3151,7 @@ static int l2cap_build_conf_req(struct l2cap_chan *chan, void *data) if (test_bit(CONF_STATE2_DEVICE, &chan->conf_state)) break; - if (__l2cap_efs_supported(chan)) + if (__l2cap_efs_supported(chan->conn)) set_bit(FLAG_EFS_ENABLE, &chan->flags); /* fall through */ @@ -3317,7 +3303,7 @@ static int l2cap_parse_conf_req(struct l2cap_chan *chan, void *data) break; case L2CAP_CONF_EWS: - if (!enable_hs) + if (!chan->conn->hs_enabled) return -ECONNREFUSED; set_bit(FLAG_EXT_CTRL, &chan->flags); @@ -3349,7 +3335,7 @@ static int l2cap_parse_conf_req(struct l2cap_chan *chan, void *data) } if (remote_efs) { - if (__l2cap_efs_supported(chan)) + if (__l2cap_efs_supported(chan->conn)) set_bit(FLAG_EFS_ENABLE, &chan->flags); else return -ECONNREFUSED; @@ -3715,7 +3701,6 @@ static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn, struct l2cap_conn_req *req = (struct l2cap_conn_req *) data; struct l2cap_conn_rsp rsp; struct l2cap_chan *chan = NULL, *pchan; - struct sock *parent, *sk = NULL; int result, status = L2CAP_CS_NO_INFO; u16 dcid = 0, scid = __le16_to_cpu(req->scid); @@ -3724,16 +3709,15 @@ static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn, BT_DBG("psm 0x%2.2x scid 0x%4.4x", __le16_to_cpu(psm), scid); /* Check if we have socket listening on psm */ - pchan = l2cap_global_chan_by_psm(BT_LISTEN, psm, conn->src, conn->dst); + pchan = l2cap_global_chan_by_psm(BT_LISTEN, psm, &conn->hcon->src, + &conn->hcon->dst); if (!pchan) { result = L2CAP_CR_BAD_PSM; goto sendresp; } - parent = pchan->sk; - mutex_lock(&conn->chan_lock); - lock_sock(parent); + l2cap_chan_lock(pchan); /* Check if the ACL is secure enough (if not SDP) */ if (psm != __constant_cpu_to_le16(L2CAP_PSM_SDP) && @@ -3753,8 +3737,6 @@ static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn, if (!chan) goto response; - sk = chan->sk; - /* For certain devices (ex: HID mouse), support for authentication, * pairing and bonding is optional. For such devices, inorder to avoid * the ACL alive for too long after L2CAP disconnection, reset the ACL @@ -3762,8 +3744,10 @@ static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn, */ conn->hcon->disc_timeout = HCI_DISCONN_TIMEOUT; - bacpy(&bt_sk(sk)->src, conn->src); - bacpy(&bt_sk(sk)->dst, conn->dst); + bacpy(&chan->src, &conn->hcon->src); + bacpy(&chan->dst, &conn->hcon->dst); + chan->src_type = bdaddr_type(conn->hcon, conn->hcon->src_type); + chan->dst_type = bdaddr_type(conn->hcon, conn->hcon->dst_type); chan->psm = psm; chan->dcid = scid; chan->local_amp_id = amp_id; @@ -3772,14 +3756,14 @@ static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn, dcid = chan->scid; - __set_chan_timer(chan, sk->sk_sndtimeo); + __set_chan_timer(chan, chan->ops->get_sndtimeo(chan)); chan->ident = cmd->ident; if (conn->info_state & L2CAP_INFO_FEAT_MASK_REQ_DONE) { if (l2cap_chan_check_security(chan)) { - if (test_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags)) { - __l2cap_state_change(chan, BT_CONNECT2); + if (test_bit(FLAG_DEFER_SETUP, &chan->flags)) { + l2cap_state_change(chan, BT_CONNECT2); result = L2CAP_CR_PEND; status = L2CAP_CS_AUTHOR_PEND; chan->ops->defer(chan); @@ -3788,28 +3772,28 @@ static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn, * The connection will succeed after the * physical link is up. */ - if (amp_id) { - __l2cap_state_change(chan, BT_CONNECT2); - result = L2CAP_CR_PEND; - } else { - __l2cap_state_change(chan, BT_CONFIG); + if (amp_id == AMP_ID_BREDR) { + l2cap_state_change(chan, BT_CONFIG); result = L2CAP_CR_SUCCESS; + } else { + l2cap_state_change(chan, BT_CONNECT2); + result = L2CAP_CR_PEND; } status = L2CAP_CS_NO_INFO; } } else { - __l2cap_state_change(chan, BT_CONNECT2); + l2cap_state_change(chan, BT_CONNECT2); result = L2CAP_CR_PEND; status = L2CAP_CS_AUTHEN_PEND; } } else { - __l2cap_state_change(chan, BT_CONNECT2); + l2cap_state_change(chan, BT_CONNECT2); result = L2CAP_CR_PEND; status = L2CAP_CS_NO_INFO; } response: - release_sock(parent); + l2cap_chan_unlock(pchan); mutex_unlock(&conn->chan_lock); sendresp: @@ -3891,13 +3875,13 @@ static int l2cap_connect_create_rsp(struct l2cap_conn *conn, if (scid) { chan = __l2cap_get_chan_by_scid(conn, scid); if (!chan) { - err = -EFAULT; + err = -EBADSLT; goto unlock; } } else { chan = __l2cap_get_chan_by_ident(conn, cmd->ident); if (!chan) { - err = -EFAULT; + err = -EBADSLT; goto unlock; } } @@ -3965,6 +3949,18 @@ static void l2cap_send_efs_conf_rsp(struct l2cap_chan *chan, void *data, L2CAP_CONF_SUCCESS, flags), data); } +static void cmd_reject_invalid_cid(struct l2cap_conn *conn, u8 ident, + u16 scid, u16 dcid) +{ + struct l2cap_cmd_rej_cid rej; + + rej.reason = __constant_cpu_to_le16(L2CAP_REJ_INVALID_CID); + rej.scid = __cpu_to_le16(scid); + rej.dcid = __cpu_to_le16(dcid); + + l2cap_send_cmd(conn, ident, L2CAP_COMMAND_REJ, sizeof(rej), &rej); +} + static inline int l2cap_config_req(struct l2cap_conn *conn, struct l2cap_cmd_hdr *cmd, u16 cmd_len, u8 *data) @@ -3984,18 +3980,14 @@ static inline int l2cap_config_req(struct l2cap_conn *conn, BT_DBG("dcid 0x%4.4x flags 0x%2.2x", dcid, flags); chan = l2cap_get_chan_by_scid(conn, dcid); - if (!chan) - return -ENOENT; + if (!chan) { + cmd_reject_invalid_cid(conn, cmd->ident, dcid, 0); + return 0; + } if (chan->state != BT_CONFIG && chan->state != BT_CONNECT2) { - struct l2cap_cmd_rej_cid rej; - - rej.reason = __constant_cpu_to_le16(L2CAP_REJ_INVALID_CID); - rej.scid = cpu_to_le16(chan->scid); - rej.dcid = cpu_to_le16(chan->dcid); - - l2cap_send_cmd(conn, cmd->ident, L2CAP_COMMAND_REJ, - sizeof(rej), &rej); + cmd_reject_invalid_cid(conn, cmd->ident, chan->scid, + chan->dcid); goto unlock; } @@ -4198,7 +4190,6 @@ static inline int l2cap_disconnect_req(struct l2cap_conn *conn, struct l2cap_disconn_rsp rsp; u16 dcid, scid; struct l2cap_chan *chan; - struct sock *sk; if (cmd_len != sizeof(*req)) return -EPROTO; @@ -4213,20 +4204,17 @@ static inline int l2cap_disconnect_req(struct l2cap_conn *conn, chan = __l2cap_get_chan_by_scid(conn, dcid); if (!chan) { mutex_unlock(&conn->chan_lock); + cmd_reject_invalid_cid(conn, cmd->ident, dcid, scid); return 0; } l2cap_chan_lock(chan); - sk = chan->sk; - rsp.dcid = cpu_to_le16(chan->scid); rsp.scid = cpu_to_le16(chan->dcid); l2cap_send_cmd(conn, cmd->ident, L2CAP_DISCONN_RSP, sizeof(rsp), &rsp); - lock_sock(sk); - sk->sk_shutdown = SHUTDOWN_MASK; - release_sock(sk); + chan->ops->set_shutdown(chan); l2cap_chan_hold(chan); l2cap_chan_del(chan, ECONNRESET); @@ -4303,7 +4291,7 @@ static inline int l2cap_information_req(struct l2cap_conn *conn, if (!disable_ertm) feat_mask |= L2CAP_FEAT_ERTM | L2CAP_FEAT_STREAMING | L2CAP_FEAT_FCS; - if (enable_hs) + if (conn->hs_enabled) feat_mask |= L2CAP_FEAT_EXT_FLOW | L2CAP_FEAT_EXT_WINDOW; @@ -4314,7 +4302,7 @@ static inline int l2cap_information_req(struct l2cap_conn *conn, u8 buf[12]; struct l2cap_info_rsp *rsp = (struct l2cap_info_rsp *) buf; - if (enable_hs) + if (conn->hs_enabled) l2cap_fixed_chan[0] |= L2CAP_FC_A2MP; else l2cap_fixed_chan[0] &= ~L2CAP_FC_A2MP; @@ -4411,7 +4399,7 @@ static int l2cap_create_channel_req(struct l2cap_conn *conn, if (cmd_len != sizeof(*req)) return -EPROTO; - if (!enable_hs) + if (!conn->hs_enabled) return -EINVAL; psm = le16_to_cpu(req->psm); @@ -4420,7 +4408,7 @@ static int l2cap_create_channel_req(struct l2cap_conn *conn, BT_DBG("psm 0x%2.2x, scid 0x%4.4x, amp_id %d", psm, scid, req->amp_id); /* For controller id 0 make BR/EDR connection */ - if (req->amp_id == HCI_BREDR_ID) { + if (req->amp_id == AMP_ID_BREDR) { l2cap_connect(conn, cmd, data, L2CAP_CREATE_CHAN_RSP, req->amp_id); return 0; @@ -4442,10 +4430,13 @@ static int l2cap_create_channel_req(struct l2cap_conn *conn, struct amp_mgr *mgr = conn->hcon->amp_mgr; struct hci_conn *hs_hcon; - hs_hcon = hci_conn_hash_lookup_ba(hdev, AMP_LINK, conn->dst); + hs_hcon = hci_conn_hash_lookup_ba(hdev, AMP_LINK, + &conn->hcon->dst); if (!hs_hcon) { hci_dev_put(hdev); - return -EFAULT; + cmd_reject_invalid_cid(conn, cmd->ident, chan->scid, + chan->dcid); + return 0; } BT_DBG("mgr %p bredr_chan %p hs_hcon %p", mgr, chan, hs_hcon); @@ -4469,7 +4460,7 @@ error: l2cap_send_cmd(conn, cmd->ident, L2CAP_CREATE_CHAN_RSP, sizeof(rsp), &rsp); - return -EFAULT; + return 0; } static void l2cap_send_move_chan_req(struct l2cap_chan *chan, u8 dest_amp_id) @@ -4655,7 +4646,7 @@ void l2cap_logical_cfm(struct l2cap_chan *chan, struct hci_chan *hchan, if (chan->state != BT_CONNECTED) { /* Ignore logical link if channel is on BR/EDR */ - if (chan->local_amp_id) + if (chan->local_amp_id != AMP_ID_BREDR) l2cap_logical_finish_create(chan, hchan); } else { l2cap_logical_finish_move(chan, hchan); @@ -4666,7 +4657,7 @@ void l2cap_move_start(struct l2cap_chan *chan) { BT_DBG("chan %p", chan); - if (chan->local_amp_id == HCI_BREDR_ID) { + if (chan->local_amp_id == AMP_ID_BREDR) { if (chan->chan_policy != BT_CHANNEL_POLICY_AMP_PREFERRED) return; chan->move_role = L2CAP_MOVE_ROLE_INITIATOR; @@ -4723,7 +4714,7 @@ static void l2cap_do_create(struct l2cap_chan *chan, int result, sizeof(rsp), &rsp); if (result == L2CAP_CR_SUCCESS) { - __l2cap_state_change(chan, BT_CONFIG); + l2cap_state_change(chan, BT_CONFIG); set_bit(CONF_REQ_SENT, &chan->conf_state); l2cap_send_cmd(chan->conn, l2cap_get_ident(chan->conn), L2CAP_CONF_REQ, @@ -4838,7 +4829,7 @@ static inline int l2cap_move_channel_req(struct l2cap_conn *conn, BT_DBG("icid 0x%4.4x, dest_amp_id %d", icid, req->dest_amp_id); - if (!enable_hs) + if (!conn->hs_enabled) return -EINVAL; chan = l2cap_get_chan_by_dcid(conn, icid); @@ -4865,7 +4856,7 @@ static inline int l2cap_move_channel_req(struct l2cap_conn *conn, goto send_move_response; } - if (req->dest_amp_id) { + if (req->dest_amp_id != AMP_ID_BREDR) { struct hci_dev *hdev; hdev = hci_dev_get(req->dest_amp_id); if (!hdev || hdev->dev_type != HCI_AMP || @@ -4885,7 +4876,7 @@ static inline int l2cap_move_channel_req(struct l2cap_conn *conn, */ if ((__chan_is_moving(chan) || chan->move_role != L2CAP_MOVE_ROLE_NONE) && - bacmp(conn->src, conn->dst) > 0) { + bacmp(&conn->hcon->src, &conn->hcon->dst) > 0) { result = L2CAP_MR_COLLISION; goto send_move_response; } @@ -4895,7 +4886,7 @@ static inline int l2cap_move_channel_req(struct l2cap_conn *conn, chan->move_id = req->dest_amp_id; icid = chan->dcid; - if (!req->dest_amp_id) { + if (req->dest_amp_id == AMP_ID_BREDR) { /* Moving to BR/EDR */ if (test_bit(CONN_LOCAL_BUSY, &chan->conn_state)) { chan->move_state = L2CAP_MOVE_WAIT_LOCAL_BUSY; @@ -5087,7 +5078,7 @@ static int l2cap_move_channel_confirm(struct l2cap_conn *conn, if (chan->move_state == L2CAP_MOVE_WAIT_CONFIRM) { if (result == L2CAP_MC_CONFIRMED) { chan->local_amp_id = chan->move_id; - if (!chan->local_amp_id) + if (chan->local_amp_id == AMP_ID_BREDR) __release_logical_link(chan); } else { chan->move_id = chan->local_amp_id; @@ -5127,7 +5118,7 @@ static inline int l2cap_move_channel_confirm_rsp(struct l2cap_conn *conn, if (chan->move_state == L2CAP_MOVE_WAIT_CONFIRM_RSP) { chan->local_amp_id = chan->move_id; - if (!chan->local_amp_id && chan->hs_hchan) + if (chan->local_amp_id == AMP_ID_BREDR && chan->hs_hchan) __release_logical_link(chan); l2cap_move_done(chan); @@ -5219,7 +5210,7 @@ static inline int l2cap_bredr_sig_cmd(struct l2cap_conn *conn, case L2CAP_CONN_RSP: case L2CAP_CREATE_CHAN_RSP: - err = l2cap_connect_create_rsp(conn, cmd, cmd_len, data); + l2cap_connect_create_rsp(conn, cmd, cmd_len, data); break; case L2CAP_CONF_REQ: @@ -5227,7 +5218,7 @@ static inline int l2cap_bredr_sig_cmd(struct l2cap_conn *conn, break; case L2CAP_CONF_RSP: - err = l2cap_config_rsp(conn, cmd, cmd_len, data); + l2cap_config_rsp(conn, cmd, cmd_len, data); break; case L2CAP_DISCONN_REQ: @@ -5235,7 +5226,7 @@ static inline int l2cap_bredr_sig_cmd(struct l2cap_conn *conn, break; case L2CAP_DISCONN_RSP: - err = l2cap_disconnect_rsp(conn, cmd, cmd_len, data); + l2cap_disconnect_rsp(conn, cmd, cmd_len, data); break; case L2CAP_ECHO_REQ: @@ -5250,7 +5241,7 @@ static inline int l2cap_bredr_sig_cmd(struct l2cap_conn *conn, break; case L2CAP_INFO_RSP: - err = l2cap_information_rsp(conn, cmd, cmd_len, data); + l2cap_information_rsp(conn, cmd, cmd_len, data); break; case L2CAP_CREATE_CHAN_REQ: @@ -5262,7 +5253,7 @@ static inline int l2cap_bredr_sig_cmd(struct l2cap_conn *conn, break; case L2CAP_MOVE_CHAN_RSP: - err = l2cap_move_channel_rsp(conn, cmd, cmd_len, data); + l2cap_move_channel_rsp(conn, cmd, cmd_len, data); break; case L2CAP_MOVE_CHAN_CFM: @@ -5270,7 +5261,7 @@ static inline int l2cap_bredr_sig_cmd(struct l2cap_conn *conn, break; case L2CAP_MOVE_CHAN_CFM_RSP: - err = l2cap_move_channel_confirm_rsp(conn, cmd, cmd_len, data); + l2cap_move_channel_confirm_rsp(conn, cmd, cmd_len, data); break; default: @@ -5304,51 +5295,48 @@ static inline int l2cap_le_sig_cmd(struct l2cap_conn *conn, static inline void l2cap_le_sig_channel(struct l2cap_conn *conn, struct sk_buff *skb) { - u8 *data = skb->data; - int len = skb->len; - struct l2cap_cmd_hdr cmd; + struct hci_conn *hcon = conn->hcon; + struct l2cap_cmd_hdr *cmd; + u16 len; int err; - l2cap_raw_recv(conn, skb); + if (hcon->type != LE_LINK) + goto drop; - while (len >= L2CAP_CMD_HDR_SIZE) { - u16 cmd_len; - memcpy(&cmd, data, L2CAP_CMD_HDR_SIZE); - data += L2CAP_CMD_HDR_SIZE; - len -= L2CAP_CMD_HDR_SIZE; + if (skb->len < L2CAP_CMD_HDR_SIZE) + goto drop; - cmd_len = le16_to_cpu(cmd.len); + cmd = (void *) skb->data; + skb_pull(skb, L2CAP_CMD_HDR_SIZE); - BT_DBG("code 0x%2.2x len %d id 0x%2.2x", cmd.code, cmd_len, - cmd.ident); + len = le16_to_cpu(cmd->len); - if (cmd_len > len || !cmd.ident) { - BT_DBG("corrupted command"); - break; - } + BT_DBG("code 0x%2.2x len %d id 0x%2.2x", cmd->code, len, cmd->ident); - err = l2cap_le_sig_cmd(conn, &cmd, data); - if (err) { - struct l2cap_cmd_rej_unk rej; + if (len != skb->len || !cmd->ident) { + BT_DBG("corrupted command"); + goto drop; + } - BT_ERR("Wrong link type (%d)", err); + err = l2cap_le_sig_cmd(conn, cmd, skb->data); + if (err) { + struct l2cap_cmd_rej_unk rej; - /* FIXME: Map err to a valid reason */ - rej.reason = __constant_cpu_to_le16(L2CAP_REJ_NOT_UNDERSTOOD); - l2cap_send_cmd(conn, cmd.ident, L2CAP_COMMAND_REJ, - sizeof(rej), &rej); - } + BT_ERR("Wrong link type (%d)", err); - data += cmd_len; - len -= cmd_len; + rej.reason = __constant_cpu_to_le16(L2CAP_REJ_NOT_UNDERSTOOD); + l2cap_send_cmd(conn, cmd->ident, L2CAP_COMMAND_REJ, + sizeof(rej), &rej); } +drop: kfree_skb(skb); } static inline void l2cap_sig_channel(struct l2cap_conn *conn, struct sk_buff *skb) { + struct hci_conn *hcon = conn->hcon; u8 *data = skb->data; int len = skb->len; struct l2cap_cmd_hdr cmd; @@ -5356,6 +5344,9 @@ static inline void l2cap_sig_channel(struct l2cap_conn *conn, l2cap_raw_recv(conn, skb); + if (hcon->type != ACL_LINK) + goto drop; + while (len >= L2CAP_CMD_HDR_SIZE) { u16 cmd_len; memcpy(&cmd, data, L2CAP_CMD_HDR_SIZE); @@ -5378,7 +5369,6 @@ static inline void l2cap_sig_channel(struct l2cap_conn *conn, BT_ERR("Wrong link type (%d)", err); - /* FIXME: Map err to a valid reason */ rej.reason = __constant_cpu_to_le16(L2CAP_REJ_NOT_UNDERSTOOD); l2cap_send_cmd(conn, cmd.ident, L2CAP_COMMAND_REJ, sizeof(rej), &rej); @@ -5388,6 +5378,7 @@ static inline void l2cap_sig_channel(struct l2cap_conn *conn, len -= cmd_len; } +drop: kfree_skb(skb); } @@ -5784,7 +5775,7 @@ static int l2cap_rx_state_recv(struct l2cap_chan *chan, struct sk_buff *skb, u8 event) { int err = 0; - bool skb_in_use = 0; + bool skb_in_use = false; BT_DBG("chan %p, control %p, skb %p, event %d", chan, control, skb, event); @@ -5805,7 +5796,7 @@ static int l2cap_rx_state_recv(struct l2cap_chan *chan, control->txseq); chan->buffer_seq = chan->expected_tx_seq; - skb_in_use = 1; + skb_in_use = true; err = l2cap_reassemble_sdu(chan, skb, control); if (err) @@ -5841,7 +5832,7 @@ static int l2cap_rx_state_recv(struct l2cap_chan *chan, * current frame is stored for later use. */ skb_queue_tail(&chan->srej_q, skb); - skb_in_use = 1; + skb_in_use = true; BT_DBG("Queued %p (queue len %d)", skb, skb_queue_len(&chan->srej_q)); @@ -5919,7 +5910,7 @@ static int l2cap_rx_state_srej_sent(struct l2cap_chan *chan, { int err = 0; u16 txseq = control->txseq; - bool skb_in_use = 0; + bool skb_in_use = false; BT_DBG("chan %p, control %p, skb %p, event %d", chan, control, skb, event); @@ -5931,7 +5922,7 @@ static int l2cap_rx_state_srej_sent(struct l2cap_chan *chan, /* Keep frame for reassembly later */ l2cap_pass_to_tx(chan, control); skb_queue_tail(&chan->srej_q, skb); - skb_in_use = 1; + skb_in_use = true; BT_DBG("Queued %p (queue len %d)", skb, skb_queue_len(&chan->srej_q)); @@ -5942,7 +5933,7 @@ static int l2cap_rx_state_srej_sent(struct l2cap_chan *chan, l2cap_pass_to_tx(chan, control); skb_queue_tail(&chan->srej_q, skb); - skb_in_use = 1; + skb_in_use = true; BT_DBG("Queued %p (queue len %d)", skb, skb_queue_len(&chan->srej_q)); @@ -5957,7 +5948,7 @@ static int l2cap_rx_state_srej_sent(struct l2cap_chan *chan, * the missing frames. */ skb_queue_tail(&chan->srej_q, skb); - skb_in_use = 1; + skb_in_use = true; BT_DBG("Queued %p (queue len %d)", skb, skb_queue_len(&chan->srej_q)); @@ -5971,7 +5962,7 @@ static int l2cap_rx_state_srej_sent(struct l2cap_chan *chan, * SREJ'd frames. */ skb_queue_tail(&chan->srej_q, skb); - skb_in_use = 1; + skb_in_use = true; BT_DBG("Queued %p (queue len %d)", skb, skb_queue_len(&chan->srej_q)); @@ -6380,9 +6371,13 @@ done: static void l2cap_conless_channel(struct l2cap_conn *conn, __le16 psm, struct sk_buff *skb) { + struct hci_conn *hcon = conn->hcon; struct l2cap_chan *chan; - chan = l2cap_global_chan_by_psm(0, psm, conn->src, conn->dst); + if (hcon->type != ACL_LINK) + goto drop; + + chan = l2cap_global_chan_by_psm(0, psm, &hcon->src, &hcon->dst); if (!chan) goto drop; @@ -6394,6 +6389,10 @@ static void l2cap_conless_channel(struct l2cap_conn *conn, __le16 psm, if (chan->imtu < skb->len) goto drop; + /* Store remote BD_ADDR and PSM for msg_name */ + bacpy(&bt_cb(skb)->bdaddr, &hcon->dst); + bt_cb(skb)->psm = psm; + if (!chan->ops->recv(chan, skb)) return; @@ -6404,15 +6403,22 @@ drop: static void l2cap_att_channel(struct l2cap_conn *conn, struct sk_buff *skb) { + struct hci_conn *hcon = conn->hcon; struct l2cap_chan *chan; + if (hcon->type != LE_LINK) + goto drop; + chan = l2cap_global_chan_by_scid(BT_CONNECTED, L2CAP_CID_ATT, - conn->src, conn->dst); + &hcon->src, &hcon->dst); if (!chan) goto drop; BT_DBG("chan %p, len %d", chan, skb->len); + if (hci_blacklist_lookup(hcon->hdev, &hcon->dst, hcon->dst_type)) + goto drop; + if (chan->imtu < skb->len) goto drop; @@ -6441,9 +6447,6 @@ static void l2cap_recv_frame(struct l2cap_conn *conn, struct sk_buff *skb) BT_DBG("len %d, cid 0x%4.4x", len, cid); switch (cid) { - case L2CAP_CID_LE_SIGNALING: - l2cap_le_sig_channel(conn, skb); - break; case L2CAP_CID_SIGNALING: l2cap_sig_channel(conn, skb); break; @@ -6458,6 +6461,10 @@ static void l2cap_recv_frame(struct l2cap_conn *conn, struct sk_buff *skb) l2cap_att_channel(conn, skb); break; + case L2CAP_CID_LE_SIGNALING: + l2cap_le_sig_channel(conn, skb); + break; + case L2CAP_CID_SMP: if (smp_sig_channel(conn, skb)) l2cap_conn_del(conn->hcon, EACCES); @@ -6481,17 +6488,15 @@ int l2cap_connect_ind(struct hci_dev *hdev, bdaddr_t *bdaddr) /* Find listening sockets and check their link_mode */ read_lock(&chan_list_lock); list_for_each_entry(c, &chan_list, global_l) { - struct sock *sk = c->sk; - if (c->state != BT_LISTEN) continue; - if (!bacmp(&bt_sk(sk)->src, &hdev->bdaddr)) { + if (!bacmp(&c->src, &hdev->bdaddr)) { lm1 |= HCI_LM_ACCEPT; if (test_bit(FLAG_ROLE_SWITCH, &c->flags)) lm1 |= HCI_LM_MASTER; exact++; - } else if (!bacmp(&bt_sk(sk)->src, BDADDR_ANY)) { + } else if (!bacmp(&c->src, BDADDR_ANY)) { lm2 |= HCI_LM_ACCEPT; if (test_bit(FLAG_ROLE_SWITCH, &c->flags)) lm2 |= HCI_LM_MASTER; @@ -6597,11 +6602,7 @@ int l2cap_security_cfm(struct hci_conn *hcon, u8 status, u8 encrypt) if (!status && (chan->state == BT_CONNECTED || chan->state == BT_CONFIG)) { - struct sock *sk = chan->sk; - - clear_bit(BT_SK_SUSPEND, &bt_sk(sk)->flags); - sk->sk_state_change(sk); - + chan->ops->resume(chan); l2cap_check_encryption(chan, encrypt); l2cap_chan_unlock(chan); continue; @@ -6614,32 +6615,26 @@ int l2cap_security_cfm(struct hci_conn *hcon, u8 status, u8 encrypt) __set_chan_timer(chan, L2CAP_DISC_TIMEOUT); } } else if (chan->state == BT_CONNECT2) { - struct sock *sk = chan->sk; struct l2cap_conn_rsp rsp; __u16 res, stat; - lock_sock(sk); - if (!status) { - if (test_bit(BT_SK_DEFER_SETUP, - &bt_sk(sk)->flags)) { + if (test_bit(FLAG_DEFER_SETUP, &chan->flags)) { res = L2CAP_CR_PEND; stat = L2CAP_CS_AUTHOR_PEND; chan->ops->defer(chan); } else { - __l2cap_state_change(chan, BT_CONFIG); + l2cap_state_change(chan, BT_CONFIG); res = L2CAP_CR_SUCCESS; stat = L2CAP_CS_NO_INFO; } } else { - __l2cap_state_change(chan, BT_DISCONN); + l2cap_state_change(chan, BT_DISCONN); __set_chan_timer(chan, L2CAP_DISC_TIMEOUT); res = L2CAP_CR_SEC_BLOCK; stat = L2CAP_CS_NO_INFO; } - release_sock(sk); - rsp.scid = cpu_to_le16(chan->dcid); rsp.dcid = cpu_to_le16(chan->scid); rsp.result = cpu_to_le16(res); @@ -6756,9 +6751,13 @@ int l2cap_recv_acldata(struct hci_conn *hcon, struct sk_buff *skb, u16 flags) conn->rx_len -= skb->len; if (!conn->rx_len) { - /* Complete frame received */ - l2cap_recv_frame(conn, conn->rx_skb); + /* Complete frame received. l2cap_recv_frame + * takes ownership of the skb so set the global + * rx_skb pointer to NULL first. + */ + struct sk_buff *rx_skb = conn->rx_skb; conn->rx_skb = NULL; + l2cap_recv_frame(conn, rx_skb); } break; } @@ -6775,10 +6774,8 @@ static int l2cap_debugfs_show(struct seq_file *f, void *p) read_lock(&chan_list_lock); list_for_each_entry(c, &chan_list, global_l) { - struct sock *sk = c->sk; - seq_printf(f, "%pMR %pMR %d %d 0x%4.4x 0x%4.4x %d %d %d %d\n", - &bt_sk(sk)->src, &bt_sk(sk)->dst, + &c->src, &c->dst, c->state, __le16_to_cpu(c->psm), c->scid, c->dcid, c->imtu, c->omtu, c->sec_level, c->mode); @@ -6811,12 +6808,11 @@ int __init l2cap_init(void) if (err < 0) return err; - if (bt_debugfs) { - l2cap_debugfs = debugfs_create_file("l2cap", 0444, bt_debugfs, - NULL, &l2cap_debugfs_fops); - if (!l2cap_debugfs) - BT_ERR("Failed to create L2CAP debug file"); - } + if (IS_ERR_OR_NULL(bt_debugfs)) + return 0; + + l2cap_debugfs = debugfs_create_file("l2cap", 0444, bt_debugfs, + NULL, &l2cap_debugfs_fops); return 0; } diff --git a/net/bluetooth/l2cap_sock.c b/net/bluetooth/l2cap_sock.c index 0098af80b213..7cc24d263caa 100644 --- a/net/bluetooth/l2cap_sock.c +++ b/net/bluetooth/l2cap_sock.c @@ -32,7 +32,8 @@ #include <net/bluetooth/bluetooth.h> #include <net/bluetooth/hci_core.h> #include <net/bluetooth/l2cap.h> -#include <net/bluetooth/smp.h> + +#include "smp.h" static struct bt_sock_list l2cap_sk_list = { .lock = __RW_LOCK_UNLOCKED(l2cap_sk_list.lock) @@ -68,6 +69,18 @@ static int l2cap_sock_bind(struct socket *sock, struct sockaddr *addr, int alen) if (la.l2_cid && la.l2_psm) return -EINVAL; + if (!bdaddr_type_is_valid(la.l2_bdaddr_type)) + return -EINVAL; + + if (bdaddr_type_is_le(la.l2_bdaddr_type)) { + /* Connection oriented channels are not supported on LE */ + if (la.l2_psm) + return -EINVAL; + /* We only allow ATT user space socket */ + if (la.l2_cid != __constant_cpu_to_le16(L2CAP_CID_ATT)) + return -EINVAL; + } + lock_sock(sk); if (sk->sk_state != BT_OPEN) { @@ -99,11 +112,20 @@ static int l2cap_sock_bind(struct socket *sock, struct sockaddr *addr, int alen) if (err < 0) goto done; - if (__le16_to_cpu(la.l2_psm) == L2CAP_PSM_SDP || - __le16_to_cpu(la.l2_psm) == L2CAP_PSM_RFCOMM) - chan->sec_level = BT_SECURITY_SDP; + switch (chan->chan_type) { + case L2CAP_CHAN_CONN_LESS: + if (__le16_to_cpu(la.l2_psm) == L2CAP_PSM_3DSP) + chan->sec_level = BT_SECURITY_SDP; + break; + case L2CAP_CHAN_CONN_ORIENTED: + if (__le16_to_cpu(la.l2_psm) == L2CAP_PSM_SDP || + __le16_to_cpu(la.l2_psm) == L2CAP_PSM_RFCOMM) + chan->sec_level = BT_SECURITY_SDP; + break; + } - bacpy(&bt_sk(sk)->src, &la.l2_bdaddr); + bacpy(&chan->src, &la.l2_bdaddr); + chan->src_type = la.l2_bdaddr_type; chan->state = BT_BOUND; sk->sk_state = BT_BOUND; @@ -134,6 +156,47 @@ static int l2cap_sock_connect(struct socket *sock, struct sockaddr *addr, if (la.l2_cid && la.l2_psm) return -EINVAL; + if (!bdaddr_type_is_valid(la.l2_bdaddr_type)) + return -EINVAL; + + /* Check that the socket wasn't bound to something that + * conflicts with the address given to connect(). If chan->src + * is BDADDR_ANY it means bind() was never used, in which case + * chan->src_type and la.l2_bdaddr_type do not need to match. + */ + if (chan->src_type == BDADDR_BREDR && bacmp(&chan->src, BDADDR_ANY) && + bdaddr_type_is_le(la.l2_bdaddr_type)) { + /* Old user space versions will try to incorrectly bind + * the ATT socket using BDADDR_BREDR. We need to accept + * this and fix up the source address type only when + * both the source CID and destination CID indicate + * ATT. Anything else is an invalid combination. + */ + if (chan->scid != L2CAP_CID_ATT || + la.l2_cid != __constant_cpu_to_le16(L2CAP_CID_ATT)) + return -EINVAL; + + /* We don't have the hdev available here to make a + * better decision on random vs public, but since all + * user space versions that exhibit this issue anyway do + * not support random local addresses assuming public + * here is good enough. + */ + chan->src_type = BDADDR_LE_PUBLIC; + } + + if (chan->src_type != BDADDR_BREDR && la.l2_bdaddr_type == BDADDR_BREDR) + return -EINVAL; + + if (bdaddr_type_is_le(la.l2_bdaddr_type)) { + /* Connection oriented channels are not supported on LE */ + if (la.l2_psm) + return -EINVAL; + /* We only allow ATT user space socket */ + if (la.l2_cid != __constant_cpu_to_le16(L2CAP_CID_ATT)) + return -EINVAL; + } + err = l2cap_chan_connect(chan, la.l2_psm, __le16_to_cpu(la.l2_cid), &la.l2_bdaddr, la.l2_bdaddr_type); if (err) @@ -265,12 +328,14 @@ static int l2cap_sock_getname(struct socket *sock, struct sockaddr *addr, if (peer) { la->l2_psm = chan->psm; - bacpy(&la->l2_bdaddr, &bt_sk(sk)->dst); + bacpy(&la->l2_bdaddr, &chan->dst); la->l2_cid = cpu_to_le16(chan->dcid); + la->l2_bdaddr_type = chan->dst_type; } else { la->l2_psm = chan->sport; - bacpy(&la->l2_bdaddr, &bt_sk(sk)->src); + bacpy(&la->l2_bdaddr, &chan->src); la->l2_cid = cpu_to_le16(chan->scid); + la->l2_bdaddr_type = chan->src_type; } return 0; @@ -445,11 +510,6 @@ static int l2cap_sock_getsockopt(struct socket *sock, int level, int optname, break; case BT_CHANNEL_POLICY: - if (!enable_hs) { - err = -ENOPROTOOPT; - break; - } - if (put_user(chan->chan_policy, (u32 __user *) optval)) err = -EFAULT; break; @@ -665,10 +725,13 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname, break; } - if (opt) + if (opt) { set_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags); - else + set_bit(FLAG_DEFER_SETUP, &chan->flags); + } else { clear_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags); + clear_bit(FLAG_DEFER_SETUP, &chan->flags); + } break; case BT_FLUSHABLE: @@ -683,7 +746,7 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname, } if (opt == BT_FLUSHABLE_OFF) { - struct l2cap_conn *conn = chan->conn; + conn = chan->conn; /* proceed further only when we have l2cap_conn and No Flush support in the LM */ if (!conn || !lmp_no_flush_capable(conn->hcon->hdev)) { @@ -720,11 +783,6 @@ static int l2cap_sock_setsockopt(struct socket *sock, int level, int optname, break; case BT_CHANNEL_POLICY: - if (!enable_hs) { - err = -ENOPROTOOPT; - break; - } - if (get_user(opt, (u32 __user *) optval)) { err = -EFAULT; break; @@ -777,6 +835,12 @@ static int l2cap_sock_sendmsg(struct kiocb *iocb, struct socket *sock, if (sk->sk_state != BT_CONNECTED) return -ENOTCONN; + lock_sock(sk); + err = bt_sock_wait_ready(sk, msg->msg_flags); + release_sock(sk); + if (err) + return err; + l2cap_chan_lock(chan); err = l2cap_chan_send(chan, msg, len, sk->sk_priority); l2cap_chan_unlock(chan); @@ -799,8 +863,8 @@ static int l2cap_sock_recvmsg(struct kiocb *iocb, struct socket *sock, pi->chan->state = BT_CONFIG; __l2cap_connect_rsp_defer(pi->chan); - release_sock(sk); - return 0; + err = 0; + goto done; } release_sock(sk); @@ -856,6 +920,38 @@ static void l2cap_sock_kill(struct sock *sk) sock_put(sk); } +static int __l2cap_wait_ack(struct sock *sk) +{ + struct l2cap_chan *chan = l2cap_pi(sk)->chan; + DECLARE_WAITQUEUE(wait, current); + int err = 0; + int timeo = HZ/5; + + add_wait_queue(sk_sleep(sk), &wait); + set_current_state(TASK_INTERRUPTIBLE); + while (chan->unacked_frames > 0 && chan->conn) { + if (!timeo) + timeo = HZ/5; + + if (signal_pending(current)) { + err = sock_intr_errno(timeo); + break; + } + + release_sock(sk); + timeo = schedule_timeout(timeo); + lock_sock(sk); + set_current_state(TASK_INTERRUPTIBLE); + + err = sock_error(sk); + if (err) + break; + } + set_current_state(TASK_RUNNING); + remove_wait_queue(sk_sleep(sk), &wait); + return err; +} + static int l2cap_sock_shutdown(struct socket *sock, int how) { struct sock *sk = sock->sk; @@ -946,6 +1042,8 @@ static struct l2cap_chan *l2cap_sock_new_connection_cb(struct l2cap_chan *chan) { struct sock *sk, *parent = chan->data; + lock_sock(parent); + /* Check for backlog size */ if (sk_acceptq_is_full(parent)) { BT_DBG("backlog full %d", parent->sk_ack_backlog); @@ -963,18 +1061,19 @@ static struct l2cap_chan *l2cap_sock_new_connection_cb(struct l2cap_chan *chan) bt_accept_enqueue(parent, sk); + release_sock(parent); + return l2cap_pi(sk)->chan; } static int l2cap_sock_recv_cb(struct l2cap_chan *chan, struct sk_buff *skb) { - int err; struct sock *sk = chan->data; - struct l2cap_pinfo *pi = l2cap_pi(sk); + int err; lock_sock(sk); - if (pi->rx_busy_skb) { + if (l2cap_pi(sk)->rx_busy_skb) { err = -ENOMEM; goto done; } @@ -990,9 +1089,9 @@ static int l2cap_sock_recv_cb(struct l2cap_chan *chan, struct sk_buff *skb) * acked and reassembled until there is buffer space * available. */ - if (err < 0 && pi->chan->mode == L2CAP_MODE_ERTM) { - pi->rx_busy_skb = skb; - l2cap_chan_busy(pi->chan, 1); + if (err < 0 && chan->mode == L2CAP_MODE_ERTM) { + l2cap_pi(sk)->rx_busy_skb = skb; + l2cap_chan_busy(chan, 1); err = 0; } @@ -1050,26 +1149,33 @@ static void l2cap_sock_teardown_cb(struct l2cap_chan *chan, int err) release_sock(sk); } -static void l2cap_sock_state_change_cb(struct l2cap_chan *chan, int state) +static void l2cap_sock_state_change_cb(struct l2cap_chan *chan, int state, + int err) { struct sock *sk = chan->data; sk->sk_state = state; + + if (err) + sk->sk_err = err; } static struct sk_buff *l2cap_sock_alloc_skb_cb(struct l2cap_chan *chan, unsigned long len, int nb) { + struct sock *sk = chan->data; struct sk_buff *skb; int err; l2cap_chan_unlock(chan); - skb = bt_skb_send_alloc(chan->sk, len, nb, &err); + skb = bt_skb_send_alloc(sk, len, nb, &err); l2cap_chan_lock(chan); if (!skb) return ERR_PTR(err); + bt_cb(skb)->chan = chan; + return skb; } @@ -1095,11 +1201,39 @@ static void l2cap_sock_ready_cb(struct l2cap_chan *chan) static void l2cap_sock_defer_cb(struct l2cap_chan *chan) { - struct sock *sk = chan->data; - struct sock *parent = bt_sk(sk)->parent; + struct sock *parent, *sk = chan->data; + + lock_sock(sk); + parent = bt_sk(sk)->parent; if (parent) parent->sk_data_ready(parent, 0); + + release_sock(sk); +} + +static void l2cap_sock_resume_cb(struct l2cap_chan *chan) +{ + struct sock *sk = chan->data; + + clear_bit(BT_SK_SUSPEND, &bt_sk(sk)->flags); + sk->sk_state_change(sk); +} + +static void l2cap_sock_set_shutdown_cb(struct l2cap_chan *chan) +{ + struct sock *sk = chan->data; + + lock_sock(sk); + sk->sk_shutdown = SHUTDOWN_MASK; + release_sock(sk); +} + +static long l2cap_sock_get_sndtimeo_cb(struct l2cap_chan *chan) +{ + struct sock *sk = chan->data; + + return sk->sk_sndtimeo; } static struct l2cap_ops l2cap_chan_ops = { @@ -1111,6 +1245,9 @@ static struct l2cap_ops l2cap_chan_ops = { .state_change = l2cap_sock_state_change_cb, .ready = l2cap_sock_ready_cb, .defer = l2cap_sock_defer_cb, + .resume = l2cap_sock_resume_cb, + .set_shutdown = l2cap_sock_set_shutdown_cb, + .get_sndtimeo = l2cap_sock_get_sndtimeo_cb, .alloc_skb = l2cap_sock_alloc_skb_cb, }; @@ -1120,6 +1257,7 @@ static void l2cap_sock_destruct(struct sock *sk) if (l2cap_pi(sk)->chan) l2cap_chan_put(l2cap_pi(sk)->chan); + if (l2cap_pi(sk)->rx_busy_skb) { kfree_skb(l2cap_pi(sk)->rx_busy_skb); l2cap_pi(sk)->rx_busy_skb = NULL; @@ -1129,10 +1267,22 @@ static void l2cap_sock_destruct(struct sock *sk) skb_queue_purge(&sk->sk_write_queue); } +static void l2cap_skb_msg_name(struct sk_buff *skb, void *msg_name, + int *msg_namelen) +{ + struct sockaddr_l2 *la = (struct sockaddr_l2 *) msg_name; + + memset(la, 0, sizeof(struct sockaddr_l2)); + la->l2_family = AF_BLUETOOTH; + la->l2_psm = bt_cb(skb)->psm; + bacpy(&la->l2_bdaddr, &bt_cb(skb)->bdaddr); + + *msg_namelen = sizeof(struct sockaddr_l2); +} + static void l2cap_sock_init(struct sock *sk, struct sock *parent) { - struct l2cap_pinfo *pi = l2cap_pi(sk); - struct l2cap_chan *chan = pi->chan; + struct l2cap_chan *chan = l2cap_pi(sk)->chan; BT_DBG("sk %p", sk); @@ -1156,13 +1306,13 @@ static void l2cap_sock_init(struct sock *sk, struct sock *parent) security_sk_clone(parent, sk); } else { - switch (sk->sk_type) { case SOCK_RAW: chan->chan_type = L2CAP_CHAN_RAW; break; case SOCK_DGRAM: chan->chan_type = L2CAP_CHAN_CONN_LESS; + bt_sk(sk)->skb_msg_name = l2cap_skb_msg_name; break; case SOCK_SEQPACKET: case SOCK_STREAM: @@ -1224,8 +1374,6 @@ static struct sock *l2cap_sock_alloc(struct net *net, struct socket *sock, l2cap_chan_hold(chan); - chan->sk = sk; - l2cap_pi(sk)->chan = chan; return sk; diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c index fedc5399d465..074d83690a41 100644 --- a/net/bluetooth/mgmt.c +++ b/net/bluetooth/mgmt.c @@ -30,12 +30,11 @@ #include <net/bluetooth/bluetooth.h> #include <net/bluetooth/hci_core.h> #include <net/bluetooth/mgmt.h> -#include <net/bluetooth/smp.h> -bool enable_hs; +#include "smp.h" #define MGMT_VERSION 1 -#define MGMT_REVISION 3 +#define MGMT_REVISION 4 static const u16 mgmt_commands[] = { MGMT_OP_READ_INDEX_LIST, @@ -76,6 +75,10 @@ static const u16 mgmt_commands[] = { MGMT_OP_BLOCK_DEVICE, MGMT_OP_UNBLOCK_DEVICE, MGMT_OP_SET_DEVICE_ID, + MGMT_OP_SET_ADVERTISING, + MGMT_OP_SET_BREDR, + MGMT_OP_SET_STATIC_ADDRESS, + MGMT_OP_SET_SCAN_PARAMS, }; static const u16 mgmt_events[] = { @@ -181,11 +184,6 @@ static u8 mgmt_status_table[] = { MGMT_STATUS_CONNECT_FAILED, /* MAC Connection Failed */ }; -bool mgmt_valid_hdev(struct hci_dev *hdev) -{ - return hdev->dev_type == HCI_BREDR; -} - static u8 mgmt_status(u8 hci_status) { if (hci_status < ARRAY_SIZE(mgmt_status_table)) @@ -321,10 +319,8 @@ static int read_index_list(struct sock *sk, struct hci_dev *hdev, void *data, count = 0; list_for_each_entry(d, &hci_dev_list, list) { - if (!mgmt_valid_hdev(d)) - continue; - - count++; + if (d->dev_type == HCI_BREDR) + count++; } rp_len = sizeof(*rp) + (2 * count); @@ -339,11 +335,13 @@ static int read_index_list(struct sock *sk, struct hci_dev *hdev, void *data, if (test_bit(HCI_SETUP, &d->dev_flags)) continue; - if (!mgmt_valid_hdev(d)) + if (test_bit(HCI_USER_CHANNEL, &d->dev_flags)) continue; - rp->index[count++] = cpu_to_le16(d->id); - BT_DBG("Added hci%u", d->id); + if (d->dev_type == HCI_BREDR) { + rp->index[count++] = cpu_to_le16(d->id); + BT_DBG("Added hci%u", d->id); + } } rp->num_controllers = cpu_to_le16(count); @@ -366,9 +364,6 @@ static u32 get_supported_settings(struct hci_dev *hdev) settings |= MGMT_SETTING_POWERED; settings |= MGMT_SETTING_PAIRABLE; - if (lmp_ssp_capable(hdev)) - settings |= MGMT_SETTING_SSP; - if (lmp_bredr_capable(hdev)) { settings |= MGMT_SETTING_CONNECTABLE; if (hdev->hci_ver >= BLUETOOTH_VER_1_2) @@ -376,13 +371,17 @@ static u32 get_supported_settings(struct hci_dev *hdev) settings |= MGMT_SETTING_DISCOVERABLE; settings |= MGMT_SETTING_BREDR; settings |= MGMT_SETTING_LINK_SECURITY; - } - if (enable_hs) - settings |= MGMT_SETTING_HS; + if (lmp_ssp_capable(hdev)) { + settings |= MGMT_SETTING_SSP; + settings |= MGMT_SETTING_HS; + } + } - if (lmp_le_capable(hdev)) + if (lmp_le_capable(hdev)) { settings |= MGMT_SETTING_LE; + settings |= MGMT_SETTING_ADVERTISING; + } return settings; } @@ -406,7 +405,7 @@ static u32 get_current_settings(struct hci_dev *hdev) if (test_bit(HCI_PAIRABLE, &hdev->dev_flags)) settings |= MGMT_SETTING_PAIRABLE; - if (lmp_bredr_capable(hdev)) + if (test_bit(HCI_BREDR_ENABLED, &hdev->dev_flags)) settings |= MGMT_SETTING_BREDR; if (test_bit(HCI_LE_ENABLED, &hdev->dev_flags)) @@ -421,6 +420,9 @@ static u32 get_current_settings(struct hci_dev *hdev) if (test_bit(HCI_HS_ENABLED, &hdev->dev_flags)) settings |= MGMT_SETTING_HS; + if (test_bit(HCI_ADVERTISING, &hdev->dev_flags)) + settings |= MGMT_SETTING_ADVERTISING; + return settings; } @@ -534,6 +536,156 @@ static u8 *create_uuid128_list(struct hci_dev *hdev, u8 *data, ptrdiff_t len) return ptr; } +static struct pending_cmd *mgmt_pending_find(u16 opcode, struct hci_dev *hdev) +{ + struct pending_cmd *cmd; + + list_for_each_entry(cmd, &hdev->mgmt_pending, list) { + if (cmd->opcode == opcode) + return cmd; + } + + return NULL; +} + +static u8 create_scan_rsp_data(struct hci_dev *hdev, u8 *ptr) +{ + u8 ad_len = 0; + size_t name_len; + + name_len = strlen(hdev->dev_name); + if (name_len > 0) { + size_t max_len = HCI_MAX_AD_LENGTH - ad_len - 2; + + if (name_len > max_len) { + name_len = max_len; + ptr[1] = EIR_NAME_SHORT; + } else + ptr[1] = EIR_NAME_COMPLETE; + + ptr[0] = name_len + 1; + + memcpy(ptr + 2, hdev->dev_name, name_len); + + ad_len += (name_len + 2); + ptr += (name_len + 2); + } + + return ad_len; +} + +static void update_scan_rsp_data(struct hci_request *req) +{ + struct hci_dev *hdev = req->hdev; + struct hci_cp_le_set_scan_rsp_data cp; + u8 len; + + if (!test_bit(HCI_LE_ENABLED, &hdev->dev_flags)) + return; + + memset(&cp, 0, sizeof(cp)); + + len = create_scan_rsp_data(hdev, cp.data); + + if (hdev->scan_rsp_data_len == len && + memcmp(cp.data, hdev->scan_rsp_data, len) == 0) + return; + + memcpy(hdev->scan_rsp_data, cp.data, sizeof(cp.data)); + hdev->scan_rsp_data_len = len; + + cp.length = len; + + hci_req_add(req, HCI_OP_LE_SET_SCAN_RSP_DATA, sizeof(cp), &cp); +} + +static u8 get_adv_discov_flags(struct hci_dev *hdev) +{ + struct pending_cmd *cmd; + + /* If there's a pending mgmt command the flags will not yet have + * their final values, so check for this first. + */ + cmd = mgmt_pending_find(MGMT_OP_SET_DISCOVERABLE, hdev); + if (cmd) { + struct mgmt_mode *cp = cmd->param; + if (cp->val == 0x01) + return LE_AD_GENERAL; + else if (cp->val == 0x02) + return LE_AD_LIMITED; + } else { + if (test_bit(HCI_LIMITED_DISCOVERABLE, &hdev->dev_flags)) + return LE_AD_LIMITED; + else if (test_bit(HCI_DISCOVERABLE, &hdev->dev_flags)) + return LE_AD_GENERAL; + } + + return 0; +} + +static u8 create_adv_data(struct hci_dev *hdev, u8 *ptr) +{ + u8 ad_len = 0, flags = 0; + + flags |= get_adv_discov_flags(hdev); + + if (test_bit(HCI_BREDR_ENABLED, &hdev->dev_flags)) { + if (lmp_le_br_capable(hdev)) + flags |= LE_AD_SIM_LE_BREDR_CTRL; + if (lmp_host_le_br_capable(hdev)) + flags |= LE_AD_SIM_LE_BREDR_HOST; + } else { + flags |= LE_AD_NO_BREDR; + } + + if (flags) { + BT_DBG("adv flags 0x%02x", flags); + + ptr[0] = 2; + ptr[1] = EIR_FLAGS; + ptr[2] = flags; + + ad_len += 3; + ptr += 3; + } + + if (hdev->adv_tx_power != HCI_TX_POWER_INVALID) { + ptr[0] = 2; + ptr[1] = EIR_TX_POWER; + ptr[2] = (u8) hdev->adv_tx_power; + + ad_len += 3; + ptr += 3; + } + + return ad_len; +} + +static void update_adv_data(struct hci_request *req) +{ + struct hci_dev *hdev = req->hdev; + struct hci_cp_le_set_adv_data cp; + u8 len; + + if (!test_bit(HCI_LE_ENABLED, &hdev->dev_flags)) + return; + + memset(&cp, 0, sizeof(cp)); + + len = create_adv_data(hdev, cp.data); + + if (hdev->adv_data_len == len && + memcmp(cp.data, hdev->adv_data, len) == 0) + return; + + memcpy(hdev->adv_data, cp.data, sizeof(cp.data)); + hdev->adv_data_len = len; + + cp.length = len; + + hci_req_add(req, HCI_OP_LE_SET_ADV_DATA, sizeof(cp), &cp); +} + static void create_eir(struct hci_dev *hdev, u8 *data) { u8 *ptr = data; @@ -632,6 +784,9 @@ static void update_class(struct hci_request *req) if (!hdev_is_powered(hdev)) return; + if (!test_bit(HCI_BREDR_ENABLED, &hdev->dev_flags)) + return; + if (test_bit(HCI_SERVICE_CACHE, &hdev->dev_flags)) return; @@ -639,6 +794,9 @@ static void update_class(struct hci_request *req) cod[1] = hdev->major_class; cod[2] = get_service_classes(hdev); + if (test_bit(HCI_LIMITED_DISCOVERABLE, &hdev->dev_flags)) + cod[1] |= 0x20; + if (memcmp(cod, hdev->dev_class, 3) == 0) return; @@ -763,18 +921,6 @@ static void mgmt_pending_foreach(u16 opcode, struct hci_dev *hdev, } } -static struct pending_cmd *mgmt_pending_find(u16 opcode, struct hci_dev *hdev) -{ - struct pending_cmd *cmd; - - list_for_each_entry(cmd, &hdev->mgmt_pending, list) { - if (cmd->opcode == opcode) - return cmd; - } - - return NULL; -} - static void mgmt_pending_remove(struct pending_cmd *cmd) { list_del(&cmd->list); @@ -804,6 +950,12 @@ static int set_powered(struct sock *sk, struct hci_dev *hdev, void *data, hci_dev_lock(hdev); + if (mgmt_pending_find(MGMT_OP_SET_POWERED, hdev)) { + err = cmd_status(sk, hdev->id, MGMT_OP_SET_POWERED, + MGMT_STATUS_BUSY); + goto failed; + } + if (test_and_clear_bit(HCI_AUTO_OFF, &hdev->dev_flags)) { cancel_delayed_work(&hdev->power_off); @@ -820,12 +972,6 @@ static int set_powered(struct sock *sk, struct hci_dev *hdev, void *data, goto failed; } - if (mgmt_pending_find(MGMT_OP_SET_POWERED, hdev)) { - err = cmd_status(sk, hdev->id, MGMT_OP_SET_POWERED, - MGMT_STATUS_BUSY); - goto failed; - } - cmd = mgmt_pending_add(sk, MGMT_OP_SET_POWERED, hdev, data, len); if (!cmd) { err = -ENOMEM; @@ -883,27 +1029,141 @@ static int new_settings(struct hci_dev *hdev, struct sock *skip) return mgmt_event(MGMT_EV_NEW_SETTINGS, hdev, &ev, sizeof(ev), skip); } +struct cmd_lookup { + struct sock *sk; + struct hci_dev *hdev; + u8 mgmt_status; +}; + +static void settings_rsp(struct pending_cmd *cmd, void *data) +{ + struct cmd_lookup *match = data; + + send_settings_rsp(cmd->sk, cmd->opcode, match->hdev); + + list_del(&cmd->list); + + if (match->sk == NULL) { + match->sk = cmd->sk; + sock_hold(match->sk); + } + + mgmt_pending_free(cmd); +} + +static void cmd_status_rsp(struct pending_cmd *cmd, void *data) +{ + u8 *status = data; + + cmd_status(cmd->sk, cmd->index, cmd->opcode, *status); + mgmt_pending_remove(cmd); +} + +static u8 mgmt_bredr_support(struct hci_dev *hdev) +{ + if (!lmp_bredr_capable(hdev)) + return MGMT_STATUS_NOT_SUPPORTED; + else if (!test_bit(HCI_BREDR_ENABLED, &hdev->dev_flags)) + return MGMT_STATUS_REJECTED; + else + return MGMT_STATUS_SUCCESS; +} + +static u8 mgmt_le_support(struct hci_dev *hdev) +{ + if (!lmp_le_capable(hdev)) + return MGMT_STATUS_NOT_SUPPORTED; + else if (!test_bit(HCI_LE_ENABLED, &hdev->dev_flags)) + return MGMT_STATUS_REJECTED; + else + return MGMT_STATUS_SUCCESS; +} + +static void set_discoverable_complete(struct hci_dev *hdev, u8 status) +{ + struct pending_cmd *cmd; + struct mgmt_mode *cp; + struct hci_request req; + bool changed; + + BT_DBG("status 0x%02x", status); + + hci_dev_lock(hdev); + + cmd = mgmt_pending_find(MGMT_OP_SET_DISCOVERABLE, hdev); + if (!cmd) + goto unlock; + + if (status) { + u8 mgmt_err = mgmt_status(status); + cmd_status(cmd->sk, cmd->index, cmd->opcode, mgmt_err); + clear_bit(HCI_LIMITED_DISCOVERABLE, &hdev->dev_flags); + goto remove_cmd; + } + + cp = cmd->param; + if (cp->val) { + changed = !test_and_set_bit(HCI_DISCOVERABLE, + &hdev->dev_flags); + + if (hdev->discov_timeout > 0) { + int to = msecs_to_jiffies(hdev->discov_timeout * 1000); + queue_delayed_work(hdev->workqueue, &hdev->discov_off, + to); + } + } else { + changed = test_and_clear_bit(HCI_DISCOVERABLE, + &hdev->dev_flags); + } + + send_settings_rsp(cmd->sk, MGMT_OP_SET_DISCOVERABLE, hdev); + + if (changed) + new_settings(hdev, cmd->sk); + + /* When the discoverable mode gets changed, make sure + * that class of device has the limited discoverable + * bit correctly set. + */ + hci_req_init(&req, hdev); + update_class(&req); + hci_req_run(&req, NULL); + +remove_cmd: + mgmt_pending_remove(cmd); + +unlock: + hci_dev_unlock(hdev); +} + static int set_discoverable(struct sock *sk, struct hci_dev *hdev, void *data, u16 len) { struct mgmt_cp_set_discoverable *cp = data; struct pending_cmd *cmd; + struct hci_request req; u16 timeout; u8 scan; int err; BT_DBG("request for %s", hdev->name); - if (!lmp_bredr_capable(hdev)) + if (!test_bit(HCI_LE_ENABLED, &hdev->dev_flags) && + !test_bit(HCI_BREDR_ENABLED, &hdev->dev_flags)) return cmd_status(sk, hdev->id, MGMT_OP_SET_DISCOVERABLE, - MGMT_STATUS_NOT_SUPPORTED); + MGMT_STATUS_REJECTED); - if (cp->val != 0x00 && cp->val != 0x01) + if (cp->val != 0x00 && cp->val != 0x01 && cp->val != 0x02) return cmd_status(sk, hdev->id, MGMT_OP_SET_DISCOVERABLE, MGMT_STATUS_INVALID_PARAMS); timeout = __le16_to_cpu(cp->timeout); - if (!cp->val && timeout > 0) + + /* Disabling discoverable requires that no timeout is set, + * and enabling limited discoverable requires a timeout. + */ + if ((cp->val == 0x00 && timeout > 0) || + (cp->val == 0x02 && timeout == 0)) return cmd_status(sk, hdev->id, MGMT_OP_SET_DISCOVERABLE, MGMT_STATUS_INVALID_PARAMS); @@ -931,6 +1191,10 @@ static int set_discoverable(struct sock *sk, struct hci_dev *hdev, void *data, if (!hdev_is_powered(hdev)) { bool changed = false; + /* Setting limited discoverable when powered off is + * not a valid operation since it requires a timeout + * and so no need to check HCI_LIMITED_DISCOVERABLE. + */ if (!!cp->val != test_bit(HCI_DISCOVERABLE, &hdev->dev_flags)) { change_bit(HCI_DISCOVERABLE, &hdev->dev_flags); changed = true; @@ -946,16 +1210,20 @@ static int set_discoverable(struct sock *sk, struct hci_dev *hdev, void *data, goto failed; } - if (!!cp->val == test_bit(HCI_DISCOVERABLE, &hdev->dev_flags)) { - if (hdev->discov_timeout > 0) { - cancel_delayed_work(&hdev->discov_off); - hdev->discov_timeout = 0; - } + /* If the current mode is the same, then just update the timeout + * value with the new value. And if only the timeout gets updated, + * then no need for any HCI transactions. + */ + if (!!cp->val == test_bit(HCI_DISCOVERABLE, &hdev->dev_flags) && + (cp->val == 0x02) == test_bit(HCI_LIMITED_DISCOVERABLE, + &hdev->dev_flags)) { + cancel_delayed_work(&hdev->discov_off); + hdev->discov_timeout = timeout; - if (cp->val && timeout > 0) { - hdev->discov_timeout = timeout; + if (cp->val && hdev->discov_timeout > 0) { + int to = msecs_to_jiffies(hdev->discov_timeout * 1000); queue_delayed_work(hdev->workqueue, &hdev->discov_off, - msecs_to_jiffies(hdev->discov_timeout * 1000)); + to); } err = send_settings_rsp(sk, MGMT_OP_SET_DISCOVERABLE, hdev); @@ -968,20 +1236,66 @@ static int set_discoverable(struct sock *sk, struct hci_dev *hdev, void *data, goto failed; } + /* Cancel any potential discoverable timeout that might be + * still active and store new timeout value. The arming of + * the timeout happens in the complete handler. + */ + cancel_delayed_work(&hdev->discov_off); + hdev->discov_timeout = timeout; + + /* Limited discoverable mode */ + if (cp->val == 0x02) + set_bit(HCI_LIMITED_DISCOVERABLE, &hdev->dev_flags); + else + clear_bit(HCI_LIMITED_DISCOVERABLE, &hdev->dev_flags); + + hci_req_init(&req, hdev); + + /* The procedure for LE-only controllers is much simpler - just + * update the advertising data. + */ + if (!test_bit(HCI_BREDR_ENABLED, &hdev->dev_flags)) + goto update_ad; + scan = SCAN_PAGE; - if (cp->val) + if (cp->val) { + struct hci_cp_write_current_iac_lap hci_cp; + + if (cp->val == 0x02) { + /* Limited discoverable mode */ + hci_cp.num_iac = 2; + hci_cp.iac_lap[0] = 0x00; /* LIAC */ + hci_cp.iac_lap[1] = 0x8b; + hci_cp.iac_lap[2] = 0x9e; + hci_cp.iac_lap[3] = 0x33; /* GIAC */ + hci_cp.iac_lap[4] = 0x8b; + hci_cp.iac_lap[5] = 0x9e; + } else { + /* General discoverable mode */ + hci_cp.num_iac = 1; + hci_cp.iac_lap[0] = 0x33; /* GIAC */ + hci_cp.iac_lap[1] = 0x8b; + hci_cp.iac_lap[2] = 0x9e; + } + + hci_req_add(&req, HCI_OP_WRITE_CURRENT_IAC_LAP, + (hci_cp.num_iac * 3) + 1, &hci_cp); + scan |= SCAN_INQUIRY; - else - cancel_delayed_work(&hdev->discov_off); + } else { + clear_bit(HCI_LIMITED_DISCOVERABLE, &hdev->dev_flags); + } + + hci_req_add(&req, HCI_OP_WRITE_SCAN_ENABLE, sizeof(scan), &scan); - err = hci_send_cmd(hdev, HCI_OP_WRITE_SCAN_ENABLE, 1, &scan); +update_ad: + update_adv_data(&req); + + err = hci_req_run(&req, set_discoverable_complete); if (err < 0) mgmt_pending_remove(cmd); - if (cp->val) - hdev->discov_timeout = timeout; - failed: hci_dev_unlock(hdev); return err; @@ -993,6 +1307,9 @@ static void write_fast_connectable(struct hci_request *req, bool enable) struct hci_cp_write_page_scan_activity acp; u8 type; + if (!test_bit(HCI_BREDR_ENABLED, &hdev->dev_flags)) + return; + if (hdev->hci_ver < BLUETOOTH_VER_1_2) return; @@ -1019,9 +1336,55 @@ static void write_fast_connectable(struct hci_request *req, bool enable) hci_req_add(req, HCI_OP_WRITE_PAGE_SCAN_TYPE, 1, &type); } +static u8 get_adv_type(struct hci_dev *hdev) +{ + struct pending_cmd *cmd; + bool connectable; + + /* If there's a pending mgmt command the flag will not yet have + * it's final value, so check for this first. + */ + cmd = mgmt_pending_find(MGMT_OP_SET_CONNECTABLE, hdev); + if (cmd) { + struct mgmt_mode *cp = cmd->param; + connectable = !!cp->val; + } else { + connectable = test_bit(HCI_CONNECTABLE, &hdev->dev_flags); + } + + return connectable ? LE_ADV_IND : LE_ADV_NONCONN_IND; +} + +static void enable_advertising(struct hci_request *req) +{ + struct hci_dev *hdev = req->hdev; + struct hci_cp_le_set_adv_param cp; + u8 enable = 0x01; + + memset(&cp, 0, sizeof(cp)); + cp.min_interval = __constant_cpu_to_le16(0x0800); + cp.max_interval = __constant_cpu_to_le16(0x0800); + cp.type = get_adv_type(hdev); + cp.own_address_type = hdev->own_addr_type; + cp.channel_map = 0x07; + + hci_req_add(req, HCI_OP_LE_SET_ADV_PARAM, sizeof(cp), &cp); + + hci_req_add(req, HCI_OP_LE_SET_ADV_ENABLE, sizeof(enable), &enable); +} + +static void disable_advertising(struct hci_request *req) +{ + u8 enable = 0x00; + + hci_req_add(req, HCI_OP_LE_SET_ADV_ENABLE, sizeof(enable), &enable); +} + static void set_connectable_complete(struct hci_dev *hdev, u8 status) { struct pending_cmd *cmd; + struct mgmt_mode *cp; + bool changed; BT_DBG("status 0x%02x", status); @@ -1031,14 +1394,56 @@ static void set_connectable_complete(struct hci_dev *hdev, u8 status) if (!cmd) goto unlock; + if (status) { + u8 mgmt_err = mgmt_status(status); + cmd_status(cmd->sk, cmd->index, cmd->opcode, mgmt_err); + goto remove_cmd; + } + + cp = cmd->param; + if (cp->val) + changed = !test_and_set_bit(HCI_CONNECTABLE, &hdev->dev_flags); + else + changed = test_and_clear_bit(HCI_CONNECTABLE, &hdev->dev_flags); + send_settings_rsp(cmd->sk, MGMT_OP_SET_CONNECTABLE, hdev); + if (changed) + new_settings(hdev, cmd->sk); + +remove_cmd: mgmt_pending_remove(cmd); unlock: hci_dev_unlock(hdev); } +static int set_connectable_update_settings(struct hci_dev *hdev, + struct sock *sk, u8 val) +{ + bool changed = false; + int err; + + if (!!val != test_bit(HCI_CONNECTABLE, &hdev->dev_flags)) + changed = true; + + if (val) { + set_bit(HCI_CONNECTABLE, &hdev->dev_flags); + } else { + clear_bit(HCI_CONNECTABLE, &hdev->dev_flags); + clear_bit(HCI_DISCOVERABLE, &hdev->dev_flags); + } + + err = send_settings_rsp(sk, MGMT_OP_SET_CONNECTABLE, hdev); + if (err < 0) + return err; + + if (changed) + return new_settings(hdev, sk); + + return 0; +} + static int set_connectable(struct sock *sk, struct hci_dev *hdev, void *data, u16 len) { @@ -1050,9 +1455,10 @@ static int set_connectable(struct sock *sk, struct hci_dev *hdev, void *data, BT_DBG("request for %s", hdev->name); - if (!lmp_bredr_capable(hdev)) + if (!test_bit(HCI_LE_ENABLED, &hdev->dev_flags) && + !test_bit(HCI_BREDR_ENABLED, &hdev->dev_flags)) return cmd_status(sk, hdev->id, MGMT_OP_SET_CONNECTABLE, - MGMT_STATUS_NOT_SUPPORTED); + MGMT_STATUS_REJECTED); if (cp->val != 0x00 && cp->val != 0x01) return cmd_status(sk, hdev->id, MGMT_OP_SET_CONNECTABLE, @@ -1061,25 +1467,7 @@ static int set_connectable(struct sock *sk, struct hci_dev *hdev, void *data, hci_dev_lock(hdev); if (!hdev_is_powered(hdev)) { - bool changed = false; - - if (!!cp->val != test_bit(HCI_CONNECTABLE, &hdev->dev_flags)) - changed = true; - - if (cp->val) { - set_bit(HCI_CONNECTABLE, &hdev->dev_flags); - } else { - clear_bit(HCI_CONNECTABLE, &hdev->dev_flags); - clear_bit(HCI_DISCOVERABLE, &hdev->dev_flags); - } - - err = send_settings_rsp(sk, MGMT_OP_SET_CONNECTABLE, hdev); - if (err < 0) - goto failed; - - if (changed) - err = new_settings(hdev, sk); - + err = set_connectable_update_settings(hdev, sk, cp->val); goto failed; } @@ -1090,30 +1478,37 @@ static int set_connectable(struct sock *sk, struct hci_dev *hdev, void *data, goto failed; } - if (!!cp->val == test_bit(HCI_PSCAN, &hdev->flags)) { - err = send_settings_rsp(sk, MGMT_OP_SET_CONNECTABLE, hdev); - goto failed; - } - cmd = mgmt_pending_add(sk, MGMT_OP_SET_CONNECTABLE, hdev, data, len); if (!cmd) { err = -ENOMEM; goto failed; } - if (cp->val) { - scan = SCAN_PAGE; - } else { - scan = 0; + hci_req_init(&req, hdev); - if (test_bit(HCI_ISCAN, &hdev->flags) && - hdev->discov_timeout > 0) - cancel_delayed_work(&hdev->discov_off); - } + /* If BR/EDR is not enabled and we disable advertising as a + * by-product of disabling connectable, we need to update the + * advertising flags. + */ + if (!test_bit(HCI_BREDR_ENABLED, &hdev->dev_flags)) { + if (!cp->val) { + clear_bit(HCI_LIMITED_DISCOVERABLE, &hdev->dev_flags); + clear_bit(HCI_DISCOVERABLE, &hdev->dev_flags); + } + update_adv_data(&req); + } else if (cp->val != test_bit(HCI_PSCAN, &hdev->flags)) { + if (cp->val) { + scan = SCAN_PAGE; + } else { + scan = 0; - hci_req_init(&req, hdev); + if (test_bit(HCI_ISCAN, &hdev->flags) && + hdev->discov_timeout > 0) + cancel_delayed_work(&hdev->discov_off); + } - hci_req_add(&req, HCI_OP_WRITE_SCAN_ENABLE, 1, &scan); + hci_req_add(&req, HCI_OP_WRITE_SCAN_ENABLE, 1, &scan); + } /* If we're going from non-connectable to connectable or * vice-versa when fast connectable is enabled ensure that fast @@ -1124,9 +1519,20 @@ static int set_connectable(struct sock *sk, struct hci_dev *hdev, void *data, if (cp->val || test_bit(HCI_FAST_CONNECTABLE, &hdev->dev_flags)) write_fast_connectable(&req, false); + if (test_bit(HCI_ADVERTISING, &hdev->dev_flags) && + hci_conn_num(hdev, LE_LINK) == 0) { + disable_advertising(&req); + enable_advertising(&req); + } + err = hci_req_run(&req, set_connectable_complete); - if (err < 0) + if (err < 0) { mgmt_pending_remove(cmd); + if (err == -ENODATA) + err = set_connectable_update_settings(hdev, sk, + cp->val); + goto failed; + } failed: hci_dev_unlock(hdev); @@ -1137,6 +1543,7 @@ static int set_pairable(struct sock *sk, struct hci_dev *hdev, void *data, u16 len) { struct mgmt_mode *cp = data; + bool changed; int err; BT_DBG("request for %s", hdev->name); @@ -1148,17 +1555,18 @@ static int set_pairable(struct sock *sk, struct hci_dev *hdev, void *data, hci_dev_lock(hdev); if (cp->val) - set_bit(HCI_PAIRABLE, &hdev->dev_flags); + changed = !test_and_set_bit(HCI_PAIRABLE, &hdev->dev_flags); else - clear_bit(HCI_PAIRABLE, &hdev->dev_flags); + changed = test_and_clear_bit(HCI_PAIRABLE, &hdev->dev_flags); err = send_settings_rsp(sk, MGMT_OP_SET_PAIRABLE, hdev); if (err < 0) - goto failed; + goto unlock; - err = new_settings(hdev, sk); + if (changed) + err = new_settings(hdev, sk); -failed: +unlock: hci_dev_unlock(hdev); return err; } @@ -1168,14 +1576,15 @@ static int set_link_security(struct sock *sk, struct hci_dev *hdev, void *data, { struct mgmt_mode *cp = data; struct pending_cmd *cmd; - u8 val; + u8 val, status; int err; BT_DBG("request for %s", hdev->name); - if (!lmp_bredr_capable(hdev)) + status = mgmt_bredr_support(hdev); + if (status) return cmd_status(sk, hdev->id, MGMT_OP_SET_LINK_SECURITY, - MGMT_STATUS_NOT_SUPPORTED); + status); if (cp->val != 0x00 && cp->val != 0x01) return cmd_status(sk, hdev->id, MGMT_OP_SET_LINK_SECURITY, @@ -1236,11 +1645,15 @@ static int set_ssp(struct sock *sk, struct hci_dev *hdev, void *data, u16 len) { struct mgmt_mode *cp = data; struct pending_cmd *cmd; - u8 val; + u8 status; int err; BT_DBG("request for %s", hdev->name); + status = mgmt_bredr_support(hdev); + if (status) + return cmd_status(sk, hdev->id, MGMT_OP_SET_SSP, status); + if (!lmp_ssp_capable(hdev)) return cmd_status(sk, hdev->id, MGMT_OP_SET_SSP, MGMT_STATUS_NOT_SUPPORTED); @@ -1251,14 +1664,20 @@ static int set_ssp(struct sock *sk, struct hci_dev *hdev, void *data, u16 len) hci_dev_lock(hdev); - val = !!cp->val; - if (!hdev_is_powered(hdev)) { - bool changed = false; + bool changed; - if (val != test_bit(HCI_SSP_ENABLED, &hdev->dev_flags)) { - change_bit(HCI_SSP_ENABLED, &hdev->dev_flags); - changed = true; + if (cp->val) { + changed = !test_and_set_bit(HCI_SSP_ENABLED, + &hdev->dev_flags); + } else { + changed = test_and_clear_bit(HCI_SSP_ENABLED, + &hdev->dev_flags); + if (!changed) + changed = test_and_clear_bit(HCI_HS_ENABLED, + &hdev->dev_flags); + else + clear_bit(HCI_HS_ENABLED, &hdev->dev_flags); } err = send_settings_rsp(sk, MGMT_OP_SET_SSP, hdev); @@ -1271,13 +1690,14 @@ static int set_ssp(struct sock *sk, struct hci_dev *hdev, void *data, u16 len) goto failed; } - if (mgmt_pending_find(MGMT_OP_SET_SSP, hdev)) { + if (mgmt_pending_find(MGMT_OP_SET_SSP, hdev) || + mgmt_pending_find(MGMT_OP_SET_HS, hdev)) { err = cmd_status(sk, hdev->id, MGMT_OP_SET_SSP, MGMT_STATUS_BUSY); goto failed; } - if (test_bit(HCI_SSP_ENABLED, &hdev->dev_flags) == val) { + if (!!cp->val == test_bit(HCI_SSP_ENABLED, &hdev->dev_flags)) { err = send_settings_rsp(sk, MGMT_OP_SET_SSP, hdev); goto failed; } @@ -1288,7 +1708,7 @@ static int set_ssp(struct sock *sk, struct hci_dev *hdev, void *data, u16 len) goto failed; } - err = hci_send_cmd(hdev, HCI_OP_WRITE_SSP_MODE, sizeof(val), &val); + err = hci_send_cmd(hdev, HCI_OP_WRITE_SSP_MODE, 1, &cp->val); if (err < 0) { mgmt_pending_remove(cmd); goto failed; @@ -1302,23 +1722,90 @@ failed: static int set_hs(struct sock *sk, struct hci_dev *hdev, void *data, u16 len) { struct mgmt_mode *cp = data; + bool changed; + u8 status; + int err; BT_DBG("request for %s", hdev->name); - if (!enable_hs) + status = mgmt_bredr_support(hdev); + if (status) + return cmd_status(sk, hdev->id, MGMT_OP_SET_HS, status); + + if (!lmp_ssp_capable(hdev)) return cmd_status(sk, hdev->id, MGMT_OP_SET_HS, MGMT_STATUS_NOT_SUPPORTED); + if (!test_bit(HCI_SSP_ENABLED, &hdev->dev_flags)) + return cmd_status(sk, hdev->id, MGMT_OP_SET_HS, + MGMT_STATUS_REJECTED); + if (cp->val != 0x00 && cp->val != 0x01) return cmd_status(sk, hdev->id, MGMT_OP_SET_HS, MGMT_STATUS_INVALID_PARAMS); - if (cp->val) - set_bit(HCI_HS_ENABLED, &hdev->dev_flags); - else - clear_bit(HCI_HS_ENABLED, &hdev->dev_flags); + hci_dev_lock(hdev); + + if (cp->val) { + changed = !test_and_set_bit(HCI_HS_ENABLED, &hdev->dev_flags); + } else { + if (hdev_is_powered(hdev)) { + err = cmd_status(sk, hdev->id, MGMT_OP_SET_HS, + MGMT_STATUS_REJECTED); + goto unlock; + } + + changed = test_and_clear_bit(HCI_HS_ENABLED, &hdev->dev_flags); + } - return send_settings_rsp(sk, MGMT_OP_SET_HS, hdev); + err = send_settings_rsp(sk, MGMT_OP_SET_HS, hdev); + if (err < 0) + goto unlock; + + if (changed) + err = new_settings(hdev, sk); + +unlock: + hci_dev_unlock(hdev); + return err; +} + +static void le_enable_complete(struct hci_dev *hdev, u8 status) +{ + struct cmd_lookup match = { NULL, hdev }; + + if (status) { + u8 mgmt_err = mgmt_status(status); + + mgmt_pending_foreach(MGMT_OP_SET_LE, hdev, cmd_status_rsp, + &mgmt_err); + return; + } + + mgmt_pending_foreach(MGMT_OP_SET_LE, hdev, settings_rsp, &match); + + new_settings(hdev, match.sk); + + if (match.sk) + sock_put(match.sk); + + /* Make sure the controller has a good default for + * advertising data. Restrict the update to when LE + * has actually been enabled. During power on, the + * update in powered_update_hci will take care of it. + */ + if (test_bit(HCI_LE_ENABLED, &hdev->dev_flags)) { + struct hci_request req; + + hci_dev_lock(hdev); + + hci_req_init(&req, hdev); + update_adv_data(&req); + update_scan_rsp_data(&req); + hci_req_run(&req, NULL); + + hci_dev_unlock(hdev); + } } static int set_le(struct sock *sk, struct hci_dev *hdev, void *data, u16 len) @@ -1326,6 +1813,7 @@ static int set_le(struct sock *sk, struct hci_dev *hdev, void *data, u16 len) struct mgmt_mode *cp = data; struct hci_cp_write_le_host_supported hci_cp; struct pending_cmd *cmd; + struct hci_request req; int err; u8 val, enabled; @@ -1340,7 +1828,7 @@ static int set_le(struct sock *sk, struct hci_dev *hdev, void *data, u16 len) MGMT_STATUS_INVALID_PARAMS); /* LE-only devices do not allow toggling LE on/off */ - if (!lmp_bredr_capable(hdev)) + if (!test_bit(HCI_BREDR_ENABLED, &hdev->dev_flags)) return cmd_status(sk, hdev->id, MGMT_OP_SET_LE, MGMT_STATUS_REJECTED); @@ -1357,6 +1845,11 @@ static int set_le(struct sock *sk, struct hci_dev *hdev, void *data, u16 len) changed = true; } + if (!val && test_bit(HCI_ADVERTISING, &hdev->dev_flags)) { + clear_bit(HCI_ADVERTISING, &hdev->dev_flags); + changed = true; + } + err = send_settings_rsp(sk, MGMT_OP_SET_LE, hdev); if (err < 0) goto unlock; @@ -1367,7 +1860,8 @@ static int set_le(struct sock *sk, struct hci_dev *hdev, void *data, u16 len) goto unlock; } - if (mgmt_pending_find(MGMT_OP_SET_LE, hdev)) { + if (mgmt_pending_find(MGMT_OP_SET_LE, hdev) || + mgmt_pending_find(MGMT_OP_SET_ADVERTISING, hdev)) { err = cmd_status(sk, hdev->id, MGMT_OP_SET_LE, MGMT_STATUS_BUSY); goto unlock; @@ -1379,15 +1873,22 @@ static int set_le(struct sock *sk, struct hci_dev *hdev, void *data, u16 len) goto unlock; } + hci_req_init(&req, hdev); + memset(&hci_cp, 0, sizeof(hci_cp)); if (val) { hci_cp.le = val; hci_cp.simul = lmp_le_br_capable(hdev); + } else { + if (test_bit(HCI_ADVERTISING, &hdev->dev_flags)) + disable_advertising(&req); } - err = hci_send_cmd(hdev, HCI_OP_WRITE_LE_HOST_SUPPORTED, sizeof(hci_cp), - &hci_cp); + hci_req_add(&req, HCI_OP_WRITE_LE_HOST_SUPPORTED, sizeof(hci_cp), + &hci_cp); + + err = hci_req_run(&req, le_enable_complete); if (err < 0) mgmt_pending_remove(cmd); @@ -1706,6 +2207,12 @@ static int load_link_keys(struct sock *sk, struct hci_dev *hdev, void *data, u16 key_count, expected_len; int i; + BT_DBG("request for %s", hdev->name); + + if (!lmp_bredr_capable(hdev)) + return cmd_status(sk, hdev->id, MGMT_OP_LOAD_LINK_KEYS, + MGMT_STATUS_NOT_SUPPORTED); + key_count = __le16_to_cpu(cp->key_count); expected_len = sizeof(*cp) + key_count * @@ -2515,8 +3022,11 @@ static int set_local_name(struct sock *sk, struct hci_dev *hdev, void *data, update_eir(&req); } + /* The name is stored in the scan response data and so + * no need to udpate the advertising data here. + */ if (lmp_le_capable(hdev)) - hci_update_ad(&req); + update_scan_rsp_data(&req); err = hci_req_run(&req, set_name_complete); if (err < 0) @@ -2685,6 +3195,7 @@ static int start_discovery(struct sock *sk, struct hci_dev *hdev, struct hci_request req; /* General inquiry access code (GIAC) */ u8 lap[3] = { 0x33, 0x8b, 0x9e }; + u8 status; int err; BT_DBG("%s", hdev->name); @@ -2721,9 +3232,10 @@ static int start_discovery(struct sock *sk, struct hci_dev *hdev, switch (hdev->discovery.type) { case DISCOV_TYPE_BREDR: - if (!lmp_bredr_capable(hdev)) { + status = mgmt_bredr_support(hdev); + if (status) { err = cmd_status(sk, hdev->id, MGMT_OP_START_DISCOVERY, - MGMT_STATUS_NOT_SUPPORTED); + status); mgmt_pending_remove(cmd); goto failed; } @@ -2745,22 +3257,23 @@ static int start_discovery(struct sock *sk, struct hci_dev *hdev, case DISCOV_TYPE_LE: case DISCOV_TYPE_INTERLEAVED: - if (!test_bit(HCI_LE_ENABLED, &hdev->dev_flags)) { + status = mgmt_le_support(hdev); + if (status) { err = cmd_status(sk, hdev->id, MGMT_OP_START_DISCOVERY, - MGMT_STATUS_NOT_SUPPORTED); + status); mgmt_pending_remove(cmd); goto failed; } if (hdev->discovery.type == DISCOV_TYPE_INTERLEAVED && - !lmp_bredr_capable(hdev)) { + !test_bit(HCI_BREDR_ENABLED, &hdev->dev_flags)) { err = cmd_status(sk, hdev->id, MGMT_OP_START_DISCOVERY, MGMT_STATUS_NOT_SUPPORTED); mgmt_pending_remove(cmd); goto failed; } - if (test_bit(HCI_LE_PERIPHERAL, &hdev->dev_flags)) { + if (test_bit(HCI_ADVERTISING, &hdev->dev_flags)) { err = cmd_status(sk, hdev->id, MGMT_OP_START_DISCOVERY, MGMT_STATUS_REJECTED); mgmt_pending_remove(cmd); @@ -2778,6 +3291,7 @@ static int start_discovery(struct sock *sk, struct hci_dev *hdev, param_cp.type = LE_SCAN_ACTIVE; param_cp.interval = cpu_to_le16(DISCOV_LE_SCAN_INT); param_cp.window = cpu_to_le16(DISCOV_LE_SCAN_WIN); + param_cp.own_address_type = hdev->own_addr_type; hci_req_add(&req, HCI_OP_LE_SET_SCAN_PARAM, sizeof(param_cp), ¶m_cp); @@ -3065,6 +3579,186 @@ static int set_device_id(struct sock *sk, struct hci_dev *hdev, void *data, return err; } +static void set_advertising_complete(struct hci_dev *hdev, u8 status) +{ + struct cmd_lookup match = { NULL, hdev }; + + if (status) { + u8 mgmt_err = mgmt_status(status); + + mgmt_pending_foreach(MGMT_OP_SET_ADVERTISING, hdev, + cmd_status_rsp, &mgmt_err); + return; + } + + mgmt_pending_foreach(MGMT_OP_SET_ADVERTISING, hdev, settings_rsp, + &match); + + new_settings(hdev, match.sk); + + if (match.sk) + sock_put(match.sk); +} + +static int set_advertising(struct sock *sk, struct hci_dev *hdev, void *data, + u16 len) +{ + struct mgmt_mode *cp = data; + struct pending_cmd *cmd; + struct hci_request req; + u8 val, enabled, status; + int err; + + BT_DBG("request for %s", hdev->name); + + status = mgmt_le_support(hdev); + if (status) + return cmd_status(sk, hdev->id, MGMT_OP_SET_ADVERTISING, + status); + + if (cp->val != 0x00 && cp->val != 0x01) + return cmd_status(sk, hdev->id, MGMT_OP_SET_ADVERTISING, + MGMT_STATUS_INVALID_PARAMS); + + hci_dev_lock(hdev); + + val = !!cp->val; + enabled = test_bit(HCI_ADVERTISING, &hdev->dev_flags); + + /* The following conditions are ones which mean that we should + * not do any HCI communication but directly send a mgmt + * response to user space (after toggling the flag if + * necessary). + */ + if (!hdev_is_powered(hdev) || val == enabled || + hci_conn_num(hdev, LE_LINK) > 0) { + bool changed = false; + + if (val != test_bit(HCI_ADVERTISING, &hdev->dev_flags)) { + change_bit(HCI_ADVERTISING, &hdev->dev_flags); + changed = true; + } + + err = send_settings_rsp(sk, MGMT_OP_SET_ADVERTISING, hdev); + if (err < 0) + goto unlock; + + if (changed) + err = new_settings(hdev, sk); + + goto unlock; + } + + if (mgmt_pending_find(MGMT_OP_SET_ADVERTISING, hdev) || + mgmt_pending_find(MGMT_OP_SET_LE, hdev)) { + err = cmd_status(sk, hdev->id, MGMT_OP_SET_ADVERTISING, + MGMT_STATUS_BUSY); + goto unlock; + } + + cmd = mgmt_pending_add(sk, MGMT_OP_SET_ADVERTISING, hdev, data, len); + if (!cmd) { + err = -ENOMEM; + goto unlock; + } + + hci_req_init(&req, hdev); + + if (val) + enable_advertising(&req); + else + disable_advertising(&req); + + err = hci_req_run(&req, set_advertising_complete); + if (err < 0) + mgmt_pending_remove(cmd); + +unlock: + hci_dev_unlock(hdev); + return err; +} + +static int set_static_address(struct sock *sk, struct hci_dev *hdev, + void *data, u16 len) +{ + struct mgmt_cp_set_static_address *cp = data; + int err; + + BT_DBG("%s", hdev->name); + + if (!lmp_le_capable(hdev)) + return cmd_status(sk, hdev->id, MGMT_OP_SET_STATIC_ADDRESS, + MGMT_STATUS_NOT_SUPPORTED); + + if (hdev_is_powered(hdev)) + return cmd_status(sk, hdev->id, MGMT_OP_SET_STATIC_ADDRESS, + MGMT_STATUS_REJECTED); + + if (bacmp(&cp->bdaddr, BDADDR_ANY)) { + if (!bacmp(&cp->bdaddr, BDADDR_NONE)) + return cmd_status(sk, hdev->id, + MGMT_OP_SET_STATIC_ADDRESS, + MGMT_STATUS_INVALID_PARAMS); + + /* Two most significant bits shall be set */ + if ((cp->bdaddr.b[5] & 0xc0) != 0xc0) + return cmd_status(sk, hdev->id, + MGMT_OP_SET_STATIC_ADDRESS, + MGMT_STATUS_INVALID_PARAMS); + } + + hci_dev_lock(hdev); + + bacpy(&hdev->static_addr, &cp->bdaddr); + + err = cmd_complete(sk, hdev->id, MGMT_OP_SET_STATIC_ADDRESS, 0, NULL, 0); + + hci_dev_unlock(hdev); + + return err; +} + +static int set_scan_params(struct sock *sk, struct hci_dev *hdev, + void *data, u16 len) +{ + struct mgmt_cp_set_scan_params *cp = data; + __u16 interval, window; + int err; + + BT_DBG("%s", hdev->name); + + if (!lmp_le_capable(hdev)) + return cmd_status(sk, hdev->id, MGMT_OP_SET_SCAN_PARAMS, + MGMT_STATUS_NOT_SUPPORTED); + + interval = __le16_to_cpu(cp->interval); + + if (interval < 0x0004 || interval > 0x4000) + return cmd_status(sk, hdev->id, MGMT_OP_SET_SCAN_PARAMS, + MGMT_STATUS_INVALID_PARAMS); + + window = __le16_to_cpu(cp->window); + + if (window < 0x0004 || window > 0x4000) + return cmd_status(sk, hdev->id, MGMT_OP_SET_SCAN_PARAMS, + MGMT_STATUS_INVALID_PARAMS); + + if (window > interval) + return cmd_status(sk, hdev->id, MGMT_OP_SET_SCAN_PARAMS, + MGMT_STATUS_INVALID_PARAMS); + + hci_dev_lock(hdev); + + hdev->le_scan_interval = interval; + hdev->le_scan_window = window; + + err = cmd_complete(sk, hdev->id, MGMT_OP_SET_SCAN_PARAMS, 0, NULL, 0); + + hci_dev_unlock(hdev); + + return err; +} + static void fast_connectable_complete(struct hci_dev *hdev, u8 status) { struct pending_cmd *cmd; @@ -3108,7 +3802,8 @@ static int set_fast_connectable(struct sock *sk, struct hci_dev *hdev, BT_DBG("%s", hdev->name); - if (!lmp_bredr_capable(hdev) || hdev->hci_ver < BLUETOOTH_VER_1_2) + if (!test_bit(HCI_BREDR_ENABLED, &hdev->dev_flags) || + hdev->hci_ver < BLUETOOTH_VER_1_2) return cmd_status(sk, hdev->id, MGMT_OP_SET_FAST_CONNECTABLE, MGMT_STATUS_NOT_SUPPORTED); @@ -3162,6 +3857,148 @@ unlock: return err; } +static void set_bredr_scan(struct hci_request *req) +{ + struct hci_dev *hdev = req->hdev; + u8 scan = 0; + + /* Ensure that fast connectable is disabled. This function will + * not do anything if the page scan parameters are already what + * they should be. + */ + write_fast_connectable(req, false); + + if (test_bit(HCI_CONNECTABLE, &hdev->dev_flags)) + scan |= SCAN_PAGE; + if (test_bit(HCI_DISCOVERABLE, &hdev->dev_flags)) + scan |= SCAN_INQUIRY; + + if (scan) + hci_req_add(req, HCI_OP_WRITE_SCAN_ENABLE, 1, &scan); +} + +static void set_bredr_complete(struct hci_dev *hdev, u8 status) +{ + struct pending_cmd *cmd; + + BT_DBG("status 0x%02x", status); + + hci_dev_lock(hdev); + + cmd = mgmt_pending_find(MGMT_OP_SET_BREDR, hdev); + if (!cmd) + goto unlock; + + if (status) { + u8 mgmt_err = mgmt_status(status); + + /* We need to restore the flag if related HCI commands + * failed. + */ + clear_bit(HCI_BREDR_ENABLED, &hdev->dev_flags); + + cmd_status(cmd->sk, cmd->index, cmd->opcode, mgmt_err); + } else { + send_settings_rsp(cmd->sk, MGMT_OP_SET_BREDR, hdev); + new_settings(hdev, cmd->sk); + } + + mgmt_pending_remove(cmd); + +unlock: + hci_dev_unlock(hdev); +} + +static int set_bredr(struct sock *sk, struct hci_dev *hdev, void *data, u16 len) +{ + struct mgmt_mode *cp = data; + struct pending_cmd *cmd; + struct hci_request req; + int err; + + BT_DBG("request for %s", hdev->name); + + if (!lmp_bredr_capable(hdev) || !lmp_le_capable(hdev)) + return cmd_status(sk, hdev->id, MGMT_OP_SET_BREDR, + MGMT_STATUS_NOT_SUPPORTED); + + if (!test_bit(HCI_LE_ENABLED, &hdev->dev_flags)) + return cmd_status(sk, hdev->id, MGMT_OP_SET_BREDR, + MGMT_STATUS_REJECTED); + + if (cp->val != 0x00 && cp->val != 0x01) + return cmd_status(sk, hdev->id, MGMT_OP_SET_BREDR, + MGMT_STATUS_INVALID_PARAMS); + + hci_dev_lock(hdev); + + if (cp->val == test_bit(HCI_BREDR_ENABLED, &hdev->dev_flags)) { + err = send_settings_rsp(sk, MGMT_OP_SET_BREDR, hdev); + goto unlock; + } + + if (!hdev_is_powered(hdev)) { + if (!cp->val) { + clear_bit(HCI_DISCOVERABLE, &hdev->dev_flags); + clear_bit(HCI_SSP_ENABLED, &hdev->dev_flags); + clear_bit(HCI_LINK_SECURITY, &hdev->dev_flags); + clear_bit(HCI_FAST_CONNECTABLE, &hdev->dev_flags); + clear_bit(HCI_HS_ENABLED, &hdev->dev_flags); + } + + change_bit(HCI_BREDR_ENABLED, &hdev->dev_flags); + + err = send_settings_rsp(sk, MGMT_OP_SET_BREDR, hdev); + if (err < 0) + goto unlock; + + err = new_settings(hdev, sk); + goto unlock; + } + + /* Reject disabling when powered on */ + if (!cp->val) { + err = cmd_status(sk, hdev->id, MGMT_OP_SET_BREDR, + MGMT_STATUS_REJECTED); + goto unlock; + } + + if (mgmt_pending_find(MGMT_OP_SET_BREDR, hdev)) { + err = cmd_status(sk, hdev->id, MGMT_OP_SET_BREDR, + MGMT_STATUS_BUSY); + goto unlock; + } + + cmd = mgmt_pending_add(sk, MGMT_OP_SET_BREDR, hdev, data, len); + if (!cmd) { + err = -ENOMEM; + goto unlock; + } + + /* We need to flip the bit already here so that update_adv_data + * generates the correct flags. + */ + set_bit(HCI_BREDR_ENABLED, &hdev->dev_flags); + + hci_req_init(&req, hdev); + + if (test_bit(HCI_CONNECTABLE, &hdev->dev_flags)) + set_bredr_scan(&req); + + /* Since only the advertising data flags will change, there + * is no need to update the scan response data. + */ + update_adv_data(&req); + + err = hci_req_run(&req, set_bredr_complete); + if (err < 0) + mgmt_pending_remove(cmd); + +unlock: + hci_dev_unlock(hdev); + return err; +} + static bool ltk_is_valid(struct mgmt_ltk_info *key) { if (key->authenticated != 0x00 && key->authenticated != 0x01) @@ -3180,6 +4017,12 @@ static int load_long_term_keys(struct sock *sk, struct hci_dev *hdev, u16 key_count, expected_len; int i, err; + BT_DBG("request for %s", hdev->name); + + if (!lmp_le_capable(hdev)) + return cmd_status(sk, hdev->id, MGMT_OP_LOAD_LONG_TERM_KEYS, + MGMT_STATUS_NOT_SUPPORTED); + key_count = __le16_to_cpu(cp->key_count); expected_len = sizeof(*cp) + key_count * @@ -3208,15 +4051,19 @@ static int load_long_term_keys(struct sock *sk, struct hci_dev *hdev, for (i = 0; i < key_count; i++) { struct mgmt_ltk_info *key = &cp->keys[i]; - u8 type; + u8 type, addr_type; + + if (key->addr.type == BDADDR_LE_PUBLIC) + addr_type = ADDR_LE_DEV_PUBLIC; + else + addr_type = ADDR_LE_DEV_RANDOM; if (key->master) type = HCI_SMP_LTK; else type = HCI_SMP_LTK_SLAVE; - hci_add_ltk(hdev, &key->addr.bdaddr, - bdaddr_to_le(key->addr.type), + hci_add_ltk(hdev, &key->addr.bdaddr, addr_type, type, 0, key->authenticated, key->val, key->enc_size, key->ediv, key->rand); } @@ -3276,6 +4123,10 @@ static const struct mgmt_handler { { block_device, false, MGMT_BLOCK_DEVICE_SIZE }, { unblock_device, false, MGMT_UNBLOCK_DEVICE_SIZE }, { set_device_id, false, MGMT_SET_DEVICE_ID_SIZE }, + { set_advertising, false, MGMT_SETTING_SIZE }, + { set_bredr, false, MGMT_SETTING_SIZE }, + { set_static_address, false, MGMT_SET_STATIC_ADDRESS_SIZE }, + { set_scan_params, false, MGMT_SET_SCAN_PARAMS_SIZE }, }; @@ -3320,6 +4171,13 @@ int mgmt_control(struct sock *sk, struct msghdr *msg, size_t msglen) MGMT_STATUS_INVALID_INDEX); goto done; } + + if (test_bit(HCI_SETUP, &hdev->dev_flags) || + test_bit(HCI_USER_CHANNEL, &hdev->dev_flags)) { + err = cmd_status(sk, index, opcode, + MGMT_STATUS_INVALID_INDEX); + goto done; + } } if (opcode >= ARRAY_SIZE(mgmt_handlers) || @@ -3365,74 +4223,24 @@ done: return err; } -static void cmd_status_rsp(struct pending_cmd *cmd, void *data) -{ - u8 *status = data; - - cmd_status(cmd->sk, cmd->index, cmd->opcode, *status); - mgmt_pending_remove(cmd); -} - -int mgmt_index_added(struct hci_dev *hdev) +void mgmt_index_added(struct hci_dev *hdev) { - if (!mgmt_valid_hdev(hdev)) - return -ENOTSUPP; + if (hdev->dev_type != HCI_BREDR) + return; - return mgmt_event(MGMT_EV_INDEX_ADDED, hdev, NULL, 0, NULL); + mgmt_event(MGMT_EV_INDEX_ADDED, hdev, NULL, 0, NULL); } -int mgmt_index_removed(struct hci_dev *hdev) +void mgmt_index_removed(struct hci_dev *hdev) { u8 status = MGMT_STATUS_INVALID_INDEX; - if (!mgmt_valid_hdev(hdev)) - return -ENOTSUPP; + if (hdev->dev_type != HCI_BREDR) + return; mgmt_pending_foreach(0, hdev, cmd_status_rsp, &status); - return mgmt_event(MGMT_EV_INDEX_REMOVED, hdev, NULL, 0, NULL); -} - -struct cmd_lookup { - struct sock *sk; - struct hci_dev *hdev; - u8 mgmt_status; -}; - -static void settings_rsp(struct pending_cmd *cmd, void *data) -{ - struct cmd_lookup *match = data; - - send_settings_rsp(cmd->sk, cmd->opcode, match->hdev); - - list_del(&cmd->list); - - if (match->sk == NULL) { - match->sk = cmd->sk; - sock_hold(match->sk); - } - - mgmt_pending_free(cmd); -} - -static void set_bredr_scan(struct hci_request *req) -{ - struct hci_dev *hdev = req->hdev; - u8 scan = 0; - - /* Ensure that fast connectable is disabled. This function will - * not do anything if the page scan parameters are already what - * they should be. - */ - write_fast_connectable(req, false); - - if (test_bit(HCI_CONNECTABLE, &hdev->dev_flags)) - scan |= SCAN_PAGE; - if (test_bit(HCI_DISCOVERABLE, &hdev->dev_flags)) - scan |= SCAN_INQUIRY; - - if (scan) - hci_req_add(req, HCI_OP_WRITE_SCAN_ENABLE, 1, &scan); + mgmt_event(MGMT_EV_INDEX_REMOVED, hdev, NULL, 0, NULL); } static void powered_complete(struct hci_dev *hdev, u8 status) @@ -3483,13 +4291,33 @@ static int powered_update_hci(struct hci_dev *hdev) sizeof(cp), &cp); } + if (lmp_le_capable(hdev)) { + /* Set random address to static address if configured */ + if (bacmp(&hdev->static_addr, BDADDR_ANY)) + hci_req_add(&req, HCI_OP_LE_SET_RANDOM_ADDR, 6, + &hdev->static_addr); + + /* Make sure the controller has a good default for + * advertising data. This also applies to the case + * where BR/EDR was toggled during the AUTO_OFF phase. + */ + if (test_bit(HCI_LE_ENABLED, &hdev->dev_flags)) { + update_adv_data(&req); + update_scan_rsp_data(&req); + } + + if (test_bit(HCI_ADVERTISING, &hdev->dev_flags)) + enable_advertising(&req); + } + link_sec = test_bit(HCI_LINK_SECURITY, &hdev->dev_flags); if (link_sec != test_bit(HCI_AUTH, &hdev->flags)) hci_req_add(&req, HCI_OP_WRITE_AUTH_ENABLE, sizeof(link_sec), &link_sec); if (lmp_bredr_capable(hdev)) { - set_bredr_scan(&req); + if (test_bit(HCI_BREDR_ENABLED, &hdev->dev_flags)) + set_bredr_scan(&req); update_class(&req); update_name(&req); update_eir(&req); @@ -3533,76 +4361,110 @@ new_settings: return err; } -int mgmt_set_powered_failed(struct hci_dev *hdev, int err) +void mgmt_set_powered_failed(struct hci_dev *hdev, int err) { struct pending_cmd *cmd; u8 status; cmd = mgmt_pending_find(MGMT_OP_SET_POWERED, hdev); if (!cmd) - return -ENOENT; + return; if (err == -ERFKILL) status = MGMT_STATUS_RFKILLED; else status = MGMT_STATUS_FAILED; - err = cmd_status(cmd->sk, hdev->id, MGMT_OP_SET_POWERED, status); + cmd_status(cmd->sk, hdev->id, MGMT_OP_SET_POWERED, status); mgmt_pending_remove(cmd); +} - return err; +void mgmt_discoverable_timeout(struct hci_dev *hdev) +{ + struct hci_request req; + + hci_dev_lock(hdev); + + /* When discoverable timeout triggers, then just make sure + * the limited discoverable flag is cleared. Even in the case + * of a timeout triggered from general discoverable, it is + * safe to unconditionally clear the flag. + */ + clear_bit(HCI_LIMITED_DISCOVERABLE, &hdev->dev_flags); + clear_bit(HCI_DISCOVERABLE, &hdev->dev_flags); + + hci_req_init(&req, hdev); + if (test_bit(HCI_BREDR_ENABLED, &hdev->dev_flags)) { + u8 scan = SCAN_PAGE; + hci_req_add(&req, HCI_OP_WRITE_SCAN_ENABLE, + sizeof(scan), &scan); + } + update_class(&req); + update_adv_data(&req); + hci_req_run(&req, NULL); + + hdev->discov_timeout = 0; + + new_settings(hdev, NULL); + + hci_dev_unlock(hdev); } -int mgmt_discoverable(struct hci_dev *hdev, u8 discoverable) +void mgmt_discoverable(struct hci_dev *hdev, u8 discoverable) { - struct cmd_lookup match = { NULL, hdev }; - bool changed = false; - int err = 0; + bool changed; + + /* Nothing needed here if there's a pending command since that + * commands request completion callback takes care of everything + * necessary. + */ + if (mgmt_pending_find(MGMT_OP_SET_DISCOVERABLE, hdev)) + return; if (discoverable) { - if (!test_and_set_bit(HCI_DISCOVERABLE, &hdev->dev_flags)) - changed = true; + changed = !test_and_set_bit(HCI_DISCOVERABLE, &hdev->dev_flags); } else { - if (test_and_clear_bit(HCI_DISCOVERABLE, &hdev->dev_flags)) - changed = true; + clear_bit(HCI_LIMITED_DISCOVERABLE, &hdev->dev_flags); + changed = test_and_clear_bit(HCI_DISCOVERABLE, &hdev->dev_flags); } - mgmt_pending_foreach(MGMT_OP_SET_DISCOVERABLE, hdev, settings_rsp, - &match); + if (changed) { + struct hci_request req; - if (changed) - err = new_settings(hdev, match.sk); - - if (match.sk) - sock_put(match.sk); + /* In case this change in discoverable was triggered by + * a disabling of connectable there could be a need to + * update the advertising flags. + */ + hci_req_init(&req, hdev); + update_adv_data(&req); + hci_req_run(&req, NULL); - return err; + new_settings(hdev, NULL); + } } -int mgmt_connectable(struct hci_dev *hdev, u8 connectable) +void mgmt_connectable(struct hci_dev *hdev, u8 connectable) { - struct pending_cmd *cmd; - bool changed = false; - int err = 0; + bool changed; - if (connectable) { - if (!test_and_set_bit(HCI_CONNECTABLE, &hdev->dev_flags)) - changed = true; - } else { - if (test_and_clear_bit(HCI_CONNECTABLE, &hdev->dev_flags)) - changed = true; - } + /* Nothing needed here if there's a pending command since that + * commands request completion callback takes care of everything + * necessary. + */ + if (mgmt_pending_find(MGMT_OP_SET_CONNECTABLE, hdev)) + return; - cmd = mgmt_pending_find(MGMT_OP_SET_CONNECTABLE, hdev); + if (connectable) + changed = !test_and_set_bit(HCI_CONNECTABLE, &hdev->dev_flags); + else + changed = test_and_clear_bit(HCI_CONNECTABLE, &hdev->dev_flags); if (changed) - err = new_settings(hdev, cmd ? cmd->sk : NULL); - - return err; + new_settings(hdev, NULL); } -int mgmt_write_scan_failed(struct hci_dev *hdev, u8 scan, u8 status) +void mgmt_write_scan_failed(struct hci_dev *hdev, u8 scan, u8 status) { u8 mgmt_err = mgmt_status(status); @@ -3613,12 +4475,10 @@ int mgmt_write_scan_failed(struct hci_dev *hdev, u8 scan, u8 status) if (scan & SCAN_INQUIRY) mgmt_pending_foreach(MGMT_OP_SET_DISCOVERABLE, hdev, cmd_status_rsp, &mgmt_err); - - return 0; } -int mgmt_new_link_key(struct hci_dev *hdev, struct link_key *key, - bool persistent) +void mgmt_new_link_key(struct hci_dev *hdev, struct link_key *key, + bool persistent) { struct mgmt_ev_new_link_key ev; @@ -3631,10 +4491,10 @@ int mgmt_new_link_key(struct hci_dev *hdev, struct link_key *key, memcpy(ev.key.val, key->val, HCI_LINK_KEY_SIZE); ev.key.pin_len = key->pin_len; - return mgmt_event(MGMT_EV_NEW_LINK_KEY, hdev, &ev, sizeof(ev), NULL); + mgmt_event(MGMT_EV_NEW_LINK_KEY, hdev, &ev, sizeof(ev), NULL); } -int mgmt_new_ltk(struct hci_dev *hdev, struct smp_ltk *key, u8 persistent) +void mgmt_new_ltk(struct hci_dev *hdev, struct smp_ltk *key, u8 persistent) { struct mgmt_ev_new_long_term_key ev; @@ -3653,13 +4513,23 @@ int mgmt_new_ltk(struct hci_dev *hdev, struct smp_ltk *key, u8 persistent) memcpy(ev.key.rand, key->rand, sizeof(key->rand)); memcpy(ev.key.val, key->val, sizeof(key->val)); - return mgmt_event(MGMT_EV_NEW_LONG_TERM_KEY, hdev, &ev, sizeof(ev), - NULL); + mgmt_event(MGMT_EV_NEW_LONG_TERM_KEY, hdev, &ev, sizeof(ev), NULL); +} + +static inline u16 eir_append_data(u8 *eir, u16 eir_len, u8 type, u8 *data, + u8 data_len) +{ + eir[eir_len++] = sizeof(type) + data_len; + eir[eir_len++] = type; + memcpy(&eir[eir_len], data, data_len); + eir_len += data_len; + + return eir_len; } -int mgmt_device_connected(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 link_type, - u8 addr_type, u32 flags, u8 *name, u8 name_len, - u8 *dev_class) +void mgmt_device_connected(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 link_type, + u8 addr_type, u32 flags, u8 *name, u8 name_len, + u8 *dev_class) { char buf[512]; struct mgmt_ev_device_connected *ev = (void *) buf; @@ -3680,8 +4550,8 @@ int mgmt_device_connected(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 link_type, ev->eir_len = cpu_to_le16(eir_len); - return mgmt_event(MGMT_EV_DEVICE_CONNECTED, hdev, buf, - sizeof(*ev) + eir_len, NULL); + mgmt_event(MGMT_EV_DEVICE_CONNECTED, hdev, buf, + sizeof(*ev) + eir_len, NULL); } static void disconnect_rsp(struct pending_cmd *cmd, void *data) @@ -3719,12 +4589,11 @@ static void unpair_device_rsp(struct pending_cmd *cmd, void *data) mgmt_pending_remove(cmd); } -int mgmt_device_disconnected(struct hci_dev *hdev, bdaddr_t *bdaddr, - u8 link_type, u8 addr_type, u8 reason) +void mgmt_device_disconnected(struct hci_dev *hdev, bdaddr_t *bdaddr, + u8 link_type, u8 addr_type, u8 reason) { struct mgmt_ev_device_disconnected ev; struct sock *sk = NULL; - int err; mgmt_pending_foreach(MGMT_OP_DISCONNECT, hdev, disconnect_rsp, &sk); @@ -3732,45 +4601,39 @@ int mgmt_device_disconnected(struct hci_dev *hdev, bdaddr_t *bdaddr, ev.addr.type = link_to_bdaddr(link_type, addr_type); ev.reason = reason; - err = mgmt_event(MGMT_EV_DEVICE_DISCONNECTED, hdev, &ev, sizeof(ev), - sk); + mgmt_event(MGMT_EV_DEVICE_DISCONNECTED, hdev, &ev, sizeof(ev), sk); if (sk) sock_put(sk); mgmt_pending_foreach(MGMT_OP_UNPAIR_DEVICE, hdev, unpair_device_rsp, hdev); - - return err; } -int mgmt_disconnect_failed(struct hci_dev *hdev, bdaddr_t *bdaddr, - u8 link_type, u8 addr_type, u8 status) +void mgmt_disconnect_failed(struct hci_dev *hdev, bdaddr_t *bdaddr, + u8 link_type, u8 addr_type, u8 status) { struct mgmt_rp_disconnect rp; struct pending_cmd *cmd; - int err; mgmt_pending_foreach(MGMT_OP_UNPAIR_DEVICE, hdev, unpair_device_rsp, hdev); cmd = mgmt_pending_find(MGMT_OP_DISCONNECT, hdev); if (!cmd) - return -ENOENT; + return; bacpy(&rp.addr.bdaddr, bdaddr); rp.addr.type = link_to_bdaddr(link_type, addr_type); - err = cmd_complete(cmd->sk, cmd->index, MGMT_OP_DISCONNECT, - mgmt_status(status), &rp, sizeof(rp)); + cmd_complete(cmd->sk, cmd->index, MGMT_OP_DISCONNECT, + mgmt_status(status), &rp, sizeof(rp)); mgmt_pending_remove(cmd); - - return err; } -int mgmt_connect_failed(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 link_type, - u8 addr_type, u8 status) +void mgmt_connect_failed(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 link_type, + u8 addr_type, u8 status) { struct mgmt_ev_connect_failed ev; @@ -3778,10 +4641,10 @@ int mgmt_connect_failed(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 link_type, ev.addr.type = link_to_bdaddr(link_type, addr_type); ev.status = mgmt_status(status); - return mgmt_event(MGMT_EV_CONNECT_FAILED, hdev, &ev, sizeof(ev), NULL); + mgmt_event(MGMT_EV_CONNECT_FAILED, hdev, &ev, sizeof(ev), NULL); } -int mgmt_pin_code_request(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 secure) +void mgmt_pin_code_request(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 secure) { struct mgmt_ev_pin_code_request ev; @@ -3789,52 +4652,45 @@ int mgmt_pin_code_request(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 secure) ev.addr.type = BDADDR_BREDR; ev.secure = secure; - return mgmt_event(MGMT_EV_PIN_CODE_REQUEST, hdev, &ev, sizeof(ev), - NULL); + mgmt_event(MGMT_EV_PIN_CODE_REQUEST, hdev, &ev, sizeof(ev), NULL); } -int mgmt_pin_code_reply_complete(struct hci_dev *hdev, bdaddr_t *bdaddr, - u8 status) +void mgmt_pin_code_reply_complete(struct hci_dev *hdev, bdaddr_t *bdaddr, + u8 status) { struct pending_cmd *cmd; struct mgmt_rp_pin_code_reply rp; - int err; cmd = mgmt_pending_find(MGMT_OP_PIN_CODE_REPLY, hdev); if (!cmd) - return -ENOENT; + return; bacpy(&rp.addr.bdaddr, bdaddr); rp.addr.type = BDADDR_BREDR; - err = cmd_complete(cmd->sk, hdev->id, MGMT_OP_PIN_CODE_REPLY, - mgmt_status(status), &rp, sizeof(rp)); + cmd_complete(cmd->sk, hdev->id, MGMT_OP_PIN_CODE_REPLY, + mgmt_status(status), &rp, sizeof(rp)); mgmt_pending_remove(cmd); - - return err; } -int mgmt_pin_code_neg_reply_complete(struct hci_dev *hdev, bdaddr_t *bdaddr, - u8 status) +void mgmt_pin_code_neg_reply_complete(struct hci_dev *hdev, bdaddr_t *bdaddr, + u8 status) { struct pending_cmd *cmd; struct mgmt_rp_pin_code_reply rp; - int err; cmd = mgmt_pending_find(MGMT_OP_PIN_CODE_NEG_REPLY, hdev); if (!cmd) - return -ENOENT; + return; bacpy(&rp.addr.bdaddr, bdaddr); rp.addr.type = BDADDR_BREDR; - err = cmd_complete(cmd->sk, hdev->id, MGMT_OP_PIN_CODE_NEG_REPLY, - mgmt_status(status), &rp, sizeof(rp)); + cmd_complete(cmd->sk, hdev->id, MGMT_OP_PIN_CODE_NEG_REPLY, + mgmt_status(status), &rp, sizeof(rp)); mgmt_pending_remove(cmd); - - return err; } int mgmt_user_confirm_request(struct hci_dev *hdev, bdaddr_t *bdaddr, @@ -3936,8 +4792,8 @@ int mgmt_user_passkey_notify(struct hci_dev *hdev, bdaddr_t *bdaddr, return mgmt_event(MGMT_EV_PASSKEY_NOTIFY, hdev, &ev, sizeof(ev), NULL); } -int mgmt_auth_failed(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 link_type, - u8 addr_type, u8 status) +void mgmt_auth_failed(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 link_type, + u8 addr_type, u8 status) { struct mgmt_ev_auth_failed ev; @@ -3945,40 +4801,36 @@ int mgmt_auth_failed(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 link_type, ev.addr.type = link_to_bdaddr(link_type, addr_type); ev.status = mgmt_status(status); - return mgmt_event(MGMT_EV_AUTH_FAILED, hdev, &ev, sizeof(ev), NULL); + mgmt_event(MGMT_EV_AUTH_FAILED, hdev, &ev, sizeof(ev), NULL); } -int mgmt_auth_enable_complete(struct hci_dev *hdev, u8 status) +void mgmt_auth_enable_complete(struct hci_dev *hdev, u8 status) { struct cmd_lookup match = { NULL, hdev }; - bool changed = false; - int err = 0; + bool changed; if (status) { u8 mgmt_err = mgmt_status(status); mgmt_pending_foreach(MGMT_OP_SET_LINK_SECURITY, hdev, cmd_status_rsp, &mgmt_err); - return 0; + return; } - if (test_bit(HCI_AUTH, &hdev->flags)) { - if (!test_and_set_bit(HCI_LINK_SECURITY, &hdev->dev_flags)) - changed = true; - } else { - if (test_and_clear_bit(HCI_LINK_SECURITY, &hdev->dev_flags)) - changed = true; - } + if (test_bit(HCI_AUTH, &hdev->flags)) + changed = !test_and_set_bit(HCI_LINK_SECURITY, + &hdev->dev_flags); + else + changed = test_and_clear_bit(HCI_LINK_SECURITY, + &hdev->dev_flags); mgmt_pending_foreach(MGMT_OP_SET_LINK_SECURITY, hdev, settings_rsp, &match); if (changed) - err = new_settings(hdev, match.sk); + new_settings(hdev, match.sk); if (match.sk) sock_put(match.sk); - - return err; } static void clear_eir(struct hci_request *req) @@ -3996,38 +4848,41 @@ static void clear_eir(struct hci_request *req) hci_req_add(req, HCI_OP_WRITE_EIR, sizeof(cp), &cp); } -int mgmt_ssp_enable_complete(struct hci_dev *hdev, u8 enable, u8 status) +void mgmt_ssp_enable_complete(struct hci_dev *hdev, u8 enable, u8 status) { struct cmd_lookup match = { NULL, hdev }; struct hci_request req; bool changed = false; - int err = 0; if (status) { u8 mgmt_err = mgmt_status(status); if (enable && test_and_clear_bit(HCI_SSP_ENABLED, - &hdev->dev_flags)) - err = new_settings(hdev, NULL); + &hdev->dev_flags)) { + clear_bit(HCI_HS_ENABLED, &hdev->dev_flags); + new_settings(hdev, NULL); + } mgmt_pending_foreach(MGMT_OP_SET_SSP, hdev, cmd_status_rsp, &mgmt_err); - - return err; + return; } if (enable) { - if (!test_and_set_bit(HCI_SSP_ENABLED, &hdev->dev_flags)) - changed = true; + changed = !test_and_set_bit(HCI_SSP_ENABLED, &hdev->dev_flags); } else { - if (test_and_clear_bit(HCI_SSP_ENABLED, &hdev->dev_flags)) - changed = true; + changed = test_and_clear_bit(HCI_SSP_ENABLED, &hdev->dev_flags); + if (!changed) + changed = test_and_clear_bit(HCI_HS_ENABLED, + &hdev->dev_flags); + else + clear_bit(HCI_HS_ENABLED, &hdev->dev_flags); } mgmt_pending_foreach(MGMT_OP_SET_SSP, hdev, settings_rsp, &match); if (changed) - err = new_settings(hdev, match.sk); + new_settings(hdev, match.sk); if (match.sk) sock_put(match.sk); @@ -4040,8 +4895,6 @@ int mgmt_ssp_enable_complete(struct hci_dev *hdev, u8 enable, u8 status) clear_eir(&req); hci_req_run(&req, NULL); - - return err; } static void sk_lookup(struct pending_cmd *cmd, void *data) @@ -4054,33 +4907,30 @@ static void sk_lookup(struct pending_cmd *cmd, void *data) } } -int mgmt_set_class_of_dev_complete(struct hci_dev *hdev, u8 *dev_class, - u8 status) +void mgmt_set_class_of_dev_complete(struct hci_dev *hdev, u8 *dev_class, + u8 status) { struct cmd_lookup match = { NULL, hdev, mgmt_status(status) }; - int err = 0; mgmt_pending_foreach(MGMT_OP_SET_DEV_CLASS, hdev, sk_lookup, &match); mgmt_pending_foreach(MGMT_OP_ADD_UUID, hdev, sk_lookup, &match); mgmt_pending_foreach(MGMT_OP_REMOVE_UUID, hdev, sk_lookup, &match); if (!status) - err = mgmt_event(MGMT_EV_CLASS_OF_DEV_CHANGED, hdev, dev_class, - 3, NULL); + mgmt_event(MGMT_EV_CLASS_OF_DEV_CHANGED, hdev, dev_class, 3, + NULL); if (match.sk) sock_put(match.sk); - - return err; } -int mgmt_set_local_name_complete(struct hci_dev *hdev, u8 *name, u8 status) +void mgmt_set_local_name_complete(struct hci_dev *hdev, u8 *name, u8 status) { struct mgmt_cp_set_local_name ev; struct pending_cmd *cmd; if (status) - return 0; + return; memset(&ev, 0, sizeof(ev)); memcpy(ev.name, name, HCI_MAX_NAME_LENGTH); @@ -4094,96 +4944,54 @@ int mgmt_set_local_name_complete(struct hci_dev *hdev, u8 *name, u8 status) * HCI dev don't send any mgmt signals. */ if (mgmt_pending_find(MGMT_OP_SET_POWERED, hdev)) - return 0; + return; } - return mgmt_event(MGMT_EV_LOCAL_NAME_CHANGED, hdev, &ev, sizeof(ev), - cmd ? cmd->sk : NULL); + mgmt_event(MGMT_EV_LOCAL_NAME_CHANGED, hdev, &ev, sizeof(ev), + cmd ? cmd->sk : NULL); } -int mgmt_read_local_oob_data_reply_complete(struct hci_dev *hdev, u8 *hash, - u8 *randomizer, u8 status) +void mgmt_read_local_oob_data_reply_complete(struct hci_dev *hdev, u8 *hash, + u8 *randomizer, u8 status) { struct pending_cmd *cmd; - int err; BT_DBG("%s status %u", hdev->name, status); cmd = mgmt_pending_find(MGMT_OP_READ_LOCAL_OOB_DATA, hdev); if (!cmd) - return -ENOENT; + return; if (status) { - err = cmd_status(cmd->sk, hdev->id, MGMT_OP_READ_LOCAL_OOB_DATA, - mgmt_status(status)); + cmd_status(cmd->sk, hdev->id, MGMT_OP_READ_LOCAL_OOB_DATA, + mgmt_status(status)); } else { struct mgmt_rp_read_local_oob_data rp; memcpy(rp.hash, hash, sizeof(rp.hash)); memcpy(rp.randomizer, randomizer, sizeof(rp.randomizer)); - err = cmd_complete(cmd->sk, hdev->id, - MGMT_OP_READ_LOCAL_OOB_DATA, 0, &rp, - sizeof(rp)); + cmd_complete(cmd->sk, hdev->id, MGMT_OP_READ_LOCAL_OOB_DATA, + 0, &rp, sizeof(rp)); } mgmt_pending_remove(cmd); - - return err; } -int mgmt_le_enable_complete(struct hci_dev *hdev, u8 enable, u8 status) -{ - struct cmd_lookup match = { NULL, hdev }; - bool changed = false; - int err = 0; - - if (status) { - u8 mgmt_err = mgmt_status(status); - - if (enable && test_and_clear_bit(HCI_LE_ENABLED, - &hdev->dev_flags)) - err = new_settings(hdev, NULL); - - mgmt_pending_foreach(MGMT_OP_SET_LE, hdev, cmd_status_rsp, - &mgmt_err); - - return err; - } - - if (enable) { - if (!test_and_set_bit(HCI_LE_ENABLED, &hdev->dev_flags)) - changed = true; - } else { - if (test_and_clear_bit(HCI_LE_ENABLED, &hdev->dev_flags)) - changed = true; - } - - mgmt_pending_foreach(MGMT_OP_SET_LE, hdev, settings_rsp, &match); - - if (changed) - err = new_settings(hdev, match.sk); - - if (match.sk) - sock_put(match.sk); - - return err; -} - -int mgmt_device_found(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 link_type, - u8 addr_type, u8 *dev_class, s8 rssi, u8 cfm_name, u8 - ssp, u8 *eir, u16 eir_len) +void mgmt_device_found(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 link_type, + u8 addr_type, u8 *dev_class, s8 rssi, u8 cfm_name, u8 + ssp, u8 *eir, u16 eir_len) { char buf[512]; struct mgmt_ev_device_found *ev = (void *) buf; size_t ev_size; if (!hci_discovery_active(hdev)) - return -EPERM; + return; /* Leave 5 bytes for a potential CoD field */ if (sizeof(*ev) + eir_len + 5 > sizeof(buf)) - return -EINVAL; + return; memset(buf, 0, sizeof(buf)); @@ -4205,11 +5013,11 @@ int mgmt_device_found(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 link_type, ev->eir_len = cpu_to_le16(eir_len); ev_size = sizeof(*ev) + eir_len; - return mgmt_event(MGMT_EV_DEVICE_FOUND, hdev, ev, ev_size, NULL); + mgmt_event(MGMT_EV_DEVICE_FOUND, hdev, ev, ev_size, NULL); } -int mgmt_remote_name(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 link_type, - u8 addr_type, s8 rssi, u8 *name, u8 name_len) +void mgmt_remote_name(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 link_type, + u8 addr_type, s8 rssi, u8 *name, u8 name_len) { struct mgmt_ev_device_found *ev; char buf[sizeof(*ev) + HCI_MAX_NAME_LENGTH + 2]; @@ -4228,11 +5036,10 @@ int mgmt_remote_name(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 link_type, ev->eir_len = cpu_to_le16(eir_len); - return mgmt_event(MGMT_EV_DEVICE_FOUND, hdev, ev, - sizeof(*ev) + eir_len, NULL); + mgmt_event(MGMT_EV_DEVICE_FOUND, hdev, ev, sizeof(*ev) + eir_len, NULL); } -int mgmt_discovering(struct hci_dev *hdev, u8 discovering) +void mgmt_discovering(struct hci_dev *hdev, u8 discovering) { struct mgmt_ev_discovering ev; struct pending_cmd *cmd; @@ -4256,7 +5063,7 @@ int mgmt_discovering(struct hci_dev *hdev, u8 discovering) ev.type = hdev->discovery.type; ev.discovering = discovering; - return mgmt_event(MGMT_EV_DISCOVERING, hdev, &ev, sizeof(ev), NULL); + mgmt_event(MGMT_EV_DISCOVERING, hdev, &ev, sizeof(ev), NULL); } int mgmt_device_blocked(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 type) @@ -4287,5 +5094,35 @@ int mgmt_device_unblocked(struct hci_dev *hdev, bdaddr_t *bdaddr, u8 type) cmd ? cmd->sk : NULL); } -module_param(enable_hs, bool, 0644); -MODULE_PARM_DESC(enable_hs, "Enable High Speed support"); +static void adv_enable_complete(struct hci_dev *hdev, u8 status) +{ + BT_DBG("%s status %u", hdev->name, status); + + /* Clear the advertising mgmt setting if we failed to re-enable it */ + if (status) { + clear_bit(HCI_ADVERTISING, &hdev->dev_flags); + new_settings(hdev, NULL); + } +} + +void mgmt_reenable_advertising(struct hci_dev *hdev) +{ + struct hci_request req; + + if (hci_conn_num(hdev, LE_LINK) > 0) + return; + + if (!test_bit(HCI_ADVERTISING, &hdev->dev_flags)) + return; + + hci_req_init(&req, hdev); + enable_advertising(&req); + + /* If this fails we have no option but to let user space know + * that we've disabled advertising. + */ + if (hci_req_run(&req, adv_enable_complete) < 0) { + clear_bit(HCI_ADVERTISING, &hdev->dev_flags); + new_settings(hdev, NULL); + } +} diff --git a/net/bluetooth/rfcomm/core.c b/net/bluetooth/rfcomm/core.c index ca957d34b0c8..94d06cbfbc18 100644 --- a/net/bluetooth/rfcomm/core.c +++ b/net/bluetooth/rfcomm/core.c @@ -641,13 +641,13 @@ static struct rfcomm_session *rfcomm_session_get(bdaddr_t *src, bdaddr_t *dst) { struct rfcomm_session *s; struct list_head *p, *n; - struct bt_sock *sk; + struct l2cap_chan *chan; list_for_each_safe(p, n, &session_list) { s = list_entry(p, struct rfcomm_session, list); - sk = bt_sk(s->sock->sk); + chan = l2cap_pi(s->sock->sk)->chan; - if ((!bacmp(src, BDADDR_ANY) || !bacmp(&sk->src, src)) && - !bacmp(&sk->dst, dst)) + if ((!bacmp(src, BDADDR_ANY) || !bacmp(&chan->src, src)) && + !bacmp(&chan->dst, dst)) return s; } return NULL; @@ -732,11 +732,11 @@ failed: void rfcomm_session_getaddr(struct rfcomm_session *s, bdaddr_t *src, bdaddr_t *dst) { - struct sock *sk = s->sock->sk; + struct l2cap_chan *chan = l2cap_pi(s->sock->sk)->chan; if (src) - bacpy(src, &bt_sk(sk)->src); + bacpy(src, &chan->src); if (dst) - bacpy(dst, &bt_sk(sk)->dst); + bacpy(dst, &chan->dst); } /* ---- RFCOMM frame sending ---- */ @@ -2112,12 +2112,11 @@ static int rfcomm_dlc_debugfs_show(struct seq_file *f, void *x) rfcomm_lock(); list_for_each_entry(s, &session_list, list) { + struct l2cap_chan *chan = l2cap_pi(s->sock->sk)->chan; struct rfcomm_dlc *d; list_for_each_entry(d, &s->dlcs, list) { - struct sock *sk = s->sock->sk; - seq_printf(f, "%pMR %pMR %ld %d %d %d %d\n", - &bt_sk(sk)->src, &bt_sk(sk)->dst, + &chan->src, &chan->dst, d->state, d->dlci, d->mtu, d->rx_credits, d->tx_credits); } @@ -2155,13 +2154,6 @@ static int __init rfcomm_init(void) goto unregister; } - if (bt_debugfs) { - rfcomm_dlc_debugfs = debugfs_create_file("rfcomm_dlc", 0444, - bt_debugfs, NULL, &rfcomm_dlc_debugfs_fops); - if (!rfcomm_dlc_debugfs) - BT_ERR("Failed to create RFCOMM debug file"); - } - err = rfcomm_init_ttys(); if (err < 0) goto stop; @@ -2172,6 +2164,13 @@ static int __init rfcomm_init(void) BT_INFO("RFCOMM ver %s", VERSION); + if (IS_ERR_OR_NULL(bt_debugfs)) + return 0; + + rfcomm_dlc_debugfs = debugfs_create_file("rfcomm_dlc", 0444, + bt_debugfs, NULL, + &rfcomm_dlc_debugfs_fops); + return 0; cleanup: diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c index 30b3721dc6d7..c4d3d423f89b 100644 --- a/net/bluetooth/rfcomm/sock.c +++ b/net/bluetooth/rfcomm/sock.c @@ -87,7 +87,8 @@ static void rfcomm_sk_state_change(struct rfcomm_dlc *d, int err) parent->sk_data_ready(parent, 0); } else { if (d->state == BT_CONNECTED) - rfcomm_session_getaddr(d->session, &bt_sk(sk)->src, NULL); + rfcomm_session_getaddr(d->session, + &rfcomm_pi(sk)->src, NULL); sk->sk_state_change(sk); } @@ -110,7 +111,7 @@ static struct sock *__rfcomm_get_sock_by_addr(u8 channel, bdaddr_t *src) sk_for_each(sk, &rfcomm_sk_list.head) { if (rfcomm_pi(sk)->channel == channel && - !bacmp(&bt_sk(sk)->src, src)) + !bacmp(&rfcomm_pi(sk)->src, src)) break; } @@ -132,11 +133,11 @@ static struct sock *rfcomm_get_sock_by_channel(int state, u8 channel, bdaddr_t * if (rfcomm_pi(sk)->channel == channel) { /* Exact match. */ - if (!bacmp(&bt_sk(sk)->src, src)) + if (!bacmp(&rfcomm_pi(sk)->src, src)) break; /* Closest match */ - if (!bacmp(&bt_sk(sk)->src, BDADDR_ANY)) + if (!bacmp(&rfcomm_pi(sk)->src, BDADDR_ANY)) sk1 = sk; } } @@ -355,7 +356,7 @@ static int rfcomm_sock_bind(struct socket *sock, struct sockaddr *addr, int addr err = -EADDRINUSE; } else { /* Save source address */ - bacpy(&bt_sk(sk)->src, &sa->rc_bdaddr); + bacpy(&rfcomm_pi(sk)->src, &sa->rc_bdaddr); rfcomm_pi(sk)->channel = sa->rc_channel; sk->sk_state = BT_BOUND; } @@ -393,13 +394,14 @@ static int rfcomm_sock_connect(struct socket *sock, struct sockaddr *addr, int a } sk->sk_state = BT_CONNECT; - bacpy(&bt_sk(sk)->dst, &sa->rc_bdaddr); + bacpy(&rfcomm_pi(sk)->dst, &sa->rc_bdaddr); rfcomm_pi(sk)->channel = sa->rc_channel; d->sec_level = rfcomm_pi(sk)->sec_level; d->role_switch = rfcomm_pi(sk)->role_switch; - err = rfcomm_dlc_open(d, &bt_sk(sk)->src, &sa->rc_bdaddr, sa->rc_channel); + err = rfcomm_dlc_open(d, &rfcomm_pi(sk)->src, &sa->rc_bdaddr, + sa->rc_channel); if (!err) err = bt_sock_wait_state(sk, BT_CONNECTED, sock_sndtimeo(sk, flags & O_NONBLOCK)); @@ -429,7 +431,7 @@ static int rfcomm_sock_listen(struct socket *sock, int backlog) } if (!rfcomm_pi(sk)->channel) { - bdaddr_t *src = &bt_sk(sk)->src; + bdaddr_t *src = &rfcomm_pi(sk)->src; u8 channel; err = -EINVAL; @@ -530,9 +532,9 @@ static int rfcomm_sock_getname(struct socket *sock, struct sockaddr *addr, int * sa->rc_family = AF_BLUETOOTH; sa->rc_channel = rfcomm_pi(sk)->channel; if (peer) - bacpy(&sa->rc_bdaddr, &bt_sk(sk)->dst); + bacpy(&sa->rc_bdaddr, &rfcomm_pi(sk)->dst); else - bacpy(&sa->rc_bdaddr, &bt_sk(sk)->src); + bacpy(&sa->rc_bdaddr, &rfcomm_pi(sk)->src); *len = sizeof(struct sockaddr_rc); return 0; @@ -544,7 +546,7 @@ static int rfcomm_sock_sendmsg(struct kiocb *iocb, struct socket *sock, struct sock *sk = sock->sk; struct rfcomm_dlc *d = rfcomm_pi(sk)->dlc; struct sk_buff *skb; - int sent = 0; + int sent; if (test_bit(RFCOMM_DEFER_SETUP, &d->flags)) return -ENOTCONN; @@ -559,6 +561,10 @@ static int rfcomm_sock_sendmsg(struct kiocb *iocb, struct socket *sock, lock_sock(sk); + sent = bt_sock_wait_ready(sk, msg->msg_flags); + if (sent) + goto done; + while (len) { size_t size = min_t(size_t, len, d->mtu); int err; @@ -594,6 +600,7 @@ static int rfcomm_sock_sendmsg(struct kiocb *iocb, struct socket *sock, len -= size; } +done: release_sock(sk); return sent; @@ -946,8 +953,8 @@ int rfcomm_connect_ind(struct rfcomm_session *s, u8 channel, struct rfcomm_dlc * bt_sock_reclassify_lock(sk, BTPROTO_RFCOMM); rfcomm_sock_init(sk, parent); - bacpy(&bt_sk(sk)->src, &src); - bacpy(&bt_sk(sk)->dst, &dst); + bacpy(&rfcomm_pi(sk)->src, &src); + bacpy(&rfcomm_pi(sk)->dst, &dst); rfcomm_pi(sk)->channel = channel; sk->sk_state = BT_CONFIG; @@ -974,7 +981,7 @@ static int rfcomm_sock_debugfs_show(struct seq_file *f, void *p) sk_for_each(sk, &rfcomm_sk_list.head) { seq_printf(f, "%pMR %pMR %d %d\n", - &bt_sk(sk)->src, &bt_sk(sk)->dst, + &rfcomm_pi(sk)->src, &rfcomm_pi(sk)->dst, sk->sk_state, rfcomm_pi(sk)->channel); } @@ -1044,15 +1051,15 @@ int __init rfcomm_init_sockets(void) goto error; } - if (bt_debugfs) { - rfcomm_sock_debugfs = debugfs_create_file("rfcomm", 0444, - bt_debugfs, NULL, &rfcomm_sock_debugfs_fops); - if (!rfcomm_sock_debugfs) - BT_ERR("Failed to create RFCOMM debug file"); - } - BT_INFO("RFCOMM socket layer initialized"); + if (IS_ERR_OR_NULL(bt_debugfs)) + return 0; + + rfcomm_sock_debugfs = debugfs_create_file("rfcomm", 0444, + bt_debugfs, NULL, + &rfcomm_sock_debugfs_fops); + return 0; error: diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c index 96bd388d93a4..12a0e51e21e1 100644 --- a/net/bluetooth/sco.c +++ b/net/bluetooth/sco.c @@ -92,9 +92,6 @@ static struct sco_conn *sco_conn_add(struct hci_conn *hcon) hcon->sco_data = conn; conn->hcon = hcon; - conn->src = &hdev->bdaddr; - conn->dst = &hcon->dst; - if (hdev->sco_mtu > 0) conn->mtu = hdev->sco_mtu; else @@ -156,16 +153,14 @@ static int sco_chan_add(struct sco_conn *conn, struct sock *sk, static int sco_connect(struct sock *sk) { - bdaddr_t *src = &bt_sk(sk)->src; - bdaddr_t *dst = &bt_sk(sk)->dst; struct sco_conn *conn; struct hci_conn *hcon; struct hci_dev *hdev; int err, type; - BT_DBG("%pMR -> %pMR", src, dst); + BT_DBG("%pMR -> %pMR", &sco_pi(sk)->src, &sco_pi(sk)->dst); - hdev = hci_get_route(dst, src); + hdev = hci_get_route(&sco_pi(sk)->dst, &sco_pi(sk)->src); if (!hdev) return -EHOSTUNREACH; @@ -182,7 +177,8 @@ static int sco_connect(struct sock *sk) goto done; } - hcon = hci_connect_sco(hdev, type, dst, sco_pi(sk)->setting); + hcon = hci_connect_sco(hdev, type, &sco_pi(sk)->dst, + sco_pi(sk)->setting); if (IS_ERR(hcon)) { err = PTR_ERR(hcon); goto done; @@ -196,7 +192,7 @@ static int sco_connect(struct sock *sk) } /* Update source addr of the socket */ - bacpy(src, conn->src); + bacpy(&sco_pi(sk)->src, &hcon->src); err = sco_chan_add(conn, sk, NULL); if (err) @@ -270,7 +266,7 @@ static struct sock *__sco_get_sock_listen_by_addr(bdaddr_t *ba) if (sk->sk_state != BT_LISTEN) continue; - if (!bacmp(&bt_sk(sk)->src, ba)) + if (!bacmp(&sco_pi(sk)->src, ba)) return sk; } @@ -291,11 +287,11 @@ static struct sock *sco_get_sock_listen(bdaddr_t *src) continue; /* Exact match. */ - if (!bacmp(&bt_sk(sk)->src, src)) + if (!bacmp(&sco_pi(sk)->src, src)) break; /* Closest match */ - if (!bacmp(&bt_sk(sk)->src, BDADDR_ANY)) + if (!bacmp(&sco_pi(sk)->src, BDADDR_ANY)) sk1 = sk; } @@ -475,7 +471,7 @@ static int sco_sock_bind(struct socket *sock, struct sockaddr *addr, int addr_le goto done; } - bacpy(&bt_sk(sk)->src, &sa->sco_bdaddr); + bacpy(&sco_pi(sk)->src, &sa->sco_bdaddr); sk->sk_state = BT_BOUND; @@ -505,7 +501,7 @@ static int sco_sock_connect(struct socket *sock, struct sockaddr *addr, int alen lock_sock(sk); /* Set destination address and psm */ - bacpy(&bt_sk(sk)->dst, &sa->sco_bdaddr); + bacpy(&sco_pi(sk)->dst, &sa->sco_bdaddr); err = sco_connect(sk); if (err) @@ -522,7 +518,7 @@ done: static int sco_sock_listen(struct socket *sock, int backlog) { struct sock *sk = sock->sk; - bdaddr_t *src = &bt_sk(sk)->src; + bdaddr_t *src = &sco_pi(sk)->src; int err = 0; BT_DBG("sk %p backlog %d", sk, backlog); @@ -626,9 +622,9 @@ static int sco_sock_getname(struct socket *sock, struct sockaddr *addr, int *len *len = sizeof(struct sockaddr_sco); if (peer) - bacpy(&sa->sco_bdaddr, &bt_sk(sk)->dst); + bacpy(&sa->sco_bdaddr, &sco_pi(sk)->dst); else - bacpy(&sa->sco_bdaddr, &bt_sk(sk)->src); + bacpy(&sa->sco_bdaddr, &sco_pi(sk)->src); return 0; } @@ -999,7 +995,7 @@ static void sco_conn_ready(struct sco_conn *conn) } else { sco_conn_lock(conn); - parent = sco_get_sock_listen(conn->src); + parent = sco_get_sock_listen(&conn->hcon->src); if (!parent) { sco_conn_unlock(conn); return; @@ -1017,8 +1013,8 @@ static void sco_conn_ready(struct sco_conn *conn) sco_sock_init(sk, parent); - bacpy(&bt_sk(sk)->src, conn->src); - bacpy(&bt_sk(sk)->dst, conn->dst); + bacpy(&sco_pi(sk)->src, &conn->hcon->src); + bacpy(&sco_pi(sk)->dst, &conn->hcon->dst); hci_conn_hold(conn->hcon); __sco_chan_add(conn, sk, parent); @@ -1051,8 +1047,8 @@ int sco_connect_ind(struct hci_dev *hdev, bdaddr_t *bdaddr, __u8 *flags) if (sk->sk_state != BT_LISTEN) continue; - if (!bacmp(&bt_sk(sk)->src, &hdev->bdaddr) || - !bacmp(&bt_sk(sk)->src, BDADDR_ANY)) { + if (!bacmp(&sco_pi(sk)->src, &hdev->bdaddr) || + !bacmp(&sco_pi(sk)->src, BDADDR_ANY)) { lm |= HCI_LM_ACCEPT; if (test_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags)) @@ -1111,8 +1107,8 @@ static int sco_debugfs_show(struct seq_file *f, void *p) read_lock(&sco_sk_list.lock); sk_for_each(sk, &sco_sk_list.head) { - seq_printf(f, "%pMR %pMR %d\n", &bt_sk(sk)->src, - &bt_sk(sk)->dst, sk->sk_state); + seq_printf(f, "%pMR %pMR %d\n", &sco_pi(sk)->src, + &sco_pi(sk)->dst, sk->sk_state); } read_unlock(&sco_sk_list.lock); @@ -1181,15 +1177,14 @@ int __init sco_init(void) goto error; } - if (bt_debugfs) { - sco_debugfs = debugfs_create_file("sco", 0444, bt_debugfs, - NULL, &sco_debugfs_fops); - if (!sco_debugfs) - BT_ERR("Failed to create SCO debug file"); - } - BT_INFO("SCO socket layer initialized"); + if (IS_ERR_OR_NULL(bt_debugfs)) + return 0; + + sco_debugfs = debugfs_create_file("sco", 0444, bt_debugfs, + NULL, &sco_debugfs_fops); + return 0; error: diff --git a/net/bluetooth/smp.c b/net/bluetooth/smp.c index b5562abdd6e0..85a2796cac61 100644 --- a/net/bluetooth/smp.c +++ b/net/bluetooth/smp.c @@ -28,7 +28,8 @@ #include <net/bluetooth/hci_core.h> #include <net/bluetooth/l2cap.h> #include <net/bluetooth/mgmt.h> -#include <net/bluetooth/smp.h> + +#include "smp.h" #define SMP_TIMEOUT msecs_to_jiffies(30000) @@ -85,8 +86,8 @@ static int smp_e(struct crypto_blkcipher *tfm, const u8 *k, u8 *r) } static int smp_c1(struct crypto_blkcipher *tfm, u8 k[16], u8 r[16], - u8 preq[7], u8 pres[7], u8 _iat, bdaddr_t *ia, - u8 _rat, bdaddr_t *ra, u8 res[16]) + u8 preq[7], u8 pres[7], u8 _iat, bdaddr_t *ia, + u8 _rat, bdaddr_t *ra, u8 res[16]) { u8 p1[16], p2[16]; int err; @@ -126,8 +127,8 @@ static int smp_c1(struct crypto_blkcipher *tfm, u8 k[16], u8 r[16], return err; } -static int smp_s1(struct crypto_blkcipher *tfm, u8 k[16], - u8 r1[16], u8 r2[16], u8 _r[16]) +static int smp_s1(struct crypto_blkcipher *tfm, u8 k[16], u8 r1[16], + u8 r2[16], u8 _r[16]) { int err; @@ -150,7 +151,7 @@ static int smp_rand(u8 *buf) } static struct sk_buff *smp_build_cmd(struct l2cap_conn *conn, u8 code, - u16 dlen, void *data) + u16 dlen, void *data) { struct sk_buff *skb; struct l2cap_hdr *lh; @@ -213,9 +214,8 @@ static __u8 seclevel_to_authreq(__u8 sec_level) } static void build_pairing_cmd(struct l2cap_conn *conn, - struct smp_cmd_pairing *req, - struct smp_cmd_pairing *rsp, - __u8 authreq) + struct smp_cmd_pairing *req, + struct smp_cmd_pairing *rsp, __u8 authreq) { u8 dist_keys = 0; @@ -249,7 +249,7 @@ static u8 check_enc_key_size(struct l2cap_conn *conn, __u8 max_key_size) struct smp_chan *smp = conn->smp_chan; if ((max_key_size > SMP_MAX_ENC_KEY_SIZE) || - (max_key_size < SMP_MIN_ENC_KEY_SIZE)) + (max_key_size < SMP_MIN_ENC_KEY_SIZE)) return SMP_ENC_KEY_SIZE; smp->enc_key_size = max_key_size; @@ -263,15 +263,15 @@ static void smp_failure(struct l2cap_conn *conn, u8 reason, u8 send) if (send) smp_send_cmd(conn, SMP_CMD_PAIRING_FAIL, sizeof(reason), - &reason); + &reason); - clear_bit(HCI_CONN_ENCRYPT_PEND, &conn->hcon->flags); - mgmt_auth_failed(conn->hcon->hdev, conn->dst, hcon->type, - hcon->dst_type, HCI_ERROR_AUTH_FAILURE); + clear_bit(HCI_CONN_ENCRYPT_PEND, &hcon->flags); + mgmt_auth_failed(hcon->hdev, &hcon->dst, hcon->type, hcon->dst_type, + HCI_ERROR_AUTH_FAILURE); cancel_delayed_work_sync(&conn->security_timer); - if (test_and_clear_bit(HCI_CONN_LE_SMP_PEND, &conn->hcon->flags)) + if (test_and_clear_bit(HCI_CONN_LE_SMP_PEND, &hcon->flags)) smp_chan_destroy(conn); } @@ -309,8 +309,8 @@ static int tk_request(struct l2cap_conn *conn, u8 remote_oob, u8 auth, /* If either side has unknown io_caps, use JUST WORKS */ /* Otherwise, look up method from the table */ if (!(auth & SMP_AUTH_MITM) || - local_io > SMP_IO_KEYBOARD_DISPLAY || - remote_io > SMP_IO_KEYBOARD_DISPLAY) + local_io > SMP_IO_KEYBOARD_DISPLAY || + remote_io > SMP_IO_KEYBOARD_DISPLAY) method = JUST_WORKS; else method = gen_method[remote_io][local_io]; @@ -354,10 +354,10 @@ static int tk_request(struct l2cap_conn *conn, u8 remote_oob, u8 auth, hci_dev_lock(hcon->hdev); if (method == REQ_PASSKEY) - ret = mgmt_user_passkey_request(hcon->hdev, conn->dst, + ret = mgmt_user_passkey_request(hcon->hdev, &hcon->dst, hcon->type, hcon->dst_type); else - ret = mgmt_user_confirm_request(hcon->hdev, conn->dst, + ret = mgmt_user_confirm_request(hcon->hdev, &hcon->dst, hcon->type, hcon->dst_type, cpu_to_le32(passkey), 0); @@ -386,12 +386,13 @@ static void confirm_work(struct work_struct *work) smp->tfm = tfm; if (conn->hcon->out) - ret = smp_c1(tfm, smp->tk, smp->prnd, smp->preq, smp->prsp, 0, - conn->src, conn->hcon->dst_type, conn->dst, res); + ret = smp_c1(tfm, smp->tk, smp->prnd, smp->preq, smp->prsp, + conn->hcon->src_type, &conn->hcon->src, + conn->hcon->dst_type, &conn->hcon->dst, res); else ret = smp_c1(tfm, smp->tk, smp->prnd, smp->preq, smp->prsp, - conn->hcon->dst_type, conn->dst, 0, conn->src, - res); + conn->hcon->dst_type, &conn->hcon->dst, + conn->hcon->src_type, &conn->hcon->src, res); if (ret) { reason = SMP_UNSPECIFIED; goto error; @@ -425,11 +426,13 @@ static void random_work(struct work_struct *work) BT_DBG("conn %p %s", conn, conn->hcon->out ? "master" : "slave"); if (hcon->out) - ret = smp_c1(tfm, smp->tk, smp->rrnd, smp->preq, smp->prsp, 0, - conn->src, hcon->dst_type, conn->dst, res); + ret = smp_c1(tfm, smp->tk, smp->rrnd, smp->preq, smp->prsp, + hcon->src_type, &hcon->src, + hcon->dst_type, &hcon->dst, res); else ret = smp_c1(tfm, smp->tk, smp->rrnd, smp->preq, smp->prsp, - hcon->dst_type, conn->dst, 0, conn->src, res); + hcon->dst_type, &hcon->dst, + hcon->src_type, &hcon->src, res); if (ret) { reason = SMP_UNSPECIFIED; goto error; @@ -477,9 +480,9 @@ static void random_work(struct work_struct *work) swap128(key, stk); memset(stk + smp->enc_key_size, 0, - SMP_MAX_ENC_KEY_SIZE - smp->enc_key_size); + SMP_MAX_ENC_KEY_SIZE - smp->enc_key_size); - hci_add_ltk(hcon->hdev, conn->dst, hcon->dst_type, + hci_add_ltk(hcon->hdev, &hcon->dst, hcon->dst_type, HCI_SMP_STK_SLAVE, 0, 0, stk, smp->enc_key_size, ediv, rand); } @@ -494,7 +497,7 @@ static struct smp_chan *smp_chan_create(struct l2cap_conn *conn) { struct smp_chan *smp; - smp = kzalloc(sizeof(struct smp_chan), GFP_ATOMIC); + smp = kzalloc(sizeof(*smp), GFP_ATOMIC); if (!smp) return NULL; @@ -649,7 +652,7 @@ static u8 smp_cmd_pairing_rsp(struct l2cap_conn *conn, struct sk_buff *skb) memcpy(&smp->prsp[1], rsp, sizeof(*rsp)); if ((req->auth_req & SMP_AUTH_BONDING) && - (rsp->auth_req & SMP_AUTH_BONDING)) + (rsp->auth_req & SMP_AUTH_BONDING)) auth = SMP_AUTH_BONDING; auth |= (req->auth_req | rsp->auth_req) & SMP_AUTH_MITM; @@ -684,7 +687,7 @@ static u8 smp_cmd_pairing_confirm(struct l2cap_conn *conn, struct sk_buff *skb) swap128(smp->prnd, random); smp_send_cmd(conn, SMP_CMD_PAIRING_RANDOM, sizeof(random), - random); + random); } else if (test_bit(SMP_FLAG_TK_VALID, &smp->smp_flags)) { queue_work(hdev->workqueue, &smp->confirm); } else { @@ -714,7 +717,7 @@ static u8 smp_ltk_encrypt(struct l2cap_conn *conn, u8 sec_level) struct smp_ltk *key; struct hci_conn *hcon = conn->hcon; - key = hci_find_ltk_by_addr(hcon->hdev, conn->dst, hcon->dst_type); + key = hci_find_ltk_by_addr(hcon->hdev, &hcon->dst, hcon->dst_type); if (!key) return 0; @@ -728,8 +731,8 @@ static u8 smp_ltk_encrypt(struct l2cap_conn *conn, u8 sec_level) hcon->enc_key_size = key->enc_size; return 1; - } + static u8 smp_cmd_security_req(struct l2cap_conn *conn, struct sk_buff *skb) { struct smp_cmd_security_req *rp = (void *) skb->data; @@ -835,9 +838,9 @@ static int smp_cmd_master_ident(struct l2cap_conn *conn, struct sk_buff *skb) skb_pull(skb, sizeof(*rp)); hci_dev_lock(hdev); - authenticated = (conn->hcon->sec_level == BT_SECURITY_HIGH); - hci_add_ltk(conn->hcon->hdev, conn->dst, hcon->dst_type, - HCI_SMP_LTK, 1, authenticated, smp->tk, smp->enc_key_size, + authenticated = (hcon->sec_level == BT_SECURITY_HIGH); + hci_add_ltk(hdev, &hcon->dst, hcon->dst_type, HCI_SMP_LTK, 1, + authenticated, smp->tk, smp->enc_key_size, rp->ediv, rp->rand); smp_distribute_keys(conn, 1); hci_dev_unlock(hdev); @@ -847,16 +850,27 @@ static int smp_cmd_master_ident(struct l2cap_conn *conn, struct sk_buff *skb) int smp_sig_channel(struct l2cap_conn *conn, struct sk_buff *skb) { - __u8 code = skb->data[0]; - __u8 reason; + struct hci_conn *hcon = conn->hcon; + __u8 code, reason; int err = 0; - if (!test_bit(HCI_LE_ENABLED, &conn->hcon->hdev->dev_flags)) { + if (hcon->type != LE_LINK) { + kfree_skb(skb); + return 0; + } + + if (skb->len < 1) { + kfree_skb(skb); + return -EILSEQ; + } + + if (!test_bit(HCI_LE_ENABLED, &hcon->hdev->dev_flags)) { err = -ENOTSUPP; reason = SMP_PAIRING_NOTSUPP; goto done; } + code = skb->data[0]; skb_pull(skb, sizeof(code)); /* @@ -974,7 +988,7 @@ int smp_distribute_keys(struct l2cap_conn *conn, __u8 force) smp_send_cmd(conn, SMP_CMD_ENCRYPT_INFO, sizeof(enc), &enc); authenticated = hcon->sec_level == BT_SECURITY_HIGH; - hci_add_ltk(conn->hcon->hdev, conn->dst, hcon->dst_type, + hci_add_ltk(hcon->hdev, &hcon->dst, hcon->dst_type, HCI_SMP_LTK_SLAVE, 1, authenticated, enc.ltk, smp->enc_key_size, ediv, ident.rand); @@ -996,10 +1010,10 @@ int smp_distribute_keys(struct l2cap_conn *conn, __u8 force) /* Just public address */ memset(&addrinfo, 0, sizeof(addrinfo)); - bacpy(&addrinfo.bdaddr, conn->src); + bacpy(&addrinfo.bdaddr, &conn->hcon->src); smp_send_cmd(conn, SMP_CMD_IDENT_ADDR_INFO, sizeof(addrinfo), - &addrinfo); + &addrinfo); *keydist &= ~SMP_DIST_ID_KEY; } diff --git a/net/bluetooth/smp.h b/net/bluetooth/smp.h new file mode 100644 index 000000000000..f8ba07f3e5fa --- /dev/null +++ b/net/bluetooth/smp.h @@ -0,0 +1,146 @@ +/* + BlueZ - Bluetooth protocol stack for Linux + Copyright (C) 2011 Nokia Corporation and/or its subsidiary(-ies). + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License version 2 as + published by the Free Software Foundation; + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS + OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. + IN NO EVENT SHALL THE COPYRIGHT HOLDER(S) AND AUTHOR(S) BE LIABLE FOR ANY + CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES + WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN + ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF + OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. + + ALL LIABILITY, INCLUDING LIABILITY FOR INFRINGEMENT OF ANY PATENTS, + COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS, RELATING TO USE OF THIS + SOFTWARE IS DISCLAIMED. +*/ + +#ifndef __SMP_H +#define __SMP_H + +struct smp_command_hdr { + __u8 code; +} __packed; + +#define SMP_CMD_PAIRING_REQ 0x01 +#define SMP_CMD_PAIRING_RSP 0x02 +struct smp_cmd_pairing { + __u8 io_capability; + __u8 oob_flag; + __u8 auth_req; + __u8 max_key_size; + __u8 init_key_dist; + __u8 resp_key_dist; +} __packed; + +#define SMP_IO_DISPLAY_ONLY 0x00 +#define SMP_IO_DISPLAY_YESNO 0x01 +#define SMP_IO_KEYBOARD_ONLY 0x02 +#define SMP_IO_NO_INPUT_OUTPUT 0x03 +#define SMP_IO_KEYBOARD_DISPLAY 0x04 + +#define SMP_OOB_NOT_PRESENT 0x00 +#define SMP_OOB_PRESENT 0x01 + +#define SMP_DIST_ENC_KEY 0x01 +#define SMP_DIST_ID_KEY 0x02 +#define SMP_DIST_SIGN 0x04 + +#define SMP_AUTH_NONE 0x00 +#define SMP_AUTH_BONDING 0x01 +#define SMP_AUTH_MITM 0x04 + +#define SMP_CMD_PAIRING_CONFIRM 0x03 +struct smp_cmd_pairing_confirm { + __u8 confirm_val[16]; +} __packed; + +#define SMP_CMD_PAIRING_RANDOM 0x04 +struct smp_cmd_pairing_random { + __u8 rand_val[16]; +} __packed; + +#define SMP_CMD_PAIRING_FAIL 0x05 +struct smp_cmd_pairing_fail { + __u8 reason; +} __packed; + +#define SMP_CMD_ENCRYPT_INFO 0x06 +struct smp_cmd_encrypt_info { + __u8 ltk[16]; +} __packed; + +#define SMP_CMD_MASTER_IDENT 0x07 +struct smp_cmd_master_ident { + __le16 ediv; + __u8 rand[8]; +} __packed; + +#define SMP_CMD_IDENT_INFO 0x08 +struct smp_cmd_ident_info { + __u8 irk[16]; +} __packed; + +#define SMP_CMD_IDENT_ADDR_INFO 0x09 +struct smp_cmd_ident_addr_info { + __u8 addr_type; + bdaddr_t bdaddr; +} __packed; + +#define SMP_CMD_SIGN_INFO 0x0a +struct smp_cmd_sign_info { + __u8 csrk[16]; +} __packed; + +#define SMP_CMD_SECURITY_REQ 0x0b +struct smp_cmd_security_req { + __u8 auth_req; +} __packed; + +#define SMP_PASSKEY_ENTRY_FAILED 0x01 +#define SMP_OOB_NOT_AVAIL 0x02 +#define SMP_AUTH_REQUIREMENTS 0x03 +#define SMP_CONFIRM_FAILED 0x04 +#define SMP_PAIRING_NOTSUPP 0x05 +#define SMP_ENC_KEY_SIZE 0x06 +#define SMP_CMD_NOTSUPP 0x07 +#define SMP_UNSPECIFIED 0x08 +#define SMP_REPEATED_ATTEMPTS 0x09 + +#define SMP_MIN_ENC_KEY_SIZE 7 +#define SMP_MAX_ENC_KEY_SIZE 16 + +#define SMP_FLAG_TK_VALID 1 +#define SMP_FLAG_CFM_PENDING 2 +#define SMP_FLAG_MITM_AUTH 3 + +struct smp_chan { + struct l2cap_conn *conn; + u8 preq[7]; /* SMP Pairing Request */ + u8 prsp[7]; /* SMP Pairing Response */ + u8 prnd[16]; /* SMP Pairing Random (local) */ + u8 rrnd[16]; /* SMP Pairing Random (remote) */ + u8 pcnf[16]; /* SMP Pairing Confirm */ + u8 tk[16]; /* SMP Temporary Key */ + u8 enc_key_size; + unsigned long smp_flags; + struct crypto_blkcipher *tfm; + struct work_struct confirm; + struct work_struct random; + +}; + +/* SMP Commands */ +int smp_conn_security(struct hci_conn *hcon, __u8 sec_level); +int smp_sig_channel(struct l2cap_conn *conn, struct sk_buff *skb); +int smp_distribute_keys(struct l2cap_conn *conn, __u8 force); +int smp_user_confirm_reply(struct hci_conn *conn, u16 mgmt_op, __le32 passkey); + +void smp_chan_destroy(struct l2cap_conn *conn); + +#endif /* __SMP_H */ diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c index 686284ff3d6a..4c214b2b88ef 100644 --- a/net/bridge/br_multicast.c +++ b/net/bridge/br_multicast.c @@ -363,7 +363,7 @@ static struct sk_buff *br_ip4_multicast_alloc_query(struct net_bridge *br, skb_reset_mac_header(skb); eth = eth_hdr(skb); - memcpy(eth->h_source, br->dev->dev_addr, 6); + memcpy(eth->h_source, br->dev->dev_addr, ETH_ALEN); eth->h_dest[0] = 1; eth->h_dest[1] = 0; eth->h_dest[2] = 0x5e; @@ -433,7 +433,7 @@ static struct sk_buff *br_ip6_multicast_alloc_query(struct net_bridge *br, skb_reset_mac_header(skb); eth = eth_hdr(skb); - memcpy(eth->h_source, br->dev->dev_addr, 6); + memcpy(eth->h_source, br->dev->dev_addr, ETH_ALEN); eth->h_proto = htons(ETH_P_IPV6); skb_put(skb, sizeof(*eth)); diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c index f87736270eaa..80cad2cf02a7 100644 --- a/net/bridge/br_netfilter.c +++ b/net/bridge/br_netfilter.c @@ -559,6 +559,8 @@ static struct net_device *setup_pre_routing(struct sk_buff *skb) else if (skb->protocol == htons(ETH_P_PPP_SES)) nf_bridge->mask |= BRNF_PPPoE; + /* Must drop socket now because of tproxy. */ + skb_orphan(skb); return skb->dev; } @@ -619,7 +621,7 @@ bad: /* Replicate the checks that IPv6 does on packet reception and pass the packet * to ip6tables, which doesn't support NAT, so things are fairly simple. */ -static unsigned int br_nf_pre_routing_ipv6(unsigned int hook, +static unsigned int br_nf_pre_routing_ipv6(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, @@ -669,7 +671,8 @@ static unsigned int br_nf_pre_routing_ipv6(unsigned int hook, * receiving device) to make netfilter happy, the REDIRECT * target in particular. Save the original destination IP * address to be able to detect DNAT afterwards. */ -static unsigned int br_nf_pre_routing(unsigned int hook, struct sk_buff *skb, +static unsigned int br_nf_pre_routing(const struct nf_hook_ops *ops, + struct sk_buff *skb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) @@ -691,7 +694,7 @@ static unsigned int br_nf_pre_routing(unsigned int hook, struct sk_buff *skb, return NF_ACCEPT; nf_bridge_pull_encap_header_rcsum(skb); - return br_nf_pre_routing_ipv6(hook, skb, in, out, okfn); + return br_nf_pre_routing_ipv6(ops, skb, in, out, okfn); } if (!brnf_call_iptables && !br->nf_call_iptables) @@ -727,7 +730,8 @@ static unsigned int br_nf_pre_routing(unsigned int hook, struct sk_buff *skb, * took place when the packet entered the bridge), but we * register an IPv4 PRE_ROUTING 'sabotage' hook that will * prevent this from happening. */ -static unsigned int br_nf_local_in(unsigned int hook, struct sk_buff *skb, +static unsigned int br_nf_local_in(const struct nf_hook_ops *ops, + struct sk_buff *skb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) @@ -765,7 +769,8 @@ static int br_nf_forward_finish(struct sk_buff *skb) * but we are still able to filter on the 'real' indev/outdev * because of the physdev module. For ARP, indev and outdev are the * bridge ports. */ -static unsigned int br_nf_forward_ip(unsigned int hook, struct sk_buff *skb, +static unsigned int br_nf_forward_ip(const struct nf_hook_ops *ops, + struct sk_buff *skb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) @@ -818,7 +823,8 @@ static unsigned int br_nf_forward_ip(unsigned int hook, struct sk_buff *skb, return NF_STOLEN; } -static unsigned int br_nf_forward_arp(unsigned int hook, struct sk_buff *skb, +static unsigned int br_nf_forward_arp(const struct nf_hook_ops *ops, + struct sk_buff *skb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) @@ -878,7 +884,8 @@ static int br_nf_dev_queue_xmit(struct sk_buff *skb) #endif /* PF_BRIDGE/POST_ROUTING ********************************************/ -static unsigned int br_nf_post_routing(unsigned int hook, struct sk_buff *skb, +static unsigned int br_nf_post_routing(const struct nf_hook_ops *ops, + struct sk_buff *skb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) @@ -923,7 +930,8 @@ static unsigned int br_nf_post_routing(unsigned int hook, struct sk_buff *skb, /* IP/SABOTAGE *****************************************************/ /* Don't hand locally destined packets to PF_INET(6)/PRE_ROUTING * for the second time. */ -static unsigned int ip_sabotage_in(unsigned int hook, struct sk_buff *skb, +static unsigned int ip_sabotage_in(const struct nf_hook_ops *ops, + struct sk_buff *skb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h index 2e8244efb262..229d820bdf0b 100644 --- a/net/bridge/br_private.h +++ b/net/bridge/br_private.h @@ -343,10 +343,9 @@ static inline int br_is_root_bridge(const struct net_bridge *br) } /* br_device.c */ -extern void br_dev_setup(struct net_device *dev); -extern void br_dev_delete(struct net_device *dev, struct list_head *list); -extern netdev_tx_t br_dev_xmit(struct sk_buff *skb, - struct net_device *dev); +void br_dev_setup(struct net_device *dev); +void br_dev_delete(struct net_device *dev, struct list_head *list); +netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev); #ifdef CONFIG_NET_POLL_CONTROLLER static inline void br_netpoll_send_skb(const struct net_bridge_port *p, struct sk_buff *skb) @@ -357,8 +356,8 @@ static inline void br_netpoll_send_skb(const struct net_bridge_port *p, netpoll_send_skb(np, skb); } -extern int br_netpoll_enable(struct net_bridge_port *p, gfp_t gfp); -extern void br_netpoll_disable(struct net_bridge_port *p); +int br_netpoll_enable(struct net_bridge_port *p, gfp_t gfp); +void br_netpoll_disable(struct net_bridge_port *p); #else static inline void br_netpoll_send_skb(const struct net_bridge_port *p, struct sk_buff *skb) @@ -376,117 +375,99 @@ static inline void br_netpoll_disable(struct net_bridge_port *p) #endif /* br_fdb.c */ -extern int br_fdb_init(void); -extern void br_fdb_fini(void); -extern void br_fdb_flush(struct net_bridge *br); -extern void br_fdb_changeaddr(struct net_bridge_port *p, - const unsigned char *newaddr); -extern void br_fdb_change_mac_address(struct net_bridge *br, const u8 *newaddr); -extern void br_fdb_cleanup(unsigned long arg); -extern void br_fdb_delete_by_port(struct net_bridge *br, - const struct net_bridge_port *p, int do_all); -extern struct net_bridge_fdb_entry *__br_fdb_get(struct net_bridge *br, - const unsigned char *addr, - __u16 vid); -extern int br_fdb_test_addr(struct net_device *dev, unsigned char *addr); -extern int br_fdb_fillbuf(struct net_bridge *br, void *buf, - unsigned long count, unsigned long off); -extern int br_fdb_insert(struct net_bridge *br, - struct net_bridge_port *source, - const unsigned char *addr, - u16 vid); -extern void br_fdb_update(struct net_bridge *br, - struct net_bridge_port *source, - const unsigned char *addr, - u16 vid); -extern int fdb_delete_by_addr(struct net_bridge *br, const u8 *addr, u16 vid); - -extern int br_fdb_delete(struct ndmsg *ndm, struct nlattr *tb[], - struct net_device *dev, - const unsigned char *addr); -extern int br_fdb_add(struct ndmsg *nlh, struct nlattr *tb[], - struct net_device *dev, - const unsigned char *addr, - u16 nlh_flags); -extern int br_fdb_dump(struct sk_buff *skb, - struct netlink_callback *cb, - struct net_device *dev, - int idx); +int br_fdb_init(void); +void br_fdb_fini(void); +void br_fdb_flush(struct net_bridge *br); +void br_fdb_changeaddr(struct net_bridge_port *p, const unsigned char *newaddr); +void br_fdb_change_mac_address(struct net_bridge *br, const u8 *newaddr); +void br_fdb_cleanup(unsigned long arg); +void br_fdb_delete_by_port(struct net_bridge *br, + const struct net_bridge_port *p, int do_all); +struct net_bridge_fdb_entry *__br_fdb_get(struct net_bridge *br, + const unsigned char *addr, __u16 vid); +int br_fdb_test_addr(struct net_device *dev, unsigned char *addr); +int br_fdb_fillbuf(struct net_bridge *br, void *buf, unsigned long count, + unsigned long off); +int br_fdb_insert(struct net_bridge *br, struct net_bridge_port *source, + const unsigned char *addr, u16 vid); +void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source, + const unsigned char *addr, u16 vid); +int fdb_delete_by_addr(struct net_bridge *br, const u8 *addr, u16 vid); + +int br_fdb_delete(struct ndmsg *ndm, struct nlattr *tb[], + struct net_device *dev, const unsigned char *addr); +int br_fdb_add(struct ndmsg *nlh, struct nlattr *tb[], struct net_device *dev, + const unsigned char *addr, u16 nlh_flags); +int br_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb, + struct net_device *dev, int idx); /* br_forward.c */ -extern void br_deliver(const struct net_bridge_port *to, - struct sk_buff *skb); -extern int br_dev_queue_push_xmit(struct sk_buff *skb); -extern void br_forward(const struct net_bridge_port *to, +void br_deliver(const struct net_bridge_port *to, struct sk_buff *skb); +int br_dev_queue_push_xmit(struct sk_buff *skb); +void br_forward(const struct net_bridge_port *to, struct sk_buff *skb, struct sk_buff *skb0); -extern int br_forward_finish(struct sk_buff *skb); -extern void br_flood_deliver(struct net_bridge *br, struct sk_buff *skb, - bool unicast); -extern void br_flood_forward(struct net_bridge *br, struct sk_buff *skb, - struct sk_buff *skb2, bool unicast); +int br_forward_finish(struct sk_buff *skb); +void br_flood_deliver(struct net_bridge *br, struct sk_buff *skb, bool unicast); +void br_flood_forward(struct net_bridge *br, struct sk_buff *skb, + struct sk_buff *skb2, bool unicast); /* br_if.c */ -extern void br_port_carrier_check(struct net_bridge_port *p); -extern int br_add_bridge(struct net *net, const char *name); -extern int br_del_bridge(struct net *net, const char *name); -extern void br_net_exit(struct net *net); -extern int br_add_if(struct net_bridge *br, - struct net_device *dev); -extern int br_del_if(struct net_bridge *br, - struct net_device *dev); -extern int br_min_mtu(const struct net_bridge *br); -extern netdev_features_t br_features_recompute(struct net_bridge *br, - netdev_features_t features); +void br_port_carrier_check(struct net_bridge_port *p); +int br_add_bridge(struct net *net, const char *name); +int br_del_bridge(struct net *net, const char *name); +void br_net_exit(struct net *net); +int br_add_if(struct net_bridge *br, struct net_device *dev); +int br_del_if(struct net_bridge *br, struct net_device *dev); +int br_min_mtu(const struct net_bridge *br); +netdev_features_t br_features_recompute(struct net_bridge *br, + netdev_features_t features); /* br_input.c */ -extern int br_handle_frame_finish(struct sk_buff *skb); -extern rx_handler_result_t br_handle_frame(struct sk_buff **pskb); +int br_handle_frame_finish(struct sk_buff *skb); +rx_handler_result_t br_handle_frame(struct sk_buff **pskb); /* br_ioctl.c */ -extern int br_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd); -extern int br_ioctl_deviceless_stub(struct net *net, unsigned int cmd, void __user *arg); +int br_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd); +int br_ioctl_deviceless_stub(struct net *net, unsigned int cmd, + void __user *arg); /* br_multicast.c */ #ifdef CONFIG_BRIDGE_IGMP_SNOOPING extern unsigned int br_mdb_rehash_seq; -extern int br_multicast_rcv(struct net_bridge *br, - struct net_bridge_port *port, - struct sk_buff *skb, - u16 vid); -extern struct net_bridge_mdb_entry *br_mdb_get(struct net_bridge *br, - struct sk_buff *skb, u16 vid); -extern void br_multicast_add_port(struct net_bridge_port *port); -extern void br_multicast_del_port(struct net_bridge_port *port); -extern void br_multicast_enable_port(struct net_bridge_port *port); -extern void br_multicast_disable_port(struct net_bridge_port *port); -extern void br_multicast_init(struct net_bridge *br); -extern void br_multicast_open(struct net_bridge *br); -extern void br_multicast_stop(struct net_bridge *br); -extern void br_multicast_deliver(struct net_bridge_mdb_entry *mdst, - struct sk_buff *skb); -extern void br_multicast_forward(struct net_bridge_mdb_entry *mdst, - struct sk_buff *skb, struct sk_buff *skb2); -extern int br_multicast_set_router(struct net_bridge *br, unsigned long val); -extern int br_multicast_set_port_router(struct net_bridge_port *p, - unsigned long val); -extern int br_multicast_toggle(struct net_bridge *br, unsigned long val); -extern int br_multicast_set_querier(struct net_bridge *br, unsigned long val); -extern int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val); -extern struct net_bridge_mdb_entry *br_mdb_ip_get( - struct net_bridge_mdb_htable *mdb, - struct br_ip *dst); -extern struct net_bridge_mdb_entry *br_multicast_new_group(struct net_bridge *br, - struct net_bridge_port *port, struct br_ip *group); -extern void br_multicast_free_pg(struct rcu_head *head); -extern struct net_bridge_port_group *br_multicast_new_port_group( - struct net_bridge_port *port, - struct br_ip *group, - struct net_bridge_port_group __rcu *next, - unsigned char state); -extern void br_mdb_init(void); -extern void br_mdb_uninit(void); -extern void br_mdb_notify(struct net_device *dev, struct net_bridge_port *port, - struct br_ip *group, int type); +int br_multicast_rcv(struct net_bridge *br, struct net_bridge_port *port, + struct sk_buff *skb, u16 vid); +struct net_bridge_mdb_entry *br_mdb_get(struct net_bridge *br, + struct sk_buff *skb, u16 vid); +void br_multicast_add_port(struct net_bridge_port *port); +void br_multicast_del_port(struct net_bridge_port *port); +void br_multicast_enable_port(struct net_bridge_port *port); +void br_multicast_disable_port(struct net_bridge_port *port); +void br_multicast_init(struct net_bridge *br); +void br_multicast_open(struct net_bridge *br); +void br_multicast_stop(struct net_bridge *br); +void br_multicast_deliver(struct net_bridge_mdb_entry *mdst, + struct sk_buff *skb); +void br_multicast_forward(struct net_bridge_mdb_entry *mdst, + struct sk_buff *skb, struct sk_buff *skb2); +int br_multicast_set_router(struct net_bridge *br, unsigned long val); +int br_multicast_set_port_router(struct net_bridge_port *p, unsigned long val); +int br_multicast_toggle(struct net_bridge *br, unsigned long val); +int br_multicast_set_querier(struct net_bridge *br, unsigned long val); +int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val); +struct net_bridge_mdb_entry * +br_mdb_ip_get(struct net_bridge_mdb_htable *mdb, struct br_ip *dst); +struct net_bridge_mdb_entry * +br_multicast_new_group(struct net_bridge *br, struct net_bridge_port *port, + struct br_ip *group); +void br_multicast_free_pg(struct rcu_head *head); +struct net_bridge_port_group * +br_multicast_new_port_group(struct net_bridge_port *port, struct br_ip *group, + struct net_bridge_port_group __rcu *next, + unsigned char state); +void br_mdb_init(void); +void br_mdb_uninit(void); +void br_mdb_notify(struct net_device *dev, struct net_bridge_port *port, + struct br_ip *group, int type); #define mlock_dereference(X, br) \ rcu_dereference_protected(X, lockdep_is_held(&br->multicast_lock)) @@ -592,22 +573,21 @@ static inline void br_mdb_uninit(void) /* br_vlan.c */ #ifdef CONFIG_BRIDGE_VLAN_FILTERING -extern bool br_allowed_ingress(struct net_bridge *br, struct net_port_vlans *v, - struct sk_buff *skb, u16 *vid); -extern bool br_allowed_egress(struct net_bridge *br, - const struct net_port_vlans *v, - const struct sk_buff *skb); -extern struct sk_buff *br_handle_vlan(struct net_bridge *br, - const struct net_port_vlans *v, - struct sk_buff *skb); -extern int br_vlan_add(struct net_bridge *br, u16 vid, u16 flags); -extern int br_vlan_delete(struct net_bridge *br, u16 vid); -extern void br_vlan_flush(struct net_bridge *br); -extern int br_vlan_filter_toggle(struct net_bridge *br, unsigned long val); -extern int nbp_vlan_add(struct net_bridge_port *port, u16 vid, u16 flags); -extern int nbp_vlan_delete(struct net_bridge_port *port, u16 vid); -extern void nbp_vlan_flush(struct net_bridge_port *port); -extern bool nbp_vlan_find(struct net_bridge_port *port, u16 vid); +bool br_allowed_ingress(struct net_bridge *br, struct net_port_vlans *v, + struct sk_buff *skb, u16 *vid); +bool br_allowed_egress(struct net_bridge *br, const struct net_port_vlans *v, + const struct sk_buff *skb); +struct sk_buff *br_handle_vlan(struct net_bridge *br, + const struct net_port_vlans *v, + struct sk_buff *skb); +int br_vlan_add(struct net_bridge *br, u16 vid, u16 flags); +int br_vlan_delete(struct net_bridge *br, u16 vid); +void br_vlan_flush(struct net_bridge *br); +int br_vlan_filter_toggle(struct net_bridge *br, unsigned long val); +int nbp_vlan_add(struct net_bridge_port *port, u16 vid, u16 flags); +int nbp_vlan_delete(struct net_bridge_port *port, u16 vid); +void nbp_vlan_flush(struct net_bridge_port *port); +bool nbp_vlan_find(struct net_bridge_port *port, u16 vid); static inline struct net_port_vlans *br_get_vlan_info( const struct net_bridge *br) @@ -726,9 +706,9 @@ static inline u16 br_get_pvid(const struct net_port_vlans *v) /* br_netfilter.c */ #ifdef CONFIG_BRIDGE_NETFILTER -extern int br_netfilter_init(void); -extern void br_netfilter_fini(void); -extern void br_netfilter_rtable_init(struct net_bridge *); +int br_netfilter_init(void); +void br_netfilter_fini(void); +void br_netfilter_rtable_init(struct net_bridge *); #else #define br_netfilter_init() (0) #define br_netfilter_fini() do { } while(0) @@ -736,43 +716,39 @@ extern void br_netfilter_rtable_init(struct net_bridge *); #endif /* br_stp.c */ -extern void br_log_state(const struct net_bridge_port *p); -extern struct net_bridge_port *br_get_port(struct net_bridge *br, - u16 port_no); -extern void br_init_port(struct net_bridge_port *p); -extern void br_become_designated_port(struct net_bridge_port *p); +void br_log_state(const struct net_bridge_port *p); +struct net_bridge_port *br_get_port(struct net_bridge *br, u16 port_no); +void br_init_port(struct net_bridge_port *p); +void br_become_designated_port(struct net_bridge_port *p); -extern void __br_set_forward_delay(struct net_bridge *br, unsigned long t); -extern int br_set_forward_delay(struct net_bridge *br, unsigned long x); -extern int br_set_hello_time(struct net_bridge *br, unsigned long x); -extern int br_set_max_age(struct net_bridge *br, unsigned long x); +void __br_set_forward_delay(struct net_bridge *br, unsigned long t); +int br_set_forward_delay(struct net_bridge *br, unsigned long x); +int br_set_hello_time(struct net_bridge *br, unsigned long x); +int br_set_max_age(struct net_bridge *br, unsigned long x); /* br_stp_if.c */ -extern void br_stp_enable_bridge(struct net_bridge *br); -extern void br_stp_disable_bridge(struct net_bridge *br); -extern void br_stp_set_enabled(struct net_bridge *br, unsigned long val); -extern void br_stp_enable_port(struct net_bridge_port *p); -extern void br_stp_disable_port(struct net_bridge_port *p); -extern bool br_stp_recalculate_bridge_id(struct net_bridge *br); -extern void br_stp_change_bridge_id(struct net_bridge *br, const unsigned char *a); -extern void br_stp_set_bridge_priority(struct net_bridge *br, - u16 newprio); -extern int br_stp_set_port_priority(struct net_bridge_port *p, - unsigned long newprio); -extern int br_stp_set_path_cost(struct net_bridge_port *p, - unsigned long path_cost); -extern ssize_t br_show_bridge_id(char *buf, const struct bridge_id *id); +void br_stp_enable_bridge(struct net_bridge *br); +void br_stp_disable_bridge(struct net_bridge *br); +void br_stp_set_enabled(struct net_bridge *br, unsigned long val); +void br_stp_enable_port(struct net_bridge_port *p); +void br_stp_disable_port(struct net_bridge_port *p); +bool br_stp_recalculate_bridge_id(struct net_bridge *br); +void br_stp_change_bridge_id(struct net_bridge *br, const unsigned char *a); +void br_stp_set_bridge_priority(struct net_bridge *br, u16 newprio); +int br_stp_set_port_priority(struct net_bridge_port *p, unsigned long newprio); +int br_stp_set_path_cost(struct net_bridge_port *p, unsigned long path_cost); +ssize_t br_show_bridge_id(char *buf, const struct bridge_id *id); /* br_stp_bpdu.c */ struct stp_proto; -extern void br_stp_rcv(const struct stp_proto *proto, struct sk_buff *skb, - struct net_device *dev); +void br_stp_rcv(const struct stp_proto *proto, struct sk_buff *skb, + struct net_device *dev); /* br_stp_timer.c */ -extern void br_stp_timer_init(struct net_bridge *br); -extern void br_stp_port_timer_init(struct net_bridge_port *p); -extern unsigned long br_timer_value(const struct timer_list *timer); +void br_stp_timer_init(struct net_bridge *br); +void br_stp_port_timer_init(struct net_bridge_port *p); +unsigned long br_timer_value(const struct timer_list *timer); /* br.c */ #if IS_ENABLED(CONFIG_ATM_LANE) @@ -781,23 +757,23 @@ extern int (*br_fdb_test_addr_hook)(struct net_device *dev, unsigned char *addr) /* br_netlink.c */ extern struct rtnl_link_ops br_link_ops; -extern int br_netlink_init(void); -extern void br_netlink_fini(void); -extern void br_ifinfo_notify(int event, struct net_bridge_port *port); -extern int br_setlink(struct net_device *dev, struct nlmsghdr *nlmsg); -extern int br_dellink(struct net_device *dev, struct nlmsghdr *nlmsg); -extern int br_getlink(struct sk_buff *skb, u32 pid, u32 seq, - struct net_device *dev, u32 filter_mask); +int br_netlink_init(void); +void br_netlink_fini(void); +void br_ifinfo_notify(int event, struct net_bridge_port *port); +int br_setlink(struct net_device *dev, struct nlmsghdr *nlmsg); +int br_dellink(struct net_device *dev, struct nlmsghdr *nlmsg); +int br_getlink(struct sk_buff *skb, u32 pid, u32 seq, struct net_device *dev, + u32 filter_mask); #ifdef CONFIG_SYSFS /* br_sysfs_if.c */ extern const struct sysfs_ops brport_sysfs_ops; -extern int br_sysfs_addif(struct net_bridge_port *p); -extern int br_sysfs_renameif(struct net_bridge_port *p); +int br_sysfs_addif(struct net_bridge_port *p); +int br_sysfs_renameif(struct net_bridge_port *p); /* br_sysfs_br.c */ -extern int br_sysfs_addbr(struct net_device *dev); -extern void br_sysfs_delbr(struct net_device *dev); +int br_sysfs_addbr(struct net_device *dev); +void br_sysfs_delbr(struct net_device *dev); #else diff --git a/net/bridge/br_private_stp.h b/net/bridge/br_private_stp.h index 0c0fe36e7aa9..2fe910c4e170 100644 --- a/net/bridge/br_private_stp.h +++ b/net/bridge/br_private_stp.h @@ -51,19 +51,19 @@ static inline int br_is_designated_port(const struct net_bridge_port *p) /* br_stp.c */ -extern void br_become_root_bridge(struct net_bridge *br); -extern void br_config_bpdu_generation(struct net_bridge *); -extern void br_configuration_update(struct net_bridge *); -extern void br_port_state_selection(struct net_bridge *); -extern void br_received_config_bpdu(struct net_bridge_port *p, - const struct br_config_bpdu *bpdu); -extern void br_received_tcn_bpdu(struct net_bridge_port *p); -extern void br_transmit_config(struct net_bridge_port *p); -extern void br_transmit_tcn(struct net_bridge *br); -extern void br_topology_change_detection(struct net_bridge *br); +void br_become_root_bridge(struct net_bridge *br); +void br_config_bpdu_generation(struct net_bridge *); +void br_configuration_update(struct net_bridge *); +void br_port_state_selection(struct net_bridge *); +void br_received_config_bpdu(struct net_bridge_port *p, + const struct br_config_bpdu *bpdu); +void br_received_tcn_bpdu(struct net_bridge_port *p); +void br_transmit_config(struct net_bridge_port *p); +void br_transmit_tcn(struct net_bridge *br); +void br_topology_change_detection(struct net_bridge *br); /* br_stp_bpdu.c */ -extern void br_send_config_bpdu(struct net_bridge_port *, struct br_config_bpdu *); -extern void br_send_tcn_bpdu(struct net_bridge_port *); +void br_send_config_bpdu(struct net_bridge_port *, struct br_config_bpdu *); +void br_send_tcn_bpdu(struct net_bridge_port *); #endif diff --git a/net/bridge/netfilter/Kconfig b/net/bridge/netfilter/Kconfig index a9aff9c7d027..5ca74a0e595f 100644 --- a/net/bridge/netfilter/Kconfig +++ b/net/bridge/netfilter/Kconfig @@ -1,6 +1,10 @@ # # Bridge netfilter configuration # +# +config NF_TABLES_BRIDGE + depends on NF_TABLES + tristate "Ethernet Bridge nf_tables support" menuconfig BRIDGE_NF_EBTABLES tristate "Ethernet Bridge tables (ebtables) support" diff --git a/net/bridge/netfilter/Makefile b/net/bridge/netfilter/Makefile index 0718699540b0..ea7629f58b3d 100644 --- a/net/bridge/netfilter/Makefile +++ b/net/bridge/netfilter/Makefile @@ -2,6 +2,8 @@ # Makefile for the netfilter modules for Link Layer filtering on a bridge. # +obj-$(CONFIG_NF_TABLES_BRIDGE) += nf_tables_bridge.o + obj-$(CONFIG_BRIDGE_NF_EBTABLES) += ebtables.o # tables diff --git a/net/bridge/netfilter/ebt_among.c b/net/bridge/netfilter/ebt_among.c index 8b84c581be30..3fb3c848affe 100644 --- a/net/bridge/netfilter/ebt_among.c +++ b/net/bridge/netfilter/ebt_among.c @@ -28,7 +28,7 @@ static bool ebt_mac_wormhash_contains(const struct ebt_mac_wormhash *wh, uint32_t cmp[2] = { 0, 0 }; int key = ((const unsigned char *)mac)[5]; - memcpy(((char *) cmp) + 2, mac, 6); + memcpy(((char *) cmp) + 2, mac, ETH_ALEN); start = wh->table[key]; limit = wh->table[key + 1]; if (ip) { diff --git a/net/bridge/netfilter/ebtable_filter.c b/net/bridge/netfilter/ebtable_filter.c index 94b2b700cff8..bb2da7b706e7 100644 --- a/net/bridge/netfilter/ebtable_filter.c +++ b/net/bridge/netfilter/ebtable_filter.c @@ -60,17 +60,21 @@ static const struct ebt_table frame_filter = }; static unsigned int -ebt_in_hook(unsigned int hook, struct sk_buff *skb, const struct net_device *in, - const struct net_device *out, int (*okfn)(struct sk_buff *)) +ebt_in_hook(const struct nf_hook_ops *ops, struct sk_buff *skb, + const struct net_device *in, const struct net_device *out, + int (*okfn)(struct sk_buff *)) { - return ebt_do_table(hook, skb, in, out, dev_net(in)->xt.frame_filter); + return ebt_do_table(ops->hooknum, skb, in, out, + dev_net(in)->xt.frame_filter); } static unsigned int -ebt_out_hook(unsigned int hook, struct sk_buff *skb, const struct net_device *in, - const struct net_device *out, int (*okfn)(struct sk_buff *)) +ebt_out_hook(const struct nf_hook_ops *ops, struct sk_buff *skb, + const struct net_device *in, const struct net_device *out, + int (*okfn)(struct sk_buff *)) { - return ebt_do_table(hook, skb, in, out, dev_net(out)->xt.frame_filter); + return ebt_do_table(ops->hooknum, skb, in, out, + dev_net(out)->xt.frame_filter); } static struct nf_hook_ops ebt_ops_filter[] __read_mostly = { diff --git a/net/bridge/netfilter/ebtable_nat.c b/net/bridge/netfilter/ebtable_nat.c index 322555acdd40..bd238f1f105b 100644 --- a/net/bridge/netfilter/ebtable_nat.c +++ b/net/bridge/netfilter/ebtable_nat.c @@ -60,17 +60,21 @@ static struct ebt_table frame_nat = }; static unsigned int -ebt_nat_in(unsigned int hook, struct sk_buff *skb, const struct net_device *in - , const struct net_device *out, int (*okfn)(struct sk_buff *)) +ebt_nat_in(const struct nf_hook_ops *ops, struct sk_buff *skb, + const struct net_device *in, const struct net_device *out, + int (*okfn)(struct sk_buff *)) { - return ebt_do_table(hook, skb, in, out, dev_net(in)->xt.frame_nat); + return ebt_do_table(ops->hooknum, skb, in, out, + dev_net(in)->xt.frame_nat); } static unsigned int -ebt_nat_out(unsigned int hook, struct sk_buff *skb, const struct net_device *in - , const struct net_device *out, int (*okfn)(struct sk_buff *)) +ebt_nat_out(const struct nf_hook_ops *ops, struct sk_buff *skb, + const struct net_device *in, const struct net_device *out, + int (*okfn)(struct sk_buff *)) { - return ebt_do_table(hook, skb, in, out, dev_net(out)->xt.frame_nat); + return ebt_do_table(ops->hooknum, skb, in, out, + dev_net(out)->xt.frame_nat); } static struct nf_hook_ops ebt_ops_nat[] __read_mostly = { diff --git a/net/bridge/netfilter/nf_tables_bridge.c b/net/bridge/netfilter/nf_tables_bridge.c new file mode 100644 index 000000000000..cf54b22818c8 --- /dev/null +++ b/net/bridge/netfilter/nf_tables_bridge.c @@ -0,0 +1,102 @@ +/* + * Copyright (c) 2008 Patrick McHardy <kaber@trash.net> + * Copyright (c) 2013 Pablo Neira Ayuso <pablo@netfilter.org> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Development of this code funded by Astaro AG (http://www.astaro.com/) + */ + +#include <linux/init.h> +#include <linux/module.h> +#include <linux/netfilter_bridge.h> +#include <net/netfilter/nf_tables.h> + +static struct nft_af_info nft_af_bridge __read_mostly = { + .family = NFPROTO_BRIDGE, + .nhooks = NF_BR_NUMHOOKS, + .owner = THIS_MODULE, +}; + +static int nf_tables_bridge_init_net(struct net *net) +{ + net->nft.bridge = kmalloc(sizeof(struct nft_af_info), GFP_KERNEL); + if (net->nft.bridge == NULL) + return -ENOMEM; + + memcpy(net->nft.bridge, &nft_af_bridge, sizeof(nft_af_bridge)); + + if (nft_register_afinfo(net, net->nft.bridge) < 0) + goto err; + + return 0; +err: + kfree(net->nft.bridge); + return -ENOMEM; +} + +static void nf_tables_bridge_exit_net(struct net *net) +{ + nft_unregister_afinfo(net->nft.bridge); + kfree(net->nft.bridge); +} + +static struct pernet_operations nf_tables_bridge_net_ops = { + .init = nf_tables_bridge_init_net, + .exit = nf_tables_bridge_exit_net, +}; + +static unsigned int +nft_do_chain_bridge(const struct nf_hook_ops *ops, + struct sk_buff *skb, + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) +{ + struct nft_pktinfo pkt; + + nft_set_pktinfo(&pkt, ops, skb, in, out); + + return nft_do_chain_pktinfo(&pkt, ops); +} + +static struct nf_chain_type filter_bridge = { + .family = NFPROTO_BRIDGE, + .name = "filter", + .type = NFT_CHAIN_T_DEFAULT, + .hook_mask = (1 << NF_BR_LOCAL_IN) | + (1 << NF_BR_FORWARD) | + (1 << NF_BR_LOCAL_OUT), + .fn = { + [NF_BR_LOCAL_IN] = nft_do_chain_bridge, + [NF_BR_FORWARD] = nft_do_chain_bridge, + [NF_BR_LOCAL_OUT] = nft_do_chain_bridge, + }, +}; + +static int __init nf_tables_bridge_init(void) +{ + int ret; + + nft_register_chain_type(&filter_bridge); + ret = register_pernet_subsys(&nf_tables_bridge_net_ops); + if (ret < 0) + nft_unregister_chain_type(&filter_bridge); + + return ret; +} + +static void __exit nf_tables_bridge_exit(void) +{ + unregister_pernet_subsys(&nf_tables_bridge_net_ops); + nft_unregister_chain_type(&filter_bridge); +} + +module_init(nf_tables_bridge_init); +module_exit(nf_tables_bridge_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>"); +MODULE_ALIAS_NFT_FAMILY(AF_BRIDGE); diff --git a/net/caif/cfpkt_skbuff.c b/net/caif/cfpkt_skbuff.c index 6493351f39c6..1be0b521ac49 100644 --- a/net/caif/cfpkt_skbuff.c +++ b/net/caif/cfpkt_skbuff.c @@ -203,20 +203,10 @@ int cfpkt_add_body(struct cfpkt *pkt, const void *data, u16 len) PKT_ERROR(pkt, "cow failed\n"); return -EPROTO; } - /* - * Is the SKB non-linear after skb_cow_data()? If so, we are - * going to add data to the last SKB, so we need to adjust - * lengths of the top SKB. - */ - if (lastskb != skb) { - pr_warn("Packet is non-linear\n"); - skb->len += len; - skb->data_len += len; - } } /* All set to put the last SKB and optionally write data there. */ - to = skb_put(lastskb, len); + to = pskb_put(skb, lastskb, len); if (likely(data)) memcpy(to, data, len); return 0; diff --git a/net/can/af_can.h b/net/can/af_can.h index 1dccb4c33894..6de58b40535c 100644 --- a/net/can/af_can.h +++ b/net/can/af_can.h @@ -108,9 +108,9 @@ struct s_pstats { extern struct dev_rcv_lists can_rx_alldev_list; /* function prototypes for the CAN networklayer procfs (proc.c) */ -extern void can_init_proc(void); -extern void can_remove_proc(void); -extern void can_stat_update(unsigned long data); +void can_init_proc(void); +void can_remove_proc(void); +void can_stat_update(unsigned long data); /* structures and variables from af_can.c needed in proc.c for reading */ extern struct timer_list can_stattimer; /* timer for statistics update */ diff --git a/net/ceph/auth_none.h b/net/ceph/auth_none.h index ed7d088b1bc9..059a3ce4b53f 100644 --- a/net/ceph/auth_none.h +++ b/net/ceph/auth_none.h @@ -23,7 +23,7 @@ struct ceph_auth_none_info { struct ceph_none_authorizer au; /* we only need one; it's static */ }; -extern int ceph_auth_none_init(struct ceph_auth_client *ac); +int ceph_auth_none_init(struct ceph_auth_client *ac); #endif diff --git a/net/ceph/auth_x.h b/net/ceph/auth_x.h index c5a058da7ac8..65ee72082d99 100644 --- a/net/ceph/auth_x.h +++ b/net/ceph/auth_x.h @@ -45,7 +45,7 @@ struct ceph_x_info { struct ceph_x_authorizer auth_authorizer; }; -extern int ceph_x_init(struct ceph_auth_client *ac); +int ceph_x_init(struct ceph_auth_client *ac); #endif diff --git a/net/ceph/crypto.h b/net/ceph/crypto.h index 3572dc518bc9..d1498224c49d 100644 --- a/net/ceph/crypto.h +++ b/net/ceph/crypto.h @@ -20,34 +20,32 @@ static inline void ceph_crypto_key_destroy(struct ceph_crypto_key *key) kfree(key->key); } -extern int ceph_crypto_key_clone(struct ceph_crypto_key *dst, - const struct ceph_crypto_key *src); -extern int ceph_crypto_key_encode(struct ceph_crypto_key *key, - void **p, void *end); -extern int ceph_crypto_key_decode(struct ceph_crypto_key *key, - void **p, void *end); -extern int ceph_crypto_key_unarmor(struct ceph_crypto_key *key, const char *in); +int ceph_crypto_key_clone(struct ceph_crypto_key *dst, + const struct ceph_crypto_key *src); +int ceph_crypto_key_encode(struct ceph_crypto_key *key, void **p, void *end); +int ceph_crypto_key_decode(struct ceph_crypto_key *key, void **p, void *end); +int ceph_crypto_key_unarmor(struct ceph_crypto_key *key, const char *in); /* crypto.c */ -extern int ceph_decrypt(struct ceph_crypto_key *secret, - void *dst, size_t *dst_len, - const void *src, size_t src_len); -extern int ceph_encrypt(struct ceph_crypto_key *secret, - void *dst, size_t *dst_len, - const void *src, size_t src_len); -extern int ceph_decrypt2(struct ceph_crypto_key *secret, - void *dst1, size_t *dst1_len, - void *dst2, size_t *dst2_len, - const void *src, size_t src_len); -extern int ceph_encrypt2(struct ceph_crypto_key *secret, - void *dst, size_t *dst_len, - const void *src1, size_t src1_len, - const void *src2, size_t src2_len); -extern int ceph_crypto_init(void); -extern void ceph_crypto_shutdown(void); +int ceph_decrypt(struct ceph_crypto_key *secret, + void *dst, size_t *dst_len, + const void *src, size_t src_len); +int ceph_encrypt(struct ceph_crypto_key *secret, + void *dst, size_t *dst_len, + const void *src, size_t src_len); +int ceph_decrypt2(struct ceph_crypto_key *secret, + void *dst1, size_t *dst1_len, + void *dst2, size_t *dst2_len, + const void *src, size_t src_len); +int ceph_encrypt2(struct ceph_crypto_key *secret, + void *dst, size_t *dst_len, + const void *src1, size_t src1_len, + const void *src2, size_t src2_len); +int ceph_crypto_init(void); +void ceph_crypto_shutdown(void); /* armor.c */ -extern int ceph_armor(char *dst, const char *src, const char *end); -extern int ceph_unarmor(char *dst, const char *src, const char *end); +int ceph_armor(char *dst, const char *src, const char *end); +int ceph_unarmor(char *dst, const char *src, const char *end); #endif diff --git a/net/core/datagram.c b/net/core/datagram.c index af814e764206..a16ed7bbe376 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -577,7 +577,7 @@ EXPORT_SYMBOL(skb_copy_datagram_from_iovec); /** * zerocopy_sg_from_iovec - Build a zerocopy datagram from an iovec * @skb: buffer to copy - * @from: io vector to copy to + * @from: io vector to copy from * @offset: offset in the io vector to start copying from * @count: amount of vectors to copy to buffer from * diff --git a/net/core/dev.c b/net/core/dev.c index 3430b1ed12e5..8ffc52e01ece 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1203,7 +1203,7 @@ void netdev_state_change(struct net_device *dev) { if (dev->flags & IFF_UP) { call_netdevice_notifiers(NETDEV_CHANGE, dev); - rtmsg_ifinfo(RTM_NEWLINK, dev, 0); + rtmsg_ifinfo(RTM_NEWLINK, dev, 0, GFP_KERNEL); } } EXPORT_SYMBOL(netdev_state_change); @@ -1293,7 +1293,7 @@ int dev_open(struct net_device *dev) if (ret < 0) return ret; - rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING); + rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING, GFP_KERNEL); call_netdevice_notifiers(NETDEV_UP, dev); return ret; @@ -1307,7 +1307,7 @@ static int __dev_close_many(struct list_head *head) ASSERT_RTNL(); might_sleep(); - list_for_each_entry(dev, head, unreg_list) { + list_for_each_entry(dev, head, close_list) { call_netdevice_notifiers(NETDEV_GOING_DOWN, dev); clear_bit(__LINK_STATE_START, &dev->state); @@ -1323,7 +1323,7 @@ static int __dev_close_many(struct list_head *head) dev_deactivate_many(head); - list_for_each_entry(dev, head, unreg_list) { + list_for_each_entry(dev, head, close_list) { const struct net_device_ops *ops = dev->netdev_ops; /* @@ -1351,7 +1351,7 @@ static int __dev_close(struct net_device *dev) /* Temporarily disable netpoll until the interface is down */ netpoll_rx_disable(dev); - list_add(&dev->unreg_list, &single); + list_add(&dev->close_list, &single); retval = __dev_close_many(&single); list_del(&single); @@ -1362,21 +1362,20 @@ static int __dev_close(struct net_device *dev) static int dev_close_many(struct list_head *head) { struct net_device *dev, *tmp; - LIST_HEAD(tmp_list); - list_for_each_entry_safe(dev, tmp, head, unreg_list) + /* Remove the devices that don't need to be closed */ + list_for_each_entry_safe(dev, tmp, head, close_list) if (!(dev->flags & IFF_UP)) - list_move(&dev->unreg_list, &tmp_list); + list_del_init(&dev->close_list); __dev_close_many(head); - list_for_each_entry(dev, head, unreg_list) { - rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING); + list_for_each_entry_safe(dev, tmp, head, close_list) { + rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING, GFP_KERNEL); call_netdevice_notifiers(NETDEV_DOWN, dev); + list_del_init(&dev->close_list); } - /* rollback_registered_many needs the complete original list */ - list_splice(&tmp_list, head); return 0; } @@ -1397,7 +1396,7 @@ int dev_close(struct net_device *dev) /* Block netpoll rx while the interface is going down */ netpoll_rx_disable(dev); - list_add(&dev->unreg_list, &single); + list_add(&dev->close_list, &single); dev_close_many(&single); list_del(&single); @@ -2378,6 +2377,8 @@ struct sk_buff *__skb_gso_segment(struct sk_buff *skb, } SKB_GSO_CB(skb)->mac_offset = skb_headroom(skb); + SKB_GSO_CB(skb)->encap_level = 0; + skb_reset_mac_header(skb); skb_reset_mac_len(skb); @@ -2537,7 +2538,7 @@ static inline int skb_needs_linearize(struct sk_buff *skb, } int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, - struct netdev_queue *txq) + struct netdev_queue *txq, void *accel_priv) { const struct net_device_ops *ops = dev->netdev_ops; int rc = NETDEV_TX_OK; @@ -2603,9 +2604,13 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, dev_queue_xmit_nit(skb, dev); skb_len = skb->len; - rc = ops->ndo_start_xmit(skb, dev); + if (accel_priv) + rc = ops->ndo_dfwd_start_xmit(skb, dev, accel_priv); + else + rc = ops->ndo_start_xmit(skb, dev); + trace_net_dev_xmit(skb, rc, dev, skb_len); - if (rc == NETDEV_TX_OK) + if (rc == NETDEV_TX_OK && txq) txq_trans_update(txq); return rc; } @@ -2621,7 +2626,10 @@ gso: dev_queue_xmit_nit(nskb, dev); skb_len = nskb->len; - rc = ops->ndo_start_xmit(nskb, dev); + if (accel_priv) + rc = ops->ndo_dfwd_start_xmit(nskb, dev, accel_priv); + else + rc = ops->ndo_start_xmit(nskb, dev); trace_net_dev_xmit(nskb, rc, dev, skb_len); if (unlikely(rc != NETDEV_TX_OK)) { if (rc & ~NETDEV_TX_MASK) @@ -2646,6 +2654,7 @@ out_kfree_skb: out: return rc; } +EXPORT_SYMBOL_GPL(dev_hard_start_xmit); static void qdisc_pkt_len_init(struct sk_buff *skb) { @@ -2853,7 +2862,7 @@ int dev_queue_xmit(struct sk_buff *skb) if (!netif_xmit_stopped(txq)) { __this_cpu_inc(xmit_recursion); - rc = dev_hard_start_xmit(skb, dev, txq); + rc = dev_hard_start_xmit(skb, dev, txq, NULL); __this_cpu_dec(xmit_recursion); if (dev_xmit_complete(rc)) { HARD_TX_UNLOCK(dev, txq); @@ -4374,42 +4383,40 @@ struct netdev_adjacent { /* upper master flag, there can only be one master device per list */ bool master; - /* indicates that this dev is our first-level lower/upper device */ - bool neighbour; - /* counter for the number of times this device was added to us */ u16 ref_nr; + /* private field for the users */ + void *private; + struct list_head list; struct rcu_head rcu; }; -static struct netdev_adjacent *__netdev_find_adj(struct net_device *dev, - struct net_device *adj_dev, - bool upper) +static struct netdev_adjacent *__netdev_find_adj_rcu(struct net_device *dev, + struct net_device *adj_dev, + struct list_head *adj_list) { struct netdev_adjacent *adj; - struct list_head *dev_list; - - dev_list = upper ? &dev->upper_dev_list : &dev->lower_dev_list; - list_for_each_entry(adj, dev_list, list) { + list_for_each_entry_rcu(adj, adj_list, list) { if (adj->dev == adj_dev) return adj; } return NULL; } -static inline struct netdev_adjacent *__netdev_find_upper(struct net_device *dev, - struct net_device *udev) +static struct netdev_adjacent *__netdev_find_adj(struct net_device *dev, + struct net_device *adj_dev, + struct list_head *adj_list) { - return __netdev_find_adj(dev, udev, true); -} + struct netdev_adjacent *adj; -static inline struct netdev_adjacent *__netdev_find_lower(struct net_device *dev, - struct net_device *ldev) -{ - return __netdev_find_adj(dev, ldev, false); + list_for_each_entry(adj, adj_list, list) { + if (adj->dev == adj_dev) + return adj; + } + return NULL; } /** @@ -4426,7 +4433,7 @@ bool netdev_has_upper_dev(struct net_device *dev, { ASSERT_RTNL(); - return __netdev_find_upper(dev, upper_dev); + return __netdev_find_adj(dev, upper_dev, &dev->all_adj_list.upper); } EXPORT_SYMBOL(netdev_has_upper_dev); @@ -4441,7 +4448,7 @@ bool netdev_has_any_upper_dev(struct net_device *dev) { ASSERT_RTNL(); - return !list_empty(&dev->upper_dev_list); + return !list_empty(&dev->all_adj_list.upper); } EXPORT_SYMBOL(netdev_has_any_upper_dev); @@ -4458,10 +4465,10 @@ struct net_device *netdev_master_upper_dev_get(struct net_device *dev) ASSERT_RTNL(); - if (list_empty(&dev->upper_dev_list)) + if (list_empty(&dev->adj_list.upper)) return NULL; - upper = list_first_entry(&dev->upper_dev_list, + upper = list_first_entry(&dev->adj_list.upper, struct netdev_adjacent, list); if (likely(upper->master)) return upper->dev; @@ -4469,15 +4476,26 @@ struct net_device *netdev_master_upper_dev_get(struct net_device *dev) } EXPORT_SYMBOL(netdev_master_upper_dev_get); -/* netdev_upper_get_next_dev_rcu - Get the next dev from upper list +void *netdev_adjacent_get_private(struct list_head *adj_list) +{ + struct netdev_adjacent *adj; + + adj = list_entry(adj_list, struct netdev_adjacent, list); + + return adj->private; +} +EXPORT_SYMBOL(netdev_adjacent_get_private); + +/** + * netdev_all_upper_get_next_dev_rcu - Get the next dev from upper list * @dev: device * @iter: list_head ** of the current position * * Gets the next device from the dev's upper list, starting from iter * position. The caller must hold RCU read lock. */ -struct net_device *netdev_upper_get_next_dev_rcu(struct net_device *dev, - struct list_head **iter) +struct net_device *netdev_all_upper_get_next_dev_rcu(struct net_device *dev, + struct list_head **iter) { struct netdev_adjacent *upper; @@ -4485,14 +4503,71 @@ struct net_device *netdev_upper_get_next_dev_rcu(struct net_device *dev, upper = list_entry_rcu((*iter)->next, struct netdev_adjacent, list); - if (&upper->list == &dev->upper_dev_list) + if (&upper->list == &dev->all_adj_list.upper) return NULL; *iter = &upper->list; return upper->dev; } -EXPORT_SYMBOL(netdev_upper_get_next_dev_rcu); +EXPORT_SYMBOL(netdev_all_upper_get_next_dev_rcu); + +/** + * netdev_lower_get_next_private - Get the next ->private from the + * lower neighbour list + * @dev: device + * @iter: list_head ** of the current position + * + * Gets the next netdev_adjacent->private from the dev's lower neighbour + * list, starting from iter position. The caller must hold either hold the + * RTNL lock or its own locking that guarantees that the neighbour lower + * list will remain unchainged. + */ +void *netdev_lower_get_next_private(struct net_device *dev, + struct list_head **iter) +{ + struct netdev_adjacent *lower; + + lower = list_entry(*iter, struct netdev_adjacent, list); + + if (&lower->list == &dev->adj_list.lower) + return NULL; + + if (iter) + *iter = lower->list.next; + + return lower->private; +} +EXPORT_SYMBOL(netdev_lower_get_next_private); + +/** + * netdev_lower_get_next_private_rcu - Get the next ->private from the + * lower neighbour list, RCU + * variant + * @dev: device + * @iter: list_head ** of the current position + * + * Gets the next netdev_adjacent->private from the dev's lower neighbour + * list, starting from iter position. The caller must hold RCU read lock. + */ +void *netdev_lower_get_next_private_rcu(struct net_device *dev, + struct list_head **iter) +{ + struct netdev_adjacent *lower; + + WARN_ON_ONCE(!rcu_read_lock_held()); + + lower = list_entry_rcu((*iter)->next, struct netdev_adjacent, list); + + if (&lower->list == &dev->adj_list.lower) + return NULL; + + if (iter) + *iter = &lower->list; + + return lower->private; +} +EXPORT_SYMBOL(netdev_lower_get_next_private_rcu); /** * netdev_master_upper_dev_get_rcu - Get master upper device @@ -4505,7 +4580,7 @@ struct net_device *netdev_master_upper_dev_get_rcu(struct net_device *dev) { struct netdev_adjacent *upper; - upper = list_first_or_null_rcu(&dev->upper_dev_list, + upper = list_first_or_null_rcu(&dev->adj_list.upper, struct netdev_adjacent, list); if (upper && likely(upper->master)) return upper->dev; @@ -4515,15 +4590,16 @@ EXPORT_SYMBOL(netdev_master_upper_dev_get_rcu); static int __netdev_adjacent_dev_insert(struct net_device *dev, struct net_device *adj_dev, - bool neighbour, bool master, - bool upper) + struct list_head *dev_list, + void *private, bool master) { struct netdev_adjacent *adj; + char linkname[IFNAMSIZ+7]; + int ret; - adj = __netdev_find_adj(dev, adj_dev, upper); + adj = __netdev_find_adj(dev, adj_dev, dev_list); if (adj) { - BUG_ON(neighbour); adj->ref_nr++; return 0; } @@ -4534,124 +4610,179 @@ static int __netdev_adjacent_dev_insert(struct net_device *dev, adj->dev = adj_dev; adj->master = master; - adj->neighbour = neighbour; adj->ref_nr = 1; - + adj->private = private; dev_hold(adj_dev); - pr_debug("dev_hold for %s, because of %s link added from %s to %s\n", - adj_dev->name, upper ? "upper" : "lower", dev->name, - adj_dev->name); - if (!upper) { - list_add_tail_rcu(&adj->list, &dev->lower_dev_list); - return 0; + pr_debug("dev_hold for %s, because of link added from %s to %s\n", + adj_dev->name, dev->name, adj_dev->name); + + if (dev_list == &dev->adj_list.lower) { + sprintf(linkname, "lower_%s", adj_dev->name); + ret = sysfs_create_link(&(dev->dev.kobj), + &(adj_dev->dev.kobj), linkname); + if (ret) + goto free_adj; + } else if (dev_list == &dev->adj_list.upper) { + sprintf(linkname, "upper_%s", adj_dev->name); + ret = sysfs_create_link(&(dev->dev.kobj), + &(adj_dev->dev.kobj), linkname); + if (ret) + goto free_adj; } - /* Ensure that master upper link is always the first item in list. */ - if (master) - list_add_rcu(&adj->list, &dev->upper_dev_list); - else - list_add_tail_rcu(&adj->list, &dev->upper_dev_list); + /* Ensure that master link is always the first item in list. */ + if (master) { + ret = sysfs_create_link(&(dev->dev.kobj), + &(adj_dev->dev.kobj), "master"); + if (ret) + goto remove_symlinks; + + list_add_rcu(&adj->list, dev_list); + } else { + list_add_tail_rcu(&adj->list, dev_list); + } return 0; -} -static inline int __netdev_upper_dev_insert(struct net_device *dev, - struct net_device *udev, - bool master, bool neighbour) -{ - return __netdev_adjacent_dev_insert(dev, udev, neighbour, master, - true); -} +remove_symlinks: + if (dev_list == &dev->adj_list.lower) { + sprintf(linkname, "lower_%s", adj_dev->name); + sysfs_remove_link(&(dev->dev.kobj), linkname); + } else if (dev_list == &dev->adj_list.upper) { + sprintf(linkname, "upper_%s", adj_dev->name); + sysfs_remove_link(&(dev->dev.kobj), linkname); + } -static inline int __netdev_lower_dev_insert(struct net_device *dev, - struct net_device *ldev, - bool neighbour) -{ - return __netdev_adjacent_dev_insert(dev, ldev, neighbour, false, - false); +free_adj: + kfree(adj); + dev_put(adj_dev); + + return ret; } void __netdev_adjacent_dev_remove(struct net_device *dev, - struct net_device *adj_dev, bool upper) + struct net_device *adj_dev, + struct list_head *dev_list) { struct netdev_adjacent *adj; + char linkname[IFNAMSIZ+7]; - if (upper) - adj = __netdev_find_upper(dev, adj_dev); - else - adj = __netdev_find_lower(dev, adj_dev); + adj = __netdev_find_adj(dev, adj_dev, dev_list); - if (!adj) + if (!adj) { + pr_err("tried to remove device %s from %s\n", + dev->name, adj_dev->name); BUG(); + } if (adj->ref_nr > 1) { + pr_debug("%s to %s ref_nr-- = %d\n", dev->name, adj_dev->name, + adj->ref_nr-1); adj->ref_nr--; return; } + if (adj->master) + sysfs_remove_link(&(dev->dev.kobj), "master"); + + if (dev_list == &dev->adj_list.lower) { + sprintf(linkname, "lower_%s", adj_dev->name); + sysfs_remove_link(&(dev->dev.kobj), linkname); + } else if (dev_list == &dev->adj_list.upper) { + sprintf(linkname, "upper_%s", adj_dev->name); + sysfs_remove_link(&(dev->dev.kobj), linkname); + } + list_del_rcu(&adj->list); - pr_debug("dev_put for %s, because of %s link removed from %s to %s\n", - adj_dev->name, upper ? "upper" : "lower", dev->name, - adj_dev->name); + pr_debug("dev_put for %s, because link removed from %s to %s\n", + adj_dev->name, dev->name, adj_dev->name); dev_put(adj_dev); kfree_rcu(adj, rcu); } -static inline void __netdev_upper_dev_remove(struct net_device *dev, - struct net_device *udev) -{ - return __netdev_adjacent_dev_remove(dev, udev, true); -} - -static inline void __netdev_lower_dev_remove(struct net_device *dev, - struct net_device *ldev) -{ - return __netdev_adjacent_dev_remove(dev, ldev, false); -} - -int __netdev_adjacent_dev_insert_link(struct net_device *dev, - struct net_device *upper_dev, - bool master, bool neighbour) +int __netdev_adjacent_dev_link_lists(struct net_device *dev, + struct net_device *upper_dev, + struct list_head *up_list, + struct list_head *down_list, + void *private, bool master) { int ret; - ret = __netdev_upper_dev_insert(dev, upper_dev, master, neighbour); + ret = __netdev_adjacent_dev_insert(dev, upper_dev, up_list, private, + master); if (ret) return ret; - ret = __netdev_lower_dev_insert(upper_dev, dev, neighbour); + ret = __netdev_adjacent_dev_insert(upper_dev, dev, down_list, private, + false); if (ret) { - __netdev_upper_dev_remove(dev, upper_dev); + __netdev_adjacent_dev_remove(dev, upper_dev, up_list); return ret; } return 0; } -static inline int __netdev_adjacent_dev_link(struct net_device *dev, - struct net_device *udev) +int __netdev_adjacent_dev_link(struct net_device *dev, + struct net_device *upper_dev) { - return __netdev_adjacent_dev_insert_link(dev, udev, false, false); + return __netdev_adjacent_dev_link_lists(dev, upper_dev, + &dev->all_adj_list.upper, + &upper_dev->all_adj_list.lower, + NULL, false); } -static inline int __netdev_adjacent_dev_link_neighbour(struct net_device *dev, - struct net_device *udev, - bool master) +void __netdev_adjacent_dev_unlink_lists(struct net_device *dev, + struct net_device *upper_dev, + struct list_head *up_list, + struct list_head *down_list) { - return __netdev_adjacent_dev_insert_link(dev, udev, master, true); + __netdev_adjacent_dev_remove(dev, upper_dev, up_list); + __netdev_adjacent_dev_remove(upper_dev, dev, down_list); } void __netdev_adjacent_dev_unlink(struct net_device *dev, struct net_device *upper_dev) { - __netdev_upper_dev_remove(dev, upper_dev); - __netdev_lower_dev_remove(upper_dev, dev); + __netdev_adjacent_dev_unlink_lists(dev, upper_dev, + &dev->all_adj_list.upper, + &upper_dev->all_adj_list.lower); } +int __netdev_adjacent_dev_link_neighbour(struct net_device *dev, + struct net_device *upper_dev, + void *private, bool master) +{ + int ret = __netdev_adjacent_dev_link(dev, upper_dev); + + if (ret) + return ret; + + ret = __netdev_adjacent_dev_link_lists(dev, upper_dev, + &dev->adj_list.upper, + &upper_dev->adj_list.lower, + private, master); + if (ret) { + __netdev_adjacent_dev_unlink(dev, upper_dev); + return ret; + } + + return 0; +} + +void __netdev_adjacent_dev_unlink_neighbour(struct net_device *dev, + struct net_device *upper_dev) +{ + __netdev_adjacent_dev_unlink(dev, upper_dev); + __netdev_adjacent_dev_unlink_lists(dev, upper_dev, + &dev->adj_list.upper, + &upper_dev->adj_list.lower); +} static int __netdev_upper_dev_link(struct net_device *dev, - struct net_device *upper_dev, bool master) + struct net_device *upper_dev, bool master, + void *private) { struct netdev_adjacent *i, *j, *to_i, *to_j; int ret = 0; @@ -4662,26 +4793,29 @@ static int __netdev_upper_dev_link(struct net_device *dev, return -EBUSY; /* To prevent loops, check if dev is not upper device to upper_dev. */ - if (__netdev_find_upper(upper_dev, dev)) + if (__netdev_find_adj(upper_dev, dev, &upper_dev->all_adj_list.upper)) return -EBUSY; - if (__netdev_find_upper(dev, upper_dev)) + if (__netdev_find_adj(dev, upper_dev, &dev->all_adj_list.upper)) return -EEXIST; if (master && netdev_master_upper_dev_get(dev)) return -EBUSY; - ret = __netdev_adjacent_dev_link_neighbour(dev, upper_dev, master); + ret = __netdev_adjacent_dev_link_neighbour(dev, upper_dev, private, + master); if (ret) return ret; /* Now that we linked these devs, make all the upper_dev's - * upper_dev_list visible to every dev's lower_dev_list and vice + * all_adj_list.upper visible to every dev's all_adj_list.lower an * versa, and don't forget the devices itself. All of these * links are non-neighbours. */ - list_for_each_entry(i, &dev->lower_dev_list, list) { - list_for_each_entry(j, &upper_dev->upper_dev_list, list) { + list_for_each_entry(i, &dev->all_adj_list.lower, list) { + list_for_each_entry(j, &upper_dev->all_adj_list.upper, list) { + pr_debug("Interlinking %s with %s, non-neighbour\n", + i->dev->name, j->dev->name); ret = __netdev_adjacent_dev_link(i->dev, j->dev); if (ret) goto rollback_mesh; @@ -4689,14 +4823,18 @@ static int __netdev_upper_dev_link(struct net_device *dev, } /* add dev to every upper_dev's upper device */ - list_for_each_entry(i, &upper_dev->upper_dev_list, list) { + list_for_each_entry(i, &upper_dev->all_adj_list.upper, list) { + pr_debug("linking %s's upper device %s with %s\n", + upper_dev->name, i->dev->name, dev->name); ret = __netdev_adjacent_dev_link(dev, i->dev); if (ret) goto rollback_upper_mesh; } /* add upper_dev to every dev's lower device */ - list_for_each_entry(i, &dev->lower_dev_list, list) { + list_for_each_entry(i, &dev->all_adj_list.lower, list) { + pr_debug("linking %s's lower device %s with %s\n", dev->name, + i->dev->name, upper_dev->name); ret = __netdev_adjacent_dev_link(i->dev, upper_dev); if (ret) goto rollback_lower_mesh; @@ -4707,7 +4845,7 @@ static int __netdev_upper_dev_link(struct net_device *dev, rollback_lower_mesh: to_i = i; - list_for_each_entry(i, &dev->lower_dev_list, list) { + list_for_each_entry(i, &dev->all_adj_list.lower, list) { if (i == to_i) break; __netdev_adjacent_dev_unlink(i->dev, upper_dev); @@ -4717,7 +4855,7 @@ rollback_lower_mesh: rollback_upper_mesh: to_i = i; - list_for_each_entry(i, &upper_dev->upper_dev_list, list) { + list_for_each_entry(i, &upper_dev->all_adj_list.upper, list) { if (i == to_i) break; __netdev_adjacent_dev_unlink(dev, i->dev); @@ -4728,8 +4866,8 @@ rollback_upper_mesh: rollback_mesh: to_i = i; to_j = j; - list_for_each_entry(i, &dev->lower_dev_list, list) { - list_for_each_entry(j, &upper_dev->upper_dev_list, list) { + list_for_each_entry(i, &dev->all_adj_list.lower, list) { + list_for_each_entry(j, &upper_dev->all_adj_list.upper, list) { if (i == to_i && j == to_j) break; __netdev_adjacent_dev_unlink(i->dev, j->dev); @@ -4738,7 +4876,7 @@ rollback_mesh: break; } - __netdev_adjacent_dev_unlink(dev, upper_dev); + __netdev_adjacent_dev_unlink_neighbour(dev, upper_dev); return ret; } @@ -4756,7 +4894,7 @@ rollback_mesh: int netdev_upper_dev_link(struct net_device *dev, struct net_device *upper_dev) { - return __netdev_upper_dev_link(dev, upper_dev, false); + return __netdev_upper_dev_link(dev, upper_dev, false, NULL); } EXPORT_SYMBOL(netdev_upper_dev_link); @@ -4774,10 +4912,18 @@ EXPORT_SYMBOL(netdev_upper_dev_link); int netdev_master_upper_dev_link(struct net_device *dev, struct net_device *upper_dev) { - return __netdev_upper_dev_link(dev, upper_dev, true); + return __netdev_upper_dev_link(dev, upper_dev, true, NULL); } EXPORT_SYMBOL(netdev_master_upper_dev_link); +int netdev_master_upper_dev_link_private(struct net_device *dev, + struct net_device *upper_dev, + void *private) +{ + return __netdev_upper_dev_link(dev, upper_dev, true, private); +} +EXPORT_SYMBOL(netdev_master_upper_dev_link_private); + /** * netdev_upper_dev_unlink - Removes a link to upper device * @dev: device @@ -4792,29 +4938,59 @@ void netdev_upper_dev_unlink(struct net_device *dev, struct netdev_adjacent *i, *j; ASSERT_RTNL(); - __netdev_adjacent_dev_unlink(dev, upper_dev); + __netdev_adjacent_dev_unlink_neighbour(dev, upper_dev); /* Here is the tricky part. We must remove all dev's lower * devices from all upper_dev's upper devices and vice * versa, to maintain the graph relationship. */ - list_for_each_entry(i, &dev->lower_dev_list, list) - list_for_each_entry(j, &upper_dev->upper_dev_list, list) + list_for_each_entry(i, &dev->all_adj_list.lower, list) + list_for_each_entry(j, &upper_dev->all_adj_list.upper, list) __netdev_adjacent_dev_unlink(i->dev, j->dev); /* remove also the devices itself from lower/upper device * list */ - list_for_each_entry(i, &dev->lower_dev_list, list) + list_for_each_entry(i, &dev->all_adj_list.lower, list) __netdev_adjacent_dev_unlink(i->dev, upper_dev); - list_for_each_entry(i, &upper_dev->upper_dev_list, list) + list_for_each_entry(i, &upper_dev->all_adj_list.upper, list) __netdev_adjacent_dev_unlink(dev, i->dev); call_netdevice_notifiers(NETDEV_CHANGEUPPER, dev); } EXPORT_SYMBOL(netdev_upper_dev_unlink); +void *netdev_lower_dev_get_private_rcu(struct net_device *dev, + struct net_device *lower_dev) +{ + struct netdev_adjacent *lower; + + if (!lower_dev) + return NULL; + lower = __netdev_find_adj_rcu(dev, lower_dev, &dev->adj_list.lower); + if (!lower) + return NULL; + + return lower->private; +} +EXPORT_SYMBOL(netdev_lower_dev_get_private_rcu); + +void *netdev_lower_dev_get_private(struct net_device *dev, + struct net_device *lower_dev) +{ + struct netdev_adjacent *lower; + + if (!lower_dev) + return NULL; + lower = __netdev_find_adj(dev, lower_dev, &dev->adj_list.lower); + if (!lower) + return NULL; + + return lower->private; +} +EXPORT_SYMBOL(netdev_lower_dev_get_private); + static void dev_change_rx_flags(struct net_device *dev, int flags) { const struct net_device_ops *ops = dev->netdev_ops; @@ -4823,7 +4999,7 @@ static void dev_change_rx_flags(struct net_device *dev, int flags) ops->ndo_change_rx_flags(dev, flags); } -static int __dev_set_promiscuity(struct net_device *dev, int inc) +static int __dev_set_promiscuity(struct net_device *dev, int inc, bool notify) { unsigned int old_flags = dev->flags; kuid_t uid; @@ -4866,6 +5042,8 @@ static int __dev_set_promiscuity(struct net_device *dev, int inc) dev_change_rx_flags(dev, IFF_PROMISC); } + if (notify) + __dev_notify_flags(dev, old_flags, IFF_PROMISC); return 0; } @@ -4885,7 +5063,7 @@ int dev_set_promiscuity(struct net_device *dev, int inc) unsigned int old_flags = dev->flags; int err; - err = __dev_set_promiscuity(dev, inc); + err = __dev_set_promiscuity(dev, inc, true); if (err < 0) return err; if (dev->flags != old_flags) @@ -4894,22 +5072,9 @@ int dev_set_promiscuity(struct net_device *dev, int inc) } EXPORT_SYMBOL(dev_set_promiscuity); -/** - * dev_set_allmulti - update allmulti count on a device - * @dev: device - * @inc: modifier - * - * Add or remove reception of all multicast frames to a device. While the - * count in the device remains above zero the interface remains listening - * to all interfaces. Once it hits zero the device reverts back to normal - * filtering operation. A negative @inc value is used to drop the counter - * when releasing a resource needing all multicasts. - * Return 0 if successful or a negative errno code on error. - */ - -int dev_set_allmulti(struct net_device *dev, int inc) +static int __dev_set_allmulti(struct net_device *dev, int inc, bool notify) { - unsigned int old_flags = dev->flags; + unsigned int old_flags = dev->flags, old_gflags = dev->gflags; ASSERT_RTNL(); @@ -4932,9 +5097,30 @@ int dev_set_allmulti(struct net_device *dev, int inc) if (dev->flags ^ old_flags) { dev_change_rx_flags(dev, IFF_ALLMULTI); dev_set_rx_mode(dev); + if (notify) + __dev_notify_flags(dev, old_flags, + dev->gflags ^ old_gflags); } return 0; } + +/** + * dev_set_allmulti - update allmulti count on a device + * @dev: device + * @inc: modifier + * + * Add or remove reception of all multicast frames to a device. While the + * count in the device remains above zero the interface remains listening + * to all interfaces. Once it hits zero the device reverts back to normal + * filtering operation. A negative @inc value is used to drop the counter + * when releasing a resource needing all multicasts. + * Return 0 if successful or a negative errno code on error. + */ + +int dev_set_allmulti(struct net_device *dev, int inc) +{ + return __dev_set_allmulti(dev, inc, true); +} EXPORT_SYMBOL(dev_set_allmulti); /* @@ -4959,10 +5145,10 @@ void __dev_set_rx_mode(struct net_device *dev) * therefore calling __dev_set_promiscuity here is safe. */ if (!netdev_uc_empty(dev) && !dev->uc_promisc) { - __dev_set_promiscuity(dev, 1); + __dev_set_promiscuity(dev, 1, false); dev->uc_promisc = true; } else if (netdev_uc_empty(dev) && dev->uc_promisc) { - __dev_set_promiscuity(dev, -1); + __dev_set_promiscuity(dev, -1, false); dev->uc_promisc = false; } } @@ -5051,9 +5237,13 @@ int __dev_change_flags(struct net_device *dev, unsigned int flags) if ((flags ^ dev->gflags) & IFF_PROMISC) { int inc = (flags & IFF_PROMISC) ? 1 : -1; + unsigned int old_flags = dev->flags; dev->gflags ^= IFF_PROMISC; - dev_set_promiscuity(dev, inc); + + if (__dev_set_promiscuity(dev, inc, false) >= 0) + if (dev->flags != old_flags) + dev_set_rx_mode(dev); } /* NOTE: order of synchronization of IFF_PROMISC and IFF_ALLMULTI @@ -5064,16 +5254,20 @@ int __dev_change_flags(struct net_device *dev, unsigned int flags) int inc = (flags & IFF_ALLMULTI) ? 1 : -1; dev->gflags ^= IFF_ALLMULTI; - dev_set_allmulti(dev, inc); + __dev_set_allmulti(dev, inc, false); } return ret; } -void __dev_notify_flags(struct net_device *dev, unsigned int old_flags) +void __dev_notify_flags(struct net_device *dev, unsigned int old_flags, + unsigned int gchanges) { unsigned int changes = dev->flags ^ old_flags; + if (gchanges) + rtmsg_ifinfo(RTM_NEWLINK, dev, gchanges, GFP_ATOMIC); + if (changes & IFF_UP) { if (dev->flags & IFF_UP) call_netdevice_notifiers(NETDEV_UP, dev); @@ -5102,17 +5296,14 @@ void __dev_notify_flags(struct net_device *dev, unsigned int old_flags) int dev_change_flags(struct net_device *dev, unsigned int flags) { int ret; - unsigned int changes, old_flags = dev->flags; + unsigned int changes, old_flags = dev->flags, old_gflags = dev->gflags; ret = __dev_change_flags(dev, flags); if (ret < 0) return ret; - changes = old_flags ^ dev->flags; - if (changes) - rtmsg_ifinfo(RTM_NEWLINK, dev, changes); - - __dev_notify_flags(dev, old_flags); + changes = (old_flags ^ dev->flags) | (old_gflags ^ dev->gflags); + __dev_notify_flags(dev, old_flags, changes); return ret; } EXPORT_SYMBOL(dev_change_flags); @@ -5259,6 +5450,7 @@ static void net_set_todo(struct net_device *dev) static void rollback_registered_many(struct list_head *head) { struct net_device *dev, *tmp; + LIST_HEAD(close_head); BUG_ON(dev_boot_phase); ASSERT_RTNL(); @@ -5281,7 +5473,9 @@ static void rollback_registered_many(struct list_head *head) } /* If device is running, close it first. */ - dev_close_many(head); + list_for_each_entry(dev, head, unreg_list) + list_add_tail(&dev->close_list, &close_head); + dev_close_many(&close_head); list_for_each_entry(dev, head, unreg_list) { /* And unlink it from device chain. */ @@ -5304,7 +5498,7 @@ static void rollback_registered_many(struct list_head *head) if (!dev->rtnl_link_ops || dev->rtnl_link_state == RTNL_LINK_INITIALIZED) - rtmsg_ifinfo(RTM_DELLINK, dev, ~0U); + rtmsg_ifinfo(RTM_DELLINK, dev, ~0U, GFP_KERNEL); /* * Flush the unicast and multicast chains @@ -5703,7 +5897,7 @@ int register_netdevice(struct net_device *dev) */ if (!dev->rtnl_link_ops || dev->rtnl_link_state == RTNL_LINK_INITIALIZED) - rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U); + rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL); out: return ret; @@ -6010,6 +6204,16 @@ void netdev_set_default_ethtool_ops(struct net_device *dev, } EXPORT_SYMBOL_GPL(netdev_set_default_ethtool_ops); +void netdev_freemem(struct net_device *dev) +{ + char *addr = (char *)dev - dev->padded; + + if (is_vmalloc_addr(addr)) + vfree(addr); + else + kfree(addr); +} + /** * alloc_netdev_mqs - allocate network device * @sizeof_priv: size of private data to allocate space for @@ -6053,7 +6257,9 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name, /* ensure 32-byte alignment of whole construct */ alloc_size += NETDEV_ALIGN - 1; - p = kzalloc(alloc_size, GFP_KERNEL); + p = kzalloc(alloc_size, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT); + if (!p) + p = vzalloc(alloc_size); if (!p) return NULL; @@ -6062,7 +6268,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name, dev->pcpu_refcnt = alloc_percpu(int); if (!dev->pcpu_refcnt) - goto free_p; + goto free_dev; if (dev_addr_init(dev)) goto free_pcpu; @@ -6077,9 +6283,12 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name, INIT_LIST_HEAD(&dev->napi_list); INIT_LIST_HEAD(&dev->unreg_list); + INIT_LIST_HEAD(&dev->close_list); INIT_LIST_HEAD(&dev->link_watch_list); - INIT_LIST_HEAD(&dev->upper_dev_list); - INIT_LIST_HEAD(&dev->lower_dev_list); + INIT_LIST_HEAD(&dev->adj_list.upper); + INIT_LIST_HEAD(&dev->adj_list.lower); + INIT_LIST_HEAD(&dev->all_adj_list.upper); + INIT_LIST_HEAD(&dev->all_adj_list.lower); dev->priv_flags = IFF_XMIT_DST_RELEASE; setup(dev); @@ -6112,8 +6321,8 @@ free_pcpu: kfree(dev->_rx); #endif -free_p: - kfree(p); +free_dev: + netdev_freemem(dev); return NULL; } EXPORT_SYMBOL(alloc_netdev_mqs); @@ -6150,7 +6359,7 @@ void free_netdev(struct net_device *dev) /* Compatibility with error handling in drivers */ if (dev->reg_state == NETREG_UNINITIALIZED) { - kfree((char *)dev - dev->padded); + netdev_freemem(dev); return; } @@ -6312,7 +6521,7 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char call_netdevice_notifiers(NETDEV_UNREGISTER, dev); rcu_barrier(); call_netdevice_notifiers(NETDEV_UNREGISTER_FINAL, dev); - rtmsg_ifinfo(RTM_DELLINK, dev, ~0U); + rtmsg_ifinfo(RTM_DELLINK, dev, ~0U, GFP_KERNEL); /* * Flush the unicast and multicast chains @@ -6351,7 +6560,7 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char * Prevent userspace races by waiting until the network * device is fully setup before sending notifications. */ - rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U); + rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U, GFP_KERNEL); synchronize_net(); err = 0; diff --git a/net/core/dev_addr_lists.c b/net/core/dev_addr_lists.c index 6cda4e2c2132..ec40a849fc42 100644 --- a/net/core/dev_addr_lists.c +++ b/net/core/dev_addr_lists.c @@ -752,7 +752,7 @@ int dev_mc_del_global(struct net_device *dev, const unsigned char *addr) EXPORT_SYMBOL(dev_mc_del_global); /** - * dev_mc_sync - Synchronize device's unicast list to another device + * dev_mc_sync - Synchronize device's multicast list to another device * @to: destination device * @from: source device * @@ -780,7 +780,7 @@ int dev_mc_sync(struct net_device *to, struct net_device *from) EXPORT_SYMBOL(dev_mc_sync); /** - * dev_mc_sync_multiple - Synchronize device's unicast list to another + * dev_mc_sync_multiple - Synchronize device's multicast list to another * device, but allow for multiple calls to sync to multiple devices. * @to: destination device * @from: source device diff --git a/net/core/ethtool.c b/net/core/ethtool.c index 78e9d9223e40..30071dec287a 100644 --- a/net/core/ethtool.c +++ b/net/core/ethtool.c @@ -81,6 +81,8 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] [NETIF_F_TSO6_BIT] = "tx-tcp6-segmentation", [NETIF_F_FSO_BIT] = "tx-fcoe-segmentation", [NETIF_F_GSO_GRE_BIT] = "tx-gre-segmentation", + [NETIF_F_GSO_IPIP_BIT] = "tx-ipip-segmentation", + [NETIF_F_GSO_SIT_BIT] = "tx-sit-segmentation", [NETIF_F_GSO_UDP_TUNNEL_BIT] = "tx-udp_tnl-segmentation", [NETIF_F_GSO_MPLS_BIT] = "tx-mpls-segmentation", @@ -94,6 +96,7 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] [NETIF_F_LOOPBACK_BIT] = "loopback", [NETIF_F_RXFCS_BIT] = "rx-fcs", [NETIF_F_RXALL_BIT] = "rx-all", + [NETIF_F_HW_L2FW_DOFFLOAD_BIT] = "l2-fwd-offload", }; static int ethtool_get_features(struct net_device *dev, void __user *useraddr) diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c index 2e654138433c..f409e0bd35c0 100644 --- a/net/core/fib_rules.c +++ b/net/core/fib_rules.c @@ -460,7 +460,8 @@ static int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr* nlh) if (frh->action && (frh->action != rule->action)) continue; - if (frh->table && (frh_get_table(frh, tb) != rule->table)) + if (frh_get_table(frh, tb) && + (frh_get_table(frh, tb) != rule->table)) continue; if (tb[FRA_PRIORITY] && diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c index 143b6fdb9647..d6ef17322500 100644 --- a/net/core/flow_dissector.c +++ b/net/core/flow_dissector.c @@ -25,9 +25,35 @@ static void iph_to_flow_copy_addrs(struct flow_keys *flow, const struct iphdr *i memcpy(&flow->src, &iph->saddr, sizeof(flow->src) + sizeof(flow->dst)); } +/** + * skb_flow_get_ports - extract the upper layer ports and return them + * @skb: buffer to extract the ports from + * @thoff: transport header offset + * @ip_proto: protocol for which to get port offset + * + * The function will try to retrieve the ports at offset thoff + poff where poff + * is the protocol port offset returned from proto_ports_offset + */ +__be32 skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto) +{ + int poff = proto_ports_offset(ip_proto); + + if (poff >= 0) { + __be32 *ports, _ports; + + ports = skb_header_pointer(skb, thoff + poff, + sizeof(_ports), &_ports); + if (ports) + return *ports; + } + + return 0; +} +EXPORT_SYMBOL(skb_flow_get_ports); + bool skb_flow_dissect(const struct sk_buff *skb, struct flow_keys *flow) { - int poff, nhoff = skb_network_offset(skb); + int nhoff = skb_network_offset(skb); u8 ip_proto; __be16 proto = skb->protocol; @@ -42,13 +68,13 @@ ip: iph = skb_header_pointer(skb, nhoff, sizeof(_iph), &_iph); if (!iph || iph->ihl < 5) return false; + nhoff += iph->ihl * 4; + ip_proto = iph->protocol; if (ip_is_fragment(iph)) ip_proto = 0; - else - ip_proto = iph->protocol; + iph_to_flow_copy_addrs(flow, iph); - nhoff += iph->ihl * 4; break; } case __constant_htons(ETH_P_IPV6): { @@ -150,16 +176,7 @@ ipv6: } flow->ip_proto = ip_proto; - poff = proto_ports_offset(ip_proto); - if (poff >= 0) { - __be32 *ports, _ports; - - ports = skb_header_pointer(skb, nhoff + poff, - sizeof(_ports), &_ports); - if (ports) - flow->ports = *ports; - } - + flow->ports = skb_flow_get_ports(skb, nhoff, ip_proto); flow->thoff = (u16) nhoff; return true; @@ -167,6 +184,22 @@ ipv6: EXPORT_SYMBOL(skb_flow_dissect); static u32 hashrnd __read_mostly; +static __always_inline void __flow_hash_secret_init(void) +{ + net_get_random_once(&hashrnd, sizeof(hashrnd)); +} + +static __always_inline u32 __flow_hash_3words(u32 a, u32 b, u32 c) +{ + __flow_hash_secret_init(); + return jhash_3words(a, b, c, hashrnd); +} + +static __always_inline u32 __flow_hash_1word(u32 a) +{ + __flow_hash_secret_init(); + return jhash_1word(a, hashrnd); +} /* * __skb_get_rxhash: calculate a flow hash based on src/dst addresses @@ -193,9 +226,9 @@ void __skb_get_rxhash(struct sk_buff *skb) swap(keys.port16[0], keys.port16[1]); } - hash = jhash_3words((__force u32)keys.dst, - (__force u32)keys.src, - (__force u32)keys.ports, hashrnd); + hash = __flow_hash_3words((__force u32)keys.dst, + (__force u32)keys.src, + (__force u32)keys.ports); if (!hash) hash = 1; @@ -231,7 +264,7 @@ u16 __skb_tx_hash(const struct net_device *dev, const struct sk_buff *skb, hash = skb->sk->sk_hash; else hash = (__force u16) skb->protocol; - hash = jhash_1word(hash, hashrnd); + hash = __flow_hash_1word(hash); return (u16) (((u64) hash * qcount) >> 32) + qoffset; } @@ -323,7 +356,7 @@ static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb) else hash = (__force u16) skb->protocol ^ skb->rxhash; - hash = jhash_1word(hash, hashrnd); + hash = __flow_hash_1word(hash); queue_index = map->queues[ ((u64)hash * map->len) >> 32]; } @@ -378,11 +411,3 @@ struct netdev_queue *netdev_pick_tx(struct net_device *dev, skb_set_queue_mapping(skb, queue_index); return netdev_get_tx_queue(dev, queue_index); } - -static int __init initialize_hashrnd(void) -{ - get_random_bytes(&hashrnd, sizeof(hashrnd)); - return 0; -} - -late_initcall_sync(initialize_hashrnd); diff --git a/net/core/iovec.c b/net/core/iovec.c index b77eeecc0011..4cdb7c48dad6 100644 --- a/net/core/iovec.c +++ b/net/core/iovec.c @@ -100,7 +100,7 @@ int memcpy_toiovecend(const struct iovec *iov, unsigned char *kdata, EXPORT_SYMBOL(memcpy_toiovecend); /* - * Copy iovec from kernel. Returns -EFAULT on error. + * Copy iovec to kernel. Returns -EFAULT on error. */ int memcpy_fromiovecend(unsigned char *kdata, const struct iovec *iov, diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 6072610a8672..ca15f32821fb 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -867,7 +867,7 @@ static void neigh_invalidate(struct neighbour *neigh) static void neigh_probe(struct neighbour *neigh) __releases(neigh->lock) { - struct sk_buff *skb = skb_peek(&neigh->arp_queue); + struct sk_buff *skb = skb_peek_tail(&neigh->arp_queue); /* keep skb alive even if arp_queue overflows */ if (skb) skb = skb_copy(skb, GFP_ATOMIC); diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index 325dee863e46..f3edf9635e02 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -1263,7 +1263,7 @@ static void netdev_release(struct device *d) BUG_ON(dev->reg_state != NETREG_RELEASED); kfree(dev->ifalias); - kfree((char *)dev - dev->padded); + netdev_freemem(dev); } static const void *net_namespace(struct device *d) diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c index d9cd627e6a16..9b7cf6c85f82 100644 --- a/net/core/netprio_cgroup.c +++ b/net/core/netprio_cgroup.c @@ -222,11 +222,10 @@ static void net_prio_attach(struct cgroup_subsys_state *css, struct cgroup_taskset *tset) { struct task_struct *p; - void *v; + void *v = (void *)(unsigned long)css->cgroup->id; cgroup_taskset_for_each(p, css, tset) { task_lock(p); - v = (void *)(unsigned long)task_netprioidx(p); iterate_fd(p->files, 0, update_netprio, v); task_unlock(p); } diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 2a0e21de3060..cf67144d3e3c 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1647,9 +1647,8 @@ int rtnl_configure_link(struct net_device *dev, const struct ifinfomsg *ifm) } dev->rtnl_link_state = RTNL_LINK_INITIALIZED; - rtmsg_ifinfo(RTM_NEWLINK, dev, ~0U); - __dev_notify_flags(dev, old_flags); + __dev_notify_flags(dev, old_flags, ~0U); return 0; } EXPORT_SYMBOL(rtnl_configure_link); @@ -1985,14 +1984,15 @@ static int rtnl_dump_all(struct sk_buff *skb, struct netlink_callback *cb) return skb->len; } -void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change) +void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change, + gfp_t flags) { struct net *net = dev_net(dev); struct sk_buff *skb; int err = -ENOBUFS; size_t if_info_size; - skb = nlmsg_new((if_info_size = if_nlmsg_size(dev, 0)), GFP_KERNEL); + skb = nlmsg_new((if_info_size = if_nlmsg_size(dev, 0)), flags); if (skb == NULL) goto errout; @@ -2003,7 +2003,7 @@ void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change) kfree_skb(skb); goto errout; } - rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, GFP_KERNEL); + rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, flags); return; errout: if (err < 0) @@ -2717,7 +2717,7 @@ static int rtnetlink_event(struct notifier_block *this, unsigned long event, voi case NETDEV_JOIN: break; default: - rtmsg_ifinfo(RTM_NEWLINK, dev, 0); + rtmsg_ifinfo(RTM_NEWLINK, dev, 0, GFP_KERNEL); break; } return NOTIFY_DONE; diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c index 8d9d05edd2eb..897da56f3aff 100644 --- a/net/core/secure_seq.c +++ b/net/core/secure_seq.c @@ -7,6 +7,7 @@ #include <linux/hrtimer.h> #include <linux/ktime.h> #include <linux/string.h> +#include <linux/net.h> #include <net/secure_seq.h> @@ -15,20 +16,9 @@ static u32 net_secret[NET_SECRET_SIZE] ____cacheline_aligned; -static void net_secret_init(void) +static __always_inline void net_secret_init(void) { - u32 tmp; - int i; - - if (likely(net_secret[0])) - return; - - for (i = NET_SECRET_SIZE; i > 0;) { - do { - get_random_bytes(&tmp, sizeof(tmp)); - } while (!tmp); - cmpxchg(&net_secret[--i], 0, tmp); - } + net_get_random_once(net_secret, sizeof(net_secret)); } #endif diff --git a/net/core/skbuff.c b/net/core/skbuff.c index d81cff119f73..8cec1e6b844d 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -476,6 +476,18 @@ void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off, } EXPORT_SYMBOL(skb_add_rx_frag); +void skb_coalesce_rx_frag(struct sk_buff *skb, int i, int size, + unsigned int truesize) +{ + skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; + + skb_frag_size_add(frag, size); + skb->len += size; + skb->data_len += size; + skb->truesize += truesize; +} +EXPORT_SYMBOL(skb_coalesce_rx_frag); + static void skb_drop_list(struct sk_buff **listp) { kfree_skb_list(*listp); @@ -580,9 +592,6 @@ static void skb_release_head_state(struct sk_buff *skb) #if IS_ENABLED(CONFIG_NF_CONNTRACK) nf_conntrack_put(skb->nfct); #endif -#ifdef NET_SKBUFF_NF_DEFRAG_NEEDED - nf_conntrack_put_reasm(skb->nfct_reasm); -#endif #ifdef CONFIG_BRIDGE_NETFILTER nf_bridge_put(skb->nf_bridge); #endif @@ -903,6 +912,9 @@ EXPORT_SYMBOL(skb_clone); static void skb_headers_offset_update(struct sk_buff *skb, int off) { + /* Only adjust this if it actually is csum_start rather than csum */ + if (skb->ip_summed == CHECKSUM_PARTIAL) + skb->csum_start += off; /* {transport,network,mac}_header and tail are relative to skb->head */ skb->transport_header += off; skb->network_header += off; @@ -1036,8 +1048,8 @@ EXPORT_SYMBOL(__pskb_copy); * @ntail: room to add at tail * @gfp_mask: allocation priority * - * Expands (or creates identical copy, if &nhead and &ntail are zero) - * header of skb. &sk_buff itself is not changed. &sk_buff MUST have + * Expands (or creates identical copy, if @nhead and @ntail are zero) + * header of @skb. &sk_buff itself is not changed. &sk_buff MUST have * reference count of 1. Returns zero in the case of success or error, * if expansion failed. In the last case, &sk_buff is not changed. * @@ -1109,9 +1121,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, #endif skb->tail += off; skb_headers_offset_update(skb, nhead); - /* Only adjust this if it actually is csum_start rather than csum */ - if (skb->ip_summed == CHECKSUM_PARTIAL) - skb->csum_start += nhead; skb->cloned = 0; skb->hdr_len = 0; skb->nohdr = 0; @@ -1176,7 +1185,6 @@ struct sk_buff *skb_copy_expand(const struct sk_buff *skb, NUMA_NO_NODE); int oldheadroom = skb_headroom(skb); int head_copy_len, head_copy_off; - int off; if (!n) return NULL; @@ -1200,11 +1208,7 @@ struct sk_buff *skb_copy_expand(const struct sk_buff *skb, copy_skb_header(n, skb); - off = newheadroom - oldheadroom; - if (n->ip_summed == CHECKSUM_PARTIAL) - n->csum_start += off; - - skb_headers_offset_update(n, off); + skb_headers_offset_update(n, newheadroom - oldheadroom); return n; } @@ -1257,6 +1261,29 @@ free_skb: EXPORT_SYMBOL(skb_pad); /** + * pskb_put - add data to the tail of a potentially fragmented buffer + * @skb: start of the buffer to use + * @tail: tail fragment of the buffer to use + * @len: amount of data to add + * + * This function extends the used data area of the potentially + * fragmented buffer. @tail must be the last fragment of @skb -- or + * @skb itself. If this would exceed the total buffer size the kernel + * will panic. A pointer to the first byte of the extra data is + * returned. + */ + +unsigned char *pskb_put(struct sk_buff *skb, struct sk_buff *tail, int len) +{ + if (tail != skb) { + skb->data_len += len; + skb->len += len; + } + return skb_put(tail, len); +} +EXPORT_SYMBOL_GPL(pskb_put); + +/** * skb_put - add data to a buffer * @skb: buffer to use * @len: amount of data to add @@ -1933,9 +1960,8 @@ fault: EXPORT_SYMBOL(skb_store_bits); /* Checksum skb data. */ - -__wsum skb_checksum(const struct sk_buff *skb, int offset, - int len, __wsum csum) +__wsum __skb_checksum(const struct sk_buff *skb, int offset, int len, + __wsum csum, const struct skb_checksum_ops *ops) { int start = skb_headlen(skb); int i, copy = start - offset; @@ -1946,7 +1972,7 @@ __wsum skb_checksum(const struct sk_buff *skb, int offset, if (copy > 0) { if (copy > len) copy = len; - csum = csum_partial(skb->data + offset, copy, csum); + csum = ops->update(skb->data + offset, copy, csum); if ((len -= copy) == 0) return csum; offset += copy; @@ -1967,10 +1993,10 @@ __wsum skb_checksum(const struct sk_buff *skb, int offset, if (copy > len) copy = len; vaddr = kmap_atomic(skb_frag_page(frag)); - csum2 = csum_partial(vaddr + frag->page_offset + - offset - start, copy, 0); + csum2 = ops->update(vaddr + frag->page_offset + + offset - start, copy, 0); kunmap_atomic(vaddr); - csum = csum_block_add(csum, csum2, pos); + csum = ops->combine(csum, csum2, pos, copy); if (!(len -= copy)) return csum; offset += copy; @@ -1989,9 +2015,9 @@ __wsum skb_checksum(const struct sk_buff *skb, int offset, __wsum csum2; if (copy > len) copy = len; - csum2 = skb_checksum(frag_iter, offset - start, - copy, 0); - csum = csum_block_add(csum, csum2, pos); + csum2 = __skb_checksum(frag_iter, offset - start, + copy, 0, ops); + csum = ops->combine(csum, csum2, pos, copy); if ((len -= copy) == 0) return csum; offset += copy; @@ -2003,6 +2029,18 @@ __wsum skb_checksum(const struct sk_buff *skb, int offset, return csum; } +EXPORT_SYMBOL(__skb_checksum); + +__wsum skb_checksum(const struct sk_buff *skb, int offset, + int len, __wsum csum) +{ + const struct skb_checksum_ops ops = { + .update = csum_partial_ext, + .combine = csum_block_add_ext, + }; + + return __skb_checksum(skb, offset, len, csum, &ops); +} EXPORT_SYMBOL(skb_checksum); /* Both of above in one bottle. */ @@ -2522,14 +2560,14 @@ EXPORT_SYMBOL(skb_prepare_seq_read); * @data: destination pointer for data to be returned * @st: state variable * - * Reads a block of skb data at &consumed relative to the + * Reads a block of skb data at @consumed relative to the * lower offset specified to skb_prepare_seq_read(). Assigns - * the head of the data block to &data and returns the length + * the head of the data block to @data and returns the length * of the block or 0 if the end of the skb data or the upper * offset has been reached. * * The caller is not required to consume all of the data - * returned, i.e. &consumed is typically set to the number + * returned, i.e. @consumed is typically set to the number * of bytes already consumed and the next call to * skb_seq_read() will return the remaining part of the block. * @@ -2837,14 +2875,7 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features) __copy_skb_header(nskb, skb); nskb->mac_len = skb->mac_len; - /* nskb and skb might have different headroom */ - if (nskb->ip_summed == CHECKSUM_PARTIAL) - nskb->csum_start += skb_headroom(nskb) - headroom; - - skb_reset_mac_header(nskb); - skb_set_network_header(nskb, skb->mac_len); - nskb->transport_header = (nskb->network_header + - skb_network_header_len(skb)); + skb_headers_offset_update(nskb, skb_headroom(nskb) - headroom); skb_copy_from_linear_data_offset(skb, -tnl_hlen, nskb->data - tnl_hlen, @@ -2936,32 +2967,30 @@ EXPORT_SYMBOL_GPL(skb_segment); int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb) { - struct sk_buff *p = *head; - struct sk_buff *nskb; - struct skb_shared_info *skbinfo = skb_shinfo(skb); - struct skb_shared_info *pinfo = skb_shinfo(p); - unsigned int headroom; - unsigned int len = skb_gro_len(skb); + struct skb_shared_info *pinfo, *skbinfo = skb_shinfo(skb); unsigned int offset = skb_gro_offset(skb); unsigned int headlen = skb_headlen(skb); + struct sk_buff *nskb, *lp, *p = *head; + unsigned int len = skb_gro_len(skb); unsigned int delta_truesize; + unsigned int headroom; - if (p->len + len >= 65536) + if (unlikely(p->len + len >= 65536)) return -E2BIG; - if (pinfo->frag_list) - goto merge; - else if (headlen <= offset) { + lp = NAPI_GRO_CB(p)->last ?: p; + pinfo = skb_shinfo(lp); + + if (headlen <= offset) { skb_frag_t *frag; skb_frag_t *frag2; int i = skbinfo->nr_frags; int nr_frags = pinfo->nr_frags + i; - offset -= headlen; - if (nr_frags > MAX_SKB_FRAGS) - return -E2BIG; + goto merge; + offset -= headlen; pinfo->nr_frags = nr_frags; skbinfo->nr_frags = 0; @@ -2992,7 +3021,7 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb) unsigned int first_offset; if (nr_frags + 1 + skbinfo->nr_frags > MAX_SKB_FRAGS) - return -E2BIG; + goto merge; first_offset = skb->data - (unsigned char *)page_address(page) + @@ -3010,7 +3039,10 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb) delta_truesize = skb->truesize - SKB_DATA_ALIGN(sizeof(struct sk_buff)); NAPI_GRO_CB(skb)->free = NAPI_GRO_FREE_STOLEN_HEAD; goto done; - } else if (skb_gro_len(p) != pinfo->gso_size) + } + if (pinfo->frag_list) + goto merge; + if (skb_gro_len(p) != pinfo->gso_size) return -E2BIG; headroom = skb_headroom(p); @@ -3062,16 +3094,24 @@ merge: __skb_pull(skb, offset); - NAPI_GRO_CB(p)->last->next = skb; + if (!NAPI_GRO_CB(p)->last) + skb_shinfo(p)->frag_list = skb; + else + NAPI_GRO_CB(p)->last->next = skb; NAPI_GRO_CB(p)->last = skb; skb_header_release(skb); + lp = p; done: NAPI_GRO_CB(p)->count++; p->data_len += len; p->truesize += delta_truesize; p->len += len; - + if (lp != p) { + lp->data_len += len; + lp->truesize += delta_truesize; + lp->len += len; + } NAPI_GRO_CB(skb)->same_flow = 1; return 0; } diff --git a/net/core/sock.c b/net/core/sock.c index 0b39e7ae4383..ab20ed9b0f31 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -475,12 +475,6 @@ discard_and_relse: } EXPORT_SYMBOL(sk_receive_skb); -void sk_reset_txq(struct sock *sk) -{ - sk_tx_queue_clear(sk); -} -EXPORT_SYMBOL(sk_reset_txq); - struct dst_entry *__sk_dst_check(struct sock *sk, u32 cookie) { struct dst_entry *dst = __sk_dst_get(sk); @@ -914,6 +908,13 @@ set_rcvbuf: } break; #endif + + case SO_MAX_PACING_RATE: + sk->sk_max_pacing_rate = val; + sk->sk_pacing_rate = min(sk->sk_pacing_rate, + sk->sk_max_pacing_rate); + break; + default: ret = -ENOPROTOOPT; break; @@ -1177,6 +1178,10 @@ int sock_getsockopt(struct socket *sock, int level, int optname, break; #endif + case SO_MAX_PACING_RATE: + v.val = sk->sk_max_pacing_rate; + break; + default: return -ENOPROTOOPT; } @@ -1836,7 +1841,17 @@ EXPORT_SYMBOL(sock_alloc_send_skb); /* On 32bit arches, an skb frag is limited to 2^15 */ #define SKB_FRAG_PAGE_ORDER get_order(32768) -bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag) +/** + * skb_page_frag_refill - check that a page_frag contains enough room + * @sz: minimum size of the fragment we want to get + * @pfrag: pointer to page_frag + * @prio: priority for memory allocation + * + * Note: While this allocator tries to use high order pages, there is + * no guarantee that allocations succeed. Therefore, @sz MUST be + * less or equal than PAGE_SIZE. + */ +bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t prio) { int order; @@ -1845,16 +1860,16 @@ bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag) pfrag->offset = 0; return true; } - if (pfrag->offset < pfrag->size) + if (pfrag->offset + sz <= pfrag->size) return true; put_page(pfrag->page); } /* We restrict high order allocations to users that can afford to wait */ - order = (sk->sk_allocation & __GFP_WAIT) ? SKB_FRAG_PAGE_ORDER : 0; + order = (prio & __GFP_WAIT) ? SKB_FRAG_PAGE_ORDER : 0; do { - gfp_t gfp = sk->sk_allocation; + gfp_t gfp = prio; if (order) gfp |= __GFP_COMP | __GFP_NOWARN; @@ -1866,6 +1881,15 @@ bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag) } } while (--order >= 0); + return false; +} +EXPORT_SYMBOL(skb_page_frag_refill); + +bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag) +{ + if (likely(skb_page_frag_refill(32U, pfrag, sk->sk_allocation))) + return true; + sk_enter_memory_pressure(sk); sk_stream_moderate_sndbuf(sk); return false; @@ -2319,6 +2343,7 @@ void sock_init_data(struct socket *sock, struct sock *sk) sk->sk_ll_usec = sysctl_net_busy_read; #endif + sk->sk_max_pacing_rate = ~0U; sk->sk_pacing_rate = ~0U; /* * Before updating sk_refcnt, we must commit prior changes to memory diff --git a/net/core/utils.c b/net/core/utils.c index aa88e23fc87a..2f737bf90b3f 100644 --- a/net/core/utils.c +++ b/net/core/utils.c @@ -338,3 +338,52 @@ void inet_proto_csum_replace16(__sum16 *sum, struct sk_buff *skb, csum_unfold(*sum))); } EXPORT_SYMBOL(inet_proto_csum_replace16); + +struct __net_random_once_work { + struct work_struct work; + struct static_key *key; +}; + +static void __net_random_once_deferred(struct work_struct *w) +{ + struct __net_random_once_work *work = + container_of(w, struct __net_random_once_work, work); + if (!static_key_enabled(work->key)) + static_key_slow_inc(work->key); + kfree(work); +} + +static void __net_random_once_disable_jump(struct static_key *key) +{ + struct __net_random_once_work *w; + + w = kmalloc(sizeof(*w), GFP_ATOMIC); + if (!w) + return; + + INIT_WORK(&w->work, __net_random_once_deferred); + w->key = key; + schedule_work(&w->work); +} + +bool __net_get_random_once(void *buf, int nbytes, bool *done, + struct static_key *done_key) +{ + static DEFINE_SPINLOCK(lock); + unsigned long flags; + + spin_lock_irqsave(&lock, flags); + if (*done) { + spin_unlock_irqrestore(&lock, flags); + return false; + } + + get_random_bytes(buf, nbytes); + *done = true; + spin_unlock_irqrestore(&lock, flags); + + __net_random_once_disable_jump(done_key); + + return true; +} +EXPORT_SYMBOL(__net_get_random_once); diff --git a/net/dccp/ackvec.h b/net/dccp/ackvec.h index a269aa7f7923..3284bfa988c0 100644 --- a/net/dccp/ackvec.h +++ b/net/dccp/ackvec.h @@ -101,16 +101,16 @@ struct dccp_ackvec_record { u8 avr_ack_nonce:1; }; -extern int dccp_ackvec_init(void); -extern void dccp_ackvec_exit(void); +int dccp_ackvec_init(void); +void dccp_ackvec_exit(void); -extern struct dccp_ackvec *dccp_ackvec_alloc(const gfp_t priority); -extern void dccp_ackvec_free(struct dccp_ackvec *av); +struct dccp_ackvec *dccp_ackvec_alloc(const gfp_t priority); +void dccp_ackvec_free(struct dccp_ackvec *av); -extern void dccp_ackvec_input(struct dccp_ackvec *av, struct sk_buff *skb); -extern int dccp_ackvec_update_records(struct dccp_ackvec *av, u64 seq, u8 sum); -extern void dccp_ackvec_clear_state(struct dccp_ackvec *av, const u64 ackno); -extern u16 dccp_ackvec_buflen(const struct dccp_ackvec *av); +void dccp_ackvec_input(struct dccp_ackvec *av, struct sk_buff *skb); +int dccp_ackvec_update_records(struct dccp_ackvec *av, u64 seq, u8 sum); +void dccp_ackvec_clear_state(struct dccp_ackvec *av, const u64 ackno); +u16 dccp_ackvec_buflen(const struct dccp_ackvec *av); static inline bool dccp_ackvec_is_empty(const struct dccp_ackvec *av) { @@ -133,7 +133,6 @@ struct dccp_ackvec_parsed { struct list_head node; }; -extern int dccp_ackvec_parsed_add(struct list_head *head, - u8 *vec, u8 len, u8 nonce); -extern void dccp_ackvec_parsed_cleanup(struct list_head *parsed_chunks); +int dccp_ackvec_parsed_add(struct list_head *head, u8 *vec, u8 len, u8 nonce); +void dccp_ackvec_parsed_cleanup(struct list_head *parsed_chunks); #endif /* _ACKVEC_H */ diff --git a/net/dccp/ccid.h b/net/dccp/ccid.h index fb85d371a8de..6eb837a47b5c 100644 --- a/net/dccp/ccid.h +++ b/net/dccp/ccid.h @@ -93,8 +93,8 @@ extern struct ccid_operations ccid2_ops; extern struct ccid_operations ccid3_ops; #endif -extern int ccid_initialize_builtins(void); -extern void ccid_cleanup_builtins(void); +int ccid_initialize_builtins(void); +void ccid_cleanup_builtins(void); struct ccid { struct ccid_operations *ccid_ops; @@ -106,12 +106,12 @@ static inline void *ccid_priv(const struct ccid *ccid) return (void *)ccid->ccid_priv; } -extern bool ccid_support_check(u8 const *ccid_array, u8 array_len); -extern int ccid_get_builtin_ccids(u8 **ccid_array, u8 *array_len); -extern int ccid_getsockopt_builtin_ccids(struct sock *sk, int len, - char __user *, int __user *); +bool ccid_support_check(u8 const *ccid_array, u8 array_len); +int ccid_get_builtin_ccids(u8 **ccid_array, u8 *array_len); +int ccid_getsockopt_builtin_ccids(struct sock *sk, int len, + char __user *, int __user *); -extern struct ccid *ccid_new(const u8 id, struct sock *sk, bool rx); +struct ccid *ccid_new(const u8 id, struct sock *sk, bool rx); static inline int ccid_get_current_rx_ccid(struct dccp_sock *dp) { @@ -131,8 +131,8 @@ static inline int ccid_get_current_tx_ccid(struct dccp_sock *dp) return ccid->ccid_ops->ccid_id; } -extern void ccid_hc_rx_delete(struct ccid *ccid, struct sock *sk); -extern void ccid_hc_tx_delete(struct ccid *ccid, struct sock *sk); +void ccid_hc_rx_delete(struct ccid *ccid, struct sock *sk); +void ccid_hc_tx_delete(struct ccid *ccid, struct sock *sk); /* * Congestion control of queued data packets via CCID decision. diff --git a/net/dccp/ccids/lib/loss_interval.h b/net/dccp/ccids/lib/loss_interval.h index d1d2f5383b7d..57f631a86ccd 100644 --- a/net/dccp/ccids/lib/loss_interval.h +++ b/net/dccp/ccids/lib/loss_interval.h @@ -65,9 +65,9 @@ static inline u8 tfrc_lh_length(struct tfrc_loss_hist *lh) struct tfrc_rx_hist; -extern int tfrc_lh_interval_add(struct tfrc_loss_hist *, struct tfrc_rx_hist *, - u32 (*first_li)(struct sock *), struct sock *); -extern u8 tfrc_lh_update_i_mean(struct tfrc_loss_hist *lh, struct sk_buff *); -extern void tfrc_lh_cleanup(struct tfrc_loss_hist *lh); +int tfrc_lh_interval_add(struct tfrc_loss_hist *, struct tfrc_rx_hist *, + u32 (*first_li)(struct sock *), struct sock *); +u8 tfrc_lh_update_i_mean(struct tfrc_loss_hist *lh, struct sk_buff *); +void tfrc_lh_cleanup(struct tfrc_loss_hist *lh); #endif /* _DCCP_LI_HIST_ */ diff --git a/net/dccp/ccids/lib/packet_history.h b/net/dccp/ccids/lib/packet_history.h index 7ee4a9d9d335..ee362b0b630d 100644 --- a/net/dccp/ccids/lib/packet_history.h +++ b/net/dccp/ccids/lib/packet_history.h @@ -60,8 +60,8 @@ static inline struct tfrc_tx_hist_entry * return head; } -extern int tfrc_tx_hist_add(struct tfrc_tx_hist_entry **headp, u64 seqno); -extern void tfrc_tx_hist_purge(struct tfrc_tx_hist_entry **headp); +int tfrc_tx_hist_add(struct tfrc_tx_hist_entry **headp, u64 seqno); +void tfrc_tx_hist_purge(struct tfrc_tx_hist_entry **headp); /* Subtraction a-b modulo-16, respects circular wrap-around */ #define SUB16(a, b) (((a) + 16 - (b)) & 0xF) @@ -139,20 +139,17 @@ static inline bool tfrc_rx_hist_loss_pending(const struct tfrc_rx_hist *h) return h->loss_count > 0; } -extern void tfrc_rx_hist_add_packet(struct tfrc_rx_hist *h, - const struct sk_buff *skb, const u64 ndp); +void tfrc_rx_hist_add_packet(struct tfrc_rx_hist *h, const struct sk_buff *skb, + const u64 ndp); -extern int tfrc_rx_hist_duplicate(struct tfrc_rx_hist *h, struct sk_buff *skb); +int tfrc_rx_hist_duplicate(struct tfrc_rx_hist *h, struct sk_buff *skb); struct tfrc_loss_hist; -extern int tfrc_rx_handle_loss(struct tfrc_rx_hist *h, - struct tfrc_loss_hist *lh, - struct sk_buff *skb, const u64 ndp, - u32 (*first_li)(struct sock *sk), - struct sock *sk); -extern u32 tfrc_rx_hist_sample_rtt(struct tfrc_rx_hist *h, - const struct sk_buff *skb); -extern int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h); -extern void tfrc_rx_hist_purge(struct tfrc_rx_hist *h); +int tfrc_rx_handle_loss(struct tfrc_rx_hist *h, struct tfrc_loss_hist *lh, + struct sk_buff *skb, const u64 ndp, + u32 (*first_li)(struct sock *sk), struct sock *sk); +u32 tfrc_rx_hist_sample_rtt(struct tfrc_rx_hist *h, const struct sk_buff *skb); +int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h); +void tfrc_rx_hist_purge(struct tfrc_rx_hist *h); #endif /* _DCCP_PKT_HIST_ */ diff --git a/net/dccp/ccids/lib/tfrc.h b/net/dccp/ccids/lib/tfrc.h index ed698c42a5fb..40ee7d62b652 100644 --- a/net/dccp/ccids/lib/tfrc.h +++ b/net/dccp/ccids/lib/tfrc.h @@ -55,21 +55,21 @@ static inline u32 tfrc_ewma(const u32 avg, const u32 newval, const u8 weight) return avg ? (weight * avg + (10 - weight) * newval) / 10 : newval; } -extern u32 tfrc_calc_x(u16 s, u32 R, u32 p); -extern u32 tfrc_calc_x_reverse_lookup(u32 fvalue); -extern u32 tfrc_invert_loss_event_rate(u32 loss_event_rate); +u32 tfrc_calc_x(u16 s, u32 R, u32 p); +u32 tfrc_calc_x_reverse_lookup(u32 fvalue); +u32 tfrc_invert_loss_event_rate(u32 loss_event_rate); -extern int tfrc_tx_packet_history_init(void); -extern void tfrc_tx_packet_history_exit(void); -extern int tfrc_rx_packet_history_init(void); -extern void tfrc_rx_packet_history_exit(void); +int tfrc_tx_packet_history_init(void); +void tfrc_tx_packet_history_exit(void); +int tfrc_rx_packet_history_init(void); +void tfrc_rx_packet_history_exit(void); -extern int tfrc_li_init(void); -extern void tfrc_li_exit(void); +int tfrc_li_init(void); +void tfrc_li_exit(void); #ifdef CONFIG_IP_DCCP_TFRC_LIB -extern int tfrc_lib_init(void); -extern void tfrc_lib_exit(void); +int tfrc_lib_init(void); +void tfrc_lib_exit(void); #else #define tfrc_lib_init() (0) #define tfrc_lib_exit() diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h index 708e75bf623d..30948784dd58 100644 --- a/net/dccp/dccp.h +++ b/net/dccp/dccp.h @@ -53,7 +53,7 @@ extern struct inet_hashinfo dccp_hashinfo; extern struct percpu_counter dccp_orphan_count; -extern void dccp_time_wait(struct sock *sk, int state, int timeo); +void dccp_time_wait(struct sock *sk, int state, int timeo); /* * Set safe upper bounds for header and option length. Since Data Offset is 8 @@ -224,114 +224,108 @@ static inline void dccp_csum_outgoing(struct sk_buff *skb) skb->csum = skb_checksum(skb, 0, (cov > skb->len)? skb->len : cov, 0); } -extern void dccp_v4_send_check(struct sock *sk, struct sk_buff *skb); +void dccp_v4_send_check(struct sock *sk, struct sk_buff *skb); -extern int dccp_retransmit_skb(struct sock *sk); +int dccp_retransmit_skb(struct sock *sk); -extern void dccp_send_ack(struct sock *sk); -extern void dccp_reqsk_send_ack(struct sock *sk, struct sk_buff *skb, - struct request_sock *rsk); +void dccp_send_ack(struct sock *sk); +void dccp_reqsk_send_ack(struct sock *sk, struct sk_buff *skb, + struct request_sock *rsk); -extern void dccp_send_sync(struct sock *sk, const u64 seq, - const enum dccp_pkt_type pkt_type); +void dccp_send_sync(struct sock *sk, const u64 seq, + const enum dccp_pkt_type pkt_type); /* * TX Packet Dequeueing Interface */ -extern void dccp_qpolicy_push(struct sock *sk, struct sk_buff *skb); -extern bool dccp_qpolicy_full(struct sock *sk); -extern void dccp_qpolicy_drop(struct sock *sk, struct sk_buff *skb); -extern struct sk_buff *dccp_qpolicy_top(struct sock *sk); -extern struct sk_buff *dccp_qpolicy_pop(struct sock *sk); -extern bool dccp_qpolicy_param_ok(struct sock *sk, __be32 param); +void dccp_qpolicy_push(struct sock *sk, struct sk_buff *skb); +bool dccp_qpolicy_full(struct sock *sk); +void dccp_qpolicy_drop(struct sock *sk, struct sk_buff *skb); +struct sk_buff *dccp_qpolicy_top(struct sock *sk); +struct sk_buff *dccp_qpolicy_pop(struct sock *sk); +bool dccp_qpolicy_param_ok(struct sock *sk, __be32 param); /* * TX Packet Output and TX Timers */ -extern void dccp_write_xmit(struct sock *sk); -extern void dccp_write_space(struct sock *sk); -extern void dccp_flush_write_queue(struct sock *sk, long *time_budget); +void dccp_write_xmit(struct sock *sk); +void dccp_write_space(struct sock *sk); +void dccp_flush_write_queue(struct sock *sk, long *time_budget); -extern void dccp_init_xmit_timers(struct sock *sk); +void dccp_init_xmit_timers(struct sock *sk); static inline void dccp_clear_xmit_timers(struct sock *sk) { inet_csk_clear_xmit_timers(sk); } -extern unsigned int dccp_sync_mss(struct sock *sk, u32 pmtu); +unsigned int dccp_sync_mss(struct sock *sk, u32 pmtu); -extern const char *dccp_packet_name(const int type); +const char *dccp_packet_name(const int type); -extern void dccp_set_state(struct sock *sk, const int state); -extern void dccp_done(struct sock *sk); +void dccp_set_state(struct sock *sk, const int state); +void dccp_done(struct sock *sk); -extern int dccp_reqsk_init(struct request_sock *rq, struct dccp_sock const *dp, - struct sk_buff const *skb); +int dccp_reqsk_init(struct request_sock *rq, struct dccp_sock const *dp, + struct sk_buff const *skb); -extern int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb); +int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb); -extern struct sock *dccp_create_openreq_child(struct sock *sk, - const struct request_sock *req, - const struct sk_buff *skb); +struct sock *dccp_create_openreq_child(struct sock *sk, + const struct request_sock *req, + const struct sk_buff *skb); -extern int dccp_v4_do_rcv(struct sock *sk, struct sk_buff *skb); +int dccp_v4_do_rcv(struct sock *sk, struct sk_buff *skb); -extern struct sock *dccp_v4_request_recv_sock(struct sock *sk, - struct sk_buff *skb, - struct request_sock *req, - struct dst_entry *dst); -extern struct sock *dccp_check_req(struct sock *sk, struct sk_buff *skb, - struct request_sock *req, - struct request_sock **prev); +struct sock *dccp_v4_request_recv_sock(struct sock *sk, struct sk_buff *skb, + struct request_sock *req, + struct dst_entry *dst); +struct sock *dccp_check_req(struct sock *sk, struct sk_buff *skb, + struct request_sock *req, + struct request_sock **prev); -extern int dccp_child_process(struct sock *parent, struct sock *child, - struct sk_buff *skb); -extern int dccp_rcv_state_process(struct sock *sk, struct sk_buff *skb, - struct dccp_hdr *dh, unsigned int len); -extern int dccp_rcv_established(struct sock *sk, struct sk_buff *skb, - const struct dccp_hdr *dh, const unsigned int len); +int dccp_child_process(struct sock *parent, struct sock *child, + struct sk_buff *skb); +int dccp_rcv_state_process(struct sock *sk, struct sk_buff *skb, + struct dccp_hdr *dh, unsigned int len); +int dccp_rcv_established(struct sock *sk, struct sk_buff *skb, + const struct dccp_hdr *dh, const unsigned int len); -extern int dccp_init_sock(struct sock *sk, const __u8 ctl_sock_initialized); -extern void dccp_destroy_sock(struct sock *sk); +int dccp_init_sock(struct sock *sk, const __u8 ctl_sock_initialized); +void dccp_destroy_sock(struct sock *sk); -extern void dccp_close(struct sock *sk, long timeout); -extern struct sk_buff *dccp_make_response(struct sock *sk, - struct dst_entry *dst, - struct request_sock *req); +void dccp_close(struct sock *sk, long timeout); +struct sk_buff *dccp_make_response(struct sock *sk, struct dst_entry *dst, + struct request_sock *req); -extern int dccp_connect(struct sock *sk); -extern int dccp_disconnect(struct sock *sk, int flags); -extern int dccp_getsockopt(struct sock *sk, int level, int optname, - char __user *optval, int __user *optlen); -extern int dccp_setsockopt(struct sock *sk, int level, int optname, - char __user *optval, unsigned int optlen); +int dccp_connect(struct sock *sk); +int dccp_disconnect(struct sock *sk, int flags); +int dccp_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen); +int dccp_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, unsigned int optlen); #ifdef CONFIG_COMPAT -extern int compat_dccp_getsockopt(struct sock *sk, - int level, int optname, - char __user *optval, int __user *optlen); -extern int compat_dccp_setsockopt(struct sock *sk, - int level, int optname, - char __user *optval, unsigned int optlen); +int compat_dccp_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen); +int compat_dccp_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, unsigned int optlen); #endif -extern int dccp_ioctl(struct sock *sk, int cmd, unsigned long arg); -extern int dccp_sendmsg(struct kiocb *iocb, struct sock *sk, - struct msghdr *msg, size_t size); -extern int dccp_recvmsg(struct kiocb *iocb, struct sock *sk, - struct msghdr *msg, size_t len, int nonblock, - int flags, int *addr_len); -extern void dccp_shutdown(struct sock *sk, int how); -extern int inet_dccp_listen(struct socket *sock, int backlog); -extern unsigned int dccp_poll(struct file *file, struct socket *sock, - poll_table *wait); -extern int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, - int addr_len); - -extern struct sk_buff *dccp_ctl_make_reset(struct sock *sk, - struct sk_buff *skb); -extern int dccp_send_reset(struct sock *sk, enum dccp_reset_codes code); -extern void dccp_send_close(struct sock *sk, const int active); -extern int dccp_invalid_packet(struct sk_buff *skb); -extern u32 dccp_sample_rtt(struct sock *sk, long delta); +int dccp_ioctl(struct sock *sk, int cmd, unsigned long arg); +int dccp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, + size_t size); +int dccp_recvmsg(struct kiocb *iocb, struct sock *sk, + struct msghdr *msg, size_t len, int nonblock, int flags, + int *addr_len); +void dccp_shutdown(struct sock *sk, int how); +int inet_dccp_listen(struct socket *sock, int backlog); +unsigned int dccp_poll(struct file *file, struct socket *sock, + poll_table *wait); +int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len); + +struct sk_buff *dccp_ctl_make_reset(struct sock *sk, struct sk_buff *skb); +int dccp_send_reset(struct sock *sk, enum dccp_reset_codes code); +void dccp_send_close(struct sock *sk, const int active); +int dccp_invalid_packet(struct sk_buff *skb); +u32 dccp_sample_rtt(struct sock *sk, long delta); static inline int dccp_bad_service_code(const struct sock *sk, const __be32 service) @@ -475,25 +469,25 @@ static inline int dccp_ack_pending(const struct sock *sk) return dccp_ackvec_pending(sk) || inet_csk_ack_scheduled(sk); } -extern int dccp_feat_signal_nn_change(struct sock *sk, u8 feat, u64 nn_val); -extern int dccp_feat_finalise_settings(struct dccp_sock *dp); -extern int dccp_feat_server_ccid_dependencies(struct dccp_request_sock *dreq); -extern int dccp_feat_insert_opts(struct dccp_sock*, struct dccp_request_sock*, - struct sk_buff *skb); -extern int dccp_feat_activate_values(struct sock *sk, struct list_head *fn); -extern void dccp_feat_list_purge(struct list_head *fn_list); - -extern int dccp_insert_options(struct sock *sk, struct sk_buff *skb); -extern int dccp_insert_options_rsk(struct dccp_request_sock*, struct sk_buff*); -extern int dccp_insert_option_elapsed_time(struct sk_buff *skb, u32 elapsed); -extern u32 dccp_timestamp(void); -extern void dccp_timestamping_init(void); -extern int dccp_insert_option(struct sk_buff *skb, unsigned char option, - const void *value, unsigned char len); +int dccp_feat_signal_nn_change(struct sock *sk, u8 feat, u64 nn_val); +int dccp_feat_finalise_settings(struct dccp_sock *dp); +int dccp_feat_server_ccid_dependencies(struct dccp_request_sock *dreq); +int dccp_feat_insert_opts(struct dccp_sock*, struct dccp_request_sock*, + struct sk_buff *skb); +int dccp_feat_activate_values(struct sock *sk, struct list_head *fn); +void dccp_feat_list_purge(struct list_head *fn_list); + +int dccp_insert_options(struct sock *sk, struct sk_buff *skb); +int dccp_insert_options_rsk(struct dccp_request_sock *, struct sk_buff *); +int dccp_insert_option_elapsed_time(struct sk_buff *skb, u32 elapsed); +u32 dccp_timestamp(void); +void dccp_timestamping_init(void); +int dccp_insert_option(struct sk_buff *skb, unsigned char option, + const void *value, unsigned char len); #ifdef CONFIG_SYSCTL -extern int dccp_sysctl_init(void); -extern void dccp_sysctl_exit(void); +int dccp_sysctl_init(void); +void dccp_sysctl_exit(void); #else static inline int dccp_sysctl_init(void) { diff --git a/net/dccp/feat.h b/net/dccp/feat.h index 90b957d34d26..0e75cebb2187 100644 --- a/net/dccp/feat.h +++ b/net/dccp/feat.h @@ -107,13 +107,13 @@ extern unsigned long sysctl_dccp_sequence_window; extern int sysctl_dccp_rx_ccid; extern int sysctl_dccp_tx_ccid; -extern int dccp_feat_init(struct sock *sk); -extern void dccp_feat_initialise_sysctls(void); -extern int dccp_feat_register_sp(struct sock *sk, u8 feat, u8 is_local, - u8 const *list, u8 len); -extern int dccp_feat_parse_options(struct sock *, struct dccp_request_sock *, - u8 mand, u8 opt, u8 feat, u8 *val, u8 len); -extern int dccp_feat_clone_list(struct list_head const *, struct list_head *); +int dccp_feat_init(struct sock *sk); +void dccp_feat_initialise_sysctls(void); +int dccp_feat_register_sp(struct sock *sk, u8 feat, u8 is_local, + u8 const *list, u8 len); +int dccp_feat_parse_options(struct sock *, struct dccp_request_sock *, + u8 mand, u8 opt, u8 feat, u8 *val, u8 len); +int dccp_feat_clone_list(struct list_head const *, struct list_head *); /* * Encoding variable-length options and their maximum length. @@ -127,11 +127,11 @@ extern int dccp_feat_clone_list(struct list_head const *, struct list_head *); */ #define DCCP_OPTVAL_MAXLEN 6 -extern void dccp_encode_value_var(const u64 value, u8 *to, const u8 len); -extern u64 dccp_decode_value_var(const u8 *bf, const u8 len); -extern u64 dccp_feat_nn_get(struct sock *sk, u8 feat); +void dccp_encode_value_var(const u64 value, u8 *to, const u8 len); +u64 dccp_decode_value_var(const u8 *bf, const u8 len); +u64 dccp_feat_nn_get(struct sock *sk, u8 feat); -extern int dccp_insert_option_mandatory(struct sk_buff *skb); -extern int dccp_insert_fn_opt(struct sk_buff *skb, u8 type, u8 feat, - u8 *val, u8 len, bool repeat_first); +int dccp_insert_option_mandatory(struct sk_buff *skb); +int dccp_insert_fn_opt(struct sk_buff *skb, u8 type, u8 feat, u8 *val, u8 len, + bool repeat_first); #endif /* _DCCP_FEAT_H */ diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c index ebc54fef85a5..d9f65fc66db5 100644 --- a/net/dccp/ipv4.c +++ b/net/dccp/ipv4.c @@ -174,6 +174,7 @@ static inline void dccp_do_pmtu_discovery(struct sock *sk, mtu = dst_mtu(dst); if (inet->pmtudisc != IP_PMTUDISC_DONT && + ip_sk_accept_pmtu(sk) && inet_csk(sk)->icsk_pmtu_cookie > mtu) { dccp_sync_mss(sk, mtu); @@ -409,9 +410,9 @@ struct sock *dccp_v4_request_recv_sock(struct sock *sk, struct sk_buff *skb, newinet = inet_sk(newsk); ireq = inet_rsk(req); - newinet->inet_daddr = ireq->rmt_addr; - newinet->inet_rcv_saddr = ireq->loc_addr; - newinet->inet_saddr = ireq->loc_addr; + newinet->inet_daddr = ireq->ir_rmt_addr; + newinet->inet_rcv_saddr = ireq->ir_loc_addr; + newinet->inet_saddr = ireq->ir_loc_addr; newinet->inet_opt = ireq->opt; ireq->opt = NULL; newinet->mc_index = inet_iif(skb); @@ -516,10 +517,10 @@ static int dccp_v4_send_response(struct sock *sk, struct request_sock *req) const struct inet_request_sock *ireq = inet_rsk(req); struct dccp_hdr *dh = dccp_hdr(skb); - dh->dccph_checksum = dccp_v4_csum_finish(skb, ireq->loc_addr, - ireq->rmt_addr); - err = ip_build_and_send_pkt(skb, sk, ireq->loc_addr, - ireq->rmt_addr, + dh->dccph_checksum = dccp_v4_csum_finish(skb, ireq->ir_loc_addr, + ireq->ir_rmt_addr); + err = ip_build_and_send_pkt(skb, sk, ireq->ir_loc_addr, + ireq->ir_rmt_addr, ireq->opt); err = net_xmit_eval(err); } @@ -641,8 +642,8 @@ int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb) goto drop_and_free; ireq = inet_rsk(req); - ireq->loc_addr = ip_hdr(skb)->daddr; - ireq->rmt_addr = ip_hdr(skb)->saddr; + ireq->ir_loc_addr = ip_hdr(skb)->daddr; + ireq->ir_rmt_addr = ip_hdr(skb)->saddr; /* * Step 3: Process LISTEN state diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index 6cf9f7782ad4..4ac71ff7c2e4 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -67,7 +67,7 @@ static inline void dccp_v6_send_check(struct sock *sk, struct sk_buff *skb) struct dccp_hdr *dh = dccp_hdr(skb); dccp_csum_outgoing(skb); - dh->dccph_checksum = dccp_v6_csum_finish(skb, &np->saddr, &np->daddr); + dh->dccph_checksum = dccp_v6_csum_finish(skb, &np->saddr, &sk->sk_v6_daddr); } static inline __u64 dccp_v6_init_sequence(struct sk_buff *skb) @@ -216,7 +216,7 @@ out: static int dccp_v6_send_response(struct sock *sk, struct request_sock *req) { - struct inet6_request_sock *ireq6 = inet6_rsk(req); + struct inet_request_sock *ireq = inet_rsk(req); struct ipv6_pinfo *np = inet6_sk(sk); struct sk_buff *skb; struct in6_addr *final_p, final; @@ -226,12 +226,12 @@ static int dccp_v6_send_response(struct sock *sk, struct request_sock *req) memset(&fl6, 0, sizeof(fl6)); fl6.flowi6_proto = IPPROTO_DCCP; - fl6.daddr = ireq6->rmt_addr; - fl6.saddr = ireq6->loc_addr; + fl6.daddr = ireq->ir_v6_rmt_addr; + fl6.saddr = ireq->ir_v6_loc_addr; fl6.flowlabel = 0; - fl6.flowi6_oif = ireq6->iif; - fl6.fl6_dport = inet_rsk(req)->rmt_port; - fl6.fl6_sport = inet_rsk(req)->loc_port; + fl6.flowi6_oif = ireq->ir_iif; + fl6.fl6_dport = ireq->ir_rmt_port; + fl6.fl6_sport = htons(ireq->ir_num); security_req_classify_flow(req, flowi6_to_flowi(&fl6)); @@ -249,9 +249,9 @@ static int dccp_v6_send_response(struct sock *sk, struct request_sock *req) struct dccp_hdr *dh = dccp_hdr(skb); dh->dccph_checksum = dccp_v6_csum_finish(skb, - &ireq6->loc_addr, - &ireq6->rmt_addr); - fl6.daddr = ireq6->rmt_addr; + &ireq->ir_v6_loc_addr, + &ireq->ir_v6_rmt_addr); + fl6.daddr = ireq->ir_v6_rmt_addr; err = ip6_xmit(sk, skb, &fl6, np->opt, np->tclass); err = net_xmit_eval(err); } @@ -264,8 +264,7 @@ done: static void dccp_v6_reqsk_destructor(struct request_sock *req) { dccp_feat_list_purge(&dccp_rsk(req)->dreq_featneg); - if (inet6_rsk(req)->pktopts != NULL) - kfree_skb(inet6_rsk(req)->pktopts); + kfree_skb(inet_rsk(req)->pktopts); } static void dccp_v6_ctl_send_reset(struct sock *sk, struct sk_buff *rxskb) @@ -359,7 +358,7 @@ static int dccp_v6_conn_request(struct sock *sk, struct sk_buff *skb) { struct request_sock *req; struct dccp_request_sock *dreq; - struct inet6_request_sock *ireq6; + struct inet_request_sock *ireq; struct ipv6_pinfo *np = inet6_sk(sk); const __be32 service = dccp_hdr_request(skb)->dccph_req_service; struct dccp_skb_cb *dcb = DCCP_SKB_CB(skb); @@ -398,22 +397,22 @@ static int dccp_v6_conn_request(struct sock *sk, struct sk_buff *skb) if (security_inet_conn_request(sk, skb, req)) goto drop_and_free; - ireq6 = inet6_rsk(req); - ireq6->rmt_addr = ipv6_hdr(skb)->saddr; - ireq6->loc_addr = ipv6_hdr(skb)->daddr; + ireq = inet_rsk(req); + ireq->ir_v6_rmt_addr = ipv6_hdr(skb)->saddr; + ireq->ir_v6_loc_addr = ipv6_hdr(skb)->daddr; if (ipv6_opt_accepted(sk, skb) || np->rxopt.bits.rxinfo || np->rxopt.bits.rxoinfo || np->rxopt.bits.rxhlim || np->rxopt.bits.rxohlim) { atomic_inc(&skb->users); - ireq6->pktopts = skb; + ireq->pktopts = skb; } - ireq6->iif = sk->sk_bound_dev_if; + ireq->ir_iif = sk->sk_bound_dev_if; /* So that link locals have meaning */ if (!sk->sk_bound_dev_if && - ipv6_addr_type(&ireq6->rmt_addr) & IPV6_ADDR_LINKLOCAL) - ireq6->iif = inet6_iif(skb); + ipv6_addr_type(&ireq->ir_v6_rmt_addr) & IPV6_ADDR_LINKLOCAL) + ireq->ir_iif = inet6_iif(skb); /* * Step 3: Process LISTEN state @@ -446,7 +445,7 @@ static struct sock *dccp_v6_request_recv_sock(struct sock *sk, struct request_sock *req, struct dst_entry *dst) { - struct inet6_request_sock *ireq6 = inet6_rsk(req); + struct inet_request_sock *ireq = inet_rsk(req); struct ipv6_pinfo *newnp, *np = inet6_sk(sk); struct inet_sock *newinet; struct dccp6_sock *newdp6; @@ -467,11 +466,11 @@ static struct sock *dccp_v6_request_recv_sock(struct sock *sk, memcpy(newnp, np, sizeof(struct ipv6_pinfo)); - ipv6_addr_set_v4mapped(newinet->inet_daddr, &newnp->daddr); + ipv6_addr_set_v4mapped(newinet->inet_daddr, &newsk->sk_v6_daddr); ipv6_addr_set_v4mapped(newinet->inet_saddr, &newnp->saddr); - newnp->rcv_saddr = newnp->saddr; + newsk->sk_v6_rcv_saddr = newnp->saddr; inet_csk(newsk)->icsk_af_ops = &dccp_ipv6_mapped; newsk->sk_backlog_rcv = dccp_v4_do_rcv; @@ -505,12 +504,12 @@ static struct sock *dccp_v6_request_recv_sock(struct sock *sk, memset(&fl6, 0, sizeof(fl6)); fl6.flowi6_proto = IPPROTO_DCCP; - fl6.daddr = ireq6->rmt_addr; + fl6.daddr = ireq->ir_v6_rmt_addr; final_p = fl6_update_dst(&fl6, np->opt, &final); - fl6.saddr = ireq6->loc_addr; + fl6.saddr = ireq->ir_v6_loc_addr; fl6.flowi6_oif = sk->sk_bound_dev_if; - fl6.fl6_dport = inet_rsk(req)->rmt_port; - fl6.fl6_sport = inet_rsk(req)->loc_port; + fl6.fl6_dport = ireq->ir_rmt_port; + fl6.fl6_sport = htons(ireq->ir_num); security_sk_classify_flow(sk, flowi6_to_flowi(&fl6)); dst = ip6_dst_lookup_flow(sk, &fl6, final_p, false); @@ -538,10 +537,10 @@ static struct sock *dccp_v6_request_recv_sock(struct sock *sk, memcpy(newnp, np, sizeof(struct ipv6_pinfo)); - newnp->daddr = ireq6->rmt_addr; - newnp->saddr = ireq6->loc_addr; - newnp->rcv_saddr = ireq6->loc_addr; - newsk->sk_bound_dev_if = ireq6->iif; + newsk->sk_v6_daddr = ireq->ir_v6_rmt_addr; + newnp->saddr = ireq->ir_v6_loc_addr; + newsk->sk_v6_rcv_saddr = ireq->ir_v6_loc_addr; + newsk->sk_bound_dev_if = ireq->ir_iif; /* Now IPv6 options... @@ -554,10 +553,10 @@ static struct sock *dccp_v6_request_recv_sock(struct sock *sk, /* Clone pktoptions received with SYN */ newnp->pktoptions = NULL; - if (ireq6->pktopts != NULL) { - newnp->pktoptions = skb_clone(ireq6->pktopts, GFP_ATOMIC); - consume_skb(ireq6->pktopts); - ireq6->pktopts = NULL; + if (ireq->pktopts != NULL) { + newnp->pktoptions = skb_clone(ireq->pktopts, GFP_ATOMIC); + consume_skb(ireq->pktopts); + ireq->pktopts = NULL; if (newnp->pktoptions) skb_set_owner_r(newnp->pktoptions, newsk); } @@ -885,7 +884,7 @@ static int dccp_v6_connect(struct sock *sk, struct sockaddr *uaddr, return -EINVAL; } - np->daddr = usin->sin6_addr; + sk->sk_v6_daddr = usin->sin6_addr; np->flow_label = fl6.flowlabel; /* @@ -915,16 +914,16 @@ static int dccp_v6_connect(struct sock *sk, struct sockaddr *uaddr, goto failure; } ipv6_addr_set_v4mapped(inet->inet_saddr, &np->saddr); - ipv6_addr_set_v4mapped(inet->inet_rcv_saddr, &np->rcv_saddr); + ipv6_addr_set_v4mapped(inet->inet_rcv_saddr, &sk->sk_v6_rcv_saddr); return err; } - if (!ipv6_addr_any(&np->rcv_saddr)) - saddr = &np->rcv_saddr; + if (!ipv6_addr_any(&sk->sk_v6_rcv_saddr)) + saddr = &sk->sk_v6_rcv_saddr; fl6.flowi6_proto = IPPROTO_DCCP; - fl6.daddr = np->daddr; + fl6.daddr = sk->sk_v6_daddr; fl6.saddr = saddr ? *saddr : np->saddr; fl6.flowi6_oif = sk->sk_bound_dev_if; fl6.fl6_dport = usin->sin6_port; @@ -941,7 +940,7 @@ static int dccp_v6_connect(struct sock *sk, struct sockaddr *uaddr, if (saddr == NULL) { saddr = &fl6.saddr; - np->rcv_saddr = *saddr; + sk->sk_v6_rcv_saddr = *saddr; } /* set the source address */ @@ -963,7 +962,7 @@ static int dccp_v6_connect(struct sock *sk, struct sockaddr *uaddr, goto late_failure; dp->dccps_iss = secure_dccpv6_sequence_number(np->saddr.s6_addr32, - np->daddr.s6_addr32, + sk->sk_v6_daddr.s6_addr32, inet->inet_sport, inet->inet_dport); err = dccp_connect(sk); diff --git a/net/dccp/ipv6.h b/net/dccp/ipv6.h index 6eef81fdbe56..af259e15e7f0 100644 --- a/net/dccp/ipv6.h +++ b/net/dccp/ipv6.h @@ -25,12 +25,10 @@ struct dccp6_sock { struct dccp6_request_sock { struct dccp_request_sock dccp; - struct inet6_request_sock inet6; }; struct dccp6_timewait_sock { struct inet_timewait_sock inet; - struct inet6_timewait_sock tw6; }; #endif /* _DCCP_IPV6_H */ diff --git a/net/dccp/minisocks.c b/net/dccp/minisocks.c index 662071b249cc..9e2f78bc1553 100644 --- a/net/dccp/minisocks.c +++ b/net/dccp/minisocks.c @@ -56,12 +56,9 @@ void dccp_time_wait(struct sock *sk, int state, int timeo) #if IS_ENABLED(CONFIG_IPV6) if (tw->tw_family == PF_INET6) { const struct ipv6_pinfo *np = inet6_sk(sk); - struct inet6_timewait_sock *tw6; - tw->tw_ipv6_offset = inet6_tw_offset(sk->sk_prot); - tw6 = inet6_twsk((struct sock *)tw); - tw6->tw_v6_daddr = np->daddr; - tw6->tw_v6_rcv_saddr = np->rcv_saddr; + tw->tw_v6_daddr = sk->sk_v6_daddr; + tw->tw_v6_rcv_saddr = sk->sk_v6_rcv_saddr; tw->tw_ipv6only = np->ipv6only; } #endif @@ -269,10 +266,10 @@ int dccp_reqsk_init(struct request_sock *req, { struct dccp_request_sock *dreq = dccp_rsk(req); - inet_rsk(req)->rmt_port = dccp_hdr(skb)->dccph_sport; - inet_rsk(req)->loc_port = dccp_hdr(skb)->dccph_dport; - inet_rsk(req)->acked = 0; - dreq->dreq_timestamp_echo = 0; + inet_rsk(req)->ir_rmt_port = dccp_hdr(skb)->dccph_sport; + inet_rsk(req)->ir_num = ntohs(dccp_hdr(skb)->dccph_dport); + inet_rsk(req)->acked = 0; + dreq->dreq_timestamp_echo = 0; /* inherit feature negotiation options from listening socket */ return dccp_feat_clone_list(&dp->dccps_featneg, &dreq->dreq_featneg); diff --git a/net/dccp/output.c b/net/dccp/output.c index d17fc90a74b6..8876078859da 100644 --- a/net/dccp/output.c +++ b/net/dccp/output.c @@ -424,8 +424,8 @@ struct sk_buff *dccp_make_response(struct sock *sk, struct dst_entry *dst, /* Build and checksum header */ dh = dccp_zeroed_hdr(skb, dccp_header_size); - dh->dccph_sport = inet_rsk(req)->loc_port; - dh->dccph_dport = inet_rsk(req)->rmt_port; + dh->dccph_sport = htons(inet_rsk(req)->ir_num); + dh->dccph_dport = inet_rsk(req)->ir_rmt_port; dh->dccph_doff = (dccp_header_size + DCCP_SKB_CB(skb)->dccpd_opt_len) / 4; dh->dccph_type = DCCP_PKT_RESPONSE; diff --git a/net/dccp/proto.c b/net/dccp/proto.c index ba64750f0387..eb892b4f4814 100644 --- a/net/dccp/proto.c +++ b/net/dccp/proto.c @@ -1158,10 +1158,8 @@ static int __init dccp_init(void) goto out_free_bind_bucket_cachep; } - for (i = 0; i <= dccp_hashinfo.ehash_mask; i++) { + for (i = 0; i <= dccp_hashinfo.ehash_mask; i++) INIT_HLIST_NULLS_HEAD(&dccp_hashinfo.ehash[i].chain, i); - INIT_HLIST_NULLS_HEAD(&dccp_hashinfo.ehash[i].twchain, i); - } if (inet_ehash_locks_alloc(&dccp_hashinfo)) goto out_free_dccp_ehash; diff --git a/net/decnet/netfilter/dn_rtmsg.c b/net/decnet/netfilter/dn_rtmsg.c index 2a7efe388344..e83015cecfa7 100644 --- a/net/decnet/netfilter/dn_rtmsg.c +++ b/net/decnet/netfilter/dn_rtmsg.c @@ -87,7 +87,7 @@ static void dnrmg_send_peer(struct sk_buff *skb) } -static unsigned int dnrmg_hook(unsigned int hook, +static unsigned int dnrmg_hook(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c index be1f64d35358..8f032bae60ad 100644 --- a/net/ethernet/eth.c +++ b/net/ethernet/eth.c @@ -58,7 +58,7 @@ #include <net/ipv6.h> #include <net/ip.h> #include <net/dsa.h> -#include <asm/uaccess.h> +#include <linux/uaccess.h> __setup("ether=", netdev_boot_setup); @@ -133,7 +133,7 @@ int eth_rebuild_header(struct sk_buff *skb) return arp_find(eth->h_dest, skb); #endif default: - printk(KERN_DEBUG + netdev_dbg(dev, "%s: unable to resolve type %X addresses.\n", dev->name, ntohs(eth->h_proto)); @@ -169,20 +169,9 @@ __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev) else skb->pkt_type = PACKET_MULTICAST; } - - /* - * This ALLMULTI check should be redundant by 1.4 - * so don't forget to remove it. - * - * Seems, you forgot to remove it. All silly devices - * seems to set IFF_PROMISC. - */ - - else if (1 /*dev->flags&IFF_PROMISC */ ) { - if (unlikely(!ether_addr_equal_64bits(eth->h_dest, - dev->dev_addr))) - skb->pkt_type = PACKET_OTHERHOST; - } + else if (unlikely(!ether_addr_equal_64bits(eth->h_dest, + dev->dev_addr))) + skb->pkt_type = PACKET_OTHERHOST; /* * Some variants of DSA tagging don't have an ethertype field @@ -190,12 +179,13 @@ __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev) * variants has been configured on the receiving interface, * and if so, set skb->protocol without looking at the packet. */ - if (netdev_uses_dsa_tags(dev)) + if (unlikely(netdev_uses_dsa_tags(dev))) return htons(ETH_P_DSA); - if (netdev_uses_trailer_tags(dev)) + + if (unlikely(netdev_uses_trailer_tags(dev))) return htons(ETH_P_TRAILER); - if (ntohs(eth->h_proto) >= ETH_P_802_3_MIN) + if (likely(ntohs(eth->h_proto) >= ETH_P_802_3_MIN)) return eth->h_proto; /* @@ -204,7 +194,7 @@ __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev) * layer. We look for FFFF which isn't a used 802.2 SSAP/DSAP. This * won't work for fault tolerant netware but does for the rest. */ - if (skb->len >= 2 && *(unsigned short *)(skb->data) == 0xFFFF) + if (unlikely(skb->len >= 2 && *(unsigned short *)(skb->data) == 0xFFFF)) return htons(ETH_P_802_3); /* diff --git a/net/hsr/Kconfig b/net/hsr/Kconfig new file mode 100644 index 000000000000..0d3d709052ca --- /dev/null +++ b/net/hsr/Kconfig @@ -0,0 +1,27 @@ +# +# IEC 62439-3 High-availability Seamless Redundancy +# + +config HSR + tristate "High-availability Seamless Redundancy (HSR)" + ---help--- + If you say Y here, then your Linux box will be able to act as a + DANH ("Doubly attached node implementing HSR"). For this to work, + your Linux box needs (at least) two physical Ethernet interfaces, + and it must be connected as a node in a ring network together with + other HSR capable nodes. + + All Ethernet frames sent over the hsr device will be sent in both + directions on the ring (over both slave ports), giving a redundant, + instant fail-over network. Each HSR node in the ring acts like a + bridge for HSR frames, but filters frames that have been forwarded + earlier. + + This code is a "best effort" to comply with the HSR standard as + described in IEC 62439-3:2010 (HSRv0), but no compliancy tests have + been made. + + You need to perform any and all necessary tests yourself before + relying on this code in a safety critical system! + + If unsure, say N. diff --git a/net/hsr/Makefile b/net/hsr/Makefile new file mode 100644 index 000000000000..b68359f181cc --- /dev/null +++ b/net/hsr/Makefile @@ -0,0 +1,7 @@ +# +# Makefile for HSR +# + +obj-$(CONFIG_HSR) += hsr.o + +hsr-y := hsr_main.o hsr_framereg.o hsr_device.o hsr_netlink.o diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c new file mode 100644 index 000000000000..cac505f166d5 --- /dev/null +++ b/net/hsr/hsr_device.c @@ -0,0 +1,596 @@ +/* Copyright 2011-2013 Autronica Fire and Security AS + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the Free + * Software Foundation; either version 2 of the License, or (at your option) + * any later version. + * + * Author(s): + * 2011-2013 Arvid Brodin, arvid.brodin@xdin.com + * + * This file contains device methods for creating, using and destroying + * virtual HSR devices. + */ + +#include <linux/netdevice.h> +#include <linux/skbuff.h> +#include <linux/etherdevice.h> +#include <linux/if_arp.h> +#include <linux/rtnetlink.h> +#include <linux/pkt_sched.h> +#include "hsr_device.h" +#include "hsr_framereg.h" +#include "hsr_main.h" + + +static bool is_admin_up(struct net_device *dev) +{ + return dev && (dev->flags & IFF_UP); +} + +static bool is_slave_up(struct net_device *dev) +{ + return dev && is_admin_up(dev) && netif_oper_up(dev); +} + +static void __hsr_set_operstate(struct net_device *dev, int transition) +{ + write_lock_bh(&dev_base_lock); + if (dev->operstate != transition) { + dev->operstate = transition; + write_unlock_bh(&dev_base_lock); + netdev_state_change(dev); + } else { + write_unlock_bh(&dev_base_lock); + } +} + +void hsr_set_operstate(struct net_device *hsr_dev, struct net_device *slave1, + struct net_device *slave2) +{ + if (!is_admin_up(hsr_dev)) { + __hsr_set_operstate(hsr_dev, IF_OPER_DOWN); + return; + } + + if (is_slave_up(slave1) || is_slave_up(slave2)) + __hsr_set_operstate(hsr_dev, IF_OPER_UP); + else + __hsr_set_operstate(hsr_dev, IF_OPER_LOWERLAYERDOWN); +} + +void hsr_set_carrier(struct net_device *hsr_dev, struct net_device *slave1, + struct net_device *slave2) +{ + if (is_slave_up(slave1) || is_slave_up(slave2)) + netif_carrier_on(hsr_dev); + else + netif_carrier_off(hsr_dev); +} + + +void hsr_check_announce(struct net_device *hsr_dev, int old_operstate) +{ + struct hsr_priv *hsr_priv; + + hsr_priv = netdev_priv(hsr_dev); + + if ((hsr_dev->operstate == IF_OPER_UP) && (old_operstate != IF_OPER_UP)) { + /* Went up */ + hsr_priv->announce_count = 0; + hsr_priv->announce_timer.expires = jiffies + + msecs_to_jiffies(HSR_ANNOUNCE_INTERVAL); + add_timer(&hsr_priv->announce_timer); + } + + if ((hsr_dev->operstate != IF_OPER_UP) && (old_operstate == IF_OPER_UP)) + /* Went down */ + del_timer(&hsr_priv->announce_timer); +} + + +int hsr_get_max_mtu(struct hsr_priv *hsr_priv) +{ + int mtu_max; + + if (hsr_priv->slave[0] && hsr_priv->slave[1]) + mtu_max = min(hsr_priv->slave[0]->mtu, hsr_priv->slave[1]->mtu); + else if (hsr_priv->slave[0]) + mtu_max = hsr_priv->slave[0]->mtu; + else if (hsr_priv->slave[1]) + mtu_max = hsr_priv->slave[1]->mtu; + else + mtu_max = HSR_TAGLEN; + + return mtu_max - HSR_TAGLEN; +} + +static int hsr_dev_change_mtu(struct net_device *dev, int new_mtu) +{ + struct hsr_priv *hsr_priv; + + hsr_priv = netdev_priv(dev); + + if (new_mtu > hsr_get_max_mtu(hsr_priv)) { + netdev_info(hsr_priv->dev, "A HSR master's MTU cannot be greater than the smallest MTU of its slaves minus the HSR Tag length (%d octets).\n", + HSR_TAGLEN); + return -EINVAL; + } + + dev->mtu = new_mtu; + + return 0; +} + +static int hsr_dev_open(struct net_device *dev) +{ + struct hsr_priv *hsr_priv; + int i; + char *slave_name; + + hsr_priv = netdev_priv(dev); + + for (i = 0; i < HSR_MAX_SLAVE; i++) { + if (hsr_priv->slave[i]) + slave_name = hsr_priv->slave[i]->name; + else + slave_name = "null"; + + if (!is_slave_up(hsr_priv->slave[i])) + netdev_warn(dev, "Slave %c (%s) is not up; please bring it up to get a working HSR network\n", + 'A' + i, slave_name); + } + + return 0; +} + +static int hsr_dev_close(struct net_device *dev) +{ + /* Nothing to do here. We could try to restore the state of the slaves + * to what they were before being changed by the hsr master dev's state, + * but they might have been changed manually in the mean time too, so + * taking them up or down here might be confusing and is probably not a + * good idea. + */ + return 0; +} + + +static void hsr_fill_tag(struct hsr_ethhdr *hsr_ethhdr, struct hsr_priv *hsr_priv) +{ + unsigned long irqflags; + + /* IEC 62439-1:2010, p 48, says the 4-bit "path" field can take values + * between 0001-1001 ("ring identifier", for regular HSR frames), + * or 1111 ("HSR management", supervision frames). Unfortunately, the + * spec writers forgot to explain what a "ring identifier" is, or + * how it is used. So we just set this to 0001 for regular frames, + * and 1111 for supervision frames. + */ + set_hsr_tag_path(&hsr_ethhdr->hsr_tag, 0x1); + + /* IEC 62439-1:2010, p 12: "The link service data unit in an Ethernet + * frame is the content of the frame located between the Length/Type + * field and the Frame Check Sequence." + * + * IEC 62439-3, p 48, specifies the "original LPDU" to include the + * original "LT" field (what "LT" means is not explained anywhere as + * far as I can see - perhaps "Length/Type"?). So LSDU_size might + * equal original length + 2. + * Also, the fact that this field is not used anywhere (might be used + * by a RedBox connecting HSR and PRP nets?) means I cannot test its + * correctness. Instead of guessing, I set this to 0 here, to make any + * problems immediately apparent. Anyone using this driver with PRP/HSR + * RedBoxes might need to fix this... + */ + set_hsr_tag_LSDU_size(&hsr_ethhdr->hsr_tag, 0); + + spin_lock_irqsave(&hsr_priv->seqnr_lock, irqflags); + hsr_ethhdr->hsr_tag.sequence_nr = htons(hsr_priv->sequence_nr); + hsr_priv->sequence_nr++; + spin_unlock_irqrestore(&hsr_priv->seqnr_lock, irqflags); + + hsr_ethhdr->hsr_tag.encap_proto = hsr_ethhdr->ethhdr.h_proto; + + hsr_ethhdr->ethhdr.h_proto = htons(ETH_P_PRP); +} + +static int slave_xmit(struct sk_buff *skb, struct hsr_priv *hsr_priv, + enum hsr_dev_idx dev_idx) +{ + struct hsr_ethhdr *hsr_ethhdr; + + hsr_ethhdr = (struct hsr_ethhdr *) skb->data; + + skb->dev = hsr_priv->slave[dev_idx]; + + hsr_addr_subst_dest(hsr_priv, &hsr_ethhdr->ethhdr, dev_idx); + + /* Address substitution (IEC62439-3 pp 26, 50): replace mac + * address of outgoing frame with that of the outgoing slave's. + */ + memcpy(hsr_ethhdr->ethhdr.h_source, skb->dev->dev_addr, ETH_ALEN); + + return dev_queue_xmit(skb); +} + + +static int hsr_dev_xmit(struct sk_buff *skb, struct net_device *dev) +{ + struct hsr_priv *hsr_priv; + struct hsr_ethhdr *hsr_ethhdr; + struct sk_buff *skb2; + int res1, res2; + + hsr_priv = netdev_priv(dev); + hsr_ethhdr = (struct hsr_ethhdr *) skb->data; + + if ((skb->protocol != htons(ETH_P_PRP)) || + (hsr_ethhdr->ethhdr.h_proto != htons(ETH_P_PRP))) { + hsr_fill_tag(hsr_ethhdr, hsr_priv); + skb->protocol = htons(ETH_P_PRP); + } + + skb2 = pskb_copy(skb, GFP_ATOMIC); + + res1 = NET_XMIT_DROP; + if (likely(hsr_priv->slave[HSR_DEV_SLAVE_A])) + res1 = slave_xmit(skb, hsr_priv, HSR_DEV_SLAVE_A); + + res2 = NET_XMIT_DROP; + if (likely(skb2 && hsr_priv->slave[HSR_DEV_SLAVE_B])) + res2 = slave_xmit(skb2, hsr_priv, HSR_DEV_SLAVE_B); + + if (likely(res1 == NET_XMIT_SUCCESS || res1 == NET_XMIT_CN || + res2 == NET_XMIT_SUCCESS || res2 == NET_XMIT_CN)) { + hsr_priv->dev->stats.tx_packets++; + hsr_priv->dev->stats.tx_bytes += skb->len; + } else { + hsr_priv->dev->stats.tx_dropped++; + } + + return NETDEV_TX_OK; +} + + +static int hsr_header_create(struct sk_buff *skb, struct net_device *dev, + unsigned short type, const void *daddr, + const void *saddr, unsigned int len) +{ + int res; + + /* Make room for the HSR tag now. We will fill it in later (in + * hsr_dev_xmit) + */ + if (skb_headroom(skb) < HSR_TAGLEN + ETH_HLEN) + return -ENOBUFS; + skb_push(skb, HSR_TAGLEN); + + /* To allow VLAN/HSR combos we should probably use + * res = dev_hard_header(skb, dev, type, daddr, saddr, len + HSR_TAGLEN); + * here instead. It would require other changes too, though - e.g. + * separate headers for each slave etc... + */ + res = eth_header(skb, dev, type, daddr, saddr, len + HSR_TAGLEN); + if (res <= 0) + return res; + skb_reset_mac_header(skb); + + return res + HSR_TAGLEN; +} + + +static const struct header_ops hsr_header_ops = { + .create = hsr_header_create, + .parse = eth_header_parse, +}; + + +/* HSR:2010 supervision frames should be padded so that the whole frame, + * including headers and FCS, is 64 bytes (without VLAN). + */ +static int hsr_pad(int size) +{ + const int min_size = ETH_ZLEN - HSR_TAGLEN - ETH_HLEN; + + if (size >= min_size) + return size; + return min_size; +} + +static void send_hsr_supervision_frame(struct net_device *hsr_dev, u8 type) +{ + struct hsr_priv *hsr_priv; + struct sk_buff *skb; + int hlen, tlen; + struct hsr_sup_tag *hsr_stag; + struct hsr_sup_payload *hsr_sp; + unsigned long irqflags; + + hlen = LL_RESERVED_SPACE(hsr_dev); + tlen = hsr_dev->needed_tailroom; + skb = alloc_skb(hsr_pad(sizeof(struct hsr_sup_payload)) + hlen + tlen, + GFP_ATOMIC); + + if (skb == NULL) + return; + + hsr_priv = netdev_priv(hsr_dev); + + skb_reserve(skb, hlen); + + skb->dev = hsr_dev; + skb->protocol = htons(ETH_P_PRP); + skb->priority = TC_PRIO_CONTROL; + + if (dev_hard_header(skb, skb->dev, ETH_P_PRP, + hsr_priv->sup_multicast_addr, + skb->dev->dev_addr, skb->len) < 0) + goto out; + + skb_pull(skb, sizeof(struct ethhdr)); + hsr_stag = (typeof(hsr_stag)) skb->data; + + set_hsr_stag_path(hsr_stag, 0xf); + set_hsr_stag_HSR_Ver(hsr_stag, 0); + + spin_lock_irqsave(&hsr_priv->seqnr_lock, irqflags); + hsr_stag->sequence_nr = htons(hsr_priv->sequence_nr); + hsr_priv->sequence_nr++; + spin_unlock_irqrestore(&hsr_priv->seqnr_lock, irqflags); + + hsr_stag->HSR_TLV_Type = type; + hsr_stag->HSR_TLV_Length = 12; + + skb_push(skb, sizeof(struct ethhdr)); + + /* Payload: MacAddressA */ + hsr_sp = (typeof(hsr_sp)) skb_put(skb, sizeof(*hsr_sp)); + memcpy(hsr_sp->MacAddressA, hsr_dev->dev_addr, ETH_ALEN); + + dev_queue_xmit(skb); + return; + +out: + kfree_skb(skb); +} + + +/* Announce (supervision frame) timer function + */ +static void hsr_announce(unsigned long data) +{ + struct hsr_priv *hsr_priv; + + hsr_priv = (struct hsr_priv *) data; + + if (hsr_priv->announce_count < 3) { + send_hsr_supervision_frame(hsr_priv->dev, HSR_TLV_ANNOUNCE); + hsr_priv->announce_count++; + } else { + send_hsr_supervision_frame(hsr_priv->dev, HSR_TLV_LIFE_CHECK); + } + + if (hsr_priv->announce_count < 3) + hsr_priv->announce_timer.expires = jiffies + + msecs_to_jiffies(HSR_ANNOUNCE_INTERVAL); + else + hsr_priv->announce_timer.expires = jiffies + + msecs_to_jiffies(HSR_LIFE_CHECK_INTERVAL); + + if (is_admin_up(hsr_priv->dev)) + add_timer(&hsr_priv->announce_timer); +} + + +static void restore_slaves(struct net_device *hsr_dev) +{ + struct hsr_priv *hsr_priv; + int i; + int res; + + hsr_priv = netdev_priv(hsr_dev); + + rtnl_lock(); + + /* Restore promiscuity */ + for (i = 0; i < HSR_MAX_SLAVE; i++) { + if (!hsr_priv->slave[i]) + continue; + res = dev_set_promiscuity(hsr_priv->slave[i], -1); + if (res) + netdev_info(hsr_dev, + "Cannot restore slave promiscuity (%s, %d)\n", + hsr_priv->slave[i]->name, res); + } + + rtnl_unlock(); +} + +static void reclaim_hsr_dev(struct rcu_head *rh) +{ + struct hsr_priv *hsr_priv; + + hsr_priv = container_of(rh, struct hsr_priv, rcu_head); + free_netdev(hsr_priv->dev); +} + + +/* According to comments in the declaration of struct net_device, this function + * is "Called from unregister, can be used to call free_netdev". Ok then... + */ +static void hsr_dev_destroy(struct net_device *hsr_dev) +{ + struct hsr_priv *hsr_priv; + + hsr_priv = netdev_priv(hsr_dev); + + del_timer(&hsr_priv->announce_timer); + unregister_hsr_master(hsr_priv); /* calls list_del_rcu on hsr_priv */ + restore_slaves(hsr_dev); + call_rcu(&hsr_priv->rcu_head, reclaim_hsr_dev); /* reclaim hsr_priv */ +} + +static const struct net_device_ops hsr_device_ops = { + .ndo_change_mtu = hsr_dev_change_mtu, + .ndo_open = hsr_dev_open, + .ndo_stop = hsr_dev_close, + .ndo_start_xmit = hsr_dev_xmit, +}; + + +void hsr_dev_setup(struct net_device *dev) +{ + random_ether_addr(dev->dev_addr); + + ether_setup(dev); + dev->header_ops = &hsr_header_ops; + dev->netdev_ops = &hsr_device_ops; + dev->tx_queue_len = 0; + + dev->destructor = hsr_dev_destroy; +} + + +/* Return true if dev is a HSR master; return false otherwise. + */ +bool is_hsr_master(struct net_device *dev) +{ + return (dev->netdev_ops->ndo_start_xmit == hsr_dev_xmit); +} + +static int check_slave_ok(struct net_device *dev) +{ + /* Don't allow HSR on non-ethernet like devices */ + if ((dev->flags & IFF_LOOPBACK) || (dev->type != ARPHRD_ETHER) || + (dev->addr_len != ETH_ALEN)) { + netdev_info(dev, "Cannot use loopback or non-ethernet device as HSR slave.\n"); + return -EINVAL; + } + + /* Don't allow enslaving hsr devices */ + if (is_hsr_master(dev)) { + netdev_info(dev, "Cannot create trees of HSR devices.\n"); + return -EINVAL; + } + + if (is_hsr_slave(dev)) { + netdev_info(dev, "This device is already a HSR slave.\n"); + return -EINVAL; + } + + if (dev->priv_flags & IFF_802_1Q_VLAN) { + netdev_info(dev, "HSR on top of VLAN is not yet supported in this driver.\n"); + return -EINVAL; + } + + /* HSR over bonded devices has not been tested, but I'm not sure it + * won't work... + */ + + return 0; +} + + +/* Default multicast address for HSR Supervision frames */ +static const unsigned char def_multicast_addr[ETH_ALEN] = { + 0x01, 0x15, 0x4e, 0x00, 0x01, 0x00 +}; + +int hsr_dev_finalize(struct net_device *hsr_dev, struct net_device *slave[2], + unsigned char multicast_spec) +{ + struct hsr_priv *hsr_priv; + int i; + int res; + + hsr_priv = netdev_priv(hsr_dev); + hsr_priv->dev = hsr_dev; + INIT_LIST_HEAD(&hsr_priv->node_db); + INIT_LIST_HEAD(&hsr_priv->self_node_db); + for (i = 0; i < HSR_MAX_SLAVE; i++) + hsr_priv->slave[i] = slave[i]; + + spin_lock_init(&hsr_priv->seqnr_lock); + /* Overflow soon to find bugs easier: */ + hsr_priv->sequence_nr = USHRT_MAX - 1024; + + init_timer(&hsr_priv->announce_timer); + hsr_priv->announce_timer.function = hsr_announce; + hsr_priv->announce_timer.data = (unsigned long) hsr_priv; + + memcpy(hsr_priv->sup_multicast_addr, def_multicast_addr, ETH_ALEN); + hsr_priv->sup_multicast_addr[ETH_ALEN - 1] = multicast_spec; + +/* FIXME: should I modify the value of these? + * + * - hsr_dev->flags - i.e. + * IFF_MASTER/SLAVE? + * - hsr_dev->priv_flags - i.e. + * IFF_EBRIDGE? + * IFF_TX_SKB_SHARING? + * IFF_HSR_MASTER/SLAVE? + */ + + for (i = 0; i < HSR_MAX_SLAVE; i++) { + res = check_slave_ok(slave[i]); + if (res) + return res; + } + + hsr_dev->features = slave[0]->features & slave[1]->features; + /* Prevent recursive tx locking */ + hsr_dev->features |= NETIF_F_LLTX; + /* VLAN on top of HSR needs testing and probably some work on + * hsr_header_create() etc. + */ + hsr_dev->features |= NETIF_F_VLAN_CHALLENGED; + + /* Set hsr_dev's MAC address to that of mac_slave1 */ + memcpy(hsr_dev->dev_addr, hsr_priv->slave[0]->dev_addr, ETH_ALEN); + + /* Set required header length */ + for (i = 0; i < HSR_MAX_SLAVE; i++) { + if (slave[i]->hard_header_len + HSR_TAGLEN > + hsr_dev->hard_header_len) + hsr_dev->hard_header_len = + slave[i]->hard_header_len + HSR_TAGLEN; + } + + /* MTU */ + for (i = 0; i < HSR_MAX_SLAVE; i++) + if (slave[i]->mtu - HSR_TAGLEN < hsr_dev->mtu) + hsr_dev->mtu = slave[i]->mtu - HSR_TAGLEN; + + /* Make sure the 1st call to netif_carrier_on() gets through */ + netif_carrier_off(hsr_dev); + + /* Promiscuity */ + for (i = 0; i < HSR_MAX_SLAVE; i++) { + res = dev_set_promiscuity(slave[i], 1); + if (res) { + netdev_info(hsr_dev, "Cannot set slave promiscuity (%s, %d)\n", + slave[i]->name, res); + goto fail; + } + } + + /* Make sure we recognize frames from ourselves in hsr_rcv() */ + res = hsr_create_self_node(&hsr_priv->self_node_db, + hsr_dev->dev_addr, + hsr_priv->slave[1]->dev_addr); + if (res < 0) + goto fail; + + res = register_netdevice(hsr_dev); + if (res) + goto fail; + + register_hsr_master(hsr_priv); + + return 0; + +fail: + restore_slaves(hsr_dev); + return res; +} diff --git a/net/hsr/hsr_device.h b/net/hsr/hsr_device.h new file mode 100644 index 000000000000..2c7148e73914 --- /dev/null +++ b/net/hsr/hsr_device.h @@ -0,0 +1,29 @@ +/* Copyright 2011-2013 Autronica Fire and Security AS + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the Free + * Software Foundation; either version 2 of the License, or (at your option) + * any later version. + * + * Author(s): + * 2011-2013 Arvid Brodin, arvid.brodin@xdin.com + */ + +#ifndef __HSR_DEVICE_H +#define __HSR_DEVICE_H + +#include <linux/netdevice.h> +#include "hsr_main.h" + +void hsr_dev_setup(struct net_device *dev); +int hsr_dev_finalize(struct net_device *hsr_dev, struct net_device *slave[2], + unsigned char multicast_spec); +void hsr_set_operstate(struct net_device *hsr_dev, struct net_device *slave1, + struct net_device *slave2); +void hsr_set_carrier(struct net_device *hsr_dev, struct net_device *slave1, + struct net_device *slave2); +void hsr_check_announce(struct net_device *hsr_dev, int old_operstate); +bool is_hsr_master(struct net_device *dev); +int hsr_get_max_mtu(struct hsr_priv *hsr_priv); + +#endif /* __HSR_DEVICE_H */ diff --git a/net/hsr/hsr_framereg.c b/net/hsr/hsr_framereg.c new file mode 100644 index 000000000000..003f5bb3acd2 --- /dev/null +++ b/net/hsr/hsr_framereg.c @@ -0,0 +1,503 @@ +/* Copyright 2011-2013 Autronica Fire and Security AS + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the Free + * Software Foundation; either version 2 of the License, or (at your option) + * any later version. + * + * Author(s): + * 2011-2013 Arvid Brodin, arvid.brodin@xdin.com + * + * The HSR spec says never to forward the same frame twice on the same + * interface. A frame is identified by its source MAC address and its HSR + * sequence number. This code keeps track of senders and their sequence numbers + * to allow filtering of duplicate frames, and to detect HSR ring errors. + */ + +#include <linux/if_ether.h> +#include <linux/etherdevice.h> +#include <linux/slab.h> +#include <linux/rculist.h> +#include "hsr_main.h" +#include "hsr_framereg.h" +#include "hsr_netlink.h" + + +struct node_entry { + struct list_head mac_list; + unsigned char MacAddressA[ETH_ALEN]; + unsigned char MacAddressB[ETH_ALEN]; + enum hsr_dev_idx AddrB_if; /* The local slave through which AddrB + * frames are received from this node + */ + unsigned long time_in[HSR_MAX_SLAVE]; + bool time_in_stale[HSR_MAX_SLAVE]; + u16 seq_out[HSR_MAX_DEV]; + struct rcu_head rcu_head; +}; + +/* TODO: use hash lists for mac addresses (linux/jhash.h)? */ + + + +/* Search for mac entry. Caller must hold rcu read lock. + */ +static struct node_entry *find_node_by_AddrA(struct list_head *node_db, + const unsigned char addr[ETH_ALEN]) +{ + struct node_entry *node; + + list_for_each_entry_rcu(node, node_db, mac_list) { + if (ether_addr_equal(node->MacAddressA, addr)) + return node; + } + + return NULL; +} + + +/* Search for mac entry. Caller must hold rcu read lock. + */ +static struct node_entry *find_node_by_AddrB(struct list_head *node_db, + const unsigned char addr[ETH_ALEN]) +{ + struct node_entry *node; + + list_for_each_entry_rcu(node, node_db, mac_list) { + if (ether_addr_equal(node->MacAddressB, addr)) + return node; + } + + return NULL; +} + + +/* Search for mac entry. Caller must hold rcu read lock. + */ +struct node_entry *hsr_find_node(struct list_head *node_db, struct sk_buff *skb) +{ + struct node_entry *node; + struct ethhdr *ethhdr; + + if (!skb_mac_header_was_set(skb)) + return NULL; + + ethhdr = (struct ethhdr *) skb_mac_header(skb); + + list_for_each_entry_rcu(node, node_db, mac_list) { + if (ether_addr_equal(node->MacAddressA, ethhdr->h_source)) + return node; + if (ether_addr_equal(node->MacAddressB, ethhdr->h_source)) + return node; + } + + return NULL; +} + + +/* Helper for device init; the self_node_db is used in hsr_rcv() to recognize + * frames from self that's been looped over the HSR ring. + */ +int hsr_create_self_node(struct list_head *self_node_db, + unsigned char addr_a[ETH_ALEN], + unsigned char addr_b[ETH_ALEN]) +{ + struct node_entry *node, *oldnode; + + node = kmalloc(sizeof(*node), GFP_KERNEL); + if (!node) + return -ENOMEM; + + memcpy(node->MacAddressA, addr_a, ETH_ALEN); + memcpy(node->MacAddressB, addr_b, ETH_ALEN); + + rcu_read_lock(); + oldnode = list_first_or_null_rcu(self_node_db, + struct node_entry, mac_list); + if (oldnode) { + list_replace_rcu(&oldnode->mac_list, &node->mac_list); + rcu_read_unlock(); + synchronize_rcu(); + kfree(oldnode); + } else { + rcu_read_unlock(); + list_add_tail_rcu(&node->mac_list, self_node_db); + } + + return 0; +} + +static void node_entry_reclaim(struct rcu_head *rh) +{ + kfree(container_of(rh, struct node_entry, rcu_head)); +} + + +/* Add/merge node to the database of nodes. 'skb' must contain an HSR + * supervision frame. + * - If the supervision header's MacAddressA field is not yet in the database, + * this frame is from an hitherto unknown node - add it to the database. + * - If the sender's MAC address is not the same as its MacAddressA address, + * the node is using PICS_SUBS (address substitution). Record the sender's + * address as the node's MacAddressB. + * + * This function needs to work even if the sender node has changed one of its + * slaves' MAC addresses. In this case, there are four different cases described + * by (Addr-changed, received-from) pairs as follows. Note that changing the + * SlaveA address is equal to changing the node's own address: + * + * - (AddrB, SlaveB): The new AddrB will be recorded by PICS_SUBS code since + * node == NULL. + * - (AddrB, SlaveA): Will work as usual (the AddrB change won't be detected + * from this frame). + * + * - (AddrA, SlaveB): The old node will be found. We need to detect this and + * remove the node. + * - (AddrA, SlaveA): A new node will be registered (non-PICS_SUBS at first). + * The old one will be pruned after HSR_NODE_FORGET_TIME. + * + * We also need to detect if the sender's SlaveA and SlaveB cables have been + * swapped. + */ +struct node_entry *hsr_merge_node(struct hsr_priv *hsr_priv, + struct node_entry *node, + struct sk_buff *skb, + enum hsr_dev_idx dev_idx) +{ + struct hsr_sup_payload *hsr_sp; + struct hsr_ethhdr_sp *hsr_ethsup; + int i; + unsigned long now; + + hsr_ethsup = (struct hsr_ethhdr_sp *) skb_mac_header(skb); + hsr_sp = (struct hsr_sup_payload *) skb->data; + + if (node && !ether_addr_equal(node->MacAddressA, hsr_sp->MacAddressA)) { + /* Node has changed its AddrA, frame was received from SlaveB */ + list_del_rcu(&node->mac_list); + call_rcu(&node->rcu_head, node_entry_reclaim); + node = NULL; + } + + if (node && (dev_idx == node->AddrB_if) && + !ether_addr_equal(node->MacAddressB, hsr_ethsup->ethhdr.h_source)) { + /* Cables have been swapped */ + list_del_rcu(&node->mac_list); + call_rcu(&node->rcu_head, node_entry_reclaim); + node = NULL; + } + + if (node && (dev_idx != node->AddrB_if) && + (node->AddrB_if != HSR_DEV_NONE) && + !ether_addr_equal(node->MacAddressA, hsr_ethsup->ethhdr.h_source)) { + /* Cables have been swapped */ + list_del_rcu(&node->mac_list); + call_rcu(&node->rcu_head, node_entry_reclaim); + node = NULL; + } + + if (node) + return node; + + node = find_node_by_AddrA(&hsr_priv->node_db, hsr_sp->MacAddressA); + if (node) { + /* Node is known, but frame was received from an unknown + * address. Node is PICS_SUBS capable; merge its AddrB. + */ + memcpy(node->MacAddressB, hsr_ethsup->ethhdr.h_source, ETH_ALEN); + node->AddrB_if = dev_idx; + return node; + } + + node = kzalloc(sizeof(*node), GFP_ATOMIC); + if (!node) + return NULL; + + memcpy(node->MacAddressA, hsr_sp->MacAddressA, ETH_ALEN); + memcpy(node->MacAddressB, hsr_ethsup->ethhdr.h_source, ETH_ALEN); + if (!ether_addr_equal(hsr_sp->MacAddressA, hsr_ethsup->ethhdr.h_source)) + node->AddrB_if = dev_idx; + else + node->AddrB_if = HSR_DEV_NONE; + + /* We are only interested in time diffs here, so use current jiffies + * as initialization. (0 could trigger an spurious ring error warning). + */ + now = jiffies; + for (i = 0; i < HSR_MAX_SLAVE; i++) + node->time_in[i] = now; + for (i = 0; i < HSR_MAX_DEV; i++) + node->seq_out[i] = ntohs(hsr_ethsup->hsr_sup.sequence_nr) - 1; + + list_add_tail_rcu(&node->mac_list, &hsr_priv->node_db); + + return node; +} + + +/* 'skb' is a frame meant for this host, that is to be passed to upper layers. + * + * If the frame was sent by a node's B interface, replace the sender + * address with that node's "official" address (MacAddressA) so that upper + * layers recognize where it came from. + */ +void hsr_addr_subst_source(struct hsr_priv *hsr_priv, struct sk_buff *skb) +{ + struct ethhdr *ethhdr; + struct node_entry *node; + + if (!skb_mac_header_was_set(skb)) { + WARN_ONCE(1, "%s: Mac header not set\n", __func__); + return; + } + ethhdr = (struct ethhdr *) skb_mac_header(skb); + + rcu_read_lock(); + node = find_node_by_AddrB(&hsr_priv->node_db, ethhdr->h_source); + if (node) + memcpy(ethhdr->h_source, node->MacAddressA, ETH_ALEN); + rcu_read_unlock(); +} + + +/* 'skb' is a frame meant for another host. + * 'hsr_dev_idx' is the HSR index of the outgoing device + * + * Substitute the target (dest) MAC address if necessary, so the it matches the + * recipient interface MAC address, regardless of whether that is the + * recipient's A or B interface. + * This is needed to keep the packets flowing through switches that learn on + * which "side" the different interfaces are. + */ +void hsr_addr_subst_dest(struct hsr_priv *hsr_priv, struct ethhdr *ethhdr, + enum hsr_dev_idx dev_idx) +{ + struct node_entry *node; + + rcu_read_lock(); + node = find_node_by_AddrA(&hsr_priv->node_db, ethhdr->h_dest); + if (node && (node->AddrB_if == dev_idx)) + memcpy(ethhdr->h_dest, node->MacAddressB, ETH_ALEN); + rcu_read_unlock(); +} + + +/* seq_nr_after(a, b) - return true if a is after (higher in sequence than) b, + * false otherwise. + */ +static bool seq_nr_after(u16 a, u16 b) +{ + /* Remove inconsistency where + * seq_nr_after(a, b) == seq_nr_before(a, b) */ + if ((int) b - a == 32768) + return false; + + return (((s16) (b - a)) < 0); +} +#define seq_nr_before(a, b) seq_nr_after((b), (a)) +#define seq_nr_after_or_eq(a, b) (!seq_nr_before((a), (b))) +#define seq_nr_before_or_eq(a, b) (!seq_nr_after((a), (b))) + + +void hsr_register_frame_in(struct node_entry *node, enum hsr_dev_idx dev_idx) +{ + if ((dev_idx < 0) || (dev_idx >= HSR_MAX_DEV)) { + WARN_ONCE(1, "%s: Invalid dev_idx (%d)\n", __func__, dev_idx); + return; + } + node->time_in[dev_idx] = jiffies; + node->time_in_stale[dev_idx] = false; +} + + +/* 'skb' is a HSR Ethernet frame (with a HSR tag inserted), with a valid + * ethhdr->h_source address and skb->mac_header set. + * + * Return: + * 1 if frame can be shown to have been sent recently on this interface, + * 0 otherwise, or + * negative error code on error + */ +int hsr_register_frame_out(struct node_entry *node, enum hsr_dev_idx dev_idx, + struct sk_buff *skb) +{ + struct hsr_ethhdr *hsr_ethhdr; + u16 sequence_nr; + + if ((dev_idx < 0) || (dev_idx >= HSR_MAX_DEV)) { + WARN_ONCE(1, "%s: Invalid dev_idx (%d)\n", __func__, dev_idx); + return -EINVAL; + } + if (!skb_mac_header_was_set(skb)) { + WARN_ONCE(1, "%s: Mac header not set\n", __func__); + return -EINVAL; + } + hsr_ethhdr = (struct hsr_ethhdr *) skb_mac_header(skb); + + sequence_nr = ntohs(hsr_ethhdr->hsr_tag.sequence_nr); + if (seq_nr_before_or_eq(sequence_nr, node->seq_out[dev_idx])) + return 1; + + node->seq_out[dev_idx] = sequence_nr; + return 0; +} + + + +static bool is_late(struct node_entry *node, enum hsr_dev_idx dev_idx) +{ + enum hsr_dev_idx other; + + if (node->time_in_stale[dev_idx]) + return true; + + if (dev_idx == HSR_DEV_SLAVE_A) + other = HSR_DEV_SLAVE_B; + else + other = HSR_DEV_SLAVE_A; + + if (node->time_in_stale[other]) + return false; + + if (time_after(node->time_in[other], node->time_in[dev_idx] + + msecs_to_jiffies(MAX_SLAVE_DIFF))) + return true; + + return false; +} + + +/* Remove stale sequence_nr records. Called by timer every + * HSR_LIFE_CHECK_INTERVAL (two seconds or so). + */ +void hsr_prune_nodes(struct hsr_priv *hsr_priv) +{ + struct node_entry *node; + unsigned long timestamp; + unsigned long time_a, time_b; + + rcu_read_lock(); + list_for_each_entry_rcu(node, &hsr_priv->node_db, mac_list) { + /* Shorthand */ + time_a = node->time_in[HSR_DEV_SLAVE_A]; + time_b = node->time_in[HSR_DEV_SLAVE_B]; + + /* Check for timestamps old enough to risk wrap-around */ + if (time_after(jiffies, time_a + MAX_JIFFY_OFFSET/2)) + node->time_in_stale[HSR_DEV_SLAVE_A] = true; + if (time_after(jiffies, time_b + MAX_JIFFY_OFFSET/2)) + node->time_in_stale[HSR_DEV_SLAVE_B] = true; + + /* Get age of newest frame from node. + * At least one time_in is OK here; nodes get pruned long + * before both time_ins can get stale + */ + timestamp = time_a; + if (node->time_in_stale[HSR_DEV_SLAVE_A] || + (!node->time_in_stale[HSR_DEV_SLAVE_B] && + time_after(time_b, time_a))) + timestamp = time_b; + + /* Warn of ring error only as long as we get frames at all */ + if (time_is_after_jiffies(timestamp + + msecs_to_jiffies(1.5*MAX_SLAVE_DIFF))) { + + if (is_late(node, HSR_DEV_SLAVE_A)) + hsr_nl_ringerror(hsr_priv, node->MacAddressA, + HSR_DEV_SLAVE_A); + else if (is_late(node, HSR_DEV_SLAVE_B)) + hsr_nl_ringerror(hsr_priv, node->MacAddressA, + HSR_DEV_SLAVE_B); + } + + /* Prune old entries */ + if (time_is_before_jiffies(timestamp + + msecs_to_jiffies(HSR_NODE_FORGET_TIME))) { + hsr_nl_nodedown(hsr_priv, node->MacAddressA); + list_del_rcu(&node->mac_list); + /* Note that we need to free this entry later: */ + call_rcu(&node->rcu_head, node_entry_reclaim); + } + } + rcu_read_unlock(); +} + + +void *hsr_get_next_node(struct hsr_priv *hsr_priv, void *_pos, + unsigned char addr[ETH_ALEN]) +{ + struct node_entry *node; + + if (!_pos) { + node = list_first_or_null_rcu(&hsr_priv->node_db, + struct node_entry, mac_list); + if (node) + memcpy(addr, node->MacAddressA, ETH_ALEN); + return node; + } + + node = _pos; + list_for_each_entry_continue_rcu(node, &hsr_priv->node_db, mac_list) { + memcpy(addr, node->MacAddressA, ETH_ALEN); + return node; + } + + return NULL; +} + + +int hsr_get_node_data(struct hsr_priv *hsr_priv, + const unsigned char *addr, + unsigned char addr_b[ETH_ALEN], + unsigned int *addr_b_ifindex, + int *if1_age, + u16 *if1_seq, + int *if2_age, + u16 *if2_seq) +{ + struct node_entry *node; + unsigned long tdiff; + + + rcu_read_lock(); + node = find_node_by_AddrA(&hsr_priv->node_db, addr); + if (!node) { + rcu_read_unlock(); + return -ENOENT; /* No such entry */ + } + + memcpy(addr_b, node->MacAddressB, ETH_ALEN); + + tdiff = jiffies - node->time_in[HSR_DEV_SLAVE_A]; + if (node->time_in_stale[HSR_DEV_SLAVE_A]) + *if1_age = INT_MAX; +#if HZ <= MSEC_PER_SEC + else if (tdiff > msecs_to_jiffies(INT_MAX)) + *if1_age = INT_MAX; +#endif + else + *if1_age = jiffies_to_msecs(tdiff); + + tdiff = jiffies - node->time_in[HSR_DEV_SLAVE_B]; + if (node->time_in_stale[HSR_DEV_SLAVE_B]) + *if2_age = INT_MAX; +#if HZ <= MSEC_PER_SEC + else if (tdiff > msecs_to_jiffies(INT_MAX)) + *if2_age = INT_MAX; +#endif + else + *if2_age = jiffies_to_msecs(tdiff); + + /* Present sequence numbers as if they were incoming on interface */ + *if1_seq = node->seq_out[HSR_DEV_SLAVE_B]; + *if2_seq = node->seq_out[HSR_DEV_SLAVE_A]; + + if ((node->AddrB_if != HSR_DEV_NONE) && hsr_priv->slave[node->AddrB_if]) + *addr_b_ifindex = hsr_priv->slave[node->AddrB_if]->ifindex; + else + *addr_b_ifindex = -1; + + rcu_read_unlock(); + + return 0; +} diff --git a/net/hsr/hsr_framereg.h b/net/hsr/hsr_framereg.h new file mode 100644 index 000000000000..e6c4022030ad --- /dev/null +++ b/net/hsr/hsr_framereg.h @@ -0,0 +1,53 @@ +/* Copyright 2011-2013 Autronica Fire and Security AS + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the Free + * Software Foundation; either version 2 of the License, or (at your option) + * any later version. + * + * Author(s): + * 2011-2013 Arvid Brodin, arvid.brodin@xdin.com + */ + +#ifndef _HSR_FRAMEREG_H +#define _HSR_FRAMEREG_H + +#include "hsr_main.h" + +struct node_entry; + +struct node_entry *hsr_find_node(struct list_head *node_db, struct sk_buff *skb); + +struct node_entry *hsr_merge_node(struct hsr_priv *hsr_priv, + struct node_entry *node, + struct sk_buff *skb, + enum hsr_dev_idx dev_idx); + +void hsr_addr_subst_source(struct hsr_priv *hsr_priv, struct sk_buff *skb); +void hsr_addr_subst_dest(struct hsr_priv *hsr_priv, struct ethhdr *ethhdr, + enum hsr_dev_idx dev_idx); + +void hsr_register_frame_in(struct node_entry *node, enum hsr_dev_idx dev_idx); + +int hsr_register_frame_out(struct node_entry *node, enum hsr_dev_idx dev_idx, + struct sk_buff *skb); + +void hsr_prune_nodes(struct hsr_priv *hsr_priv); + +int hsr_create_self_node(struct list_head *self_node_db, + unsigned char addr_a[ETH_ALEN], + unsigned char addr_b[ETH_ALEN]); + +void *hsr_get_next_node(struct hsr_priv *hsr_priv, void *_pos, + unsigned char addr[ETH_ALEN]); + +int hsr_get_node_data(struct hsr_priv *hsr_priv, + const unsigned char *addr, + unsigned char addr_b[ETH_ALEN], + unsigned int *addr_b_ifindex, + int *if1_age, + u16 *if1_seq, + int *if2_age, + u16 *if2_seq); + +#endif /* _HSR_FRAMEREG_H */ diff --git a/net/hsr/hsr_main.c b/net/hsr/hsr_main.c new file mode 100644 index 000000000000..af68dd83a4e3 --- /dev/null +++ b/net/hsr/hsr_main.c @@ -0,0 +1,469 @@ +/* Copyright 2011-2013 Autronica Fire and Security AS + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the Free + * Software Foundation; either version 2 of the License, or (at your option) + * any later version. + * + * Author(s): + * 2011-2013 Arvid Brodin, arvid.brodin@xdin.com + * + * In addition to routines for registering and unregistering HSR support, this + * file also contains the receive routine that handles all incoming frames with + * Ethertype (protocol) ETH_P_PRP (HSRv0), and network device event handling. + */ + +#include <linux/netdevice.h> +#include <linux/rculist.h> +#include <linux/timer.h> +#include <linux/etherdevice.h> +#include "hsr_main.h" +#include "hsr_device.h" +#include "hsr_netlink.h" +#include "hsr_framereg.h" + + +/* List of all registered virtual HSR devices */ +static LIST_HEAD(hsr_list); + +void register_hsr_master(struct hsr_priv *hsr_priv) +{ + list_add_tail_rcu(&hsr_priv->hsr_list, &hsr_list); +} + +void unregister_hsr_master(struct hsr_priv *hsr_priv) +{ + struct hsr_priv *hsr_priv_it; + + list_for_each_entry(hsr_priv_it, &hsr_list, hsr_list) + if (hsr_priv_it == hsr_priv) { + list_del_rcu(&hsr_priv_it->hsr_list); + return; + } +} + +bool is_hsr_slave(struct net_device *dev) +{ + struct hsr_priv *hsr_priv_it; + + list_for_each_entry_rcu(hsr_priv_it, &hsr_list, hsr_list) { + if (dev == hsr_priv_it->slave[0]) + return true; + if (dev == hsr_priv_it->slave[1]) + return true; + } + + return false; +} + + +/* If dev is a HSR slave device, return the virtual master device. Return NULL + * otherwise. + */ +static struct hsr_priv *get_hsr_master(struct net_device *dev) +{ + struct hsr_priv *hsr_priv; + + rcu_read_lock(); + list_for_each_entry_rcu(hsr_priv, &hsr_list, hsr_list) + if ((dev == hsr_priv->slave[0]) || + (dev == hsr_priv->slave[1])) { + rcu_read_unlock(); + return hsr_priv; + } + + rcu_read_unlock(); + return NULL; +} + + +/* If dev is a HSR slave device, return the other slave device. Return NULL + * otherwise. + */ +static struct net_device *get_other_slave(struct hsr_priv *hsr_priv, + struct net_device *dev) +{ + if (dev == hsr_priv->slave[0]) + return hsr_priv->slave[1]; + if (dev == hsr_priv->slave[1]) + return hsr_priv->slave[0]; + + return NULL; +} + + +static int hsr_netdev_notify(struct notifier_block *nb, unsigned long event, + void *ptr) +{ + struct net_device *slave, *other_slave; + struct hsr_priv *hsr_priv; + int old_operstate; + int mtu_max; + int res; + struct net_device *dev; + + dev = netdev_notifier_info_to_dev(ptr); + + hsr_priv = get_hsr_master(dev); + if (hsr_priv) { + /* dev is a slave device */ + slave = dev; + other_slave = get_other_slave(hsr_priv, slave); + } else { + if (!is_hsr_master(dev)) + return NOTIFY_DONE; + hsr_priv = netdev_priv(dev); + slave = hsr_priv->slave[0]; + other_slave = hsr_priv->slave[1]; + } + + switch (event) { + case NETDEV_UP: /* Administrative state DOWN */ + case NETDEV_DOWN: /* Administrative state UP */ + case NETDEV_CHANGE: /* Link (carrier) state changes */ + old_operstate = hsr_priv->dev->operstate; + hsr_set_carrier(hsr_priv->dev, slave, other_slave); + /* netif_stacked_transfer_operstate() cannot be used here since + * it doesn't set IF_OPER_LOWERLAYERDOWN (?) + */ + hsr_set_operstate(hsr_priv->dev, slave, other_slave); + hsr_check_announce(hsr_priv->dev, old_operstate); + break; + case NETDEV_CHANGEADDR: + + /* This should not happen since there's no ndo_set_mac_address() + * for HSR devices - i.e. not supported. + */ + if (dev == hsr_priv->dev) + break; + + if (dev == hsr_priv->slave[0]) + memcpy(hsr_priv->dev->dev_addr, + hsr_priv->slave[0]->dev_addr, ETH_ALEN); + + /* Make sure we recognize frames from ourselves in hsr_rcv() */ + res = hsr_create_self_node(&hsr_priv->self_node_db, + hsr_priv->dev->dev_addr, + hsr_priv->slave[1] ? + hsr_priv->slave[1]->dev_addr : + hsr_priv->dev->dev_addr); + if (res) + netdev_warn(hsr_priv->dev, + "Could not update HSR node address.\n"); + + if (dev == hsr_priv->slave[0]) + call_netdevice_notifiers(NETDEV_CHANGEADDR, hsr_priv->dev); + break; + case NETDEV_CHANGEMTU: + if (dev == hsr_priv->dev) + break; /* Handled in ndo_change_mtu() */ + mtu_max = hsr_get_max_mtu(hsr_priv); + if (hsr_priv->dev->mtu > mtu_max) + dev_set_mtu(hsr_priv->dev, mtu_max); + break; + case NETDEV_UNREGISTER: + if (dev == hsr_priv->slave[0]) + hsr_priv->slave[0] = NULL; + if (dev == hsr_priv->slave[1]) + hsr_priv->slave[1] = NULL; + + /* There should really be a way to set a new slave device... */ + + break; + case NETDEV_PRE_TYPE_CHANGE: + /* HSR works only on Ethernet devices. Refuse slave to change + * its type. + */ + return NOTIFY_BAD; + } + + return NOTIFY_DONE; +} + + +static struct timer_list prune_timer; + +static void prune_nodes_all(unsigned long data) +{ + struct hsr_priv *hsr_priv; + + rcu_read_lock(); + list_for_each_entry_rcu(hsr_priv, &hsr_list, hsr_list) + hsr_prune_nodes(hsr_priv); + rcu_read_unlock(); + + prune_timer.expires = jiffies + msecs_to_jiffies(PRUNE_PERIOD); + add_timer(&prune_timer); +} + + +static struct sk_buff *hsr_pull_tag(struct sk_buff *skb) +{ + struct hsr_tag *hsr_tag; + struct sk_buff *skb2; + + skb2 = skb_share_check(skb, GFP_ATOMIC); + if (unlikely(!skb2)) + goto err_free; + skb = skb2; + + if (unlikely(!pskb_may_pull(skb, HSR_TAGLEN))) + goto err_free; + + hsr_tag = (struct hsr_tag *) skb->data; + skb->protocol = hsr_tag->encap_proto; + skb_pull(skb, HSR_TAGLEN); + + return skb; + +err_free: + kfree_skb(skb); + return NULL; +} + + +/* The uses I can see for these HSR supervision frames are: + * 1) Use the frames that are sent after node initialization ("HSR_TLV.Type = + * 22") to reset any sequence_nr counters belonging to that node. Useful if + * the other node's counter has been reset for some reason. + * -- + * Or not - resetting the counter and bridging the frame would create a + * loop, unfortunately. + * + * 2) Use the LifeCheck frames to detect ring breaks. I.e. if no LifeCheck + * frame is received from a particular node, we know something is wrong. + * We just register these (as with normal frames) and throw them away. + * + * 3) Allow different MAC addresses for the two slave interfaces, using the + * MacAddressA field. + */ +static bool is_supervision_frame(struct hsr_priv *hsr_priv, struct sk_buff *skb) +{ + struct hsr_sup_tag *hsr_stag; + + if (!ether_addr_equal(eth_hdr(skb)->h_dest, + hsr_priv->sup_multicast_addr)) + return false; + + hsr_stag = (struct hsr_sup_tag *) skb->data; + if (get_hsr_stag_path(hsr_stag) != 0x0f) + return false; + if ((hsr_stag->HSR_TLV_Type != HSR_TLV_ANNOUNCE) && + (hsr_stag->HSR_TLV_Type != HSR_TLV_LIFE_CHECK)) + return false; + if (hsr_stag->HSR_TLV_Length != 12) + return false; + + return true; +} + + +/* Implementation somewhat according to IEC-62439-3, p. 43 + */ +static int hsr_rcv(struct sk_buff *skb, struct net_device *dev, + struct packet_type *pt, struct net_device *orig_dev) +{ + struct hsr_priv *hsr_priv; + struct net_device *other_slave; + struct node_entry *node; + bool deliver_to_self; + struct sk_buff *skb_deliver; + enum hsr_dev_idx dev_in_idx, dev_other_idx; + bool dup_out; + int ret; + + hsr_priv = get_hsr_master(dev); + + if (!hsr_priv) { + /* Non-HSR-slave device 'dev' is connected to a HSR network */ + kfree_skb(skb); + dev->stats.rx_errors++; + return NET_RX_SUCCESS; + } + + if (dev == hsr_priv->slave[0]) { + dev_in_idx = HSR_DEV_SLAVE_A; + dev_other_idx = HSR_DEV_SLAVE_B; + } else { + dev_in_idx = HSR_DEV_SLAVE_B; + dev_other_idx = HSR_DEV_SLAVE_A; + } + + node = hsr_find_node(&hsr_priv->self_node_db, skb); + if (node) { + /* Always kill frames sent by ourselves */ + kfree_skb(skb); + return NET_RX_SUCCESS; + } + + /* Is this frame a candidate for local reception? */ + deliver_to_self = false; + if ((skb->pkt_type == PACKET_HOST) || + (skb->pkt_type == PACKET_MULTICAST) || + (skb->pkt_type == PACKET_BROADCAST)) + deliver_to_self = true; + else if (ether_addr_equal(eth_hdr(skb)->h_dest, + hsr_priv->dev->dev_addr)) { + skb->pkt_type = PACKET_HOST; + deliver_to_self = true; + } + + + rcu_read_lock(); /* node_db */ + node = hsr_find_node(&hsr_priv->node_db, skb); + + if (is_supervision_frame(hsr_priv, skb)) { + skb_pull(skb, sizeof(struct hsr_sup_tag)); + node = hsr_merge_node(hsr_priv, node, skb, dev_in_idx); + if (!node) { + rcu_read_unlock(); /* node_db */ + kfree_skb(skb); + hsr_priv->dev->stats.rx_dropped++; + return NET_RX_DROP; + } + skb_push(skb, sizeof(struct hsr_sup_tag)); + deliver_to_self = false; + } + + if (!node) { + /* Source node unknown; this might be a HSR frame from + * another net (different multicast address). Ignore it. + */ + rcu_read_unlock(); /* node_db */ + kfree_skb(skb); + return NET_RX_SUCCESS; + } + + /* Register ALL incoming frames as outgoing through the other interface. + * This allows us to register frames as incoming only if they are valid + * for the receiving interface, without using a specific counter for + * incoming frames. + */ + dup_out = hsr_register_frame_out(node, dev_other_idx, skb); + if (!dup_out) + hsr_register_frame_in(node, dev_in_idx); + + /* Forward this frame? */ + if (!dup_out && (skb->pkt_type != PACKET_HOST)) + other_slave = get_other_slave(hsr_priv, dev); + else + other_slave = NULL; + + if (hsr_register_frame_out(node, HSR_DEV_MASTER, skb)) + deliver_to_self = false; + + rcu_read_unlock(); /* node_db */ + + if (!deliver_to_self && !other_slave) { + kfree_skb(skb); + /* Circulated frame; silently remove it. */ + return NET_RX_SUCCESS; + } + + skb_deliver = skb; + if (deliver_to_self && other_slave) { + /* skb_clone() is not enough since we will strip the hsr tag + * and do address substitution below + */ + skb_deliver = pskb_copy(skb, GFP_ATOMIC); + if (!skb_deliver) { + deliver_to_self = false; + hsr_priv->dev->stats.rx_dropped++; + } + } + + if (deliver_to_self) { + bool multicast_frame; + + skb_deliver = hsr_pull_tag(skb_deliver); + if (!skb_deliver) { + hsr_priv->dev->stats.rx_dropped++; + goto forward; + } +#if !defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) + /* Move everything in the header that is after the HSR tag, + * to work around alignment problems caused by the 6-byte HSR + * tag. In practice, this removes/overwrites the HSR tag in + * the header and restores a "standard" packet. + */ + memmove(skb_deliver->data - HSR_TAGLEN, skb_deliver->data, + skb_headlen(skb_deliver)); + + /* Adjust skb members so they correspond with the move above. + * This cannot possibly underflow skb->data since hsr_pull_tag() + * above succeeded. + * At this point in the protocol stack, the transport and + * network headers have not been set yet, and we haven't touched + * the mac header nor the head. So we only need to adjust data + * and tail: + */ + skb_deliver->data -= HSR_TAGLEN; + skb_deliver->tail -= HSR_TAGLEN; +#endif + skb_deliver->dev = hsr_priv->dev; + hsr_addr_subst_source(hsr_priv, skb_deliver); + multicast_frame = (skb_deliver->pkt_type == PACKET_MULTICAST); + ret = netif_rx(skb_deliver); + if (ret == NET_RX_DROP) { + hsr_priv->dev->stats.rx_dropped++; + } else { + hsr_priv->dev->stats.rx_packets++; + hsr_priv->dev->stats.rx_bytes += skb->len; + if (multicast_frame) + hsr_priv->dev->stats.multicast++; + } + } + +forward: + if (other_slave) { + skb_push(skb, ETH_HLEN); + skb->dev = other_slave; + dev_queue_xmit(skb); + } + + return NET_RX_SUCCESS; +} + + +static struct packet_type hsr_pt __read_mostly = { + .type = htons(ETH_P_PRP), + .func = hsr_rcv, +}; + +static struct notifier_block hsr_nb = { + .notifier_call = hsr_netdev_notify, /* Slave event notifications */ +}; + + +static int __init hsr_init(void) +{ + int res; + + BUILD_BUG_ON(sizeof(struct hsr_tag) != HSR_TAGLEN); + + dev_add_pack(&hsr_pt); + + init_timer(&prune_timer); + prune_timer.function = prune_nodes_all; + prune_timer.data = 0; + prune_timer.expires = jiffies + msecs_to_jiffies(PRUNE_PERIOD); + add_timer(&prune_timer); + + register_netdevice_notifier(&hsr_nb); + + res = hsr_netlink_init(); + + return res; +} + +static void __exit hsr_exit(void) +{ + unregister_netdevice_notifier(&hsr_nb); + del_timer(&prune_timer); + hsr_netlink_exit(); + dev_remove_pack(&hsr_pt); +} + +module_init(hsr_init); +module_exit(hsr_exit); +MODULE_LICENSE("GPL"); diff --git a/net/hsr/hsr_main.h b/net/hsr/hsr_main.h new file mode 100644 index 000000000000..56fe060c0ab1 --- /dev/null +++ b/net/hsr/hsr_main.h @@ -0,0 +1,166 @@ +/* Copyright 2011-2013 Autronica Fire and Security AS + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the Free + * Software Foundation; either version 2 of the License, or (at your option) + * any later version. + * + * Author(s): + * 2011-2013 Arvid Brodin, arvid.brodin@xdin.com + */ + +#ifndef _HSR_PRIVATE_H +#define _HSR_PRIVATE_H + +#include <linux/netdevice.h> +#include <linux/list.h> + + +/* Time constants as specified in the HSR specification (IEC-62439-3 2010) + * Table 8. + * All values in milliseconds. + */ +#define HSR_LIFE_CHECK_INTERVAL 2000 /* ms */ +#define HSR_NODE_FORGET_TIME 60000 /* ms */ +#define HSR_ANNOUNCE_INTERVAL 100 /* ms */ + + +/* By how much may slave1 and slave2 timestamps of latest received frame from + * each node differ before we notify of communication problem? + */ +#define MAX_SLAVE_DIFF 3000 /* ms */ + + +/* How often shall we check for broken ring and remove node entries older than + * HSR_NODE_FORGET_TIME? + */ +#define PRUNE_PERIOD 3000 /* ms */ + + +#define HSR_TLV_ANNOUNCE 22 +#define HSR_TLV_LIFE_CHECK 23 + + +/* HSR Tag. + * As defined in IEC-62439-3:2010, the HSR tag is really { ethertype = 0x88FB, + * path, LSDU_size, sequence Nr }. But we let eth_header() create { h_dest, + * h_source, h_proto = 0x88FB }, and add { path, LSDU_size, sequence Nr, + * encapsulated protocol } instead. + */ +#define HSR_TAGLEN 6 + +/* Field names below as defined in the IEC:2010 standard for HSR. */ +struct hsr_tag { + __be16 path_and_LSDU_size; + __be16 sequence_nr; + __be16 encap_proto; +} __packed; + + +/* The helper functions below assumes that 'path' occupies the 4 most + * significant bits of the 16-bit field shared by 'path' and 'LSDU_size' (or + * equivalently, the 4 most significant bits of HSR tag byte 14). + * + * This is unclear in the IEC specification; its definition of MAC addresses + * indicates the spec is written with the least significant bit first (to the + * left). This, however, would mean that the LSDU field would be split in two + * with the path field in-between, which seems strange. I'm guessing the MAC + * address definition is in error. + */ +static inline u16 get_hsr_tag_path(struct hsr_tag *ht) +{ + return ntohs(ht->path_and_LSDU_size) >> 12; +} + +static inline u16 get_hsr_tag_LSDU_size(struct hsr_tag *ht) +{ + return ntohs(ht->path_and_LSDU_size) & 0x0FFF; +} + +static inline void set_hsr_tag_path(struct hsr_tag *ht, u16 path) +{ + ht->path_and_LSDU_size = htons( + (ntohs(ht->path_and_LSDU_size) & 0x0FFF) | (path << 12)); +} + +static inline void set_hsr_tag_LSDU_size(struct hsr_tag *ht, u16 LSDU_size) +{ + ht->path_and_LSDU_size = htons( + (ntohs(ht->path_and_LSDU_size) & 0xF000) | + (LSDU_size & 0x0FFF)); +} + +struct hsr_ethhdr { + struct ethhdr ethhdr; + struct hsr_tag hsr_tag; +} __packed; + + +/* HSR Supervision Frame data types. + * Field names as defined in the IEC:2010 standard for HSR. + */ +struct hsr_sup_tag { + __be16 path_and_HSR_Ver; + __be16 sequence_nr; + __u8 HSR_TLV_Type; + __u8 HSR_TLV_Length; +} __packed; + +struct hsr_sup_payload { + unsigned char MacAddressA[ETH_ALEN]; +} __packed; + +static inline u16 get_hsr_stag_path(struct hsr_sup_tag *hst) +{ + return get_hsr_tag_path((struct hsr_tag *) hst); +} + +static inline u16 get_hsr_stag_HSR_ver(struct hsr_sup_tag *hst) +{ + return get_hsr_tag_LSDU_size((struct hsr_tag *) hst); +} + +static inline void set_hsr_stag_path(struct hsr_sup_tag *hst, u16 path) +{ + set_hsr_tag_path((struct hsr_tag *) hst, path); +} + +static inline void set_hsr_stag_HSR_Ver(struct hsr_sup_tag *hst, u16 HSR_Ver) +{ + set_hsr_tag_LSDU_size((struct hsr_tag *) hst, HSR_Ver); +} + +struct hsr_ethhdr_sp { + struct ethhdr ethhdr; + struct hsr_sup_tag hsr_sup; +} __packed; + + +enum hsr_dev_idx { + HSR_DEV_NONE = -1, + HSR_DEV_SLAVE_A = 0, + HSR_DEV_SLAVE_B, + HSR_DEV_MASTER, +}; +#define HSR_MAX_SLAVE (HSR_DEV_SLAVE_B + 1) +#define HSR_MAX_DEV (HSR_DEV_MASTER + 1) + +struct hsr_priv { + struct list_head hsr_list; /* List of hsr devices */ + struct rcu_head rcu_head; + struct net_device *dev; + struct net_device *slave[HSR_MAX_SLAVE]; + struct list_head node_db; /* Other HSR nodes */ + struct list_head self_node_db; /* MACs of slaves */ + struct timer_list announce_timer; /* Supervision frame dispatch */ + int announce_count; + u16 sequence_nr; + spinlock_t seqnr_lock; /* locking for sequence_nr */ + unsigned char sup_multicast_addr[ETH_ALEN]; +}; + +void register_hsr_master(struct hsr_priv *hsr_priv); +void unregister_hsr_master(struct hsr_priv *hsr_priv); +bool is_hsr_slave(struct net_device *dev); + +#endif /* _HSR_PRIVATE_H */ diff --git a/net/hsr/hsr_netlink.c b/net/hsr/hsr_netlink.c new file mode 100644 index 000000000000..4e66bf61f585 --- /dev/null +++ b/net/hsr/hsr_netlink.c @@ -0,0 +1,457 @@ +/* Copyright 2011-2013 Autronica Fire and Security AS + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the Free + * Software Foundation; either version 2 of the License, or (at your option) + * any later version. + * + * Author(s): + * 2011-2013 Arvid Brodin, arvid.brodin@xdin.com + * + * Routines for handling Netlink messages for HSR. + */ + +#include "hsr_netlink.h" +#include <linux/kernel.h> +#include <net/rtnetlink.h> +#include <net/genetlink.h> +#include "hsr_main.h" +#include "hsr_device.h" +#include "hsr_framereg.h" + +static const struct nla_policy hsr_policy[IFLA_HSR_MAX + 1] = { + [IFLA_HSR_SLAVE1] = { .type = NLA_U32 }, + [IFLA_HSR_SLAVE2] = { .type = NLA_U32 }, + [IFLA_HSR_MULTICAST_SPEC] = { .type = NLA_U8 }, +}; + + +/* Here, it seems a netdevice has already been allocated for us, and the + * hsr_dev_setup routine has been executed. Nice! + */ +static int hsr_newlink(struct net *src_net, struct net_device *dev, + struct nlattr *tb[], struct nlattr *data[]) +{ + struct net_device *link[2]; + unsigned char multicast_spec; + + if (!data[IFLA_HSR_SLAVE1]) { + netdev_info(dev, "IFLA_HSR_SLAVE1 missing!\n"); + return -EINVAL; + } + link[0] = __dev_get_by_index(src_net, nla_get_u32(data[IFLA_HSR_SLAVE1])); + if (!data[IFLA_HSR_SLAVE2]) { + netdev_info(dev, "IFLA_HSR_SLAVE2 missing!\n"); + return -EINVAL; + } + link[1] = __dev_get_by_index(src_net, nla_get_u32(data[IFLA_HSR_SLAVE2])); + + if (!link[0] || !link[1]) + return -ENODEV; + if (link[0] == link[1]) + return -EINVAL; + + if (!data[IFLA_HSR_MULTICAST_SPEC]) + multicast_spec = 0; + else + multicast_spec = nla_get_u8(data[IFLA_HSR_MULTICAST_SPEC]); + + return hsr_dev_finalize(dev, link, multicast_spec); +} + +static struct rtnl_link_ops hsr_link_ops __read_mostly = { + .kind = "hsr", + .maxtype = IFLA_HSR_MAX, + .policy = hsr_policy, + .priv_size = sizeof(struct hsr_priv), + .setup = hsr_dev_setup, + .newlink = hsr_newlink, +}; + + + +/* attribute policy */ +/* NLA_BINARY missing in libnl; use NLA_UNSPEC in userspace instead. */ +static const struct nla_policy hsr_genl_policy[HSR_A_MAX + 1] = { + [HSR_A_NODE_ADDR] = { .type = NLA_BINARY, .len = ETH_ALEN }, + [HSR_A_NODE_ADDR_B] = { .type = NLA_BINARY, .len = ETH_ALEN }, + [HSR_A_IFINDEX] = { .type = NLA_U32 }, + [HSR_A_IF1_AGE] = { .type = NLA_U32 }, + [HSR_A_IF2_AGE] = { .type = NLA_U32 }, + [HSR_A_IF1_SEQ] = { .type = NLA_U16 }, + [HSR_A_IF2_SEQ] = { .type = NLA_U16 }, +}; + +static struct genl_family hsr_genl_family = { + .id = GENL_ID_GENERATE, + .hdrsize = 0, + .name = "HSR", + .version = 1, + .maxattr = HSR_A_MAX, +}; + +static struct genl_multicast_group hsr_network_genl_mcgrp = { + .name = "hsr-network", +}; + + + +/* This is called if for some node with MAC address addr, we only get frames + * over one of the slave interfaces. This would indicate an open network ring + * (i.e. a link has failed somewhere). + */ +void hsr_nl_ringerror(struct hsr_priv *hsr_priv, unsigned char addr[ETH_ALEN], + enum hsr_dev_idx dev_idx) +{ + struct sk_buff *skb; + void *msg_head; + int res; + int ifindex; + + skb = genlmsg_new(NLMSG_GOODSIZE, GFP_ATOMIC); + if (!skb) + goto fail; + + msg_head = genlmsg_put(skb, 0, 0, &hsr_genl_family, 0, HSR_C_RING_ERROR); + if (!msg_head) + goto nla_put_failure; + + res = nla_put(skb, HSR_A_NODE_ADDR, ETH_ALEN, addr); + if (res < 0) + goto nla_put_failure; + + if (hsr_priv->slave[dev_idx]) + ifindex = hsr_priv->slave[dev_idx]->ifindex; + else + ifindex = -1; + res = nla_put_u32(skb, HSR_A_IFINDEX, ifindex); + if (res < 0) + goto nla_put_failure; + + genlmsg_end(skb, msg_head); + genlmsg_multicast(skb, 0, hsr_network_genl_mcgrp.id, GFP_ATOMIC); + + return; + +nla_put_failure: + kfree_skb(skb); + +fail: + netdev_warn(hsr_priv->dev, "Could not send HSR ring error message\n"); +} + +/* This is called when we haven't heard from the node with MAC address addr for + * some time (just before the node is removed from the node table/list). + */ +void hsr_nl_nodedown(struct hsr_priv *hsr_priv, unsigned char addr[ETH_ALEN]) +{ + struct sk_buff *skb; + void *msg_head; + int res; + + skb = genlmsg_new(NLMSG_GOODSIZE, GFP_ATOMIC); + if (!skb) + goto fail; + + msg_head = genlmsg_put(skb, 0, 0, &hsr_genl_family, 0, HSR_C_NODE_DOWN); + if (!msg_head) + goto nla_put_failure; + + + res = nla_put(skb, HSR_A_NODE_ADDR, ETH_ALEN, addr); + if (res < 0) + goto nla_put_failure; + + genlmsg_end(skb, msg_head); + genlmsg_multicast(skb, 0, hsr_network_genl_mcgrp.id, GFP_ATOMIC); + + return; + +nla_put_failure: + kfree_skb(skb); + +fail: + netdev_warn(hsr_priv->dev, "Could not send HSR node down\n"); +} + + +/* HSR_C_GET_NODE_STATUS lets userspace query the internal HSR node table + * about the status of a specific node in the network, defined by its MAC + * address. + * + * Input: hsr ifindex, node mac address + * Output: hsr ifindex, node mac address (copied from request), + * age of latest frame from node over slave 1, slave 2 [ms] + */ +static int hsr_get_node_status(struct sk_buff *skb_in, struct genl_info *info) +{ + /* For receiving */ + struct nlattr *na; + struct net_device *hsr_dev; + + /* For sending */ + struct sk_buff *skb_out; + void *msg_head; + struct hsr_priv *hsr_priv; + unsigned char hsr_node_addr_b[ETH_ALEN]; + int hsr_node_if1_age; + u16 hsr_node_if1_seq; + int hsr_node_if2_age; + u16 hsr_node_if2_seq; + int addr_b_ifindex; + int res; + + if (!info) + goto invalid; + + na = info->attrs[HSR_A_IFINDEX]; + if (!na) + goto invalid; + na = info->attrs[HSR_A_NODE_ADDR]; + if (!na) + goto invalid; + + hsr_dev = __dev_get_by_index(genl_info_net(info), + nla_get_u32(info->attrs[HSR_A_IFINDEX])); + if (!hsr_dev) + goto invalid; + if (!is_hsr_master(hsr_dev)) + goto invalid; + + + /* Send reply */ + + skb_out = genlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL); + if (!skb_out) { + res = -ENOMEM; + goto fail; + } + + msg_head = genlmsg_put(skb_out, NETLINK_CB(skb_in).portid, + info->snd_seq, &hsr_genl_family, 0, + HSR_C_SET_NODE_STATUS); + if (!msg_head) { + res = -ENOMEM; + goto nla_put_failure; + } + + res = nla_put_u32(skb_out, HSR_A_IFINDEX, hsr_dev->ifindex); + if (res < 0) + goto nla_put_failure; + + hsr_priv = netdev_priv(hsr_dev); + res = hsr_get_node_data(hsr_priv, + (unsigned char *) nla_data(info->attrs[HSR_A_NODE_ADDR]), + hsr_node_addr_b, + &addr_b_ifindex, + &hsr_node_if1_age, + &hsr_node_if1_seq, + &hsr_node_if2_age, + &hsr_node_if2_seq); + if (res < 0) + goto fail; + + res = nla_put(skb_out, HSR_A_NODE_ADDR, ETH_ALEN, + nla_data(info->attrs[HSR_A_NODE_ADDR])); + if (res < 0) + goto nla_put_failure; + + if (addr_b_ifindex > -1) { + res = nla_put(skb_out, HSR_A_NODE_ADDR_B, ETH_ALEN, + hsr_node_addr_b); + if (res < 0) + goto nla_put_failure; + + res = nla_put_u32(skb_out, HSR_A_ADDR_B_IFINDEX, addr_b_ifindex); + if (res < 0) + goto nla_put_failure; + } + + res = nla_put_u32(skb_out, HSR_A_IF1_AGE, hsr_node_if1_age); + if (res < 0) + goto nla_put_failure; + res = nla_put_u16(skb_out, HSR_A_IF1_SEQ, hsr_node_if1_seq); + if (res < 0) + goto nla_put_failure; + if (hsr_priv->slave[0]) + res = nla_put_u32(skb_out, HSR_A_IF1_IFINDEX, + hsr_priv->slave[0]->ifindex); + if (res < 0) + goto nla_put_failure; + + res = nla_put_u32(skb_out, HSR_A_IF2_AGE, hsr_node_if2_age); + if (res < 0) + goto nla_put_failure; + res = nla_put_u16(skb_out, HSR_A_IF2_SEQ, hsr_node_if2_seq); + if (res < 0) + goto nla_put_failure; + if (hsr_priv->slave[1]) + res = nla_put_u32(skb_out, HSR_A_IF2_IFINDEX, + hsr_priv->slave[1]->ifindex); + + genlmsg_end(skb_out, msg_head); + genlmsg_unicast(genl_info_net(info), skb_out, info->snd_portid); + + return 0; + +invalid: + netlink_ack(skb_in, nlmsg_hdr(skb_in), -EINVAL); + return 0; + +nla_put_failure: + kfree_skb(skb_out); + /* Fall through */ + +fail: + return res; +} + +static struct genl_ops hsr_ops_get_node_status = { + .cmd = HSR_C_GET_NODE_STATUS, + .flags = 0, + .policy = hsr_genl_policy, + .doit = hsr_get_node_status, + .dumpit = NULL, +}; + + +/* Get a list of MacAddressA of all nodes known to this node (other than self). + */ +static int hsr_get_node_list(struct sk_buff *skb_in, struct genl_info *info) +{ + /* For receiving */ + struct nlattr *na; + struct net_device *hsr_dev; + + /* For sending */ + struct sk_buff *skb_out; + void *msg_head; + struct hsr_priv *hsr_priv; + void *pos; + unsigned char addr[ETH_ALEN]; + int res; + + if (!info) + goto invalid; + + na = info->attrs[HSR_A_IFINDEX]; + if (!na) + goto invalid; + + hsr_dev = __dev_get_by_index(genl_info_net(info), + nla_get_u32(info->attrs[HSR_A_IFINDEX])); + if (!hsr_dev) + goto invalid; + if (!is_hsr_master(hsr_dev)) + goto invalid; + + + /* Send reply */ + + skb_out = genlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL); + if (!skb_out) { + res = -ENOMEM; + goto fail; + } + + msg_head = genlmsg_put(skb_out, NETLINK_CB(skb_in).portid, + info->snd_seq, &hsr_genl_family, 0, + HSR_C_SET_NODE_LIST); + if (!msg_head) { + res = -ENOMEM; + goto nla_put_failure; + } + + res = nla_put_u32(skb_out, HSR_A_IFINDEX, hsr_dev->ifindex); + if (res < 0) + goto nla_put_failure; + + hsr_priv = netdev_priv(hsr_dev); + + rcu_read_lock(); + pos = hsr_get_next_node(hsr_priv, NULL, addr); + while (pos) { + res = nla_put(skb_out, HSR_A_NODE_ADDR, ETH_ALEN, addr); + if (res < 0) { + rcu_read_unlock(); + goto nla_put_failure; + } + pos = hsr_get_next_node(hsr_priv, pos, addr); + } + rcu_read_unlock(); + + genlmsg_end(skb_out, msg_head); + genlmsg_unicast(genl_info_net(info), skb_out, info->snd_portid); + + return 0; + +invalid: + netlink_ack(skb_in, nlmsg_hdr(skb_in), -EINVAL); + return 0; + +nla_put_failure: + kfree_skb(skb_out); + /* Fall through */ + +fail: + return res; +} + + +static struct genl_ops hsr_ops_get_node_list = { + .cmd = HSR_C_GET_NODE_LIST, + .flags = 0, + .policy = hsr_genl_policy, + .doit = hsr_get_node_list, + .dumpit = NULL, +}; + +int __init hsr_netlink_init(void) +{ + int rc; + + rc = rtnl_link_register(&hsr_link_ops); + if (rc) + goto fail_rtnl_link_register; + + rc = genl_register_family(&hsr_genl_family); + if (rc) + goto fail_genl_register_family; + + rc = genl_register_ops(&hsr_genl_family, &hsr_ops_get_node_status); + if (rc) + goto fail_genl_register_ops; + + rc = genl_register_ops(&hsr_genl_family, &hsr_ops_get_node_list); + if (rc) + goto fail_genl_register_ops_node_list; + + rc = genl_register_mc_group(&hsr_genl_family, &hsr_network_genl_mcgrp); + if (rc) + goto fail_genl_register_mc_group; + + return 0; + +fail_genl_register_mc_group: + genl_unregister_ops(&hsr_genl_family, &hsr_ops_get_node_list); +fail_genl_register_ops_node_list: + genl_unregister_ops(&hsr_genl_family, &hsr_ops_get_node_status); +fail_genl_register_ops: + genl_unregister_family(&hsr_genl_family); +fail_genl_register_family: + rtnl_link_unregister(&hsr_link_ops); +fail_rtnl_link_register: + + return rc; +} + +void __exit hsr_netlink_exit(void) +{ + genl_unregister_mc_group(&hsr_genl_family, &hsr_network_genl_mcgrp); + genl_unregister_ops(&hsr_genl_family, &hsr_ops_get_node_status); + genl_unregister_family(&hsr_genl_family); + + rtnl_link_unregister(&hsr_link_ops); +} + +MODULE_ALIAS_RTNL_LINK("hsr"); diff --git a/net/hsr/hsr_netlink.h b/net/hsr/hsr_netlink.h new file mode 100644 index 000000000000..d4579dcc3c7d --- /dev/null +++ b/net/hsr/hsr_netlink.h @@ -0,0 +1,30 @@ +/* Copyright 2011-2013 Autronica Fire and Security AS + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the Free + * Software Foundation; either version 2 of the License, or (at your option) + * any later version. + * + * Author(s): + * 2011-2013 Arvid Brodin, arvid.brodin@xdin.com + */ + +#ifndef __HSR_NETLINK_H +#define __HSR_NETLINK_H + +#include <linux/if_ether.h> +#include <linux/module.h> +#include <uapi/linux/hsr_netlink.h> + +struct hsr_priv; + +int __init hsr_netlink_init(void); +void __exit hsr_netlink_exit(void); + +void hsr_nl_ringerror(struct hsr_priv *hsr_priv, unsigned char addr[ETH_ALEN], + int dev_idx); +void hsr_nl_nodedown(struct hsr_priv *hsr_priv, unsigned char addr[ETH_ALEN]); +void hsr_nl_framedrop(int dropcount, int dev_idx); +void hsr_nl_linkdown(int dev_idx); + +#endif /* __HSR_NETLINK_H */ diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c index ff41b4d60d30..426b5df1c98f 100644 --- a/net/ieee802154/6lowpan.c +++ b/net/ieee802154/6lowpan.c @@ -440,7 +440,6 @@ lowpan_uncompress_udp_header(struct sk_buff *skb, struct udphdr *uh) default: pr_debug("ERROR: unknown UDP format\n"); goto err; - break; } pr_debug("uncompressed UDP ports: src = %d, dst = %d\n", @@ -655,7 +654,9 @@ static int lowpan_header_create(struct sk_buff *skb, head[1] = iphc1; skb_pull(skb, sizeof(struct ipv6hdr)); + skb_reset_transport_header(skb); memcpy(skb_push(skb, hc06_ptr - head), head, hc06_ptr - head); + skb_reset_network_header(skb); lowpan_raw_dump_table(__func__, "raw skb data dump", skb->data, skb->len); @@ -738,7 +739,6 @@ static int lowpan_skb_deliver(struct sk_buff *skb, struct ipv6hdr *hdr) return -ENOMEM; skb_push(new, sizeof(struct ipv6hdr)); - skb_reset_network_header(new); skb_copy_to_linear_data(new, hdr, sizeof(struct ipv6hdr)); new->protocol = htons(ETH_P_IPV6); @@ -785,7 +785,6 @@ lowpan_alloc_new_frame(struct sk_buff *skb, u16 len, u16 tag) goto skb_err; frame->skb->priority = skb->priority; - frame->skb->dev = skb->dev; /* reserve headroom for uncompressed ipv6 header */ skb_reserve(frame->skb, sizeof(struct ipv6hdr)); @@ -1061,7 +1060,6 @@ lowpan_process_data(struct sk_buff *skb) skb = new; skb_push(skb, sizeof(struct udphdr)); - skb_reset_transport_header(skb); skb_copy_to_linear_data(skb, &uh, sizeof(struct udphdr)); lowpan_raw_dump_table(__func__, "raw UDP header dump", @@ -1104,50 +1102,40 @@ static int lowpan_set_address(struct net_device *dev, void *p) return 0; } -static int lowpan_get_mac_header_length(struct sk_buff *skb) -{ - /* - * Currently long addressing mode is supported only, so the overall - * header size is 21: - * FC SeqNum DPAN DA SA Sec - * 2 + 1 + 2 + 8 + 8 + 0 = 21 - */ - return 21; -} - static int lowpan_fragment_xmit(struct sk_buff *skb, u8 *head, int mlen, int plen, int offset, int type) { struct sk_buff *frag; - int hlen, ret; + int hlen; hlen = (type == LOWPAN_DISPATCH_FRAG1) ? LOWPAN_FRAG1_HEAD_SIZE : LOWPAN_FRAGN_HEAD_SIZE; lowpan_raw_dump_inline(__func__, "6lowpan fragment header", head, hlen); - frag = dev_alloc_skb(hlen + mlen + plen + IEEE802154_MFR_SIZE); + frag = netdev_alloc_skb(skb->dev, + hlen + mlen + plen + IEEE802154_MFR_SIZE); if (!frag) return -ENOMEM; frag->priority = skb->priority; - frag->dev = skb->dev; /* copy header, MFR and payload */ - memcpy(skb_put(frag, mlen), skb->data, mlen); - memcpy(skb_put(frag, hlen), head, hlen); + skb_put(frag, mlen); + skb_copy_to_linear_data(frag, skb_mac_header(skb), mlen); + + skb_put(frag, hlen); + skb_copy_to_linear_data_offset(frag, mlen, head, hlen); - if (plen) - skb_copy_from_linear_data_offset(skb, offset + mlen, - skb_put(frag, plen), plen); + skb_put(frag, plen); + skb_copy_to_linear_data_offset(frag, mlen + hlen, + skb_network_header(skb) + offset, plen); lowpan_raw_dump_table(__func__, " raw fragment dump", frag->data, frag->len); - ret = dev_queue_xmit(frag); - - return ret; + return dev_queue_xmit(frag); } static int @@ -1156,7 +1144,7 @@ lowpan_skb_fragmentation(struct sk_buff *skb, struct net_device *dev) int err, header_length, payload_length, tag, offset = 0; u8 head[5]; - header_length = lowpan_get_mac_header_length(skb); + header_length = skb->mac_len; payload_length = skb->len - header_length; tag = lowpan_dev_info(dev)->fragment_tag++; @@ -1181,7 +1169,7 @@ lowpan_skb_fragmentation(struct sk_buff *skb, struct net_device *dev) head[0] &= ~LOWPAN_DISPATCH_FRAG1; head[0] |= LOWPAN_DISPATCH_FRAGN; - while ((payload_length - offset > 0) && (err >= 0)) { + while (payload_length - offset > 0) { int len = LOWPAN_FRAG_SIZE; head[4] = offset / 8; @@ -1327,8 +1315,6 @@ static int lowpan_rcv(struct sk_buff *skb, struct net_device *dev, /* Pull off the 1-byte of 6lowpan header. */ skb_pull(local_skb, 1); - skb_reset_network_header(local_skb); - skb_set_transport_header(local_skb, sizeof(struct ipv6hdr)); lowpan_give_skb_to_devices(local_skb); @@ -1372,8 +1358,10 @@ static int lowpan_newlink(struct net *src_net, struct net_device *dev, real_dev = dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK])); if (!real_dev) return -ENODEV; - if (real_dev->type != ARPHRD_IEEE802154) + if (real_dev->type != ARPHRD_IEEE802154) { + dev_put(real_dev); return -EINVAL; + } lowpan_dev_info(dev)->real_dev = real_dev; lowpan_dev_info(dev)->fragment_tag = 0; diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index cfeb85cff4f0..68af9aac91d0 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -245,29 +245,6 @@ out: } EXPORT_SYMBOL(inet_listen); -u32 inet_ehash_secret __read_mostly; -EXPORT_SYMBOL(inet_ehash_secret); - -u32 ipv6_hash_secret __read_mostly; -EXPORT_SYMBOL(ipv6_hash_secret); - -/* - * inet_ehash_secret must be set exactly once, and to a non nul value - * ipv6_hash_secret must be set exactly once. - */ -void build_ehash_secret(void) -{ - u32 rnd; - - do { - get_random_bytes(&rnd, sizeof(rnd)); - } while (rnd == 0); - - if (cmpxchg(&inet_ehash_secret, 0, rnd) == 0) - get_random_bytes(&ipv6_hash_secret, sizeof(ipv6_hash_secret)); -} -EXPORT_SYMBOL(build_ehash_secret); - /* * Create an inet socket. */ @@ -284,10 +261,6 @@ static int inet_create(struct net *net, struct socket *sock, int protocol, int try_loading_module = 0; int err; - if (unlikely(!inet_ehash_secret)) - if (sock->type != SOCK_RAW && sock->type != SOCK_DGRAM) - build_ehash_secret(); - sock->state = SS_UNCONNECTED; /* Look for the requested type/protocol pair. */ @@ -1254,36 +1227,36 @@ static int inet_gso_send_check(struct sk_buff *skb) if (ihl < sizeof(*iph)) goto out; + proto = iph->protocol; + + /* Warning: after this point, iph might be no longer valid */ if (unlikely(!pskb_may_pull(skb, ihl))) goto out; - __skb_pull(skb, ihl); + skb_reset_transport_header(skb); - iph = ip_hdr(skb); - proto = iph->protocol; err = -EPROTONOSUPPORT; - rcu_read_lock(); ops = rcu_dereference(inet_offloads[proto]); if (likely(ops && ops->callbacks.gso_send_check)) err = ops->callbacks.gso_send_check(skb); - rcu_read_unlock(); out: return err; } static struct sk_buff *inet_gso_segment(struct sk_buff *skb, - netdev_features_t features) + netdev_features_t features) { struct sk_buff *segs = ERR_PTR(-EINVAL); const struct net_offload *ops; + unsigned int offset = 0; + bool udpfrag, encap; struct iphdr *iph; int proto; + int nhoff; int ihl; int id; - unsigned int offset = 0; - bool tunnel; if (unlikely(skb_shinfo(skb)->gso_type & ~(SKB_GSO_TCPV4 | @@ -1291,12 +1264,16 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb, SKB_GSO_DODGY | SKB_GSO_TCP_ECN | SKB_GSO_GRE | + SKB_GSO_IPIP | + SKB_GSO_SIT | SKB_GSO_TCPV6 | SKB_GSO_UDP_TUNNEL | SKB_GSO_MPLS | 0))) goto out; + skb_reset_network_header(skb); + nhoff = skb_network_header(skb) - skb_mac_header(skb); if (unlikely(!pskb_may_pull(skb, sizeof(*iph)))) goto out; @@ -1305,42 +1282,50 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb, if (ihl < sizeof(*iph)) goto out; + id = ntohs(iph->id); + proto = iph->protocol; + + /* Warning: after this point, iph might be no longer valid */ if (unlikely(!pskb_may_pull(skb, ihl))) goto out; + __skb_pull(skb, ihl); - tunnel = !!skb->encapsulation; + encap = SKB_GSO_CB(skb)->encap_level > 0; + if (encap) + features = skb->dev->hw_enc_features & netif_skb_features(skb); + SKB_GSO_CB(skb)->encap_level += ihl; - __skb_pull(skb, ihl); skb_reset_transport_header(skb); - iph = ip_hdr(skb); - id = ntohs(iph->id); - proto = iph->protocol; + segs = ERR_PTR(-EPROTONOSUPPORT); - rcu_read_lock(); + /* Note : following gso_segment() might change skb->encapsulation */ + udpfrag = !skb->encapsulation && proto == IPPROTO_UDP; + ops = rcu_dereference(inet_offloads[proto]); if (likely(ops && ops->callbacks.gso_segment)) segs = ops->callbacks.gso_segment(skb, features); - rcu_read_unlock(); if (IS_ERR_OR_NULL(segs)) goto out; skb = segs; do { - iph = ip_hdr(skb); - if (!tunnel && proto == IPPROTO_UDP) { + iph = (struct iphdr *)(skb_mac_header(skb) + nhoff); + if (udpfrag) { iph->id = htons(id); iph->frag_off = htons(offset >> 3); if (skb->next != NULL) iph->frag_off |= htons(IP_MF); - offset += (skb->len - skb->mac_len - iph->ihl * 4); - } else { + offset += skb->len - nhoff - ihl; + } else { iph->id = htons(id++); } - iph->tot_len = htons(skb->len - skb->mac_len); - iph->check = 0; - iph->check = ip_fast_csum(skb_network_header(skb), iph->ihl); + iph->tot_len = htons(skb->len - nhoff); + ip_send_check(iph); + if (encap) + skb_reset_inner_headers(skb); + skb->network_header = (u8 *)iph - skb->head; } while ((skb = skb->next)); out: @@ -1546,6 +1531,7 @@ static const struct net_protocol tcp_protocol = { }; static const struct net_protocol udp_protocol = { + .early_demux = udp_v4_early_demux, .handler = udp_rcv, .err_handler = udp_err, .no_policy = 1, @@ -1646,6 +1632,13 @@ static struct packet_offload ip_packet_offload __read_mostly = { }, }; +static const struct net_offload ipip_offload = { + .callbacks = { + .gso_send_check = inet_gso_send_check, + .gso_segment = inet_gso_segment, + }, +}; + static int __init ipv4_offload_init(void) { /* @@ -1657,6 +1650,7 @@ static int __init ipv4_offload_init(void) pr_crit("%s: Cannot add TCP protocol offload\n", __func__); dev_add_offload(&ip_packet_offload); + inet_add_offload(&ipip_offload, IPPROTO_IPIP); return 0; } @@ -1705,8 +1699,6 @@ static int __init inet_init(void) ip_static_sysctl_init(); #endif - tcp_prot.sysctl_mem = init_net.ipv4.sysctl_tcp_mem; - /* * Add all the base protocols. */ diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c index 109ee89f123e..7785b28061ac 100644 --- a/net/ipv4/esp4.c +++ b/net/ipv4/esp4.c @@ -121,7 +121,6 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb) struct aead_givcrypt_request *req; struct scatterlist *sg; struct scatterlist *asg; - struct esp_data *esp; struct sk_buff *trailer; void *tmp; u8 *iv; @@ -139,8 +138,7 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb) /* skb is pure payload to encrypt */ - esp = x->data; - aead = esp->aead; + aead = x->data; alen = crypto_aead_authsize(aead); tfclen = 0; @@ -154,8 +152,6 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb) } blksize = ALIGN(crypto_aead_blocksize(aead), 4); clen = ALIGN(skb->len + 2 + tfclen, blksize); - if (esp->padlen) - clen = ALIGN(clen, esp->padlen); plen = clen - skb->len - tfclen; err = skb_cow_data(skb, tfclen + plen + alen, &trailer); @@ -280,8 +276,7 @@ static int esp_input_done2(struct sk_buff *skb, int err) { const struct iphdr *iph; struct xfrm_state *x = xfrm_input_state(skb); - struct esp_data *esp = x->data; - struct crypto_aead *aead = esp->aead; + struct crypto_aead *aead = x->data; int alen = crypto_aead_authsize(aead); int hlen = sizeof(struct ip_esp_hdr) + crypto_aead_ivsize(aead); int elen = skb->len - hlen; @@ -376,8 +371,7 @@ static void esp_input_done(struct crypto_async_request *base, int err) static int esp_input(struct xfrm_state *x, struct sk_buff *skb) { struct ip_esp_hdr *esph; - struct esp_data *esp = x->data; - struct crypto_aead *aead = esp->aead; + struct crypto_aead *aead = x->data; struct aead_request *req; struct sk_buff *trailer; int elen = skb->len - sizeof(*esph) - crypto_aead_ivsize(aead); @@ -459,9 +453,8 @@ out: static u32 esp4_get_mtu(struct xfrm_state *x, int mtu) { - struct esp_data *esp = x->data; - u32 blksize = ALIGN(crypto_aead_blocksize(esp->aead), 4); - u32 align = max_t(u32, blksize, esp->padlen); + struct crypto_aead *aead = x->data; + u32 blksize = ALIGN(crypto_aead_blocksize(aead), 4); unsigned int net_adj; switch (x->props.mode) { @@ -476,8 +469,8 @@ static u32 esp4_get_mtu(struct xfrm_state *x, int mtu) BUG(); } - return ((mtu - x->props.header_len - crypto_aead_authsize(esp->aead) - - net_adj) & ~(align - 1)) + net_adj - 2; + return ((mtu - x->props.header_len - crypto_aead_authsize(aead) - + net_adj) & ~(blksize - 1)) + net_adj - 2; } static void esp4_err(struct sk_buff *skb, u32 info) @@ -511,18 +504,16 @@ static void esp4_err(struct sk_buff *skb, u32 info) static void esp_destroy(struct xfrm_state *x) { - struct esp_data *esp = x->data; + struct crypto_aead *aead = x->data; - if (!esp) + if (!aead) return; - crypto_free_aead(esp->aead); - kfree(esp); + crypto_free_aead(aead); } static int esp_init_aead(struct xfrm_state *x) { - struct esp_data *esp = x->data; struct crypto_aead *aead; int err; @@ -531,7 +522,7 @@ static int esp_init_aead(struct xfrm_state *x) if (IS_ERR(aead)) goto error; - esp->aead = aead; + x->data = aead; err = crypto_aead_setkey(aead, x->aead->alg_key, (x->aead->alg_key_len + 7) / 8); @@ -548,7 +539,6 @@ error: static int esp_init_authenc(struct xfrm_state *x) { - struct esp_data *esp = x->data; struct crypto_aead *aead; struct crypto_authenc_key_param *param; struct rtattr *rta; @@ -583,7 +573,7 @@ static int esp_init_authenc(struct xfrm_state *x) if (IS_ERR(aead)) goto error; - esp->aead = aead; + x->data = aead; keylen = (x->aalg ? (x->aalg->alg_key_len + 7) / 8 : 0) + (x->ealg->alg_key_len + 7) / 8 + RTA_SPACE(sizeof(*param)); @@ -638,16 +628,11 @@ error: static int esp_init_state(struct xfrm_state *x) { - struct esp_data *esp; struct crypto_aead *aead; u32 align; int err; - esp = kzalloc(sizeof(*esp), GFP_KERNEL); - if (esp == NULL) - return -ENOMEM; - - x->data = esp; + x->data = NULL; if (x->aead) err = esp_init_aead(x); @@ -657,9 +642,7 @@ static int esp_init_state(struct xfrm_state *x) if (err) goto error; - aead = esp->aead; - - esp->padlen = 0; + aead = x->data; x->props.header_len = sizeof(struct ip_esp_hdr) + crypto_aead_ivsize(aead); @@ -683,9 +666,7 @@ static int esp_init_state(struct xfrm_state *x) } align = ALIGN(crypto_aead_blocksize(aead), 4); - if (esp->padlen) - align = max_t(u32, align, esp->padlen); - x->props.trailer_len = align + 1 + crypto_aead_authsize(esp->aead); + x->props.trailer_len = align + 1 + crypto_aead_authsize(aead); error: return err; diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index b3f627ac4ed8..d846304b7b89 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -933,7 +933,6 @@ static void nl_fib_lookup(struct fib_result_nl *frn, struct fib_table *tb) local_bh_disable(); frn->tb_id = tb->tb_id; - rcu_read_lock(); frn->err = fib_table_lookup(tb, &fl4, &res, FIB_LOOKUP_NOREF); if (!frn->err) { @@ -942,7 +941,6 @@ static void nl_fib_lookup(struct fib_result_nl *frn, struct fib_table *tb) frn->type = res.type; frn->scope = res.scope; } - rcu_read_unlock(); local_bh_enable(); } } diff --git a/net/ipv4/fib_lookup.h b/net/ipv4/fib_lookup.h index af0f14aba169..388d113fd289 100644 --- a/net/ipv4/fib_lookup.h +++ b/net/ipv4/fib_lookup.h @@ -24,21 +24,17 @@ static inline void fib_alias_accessed(struct fib_alias *fa) } /* Exported by fib_semantics.c */ -extern void fib_release_info(struct fib_info *); -extern struct fib_info *fib_create_info(struct fib_config *cfg); -extern int fib_nh_match(struct fib_config *cfg, struct fib_info *fi); -extern int fib_dump_info(struct sk_buff *skb, u32 pid, u32 seq, int event, - u32 tb_id, u8 type, __be32 dst, - int dst_len, u8 tos, struct fib_info *fi, - unsigned int); -extern void rtmsg_fib(int event, __be32 key, struct fib_alias *fa, - int dst_len, u32 tb_id, struct nl_info *info, - unsigned int nlm_flags); -extern struct fib_alias *fib_find_alias(struct list_head *fah, - u8 tos, u32 prio); -extern int fib_detect_death(struct fib_info *fi, int order, - struct fib_info **last_resort, - int *last_idx, int dflt); +void fib_release_info(struct fib_info *); +struct fib_info *fib_create_info(struct fib_config *cfg); +int fib_nh_match(struct fib_config *cfg, struct fib_info *fi); +int fib_dump_info(struct sk_buff *skb, u32 pid, u32 seq, int event, u32 tb_id, + u8 type, __be32 dst, int dst_len, u8 tos, struct fib_info *fi, + unsigned int); +void rtmsg_fib(int event, __be32 key, struct fib_alias *fa, int dst_len, + u32 tb_id, const struct nl_info *info, unsigned int nlm_flags); +struct fib_alias *fib_find_alias(struct list_head *fah, u8 tos, u32 prio); +int fib_detect_death(struct fib_info *fi, int order, + struct fib_info **last_resort, int *last_idx, int dflt); static inline void fib_result_assign(struct fib_result *res, struct fib_info *fi) diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index d5dbca5ecf62..e63f47a4e651 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -380,7 +380,7 @@ static inline size_t fib_nlmsg_size(struct fib_info *fi) } void rtmsg_fib(int event, __be32 key, struct fib_alias *fa, - int dst_len, u32 tb_id, struct nl_info *info, + int dst_len, u32 tb_id, const struct nl_info *info, unsigned int nlm_flags) { struct sk_buff *skb; diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c index 3df6d3edb2a1..ec9a9ef4ce50 100644 --- a/net/ipv4/fib_trie.c +++ b/net/ipv4/fib_trie.c @@ -762,12 +762,9 @@ static struct tnode *inflate(struct trie *t, struct tnode *tn) if (IS_LEAF(node) || ((struct tnode *) node)->pos > tn->pos + tn->bits - 1) { - if (tkey_extract_bits(node->key, - oldtnode->pos + oldtnode->bits, - 1) == 0) - put_child(tn, 2*i, node); - else - put_child(tn, 2*i+1, node); + put_child(tn, + tkey_extract_bits(node->key, oldtnode->pos, oldtnode->bits + 1), + node); continue; } @@ -1120,12 +1117,8 @@ static struct list_head *fib_insert_node(struct trie *t, u32 key, int plen) * first tnode need some special handling */ - if (tp) - pos = tp->pos+tp->bits; - else - pos = 0; - if (n) { + pos = tp ? tp->pos+tp->bits : 0; newpos = tkey_mismatch(key, pos, n->key); tn = tnode_new(n->key, newpos, 1); } else { diff --git a/net/ipv4/gre_demux.c b/net/ipv4/gre_demux.c index 736c9fc3ef93..5893e99e8299 100644 --- a/net/ipv4/gre_demux.c +++ b/net/ipv4/gre_demux.c @@ -93,35 +93,6 @@ void gre_build_header(struct sk_buff *skb, const struct tnl_ptk_info *tpi, } EXPORT_SYMBOL_GPL(gre_build_header); -struct sk_buff *gre_handle_offloads(struct sk_buff *skb, bool gre_csum) -{ - int err; - - if (likely(!skb->encapsulation)) { - skb_reset_inner_headers(skb); - skb->encapsulation = 1; - } - - if (skb_is_gso(skb)) { - err = skb_unclone(skb, GFP_ATOMIC); - if (unlikely(err)) - goto error; - skb_shinfo(skb)->gso_type |= SKB_GSO_GRE; - return skb; - } else if (skb->ip_summed == CHECKSUM_PARTIAL && gre_csum) { - err = skb_checksum_help(skb); - if (unlikely(err)) - goto error; - } else if (skb->ip_summed != CHECKSUM_PARTIAL) - skb->ip_summed = CHECKSUM_NONE; - - return skb; -error: - kfree_skb(skb); - return ERR_PTR(err); -} -EXPORT_SYMBOL_GPL(gre_handle_offloads); - static __sum16 check_checksum(struct sk_buff *skb) { __sum16 csum = 0; diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c index 55e6bfb3a289..e5d436188464 100644 --- a/net/ipv4/gre_offload.c +++ b/net/ipv4/gre_offload.c @@ -39,7 +39,8 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb, SKB_GSO_UDP | SKB_GSO_DODGY | SKB_GSO_TCP_ECN | - SKB_GSO_GRE))) + SKB_GSO_GRE | + SKB_GSO_IPIP))) goto out; if (unlikely(!pskb_may_pull(skb, sizeof(*greh)))) diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index 5f7d11a45871..5c0e8bc6e5ba 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -353,6 +353,9 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb) saddr = fib_compute_spec_dst(skb); ipc.opt = NULL; ipc.tx_flags = 0; + ipc.ttl = 0; + ipc.tos = -1; + if (icmp_param->replyopts.opt.opt.optlen) { ipc.opt = &icmp_param->replyopts.opt; if (ipc.opt->opt.srr) @@ -608,6 +611,8 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info) ipc.addr = iph->saddr; ipc.opt = &icmp_param->replyopts.opt; ipc.tx_flags = 0; + ipc.ttl = 0; + ipc.tos = -1; rt = icmp_route_lookup(net, &fl4, skb_in, iph, saddr, tos, type, code, icmp_param); diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 6acb541c9091..fc0e649cc002 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -29,27 +29,19 @@ const char inet_csk_timer_bug_msg[] = "inet_csk BUG: unknown timer value\n"; EXPORT_SYMBOL(inet_csk_timer_bug_msg); #endif -/* - * This struct holds the first and last local port number. - */ -struct local_ports sysctl_local_ports __read_mostly = { - .lock = __SEQLOCK_UNLOCKED(sysctl_local_ports.lock), - .range = { 32768, 61000 }, -}; - unsigned long *sysctl_local_reserved_ports; EXPORT_SYMBOL(sysctl_local_reserved_ports); -void inet_get_local_port_range(int *low, int *high) +void inet_get_local_port_range(struct net *net, int *low, int *high) { unsigned int seq; do { - seq = read_seqbegin(&sysctl_local_ports.lock); + seq = read_seqbegin(&net->ipv4.sysctl_local_ports.lock); - *low = sysctl_local_ports.range[0]; - *high = sysctl_local_ports.range[1]; - } while (read_seqretry(&sysctl_local_ports.lock, seq)); + *low = net->ipv4.sysctl_local_ports.range[0]; + *high = net->ipv4.sysctl_local_ports.range[1]; + } while (read_seqretry(&net->ipv4.sysctl_local_ports.lock, seq)); } EXPORT_SYMBOL(inet_get_local_port_range); @@ -79,17 +71,16 @@ int inet_csk_bind_conflict(const struct sock *sk, (!reuseport || !sk2->sk_reuseport || (sk2->sk_state != TCP_TIME_WAIT && !uid_eq(uid, sock_i_uid(sk2))))) { - const __be32 sk2_rcv_saddr = sk_rcv_saddr(sk2); - if (!sk2_rcv_saddr || !sk_rcv_saddr(sk) || - sk2_rcv_saddr == sk_rcv_saddr(sk)) + + if (!sk2->sk_rcv_saddr || !sk->sk_rcv_saddr || + sk2->sk_rcv_saddr == sk->sk_rcv_saddr) break; } if (!relax && reuse && sk2->sk_reuse && sk2->sk_state != TCP_LISTEN) { - const __be32 sk2_rcv_saddr = sk_rcv_saddr(sk2); - if (!sk2_rcv_saddr || !sk_rcv_saddr(sk) || - sk2_rcv_saddr == sk_rcv_saddr(sk)) + if (!sk2->sk_rcv_saddr || !sk->sk_rcv_saddr || + sk2->sk_rcv_saddr == sk->sk_rcv_saddr) break; } } @@ -116,7 +107,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum) int remaining, rover, low, high; again: - inet_get_local_port_range(&low, &high); + inet_get_local_port_range(net, &low, &high); remaining = (high - low) + 1; smallest_rover = rover = net_random() % remaining + low; @@ -421,8 +412,8 @@ struct dst_entry *inet_csk_route_req(struct sock *sk, RT_CONN_FLAGS(sk), RT_SCOPE_UNIVERSE, sk->sk_protocol, flags, - (opt && opt->opt.srr) ? opt->opt.faddr : ireq->rmt_addr, - ireq->loc_addr, ireq->rmt_port, inet_sk(sk)->inet_sport); + (opt && opt->opt.srr) ? opt->opt.faddr : ireq->ir_rmt_addr, + ireq->ir_loc_addr, ireq->ir_rmt_port, inet_sk(sk)->inet_sport); security_req_classify_flow(req, flowi4_to_flowi(fl4)); rt = ip_route_output_flow(net, fl4, sk); if (IS_ERR(rt)) @@ -457,8 +448,8 @@ struct dst_entry *inet_csk_route_child_sock(struct sock *sk, flowi4_init_output(fl4, sk->sk_bound_dev_if, sk->sk_mark, RT_CONN_FLAGS(sk), RT_SCOPE_UNIVERSE, sk->sk_protocol, inet_sk_flowi_flags(sk), - (opt && opt->opt.srr) ? opt->opt.faddr : ireq->rmt_addr, - ireq->loc_addr, ireq->rmt_port, inet_sk(sk)->inet_sport); + (opt && opt->opt.srr) ? opt->opt.faddr : ireq->ir_rmt_addr, + ireq->ir_loc_addr, ireq->ir_rmt_port, inet_sk(sk)->inet_sport); security_req_classify_flow(req, flowi4_to_flowi(fl4)); rt = ip_route_output_flow(net, fl4, sk); if (IS_ERR(rt)) @@ -504,9 +495,9 @@ struct request_sock *inet_csk_search_req(const struct sock *sk, prev = &req->dl_next) { const struct inet_request_sock *ireq = inet_rsk(req); - if (ireq->rmt_port == rport && - ireq->rmt_addr == raddr && - ireq->loc_addr == laddr && + if (ireq->ir_rmt_port == rport && + ireq->ir_rmt_addr == raddr && + ireq->ir_loc_addr == laddr && AF_INET_FAMILY(req->rsk_ops->family)) { WARN_ON(req->sk); *prevp = prev; @@ -523,7 +514,8 @@ void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req, { struct inet_connection_sock *icsk = inet_csk(sk); struct listen_sock *lopt = icsk->icsk_accept_queue.listen_opt; - const u32 h = inet_synq_hash(inet_rsk(req)->rmt_addr, inet_rsk(req)->rmt_port, + const u32 h = inet_synq_hash(inet_rsk(req)->ir_rmt_addr, + inet_rsk(req)->ir_rmt_port, lopt->hash_rnd, lopt->nr_table_entries); reqsk_queue_hash_req(&icsk->icsk_accept_queue, h, req, timeout); @@ -683,9 +675,9 @@ struct sock *inet_csk_clone_lock(const struct sock *sk, newsk->sk_state = TCP_SYN_RECV; newicsk->icsk_bind_hash = NULL; - inet_sk(newsk)->inet_dport = inet_rsk(req)->rmt_port; - inet_sk(newsk)->inet_num = ntohs(inet_rsk(req)->loc_port); - inet_sk(newsk)->inet_sport = inet_rsk(req)->loc_port; + inet_sk(newsk)->inet_dport = inet_rsk(req)->ir_rmt_port; + inet_sk(newsk)->inet_num = inet_rsk(req)->ir_num; + inet_sk(newsk)->inet_sport = htons(inet_rsk(req)->ir_num); newsk->sk_write_space = sk_stream_write_space; newicsk->icsk_retransmits = 0; diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c index 5f648751fce2..56a964a553d2 100644 --- a/net/ipv4/inet_diag.c +++ b/net/ipv4/inet_diag.c @@ -121,13 +121,13 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk, #if IS_ENABLED(CONFIG_IPV6) if (r->idiag_family == AF_INET6) { - const struct ipv6_pinfo *np = inet6_sk(sk); - *(struct in6_addr *)r->id.idiag_src = np->rcv_saddr; - *(struct in6_addr *)r->id.idiag_dst = np->daddr; + *(struct in6_addr *)r->id.idiag_src = sk->sk_v6_rcv_saddr; + *(struct in6_addr *)r->id.idiag_dst = sk->sk_v6_daddr; if (ext & (1 << (INET_DIAG_TCLASS - 1))) - if (nla_put_u8(skb, INET_DIAG_TCLASS, np->tclass) < 0) + if (nla_put_u8(skb, INET_DIAG_TCLASS, + inet6_sk(sk)->tclass) < 0) goto errout; } #endif @@ -222,7 +222,7 @@ static int inet_twsk_diag_fill(struct inet_timewait_sock *tw, u32 portid, u32 seq, u16 nlmsg_flags, const struct nlmsghdr *unlh) { - long tmo; + s32 tmo; struct inet_diag_msg *r; struct nlmsghdr *nlh; @@ -234,7 +234,7 @@ static int inet_twsk_diag_fill(struct inet_timewait_sock *tw, r = nlmsg_data(nlh); BUG_ON(tw->tw_state != TCP_TIME_WAIT); - tmo = tw->tw_ttd - jiffies; + tmo = tw->tw_ttd - inet_tw_time_stamp(); if (tmo < 0) tmo = 0; @@ -248,18 +248,15 @@ static int inet_twsk_diag_fill(struct inet_timewait_sock *tw, r->id.idiag_dst[0] = tw->tw_daddr; r->idiag_state = tw->tw_substate; r->idiag_timer = 3; - r->idiag_expires = DIV_ROUND_UP(tmo * 1000, HZ); + r->idiag_expires = jiffies_to_msecs(tmo); r->idiag_rqueue = 0; r->idiag_wqueue = 0; r->idiag_uid = 0; r->idiag_inode = 0; #if IS_ENABLED(CONFIG_IPV6) if (tw->tw_family == AF_INET6) { - const struct inet6_timewait_sock *tw6 = - inet6_twsk((struct sock *)tw); - - *(struct in6_addr *)r->id.idiag_src = tw6->tw_v6_rcv_saddr; - *(struct in6_addr *)r->id.idiag_dst = tw6->tw_v6_daddr; + *(struct in6_addr *)r->id.idiag_src = tw->tw_v6_rcv_saddr; + *(struct in6_addr *)r->id.idiag_dst = tw->tw_v6_daddr; } #endif @@ -273,10 +270,11 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, const struct nlmsghdr *unlh) { if (sk->sk_state == TCP_TIME_WAIT) - return inet_twsk_diag_fill((struct inet_timewait_sock *)sk, - skb, r, portid, seq, nlmsg_flags, - unlh); - return inet_csk_diag_fill(sk, skb, r, user_ns, portid, seq, nlmsg_flags, unlh); + return inet_twsk_diag_fill(inet_twsk(sk), skb, r, portid, seq, + nlmsg_flags, unlh); + + return inet_csk_diag_fill(sk, skb, r, user_ns, portid, seq, + nlmsg_flags, unlh); } int inet_diag_dump_one_icsk(struct inet_hashinfo *hashinfo, struct sk_buff *in_skb, @@ -338,12 +336,9 @@ int inet_diag_dump_one_icsk(struct inet_hashinfo *hashinfo, struct sk_buff *in_s err = 0; out: - if (sk) { - if (sk->sk_state == TCP_TIME_WAIT) - inet_twsk_put((struct inet_timewait_sock *)sk); - else - sock_put(sk); - } + if (sk) + sock_gen_put(sk); + out_nosk: return err; } @@ -489,10 +484,9 @@ int inet_diag_bc_sk(const struct nlattr *bc, struct sock *sk) entry.family = sk->sk_family; #if IS_ENABLED(CONFIG_IPV6) if (entry.family == AF_INET6) { - struct ipv6_pinfo *np = inet6_sk(sk); - entry.saddr = np->rcv_saddr.s6_addr32; - entry.daddr = np->daddr.s6_addr32; + entry.saddr = sk->sk_v6_rcv_saddr.s6_addr32; + entry.daddr = sk->sk_v6_daddr.s6_addr32; } else #endif { @@ -635,22 +629,22 @@ static int inet_csk_diag_dump(struct sock *sk, cb->nlh->nlmsg_seq, NLM_F_MULTI, cb->nlh); } -static int inet_twsk_diag_dump(struct inet_timewait_sock *tw, +static int inet_twsk_diag_dump(struct sock *sk, struct sk_buff *skb, struct netlink_callback *cb, struct inet_diag_req_v2 *r, const struct nlattr *bc) { + struct inet_timewait_sock *tw = inet_twsk(sk); + if (bc != NULL) { struct inet_diag_entry entry; entry.family = tw->tw_family; #if IS_ENABLED(CONFIG_IPV6) if (tw->tw_family == AF_INET6) { - struct inet6_timewait_sock *tw6 = - inet6_twsk((struct sock *)tw); - entry.saddr = tw6->tw_v6_rcv_saddr.s6_addr32; - entry.daddr = tw6->tw_v6_daddr.s6_addr32; + entry.saddr = tw->tw_v6_rcv_saddr.s6_addr32; + entry.daddr = tw->tw_v6_daddr.s6_addr32; } else #endif { @@ -682,12 +676,12 @@ static inline void inet_diag_req_addrs(const struct sock *sk, #if IS_ENABLED(CONFIG_IPV6) if (sk->sk_family == AF_INET6) { if (req->rsk_ops->family == AF_INET6) { - entry->saddr = inet6_rsk(req)->loc_addr.s6_addr32; - entry->daddr = inet6_rsk(req)->rmt_addr.s6_addr32; + entry->saddr = ireq->ir_v6_loc_addr.s6_addr32; + entry->daddr = ireq->ir_v6_rmt_addr.s6_addr32; } else if (req->rsk_ops->family == AF_INET) { - ipv6_addr_set_v4mapped(ireq->loc_addr, + ipv6_addr_set_v4mapped(ireq->ir_loc_addr, &entry->saddr_storage); - ipv6_addr_set_v4mapped(ireq->rmt_addr, + ipv6_addr_set_v4mapped(ireq->ir_rmt_addr, &entry->daddr_storage); entry->saddr = entry->saddr_storage.s6_addr32; entry->daddr = entry->daddr_storage.s6_addr32; @@ -695,8 +689,8 @@ static inline void inet_diag_req_addrs(const struct sock *sk, } else #endif { - entry->saddr = &ireq->loc_addr; - entry->daddr = &ireq->rmt_addr; + entry->saddr = &ireq->ir_loc_addr; + entry->daddr = &ireq->ir_rmt_addr; } } @@ -731,9 +725,9 @@ static int inet_diag_fill_req(struct sk_buff *skb, struct sock *sk, tmo = 0; r->id.idiag_sport = inet->inet_sport; - r->id.idiag_dport = ireq->rmt_port; - r->id.idiag_src[0] = ireq->loc_addr; - r->id.idiag_dst[0] = ireq->rmt_addr; + r->id.idiag_dport = ireq->ir_rmt_port; + r->id.idiag_src[0] = ireq->ir_loc_addr; + r->id.idiag_dst[0] = ireq->ir_rmt_addr; r->idiag_expires = jiffies_to_msecs(tmo); r->idiag_rqueue = 0; r->idiag_wqueue = 0; @@ -792,13 +786,13 @@ static int inet_diag_dump_reqs(struct sk_buff *skb, struct sock *sk, if (reqnum < s_reqnum) continue; - if (r->id.idiag_dport != ireq->rmt_port && + if (r->id.idiag_dport != ireq->ir_rmt_port && r->id.idiag_dport) continue; if (bc) { inet_diag_req_addrs(sk, req, &entry); - entry.dport = ntohs(ireq->rmt_port); + entry.dport = ntohs(ireq->ir_rmt_port); if (!inet_diag_bc_run(bc, &entry)) continue; @@ -911,8 +905,7 @@ skip_listen_ht: num = 0; - if (hlist_nulls_empty(&head->chain) && - hlist_nulls_empty(&head->twchain)) + if (hlist_nulls_empty(&head->chain)) continue; if (i > s_i) @@ -920,7 +913,7 @@ skip_listen_ht: spin_lock_bh(lock); sk_nulls_for_each(sk, node, &head->chain) { - struct inet_sock *inet = inet_sk(sk); + int res; if (!net_eq(sock_net(sk), net)) continue; @@ -929,15 +922,19 @@ skip_listen_ht: if (!(r->idiag_states & (1 << sk->sk_state))) goto next_normal; if (r->sdiag_family != AF_UNSPEC && - sk->sk_family != r->sdiag_family) + sk->sk_family != r->sdiag_family) goto next_normal; - if (r->id.idiag_sport != inet->inet_sport && + if (r->id.idiag_sport != htons(sk->sk_num) && r->id.idiag_sport) goto next_normal; - if (r->id.idiag_dport != inet->inet_dport && + if (r->id.idiag_dport != sk->sk_dport && r->id.idiag_dport) goto next_normal; - if (inet_csk_diag_dump(sk, skb, cb, r, bc) < 0) { + if (sk->sk_state == TCP_TIME_WAIT) + res = inet_twsk_diag_dump(sk, skb, cb, r, bc); + else + res = inet_csk_diag_dump(sk, skb, cb, r, bc); + if (res < 0) { spin_unlock_bh(lock); goto done; } @@ -945,33 +942,6 @@ next_normal: ++num; } - if (r->idiag_states & TCPF_TIME_WAIT) { - struct inet_timewait_sock *tw; - - inet_twsk_for_each(tw, node, - &head->twchain) { - if (!net_eq(twsk_net(tw), net)) - continue; - - if (num < s_num) - goto next_dying; - if (r->sdiag_family != AF_UNSPEC && - tw->tw_family != r->sdiag_family) - goto next_dying; - if (r->id.idiag_sport != tw->tw_sport && - r->id.idiag_sport) - goto next_dying; - if (r->id.idiag_dport != tw->tw_dport && - r->id.idiag_dport) - goto next_dying; - if (inet_twsk_diag_dump(tw, skb, cb, r, bc) < 0) { - spin_unlock_bh(lock); - goto done; - } -next_dying: - ++num; - } - } spin_unlock_bh(lock); } diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c index c5313a9c019b..bb075fc9a14f 100644 --- a/net/ipv4/inet_fragment.c +++ b/net/ipv4/inet_fragment.c @@ -93,9 +93,6 @@ void inet_frags_init(struct inet_frags *f) } rwlock_init(&f->lock); - f->rnd = (u32) ((totalram_pages ^ (totalram_pages >> 7)) ^ - (jiffies ^ (jiffies >> 6))); - setup_timer(&f->secret_timer, inet_frag_secret_rebuild, (unsigned long)f); f->secret_timer.expires = jiffies + f->secret_interval; diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index 96da9c77deca..8b9cf279450d 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -24,6 +24,31 @@ #include <net/secure_seq.h> #include <net/ip.h> +static unsigned int inet_ehashfn(struct net *net, const __be32 laddr, + const __u16 lport, const __be32 faddr, + const __be16 fport) +{ + static u32 inet_ehash_secret __read_mostly; + + net_get_random_once(&inet_ehash_secret, sizeof(inet_ehash_secret)); + + return __inet_ehashfn(laddr, lport, faddr, fport, + inet_ehash_secret + net_hash_mix(net)); +} + + +static unsigned int inet_sk_ehashfn(const struct sock *sk) +{ + const struct inet_sock *inet = inet_sk(sk); + const __be32 laddr = inet->inet_rcv_saddr; + const __u16 lport = inet->inet_num; + const __be32 faddr = inet->inet_daddr; + const __be16 fport = inet->inet_dport; + struct net *net = sock_net(sk); + + return inet_ehashfn(net, laddr, lport, faddr, fport); +} + /* * Allocate and initialize a new local port bind bucket. * The bindhash mutex for snum's hash chain must be held here. @@ -230,6 +255,19 @@ begin: } EXPORT_SYMBOL_GPL(__inet_lookup_listener); +/* All sockets share common refcount, but have different destructors */ +void sock_gen_put(struct sock *sk) +{ + if (!atomic_dec_and_test(&sk->sk_refcnt)) + return; + + if (sk->sk_state == TCP_TIME_WAIT) + inet_twsk_free(inet_twsk(sk)); + else + sk_free(sk); +} +EXPORT_SYMBOL_GPL(sock_gen_put); + struct sock *__inet_lookup_established(struct net *net, struct inet_hashinfo *hashinfo, const __be32 saddr, const __be16 sport, @@ -255,13 +293,13 @@ begin: if (likely(INET_MATCH(sk, net, acookie, saddr, daddr, ports, dif))) { if (unlikely(!atomic_inc_not_zero(&sk->sk_refcnt))) - goto begintw; + goto out; if (unlikely(!INET_MATCH(sk, net, acookie, saddr, daddr, ports, dif))) { - sock_put(sk); + sock_gen_put(sk); goto begin; } - goto out; + goto found; } } /* @@ -271,37 +309,9 @@ begin: */ if (get_nulls_value(node) != slot) goto begin; - -begintw: - /* Must check for a TIME_WAIT'er before going to listener hash. */ - sk_nulls_for_each_rcu(sk, node, &head->twchain) { - if (sk->sk_hash != hash) - continue; - if (likely(INET_TW_MATCH(sk, net, acookie, - saddr, daddr, ports, - dif))) { - if (unlikely(!atomic_inc_not_zero(&sk->sk_refcnt))) { - sk = NULL; - goto out; - } - if (unlikely(!INET_TW_MATCH(sk, net, acookie, - saddr, daddr, ports, - dif))) { - inet_twsk_put(inet_twsk(sk)); - goto begintw; - } - goto out; - } - } - /* - * if the nulls value we got at the end of this lookup is - * not the expected one, we must restart lookup. - * We probably met an item that was moved to another chain. - */ - if (get_nulls_value(node) != slot) - goto begintw; - sk = NULL; out: + sk = NULL; +found: rcu_read_unlock(); return sk; } @@ -326,39 +336,29 @@ static int __inet_check_established(struct inet_timewait_death_row *death_row, spinlock_t *lock = inet_ehash_lockp(hinfo, hash); struct sock *sk2; const struct hlist_nulls_node *node; - struct inet_timewait_sock *tw; + struct inet_timewait_sock *tw = NULL; int twrefcnt = 0; spin_lock(lock); - /* Check TIME-WAIT sockets first. */ - sk_nulls_for_each(sk2, node, &head->twchain) { - if (sk2->sk_hash != hash) - continue; - - if (likely(INET_TW_MATCH(sk2, net, acookie, - saddr, daddr, ports, dif))) { - tw = inet_twsk(sk2); - if (twsk_unique(sk, sk2, twp)) - goto unique; - else - goto not_unique; - } - } - tw = NULL; - - /* And established part... */ sk_nulls_for_each(sk2, node, &head->chain) { if (sk2->sk_hash != hash) continue; + if (likely(INET_MATCH(sk2, net, acookie, - saddr, daddr, ports, dif))) + saddr, daddr, ports, dif))) { + if (sk2->sk_state == TCP_TIME_WAIT) { + tw = inet_twsk(sk2); + if (twsk_unique(sk, sk2, twp)) + break; + } goto not_unique; + } } -unique: /* Must record num and sport now. Otherwise we will see - * in hash table socket with a funny identity. */ + * in hash table socket with a funny identity. + */ inet->inet_num = lport; inet->inet_sport = htons(lport); sk->sk_hash = hash; @@ -494,7 +494,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row, u32 offset = hint + port_offset; struct inet_timewait_sock *tw = NULL; - inet_get_local_port_range(&low, &high); + inet_get_local_port_range(net, &low, &high); remaining = (high - low) + 1; local_bh_disable(); diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c index 1f27c9f4afd0..6d592f8555fb 100644 --- a/net/ipv4/inet_timewait_sock.c +++ b/net/ipv4/inet_timewait_sock.c @@ -87,19 +87,11 @@ static void __inet_twsk_kill(struct inet_timewait_sock *tw, refcnt += inet_twsk_bind_unhash(tw, hashinfo); spin_unlock(&bhead->lock); -#ifdef SOCK_REFCNT_DEBUG - if (atomic_read(&tw->tw_refcnt) != 1) { - pr_debug("%s timewait_sock %p refcnt=%d\n", - tw->tw_prot->name, tw, atomic_read(&tw->tw_refcnt)); - } -#endif - while (refcnt) { - inet_twsk_put(tw); - refcnt--; - } + BUG_ON(refcnt >= atomic_read(&tw->tw_refcnt)); + atomic_sub(refcnt, &tw->tw_refcnt); } -static noinline void inet_twsk_free(struct inet_timewait_sock *tw) +void inet_twsk_free(struct inet_timewait_sock *tw) { struct module *owner = tw->tw_prot->owner; twsk_destructor((struct sock *)tw); @@ -118,6 +110,18 @@ void inet_twsk_put(struct inet_timewait_sock *tw) } EXPORT_SYMBOL_GPL(inet_twsk_put); +static void inet_twsk_add_node_rcu(struct inet_timewait_sock *tw, + struct hlist_nulls_head *list) +{ + hlist_nulls_add_head_rcu(&tw->tw_node, list); +} + +static void inet_twsk_add_bind_node(struct inet_timewait_sock *tw, + struct hlist_head *list) +{ + hlist_add_head(&tw->tw_bind_node, list); +} + /* * Enter the time wait state. This is called with locally disabled BH. * Essentially we whip up a timewait bucket, copy the relevant info into it @@ -146,26 +150,21 @@ void __inet_twsk_hashdance(struct inet_timewait_sock *tw, struct sock *sk, spin_lock(lock); /* - * Step 2: Hash TW into TIMEWAIT chain. - * Should be done before removing sk from established chain - * because readers are lockless and search established first. + * Step 2: Hash TW into tcp ehash chain. + * Notes : + * - tw_refcnt is set to 3 because : + * - We have one reference from bhash chain. + * - We have one reference from ehash chain. + * We can use atomic_set() because prior spin_lock()/spin_unlock() + * committed into memory all tw fields. */ - inet_twsk_add_node_rcu(tw, &ehead->twchain); + atomic_set(&tw->tw_refcnt, 1 + 1 + 1); + inet_twsk_add_node_rcu(tw, &ehead->chain); - /* Step 3: Remove SK from established hash. */ + /* Step 3: Remove SK from hash chain */ if (__sk_nulls_del_node_init_rcu(sk)) sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1); - /* - * Notes : - * - We initially set tw_refcnt to 0 in inet_twsk_alloc() - * - We add one reference for the bhash link - * - We add one reference for the ehash link - * - We want this refcnt update done before allowing other - * threads to find this tw in ehash chain. - */ - atomic_add(1 + 1 + 1, &tw->tw_refcnt); - spin_unlock(lock); } EXPORT_SYMBOL_GPL(__inet_twsk_hashdance); @@ -387,11 +386,11 @@ void inet_twsk_schedule(struct inet_timewait_sock *tw, if (slot >= INET_TWDR_TWKILL_SLOTS) slot = INET_TWDR_TWKILL_SLOTS - 1; } - tw->tw_ttd = jiffies + timeo; + tw->tw_ttd = inet_tw_time_stamp() + timeo; slot = (twdr->slot + slot) & (INET_TWDR_TWKILL_SLOTS - 1); list = &twdr->cells[slot]; } else { - tw->tw_ttd = jiffies + (slot << INET_TWDR_RECYCLE_TICK); + tw->tw_ttd = inet_tw_time_stamp() + (slot << INET_TWDR_RECYCLE_TICK); if (twdr->twcal_hand < 0) { twdr->twcal_hand = 0; @@ -490,7 +489,9 @@ void inet_twsk_purge(struct inet_hashinfo *hashinfo, restart_rcu: rcu_read_lock(); restart: - sk_nulls_for_each_rcu(sk, node, &head->twchain) { + sk_nulls_for_each_rcu(sk, node, &head->chain) { + if (sk->sk_state != TCP_TIME_WAIT) + continue; tw = inet_twsk(sk); if ((tw->tw_family != family) || atomic_read(&twsk_net(tw)->count)) diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c index b66910aaef4d..2481993a4970 100644 --- a/net/ipv4/ip_fragment.c +++ b/net/ipv4/ip_fragment.c @@ -106,6 +106,7 @@ struct ip4_create_arg { static unsigned int ipqhashfn(__be16 id, __be32 saddr, __be32 daddr, u8 prot) { + net_get_random_once(&ip4_frags.rnd, sizeof(ip4_frags.rnd)); return jhash_3words((__force u32)id << 16 | prot, (__force u32)saddr, (__force u32)daddr, ip4_frags.rnd) & (INETFRAGS_HASHSZ - 1); diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 3982eabf61e1..912402752f2f 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -810,7 +810,7 @@ static int __ip_append_data(struct sock *sk, int copy; int err; int offset = 0; - unsigned int maxfraglen, fragheaderlen; + unsigned int maxfraglen, fragheaderlen, maxnonfragsize; int csummode = CHECKSUM_NONE; struct rtable *rt = (struct rtable *)cork->dst; @@ -823,8 +823,10 @@ static int __ip_append_data(struct sock *sk, fragheaderlen = sizeof(struct iphdr) + (opt ? opt->optlen : 0); maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen; + maxnonfragsize = (inet->pmtudisc >= IP_PMTUDISC_DO) ? + mtu : 0xFFFF; - if (cork->length + length > 0xFFFF - fragheaderlen) { + if (cork->length + length > maxnonfragsize - fragheaderlen) { ip_local_error(sk, EMSGSIZE, fl4->daddr, inet->inet_dport, mtu-exthdrlen); return -EMSGSIZE; @@ -1035,7 +1037,6 @@ error: static int ip_setup_cork(struct sock *sk, struct inet_cork *cork, struct ipcm_cookie *ipc, struct rtable **rtp) { - struct inet_sock *inet = inet_sk(sk); struct ip_options_rcu *opt; struct rtable *rt; @@ -1061,10 +1062,13 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork, * We steal reference to this route, caller should not release it */ *rtp = NULL; - cork->fragsize = inet->pmtudisc == IP_PMTUDISC_PROBE ? - rt->dst.dev->mtu : dst_mtu(&rt->dst); + cork->fragsize = ip_sk_use_pmtu(sk) ? + dst_mtu(&rt->dst) : rt->dst.dev->mtu; cork->dst = &rt->dst; cork->length = 0; + cork->ttl = ipc->ttl; + cork->tos = ipc->tos; + cork->priority = ipc->priority; cork->tx_flags = ipc->tx_flags; return 0; @@ -1119,7 +1123,7 @@ ssize_t ip_append_page(struct sock *sk, struct flowi4 *fl4, struct page *page, int mtu; int len; int err; - unsigned int maxfraglen, fragheaderlen, fraggap; + unsigned int maxfraglen, fragheaderlen, fraggap, maxnonfragsize; if (inet->hdrincl) return -EPERM; @@ -1143,8 +1147,10 @@ ssize_t ip_append_page(struct sock *sk, struct flowi4 *fl4, struct page *page, fragheaderlen = sizeof(struct iphdr) + (opt ? opt->optlen : 0); maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen; + maxnonfragsize = (inet->pmtudisc >= IP_PMTUDISC_DO) ? + mtu : 0xFFFF; - if (cork->length + size > 0xFFFF - fragheaderlen) { + if (cork->length + size > maxnonfragsize - fragheaderlen) { ip_local_error(sk, EMSGSIZE, fl4->daddr, inet->inet_dport, mtu); return -EMSGSIZE; } @@ -1308,7 +1314,8 @@ struct sk_buff *__ip_make_skb(struct sock *sk, /* DF bit is set when we want to see DF on outgoing frames. * If local_df is set too, we still allow to fragment this frame * locally. */ - if (inet->pmtudisc >= IP_PMTUDISC_DO || + if (inet->pmtudisc == IP_PMTUDISC_DO || + inet->pmtudisc == IP_PMTUDISC_PROBE || (skb->len <= dst_mtu(&rt->dst) && ip_dont_fragment(sk, &rt->dst))) df = htons(IP_DF); @@ -1316,7 +1323,9 @@ struct sk_buff *__ip_make_skb(struct sock *sk, if (cork->flags & IPCORK_OPT) opt = cork->opt; - if (rt->rt_type == RTN_MULTICAST) + if (cork->ttl != 0) + ttl = cork->ttl; + else if (rt->rt_type == RTN_MULTICAST) ttl = inet->mc_ttl; else ttl = ip_select_ttl(inet, &rt->dst); @@ -1324,7 +1333,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk, iph = ip_hdr(skb); iph->version = 4; iph->ihl = 5; - iph->tos = inet->tos; + iph->tos = (cork->tos != -1) ? cork->tos : inet->tos; iph->frag_off = df; iph->ttl = ttl; iph->protocol = sk->sk_protocol; @@ -1336,7 +1345,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk, ip_options_build(skb, opt, cork->addr, rt, 0); } - skb->priority = sk->sk_priority; + skb->priority = (cork->tos != -1) ? cork->priority: sk->sk_priority; skb->mark = sk->sk_mark; /* * Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec @@ -1486,6 +1495,8 @@ void ip_send_unicast_reply(struct net *net, struct sk_buff *skb, __be32 daddr, ipc.addr = daddr; ipc.opt = NULL; ipc.tx_flags = 0; + ipc.ttl = 0; + ipc.tos = -1; if (replyopts.opt.opt.optlen) { ipc.opt = &replyopts.opt; diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index d9c4f113d709..3f858266fa7e 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -189,7 +189,7 @@ EXPORT_SYMBOL(ip_cmsg_recv); int ip_cmsg_send(struct net *net, struct msghdr *msg, struct ipcm_cookie *ipc) { - int err; + int err, val; struct cmsghdr *cmsg; for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) { @@ -215,6 +215,24 @@ int ip_cmsg_send(struct net *net, struct msghdr *msg, struct ipcm_cookie *ipc) ipc->addr = info->ipi_spec_dst.s_addr; break; } + case IP_TTL: + if (cmsg->cmsg_len != CMSG_LEN(sizeof(int))) + return -EINVAL; + val = *(int *)CMSG_DATA(cmsg); + if (val < 1 || val > 255) + return -EINVAL; + ipc->ttl = val; + break; + case IP_TOS: + if (cmsg->cmsg_len != CMSG_LEN(sizeof(int))) + return -EINVAL; + val = *(int *)CMSG_DATA(cmsg); + if (val < 0 || val > 255) + return -EINVAL; + ipc->tos = val; + ipc->priority = rt_tos2priority(ipc->tos); + break; + default: return -EINVAL; } @@ -609,7 +627,7 @@ static int do_ip_setsockopt(struct sock *sk, int level, inet->nodefrag = val ? 1 : 0; break; case IP_MTU_DISCOVER: - if (val < IP_PMTUDISC_DONT || val > IP_PMTUDISC_PROBE) + if (val < IP_PMTUDISC_DONT || val > IP_PMTUDISC_INTERFACE) goto e_inval; inet->pmtudisc = val; break; @@ -1034,11 +1052,12 @@ e_inval: * destination in skb->cb[] before dst drop. * This way, receiver doesnt make cache line misses to read rtable. */ -void ipv4_pktinfo_prepare(struct sk_buff *skb) +void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb) { struct in_pktinfo *pktinfo = PKTINFO_SKB_CB(skb); - if (skb_rtable(skb)) { + if ((inet_sk(sk)->cmsg_flags & IP_CMSG_PKTINFO) && + skb_rtable(skb)) { pktinfo->ipi_ifindex = inet_iif(skb); pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb); } else { diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c index c31e3ad98ef2..42ffbc8d65c6 100644 --- a/net/ipv4/ip_tunnel_core.c +++ b/net/ipv4/ip_tunnel_core.c @@ -116,3 +116,36 @@ int iptunnel_pull_header(struct sk_buff *skb, int hdr_len, __be16 inner_proto) return 0; } EXPORT_SYMBOL_GPL(iptunnel_pull_header); + +struct sk_buff *iptunnel_handle_offloads(struct sk_buff *skb, + bool csum_help, + int gso_type_mask) +{ + int err; + + if (likely(!skb->encapsulation)) { + skb_reset_inner_headers(skb); + skb->encapsulation = 1; + } + + if (skb_is_gso(skb)) { + err = skb_unclone(skb, GFP_ATOMIC); + if (unlikely(err)) + goto error; + skb_shinfo(skb)->gso_type |= gso_type_mask; + return skb; + } + + if (skb->ip_summed == CHECKSUM_PARTIAL && csum_help) { + err = skb_checksum_help(skb); + if (unlikely(err)) + goto error; + } else if (skb->ip_summed != CHECKSUM_PARTIAL) + skb->ip_summed = CHECKSUM_NONE; + + return skb; +error: + kfree_skb(skb); + return ERR_PTR(err); +} +EXPORT_SYMBOL_GPL(iptunnel_handle_offloads); diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c index 6e87f853d033..5d9c845d288a 100644 --- a/net/ipv4/ip_vti.c +++ b/net/ipv4/ip_vti.c @@ -49,70 +49,6 @@ static struct rtnl_link_ops vti_link_ops __read_mostly; static int vti_net_id __read_mostly; static int vti_tunnel_init(struct net_device *dev); -static int vti_err(struct sk_buff *skb, u32 info) -{ - - /* All the routers (except for Linux) return only - * 8 bytes of packet payload. It means, that precise relaying of - * ICMP in the real Internet is absolutely infeasible. - */ - struct net *net = dev_net(skb->dev); - struct ip_tunnel_net *itn = net_generic(net, vti_net_id); - struct iphdr *iph = (struct iphdr *)skb->data; - const int type = icmp_hdr(skb)->type; - const int code = icmp_hdr(skb)->code; - struct ip_tunnel *t; - int err; - - switch (type) { - default: - case ICMP_PARAMETERPROB: - return 0; - - case ICMP_DEST_UNREACH: - switch (code) { - case ICMP_SR_FAILED: - case ICMP_PORT_UNREACH: - /* Impossible event. */ - return 0; - default: - /* All others are translated to HOST_UNREACH. */ - break; - } - break; - case ICMP_TIME_EXCEEDED: - if (code != ICMP_EXC_TTL) - return 0; - break; - } - - err = -ENOENT; - - t = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY, - iph->daddr, iph->saddr, 0); - if (t == NULL) - goto out; - - if (type == ICMP_DEST_UNREACH && code == ICMP_FRAG_NEEDED) { - ipv4_update_pmtu(skb, dev_net(skb->dev), info, - t->parms.link, 0, IPPROTO_IPIP, 0); - err = 0; - goto out; - } - - err = 0; - if (t->parms.iph.ttl == 0 && type == ICMP_TIME_EXCEEDED) - goto out; - - if (time_before(jiffies, t->err_time + IPTUNNEL_ERR_TIMEO)) - t->err_count++; - else - t->err_count = 1; - t->err_time = jiffies; -out: - return err; -} - /* We dont digest the packet therefore let the packet pass */ static int vti_rcv(struct sk_buff *skb) { @@ -304,9 +240,8 @@ static void __net_init vti_fb_tunnel_init(struct net_device *dev) iph->ihl = 5; } -static struct xfrm_tunnel vti_handler __read_mostly = { +static struct xfrm_tunnel_notifier vti_handler __read_mostly = { .handler = vti_rcv, - .err_handler = vti_err, .priority = 1, }; diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c index 7f80fb4b82d3..fe3e9f7f1f0b 100644 --- a/net/ipv4/ipip.c +++ b/net/ipv4/ipip.c @@ -220,17 +220,17 @@ static netdev_tx_t ipip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev) if (unlikely(skb->protocol != htons(ETH_P_IP))) goto tx_error; - if (likely(!skb->encapsulation)) { - skb_reset_inner_headers(skb); - skb->encapsulation = 1; - } + skb = iptunnel_handle_offloads(skb, false, SKB_GSO_IPIP); + if (IS_ERR(skb)) + goto out; ip_tunnel_xmit(skb, dev, tiph, tiph->protocol); return NETDEV_TX_OK; tx_error: - dev->stats.tx_errors++; dev_kfree_skb(skb); +out: + dev->stats.tx_errors++; return NETDEV_TX_OK; } @@ -275,6 +275,7 @@ static const struct net_device_ops ipip_netdev_ops = { #define IPIP_FEATURES (NETIF_F_SG | \ NETIF_F_FRAGLIST | \ NETIF_F_HIGHDMA | \ + NETIF_F_GSO_SOFTWARE | \ NETIF_F_HW_CSUM) static void ipip_tunnel_setup(struct net_device *dev) diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig index 1657e39b291f..40d56073cd19 100644 --- a/net/ipv4/netfilter/Kconfig +++ b/net/ipv4/netfilter/Kconfig @@ -36,6 +36,27 @@ config NF_CONNTRACK_PROC_COMPAT If unsure, say Y. +config NF_TABLES_IPV4 + depends on NF_TABLES + tristate "IPv4 nf_tables support" + +config NFT_REJECT_IPV4 + depends on NF_TABLES_IPV4 + tristate "nf_tables IPv4 reject support" + +config NFT_CHAIN_ROUTE_IPV4 + depends on NF_TABLES_IPV4 + tristate "IPv4 nf_tables route chain support" + +config NFT_CHAIN_NAT_IPV4 + depends on NF_TABLES_IPV4 + depends on NF_NAT_IPV4 && NFT_NAT + tristate "IPv4 nf_tables nat chain support" + +config NF_TABLES_ARP + depends on NF_TABLES + tristate "ARP nf_tables support" + config IP_NF_IPTABLES tristate "IP tables support (required for filtering/masq/NAT)" default m if NETFILTER_ADVANCED=n diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile index 3622b248b6dd..19df72b7ba88 100644 --- a/net/ipv4/netfilter/Makefile +++ b/net/ipv4/netfilter/Makefile @@ -27,6 +27,12 @@ obj-$(CONFIG_NF_NAT_SNMP_BASIC) += nf_nat_snmp_basic.o # NAT protocols (nf_nat) obj-$(CONFIG_NF_NAT_PROTO_GRE) += nf_nat_proto_gre.o +obj-$(CONFIG_NF_TABLES_IPV4) += nf_tables_ipv4.o +obj-$(CONFIG_NFT_REJECT_IPV4) += nft_reject_ipv4.o +obj-$(CONFIG_NFT_CHAIN_ROUTE_IPV4) += nft_chain_route_ipv4.o +obj-$(CONFIG_NFT_CHAIN_NAT_IPV4) += nft_chain_nat_ipv4.o +obj-$(CONFIG_NF_TABLES_ARP) += nf_tables_arp.o + # generic IP tables obj-$(CONFIG_IP_NF_IPTABLES) += ip_tables.o diff --git a/net/ipv4/netfilter/arptable_filter.c b/net/ipv4/netfilter/arptable_filter.c index a865f6f94013..802ddecb30b8 100644 --- a/net/ipv4/netfilter/arptable_filter.c +++ b/net/ipv4/netfilter/arptable_filter.c @@ -27,13 +27,14 @@ static const struct xt_table packet_filter = { /* The work comes in here from netfilter.c */ static unsigned int -arptable_filter_hook(unsigned int hook, struct sk_buff *skb, +arptable_filter_hook(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) { const struct net *net = dev_net((in != NULL) ? in : out); - return arpt_do_table(skb, hook, in, out, net->ipv4.arptable_filter); + return arpt_do_table(skb, ops->hooknum, in, out, + net->ipv4.arptable_filter); } static struct nf_hook_ops *arpfilter_ops __read_mostly; diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c index 0b732efd32e2..2510c02c2d21 100644 --- a/net/ipv4/netfilter/ipt_CLUSTERIP.c +++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c @@ -28,6 +28,7 @@ #include <linux/netfilter_ipv4/ipt_CLUSTERIP.h> #include <net/netfilter/nf_conntrack.h> #include <net/net_namespace.h> +#include <net/netns/generic.h> #include <net/checksum.h> #include <net/ip.h> @@ -57,15 +58,21 @@ struct clusterip_config { struct rcu_head rcu; }; -static LIST_HEAD(clusterip_configs); +#ifdef CONFIG_PROC_FS +static const struct file_operations clusterip_proc_fops; +#endif -/* clusterip_lock protects the clusterip_configs list */ -static DEFINE_SPINLOCK(clusterip_lock); +static int clusterip_net_id __read_mostly; + +struct clusterip_net { + struct list_head configs; + /* lock protects the configs list */ + spinlock_t lock; #ifdef CONFIG_PROC_FS -static const struct file_operations clusterip_proc_fops; -static struct proc_dir_entry *clusterip_procdir; + struct proc_dir_entry *procdir; #endif +}; static inline void clusterip_config_get(struct clusterip_config *c) @@ -92,10 +99,13 @@ clusterip_config_put(struct clusterip_config *c) static inline void clusterip_config_entry_put(struct clusterip_config *c) { + struct net *net = dev_net(c->dev); + struct clusterip_net *cn = net_generic(net, clusterip_net_id); + local_bh_disable(); - if (atomic_dec_and_lock(&c->entries, &clusterip_lock)) { + if (atomic_dec_and_lock(&c->entries, &cn->lock)) { list_del_rcu(&c->list); - spin_unlock(&clusterip_lock); + spin_unlock(&cn->lock); local_bh_enable(); dev_mc_del(c->dev, c->clustermac); @@ -113,11 +123,12 @@ clusterip_config_entry_put(struct clusterip_config *c) } static struct clusterip_config * -__clusterip_config_find(__be32 clusterip) +__clusterip_config_find(struct net *net, __be32 clusterip) { struct clusterip_config *c; + struct clusterip_net *cn = net_generic(net, clusterip_net_id); - list_for_each_entry_rcu(c, &clusterip_configs, list) { + list_for_each_entry_rcu(c, &cn->configs, list) { if (c->clusterip == clusterip) return c; } @@ -126,12 +137,12 @@ __clusterip_config_find(__be32 clusterip) } static inline struct clusterip_config * -clusterip_config_find_get(__be32 clusterip, int entry) +clusterip_config_find_get(struct net *net, __be32 clusterip, int entry) { struct clusterip_config *c; rcu_read_lock_bh(); - c = __clusterip_config_find(clusterip); + c = __clusterip_config_find(net, clusterip); if (c) { if (unlikely(!atomic_inc_not_zero(&c->refcount))) c = NULL; @@ -158,6 +169,7 @@ clusterip_config_init(const struct ipt_clusterip_tgt_info *i, __be32 ip, struct net_device *dev) { struct clusterip_config *c; + struct clusterip_net *cn = net_generic(dev_net(dev), clusterip_net_id); c = kzalloc(sizeof(*c), GFP_ATOMIC); if (!c) @@ -180,7 +192,7 @@ clusterip_config_init(const struct ipt_clusterip_tgt_info *i, __be32 ip, /* create proc dir entry */ sprintf(buffer, "%pI4", &ip); c->pde = proc_create_data(buffer, S_IWUSR|S_IRUSR, - clusterip_procdir, + cn->procdir, &clusterip_proc_fops, c); if (!c->pde) { kfree(c); @@ -189,9 +201,9 @@ clusterip_config_init(const struct ipt_clusterip_tgt_info *i, __be32 ip, } #endif - spin_lock_bh(&clusterip_lock); - list_add_rcu(&c->list, &clusterip_configs); - spin_unlock_bh(&clusterip_lock); + spin_lock_bh(&cn->lock); + list_add_rcu(&c->list, &cn->configs); + spin_unlock_bh(&cn->lock); return c; } @@ -370,7 +382,7 @@ static int clusterip_tg_check(const struct xt_tgchk_param *par) /* FIXME: further sanity checks */ - config = clusterip_config_find_get(e->ip.dst.s_addr, 1); + config = clusterip_config_find_get(par->net, e->ip.dst.s_addr, 1); if (!config) { if (!(cipinfo->flags & CLUSTERIP_FLAG_NEW)) { pr_info("no config found for %pI4, need 'new'\n", @@ -384,7 +396,7 @@ static int clusterip_tg_check(const struct xt_tgchk_param *par) return -EINVAL; } - dev = dev_get_by_name(&init_net, e->ip.iniface); + dev = dev_get_by_name(par->net, e->ip.iniface); if (!dev) { pr_info("no such interface %s\n", e->ip.iniface); @@ -483,7 +495,7 @@ static void arp_print(struct arp_payload *payload) #endif static unsigned int -arp_mangle(unsigned int hook, +arp_mangle(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, @@ -492,6 +504,7 @@ arp_mangle(unsigned int hook, struct arphdr *arp = arp_hdr(skb); struct arp_payload *payload; struct clusterip_config *c; + struct net *net = dev_net(in ? in : out); /* we don't care about non-ethernet and non-ipv4 ARP */ if (arp->ar_hrd != htons(ARPHRD_ETHER) || @@ -508,7 +521,7 @@ arp_mangle(unsigned int hook, /* if there is no clusterip configuration for the arp reply's * source ip, we don't want to mangle it */ - c = clusterip_config_find_get(payload->src_ip, 0); + c = clusterip_config_find_get(net, payload->src_ip, 0); if (!c) return NF_ACCEPT; @@ -698,48 +711,75 @@ static const struct file_operations clusterip_proc_fops = { #endif /* CONFIG_PROC_FS */ +static int clusterip_net_init(struct net *net) +{ + struct clusterip_net *cn = net_generic(net, clusterip_net_id); + + INIT_LIST_HEAD(&cn->configs); + + spin_lock_init(&cn->lock); + +#ifdef CONFIG_PROC_FS + cn->procdir = proc_mkdir("ipt_CLUSTERIP", net->proc_net); + if (!cn->procdir) { + pr_err("Unable to proc dir entry\n"); + return -ENOMEM; + } +#endif /* CONFIG_PROC_FS */ + + return 0; +} + +static void clusterip_net_exit(struct net *net) +{ +#ifdef CONFIG_PROC_FS + struct clusterip_net *cn = net_generic(net, clusterip_net_id); + proc_remove(cn->procdir); +#endif +} + +static struct pernet_operations clusterip_net_ops = { + .init = clusterip_net_init, + .exit = clusterip_net_exit, + .id = &clusterip_net_id, + .size = sizeof(struct clusterip_net), +}; + static int __init clusterip_tg_init(void) { int ret; - ret = xt_register_target(&clusterip_tg_reg); + ret = register_pernet_subsys(&clusterip_net_ops); if (ret < 0) return ret; + ret = xt_register_target(&clusterip_tg_reg); + if (ret < 0) + goto cleanup_subsys; + ret = nf_register_hook(&cip_arp_ops); if (ret < 0) goto cleanup_target; -#ifdef CONFIG_PROC_FS - clusterip_procdir = proc_mkdir("ipt_CLUSTERIP", init_net.proc_net); - if (!clusterip_procdir) { - pr_err("Unable to proc dir entry\n"); - ret = -ENOMEM; - goto cleanup_hook; - } -#endif /* CONFIG_PROC_FS */ - pr_info("ClusterIP Version %s loaded successfully\n", CLUSTERIP_VERSION); + return 0; -#ifdef CONFIG_PROC_FS -cleanup_hook: - nf_unregister_hook(&cip_arp_ops); -#endif /* CONFIG_PROC_FS */ cleanup_target: xt_unregister_target(&clusterip_tg_reg); +cleanup_subsys: + unregister_pernet_subsys(&clusterip_net_ops); return ret; } static void __exit clusterip_tg_exit(void) { pr_info("ClusterIP Version %s unloading\n", CLUSTERIP_VERSION); -#ifdef CONFIG_PROC_FS - proc_remove(clusterip_procdir); -#endif + nf_unregister_hook(&cip_arp_ops); xt_unregister_target(&clusterip_tg_reg); + unregister_pernet_subsys(&clusterip_net_ops); /* Wait for completion of call_rcu_bh()'s (clusterip_config_rcu_free) */ rcu_barrier_bh(); diff --git a/net/ipv4/netfilter/ipt_SYNPROXY.c b/net/ipv4/netfilter/ipt_SYNPROXY.c index b6346bf2fde3..01cffeaa0085 100644 --- a/net/ipv4/netfilter/ipt_SYNPROXY.c +++ b/net/ipv4/netfilter/ipt_SYNPROXY.c @@ -297,7 +297,7 @@ synproxy_tg4(struct sk_buff *skb, const struct xt_action_param *par) return XT_CONTINUE; } -static unsigned int ipv4_synproxy_hook(unsigned int hooknum, +static unsigned int ipv4_synproxy_hook(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, diff --git a/net/ipv4/netfilter/iptable_filter.c b/net/ipv4/netfilter/iptable_filter.c index 50af5b45c050..e08a74a243a8 100644 --- a/net/ipv4/netfilter/iptable_filter.c +++ b/net/ipv4/netfilter/iptable_filter.c @@ -33,20 +33,21 @@ static const struct xt_table packet_filter = { }; static unsigned int -iptable_filter_hook(unsigned int hook, struct sk_buff *skb, +iptable_filter_hook(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) { const struct net *net; - if (hook == NF_INET_LOCAL_OUT && + if (ops->hooknum == NF_INET_LOCAL_OUT && (skb->len < sizeof(struct iphdr) || ip_hdrlen(skb) < sizeof(struct iphdr))) /* root is playing with raw sockets. */ return NF_ACCEPT; net = dev_net((in != NULL) ? in : out); - return ipt_do_table(skb, hook, in, out, net->ipv4.iptable_filter); + return ipt_do_table(skb, ops->hooknum, in, out, + net->ipv4.iptable_filter); } static struct nf_hook_ops *filter_ops __read_mostly; diff --git a/net/ipv4/netfilter/iptable_mangle.c b/net/ipv4/netfilter/iptable_mangle.c index 0d8cd82e0fad..6a5079c34bb3 100644 --- a/net/ipv4/netfilter/iptable_mangle.c +++ b/net/ipv4/netfilter/iptable_mangle.c @@ -79,19 +79,19 @@ ipt_mangle_out(struct sk_buff *skb, const struct net_device *out) /* The work comes in here from netfilter.c. */ static unsigned int -iptable_mangle_hook(unsigned int hook, +iptable_mangle_hook(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) { - if (hook == NF_INET_LOCAL_OUT) + if (ops->hooknum == NF_INET_LOCAL_OUT) return ipt_mangle_out(skb, out); - if (hook == NF_INET_POST_ROUTING) - return ipt_do_table(skb, hook, in, out, + if (ops->hooknum == NF_INET_POST_ROUTING) + return ipt_do_table(skb, ops->hooknum, in, out, dev_net(out)->ipv4.iptable_mangle); /* PREROUTING/INPUT/FORWARD: */ - return ipt_do_table(skb, hook, in, out, + return ipt_do_table(skb, ops->hooknum, in, out, dev_net(in)->ipv4.iptable_mangle); } diff --git a/net/ipv4/netfilter/iptable_nat.c b/net/ipv4/netfilter/iptable_nat.c index 683bfaffed65..ee2886126e3d 100644 --- a/net/ipv4/netfilter/iptable_nat.c +++ b/net/ipv4/netfilter/iptable_nat.c @@ -61,7 +61,7 @@ static unsigned int nf_nat_rule_find(struct sk_buff *skb, unsigned int hooknum, } static unsigned int -nf_nat_ipv4_fn(unsigned int hooknum, +nf_nat_ipv4_fn(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, @@ -71,7 +71,7 @@ nf_nat_ipv4_fn(unsigned int hooknum, enum ip_conntrack_info ctinfo; struct nf_conn_nat *nat; /* maniptype == SRC for postrouting. */ - enum nf_nat_manip_type maniptype = HOOK2MANIP(hooknum); + enum nf_nat_manip_type maniptype = HOOK2MANIP(ops->hooknum); /* We never see fragments: conntrack defrags on pre-routing * and local-out, and nf_nat_out protects post-routing. @@ -108,7 +108,7 @@ nf_nat_ipv4_fn(unsigned int hooknum, case IP_CT_RELATED_REPLY: if (ip_hdr(skb)->protocol == IPPROTO_ICMP) { if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo, - hooknum)) + ops->hooknum)) return NF_DROP; else return NF_ACCEPT; @@ -121,14 +121,14 @@ nf_nat_ipv4_fn(unsigned int hooknum, if (!nf_nat_initialized(ct, maniptype)) { unsigned int ret; - ret = nf_nat_rule_find(skb, hooknum, in, out, ct); + ret = nf_nat_rule_find(skb, ops->hooknum, in, out, ct); if (ret != NF_ACCEPT) return ret; } else { pr_debug("Already setup manip %s for ct %p\n", maniptype == NF_NAT_MANIP_SRC ? "SRC" : "DST", ct); - if (nf_nat_oif_changed(hooknum, ctinfo, nat, out)) + if (nf_nat_oif_changed(ops->hooknum, ctinfo, nat, out)) goto oif_changed; } break; @@ -137,11 +137,11 @@ nf_nat_ipv4_fn(unsigned int hooknum, /* ESTABLISHED */ NF_CT_ASSERT(ctinfo == IP_CT_ESTABLISHED || ctinfo == IP_CT_ESTABLISHED_REPLY); - if (nf_nat_oif_changed(hooknum, ctinfo, nat, out)) + if (nf_nat_oif_changed(ops->hooknum, ctinfo, nat, out)) goto oif_changed; } - return nf_nat_packet(ct, ctinfo, hooknum, skb); + return nf_nat_packet(ct, ctinfo, ops->hooknum, skb); oif_changed: nf_ct_kill_acct(ct, ctinfo, skb); @@ -149,7 +149,7 @@ oif_changed: } static unsigned int -nf_nat_ipv4_in(unsigned int hooknum, +nf_nat_ipv4_in(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, @@ -158,7 +158,7 @@ nf_nat_ipv4_in(unsigned int hooknum, unsigned int ret; __be32 daddr = ip_hdr(skb)->daddr; - ret = nf_nat_ipv4_fn(hooknum, skb, in, out, okfn); + ret = nf_nat_ipv4_fn(ops, skb, in, out, okfn); if (ret != NF_DROP && ret != NF_STOLEN && daddr != ip_hdr(skb)->daddr) skb_dst_drop(skb); @@ -167,7 +167,7 @@ nf_nat_ipv4_in(unsigned int hooknum, } static unsigned int -nf_nat_ipv4_out(unsigned int hooknum, +nf_nat_ipv4_out(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, @@ -185,7 +185,7 @@ nf_nat_ipv4_out(unsigned int hooknum, ip_hdrlen(skb) < sizeof(struct iphdr)) return NF_ACCEPT; - ret = nf_nat_ipv4_fn(hooknum, skb, in, out, okfn); + ret = nf_nat_ipv4_fn(ops, skb, in, out, okfn); #ifdef CONFIG_XFRM if (ret != NF_DROP && ret != NF_STOLEN && !(IPCB(skb)->flags & IPSKB_XFRM_TRANSFORMED) && @@ -207,7 +207,7 @@ nf_nat_ipv4_out(unsigned int hooknum, } static unsigned int -nf_nat_ipv4_local_fn(unsigned int hooknum, +nf_nat_ipv4_local_fn(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, @@ -223,7 +223,7 @@ nf_nat_ipv4_local_fn(unsigned int hooknum, ip_hdrlen(skb) < sizeof(struct iphdr)) return NF_ACCEPT; - ret = nf_nat_ipv4_fn(hooknum, skb, in, out, okfn); + ret = nf_nat_ipv4_fn(ops, skb, in, out, okfn); if (ret != NF_DROP && ret != NF_STOLEN && (ct = nf_ct_get(skb, &ctinfo)) != NULL) { enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo); diff --git a/net/ipv4/netfilter/iptable_raw.c b/net/ipv4/netfilter/iptable_raw.c index 1f82aea11df6..b2f7e8f98316 100644 --- a/net/ipv4/netfilter/iptable_raw.c +++ b/net/ipv4/netfilter/iptable_raw.c @@ -20,20 +20,20 @@ static const struct xt_table packet_raw = { /* The work comes in here from netfilter.c. */ static unsigned int -iptable_raw_hook(unsigned int hook, struct sk_buff *skb, +iptable_raw_hook(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) { const struct net *net; - if (hook == NF_INET_LOCAL_OUT && + if (ops->hooknum == NF_INET_LOCAL_OUT && (skb->len < sizeof(struct iphdr) || ip_hdrlen(skb) < sizeof(struct iphdr))) /* root is playing with raw sockets. */ return NF_ACCEPT; net = dev_net((in != NULL) ? in : out); - return ipt_do_table(skb, hook, in, out, net->ipv4.iptable_raw); + return ipt_do_table(skb, ops->hooknum, in, out, net->ipv4.iptable_raw); } static struct nf_hook_ops *rawtable_ops __read_mostly; diff --git a/net/ipv4/netfilter/iptable_security.c b/net/ipv4/netfilter/iptable_security.c index f867a8d38bf7..c86647ed2078 100644 --- a/net/ipv4/netfilter/iptable_security.c +++ b/net/ipv4/netfilter/iptable_security.c @@ -37,21 +37,22 @@ static const struct xt_table security_table = { }; static unsigned int -iptable_security_hook(unsigned int hook, struct sk_buff *skb, +iptable_security_hook(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) { const struct net *net; - if (hook == NF_INET_LOCAL_OUT && + if (ops->hooknum == NF_INET_LOCAL_OUT && (skb->len < sizeof(struct iphdr) || ip_hdrlen(skb) < sizeof(struct iphdr))) /* Somebody is playing with raw sockets. */ return NF_ACCEPT; net = dev_net((in != NULL) ? in : out); - return ipt_do_table(skb, hook, in, out, net->ipv4.iptable_security); + return ipt_do_table(skb, ops->hooknum, in, out, + net->ipv4.iptable_security); } static struct nf_hook_ops *sectbl_ops __read_mostly; diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c index 86f5b34a4ed1..ecd8bec411c9 100644 --- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c +++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c @@ -92,7 +92,7 @@ static int ipv4_get_l4proto(const struct sk_buff *skb, unsigned int nhoff, return NF_ACCEPT; } -static unsigned int ipv4_helper(unsigned int hooknum, +static unsigned int ipv4_helper(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, @@ -121,7 +121,7 @@ static unsigned int ipv4_helper(unsigned int hooknum, ct, ctinfo); } -static unsigned int ipv4_confirm(unsigned int hooknum, +static unsigned int ipv4_confirm(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, @@ -147,16 +147,16 @@ out: return nf_conntrack_confirm(skb); } -static unsigned int ipv4_conntrack_in(unsigned int hooknum, +static unsigned int ipv4_conntrack_in(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) { - return nf_conntrack_in(dev_net(in), PF_INET, hooknum, skb); + return nf_conntrack_in(dev_net(in), PF_INET, ops->hooknum, skb); } -static unsigned int ipv4_conntrack_local(unsigned int hooknum, +static unsigned int ipv4_conntrack_local(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, @@ -166,7 +166,7 @@ static unsigned int ipv4_conntrack_local(unsigned int hooknum, if (skb->len < sizeof(struct iphdr) || ip_hdrlen(skb) < sizeof(struct iphdr)) return NF_ACCEPT; - return nf_conntrack_in(dev_net(out), PF_INET, hooknum, skb); + return nf_conntrack_in(dev_net(out), PF_INET, ops->hooknum, skb); } /* Connection tracking may drop packets, but never alters them, so diff --git a/net/ipv4/netfilter/nf_defrag_ipv4.c b/net/ipv4/netfilter/nf_defrag_ipv4.c index 742815518b0f..12e13bd82b5b 100644 --- a/net/ipv4/netfilter/nf_defrag_ipv4.c +++ b/net/ipv4/netfilter/nf_defrag_ipv4.c @@ -60,7 +60,7 @@ static enum ip_defrag_users nf_ct_defrag_user(unsigned int hooknum, return IP_DEFRAG_CONNTRACK_OUT + zone; } -static unsigned int ipv4_conntrack_defrag(unsigned int hooknum, +static unsigned int ipv4_conntrack_defrag(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, @@ -83,7 +83,9 @@ static unsigned int ipv4_conntrack_defrag(unsigned int hooknum, #endif /* Gather fragments. */ if (ip_is_fragment(ip_hdr(skb))) { - enum ip_defrag_users user = nf_ct_defrag_user(hooknum, skb); + enum ip_defrag_users user = + nf_ct_defrag_user(ops->hooknum, skb); + if (nf_ct_ipv4_gather_frags(skb, user)) return NF_STOLEN; } diff --git a/net/ipv4/netfilter/nf_tables_arp.c b/net/ipv4/netfilter/nf_tables_arp.c new file mode 100644 index 000000000000..3e67ef1c676f --- /dev/null +++ b/net/ipv4/netfilter/nf_tables_arp.c @@ -0,0 +1,102 @@ +/* + * Copyright (c) 2008-2010 Patrick McHardy <kaber@trash.net> + * Copyright (c) 2013 Pablo Neira Ayuso <pablo@netfilter.org> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Development of this code funded by Astaro AG (http://www.astaro.com/) + */ + +#include <linux/module.h> +#include <linux/init.h> +#include <linux/netfilter_arp.h> +#include <net/netfilter/nf_tables.h> + +static struct nft_af_info nft_af_arp __read_mostly = { + .family = NFPROTO_ARP, + .nhooks = NF_ARP_NUMHOOKS, + .owner = THIS_MODULE, +}; + +static int nf_tables_arp_init_net(struct net *net) +{ + net->nft.arp = kmalloc(sizeof(struct nft_af_info), GFP_KERNEL); + if (net->nft.arp== NULL) + return -ENOMEM; + + memcpy(net->nft.arp, &nft_af_arp, sizeof(nft_af_arp)); + + if (nft_register_afinfo(net, net->nft.arp) < 0) + goto err; + + return 0; +err: + kfree(net->nft.arp); + return -ENOMEM; +} + +static void nf_tables_arp_exit_net(struct net *net) +{ + nft_unregister_afinfo(net->nft.arp); + kfree(net->nft.arp); +} + +static struct pernet_operations nf_tables_arp_net_ops = { + .init = nf_tables_arp_init_net, + .exit = nf_tables_arp_exit_net, +}; + +static unsigned int +nft_do_chain_arp(const struct nf_hook_ops *ops, + struct sk_buff *skb, + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) +{ + struct nft_pktinfo pkt; + + nft_set_pktinfo(&pkt, ops, skb, in, out); + + return nft_do_chain_pktinfo(&pkt, ops); +} + +static struct nf_chain_type filter_arp = { + .family = NFPROTO_ARP, + .name = "filter", + .type = NFT_CHAIN_T_DEFAULT, + .hook_mask = (1 << NF_ARP_IN) | + (1 << NF_ARP_OUT) | + (1 << NF_ARP_FORWARD), + .fn = { + [NF_ARP_IN] = nft_do_chain_arp, + [NF_ARP_OUT] = nft_do_chain_arp, + [NF_ARP_FORWARD] = nft_do_chain_arp, + }, +}; + +static int __init nf_tables_arp_init(void) +{ + int ret; + + nft_register_chain_type(&filter_arp); + ret = register_pernet_subsys(&nf_tables_arp_net_ops); + if (ret < 0) + nft_unregister_chain_type(&filter_arp); + + return ret; +} + +static void __exit nf_tables_arp_exit(void) +{ + unregister_pernet_subsys(&nf_tables_arp_net_ops); + nft_unregister_chain_type(&filter_arp); +} + +module_init(nf_tables_arp_init); +module_exit(nf_tables_arp_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>"); +MODULE_ALIAS_NFT_FAMILY(3); /* NFPROTO_ARP */ diff --git a/net/ipv4/netfilter/nf_tables_ipv4.c b/net/ipv4/netfilter/nf_tables_ipv4.c new file mode 100644 index 000000000000..0f4cbfeb19bd --- /dev/null +++ b/net/ipv4/netfilter/nf_tables_ipv4.c @@ -0,0 +1,127 @@ +/* + * Copyright (c) 2008 Patrick McHardy <kaber@trash.net> + * Copyright (c) 2012-2013 Pablo Neira Ayuso <pablo@netfilter.org> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Development of this code funded by Astaro AG (http://www.astaro.com/) + */ + +#include <linux/init.h> +#include <linux/module.h> +#include <linux/ip.h> +#include <linux/netfilter_ipv4.h> +#include <net/netfilter/nf_tables.h> +#include <net/net_namespace.h> +#include <net/ip.h> +#include <net/netfilter/nf_tables_ipv4.h> + +static unsigned int nft_ipv4_output(const struct nf_hook_ops *ops, + struct sk_buff *skb, + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) +{ + struct nft_pktinfo pkt; + + if (unlikely(skb->len < sizeof(struct iphdr) || + ip_hdr(skb)->ihl < sizeof(struct iphdr) / 4)) { + if (net_ratelimit()) + pr_info("nf_tables_ipv4: ignoring short SOCK_RAW " + "packet\n"); + return NF_ACCEPT; + } + nft_set_pktinfo_ipv4(&pkt, ops, skb, in, out); + + return nft_do_chain_pktinfo(&pkt, ops); +} + +static struct nft_af_info nft_af_ipv4 __read_mostly = { + .family = NFPROTO_IPV4, + .nhooks = NF_INET_NUMHOOKS, + .owner = THIS_MODULE, + .hooks = { + [NF_INET_LOCAL_OUT] = nft_ipv4_output, + }, +}; + +static int nf_tables_ipv4_init_net(struct net *net) +{ + net->nft.ipv4 = kmalloc(sizeof(struct nft_af_info), GFP_KERNEL); + if (net->nft.ipv4 == NULL) + return -ENOMEM; + + memcpy(net->nft.ipv4, &nft_af_ipv4, sizeof(nft_af_ipv4)); + + if (nft_register_afinfo(net, net->nft.ipv4) < 0) + goto err; + + return 0; +err: + kfree(net->nft.ipv4); + return -ENOMEM; +} + +static void nf_tables_ipv4_exit_net(struct net *net) +{ + nft_unregister_afinfo(net->nft.ipv4); + kfree(net->nft.ipv4); +} + +static struct pernet_operations nf_tables_ipv4_net_ops = { + .init = nf_tables_ipv4_init_net, + .exit = nf_tables_ipv4_exit_net, +}; + +static unsigned int +nft_do_chain_ipv4(const struct nf_hook_ops *ops, + struct sk_buff *skb, + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) +{ + struct nft_pktinfo pkt; + + nft_set_pktinfo_ipv4(&pkt, ops, skb, in, out); + + return nft_do_chain_pktinfo(&pkt, ops); +} + +static struct nf_chain_type filter_ipv4 = { + .family = NFPROTO_IPV4, + .name = "filter", + .type = NFT_CHAIN_T_DEFAULT, + .hook_mask = (1 << NF_INET_LOCAL_IN) | + (1 << NF_INET_LOCAL_OUT) | + (1 << NF_INET_FORWARD) | + (1 << NF_INET_PRE_ROUTING) | + (1 << NF_INET_POST_ROUTING), + .fn = { + [NF_INET_LOCAL_IN] = nft_do_chain_ipv4, + [NF_INET_LOCAL_OUT] = nft_ipv4_output, + [NF_INET_FORWARD] = nft_do_chain_ipv4, + [NF_INET_PRE_ROUTING] = nft_do_chain_ipv4, + [NF_INET_POST_ROUTING] = nft_do_chain_ipv4, + }, +}; + +static int __init nf_tables_ipv4_init(void) +{ + nft_register_chain_type(&filter_ipv4); + return register_pernet_subsys(&nf_tables_ipv4_net_ops); +} + +static void __exit nf_tables_ipv4_exit(void) +{ + unregister_pernet_subsys(&nf_tables_ipv4_net_ops); + nft_unregister_chain_type(&filter_ipv4); +} + +module_init(nf_tables_ipv4_init); +module_exit(nf_tables_ipv4_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>"); +MODULE_ALIAS_NFT_FAMILY(AF_INET); diff --git a/net/ipv4/netfilter/nft_chain_nat_ipv4.c b/net/ipv4/netfilter/nft_chain_nat_ipv4.c new file mode 100644 index 000000000000..cf2c792cd971 --- /dev/null +++ b/net/ipv4/netfilter/nft_chain_nat_ipv4.c @@ -0,0 +1,205 @@ +/* + * Copyright (c) 2008-2009 Patrick McHardy <kaber@trash.net> + * Copyright (c) 2012 Pablo Neira Ayuso <pablo@netfilter.org> + * Copyright (c) 2012 Intel Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Development of this code funded by Astaro AG (http://www.astaro.com/) + */ + +#include <linux/module.h> +#include <linux/init.h> +#include <linux/list.h> +#include <linux/skbuff.h> +#include <linux/ip.h> +#include <linux/netfilter.h> +#include <linux/netfilter_ipv4.h> +#include <linux/netfilter/nf_tables.h> +#include <net/netfilter/nf_conntrack.h> +#include <net/netfilter/nf_nat.h> +#include <net/netfilter/nf_nat_core.h> +#include <net/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables_ipv4.h> +#include <net/netfilter/nf_nat_l3proto.h> +#include <net/ip.h> + +/* + * NAT chains + */ + +static unsigned int nf_nat_fn(const struct nf_hook_ops *ops, + struct sk_buff *skb, + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) +{ + enum ip_conntrack_info ctinfo; + struct nf_conn *ct = nf_ct_get(skb, &ctinfo); + struct nf_conn_nat *nat; + enum nf_nat_manip_type maniptype = HOOK2MANIP(ops->hooknum); + struct nft_pktinfo pkt; + unsigned int ret; + + if (ct == NULL || nf_ct_is_untracked(ct)) + return NF_ACCEPT; + + NF_CT_ASSERT(!(ip_hdr(skb)->frag_off & htons(IP_MF | IP_OFFSET))); + + nat = nfct_nat(ct); + if (nat == NULL) { + /* Conntrack module was loaded late, can't add extension. */ + if (nf_ct_is_confirmed(ct)) + return NF_ACCEPT; + nat = nf_ct_ext_add(ct, NF_CT_EXT_NAT, GFP_ATOMIC); + if (nat == NULL) + return NF_ACCEPT; + } + + switch (ctinfo) { + case IP_CT_RELATED: + case IP_CT_RELATED + IP_CT_IS_REPLY: + if (ip_hdr(skb)->protocol == IPPROTO_ICMP) { + if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo, + ops->hooknum)) + return NF_DROP; + else + return NF_ACCEPT; + } + /* Fall through */ + case IP_CT_NEW: + if (nf_nat_initialized(ct, maniptype)) + break; + + nft_set_pktinfo_ipv4(&pkt, ops, skb, in, out); + + ret = nft_do_chain_pktinfo(&pkt, ops); + if (ret != NF_ACCEPT) + return ret; + if (!nf_nat_initialized(ct, maniptype)) { + ret = nf_nat_alloc_null_binding(ct, ops->hooknum); + if (ret != NF_ACCEPT) + return ret; + } + default: + break; + } + + return nf_nat_packet(ct, ctinfo, ops->hooknum, skb); +} + +static unsigned int nf_nat_prerouting(const struct nf_hook_ops *ops, + struct sk_buff *skb, + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) +{ + __be32 daddr = ip_hdr(skb)->daddr; + unsigned int ret; + + ret = nf_nat_fn(ops, skb, in, out, okfn); + if (ret != NF_DROP && ret != NF_STOLEN && + ip_hdr(skb)->daddr != daddr) { + skb_dst_drop(skb); + } + return ret; +} + +static unsigned int nf_nat_postrouting(const struct nf_hook_ops *ops, + struct sk_buff *skb, + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) +{ + enum ip_conntrack_info ctinfo __maybe_unused; + const struct nf_conn *ct __maybe_unused; + unsigned int ret; + + ret = nf_nat_fn(ops, skb, in, out, okfn); +#ifdef CONFIG_XFRM + if (ret != NF_DROP && ret != NF_STOLEN && + (ct = nf_ct_get(skb, &ctinfo)) != NULL) { + enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo); + + if (ct->tuplehash[dir].tuple.src.u3.ip != + ct->tuplehash[!dir].tuple.dst.u3.ip || + ct->tuplehash[dir].tuple.src.u.all != + ct->tuplehash[!dir].tuple.dst.u.all) + return nf_xfrm_me_harder(skb, AF_INET) == 0 ? + ret : NF_DROP; + } +#endif + return ret; +} + +static unsigned int nf_nat_output(const struct nf_hook_ops *ops, + struct sk_buff *skb, + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) +{ + enum ip_conntrack_info ctinfo; + const struct nf_conn *ct; + unsigned int ret; + + ret = nf_nat_fn(ops, skb, in, out, okfn); + if (ret != NF_DROP && ret != NF_STOLEN && + (ct = nf_ct_get(skb, &ctinfo)) != NULL) { + enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo); + + if (ct->tuplehash[dir].tuple.dst.u3.ip != + ct->tuplehash[!dir].tuple.src.u3.ip) { + if (ip_route_me_harder(skb, RTN_UNSPEC)) + ret = NF_DROP; + } +#ifdef CONFIG_XFRM + else if (ct->tuplehash[dir].tuple.dst.u.all != + ct->tuplehash[!dir].tuple.src.u.all) + if (nf_xfrm_me_harder(skb, AF_INET)) + ret = NF_DROP; +#endif + } + return ret; +} + +static struct nf_chain_type nft_chain_nat_ipv4 = { + .family = NFPROTO_IPV4, + .name = "nat", + .type = NFT_CHAIN_T_NAT, + .hook_mask = (1 << NF_INET_PRE_ROUTING) | + (1 << NF_INET_POST_ROUTING) | + (1 << NF_INET_LOCAL_OUT) | + (1 << NF_INET_LOCAL_IN), + .fn = { + [NF_INET_PRE_ROUTING] = nf_nat_prerouting, + [NF_INET_POST_ROUTING] = nf_nat_postrouting, + [NF_INET_LOCAL_OUT] = nf_nat_output, + [NF_INET_LOCAL_IN] = nf_nat_fn, + }, + .me = THIS_MODULE, +}; + +static int __init nft_chain_nat_init(void) +{ + int err; + + err = nft_register_chain_type(&nft_chain_nat_ipv4); + if (err < 0) + return err; + + return 0; +} + +static void __exit nft_chain_nat_exit(void) +{ + nft_unregister_chain_type(&nft_chain_nat_ipv4); +} + +module_init(nft_chain_nat_init); +module_exit(nft_chain_nat_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>"); +MODULE_ALIAS_NFT_CHAIN(AF_INET, "nat"); diff --git a/net/ipv4/netfilter/nft_chain_route_ipv4.c b/net/ipv4/netfilter/nft_chain_route_ipv4.c new file mode 100644 index 000000000000..4e6bf9a3d7aa --- /dev/null +++ b/net/ipv4/netfilter/nft_chain_route_ipv4.c @@ -0,0 +1,90 @@ +/* + * Copyright (c) 2008 Patrick McHardy <kaber@trash.net> + * Copyright (c) 2012 Pablo Neira Ayuso <pablo@netfilter.org> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include <linux/module.h> +#include <linux/init.h> +#include <linux/list.h> +#include <linux/skbuff.h> +#include <linux/netlink.h> +#include <linux/netfilter.h> +#include <linux/netfilter_ipv4.h> +#include <linux/netfilter/nfnetlink.h> +#include <linux/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables_ipv4.h> +#include <net/route.h> +#include <net/ip.h> + +static unsigned int nf_route_table_hook(const struct nf_hook_ops *ops, + struct sk_buff *skb, + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) +{ + unsigned int ret; + struct nft_pktinfo pkt; + u32 mark; + __be32 saddr, daddr; + u_int8_t tos; + const struct iphdr *iph; + + /* root is playing with raw sockets. */ + if (skb->len < sizeof(struct iphdr) || + ip_hdrlen(skb) < sizeof(struct iphdr)) + return NF_ACCEPT; + + nft_set_pktinfo_ipv4(&pkt, ops, skb, in, out); + + mark = skb->mark; + iph = ip_hdr(skb); + saddr = iph->saddr; + daddr = iph->daddr; + tos = iph->tos; + + ret = nft_do_chain_pktinfo(&pkt, ops); + if (ret != NF_DROP && ret != NF_QUEUE) { + iph = ip_hdr(skb); + + if (iph->saddr != saddr || + iph->daddr != daddr || + skb->mark != mark || + iph->tos != tos) + if (ip_route_me_harder(skb, RTN_UNSPEC)) + ret = NF_DROP; + } + return ret; +} + +static struct nf_chain_type nft_chain_route_ipv4 = { + .family = NFPROTO_IPV4, + .name = "route", + .type = NFT_CHAIN_T_ROUTE, + .hook_mask = (1 << NF_INET_LOCAL_OUT), + .fn = { + [NF_INET_LOCAL_OUT] = nf_route_table_hook, + }, + .me = THIS_MODULE, +}; + +static int __init nft_chain_route_init(void) +{ + return nft_register_chain_type(&nft_chain_route_ipv4); +} + +static void __exit nft_chain_route_exit(void) +{ + nft_unregister_chain_type(&nft_chain_route_ipv4); +} + +module_init(nft_chain_route_init); +module_exit(nft_chain_route_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>"); +MODULE_ALIAS_NFT_CHAIN(AF_INET, "route"); diff --git a/net/ipv4/netfilter/nft_reject_ipv4.c b/net/ipv4/netfilter/nft_reject_ipv4.c new file mode 100644 index 000000000000..fff5ba1a33b7 --- /dev/null +++ b/net/ipv4/netfilter/nft_reject_ipv4.c @@ -0,0 +1,123 @@ +/* + * Copyright (c) 2008-2009 Patrick McHardy <kaber@trash.net> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Development of this code funded by Astaro AG (http://www.astaro.com/) + */ + +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/module.h> +#include <linux/netlink.h> +#include <linux/netfilter.h> +#include <linux/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables.h> +#include <net/icmp.h> + +struct nft_reject { + enum nft_reject_types type:8; + u8 icmp_code; +}; + +static void nft_reject_eval(const struct nft_expr *expr, + struct nft_data data[NFT_REG_MAX + 1], + const struct nft_pktinfo *pkt) +{ + struct nft_reject *priv = nft_expr_priv(expr); + + switch (priv->type) { + case NFT_REJECT_ICMP_UNREACH: + icmp_send(pkt->skb, ICMP_DEST_UNREACH, priv->icmp_code, 0); + break; + case NFT_REJECT_TCP_RST: + break; + } + + data[NFT_REG_VERDICT].verdict = NF_DROP; +} + +static const struct nla_policy nft_reject_policy[NFTA_REJECT_MAX + 1] = { + [NFTA_REJECT_TYPE] = { .type = NLA_U32 }, + [NFTA_REJECT_ICMP_CODE] = { .type = NLA_U8 }, +}; + +static int nft_reject_init(const struct nft_ctx *ctx, + const struct nft_expr *expr, + const struct nlattr * const tb[]) +{ + struct nft_reject *priv = nft_expr_priv(expr); + + if (tb[NFTA_REJECT_TYPE] == NULL) + return -EINVAL; + + priv->type = ntohl(nla_get_be32(tb[NFTA_REJECT_TYPE])); + switch (priv->type) { + case NFT_REJECT_ICMP_UNREACH: + if (tb[NFTA_REJECT_ICMP_CODE] == NULL) + return -EINVAL; + priv->icmp_code = nla_get_u8(tb[NFTA_REJECT_ICMP_CODE]); + case NFT_REJECT_TCP_RST: + break; + default: + return -EINVAL; + } + + return 0; +} + +static int nft_reject_dump(struct sk_buff *skb, const struct nft_expr *expr) +{ + const struct nft_reject *priv = nft_expr_priv(expr); + + if (nla_put_be32(skb, NFTA_REJECT_TYPE, priv->type)) + goto nla_put_failure; + + switch (priv->type) { + case NFT_REJECT_ICMP_UNREACH: + if (nla_put_u8(skb, NFTA_REJECT_ICMP_CODE, priv->icmp_code)) + goto nla_put_failure; + break; + } + + return 0; + +nla_put_failure: + return -1; +} + +static struct nft_expr_type nft_reject_type; +static const struct nft_expr_ops nft_reject_ops = { + .type = &nft_reject_type, + .size = NFT_EXPR_SIZE(sizeof(struct nft_reject)), + .eval = nft_reject_eval, + .init = nft_reject_init, + .dump = nft_reject_dump, +}; + +static struct nft_expr_type nft_reject_type __read_mostly = { + .name = "reject", + .ops = &nft_reject_ops, + .policy = nft_reject_policy, + .maxattr = NFTA_REJECT_MAX, + .owner = THIS_MODULE, +}; + +static int __init nft_reject_module_init(void) +{ + return nft_register_expr(&nft_reject_type); +} + +static void __exit nft_reject_module_exit(void) +{ + nft_unregister_expr(&nft_reject_type); +} + +module_init(nft_reject_module_init); +module_exit(nft_reject_module_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>"); +MODULE_ALIAS_NFT_EXPR("reject"); diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c index d7d9882d4cae..9afbdb19f4a2 100644 --- a/net/ipv4/ping.c +++ b/net/ipv4/ping.c @@ -202,15 +202,14 @@ static struct sock *ping_lookup(struct net *net, struct sk_buff *skb, u16 ident) #if IS_ENABLED(CONFIG_IPV6) } else if (skb->protocol == htons(ETH_P_IPV6) && sk->sk_family == AF_INET6) { - struct ipv6_pinfo *np = inet6_sk(sk); pr_debug("found: %p: num=%d, daddr=%pI6c, dif=%d\n", sk, (int) isk->inet_num, - &inet6_sk(sk)->rcv_saddr, + &sk->sk_v6_rcv_saddr, sk->sk_bound_dev_if); - if (!ipv6_addr_any(&np->rcv_saddr) && - !ipv6_addr_equal(&np->rcv_saddr, + if (!ipv6_addr_any(&sk->sk_v6_rcv_saddr) && + !ipv6_addr_equal(&sk->sk_v6_rcv_saddr, &ipv6_hdr(skb)->daddr)) continue; #endif @@ -237,11 +236,11 @@ static void inet_get_ping_group_range_net(struct net *net, kgid_t *low, unsigned int seq; do { - seq = read_seqbegin(&sysctl_local_ports.lock); + seq = read_seqbegin(&net->ipv4.sysctl_local_ports.lock); *low = data[0]; *high = data[1]; - } while (read_seqretry(&sysctl_local_ports.lock, seq)); + } while (read_seqretry(&net->ipv4.sysctl_local_ports.lock, seq)); } @@ -362,7 +361,7 @@ static void ping_set_saddr(struct sock *sk, struct sockaddr *saddr) } else if (saddr->sa_family == AF_INET6) { struct sockaddr_in6 *addr = (struct sockaddr_in6 *) saddr; struct ipv6_pinfo *np = inet6_sk(sk); - np->rcv_saddr = np->saddr = addr->sin6_addr; + sk->sk_v6_rcv_saddr = np->saddr = addr->sin6_addr; #endif } } @@ -376,7 +375,7 @@ static void ping_clear_saddr(struct sock *sk, int dif) #if IS_ENABLED(CONFIG_IPV6) } else if (sk->sk_family == AF_INET6) { struct ipv6_pinfo *np = inet6_sk(sk); - memset(&np->rcv_saddr, 0, sizeof(np->rcv_saddr)); + memset(&sk->sk_v6_rcv_saddr, 0, sizeof(sk->sk_v6_rcv_saddr)); memset(&np->saddr, 0, sizeof(np->saddr)); #endif } @@ -416,10 +415,12 @@ int ping_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len) (int)sk->sk_bound_dev_if); err = 0; - if ((sk->sk_family == AF_INET && isk->inet_rcv_saddr) || - (sk->sk_family == AF_INET6 && - !ipv6_addr_any(&inet6_sk(sk)->rcv_saddr))) + if (sk->sk_family == AF_INET && isk->inet_rcv_saddr) sk->sk_userlocks |= SOCK_BINDADDR_LOCK; +#if IS_ENABLED(CONFIG_IPV6) + if (sk->sk_family == AF_INET6 && !ipv6_addr_any(&sk->sk_v6_rcv_saddr)) + sk->sk_userlocks |= SOCK_BINDADDR_LOCK; +#endif if (snum) sk->sk_userlocks |= SOCK_BINDPORT_LOCK; @@ -429,7 +430,7 @@ int ping_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len) #if IS_ENABLED(CONFIG_IPV6) if (sk->sk_family == AF_INET6) - memset(&inet6_sk(sk)->daddr, 0, sizeof(inet6_sk(sk)->daddr)); + memset(&sk->sk_v6_daddr, 0, sizeof(sk->sk_v6_daddr)); #endif sk_dst_reset(sk); @@ -713,6 +714,8 @@ int ping_v4_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, ipc.opt = NULL; ipc.oif = sk->sk_bound_dev_if; ipc.tx_flags = 0; + ipc.ttl = 0; + ipc.tos = -1; sock_tx_timestamp(sk, &ipc.tx_flags); @@ -744,7 +747,7 @@ int ping_v4_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, return -EINVAL; faddr = ipc.opt->opt.faddr; } - tos = RT_TOS(inet->tos); + tos = get_rttos(&ipc, inet); if (sock_flag(sk, SOCK_LOCALROUTE) || (msg->msg_flags & MSG_DONTROUTE) || (ipc.opt && ipc.opt->opt.is_strictroute)) { diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index 193db03540ad..41e1d2845c8f 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -299,7 +299,7 @@ static int raw_rcv_skb(struct sock *sk, struct sk_buff *skb) { /* Charge it to the socket. */ - ipv4_pktinfo_prepare(skb); + ipv4_pktinfo_prepare(sk, skb); if (sock_queue_rcv_skb(sk, skb) < 0) { kfree_skb(skb); return NET_RX_DROP; @@ -519,6 +519,8 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, ipc.addr = inet->inet_saddr; ipc.opt = NULL; ipc.tx_flags = 0; + ipc.ttl = 0; + ipc.tos = -1; ipc.oif = sk->sk_bound_dev_if; if (msg->msg_controllen) { @@ -558,7 +560,7 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, daddr = ipc.opt->opt.faddr; } } - tos = RT_CONN_FLAGS(sk); + tos = get_rtconn_flags(&ipc, sk); if (msg->msg_flags & MSG_DONTROUTE) tos |= RTO_ONLINK; diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 6011615e810d..f428935c50db 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -295,7 +295,7 @@ static int rt_cpu_seq_show(struct seq_file *seq, void *v) seq_printf(seq,"%08x %08x %08x %08x %08x %08x %08x %08x " " %08x %08x %08x %08x %08x %08x %08x %08x %08x \n", dst_entries_get_slow(&ipv4_dst_ops), - st->in_hit, + 0, /* st->in_hit */ st->in_slow_tot, st->in_slow_mc, st->in_no_route, @@ -303,16 +303,16 @@ static int rt_cpu_seq_show(struct seq_file *seq, void *v) st->in_martian_dst, st->in_martian_src, - st->out_hit, + 0, /* st->out_hit */ st->out_slow_tot, st->out_slow_mc, - st->gc_total, - st->gc_ignored, - st->gc_goal_miss, - st->gc_dst_overflow, - st->in_hlist_search, - st->out_hlist_search + 0, /* st->gc_total */ + 0, /* st->gc_ignored */ + 0, /* st->gc_goal_miss */ + 0, /* st->gc_dst_overflow */ + 0, /* st->in_hlist_search */ + 0 /* st->out_hlist_search */ ); return 0; } @@ -1036,6 +1036,10 @@ void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu) bool new = false; bh_lock_sock(sk); + + if (!ip_sk_accept_pmtu(sk)) + goto out; + rt = (struct rtable *) __sk_dst_get(sk); if (sock_owned_by_user(sk) || !rt) { diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c index 14a15c49129d..b95331e6c077 100644 --- a/net/ipv4/syncookies.c +++ b/net/ipv4/syncookies.c @@ -25,15 +25,7 @@ extern int sysctl_tcp_syncookies; -__u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS]; -EXPORT_SYMBOL(syncookie_secret); - -static __init int init_syncookies(void) -{ - get_random_bytes(syncookie_secret, sizeof(syncookie_secret)); - return 0; -} -__initcall(init_syncookies); +static u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS]; #define COOKIEBITS 24 /* Upper bits store count */ #define COOKIEMASK (((__u32)1 << COOKIEBITS) - 1) @@ -44,8 +36,11 @@ static DEFINE_PER_CPU(__u32 [16 + 5 + SHA_WORKSPACE_WORDS], static u32 cookie_hash(__be32 saddr, __be32 daddr, __be16 sport, __be16 dport, u32 count, int c) { - __u32 *tmp = __get_cpu_var(ipv4_cookie_scratch); + __u32 *tmp; + + net_get_random_once(syncookie_secret, sizeof(syncookie_secret)); + tmp = __get_cpu_var(ipv4_cookie_scratch); memcpy(tmp + 4, syncookie_secret[c], sizeof(syncookie_secret[c])); tmp[0] = (__force u32)saddr; tmp[1] = (__force u32)daddr; @@ -89,8 +84,7 @@ __u32 cookie_init_timestamp(struct request_sock *req) static __u32 secure_tcp_syn_cookie(__be32 saddr, __be32 daddr, __be16 sport, - __be16 dport, __u32 sseq, __u32 count, - __u32 data) + __be16 dport, __u32 sseq, __u32 data) { /* * Compute the secure sequence number. @@ -102,7 +96,7 @@ static __u32 secure_tcp_syn_cookie(__be32 saddr, __be32 daddr, __be16 sport, * As an extra hack, we add a small "data" value that encodes the * MSS into the second hash value. */ - + u32 count = tcp_cookie_time(); return (cookie_hash(saddr, daddr, sport, dport, 0, 0) + sseq + (count << COOKIEBITS) + ((cookie_hash(saddr, daddr, sport, dport, count, 1) + data) @@ -114,22 +108,21 @@ static __u32 secure_tcp_syn_cookie(__be32 saddr, __be32 daddr, __be16 sport, * If the syncookie is bad, the data returned will be out of * range. This must be checked by the caller. * - * The count value used to generate the cookie must be within - * "maxdiff" if the current (passed-in) "count". The return value - * is (__u32)-1 if this test fails. + * The count value used to generate the cookie must be less than + * MAX_SYNCOOKIE_AGE minutes in the past. + * The return value (__u32)-1 if this test fails. */ static __u32 check_tcp_syn_cookie(__u32 cookie, __be32 saddr, __be32 daddr, - __be16 sport, __be16 dport, __u32 sseq, - __u32 count, __u32 maxdiff) + __be16 sport, __be16 dport, __u32 sseq) { - __u32 diff; + u32 diff, count = tcp_cookie_time(); /* Strip away the layers from the cookie */ cookie -= cookie_hash(saddr, daddr, sport, dport, 0, 0) + sseq; /* Cookie is now reduced to (count * 2^24) ^ (hash % 2^24) */ diff = (count - (cookie >> COOKIEBITS)) & ((__u32) - 1 >> COOKIEBITS); - if (diff >= maxdiff) + if (diff >= MAX_SYNCOOKIE_AGE) return (__u32)-1; return (cookie - @@ -138,22 +131,22 @@ static __u32 check_tcp_syn_cookie(__u32 cookie, __be32 saddr, __be32 daddr, } /* - * MSS Values are taken from the 2009 paper - * 'Measuring TCP Maximum Segment Size' by S. Alcock and R. Nelson: - * - values 1440 to 1460 accounted for 80% of observed mss values - * - values outside the 536-1460 range are rare (<0.2%). + * MSS Values are chosen based on the 2011 paper + * 'An Analysis of TCP Maximum Segement Sizes' by S. Alcock and R. Nelson. + * Values .. + * .. lower than 536 are rare (< 0.2%) + * .. between 537 and 1299 account for less than < 1.5% of observed values + * .. in the 1300-1349 range account for about 15 to 20% of observed mss values + * .. exceeding 1460 are very rare (< 0.04%) * - * Table must be sorted. + * 1460 is the single most frequently announced mss value (30 to 46% depending + * on monitor location). Table must be sorted. */ static __u16 const msstab[] = { - 64, - 512, 536, - 1024, - 1440, + 1300, + 1440, /* 1440, 1452: PPPoE */ 1460, - 4312, - 8960, }; /* @@ -173,7 +166,7 @@ u32 __cookie_v4_init_sequence(const struct iphdr *iph, const struct tcphdr *th, return secure_tcp_syn_cookie(iph->saddr, iph->daddr, th->source, th->dest, ntohl(th->seq), - jiffies / (HZ * 60), mssind); + mssind); } EXPORT_SYMBOL_GPL(__cookie_v4_init_sequence); @@ -189,13 +182,6 @@ __u32 cookie_v4_init_sequence(struct sock *sk, struct sk_buff *skb, __u16 *mssp) } /* - * This (misnamed) value is the age of syncookie which is permitted. - * Its ideal value should be dependent on TCP_TIMEOUT_INIT and - * sysctl_tcp_retries1. It's a rather complicated formula (exponential - * backoff) to compute at runtime so it's currently hardcoded here. - */ -#define COUNTER_TRIES 4 -/* * Check if a ack sequence number is a valid syncookie. * Return the decoded mss if it is, or 0 if not. */ @@ -204,9 +190,7 @@ int __cookie_v4_check(const struct iphdr *iph, const struct tcphdr *th, { __u32 seq = ntohl(th->seq) - 1; __u32 mssind = check_tcp_syn_cookie(cookie, iph->saddr, iph->daddr, - th->source, th->dest, seq, - jiffies / (HZ * 60), - COUNTER_TRIES); + th->source, th->dest, seq); return mssind < ARRAY_SIZE(msstab) ? msstab[mssind] : 0; } @@ -315,10 +299,10 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb, treq->rcv_isn = ntohl(th->seq) - 1; treq->snt_isn = cookie; req->mss = mss; - ireq->loc_port = th->dest; - ireq->rmt_port = th->source; - ireq->loc_addr = ip_hdr(skb)->daddr; - ireq->rmt_addr = ip_hdr(skb)->saddr; + ireq->ir_num = ntohs(th->dest); + ireq->ir_rmt_port = th->source; + ireq->ir_loc_addr = ip_hdr(skb)->daddr; + ireq->ir_rmt_addr = ip_hdr(skb)->saddr; ireq->ecn_ok = ecn_ok; ireq->snd_wscale = tcp_opt.snd_wscale; ireq->sack_ok = tcp_opt.sack_ok; @@ -358,8 +342,8 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb, flowi4_init_output(&fl4, sk->sk_bound_dev_if, sk->sk_mark, RT_CONN_FLAGS(sk), RT_SCOPE_UNIVERSE, IPPROTO_TCP, inet_sk_flowi_flags(sk), - (opt && opt->srr) ? opt->faddr : ireq->rmt_addr, - ireq->loc_addr, th->source, th->dest); + (opt && opt->srr) ? opt->faddr : ireq->ir_rmt_addr, + ireq->ir_loc_addr, th->source, th->dest); security_req_classify_flow(req, flowi4_to_flowi(&fl4)); rt = ip_route_output_key(sock_net(sk), &fl4); if (IS_ERR(rt)) { diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index 540279f4c531..3d69ec8dac57 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -43,12 +43,12 @@ static int ip_ping_group_range_min[] = { 0, 0 }; static int ip_ping_group_range_max[] = { GID_T_MAX, GID_T_MAX }; /* Update system visible IP port range */ -static void set_local_port_range(int range[2]) +static void set_local_port_range(struct net *net, int range[2]) { - write_seqlock(&sysctl_local_ports.lock); - sysctl_local_ports.range[0] = range[0]; - sysctl_local_ports.range[1] = range[1]; - write_sequnlock(&sysctl_local_ports.lock); + write_seqlock(&net->ipv4.sysctl_local_ports.lock); + net->ipv4.sysctl_local_ports.range[0] = range[0]; + net->ipv4.sysctl_local_ports.range[1] = range[1]; + write_sequnlock(&net->ipv4.sysctl_local_ports.lock); } /* Validate changes from /proc interface. */ @@ -56,6 +56,8 @@ static int ipv4_local_port_range(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos) { + struct net *net = + container_of(table->data, struct net, ipv4.sysctl_local_ports.range); int ret; int range[2]; struct ctl_table tmp = { @@ -66,14 +68,15 @@ static int ipv4_local_port_range(struct ctl_table *table, int write, .extra2 = &ip_local_port_range_max, }; - inet_get_local_port_range(range, range + 1); + inet_get_local_port_range(net, &range[0], &range[1]); + ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos); if (write && ret == 0) { if (range[1] < range[0]) ret = -EINVAL; else - set_local_port_range(range); + set_local_port_range(net, range); } return ret; @@ -83,23 +86,27 @@ static int ipv4_local_port_range(struct ctl_table *table, int write, static void inet_get_ping_group_range_table(struct ctl_table *table, kgid_t *low, kgid_t *high) { kgid_t *data = table->data; + struct net *net = + container_of(table->data, struct net, ipv4.sysctl_ping_group_range); unsigned int seq; do { - seq = read_seqbegin(&sysctl_local_ports.lock); + seq = read_seqbegin(&net->ipv4.sysctl_local_ports.lock); *low = data[0]; *high = data[1]; - } while (read_seqretry(&sysctl_local_ports.lock, seq)); + } while (read_seqretry(&net->ipv4.sysctl_local_ports.lock, seq)); } /* Update system visible IP port range */ static void set_ping_group_range(struct ctl_table *table, kgid_t low, kgid_t high) { kgid_t *data = table->data; - write_seqlock(&sysctl_local_ports.lock); + struct net *net = + container_of(table->data, struct net, ipv4.sysctl_ping_group_range); + write_seqlock(&net->ipv4.sysctl_local_ports.lock); data[0] = low; data[1] = high; - write_sequnlock(&sysctl_local_ports.lock); + write_sequnlock(&net->ipv4.sysctl_local_ports.lock); } /* Validate changes from /proc interface. */ @@ -193,49 +200,6 @@ static int proc_allowed_congestion_control(struct ctl_table *ctl, return ret; } -static int ipv4_tcp_mem(struct ctl_table *ctl, int write, - void __user *buffer, size_t *lenp, - loff_t *ppos) -{ - int ret; - unsigned long vec[3]; - struct net *net = current->nsproxy->net_ns; -#ifdef CONFIG_MEMCG_KMEM - struct mem_cgroup *memcg; -#endif - - struct ctl_table tmp = { - .data = &vec, - .maxlen = sizeof(vec), - .mode = ctl->mode, - }; - - if (!write) { - ctl->data = &net->ipv4.sysctl_tcp_mem; - return proc_doulongvec_minmax(ctl, write, buffer, lenp, ppos); - } - - ret = proc_doulongvec_minmax(&tmp, write, buffer, lenp, ppos); - if (ret) - return ret; - -#ifdef CONFIG_MEMCG_KMEM - rcu_read_lock(); - memcg = mem_cgroup_from_task(current); - - tcp_prot_mem(memcg, vec[0], 0); - tcp_prot_mem(memcg, vec[1], 1); - tcp_prot_mem(memcg, vec[2], 2); - rcu_read_unlock(); -#endif - - net->ipv4.sysctl_tcp_mem[0] = vec[0]; - net->ipv4.sysctl_tcp_mem[1] = vec[1]; - net->ipv4.sysctl_tcp_mem[2] = vec[2]; - - return 0; -} - static int proc_tcp_fastopen_key(struct ctl_table *ctl, int write, void __user *buffer, size_t *lenp, loff_t *ppos) @@ -267,6 +231,11 @@ static int proc_tcp_fastopen_key(struct ctl_table *ctl, int write, ret = -EINVAL; goto bad_key; } + /* Generate a dummy secret but don't publish it. This + * is needed so we don't regenerate a new key on the + * first invocation of tcp_fastopen_cookie_gen + */ + tcp_fastopen_init_key_once(false); tcp_fastopen_reset_cipher(user_key, TCP_FASTOPEN_KEY_LENGTH); } @@ -475,13 +444,6 @@ static struct ctl_table ipv4_table[] = { .proc_handler = proc_dointvec }, { - .procname = "ip_local_port_range", - .data = &sysctl_local_ports.range, - .maxlen = sizeof(sysctl_local_ports.range), - .mode = 0644, - .proc_handler = ipv4_local_port_range, - }, - { .procname = "ip_local_reserved_ports", .data = NULL, /* initialized in sysctl_ipv4_init */ .maxlen = 65536, @@ -552,6 +514,13 @@ static struct ctl_table ipv4_table[] = { .proc_handler = proc_dointvec }, { + .procname = "tcp_mem", + .maxlen = sizeof(sysctl_tcp_mem), + .data = &sysctl_tcp_mem, + .mode = 0644, + .proc_handler = proc_doulongvec_minmax, + }, + { .procname = "tcp_wmem", .data = &sysctl_tcp_wmem, .maxlen = sizeof(sysctl_tcp_wmem), @@ -732,13 +701,6 @@ static struct ctl_table ipv4_table[] = { .proc_handler = proc_allowed_congestion_control, }, { - .procname = "tcp_max_ssthresh", - .data = &sysctl_tcp_max_ssthresh, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec, - }, - { .procname = "tcp_thin_linear_timeouts", .data = &sysctl_tcp_thin_linear_timeouts, .maxlen = sizeof(int), @@ -854,10 +816,11 @@ static struct ctl_table ipv4_net_table[] = { .proc_handler = proc_dointvec }, { - .procname = "tcp_mem", - .maxlen = sizeof(init_net.ipv4.sysctl_tcp_mem), + .procname = "ip_local_port_range", + .maxlen = sizeof(init_net.ipv4.sysctl_local_ports.range), + .data = &init_net.ipv4.sysctl_local_ports.range, .mode = 0644, - .proc_handler = ipv4_tcp_mem, + .proc_handler = ipv4_local_port_range, }, { } }; @@ -868,30 +831,15 @@ static __net_init int ipv4_sysctl_init_net(struct net *net) table = ipv4_net_table; if (!net_eq(net, &init_net)) { + int i; + table = kmemdup(table, sizeof(ipv4_net_table), GFP_KERNEL); if (table == NULL) goto err_alloc; - table[0].data = - &net->ipv4.sysctl_icmp_echo_ignore_all; - table[1].data = - &net->ipv4.sysctl_icmp_echo_ignore_broadcasts; - table[2].data = - &net->ipv4.sysctl_icmp_ignore_bogus_error_responses; - table[3].data = - &net->ipv4.sysctl_icmp_errors_use_inbound_ifaddr; - table[4].data = - &net->ipv4.sysctl_icmp_ratelimit; - table[5].data = - &net->ipv4.sysctl_icmp_ratemask; - table[6].data = - &net->ipv4.sysctl_ping_group_range; - table[7].data = - &net->ipv4.sysctl_tcp_ecn; - - /* Don't export sysctls to unprivileged users */ - if (net->user_ns != &init_user_ns) - table[0].procname = NULL; + /* Update the variables to point into the current struct net */ + for (i = 0; i < ARRAY_SIZE(ipv4_net_table) - 1; i++) + table[i].data += (void *)net - (void *)&init_net; } /* @@ -901,7 +849,12 @@ static __net_init int ipv4_sysctl_init_net(struct net *net) net->ipv4.sysctl_ping_group_range[0] = make_kgid(&init_user_ns, 1); net->ipv4.sysctl_ping_group_range[1] = make_kgid(&init_user_ns, 0); - tcp_init_mem(net); + /* + * Set defaults for local port range + */ + seqlock_init(&net->ipv4.sysctl_local_ports.lock); + net->ipv4.sysctl_local_ports.range[0] = 32768; + net->ipv4.sysctl_local_ports.range[1] = 61000; net->ipv4.ipv4_hdr = register_net_sysctl(net, "net/ipv4", table); if (net->ipv4.ipv4_hdr == NULL) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 6e5617b9f9db..8e8529d3c8c9 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -288,9 +288,11 @@ int sysctl_tcp_min_tso_segs __read_mostly = 2; struct percpu_counter tcp_orphan_count; EXPORT_SYMBOL_GPL(tcp_orphan_count); +long sysctl_tcp_mem[3] __read_mostly; int sysctl_tcp_wmem[3] __read_mostly; int sysctl_tcp_rmem[3] __read_mostly; +EXPORT_SYMBOL(sysctl_tcp_mem); EXPORT_SYMBOL(sysctl_tcp_rmem); EXPORT_SYMBOL(sysctl_tcp_wmem); @@ -3097,13 +3099,13 @@ static int __init set_thash_entries(char *str) } __setup("thash_entries=", set_thash_entries); -void tcp_init_mem(struct net *net) +static void tcp_init_mem(void) { unsigned long limit = nr_free_buffer_pages() / 8; limit = max(limit, 128UL); - net->ipv4.sysctl_tcp_mem[0] = limit / 4 * 3; - net->ipv4.sysctl_tcp_mem[1] = limit; - net->ipv4.sysctl_tcp_mem[2] = net->ipv4.sysctl_tcp_mem[0] * 2; + sysctl_tcp_mem[0] = limit / 4 * 3; + sysctl_tcp_mem[1] = limit; + sysctl_tcp_mem[2] = sysctl_tcp_mem[0] * 2; } void __init tcp_init(void) @@ -3137,10 +3139,9 @@ void __init tcp_init(void) &tcp_hashinfo.ehash_mask, 0, thash_entries ? 0 : 512 * 1024); - for (i = 0; i <= tcp_hashinfo.ehash_mask; i++) { + for (i = 0; i <= tcp_hashinfo.ehash_mask; i++) INIT_HLIST_NULLS_HEAD(&tcp_hashinfo.ehash[i].chain, i); - INIT_HLIST_NULLS_HEAD(&tcp_hashinfo.ehash[i].twchain, i); - } + if (inet_ehash_locks_alloc(&tcp_hashinfo)) panic("TCP: failed to alloc ehash_locks"); tcp_hashinfo.bhash = @@ -3166,7 +3167,7 @@ void __init tcp_init(void) sysctl_tcp_max_orphans = cnt / 2; sysctl_max_syn_backlog = max(128, cnt / 256); - tcp_init_mem(&init_net); + tcp_init_mem(); /* Set per-socket limits to no more than 1/128 the pressure threshold */ limit = nr_free_buffer_pages() << (PAGE_SHIFT - 7); max_wshare = min(4UL*1024*1024, limit); diff --git a/net/ipv4/tcp_bic.c b/net/ipv4/tcp_bic.c index f45e1c242440..821846fb0a7e 100644 --- a/net/ipv4/tcp_bic.c +++ b/net/ipv4/tcp_bic.c @@ -140,7 +140,8 @@ static inline void bictcp_update(struct bictcp *ca, u32 cwnd) ca->cnt = 1; } -static void bictcp_cong_avoid(struct sock *sk, u32 ack, u32 in_flight) +static void bictcp_cong_avoid(struct sock *sk, u32 ack, u32 acked, + u32 in_flight) { struct tcp_sock *tp = tcp_sk(sk); struct bictcp *ca = inet_csk_ca(sk); @@ -149,7 +150,7 @@ static void bictcp_cong_avoid(struct sock *sk, u32 ack, u32 in_flight) return; if (tp->snd_cwnd <= tp->snd_ssthresh) - tcp_slow_start(tp); + tcp_slow_start(tp, acked); else { bictcp_update(ca, tp->snd_cwnd); tcp_cong_avoid_ai(tp, ca->cnt); diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c index 019c2389a341..ad37bf18ae4b 100644 --- a/net/ipv4/tcp_cong.c +++ b/net/ipv4/tcp_cong.c @@ -15,8 +15,6 @@ #include <linux/gfp.h> #include <net/tcp.h> -int sysctl_tcp_max_ssthresh = 0; - static DEFINE_SPINLOCK(tcp_cong_list_lock); static LIST_HEAD(tcp_cong_list); @@ -299,35 +297,24 @@ bool tcp_is_cwnd_limited(const struct sock *sk, u32 in_flight) } EXPORT_SYMBOL_GPL(tcp_is_cwnd_limited); -/* - * Slow start is used when congestion window is less than slow start - * threshold. This version implements the basic RFC2581 version - * and optionally supports: - * RFC3742 Limited Slow Start - growth limited to max_ssthresh - * RFC3465 Appropriate Byte Counting - growth limited by bytes acknowledged +/* Slow start is used when congestion window is no greater than the slow start + * threshold. We base on RFC2581 and also handle stretch ACKs properly. + * We do not implement RFC3465 Appropriate Byte Counting (ABC) per se but + * something better;) a packet is only considered (s)acked in its entirety to + * defend the ACK attacks described in the RFC. Slow start processes a stretch + * ACK of degree N as if N acks of degree 1 are received back to back except + * ABC caps N to 2. Slow start exits when cwnd grows over ssthresh and + * returns the leftover acks to adjust cwnd in congestion avoidance mode. */ -void tcp_slow_start(struct tcp_sock *tp) +int tcp_slow_start(struct tcp_sock *tp, u32 acked) { - int cnt; /* increase in packets */ - unsigned int delta = 0; - u32 snd_cwnd = tp->snd_cwnd; - - if (unlikely(!snd_cwnd)) { - pr_err_once("snd_cwnd is nul, please report this bug.\n"); - snd_cwnd = 1U; - } + u32 cwnd = tp->snd_cwnd + acked; - if (sysctl_tcp_max_ssthresh > 0 && tp->snd_cwnd > sysctl_tcp_max_ssthresh) - cnt = sysctl_tcp_max_ssthresh >> 1; /* limited slow start */ - else - cnt = snd_cwnd; /* exponential increase */ - - tp->snd_cwnd_cnt += cnt; - while (tp->snd_cwnd_cnt >= snd_cwnd) { - tp->snd_cwnd_cnt -= snd_cwnd; - delta++; - } - tp->snd_cwnd = min(snd_cwnd + delta, tp->snd_cwnd_clamp); + if (cwnd > tp->snd_ssthresh) + cwnd = tp->snd_ssthresh + 1; + acked -= cwnd - tp->snd_cwnd; + tp->snd_cwnd = min(cwnd, tp->snd_cwnd_clamp); + return acked; } EXPORT_SYMBOL_GPL(tcp_slow_start); @@ -351,7 +338,7 @@ EXPORT_SYMBOL_GPL(tcp_cong_avoid_ai); /* This is Jacobson's slow start and congestion avoidance. * SIGCOMM '88, p. 328. */ -void tcp_reno_cong_avoid(struct sock *sk, u32 ack, u32 in_flight) +void tcp_reno_cong_avoid(struct sock *sk, u32 ack, u32 acked, u32 in_flight) { struct tcp_sock *tp = tcp_sk(sk); @@ -360,7 +347,7 @@ void tcp_reno_cong_avoid(struct sock *sk, u32 ack, u32 in_flight) /* In "safe" area, increase. */ if (tp->snd_cwnd <= tp->snd_ssthresh) - tcp_slow_start(tp); + tcp_slow_start(tp, acked); /* In dangerous area, increase slowly. */ else tcp_cong_avoid_ai(tp, tp->snd_cwnd); diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c index b6ae92a51f58..828e4c3ffbaf 100644 --- a/net/ipv4/tcp_cubic.c +++ b/net/ipv4/tcp_cubic.c @@ -304,7 +304,8 @@ static inline void bictcp_update(struct bictcp *ca, u32 cwnd) ca->cnt = 1; } -static void bictcp_cong_avoid(struct sock *sk, u32 ack, u32 in_flight) +static void bictcp_cong_avoid(struct sock *sk, u32 ack, u32 acked, + u32 in_flight) { struct tcp_sock *tp = tcp_sk(sk); struct bictcp *ca = inet_csk_ca(sk); @@ -315,7 +316,7 @@ static void bictcp_cong_avoid(struct sock *sk, u32 ack, u32 in_flight) if (tp->snd_cwnd <= tp->snd_ssthresh) { if (hystart && after(ack, ca->end_seq)) bictcp_hystart_reset(sk); - tcp_slow_start(tp); + tcp_slow_start(tp, acked); } else { bictcp_update(ca, tp->snd_cwnd); tcp_cong_avoid_ai(tp, ca->cnt); diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c index ab7bd35bb312..f195d9316e55 100644 --- a/net/ipv4/tcp_fastopen.c +++ b/net/ipv4/tcp_fastopen.c @@ -8,12 +8,26 @@ #include <net/inetpeer.h> #include <net/tcp.h> -int sysctl_tcp_fastopen __read_mostly; +int sysctl_tcp_fastopen __read_mostly = TFO_CLIENT_ENABLE; struct tcp_fastopen_context __rcu *tcp_fastopen_ctx; static DEFINE_SPINLOCK(tcp_fastopen_ctx_lock); +void tcp_fastopen_init_key_once(bool publish) +{ + static u8 key[TCP_FASTOPEN_KEY_LENGTH]; + + /* tcp_fastopen_reset_cipher publishes the new context + * atomically, so we allow this race happening here. + * + * All call sites of tcp_fastopen_cookie_gen also check + * for a valid cookie, so this is an acceptable risk. + */ + if (net_get_random_once(key, sizeof(key)) && publish) + tcp_fastopen_reset_cipher(key, sizeof(key)); +} + static void tcp_fastopen_ctx_free(struct rcu_head *head) { struct tcp_fastopen_context *ctx = @@ -70,6 +84,8 @@ void tcp_fastopen_cookie_gen(__be32 src, __be32 dst, __be32 path[4] = { src, dst, 0, 0 }; struct tcp_fastopen_context *ctx; + tcp_fastopen_init_key_once(true); + rcu_read_lock(); ctx = rcu_dereference(tcp_fastopen_ctx); if (ctx) { @@ -78,14 +94,3 @@ void tcp_fastopen_cookie_gen(__be32 src, __be32 dst, } rcu_read_unlock(); } - -static int __init tcp_fastopen_init(void) -{ - __u8 key[TCP_FASTOPEN_KEY_LENGTH]; - - get_random_bytes(key, sizeof(key)); - tcp_fastopen_reset_cipher(key, sizeof(key)); - return 0; -} - -late_initcall(tcp_fastopen_init); diff --git a/net/ipv4/tcp_highspeed.c b/net/ipv4/tcp_highspeed.c index 30f27f6b3655..8ed9305dfdf4 100644 --- a/net/ipv4/tcp_highspeed.c +++ b/net/ipv4/tcp_highspeed.c @@ -109,7 +109,7 @@ static void hstcp_init(struct sock *sk) tp->snd_cwnd_clamp = min_t(u32, tp->snd_cwnd_clamp, 0xffffffff/128); } -static void hstcp_cong_avoid(struct sock *sk, u32 adk, u32 in_flight) +static void hstcp_cong_avoid(struct sock *sk, u32 ack, u32 acked, u32 in_flight) { struct tcp_sock *tp = tcp_sk(sk); struct hstcp *ca = inet_csk_ca(sk); @@ -118,7 +118,7 @@ static void hstcp_cong_avoid(struct sock *sk, u32 adk, u32 in_flight) return; if (tp->snd_cwnd <= tp->snd_ssthresh) - tcp_slow_start(tp); + tcp_slow_start(tp, acked); else { /* Update AIMD parameters. * diff --git a/net/ipv4/tcp_htcp.c b/net/ipv4/tcp_htcp.c index c1a8175361e8..4a194acfd923 100644 --- a/net/ipv4/tcp_htcp.c +++ b/net/ipv4/tcp_htcp.c @@ -227,7 +227,7 @@ static u32 htcp_recalc_ssthresh(struct sock *sk) return max((tp->snd_cwnd * ca->beta) >> 7, 2U); } -static void htcp_cong_avoid(struct sock *sk, u32 ack, u32 in_flight) +static void htcp_cong_avoid(struct sock *sk, u32 ack, u32 acked, u32 in_flight) { struct tcp_sock *tp = tcp_sk(sk); struct htcp *ca = inet_csk_ca(sk); @@ -236,7 +236,7 @@ static void htcp_cong_avoid(struct sock *sk, u32 ack, u32 in_flight) return; if (tp->snd_cwnd <= tp->snd_ssthresh) - tcp_slow_start(tp); + tcp_slow_start(tp, acked); else { /* In dangerous area, increase slowly. * In theory this is tp->snd_cwnd += alpha / tp->snd_cwnd diff --git a/net/ipv4/tcp_hybla.c b/net/ipv4/tcp_hybla.c index 57bdd17dff4d..478fe82611bf 100644 --- a/net/ipv4/tcp_hybla.c +++ b/net/ipv4/tcp_hybla.c @@ -85,7 +85,8 @@ static inline u32 hybla_fraction(u32 odds) * o Give cwnd a new value based on the model proposed * o remember increments <1 */ -static void hybla_cong_avoid(struct sock *sk, u32 ack, u32 in_flight) +static void hybla_cong_avoid(struct sock *sk, u32 ack, u32 acked, + u32 in_flight) { struct tcp_sock *tp = tcp_sk(sk); struct hybla *ca = inet_csk_ca(sk); @@ -102,7 +103,7 @@ static void hybla_cong_avoid(struct sock *sk, u32 ack, u32 in_flight) return; if (!ca->hybla_en) { - tcp_reno_cong_avoid(sk, ack, in_flight); + tcp_reno_cong_avoid(sk, ack, acked, in_flight); return; } diff --git a/net/ipv4/tcp_illinois.c b/net/ipv4/tcp_illinois.c index 834857f3c871..8a520996f3d2 100644 --- a/net/ipv4/tcp_illinois.c +++ b/net/ipv4/tcp_illinois.c @@ -256,7 +256,8 @@ static void tcp_illinois_state(struct sock *sk, u8 new_state) /* * Increase window in response to successful acknowledgment. */ -static void tcp_illinois_cong_avoid(struct sock *sk, u32 ack, u32 in_flight) +static void tcp_illinois_cong_avoid(struct sock *sk, u32 ack, u32 acked, + u32 in_flight) { struct tcp_sock *tp = tcp_sk(sk); struct illinois *ca = inet_csk_ca(sk); @@ -270,7 +271,7 @@ static void tcp_illinois_cong_avoid(struct sock *sk, u32 ack, u32 in_flight) /* In slow start */ if (tp->snd_cwnd <= tp->snd_ssthresh) - tcp_slow_start(tp); + tcp_slow_start(tp, acked); else { u32 delta; diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 068c8fb0d158..c53b7f35c51d 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -267,11 +267,31 @@ static bool TCP_ECN_rcv_ecn_echo(const struct tcp_sock *tp, const struct tcphdr * 1. Tuning sk->sk_sndbuf, when connection enters established state. */ -static void tcp_fixup_sndbuf(struct sock *sk) +static void tcp_sndbuf_expand(struct sock *sk) { - int sndmem = SKB_TRUESIZE(tcp_sk(sk)->rx_opt.mss_clamp + MAX_TCP_HEADER); + const struct tcp_sock *tp = tcp_sk(sk); + int sndmem, per_mss; + u32 nr_segs; + + /* Worst case is non GSO/TSO : each frame consumes one skb + * and skb->head is kmalloced using power of two area of memory + */ + per_mss = max_t(u32, tp->rx_opt.mss_clamp, tp->mss_cache) + + MAX_TCP_HEADER + + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + + per_mss = roundup_pow_of_two(per_mss) + + SKB_DATA_ALIGN(sizeof(struct sk_buff)); + + nr_segs = max_t(u32, TCP_INIT_CWND, tp->snd_cwnd); + nr_segs = max_t(u32, nr_segs, tp->reordering + 1); + + /* Fast Recovery (RFC 5681 3.2) : + * Cubic needs 1.7 factor, rounded to 2 to include + * extra cushion (application might react slowly to POLLOUT) + */ + sndmem = 2 * nr_segs * per_mss; - sndmem *= TCP_INIT_CWND; if (sk->sk_sndbuf < sndmem) sk->sk_sndbuf = min(sndmem, sysctl_tcp_wmem[2]); } @@ -355,6 +375,12 @@ static void tcp_fixup_rcvbuf(struct sock *sk) rcvmem = 2 * SKB_TRUESIZE(mss + MAX_TCP_HEADER) * tcp_default_init_rwnd(mss); + /* Dynamic Right Sizing (DRS) has 2 to 3 RTT latency + * Allow enough cushion so that sender is not limited by our window + */ + if (sysctl_tcp_moderate_rcvbuf) + rcvmem <<= 2; + if (sk->sk_rcvbuf < rcvmem) sk->sk_rcvbuf = min(rcvmem, sysctl_tcp_rmem[2]); } @@ -370,9 +396,11 @@ void tcp_init_buffer_space(struct sock *sk) if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) tcp_fixup_rcvbuf(sk); if (!(sk->sk_userlocks & SOCK_SNDBUF_LOCK)) - tcp_fixup_sndbuf(sk); + tcp_sndbuf_expand(sk); tp->rcvq_space.space = tp->rcv_wnd; + tp->rcvq_space.time = tcp_time_stamp; + tp->rcvq_space.seq = tp->copied_seq; maxwin = tcp_full_space(sk); @@ -512,48 +540,62 @@ void tcp_rcv_space_adjust(struct sock *sk) { struct tcp_sock *tp = tcp_sk(sk); int time; - int space; - - if (tp->rcvq_space.time == 0) - goto new_measure; + int copied; time = tcp_time_stamp - tp->rcvq_space.time; if (time < (tp->rcv_rtt_est.rtt >> 3) || tp->rcv_rtt_est.rtt == 0) return; - space = 2 * (tp->copied_seq - tp->rcvq_space.seq); + /* Number of bytes copied to user in last RTT */ + copied = tp->copied_seq - tp->rcvq_space.seq; + if (copied <= tp->rcvq_space.space) + goto new_measure; - space = max(tp->rcvq_space.space, space); + /* A bit of theory : + * copied = bytes received in previous RTT, our base window + * To cope with packet losses, we need a 2x factor + * To cope with slow start, and sender growing its cwin by 100 % + * every RTT, we need a 4x factor, because the ACK we are sending + * now is for the next RTT, not the current one : + * <prev RTT . ><current RTT .. ><next RTT .... > + */ - if (tp->rcvq_space.space != space) { - int rcvmem; + if (sysctl_tcp_moderate_rcvbuf && + !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) { + int rcvwin, rcvmem, rcvbuf; - tp->rcvq_space.space = space; + /* minimal window to cope with packet losses, assuming + * steady state. Add some cushion because of small variations. + */ + rcvwin = (copied << 1) + 16 * tp->advmss; - if (sysctl_tcp_moderate_rcvbuf && - !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) { - int new_clamp = space; + /* If rate increased by 25%, + * assume slow start, rcvwin = 3 * copied + * If rate increased by 50%, + * assume sender can use 2x growth, rcvwin = 4 * copied + */ + if (copied >= + tp->rcvq_space.space + (tp->rcvq_space.space >> 2)) { + if (copied >= + tp->rcvq_space.space + (tp->rcvq_space.space >> 1)) + rcvwin <<= 1; + else + rcvwin += (rcvwin >> 1); + } - /* Receive space grows, normalize in order to - * take into account packet headers and sk_buff - * structure overhead. - */ - space /= tp->advmss; - if (!space) - space = 1; - rcvmem = SKB_TRUESIZE(tp->advmss + MAX_TCP_HEADER); - while (tcp_win_from_space(rcvmem) < tp->advmss) - rcvmem += 128; - space *= rcvmem; - space = min(space, sysctl_tcp_rmem[2]); - if (space > sk->sk_rcvbuf) { - sk->sk_rcvbuf = space; - - /* Make the window clamp follow along. */ - tp->window_clamp = new_clamp; - } + rcvmem = SKB_TRUESIZE(tp->advmss + MAX_TCP_HEADER); + while (tcp_win_from_space(rcvmem) < tp->advmss) + rcvmem += 128; + + rcvbuf = min(rcvwin / tp->advmss * rcvmem, sysctl_tcp_rmem[2]); + if (rcvbuf > sk->sk_rcvbuf) { + sk->sk_rcvbuf = rcvbuf; + + /* Make the window clamp follow along. */ + tp->window_clamp = rcvwin; } } + tp->rcvq_space.space = copied; new_measure: tp->rcvq_space.seq = tp->copied_seq; @@ -713,7 +755,12 @@ static void tcp_update_pacing_rate(struct sock *sk) if (tp->srtt > 8 + 2) do_div(rate, tp->srtt); - sk->sk_pacing_rate = min_t(u64, rate, ~0U); + /* ACCESS_ONCE() is needed because sch_fq fetches sk_pacing_rate + * without any lock. We want to make sure compiler wont store + * intermediate values in this location. + */ + ACCESS_ONCE(sk->sk_pacing_rate) = min_t(u64, rate, + sk->sk_max_pacing_rate); } /* Calculate rto without backoff. This is the second half of Van Jacobson's @@ -2887,10 +2934,10 @@ static void tcp_synack_rtt_meas(struct sock *sk, const u32 synack_stamp) tcp_ack_update_rtt(sk, FLAG_SYN_ACKED, seq_rtt, -1); } -static void tcp_cong_avoid(struct sock *sk, u32 ack, u32 in_flight) +static void tcp_cong_avoid(struct sock *sk, u32 ack, u32 acked, u32 in_flight) { const struct inet_connection_sock *icsk = inet_csk(sk); - icsk->icsk_ca_ops->cong_avoid(sk, ack, in_flight); + icsk->icsk_ca_ops->cong_avoid(sk, ack, acked, in_flight); tcp_sk(sk)->snd_cwnd_stamp = tcp_time_stamp; } @@ -2979,7 +3026,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets, const struct inet_connection_sock *icsk = inet_csk(sk); struct sk_buff *skb; u32 now = tcp_time_stamp; - int fully_acked = true; + bool fully_acked = true; int flag = 0; u32 pkts_acked = 0; u32 reord = tp->packets_out; @@ -3407,7 +3454,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag) /* Advance cwnd if state allows */ if (tcp_may_raise_cwnd(sk, flag)) - tcp_cong_avoid(sk, ack, prior_in_flight); + tcp_cong_avoid(sk, ack, acked, prior_in_flight); if (tcp_ack_is_dubious(sk, flag)) { is_dupack = !(flag & (FLAG_SND_UNA_ADVANCED | FLAG_NOT_DUP)); @@ -4717,15 +4764,7 @@ static void tcp_new_space(struct sock *sk) struct tcp_sock *tp = tcp_sk(sk); if (tcp_should_expand_sndbuf(sk)) { - int sndmem = SKB_TRUESIZE(max_t(u32, - tp->rx_opt.mss_clamp, - tp->mss_cache) + - MAX_TCP_HEADER); - int demanded = max_t(unsigned int, tp->snd_cwnd, - tp->reordering + 1); - sndmem *= 2 * demanded; - if (sndmem > sk->sk_sndbuf) - sk->sk_sndbuf = min(sndmem, sysctl_tcp_wmem[2]); + tcp_sndbuf_expand(sk); tp->snd_cwnd_stamp = tcp_time_stamp; } @@ -5693,8 +5732,8 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb, tcp_init_congestion_control(sk); tcp_mtup_init(sk); - tcp_init_buffer_space(sk); tp->copied_seq = tp->rcv_nxt; + tcp_init_buffer_space(sk); } smp_mb(); tcp_set_state(sk, TCP_ESTABLISHED); diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index b14266bb91eb..14bba8a1c5a7 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -288,6 +288,7 @@ static void tcp_v4_mtu_reduced(struct sock *sk) mtu = dst_mtu(dst); if (inet->pmtudisc != IP_PMTUDISC_DONT && + ip_sk_accept_pmtu(sk) && inet_csk(sk)->icsk_pmtu_cookie > mtu) { tcp_sync_mss(sk, mtu); @@ -835,11 +836,11 @@ static int tcp_v4_send_synack(struct sock *sk, struct dst_entry *dst, skb = tcp_make_synack(sk, dst, req, NULL); if (skb) { - __tcp_v4_send_check(skb, ireq->loc_addr, ireq->rmt_addr); + __tcp_v4_send_check(skb, ireq->ir_loc_addr, ireq->ir_rmt_addr); skb_set_queue_mapping(skb, queue_mapping); - err = ip_build_and_send_pkt(skb, sk, ireq->loc_addr, - ireq->rmt_addr, + err = ip_build_and_send_pkt(skb, sk, ireq->ir_loc_addr, + ireq->ir_rmt_addr, ireq->opt); err = net_xmit_eval(err); if (!tcp_rsk(req)->snt_synack && !err) @@ -972,7 +973,7 @@ static struct tcp_md5sig_key *tcp_v4_reqsk_md5_lookup(struct sock *sk, { union tcp_md5_addr *addr; - addr = (union tcp_md5_addr *)&inet_rsk(req)->rmt_addr; + addr = (union tcp_md5_addr *)&inet_rsk(req)->ir_rmt_addr; return tcp_md5_do_lookup(sk, addr, AF_INET); } @@ -1149,8 +1150,8 @@ int tcp_v4_md5_hash_skb(char *md5_hash, struct tcp_md5sig_key *key, saddr = inet_sk(sk)->inet_saddr; daddr = inet_sk(sk)->inet_daddr; } else if (req) { - saddr = inet_rsk(req)->loc_addr; - daddr = inet_rsk(req)->rmt_addr; + saddr = inet_rsk(req)->ir_loc_addr; + daddr = inet_rsk(req)->ir_rmt_addr; } else { const struct iphdr *iph = ip_hdr(skb); saddr = iph->saddr; @@ -1366,8 +1367,8 @@ static int tcp_v4_conn_req_fastopen(struct sock *sk, kfree_skb(skb_synack); return -1; } - err = ip_build_and_send_pkt(skb_synack, sk, ireq->loc_addr, - ireq->rmt_addr, ireq->opt); + err = ip_build_and_send_pkt(skb_synack, sk, ireq->ir_loc_addr, + ireq->ir_rmt_addr, ireq->opt); err = net_xmit_eval(err); if (!err) tcp_rsk(req)->snt_synack = tcp_time_stamp; @@ -1410,8 +1411,8 @@ static int tcp_v4_conn_req_fastopen(struct sock *sk, inet_csk(child)->icsk_af_ops->rebuild_header(child); tcp_init_congestion_control(child); tcp_mtup_init(child); - tcp_init_buffer_space(child); tcp_init_metrics(child); + tcp_init_buffer_space(child); /* Queue the data carried in the SYN packet. We need to first * bump skb's refcnt because the caller will attempt to free it. @@ -1502,8 +1503,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb) tcp_openreq_init(req, &tmp_opt, skb); ireq = inet_rsk(req); - ireq->loc_addr = daddr; - ireq->rmt_addr = saddr; + ireq->ir_loc_addr = daddr; + ireq->ir_rmt_addr = saddr; ireq->no_srccheck = inet_sk(sk)->transparent; ireq->opt = tcp_v4_save_options(skb); @@ -1578,15 +1579,15 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb) fastopen_cookie_present(&valid_foc) ? &valid_foc : NULL); if (skb_synack) { - __tcp_v4_send_check(skb_synack, ireq->loc_addr, ireq->rmt_addr); + __tcp_v4_send_check(skb_synack, ireq->ir_loc_addr, ireq->ir_rmt_addr); skb_set_queue_mapping(skb_synack, skb_get_queue_mapping(skb)); } else goto drop_and_free; if (likely(!do_fastopen)) { int err; - err = ip_build_and_send_pkt(skb_synack, sk, ireq->loc_addr, - ireq->rmt_addr, ireq->opt); + err = ip_build_and_send_pkt(skb_synack, sk, ireq->ir_loc_addr, + ireq->ir_rmt_addr, ireq->opt); err = net_xmit_eval(err); if (err || want_cookie) goto drop_and_free; @@ -1644,9 +1645,9 @@ struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb, newtp = tcp_sk(newsk); newinet = inet_sk(newsk); ireq = inet_rsk(req); - newinet->inet_daddr = ireq->rmt_addr; - newinet->inet_rcv_saddr = ireq->loc_addr; - newinet->inet_saddr = ireq->loc_addr; + newinet->inet_daddr = ireq->ir_rmt_addr; + newinet->inet_rcv_saddr = ireq->ir_loc_addr; + newinet->inet_saddr = ireq->ir_loc_addr; inet_opt = ireq->opt; rcu_assign_pointer(newinet->inet_opt, inet_opt); ireq->opt = NULL; @@ -2194,18 +2195,6 @@ EXPORT_SYMBOL(tcp_v4_destroy_sock); #ifdef CONFIG_PROC_FS /* Proc filesystem TCP sock list dumping. */ -static inline struct inet_timewait_sock *tw_head(struct hlist_nulls_head *head) -{ - return hlist_nulls_empty(head) ? NULL : - list_entry(head->first, struct inet_timewait_sock, tw_node); -} - -static inline struct inet_timewait_sock *tw_next(struct inet_timewait_sock *tw) -{ - return !is_a_nulls(tw->tw_node.next) ? - hlist_nulls_entry(tw->tw_node.next, typeof(*tw), tw_node) : NULL; -} - /* * Get next listener socket follow cur. If cur is NULL, get first socket * starting from bucket given in st->bucket; when st->bucket is zero the @@ -2309,10 +2298,9 @@ static void *listening_get_idx(struct seq_file *seq, loff_t *pos) return rc; } -static inline bool empty_bucket(struct tcp_iter_state *st) +static inline bool empty_bucket(const struct tcp_iter_state *st) { - return hlist_nulls_empty(&tcp_hashinfo.ehash[st->bucket].chain) && - hlist_nulls_empty(&tcp_hashinfo.ehash[st->bucket].twchain); + return hlist_nulls_empty(&tcp_hashinfo.ehash[st->bucket].chain); } /* @@ -2329,7 +2317,6 @@ static void *established_get_first(struct seq_file *seq) for (; st->bucket <= tcp_hashinfo.ehash_mask; ++st->bucket) { struct sock *sk; struct hlist_nulls_node *node; - struct inet_timewait_sock *tw; spinlock_t *lock = inet_ehash_lockp(&tcp_hashinfo, st->bucket); /* Lockless fast path for the common case of empty buckets */ @@ -2345,18 +2332,7 @@ static void *established_get_first(struct seq_file *seq) rc = sk; goto out; } - st->state = TCP_SEQ_STATE_TIME_WAIT; - inet_twsk_for_each(tw, node, - &tcp_hashinfo.ehash[st->bucket].twchain) { - if (tw->tw_family != st->family || - !net_eq(twsk_net(tw), net)) { - continue; - } - rc = tw; - goto out; - } spin_unlock_bh(lock); - st->state = TCP_SEQ_STATE_ESTABLISHED; } out: return rc; @@ -2365,7 +2341,6 @@ out: static void *established_get_next(struct seq_file *seq, void *cur) { struct sock *sk = cur; - struct inet_timewait_sock *tw; struct hlist_nulls_node *node; struct tcp_iter_state *st = seq->private; struct net *net = seq_file_net(seq); @@ -2373,45 +2348,16 @@ static void *established_get_next(struct seq_file *seq, void *cur) ++st->num; ++st->offset; - if (st->state == TCP_SEQ_STATE_TIME_WAIT) { - tw = cur; - tw = tw_next(tw); -get_tw: - while (tw && (tw->tw_family != st->family || !net_eq(twsk_net(tw), net))) { - tw = tw_next(tw); - } - if (tw) { - cur = tw; - goto out; - } - spin_unlock_bh(inet_ehash_lockp(&tcp_hashinfo, st->bucket)); - st->state = TCP_SEQ_STATE_ESTABLISHED; - - /* Look for next non empty bucket */ - st->offset = 0; - while (++st->bucket <= tcp_hashinfo.ehash_mask && - empty_bucket(st)) - ; - if (st->bucket > tcp_hashinfo.ehash_mask) - return NULL; - - spin_lock_bh(inet_ehash_lockp(&tcp_hashinfo, st->bucket)); - sk = sk_nulls_head(&tcp_hashinfo.ehash[st->bucket].chain); - } else - sk = sk_nulls_next(sk); + sk = sk_nulls_next(sk); sk_nulls_for_each_from(sk, node) { if (sk->sk_family == st->family && net_eq(sock_net(sk), net)) - goto found; + return sk; } - st->state = TCP_SEQ_STATE_TIME_WAIT; - tw = tw_head(&tcp_hashinfo.ehash[st->bucket].twchain); - goto get_tw; -found: - cur = sk; -out: - return cur; + spin_unlock_bh(inet_ehash_lockp(&tcp_hashinfo, st->bucket)); + ++st->bucket; + return established_get_first(seq); } static void *established_get_idx(struct seq_file *seq, loff_t pos) @@ -2464,10 +2410,9 @@ static void *tcp_seek_last_pos(struct seq_file *seq) if (rc) break; st->bucket = 0; + st->state = TCP_SEQ_STATE_ESTABLISHED; /* Fallthrough */ case TCP_SEQ_STATE_ESTABLISHED: - case TCP_SEQ_STATE_TIME_WAIT: - st->state = TCP_SEQ_STATE_ESTABLISHED; if (st->bucket > tcp_hashinfo.ehash_mask) break; rc = established_get_first(seq); @@ -2524,7 +2469,6 @@ static void *tcp_seq_next(struct seq_file *seq, void *v, loff_t *pos) } break; case TCP_SEQ_STATE_ESTABLISHED: - case TCP_SEQ_STATE_TIME_WAIT: rc = established_get_next(seq, v); break; } @@ -2548,7 +2492,6 @@ static void tcp_seq_stop(struct seq_file *seq, void *v) if (v != SEQ_START_TOKEN) spin_unlock_bh(&tcp_hashinfo.listening_hash[st->bucket].lock); break; - case TCP_SEQ_STATE_TIME_WAIT: case TCP_SEQ_STATE_ESTABLISHED: if (v) spin_unlock_bh(inet_ehash_lockp(&tcp_hashinfo, st->bucket)); @@ -2606,10 +2549,10 @@ static void get_openreq4(const struct sock *sk, const struct request_sock *req, seq_printf(f, "%4d: %08X:%04X %08X:%04X" " %02X %08X:%08X %02X:%08lX %08X %5u %8d %u %d %pK%n", i, - ireq->loc_addr, + ireq->ir_loc_addr, ntohs(inet_sk(sk)->inet_sport), - ireq->rmt_addr, - ntohs(ireq->rmt_port), + ireq->ir_rmt_addr, + ntohs(ireq->ir_rmt_port), TCP_SYN_RECV, 0, 0, /* could print option size, but that is af dependent. */ 1, /* timers active (only the expire timer) */ @@ -2707,6 +2650,7 @@ static void get_timewait4_sock(const struct inet_timewait_sock *tw, static int tcp4_seq_show(struct seq_file *seq, void *v) { struct tcp_iter_state *st; + struct sock *sk = v; int len; if (v == SEQ_START_TOKEN) { @@ -2721,14 +2665,14 @@ static int tcp4_seq_show(struct seq_file *seq, void *v) switch (st->state) { case TCP_SEQ_STATE_LISTENING: case TCP_SEQ_STATE_ESTABLISHED: - get_tcp4_sock(v, seq, st->num, &len); + if (sk->sk_state == TCP_TIME_WAIT) + get_timewait4_sock(v, seq, st->num, &len); + else + get_tcp4_sock(v, seq, st->num, &len); break; case TCP_SEQ_STATE_OPENREQ: get_openreq4(st->syn_wait_sk, v, seq, st->num, st->uid, &len); break; - case TCP_SEQ_STATE_TIME_WAIT: - get_timewait4_sock(v, seq, st->num, &len); - break; } seq_printf(seq, "%*s\n", TMPSZ - 1 - len, ""); out: @@ -2806,6 +2750,7 @@ struct proto tcp_prot = { .orphan_count = &tcp_orphan_count, .memory_allocated = &tcp_memory_allocated, .memory_pressure = &tcp_memory_pressure, + .sysctl_mem = sysctl_tcp_mem, .sysctl_wmem = sysctl_tcp_wmem, .sysctl_rmem = sysctl_tcp_rmem, .max_header = MAX_TCP_HEADER, diff --git a/net/ipv4/tcp_lp.c b/net/ipv4/tcp_lp.c index 72f7218b03f5..991d62a2f9bb 100644 --- a/net/ipv4/tcp_lp.c +++ b/net/ipv4/tcp_lp.c @@ -115,12 +115,13 @@ static void tcp_lp_init(struct sock *sk) * Will only call newReno CA when away from inference. * From TCP-LP's paper, this will be handled in additive increasement. */ -static void tcp_lp_cong_avoid(struct sock *sk, u32 ack, u32 in_flight) +static void tcp_lp_cong_avoid(struct sock *sk, u32 ack, u32 acked, + u32 in_flight) { struct lp *lp = inet_csk_ca(sk); if (!(lp->flag & LP_WITHIN_INF)) - tcp_reno_cong_avoid(sk, ack, in_flight); + tcp_reno_cong_avoid(sk, ack, acked, in_flight); } /** diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c index 559d4ae6ebf4..03e9154f7e68 100644 --- a/net/ipv4/tcp_memcontrol.c +++ b/net/ipv4/tcp_memcontrol.c @@ -6,15 +6,10 @@ #include <linux/memcontrol.h> #include <linux/module.h> -static inline struct tcp_memcontrol *tcp_from_cgproto(struct cg_proto *cg_proto) -{ - return container_of(cg_proto, struct tcp_memcontrol, cg_proto); -} - static void memcg_tcp_enter_memory_pressure(struct sock *sk) { if (sk->sk_cgrp->memory_pressure) - *sk->sk_cgrp->memory_pressure = 1; + sk->sk_cgrp->memory_pressure = 1; } EXPORT_SYMBOL(memcg_tcp_enter_memory_pressure); @@ -27,34 +22,24 @@ int tcp_init_cgroup(struct mem_cgroup *memcg, struct cgroup_subsys *ss) */ struct res_counter *res_parent = NULL; struct cg_proto *cg_proto, *parent_cg; - struct tcp_memcontrol *tcp; struct mem_cgroup *parent = parent_mem_cgroup(memcg); - struct net *net = current->nsproxy->net_ns; cg_proto = tcp_prot.proto_cgroup(memcg); if (!cg_proto) return 0; - tcp = tcp_from_cgproto(cg_proto); - - tcp->tcp_prot_mem[0] = net->ipv4.sysctl_tcp_mem[0]; - tcp->tcp_prot_mem[1] = net->ipv4.sysctl_tcp_mem[1]; - tcp->tcp_prot_mem[2] = net->ipv4.sysctl_tcp_mem[2]; - tcp->tcp_memory_pressure = 0; + cg_proto->sysctl_mem[0] = sysctl_tcp_mem[0]; + cg_proto->sysctl_mem[1] = sysctl_tcp_mem[1]; + cg_proto->sysctl_mem[2] = sysctl_tcp_mem[2]; + cg_proto->memory_pressure = 0; + cg_proto->memcg = memcg; parent_cg = tcp_prot.proto_cgroup(parent); if (parent_cg) - res_parent = parent_cg->memory_allocated; - - res_counter_init(&tcp->tcp_memory_allocated, res_parent); - percpu_counter_init(&tcp->tcp_sockets_allocated, 0); + res_parent = &parent_cg->memory_allocated; - cg_proto->enter_memory_pressure = memcg_tcp_enter_memory_pressure; - cg_proto->memory_pressure = &tcp->tcp_memory_pressure; - cg_proto->sysctl_mem = tcp->tcp_prot_mem; - cg_proto->memory_allocated = &tcp->tcp_memory_allocated; - cg_proto->sockets_allocated = &tcp->tcp_sockets_allocated; - cg_proto->memcg = memcg; + res_counter_init(&cg_proto->memory_allocated, res_parent); + percpu_counter_init(&cg_proto->sockets_allocated, 0); return 0; } @@ -63,21 +48,17 @@ EXPORT_SYMBOL(tcp_init_cgroup); void tcp_destroy_cgroup(struct mem_cgroup *memcg) { struct cg_proto *cg_proto; - struct tcp_memcontrol *tcp; cg_proto = tcp_prot.proto_cgroup(memcg); if (!cg_proto) return; - tcp = tcp_from_cgproto(cg_proto); - percpu_counter_destroy(&tcp->tcp_sockets_allocated); + percpu_counter_destroy(&cg_proto->sockets_allocated); } EXPORT_SYMBOL(tcp_destroy_cgroup); static int tcp_update_limit(struct mem_cgroup *memcg, u64 val) { - struct net *net = current->nsproxy->net_ns; - struct tcp_memcontrol *tcp; struct cg_proto *cg_proto; u64 old_lim; int i; @@ -90,16 +71,14 @@ static int tcp_update_limit(struct mem_cgroup *memcg, u64 val) if (val > RES_COUNTER_MAX) val = RES_COUNTER_MAX; - tcp = tcp_from_cgproto(cg_proto); - - old_lim = res_counter_read_u64(&tcp->tcp_memory_allocated, RES_LIMIT); - ret = res_counter_set_limit(&tcp->tcp_memory_allocated, val); + old_lim = res_counter_read_u64(&cg_proto->memory_allocated, RES_LIMIT); + ret = res_counter_set_limit(&cg_proto->memory_allocated, val); if (ret) return ret; for (i = 0; i < 3; i++) - tcp->tcp_prot_mem[i] = min_t(long, val >> PAGE_SHIFT, - net->ipv4.sysctl_tcp_mem[i]); + cg_proto->sysctl_mem[i] = min_t(long, val >> PAGE_SHIFT, + sysctl_tcp_mem[i]); if (val == RES_COUNTER_MAX) clear_bit(MEMCG_SOCK_ACTIVE, &cg_proto->flags); @@ -156,28 +135,24 @@ static int tcp_cgroup_write(struct cgroup_subsys_state *css, struct cftype *cft, static u64 tcp_read_stat(struct mem_cgroup *memcg, int type, u64 default_val) { - struct tcp_memcontrol *tcp; struct cg_proto *cg_proto; cg_proto = tcp_prot.proto_cgroup(memcg); if (!cg_proto) return default_val; - tcp = tcp_from_cgproto(cg_proto); - return res_counter_read_u64(&tcp->tcp_memory_allocated, type); + return res_counter_read_u64(&cg_proto->memory_allocated, type); } static u64 tcp_read_usage(struct mem_cgroup *memcg) { - struct tcp_memcontrol *tcp; struct cg_proto *cg_proto; cg_proto = tcp_prot.proto_cgroup(memcg); if (!cg_proto) return atomic_long_read(&tcp_memory_allocated) << PAGE_SHIFT; - tcp = tcp_from_cgproto(cg_proto); - return res_counter_read_u64(&tcp->tcp_memory_allocated, RES_USAGE); + return res_counter_read_u64(&cg_proto->memory_allocated, RES_USAGE); } static u64 tcp_cgroup_read(struct cgroup_subsys_state *css, struct cftype *cft) @@ -205,54 +180,25 @@ static u64 tcp_cgroup_read(struct cgroup_subsys_state *css, struct cftype *cft) static int tcp_cgroup_reset(struct cgroup_subsys_state *css, unsigned int event) { struct mem_cgroup *memcg; - struct tcp_memcontrol *tcp; struct cg_proto *cg_proto; memcg = mem_cgroup_from_css(css); cg_proto = tcp_prot.proto_cgroup(memcg); if (!cg_proto) return 0; - tcp = tcp_from_cgproto(cg_proto); switch (event) { case RES_MAX_USAGE: - res_counter_reset_max(&tcp->tcp_memory_allocated); + res_counter_reset_max(&cg_proto->memory_allocated); break; case RES_FAILCNT: - res_counter_reset_failcnt(&tcp->tcp_memory_allocated); + res_counter_reset_failcnt(&cg_proto->memory_allocated); break; } return 0; } -unsigned long long tcp_max_memory(const struct mem_cgroup *memcg) -{ - struct tcp_memcontrol *tcp; - struct cg_proto *cg_proto; - - cg_proto = tcp_prot.proto_cgroup((struct mem_cgroup *)memcg); - if (!cg_proto) - return 0; - - tcp = tcp_from_cgproto(cg_proto); - return res_counter_read_u64(&tcp->tcp_memory_allocated, RES_LIMIT); -} - -void tcp_prot_mem(struct mem_cgroup *memcg, long val, int idx) -{ - struct tcp_memcontrol *tcp; - struct cg_proto *cg_proto; - - cg_proto = tcp_prot.proto_cgroup(memcg); - if (!cg_proto) - return; - - tcp = tcp_from_cgproto(cg_proto); - - tcp->tcp_prot_mem[idx] = val; -} - static struct cftype tcp_files[] = { { .name = "kmem.tcp.limit_in_bytes", diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c index 52f3c6b971d2..2ab09cbae74d 100644 --- a/net/ipv4/tcp_metrics.c +++ b/net/ipv4/tcp_metrics.c @@ -215,13 +215,15 @@ static struct tcp_metrics_block *__tcp_get_metrics_req(struct request_sock *req, addr.family = req->rsk_ops->family; switch (addr.family) { case AF_INET: - addr.addr.a4 = inet_rsk(req)->rmt_addr; + addr.addr.a4 = inet_rsk(req)->ir_rmt_addr; hash = (__force unsigned int) addr.addr.a4; break; +#if IS_ENABLED(CONFIG_IPV6) case AF_INET6: - *(struct in6_addr *)addr.addr.a6 = inet6_rsk(req)->rmt_addr; - hash = ipv6_addr_hash(&inet6_rsk(req)->rmt_addr); + *(struct in6_addr *)addr.addr.a6 = inet_rsk(req)->ir_v6_rmt_addr; + hash = ipv6_addr_hash(&inet_rsk(req)->ir_v6_rmt_addr); break; +#endif default: return NULL; } @@ -240,7 +242,6 @@ static struct tcp_metrics_block *__tcp_get_metrics_req(struct request_sock *req, static struct tcp_metrics_block *__tcp_get_metrics_tw(struct inet_timewait_sock *tw) { - struct inet6_timewait_sock *tw6; struct tcp_metrics_block *tm; struct inetpeer_addr addr; unsigned int hash; @@ -252,11 +253,12 @@ static struct tcp_metrics_block *__tcp_get_metrics_tw(struct inet_timewait_sock addr.addr.a4 = tw->tw_daddr; hash = (__force unsigned int) addr.addr.a4; break; +#if IS_ENABLED(CONFIG_IPV6) case AF_INET6: - tw6 = inet6_twsk((struct sock *)tw); - *(struct in6_addr *)addr.addr.a6 = tw6->tw_v6_daddr; - hash = ipv6_addr_hash(&tw6->tw_v6_daddr); + *(struct in6_addr *)addr.addr.a6 = tw->tw_v6_daddr; + hash = ipv6_addr_hash(&tw->tw_v6_daddr); break; +#endif default: return NULL; } @@ -288,10 +290,12 @@ static struct tcp_metrics_block *tcp_get_metrics(struct sock *sk, addr.addr.a4 = inet_sk(sk)->inet_daddr; hash = (__force unsigned int) addr.addr.a4; break; +#if IS_ENABLED(CONFIG_IPV6) case AF_INET6: - *(struct in6_addr *)addr.addr.a6 = inet6_sk(sk)->daddr; - hash = ipv6_addr_hash(&inet6_sk(sk)->daddr); + *(struct in6_addr *)addr.addr.a6 = sk->sk_v6_daddr; + hash = ipv6_addr_hash(&sk->sk_v6_daddr); break; +#endif default: return NULL; } @@ -667,8 +671,9 @@ void tcp_fastopen_cache_set(struct sock *sk, u16 mss, struct tcp_fastopen_metrics *tfom = &tm->tcpm_fastopen; write_seqlock_bh(&fastopen_seqlock); - tfom->mss = mss; - if (cookie->len > 0) + if (mss) + tfom->mss = mss; + if (cookie && cookie->len > 0) tfom->cookie = *cookie; if (syn_lost) { ++tfom->syn_loss; diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index 58a3e69aef64..97b684159861 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -293,12 +293,9 @@ void tcp_time_wait(struct sock *sk, int state, int timeo) #if IS_ENABLED(CONFIG_IPV6) if (tw->tw_family == PF_INET6) { struct ipv6_pinfo *np = inet6_sk(sk); - struct inet6_timewait_sock *tw6; - tw->tw_ipv6_offset = inet6_tw_offset(sk->sk_prot); - tw6 = inet6_twsk((struct sock *)tw); - tw6->tw_v6_daddr = np->daddr; - tw6->tw_v6_rcv_saddr = np->rcv_saddr; + tw->tw_v6_daddr = sk->sk_v6_daddr; + tw->tw_v6_rcv_saddr = sk->sk_v6_rcv_saddr; tw->tw_tclass = np->tclass; tw->tw_ipv6only = np->ipv6only; } diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c index 533c58a5cfb7..a2b68a108eae 100644 --- a/net/ipv4/tcp_offload.c +++ b/net/ipv4/tcp_offload.c @@ -14,7 +14,7 @@ #include <net/tcp.h> #include <net/protocol.h> -struct sk_buff *tcp_tso_segment(struct sk_buff *skb, +struct sk_buff *tcp_gso_segment(struct sk_buff *skb, netdev_features_t features) { struct sk_buff *segs = ERR_PTR(-EINVAL); @@ -57,6 +57,8 @@ struct sk_buff *tcp_tso_segment(struct sk_buff *skb, SKB_GSO_TCP_ECN | SKB_GSO_TCPV6 | SKB_GSO_GRE | + SKB_GSO_IPIP | + SKB_GSO_SIT | SKB_GSO_MPLS | SKB_GSO_UDP_TUNNEL | 0) || @@ -136,7 +138,7 @@ struct sk_buff *tcp_tso_segment(struct sk_buff *skb, out: return segs; } -EXPORT_SYMBOL(tcp_tso_segment); +EXPORT_SYMBOL(tcp_gso_segment); struct sk_buff **tcp_gro_receive(struct sk_buff **head, struct sk_buff *skb) { @@ -317,7 +319,7 @@ static int tcp4_gro_complete(struct sk_buff *skb) static const struct net_offload tcpv4_offload = { .callbacks = { .gso_send_check = tcp_v4_gso_send_check, - .gso_segment = tcp_tso_segment, + .gso_segment = tcp_gso_segment, .gro_receive = tcp4_gro_receive, .gro_complete = tcp4_gro_complete, }, diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index d46f2143305c..672854664ff5 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -850,15 +850,15 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it, BUG_ON(!skb || !tcp_skb_pcount(skb)); - /* If congestion control is doing timestamping, we must - * take such a timestamp before we potentially clone/copy. - */ - if (icsk->icsk_ca_ops->flags & TCP_CONG_RTT_STAMP) - __net_timestamp(skb); - - if (likely(clone_it)) { + if (clone_it) { const struct sk_buff *fclone = skb + 1; + /* If congestion control is doing timestamping, we must + * take such a timestamp before we potentially clone/copy. + */ + if (icsk->icsk_ca_ops->flags & TCP_CONG_RTT_STAMP) + __net_timestamp(skb); + if (unlikely(skb->fclone == SKB_FCLONE_ORIG && fclone->fclone == SKB_FCLONE_CLONE)) NET_INC_STATS_BH(sock_net(sk), @@ -2353,21 +2353,6 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb) tcp_retrans_try_collapse(sk, skb, cur_mss); - /* Some Solaris stacks overoptimize and ignore the FIN on a - * retransmit when old data is attached. So strip it off - * since it is cheap to do so and saves bytes on the network. - */ - if (skb->len > 0 && - (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) && - tp->snd_una == (TCP_SKB_CB(skb)->end_seq - 1)) { - if (!pskb_trim(skb, 0)) { - /* Reuse, even though it does some unnecessary work */ - tcp_init_nondata_skb(skb, TCP_SKB_CB(skb)->end_seq - 1, - TCP_SKB_CB(skb)->tcp_flags); - skb->ip_summed = CHECKSUM_NONE; - } - } - /* Make a copy, if the first transmission SKB clone we made * is still in somebody's hands, else make a clone. */ @@ -2736,8 +2721,8 @@ struct sk_buff *tcp_make_synack(struct sock *sk, struct dst_entry *dst, th->syn = 1; th->ack = 1; TCP_ECN_make_synack(req, th); - th->source = ireq->loc_port; - th->dest = ireq->rmt_port; + th->source = htons(ireq->ir_num); + th->dest = ireq->ir_rmt_port; /* Setting of flags are superfluous here for callers (and ECE is * not even correctly set) */ diff --git a/net/ipv4/tcp_probe.c b/net/ipv4/tcp_probe.c index 611beab38a00..8b97d71e193b 100644 --- a/net/ipv4/tcp_probe.c +++ b/net/ipv4/tcp_probe.c @@ -101,22 +101,6 @@ static inline int tcp_probe_avail(void) si4.sin_addr.s_addr = inet->inet_##mem##addr; \ } while (0) \ -#if IS_ENABLED(CONFIG_IPV6) -#define tcp_probe_copy_fl_to_si6(inet, si6, mem) \ - do { \ - struct ipv6_pinfo *pi6 = inet->pinet6; \ - si6.sin6_family = AF_INET6; \ - si6.sin6_port = inet->inet_##mem##port; \ - si6.sin6_addr = pi6->mem##addr; \ - si6.sin6_flowinfo = 0; /* No need here. */ \ - si6.sin6_scope_id = 0; /* No need here. */ \ - } while (0) -#else -#define tcp_probe_copy_fl_to_si6(fl, si6, mem) \ - do { \ - memset(&si6, 0, sizeof(si6)); \ - } while (0) -#endif /* * Hook inserted to be called before each receive packet. @@ -147,8 +131,17 @@ static void jtcp_rcv_established(struct sock *sk, struct sk_buff *skb, tcp_probe_copy_fl_to_si4(inet, p->dst.v4, d); break; case AF_INET6: - tcp_probe_copy_fl_to_si6(inet, p->src.v6, s); - tcp_probe_copy_fl_to_si6(inet, p->dst.v6, d); + memset(&p->src.v6, 0, sizeof(p->src.v6)); + memset(&p->dst.v6, 0, sizeof(p->dst.v6)); +#if IS_ENABLED(CONFIG_IPV6) + p->src.v6.sin6_family = AF_INET6; + p->src.v6.sin6_port = inet->inet_sport; + p->src.v6.sin6_addr = inet6_sk(sk)->saddr; + + p->dst.v6.sin6_family = AF_INET6; + p->dst.v6.sin6_port = inet->inet_dport; + p->dst.v6.sin6_addr = sk->sk_v6_daddr; +#endif break; default: BUG(); diff --git a/net/ipv4/tcp_scalable.c b/net/ipv4/tcp_scalable.c index 8ce55b8aaec8..19ea6c2951f3 100644 --- a/net/ipv4/tcp_scalable.c +++ b/net/ipv4/tcp_scalable.c @@ -15,7 +15,8 @@ #define TCP_SCALABLE_AI_CNT 50U #define TCP_SCALABLE_MD_SCALE 3 -static void tcp_scalable_cong_avoid(struct sock *sk, u32 ack, u32 in_flight) +static void tcp_scalable_cong_avoid(struct sock *sk, u32 ack, u32 acked, + u32 in_flight) { struct tcp_sock *tp = tcp_sk(sk); @@ -23,7 +24,7 @@ static void tcp_scalable_cong_avoid(struct sock *sk, u32 ack, u32 in_flight) return; if (tp->snd_cwnd <= tp->snd_ssthresh) - tcp_slow_start(tp); + tcp_slow_start(tp, acked); else tcp_cong_avoid_ai(tp, min(tp->snd_cwnd, TCP_SCALABLE_AI_CNT)); } diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index 4b85e6f636c9..64f0354c84c7 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -156,12 +156,16 @@ static bool retransmits_timed_out(struct sock *sk, static int tcp_write_timeout(struct sock *sk) { struct inet_connection_sock *icsk = inet_csk(sk); + struct tcp_sock *tp = tcp_sk(sk); int retry_until; bool do_reset, syn_set = false; if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)) { - if (icsk->icsk_retransmits) + if (icsk->icsk_retransmits) { dst_negative_advice(sk); + if (tp->syn_fastopen || tp->syn_data) + tcp_fastopen_cache_set(sk, 0, NULL, true); + } retry_until = icsk->icsk_syn_retries ? : sysctl_tcp_syn_retries; syn_set = true; } else { @@ -374,9 +378,8 @@ void tcp_retransmit_timer(struct sock *sk) } #if IS_ENABLED(CONFIG_IPV6) else if (sk->sk_family == AF_INET6) { - struct ipv6_pinfo *np = inet6_sk(sk); LIMIT_NETDEBUG(KERN_DEBUG pr_fmt("Peer %pI6:%u/%u unexpectedly shrunk window %u:%u (repaired)\n"), - &np->daddr, + &sk->sk_v6_daddr, ntohs(inet->inet_dport), inet->inet_num, tp->snd_una, tp->snd_nxt); } diff --git a/net/ipv4/tcp_vegas.c b/net/ipv4/tcp_vegas.c index 80fa2bfd7ede..06cae62bf208 100644 --- a/net/ipv4/tcp_vegas.c +++ b/net/ipv4/tcp_vegas.c @@ -163,13 +163,14 @@ static inline u32 tcp_vegas_ssthresh(struct tcp_sock *tp) return min(tp->snd_ssthresh, tp->snd_cwnd-1); } -static void tcp_vegas_cong_avoid(struct sock *sk, u32 ack, u32 in_flight) +static void tcp_vegas_cong_avoid(struct sock *sk, u32 ack, u32 acked, + u32 in_flight) { struct tcp_sock *tp = tcp_sk(sk); struct vegas *vegas = inet_csk_ca(sk); if (!vegas->doing_vegas_now) { - tcp_reno_cong_avoid(sk, ack, in_flight); + tcp_reno_cong_avoid(sk, ack, acked, in_flight); return; } @@ -194,7 +195,7 @@ static void tcp_vegas_cong_avoid(struct sock *sk, u32 ack, u32 in_flight) /* We don't have enough RTT samples to do the Vegas * calculation, so we'll behave like Reno. */ - tcp_reno_cong_avoid(sk, ack, in_flight); + tcp_reno_cong_avoid(sk, ack, acked, in_flight); } else { u32 rtt, diff; u64 target_cwnd; @@ -243,7 +244,7 @@ static void tcp_vegas_cong_avoid(struct sock *sk, u32 ack, u32 in_flight) } else if (tp->snd_cwnd <= tp->snd_ssthresh) { /* Slow start. */ - tcp_slow_start(tp); + tcp_slow_start(tp, acked); } else { /* Congestion avoidance. */ @@ -283,7 +284,7 @@ static void tcp_vegas_cong_avoid(struct sock *sk, u32 ack, u32 in_flight) } /* Use normal slow start */ else if (tp->snd_cwnd <= tp->snd_ssthresh) - tcp_slow_start(tp); + tcp_slow_start(tp, acked); } diff --git a/net/ipv4/tcp_vegas.h b/net/ipv4/tcp_vegas.h index 6c0eea2f8249..0531b99d8637 100644 --- a/net/ipv4/tcp_vegas.h +++ b/net/ipv4/tcp_vegas.h @@ -15,10 +15,10 @@ struct vegas { u32 baseRTT; /* the min of all Vegas RTT measurements seen (in usec) */ }; -extern void tcp_vegas_init(struct sock *sk); -extern void tcp_vegas_state(struct sock *sk, u8 ca_state); -extern void tcp_vegas_pkts_acked(struct sock *sk, u32 cnt, s32 rtt_us); -extern void tcp_vegas_cwnd_event(struct sock *sk, enum tcp_ca_event event); -extern void tcp_vegas_get_info(struct sock *sk, u32 ext, struct sk_buff *skb); +void tcp_vegas_init(struct sock *sk); +void tcp_vegas_state(struct sock *sk, u8 ca_state); +void tcp_vegas_pkts_acked(struct sock *sk, u32 cnt, s32 rtt_us); +void tcp_vegas_cwnd_event(struct sock *sk, enum tcp_ca_event event); +void tcp_vegas_get_info(struct sock *sk, u32 ext, struct sk_buff *skb); #endif /* __TCP_VEGAS_H */ diff --git a/net/ipv4/tcp_veno.c b/net/ipv4/tcp_veno.c index ac43cd747bce..326475a94865 100644 --- a/net/ipv4/tcp_veno.c +++ b/net/ipv4/tcp_veno.c @@ -114,13 +114,14 @@ static void tcp_veno_cwnd_event(struct sock *sk, enum tcp_ca_event event) tcp_veno_init(sk); } -static void tcp_veno_cong_avoid(struct sock *sk, u32 ack, u32 in_flight) +static void tcp_veno_cong_avoid(struct sock *sk, u32 ack, u32 acked, + u32 in_flight) { struct tcp_sock *tp = tcp_sk(sk); struct veno *veno = inet_csk_ca(sk); if (!veno->doing_veno_now) { - tcp_reno_cong_avoid(sk, ack, in_flight); + tcp_reno_cong_avoid(sk, ack, acked, in_flight); return; } @@ -133,7 +134,7 @@ static void tcp_veno_cong_avoid(struct sock *sk, u32 ack, u32 in_flight) /* We don't have enough rtt samples to do the Veno * calculation, so we'll behave like Reno. */ - tcp_reno_cong_avoid(sk, ack, in_flight); + tcp_reno_cong_avoid(sk, ack, acked, in_flight); } else { u64 target_cwnd; u32 rtt; @@ -152,7 +153,7 @@ static void tcp_veno_cong_avoid(struct sock *sk, u32 ack, u32 in_flight) if (tp->snd_cwnd <= tp->snd_ssthresh) { /* Slow start. */ - tcp_slow_start(tp); + tcp_slow_start(tp, acked); } else { /* Congestion avoidance. */ if (veno->diff < beta) { diff --git a/net/ipv4/tcp_yeah.c b/net/ipv4/tcp_yeah.c index 05c3b6f0e8e1..a347a078ee07 100644 --- a/net/ipv4/tcp_yeah.c +++ b/net/ipv4/tcp_yeah.c @@ -69,7 +69,8 @@ static void tcp_yeah_pkts_acked(struct sock *sk, u32 pkts_acked, s32 rtt_us) tcp_vegas_pkts_acked(sk, pkts_acked, rtt_us); } -static void tcp_yeah_cong_avoid(struct sock *sk, u32 ack, u32 in_flight) +static void tcp_yeah_cong_avoid(struct sock *sk, u32 ack, u32 acked, + u32 in_flight) { struct tcp_sock *tp = tcp_sk(sk); struct yeah *yeah = inet_csk_ca(sk); @@ -78,7 +79,7 @@ static void tcp_yeah_cong_avoid(struct sock *sk, u32 ack, u32 in_flight) return; if (tp->snd_cwnd <= tp->snd_ssthresh) - tcp_slow_start(tp); + tcp_slow_start(tp, acked); else if (!yeah->doing_reno_now) { /* Scalable */ diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 0ca44df51ee9..89909dd730dd 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -103,6 +103,7 @@ #include <linux/seq_file.h> #include <net/net_namespace.h> #include <net/icmp.h> +#include <net/inet_hashtables.h> #include <net/route.h> #include <net/checksum.h> #include <net/xfrm.h> @@ -219,7 +220,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum, unsigned short first, last; DECLARE_BITMAP(bitmap, PORTS_PER_CHAIN); - inet_get_local_port_range(&low, &high); + inet_get_local_port_range(net, &low, &high); remaining = (high - low) + 1; rand = net_random(); @@ -406,6 +407,18 @@ static inline int compute_score2(struct sock *sk, struct net *net, return score; } +static unsigned int udp_ehashfn(struct net *net, const __be32 laddr, + const __u16 lport, const __be32 faddr, + const __be16 fport) +{ + static u32 udp_ehash_secret __read_mostly; + + net_get_random_once(&udp_ehash_secret, sizeof(udp_ehash_secret)); + + return __inet_ehashfn(laddr, lport, faddr, fport, + udp_ehash_secret + net_hash_mix(net)); +} + /* called with read_rcu_lock() */ static struct sock *udp4_lib_lookup2(struct net *net, @@ -429,8 +442,8 @@ begin: badness = score; reuseport = sk->sk_reuseport; if (reuseport) { - hash = inet_ehashfn(net, daddr, hnum, - saddr, sport); + hash = udp_ehashfn(net, daddr, hnum, + saddr, sport); matches = 1; } } else if (score == badness && reuseport) { @@ -510,8 +523,8 @@ begin: badness = score; reuseport = sk->sk_reuseport; if (reuseport) { - hash = inet_ehashfn(net, daddr, hnum, - saddr, sport); + hash = udp_ehashfn(net, daddr, hnum, + saddr, sport); matches = 1; } } else if (score == badness && reuseport) { @@ -565,6 +578,26 @@ struct sock *udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport, } EXPORT_SYMBOL_GPL(udp4_lib_lookup); +static inline bool __udp_is_mcast_sock(struct net *net, struct sock *sk, + __be16 loc_port, __be32 loc_addr, + __be16 rmt_port, __be32 rmt_addr, + int dif, unsigned short hnum) +{ + struct inet_sock *inet = inet_sk(sk); + + if (!net_eq(sock_net(sk), net) || + udp_sk(sk)->udp_port_hash != hnum || + (inet->inet_daddr && inet->inet_daddr != rmt_addr) || + (inet->inet_dport != rmt_port && inet->inet_dport) || + (inet->inet_rcv_saddr && inet->inet_rcv_saddr != loc_addr) || + ipv6_only_sock(sk) || + (sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif)) + return false; + if (!ip_mc_sf_allow(sk, loc_addr, rmt_addr, dif)) + return false; + return true; +} + static inline struct sock *udp_v4_mcast_next(struct net *net, struct sock *sk, __be16 loc_port, __be32 loc_addr, __be16 rmt_port, __be32 rmt_addr, @@ -575,20 +608,11 @@ static inline struct sock *udp_v4_mcast_next(struct net *net, struct sock *sk, unsigned short hnum = ntohs(loc_port); sk_nulls_for_each_from(s, node) { - struct inet_sock *inet = inet_sk(s); - - if (!net_eq(sock_net(s), net) || - udp_sk(s)->udp_port_hash != hnum || - (inet->inet_daddr && inet->inet_daddr != rmt_addr) || - (inet->inet_dport != rmt_port && inet->inet_dport) || - (inet->inet_rcv_saddr && - inet->inet_rcv_saddr != loc_addr) || - ipv6_only_sock(s) || - (s->sk_bound_dev_if && s->sk_bound_dev_if != dif)) - continue; - if (!ip_mc_sf_allow(s, loc_addr, rmt_addr, dif)) - continue; - goto found; + if (__udp_is_mcast_sock(net, s, + loc_port, loc_addr, + rmt_port, rmt_addr, + dif, hnum)) + goto found; } s = NULL; found: @@ -855,6 +879,8 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, ipc.opt = NULL; ipc.tx_flags = 0; + ipc.ttl = 0; + ipc.tos = -1; getfrag = is_udplite ? udplite_getfrag : ip_generic_getfrag; @@ -938,7 +964,7 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, faddr = ipc.opt->opt.faddr; connected = 0; } - tos = RT_TOS(inet->tos); + tos = get_rttos(&ipc, inet); if (sock_flag(sk, SOCK_LOCALROUTE) || (msg->msg_flags & MSG_DONTROUTE) || (ipc.opt && ipc.opt->opt.is_strictroute)) { @@ -1403,8 +1429,10 @@ static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) { int rc; - if (inet_sk(sk)->inet_daddr) + if (inet_sk(sk)->inet_daddr) { sock_rps_save_rxhash(sk, skb); + sk_mark_napi_id(sk, skb); + } rc = sock_queue_rcv_skb(sk, skb); if (rc < 0) { @@ -1528,7 +1556,7 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) rc = 0; - ipv4_pktinfo_prepare(skb); + ipv4_pktinfo_prepare(sk, skb); bh_lock_sock(sk); if (!sock_owned_by_user(sk)) rc = __udp_queue_rcv_skb(sk, skb); @@ -1577,6 +1605,14 @@ static void flush_stack(struct sock **stack, unsigned int count, kfree_skb(skb1); } +static void udp_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb) +{ + struct dst_entry *dst = skb_dst(skb); + + dst_hold(dst); + sk->sk_rx_dst = dst; +} + /* * Multicasts and broadcasts go to each listener. * @@ -1705,16 +1741,32 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable, if (udp4_csum_init(skb, uh, proto)) goto csum_error; - if (rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST)) - return __udp4_lib_mcast_deliver(net, skb, uh, - saddr, daddr, udptable); + if (skb->sk) { + int ret; + sk = skb->sk; + + if (unlikely(sk->sk_rx_dst == NULL)) + udp_sk_rx_dst_set(sk, skb); - sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable); + ret = udp_queue_rcv_skb(sk, skb); + + /* a return value > 0 means to resubmit the input, but + * it wants the return to be -protocol, or 0 + */ + if (ret > 0) + return -ret; + return 0; + } else { + if (rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST)) + return __udp4_lib_mcast_deliver(net, skb, uh, + saddr, daddr, udptable); + + sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable); + } if (sk != NULL) { int ret; - sk_mark_napi_id(sk, skb); ret = udp_queue_rcv_skb(sk, skb); sock_put(sk); @@ -1768,6 +1820,135 @@ drop: return 0; } +/* We can only early demux multicast if there is a single matching socket. + * If more than one socket found returns NULL + */ +static struct sock *__udp4_lib_mcast_demux_lookup(struct net *net, + __be16 loc_port, __be32 loc_addr, + __be16 rmt_port, __be32 rmt_addr, + int dif) +{ + struct sock *sk, *result; + struct hlist_nulls_node *node; + unsigned short hnum = ntohs(loc_port); + unsigned int count, slot = udp_hashfn(net, hnum, udp_table.mask); + struct udp_hslot *hslot = &udp_table.hash[slot]; + + rcu_read_lock(); +begin: + count = 0; + result = NULL; + sk_nulls_for_each_rcu(sk, node, &hslot->head) { + if (__udp_is_mcast_sock(net, sk, + loc_port, loc_addr, + rmt_port, rmt_addr, + dif, hnum)) { + result = sk; + ++count; + } + } + /* + * if the nulls value we got at the end of this lookup is + * not the expected one, we must restart lookup. + * We probably met an item that was moved to another chain. + */ + if (get_nulls_value(node) != slot) + goto begin; + + if (result) { + if (count != 1 || + unlikely(!atomic_inc_not_zero_hint(&result->sk_refcnt, 2))) + result = NULL; + else if (unlikely(!__udp_is_mcast_sock(net, result, + loc_port, loc_addr, + rmt_port, rmt_addr, + dif, hnum))) { + sock_put(result); + result = NULL; + } + } + rcu_read_unlock(); + return result; +} + +/* For unicast we should only early demux connected sockets or we can + * break forwarding setups. The chains here can be long so only check + * if the first socket is an exact match and if not move on. + */ +static struct sock *__udp4_lib_demux_lookup(struct net *net, + __be16 loc_port, __be32 loc_addr, + __be16 rmt_port, __be32 rmt_addr, + int dif) +{ + struct sock *sk, *result; + struct hlist_nulls_node *node; + unsigned short hnum = ntohs(loc_port); + unsigned int hash2 = udp4_portaddr_hash(net, loc_addr, hnum); + unsigned int slot2 = hash2 & udp_table.mask; + struct udp_hslot *hslot2 = &udp_table.hash2[slot2]; + INET_ADDR_COOKIE(acookie, rmt_addr, loc_addr) + const __portpair ports = INET_COMBINED_PORTS(rmt_port, hnum); + + rcu_read_lock(); + result = NULL; + udp_portaddr_for_each_entry_rcu(sk, node, &hslot2->head) { + if (INET_MATCH(sk, net, acookie, + rmt_addr, loc_addr, ports, dif)) + result = sk; + /* Only check first socket in chain */ + break; + } + + if (result) { + if (unlikely(!atomic_inc_not_zero_hint(&result->sk_refcnt, 2))) + result = NULL; + else if (unlikely(!INET_MATCH(sk, net, acookie, + rmt_addr, loc_addr, + ports, dif))) { + sock_put(result); + result = NULL; + } + } + rcu_read_unlock(); + return result; +} + +void udp_v4_early_demux(struct sk_buff *skb) +{ + const struct iphdr *iph = ip_hdr(skb); + const struct udphdr *uh = udp_hdr(skb); + struct sock *sk; + struct dst_entry *dst; + struct net *net = dev_net(skb->dev); + int dif = skb->dev->ifindex; + + /* validate the packet */ + if (!pskb_may_pull(skb, skb_transport_offset(skb) + sizeof(struct udphdr))) + return; + + if (skb->pkt_type == PACKET_BROADCAST || + skb->pkt_type == PACKET_MULTICAST) + sk = __udp4_lib_mcast_demux_lookup(net, uh->dest, iph->daddr, + uh->source, iph->saddr, dif); + else if (skb->pkt_type == PACKET_HOST) + sk = __udp4_lib_demux_lookup(net, uh->dest, iph->daddr, + uh->source, iph->saddr, dif); + else + return; + + if (!sk) + return; + + skb->sk = sk; + skb->destructor = sock_edemux; + dst = sk->sk_rx_dst; + + if (dst) + dst = dst_check(dst, 0); + if (dst) + skb_dst_set_noref(skb, dst); +} + int udp_rcv(struct sk_buff *skb) { return __udp4_lib_rcv(skb, &udp_table, IPPROTO_UDP); diff --git a/net/ipv4/udp_impl.h b/net/ipv4/udp_impl.h index 5a681e298b90..f3c27899f62b 100644 --- a/net/ipv4/udp_impl.h +++ b/net/ipv4/udp_impl.h @@ -5,30 +5,30 @@ #include <net/protocol.h> #include <net/inet_common.h> -extern int __udp4_lib_rcv(struct sk_buff *, struct udp_table *, int ); -extern void __udp4_lib_err(struct sk_buff *, u32, struct udp_table *); +int __udp4_lib_rcv(struct sk_buff *, struct udp_table *, int); +void __udp4_lib_err(struct sk_buff *, u32, struct udp_table *); -extern int udp_v4_get_port(struct sock *sk, unsigned short snum); +int udp_v4_get_port(struct sock *sk, unsigned short snum); -extern int udp_setsockopt(struct sock *sk, int level, int optname, - char __user *optval, unsigned int optlen); -extern int udp_getsockopt(struct sock *sk, int level, int optname, - char __user *optval, int __user *optlen); +int udp_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, unsigned int optlen); +int udp_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen); #ifdef CONFIG_COMPAT -extern int compat_udp_setsockopt(struct sock *sk, int level, int optname, - char __user *optval, unsigned int optlen); -extern int compat_udp_getsockopt(struct sock *sk, int level, int optname, - char __user *optval, int __user *optlen); +int compat_udp_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, unsigned int optlen); +int compat_udp_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen); #endif -extern int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, - size_t len, int noblock, int flags, int *addr_len); -extern int udp_sendpage(struct sock *sk, struct page *page, int offset, - size_t size, int flags); -extern int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb); -extern void udp_destroy_sock(struct sock *sk); +int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, + size_t len, int noblock, int flags, int *addr_len); +int udp_sendpage(struct sock *sk, struct page *page, int offset, size_t size, + int flags); +int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb); +void udp_destroy_sock(struct sock *sk); #ifdef CONFIG_PROC_FS -extern int udp4_seq_show(struct seq_file *seq, void *v); +int udp4_seq_show(struct seq_file *seq, void *v); #endif #endif /* _UDP4_IMPL_H */ diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c index f35eccaa855e..83206de2bc76 100644 --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -52,6 +52,7 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb, if (unlikely(type & ~(SKB_GSO_UDP | SKB_GSO_DODGY | SKB_GSO_UDP_TUNNEL | + SKB_GSO_IPIP | SKB_GSO_GRE | SKB_GSO_MPLS) || !(type & (SKB_GSO_UDP)))) goto out; diff --git a/net/ipv4/xfrm4_mode_tunnel.c b/net/ipv4/xfrm4_mode_tunnel.c index b5663c37f089..31b18152528f 100644 --- a/net/ipv4/xfrm4_mode_tunnel.c +++ b/net/ipv4/xfrm4_mode_tunnel.c @@ -16,13 +16,13 @@ #include <net/xfrm.h> /* Informational hook. The decap is still done here. */ -static struct xfrm_tunnel __rcu *rcv_notify_handlers __read_mostly; +static struct xfrm_tunnel_notifier __rcu *rcv_notify_handlers __read_mostly; static DEFINE_MUTEX(xfrm4_mode_tunnel_input_mutex); -int xfrm4_mode_tunnel_input_register(struct xfrm_tunnel *handler) +int xfrm4_mode_tunnel_input_register(struct xfrm_tunnel_notifier *handler) { - struct xfrm_tunnel __rcu **pprev; - struct xfrm_tunnel *t; + struct xfrm_tunnel_notifier __rcu **pprev; + struct xfrm_tunnel_notifier *t; int ret = -EEXIST; int priority = handler->priority; @@ -50,10 +50,10 @@ err: } EXPORT_SYMBOL_GPL(xfrm4_mode_tunnel_input_register); -int xfrm4_mode_tunnel_input_deregister(struct xfrm_tunnel *handler) +int xfrm4_mode_tunnel_input_deregister(struct xfrm_tunnel_notifier *handler) { - struct xfrm_tunnel __rcu **pprev; - struct xfrm_tunnel *t; + struct xfrm_tunnel_notifier __rcu **pprev; + struct xfrm_tunnel_notifier *t; int ret = -ENOENT; mutex_lock(&xfrm4_mode_tunnel_input_mutex); @@ -134,7 +134,7 @@ static int xfrm4_mode_tunnel_output(struct xfrm_state *x, struct sk_buff *skb) static int xfrm4_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb) { - struct xfrm_tunnel *handler; + struct xfrm_tunnel_notifier *handler; int err = -EINVAL; if (XFRM_MODE_SKB_CB(skb)->protocol != IPPROTO_IPIP) diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig index 11b13ea69db4..d92e5586783e 100644 --- a/net/ipv6/Kconfig +++ b/net/ipv6/Kconfig @@ -21,24 +21,6 @@ menuconfig IPV6 if IPV6 -config IPV6_PRIVACY - bool "IPv6: Privacy Extensions (RFC 3041) support" - ---help--- - Privacy Extensions for Stateless Address Autoconfiguration in IPv6 - support. With this option, additional periodically-altered - pseudo-random global-scope unicast address(es) will be assigned to - your interface(s). - - We use our standard pseudo-random algorithm to generate the - randomized interface identifier, instead of one described in RFC 3041. - - By default the kernel does not generate temporary addresses. - To use temporary addresses, do - - echo 2 >/proc/sys/net/ipv6/conf/all/use_tempaddr - - See <file:Documentation/networking/ip-sysctl.txt> for details. - config IPV6_ROUTER_PREF bool "IPv6: Router Preference (RFC 4191) support" ---help--- @@ -153,6 +135,17 @@ config INET6_XFRM_MODE_ROUTEOPTIMIZATION ---help--- Support for MIPv6 route optimization mode. +config IPV6_VTI +tristate "Virtual (secure) IPv6: tunneling" + select IPV6_TUNNEL + depends on INET6_XFRM_MODE_TUNNEL + ---help--- + Tunneling means encapsulating data of one protocol type within + another protocol and sending it over a channel that understands the + encapsulating protocol. This can be used with xfrm mode tunnel to give + the notion of a secure tunnel for IPSEC and then use routing protocol + on top. + config IPV6_SIT tristate "IPv6: IPv6-in-IPv4 tunnel (SIT driver)" select INET_TUNNEL diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile index 470a9c008e9b..17bb830872db 100644 --- a/net/ipv6/Makefile +++ b/net/ipv6/Makefile @@ -36,6 +36,7 @@ obj-$(CONFIG_INET6_XFRM_MODE_BEET) += xfrm6_mode_beet.o obj-$(CONFIG_IPV6_MIP6) += mip6.o obj-$(CONFIG_NETFILTER) += netfilter/ +obj-$(CONFIG_IPV6_VTI) += ip6_vti.o obj-$(CONFIG_IPV6_SIT) += sit.o obj-$(CONFIG_IPV6_TUNNEL) += ip6_tunnel.o obj-$(CONFIG_IPV6_GRE) += ip6_gre.o diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index cd3fb301da38..542d09561ed6 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -83,11 +83,7 @@ #include <linux/if_tunnel.h> #include <linux/rtnetlink.h> #include <linux/netconf.h> - -#ifdef CONFIG_IPV6_PRIVACY #include <linux/random.h> -#endif - #include <linux/uaccess.h> #include <asm/unaligned.h> @@ -124,11 +120,9 @@ static inline void addrconf_sysctl_unregister(struct inet6_dev *idev) } #endif -#ifdef CONFIG_IPV6_PRIVACY static void __ipv6_regen_rndid(struct inet6_dev *idev); static void __ipv6_try_regen_rndid(struct inet6_dev *idev, struct in6_addr *tmpaddr); static void ipv6_regen_rndid(unsigned long data); -#endif static int ipv6_generate_eui64(u8 *eui, struct net_device *dev); static int ipv6_count_addresses(struct inet6_dev *idev); @@ -183,13 +177,11 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = { .rtr_solicits = MAX_RTR_SOLICITATIONS, .rtr_solicit_interval = RTR_SOLICITATION_INTERVAL, .rtr_solicit_delay = MAX_RTR_SOLICITATION_DELAY, -#ifdef CONFIG_IPV6_PRIVACY .use_tempaddr = 0, .temp_valid_lft = TEMP_VALID_LIFETIME, .temp_prefered_lft = TEMP_PREFERRED_LIFETIME, .regen_max_retry = REGEN_MAX_RETRY, .max_desync_factor = MAX_DESYNC_FACTOR, -#endif .max_addresses = IPV6_MAX_ADDRESSES, .accept_ra_defrtr = 1, .accept_ra_pinfo = 1, @@ -221,13 +213,11 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = { .rtr_solicits = MAX_RTR_SOLICITATIONS, .rtr_solicit_interval = RTR_SOLICITATION_INTERVAL, .rtr_solicit_delay = MAX_RTR_SOLICITATION_DELAY, -#ifdef CONFIG_IPV6_PRIVACY .use_tempaddr = 0, .temp_valid_lft = TEMP_VALID_LIFETIME, .temp_prefered_lft = TEMP_PREFERRED_LIFETIME, .regen_max_retry = REGEN_MAX_RETRY, .max_desync_factor = MAX_DESYNC_FACTOR, -#endif .max_addresses = IPV6_MAX_ADDRESSES, .accept_ra_defrtr = 1, .accept_ra_pinfo = 1, @@ -371,7 +361,6 @@ static struct inet6_dev *ipv6_add_dev(struct net_device *dev) } #endif -#ifdef CONFIG_IPV6_PRIVACY INIT_LIST_HEAD(&ndev->tempaddr_list); setup_timer(&ndev->regen_timer, ipv6_regen_rndid, (unsigned long)ndev); if ((dev->flags&IFF_LOOPBACK) || @@ -384,7 +373,7 @@ static struct inet6_dev *ipv6_add_dev(struct net_device *dev) in6_dev_hold(ndev); ipv6_regen_rndid((unsigned long) ndev); } -#endif + ndev->token = in6addr_any; if (netif_running(dev) && addrconf_qdisc_ok(dev)) @@ -865,12 +854,10 @@ ipv6_add_addr(struct inet6_dev *idev, const struct in6_addr *addr, /* Add to inet6_dev unicast addr list. */ ipv6_link_dev_addr(idev, ifa); -#ifdef CONFIG_IPV6_PRIVACY if (ifa->flags&IFA_F_TEMPORARY) { list_add(&ifa->tmp_list, &idev->tempaddr_list); in6_ifa_hold(ifa); } -#endif in6_ifa_hold(ifa); write_unlock(&idev->lock); @@ -913,7 +900,7 @@ static void ipv6_del_addr(struct inet6_ifaddr *ifp) spin_unlock_bh(&addrconf_hash_lock); write_lock_bh(&idev->lock); -#ifdef CONFIG_IPV6_PRIVACY + if (ifp->flags&IFA_F_TEMPORARY) { list_del(&ifp->tmp_list); if (ifp->ifpub) { @@ -922,7 +909,6 @@ static void ipv6_del_addr(struct inet6_ifaddr *ifp) } __in6_ifa_put(ifp); } -#endif list_for_each_entry_safe(ifa, ifn, &idev->addr_list, if_list) { if (ifa == ifp) { @@ -1013,7 +999,6 @@ out: in6_ifa_put(ifp); } -#ifdef CONFIG_IPV6_PRIVACY static int ipv6_create_tempaddr(struct inet6_ifaddr *ifp, struct inet6_ifaddr *ift) { struct inet6_dev *idev = ifp->idev; @@ -1116,7 +1101,6 @@ retry: out: return ret; } -#endif /* * Choose an appropriate source address (RFC3484) @@ -1131,9 +1115,7 @@ enum { #endif IPV6_SADDR_RULE_OIF, IPV6_SADDR_RULE_LABEL, -#ifdef CONFIG_IPV6_PRIVACY IPV6_SADDR_RULE_PRIVACY, -#endif IPV6_SADDR_RULE_ORCHID, IPV6_SADDR_RULE_PREFIX, IPV6_SADDR_RULE_MAX @@ -1247,7 +1229,6 @@ static int ipv6_get_saddr_eval(struct net *net, &score->ifa->addr, score->addr_type, score->ifa->idev->dev->ifindex) == dst->label; break; -#ifdef CONFIG_IPV6_PRIVACY case IPV6_SADDR_RULE_PRIVACY: { /* Rule 7: Prefer public address @@ -1259,7 +1240,6 @@ static int ipv6_get_saddr_eval(struct net *net, ret = (!(score->ifa->flags & IFA_F_TEMPORARY)) ^ preftmp; break; } -#endif case IPV6_SADDR_RULE_ORCHID: /* Rule 8-: Prefer ORCHID vs ORCHID or * non-ORCHID vs non-ORCHID @@ -1588,7 +1568,6 @@ static void addrconf_dad_stop(struct inet6_ifaddr *ifp, int dad_failed) if (dad_failed) ipv6_ifa_notify(0, ifp); in6_ifa_put(ifp); -#ifdef CONFIG_IPV6_PRIVACY } else if (ifp->flags&IFA_F_TEMPORARY) { struct inet6_ifaddr *ifpub; spin_lock_bh(&ifp->lock); @@ -1602,7 +1581,6 @@ static void addrconf_dad_stop(struct inet6_ifaddr *ifp, int dad_failed) spin_unlock_bh(&ifp->lock); } ipv6_del_addr(ifp); -#endif } else ipv6_del_addr(ifp); } @@ -1851,7 +1829,6 @@ static int ipv6_inherit_eui64(u8 *eui, struct inet6_dev *idev) return err; } -#ifdef CONFIG_IPV6_PRIVACY /* (re)generation of randomized interface identifier (RFC 3041 3.2, 3.5) */ static void __ipv6_regen_rndid(struct inet6_dev *idev) { @@ -1919,7 +1896,6 @@ static void __ipv6_try_regen_rndid(struct inet6_dev *idev, struct in6_addr *tmp if (tmpaddr && memcmp(idev->rndid, &tmpaddr->s6_addr[8], 8) == 0) __ipv6_regen_rndid(idev); } -#endif /* * Add prefix route. @@ -2207,9 +2183,7 @@ ok: if (ifp) { int flags; unsigned long now; -#ifdef CONFIG_IPV6_PRIVACY struct inet6_ifaddr *ift; -#endif u32 stored_lft; /* update lifetime (RFC2462 5.5.3 e) */ @@ -2250,7 +2224,6 @@ ok: } else spin_unlock(&ifp->lock); -#ifdef CONFIG_IPV6_PRIVACY read_lock_bh(&in6_dev->lock); /* update all temporary addresses in the list */ list_for_each_entry(ift, &in6_dev->tempaddr_list, @@ -2315,7 +2288,7 @@ ok: } else { read_unlock_bh(&in6_dev->lock); } -#endif + in6_ifa_put(ifp); addrconf_verify(0); } @@ -2995,7 +2968,6 @@ static int addrconf_ifdown(struct net_device *dev, int how) if (!how) idev->if_flags &= ~(IF_RS_SENT|IF_RA_RCVD|IF_READY); -#ifdef CONFIG_IPV6_PRIVACY if (how && del_timer(&idev->regen_timer)) in6_dev_put(idev); @@ -3015,7 +2987,6 @@ static int addrconf_ifdown(struct net_device *dev, int how) in6_ifa_put(ifa); write_lock_bh(&idev->lock); } -#endif while (!list_empty(&idev->addr_list)) { ifa = list_first_entry(&idev->addr_list, @@ -3528,7 +3499,6 @@ restart: in6_ifa_put(ifp); goto restart; } -#ifdef CONFIG_IPV6_PRIVACY } else if ((ifp->flags&IFA_F_TEMPORARY) && !(ifp->flags&IFA_F_TENTATIVE)) { unsigned long regen_advance = ifp->idev->cnf.regen_max_retry * @@ -3556,7 +3526,6 @@ restart: } else if (time_before(ifp->tstamp + ifp->prefered_lft * HZ - regen_advance * HZ, next)) next = ifp->tstamp + ifp->prefered_lft * HZ - regen_advance * HZ; spin_unlock(&ifp->lock); -#endif } else { /* ifp->prefered_lft <= ifp->valid_lft */ if (time_before(ifp->tstamp + ifp->prefered_lft * HZ, next)) @@ -4128,13 +4097,11 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf, jiffies_to_msecs(cnf->mldv1_unsolicited_report_interval); array[DEVCONF_MLDV2_UNSOLICITED_REPORT_INTERVAL] = jiffies_to_msecs(cnf->mldv2_unsolicited_report_interval); -#ifdef CONFIG_IPV6_PRIVACY array[DEVCONF_USE_TEMPADDR] = cnf->use_tempaddr; array[DEVCONF_TEMP_VALID_LFT] = cnf->temp_valid_lft; array[DEVCONF_TEMP_PREFERED_LFT] = cnf->temp_prefered_lft; array[DEVCONF_REGEN_MAX_RETRY] = cnf->regen_max_retry; array[DEVCONF_MAX_DESYNC_FACTOR] = cnf->max_desync_factor; -#endif array[DEVCONF_MAX_ADDRESSES] = cnf->max_addresses; array[DEVCONF_ACCEPT_RA_DEFRTR] = cnf->accept_ra_defrtr; array[DEVCONF_ACCEPT_RA_PINFO] = cnf->accept_ra_pinfo; @@ -4828,7 +4795,6 @@ static struct addrconf_sysctl_table .mode = 0644, .proc_handler = proc_dointvec_ms_jiffies, }, -#ifdef CONFIG_IPV6_PRIVACY { .procname = "use_tempaddr", .data = &ipv6_devconf.use_tempaddr, @@ -4864,7 +4830,6 @@ static struct addrconf_sysctl_table .mode = 0644, .proc_handler = proc_dointvec, }, -#endif { .procname = "max_addresses", .data = &ipv6_devconf.max_addresses, diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index 7c96100b021e..6468bda1f2b9 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -110,11 +110,6 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol, int try_loading_module = 0; int err; - if (sock->type != SOCK_RAW && - sock->type != SOCK_DGRAM && - !inet_ehash_secret) - build_ehash_secret(); - /* Look for the requested type/protocol pair. */ lookup_protocol: err = -ESOCKTNOSUPPORT; @@ -364,7 +359,7 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) inet->inet_rcv_saddr = v4addr; inet->inet_saddr = v4addr; - np->rcv_saddr = addr->sin6_addr; + sk->sk_v6_rcv_saddr = addr->sin6_addr; if (!(addr_type & IPV6_ADDR_MULTICAST)) np->saddr = addr->sin6_addr; @@ -461,14 +456,14 @@ int inet6_getname(struct socket *sock, struct sockaddr *uaddr, peer == 1) return -ENOTCONN; sin->sin6_port = inet->inet_dport; - sin->sin6_addr = np->daddr; + sin->sin6_addr = sk->sk_v6_daddr; if (np->sndflow) sin->sin6_flowinfo = np->flow_label; } else { - if (ipv6_addr_any(&np->rcv_saddr)) + if (ipv6_addr_any(&sk->sk_v6_rcv_saddr)) sin->sin6_addr = np->saddr; else - sin->sin6_addr = np->rcv_saddr; + sin->sin6_addr = sk->sk_v6_rcv_saddr; sin->sin6_port = inet->inet_sport; } @@ -655,7 +650,7 @@ int inet6_sk_rebuild_header(struct sock *sk) memset(&fl6, 0, sizeof(fl6)); fl6.flowi6_proto = sk->sk_protocol; - fl6.daddr = np->daddr; + fl6.daddr = sk->sk_v6_daddr; fl6.saddr = np->saddr; fl6.flowlabel = np->flow_label; fl6.flowi6_oif = sk->sk_bound_dev_if; @@ -870,8 +865,6 @@ static int __init inet6_init(void) if (err) goto out_sock_register_fail; - tcpv6_prot.sysctl_mem = init_net.ipv4.sysctl_tcp_mem; - /* * ipngwg API draft makes clear that the correct semantics * for TCP and UDP is to consider one TCP and UDP instance @@ -1028,52 +1021,4 @@ out_unregister_tcp_proto: } module_init(inet6_init); -static void __exit inet6_exit(void) -{ - if (disable_ipv6_mod) - return; - - /* First of all disallow new sockets creation. */ - sock_unregister(PF_INET6); - /* Disallow any further netlink messages */ - rtnl_unregister_all(PF_INET6); - - udpv6_exit(); - udplitev6_exit(); - tcpv6_exit(); - - /* Cleanup code parts. */ - ipv6_packet_cleanup(); - ipv6_frag_exit(); - ipv6_exthdrs_exit(); - addrconf_cleanup(); - ip6_flowlabel_cleanup(); - ndisc_late_cleanup(); - ip6_route_cleanup(); -#ifdef CONFIG_PROC_FS - - /* Cleanup code parts. */ - if6_proc_exit(); - ipv6_misc_proc_exit(); - udplite6_proc_exit(); - raw6_proc_exit(); -#endif - ipv6_netfilter_fini(); - ipv6_stub = NULL; - igmp6_cleanup(); - ndisc_cleanup(); - ip6_mr_cleanup(); - icmpv6_cleanup(); - rawv6_exit(); - - unregister_pernet_subsys(&inet6_net_ops); - proto_unregister(&rawv6_prot); - proto_unregister(&udplitev6_prot); - proto_unregister(&udpv6_prot); - proto_unregister(&tcpv6_prot); - - rcu_barrier(); /* Wait for completion of call_rcu()'s */ -} -module_exit(inet6_exit); - MODULE_ALIAS_NETPROTO(PF_INET6); diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c index 48b6bd2a9a14..a454b0ff57c7 100644 --- a/net/ipv6/datagram.c +++ b/net/ipv6/datagram.c @@ -107,16 +107,16 @@ ipv4_connected: if (err) goto out; - ipv6_addr_set_v4mapped(inet->inet_daddr, &np->daddr); + ipv6_addr_set_v4mapped(inet->inet_daddr, &sk->sk_v6_daddr); if (ipv6_addr_any(&np->saddr) || ipv6_mapped_addr_any(&np->saddr)) ipv6_addr_set_v4mapped(inet->inet_saddr, &np->saddr); - if (ipv6_addr_any(&np->rcv_saddr) || - ipv6_mapped_addr_any(&np->rcv_saddr)) { + if (ipv6_addr_any(&sk->sk_v6_rcv_saddr) || + ipv6_mapped_addr_any(&sk->sk_v6_rcv_saddr)) { ipv6_addr_set_v4mapped(inet->inet_rcv_saddr, - &np->rcv_saddr); + &sk->sk_v6_rcv_saddr); if (sk->sk_prot->rehash) sk->sk_prot->rehash(sk); } @@ -145,7 +145,7 @@ ipv4_connected: } } - np->daddr = *daddr; + sk->sk_v6_daddr = *daddr; np->flow_label = fl6.flowlabel; inet->inet_dport = usin->sin6_port; @@ -156,7 +156,7 @@ ipv4_connected: */ fl6.flowi6_proto = sk->sk_protocol; - fl6.daddr = np->daddr; + fl6.daddr = sk->sk_v6_daddr; fl6.saddr = np->saddr; fl6.flowi6_oif = sk->sk_bound_dev_if; fl6.flowi6_mark = sk->sk_mark; @@ -183,16 +183,16 @@ ipv4_connected: if (ipv6_addr_any(&np->saddr)) np->saddr = fl6.saddr; - if (ipv6_addr_any(&np->rcv_saddr)) { - np->rcv_saddr = fl6.saddr; + if (ipv6_addr_any(&sk->sk_v6_rcv_saddr)) { + sk->sk_v6_rcv_saddr = fl6.saddr; inet->inet_rcv_saddr = LOOPBACK4_IPV6; if (sk->sk_prot->rehash) sk->sk_prot->rehash(sk); } ip6_dst_store(sk, dst, - ipv6_addr_equal(&fl6.daddr, &np->daddr) ? - &np->daddr : NULL, + ipv6_addr_equal(&fl6.daddr, &sk->sk_v6_daddr) ? + &sk->sk_v6_daddr : NULL, #ifdef CONFIG_IPV6_SUBTREES ipv6_addr_equal(&fl6.saddr, &np->saddr) ? &np->saddr : @@ -883,11 +883,10 @@ EXPORT_SYMBOL_GPL(ip6_datagram_send_ctl); void ip6_dgram_sock_seq_show(struct seq_file *seq, struct sock *sp, __u16 srcp, __u16 destp, int bucket) { - struct ipv6_pinfo *np = inet6_sk(sp); const struct in6_addr *dest, *src; - dest = &np->daddr; - src = &np->rcv_saddr; + dest = &sp->sk_v6_daddr; + src = &sp->sk_v6_rcv_saddr; seq_printf(seq, "%5d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X " "%02X %08X:%08X %02X:%08lX %08X %5u %8d %lu %d %pK %d\n", diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c index e67e63f9858d..b8719df0366e 100644 --- a/net/ipv6/esp6.c +++ b/net/ipv6/esp6.c @@ -164,10 +164,9 @@ static int esp6_output(struct xfrm_state *x, struct sk_buff *skb) u8 *iv; u8 *tail; __be32 *seqhi; - struct esp_data *esp = x->data; /* skb is pure payload to encrypt */ - aead = esp->aead; + aead = x->data; alen = crypto_aead_authsize(aead); tfclen = 0; @@ -181,8 +180,6 @@ static int esp6_output(struct xfrm_state *x, struct sk_buff *skb) } blksize = ALIGN(crypto_aead_blocksize(aead), 4); clen = ALIGN(skb->len + 2 + tfclen, blksize); - if (esp->padlen) - clen = ALIGN(clen, esp->padlen); plen = clen - skb->len - tfclen; err = skb_cow_data(skb, tfclen + plen + alen, &trailer); @@ -271,8 +268,7 @@ error: static int esp_input_done2(struct sk_buff *skb, int err) { struct xfrm_state *x = xfrm_input_state(skb); - struct esp_data *esp = x->data; - struct crypto_aead *aead = esp->aead; + struct crypto_aead *aead = x->data; int alen = crypto_aead_authsize(aead); int hlen = sizeof(struct ip_esp_hdr) + crypto_aead_ivsize(aead); int elen = skb->len - hlen; @@ -325,8 +321,7 @@ static void esp_input_done(struct crypto_async_request *base, int err) static int esp6_input(struct xfrm_state *x, struct sk_buff *skb) { struct ip_esp_hdr *esph; - struct esp_data *esp = x->data; - struct crypto_aead *aead = esp->aead; + struct crypto_aead *aead = x->data; struct aead_request *req; struct sk_buff *trailer; int elen = skb->len - sizeof(*esph) - crypto_aead_ivsize(aead); @@ -414,9 +409,8 @@ out: static u32 esp6_get_mtu(struct xfrm_state *x, int mtu) { - struct esp_data *esp = x->data; - u32 blksize = ALIGN(crypto_aead_blocksize(esp->aead), 4); - u32 align = max_t(u32, blksize, esp->padlen); + struct crypto_aead *aead = x->data; + u32 blksize = ALIGN(crypto_aead_blocksize(aead), 4); unsigned int net_adj; if (x->props.mode != XFRM_MODE_TUNNEL) @@ -424,8 +418,8 @@ static u32 esp6_get_mtu(struct xfrm_state *x, int mtu) else net_adj = 0; - return ((mtu - x->props.header_len - crypto_aead_authsize(esp->aead) - - net_adj) & ~(align - 1)) + net_adj - 2; + return ((mtu - x->props.header_len - crypto_aead_authsize(aead) - + net_adj) & ~(blksize - 1)) + net_adj - 2; } static void esp6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, @@ -454,18 +448,16 @@ static void esp6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, static void esp6_destroy(struct xfrm_state *x) { - struct esp_data *esp = x->data; + struct crypto_aead *aead = x->data; - if (!esp) + if (!aead) return; - crypto_free_aead(esp->aead); - kfree(esp); + crypto_free_aead(aead); } static int esp_init_aead(struct xfrm_state *x) { - struct esp_data *esp = x->data; struct crypto_aead *aead; int err; @@ -474,7 +466,7 @@ static int esp_init_aead(struct xfrm_state *x) if (IS_ERR(aead)) goto error; - esp->aead = aead; + x->data = aead; err = crypto_aead_setkey(aead, x->aead->alg_key, (x->aead->alg_key_len + 7) / 8); @@ -491,7 +483,6 @@ error: static int esp_init_authenc(struct xfrm_state *x) { - struct esp_data *esp = x->data; struct crypto_aead *aead; struct crypto_authenc_key_param *param; struct rtattr *rta; @@ -526,7 +517,7 @@ static int esp_init_authenc(struct xfrm_state *x) if (IS_ERR(aead)) goto error; - esp->aead = aead; + x->data = aead; keylen = (x->aalg ? (x->aalg->alg_key_len + 7) / 8 : 0) + (x->ealg->alg_key_len + 7) / 8 + RTA_SPACE(sizeof(*param)); @@ -581,7 +572,6 @@ error: static int esp6_init_state(struct xfrm_state *x) { - struct esp_data *esp; struct crypto_aead *aead; u32 align; int err; @@ -589,11 +579,7 @@ static int esp6_init_state(struct xfrm_state *x) if (x->encap) return -EINVAL; - esp = kzalloc(sizeof(*esp), GFP_KERNEL); - if (esp == NULL) - return -ENOMEM; - - x->data = esp; + x->data = NULL; if (x->aead) err = esp_init_aead(x); @@ -603,9 +589,7 @@ static int esp6_init_state(struct xfrm_state *x) if (err) goto error; - aead = esp->aead; - - esp->padlen = 0; + aead = x->data; x->props.header_len = sizeof(struct ip_esp_hdr) + crypto_aead_ivsize(aead); @@ -625,9 +609,7 @@ static int esp6_init_state(struct xfrm_state *x) } align = ALIGN(crypto_aead_blocksize(aead), 4); - if (esp->padlen) - align = max_t(u32, align, esp->padlen); - x->props.trailer_len = align + 1 + crypto_aead_authsize(esp->aead); + x->props.trailer_len = align + 1 + crypto_aead_authsize(aead); error: return err; diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c index e4311cbc8b4e..77bb8afb141d 100644 --- a/net/ipv6/inet6_connection_sock.c +++ b/net/ipv6/inet6_connection_sock.c @@ -70,20 +70,20 @@ struct dst_entry *inet6_csk_route_req(struct sock *sk, struct flowi6 *fl6, const struct request_sock *req) { - struct inet6_request_sock *treq = inet6_rsk(req); + struct inet_request_sock *ireq = inet_rsk(req); struct ipv6_pinfo *np = inet6_sk(sk); struct in6_addr *final_p, final; struct dst_entry *dst; memset(fl6, 0, sizeof(*fl6)); fl6->flowi6_proto = IPPROTO_TCP; - fl6->daddr = treq->rmt_addr; + fl6->daddr = ireq->ir_v6_rmt_addr; final_p = fl6_update_dst(fl6, np->opt, &final); - fl6->saddr = treq->loc_addr; - fl6->flowi6_oif = treq->iif; + fl6->saddr = ireq->ir_v6_loc_addr; + fl6->flowi6_oif = ireq->ir_iif; fl6->flowi6_mark = sk->sk_mark; - fl6->fl6_dport = inet_rsk(req)->rmt_port; - fl6->fl6_sport = inet_rsk(req)->loc_port; + fl6->fl6_dport = ireq->ir_rmt_port; + fl6->fl6_sport = htons(ireq->ir_num); security_req_classify_flow(req, flowi6_to_flowi(fl6)); dst = ip6_dst_lookup_flow(sk, fl6, final_p, false); @@ -129,13 +129,13 @@ struct request_sock *inet6_csk_search_req(const struct sock *sk, lopt->nr_table_entries)]; (req = *prev) != NULL; prev = &req->dl_next) { - const struct inet6_request_sock *treq = inet6_rsk(req); + const struct inet_request_sock *ireq = inet_rsk(req); - if (inet_rsk(req)->rmt_port == rport && + if (ireq->ir_rmt_port == rport && req->rsk_ops->family == AF_INET6 && - ipv6_addr_equal(&treq->rmt_addr, raddr) && - ipv6_addr_equal(&treq->loc_addr, laddr) && - (!treq->iif || treq->iif == iif)) { + ipv6_addr_equal(&ireq->ir_v6_rmt_addr, raddr) && + ipv6_addr_equal(&ireq->ir_v6_loc_addr, laddr) && + (!ireq->ir_iif || ireq->ir_iif == iif)) { WARN_ON(req->sk != NULL); *prevp = prev; return req; @@ -153,8 +153,8 @@ void inet6_csk_reqsk_queue_hash_add(struct sock *sk, { struct inet_connection_sock *icsk = inet_csk(sk); struct listen_sock *lopt = icsk->icsk_accept_queue.listen_opt; - const u32 h = inet6_synq_hash(&inet6_rsk(req)->rmt_addr, - inet_rsk(req)->rmt_port, + const u32 h = inet6_synq_hash(&inet_rsk(req)->ir_v6_rmt_addr, + inet_rsk(req)->ir_rmt_port, lopt->hash_rnd, lopt->nr_table_entries); reqsk_queue_hash_req(&icsk->icsk_accept_queue, h, req, timeout); @@ -165,11 +165,10 @@ EXPORT_SYMBOL_GPL(inet6_csk_reqsk_queue_hash_add); void inet6_csk_addr2sockaddr(struct sock *sk, struct sockaddr * uaddr) { - struct ipv6_pinfo *np = inet6_sk(sk); struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *) uaddr; sin6->sin6_family = AF_INET6; - sin6->sin6_addr = np->daddr; + sin6->sin6_addr = sk->sk_v6_daddr; sin6->sin6_port = inet_sk(sk)->inet_dport; /* We do not store received flowlabel for TCP */ sin6->sin6_flowinfo = 0; @@ -203,7 +202,7 @@ static struct dst_entry *inet6_csk_route_socket(struct sock *sk, memset(fl6, 0, sizeof(*fl6)); fl6->flowi6_proto = sk->sk_protocol; - fl6->daddr = np->daddr; + fl6->daddr = sk->sk_v6_daddr; fl6->saddr = np->saddr; fl6->flowlabel = np->flow_label; IP6_ECN_flow_xmit(sk, fl6->flowlabel); @@ -245,7 +244,7 @@ int inet6_csk_xmit(struct sk_buff *skb, struct flowi *fl_unused) skb_dst_set_noref(skb, dst); /* Restore final destination back after routing done */ - fl6.daddr = np->daddr; + fl6.daddr = sk->sk_v6_daddr; res = ip6_xmit(sk, skb, &fl6, np->opt, np->tclass); rcu_read_unlock(); diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c index 066640e0ba8e..262e13c02ec2 100644 --- a/net/ipv6/inet6_hashtables.c +++ b/net/ipv6/inet6_hashtables.c @@ -23,6 +23,39 @@ #include <net/secure_seq.h> #include <net/ip.h> +static unsigned int inet6_ehashfn(struct net *net, + const struct in6_addr *laddr, + const u16 lport, + const struct in6_addr *faddr, + const __be16 fport) +{ + static u32 inet6_ehash_secret __read_mostly; + static u32 ipv6_hash_secret __read_mostly; + + u32 lhash, fhash; + + net_get_random_once(&inet6_ehash_secret, sizeof(inet6_ehash_secret)); + net_get_random_once(&ipv6_hash_secret, sizeof(ipv6_hash_secret)); + + lhash = (__force u32)laddr->s6_addr32[3]; + fhash = __ipv6_addr_jhash(faddr, ipv6_hash_secret); + + return __inet6_ehashfn(lhash, lport, fhash, fport, + inet6_ehash_secret + net_hash_mix(net)); +} + +static int inet6_sk_ehashfn(const struct sock *sk) +{ + const struct inet_sock *inet = inet_sk(sk); + const struct in6_addr *laddr = &sk->sk_v6_rcv_saddr; + const struct in6_addr *faddr = &sk->sk_v6_daddr; + const __u16 lport = inet->inet_num; + const __be16 fport = inet->inet_dport; + struct net *net = sock_net(sk); + + return inet6_ehashfn(net, laddr, lport, faddr, fport); +} + int __inet6_hash(struct sock *sk, struct inet_timewait_sock *tw) { struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo; @@ -89,43 +122,22 @@ begin: sk_nulls_for_each_rcu(sk, node, &head->chain) { if (sk->sk_hash != hash) continue; - if (likely(INET6_MATCH(sk, net, saddr, daddr, ports, dif))) { - if (unlikely(!atomic_inc_not_zero(&sk->sk_refcnt))) - goto begintw; - if (unlikely(!INET6_MATCH(sk, net, saddr, daddr, - ports, dif))) { - sock_put(sk); - goto begin; - } - goto out; - } - } - if (get_nulls_value(node) != slot) - goto begin; - -begintw: - /* Must check for a TIME_WAIT'er before going to listener hash. */ - sk_nulls_for_each_rcu(sk, node, &head->twchain) { - if (sk->sk_hash != hash) + if (!INET6_MATCH(sk, net, saddr, daddr, ports, dif)) continue; - if (likely(INET6_TW_MATCH(sk, net, saddr, daddr, - ports, dif))) { - if (unlikely(!atomic_inc_not_zero(&sk->sk_refcnt))) { - sk = NULL; - goto out; - } - if (unlikely(!INET6_TW_MATCH(sk, net, saddr, daddr, - ports, dif))) { - inet_twsk_put(inet_twsk(sk)); - goto begintw; - } + if (unlikely(!atomic_inc_not_zero(&sk->sk_refcnt))) goto out; + + if (unlikely(!INET6_MATCH(sk, net, saddr, daddr, ports, dif))) { + sock_gen_put(sk); + goto begin; } + goto found; } if (get_nulls_value(node) != slot) - goto begintw; - sk = NULL; + goto begin; out: + sk = NULL; +found: rcu_read_unlock(); return sk; } @@ -140,11 +152,10 @@ static inline int compute_score(struct sock *sk, struct net *net, if (net_eq(sock_net(sk), net) && inet_sk(sk)->inet_num == hnum && sk->sk_family == PF_INET6) { - const struct ipv6_pinfo *np = inet6_sk(sk); score = 1; - if (!ipv6_addr_any(&np->rcv_saddr)) { - if (!ipv6_addr_equal(&np->rcv_saddr, daddr)) + if (!ipv6_addr_any(&sk->sk_v6_rcv_saddr)) { + if (!ipv6_addr_equal(&sk->sk_v6_rcv_saddr, daddr)) return -1; score++; } @@ -236,9 +247,8 @@ static int __inet6_check_established(struct inet_timewait_death_row *death_row, { struct inet_hashinfo *hinfo = death_row->hashinfo; struct inet_sock *inet = inet_sk(sk); - const struct ipv6_pinfo *np = inet6_sk(sk); - const struct in6_addr *daddr = &np->rcv_saddr; - const struct in6_addr *saddr = &np->daddr; + const struct in6_addr *daddr = &sk->sk_v6_rcv_saddr; + const struct in6_addr *saddr = &sk->sk_v6_daddr; const int dif = sk->sk_bound_dev_if; const __portpair ports = INET_COMBINED_PORTS(inet->inet_dport, lport); struct net *net = sock_net(sk); @@ -248,38 +258,28 @@ static int __inet6_check_established(struct inet_timewait_death_row *death_row, spinlock_t *lock = inet_ehash_lockp(hinfo, hash); struct sock *sk2; const struct hlist_nulls_node *node; - struct inet_timewait_sock *tw; + struct inet_timewait_sock *tw = NULL; int twrefcnt = 0; spin_lock(lock); - /* Check TIME-WAIT sockets first. */ - sk_nulls_for_each(sk2, node, &head->twchain) { - if (sk2->sk_hash != hash) - continue; - - if (likely(INET6_TW_MATCH(sk2, net, saddr, daddr, - ports, dif))) { - tw = inet_twsk(sk2); - if (twsk_unique(sk, sk2, twp)) - goto unique; - else - goto not_unique; - } - } - tw = NULL; - - /* And established part... */ sk_nulls_for_each(sk2, node, &head->chain) { if (sk2->sk_hash != hash) continue; - if (likely(INET6_MATCH(sk2, net, saddr, daddr, ports, dif))) + + if (likely(INET6_MATCH(sk2, net, saddr, daddr, ports, dif))) { + if (sk2->sk_state == TCP_TIME_WAIT) { + tw = inet_twsk(sk2); + if (twsk_unique(sk, sk2, twp)) + break; + } goto not_unique; + } } -unique: /* Must record num and sport now. Otherwise we will see - * in hash table socket with a funny identity. */ + * in hash table socket with a funny identity. + */ inet->inet_num = lport; inet->inet_sport = htons(lport); sk->sk_hash = hash; @@ -312,9 +312,9 @@ not_unique: static inline u32 inet6_sk_port_offset(const struct sock *sk) { const struct inet_sock *inet = inet_sk(sk); - const struct ipv6_pinfo *np = inet6_sk(sk); - return secure_ipv6_port_ephemeral(np->rcv_saddr.s6_addr32, - np->daddr.s6_addr32, + + return secure_ipv6_port_ephemeral(sk->sk_v6_rcv_saddr.s6_addr32, + sk->sk_v6_daddr.s6_addr32, inet->inet_dport); } diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 5bec666aba61..5550a8113a6d 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -1529,25 +1529,6 @@ static void fib6_clean_tree(struct net *net, struct fib6_node *root, fib6_walk(&c.w); } -void fib6_clean_all_ro(struct net *net, int (*func)(struct rt6_info *, void *arg), - int prune, void *arg) -{ - struct fib6_table *table; - struct hlist_head *head; - unsigned int h; - - rcu_read_lock(); - for (h = 0; h < FIB6_TABLE_HASHSZ; h++) { - head = &net->ipv6.fib_table_hash[h]; - hlist_for_each_entry_rcu(table, head, tb6_hlist) { - read_lock_bh(&table->tb6_lock); - fib6_clean_tree(net, &table->tb6_root, - func, prune, arg); - read_unlock_bh(&table->tb6_lock); - } - } - rcu_read_unlock(); -} void fib6_clean_all(struct net *net, int (*func)(struct rt6_info *, void *arg), int prune, void *arg) { @@ -1782,3 +1763,189 @@ void fib6_gc_cleanup(void) unregister_pernet_subsys(&fib6_net_ops); kmem_cache_destroy(fib6_node_kmem); } + +#ifdef CONFIG_PROC_FS + +struct ipv6_route_iter { + struct seq_net_private p; + struct fib6_walker_t w; + loff_t skip; + struct fib6_table *tbl; + __u32 sernum; +}; + +static int ipv6_route_seq_show(struct seq_file *seq, void *v) +{ + struct rt6_info *rt = v; + struct ipv6_route_iter *iter = seq->private; + + seq_printf(seq, "%pi6 %02x ", &rt->rt6i_dst.addr, rt->rt6i_dst.plen); + +#ifdef CONFIG_IPV6_SUBTREES + seq_printf(seq, "%pi6 %02x ", &rt->rt6i_src.addr, rt->rt6i_src.plen); +#else + seq_puts(seq, "00000000000000000000000000000000 00 "); +#endif + if (rt->rt6i_flags & RTF_GATEWAY) + seq_printf(seq, "%pi6", &rt->rt6i_gateway); + else + seq_puts(seq, "00000000000000000000000000000000"); + + seq_printf(seq, " %08x %08x %08x %08x %8s\n", + rt->rt6i_metric, atomic_read(&rt->dst.__refcnt), + rt->dst.__use, rt->rt6i_flags, + rt->dst.dev ? rt->dst.dev->name : ""); + iter->w.leaf = NULL; + return 0; +} + +static int ipv6_route_yield(struct fib6_walker_t *w) +{ + struct ipv6_route_iter *iter = w->args; + + if (!iter->skip) + return 1; + + do { + iter->w.leaf = iter->w.leaf->dst.rt6_next; + iter->skip--; + if (!iter->skip && iter->w.leaf) + return 1; + } while (iter->w.leaf); + + return 0; +} + +static void ipv6_route_seq_setup_walk(struct ipv6_route_iter *iter) +{ + memset(&iter->w, 0, sizeof(iter->w)); + iter->w.func = ipv6_route_yield; + iter->w.root = &iter->tbl->tb6_root; + iter->w.state = FWS_INIT; + iter->w.node = iter->w.root; + iter->w.args = iter; + iter->sernum = iter->w.root->fn_sernum; + INIT_LIST_HEAD(&iter->w.lh); + fib6_walker_link(&iter->w); +} + +static struct fib6_table *ipv6_route_seq_next_table(struct fib6_table *tbl, + struct net *net) +{ + unsigned int h; + struct hlist_node *node; + + if (tbl) { + h = (tbl->tb6_id & (FIB6_TABLE_HASHSZ - 1)) + 1; + node = rcu_dereference_bh(hlist_next_rcu(&tbl->tb6_hlist)); + } else { + h = 0; + node = NULL; + } + + while (!node && h < FIB6_TABLE_HASHSZ) { + node = rcu_dereference_bh( + hlist_first_rcu(&net->ipv6.fib_table_hash[h++])); + } + return hlist_entry_safe(node, struct fib6_table, tb6_hlist); +} + +static void ipv6_route_check_sernum(struct ipv6_route_iter *iter) +{ + if (iter->sernum != iter->w.root->fn_sernum) { + iter->sernum = iter->w.root->fn_sernum; + iter->w.state = FWS_INIT; + iter->w.node = iter->w.root; + WARN_ON(iter->w.skip); + iter->w.skip = iter->w.count; + } +} + +static void *ipv6_route_seq_next(struct seq_file *seq, void *v, loff_t *pos) +{ + int r; + struct rt6_info *n; + struct net *net = seq_file_net(seq); + struct ipv6_route_iter *iter = seq->private; + + if (!v) + goto iter_table; + + n = ((struct rt6_info *)v)->dst.rt6_next; + if (n) { + ++*pos; + return n; + } + +iter_table: + ipv6_route_check_sernum(iter); + read_lock(&iter->tbl->tb6_lock); + r = fib6_walk_continue(&iter->w); + read_unlock(&iter->tbl->tb6_lock); + if (r > 0) { + if (v) + ++*pos; + return iter->w.leaf; + } else if (r < 0) { + fib6_walker_unlink(&iter->w); + return NULL; + } + fib6_walker_unlink(&iter->w); + + iter->tbl = ipv6_route_seq_next_table(iter->tbl, net); + if (!iter->tbl) + return NULL; + + ipv6_route_seq_setup_walk(iter); + goto iter_table; +} + +static void *ipv6_route_seq_start(struct seq_file *seq, loff_t *pos) + __acquires(RCU_BH) +{ + struct net *net = seq_file_net(seq); + struct ipv6_route_iter *iter = seq->private; + + rcu_read_lock_bh(); + iter->tbl = ipv6_route_seq_next_table(NULL, net); + iter->skip = *pos; + + if (iter->tbl) { + ipv6_route_seq_setup_walk(iter); + return ipv6_route_seq_next(seq, NULL, pos); + } else { + return NULL; + } +} + +static bool ipv6_route_iter_active(struct ipv6_route_iter *iter) +{ + struct fib6_walker_t *w = &iter->w; + return w->node && !(w->state == FWS_U && w->node == w->root); +} + +static void ipv6_route_seq_stop(struct seq_file *seq, void *v) + __releases(RCU_BH) +{ + struct ipv6_route_iter *iter = seq->private; + + if (ipv6_route_iter_active(iter)) + fib6_walker_unlink(&iter->w); + + rcu_read_unlock_bh(); +} + +static const struct seq_operations ipv6_route_seq_ops = { + .start = ipv6_route_seq_start, + .next = ipv6_route_seq_next, + .stop = ipv6_route_seq_stop, + .show = ipv6_route_seq_show +}; + +int ipv6_route_open(struct inode *inode, struct file *file) +{ + return seq_open_net(inode, file, &ipv6_route_seq_ops, + sizeof(struct ipv6_route_iter)); +} + +#endif /* CONFIG_PROC_FS */ diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c index 46e88433ec7d..e7fb7106550f 100644 --- a/net/ipv6/ip6_flowlabel.c +++ b/net/ipv6/ip6_flowlabel.c @@ -41,7 +41,7 @@ #define FL_MIN_LINGER 6 /* Minimal linger. It is set to 6sec specified in old IPv6 RFC. Well, it was reasonable value. */ -#define FL_MAX_LINGER 60 /* Maximal linger timeout */ +#define FL_MAX_LINGER 150 /* Maximal linger timeout */ /* FL hash table */ @@ -345,6 +345,8 @@ static int fl6_renew(struct ip6_flowlabel *fl, unsigned long linger, unsigned lo expires = check_linger(expires); if (!expires) return -EPERM; + + spin_lock_bh(&ip6_fl_lock); fl->lastuse = jiffies; if (time_before(fl->linger, linger)) fl->linger = linger; @@ -352,6 +354,8 @@ static int fl6_renew(struct ip6_flowlabel *fl, unsigned long linger, unsigned lo expires = fl->linger; if (time_before(fl->expires, fl->lastuse + expires)) fl->expires = fl->lastuse + expires; + spin_unlock_bh(&ip6_fl_lock); + return 0; } @@ -453,8 +457,10 @@ static int mem_check(struct sock *sk) if (room > FL_MAX_SIZE - FL_MAX_PER_SOCK) return 0; + rcu_read_lock_bh(); for_each_sk_fl_rcu(np, sfl) count++; + rcu_read_unlock_bh(); if (room <= 0 || ((count >= FL_MAX_PER_SOCK || @@ -465,34 +471,6 @@ static int mem_check(struct sock *sk) return 0; } -static bool ipv6_hdr_cmp(struct ipv6_opt_hdr *h1, struct ipv6_opt_hdr *h2) -{ - if (h1 == h2) - return false; - if (h1 == NULL || h2 == NULL) - return true; - if (h1->hdrlen != h2->hdrlen) - return true; - return memcmp(h1+1, h2+1, ((h1->hdrlen+1)<<3) - sizeof(*h1)); -} - -static bool ipv6_opt_cmp(struct ipv6_txoptions *o1, struct ipv6_txoptions *o2) -{ - if (o1 == o2) - return false; - if (o1 == NULL || o2 == NULL) - return true; - if (o1->opt_nflen != o2->opt_nflen) - return true; - if (ipv6_hdr_cmp(o1->hopopt, o2->hopopt)) - return true; - if (ipv6_hdr_cmp(o1->dst0opt, o2->dst0opt)) - return true; - if (ipv6_hdr_cmp((struct ipv6_opt_hdr *)o1->srcrt, (struct ipv6_opt_hdr *)o2->srcrt)) - return true; - return false; -} - static inline void fl_link(struct ipv6_pinfo *np, struct ipv6_fl_socklist *sfl, struct ip6_flowlabel *fl) { @@ -503,6 +481,32 @@ static inline void fl_link(struct ipv6_pinfo *np, struct ipv6_fl_socklist *sfl, spin_unlock_bh(&ip6_sk_fl_lock); } +int ipv6_flowlabel_opt_get(struct sock *sk, struct in6_flowlabel_req *freq) +{ + struct ipv6_pinfo *np = inet6_sk(sk); + struct ipv6_fl_socklist *sfl; + + rcu_read_lock_bh(); + + for_each_sk_fl_rcu(np, sfl) { + if (sfl->fl->label == (np->flow_label & IPV6_FLOWLABEL_MASK)) { + spin_lock_bh(&ip6_fl_lock); + freq->flr_label = sfl->fl->label; + freq->flr_dst = sfl->fl->dst; + freq->flr_share = sfl->fl->share; + freq->flr_expires = (sfl->fl->expires - jiffies) / HZ; + freq->flr_linger = sfl->fl->linger / HZ; + + spin_unlock_bh(&ip6_fl_lock); + rcu_read_unlock_bh(); + return 0; + } + } + rcu_read_unlock_bh(); + + return -ENOENT; +} + int ipv6_flowlabel_opt(struct sock *sk, char __user *optval, int optlen) { int uninitialized_var(err); @@ -603,11 +607,6 @@ recheck: uid_eq(fl1->owner.uid, fl->owner.uid))) goto release; - err = -EINVAL; - if (!ipv6_addr_equal(&fl1->dst, &fl->dst) || - ipv6_opt_cmp(fl1->opt, fl->opt)) - goto release; - err = -ENOMEM; if (sfl1 == NULL) goto release; diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c index d82de7228100..4b851692b1f6 100644 --- a/net/ipv6/ip6_offload.c +++ b/net/ipv6/ip6_offload.c @@ -66,7 +66,6 @@ static int ipv6_gso_send_check(struct sk_buff *skb) __skb_pull(skb, sizeof(*ipv6h)); err = -EPROTONOSUPPORT; - rcu_read_lock(); ops = rcu_dereference(inet6_offloads[ ipv6_gso_pull_exthdrs(skb, ipv6h->nexthdr)]); @@ -74,7 +73,6 @@ static int ipv6_gso_send_check(struct sk_buff *skb) skb_reset_transport_header(skb); err = ops->callbacks.gso_send_check(skb); } - rcu_read_unlock(); out: return err; @@ -92,46 +90,58 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb, u8 *prevhdr; int offset = 0; bool tunnel; + int nhoff; if (unlikely(skb_shinfo(skb)->gso_type & ~(SKB_GSO_UDP | SKB_GSO_DODGY | SKB_GSO_TCP_ECN | SKB_GSO_GRE | + SKB_GSO_IPIP | + SKB_GSO_SIT | SKB_GSO_UDP_TUNNEL | SKB_GSO_MPLS | SKB_GSO_TCPV6 | 0))) goto out; + skb_reset_network_header(skb); + nhoff = skb_network_header(skb) - skb_mac_header(skb); if (unlikely(!pskb_may_pull(skb, sizeof(*ipv6h)))) goto out; - tunnel = skb->encapsulation; + tunnel = SKB_GSO_CB(skb)->encap_level > 0; + if (tunnel) + features = skb->dev->hw_enc_features & netif_skb_features(skb); + SKB_GSO_CB(skb)->encap_level += sizeof(*ipv6h); + ipv6h = ipv6_hdr(skb); __skb_pull(skb, sizeof(*ipv6h)); segs = ERR_PTR(-EPROTONOSUPPORT); proto = ipv6_gso_pull_exthdrs(skb, ipv6h->nexthdr); - rcu_read_lock(); + ops = rcu_dereference(inet6_offloads[proto]); if (likely(ops && ops->callbacks.gso_segment)) { skb_reset_transport_header(skb); segs = ops->callbacks.gso_segment(skb, features); } - rcu_read_unlock(); if (IS_ERR(segs)) goto out; for (skb = segs; skb; skb = skb->next) { - ipv6h = ipv6_hdr(skb); - ipv6h->payload_len = htons(skb->len - skb->mac_len - - sizeof(*ipv6h)); + ipv6h = (struct ipv6hdr *)(skb_mac_header(skb) + nhoff); + ipv6h->payload_len = htons(skb->len - nhoff - sizeof(*ipv6h)); + if (tunnel) { + skb_reset_inner_headers(skb); + skb->encapsulation = 1; + } + skb->network_header = (u8 *)ipv6h - skb->head; + if (!tunnel && proto == IPPROTO_UDP) { unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr); - fptr = (struct frag_hdr *)(skb_network_header(skb) + - unfrag_ip6hlen); + fptr = (struct frag_hdr *)((u8 *)ipv6h + unfrag_ip6hlen); fptr->frag_off = htons(offset); if (skb->next != NULL) fptr->frag_off |= htons(IP6_MF); @@ -267,6 +277,13 @@ static struct packet_offload ipv6_packet_offload __read_mostly = { }, }; +static const struct net_offload sit_offload = { + .callbacks = { + .gso_send_check = ipv6_gso_send_check, + .gso_segment = ipv6_gso_segment, + }, +}; + static int __init ipv6_offload_init(void) { @@ -278,6 +295,9 @@ static int __init ipv6_offload_init(void) pr_crit("%s: Cannot add EXTHDRS protocol offload\n", __func__); dev_add_offload(&ipv6_packet_offload); + + inet_add_offload(&sit_offload, IPPROTO_IPV6); + return 0; } diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 91fb4e8212f5..5e31a909a2b0 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -125,7 +125,8 @@ static int ip6_finish_output2(struct sk_buff *skb) static int ip6_finish_output(struct sk_buff *skb) { if ((skb->len > ip6_skb_dst_mtu(skb) && !skb_is_gso(skb)) || - dst_allfrag(skb_dst(skb))) + dst_allfrag(skb_dst(skb)) || + (IP6CB(skb)->frag_max_size && skb->len > IP6CB(skb)->frag_max_size)) return ip6_fragment(skb, ip6_finish_output2); else return ip6_finish_output2(skb); diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c new file mode 100644 index 000000000000..ed94ba61dda0 --- /dev/null +++ b/net/ipv6/ip6_vti.c @@ -0,0 +1,1056 @@ +/* + * IPv6 virtual tunneling interface + * + * Copyright (C) 2013 secunet Security Networks AG + * + * Author: + * Steffen Klassert <steffen.klassert@secunet.com> + * + * Based on: + * net/ipv6/ip6_tunnel.c + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#include <linux/module.h> +#include <linux/capability.h> +#include <linux/errno.h> +#include <linux/types.h> +#include <linux/sockios.h> +#include <linux/icmp.h> +#include <linux/if.h> +#include <linux/in.h> +#include <linux/ip.h> +#include <linux/if_tunnel.h> +#include <linux/net.h> +#include <linux/in6.h> +#include <linux/netdevice.h> +#include <linux/if_arp.h> +#include <linux/icmpv6.h> +#include <linux/init.h> +#include <linux/route.h> +#include <linux/rtnetlink.h> +#include <linux/netfilter_ipv6.h> +#include <linux/slab.h> +#include <linux/hash.h> + +#include <linux/uaccess.h> +#include <linux/atomic.h> + +#include <net/icmp.h> +#include <net/ip.h> +#include <net/ip_tunnels.h> +#include <net/ipv6.h> +#include <net/ip6_route.h> +#include <net/addrconf.h> +#include <net/ip6_tunnel.h> +#include <net/xfrm.h> +#include <net/net_namespace.h> +#include <net/netns/generic.h> + +#define HASH_SIZE_SHIFT 5 +#define HASH_SIZE (1 << HASH_SIZE_SHIFT) + +static u32 HASH(const struct in6_addr *addr1, const struct in6_addr *addr2) +{ + u32 hash = ipv6_addr_hash(addr1) ^ ipv6_addr_hash(addr2); + + return hash_32(hash, HASH_SIZE_SHIFT); +} + +static int vti6_dev_init(struct net_device *dev); +static void vti6_dev_setup(struct net_device *dev); +static struct rtnl_link_ops vti6_link_ops __read_mostly; + +static int vti6_net_id __read_mostly; +struct vti6_net { + /* the vti6 tunnel fallback device */ + struct net_device *fb_tnl_dev; + /* lists for storing tunnels in use */ + struct ip6_tnl __rcu *tnls_r_l[HASH_SIZE]; + struct ip6_tnl __rcu *tnls_wc[1]; + struct ip6_tnl __rcu **tnls[2]; +}; + +static struct net_device_stats *vti6_get_stats(struct net_device *dev) +{ + struct pcpu_tstats sum = { 0 }; + int i; + + for_each_possible_cpu(i) { + const struct pcpu_tstats *tstats = per_cpu_ptr(dev->tstats, i); + + sum.rx_packets += tstats->rx_packets; + sum.rx_bytes += tstats->rx_bytes; + sum.tx_packets += tstats->tx_packets; + sum.tx_bytes += tstats->tx_bytes; + } + dev->stats.rx_packets = sum.rx_packets; + dev->stats.rx_bytes = sum.rx_bytes; + dev->stats.tx_packets = sum.tx_packets; + dev->stats.tx_bytes = sum.tx_bytes; + return &dev->stats; +} + +#define for_each_vti6_tunnel_rcu(start) \ + for (t = rcu_dereference(start); t; t = rcu_dereference(t->next)) + +/** + * vti6_tnl_lookup - fetch tunnel matching the end-point addresses + * @net: network namespace + * @remote: the address of the tunnel exit-point + * @local: the address of the tunnel entry-point + * + * Return: + * tunnel matching given end-points if found, + * else fallback tunnel if its device is up, + * else %NULL + **/ +static struct ip6_tnl * +vti6_tnl_lookup(struct net *net, const struct in6_addr *remote, + const struct in6_addr *local) +{ + unsigned int hash = HASH(remote, local); + struct ip6_tnl *t; + struct vti6_net *ip6n = net_generic(net, vti6_net_id); + + for_each_vti6_tunnel_rcu(ip6n->tnls_r_l[hash]) { + if (ipv6_addr_equal(local, &t->parms.laddr) && + ipv6_addr_equal(remote, &t->parms.raddr) && + (t->dev->flags & IFF_UP)) + return t; + } + t = rcu_dereference(ip6n->tnls_wc[0]); + if (t && (t->dev->flags & IFF_UP)) + return t; + + return NULL; +} + +/** + * vti6_tnl_bucket - get head of list matching given tunnel parameters + * @p: parameters containing tunnel end-points + * + * Description: + * vti6_tnl_bucket() returns the head of the list matching the + * &struct in6_addr entries laddr and raddr in @p. + * + * Return: head of IPv6 tunnel list + **/ +static struct ip6_tnl __rcu ** +vti6_tnl_bucket(struct vti6_net *ip6n, const struct __ip6_tnl_parm *p) +{ + const struct in6_addr *remote = &p->raddr; + const struct in6_addr *local = &p->laddr; + unsigned int h = 0; + int prio = 0; + + if (!ipv6_addr_any(remote) || !ipv6_addr_any(local)) { + prio = 1; + h = HASH(remote, local); + } + return &ip6n->tnls[prio][h]; +} + +static void +vti6_tnl_link(struct vti6_net *ip6n, struct ip6_tnl *t) +{ + struct ip6_tnl __rcu **tp = vti6_tnl_bucket(ip6n, &t->parms); + + rcu_assign_pointer(t->next , rtnl_dereference(*tp)); + rcu_assign_pointer(*tp, t); +} + +static void +vti6_tnl_unlink(struct vti6_net *ip6n, struct ip6_tnl *t) +{ + struct ip6_tnl __rcu **tp; + struct ip6_tnl *iter; + + for (tp = vti6_tnl_bucket(ip6n, &t->parms); + (iter = rtnl_dereference(*tp)) != NULL; + tp = &iter->next) { + if (t == iter) { + rcu_assign_pointer(*tp, t->next); + break; + } + } +} + +static void vti6_dev_free(struct net_device *dev) +{ + free_percpu(dev->tstats); + free_netdev(dev); +} + +static int vti6_tnl_create2(struct net_device *dev) +{ + struct ip6_tnl *t = netdev_priv(dev); + struct net *net = dev_net(dev); + struct vti6_net *ip6n = net_generic(net, vti6_net_id); + int err; + + err = vti6_dev_init(dev); + if (err < 0) + goto out; + + err = register_netdevice(dev); + if (err < 0) + goto out; + + strcpy(t->parms.name, dev->name); + dev->rtnl_link_ops = &vti6_link_ops; + + dev_hold(dev); + vti6_tnl_link(ip6n, t); + + return 0; + +out: + return err; +} + +static struct ip6_tnl *vti6_tnl_create(struct net *net, struct __ip6_tnl_parm *p) +{ + struct net_device *dev; + struct ip6_tnl *t; + char name[IFNAMSIZ]; + int err; + + if (p->name[0]) + strlcpy(name, p->name, IFNAMSIZ); + else + sprintf(name, "ip6_vti%%d"); + + dev = alloc_netdev(sizeof(*t), name, vti6_dev_setup); + if (dev == NULL) + goto failed; + + dev_net_set(dev, net); + + t = netdev_priv(dev); + t->parms = *p; + t->net = dev_net(dev); + + err = vti6_tnl_create2(dev); + if (err < 0) + goto failed_free; + + return t; + +failed_free: + vti6_dev_free(dev); +failed: + return NULL; +} + +/** + * vti6_locate - find or create tunnel matching given parameters + * @net: network namespace + * @p: tunnel parameters + * @create: != 0 if allowed to create new tunnel if no match found + * + * Description: + * vti6_locate() first tries to locate an existing tunnel + * based on @parms. If this is unsuccessful, but @create is set a new + * tunnel device is created and registered for use. + * + * Return: + * matching tunnel or NULL + **/ +static struct ip6_tnl *vti6_locate(struct net *net, struct __ip6_tnl_parm *p, + int create) +{ + const struct in6_addr *remote = &p->raddr; + const struct in6_addr *local = &p->laddr; + struct ip6_tnl __rcu **tp; + struct ip6_tnl *t; + struct vti6_net *ip6n = net_generic(net, vti6_net_id); + + for (tp = vti6_tnl_bucket(ip6n, p); + (t = rtnl_dereference(*tp)) != NULL; + tp = &t->next) { + if (ipv6_addr_equal(local, &t->parms.laddr) && + ipv6_addr_equal(remote, &t->parms.raddr)) + return t; + } + if (!create) + return NULL; + return vti6_tnl_create(net, p); +} + +/** + * vti6_dev_uninit - tunnel device uninitializer + * @dev: the device to be destroyed + * + * Description: + * vti6_dev_uninit() removes tunnel from its list + **/ +static void vti6_dev_uninit(struct net_device *dev) +{ + struct ip6_tnl *t = netdev_priv(dev); + struct net *net = dev_net(dev); + struct vti6_net *ip6n = net_generic(net, vti6_net_id); + + if (dev == ip6n->fb_tnl_dev) + RCU_INIT_POINTER(ip6n->tnls_wc[0], NULL); + else + vti6_tnl_unlink(ip6n, t); + ip6_tnl_dst_reset(t); + dev_put(dev); +} + +static int vti6_rcv(struct sk_buff *skb) +{ + struct ip6_tnl *t; + const struct ipv6hdr *ipv6h = ipv6_hdr(skb); + + rcu_read_lock(); + + if ((t = vti6_tnl_lookup(dev_net(skb->dev), &ipv6h->saddr, + &ipv6h->daddr)) != NULL) { + struct pcpu_tstats *tstats; + + if (t->parms.proto != IPPROTO_IPV6 && t->parms.proto != 0) { + rcu_read_unlock(); + goto discard; + } + + if (!xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) { + rcu_read_unlock(); + return 0; + } + + if (!ip6_tnl_rcv_ctl(t, &ipv6h->daddr, &ipv6h->saddr)) { + t->dev->stats.rx_dropped++; + rcu_read_unlock(); + goto discard; + } + + tstats = this_cpu_ptr(t->dev->tstats); + tstats->rx_packets++; + tstats->rx_bytes += skb->len; + + skb->mark = 0; + secpath_reset(skb); + skb->dev = t->dev; + + rcu_read_unlock(); + return 0; + } + rcu_read_unlock(); + return 1; + +discard: + kfree_skb(skb); + return 0; +} + +/** + * vti6_addr_conflict - compare packet addresses to tunnel's own + * @t: the outgoing tunnel device + * @hdr: IPv6 header from the incoming packet + * + * Description: + * Avoid trivial tunneling loop by checking that tunnel exit-point + * doesn't match source of incoming packet. + * + * Return: + * 1 if conflict, + * 0 else + **/ +static inline bool +vti6_addr_conflict(const struct ip6_tnl *t, const struct ipv6hdr *hdr) +{ + return ipv6_addr_equal(&t->parms.raddr, &hdr->saddr); +} + +/** + * vti6_xmit - send a packet + * @skb: the outgoing socket buffer + * @dev: the outgoing tunnel device + **/ +static int vti6_xmit(struct sk_buff *skb, struct net_device *dev) +{ + struct net *net = dev_net(dev); + struct ip6_tnl *t = netdev_priv(dev); + struct net_device_stats *stats = &t->dev->stats; + struct dst_entry *dst = NULL, *ndst = NULL; + struct flowi6 fl6; + struct ipv6hdr *ipv6h = ipv6_hdr(skb); + struct net_device *tdev; + int err = -1; + + if ((t->parms.proto != IPPROTO_IPV6 && t->parms.proto != 0) || + !ip6_tnl_xmit_ctl(t) || vti6_addr_conflict(t, ipv6h)) + return err; + + dst = ip6_tnl_dst_check(t); + if (!dst) { + memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6)); + + ndst = ip6_route_output(net, NULL, &fl6); + + if (ndst->error) + goto tx_err_link_failure; + ndst = xfrm_lookup(net, ndst, flowi6_to_flowi(&fl6), NULL, 0); + if (IS_ERR(ndst)) { + err = PTR_ERR(ndst); + ndst = NULL; + goto tx_err_link_failure; + } + dst = ndst; + } + + if (!dst->xfrm || dst->xfrm->props.mode != XFRM_MODE_TUNNEL) + goto tx_err_link_failure; + + tdev = dst->dev; + + if (tdev == dev) { + stats->collisions++; + net_warn_ratelimited("%s: Local routing loop detected!\n", + t->parms.name); + goto tx_err_dst_release; + } + + + skb_dst_drop(skb); + skb_dst_set_noref(skb, dst); + + ip6tunnel_xmit(skb, dev); + if (ndst) { + dev->mtu = dst_mtu(ndst); + ip6_tnl_dst_store(t, ndst); + } + + return 0; +tx_err_link_failure: + stats->tx_carrier_errors++; + dst_link_failure(skb); +tx_err_dst_release: + dst_release(ndst); + return err; +} + +static netdev_tx_t +vti6_tnl_xmit(struct sk_buff *skb, struct net_device *dev) +{ + struct ip6_tnl *t = netdev_priv(dev); + struct net_device_stats *stats = &t->dev->stats; + int ret; + + switch (skb->protocol) { + case htons(ETH_P_IPV6): + ret = vti6_xmit(skb, dev); + break; + default: + goto tx_err; + } + + if (ret < 0) + goto tx_err; + + return NETDEV_TX_OK; + +tx_err: + stats->tx_errors++; + stats->tx_dropped++; + kfree_skb(skb); + return NETDEV_TX_OK; +} + +static void vti6_link_config(struct ip6_tnl *t) +{ + struct dst_entry *dst; + struct net_device *dev = t->dev; + struct __ip6_tnl_parm *p = &t->parms; + struct flowi6 *fl6 = &t->fl.u.ip6; + + memcpy(dev->dev_addr, &p->laddr, sizeof(struct in6_addr)); + memcpy(dev->broadcast, &p->raddr, sizeof(struct in6_addr)); + + /* Set up flowi template */ + fl6->saddr = p->laddr; + fl6->daddr = p->raddr; + fl6->flowi6_oif = p->link; + fl6->flowi6_mark = be32_to_cpu(p->i_key); + fl6->flowi6_proto = p->proto; + fl6->flowlabel = 0; + + p->flags &= ~(IP6_TNL_F_CAP_XMIT | IP6_TNL_F_CAP_RCV | + IP6_TNL_F_CAP_PER_PACKET); + p->flags |= ip6_tnl_get_cap(t, &p->laddr, &p->raddr); + + if (p->flags & IP6_TNL_F_CAP_XMIT && p->flags & IP6_TNL_F_CAP_RCV) + dev->flags |= IFF_POINTOPOINT; + else + dev->flags &= ~IFF_POINTOPOINT; + + dev->iflink = p->link; + + if (p->flags & IP6_TNL_F_CAP_XMIT) { + + dst = ip6_route_output(dev_net(dev), NULL, fl6); + if (dst->error) + return; + + dst = xfrm_lookup(dev_net(dev), dst, flowi6_to_flowi(fl6), + NULL, 0); + if (IS_ERR(dst)) + return; + + if (dst->dev) { + dev->hard_header_len = dst->dev->hard_header_len; + + dev->mtu = dst_mtu(dst); + + if (dev->mtu < IPV6_MIN_MTU) + dev->mtu = IPV6_MIN_MTU; + } + dst_release(dst); + } +} + +/** + * vti6_tnl_change - update the tunnel parameters + * @t: tunnel to be changed + * @p: tunnel configuration parameters + * + * Description: + * vti6_tnl_change() updates the tunnel parameters + **/ +static int +vti6_tnl_change(struct ip6_tnl *t, const struct __ip6_tnl_parm *p) +{ + t->parms.laddr = p->laddr; + t->parms.raddr = p->raddr; + t->parms.link = p->link; + t->parms.i_key = p->i_key; + t->parms.o_key = p->o_key; + t->parms.proto = p->proto; + ip6_tnl_dst_reset(t); + vti6_link_config(t); + return 0; +} + +static int vti6_update(struct ip6_tnl *t, struct __ip6_tnl_parm *p) +{ + struct net *net = dev_net(t->dev); + struct vti6_net *ip6n = net_generic(net, vti6_net_id); + int err; + + vti6_tnl_unlink(ip6n, t); + synchronize_net(); + err = vti6_tnl_change(t, p); + vti6_tnl_link(ip6n, t); + netdev_state_change(t->dev); + return err; +} + +static void +vti6_parm_from_user(struct __ip6_tnl_parm *p, const struct ip6_tnl_parm2 *u) +{ + p->laddr = u->laddr; + p->raddr = u->raddr; + p->link = u->link; + p->i_key = u->i_key; + p->o_key = u->o_key; + p->proto = u->proto; + + memcpy(p->name, u->name, sizeof(u->name)); +} + +static void +vti6_parm_to_user(struct ip6_tnl_parm2 *u, const struct __ip6_tnl_parm *p) +{ + u->laddr = p->laddr; + u->raddr = p->raddr; + u->link = p->link; + u->i_key = p->i_key; + u->o_key = p->o_key; + u->proto = p->proto; + + memcpy(u->name, p->name, sizeof(u->name)); +} + +/** + * vti6_tnl_ioctl - configure vti6 tunnels from userspace + * @dev: virtual device associated with tunnel + * @ifr: parameters passed from userspace + * @cmd: command to be performed + * + * Description: + * vti6_ioctl() is used for managing vti6 tunnels + * from userspace. + * + * The possible commands are the following: + * %SIOCGETTUNNEL: get tunnel parameters for device + * %SIOCADDTUNNEL: add tunnel matching given tunnel parameters + * %SIOCCHGTUNNEL: change tunnel parameters to those given + * %SIOCDELTUNNEL: delete tunnel + * + * The fallback device "ip6_vti0", created during module + * initialization, can be used for creating other tunnel devices. + * + * Return: + * 0 on success, + * %-EFAULT if unable to copy data to or from userspace, + * %-EPERM if current process hasn't %CAP_NET_ADMIN set + * %-EINVAL if passed tunnel parameters are invalid, + * %-EEXIST if changing a tunnel's parameters would cause a conflict + * %-ENODEV if attempting to change or delete a nonexisting device + **/ +static int +vti6_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd) +{ + int err = 0; + struct ip6_tnl_parm2 p; + struct __ip6_tnl_parm p1; + struct ip6_tnl *t = NULL; + struct net *net = dev_net(dev); + struct vti6_net *ip6n = net_generic(net, vti6_net_id); + + switch (cmd) { + case SIOCGETTUNNEL: + if (dev == ip6n->fb_tnl_dev) { + if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p))) { + err = -EFAULT; + break; + } + vti6_parm_from_user(&p1, &p); + t = vti6_locate(net, &p1, 0); + } else { + memset(&p, 0, sizeof(p)); + } + if (t == NULL) + t = netdev_priv(dev); + vti6_parm_to_user(&p, &t->parms); + if (copy_to_user(ifr->ifr_ifru.ifru_data, &p, sizeof(p))) + err = -EFAULT; + break; + case SIOCADDTUNNEL: + case SIOCCHGTUNNEL: + err = -EPERM; + if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) + break; + err = -EFAULT; + if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p))) + break; + err = -EINVAL; + if (p.proto != IPPROTO_IPV6 && p.proto != 0) + break; + vti6_parm_from_user(&p1, &p); + t = vti6_locate(net, &p1, cmd == SIOCADDTUNNEL); + if (dev != ip6n->fb_tnl_dev && cmd == SIOCCHGTUNNEL) { + if (t != NULL) { + if (t->dev != dev) { + err = -EEXIST; + break; + } + } else + t = netdev_priv(dev); + + err = vti6_update(t, &p1); + } + if (t) { + err = 0; + vti6_parm_to_user(&p, &t->parms); + if (copy_to_user(ifr->ifr_ifru.ifru_data, &p, sizeof(p))) + err = -EFAULT; + + } else + err = (cmd == SIOCADDTUNNEL ? -ENOBUFS : -ENOENT); + break; + case SIOCDELTUNNEL: + err = -EPERM; + if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) + break; + + if (dev == ip6n->fb_tnl_dev) { + err = -EFAULT; + if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p))) + break; + err = -ENOENT; + vti6_parm_from_user(&p1, &p); + t = vti6_locate(net, &p1, 0); + if (t == NULL) + break; + err = -EPERM; + if (t->dev == ip6n->fb_tnl_dev) + break; + dev = t->dev; + } + err = 0; + unregister_netdevice(dev); + break; + default: + err = -EINVAL; + } + return err; +} + +/** + * vti6_tnl_change_mtu - change mtu manually for tunnel device + * @dev: virtual device associated with tunnel + * @new_mtu: the new mtu + * + * Return: + * 0 on success, + * %-EINVAL if mtu too small + **/ +static int vti6_change_mtu(struct net_device *dev, int new_mtu) +{ + if (new_mtu < IPV6_MIN_MTU) + return -EINVAL; + + dev->mtu = new_mtu; + return 0; +} + +static const struct net_device_ops vti6_netdev_ops = { + .ndo_uninit = vti6_dev_uninit, + .ndo_start_xmit = vti6_tnl_xmit, + .ndo_do_ioctl = vti6_ioctl, + .ndo_change_mtu = vti6_change_mtu, + .ndo_get_stats = vti6_get_stats, +}; + +/** + * vti6_dev_setup - setup virtual tunnel device + * @dev: virtual device associated with tunnel + * + * Description: + * Initialize function pointers and device parameters + **/ +static void vti6_dev_setup(struct net_device *dev) +{ + struct ip6_tnl *t; + + dev->netdev_ops = &vti6_netdev_ops; + dev->destructor = vti6_dev_free; + + dev->type = ARPHRD_TUNNEL6; + dev->hard_header_len = LL_MAX_HEADER + sizeof(struct ipv6hdr); + dev->mtu = ETH_DATA_LEN; + t = netdev_priv(dev); + dev->flags |= IFF_NOARP; + dev->addr_len = sizeof(struct in6_addr); + dev->features |= NETIF_F_NETNS_LOCAL; + dev->priv_flags &= ~IFF_XMIT_DST_RELEASE; +} + +/** + * vti6_dev_init_gen - general initializer for all tunnel devices + * @dev: virtual device associated with tunnel + **/ +static inline int vti6_dev_init_gen(struct net_device *dev) +{ + struct ip6_tnl *t = netdev_priv(dev); + + t->dev = dev; + t->net = dev_net(dev); + dev->tstats = alloc_percpu(struct pcpu_tstats); + if (!dev->tstats) + return -ENOMEM; + return 0; +} + +/** + * vti6_dev_init - initializer for all non fallback tunnel devices + * @dev: virtual device associated with tunnel + **/ +static int vti6_dev_init(struct net_device *dev) +{ + struct ip6_tnl *t = netdev_priv(dev); + int err = vti6_dev_init_gen(dev); + + if (err) + return err; + vti6_link_config(t); + return 0; +} + +/** + * vti6_fb_tnl_dev_init - initializer for fallback tunnel device + * @dev: fallback device + * + * Return: 0 + **/ +static int __net_init vti6_fb_tnl_dev_init(struct net_device *dev) +{ + struct ip6_tnl *t = netdev_priv(dev); + struct net *net = dev_net(dev); + struct vti6_net *ip6n = net_generic(net, vti6_net_id); + int err = vti6_dev_init_gen(dev); + + if (err) + return err; + + t->parms.proto = IPPROTO_IPV6; + dev_hold(dev); + + vti6_link_config(t); + + rcu_assign_pointer(ip6n->tnls_wc[0], t); + return 0; +} + +static int vti6_validate(struct nlattr *tb[], struct nlattr *data[]) +{ + return 0; +} + +static void vti6_netlink_parms(struct nlattr *data[], + struct __ip6_tnl_parm *parms) +{ + memset(parms, 0, sizeof(*parms)); + + if (!data) + return; + + if (data[IFLA_VTI_LINK]) + parms->link = nla_get_u32(data[IFLA_VTI_LINK]); + + if (data[IFLA_VTI_LOCAL]) + nla_memcpy(&parms->laddr, data[IFLA_VTI_LOCAL], + sizeof(struct in6_addr)); + + if (data[IFLA_VTI_REMOTE]) + nla_memcpy(&parms->raddr, data[IFLA_VTI_REMOTE], + sizeof(struct in6_addr)); + + if (data[IFLA_VTI_IKEY]) + parms->i_key = nla_get_be32(data[IFLA_VTI_IKEY]); + + if (data[IFLA_VTI_OKEY]) + parms->o_key = nla_get_be32(data[IFLA_VTI_OKEY]); +} + +static int vti6_newlink(struct net *src_net, struct net_device *dev, + struct nlattr *tb[], struct nlattr *data[]) +{ + struct net *net = dev_net(dev); + struct ip6_tnl *nt; + + nt = netdev_priv(dev); + vti6_netlink_parms(data, &nt->parms); + + nt->parms.proto = IPPROTO_IPV6; + + if (vti6_locate(net, &nt->parms, 0)) + return -EEXIST; + + return vti6_tnl_create2(dev); +} + +static int vti6_changelink(struct net_device *dev, struct nlattr *tb[], + struct nlattr *data[]) +{ + struct ip6_tnl *t; + struct __ip6_tnl_parm p; + struct net *net = dev_net(dev); + struct vti6_net *ip6n = net_generic(net, vti6_net_id); + + if (dev == ip6n->fb_tnl_dev) + return -EINVAL; + + vti6_netlink_parms(data, &p); + + t = vti6_locate(net, &p, 0); + + if (t) { + if (t->dev != dev) + return -EEXIST; + } else + t = netdev_priv(dev); + + return vti6_update(t, &p); +} + +static size_t vti6_get_size(const struct net_device *dev) +{ + return + /* IFLA_VTI_LINK */ + nla_total_size(4) + + /* IFLA_VTI_LOCAL */ + nla_total_size(sizeof(struct in6_addr)) + + /* IFLA_VTI_REMOTE */ + nla_total_size(sizeof(struct in6_addr)) + + /* IFLA_VTI_IKEY */ + nla_total_size(4) + + /* IFLA_VTI_OKEY */ + nla_total_size(4) + + 0; +} + +static int vti6_fill_info(struct sk_buff *skb, const struct net_device *dev) +{ + struct ip6_tnl *tunnel = netdev_priv(dev); + struct __ip6_tnl_parm *parm = &tunnel->parms; + + if (nla_put_u32(skb, IFLA_VTI_LINK, parm->link) || + nla_put(skb, IFLA_VTI_LOCAL, sizeof(struct in6_addr), + &parm->laddr) || + nla_put(skb, IFLA_VTI_REMOTE, sizeof(struct in6_addr), + &parm->raddr) || + nla_put_be32(skb, IFLA_VTI_IKEY, parm->i_key) || + nla_put_be32(skb, IFLA_VTI_OKEY, parm->o_key)) + goto nla_put_failure; + return 0; + +nla_put_failure: + return -EMSGSIZE; +} + +static const struct nla_policy vti6_policy[IFLA_VTI_MAX + 1] = { + [IFLA_VTI_LINK] = { .type = NLA_U32 }, + [IFLA_VTI_LOCAL] = { .len = sizeof(struct in6_addr) }, + [IFLA_VTI_REMOTE] = { .len = sizeof(struct in6_addr) }, + [IFLA_VTI_IKEY] = { .type = NLA_U32 }, + [IFLA_VTI_OKEY] = { .type = NLA_U32 }, +}; + +static struct rtnl_link_ops vti6_link_ops __read_mostly = { + .kind = "vti6", + .maxtype = IFLA_VTI_MAX, + .policy = vti6_policy, + .priv_size = sizeof(struct ip6_tnl), + .setup = vti6_dev_setup, + .validate = vti6_validate, + .newlink = vti6_newlink, + .changelink = vti6_changelink, + .get_size = vti6_get_size, + .fill_info = vti6_fill_info, +}; + +static struct xfrm_tunnel_notifier vti6_handler __read_mostly = { + .handler = vti6_rcv, + .priority = 1, +}; + +static void __net_exit vti6_destroy_tunnels(struct vti6_net *ip6n) +{ + int h; + struct ip6_tnl *t; + LIST_HEAD(list); + + for (h = 0; h < HASH_SIZE; h++) { + t = rtnl_dereference(ip6n->tnls_r_l[h]); + while (t != NULL) { + unregister_netdevice_queue(t->dev, &list); + t = rtnl_dereference(t->next); + } + } + + t = rtnl_dereference(ip6n->tnls_wc[0]); + unregister_netdevice_queue(t->dev, &list); + unregister_netdevice_many(&list); +} + +static int __net_init vti6_init_net(struct net *net) +{ + struct vti6_net *ip6n = net_generic(net, vti6_net_id); + struct ip6_tnl *t = NULL; + int err; + + ip6n->tnls[0] = ip6n->tnls_wc; + ip6n->tnls[1] = ip6n->tnls_r_l; + + err = -ENOMEM; + ip6n->fb_tnl_dev = alloc_netdev(sizeof(struct ip6_tnl), "ip6_vti0", + vti6_dev_setup); + + if (!ip6n->fb_tnl_dev) + goto err_alloc_dev; + dev_net_set(ip6n->fb_tnl_dev, net); + + err = vti6_fb_tnl_dev_init(ip6n->fb_tnl_dev); + if (err < 0) + goto err_register; + + err = register_netdev(ip6n->fb_tnl_dev); + if (err < 0) + goto err_register; + + t = netdev_priv(ip6n->fb_tnl_dev); + + strcpy(t->parms.name, ip6n->fb_tnl_dev->name); + return 0; + +err_register: + vti6_dev_free(ip6n->fb_tnl_dev); +err_alloc_dev: + return err; +} + +static void __net_exit vti6_exit_net(struct net *net) +{ + struct vti6_net *ip6n = net_generic(net, vti6_net_id); + + rtnl_lock(); + vti6_destroy_tunnels(ip6n); + rtnl_unlock(); +} + +static struct pernet_operations vti6_net_ops = { + .init = vti6_init_net, + .exit = vti6_exit_net, + .id = &vti6_net_id, + .size = sizeof(struct vti6_net), +}; + +/** + * vti6_tunnel_init - register protocol and reserve needed resources + * + * Return: 0 on success + **/ +static int __init vti6_tunnel_init(void) +{ + int err; + + err = register_pernet_device(&vti6_net_ops); + if (err < 0) + goto out_pernet; + + err = xfrm6_mode_tunnel_input_register(&vti6_handler); + if (err < 0) { + pr_err("%s: can't register vti6\n", __func__); + goto out; + } + err = rtnl_link_register(&vti6_link_ops); + if (err < 0) + goto rtnl_link_failed; + + return 0; + +rtnl_link_failed: + xfrm6_mode_tunnel_input_deregister(&vti6_handler); +out: + unregister_pernet_device(&vti6_net_ops); +out_pernet: + return err; +} + +/** + * vti6_tunnel_cleanup - free resources and unregister protocol + **/ +static void __exit vti6_tunnel_cleanup(void) +{ + rtnl_link_unregister(&vti6_link_ops); + if (xfrm6_mode_tunnel_input_deregister(&vti6_handler)) + pr_info("%s: can't deregister vti6\n", __func__); + + unregister_pernet_device(&vti6_net_ops); +} + +module_init(vti6_tunnel_init); +module_exit(vti6_tunnel_cleanup); +MODULE_LICENSE("GPL"); +MODULE_ALIAS_RTNL_LINK("vti6"); +MODULE_ALIAS_NETDEV("ip6_vti0"); +MODULE_AUTHOR("Steffen Klassert"); +MODULE_DESCRIPTION("IPv6 virtual tunnel interface"); diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c index d1e2e8ef29c5..1c6ce3119ff8 100644 --- a/net/ipv6/ipv6_sockglue.c +++ b/net/ipv6/ipv6_sockglue.c @@ -174,7 +174,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, } if (ipv6_only_sock(sk) || - !ipv6_addr_v4mapped(&np->daddr)) { + !ipv6_addr_v4mapped(&sk->sk_v6_daddr)) { retv = -EADDRNOTAVAIL; break; } @@ -1011,7 +1011,7 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname, struct in6_pktinfo src_info; src_info.ipi6_ifindex = np->mcast_oif ? np->mcast_oif : np->sticky_pktinfo.ipi6_ifindex; - src_info.ipi6_addr = np->mcast_oif ? np->daddr : np->sticky_pktinfo.ipi6_addr; + src_info.ipi6_addr = np->mcast_oif ? sk->sk_v6_daddr : np->sticky_pktinfo.ipi6_addr; put_cmsg(&msg, SOL_IPV6, IPV6_PKTINFO, sizeof(src_info), &src_info); } if (np->rxopt.bits.rxhlim) { @@ -1026,7 +1026,8 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname, struct in6_pktinfo src_info; src_info.ipi6_ifindex = np->mcast_oif ? np->mcast_oif : np->sticky_pktinfo.ipi6_ifindex; - src_info.ipi6_addr = np->mcast_oif ? np->daddr : np->sticky_pktinfo.ipi6_addr; + src_info.ipi6_addr = np->mcast_oif ? sk->sk_v6_daddr : + np->sticky_pktinfo.ipi6_addr; put_cmsg(&msg, SOL_IPV6, IPV6_2292PKTINFO, sizeof(src_info), &src_info); } if (np->rxopt.bits.rxohlim) { @@ -1211,6 +1212,34 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname, val = np->sndflow; break; + case IPV6_FLOWLABEL_MGR: + { + struct in6_flowlabel_req freq; + + if (len < sizeof(freq)) + return -EINVAL; + + if (copy_from_user(&freq, optval, sizeof(freq))) + return -EFAULT; + + if (freq.flr_action != IPV6_FL_A_GET) + return -EINVAL; + + len = sizeof(freq); + memset(&freq, 0, sizeof(freq)); + + val = ipv6_flowlabel_opt_get(sk, &freq); + if (val < 0) + return val; + + if (put_user(len, optlen)) + return -EFAULT; + if (copy_to_user(optval, &freq, len)) + return -EFAULT; + + return 0; + } + case IPV6_ADDR_PREFERENCES: val = 0; diff --git a/net/ipv6/netfilter/Kconfig b/net/ipv6/netfilter/Kconfig index a7f842b29b67..7702f9e90a04 100644 --- a/net/ipv6/netfilter/Kconfig +++ b/net/ipv6/netfilter/Kconfig @@ -25,6 +25,19 @@ config NF_CONNTRACK_IPV6 To compile it as a module, choose M here. If unsure, say N. +config NF_TABLES_IPV6 + depends on NF_TABLES + tristate "IPv6 nf_tables support" + +config NFT_CHAIN_ROUTE_IPV6 + depends on NF_TABLES_IPV6 + tristate "IPv6 nf_tables route chain support" + +config NFT_CHAIN_NAT_IPV6 + depends on NF_TABLES_IPV6 + depends on NF_NAT_IPV6 && NFT_NAT + tristate "IPv6 nf_tables nat chain support" + config IP6_NF_IPTABLES tristate "IP6 tables support (required for filtering)" depends on INET && IPV6 diff --git a/net/ipv6/netfilter/Makefile b/net/ipv6/netfilter/Makefile index 2b53738f798c..d1b4928f34f7 100644 --- a/net/ipv6/netfilter/Makefile +++ b/net/ipv6/netfilter/Makefile @@ -23,6 +23,11 @@ obj-$(CONFIG_NF_NAT_IPV6) += nf_nat_ipv6.o nf_defrag_ipv6-y := nf_defrag_ipv6_hooks.o nf_conntrack_reasm.o obj-$(CONFIG_NF_DEFRAG_IPV6) += nf_defrag_ipv6.o +# nf_tables +obj-$(CONFIG_NF_TABLES_IPV6) += nf_tables_ipv6.o +obj-$(CONFIG_NFT_CHAIN_ROUTE_IPV6) += nft_chain_route_ipv6.o +obj-$(CONFIG_NFT_CHAIN_NAT_IPV6) += nft_chain_nat_ipv6.o + # matches obj-$(CONFIG_IP6_NF_MATCH_AH) += ip6t_ah.o obj-$(CONFIG_IP6_NF_MATCH_EUI64) += ip6t_eui64.o diff --git a/net/ipv6/netfilter/ip6t_REJECT.c b/net/ipv6/netfilter/ip6t_REJECT.c index 56eef30ee5f6..da00a2ecde55 100644 --- a/net/ipv6/netfilter/ip6t_REJECT.c +++ b/net/ipv6/netfilter/ip6t_REJECT.c @@ -39,7 +39,7 @@ MODULE_DESCRIPTION("Xtables: packet \"rejection\" target for IPv6"); MODULE_LICENSE("GPL"); /* Send RST reply */ -static void send_reset(struct net *net, struct sk_buff *oldskb) +static void send_reset(struct net *net, struct sk_buff *oldskb, int hook) { struct sk_buff *nskb; struct tcphdr otcph, *tcph; @@ -88,8 +88,7 @@ static void send_reset(struct net *net, struct sk_buff *oldskb) } /* Check checksum. */ - if (csum_ipv6_magic(&oip6h->saddr, &oip6h->daddr, otcplen, IPPROTO_TCP, - skb_checksum(oldskb, tcphoff, otcplen, 0))) { + if (nf_ip6_checksum(oldskb, hook, tcphoff, IPPROTO_TCP)) { pr_debug("TCP checksum is invalid\n"); return; } @@ -227,7 +226,7 @@ reject_tg6(struct sk_buff *skb, const struct xt_action_param *par) /* Do nothing */ break; case IP6T_TCP_RESET: - send_reset(net, skb); + send_reset(net, skb, par->hooknum); break; default: net_info_ratelimited("case %u not handled yet\n", reject->with); diff --git a/net/ipv6/netfilter/ip6t_SYNPROXY.c b/net/ipv6/netfilter/ip6t_SYNPROXY.c index 2748b042da72..bf9f612c1bc2 100644 --- a/net/ipv6/netfilter/ip6t_SYNPROXY.c +++ b/net/ipv6/netfilter/ip6t_SYNPROXY.c @@ -312,7 +312,7 @@ synproxy_tg6(struct sk_buff *skb, const struct xt_action_param *par) return XT_CONTINUE; } -static unsigned int ipv6_synproxy_hook(unsigned int hooknum, +static unsigned int ipv6_synproxy_hook(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, diff --git a/net/ipv6/netfilter/ip6table_filter.c b/net/ipv6/netfilter/ip6table_filter.c index 29b44b14c5ea..ca7f6c128086 100644 --- a/net/ipv6/netfilter/ip6table_filter.c +++ b/net/ipv6/netfilter/ip6table_filter.c @@ -32,13 +32,14 @@ static const struct xt_table packet_filter = { /* The work comes in here from netfilter.c. */ static unsigned int -ip6table_filter_hook(unsigned int hook, struct sk_buff *skb, +ip6table_filter_hook(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) { const struct net *net = dev_net((in != NULL) ? in : out); - return ip6t_do_table(skb, hook, in, out, net->ipv6.ip6table_filter); + return ip6t_do_table(skb, ops->hooknum, in, out, + net->ipv6.ip6table_filter); } static struct nf_hook_ops *filter_ops __read_mostly; diff --git a/net/ipv6/netfilter/ip6table_mangle.c b/net/ipv6/netfilter/ip6table_mangle.c index c705907ae6ab..307bbb782d14 100644 --- a/net/ipv6/netfilter/ip6table_mangle.c +++ b/net/ipv6/netfilter/ip6table_mangle.c @@ -76,17 +76,17 @@ ip6t_mangle_out(struct sk_buff *skb, const struct net_device *out) /* The work comes in here from netfilter.c. */ static unsigned int -ip6table_mangle_hook(unsigned int hook, struct sk_buff *skb, +ip6table_mangle_hook(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) { - if (hook == NF_INET_LOCAL_OUT) + if (ops->hooknum == NF_INET_LOCAL_OUT) return ip6t_mangle_out(skb, out); - if (hook == NF_INET_POST_ROUTING) - return ip6t_do_table(skb, hook, in, out, + if (ops->hooknum == NF_INET_POST_ROUTING) + return ip6t_do_table(skb, ops->hooknum, in, out, dev_net(out)->ipv6.ip6table_mangle); /* INPUT/FORWARD */ - return ip6t_do_table(skb, hook, in, out, + return ip6t_do_table(skb, ops->hooknum, in, out, dev_net(in)->ipv6.ip6table_mangle); } diff --git a/net/ipv6/netfilter/ip6table_nat.c b/net/ipv6/netfilter/ip6table_nat.c index 9b076d2d3a7b..84c7f33d0cf8 100644 --- a/net/ipv6/netfilter/ip6table_nat.c +++ b/net/ipv6/netfilter/ip6table_nat.c @@ -63,7 +63,7 @@ static unsigned int nf_nat_rule_find(struct sk_buff *skb, unsigned int hooknum, } static unsigned int -nf_nat_ipv6_fn(unsigned int hooknum, +nf_nat_ipv6_fn(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, @@ -72,7 +72,7 @@ nf_nat_ipv6_fn(unsigned int hooknum, struct nf_conn *ct; enum ip_conntrack_info ctinfo; struct nf_conn_nat *nat; - enum nf_nat_manip_type maniptype = HOOK2MANIP(hooknum); + enum nf_nat_manip_type maniptype = HOOK2MANIP(ops->hooknum); __be16 frag_off; int hdrlen; u8 nexthdr; @@ -111,7 +111,8 @@ nf_nat_ipv6_fn(unsigned int hooknum, if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) { if (!nf_nat_icmpv6_reply_translation(skb, ct, ctinfo, - hooknum, hdrlen)) + ops->hooknum, + hdrlen)) return NF_DROP; else return NF_ACCEPT; @@ -124,14 +125,14 @@ nf_nat_ipv6_fn(unsigned int hooknum, if (!nf_nat_initialized(ct, maniptype)) { unsigned int ret; - ret = nf_nat_rule_find(skb, hooknum, in, out, ct); + ret = nf_nat_rule_find(skb, ops->hooknum, in, out, ct); if (ret != NF_ACCEPT) return ret; } else { pr_debug("Already setup manip %s for ct %p\n", maniptype == NF_NAT_MANIP_SRC ? "SRC" : "DST", ct); - if (nf_nat_oif_changed(hooknum, ctinfo, nat, out)) + if (nf_nat_oif_changed(ops->hooknum, ctinfo, nat, out)) goto oif_changed; } break; @@ -140,11 +141,11 @@ nf_nat_ipv6_fn(unsigned int hooknum, /* ESTABLISHED */ NF_CT_ASSERT(ctinfo == IP_CT_ESTABLISHED || ctinfo == IP_CT_ESTABLISHED_REPLY); - if (nf_nat_oif_changed(hooknum, ctinfo, nat, out)) + if (nf_nat_oif_changed(ops->hooknum, ctinfo, nat, out)) goto oif_changed; } - return nf_nat_packet(ct, ctinfo, hooknum, skb); + return nf_nat_packet(ct, ctinfo, ops->hooknum, skb); oif_changed: nf_ct_kill_acct(ct, ctinfo, skb); @@ -152,7 +153,7 @@ oif_changed: } static unsigned int -nf_nat_ipv6_in(unsigned int hooknum, +nf_nat_ipv6_in(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, @@ -161,7 +162,7 @@ nf_nat_ipv6_in(unsigned int hooknum, unsigned int ret; struct in6_addr daddr = ipv6_hdr(skb)->daddr; - ret = nf_nat_ipv6_fn(hooknum, skb, in, out, okfn); + ret = nf_nat_ipv6_fn(ops, skb, in, out, okfn); if (ret != NF_DROP && ret != NF_STOLEN && ipv6_addr_cmp(&daddr, &ipv6_hdr(skb)->daddr)) skb_dst_drop(skb); @@ -170,7 +171,7 @@ nf_nat_ipv6_in(unsigned int hooknum, } static unsigned int -nf_nat_ipv6_out(unsigned int hooknum, +nf_nat_ipv6_out(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, @@ -187,7 +188,7 @@ nf_nat_ipv6_out(unsigned int hooknum, if (skb->len < sizeof(struct ipv6hdr)) return NF_ACCEPT; - ret = nf_nat_ipv6_fn(hooknum, skb, in, out, okfn); + ret = nf_nat_ipv6_fn(ops, skb, in, out, okfn); #ifdef CONFIG_XFRM if (ret != NF_DROP && ret != NF_STOLEN && !(IP6CB(skb)->flags & IP6SKB_XFRM_TRANSFORMED) && @@ -209,7 +210,7 @@ nf_nat_ipv6_out(unsigned int hooknum, } static unsigned int -nf_nat_ipv6_local_fn(unsigned int hooknum, +nf_nat_ipv6_local_fn(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, @@ -224,7 +225,7 @@ nf_nat_ipv6_local_fn(unsigned int hooknum, if (skb->len < sizeof(struct ipv6hdr)) return NF_ACCEPT; - ret = nf_nat_ipv6_fn(hooknum, skb, in, out, okfn); + ret = nf_nat_ipv6_fn(ops, skb, in, out, okfn); if (ret != NF_DROP && ret != NF_STOLEN && (ct = nf_ct_get(skb, &ctinfo)) != NULL) { enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo); diff --git a/net/ipv6/netfilter/ip6table_raw.c b/net/ipv6/netfilter/ip6table_raw.c index 9a626d86720f..5274740acecc 100644 --- a/net/ipv6/netfilter/ip6table_raw.c +++ b/net/ipv6/netfilter/ip6table_raw.c @@ -19,13 +19,14 @@ static const struct xt_table packet_raw = { /* The work comes in here from netfilter.c. */ static unsigned int -ip6table_raw_hook(unsigned int hook, struct sk_buff *skb, +ip6table_raw_hook(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) { const struct net *net = dev_net((in != NULL) ? in : out); - return ip6t_do_table(skb, hook, in, out, net->ipv6.ip6table_raw); + return ip6t_do_table(skb, ops->hooknum, in, out, + net->ipv6.ip6table_raw); } static struct nf_hook_ops *rawtable_ops __read_mostly; diff --git a/net/ipv6/netfilter/ip6table_security.c b/net/ipv6/netfilter/ip6table_security.c index ce88d1d7e525..ab3b0219ecfa 100644 --- a/net/ipv6/netfilter/ip6table_security.c +++ b/net/ipv6/netfilter/ip6table_security.c @@ -36,14 +36,15 @@ static const struct xt_table security_table = { }; static unsigned int -ip6table_security_hook(unsigned int hook, struct sk_buff *skb, +ip6table_security_hook(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) { const struct net *net = dev_net((in != NULL) ? in : out); - return ip6t_do_table(skb, hook, in, out, net->ipv6.ip6table_security); + return ip6t_do_table(skb, ops->hooknum, in, out, + net->ipv6.ip6table_security); } static struct nf_hook_ops *sectbl_ops __read_mostly; diff --git a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c index d6e4dd8b58df..4cbc6b290dd5 100644 --- a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c +++ b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c @@ -95,7 +95,7 @@ static int ipv6_get_l4proto(const struct sk_buff *skb, unsigned int nhoff, return NF_ACCEPT; } -static unsigned int ipv6_helper(unsigned int hooknum, +static unsigned int ipv6_helper(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, @@ -133,7 +133,7 @@ static unsigned int ipv6_helper(unsigned int hooknum, return helper->help(skb, protoff, ct, ctinfo); } -static unsigned int ipv6_confirm(unsigned int hooknum, +static unsigned int ipv6_confirm(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, @@ -169,66 +169,16 @@ out: return nf_conntrack_confirm(skb); } -static unsigned int __ipv6_conntrack_in(struct net *net, - unsigned int hooknum, - struct sk_buff *skb, - const struct net_device *in, - const struct net_device *out, - int (*okfn)(struct sk_buff *)) -{ - struct sk_buff *reasm = skb->nfct_reasm; - const struct nf_conn_help *help; - struct nf_conn *ct; - enum ip_conntrack_info ctinfo; - - /* This packet is fragmented and has reassembled packet. */ - if (reasm) { - /* Reassembled packet isn't parsed yet ? */ - if (!reasm->nfct) { - unsigned int ret; - - ret = nf_conntrack_in(net, PF_INET6, hooknum, reasm); - if (ret != NF_ACCEPT) - return ret; - } - - /* Conntrack helpers need the entire reassembled packet in the - * POST_ROUTING hook. In case of unconfirmed connections NAT - * might reassign a helper, so the entire packet is also - * required. - */ - ct = nf_ct_get(reasm, &ctinfo); - if (ct != NULL && !nf_ct_is_untracked(ct)) { - help = nfct_help(ct); - if ((help && help->helper) || !nf_ct_is_confirmed(ct)) { - nf_conntrack_get_reasm(reasm); - NF_HOOK_THRESH(NFPROTO_IPV6, hooknum, reasm, - (struct net_device *)in, - (struct net_device *)out, - okfn, NF_IP6_PRI_CONNTRACK + 1); - return NF_DROP_ERR(-ECANCELED); - } - } - - nf_conntrack_get(reasm->nfct); - skb->nfct = reasm->nfct; - skb->nfctinfo = reasm->nfctinfo; - return NF_ACCEPT; - } - - return nf_conntrack_in(net, PF_INET6, hooknum, skb); -} - -static unsigned int ipv6_conntrack_in(unsigned int hooknum, +static unsigned int ipv6_conntrack_in(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) { - return __ipv6_conntrack_in(dev_net(in), hooknum, skb, in, out, okfn); + return nf_conntrack_in(dev_net(in), PF_INET6, ops->hooknum, skb); } -static unsigned int ipv6_conntrack_local(unsigned int hooknum, +static unsigned int ipv6_conntrack_local(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, @@ -239,7 +189,7 @@ static unsigned int ipv6_conntrack_local(unsigned int hooknum, net_notice_ratelimited("ipv6_conntrack_local: packet too short\n"); return NF_ACCEPT; } - return __ipv6_conntrack_in(dev_net(out), hooknum, skb, in, out, okfn); + return nf_conntrack_in(dev_net(out), PF_INET6, ops->hooknum, skb); } static struct nf_hook_ops ipv6_conntrack_ops[] __read_mostly = { @@ -297,9 +247,9 @@ ipv6_getorigdst(struct sock *sk, int optval, void __user *user, int *len) struct nf_conntrack_tuple tuple = { .src.l3num = NFPROTO_IPV6 }; struct nf_conn *ct; - tuple.src.u3.in6 = inet6->rcv_saddr; + tuple.src.u3.in6 = sk->sk_v6_rcv_saddr; tuple.src.u.tcp.port = inet->inet_sport; - tuple.dst.u3.in6 = inet6->daddr; + tuple.dst.u3.in6 = sk->sk_v6_daddr; tuple.dst.u.tcp.port = inet->inet_dport; tuple.dst.protonum = sk->sk_protocol; diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c index dffdc1a389c5..767ab8da8218 100644 --- a/net/ipv6/netfilter/nf_conntrack_reasm.c +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c @@ -144,12 +144,24 @@ static inline u8 ip6_frag_ecn(const struct ipv6hdr *ipv6h) return 1 << (ipv6_get_dsfield(ipv6h) & INET_ECN_MASK); } +static unsigned int nf_hash_frag(__be32 id, const struct in6_addr *saddr, + const struct in6_addr *daddr) +{ + u32 c; + + net_get_random_once(&nf_frags.rnd, sizeof(nf_frags.rnd)); + c = jhash_3words(ipv6_addr_hash(saddr), ipv6_addr_hash(daddr), + (__force u32)id, nf_frags.rnd); + return c & (INETFRAGS_HASHSZ - 1); +} + + static unsigned int nf_hashfn(struct inet_frag_queue *q) { const struct frag_queue *nq; nq = container_of(q, struct frag_queue, q); - return inet6_hash_frag(nq->id, &nq->saddr, &nq->daddr, nf_frags.rnd); + return nf_hash_frag(nq->id, &nq->saddr, &nq->daddr); } static void nf_skb_free(struct sk_buff *skb) @@ -185,7 +197,7 @@ static inline struct frag_queue *fq_find(struct net *net, __be32 id, arg.ecn = ecn; read_lock_bh(&nf_frags.lock); - hash = inet6_hash_frag(id, src, dst, nf_frags.rnd); + hash = nf_hash_frag(id, src, dst); q = inet_frag_find(&net->nf_frag.frags, &nf_frags, &arg, hash); local_bh_enable(); @@ -621,31 +633,16 @@ ret_orig: return skb; } -void nf_ct_frag6_output(unsigned int hooknum, struct sk_buff *skb, - struct net_device *in, struct net_device *out, - int (*okfn)(struct sk_buff *)) +void nf_ct_frag6_consume_orig(struct sk_buff *skb) { struct sk_buff *s, *s2; - unsigned int ret = 0; for (s = NFCT_FRAG6_CB(skb)->orig; s;) { - nf_conntrack_put_reasm(s->nfct_reasm); - nf_conntrack_get_reasm(skb); - s->nfct_reasm = skb; - s2 = s->next; s->next = NULL; - - if (ret != -ECANCELED) - ret = NF_HOOK_THRESH(NFPROTO_IPV6, hooknum, s, - in, out, okfn, - NF_IP6_PRI_CONNTRACK_DEFRAG + 1); - else - kfree_skb(s); - + consume_skb(s); s = s2; } - nf_conntrack_put_reasm(skb); } static int nf_ct_net_init(struct net *net) diff --git a/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c b/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c index aacd121fe8c5..7b9a748c6bac 100644 --- a/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c +++ b/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c @@ -52,7 +52,7 @@ static enum ip6_defrag_users nf_ct6_defrag_user(unsigned int hooknum, } -static unsigned int ipv6_defrag(unsigned int hooknum, +static unsigned int ipv6_defrag(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, @@ -66,7 +66,7 @@ static unsigned int ipv6_defrag(unsigned int hooknum, return NF_ACCEPT; #endif - reasm = nf_ct_frag6_gather(skb, nf_ct6_defrag_user(hooknum, skb)); + reasm = nf_ct_frag6_gather(skb, nf_ct6_defrag_user(ops->hooknum, skb)); /* queued */ if (reasm == NULL) return NF_STOLEN; @@ -75,8 +75,11 @@ static unsigned int ipv6_defrag(unsigned int hooknum, if (reasm == skb) return NF_ACCEPT; - nf_ct_frag6_output(hooknum, reasm, (struct net_device *)in, - (struct net_device *)out, okfn); + nf_ct_frag6_consume_orig(reasm); + + NF_HOOK_THRESH(NFPROTO_IPV6, ops->hooknum, reasm, + (struct net_device *) in, (struct net_device *) out, + okfn, NF_IP6_PRI_CONNTRACK_DEFRAG + 1); return NF_STOLEN; } diff --git a/net/ipv6/netfilter/nf_tables_ipv6.c b/net/ipv6/netfilter/nf_tables_ipv6.c new file mode 100644 index 000000000000..d77db8a13505 --- /dev/null +++ b/net/ipv6/netfilter/nf_tables_ipv6.c @@ -0,0 +1,127 @@ +/* + * Copyright (c) 2008 Patrick McHardy <kaber@trash.net> + * Copyright (c) 2012-2013 Pablo Neira Ayuso <pablo@netfilter.org> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Development of this code funded by Astaro AG (http://www.astaro.com/) + */ + +#include <linux/init.h> +#include <linux/module.h> +#include <linux/ipv6.h> +#include <linux/netfilter_ipv6.h> +#include <net/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables_ipv6.h> + +static unsigned int nft_ipv6_output(const struct nf_hook_ops *ops, + struct sk_buff *skb, + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) +{ + struct nft_pktinfo pkt; + + if (unlikely(skb->len < sizeof(struct ipv6hdr))) { + if (net_ratelimit()) + pr_info("nf_tables_ipv6: ignoring short SOCK_RAW " + "packet\n"); + return NF_ACCEPT; + } + if (nft_set_pktinfo_ipv6(&pkt, ops, skb, in, out) < 0) + return NF_DROP; + + return nft_do_chain_pktinfo(&pkt, ops); +} + +static struct nft_af_info nft_af_ipv6 __read_mostly = { + .family = NFPROTO_IPV6, + .nhooks = NF_INET_NUMHOOKS, + .owner = THIS_MODULE, + .hooks = { + [NF_INET_LOCAL_OUT] = nft_ipv6_output, + }, +}; + +static int nf_tables_ipv6_init_net(struct net *net) +{ + net->nft.ipv6 = kmalloc(sizeof(struct nft_af_info), GFP_KERNEL); + if (net->nft.ipv6 == NULL) + return -ENOMEM; + + memcpy(net->nft.ipv6, &nft_af_ipv6, sizeof(nft_af_ipv6)); + + if (nft_register_afinfo(net, net->nft.ipv6) < 0) + goto err; + + return 0; +err: + kfree(net->nft.ipv6); + return -ENOMEM; +} + +static void nf_tables_ipv6_exit_net(struct net *net) +{ + nft_unregister_afinfo(net->nft.ipv6); + kfree(net->nft.ipv6); +} + +static struct pernet_operations nf_tables_ipv6_net_ops = { + .init = nf_tables_ipv6_init_net, + .exit = nf_tables_ipv6_exit_net, +}; + +static unsigned int +nft_do_chain_ipv6(const struct nf_hook_ops *ops, + struct sk_buff *skb, + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) +{ + struct nft_pktinfo pkt; + + /* malformed packet, drop it */ + if (nft_set_pktinfo_ipv6(&pkt, ops, skb, in, out) < 0) + return NF_DROP; + + return nft_do_chain_pktinfo(&pkt, ops); +} + +static struct nf_chain_type filter_ipv6 = { + .family = NFPROTO_IPV6, + .name = "filter", + .type = NFT_CHAIN_T_DEFAULT, + .hook_mask = (1 << NF_INET_LOCAL_IN) | + (1 << NF_INET_LOCAL_OUT) | + (1 << NF_INET_FORWARD) | + (1 << NF_INET_PRE_ROUTING) | + (1 << NF_INET_POST_ROUTING), + .fn = { + [NF_INET_LOCAL_IN] = nft_do_chain_ipv6, + [NF_INET_LOCAL_OUT] = nft_ipv6_output, + [NF_INET_FORWARD] = nft_do_chain_ipv6, + [NF_INET_PRE_ROUTING] = nft_do_chain_ipv6, + [NF_INET_POST_ROUTING] = nft_do_chain_ipv6, + }, +}; + +static int __init nf_tables_ipv6_init(void) +{ + nft_register_chain_type(&filter_ipv6); + return register_pernet_subsys(&nf_tables_ipv6_net_ops); +} + +static void __exit nf_tables_ipv6_exit(void) +{ + unregister_pernet_subsys(&nf_tables_ipv6_net_ops); + nft_unregister_chain_type(&filter_ipv6); +} + +module_init(nf_tables_ipv6_init); +module_exit(nf_tables_ipv6_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>"); +MODULE_ALIAS_NFT_FAMILY(AF_INET6); diff --git a/net/ipv6/netfilter/nft_chain_nat_ipv6.c b/net/ipv6/netfilter/nft_chain_nat_ipv6.c new file mode 100644 index 000000000000..e86dcd70dc76 --- /dev/null +++ b/net/ipv6/netfilter/nft_chain_nat_ipv6.c @@ -0,0 +1,211 @@ +/* + * Copyright (c) 2011 Patrick McHardy <kaber@trash.net> + * Copyright (c) 2012 Intel Corporation + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + */ + +#include <linux/module.h> +#include <linux/init.h> +#include <linux/list.h> +#include <linux/skbuff.h> +#include <linux/ip.h> +#include <linux/netfilter.h> +#include <linux/netfilter_ipv6.h> +#include <linux/netfilter/nf_tables.h> +#include <net/netfilter/nf_conntrack.h> +#include <net/netfilter/nf_nat.h> +#include <net/netfilter/nf_nat_core.h> +#include <net/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables_ipv6.h> +#include <net/netfilter/nf_nat_l3proto.h> +#include <net/ipv6.h> + +/* + * IPv6 NAT chains + */ + +static unsigned int nf_nat_ipv6_fn(const struct nf_hook_ops *ops, + struct sk_buff *skb, + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) +{ + enum ip_conntrack_info ctinfo; + struct nf_conn *ct = nf_ct_get(skb, &ctinfo); + struct nf_conn_nat *nat; + enum nf_nat_manip_type maniptype = HOOK2MANIP(ops->hooknum); + __be16 frag_off; + int hdrlen; + u8 nexthdr; + struct nft_pktinfo pkt; + unsigned int ret; + + if (ct == NULL || nf_ct_is_untracked(ct)) + return NF_ACCEPT; + + nat = nfct_nat(ct); + if (nat == NULL) { + /* Conntrack module was loaded late, can't add extension. */ + if (nf_ct_is_confirmed(ct)) + return NF_ACCEPT; + nat = nf_ct_ext_add(ct, NF_CT_EXT_NAT, GFP_ATOMIC); + if (nat == NULL) + return NF_ACCEPT; + } + + switch (ctinfo) { + case IP_CT_RELATED: + case IP_CT_RELATED + IP_CT_IS_REPLY: + nexthdr = ipv6_hdr(skb)->nexthdr; + hdrlen = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr), + &nexthdr, &frag_off); + + if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) { + if (!nf_nat_icmpv6_reply_translation(skb, ct, ctinfo, + ops->hooknum, + hdrlen)) + return NF_DROP; + else + return NF_ACCEPT; + } + /* Fall through */ + case IP_CT_NEW: + if (nf_nat_initialized(ct, maniptype)) + break; + + nft_set_pktinfo_ipv6(&pkt, ops, skb, in, out); + + ret = nft_do_chain_pktinfo(&pkt, ops); + if (ret != NF_ACCEPT) + return ret; + if (!nf_nat_initialized(ct, maniptype)) { + ret = nf_nat_alloc_null_binding(ct, ops->hooknum); + if (ret != NF_ACCEPT) + return ret; + } + default: + break; + } + + return nf_nat_packet(ct, ctinfo, ops->hooknum, skb); +} + +static unsigned int nf_nat_ipv6_prerouting(const struct nf_hook_ops *ops, + struct sk_buff *skb, + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) +{ + struct in6_addr daddr = ipv6_hdr(skb)->daddr; + unsigned int ret; + + ret = nf_nat_ipv6_fn(ops, skb, in, out, okfn); + if (ret != NF_DROP && ret != NF_STOLEN && + ipv6_addr_cmp(&daddr, &ipv6_hdr(skb)->daddr)) + skb_dst_drop(skb); + + return ret; +} + +static unsigned int nf_nat_ipv6_postrouting(const struct nf_hook_ops *ops, + struct sk_buff *skb, + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) +{ + enum ip_conntrack_info ctinfo __maybe_unused; + const struct nf_conn *ct __maybe_unused; + unsigned int ret; + + ret = nf_nat_ipv6_fn(ops, skb, in, out, okfn); +#ifdef CONFIG_XFRM + if (ret != NF_DROP && ret != NF_STOLEN && + !(IP6CB(skb)->flags & IP6SKB_XFRM_TRANSFORMED) && + (ct = nf_ct_get(skb, &ctinfo)) != NULL) { + enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo); + + if (!nf_inet_addr_cmp(&ct->tuplehash[dir].tuple.src.u3, + &ct->tuplehash[!dir].tuple.dst.u3) || + (ct->tuplehash[dir].tuple.src.u.all != + ct->tuplehash[!dir].tuple.dst.u.all)) + if (nf_xfrm_me_harder(skb, AF_INET6) < 0) + ret = NF_DROP; + } +#endif + return ret; +} + +static unsigned int nf_nat_ipv6_output(const struct nf_hook_ops *ops, + struct sk_buff *skb, + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) +{ + enum ip_conntrack_info ctinfo; + const struct nf_conn *ct; + unsigned int ret; + + ret = nf_nat_ipv6_fn(ops, skb, in, out, okfn); + if (ret != NF_DROP && ret != NF_STOLEN && + (ct = nf_ct_get(skb, &ctinfo)) != NULL) { + enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo); + + if (!nf_inet_addr_cmp(&ct->tuplehash[dir].tuple.dst.u3, + &ct->tuplehash[!dir].tuple.src.u3)) { + if (ip6_route_me_harder(skb)) + ret = NF_DROP; + } +#ifdef CONFIG_XFRM + else if (!(IP6CB(skb)->flags & IP6SKB_XFRM_TRANSFORMED) && + ct->tuplehash[dir].tuple.dst.u.all != + ct->tuplehash[!dir].tuple.src.u.all) + if (nf_xfrm_me_harder(skb, AF_INET6)) + ret = NF_DROP; +#endif + } + return ret; +} + +static struct nf_chain_type nft_chain_nat_ipv6 = { + .family = NFPROTO_IPV6, + .name = "nat", + .type = NFT_CHAIN_T_NAT, + .hook_mask = (1 << NF_INET_PRE_ROUTING) | + (1 << NF_INET_POST_ROUTING) | + (1 << NF_INET_LOCAL_OUT) | + (1 << NF_INET_LOCAL_IN), + .fn = { + [NF_INET_PRE_ROUTING] = nf_nat_ipv6_prerouting, + [NF_INET_POST_ROUTING] = nf_nat_ipv6_postrouting, + [NF_INET_LOCAL_OUT] = nf_nat_ipv6_output, + [NF_INET_LOCAL_IN] = nf_nat_ipv6_fn, + }, + .me = THIS_MODULE, +}; + +static int __init nft_chain_nat_ipv6_init(void) +{ + int err; + + err = nft_register_chain_type(&nft_chain_nat_ipv6); + if (err < 0) + return err; + + return 0; +} + +static void __exit nft_chain_nat_ipv6_exit(void) +{ + nft_unregister_chain_type(&nft_chain_nat_ipv6); +} + +module_init(nft_chain_nat_ipv6_init); +module_exit(nft_chain_nat_ipv6_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>"); +MODULE_ALIAS_NFT_CHAIN(AF_INET6, "nat"); diff --git a/net/ipv6/netfilter/nft_chain_route_ipv6.c b/net/ipv6/netfilter/nft_chain_route_ipv6.c new file mode 100644 index 000000000000..3fe40f0456ad --- /dev/null +++ b/net/ipv6/netfilter/nft_chain_route_ipv6.c @@ -0,0 +1,88 @@ +/* + * Copyright (c) 2008 Patrick McHardy <kaber@trash.net> + * Copyright (c) 2012 Pablo Neira Ayuso <pablo@netfilter.org> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Development of this code funded by Astaro AG (http://www.astaro.com/) + */ + +#include <linux/module.h> +#include <linux/init.h> +#include <linux/list.h> +#include <linux/skbuff.h> +#include <linux/netlink.h> +#include <linux/netfilter.h> +#include <linux/netfilter_ipv6.h> +#include <linux/netfilter/nfnetlink.h> +#include <linux/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables_ipv6.h> +#include <net/route.h> + +static unsigned int nf_route_table_hook(const struct nf_hook_ops *ops, + struct sk_buff *skb, + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) +{ + unsigned int ret; + struct nft_pktinfo pkt; + struct in6_addr saddr, daddr; + u_int8_t hop_limit; + u32 mark, flowlabel; + + /* malformed packet, drop it */ + if (nft_set_pktinfo_ipv6(&pkt, ops, skb, in, out) < 0) + return NF_DROP; + + /* save source/dest address, mark, hoplimit, flowlabel, priority */ + memcpy(&saddr, &ipv6_hdr(skb)->saddr, sizeof(saddr)); + memcpy(&daddr, &ipv6_hdr(skb)->daddr, sizeof(daddr)); + mark = skb->mark; + hop_limit = ipv6_hdr(skb)->hop_limit; + + /* flowlabel and prio (includes version, which shouldn't change either */ + flowlabel = *((u32 *)ipv6_hdr(skb)); + + ret = nft_do_chain_pktinfo(&pkt, ops); + if (ret != NF_DROP && ret != NF_QUEUE && + (memcmp(&ipv6_hdr(skb)->saddr, &saddr, sizeof(saddr)) || + memcmp(&ipv6_hdr(skb)->daddr, &daddr, sizeof(daddr)) || + skb->mark != mark || + ipv6_hdr(skb)->hop_limit != hop_limit || + flowlabel != *((u_int32_t *)ipv6_hdr(skb)))) + return ip6_route_me_harder(skb) == 0 ? ret : NF_DROP; + + return ret; +} + +static struct nf_chain_type nft_chain_route_ipv6 = { + .family = NFPROTO_IPV6, + .name = "route", + .type = NFT_CHAIN_T_ROUTE, + .hook_mask = (1 << NF_INET_LOCAL_OUT), + .fn = { + [NF_INET_LOCAL_OUT] = nf_route_table_hook, + }, + .me = THIS_MODULE, +}; + +static int __init nft_chain_route_init(void) +{ + return nft_register_chain_type(&nft_chain_route_ipv6); +} + +static void __exit nft_chain_route_exit(void) +{ + nft_unregister_chain_type(&nft_chain_route_ipv6); +} + +module_init(nft_chain_route_init); +module_exit(nft_chain_route_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>"); +MODULE_ALIAS_NFT_CHAIN(AF_INET6, "route"); diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c index 18f19df4189f..8815e31a87fe 100644 --- a/net/ipv6/ping.c +++ b/net/ipv6/ping.c @@ -116,7 +116,7 @@ int ping_v6_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, } else { if (sk->sk_state != TCP_ESTABLISHED) return -EDESTADDRREQ; - daddr = &np->daddr; + daddr = &sk->sk_v6_daddr; } if (!iif) diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index a4ed2416399e..3c00842b0079 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -77,20 +77,19 @@ static struct sock *__raw_v6_lookup(struct net *net, struct sock *sk, sk_for_each_from(sk) if (inet_sk(sk)->inet_num == num) { - struct ipv6_pinfo *np = inet6_sk(sk); if (!net_eq(sock_net(sk), net)) continue; - if (!ipv6_addr_any(&np->daddr) && - !ipv6_addr_equal(&np->daddr, rmt_addr)) + if (!ipv6_addr_any(&sk->sk_v6_daddr) && + !ipv6_addr_equal(&sk->sk_v6_daddr, rmt_addr)) continue; if (sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif) continue; - if (!ipv6_addr_any(&np->rcv_saddr)) { - if (ipv6_addr_equal(&np->rcv_saddr, loc_addr)) + if (!ipv6_addr_any(&sk->sk_v6_rcv_saddr)) { + if (ipv6_addr_equal(&sk->sk_v6_rcv_saddr, loc_addr)) goto found; if (is_multicast && inet6_mc_check(sk, loc_addr, rmt_addr)) @@ -302,7 +301,7 @@ static int rawv6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len) } inet->inet_rcv_saddr = inet->inet_saddr = v4addr; - np->rcv_saddr = addr->sin6_addr; + sk->sk_v6_rcv_saddr = addr->sin6_addr; if (!(addr_type & IPV6_ADDR_MULTICAST)) np->saddr = addr->sin6_addr; err = 0; @@ -804,8 +803,8 @@ static int rawv6_sendmsg(struct kiocb *iocb, struct sock *sk, * sk->sk_dst_cache. */ if (sk->sk_state == TCP_ESTABLISHED && - ipv6_addr_equal(daddr, &np->daddr)) - daddr = &np->daddr; + ipv6_addr_equal(daddr, &sk->sk_v6_daddr)) + daddr = &sk->sk_v6_daddr; if (addr_len >= sizeof(struct sockaddr_in6) && sin6->sin6_scope_id && @@ -816,7 +815,7 @@ static int rawv6_sendmsg(struct kiocb *iocb, struct sock *sk, return -EDESTADDRREQ; proto = inet->inet_num; - daddr = &np->daddr; + daddr = &sk->sk_v6_daddr; fl6.flowlabel = np->flow_label; } diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c index 1aeb473b2cc6..cc85a9ba5010 100644 --- a/net/ipv6/reassembly.c +++ b/net/ipv6/reassembly.c @@ -82,24 +82,24 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev, * callers should be careful not to use the hash value outside the ipfrag_lock * as doing so could race with ipfrag_hash_rnd being recalculated. */ -unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr, - const struct in6_addr *daddr, u32 rnd) +static unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr, + const struct in6_addr *daddr) { u32 c; + net_get_random_once(&ip6_frags.rnd, sizeof(ip6_frags.rnd)); c = jhash_3words(ipv6_addr_hash(saddr), ipv6_addr_hash(daddr), - (__force u32)id, rnd); + (__force u32)id, ip6_frags.rnd); return c & (INETFRAGS_HASHSZ - 1); } -EXPORT_SYMBOL_GPL(inet6_hash_frag); static unsigned int ip6_hashfn(struct inet_frag_queue *q) { struct frag_queue *fq; fq = container_of(q, struct frag_queue, q); - return inet6_hash_frag(fq->id, &fq->saddr, &fq->daddr, ip6_frags.rnd); + return inet6_hash_frag(fq->id, &fq->saddr, &fq->daddr); } bool ip6_frag_match(struct inet_frag_queue *q, void *a) @@ -193,7 +193,7 @@ fq_find(struct net *net, __be32 id, const struct in6_addr *src, arg.ecn = ecn; read_lock(&ip6_frags.lock); - hash = inet6_hash_frag(id, src, dst, ip6_frags.rnd); + hash = inet6_hash_frag(id, src, dst); q = inet_frag_find(&net->ipv6.frags, &ip6_frags, &arg, hash); if (IS_ERR_OR_NULL(q)) { diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 04e17b3309fb..7faa9d5e1503 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -619,7 +619,7 @@ static struct rt6_info *find_match(struct rt6_info *rt, int oif, int strict, goto out; m = rt6_score_route(rt, oif, strict); - if (m == RT6_NUD_FAIL_SOFT && !IS_ENABLED(CONFIG_IPV6_ROUTER_PREF)) { + if (m == RT6_NUD_FAIL_SOFT) { match_do_rr = true; m = 0; /* lowest valid score */ } else if (m < 0) { @@ -731,8 +731,11 @@ int rt6_route_rcv(struct net_device *dev, u8 *opt, int len, prefix = &prefix_buf; } - rt = rt6_get_route_info(net, prefix, rinfo->prefix_len, gwaddr, - dev->ifindex); + if (rinfo->prefix_len == 0) + rt = rt6_get_dflt_router(gwaddr, dev); + else + rt = rt6_get_route_info(net, prefix, rinfo->prefix_len, + gwaddr, dev->ifindex); if (rt && !lifetime) { ip6_del_rt(rt); @@ -871,11 +874,9 @@ static struct rt6_info *rt6_alloc_cow(struct rt6_info *ort, rt = ip6_rt_copy(ort, daddr); if (rt) { - if (!(rt->rt6i_flags & RTF_GATEWAY)) { - if (ort->rt6i_dst.plen != 128 && - ipv6_addr_equal(&ort->rt6i_dst.addr, daddr)) - rt->rt6i_flags |= RTF_ANYCAST; - } + if (ort->rt6i_dst.plen != 128 && + ipv6_addr_equal(&ort->rt6i_dst.addr, daddr)) + rt->rt6i_flags |= RTF_ANYCAST; rt->rt6i_flags |= RTF_CACHE; @@ -1163,7 +1164,6 @@ void ip6_update_pmtu(struct sk_buff *skb, struct net *net, __be32 mtu, memset(&fl6, 0, sizeof(fl6)); fl6.flowi6_oif = oif; fl6.flowi6_mark = mark; - fl6.flowi6_flags = 0; fl6.daddr = iph->daddr; fl6.saddr = iph->saddr; fl6.flowlabel = ip6_flowinfo(iph); @@ -1262,7 +1262,6 @@ void ip6_redirect(struct sk_buff *skb, struct net *net, int oif, u32 mark) memset(&fl6, 0, sizeof(fl6)); fl6.flowi6_oif = oif; fl6.flowi6_mark = mark; - fl6.flowi6_flags = 0; fl6.daddr = iph->daddr; fl6.saddr = iph->saddr; fl6.flowlabel = ip6_flowinfo(iph); @@ -1284,7 +1283,6 @@ void ip6_redirect_no_header(struct sk_buff *skb, struct net *net, int oif, memset(&fl6, 0, sizeof(fl6)); fl6.flowi6_oif = oif; fl6.flowi6_mark = mark; - fl6.flowi6_flags = 0; fl6.daddr = msg->dest; fl6.saddr = iph->daddr; @@ -2831,56 +2829,12 @@ static int ip6_route_dev_notify(struct notifier_block *this, #ifdef CONFIG_PROC_FS -struct rt6_proc_arg -{ - char *buffer; - int offset; - int length; - int skip; - int len; -}; - -static int rt6_info_route(struct rt6_info *rt, void *p_arg) -{ - struct seq_file *m = p_arg; - - seq_printf(m, "%pi6 %02x ", &rt->rt6i_dst.addr, rt->rt6i_dst.plen); - -#ifdef CONFIG_IPV6_SUBTREES - seq_printf(m, "%pi6 %02x ", &rt->rt6i_src.addr, rt->rt6i_src.plen); -#else - seq_puts(m, "00000000000000000000000000000000 00 "); -#endif - if (rt->rt6i_flags & RTF_GATEWAY) { - seq_printf(m, "%pi6", &rt->rt6i_gateway); - } else { - seq_puts(m, "00000000000000000000000000000000"); - } - seq_printf(m, " %08x %08x %08x %08x %8s\n", - rt->rt6i_metric, atomic_read(&rt->dst.__refcnt), - rt->dst.__use, rt->rt6i_flags, - rt->dst.dev ? rt->dst.dev->name : ""); - return 0; -} - -static int ipv6_route_show(struct seq_file *m, void *v) -{ - struct net *net = (struct net *)m->private; - fib6_clean_all_ro(net, rt6_info_route, 0, m); - return 0; -} - -static int ipv6_route_open(struct inode *inode, struct file *file) -{ - return single_open_net(inode, file, ipv6_route_show); -} - static const struct file_operations ipv6_route_proc_fops = { .owner = THIS_MODULE, .open = ipv6_route_open, .read = seq_read, .llseek = seq_lseek, - .release = single_release_net, + .release = seq_release_net, }; static int rt6_stats_seq_show(struct seq_file *seq, void *v) diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c index 19269453a8ea..3a9038dd818d 100644 --- a/net/ipv6/sit.c +++ b/net/ipv6/sit.c @@ -933,10 +933,9 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb, ttl = iph6->hop_limit; tos = INET_ECN_encapsulate(tos, ipv6_get_dsfield(iph6)); - if (likely(!skb->encapsulation)) { - skb_reset_inner_headers(skb); - skb->encapsulation = 1; - } + skb = iptunnel_handle_offloads(skb, false, SKB_GSO_SIT); + if (IS_ERR(skb)) + goto out; err = iptunnel_xmit(rt, skb, fl4.saddr, fl4.daddr, IPPROTO_IPV6, tos, ttl, df, !net_eq(tunnel->net, dev_net(dev))); @@ -946,8 +945,9 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb, tx_error_icmp: dst_link_failure(skb); tx_error: - dev->stats.tx_errors++; dev_kfree_skb(skb); +out: + dev->stats.tx_errors++; return NETDEV_TX_OK; } @@ -956,13 +956,15 @@ static netdev_tx_t ipip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev) struct ip_tunnel *tunnel = netdev_priv(dev); const struct iphdr *tiph = &tunnel->parms.iph; - if (likely(!skb->encapsulation)) { - skb_reset_inner_headers(skb); - skb->encapsulation = 1; - } + skb = iptunnel_handle_offloads(skb, false, SKB_GSO_IPIP); + if (IS_ERR(skb)) + goto out; ip_tunnel_xmit(skb, dev, tiph, IPPROTO_IPIP); return NETDEV_TX_OK; +out: + dev->stats.tx_errors++; + return NETDEV_TX_OK; } static netdev_tx_t sit_tunnel_xmit(struct sk_buff *skb, @@ -1292,6 +1294,12 @@ static void ipip6_dev_free(struct net_device *dev) free_netdev(dev); } +#define SIT_FEATURES (NETIF_F_SG | \ + NETIF_F_FRAGLIST | \ + NETIF_F_HIGHDMA | \ + NETIF_F_GSO_SOFTWARE | \ + NETIF_F_HW_CSUM) + static void ipip6_tunnel_setup(struct net_device *dev) { dev->netdev_ops = &ipip6_netdev_ops; @@ -1305,6 +1313,8 @@ static void ipip6_tunnel_setup(struct net_device *dev) dev->iflink = 0; dev->addr_len = 4; dev->features |= NETIF_F_LLTX; + dev->features |= SIT_FEATURES; + dev->hw_features |= SIT_FEATURES; } static int ipip6_tunnel_init(struct net_device *dev) diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c index bf63ac8a49b9..535a3ad262f1 100644 --- a/net/ipv6/syncookies.c +++ b/net/ipv6/syncookies.c @@ -24,26 +24,23 @@ #define COOKIEBITS 24 /* Upper bits store count */ #define COOKIEMASK (((__u32)1 << COOKIEBITS) - 1) -/* Table must be sorted. */ +static u32 syncookie6_secret[2][16-4+SHA_DIGEST_WORDS]; + +/* RFC 2460, Section 8.3: + * [ipv6 tcp] MSS must be computed as the maximum packet size minus 60 [..] + * + * Due to IPV6_MIN_MTU=1280 the lowest possible MSS is 1220, which allows + * using higher values than ipv4 tcp syncookies. + * The other values are chosen based on ethernet (1500 and 9k MTU), plus + * one that accounts for common encap (PPPoe) overhead. Table must be sorted. + */ static __u16 const msstab[] = { - 64, - 512, - 536, - 1280 - 60, + 1280 - 60, /* IPV6_MIN_MTU - 60 */ 1480 - 60, 1500 - 60, - 4460 - 60, 9000 - 60, }; -/* - * This (misnamed) value is the age of syncookie which is permitted. - * Its ideal value should be dependent on TCP_TIMEOUT_INIT and - * sysctl_tcp_retries1. It's a rather complicated formula (exponential - * backoff) to compute at runtime so it's currently hardcoded here. - */ -#define COUNTER_TRIES 4 - static inline struct sock *get_cookie_sock(struct sock *sk, struct sk_buff *skb, struct request_sock *req, struct dst_entry *dst) @@ -66,14 +63,18 @@ static DEFINE_PER_CPU(__u32 [16 + 5 + SHA_WORKSPACE_WORDS], static u32 cookie_hash(const struct in6_addr *saddr, const struct in6_addr *daddr, __be16 sport, __be16 dport, u32 count, int c) { - __u32 *tmp = __get_cpu_var(ipv6_cookie_scratch); + __u32 *tmp; + + net_get_random_once(syncookie6_secret, sizeof(syncookie6_secret)); + + tmp = __get_cpu_var(ipv6_cookie_scratch); /* * we have 320 bits of information to hash, copy in the remaining - * 192 bits required for sha_transform, from the syncookie_secret + * 192 bits required for sha_transform, from the syncookie6_secret * and overwrite the digest with the secret */ - memcpy(tmp + 10, syncookie_secret[c], 44); + memcpy(tmp + 10, syncookie6_secret[c], 44); memcpy(tmp, saddr, 16); memcpy(tmp + 4, daddr, 16); tmp[8] = ((__force u32)sport << 16) + (__force u32)dport; @@ -86,8 +87,9 @@ static u32 cookie_hash(const struct in6_addr *saddr, const struct in6_addr *dadd static __u32 secure_tcp_syn_cookie(const struct in6_addr *saddr, const struct in6_addr *daddr, __be16 sport, __be16 dport, __u32 sseq, - __u32 count, __u32 data) + __u32 data) { + u32 count = tcp_cookie_time(); return (cookie_hash(saddr, daddr, sport, dport, 0, 0) + sseq + (count << COOKIEBITS) + ((cookie_hash(saddr, daddr, sport, dport, count, 1) + data) @@ -96,15 +98,14 @@ static __u32 secure_tcp_syn_cookie(const struct in6_addr *saddr, static __u32 check_tcp_syn_cookie(__u32 cookie, const struct in6_addr *saddr, const struct in6_addr *daddr, __be16 sport, - __be16 dport, __u32 sseq, __u32 count, - __u32 maxdiff) + __be16 dport, __u32 sseq) { - __u32 diff; + __u32 diff, count = tcp_cookie_time(); cookie -= cookie_hash(saddr, daddr, sport, dport, 0, 0) + sseq; diff = (count - (cookie >> COOKIEBITS)) & ((__u32) -1 >> COOKIEBITS); - if (diff >= maxdiff) + if (diff >= MAX_SYNCOOKIE_AGE) return (__u32)-1; return (cookie - @@ -125,8 +126,7 @@ u32 __cookie_v6_init_sequence(const struct ipv6hdr *iph, *mssp = msstab[mssind]; return secure_tcp_syn_cookie(&iph->saddr, &iph->daddr, th->source, - th->dest, ntohl(th->seq), - jiffies / (HZ * 60), mssind); + th->dest, ntohl(th->seq), mssind); } EXPORT_SYMBOL_GPL(__cookie_v6_init_sequence); @@ -146,8 +146,7 @@ int __cookie_v6_check(const struct ipv6hdr *iph, const struct tcphdr *th, { __u32 seq = ntohl(th->seq) - 1; __u32 mssind = check_tcp_syn_cookie(cookie, &iph->saddr, &iph->daddr, - th->source, th->dest, seq, - jiffies / (HZ * 60), COUNTER_TRIES); + th->source, th->dest, seq); return mssind < ARRAY_SIZE(msstab) ? msstab[mssind] : 0; } @@ -157,7 +156,6 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb) { struct tcp_options_received tcp_opt; struct inet_request_sock *ireq; - struct inet6_request_sock *ireq6; struct tcp_request_sock *treq; struct ipv6_pinfo *np = inet6_sk(sk); struct tcp_sock *tp = tcp_sk(sk); @@ -194,7 +192,6 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb) goto out; ireq = inet_rsk(req); - ireq6 = inet6_rsk(req); treq = tcp_rsk(req); treq->listener = NULL; @@ -202,22 +199,22 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb) goto out_free; req->mss = mss; - ireq->rmt_port = th->source; - ireq->loc_port = th->dest; - ireq6->rmt_addr = ipv6_hdr(skb)->saddr; - ireq6->loc_addr = ipv6_hdr(skb)->daddr; + ireq->ir_rmt_port = th->source; + ireq->ir_num = ntohs(th->dest); + ireq->ir_v6_rmt_addr = ipv6_hdr(skb)->saddr; + ireq->ir_v6_loc_addr = ipv6_hdr(skb)->daddr; if (ipv6_opt_accepted(sk, skb) || np->rxopt.bits.rxinfo || np->rxopt.bits.rxoinfo || np->rxopt.bits.rxhlim || np->rxopt.bits.rxohlim) { atomic_inc(&skb->users); - ireq6->pktopts = skb; + ireq->pktopts = skb; } - ireq6->iif = sk->sk_bound_dev_if; + ireq->ir_iif = sk->sk_bound_dev_if; /* So that link locals have meaning */ if (!sk->sk_bound_dev_if && - ipv6_addr_type(&ireq6->rmt_addr) & IPV6_ADDR_LINKLOCAL) - ireq6->iif = inet6_iif(skb); + ipv6_addr_type(&ireq->ir_v6_rmt_addr) & IPV6_ADDR_LINKLOCAL) + ireq->ir_iif = inet6_iif(skb); req->expires = 0UL; req->num_retrans = 0; @@ -241,12 +238,12 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb) struct flowi6 fl6; memset(&fl6, 0, sizeof(fl6)); fl6.flowi6_proto = IPPROTO_TCP; - fl6.daddr = ireq6->rmt_addr; + fl6.daddr = ireq->ir_v6_rmt_addr; final_p = fl6_update_dst(&fl6, np->opt, &final); - fl6.saddr = ireq6->loc_addr; + fl6.saddr = ireq->ir_v6_loc_addr; fl6.flowi6_oif = sk->sk_bound_dev_if; fl6.flowi6_mark = sk->sk_mark; - fl6.fl6_dport = inet_rsk(req)->rmt_port; + fl6.fl6_dport = ireq->ir_rmt_port; fl6.fl6_sport = inet_sk(sk)->inet_sport; security_req_classify_flow(req, flowi6_to_flowi(&fl6)); diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 5c71501fc917..0740f93a114a 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -192,13 +192,13 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr, } if (tp->rx_opt.ts_recent_stamp && - !ipv6_addr_equal(&np->daddr, &usin->sin6_addr)) { + !ipv6_addr_equal(&sk->sk_v6_daddr, &usin->sin6_addr)) { tp->rx_opt.ts_recent = 0; tp->rx_opt.ts_recent_stamp = 0; tp->write_seq = 0; } - np->daddr = usin->sin6_addr; + sk->sk_v6_daddr = usin->sin6_addr; np->flow_label = fl6.flowlabel; /* @@ -237,17 +237,17 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr, } else { ipv6_addr_set_v4mapped(inet->inet_saddr, &np->saddr); ipv6_addr_set_v4mapped(inet->inet_rcv_saddr, - &np->rcv_saddr); + &sk->sk_v6_rcv_saddr); } return err; } - if (!ipv6_addr_any(&np->rcv_saddr)) - saddr = &np->rcv_saddr; + if (!ipv6_addr_any(&sk->sk_v6_rcv_saddr)) + saddr = &sk->sk_v6_rcv_saddr; fl6.flowi6_proto = IPPROTO_TCP; - fl6.daddr = np->daddr; + fl6.daddr = sk->sk_v6_daddr; fl6.saddr = saddr ? *saddr : np->saddr; fl6.flowi6_oif = sk->sk_bound_dev_if; fl6.flowi6_mark = sk->sk_mark; @@ -266,7 +266,7 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr, if (saddr == NULL) { saddr = &fl6.saddr; - np->rcv_saddr = *saddr; + sk->sk_v6_rcv_saddr = *saddr; } /* set the source address */ @@ -279,7 +279,7 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr, rt = (struct rt6_info *) dst; if (tcp_death_row.sysctl_tw_recycle && !tp->rx_opt.ts_recent_stamp && - ipv6_addr_equal(&rt->rt6i_dst.addr, &np->daddr)) + ipv6_addr_equal(&rt->rt6i_dst.addr, &sk->sk_v6_daddr)) tcp_fetch_timewait_stamp(sk, dst); icsk->icsk_ext_hdr_len = 0; @@ -298,7 +298,7 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr, if (!tp->write_seq && likely(!tp->repair)) tp->write_seq = secure_tcpv6_sequence_number(np->saddr.s6_addr32, - np->daddr.s6_addr32, + sk->sk_v6_daddr.s6_addr32, inet->inet_sport, inet->inet_dport); @@ -465,7 +465,7 @@ static int tcp_v6_send_synack(struct sock *sk, struct dst_entry *dst, struct request_sock *req, u16 queue_mapping) { - struct inet6_request_sock *treq = inet6_rsk(req); + struct inet_request_sock *ireq = inet_rsk(req); struct ipv6_pinfo *np = inet6_sk(sk); struct sk_buff * skb; int err = -ENOMEM; @@ -477,9 +477,10 @@ static int tcp_v6_send_synack(struct sock *sk, struct dst_entry *dst, skb = tcp_make_synack(sk, dst, req, NULL); if (skb) { - __tcp_v6_send_check(skb, &treq->loc_addr, &treq->rmt_addr); + __tcp_v6_send_check(skb, &ireq->ir_v6_loc_addr, + &ireq->ir_v6_rmt_addr); - fl6->daddr = treq->rmt_addr; + fl6->daddr = ireq->ir_v6_rmt_addr; skb_set_queue_mapping(skb, queue_mapping); err = ip6_xmit(sk, skb, fl6, np->opt, np->tclass); err = net_xmit_eval(err); @@ -502,7 +503,7 @@ static int tcp_v6_rtx_synack(struct sock *sk, struct request_sock *req) static void tcp_v6_reqsk_destructor(struct request_sock *req) { - kfree_skb(inet6_rsk(req)->pktopts); + kfree_skb(inet_rsk(req)->pktopts); } #ifdef CONFIG_TCP_MD5SIG @@ -515,13 +516,13 @@ static struct tcp_md5sig_key *tcp_v6_md5_do_lookup(struct sock *sk, static struct tcp_md5sig_key *tcp_v6_md5_lookup(struct sock *sk, struct sock *addr_sk) { - return tcp_v6_md5_do_lookup(sk, &inet6_sk(addr_sk)->daddr); + return tcp_v6_md5_do_lookup(sk, &addr_sk->sk_v6_daddr); } static struct tcp_md5sig_key *tcp_v6_reqsk_md5_lookup(struct sock *sk, struct request_sock *req) { - return tcp_v6_md5_do_lookup(sk, &inet6_rsk(req)->rmt_addr); + return tcp_v6_md5_do_lookup(sk, &inet_rsk(req)->ir_v6_rmt_addr); } static int tcp_v6_parse_md5_keys (struct sock *sk, char __user *optval, @@ -621,10 +622,10 @@ static int tcp_v6_md5_hash_skb(char *md5_hash, struct tcp_md5sig_key *key, if (sk) { saddr = &inet6_sk(sk)->saddr; - daddr = &inet6_sk(sk)->daddr; + daddr = &sk->sk_v6_daddr; } else if (req) { - saddr = &inet6_rsk(req)->loc_addr; - daddr = &inet6_rsk(req)->rmt_addr; + saddr = &inet_rsk(req)->ir_v6_loc_addr; + daddr = &inet_rsk(req)->ir_v6_rmt_addr; } else { const struct ipv6hdr *ip6h = ipv6_hdr(skb); saddr = &ip6h->saddr; @@ -949,7 +950,7 @@ static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb) { struct tcp_options_received tmp_opt; struct request_sock *req; - struct inet6_request_sock *treq; + struct inet_request_sock *ireq; struct ipv6_pinfo *np = inet6_sk(sk); struct tcp_sock *tp = tcp_sk(sk); __u32 isn = TCP_SKB_CB(skb)->when; @@ -994,25 +995,25 @@ static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb) tmp_opt.tstamp_ok = tmp_opt.saw_tstamp; tcp_openreq_init(req, &tmp_opt, skb); - treq = inet6_rsk(req); - treq->rmt_addr = ipv6_hdr(skb)->saddr; - treq->loc_addr = ipv6_hdr(skb)->daddr; + ireq = inet_rsk(req); + ireq->ir_v6_rmt_addr = ipv6_hdr(skb)->saddr; + ireq->ir_v6_loc_addr = ipv6_hdr(skb)->daddr; if (!want_cookie || tmp_opt.tstamp_ok) TCP_ECN_create_request(req, skb, sock_net(sk)); - treq->iif = sk->sk_bound_dev_if; + ireq->ir_iif = sk->sk_bound_dev_if; /* So that link locals have meaning */ if (!sk->sk_bound_dev_if && - ipv6_addr_type(&treq->rmt_addr) & IPV6_ADDR_LINKLOCAL) - treq->iif = inet6_iif(skb); + ipv6_addr_type(&ireq->ir_v6_rmt_addr) & IPV6_ADDR_LINKLOCAL) + ireq->ir_iif = inet6_iif(skb); if (!isn) { if (ipv6_opt_accepted(sk, skb) || np->rxopt.bits.rxinfo || np->rxopt.bits.rxoinfo || np->rxopt.bits.rxhlim || np->rxopt.bits.rxohlim) { atomic_inc(&skb->users); - treq->pktopts = skb; + ireq->pktopts = skb; } if (want_cookie) { @@ -1051,7 +1052,7 @@ static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb) * to the moment of synflood. */ LIMIT_NETDEBUG(KERN_DEBUG "TCP: drop open request from %pI6/%u\n", - &treq->rmt_addr, ntohs(tcp_hdr(skb)->source)); + &ireq->ir_v6_rmt_addr, ntohs(tcp_hdr(skb)->source)); goto drop_and_release; } @@ -1086,7 +1087,7 @@ static struct sock * tcp_v6_syn_recv_sock(struct sock *sk, struct sk_buff *skb, struct request_sock *req, struct dst_entry *dst) { - struct inet6_request_sock *treq; + struct inet_request_sock *ireq; struct ipv6_pinfo *newnp, *np = inet6_sk(sk); struct tcp6_sock *newtcp6sk; struct inet_sock *newinet; @@ -1116,11 +1117,11 @@ static struct sock * tcp_v6_syn_recv_sock(struct sock *sk, struct sk_buff *skb, memcpy(newnp, np, sizeof(struct ipv6_pinfo)); - ipv6_addr_set_v4mapped(newinet->inet_daddr, &newnp->daddr); + ipv6_addr_set_v4mapped(newinet->inet_daddr, &newsk->sk_v6_daddr); ipv6_addr_set_v4mapped(newinet->inet_saddr, &newnp->saddr); - newnp->rcv_saddr = newnp->saddr; + newsk->sk_v6_rcv_saddr = newnp->saddr; inet_csk(newsk)->icsk_af_ops = &ipv6_mapped; newsk->sk_backlog_rcv = tcp_v4_do_rcv; @@ -1151,7 +1152,7 @@ static struct sock * tcp_v6_syn_recv_sock(struct sock *sk, struct sk_buff *skb, return newsk; } - treq = inet6_rsk(req); + ireq = inet_rsk(req); if (sk_acceptq_is_full(sk)) goto out_overflow; @@ -1185,10 +1186,10 @@ static struct sock * tcp_v6_syn_recv_sock(struct sock *sk, struct sk_buff *skb, memcpy(newnp, np, sizeof(struct ipv6_pinfo)); - newnp->daddr = treq->rmt_addr; - newnp->saddr = treq->loc_addr; - newnp->rcv_saddr = treq->loc_addr; - newsk->sk_bound_dev_if = treq->iif; + newsk->sk_v6_daddr = ireq->ir_v6_rmt_addr; + newnp->saddr = ireq->ir_v6_loc_addr; + newsk->sk_v6_rcv_saddr = ireq->ir_v6_loc_addr; + newsk->sk_bound_dev_if = ireq->ir_iif; /* Now IPv6 options... @@ -1203,11 +1204,11 @@ static struct sock * tcp_v6_syn_recv_sock(struct sock *sk, struct sk_buff *skb, /* Clone pktoptions received with SYN */ newnp->pktoptions = NULL; - if (treq->pktopts != NULL) { - newnp->pktoptions = skb_clone(treq->pktopts, + if (ireq->pktopts != NULL) { + newnp->pktoptions = skb_clone(ireq->pktopts, sk_gfp_atomic(sk, GFP_ATOMIC)); - consume_skb(treq->pktopts); - treq->pktopts = NULL; + consume_skb(ireq->pktopts); + ireq->pktopts = NULL; if (newnp->pktoptions) skb_set_owner_r(newnp->pktoptions, newsk); } @@ -1244,13 +1245,13 @@ static struct sock * tcp_v6_syn_recv_sock(struct sock *sk, struct sk_buff *skb, #ifdef CONFIG_TCP_MD5SIG /* Copy over the MD5 key from the original socket */ - if ((key = tcp_v6_md5_do_lookup(sk, &newnp->daddr)) != NULL) { + if ((key = tcp_v6_md5_do_lookup(sk, &newsk->sk_v6_daddr)) != NULL) { /* We're using one, so create a matching key * on the newsk structure. If we fail to get * memory, then we end up not copying the key * across. Shucks. */ - tcp_md5_do_add(newsk, (union tcp_md5_addr *)&newnp->daddr, + tcp_md5_do_add(newsk, (union tcp_md5_addr *)&newsk->sk_v6_daddr, AF_INET6, key->key, key->keylen, sk_gfp_atomic(sk, GFP_ATOMIC)); } @@ -1722,8 +1723,8 @@ static void get_openreq6(struct seq_file *seq, const struct sock *sk, struct request_sock *req, int i, kuid_t uid) { int ttd = req->expires - jiffies; - const struct in6_addr *src = &inet6_rsk(req)->loc_addr; - const struct in6_addr *dest = &inet6_rsk(req)->rmt_addr; + const struct in6_addr *src = &inet_rsk(req)->ir_v6_loc_addr; + const struct in6_addr *dest = &inet_rsk(req)->ir_v6_rmt_addr; if (ttd < 0) ttd = 0; @@ -1734,10 +1735,10 @@ static void get_openreq6(struct seq_file *seq, i, src->s6_addr32[0], src->s6_addr32[1], src->s6_addr32[2], src->s6_addr32[3], - ntohs(inet_rsk(req)->loc_port), + inet_rsk(req)->ir_num, dest->s6_addr32[0], dest->s6_addr32[1], dest->s6_addr32[2], dest->s6_addr32[3], - ntohs(inet_rsk(req)->rmt_port), + ntohs(inet_rsk(req)->ir_rmt_port), TCP_SYN_RECV, 0,0, /* could print option size, but that is af dependent. */ 1, /* timers active (only the expire timer) */ @@ -1758,10 +1759,9 @@ static void get_tcp6_sock(struct seq_file *seq, struct sock *sp, int i) const struct inet_sock *inet = inet_sk(sp); const struct tcp_sock *tp = tcp_sk(sp); const struct inet_connection_sock *icsk = inet_csk(sp); - const struct ipv6_pinfo *np = inet6_sk(sp); - dest = &np->daddr; - src = &np->rcv_saddr; + dest = &sp->sk_v6_daddr; + src = &sp->sk_v6_rcv_saddr; destp = ntohs(inet->inet_dport); srcp = ntohs(inet->inet_sport); @@ -1810,11 +1810,10 @@ static void get_timewait6_sock(struct seq_file *seq, { const struct in6_addr *dest, *src; __u16 destp, srcp; - const struct inet6_timewait_sock *tw6 = inet6_twsk((struct sock *)tw); - long delta = tw->tw_ttd - jiffies; + s32 delta = tw->tw_ttd - inet_tw_time_stamp(); - dest = &tw6->tw_v6_daddr; - src = &tw6->tw_v6_rcv_saddr; + dest = &tw->tw_v6_daddr; + src = &tw->tw_v6_rcv_saddr; destp = ntohs(tw->tw_dport); srcp = ntohs(tw->tw_sport); @@ -1834,6 +1833,7 @@ static void get_timewait6_sock(struct seq_file *seq, static int tcp6_seq_show(struct seq_file *seq, void *v) { struct tcp_iter_state *st; + struct sock *sk = v; if (v == SEQ_START_TOKEN) { seq_puts(seq, @@ -1849,14 +1849,14 @@ static int tcp6_seq_show(struct seq_file *seq, void *v) switch (st->state) { case TCP_SEQ_STATE_LISTENING: case TCP_SEQ_STATE_ESTABLISHED: - get_tcp6_sock(seq, v, st->num); + if (sk->sk_state == TCP_TIME_WAIT) + get_timewait6_sock(seq, v, st->num); + else + get_tcp6_sock(seq, v, st->num); break; case TCP_SEQ_STATE_OPENREQ: get_openreq6(seq, st->syn_wait_sk, v, st->num, st->uid); break; - case TCP_SEQ_STATE_TIME_WAIT: - get_timewait6_sock(seq, v, st->num); - break; } out: return 0; @@ -1929,6 +1929,7 @@ struct proto tcpv6_prot = { .memory_allocated = &tcp_memory_allocated, .memory_pressure = &tcp_memory_pressure, .orphan_count = &tcp_orphan_count, + .sysctl_mem = sysctl_tcp_mem, .sysctl_wmem = sysctl_tcp_wmem, .sysctl_rmem = sysctl_tcp_rmem, .max_header = MAX_TCP_HEADER, diff --git a/net/ipv6/tcpv6_offload.c b/net/ipv6/tcpv6_offload.c index 2ec6bf6a0aa0..c1097c798900 100644 --- a/net/ipv6/tcpv6_offload.c +++ b/net/ipv6/tcpv6_offload.c @@ -83,7 +83,7 @@ static int tcp6_gro_complete(struct sk_buff *skb) static const struct net_offload tcpv6_offload = { .callbacks = { .gso_send_check = tcp_v6_gso_send_check, - .gso_segment = tcp_tso_segment, + .gso_segment = tcp_gso_segment, .gro_receive = tcp6_gro_receive, .gro_complete = tcp6_gro_complete, }, diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 18786098fd41..f3893e897f72 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -53,22 +53,42 @@ #include <trace/events/skb.h> #include "udp_impl.h" +static unsigned int udp6_ehashfn(struct net *net, + const struct in6_addr *laddr, + const u16 lport, + const struct in6_addr *faddr, + const __be16 fport) +{ + static u32 udp6_ehash_secret __read_mostly; + static u32 udp_ipv6_hash_secret __read_mostly; + + u32 lhash, fhash; + + net_get_random_once(&udp6_ehash_secret, + sizeof(udp6_ehash_secret)); + net_get_random_once(&udp_ipv6_hash_secret, + sizeof(udp_ipv6_hash_secret)); + + lhash = (__force u32)laddr->s6_addr32[3]; + fhash = __ipv6_addr_jhash(faddr, udp_ipv6_hash_secret); + + return __inet6_ehashfn(lhash, lport, fhash, fport, + udp_ipv6_hash_secret + net_hash_mix(net)); +} + int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2) { - const struct in6_addr *sk_rcv_saddr6 = &inet6_sk(sk)->rcv_saddr; const struct in6_addr *sk2_rcv_saddr6 = inet6_rcv_saddr(sk2); - __be32 sk1_rcv_saddr = sk_rcv_saddr(sk); - __be32 sk2_rcv_saddr = sk_rcv_saddr(sk2); int sk_ipv6only = ipv6_only_sock(sk); int sk2_ipv6only = inet_v6_ipv6only(sk2); - int addr_type = ipv6_addr_type(sk_rcv_saddr6); + int addr_type = ipv6_addr_type(&sk->sk_v6_rcv_saddr); int addr_type2 = sk2_rcv_saddr6 ? ipv6_addr_type(sk2_rcv_saddr6) : IPV6_ADDR_MAPPED; /* if both are mapped, treat as IPv4 */ if (addr_type == IPV6_ADDR_MAPPED && addr_type2 == IPV6_ADDR_MAPPED) return (!sk2_ipv6only && - (!sk1_rcv_saddr || !sk2_rcv_saddr || - sk1_rcv_saddr == sk2_rcv_saddr)); + (!sk->sk_rcv_saddr || !sk2->sk_rcv_saddr || + sk->sk_rcv_saddr == sk2->sk_rcv_saddr)); if (addr_type2 == IPV6_ADDR_ANY && !(sk2_ipv6only && addr_type == IPV6_ADDR_MAPPED)) @@ -79,7 +99,7 @@ int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2) return 1; if (sk2_rcv_saddr6 && - ipv6_addr_equal(sk_rcv_saddr6, sk2_rcv_saddr6)) + ipv6_addr_equal(&sk->sk_v6_rcv_saddr, sk2_rcv_saddr6)) return 1; return 0; @@ -107,7 +127,7 @@ int udp_v6_get_port(struct sock *sk, unsigned short snum) unsigned int hash2_nulladdr = udp6_portaddr_hash(sock_net(sk), &in6addr_any, snum); unsigned int hash2_partial = - udp6_portaddr_hash(sock_net(sk), &inet6_sk(sk)->rcv_saddr, 0); + udp6_portaddr_hash(sock_net(sk), &sk->sk_v6_rcv_saddr, 0); /* precompute partial secondary hash */ udp_sk(sk)->udp_portaddr_hash = hash2_partial; @@ -117,7 +137,7 @@ int udp_v6_get_port(struct sock *sk, unsigned short snum) static void udp_v6_rehash(struct sock *sk) { u16 new_hash = udp6_portaddr_hash(sock_net(sk), - &inet6_sk(sk)->rcv_saddr, + &sk->sk_v6_rcv_saddr, inet_sk(sk)->inet_num); udp_lib_rehash(sk, new_hash); @@ -133,7 +153,6 @@ static inline int compute_score(struct sock *sk, struct net *net, if (net_eq(sock_net(sk), net) && udp_sk(sk)->udp_port_hash == hnum && sk->sk_family == PF_INET6) { - struct ipv6_pinfo *np = inet6_sk(sk); struct inet_sock *inet = inet_sk(sk); score = 0; @@ -142,13 +161,13 @@ static inline int compute_score(struct sock *sk, struct net *net, return -1; score++; } - if (!ipv6_addr_any(&np->rcv_saddr)) { - if (!ipv6_addr_equal(&np->rcv_saddr, daddr)) + if (!ipv6_addr_any(&sk->sk_v6_rcv_saddr)) { + if (!ipv6_addr_equal(&sk->sk_v6_rcv_saddr, daddr)) return -1; score++; } - if (!ipv6_addr_any(&np->daddr)) { - if (!ipv6_addr_equal(&np->daddr, saddr)) + if (!ipv6_addr_any(&sk->sk_v6_daddr)) { + if (!ipv6_addr_equal(&sk->sk_v6_daddr, saddr)) return -1; score++; } @@ -171,10 +190,9 @@ static inline int compute_score2(struct sock *sk, struct net *net, if (net_eq(sock_net(sk), net) && udp_sk(sk)->udp_port_hash == hnum && sk->sk_family == PF_INET6) { - struct ipv6_pinfo *np = inet6_sk(sk); struct inet_sock *inet = inet_sk(sk); - if (!ipv6_addr_equal(&np->rcv_saddr, daddr)) + if (!ipv6_addr_equal(&sk->sk_v6_rcv_saddr, daddr)) return -1; score = 0; if (inet->inet_dport) { @@ -182,8 +200,8 @@ static inline int compute_score2(struct sock *sk, struct net *net, return -1; score++; } - if (!ipv6_addr_any(&np->daddr)) { - if (!ipv6_addr_equal(&np->daddr, saddr)) + if (!ipv6_addr_any(&sk->sk_v6_daddr)) { + if (!ipv6_addr_equal(&sk->sk_v6_daddr, saddr)) return -1; score++; } @@ -219,8 +237,8 @@ begin: badness = score; reuseport = sk->sk_reuseport; if (reuseport) { - hash = inet6_ehashfn(net, daddr, hnum, - saddr, sport); + hash = udp6_ehashfn(net, daddr, hnum, + saddr, sport); matches = 1; } else if (score == SCORE2_MAX) goto exact_match; @@ -300,8 +318,8 @@ begin: badness = score; reuseport = sk->sk_reuseport; if (reuseport) { - hash = inet6_ehashfn(net, daddr, hnum, - saddr, sport); + hash = udp6_ehashfn(net, daddr, hnum, + saddr, sport); matches = 1; } } else if (score == badness && reuseport) { @@ -551,8 +569,10 @@ static int __udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) { int rc; - if (!ipv6_addr_any(&inet6_sk(sk)->daddr)) + if (!ipv6_addr_any(&sk->sk_v6_daddr)) { sock_rps_save_rxhash(sk, skb); + sk_mark_napi_id(sk, skb); + } rc = sock_queue_rcv_skb(sk, skb); if (rc < 0) { @@ -690,20 +710,19 @@ static struct sock *udp_v6_mcast_next(struct net *net, struct sock *sk, if (udp_sk(s)->udp_port_hash == num && s->sk_family == PF_INET6) { - struct ipv6_pinfo *np = inet6_sk(s); if (inet->inet_dport) { if (inet->inet_dport != rmt_port) continue; } - if (!ipv6_addr_any(&np->daddr) && - !ipv6_addr_equal(&np->daddr, rmt_addr)) + if (!ipv6_addr_any(&sk->sk_v6_daddr) && + !ipv6_addr_equal(&sk->sk_v6_daddr, rmt_addr)) continue; if (s->sk_bound_dev_if && s->sk_bound_dev_if != dif) continue; - if (!ipv6_addr_any(&np->rcv_saddr)) { - if (!ipv6_addr_equal(&np->rcv_saddr, loc_addr)) + if (!ipv6_addr_any(&sk->sk_v6_rcv_saddr)) { + if (!ipv6_addr_equal(&sk->sk_v6_rcv_saddr, loc_addr)) continue; } if (!inet6_mc_check(s, loc_addr, rmt_addr)) @@ -846,7 +865,6 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable, if (sk != NULL) { int ret; - sk_mark_napi_id(sk, skb); ret = udpv6_queue_rcv_skb(sk, skb); sock_put(sk); @@ -1064,7 +1082,7 @@ int udpv6_sendmsg(struct kiocb *iocb, struct sock *sk, } else if (!up->pending) { if (sk->sk_state != TCP_ESTABLISHED) return -EDESTADDRREQ; - daddr = &np->daddr; + daddr = &sk->sk_v6_daddr; } else daddr = NULL; @@ -1134,8 +1152,8 @@ do_udp_sendmsg: * sk->sk_dst_cache. */ if (sk->sk_state == TCP_ESTABLISHED && - ipv6_addr_equal(daddr, &np->daddr)) - daddr = &np->daddr; + ipv6_addr_equal(daddr, &sk->sk_v6_daddr)) + daddr = &sk->sk_v6_daddr; if (addr_len >= sizeof(struct sockaddr_in6) && sin6->sin6_scope_id && @@ -1146,7 +1164,7 @@ do_udp_sendmsg: return -EDESTADDRREQ; fl6.fl6_dport = inet->inet_dport; - daddr = &np->daddr; + daddr = &sk->sk_v6_daddr; fl6.flowlabel = np->flow_label; connected = 1; } @@ -1261,8 +1279,8 @@ do_append_data: if (dst) { if (connected) { ip6_dst_store(sk, dst, - ipv6_addr_equal(&fl6.daddr, &np->daddr) ? - &np->daddr : NULL, + ipv6_addr_equal(&fl6.daddr, &sk->sk_v6_daddr) ? + &sk->sk_v6_daddr : NULL, #ifdef CONFIG_IPV6_SUBTREES ipv6_addr_equal(&fl6.saddr, &np->saddr) ? &np->saddr : diff --git a/net/ipv6/udp_impl.h b/net/ipv6/udp_impl.h index 4691ed50a928..c779c3c90b9d 100644 --- a/net/ipv6/udp_impl.h +++ b/net/ipv6/udp_impl.h @@ -7,33 +7,32 @@ #include <net/inet_common.h> #include <net/transp_v6.h> -extern int __udp6_lib_rcv(struct sk_buff *, struct udp_table *, int ); -extern void __udp6_lib_err(struct sk_buff *, struct inet6_skb_parm *, - u8 , u8 , int , __be32 , struct udp_table *); +int __udp6_lib_rcv(struct sk_buff *, struct udp_table *, int); +void __udp6_lib_err(struct sk_buff *, struct inet6_skb_parm *, u8, u8, int, + __be32, struct udp_table *); -extern int udp_v6_get_port(struct sock *sk, unsigned short snum); +int udp_v6_get_port(struct sock *sk, unsigned short snum); -extern int udpv6_getsockopt(struct sock *sk, int level, int optname, - char __user *optval, int __user *optlen); -extern int udpv6_setsockopt(struct sock *sk, int level, int optname, - char __user *optval, unsigned int optlen); +int udpv6_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen); +int udpv6_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, unsigned int optlen); #ifdef CONFIG_COMPAT -extern int compat_udpv6_setsockopt(struct sock *sk, int level, int optname, - char __user *optval, unsigned int optlen); -extern int compat_udpv6_getsockopt(struct sock *sk, int level, int optname, - char __user *optval, int __user *optlen); +int compat_udpv6_setsockopt(struct sock *sk, int level, int optname, + char __user *optval, unsigned int optlen); +int compat_udpv6_getsockopt(struct sock *sk, int level, int optname, + char __user *optval, int __user *optlen); #endif -extern int udpv6_sendmsg(struct kiocb *iocb, struct sock *sk, - struct msghdr *msg, size_t len); -extern int udpv6_recvmsg(struct kiocb *iocb, struct sock *sk, - struct msghdr *msg, size_t len, - int noblock, int flags, int *addr_len); -extern int udpv6_queue_rcv_skb(struct sock * sk, struct sk_buff *skb); -extern void udpv6_destroy_sock(struct sock *sk); +int udpv6_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, + size_t len); +int udpv6_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, + size_t len, int noblock, int flags, int *addr_len); +int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb); +void udpv6_destroy_sock(struct sock *sk); -extern void udp_v6_clear_sk(struct sock *sk, int size); +void udp_v6_clear_sk(struct sock *sk, int size); #ifdef CONFIG_PROC_FS -extern int udp6_seq_show(struct seq_file *seq, void *v); +int udp6_seq_show(struct seq_file *seq, void *v); #endif #endif /* _UDP6_IMPL_H */ diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c index 60559511bd9c..e7359f9eaa8d 100644 --- a/net/ipv6/udp_offload.c +++ b/net/ipv6/udp_offload.c @@ -64,6 +64,8 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb, SKB_GSO_DODGY | SKB_GSO_UDP_TUNNEL | SKB_GSO_GRE | + SKB_GSO_IPIP | + SKB_GSO_SIT | SKB_GSO_MPLS) || !(type & (SKB_GSO_UDP)))) goto out; @@ -88,7 +90,7 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb, /* Check if there is enough headroom to insert fragment header. */ tnl_hlen = skb_tnl_header_len(skb); - if (skb_headroom(skb) < (tnl_hlen + frag_hdr_sz)) { + if (skb->mac_header < (tnl_hlen + frag_hdr_sz)) { if (gso_pskb_expand_head(skb, tnl_hlen + frag_hdr_sz)) goto out; } diff --git a/net/ipv6/xfrm6_mode_tunnel.c b/net/ipv6/xfrm6_mode_tunnel.c index 4770d515c2c8..cb04f7a16b5e 100644 --- a/net/ipv6/xfrm6_mode_tunnel.c +++ b/net/ipv6/xfrm6_mode_tunnel.c @@ -18,6 +18,65 @@ #include <net/ipv6.h> #include <net/xfrm.h> +/* Informational hook. The decap is still done here. */ +static struct xfrm_tunnel_notifier __rcu *rcv_notify_handlers __read_mostly; +static DEFINE_MUTEX(xfrm6_mode_tunnel_input_mutex); + +int xfrm6_mode_tunnel_input_register(struct xfrm_tunnel_notifier *handler) +{ + struct xfrm_tunnel_notifier __rcu **pprev; + struct xfrm_tunnel_notifier *t; + int ret = -EEXIST; + int priority = handler->priority; + + mutex_lock(&xfrm6_mode_tunnel_input_mutex); + + for (pprev = &rcv_notify_handlers; + (t = rcu_dereference_protected(*pprev, + lockdep_is_held(&xfrm6_mode_tunnel_input_mutex))) != NULL; + pprev = &t->next) { + if (t->priority > priority) + break; + if (t->priority == priority) + goto err; + + } + + handler->next = *pprev; + rcu_assign_pointer(*pprev, handler); + + ret = 0; + +err: + mutex_unlock(&xfrm6_mode_tunnel_input_mutex); + return ret; +} +EXPORT_SYMBOL_GPL(xfrm6_mode_tunnel_input_register); + +int xfrm6_mode_tunnel_input_deregister(struct xfrm_tunnel_notifier *handler) +{ + struct xfrm_tunnel_notifier __rcu **pprev; + struct xfrm_tunnel_notifier *t; + int ret = -ENOENT; + + mutex_lock(&xfrm6_mode_tunnel_input_mutex); + for (pprev = &rcv_notify_handlers; + (t = rcu_dereference_protected(*pprev, + lockdep_is_held(&xfrm6_mode_tunnel_input_mutex))) != NULL; + pprev = &t->next) { + if (t == handler) { + *pprev = handler->next; + ret = 0; + break; + } + } + mutex_unlock(&xfrm6_mode_tunnel_input_mutex); + synchronize_net(); + + return ret; +} +EXPORT_SYMBOL_GPL(xfrm6_mode_tunnel_input_deregister); + static inline void ipip6_ecn_decapsulate(struct sk_buff *skb) { const struct ipv6hdr *outer_iph = ipv6_hdr(skb); @@ -63,8 +122,15 @@ static int xfrm6_mode_tunnel_output(struct xfrm_state *x, struct sk_buff *skb) return 0; } +#define for_each_input_rcu(head, handler) \ + for (handler = rcu_dereference(head); \ + handler != NULL; \ + handler = rcu_dereference(handler->next)) + + static int xfrm6_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb) { + struct xfrm_tunnel_notifier *handler; int err = -EINVAL; if (XFRM_MODE_SKB_CB(skb)->protocol != IPPROTO_IPV6) @@ -72,6 +138,9 @@ static int xfrm6_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb) if (!pskb_may_pull(skb, sizeof(struct ipv6hdr))) goto out; + for_each_input_rcu(rcv_notify_handlers, handler) + handler->handler(skb); + err = skb_unclone(skb, GFP_ATOMIC); if (err) goto out; diff --git a/net/irda/irnet/irnet.h b/net/irda/irnet/irnet.h index 564eb0b8afa3..8d65bb9477fc 100644 --- a/net/irda/irnet/irnet.h +++ b/net/irda/irnet/irnet.h @@ -509,16 +509,11 @@ typedef struct irnet_ctrl_channel */ /* -------------------------- IRDA PART -------------------------- */ -extern int - irda_irnet_create(irnet_socket *); /* Initialise a IrNET socket */ -extern int - irda_irnet_connect(irnet_socket *); /* Try to connect over IrDA */ -extern void - irda_irnet_destroy(irnet_socket *); /* Teardown a IrNET socket */ -extern int - irda_irnet_init(void); /* Initialise IrDA part of IrNET */ -extern void - irda_irnet_cleanup(void); /* Teardown IrDA part of IrNET */ +int irda_irnet_create(irnet_socket *); /* Initialise an IrNET socket */ +int irda_irnet_connect(irnet_socket *); /* Try to connect over IrDA */ +void irda_irnet_destroy(irnet_socket *); /* Teardown an IrNET socket */ +int irda_irnet_init(void); /* Initialise IrDA part of IrNET */ +void irda_irnet_cleanup(void); /* Teardown IrDA part of IrNET */ /**************************** VARIABLES ****************************/ diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c index b076e8309bc2..9af77d9c0ec9 100644 --- a/net/l2tp/l2tp_core.c +++ b/net/l2tp/l2tp_core.c @@ -1181,7 +1181,7 @@ static void l2tp_xmit_ipv6_csum(struct sock *sk, struct sk_buff *skb, !(skb_dst(skb)->dev->features & NETIF_F_IPV6_CSUM)) { __wsum csum = skb_checksum(skb, 0, udp_len, 0); skb->ip_summed = CHECKSUM_UNNECESSARY; - uh->check = csum_ipv6_magic(&np->saddr, &np->daddr, udp_len, + uh->check = csum_ipv6_magic(&np->saddr, &sk->sk_v6_daddr, udp_len, IPPROTO_UDP, csum); if (uh->check == 0) uh->check = CSUM_MANGLED_0; @@ -1189,7 +1189,7 @@ static void l2tp_xmit_ipv6_csum(struct sock *sk, struct sk_buff *skb, skb->ip_summed = CHECKSUM_PARTIAL; skb->csum_start = skb_transport_header(skb) - skb->head; skb->csum_offset = offsetof(struct udphdr, check); - uh->check = ~csum_ipv6_magic(&np->saddr, &np->daddr, + uh->check = ~csum_ipv6_magic(&np->saddr, &sk->sk_v6_daddr, udp_len, IPPROTO_UDP, 0); } } @@ -1713,13 +1713,13 @@ int l2tp_tunnel_create(struct net *net, int fd, int version, u32 tunnel_id, u32 struct ipv6_pinfo *np = inet6_sk(sk); if (ipv6_addr_v4mapped(&np->saddr) && - ipv6_addr_v4mapped(&np->daddr)) { + ipv6_addr_v4mapped(&sk->sk_v6_daddr)) { struct inet_sock *inet = inet_sk(sk); tunnel->v4mapped = true; inet->inet_saddr = np->saddr.s6_addr32[3]; - inet->inet_rcv_saddr = np->rcv_saddr.s6_addr32[3]; - inet->inet_daddr = np->daddr.s6_addr32[3]; + inet->inet_rcv_saddr = sk->sk_v6_rcv_saddr.s6_addr32[3]; + inet->inet_daddr = sk->sk_v6_daddr.s6_addr32[3]; } else { tunnel->v4mapped = false; } diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h index 6f251cbc2ed7..1ee9f6965d68 100644 --- a/net/l2tp/l2tp_core.h +++ b/net/l2tp/l2tp_core.h @@ -238,29 +238,40 @@ out: return tunnel; } -extern struct sock *l2tp_tunnel_sock_lookup(struct l2tp_tunnel *tunnel); -extern void l2tp_tunnel_sock_put(struct sock *sk); -extern struct l2tp_session *l2tp_session_find(struct net *net, struct l2tp_tunnel *tunnel, u32 session_id); -extern struct l2tp_session *l2tp_session_find_nth(struct l2tp_tunnel *tunnel, int nth); -extern struct l2tp_session *l2tp_session_find_by_ifname(struct net *net, char *ifname); -extern struct l2tp_tunnel *l2tp_tunnel_find(struct net *net, u32 tunnel_id); -extern struct l2tp_tunnel *l2tp_tunnel_find_nth(struct net *net, int nth); - -extern int l2tp_tunnel_create(struct net *net, int fd, int version, u32 tunnel_id, u32 peer_tunnel_id, struct l2tp_tunnel_cfg *cfg, struct l2tp_tunnel **tunnelp); -extern void l2tp_tunnel_closeall(struct l2tp_tunnel *tunnel); -extern int l2tp_tunnel_delete(struct l2tp_tunnel *tunnel); -extern struct l2tp_session *l2tp_session_create(int priv_size, struct l2tp_tunnel *tunnel, u32 session_id, u32 peer_session_id, struct l2tp_session_cfg *cfg); -extern void __l2tp_session_unhash(struct l2tp_session *session); -extern int l2tp_session_delete(struct l2tp_session *session); -extern void l2tp_session_free(struct l2tp_session *session); -extern void l2tp_recv_common(struct l2tp_session *session, struct sk_buff *skb, unsigned char *ptr, unsigned char *optr, u16 hdrflags, int length, int (*payload_hook)(struct sk_buff *skb)); -extern int l2tp_session_queue_purge(struct l2tp_session *session); -extern int l2tp_udp_encap_recv(struct sock *sk, struct sk_buff *skb); - -extern int l2tp_xmit_skb(struct l2tp_session *session, struct sk_buff *skb, int hdr_len); - -extern int l2tp_nl_register_ops(enum l2tp_pwtype pw_type, const struct l2tp_nl_cmd_ops *ops); -extern void l2tp_nl_unregister_ops(enum l2tp_pwtype pw_type); +struct sock *l2tp_tunnel_sock_lookup(struct l2tp_tunnel *tunnel); +void l2tp_tunnel_sock_put(struct sock *sk); +struct l2tp_session *l2tp_session_find(struct net *net, + struct l2tp_tunnel *tunnel, + u32 session_id); +struct l2tp_session *l2tp_session_find_nth(struct l2tp_tunnel *tunnel, int nth); +struct l2tp_session *l2tp_session_find_by_ifname(struct net *net, char *ifname); +struct l2tp_tunnel *l2tp_tunnel_find(struct net *net, u32 tunnel_id); +struct l2tp_tunnel *l2tp_tunnel_find_nth(struct net *net, int nth); + +int l2tp_tunnel_create(struct net *net, int fd, int version, u32 tunnel_id, + u32 peer_tunnel_id, struct l2tp_tunnel_cfg *cfg, + struct l2tp_tunnel **tunnelp); +void l2tp_tunnel_closeall(struct l2tp_tunnel *tunnel); +int l2tp_tunnel_delete(struct l2tp_tunnel *tunnel); +struct l2tp_session *l2tp_session_create(int priv_size, + struct l2tp_tunnel *tunnel, + u32 session_id, u32 peer_session_id, + struct l2tp_session_cfg *cfg); +void __l2tp_session_unhash(struct l2tp_session *session); +int l2tp_session_delete(struct l2tp_session *session); +void l2tp_session_free(struct l2tp_session *session); +void l2tp_recv_common(struct l2tp_session *session, struct sk_buff *skb, + unsigned char *ptr, unsigned char *optr, u16 hdrflags, + int length, int (*payload_hook)(struct sk_buff *skb)); +int l2tp_session_queue_purge(struct l2tp_session *session); +int l2tp_udp_encap_recv(struct sock *sk, struct sk_buff *skb); + +int l2tp_xmit_skb(struct l2tp_session *session, struct sk_buff *skb, + int hdr_len); + +int l2tp_nl_register_ops(enum l2tp_pwtype pw_type, + const struct l2tp_nl_cmd_ops *ops); +void l2tp_nl_unregister_ops(enum l2tp_pwtype pw_type); /* Session reference counts. Incremented when code obtains a reference * to a session. diff --git a/net/l2tp/l2tp_debugfs.c b/net/l2tp/l2tp_debugfs.c index 072d7202e182..2d6760a2ae34 100644 --- a/net/l2tp/l2tp_debugfs.c +++ b/net/l2tp/l2tp_debugfs.c @@ -127,9 +127,10 @@ static void l2tp_dfs_seq_tunnel_show(struct seq_file *m, void *v) #if IS_ENABLED(CONFIG_IPV6) if (tunnel->sock->sk_family == AF_INET6) { - struct ipv6_pinfo *np = inet6_sk(tunnel->sock); + const struct ipv6_pinfo *np = inet6_sk(tunnel->sock); + seq_printf(m, " from %pI6c to %pI6c\n", - &np->saddr, &np->daddr); + &np->saddr, &tunnel->sock->sk_v6_daddr); } else #endif seq_printf(m, " from %pI4 to %pI4\n", diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c index b8a6039314e8..cfd65304be60 100644 --- a/net/l2tp/l2tp_ip6.c +++ b/net/l2tp/l2tp_ip6.c @@ -63,7 +63,7 @@ static struct sock *__l2tp_ip6_bind_lookup(struct net *net, struct sock *sk; sk_for_each_bound(sk, &l2tp_ip6_bind_table) { - struct in6_addr *addr = inet6_rcv_saddr(sk); + const struct in6_addr *addr = inet6_rcv_saddr(sk); struct l2tp_ip6_sock *l2tp = l2tp_ip6_sk(sk); if (l2tp == NULL) @@ -331,7 +331,7 @@ static int l2tp_ip6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len) rcu_read_unlock(); inet->inet_rcv_saddr = inet->inet_saddr = v4addr; - np->rcv_saddr = addr->l2tp_addr; + sk->sk_v6_rcv_saddr = addr->l2tp_addr; np->saddr = addr->l2tp_addr; l2tp_ip6_sk(sk)->conn_id = addr->l2tp_conn_id; @@ -421,14 +421,14 @@ static int l2tp_ip6_getname(struct socket *sock, struct sockaddr *uaddr, if (!lsk->peer_conn_id) return -ENOTCONN; lsa->l2tp_conn_id = lsk->peer_conn_id; - lsa->l2tp_addr = np->daddr; + lsa->l2tp_addr = sk->sk_v6_daddr; if (np->sndflow) lsa->l2tp_flowinfo = np->flow_label; } else { - if (ipv6_addr_any(&np->rcv_saddr)) + if (ipv6_addr_any(&sk->sk_v6_rcv_saddr)) lsa->l2tp_addr = np->saddr; else - lsa->l2tp_addr = np->rcv_saddr; + lsa->l2tp_addr = sk->sk_v6_rcv_saddr; lsa->l2tp_conn_id = lsk->conn_id; } @@ -537,8 +537,8 @@ static int l2tp_ip6_sendmsg(struct kiocb *iocb, struct sock *sk, * sk->sk_dst_cache. */ if (sk->sk_state == TCP_ESTABLISHED && - ipv6_addr_equal(daddr, &np->daddr)) - daddr = &np->daddr; + ipv6_addr_equal(daddr, &sk->sk_v6_daddr)) + daddr = &sk->sk_v6_daddr; if (addr_len >= sizeof(struct sockaddr_in6) && lsa->l2tp_scope_id && @@ -548,7 +548,7 @@ static int l2tp_ip6_sendmsg(struct kiocb *iocb, struct sock *sk, if (sk->sk_state != TCP_ESTABLISHED) return -EDESTADDRREQ; - daddr = &np->daddr; + daddr = &sk->sk_v6_daddr; fl6.flowlabel = np->flow_label; } diff --git a/net/l2tp/l2tp_netlink.c b/net/l2tp/l2tp_netlink.c index 0825ff26e113..be446d517bc9 100644 --- a/net/l2tp/l2tp_netlink.c +++ b/net/l2tp/l2tp_netlink.c @@ -306,8 +306,8 @@ static int l2tp_nl_tunnel_send(struct sk_buff *skb, u32 portid, u32 seq, int fla if (np) { if (nla_put(skb, L2TP_ATTR_IP6_SADDR, sizeof(np->saddr), &np->saddr) || - nla_put(skb, L2TP_ATTR_IP6_DADDR, sizeof(np->daddr), - &np->daddr)) + nla_put(skb, L2TP_ATTR_IP6_DADDR, sizeof(sk->sk_v6_daddr), + &sk->sk_v6_daddr)) goto nla_put_failure; } else #endif diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c index 8c46b271064a..ffda81ef1a70 100644 --- a/net/l2tp/l2tp_ppp.c +++ b/net/l2tp/l2tp_ppp.c @@ -910,8 +910,8 @@ static int pppol2tp_getname(struct socket *sock, struct sockaddr *uaddr, #if IS_ENABLED(CONFIG_IPV6) } else if ((tunnel->version == 2) && (tunnel->sock->sk_family == AF_INET6)) { - struct ipv6_pinfo *np = inet6_sk(tunnel->sock); struct sockaddr_pppol2tpin6 sp; + len = sizeof(sp); memset(&sp, 0, len); sp.sa_family = AF_PPPOX; @@ -924,13 +924,13 @@ static int pppol2tp_getname(struct socket *sock, struct sockaddr *uaddr, sp.pppol2tp.d_session = session->peer_session_id; sp.pppol2tp.addr.sin6_family = AF_INET6; sp.pppol2tp.addr.sin6_port = inet->inet_dport; - memcpy(&sp.pppol2tp.addr.sin6_addr, &np->daddr, - sizeof(np->daddr)); + memcpy(&sp.pppol2tp.addr.sin6_addr, &tunnel->sock->sk_v6_daddr, + sizeof(tunnel->sock->sk_v6_daddr)); memcpy(uaddr, &sp, len); } else if ((tunnel->version == 3) && (tunnel->sock->sk_family == AF_INET6)) { - struct ipv6_pinfo *np = inet6_sk(tunnel->sock); struct sockaddr_pppol2tpv3in6 sp; + len = sizeof(sp); memset(&sp, 0, len); sp.sa_family = AF_PPPOX; @@ -943,8 +943,8 @@ static int pppol2tp_getname(struct socket *sock, struct sockaddr *uaddr, sp.pppol2tp.d_session = session->peer_session_id; sp.pppol2tp.addr.sin6_family = AF_INET6; sp.pppol2tp.addr.sin6_port = inet->inet_dport; - memcpy(&sp.pppol2tp.addr.sin6_addr, &np->daddr, - sizeof(np->daddr)); + memcpy(&sp.pppol2tp.addr.sin6_addr, &tunnel->sock->sk_v6_daddr, + sizeof(tunnel->sock->sk_v6_daddr)); memcpy(uaddr, &sp, len); #endif } else if (tunnel->version == 3) { diff --git a/net/mac80211/Kconfig b/net/mac80211/Kconfig index 62535fe9f570..97b5dcad5025 100644 --- a/net/mac80211/Kconfig +++ b/net/mac80211/Kconfig @@ -4,6 +4,7 @@ config MAC80211 select CRYPTO select CRYPTO_ARC4 select CRYPTO_AES + select CRYPTO_CCM select CRC32 select AVERAGE ---help--- @@ -258,6 +259,17 @@ config MAC80211_MESH_SYNC_DEBUG Do not select this option. +config MAC80211_MESH_CSA_DEBUG + bool "Verbose mesh channel switch debugging" + depends on MAC80211_DEBUG_MENU + depends on MAC80211_MESH + ---help--- + Selecting this option causes mac80211 to print out very verbose mesh + channel switch debugging messages (when mac80211 is taking part in a + mesh network). + + Do not select this option. + config MAC80211_MESH_PS_DEBUG bool "Verbose mesh powersave debugging" depends on MAC80211_DEBUG_MENU diff --git a/net/mac80211/aes_ccm.c b/net/mac80211/aes_ccm.c index be7614b9ed27..7c7df475a401 100644 --- a/net/mac80211/aes_ccm.c +++ b/net/mac80211/aes_ccm.c @@ -2,6 +2,8 @@ * Copyright 2003-2004, Instant802 Networks, Inc. * Copyright 2005-2006, Devicescape Software, Inc. * + * Rewrite: Copyright (C) 2013 Linaro Ltd <ard.biesheuvel@linaro.org> + * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation. @@ -17,134 +19,75 @@ #include "key.h" #include "aes_ccm.h" -static void aes_ccm_prepare(struct crypto_cipher *tfm, u8 *scratch, u8 *a) +void ieee80211_aes_ccm_encrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad, + u8 *data, size_t data_len, u8 *mic) { - int i; - u8 *b_0, *aad, *b, *s_0; - - b_0 = scratch + 3 * AES_BLOCK_SIZE; - aad = scratch + 4 * AES_BLOCK_SIZE; - b = scratch; - s_0 = scratch + AES_BLOCK_SIZE; - - crypto_cipher_encrypt_one(tfm, b, b_0); + struct scatterlist assoc, pt, ct[2]; + struct { + struct aead_request req; + u8 priv[crypto_aead_reqsize(tfm)]; + } aead_req; - /* Extra Authenticate-only data (always two AES blocks) */ - for (i = 0; i < AES_BLOCK_SIZE; i++) - aad[i] ^= b[i]; - crypto_cipher_encrypt_one(tfm, b, aad); + memset(&aead_req, 0, sizeof(aead_req)); - aad += AES_BLOCK_SIZE; + sg_init_one(&pt, data, data_len); + sg_init_one(&assoc, &aad[2], be16_to_cpup((__be16 *)aad)); + sg_init_table(ct, 2); + sg_set_buf(&ct[0], data, data_len); + sg_set_buf(&ct[1], mic, IEEE80211_CCMP_MIC_LEN); - for (i = 0; i < AES_BLOCK_SIZE; i++) - aad[i] ^= b[i]; - crypto_cipher_encrypt_one(tfm, a, aad); + aead_request_set_tfm(&aead_req.req, tfm); + aead_request_set_assoc(&aead_req.req, &assoc, assoc.length); + aead_request_set_crypt(&aead_req.req, &pt, ct, data_len, b_0); - /* Mask out bits from auth-only-b_0 */ - b_0[0] &= 0x07; - - /* S_0 is used to encrypt T (= MIC) */ - b_0[14] = 0; - b_0[15] = 0; - crypto_cipher_encrypt_one(tfm, s_0, b_0); + crypto_aead_encrypt(&aead_req.req); } - -void ieee80211_aes_ccm_encrypt(struct crypto_cipher *tfm, u8 *scratch, - u8 *data, size_t data_len, - u8 *cdata, u8 *mic) +int ieee80211_aes_ccm_decrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad, + u8 *data, size_t data_len, u8 *mic) { - int i, j, last_len, num_blocks; - u8 *pos, *cpos, *b, *s_0, *e, *b_0; - - b = scratch; - s_0 = scratch + AES_BLOCK_SIZE; - e = scratch + 2 * AES_BLOCK_SIZE; - b_0 = scratch + 3 * AES_BLOCK_SIZE; - - num_blocks = DIV_ROUND_UP(data_len, AES_BLOCK_SIZE); - last_len = data_len % AES_BLOCK_SIZE; - aes_ccm_prepare(tfm, scratch, b); - - /* Process payload blocks */ - pos = data; - cpos = cdata; - for (j = 1; j <= num_blocks; j++) { - int blen = (j == num_blocks && last_len) ? - last_len : AES_BLOCK_SIZE; - - /* Authentication followed by encryption */ - for (i = 0; i < blen; i++) - b[i] ^= pos[i]; - crypto_cipher_encrypt_one(tfm, b, b); - - b_0[14] = (j >> 8) & 0xff; - b_0[15] = j & 0xff; - crypto_cipher_encrypt_one(tfm, e, b_0); - for (i = 0; i < blen; i++) - *cpos++ = *pos++ ^ e[i]; - } - - for (i = 0; i < IEEE80211_CCMP_MIC_LEN; i++) - mic[i] = b[i] ^ s_0[i]; + struct scatterlist assoc, pt, ct[2]; + struct { + struct aead_request req; + u8 priv[crypto_aead_reqsize(tfm)]; + } aead_req; + + memset(&aead_req, 0, sizeof(aead_req)); + + sg_init_one(&pt, data, data_len); + sg_init_one(&assoc, &aad[2], be16_to_cpup((__be16 *)aad)); + sg_init_table(ct, 2); + sg_set_buf(&ct[0], data, data_len); + sg_set_buf(&ct[1], mic, IEEE80211_CCMP_MIC_LEN); + + aead_request_set_tfm(&aead_req.req, tfm); + aead_request_set_assoc(&aead_req.req, &assoc, assoc.length); + aead_request_set_crypt(&aead_req.req, ct, &pt, + data_len + IEEE80211_CCMP_MIC_LEN, b_0); + + return crypto_aead_decrypt(&aead_req.req); } - -int ieee80211_aes_ccm_decrypt(struct crypto_cipher *tfm, u8 *scratch, - u8 *cdata, size_t data_len, u8 *mic, u8 *data) +struct crypto_aead *ieee80211_aes_key_setup_encrypt(const u8 key[]) { - int i, j, last_len, num_blocks; - u8 *pos, *cpos, *b, *s_0, *a, *b_0; - - b = scratch; - s_0 = scratch + AES_BLOCK_SIZE; - a = scratch + 2 * AES_BLOCK_SIZE; - b_0 = scratch + 3 * AES_BLOCK_SIZE; - - num_blocks = DIV_ROUND_UP(data_len, AES_BLOCK_SIZE); - last_len = data_len % AES_BLOCK_SIZE; - aes_ccm_prepare(tfm, scratch, a); - - /* Process payload blocks */ - cpos = cdata; - pos = data; - for (j = 1; j <= num_blocks; j++) { - int blen = (j == num_blocks && last_len) ? - last_len : AES_BLOCK_SIZE; - - /* Decryption followed by authentication */ - b_0[14] = (j >> 8) & 0xff; - b_0[15] = j & 0xff; - crypto_cipher_encrypt_one(tfm, b, b_0); - for (i = 0; i < blen; i++) { - *pos = *cpos++ ^ b[i]; - a[i] ^= *pos++; - } - crypto_cipher_encrypt_one(tfm, a, a); - } - - for (i = 0; i < IEEE80211_CCMP_MIC_LEN; i++) { - if ((mic[i] ^ s_0[i]) != a[i]) - return -1; - } - - return 0; -} + struct crypto_aead *tfm; + int err; + tfm = crypto_alloc_aead("ccm(aes)", 0, CRYPTO_ALG_ASYNC); + if (IS_ERR(tfm)) + return tfm; -struct crypto_cipher *ieee80211_aes_key_setup_encrypt(const u8 key[]) -{ - struct crypto_cipher *tfm; + err = crypto_aead_setkey(tfm, key, WLAN_KEY_LEN_CCMP); + if (!err) + err = crypto_aead_setauthsize(tfm, IEEE80211_CCMP_MIC_LEN); + if (!err) + return tfm; - tfm = crypto_alloc_cipher("aes", 0, CRYPTO_ALG_ASYNC); - if (!IS_ERR(tfm)) - crypto_cipher_setkey(tfm, key, WLAN_KEY_LEN_CCMP); - - return tfm; + crypto_free_aead(tfm); + return ERR_PTR(err); } - -void ieee80211_aes_key_free(struct crypto_cipher *tfm) +void ieee80211_aes_key_free(struct crypto_aead *tfm) { - crypto_free_cipher(tfm); + crypto_free_aead(tfm); } diff --git a/net/mac80211/aes_ccm.h b/net/mac80211/aes_ccm.h index 5b7d744e2370..2c7ab1948a2e 100644 --- a/net/mac80211/aes_ccm.h +++ b/net/mac80211/aes_ccm.h @@ -12,13 +12,11 @@ #include <linux/crypto.h> -struct crypto_cipher *ieee80211_aes_key_setup_encrypt(const u8 key[]); -void ieee80211_aes_ccm_encrypt(struct crypto_cipher *tfm, u8 *scratch, - u8 *data, size_t data_len, - u8 *cdata, u8 *mic); -int ieee80211_aes_ccm_decrypt(struct crypto_cipher *tfm, u8 *scratch, - u8 *cdata, size_t data_len, - u8 *mic, u8 *data); -void ieee80211_aes_key_free(struct crypto_cipher *tfm); +struct crypto_aead *ieee80211_aes_key_setup_encrypt(const u8 key[]); +void ieee80211_aes_ccm_encrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad, + u8 *data, size_t data_len, u8 *mic); +int ieee80211_aes_ccm_decrypt(struct crypto_aead *tfm, u8 *b_0, u8 *aad, + u8 *data, size_t data_len, u8 *mic); +void ieee80211_aes_key_free(struct crypto_aead *tfm); #endif /* AES_CCM_H */ diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c index 629dee7ec9bf..95667b088c5b 100644 --- a/net/mac80211/cfg.c +++ b/net/mac80211/cfg.c @@ -1059,6 +1059,7 @@ static int ieee80211_stop_ap(struct wiphy *wiphy, struct net_device *dev) /* abort any running channel switch */ sdata->vif.csa_active = false; cancel_work_sync(&sdata->csa_finalize_work); + cancel_work_sync(&sdata->u.ap.request_smps_work); /* turn off carrier for this interface and dependent VLANs */ list_for_each_entry(vlan, &sdata->u.ap.vlans, u.vlan.list) @@ -1342,8 +1343,8 @@ static int sta_apply_parameters(struct ieee80211_local *local, sta->plink_state = params->plink_state; ieee80211_mps_sta_status_update(sta); - changed |= - ieee80211_mps_local_status_update(sdata); + changed |= ieee80211_mps_set_sta_local_pm(sta, + NL80211_MESH_POWER_UNKNOWN); break; default: /* nothing */ @@ -1553,6 +1554,20 @@ static int ieee80211_change_station(struct wiphy *wiphy, mutex_unlock(&local->sta_mtx); + if ((sdata->vif.type == NL80211_IFTYPE_AP || + sdata->vif.type == NL80211_IFTYPE_AP_VLAN) && + sta->known_smps_mode != sta->sdata->bss->req_smps && + test_sta_flag(sta, WLAN_STA_AUTHORIZED) && + sta_info_tx_streams(sta) != 1) { + ht_dbg(sta->sdata, + "%pM just authorized and MIMO capable - update SMPS\n", + sta->sta.addr); + ieee80211_send_smps_action(sta->sdata, + sta->sdata->bss->req_smps, + sta->sta.addr, + sta->sdata->vif.bss_conf.bssid); + } + if (sdata->vif.type == NL80211_IFTYPE_STATION && params->sta_flags_mask & BIT(NL80211_STA_FLAG_AUTHORIZED)) { ieee80211_recalc_ps(local, -1); @@ -2337,8 +2352,92 @@ static int ieee80211_testmode_dump(struct wiphy *wiphy, } #endif -int __ieee80211_request_smps(struct ieee80211_sub_if_data *sdata, - enum ieee80211_smps_mode smps_mode) +int __ieee80211_request_smps_ap(struct ieee80211_sub_if_data *sdata, + enum ieee80211_smps_mode smps_mode) +{ + struct sta_info *sta; + enum ieee80211_smps_mode old_req; + int i; + + if (WARN_ON_ONCE(sdata->vif.type != NL80211_IFTYPE_AP)) + return -EINVAL; + + if (sdata->vif.bss_conf.chandef.width == NL80211_CHAN_WIDTH_20_NOHT) + return 0; + + old_req = sdata->u.ap.req_smps; + sdata->u.ap.req_smps = smps_mode; + + /* AUTOMATIC doesn't mean much for AP - don't allow it */ + if (old_req == smps_mode || + smps_mode == IEEE80211_SMPS_AUTOMATIC) + return 0; + + /* If no associated stations, there's no need to do anything */ + if (!atomic_read(&sdata->u.ap.num_mcast_sta)) { + sdata->smps_mode = smps_mode; + ieee80211_queue_work(&sdata->local->hw, &sdata->recalc_smps); + return 0; + } + + ht_dbg(sdata, + "SMSP %d requested in AP mode, sending Action frame to %d stations\n", + smps_mode, atomic_read(&sdata->u.ap.num_mcast_sta)); + + mutex_lock(&sdata->local->sta_mtx); + for (i = 0; i < STA_HASH_SIZE; i++) { + for (sta = rcu_dereference_protected(sdata->local->sta_hash[i], + lockdep_is_held(&sdata->local->sta_mtx)); + sta; + sta = rcu_dereference_protected(sta->hnext, + lockdep_is_held(&sdata->local->sta_mtx))) { + /* + * Only stations associated to our AP and + * associated VLANs + */ + if (sta->sdata->bss != &sdata->u.ap) + continue; + + /* This station doesn't support MIMO - skip it */ + if (sta_info_tx_streams(sta) == 1) + continue; + + /* + * Don't wake up a STA just to send the action frame + * unless we are getting more restrictive. + */ + if (test_sta_flag(sta, WLAN_STA_PS_STA) && + !ieee80211_smps_is_restrictive(sta->known_smps_mode, + smps_mode)) { + ht_dbg(sdata, + "Won't send SMPS to sleeping STA %pM\n", + sta->sta.addr); + continue; + } + + /* + * If the STA is not authorized, wait until it gets + * authorized and the action frame will be sent then. + */ + if (!test_sta_flag(sta, WLAN_STA_AUTHORIZED)) + continue; + + ht_dbg(sdata, "Sending SMPS to %pM\n", sta->sta.addr); + ieee80211_send_smps_action(sdata, smps_mode, + sta->sta.addr, + sdata->vif.bss_conf.bssid); + } + } + mutex_unlock(&sdata->local->sta_mtx); + + sdata->smps_mode = smps_mode; + ieee80211_queue_work(&sdata->local->hw, &sdata->recalc_smps); + + return 0; +} + +int __ieee80211_request_smps_mgd(struct ieee80211_sub_if_data *sdata, + enum ieee80211_smps_mode smps_mode) { const u8 *ap; enum ieee80211_smps_mode old_req; @@ -2346,6 +2445,9 @@ int __ieee80211_request_smps(struct ieee80211_sub_if_data *sdata, lockdep_assert_held(&sdata->wdev.mtx); + if (WARN_ON_ONCE(sdata->vif.type != NL80211_IFTYPE_STATION)) + return -EINVAL; + old_req = sdata->u.mgd.req_smps; sdata->u.mgd.req_smps = smps_mode; @@ -2402,7 +2504,7 @@ static int ieee80211_set_power_mgmt(struct wiphy *wiphy, struct net_device *dev, /* no change, but if automatic follow powersave */ sdata_lock(sdata); - __ieee80211_request_smps(sdata, sdata->u.mgd.req_smps); + __ieee80211_request_smps_mgd(sdata, sdata->u.mgd.req_smps); sdata_unlock(sdata); if (local->hw.flags & IEEE80211_HW_SUPPORTS_DYNAMIC_PS) @@ -2860,35 +2962,55 @@ void ieee80211_csa_finalize_work(struct work_struct *work) container_of(work, struct ieee80211_sub_if_data, csa_finalize_work); struct ieee80211_local *local = sdata->local; - int err, changed; + int err, changed = 0; if (!ieee80211_sdata_running(sdata)) return; - if (WARN_ON(sdata->vif.type != NL80211_IFTYPE_AP)) - return; - sdata->radar_required = sdata->csa_radar_required; err = ieee80211_vif_change_channel(sdata, &local->csa_chandef, &changed); if (WARN_ON(err < 0)) return; - err = ieee80211_assign_beacon(sdata, sdata->u.ap.next_beacon); - if (err < 0) - return; + if (!local->use_chanctx) { + local->_oper_chandef = local->csa_chandef; + ieee80211_hw_config(local, 0); + } - changed |= err; - kfree(sdata->u.ap.next_beacon); - sdata->u.ap.next_beacon = NULL; + ieee80211_bss_info_change_notify(sdata, changed); + + switch (sdata->vif.type) { + case NL80211_IFTYPE_AP: + err = ieee80211_assign_beacon(sdata, sdata->u.ap.next_beacon); + if (err < 0) + return; + changed |= err; + kfree(sdata->u.ap.next_beacon); + sdata->u.ap.next_beacon = NULL; + + ieee80211_bss_info_change_notify(sdata, err); + break; + case NL80211_IFTYPE_ADHOC: + ieee80211_ibss_finish_csa(sdata); + break; +#ifdef CONFIG_MAC80211_MESH + case NL80211_IFTYPE_MESH_POINT: + err = ieee80211_mesh_finish_csa(sdata); + if (err < 0) + return; + break; +#endif + default: + WARN_ON(1); + return; + } sdata->vif.csa_active = false; ieee80211_wake_queues_by_reason(&sdata->local->hw, IEEE80211_MAX_QUEUE_MAP, IEEE80211_QUEUE_STOP_REASON_CSA); - ieee80211_bss_info_change_notify(sdata, changed); - cfg80211_ch_switch_notify(sdata->dev, &local->csa_chandef); } @@ -2899,6 +3021,7 @@ static int ieee80211_channel_switch(struct wiphy *wiphy, struct net_device *dev, struct ieee80211_local *local = sdata->local; struct ieee80211_chanctx_conf *chanctx_conf; struct ieee80211_chanctx *chanctx; + struct ieee80211_if_mesh __maybe_unused *ifmsh; int err, num_chanctx; if (!list_empty(&local->roc_list) || local->scanning) @@ -2936,20 +3059,76 @@ static int ieee80211_channel_switch(struct wiphy *wiphy, struct net_device *dev, if (sdata->vif.csa_active) return -EBUSY; - /* only handle AP for now. */ switch (sdata->vif.type) { case NL80211_IFTYPE_AP: + sdata->csa_counter_offset_beacon = + params->counter_offset_beacon; + sdata->csa_counter_offset_presp = params->counter_offset_presp; + sdata->u.ap.next_beacon = + cfg80211_beacon_dup(¶ms->beacon_after); + if (!sdata->u.ap.next_beacon) + return -ENOMEM; + + err = ieee80211_assign_beacon(sdata, ¶ms->beacon_csa); + if (err < 0) { + kfree(sdata->u.ap.next_beacon); + return err; + } break; + case NL80211_IFTYPE_ADHOC: + if (!sdata->vif.bss_conf.ibss_joined) + return -EINVAL; + + if (params->chandef.width != sdata->u.ibss.chandef.width) + return -EINVAL; + + switch (params->chandef.width) { + case NL80211_CHAN_WIDTH_40: + if (cfg80211_get_chandef_type(¶ms->chandef) != + cfg80211_get_chandef_type(&sdata->u.ibss.chandef)) + return -EINVAL; + case NL80211_CHAN_WIDTH_5: + case NL80211_CHAN_WIDTH_10: + case NL80211_CHAN_WIDTH_20_NOHT: + case NL80211_CHAN_WIDTH_20: + break; + default: + return -EINVAL; + } + + /* changes into another band are not supported */ + if (sdata->u.ibss.chandef.chan->band != + params->chandef.chan->band) + return -EINVAL; + + err = ieee80211_ibss_csa_beacon(sdata, params); + if (err < 0) + return err; + break; +#ifdef CONFIG_MAC80211_MESH + case NL80211_IFTYPE_MESH_POINT: + ifmsh = &sdata->u.mesh; + + if (!ifmsh->mesh_id) + return -EINVAL; + + if (params->chandef.width != sdata->vif.bss_conf.chandef.width) + return -EINVAL; + + /* changes into another band are not supported */ + if (sdata->vif.bss_conf.chandef.chan->band != + params->chandef.chan->band) + return -EINVAL; + + err = ieee80211_mesh_csa_beacon(sdata, params, true); + if (err < 0) + return err; + break; +#endif default: return -EOPNOTSUPP; } - sdata->u.ap.next_beacon = cfg80211_beacon_dup(¶ms->beacon_after); - if (!sdata->u.ap.next_beacon) - return -ENOMEM; - - sdata->csa_counter_offset_beacon = params->counter_offset_beacon; - sdata->csa_counter_offset_presp = params->counter_offset_presp; sdata->csa_radar_required = params->radar_required; if (params->block_tx) @@ -2957,10 +3136,6 @@ static int ieee80211_channel_switch(struct wiphy *wiphy, struct net_device *dev, IEEE80211_MAX_QUEUE_MAP, IEEE80211_QUEUE_STOP_REASON_CSA); - err = ieee80211_assign_beacon(sdata, ¶ms->beacon_csa); - if (err < 0) - return err; - local->csa_chandef = params->chandef; sdata->vif.csa_active = true; @@ -3014,7 +3189,8 @@ static int ieee80211_mgmt_tx(struct wiphy *wiphy, struct wireless_dev *wdev, need_offchan = true; if (!ieee80211_is_action(mgmt->frame_control) || mgmt->u.action.category == WLAN_CATEGORY_PUBLIC || - mgmt->u.action.category == WLAN_CATEGORY_SELF_PROTECTED) + mgmt->u.action.category == WLAN_CATEGORY_SELF_PROTECTED || + mgmt->u.action.category == WLAN_CATEGORY_SPECTRUM_MGMT) break; rcu_read_lock(); sta = sta_info_get(sdata, mgmt->da); diff --git a/net/mac80211/chan.c b/net/mac80211/chan.c index 3a4764b2869e..03ba6b5c5373 100644 --- a/net/mac80211/chan.c +++ b/net/mac80211/chan.c @@ -453,11 +453,6 @@ int ieee80211_vif_change_channel(struct ieee80211_sub_if_data *sdata, chanctx_changed |= IEEE80211_CHANCTX_CHANGE_CHANNEL; drv_change_chanctx(local, ctx, chanctx_changed); - if (!local->use_chanctx) { - local->_oper_chandef = *chandef; - ieee80211_hw_config(local, 0); - } - ieee80211_recalc_chanctx_chantype(local, ctx); ieee80211_recalc_smps_chanctx(local, ctx); ieee80211_recalc_radar_chanctx(local, ctx); diff --git a/net/mac80211/debug.h b/net/mac80211/debug.h index 4ccc5ed6237d..493d68061f0c 100644 --- a/net/mac80211/debug.h +++ b/net/mac80211/debug.h @@ -44,6 +44,12 @@ #define MAC80211_MESH_SYNC_DEBUG 0 #endif +#ifdef CONFIG_MAC80211_MESH_CSA_DEBUG +#define MAC80211_MESH_CSA_DEBUG 1 +#else +#define MAC80211_MESH_CSA_DEBUG 0 +#endif + #ifdef CONFIG_MAC80211_MESH_PS_DEBUG #define MAC80211_MESH_PS_DEBUG 1 #else @@ -157,6 +163,10 @@ do { \ _sdata_dbg(MAC80211_MESH_SYNC_DEBUG, \ sdata, fmt, ##__VA_ARGS__) +#define mcsa_dbg(sdata, fmt, ...) \ + _sdata_dbg(MAC80211_MESH_CSA_DEBUG, \ + sdata, fmt, ##__VA_ARGS__) + #define mps_dbg(sdata, fmt, ...) \ _sdata_dbg(MAC80211_MESH_PS_DEBUG, \ sdata, fmt, ##__VA_ARGS__) diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c index b0e32d628114..5c090e41d9bb 100644 --- a/net/mac80211/debugfs.c +++ b/net/mac80211/debugfs.c @@ -103,54 +103,57 @@ static ssize_t hwflags_read(struct file *file, char __user *user_buf, if (!buf) return 0; - sf += snprintf(buf, mxln - sf, "0x%x\n", local->hw.flags); + sf += scnprintf(buf, mxln - sf, "0x%x\n", local->hw.flags); if (local->hw.flags & IEEE80211_HW_HAS_RATE_CONTROL) - sf += snprintf(buf + sf, mxln - sf, "HAS_RATE_CONTROL\n"); + sf += scnprintf(buf + sf, mxln - sf, "HAS_RATE_CONTROL\n"); if (local->hw.flags & IEEE80211_HW_RX_INCLUDES_FCS) - sf += snprintf(buf + sf, mxln - sf, "RX_INCLUDES_FCS\n"); + sf += scnprintf(buf + sf, mxln - sf, "RX_INCLUDES_FCS\n"); if (local->hw.flags & IEEE80211_HW_HOST_BROADCAST_PS_BUFFERING) - sf += snprintf(buf + sf, mxln - sf, - "HOST_BCAST_PS_BUFFERING\n"); + sf += scnprintf(buf + sf, mxln - sf, + "HOST_BCAST_PS_BUFFERING\n"); if (local->hw.flags & IEEE80211_HW_2GHZ_SHORT_SLOT_INCAPABLE) - sf += snprintf(buf + sf, mxln - sf, - "2GHZ_SHORT_SLOT_INCAPABLE\n"); + sf += scnprintf(buf + sf, mxln - sf, + "2GHZ_SHORT_SLOT_INCAPABLE\n"); if (local->hw.flags & IEEE80211_HW_2GHZ_SHORT_PREAMBLE_INCAPABLE) - sf += snprintf(buf + sf, mxln - sf, - "2GHZ_SHORT_PREAMBLE_INCAPABLE\n"); + sf += scnprintf(buf + sf, mxln - sf, + "2GHZ_SHORT_PREAMBLE_INCAPABLE\n"); if (local->hw.flags & IEEE80211_HW_SIGNAL_UNSPEC) - sf += snprintf(buf + sf, mxln - sf, "SIGNAL_UNSPEC\n"); + sf += scnprintf(buf + sf, mxln - sf, "SIGNAL_UNSPEC\n"); if (local->hw.flags & IEEE80211_HW_SIGNAL_DBM) - sf += snprintf(buf + sf, mxln - sf, "SIGNAL_DBM\n"); + sf += scnprintf(buf + sf, mxln - sf, "SIGNAL_DBM\n"); if (local->hw.flags & IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC) - sf += snprintf(buf + sf, mxln - sf, "NEED_DTIM_BEFORE_ASSOC\n"); + sf += scnprintf(buf + sf, mxln - sf, + "NEED_DTIM_BEFORE_ASSOC\n"); if (local->hw.flags & IEEE80211_HW_SPECTRUM_MGMT) - sf += snprintf(buf + sf, mxln - sf, "SPECTRUM_MGMT\n"); + sf += scnprintf(buf + sf, mxln - sf, "SPECTRUM_MGMT\n"); if (local->hw.flags & IEEE80211_HW_AMPDU_AGGREGATION) - sf += snprintf(buf + sf, mxln - sf, "AMPDU_AGGREGATION\n"); + sf += scnprintf(buf + sf, mxln - sf, "AMPDU_AGGREGATION\n"); if (local->hw.flags & IEEE80211_HW_SUPPORTS_PS) - sf += snprintf(buf + sf, mxln - sf, "SUPPORTS_PS\n"); + sf += scnprintf(buf + sf, mxln - sf, "SUPPORTS_PS\n"); if (local->hw.flags & IEEE80211_HW_PS_NULLFUNC_STACK) - sf += snprintf(buf + sf, mxln - sf, "PS_NULLFUNC_STACK\n"); + sf += scnprintf(buf + sf, mxln - sf, "PS_NULLFUNC_STACK\n"); if (local->hw.flags & IEEE80211_HW_SUPPORTS_DYNAMIC_PS) - sf += snprintf(buf + sf, mxln - sf, "SUPPORTS_DYNAMIC_PS\n"); + sf += scnprintf(buf + sf, mxln - sf, "SUPPORTS_DYNAMIC_PS\n"); if (local->hw.flags & IEEE80211_HW_MFP_CAPABLE) - sf += snprintf(buf + sf, mxln - sf, "MFP_CAPABLE\n"); + sf += scnprintf(buf + sf, mxln - sf, "MFP_CAPABLE\n"); if (local->hw.flags & IEEE80211_HW_SUPPORTS_STATIC_SMPS) - sf += snprintf(buf + sf, mxln - sf, "SUPPORTS_STATIC_SMPS\n"); + sf += scnprintf(buf + sf, mxln - sf, "SUPPORTS_STATIC_SMPS\n"); if (local->hw.flags & IEEE80211_HW_SUPPORTS_DYNAMIC_SMPS) - sf += snprintf(buf + sf, mxln - sf, "SUPPORTS_DYNAMIC_SMPS\n"); + sf += scnprintf(buf + sf, mxln - sf, + "SUPPORTS_DYNAMIC_SMPS\n"); if (local->hw.flags & IEEE80211_HW_SUPPORTS_UAPSD) - sf += snprintf(buf + sf, mxln - sf, "SUPPORTS_UAPSD\n"); + sf += scnprintf(buf + sf, mxln - sf, "SUPPORTS_UAPSD\n"); if (local->hw.flags & IEEE80211_HW_REPORTS_TX_ACK_STATUS) - sf += snprintf(buf + sf, mxln - sf, "REPORTS_TX_ACK_STATUS\n"); + sf += scnprintf(buf + sf, mxln - sf, + "REPORTS_TX_ACK_STATUS\n"); if (local->hw.flags & IEEE80211_HW_CONNECTION_MONITOR) - sf += snprintf(buf + sf, mxln - sf, "CONNECTION_MONITOR\n"); + sf += scnprintf(buf + sf, mxln - sf, "CONNECTION_MONITOR\n"); if (local->hw.flags & IEEE80211_HW_SUPPORTS_PER_STA_GTK) - sf += snprintf(buf + sf, mxln - sf, "SUPPORTS_PER_STA_GTK\n"); + sf += scnprintf(buf + sf, mxln - sf, "SUPPORTS_PER_STA_GTK\n"); if (local->hw.flags & IEEE80211_HW_AP_LINK_PS) - sf += snprintf(buf + sf, mxln - sf, "AP_LINK_PS\n"); + sf += scnprintf(buf + sf, mxln - sf, "AP_LINK_PS\n"); if (local->hw.flags & IEEE80211_HW_TX_AMPDU_SETUP_IN_HW) - sf += snprintf(buf + sf, mxln - sf, "TX_AMPDU_SETUP_IN_HW\n"); + sf += scnprintf(buf + sf, mxln - sf, "TX_AMPDU_SETUP_IN_HW\n"); rv = simple_read_from_buffer(user_buf, count, ppos, buf, strlen(buf)); kfree(buf); diff --git a/net/mac80211/debugfs_netdev.c b/net/mac80211/debugfs_netdev.c index cafe614ef93d..04b5a14c8a05 100644 --- a/net/mac80211/debugfs_netdev.c +++ b/net/mac80211/debugfs_netdev.c @@ -224,12 +224,15 @@ static int ieee80211_set_smps(struct ieee80211_sub_if_data *sdata, smps_mode == IEEE80211_SMPS_AUTOMATIC)) return -EINVAL; - /* supported only on managed interfaces for now */ - if (sdata->vif.type != NL80211_IFTYPE_STATION) + if (sdata->vif.type != NL80211_IFTYPE_STATION && + sdata->vif.type != NL80211_IFTYPE_AP) return -EOPNOTSUPP; sdata_lock(sdata); - err = __ieee80211_request_smps(sdata, smps_mode); + if (sdata->vif.type == NL80211_IFTYPE_STATION) + err = __ieee80211_request_smps_mgd(sdata, smps_mode); + else + err = __ieee80211_request_smps_ap(sdata, smps_mode); sdata_unlock(sdata); return err; @@ -245,12 +248,15 @@ static const char *smps_modes[IEEE80211_SMPS_NUM_MODES] = { static ssize_t ieee80211_if_fmt_smps(const struct ieee80211_sub_if_data *sdata, char *buf, int buflen) { - if (sdata->vif.type != NL80211_IFTYPE_STATION) - return -EOPNOTSUPP; - - return snprintf(buf, buflen, "request: %s\nused: %s\n", - smps_modes[sdata->u.mgd.req_smps], - smps_modes[sdata->smps_mode]); + if (sdata->vif.type == NL80211_IFTYPE_STATION) + return snprintf(buf, buflen, "request: %s\nused: %s\n", + smps_modes[sdata->u.mgd.req_smps], + smps_modes[sdata->smps_mode]); + if (sdata->vif.type == NL80211_IFTYPE_AP) + return snprintf(buf, buflen, "request: %s\nused: %s\n", + smps_modes[sdata->u.ap.req_smps], + smps_modes[sdata->smps_mode]); + return -EINVAL; } static ssize_t ieee80211_if_parse_smps(struct ieee80211_sub_if_data *sdata, @@ -563,6 +569,7 @@ static void add_sta_files(struct ieee80211_sub_if_data *sdata) static void add_ap_files(struct ieee80211_sub_if_data *sdata) { DEBUGFS_ADD(num_mcast_sta); + DEBUGFS_ADD_MODE(smps, 0600); DEBUGFS_ADD(num_sta_ps); DEBUGFS_ADD(dtim_count); DEBUGFS_ADD(num_buffered_multicast); diff --git a/net/mac80211/driver-ops.h b/net/mac80211/driver-ops.h index b3ea11f3d526..5d03c47c0a4c 100644 --- a/net/mac80211/driver-ops.h +++ b/net/mac80211/driver-ops.h @@ -1085,4 +1085,31 @@ drv_channel_switch_beacon(struct ieee80211_sub_if_data *sdata, } } +static inline int drv_join_ibss(struct ieee80211_local *local, + struct ieee80211_sub_if_data *sdata) +{ + int ret = 0; + + might_sleep(); + check_sdata_in_driver(sdata); + + trace_drv_join_ibss(local, sdata, &sdata->vif.bss_conf); + if (local->ops->join_ibss) + ret = local->ops->join_ibss(&local->hw, &sdata->vif); + trace_drv_return_int(local, ret); + return ret; +} + +static inline void drv_leave_ibss(struct ieee80211_local *local, + struct ieee80211_sub_if_data *sdata) +{ + might_sleep(); + check_sdata_in_driver(sdata); + + trace_drv_leave_ibss(local, sdata); + if (local->ops->leave_ibss) + local->ops->leave_ibss(&local->hw, &sdata->vif); + trace_drv_return_void(local); +} + #endif /* __MAC80211_DRIVER_OPS */ diff --git a/net/mac80211/ht.c b/net/mac80211/ht.c index 529bf58bc145..9a8be8f69224 100644 --- a/net/mac80211/ht.c +++ b/net/mac80211/ht.c @@ -448,14 +448,25 @@ int ieee80211_send_smps_action(struct ieee80211_sub_if_data *sdata, return 0; } -void ieee80211_request_smps_work(struct work_struct *work) +void ieee80211_request_smps_mgd_work(struct work_struct *work) { struct ieee80211_sub_if_data *sdata = container_of(work, struct ieee80211_sub_if_data, u.mgd.request_smps_work); sdata_lock(sdata); - __ieee80211_request_smps(sdata, sdata->u.mgd.driver_smps_mode); + __ieee80211_request_smps_mgd(sdata, sdata->u.mgd.driver_smps_mode); + sdata_unlock(sdata); +} + +void ieee80211_request_smps_ap_work(struct work_struct *work) +{ + struct ieee80211_sub_if_data *sdata = + container_of(work, struct ieee80211_sub_if_data, + u.ap.request_smps_work); + + sdata_lock(sdata); + __ieee80211_request_smps_ap(sdata, sdata->u.ap.driver_smps_mode); sdata_unlock(sdata); } @@ -464,19 +475,29 @@ void ieee80211_request_smps(struct ieee80211_vif *vif, { struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif); - if (WARN_ON(vif->type != NL80211_IFTYPE_STATION)) + if (WARN_ON_ONCE(vif->type != NL80211_IFTYPE_STATION && + vif->type != NL80211_IFTYPE_AP)) return; if (WARN_ON(smps_mode == IEEE80211_SMPS_OFF)) smps_mode = IEEE80211_SMPS_AUTOMATIC; - if (sdata->u.mgd.driver_smps_mode == smps_mode) - return; - - sdata->u.mgd.driver_smps_mode = smps_mode; - - ieee80211_queue_work(&sdata->local->hw, - &sdata->u.mgd.request_smps_work); + if (vif->type == NL80211_IFTYPE_STATION) { + if (sdata->u.mgd.driver_smps_mode == smps_mode) + return; + sdata->u.mgd.driver_smps_mode = smps_mode; + ieee80211_queue_work(&sdata->local->hw, + &sdata->u.mgd.request_smps_work); + } else { + /* AUTOMATIC is meaningless in AP mode */ + if (WARN_ON_ONCE(smps_mode == IEEE80211_SMPS_AUTOMATIC)) + return; + if (sdata->u.ap.driver_smps_mode == smps_mode) + return; + sdata->u.ap.driver_smps_mode = smps_mode; + ieee80211_queue_work(&sdata->local->hw, + &sdata->u.ap.request_smps_work); + } } /* this might change ... don't want non-open drivers using it */ EXPORT_SYMBOL_GPL(ieee80211_request_smps); diff --git a/net/mac80211/ibss.c b/net/mac80211/ibss.c index a12afe77bb26..531be040b9ae 100644 --- a/net/mac80211/ibss.c +++ b/net/mac80211/ibss.c @@ -39,7 +39,8 @@ ieee80211_ibss_build_presp(struct ieee80211_sub_if_data *sdata, const int beacon_int, const u32 basic_rates, const u16 capability, u64 tsf, struct cfg80211_chan_def *chandef, - bool *have_higher_than_11mbit) + bool *have_higher_than_11mbit, + struct cfg80211_csa_settings *csa_settings) { struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; struct ieee80211_local *local = sdata->local; @@ -59,6 +60,7 @@ ieee80211_ibss_build_presp(struct ieee80211_sub_if_data *sdata, 2 + 8 /* max Supported Rates */ + 3 /* max DS params */ + 4 /* IBSS params */ + + 5 /* Channel Switch Announcement */ + 2 + (IEEE80211_MAX_SUPP_RATES - 8) + 2 + sizeof(struct ieee80211_ht_cap) + 2 + sizeof(struct ieee80211_ht_operation) + @@ -135,6 +137,16 @@ ieee80211_ibss_build_presp(struct ieee80211_sub_if_data *sdata, *pos++ = 0; *pos++ = 0; + if (csa_settings) { + *pos++ = WLAN_EID_CHANNEL_SWITCH; + *pos++ = 3; + *pos++ = csa_settings->block_tx ? 1 : 0; + *pos++ = ieee80211_frequency_to_channel( + csa_settings->chandef.chan->center_freq); + sdata->csa_counter_offset_beacon = (pos - presp->head); + *pos++ = csa_settings->count; + } + /* put the remaining rates in WLAN_EID_EXT_SUPP_RATES */ if (rates_n > 8) { *pos++ = WLAN_EID_EXT_SUPP_RATES; @@ -217,6 +229,8 @@ static void __ieee80211_sta_join_ibss(struct ieee80211_sub_if_data *sdata, struct beacon_data *presp; enum nl80211_bss_scan_width scan_width; bool have_higher_than_11mbit; + bool radar_required = false; + int err; sdata_assert_lock(sdata); @@ -235,6 +249,7 @@ static void __ieee80211_sta_join_ibss(struct ieee80211_sub_if_data *sdata, ieee80211_bss_info_change_notify(sdata, BSS_CHANGED_IBSS | BSS_CHANGED_BEACON_ENABLED); + drv_leave_ibss(local, sdata); } presp = rcu_dereference_protected(ifibss->presp, @@ -259,6 +274,23 @@ static void __ieee80211_sta_join_ibss(struct ieee80211_sub_if_data *sdata, } chandef.width = NL80211_CHAN_WIDTH_20; chandef.center_freq1 = chan->center_freq; + /* check again for downgraded chandef */ + if (!cfg80211_reg_can_beacon(local->hw.wiphy, &chandef)) { + sdata_info(sdata, + "Failed to join IBSS, beacons forbidden\n"); + return; + } + } + + err = cfg80211_chandef_dfs_required(sdata->local->hw.wiphy, + &chandef); + if (err > 0) { + if (!ifibss->userspace_handles_dfs) { + sdata_info(sdata, + "Failed to join IBSS, DFS channel without control program\n"); + return; + } + radar_required = true; } ieee80211_vif_release_channel(sdata); @@ -276,13 +308,14 @@ static void __ieee80211_sta_join_ibss(struct ieee80211_sub_if_data *sdata, presp = ieee80211_ibss_build_presp(sdata, beacon_int, basic_rates, capability, tsf, &chandef, - &have_higher_than_11mbit); + &have_higher_than_11mbit, NULL); if (!presp) return; rcu_assign_pointer(ifibss->presp, presp); mgmt = (void *)presp->head; + sdata->radar_required = radar_required; sdata->vif.bss_conf.enable_beacon = true; sdata->vif.bss_conf.beacon_int = beacon_int; sdata->vif.bss_conf.basic_rates = basic_rates; @@ -317,11 +350,26 @@ static void __ieee80211_sta_join_ibss(struct ieee80211_sub_if_data *sdata, else sdata->flags &= ~IEEE80211_SDATA_OPERATING_GMODE; + ieee80211_set_wmm_default(sdata, true); + sdata->vif.bss_conf.ibss_joined = true; sdata->vif.bss_conf.ibss_creator = creator; - ieee80211_bss_info_change_notify(sdata, bss_change); - ieee80211_set_wmm_default(sdata, true); + err = drv_join_ibss(local, sdata); + if (err) { + sdata->vif.bss_conf.ibss_joined = false; + sdata->vif.bss_conf.ibss_creator = false; + sdata->vif.bss_conf.enable_beacon = false; + sdata->vif.bss_conf.ssid_len = 0; + RCU_INIT_POINTER(ifibss->presp, NULL); + kfree_rcu(presp, rcu_head); + ieee80211_vif_release_channel(sdata); + sdata_info(sdata, "Failed to join IBSS, driver failure: %d\n", + err); + return; + } + + ieee80211_bss_info_change_notify(sdata, bss_change); ifibss->state = IEEE80211_IBSS_MLME_JOINED; mod_timer(&ifibss->timer, @@ -416,6 +464,115 @@ static void ieee80211_sta_join_ibss(struct ieee80211_sub_if_data *sdata, tsf, false); } +int ieee80211_ibss_csa_beacon(struct ieee80211_sub_if_data *sdata, + struct cfg80211_csa_settings *csa_settings) +{ + struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; + struct beacon_data *presp, *old_presp; + struct cfg80211_bss *cbss; + const struct cfg80211_bss_ies *ies; + u16 capability; + u64 tsf; + int ret = 0; + + sdata_assert_lock(sdata); + + capability = WLAN_CAPABILITY_IBSS; + + if (ifibss->privacy) + capability |= WLAN_CAPABILITY_PRIVACY; + + cbss = cfg80211_get_bss(sdata->local->hw.wiphy, ifibss->chandef.chan, + ifibss->bssid, ifibss->ssid, + ifibss->ssid_len, WLAN_CAPABILITY_IBSS | + WLAN_CAPABILITY_PRIVACY, + capability); + + if (WARN_ON(!cbss)) { + ret = -EINVAL; + goto out; + } + + rcu_read_lock(); + ies = rcu_dereference(cbss->ies); + tsf = ies->tsf; + rcu_read_unlock(); + cfg80211_put_bss(sdata->local->hw.wiphy, cbss); + + old_presp = rcu_dereference_protected(ifibss->presp, + lockdep_is_held(&sdata->wdev.mtx)); + + presp = ieee80211_ibss_build_presp(sdata, + sdata->vif.bss_conf.beacon_int, + sdata->vif.bss_conf.basic_rates, + capability, tsf, &ifibss->chandef, + NULL, csa_settings); + if (!presp) { + ret = -ENOMEM; + goto out; + } + + rcu_assign_pointer(ifibss->presp, presp); + if (old_presp) + kfree_rcu(old_presp, rcu_head); + + /* it might not send the beacon for a while. send an action frame + * immediately to announce the channel switch. + */ + if (csa_settings) + ieee80211_send_action_csa(sdata, csa_settings); + + ieee80211_bss_info_change_notify(sdata, BSS_CHANGED_BEACON); + out: + return ret; +} + +int ieee80211_ibss_finish_csa(struct ieee80211_sub_if_data *sdata) +{ + struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; + struct cfg80211_bss *cbss; + int err; + u16 capability; + + sdata_lock(sdata); + /* update cfg80211 bss information with the new channel */ + if (!is_zero_ether_addr(ifibss->bssid)) { + capability = WLAN_CAPABILITY_IBSS; + + if (ifibss->privacy) + capability |= WLAN_CAPABILITY_PRIVACY; + + cbss = cfg80211_get_bss(sdata->local->hw.wiphy, + ifibss->chandef.chan, + ifibss->bssid, ifibss->ssid, + ifibss->ssid_len, WLAN_CAPABILITY_IBSS | + WLAN_CAPABILITY_PRIVACY, + capability); + /* XXX: should not really modify cfg80211 data */ + if (cbss) { + cbss->channel = sdata->local->csa_chandef.chan; + cfg80211_put_bss(sdata->local->hw.wiphy, cbss); + } + } + + ifibss->chandef = sdata->local->csa_chandef; + + /* generate the beacon */ + err = ieee80211_ibss_csa_beacon(sdata, NULL); + sdata_unlock(sdata); + if (err < 0) + return err; + + return 0; +} + +void ieee80211_ibss_stop(struct ieee80211_sub_if_data *sdata) +{ + struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; + + cancel_work_sync(&ifibss->csa_connection_drop_work); +} + static struct sta_info *ieee80211_ibss_finish_sta(struct sta_info *sta) __acquires(RCU) { @@ -499,6 +656,315 @@ ieee80211_ibss_add_sta(struct ieee80211_sub_if_data *sdata, const u8 *bssid, return ieee80211_ibss_finish_sta(sta); } +static int ieee80211_sta_active_ibss(struct ieee80211_sub_if_data *sdata) +{ + struct ieee80211_local *local = sdata->local; + int active = 0; + struct sta_info *sta; + + sdata_assert_lock(sdata); + + rcu_read_lock(); + + list_for_each_entry_rcu(sta, &local->sta_list, list) { + if (sta->sdata == sdata && + time_after(sta->last_rx + IEEE80211_IBSS_MERGE_INTERVAL, + jiffies)) { + active++; + break; + } + } + + rcu_read_unlock(); + + return active; +} + +static void ieee80211_ibss_disconnect(struct ieee80211_sub_if_data *sdata) +{ + struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; + struct ieee80211_local *local = sdata->local; + struct cfg80211_bss *cbss; + struct beacon_data *presp; + struct sta_info *sta; + int active_ibss; + u16 capability; + + active_ibss = ieee80211_sta_active_ibss(sdata); + + if (!active_ibss && !is_zero_ether_addr(ifibss->bssid)) { + capability = WLAN_CAPABILITY_IBSS; + + if (ifibss->privacy) + capability |= WLAN_CAPABILITY_PRIVACY; + + cbss = cfg80211_get_bss(local->hw.wiphy, ifibss->chandef.chan, + ifibss->bssid, ifibss->ssid, + ifibss->ssid_len, WLAN_CAPABILITY_IBSS | + WLAN_CAPABILITY_PRIVACY, + capability); + + if (cbss) { + cfg80211_unlink_bss(local->hw.wiphy, cbss); + cfg80211_put_bss(sdata->local->hw.wiphy, cbss); + } + } + + ifibss->state = IEEE80211_IBSS_MLME_SEARCH; + + sta_info_flush(sdata); + + spin_lock_bh(&ifibss->incomplete_lock); + while (!list_empty(&ifibss->incomplete_stations)) { + sta = list_first_entry(&ifibss->incomplete_stations, + struct sta_info, list); + list_del(&sta->list); + spin_unlock_bh(&ifibss->incomplete_lock); + + sta_info_free(local, sta); + spin_lock_bh(&ifibss->incomplete_lock); + } + spin_unlock_bh(&ifibss->incomplete_lock); + + netif_carrier_off(sdata->dev); + + sdata->vif.bss_conf.ibss_joined = false; + sdata->vif.bss_conf.ibss_creator = false; + sdata->vif.bss_conf.enable_beacon = false; + sdata->vif.bss_conf.ssid_len = 0; + + /* remove beacon */ + presp = rcu_dereference_protected(ifibss->presp, + lockdep_is_held(&sdata->wdev.mtx)); + RCU_INIT_POINTER(sdata->u.ibss.presp, NULL); + if (presp) + kfree_rcu(presp, rcu_head); + + clear_bit(SDATA_STATE_OFFCHANNEL_BEACON_STOPPED, &sdata->state); + ieee80211_bss_info_change_notify(sdata, BSS_CHANGED_BEACON_ENABLED | + BSS_CHANGED_IBSS); + drv_leave_ibss(local, sdata); + ieee80211_vif_release_channel(sdata); +} + +static void ieee80211_csa_connection_drop_work(struct work_struct *work) +{ + struct ieee80211_sub_if_data *sdata = + container_of(work, struct ieee80211_sub_if_data, + u.ibss.csa_connection_drop_work); + + ieee80211_ibss_disconnect(sdata); + synchronize_rcu(); + skb_queue_purge(&sdata->skb_queue); + + /* trigger a scan to find another IBSS network to join */ + ieee80211_queue_work(&sdata->local->hw, &sdata->work); +} + +static void ieee80211_ibss_csa_mark_radar(struct ieee80211_sub_if_data *sdata) +{ + struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; + int err; + + /* if the current channel is a DFS channel, mark the channel as + * unavailable. + */ + err = cfg80211_chandef_dfs_required(sdata->local->hw.wiphy, + &ifibss->chandef); + if (err > 0) + cfg80211_radar_event(sdata->local->hw.wiphy, &ifibss->chandef, + GFP_ATOMIC); +} + +static bool +ieee80211_ibss_process_chanswitch(struct ieee80211_sub_if_data *sdata, + struct ieee802_11_elems *elems, + bool beacon) +{ + struct cfg80211_csa_settings params; + struct ieee80211_csa_ie csa_ie; + struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; + struct ieee80211_chanctx_conf *chanctx_conf; + struct ieee80211_chanctx *chanctx; + enum nl80211_channel_type ch_type; + int err, num_chanctx; + u32 sta_flags; + + if (sdata->vif.csa_active) + return true; + + if (!sdata->vif.bss_conf.ibss_joined) + return false; + + sta_flags = IEEE80211_STA_DISABLE_VHT; + switch (ifibss->chandef.width) { + case NL80211_CHAN_WIDTH_5: + case NL80211_CHAN_WIDTH_10: + case NL80211_CHAN_WIDTH_20_NOHT: + sta_flags |= IEEE80211_STA_DISABLE_HT; + /* fall through */ + case NL80211_CHAN_WIDTH_20: + sta_flags |= IEEE80211_STA_DISABLE_40MHZ; + break; + default: + break; + } + + memset(¶ms, 0, sizeof(params)); + memset(&csa_ie, 0, sizeof(csa_ie)); + err = ieee80211_parse_ch_switch_ie(sdata, elems, beacon, + ifibss->chandef.chan->band, + sta_flags, ifibss->bssid, &csa_ie); + /* can't switch to destination channel, fail */ + if (err < 0) + goto disconnect; + + /* did not contain a CSA */ + if (err) + return false; + + params.count = csa_ie.count; + params.chandef = csa_ie.chandef; + + if (ifibss->chandef.chan->band != params.chandef.chan->band) + goto disconnect; + + switch (ifibss->chandef.width) { + case NL80211_CHAN_WIDTH_20_NOHT: + case NL80211_CHAN_WIDTH_20: + case NL80211_CHAN_WIDTH_40: + /* keep our current HT mode (HT20/HT40+/HT40-), even if + * another mode has been announced. The mode is not adopted + * within the beacon while doing CSA and we should therefore + * keep the mode which we announce. + */ + ch_type = cfg80211_get_chandef_type(&ifibss->chandef); + cfg80211_chandef_create(¶ms.chandef, params.chandef.chan, + ch_type); + break; + case NL80211_CHAN_WIDTH_5: + case NL80211_CHAN_WIDTH_10: + if (params.chandef.width != ifibss->chandef.width) { + sdata_info(sdata, + "IBSS %pM received channel switch from incompatible channel width (%d MHz, width:%d, CF1/2: %d/%d MHz), disconnecting\n", + ifibss->bssid, + params.chandef.chan->center_freq, + params.chandef.width, + params.chandef.center_freq1, + params.chandef.center_freq2); + goto disconnect; + } + break; + default: + /* should not happen, sta_flags should prevent VHT modes. */ + WARN_ON(1); + goto disconnect; + } + + if (!cfg80211_reg_can_beacon(sdata->local->hw.wiphy, ¶ms.chandef)) { + sdata_info(sdata, + "IBSS %pM switches to unsupported channel (%d MHz, width:%d, CF1/2: %d/%d MHz), disconnecting\n", + ifibss->bssid, + params.chandef.chan->center_freq, + params.chandef.width, + params.chandef.center_freq1, + params.chandef.center_freq2); + goto disconnect; + } + + err = cfg80211_chandef_dfs_required(sdata->local->hw.wiphy, + ¶ms.chandef); + if (err < 0) + goto disconnect; + if (err) { + /* IBSS-DFS only allowed with a control program */ + if (!ifibss->userspace_handles_dfs) + goto disconnect; + + params.radar_required = true; + } + + rcu_read_lock(); + chanctx_conf = rcu_dereference(sdata->vif.chanctx_conf); + if (!chanctx_conf) { + rcu_read_unlock(); + goto disconnect; + } + + /* don't handle for multi-VIF cases */ + chanctx = container_of(chanctx_conf, struct ieee80211_chanctx, conf); + if (chanctx->refcount > 1) { + rcu_read_unlock(); + goto disconnect; + } + num_chanctx = 0; + list_for_each_entry_rcu(chanctx, &sdata->local->chanctx_list, list) + num_chanctx++; + + if (num_chanctx > 1) { + rcu_read_unlock(); + goto disconnect; + } + rcu_read_unlock(); + + /* all checks done, now perform the channel switch. */ + ibss_dbg(sdata, + "received channel switch announcement to go to channel %d MHz\n", + params.chandef.chan->center_freq); + + params.block_tx = !!csa_ie.mode; + + ieee80211_ibss_csa_beacon(sdata, ¶ms); + sdata->csa_radar_required = params.radar_required; + + if (params.block_tx) + ieee80211_stop_queues_by_reason(&sdata->local->hw, + IEEE80211_MAX_QUEUE_MAP, + IEEE80211_QUEUE_STOP_REASON_CSA); + + sdata->local->csa_chandef = params.chandef; + sdata->vif.csa_active = true; + + ieee80211_bss_info_change_notify(sdata, err); + drv_channel_switch_beacon(sdata, ¶ms.chandef); + + ieee80211_ibss_csa_mark_radar(sdata); + + return true; +disconnect: + ibss_dbg(sdata, "Can't handle channel switch, disconnect\n"); + ieee80211_queue_work(&sdata->local->hw, + &ifibss->csa_connection_drop_work); + + ieee80211_ibss_csa_mark_radar(sdata); + + return true; +} + +static void +ieee80211_rx_mgmt_spectrum_mgmt(struct ieee80211_sub_if_data *sdata, + struct ieee80211_mgmt *mgmt, size_t len, + struct ieee80211_rx_status *rx_status, + struct ieee802_11_elems *elems) +{ + int required_len; + + if (len < IEEE80211_MIN_ACTION_SIZE + 1) + return; + + /* CSA is the only action we handle for now */ + if (mgmt->u.action.u.measurement.action_code != + WLAN_ACTION_SPCT_CHL_SWITCH) + return; + + required_len = IEEE80211_MIN_ACTION_SIZE + + sizeof(mgmt->u.action.u.chan_switch); + if (len < required_len) + return; + + ieee80211_ibss_process_chanswitch(sdata, elems, false); +} + static void ieee80211_rx_mgmt_deauth_ibss(struct ieee80211_sub_if_data *sdata, struct ieee80211_mgmt *mgmt, size_t len) @@ -661,10 +1127,6 @@ static void ieee80211_rx_bss_info(struct ieee80211_sub_if_data *sdata, /* check if we need to merge IBSS */ - /* we use a fixed BSSID */ - if (sdata->u.ibss.fixed_bssid) - goto put_bss; - /* not an IBSS */ if (!(cbss->capability & WLAN_CAPABILITY_IBSS)) goto put_bss; @@ -680,10 +1142,18 @@ static void ieee80211_rx_bss_info(struct ieee80211_sub_if_data *sdata, sdata->u.ibss.ssid_len)) goto put_bss; + /* process channel switch */ + if (ieee80211_ibss_process_chanswitch(sdata, elems, true)) + goto put_bss; + /* same BSSID */ if (ether_addr_equal(cbss->bssid, sdata->u.ibss.bssid)) goto put_bss; + /* we use a fixed BSSID */ + if (sdata->u.ibss.fixed_bssid) + goto put_bss; + if (ieee80211_have_rx_timestamp(rx_status)) { /* time when timestamp field was received */ rx_timestamp = @@ -775,30 +1245,6 @@ void ieee80211_ibss_rx_no_sta(struct ieee80211_sub_if_data *sdata, ieee80211_queue_work(&local->hw, &sdata->work); } -static int ieee80211_sta_active_ibss(struct ieee80211_sub_if_data *sdata) -{ - struct ieee80211_local *local = sdata->local; - int active = 0; - struct sta_info *sta; - - sdata_assert_lock(sdata); - - rcu_read_lock(); - - list_for_each_entry_rcu(sta, &local->sta_list, list) { - if (sta->sdata == sdata && - time_after(sta->last_rx + IEEE80211_IBSS_MERGE_INTERVAL, - jiffies)) { - active++; - break; - } - } - - rcu_read_unlock(); - - return active; -} - static void ieee80211_ibss_sta_expire(struct ieee80211_sub_if_data *sdata) { struct ieee80211_local *local = sdata->local; @@ -1076,6 +1522,8 @@ void ieee80211_ibss_rx_queued_mgmt(struct ieee80211_sub_if_data *sdata, struct ieee80211_rx_status *rx_status; struct ieee80211_mgmt *mgmt; u16 fc; + struct ieee802_11_elems elems; + int ies_len; rx_status = IEEE80211_SKB_RXCB(skb); mgmt = (struct ieee80211_mgmt *) skb->data; @@ -1101,6 +1549,27 @@ void ieee80211_ibss_rx_queued_mgmt(struct ieee80211_sub_if_data *sdata, case IEEE80211_STYPE_DEAUTH: ieee80211_rx_mgmt_deauth_ibss(sdata, mgmt, skb->len); break; + case IEEE80211_STYPE_ACTION: + switch (mgmt->u.action.category) { + case WLAN_CATEGORY_SPECTRUM_MGMT: + ies_len = skb->len - + offsetof(struct ieee80211_mgmt, + u.action.u.chan_switch.variable); + + if (ies_len < 0) + break; + + ieee802_11_parse_elems( + mgmt->u.action.u.chan_switch.variable, + ies_len, true, &elems); + + if (elems.parse_error) + break; + + ieee80211_rx_mgmt_spectrum_mgmt(sdata, mgmt, skb->len, + rx_status, &elems); + break; + } } mgmt_out: @@ -1167,6 +1636,8 @@ void ieee80211_ibss_setup_sdata(struct ieee80211_sub_if_data *sdata) (unsigned long) sdata); INIT_LIST_HEAD(&ifibss->incomplete_stations); spin_lock_init(&ifibss->incomplete_lock); + INIT_WORK(&ifibss->csa_connection_drop_work, + ieee80211_csa_connection_drop_work); } /* scan finished notification */ @@ -1202,6 +1673,7 @@ int ieee80211_ibss_join(struct ieee80211_sub_if_data *sdata, sdata->u.ibss.privacy = params->privacy; sdata->u.ibss.control_port = params->control_port; + sdata->u.ibss.userspace_handles_dfs = params->userspace_handles_dfs; sdata->u.ibss.basic_rates = params->basic_rates; /* fix basic_rates if channel does not support these rates */ @@ -1265,73 +1737,19 @@ int ieee80211_ibss_join(struct ieee80211_sub_if_data *sdata, int ieee80211_ibss_leave(struct ieee80211_sub_if_data *sdata) { struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; - struct ieee80211_local *local = sdata->local; - struct cfg80211_bss *cbss; - u16 capability; - int active_ibss; - struct sta_info *sta; - struct beacon_data *presp; - - active_ibss = ieee80211_sta_active_ibss(sdata); - - if (!active_ibss && !is_zero_ether_addr(ifibss->bssid)) { - capability = WLAN_CAPABILITY_IBSS; - if (ifibss->privacy) - capability |= WLAN_CAPABILITY_PRIVACY; - - cbss = cfg80211_get_bss(local->hw.wiphy, ifibss->chandef.chan, - ifibss->bssid, ifibss->ssid, - ifibss->ssid_len, WLAN_CAPABILITY_IBSS | - WLAN_CAPABILITY_PRIVACY, - capability); - - if (cbss) { - cfg80211_unlink_bss(local->hw.wiphy, cbss); - cfg80211_put_bss(local->hw.wiphy, cbss); - } - } - - ifibss->state = IEEE80211_IBSS_MLME_SEARCH; - memset(ifibss->bssid, 0, ETH_ALEN); + ieee80211_ibss_disconnect(sdata); ifibss->ssid_len = 0; - - sta_info_flush(sdata); - - spin_lock_bh(&ifibss->incomplete_lock); - while (!list_empty(&ifibss->incomplete_stations)) { - sta = list_first_entry(&ifibss->incomplete_stations, - struct sta_info, list); - list_del(&sta->list); - spin_unlock_bh(&ifibss->incomplete_lock); - - sta_info_free(local, sta); - spin_lock_bh(&ifibss->incomplete_lock); - } - spin_unlock_bh(&ifibss->incomplete_lock); - - netif_carrier_off(sdata->dev); + memset(ifibss->bssid, 0, ETH_ALEN); /* remove beacon */ kfree(sdata->u.ibss.ie); - presp = rcu_dereference_protected(ifibss->presp, - lockdep_is_held(&sdata->wdev.mtx)); - RCU_INIT_POINTER(sdata->u.ibss.presp, NULL); /* on the next join, re-program HT parameters */ memset(&ifibss->ht_capa, 0, sizeof(ifibss->ht_capa)); memset(&ifibss->ht_capa_mask, 0, sizeof(ifibss->ht_capa_mask)); - sdata->vif.bss_conf.ibss_joined = false; - sdata->vif.bss_conf.ibss_creator = false; - sdata->vif.bss_conf.enable_beacon = false; - sdata->vif.bss_conf.ssid_len = 0; - clear_bit(SDATA_STATE_OFFCHANNEL_BEACON_STOPPED, &sdata->state); - ieee80211_bss_info_change_notify(sdata, BSS_CHANGED_BEACON_ENABLED | - BSS_CHANGED_IBSS); - ieee80211_vif_release_channel(sdata); synchronize_rcu(); - kfree(presp); skb_queue_purge(&sdata->skb_queue); diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h index 611abfcfb5eb..29dc505be125 100644 --- a/net/mac80211/ieee80211_i.h +++ b/net/mac80211/ieee80211_i.h @@ -262,6 +262,10 @@ struct ieee80211_if_ap { struct ps_data ps; atomic_t num_mcast_sta; /* number of stations receiving multicast */ + enum ieee80211_smps_mode req_smps, /* requested smps mode */ + driver_smps_mode; /* smps mode request */ + + struct work_struct request_smps_work; }; struct ieee80211_if_wds { @@ -322,7 +326,6 @@ struct ieee80211_roc_work { /* flags used in struct ieee80211_if_managed.flags */ enum ieee80211_sta_flags { - IEEE80211_STA_BEACON_POLL = BIT(0), IEEE80211_STA_CONNECTION_POLL = BIT(1), IEEE80211_STA_CONTROL_PORT = BIT(2), IEEE80211_STA_DISABLE_HT = BIT(4), @@ -335,6 +338,7 @@ enum ieee80211_sta_flags { IEEE80211_STA_DISABLE_VHT = BIT(11), IEEE80211_STA_DISABLE_80P80MHZ = BIT(12), IEEE80211_STA_DISABLE_160MHZ = BIT(13), + IEEE80211_STA_DISABLE_WMM = BIT(14), }; struct ieee80211_mgd_auth_data { @@ -487,6 +491,7 @@ struct ieee80211_if_managed { struct ieee80211_if_ibss { struct timer_list timer; + struct work_struct csa_connection_drop_work; unsigned long last_scan_completed; @@ -497,6 +502,7 @@ struct ieee80211_if_ibss { bool privacy; bool control_port; + bool userspace_handles_dfs; u8 bssid[ETH_ALEN] __aligned(2); u8 ssid[IEEE80211_MAX_SSID_LEN]; @@ -538,6 +544,11 @@ struct ieee80211_mesh_sync_ops { /* add other framework functions here */ }; +struct mesh_csa_settings { + struct rcu_head rcu_head; + struct cfg80211_csa_settings settings; +}; + struct ieee80211_if_mesh { struct timer_list housekeeping_timer; struct timer_list mesh_path_timer; @@ -598,6 +609,11 @@ struct ieee80211_if_mesh { int ps_peers_light_sleep; int ps_peers_deep_sleep; struct ps_data ps; + /* Channel Switching Support */ + struct mesh_csa_settings __rcu *csa; + bool chsw_init; + u8 chsw_ttl; + u16 pre_value; }; #ifdef CONFIG_MAC80211_MESH @@ -1206,6 +1222,14 @@ struct ieee80211_ra_tid { u16 tid; }; +/* this struct holds the value parsing from channel switch IE */ +struct ieee80211_csa_ie { + struct cfg80211_chan_def chandef; + u8 mode; + u8 count; + u8 ttl; +}; + /* Parsed Information Elements */ struct ieee802_11_elems { const u8 *ie_start; @@ -1242,6 +1266,7 @@ struct ieee802_11_elems { const struct ieee80211_timeout_interval_ie *timeout_int; const u8 *opmode_notif; const struct ieee80211_sec_chan_offs_ie *sec_chan_offs; + const struct ieee80211_mesh_chansw_params_ie *mesh_chansw_params_ie; /* length of them, respectively */ u8 ssid_len; @@ -1333,11 +1358,19 @@ int ieee80211_ibss_leave(struct ieee80211_sub_if_data *sdata); void ieee80211_ibss_work(struct ieee80211_sub_if_data *sdata); void ieee80211_ibss_rx_queued_mgmt(struct ieee80211_sub_if_data *sdata, struct sk_buff *skb); +int ieee80211_ibss_csa_beacon(struct ieee80211_sub_if_data *sdata, + struct cfg80211_csa_settings *csa_settings); +int ieee80211_ibss_finish_csa(struct ieee80211_sub_if_data *sdata); +void ieee80211_ibss_stop(struct ieee80211_sub_if_data *sdata); /* mesh code */ void ieee80211_mesh_work(struct ieee80211_sub_if_data *sdata); void ieee80211_mesh_rx_queued_mgmt(struct ieee80211_sub_if_data *sdata, struct sk_buff *skb); +int ieee80211_mesh_csa_beacon(struct ieee80211_sub_if_data *sdata, + struct cfg80211_csa_settings *csa_settings, + bool csa_action); +int ieee80211_mesh_finish_csa(struct ieee80211_sub_if_data *sdata); /* scan/BSS handling */ void ieee80211_scan_work(struct work_struct *work); @@ -1434,7 +1467,10 @@ void ieee80211_send_delba(struct ieee80211_sub_if_data *sdata, int ieee80211_send_smps_action(struct ieee80211_sub_if_data *sdata, enum ieee80211_smps_mode smps, const u8 *da, const u8 *bssid); -void ieee80211_request_smps_work(struct work_struct *work); +void ieee80211_request_smps_ap_work(struct work_struct *work); +void ieee80211_request_smps_mgd_work(struct work_struct *work); +bool ieee80211_smps_is_restrictive(enum ieee80211_smps_mode smps_mode_old, + enum ieee80211_smps_mode smps_mode_new); void ___ieee80211_stop_rx_ba_session(struct sta_info *sta, u16 tid, u16 initiator, u16 reason, bool stop); @@ -1484,6 +1520,28 @@ void ieee80211_apply_vhtcap_overrides(struct ieee80211_sub_if_data *sdata, void ieee80211_process_measurement_req(struct ieee80211_sub_if_data *sdata, struct ieee80211_mgmt *mgmt, size_t len); +/** + * ieee80211_parse_ch_switch_ie - parses channel switch IEs + * @sdata: the sdata of the interface which has received the frame + * @elems: parsed 802.11 elements received with the frame + * @beacon: indicates if the frame was a beacon or probe response + * @current_band: indicates the current band + * @sta_flags: contains information about own capabilities and restrictions + * to decide which channel switch announcements can be accepted. Only the + * following subset of &enum ieee80211_sta_flags are evaluated: + * %IEEE80211_STA_DISABLE_HT, %IEEE80211_STA_DISABLE_VHT, + * %IEEE80211_STA_DISABLE_40MHZ, %IEEE80211_STA_DISABLE_80P80MHZ, + * %IEEE80211_STA_DISABLE_160MHZ. + * @bssid: the currently connected bssid (for reporting) + * @csa_ie: parsed 802.11 csa elements on count, mode, chandef and mesh ttl. + All of them will be filled with if success only. + * Return: 0 on success, <0 on error and >0 if there is nothing to parse. + */ +int ieee80211_parse_ch_switch_ie(struct ieee80211_sub_if_data *sdata, + struct ieee802_11_elems *elems, bool beacon, + enum ieee80211_band current_band, + u32 sta_flags, u8 *bssid, + struct ieee80211_csa_ie *csa_ie); /* Suspend/resume and hw reconfiguration */ int ieee80211_reconfig(struct ieee80211_local *local); @@ -1629,8 +1687,10 @@ void ieee80211_send_probe_req(struct ieee80211_sub_if_data *sdata, u8 *dst, u32 ieee80211_sta_get_rates(struct ieee80211_sub_if_data *sdata, struct ieee802_11_elems *elems, enum ieee80211_band band, u32 *basic_rates); -int __ieee80211_request_smps(struct ieee80211_sub_if_data *sdata, - enum ieee80211_smps_mode smps_mode); +int __ieee80211_request_smps_mgd(struct ieee80211_sub_if_data *sdata, + enum ieee80211_smps_mode smps_mode); +int __ieee80211_request_smps_ap(struct ieee80211_sub_if_data *sdata, + enum ieee80211_smps_mode smps_mode); void ieee80211_recalc_smps(struct ieee80211_sub_if_data *sdata); size_t ieee80211_ie_split(const u8 *ies, size_t ielen, @@ -1657,6 +1717,7 @@ int ieee80211_add_ext_srates_ie(struct ieee80211_sub_if_data *sdata, void ieee80211_ht_oper_to_chandef(struct ieee80211_channel *control_chan, const struct ieee80211_ht_operation *ht_oper, struct cfg80211_chan_def *chandef); +u32 ieee80211_chandef_downgrade(struct cfg80211_chan_def *c); int __must_check ieee80211_vif_use_channel(struct ieee80211_sub_if_data *sdata, @@ -1685,6 +1746,8 @@ void ieee80211_dfs_cac_timer(unsigned long data); void ieee80211_dfs_cac_timer_work(struct work_struct *work); void ieee80211_dfs_cac_cancel(struct ieee80211_local *local); void ieee80211_dfs_radar_detected_work(struct work_struct *work); +int ieee80211_send_action_csa(struct ieee80211_sub_if_data *sdata, + struct cfg80211_csa_settings *csa_settings); #ifdef CONFIG_MAC80211_NOINLINE #define debug_noinline noinline diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c index fcecd633514e..ff101ea1d9ae 100644 --- a/net/mac80211/iface.c +++ b/net/mac80211/iface.c @@ -766,6 +766,10 @@ static void ieee80211_do_stop(struct ieee80211_sub_if_data *sdata, if (sdata->vif.type == NL80211_IFTYPE_STATION) ieee80211_mgd_stop(sdata); + if (sdata->vif.type == NL80211_IFTYPE_ADHOC) + ieee80211_ibss_stop(sdata); + + /* * Remove all stations associated with this interface. * @@ -1289,7 +1293,10 @@ static void ieee80211_setup_sdata(struct ieee80211_sub_if_data *sdata, case NL80211_IFTYPE_AP: skb_queue_head_init(&sdata->u.ap.ps.bc_buf); INIT_LIST_HEAD(&sdata->u.ap.vlans); + INIT_WORK(&sdata->u.ap.request_smps_work, + ieee80211_request_smps_ap_work); sdata->vif.bss_conf.bssid = sdata->vif.addr; + sdata->u.ap.req_smps = IEEE80211_SMPS_OFF; break; case NL80211_IFTYPE_P2P_CLIENT: type = NL80211_IFTYPE_STATION; diff --git a/net/mac80211/key.c b/net/mac80211/key.c index 620677e897bd..3e51dd7d98b3 100644 --- a/net/mac80211/key.c +++ b/net/mac80211/key.c @@ -879,7 +879,7 @@ ieee80211_gtk_rekey_add(struct ieee80211_vif *vif, keyconf->keylen, keyconf->key, 0, NULL); if (IS_ERR(key)) - return ERR_PTR(PTR_ERR(key)); + return ERR_CAST(key); if (sdata->u.mgd.mfp != IEEE80211_MFP_DISABLED) key->conf.flags |= IEEE80211_KEY_FLAG_RX_MGMT; diff --git a/net/mac80211/key.h b/net/mac80211/key.h index 036d57e76a5e..aaae0ed37004 100644 --- a/net/mac80211/key.h +++ b/net/mac80211/key.h @@ -83,7 +83,7 @@ struct ieee80211_key { * Management frames. */ u8 rx_pn[IEEE80211_NUM_TIDS + 1][IEEE80211_CCMP_PN_LEN]; - struct crypto_cipher *tfm; + struct crypto_aead *tfm; u32 replays; /* dot11RSNAStatsCCMPReplays */ } ccmp; struct { diff --git a/net/mac80211/mesh.c b/net/mac80211/mesh.c index 707ac61d63e5..896fe3bd599e 100644 --- a/net/mac80211/mesh.c +++ b/net/mac80211/mesh.c @@ -12,6 +12,7 @@ #include <asm/unaligned.h> #include "ieee80211_i.h" #include "mesh.h" +#include "driver-ops.h" static int mesh_allocated; static struct kmem_cache *rm_cache; @@ -610,6 +611,7 @@ ieee80211_mesh_build_beacon(struct ieee80211_if_mesh *ifmsh) struct sk_buff *skb; struct ieee80211_mgmt *mgmt; struct ieee80211_chanctx_conf *chanctx_conf; + struct mesh_csa_settings *csa; enum ieee80211_band band; u8 *pos; struct ieee80211_sub_if_data *sdata; @@ -624,6 +626,10 @@ ieee80211_mesh_build_beacon(struct ieee80211_if_mesh *ifmsh) head_len = hdr_len + 2 + /* NULL SSID */ + /* Channel Switch Announcement */ + 2 + sizeof(struct ieee80211_channel_sw_ie) + + /* Mesh Channel Swith Parameters */ + 2 + sizeof(struct ieee80211_mesh_chansw_params_ie) + 2 + 8 + /* supported rates */ 2 + 3; /* DS params */ tail_len = 2 + (IEEE80211_MAX_SUPP_RATES - 8) + @@ -665,6 +671,38 @@ ieee80211_mesh_build_beacon(struct ieee80211_if_mesh *ifmsh) *pos++ = WLAN_EID_SSID; *pos++ = 0x0; + rcu_read_lock(); + csa = rcu_dereference(ifmsh->csa); + if (csa) { + __le16 pre_value; + + pos = skb_put(skb, 13); + memset(pos, 0, 13); + *pos++ = WLAN_EID_CHANNEL_SWITCH; + *pos++ = 3; + *pos++ = 0x0; + *pos++ = ieee80211_frequency_to_channel( + csa->settings.chandef.chan->center_freq); + sdata->csa_counter_offset_beacon = hdr_len + 6; + *pos++ = csa->settings.count; + *pos++ = WLAN_EID_CHAN_SWITCH_PARAM; + *pos++ = 6; + if (ifmsh->chsw_init) { + *pos++ = ifmsh->mshcfg.dot11MeshTTL; + *pos |= WLAN_EID_CHAN_SWITCH_PARAM_INITIATOR; + } else { + *pos++ = ifmsh->chsw_ttl; + } + *pos++ |= csa->settings.block_tx ? + WLAN_EID_CHAN_SWITCH_PARAM_TX_RESTRICT : 0x00; + put_unaligned_le16(WLAN_REASON_MESH_CHAN, pos); + pos += 2; + pre_value = cpu_to_le16(ifmsh->pre_value); + memcpy(pos, &pre_value, 2); + pos += 2; + } + rcu_read_unlock(); + if (ieee80211_add_srates_ie(sdata, skb, true, band) || mesh_add_ds_params_ie(sdata, skb)) goto out_free; @@ -812,6 +850,127 @@ void ieee80211_stop_mesh(struct ieee80211_sub_if_data *sdata) ieee80211_configure_filter(local); } +static bool +ieee80211_mesh_process_chnswitch(struct ieee80211_sub_if_data *sdata, + struct ieee802_11_elems *elems, bool beacon) +{ + struct cfg80211_csa_settings params; + struct ieee80211_csa_ie csa_ie; + struct ieee80211_chanctx_conf *chanctx_conf; + struct ieee80211_chanctx *chanctx; + struct ieee80211_if_mesh *ifmsh = &sdata->u.mesh; + enum ieee80211_band band = ieee80211_get_sdata_band(sdata); + int err, num_chanctx; + u32 sta_flags; + + if (sdata->vif.csa_active) + return true; + + if (!ifmsh->mesh_id) + return false; + + sta_flags = IEEE80211_STA_DISABLE_VHT; + switch (sdata->vif.bss_conf.chandef.width) { + case NL80211_CHAN_WIDTH_20_NOHT: + sta_flags |= IEEE80211_STA_DISABLE_HT; + case NL80211_CHAN_WIDTH_20: + sta_flags |= IEEE80211_STA_DISABLE_40MHZ; + break; + default: + break; + } + + memset(¶ms, 0, sizeof(params)); + memset(&csa_ie, 0, sizeof(csa_ie)); + err = ieee80211_parse_ch_switch_ie(sdata, elems, beacon, band, + sta_flags, sdata->vif.addr, + &csa_ie); + if (err < 0) + return false; + if (err) + return false; + + params.chandef = csa_ie.chandef; + params.count = csa_ie.count; + + if (sdata->vif.bss_conf.chandef.chan->band != + params.chandef.chan->band) + return false; + + if (!cfg80211_chandef_usable(sdata->local->hw.wiphy, ¶ms.chandef, + IEEE80211_CHAN_DISABLED)) { + sdata_info(sdata, + "mesh STA %pM switches to unsupported channel (%d MHz, width:%d, CF1/2: %d/%d MHz), aborting\n", + sdata->vif.addr, + params.chandef.chan->center_freq, + params.chandef.width, + params.chandef.center_freq1, + params.chandef.center_freq2); + return false; + } + + err = cfg80211_chandef_dfs_required(sdata->local->hw.wiphy, + ¶ms.chandef); + if (err < 0) + return false; + if (err) { + params.radar_required = true; + /* TODO: DFS not (yet) supported */ + return false; + } + + rcu_read_lock(); + chanctx_conf = rcu_dereference(sdata->vif.chanctx_conf); + if (!chanctx_conf) + goto failed_chswitch; + + /* don't handle for multi-VIF cases */ + chanctx = container_of(chanctx_conf, struct ieee80211_chanctx, conf); + if (chanctx->refcount > 1) + goto failed_chswitch; + + num_chanctx = 0; + list_for_each_entry_rcu(chanctx, &sdata->local->chanctx_list, list) + num_chanctx++; + + if (num_chanctx > 1) + goto failed_chswitch; + + rcu_read_unlock(); + + mcsa_dbg(sdata, + "received channel switch announcement to go to channel %d MHz\n", + params.chandef.chan->center_freq); + + params.block_tx = csa_ie.mode & WLAN_EID_CHAN_SWITCH_PARAM_TX_RESTRICT; + if (beacon) + ifmsh->chsw_ttl = csa_ie.ttl - 1; + else + ifmsh->chsw_ttl = 0; + + if (ifmsh->chsw_ttl > 0) + if (ieee80211_mesh_csa_beacon(sdata, ¶ms, false) < 0) + return false; + + sdata->csa_radar_required = params.radar_required; + + if (params.block_tx) + ieee80211_stop_queues_by_reason(&sdata->local->hw, + IEEE80211_MAX_QUEUE_MAP, + IEEE80211_QUEUE_STOP_REASON_CSA); + + sdata->local->csa_chandef = params.chandef; + sdata->vif.csa_active = true; + + ieee80211_bss_info_change_notify(sdata, err); + drv_channel_switch_beacon(sdata, ¶ms.chandef); + + return true; +failed_chswitch: + rcu_read_unlock(); + return false; +} + static void ieee80211_mesh_rx_probe_req(struct ieee80211_sub_if_data *sdata, struct ieee80211_mgmt *mgmt, size_t len) @@ -918,6 +1077,142 @@ static void ieee80211_mesh_rx_bcn_presp(struct ieee80211_sub_if_data *sdata, if (ifmsh->sync_ops) ifmsh->sync_ops->rx_bcn_presp(sdata, stype, mgmt, &elems, rx_status); + + if (!ifmsh->chsw_init) + ieee80211_mesh_process_chnswitch(sdata, &elems, true); +} + +int ieee80211_mesh_finish_csa(struct ieee80211_sub_if_data *sdata) +{ + struct ieee80211_if_mesh *ifmsh = &sdata->u.mesh; + struct mesh_csa_settings *tmp_csa_settings; + int ret = 0; + + /* Reset the TTL value and Initiator flag */ + ifmsh->chsw_init = false; + ifmsh->chsw_ttl = 0; + + /* Remove the CSA and MCSP elements from the beacon */ + tmp_csa_settings = rcu_dereference(ifmsh->csa); + rcu_assign_pointer(ifmsh->csa, NULL); + kfree_rcu(tmp_csa_settings, rcu_head); + ret = ieee80211_mesh_rebuild_beacon(sdata); + if (ret) + return -EINVAL; + + ieee80211_bss_info_change_notify(sdata, BSS_CHANGED_BEACON); + + mcsa_dbg(sdata, "complete switching to center freq %d MHz", + sdata->vif.bss_conf.chandef.chan->center_freq); + return 0; +} + +int ieee80211_mesh_csa_beacon(struct ieee80211_sub_if_data *sdata, + struct cfg80211_csa_settings *csa_settings, + bool csa_action) +{ + struct ieee80211_if_mesh *ifmsh = &sdata->u.mesh; + struct mesh_csa_settings *tmp_csa_settings; + int ret = 0; + + tmp_csa_settings = kmalloc(sizeof(*tmp_csa_settings), + GFP_ATOMIC); + if (!tmp_csa_settings) + return -ENOMEM; + + memcpy(&tmp_csa_settings->settings, csa_settings, + sizeof(struct cfg80211_csa_settings)); + + rcu_assign_pointer(ifmsh->csa, tmp_csa_settings); + + ret = ieee80211_mesh_rebuild_beacon(sdata); + if (ret) { + tmp_csa_settings = rcu_dereference(ifmsh->csa); + rcu_assign_pointer(ifmsh->csa, NULL); + kfree_rcu(tmp_csa_settings, rcu_head); + return ret; + } + + ieee80211_bss_info_change_notify(sdata, BSS_CHANGED_BEACON); + + if (csa_action) + ieee80211_send_action_csa(sdata, csa_settings); + + return 0; +} + +static int mesh_fwd_csa_frame(struct ieee80211_sub_if_data *sdata, + struct ieee80211_mgmt *mgmt, size_t len) +{ + struct ieee80211_mgmt *mgmt_fwd; + struct sk_buff *skb; + struct ieee80211_local *local = sdata->local; + u8 *pos = mgmt->u.action.u.chan_switch.variable; + size_t offset_ttl; + + skb = dev_alloc_skb(local->tx_headroom + len); + if (!skb) + return -ENOMEM; + skb_reserve(skb, local->tx_headroom); + mgmt_fwd = (struct ieee80211_mgmt *) skb_put(skb, len); + + /* offset_ttl is based on whether the secondary channel + * offset is available or not. Substract 1 from the mesh TTL + * and disable the initiator flag before forwarding. + */ + offset_ttl = (len < 42) ? 7 : 10; + *(pos + offset_ttl) -= 1; + *(pos + offset_ttl + 1) &= ~WLAN_EID_CHAN_SWITCH_PARAM_INITIATOR; + sdata->u.mesh.chsw_ttl = *(pos + offset_ttl); + + memcpy(mgmt_fwd, mgmt, len); + eth_broadcast_addr(mgmt_fwd->da); + memcpy(mgmt_fwd->sa, sdata->vif.addr, ETH_ALEN); + memcpy(mgmt_fwd->bssid, sdata->vif.addr, ETH_ALEN); + + ieee80211_tx_skb(sdata, skb); + return 0; +} + +static void mesh_rx_csa_frame(struct ieee80211_sub_if_data *sdata, + struct ieee80211_mgmt *mgmt, size_t len) +{ + struct ieee80211_if_mesh *ifmsh = &sdata->u.mesh; + struct ieee802_11_elems elems; + u16 pre_value; + bool fwd_csa = true; + size_t baselen; + u8 *pos, ttl; + + if (mgmt->u.action.u.measurement.action_code != + WLAN_ACTION_SPCT_CHL_SWITCH) + return; + + pos = mgmt->u.action.u.chan_switch.variable; + baselen = offsetof(struct ieee80211_mgmt, + u.action.u.chan_switch.variable); + ieee802_11_parse_elems(pos, len - baselen, false, &elems); + + ttl = elems.mesh_chansw_params_ie->mesh_ttl; + if (!--ttl) + fwd_csa = false; + + pre_value = le16_to_cpu(elems.mesh_chansw_params_ie->mesh_pre_value); + if (ifmsh->pre_value >= pre_value) + return; + + ifmsh->pre_value = pre_value; + + if (!ieee80211_mesh_process_chnswitch(sdata, &elems, false)) { + mcsa_dbg(sdata, "Failed to process CSA action frame"); + return; + } + + /* forward or re-broadcast the CSA frame */ + if (fwd_csa) { + if (mesh_fwd_csa_frame(sdata, mgmt, len) < 0) + mcsa_dbg(sdata, "Failed to forward the CSA frame"); + } } static void ieee80211_mesh_rx_mgmt_action(struct ieee80211_sub_if_data *sdata, @@ -939,6 +1234,9 @@ static void ieee80211_mesh_rx_mgmt_action(struct ieee80211_sub_if_data *sdata, if (mesh_action_is_path_sel(mgmt)) mesh_rx_path_sel_frame(sdata, mgmt, len); break; + case WLAN_CATEGORY_SPECTRUM_MGMT: + mesh_rx_csa_frame(sdata, mgmt, len); + break; } } @@ -1056,13 +1354,11 @@ void ieee80211_mesh_init_sdata(struct ieee80211_sub_if_data *sdata) (unsigned long) sdata); ifmsh->accepting_plinks = true; - ifmsh->preq_id = 0; - ifmsh->sn = 0; - ifmsh->num_gates = 0; atomic_set(&ifmsh->mpaths, 0); mesh_rmc_init(sdata); ifmsh->last_preq = jiffies; ifmsh->next_perr = jiffies; + ifmsh->chsw_init = false; /* Allocate all mesh structures when creating the first mesh interface. */ if (!mesh_allocated) ieee80211s_init(); diff --git a/net/mac80211/mesh_plink.c b/net/mac80211/mesh_plink.c index 6b65d5055f5b..4301aa5aa227 100644 --- a/net/mac80211/mesh_plink.c +++ b/net/mac80211/mesh_plink.c @@ -222,7 +222,8 @@ static u32 __mesh_plink_deactivate(struct sta_info *sta) mesh_path_flush_by_nexthop(sta); ieee80211_mps_sta_status_update(sta); - changed |= ieee80211_mps_local_status_update(sdata); + changed |= ieee80211_mps_set_sta_local_pm(sta, + NL80211_MESH_POWER_UNKNOWN); return changed; } diff --git a/net/mac80211/mesh_ps.c b/net/mac80211/mesh_ps.c index 22290a929b94..0f79b78b5e86 100644 --- a/net/mac80211/mesh_ps.c +++ b/net/mac80211/mesh_ps.c @@ -152,6 +152,9 @@ u32 ieee80211_mps_set_sta_local_pm(struct sta_info *sta, { struct ieee80211_sub_if_data *sdata = sta->sdata; + if (sta->local_pm == pm) + return 0; + mps_dbg(sdata, "local STA operates in mode %d with %pM\n", pm, sta->sta.addr); @@ -245,6 +248,14 @@ void ieee80211_mps_sta_status_update(struct sta_info *sta) do_buffer = (pm != NL80211_MESH_POWER_ACTIVE); + /* clear the MPSP flags for non-peers or active STA */ + if (sta->plink_state != NL80211_PLINK_ESTAB) { + clear_sta_flag(sta, WLAN_STA_MPSP_OWNER); + clear_sta_flag(sta, WLAN_STA_MPSP_RECIPIENT); + } else if (!do_buffer) { + clear_sta_flag(sta, WLAN_STA_MPSP_OWNER); + } + /* Don't let the same PS state be set twice */ if (test_sta_flag(sta, WLAN_STA_PS_STA) == do_buffer) return; @@ -257,14 +268,6 @@ void ieee80211_mps_sta_status_update(struct sta_info *sta) } else { ieee80211_sta_ps_deliver_wakeup(sta); } - - /* clear the MPSP flags for non-peers or active STA */ - if (sta->plink_state != NL80211_PLINK_ESTAB) { - clear_sta_flag(sta, WLAN_STA_MPSP_OWNER); - clear_sta_flag(sta, WLAN_STA_MPSP_RECIPIENT); - } else if (!do_buffer) { - clear_sta_flag(sta, WLAN_STA_MPSP_OWNER); - } } static void mps_set_sta_peer_pm(struct sta_info *sta, @@ -444,8 +447,7 @@ static void mpsp_qos_null_append(struct sta_info *sta, */ static void mps_frame_deliver(struct sta_info *sta, int n_frames) { - struct ieee80211_sub_if_data *sdata = sta->sdata; - struct ieee80211_local *local = sdata->local; + struct ieee80211_local *local = sta->sdata->local; int ac; struct sk_buff_head frames; struct sk_buff *skb; @@ -558,10 +560,10 @@ void ieee80211_mpsp_trigger_process(u8 *qc, struct sta_info *sta, } /** - * ieee80211_mps_frame_release - release buffered frames in response to beacon + * ieee80211_mps_frame_release - release frames buffered due to mesh power save * * @sta: mesh STA - * @elems: beacon IEs + * @elems: IEs of beacon or probe response * * For peers if we have individually-addressed frames buffered or the peer * indicates buffered frames, send a corresponding MPSP trigger frame. Since @@ -588,9 +590,10 @@ void ieee80211_mps_frame_release(struct sta_info *sta, (!elems->awake_window || !le16_to_cpu(*elems->awake_window))) return; - for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) - buffer_local += skb_queue_len(&sta->ps_tx_buf[ac]) + - skb_queue_len(&sta->tx_filtered[ac]); + if (!test_sta_flag(sta, WLAN_STA_MPSP_OWNER)) + for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) + buffer_local += skb_queue_len(&sta->ps_tx_buf[ac]) + + skb_queue_len(&sta->tx_filtered[ac]); if (!has_buffered && !buffer_local) return; diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c index 86e4ad56b573..d7504ab61a34 100644 --- a/net/mac80211/mlme.c +++ b/net/mac80211/mlme.c @@ -145,66 +145,6 @@ static int ecw2cw(int ecw) return (1 << ecw) - 1; } -static u32 chandef_downgrade(struct cfg80211_chan_def *c) -{ - u32 ret; - int tmp; - - switch (c->width) { - case NL80211_CHAN_WIDTH_20: - c->width = NL80211_CHAN_WIDTH_20_NOHT; - ret = IEEE80211_STA_DISABLE_HT | IEEE80211_STA_DISABLE_VHT; - break; - case NL80211_CHAN_WIDTH_40: - c->width = NL80211_CHAN_WIDTH_20; - c->center_freq1 = c->chan->center_freq; - ret = IEEE80211_STA_DISABLE_40MHZ | - IEEE80211_STA_DISABLE_VHT; - break; - case NL80211_CHAN_WIDTH_80: - tmp = (30 + c->chan->center_freq - c->center_freq1)/20; - /* n_P40 */ - tmp /= 2; - /* freq_P40 */ - c->center_freq1 = c->center_freq1 - 20 + 40 * tmp; - c->width = NL80211_CHAN_WIDTH_40; - ret = IEEE80211_STA_DISABLE_VHT; - break; - case NL80211_CHAN_WIDTH_80P80: - c->center_freq2 = 0; - c->width = NL80211_CHAN_WIDTH_80; - ret = IEEE80211_STA_DISABLE_80P80MHZ | - IEEE80211_STA_DISABLE_160MHZ; - break; - case NL80211_CHAN_WIDTH_160: - /* n_P20 */ - tmp = (70 + c->chan->center_freq - c->center_freq1)/20; - /* n_P80 */ - tmp /= 4; - c->center_freq1 = c->center_freq1 - 40 + 80 * tmp; - c->width = NL80211_CHAN_WIDTH_80; - ret = IEEE80211_STA_DISABLE_80P80MHZ | - IEEE80211_STA_DISABLE_160MHZ; - break; - default: - case NL80211_CHAN_WIDTH_20_NOHT: - WARN_ON_ONCE(1); - c->width = NL80211_CHAN_WIDTH_20_NOHT; - ret = IEEE80211_STA_DISABLE_HT | IEEE80211_STA_DISABLE_VHT; - break; - case NL80211_CHAN_WIDTH_5: - case NL80211_CHAN_WIDTH_10: - WARN_ON_ONCE(1); - /* keep c->width */ - ret = IEEE80211_STA_DISABLE_HT | IEEE80211_STA_DISABLE_VHT; - break; - } - - WARN_ON_ONCE(!cfg80211_chandef_valid(c)); - - return ret; -} - static u32 ieee80211_determine_chantype(struct ieee80211_sub_if_data *sdata, struct ieee80211_supported_band *sband, @@ -352,7 +292,7 @@ out: break; } - ret |= chandef_downgrade(chandef); + ret |= ieee80211_chandef_downgrade(chandef); } if (chandef->width != vht_chandef.width && !tracking) @@ -406,13 +346,13 @@ static int ieee80211_config_bw(struct ieee80211_sub_if_data *sdata, */ if (ifmgd->flags & IEEE80211_STA_DISABLE_80P80MHZ && chandef.width == NL80211_CHAN_WIDTH_80P80) - flags |= chandef_downgrade(&chandef); + flags |= ieee80211_chandef_downgrade(&chandef); if (ifmgd->flags & IEEE80211_STA_DISABLE_160MHZ && chandef.width == NL80211_CHAN_WIDTH_160) - flags |= chandef_downgrade(&chandef); + flags |= ieee80211_chandef_downgrade(&chandef); if (ifmgd->flags & IEEE80211_STA_DISABLE_40MHZ && chandef.width > NL80211_CHAN_WIDTH_20) - flags |= chandef_downgrade(&chandef); + flags |= ieee80211_chandef_downgrade(&chandef); if (cfg80211_chandef_identical(&chandef, &sdata->vif.bss_conf.chandef)) return 0; @@ -893,8 +833,7 @@ void ieee80211_send_nullfunc(struct ieee80211_local *local, if (local->hw.flags & IEEE80211_HW_REPORTS_TX_ACK_STATUS) IEEE80211_SKB_CB(skb)->flags |= IEEE80211_TX_CTL_REQ_TX_STATUS; - if (ifmgd->flags & (IEEE80211_STA_BEACON_POLL | - IEEE80211_STA_CONNECTION_POLL)) + if (ifmgd->flags & IEEE80211_STA_CONNECTION_POLL) IEEE80211_SKB_CB(skb)->flags |= IEEE80211_TX_CTL_USE_MINRATE; ieee80211_tx_skb(sdata, skb); @@ -937,6 +876,8 @@ static void ieee80211_chswitch_work(struct work_struct *work) container_of(work, struct ieee80211_sub_if_data, u.mgd.chswitch_work); struct ieee80211_local *local = sdata->local; struct ieee80211_if_managed *ifmgd = &sdata->u.mgd; + u32 changed = 0; + int ret; if (!ieee80211_sdata_running(sdata)) return; @@ -945,24 +886,39 @@ static void ieee80211_chswitch_work(struct work_struct *work) if (!ifmgd->associated) goto out; - local->_oper_chandef = local->csa_chandef; + ret = ieee80211_vif_change_channel(sdata, &local->csa_chandef, + &changed); + if (ret) { + sdata_info(sdata, + "vif channel switch failed, disconnecting\n"); + ieee80211_queue_work(&sdata->local->hw, + &ifmgd->csa_connection_drop_work); + goto out; + } - if (!local->ops->channel_switch) { - /* call "hw_config" only if doing sw channel switch */ - ieee80211_hw_config(local, IEEE80211_CONF_CHANGE_CHANNEL); - } else { - /* update the device channel directly */ - local->hw.conf.chandef = local->_oper_chandef; + if (!local->use_chanctx) { + local->_oper_chandef = local->csa_chandef; + /* Call "hw_config" only if doing sw channel switch. + * Otherwise update the channel directly + */ + if (!local->ops->channel_switch) + ieee80211_hw_config(local, 0); + else + local->hw.conf.chandef = local->_oper_chandef; } /* XXX: shouldn't really modify cfg80211-owned data! */ - ifmgd->associated->channel = local->_oper_chandef.chan; + ifmgd->associated->channel = local->csa_chandef.chan; /* XXX: wait for a beacon first? */ ieee80211_wake_queues_by_reason(&local->hw, IEEE80211_MAX_QUEUE_MAP, IEEE80211_QUEUE_STOP_REASON_CSA); + + ieee80211_bss_info_change_notify(sdata, changed); + out: + sdata->vif.csa_active = false; ifmgd->flags &= ~IEEE80211_STA_CSA_RECEIVED; sdata_unlock(sdata); } @@ -1000,20 +956,10 @@ ieee80211_sta_process_chanswitch(struct ieee80211_sub_if_data *sdata, struct ieee80211_local *local = sdata->local; struct ieee80211_if_managed *ifmgd = &sdata->u.mgd; struct cfg80211_bss *cbss = ifmgd->associated; - struct ieee80211_bss *bss; struct ieee80211_chanctx *chanctx; - enum ieee80211_band new_band; - int new_freq; - u8 new_chan_no; - u8 count; - u8 mode; - struct ieee80211_channel *new_chan; - struct cfg80211_chan_def new_chandef = {}; - struct cfg80211_chan_def new_vht_chandef = {}; - const struct ieee80211_sec_chan_offs_ie *sec_chan_offs; - const struct ieee80211_wide_bw_chansw_ie *wide_bw_chansw_ie; - const struct ieee80211_ht_operation *ht_oper; - int secondary_channel_offset = -1; + enum ieee80211_band current_band; + struct ieee80211_csa_ie csa_ie; + int res; sdata_assert_lock(sdata); @@ -1027,181 +973,53 @@ ieee80211_sta_process_chanswitch(struct ieee80211_sub_if_data *sdata, if (ifmgd->flags & IEEE80211_STA_CSA_RECEIVED) return; - sec_chan_offs = elems->sec_chan_offs; - wide_bw_chansw_ie = elems->wide_bw_chansw_ie; - ht_oper = elems->ht_operation; - - if (ifmgd->flags & (IEEE80211_STA_DISABLE_HT | - IEEE80211_STA_DISABLE_40MHZ)) { - sec_chan_offs = NULL; - wide_bw_chansw_ie = NULL; - /* only used for bandwidth here */ - ht_oper = NULL; - } - - if (ifmgd->flags & IEEE80211_STA_DISABLE_VHT) - wide_bw_chansw_ie = NULL; - - if (elems->ext_chansw_ie) { - if (!ieee80211_operating_class_to_band( - elems->ext_chansw_ie->new_operating_class, - &new_band)) { - sdata_info(sdata, - "cannot understand ECSA IE operating class %d, disconnecting\n", - elems->ext_chansw_ie->new_operating_class); - ieee80211_queue_work(&local->hw, - &ifmgd->csa_connection_drop_work); - } - new_chan_no = elems->ext_chansw_ie->new_ch_num; - count = elems->ext_chansw_ie->count; - mode = elems->ext_chansw_ie->mode; - } else if (elems->ch_switch_ie) { - new_band = cbss->channel->band; - new_chan_no = elems->ch_switch_ie->new_ch_num; - count = elems->ch_switch_ie->count; - mode = elems->ch_switch_ie->mode; - } else { - /* nothing here we understand */ + current_band = cbss->channel->band; + memset(&csa_ie, 0, sizeof(csa_ie)); + res = ieee80211_parse_ch_switch_ie(sdata, elems, beacon, current_band, + ifmgd->flags, + ifmgd->associated->bssid, &csa_ie); + if (res < 0) + ieee80211_queue_work(&local->hw, + &ifmgd->csa_connection_drop_work); + if (res) return; - } - - bss = (void *)cbss->priv; - new_freq = ieee80211_channel_to_frequency(new_chan_no, new_band); - new_chan = ieee80211_get_channel(sdata->local->hw.wiphy, new_freq); - if (!new_chan || new_chan->flags & IEEE80211_CHAN_DISABLED) { + if (!cfg80211_chandef_usable(local->hw.wiphy, &csa_ie.chandef, + IEEE80211_CHAN_DISABLED)) { sdata_info(sdata, - "AP %pM switches to unsupported channel (%d MHz), disconnecting\n", - ifmgd->associated->bssid, new_freq); + "AP %pM switches to unsupported channel (%d MHz, width:%d, CF1/2: %d/%d MHz), disconnecting\n", + ifmgd->associated->bssid, + csa_ie.chandef.chan->center_freq, + csa_ie.chandef.width, csa_ie.chandef.center_freq1, + csa_ie.chandef.center_freq2); ieee80211_queue_work(&local->hw, &ifmgd->csa_connection_drop_work); return; } - if (!beacon && sec_chan_offs) { - secondary_channel_offset = sec_chan_offs->sec_chan_offs; - } else if (beacon && ht_oper) { - secondary_channel_offset = - ht_oper->ht_param & IEEE80211_HT_PARAM_CHA_SEC_OFFSET; - } else if (!(ifmgd->flags & IEEE80211_STA_DISABLE_HT)) { - /* - * If it's not a beacon, HT is enabled and the IE not present, - * it's 20 MHz, 802.11-2012 8.5.2.6: - * This element [the Secondary Channel Offset Element] is - * present when switching to a 40 MHz channel. It may be - * present when switching to a 20 MHz channel (in which - * case the secondary channel offset is set to SCN). - */ - secondary_channel_offset = IEEE80211_HT_PARAM_CHA_SEC_NONE; - } - - switch (secondary_channel_offset) { - default: - /* secondary_channel_offset was present but is invalid */ - case IEEE80211_HT_PARAM_CHA_SEC_NONE: - cfg80211_chandef_create(&new_chandef, new_chan, - NL80211_CHAN_HT20); - break; - case IEEE80211_HT_PARAM_CHA_SEC_ABOVE: - cfg80211_chandef_create(&new_chandef, new_chan, - NL80211_CHAN_HT40PLUS); - break; - case IEEE80211_HT_PARAM_CHA_SEC_BELOW: - cfg80211_chandef_create(&new_chandef, new_chan, - NL80211_CHAN_HT40MINUS); - break; - case -1: - cfg80211_chandef_create(&new_chandef, new_chan, - NL80211_CHAN_NO_HT); - /* keep width for 5/10 MHz channels */ - switch (sdata->vif.bss_conf.chandef.width) { - case NL80211_CHAN_WIDTH_5: - case NL80211_CHAN_WIDTH_10: - new_chandef.width = sdata->vif.bss_conf.chandef.width; - break; - default: - break; - } - break; - } + ifmgd->flags |= IEEE80211_STA_CSA_RECEIVED; + sdata->vif.csa_active = true; - if (wide_bw_chansw_ie) { - new_vht_chandef.chan = new_chan; - new_vht_chandef.center_freq1 = - ieee80211_channel_to_frequency( - wide_bw_chansw_ie->new_center_freq_seg0, - new_band); + mutex_lock(&local->chanctx_mtx); + if (local->use_chanctx) { + u32 num_chanctx = 0; + list_for_each_entry(chanctx, &local->chanctx_list, list) + num_chanctx++; - switch (wide_bw_chansw_ie->new_channel_width) { - default: - /* hmmm, ignore VHT and use HT if present */ - case IEEE80211_VHT_CHANWIDTH_USE_HT: - new_vht_chandef.chan = NULL; - break; - case IEEE80211_VHT_CHANWIDTH_80MHZ: - new_vht_chandef.width = NL80211_CHAN_WIDTH_80; - break; - case IEEE80211_VHT_CHANWIDTH_160MHZ: - new_vht_chandef.width = NL80211_CHAN_WIDTH_160; - break; - case IEEE80211_VHT_CHANWIDTH_80P80MHZ: - /* field is otherwise reserved */ - new_vht_chandef.center_freq2 = - ieee80211_channel_to_frequency( - wide_bw_chansw_ie->new_center_freq_seg1, - new_band); - new_vht_chandef.width = NL80211_CHAN_WIDTH_80P80; - break; - } - if (ifmgd->flags & IEEE80211_STA_DISABLE_80P80MHZ && - new_vht_chandef.width == NL80211_CHAN_WIDTH_80P80) - chandef_downgrade(&new_vht_chandef); - if (ifmgd->flags & IEEE80211_STA_DISABLE_160MHZ && - new_vht_chandef.width == NL80211_CHAN_WIDTH_160) - chandef_downgrade(&new_vht_chandef); - if (ifmgd->flags & IEEE80211_STA_DISABLE_40MHZ && - new_vht_chandef.width > NL80211_CHAN_WIDTH_20) - chandef_downgrade(&new_vht_chandef); - } - - /* if VHT data is there validate & use it */ - if (new_vht_chandef.chan) { - if (!cfg80211_chandef_compatible(&new_vht_chandef, - &new_chandef)) { + if (num_chanctx > 1 || + !(local->hw.flags & IEEE80211_HW_CHANCTX_STA_CSA)) { sdata_info(sdata, - "AP %pM CSA has inconsistent channel data, disconnecting\n", - ifmgd->associated->bssid); + "not handling chan-switch with channel contexts\n"); ieee80211_queue_work(&local->hw, &ifmgd->csa_connection_drop_work); + mutex_unlock(&local->chanctx_mtx); return; } - new_chandef = new_vht_chandef; - } - - if (!cfg80211_chandef_usable(local->hw.wiphy, &new_chandef, - IEEE80211_CHAN_DISABLED)) { - sdata_info(sdata, - "AP %pM switches to unsupported channel (%d MHz, width:%d, CF1/2: %d/%d MHz), disconnecting\n", - ifmgd->associated->bssid, new_freq, - new_chandef.width, new_chandef.center_freq1, - new_chandef.center_freq2); - ieee80211_queue_work(&local->hw, - &ifmgd->csa_connection_drop_work); - return; } - ifmgd->flags |= IEEE80211_STA_CSA_RECEIVED; - - if (local->use_chanctx) { - sdata_info(sdata, - "not handling channel switch with channel contexts\n"); + if (WARN_ON(!rcu_access_pointer(sdata->vif.chanctx_conf))) { ieee80211_queue_work(&local->hw, &ifmgd->csa_connection_drop_work); - return; - } - - mutex_lock(&local->chanctx_mtx); - if (WARN_ON(!rcu_access_pointer(sdata->vif.chanctx_conf))) { mutex_unlock(&local->chanctx_mtx); return; } @@ -1217,9 +1035,9 @@ ieee80211_sta_process_chanswitch(struct ieee80211_sub_if_data *sdata, } mutex_unlock(&local->chanctx_mtx); - local->csa_chandef = new_chandef; + local->csa_chandef = csa_ie.chandef; - if (mode) + if (csa_ie.mode) ieee80211_stop_queues_by_reason(&local->hw, IEEE80211_MAX_QUEUE_MAP, IEEE80211_QUEUE_STOP_REASON_CSA); @@ -1228,9 +1046,9 @@ ieee80211_sta_process_chanswitch(struct ieee80211_sub_if_data *sdata, /* use driver's channel switch callback */ struct ieee80211_channel_switch ch_switch = { .timestamp = timestamp, - .block_tx = mode, - .chandef = new_chandef, - .count = count, + .block_tx = csa_ie.mode, + .chandef = csa_ie.chandef, + .count = csa_ie.count, }; drv_channel_switch(local, &ch_switch); @@ -1238,11 +1056,11 @@ ieee80211_sta_process_chanswitch(struct ieee80211_sub_if_data *sdata, } /* channel switch handled in software */ - if (count <= 1) + if (csa_ie.count <= 1) ieee80211_queue_work(&local->hw, &ifmgd->chswitch_work); else mod_timer(&ifmgd->chswitch_timer, - TU_TO_EXP_TIME(count * cbss->beacon_interval)); + TU_TO_EXP_TIME(csa_ie.count * cbss->beacon_interval)); } static u32 ieee80211_handle_pwr_constr(struct ieee80211_sub_if_data *sdata, @@ -1374,8 +1192,7 @@ static bool ieee80211_powersave_allowed(struct ieee80211_sub_if_data *sdata) if (!mgd->associated) return false; - if (mgd->flags & (IEEE80211_STA_BEACON_POLL | - IEEE80211_STA_CONNECTION_POLL)) + if (mgd->flags & IEEE80211_STA_CONNECTION_POLL) return false; if (!mgd->have_beacon) @@ -1691,8 +1508,7 @@ static void __ieee80211_stop_poll(struct ieee80211_sub_if_data *sdata) { lockdep_assert_held(&sdata->local->mtx); - sdata->u.mgd.flags &= ~(IEEE80211_STA_CONNECTION_POLL | - IEEE80211_STA_BEACON_POLL); + sdata->u.mgd.flags &= ~IEEE80211_STA_CONNECTION_POLL; ieee80211_run_deferred_scan(sdata->local); } @@ -1954,11 +1770,8 @@ static void ieee80211_reset_ap_probe(struct ieee80211_sub_if_data *sdata) struct ieee80211_local *local = sdata->local; mutex_lock(&local->mtx); - if (!(ifmgd->flags & (IEEE80211_STA_BEACON_POLL | - IEEE80211_STA_CONNECTION_POLL))) { - mutex_unlock(&local->mtx); - return; - } + if (!(ifmgd->flags & IEEE80211_STA_CONNECTION_POLL)) + goto out; __ieee80211_stop_poll(sdata); @@ -2094,15 +1907,9 @@ static void ieee80211_mgd_probe_ap(struct ieee80211_sub_if_data *sdata, * because otherwise we would reset the timer every time and * never check whether we received a probe response! */ - if (ifmgd->flags & (IEEE80211_STA_BEACON_POLL | - IEEE80211_STA_CONNECTION_POLL)) + if (ifmgd->flags & IEEE80211_STA_CONNECTION_POLL) already = true; - if (beacon) - ifmgd->flags |= IEEE80211_STA_BEACON_POLL; - else - ifmgd->flags |= IEEE80211_STA_CONNECTION_POLL; - mutex_unlock(&sdata->local->mtx); if (already) @@ -2174,6 +1981,7 @@ static void __ieee80211_disconnect(struct ieee80211_sub_if_data *sdata) WLAN_REASON_DISASSOC_DUE_TO_INACTIVITY, true, frame_buf); ifmgd->flags &= ~IEEE80211_STA_CSA_RECEIVED; + sdata->vif.csa_active = false; ieee80211_wake_queues_by_reason(&sdata->local->hw, IEEE80211_MAX_QUEUE_MAP, IEEE80211_QUEUE_STOP_REASON_CSA); @@ -2717,7 +2525,7 @@ static bool ieee80211_assoc_success(struct ieee80211_sub_if_data *sdata, */ ifmgd->wmm_last_param_set = -1; - if (elems.wmm_param) + if (!(ifmgd->flags & IEEE80211_STA_DISABLE_WMM) && elems.wmm_param) ieee80211_sta_wmm_params(local, sdata, elems.wmm_param, elems.wmm_param_len); else @@ -3061,17 +2869,10 @@ static void ieee80211_rx_mgmt_beacon(struct ieee80211_sub_if_data *sdata, } } - if (ifmgd->flags & IEEE80211_STA_BEACON_POLL) { + if (ifmgd->flags & IEEE80211_STA_CONNECTION_POLL) { mlme_dbg_ratelimited(sdata, "cancelling AP probe due to a received beacon\n"); - mutex_lock(&local->mtx); - ifmgd->flags &= ~IEEE80211_STA_BEACON_POLL; - ieee80211_run_deferred_scan(local); - mutex_unlock(&local->mtx); - - mutex_lock(&local->iflist_mtx); - ieee80211_recalc_ps(local, -1); - mutex_unlock(&local->iflist_mtx); + ieee80211_reset_ap_probe(sdata); } /* @@ -3152,7 +2953,8 @@ static void ieee80211_rx_mgmt_beacon(struct ieee80211_sub_if_data *sdata, ieee80211_sta_process_chanswitch(sdata, rx_status->mactime, &elems, true); - if (ieee80211_sta_wmm_params(local, sdata, elems.wmm_param, + if (!(ifmgd->flags & IEEE80211_STA_DISABLE_WMM) && + ieee80211_sta_wmm_params(local, sdata, elems.wmm_param, elems.wmm_param_len)) changed |= BSS_CHANGED_QOS; @@ -3543,8 +3345,7 @@ void ieee80211_sta_work(struct ieee80211_sub_if_data *sdata) } else if (ifmgd->assoc_data && ifmgd->assoc_data->timeout_started) run_again(sdata, ifmgd->assoc_data->timeout); - if (ifmgd->flags & (IEEE80211_STA_BEACON_POLL | - IEEE80211_STA_CONNECTION_POLL) && + if (ifmgd->flags & IEEE80211_STA_CONNECTION_POLL && ifmgd->associated) { u8 bssid[ETH_ALEN]; int max_tries; @@ -3697,7 +3498,7 @@ void ieee80211_sta_setup_sdata(struct ieee80211_sub_if_data *sdata) ieee80211_beacon_connection_loss_work); INIT_WORK(&ifmgd->csa_connection_drop_work, ieee80211_csa_connection_drop_work); - INIT_WORK(&ifmgd->request_smps_work, ieee80211_request_smps_work); + INIT_WORK(&ifmgd->request_smps_work, ieee80211_request_smps_mgd_work); setup_timer(&ifmgd->timer, ieee80211_sta_timer, (unsigned long) sdata); setup_timer(&ifmgd->bcn_mon_timer, ieee80211_sta_bcn_mon_timer, @@ -3876,7 +3677,7 @@ static int ieee80211_prep_channel(struct ieee80211_sub_if_data *sdata, return ret; while (ret && chandef.width != NL80211_CHAN_WIDTH_20_NOHT) { - ifmgd->flags |= chandef_downgrade(&chandef); + ifmgd->flags |= ieee80211_chandef_downgrade(&chandef); ret = ieee80211_vif_use_channel(sdata, &chandef, IEEE80211_CHANCTX_SHARED); } @@ -4135,6 +3936,44 @@ int ieee80211_mgd_auth(struct ieee80211_sub_if_data *sdata, return err; } +static bool ieee80211_usable_wmm_params(struct ieee80211_sub_if_data *sdata, + const u8 *wmm_param, int len) +{ + const u8 *pos; + size_t left; + + if (len < 8) + return false; + + if (wmm_param[5] != 1 /* version */) + return false; + + pos = wmm_param + 8; + left = len - 8; + + for (; left >= 4; left -= 4, pos += 4) { + u8 aifsn = pos[0] & 0x0f; + u8 ecwmin = pos[1] & 0x0f; + u8 ecwmax = (pos[1] & 0xf0) >> 4; + int aci = (pos[0] >> 5) & 0x03; + + if (aifsn < 2) { + sdata_info(sdata, + "AP has invalid WMM params (AIFSN=%d for ACI %d), disabling WMM\n", + aifsn, aci); + return false; + } + if (ecwmin > ecwmax) { + sdata_info(sdata, + "AP has invalid WMM params (ECWmin/max=%d/%d for ACI %d), disabling WMM\n", + ecwmin, ecwmax, aci); + return false; + } + } + + return true; +} + int ieee80211_mgd_assoc(struct ieee80211_sub_if_data *sdata, struct cfg80211_assoc_request *req) { @@ -4192,9 +4031,45 @@ int ieee80211_mgd_assoc(struct ieee80211_sub_if_data *sdata, } /* prepare assoc data */ - + ifmgd->beacon_crc_valid = false; + assoc_data->wmm = bss->wmm_used && + (local->hw.queues >= IEEE80211_NUM_ACS); + if (assoc_data->wmm) { + /* try to check validity of WMM params IE */ + const struct cfg80211_bss_ies *ies; + const u8 *wp, *start, *end; + + rcu_read_lock(); + ies = rcu_dereference(req->bss->ies); + start = ies->data; + end = start + ies->len; + + while (true) { + wp = cfg80211_find_vendor_ie( + WLAN_OUI_MICROSOFT, + WLAN_OUI_TYPE_MICROSOFT_WMM, + start, end - start); + if (!wp) + break; + start = wp + wp[1] + 2; + /* if this IE is too short, try the next */ + if (wp[1] <= 4) + continue; + /* if this IE is WMM params, we found what we wanted */ + if (wp[6] == 1) + break; + } + + if (!wp || !ieee80211_usable_wmm_params(sdata, wp + 2, + wp[1] - 2)) { + assoc_data->wmm = false; + ifmgd->flags |= IEEE80211_STA_DISABLE_WMM; + } + rcu_read_unlock(); + } + /* * IEEE802.11n does not allow TKIP/WEP as pairwise ciphers in HT mode. * We still associate in non-HT mode (11a/b/g) if any one of these @@ -4224,18 +4099,22 @@ int ieee80211_mgd_assoc(struct ieee80211_sub_if_data *sdata, /* Also disable HT if we don't support it or the AP doesn't use WMM */ sband = local->hw.wiphy->bands[req->bss->channel->band]; if (!sband->ht_cap.ht_supported || - local->hw.queues < IEEE80211_NUM_ACS || !bss->wmm_used) { + local->hw.queues < IEEE80211_NUM_ACS || !bss->wmm_used || + ifmgd->flags & IEEE80211_STA_DISABLE_WMM) { ifmgd->flags |= IEEE80211_STA_DISABLE_HT; - if (!bss->wmm_used) + if (!bss->wmm_used && + !(ifmgd->flags & IEEE80211_STA_DISABLE_WMM)) netdev_info(sdata->dev, "disabling HT as WMM/QoS is not supported by the AP\n"); } /* disable VHT if we don't support it or the AP doesn't use WMM */ if (!sband->vht_cap.vht_supported || - local->hw.queues < IEEE80211_NUM_ACS || !bss->wmm_used) { + local->hw.queues < IEEE80211_NUM_ACS || !bss->wmm_used || + ifmgd->flags & IEEE80211_STA_DISABLE_WMM) { ifmgd->flags |= IEEE80211_STA_DISABLE_VHT; - if (!bss->wmm_used) + if (!bss->wmm_used && + !(ifmgd->flags & IEEE80211_STA_DISABLE_WMM)) netdev_info(sdata->dev, "disabling VHT as WMM/QoS is not supported by the AP\n"); } @@ -4264,8 +4143,6 @@ int ieee80211_mgd_assoc(struct ieee80211_sub_if_data *sdata, sdata->smps_mode = ifmgd->req_smps; assoc_data->capability = req->bss->capability; - assoc_data->wmm = bss->wmm_used && - (local->hw.queues >= IEEE80211_NUM_ACS); assoc_data->supp_rates = bss->supp_rates; assoc_data->supp_rates_len = bss->supp_rates_len; diff --git a/net/mac80211/rate.c b/net/mac80211/rate.c index e126605cec66..22b223f13c9f 100644 --- a/net/mac80211/rate.c +++ b/net/mac80211/rate.c @@ -235,7 +235,8 @@ static void rc_send_low_basicrate(s8 *idx, u32 basic_rates, static void __rate_control_send_low(struct ieee80211_hw *hw, struct ieee80211_supported_band *sband, struct ieee80211_sta *sta, - struct ieee80211_tx_info *info) + struct ieee80211_tx_info *info, + u32 rate_mask) { int i; u32 rate_flags = @@ -247,6 +248,12 @@ static void __rate_control_send_low(struct ieee80211_hw *hw, info->control.rates[0].idx = 0; for (i = 0; i < sband->n_bitrates; i++) { + if (!(rate_mask & BIT(i))) + continue; + + if ((rate_flags & sband->bitrates[i].flags) != rate_flags) + continue; + if (!rate_supported(sta, sband->band, i)) continue; @@ -274,7 +281,8 @@ bool rate_control_send_low(struct ieee80211_sta *pubsta, bool use_basicrate = false; if (!pubsta || !priv_sta || rc_no_data_or_no_ack_use_min(txrc)) { - __rate_control_send_low(txrc->hw, sband, pubsta, info); + __rate_control_send_low(txrc->hw, sband, pubsta, info, + txrc->rate_idx_mask); if (!pubsta && txrc->bss) { mcast_rate = txrc->bss_conf->mcast_rate[sband->band]; @@ -656,7 +664,8 @@ void ieee80211_get_tx_rates(struct ieee80211_vif *vif, rate_control_apply_mask(sdata, sta, sband, info, dest, max_rates); if (dest[0].idx < 0) - __rate_control_send_low(&sdata->local->hw, sband, sta, info); + __rate_control_send_low(&sdata->local->hw, sband, sta, info, + sdata->rc_rateidx_mask[info->band]); if (sta) rate_fixup_ratelist(vif, sband, info, dest, max_rates); diff --git a/net/mac80211/rate.h b/net/mac80211/rate.h index 5dedc56c94db..505bc0dea074 100644 --- a/net/mac80211/rate.h +++ b/net/mac80211/rate.h @@ -144,8 +144,8 @@ void rate_control_deinitialize(struct ieee80211_local *local); /* Rate control algorithms */ #ifdef CONFIG_MAC80211_RC_PID -extern int rc80211_pid_init(void); -extern void rc80211_pid_exit(void); +int rc80211_pid_init(void); +void rc80211_pid_exit(void); #else static inline int rc80211_pid_init(void) { @@ -157,8 +157,8 @@ static inline void rc80211_pid_exit(void) #endif #ifdef CONFIG_MAC80211_RC_MINSTREL -extern int rc80211_minstrel_init(void); -extern void rc80211_minstrel_exit(void); +int rc80211_minstrel_init(void); +void rc80211_minstrel_exit(void); #else static inline int rc80211_minstrel_init(void) { @@ -170,8 +170,8 @@ static inline void rc80211_minstrel_exit(void) #endif #ifdef CONFIG_MAC80211_RC_MINSTREL_HT -extern int rc80211_minstrel_ht_init(void); -extern void rc80211_minstrel_ht_exit(void); +int rc80211_minstrel_ht_init(void); +void rc80211_minstrel_ht_exit(void); #else static inline int rc80211_minstrel_ht_init(void) { diff --git a/net/mac80211/rc80211_minstrel.c b/net/mac80211/rc80211_minstrel.c index 8b5f7ef7c0c9..7fa1b36e6202 100644 --- a/net/mac80211/rc80211_minstrel.c +++ b/net/mac80211/rc80211_minstrel.c @@ -203,6 +203,15 @@ minstrel_update_stats(struct minstrel_priv *mp, struct minstrel_sta_info *mi) memcpy(mi->max_tp_rate, tmp_tp_rate, sizeof(mi->max_tp_rate)); mi->max_prob_rate = tmp_prob_rate; +#ifdef CONFIG_MAC80211_DEBUGFS + /* use fixed index if set */ + if (mp->fixed_rate_idx != -1) { + mi->max_tp_rate[0] = mp->fixed_rate_idx; + mi->max_tp_rate[1] = mp->fixed_rate_idx; + mi->max_prob_rate = mp->fixed_rate_idx; + } +#endif + /* Reset update timer */ mi->stats_update = jiffies; @@ -310,6 +319,11 @@ minstrel_get_rate(void *priv, struct ieee80211_sta *sta, /* increase sum packet counter */ mi->packet_count++; +#ifdef CONFIG_MAC80211_DEBUGFS + if (mp->fixed_rate_idx != -1) + return; +#endif + delta = (mi->packet_count * sampling_ratio / 100) - (mi->sample_count + mi->sample_deferred / 2); diff --git a/net/mac80211/rc80211_minstrel_ht.c b/net/mac80211/rc80211_minstrel_ht.c index 7c323f27ba23..5d60779a0c1b 100644 --- a/net/mac80211/rc80211_minstrel_ht.c +++ b/net/mac80211/rc80211_minstrel_ht.c @@ -365,6 +365,14 @@ minstrel_ht_update_stats(struct minstrel_priv *mp, struct minstrel_ht_sta *mi) } } +#ifdef CONFIG_MAC80211_DEBUGFS + /* use fixed index if set */ + if (mp->fixed_rate_idx != -1) { + mi->max_tp_rate = mp->fixed_rate_idx; + mi->max_tp_rate2 = mp->fixed_rate_idx; + mi->max_prob_rate = mp->fixed_rate_idx; + } +#endif mi->stats_update = jiffies; } @@ -774,6 +782,11 @@ minstrel_ht_get_rate(void *priv, struct ieee80211_sta *sta, void *priv_sta, info->flags |= mi->tx_flags; minstrel_ht_check_cck_shortpreamble(mp, mi, txrc->short_preamble); +#ifdef CONFIG_MAC80211_DEBUGFS + if (mp->fixed_rate_idx != -1) + return; +#endif + /* Don't use EAPOL frames for sampling on non-mrr hw */ if (mp->hw->max_rates == 1 && (info->control.flags & IEEE80211_TX_CTRL_PORT_CTRL_PROTO)) @@ -781,16 +794,6 @@ minstrel_ht_get_rate(void *priv, struct ieee80211_sta *sta, void *priv_sta, else sample_idx = minstrel_get_sample_rate(mp, mi); -#ifdef CONFIG_MAC80211_DEBUGFS - /* use fixed index if set */ - if (mp->fixed_rate_idx != -1) { - mi->max_tp_rate = mp->fixed_rate_idx; - mi->max_tp_rate2 = mp->fixed_rate_idx; - mi->max_prob_rate = mp->fixed_rate_idx; - sample_idx = -1; - } -#endif - mi->total_packets++; /* wraparound */ diff --git a/net/mac80211/rc80211_pid_debugfs.c b/net/mac80211/rc80211_pid_debugfs.c index c97a0657c043..6ff134650a84 100644 --- a/net/mac80211/rc80211_pid_debugfs.c +++ b/net/mac80211/rc80211_pid_debugfs.c @@ -167,29 +167,29 @@ static ssize_t rate_control_pid_events_read(struct file *file, char __user *buf, * provide large enough buffers. */ length = length < RC_PID_PRINT_BUF_SIZE ? length : RC_PID_PRINT_BUF_SIZE; - p = snprintf(pb, length, "%u %lu ", ev->id, ev->timestamp); + p = scnprintf(pb, length, "%u %lu ", ev->id, ev->timestamp); switch (ev->type) { case RC_PID_EVENT_TYPE_TX_STATUS: - p += snprintf(pb + p, length - p, "tx_status %u %u", - !(ev->data.flags & IEEE80211_TX_STAT_ACK), - ev->data.tx_status.status.rates[0].idx); + p += scnprintf(pb + p, length - p, "tx_status %u %u", + !(ev->data.flags & IEEE80211_TX_STAT_ACK), + ev->data.tx_status.status.rates[0].idx); break; case RC_PID_EVENT_TYPE_RATE_CHANGE: - p += snprintf(pb + p, length - p, "rate_change %d %d", - ev->data.index, ev->data.rate); + p += scnprintf(pb + p, length - p, "rate_change %d %d", + ev->data.index, ev->data.rate); break; case RC_PID_EVENT_TYPE_TX_RATE: - p += snprintf(pb + p, length - p, "tx_rate %d %d", - ev->data.index, ev->data.rate); + p += scnprintf(pb + p, length - p, "tx_rate %d %d", + ev->data.index, ev->data.rate); break; case RC_PID_EVENT_TYPE_PF_SAMPLE: - p += snprintf(pb + p, length - p, - "pf_sample %d %d %d %d", - ev->data.pf_sample, ev->data.prop_err, - ev->data.int_err, ev->data.der_err); + p += scnprintf(pb + p, length - p, + "pf_sample %d %d %d %d", + ev->data.pf_sample, ev->data.prop_err, + ev->data.int_err, ev->data.der_err); break; } - p += snprintf(pb + p, length - p, "\n"); + p += scnprintf(pb + p, length - p, "\n"); spin_unlock_irqrestore(&events->lock, status); diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c index 674eac1f996c..caecef870c0e 100644 --- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -995,8 +995,9 @@ ieee80211_rx_h_check(struct ieee80211_rx_data *rx) rx->sta->num_duplicates++; } return RX_DROP_UNUSABLE; - } else + } else if (!(status->flag & RX_FLAG_AMSDU_MORE)) { rx->sta->last_seq_ctrl[rx->seqno_idx] = hdr->seq_ctrl; + } } if (unlikely(rx->skb->len < 16)) { @@ -2402,7 +2403,8 @@ ieee80211_rx_h_action(struct ieee80211_rx_data *rx) return RX_DROP_UNUSABLE; if (!rx->sta && mgmt->u.action.category != WLAN_CATEGORY_PUBLIC && - mgmt->u.action.category != WLAN_CATEGORY_SELF_PROTECTED) + mgmt->u.action.category != WLAN_CATEGORY_SELF_PROTECTED && + mgmt->u.action.category != WLAN_CATEGORY_SPECTRUM_MGMT) return RX_DROP_UNUSABLE; if (!(status->rx_flags & IEEE80211_RX_RA_MATCH)) @@ -2566,31 +2568,49 @@ ieee80211_rx_h_action(struct ieee80211_rx_data *rx) goto queue; case WLAN_CATEGORY_SPECTRUM_MGMT: - if (status->band != IEEE80211_BAND_5GHZ) - break; - - if (sdata->vif.type != NL80211_IFTYPE_STATION) - break; - /* verify action_code is present */ if (len < IEEE80211_MIN_ACTION_SIZE + 1) break; switch (mgmt->u.action.u.measurement.action_code) { case WLAN_ACTION_SPCT_MSR_REQ: + if (status->band != IEEE80211_BAND_5GHZ) + break; + if (len < (IEEE80211_MIN_ACTION_SIZE + sizeof(mgmt->u.action.u.measurement))) break; + + if (sdata->vif.type != NL80211_IFTYPE_STATION) + break; + ieee80211_process_measurement_req(sdata, mgmt, len); goto handled; - case WLAN_ACTION_SPCT_CHL_SWITCH: - if (sdata->vif.type != NL80211_IFTYPE_STATION) + case WLAN_ACTION_SPCT_CHL_SWITCH: { + u8 *bssid; + if (len < (IEEE80211_MIN_ACTION_SIZE + + sizeof(mgmt->u.action.u.chan_switch))) + break; + + if (sdata->vif.type != NL80211_IFTYPE_STATION && + sdata->vif.type != NL80211_IFTYPE_ADHOC && + sdata->vif.type != NL80211_IFTYPE_MESH_POINT) break; - if (!ether_addr_equal(mgmt->bssid, sdata->u.mgd.bssid)) + if (sdata->vif.type == NL80211_IFTYPE_STATION) + bssid = sdata->u.mgd.bssid; + else if (sdata->vif.type == NL80211_IFTYPE_ADHOC) + bssid = sdata->u.ibss.bssid; + else if (sdata->vif.type == NL80211_IFTYPE_MESH_POINT) + bssid = mgmt->sa; + else + break; + + if (!ether_addr_equal(mgmt->bssid, bssid)) break; goto queue; + } } break; case WLAN_CATEGORY_SA_QUERY: diff --git a/net/mac80211/scan.c b/net/mac80211/scan.c index d2d17a449224..5ad66a83ef7f 100644 --- a/net/mac80211/scan.c +++ b/net/mac80211/scan.c @@ -394,8 +394,7 @@ static bool ieee80211_can_scan(struct ieee80211_local *local, return false; if (sdata->vif.type == NL80211_IFTYPE_STATION && - sdata->u.mgd.flags & (IEEE80211_STA_BEACON_POLL | - IEEE80211_STA_CONNECTION_POLL)) + sdata->u.mgd.flags & IEEE80211_STA_CONNECTION_POLL) return false; return true; diff --git a/net/mac80211/spectmgmt.c b/net/mac80211/spectmgmt.c index 578eea3fc04d..a40da20b32e0 100644 --- a/net/mac80211/spectmgmt.c +++ b/net/mac80211/spectmgmt.c @@ -21,6 +21,175 @@ #include "sta_info.h" #include "wme.h" +int ieee80211_parse_ch_switch_ie(struct ieee80211_sub_if_data *sdata, + struct ieee802_11_elems *elems, bool beacon, + enum ieee80211_band current_band, + u32 sta_flags, u8 *bssid, + struct ieee80211_csa_ie *csa_ie) +{ + enum ieee80211_band new_band; + int new_freq; + u8 new_chan_no; + struct ieee80211_channel *new_chan; + struct cfg80211_chan_def new_vht_chandef = {}; + const struct ieee80211_sec_chan_offs_ie *sec_chan_offs; + const struct ieee80211_wide_bw_chansw_ie *wide_bw_chansw_ie; + const struct ieee80211_ht_operation *ht_oper; + int secondary_channel_offset = -1; + + sec_chan_offs = elems->sec_chan_offs; + wide_bw_chansw_ie = elems->wide_bw_chansw_ie; + ht_oper = elems->ht_operation; + + if (sta_flags & (IEEE80211_STA_DISABLE_HT | + IEEE80211_STA_DISABLE_40MHZ)) { + sec_chan_offs = NULL; + wide_bw_chansw_ie = NULL; + /* only used for bandwidth here */ + ht_oper = NULL; + } + + if (sta_flags & IEEE80211_STA_DISABLE_VHT) + wide_bw_chansw_ie = NULL; + + if (elems->ext_chansw_ie) { + if (!ieee80211_operating_class_to_band( + elems->ext_chansw_ie->new_operating_class, + &new_band)) { + sdata_info(sdata, + "cannot understand ECSA IE operating class %d, disconnecting\n", + elems->ext_chansw_ie->new_operating_class); + return -EINVAL; + } + new_chan_no = elems->ext_chansw_ie->new_ch_num; + csa_ie->count = elems->ext_chansw_ie->count; + csa_ie->mode = elems->ext_chansw_ie->mode; + } else if (elems->ch_switch_ie) { + new_band = current_band; + new_chan_no = elems->ch_switch_ie->new_ch_num; + csa_ie->count = elems->ch_switch_ie->count; + csa_ie->mode = elems->ch_switch_ie->mode; + } else { + /* nothing here we understand */ + return 1; + } + + /* Mesh Channel Switch Parameters Element */ + if (elems->mesh_chansw_params_ie) { + csa_ie->ttl = elems->mesh_chansw_params_ie->mesh_ttl; + csa_ie->mode = elems->mesh_chansw_params_ie->mesh_flags; + } + + new_freq = ieee80211_channel_to_frequency(new_chan_no, new_band); + new_chan = ieee80211_get_channel(sdata->local->hw.wiphy, new_freq); + if (!new_chan || new_chan->flags & IEEE80211_CHAN_DISABLED) { + sdata_info(sdata, + "BSS %pM switches to unsupported channel (%d MHz), disconnecting\n", + bssid, new_freq); + return -EINVAL; + } + + if (!beacon && sec_chan_offs) { + secondary_channel_offset = sec_chan_offs->sec_chan_offs; + } else if (beacon && ht_oper) { + secondary_channel_offset = + ht_oper->ht_param & IEEE80211_HT_PARAM_CHA_SEC_OFFSET; + } else if (!(sta_flags & IEEE80211_STA_DISABLE_HT)) { + /* If it's not a beacon, HT is enabled and the IE not present, + * it's 20 MHz, 802.11-2012 8.5.2.6: + * This element [the Secondary Channel Offset Element] is + * present when switching to a 40 MHz channel. It may be + * present when switching to a 20 MHz channel (in which + * case the secondary channel offset is set to SCN). + */ + secondary_channel_offset = IEEE80211_HT_PARAM_CHA_SEC_NONE; + } + + switch (secondary_channel_offset) { + default: + /* secondary_channel_offset was present but is invalid */ + case IEEE80211_HT_PARAM_CHA_SEC_NONE: + cfg80211_chandef_create(&csa_ie->chandef, new_chan, + NL80211_CHAN_HT20); + break; + case IEEE80211_HT_PARAM_CHA_SEC_ABOVE: + cfg80211_chandef_create(&csa_ie->chandef, new_chan, + NL80211_CHAN_HT40PLUS); + break; + case IEEE80211_HT_PARAM_CHA_SEC_BELOW: + cfg80211_chandef_create(&csa_ie->chandef, new_chan, + NL80211_CHAN_HT40MINUS); + break; + case -1: + cfg80211_chandef_create(&csa_ie->chandef, new_chan, + NL80211_CHAN_NO_HT); + /* keep width for 5/10 MHz channels */ + switch (sdata->vif.bss_conf.chandef.width) { + case NL80211_CHAN_WIDTH_5: + case NL80211_CHAN_WIDTH_10: + csa_ie->chandef.width = + sdata->vif.bss_conf.chandef.width; + break; + default: + break; + } + break; + } + + if (wide_bw_chansw_ie) { + new_vht_chandef.chan = new_chan; + new_vht_chandef.center_freq1 = + ieee80211_channel_to_frequency( + wide_bw_chansw_ie->new_center_freq_seg0, + new_band); + + switch (wide_bw_chansw_ie->new_channel_width) { + default: + /* hmmm, ignore VHT and use HT if present */ + case IEEE80211_VHT_CHANWIDTH_USE_HT: + new_vht_chandef.chan = NULL; + break; + case IEEE80211_VHT_CHANWIDTH_80MHZ: + new_vht_chandef.width = NL80211_CHAN_WIDTH_80; + break; + case IEEE80211_VHT_CHANWIDTH_160MHZ: + new_vht_chandef.width = NL80211_CHAN_WIDTH_160; + break; + case IEEE80211_VHT_CHANWIDTH_80P80MHZ: + /* field is otherwise reserved */ + new_vht_chandef.center_freq2 = + ieee80211_channel_to_frequency( + wide_bw_chansw_ie->new_center_freq_seg1, + new_band); + new_vht_chandef.width = NL80211_CHAN_WIDTH_80P80; + break; + } + if (sta_flags & IEEE80211_STA_DISABLE_80P80MHZ && + new_vht_chandef.width == NL80211_CHAN_WIDTH_80P80) + ieee80211_chandef_downgrade(&new_vht_chandef); + if (sta_flags & IEEE80211_STA_DISABLE_160MHZ && + new_vht_chandef.width == NL80211_CHAN_WIDTH_160) + ieee80211_chandef_downgrade(&new_vht_chandef); + if (sta_flags & IEEE80211_STA_DISABLE_40MHZ && + new_vht_chandef.width > NL80211_CHAN_WIDTH_20) + ieee80211_chandef_downgrade(&new_vht_chandef); + } + + /* if VHT data is there validate & use it */ + if (new_vht_chandef.chan) { + if (!cfg80211_chandef_compatible(&new_vht_chandef, + &csa_ie->chandef)) { + sdata_info(sdata, + "BSS %pM: CSA has inconsistent channel data, disconnecting\n", + bssid); + return -EINVAL; + } + csa_ie->chandef = new_vht_chandef; + } + + return 0; +} + static void ieee80211_send_refuse_measurement_request(struct ieee80211_sub_if_data *sdata, struct ieee80211_msrment_ie *request_ie, const u8 *da, const u8 *bssid, diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c index aeb967a0aeed..1eb66e26e49d 100644 --- a/net/mac80211/sta_info.c +++ b/net/mac80211/sta_info.c @@ -385,6 +385,30 @@ struct sta_info *sta_info_alloc(struct ieee80211_sub_if_data *sdata, sta->last_seq_ctrl[i] = cpu_to_le16(USHRT_MAX); sta->sta.smps_mode = IEEE80211_SMPS_OFF; + if (sdata->vif.type == NL80211_IFTYPE_AP || + sdata->vif.type == NL80211_IFTYPE_AP_VLAN) { + struct ieee80211_supported_band *sband = + local->hw.wiphy->bands[ieee80211_get_sdata_band(sdata)]; + u8 smps = (sband->ht_cap.cap & IEEE80211_HT_CAP_SM_PS) >> + IEEE80211_HT_CAP_SM_PS_SHIFT; + /* + * Assume that hostapd advertises our caps in the beacon and + * this is the known_smps_mode for a station that just assciated + */ + switch (smps) { + case WLAN_HT_SMPS_CONTROL_DISABLED: + sta->known_smps_mode = IEEE80211_SMPS_OFF; + break; + case WLAN_HT_SMPS_CONTROL_STATIC: + sta->known_smps_mode = IEEE80211_SMPS_STATIC; + break; + case WLAN_HT_SMPS_CONTROL_DYNAMIC: + sta->known_smps_mode = IEEE80211_SMPS_DYNAMIC; + break; + default: + WARN_ON(1); + } + } sta_dbg(sdata, "Allocated STA %pM\n", sta->sta.addr); @@ -1069,6 +1093,19 @@ void ieee80211_sta_ps_deliver_wakeup(struct sta_info *sta) ieee80211_add_pending_skbs_fn(local, &pending, clear_sta_ps_flags, sta); + /* This station just woke up and isn't aware of our SMPS state */ + if (!ieee80211_smps_is_restrictive(sta->known_smps_mode, + sdata->smps_mode) && + sta->known_smps_mode != sdata->bss->req_smps && + sta_info_tx_streams(sta) != 1) { + ht_dbg(sdata, + "%pM just woke up and MIMO capable - update SMPS\n", + sta->sta.addr); + ieee80211_send_smps_action(sdata, sdata->bss->req_smps, + sta->sta.addr, + sdata->vif.bss_conf.bssid); + } + local->total_ps_buffered -= buffered; sta_info_recalc_tim(sta); @@ -1520,3 +1557,38 @@ int sta_info_move_state(struct sta_info *sta, return 0; } + +u8 sta_info_tx_streams(struct sta_info *sta) +{ + struct ieee80211_sta_ht_cap *ht_cap = &sta->sta.ht_cap; + u8 rx_streams; + + if (!sta->sta.ht_cap.ht_supported) + return 1; + + if (sta->sta.vht_cap.vht_supported) { + int i; + u16 tx_mcs_map = + le16_to_cpu(sta->sta.vht_cap.vht_mcs.tx_mcs_map); + + for (i = 7; i >= 0; i--) + if ((tx_mcs_map & (0x3 << (i * 2))) != + IEEE80211_VHT_MCS_NOT_SUPPORTED) + return i + 1; + } + + if (ht_cap->mcs.rx_mask[3]) + rx_streams = 4; + else if (ht_cap->mcs.rx_mask[2]) + rx_streams = 3; + else if (ht_cap->mcs.rx_mask[1]) + rx_streams = 2; + else + rx_streams = 1; + + if (!(ht_cap->mcs.tx_params & IEEE80211_HT_MCS_TX_RX_DIFF)) + return rx_streams; + + return ((ht_cap->mcs.tx_params & IEEE80211_HT_MCS_TX_MAX_STREAMS_MASK) + >> IEEE80211_HT_MCS_TX_MAX_STREAMS_SHIFT) + 1; +} diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h index 4208dbd5861f..3ef06a26b9cb 100644 --- a/net/mac80211/sta_info.h +++ b/net/mac80211/sta_info.h @@ -301,6 +301,8 @@ struct sta_ampdu_mlme { * @chains: chains ever used for RX from this station * @chain_signal_last: last signal (per chain) * @chain_signal_avg: signal average (per chain) + * @known_smps_mode: the smps_mode the client thinks we are in. Relevant for + * AP only. */ struct sta_info { /* General information, mostly static */ @@ -411,6 +413,8 @@ struct sta_info { unsigned int lost_packets; unsigned int beacon_loss_count; + enum ieee80211_smps_mode known_smps_mode; + /* keep last! */ struct ieee80211_sta sta; }; @@ -613,6 +617,7 @@ void sta_set_rate_info_rx(struct sta_info *sta, struct rate_info *rinfo); void ieee80211_sta_expire(struct ieee80211_sub_if_data *sdata, unsigned long exp_time); +u8 sta_info_tx_streams(struct sta_info *sta); void ieee80211_sta_ps_deliver_wakeup(struct sta_info *sta); void ieee80211_sta_ps_deliver_poll_response(struct sta_info *sta); diff --git a/net/mac80211/status.c b/net/mac80211/status.c index 78dc2e99027e..52a152b01b06 100644 --- a/net/mac80211/status.c +++ b/net/mac80211/status.c @@ -194,29 +194,36 @@ static void ieee80211_frame_acked(struct sta_info *sta, struct sk_buff *skb) if (ieee80211_is_action(mgmt->frame_control) && mgmt->u.action.category == WLAN_CATEGORY_HT && mgmt->u.action.u.ht_smps.action == WLAN_HT_ACTION_SMPS && - sdata->vif.type == NL80211_IFTYPE_STATION && ieee80211_sdata_running(sdata)) { - /* - * This update looks racy, but isn't -- if we come - * here we've definitely got a station that we're - * talking to, and on a managed interface that can - * only be the AP. And the only other place updating - * this variable in managed mode is before association. - */ + enum ieee80211_smps_mode smps_mode; + switch (mgmt->u.action.u.ht_smps.smps_control) { case WLAN_HT_SMPS_CONTROL_DYNAMIC: - sdata->smps_mode = IEEE80211_SMPS_DYNAMIC; + smps_mode = IEEE80211_SMPS_DYNAMIC; break; case WLAN_HT_SMPS_CONTROL_STATIC: - sdata->smps_mode = IEEE80211_SMPS_STATIC; + smps_mode = IEEE80211_SMPS_STATIC; break; case WLAN_HT_SMPS_CONTROL_DISABLED: default: /* shouldn't happen since we don't send that */ - sdata->smps_mode = IEEE80211_SMPS_OFF; + smps_mode = IEEE80211_SMPS_OFF; break; } - ieee80211_queue_work(&local->hw, &sdata->recalc_smps); + if (sdata->vif.type == NL80211_IFTYPE_STATION) { + /* + * This update looks racy, but isn't -- if we come + * here we've definitely got a station that we're + * talking to, and on a managed interface that can + * only be the AP. And the only other place updating + * this variable in managed mode is before association. + */ + sdata->smps_mode = smps_mode; + ieee80211_queue_work(&local->hw, &sdata->recalc_smps); + } else if (sdata->vif.type == NL80211_IFTYPE_AP || + sdata->vif.type == NL80211_IFTYPE_AP_VLAN) { + sta->known_smps_mode = smps_mode; + } } } diff --git a/net/mac80211/trace.h b/net/mac80211/trace.h index 1aba645882bd..d4cee98533fd 100644 --- a/net/mac80211/trace.h +++ b/net/mac80211/trace.h @@ -77,13 +77,13 @@ DECLARE_EVENT_CLASS(local_sdata_addr_evt, TP_STRUCT__entry( LOCAL_ENTRY VIF_ENTRY - __array(char, addr, 6) + __array(char, addr, ETH_ALEN) ), TP_fast_assign( LOCAL_ASSIGN; VIF_ASSIGN; - memcpy(__entry->addr, sdata->vif.addr, 6); + memcpy(__entry->addr, sdata->vif.addr, ETH_ALEN); ), TP_printk( @@ -1475,6 +1475,41 @@ DEFINE_EVENT(local_sdata_evt, drv_ipv6_addr_change, ); #endif +TRACE_EVENT(drv_join_ibss, + TP_PROTO(struct ieee80211_local *local, + struct ieee80211_sub_if_data *sdata, + struct ieee80211_bss_conf *info), + + TP_ARGS(local, sdata, info), + + TP_STRUCT__entry( + LOCAL_ENTRY + VIF_ENTRY + __field(u8, dtimper) + __field(u16, bcnint) + __dynamic_array(u8, ssid, info->ssid_len); + ), + + TP_fast_assign( + LOCAL_ASSIGN; + VIF_ASSIGN; + __entry->dtimper = info->dtim_period; + __entry->bcnint = info->beacon_int; + memcpy(__get_dynamic_array(ssid), info->ssid, info->ssid_len); + ), + + TP_printk( + LOCAL_PR_FMT VIF_PR_FMT, + LOCAL_PR_ARG, VIF_PR_ARG + ) +); + +DEFINE_EVENT(local_sdata_evt, drv_leave_ibss, + TP_PROTO(struct ieee80211_local *local, + struct ieee80211_sub_if_data *sdata), + TP_ARGS(local, sdata) +); + /* * Tracing for API calls that drivers call. */ diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c index 70b5a05c0a4e..c558b246ef00 100644 --- a/net/mac80211/tx.c +++ b/net/mac80211/tx.c @@ -1367,6 +1367,35 @@ static int invoke_tx_handlers(struct ieee80211_tx_data *tx) return 0; } +bool ieee80211_tx_prepare_skb(struct ieee80211_hw *hw, + struct ieee80211_vif *vif, struct sk_buff *skb, + int band, struct ieee80211_sta **sta) +{ + struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif); + struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb); + struct ieee80211_tx_data tx; + + if (ieee80211_tx_prepare(sdata, &tx, skb) == TX_DROP) + return false; + + info->band = band; + info->control.vif = vif; + info->hw_queue = vif->hw_queue[skb_get_queue_mapping(skb)]; + + if (invoke_tx_handlers(&tx)) + return false; + + if (sta) { + if (tx.sta) + *sta = &tx.sta->sta; + else + *sta = NULL; + } + + return true; +} +EXPORT_SYMBOL(ieee80211_tx_prepare_skb); + /* * Returns false if the frame couldn't be transmitted but was queued instead. */ @@ -1982,7 +2011,7 @@ netdev_tx_t ieee80211_subif_start_xmit(struct sk_buff *skb, * EAPOL frames from the local station. */ if (unlikely(!ieee80211_vif_is_mesh(&sdata->vif) && - !is_multicast_ether_addr(hdr.addr1) && !authorized && + !multicast && !authorized && (cpu_to_be16(ethertype) != sdata->control_port_protocol || !ether_addr_equal(sdata->vif.addr, skb->data + ETH_ALEN)))) { #ifdef CONFIG_MAC80211_VERBOSE_DEBUG @@ -2358,15 +2387,35 @@ static void ieee80211_update_csa(struct ieee80211_sub_if_data *sdata, struct probe_resp *resp; int counter_offset_beacon = sdata->csa_counter_offset_beacon; int counter_offset_presp = sdata->csa_counter_offset_presp; + u8 *beacon_data; + size_t beacon_data_len; + + switch (sdata->vif.type) { + case NL80211_IFTYPE_AP: + beacon_data = beacon->tail; + beacon_data_len = beacon->tail_len; + break; + case NL80211_IFTYPE_ADHOC: + beacon_data = beacon->head; + beacon_data_len = beacon->head_len; + break; + case NL80211_IFTYPE_MESH_POINT: + beacon_data = beacon->head; + beacon_data_len = beacon->head_len; + break; + default: + return; + } + if (WARN_ON(counter_offset_beacon >= beacon_data_len)) + return; /* warn if the driver did not check for/react to csa completeness */ - if (WARN_ON(((u8 *)beacon->tail)[counter_offset_beacon] == 0)) + if (WARN_ON(beacon_data[counter_offset_beacon] == 0)) return; - ((u8 *)beacon->tail)[counter_offset_beacon]--; + beacon_data[counter_offset_beacon]--; - if (sdata->vif.type == NL80211_IFTYPE_AP && - counter_offset_presp) { + if (sdata->vif.type == NL80211_IFTYPE_AP && counter_offset_presp) { rcu_read_lock(); resp = rcu_dereference(sdata->u.ap.probe_resp); @@ -2401,6 +2450,24 @@ bool ieee80211_csa_is_complete(struct ieee80211_vif *vif) goto out; beacon_data = beacon->tail; beacon_data_len = beacon->tail_len; + } else if (vif->type == NL80211_IFTYPE_ADHOC) { + struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; + + beacon = rcu_dereference(ifibss->presp); + if (!beacon) + goto out; + + beacon_data = beacon->head; + beacon_data_len = beacon->head_len; + } else if (vif->type == NL80211_IFTYPE_MESH_POINT) { + struct ieee80211_if_mesh *ifmsh = &sdata->u.mesh; + + beacon = rcu_dereference(ifmsh->beacon); + if (!beacon) + goto out; + + beacon_data = beacon->head; + beacon_data_len = beacon->head_len; } else { WARN_ON(1); goto out; @@ -2485,6 +2552,10 @@ struct sk_buff *ieee80211_beacon_get_tim(struct ieee80211_hw *hw, if (!presp) goto out; + if (sdata->vif.csa_active) + ieee80211_update_csa(sdata, presp); + + skb = dev_alloc_skb(local->tx_headroom + presp->head_len); if (!skb) goto out; @@ -2502,6 +2573,9 @@ struct sk_buff *ieee80211_beacon_get_tim(struct ieee80211_hw *hw, if (!bcn) goto out; + if (sdata->vif.csa_active) + ieee80211_update_csa(sdata, bcn); + if (ifmsh->sync_ops) ifmsh->sync_ops->adjust_tbtt( sdata); diff --git a/net/mac80211/util.c b/net/mac80211/util.c index 69e4ef5348a0..592a18171f95 100644 --- a/net/mac80211/util.c +++ b/net/mac80211/util.c @@ -300,9 +300,6 @@ void ieee80211_propagate_queue_wake(struct ieee80211_local *local, int queue) if (!sdata->dev) continue; - if (test_bit(SDATA_STATE_OFFCHANNEL, &sdata->state)) - continue; - if (sdata->vif.cab_queue != IEEE80211_INVAL_HW_QUEUE && local->queue_stop_reasons[sdata->vif.cab_queue] != 0) continue; @@ -567,18 +564,15 @@ void ieee80211_flush_queues(struct ieee80211_local *local, IEEE80211_QUEUE_STOP_REASON_FLUSH); } -void ieee80211_iterate_active_interfaces( - struct ieee80211_hw *hw, u32 iter_flags, - void (*iterator)(void *data, u8 *mac, - struct ieee80211_vif *vif), - void *data) +static void __iterate_active_interfaces(struct ieee80211_local *local, + u32 iter_flags, + void (*iterator)(void *data, u8 *mac, + struct ieee80211_vif *vif), + void *data) { - struct ieee80211_local *local = hw_to_local(hw); struct ieee80211_sub_if_data *sdata; - mutex_lock(&local->iflist_mtx); - - list_for_each_entry(sdata, &local->interfaces, list) { + list_for_each_entry_rcu(sdata, &local->interfaces, list) { switch (sdata->vif.type) { case NL80211_IFTYPE_MONITOR: if (!(sdata->u.mntr_flags & MONITOR_FLAG_ACTIVE)) @@ -597,13 +591,25 @@ void ieee80211_iterate_active_interfaces( &sdata->vif); } - sdata = rcu_dereference_protected(local->monitor_sdata, - lockdep_is_held(&local->iflist_mtx)); + sdata = rcu_dereference_check(local->monitor_sdata, + lockdep_is_held(&local->iflist_mtx) || + lockdep_rtnl_is_held()); if (sdata && (iter_flags & IEEE80211_IFACE_ITER_RESUME_ALL || sdata->flags & IEEE80211_SDATA_IN_DRIVER)) iterator(data, sdata->vif.addr, &sdata->vif); +} + +void ieee80211_iterate_active_interfaces( + struct ieee80211_hw *hw, u32 iter_flags, + void (*iterator)(void *data, u8 *mac, + struct ieee80211_vif *vif), + void *data) +{ + struct ieee80211_local *local = hw_to_local(hw); + mutex_lock(&local->iflist_mtx); + __iterate_active_interfaces(local, iter_flags, iterator, data); mutex_unlock(&local->iflist_mtx); } EXPORT_SYMBOL_GPL(ieee80211_iterate_active_interfaces); @@ -615,38 +621,26 @@ void ieee80211_iterate_active_interfaces_atomic( void *data) { struct ieee80211_local *local = hw_to_local(hw); - struct ieee80211_sub_if_data *sdata; rcu_read_lock(); + __iterate_active_interfaces(local, iter_flags, iterator, data); + rcu_read_unlock(); +} +EXPORT_SYMBOL_GPL(ieee80211_iterate_active_interfaces_atomic); - list_for_each_entry_rcu(sdata, &local->interfaces, list) { - switch (sdata->vif.type) { - case NL80211_IFTYPE_MONITOR: - if (!(sdata->u.mntr_flags & MONITOR_FLAG_ACTIVE)) - continue; - break; - case NL80211_IFTYPE_AP_VLAN: - continue; - default: - break; - } - if (!(iter_flags & IEEE80211_IFACE_ITER_RESUME_ALL) && - !(sdata->flags & IEEE80211_SDATA_IN_DRIVER)) - continue; - if (ieee80211_sdata_running(sdata)) - iterator(data, sdata->vif.addr, - &sdata->vif); - } +void ieee80211_iterate_active_interfaces_rtnl( + struct ieee80211_hw *hw, u32 iter_flags, + void (*iterator)(void *data, u8 *mac, + struct ieee80211_vif *vif), + void *data) +{ + struct ieee80211_local *local = hw_to_local(hw); - sdata = rcu_dereference(local->monitor_sdata); - if (sdata && - (iter_flags & IEEE80211_IFACE_ITER_RESUME_ALL || - sdata->flags & IEEE80211_SDATA_IN_DRIVER)) - iterator(data, sdata->vif.addr, &sdata->vif); + ASSERT_RTNL(); - rcu_read_unlock(); + __iterate_active_interfaces(local, iter_flags, iterator, data); } -EXPORT_SYMBOL_GPL(ieee80211_iterate_active_interfaces_atomic); +EXPORT_SYMBOL_GPL(ieee80211_iterate_active_interfaces_rtnl); /* * Nothing should have been stuffed into the workqueue during @@ -746,6 +740,7 @@ u32 ieee802_11_parse_elems_crc(const u8 *start, size_t len, bool action, case WLAN_EID_TIMEOUT_INTERVAL: case WLAN_EID_SECONDARY_CHANNEL_OFFSET: case WLAN_EID_WIDE_BW_CHANNEL_SWITCH: + case WLAN_EID_CHAN_SWITCH_PARAM: /* * not listing WLAN_EID_CHANNEL_SWITCH_WRAPPER -- it seems possible * that if the content gets bigger it might be needed more than once @@ -911,6 +906,14 @@ u32 ieee802_11_parse_elems_crc(const u8 *start, size_t len, bool action, } elems->sec_chan_offs = (void *)pos; break; + case WLAN_EID_CHAN_SWITCH_PARAM: + if (elen != + sizeof(*elems->mesh_chansw_params_ie)) { + elem_parse_failed = true; + break; + } + elems->mesh_chansw_params_ie = (void *)pos; + break; case WLAN_EID_WIDE_BW_CHANNEL_SWITCH: if (!action || elen != sizeof(*elems->wide_bw_chansw_ie)) { @@ -1007,14 +1010,21 @@ void ieee80211_set_wmm_default(struct ieee80211_sub_if_data *sdata, */ enable_qos = (sdata->vif.type != NL80211_IFTYPE_STATION); - for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) { - /* Set defaults according to 802.11-2007 Table 7-37 */ - aCWmax = 1023; - if (use_11b) - aCWmin = 31; - else - aCWmin = 15; + /* Set defaults according to 802.11-2007 Table 7-37 */ + aCWmax = 1023; + if (use_11b) + aCWmin = 31; + else + aCWmin = 15; + + /* Confiure old 802.11b/g medium access rules. */ + qparam.cw_max = aCWmax; + qparam.cw_min = aCWmin; + qparam.txop = 0; + qparam.aifs = 2; + for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) { + /* Update if QoS is enabled. */ if (enable_qos) { switch (ac) { case IEEE80211_AC_BK: @@ -1050,12 +1060,6 @@ void ieee80211_set_wmm_default(struct ieee80211_sub_if_data *sdata, qparam.aifs = 2; break; } - } else { - /* Confiure old 802.11b/g medium access rules. */ - qparam.cw_max = aCWmax; - qparam.cw_min = aCWmin; - qparam.txop = 0; - qparam.aifs = 2; } qparam.uapsd = false; @@ -1084,8 +1088,8 @@ void ieee80211_send_auth(struct ieee80211_sub_if_data *sdata, struct ieee80211_mgmt *mgmt; int err; - skb = dev_alloc_skb(local->hw.extra_tx_headroom + - sizeof(*mgmt) + 6 + extra_len); + /* 24 + 6 = header + auth_algo + auth_transaction + status_code */ + skb = dev_alloc_skb(local->hw.extra_tx_headroom + 24 + 6 + extra_len); if (!skb) return; @@ -2296,3 +2300,175 @@ void ieee80211_radar_detected(struct ieee80211_hw *hw) ieee80211_queue_work(hw, &local->radar_detected_work); } EXPORT_SYMBOL(ieee80211_radar_detected); + +u32 ieee80211_chandef_downgrade(struct cfg80211_chan_def *c) +{ + u32 ret; + int tmp; + + switch (c->width) { + case NL80211_CHAN_WIDTH_20: + c->width = NL80211_CHAN_WIDTH_20_NOHT; + ret = IEEE80211_STA_DISABLE_HT | IEEE80211_STA_DISABLE_VHT; + break; + case NL80211_CHAN_WIDTH_40: + c->width = NL80211_CHAN_WIDTH_20; + c->center_freq1 = c->chan->center_freq; + ret = IEEE80211_STA_DISABLE_40MHZ | + IEEE80211_STA_DISABLE_VHT; + break; + case NL80211_CHAN_WIDTH_80: + tmp = (30 + c->chan->center_freq - c->center_freq1)/20; + /* n_P40 */ + tmp /= 2; + /* freq_P40 */ + c->center_freq1 = c->center_freq1 - 20 + 40 * tmp; + c->width = NL80211_CHAN_WIDTH_40; + ret = IEEE80211_STA_DISABLE_VHT; + break; + case NL80211_CHAN_WIDTH_80P80: + c->center_freq2 = 0; + c->width = NL80211_CHAN_WIDTH_80; + ret = IEEE80211_STA_DISABLE_80P80MHZ | + IEEE80211_STA_DISABLE_160MHZ; + break; + case NL80211_CHAN_WIDTH_160: + /* n_P20 */ + tmp = (70 + c->chan->center_freq - c->center_freq1)/20; + /* n_P80 */ + tmp /= 4; + c->center_freq1 = c->center_freq1 - 40 + 80 * tmp; + c->width = NL80211_CHAN_WIDTH_80; + ret = IEEE80211_STA_DISABLE_80P80MHZ | + IEEE80211_STA_DISABLE_160MHZ; + break; + default: + case NL80211_CHAN_WIDTH_20_NOHT: + WARN_ON_ONCE(1); + c->width = NL80211_CHAN_WIDTH_20_NOHT; + ret = IEEE80211_STA_DISABLE_HT | IEEE80211_STA_DISABLE_VHT; + break; + case NL80211_CHAN_WIDTH_5: + case NL80211_CHAN_WIDTH_10: + WARN_ON_ONCE(1); + /* keep c->width */ + ret = IEEE80211_STA_DISABLE_HT | IEEE80211_STA_DISABLE_VHT; + break; + } + + WARN_ON_ONCE(!cfg80211_chandef_valid(c)); + + return ret; +} + +/* + * Returns true if smps_mode_new is strictly more restrictive than + * smps_mode_old. + */ +bool ieee80211_smps_is_restrictive(enum ieee80211_smps_mode smps_mode_old, + enum ieee80211_smps_mode smps_mode_new) +{ + if (WARN_ON_ONCE(smps_mode_old == IEEE80211_SMPS_AUTOMATIC || + smps_mode_new == IEEE80211_SMPS_AUTOMATIC)) + return false; + + switch (smps_mode_old) { + case IEEE80211_SMPS_STATIC: + return false; + case IEEE80211_SMPS_DYNAMIC: + return smps_mode_new == IEEE80211_SMPS_STATIC; + case IEEE80211_SMPS_OFF: + return smps_mode_new != IEEE80211_SMPS_OFF; + default: + WARN_ON(1); + } + + return false; +} + +int ieee80211_send_action_csa(struct ieee80211_sub_if_data *sdata, + struct cfg80211_csa_settings *csa_settings) +{ + struct sk_buff *skb; + struct ieee80211_mgmt *mgmt; + struct ieee80211_local *local = sdata->local; + int freq; + int hdr_len = offsetof(struct ieee80211_mgmt, u.action.u.chan_switch) + + sizeof(mgmt->u.action.u.chan_switch); + u8 *pos; + + if (sdata->vif.type != NL80211_IFTYPE_ADHOC && + sdata->vif.type != NL80211_IFTYPE_MESH_POINT) + return -EOPNOTSUPP; + + skb = dev_alloc_skb(local->tx_headroom + hdr_len + + 5 + /* channel switch announcement element */ + 3 + /* secondary channel offset element */ + 8); /* mesh channel switch parameters element */ + if (!skb) + return -ENOMEM; + + skb_reserve(skb, local->tx_headroom); + mgmt = (struct ieee80211_mgmt *)skb_put(skb, hdr_len); + memset(mgmt, 0, hdr_len); + mgmt->frame_control = cpu_to_le16(IEEE80211_FTYPE_MGMT | + IEEE80211_STYPE_ACTION); + + eth_broadcast_addr(mgmt->da); + memcpy(mgmt->sa, sdata->vif.addr, ETH_ALEN); + if (ieee80211_vif_is_mesh(&sdata->vif)) { + memcpy(mgmt->bssid, sdata->vif.addr, ETH_ALEN); + } else { + struct ieee80211_if_ibss *ifibss = &sdata->u.ibss; + memcpy(mgmt->bssid, ifibss->bssid, ETH_ALEN); + } + mgmt->u.action.category = WLAN_CATEGORY_SPECTRUM_MGMT; + mgmt->u.action.u.chan_switch.action_code = WLAN_ACTION_SPCT_CHL_SWITCH; + pos = skb_put(skb, 5); + *pos++ = WLAN_EID_CHANNEL_SWITCH; /* EID */ + *pos++ = 3; /* IE length */ + *pos++ = csa_settings->block_tx ? 1 : 0; /* CSA mode */ + freq = csa_settings->chandef.chan->center_freq; + *pos++ = ieee80211_frequency_to_channel(freq); /* channel */ + *pos++ = csa_settings->count; /* count */ + + if (csa_settings->chandef.width == NL80211_CHAN_WIDTH_40) { + enum nl80211_channel_type ch_type; + + skb_put(skb, 3); + *pos++ = WLAN_EID_SECONDARY_CHANNEL_OFFSET; /* EID */ + *pos++ = 1; /* IE length */ + ch_type = cfg80211_get_chandef_type(&csa_settings->chandef); + if (ch_type == NL80211_CHAN_HT40PLUS) + *pos++ = IEEE80211_HT_PARAM_CHA_SEC_ABOVE; + else + *pos++ = IEEE80211_HT_PARAM_CHA_SEC_BELOW; + } + + if (ieee80211_vif_is_mesh(&sdata->vif)) { + struct ieee80211_if_mesh *ifmsh = &sdata->u.mesh; + __le16 pre_value; + + skb_put(skb, 8); + *pos++ = WLAN_EID_CHAN_SWITCH_PARAM; /* EID */ + *pos++ = 6; /* IE length */ + *pos++ = sdata->u.mesh.mshcfg.dot11MeshTTL; /* Mesh TTL */ + *pos = 0x00; /* Mesh Flag: Tx Restrict, Initiator, Reason */ + *pos |= WLAN_EID_CHAN_SWITCH_PARAM_INITIATOR; + *pos++ |= csa_settings->block_tx ? + WLAN_EID_CHAN_SWITCH_PARAM_TX_RESTRICT : 0x00; + put_unaligned_le16(WLAN_REASON_MESH_CHAN, pos); /* Reason Cd */ + pos += 2; + if (!ifmsh->pre_value) + ifmsh->pre_value = 1; + else + ifmsh->pre_value++; + pre_value = cpu_to_le16(ifmsh->pre_value); + memcpy(pos, &pre_value, 2); /* Precedence Value */ + pos += 2; + ifmsh->chsw_init = true; + } + + ieee80211_tx_skb(sdata, skb); + return 0; +} diff --git a/net/mac80211/vht.c b/net/mac80211/vht.c index 97c289414e32..de0112785aae 100644 --- a/net/mac80211/vht.c +++ b/net/mac80211/vht.c @@ -185,13 +185,13 @@ ieee80211_vht_cap_ie_to_sta_vht_cap(struct ieee80211_sub_if_data *sdata, if (own_cap.cap & IEEE80211_VHT_CAP_SU_BEAMFORMEE_CAPABLE) { vht_cap->cap |= cap_info & (IEEE80211_VHT_CAP_SU_BEAMFORMER_CAPABLE | - IEEE80211_VHT_CAP_BEAMFORMER_ANTENNAS_MAX | IEEE80211_VHT_CAP_SOUNDING_DIMENSIONS_MAX); } if (own_cap.cap & IEEE80211_VHT_CAP_SU_BEAMFORMER_CAPABLE) vht_cap->cap |= cap_info & - IEEE80211_VHT_CAP_SU_BEAMFORMEE_CAPABLE; + (IEEE80211_VHT_CAP_SU_BEAMFORMEE_CAPABLE | + IEEE80211_VHT_CAP_BEAMFORMEE_STS_MAX); if (own_cap.cap & IEEE80211_VHT_CAP_MU_BEAMFORMER_CAPABLE) vht_cap->cap |= cap_info & diff --git a/net/mac80211/wpa.c b/net/mac80211/wpa.c index c9edfcb7a13b..d65728220763 100644 --- a/net/mac80211/wpa.c +++ b/net/mac80211/wpa.c @@ -301,22 +301,16 @@ ieee80211_crypto_tkip_decrypt(struct ieee80211_rx_data *rx) } -static void ccmp_special_blocks(struct sk_buff *skb, u8 *pn, u8 *scratch, +static void ccmp_special_blocks(struct sk_buff *skb, u8 *pn, u8 *b_0, u8 *aad, int encrypted) { __le16 mask_fc; int a4_included, mgmt; u8 qos_tid; - u8 *b_0, *aad; - u16 data_len, len_a; + u16 len_a; unsigned int hdrlen; struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)skb->data; - memset(scratch, 0, 6 * AES_BLOCK_SIZE); - - b_0 = scratch + 3 * AES_BLOCK_SIZE; - aad = scratch + 4 * AES_BLOCK_SIZE; - /* * Mask FC: zero subtype b4 b5 b6 (if not mgmt) * Retry, PwrMgt, MoreData; set Protected @@ -338,20 +332,21 @@ static void ccmp_special_blocks(struct sk_buff *skb, u8 *pn, u8 *scratch, else qos_tid = 0; - data_len = skb->len - hdrlen - IEEE80211_CCMP_HDR_LEN; - if (encrypted) - data_len -= IEEE80211_CCMP_MIC_LEN; + /* In CCM, the initial vectors (IV) used for CTR mode encryption and CBC + * mode authentication are not allowed to collide, yet both are derived + * from this vector b_0. We only set L := 1 here to indicate that the + * data size can be represented in (L+1) bytes. The CCM layer will take + * care of storing the data length in the top (L+1) bytes and setting + * and clearing the other bits as is required to derive the two IVs. + */ + b_0[0] = 0x1; - /* First block, b_0 */ - b_0[0] = 0x59; /* flags: Adata: 1, M: 011, L: 001 */ /* Nonce: Nonce Flags | A2 | PN * Nonce Flags: Priority (b0..b3) | Management (b4) | Reserved (b5..b7) */ b_0[1] = qos_tid | (mgmt << 4); memcpy(&b_0[2], hdr->addr2, ETH_ALEN); memcpy(&b_0[8], pn, IEEE80211_CCMP_PN_LEN); - /* l(m) */ - put_unaligned_be16(data_len, &b_0[14]); /* AAD (extra authenticate-only data) / masked 802.11 header * FC | A1 | A2 | A3 | SC | [A4] | [QC] */ @@ -407,7 +402,8 @@ static int ccmp_encrypt_skb(struct ieee80211_tx_data *tx, struct sk_buff *skb) u8 *pos; u8 pn[6]; u64 pn64; - u8 scratch[6 * AES_BLOCK_SIZE]; + u8 aad[2 * AES_BLOCK_SIZE]; + u8 b_0[AES_BLOCK_SIZE]; if (info->control.hw_key && !(info->control.hw_key->flags & IEEE80211_KEY_FLAG_GENERATE_IV) && @@ -460,9 +456,9 @@ static int ccmp_encrypt_skb(struct ieee80211_tx_data *tx, struct sk_buff *skb) return 0; pos += IEEE80211_CCMP_HDR_LEN; - ccmp_special_blocks(skb, pn, scratch, 0); - ieee80211_aes_ccm_encrypt(key->u.ccmp.tfm, scratch, pos, len, - pos, skb_put(skb, IEEE80211_CCMP_MIC_LEN)); + ccmp_special_blocks(skb, pn, b_0, aad, 0); + ieee80211_aes_ccm_encrypt(key->u.ccmp.tfm, b_0, aad, pos, len, + skb_put(skb, IEEE80211_CCMP_MIC_LEN)); return 0; } @@ -525,16 +521,16 @@ ieee80211_crypto_ccmp_decrypt(struct ieee80211_rx_data *rx) } if (!(status->flag & RX_FLAG_DECRYPTED)) { - u8 scratch[6 * AES_BLOCK_SIZE]; + u8 aad[2 * AES_BLOCK_SIZE]; + u8 b_0[AES_BLOCK_SIZE]; /* hardware didn't decrypt/verify MIC */ - ccmp_special_blocks(skb, pn, scratch, 1); + ccmp_special_blocks(skb, pn, b_0, aad, 1); if (ieee80211_aes_ccm_decrypt( - key->u.ccmp.tfm, scratch, + key->u.ccmp.tfm, b_0, aad, skb->data + hdrlen + IEEE80211_CCMP_HDR_LEN, data_len, - skb->data + skb->len - IEEE80211_CCMP_MIC_LEN, - skb->data + hdrlen + IEEE80211_CCMP_HDR_LEN)) + skb->data + skb->len - IEEE80211_CCMP_MIC_LEN)) return RX_DROP_UNUSABLE; } diff --git a/net/mac802154/ieee802154_dev.c b/net/mac802154/ieee802154_dev.c index b7c7f815deae..52ae6646a411 100644 --- a/net/mac802154/ieee802154_dev.c +++ b/net/mac802154/ieee802154_dev.c @@ -174,8 +174,7 @@ ieee802154_alloc_device(size_t priv_data_len, struct ieee802154_ops *ops) if (!ops || !ops->xmit || !ops->ed || !ops->start || !ops->stop || !ops->set_channel) { - printk(KERN_ERR - "undefined IEEE802.15.4 device operations\n"); + pr_err("undefined IEEE802.15.4 device operations\n"); return NULL; } @@ -201,8 +200,7 @@ ieee802154_alloc_device(size_t priv_data_len, struct ieee802154_ops *ops) phy = wpan_phy_alloc(priv_size); if (!phy) { - printk(KERN_ERR - "failure to allocate master IEEE802.15.4 device\n"); + pr_err("failure to allocate master IEEE802.15.4 device\n"); return NULL; } diff --git a/net/mac802154/wpan.c b/net/mac802154/wpan.c index 2ca2f4dceab7..e24bcf977296 100644 --- a/net/mac802154/wpan.c +++ b/net/mac802154/wpan.c @@ -208,6 +208,8 @@ static int mac802154_header_create(struct sk_buff *skb, head[1] = fc >> 8; memcpy(skb_push(skb, pos), head, pos); + skb_reset_mac_header(skb); + skb->mac_len = pos; return pos; } diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c index 1bec1219ab81..851cd880b0c0 100644 --- a/net/mpls/mpls_gso.c +++ b/net/mpls/mpls_gso.c @@ -33,6 +33,7 @@ static struct sk_buff *mpls_gso_segment(struct sk_buff *skb, SKB_GSO_DODGY | SKB_GSO_TCP_ECN | SKB_GSO_GRE | + SKB_GSO_IPIP | SKB_GSO_MPLS))) goto out; diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig index 6e839b6dff2b..48acec17e27a 100644 --- a/net/netfilter/Kconfig +++ b/net/netfilter/Kconfig @@ -413,6 +413,58 @@ config NETFILTER_SYNPROXY endif # NF_CONNTRACK +config NF_TABLES + depends on NETFILTER_NETLINK + tristate "Netfilter nf_tables support" + +config NFT_EXTHDR + depends on NF_TABLES + tristate "Netfilter nf_tables IPv6 exthdr module" + +config NFT_META + depends on NF_TABLES + tristate "Netfilter nf_tables meta module" + +config NFT_CT + depends on NF_TABLES + depends on NF_CONNTRACK + tristate "Netfilter nf_tables conntrack module" + +config NFT_RBTREE + depends on NF_TABLES + tristate "Netfilter nf_tables rbtree set module" + +config NFT_HASH + depends on NF_TABLES + tristate "Netfilter nf_tables hash set module" + +config NFT_COUNTER + depends on NF_TABLES + tristate "Netfilter nf_tables counter module" + +config NFT_LOG + depends on NF_TABLES + tristate "Netfilter nf_tables log module" + +config NFT_LIMIT + depends on NF_TABLES + tristate "Netfilter nf_tables limit module" + +config NFT_NAT + depends on NF_TABLES + depends on NF_CONNTRACK + depends on NF_NAT + tristate "Netfilter nf_tables nat module" + +config NFT_COMPAT + depends on NF_TABLES + depends on NETFILTER_XTABLES + tristate "Netfilter x_tables over nf_tables module" + help + This is required if you intend to use any of existing + x_tables match/target extensions over the nf_tables + framework. + config NETFILTER_XTABLES tristate "Netfilter Xtables support (required for ip_tables)" default m if NETFILTER_ADVANCED=n diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile index c3a0a12907f6..394483b2c193 100644 --- a/net/netfilter/Makefile +++ b/net/netfilter/Makefile @@ -64,6 +64,24 @@ obj-$(CONFIG_NF_NAT_TFTP) += nf_nat_tftp.o # SYNPROXY obj-$(CONFIG_NETFILTER_SYNPROXY) += nf_synproxy_core.o +# nf_tables +nf_tables-objs += nf_tables_core.o nf_tables_api.o +nf_tables-objs += nft_immediate.o nft_cmp.o nft_lookup.o +nf_tables-objs += nft_bitwise.o nft_byteorder.o nft_payload.o + +obj-$(CONFIG_NF_TABLES) += nf_tables.o +obj-$(CONFIG_NFT_COMPAT) += nft_compat.o +obj-$(CONFIG_NFT_EXTHDR) += nft_exthdr.o +obj-$(CONFIG_NFT_META) += nft_meta.o +obj-$(CONFIG_NFT_CT) += nft_ct.o +obj-$(CONFIG_NFT_LIMIT) += nft_limit.o +obj-$(CONFIG_NFT_NAT) += nft_nat.o +#nf_tables-objs += nft_meta_target.o +obj-$(CONFIG_NFT_RBTREE) += nft_rbtree.o +obj-$(CONFIG_NFT_HASH) += nft_hash.o +obj-$(CONFIG_NFT_COUNTER) += nft_counter.o +obj-$(CONFIG_NFT_LOG) += nft_log.o + # generic X tables obj-$(CONFIG_NETFILTER_XTABLES) += x_tables.o xt_tcpudp.o diff --git a/net/netfilter/core.c b/net/netfilter/core.c index 593b16ea45e0..1fbab0cdd302 100644 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@ -146,7 +146,7 @@ unsigned int nf_iterate(struct list_head *head, /* Optimization: we don't need to hold module reference here, since function can't sleep. --RR */ repeat: - verdict = (*elemp)->hook(hook, skb, indev, outdev, okfn); + verdict = (*elemp)->hook(*elemp, skb, indev, outdev, okfn); if (verdict != NF_ACCEPT) { #ifdef CONFIG_NETFILTER_DEBUG if (unlikely((verdict & NF_VERDICT_MASK) diff --git a/net/netfilter/ipset/Kconfig b/net/netfilter/ipset/Kconfig index ba36c283d837..a2d6263b6c64 100644 --- a/net/netfilter/ipset/Kconfig +++ b/net/netfilter/ipset/Kconfig @@ -1,7 +1,7 @@ menuconfig IP_SET tristate "IP set support" depends on INET && NETFILTER - depends on NETFILTER_NETLINK + select NETFILTER_NETLINK help This option adds IP set support to the kernel. In order to define and use the sets, you need the userspace utility @@ -90,6 +90,15 @@ config IP_SET_HASH_IPPORTNET To compile it as a module, choose M here. If unsure, say N. +config IP_SET_HASH_NETPORTNET + tristate "hash:net,port,net set support" + depends on IP_SET + help + This option adds the hash:net,port,net set type support, by which + one can store two IPv4/IPv6 subnets, and a protocol/port in a set. + + To compile it as a module, choose M here. If unsure, say N. + config IP_SET_HASH_NET tristate "hash:net set support" depends on IP_SET @@ -99,6 +108,15 @@ config IP_SET_HASH_NET To compile it as a module, choose M here. If unsure, say N. +config IP_SET_HASH_NETNET + tristate "hash:net,net set support" + depends on IP_SET + help + This option adds the hash:net,net set type support, by which + one can store IPv4/IPv6 network address/prefix pairs in a set. + + To compile it as a module, choose M here. If unsure, say N. + config IP_SET_HASH_NETPORT tristate "hash:net,port set support" depends on IP_SET diff --git a/net/netfilter/ipset/Makefile b/net/netfilter/ipset/Makefile index 6e965ecd5444..44b2d38476fa 100644 --- a/net/netfilter/ipset/Makefile +++ b/net/netfilter/ipset/Makefile @@ -20,6 +20,8 @@ obj-$(CONFIG_IP_SET_HASH_IPPORTNET) += ip_set_hash_ipportnet.o obj-$(CONFIG_IP_SET_HASH_NET) += ip_set_hash_net.o obj-$(CONFIG_IP_SET_HASH_NETPORT) += ip_set_hash_netport.o obj-$(CONFIG_IP_SET_HASH_NETIFACE) += ip_set_hash_netiface.o +obj-$(CONFIG_IP_SET_HASH_NETNET) += ip_set_hash_netnet.o +obj-$(CONFIG_IP_SET_HASH_NETPORTNET) += ip_set_hash_netportnet.o # list types obj-$(CONFIG_IP_SET_LIST_SET) += ip_set_list_set.o diff --git a/net/netfilter/ipset/ip_set_bitmap_gen.h b/net/netfilter/ipset/ip_set_bitmap_gen.h index 25243379b887..f2c7d83dc23f 100644 --- a/net/netfilter/ipset/ip_set_bitmap_gen.h +++ b/net/netfilter/ipset/ip_set_bitmap_gen.h @@ -8,38 +8,32 @@ #ifndef __IP_SET_BITMAP_IP_GEN_H #define __IP_SET_BITMAP_IP_GEN_H -#define CONCAT(a, b) a##b -#define TOKEN(a,b) CONCAT(a, b) - -#define mtype_do_test TOKEN(MTYPE, _do_test) -#define mtype_gc_test TOKEN(MTYPE, _gc_test) -#define mtype_is_filled TOKEN(MTYPE, _is_filled) -#define mtype_do_add TOKEN(MTYPE, _do_add) -#define mtype_do_del TOKEN(MTYPE, _do_del) -#define mtype_do_list TOKEN(MTYPE, _do_list) -#define mtype_do_head TOKEN(MTYPE, _do_head) -#define mtype_adt_elem TOKEN(MTYPE, _adt_elem) -#define mtype_add_timeout TOKEN(MTYPE, _add_timeout) -#define mtype_gc_init TOKEN(MTYPE, _gc_init) -#define mtype_kadt TOKEN(MTYPE, _kadt) -#define mtype_uadt TOKEN(MTYPE, _uadt) -#define mtype_destroy TOKEN(MTYPE, _destroy) -#define mtype_flush TOKEN(MTYPE, _flush) -#define mtype_head TOKEN(MTYPE, _head) -#define mtype_same_set TOKEN(MTYPE, _same_set) -#define mtype_elem TOKEN(MTYPE, _elem) -#define mtype_test TOKEN(MTYPE, _test) -#define mtype_add TOKEN(MTYPE, _add) -#define mtype_del TOKEN(MTYPE, _del) -#define mtype_list TOKEN(MTYPE, _list) -#define mtype_gc TOKEN(MTYPE, _gc) +#define mtype_do_test IPSET_TOKEN(MTYPE, _do_test) +#define mtype_gc_test IPSET_TOKEN(MTYPE, _gc_test) +#define mtype_is_filled IPSET_TOKEN(MTYPE, _is_filled) +#define mtype_do_add IPSET_TOKEN(MTYPE, _do_add) +#define mtype_ext_cleanup IPSET_TOKEN(MTYPE, _ext_cleanup) +#define mtype_do_del IPSET_TOKEN(MTYPE, _do_del) +#define mtype_do_list IPSET_TOKEN(MTYPE, _do_list) +#define mtype_do_head IPSET_TOKEN(MTYPE, _do_head) +#define mtype_adt_elem IPSET_TOKEN(MTYPE, _adt_elem) +#define mtype_add_timeout IPSET_TOKEN(MTYPE, _add_timeout) +#define mtype_gc_init IPSET_TOKEN(MTYPE, _gc_init) +#define mtype_kadt IPSET_TOKEN(MTYPE, _kadt) +#define mtype_uadt IPSET_TOKEN(MTYPE, _uadt) +#define mtype_destroy IPSET_TOKEN(MTYPE, _destroy) +#define mtype_flush IPSET_TOKEN(MTYPE, _flush) +#define mtype_head IPSET_TOKEN(MTYPE, _head) +#define mtype_same_set IPSET_TOKEN(MTYPE, _same_set) +#define mtype_elem IPSET_TOKEN(MTYPE, _elem) +#define mtype_test IPSET_TOKEN(MTYPE, _test) +#define mtype_add IPSET_TOKEN(MTYPE, _add) +#define mtype_del IPSET_TOKEN(MTYPE, _del) +#define mtype_list IPSET_TOKEN(MTYPE, _list) +#define mtype_gc IPSET_TOKEN(MTYPE, _gc) #define mtype MTYPE -#define ext_timeout(e, m) \ - (unsigned long *)((e) + (m)->offset[IPSET_OFFSET_TIMEOUT]) -#define ext_counter(e, m) \ - (struct ip_set_counter *)((e) + (m)->offset[IPSET_OFFSET_COUNTER]) -#define get_ext(map, id) ((map)->extensions + (map)->dsize * (id)) +#define get_ext(set, map, id) ((map)->extensions + (set)->dsize * (id)) static void mtype_gc_init(struct ip_set *set, void (*gc)(unsigned long ul_set)) @@ -49,11 +43,22 @@ mtype_gc_init(struct ip_set *set, void (*gc)(unsigned long ul_set)) init_timer(&map->gc); map->gc.data = (unsigned long) set; map->gc.function = gc; - map->gc.expires = jiffies + IPSET_GC_PERIOD(map->timeout) * HZ; + map->gc.expires = jiffies + IPSET_GC_PERIOD(set->timeout) * HZ; add_timer(&map->gc); } static void +mtype_ext_cleanup(struct ip_set *set) +{ + struct mtype *map = set->data; + u32 id; + + for (id = 0; id < map->elements; id++) + if (test_bit(id, map->members)) + ip_set_ext_destroy(set, get_ext(set, map, id)); +} + +static void mtype_destroy(struct ip_set *set) { struct mtype *map = set->data; @@ -62,8 +67,11 @@ mtype_destroy(struct ip_set *set) del_timer_sync(&map->gc); ip_set_free(map->members); - if (map->dsize) + if (set->dsize) { + if (set->extensions & IPSET_EXT_DESTROY) + mtype_ext_cleanup(set); ip_set_free(map->extensions); + } kfree(map); set->data = NULL; @@ -74,6 +82,8 @@ mtype_flush(struct ip_set *set) { struct mtype *map = set->data; + if (set->extensions & IPSET_EXT_DESTROY) + mtype_ext_cleanup(set); memset(map->members, 0, map->memsize); } @@ -91,12 +101,9 @@ mtype_head(struct ip_set *set, struct sk_buff *skb) nla_put_net32(skb, IPSET_ATTR_MEMSIZE, htonl(sizeof(*map) + map->memsize + - map->dsize * map->elements)) || - (SET_WITH_TIMEOUT(set) && - nla_put_net32(skb, IPSET_ATTR_TIMEOUT, htonl(map->timeout))) || - (SET_WITH_COUNTER(set) && - nla_put_net32(skb, IPSET_ATTR_CADT_FLAGS, - htonl(IPSET_FLAG_WITH_COUNTERS)))) + set->dsize * map->elements))) + goto nla_put_failure; + if (unlikely(ip_set_put_flags(skb, set))) goto nla_put_failure; ipset_nest_end(skb, nested); @@ -111,16 +118,16 @@ mtype_test(struct ip_set *set, void *value, const struct ip_set_ext *ext, { struct mtype *map = set->data; const struct mtype_adt_elem *e = value; - void *x = get_ext(map, e->id); - int ret = mtype_do_test(e, map); + void *x = get_ext(set, map, e->id); + int ret = mtype_do_test(e, map, set->dsize); if (ret <= 0) return ret; if (SET_WITH_TIMEOUT(set) && - ip_set_timeout_expired(ext_timeout(x, map))) + ip_set_timeout_expired(ext_timeout(x, set))) return 0; if (SET_WITH_COUNTER(set)) - ip_set_update_counter(ext_counter(x, map), ext, mext, flags); + ip_set_update_counter(ext_counter(x, set), ext, mext, flags); return 1; } @@ -130,26 +137,30 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext, { struct mtype *map = set->data; const struct mtype_adt_elem *e = value; - void *x = get_ext(map, e->id); - int ret = mtype_do_add(e, map, flags); + void *x = get_ext(set, map, e->id); + int ret = mtype_do_add(e, map, flags, set->dsize); if (ret == IPSET_ADD_FAILED) { if (SET_WITH_TIMEOUT(set) && - ip_set_timeout_expired(ext_timeout(x, map))) + ip_set_timeout_expired(ext_timeout(x, set))) ret = 0; else if (!(flags & IPSET_FLAG_EXIST)) return -IPSET_ERR_EXIST; + /* Element is re-added, cleanup extensions */ + ip_set_ext_destroy(set, x); } if (SET_WITH_TIMEOUT(set)) #ifdef IP_SET_BITMAP_STORED_TIMEOUT - mtype_add_timeout(ext_timeout(x, map), e, ext, map, ret); + mtype_add_timeout(ext_timeout(x, set), e, ext, set, map, ret); #else - ip_set_timeout_set(ext_timeout(x, map), ext->timeout); + ip_set_timeout_set(ext_timeout(x, set), ext->timeout); #endif if (SET_WITH_COUNTER(set)) - ip_set_init_counter(ext_counter(x, map), ext); + ip_set_init_counter(ext_counter(x, set), ext); + if (SET_WITH_COMMENT(set)) + ip_set_init_comment(ext_comment(x, set), ext); return 0; } @@ -159,16 +170,27 @@ mtype_del(struct ip_set *set, void *value, const struct ip_set_ext *ext, { struct mtype *map = set->data; const struct mtype_adt_elem *e = value; - const void *x = get_ext(map, e->id); + void *x = get_ext(set, map, e->id); - if (mtype_do_del(e, map) || - (SET_WITH_TIMEOUT(set) && - ip_set_timeout_expired(ext_timeout(x, map)))) + if (mtype_do_del(e, map)) + return -IPSET_ERR_EXIST; + + ip_set_ext_destroy(set, x); + if (SET_WITH_TIMEOUT(set) && + ip_set_timeout_expired(ext_timeout(x, set))) return -IPSET_ERR_EXIST; return 0; } +#ifndef IP_SET_BITMAP_STORED_TIMEOUT +static inline bool +mtype_is_filled(const struct mtype_elem *x) +{ + return true; +} +#endif + static int mtype_list(const struct ip_set *set, struct sk_buff *skb, struct netlink_callback *cb) @@ -176,20 +198,21 @@ mtype_list(const struct ip_set *set, struct mtype *map = set->data; struct nlattr *adt, *nested; void *x; - u32 id, first = cb->args[2]; + u32 id, first = cb->args[IPSET_CB_ARG0]; adt = ipset_nest_start(skb, IPSET_ATTR_ADT); if (!adt) return -EMSGSIZE; - for (; cb->args[2] < map->elements; cb->args[2]++) { - id = cb->args[2]; - x = get_ext(map, id); + for (; cb->args[IPSET_CB_ARG0] < map->elements; + cb->args[IPSET_CB_ARG0]++) { + id = cb->args[IPSET_CB_ARG0]; + x = get_ext(set, map, id); if (!test_bit(id, map->members) || (SET_WITH_TIMEOUT(set) && #ifdef IP_SET_BITMAP_STORED_TIMEOUT mtype_is_filled((const struct mtype_elem *) x) && #endif - ip_set_timeout_expired(ext_timeout(x, map)))) + ip_set_timeout_expired(ext_timeout(x, set)))) continue; nested = ipset_nest_start(skb, IPSET_ATTR_DATA); if (!nested) { @@ -199,40 +222,27 @@ mtype_list(const struct ip_set *set, } else goto nla_put_failure; } - if (mtype_do_list(skb, map, id)) + if (mtype_do_list(skb, map, id, set->dsize)) goto nla_put_failure; - if (SET_WITH_TIMEOUT(set)) { -#ifdef IP_SET_BITMAP_STORED_TIMEOUT - if (nla_put_net32(skb, IPSET_ATTR_TIMEOUT, - htonl(ip_set_timeout_stored(map, id, - ext_timeout(x, map))))) - goto nla_put_failure; -#else - if (nla_put_net32(skb, IPSET_ATTR_TIMEOUT, - htonl(ip_set_timeout_get( - ext_timeout(x, map))))) - goto nla_put_failure; -#endif - } - if (SET_WITH_COUNTER(set) && - ip_set_put_counter(skb, ext_counter(x, map))) + if (ip_set_put_extensions(skb, set, x, + mtype_is_filled((const struct mtype_elem *) x))) goto nla_put_failure; ipset_nest_end(skb, nested); } ipset_nest_end(skb, adt); /* Set listing finished */ - cb->args[2] = 0; + cb->args[IPSET_CB_ARG0] = 0; return 0; nla_put_failure: nla_nest_cancel(skb, nested); - ipset_nest_end(skb, adt); if (unlikely(id == first)) { - cb->args[2] = 0; + cb->args[IPSET_CB_ARG0] = 0; return -EMSGSIZE; } + ipset_nest_end(skb, adt); return 0; } @@ -241,21 +251,23 @@ mtype_gc(unsigned long ul_set) { struct ip_set *set = (struct ip_set *) ul_set; struct mtype *map = set->data; - const void *x; + void *x; u32 id; /* We run parallel with other readers (test element) * but adding/deleting new entries is locked out */ read_lock_bh(&set->lock); for (id = 0; id < map->elements; id++) - if (mtype_gc_test(id, map)) { - x = get_ext(map, id); - if (ip_set_timeout_expired(ext_timeout(x, map))) + if (mtype_gc_test(id, map, set->dsize)) { + x = get_ext(set, map, id); + if (ip_set_timeout_expired(ext_timeout(x, set))) { clear_bit(id, map->members); + ip_set_ext_destroy(set, x); + } } read_unlock_bh(&set->lock); - map->gc.expires = jiffies + IPSET_GC_PERIOD(map->timeout) * HZ; + map->gc.expires = jiffies + IPSET_GC_PERIOD(set->timeout) * HZ; add_timer(&map->gc); } diff --git a/net/netfilter/ipset/ip_set_bitmap_ip.c b/net/netfilter/ipset/ip_set_bitmap_ip.c index f1a8128bef01..6f1f9f494808 100644 --- a/net/netfilter/ipset/ip_set_bitmap_ip.c +++ b/net/netfilter/ipset/ip_set_bitmap_ip.c @@ -25,12 +25,13 @@ #include <linux/netfilter/ipset/ip_set.h> #include <linux/netfilter/ipset/ip_set_bitmap.h> -#define REVISION_MIN 0 -#define REVISION_MAX 1 /* Counter support added */ +#define IPSET_TYPE_REV_MIN 0 +/* 1 Counter support added */ +#define IPSET_TYPE_REV_MAX 2 /* Comment support added */ MODULE_LICENSE("GPL"); MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>"); -IP_SET_MODULE_DESC("bitmap:ip", REVISION_MIN, REVISION_MAX); +IP_SET_MODULE_DESC("bitmap:ip", IPSET_TYPE_REV_MIN, IPSET_TYPE_REV_MAX); MODULE_ALIAS("ip_set_bitmap:ip"); #define MTYPE bitmap_ip @@ -44,10 +45,7 @@ struct bitmap_ip { u32 elements; /* number of max elements in the set */ u32 hosts; /* number of hosts in a subnet */ size_t memsize; /* members size */ - size_t dsize; /* extensions struct size */ - size_t offset[IPSET_OFFSET_MAX]; /* Offsets to extensions */ u8 netmask; /* subnet netmask */ - u32 timeout; /* timeout parameter */ struct timer_list gc; /* garbage collection */ }; @@ -65,20 +63,21 @@ ip_to_id(const struct bitmap_ip *m, u32 ip) /* Common functions */ static inline int -bitmap_ip_do_test(const struct bitmap_ip_adt_elem *e, struct bitmap_ip *map) +bitmap_ip_do_test(const struct bitmap_ip_adt_elem *e, + struct bitmap_ip *map, size_t dsize) { return !!test_bit(e->id, map->members); } static inline int -bitmap_ip_gc_test(u16 id, const struct bitmap_ip *map) +bitmap_ip_gc_test(u16 id, const struct bitmap_ip *map, size_t dsize) { return !!test_bit(id, map->members); } static inline int bitmap_ip_do_add(const struct bitmap_ip_adt_elem *e, struct bitmap_ip *map, - u32 flags) + u32 flags, size_t dsize) { return !!test_and_set_bit(e->id, map->members); } @@ -90,7 +89,8 @@ bitmap_ip_do_del(const struct bitmap_ip_adt_elem *e, struct bitmap_ip *map) } static inline int -bitmap_ip_do_list(struct sk_buff *skb, const struct bitmap_ip *map, u32 id) +bitmap_ip_do_list(struct sk_buff *skb, const struct bitmap_ip *map, u32 id, + size_t dsize) { return nla_put_ipaddr4(skb, IPSET_ATTR_IP, htonl(map->first_ip + id * map->hosts)); @@ -113,7 +113,7 @@ bitmap_ip_kadt(struct ip_set *set, const struct sk_buff *skb, struct bitmap_ip *map = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct bitmap_ip_adt_elem e = { }; - struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, map); + struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); u32 ip; ip = ntohl(ip4addr(skb, opt->flags & IPSET_DIM_ONE_SRC)); @@ -131,9 +131,9 @@ bitmap_ip_uadt(struct ip_set *set, struct nlattr *tb[], { struct bitmap_ip *map = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; - u32 ip, ip_to; + u32 ip = 0, ip_to = 0; struct bitmap_ip_adt_elem e = { }; - struct ip_set_ext ext = IP_SET_INIT_UEXT(map); + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); int ret = 0; if (unlikely(!tb[IPSET_ATTR_IP] || @@ -200,7 +200,7 @@ bitmap_ip_same_set(const struct ip_set *a, const struct ip_set *b) return x->first_ip == y->first_ip && x->last_ip == y->last_ip && x->netmask == y->netmask && - x->timeout == y->timeout && + a->timeout == b->timeout && a->extensions == b->extensions; } @@ -209,25 +209,6 @@ bitmap_ip_same_set(const struct ip_set *a, const struct ip_set *b) struct bitmap_ip_elem { }; -/* Timeout variant */ - -struct bitmap_ipt_elem { - unsigned long timeout; -}; - -/* Plain variant with counter */ - -struct bitmap_ipc_elem { - struct ip_set_counter counter; -}; - -/* Timeout variant with counter */ - -struct bitmap_ipct_elem { - unsigned long timeout; - struct ip_set_counter counter; -}; - #include "ip_set_bitmap_gen.h" /* Create bitmap:ip type of sets */ @@ -240,8 +221,8 @@ init_map_ip(struct ip_set *set, struct bitmap_ip *map, map->members = ip_set_alloc(map->memsize); if (!map->members) return false; - if (map->dsize) { - map->extensions = ip_set_alloc(map->dsize * elements); + if (set->dsize) { + map->extensions = ip_set_alloc(set->dsize * elements); if (!map->extensions) { kfree(map->members); return false; @@ -252,7 +233,7 @@ init_map_ip(struct ip_set *set, struct bitmap_ip *map, map->elements = elements; map->hosts = hosts; map->netmask = netmask; - map->timeout = IPSET_NO_TIMEOUT; + set->timeout = IPSET_NO_TIMEOUT; set->data = map; set->family = NFPROTO_IPV4; @@ -261,10 +242,11 @@ init_map_ip(struct ip_set *set, struct bitmap_ip *map, } static int -bitmap_ip_create(struct ip_set *set, struct nlattr *tb[], u32 flags) +bitmap_ip_create(struct net *net, struct ip_set *set, struct nlattr *tb[], + u32 flags) { struct bitmap_ip *map; - u32 first_ip, last_ip, hosts, cadt_flags = 0; + u32 first_ip = 0, last_ip = 0, hosts; u64 elements; u8 netmask = 32; int ret; @@ -336,61 +318,15 @@ bitmap_ip_create(struct ip_set *set, struct nlattr *tb[], u32 flags) map->memsize = bitmap_bytes(0, elements - 1); set->variant = &bitmap_ip; - if (tb[IPSET_ATTR_CADT_FLAGS]) - cadt_flags = ip_set_get_h32(tb[IPSET_ATTR_CADT_FLAGS]); - if (cadt_flags & IPSET_FLAG_WITH_COUNTERS) { - set->extensions |= IPSET_EXT_COUNTER; - if (tb[IPSET_ATTR_TIMEOUT]) { - map->dsize = sizeof(struct bitmap_ipct_elem); - map->offset[IPSET_OFFSET_TIMEOUT] = - offsetof(struct bitmap_ipct_elem, timeout); - map->offset[IPSET_OFFSET_COUNTER] = - offsetof(struct bitmap_ipct_elem, counter); - - if (!init_map_ip(set, map, first_ip, last_ip, - elements, hosts, netmask)) { - kfree(map); - return -ENOMEM; - } - - map->timeout = ip_set_timeout_uget( - tb[IPSET_ATTR_TIMEOUT]); - set->extensions |= IPSET_EXT_TIMEOUT; - - bitmap_ip_gc_init(set, bitmap_ip_gc); - } else { - map->dsize = sizeof(struct bitmap_ipc_elem); - map->offset[IPSET_OFFSET_COUNTER] = - offsetof(struct bitmap_ipc_elem, counter); - - if (!init_map_ip(set, map, first_ip, last_ip, - elements, hosts, netmask)) { - kfree(map); - return -ENOMEM; - } - } - } else if (tb[IPSET_ATTR_TIMEOUT]) { - map->dsize = sizeof(struct bitmap_ipt_elem); - map->offset[IPSET_OFFSET_TIMEOUT] = - offsetof(struct bitmap_ipt_elem, timeout); - - if (!init_map_ip(set, map, first_ip, last_ip, - elements, hosts, netmask)) { - kfree(map); - return -ENOMEM; - } - - map->timeout = ip_set_timeout_uget(tb[IPSET_ATTR_TIMEOUT]); - set->extensions |= IPSET_EXT_TIMEOUT; - + set->dsize = ip_set_elem_len(set, tb, 0); + if (!init_map_ip(set, map, first_ip, last_ip, + elements, hosts, netmask)) { + kfree(map); + return -ENOMEM; + } + if (tb[IPSET_ATTR_TIMEOUT]) { + set->timeout = ip_set_timeout_uget(tb[IPSET_ATTR_TIMEOUT]); bitmap_ip_gc_init(set, bitmap_ip_gc); - } else { - map->dsize = 0; - if (!init_map_ip(set, map, first_ip, last_ip, - elements, hosts, netmask)) { - kfree(map); - return -ENOMEM; - } } return 0; } @@ -401,8 +337,8 @@ static struct ip_set_type bitmap_ip_type __read_mostly = { .features = IPSET_TYPE_IP, .dimension = IPSET_DIM_ONE, .family = NFPROTO_IPV4, - .revision_min = REVISION_MIN, - .revision_max = REVISION_MAX, + .revision_min = IPSET_TYPE_REV_MIN, + .revision_max = IPSET_TYPE_REV_MAX, .create = bitmap_ip_create, .create_policy = { [IPSET_ATTR_IP] = { .type = NLA_NESTED }, @@ -420,6 +356,7 @@ static struct ip_set_type bitmap_ip_type __read_mostly = { [IPSET_ATTR_LINENO] = { .type = NLA_U32 }, [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, + [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING }, }, .me = THIS_MODULE, }; diff --git a/net/netfilter/ipset/ip_set_bitmap_ipmac.c b/net/netfilter/ipset/ip_set_bitmap_ipmac.c index 3b30e0bef890..740eabededd9 100644 --- a/net/netfilter/ipset/ip_set_bitmap_ipmac.c +++ b/net/netfilter/ipset/ip_set_bitmap_ipmac.c @@ -25,12 +25,13 @@ #include <linux/netfilter/ipset/ip_set.h> #include <linux/netfilter/ipset/ip_set_bitmap.h> -#define REVISION_MIN 0 -#define REVISION_MAX 1 /* Counter support added */ +#define IPSET_TYPE_REV_MIN 0 +/* 1 Counter support added */ +#define IPSET_TYPE_REV_MAX 2 /* Comment support added */ MODULE_LICENSE("GPL"); MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>"); -IP_SET_MODULE_DESC("bitmap:ip,mac", REVISION_MIN, REVISION_MAX); +IP_SET_MODULE_DESC("bitmap:ip,mac", IPSET_TYPE_REV_MIN, IPSET_TYPE_REV_MAX); MODULE_ALIAS("ip_set_bitmap:ip,mac"); #define MTYPE bitmap_ipmac @@ -48,11 +49,8 @@ struct bitmap_ipmac { u32 first_ip; /* host byte order, included in range */ u32 last_ip; /* host byte order, included in range */ u32 elements; /* number of max elements in the set */ - u32 timeout; /* timeout value */ - struct timer_list gc; /* garbage collector */ size_t memsize; /* members size */ - size_t dsize; /* size of element */ - size_t offset[IPSET_OFFSET_MAX]; /* Offsets to extensions */ + struct timer_list gc; /* garbage collector */ }; /* ADT structure for generic function args */ @@ -82,13 +80,13 @@ get_elem(void *extensions, u16 id, size_t dsize) static inline int bitmap_ipmac_do_test(const struct bitmap_ipmac_adt_elem *e, - const struct bitmap_ipmac *map) + const struct bitmap_ipmac *map, size_t dsize) { const struct bitmap_ipmac_elem *elem; if (!test_bit(e->id, map->members)) return 0; - elem = get_elem(map->extensions, e->id, map->dsize); + elem = get_elem(map->extensions, e->id, dsize); if (elem->filled == MAC_FILLED) return e->ether == NULL || ether_addr_equal(e->ether, elem->ether); @@ -97,13 +95,13 @@ bitmap_ipmac_do_test(const struct bitmap_ipmac_adt_elem *e, } static inline int -bitmap_ipmac_gc_test(u16 id, const struct bitmap_ipmac *map) +bitmap_ipmac_gc_test(u16 id, const struct bitmap_ipmac *map, size_t dsize) { const struct bitmap_ipmac_elem *elem; if (!test_bit(id, map->members)) return 0; - elem = get_elem(map->extensions, id, map->dsize); + elem = get_elem(map->extensions, id, dsize); /* Timer not started for the incomplete elements */ return elem->filled == MAC_FILLED; } @@ -117,13 +115,13 @@ bitmap_ipmac_is_filled(const struct bitmap_ipmac_elem *elem) static inline int bitmap_ipmac_add_timeout(unsigned long *timeout, const struct bitmap_ipmac_adt_elem *e, - const struct ip_set_ext *ext, + const struct ip_set_ext *ext, struct ip_set *set, struct bitmap_ipmac *map, int mode) { u32 t = ext->timeout; if (mode == IPSET_ADD_START_STORED_TIMEOUT) { - if (t == map->timeout) + if (t == set->timeout) /* Timeout was not specified, get stored one */ t = *timeout; ip_set_timeout_set(timeout, t); @@ -142,11 +140,11 @@ bitmap_ipmac_add_timeout(unsigned long *timeout, static inline int bitmap_ipmac_do_add(const struct bitmap_ipmac_adt_elem *e, - struct bitmap_ipmac *map, u32 flags) + struct bitmap_ipmac *map, u32 flags, size_t dsize) { struct bitmap_ipmac_elem *elem; - elem = get_elem(map->extensions, e->id, map->dsize); + elem = get_elem(map->extensions, e->id, dsize); if (test_and_set_bit(e->id, map->members)) { if (elem->filled == MAC_FILLED) { if (e->ether && (flags & IPSET_FLAG_EXIST)) @@ -178,22 +176,12 @@ bitmap_ipmac_do_del(const struct bitmap_ipmac_adt_elem *e, return !test_and_clear_bit(e->id, map->members); } -static inline unsigned long -ip_set_timeout_stored(struct bitmap_ipmac *map, u32 id, unsigned long *timeout) -{ - const struct bitmap_ipmac_elem *elem = - get_elem(map->extensions, id, map->dsize); - - return elem->filled == MAC_FILLED ? ip_set_timeout_get(timeout) : - *timeout; -} - static inline int bitmap_ipmac_do_list(struct sk_buff *skb, const struct bitmap_ipmac *map, - u32 id) + u32 id, size_t dsize) { const struct bitmap_ipmac_elem *elem = - get_elem(map->extensions, id, map->dsize); + get_elem(map->extensions, id, dsize); return nla_put_ipaddr4(skb, IPSET_ATTR_IP, htonl(map->first_ip + id)) || @@ -216,7 +204,7 @@ bitmap_ipmac_kadt(struct ip_set *set, const struct sk_buff *skb, struct bitmap_ipmac *map = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct bitmap_ipmac_adt_elem e = {}; - struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, map); + struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); u32 ip; /* MAC can be src only */ @@ -245,8 +233,8 @@ bitmap_ipmac_uadt(struct ip_set *set, struct nlattr *tb[], const struct bitmap_ipmac *map = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct bitmap_ipmac_adt_elem e = {}; - struct ip_set_ext ext = IP_SET_INIT_UEXT(map); - u32 ip; + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); + u32 ip = 0; int ret = 0; if (unlikely(!tb[IPSET_ATTR_IP] || @@ -285,43 +273,12 @@ bitmap_ipmac_same_set(const struct ip_set *a, const struct ip_set *b) return x->first_ip == y->first_ip && x->last_ip == y->last_ip && - x->timeout == y->timeout && + a->timeout == b->timeout && a->extensions == b->extensions; } /* Plain variant */ -/* Timeout variant */ - -struct bitmap_ipmact_elem { - struct { - unsigned char ether[ETH_ALEN]; - unsigned char filled; - } __attribute__ ((aligned)); - unsigned long timeout; -}; - -/* Plain variant with counter */ - -struct bitmap_ipmacc_elem { - struct { - unsigned char ether[ETH_ALEN]; - unsigned char filled; - } __attribute__ ((aligned)); - struct ip_set_counter counter; -}; - -/* Timeout variant with counter */ - -struct bitmap_ipmacct_elem { - struct { - unsigned char ether[ETH_ALEN]; - unsigned char filled; - } __attribute__ ((aligned)); - unsigned long timeout; - struct ip_set_counter counter; -}; - #include "ip_set_bitmap_gen.h" /* Create bitmap:ip,mac type of sets */ @@ -330,11 +287,11 @@ static bool init_map_ipmac(struct ip_set *set, struct bitmap_ipmac *map, u32 first_ip, u32 last_ip, u32 elements) { - map->members = ip_set_alloc((last_ip - first_ip + 1) * map->dsize); + map->members = ip_set_alloc(map->memsize); if (!map->members) return false; - if (map->dsize) { - map->extensions = ip_set_alloc(map->dsize * elements); + if (set->dsize) { + map->extensions = ip_set_alloc(set->dsize * elements); if (!map->extensions) { kfree(map->members); return false; @@ -343,7 +300,7 @@ init_map_ipmac(struct ip_set *set, struct bitmap_ipmac *map, map->first_ip = first_ip; map->last_ip = last_ip; map->elements = elements; - map->timeout = IPSET_NO_TIMEOUT; + set->timeout = IPSET_NO_TIMEOUT; set->data = map; set->family = NFPROTO_IPV4; @@ -352,10 +309,10 @@ init_map_ipmac(struct ip_set *set, struct bitmap_ipmac *map, } static int -bitmap_ipmac_create(struct ip_set *set, struct nlattr *tb[], +bitmap_ipmac_create(struct net *net, struct ip_set *set, struct nlattr *tb[], u32 flags) { - u32 first_ip, last_ip, cadt_flags = 0; + u32 first_ip = 0, last_ip = 0; u64 elements; struct bitmap_ipmac *map; int ret; @@ -399,57 +356,15 @@ bitmap_ipmac_create(struct ip_set *set, struct nlattr *tb[], map->memsize = bitmap_bytes(0, elements - 1); set->variant = &bitmap_ipmac; - if (tb[IPSET_ATTR_CADT_FLAGS]) - cadt_flags = ip_set_get_h32(tb[IPSET_ATTR_CADT_FLAGS]); - if (cadt_flags & IPSET_FLAG_WITH_COUNTERS) { - set->extensions |= IPSET_EXT_COUNTER; - if (tb[IPSET_ATTR_TIMEOUT]) { - map->dsize = sizeof(struct bitmap_ipmacct_elem); - map->offset[IPSET_OFFSET_TIMEOUT] = - offsetof(struct bitmap_ipmacct_elem, timeout); - map->offset[IPSET_OFFSET_COUNTER] = - offsetof(struct bitmap_ipmacct_elem, counter); - - if (!init_map_ipmac(set, map, first_ip, last_ip, - elements)) { - kfree(map); - return -ENOMEM; - } - map->timeout = ip_set_timeout_uget( - tb[IPSET_ATTR_TIMEOUT]); - set->extensions |= IPSET_EXT_TIMEOUT; - bitmap_ipmac_gc_init(set, bitmap_ipmac_gc); - } else { - map->dsize = sizeof(struct bitmap_ipmacc_elem); - map->offset[IPSET_OFFSET_COUNTER] = - offsetof(struct bitmap_ipmacc_elem, counter); - - if (!init_map_ipmac(set, map, first_ip, last_ip, - elements)) { - kfree(map); - return -ENOMEM; - } - } - } else if (tb[IPSET_ATTR_TIMEOUT]) { - map->dsize = sizeof(struct bitmap_ipmact_elem); - map->offset[IPSET_OFFSET_TIMEOUT] = - offsetof(struct bitmap_ipmact_elem, timeout); - - if (!init_map_ipmac(set, map, first_ip, last_ip, elements)) { - kfree(map); - return -ENOMEM; - } - map->timeout = ip_set_timeout_uget(tb[IPSET_ATTR_TIMEOUT]); - set->extensions |= IPSET_EXT_TIMEOUT; + set->dsize = ip_set_elem_len(set, tb, + sizeof(struct bitmap_ipmac_elem)); + if (!init_map_ipmac(set, map, first_ip, last_ip, elements)) { + kfree(map); + return -ENOMEM; + } + if (tb[IPSET_ATTR_TIMEOUT]) { + set->timeout = ip_set_timeout_uget(tb[IPSET_ATTR_TIMEOUT]); bitmap_ipmac_gc_init(set, bitmap_ipmac_gc); - } else { - map->dsize = sizeof(struct bitmap_ipmac_elem); - - if (!init_map_ipmac(set, map, first_ip, last_ip, elements)) { - kfree(map); - return -ENOMEM; - } - set->variant = &bitmap_ipmac; } return 0; } @@ -460,8 +375,8 @@ static struct ip_set_type bitmap_ipmac_type = { .features = IPSET_TYPE_IP | IPSET_TYPE_MAC, .dimension = IPSET_DIM_TWO, .family = NFPROTO_IPV4, - .revision_min = REVISION_MIN, - .revision_max = REVISION_MAX, + .revision_min = IPSET_TYPE_REV_MIN, + .revision_max = IPSET_TYPE_REV_MAX, .create = bitmap_ipmac_create, .create_policy = { [IPSET_ATTR_IP] = { .type = NLA_NESTED }, @@ -478,6 +393,7 @@ static struct ip_set_type bitmap_ipmac_type = { [IPSET_ATTR_LINENO] = { .type = NLA_U32 }, [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, + [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING }, }, .me = THIS_MODULE, }; diff --git a/net/netfilter/ipset/ip_set_bitmap_port.c b/net/netfilter/ipset/ip_set_bitmap_port.c index 8207d1fda528..cf99676e69f8 100644 --- a/net/netfilter/ipset/ip_set_bitmap_port.c +++ b/net/netfilter/ipset/ip_set_bitmap_port.c @@ -20,12 +20,13 @@ #include <linux/netfilter/ipset/ip_set_bitmap.h> #include <linux/netfilter/ipset/ip_set_getport.h> -#define REVISION_MIN 0 -#define REVISION_MAX 1 /* Counter support added */ +#define IPSET_TYPE_REV_MIN 0 +/* 1 Counter support added */ +#define IPSET_TYPE_REV_MAX 2 /* Comment support added */ MODULE_LICENSE("GPL"); MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>"); -IP_SET_MODULE_DESC("bitmap:port", REVISION_MIN, REVISION_MAX); +IP_SET_MODULE_DESC("bitmap:port", IPSET_TYPE_REV_MIN, IPSET_TYPE_REV_MAX); MODULE_ALIAS("ip_set_bitmap:port"); #define MTYPE bitmap_port @@ -38,9 +39,6 @@ struct bitmap_port { u16 last_port; /* host byte order, included in range */ u32 elements; /* number of max elements in the set */ size_t memsize; /* members size */ - size_t dsize; /* extensions struct size */ - size_t offset[IPSET_OFFSET_MAX]; /* Offsets to extensions */ - u32 timeout; /* timeout parameter */ struct timer_list gc; /* garbage collection */ }; @@ -59,20 +57,20 @@ port_to_id(const struct bitmap_port *m, u16 port) static inline int bitmap_port_do_test(const struct bitmap_port_adt_elem *e, - const struct bitmap_port *map) + const struct bitmap_port *map, size_t dsize) { return !!test_bit(e->id, map->members); } static inline int -bitmap_port_gc_test(u16 id, const struct bitmap_port *map) +bitmap_port_gc_test(u16 id, const struct bitmap_port *map, size_t dsize) { return !!test_bit(id, map->members); } static inline int bitmap_port_do_add(const struct bitmap_port_adt_elem *e, - struct bitmap_port *map, u32 flags) + struct bitmap_port *map, u32 flags, size_t dsize) { return !!test_and_set_bit(e->id, map->members); } @@ -85,7 +83,8 @@ bitmap_port_do_del(const struct bitmap_port_adt_elem *e, } static inline int -bitmap_port_do_list(struct sk_buff *skb, const struct bitmap_port *map, u32 id) +bitmap_port_do_list(struct sk_buff *skb, const struct bitmap_port *map, u32 id, + size_t dsize) { return nla_put_net16(skb, IPSET_ATTR_PORT, htons(map->first_port + id)); @@ -106,7 +105,7 @@ bitmap_port_kadt(struct ip_set *set, const struct sk_buff *skb, struct bitmap_port *map = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct bitmap_port_adt_elem e = {}; - struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, map); + struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); __be16 __port; u16 port = 0; @@ -131,7 +130,7 @@ bitmap_port_uadt(struct ip_set *set, struct nlattr *tb[], struct bitmap_port *map = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct bitmap_port_adt_elem e = {}; - struct ip_set_ext ext = IP_SET_INIT_UEXT(map); + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); u32 port; /* wraparound */ u16 port_to; int ret = 0; @@ -191,7 +190,7 @@ bitmap_port_same_set(const struct ip_set *a, const struct ip_set *b) return x->first_port == y->first_port && x->last_port == y->last_port && - x->timeout == y->timeout && + a->timeout == b->timeout && a->extensions == b->extensions; } @@ -200,25 +199,6 @@ bitmap_port_same_set(const struct ip_set *a, const struct ip_set *b) struct bitmap_port_elem { }; -/* Timeout variant */ - -struct bitmap_portt_elem { - unsigned long timeout; -}; - -/* Plain variant with counter */ - -struct bitmap_portc_elem { - struct ip_set_counter counter; -}; - -/* Timeout variant with counter */ - -struct bitmap_portct_elem { - unsigned long timeout; - struct ip_set_counter counter; -}; - #include "ip_set_bitmap_gen.h" /* Create bitmap:ip type of sets */ @@ -230,8 +210,8 @@ init_map_port(struct ip_set *set, struct bitmap_port *map, map->members = ip_set_alloc(map->memsize); if (!map->members) return false; - if (map->dsize) { - map->extensions = ip_set_alloc(map->dsize * map->elements); + if (set->dsize) { + map->extensions = ip_set_alloc(set->dsize * map->elements); if (!map->extensions) { kfree(map->members); return false; @@ -239,7 +219,7 @@ init_map_port(struct ip_set *set, struct bitmap_port *map, } map->first_port = first_port; map->last_port = last_port; - map->timeout = IPSET_NO_TIMEOUT; + set->timeout = IPSET_NO_TIMEOUT; set->data = map; set->family = NFPROTO_UNSPEC; @@ -248,11 +228,11 @@ init_map_port(struct ip_set *set, struct bitmap_port *map, } static int -bitmap_port_create(struct ip_set *set, struct nlattr *tb[], u32 flags) +bitmap_port_create(struct net *net, struct ip_set *set, struct nlattr *tb[], + u32 flags) { struct bitmap_port *map; u16 first_port, last_port; - u32 cadt_flags = 0; if (unlikely(!ip_set_attr_netorder(tb, IPSET_ATTR_PORT) || !ip_set_attr_netorder(tb, IPSET_ATTR_PORT_TO) || @@ -274,55 +254,16 @@ bitmap_port_create(struct ip_set *set, struct nlattr *tb[], u32 flags) return -ENOMEM; map->elements = last_port - first_port + 1; - map->memsize = map->elements * sizeof(unsigned long); + map->memsize = bitmap_bytes(0, map->elements); set->variant = &bitmap_port; - if (tb[IPSET_ATTR_CADT_FLAGS]) - cadt_flags = ip_set_get_h32(tb[IPSET_ATTR_CADT_FLAGS]); - if (cadt_flags & IPSET_FLAG_WITH_COUNTERS) { - set->extensions |= IPSET_EXT_COUNTER; - if (tb[IPSET_ATTR_TIMEOUT]) { - map->dsize = sizeof(struct bitmap_portct_elem); - map->offset[IPSET_OFFSET_TIMEOUT] = - offsetof(struct bitmap_portct_elem, timeout); - map->offset[IPSET_OFFSET_COUNTER] = - offsetof(struct bitmap_portct_elem, counter); - if (!init_map_port(set, map, first_port, last_port)) { - kfree(map); - return -ENOMEM; - } - - map->timeout = - ip_set_timeout_uget(tb[IPSET_ATTR_TIMEOUT]); - set->extensions |= IPSET_EXT_TIMEOUT; - bitmap_port_gc_init(set, bitmap_port_gc); - } else { - map->dsize = sizeof(struct bitmap_portc_elem); - map->offset[IPSET_OFFSET_COUNTER] = - offsetof(struct bitmap_portc_elem, counter); - if (!init_map_port(set, map, first_port, last_port)) { - kfree(map); - return -ENOMEM; - } - } - } else if (tb[IPSET_ATTR_TIMEOUT]) { - map->dsize = sizeof(struct bitmap_portt_elem); - map->offset[IPSET_OFFSET_TIMEOUT] = - offsetof(struct bitmap_portt_elem, timeout); - if (!init_map_port(set, map, first_port, last_port)) { - kfree(map); - return -ENOMEM; - } - - map->timeout = ip_set_timeout_uget(tb[IPSET_ATTR_TIMEOUT]); - set->extensions |= IPSET_EXT_TIMEOUT; + set->dsize = ip_set_elem_len(set, tb, 0); + if (!init_map_port(set, map, first_port, last_port)) { + kfree(map); + return -ENOMEM; + } + if (tb[IPSET_ATTR_TIMEOUT]) { + set->timeout = ip_set_timeout_uget(tb[IPSET_ATTR_TIMEOUT]); bitmap_port_gc_init(set, bitmap_port_gc); - } else { - map->dsize = 0; - if (!init_map_port(set, map, first_port, last_port)) { - kfree(map); - return -ENOMEM; - } - } return 0; } @@ -333,8 +274,8 @@ static struct ip_set_type bitmap_port_type = { .features = IPSET_TYPE_PORT, .dimension = IPSET_DIM_ONE, .family = NFPROTO_UNSPEC, - .revision_min = REVISION_MIN, - .revision_max = REVISION_MAX, + .revision_min = IPSET_TYPE_REV_MIN, + .revision_max = IPSET_TYPE_REV_MAX, .create = bitmap_port_create, .create_policy = { [IPSET_ATTR_PORT] = { .type = NLA_U16 }, @@ -349,6 +290,7 @@ static struct ip_set_type bitmap_port_type = { [IPSET_ATTR_LINENO] = { .type = NLA_U32 }, [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, + [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING }, }, .me = THIS_MODULE, }; diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c index f2e30fb31e78..bac7e01df67f 100644 --- a/net/netfilter/ipset/ip_set_core.c +++ b/net/netfilter/ipset/ip_set_core.c @@ -17,6 +17,8 @@ #include <linux/spinlock.h> #include <linux/rculist.h> #include <net/netlink.h> +#include <net/net_namespace.h> +#include <net/netns/generic.h> #include <linux/netfilter.h> #include <linux/netfilter/x_tables.h> @@ -27,8 +29,17 @@ static LIST_HEAD(ip_set_type_list); /* all registered set types */ static DEFINE_MUTEX(ip_set_type_mutex); /* protects ip_set_type_list */ static DEFINE_RWLOCK(ip_set_ref_lock); /* protects the set refs */ -static struct ip_set * __rcu *ip_set_list; /* all individual sets */ -static ip_set_id_t ip_set_max = CONFIG_IP_SET_MAX; /* max number of sets */ +struct ip_set_net { + struct ip_set * __rcu *ip_set_list; /* all individual sets */ + ip_set_id_t ip_set_max; /* max number of sets */ + int is_deleted; /* deleted by ip_set_net_exit */ +}; +static int ip_set_net_id __read_mostly; + +static inline struct ip_set_net *ip_set_pernet(struct net *net) +{ + return net_generic(net, ip_set_net_id); +} #define IP_SET_INC 64 #define STREQ(a, b) (strncmp(a, b, IPSET_MAXNAMELEN) == 0) @@ -45,8 +56,8 @@ MODULE_ALIAS_NFNL_SUBSYS(NFNL_SUBSYS_IPSET); /* When the nfnl mutex is held: */ #define nfnl_dereference(p) \ rcu_dereference_protected(p, 1) -#define nfnl_set(id) \ - nfnl_dereference(ip_set_list)[id] +#define nfnl_set(inst, id) \ + nfnl_dereference((inst)->ip_set_list)[id] /* * The set types are implemented in modules and registered set types @@ -315,6 +326,60 @@ ip_set_get_ipaddr6(struct nlattr *nla, union nf_inet_addr *ipaddr) } EXPORT_SYMBOL_GPL(ip_set_get_ipaddr6); +typedef void (*destroyer)(void *); +/* ipset data extension types, in size order */ + +const struct ip_set_ext_type ip_set_extensions[] = { + [IPSET_EXT_ID_COUNTER] = { + .type = IPSET_EXT_COUNTER, + .flag = IPSET_FLAG_WITH_COUNTERS, + .len = sizeof(struct ip_set_counter), + .align = __alignof__(struct ip_set_counter), + }, + [IPSET_EXT_ID_TIMEOUT] = { + .type = IPSET_EXT_TIMEOUT, + .len = sizeof(unsigned long), + .align = __alignof__(unsigned long), + }, + [IPSET_EXT_ID_COMMENT] = { + .type = IPSET_EXT_COMMENT | IPSET_EXT_DESTROY, + .flag = IPSET_FLAG_WITH_COMMENT, + .len = sizeof(struct ip_set_comment), + .align = __alignof__(struct ip_set_comment), + .destroy = (destroyer) ip_set_comment_free, + }, +}; +EXPORT_SYMBOL_GPL(ip_set_extensions); + +static inline bool +add_extension(enum ip_set_ext_id id, u32 flags, struct nlattr *tb[]) +{ + return ip_set_extensions[id].flag ? + (flags & ip_set_extensions[id].flag) : + !!tb[IPSET_ATTR_TIMEOUT]; +} + +size_t +ip_set_elem_len(struct ip_set *set, struct nlattr *tb[], size_t len) +{ + enum ip_set_ext_id id; + size_t offset = 0; + u32 cadt_flags = 0; + + if (tb[IPSET_ATTR_CADT_FLAGS]) + cadt_flags = ip_set_get_h32(tb[IPSET_ATTR_CADT_FLAGS]); + for (id = 0; id < IPSET_EXT_ID_MAX; id++) { + if (!add_extension(id, cadt_flags, tb)) + continue; + offset += ALIGN(len + offset, ip_set_extensions[id].align); + set->offset[id] = offset; + set->extensions |= ip_set_extensions[id].type; + offset += ip_set_extensions[id].len; + } + return len + offset; +} +EXPORT_SYMBOL_GPL(ip_set_elem_len); + int ip_set_get_extensions(struct ip_set *set, struct nlattr *tb[], struct ip_set_ext *ext) @@ -334,6 +399,12 @@ ip_set_get_extensions(struct ip_set *set, struct nlattr *tb[], ext->packets = be64_to_cpu(nla_get_be64( tb[IPSET_ATTR_PACKETS])); } + if (tb[IPSET_ATTR_COMMENT]) { + if (!(set->extensions & IPSET_EXT_COMMENT)) + return -IPSET_ERR_COMMENT; + ext->comment = ip_set_comment_uget(tb[IPSET_ATTR_COMMENT]); + } + return 0; } EXPORT_SYMBOL_GPL(ip_set_get_extensions); @@ -374,13 +445,14 @@ __ip_set_put(struct ip_set *set) */ static inline struct ip_set * -ip_set_rcu_get(ip_set_id_t index) +ip_set_rcu_get(struct net *net, ip_set_id_t index) { struct ip_set *set; + struct ip_set_net *inst = ip_set_pernet(net); rcu_read_lock(); /* ip_set_list itself needs to be protected */ - set = rcu_dereference(ip_set_list)[index]; + set = rcu_dereference(inst->ip_set_list)[index]; rcu_read_unlock(); return set; @@ -390,7 +462,8 @@ int ip_set_test(ip_set_id_t index, const struct sk_buff *skb, const struct xt_action_param *par, struct ip_set_adt_opt *opt) { - struct ip_set *set = ip_set_rcu_get(index); + struct ip_set *set = ip_set_rcu_get( + dev_net(par->in ? par->in : par->out), index); int ret = 0; BUG_ON(set == NULL); @@ -428,7 +501,8 @@ int ip_set_add(ip_set_id_t index, const struct sk_buff *skb, const struct xt_action_param *par, struct ip_set_adt_opt *opt) { - struct ip_set *set = ip_set_rcu_get(index); + struct ip_set *set = ip_set_rcu_get( + dev_net(par->in ? par->in : par->out), index); int ret; BUG_ON(set == NULL); @@ -450,7 +524,8 @@ int ip_set_del(ip_set_id_t index, const struct sk_buff *skb, const struct xt_action_param *par, struct ip_set_adt_opt *opt) { - struct ip_set *set = ip_set_rcu_get(index); + struct ip_set *set = ip_set_rcu_get( + dev_net(par->in ? par->in : par->out), index); int ret = 0; BUG_ON(set == NULL); @@ -474,14 +549,15 @@ EXPORT_SYMBOL_GPL(ip_set_del); * */ ip_set_id_t -ip_set_get_byname(const char *name, struct ip_set **set) +ip_set_get_byname(struct net *net, const char *name, struct ip_set **set) { ip_set_id_t i, index = IPSET_INVALID_ID; struct ip_set *s; + struct ip_set_net *inst = ip_set_pernet(net); rcu_read_lock(); - for (i = 0; i < ip_set_max; i++) { - s = rcu_dereference(ip_set_list)[i]; + for (i = 0; i < inst->ip_set_max; i++) { + s = rcu_dereference(inst->ip_set_list)[i]; if (s != NULL && STREQ(s->name, name)) { __ip_set_get(s); index = i; @@ -501,17 +577,26 @@ EXPORT_SYMBOL_GPL(ip_set_get_byname); * to be valid, after calling this function. * */ -void -ip_set_put_byindex(ip_set_id_t index) + +static inline void +__ip_set_put_byindex(struct ip_set_net *inst, ip_set_id_t index) { struct ip_set *set; rcu_read_lock(); - set = rcu_dereference(ip_set_list)[index]; + set = rcu_dereference(inst->ip_set_list)[index]; if (set != NULL) __ip_set_put(set); rcu_read_unlock(); } + +void +ip_set_put_byindex(struct net *net, ip_set_id_t index) +{ + struct ip_set_net *inst = ip_set_pernet(net); + + __ip_set_put_byindex(inst, index); +} EXPORT_SYMBOL_GPL(ip_set_put_byindex); /* @@ -522,9 +607,9 @@ EXPORT_SYMBOL_GPL(ip_set_put_byindex); * */ const char * -ip_set_name_byindex(ip_set_id_t index) +ip_set_name_byindex(struct net *net, ip_set_id_t index) { - const struct ip_set *set = ip_set_rcu_get(index); + const struct ip_set *set = ip_set_rcu_get(net, index); BUG_ON(set == NULL); BUG_ON(set->ref == 0); @@ -546,14 +631,15 @@ EXPORT_SYMBOL_GPL(ip_set_name_byindex); * The nfnl mutex is used in the function. */ ip_set_id_t -ip_set_nfnl_get(const char *name) +ip_set_nfnl_get(struct net *net, const char *name) { ip_set_id_t i, index = IPSET_INVALID_ID; struct ip_set *s; + struct ip_set_net *inst = ip_set_pernet(net); nfnl_lock(NFNL_SUBSYS_IPSET); - for (i = 0; i < ip_set_max; i++) { - s = nfnl_set(i); + for (i = 0; i < inst->ip_set_max; i++) { + s = nfnl_set(inst, i); if (s != NULL && STREQ(s->name, name)) { __ip_set_get(s); index = i; @@ -573,15 +659,16 @@ EXPORT_SYMBOL_GPL(ip_set_nfnl_get); * The nfnl mutex is used in the function. */ ip_set_id_t -ip_set_nfnl_get_byindex(ip_set_id_t index) +ip_set_nfnl_get_byindex(struct net *net, ip_set_id_t index) { struct ip_set *set; + struct ip_set_net *inst = ip_set_pernet(net); - if (index > ip_set_max) + if (index > inst->ip_set_max) return IPSET_INVALID_ID; nfnl_lock(NFNL_SUBSYS_IPSET); - set = nfnl_set(index); + set = nfnl_set(inst, index); if (set) __ip_set_get(set); else @@ -600,13 +687,17 @@ EXPORT_SYMBOL_GPL(ip_set_nfnl_get_byindex); * The nfnl mutex is used in the function. */ void -ip_set_nfnl_put(ip_set_id_t index) +ip_set_nfnl_put(struct net *net, ip_set_id_t index) { struct ip_set *set; + struct ip_set_net *inst = ip_set_pernet(net); + nfnl_lock(NFNL_SUBSYS_IPSET); - set = nfnl_set(index); - if (set != NULL) - __ip_set_put(set); + if (!inst->is_deleted) { /* already deleted from ip_set_net_exit() */ + set = nfnl_set(inst, index); + if (set != NULL) + __ip_set_put(set); + } nfnl_unlock(NFNL_SUBSYS_IPSET); } EXPORT_SYMBOL_GPL(ip_set_nfnl_put); @@ -664,14 +755,14 @@ static const struct nla_policy ip_set_create_policy[IPSET_ATTR_CMD_MAX + 1] = { }; static struct ip_set * -find_set_and_id(const char *name, ip_set_id_t *id) +find_set_and_id(struct ip_set_net *inst, const char *name, ip_set_id_t *id) { struct ip_set *set = NULL; ip_set_id_t i; *id = IPSET_INVALID_ID; - for (i = 0; i < ip_set_max; i++) { - set = nfnl_set(i); + for (i = 0; i < inst->ip_set_max; i++) { + set = nfnl_set(inst, i); if (set != NULL && STREQ(set->name, name)) { *id = i; break; @@ -681,22 +772,23 @@ find_set_and_id(const char *name, ip_set_id_t *id) } static inline struct ip_set * -find_set(const char *name) +find_set(struct ip_set_net *inst, const char *name) { ip_set_id_t id; - return find_set_and_id(name, &id); + return find_set_and_id(inst, name, &id); } static int -find_free_id(const char *name, ip_set_id_t *index, struct ip_set **set) +find_free_id(struct ip_set_net *inst, const char *name, ip_set_id_t *index, + struct ip_set **set) { struct ip_set *s; ip_set_id_t i; *index = IPSET_INVALID_ID; - for (i = 0; i < ip_set_max; i++) { - s = nfnl_set(i); + for (i = 0; i < inst->ip_set_max; i++) { + s = nfnl_set(inst, i); if (s == NULL) { if (*index == IPSET_INVALID_ID) *index = i; @@ -725,6 +817,8 @@ ip_set_create(struct sock *ctnl, struct sk_buff *skb, const struct nlmsghdr *nlh, const struct nlattr * const attr[]) { + struct net *net = sock_net(ctnl); + struct ip_set_net *inst = ip_set_pernet(net); struct ip_set *set, *clash = NULL; ip_set_id_t index = IPSET_INVALID_ID; struct nlattr *tb[IPSET_ATTR_CREATE_MAX+1] = {}; @@ -783,7 +877,7 @@ ip_set_create(struct sock *ctnl, struct sk_buff *skb, goto put_out; } - ret = set->type->create(set, tb, flags); + ret = set->type->create(net, set, tb, flags); if (ret != 0) goto put_out; @@ -794,7 +888,7 @@ ip_set_create(struct sock *ctnl, struct sk_buff *skb, * by the nfnl mutex. Find the first free index in ip_set_list * and check clashing. */ - ret = find_free_id(set->name, &index, &clash); + ret = find_free_id(inst, set->name, &index, &clash); if (ret == -EEXIST) { /* If this is the same set and requested, ignore error */ if ((flags & IPSET_FLAG_EXIST) && @@ -807,9 +901,9 @@ ip_set_create(struct sock *ctnl, struct sk_buff *skb, goto cleanup; } else if (ret == -IPSET_ERR_MAX_SETS) { struct ip_set **list, **tmp; - ip_set_id_t i = ip_set_max + IP_SET_INC; + ip_set_id_t i = inst->ip_set_max + IP_SET_INC; - if (i < ip_set_max || i == IPSET_INVALID_ID) + if (i < inst->ip_set_max || i == IPSET_INVALID_ID) /* Wraparound */ goto cleanup; @@ -817,14 +911,14 @@ ip_set_create(struct sock *ctnl, struct sk_buff *skb, if (!list) goto cleanup; /* nfnl mutex is held, both lists are valid */ - tmp = nfnl_dereference(ip_set_list); - memcpy(list, tmp, sizeof(struct ip_set *) * ip_set_max); - rcu_assign_pointer(ip_set_list, list); + tmp = nfnl_dereference(inst->ip_set_list); + memcpy(list, tmp, sizeof(struct ip_set *) * inst->ip_set_max); + rcu_assign_pointer(inst->ip_set_list, list); /* Make sure all current packets have passed through */ synchronize_net(); /* Use new list */ - index = ip_set_max; - ip_set_max = i; + index = inst->ip_set_max; + inst->ip_set_max = i; kfree(tmp); ret = 0; } else if (ret) @@ -834,7 +928,7 @@ ip_set_create(struct sock *ctnl, struct sk_buff *skb, * Finally! Add our shiny new set to the list, and be done. */ pr_debug("create: '%s' created with index %u!\n", set->name, index); - nfnl_set(index) = set; + nfnl_set(inst, index) = set; return ret; @@ -857,12 +951,12 @@ ip_set_setname_policy[IPSET_ATTR_CMD_MAX + 1] = { }; static void -ip_set_destroy_set(ip_set_id_t index) +ip_set_destroy_set(struct ip_set_net *inst, ip_set_id_t index) { - struct ip_set *set = nfnl_set(index); + struct ip_set *set = nfnl_set(inst, index); pr_debug("set: %s\n", set->name); - nfnl_set(index) = NULL; + nfnl_set(inst, index) = NULL; /* Must call it without holding any lock */ set->variant->destroy(set); @@ -875,6 +969,7 @@ ip_set_destroy(struct sock *ctnl, struct sk_buff *skb, const struct nlmsghdr *nlh, const struct nlattr * const attr[]) { + struct ip_set_net *inst = ip_set_pernet(sock_net(ctnl)); struct ip_set *s; ip_set_id_t i; int ret = 0; @@ -894,21 +989,22 @@ ip_set_destroy(struct sock *ctnl, struct sk_buff *skb, */ read_lock_bh(&ip_set_ref_lock); if (!attr[IPSET_ATTR_SETNAME]) { - for (i = 0; i < ip_set_max; i++) { - s = nfnl_set(i); + for (i = 0; i < inst->ip_set_max; i++) { + s = nfnl_set(inst, i); if (s != NULL && s->ref) { ret = -IPSET_ERR_BUSY; goto out; } } read_unlock_bh(&ip_set_ref_lock); - for (i = 0; i < ip_set_max; i++) { - s = nfnl_set(i); + for (i = 0; i < inst->ip_set_max; i++) { + s = nfnl_set(inst, i); if (s != NULL) - ip_set_destroy_set(i); + ip_set_destroy_set(inst, i); } } else { - s = find_set_and_id(nla_data(attr[IPSET_ATTR_SETNAME]), &i); + s = find_set_and_id(inst, nla_data(attr[IPSET_ATTR_SETNAME]), + &i); if (s == NULL) { ret = -ENOENT; goto out; @@ -918,7 +1014,7 @@ ip_set_destroy(struct sock *ctnl, struct sk_buff *skb, } read_unlock_bh(&ip_set_ref_lock); - ip_set_destroy_set(i); + ip_set_destroy_set(inst, i); } return 0; out: @@ -943,6 +1039,7 @@ ip_set_flush(struct sock *ctnl, struct sk_buff *skb, const struct nlmsghdr *nlh, const struct nlattr * const attr[]) { + struct ip_set_net *inst = ip_set_pernet(sock_net(ctnl)); struct ip_set *s; ip_set_id_t i; @@ -950,13 +1047,13 @@ ip_set_flush(struct sock *ctnl, struct sk_buff *skb, return -IPSET_ERR_PROTOCOL; if (!attr[IPSET_ATTR_SETNAME]) { - for (i = 0; i < ip_set_max; i++) { - s = nfnl_set(i); + for (i = 0; i < inst->ip_set_max; i++) { + s = nfnl_set(inst, i); if (s != NULL) ip_set_flush_set(s); } } else { - s = find_set(nla_data(attr[IPSET_ATTR_SETNAME])); + s = find_set(inst, nla_data(attr[IPSET_ATTR_SETNAME])); if (s == NULL) return -ENOENT; @@ -982,6 +1079,7 @@ ip_set_rename(struct sock *ctnl, struct sk_buff *skb, const struct nlmsghdr *nlh, const struct nlattr * const attr[]) { + struct ip_set_net *inst = ip_set_pernet(sock_net(ctnl)); struct ip_set *set, *s; const char *name2; ip_set_id_t i; @@ -992,7 +1090,7 @@ ip_set_rename(struct sock *ctnl, struct sk_buff *skb, attr[IPSET_ATTR_SETNAME2] == NULL)) return -IPSET_ERR_PROTOCOL; - set = find_set(nla_data(attr[IPSET_ATTR_SETNAME])); + set = find_set(inst, nla_data(attr[IPSET_ATTR_SETNAME])); if (set == NULL) return -ENOENT; @@ -1003,8 +1101,8 @@ ip_set_rename(struct sock *ctnl, struct sk_buff *skb, } name2 = nla_data(attr[IPSET_ATTR_SETNAME2]); - for (i = 0; i < ip_set_max; i++) { - s = nfnl_set(i); + for (i = 0; i < inst->ip_set_max; i++) { + s = nfnl_set(inst, i); if (s != NULL && STREQ(s->name, name2)) { ret = -IPSET_ERR_EXIST_SETNAME2; goto out; @@ -1031,6 +1129,7 @@ ip_set_swap(struct sock *ctnl, struct sk_buff *skb, const struct nlmsghdr *nlh, const struct nlattr * const attr[]) { + struct ip_set_net *inst = ip_set_pernet(sock_net(ctnl)); struct ip_set *from, *to; ip_set_id_t from_id, to_id; char from_name[IPSET_MAXNAMELEN]; @@ -1040,11 +1139,13 @@ ip_set_swap(struct sock *ctnl, struct sk_buff *skb, attr[IPSET_ATTR_SETNAME2] == NULL)) return -IPSET_ERR_PROTOCOL; - from = find_set_and_id(nla_data(attr[IPSET_ATTR_SETNAME]), &from_id); + from = find_set_and_id(inst, nla_data(attr[IPSET_ATTR_SETNAME]), + &from_id); if (from == NULL) return -ENOENT; - to = find_set_and_id(nla_data(attr[IPSET_ATTR_SETNAME2]), &to_id); + to = find_set_and_id(inst, nla_data(attr[IPSET_ATTR_SETNAME2]), + &to_id); if (to == NULL) return -IPSET_ERR_EXIST_SETNAME2; @@ -1061,8 +1162,8 @@ ip_set_swap(struct sock *ctnl, struct sk_buff *skb, write_lock_bh(&ip_set_ref_lock); swap(from->ref, to->ref); - nfnl_set(from_id) = to; - nfnl_set(to_id) = from; + nfnl_set(inst, from_id) = to; + nfnl_set(inst, to_id) = from; write_unlock_bh(&ip_set_ref_lock); return 0; @@ -1081,9 +1182,12 @@ ip_set_swap(struct sock *ctnl, struct sk_buff *skb, static int ip_set_dump_done(struct netlink_callback *cb) { - if (cb->args[2]) { - pr_debug("release set %s\n", nfnl_set(cb->args[1])->name); - ip_set_put_byindex((ip_set_id_t) cb->args[1]); + struct ip_set_net *inst = (struct ip_set_net *)cb->args[IPSET_CB_NET]; + if (cb->args[IPSET_CB_ARG0]) { + pr_debug("release set %s\n", + nfnl_set(inst, cb->args[IPSET_CB_INDEX])->name); + __ip_set_put_byindex(inst, + (ip_set_id_t) cb->args[IPSET_CB_INDEX]); } return 0; } @@ -1101,7 +1205,7 @@ dump_attrs(struct nlmsghdr *nlh) } static int -dump_init(struct netlink_callback *cb) +dump_init(struct netlink_callback *cb, struct ip_set_net *inst) { struct nlmsghdr *nlh = nlmsg_hdr(cb->skb); int min_len = nlmsg_total_size(sizeof(struct nfgenmsg)); @@ -1114,21 +1218,22 @@ dump_init(struct netlink_callback *cb) nla_parse(cda, IPSET_ATTR_CMD_MAX, attr, nlh->nlmsg_len - min_len, ip_set_setname_policy); - /* cb->args[0] : dump single set/all sets - * [1] : set index - * [..]: type specific + /* cb->args[IPSET_CB_NET]: net namespace + * [IPSET_CB_DUMP]: dump single set/all sets + * [IPSET_CB_INDEX]: set index + * [IPSET_CB_ARG0]: type specific */ if (cda[IPSET_ATTR_SETNAME]) { struct ip_set *set; - set = find_set_and_id(nla_data(cda[IPSET_ATTR_SETNAME]), + set = find_set_and_id(inst, nla_data(cda[IPSET_ATTR_SETNAME]), &index); if (set == NULL) return -ENOENT; dump_type = DUMP_ONE; - cb->args[1] = index; + cb->args[IPSET_CB_INDEX] = index; } else dump_type = DUMP_ALL; @@ -1136,7 +1241,8 @@ dump_init(struct netlink_callback *cb) u32 f = ip_set_get_h32(cda[IPSET_ATTR_FLAGS]); dump_type |= (f << 16); } - cb->args[0] = dump_type; + cb->args[IPSET_CB_NET] = (unsigned long)inst; + cb->args[IPSET_CB_DUMP] = dump_type; return 0; } @@ -1148,11 +1254,12 @@ ip_set_dump_start(struct sk_buff *skb, struct netlink_callback *cb) struct ip_set *set = NULL; struct nlmsghdr *nlh = NULL; unsigned int flags = NETLINK_CB(cb->skb).portid ? NLM_F_MULTI : 0; + struct ip_set_net *inst = ip_set_pernet(sock_net(skb->sk)); u32 dump_type, dump_flags; int ret = 0; - if (!cb->args[0]) { - ret = dump_init(cb); + if (!cb->args[IPSET_CB_DUMP]) { + ret = dump_init(cb, inst); if (ret < 0) { nlh = nlmsg_hdr(cb->skb); /* We have to create and send the error message @@ -1163,18 +1270,19 @@ ip_set_dump_start(struct sk_buff *skb, struct netlink_callback *cb) } } - if (cb->args[1] >= ip_set_max) + if (cb->args[IPSET_CB_INDEX] >= inst->ip_set_max) goto out; - dump_type = DUMP_TYPE(cb->args[0]); - dump_flags = DUMP_FLAGS(cb->args[0]); - max = dump_type == DUMP_ONE ? cb->args[1] + 1 : ip_set_max; + dump_type = DUMP_TYPE(cb->args[IPSET_CB_DUMP]); + dump_flags = DUMP_FLAGS(cb->args[IPSET_CB_DUMP]); + max = dump_type == DUMP_ONE ? cb->args[IPSET_CB_INDEX] + 1 + : inst->ip_set_max; dump_last: - pr_debug("args[0]: %u %u args[1]: %ld\n", - dump_type, dump_flags, cb->args[1]); - for (; cb->args[1] < max; cb->args[1]++) { - index = (ip_set_id_t) cb->args[1]; - set = nfnl_set(index); + pr_debug("dump type, flag: %u %u index: %ld\n", + dump_type, dump_flags, cb->args[IPSET_CB_INDEX]); + for (; cb->args[IPSET_CB_INDEX] < max; cb->args[IPSET_CB_INDEX]++) { + index = (ip_set_id_t) cb->args[IPSET_CB_INDEX]; + set = nfnl_set(inst, index); if (set == NULL) { if (dump_type == DUMP_ONE) { ret = -ENOENT; @@ -1190,7 +1298,7 @@ dump_last: !!(set->type->features & IPSET_DUMP_LAST))) continue; pr_debug("List set: %s\n", set->name); - if (!cb->args[2]) { + if (!cb->args[IPSET_CB_ARG0]) { /* Start listing: make sure set won't be destroyed */ pr_debug("reference set\n"); __ip_set_get(set); @@ -1207,7 +1315,7 @@ dump_last: goto nla_put_failure; if (dump_flags & IPSET_FLAG_LIST_SETNAME) goto next_set; - switch (cb->args[2]) { + switch (cb->args[IPSET_CB_ARG0]) { case 0: /* Core header data */ if (nla_put_string(skb, IPSET_ATTR_TYPENAME, @@ -1227,7 +1335,7 @@ dump_last: read_lock_bh(&set->lock); ret = set->variant->list(set, skb, cb); read_unlock_bh(&set->lock); - if (!cb->args[2]) + if (!cb->args[IPSET_CB_ARG0]) /* Set is done, proceed with next one */ goto next_set; goto release_refcount; @@ -1236,8 +1344,8 @@ dump_last: /* If we dump all sets, continue with dumping last ones */ if (dump_type == DUMP_ALL) { dump_type = DUMP_LAST; - cb->args[0] = dump_type | (dump_flags << 16); - cb->args[1] = 0; + cb->args[IPSET_CB_DUMP] = dump_type | (dump_flags << 16); + cb->args[IPSET_CB_INDEX] = 0; goto dump_last; } goto out; @@ -1246,15 +1354,15 @@ nla_put_failure: ret = -EFAULT; next_set: if (dump_type == DUMP_ONE) - cb->args[1] = IPSET_INVALID_ID; + cb->args[IPSET_CB_INDEX] = IPSET_INVALID_ID; else - cb->args[1]++; + cb->args[IPSET_CB_INDEX]++; release_refcount: /* If there was an error or set is done, release set */ - if (ret || !cb->args[2]) { - pr_debug("release set %s\n", nfnl_set(index)->name); - ip_set_put_byindex(index); - cb->args[2] = 0; + if (ret || !cb->args[IPSET_CB_ARG0]) { + pr_debug("release set %s\n", nfnl_set(inst, index)->name); + __ip_set_put_byindex(inst, index); + cb->args[IPSET_CB_ARG0] = 0; } out: if (nlh) { @@ -1356,6 +1464,7 @@ ip_set_uadd(struct sock *ctnl, struct sk_buff *skb, const struct nlmsghdr *nlh, const struct nlattr * const attr[]) { + struct ip_set_net *inst = ip_set_pernet(sock_net(ctnl)); struct ip_set *set; struct nlattr *tb[IPSET_ATTR_ADT_MAX+1] = {}; const struct nlattr *nla; @@ -1374,7 +1483,7 @@ ip_set_uadd(struct sock *ctnl, struct sk_buff *skb, attr[IPSET_ATTR_LINENO] == NULL)))) return -IPSET_ERR_PROTOCOL; - set = find_set(nla_data(attr[IPSET_ATTR_SETNAME])); + set = find_set(inst, nla_data(attr[IPSET_ATTR_SETNAME])); if (set == NULL) return -ENOENT; @@ -1410,6 +1519,7 @@ ip_set_udel(struct sock *ctnl, struct sk_buff *skb, const struct nlmsghdr *nlh, const struct nlattr * const attr[]) { + struct ip_set_net *inst = ip_set_pernet(sock_net(ctnl)); struct ip_set *set; struct nlattr *tb[IPSET_ATTR_ADT_MAX+1] = {}; const struct nlattr *nla; @@ -1428,7 +1538,7 @@ ip_set_udel(struct sock *ctnl, struct sk_buff *skb, attr[IPSET_ATTR_LINENO] == NULL)))) return -IPSET_ERR_PROTOCOL; - set = find_set(nla_data(attr[IPSET_ATTR_SETNAME])); + set = find_set(inst, nla_data(attr[IPSET_ATTR_SETNAME])); if (set == NULL) return -ENOENT; @@ -1464,6 +1574,7 @@ ip_set_utest(struct sock *ctnl, struct sk_buff *skb, const struct nlmsghdr *nlh, const struct nlattr * const attr[]) { + struct ip_set_net *inst = ip_set_pernet(sock_net(ctnl)); struct ip_set *set; struct nlattr *tb[IPSET_ATTR_ADT_MAX+1] = {}; int ret = 0; @@ -1474,7 +1585,7 @@ ip_set_utest(struct sock *ctnl, struct sk_buff *skb, !flag_nested(attr[IPSET_ATTR_DATA]))) return -IPSET_ERR_PROTOCOL; - set = find_set(nla_data(attr[IPSET_ATTR_SETNAME])); + set = find_set(inst, nla_data(attr[IPSET_ATTR_SETNAME])); if (set == NULL) return -ENOENT; @@ -1499,6 +1610,7 @@ ip_set_header(struct sock *ctnl, struct sk_buff *skb, const struct nlmsghdr *nlh, const struct nlattr * const attr[]) { + struct ip_set_net *inst = ip_set_pernet(sock_net(ctnl)); const struct ip_set *set; struct sk_buff *skb2; struct nlmsghdr *nlh2; @@ -1508,7 +1620,7 @@ ip_set_header(struct sock *ctnl, struct sk_buff *skb, attr[IPSET_ATTR_SETNAME] == NULL)) return -IPSET_ERR_PROTOCOL; - set = find_set(nla_data(attr[IPSET_ATTR_SETNAME])); + set = find_set(inst, nla_data(attr[IPSET_ATTR_SETNAME])); if (set == NULL) return -ENOENT; @@ -1733,8 +1845,10 @@ ip_set_sockfn_get(struct sock *sk, int optval, void __user *user, int *len) unsigned int *op; void *data; int copylen = *len, ret = 0; + struct net *net = sock_net(sk); + struct ip_set_net *inst = ip_set_pernet(net); - if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) + if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) return -EPERM; if (optval != SO_IP_SET) return -EBADF; @@ -1783,22 +1897,39 @@ ip_set_sockfn_get(struct sock *sk, int optval, void __user *user, int *len) } req_get->set.name[IPSET_MAXNAMELEN - 1] = '\0'; nfnl_lock(NFNL_SUBSYS_IPSET); - find_set_and_id(req_get->set.name, &id); + find_set_and_id(inst, req_get->set.name, &id); req_get->set.index = id; nfnl_unlock(NFNL_SUBSYS_IPSET); goto copy; } + case IP_SET_OP_GET_FNAME: { + struct ip_set_req_get_set_family *req_get = data; + ip_set_id_t id; + + if (*len != sizeof(struct ip_set_req_get_set_family)) { + ret = -EINVAL; + goto done; + } + req_get->set.name[IPSET_MAXNAMELEN - 1] = '\0'; + nfnl_lock(NFNL_SUBSYS_IPSET); + find_set_and_id(inst, req_get->set.name, &id); + req_get->set.index = id; + if (id != IPSET_INVALID_ID) + req_get->family = nfnl_set(inst, id)->family; + nfnl_unlock(NFNL_SUBSYS_IPSET); + goto copy; + } case IP_SET_OP_GET_BYINDEX: { struct ip_set_req_get_set *req_get = data; struct ip_set *set; if (*len != sizeof(struct ip_set_req_get_set) || - req_get->set.index >= ip_set_max) { + req_get->set.index >= inst->ip_set_max) { ret = -EINVAL; goto done; } nfnl_lock(NFNL_SUBSYS_IPSET); - set = nfnl_set(req_get->set.index); + set = nfnl_set(inst, req_get->set.index); strncpy(req_get->set.name, set ? set->name : "", IPSET_MAXNAMELEN); nfnl_unlock(NFNL_SUBSYS_IPSET); @@ -1827,49 +1958,81 @@ static struct nf_sockopt_ops so_set __read_mostly = { .owner = THIS_MODULE, }; -static int __init -ip_set_init(void) +static int __net_init +ip_set_net_init(struct net *net) { + struct ip_set_net *inst = ip_set_pernet(net); struct ip_set **list; - int ret; - if (max_sets) - ip_set_max = max_sets; - if (ip_set_max >= IPSET_INVALID_ID) - ip_set_max = IPSET_INVALID_ID - 1; + inst->ip_set_max = max_sets ? max_sets : CONFIG_IP_SET_MAX; + if (inst->ip_set_max >= IPSET_INVALID_ID) + inst->ip_set_max = IPSET_INVALID_ID - 1; - list = kzalloc(sizeof(struct ip_set *) * ip_set_max, GFP_KERNEL); + list = kzalloc(sizeof(struct ip_set *) * inst->ip_set_max, GFP_KERNEL); if (!list) return -ENOMEM; + inst->is_deleted = 0; + rcu_assign_pointer(inst->ip_set_list, list); + pr_notice("ip_set: protocol %u\n", IPSET_PROTOCOL); + return 0; +} - rcu_assign_pointer(ip_set_list, list); - ret = nfnetlink_subsys_register(&ip_set_netlink_subsys); +static void __net_exit +ip_set_net_exit(struct net *net) +{ + struct ip_set_net *inst = ip_set_pernet(net); + + struct ip_set *set = NULL; + ip_set_id_t i; + + inst->is_deleted = 1; /* flag for ip_set_nfnl_put */ + + for (i = 0; i < inst->ip_set_max; i++) { + set = nfnl_set(inst, i); + if (set != NULL) + ip_set_destroy_set(inst, i); + } + kfree(rcu_dereference_protected(inst->ip_set_list, 1)); +} + +static struct pernet_operations ip_set_net_ops = { + .init = ip_set_net_init, + .exit = ip_set_net_exit, + .id = &ip_set_net_id, + .size = sizeof(struct ip_set_net) +}; + + +static int __init +ip_set_init(void) +{ + int ret = nfnetlink_subsys_register(&ip_set_netlink_subsys); if (ret != 0) { pr_err("ip_set: cannot register with nfnetlink.\n"); - kfree(list); return ret; } ret = nf_register_sockopt(&so_set); if (ret != 0) { pr_err("SO_SET registry failed: %d\n", ret); nfnetlink_subsys_unregister(&ip_set_netlink_subsys); - kfree(list); return ret; } - - pr_notice("ip_set: protocol %u\n", IPSET_PROTOCOL); + ret = register_pernet_subsys(&ip_set_net_ops); + if (ret) { + pr_err("ip_set: cannot register pernet_subsys.\n"); + nf_unregister_sockopt(&so_set); + nfnetlink_subsys_unregister(&ip_set_netlink_subsys); + return ret; + } return 0; } static void __exit ip_set_fini(void) { - struct ip_set **list = rcu_dereference_protected(ip_set_list, 1); - - /* There can't be any existing set */ + unregister_pernet_subsys(&ip_set_net_ops); nf_unregister_sockopt(&so_set); nfnetlink_subsys_unregister(&ip_set_netlink_subsys); - kfree(list); pr_debug("these are the famous last words\n"); } diff --git a/net/netfilter/ipset/ip_set_getport.c b/net/netfilter/ipset/ip_set_getport.c index dac156f819ac..29fb01ddff93 100644 --- a/net/netfilter/ipset/ip_set_getport.c +++ b/net/netfilter/ipset/ip_set_getport.c @@ -102,9 +102,25 @@ ip_set_get_ip4_port(const struct sk_buff *skb, bool src, int protocol = iph->protocol; /* See comments at tcp_match in ip_tables.c */ - if (protocol <= 0 || (ntohs(iph->frag_off) & IP_OFFSET)) + if (protocol <= 0) return false; + if (ntohs(iph->frag_off) & IP_OFFSET) + switch (protocol) { + case IPPROTO_TCP: + case IPPROTO_SCTP: + case IPPROTO_UDP: + case IPPROTO_UDPLITE: + case IPPROTO_ICMP: + /* Port info not available for fragment offset > 0 */ + return false; + default: + /* Other protocols doesn't have ports, + so we can match fragments */ + *proto = protocol; + return true; + } + return get_port(skb, protocol, protooff, src, port, proto); } EXPORT_SYMBOL_GPL(ip_set_get_ip4_port); diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h index 707bc520d629..be6932ad3a86 100644 --- a/net/netfilter/ipset/ip_set_hash_gen.h +++ b/net/netfilter/ipset/ip_set_hash_gen.h @@ -15,8 +15,7 @@ #define rcu_dereference_bh(p) rcu_dereference(p) #endif -#define CONCAT(a, b) a##b -#define TOKEN(a, b) CONCAT(a, b) +#define rcu_dereference_bh_nfnl(p) rcu_dereference_bh_check(p, 1) /* Hashing which uses arrays to resolve clashing. The hash table is resized * (doubled) when searching becomes too long. @@ -78,10 +77,14 @@ struct htable { #define hbucket(h, i) (&((h)->bucket[i])) +#ifndef IPSET_NET_COUNT +#define IPSET_NET_COUNT 1 +#endif + /* Book-keeping of the prefixes added to the set */ struct net_prefixes { - u8 cidr; /* the different cidr values in the set */ - u32 nets; /* number of elements per cidr */ + u32 nets[IPSET_NET_COUNT]; /* number of elements per cidr */ + u8 cidr[IPSET_NET_COUNT]; /* the different cidr values in the set */ }; /* Compute the hash table size */ @@ -114,23 +117,6 @@ htable_bits(u32 hashsize) return bits; } -/* Destroy the hashtable part of the set */ -static void -ahash_destroy(struct htable *t) -{ - struct hbucket *n; - u32 i; - - for (i = 0; i < jhash_size(t->htable_bits); i++) { - n = hbucket(t, i); - if (n->size) - /* FIXME: use slab cache */ - kfree(n->value); - } - - ip_set_free(t); -} - static int hbucket_elem_add(struct hbucket *n, u8 ahash_max, size_t dsize) { @@ -156,30 +142,30 @@ hbucket_elem_add(struct hbucket *n, u8 ahash_max, size_t dsize) } #ifdef IP_SET_HASH_WITH_NETS +#if IPSET_NET_COUNT > 1 +#define __CIDR(cidr, i) (cidr[i]) +#else +#define __CIDR(cidr, i) (cidr) +#endif #ifdef IP_SET_HASH_WITH_NETS_PACKED /* When cidr is packed with nomatch, cidr - 1 is stored in the entry */ -#define CIDR(cidr) (cidr + 1) +#define CIDR(cidr, i) (__CIDR(cidr, i) + 1) #else -#define CIDR(cidr) (cidr) +#define CIDR(cidr, i) (__CIDR(cidr, i)) #endif #define SET_HOST_MASK(family) (family == AF_INET ? 32 : 128) #ifdef IP_SET_HASH_WITH_MULTI -#define NETS_LENGTH(family) (SET_HOST_MASK(family) + 1) +#define NLEN(family) (SET_HOST_MASK(family) + 1) #else -#define NETS_LENGTH(family) SET_HOST_MASK(family) +#define NLEN(family) SET_HOST_MASK(family) #endif #else -#define NETS_LENGTH(family) 0 +#define NLEN(family) 0 #endif /* IP_SET_HASH_WITH_NETS */ -#define ext_timeout(e, h) \ -(unsigned long *)(((void *)(e)) + (h)->offset[IPSET_OFFSET_TIMEOUT]) -#define ext_counter(e, h) \ -(struct ip_set_counter *)(((void *)(e)) + (h)->offset[IPSET_OFFSET_COUNTER]) - #endif /* _IP_SET_HASH_GEN_H */ /* Family dependent templates */ @@ -194,6 +180,8 @@ hbucket_elem_add(struct hbucket *n, u8 ahash_max, size_t dsize) #undef mtype_data_next #undef mtype_elem +#undef mtype_ahash_destroy +#undef mtype_ext_cleanup #undef mtype_add_cidr #undef mtype_del_cidr #undef mtype_ahash_memsize @@ -220,41 +208,43 @@ hbucket_elem_add(struct hbucket *n, u8 ahash_max, size_t dsize) #undef HKEY -#define mtype_data_equal TOKEN(MTYPE, _data_equal) +#define mtype_data_equal IPSET_TOKEN(MTYPE, _data_equal) #ifdef IP_SET_HASH_WITH_NETS -#define mtype_do_data_match TOKEN(MTYPE, _do_data_match) +#define mtype_do_data_match IPSET_TOKEN(MTYPE, _do_data_match) #else #define mtype_do_data_match(d) 1 #endif -#define mtype_data_set_flags TOKEN(MTYPE, _data_set_flags) -#define mtype_data_reset_flags TOKEN(MTYPE, _data_reset_flags) -#define mtype_data_netmask TOKEN(MTYPE, _data_netmask) -#define mtype_data_list TOKEN(MTYPE, _data_list) -#define mtype_data_next TOKEN(MTYPE, _data_next) -#define mtype_elem TOKEN(MTYPE, _elem) -#define mtype_add_cidr TOKEN(MTYPE, _add_cidr) -#define mtype_del_cidr TOKEN(MTYPE, _del_cidr) -#define mtype_ahash_memsize TOKEN(MTYPE, _ahash_memsize) -#define mtype_flush TOKEN(MTYPE, _flush) -#define mtype_destroy TOKEN(MTYPE, _destroy) -#define mtype_gc_init TOKEN(MTYPE, _gc_init) -#define mtype_same_set TOKEN(MTYPE, _same_set) -#define mtype_kadt TOKEN(MTYPE, _kadt) -#define mtype_uadt TOKEN(MTYPE, _uadt) +#define mtype_data_set_flags IPSET_TOKEN(MTYPE, _data_set_flags) +#define mtype_data_reset_elem IPSET_TOKEN(MTYPE, _data_reset_elem) +#define mtype_data_reset_flags IPSET_TOKEN(MTYPE, _data_reset_flags) +#define mtype_data_netmask IPSET_TOKEN(MTYPE, _data_netmask) +#define mtype_data_list IPSET_TOKEN(MTYPE, _data_list) +#define mtype_data_next IPSET_TOKEN(MTYPE, _data_next) +#define mtype_elem IPSET_TOKEN(MTYPE, _elem) +#define mtype_ahash_destroy IPSET_TOKEN(MTYPE, _ahash_destroy) +#define mtype_ext_cleanup IPSET_TOKEN(MTYPE, _ext_cleanup) +#define mtype_add_cidr IPSET_TOKEN(MTYPE, _add_cidr) +#define mtype_del_cidr IPSET_TOKEN(MTYPE, _del_cidr) +#define mtype_ahash_memsize IPSET_TOKEN(MTYPE, _ahash_memsize) +#define mtype_flush IPSET_TOKEN(MTYPE, _flush) +#define mtype_destroy IPSET_TOKEN(MTYPE, _destroy) +#define mtype_gc_init IPSET_TOKEN(MTYPE, _gc_init) +#define mtype_same_set IPSET_TOKEN(MTYPE, _same_set) +#define mtype_kadt IPSET_TOKEN(MTYPE, _kadt) +#define mtype_uadt IPSET_TOKEN(MTYPE, _uadt) #define mtype MTYPE -#define mtype_elem TOKEN(MTYPE, _elem) -#define mtype_add TOKEN(MTYPE, _add) -#define mtype_del TOKEN(MTYPE, _del) -#define mtype_test_cidrs TOKEN(MTYPE, _test_cidrs) -#define mtype_test TOKEN(MTYPE, _test) -#define mtype_expire TOKEN(MTYPE, _expire) -#define mtype_resize TOKEN(MTYPE, _resize) -#define mtype_head TOKEN(MTYPE, _head) -#define mtype_list TOKEN(MTYPE, _list) -#define mtype_gc TOKEN(MTYPE, _gc) -#define mtype_variant TOKEN(MTYPE, _variant) -#define mtype_data_match TOKEN(MTYPE, _data_match) +#define mtype_add IPSET_TOKEN(MTYPE, _add) +#define mtype_del IPSET_TOKEN(MTYPE, _del) +#define mtype_test_cidrs IPSET_TOKEN(MTYPE, _test_cidrs) +#define mtype_test IPSET_TOKEN(MTYPE, _test) +#define mtype_expire IPSET_TOKEN(MTYPE, _expire) +#define mtype_resize IPSET_TOKEN(MTYPE, _resize) +#define mtype_head IPSET_TOKEN(MTYPE, _head) +#define mtype_list IPSET_TOKEN(MTYPE, _list) +#define mtype_gc IPSET_TOKEN(MTYPE, _gc) +#define mtype_variant IPSET_TOKEN(MTYPE, _variant) +#define mtype_data_match IPSET_TOKEN(MTYPE, _data_match) #ifndef HKEY_DATALEN #define HKEY_DATALEN sizeof(struct mtype_elem) @@ -269,13 +259,10 @@ hbucket_elem_add(struct hbucket *n, u8 ahash_max, size_t dsize) /* The generic hash structure */ struct htype { - struct htable *table; /* the hash table */ + struct htable __rcu *table; /* the hash table */ u32 maxelem; /* max elements in the hash */ u32 elements; /* current element (vs timeout) */ u32 initval; /* random jhash init value */ - u32 timeout; /* timeout value, if enabled */ - size_t dsize; /* data struct size */ - size_t offset[IPSET_OFFSET_MAX]; /* Offsets to extensions */ struct timer_list gc; /* garbage collection when timeout enabled */ struct mtype_elem next; /* temporary storage for uadd */ #ifdef IP_SET_HASH_WITH_MULTI @@ -297,49 +284,49 @@ struct htype { /* Network cidr size book keeping when the hash stores different * sized networks */ static void -mtype_add_cidr(struct htype *h, u8 cidr, u8 nets_length) +mtype_add_cidr(struct htype *h, u8 cidr, u8 nets_length, u8 n) { int i, j; /* Add in increasing prefix order, so larger cidr first */ - for (i = 0, j = -1; i < nets_length && h->nets[i].nets; i++) { + for (i = 0, j = -1; i < nets_length && h->nets[i].nets[n]; i++) { if (j != -1) continue; - else if (h->nets[i].cidr < cidr) + else if (h->nets[i].cidr[n] < cidr) j = i; - else if (h->nets[i].cidr == cidr) { - h->nets[i].nets++; + else if (h->nets[i].cidr[n] == cidr) { + h->nets[i].nets[n]++; return; } } if (j != -1) { for (; i > j; i--) { - h->nets[i].cidr = h->nets[i - 1].cidr; - h->nets[i].nets = h->nets[i - 1].nets; + h->nets[i].cidr[n] = h->nets[i - 1].cidr[n]; + h->nets[i].nets[n] = h->nets[i - 1].nets[n]; } } - h->nets[i].cidr = cidr; - h->nets[i].nets = 1; + h->nets[i].cidr[n] = cidr; + h->nets[i].nets[n] = 1; } static void -mtype_del_cidr(struct htype *h, u8 cidr, u8 nets_length) +mtype_del_cidr(struct htype *h, u8 cidr, u8 nets_length, u8 n) { u8 i, j, net_end = nets_length - 1; for (i = 0; i < nets_length; i++) { - if (h->nets[i].cidr != cidr) + if (h->nets[i].cidr[n] != cidr) continue; - if (h->nets[i].nets > 1 || i == net_end || - h->nets[i + 1].nets == 0) { - h->nets[i].nets--; + if (h->nets[i].nets[n] > 1 || i == net_end || + h->nets[i + 1].nets[n] == 0) { + h->nets[i].nets[n]--; return; } - for (j = i; j < net_end && h->nets[j].nets; j++) { - h->nets[j].cidr = h->nets[j + 1].cidr; - h->nets[j].nets = h->nets[j + 1].nets; + for (j = i; j < net_end && h->nets[j].nets[n]; j++) { + h->nets[j].cidr[n] = h->nets[j + 1].cidr[n]; + h->nets[j].nets[n] = h->nets[j + 1].nets[n]; } - h->nets[j].nets = 0; + h->nets[j].nets[n] = 0; return; } } @@ -347,10 +334,10 @@ mtype_del_cidr(struct htype *h, u8 cidr, u8 nets_length) /* Calculate the actual memory size of the set data */ static size_t -mtype_ahash_memsize(const struct htype *h, u8 nets_length) +mtype_ahash_memsize(const struct htype *h, const struct htable *t, + u8 nets_length, size_t dsize) { u32 i; - struct htable *t = h->table; size_t memsize = sizeof(*h) + sizeof(*t) #ifdef IP_SET_HASH_WITH_NETS @@ -359,35 +346,70 @@ mtype_ahash_memsize(const struct htype *h, u8 nets_length) + jhash_size(t->htable_bits) * sizeof(struct hbucket); for (i = 0; i < jhash_size(t->htable_bits); i++) - memsize += t->bucket[i].size * h->dsize; + memsize += t->bucket[i].size * dsize; return memsize; } +/* Get the ith element from the array block n */ +#define ahash_data(n, i, dsize) \ + ((struct mtype_elem *)((n)->value + ((i) * (dsize)))) + +static void +mtype_ext_cleanup(struct ip_set *set, struct hbucket *n) +{ + int i; + + for (i = 0; i < n->pos; i++) + ip_set_ext_destroy(set, ahash_data(n, i, set->dsize)); +} + /* Flush a hash type of set: destroy all elements */ static void mtype_flush(struct ip_set *set) { struct htype *h = set->data; - struct htable *t = h->table; + struct htable *t; struct hbucket *n; u32 i; + t = rcu_dereference_bh_nfnl(h->table); for (i = 0; i < jhash_size(t->htable_bits); i++) { n = hbucket(t, i); if (n->size) { + if (set->extensions & IPSET_EXT_DESTROY) + mtype_ext_cleanup(set, n); n->size = n->pos = 0; /* FIXME: use slab cache */ kfree(n->value); } } #ifdef IP_SET_HASH_WITH_NETS - memset(h->nets, 0, sizeof(struct net_prefixes) - * NETS_LENGTH(set->family)); + memset(h->nets, 0, sizeof(struct net_prefixes) * NLEN(set->family)); #endif h->elements = 0; } +/* Destroy the hashtable part of the set */ +static void +mtype_ahash_destroy(struct ip_set *set, struct htable *t, bool ext_destroy) +{ + struct hbucket *n; + u32 i; + + for (i = 0; i < jhash_size(t->htable_bits); i++) { + n = hbucket(t, i); + if (n->size) { + if (set->extensions & IPSET_EXT_DESTROY && ext_destroy) + mtype_ext_cleanup(set, n); + /* FIXME: use slab cache */ + kfree(n->value); + } + } + + ip_set_free(t); +} + /* Destroy a hash type of set */ static void mtype_destroy(struct ip_set *set) @@ -397,7 +419,7 @@ mtype_destroy(struct ip_set *set) if (set->extensions & IPSET_EXT_TIMEOUT) del_timer_sync(&h->gc); - ahash_destroy(h->table); + mtype_ahash_destroy(set, rcu_dereference_bh_nfnl(h->table), true); #ifdef IP_SET_HASH_WITH_RBTREE rbtree_destroy(&h->rbtree); #endif @@ -414,10 +436,10 @@ mtype_gc_init(struct ip_set *set, void (*gc)(unsigned long ul_set)) init_timer(&h->gc); h->gc.data = (unsigned long) set; h->gc.function = gc; - h->gc.expires = jiffies + IPSET_GC_PERIOD(h->timeout) * HZ; + h->gc.expires = jiffies + IPSET_GC_PERIOD(set->timeout) * HZ; add_timer(&h->gc); pr_debug("gc initialized, run in every %u\n", - IPSET_GC_PERIOD(h->timeout)); + IPSET_GC_PERIOD(set->timeout)); } static bool @@ -428,37 +450,40 @@ mtype_same_set(const struct ip_set *a, const struct ip_set *b) /* Resizing changes htable_bits, so we ignore it */ return x->maxelem == y->maxelem && - x->timeout == y->timeout && + a->timeout == b->timeout && #ifdef IP_SET_HASH_WITH_NETMASK x->netmask == y->netmask && #endif a->extensions == b->extensions; } -/* Get the ith element from the array block n */ -#define ahash_data(n, i, dsize) \ - ((struct mtype_elem *)((n)->value + ((i) * (dsize)))) - /* Delete expired elements from the hashtable */ static void -mtype_expire(struct htype *h, u8 nets_length, size_t dsize) +mtype_expire(struct ip_set *set, struct htype *h, u8 nets_length, size_t dsize) { - struct htable *t = h->table; + struct htable *t; struct hbucket *n; struct mtype_elem *data; u32 i; int j; +#ifdef IP_SET_HASH_WITH_NETS + u8 k; +#endif + rcu_read_lock_bh(); + t = rcu_dereference_bh(h->table); for (i = 0; i < jhash_size(t->htable_bits); i++) { n = hbucket(t, i); for (j = 0; j < n->pos; j++) { data = ahash_data(n, j, dsize); - if (ip_set_timeout_expired(ext_timeout(data, h))) { + if (ip_set_timeout_expired(ext_timeout(data, set))) { pr_debug("expired %u/%u\n", i, j); #ifdef IP_SET_HASH_WITH_NETS - mtype_del_cidr(h, CIDR(data->cidr), - nets_length); + for (k = 0; k < IPSET_NET_COUNT; k++) + mtype_del_cidr(h, CIDR(data->cidr, k), + nets_length, k); #endif + ip_set_ext_destroy(set, data); if (j != n->pos - 1) /* Not last one */ memcpy(data, @@ -481,6 +506,7 @@ mtype_expire(struct htype *h, u8 nets_length, size_t dsize) n->value = tmp; } } + rcu_read_unlock_bh(); } static void @@ -491,10 +517,10 @@ mtype_gc(unsigned long ul_set) pr_debug("called\n"); write_lock_bh(&set->lock); - mtype_expire(h, NETS_LENGTH(set->family), h->dsize); + mtype_expire(set, h, NLEN(set->family), set->dsize); write_unlock_bh(&set->lock); - h->gc.expires = jiffies + IPSET_GC_PERIOD(h->timeout) * HZ; + h->gc.expires = jiffies + IPSET_GC_PERIOD(set->timeout) * HZ; add_timer(&h->gc); } @@ -505,7 +531,7 @@ static int mtype_resize(struct ip_set *set, bool retried) { struct htype *h = set->data; - struct htable *t, *orig = h->table; + struct htable *t, *orig = rcu_dereference_bh_nfnl(h->table); u8 htable_bits = orig->htable_bits; #ifdef IP_SET_HASH_WITH_NETS u8 flags; @@ -520,8 +546,7 @@ mtype_resize(struct ip_set *set, bool retried) if (SET_WITH_TIMEOUT(set) && !retried) { i = h->elements; write_lock_bh(&set->lock); - mtype_expire(set->data, NETS_LENGTH(set->family), - h->dsize); + mtype_expire(set, set->data, NLEN(set->family), set->dsize); write_unlock_bh(&set->lock); if (h->elements < i) return 0; @@ -548,25 +573,25 @@ retry: for (i = 0; i < jhash_size(orig->htable_bits); i++) { n = hbucket(orig, i); for (j = 0; j < n->pos; j++) { - data = ahash_data(n, j, h->dsize); + data = ahash_data(n, j, set->dsize); #ifdef IP_SET_HASH_WITH_NETS flags = 0; mtype_data_reset_flags(data, &flags); #endif m = hbucket(t, HKEY(data, h->initval, htable_bits)); - ret = hbucket_elem_add(m, AHASH_MAX(h), h->dsize); + ret = hbucket_elem_add(m, AHASH_MAX(h), set->dsize); if (ret < 0) { #ifdef IP_SET_HASH_WITH_NETS mtype_data_reset_flags(data, &flags); #endif read_unlock_bh(&set->lock); - ahash_destroy(t); + mtype_ahash_destroy(set, t, false); if (ret == -EAGAIN) goto retry; return ret; } - d = ahash_data(m, m->pos++, h->dsize); - memcpy(d, data, h->dsize); + d = ahash_data(m, m->pos++, set->dsize); + memcpy(d, data, set->dsize); #ifdef IP_SET_HASH_WITH_NETS mtype_data_reset_flags(d, &flags); #endif @@ -581,7 +606,7 @@ retry: pr_debug("set %s resized from %u (%p) to %u (%p)\n", set->name, orig->htable_bits, orig, t->htable_bits, t); - ahash_destroy(orig); + mtype_ahash_destroy(set, orig, false); return 0; } @@ -604,7 +629,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext, if (SET_WITH_TIMEOUT(set) && h->elements >= h->maxelem) /* FIXME: when set is full, we slow down here */ - mtype_expire(h, NETS_LENGTH(set->family), h->dsize); + mtype_expire(set, h, NLEN(set->family), set->dsize); if (h->elements >= h->maxelem) { if (net_ratelimit()) @@ -618,11 +643,11 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext, key = HKEY(value, h->initval, t->htable_bits); n = hbucket(t, key); for (i = 0; i < n->pos; i++) { - data = ahash_data(n, i, h->dsize); + data = ahash_data(n, i, set->dsize); if (mtype_data_equal(data, d, &multi)) { if (flag_exist || (SET_WITH_TIMEOUT(set) && - ip_set_timeout_expired(ext_timeout(data, h)))) { + ip_set_timeout_expired(ext_timeout(data, set)))) { /* Just the extensions could be overwritten */ j = i; goto reuse_slot; @@ -633,30 +658,37 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext, } /* Reuse first timed out entry */ if (SET_WITH_TIMEOUT(set) && - ip_set_timeout_expired(ext_timeout(data, h)) && + ip_set_timeout_expired(ext_timeout(data, set)) && j != AHASH_MAX(h) + 1) j = i; } reuse_slot: if (j != AHASH_MAX(h) + 1) { /* Fill out reused slot */ - data = ahash_data(n, j, h->dsize); + data = ahash_data(n, j, set->dsize); #ifdef IP_SET_HASH_WITH_NETS - mtype_del_cidr(h, CIDR(data->cidr), NETS_LENGTH(set->family)); - mtype_add_cidr(h, CIDR(d->cidr), NETS_LENGTH(set->family)); + for (i = 0; i < IPSET_NET_COUNT; i++) { + mtype_del_cidr(h, CIDR(data->cidr, i), + NLEN(set->family), i); + mtype_add_cidr(h, CIDR(d->cidr, i), + NLEN(set->family), i); + } #endif + ip_set_ext_destroy(set, data); } else { /* Use/create a new slot */ TUNE_AHASH_MAX(h, multi); - ret = hbucket_elem_add(n, AHASH_MAX(h), h->dsize); + ret = hbucket_elem_add(n, AHASH_MAX(h), set->dsize); if (ret != 0) { if (ret == -EAGAIN) mtype_data_next(&h->next, d); goto out; } - data = ahash_data(n, n->pos++, h->dsize); + data = ahash_data(n, n->pos++, set->dsize); #ifdef IP_SET_HASH_WITH_NETS - mtype_add_cidr(h, CIDR(d->cidr), NETS_LENGTH(set->family)); + for (i = 0; i < IPSET_NET_COUNT; i++) + mtype_add_cidr(h, CIDR(d->cidr, i), NLEN(set->family), + i); #endif h->elements++; } @@ -665,9 +697,11 @@ reuse_slot: mtype_data_set_flags(data, flags); #endif if (SET_WITH_TIMEOUT(set)) - ip_set_timeout_set(ext_timeout(data, h), ext->timeout); + ip_set_timeout_set(ext_timeout(data, set), ext->timeout); if (SET_WITH_COUNTER(set)) - ip_set_init_counter(ext_counter(data, h), ext); + ip_set_init_counter(ext_counter(data, set), ext); + if (SET_WITH_COMMENT(set)) + ip_set_init_comment(ext_comment(data, set), ext); out: rcu_read_unlock_bh(); @@ -682,47 +716,60 @@ mtype_del(struct ip_set *set, void *value, const struct ip_set_ext *ext, struct ip_set_ext *mext, u32 flags) { struct htype *h = set->data; - struct htable *t = h->table; + struct htable *t; const struct mtype_elem *d = value; struct mtype_elem *data; struct hbucket *n; - int i; + int i, ret = -IPSET_ERR_EXIST; +#ifdef IP_SET_HASH_WITH_NETS + u8 j; +#endif u32 key, multi = 0; + rcu_read_lock_bh(); + t = rcu_dereference_bh(h->table); key = HKEY(value, h->initval, t->htable_bits); n = hbucket(t, key); for (i = 0; i < n->pos; i++) { - data = ahash_data(n, i, h->dsize); + data = ahash_data(n, i, set->dsize); if (!mtype_data_equal(data, d, &multi)) continue; if (SET_WITH_TIMEOUT(set) && - ip_set_timeout_expired(ext_timeout(data, h))) - return -IPSET_ERR_EXIST; + ip_set_timeout_expired(ext_timeout(data, set))) + goto out; if (i != n->pos - 1) /* Not last one */ - memcpy(data, ahash_data(n, n->pos - 1, h->dsize), - h->dsize); + memcpy(data, ahash_data(n, n->pos - 1, set->dsize), + set->dsize); n->pos--; h->elements--; #ifdef IP_SET_HASH_WITH_NETS - mtype_del_cidr(h, CIDR(d->cidr), NETS_LENGTH(set->family)); + for (j = 0; j < IPSET_NET_COUNT; j++) + mtype_del_cidr(h, CIDR(d->cidr, j), NLEN(set->family), + j); #endif + ip_set_ext_destroy(set, data); if (n->pos + AHASH_INIT_SIZE < n->size) { void *tmp = kzalloc((n->size - AHASH_INIT_SIZE) - * h->dsize, + * set->dsize, GFP_ATOMIC); - if (!tmp) - return 0; + if (!tmp) { + ret = 0; + goto out; + } n->size -= AHASH_INIT_SIZE; - memcpy(tmp, n->value, n->size * h->dsize); + memcpy(tmp, n->value, n->size * set->dsize); kfree(n->value); n->value = tmp; } - return 0; + ret = 0; + goto out; } - return -IPSET_ERR_EXIST; +out: + rcu_read_unlock_bh(); + return ret; } static inline int @@ -730,8 +777,7 @@ mtype_data_match(struct mtype_elem *data, const struct ip_set_ext *ext, struct ip_set_ext *mext, struct ip_set *set, u32 flags) { if (SET_WITH_COUNTER(set)) - ip_set_update_counter(ext_counter(data, - (struct htype *)(set->data)), + ip_set_update_counter(ext_counter(data, set), ext, mext, flags); return mtype_do_data_match(data); } @@ -745,25 +791,38 @@ mtype_test_cidrs(struct ip_set *set, struct mtype_elem *d, struct ip_set_ext *mext, u32 flags) { struct htype *h = set->data; - struct htable *t = h->table; + struct htable *t = rcu_dereference_bh(h->table); struct hbucket *n; struct mtype_elem *data; +#if IPSET_NET_COUNT == 2 + struct mtype_elem orig = *d; + int i, j = 0, k; +#else int i, j = 0; +#endif u32 key, multi = 0; - u8 nets_length = NETS_LENGTH(set->family); + u8 nets_length = NLEN(set->family); pr_debug("test by nets\n"); - for (; j < nets_length && h->nets[j].nets && !multi; j++) { - mtype_data_netmask(d, h->nets[j].cidr); + for (; j < nets_length && h->nets[j].nets[0] && !multi; j++) { +#if IPSET_NET_COUNT == 2 + mtype_data_reset_elem(d, &orig); + mtype_data_netmask(d, h->nets[j].cidr[0], false); + for (k = 0; k < nets_length && h->nets[k].nets[1] && !multi; + k++) { + mtype_data_netmask(d, h->nets[k].cidr[1], true); +#else + mtype_data_netmask(d, h->nets[j].cidr[0]); +#endif key = HKEY(d, h->initval, t->htable_bits); n = hbucket(t, key); for (i = 0; i < n->pos; i++) { - data = ahash_data(n, i, h->dsize); + data = ahash_data(n, i, set->dsize); if (!mtype_data_equal(data, d, &multi)) continue; if (SET_WITH_TIMEOUT(set)) { if (!ip_set_timeout_expired( - ext_timeout(data, h))) + ext_timeout(data, set))) return mtype_data_match(data, ext, mext, set, flags); @@ -774,6 +833,9 @@ mtype_test_cidrs(struct ip_set *set, struct mtype_elem *d, return mtype_data_match(data, ext, mext, set, flags); } +#if IPSET_NET_COUNT == 2 + } +#endif } return 0; } @@ -785,30 +847,41 @@ mtype_test(struct ip_set *set, void *value, const struct ip_set_ext *ext, struct ip_set_ext *mext, u32 flags) { struct htype *h = set->data; - struct htable *t = h->table; + struct htable *t; struct mtype_elem *d = value; struct hbucket *n; struct mtype_elem *data; - int i; + int i, ret = 0; u32 key, multi = 0; + rcu_read_lock_bh(); + t = rcu_dereference_bh(h->table); #ifdef IP_SET_HASH_WITH_NETS /* If we test an IP address and not a network address, * try all possible network sizes */ - if (CIDR(d->cidr) == SET_HOST_MASK(set->family)) - return mtype_test_cidrs(set, d, ext, mext, flags); + for (i = 0; i < IPSET_NET_COUNT; i++) + if (CIDR(d->cidr, i) != SET_HOST_MASK(set->family)) + break; + if (i == IPSET_NET_COUNT) { + ret = mtype_test_cidrs(set, d, ext, mext, flags); + goto out; + } #endif key = HKEY(d, h->initval, t->htable_bits); n = hbucket(t, key); for (i = 0; i < n->pos; i++) { - data = ahash_data(n, i, h->dsize); + data = ahash_data(n, i, set->dsize); if (mtype_data_equal(data, d, &multi) && !(SET_WITH_TIMEOUT(set) && - ip_set_timeout_expired(ext_timeout(data, h)))) - return mtype_data_match(data, ext, mext, set, flags); + ip_set_timeout_expired(ext_timeout(data, set)))) { + ret = mtype_data_match(data, ext, mext, set, flags); + goto out; + } } - return 0; +out: + rcu_read_unlock_bh(); + return ret; } /* Reply a HEADER request: fill out the header part of the set */ @@ -816,18 +889,18 @@ static int mtype_head(struct ip_set *set, struct sk_buff *skb) { const struct htype *h = set->data; + const struct htable *t; struct nlattr *nested; size_t memsize; - read_lock_bh(&set->lock); - memsize = mtype_ahash_memsize(h, NETS_LENGTH(set->family)); - read_unlock_bh(&set->lock); + t = rcu_dereference_bh_nfnl(h->table); + memsize = mtype_ahash_memsize(h, t, NLEN(set->family), set->dsize); nested = ipset_nest_start(skb, IPSET_ATTR_DATA); if (!nested) goto nla_put_failure; if (nla_put_net32(skb, IPSET_ATTR_HASHSIZE, - htonl(jhash_size(h->table->htable_bits))) || + htonl(jhash_size(t->htable_bits))) || nla_put_net32(skb, IPSET_ATTR_MAXELEM, htonl(h->maxelem))) goto nla_put_failure; #ifdef IP_SET_HASH_WITH_NETMASK @@ -836,12 +909,9 @@ mtype_head(struct ip_set *set, struct sk_buff *skb) goto nla_put_failure; #endif if (nla_put_net32(skb, IPSET_ATTR_REFERENCES, htonl(set->ref - 1)) || - nla_put_net32(skb, IPSET_ATTR_MEMSIZE, htonl(memsize)) || - ((set->extensions & IPSET_EXT_TIMEOUT) && - nla_put_net32(skb, IPSET_ATTR_TIMEOUT, htonl(h->timeout))) || - ((set->extensions & IPSET_EXT_COUNTER) && - nla_put_net32(skb, IPSET_ATTR_CADT_FLAGS, - htonl(IPSET_FLAG_WITH_COUNTERS)))) + nla_put_net32(skb, IPSET_ATTR_MEMSIZE, htonl(memsize))) + goto nla_put_failure; + if (unlikely(ip_set_put_flags(skb, set))) goto nla_put_failure; ipset_nest_end(skb, nested); @@ -856,11 +926,11 @@ mtype_list(const struct ip_set *set, struct sk_buff *skb, struct netlink_callback *cb) { const struct htype *h = set->data; - const struct htable *t = h->table; + const struct htable *t = rcu_dereference_bh_nfnl(h->table); struct nlattr *atd, *nested; const struct hbucket *n; const struct mtype_elem *e; - u32 first = cb->args[2]; + u32 first = cb->args[IPSET_CB_ARG0]; /* We assume that one hash bucket fills into one page */ void *incomplete; int i; @@ -869,20 +939,22 @@ mtype_list(const struct ip_set *set, if (!atd) return -EMSGSIZE; pr_debug("list hash set %s\n", set->name); - for (; cb->args[2] < jhash_size(t->htable_bits); cb->args[2]++) { + for (; cb->args[IPSET_CB_ARG0] < jhash_size(t->htable_bits); + cb->args[IPSET_CB_ARG0]++) { incomplete = skb_tail_pointer(skb); - n = hbucket(t, cb->args[2]); - pr_debug("cb->args[2]: %lu, t %p n %p\n", cb->args[2], t, n); + n = hbucket(t, cb->args[IPSET_CB_ARG0]); + pr_debug("cb->arg bucket: %lu, t %p n %p\n", + cb->args[IPSET_CB_ARG0], t, n); for (i = 0; i < n->pos; i++) { - e = ahash_data(n, i, h->dsize); + e = ahash_data(n, i, set->dsize); if (SET_WITH_TIMEOUT(set) && - ip_set_timeout_expired(ext_timeout(e, h))) + ip_set_timeout_expired(ext_timeout(e, set))) continue; pr_debug("list hash %lu hbucket %p i %u, data %p\n", - cb->args[2], n, i, e); + cb->args[IPSET_CB_ARG0], n, i, e); nested = ipset_nest_start(skb, IPSET_ATTR_DATA); if (!nested) { - if (cb->args[2] == first) { + if (cb->args[IPSET_CB_ARG0] == first) { nla_nest_cancel(skb, atd); return -EMSGSIZE; } else @@ -890,43 +962,37 @@ mtype_list(const struct ip_set *set, } if (mtype_data_list(skb, e)) goto nla_put_failure; - if (SET_WITH_TIMEOUT(set) && - nla_put_net32(skb, IPSET_ATTR_TIMEOUT, - htonl(ip_set_timeout_get( - ext_timeout(e, h))))) - goto nla_put_failure; - if (SET_WITH_COUNTER(set) && - ip_set_put_counter(skb, ext_counter(e, h))) + if (ip_set_put_extensions(skb, set, e, true)) goto nla_put_failure; ipset_nest_end(skb, nested); } } ipset_nest_end(skb, atd); /* Set listing finished */ - cb->args[2] = 0; + cb->args[IPSET_CB_ARG0] = 0; return 0; nla_put_failure: nlmsg_trim(skb, incomplete); - ipset_nest_end(skb, atd); - if (unlikely(first == cb->args[2])) { + if (unlikely(first == cb->args[IPSET_CB_ARG0])) { pr_warning("Can't list set %s: one bucket does not fit into " "a message. Please report it!\n", set->name); - cb->args[2] = 0; + cb->args[IPSET_CB_ARG0] = 0; return -EMSGSIZE; } + ipset_nest_end(skb, atd); return 0; } static int -TOKEN(MTYPE, _kadt)(struct ip_set *set, const struct sk_buff *skb, - const struct xt_action_param *par, - enum ipset_adt adt, struct ip_set_adt_opt *opt); +IPSET_TOKEN(MTYPE, _kadt)(struct ip_set *set, const struct sk_buff *skb, + const struct xt_action_param *par, + enum ipset_adt adt, struct ip_set_adt_opt *opt); static int -TOKEN(MTYPE, _uadt)(struct ip_set *set, struct nlattr *tb[], - enum ipset_adt adt, u32 *lineno, u32 flags, bool retried); +IPSET_TOKEN(MTYPE, _uadt)(struct ip_set *set, struct nlattr *tb[], + enum ipset_adt adt, u32 *lineno, u32 flags, bool retried); static const struct ip_set_type_variant mtype_variant = { .kadt = mtype_kadt, @@ -946,16 +1012,17 @@ static const struct ip_set_type_variant mtype_variant = { #ifdef IP_SET_EMIT_CREATE static int -TOKEN(HTYPE, _create)(struct ip_set *set, struct nlattr *tb[], u32 flags) +IPSET_TOKEN(HTYPE, _create)(struct net *net, struct ip_set *set, + struct nlattr *tb[], u32 flags) { u32 hashsize = IPSET_DEFAULT_HASHSIZE, maxelem = IPSET_DEFAULT_MAXELEM; - u32 cadt_flags = 0; u8 hbits; #ifdef IP_SET_HASH_WITH_NETMASK u8 netmask; #endif size_t hsize; struct HTYPE *h; + struct htable *t; if (!(set->family == NFPROTO_IPV4 || set->family == NFPROTO_IPV6)) return -IPSET_ERR_INVALID_FAMILY; @@ -1005,7 +1072,7 @@ TOKEN(HTYPE, _create)(struct ip_set *set, struct nlattr *tb[], u32 flags) h->netmask = netmask; #endif get_random_bytes(&h->initval, sizeof(h->initval)); - h->timeout = IPSET_NO_TIMEOUT; + set->timeout = IPSET_NO_TIMEOUT; hbits = htable_bits(hashsize); hsize = htable_size(hbits); @@ -1013,91 +1080,37 @@ TOKEN(HTYPE, _create)(struct ip_set *set, struct nlattr *tb[], u32 flags) kfree(h); return -ENOMEM; } - h->table = ip_set_alloc(hsize); - if (!h->table) { + t = ip_set_alloc(hsize); + if (!t) { kfree(h); return -ENOMEM; } - h->table->htable_bits = hbits; + t->htable_bits = hbits; + rcu_assign_pointer(h->table, t); set->data = h; - if (set->family == NFPROTO_IPV4) - set->variant = &TOKEN(HTYPE, 4_variant); - else - set->variant = &TOKEN(HTYPE, 6_variant); - - if (tb[IPSET_ATTR_CADT_FLAGS]) - cadt_flags = ip_set_get_h32(tb[IPSET_ATTR_CADT_FLAGS]); - if (cadt_flags & IPSET_FLAG_WITH_COUNTERS) { - set->extensions |= IPSET_EXT_COUNTER; - if (tb[IPSET_ATTR_TIMEOUT]) { - h->timeout = - ip_set_timeout_uget(tb[IPSET_ATTR_TIMEOUT]); - set->extensions |= IPSET_EXT_TIMEOUT; - if (set->family == NFPROTO_IPV4) { - h->dsize = - sizeof(struct TOKEN(HTYPE, 4ct_elem)); - h->offset[IPSET_OFFSET_TIMEOUT] = - offsetof(struct TOKEN(HTYPE, 4ct_elem), - timeout); - h->offset[IPSET_OFFSET_COUNTER] = - offsetof(struct TOKEN(HTYPE, 4ct_elem), - counter); - TOKEN(HTYPE, 4_gc_init)(set, - TOKEN(HTYPE, 4_gc)); - } else { - h->dsize = - sizeof(struct TOKEN(HTYPE, 6ct_elem)); - h->offset[IPSET_OFFSET_TIMEOUT] = - offsetof(struct TOKEN(HTYPE, 6ct_elem), - timeout); - h->offset[IPSET_OFFSET_COUNTER] = - offsetof(struct TOKEN(HTYPE, 6ct_elem), - counter); - TOKEN(HTYPE, 6_gc_init)(set, - TOKEN(HTYPE, 6_gc)); - } - } else { - if (set->family == NFPROTO_IPV4) { - h->dsize = - sizeof(struct TOKEN(HTYPE, 4c_elem)); - h->offset[IPSET_OFFSET_COUNTER] = - offsetof(struct TOKEN(HTYPE, 4c_elem), - counter); - } else { - h->dsize = - sizeof(struct TOKEN(HTYPE, 6c_elem)); - h->offset[IPSET_OFFSET_COUNTER] = - offsetof(struct TOKEN(HTYPE, 6c_elem), - counter); - } - } - } else if (tb[IPSET_ATTR_TIMEOUT]) { - h->timeout = ip_set_timeout_uget(tb[IPSET_ATTR_TIMEOUT]); - set->extensions |= IPSET_EXT_TIMEOUT; - if (set->family == NFPROTO_IPV4) { - h->dsize = sizeof(struct TOKEN(HTYPE, 4t_elem)); - h->offset[IPSET_OFFSET_TIMEOUT] = - offsetof(struct TOKEN(HTYPE, 4t_elem), - timeout); - TOKEN(HTYPE, 4_gc_init)(set, TOKEN(HTYPE, 4_gc)); - } else { - h->dsize = sizeof(struct TOKEN(HTYPE, 6t_elem)); - h->offset[IPSET_OFFSET_TIMEOUT] = - offsetof(struct TOKEN(HTYPE, 6t_elem), - timeout); - TOKEN(HTYPE, 6_gc_init)(set, TOKEN(HTYPE, 6_gc)); - } + if (set->family == NFPROTO_IPV4) { + set->variant = &IPSET_TOKEN(HTYPE, 4_variant); + set->dsize = ip_set_elem_len(set, tb, + sizeof(struct IPSET_TOKEN(HTYPE, 4_elem))); } else { + set->variant = &IPSET_TOKEN(HTYPE, 6_variant); + set->dsize = ip_set_elem_len(set, tb, + sizeof(struct IPSET_TOKEN(HTYPE, 6_elem))); + } + if (tb[IPSET_ATTR_TIMEOUT]) { + set->timeout = ip_set_timeout_uget(tb[IPSET_ATTR_TIMEOUT]); if (set->family == NFPROTO_IPV4) - h->dsize = sizeof(struct TOKEN(HTYPE, 4_elem)); + IPSET_TOKEN(HTYPE, 4_gc_init)(set, + IPSET_TOKEN(HTYPE, 4_gc)); else - h->dsize = sizeof(struct TOKEN(HTYPE, 6_elem)); + IPSET_TOKEN(HTYPE, 6_gc_init)(set, + IPSET_TOKEN(HTYPE, 6_gc)); } pr_debug("create %s hashsize %u (%u) maxelem %u: %p(%p)\n", - set->name, jhash_size(h->table->htable_bits), - h->table->htable_bits, h->maxelem, set->data, h->table); + set->name, jhash_size(t->htable_bits), + t->htable_bits, h->maxelem, set->data, t); return 0; } diff --git a/net/netfilter/ipset/ip_set_hash_ip.c b/net/netfilter/ipset/ip_set_hash_ip.c index c74e6e14cd93..e65fc2423d56 100644 --- a/net/netfilter/ipset/ip_set_hash_ip.c +++ b/net/netfilter/ipset/ip_set_hash_ip.c @@ -23,19 +23,20 @@ #include <linux/netfilter/ipset/ip_set.h> #include <linux/netfilter/ipset/ip_set_hash.h> -#define REVISION_MIN 0 -#define REVISION_MAX 1 /* Counters support */ +#define IPSET_TYPE_REV_MIN 0 +/* 1 Counters support */ +#define IPSET_TYPE_REV_MAX 2 /* Comments support */ MODULE_LICENSE("GPL"); MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>"); -IP_SET_MODULE_DESC("hash:ip", REVISION_MIN, REVISION_MAX); +IP_SET_MODULE_DESC("hash:ip", IPSET_TYPE_REV_MIN, IPSET_TYPE_REV_MAX); MODULE_ALIAS("ip_set_hash:ip"); /* Type specific function prefix */ #define HTYPE hash_ip #define IP_SET_HASH_WITH_NETMASK -/* IPv4 variants */ +/* IPv4 variant */ /* Member elements */ struct hash_ip4_elem { @@ -43,22 +44,6 @@ struct hash_ip4_elem { __be32 ip; }; -struct hash_ip4t_elem { - __be32 ip; - unsigned long timeout; -}; - -struct hash_ip4c_elem { - __be32 ip; - struct ip_set_counter counter; -}; - -struct hash_ip4ct_elem { - __be32 ip; - struct ip_set_counter counter; - unsigned long timeout; -}; - /* Common functions */ static inline bool @@ -99,7 +84,7 @@ hash_ip4_kadt(struct ip_set *set, const struct sk_buff *skb, const struct hash_ip *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_ip4_elem e = {}; - struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, h); + struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); __be32 ip; ip4addrptr(skb, opt->flags & IPSET_DIM_ONE_SRC, &ip); @@ -118,8 +103,8 @@ hash_ip4_uadt(struct ip_set *set, struct nlattr *tb[], const struct hash_ip *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_ip4_elem e = {}; - struct ip_set_ext ext = IP_SET_INIT_UEXT(h); - u32 ip, ip_to, hosts; + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); + u32 ip = 0, ip_to = 0, hosts; int ret = 0; if (unlikely(!tb[IPSET_ATTR_IP] || @@ -178,29 +163,13 @@ hash_ip4_uadt(struct ip_set *set, struct nlattr *tb[], return ret; } -/* IPv6 variants */ +/* IPv6 variant */ /* Member elements */ struct hash_ip6_elem { union nf_inet_addr ip; }; -struct hash_ip6t_elem { - union nf_inet_addr ip; - unsigned long timeout; -}; - -struct hash_ip6c_elem { - union nf_inet_addr ip; - struct ip_set_counter counter; -}; - -struct hash_ip6ct_elem { - union nf_inet_addr ip; - struct ip_set_counter counter; - unsigned long timeout; -}; - /* Common functions */ static inline bool @@ -253,7 +222,7 @@ hash_ip6_kadt(struct ip_set *set, const struct sk_buff *skb, const struct hash_ip *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_ip6_elem e = {}; - struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, h); + struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); ip6addrptr(skb, opt->flags & IPSET_DIM_ONE_SRC, &e.ip.in6); hash_ip6_netmask(&e.ip, h->netmask); @@ -270,7 +239,7 @@ hash_ip6_uadt(struct ip_set *set, struct nlattr *tb[], const struct hash_ip *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_ip6_elem e = {}; - struct ip_set_ext ext = IP_SET_INIT_UEXT(h); + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); int ret; if (unlikely(!tb[IPSET_ATTR_IP] || @@ -304,8 +273,8 @@ static struct ip_set_type hash_ip_type __read_mostly = { .features = IPSET_TYPE_IP, .dimension = IPSET_DIM_ONE, .family = NFPROTO_UNSPEC, - .revision_min = REVISION_MIN, - .revision_max = REVISION_MAX, + .revision_min = IPSET_TYPE_REV_MIN, + .revision_max = IPSET_TYPE_REV_MAX, .create = hash_ip_create, .create_policy = { [IPSET_ATTR_HASHSIZE] = { .type = NLA_U32 }, @@ -324,6 +293,7 @@ static struct ip_set_type hash_ip_type __read_mostly = { [IPSET_ATTR_LINENO] = { .type = NLA_U32 }, [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, + [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING }, }, .me = THIS_MODULE, }; diff --git a/net/netfilter/ipset/ip_set_hash_ipport.c b/net/netfilter/ipset/ip_set_hash_ipport.c index 7a2d2bd98d04..525a595dd1fe 100644 --- a/net/netfilter/ipset/ip_set_hash_ipport.c +++ b/net/netfilter/ipset/ip_set_hash_ipport.c @@ -24,19 +24,20 @@ #include <linux/netfilter/ipset/ip_set_getport.h> #include <linux/netfilter/ipset/ip_set_hash.h> -#define REVISION_MIN 0 -/* 1 SCTP and UDPLITE support added */ -#define REVISION_MAX 2 /* Counters support added */ +#define IPSET_TYPE_REV_MIN 0 +/* 1 SCTP and UDPLITE support added */ +/* 2 Counters support added */ +#define IPSET_TYPE_REV_MAX 3 /* Comments support added */ MODULE_LICENSE("GPL"); MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>"); -IP_SET_MODULE_DESC("hash:ip,port", REVISION_MIN, REVISION_MAX); +IP_SET_MODULE_DESC("hash:ip,port", IPSET_TYPE_REV_MIN, IPSET_TYPE_REV_MAX); MODULE_ALIAS("ip_set_hash:ip,port"); /* Type specific function prefix */ #define HTYPE hash_ipport -/* IPv4 variants */ +/* IPv4 variant */ /* Member elements */ struct hash_ipport4_elem { @@ -46,31 +47,6 @@ struct hash_ipport4_elem { u8 padding; }; -struct hash_ipport4t_elem { - __be32 ip; - __be16 port; - u8 proto; - u8 padding; - unsigned long timeout; -}; - -struct hash_ipport4c_elem { - __be32 ip; - __be16 port; - u8 proto; - u8 padding; - struct ip_set_counter counter; -}; - -struct hash_ipport4ct_elem { - __be32 ip; - __be16 port; - u8 proto; - u8 padding; - struct ip_set_counter counter; - unsigned long timeout; -}; - /* Common functions */ static inline bool @@ -116,10 +92,9 @@ hash_ipport4_kadt(struct ip_set *set, const struct sk_buff *skb, const struct xt_action_param *par, enum ipset_adt adt, struct ip_set_adt_opt *opt) { - const struct hash_ipport *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_ipport4_elem e = { }; - struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, h); + struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); if (!ip_set_get_ip4_port(skb, opt->flags & IPSET_DIM_TWO_SRC, &e.port, &e.proto)) @@ -136,8 +111,8 @@ hash_ipport4_uadt(struct ip_set *set, struct nlattr *tb[], const struct hash_ipport *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_ipport4_elem e = { }; - struct ip_set_ext ext = IP_SET_INIT_UEXT(h); - u32 ip, ip_to, p = 0, port, port_to; + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); + u32 ip, ip_to = 0, p = 0, port, port_to; bool with_ports = false; int ret; @@ -222,7 +197,7 @@ hash_ipport4_uadt(struct ip_set *set, struct nlattr *tb[], return ret; } -/* IPv6 variants */ +/* IPv6 variant */ struct hash_ipport6_elem { union nf_inet_addr ip; @@ -231,31 +206,6 @@ struct hash_ipport6_elem { u8 padding; }; -struct hash_ipport6t_elem { - union nf_inet_addr ip; - __be16 port; - u8 proto; - u8 padding; - unsigned long timeout; -}; - -struct hash_ipport6c_elem { - union nf_inet_addr ip; - __be16 port; - u8 proto; - u8 padding; - struct ip_set_counter counter; -}; - -struct hash_ipport6ct_elem { - union nf_inet_addr ip; - __be16 port; - u8 proto; - u8 padding; - struct ip_set_counter counter; - unsigned long timeout; -}; - /* Common functions */ static inline bool @@ -306,10 +256,9 @@ hash_ipport6_kadt(struct ip_set *set, const struct sk_buff *skb, const struct xt_action_param *par, enum ipset_adt adt, struct ip_set_adt_opt *opt) { - const struct hash_ipport *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_ipport6_elem e = { }; - struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, h); + struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); if (!ip_set_get_ip6_port(skb, opt->flags & IPSET_DIM_TWO_SRC, &e.port, &e.proto)) @@ -326,7 +275,7 @@ hash_ipport6_uadt(struct ip_set *set, struct nlattr *tb[], const struct hash_ipport *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_ipport6_elem e = { }; - struct ip_set_ext ext = IP_SET_INIT_UEXT(h); + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); u32 port, port_to; bool with_ports = false; int ret; @@ -396,8 +345,8 @@ static struct ip_set_type hash_ipport_type __read_mostly = { .features = IPSET_TYPE_IP | IPSET_TYPE_PORT, .dimension = IPSET_DIM_TWO, .family = NFPROTO_UNSPEC, - .revision_min = REVISION_MIN, - .revision_max = REVISION_MAX, + .revision_min = IPSET_TYPE_REV_MIN, + .revision_max = IPSET_TYPE_REV_MAX, .create = hash_ipport_create, .create_policy = { [IPSET_ATTR_HASHSIZE] = { .type = NLA_U32 }, @@ -419,6 +368,7 @@ static struct ip_set_type hash_ipport_type __read_mostly = { [IPSET_ATTR_LINENO] = { .type = NLA_U32 }, [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, + [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING }, }, .me = THIS_MODULE, }; diff --git a/net/netfilter/ipset/ip_set_hash_ipportip.c b/net/netfilter/ipset/ip_set_hash_ipportip.c index 34e8a1acce42..f5636631466e 100644 --- a/net/netfilter/ipset/ip_set_hash_ipportip.c +++ b/net/netfilter/ipset/ip_set_hash_ipportip.c @@ -24,19 +24,20 @@ #include <linux/netfilter/ipset/ip_set_getport.h> #include <linux/netfilter/ipset/ip_set_hash.h> -#define REVISION_MIN 0 -/* 1 SCTP and UDPLITE support added */ -#define REVISION_MAX 2 /* Counters support added */ +#define IPSET_TYPE_REV_MIN 0 +/* 1 SCTP and UDPLITE support added */ +/* 2 Counters support added */ +#define IPSET_TYPE_REV_MAX 3 /* Comments support added */ MODULE_LICENSE("GPL"); MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>"); -IP_SET_MODULE_DESC("hash:ip,port,ip", REVISION_MIN, REVISION_MAX); +IP_SET_MODULE_DESC("hash:ip,port,ip", IPSET_TYPE_REV_MIN, IPSET_TYPE_REV_MAX); MODULE_ALIAS("ip_set_hash:ip,port,ip"); /* Type specific function prefix */ #define HTYPE hash_ipportip -/* IPv4 variants */ +/* IPv4 variant */ /* Member elements */ struct hash_ipportip4_elem { @@ -47,34 +48,6 @@ struct hash_ipportip4_elem { u8 padding; }; -struct hash_ipportip4t_elem { - __be32 ip; - __be32 ip2; - __be16 port; - u8 proto; - u8 padding; - unsigned long timeout; -}; - -struct hash_ipportip4c_elem { - __be32 ip; - __be32 ip2; - __be16 port; - u8 proto; - u8 padding; - struct ip_set_counter counter; -}; - -struct hash_ipportip4ct_elem { - __be32 ip; - __be32 ip2; - __be16 port; - u8 proto; - u8 padding; - struct ip_set_counter counter; - unsigned long timeout; -}; - static inline bool hash_ipportip4_data_equal(const struct hash_ipportip4_elem *ip1, const struct hash_ipportip4_elem *ip2, @@ -120,10 +93,9 @@ hash_ipportip4_kadt(struct ip_set *set, const struct sk_buff *skb, const struct xt_action_param *par, enum ipset_adt adt, struct ip_set_adt_opt *opt) { - const struct hash_ipportip *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_ipportip4_elem e = { }; - struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, h); + struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); if (!ip_set_get_ip4_port(skb, opt->flags & IPSET_DIM_TWO_SRC, &e.port, &e.proto)) @@ -141,8 +113,8 @@ hash_ipportip4_uadt(struct ip_set *set, struct nlattr *tb[], const struct hash_ipportip *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_ipportip4_elem e = { }; - struct ip_set_ext ext = IP_SET_INIT_UEXT(h); - u32 ip, ip_to, p = 0, port, port_to; + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); + u32 ip, ip_to = 0, p = 0, port, port_to; bool with_ports = false; int ret; @@ -231,7 +203,7 @@ hash_ipportip4_uadt(struct ip_set *set, struct nlattr *tb[], return ret; } -/* IPv6 variants */ +/* IPv6 variant */ struct hash_ipportip6_elem { union nf_inet_addr ip; @@ -241,34 +213,6 @@ struct hash_ipportip6_elem { u8 padding; }; -struct hash_ipportip6t_elem { - union nf_inet_addr ip; - union nf_inet_addr ip2; - __be16 port; - u8 proto; - u8 padding; - unsigned long timeout; -}; - -struct hash_ipportip6c_elem { - union nf_inet_addr ip; - union nf_inet_addr ip2; - __be16 port; - u8 proto; - u8 padding; - struct ip_set_counter counter; -}; - -struct hash_ipportip6ct_elem { - union nf_inet_addr ip; - union nf_inet_addr ip2; - __be16 port; - u8 proto; - u8 padding; - struct ip_set_counter counter; - unsigned long timeout; -}; - /* Common functions */ static inline bool @@ -319,10 +263,9 @@ hash_ipportip6_kadt(struct ip_set *set, const struct sk_buff *skb, const struct xt_action_param *par, enum ipset_adt adt, struct ip_set_adt_opt *opt) { - const struct hash_ipportip *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_ipportip6_elem e = { }; - struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, h); + struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); if (!ip_set_get_ip6_port(skb, opt->flags & IPSET_DIM_TWO_SRC, &e.port, &e.proto)) @@ -340,7 +283,7 @@ hash_ipportip6_uadt(struct ip_set *set, struct nlattr *tb[], const struct hash_ipportip *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_ipportip6_elem e = { }; - struct ip_set_ext ext = IP_SET_INIT_UEXT(h); + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); u32 port, port_to; bool with_ports = false; int ret; @@ -414,8 +357,8 @@ static struct ip_set_type hash_ipportip_type __read_mostly = { .features = IPSET_TYPE_IP | IPSET_TYPE_PORT | IPSET_TYPE_IP2, .dimension = IPSET_DIM_THREE, .family = NFPROTO_UNSPEC, - .revision_min = REVISION_MIN, - .revision_max = REVISION_MAX, + .revision_min = IPSET_TYPE_REV_MIN, + .revision_max = IPSET_TYPE_REV_MAX, .create = hash_ipportip_create, .create_policy = { [IPSET_ATTR_HASHSIZE] = { .type = NLA_U32 }, @@ -437,6 +380,7 @@ static struct ip_set_type hash_ipportip_type __read_mostly = { [IPSET_ATTR_LINENO] = { .type = NLA_U32 }, [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, + [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING }, }, .me = THIS_MODULE, }; diff --git a/net/netfilter/ipset/ip_set_hash_ipportnet.c b/net/netfilter/ipset/ip_set_hash_ipportnet.c index f15f3e28b9c3..5d87fe8a41ff 100644 --- a/net/netfilter/ipset/ip_set_hash_ipportnet.c +++ b/net/netfilter/ipset/ip_set_hash_ipportnet.c @@ -24,15 +24,16 @@ #include <linux/netfilter/ipset/ip_set_getport.h> #include <linux/netfilter/ipset/ip_set_hash.h> -#define REVISION_MIN 0 -/* 1 SCTP and UDPLITE support added */ -/* 2 Range as input support for IPv4 added */ -/* 3 nomatch flag support added */ -#define REVISION_MAX 4 /* Counters support added */ +#define IPSET_TYPE_REV_MIN 0 +/* 1 SCTP and UDPLITE support added */ +/* 2 Range as input support for IPv4 added */ +/* 3 nomatch flag support added */ +/* 4 Counters support added */ +#define IPSET_TYPE_REV_MAX 5 /* Comments support added */ MODULE_LICENSE("GPL"); MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>"); -IP_SET_MODULE_DESC("hash:ip,port,net", REVISION_MIN, REVISION_MAX); +IP_SET_MODULE_DESC("hash:ip,port,net", IPSET_TYPE_REV_MIN, IPSET_TYPE_REV_MAX); MODULE_ALIAS("ip_set_hash:ip,port,net"); /* Type specific function prefix */ @@ -46,7 +47,7 @@ MODULE_ALIAS("ip_set_hash:ip,port,net"); #define IP_SET_HASH_WITH_PROTO #define IP_SET_HASH_WITH_NETS -/* IPv4 variants */ +/* IPv4 variant */ /* Member elements */ struct hash_ipportnet4_elem { @@ -58,37 +59,6 @@ struct hash_ipportnet4_elem { u8 proto; }; -struct hash_ipportnet4t_elem { - __be32 ip; - __be32 ip2; - __be16 port; - u8 cidr:7; - u8 nomatch:1; - u8 proto; - unsigned long timeout; -}; - -struct hash_ipportnet4c_elem { - __be32 ip; - __be32 ip2; - __be16 port; - u8 cidr:7; - u8 nomatch:1; - u8 proto; - struct ip_set_counter counter; -}; - -struct hash_ipportnet4ct_elem { - __be32 ip; - __be32 ip2; - __be16 port; - u8 cidr:7; - u8 nomatch:1; - u8 proto; - struct ip_set_counter counter; - unsigned long timeout; -}; - /* Common functions */ static inline bool @@ -170,9 +140,9 @@ hash_ipportnet4_kadt(struct ip_set *set, const struct sk_buff *skb, const struct hash_ipportnet *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_ipportnet4_elem e = { - .cidr = h->nets[0].cidr ? h->nets[0].cidr - 1 : HOST_MASK - 1 + .cidr = IP_SET_INIT_CIDR(h->nets[0].cidr[0], HOST_MASK) - 1, }; - struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, h); + struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); if (adt == IPSET_TEST) e.cidr = HOST_MASK - 1; @@ -195,9 +165,9 @@ hash_ipportnet4_uadt(struct ip_set *set, struct nlattr *tb[], const struct hash_ipportnet *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_ipportnet4_elem e = { .cidr = HOST_MASK - 1 }; - struct ip_set_ext ext = IP_SET_INIT_UEXT(h); - u32 ip, ip_to, p = 0, port, port_to; - u32 ip2_from, ip2_to, ip2_last, ip2; + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); + u32 ip = 0, ip_to = 0, p = 0, port, port_to; + u32 ip2_from = 0, ip2_to = 0, ip2_last, ip2; bool with_ports = false; u8 cidr; int ret; @@ -272,7 +242,7 @@ hash_ipportnet4_uadt(struct ip_set *set, struct nlattr *tb[], if (ip > ip_to) swap(ip, ip_to); } else if (tb[IPSET_ATTR_CIDR]) { - u8 cidr = nla_get_u8(tb[IPSET_ATTR_CIDR]); + cidr = nla_get_u8(tb[IPSET_ATTR_CIDR]); if (!cidr || cidr > 32) return -IPSET_ERR_INVALID_CIDR; @@ -306,9 +276,9 @@ hash_ipportnet4_uadt(struct ip_set *set, struct nlattr *tb[], : port; for (; p <= port_to; p++) { e.port = htons(p); - ip2 = retried - && ip == ntohl(h->next.ip) - && p == ntohs(h->next.port) + ip2 = retried && + ip == ntohl(h->next.ip) && + p == ntohs(h->next.port) ? ntohl(h->next.ip2) : ip2_from; while (!after(ip2, ip2_to)) { e.ip2 = htonl(ip2); @@ -328,7 +298,7 @@ hash_ipportnet4_uadt(struct ip_set *set, struct nlattr *tb[], return ret; } -/* IPv6 variants */ +/* IPv6 variant */ struct hash_ipportnet6_elem { union nf_inet_addr ip; @@ -339,37 +309,6 @@ struct hash_ipportnet6_elem { u8 proto; }; -struct hash_ipportnet6t_elem { - union nf_inet_addr ip; - union nf_inet_addr ip2; - __be16 port; - u8 cidr:7; - u8 nomatch:1; - u8 proto; - unsigned long timeout; -}; - -struct hash_ipportnet6c_elem { - union nf_inet_addr ip; - union nf_inet_addr ip2; - __be16 port; - u8 cidr:7; - u8 nomatch:1; - u8 proto; - struct ip_set_counter counter; -}; - -struct hash_ipportnet6ct_elem { - union nf_inet_addr ip; - union nf_inet_addr ip2; - __be16 port; - u8 cidr:7; - u8 nomatch:1; - u8 proto; - struct ip_set_counter counter; - unsigned long timeout; -}; - /* Common functions */ static inline bool @@ -454,9 +393,9 @@ hash_ipportnet6_kadt(struct ip_set *set, const struct sk_buff *skb, const struct hash_ipportnet *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_ipportnet6_elem e = { - .cidr = h->nets[0].cidr ? h->nets[0].cidr - 1 : HOST_MASK - 1 + .cidr = IP_SET_INIT_CIDR(h->nets[0].cidr[0], HOST_MASK) - 1, }; - struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, h); + struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); if (adt == IPSET_TEST) e.cidr = HOST_MASK - 1; @@ -479,7 +418,7 @@ hash_ipportnet6_uadt(struct ip_set *set, struct nlattr *tb[], const struct hash_ipportnet *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_ipportnet6_elem e = { .cidr = HOST_MASK - 1 }; - struct ip_set_ext ext = IP_SET_INIT_UEXT(h); + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); u32 port, port_to; bool with_ports = false; u8 cidr; @@ -574,8 +513,8 @@ static struct ip_set_type hash_ipportnet_type __read_mostly = { IPSET_TYPE_NOMATCH, .dimension = IPSET_DIM_THREE, .family = NFPROTO_UNSPEC, - .revision_min = REVISION_MIN, - .revision_max = REVISION_MAX, + .revision_min = IPSET_TYPE_REV_MIN, + .revision_max = IPSET_TYPE_REV_MAX, .create = hash_ipportnet_create, .create_policy = { [IPSET_ATTR_HASHSIZE] = { .type = NLA_U32 }, @@ -600,6 +539,7 @@ static struct ip_set_type hash_ipportnet_type __read_mostly = { [IPSET_ATTR_LINENO] = { .type = NLA_U32 }, [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, + [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING }, }, .me = THIS_MODULE, }; diff --git a/net/netfilter/ipset/ip_set_hash_net.c b/net/netfilter/ipset/ip_set_hash_net.c index 223e9f546d0f..8295cf4f9fdc 100644 --- a/net/netfilter/ipset/ip_set_hash_net.c +++ b/net/netfilter/ipset/ip_set_hash_net.c @@ -22,21 +22,22 @@ #include <linux/netfilter/ipset/ip_set.h> #include <linux/netfilter/ipset/ip_set_hash.h> -#define REVISION_MIN 0 -/* 1 Range as input support for IPv4 added */ -/* 2 nomatch flag support added */ -#define REVISION_MAX 3 /* Counters support added */ +#define IPSET_TYPE_REV_MIN 0 +/* 1 Range as input support for IPv4 added */ +/* 2 nomatch flag support added */ +/* 3 Counters support added */ +#define IPSET_TYPE_REV_MAX 4 /* Comments support added */ MODULE_LICENSE("GPL"); MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>"); -IP_SET_MODULE_DESC("hash:net", REVISION_MIN, REVISION_MAX); +IP_SET_MODULE_DESC("hash:net", IPSET_TYPE_REV_MIN, IPSET_TYPE_REV_MAX); MODULE_ALIAS("ip_set_hash:net"); /* Type specific function prefix */ #define HTYPE hash_net #define IP_SET_HASH_WITH_NETS -/* IPv4 variants */ +/* IPv4 variant */ /* Member elements */ struct hash_net4_elem { @@ -46,31 +47,6 @@ struct hash_net4_elem { u8 cidr; }; -struct hash_net4t_elem { - __be32 ip; - u16 padding0; - u8 nomatch; - u8 cidr; - unsigned long timeout; -}; - -struct hash_net4c_elem { - __be32 ip; - u16 padding0; - u8 nomatch; - u8 cidr; - struct ip_set_counter counter; -}; - -struct hash_net4ct_elem { - __be32 ip; - u16 padding0; - u8 nomatch; - u8 cidr; - struct ip_set_counter counter; - unsigned long timeout; -}; - /* Common functions */ static inline bool @@ -143,9 +119,9 @@ hash_net4_kadt(struct ip_set *set, const struct sk_buff *skb, const struct hash_net *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_net4_elem e = { - .cidr = h->nets[0].cidr ? h->nets[0].cidr : HOST_MASK + .cidr = IP_SET_INIT_CIDR(h->nets[0].cidr[0], HOST_MASK), }; - struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, h); + struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); if (e.cidr == 0) return -EINVAL; @@ -165,8 +141,8 @@ hash_net4_uadt(struct ip_set *set, struct nlattr *tb[], const struct hash_net *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_net4_elem e = { .cidr = HOST_MASK }; - struct ip_set_ext ext = IP_SET_INIT_UEXT(h); - u32 ip = 0, ip_to, last; + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); + u32 ip = 0, ip_to = 0, last; int ret; if (unlikely(!tb[IPSET_ATTR_IP] || @@ -228,7 +204,7 @@ hash_net4_uadt(struct ip_set *set, struct nlattr *tb[], return ret; } -/* IPv6 variants */ +/* IPv6 variant */ struct hash_net6_elem { union nf_inet_addr ip; @@ -237,31 +213,6 @@ struct hash_net6_elem { u8 cidr; }; -struct hash_net6t_elem { - union nf_inet_addr ip; - u16 padding0; - u8 nomatch; - u8 cidr; - unsigned long timeout; -}; - -struct hash_net6c_elem { - union nf_inet_addr ip; - u16 padding0; - u8 nomatch; - u8 cidr; - struct ip_set_counter counter; -}; - -struct hash_net6ct_elem { - union nf_inet_addr ip; - u16 padding0; - u8 nomatch; - u8 cidr; - struct ip_set_counter counter; - unsigned long timeout; -}; - /* Common functions */ static inline bool @@ -338,9 +289,9 @@ hash_net6_kadt(struct ip_set *set, const struct sk_buff *skb, const struct hash_net *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_net6_elem e = { - .cidr = h->nets[0].cidr ? h->nets[0].cidr : HOST_MASK + .cidr = IP_SET_INIT_CIDR(h->nets[0].cidr[0], HOST_MASK), }; - struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, h); + struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); if (e.cidr == 0) return -EINVAL; @@ -357,10 +308,9 @@ static int hash_net6_uadt(struct ip_set *set, struct nlattr *tb[], enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) { - const struct hash_net *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_net6_elem e = { .cidr = HOST_MASK }; - struct ip_set_ext ext = IP_SET_INIT_UEXT(h); + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); int ret; if (unlikely(!tb[IPSET_ATTR_IP] || @@ -406,8 +356,8 @@ static struct ip_set_type hash_net_type __read_mostly = { .features = IPSET_TYPE_IP | IPSET_TYPE_NOMATCH, .dimension = IPSET_DIM_ONE, .family = NFPROTO_UNSPEC, - .revision_min = REVISION_MIN, - .revision_max = REVISION_MAX, + .revision_min = IPSET_TYPE_REV_MIN, + .revision_max = IPSET_TYPE_REV_MAX, .create = hash_net_create, .create_policy = { [IPSET_ATTR_HASHSIZE] = { .type = NLA_U32 }, @@ -425,6 +375,7 @@ static struct ip_set_type hash_net_type __read_mostly = { [IPSET_ATTR_CADT_FLAGS] = { .type = NLA_U32 }, [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, + [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING }, }, .me = THIS_MODULE, }; diff --git a/net/netfilter/ipset/ip_set_hash_netiface.c b/net/netfilter/ipset/ip_set_hash_netiface.c index 7d798d5d5cd3..3f64a66bf5d9 100644 --- a/net/netfilter/ipset/ip_set_hash_netiface.c +++ b/net/netfilter/ipset/ip_set_hash_netiface.c @@ -23,14 +23,15 @@ #include <linux/netfilter/ipset/ip_set.h> #include <linux/netfilter/ipset/ip_set_hash.h> -#define REVISION_MIN 0 -/* 1 nomatch flag support added */ -/* 2 /0 support added */ -#define REVISION_MAX 3 /* Counters support added */ +#define IPSET_TYPE_REV_MIN 0 +/* 1 nomatch flag support added */ +/* 2 /0 support added */ +/* 3 Counters support added */ +#define IPSET_TYPE_REV_MAX 4 /* Comments support added */ MODULE_LICENSE("GPL"); MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>"); -IP_SET_MODULE_DESC("hash:net,iface", REVISION_MIN, REVISION_MAX); +IP_SET_MODULE_DESC("hash:net,iface", IPSET_TYPE_REV_MIN, IPSET_TYPE_REV_MAX); MODULE_ALIAS("ip_set_hash:net,iface"); /* Interface name rbtree */ @@ -134,7 +135,7 @@ iface_add(struct rb_root *root, const char **iface) #define STREQ(a, b) (strcmp(a, b) == 0) -/* IPv4 variants */ +/* IPv4 variant */ struct hash_netiface4_elem_hashed { __be32 ip; @@ -144,7 +145,7 @@ struct hash_netiface4_elem_hashed { u8 elem; }; -/* Member elements without timeout */ +/* Member elements */ struct hash_netiface4_elem { __be32 ip; u8 physdev; @@ -154,37 +155,6 @@ struct hash_netiface4_elem { const char *iface; }; -struct hash_netiface4t_elem { - __be32 ip; - u8 physdev; - u8 cidr; - u8 nomatch; - u8 elem; - const char *iface; - unsigned long timeout; -}; - -struct hash_netiface4c_elem { - __be32 ip; - u8 physdev; - u8 cidr; - u8 nomatch; - u8 elem; - const char *iface; - struct ip_set_counter counter; -}; - -struct hash_netiface4ct_elem { - __be32 ip; - u8 physdev; - u8 cidr; - u8 nomatch; - u8 elem; - const char *iface; - struct ip_set_counter counter; - unsigned long timeout; -}; - /* Common functions */ static inline bool @@ -265,10 +235,10 @@ hash_netiface4_kadt(struct ip_set *set, const struct sk_buff *skb, struct hash_netiface *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_netiface4_elem e = { - .cidr = h->nets[0].cidr ? h->nets[0].cidr : HOST_MASK, + .cidr = IP_SET_INIT_CIDR(h->nets[0].cidr[0], HOST_MASK), .elem = 1, }; - struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, h); + struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); int ret; if (e.cidr == 0) @@ -319,8 +289,8 @@ hash_netiface4_uadt(struct ip_set *set, struct nlattr *tb[], struct hash_netiface *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_netiface4_elem e = { .cidr = HOST_MASK, .elem = 1 }; - struct ip_set_ext ext = IP_SET_INIT_UEXT(h); - u32 ip = 0, ip_to, last; + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); + u32 ip = 0, ip_to = 0, last; char iface[IFNAMSIZ]; int ret; @@ -399,7 +369,7 @@ hash_netiface4_uadt(struct ip_set *set, struct nlattr *tb[], return ret; } -/* IPv6 variants */ +/* IPv6 variant */ struct hash_netiface6_elem_hashed { union nf_inet_addr ip; @@ -418,37 +388,6 @@ struct hash_netiface6_elem { const char *iface; }; -struct hash_netiface6t_elem { - union nf_inet_addr ip; - u8 physdev; - u8 cidr; - u8 nomatch; - u8 elem; - const char *iface; - unsigned long timeout; -}; - -struct hash_netiface6c_elem { - union nf_inet_addr ip; - u8 physdev; - u8 cidr; - u8 nomatch; - u8 elem; - const char *iface; - struct ip_set_counter counter; -}; - -struct hash_netiface6ct_elem { - union nf_inet_addr ip; - u8 physdev; - u8 cidr; - u8 nomatch; - u8 elem; - const char *iface; - struct ip_set_counter counter; - unsigned long timeout; -}; - /* Common functions */ static inline bool @@ -534,10 +473,10 @@ hash_netiface6_kadt(struct ip_set *set, const struct sk_buff *skb, struct hash_netiface *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_netiface6_elem e = { - .cidr = h->nets[0].cidr ? h->nets[0].cidr : HOST_MASK, + .cidr = IP_SET_INIT_CIDR(h->nets[0].cidr[0], HOST_MASK), .elem = 1, }; - struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, h); + struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); int ret; if (e.cidr == 0) @@ -584,7 +523,7 @@ hash_netiface6_uadt(struct ip_set *set, struct nlattr *tb[], struct hash_netiface *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_netiface6_elem e = { .cidr = HOST_MASK, .elem = 1 }; - struct ip_set_ext ext = IP_SET_INIT_UEXT(h); + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); char iface[IFNAMSIZ]; int ret; @@ -645,8 +584,8 @@ static struct ip_set_type hash_netiface_type __read_mostly = { IPSET_TYPE_NOMATCH, .dimension = IPSET_DIM_TWO, .family = NFPROTO_UNSPEC, - .revision_min = REVISION_MIN, - .revision_max = REVISION_MAX, + .revision_min = IPSET_TYPE_REV_MIN, + .revision_max = IPSET_TYPE_REV_MAX, .create = hash_netiface_create, .create_policy = { [IPSET_ATTR_HASHSIZE] = { .type = NLA_U32 }, @@ -668,6 +607,7 @@ static struct ip_set_type hash_netiface_type __read_mostly = { [IPSET_ATTR_LINENO] = { .type = NLA_U32 }, [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, + [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING }, }, .me = THIS_MODULE, }; diff --git a/net/netfilter/ipset/ip_set_hash_netnet.c b/net/netfilter/ipset/ip_set_hash_netnet.c new file mode 100644 index 000000000000..2bc2dec20b00 --- /dev/null +++ b/net/netfilter/ipset/ip_set_hash_netnet.c @@ -0,0 +1,481 @@ +/* Copyright (C) 2003-2013 Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> + * Copyright (C) 2013 Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +/* Kernel module implementing an IP set type: the hash:net type */ + +#include <linux/jhash.h> +#include <linux/module.h> +#include <linux/ip.h> +#include <linux/skbuff.h> +#include <linux/errno.h> +#include <linux/random.h> +#include <net/ip.h> +#include <net/ipv6.h> +#include <net/netlink.h> + +#include <linux/netfilter.h> +#include <linux/netfilter/ipset/pfxlen.h> +#include <linux/netfilter/ipset/ip_set.h> +#include <linux/netfilter/ipset/ip_set_hash.h> + +#define IPSET_TYPE_REV_MIN 0 +#define IPSET_TYPE_REV_MAX 0 + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>"); +IP_SET_MODULE_DESC("hash:net,net", IPSET_TYPE_REV_MIN, IPSET_TYPE_REV_MAX); +MODULE_ALIAS("ip_set_hash:net,net"); + +/* Type specific function prefix */ +#define HTYPE hash_netnet +#define IP_SET_HASH_WITH_NETS +#define IPSET_NET_COUNT 2 + +/* IPv4 variants */ + +/* Member elements */ +struct hash_netnet4_elem { + union { + __be32 ip[2]; + __be64 ipcmp; + }; + u8 nomatch; + union { + u8 cidr[2]; + u16 ccmp; + }; +}; + +/* Common functions */ + +static inline bool +hash_netnet4_data_equal(const struct hash_netnet4_elem *ip1, + const struct hash_netnet4_elem *ip2, + u32 *multi) +{ + return ip1->ipcmp == ip2->ipcmp && + ip2->ccmp == ip2->ccmp; +} + +static inline int +hash_netnet4_do_data_match(const struct hash_netnet4_elem *elem) +{ + return elem->nomatch ? -ENOTEMPTY : 1; +} + +static inline void +hash_netnet4_data_set_flags(struct hash_netnet4_elem *elem, u32 flags) +{ + elem->nomatch = (flags >> 16) & IPSET_FLAG_NOMATCH; +} + +static inline void +hash_netnet4_data_reset_flags(struct hash_netnet4_elem *elem, u8 *flags) +{ + swap(*flags, elem->nomatch); +} + +static inline void +hash_netnet4_data_reset_elem(struct hash_netnet4_elem *elem, + struct hash_netnet4_elem *orig) +{ + elem->ip[1] = orig->ip[1]; +} + +static inline void +hash_netnet4_data_netmask(struct hash_netnet4_elem *elem, u8 cidr, bool inner) +{ + if (inner) { + elem->ip[1] &= ip_set_netmask(cidr); + elem->cidr[1] = cidr; + } else { + elem->ip[0] &= ip_set_netmask(cidr); + elem->cidr[0] = cidr; + } +} + +static bool +hash_netnet4_data_list(struct sk_buff *skb, + const struct hash_netnet4_elem *data) +{ + u32 flags = data->nomatch ? IPSET_FLAG_NOMATCH : 0; + + if (nla_put_ipaddr4(skb, IPSET_ATTR_IP, data->ip[0]) || + nla_put_ipaddr4(skb, IPSET_ATTR_IP2, data->ip[1]) || + nla_put_u8(skb, IPSET_ATTR_CIDR, data->cidr[0]) || + nla_put_u8(skb, IPSET_ATTR_CIDR2, data->cidr[1]) || + (flags && + nla_put_net32(skb, IPSET_ATTR_CADT_FLAGS, htonl(flags)))) + goto nla_put_failure; + return 0; + +nla_put_failure: + return 1; +} + +static inline void +hash_netnet4_data_next(struct hash_netnet4_elem *next, + const struct hash_netnet4_elem *d) +{ + next->ipcmp = d->ipcmp; +} + +#define MTYPE hash_netnet4 +#define PF 4 +#define HOST_MASK 32 +#include "ip_set_hash_gen.h" + +static int +hash_netnet4_kadt(struct ip_set *set, const struct sk_buff *skb, + const struct xt_action_param *par, + enum ipset_adt adt, struct ip_set_adt_opt *opt) +{ + const struct hash_netnet *h = set->data; + ipset_adtfn adtfn = set->variant->adt[adt]; + struct hash_netnet4_elem e = { }; + struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); + + e.cidr[0] = IP_SET_INIT_CIDR(h->nets[0].cidr[0], HOST_MASK); + e.cidr[1] = IP_SET_INIT_CIDR(h->nets[0].cidr[1], HOST_MASK); + if (adt == IPSET_TEST) + e.ccmp = (HOST_MASK << (sizeof(e.cidr[0]) * 8)) | HOST_MASK; + + ip4addrptr(skb, opt->flags & IPSET_DIM_ONE_SRC, &e.ip[0]); + ip4addrptr(skb, opt->flags & IPSET_DIM_TWO_SRC, &e.ip[1]); + e.ip[0] &= ip_set_netmask(e.cidr[0]); + e.ip[1] &= ip_set_netmask(e.cidr[1]); + + return adtfn(set, &e, &ext, &opt->ext, opt->cmdflags); +} + +static int +hash_netnet4_uadt(struct ip_set *set, struct nlattr *tb[], + enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) +{ + const struct hash_netnet *h = set->data; + ipset_adtfn adtfn = set->variant->adt[adt]; + struct hash_netnet4_elem e = { }; + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); + u32 ip = 0, ip_to = 0, last; + u32 ip2 = 0, ip2_from = 0, ip2_to = 0, last2; + u8 cidr, cidr2; + int ret; + + e.cidr[0] = e.cidr[1] = HOST_MASK; + if (unlikely(!tb[IPSET_ATTR_IP] || !tb[IPSET_ATTR_IP2] || + !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || + !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) || + !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || + !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES))) + return -IPSET_ERR_PROTOCOL; + + if (tb[IPSET_ATTR_LINENO]) + *lineno = nla_get_u32(tb[IPSET_ATTR_LINENO]); + + ret = ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP], &ip) || + ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP2], &ip2_from) || + ip_set_get_extensions(set, tb, &ext); + if (ret) + return ret; + + if (tb[IPSET_ATTR_CIDR]) { + cidr = nla_get_u8(tb[IPSET_ATTR_CIDR]); + if (!cidr || cidr > HOST_MASK) + return -IPSET_ERR_INVALID_CIDR; + e.cidr[0] = cidr; + } + + if (tb[IPSET_ATTR_CIDR2]) { + cidr2 = nla_get_u8(tb[IPSET_ATTR_CIDR2]); + if (!cidr2 || cidr2 > HOST_MASK) + return -IPSET_ERR_INVALID_CIDR; + e.cidr[1] = cidr2; + } + + if (tb[IPSET_ATTR_CADT_FLAGS]) { + u32 cadt_flags = ip_set_get_h32(tb[IPSET_ATTR_CADT_FLAGS]); + if (cadt_flags & IPSET_FLAG_NOMATCH) + flags |= (IPSET_FLAG_NOMATCH << 16); + } + + if (adt == IPSET_TEST || !(tb[IPSET_ATTR_IP_TO] && + tb[IPSET_ATTR_IP2_TO])) { + e.ip[0] = htonl(ip & ip_set_hostmask(e.cidr[0])); + e.ip[1] = htonl(ip2_from & ip_set_hostmask(e.cidr[1])); + ret = adtfn(set, &e, &ext, &ext, flags); + return ip_set_enomatch(ret, flags, adt, set) ? -ret : + ip_set_eexist(ret, flags) ? 0 : ret; + } + + ip_to = ip; + if (tb[IPSET_ATTR_IP_TO]) { + ret = ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP_TO], &ip_to); + if (ret) + return ret; + if (ip_to < ip) + swap(ip, ip_to); + if (ip + UINT_MAX == ip_to) + return -IPSET_ERR_HASH_RANGE; + } + + ip2_to = ip2_from; + if (tb[IPSET_ATTR_IP2_TO]) { + ret = ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP2_TO], &ip2_to); + if (ret) + return ret; + if (ip2_to < ip2_from) + swap(ip2_from, ip2_to); + if (ip2_from + UINT_MAX == ip2_to) + return -IPSET_ERR_HASH_RANGE; + + } + + if (retried) + ip = ntohl(h->next.ip[0]); + + while (!after(ip, ip_to)) { + e.ip[0] = htonl(ip); + last = ip_set_range_to_cidr(ip, ip_to, &cidr); + e.cidr[0] = cidr; + ip2 = (retried && + ip == ntohl(h->next.ip[0])) ? ntohl(h->next.ip[1]) + : ip2_from; + while (!after(ip2, ip2_to)) { + e.ip[1] = htonl(ip2); + last2 = ip_set_range_to_cidr(ip2, ip2_to, &cidr2); + e.cidr[1] = cidr2; + ret = adtfn(set, &e, &ext, &ext, flags); + if (ret && !ip_set_eexist(ret, flags)) + return ret; + else + ret = 0; + ip2 = last2 + 1; + } + ip = last + 1; + } + return ret; +} + +/* IPv6 variants */ + +struct hash_netnet6_elem { + union nf_inet_addr ip[2]; + u8 nomatch; + union { + u8 cidr[2]; + u16 ccmp; + }; +}; + +/* Common functions */ + +static inline bool +hash_netnet6_data_equal(const struct hash_netnet6_elem *ip1, + const struct hash_netnet6_elem *ip2, + u32 *multi) +{ + return ipv6_addr_equal(&ip1->ip[0].in6, &ip2->ip[0].in6) && + ipv6_addr_equal(&ip1->ip[1].in6, &ip2->ip[1].in6) && + ip1->ccmp == ip2->ccmp; +} + +static inline int +hash_netnet6_do_data_match(const struct hash_netnet6_elem *elem) +{ + return elem->nomatch ? -ENOTEMPTY : 1; +} + +static inline void +hash_netnet6_data_set_flags(struct hash_netnet6_elem *elem, u32 flags) +{ + elem->nomatch = (flags >> 16) & IPSET_FLAG_NOMATCH; +} + +static inline void +hash_netnet6_data_reset_flags(struct hash_netnet6_elem *elem, u8 *flags) +{ + swap(*flags, elem->nomatch); +} + +static inline void +hash_netnet6_data_reset_elem(struct hash_netnet6_elem *elem, + struct hash_netnet6_elem *orig) +{ + elem->ip[1] = orig->ip[1]; +} + +static inline void +hash_netnet6_data_netmask(struct hash_netnet6_elem *elem, u8 cidr, bool inner) +{ + if (inner) { + ip6_netmask(&elem->ip[1], cidr); + elem->cidr[1] = cidr; + } else { + ip6_netmask(&elem->ip[0], cidr); + elem->cidr[0] = cidr; + } +} + +static bool +hash_netnet6_data_list(struct sk_buff *skb, + const struct hash_netnet6_elem *data) +{ + u32 flags = data->nomatch ? IPSET_FLAG_NOMATCH : 0; + + if (nla_put_ipaddr6(skb, IPSET_ATTR_IP, &data->ip[0].in6) || + nla_put_ipaddr6(skb, IPSET_ATTR_IP2, &data->ip[1].in6) || + nla_put_u8(skb, IPSET_ATTR_CIDR, data->cidr[0]) || + nla_put_u8(skb, IPSET_ATTR_CIDR2, data->cidr[1]) || + (flags && + nla_put_net32(skb, IPSET_ATTR_CADT_FLAGS, htonl(flags)))) + goto nla_put_failure; + return 0; + +nla_put_failure: + return 1; +} + +static inline void +hash_netnet6_data_next(struct hash_netnet4_elem *next, + const struct hash_netnet6_elem *d) +{ +} + +#undef MTYPE +#undef PF +#undef HOST_MASK + +#define MTYPE hash_netnet6 +#define PF 6 +#define HOST_MASK 128 +#define IP_SET_EMIT_CREATE +#include "ip_set_hash_gen.h" + +static int +hash_netnet6_kadt(struct ip_set *set, const struct sk_buff *skb, + const struct xt_action_param *par, + enum ipset_adt adt, struct ip_set_adt_opt *opt) +{ + const struct hash_netnet *h = set->data; + ipset_adtfn adtfn = set->variant->adt[adt]; + struct hash_netnet6_elem e = { }; + struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); + + e.cidr[0] = IP_SET_INIT_CIDR(h->nets[0].cidr[0], HOST_MASK); + e.cidr[1] = IP_SET_INIT_CIDR(h->nets[0].cidr[1], HOST_MASK); + if (adt == IPSET_TEST) + e.ccmp = (HOST_MASK << (sizeof(u8)*8)) | HOST_MASK; + + ip6addrptr(skb, opt->flags & IPSET_DIM_ONE_SRC, &e.ip[0].in6); + ip6addrptr(skb, opt->flags & IPSET_DIM_TWO_SRC, &e.ip[1].in6); + ip6_netmask(&e.ip[0], e.cidr[0]); + ip6_netmask(&e.ip[1], e.cidr[1]); + + return adtfn(set, &e, &ext, &opt->ext, opt->cmdflags); +} + +static int +hash_netnet6_uadt(struct ip_set *set, struct nlattr *tb[], + enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) +{ + ipset_adtfn adtfn = set->variant->adt[adt]; + struct hash_netnet6_elem e = { }; + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); + int ret; + + e.cidr[0] = e.cidr[1] = HOST_MASK; + if (unlikely(!tb[IPSET_ATTR_IP] || !tb[IPSET_ATTR_IP2] || + !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || + !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) || + !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || + !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES))) + return -IPSET_ERR_PROTOCOL; + if (unlikely(tb[IPSET_ATTR_IP_TO] || tb[IPSET_ATTR_IP2_TO])) + return -IPSET_ERR_HASH_RANGE_UNSUPPORTED; + + if (tb[IPSET_ATTR_LINENO]) + *lineno = nla_get_u32(tb[IPSET_ATTR_LINENO]); + + ret = ip_set_get_ipaddr6(tb[IPSET_ATTR_IP], &e.ip[0]) || + ip_set_get_ipaddr6(tb[IPSET_ATTR_IP2], &e.ip[1]) || + ip_set_get_extensions(set, tb, &ext); + if (ret) + return ret; + + if (tb[IPSET_ATTR_CIDR]) + e.cidr[0] = nla_get_u8(tb[IPSET_ATTR_CIDR]); + + if (tb[IPSET_ATTR_CIDR2]) + e.cidr[1] = nla_get_u8(tb[IPSET_ATTR_CIDR2]); + + if (!e.cidr[0] || e.cidr[0] > HOST_MASK || !e.cidr[1] || + e.cidr[1] > HOST_MASK) + return -IPSET_ERR_INVALID_CIDR; + + ip6_netmask(&e.ip[0], e.cidr[0]); + ip6_netmask(&e.ip[1], e.cidr[1]); + + if (tb[IPSET_ATTR_CADT_FLAGS]) { + u32 cadt_flags = ip_set_get_h32(tb[IPSET_ATTR_CADT_FLAGS]); + if (cadt_flags & IPSET_FLAG_NOMATCH) + flags |= (IPSET_FLAG_NOMATCH << 16); + } + + ret = adtfn(set, &e, &ext, &ext, flags); + + return ip_set_enomatch(ret, flags, adt, set) ? -ret : + ip_set_eexist(ret, flags) ? 0 : ret; +} + +static struct ip_set_type hash_netnet_type __read_mostly = { + .name = "hash:net,net", + .protocol = IPSET_PROTOCOL, + .features = IPSET_TYPE_IP | IPSET_TYPE_IP2 | IPSET_TYPE_NOMATCH, + .dimension = IPSET_DIM_TWO, + .family = NFPROTO_UNSPEC, + .revision_min = IPSET_TYPE_REV_MIN, + .revision_max = IPSET_TYPE_REV_MAX, + .create = hash_netnet_create, + .create_policy = { + [IPSET_ATTR_HASHSIZE] = { .type = NLA_U32 }, + [IPSET_ATTR_MAXELEM] = { .type = NLA_U32 }, + [IPSET_ATTR_PROBES] = { .type = NLA_U8 }, + [IPSET_ATTR_RESIZE] = { .type = NLA_U8 }, + [IPSET_ATTR_TIMEOUT] = { .type = NLA_U32 }, + [IPSET_ATTR_CADT_FLAGS] = { .type = NLA_U32 }, + }, + .adt_policy = { + [IPSET_ATTR_IP] = { .type = NLA_NESTED }, + [IPSET_ATTR_IP_TO] = { .type = NLA_NESTED }, + [IPSET_ATTR_IP2] = { .type = NLA_NESTED }, + [IPSET_ATTR_IP2_TO] = { .type = NLA_NESTED }, + [IPSET_ATTR_CIDR] = { .type = NLA_U8 }, + [IPSET_ATTR_CIDR2] = { .type = NLA_U8 }, + [IPSET_ATTR_TIMEOUT] = { .type = NLA_U32 }, + [IPSET_ATTR_CADT_FLAGS] = { .type = NLA_U32 }, + [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, + [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, + [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING }, + }, + .me = THIS_MODULE, +}; + +static int __init +hash_netnet_init(void) +{ + return ip_set_type_register(&hash_netnet_type); +} + +static void __exit +hash_netnet_fini(void) +{ + ip_set_type_unregister(&hash_netnet_type); +} + +module_init(hash_netnet_init); +module_exit(hash_netnet_fini); diff --git a/net/netfilter/ipset/ip_set_hash_netport.c b/net/netfilter/ipset/ip_set_hash_netport.c index 09d6690bee6f..7097fb0141bf 100644 --- a/net/netfilter/ipset/ip_set_hash_netport.c +++ b/net/netfilter/ipset/ip_set_hash_netport.c @@ -23,15 +23,16 @@ #include <linux/netfilter/ipset/ip_set_getport.h> #include <linux/netfilter/ipset/ip_set_hash.h> -#define REVISION_MIN 0 -/* 1 SCTP and UDPLITE support added */ -/* 2 Range as input support for IPv4 added */ -/* 3 nomatch flag support added */ -#define REVISION_MAX 4 /* Counters support added */ +#define IPSET_TYPE_REV_MIN 0 +/* 1 SCTP and UDPLITE support added */ +/* 2 Range as input support for IPv4 added */ +/* 3 nomatch flag support added */ +/* 4 Counters support added */ +#define IPSET_TYPE_REV_MAX 5 /* Comments support added */ MODULE_LICENSE("GPL"); MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>"); -IP_SET_MODULE_DESC("hash:net,port", REVISION_MIN, REVISION_MAX); +IP_SET_MODULE_DESC("hash:net,port", IPSET_TYPE_REV_MIN, IPSET_TYPE_REV_MAX); MODULE_ALIAS("ip_set_hash:net,port"); /* Type specific function prefix */ @@ -45,7 +46,7 @@ MODULE_ALIAS("ip_set_hash:net,port"); */ #define IP_SET_HASH_WITH_NETS_PACKED -/* IPv4 variants */ +/* IPv4 variant */ /* Member elements */ struct hash_netport4_elem { @@ -56,34 +57,6 @@ struct hash_netport4_elem { u8 nomatch:1; }; -struct hash_netport4t_elem { - __be32 ip; - __be16 port; - u8 proto; - u8 cidr:7; - u8 nomatch:1; - unsigned long timeout; -}; - -struct hash_netport4c_elem { - __be32 ip; - __be16 port; - u8 proto; - u8 cidr:7; - u8 nomatch:1; - struct ip_set_counter counter; -}; - -struct hash_netport4ct_elem { - __be32 ip; - __be16 port; - u8 proto; - u8 cidr:7; - u8 nomatch:1; - struct ip_set_counter counter; - unsigned long timeout; -}; - /* Common functions */ static inline bool @@ -162,9 +135,9 @@ hash_netport4_kadt(struct ip_set *set, const struct sk_buff *skb, const struct hash_netport *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_netport4_elem e = { - .cidr = h->nets[0].cidr ? h->nets[0].cidr - 1 : HOST_MASK - 1 + .cidr = IP_SET_INIT_CIDR(h->nets[0].cidr[0], HOST_MASK) - 1, }; - struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, h); + struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); if (adt == IPSET_TEST) e.cidr = HOST_MASK - 1; @@ -186,8 +159,8 @@ hash_netport4_uadt(struct ip_set *set, struct nlattr *tb[], const struct hash_netport *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_netport4_elem e = { .cidr = HOST_MASK - 1 }; - struct ip_set_ext ext = IP_SET_INIT_UEXT(h); - u32 port, port_to, p = 0, ip = 0, ip_to, last; + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); + u32 port, port_to, p = 0, ip = 0, ip_to = 0, last; bool with_ports = false; u8 cidr; int ret; @@ -287,7 +260,7 @@ hash_netport4_uadt(struct ip_set *set, struct nlattr *tb[], return ret; } -/* IPv6 variants */ +/* IPv6 variant */ struct hash_netport6_elem { union nf_inet_addr ip; @@ -297,34 +270,6 @@ struct hash_netport6_elem { u8 nomatch:1; }; -struct hash_netport6t_elem { - union nf_inet_addr ip; - __be16 port; - u8 proto; - u8 cidr:7; - u8 nomatch:1; - unsigned long timeout; -}; - -struct hash_netport6c_elem { - union nf_inet_addr ip; - __be16 port; - u8 proto; - u8 cidr:7; - u8 nomatch:1; - struct ip_set_counter counter; -}; - -struct hash_netport6ct_elem { - union nf_inet_addr ip; - __be16 port; - u8 proto; - u8 cidr:7; - u8 nomatch:1; - struct ip_set_counter counter; - unsigned long timeout; -}; - /* Common functions */ static inline bool @@ -407,9 +352,9 @@ hash_netport6_kadt(struct ip_set *set, const struct sk_buff *skb, const struct hash_netport *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_netport6_elem e = { - .cidr = h->nets[0].cidr ? h->nets[0].cidr - 1 : HOST_MASK - 1, + .cidr = IP_SET_INIT_CIDR(h->nets[0].cidr[0], HOST_MASK) - 1, }; - struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, h); + struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); if (adt == IPSET_TEST) e.cidr = HOST_MASK - 1; @@ -431,7 +376,7 @@ hash_netport6_uadt(struct ip_set *set, struct nlattr *tb[], const struct hash_netport *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_netport6_elem e = { .cidr = HOST_MASK - 1 }; - struct ip_set_ext ext = IP_SET_INIT_UEXT(h); + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); u32 port, port_to; bool with_ports = false; u8 cidr; @@ -518,8 +463,8 @@ static struct ip_set_type hash_netport_type __read_mostly = { .features = IPSET_TYPE_IP | IPSET_TYPE_PORT | IPSET_TYPE_NOMATCH, .dimension = IPSET_DIM_TWO, .family = NFPROTO_UNSPEC, - .revision_min = REVISION_MIN, - .revision_max = REVISION_MAX, + .revision_min = IPSET_TYPE_REV_MIN, + .revision_max = IPSET_TYPE_REV_MAX, .create = hash_netport_create, .create_policy = { [IPSET_ATTR_HASHSIZE] = { .type = NLA_U32 }, @@ -542,6 +487,7 @@ static struct ip_set_type hash_netport_type __read_mostly = { [IPSET_ATTR_CADT_FLAGS] = { .type = NLA_U32 }, [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, + [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING }, }, .me = THIS_MODULE, }; diff --git a/net/netfilter/ipset/ip_set_hash_netportnet.c b/net/netfilter/ipset/ip_set_hash_netportnet.c new file mode 100644 index 000000000000..703d1192a6a2 --- /dev/null +++ b/net/netfilter/ipset/ip_set_hash_netportnet.c @@ -0,0 +1,586 @@ +/* Copyright (C) 2003-2013 Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +/* Kernel module implementing an IP set type: the hash:ip,port,net type */ + +#include <linux/jhash.h> +#include <linux/module.h> +#include <linux/ip.h> +#include <linux/skbuff.h> +#include <linux/errno.h> +#include <linux/random.h> +#include <net/ip.h> +#include <net/ipv6.h> +#include <net/netlink.h> +#include <net/tcp.h> + +#include <linux/netfilter.h> +#include <linux/netfilter/ipset/pfxlen.h> +#include <linux/netfilter/ipset/ip_set.h> +#include <linux/netfilter/ipset/ip_set_getport.h> +#include <linux/netfilter/ipset/ip_set_hash.h> + +#define IPSET_TYPE_REV_MIN 0 +#define IPSET_TYPE_REV_MAX 0 /* Comments support added */ + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Oliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>"); +IP_SET_MODULE_DESC("hash:net,port,net", IPSET_TYPE_REV_MIN, IPSET_TYPE_REV_MAX); +MODULE_ALIAS("ip_set_hash:net,port,net"); + +/* Type specific function prefix */ +#define HTYPE hash_netportnet +#define IP_SET_HASH_WITH_PROTO +#define IP_SET_HASH_WITH_NETS +#define IPSET_NET_COUNT 2 + +/* IPv4 variant */ + +/* Member elements */ +struct hash_netportnet4_elem { + union { + __be32 ip[2]; + __be64 ipcmp; + }; + __be16 port; + union { + u8 cidr[2]; + u16 ccmp; + }; + u8 nomatch:1; + u8 proto; +}; + +/* Common functions */ + +static inline bool +hash_netportnet4_data_equal(const struct hash_netportnet4_elem *ip1, + const struct hash_netportnet4_elem *ip2, + u32 *multi) +{ + return ip1->ipcmp == ip2->ipcmp && + ip1->ccmp == ip2->ccmp && + ip1->port == ip2->port && + ip1->proto == ip2->proto; +} + +static inline int +hash_netportnet4_do_data_match(const struct hash_netportnet4_elem *elem) +{ + return elem->nomatch ? -ENOTEMPTY : 1; +} + +static inline void +hash_netportnet4_data_set_flags(struct hash_netportnet4_elem *elem, u32 flags) +{ + elem->nomatch = !!((flags >> 16) & IPSET_FLAG_NOMATCH); +} + +static inline void +hash_netportnet4_data_reset_flags(struct hash_netportnet4_elem *elem, u8 *flags) +{ + swap(*flags, elem->nomatch); +} + +static inline void +hash_netportnet4_data_reset_elem(struct hash_netportnet4_elem *elem, + struct hash_netportnet4_elem *orig) +{ + elem->ip[1] = orig->ip[1]; +} + +static inline void +hash_netportnet4_data_netmask(struct hash_netportnet4_elem *elem, + u8 cidr, bool inner) +{ + if (inner) { + elem->ip[1] &= ip_set_netmask(cidr); + elem->cidr[1] = cidr; + } else { + elem->ip[0] &= ip_set_netmask(cidr); + elem->cidr[0] = cidr; + } +} + +static bool +hash_netportnet4_data_list(struct sk_buff *skb, + const struct hash_netportnet4_elem *data) +{ + u32 flags = data->nomatch ? IPSET_FLAG_NOMATCH : 0; + + if (nla_put_ipaddr4(skb, IPSET_ATTR_IP, data->ip[0]) || + nla_put_ipaddr4(skb, IPSET_ATTR_IP2, data->ip[1]) || + nla_put_net16(skb, IPSET_ATTR_PORT, data->port) || + nla_put_u8(skb, IPSET_ATTR_CIDR, data->cidr[0]) || + nla_put_u8(skb, IPSET_ATTR_CIDR2, data->cidr[1]) || + nla_put_u8(skb, IPSET_ATTR_PROTO, data->proto) || + (flags && + nla_put_net32(skb, IPSET_ATTR_CADT_FLAGS, htonl(flags)))) + goto nla_put_failure; + return 0; + +nla_put_failure: + return 1; +} + +static inline void +hash_netportnet4_data_next(struct hash_netportnet4_elem *next, + const struct hash_netportnet4_elem *d) +{ + next->ipcmp = d->ipcmp; + next->port = d->port; +} + +#define MTYPE hash_netportnet4 +#define PF 4 +#define HOST_MASK 32 +#include "ip_set_hash_gen.h" + +static int +hash_netportnet4_kadt(struct ip_set *set, const struct sk_buff *skb, + const struct xt_action_param *par, + enum ipset_adt adt, struct ip_set_adt_opt *opt) +{ + const struct hash_netportnet *h = set->data; + ipset_adtfn adtfn = set->variant->adt[adt]; + struct hash_netportnet4_elem e = { }; + struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); + + e.cidr[0] = IP_SET_INIT_CIDR(h->nets[0].cidr[0], HOST_MASK); + e.cidr[1] = IP_SET_INIT_CIDR(h->nets[0].cidr[1], HOST_MASK); + if (adt == IPSET_TEST) + e.ccmp = (HOST_MASK << (sizeof(e.cidr[0]) * 8)) | HOST_MASK; + + if (!ip_set_get_ip4_port(skb, opt->flags & IPSET_DIM_TWO_SRC, + &e.port, &e.proto)) + return -EINVAL; + + ip4addrptr(skb, opt->flags & IPSET_DIM_ONE_SRC, &e.ip[0]); + ip4addrptr(skb, opt->flags & IPSET_DIM_THREE_SRC, &e.ip[1]); + e.ip[0] &= ip_set_netmask(e.cidr[0]); + e.ip[1] &= ip_set_netmask(e.cidr[1]); + + return adtfn(set, &e, &ext, &opt->ext, opt->cmdflags); +} + +static int +hash_netportnet4_uadt(struct ip_set *set, struct nlattr *tb[], + enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) +{ + const struct hash_netportnet *h = set->data; + ipset_adtfn adtfn = set->variant->adt[adt]; + struct hash_netportnet4_elem e = { }; + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); + u32 ip = 0, ip_to = 0, ip_last, p = 0, port, port_to; + u32 ip2_from = 0, ip2_to = 0, ip2_last, ip2; + bool with_ports = false; + u8 cidr, cidr2; + int ret; + + e.cidr[0] = e.cidr[1] = HOST_MASK; + if (unlikely(!tb[IPSET_ATTR_IP] || !tb[IPSET_ATTR_IP2] || + !ip_set_attr_netorder(tb, IPSET_ATTR_PORT) || + !ip_set_optattr_netorder(tb, IPSET_ATTR_PORT_TO) || + !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || + !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) || + !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || + !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES))) + return -IPSET_ERR_PROTOCOL; + + if (tb[IPSET_ATTR_LINENO]) + *lineno = nla_get_u32(tb[IPSET_ATTR_LINENO]); + + ret = ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP], &ip) || + ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP2], &ip2_from) || + ip_set_get_extensions(set, tb, &ext); + if (ret) + return ret; + + if (tb[IPSET_ATTR_CIDR]) { + cidr = nla_get_u8(tb[IPSET_ATTR_CIDR]); + if (!cidr || cidr > HOST_MASK) + return -IPSET_ERR_INVALID_CIDR; + e.cidr[0] = cidr; + } + + if (tb[IPSET_ATTR_CIDR2]) { + cidr = nla_get_u8(tb[IPSET_ATTR_CIDR2]); + if (!cidr || cidr > HOST_MASK) + return -IPSET_ERR_INVALID_CIDR; + e.cidr[1] = cidr; + } + + if (tb[IPSET_ATTR_PORT]) + e.port = nla_get_be16(tb[IPSET_ATTR_PORT]); + else + return -IPSET_ERR_PROTOCOL; + + if (tb[IPSET_ATTR_PROTO]) { + e.proto = nla_get_u8(tb[IPSET_ATTR_PROTO]); + with_ports = ip_set_proto_with_ports(e.proto); + + if (e.proto == 0) + return -IPSET_ERR_INVALID_PROTO; + } else + return -IPSET_ERR_MISSING_PROTO; + + if (!(with_ports || e.proto == IPPROTO_ICMP)) + e.port = 0; + + if (tb[IPSET_ATTR_CADT_FLAGS]) { + u32 cadt_flags = ip_set_get_h32(tb[IPSET_ATTR_CADT_FLAGS]); + if (cadt_flags & IPSET_FLAG_NOMATCH) + flags |= (IPSET_FLAG_NOMATCH << 16); + } + + with_ports = with_ports && tb[IPSET_ATTR_PORT_TO]; + if (adt == IPSET_TEST || + !(tb[IPSET_ATTR_IP_TO] || with_ports || tb[IPSET_ATTR_IP2_TO])) { + e.ip[0] = htonl(ip & ip_set_hostmask(e.cidr[0])); + e.ip[1] = htonl(ip2_from & ip_set_hostmask(e.cidr[1])); + ret = adtfn(set, &e, &ext, &ext, flags); + return ip_set_enomatch(ret, flags, adt, set) ? -ret : + ip_set_eexist(ret, flags) ? 0 : ret; + } + + ip_to = ip; + if (tb[IPSET_ATTR_IP_TO]) { + ret = ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP_TO], &ip_to); + if (ret) + return ret; + if (ip > ip_to) + swap(ip, ip_to); + if (unlikely(ip + UINT_MAX == ip_to)) + return -IPSET_ERR_HASH_RANGE; + } + + port_to = port = ntohs(e.port); + if (tb[IPSET_ATTR_PORT_TO]) { + port_to = ip_set_get_h16(tb[IPSET_ATTR_PORT_TO]); + if (port > port_to) + swap(port, port_to); + } + + ip2_to = ip2_from; + if (tb[IPSET_ATTR_IP2_TO]) { + ret = ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP2_TO], &ip2_to); + if (ret) + return ret; + if (ip2_from > ip2_to) + swap(ip2_from, ip2_to); + if (unlikely(ip2_from + UINT_MAX == ip2_to)) + return -IPSET_ERR_HASH_RANGE; + } + + if (retried) + ip = ntohl(h->next.ip[0]); + + while (!after(ip, ip_to)) { + e.ip[0] = htonl(ip); + ip_last = ip_set_range_to_cidr(ip, ip_to, &cidr); + e.cidr[0] = cidr; + p = retried && ip == ntohl(h->next.ip[0]) ? ntohs(h->next.port) + : port; + for (; p <= port_to; p++) { + e.port = htons(p); + ip2 = (retried && ip == ntohl(h->next.ip[0]) && + p == ntohs(h->next.port)) ? ntohl(h->next.ip[1]) + : ip2_from; + while (!after(ip2, ip2_to)) { + e.ip[1] = htonl(ip2); + ip2_last = ip_set_range_to_cidr(ip2, ip2_to, + &cidr2); + e.cidr[1] = cidr2; + ret = adtfn(set, &e, &ext, &ext, flags); + if (ret && !ip_set_eexist(ret, flags)) + return ret; + else + ret = 0; + ip2 = ip2_last + 1; + } + } + ip = ip_last + 1; + } + return ret; +} + +/* IPv6 variant */ + +struct hash_netportnet6_elem { + union nf_inet_addr ip[2]; + __be16 port; + union { + u8 cidr[2]; + u16 ccmp; + }; + u8 nomatch:1; + u8 proto; +}; + +/* Common functions */ + +static inline bool +hash_netportnet6_data_equal(const struct hash_netportnet6_elem *ip1, + const struct hash_netportnet6_elem *ip2, + u32 *multi) +{ + return ipv6_addr_equal(&ip1->ip[0].in6, &ip2->ip[0].in6) && + ipv6_addr_equal(&ip1->ip[1].in6, &ip2->ip[1].in6) && + ip1->ccmp == ip2->ccmp && + ip1->port == ip2->port && + ip1->proto == ip2->proto; +} + +static inline int +hash_netportnet6_do_data_match(const struct hash_netportnet6_elem *elem) +{ + return elem->nomatch ? -ENOTEMPTY : 1; +} + +static inline void +hash_netportnet6_data_set_flags(struct hash_netportnet6_elem *elem, u32 flags) +{ + elem->nomatch = !!((flags >> 16) & IPSET_FLAG_NOMATCH); +} + +static inline void +hash_netportnet6_data_reset_flags(struct hash_netportnet6_elem *elem, u8 *flags) +{ + swap(*flags, elem->nomatch); +} + +static inline void +hash_netportnet6_data_reset_elem(struct hash_netportnet6_elem *elem, + struct hash_netportnet6_elem *orig) +{ + elem->ip[1] = orig->ip[1]; +} + +static inline void +hash_netportnet6_data_netmask(struct hash_netportnet6_elem *elem, + u8 cidr, bool inner) +{ + if (inner) { + ip6_netmask(&elem->ip[1], cidr); + elem->cidr[1] = cidr; + } else { + ip6_netmask(&elem->ip[0], cidr); + elem->cidr[0] = cidr; + } +} + +static bool +hash_netportnet6_data_list(struct sk_buff *skb, + const struct hash_netportnet6_elem *data) +{ + u32 flags = data->nomatch ? IPSET_FLAG_NOMATCH : 0; + + if (nla_put_ipaddr6(skb, IPSET_ATTR_IP, &data->ip[0].in6) || + nla_put_ipaddr6(skb, IPSET_ATTR_IP2, &data->ip[1].in6) || + nla_put_net16(skb, IPSET_ATTR_PORT, data->port) || + nla_put_u8(skb, IPSET_ATTR_CIDR, data->cidr[0]) || + nla_put_u8(skb, IPSET_ATTR_CIDR2, data->cidr[1]) || + nla_put_u8(skb, IPSET_ATTR_PROTO, data->proto) || + (flags && + nla_put_net32(skb, IPSET_ATTR_CADT_FLAGS, htonl(flags)))) + goto nla_put_failure; + return 0; + +nla_put_failure: + return 1; +} + +static inline void +hash_netportnet6_data_next(struct hash_netportnet4_elem *next, + const struct hash_netportnet6_elem *d) +{ + next->port = d->port; +} + +#undef MTYPE +#undef PF +#undef HOST_MASK + +#define MTYPE hash_netportnet6 +#define PF 6 +#define HOST_MASK 128 +#define IP_SET_EMIT_CREATE +#include "ip_set_hash_gen.h" + +static int +hash_netportnet6_kadt(struct ip_set *set, const struct sk_buff *skb, + const struct xt_action_param *par, + enum ipset_adt adt, struct ip_set_adt_opt *opt) +{ + const struct hash_netportnet *h = set->data; + ipset_adtfn adtfn = set->variant->adt[adt]; + struct hash_netportnet6_elem e = { }; + struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); + + e.cidr[0] = IP_SET_INIT_CIDR(h->nets[0].cidr[0], HOST_MASK); + e.cidr[1] = IP_SET_INIT_CIDR(h->nets[0].cidr[1], HOST_MASK); + if (adt == IPSET_TEST) + e.ccmp = (HOST_MASK << (sizeof(u8) * 8)) | HOST_MASK; + + if (!ip_set_get_ip6_port(skb, opt->flags & IPSET_DIM_TWO_SRC, + &e.port, &e.proto)) + return -EINVAL; + + ip6addrptr(skb, opt->flags & IPSET_DIM_ONE_SRC, &e.ip[0].in6); + ip6addrptr(skb, opt->flags & IPSET_DIM_THREE_SRC, &e.ip[1].in6); + ip6_netmask(&e.ip[0], e.cidr[0]); + ip6_netmask(&e.ip[1], e.cidr[1]); + + return adtfn(set, &e, &ext, &opt->ext, opt->cmdflags); +} + +static int +hash_netportnet6_uadt(struct ip_set *set, struct nlattr *tb[], + enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) +{ + const struct hash_netportnet *h = set->data; + ipset_adtfn adtfn = set->variant->adt[adt]; + struct hash_netportnet6_elem e = { }; + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); + u32 port, port_to; + bool with_ports = false; + int ret; + + e.cidr[0] = e.cidr[1] = HOST_MASK; + if (unlikely(!tb[IPSET_ATTR_IP] || !tb[IPSET_ATTR_IP2] || + !ip_set_attr_netorder(tb, IPSET_ATTR_PORT) || + !ip_set_optattr_netorder(tb, IPSET_ATTR_PORT_TO) || + !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || + !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS) || + !ip_set_optattr_netorder(tb, IPSET_ATTR_PACKETS) || + !ip_set_optattr_netorder(tb, IPSET_ATTR_BYTES))) + return -IPSET_ERR_PROTOCOL; + if (unlikely(tb[IPSET_ATTR_IP_TO] || tb[IPSET_ATTR_IP2_TO])) + return -IPSET_ERR_HASH_RANGE_UNSUPPORTED; + + if (tb[IPSET_ATTR_LINENO]) + *lineno = nla_get_u32(tb[IPSET_ATTR_LINENO]); + + ret = ip_set_get_ipaddr6(tb[IPSET_ATTR_IP], &e.ip[0]) || + ip_set_get_ipaddr6(tb[IPSET_ATTR_IP2], &e.ip[1]) || + ip_set_get_extensions(set, tb, &ext); + if (ret) + return ret; + + if (tb[IPSET_ATTR_CIDR]) + e.cidr[0] = nla_get_u8(tb[IPSET_ATTR_CIDR]); + + if (tb[IPSET_ATTR_CIDR2]) + e.cidr[1] = nla_get_u8(tb[IPSET_ATTR_CIDR2]); + + if (unlikely(!e.cidr[0] || e.cidr[0] > HOST_MASK || !e.cidr[1] || + e.cidr[1] > HOST_MASK)) + return -IPSET_ERR_INVALID_CIDR; + + ip6_netmask(&e.ip[0], e.cidr[0]); + ip6_netmask(&e.ip[1], e.cidr[1]); + + if (tb[IPSET_ATTR_PORT]) + e.port = nla_get_be16(tb[IPSET_ATTR_PORT]); + else + return -IPSET_ERR_PROTOCOL; + + if (tb[IPSET_ATTR_PROTO]) { + e.proto = nla_get_u8(tb[IPSET_ATTR_PROTO]); + with_ports = ip_set_proto_with_ports(e.proto); + + if (e.proto == 0) + return -IPSET_ERR_INVALID_PROTO; + } else + return -IPSET_ERR_MISSING_PROTO; + + if (!(with_ports || e.proto == IPPROTO_ICMPV6)) + e.port = 0; + + if (tb[IPSET_ATTR_CADT_FLAGS]) { + u32 cadt_flags = ip_set_get_h32(tb[IPSET_ATTR_CADT_FLAGS]); + if (cadt_flags & IPSET_FLAG_NOMATCH) + flags |= (IPSET_FLAG_NOMATCH << 16); + } + + if (adt == IPSET_TEST || !with_ports || !tb[IPSET_ATTR_PORT_TO]) { + ret = adtfn(set, &e, &ext, &ext, flags); + return ip_set_enomatch(ret, flags, adt, set) ? -ret : + ip_set_eexist(ret, flags) ? 0 : ret; + } + + port = ntohs(e.port); + port_to = ip_set_get_h16(tb[IPSET_ATTR_PORT_TO]); + if (port > port_to) + swap(port, port_to); + + if (retried) + port = ntohs(h->next.port); + for (; port <= port_to; port++) { + e.port = htons(port); + ret = adtfn(set, &e, &ext, &ext, flags); + + if (ret && !ip_set_eexist(ret, flags)) + return ret; + else + ret = 0; + } + return ret; +} + +static struct ip_set_type hash_netportnet_type __read_mostly = { + .name = "hash:net,port,net", + .protocol = IPSET_PROTOCOL, + .features = IPSET_TYPE_IP | IPSET_TYPE_PORT | IPSET_TYPE_IP2 | + IPSET_TYPE_NOMATCH, + .dimension = IPSET_DIM_THREE, + .family = NFPROTO_UNSPEC, + .revision_min = IPSET_TYPE_REV_MIN, + .revision_max = IPSET_TYPE_REV_MAX, + .create = hash_netportnet_create, + .create_policy = { + [IPSET_ATTR_HASHSIZE] = { .type = NLA_U32 }, + [IPSET_ATTR_MAXELEM] = { .type = NLA_U32 }, + [IPSET_ATTR_PROBES] = { .type = NLA_U8 }, + [IPSET_ATTR_RESIZE] = { .type = NLA_U8 }, + [IPSET_ATTR_TIMEOUT] = { .type = NLA_U32 }, + [IPSET_ATTR_CADT_FLAGS] = { .type = NLA_U32 }, + }, + .adt_policy = { + [IPSET_ATTR_IP] = { .type = NLA_NESTED }, + [IPSET_ATTR_IP_TO] = { .type = NLA_NESTED }, + [IPSET_ATTR_IP2] = { .type = NLA_NESTED }, + [IPSET_ATTR_IP2_TO] = { .type = NLA_NESTED }, + [IPSET_ATTR_PORT] = { .type = NLA_U16 }, + [IPSET_ATTR_PORT_TO] = { .type = NLA_U16 }, + [IPSET_ATTR_CIDR] = { .type = NLA_U8 }, + [IPSET_ATTR_CIDR2] = { .type = NLA_U8 }, + [IPSET_ATTR_PROTO] = { .type = NLA_U8 }, + [IPSET_ATTR_CADT_FLAGS] = { .type = NLA_U32 }, + [IPSET_ATTR_TIMEOUT] = { .type = NLA_U32 }, + [IPSET_ATTR_LINENO] = { .type = NLA_U32 }, + [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, + [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, + [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING }, + }, + .me = THIS_MODULE, +}; + +static int __init +hash_netportnet_init(void) +{ + return ip_set_type_register(&hash_netportnet_type); +} + +static void __exit +hash_netportnet_fini(void) +{ + ip_set_type_unregister(&hash_netportnet_type); +} + +module_init(hash_netportnet_init); +module_exit(hash_netportnet_fini); diff --git a/net/netfilter/ipset/ip_set_list_set.c b/net/netfilter/ipset/ip_set_list_set.c index 979b8c90e422..3e2317f3cf68 100644 --- a/net/netfilter/ipset/ip_set_list_set.c +++ b/net/netfilter/ipset/ip_set_list_set.c @@ -15,12 +15,13 @@ #include <linux/netfilter/ipset/ip_set.h> #include <linux/netfilter/ipset/ip_set_list.h> -#define REVISION_MIN 0 -#define REVISION_MAX 1 /* Counters support added */ +#define IPSET_TYPE_REV_MIN 0 +/* 1 Counters support added */ +#define IPSET_TYPE_REV_MAX 2 /* Comments support added */ MODULE_LICENSE("GPL"); MODULE_AUTHOR("Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>"); -IP_SET_MODULE_DESC("list:set", REVISION_MIN, REVISION_MAX); +IP_SET_MODULE_DESC("list:set", IPSET_TYPE_REV_MIN, IPSET_TYPE_REV_MAX); MODULE_ALIAS("ip_set_list:set"); /* Member elements */ @@ -28,28 +29,6 @@ struct set_elem { ip_set_id_t id; }; -struct sett_elem { - struct { - ip_set_id_t id; - } __attribute__ ((aligned)); - unsigned long timeout; -}; - -struct setc_elem { - struct { - ip_set_id_t id; - } __attribute__ ((aligned)); - struct ip_set_counter counter; -}; - -struct setct_elem { - struct { - ip_set_id_t id; - } __attribute__ ((aligned)); - struct ip_set_counter counter; - unsigned long timeout; -}; - struct set_adt_elem { ip_set_id_t id; ip_set_id_t refid; @@ -58,24 +37,14 @@ struct set_adt_elem { /* Type structure */ struct list_set { - size_t dsize; /* element size */ - size_t offset[IPSET_OFFSET_MAX]; /* Offsets to extensions */ u32 size; /* size of set list array */ - u32 timeout; /* timeout value */ struct timer_list gc; /* garbage collection */ + struct net *net; /* namespace */ struct set_elem members[0]; /* the set members */ }; -static inline struct set_elem * -list_set_elem(const struct list_set *map, u32 id) -{ - return (struct set_elem *)((void *)map->members + id * map->dsize); -} - -#define ext_timeout(e, m) \ -(unsigned long *)((void *)(e) + (m)->offset[IPSET_OFFSET_TIMEOUT]) -#define ext_counter(e, m) \ -(struct ip_set_counter *)((void *)(e) + (m)->offset[IPSET_OFFSET_COUNTER]) +#define list_set_elem(set, map, id) \ + (struct set_elem *)((void *)(map)->members + (id) * (set)->dsize) static int list_set_ktest(struct ip_set *set, const struct sk_buff *skb, @@ -92,16 +61,16 @@ list_set_ktest(struct ip_set *set, const struct sk_buff *skb, if (opt->cmdflags & IPSET_FLAG_SKIP_SUBCOUNTER_UPDATE) opt->cmdflags &= ~IPSET_FLAG_SKIP_COUNTER_UPDATE; for (i = 0; i < map->size; i++) { - e = list_set_elem(map, i); + e = list_set_elem(set, map, i); if (e->id == IPSET_INVALID_ID) return 0; if (SET_WITH_TIMEOUT(set) && - ip_set_timeout_expired(ext_timeout(e, map))) + ip_set_timeout_expired(ext_timeout(e, set))) continue; ret = ip_set_test(e->id, skb, par, opt); if (ret > 0) { if (SET_WITH_COUNTER(set)) - ip_set_update_counter(ext_counter(e, map), + ip_set_update_counter(ext_counter(e, set), ext, &opt->ext, cmdflags); return ret; @@ -121,11 +90,11 @@ list_set_kadd(struct ip_set *set, const struct sk_buff *skb, int ret; for (i = 0; i < map->size; i++) { - e = list_set_elem(map, i); + e = list_set_elem(set, map, i); if (e->id == IPSET_INVALID_ID) return 0; if (SET_WITH_TIMEOUT(set) && - ip_set_timeout_expired(ext_timeout(e, map))) + ip_set_timeout_expired(ext_timeout(e, set))) continue; ret = ip_set_add(e->id, skb, par, opt); if (ret == 0) @@ -145,11 +114,11 @@ list_set_kdel(struct ip_set *set, const struct sk_buff *skb, int ret; for (i = 0; i < map->size; i++) { - e = list_set_elem(map, i); + e = list_set_elem(set, map, i); if (e->id == IPSET_INVALID_ID) return 0; if (SET_WITH_TIMEOUT(set) && - ip_set_timeout_expired(ext_timeout(e, map))) + ip_set_timeout_expired(ext_timeout(e, set))) continue; ret = ip_set_del(e->id, skb, par, opt); if (ret == 0) @@ -163,8 +132,7 @@ list_set_kadt(struct ip_set *set, const struct sk_buff *skb, const struct xt_action_param *par, enum ipset_adt adt, struct ip_set_adt_opt *opt) { - struct list_set *map = set->data; - struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, map); + struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); switch (adt) { case IPSET_TEST: @@ -188,10 +156,10 @@ id_eq(const struct ip_set *set, u32 i, ip_set_id_t id) if (i >= map->size) return 0; - e = list_set_elem(map, i); + e = list_set_elem(set, map, i); return !!(e->id == id && !(SET_WITH_TIMEOUT(set) && - ip_set_timeout_expired(ext_timeout(e, map)))); + ip_set_timeout_expired(ext_timeout(e, set)))); } static int @@ -199,28 +167,36 @@ list_set_add(struct ip_set *set, u32 i, struct set_adt_elem *d, const struct ip_set_ext *ext) { struct list_set *map = set->data; - struct set_elem *e = list_set_elem(map, i); + struct set_elem *e = list_set_elem(set, map, i); if (e->id != IPSET_INVALID_ID) { - if (i == map->size - 1) + if (i == map->size - 1) { /* Last element replaced: e.g. add new,before,last */ - ip_set_put_byindex(e->id); - else { - struct set_elem *x = list_set_elem(map, map->size - 1); + ip_set_put_byindex(map->net, e->id); + ip_set_ext_destroy(set, e); + } else { + struct set_elem *x = list_set_elem(set, map, + map->size - 1); /* Last element pushed off */ - if (x->id != IPSET_INVALID_ID) - ip_set_put_byindex(x->id); - memmove(list_set_elem(map, i + 1), e, - map->dsize * (map->size - (i + 1))); + if (x->id != IPSET_INVALID_ID) { + ip_set_put_byindex(map->net, x->id); + ip_set_ext_destroy(set, x); + } + memmove(list_set_elem(set, map, i + 1), e, + set->dsize * (map->size - (i + 1))); + /* Extensions must be initialized to zero */ + memset(e, 0, set->dsize); } } e->id = d->id; if (SET_WITH_TIMEOUT(set)) - ip_set_timeout_set(ext_timeout(e, map), ext->timeout); + ip_set_timeout_set(ext_timeout(e, set), ext->timeout); if (SET_WITH_COUNTER(set)) - ip_set_init_counter(ext_counter(e, map), ext); + ip_set_init_counter(ext_counter(e, set), ext); + if (SET_WITH_COMMENT(set)) + ip_set_init_comment(ext_comment(e, set), ext); return 0; } @@ -228,16 +204,17 @@ static int list_set_del(struct ip_set *set, u32 i) { struct list_set *map = set->data; - struct set_elem *e = list_set_elem(map, i); + struct set_elem *e = list_set_elem(set, map, i); - ip_set_put_byindex(e->id); + ip_set_put_byindex(map->net, e->id); + ip_set_ext_destroy(set, e); if (i < map->size - 1) - memmove(e, list_set_elem(map, i + 1), - map->dsize * (map->size - (i + 1))); + memmove(e, list_set_elem(set, map, i + 1), + set->dsize * (map->size - (i + 1))); /* Last element */ - e = list_set_elem(map, map->size - 1); + e = list_set_elem(set, map, map->size - 1); e->id = IPSET_INVALID_ID; return 0; } @@ -247,13 +224,16 @@ set_cleanup_entries(struct ip_set *set) { struct list_set *map = set->data; struct set_elem *e; - u32 i; + u32 i = 0; - for (i = 0; i < map->size; i++) { - e = list_set_elem(map, i); + while (i < map->size) { + e = list_set_elem(set, map, i); if (e->id != IPSET_INVALID_ID && - ip_set_timeout_expired(ext_timeout(e, map))) + ip_set_timeout_expired(ext_timeout(e, set))) list_set_del(set, i); + /* Check element moved to position i in next loop */ + else + i++; } } @@ -268,11 +248,11 @@ list_set_utest(struct ip_set *set, void *value, const struct ip_set_ext *ext, int ret; for (i = 0; i < map->size; i++) { - e = list_set_elem(map, i); + e = list_set_elem(set, map, i); if (e->id == IPSET_INVALID_ID) return 0; else if (SET_WITH_TIMEOUT(set) && - ip_set_timeout_expired(ext_timeout(e, map))) + ip_set_timeout_expired(ext_timeout(e, set))) continue; else if (e->id != d->id) continue; @@ -299,14 +279,14 @@ list_set_uadd(struct ip_set *set, void *value, const struct ip_set_ext *ext, bool flag_exist = flags & IPSET_FLAG_EXIST; u32 i, ret = 0; + if (SET_WITH_TIMEOUT(set)) + set_cleanup_entries(set); + /* Check already added element */ for (i = 0; i < map->size; i++) { - e = list_set_elem(map, i); + e = list_set_elem(set, map, i); if (e->id == IPSET_INVALID_ID) goto insert; - else if (SET_WITH_TIMEOUT(set) && - ip_set_timeout_expired(ext_timeout(e, map))) - continue; else if (e->id != d->id) continue; @@ -319,18 +299,22 @@ list_set_uadd(struct ip_set *set, void *value, const struct ip_set_ext *ext, /* Can't re-add */ return -IPSET_ERR_EXIST; /* Update extensions */ + ip_set_ext_destroy(set, e); + if (SET_WITH_TIMEOUT(set)) - ip_set_timeout_set(ext_timeout(e, map), ext->timeout); + ip_set_timeout_set(ext_timeout(e, set), ext->timeout); if (SET_WITH_COUNTER(set)) - ip_set_init_counter(ext_counter(e, map), ext); + ip_set_init_counter(ext_counter(e, set), ext); + if (SET_WITH_COMMENT(set)) + ip_set_init_comment(ext_comment(e, set), ext); /* Set is already added to the list */ - ip_set_put_byindex(d->id); + ip_set_put_byindex(map->net, d->id); return 0; } insert: ret = -IPSET_ERR_LIST_FULL; for (i = 0; i < map->size && ret == -IPSET_ERR_LIST_FULL; i++) { - e = list_set_elem(map, i); + e = list_set_elem(set, map, i); if (e->id == IPSET_INVALID_ID) ret = d->before != 0 ? -IPSET_ERR_REF_EXIST : list_set_add(set, i, d, ext); @@ -355,12 +339,12 @@ list_set_udel(struct ip_set *set, void *value, const struct ip_set_ext *ext, u32 i; for (i = 0; i < map->size; i++) { - e = list_set_elem(map, i); + e = list_set_elem(set, map, i); if (e->id == IPSET_INVALID_ID) return d->before != 0 ? -IPSET_ERR_REF_EXIST : -IPSET_ERR_EXIST; else if (SET_WITH_TIMEOUT(set) && - ip_set_timeout_expired(ext_timeout(e, map))) + ip_set_timeout_expired(ext_timeout(e, set))) continue; else if (e->id != d->id) continue; @@ -386,7 +370,7 @@ list_set_uadt(struct ip_set *set, struct nlattr *tb[], struct list_set *map = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct set_adt_elem e = { .refid = IPSET_INVALID_ID }; - struct ip_set_ext ext = IP_SET_INIT_UEXT(map); + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); struct ip_set *s; int ret = 0; @@ -403,7 +387,7 @@ list_set_uadt(struct ip_set *set, struct nlattr *tb[], ret = ip_set_get_extensions(set, tb, &ext); if (ret) return ret; - e.id = ip_set_get_byname(nla_data(tb[IPSET_ATTR_NAME]), &s); + e.id = ip_set_get_byname(map->net, nla_data(tb[IPSET_ATTR_NAME]), &s); if (e.id == IPSET_INVALID_ID) return -IPSET_ERR_NAME; /* "Loop detection" */ @@ -423,7 +407,8 @@ list_set_uadt(struct ip_set *set, struct nlattr *tb[], } if (tb[IPSET_ATTR_NAMEREF]) { - e.refid = ip_set_get_byname(nla_data(tb[IPSET_ATTR_NAMEREF]), + e.refid = ip_set_get_byname(map->net, + nla_data(tb[IPSET_ATTR_NAMEREF]), &s); if (e.refid == IPSET_INVALID_ID) { ret = -IPSET_ERR_NAMEREF; @@ -439,9 +424,9 @@ list_set_uadt(struct ip_set *set, struct nlattr *tb[], finish: if (e.refid != IPSET_INVALID_ID) - ip_set_put_byindex(e.refid); + ip_set_put_byindex(map->net, e.refid); if (adt != IPSET_ADD || ret) - ip_set_put_byindex(e.id); + ip_set_put_byindex(map->net, e.id); return ip_set_eexist(ret, flags) ? 0 : ret; } @@ -454,9 +439,10 @@ list_set_flush(struct ip_set *set) u32 i; for (i = 0; i < map->size; i++) { - e = list_set_elem(map, i); + e = list_set_elem(set, map, i); if (e->id != IPSET_INVALID_ID) { - ip_set_put_byindex(e->id); + ip_set_put_byindex(map->net, e->id); + ip_set_ext_destroy(set, e); e->id = IPSET_INVALID_ID; } } @@ -485,14 +471,11 @@ list_set_head(struct ip_set *set, struct sk_buff *skb) if (!nested) goto nla_put_failure; if (nla_put_net32(skb, IPSET_ATTR_SIZE, htonl(map->size)) || - (SET_WITH_TIMEOUT(set) && - nla_put_net32(skb, IPSET_ATTR_TIMEOUT, htonl(map->timeout))) || - (SET_WITH_COUNTER(set) && - nla_put_net32(skb, IPSET_ATTR_CADT_FLAGS, - htonl(IPSET_FLAG_WITH_COUNTERS))) || nla_put_net32(skb, IPSET_ATTR_REFERENCES, htonl(set->ref - 1)) || nla_put_net32(skb, IPSET_ATTR_MEMSIZE, - htonl(sizeof(*map) + map->size * map->dsize))) + htonl(sizeof(*map) + map->size * set->dsize))) + goto nla_put_failure; + if (unlikely(ip_set_put_flags(skb, set))) goto nla_put_failure; ipset_nest_end(skb, nested); @@ -507,19 +490,20 @@ list_set_list(const struct ip_set *set, { const struct list_set *map = set->data; struct nlattr *atd, *nested; - u32 i, first = cb->args[2]; + u32 i, first = cb->args[IPSET_CB_ARG0]; const struct set_elem *e; atd = ipset_nest_start(skb, IPSET_ATTR_ADT); if (!atd) return -EMSGSIZE; - for (; cb->args[2] < map->size; cb->args[2]++) { - i = cb->args[2]; - e = list_set_elem(map, i); + for (; cb->args[IPSET_CB_ARG0] < map->size; + cb->args[IPSET_CB_ARG0]++) { + i = cb->args[IPSET_CB_ARG0]; + e = list_set_elem(set, map, i); if (e->id == IPSET_INVALID_ID) goto finish; if (SET_WITH_TIMEOUT(set) && - ip_set_timeout_expired(ext_timeout(e, map))) + ip_set_timeout_expired(ext_timeout(e, set))) continue; nested = ipset_nest_start(skb, IPSET_ATTR_DATA); if (!nested) { @@ -530,31 +514,25 @@ list_set_list(const struct ip_set *set, goto nla_put_failure; } if (nla_put_string(skb, IPSET_ATTR_NAME, - ip_set_name_byindex(e->id))) - goto nla_put_failure; - if (SET_WITH_TIMEOUT(set) && - nla_put_net32(skb, IPSET_ATTR_TIMEOUT, - htonl(ip_set_timeout_get( - ext_timeout(e, map))))) + ip_set_name_byindex(map->net, e->id))) goto nla_put_failure; - if (SET_WITH_COUNTER(set) && - ip_set_put_counter(skb, ext_counter(e, map))) + if (ip_set_put_extensions(skb, set, e, true)) goto nla_put_failure; ipset_nest_end(skb, nested); } finish: ipset_nest_end(skb, atd); /* Set listing finished */ - cb->args[2] = 0; + cb->args[IPSET_CB_ARG0] = 0; return 0; nla_put_failure: nla_nest_cancel(skb, nested); - ipset_nest_end(skb, atd); if (unlikely(i == first)) { - cb->args[2] = 0; + cb->args[IPSET_CB_ARG0] = 0; return -EMSGSIZE; } + ipset_nest_end(skb, atd); return 0; } @@ -565,7 +543,7 @@ list_set_same_set(const struct ip_set *a, const struct ip_set *b) const struct list_set *y = b->data; return x->size == y->size && - x->timeout == y->timeout && + a->timeout == b->timeout && a->extensions == b->extensions; } @@ -594,7 +572,7 @@ list_set_gc(unsigned long ul_set) set_cleanup_entries(set); write_unlock_bh(&set->lock); - map->gc.expires = jiffies + IPSET_GC_PERIOD(map->timeout) * HZ; + map->gc.expires = jiffies + IPSET_GC_PERIOD(set->timeout) * HZ; add_timer(&map->gc); } @@ -606,43 +584,40 @@ list_set_gc_init(struct ip_set *set, void (*gc)(unsigned long ul_set)) init_timer(&map->gc); map->gc.data = (unsigned long) set; map->gc.function = gc; - map->gc.expires = jiffies + IPSET_GC_PERIOD(map->timeout) * HZ; + map->gc.expires = jiffies + IPSET_GC_PERIOD(set->timeout) * HZ; add_timer(&map->gc); } /* Create list:set type of sets */ -static struct list_set * -init_list_set(struct ip_set *set, u32 size, size_t dsize, - unsigned long timeout) +static bool +init_list_set(struct net *net, struct ip_set *set, u32 size) { struct list_set *map; struct set_elem *e; u32 i; - map = kzalloc(sizeof(*map) + size * dsize, GFP_KERNEL); + map = kzalloc(sizeof(*map) + size * set->dsize, GFP_KERNEL); if (!map) - return NULL; + return false; map->size = size; - map->dsize = dsize; - map->timeout = timeout; + map->net = net; set->data = map; for (i = 0; i < size; i++) { - e = list_set_elem(map, i); + e = list_set_elem(set, map, i); e->id = IPSET_INVALID_ID; } - return map; + return true; } static int -list_set_create(struct ip_set *set, struct nlattr *tb[], u32 flags) +list_set_create(struct net *net, struct ip_set *set, struct nlattr *tb[], + u32 flags) { - struct list_set *map; - u32 size = IP_SET_LIST_DEFAULT_SIZE, cadt_flags = 0; - unsigned long timeout = 0; + u32 size = IP_SET_LIST_DEFAULT_SIZE; if (unlikely(!ip_set_optattr_netorder(tb, IPSET_ATTR_SIZE) || !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || @@ -654,45 +629,13 @@ list_set_create(struct ip_set *set, struct nlattr *tb[], u32 flags) if (size < IP_SET_LIST_MIN_SIZE) size = IP_SET_LIST_MIN_SIZE; - if (tb[IPSET_ATTR_CADT_FLAGS]) - cadt_flags = ip_set_get_h32(tb[IPSET_ATTR_CADT_FLAGS]); - if (tb[IPSET_ATTR_TIMEOUT]) - timeout = ip_set_timeout_uget(tb[IPSET_ATTR_TIMEOUT]); set->variant = &set_variant; - if (cadt_flags & IPSET_FLAG_WITH_COUNTERS) { - set->extensions |= IPSET_EXT_COUNTER; - if (tb[IPSET_ATTR_TIMEOUT]) { - map = init_list_set(set, size, - sizeof(struct setct_elem), timeout); - if (!map) - return -ENOMEM; - set->extensions |= IPSET_EXT_TIMEOUT; - map->offset[IPSET_OFFSET_TIMEOUT] = - offsetof(struct setct_elem, timeout); - map->offset[IPSET_OFFSET_COUNTER] = - offsetof(struct setct_elem, counter); - list_set_gc_init(set, list_set_gc); - } else { - map = init_list_set(set, size, - sizeof(struct setc_elem), 0); - if (!map) - return -ENOMEM; - map->offset[IPSET_OFFSET_COUNTER] = - offsetof(struct setc_elem, counter); - } - } else if (tb[IPSET_ATTR_TIMEOUT]) { - map = init_list_set(set, size, - sizeof(struct sett_elem), timeout); - if (!map) - return -ENOMEM; - set->extensions |= IPSET_EXT_TIMEOUT; - map->offset[IPSET_OFFSET_TIMEOUT] = - offsetof(struct sett_elem, timeout); + set->dsize = ip_set_elem_len(set, tb, sizeof(struct set_elem)); + if (!init_list_set(net, set, size)) + return -ENOMEM; + if (tb[IPSET_ATTR_TIMEOUT]) { + set->timeout = ip_set_timeout_uget(tb[IPSET_ATTR_TIMEOUT]); list_set_gc_init(set, list_set_gc); - } else { - map = init_list_set(set, size, sizeof(struct set_elem), 0); - if (!map) - return -ENOMEM; } return 0; } @@ -703,8 +646,8 @@ static struct ip_set_type list_set_type __read_mostly = { .features = IPSET_TYPE_NAME | IPSET_DUMP_LAST, .dimension = IPSET_DIM_ONE, .family = NFPROTO_UNSPEC, - .revision_min = REVISION_MIN, - .revision_max = REVISION_MAX, + .revision_min = IPSET_TYPE_REV_MIN, + .revision_max = IPSET_TYPE_REV_MAX, .create = list_set_create, .create_policy = { [IPSET_ATTR_SIZE] = { .type = NLA_U32 }, @@ -721,6 +664,7 @@ static struct ip_set_type list_set_type __read_mostly = { [IPSET_ATTR_CADT_FLAGS] = { .type = NLA_U32 }, [IPSET_ATTR_BYTES] = { .type = NLA_U64 }, [IPSET_ATTR_PACKETS] = { .type = NLA_U64 }, + [IPSET_ATTR_COMMENT] = { .type = NLA_NUL_STRING }, }, .me = THIS_MODULE, }; diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c index 74fd00c27210..4f26ee46b51f 100644 --- a/net/netfilter/ipvs/ip_vs_core.c +++ b/net/netfilter/ipvs/ip_vs_core.c @@ -1139,12 +1139,6 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af) ip_vs_fill_iph_skb(af, skb, &iph); #ifdef CONFIG_IP_VS_IPV6 if (af == AF_INET6) { - if (!iph.fragoffs && skb_nfct_reasm(skb)) { - struct sk_buff *reasm = skb_nfct_reasm(skb); - /* Save fw mark for coming frags */ - reasm->ipvs_property = 1; - reasm->mark = skb->mark; - } if (unlikely(iph.protocol == IPPROTO_ICMPV6)) { int related; int verdict = ip_vs_out_icmp_v6(skb, &related, @@ -1239,11 +1233,11 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af) * Check if packet is reply for established ip_vs_conn. */ static unsigned int -ip_vs_reply4(unsigned int hooknum, struct sk_buff *skb, +ip_vs_reply4(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) { - return ip_vs_out(hooknum, skb, AF_INET); + return ip_vs_out(ops->hooknum, skb, AF_INET); } /* @@ -1251,11 +1245,11 @@ ip_vs_reply4(unsigned int hooknum, struct sk_buff *skb, * Check if packet is reply for established ip_vs_conn. */ static unsigned int -ip_vs_local_reply4(unsigned int hooknum, struct sk_buff *skb, +ip_vs_local_reply4(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) { - return ip_vs_out(hooknum, skb, AF_INET); + return ip_vs_out(ops->hooknum, skb, AF_INET); } #ifdef CONFIG_IP_VS_IPV6 @@ -1266,11 +1260,11 @@ ip_vs_local_reply4(unsigned int hooknum, struct sk_buff *skb, * Check if packet is reply for established ip_vs_conn. */ static unsigned int -ip_vs_reply6(unsigned int hooknum, struct sk_buff *skb, +ip_vs_reply6(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) { - return ip_vs_out(hooknum, skb, AF_INET6); + return ip_vs_out(ops->hooknum, skb, AF_INET6); } /* @@ -1278,11 +1272,11 @@ ip_vs_reply6(unsigned int hooknum, struct sk_buff *skb, * Check if packet is reply for established ip_vs_conn. */ static unsigned int -ip_vs_local_reply6(unsigned int hooknum, struct sk_buff *skb, +ip_vs_local_reply6(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) { - return ip_vs_out(hooknum, skb, AF_INET6); + return ip_vs_out(ops->hooknum, skb, AF_INET6); } #endif @@ -1614,12 +1608,6 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af) #ifdef CONFIG_IP_VS_IPV6 if (af == AF_INET6) { - if (!iph.fragoffs && skb_nfct_reasm(skb)) { - struct sk_buff *reasm = skb_nfct_reasm(skb); - /* Save fw mark for coming frags. */ - reasm->ipvs_property = 1; - reasm->mark = skb->mark; - } if (unlikely(iph.protocol == IPPROTO_ICMPV6)) { int related; int verdict = ip_vs_in_icmp_v6(skb, &related, hooknum, @@ -1671,9 +1659,8 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af) /* sorry, all this trouble for a no-hit :) */ IP_VS_DBG_PKT(12, af, pp, skb, 0, "ip_vs_in: packet continues traversal as normal"); - if (iph.fragoffs && !skb_nfct_reasm(skb)) { + if (iph.fragoffs) { /* Fragment that couldn't be mapped to a conn entry - * and don't have any pointer to a reasm skb * is missing module nf_defrag_ipv6 */ IP_VS_DBG_RL("Unhandled frag, load nf_defrag_ipv6\n"); @@ -1733,12 +1720,12 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af) * Schedule and forward packets from remote clients */ static unsigned int -ip_vs_remote_request4(unsigned int hooknum, struct sk_buff *skb, +ip_vs_remote_request4(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) { - return ip_vs_in(hooknum, skb, AF_INET); + return ip_vs_in(ops->hooknum, skb, AF_INET); } /* @@ -1746,58 +1733,26 @@ ip_vs_remote_request4(unsigned int hooknum, struct sk_buff *skb, * Schedule and forward packets from local clients */ static unsigned int -ip_vs_local_request4(unsigned int hooknum, struct sk_buff *skb, +ip_vs_local_request4(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) { - return ip_vs_in(hooknum, skb, AF_INET); + return ip_vs_in(ops->hooknum, skb, AF_INET); } #ifdef CONFIG_IP_VS_IPV6 /* - * AF_INET6 fragment handling - * Copy info from first fragment, to the rest of them. - */ -static unsigned int -ip_vs_preroute_frag6(unsigned int hooknum, struct sk_buff *skb, - const struct net_device *in, - const struct net_device *out, - int (*okfn)(struct sk_buff *)) -{ - struct sk_buff *reasm = skb_nfct_reasm(skb); - struct net *net; - - /* Skip if not a "replay" from nf_ct_frag6_output or first fragment. - * ipvs_property is set when checking first fragment - * in ip_vs_in() and ip_vs_out(). - */ - if (reasm) - IP_VS_DBG(2, "Fragment recv prop:%d\n", reasm->ipvs_property); - if (!reasm || !reasm->ipvs_property) - return NF_ACCEPT; - - net = skb_net(skb); - if (!net_ipvs(net)->enable) - return NF_ACCEPT; - - /* Copy stored fw mark, saved in ip_vs_{in,out} */ - skb->mark = reasm->mark; - - return NF_ACCEPT; -} - -/* * AF_INET6 handler in NF_INET_LOCAL_IN chain * Schedule and forward packets from remote clients */ static unsigned int -ip_vs_remote_request6(unsigned int hooknum, struct sk_buff *skb, +ip_vs_remote_request6(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) { - return ip_vs_in(hooknum, skb, AF_INET6); + return ip_vs_in(ops->hooknum, skb, AF_INET6); } /* @@ -1805,11 +1760,11 @@ ip_vs_remote_request6(unsigned int hooknum, struct sk_buff *skb, * Schedule and forward packets from local clients */ static unsigned int -ip_vs_local_request6(unsigned int hooknum, struct sk_buff *skb, +ip_vs_local_request6(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) { - return ip_vs_in(hooknum, skb, AF_INET6); + return ip_vs_in(ops->hooknum, skb, AF_INET6); } #endif @@ -1825,7 +1780,7 @@ ip_vs_local_request6(unsigned int hooknum, struct sk_buff *skb, * and send them to ip_vs_in_icmp. */ static unsigned int -ip_vs_forward_icmp(unsigned int hooknum, struct sk_buff *skb, +ip_vs_forward_icmp(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) { @@ -1842,12 +1797,12 @@ ip_vs_forward_icmp(unsigned int hooknum, struct sk_buff *skb, if (unlikely(sysctl_backup_only(ipvs) || !ipvs->enable)) return NF_ACCEPT; - return ip_vs_in_icmp(skb, &r, hooknum); + return ip_vs_in_icmp(skb, &r, ops->hooknum); } #ifdef CONFIG_IP_VS_IPV6 static unsigned int -ip_vs_forward_icmp_v6(unsigned int hooknum, struct sk_buff *skb, +ip_vs_forward_icmp_v6(const struct nf_hook_ops *ops, struct sk_buff *skb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) { @@ -1866,7 +1821,7 @@ ip_vs_forward_icmp_v6(unsigned int hooknum, struct sk_buff *skb, if (unlikely(sysctl_backup_only(ipvs) || !ipvs->enable)) return NF_ACCEPT; - return ip_vs_in_icmp_v6(skb, &r, hooknum, &iphdr); + return ip_vs_in_icmp_v6(skb, &r, ops->hooknum, &iphdr); } #endif @@ -1924,14 +1879,6 @@ static struct nf_hook_ops ip_vs_ops[] __read_mostly = { .priority = 100, }, #ifdef CONFIG_IP_VS_IPV6 - /* After mangle & nat fetch 2:nd fragment and following */ - { - .hook = ip_vs_preroute_frag6, - .owner = THIS_MODULE, - .pf = NFPROTO_IPV6, - .hooknum = NF_INET_PRE_ROUTING, - .priority = NF_IP6_PRI_NAT_DST + 1, - }, /* After packet filtering, change source only for VS/NAT */ { .hook = ip_vs_reply6, diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c index a3df9bddc4f7..62786a495cea 100644 --- a/net/netfilter/ipvs/ip_vs_ctl.c +++ b/net/netfilter/ipvs/ip_vs_ctl.c @@ -704,7 +704,7 @@ static void ip_vs_dest_free(struct ip_vs_dest *dest) __ip_vs_dst_cache_reset(dest); __ip_vs_svc_put(svc, false); free_percpu(dest->stats.cpustats); - kfree(dest); + ip_vs_dest_put_and_free(dest); } /* @@ -3820,10 +3820,6 @@ void __net_exit ip_vs_control_net_cleanup(struct net *net) { struct netns_ipvs *ipvs = net_ipvs(net); - /* Some dest can be in grace period even before cleanup, we have to - * defer ip_vs_trash_cleanup until ip_vs_dest_wait_readers is called. - */ - rcu_barrier(); ip_vs_trash_cleanup(net); ip_vs_stop_estimator(net, &ipvs->tot_stats); ip_vs_control_net_cleanup_sysctl(net); diff --git a/net/netfilter/ipvs/ip_vs_lblc.c b/net/netfilter/ipvs/ip_vs_lblc.c index eff13c94498e..ca056a331e60 100644 --- a/net/netfilter/ipvs/ip_vs_lblc.c +++ b/net/netfilter/ipvs/ip_vs_lblc.c @@ -136,7 +136,7 @@ static void ip_vs_lblc_rcu_free(struct rcu_head *head) struct ip_vs_lblc_entry, rcu_head); - ip_vs_dest_put(en->dest); + ip_vs_dest_put_and_free(en->dest); kfree(en); } diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c index 0b8550089a2e..3f21a2f47de1 100644 --- a/net/netfilter/ipvs/ip_vs_lblcr.c +++ b/net/netfilter/ipvs/ip_vs_lblcr.c @@ -130,7 +130,7 @@ static void ip_vs_lblcr_elem_rcu_free(struct rcu_head *head) struct ip_vs_dest_set_elem *e; e = container_of(head, struct ip_vs_dest_set_elem, rcu_head); - ip_vs_dest_put(e->dest); + ip_vs_dest_put_and_free(e->dest); kfree(e); } diff --git a/net/netfilter/ipvs/ip_vs_pe_sip.c b/net/netfilter/ipvs/ip_vs_pe_sip.c index 9ef22bdce9f1..bed5f7042529 100644 --- a/net/netfilter/ipvs/ip_vs_pe_sip.c +++ b/net/netfilter/ipvs/ip_vs_pe_sip.c @@ -65,7 +65,6 @@ static int get_callid(const char *dptr, unsigned int dataoff, static int ip_vs_sip_fill_param(struct ip_vs_conn_param *p, struct sk_buff *skb) { - struct sk_buff *reasm = skb_nfct_reasm(skb); struct ip_vs_iphdr iph; unsigned int dataoff, datalen, matchoff, matchlen; const char *dptr; @@ -79,15 +78,10 @@ ip_vs_sip_fill_param(struct ip_vs_conn_param *p, struct sk_buff *skb) /* todo: IPv6 fragments: * I think this only should be done for the first fragment. /HS */ - if (reasm) { - skb = reasm; - dataoff = iph.thoff_reasm + sizeof(struct udphdr); - } else - dataoff = iph.len + sizeof(struct udphdr); + dataoff = iph.len + sizeof(struct udphdr); if (dataoff >= skb->len) return -EINVAL; - /* todo: Check if this will mess-up the reasm skb !!! /HS */ retc = skb_linearize(skb); if (retc < 0) return retc; diff --git a/net/netfilter/ipvs/ip_vs_proto_sctp.c b/net/netfilter/ipvs/ip_vs_proto_sctp.c index 23e596e438b3..2f7ea7564044 100644 --- a/net/netfilter/ipvs/ip_vs_proto_sctp.c +++ b/net/netfilter/ipvs/ip_vs_proto_sctp.c @@ -20,13 +20,18 @@ sctp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd, sctp_sctphdr_t *sh, _sctph; sh = skb_header_pointer(skb, iph->len, sizeof(_sctph), &_sctph); - if (sh == NULL) + if (sh == NULL) { + *verdict = NF_DROP; return 0; + } sch = skb_header_pointer(skb, iph->len + sizeof(sctp_sctphdr_t), sizeof(_schunkh), &_schunkh); - if (sch == NULL) + if (sch == NULL) { + *verdict = NF_DROP; return 0; + } + net = skb_net(skb); ipvs = net_ipvs(net); rcu_read_lock(); @@ -76,6 +81,7 @@ sctp_snat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp, { sctp_sctphdr_t *sctph; unsigned int sctphoff = iph->len; + bool payload_csum = false; #ifdef CONFIG_IP_VS_IPV6 if (cp->af == AF_INET6 && iph->fragoffs) @@ -87,19 +93,31 @@ sctp_snat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp, return 0; if (unlikely(cp->app != NULL)) { + int ret; + /* Some checks before mangling */ if (pp->csum_check && !pp->csum_check(cp->af, skb, pp)) return 0; /* Call application helper if needed */ - if (!ip_vs_app_pkt_out(cp, skb)) + ret = ip_vs_app_pkt_out(cp, skb); + if (ret == 0) return 0; + /* ret=2: csum update is needed after payload mangling */ + if (ret == 2) + payload_csum = true; } sctph = (void *) skb_network_header(skb) + sctphoff; - sctph->source = cp->vport; - sctp_nat_csum(skb, sctph, sctphoff); + /* Only update csum if we really have to */ + if (sctph->source != cp->vport || payload_csum || + skb->ip_summed == CHECKSUM_PARTIAL) { + sctph->source = cp->vport; + sctp_nat_csum(skb, sctph, sctphoff); + } else { + skb->ip_summed = CHECKSUM_UNNECESSARY; + } return 1; } @@ -110,6 +128,7 @@ sctp_dnat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp, { sctp_sctphdr_t *sctph; unsigned int sctphoff = iph->len; + bool payload_csum = false; #ifdef CONFIG_IP_VS_IPV6 if (cp->af == AF_INET6 && iph->fragoffs) @@ -121,19 +140,32 @@ sctp_dnat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp, return 0; if (unlikely(cp->app != NULL)) { + int ret; + /* Some checks before mangling */ if (pp->csum_check && !pp->csum_check(cp->af, skb, pp)) return 0; /* Call application helper if needed */ - if (!ip_vs_app_pkt_in(cp, skb)) + ret = ip_vs_app_pkt_in(cp, skb); + if (ret == 0) return 0; + /* ret=2: csum update is needed after payload mangling */ + if (ret == 2) + payload_csum = true; } sctph = (void *) skb_network_header(skb) + sctphoff; - sctph->dest = cp->dport; - sctp_nat_csum(skb, sctph, sctphoff); + /* Only update csum if we really have to */ + if (sctph->dest != cp->dport || payload_csum || + (skb->ip_summed == CHECKSUM_PARTIAL && + !(skb_dst(skb)->dev->features & NETIF_F_SCTP_CSUM))) { + sctph->dest = cp->dport; + sctp_nat_csum(skb, sctph, sctphoff); + } else if (skb->ip_summed != CHECKSUM_PARTIAL) { + skb->ip_summed = CHECKSUM_UNNECESSARY; + } return 1; } diff --git a/net/netfilter/ipvs/ip_vs_sh.c b/net/netfilter/ipvs/ip_vs_sh.c index 3588faebe529..cc65b2f42cd4 100644 --- a/net/netfilter/ipvs/ip_vs_sh.c +++ b/net/netfilter/ipvs/ip_vs_sh.c @@ -115,27 +115,46 @@ ip_vs_sh_get(struct ip_vs_service *svc, struct ip_vs_sh_state *s, } -/* As ip_vs_sh_get, but with fallback if selected server is unavailable */ +/* As ip_vs_sh_get, but with fallback if selected server is unavailable + * + * The fallback strategy loops around the table starting from a "random" + * point (in fact, it is chosen to be the original hash value to make the + * algorithm deterministic) to find a new server. + */ static inline struct ip_vs_dest * ip_vs_sh_get_fallback(struct ip_vs_service *svc, struct ip_vs_sh_state *s, const union nf_inet_addr *addr, __be16 port) { - unsigned int offset; - unsigned int hash; + unsigned int offset, roffset; + unsigned int hash, ihash; struct ip_vs_dest *dest; + /* first try the dest it's supposed to go to */ + ihash = ip_vs_sh_hashkey(svc->af, addr, port, 0); + dest = rcu_dereference(s->buckets[ihash].dest); + if (!dest) + return NULL; + if (!is_unavailable(dest)) + return dest; + + IP_VS_DBG_BUF(6, "SH: selected unavailable server %s:%d, reselecting", + IP_VS_DBG_ADDR(svc->af, &dest->addr), ntohs(dest->port)); + + /* if the original dest is unavailable, loop around the table + * starting from ihash to find a new dest + */ for (offset = 0; offset < IP_VS_SH_TAB_SIZE; offset++) { - hash = ip_vs_sh_hashkey(svc->af, addr, port, offset); + roffset = (offset + ihash) % IP_VS_SH_TAB_SIZE; + hash = ip_vs_sh_hashkey(svc->af, addr, port, roffset); dest = rcu_dereference(s->buckets[hash].dest); if (!dest) break; - if (is_unavailable(dest)) - IP_VS_DBG_BUF(6, "SH: selected unavailable server " - "%s:%d (offset %d)", - IP_VS_DBG_ADDR(svc->af, &dest->addr), - ntohs(dest->port), offset); - else + if (!is_unavailable(dest)) return dest; + IP_VS_DBG_BUF(6, "SH: selected unavailable " + "server %s:%d (offset %d), reselecting", + IP_VS_DBG_ADDR(svc->af, &dest->addr), + ntohs(dest->port), roffset); } return NULL; diff --git a/net/netfilter/nf_conntrack_acct.c b/net/netfilter/nf_conntrack_acct.c index 2d3030ab5b61..a4b5e2a435ac 100644 --- a/net/netfilter/nf_conntrack_acct.c +++ b/net/netfilter/nf_conntrack_acct.c @@ -39,21 +39,23 @@ static struct ctl_table acct_sysctl_table[] = { unsigned int seq_print_acct(struct seq_file *s, const struct nf_conn *ct, int dir) { - struct nf_conn_counter *acct; + struct nf_conn_acct *acct; + struct nf_conn_counter *counter; acct = nf_conn_acct_find(ct); if (!acct) return 0; + counter = acct->counter; return seq_printf(s, "packets=%llu bytes=%llu ", - (unsigned long long)atomic64_read(&acct[dir].packets), - (unsigned long long)atomic64_read(&acct[dir].bytes)); + (unsigned long long)atomic64_read(&counter[dir].packets), + (unsigned long long)atomic64_read(&counter[dir].bytes)); }; EXPORT_SYMBOL_GPL(seq_print_acct); static struct nf_ct_ext_type acct_extend __read_mostly = { - .len = sizeof(struct nf_conn_counter[IP_CT_DIR_MAX]), - .align = __alignof__(struct nf_conn_counter[IP_CT_DIR_MAX]), + .len = sizeof(struct nf_conn_acct), + .align = __alignof__(struct nf_conn_acct), .id = NF_CT_EXT_ACCT, }; diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 5d892febd64c..e22d950c60b3 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -1109,12 +1109,14 @@ void __nf_ct_refresh_acct(struct nf_conn *ct, acct: if (do_acct) { - struct nf_conn_counter *acct; + struct nf_conn_acct *acct; acct = nf_conn_acct_find(ct); if (acct) { - atomic64_inc(&acct[CTINFO2DIR(ctinfo)].packets); - atomic64_add(skb->len, &acct[CTINFO2DIR(ctinfo)].bytes); + struct nf_conn_counter *counter = acct->counter; + + atomic64_inc(&counter[CTINFO2DIR(ctinfo)].packets); + atomic64_add(skb->len, &counter[CTINFO2DIR(ctinfo)].bytes); } } } @@ -1126,13 +1128,15 @@ bool __nf_ct_kill_acct(struct nf_conn *ct, int do_acct) { if (do_acct) { - struct nf_conn_counter *acct; + struct nf_conn_acct *acct; acct = nf_conn_acct_find(ct); if (acct) { - atomic64_inc(&acct[CTINFO2DIR(ctinfo)].packets); + struct nf_conn_counter *counter = acct->counter; + + atomic64_inc(&counter[CTINFO2DIR(ctinfo)].packets); atomic64_add(skb->len - skb_network_offset(skb), - &acct[CTINFO2DIR(ctinfo)].bytes); + &counter[CTINFO2DIR(ctinfo)].bytes); } } diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index eea936b70d15..08870b859046 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -211,13 +211,23 @@ nla_put_failure: } static int -dump_counters(struct sk_buff *skb, u64 pkts, u64 bytes, - enum ip_conntrack_dir dir) +dump_counters(struct sk_buff *skb, struct nf_conn_acct *acct, + enum ip_conntrack_dir dir, int type) { - enum ctattr_type type = dir ? CTA_COUNTERS_REPLY: CTA_COUNTERS_ORIG; + enum ctattr_type attr = dir ? CTA_COUNTERS_REPLY: CTA_COUNTERS_ORIG; + struct nf_conn_counter *counter = acct->counter; struct nlattr *nest_count; + u64 pkts, bytes; - nest_count = nla_nest_start(skb, type | NLA_F_NESTED); + if (type == IPCTNL_MSG_CT_GET_CTRZERO) { + pkts = atomic64_xchg(&counter[dir].packets, 0); + bytes = atomic64_xchg(&counter[dir].bytes, 0); + } else { + pkts = atomic64_read(&counter[dir].packets); + bytes = atomic64_read(&counter[dir].bytes); + } + + nest_count = nla_nest_start(skb, attr | NLA_F_NESTED); if (!nest_count) goto nla_put_failure; @@ -234,24 +244,19 @@ nla_put_failure: } static int -ctnetlink_dump_counters(struct sk_buff *skb, const struct nf_conn *ct, - enum ip_conntrack_dir dir, int type) +ctnetlink_dump_acct(struct sk_buff *skb, const struct nf_conn *ct, int type) { - struct nf_conn_counter *acct; - u64 pkts, bytes; + struct nf_conn_acct *acct = nf_conn_acct_find(ct); - acct = nf_conn_acct_find(ct); if (!acct) return 0; - if (type == IPCTNL_MSG_CT_GET_CTRZERO) { - pkts = atomic64_xchg(&acct[dir].packets, 0); - bytes = atomic64_xchg(&acct[dir].bytes, 0); - } else { - pkts = atomic64_read(&acct[dir].packets); - bytes = atomic64_read(&acct[dir].bytes); - } - return dump_counters(skb, pkts, bytes, dir); + if (dump_counters(skb, acct, IP_CT_DIR_ORIGINAL, type) < 0) + return -1; + if (dump_counters(skb, acct, IP_CT_DIR_REPLY, type) < 0) + return -1; + + return 0; } static int @@ -488,8 +493,7 @@ ctnetlink_fill_info(struct sk_buff *skb, u32 portid, u32 seq, u32 type, if (ctnetlink_dump_status(skb, ct) < 0 || ctnetlink_dump_timeout(skb, ct) < 0 || - ctnetlink_dump_counters(skb, ct, IP_CT_DIR_ORIGINAL, type) < 0 || - ctnetlink_dump_counters(skb, ct, IP_CT_DIR_REPLY, type) < 0 || + ctnetlink_dump_acct(skb, ct, type) < 0 || ctnetlink_dump_timestamp(skb, ct) < 0 || ctnetlink_dump_protoinfo(skb, ct) < 0 || ctnetlink_dump_helpinfo(skb, ct) < 0 || @@ -530,7 +534,7 @@ ctnetlink_proto_size(const struct nf_conn *ct) } static inline size_t -ctnetlink_counters_size(const struct nf_conn *ct) +ctnetlink_acct_size(const struct nf_conn *ct) { if (!nf_ct_ext_exist(ct, NF_CT_EXT_ACCT)) return 0; @@ -579,7 +583,7 @@ ctnetlink_nlmsg_size(const struct nf_conn *ct) + 3 * nla_total_size(sizeof(u_int8_t)) /* CTA_PROTO_NUM */ + nla_total_size(sizeof(u_int32_t)) /* CTA_ID */ + nla_total_size(sizeof(u_int32_t)) /* CTA_STATUS */ - + ctnetlink_counters_size(ct) + + ctnetlink_acct_size(ct) + ctnetlink_timestamp_size(ct) + nla_total_size(sizeof(u_int32_t)) /* CTA_TIMEOUT */ + nla_total_size(0) /* CTA_PROTOINFO */ @@ -673,10 +677,7 @@ ctnetlink_conntrack_event(unsigned int events, struct nf_ct_event *item) goto nla_put_failure; if (events & (1 << IPCT_DESTROY)) { - if (ctnetlink_dump_counters(skb, ct, - IP_CT_DIR_ORIGINAL, type) < 0 || - ctnetlink_dump_counters(skb, ct, - IP_CT_DIR_REPLY, type) < 0 || + if (ctnetlink_dump_acct(skb, ct, type) < 0 || ctnetlink_dump_timestamp(skb, ct) < 0) goto nla_put_failure; } else { diff --git a/net/netfilter/nf_conntrack_sip.c b/net/netfilter/nf_conntrack_sip.c index e0c4373b4747..466410eaa482 100644 --- a/net/netfilter/nf_conntrack_sip.c +++ b/net/netfilter/nf_conntrack_sip.c @@ -52,66 +52,8 @@ module_param(sip_direct_media, int, 0600); MODULE_PARM_DESC(sip_direct_media, "Expect Media streams between signalling " "endpoints only (default 1)"); -unsigned int (*nf_nat_sip_hook)(struct sk_buff *skb, unsigned int protoff, - unsigned int dataoff, const char **dptr, - unsigned int *datalen) __read_mostly; -EXPORT_SYMBOL_GPL(nf_nat_sip_hook); - -void (*nf_nat_sip_seq_adjust_hook)(struct sk_buff *skb, unsigned int protoff, - s16 off) __read_mostly; -EXPORT_SYMBOL_GPL(nf_nat_sip_seq_adjust_hook); - -unsigned int (*nf_nat_sip_expect_hook)(struct sk_buff *skb, - unsigned int protoff, - unsigned int dataoff, - const char **dptr, - unsigned int *datalen, - struct nf_conntrack_expect *exp, - unsigned int matchoff, - unsigned int matchlen) __read_mostly; -EXPORT_SYMBOL_GPL(nf_nat_sip_expect_hook); - -unsigned int (*nf_nat_sdp_addr_hook)(struct sk_buff *skb, unsigned int protoff, - unsigned int dataoff, - const char **dptr, - unsigned int *datalen, - unsigned int sdpoff, - enum sdp_header_types type, - enum sdp_header_types term, - const union nf_inet_addr *addr) - __read_mostly; -EXPORT_SYMBOL_GPL(nf_nat_sdp_addr_hook); - -unsigned int (*nf_nat_sdp_port_hook)(struct sk_buff *skb, unsigned int protoff, - unsigned int dataoff, - const char **dptr, - unsigned int *datalen, - unsigned int matchoff, - unsigned int matchlen, - u_int16_t port) __read_mostly; -EXPORT_SYMBOL_GPL(nf_nat_sdp_port_hook); - -unsigned int (*nf_nat_sdp_session_hook)(struct sk_buff *skb, - unsigned int protoff, - unsigned int dataoff, - const char **dptr, - unsigned int *datalen, - unsigned int sdpoff, - const union nf_inet_addr *addr) - __read_mostly; -EXPORT_SYMBOL_GPL(nf_nat_sdp_session_hook); - -unsigned int (*nf_nat_sdp_media_hook)(struct sk_buff *skb, unsigned int protoff, - unsigned int dataoff, - const char **dptr, - unsigned int *datalen, - struct nf_conntrack_expect *rtp_exp, - struct nf_conntrack_expect *rtcp_exp, - unsigned int mediaoff, - unsigned int medialen, - union nf_inet_addr *rtp_addr) - __read_mostly; -EXPORT_SYMBOL_GPL(nf_nat_sdp_media_hook); +const struct nf_nat_sip_hooks *nf_nat_sip_hooks; +EXPORT_SYMBOL_GPL(nf_nat_sip_hooks); static int string_len(const struct nf_conn *ct, const char *dptr, const char *limit, int *shift) @@ -914,8 +856,7 @@ static int set_expected_rtp_rtcp(struct sk_buff *skb, unsigned int protoff, int direct_rtp = 0, skip_expect = 0, ret = NF_DROP; u_int16_t base_port; __be16 rtp_port, rtcp_port; - typeof(nf_nat_sdp_port_hook) nf_nat_sdp_port; - typeof(nf_nat_sdp_media_hook) nf_nat_sdp_media; + const struct nf_nat_sip_hooks *hooks; saddr = NULL; if (sip_direct_media) { @@ -966,22 +907,23 @@ static int set_expected_rtp_rtcp(struct sk_buff *skb, unsigned int protoff, #endif skip_expect = 1; } while (!skip_expect); - rcu_read_unlock(); base_port = ntohs(tuple.dst.u.udp.port) & ~1; rtp_port = htons(base_port); rtcp_port = htons(base_port + 1); if (direct_rtp) { - nf_nat_sdp_port = rcu_dereference(nf_nat_sdp_port_hook); - if (nf_nat_sdp_port && - !nf_nat_sdp_port(skb, protoff, dataoff, dptr, datalen, + hooks = rcu_dereference(nf_nat_sip_hooks); + if (hooks && + !hooks->sdp_port(skb, protoff, dataoff, dptr, datalen, mediaoff, medialen, ntohs(rtp_port))) goto err1; } - if (skip_expect) + if (skip_expect) { + rcu_read_unlock(); return NF_ACCEPT; + } rtp_exp = nf_ct_expect_alloc(ct); if (rtp_exp == NULL) @@ -995,10 +937,10 @@ static int set_expected_rtp_rtcp(struct sk_buff *skb, unsigned int protoff, nf_ct_expect_init(rtcp_exp, class, nf_ct_l3num(ct), saddr, daddr, IPPROTO_UDP, NULL, &rtcp_port); - nf_nat_sdp_media = rcu_dereference(nf_nat_sdp_media_hook); - if (nf_nat_sdp_media && ct->status & IPS_NAT_MASK && !direct_rtp) - ret = nf_nat_sdp_media(skb, protoff, dataoff, dptr, datalen, - rtp_exp, rtcp_exp, + hooks = rcu_dereference(nf_nat_sip_hooks); + if (hooks && ct->status & IPS_NAT_MASK && !direct_rtp) + ret = hooks->sdp_media(skb, protoff, dataoff, dptr, + datalen, rtp_exp, rtcp_exp, mediaoff, medialen, daddr); else { if (nf_ct_expect_related(rtp_exp) == 0) { @@ -1012,6 +954,7 @@ static int set_expected_rtp_rtcp(struct sk_buff *skb, unsigned int protoff, err2: nf_ct_expect_put(rtp_exp); err1: + rcu_read_unlock(); return ret; } @@ -1051,13 +994,12 @@ static int process_sdp(struct sk_buff *skb, unsigned int protoff, unsigned int caddr_len, maddr_len; unsigned int i; union nf_inet_addr caddr, maddr, rtp_addr; + const struct nf_nat_sip_hooks *hooks; unsigned int port; const struct sdp_media_type *t; int ret = NF_ACCEPT; - typeof(nf_nat_sdp_addr_hook) nf_nat_sdp_addr; - typeof(nf_nat_sdp_session_hook) nf_nat_sdp_session; - nf_nat_sdp_addr = rcu_dereference(nf_nat_sdp_addr_hook); + hooks = rcu_dereference(nf_nat_sip_hooks); /* Find beginning of session description */ if (ct_sip_get_sdp_header(ct, *dptr, 0, *datalen, @@ -1125,10 +1067,11 @@ static int process_sdp(struct sk_buff *skb, unsigned int protoff, } /* Update media connection address if present */ - if (maddr_len && nf_nat_sdp_addr && ct->status & IPS_NAT_MASK) { - ret = nf_nat_sdp_addr(skb, protoff, dataoff, + if (maddr_len && hooks && ct->status & IPS_NAT_MASK) { + ret = hooks->sdp_addr(skb, protoff, dataoff, dptr, datalen, mediaoff, - SDP_HDR_CONNECTION, SDP_HDR_MEDIA, + SDP_HDR_CONNECTION, + SDP_HDR_MEDIA, &rtp_addr); if (ret != NF_ACCEPT) { nf_ct_helper_log(skb, ct, "cannot mangle SDP"); @@ -1139,10 +1082,11 @@ static int process_sdp(struct sk_buff *skb, unsigned int protoff, } /* Update session connection and owner addresses */ - nf_nat_sdp_session = rcu_dereference(nf_nat_sdp_session_hook); - if (nf_nat_sdp_session && ct->status & IPS_NAT_MASK) - ret = nf_nat_sdp_session(skb, protoff, dataoff, - dptr, datalen, sdpoff, &rtp_addr); + hooks = rcu_dereference(nf_nat_sip_hooks); + if (hooks && ct->status & IPS_NAT_MASK) + ret = hooks->sdp_session(skb, protoff, dataoff, + dptr, datalen, sdpoff, + &rtp_addr); return ret; } @@ -1242,11 +1186,11 @@ static int process_register_request(struct sk_buff *skb, unsigned int protoff, unsigned int matchoff, matchlen; struct nf_conntrack_expect *exp; union nf_inet_addr *saddr, daddr; + const struct nf_nat_sip_hooks *hooks; __be16 port; u8 proto; unsigned int expires = 0; int ret; - typeof(nf_nat_sip_expect_hook) nf_nat_sip_expect; /* Expected connections can not register again. */ if (ct->status & IPS_EXPECTED) @@ -1309,10 +1253,10 @@ static int process_register_request(struct sk_buff *skb, unsigned int protoff, exp->helper = nfct_help(ct)->helper; exp->flags = NF_CT_EXPECT_PERMANENT | NF_CT_EXPECT_INACTIVE; - nf_nat_sip_expect = rcu_dereference(nf_nat_sip_expect_hook); - if (nf_nat_sip_expect && ct->status & IPS_NAT_MASK) - ret = nf_nat_sip_expect(skb, protoff, dataoff, dptr, datalen, - exp, matchoff, matchlen); + hooks = rcu_dereference(nf_nat_sip_hooks); + if (hooks && ct->status & IPS_NAT_MASK) + ret = hooks->expect(skb, protoff, dataoff, dptr, datalen, + exp, matchoff, matchlen); else { if (nf_ct_expect_related(exp) != 0) { nf_ct_helper_log(skb, ct, "cannot add expectation"); @@ -1515,7 +1459,7 @@ static int process_sip_msg(struct sk_buff *skb, struct nf_conn *ct, unsigned int protoff, unsigned int dataoff, const char **dptr, unsigned int *datalen) { - typeof(nf_nat_sip_hook) nf_nat_sip; + const struct nf_nat_sip_hooks *hooks; int ret; if (strnicmp(*dptr, "SIP/2.0 ", strlen("SIP/2.0 ")) != 0) @@ -1524,9 +1468,9 @@ static int process_sip_msg(struct sk_buff *skb, struct nf_conn *ct, ret = process_sip_response(skb, protoff, dataoff, dptr, datalen); if (ret == NF_ACCEPT && ct->status & IPS_NAT_MASK) { - nf_nat_sip = rcu_dereference(nf_nat_sip_hook); - if (nf_nat_sip && !nf_nat_sip(skb, protoff, dataoff, - dptr, datalen)) { + hooks = rcu_dereference(nf_nat_sip_hooks); + if (hooks && !hooks->msg(skb, protoff, dataoff, + dptr, datalen)) { nf_ct_helper_log(skb, ct, "cannot NAT SIP message"); ret = NF_DROP; } @@ -1546,7 +1490,6 @@ static int sip_help_tcp(struct sk_buff *skb, unsigned int protoff, s16 diff, tdiff = 0; int ret = NF_ACCEPT; bool term; - typeof(nf_nat_sip_seq_adjust_hook) nf_nat_sip_seq_adjust; if (ctinfo != IP_CT_ESTABLISHED && ctinfo != IP_CT_ESTABLISHED_REPLY) @@ -1610,9 +1553,11 @@ static int sip_help_tcp(struct sk_buff *skb, unsigned int protoff, } if (ret == NF_ACCEPT && ct->status & IPS_NAT_MASK) { - nf_nat_sip_seq_adjust = rcu_dereference(nf_nat_sip_seq_adjust_hook); - if (nf_nat_sip_seq_adjust) - nf_nat_sip_seq_adjust(skb, protoff, tdiff); + const struct nf_nat_sip_hooks *hooks; + + hooks = rcu_dereference(nf_nat_sip_hooks); + if (hooks) + hooks->seq_adjust(skb, protoff, tdiff); } return ret; diff --git a/net/netfilter/nf_internals.h b/net/netfilter/nf_internals.h index 3deec997be89..61a3c927e63c 100644 --- a/net/netfilter/nf_internals.h +++ b/net/netfilter/nf_internals.h @@ -13,26 +13,20 @@ /* core.c */ -extern unsigned int nf_iterate(struct list_head *head, - struct sk_buff *skb, - unsigned int hook, - const struct net_device *indev, - const struct net_device *outdev, - struct nf_hook_ops **elemp, - int (*okfn)(struct sk_buff *), - int hook_thresh); +unsigned int nf_iterate(struct list_head *head, struct sk_buff *skb, + unsigned int hook, const struct net_device *indev, + const struct net_device *outdev, + struct nf_hook_ops **elemp, + int (*okfn)(struct sk_buff *), int hook_thresh); /* nf_queue.c */ -extern int nf_queue(struct sk_buff *skb, - struct nf_hook_ops *elem, - u_int8_t pf, unsigned int hook, - struct net_device *indev, - struct net_device *outdev, - int (*okfn)(struct sk_buff *), - unsigned int queuenum); -extern int __init netfilter_queue_init(void); +int nf_queue(struct sk_buff *skb, struct nf_hook_ops *elem, u_int8_t pf, + unsigned int hook, struct net_device *indev, + struct net_device *outdev, int (*okfn)(struct sk_buff *), + unsigned int queuenum); +int __init netfilter_queue_init(void); /* nf_log.c */ -extern int __init netfilter_log_init(void); +int __init netfilter_log_init(void); #endif diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c index 6f0f4f7f68a5..63a815402211 100644 --- a/net/netfilter/nf_nat_core.c +++ b/net/netfilter/nf_nat_core.c @@ -432,6 +432,26 @@ nf_nat_setup_info(struct nf_conn *ct, } EXPORT_SYMBOL(nf_nat_setup_info); +unsigned int +nf_nat_alloc_null_binding(struct nf_conn *ct, unsigned int hooknum) +{ + /* Force range to this IP; let proto decide mapping for + * per-proto parts (hence not IP_NAT_RANGE_PROTO_SPECIFIED). + * Use reply in case it's already been mangled (eg local packet). + */ + union nf_inet_addr ip = + (HOOK2MANIP(hooknum) == NF_NAT_MANIP_SRC ? + ct->tuplehash[IP_CT_DIR_REPLY].tuple.dst.u3 : + ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u3); + struct nf_nat_range range = { + .flags = NF_NAT_RANGE_MAP_IPS, + .min_addr = ip, + .max_addr = ip, + }; + return nf_nat_setup_info(ct, &range, HOOK2MANIP(hooknum)); +} +EXPORT_SYMBOL_GPL(nf_nat_alloc_null_binding); + /* Do packet manipulations according to nf_nat_setup_info. */ unsigned int nf_nat_packet(struct nf_conn *ct, enum ip_conntrack_info ctinfo, diff --git a/net/netfilter/nf_nat_sip.c b/net/netfilter/nf_nat_sip.c index f9790405b7ff..b4d691db955e 100644 --- a/net/netfilter/nf_nat_sip.c +++ b/net/netfilter/nf_nat_sip.c @@ -625,33 +625,26 @@ static struct nf_ct_helper_expectfn sip_nat = { static void __exit nf_nat_sip_fini(void) { - RCU_INIT_POINTER(nf_nat_sip_hook, NULL); - RCU_INIT_POINTER(nf_nat_sip_seq_adjust_hook, NULL); - RCU_INIT_POINTER(nf_nat_sip_expect_hook, NULL); - RCU_INIT_POINTER(nf_nat_sdp_addr_hook, NULL); - RCU_INIT_POINTER(nf_nat_sdp_port_hook, NULL); - RCU_INIT_POINTER(nf_nat_sdp_session_hook, NULL); - RCU_INIT_POINTER(nf_nat_sdp_media_hook, NULL); + RCU_INIT_POINTER(nf_nat_sip_hooks, NULL); + nf_ct_helper_expectfn_unregister(&sip_nat); synchronize_rcu(); } +static const struct nf_nat_sip_hooks sip_hooks = { + .msg = nf_nat_sip, + .seq_adjust = nf_nat_sip_seq_adjust, + .expect = nf_nat_sip_expect, + .sdp_addr = nf_nat_sdp_addr, + .sdp_port = nf_nat_sdp_port, + .sdp_session = nf_nat_sdp_session, + .sdp_media = nf_nat_sdp_media, +}; + static int __init nf_nat_sip_init(void) { - BUG_ON(nf_nat_sip_hook != NULL); - BUG_ON(nf_nat_sip_seq_adjust_hook != NULL); - BUG_ON(nf_nat_sip_expect_hook != NULL); - BUG_ON(nf_nat_sdp_addr_hook != NULL); - BUG_ON(nf_nat_sdp_port_hook != NULL); - BUG_ON(nf_nat_sdp_session_hook != NULL); - BUG_ON(nf_nat_sdp_media_hook != NULL); - RCU_INIT_POINTER(nf_nat_sip_hook, nf_nat_sip); - RCU_INIT_POINTER(nf_nat_sip_seq_adjust_hook, nf_nat_sip_seq_adjust); - RCU_INIT_POINTER(nf_nat_sip_expect_hook, nf_nat_sip_expect); - RCU_INIT_POINTER(nf_nat_sdp_addr_hook, nf_nat_sdp_addr); - RCU_INIT_POINTER(nf_nat_sdp_port_hook, nf_nat_sdp_port); - RCU_INIT_POINTER(nf_nat_sdp_session_hook, nf_nat_sdp_session); - RCU_INIT_POINTER(nf_nat_sdp_media_hook, nf_nat_sdp_media); + BUG_ON(nf_nat_sip_hooks != NULL); + RCU_INIT_POINTER(nf_nat_sip_hooks, &sip_hooks); nf_ct_helper_expectfn_register(&sip_nat); return 0; } diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c new file mode 100644 index 000000000000..dcddc49c0e08 --- /dev/null +++ b/net/netfilter/nf_tables_api.c @@ -0,0 +1,3275 @@ +/* + * Copyright (c) 2007-2009 Patrick McHardy <kaber@trash.net> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Development of this code funded by Astaro AG (http://www.astaro.com/) + */ + +#include <linux/module.h> +#include <linux/init.h> +#include <linux/list.h> +#include <linux/skbuff.h> +#include <linux/netlink.h> +#include <linux/netfilter.h> +#include <linux/netfilter/nfnetlink.h> +#include <linux/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables_core.h> +#include <net/netfilter/nf_tables.h> +#include <net/net_namespace.h> +#include <net/sock.h> + +static LIST_HEAD(nf_tables_expressions); + +/** + * nft_register_afinfo - register nf_tables address family info + * + * @afi: address family info to register + * + * Register the address family for use with nf_tables. Returns zero on + * success or a negative errno code otherwise. + */ +int nft_register_afinfo(struct net *net, struct nft_af_info *afi) +{ + INIT_LIST_HEAD(&afi->tables); + nfnl_lock(NFNL_SUBSYS_NFTABLES); + list_add_tail(&afi->list, &net->nft.af_info); + nfnl_unlock(NFNL_SUBSYS_NFTABLES); + return 0; +} +EXPORT_SYMBOL_GPL(nft_register_afinfo); + +/** + * nft_unregister_afinfo - unregister nf_tables address family info + * + * @afi: address family info to unregister + * + * Unregister the address family for use with nf_tables. + */ +void nft_unregister_afinfo(struct nft_af_info *afi) +{ + nfnl_lock(NFNL_SUBSYS_NFTABLES); + list_del(&afi->list); + nfnl_unlock(NFNL_SUBSYS_NFTABLES); +} +EXPORT_SYMBOL_GPL(nft_unregister_afinfo); + +static struct nft_af_info *nft_afinfo_lookup(struct net *net, int family) +{ + struct nft_af_info *afi; + + list_for_each_entry(afi, &net->nft.af_info, list) { + if (afi->family == family) + return afi; + } + return NULL; +} + +static struct nft_af_info * +nf_tables_afinfo_lookup(struct net *net, int family, bool autoload) +{ + struct nft_af_info *afi; + + afi = nft_afinfo_lookup(net, family); + if (afi != NULL) + return afi; +#ifdef CONFIG_MODULES + if (autoload) { + nfnl_unlock(NFNL_SUBSYS_NFTABLES); + request_module("nft-afinfo-%u", family); + nfnl_lock(NFNL_SUBSYS_NFTABLES); + afi = nft_afinfo_lookup(net, family); + if (afi != NULL) + return ERR_PTR(-EAGAIN); + } +#endif + return ERR_PTR(-EAFNOSUPPORT); +} + +/* + * Tables + */ + +static struct nft_table *nft_table_lookup(const struct nft_af_info *afi, + const struct nlattr *nla) +{ + struct nft_table *table; + + list_for_each_entry(table, &afi->tables, list) { + if (!nla_strcmp(nla, table->name)) + return table; + } + return NULL; +} + +static struct nft_table *nf_tables_table_lookup(const struct nft_af_info *afi, + const struct nlattr *nla) +{ + struct nft_table *table; + + if (nla == NULL) + return ERR_PTR(-EINVAL); + + table = nft_table_lookup(afi, nla); + if (table != NULL) + return table; + + return ERR_PTR(-ENOENT); +} + +static inline u64 nf_tables_alloc_handle(struct nft_table *table) +{ + return ++table->hgenerator; +} + +static struct nf_chain_type *chain_type[AF_MAX][NFT_CHAIN_T_MAX]; + +static int __nf_tables_chain_type_lookup(int family, const struct nlattr *nla) +{ + int i; + + for (i=0; i<NFT_CHAIN_T_MAX; i++) { + if (chain_type[family][i] != NULL && + !nla_strcmp(nla, chain_type[family][i]->name)) + return i; + } + return -1; +} + +static int nf_tables_chain_type_lookup(const struct nft_af_info *afi, + const struct nlattr *nla, + bool autoload) +{ + int type; + + type = __nf_tables_chain_type_lookup(afi->family, nla); +#ifdef CONFIG_MODULES + if (type < 0 && autoload) { + nfnl_unlock(NFNL_SUBSYS_NFTABLES); + request_module("nft-chain-%u-%*.s", afi->family, + nla_len(nla)-1, (const char *)nla_data(nla)); + nfnl_lock(NFNL_SUBSYS_NFTABLES); + type = __nf_tables_chain_type_lookup(afi->family, nla); + } +#endif + return type; +} + +static const struct nla_policy nft_table_policy[NFTA_TABLE_MAX + 1] = { + [NFTA_TABLE_NAME] = { .type = NLA_STRING }, + [NFTA_TABLE_FLAGS] = { .type = NLA_U32 }, +}; + +static int nf_tables_fill_table_info(struct sk_buff *skb, u32 portid, u32 seq, + int event, u32 flags, int family, + const struct nft_table *table) +{ + struct nlmsghdr *nlh; + struct nfgenmsg *nfmsg; + + event |= NFNL_SUBSYS_NFTABLES << 8; + nlh = nlmsg_put(skb, portid, seq, event, sizeof(struct nfgenmsg), flags); + if (nlh == NULL) + goto nla_put_failure; + + nfmsg = nlmsg_data(nlh); + nfmsg->nfgen_family = family; + nfmsg->version = NFNETLINK_V0; + nfmsg->res_id = 0; + + if (nla_put_string(skb, NFTA_TABLE_NAME, table->name) || + nla_put_be32(skb, NFTA_TABLE_FLAGS, htonl(table->flags))) + goto nla_put_failure; + + return nlmsg_end(skb, nlh); + +nla_put_failure: + nlmsg_trim(skb, nlh); + return -1; +} + +static int nf_tables_table_notify(const struct sk_buff *oskb, + const struct nlmsghdr *nlh, + const struct nft_table *table, + int event, int family) +{ + struct sk_buff *skb; + u32 portid = oskb ? NETLINK_CB(oskb).portid : 0; + u32 seq = nlh ? nlh->nlmsg_seq : 0; + struct net *net = oskb ? sock_net(oskb->sk) : &init_net; + bool report; + int err; + + report = nlh ? nlmsg_report(nlh) : false; + if (!report && !nfnetlink_has_listeners(net, NFNLGRP_NFTABLES)) + return 0; + + err = -ENOBUFS; + skb = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL); + if (skb == NULL) + goto err; + + err = nf_tables_fill_table_info(skb, portid, seq, event, 0, + family, table); + if (err < 0) { + kfree_skb(skb); + goto err; + } + + err = nfnetlink_send(skb, net, portid, NFNLGRP_NFTABLES, report, + GFP_KERNEL); +err: + if (err < 0) + nfnetlink_set_err(net, portid, NFNLGRP_NFTABLES, err); + return err; +} + +static int nf_tables_dump_tables(struct sk_buff *skb, + struct netlink_callback *cb) +{ + const struct nfgenmsg *nfmsg = nlmsg_data(cb->nlh); + const struct nft_af_info *afi; + const struct nft_table *table; + unsigned int idx = 0, s_idx = cb->args[0]; + struct net *net = sock_net(skb->sk); + int family = nfmsg->nfgen_family; + + list_for_each_entry(afi, &net->nft.af_info, list) { + if (family != NFPROTO_UNSPEC && family != afi->family) + continue; + + list_for_each_entry(table, &afi->tables, list) { + if (idx < s_idx) + goto cont; + if (idx > s_idx) + memset(&cb->args[1], 0, + sizeof(cb->args) - sizeof(cb->args[0])); + if (nf_tables_fill_table_info(skb, + NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, + NFT_MSG_NEWTABLE, + NLM_F_MULTI, + afi->family, table) < 0) + goto done; +cont: + idx++; + } + } +done: + cb->args[0] = idx; + return skb->len; +} + +static int nf_tables_gettable(struct sock *nlsk, struct sk_buff *skb, + const struct nlmsghdr *nlh, + const struct nlattr * const nla[]) +{ + const struct nfgenmsg *nfmsg = nlmsg_data(nlh); + const struct nft_af_info *afi; + const struct nft_table *table; + struct sk_buff *skb2; + struct net *net = sock_net(skb->sk); + int family = nfmsg->nfgen_family; + int err; + + if (nlh->nlmsg_flags & NLM_F_DUMP) { + struct netlink_dump_control c = { + .dump = nf_tables_dump_tables, + }; + return netlink_dump_start(nlsk, skb, nlh, &c); + } + + afi = nf_tables_afinfo_lookup(net, family, false); + if (IS_ERR(afi)) + return PTR_ERR(afi); + + table = nf_tables_table_lookup(afi, nla[NFTA_TABLE_NAME]); + if (IS_ERR(table)) + return PTR_ERR(table); + + skb2 = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL); + if (!skb2) + return -ENOMEM; + + err = nf_tables_fill_table_info(skb2, NETLINK_CB(skb).portid, + nlh->nlmsg_seq, NFT_MSG_NEWTABLE, 0, + family, table); + if (err < 0) + goto err; + + return nlmsg_unicast(nlsk, skb2, NETLINK_CB(skb).portid); + +err: + kfree_skb(skb2); + return err; +} + +static int nf_tables_table_enable(struct nft_table *table) +{ + struct nft_chain *chain; + int err, i = 0; + + list_for_each_entry(chain, &table->chains, list) { + err = nf_register_hook(&nft_base_chain(chain)->ops); + if (err < 0) + goto err; + + i++; + } + return 0; +err: + list_for_each_entry(chain, &table->chains, list) { + if (i-- <= 0) + break; + + nf_unregister_hook(&nft_base_chain(chain)->ops); + } + return err; +} + +static int nf_tables_table_disable(struct nft_table *table) +{ + struct nft_chain *chain; + + list_for_each_entry(chain, &table->chains, list) + nf_unregister_hook(&nft_base_chain(chain)->ops); + + return 0; +} + +static int nf_tables_updtable(struct sock *nlsk, struct sk_buff *skb, + const struct nlmsghdr *nlh, + const struct nlattr * const nla[], + struct nft_af_info *afi, struct nft_table *table) +{ + const struct nfgenmsg *nfmsg = nlmsg_data(nlh); + int family = nfmsg->nfgen_family, ret = 0; + + if (nla[NFTA_TABLE_FLAGS]) { + __be32 flags; + + flags = ntohl(nla_get_be32(nla[NFTA_TABLE_FLAGS])); + if (flags & ~NFT_TABLE_F_DORMANT) + return -EINVAL; + + if ((flags & NFT_TABLE_F_DORMANT) && + !(table->flags & NFT_TABLE_F_DORMANT)) { + ret = nf_tables_table_disable(table); + if (ret >= 0) + table->flags |= NFT_TABLE_F_DORMANT; + } else if (!(flags & NFT_TABLE_F_DORMANT) && + table->flags & NFT_TABLE_F_DORMANT) { + ret = nf_tables_table_enable(table); + if (ret >= 0) + table->flags &= ~NFT_TABLE_F_DORMANT; + } + if (ret < 0) + goto err; + } + + nf_tables_table_notify(skb, nlh, table, NFT_MSG_NEWTABLE, family); +err: + return ret; +} + +static int nf_tables_newtable(struct sock *nlsk, struct sk_buff *skb, + const struct nlmsghdr *nlh, + const struct nlattr * const nla[]) +{ + const struct nfgenmsg *nfmsg = nlmsg_data(nlh); + const struct nlattr *name; + struct nft_af_info *afi; + struct nft_table *table; + struct net *net = sock_net(skb->sk); + int family = nfmsg->nfgen_family; + + afi = nf_tables_afinfo_lookup(net, family, true); + if (IS_ERR(afi)) + return PTR_ERR(afi); + + name = nla[NFTA_TABLE_NAME]; + table = nf_tables_table_lookup(afi, name); + if (IS_ERR(table)) { + if (PTR_ERR(table) != -ENOENT) + return PTR_ERR(table); + table = NULL; + } + + if (table != NULL) { + if (nlh->nlmsg_flags & NLM_F_EXCL) + return -EEXIST; + if (nlh->nlmsg_flags & NLM_F_REPLACE) + return -EOPNOTSUPP; + return nf_tables_updtable(nlsk, skb, nlh, nla, afi, table); + } + + table = kzalloc(sizeof(*table) + nla_len(name), GFP_KERNEL); + if (table == NULL) + return -ENOMEM; + + nla_strlcpy(table->name, name, nla_len(name)); + INIT_LIST_HEAD(&table->chains); + INIT_LIST_HEAD(&table->sets); + + if (nla[NFTA_TABLE_FLAGS]) { + __be32 flags; + + flags = ntohl(nla_get_be32(nla[NFTA_TABLE_FLAGS])); + if (flags & ~NFT_TABLE_F_DORMANT) { + kfree(table); + return -EINVAL; + } + + table->flags |= flags; + } + + list_add_tail(&table->list, &afi->tables); + nf_tables_table_notify(skb, nlh, table, NFT_MSG_NEWTABLE, family); + return 0; +} + +static int nf_tables_deltable(struct sock *nlsk, struct sk_buff *skb, + const struct nlmsghdr *nlh, + const struct nlattr * const nla[]) +{ + const struct nfgenmsg *nfmsg = nlmsg_data(nlh); + struct nft_af_info *afi; + struct nft_table *table; + struct net *net = sock_net(skb->sk); + int family = nfmsg->nfgen_family; + + afi = nf_tables_afinfo_lookup(net, family, false); + if (IS_ERR(afi)) + return PTR_ERR(afi); + + table = nf_tables_table_lookup(afi, nla[NFTA_TABLE_NAME]); + if (IS_ERR(table)) + return PTR_ERR(table); + + if (table->use) + return -EBUSY; + + list_del(&table->list); + nf_tables_table_notify(skb, nlh, table, NFT_MSG_DELTABLE, family); + kfree(table); + return 0; +} + +int nft_register_chain_type(struct nf_chain_type *ctype) +{ + int err = 0; + + nfnl_lock(NFNL_SUBSYS_NFTABLES); + if (chain_type[ctype->family][ctype->type] != NULL) { + err = -EBUSY; + goto out; + } + + if (!try_module_get(ctype->me)) + goto out; + + chain_type[ctype->family][ctype->type] = ctype; +out: + nfnl_unlock(NFNL_SUBSYS_NFTABLES); + return err; +} +EXPORT_SYMBOL_GPL(nft_register_chain_type); + +void nft_unregister_chain_type(struct nf_chain_type *ctype) +{ + nfnl_lock(NFNL_SUBSYS_NFTABLES); + chain_type[ctype->family][ctype->type] = NULL; + module_put(ctype->me); + nfnl_unlock(NFNL_SUBSYS_NFTABLES); +} +EXPORT_SYMBOL_GPL(nft_unregister_chain_type); + +/* + * Chains + */ + +static struct nft_chain * +nf_tables_chain_lookup_byhandle(const struct nft_table *table, u64 handle) +{ + struct nft_chain *chain; + + list_for_each_entry(chain, &table->chains, list) { + if (chain->handle == handle) + return chain; + } + + return ERR_PTR(-ENOENT); +} + +static struct nft_chain *nf_tables_chain_lookup(const struct nft_table *table, + const struct nlattr *nla) +{ + struct nft_chain *chain; + + if (nla == NULL) + return ERR_PTR(-EINVAL); + + list_for_each_entry(chain, &table->chains, list) { + if (!nla_strcmp(nla, chain->name)) + return chain; + } + + return ERR_PTR(-ENOENT); +} + +static const struct nla_policy nft_chain_policy[NFTA_CHAIN_MAX + 1] = { + [NFTA_CHAIN_TABLE] = { .type = NLA_STRING }, + [NFTA_CHAIN_HANDLE] = { .type = NLA_U64 }, + [NFTA_CHAIN_NAME] = { .type = NLA_STRING, + .len = NFT_CHAIN_MAXNAMELEN - 1 }, + [NFTA_CHAIN_HOOK] = { .type = NLA_NESTED }, + [NFTA_CHAIN_POLICY] = { .type = NLA_U32 }, + [NFTA_CHAIN_TYPE] = { .type = NLA_NUL_STRING }, + [NFTA_CHAIN_COUNTERS] = { .type = NLA_NESTED }, +}; + +static const struct nla_policy nft_hook_policy[NFTA_HOOK_MAX + 1] = { + [NFTA_HOOK_HOOKNUM] = { .type = NLA_U32 }, + [NFTA_HOOK_PRIORITY] = { .type = NLA_U32 }, +}; + +static int nft_dump_stats(struct sk_buff *skb, struct nft_stats __percpu *stats) +{ + struct nft_stats *cpu_stats, total; + struct nlattr *nest; + int cpu; + + memset(&total, 0, sizeof(total)); + for_each_possible_cpu(cpu) { + cpu_stats = per_cpu_ptr(stats, cpu); + total.pkts += cpu_stats->pkts; + total.bytes += cpu_stats->bytes; + } + nest = nla_nest_start(skb, NFTA_CHAIN_COUNTERS); + if (nest == NULL) + goto nla_put_failure; + + if (nla_put_be64(skb, NFTA_COUNTER_PACKETS, cpu_to_be64(total.pkts)) || + nla_put_be64(skb, NFTA_COUNTER_BYTES, cpu_to_be64(total.bytes))) + goto nla_put_failure; + + nla_nest_end(skb, nest); + return 0; + +nla_put_failure: + return -ENOSPC; +} + +static int nf_tables_fill_chain_info(struct sk_buff *skb, u32 portid, u32 seq, + int event, u32 flags, int family, + const struct nft_table *table, + const struct nft_chain *chain) +{ + struct nlmsghdr *nlh; + struct nfgenmsg *nfmsg; + + event |= NFNL_SUBSYS_NFTABLES << 8; + nlh = nlmsg_put(skb, portid, seq, event, sizeof(struct nfgenmsg), flags); + if (nlh == NULL) + goto nla_put_failure; + + nfmsg = nlmsg_data(nlh); + nfmsg->nfgen_family = family; + nfmsg->version = NFNETLINK_V0; + nfmsg->res_id = 0; + + if (nla_put_string(skb, NFTA_CHAIN_TABLE, table->name)) + goto nla_put_failure; + if (nla_put_be64(skb, NFTA_CHAIN_HANDLE, cpu_to_be64(chain->handle))) + goto nla_put_failure; + if (nla_put_string(skb, NFTA_CHAIN_NAME, chain->name)) + goto nla_put_failure; + + if (chain->flags & NFT_BASE_CHAIN) { + const struct nft_base_chain *basechain = nft_base_chain(chain); + const struct nf_hook_ops *ops = &basechain->ops; + struct nlattr *nest; + + nest = nla_nest_start(skb, NFTA_CHAIN_HOOK); + if (nest == NULL) + goto nla_put_failure; + if (nla_put_be32(skb, NFTA_HOOK_HOOKNUM, htonl(ops->hooknum))) + goto nla_put_failure; + if (nla_put_be32(skb, NFTA_HOOK_PRIORITY, htonl(ops->priority))) + goto nla_put_failure; + nla_nest_end(skb, nest); + + if (nla_put_be32(skb, NFTA_CHAIN_POLICY, + htonl(basechain->policy))) + goto nla_put_failure; + + if (nla_put_string(skb, NFTA_CHAIN_TYPE, + chain_type[ops->pf][nft_base_chain(chain)->type]->name)) + goto nla_put_failure; + + if (nft_dump_stats(skb, nft_base_chain(chain)->stats)) + goto nla_put_failure; + } + + if (nla_put_be32(skb, NFTA_CHAIN_USE, htonl(chain->use))) + goto nla_put_failure; + + return nlmsg_end(skb, nlh); + +nla_put_failure: + nlmsg_trim(skb, nlh); + return -1; +} + +static int nf_tables_chain_notify(const struct sk_buff *oskb, + const struct nlmsghdr *nlh, + const struct nft_table *table, + const struct nft_chain *chain, + int event, int family) +{ + struct sk_buff *skb; + u32 portid = oskb ? NETLINK_CB(oskb).portid : 0; + struct net *net = oskb ? sock_net(oskb->sk) : &init_net; + u32 seq = nlh ? nlh->nlmsg_seq : 0; + bool report; + int err; + + report = nlh ? nlmsg_report(nlh) : false; + if (!report && !nfnetlink_has_listeners(net, NFNLGRP_NFTABLES)) + return 0; + + err = -ENOBUFS; + skb = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL); + if (skb == NULL) + goto err; + + err = nf_tables_fill_chain_info(skb, portid, seq, event, 0, family, + table, chain); + if (err < 0) { + kfree_skb(skb); + goto err; + } + + err = nfnetlink_send(skb, net, portid, NFNLGRP_NFTABLES, report, + GFP_KERNEL); +err: + if (err < 0) + nfnetlink_set_err(net, portid, NFNLGRP_NFTABLES, err); + return err; +} + +static int nf_tables_dump_chains(struct sk_buff *skb, + struct netlink_callback *cb) +{ + const struct nfgenmsg *nfmsg = nlmsg_data(cb->nlh); + const struct nft_af_info *afi; + const struct nft_table *table; + const struct nft_chain *chain; + unsigned int idx = 0, s_idx = cb->args[0]; + struct net *net = sock_net(skb->sk); + int family = nfmsg->nfgen_family; + + list_for_each_entry(afi, &net->nft.af_info, list) { + if (family != NFPROTO_UNSPEC && family != afi->family) + continue; + + list_for_each_entry(table, &afi->tables, list) { + list_for_each_entry(chain, &table->chains, list) { + if (idx < s_idx) + goto cont; + if (idx > s_idx) + memset(&cb->args[1], 0, + sizeof(cb->args) - sizeof(cb->args[0])); + if (nf_tables_fill_chain_info(skb, NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, + NFT_MSG_NEWCHAIN, + NLM_F_MULTI, + afi->family, table, chain) < 0) + goto done; +cont: + idx++; + } + } + } +done: + cb->args[0] = idx; + return skb->len; +} + + +static int nf_tables_getchain(struct sock *nlsk, struct sk_buff *skb, + const struct nlmsghdr *nlh, + const struct nlattr * const nla[]) +{ + const struct nfgenmsg *nfmsg = nlmsg_data(nlh); + const struct nft_af_info *afi; + const struct nft_table *table; + const struct nft_chain *chain; + struct sk_buff *skb2; + struct net *net = sock_net(skb->sk); + int family = nfmsg->nfgen_family; + int err; + + if (nlh->nlmsg_flags & NLM_F_DUMP) { + struct netlink_dump_control c = { + .dump = nf_tables_dump_chains, + }; + return netlink_dump_start(nlsk, skb, nlh, &c); + } + + afi = nf_tables_afinfo_lookup(net, family, false); + if (IS_ERR(afi)) + return PTR_ERR(afi); + + table = nf_tables_table_lookup(afi, nla[NFTA_CHAIN_TABLE]); + if (IS_ERR(table)) + return PTR_ERR(table); + + chain = nf_tables_chain_lookup(table, nla[NFTA_CHAIN_NAME]); + if (IS_ERR(chain)) + return PTR_ERR(chain); + + skb2 = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL); + if (!skb2) + return -ENOMEM; + + err = nf_tables_fill_chain_info(skb2, NETLINK_CB(skb).portid, + nlh->nlmsg_seq, NFT_MSG_NEWCHAIN, 0, + family, table, chain); + if (err < 0) + goto err; + + return nlmsg_unicast(nlsk, skb2, NETLINK_CB(skb).portid); + +err: + kfree_skb(skb2); + return err; +} + +static int +nf_tables_chain_policy(struct nft_base_chain *chain, const struct nlattr *attr) +{ + switch (ntohl(nla_get_be32(attr))) { + case NF_DROP: + chain->policy = NF_DROP; + break; + case NF_ACCEPT: + chain->policy = NF_ACCEPT; + break; + default: + return -EINVAL; + } + return 0; +} + +static const struct nla_policy nft_counter_policy[NFTA_COUNTER_MAX + 1] = { + [NFTA_COUNTER_PACKETS] = { .type = NLA_U64 }, + [NFTA_COUNTER_BYTES] = { .type = NLA_U64 }, +}; + +static int +nf_tables_counters(struct nft_base_chain *chain, const struct nlattr *attr) +{ + struct nlattr *tb[NFTA_COUNTER_MAX+1]; + struct nft_stats __percpu *newstats; + struct nft_stats *stats; + int err; + + err = nla_parse_nested(tb, NFTA_COUNTER_MAX, attr, nft_counter_policy); + if (err < 0) + return err; + + if (!tb[NFTA_COUNTER_BYTES] || !tb[NFTA_COUNTER_PACKETS]) + return -EINVAL; + + newstats = alloc_percpu(struct nft_stats); + if (newstats == NULL) + return -ENOMEM; + + /* Restore old counters on this cpu, no problem. Per-cpu statistics + * are not exposed to userspace. + */ + stats = this_cpu_ptr(newstats); + stats->bytes = be64_to_cpu(nla_get_be64(tb[NFTA_COUNTER_BYTES])); + stats->pkts = be64_to_cpu(nla_get_be64(tb[NFTA_COUNTER_PACKETS])); + + if (chain->stats) { + /* nfnl_lock is held, add some nfnl function for this, later */ + struct nft_stats __percpu *oldstats = + rcu_dereference_protected(chain->stats, 1); + + rcu_assign_pointer(chain->stats, newstats); + synchronize_rcu(); + free_percpu(oldstats); + } else + rcu_assign_pointer(chain->stats, newstats); + + return 0; +} + +static int nf_tables_newchain(struct sock *nlsk, struct sk_buff *skb, + const struct nlmsghdr *nlh, + const struct nlattr * const nla[]) +{ + const struct nfgenmsg *nfmsg = nlmsg_data(nlh); + const struct nlattr * uninitialized_var(name); + const struct nft_af_info *afi; + struct nft_table *table; + struct nft_chain *chain; + struct nft_base_chain *basechain = NULL; + struct nlattr *ha[NFTA_HOOK_MAX + 1]; + struct net *net = sock_net(skb->sk); + int family = nfmsg->nfgen_family; + u64 handle = 0; + int err; + bool create; + + create = nlh->nlmsg_flags & NLM_F_CREATE ? true : false; + + afi = nf_tables_afinfo_lookup(net, family, true); + if (IS_ERR(afi)) + return PTR_ERR(afi); + + table = nf_tables_table_lookup(afi, nla[NFTA_CHAIN_TABLE]); + if (IS_ERR(table)) + return PTR_ERR(table); + + if (table->use == UINT_MAX) + return -EOVERFLOW; + + chain = NULL; + name = nla[NFTA_CHAIN_NAME]; + + if (nla[NFTA_CHAIN_HANDLE]) { + handle = be64_to_cpu(nla_get_be64(nla[NFTA_CHAIN_HANDLE])); + chain = nf_tables_chain_lookup_byhandle(table, handle); + if (IS_ERR(chain)) + return PTR_ERR(chain); + } else { + chain = nf_tables_chain_lookup(table, name); + if (IS_ERR(chain)) { + if (PTR_ERR(chain) != -ENOENT) + return PTR_ERR(chain); + chain = NULL; + } + } + + if (chain != NULL) { + if (nlh->nlmsg_flags & NLM_F_EXCL) + return -EEXIST; + if (nlh->nlmsg_flags & NLM_F_REPLACE) + return -EOPNOTSUPP; + + if (nla[NFTA_CHAIN_HANDLE] && name && + !IS_ERR(nf_tables_chain_lookup(table, nla[NFTA_CHAIN_NAME]))) + return -EEXIST; + + if (nla[NFTA_CHAIN_POLICY]) { + if (!(chain->flags & NFT_BASE_CHAIN)) + return -EOPNOTSUPP; + + err = nf_tables_chain_policy(nft_base_chain(chain), + nla[NFTA_CHAIN_POLICY]); + if (err < 0) + return err; + } + + if (nla[NFTA_CHAIN_COUNTERS]) { + if (!(chain->flags & NFT_BASE_CHAIN)) + return -EOPNOTSUPP; + + err = nf_tables_counters(nft_base_chain(chain), + nla[NFTA_CHAIN_COUNTERS]); + if (err < 0) + return err; + } + + if (nla[NFTA_CHAIN_HANDLE] && name) + nla_strlcpy(chain->name, name, NFT_CHAIN_MAXNAMELEN); + + goto notify; + } + + if (nla[NFTA_CHAIN_HOOK]) { + struct nf_hook_ops *ops; + nf_hookfn *hookfn; + u32 hooknum; + int type = NFT_CHAIN_T_DEFAULT; + + if (nla[NFTA_CHAIN_TYPE]) { + type = nf_tables_chain_type_lookup(afi, + nla[NFTA_CHAIN_TYPE], + create); + if (type < 0) + return -ENOENT; + } + + err = nla_parse_nested(ha, NFTA_HOOK_MAX, nla[NFTA_CHAIN_HOOK], + nft_hook_policy); + if (err < 0) + return err; + if (ha[NFTA_HOOK_HOOKNUM] == NULL || + ha[NFTA_HOOK_PRIORITY] == NULL) + return -EINVAL; + + hooknum = ntohl(nla_get_be32(ha[NFTA_HOOK_HOOKNUM])); + if (hooknum >= afi->nhooks) + return -EINVAL; + + hookfn = chain_type[family][type]->fn[hooknum]; + if (hookfn == NULL) + return -EOPNOTSUPP; + + basechain = kzalloc(sizeof(*basechain), GFP_KERNEL); + if (basechain == NULL) + return -ENOMEM; + + basechain->type = type; + chain = &basechain->chain; + + ops = &basechain->ops; + ops->pf = family; + ops->owner = afi->owner; + ops->hooknum = ntohl(nla_get_be32(ha[NFTA_HOOK_HOOKNUM])); + ops->priority = ntohl(nla_get_be32(ha[NFTA_HOOK_PRIORITY])); + ops->priv = chain; + ops->hook = hookfn; + if (afi->hooks[ops->hooknum]) + ops->hook = afi->hooks[ops->hooknum]; + + chain->flags |= NFT_BASE_CHAIN; + + if (nla[NFTA_CHAIN_POLICY]) { + err = nf_tables_chain_policy(basechain, + nla[NFTA_CHAIN_POLICY]); + if (err < 0) { + free_percpu(basechain->stats); + kfree(basechain); + return err; + } + } else + basechain->policy = NF_ACCEPT; + + if (nla[NFTA_CHAIN_COUNTERS]) { + err = nf_tables_counters(basechain, + nla[NFTA_CHAIN_COUNTERS]); + if (err < 0) { + free_percpu(basechain->stats); + kfree(basechain); + return err; + } + } else { + struct nft_stats __percpu *newstats; + + newstats = alloc_percpu(struct nft_stats); + if (newstats == NULL) + return -ENOMEM; + + rcu_assign_pointer(nft_base_chain(chain)->stats, + newstats); + } + } else { + chain = kzalloc(sizeof(*chain), GFP_KERNEL); + if (chain == NULL) + return -ENOMEM; + } + + INIT_LIST_HEAD(&chain->rules); + chain->handle = nf_tables_alloc_handle(table); + chain->net = net; + chain->table = table; + nla_strlcpy(chain->name, name, NFT_CHAIN_MAXNAMELEN); + + if (!(table->flags & NFT_TABLE_F_DORMANT) && + chain->flags & NFT_BASE_CHAIN) { + err = nf_register_hook(&nft_base_chain(chain)->ops); + if (err < 0) { + free_percpu(basechain->stats); + kfree(basechain); + return err; + } + } + list_add_tail(&chain->list, &table->chains); + table->use++; +notify: + nf_tables_chain_notify(skb, nlh, table, chain, NFT_MSG_NEWCHAIN, + family); + return 0; +} + +static void nf_tables_rcu_chain_destroy(struct rcu_head *head) +{ + struct nft_chain *chain = container_of(head, struct nft_chain, rcu_head); + + BUG_ON(chain->use > 0); + + if (chain->flags & NFT_BASE_CHAIN) { + free_percpu(nft_base_chain(chain)->stats); + kfree(nft_base_chain(chain)); + } else + kfree(chain); +} + +static int nf_tables_delchain(struct sock *nlsk, struct sk_buff *skb, + const struct nlmsghdr *nlh, + const struct nlattr * const nla[]) +{ + const struct nfgenmsg *nfmsg = nlmsg_data(nlh); + const struct nft_af_info *afi; + struct nft_table *table; + struct nft_chain *chain; + struct net *net = sock_net(skb->sk); + int family = nfmsg->nfgen_family; + + afi = nf_tables_afinfo_lookup(net, family, false); + if (IS_ERR(afi)) + return PTR_ERR(afi); + + table = nf_tables_table_lookup(afi, nla[NFTA_CHAIN_TABLE]); + if (IS_ERR(table)) + return PTR_ERR(table); + + chain = nf_tables_chain_lookup(table, nla[NFTA_CHAIN_NAME]); + if (IS_ERR(chain)) + return PTR_ERR(chain); + + if (!list_empty(&chain->rules)) + return -EBUSY; + + list_del(&chain->list); + table->use--; + + if (!(table->flags & NFT_TABLE_F_DORMANT) && + chain->flags & NFT_BASE_CHAIN) + nf_unregister_hook(&nft_base_chain(chain)->ops); + + nf_tables_chain_notify(skb, nlh, table, chain, NFT_MSG_DELCHAIN, + family); + + /* Make sure all rule references are gone before this is released */ + call_rcu(&chain->rcu_head, nf_tables_rcu_chain_destroy); + return 0; +} + +static void nft_ctx_init(struct nft_ctx *ctx, + const struct sk_buff *skb, + const struct nlmsghdr *nlh, + const struct nft_af_info *afi, + const struct nft_table *table, + const struct nft_chain *chain, + const struct nlattr * const *nla) +{ + ctx->net = sock_net(skb->sk); + ctx->skb = skb; + ctx->nlh = nlh; + ctx->afi = afi; + ctx->table = table; + ctx->chain = chain; + ctx->nla = nla; +} + +/* + * Expressions + */ + +/** + * nft_register_expr - register nf_tables expr type + * @ops: expr type + * + * Registers the expr type for use with nf_tables. Returns zero on + * success or a negative errno code otherwise. + */ +int nft_register_expr(struct nft_expr_type *type) +{ + nfnl_lock(NFNL_SUBSYS_NFTABLES); + list_add_tail(&type->list, &nf_tables_expressions); + nfnl_unlock(NFNL_SUBSYS_NFTABLES); + return 0; +} +EXPORT_SYMBOL_GPL(nft_register_expr); + +/** + * nft_unregister_expr - unregister nf_tables expr type + * @ops: expr type + * + * Unregisters the expr typefor use with nf_tables. + */ +void nft_unregister_expr(struct nft_expr_type *type) +{ + nfnl_lock(NFNL_SUBSYS_NFTABLES); + list_del(&type->list); + nfnl_unlock(NFNL_SUBSYS_NFTABLES); +} +EXPORT_SYMBOL_GPL(nft_unregister_expr); + +static const struct nft_expr_type *__nft_expr_type_get(struct nlattr *nla) +{ + const struct nft_expr_type *type; + + list_for_each_entry(type, &nf_tables_expressions, list) { + if (!nla_strcmp(nla, type->name)) + return type; + } + return NULL; +} + +static const struct nft_expr_type *nft_expr_type_get(struct nlattr *nla) +{ + const struct nft_expr_type *type; + + if (nla == NULL) + return ERR_PTR(-EINVAL); + + type = __nft_expr_type_get(nla); + if (type != NULL && try_module_get(type->owner)) + return type; + +#ifdef CONFIG_MODULES + if (type == NULL) { + nfnl_unlock(NFNL_SUBSYS_NFTABLES); + request_module("nft-expr-%.*s", + nla_len(nla), (char *)nla_data(nla)); + nfnl_lock(NFNL_SUBSYS_NFTABLES); + if (__nft_expr_type_get(nla)) + return ERR_PTR(-EAGAIN); + } +#endif + return ERR_PTR(-ENOENT); +} + +static const struct nla_policy nft_expr_policy[NFTA_EXPR_MAX + 1] = { + [NFTA_EXPR_NAME] = { .type = NLA_STRING }, + [NFTA_EXPR_DATA] = { .type = NLA_NESTED }, +}; + +static int nf_tables_fill_expr_info(struct sk_buff *skb, + const struct nft_expr *expr) +{ + if (nla_put_string(skb, NFTA_EXPR_NAME, expr->ops->type->name)) + goto nla_put_failure; + + if (expr->ops->dump) { + struct nlattr *data = nla_nest_start(skb, NFTA_EXPR_DATA); + if (data == NULL) + goto nla_put_failure; + if (expr->ops->dump(skb, expr) < 0) + goto nla_put_failure; + nla_nest_end(skb, data); + } + + return skb->len; + +nla_put_failure: + return -1; +}; + +struct nft_expr_info { + const struct nft_expr_ops *ops; + struct nlattr *tb[NFT_EXPR_MAXATTR + 1]; +}; + +static int nf_tables_expr_parse(const struct nft_ctx *ctx, + const struct nlattr *nla, + struct nft_expr_info *info) +{ + const struct nft_expr_type *type; + const struct nft_expr_ops *ops; + struct nlattr *tb[NFTA_EXPR_MAX + 1]; + int err; + + err = nla_parse_nested(tb, NFTA_EXPR_MAX, nla, nft_expr_policy); + if (err < 0) + return err; + + type = nft_expr_type_get(tb[NFTA_EXPR_NAME]); + if (IS_ERR(type)) + return PTR_ERR(type); + + if (tb[NFTA_EXPR_DATA]) { + err = nla_parse_nested(info->tb, type->maxattr, + tb[NFTA_EXPR_DATA], type->policy); + if (err < 0) + goto err1; + } else + memset(info->tb, 0, sizeof(info->tb[0]) * (type->maxattr + 1)); + + if (type->select_ops != NULL) { + ops = type->select_ops(ctx, + (const struct nlattr * const *)info->tb); + if (IS_ERR(ops)) { + err = PTR_ERR(ops); + goto err1; + } + } else + ops = type->ops; + + info->ops = ops; + return 0; + +err1: + module_put(type->owner); + return err; +} + +static int nf_tables_newexpr(const struct nft_ctx *ctx, + const struct nft_expr_info *info, + struct nft_expr *expr) +{ + const struct nft_expr_ops *ops = info->ops; + int err; + + expr->ops = ops; + if (ops->init) { + err = ops->init(ctx, expr, (const struct nlattr **)info->tb); + if (err < 0) + goto err1; + } + + return 0; + +err1: + expr->ops = NULL; + return err; +} + +static void nf_tables_expr_destroy(struct nft_expr *expr) +{ + if (expr->ops->destroy) + expr->ops->destroy(expr); + module_put(expr->ops->type->owner); +} + +/* + * Rules + */ + +static struct nft_rule *__nf_tables_rule_lookup(const struct nft_chain *chain, + u64 handle) +{ + struct nft_rule *rule; + + // FIXME: this sucks + list_for_each_entry(rule, &chain->rules, list) { + if (handle == rule->handle) + return rule; + } + + return ERR_PTR(-ENOENT); +} + +static struct nft_rule *nf_tables_rule_lookup(const struct nft_chain *chain, + const struct nlattr *nla) +{ + if (nla == NULL) + return ERR_PTR(-EINVAL); + + return __nf_tables_rule_lookup(chain, be64_to_cpu(nla_get_be64(nla))); +} + +static const struct nla_policy nft_rule_policy[NFTA_RULE_MAX + 1] = { + [NFTA_RULE_TABLE] = { .type = NLA_STRING }, + [NFTA_RULE_CHAIN] = { .type = NLA_STRING, + .len = NFT_CHAIN_MAXNAMELEN - 1 }, + [NFTA_RULE_HANDLE] = { .type = NLA_U64 }, + [NFTA_RULE_EXPRESSIONS] = { .type = NLA_NESTED }, + [NFTA_RULE_COMPAT] = { .type = NLA_NESTED }, + [NFTA_RULE_POSITION] = { .type = NLA_U64 }, +}; + +static int nf_tables_fill_rule_info(struct sk_buff *skb, u32 portid, u32 seq, + int event, u32 flags, int family, + const struct nft_table *table, + const struct nft_chain *chain, + const struct nft_rule *rule) +{ + struct nlmsghdr *nlh; + struct nfgenmsg *nfmsg; + const struct nft_expr *expr, *next; + struct nlattr *list; + const struct nft_rule *prule; + int type = event | NFNL_SUBSYS_NFTABLES << 8; + + nlh = nlmsg_put(skb, portid, seq, type, sizeof(struct nfgenmsg), + flags); + if (nlh == NULL) + goto nla_put_failure; + + nfmsg = nlmsg_data(nlh); + nfmsg->nfgen_family = family; + nfmsg->version = NFNETLINK_V0; + nfmsg->res_id = 0; + + if (nla_put_string(skb, NFTA_RULE_TABLE, table->name)) + goto nla_put_failure; + if (nla_put_string(skb, NFTA_RULE_CHAIN, chain->name)) + goto nla_put_failure; + if (nla_put_be64(skb, NFTA_RULE_HANDLE, cpu_to_be64(rule->handle))) + goto nla_put_failure; + + if ((event != NFT_MSG_DELRULE) && (rule->list.prev != &chain->rules)) { + prule = list_entry(rule->list.prev, struct nft_rule, list); + if (nla_put_be64(skb, NFTA_RULE_POSITION, + cpu_to_be64(prule->handle))) + goto nla_put_failure; + } + + list = nla_nest_start(skb, NFTA_RULE_EXPRESSIONS); + if (list == NULL) + goto nla_put_failure; + nft_rule_for_each_expr(expr, next, rule) { + struct nlattr *elem = nla_nest_start(skb, NFTA_LIST_ELEM); + if (elem == NULL) + goto nla_put_failure; + if (nf_tables_fill_expr_info(skb, expr) < 0) + goto nla_put_failure; + nla_nest_end(skb, elem); + } + nla_nest_end(skb, list); + + return nlmsg_end(skb, nlh); + +nla_put_failure: + nlmsg_trim(skb, nlh); + return -1; +} + +static int nf_tables_rule_notify(const struct sk_buff *oskb, + const struct nlmsghdr *nlh, + const struct nft_table *table, + const struct nft_chain *chain, + const struct nft_rule *rule, + int event, u32 flags, int family) +{ + struct sk_buff *skb; + u32 portid = NETLINK_CB(oskb).portid; + struct net *net = oskb ? sock_net(oskb->sk) : &init_net; + u32 seq = nlh->nlmsg_seq; + bool report; + int err; + + report = nlmsg_report(nlh); + if (!report && !nfnetlink_has_listeners(net, NFNLGRP_NFTABLES)) + return 0; + + err = -ENOBUFS; + skb = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL); + if (skb == NULL) + goto err; + + err = nf_tables_fill_rule_info(skb, portid, seq, event, flags, + family, table, chain, rule); + if (err < 0) { + kfree_skb(skb); + goto err; + } + + err = nfnetlink_send(skb, net, portid, NFNLGRP_NFTABLES, report, + GFP_KERNEL); +err: + if (err < 0) + nfnetlink_set_err(net, portid, NFNLGRP_NFTABLES, err); + return err; +} + +static inline bool +nft_rule_is_active(struct net *net, const struct nft_rule *rule) +{ + return (rule->genmask & (1 << net->nft.gencursor)) == 0; +} + +static inline int gencursor_next(struct net *net) +{ + return net->nft.gencursor+1 == 1 ? 1 : 0; +} + +static inline int +nft_rule_is_active_next(struct net *net, const struct nft_rule *rule) +{ + return (rule->genmask & (1 << gencursor_next(net))) == 0; +} + +static inline void +nft_rule_activate_next(struct net *net, struct nft_rule *rule) +{ + /* Now inactive, will be active in the future */ + rule->genmask = (1 << net->nft.gencursor); +} + +static inline void +nft_rule_disactivate_next(struct net *net, struct nft_rule *rule) +{ + rule->genmask = (1 << gencursor_next(net)); +} + +static inline void nft_rule_clear(struct net *net, struct nft_rule *rule) +{ + rule->genmask = 0; +} + +static int nf_tables_dump_rules(struct sk_buff *skb, + struct netlink_callback *cb) +{ + const struct nfgenmsg *nfmsg = nlmsg_data(cb->nlh); + const struct nft_af_info *afi; + const struct nft_table *table; + const struct nft_chain *chain; + const struct nft_rule *rule; + unsigned int idx = 0, s_idx = cb->args[0]; + struct net *net = sock_net(skb->sk); + int family = nfmsg->nfgen_family; + u8 genctr = ACCESS_ONCE(net->nft.genctr); + u8 gencursor = ACCESS_ONCE(net->nft.gencursor); + + list_for_each_entry(afi, &net->nft.af_info, list) { + if (family != NFPROTO_UNSPEC && family != afi->family) + continue; + + list_for_each_entry(table, &afi->tables, list) { + list_for_each_entry(chain, &table->chains, list) { + list_for_each_entry(rule, &chain->rules, list) { + if (!nft_rule_is_active(net, rule)) + goto cont; + if (idx < s_idx) + goto cont; + if (idx > s_idx) + memset(&cb->args[1], 0, + sizeof(cb->args) - sizeof(cb->args[0])); + if (nf_tables_fill_rule_info(skb, NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, + NFT_MSG_NEWRULE, + NLM_F_MULTI | NLM_F_APPEND, + afi->family, table, chain, rule) < 0) + goto done; +cont: + idx++; + } + } + } + } +done: + /* Invalidate this dump, a transition to the new generation happened */ + if (gencursor != net->nft.gencursor || genctr != net->nft.genctr) + return -EBUSY; + + cb->args[0] = idx; + return skb->len; +} + +static int nf_tables_getrule(struct sock *nlsk, struct sk_buff *skb, + const struct nlmsghdr *nlh, + const struct nlattr * const nla[]) +{ + const struct nfgenmsg *nfmsg = nlmsg_data(nlh); + const struct nft_af_info *afi; + const struct nft_table *table; + const struct nft_chain *chain; + const struct nft_rule *rule; + struct sk_buff *skb2; + struct net *net = sock_net(skb->sk); + int family = nfmsg->nfgen_family; + int err; + + if (nlh->nlmsg_flags & NLM_F_DUMP) { + struct netlink_dump_control c = { + .dump = nf_tables_dump_rules, + }; + return netlink_dump_start(nlsk, skb, nlh, &c); + } + + afi = nf_tables_afinfo_lookup(net, family, false); + if (IS_ERR(afi)) + return PTR_ERR(afi); + + table = nf_tables_table_lookup(afi, nla[NFTA_RULE_TABLE]); + if (IS_ERR(table)) + return PTR_ERR(table); + + chain = nf_tables_chain_lookup(table, nla[NFTA_RULE_CHAIN]); + if (IS_ERR(chain)) + return PTR_ERR(chain); + + rule = nf_tables_rule_lookup(chain, nla[NFTA_RULE_HANDLE]); + if (IS_ERR(rule)) + return PTR_ERR(rule); + + skb2 = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL); + if (!skb2) + return -ENOMEM; + + err = nf_tables_fill_rule_info(skb2, NETLINK_CB(skb).portid, + nlh->nlmsg_seq, NFT_MSG_NEWRULE, 0, + family, table, chain, rule); + if (err < 0) + goto err; + + return nlmsg_unicast(nlsk, skb2, NETLINK_CB(skb).portid); + +err: + kfree_skb(skb2); + return err; +} + +static void nf_tables_rcu_rule_destroy(struct rcu_head *head) +{ + struct nft_rule *rule = container_of(head, struct nft_rule, rcu_head); + struct nft_expr *expr; + + /* + * Careful: some expressions might not be initialized in case this + * is called on error from nf_tables_newrule(). + */ + expr = nft_expr_first(rule); + while (expr->ops && expr != nft_expr_last(rule)) { + nf_tables_expr_destroy(expr); + expr = nft_expr_next(expr); + } + kfree(rule); +} + +static void nf_tables_rule_destroy(struct nft_rule *rule) +{ + call_rcu(&rule->rcu_head, nf_tables_rcu_rule_destroy); +} + +#define NFT_RULE_MAXEXPRS 128 + +static struct nft_expr_info *info; + +static struct nft_rule_trans * +nf_tables_trans_add(struct nft_rule *rule, const struct nft_ctx *ctx) +{ + struct nft_rule_trans *rupd; + + rupd = kmalloc(sizeof(struct nft_rule_trans), GFP_KERNEL); + if (rupd == NULL) + return NULL; + + rupd->chain = ctx->chain; + rupd->table = ctx->table; + rupd->rule = rule; + rupd->family = ctx->afi->family; + rupd->nlh = ctx->nlh; + list_add_tail(&rupd->list, &ctx->net->nft.commit_list); + + return rupd; +} + +static int nf_tables_newrule(struct sock *nlsk, struct sk_buff *skb, + const struct nlmsghdr *nlh, + const struct nlattr * const nla[]) +{ + const struct nfgenmsg *nfmsg = nlmsg_data(nlh); + const struct nft_af_info *afi; + struct net *net = sock_net(skb->sk); + struct nft_table *table; + struct nft_chain *chain; + struct nft_rule *rule, *old_rule = NULL; + struct nft_rule_trans *repl = NULL; + struct nft_expr *expr; + struct nft_ctx ctx; + struct nlattr *tmp; + unsigned int size, i, n; + int err, rem; + bool create; + u64 handle, pos_handle; + + create = nlh->nlmsg_flags & NLM_F_CREATE ? true : false; + + afi = nf_tables_afinfo_lookup(net, nfmsg->nfgen_family, create); + if (IS_ERR(afi)) + return PTR_ERR(afi); + + table = nf_tables_table_lookup(afi, nla[NFTA_RULE_TABLE]); + if (IS_ERR(table)) + return PTR_ERR(table); + + chain = nf_tables_chain_lookup(table, nla[NFTA_RULE_CHAIN]); + if (IS_ERR(chain)) + return PTR_ERR(chain); + + if (nla[NFTA_RULE_HANDLE]) { + handle = be64_to_cpu(nla_get_be64(nla[NFTA_RULE_HANDLE])); + rule = __nf_tables_rule_lookup(chain, handle); + if (IS_ERR(rule)) + return PTR_ERR(rule); + + if (nlh->nlmsg_flags & NLM_F_EXCL) + return -EEXIST; + if (nlh->nlmsg_flags & NLM_F_REPLACE) + old_rule = rule; + else + return -EOPNOTSUPP; + } else { + if (!create || nlh->nlmsg_flags & NLM_F_REPLACE) + return -EINVAL; + handle = nf_tables_alloc_handle(table); + } + + if (nla[NFTA_RULE_POSITION]) { + if (!(nlh->nlmsg_flags & NLM_F_CREATE)) + return -EOPNOTSUPP; + + pos_handle = be64_to_cpu(nla_get_be64(nla[NFTA_RULE_POSITION])); + old_rule = __nf_tables_rule_lookup(chain, pos_handle); + if (IS_ERR(old_rule)) + return PTR_ERR(old_rule); + } + + nft_ctx_init(&ctx, skb, nlh, afi, table, chain, nla); + + n = 0; + size = 0; + if (nla[NFTA_RULE_EXPRESSIONS]) { + nla_for_each_nested(tmp, nla[NFTA_RULE_EXPRESSIONS], rem) { + err = -EINVAL; + if (nla_type(tmp) != NFTA_LIST_ELEM) + goto err1; + if (n == NFT_RULE_MAXEXPRS) + goto err1; + err = nf_tables_expr_parse(&ctx, tmp, &info[n]); + if (err < 0) + goto err1; + size += info[n].ops->size; + n++; + } + } + + err = -ENOMEM; + rule = kzalloc(sizeof(*rule) + size, GFP_KERNEL); + if (rule == NULL) + goto err1; + + nft_rule_activate_next(net, rule); + + rule->handle = handle; + rule->dlen = size; + + expr = nft_expr_first(rule); + for (i = 0; i < n; i++) { + err = nf_tables_newexpr(&ctx, &info[i], expr); + if (err < 0) + goto err2; + info[i].ops = NULL; + expr = nft_expr_next(expr); + } + + if (nlh->nlmsg_flags & NLM_F_REPLACE) { + if (nft_rule_is_active_next(net, old_rule)) { + repl = nf_tables_trans_add(old_rule, &ctx); + if (repl == NULL) { + err = -ENOMEM; + goto err2; + } + nft_rule_disactivate_next(net, old_rule); + list_add_tail(&rule->list, &old_rule->list); + } else { + err = -ENOENT; + goto err2; + } + } else if (nlh->nlmsg_flags & NLM_F_APPEND) + if (old_rule) + list_add_rcu(&rule->list, &old_rule->list); + else + list_add_tail_rcu(&rule->list, &chain->rules); + else { + if (old_rule) + list_add_tail_rcu(&rule->list, &old_rule->list); + else + list_add_rcu(&rule->list, &chain->rules); + } + + if (nf_tables_trans_add(rule, &ctx) == NULL) { + err = -ENOMEM; + goto err3; + } + return 0; + +err3: + list_del_rcu(&rule->list); + if (repl) { + list_del_rcu(&repl->rule->list); + list_del(&repl->list); + nft_rule_clear(net, repl->rule); + kfree(repl); + } +err2: + nf_tables_rule_destroy(rule); +err1: + for (i = 0; i < n; i++) { + if (info[i].ops != NULL) + module_put(info[i].ops->type->owner); + } + return err; +} + +static int +nf_tables_delrule_one(struct nft_ctx *ctx, struct nft_rule *rule) +{ + /* You cannot delete the same rule twice */ + if (nft_rule_is_active_next(ctx->net, rule)) { + if (nf_tables_trans_add(rule, ctx) == NULL) + return -ENOMEM; + nft_rule_disactivate_next(ctx->net, rule); + return 0; + } + return -ENOENT; +} + +static int nf_tables_delrule(struct sock *nlsk, struct sk_buff *skb, + const struct nlmsghdr *nlh, + const struct nlattr * const nla[]) +{ + const struct nfgenmsg *nfmsg = nlmsg_data(nlh); + const struct nft_af_info *afi; + struct net *net = sock_net(skb->sk); + const struct nft_table *table; + struct nft_chain *chain; + struct nft_rule *rule, *tmp; + int family = nfmsg->nfgen_family, err = 0; + struct nft_ctx ctx; + + afi = nf_tables_afinfo_lookup(net, family, false); + if (IS_ERR(afi)) + return PTR_ERR(afi); + + table = nf_tables_table_lookup(afi, nla[NFTA_RULE_TABLE]); + if (IS_ERR(table)) + return PTR_ERR(table); + + chain = nf_tables_chain_lookup(table, nla[NFTA_RULE_CHAIN]); + if (IS_ERR(chain)) + return PTR_ERR(chain); + + nft_ctx_init(&ctx, skb, nlh, afi, table, chain, nla); + + if (nla[NFTA_RULE_HANDLE]) { + rule = nf_tables_rule_lookup(chain, nla[NFTA_RULE_HANDLE]); + if (IS_ERR(rule)) + return PTR_ERR(rule); + + err = nf_tables_delrule_one(&ctx, rule); + } else { + /* Remove all rules in this chain */ + list_for_each_entry_safe(rule, tmp, &chain->rules, list) { + err = nf_tables_delrule_one(&ctx, rule); + if (err < 0) + break; + } + } + + return err; +} + +static int nf_tables_commit(struct sk_buff *skb) +{ + struct net *net = sock_net(skb->sk); + struct nft_rule_trans *rupd, *tmp; + + /* Bump generation counter, invalidate any dump in progress */ + net->nft.genctr++; + + /* A new generation has just started */ + net->nft.gencursor = gencursor_next(net); + + /* Make sure all packets have left the previous generation before + * purging old rules. + */ + synchronize_rcu(); + + list_for_each_entry_safe(rupd, tmp, &net->nft.commit_list, list) { + /* Delete this rule from the dirty list */ + list_del(&rupd->list); + + /* This rule was inactive in the past and just became active. + * Clear the next bit of the genmask since its meaning has + * changed, now it is the future. + */ + if (nft_rule_is_active(net, rupd->rule)) { + nft_rule_clear(net, rupd->rule); + nf_tables_rule_notify(skb, rupd->nlh, rupd->table, + rupd->chain, rupd->rule, + NFT_MSG_NEWRULE, 0, + rupd->family); + kfree(rupd); + continue; + } + + /* This rule is in the past, get rid of it */ + list_del_rcu(&rupd->rule->list); + nf_tables_rule_notify(skb, rupd->nlh, rupd->table, rupd->chain, + rupd->rule, NFT_MSG_DELRULE, 0, + rupd->family); + nf_tables_rule_destroy(rupd->rule); + kfree(rupd); + } + + return 0; +} + +static int nf_tables_abort(struct sk_buff *skb) +{ + struct net *net = sock_net(skb->sk); + struct nft_rule_trans *rupd, *tmp; + + list_for_each_entry_safe(rupd, tmp, &net->nft.commit_list, list) { + /* Delete all rules from the dirty list */ + list_del(&rupd->list); + + if (!nft_rule_is_active_next(net, rupd->rule)) { + nft_rule_clear(net, rupd->rule); + kfree(rupd); + continue; + } + + /* This rule is inactive, get rid of it */ + list_del_rcu(&rupd->rule->list); + nf_tables_rule_destroy(rupd->rule); + kfree(rupd); + } + return 0; +} + +/* + * Sets + */ + +static LIST_HEAD(nf_tables_set_ops); + +int nft_register_set(struct nft_set_ops *ops) +{ + nfnl_lock(NFNL_SUBSYS_NFTABLES); + list_add_tail(&ops->list, &nf_tables_set_ops); + nfnl_unlock(NFNL_SUBSYS_NFTABLES); + return 0; +} +EXPORT_SYMBOL_GPL(nft_register_set); + +void nft_unregister_set(struct nft_set_ops *ops) +{ + nfnl_lock(NFNL_SUBSYS_NFTABLES); + list_del(&ops->list); + nfnl_unlock(NFNL_SUBSYS_NFTABLES); +} +EXPORT_SYMBOL_GPL(nft_unregister_set); + +static const struct nft_set_ops *nft_select_set_ops(const struct nlattr * const nla[]) +{ + const struct nft_set_ops *ops; + u32 features; + +#ifdef CONFIG_MODULES + if (list_empty(&nf_tables_set_ops)) { + nfnl_unlock(NFNL_SUBSYS_NFTABLES); + request_module("nft-set"); + nfnl_lock(NFNL_SUBSYS_NFTABLES); + if (!list_empty(&nf_tables_set_ops)) + return ERR_PTR(-EAGAIN); + } +#endif + features = 0; + if (nla[NFTA_SET_FLAGS] != NULL) { + features = ntohl(nla_get_be32(nla[NFTA_SET_FLAGS])); + features &= NFT_SET_INTERVAL | NFT_SET_MAP; + } + + // FIXME: implement selection properly + list_for_each_entry(ops, &nf_tables_set_ops, list) { + if ((ops->features & features) != features) + continue; + if (!try_module_get(ops->owner)) + continue; + return ops; + } + + return ERR_PTR(-EOPNOTSUPP); +} + +static const struct nla_policy nft_set_policy[NFTA_SET_MAX + 1] = { + [NFTA_SET_TABLE] = { .type = NLA_STRING }, + [NFTA_SET_NAME] = { .type = NLA_STRING }, + [NFTA_SET_FLAGS] = { .type = NLA_U32 }, + [NFTA_SET_KEY_TYPE] = { .type = NLA_U32 }, + [NFTA_SET_KEY_LEN] = { .type = NLA_U32 }, + [NFTA_SET_DATA_TYPE] = { .type = NLA_U32 }, + [NFTA_SET_DATA_LEN] = { .type = NLA_U32 }, +}; + +static int nft_ctx_init_from_setattr(struct nft_ctx *ctx, + const struct sk_buff *skb, + const struct nlmsghdr *nlh, + const struct nlattr * const nla[]) +{ + struct net *net = sock_net(skb->sk); + const struct nfgenmsg *nfmsg = nlmsg_data(nlh); + const struct nft_af_info *afi; + const struct nft_table *table = NULL; + + afi = nf_tables_afinfo_lookup(net, nfmsg->nfgen_family, false); + if (IS_ERR(afi)) + return PTR_ERR(afi); + + if (nla[NFTA_SET_TABLE] != NULL) { + table = nf_tables_table_lookup(afi, nla[NFTA_SET_TABLE]); + if (IS_ERR(table)) + return PTR_ERR(table); + } + + nft_ctx_init(ctx, skb, nlh, afi, table, NULL, nla); + return 0; +} + +struct nft_set *nf_tables_set_lookup(const struct nft_table *table, + const struct nlattr *nla) +{ + struct nft_set *set; + + if (nla == NULL) + return ERR_PTR(-EINVAL); + + list_for_each_entry(set, &table->sets, list) { + if (!nla_strcmp(nla, set->name)) + return set; + } + return ERR_PTR(-ENOENT); +} + +static int nf_tables_set_alloc_name(struct nft_ctx *ctx, struct nft_set *set, + const char *name) +{ + const struct nft_set *i; + const char *p; + unsigned long *inuse; + unsigned int n = 0; + + p = strnchr(name, IFNAMSIZ, '%'); + if (p != NULL) { + if (p[1] != 'd' || strchr(p + 2, '%')) + return -EINVAL; + + inuse = (unsigned long *)get_zeroed_page(GFP_KERNEL); + if (inuse == NULL) + return -ENOMEM; + + list_for_each_entry(i, &ctx->table->sets, list) { + if (!sscanf(i->name, name, &n)) + continue; + if (n < 0 || n > BITS_PER_LONG * PAGE_SIZE) + continue; + set_bit(n, inuse); + } + + n = find_first_zero_bit(inuse, BITS_PER_LONG * PAGE_SIZE); + free_page((unsigned long)inuse); + } + + snprintf(set->name, sizeof(set->name), name, n); + list_for_each_entry(i, &ctx->table->sets, list) { + if (!strcmp(set->name, i->name)) + return -ENFILE; + } + return 0; +} + +static int nf_tables_fill_set(struct sk_buff *skb, const struct nft_ctx *ctx, + const struct nft_set *set, u16 event, u16 flags) +{ + struct nfgenmsg *nfmsg; + struct nlmsghdr *nlh; + u32 portid = NETLINK_CB(ctx->skb).portid; + u32 seq = ctx->nlh->nlmsg_seq; + + event |= NFNL_SUBSYS_NFTABLES << 8; + nlh = nlmsg_put(skb, portid, seq, event, sizeof(struct nfgenmsg), + flags); + if (nlh == NULL) + goto nla_put_failure; + + nfmsg = nlmsg_data(nlh); + nfmsg->nfgen_family = ctx->afi->family; + nfmsg->version = NFNETLINK_V0; + nfmsg->res_id = 0; + + if (nla_put_string(skb, NFTA_SET_TABLE, ctx->table->name)) + goto nla_put_failure; + if (nla_put_string(skb, NFTA_SET_NAME, set->name)) + goto nla_put_failure; + if (set->flags != 0) + if (nla_put_be32(skb, NFTA_SET_FLAGS, htonl(set->flags))) + goto nla_put_failure; + + if (nla_put_be32(skb, NFTA_SET_KEY_TYPE, htonl(set->ktype))) + goto nla_put_failure; + if (nla_put_be32(skb, NFTA_SET_KEY_LEN, htonl(set->klen))) + goto nla_put_failure; + if (set->flags & NFT_SET_MAP) { + if (nla_put_be32(skb, NFTA_SET_DATA_TYPE, htonl(set->dtype))) + goto nla_put_failure; + if (nla_put_be32(skb, NFTA_SET_DATA_LEN, htonl(set->dlen))) + goto nla_put_failure; + } + + return nlmsg_end(skb, nlh); + +nla_put_failure: + nlmsg_trim(skb, nlh); + return -1; +} + +static int nf_tables_set_notify(const struct nft_ctx *ctx, + const struct nft_set *set, + int event) +{ + struct sk_buff *skb; + u32 portid = NETLINK_CB(ctx->skb).portid; + bool report; + int err; + + report = nlmsg_report(ctx->nlh); + if (!report && !nfnetlink_has_listeners(ctx->net, NFNLGRP_NFTABLES)) + return 0; + + err = -ENOBUFS; + skb = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL); + if (skb == NULL) + goto err; + + err = nf_tables_fill_set(skb, ctx, set, event, 0); + if (err < 0) { + kfree_skb(skb); + goto err; + } + + err = nfnetlink_send(skb, ctx->net, portid, NFNLGRP_NFTABLES, report, + GFP_KERNEL); +err: + if (err < 0) + nfnetlink_set_err(ctx->net, portid, NFNLGRP_NFTABLES, err); + return err; +} + +static int nf_tables_dump_sets_table(struct nft_ctx *ctx, struct sk_buff *skb, + struct netlink_callback *cb) +{ + const struct nft_set *set; + unsigned int idx = 0, s_idx = cb->args[0]; + + if (cb->args[1]) + return skb->len; + + list_for_each_entry(set, &ctx->table->sets, list) { + if (idx < s_idx) + goto cont; + if (nf_tables_fill_set(skb, ctx, set, NFT_MSG_NEWSET, + NLM_F_MULTI) < 0) { + cb->args[0] = idx; + goto done; + } +cont: + idx++; + } + cb->args[1] = 1; +done: + return skb->len; +} + +static int nf_tables_dump_sets_all(struct nft_ctx *ctx, struct sk_buff *skb, + struct netlink_callback *cb) +{ + const struct nft_set *set; + unsigned int idx = 0, s_idx = cb->args[0]; + struct nft_table *table, *cur_table = (struct nft_table *)cb->args[2]; + + if (cb->args[1]) + return skb->len; + + list_for_each_entry(table, &ctx->afi->tables, list) { + if (cur_table && cur_table != table) + continue; + + ctx->table = table; + list_for_each_entry(set, &ctx->table->sets, list) { + if (idx < s_idx) + goto cont; + if (nf_tables_fill_set(skb, ctx, set, NFT_MSG_NEWSET, + NLM_F_MULTI) < 0) { + cb->args[0] = idx; + cb->args[2] = (unsigned long) table; + goto done; + } +cont: + idx++; + } + } + cb->args[1] = 1; +done: + return skb->len; +} + +static int nf_tables_dump_sets(struct sk_buff *skb, struct netlink_callback *cb) +{ + const struct nfgenmsg *nfmsg = nlmsg_data(cb->nlh); + struct nlattr *nla[NFTA_SET_MAX + 1]; + struct nft_ctx ctx; + int err, ret; + + err = nlmsg_parse(cb->nlh, sizeof(*nfmsg), nla, NFTA_SET_MAX, + nft_set_policy); + if (err < 0) + return err; + + err = nft_ctx_init_from_setattr(&ctx, cb->skb, cb->nlh, (void *)nla); + if (err < 0) + return err; + + if (ctx.table == NULL) + ret = nf_tables_dump_sets_all(&ctx, skb, cb); + else + ret = nf_tables_dump_sets_table(&ctx, skb, cb); + + return ret; +} + +static int nf_tables_getset(struct sock *nlsk, struct sk_buff *skb, + const struct nlmsghdr *nlh, + const struct nlattr * const nla[]) +{ + const struct nft_set *set; + struct nft_ctx ctx; + struct sk_buff *skb2; + int err; + + /* Verify existance before starting dump */ + err = nft_ctx_init_from_setattr(&ctx, skb, nlh, nla); + if (err < 0) + return err; + + if (nlh->nlmsg_flags & NLM_F_DUMP) { + struct netlink_dump_control c = { + .dump = nf_tables_dump_sets, + }; + return netlink_dump_start(nlsk, skb, nlh, &c); + } + + set = nf_tables_set_lookup(ctx.table, nla[NFTA_SET_NAME]); + if (IS_ERR(set)) + return PTR_ERR(set); + + skb2 = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL); + if (skb2 == NULL) + return -ENOMEM; + + err = nf_tables_fill_set(skb2, &ctx, set, NFT_MSG_NEWSET, 0); + if (err < 0) + goto err; + + return nlmsg_unicast(nlsk, skb2, NETLINK_CB(skb).portid); + +err: + kfree_skb(skb2); + return err; +} + +static int nf_tables_newset(struct sock *nlsk, struct sk_buff *skb, + const struct nlmsghdr *nlh, + const struct nlattr * const nla[]) +{ + const struct nfgenmsg *nfmsg = nlmsg_data(nlh); + const struct nft_set_ops *ops; + const struct nft_af_info *afi; + struct net *net = sock_net(skb->sk); + struct nft_table *table; + struct nft_set *set; + struct nft_ctx ctx; + char name[IFNAMSIZ]; + unsigned int size; + bool create; + u32 ktype, klen, dlen, dtype, flags; + int err; + + if (nla[NFTA_SET_TABLE] == NULL || + nla[NFTA_SET_NAME] == NULL || + nla[NFTA_SET_KEY_LEN] == NULL) + return -EINVAL; + + ktype = NFT_DATA_VALUE; + if (nla[NFTA_SET_KEY_TYPE] != NULL) { + ktype = ntohl(nla_get_be32(nla[NFTA_SET_KEY_TYPE])); + if ((ktype & NFT_DATA_RESERVED_MASK) == NFT_DATA_RESERVED_MASK) + return -EINVAL; + } + + klen = ntohl(nla_get_be32(nla[NFTA_SET_KEY_LEN])); + if (klen == 0 || klen > FIELD_SIZEOF(struct nft_data, data)) + return -EINVAL; + + flags = 0; + if (nla[NFTA_SET_FLAGS] != NULL) { + flags = ntohl(nla_get_be32(nla[NFTA_SET_FLAGS])); + if (flags & ~(NFT_SET_ANONYMOUS | NFT_SET_CONSTANT | + NFT_SET_INTERVAL | NFT_SET_MAP)) + return -EINVAL; + } + + dtype = 0; + dlen = 0; + if (nla[NFTA_SET_DATA_TYPE] != NULL) { + if (!(flags & NFT_SET_MAP)) + return -EINVAL; + + dtype = ntohl(nla_get_be32(nla[NFTA_SET_DATA_TYPE])); + if ((dtype & NFT_DATA_RESERVED_MASK) == NFT_DATA_RESERVED_MASK && + dtype != NFT_DATA_VERDICT) + return -EINVAL; + + if (dtype != NFT_DATA_VERDICT) { + if (nla[NFTA_SET_DATA_LEN] == NULL) + return -EINVAL; + dlen = ntohl(nla_get_be32(nla[NFTA_SET_DATA_LEN])); + if (dlen == 0 || + dlen > FIELD_SIZEOF(struct nft_data, data)) + return -EINVAL; + } else + dlen = sizeof(struct nft_data); + } else if (flags & NFT_SET_MAP) + return -EINVAL; + + create = nlh->nlmsg_flags & NLM_F_CREATE ? true : false; + + afi = nf_tables_afinfo_lookup(net, nfmsg->nfgen_family, create); + if (IS_ERR(afi)) + return PTR_ERR(afi); + + table = nf_tables_table_lookup(afi, nla[NFTA_SET_TABLE]); + if (IS_ERR(table)) + return PTR_ERR(table); + + nft_ctx_init(&ctx, skb, nlh, afi, table, NULL, nla); + + set = nf_tables_set_lookup(table, nla[NFTA_SET_NAME]); + if (IS_ERR(set)) { + if (PTR_ERR(set) != -ENOENT) + return PTR_ERR(set); + set = NULL; + } + + if (set != NULL) { + if (nlh->nlmsg_flags & NLM_F_EXCL) + return -EEXIST; + if (nlh->nlmsg_flags & NLM_F_REPLACE) + return -EOPNOTSUPP; + return 0; + } + + if (!(nlh->nlmsg_flags & NLM_F_CREATE)) + return -ENOENT; + + ops = nft_select_set_ops(nla); + if (IS_ERR(ops)) + return PTR_ERR(ops); + + size = 0; + if (ops->privsize != NULL) + size = ops->privsize(nla); + + err = -ENOMEM; + set = kzalloc(sizeof(*set) + size, GFP_KERNEL); + if (set == NULL) + goto err1; + + nla_strlcpy(name, nla[NFTA_SET_NAME], sizeof(set->name)); + err = nf_tables_set_alloc_name(&ctx, set, name); + if (err < 0) + goto err2; + + INIT_LIST_HEAD(&set->bindings); + set->ops = ops; + set->ktype = ktype; + set->klen = klen; + set->dtype = dtype; + set->dlen = dlen; + set->flags = flags; + + err = ops->init(set, nla); + if (err < 0) + goto err2; + + list_add_tail(&set->list, &table->sets); + nf_tables_set_notify(&ctx, set, NFT_MSG_NEWSET); + return 0; + +err2: + kfree(set); +err1: + module_put(ops->owner); + return err; +} + +static void nf_tables_set_destroy(const struct nft_ctx *ctx, struct nft_set *set) +{ + list_del(&set->list); + if (!(set->flags & NFT_SET_ANONYMOUS)) + nf_tables_set_notify(ctx, set, NFT_MSG_DELSET); + + set->ops->destroy(set); + module_put(set->ops->owner); + kfree(set); +} + +static int nf_tables_delset(struct sock *nlsk, struct sk_buff *skb, + const struct nlmsghdr *nlh, + const struct nlattr * const nla[]) +{ + struct nft_set *set; + struct nft_ctx ctx; + int err; + + if (nla[NFTA_SET_TABLE] == NULL) + return -EINVAL; + + err = nft_ctx_init_from_setattr(&ctx, skb, nlh, nla); + if (err < 0) + return err; + + set = nf_tables_set_lookup(ctx.table, nla[NFTA_SET_NAME]); + if (IS_ERR(set)) + return PTR_ERR(set); + if (!list_empty(&set->bindings)) + return -EBUSY; + + nf_tables_set_destroy(&ctx, set); + return 0; +} + +static int nf_tables_bind_check_setelem(const struct nft_ctx *ctx, + const struct nft_set *set, + const struct nft_set_iter *iter, + const struct nft_set_elem *elem) +{ + enum nft_registers dreg; + + dreg = nft_type_to_reg(set->dtype); + return nft_validate_data_load(ctx, dreg, &elem->data, set->dtype); +} + +int nf_tables_bind_set(const struct nft_ctx *ctx, struct nft_set *set, + struct nft_set_binding *binding) +{ + struct nft_set_binding *i; + struct nft_set_iter iter; + + if (!list_empty(&set->bindings) && set->flags & NFT_SET_ANONYMOUS) + return -EBUSY; + + if (set->flags & NFT_SET_MAP) { + /* If the set is already bound to the same chain all + * jumps are already validated for that chain. + */ + list_for_each_entry(i, &set->bindings, list) { + if (i->chain == binding->chain) + goto bind; + } + + iter.skip = 0; + iter.count = 0; + iter.err = 0; + iter.fn = nf_tables_bind_check_setelem; + + set->ops->walk(ctx, set, &iter); + if (iter.err < 0) { + /* Destroy anonymous sets if binding fails */ + if (set->flags & NFT_SET_ANONYMOUS) + nf_tables_set_destroy(ctx, set); + + return iter.err; + } + } +bind: + binding->chain = ctx->chain; + list_add_tail(&binding->list, &set->bindings); + return 0; +} + +void nf_tables_unbind_set(const struct nft_ctx *ctx, struct nft_set *set, + struct nft_set_binding *binding) +{ + list_del(&binding->list); + + if (list_empty(&set->bindings) && set->flags & NFT_SET_ANONYMOUS) + nf_tables_set_destroy(ctx, set); +} + +/* + * Set elements + */ + +static const struct nla_policy nft_set_elem_policy[NFTA_SET_ELEM_MAX + 1] = { + [NFTA_SET_ELEM_KEY] = { .type = NLA_NESTED }, + [NFTA_SET_ELEM_DATA] = { .type = NLA_NESTED }, + [NFTA_SET_ELEM_FLAGS] = { .type = NLA_U32 }, +}; + +static const struct nla_policy nft_set_elem_list_policy[NFTA_SET_ELEM_LIST_MAX + 1] = { + [NFTA_SET_ELEM_LIST_TABLE] = { .type = NLA_STRING }, + [NFTA_SET_ELEM_LIST_SET] = { .type = NLA_STRING }, + [NFTA_SET_ELEM_LIST_ELEMENTS] = { .type = NLA_NESTED }, +}; + +static int nft_ctx_init_from_elemattr(struct nft_ctx *ctx, + const struct sk_buff *skb, + const struct nlmsghdr *nlh, + const struct nlattr * const nla[]) +{ + const struct nfgenmsg *nfmsg = nlmsg_data(nlh); + const struct nft_af_info *afi; + const struct nft_table *table; + struct net *net = sock_net(skb->sk); + + afi = nf_tables_afinfo_lookup(net, nfmsg->nfgen_family, false); + if (IS_ERR(afi)) + return PTR_ERR(afi); + + table = nf_tables_table_lookup(afi, nla[NFTA_SET_ELEM_LIST_TABLE]); + if (IS_ERR(table)) + return PTR_ERR(table); + + nft_ctx_init(ctx, skb, nlh, afi, table, NULL, nla); + return 0; +} + +static int nf_tables_fill_setelem(struct sk_buff *skb, + const struct nft_set *set, + const struct nft_set_elem *elem) +{ + unsigned char *b = skb_tail_pointer(skb); + struct nlattr *nest; + + nest = nla_nest_start(skb, NFTA_LIST_ELEM); + if (nest == NULL) + goto nla_put_failure; + + if (nft_data_dump(skb, NFTA_SET_ELEM_KEY, &elem->key, NFT_DATA_VALUE, + set->klen) < 0) + goto nla_put_failure; + + if (set->flags & NFT_SET_MAP && + !(elem->flags & NFT_SET_ELEM_INTERVAL_END) && + nft_data_dump(skb, NFTA_SET_ELEM_DATA, &elem->data, + set->dtype == NFT_DATA_VERDICT ? NFT_DATA_VERDICT : NFT_DATA_VALUE, + set->dlen) < 0) + goto nla_put_failure; + + if (elem->flags != 0) + if (nla_put_be32(skb, NFTA_SET_ELEM_FLAGS, htonl(elem->flags))) + goto nla_put_failure; + + nla_nest_end(skb, nest); + return 0; + +nla_put_failure: + nlmsg_trim(skb, b); + return -EMSGSIZE; +} + +struct nft_set_dump_args { + const struct netlink_callback *cb; + struct nft_set_iter iter; + struct sk_buff *skb; +}; + +static int nf_tables_dump_setelem(const struct nft_ctx *ctx, + const struct nft_set *set, + const struct nft_set_iter *iter, + const struct nft_set_elem *elem) +{ + struct nft_set_dump_args *args; + + args = container_of(iter, struct nft_set_dump_args, iter); + return nf_tables_fill_setelem(args->skb, set, elem); +} + +static int nf_tables_dump_set(struct sk_buff *skb, struct netlink_callback *cb) +{ + const struct nft_set *set; + struct nft_set_dump_args args; + struct nft_ctx ctx; + struct nlattr *nla[NFTA_SET_ELEM_LIST_MAX + 1]; + struct nfgenmsg *nfmsg; + struct nlmsghdr *nlh; + struct nlattr *nest; + u32 portid, seq; + int event, err; + + nfmsg = nlmsg_data(cb->nlh); + err = nlmsg_parse(cb->nlh, sizeof(*nfmsg), nla, NFTA_SET_ELEM_LIST_MAX, + nft_set_elem_list_policy); + if (err < 0) + return err; + + err = nft_ctx_init_from_elemattr(&ctx, cb->skb, cb->nlh, (void *)nla); + if (err < 0) + return err; + + set = nf_tables_set_lookup(ctx.table, nla[NFTA_SET_ELEM_LIST_SET]); + if (IS_ERR(set)) + return PTR_ERR(set); + + event = NFT_MSG_NEWSETELEM; + event |= NFNL_SUBSYS_NFTABLES << 8; + portid = NETLINK_CB(cb->skb).portid; + seq = cb->nlh->nlmsg_seq; + + nlh = nlmsg_put(skb, portid, seq, event, sizeof(struct nfgenmsg), + NLM_F_MULTI); + if (nlh == NULL) + goto nla_put_failure; + + nfmsg = nlmsg_data(nlh); + nfmsg->nfgen_family = NFPROTO_UNSPEC; + nfmsg->version = NFNETLINK_V0; + nfmsg->res_id = 0; + + if (nla_put_string(skb, NFTA_SET_ELEM_LIST_TABLE, ctx.table->name)) + goto nla_put_failure; + if (nla_put_string(skb, NFTA_SET_ELEM_LIST_SET, set->name)) + goto nla_put_failure; + + nest = nla_nest_start(skb, NFTA_SET_ELEM_LIST_ELEMENTS); + if (nest == NULL) + goto nla_put_failure; + + args.cb = cb; + args.skb = skb; + args.iter.skip = cb->args[0]; + args.iter.count = 0; + args.iter.err = 0; + args.iter.fn = nf_tables_dump_setelem; + set->ops->walk(&ctx, set, &args.iter); + + nla_nest_end(skb, nest); + nlmsg_end(skb, nlh); + + if (args.iter.err && args.iter.err != -EMSGSIZE) + return args.iter.err; + if (args.iter.count == cb->args[0]) + return 0; + + cb->args[0] = args.iter.count; + return skb->len; + +nla_put_failure: + return -ENOSPC; +} + +static int nf_tables_getsetelem(struct sock *nlsk, struct sk_buff *skb, + const struct nlmsghdr *nlh, + const struct nlattr * const nla[]) +{ + const struct nft_set *set; + struct nft_ctx ctx; + int err; + + err = nft_ctx_init_from_elemattr(&ctx, skb, nlh, nla); + if (err < 0) + return err; + + set = nf_tables_set_lookup(ctx.table, nla[NFTA_SET_ELEM_LIST_SET]); + if (IS_ERR(set)) + return PTR_ERR(set); + + if (nlh->nlmsg_flags & NLM_F_DUMP) { + struct netlink_dump_control c = { + .dump = nf_tables_dump_set, + }; + return netlink_dump_start(nlsk, skb, nlh, &c); + } + return -EOPNOTSUPP; +} + +static int nft_add_set_elem(const struct nft_ctx *ctx, struct nft_set *set, + const struct nlattr *attr) +{ + struct nlattr *nla[NFTA_SET_ELEM_MAX + 1]; + struct nft_data_desc d1, d2; + struct nft_set_elem elem; + struct nft_set_binding *binding; + enum nft_registers dreg; + int err; + + err = nla_parse_nested(nla, NFTA_SET_ELEM_MAX, attr, + nft_set_elem_policy); + if (err < 0) + return err; + + if (nla[NFTA_SET_ELEM_KEY] == NULL) + return -EINVAL; + + elem.flags = 0; + if (nla[NFTA_SET_ELEM_FLAGS] != NULL) { + elem.flags = ntohl(nla_get_be32(nla[NFTA_SET_ELEM_FLAGS])); + if (elem.flags & ~NFT_SET_ELEM_INTERVAL_END) + return -EINVAL; + } + + if (set->flags & NFT_SET_MAP) { + if (nla[NFTA_SET_ELEM_DATA] == NULL && + !(elem.flags & NFT_SET_ELEM_INTERVAL_END)) + return -EINVAL; + } else { + if (nla[NFTA_SET_ELEM_DATA] != NULL) + return -EINVAL; + } + + err = nft_data_init(ctx, &elem.key, &d1, nla[NFTA_SET_ELEM_KEY]); + if (err < 0) + goto err1; + err = -EINVAL; + if (d1.type != NFT_DATA_VALUE || d1.len != set->klen) + goto err2; + + err = -EEXIST; + if (set->ops->get(set, &elem) == 0) + goto err2; + + if (nla[NFTA_SET_ELEM_DATA] != NULL) { + err = nft_data_init(ctx, &elem.data, &d2, nla[NFTA_SET_ELEM_DATA]); + if (err < 0) + goto err2; + + err = -EINVAL; + if (set->dtype != NFT_DATA_VERDICT && d2.len != set->dlen) + goto err3; + + dreg = nft_type_to_reg(set->dtype); + list_for_each_entry(binding, &set->bindings, list) { + struct nft_ctx bind_ctx = { + .afi = ctx->afi, + .table = ctx->table, + .chain = binding->chain, + }; + + err = nft_validate_data_load(&bind_ctx, dreg, + &elem.data, d2.type); + if (err < 0) + goto err3; + } + } + + err = set->ops->insert(set, &elem); + if (err < 0) + goto err3; + + return 0; + +err3: + if (nla[NFTA_SET_ELEM_DATA] != NULL) + nft_data_uninit(&elem.data, d2.type); +err2: + nft_data_uninit(&elem.key, d1.type); +err1: + return err; +} + +static int nf_tables_newsetelem(struct sock *nlsk, struct sk_buff *skb, + const struct nlmsghdr *nlh, + const struct nlattr * const nla[]) +{ + const struct nlattr *attr; + struct nft_set *set; + struct nft_ctx ctx; + int rem, err; + + err = nft_ctx_init_from_elemattr(&ctx, skb, nlh, nla); + if (err < 0) + return err; + + set = nf_tables_set_lookup(ctx.table, nla[NFTA_SET_ELEM_LIST_SET]); + if (IS_ERR(set)) + return PTR_ERR(set); + if (!list_empty(&set->bindings) && set->flags & NFT_SET_CONSTANT) + return -EBUSY; + + nla_for_each_nested(attr, nla[NFTA_SET_ELEM_LIST_ELEMENTS], rem) { + err = nft_add_set_elem(&ctx, set, attr); + if (err < 0) + return err; + } + return 0; +} + +static int nft_del_setelem(const struct nft_ctx *ctx, struct nft_set *set, + const struct nlattr *attr) +{ + struct nlattr *nla[NFTA_SET_ELEM_MAX + 1]; + struct nft_data_desc desc; + struct nft_set_elem elem; + int err; + + err = nla_parse_nested(nla, NFTA_SET_ELEM_MAX, attr, + nft_set_elem_policy); + if (err < 0) + goto err1; + + err = -EINVAL; + if (nla[NFTA_SET_ELEM_KEY] == NULL) + goto err1; + + err = nft_data_init(ctx, &elem.key, &desc, nla[NFTA_SET_ELEM_KEY]); + if (err < 0) + goto err1; + + err = -EINVAL; + if (desc.type != NFT_DATA_VALUE || desc.len != set->klen) + goto err2; + + err = set->ops->get(set, &elem); + if (err < 0) + goto err2; + + set->ops->remove(set, &elem); + + nft_data_uninit(&elem.key, NFT_DATA_VALUE); + if (set->flags & NFT_SET_MAP) + nft_data_uninit(&elem.data, set->dtype); + +err2: + nft_data_uninit(&elem.key, desc.type); +err1: + return err; +} + +static int nf_tables_delsetelem(struct sock *nlsk, struct sk_buff *skb, + const struct nlmsghdr *nlh, + const struct nlattr * const nla[]) +{ + const struct nlattr *attr; + struct nft_set *set; + struct nft_ctx ctx; + int rem, err; + + err = nft_ctx_init_from_elemattr(&ctx, skb, nlh, nla); + if (err < 0) + return err; + + set = nf_tables_set_lookup(ctx.table, nla[NFTA_SET_ELEM_LIST_SET]); + if (IS_ERR(set)) + return PTR_ERR(set); + if (!list_empty(&set->bindings) && set->flags & NFT_SET_CONSTANT) + return -EBUSY; + + nla_for_each_nested(attr, nla[NFTA_SET_ELEM_LIST_ELEMENTS], rem) { + err = nft_del_setelem(&ctx, set, attr); + if (err < 0) + return err; + } + return 0; +} + +static const struct nfnl_callback nf_tables_cb[NFT_MSG_MAX] = { + [NFT_MSG_NEWTABLE] = { + .call = nf_tables_newtable, + .attr_count = NFTA_TABLE_MAX, + .policy = nft_table_policy, + }, + [NFT_MSG_GETTABLE] = { + .call = nf_tables_gettable, + .attr_count = NFTA_TABLE_MAX, + .policy = nft_table_policy, + }, + [NFT_MSG_DELTABLE] = { + .call = nf_tables_deltable, + .attr_count = NFTA_TABLE_MAX, + .policy = nft_table_policy, + }, + [NFT_MSG_NEWCHAIN] = { + .call = nf_tables_newchain, + .attr_count = NFTA_CHAIN_MAX, + .policy = nft_chain_policy, + }, + [NFT_MSG_GETCHAIN] = { + .call = nf_tables_getchain, + .attr_count = NFTA_CHAIN_MAX, + .policy = nft_chain_policy, + }, + [NFT_MSG_DELCHAIN] = { + .call = nf_tables_delchain, + .attr_count = NFTA_CHAIN_MAX, + .policy = nft_chain_policy, + }, + [NFT_MSG_NEWRULE] = { + .call_batch = nf_tables_newrule, + .attr_count = NFTA_RULE_MAX, + .policy = nft_rule_policy, + }, + [NFT_MSG_GETRULE] = { + .call = nf_tables_getrule, + .attr_count = NFTA_RULE_MAX, + .policy = nft_rule_policy, + }, + [NFT_MSG_DELRULE] = { + .call_batch = nf_tables_delrule, + .attr_count = NFTA_RULE_MAX, + .policy = nft_rule_policy, + }, + [NFT_MSG_NEWSET] = { + .call = nf_tables_newset, + .attr_count = NFTA_SET_MAX, + .policy = nft_set_policy, + }, + [NFT_MSG_GETSET] = { + .call = nf_tables_getset, + .attr_count = NFTA_SET_MAX, + .policy = nft_set_policy, + }, + [NFT_MSG_DELSET] = { + .call = nf_tables_delset, + .attr_count = NFTA_SET_MAX, + .policy = nft_set_policy, + }, + [NFT_MSG_NEWSETELEM] = { + .call = nf_tables_newsetelem, + .attr_count = NFTA_SET_ELEM_LIST_MAX, + .policy = nft_set_elem_list_policy, + }, + [NFT_MSG_GETSETELEM] = { + .call = nf_tables_getsetelem, + .attr_count = NFTA_SET_ELEM_LIST_MAX, + .policy = nft_set_elem_list_policy, + }, + [NFT_MSG_DELSETELEM] = { + .call = nf_tables_delsetelem, + .attr_count = NFTA_SET_ELEM_LIST_MAX, + .policy = nft_set_elem_list_policy, + }, +}; + +static const struct nfnetlink_subsystem nf_tables_subsys = { + .name = "nf_tables", + .subsys_id = NFNL_SUBSYS_NFTABLES, + .cb_count = NFT_MSG_MAX, + .cb = nf_tables_cb, + .commit = nf_tables_commit, + .abort = nf_tables_abort, +}; + +/* + * Loop detection - walk through the ruleset beginning at the destination chain + * of a new jump until either the source chain is reached (loop) or all + * reachable chains have been traversed. + * + * The loop check is performed whenever a new jump verdict is added to an + * expression or verdict map or a verdict map is bound to a new chain. + */ + +static int nf_tables_check_loops(const struct nft_ctx *ctx, + const struct nft_chain *chain); + +static int nf_tables_loop_check_setelem(const struct nft_ctx *ctx, + const struct nft_set *set, + const struct nft_set_iter *iter, + const struct nft_set_elem *elem) +{ + switch (elem->data.verdict) { + case NFT_JUMP: + case NFT_GOTO: + return nf_tables_check_loops(ctx, elem->data.chain); + default: + return 0; + } +} + +static int nf_tables_check_loops(const struct nft_ctx *ctx, + const struct nft_chain *chain) +{ + const struct nft_rule *rule; + const struct nft_expr *expr, *last; + const struct nft_set *set; + struct nft_set_binding *binding; + struct nft_set_iter iter; + + if (ctx->chain == chain) + return -ELOOP; + + list_for_each_entry(rule, &chain->rules, list) { + nft_rule_for_each_expr(expr, last, rule) { + const struct nft_data *data = NULL; + int err; + + if (!expr->ops->validate) + continue; + + err = expr->ops->validate(ctx, expr, &data); + if (err < 0) + return err; + + if (data == NULL) + continue; + + switch (data->verdict) { + case NFT_JUMP: + case NFT_GOTO: + err = nf_tables_check_loops(ctx, data->chain); + if (err < 0) + return err; + default: + break; + } + } + } + + list_for_each_entry(set, &ctx->table->sets, list) { + if (!(set->flags & NFT_SET_MAP) || + set->dtype != NFT_DATA_VERDICT) + continue; + + list_for_each_entry(binding, &set->bindings, list) { + if (binding->chain != chain) + continue; + + iter.skip = 0; + iter.count = 0; + iter.err = 0; + iter.fn = nf_tables_loop_check_setelem; + + set->ops->walk(ctx, set, &iter); + if (iter.err < 0) + return iter.err; + } + } + + return 0; +} + +/** + * nft_validate_input_register - validate an expressions' input register + * + * @reg: the register number + * + * Validate that the input register is one of the general purpose + * registers. + */ +int nft_validate_input_register(enum nft_registers reg) +{ + if (reg <= NFT_REG_VERDICT) + return -EINVAL; + if (reg > NFT_REG_MAX) + return -ERANGE; + return 0; +} +EXPORT_SYMBOL_GPL(nft_validate_input_register); + +/** + * nft_validate_output_register - validate an expressions' output register + * + * @reg: the register number + * + * Validate that the output register is one of the general purpose + * registers or the verdict register. + */ +int nft_validate_output_register(enum nft_registers reg) +{ + if (reg < NFT_REG_VERDICT) + return -EINVAL; + if (reg > NFT_REG_MAX) + return -ERANGE; + return 0; +} +EXPORT_SYMBOL_GPL(nft_validate_output_register); + +/** + * nft_validate_data_load - validate an expressions' data load + * + * @ctx: context of the expression performing the load + * @reg: the destination register number + * @data: the data to load + * @type: the data type + * + * Validate that a data load uses the appropriate data type for + * the destination register. A value of NULL for the data means + * that its runtime gathered data, which is always of type + * NFT_DATA_VALUE. + */ +int nft_validate_data_load(const struct nft_ctx *ctx, enum nft_registers reg, + const struct nft_data *data, + enum nft_data_types type) +{ + int err; + + switch (reg) { + case NFT_REG_VERDICT: + if (data == NULL || type != NFT_DATA_VERDICT) + return -EINVAL; + + if (data->verdict == NFT_GOTO || data->verdict == NFT_JUMP) { + err = nf_tables_check_loops(ctx, data->chain); + if (err < 0) + return err; + + if (ctx->chain->level + 1 > data->chain->level) { + if (ctx->chain->level + 1 == NFT_JUMP_STACK_SIZE) + return -EMLINK; + data->chain->level = ctx->chain->level + 1; + } + } + + return 0; + default: + if (data != NULL && type != NFT_DATA_VALUE) + return -EINVAL; + return 0; + } +} +EXPORT_SYMBOL_GPL(nft_validate_data_load); + +static const struct nla_policy nft_verdict_policy[NFTA_VERDICT_MAX + 1] = { + [NFTA_VERDICT_CODE] = { .type = NLA_U32 }, + [NFTA_VERDICT_CHAIN] = { .type = NLA_STRING, + .len = NFT_CHAIN_MAXNAMELEN - 1 }, +}; + +static int nft_verdict_init(const struct nft_ctx *ctx, struct nft_data *data, + struct nft_data_desc *desc, const struct nlattr *nla) +{ + struct nlattr *tb[NFTA_VERDICT_MAX + 1]; + struct nft_chain *chain; + int err; + + err = nla_parse_nested(tb, NFTA_VERDICT_MAX, nla, nft_verdict_policy); + if (err < 0) + return err; + + if (!tb[NFTA_VERDICT_CODE]) + return -EINVAL; + data->verdict = ntohl(nla_get_be32(tb[NFTA_VERDICT_CODE])); + + switch (data->verdict) { + case NF_ACCEPT: + case NF_DROP: + case NF_QUEUE: + case NFT_CONTINUE: + case NFT_BREAK: + case NFT_RETURN: + desc->len = sizeof(data->verdict); + break; + case NFT_JUMP: + case NFT_GOTO: + if (!tb[NFTA_VERDICT_CHAIN]) + return -EINVAL; + chain = nf_tables_chain_lookup(ctx->table, + tb[NFTA_VERDICT_CHAIN]); + if (IS_ERR(chain)) + return PTR_ERR(chain); + if (chain->flags & NFT_BASE_CHAIN) + return -EOPNOTSUPP; + + chain->use++; + data->chain = chain; + desc->len = sizeof(data); + break; + default: + return -EINVAL; + } + + desc->type = NFT_DATA_VERDICT; + return 0; +} + +static void nft_verdict_uninit(const struct nft_data *data) +{ + switch (data->verdict) { + case NFT_JUMP: + case NFT_GOTO: + data->chain->use--; + break; + } +} + +static int nft_verdict_dump(struct sk_buff *skb, const struct nft_data *data) +{ + struct nlattr *nest; + + nest = nla_nest_start(skb, NFTA_DATA_VERDICT); + if (!nest) + goto nla_put_failure; + + if (nla_put_be32(skb, NFTA_VERDICT_CODE, htonl(data->verdict))) + goto nla_put_failure; + + switch (data->verdict) { + case NFT_JUMP: + case NFT_GOTO: + if (nla_put_string(skb, NFTA_VERDICT_CHAIN, data->chain->name)) + goto nla_put_failure; + } + nla_nest_end(skb, nest); + return 0; + +nla_put_failure: + return -1; +} + +static int nft_value_init(const struct nft_ctx *ctx, struct nft_data *data, + struct nft_data_desc *desc, const struct nlattr *nla) +{ + unsigned int len; + + len = nla_len(nla); + if (len == 0) + return -EINVAL; + if (len > sizeof(data->data)) + return -EOVERFLOW; + + nla_memcpy(data->data, nla, sizeof(data->data)); + desc->type = NFT_DATA_VALUE; + desc->len = len; + return 0; +} + +static int nft_value_dump(struct sk_buff *skb, const struct nft_data *data, + unsigned int len) +{ + return nla_put(skb, NFTA_DATA_VALUE, len, data->data); +} + +static const struct nla_policy nft_data_policy[NFTA_DATA_MAX + 1] = { + [NFTA_DATA_VALUE] = { .type = NLA_BINARY, + .len = FIELD_SIZEOF(struct nft_data, data) }, + [NFTA_DATA_VERDICT] = { .type = NLA_NESTED }, +}; + +/** + * nft_data_init - parse nf_tables data netlink attributes + * + * @ctx: context of the expression using the data + * @data: destination struct nft_data + * @desc: data description + * @nla: netlink attribute containing data + * + * Parse the netlink data attributes and initialize a struct nft_data. + * The type and length of data are returned in the data description. + * + * The caller can indicate that it only wants to accept data of type + * NFT_DATA_VALUE by passing NULL for the ctx argument. + */ +int nft_data_init(const struct nft_ctx *ctx, struct nft_data *data, + struct nft_data_desc *desc, const struct nlattr *nla) +{ + struct nlattr *tb[NFTA_DATA_MAX + 1]; + int err; + + err = nla_parse_nested(tb, NFTA_DATA_MAX, nla, nft_data_policy); + if (err < 0) + return err; + + if (tb[NFTA_DATA_VALUE]) + return nft_value_init(ctx, data, desc, tb[NFTA_DATA_VALUE]); + if (tb[NFTA_DATA_VERDICT] && ctx != NULL) + return nft_verdict_init(ctx, data, desc, tb[NFTA_DATA_VERDICT]); + return -EINVAL; +} +EXPORT_SYMBOL_GPL(nft_data_init); + +/** + * nft_data_uninit - release a nft_data item + * + * @data: struct nft_data to release + * @type: type of data + * + * Release a nft_data item. NFT_DATA_VALUE types can be silently discarded, + * all others need to be released by calling this function. + */ +void nft_data_uninit(const struct nft_data *data, enum nft_data_types type) +{ + switch (type) { + case NFT_DATA_VALUE: + return; + case NFT_DATA_VERDICT: + return nft_verdict_uninit(data); + default: + WARN_ON(1); + } +} +EXPORT_SYMBOL_GPL(nft_data_uninit); + +int nft_data_dump(struct sk_buff *skb, int attr, const struct nft_data *data, + enum nft_data_types type, unsigned int len) +{ + struct nlattr *nest; + int err; + + nest = nla_nest_start(skb, attr); + if (nest == NULL) + return -1; + + switch (type) { + case NFT_DATA_VALUE: + err = nft_value_dump(skb, data, len); + break; + case NFT_DATA_VERDICT: + err = nft_verdict_dump(skb, data); + break; + default: + err = -EINVAL; + WARN_ON(1); + } + + nla_nest_end(skb, nest); + return err; +} +EXPORT_SYMBOL_GPL(nft_data_dump); + +static int nf_tables_init_net(struct net *net) +{ + INIT_LIST_HEAD(&net->nft.af_info); + INIT_LIST_HEAD(&net->nft.commit_list); + return 0; +} + +static struct pernet_operations nf_tables_net_ops = { + .init = nf_tables_init_net, +}; + +static int __init nf_tables_module_init(void) +{ + int err; + + info = kmalloc(sizeof(struct nft_expr_info) * NFT_RULE_MAXEXPRS, + GFP_KERNEL); + if (info == NULL) { + err = -ENOMEM; + goto err1; + } + + err = nf_tables_core_module_init(); + if (err < 0) + goto err2; + + err = nfnetlink_subsys_register(&nf_tables_subsys); + if (err < 0) + goto err3; + + pr_info("nf_tables: (c) 2007-2009 Patrick McHardy <kaber@trash.net>\n"); + return register_pernet_subsys(&nf_tables_net_ops); +err3: + nf_tables_core_module_exit(); +err2: + kfree(info); +err1: + return err; +} + +static void __exit nf_tables_module_exit(void) +{ + unregister_pernet_subsys(&nf_tables_net_ops); + nfnetlink_subsys_unregister(&nf_tables_subsys); + nf_tables_core_module_exit(); + kfree(info); +} + +module_init(nf_tables_module_init); +module_exit(nf_tables_module_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>"); +MODULE_ALIAS_NFNL_SUBSYS(NFNL_SUBSYS_NFTABLES); diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c new file mode 100644 index 000000000000..cb9e685caae1 --- /dev/null +++ b/net/netfilter/nf_tables_core.c @@ -0,0 +1,270 @@ +/* + * Copyright (c) 2008 Patrick McHardy <kaber@trash.net> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Development of this code funded by Astaro AG (http://www.astaro.com/) + */ + +#include <linux/module.h> +#include <linux/init.h> +#include <linux/list.h> +#include <linux/rculist.h> +#include <linux/skbuff.h> +#include <linux/netlink.h> +#include <linux/netfilter.h> +#include <linux/netfilter/nfnetlink.h> +#include <linux/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables_core.h> +#include <net/netfilter/nf_tables.h> +#include <net/netfilter/nf_log.h> + +static void nft_cmp_fast_eval(const struct nft_expr *expr, + struct nft_data data[NFT_REG_MAX + 1]) +{ + const struct nft_cmp_fast_expr *priv = nft_expr_priv(expr); + u32 mask; + + mask = ~0U >> (sizeof(priv->data) * BITS_PER_BYTE - priv->len); + if ((data[priv->sreg].data[0] & mask) == priv->data) + return; + data[NFT_REG_VERDICT].verdict = NFT_BREAK; +} + +static bool nft_payload_fast_eval(const struct nft_expr *expr, + struct nft_data data[NFT_REG_MAX + 1], + const struct nft_pktinfo *pkt) +{ + const struct nft_payload *priv = nft_expr_priv(expr); + const struct sk_buff *skb = pkt->skb; + struct nft_data *dest = &data[priv->dreg]; + unsigned char *ptr; + + if (priv->base == NFT_PAYLOAD_NETWORK_HEADER) + ptr = skb_network_header(skb); + else + ptr = skb_network_header(skb) + pkt->xt.thoff; + + ptr += priv->offset; + + if (unlikely(ptr + priv->len >= skb_tail_pointer(skb))) + return false; + + if (priv->len == 2) + *(u16 *)dest->data = *(u16 *)ptr; + else if (priv->len == 4) + *(u32 *)dest->data = *(u32 *)ptr; + else + *(u8 *)dest->data = *(u8 *)ptr; + return true; +} + +struct nft_jumpstack { + const struct nft_chain *chain; + const struct nft_rule *rule; + int rulenum; +}; + +static inline void +nft_chain_stats(const struct nft_chain *this, const struct nft_pktinfo *pkt, + struct nft_jumpstack *jumpstack, unsigned int stackptr) +{ + struct nft_stats __percpu *stats; + const struct nft_chain *chain = stackptr ? jumpstack[0].chain : this; + + rcu_read_lock_bh(); + stats = rcu_dereference(nft_base_chain(chain)->stats); + __this_cpu_inc(stats->pkts); + __this_cpu_add(stats->bytes, pkt->skb->len); + rcu_read_unlock_bh(); +} + +enum nft_trace { + NFT_TRACE_RULE, + NFT_TRACE_RETURN, + NFT_TRACE_POLICY, +}; + +static const char *const comments[] = { + [NFT_TRACE_RULE] = "rule", + [NFT_TRACE_RETURN] = "return", + [NFT_TRACE_POLICY] = "policy", +}; + +static struct nf_loginfo trace_loginfo = { + .type = NF_LOG_TYPE_LOG, + .u = { + .log = { + .level = 4, + .logflags = NF_LOG_MASK, + }, + }, +}; + +static inline void nft_trace_packet(const struct nft_pktinfo *pkt, + const struct nft_chain *chain, + int rulenum, enum nft_trace type) +{ + struct net *net = dev_net(pkt->in ? pkt->in : pkt->out); + + nf_log_packet(net, pkt->xt.family, pkt->hooknum, pkt->skb, pkt->in, + pkt->out, &trace_loginfo, "TRACE: %s:%s:%s:%u ", + chain->table->name, chain->name, comments[type], + rulenum); +} + +unsigned int +nft_do_chain_pktinfo(struct nft_pktinfo *pkt, const struct nf_hook_ops *ops) +{ + const struct nft_chain *chain = ops->priv; + const struct nft_rule *rule; + const struct nft_expr *expr, *last; + struct nft_data data[NFT_REG_MAX + 1]; + unsigned int stackptr = 0; + struct nft_jumpstack jumpstack[NFT_JUMP_STACK_SIZE]; + int rulenum = 0; + /* + * Cache cursor to avoid problems in case that the cursor is updated + * while traversing the ruleset. + */ + unsigned int gencursor = ACCESS_ONCE(chain->net->nft.gencursor); + +do_chain: + rule = list_entry(&chain->rules, struct nft_rule, list); +next_rule: + data[NFT_REG_VERDICT].verdict = NFT_CONTINUE; + list_for_each_entry_continue_rcu(rule, &chain->rules, list) { + + /* This rule is not active, skip. */ + if (unlikely(rule->genmask & (1 << gencursor))) + continue; + + rulenum++; + + nft_rule_for_each_expr(expr, last, rule) { + if (expr->ops == &nft_cmp_fast_ops) + nft_cmp_fast_eval(expr, data); + else if (expr->ops != &nft_payload_fast_ops || + !nft_payload_fast_eval(expr, data, pkt)) + expr->ops->eval(expr, data, pkt); + + if (data[NFT_REG_VERDICT].verdict != NFT_CONTINUE) + break; + } + + switch (data[NFT_REG_VERDICT].verdict) { + case NFT_BREAK: + data[NFT_REG_VERDICT].verdict = NFT_CONTINUE; + /* fall through */ + case NFT_CONTINUE: + continue; + } + break; + } + + switch (data[NFT_REG_VERDICT].verdict) { + case NF_ACCEPT: + case NF_DROP: + case NF_QUEUE: + if (unlikely(pkt->skb->nf_trace)) + nft_trace_packet(pkt, chain, rulenum, NFT_TRACE_RULE); + + return data[NFT_REG_VERDICT].verdict; + case NFT_JUMP: + if (unlikely(pkt->skb->nf_trace)) + nft_trace_packet(pkt, chain, rulenum, NFT_TRACE_RULE); + + BUG_ON(stackptr >= NFT_JUMP_STACK_SIZE); + jumpstack[stackptr].chain = chain; + jumpstack[stackptr].rule = rule; + jumpstack[stackptr].rulenum = rulenum; + stackptr++; + /* fall through */ + case NFT_GOTO: + chain = data[NFT_REG_VERDICT].chain; + goto do_chain; + case NFT_RETURN: + if (unlikely(pkt->skb->nf_trace)) + nft_trace_packet(pkt, chain, rulenum, NFT_TRACE_RETURN); + + /* fall through */ + case NFT_CONTINUE: + break; + default: + WARN_ON(1); + } + + if (stackptr > 0) { + if (unlikely(pkt->skb->nf_trace)) + nft_trace_packet(pkt, chain, ++rulenum, NFT_TRACE_RETURN); + + stackptr--; + chain = jumpstack[stackptr].chain; + rule = jumpstack[stackptr].rule; + rulenum = jumpstack[stackptr].rulenum; + goto next_rule; + } + nft_chain_stats(chain, pkt, jumpstack, stackptr); + + if (unlikely(pkt->skb->nf_trace)) + nft_trace_packet(pkt, chain, ++rulenum, NFT_TRACE_POLICY); + + return nft_base_chain(chain)->policy; +} +EXPORT_SYMBOL_GPL(nft_do_chain_pktinfo); + +int __init nf_tables_core_module_init(void) +{ + int err; + + err = nft_immediate_module_init(); + if (err < 0) + goto err1; + + err = nft_cmp_module_init(); + if (err < 0) + goto err2; + + err = nft_lookup_module_init(); + if (err < 0) + goto err3; + + err = nft_bitwise_module_init(); + if (err < 0) + goto err4; + + err = nft_byteorder_module_init(); + if (err < 0) + goto err5; + + err = nft_payload_module_init(); + if (err < 0) + goto err6; + + return 0; + +err6: + nft_byteorder_module_exit(); +err5: + nft_bitwise_module_exit(); +err4: + nft_lookup_module_exit(); +err3: + nft_cmp_module_exit(); +err2: + nft_immediate_module_exit(); +err1: + return err; +} + +void nf_tables_core_module_exit(void) +{ + nft_payload_module_exit(); + nft_byteorder_module_exit(); + nft_bitwise_module_exit(); + nft_lookup_module_exit(); + nft_cmp_module_exit(); + nft_immediate_module_exit(); +} diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c index 572d87dc116f..046aa13b4fea 100644 --- a/net/netfilter/nfnetlink.c +++ b/net/netfilter/nfnetlink.c @@ -147,9 +147,6 @@ static int nfnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) const struct nfnetlink_subsystem *ss; int type, err; - if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) - return -EPERM; - /* All the messages must at least contain nfgenmsg */ if (nlmsg_len(nlh) < sizeof(struct nfgenmsg)) return 0; @@ -217,9 +214,181 @@ replay: } } +static void nfnetlink_rcv_batch(struct sk_buff *skb, struct nlmsghdr *nlh, + u_int16_t subsys_id) +{ + struct sk_buff *nskb, *oskb = skb; + struct net *net = sock_net(skb->sk); + const struct nfnetlink_subsystem *ss; + const struct nfnl_callback *nc; + bool success = true, done = false; + int err; + + if (subsys_id >= NFNL_SUBSYS_COUNT) + return netlink_ack(skb, nlh, -EINVAL); +replay: + nskb = netlink_skb_clone(oskb, GFP_KERNEL); + if (!nskb) + return netlink_ack(oskb, nlh, -ENOMEM); + + nskb->sk = oskb->sk; + skb = nskb; + + nfnl_lock(subsys_id); + ss = rcu_dereference_protected(table[subsys_id].subsys, + lockdep_is_held(&table[subsys_id].mutex)); + if (!ss) { +#ifdef CONFIG_MODULES + nfnl_unlock(subsys_id); + request_module("nfnetlink-subsys-%d", subsys_id); + nfnl_lock(subsys_id); + ss = rcu_dereference_protected(table[subsys_id].subsys, + lockdep_is_held(&table[subsys_id].mutex)); + if (!ss) +#endif + { + nfnl_unlock(subsys_id); + kfree_skb(nskb); + return netlink_ack(skb, nlh, -EOPNOTSUPP); + } + } + + if (!ss->commit || !ss->abort) { + nfnl_unlock(subsys_id); + kfree_skb(nskb); + return netlink_ack(skb, nlh, -EOPNOTSUPP); + } + + while (skb->len >= nlmsg_total_size(0)) { + int msglen, type; + + nlh = nlmsg_hdr(skb); + err = 0; + + if (nlh->nlmsg_len < NLMSG_HDRLEN) { + err = -EINVAL; + goto ack; + } + + /* Only requests are handled by the kernel */ + if (!(nlh->nlmsg_flags & NLM_F_REQUEST)) { + err = -EINVAL; + goto ack; + } + + type = nlh->nlmsg_type; + if (type == NFNL_MSG_BATCH_BEGIN) { + /* Malformed: Batch begin twice */ + success = false; + goto done; + } else if (type == NFNL_MSG_BATCH_END) { + done = true; + goto done; + } else if (type < NLMSG_MIN_TYPE) { + err = -EINVAL; + goto ack; + } + + /* We only accept a batch with messages for the same + * subsystem. + */ + if (NFNL_SUBSYS_ID(type) != subsys_id) { + err = -EINVAL; + goto ack; + } + + nc = nfnetlink_find_client(type, ss); + if (!nc) { + err = -EINVAL; + goto ack; + } + + { + int min_len = nlmsg_total_size(sizeof(struct nfgenmsg)); + u_int8_t cb_id = NFNL_MSG_TYPE(nlh->nlmsg_type); + struct nlattr *cda[ss->cb[cb_id].attr_count + 1]; + struct nlattr *attr = (void *)nlh + min_len; + int attrlen = nlh->nlmsg_len - min_len; + + err = nla_parse(cda, ss->cb[cb_id].attr_count, + attr, attrlen, ss->cb[cb_id].policy); + if (err < 0) + goto ack; + + if (nc->call_batch) { + err = nc->call_batch(net->nfnl, skb, nlh, + (const struct nlattr **)cda); + } + + /* The lock was released to autoload some module, we + * have to abort and start from scratch using the + * original skb. + */ + if (err == -EAGAIN) { + ss->abort(skb); + nfnl_unlock(subsys_id); + kfree_skb(nskb); + goto replay; + } + } +ack: + if (nlh->nlmsg_flags & NLM_F_ACK || err) { + /* We don't stop processing the batch on errors, thus, + * userspace gets all the errors that the batch + * triggers. + */ + netlink_ack(skb, nlh, err); + if (err) + success = false; + } + + msglen = NLMSG_ALIGN(nlh->nlmsg_len); + if (msglen > skb->len) + msglen = skb->len; + skb_pull(skb, msglen); + } +done: + if (success && done) + ss->commit(skb); + else + ss->abort(skb); + + nfnl_unlock(subsys_id); + kfree_skb(nskb); +} + static void nfnetlink_rcv(struct sk_buff *skb) { - netlink_rcv_skb(skb, &nfnetlink_rcv_msg); + struct nlmsghdr *nlh = nlmsg_hdr(skb); + struct net *net = sock_net(skb->sk); + int msglen; + + if (nlh->nlmsg_len < NLMSG_HDRLEN || + skb->len < nlh->nlmsg_len) + return; + + if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) { + netlink_ack(skb, nlh, -EPERM); + return; + } + + if (nlh->nlmsg_type == NFNL_MSG_BATCH_BEGIN) { + struct nfgenmsg *nfgenmsg; + + msglen = NLMSG_ALIGN(nlh->nlmsg_len); + if (msglen > skb->len) + msglen = skb->len; + + if (nlh->nlmsg_len < NLMSG_HDRLEN || + skb->len < NLMSG_HDRLEN + sizeof(struct nfgenmsg)) + return; + + nfgenmsg = nlmsg_data(nlh); + skb_pull(skb, msglen); + nfnetlink_rcv_batch(skb, nlh, nfgenmsg->res_id); + } else { + netlink_rcv_skb(skb, &nfnetlink_rcv_msg); + } } #ifdef CONFIG_MODULES diff --git a/net/netfilter/nfnetlink_cttimeout.c b/net/netfilter/nfnetlink_cttimeout.c index 50580494148d..476accd17145 100644 --- a/net/netfilter/nfnetlink_cttimeout.c +++ b/net/netfilter/nfnetlink_cttimeout.c @@ -49,10 +49,8 @@ static const struct nla_policy cttimeout_nla_policy[CTA_TIMEOUT_MAX+1] = { }; static int -ctnl_timeout_parse_policy(struct ctnl_timeout *timeout, - struct nf_conntrack_l4proto *l4proto, - struct net *net, - const struct nlattr *attr) +ctnl_timeout_parse_policy(void *timeouts, struct nf_conntrack_l4proto *l4proto, + struct net *net, const struct nlattr *attr) { int ret = 0; @@ -64,8 +62,7 @@ ctnl_timeout_parse_policy(struct ctnl_timeout *timeout, if (ret < 0) return ret; - ret = l4proto->ctnl_timeout.nlattr_to_obj(tb, net, - &timeout->data); + ret = l4proto->ctnl_timeout.nlattr_to_obj(tb, net, timeouts); } return ret; } @@ -123,7 +120,8 @@ cttimeout_new_timeout(struct sock *ctnl, struct sk_buff *skb, goto err_proto_put; } - ret = ctnl_timeout_parse_policy(matching, l4proto, net, + ret = ctnl_timeout_parse_policy(&matching->data, + l4proto, net, cda[CTA_TIMEOUT_DATA]); return ret; } @@ -138,7 +136,7 @@ cttimeout_new_timeout(struct sock *ctnl, struct sk_buff *skb, goto err_proto_put; } - ret = ctnl_timeout_parse_policy(timeout, l4proto, net, + ret = ctnl_timeout_parse_policy(&timeout->data, l4proto, net, cda[CTA_TIMEOUT_DATA]); if (ret < 0) goto err; @@ -342,6 +340,147 @@ cttimeout_del_timeout(struct sock *ctnl, struct sk_buff *skb, return ret; } +static int +cttimeout_default_set(struct sock *ctnl, struct sk_buff *skb, + const struct nlmsghdr *nlh, + const struct nlattr * const cda[]) +{ + __u16 l3num; + __u8 l4num; + struct nf_conntrack_l4proto *l4proto; + struct net *net = sock_net(skb->sk); + unsigned int *timeouts; + int ret; + + if (!cda[CTA_TIMEOUT_L3PROTO] || + !cda[CTA_TIMEOUT_L4PROTO] || + !cda[CTA_TIMEOUT_DATA]) + return -EINVAL; + + l3num = ntohs(nla_get_be16(cda[CTA_TIMEOUT_L3PROTO])); + l4num = nla_get_u8(cda[CTA_TIMEOUT_L4PROTO]); + l4proto = nf_ct_l4proto_find_get(l3num, l4num); + + /* This protocol is not supported, skip. */ + if (l4proto->l4proto != l4num) { + ret = -EOPNOTSUPP; + goto err; + } + + timeouts = l4proto->get_timeouts(net); + + ret = ctnl_timeout_parse_policy(timeouts, l4proto, net, + cda[CTA_TIMEOUT_DATA]); + if (ret < 0) + goto err; + + nf_ct_l4proto_put(l4proto); + return 0; +err: + nf_ct_l4proto_put(l4proto); + return ret; +} + +static int +cttimeout_default_fill_info(struct net *net, struct sk_buff *skb, u32 portid, + u32 seq, u32 type, int event, + struct nf_conntrack_l4proto *l4proto) +{ + struct nlmsghdr *nlh; + struct nfgenmsg *nfmsg; + unsigned int flags = portid ? NLM_F_MULTI : 0; + + event |= NFNL_SUBSYS_CTNETLINK_TIMEOUT << 8; + nlh = nlmsg_put(skb, portid, seq, event, sizeof(*nfmsg), flags); + if (nlh == NULL) + goto nlmsg_failure; + + nfmsg = nlmsg_data(nlh); + nfmsg->nfgen_family = AF_UNSPEC; + nfmsg->version = NFNETLINK_V0; + nfmsg->res_id = 0; + + if (nla_put_be16(skb, CTA_TIMEOUT_L3PROTO, htons(l4proto->l3proto)) || + nla_put_u8(skb, CTA_TIMEOUT_L4PROTO, l4proto->l4proto)) + goto nla_put_failure; + + if (likely(l4proto->ctnl_timeout.obj_to_nlattr)) { + struct nlattr *nest_parms; + unsigned int *timeouts = l4proto->get_timeouts(net); + int ret; + + nest_parms = nla_nest_start(skb, + CTA_TIMEOUT_DATA | NLA_F_NESTED); + if (!nest_parms) + goto nla_put_failure; + + ret = l4proto->ctnl_timeout.obj_to_nlattr(skb, timeouts); + if (ret < 0) + goto nla_put_failure; + + nla_nest_end(skb, nest_parms); + } + + nlmsg_end(skb, nlh); + return skb->len; + +nlmsg_failure: +nla_put_failure: + nlmsg_cancel(skb, nlh); + return -1; +} + +static int cttimeout_default_get(struct sock *ctnl, struct sk_buff *skb, + const struct nlmsghdr *nlh, + const struct nlattr * const cda[]) +{ + __u16 l3num; + __u8 l4num; + struct nf_conntrack_l4proto *l4proto; + struct net *net = sock_net(skb->sk); + struct sk_buff *skb2; + int ret, err; + + if (!cda[CTA_TIMEOUT_L3PROTO] || !cda[CTA_TIMEOUT_L4PROTO]) + return -EINVAL; + + l3num = ntohs(nla_get_be16(cda[CTA_TIMEOUT_L3PROTO])); + l4num = nla_get_u8(cda[CTA_TIMEOUT_L4PROTO]); + l4proto = nf_ct_l4proto_find_get(l3num, l4num); + + /* This protocol is not supported, skip. */ + if (l4proto->l4proto != l4num) { + err = -EOPNOTSUPP; + goto err; + } + + skb2 = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (skb2 == NULL) { + err = -ENOMEM; + goto err; + } + + ret = cttimeout_default_fill_info(net, skb2, NETLINK_CB(skb).portid, + nlh->nlmsg_seq, + NFNL_MSG_TYPE(nlh->nlmsg_type), + IPCTNL_MSG_TIMEOUT_DEFAULT_SET, + l4proto); + if (ret <= 0) { + kfree_skb(skb2); + err = -ENOMEM; + goto err; + } + ret = netlink_unicast(ctnl, skb2, NETLINK_CB(skb).portid, MSG_DONTWAIT); + if (ret > 0) + ret = 0; + + /* this avoids a loop in nfnetlink. */ + return ret == -EAGAIN ? -ENOBUFS : ret; +err: + nf_ct_l4proto_put(l4proto); + return err; +} + #ifdef CONFIG_NF_CONNTRACK_TIMEOUT static struct ctnl_timeout *ctnl_timeout_find_get(const char *name) { @@ -384,6 +523,12 @@ static const struct nfnl_callback cttimeout_cb[IPCTNL_MSG_TIMEOUT_MAX] = { [IPCTNL_MSG_TIMEOUT_DELETE] = { .call = cttimeout_del_timeout, .attr_count = CTA_TIMEOUT_MAX, .policy = cttimeout_nla_policy }, + [IPCTNL_MSG_TIMEOUT_DEFAULT_SET]= { .call = cttimeout_default_set, + .attr_count = CTA_TIMEOUT_MAX, + .policy = cttimeout_nla_policy }, + [IPCTNL_MSG_TIMEOUT_DEFAULT_GET]= { .call = cttimeout_default_get, + .attr_count = CTA_TIMEOUT_MAX, + .policy = cttimeout_nla_policy }, }; static const struct nfnetlink_subsystem cttimeout_subsys = { diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c index d92cc317bf8b..3c4b69e5fe17 100644 --- a/net/netfilter/nfnetlink_log.c +++ b/net/netfilter/nfnetlink_log.c @@ -319,7 +319,8 @@ nfulnl_set_flags(struct nfulnl_instance *inst, u_int16_t flags) } static struct sk_buff * -nfulnl_alloc_skb(u32 peer_portid, unsigned int inst_size, unsigned int pkt_size) +nfulnl_alloc_skb(struct net *net, u32 peer_portid, unsigned int inst_size, + unsigned int pkt_size) { struct sk_buff *skb; unsigned int n; @@ -328,13 +329,13 @@ nfulnl_alloc_skb(u32 peer_portid, unsigned int inst_size, unsigned int pkt_size) * message. WARNING: has to be <= 128k due to slab restrictions */ n = max(inst_size, pkt_size); - skb = nfnetlink_alloc_skb(&init_net, n, peer_portid, GFP_ATOMIC); + skb = nfnetlink_alloc_skb(net, n, peer_portid, GFP_ATOMIC); if (!skb) { if (n > pkt_size) { /* try to allocate only as much as we need for current * packet */ - skb = nfnetlink_alloc_skb(&init_net, pkt_size, + skb = nfnetlink_alloc_skb(net, pkt_size, peer_portid, GFP_ATOMIC); if (!skb) pr_err("nfnetlink_log: can't even alloc %u bytes\n", @@ -702,8 +703,8 @@ nfulnl_log_packet(struct net *net, } if (!inst->skb) { - inst->skb = nfulnl_alloc_skb(inst->peer_portid, inst->nlbufsiz, - size); + inst->skb = nfulnl_alloc_skb(net, inst->peer_portid, + inst->nlbufsiz, size); if (!inst->skb) goto alloc_failure; } diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c index ae2e5c11d01a..21258cf70091 100644 --- a/net/netfilter/nfnetlink_queue_core.c +++ b/net/netfilter/nfnetlink_queue_core.c @@ -298,7 +298,7 @@ nfqnl_put_packet_info(struct sk_buff *nlskb, struct sk_buff *packet, } static struct sk_buff * -nfqnl_build_packet_message(struct nfqnl_instance *queue, +nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue, struct nf_queue_entry *entry, __be32 **packet_id_ptr) { @@ -372,7 +372,7 @@ nfqnl_build_packet_message(struct nfqnl_instance *queue, if (queue->flags & NFQA_CFG_F_CONNTRACK) ct = nfqnl_ct_get(entskb, &size, &ctinfo); - skb = nfnetlink_alloc_skb(&init_net, size, queue->peer_portid, + skb = nfnetlink_alloc_skb(net, size, queue->peer_portid, GFP_ATOMIC); if (!skb) return NULL; @@ -525,7 +525,7 @@ __nfqnl_enqueue_packet(struct net *net, struct nfqnl_instance *queue, __be32 *packet_id_ptr; int failopen = 0; - nskb = nfqnl_build_packet_message(queue, entry, &packet_id_ptr); + nskb = nfqnl_build_packet_message(net, queue, entry, &packet_id_ptr); if (nskb == NULL) { err = -ENOMEM; goto err_out; diff --git a/net/netfilter/nft_bitwise.c b/net/netfilter/nft_bitwise.c new file mode 100644 index 000000000000..4fb6ee2c1106 --- /dev/null +++ b/net/netfilter/nft_bitwise.c @@ -0,0 +1,146 @@ +/* + * Copyright (c) 2008-2009 Patrick McHardy <kaber@trash.net> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Development of this code funded by Astaro AG (http://www.astaro.com/) + */ + +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/module.h> +#include <linux/netlink.h> +#include <linux/netfilter.h> +#include <linux/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables_core.h> +#include <net/netfilter/nf_tables.h> + +struct nft_bitwise { + enum nft_registers sreg:8; + enum nft_registers dreg:8; + u8 len; + struct nft_data mask; + struct nft_data xor; +}; + +static void nft_bitwise_eval(const struct nft_expr *expr, + struct nft_data data[NFT_REG_MAX + 1], + const struct nft_pktinfo *pkt) +{ + const struct nft_bitwise *priv = nft_expr_priv(expr); + const struct nft_data *src = &data[priv->sreg]; + struct nft_data *dst = &data[priv->dreg]; + unsigned int i; + + for (i = 0; i < DIV_ROUND_UP(priv->len, 4); i++) { + dst->data[i] = (src->data[i] & priv->mask.data[i]) ^ + priv->xor.data[i]; + } +} + +static const struct nla_policy nft_bitwise_policy[NFTA_BITWISE_MAX + 1] = { + [NFTA_BITWISE_SREG] = { .type = NLA_U32 }, + [NFTA_BITWISE_DREG] = { .type = NLA_U32 }, + [NFTA_BITWISE_LEN] = { .type = NLA_U32 }, + [NFTA_BITWISE_MASK] = { .type = NLA_NESTED }, + [NFTA_BITWISE_XOR] = { .type = NLA_NESTED }, +}; + +static int nft_bitwise_init(const struct nft_ctx *ctx, + const struct nft_expr *expr, + const struct nlattr * const tb[]) +{ + struct nft_bitwise *priv = nft_expr_priv(expr); + struct nft_data_desc d1, d2; + int err; + + if (tb[NFTA_BITWISE_SREG] == NULL || + tb[NFTA_BITWISE_DREG] == NULL || + tb[NFTA_BITWISE_LEN] == NULL || + tb[NFTA_BITWISE_MASK] == NULL || + tb[NFTA_BITWISE_XOR] == NULL) + return -EINVAL; + + priv->sreg = ntohl(nla_get_be32(tb[NFTA_BITWISE_SREG])); + err = nft_validate_input_register(priv->sreg); + if (err < 0) + return err; + + priv->dreg = ntohl(nla_get_be32(tb[NFTA_BITWISE_DREG])); + err = nft_validate_output_register(priv->dreg); + if (err < 0) + return err; + err = nft_validate_data_load(ctx, priv->dreg, NULL, NFT_DATA_VALUE); + if (err < 0) + return err; + + priv->len = ntohl(nla_get_be32(tb[NFTA_BITWISE_LEN])); + + err = nft_data_init(NULL, &priv->mask, &d1, tb[NFTA_BITWISE_MASK]); + if (err < 0) + return err; + if (d1.len != priv->len) + return -EINVAL; + + err = nft_data_init(NULL, &priv->xor, &d2, tb[NFTA_BITWISE_XOR]); + if (err < 0) + return err; + if (d2.len != priv->len) + return -EINVAL; + + return 0; +} + +static int nft_bitwise_dump(struct sk_buff *skb, const struct nft_expr *expr) +{ + const struct nft_bitwise *priv = nft_expr_priv(expr); + + if (nla_put_be32(skb, NFTA_BITWISE_SREG, htonl(priv->sreg))) + goto nla_put_failure; + if (nla_put_be32(skb, NFTA_BITWISE_DREG, htonl(priv->dreg))) + goto nla_put_failure; + if (nla_put_be32(skb, NFTA_BITWISE_LEN, htonl(priv->len))) + goto nla_put_failure; + + if (nft_data_dump(skb, NFTA_BITWISE_MASK, &priv->mask, + NFT_DATA_VALUE, priv->len) < 0) + goto nla_put_failure; + + if (nft_data_dump(skb, NFTA_BITWISE_XOR, &priv->xor, + NFT_DATA_VALUE, priv->len) < 0) + goto nla_put_failure; + + return 0; + +nla_put_failure: + return -1; +} + +static struct nft_expr_type nft_bitwise_type; +static const struct nft_expr_ops nft_bitwise_ops = { + .type = &nft_bitwise_type, + .size = NFT_EXPR_SIZE(sizeof(struct nft_bitwise)), + .eval = nft_bitwise_eval, + .init = nft_bitwise_init, + .dump = nft_bitwise_dump, +}; + +static struct nft_expr_type nft_bitwise_type __read_mostly = { + .name = "bitwise", + .ops = &nft_bitwise_ops, + .policy = nft_bitwise_policy, + .maxattr = NFTA_BITWISE_MAX, + .owner = THIS_MODULE, +}; + +int __init nft_bitwise_module_init(void) +{ + return nft_register_expr(&nft_bitwise_type); +} + +void nft_bitwise_module_exit(void) +{ + nft_unregister_expr(&nft_bitwise_type); +} diff --git a/net/netfilter/nft_byteorder.c b/net/netfilter/nft_byteorder.c new file mode 100644 index 000000000000..c39ed8d29df1 --- /dev/null +++ b/net/netfilter/nft_byteorder.c @@ -0,0 +1,173 @@ +/* + * Copyright (c) 2008-2009 Patrick McHardy <kaber@trash.net> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Development of this code funded by Astaro AG (http://www.astaro.com/) + */ + +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/module.h> +#include <linux/netlink.h> +#include <linux/netfilter.h> +#include <linux/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables_core.h> +#include <net/netfilter/nf_tables.h> + +struct nft_byteorder { + enum nft_registers sreg:8; + enum nft_registers dreg:8; + enum nft_byteorder_ops op:8; + u8 len; + u8 size; +}; + +static void nft_byteorder_eval(const struct nft_expr *expr, + struct nft_data data[NFT_REG_MAX + 1], + const struct nft_pktinfo *pkt) +{ + const struct nft_byteorder *priv = nft_expr_priv(expr); + struct nft_data *src = &data[priv->sreg], *dst = &data[priv->dreg]; + union { u32 u32; u16 u16; } *s, *d; + unsigned int i; + + s = (void *)src->data; + d = (void *)dst->data; + + switch (priv->size) { + case 4: + switch (priv->op) { + case NFT_BYTEORDER_NTOH: + for (i = 0; i < priv->len / 4; i++) + d[i].u32 = ntohl((__force __be32)s[i].u32); + break; + case NFT_BYTEORDER_HTON: + for (i = 0; i < priv->len / 4; i++) + d[i].u32 = (__force __u32)htonl(s[i].u32); + break; + } + break; + case 2: + switch (priv->op) { + case NFT_BYTEORDER_NTOH: + for (i = 0; i < priv->len / 2; i++) + d[i].u16 = ntohs((__force __be16)s[i].u16); + break; + case NFT_BYTEORDER_HTON: + for (i = 0; i < priv->len / 2; i++) + d[i].u16 = (__force __u16)htons(s[i].u16); + break; + } + break; + } +} + +static const struct nla_policy nft_byteorder_policy[NFTA_BYTEORDER_MAX + 1] = { + [NFTA_BYTEORDER_SREG] = { .type = NLA_U32 }, + [NFTA_BYTEORDER_DREG] = { .type = NLA_U32 }, + [NFTA_BYTEORDER_OP] = { .type = NLA_U32 }, + [NFTA_BYTEORDER_LEN] = { .type = NLA_U32 }, + [NFTA_BYTEORDER_SIZE] = { .type = NLA_U32 }, +}; + +static int nft_byteorder_init(const struct nft_ctx *ctx, + const struct nft_expr *expr, + const struct nlattr * const tb[]) +{ + struct nft_byteorder *priv = nft_expr_priv(expr); + int err; + + if (tb[NFTA_BYTEORDER_SREG] == NULL || + tb[NFTA_BYTEORDER_DREG] == NULL || + tb[NFTA_BYTEORDER_LEN] == NULL || + tb[NFTA_BYTEORDER_SIZE] == NULL || + tb[NFTA_BYTEORDER_OP] == NULL) + return -EINVAL; + + priv->sreg = ntohl(nla_get_be32(tb[NFTA_BYTEORDER_SREG])); + err = nft_validate_input_register(priv->sreg); + if (err < 0) + return err; + + priv->dreg = ntohl(nla_get_be32(tb[NFTA_BYTEORDER_DREG])); + err = nft_validate_output_register(priv->dreg); + if (err < 0) + return err; + err = nft_validate_data_load(ctx, priv->dreg, NULL, NFT_DATA_VALUE); + if (err < 0) + return err; + + priv->op = ntohl(nla_get_be32(tb[NFTA_BYTEORDER_OP])); + switch (priv->op) { + case NFT_BYTEORDER_NTOH: + case NFT_BYTEORDER_HTON: + break; + default: + return -EINVAL; + } + + priv->len = ntohl(nla_get_be32(tb[NFTA_BYTEORDER_LEN])); + if (priv->len == 0 || priv->len > FIELD_SIZEOF(struct nft_data, data)) + return -EINVAL; + + priv->size = ntohl(nla_get_be32(tb[NFTA_BYTEORDER_SIZE])); + switch (priv->size) { + case 2: + case 4: + break; + default: + return -EINVAL; + } + + return 0; +} + +static int nft_byteorder_dump(struct sk_buff *skb, const struct nft_expr *expr) +{ + const struct nft_byteorder *priv = nft_expr_priv(expr); + + if (nla_put_be32(skb, NFTA_BYTEORDER_SREG, htonl(priv->sreg))) + goto nla_put_failure; + if (nla_put_be32(skb, NFTA_BYTEORDER_DREG, htonl(priv->dreg))) + goto nla_put_failure; + if (nla_put_be32(skb, NFTA_BYTEORDER_OP, htonl(priv->op))) + goto nla_put_failure; + if (nla_put_be32(skb, NFTA_BYTEORDER_LEN, htonl(priv->len))) + goto nla_put_failure; + if (nla_put_be32(skb, NFTA_BYTEORDER_SIZE, htonl(priv->size))) + goto nla_put_failure; + return 0; + +nla_put_failure: + return -1; +} + +static struct nft_expr_type nft_byteorder_type; +static const struct nft_expr_ops nft_byteorder_ops = { + .type = &nft_byteorder_type, + .size = NFT_EXPR_SIZE(sizeof(struct nft_byteorder)), + .eval = nft_byteorder_eval, + .init = nft_byteorder_init, + .dump = nft_byteorder_dump, +}; + +static struct nft_expr_type nft_byteorder_type __read_mostly = { + .name = "byteorder", + .ops = &nft_byteorder_ops, + .policy = nft_byteorder_policy, + .maxattr = NFTA_BYTEORDER_MAX, + .owner = THIS_MODULE, +}; + +int __init nft_byteorder_module_init(void) +{ + return nft_register_expr(&nft_byteorder_type); +} + +void nft_byteorder_module_exit(void) +{ + nft_unregister_expr(&nft_byteorder_type); +} diff --git a/net/netfilter/nft_cmp.c b/net/netfilter/nft_cmp.c new file mode 100644 index 000000000000..954925db414d --- /dev/null +++ b/net/netfilter/nft_cmp.c @@ -0,0 +1,223 @@ +/* + * Copyright (c) 2008-2009 Patrick McHardy <kaber@trash.net> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Development of this code funded by Astaro AG (http://www.astaro.com/) + */ + +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/module.h> +#include <linux/netlink.h> +#include <linux/netfilter.h> +#include <linux/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables_core.h> +#include <net/netfilter/nf_tables.h> + +struct nft_cmp_expr { + struct nft_data data; + enum nft_registers sreg:8; + u8 len; + enum nft_cmp_ops op:8; +}; + +static void nft_cmp_eval(const struct nft_expr *expr, + struct nft_data data[NFT_REG_MAX + 1], + const struct nft_pktinfo *pkt) +{ + const struct nft_cmp_expr *priv = nft_expr_priv(expr); + int d; + + d = nft_data_cmp(&data[priv->sreg], &priv->data, priv->len); + switch (priv->op) { + case NFT_CMP_EQ: + if (d != 0) + goto mismatch; + break; + case NFT_CMP_NEQ: + if (d == 0) + goto mismatch; + break; + case NFT_CMP_LT: + if (d == 0) + goto mismatch; + case NFT_CMP_LTE: + if (d > 0) + goto mismatch; + break; + case NFT_CMP_GT: + if (d == 0) + goto mismatch; + case NFT_CMP_GTE: + if (d < 0) + goto mismatch; + break; + } + return; + +mismatch: + data[NFT_REG_VERDICT].verdict = NFT_BREAK; +} + +static const struct nla_policy nft_cmp_policy[NFTA_CMP_MAX + 1] = { + [NFTA_CMP_SREG] = { .type = NLA_U32 }, + [NFTA_CMP_OP] = { .type = NLA_U32 }, + [NFTA_CMP_DATA] = { .type = NLA_NESTED }, +}; + +static int nft_cmp_init(const struct nft_ctx *ctx, const struct nft_expr *expr, + const struct nlattr * const tb[]) +{ + struct nft_cmp_expr *priv = nft_expr_priv(expr); + struct nft_data_desc desc; + int err; + + priv->sreg = ntohl(nla_get_be32(tb[NFTA_CMP_SREG])); + priv->op = ntohl(nla_get_be32(tb[NFTA_CMP_OP])); + + err = nft_data_init(NULL, &priv->data, &desc, tb[NFTA_CMP_DATA]); + BUG_ON(err < 0); + + priv->len = desc.len; + return 0; +} + +static int nft_cmp_dump(struct sk_buff *skb, const struct nft_expr *expr) +{ + const struct nft_cmp_expr *priv = nft_expr_priv(expr); + + if (nla_put_be32(skb, NFTA_CMP_SREG, htonl(priv->sreg))) + goto nla_put_failure; + if (nla_put_be32(skb, NFTA_CMP_OP, htonl(priv->op))) + goto nla_put_failure; + + if (nft_data_dump(skb, NFTA_CMP_DATA, &priv->data, + NFT_DATA_VALUE, priv->len) < 0) + goto nla_put_failure; + return 0; + +nla_put_failure: + return -1; +} + +static struct nft_expr_type nft_cmp_type; +static const struct nft_expr_ops nft_cmp_ops = { + .type = &nft_cmp_type, + .size = NFT_EXPR_SIZE(sizeof(struct nft_cmp_expr)), + .eval = nft_cmp_eval, + .init = nft_cmp_init, + .dump = nft_cmp_dump, +}; + +static int nft_cmp_fast_init(const struct nft_ctx *ctx, + const struct nft_expr *expr, + const struct nlattr * const tb[]) +{ + struct nft_cmp_fast_expr *priv = nft_expr_priv(expr); + struct nft_data_desc desc; + struct nft_data data; + u32 mask; + int err; + + priv->sreg = ntohl(nla_get_be32(tb[NFTA_CMP_SREG])); + + err = nft_data_init(NULL, &data, &desc, tb[NFTA_CMP_DATA]); + BUG_ON(err < 0); + desc.len *= BITS_PER_BYTE; + + mask = ~0U >> (sizeof(priv->data) * BITS_PER_BYTE - desc.len); + priv->data = data.data[0] & mask; + priv->len = desc.len; + return 0; +} + +static int nft_cmp_fast_dump(struct sk_buff *skb, const struct nft_expr *expr) +{ + const struct nft_cmp_fast_expr *priv = nft_expr_priv(expr); + struct nft_data data; + + if (nla_put_be32(skb, NFTA_CMP_SREG, htonl(priv->sreg))) + goto nla_put_failure; + if (nla_put_be32(skb, NFTA_CMP_OP, htonl(NFT_CMP_EQ))) + goto nla_put_failure; + + data.data[0] = priv->data; + if (nft_data_dump(skb, NFTA_CMP_DATA, &data, + NFT_DATA_VALUE, priv->len / BITS_PER_BYTE) < 0) + goto nla_put_failure; + return 0; + +nla_put_failure: + return -1; +} + +const struct nft_expr_ops nft_cmp_fast_ops = { + .type = &nft_cmp_type, + .size = NFT_EXPR_SIZE(sizeof(struct nft_cmp_fast_expr)), + .eval = NULL, /* inlined */ + .init = nft_cmp_fast_init, + .dump = nft_cmp_fast_dump, +}; + +static const struct nft_expr_ops * +nft_cmp_select_ops(const struct nft_ctx *ctx, const struct nlattr * const tb[]) +{ + struct nft_data_desc desc; + struct nft_data data; + enum nft_registers sreg; + enum nft_cmp_ops op; + int err; + + if (tb[NFTA_CMP_SREG] == NULL || + tb[NFTA_CMP_OP] == NULL || + tb[NFTA_CMP_DATA] == NULL) + return ERR_PTR(-EINVAL); + + sreg = ntohl(nla_get_be32(tb[NFTA_CMP_SREG])); + err = nft_validate_input_register(sreg); + if (err < 0) + return ERR_PTR(err); + + op = ntohl(nla_get_be32(tb[NFTA_CMP_OP])); + switch (op) { + case NFT_CMP_EQ: + case NFT_CMP_NEQ: + case NFT_CMP_LT: + case NFT_CMP_LTE: + case NFT_CMP_GT: + case NFT_CMP_GTE: + break; + default: + return ERR_PTR(-EINVAL); + } + + err = nft_data_init(NULL, &data, &desc, tb[NFTA_CMP_DATA]); + if (err < 0) + return ERR_PTR(err); + + if (desc.len <= sizeof(u32) && op == NFT_CMP_EQ) + return &nft_cmp_fast_ops; + else + return &nft_cmp_ops; +} + +static struct nft_expr_type nft_cmp_type __read_mostly = { + .name = "cmp", + .select_ops = nft_cmp_select_ops, + .policy = nft_cmp_policy, + .maxattr = NFTA_CMP_MAX, + .owner = THIS_MODULE, +}; + +int __init nft_cmp_module_init(void) +{ + return nft_register_expr(&nft_cmp_type); +} + +void nft_cmp_module_exit(void) +{ + nft_unregister_expr(&nft_cmp_type); +} diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c new file mode 100644 index 000000000000..a82667c64729 --- /dev/null +++ b/net/netfilter/nft_compat.c @@ -0,0 +1,768 @@ +/* + * (C) 2012-2013 by Pablo Neira Ayuso <pablo@netfilter.org> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This software has been sponsored by Sophos Astaro <http://www.sophos.com> + */ + +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/module.h> +#include <linux/netlink.h> +#include <linux/netfilter.h> +#include <linux/netfilter/nfnetlink.h> +#include <linux/netfilter/nf_tables.h> +#include <linux/netfilter/nf_tables_compat.h> +#include <linux/netfilter/x_tables.h> +#include <linux/netfilter_ipv4/ip_tables.h> +#include <linux/netfilter_ipv6/ip6_tables.h> +#include <asm/uaccess.h> /* for set_fs */ +#include <net/netfilter/nf_tables.h> + +union nft_entry { + struct ipt_entry e4; + struct ip6t_entry e6; +}; + +static inline void +nft_compat_set_par(struct xt_action_param *par, void *xt, const void *xt_info) +{ + par->target = xt; + par->targinfo = xt_info; + par->hotdrop = false; +} + +static void nft_target_eval(const struct nft_expr *expr, + struct nft_data data[NFT_REG_MAX + 1], + const struct nft_pktinfo *pkt) +{ + void *info = nft_expr_priv(expr); + struct xt_target *target = expr->ops->data; + struct sk_buff *skb = pkt->skb; + int ret; + + nft_compat_set_par((struct xt_action_param *)&pkt->xt, target, info); + + ret = target->target(skb, &pkt->xt); + + if (pkt->xt.hotdrop) + ret = NF_DROP; + + switch(ret) { + case XT_CONTINUE: + data[NFT_REG_VERDICT].verdict = NFT_CONTINUE; + break; + default: + data[NFT_REG_VERDICT].verdict = ret; + break; + } + return; +} + +static const struct nla_policy nft_target_policy[NFTA_TARGET_MAX + 1] = { + [NFTA_TARGET_NAME] = { .type = NLA_NUL_STRING }, + [NFTA_TARGET_REV] = { .type = NLA_U32 }, + [NFTA_TARGET_INFO] = { .type = NLA_BINARY }, +}; + +static void +nft_target_set_tgchk_param(struct xt_tgchk_param *par, + const struct nft_ctx *ctx, + struct xt_target *target, void *info, + union nft_entry *entry, u8 proto, bool inv) +{ + par->net = &init_net; + par->table = ctx->table->name; + switch (ctx->afi->family) { + case AF_INET: + entry->e4.ip.proto = proto; + entry->e4.ip.invflags = inv ? IPT_INV_PROTO : 0; + break; + case AF_INET6: + entry->e6.ipv6.proto = proto; + entry->e6.ipv6.invflags = inv ? IP6T_INV_PROTO : 0; + break; + } + par->entryinfo = entry; + par->target = target; + par->targinfo = info; + if (ctx->chain->flags & NFT_BASE_CHAIN) { + const struct nft_base_chain *basechain = + nft_base_chain(ctx->chain); + const struct nf_hook_ops *ops = &basechain->ops; + + par->hook_mask = 1 << ops->hooknum; + } + par->family = ctx->afi->family; +} + +static void target_compat_from_user(struct xt_target *t, void *in, void *out) +{ +#ifdef CONFIG_COMPAT + if (t->compat_from_user) { + int pad; + + t->compat_from_user(out, in); + pad = XT_ALIGN(t->targetsize) - t->targetsize; + if (pad > 0) + memset(out + t->targetsize, 0, pad); + } else +#endif + memcpy(out, in, XT_ALIGN(t->targetsize)); +} + +static inline int nft_compat_target_offset(struct xt_target *target) +{ +#ifdef CONFIG_COMPAT + return xt_compat_target_offset(target); +#else + return 0; +#endif +} + +static const struct nla_policy nft_rule_compat_policy[NFTA_RULE_COMPAT_MAX + 1] = { + [NFTA_RULE_COMPAT_PROTO] = { .type = NLA_U32 }, + [NFTA_RULE_COMPAT_FLAGS] = { .type = NLA_U32 }, +}; + +static u8 nft_parse_compat(const struct nlattr *attr, bool *inv) +{ + struct nlattr *tb[NFTA_RULE_COMPAT_MAX+1]; + u32 flags; + int err; + + err = nla_parse_nested(tb, NFTA_RULE_COMPAT_MAX, attr, + nft_rule_compat_policy); + if (err < 0) + return err; + + if (!tb[NFTA_RULE_COMPAT_PROTO] || !tb[NFTA_RULE_COMPAT_FLAGS]) + return -EINVAL; + + flags = ntohl(nla_get_be32(tb[NFTA_RULE_COMPAT_FLAGS])); + if (flags & ~NFT_RULE_COMPAT_F_MASK) + return -EINVAL; + if (flags & NFT_RULE_COMPAT_F_INV) + *inv = true; + + return ntohl(nla_get_be32(tb[NFTA_RULE_COMPAT_PROTO])); +} + +static int +nft_target_init(const struct nft_ctx *ctx, const struct nft_expr *expr, + const struct nlattr * const tb[]) +{ + void *info = nft_expr_priv(expr); + struct xt_target *target = expr->ops->data; + struct xt_tgchk_param par; + size_t size = XT_ALIGN(nla_len(tb[NFTA_TARGET_INFO])); + u8 proto = 0; + bool inv = false; + union nft_entry e = {}; + int ret; + + target_compat_from_user(target, nla_data(tb[NFTA_TARGET_INFO]), info); + + if (ctx->nla[NFTA_RULE_COMPAT]) + proto = nft_parse_compat(ctx->nla[NFTA_RULE_COMPAT], &inv); + + nft_target_set_tgchk_param(&par, ctx, target, info, &e, proto, inv); + + ret = xt_check_target(&par, size, proto, inv); + if (ret < 0) + goto err; + + /* The standard target cannot be used */ + if (target->target == NULL) { + ret = -EINVAL; + goto err; + } + + return 0; +err: + module_put(target->me); + return ret; +} + +static void +nft_target_destroy(const struct nft_expr *expr) +{ + struct xt_target *target = expr->ops->data; + + module_put(target->me); +} + +static int +target_dump_info(struct sk_buff *skb, const struct xt_target *t, const void *in) +{ + int ret; + +#ifdef CONFIG_COMPAT + if (t->compat_to_user) { + mm_segment_t old_fs; + void *out; + + out = kmalloc(XT_ALIGN(t->targetsize), GFP_ATOMIC); + if (out == NULL) + return -ENOMEM; + + /* We want to reuse existing compat_to_user */ + old_fs = get_fs(); + set_fs(KERNEL_DS); + t->compat_to_user(out, in); + set_fs(old_fs); + ret = nla_put(skb, NFTA_TARGET_INFO, XT_ALIGN(t->targetsize), out); + kfree(out); + } else +#endif + ret = nla_put(skb, NFTA_TARGET_INFO, XT_ALIGN(t->targetsize), in); + + return ret; +} + +static int nft_target_dump(struct sk_buff *skb, const struct nft_expr *expr) +{ + const struct xt_target *target = expr->ops->data; + void *info = nft_expr_priv(expr); + + if (nla_put_string(skb, NFTA_TARGET_NAME, target->name) || + nla_put_be32(skb, NFTA_TARGET_REV, htonl(target->revision)) || + target_dump_info(skb, target, info)) + goto nla_put_failure; + + return 0; + +nla_put_failure: + return -1; +} + +static int nft_target_validate(const struct nft_ctx *ctx, + const struct nft_expr *expr, + const struct nft_data **data) +{ + struct xt_target *target = expr->ops->data; + unsigned int hook_mask = 0; + + if (ctx->chain->flags & NFT_BASE_CHAIN) { + const struct nft_base_chain *basechain = + nft_base_chain(ctx->chain); + const struct nf_hook_ops *ops = &basechain->ops; + + hook_mask = 1 << ops->hooknum; + if (hook_mask & target->hooks) + return 0; + + /* This target is being called from an invalid chain */ + return -EINVAL; + } + return 0; +} + +static void nft_match_eval(const struct nft_expr *expr, + struct nft_data data[NFT_REG_MAX + 1], + const struct nft_pktinfo *pkt) +{ + void *info = nft_expr_priv(expr); + struct xt_match *match = expr->ops->data; + struct sk_buff *skb = pkt->skb; + bool ret; + + nft_compat_set_par((struct xt_action_param *)&pkt->xt, match, info); + + ret = match->match(skb, (struct xt_action_param *)&pkt->xt); + + if (pkt->xt.hotdrop) { + data[NFT_REG_VERDICT].verdict = NF_DROP; + return; + } + + switch(ret) { + case true: + data[NFT_REG_VERDICT].verdict = NFT_CONTINUE; + break; + case false: + data[NFT_REG_VERDICT].verdict = NFT_BREAK; + break; + } +} + +static const struct nla_policy nft_match_policy[NFTA_MATCH_MAX + 1] = { + [NFTA_MATCH_NAME] = { .type = NLA_NUL_STRING }, + [NFTA_MATCH_REV] = { .type = NLA_U32 }, + [NFTA_MATCH_INFO] = { .type = NLA_BINARY }, +}; + +/* struct xt_mtchk_param and xt_tgchk_param look very similar */ +static void +nft_match_set_mtchk_param(struct xt_mtchk_param *par, const struct nft_ctx *ctx, + struct xt_match *match, void *info, + union nft_entry *entry, u8 proto, bool inv) +{ + par->net = &init_net; + par->table = ctx->table->name; + switch (ctx->afi->family) { + case AF_INET: + entry->e4.ip.proto = proto; + entry->e4.ip.invflags = inv ? IPT_INV_PROTO : 0; + break; + case AF_INET6: + entry->e6.ipv6.proto = proto; + entry->e6.ipv6.invflags = inv ? IP6T_INV_PROTO : 0; + break; + } + par->entryinfo = entry; + par->match = match; + par->matchinfo = info; + if (ctx->chain->flags & NFT_BASE_CHAIN) { + const struct nft_base_chain *basechain = + nft_base_chain(ctx->chain); + const struct nf_hook_ops *ops = &basechain->ops; + + par->hook_mask = 1 << ops->hooknum; + } + par->family = ctx->afi->family; +} + +static void match_compat_from_user(struct xt_match *m, void *in, void *out) +{ +#ifdef CONFIG_COMPAT + if (m->compat_from_user) { + int pad; + + m->compat_from_user(out, in); + pad = XT_ALIGN(m->matchsize) - m->matchsize; + if (pad > 0) + memset(out + m->matchsize, 0, pad); + } else +#endif + memcpy(out, in, XT_ALIGN(m->matchsize)); +} + +static int +nft_match_init(const struct nft_ctx *ctx, const struct nft_expr *expr, + const struct nlattr * const tb[]) +{ + void *info = nft_expr_priv(expr); + struct xt_match *match = expr->ops->data; + struct xt_mtchk_param par; + size_t size = XT_ALIGN(nla_len(tb[NFTA_MATCH_INFO])); + u8 proto = 0; + bool inv = false; + union nft_entry e = {}; + int ret; + + match_compat_from_user(match, nla_data(tb[NFTA_MATCH_INFO]), info); + + if (ctx->nla[NFTA_RULE_COMPAT]) + proto = nft_parse_compat(ctx->nla[NFTA_RULE_COMPAT], &inv); + + nft_match_set_mtchk_param(&par, ctx, match, info, &e, proto, inv); + + ret = xt_check_match(&par, size, proto, inv); + if (ret < 0) + goto err; + + return 0; +err: + module_put(match->me); + return ret; +} + +static void +nft_match_destroy(const struct nft_expr *expr) +{ + struct xt_match *match = expr->ops->data; + + module_put(match->me); +} + +static int +match_dump_info(struct sk_buff *skb, const struct xt_match *m, const void *in) +{ + int ret; + +#ifdef CONFIG_COMPAT + if (m->compat_to_user) { + mm_segment_t old_fs; + void *out; + + out = kmalloc(XT_ALIGN(m->matchsize), GFP_ATOMIC); + if (out == NULL) + return -ENOMEM; + + /* We want to reuse existing compat_to_user */ + old_fs = get_fs(); + set_fs(KERNEL_DS); + m->compat_to_user(out, in); + set_fs(old_fs); + ret = nla_put(skb, NFTA_MATCH_INFO, XT_ALIGN(m->matchsize), out); + kfree(out); + } else +#endif + ret = nla_put(skb, NFTA_MATCH_INFO, XT_ALIGN(m->matchsize), in); + + return ret; +} + +static inline int nft_compat_match_offset(struct xt_match *match) +{ +#ifdef CONFIG_COMPAT + return xt_compat_match_offset(match); +#else + return 0; +#endif +} + +static int nft_match_dump(struct sk_buff *skb, const struct nft_expr *expr) +{ + void *info = nft_expr_priv(expr); + struct xt_match *match = expr->ops->data; + + if (nla_put_string(skb, NFTA_MATCH_NAME, match->name) || + nla_put_be32(skb, NFTA_MATCH_REV, htonl(match->revision)) || + match_dump_info(skb, match, info)) + goto nla_put_failure; + + return 0; + +nla_put_failure: + return -1; +} + +static int nft_match_validate(const struct nft_ctx *ctx, + const struct nft_expr *expr, + const struct nft_data **data) +{ + struct xt_match *match = expr->ops->data; + unsigned int hook_mask = 0; + + if (ctx->chain->flags & NFT_BASE_CHAIN) { + const struct nft_base_chain *basechain = + nft_base_chain(ctx->chain); + const struct nf_hook_ops *ops = &basechain->ops; + + hook_mask = 1 << ops->hooknum; + if (hook_mask & match->hooks) + return 0; + + /* This match is being called from an invalid chain */ + return -EINVAL; + } + return 0; +} + +static int +nfnl_compat_fill_info(struct sk_buff *skb, u32 portid, u32 seq, u32 type, + int event, u16 family, const char *name, + int rev, int target) +{ + struct nlmsghdr *nlh; + struct nfgenmsg *nfmsg; + unsigned int flags = portid ? NLM_F_MULTI : 0; + + event |= NFNL_SUBSYS_NFT_COMPAT << 8; + nlh = nlmsg_put(skb, portid, seq, event, sizeof(*nfmsg), flags); + if (nlh == NULL) + goto nlmsg_failure; + + nfmsg = nlmsg_data(nlh); + nfmsg->nfgen_family = family; + nfmsg->version = NFNETLINK_V0; + nfmsg->res_id = 0; + + if (nla_put_string(skb, NFTA_COMPAT_NAME, name) || + nla_put_be32(skb, NFTA_COMPAT_REV, htonl(rev)) || + nla_put_be32(skb, NFTA_COMPAT_TYPE, htonl(target))) + goto nla_put_failure; + + nlmsg_end(skb, nlh); + return skb->len; + +nlmsg_failure: +nla_put_failure: + nlmsg_cancel(skb, nlh); + return -1; +} + +static int +nfnl_compat_get(struct sock *nfnl, struct sk_buff *skb, + const struct nlmsghdr *nlh, const struct nlattr * const tb[]) +{ + int ret = 0, target; + struct nfgenmsg *nfmsg; + const char *fmt; + const char *name; + u32 rev; + struct sk_buff *skb2; + + if (tb[NFTA_COMPAT_NAME] == NULL || + tb[NFTA_COMPAT_REV] == NULL || + tb[NFTA_COMPAT_TYPE] == NULL) + return -EINVAL; + + name = nla_data(tb[NFTA_COMPAT_NAME]); + rev = ntohl(nla_get_be32(tb[NFTA_COMPAT_REV])); + target = ntohl(nla_get_be32(tb[NFTA_COMPAT_TYPE])); + + nfmsg = nlmsg_data(nlh); + + switch(nfmsg->nfgen_family) { + case AF_INET: + fmt = "ipt_%s"; + break; + case AF_INET6: + fmt = "ip6t_%s"; + break; + default: + pr_err("nft_compat: unsupported protocol %d\n", + nfmsg->nfgen_family); + return -EINVAL; + } + + try_then_request_module(xt_find_revision(nfmsg->nfgen_family, name, + rev, target, &ret), + fmt, name); + + if (ret < 0) + return ret; + + skb2 = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (skb2 == NULL) + return -ENOMEM; + + /* include the best revision for this extension in the message */ + if (nfnl_compat_fill_info(skb2, NETLINK_CB(skb).portid, + nlh->nlmsg_seq, + NFNL_MSG_TYPE(nlh->nlmsg_type), + NFNL_MSG_COMPAT_GET, + nfmsg->nfgen_family, + name, ret, target) <= 0) { + kfree_skb(skb2); + return -ENOSPC; + } + + ret = netlink_unicast(nfnl, skb2, NETLINK_CB(skb).portid, + MSG_DONTWAIT); + if (ret > 0) + ret = 0; + + return ret == -EAGAIN ? -ENOBUFS : ret; +} + +static const struct nla_policy nfnl_compat_policy_get[NFTA_COMPAT_MAX+1] = { + [NFTA_COMPAT_NAME] = { .type = NLA_NUL_STRING, + .len = NFT_COMPAT_NAME_MAX-1 }, + [NFTA_COMPAT_REV] = { .type = NLA_U32 }, + [NFTA_COMPAT_TYPE] = { .type = NLA_U32 }, +}; + +static const struct nfnl_callback nfnl_nft_compat_cb[NFNL_MSG_COMPAT_MAX] = { + [NFNL_MSG_COMPAT_GET] = { .call = nfnl_compat_get, + .attr_count = NFTA_COMPAT_MAX, + .policy = nfnl_compat_policy_get }, +}; + +static const struct nfnetlink_subsystem nfnl_compat_subsys = { + .name = "nft-compat", + .subsys_id = NFNL_SUBSYS_NFT_COMPAT, + .cb_count = NFNL_MSG_COMPAT_MAX, + .cb = nfnl_nft_compat_cb, +}; + +static LIST_HEAD(nft_match_list); + +struct nft_xt { + struct list_head head; + struct nft_expr_ops ops; +}; + +static struct nft_expr_type nft_match_type; + +static const struct nft_expr_ops * +nft_match_select_ops(const struct nft_ctx *ctx, + const struct nlattr * const tb[]) +{ + struct nft_xt *nft_match; + struct xt_match *match; + char *mt_name; + __u32 rev, family; + + if (tb[NFTA_MATCH_NAME] == NULL || + tb[NFTA_MATCH_REV] == NULL || + tb[NFTA_MATCH_INFO] == NULL) + return ERR_PTR(-EINVAL); + + mt_name = nla_data(tb[NFTA_MATCH_NAME]); + rev = ntohl(nla_get_be32(tb[NFTA_MATCH_REV])); + family = ctx->afi->family; + + /* Re-use the existing match if it's already loaded. */ + list_for_each_entry(nft_match, &nft_match_list, head) { + struct xt_match *match = nft_match->ops.data; + + if (strcmp(match->name, mt_name) == 0 && + match->revision == rev && match->family == family) + return &nft_match->ops; + } + + match = xt_request_find_match(family, mt_name, rev); + if (IS_ERR(match)) + return ERR_PTR(-ENOENT); + + /* This is the first time we use this match, allocate operations */ + nft_match = kzalloc(sizeof(struct nft_xt), GFP_KERNEL); + if (nft_match == NULL) + return ERR_PTR(-ENOMEM); + + nft_match->ops.type = &nft_match_type; + nft_match->ops.size = NFT_EXPR_SIZE(XT_ALIGN(match->matchsize) + + nft_compat_match_offset(match)); + nft_match->ops.eval = nft_match_eval; + nft_match->ops.init = nft_match_init; + nft_match->ops.destroy = nft_match_destroy; + nft_match->ops.dump = nft_match_dump; + nft_match->ops.validate = nft_match_validate; + nft_match->ops.data = match; + + list_add(&nft_match->head, &nft_match_list); + + return &nft_match->ops; +} + +static void nft_match_release(void) +{ + struct nft_xt *nft_match, *tmp; + + list_for_each_entry_safe(nft_match, tmp, &nft_match_list, head) + kfree(nft_match); +} + +static struct nft_expr_type nft_match_type __read_mostly = { + .name = "match", + .select_ops = nft_match_select_ops, + .policy = nft_match_policy, + .maxattr = NFTA_MATCH_MAX, + .owner = THIS_MODULE, +}; + +static LIST_HEAD(nft_target_list); + +static struct nft_expr_type nft_target_type; + +static const struct nft_expr_ops * +nft_target_select_ops(const struct nft_ctx *ctx, + const struct nlattr * const tb[]) +{ + struct nft_xt *nft_target; + struct xt_target *target; + char *tg_name; + __u32 rev, family; + + if (tb[NFTA_TARGET_NAME] == NULL || + tb[NFTA_TARGET_REV] == NULL || + tb[NFTA_TARGET_INFO] == NULL) + return ERR_PTR(-EINVAL); + + tg_name = nla_data(tb[NFTA_TARGET_NAME]); + rev = ntohl(nla_get_be32(tb[NFTA_TARGET_REV])); + family = ctx->afi->family; + + /* Re-use the existing target if it's already loaded. */ + list_for_each_entry(nft_target, &nft_match_list, head) { + struct xt_target *target = nft_target->ops.data; + + if (strcmp(target->name, tg_name) == 0 && + target->revision == rev && target->family == family) + return &nft_target->ops; + } + + target = xt_request_find_target(family, tg_name, rev); + if (IS_ERR(target)) + return ERR_PTR(-ENOENT); + + /* This is the first time we use this target, allocate operations */ + nft_target = kzalloc(sizeof(struct nft_xt), GFP_KERNEL); + if (nft_target == NULL) + return ERR_PTR(-ENOMEM); + + nft_target->ops.type = &nft_target_type; + nft_target->ops.size = NFT_EXPR_SIZE(XT_ALIGN(target->targetsize) + + nft_compat_target_offset(target)); + nft_target->ops.eval = nft_target_eval; + nft_target->ops.init = nft_target_init; + nft_target->ops.destroy = nft_target_destroy; + nft_target->ops.dump = nft_target_dump; + nft_target->ops.validate = nft_target_validate; + nft_target->ops.data = target; + + list_add(&nft_target->head, &nft_target_list); + + return &nft_target->ops; +} + +static void nft_target_release(void) +{ + struct nft_xt *nft_target, *tmp; + + list_for_each_entry_safe(nft_target, tmp, &nft_target_list, head) + kfree(nft_target); +} + +static struct nft_expr_type nft_target_type __read_mostly = { + .name = "target", + .select_ops = nft_target_select_ops, + .policy = nft_target_policy, + .maxattr = NFTA_TARGET_MAX, + .owner = THIS_MODULE, +}; + +static int __init nft_compat_module_init(void) +{ + int ret; + + ret = nft_register_expr(&nft_match_type); + if (ret < 0) + return ret; + + ret = nft_register_expr(&nft_target_type); + if (ret < 0) + goto err_match; + + ret = nfnetlink_subsys_register(&nfnl_compat_subsys); + if (ret < 0) { + pr_err("nft_compat: cannot register with nfnetlink.\n"); + goto err_target; + } + + pr_info("nf_tables_compat: (c) 2012 Pablo Neira Ayuso <pablo@netfilter.org>\n"); + + return ret; + +err_target: + nft_unregister_expr(&nft_target_type); +err_match: + nft_unregister_expr(&nft_match_type); + return ret; +} + +static void __exit nft_compat_module_exit(void) +{ + nfnetlink_subsys_unregister(&nfnl_compat_subsys); + nft_unregister_expr(&nft_target_type); + nft_unregister_expr(&nft_match_type); + nft_match_release(); + nft_target_release(); +} + +MODULE_ALIAS_NFNL_SUBSYS(NFNL_SUBSYS_NFT_COMPAT); + +module_init(nft_compat_module_init); +module_exit(nft_compat_module_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Pablo Neira Ayuso <pablo@netfilter.org>"); +MODULE_ALIAS_NFT_EXPR("match"); +MODULE_ALIAS_NFT_EXPR("target"); diff --git a/net/netfilter/nft_counter.c b/net/netfilter/nft_counter.c new file mode 100644 index 000000000000..c89ee486ce54 --- /dev/null +++ b/net/netfilter/nft_counter.c @@ -0,0 +1,113 @@ +/* + * Copyright (c) 2008-2009 Patrick McHardy <kaber@trash.net> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Development of this code funded by Astaro AG (http://www.astaro.com/) + */ + +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/module.h> +#include <linux/seqlock.h> +#include <linux/netlink.h> +#include <linux/netfilter.h> +#include <linux/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables.h> + +struct nft_counter { + seqlock_t lock; + u64 bytes; + u64 packets; +}; + +static void nft_counter_eval(const struct nft_expr *expr, + struct nft_data data[NFT_REG_MAX + 1], + const struct nft_pktinfo *pkt) +{ + struct nft_counter *priv = nft_expr_priv(expr); + + write_seqlock_bh(&priv->lock); + priv->bytes += pkt->skb->len; + priv->packets++; + write_sequnlock_bh(&priv->lock); +} + +static int nft_counter_dump(struct sk_buff *skb, const struct nft_expr *expr) +{ + struct nft_counter *priv = nft_expr_priv(expr); + unsigned int seq; + u64 bytes; + u64 packets; + + do { + seq = read_seqbegin(&priv->lock); + bytes = priv->bytes; + packets = priv->packets; + } while (read_seqretry(&priv->lock, seq)); + + if (nla_put_be64(skb, NFTA_COUNTER_BYTES, cpu_to_be64(bytes))) + goto nla_put_failure; + if (nla_put_be64(skb, NFTA_COUNTER_PACKETS, cpu_to_be64(packets))) + goto nla_put_failure; + return 0; + +nla_put_failure: + return -1; +} + +static const struct nla_policy nft_counter_policy[NFTA_COUNTER_MAX + 1] = { + [NFTA_COUNTER_PACKETS] = { .type = NLA_U64 }, + [NFTA_COUNTER_BYTES] = { .type = NLA_U64 }, +}; + +static int nft_counter_init(const struct nft_ctx *ctx, + const struct nft_expr *expr, + const struct nlattr * const tb[]) +{ + struct nft_counter *priv = nft_expr_priv(expr); + + if (tb[NFTA_COUNTER_PACKETS]) + priv->packets = be64_to_cpu(nla_get_be64(tb[NFTA_COUNTER_PACKETS])); + if (tb[NFTA_COUNTER_BYTES]) + priv->bytes = be64_to_cpu(nla_get_be64(tb[NFTA_COUNTER_BYTES])); + + seqlock_init(&priv->lock); + return 0; +} + +static struct nft_expr_type nft_counter_type; +static const struct nft_expr_ops nft_counter_ops = { + .type = &nft_counter_type, + .size = NFT_EXPR_SIZE(sizeof(struct nft_counter)), + .eval = nft_counter_eval, + .init = nft_counter_init, + .dump = nft_counter_dump, +}; + +static struct nft_expr_type nft_counter_type __read_mostly = { + .name = "counter", + .ops = &nft_counter_ops, + .policy = nft_counter_policy, + .maxattr = NFTA_COUNTER_MAX, + .owner = THIS_MODULE, +}; + +static int __init nft_counter_module_init(void) +{ + return nft_register_expr(&nft_counter_type); +} + +static void __exit nft_counter_module_exit(void) +{ + nft_unregister_expr(&nft_counter_type); +} + +module_init(nft_counter_module_init); +module_exit(nft_counter_module_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>"); +MODULE_ALIAS_NFT_EXPR("counter"); diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c new file mode 100644 index 000000000000..955f4e6e7089 --- /dev/null +++ b/net/netfilter/nft_ct.c @@ -0,0 +1,258 @@ +/* + * Copyright (c) 2008-2009 Patrick McHardy <kaber@trash.net> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Development of this code funded by Astaro AG (http://www.astaro.com/) + */ + +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/module.h> +#include <linux/netlink.h> +#include <linux/netfilter.h> +#include <linux/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables.h> +#include <net/netfilter/nf_conntrack.h> +#include <net/netfilter/nf_conntrack_tuple.h> +#include <net/netfilter/nf_conntrack_helper.h> + +struct nft_ct { + enum nft_ct_keys key:8; + enum ip_conntrack_dir dir:8; + enum nft_registers dreg:8; + uint8_t family; +}; + +static void nft_ct_eval(const struct nft_expr *expr, + struct nft_data data[NFT_REG_MAX + 1], + const struct nft_pktinfo *pkt) +{ + const struct nft_ct *priv = nft_expr_priv(expr); + struct nft_data *dest = &data[priv->dreg]; + enum ip_conntrack_info ctinfo; + const struct nf_conn *ct; + const struct nf_conn_help *help; + const struct nf_conntrack_tuple *tuple; + const struct nf_conntrack_helper *helper; + long diff; + unsigned int state; + + ct = nf_ct_get(pkt->skb, &ctinfo); + + switch (priv->key) { + case NFT_CT_STATE: + if (ct == NULL) + state = NF_CT_STATE_INVALID_BIT; + else if (nf_ct_is_untracked(ct)) + state = NF_CT_STATE_UNTRACKED_BIT; + else + state = NF_CT_STATE_BIT(ctinfo); + dest->data[0] = state; + return; + } + + if (ct == NULL) + goto err; + + switch (priv->key) { + case NFT_CT_DIRECTION: + dest->data[0] = CTINFO2DIR(ctinfo); + return; + case NFT_CT_STATUS: + dest->data[0] = ct->status; + return; +#ifdef CONFIG_NF_CONNTRACK_MARK + case NFT_CT_MARK: + dest->data[0] = ct->mark; + return; +#endif +#ifdef CONFIG_NF_CONNTRACK_SECMARK + case NFT_CT_SECMARK: + dest->data[0] = ct->secmark; + return; +#endif + case NFT_CT_EXPIRATION: + diff = (long)jiffies - (long)ct->timeout.expires; + if (diff < 0) + diff = 0; + dest->data[0] = jiffies_to_msecs(diff); + return; + case NFT_CT_HELPER: + if (ct->master == NULL) + goto err; + help = nfct_help(ct->master); + if (help == NULL) + goto err; + helper = rcu_dereference(help->helper); + if (helper == NULL) + goto err; + if (strlen(helper->name) >= sizeof(dest->data)) + goto err; + strncpy((char *)dest->data, helper->name, sizeof(dest->data)); + return; + } + + tuple = &ct->tuplehash[priv->dir].tuple; + switch (priv->key) { + case NFT_CT_L3PROTOCOL: + dest->data[0] = nf_ct_l3num(ct); + return; + case NFT_CT_SRC: + memcpy(dest->data, tuple->src.u3.all, + nf_ct_l3num(ct) == NFPROTO_IPV4 ? 4 : 16); + return; + case NFT_CT_DST: + memcpy(dest->data, tuple->dst.u3.all, + nf_ct_l3num(ct) == NFPROTO_IPV4 ? 4 : 16); + return; + case NFT_CT_PROTOCOL: + dest->data[0] = nf_ct_protonum(ct); + return; + case NFT_CT_PROTO_SRC: + dest->data[0] = (__force __u16)tuple->src.u.all; + return; + case NFT_CT_PROTO_DST: + dest->data[0] = (__force __u16)tuple->dst.u.all; + return; + } + return; +err: + data[NFT_REG_VERDICT].verdict = NFT_BREAK; +} + +static const struct nla_policy nft_ct_policy[NFTA_CT_MAX + 1] = { + [NFTA_CT_DREG] = { .type = NLA_U32 }, + [NFTA_CT_KEY] = { .type = NLA_U32 }, + [NFTA_CT_DIRECTION] = { .type = NLA_U8 }, +}; + +static int nft_ct_init(const struct nft_ctx *ctx, + const struct nft_expr *expr, + const struct nlattr * const tb[]) +{ + struct nft_ct *priv = nft_expr_priv(expr); + int err; + + if (tb[NFTA_CT_DREG] == NULL || + tb[NFTA_CT_KEY] == NULL) + return -EINVAL; + + priv->key = ntohl(nla_get_be32(tb[NFTA_CT_KEY])); + if (tb[NFTA_CT_DIRECTION] != NULL) { + priv->dir = nla_get_u8(tb[NFTA_CT_DIRECTION]); + switch (priv->dir) { + case IP_CT_DIR_ORIGINAL: + case IP_CT_DIR_REPLY: + break; + default: + return -EINVAL; + } + } + + switch (priv->key) { + case NFT_CT_STATE: + case NFT_CT_DIRECTION: + case NFT_CT_STATUS: +#ifdef CONFIG_NF_CONNTRACK_MARK + case NFT_CT_MARK: +#endif +#ifdef CONFIG_NF_CONNTRACK_SECMARK + case NFT_CT_SECMARK: +#endif + case NFT_CT_EXPIRATION: + case NFT_CT_HELPER: + if (tb[NFTA_CT_DIRECTION] != NULL) + return -EINVAL; + break; + case NFT_CT_PROTOCOL: + case NFT_CT_SRC: + case NFT_CT_DST: + case NFT_CT_PROTO_SRC: + case NFT_CT_PROTO_DST: + if (tb[NFTA_CT_DIRECTION] == NULL) + return -EINVAL; + break; + default: + return -EOPNOTSUPP; + } + + err = nf_ct_l3proto_try_module_get(ctx->afi->family); + if (err < 0) + return err; + priv->family = ctx->afi->family; + + priv->dreg = ntohl(nla_get_be32(tb[NFTA_CT_DREG])); + err = nft_validate_output_register(priv->dreg); + if (err < 0) + goto err1; + + err = nft_validate_data_load(ctx, priv->dreg, NULL, NFT_DATA_VALUE); + if (err < 0) + goto err1; + return 0; + +err1: + nf_ct_l3proto_module_put(ctx->afi->family); + return err; +} + +static void nft_ct_destroy(const struct nft_expr *expr) +{ + struct nft_ct *priv = nft_expr_priv(expr); + + nf_ct_l3proto_module_put(priv->family); +} + +static int nft_ct_dump(struct sk_buff *skb, const struct nft_expr *expr) +{ + const struct nft_ct *priv = nft_expr_priv(expr); + + if (nla_put_be32(skb, NFTA_CT_DREG, htonl(priv->dreg))) + goto nla_put_failure; + if (nla_put_be32(skb, NFTA_CT_KEY, htonl(priv->key))) + goto nla_put_failure; + if (nla_put_u8(skb, NFTA_CT_DIRECTION, priv->dir)) + goto nla_put_failure; + return 0; + +nla_put_failure: + return -1; +} + +static struct nft_expr_type nft_ct_type; +static const struct nft_expr_ops nft_ct_ops = { + .type = &nft_ct_type, + .size = NFT_EXPR_SIZE(sizeof(struct nft_ct)), + .eval = nft_ct_eval, + .init = nft_ct_init, + .destroy = nft_ct_destroy, + .dump = nft_ct_dump, +}; + +static struct nft_expr_type nft_ct_type __read_mostly = { + .name = "ct", + .ops = &nft_ct_ops, + .policy = nft_ct_policy, + .maxattr = NFTA_CT_MAX, + .owner = THIS_MODULE, +}; + +static int __init nft_ct_module_init(void) +{ + return nft_register_expr(&nft_ct_type); +} + +static void __exit nft_ct_module_exit(void) +{ + nft_unregister_expr(&nft_ct_type); +} + +module_init(nft_ct_module_init); +module_exit(nft_ct_module_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>"); +MODULE_ALIAS_NFT_EXPR("ct"); diff --git a/net/netfilter/nft_expr_template.c b/net/netfilter/nft_expr_template.c new file mode 100644 index 000000000000..b6eed4d5a096 --- /dev/null +++ b/net/netfilter/nft_expr_template.c @@ -0,0 +1,94 @@ +/* + * Copyright (c) 2008-2009 Patrick McHardy <kaber@trash.net> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Development of this code funded by Astaro AG (http://www.astaro.com/) + */ + +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/netlink.h> +#include <linux/netfilter.h> +#include <linux/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables.h> + +struct nft_template { + +}; + +static void nft_template_eval(const struct nft_expr *expr, + struct nft_data data[NFT_REG_MAX + 1], + const struct nft_pktinfo *pkt) +{ + struct nft_template *priv = nft_expr_priv(expr); + +} + +static const struct nla_policy nft_template_policy[NFTA_TEMPLATE_MAX + 1] = { + [NFTA_TEMPLATE_ATTR] = { .type = NLA_U32 }, +}; + +static int nft_template_init(const struct nft_ctx *ctx, + const struct nft_expr *expr, + const struct nlattr * const tb[]) +{ + struct nft_template *priv = nft_expr_priv(expr); + + return 0; +} + +static void nft_template_destroy(const struct nft_ctx *ctx, + const struct nft_expr *expr) +{ + struct nft_template *priv = nft_expr_priv(expr); + +} + +static int nft_template_dump(struct sk_buff *skb, const struct nft_expr *expr) +{ + const struct nft_template *priv = nft_expr_priv(expr); + + NLA_PUT_BE32(skb, NFTA_TEMPLATE_ATTR, priv->field); + return 0; + +nla_put_failure: + return -1; +} + +static struct nft_expr_type nft_template_type; +static const struct nft_expr_ops nft_template_ops = { + .type = &nft_template_type, + .size = NFT_EXPR_SIZE(sizeof(struct nft_template)), + .eval = nft_template_eval, + .init = nft_template_init, + .destroy = nft_template_destroy, + .dump = nft_template_dump, +}; + +static struct nft_expr_type nft_template_type __read_mostly = { + .name = "template", + .ops = &nft_template_ops, + .policy = nft_template_policy, + .maxattr = NFTA_TEMPLATE_MAX, + .owner = THIS_MODULE, +}; + +static int __init nft_template_module_init(void) +{ + return nft_register_expr(&nft_template_type); +} + +static void __exit nft_template_module_exit(void) +{ + nft_unregister_expr(&nft_template_type); +} + +module_init(nft_template_module_init); +module_exit(nft_template_module_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>"); +MODULE_ALIAS_NFT_EXPR("template"); diff --git a/net/netfilter/nft_exthdr.c b/net/netfilter/nft_exthdr.c new file mode 100644 index 000000000000..8e0bb75e7c51 --- /dev/null +++ b/net/netfilter/nft_exthdr.c @@ -0,0 +1,133 @@ +/* + * Copyright (c) 2008 Patrick McHardy <kaber@trash.net> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Development of this code funded by Astaro AG (http://www.astaro.com/) + */ + +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/module.h> +#include <linux/netlink.h> +#include <linux/netfilter.h> +#include <linux/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables.h> +// FIXME: +#include <net/ipv6.h> + +struct nft_exthdr { + u8 type; + u8 offset; + u8 len; + enum nft_registers dreg:8; +}; + +static void nft_exthdr_eval(const struct nft_expr *expr, + struct nft_data data[NFT_REG_MAX + 1], + const struct nft_pktinfo *pkt) +{ + struct nft_exthdr *priv = nft_expr_priv(expr); + struct nft_data *dest = &data[priv->dreg]; + unsigned int offset; + int err; + + err = ipv6_find_hdr(pkt->skb, &offset, priv->type, NULL, NULL); + if (err < 0) + goto err; + offset += priv->offset; + + if (skb_copy_bits(pkt->skb, offset, dest->data, priv->len) < 0) + goto err; + return; +err: + data[NFT_REG_VERDICT].verdict = NFT_BREAK; +} + +static const struct nla_policy nft_exthdr_policy[NFTA_EXTHDR_MAX + 1] = { + [NFTA_EXTHDR_DREG] = { .type = NLA_U32 }, + [NFTA_EXTHDR_TYPE] = { .type = NLA_U8 }, + [NFTA_EXTHDR_OFFSET] = { .type = NLA_U32 }, + [NFTA_EXTHDR_LEN] = { .type = NLA_U32 }, +}; + +static int nft_exthdr_init(const struct nft_ctx *ctx, + const struct nft_expr *expr, + const struct nlattr * const tb[]) +{ + struct nft_exthdr *priv = nft_expr_priv(expr); + int err; + + if (tb[NFTA_EXTHDR_DREG] == NULL || + tb[NFTA_EXTHDR_TYPE] == NULL || + tb[NFTA_EXTHDR_OFFSET] == NULL || + tb[NFTA_EXTHDR_LEN] == NULL) + return -EINVAL; + + priv->type = nla_get_u8(tb[NFTA_EXTHDR_TYPE]); + priv->offset = ntohl(nla_get_be32(tb[NFTA_EXTHDR_OFFSET])); + priv->len = ntohl(nla_get_be32(tb[NFTA_EXTHDR_LEN])); + if (priv->len == 0 || + priv->len > FIELD_SIZEOF(struct nft_data, data)) + return -EINVAL; + + priv->dreg = ntohl(nla_get_be32(tb[NFTA_EXTHDR_DREG])); + err = nft_validate_output_register(priv->dreg); + if (err < 0) + return err; + return nft_validate_data_load(ctx, priv->dreg, NULL, NFT_DATA_VALUE); +} + +static int nft_exthdr_dump(struct sk_buff *skb, const struct nft_expr *expr) +{ + const struct nft_exthdr *priv = nft_expr_priv(expr); + + if (nla_put_be32(skb, NFTA_EXTHDR_DREG, htonl(priv->dreg))) + goto nla_put_failure; + if (nla_put_u8(skb, NFTA_EXTHDR_TYPE, priv->type)) + goto nla_put_failure; + if (nla_put_be32(skb, NFTA_EXTHDR_OFFSET, htonl(priv->offset))) + goto nla_put_failure; + if (nla_put_be32(skb, NFTA_EXTHDR_LEN, htonl(priv->len))) + goto nla_put_failure; + return 0; + +nla_put_failure: + return -1; +} + +static struct nft_expr_type nft_exthdr_type; +static const struct nft_expr_ops nft_exthdr_ops = { + .type = &nft_exthdr_type, + .size = NFT_EXPR_SIZE(sizeof(struct nft_exthdr)), + .eval = nft_exthdr_eval, + .init = nft_exthdr_init, + .dump = nft_exthdr_dump, +}; + +static struct nft_expr_type nft_exthdr_type __read_mostly = { + .name = "exthdr", + .ops = &nft_exthdr_ops, + .policy = nft_exthdr_policy, + .maxattr = NFTA_EXTHDR_MAX, + .owner = THIS_MODULE, +}; + +static int __init nft_exthdr_module_init(void) +{ + return nft_register_expr(&nft_exthdr_type); +} + +static void __exit nft_exthdr_module_exit(void) +{ + nft_unregister_expr(&nft_exthdr_type); +} + +module_init(nft_exthdr_module_init); +module_exit(nft_exthdr_module_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>"); +MODULE_ALIAS_NFT_EXPR("exthdr"); diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c new file mode 100644 index 000000000000..3d3f8fce10a5 --- /dev/null +++ b/net/netfilter/nft_hash.c @@ -0,0 +1,231 @@ +/* + * Copyright (c) 2008-2009 Patrick McHardy <kaber@trash.net> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Development of this code funded by Astaro AG (http://www.astaro.com/) + */ + +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/module.h> +#include <linux/list.h> +#include <linux/jhash.h> +#include <linux/netlink.h> +#include <linux/netfilter.h> +#include <linux/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables.h> + +struct nft_hash { + struct hlist_head *hash; + unsigned int hsize; +}; + +struct nft_hash_elem { + struct hlist_node hnode; + struct nft_data key; + struct nft_data data[]; +}; + +static u32 nft_hash_rnd __read_mostly; +static bool nft_hash_rnd_initted __read_mostly; + +static unsigned int nft_hash_data(const struct nft_data *data, + unsigned int hsize, unsigned int len) +{ + unsigned int h; + + h = jhash(data->data, len, nft_hash_rnd); + return ((u64)h * hsize) >> 32; +} + +static bool nft_hash_lookup(const struct nft_set *set, + const struct nft_data *key, + struct nft_data *data) +{ + const struct nft_hash *priv = nft_set_priv(set); + const struct nft_hash_elem *he; + unsigned int h; + + h = nft_hash_data(key, priv->hsize, set->klen); + hlist_for_each_entry(he, &priv->hash[h], hnode) { + if (nft_data_cmp(&he->key, key, set->klen)) + continue; + if (set->flags & NFT_SET_MAP) + nft_data_copy(data, he->data); + return true; + } + return false; +} + +static void nft_hash_elem_destroy(const struct nft_set *set, + struct nft_hash_elem *he) +{ + nft_data_uninit(&he->key, NFT_DATA_VALUE); + if (set->flags & NFT_SET_MAP) + nft_data_uninit(he->data, set->dtype); + kfree(he); +} + +static int nft_hash_insert(const struct nft_set *set, + const struct nft_set_elem *elem) +{ + struct nft_hash *priv = nft_set_priv(set); + struct nft_hash_elem *he; + unsigned int size, h; + + if (elem->flags != 0) + return -EINVAL; + + size = sizeof(*he); + if (set->flags & NFT_SET_MAP) + size += sizeof(he->data[0]); + + he = kzalloc(size, GFP_KERNEL); + if (he == NULL) + return -ENOMEM; + + nft_data_copy(&he->key, &elem->key); + if (set->flags & NFT_SET_MAP) + nft_data_copy(he->data, &elem->data); + + h = nft_hash_data(&he->key, priv->hsize, set->klen); + hlist_add_head_rcu(&he->hnode, &priv->hash[h]); + return 0; +} + +static void nft_hash_remove(const struct nft_set *set, + const struct nft_set_elem *elem) +{ + struct nft_hash_elem *he = elem->cookie; + + hlist_del_rcu(&he->hnode); + kfree(he); +} + +static int nft_hash_get(const struct nft_set *set, struct nft_set_elem *elem) +{ + const struct nft_hash *priv = nft_set_priv(set); + struct nft_hash_elem *he; + unsigned int h; + + h = nft_hash_data(&elem->key, priv->hsize, set->klen); + hlist_for_each_entry(he, &priv->hash[h], hnode) { + if (nft_data_cmp(&he->key, &elem->key, set->klen)) + continue; + + elem->cookie = he; + elem->flags = 0; + if (set->flags & NFT_SET_MAP) + nft_data_copy(&elem->data, he->data); + return 0; + } + return -ENOENT; +} + +static void nft_hash_walk(const struct nft_ctx *ctx, const struct nft_set *set, + struct nft_set_iter *iter) +{ + const struct nft_hash *priv = nft_set_priv(set); + const struct nft_hash_elem *he; + struct nft_set_elem elem; + unsigned int i; + + for (i = 0; i < priv->hsize; i++) { + hlist_for_each_entry(he, &priv->hash[i], hnode) { + if (iter->count < iter->skip) + goto cont; + + memcpy(&elem.key, &he->key, sizeof(elem.key)); + if (set->flags & NFT_SET_MAP) + memcpy(&elem.data, he->data, sizeof(elem.data)); + elem.flags = 0; + + iter->err = iter->fn(ctx, set, iter, &elem); + if (iter->err < 0) + return; +cont: + iter->count++; + } + } +} + +static unsigned int nft_hash_privsize(const struct nlattr * const nla[]) +{ + return sizeof(struct nft_hash); +} + +static int nft_hash_init(const struct nft_set *set, + const struct nlattr * const tb[]) +{ + struct nft_hash *priv = nft_set_priv(set); + unsigned int cnt, i; + + if (unlikely(!nft_hash_rnd_initted)) { + get_random_bytes(&nft_hash_rnd, 4); + nft_hash_rnd_initted = true; + } + + /* Aim for a load factor of 0.75 */ + // FIXME: temporarily broken until we have set descriptions + cnt = 100; + cnt = cnt * 4 / 3; + + priv->hash = kcalloc(cnt, sizeof(struct hlist_head), GFP_KERNEL); + if (priv->hash == NULL) + return -ENOMEM; + priv->hsize = cnt; + + for (i = 0; i < cnt; i++) + INIT_HLIST_HEAD(&priv->hash[i]); + + return 0; +} + +static void nft_hash_destroy(const struct nft_set *set) +{ + const struct nft_hash *priv = nft_set_priv(set); + const struct hlist_node *next; + struct nft_hash_elem *elem; + unsigned int i; + + for (i = 0; i < priv->hsize; i++) { + hlist_for_each_entry_safe(elem, next, &priv->hash[i], hnode) { + hlist_del(&elem->hnode); + nft_hash_elem_destroy(set, elem); + } + } + kfree(priv->hash); +} + +static struct nft_set_ops nft_hash_ops __read_mostly = { + .privsize = nft_hash_privsize, + .init = nft_hash_init, + .destroy = nft_hash_destroy, + .get = nft_hash_get, + .insert = nft_hash_insert, + .remove = nft_hash_remove, + .lookup = nft_hash_lookup, + .walk = nft_hash_walk, + .features = NFT_SET_MAP, + .owner = THIS_MODULE, +}; + +static int __init nft_hash_module_init(void) +{ + return nft_register_set(&nft_hash_ops); +} + +static void __exit nft_hash_module_exit(void) +{ + nft_unregister_set(&nft_hash_ops); +} + +module_init(nft_hash_module_init); +module_exit(nft_hash_module_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>"); +MODULE_ALIAS_NFT_SET(); diff --git a/net/netfilter/nft_immediate.c b/net/netfilter/nft_immediate.c new file mode 100644 index 000000000000..f169501f1ad4 --- /dev/null +++ b/net/netfilter/nft_immediate.c @@ -0,0 +1,132 @@ +/* + * Copyright (c) 2008-2009 Patrick McHardy <kaber@trash.net> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Development of this code funded by Astaro AG (http://www.astaro.com/) + */ + +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/module.h> +#include <linux/netlink.h> +#include <linux/netfilter.h> +#include <linux/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables_core.h> +#include <net/netfilter/nf_tables.h> + +struct nft_immediate_expr { + struct nft_data data; + enum nft_registers dreg:8; + u8 dlen; +}; + +static void nft_immediate_eval(const struct nft_expr *expr, + struct nft_data data[NFT_REG_MAX + 1], + const struct nft_pktinfo *pkt) +{ + const struct nft_immediate_expr *priv = nft_expr_priv(expr); + + nft_data_copy(&data[priv->dreg], &priv->data); +} + +static const struct nla_policy nft_immediate_policy[NFTA_IMMEDIATE_MAX + 1] = { + [NFTA_IMMEDIATE_DREG] = { .type = NLA_U32 }, + [NFTA_IMMEDIATE_DATA] = { .type = NLA_NESTED }, +}; + +static int nft_immediate_init(const struct nft_ctx *ctx, + const struct nft_expr *expr, + const struct nlattr * const tb[]) +{ + struct nft_immediate_expr *priv = nft_expr_priv(expr); + struct nft_data_desc desc; + int err; + + if (tb[NFTA_IMMEDIATE_DREG] == NULL || + tb[NFTA_IMMEDIATE_DATA] == NULL) + return -EINVAL; + + priv->dreg = ntohl(nla_get_be32(tb[NFTA_IMMEDIATE_DREG])); + err = nft_validate_output_register(priv->dreg); + if (err < 0) + return err; + + err = nft_data_init(ctx, &priv->data, &desc, tb[NFTA_IMMEDIATE_DATA]); + if (err < 0) + return err; + priv->dlen = desc.len; + + err = nft_validate_data_load(ctx, priv->dreg, &priv->data, desc.type); + if (err < 0) + goto err1; + + return 0; + +err1: + nft_data_uninit(&priv->data, desc.type); + return err; +} + +static void nft_immediate_destroy(const struct nft_expr *expr) +{ + const struct nft_immediate_expr *priv = nft_expr_priv(expr); + return nft_data_uninit(&priv->data, nft_dreg_to_type(priv->dreg)); +} + +static int nft_immediate_dump(struct sk_buff *skb, const struct nft_expr *expr) +{ + const struct nft_immediate_expr *priv = nft_expr_priv(expr); + + if (nla_put_be32(skb, NFTA_IMMEDIATE_DREG, htonl(priv->dreg))) + goto nla_put_failure; + + return nft_data_dump(skb, NFTA_IMMEDIATE_DATA, &priv->data, + nft_dreg_to_type(priv->dreg), priv->dlen); + +nla_put_failure: + return -1; +} + +static int nft_immediate_validate(const struct nft_ctx *ctx, + const struct nft_expr *expr, + const struct nft_data **data) +{ + const struct nft_immediate_expr *priv = nft_expr_priv(expr); + + if (priv->dreg == NFT_REG_VERDICT) + *data = &priv->data; + + return 0; +} + +static struct nft_expr_type nft_imm_type; +static const struct nft_expr_ops nft_imm_ops = { + .type = &nft_imm_type, + .size = NFT_EXPR_SIZE(sizeof(struct nft_immediate_expr)), + .eval = nft_immediate_eval, + .init = nft_immediate_init, + .destroy = nft_immediate_destroy, + .dump = nft_immediate_dump, + .validate = nft_immediate_validate, +}; + +static struct nft_expr_type nft_imm_type __read_mostly = { + .name = "immediate", + .ops = &nft_imm_ops, + .policy = nft_immediate_policy, + .maxattr = NFTA_IMMEDIATE_MAX, + .owner = THIS_MODULE, +}; + +int __init nft_immediate_module_init(void) +{ + return nft_register_expr(&nft_imm_type); +} + +void nft_immediate_module_exit(void) +{ + nft_unregister_expr(&nft_imm_type); +} diff --git a/net/netfilter/nft_limit.c b/net/netfilter/nft_limit.c new file mode 100644 index 000000000000..85da5bd02f64 --- /dev/null +++ b/net/netfilter/nft_limit.c @@ -0,0 +1,119 @@ +/* + * Copyright (c) 2008-2009 Patrick McHardy <kaber@trash.net> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Development of this code funded by Astaro AG (http://www.astaro.com/) + */ + +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/module.h> +#include <linux/spinlock.h> +#include <linux/netlink.h> +#include <linux/netfilter.h> +#include <linux/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables.h> + +static DEFINE_SPINLOCK(limit_lock); + +struct nft_limit { + u64 tokens; + u64 rate; + u64 unit; + unsigned long stamp; +}; + +static void nft_limit_eval(const struct nft_expr *expr, + struct nft_data data[NFT_REG_MAX + 1], + const struct nft_pktinfo *pkt) +{ + struct nft_limit *priv = nft_expr_priv(expr); + + spin_lock_bh(&limit_lock); + if (time_after_eq(jiffies, priv->stamp)) { + priv->tokens = priv->rate; + priv->stamp = jiffies + priv->unit * HZ; + } + + if (priv->tokens >= 1) { + priv->tokens--; + spin_unlock_bh(&limit_lock); + return; + } + spin_unlock_bh(&limit_lock); + + data[NFT_REG_VERDICT].verdict = NFT_BREAK; +} + +static const struct nla_policy nft_limit_policy[NFTA_LIMIT_MAX + 1] = { + [NFTA_LIMIT_RATE] = { .type = NLA_U64 }, + [NFTA_LIMIT_UNIT] = { .type = NLA_U64 }, +}; + +static int nft_limit_init(const struct nft_ctx *ctx, + const struct nft_expr *expr, + const struct nlattr * const tb[]) +{ + struct nft_limit *priv = nft_expr_priv(expr); + + if (tb[NFTA_LIMIT_RATE] == NULL || + tb[NFTA_LIMIT_UNIT] == NULL) + return -EINVAL; + + priv->rate = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_RATE])); + priv->unit = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_UNIT])); + priv->stamp = jiffies + priv->unit * HZ; + priv->tokens = priv->rate; + return 0; +} + +static int nft_limit_dump(struct sk_buff *skb, const struct nft_expr *expr) +{ + const struct nft_limit *priv = nft_expr_priv(expr); + + if (nla_put_be64(skb, NFTA_LIMIT_RATE, cpu_to_be64(priv->rate))) + goto nla_put_failure; + if (nla_put_be64(skb, NFTA_LIMIT_UNIT, cpu_to_be64(priv->unit))) + goto nla_put_failure; + return 0; + +nla_put_failure: + return -1; +} + +static struct nft_expr_type nft_limit_type; +static const struct nft_expr_ops nft_limit_ops = { + .type = &nft_limit_type, + .size = NFT_EXPR_SIZE(sizeof(struct nft_limit)), + .eval = nft_limit_eval, + .init = nft_limit_init, + .dump = nft_limit_dump, +}; + +static struct nft_expr_type nft_limit_type __read_mostly = { + .name = "limit", + .ops = &nft_limit_ops, + .policy = nft_limit_policy, + .maxattr = NFTA_LIMIT_MAX, + .owner = THIS_MODULE, +}; + +static int __init nft_limit_module_init(void) +{ + return nft_register_expr(&nft_limit_type); +} + +static void __exit nft_limit_module_exit(void) +{ + nft_unregister_expr(&nft_limit_type); +} + +module_init(nft_limit_module_init); +module_exit(nft_limit_module_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>"); +MODULE_ALIAS_NFT_EXPR("limit"); diff --git a/net/netfilter/nft_log.c b/net/netfilter/nft_log.c new file mode 100644 index 000000000000..57cad072a13e --- /dev/null +++ b/net/netfilter/nft_log.c @@ -0,0 +1,146 @@ +/* + * Copyright (c) 2008-2009 Patrick McHardy <kaber@trash.net> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Development of this code funded by Astaro AG (http://www.astaro.com/) + */ + +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/module.h> +#include <linux/netlink.h> +#include <linux/netfilter.h> +#include <linux/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables.h> +#include <net/netfilter/nf_log.h> +#include <linux/netdevice.h> + +static const char *nft_log_null_prefix = ""; + +struct nft_log { + struct nf_loginfo loginfo; + char *prefix; + int family; +}; + +static void nft_log_eval(const struct nft_expr *expr, + struct nft_data data[NFT_REG_MAX + 1], + const struct nft_pktinfo *pkt) +{ + const struct nft_log *priv = nft_expr_priv(expr); + struct net *net = dev_net(pkt->in ? pkt->in : pkt->out); + + nf_log_packet(net, priv->family, pkt->hooknum, pkt->skb, pkt->in, + pkt->out, &priv->loginfo, "%s", priv->prefix); +} + +static const struct nla_policy nft_log_policy[NFTA_LOG_MAX + 1] = { + [NFTA_LOG_GROUP] = { .type = NLA_U16 }, + [NFTA_LOG_PREFIX] = { .type = NLA_STRING }, + [NFTA_LOG_SNAPLEN] = { .type = NLA_U32 }, + [NFTA_LOG_QTHRESHOLD] = { .type = NLA_U16 }, +}; + +static int nft_log_init(const struct nft_ctx *ctx, + const struct nft_expr *expr, + const struct nlattr * const tb[]) +{ + struct nft_log *priv = nft_expr_priv(expr); + struct nf_loginfo *li = &priv->loginfo; + const struct nlattr *nla; + + priv->family = ctx->afi->family; + + nla = tb[NFTA_LOG_PREFIX]; + if (nla != NULL) { + priv->prefix = kmalloc(nla_len(nla) + 1, GFP_KERNEL); + if (priv->prefix == NULL) + return -ENOMEM; + nla_strlcpy(priv->prefix, nla, nla_len(nla) + 1); + } else + priv->prefix = (char *)nft_log_null_prefix; + + li->type = NF_LOG_TYPE_ULOG; + if (tb[NFTA_LOG_GROUP] != NULL) + li->u.ulog.group = ntohs(nla_get_be16(tb[NFTA_LOG_GROUP])); + + if (tb[NFTA_LOG_SNAPLEN] != NULL) + li->u.ulog.copy_len = ntohl(nla_get_be32(tb[NFTA_LOG_SNAPLEN])); + if (tb[NFTA_LOG_QTHRESHOLD] != NULL) { + li->u.ulog.qthreshold = + ntohs(nla_get_be16(tb[NFTA_LOG_QTHRESHOLD])); + } + + return 0; +} + +static void nft_log_destroy(const struct nft_expr *expr) +{ + struct nft_log *priv = nft_expr_priv(expr); + + if (priv->prefix != nft_log_null_prefix) + kfree(priv->prefix); +} + +static int nft_log_dump(struct sk_buff *skb, const struct nft_expr *expr) +{ + const struct nft_log *priv = nft_expr_priv(expr); + const struct nf_loginfo *li = &priv->loginfo; + + if (priv->prefix != nft_log_null_prefix) + if (nla_put_string(skb, NFTA_LOG_PREFIX, priv->prefix)) + goto nla_put_failure; + if (li->u.ulog.group) + if (nla_put_be16(skb, NFTA_LOG_GROUP, htons(li->u.ulog.group))) + goto nla_put_failure; + if (li->u.ulog.copy_len) + if (nla_put_be32(skb, NFTA_LOG_SNAPLEN, + htonl(li->u.ulog.copy_len))) + goto nla_put_failure; + if (li->u.ulog.qthreshold) + if (nla_put_be16(skb, NFTA_LOG_QTHRESHOLD, + htons(li->u.ulog.qthreshold))) + goto nla_put_failure; + return 0; + +nla_put_failure: + return -1; +} + +static struct nft_expr_type nft_log_type; +static const struct nft_expr_ops nft_log_ops = { + .type = &nft_log_type, + .size = NFT_EXPR_SIZE(sizeof(struct nft_log)), + .eval = nft_log_eval, + .init = nft_log_init, + .destroy = nft_log_destroy, + .dump = nft_log_dump, +}; + +static struct nft_expr_type nft_log_type __read_mostly = { + .name = "log", + .ops = &nft_log_ops, + .policy = nft_log_policy, + .maxattr = NFTA_LOG_MAX, + .owner = THIS_MODULE, +}; + +static int __init nft_log_module_init(void) +{ + return nft_register_expr(&nft_log_type); +} + +static void __exit nft_log_module_exit(void) +{ + nft_unregister_expr(&nft_log_type); +} + +module_init(nft_log_module_init); +module_exit(nft_log_module_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>"); +MODULE_ALIAS_NFT_EXPR("log"); diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c new file mode 100644 index 000000000000..8a6116b75b5a --- /dev/null +++ b/net/netfilter/nft_lookup.c @@ -0,0 +1,141 @@ +/* + * Copyright (c) 2009 Patrick McHardy <kaber@trash.net> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Development of this code funded by Astaro AG (http://www.astaro.com/) + */ + +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/list.h> +#include <linux/rbtree.h> +#include <linux/netlink.h> +#include <linux/netfilter.h> +#include <linux/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables.h> + +struct nft_lookup { + struct nft_set *set; + enum nft_registers sreg:8; + enum nft_registers dreg:8; + struct nft_set_binding binding; +}; + +static void nft_lookup_eval(const struct nft_expr *expr, + struct nft_data data[NFT_REG_MAX + 1], + const struct nft_pktinfo *pkt) +{ + const struct nft_lookup *priv = nft_expr_priv(expr); + const struct nft_set *set = priv->set; + + if (set->ops->lookup(set, &data[priv->sreg], &data[priv->dreg])) + return; + data[NFT_REG_VERDICT].verdict = NFT_BREAK; +} + +static const struct nla_policy nft_lookup_policy[NFTA_LOOKUP_MAX + 1] = { + [NFTA_LOOKUP_SET] = { .type = NLA_STRING }, + [NFTA_LOOKUP_SREG] = { .type = NLA_U32 }, + [NFTA_LOOKUP_DREG] = { .type = NLA_U32 }, +}; + +static int nft_lookup_init(const struct nft_ctx *ctx, + const struct nft_expr *expr, + const struct nlattr * const tb[]) +{ + struct nft_lookup *priv = nft_expr_priv(expr); + struct nft_set *set; + int err; + + if (tb[NFTA_LOOKUP_SET] == NULL || + tb[NFTA_LOOKUP_SREG] == NULL) + return -EINVAL; + + set = nf_tables_set_lookup(ctx->table, tb[NFTA_LOOKUP_SET]); + if (IS_ERR(set)) + return PTR_ERR(set); + + priv->sreg = ntohl(nla_get_be32(tb[NFTA_LOOKUP_SREG])); + err = nft_validate_input_register(priv->sreg); + if (err < 0) + return err; + + if (tb[NFTA_LOOKUP_DREG] != NULL) { + if (!(set->flags & NFT_SET_MAP)) + return -EINVAL; + + priv->dreg = ntohl(nla_get_be32(tb[NFTA_LOOKUP_DREG])); + err = nft_validate_output_register(priv->dreg); + if (err < 0) + return err; + + if (priv->dreg == NFT_REG_VERDICT) { + if (set->dtype != NFT_DATA_VERDICT) + return -EINVAL; + } else if (set->dtype == NFT_DATA_VERDICT) + return -EINVAL; + } else if (set->flags & NFT_SET_MAP) + return -EINVAL; + + err = nf_tables_bind_set(ctx, set, &priv->binding); + if (err < 0) + return err; + + priv->set = set; + return 0; +} + +static void nft_lookup_destroy(const struct nft_expr *expr) +{ + struct nft_lookup *priv = nft_expr_priv(expr); + + nf_tables_unbind_set(NULL, priv->set, &priv->binding); +} + +static int nft_lookup_dump(struct sk_buff *skb, const struct nft_expr *expr) +{ + const struct nft_lookup *priv = nft_expr_priv(expr); + + if (nla_put_string(skb, NFTA_LOOKUP_SET, priv->set->name)) + goto nla_put_failure; + if (nla_put_be32(skb, NFTA_LOOKUP_SREG, htonl(priv->sreg))) + goto nla_put_failure; + if (priv->set->flags & NFT_SET_MAP) + if (nla_put_be32(skb, NFTA_LOOKUP_DREG, htonl(priv->dreg))) + goto nla_put_failure; + return 0; + +nla_put_failure: + return -1; +} + +static struct nft_expr_type nft_lookup_type; +static const struct nft_expr_ops nft_lookup_ops = { + .type = &nft_lookup_type, + .size = NFT_EXPR_SIZE(sizeof(struct nft_lookup)), + .eval = nft_lookup_eval, + .init = nft_lookup_init, + .destroy = nft_lookup_destroy, + .dump = nft_lookup_dump, +}; + +static struct nft_expr_type nft_lookup_type __read_mostly = { + .name = "lookup", + .ops = &nft_lookup_ops, + .policy = nft_lookup_policy, + .maxattr = NFTA_LOOKUP_MAX, + .owner = THIS_MODULE, +}; + +int __init nft_lookup_module_init(void) +{ + return nft_register_expr(&nft_lookup_type); +} + +void nft_lookup_module_exit(void) +{ + nft_unregister_expr(&nft_lookup_type); +} diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c new file mode 100644 index 000000000000..8c28220a90b3 --- /dev/null +++ b/net/netfilter/nft_meta.c @@ -0,0 +1,228 @@ +/* + * Copyright (c) 2008-2009 Patrick McHardy <kaber@trash.net> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Development of this code funded by Astaro AG (http://www.astaro.com/) + */ + +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/module.h> +#include <linux/netlink.h> +#include <linux/netfilter.h> +#include <linux/netfilter/nf_tables.h> +#include <net/dst.h> +#include <net/sock.h> +#include <net/tcp_states.h> /* for TCP_TIME_WAIT */ +#include <net/netfilter/nf_tables.h> + +struct nft_meta { + enum nft_meta_keys key:8; + enum nft_registers dreg:8; +}; + +static void nft_meta_eval(const struct nft_expr *expr, + struct nft_data data[NFT_REG_MAX + 1], + const struct nft_pktinfo *pkt) +{ + const struct nft_meta *priv = nft_expr_priv(expr); + const struct sk_buff *skb = pkt->skb; + const struct net_device *in = pkt->in, *out = pkt->out; + struct nft_data *dest = &data[priv->dreg]; + + switch (priv->key) { + case NFT_META_LEN: + dest->data[0] = skb->len; + break; + case NFT_META_PROTOCOL: + *(__be16 *)dest->data = skb->protocol; + break; + case NFT_META_PRIORITY: + dest->data[0] = skb->priority; + break; + case NFT_META_MARK: + dest->data[0] = skb->mark; + break; + case NFT_META_IIF: + if (in == NULL) + goto err; + dest->data[0] = in->ifindex; + break; + case NFT_META_OIF: + if (out == NULL) + goto err; + dest->data[0] = out->ifindex; + break; + case NFT_META_IIFNAME: + if (in == NULL) + goto err; + strncpy((char *)dest->data, in->name, sizeof(dest->data)); + break; + case NFT_META_OIFNAME: + if (out == NULL) + goto err; + strncpy((char *)dest->data, out->name, sizeof(dest->data)); + break; + case NFT_META_IIFTYPE: + if (in == NULL) + goto err; + *(u16 *)dest->data = in->type; + break; + case NFT_META_OIFTYPE: + if (out == NULL) + goto err; + *(u16 *)dest->data = out->type; + break; + case NFT_META_SKUID: + if (skb->sk == NULL || skb->sk->sk_state == TCP_TIME_WAIT) + goto err; + + read_lock_bh(&skb->sk->sk_callback_lock); + if (skb->sk->sk_socket == NULL || + skb->sk->sk_socket->file == NULL) { + read_unlock_bh(&skb->sk->sk_callback_lock); + goto err; + } + + dest->data[0] = + from_kuid_munged(&init_user_ns, + skb->sk->sk_socket->file->f_cred->fsuid); + read_unlock_bh(&skb->sk->sk_callback_lock); + break; + case NFT_META_SKGID: + if (skb->sk == NULL || skb->sk->sk_state == TCP_TIME_WAIT) + goto err; + + read_lock_bh(&skb->sk->sk_callback_lock); + if (skb->sk->sk_socket == NULL || + skb->sk->sk_socket->file == NULL) { + read_unlock_bh(&skb->sk->sk_callback_lock); + goto err; + } + dest->data[0] = + from_kgid_munged(&init_user_ns, + skb->sk->sk_socket->file->f_cred->fsgid); + read_unlock_bh(&skb->sk->sk_callback_lock); + break; +#ifdef CONFIG_NET_CLS_ROUTE + case NFT_META_RTCLASSID: { + const struct dst_entry *dst = skb_dst(skb); + + if (dst == NULL) + goto err; + dest->data[0] = dst->tclassid; + break; + } +#endif +#ifdef CONFIG_NETWORK_SECMARK + case NFT_META_SECMARK: + dest->data[0] = skb->secmark; + break; +#endif + default: + WARN_ON(1); + goto err; + } + return; + +err: + data[NFT_REG_VERDICT].verdict = NFT_BREAK; +} + +static const struct nla_policy nft_meta_policy[NFTA_META_MAX + 1] = { + [NFTA_META_DREG] = { .type = NLA_U32 }, + [NFTA_META_KEY] = { .type = NLA_U32 }, +}; + +static int nft_meta_init(const struct nft_ctx *ctx, const struct nft_expr *expr, + const struct nlattr * const tb[]) +{ + struct nft_meta *priv = nft_expr_priv(expr); + int err; + + if (tb[NFTA_META_DREG] == NULL || + tb[NFTA_META_KEY] == NULL) + return -EINVAL; + + priv->key = ntohl(nla_get_be32(tb[NFTA_META_KEY])); + switch (priv->key) { + case NFT_META_LEN: + case NFT_META_PROTOCOL: + case NFT_META_PRIORITY: + case NFT_META_MARK: + case NFT_META_IIF: + case NFT_META_OIF: + case NFT_META_IIFNAME: + case NFT_META_OIFNAME: + case NFT_META_IIFTYPE: + case NFT_META_OIFTYPE: + case NFT_META_SKUID: + case NFT_META_SKGID: +#ifdef CONFIG_NET_CLS_ROUTE + case NFT_META_RTCLASSID: +#endif +#ifdef CONFIG_NETWORK_SECMARK + case NFT_META_SECMARK: +#endif + break; + default: + return -EOPNOTSUPP; + } + + priv->dreg = ntohl(nla_get_be32(tb[NFTA_META_DREG])); + err = nft_validate_output_register(priv->dreg); + if (err < 0) + return err; + return nft_validate_data_load(ctx, priv->dreg, NULL, NFT_DATA_VALUE); +} + +static int nft_meta_dump(struct sk_buff *skb, const struct nft_expr *expr) +{ + const struct nft_meta *priv = nft_expr_priv(expr); + + if (nla_put_be32(skb, NFTA_META_DREG, htonl(priv->dreg))) + goto nla_put_failure; + if (nla_put_be32(skb, NFTA_META_KEY, htonl(priv->key))) + goto nla_put_failure; + return 0; + +nla_put_failure: + return -1; +} + +static struct nft_expr_type nft_meta_type; +static const struct nft_expr_ops nft_meta_ops = { + .type = &nft_meta_type, + .size = NFT_EXPR_SIZE(sizeof(struct nft_meta)), + .eval = nft_meta_eval, + .init = nft_meta_init, + .dump = nft_meta_dump, +}; + +static struct nft_expr_type nft_meta_type __read_mostly = { + .name = "meta", + .ops = &nft_meta_ops, + .policy = nft_meta_policy, + .maxattr = NFTA_META_MAX, + .owner = THIS_MODULE, +}; + +static int __init nft_meta_module_init(void) +{ + return nft_register_expr(&nft_meta_type); +} + +static void __exit nft_meta_module_exit(void) +{ + nft_unregister_expr(&nft_meta_type); +} + +module_init(nft_meta_module_init); +module_exit(nft_meta_module_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>"); +MODULE_ALIAS_NFT_EXPR("meta"); diff --git a/net/netfilter/nft_meta_target.c b/net/netfilter/nft_meta_target.c new file mode 100644 index 000000000000..71177df75ffb --- /dev/null +++ b/net/netfilter/nft_meta_target.c @@ -0,0 +1,117 @@ +/* + * Copyright (c) 2008 Patrick McHardy <kaber@trash.net> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Development of this code funded by Astaro AG (http://www.astaro.com/) + */ + +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/list.h> +#include <linux/rbtree.h> +#include <linux/netlink.h> +#include <linux/netfilter.h> +#include <linux/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables.h> + +struct nft_meta { + enum nft_meta_keys key; +}; + +static void nft_meta_eval(const struct nft_expr *expr, + struct nft_data *nfres, + struct nft_data *data, + const struct nft_pktinfo *pkt) +{ + const struct nft_meta *meta = nft_expr_priv(expr); + struct sk_buff *skb = pkt->skb; + u32 val = data->data[0]; + + switch (meta->key) { + case NFT_META_MARK: + skb->mark = val; + break; + case NFT_META_PRIORITY: + skb->priority = val; + break; + case NFT_META_NFTRACE: + skb->nf_trace = val; + break; +#ifdef CONFIG_NETWORK_SECMARK + case NFT_META_SECMARK: + skb->secmark = val; + break; +#endif + default: + WARN_ON(1); + } +} + +static const struct nla_policy nft_meta_policy[NFTA_META_MAX + 1] = { + [NFTA_META_KEY] = { .type = NLA_U32 }, +}; + +static int nft_meta_init(const struct nft_expr *expr, struct nlattr *tb[]) +{ + struct nft_meta *meta = nft_expr_priv(expr); + + if (tb[NFTA_META_KEY] == NULL) + return -EINVAL; + + meta->key = ntohl(nla_get_be32(tb[NFTA_META_KEY])); + switch (meta->key) { + case NFT_META_MARK: + case NFT_META_PRIORITY: + case NFT_META_NFTRACE: +#ifdef CONFIG_NETWORK_SECMARK + case NFT_META_SECMARK: +#endif + break; + default: + return -EINVAL; + } + + return 0; +} + +static int nft_meta_dump(struct sk_buff *skb, const struct nft_expr *expr) +{ + struct nft_meta *meta = nft_expr_priv(expr); + + NLA_PUT_BE32(skb, NFTA_META_KEY, htonl(meta->key)); + return 0; + +nla_put_failure: + return -1; +} + +static struct nft_expr_ops meta_target __read_mostly = { + .name = "meta", + .size = NFT_EXPR_SIZE(sizeof(struct nft_meta)), + .owner = THIS_MODULE, + .eval = nft_meta_eval, + .init = nft_meta_init, + .dump = nft_meta_dump, + .policy = nft_meta_policy, + .maxattr = NFTA_META_MAX, +}; + +static int __init nft_meta_target_init(void) +{ + return nft_register_expr(&meta_target); +} + +static void __exit nft_meta_target_exit(void) +{ + nft_unregister_expr(&meta_target); +} + +module_init(nft_meta_target_init); +module_exit(nft_meta_target_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>"); +MODULE_ALIAS_NFT_EXPR("meta"); diff --git a/net/netfilter/nft_nat.c b/net/netfilter/nft_nat.c new file mode 100644 index 000000000000..d3b1ffe26181 --- /dev/null +++ b/net/netfilter/nft_nat.c @@ -0,0 +1,224 @@ +/* + * Copyright (c) 2008-2009 Patrick McHardy <kaber@trash.net> + * Copyright (c) 2012 Pablo Neira Ayuso <pablo@netfilter.org> + * Copyright (c) 2012 Intel Corporation + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + */ + +#include <linux/module.h> +#include <linux/init.h> +#include <linux/skbuff.h> +#include <linux/ip.h> +#include <linux/string.h> +#include <linux/netlink.h> +#include <linux/netfilter.h> +#include <linux/netfilter_ipv4.h> +#include <linux/netfilter/nfnetlink.h> +#include <linux/netfilter/nf_tables.h> +#include <net/netfilter/nf_conntrack.h> +#include <net/netfilter/nf_nat.h> +#include <net/netfilter/nf_nat_core.h> +#include <net/netfilter/nf_tables.h> +#include <net/netfilter/nf_nat_l3proto.h> +#include <net/ip.h> + +struct nft_nat { + enum nft_registers sreg_addr_min:8; + enum nft_registers sreg_addr_max:8; + enum nft_registers sreg_proto_min:8; + enum nft_registers sreg_proto_max:8; + int family; + enum nf_nat_manip_type type; +}; + +static void nft_nat_eval(const struct nft_expr *expr, + struct nft_data data[NFT_REG_MAX + 1], + const struct nft_pktinfo *pkt) +{ + const struct nft_nat *priv = nft_expr_priv(expr); + enum ip_conntrack_info ctinfo; + struct nf_conn *ct = nf_ct_get(pkt->skb, &ctinfo); + struct nf_nat_range range; + + memset(&range, 0, sizeof(range)); + if (priv->sreg_addr_min) { + if (priv->family == AF_INET) { + range.min_addr.ip = (__force __be32) + data[priv->sreg_addr_min].data[0]; + range.max_addr.ip = (__force __be32) + data[priv->sreg_addr_max].data[0]; + + } else { + memcpy(range.min_addr.ip6, + data[priv->sreg_addr_min].data, + sizeof(struct nft_data)); + memcpy(range.max_addr.ip6, + data[priv->sreg_addr_max].data, + sizeof(struct nft_data)); + } + range.flags |= NF_NAT_RANGE_MAP_IPS; + } + + if (priv->sreg_proto_min) { + range.min_proto.all = (__force __be16) + data[priv->sreg_proto_min].data[0]; + range.max_proto.all = (__force __be16) + data[priv->sreg_proto_max].data[0]; + range.flags |= NF_NAT_RANGE_PROTO_SPECIFIED; + } + + data[NFT_REG_VERDICT].verdict = + nf_nat_setup_info(ct, &range, priv->type); +} + +static const struct nla_policy nft_nat_policy[NFTA_NAT_MAX + 1] = { + [NFTA_NAT_TYPE] = { .type = NLA_U32 }, + [NFTA_NAT_FAMILY] = { .type = NLA_U32 }, + [NFTA_NAT_REG_ADDR_MIN] = { .type = NLA_U32 }, + [NFTA_NAT_REG_ADDR_MAX] = { .type = NLA_U32 }, + [NFTA_NAT_REG_PROTO_MIN] = { .type = NLA_U32 }, + [NFTA_NAT_REG_PROTO_MAX] = { .type = NLA_U32 }, +}; + +static int nft_nat_init(const struct nft_ctx *ctx, const struct nft_expr *expr, + const struct nlattr * const tb[]) +{ + struct nft_nat *priv = nft_expr_priv(expr); + int err; + + if (tb[NFTA_NAT_TYPE] == NULL) + return -EINVAL; + + switch (ntohl(nla_get_be32(tb[NFTA_NAT_TYPE]))) { + case NFT_NAT_SNAT: + priv->type = NF_NAT_MANIP_SRC; + break; + case NFT_NAT_DNAT: + priv->type = NF_NAT_MANIP_DST; + break; + default: + return -EINVAL; + } + + if (tb[NFTA_NAT_FAMILY] == NULL) + return -EINVAL; + + priv->family = ntohl(nla_get_be32(tb[NFTA_NAT_FAMILY])); + if (priv->family != AF_INET && priv->family != AF_INET6) + return -EINVAL; + + if (tb[NFTA_NAT_REG_ADDR_MIN]) { + priv->sreg_addr_min = ntohl(nla_get_be32( + tb[NFTA_NAT_REG_ADDR_MIN])); + err = nft_validate_input_register(priv->sreg_addr_min); + if (err < 0) + return err; + } + + if (tb[NFTA_NAT_REG_ADDR_MAX]) { + priv->sreg_addr_max = ntohl(nla_get_be32( + tb[NFTA_NAT_REG_ADDR_MAX])); + err = nft_validate_input_register(priv->sreg_addr_max); + if (err < 0) + return err; + } else + priv->sreg_addr_max = priv->sreg_addr_min; + + if (tb[NFTA_NAT_REG_PROTO_MIN]) { + priv->sreg_proto_min = ntohl(nla_get_be32( + tb[NFTA_NAT_REG_PROTO_MIN])); + err = nft_validate_input_register(priv->sreg_proto_min); + if (err < 0) + return err; + } + + if (tb[NFTA_NAT_REG_PROTO_MAX]) { + priv->sreg_proto_max = ntohl(nla_get_be32( + tb[NFTA_NAT_REG_PROTO_MAX])); + err = nft_validate_input_register(priv->sreg_proto_max); + if (err < 0) + return err; + } else + priv->sreg_proto_max = priv->sreg_proto_min; + + return 0; +} + +static int nft_nat_dump(struct sk_buff *skb, const struct nft_expr *expr) +{ + const struct nft_nat *priv = nft_expr_priv(expr); + + switch (priv->type) { + case NF_NAT_MANIP_SRC: + if (nla_put_be32(skb, NFTA_NAT_TYPE, htonl(NFT_NAT_SNAT))) + goto nla_put_failure; + break; + case NF_NAT_MANIP_DST: + if (nla_put_be32(skb, NFTA_NAT_TYPE, htonl(NFT_NAT_DNAT))) + goto nla_put_failure; + break; + } + + if (nla_put_be32(skb, NFTA_NAT_FAMILY, htonl(priv->family))) + goto nla_put_failure; + if (nla_put_be32(skb, + NFTA_NAT_REG_ADDR_MIN, htonl(priv->sreg_addr_min))) + goto nla_put_failure; + if (nla_put_be32(skb, + NFTA_NAT_REG_ADDR_MAX, htonl(priv->sreg_addr_max))) + goto nla_put_failure; + if (nla_put_be32(skb, + NFTA_NAT_REG_PROTO_MIN, htonl(priv->sreg_proto_min))) + goto nla_put_failure; + if (nla_put_be32(skb, + NFTA_NAT_REG_PROTO_MAX, htonl(priv->sreg_proto_max))) + goto nla_put_failure; + return 0; + +nla_put_failure: + return -1; +} + +static struct nft_expr_type nft_nat_type; +static const struct nft_expr_ops nft_nat_ops = { + .type = &nft_nat_type, + .size = NFT_EXPR_SIZE(sizeof(struct nft_nat)), + .eval = nft_nat_eval, + .init = nft_nat_init, + .dump = nft_nat_dump, +}; + +static struct nft_expr_type nft_nat_type __read_mostly = { + .name = "nat", + .ops = &nft_nat_ops, + .policy = nft_nat_policy, + .maxattr = NFTA_NAT_MAX, + .owner = THIS_MODULE, +}; + +static int __init nft_nat_module_init(void) +{ + int err; + + err = nft_register_expr(&nft_nat_type); + if (err < 0) + return err; + + return 0; +} + +static void __exit nft_nat_module_exit(void) +{ + nft_unregister_expr(&nft_nat_type); +} + +module_init(nft_nat_module_init); +module_exit(nft_nat_module_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>"); +MODULE_ALIAS_NFT_EXPR("nat"); diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c new file mode 100644 index 000000000000..a2aeb318678f --- /dev/null +++ b/net/netfilter/nft_payload.c @@ -0,0 +1,160 @@ +/* + * Copyright (c) 2008-2009 Patrick McHardy <kaber@trash.net> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Development of this code funded by Astaro AG (http://www.astaro.com/) + */ + +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/module.h> +#include <linux/netlink.h> +#include <linux/netfilter.h> +#include <linux/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables_core.h> +#include <net/netfilter/nf_tables.h> + +static void nft_payload_eval(const struct nft_expr *expr, + struct nft_data data[NFT_REG_MAX + 1], + const struct nft_pktinfo *pkt) +{ + const struct nft_payload *priv = nft_expr_priv(expr); + const struct sk_buff *skb = pkt->skb; + struct nft_data *dest = &data[priv->dreg]; + int offset; + + switch (priv->base) { + case NFT_PAYLOAD_LL_HEADER: + if (!skb_mac_header_was_set(skb)) + goto err; + offset = skb_mac_header(skb) - skb->data; + break; + case NFT_PAYLOAD_NETWORK_HEADER: + offset = skb_network_offset(skb); + break; + case NFT_PAYLOAD_TRANSPORT_HEADER: + offset = pkt->xt.thoff; + break; + default: + BUG(); + } + offset += priv->offset; + + if (skb_copy_bits(skb, offset, dest->data, priv->len) < 0) + goto err; + return; +err: + data[NFT_REG_VERDICT].verdict = NFT_BREAK; +} + +static const struct nla_policy nft_payload_policy[NFTA_PAYLOAD_MAX + 1] = { + [NFTA_PAYLOAD_DREG] = { .type = NLA_U32 }, + [NFTA_PAYLOAD_BASE] = { .type = NLA_U32 }, + [NFTA_PAYLOAD_OFFSET] = { .type = NLA_U32 }, + [NFTA_PAYLOAD_LEN] = { .type = NLA_U32 }, +}; + +static int nft_payload_init(const struct nft_ctx *ctx, + const struct nft_expr *expr, + const struct nlattr * const tb[]) +{ + struct nft_payload *priv = nft_expr_priv(expr); + int err; + + priv->base = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_BASE])); + priv->offset = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_OFFSET])); + priv->len = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_LEN])); + + priv->dreg = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_DREG])); + err = nft_validate_output_register(priv->dreg); + if (err < 0) + return err; + return nft_validate_data_load(ctx, priv->dreg, NULL, NFT_DATA_VALUE); +} + +static int nft_payload_dump(struct sk_buff *skb, const struct nft_expr *expr) +{ + const struct nft_payload *priv = nft_expr_priv(expr); + + if (nla_put_be32(skb, NFTA_PAYLOAD_DREG, htonl(priv->dreg)) || + nla_put_be32(skb, NFTA_PAYLOAD_BASE, htonl(priv->base)) || + nla_put_be32(skb, NFTA_PAYLOAD_OFFSET, htonl(priv->offset)) || + nla_put_be32(skb, NFTA_PAYLOAD_LEN, htonl(priv->len))) + goto nla_put_failure; + return 0; + +nla_put_failure: + return -1; +} + +static struct nft_expr_type nft_payload_type; +static const struct nft_expr_ops nft_payload_ops = { + .type = &nft_payload_type, + .size = NFT_EXPR_SIZE(sizeof(struct nft_payload)), + .eval = nft_payload_eval, + .init = nft_payload_init, + .dump = nft_payload_dump, +}; + +const struct nft_expr_ops nft_payload_fast_ops = { + .type = &nft_payload_type, + .size = NFT_EXPR_SIZE(sizeof(struct nft_payload)), + .eval = nft_payload_eval, + .init = nft_payload_init, + .dump = nft_payload_dump, +}; + +static const struct nft_expr_ops * +nft_payload_select_ops(const struct nft_ctx *ctx, + const struct nlattr * const tb[]) +{ + enum nft_payload_bases base; + unsigned int offset, len; + + if (tb[NFTA_PAYLOAD_DREG] == NULL || + tb[NFTA_PAYLOAD_BASE] == NULL || + tb[NFTA_PAYLOAD_OFFSET] == NULL || + tb[NFTA_PAYLOAD_LEN] == NULL) + return ERR_PTR(-EINVAL); + + base = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_BASE])); + switch (base) { + case NFT_PAYLOAD_LL_HEADER: + case NFT_PAYLOAD_NETWORK_HEADER: + case NFT_PAYLOAD_TRANSPORT_HEADER: + break; + default: + return ERR_PTR(-EOPNOTSUPP); + } + + offset = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_OFFSET])); + len = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_LEN])); + if (len == 0 || len > FIELD_SIZEOF(struct nft_data, data)) + return ERR_PTR(-EINVAL); + + if (len <= 4 && IS_ALIGNED(offset, len) && base != NFT_PAYLOAD_LL_HEADER) + return &nft_payload_fast_ops; + else + return &nft_payload_ops; +} + +static struct nft_expr_type nft_payload_type __read_mostly = { + .name = "payload", + .select_ops = nft_payload_select_ops, + .policy = nft_payload_policy, + .maxattr = NFTA_PAYLOAD_MAX, + .owner = THIS_MODULE, +}; + +int __init nft_payload_module_init(void) +{ + return nft_register_expr(&nft_payload_type); +} + +void nft_payload_module_exit(void) +{ + nft_unregister_expr(&nft_payload_type); +} diff --git a/net/netfilter/nft_rbtree.c b/net/netfilter/nft_rbtree.c new file mode 100644 index 000000000000..ca0c1b231bfe --- /dev/null +++ b/net/netfilter/nft_rbtree.c @@ -0,0 +1,247 @@ +/* + * Copyright (c) 2008-2009 Patrick McHardy <kaber@trash.net> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Development of this code funded by Astaro AG (http://www.astaro.com/) + */ + +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/module.h> +#include <linux/list.h> +#include <linux/rbtree.h> +#include <linux/netlink.h> +#include <linux/netfilter.h> +#include <linux/netfilter/nf_tables.h> +#include <net/netfilter/nf_tables.h> + +struct nft_rbtree { + struct rb_root root; +}; + +struct nft_rbtree_elem { + struct rb_node node; + u16 flags; + struct nft_data key; + struct nft_data data[]; +}; + +static bool nft_rbtree_lookup(const struct nft_set *set, + const struct nft_data *key, + struct nft_data *data) +{ + const struct nft_rbtree *priv = nft_set_priv(set); + const struct nft_rbtree_elem *rbe, *interval = NULL; + const struct rb_node *parent = priv->root.rb_node; + int d; + + while (parent != NULL) { + rbe = rb_entry(parent, struct nft_rbtree_elem, node); + + d = nft_data_cmp(&rbe->key, key, set->klen); + if (d < 0) { + parent = parent->rb_left; + interval = rbe; + } else if (d > 0) + parent = parent->rb_right; + else { +found: + if (rbe->flags & NFT_SET_ELEM_INTERVAL_END) + goto out; + if (set->flags & NFT_SET_MAP) + nft_data_copy(data, rbe->data); + return true; + } + } + + if (set->flags & NFT_SET_INTERVAL && interval != NULL) { + rbe = interval; + goto found; + } +out: + return false; +} + +static void nft_rbtree_elem_destroy(const struct nft_set *set, + struct nft_rbtree_elem *rbe) +{ + nft_data_uninit(&rbe->key, NFT_DATA_VALUE); + if (set->flags & NFT_SET_MAP) + nft_data_uninit(rbe->data, set->dtype); + kfree(rbe); +} + +static int __nft_rbtree_insert(const struct nft_set *set, + struct nft_rbtree_elem *new) +{ + struct nft_rbtree *priv = nft_set_priv(set); + struct nft_rbtree_elem *rbe; + struct rb_node *parent, **p; + int d; + + parent = NULL; + p = &priv->root.rb_node; + while (*p != NULL) { + parent = *p; + rbe = rb_entry(parent, struct nft_rbtree_elem, node); + d = nft_data_cmp(&rbe->key, &new->key, set->klen); + if (d < 0) + p = &parent->rb_left; + else if (d > 0) + p = &parent->rb_right; + else + return -EEXIST; + } + rb_link_node(&new->node, parent, p); + rb_insert_color(&new->node, &priv->root); + return 0; +} + +static int nft_rbtree_insert(const struct nft_set *set, + const struct nft_set_elem *elem) +{ + struct nft_rbtree_elem *rbe; + unsigned int size; + int err; + + size = sizeof(*rbe); + if (set->flags & NFT_SET_MAP) + size += sizeof(rbe->data[0]); + + rbe = kzalloc(size, GFP_KERNEL); + if (rbe == NULL) + return -ENOMEM; + + rbe->flags = elem->flags; + nft_data_copy(&rbe->key, &elem->key); + if (set->flags & NFT_SET_MAP) + nft_data_copy(rbe->data, &elem->data); + + err = __nft_rbtree_insert(set, rbe); + if (err < 0) + kfree(rbe); + return err; +} + +static void nft_rbtree_remove(const struct nft_set *set, + const struct nft_set_elem *elem) +{ + struct nft_rbtree *priv = nft_set_priv(set); + struct nft_rbtree_elem *rbe = elem->cookie; + + rb_erase(&rbe->node, &priv->root); + kfree(rbe); +} + +static int nft_rbtree_get(const struct nft_set *set, struct nft_set_elem *elem) +{ + const struct nft_rbtree *priv = nft_set_priv(set); + const struct rb_node *parent = priv->root.rb_node; + struct nft_rbtree_elem *rbe; + int d; + + while (parent != NULL) { + rbe = rb_entry(parent, struct nft_rbtree_elem, node); + + d = nft_data_cmp(&rbe->key, &elem->key, set->klen); + if (d < 0) + parent = parent->rb_left; + else if (d > 0) + parent = parent->rb_right; + else { + elem->cookie = rbe; + if (set->flags & NFT_SET_MAP) + nft_data_copy(&elem->data, rbe->data); + elem->flags = rbe->flags; + return 0; + } + } + return -ENOENT; +} + +static void nft_rbtree_walk(const struct nft_ctx *ctx, + const struct nft_set *set, + struct nft_set_iter *iter) +{ + const struct nft_rbtree *priv = nft_set_priv(set); + const struct nft_rbtree_elem *rbe; + struct nft_set_elem elem; + struct rb_node *node; + + for (node = rb_first(&priv->root); node != NULL; node = rb_next(node)) { + if (iter->count < iter->skip) + goto cont; + + rbe = rb_entry(node, struct nft_rbtree_elem, node); + nft_data_copy(&elem.key, &rbe->key); + if (set->flags & NFT_SET_MAP) + nft_data_copy(&elem.data, rbe->data); + elem.flags = rbe->flags; + + iter->err = iter->fn(ctx, set, iter, &elem); + if (iter->err < 0) + return; +cont: + iter->count++; + } +} + +static unsigned int nft_rbtree_privsize(const struct nlattr * const nla[]) +{ + return sizeof(struct nft_rbtree); +} + +static int nft_rbtree_init(const struct nft_set *set, + const struct nlattr * const nla[]) +{ + struct nft_rbtree *priv = nft_set_priv(set); + + priv->root = RB_ROOT; + return 0; +} + +static void nft_rbtree_destroy(const struct nft_set *set) +{ + struct nft_rbtree *priv = nft_set_priv(set); + struct nft_rbtree_elem *rbe; + struct rb_node *node; + + while ((node = priv->root.rb_node) != NULL) { + rb_erase(node, &priv->root); + rbe = rb_entry(node, struct nft_rbtree_elem, node); + nft_rbtree_elem_destroy(set, rbe); + } +} + +static struct nft_set_ops nft_rbtree_ops __read_mostly = { + .privsize = nft_rbtree_privsize, + .init = nft_rbtree_init, + .destroy = nft_rbtree_destroy, + .insert = nft_rbtree_insert, + .remove = nft_rbtree_remove, + .get = nft_rbtree_get, + .lookup = nft_rbtree_lookup, + .walk = nft_rbtree_walk, + .features = NFT_SET_INTERVAL | NFT_SET_MAP, + .owner = THIS_MODULE, +}; + +static int __init nft_rbtree_module_init(void) +{ + return nft_register_set(&nft_rbtree_ops); +} + +static void __exit nft_rbtree_module_exit(void) +{ + nft_unregister_set(&nft_rbtree_ops); +} + +module_init(nft_rbtree_module_init); +module_exit(nft_rbtree_module_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Patrick McHardy <kaber@trash.net>"); +MODULE_ALIAS_NFT_SET(); diff --git a/net/netfilter/xt_TCPMSS.c b/net/netfilter/xt_TCPMSS.c index cd24290f3b2f..e762de5ee89b 100644 --- a/net/netfilter/xt_TCPMSS.c +++ b/net/netfilter/xt_TCPMSS.c @@ -43,10 +43,42 @@ optlen(const u_int8_t *opt, unsigned int offset) return opt[offset+1]; } +static u_int32_t tcpmss_reverse_mtu(struct net *net, + const struct sk_buff *skb, + unsigned int family) +{ + struct flowi fl; + const struct nf_afinfo *ai; + struct rtable *rt = NULL; + u_int32_t mtu = ~0U; + + if (family == PF_INET) { + struct flowi4 *fl4 = &fl.u.ip4; + memset(fl4, 0, sizeof(*fl4)); + fl4->daddr = ip_hdr(skb)->saddr; + } else { + struct flowi6 *fl6 = &fl.u.ip6; + + memset(fl6, 0, sizeof(*fl6)); + fl6->daddr = ipv6_hdr(skb)->saddr; + } + rcu_read_lock(); + ai = nf_get_afinfo(family); + if (ai != NULL) + ai->route(net, (struct dst_entry **)&rt, &fl, false); + rcu_read_unlock(); + + if (rt != NULL) { + mtu = dst_mtu(&rt->dst); + dst_release(&rt->dst); + } + return mtu; +} + static int tcpmss_mangle_packet(struct sk_buff *skb, const struct xt_action_param *par, - unsigned int in_mtu, + unsigned int family, unsigned int tcphoff, unsigned int minlen) { @@ -76,6 +108,9 @@ tcpmss_mangle_packet(struct sk_buff *skb, return -1; if (info->mss == XT_TCPMSS_CLAMP_PMTU) { + struct net *net = dev_net(par->in ? par->in : par->out); + unsigned int in_mtu = tcpmss_reverse_mtu(net, skb, family); + if (dst_mtu(skb_dst(skb)) <= minlen) { net_err_ratelimited("unknown or invalid path-MTU (%u)\n", dst_mtu(skb_dst(skb))); @@ -165,37 +200,6 @@ tcpmss_mangle_packet(struct sk_buff *skb, return TCPOLEN_MSS; } -static u_int32_t tcpmss_reverse_mtu(const struct sk_buff *skb, - unsigned int family) -{ - struct flowi fl; - const struct nf_afinfo *ai; - struct rtable *rt = NULL; - u_int32_t mtu = ~0U; - - if (family == PF_INET) { - struct flowi4 *fl4 = &fl.u.ip4; - memset(fl4, 0, sizeof(*fl4)); - fl4->daddr = ip_hdr(skb)->saddr; - } else { - struct flowi6 *fl6 = &fl.u.ip6; - - memset(fl6, 0, sizeof(*fl6)); - fl6->daddr = ipv6_hdr(skb)->saddr; - } - rcu_read_lock(); - ai = nf_get_afinfo(family); - if (ai != NULL) - ai->route(&init_net, (struct dst_entry **)&rt, &fl, false); - rcu_read_unlock(); - - if (rt != NULL) { - mtu = dst_mtu(&rt->dst); - dst_release(&rt->dst); - } - return mtu; -} - static unsigned int tcpmss_tg4(struct sk_buff *skb, const struct xt_action_param *par) { @@ -204,7 +208,7 @@ tcpmss_tg4(struct sk_buff *skb, const struct xt_action_param *par) int ret; ret = tcpmss_mangle_packet(skb, par, - tcpmss_reverse_mtu(skb, PF_INET), + PF_INET, iph->ihl * 4, sizeof(*iph) + sizeof(struct tcphdr)); if (ret < 0) @@ -233,7 +237,7 @@ tcpmss_tg6(struct sk_buff *skb, const struct xt_action_param *par) if (tcphoff < 0) return NF_DROP; ret = tcpmss_mangle_packet(skb, par, - tcpmss_reverse_mtu(skb, PF_INET6), + PF_INET6, tcphoff, sizeof(*ipv6h) + sizeof(struct tcphdr)); if (ret < 0) diff --git a/net/netfilter/xt_TPROXY.c b/net/netfilter/xt_TPROXY.c index 5d8a3a3cd5a7..ef8a926752a9 100644 --- a/net/netfilter/xt_TPROXY.c +++ b/net/netfilter/xt_TPROXY.c @@ -200,7 +200,7 @@ nf_tproxy_get_sock_v6(struct net *net, const u8 protocol, in->ifindex); if (sk) { int connected = (sk->sk_state == TCP_ESTABLISHED); - int wildcard = ipv6_addr_any(&inet6_sk(sk)->rcv_saddr); + int wildcard = ipv6_addr_any(&sk->sk_v6_rcv_saddr); /* NOTE: we return listeners even if bound to * 0.0.0.0, those are filtered out in diff --git a/net/netfilter/xt_connbytes.c b/net/netfilter/xt_connbytes.c index e595e07a759b..1e634615ab9d 100644 --- a/net/netfilter/xt_connbytes.c +++ b/net/netfilter/xt_connbytes.c @@ -26,16 +26,18 @@ connbytes_mt(const struct sk_buff *skb, struct xt_action_param *par) u_int64_t what = 0; /* initialize to make gcc happy */ u_int64_t bytes = 0; u_int64_t pkts = 0; + const struct nf_conn_acct *acct; const struct nf_conn_counter *counters; ct = nf_ct_get(skb, &ctinfo); if (!ct) return false; - counters = nf_conn_acct_find(ct); - if (!counters) + acct = nf_conn_acct_find(ct); + if (!acct) return false; + counters = acct->counter; switch (sinfo->what) { case XT_CONNBYTES_PKTS: switch (sinfo->direction) { diff --git a/net/netfilter/xt_set.c b/net/netfilter/xt_set.c index 31790e789e22..e7c4e0e01ff5 100644 --- a/net/netfilter/xt_set.c +++ b/net/netfilter/xt_set.c @@ -81,7 +81,7 @@ set_match_v0_checkentry(const struct xt_mtchk_param *par) struct xt_set_info_match_v0 *info = par->matchinfo; ip_set_id_t index; - index = ip_set_nfnl_get_byindex(info->match_set.index); + index = ip_set_nfnl_get_byindex(par->net, info->match_set.index); if (index == IPSET_INVALID_ID) { pr_warning("Cannot find set indentified by id %u to match\n", @@ -91,7 +91,7 @@ set_match_v0_checkentry(const struct xt_mtchk_param *par) if (info->match_set.u.flags[IPSET_DIM_MAX-1] != 0) { pr_warning("Protocol error: set match dimension " "is over the limit!\n"); - ip_set_nfnl_put(info->match_set.index); + ip_set_nfnl_put(par->net, info->match_set.index); return -ERANGE; } @@ -106,9 +106,104 @@ set_match_v0_destroy(const struct xt_mtdtor_param *par) { struct xt_set_info_match_v0 *info = par->matchinfo; - ip_set_nfnl_put(info->match_set.index); + ip_set_nfnl_put(par->net, info->match_set.index); } +/* Revision 1 match */ + +static bool +set_match_v1(const struct sk_buff *skb, struct xt_action_param *par) +{ + const struct xt_set_info_match_v1 *info = par->matchinfo; + ADT_OPT(opt, par->family, info->match_set.dim, + info->match_set.flags, 0, UINT_MAX); + + if (opt.flags & IPSET_RETURN_NOMATCH) + opt.cmdflags |= IPSET_FLAG_RETURN_NOMATCH; + + return match_set(info->match_set.index, skb, par, &opt, + info->match_set.flags & IPSET_INV_MATCH); +} + +static int +set_match_v1_checkentry(const struct xt_mtchk_param *par) +{ + struct xt_set_info_match_v1 *info = par->matchinfo; + ip_set_id_t index; + + index = ip_set_nfnl_get_byindex(par->net, info->match_set.index); + + if (index == IPSET_INVALID_ID) { + pr_warning("Cannot find set indentified by id %u to match\n", + info->match_set.index); + return -ENOENT; + } + if (info->match_set.dim > IPSET_DIM_MAX) { + pr_warning("Protocol error: set match dimension " + "is over the limit!\n"); + ip_set_nfnl_put(par->net, info->match_set.index); + return -ERANGE; + } + + return 0; +} + +static void +set_match_v1_destroy(const struct xt_mtdtor_param *par) +{ + struct xt_set_info_match_v1 *info = par->matchinfo; + + ip_set_nfnl_put(par->net, info->match_set.index); +} + +/* Revision 3 match */ + +static bool +match_counter(u64 counter, const struct ip_set_counter_match *info) +{ + switch (info->op) { + case IPSET_COUNTER_NONE: + return true; + case IPSET_COUNTER_EQ: + return counter == info->value; + case IPSET_COUNTER_NE: + return counter != info->value; + case IPSET_COUNTER_LT: + return counter < info->value; + case IPSET_COUNTER_GT: + return counter > info->value; + } + return false; +} + +static bool +set_match_v3(const struct sk_buff *skb, struct xt_action_param *par) +{ + const struct xt_set_info_match_v3 *info = par->matchinfo; + ADT_OPT(opt, par->family, info->match_set.dim, + info->match_set.flags, info->flags, UINT_MAX); + int ret; + + if (info->packets.op != IPSET_COUNTER_NONE || + info->bytes.op != IPSET_COUNTER_NONE) + opt.cmdflags |= IPSET_FLAG_MATCH_COUNTERS; + + ret = match_set(info->match_set.index, skb, par, &opt, + info->match_set.flags & IPSET_INV_MATCH); + + if (!(ret && opt.cmdflags & IPSET_FLAG_MATCH_COUNTERS)) + return ret; + + if (!match_counter(opt.ext.packets, &info->packets)) + return 0; + return match_counter(opt.ext.bytes, &info->bytes); +} + +#define set_match_v3_checkentry set_match_v1_checkentry +#define set_match_v3_destroy set_match_v1_destroy + +/* Revision 0 interface: backward compatible with netfilter/iptables */ + static unsigned int set_target_v0(struct sk_buff *skb, const struct xt_action_param *par) { @@ -133,7 +228,7 @@ set_target_v0_checkentry(const struct xt_tgchk_param *par) ip_set_id_t index; if (info->add_set.index != IPSET_INVALID_ID) { - index = ip_set_nfnl_get_byindex(info->add_set.index); + index = ip_set_nfnl_get_byindex(par->net, info->add_set.index); if (index == IPSET_INVALID_ID) { pr_warning("Cannot find add_set index %u as target\n", info->add_set.index); @@ -142,12 +237,12 @@ set_target_v0_checkentry(const struct xt_tgchk_param *par) } if (info->del_set.index != IPSET_INVALID_ID) { - index = ip_set_nfnl_get_byindex(info->del_set.index); + index = ip_set_nfnl_get_byindex(par->net, info->del_set.index); if (index == IPSET_INVALID_ID) { pr_warning("Cannot find del_set index %u as target\n", info->del_set.index); if (info->add_set.index != IPSET_INVALID_ID) - ip_set_nfnl_put(info->add_set.index); + ip_set_nfnl_put(par->net, info->add_set.index); return -ENOENT; } } @@ -156,9 +251,9 @@ set_target_v0_checkentry(const struct xt_tgchk_param *par) pr_warning("Protocol error: SET target dimension " "is over the limit!\n"); if (info->add_set.index != IPSET_INVALID_ID) - ip_set_nfnl_put(info->add_set.index); + ip_set_nfnl_put(par->net, info->add_set.index); if (info->del_set.index != IPSET_INVALID_ID) - ip_set_nfnl_put(info->del_set.index); + ip_set_nfnl_put(par->net, info->del_set.index); return -ERANGE; } @@ -175,57 +270,12 @@ set_target_v0_destroy(const struct xt_tgdtor_param *par) const struct xt_set_info_target_v0 *info = par->targinfo; if (info->add_set.index != IPSET_INVALID_ID) - ip_set_nfnl_put(info->add_set.index); + ip_set_nfnl_put(par->net, info->add_set.index); if (info->del_set.index != IPSET_INVALID_ID) - ip_set_nfnl_put(info->del_set.index); + ip_set_nfnl_put(par->net, info->del_set.index); } -/* Revision 1 match and target */ - -static bool -set_match_v1(const struct sk_buff *skb, struct xt_action_param *par) -{ - const struct xt_set_info_match_v1 *info = par->matchinfo; - ADT_OPT(opt, par->family, info->match_set.dim, - info->match_set.flags, 0, UINT_MAX); - - if (opt.flags & IPSET_RETURN_NOMATCH) - opt.cmdflags |= IPSET_FLAG_RETURN_NOMATCH; - - return match_set(info->match_set.index, skb, par, &opt, - info->match_set.flags & IPSET_INV_MATCH); -} - -static int -set_match_v1_checkentry(const struct xt_mtchk_param *par) -{ - struct xt_set_info_match_v1 *info = par->matchinfo; - ip_set_id_t index; - - index = ip_set_nfnl_get_byindex(info->match_set.index); - - if (index == IPSET_INVALID_ID) { - pr_warning("Cannot find set indentified by id %u to match\n", - info->match_set.index); - return -ENOENT; - } - if (info->match_set.dim > IPSET_DIM_MAX) { - pr_warning("Protocol error: set match dimension " - "is over the limit!\n"); - ip_set_nfnl_put(info->match_set.index); - return -ERANGE; - } - - return 0; -} - -static void -set_match_v1_destroy(const struct xt_mtdtor_param *par) -{ - struct xt_set_info_match_v1 *info = par->matchinfo; - - ip_set_nfnl_put(info->match_set.index); -} +/* Revision 1 target */ static unsigned int set_target_v1(struct sk_buff *skb, const struct xt_action_param *par) @@ -251,7 +301,7 @@ set_target_v1_checkentry(const struct xt_tgchk_param *par) ip_set_id_t index; if (info->add_set.index != IPSET_INVALID_ID) { - index = ip_set_nfnl_get_byindex(info->add_set.index); + index = ip_set_nfnl_get_byindex(par->net, info->add_set.index); if (index == IPSET_INVALID_ID) { pr_warning("Cannot find add_set index %u as target\n", info->add_set.index); @@ -260,12 +310,12 @@ set_target_v1_checkentry(const struct xt_tgchk_param *par) } if (info->del_set.index != IPSET_INVALID_ID) { - index = ip_set_nfnl_get_byindex(info->del_set.index); + index = ip_set_nfnl_get_byindex(par->net, info->del_set.index); if (index == IPSET_INVALID_ID) { pr_warning("Cannot find del_set index %u as target\n", info->del_set.index); if (info->add_set.index != IPSET_INVALID_ID) - ip_set_nfnl_put(info->add_set.index); + ip_set_nfnl_put(par->net, info->add_set.index); return -ENOENT; } } @@ -274,9 +324,9 @@ set_target_v1_checkentry(const struct xt_tgchk_param *par) pr_warning("Protocol error: SET target dimension " "is over the limit!\n"); if (info->add_set.index != IPSET_INVALID_ID) - ip_set_nfnl_put(info->add_set.index); + ip_set_nfnl_put(par->net, info->add_set.index); if (info->del_set.index != IPSET_INVALID_ID) - ip_set_nfnl_put(info->del_set.index); + ip_set_nfnl_put(par->net, info->del_set.index); return -ERANGE; } @@ -289,9 +339,9 @@ set_target_v1_destroy(const struct xt_tgdtor_param *par) const struct xt_set_info_target_v1 *info = par->targinfo; if (info->add_set.index != IPSET_INVALID_ID) - ip_set_nfnl_put(info->add_set.index); + ip_set_nfnl_put(par->net, info->add_set.index); if (info->del_set.index != IPSET_INVALID_ID) - ip_set_nfnl_put(info->del_set.index); + ip_set_nfnl_put(par->net, info->del_set.index); } /* Revision 2 target */ @@ -320,52 +370,6 @@ set_target_v2(struct sk_buff *skb, const struct xt_action_param *par) #define set_target_v2_checkentry set_target_v1_checkentry #define set_target_v2_destroy set_target_v1_destroy -/* Revision 3 match */ - -static bool -match_counter(u64 counter, const struct ip_set_counter_match *info) -{ - switch (info->op) { - case IPSET_COUNTER_NONE: - return true; - case IPSET_COUNTER_EQ: - return counter == info->value; - case IPSET_COUNTER_NE: - return counter != info->value; - case IPSET_COUNTER_LT: - return counter < info->value; - case IPSET_COUNTER_GT: - return counter > info->value; - } - return false; -} - -static bool -set_match_v3(const struct sk_buff *skb, struct xt_action_param *par) -{ - const struct xt_set_info_match_v3 *info = par->matchinfo; - ADT_OPT(opt, par->family, info->match_set.dim, - info->match_set.flags, info->flags, UINT_MAX); - int ret; - - if (info->packets.op != IPSET_COUNTER_NONE || - info->bytes.op != IPSET_COUNTER_NONE) - opt.cmdflags |= IPSET_FLAG_MATCH_COUNTERS; - - ret = match_set(info->match_set.index, skb, par, &opt, - info->match_set.flags & IPSET_INV_MATCH); - - if (!(ret && opt.cmdflags & IPSET_FLAG_MATCH_COUNTERS)) - return ret; - - if (!match_counter(opt.ext.packets, &info->packets)) - return 0; - return match_counter(opt.ext.bytes, &info->bytes); -} - -#define set_match_v3_checkentry set_match_v1_checkentry -#define set_match_v3_destroy set_match_v1_destroy - static struct xt_match set_matches[] __read_mostly = { { .name = "set", diff --git a/net/netfilter/xt_socket.c b/net/netfilter/xt_socket.c index 06df2b9110f5..1ba67931eb1b 100644 --- a/net/netfilter/xt_socket.c +++ b/net/netfilter/xt_socket.c @@ -35,15 +35,6 @@ #include <net/netfilter/nf_conntrack.h> #endif -static void -xt_socket_put_sk(struct sock *sk) -{ - if (sk->sk_state == TCP_TIME_WAIT) - inet_twsk_put(inet_twsk(sk)); - else - sock_put(sk); -} - static int extract_icmp4_fields(const struct sk_buff *skb, u8 *protocol, @@ -216,7 +207,7 @@ socket_match(const struct sk_buff *skb, struct xt_action_param *par, inet_twsk(sk)->tw_transparent)); if (sk != skb->sk) - xt_socket_put_sk(sk); + sock_gen_put(sk); if (wildcard || !transparent) sk = NULL; @@ -370,7 +361,7 @@ socket_mt6_v1_v2(const struct sk_buff *skb, struct xt_action_param *par) */ wildcard = (!(info->flags & XT_SOCKET_NOWILDCARD) && sk->sk_state != TCP_TIME_WAIT && - ipv6_addr_any(&inet6_sk(sk)->rcv_saddr)); + ipv6_addr_any(&sk->sk_v6_rcv_saddr)); /* Ignore non-transparent sockets, if XT_SOCKET_TRANSPARENT is used */ @@ -381,7 +372,7 @@ socket_mt6_v1_v2(const struct sk_buff *skb, struct xt_action_param *par) inet_twsk(sk)->tw_transparent)); if (sk != skb->sk) - xt_socket_put_sk(sk); + sock_gen_put(sk); if (wildcard || !transparent) sk = NULL; diff --git a/net/netlabel/netlabel_kapi.c b/net/netlabel/netlabel_kapi.c index 96a458e12f60..dce1bebf7aec 100644 --- a/net/netlabel/netlabel_kapi.c +++ b/net/netlabel/netlabel_kapi.c @@ -817,7 +817,7 @@ int netlbl_req_setattr(struct request_sock *req, switch (req->rsk_ops->family) { case AF_INET: entry = netlbl_domhsh_getentry_af4(secattr->domain, - inet_rsk(req)->rmt_addr); + inet_rsk(req)->ir_rmt_addr); if (entry == NULL) { ret_val = -ENOENT; goto req_setattr_return; diff --git a/net/nfc/Kconfig b/net/nfc/Kconfig index 5948b2fc72f6..6e0fa0cce198 100644 --- a/net/nfc/Kconfig +++ b/net/nfc/Kconfig @@ -14,6 +14,20 @@ menuconfig NFC To compile this support as a module, choose M here: the module will be called nfc. +config NFC_DIGITAL + depends on NFC + select CRC_CCITT + select CRC_ITU_T + tristate "NFC Digital Protocol stack support" + default n + help + Say Y if you want to build NFC digital protocol stack support. + This is needed by NFC chipsets whose firmware only implement + the NFC analog layer. + + To compile this support as a module, choose M here: the module will + be called nfc_digital. + source "net/nfc/nci/Kconfig" source "net/nfc/hci/Kconfig" diff --git a/net/nfc/Makefile b/net/nfc/Makefile index a76f4533cb6c..2555ff8e7219 100644 --- a/net/nfc/Makefile +++ b/net/nfc/Makefile @@ -5,7 +5,9 @@ obj-$(CONFIG_NFC) += nfc.o obj-$(CONFIG_NFC_NCI) += nci/ obj-$(CONFIG_NFC_HCI) += hci/ +obj-$(CONFIG_NFC_DIGITAL) += nfc_digital.o nfc-objs := core.o netlink.o af_nfc.o rawsock.o llcp_core.o llcp_commands.o \ llcp_sock.o +nfc_digital-objs := digital_core.o digital_technology.o digital_dep.o diff --git a/net/nfc/core.c b/net/nfc/core.c index e92923cf3e03..872529105abc 100644 --- a/net/nfc/core.c +++ b/net/nfc/core.c @@ -384,6 +384,19 @@ int nfc_dep_link_is_up(struct nfc_dev *dev, u32 target_idx, { dev->dep_link_up = true; + if (!dev->active_target) { + struct nfc_target *target; + + target = nfc_find_target(dev, target_idx); + if (target == NULL) + return -ENOTCONN; + + dev->active_target = target; + } + + dev->polling = false; + dev->rf_mode = rf_mode; + nfc_llcp_mac_is_up(dev, target_idx, comm_mode, rf_mode); return nfc_genl_dep_link_up_event(dev, target_idx, comm_mode, rf_mode); @@ -536,7 +549,7 @@ error: return rc; } -static struct nfc_se *find_se(struct nfc_dev *dev, u32 se_idx) +struct nfc_se *nfc_find_se(struct nfc_dev *dev, u32 se_idx) { struct nfc_se *se, *n; @@ -546,6 +559,7 @@ static struct nfc_se *find_se(struct nfc_dev *dev, u32 se_idx) return NULL; } +EXPORT_SYMBOL(nfc_find_se); int nfc_enable_se(struct nfc_dev *dev, u32 se_idx) { @@ -577,7 +591,7 @@ int nfc_enable_se(struct nfc_dev *dev, u32 se_idx) goto error; } - se = find_se(dev, se_idx); + se = nfc_find_se(dev, se_idx); if (!se) { rc = -EINVAL; goto error; @@ -622,7 +636,7 @@ int nfc_disable_se(struct nfc_dev *dev, u32 se_idx) goto error; } - se = find_se(dev, se_idx); + se = nfc_find_se(dev, se_idx); if (!se) { rc = -EINVAL; goto error; @@ -881,7 +895,7 @@ int nfc_add_se(struct nfc_dev *dev, u32 se_idx, u16 type) pr_debug("%s se index %d\n", dev_name(&dev->dev), se_idx); - se = find_se(dev, se_idx); + se = nfc_find_se(dev, se_idx); if (se) return -EALREADY; diff --git a/net/nfc/digital.h b/net/nfc/digital.h new file mode 100644 index 000000000000..08b29b55ea63 --- /dev/null +++ b/net/nfc/digital.h @@ -0,0 +1,170 @@ +/* + * NFC Digital Protocol stack + * Copyright (c) 2013, Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + */ + +#ifndef __DIGITAL_H +#define __DIGITAL_H + +#include <net/nfc/nfc.h> +#include <net/nfc/digital.h> + +#include <linux/crc-ccitt.h> +#include <linux/crc-itu-t.h> + +#define PROTOCOL_ERR(req) pr_err("%d: NFC Digital Protocol error: %s\n", \ + __LINE__, req) + +#define DIGITAL_CMD_IN_SEND 0 +#define DIGITAL_CMD_TG_SEND 1 +#define DIGITAL_CMD_TG_LISTEN 2 +#define DIGITAL_CMD_TG_LISTEN_MDAA 3 + +#define DIGITAL_MAX_HEADER_LEN 7 +#define DIGITAL_CRC_LEN 2 + +#define DIGITAL_SENSF_NFCID2_NFC_DEP_B1 0x01 +#define DIGITAL_SENSF_NFCID2_NFC_DEP_B2 0xFE + +#define DIGITAL_SENS_RES_NFC_DEP 0x0100 +#define DIGITAL_SEL_RES_NFC_DEP 0x40 +#define DIGITAL_SENSF_FELICA_SC 0xFFFF + +#define DIGITAL_DRV_CAPS_IN_CRC(ddev) \ + ((ddev)->driver_capabilities & NFC_DIGITAL_DRV_CAPS_IN_CRC) +#define DIGITAL_DRV_CAPS_TG_CRC(ddev) \ + ((ddev)->driver_capabilities & NFC_DIGITAL_DRV_CAPS_TG_CRC) + +struct digital_data_exch { + data_exchange_cb_t cb; + void *cb_context; +}; + +struct sk_buff *digital_skb_alloc(struct nfc_digital_dev *ddev, + unsigned int len); + +int digital_send_cmd(struct nfc_digital_dev *ddev, u8 cmd_type, + struct sk_buff *skb, struct digital_tg_mdaa_params *params, + u16 timeout, nfc_digital_cmd_complete_t cmd_cb, + void *cb_context); + +int digital_in_configure_hw(struct nfc_digital_dev *ddev, int type, int param); +static inline int digital_in_send_cmd(struct nfc_digital_dev *ddev, + struct sk_buff *skb, u16 timeout, + nfc_digital_cmd_complete_t cmd_cb, + void *cb_context) +{ + return digital_send_cmd(ddev, DIGITAL_CMD_IN_SEND, skb, NULL, timeout, + cmd_cb, cb_context); +} + +void digital_poll_next_tech(struct nfc_digital_dev *ddev); + +int digital_in_send_sens_req(struct nfc_digital_dev *ddev, u8 rf_tech); +int digital_in_send_sensf_req(struct nfc_digital_dev *ddev, u8 rf_tech); + +int digital_target_found(struct nfc_digital_dev *ddev, + struct nfc_target *target, u8 protocol); + +int digital_in_recv_mifare_res(struct sk_buff *resp); + +int digital_in_send_atr_req(struct nfc_digital_dev *ddev, + struct nfc_target *target, __u8 comm_mode, __u8 *gb, + size_t gb_len); +int digital_in_send_dep_req(struct nfc_digital_dev *ddev, + struct nfc_target *target, struct sk_buff *skb, + struct digital_data_exch *data_exch); + +int digital_tg_configure_hw(struct nfc_digital_dev *ddev, int type, int param); +static inline int digital_tg_send_cmd(struct nfc_digital_dev *ddev, + struct sk_buff *skb, u16 timeout, + nfc_digital_cmd_complete_t cmd_cb, void *cb_context) +{ + return digital_send_cmd(ddev, DIGITAL_CMD_TG_SEND, skb, NULL, timeout, + cmd_cb, cb_context); +} + +void digital_tg_recv_sens_req(struct nfc_digital_dev *ddev, void *arg, + struct sk_buff *resp); + +void digital_tg_recv_sensf_req(struct nfc_digital_dev *ddev, void *arg, + struct sk_buff *resp); + +static inline int digital_tg_listen(struct nfc_digital_dev *ddev, u16 timeout, + nfc_digital_cmd_complete_t cb, void *arg) +{ + return digital_send_cmd(ddev, DIGITAL_CMD_TG_LISTEN, NULL, NULL, + timeout, cb, arg); +} + +void digital_tg_recv_atr_req(struct nfc_digital_dev *ddev, void *arg, + struct sk_buff *resp); + +int digital_tg_send_dep_res(struct nfc_digital_dev *ddev, struct sk_buff *skb); + +int digital_tg_listen_nfca(struct nfc_digital_dev *ddev, u8 rf_tech); +int digital_tg_listen_nfcf(struct nfc_digital_dev *ddev, u8 rf_tech); + +typedef u16 (*crc_func_t)(u16, const u8 *, size_t); + +#define CRC_A_INIT 0x6363 +#define CRC_B_INIT 0xFFFF +#define CRC_F_INIT 0x0000 + +void digital_skb_add_crc(struct sk_buff *skb, crc_func_t crc_func, u16 init, + u8 bitwise_inv, u8 msb_first); + +static inline void digital_skb_add_crc_a(struct sk_buff *skb) +{ + digital_skb_add_crc(skb, crc_ccitt, CRC_A_INIT, 0, 0); +} + +static inline void digital_skb_add_crc_b(struct sk_buff *skb) +{ + digital_skb_add_crc(skb, crc_ccitt, CRC_B_INIT, 1, 0); +} + +static inline void digital_skb_add_crc_f(struct sk_buff *skb) +{ + digital_skb_add_crc(skb, crc_itu_t, CRC_F_INIT, 0, 1); +} + +static inline void digital_skb_add_crc_none(struct sk_buff *skb) +{ + return; +} + +int digital_skb_check_crc(struct sk_buff *skb, crc_func_t crc_func, + u16 crc_init, u8 bitwise_inv, u8 msb_first); + +static inline int digital_skb_check_crc_a(struct sk_buff *skb) +{ + return digital_skb_check_crc(skb, crc_ccitt, CRC_A_INIT, 0, 0); +} + +static inline int digital_skb_check_crc_b(struct sk_buff *skb) +{ + return digital_skb_check_crc(skb, crc_ccitt, CRC_B_INIT, 1, 0); +} + +static inline int digital_skb_check_crc_f(struct sk_buff *skb) +{ + return digital_skb_check_crc(skb, crc_itu_t, CRC_F_INIT, 0, 1); +} + +static inline int digital_skb_check_crc_none(struct sk_buff *skb) +{ + return 0; +} + +#endif /* __DIGITAL_H */ diff --git a/net/nfc/digital_core.c b/net/nfc/digital_core.c new file mode 100644 index 000000000000..09fc95439955 --- /dev/null +++ b/net/nfc/digital_core.c @@ -0,0 +1,737 @@ +/* + * NFC Digital Protocol stack + * Copyright (c) 2013, Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + */ + +#define pr_fmt(fmt) "digital: %s: " fmt, __func__ + +#include <linux/module.h> + +#include "digital.h" + +#define DIGITAL_PROTO_NFCA_RF_TECH \ + (NFC_PROTO_JEWEL_MASK | NFC_PROTO_MIFARE_MASK | NFC_PROTO_NFC_DEP_MASK) + +#define DIGITAL_PROTO_NFCF_RF_TECH \ + (NFC_PROTO_FELICA_MASK | NFC_PROTO_NFC_DEP_MASK) + +struct digital_cmd { + struct list_head queue; + + u8 type; + u8 pending; + + u16 timeout; + struct sk_buff *req; + struct sk_buff *resp; + struct digital_tg_mdaa_params *mdaa_params; + + nfc_digital_cmd_complete_t cmd_cb; + void *cb_context; +}; + +struct sk_buff *digital_skb_alloc(struct nfc_digital_dev *ddev, + unsigned int len) +{ + struct sk_buff *skb; + + skb = alloc_skb(len + ddev->tx_headroom + ddev->tx_tailroom, + GFP_KERNEL); + if (skb) + skb_reserve(skb, ddev->tx_headroom); + + return skb; +} + +void digital_skb_add_crc(struct sk_buff *skb, crc_func_t crc_func, u16 init, + u8 bitwise_inv, u8 msb_first) +{ + u16 crc; + + crc = crc_func(init, skb->data, skb->len); + + if (bitwise_inv) + crc = ~crc; + + if (msb_first) + crc = __fswab16(crc); + + *skb_put(skb, 1) = crc & 0xFF; + *skb_put(skb, 1) = (crc >> 8) & 0xFF; +} + +int digital_skb_check_crc(struct sk_buff *skb, crc_func_t crc_func, + u16 crc_init, u8 bitwise_inv, u8 msb_first) +{ + int rc; + u16 crc; + + if (skb->len <= 2) + return -EIO; + + crc = crc_func(crc_init, skb->data, skb->len - 2); + + if (bitwise_inv) + crc = ~crc; + + if (msb_first) + crc = __swab16(crc); + + rc = (skb->data[skb->len - 2] - (crc & 0xFF)) + + (skb->data[skb->len - 1] - ((crc >> 8) & 0xFF)); + + if (rc) + return -EIO; + + skb_trim(skb, skb->len - 2); + + return 0; +} + +static inline void digital_switch_rf(struct nfc_digital_dev *ddev, bool on) +{ + ddev->ops->switch_rf(ddev, on); +} + +static inline void digital_abort_cmd(struct nfc_digital_dev *ddev) +{ + ddev->ops->abort_cmd(ddev); +} + +static void digital_wq_cmd_complete(struct work_struct *work) +{ + struct digital_cmd *cmd; + struct nfc_digital_dev *ddev = container_of(work, + struct nfc_digital_dev, + cmd_complete_work); + + mutex_lock(&ddev->cmd_lock); + + cmd = list_first_entry_or_null(&ddev->cmd_queue, struct digital_cmd, + queue); + if (!cmd) { + mutex_unlock(&ddev->cmd_lock); + return; + } + + list_del(&cmd->queue); + + mutex_unlock(&ddev->cmd_lock); + + if (!IS_ERR(cmd->resp)) + print_hex_dump_debug("DIGITAL RX: ", DUMP_PREFIX_NONE, 16, 1, + cmd->resp->data, cmd->resp->len, false); + + cmd->cmd_cb(ddev, cmd->cb_context, cmd->resp); + + kfree(cmd->mdaa_params); + kfree(cmd); + + schedule_work(&ddev->cmd_work); +} + +static void digital_send_cmd_complete(struct nfc_digital_dev *ddev, + void *arg, struct sk_buff *resp) +{ + struct digital_cmd *cmd = arg; + + cmd->resp = resp; + + schedule_work(&ddev->cmd_complete_work); +} + +static void digital_wq_cmd(struct work_struct *work) +{ + int rc; + struct digital_cmd *cmd; + struct digital_tg_mdaa_params *params; + struct nfc_digital_dev *ddev = container_of(work, + struct nfc_digital_dev, + cmd_work); + + mutex_lock(&ddev->cmd_lock); + + cmd = list_first_entry_or_null(&ddev->cmd_queue, struct digital_cmd, + queue); + if (!cmd || cmd->pending) { + mutex_unlock(&ddev->cmd_lock); + return; + } + + mutex_unlock(&ddev->cmd_lock); + + if (cmd->req) + print_hex_dump_debug("DIGITAL TX: ", DUMP_PREFIX_NONE, 16, 1, + cmd->req->data, cmd->req->len, false); + + switch (cmd->type) { + case DIGITAL_CMD_IN_SEND: + rc = ddev->ops->in_send_cmd(ddev, cmd->req, cmd->timeout, + digital_send_cmd_complete, cmd); + break; + + case DIGITAL_CMD_TG_SEND: + rc = ddev->ops->tg_send_cmd(ddev, cmd->req, cmd->timeout, + digital_send_cmd_complete, cmd); + break; + + case DIGITAL_CMD_TG_LISTEN: + rc = ddev->ops->tg_listen(ddev, cmd->timeout, + digital_send_cmd_complete, cmd); + break; + + case DIGITAL_CMD_TG_LISTEN_MDAA: + params = cmd->mdaa_params; + + rc = ddev->ops->tg_listen_mdaa(ddev, params, cmd->timeout, + digital_send_cmd_complete, cmd); + break; + + default: + pr_err("Unknown cmd type %d\n", cmd->type); + return; + } + + if (!rc) + return; + + pr_err("in_send_command returned err %d\n", rc); + + mutex_lock(&ddev->cmd_lock); + list_del(&cmd->queue); + mutex_unlock(&ddev->cmd_lock); + + kfree_skb(cmd->req); + kfree(cmd->mdaa_params); + kfree(cmd); + + schedule_work(&ddev->cmd_work); +} + +int digital_send_cmd(struct nfc_digital_dev *ddev, u8 cmd_type, + struct sk_buff *skb, struct digital_tg_mdaa_params *params, + u16 timeout, nfc_digital_cmd_complete_t cmd_cb, + void *cb_context) +{ + struct digital_cmd *cmd; + + cmd = kzalloc(sizeof(struct digital_cmd), GFP_KERNEL); + if (!cmd) + return -ENOMEM; + + cmd->type = cmd_type; + cmd->timeout = timeout; + cmd->req = skb; + cmd->mdaa_params = params; + cmd->cmd_cb = cmd_cb; + cmd->cb_context = cb_context; + INIT_LIST_HEAD(&cmd->queue); + + mutex_lock(&ddev->cmd_lock); + list_add_tail(&cmd->queue, &ddev->cmd_queue); + mutex_unlock(&ddev->cmd_lock); + + schedule_work(&ddev->cmd_work); + + return 0; +} + +int digital_in_configure_hw(struct nfc_digital_dev *ddev, int type, int param) +{ + int rc; + + rc = ddev->ops->in_configure_hw(ddev, type, param); + if (rc) + pr_err("in_configure_hw failed: %d\n", rc); + + return rc; +} + +int digital_tg_configure_hw(struct nfc_digital_dev *ddev, int type, int param) +{ + int rc; + + rc = ddev->ops->tg_configure_hw(ddev, type, param); + if (rc) + pr_err("tg_configure_hw failed: %d\n", rc); + + return rc; +} + +static int digital_tg_listen_mdaa(struct nfc_digital_dev *ddev, u8 rf_tech) +{ + struct digital_tg_mdaa_params *params; + + params = kzalloc(sizeof(struct digital_tg_mdaa_params), GFP_KERNEL); + if (!params) + return -ENOMEM; + + params->sens_res = DIGITAL_SENS_RES_NFC_DEP; + get_random_bytes(params->nfcid1, sizeof(params->nfcid1)); + params->sel_res = DIGITAL_SEL_RES_NFC_DEP; + + params->nfcid2[0] = DIGITAL_SENSF_NFCID2_NFC_DEP_B1; + params->nfcid2[1] = DIGITAL_SENSF_NFCID2_NFC_DEP_B2; + get_random_bytes(params->nfcid2 + 2, NFC_NFCID2_MAXSIZE - 2); + params->sc = DIGITAL_SENSF_FELICA_SC; + + return digital_send_cmd(ddev, DIGITAL_CMD_TG_LISTEN_MDAA, NULL, params, + 500, digital_tg_recv_atr_req, NULL); +} + +int digital_target_found(struct nfc_digital_dev *ddev, + struct nfc_target *target, u8 protocol) +{ + int rc; + u8 framing; + u8 rf_tech; + int (*check_crc)(struct sk_buff *skb); + void (*add_crc)(struct sk_buff *skb); + + rf_tech = ddev->poll_techs[ddev->poll_tech_index].rf_tech; + + switch (protocol) { + case NFC_PROTO_JEWEL: + framing = NFC_DIGITAL_FRAMING_NFCA_T1T; + check_crc = digital_skb_check_crc_b; + add_crc = digital_skb_add_crc_b; + break; + + case NFC_PROTO_MIFARE: + framing = NFC_DIGITAL_FRAMING_NFCA_T2T; + check_crc = digital_skb_check_crc_a; + add_crc = digital_skb_add_crc_a; + break; + + case NFC_PROTO_FELICA: + framing = NFC_DIGITAL_FRAMING_NFCF_T3T; + check_crc = digital_skb_check_crc_f; + add_crc = digital_skb_add_crc_f; + break; + + case NFC_PROTO_NFC_DEP: + if (rf_tech == NFC_DIGITAL_RF_TECH_106A) { + framing = NFC_DIGITAL_FRAMING_NFCA_NFC_DEP; + check_crc = digital_skb_check_crc_a; + add_crc = digital_skb_add_crc_a; + } else { + framing = NFC_DIGITAL_FRAMING_NFCF_NFC_DEP; + check_crc = digital_skb_check_crc_f; + add_crc = digital_skb_add_crc_f; + } + break; + + default: + pr_err("Invalid protocol %d\n", protocol); + return -EINVAL; + } + + pr_debug("rf_tech=%d, protocol=%d\n", rf_tech, protocol); + + ddev->curr_rf_tech = rf_tech; + ddev->curr_protocol = protocol; + + if (DIGITAL_DRV_CAPS_IN_CRC(ddev)) { + ddev->skb_add_crc = digital_skb_add_crc_none; + ddev->skb_check_crc = digital_skb_check_crc_none; + } else { + ddev->skb_add_crc = add_crc; + ddev->skb_check_crc = check_crc; + } + + rc = digital_in_configure_hw(ddev, NFC_DIGITAL_CONFIG_FRAMING, framing); + if (rc) + return rc; + + target->supported_protocols = (1 << protocol); + rc = nfc_targets_found(ddev->nfc_dev, target, 1); + if (rc) + return rc; + + ddev->poll_tech_count = 0; + + return 0; +} + +void digital_poll_next_tech(struct nfc_digital_dev *ddev) +{ + digital_switch_rf(ddev, 0); + + mutex_lock(&ddev->poll_lock); + + if (!ddev->poll_tech_count) { + mutex_unlock(&ddev->poll_lock); + return; + } + + ddev->poll_tech_index = (ddev->poll_tech_index + 1) % + ddev->poll_tech_count; + + mutex_unlock(&ddev->poll_lock); + + schedule_work(&ddev->poll_work); +} + +static void digital_wq_poll(struct work_struct *work) +{ + int rc; + struct digital_poll_tech *poll_tech; + struct nfc_digital_dev *ddev = container_of(work, + struct nfc_digital_dev, + poll_work); + mutex_lock(&ddev->poll_lock); + + if (!ddev->poll_tech_count) { + mutex_unlock(&ddev->poll_lock); + return; + } + + poll_tech = &ddev->poll_techs[ddev->poll_tech_index]; + + mutex_unlock(&ddev->poll_lock); + + rc = poll_tech->poll_func(ddev, poll_tech->rf_tech); + if (rc) + digital_poll_next_tech(ddev); +} + +static void digital_add_poll_tech(struct nfc_digital_dev *ddev, u8 rf_tech, + digital_poll_t poll_func) +{ + struct digital_poll_tech *poll_tech; + + if (ddev->poll_tech_count >= NFC_DIGITAL_POLL_MODE_COUNT_MAX) + return; + + poll_tech = &ddev->poll_techs[ddev->poll_tech_count++]; + + poll_tech->rf_tech = rf_tech; + poll_tech->poll_func = poll_func; +} + +/** + * start_poll operation + * + * For every supported protocol, the corresponding polling function is added + * to the table of polling technologies (ddev->poll_techs[]) using + * digital_add_poll_tech(). + * When a polling function fails (by timeout or protocol error) the next one is + * schedule by digital_poll_next_tech() on the poll workqueue (ddev->poll_work). + */ +static int digital_start_poll(struct nfc_dev *nfc_dev, __u32 im_protocols, + __u32 tm_protocols) +{ + struct nfc_digital_dev *ddev = nfc_get_drvdata(nfc_dev); + u32 matching_im_protocols, matching_tm_protocols; + + pr_debug("protocols: im 0x%x, tm 0x%x, supported 0x%x\n", im_protocols, + tm_protocols, ddev->protocols); + + matching_im_protocols = ddev->protocols & im_protocols; + matching_tm_protocols = ddev->protocols & tm_protocols; + + if (!matching_im_protocols && !matching_tm_protocols) { + pr_err("Unknown protocol\n"); + return -EINVAL; + } + + if (ddev->poll_tech_count) { + pr_err("Already polling\n"); + return -EBUSY; + } + + if (ddev->curr_protocol) { + pr_err("A target is already active\n"); + return -EBUSY; + } + + ddev->poll_tech_count = 0; + ddev->poll_tech_index = 0; + + if (matching_im_protocols & DIGITAL_PROTO_NFCA_RF_TECH) + digital_add_poll_tech(ddev, NFC_DIGITAL_RF_TECH_106A, + digital_in_send_sens_req); + + if (im_protocols & DIGITAL_PROTO_NFCF_RF_TECH) { + digital_add_poll_tech(ddev, NFC_DIGITAL_RF_TECH_212F, + digital_in_send_sensf_req); + + digital_add_poll_tech(ddev, NFC_DIGITAL_RF_TECH_424F, + digital_in_send_sensf_req); + } + + if (tm_protocols & NFC_PROTO_NFC_DEP_MASK) { + if (ddev->ops->tg_listen_mdaa) { + digital_add_poll_tech(ddev, 0, + digital_tg_listen_mdaa); + } else { + digital_add_poll_tech(ddev, NFC_DIGITAL_RF_TECH_106A, + digital_tg_listen_nfca); + + digital_add_poll_tech(ddev, NFC_DIGITAL_RF_TECH_212F, + digital_tg_listen_nfcf); + + digital_add_poll_tech(ddev, NFC_DIGITAL_RF_TECH_424F, + digital_tg_listen_nfcf); + } + } + + if (!ddev->poll_tech_count) { + pr_err("Unsupported protocols: im=0x%x, tm=0x%x\n", + matching_im_protocols, matching_tm_protocols); + return -EINVAL; + } + + schedule_work(&ddev->poll_work); + + return 0; +} + +static void digital_stop_poll(struct nfc_dev *nfc_dev) +{ + struct nfc_digital_dev *ddev = nfc_get_drvdata(nfc_dev); + + mutex_lock(&ddev->poll_lock); + + if (!ddev->poll_tech_count) { + pr_err("Polling operation was not running\n"); + mutex_unlock(&ddev->poll_lock); + return; + } + + ddev->poll_tech_count = 0; + + mutex_unlock(&ddev->poll_lock); + + cancel_work_sync(&ddev->poll_work); + + digital_abort_cmd(ddev); +} + +static int digital_dev_up(struct nfc_dev *nfc_dev) +{ + struct nfc_digital_dev *ddev = nfc_get_drvdata(nfc_dev); + + digital_switch_rf(ddev, 1); + + return 0; +} + +static int digital_dev_down(struct nfc_dev *nfc_dev) +{ + struct nfc_digital_dev *ddev = nfc_get_drvdata(nfc_dev); + + digital_switch_rf(ddev, 0); + + return 0; +} + +static int digital_dep_link_up(struct nfc_dev *nfc_dev, + struct nfc_target *target, + __u8 comm_mode, __u8 *gb, size_t gb_len) +{ + struct nfc_digital_dev *ddev = nfc_get_drvdata(nfc_dev); + + return digital_in_send_atr_req(ddev, target, comm_mode, gb, gb_len); +} + +static int digital_dep_link_down(struct nfc_dev *nfc_dev) +{ + struct nfc_digital_dev *ddev = nfc_get_drvdata(nfc_dev); + + ddev->curr_protocol = 0; + + return 0; +} + +static int digital_activate_target(struct nfc_dev *nfc_dev, + struct nfc_target *target, __u32 protocol) +{ + return 0; +} + +static void digital_deactivate_target(struct nfc_dev *nfc_dev, + struct nfc_target *target) +{ + struct nfc_digital_dev *ddev = nfc_get_drvdata(nfc_dev); + + ddev->curr_protocol = 0; +} + +static int digital_tg_send(struct nfc_dev *dev, struct sk_buff *skb) +{ + struct nfc_digital_dev *ddev = nfc_get_drvdata(dev); + + return digital_tg_send_dep_res(ddev, skb); +} + +static void digital_in_send_complete(struct nfc_digital_dev *ddev, void *arg, + struct sk_buff *resp) +{ + struct digital_data_exch *data_exch = arg; + int rc; + + if (IS_ERR(resp)) { + rc = PTR_ERR(resp); + goto done; + } + + if (ddev->curr_protocol == NFC_PROTO_MIFARE) + rc = digital_in_recv_mifare_res(resp); + else + rc = ddev->skb_check_crc(resp); + + if (rc) { + kfree_skb(resp); + resp = NULL; + } + +done: + data_exch->cb(data_exch->cb_context, resp, rc); + + kfree(data_exch); +} + +static int digital_in_send(struct nfc_dev *nfc_dev, struct nfc_target *target, + struct sk_buff *skb, data_exchange_cb_t cb, + void *cb_context) +{ + struct nfc_digital_dev *ddev = nfc_get_drvdata(nfc_dev); + struct digital_data_exch *data_exch; + + data_exch = kzalloc(sizeof(struct digital_data_exch), GFP_KERNEL); + if (!data_exch) { + pr_err("Failed to allocate data_exch struct\n"); + return -ENOMEM; + } + + data_exch->cb = cb; + data_exch->cb_context = cb_context; + + if (ddev->curr_protocol == NFC_PROTO_NFC_DEP) + return digital_in_send_dep_req(ddev, target, skb, data_exch); + + ddev->skb_add_crc(skb); + + return digital_in_send_cmd(ddev, skb, 500, digital_in_send_complete, + data_exch); +} + +static struct nfc_ops digital_nfc_ops = { + .dev_up = digital_dev_up, + .dev_down = digital_dev_down, + .start_poll = digital_start_poll, + .stop_poll = digital_stop_poll, + .dep_link_up = digital_dep_link_up, + .dep_link_down = digital_dep_link_down, + .activate_target = digital_activate_target, + .deactivate_target = digital_deactivate_target, + .tm_send = digital_tg_send, + .im_transceive = digital_in_send, +}; + +struct nfc_digital_dev *nfc_digital_allocate_device(struct nfc_digital_ops *ops, + __u32 supported_protocols, + __u32 driver_capabilities, + int tx_headroom, int tx_tailroom) +{ + struct nfc_digital_dev *ddev; + + if (!ops->in_configure_hw || !ops->in_send_cmd || !ops->tg_listen || + !ops->tg_configure_hw || !ops->tg_send_cmd || !ops->abort_cmd || + !ops->switch_rf) + return NULL; + + ddev = kzalloc(sizeof(struct nfc_digital_dev), GFP_KERNEL); + if (!ddev) + return NULL; + + ddev->driver_capabilities = driver_capabilities; + ddev->ops = ops; + + mutex_init(&ddev->cmd_lock); + INIT_LIST_HEAD(&ddev->cmd_queue); + + INIT_WORK(&ddev->cmd_work, digital_wq_cmd); + INIT_WORK(&ddev->cmd_complete_work, digital_wq_cmd_complete); + + mutex_init(&ddev->poll_lock); + INIT_WORK(&ddev->poll_work, digital_wq_poll); + + if (supported_protocols & NFC_PROTO_JEWEL_MASK) + ddev->protocols |= NFC_PROTO_JEWEL_MASK; + if (supported_protocols & NFC_PROTO_MIFARE_MASK) + ddev->protocols |= NFC_PROTO_MIFARE_MASK; + if (supported_protocols & NFC_PROTO_FELICA_MASK) + ddev->protocols |= NFC_PROTO_FELICA_MASK; + if (supported_protocols & NFC_PROTO_NFC_DEP_MASK) + ddev->protocols |= NFC_PROTO_NFC_DEP_MASK; + + ddev->tx_headroom = tx_headroom + DIGITAL_MAX_HEADER_LEN; + ddev->tx_tailroom = tx_tailroom + DIGITAL_CRC_LEN; + + ddev->nfc_dev = nfc_allocate_device(&digital_nfc_ops, ddev->protocols, + ddev->tx_headroom, + ddev->tx_tailroom); + if (!ddev->nfc_dev) { + pr_err("nfc_allocate_device failed\n"); + goto free_dev; + } + + nfc_set_drvdata(ddev->nfc_dev, ddev); + + return ddev; + +free_dev: + kfree(ddev); + + return NULL; +} +EXPORT_SYMBOL(nfc_digital_allocate_device); + +void nfc_digital_free_device(struct nfc_digital_dev *ddev) +{ + nfc_free_device(ddev->nfc_dev); + kfree(ddev); +} +EXPORT_SYMBOL(nfc_digital_free_device); + +int nfc_digital_register_device(struct nfc_digital_dev *ddev) +{ + return nfc_register_device(ddev->nfc_dev); +} +EXPORT_SYMBOL(nfc_digital_register_device); + +void nfc_digital_unregister_device(struct nfc_digital_dev *ddev) +{ + struct digital_cmd *cmd, *n; + + nfc_unregister_device(ddev->nfc_dev); + + mutex_lock(&ddev->poll_lock); + ddev->poll_tech_count = 0; + mutex_unlock(&ddev->poll_lock); + + cancel_work_sync(&ddev->poll_work); + cancel_work_sync(&ddev->cmd_work); + cancel_work_sync(&ddev->cmd_complete_work); + + list_for_each_entry_safe(cmd, n, &ddev->cmd_queue, queue) { + list_del(&cmd->queue); + kfree(cmd->mdaa_params); + kfree(cmd); + } +} +EXPORT_SYMBOL(nfc_digital_unregister_device); + +MODULE_LICENSE("GPL"); diff --git a/net/nfc/digital_dep.c b/net/nfc/digital_dep.c new file mode 100644 index 000000000000..07bbc24fb4c7 --- /dev/null +++ b/net/nfc/digital_dep.c @@ -0,0 +1,729 @@ +/* + * NFC Digital Protocol stack + * Copyright (c) 2013, Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + */ + +#define pr_fmt(fmt) "digital: %s: " fmt, __func__ + +#include "digital.h" + +#define DIGITAL_NFC_DEP_FRAME_DIR_OUT 0xD4 +#define DIGITAL_NFC_DEP_FRAME_DIR_IN 0xD5 + +#define DIGITAL_NFC_DEP_NFCA_SOD_SB 0xF0 + +#define DIGITAL_CMD_ATR_REQ 0x00 +#define DIGITAL_CMD_ATR_RES 0x01 +#define DIGITAL_CMD_PSL_REQ 0x04 +#define DIGITAL_CMD_PSL_RES 0x05 +#define DIGITAL_CMD_DEP_REQ 0x06 +#define DIGITAL_CMD_DEP_RES 0x07 + +#define DIGITAL_ATR_REQ_MIN_SIZE 16 +#define DIGITAL_ATR_REQ_MAX_SIZE 64 + +#define DIGITAL_NFCID3_LEN ((u8)8) +#define DIGITAL_LR_BITS_PAYLOAD_SIZE_254B 0x30 +#define DIGITAL_GB_BIT 0x02 + +#define DIGITAL_NFC_DEP_PFB_TYPE(pfb) ((pfb) & 0xE0) + +#define DIGITAL_NFC_DEP_PFB_TIMEOUT_BIT 0x10 + +#define DIGITAL_NFC_DEP_PFB_IS_TIMEOUT(pfb) \ + ((pfb) & DIGITAL_NFC_DEP_PFB_TIMEOUT_BIT) +#define DIGITAL_NFC_DEP_MI_BIT_SET(pfb) ((pfb) & 0x10) +#define DIGITAL_NFC_DEP_NAD_BIT_SET(pfb) ((pfb) & 0x08) +#define DIGITAL_NFC_DEP_DID_BIT_SET(pfb) ((pfb) & 0x04) +#define DIGITAL_NFC_DEP_PFB_PNI(pfb) ((pfb) & 0x03) + +#define DIGITAL_NFC_DEP_PFB_I_PDU 0x00 +#define DIGITAL_NFC_DEP_PFB_ACK_NACK_PDU 0x40 +#define DIGITAL_NFC_DEP_PFB_SUPERVISOR_PDU 0x80 + +struct digital_atr_req { + u8 dir; + u8 cmd; + u8 nfcid3[10]; + u8 did; + u8 bs; + u8 br; + u8 pp; + u8 gb[0]; +} __packed; + +struct digital_atr_res { + u8 dir; + u8 cmd; + u8 nfcid3[10]; + u8 did; + u8 bs; + u8 br; + u8 to; + u8 pp; + u8 gb[0]; +} __packed; + +struct digital_psl_req { + u8 dir; + u8 cmd; + u8 did; + u8 brs; + u8 fsl; +} __packed; + +struct digital_psl_res { + u8 dir; + u8 cmd; + u8 did; +} __packed; + +struct digital_dep_req_res { + u8 dir; + u8 cmd; + u8 pfb; +} __packed; + +static void digital_in_recv_dep_res(struct nfc_digital_dev *ddev, void *arg, + struct sk_buff *resp); + +static void digital_skb_push_dep_sod(struct nfc_digital_dev *ddev, + struct sk_buff *skb) +{ + skb_push(skb, sizeof(u8)); + + skb->data[0] = skb->len; + + if (ddev->curr_rf_tech == NFC_DIGITAL_RF_TECH_106A) + *skb_push(skb, sizeof(u8)) = DIGITAL_NFC_DEP_NFCA_SOD_SB; +} + +static int digital_skb_pull_dep_sod(struct nfc_digital_dev *ddev, + struct sk_buff *skb) +{ + u8 size; + + if (skb->len < 2) + return -EIO; + + if (ddev->curr_rf_tech == NFC_DIGITAL_RF_TECH_106A) + skb_pull(skb, sizeof(u8)); + + size = skb->data[0]; + if (size != skb->len) + return -EIO; + + skb_pull(skb, sizeof(u8)); + + return 0; +} + +static void digital_in_recv_atr_res(struct nfc_digital_dev *ddev, void *arg, + struct sk_buff *resp) +{ + struct nfc_target *target = arg; + struct digital_atr_res *atr_res; + u8 gb_len; + int rc; + + if (IS_ERR(resp)) { + rc = PTR_ERR(resp); + resp = NULL; + goto exit; + } + + rc = ddev->skb_check_crc(resp); + if (rc) { + PROTOCOL_ERR("14.4.1.6"); + goto exit; + } + + rc = digital_skb_pull_dep_sod(ddev, resp); + if (rc) { + PROTOCOL_ERR("14.4.1.2"); + goto exit; + } + + if (resp->len < sizeof(struct digital_atr_res)) { + rc = -EIO; + goto exit; + } + + gb_len = resp->len - sizeof(struct digital_atr_res); + + atr_res = (struct digital_atr_res *)resp->data; + + rc = nfc_set_remote_general_bytes(ddev->nfc_dev, atr_res->gb, gb_len); + if (rc) + goto exit; + + rc = nfc_dep_link_is_up(ddev->nfc_dev, target->idx, NFC_COMM_ACTIVE, + NFC_RF_INITIATOR); + + ddev->curr_nfc_dep_pni = 0; + +exit: + dev_kfree_skb(resp); + + if (rc) + ddev->curr_protocol = 0; +} + +int digital_in_send_atr_req(struct nfc_digital_dev *ddev, + struct nfc_target *target, __u8 comm_mode, __u8 *gb, + size_t gb_len) +{ + struct sk_buff *skb; + struct digital_atr_req *atr_req; + uint size; + + size = DIGITAL_ATR_REQ_MIN_SIZE + gb_len; + + if (size > DIGITAL_ATR_REQ_MAX_SIZE) { + PROTOCOL_ERR("14.6.1.1"); + return -EINVAL; + } + + skb = digital_skb_alloc(ddev, size); + if (!skb) + return -ENOMEM; + + skb_put(skb, sizeof(struct digital_atr_req)); + + atr_req = (struct digital_atr_req *)skb->data; + memset(atr_req, 0, sizeof(struct digital_atr_req)); + + atr_req->dir = DIGITAL_NFC_DEP_FRAME_DIR_OUT; + atr_req->cmd = DIGITAL_CMD_ATR_REQ; + if (target->nfcid2_len) + memcpy(atr_req->nfcid3, target->nfcid2, + max(target->nfcid2_len, DIGITAL_NFCID3_LEN)); + else + get_random_bytes(atr_req->nfcid3, DIGITAL_NFCID3_LEN); + + atr_req->did = 0; + atr_req->bs = 0; + atr_req->br = 0; + + atr_req->pp = DIGITAL_LR_BITS_PAYLOAD_SIZE_254B; + + if (gb_len) { + atr_req->pp |= DIGITAL_GB_BIT; + memcpy(skb_put(skb, gb_len), gb, gb_len); + } + + digital_skb_push_dep_sod(ddev, skb); + + ddev->skb_add_crc(skb); + + digital_in_send_cmd(ddev, skb, 500, digital_in_recv_atr_res, target); + + return 0; +} + +static int digital_in_send_rtox(struct nfc_digital_dev *ddev, + struct digital_data_exch *data_exch, u8 rtox) +{ + struct digital_dep_req_res *dep_req; + struct sk_buff *skb; + int rc; + + skb = digital_skb_alloc(ddev, 1); + if (!skb) + return -ENOMEM; + + *skb_put(skb, 1) = rtox; + + skb_push(skb, sizeof(struct digital_dep_req_res)); + + dep_req = (struct digital_dep_req_res *)skb->data; + + dep_req->dir = DIGITAL_NFC_DEP_FRAME_DIR_OUT; + dep_req->cmd = DIGITAL_CMD_DEP_REQ; + dep_req->pfb = DIGITAL_NFC_DEP_PFB_SUPERVISOR_PDU | + DIGITAL_NFC_DEP_PFB_TIMEOUT_BIT; + + digital_skb_push_dep_sod(ddev, skb); + + ddev->skb_add_crc(skb); + + rc = digital_in_send_cmd(ddev, skb, 1500, digital_in_recv_dep_res, + data_exch); + + return rc; +} + +static void digital_in_recv_dep_res(struct nfc_digital_dev *ddev, void *arg, + struct sk_buff *resp) +{ + struct digital_data_exch *data_exch = arg; + struct digital_dep_req_res *dep_res; + u8 pfb; + uint size; + int rc; + + if (IS_ERR(resp)) { + rc = PTR_ERR(resp); + resp = NULL; + goto exit; + } + + rc = ddev->skb_check_crc(resp); + if (rc) { + PROTOCOL_ERR("14.4.1.6"); + goto error; + } + + rc = digital_skb_pull_dep_sod(ddev, resp); + if (rc) { + PROTOCOL_ERR("14.4.1.2"); + goto exit; + } + + dep_res = (struct digital_dep_req_res *)resp->data; + + if (resp->len < sizeof(struct digital_dep_req_res) || + dep_res->dir != DIGITAL_NFC_DEP_FRAME_DIR_IN || + dep_res->cmd != DIGITAL_CMD_DEP_RES) { + rc = -EIO; + goto error; + } + + pfb = dep_res->pfb; + + switch (DIGITAL_NFC_DEP_PFB_TYPE(pfb)) { + case DIGITAL_NFC_DEP_PFB_I_PDU: + if (DIGITAL_NFC_DEP_PFB_PNI(pfb) != ddev->curr_nfc_dep_pni) { + PROTOCOL_ERR("14.12.3.3"); + rc = -EIO; + goto error; + } + + ddev->curr_nfc_dep_pni = + DIGITAL_NFC_DEP_PFB_PNI(ddev->curr_nfc_dep_pni + 1); + rc = 0; + break; + + case DIGITAL_NFC_DEP_PFB_ACK_NACK_PDU: + pr_err("Received a ACK/NACK PDU\n"); + rc = -EIO; + goto error; + + case DIGITAL_NFC_DEP_PFB_SUPERVISOR_PDU: + if (!DIGITAL_NFC_DEP_PFB_IS_TIMEOUT(pfb)) { + rc = -EINVAL; + goto error; + } + + rc = digital_in_send_rtox(ddev, data_exch, resp->data[3]); + if (rc) + goto error; + + kfree_skb(resp); + return; + } + + if (DIGITAL_NFC_DEP_MI_BIT_SET(pfb)) { + pr_err("MI bit set. Chained PDU not supported\n"); + rc = -EIO; + goto error; + } + + size = sizeof(struct digital_dep_req_res); + + if (DIGITAL_NFC_DEP_DID_BIT_SET(pfb)) + size++; + + if (size > resp->len) { + rc = -EIO; + goto error; + } + + skb_pull(resp, size); + +exit: + data_exch->cb(data_exch->cb_context, resp, rc); + +error: + kfree(data_exch); + + if (rc) + kfree_skb(resp); +} + +int digital_in_send_dep_req(struct nfc_digital_dev *ddev, + struct nfc_target *target, struct sk_buff *skb, + struct digital_data_exch *data_exch) +{ + struct digital_dep_req_res *dep_req; + + skb_push(skb, sizeof(struct digital_dep_req_res)); + + dep_req = (struct digital_dep_req_res *)skb->data; + dep_req->dir = DIGITAL_NFC_DEP_FRAME_DIR_OUT; + dep_req->cmd = DIGITAL_CMD_DEP_REQ; + dep_req->pfb = ddev->curr_nfc_dep_pni; + + digital_skb_push_dep_sod(ddev, skb); + + ddev->skb_add_crc(skb); + + return digital_in_send_cmd(ddev, skb, 1500, digital_in_recv_dep_res, + data_exch); +} + +static void digital_tg_recv_dep_req(struct nfc_digital_dev *ddev, void *arg, + struct sk_buff *resp) +{ + int rc; + struct digital_dep_req_res *dep_req; + size_t size; + + if (IS_ERR(resp)) { + rc = PTR_ERR(resp); + resp = NULL; + goto exit; + } + + rc = ddev->skb_check_crc(resp); + if (rc) { + PROTOCOL_ERR("14.4.1.6"); + goto exit; + } + + rc = digital_skb_pull_dep_sod(ddev, resp); + if (rc) { + PROTOCOL_ERR("14.4.1.2"); + goto exit; + } + + size = sizeof(struct digital_dep_req_res); + dep_req = (struct digital_dep_req_res *)resp->data; + + if (resp->len < size || dep_req->dir != DIGITAL_NFC_DEP_FRAME_DIR_OUT || + dep_req->cmd != DIGITAL_CMD_DEP_REQ) { + rc = -EIO; + goto exit; + } + + if (DIGITAL_NFC_DEP_DID_BIT_SET(dep_req->pfb)) + size++; + + if (resp->len < size) { + rc = -EIO; + goto exit; + } + + switch (DIGITAL_NFC_DEP_PFB_TYPE(dep_req->pfb)) { + case DIGITAL_NFC_DEP_PFB_I_PDU: + pr_debug("DIGITAL_NFC_DEP_PFB_I_PDU\n"); + ddev->curr_nfc_dep_pni = DIGITAL_NFC_DEP_PFB_PNI(dep_req->pfb); + break; + case DIGITAL_NFC_DEP_PFB_ACK_NACK_PDU: + pr_err("Received a ACK/NACK PDU\n"); + rc = -EINVAL; + goto exit; + break; + case DIGITAL_NFC_DEP_PFB_SUPERVISOR_PDU: + pr_err("Received a SUPERVISOR PDU\n"); + rc = -EINVAL; + goto exit; + break; + } + + skb_pull(resp, size); + + rc = nfc_tm_data_received(ddev->nfc_dev, resp); + +exit: + if (rc) + kfree_skb(resp); +} + +int digital_tg_send_dep_res(struct nfc_digital_dev *ddev, struct sk_buff *skb) +{ + struct digital_dep_req_res *dep_res; + + skb_push(skb, sizeof(struct digital_dep_req_res)); + dep_res = (struct digital_dep_req_res *)skb->data; + + dep_res->dir = DIGITAL_NFC_DEP_FRAME_DIR_IN; + dep_res->cmd = DIGITAL_CMD_DEP_RES; + dep_res->pfb = ddev->curr_nfc_dep_pni; + + digital_skb_push_dep_sod(ddev, skb); + + ddev->skb_add_crc(skb); + + return digital_tg_send_cmd(ddev, skb, 1500, digital_tg_recv_dep_req, + NULL); +} + +static void digital_tg_send_psl_res_complete(struct nfc_digital_dev *ddev, + void *arg, struct sk_buff *resp) +{ + u8 rf_tech = PTR_ERR(arg); + + if (IS_ERR(resp)) + return; + + digital_tg_configure_hw(ddev, NFC_DIGITAL_CONFIG_RF_TECH, rf_tech); + + digital_tg_listen(ddev, 1500, digital_tg_recv_dep_req, NULL); + + dev_kfree_skb(resp); +} + +static int digital_tg_send_psl_res(struct nfc_digital_dev *ddev, u8 did, + u8 rf_tech) +{ + struct digital_psl_res *psl_res; + struct sk_buff *skb; + int rc; + + skb = digital_skb_alloc(ddev, sizeof(struct digital_psl_res)); + if (!skb) + return -ENOMEM; + + skb_put(skb, sizeof(struct digital_psl_res)); + + psl_res = (struct digital_psl_res *)skb->data; + + psl_res->dir = DIGITAL_NFC_DEP_FRAME_DIR_IN; + psl_res->cmd = DIGITAL_CMD_PSL_RES; + psl_res->did = did; + + digital_skb_push_dep_sod(ddev, skb); + + ddev->skb_add_crc(skb); + + rc = digital_tg_send_cmd(ddev, skb, 0, digital_tg_send_psl_res_complete, + ERR_PTR(rf_tech)); + + if (rc) + kfree_skb(skb); + + return rc; +} + +static void digital_tg_recv_psl_req(struct nfc_digital_dev *ddev, void *arg, + struct sk_buff *resp) +{ + int rc; + struct digital_psl_req *psl_req; + u8 rf_tech; + u8 dsi; + + if (IS_ERR(resp)) { + rc = PTR_ERR(resp); + resp = NULL; + goto exit; + } + + rc = ddev->skb_check_crc(resp); + if (rc) { + PROTOCOL_ERR("14.4.1.6"); + goto exit; + } + + rc = digital_skb_pull_dep_sod(ddev, resp); + if (rc) { + PROTOCOL_ERR("14.4.1.2"); + goto exit; + } + + psl_req = (struct digital_psl_req *)resp->data; + + if (resp->len != sizeof(struct digital_psl_req) || + psl_req->dir != DIGITAL_NFC_DEP_FRAME_DIR_OUT || + psl_req->cmd != DIGITAL_CMD_PSL_REQ) { + rc = -EIO; + goto exit; + } + + dsi = (psl_req->brs >> 3) & 0x07; + switch (dsi) { + case 0: + rf_tech = NFC_DIGITAL_RF_TECH_106A; + break; + case 1: + rf_tech = NFC_DIGITAL_RF_TECH_212F; + break; + case 2: + rf_tech = NFC_DIGITAL_RF_TECH_424F; + break; + default: + pr_err("Unsuported dsi value %d\n", dsi); + goto exit; + } + + rc = digital_tg_send_psl_res(ddev, psl_req->did, rf_tech); + +exit: + kfree_skb(resp); +} + +static void digital_tg_send_atr_res_complete(struct nfc_digital_dev *ddev, + void *arg, struct sk_buff *resp) +{ + int offset; + + if (IS_ERR(resp)) { + digital_poll_next_tech(ddev); + return; + } + + offset = 2; + if (resp->data[0] == DIGITAL_NFC_DEP_NFCA_SOD_SB) + offset++; + + if (resp->data[offset] == DIGITAL_CMD_PSL_REQ) + digital_tg_recv_psl_req(ddev, arg, resp); + else + digital_tg_recv_dep_req(ddev, arg, resp); +} + +static int digital_tg_send_atr_res(struct nfc_digital_dev *ddev, + struct digital_atr_req *atr_req) +{ + struct digital_atr_res *atr_res; + struct sk_buff *skb; + u8 *gb; + size_t gb_len; + int rc; + + gb = nfc_get_local_general_bytes(ddev->nfc_dev, &gb_len); + if (!gb) + gb_len = 0; + + skb = digital_skb_alloc(ddev, sizeof(struct digital_atr_res) + gb_len); + if (!skb) + return -ENOMEM; + + skb_put(skb, sizeof(struct digital_atr_res)); + atr_res = (struct digital_atr_res *)skb->data; + + memset(atr_res, 0, sizeof(struct digital_atr_res)); + + atr_res->dir = DIGITAL_NFC_DEP_FRAME_DIR_IN; + atr_res->cmd = DIGITAL_CMD_ATR_RES; + memcpy(atr_res->nfcid3, atr_req->nfcid3, sizeof(atr_req->nfcid3)); + atr_res->to = 8; + atr_res->pp = DIGITAL_LR_BITS_PAYLOAD_SIZE_254B; + if (gb_len) { + skb_put(skb, gb_len); + + atr_res->pp |= DIGITAL_GB_BIT; + memcpy(atr_res->gb, gb, gb_len); + } + + digital_skb_push_dep_sod(ddev, skb); + + ddev->skb_add_crc(skb); + + rc = digital_tg_send_cmd(ddev, skb, 999, + digital_tg_send_atr_res_complete, NULL); + if (rc) { + kfree_skb(skb); + return rc; + } + + return rc; +} + +void digital_tg_recv_atr_req(struct nfc_digital_dev *ddev, void *arg, + struct sk_buff *resp) +{ + int rc; + struct digital_atr_req *atr_req; + size_t gb_len, min_size; + + if (IS_ERR(resp)) { + rc = PTR_ERR(resp); + resp = NULL; + goto exit; + } + + if (!resp->len) { + rc = -EIO; + goto exit; + } + + if (resp->data[0] == DIGITAL_NFC_DEP_NFCA_SOD_SB) { + min_size = DIGITAL_ATR_REQ_MIN_SIZE + 2; + + ddev->curr_rf_tech = NFC_DIGITAL_RF_TECH_106A; + ddev->skb_add_crc = digital_skb_add_crc_a; + ddev->skb_check_crc = digital_skb_check_crc_a; + } else { + min_size = DIGITAL_ATR_REQ_MIN_SIZE + 1; + + ddev->curr_rf_tech = NFC_DIGITAL_RF_TECH_212F; + ddev->skb_add_crc = digital_skb_add_crc_f; + ddev->skb_check_crc = digital_skb_check_crc_f; + } + + if (resp->len < min_size) { + rc = -EIO; + goto exit; + } + + if (DIGITAL_DRV_CAPS_TG_CRC(ddev)) { + ddev->skb_add_crc = digital_skb_add_crc_none; + ddev->skb_check_crc = digital_skb_check_crc_none; + } + + rc = ddev->skb_check_crc(resp); + if (rc) { + PROTOCOL_ERR("14.4.1.6"); + goto exit; + } + + rc = digital_skb_pull_dep_sod(ddev, resp); + if (rc) { + PROTOCOL_ERR("14.4.1.2"); + goto exit; + } + + atr_req = (struct digital_atr_req *)resp->data; + + if (atr_req->dir != DIGITAL_NFC_DEP_FRAME_DIR_OUT || + atr_req->cmd != DIGITAL_CMD_ATR_REQ) { + rc = -EINVAL; + goto exit; + } + + rc = digital_tg_configure_hw(ddev, NFC_DIGITAL_CONFIG_FRAMING, + NFC_DIGITAL_FRAMING_NFC_DEP_ACTIVATED); + if (rc) + goto exit; + + rc = digital_tg_send_atr_res(ddev, atr_req); + if (rc) + goto exit; + + gb_len = resp->len - sizeof(struct digital_atr_req); + rc = nfc_tm_activated(ddev->nfc_dev, NFC_PROTO_NFC_DEP_MASK, + NFC_COMM_PASSIVE, atr_req->gb, gb_len); + if (rc) + goto exit; + + ddev->poll_tech_count = 0; + + rc = 0; +exit: + if (rc) + digital_poll_next_tech(ddev); + + dev_kfree_skb(resp); +} diff --git a/net/nfc/digital_technology.c b/net/nfc/digital_technology.c new file mode 100644 index 000000000000..251c8c753ebe --- /dev/null +++ b/net/nfc/digital_technology.c @@ -0,0 +1,770 @@ +/* + * NFC Digital Protocol stack + * Copyright (c) 2013, Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + */ + +#define pr_fmt(fmt) "digital: %s: " fmt, __func__ + +#include "digital.h" + +#define DIGITAL_CMD_SENS_REQ 0x26 +#define DIGITAL_CMD_ALL_REQ 0x52 +#define DIGITAL_CMD_SEL_REQ_CL1 0x93 +#define DIGITAL_CMD_SEL_REQ_CL2 0x95 +#define DIGITAL_CMD_SEL_REQ_CL3 0x97 + +#define DIGITAL_SDD_REQ_SEL_PAR 0x20 + +#define DIGITAL_SDD_RES_CT 0x88 +#define DIGITAL_SDD_RES_LEN 5 + +#define DIGITAL_SEL_RES_NFCID1_COMPLETE(sel_res) (!((sel_res) & 0x04)) +#define DIGITAL_SEL_RES_IS_T2T(sel_res) (!((sel_res) & 0x60)) +#define DIGITAL_SEL_RES_IS_NFC_DEP(sel_res) ((sel_res) & 0x40) + +#define DIGITAL_SENS_RES_IS_T1T(sens_res) (((sens_res) & 0x0C00) == 0x0C00) +#define DIGITAL_SENS_RES_IS_VALID(sens_res) \ + ((!((sens_res) & 0x001F) && (((sens_res) & 0x0C00) == 0x0C00)) || \ + (((sens_res) & 0x001F) && ((sens_res) & 0x0C00) != 0x0C00)) + +#define DIGITAL_MIFARE_READ_RES_LEN 16 +#define DIGITAL_MIFARE_ACK_RES 0x0A + +#define DIGITAL_CMD_SENSF_REQ 0x00 +#define DIGITAL_CMD_SENSF_RES 0x01 + +#define DIGITAL_SENSF_RES_MIN_LENGTH 17 +#define DIGITAL_SENSF_RES_RD_AP_B1 0x00 +#define DIGITAL_SENSF_RES_RD_AP_B2 0x8F + +#define DIGITAL_SENSF_REQ_RC_NONE 0 +#define DIGITAL_SENSF_REQ_RC_SC 1 +#define DIGITAL_SENSF_REQ_RC_AP 2 + +struct digital_sdd_res { + u8 nfcid1[4]; + u8 bcc; +} __packed; + +struct digital_sel_req { + u8 sel_cmd; + u8 b2; + u8 nfcid1[4]; + u8 bcc; +} __packed; + +struct digital_sensf_req { + u8 cmd; + u8 sc1; + u8 sc2; + u8 rc; + u8 tsn; +} __packed; + +struct digital_sensf_res { + u8 cmd; + u8 nfcid2[8]; + u8 pad0[2]; + u8 pad1[3]; + u8 mrti_check; + u8 mrti_update; + u8 pad2; + u8 rd[2]; +} __packed; + +static int digital_in_send_sdd_req(struct nfc_digital_dev *ddev, + struct nfc_target *target); + +static void digital_in_recv_sel_res(struct nfc_digital_dev *ddev, void *arg, + struct sk_buff *resp) +{ + struct nfc_target *target = arg; + int rc; + u8 sel_res; + u8 nfc_proto; + + if (IS_ERR(resp)) { + rc = PTR_ERR(resp); + resp = NULL; + goto exit; + } + + if (!DIGITAL_DRV_CAPS_IN_CRC(ddev)) { + rc = digital_skb_check_crc_a(resp); + if (rc) { + PROTOCOL_ERR("4.4.1.3"); + goto exit; + } + } + + if (!resp->len) { + rc = -EIO; + goto exit; + } + + sel_res = resp->data[0]; + + if (!DIGITAL_SEL_RES_NFCID1_COMPLETE(sel_res)) { + rc = digital_in_send_sdd_req(ddev, target); + if (rc) + goto exit; + + goto exit_free_skb; + } + + if (DIGITAL_SEL_RES_IS_T2T(sel_res)) { + nfc_proto = NFC_PROTO_MIFARE; + } else if (DIGITAL_SEL_RES_IS_NFC_DEP(sel_res)) { + nfc_proto = NFC_PROTO_NFC_DEP; + } else { + rc = -EOPNOTSUPP; + goto exit; + } + + target->sel_res = sel_res; + + rc = digital_target_found(ddev, target, nfc_proto); + +exit: + kfree(target); + +exit_free_skb: + dev_kfree_skb(resp); + + if (rc) + digital_poll_next_tech(ddev); +} + +static int digital_in_send_sel_req(struct nfc_digital_dev *ddev, + struct nfc_target *target, + struct digital_sdd_res *sdd_res) +{ + struct sk_buff *skb; + struct digital_sel_req *sel_req; + u8 sel_cmd; + int rc; + + skb = digital_skb_alloc(ddev, sizeof(struct digital_sel_req)); + if (!skb) + return -ENOMEM; + + skb_put(skb, sizeof(struct digital_sel_req)); + sel_req = (struct digital_sel_req *)skb->data; + + if (target->nfcid1_len <= 4) + sel_cmd = DIGITAL_CMD_SEL_REQ_CL1; + else if (target->nfcid1_len < 10) + sel_cmd = DIGITAL_CMD_SEL_REQ_CL2; + else + sel_cmd = DIGITAL_CMD_SEL_REQ_CL3; + + sel_req->sel_cmd = sel_cmd; + sel_req->b2 = 0x70; + memcpy(sel_req->nfcid1, sdd_res->nfcid1, 4); + sel_req->bcc = sdd_res->bcc; + + if (DIGITAL_DRV_CAPS_IN_CRC(ddev)) { + rc = digital_in_configure_hw(ddev, NFC_DIGITAL_CONFIG_FRAMING, + NFC_DIGITAL_FRAMING_NFCA_STANDARD_WITH_CRC_A); + if (rc) + goto exit; + } else { + digital_skb_add_crc_a(skb); + } + + rc = digital_in_send_cmd(ddev, skb, 30, digital_in_recv_sel_res, + target); +exit: + if (rc) + kfree_skb(skb); + + return rc; +} + +static void digital_in_recv_sdd_res(struct nfc_digital_dev *ddev, void *arg, + struct sk_buff *resp) +{ + struct nfc_target *target = arg; + struct digital_sdd_res *sdd_res; + int rc; + u8 offset, size; + u8 i, bcc; + + if (IS_ERR(resp)) { + rc = PTR_ERR(resp); + resp = NULL; + goto exit; + } + + if (resp->len < DIGITAL_SDD_RES_LEN) { + PROTOCOL_ERR("4.7.2.8"); + rc = -EINVAL; + goto exit; + } + + sdd_res = (struct digital_sdd_res *)resp->data; + + for (i = 0, bcc = 0; i < 4; i++) + bcc ^= sdd_res->nfcid1[i]; + + if (bcc != sdd_res->bcc) { + PROTOCOL_ERR("4.7.2.6"); + rc = -EINVAL; + goto exit; + } + + if (sdd_res->nfcid1[0] == DIGITAL_SDD_RES_CT) { + offset = 1; + size = 3; + } else { + offset = 0; + size = 4; + } + + memcpy(target->nfcid1 + target->nfcid1_len, sdd_res->nfcid1 + offset, + size); + target->nfcid1_len += size; + + rc = digital_in_send_sel_req(ddev, target, sdd_res); + +exit: + dev_kfree_skb(resp); + + if (rc) { + kfree(target); + digital_poll_next_tech(ddev); + } +} + +static int digital_in_send_sdd_req(struct nfc_digital_dev *ddev, + struct nfc_target *target) +{ + int rc; + struct sk_buff *skb; + u8 sel_cmd; + + rc = digital_in_configure_hw(ddev, NFC_DIGITAL_CONFIG_FRAMING, + NFC_DIGITAL_FRAMING_NFCA_STANDARD); + if (rc) + return rc; + + skb = digital_skb_alloc(ddev, 2); + if (!skb) + return -ENOMEM; + + if (target->nfcid1_len == 0) + sel_cmd = DIGITAL_CMD_SEL_REQ_CL1; + else if (target->nfcid1_len == 3) + sel_cmd = DIGITAL_CMD_SEL_REQ_CL2; + else + sel_cmd = DIGITAL_CMD_SEL_REQ_CL3; + + *skb_put(skb, sizeof(u8)) = sel_cmd; + *skb_put(skb, sizeof(u8)) = DIGITAL_SDD_REQ_SEL_PAR; + + return digital_in_send_cmd(ddev, skb, 30, digital_in_recv_sdd_res, + target); +} + +static void digital_in_recv_sens_res(struct nfc_digital_dev *ddev, void *arg, + struct sk_buff *resp) +{ + struct nfc_target *target = NULL; + int rc; + + if (IS_ERR(resp)) { + rc = PTR_ERR(resp); + resp = NULL; + goto exit; + } + + if (resp->len < sizeof(u16)) { + rc = -EIO; + goto exit; + } + + target = kzalloc(sizeof(struct nfc_target), GFP_KERNEL); + if (!target) { + rc = -ENOMEM; + goto exit; + } + + target->sens_res = __le16_to_cpu(*(__le16 *)resp->data); + + if (!DIGITAL_SENS_RES_IS_VALID(target->sens_res)) { + PROTOCOL_ERR("4.6.3.3"); + rc = -EINVAL; + goto exit; + } + + if (DIGITAL_SENS_RES_IS_T1T(target->sens_res)) + rc = digital_target_found(ddev, target, NFC_PROTO_JEWEL); + else + rc = digital_in_send_sdd_req(ddev, target); + +exit: + dev_kfree_skb(resp); + + if (rc) { + kfree(target); + digital_poll_next_tech(ddev); + } +} + +int digital_in_send_sens_req(struct nfc_digital_dev *ddev, u8 rf_tech) +{ + struct sk_buff *skb; + int rc; + + rc = digital_in_configure_hw(ddev, NFC_DIGITAL_CONFIG_RF_TECH, + NFC_DIGITAL_RF_TECH_106A); + if (rc) + return rc; + + rc = digital_in_configure_hw(ddev, NFC_DIGITAL_CONFIG_FRAMING, + NFC_DIGITAL_FRAMING_NFCA_SHORT); + if (rc) + return rc; + + skb = digital_skb_alloc(ddev, 1); + if (!skb) + return -ENOMEM; + + *skb_put(skb, sizeof(u8)) = DIGITAL_CMD_SENS_REQ; + + rc = digital_in_send_cmd(ddev, skb, 30, digital_in_recv_sens_res, NULL); + if (rc) + kfree_skb(skb); + + return rc; +} + +int digital_in_recv_mifare_res(struct sk_buff *resp) +{ + /* Successful READ command response is 16 data bytes + 2 CRC bytes long. + * Since the driver can't differentiate a ACK/NACK response from a valid + * READ response, the CRC calculation must be handled at digital level + * even if the driver supports it for this technology. + */ + if (resp->len == DIGITAL_MIFARE_READ_RES_LEN + DIGITAL_CRC_LEN) { + if (digital_skb_check_crc_a(resp)) { + PROTOCOL_ERR("9.4.1.2"); + return -EIO; + } + + return 0; + } + + /* ACK response (i.e. successful WRITE). */ + if (resp->len == 1 && resp->data[0] == DIGITAL_MIFARE_ACK_RES) { + resp->data[0] = 0; + return 0; + } + + /* NACK and any other responses are treated as error. */ + return -EIO; +} + +static void digital_in_recv_sensf_res(struct nfc_digital_dev *ddev, void *arg, + struct sk_buff *resp) +{ + int rc; + u8 proto; + struct nfc_target target; + struct digital_sensf_res *sensf_res; + + if (IS_ERR(resp)) { + rc = PTR_ERR(resp); + resp = NULL; + goto exit; + } + + if (resp->len < DIGITAL_SENSF_RES_MIN_LENGTH) { + rc = -EIO; + goto exit; + } + + if (!DIGITAL_DRV_CAPS_IN_CRC(ddev)) { + rc = digital_skb_check_crc_f(resp); + if (rc) { + PROTOCOL_ERR("6.4.1.8"); + goto exit; + } + } + + skb_pull(resp, 1); + + memset(&target, 0, sizeof(struct nfc_target)); + + sensf_res = (struct digital_sensf_res *)resp->data; + + memcpy(target.sensf_res, sensf_res, resp->len); + target.sensf_res_len = resp->len; + + memcpy(target.nfcid2, sensf_res->nfcid2, NFC_NFCID2_MAXSIZE); + target.nfcid2_len = NFC_NFCID2_MAXSIZE; + + if (target.nfcid2[0] == DIGITAL_SENSF_NFCID2_NFC_DEP_B1 && + target.nfcid2[1] == DIGITAL_SENSF_NFCID2_NFC_DEP_B2) + proto = NFC_PROTO_NFC_DEP; + else + proto = NFC_PROTO_FELICA; + + rc = digital_target_found(ddev, &target, proto); + +exit: + dev_kfree_skb(resp); + + if (rc) + digital_poll_next_tech(ddev); +} + +int digital_in_send_sensf_req(struct nfc_digital_dev *ddev, u8 rf_tech) +{ + struct digital_sensf_req *sensf_req; + struct sk_buff *skb; + int rc; + u8 size; + + rc = digital_in_configure_hw(ddev, NFC_DIGITAL_CONFIG_RF_TECH, rf_tech); + if (rc) + return rc; + + rc = digital_in_configure_hw(ddev, NFC_DIGITAL_CONFIG_FRAMING, + NFC_DIGITAL_FRAMING_NFCF); + if (rc) + return rc; + + size = sizeof(struct digital_sensf_req); + + skb = digital_skb_alloc(ddev, size); + if (!skb) + return -ENOMEM; + + skb_put(skb, size); + + sensf_req = (struct digital_sensf_req *)skb->data; + sensf_req->cmd = DIGITAL_CMD_SENSF_REQ; + sensf_req->sc1 = 0xFF; + sensf_req->sc2 = 0xFF; + sensf_req->rc = 0; + sensf_req->tsn = 0; + + *skb_push(skb, 1) = size + 1; + + if (!DIGITAL_DRV_CAPS_IN_CRC(ddev)) + digital_skb_add_crc_f(skb); + + rc = digital_in_send_cmd(ddev, skb, 30, digital_in_recv_sensf_res, + NULL); + if (rc) + kfree_skb(skb); + + return rc; +} + +static int digital_tg_send_sel_res(struct nfc_digital_dev *ddev) +{ + struct sk_buff *skb; + int rc; + + skb = digital_skb_alloc(ddev, 1); + if (!skb) + return -ENOMEM; + + *skb_put(skb, 1) = DIGITAL_SEL_RES_NFC_DEP; + + if (!DIGITAL_DRV_CAPS_TG_CRC(ddev)) + digital_skb_add_crc_a(skb); + + rc = digital_tg_send_cmd(ddev, skb, 300, digital_tg_recv_atr_req, + NULL); + if (rc) + kfree_skb(skb); + + return rc; +} + +static void digital_tg_recv_sel_req(struct nfc_digital_dev *ddev, void *arg, + struct sk_buff *resp) +{ + int rc; + + if (IS_ERR(resp)) { + rc = PTR_ERR(resp); + resp = NULL; + goto exit; + } + + if (!DIGITAL_DRV_CAPS_TG_CRC(ddev)) { + rc = digital_skb_check_crc_a(resp); + if (rc) { + PROTOCOL_ERR("4.4.1.3"); + goto exit; + } + } + + /* Silently ignore SEL_REQ content and send a SEL_RES for NFC-DEP */ + + rc = digital_tg_send_sel_res(ddev); + +exit: + if (rc) + digital_poll_next_tech(ddev); + + dev_kfree_skb(resp); +} + +static int digital_tg_send_sdd_res(struct nfc_digital_dev *ddev) +{ + struct sk_buff *skb; + struct digital_sdd_res *sdd_res; + int rc, i; + + skb = digital_skb_alloc(ddev, sizeof(struct digital_sdd_res)); + if (!skb) + return -ENOMEM; + + skb_put(skb, sizeof(struct digital_sdd_res)); + sdd_res = (struct digital_sdd_res *)skb->data; + + sdd_res->nfcid1[0] = 0x08; + get_random_bytes(sdd_res->nfcid1 + 1, 3); + + sdd_res->bcc = 0; + for (i = 0; i < 4; i++) + sdd_res->bcc ^= sdd_res->nfcid1[i]; + + rc = digital_tg_send_cmd(ddev, skb, 300, digital_tg_recv_sel_req, + NULL); + if (rc) + kfree_skb(skb); + + return rc; +} + +static void digital_tg_recv_sdd_req(struct nfc_digital_dev *ddev, void *arg, + struct sk_buff *resp) +{ + u8 *sdd_req; + int rc; + + if (IS_ERR(resp)) { + rc = PTR_ERR(resp); + resp = NULL; + goto exit; + } + + sdd_req = resp->data; + + if (resp->len < 2 || sdd_req[0] != DIGITAL_CMD_SEL_REQ_CL1 || + sdd_req[1] != DIGITAL_SDD_REQ_SEL_PAR) { + rc = -EINVAL; + goto exit; + } + + rc = digital_tg_send_sdd_res(ddev); + +exit: + if (rc) + digital_poll_next_tech(ddev); + + dev_kfree_skb(resp); +} + +static int digital_tg_send_sens_res(struct nfc_digital_dev *ddev) +{ + struct sk_buff *skb; + u8 *sens_res; + int rc; + + skb = digital_skb_alloc(ddev, 2); + if (!skb) + return -ENOMEM; + + sens_res = skb_put(skb, 2); + + sens_res[0] = (DIGITAL_SENS_RES_NFC_DEP >> 8) & 0xFF; + sens_res[1] = DIGITAL_SENS_RES_NFC_DEP & 0xFF; + + rc = digital_tg_send_cmd(ddev, skb, 300, digital_tg_recv_sdd_req, + NULL); + if (rc) + kfree_skb(skb); + + return rc; +} + +void digital_tg_recv_sens_req(struct nfc_digital_dev *ddev, void *arg, + struct sk_buff *resp) +{ + u8 sens_req; + int rc; + + if (IS_ERR(resp)) { + rc = PTR_ERR(resp); + resp = NULL; + goto exit; + } + + sens_req = resp->data[0]; + + if (!resp->len || (sens_req != DIGITAL_CMD_SENS_REQ && + sens_req != DIGITAL_CMD_ALL_REQ)) { + rc = -EINVAL; + goto exit; + } + + rc = digital_tg_send_sens_res(ddev); + +exit: + if (rc) + digital_poll_next_tech(ddev); + + dev_kfree_skb(resp); +} + +static int digital_tg_send_sensf_res(struct nfc_digital_dev *ddev, + struct digital_sensf_req *sensf_req) +{ + struct sk_buff *skb; + u8 size; + int rc; + struct digital_sensf_res *sensf_res; + + size = sizeof(struct digital_sensf_res); + + if (sensf_req->rc != DIGITAL_SENSF_REQ_RC_NONE) + size -= sizeof(sensf_res->rd); + + skb = digital_skb_alloc(ddev, size); + if (!skb) + return -ENOMEM; + + skb_put(skb, size); + + sensf_res = (struct digital_sensf_res *)skb->data; + + memset(sensf_res, 0, size); + + sensf_res->cmd = DIGITAL_CMD_SENSF_RES; + sensf_res->nfcid2[0] = DIGITAL_SENSF_NFCID2_NFC_DEP_B1; + sensf_res->nfcid2[1] = DIGITAL_SENSF_NFCID2_NFC_DEP_B2; + get_random_bytes(&sensf_res->nfcid2[2], 6); + + switch (sensf_req->rc) { + case DIGITAL_SENSF_REQ_RC_SC: + sensf_res->rd[0] = sensf_req->sc1; + sensf_res->rd[1] = sensf_req->sc2; + break; + case DIGITAL_SENSF_REQ_RC_AP: + sensf_res->rd[0] = DIGITAL_SENSF_RES_RD_AP_B1; + sensf_res->rd[1] = DIGITAL_SENSF_RES_RD_AP_B2; + break; + } + + *skb_push(skb, sizeof(u8)) = size + 1; + + if (!DIGITAL_DRV_CAPS_TG_CRC(ddev)) + digital_skb_add_crc_f(skb); + + rc = digital_tg_send_cmd(ddev, skb, 300, + digital_tg_recv_atr_req, NULL); + if (rc) + kfree_skb(skb); + + return rc; +} + +void digital_tg_recv_sensf_req(struct nfc_digital_dev *ddev, void *arg, + struct sk_buff *resp) +{ + struct digital_sensf_req *sensf_req; + int rc; + + if (IS_ERR(resp)) { + rc = PTR_ERR(resp); + resp = NULL; + goto exit; + } + + if (!DIGITAL_DRV_CAPS_TG_CRC(ddev)) { + rc = digital_skb_check_crc_f(resp); + if (rc) { + PROTOCOL_ERR("6.4.1.8"); + goto exit; + } + } + + if (resp->len != sizeof(struct digital_sensf_req) + 1) { + rc = -EINVAL; + goto exit; + } + + skb_pull(resp, 1); + sensf_req = (struct digital_sensf_req *)resp->data; + + if (sensf_req->cmd != DIGITAL_CMD_SENSF_REQ) { + rc = -EINVAL; + goto exit; + } + + rc = digital_tg_send_sensf_res(ddev, sensf_req); + +exit: + if (rc) + digital_poll_next_tech(ddev); + + dev_kfree_skb(resp); +} + +int digital_tg_listen_nfca(struct nfc_digital_dev *ddev, u8 rf_tech) +{ + int rc; + + rc = digital_tg_configure_hw(ddev, NFC_DIGITAL_CONFIG_RF_TECH, rf_tech); + if (rc) + return rc; + + rc = digital_tg_configure_hw(ddev, NFC_DIGITAL_CONFIG_FRAMING, + NFC_DIGITAL_FRAMING_NFCA_NFC_DEP); + if (rc) + return rc; + + return digital_tg_listen(ddev, 300, digital_tg_recv_sens_req, NULL); +} + +int digital_tg_listen_nfcf(struct nfc_digital_dev *ddev, u8 rf_tech) +{ + int rc; + u8 *nfcid2; + + rc = digital_tg_configure_hw(ddev, NFC_DIGITAL_CONFIG_RF_TECH, rf_tech); + if (rc) + return rc; + + rc = digital_tg_configure_hw(ddev, NFC_DIGITAL_CONFIG_FRAMING, + NFC_DIGITAL_FRAMING_NFCF_NFC_DEP); + if (rc) + return rc; + + nfcid2 = kzalloc(NFC_NFCID2_MAXSIZE, GFP_KERNEL); + if (!nfcid2) + return -ENOMEM; + + nfcid2[0] = DIGITAL_SENSF_NFCID2_NFC_DEP_B1; + nfcid2[1] = DIGITAL_SENSF_NFCID2_NFC_DEP_B2; + get_random_bytes(nfcid2 + 2, NFC_NFCID2_MAXSIZE - 2); + + return digital_tg_listen(ddev, 300, digital_tg_recv_sensf_req, nfcid2); +} diff --git a/net/nfc/nci/spi.c b/net/nfc/nci/spi.c index c7cf37ba7298..f1d426f10cce 100644 --- a/net/nfc/nci/spi.c +++ b/net/nfc/nci/spi.c @@ -21,11 +21,8 @@ #include <linux/export.h> #include <linux/spi/spi.h> #include <linux/crc-ccitt.h> -#include <linux/nfc.h> #include <net/nfc/nci_core.h> -#define NCI_SPI_HDR_LEN 4 -#define NCI_SPI_CRC_LEN 2 #define NCI_SPI_ACK_SHIFT 6 #define NCI_SPI_MSB_PAYLOAD_MASK 0x3F @@ -41,54 +38,48 @@ #define CRC_INIT 0xFFFF -static int nci_spi_open(struct nci_dev *nci_dev) -{ - struct nci_spi_dev *ndev = nci_get_drvdata(nci_dev); - - return ndev->ops->open(ndev); -} - -static int nci_spi_close(struct nci_dev *nci_dev) -{ - struct nci_spi_dev *ndev = nci_get_drvdata(nci_dev); - - return ndev->ops->close(ndev); -} - -static int __nci_spi_send(struct nci_spi_dev *ndev, struct sk_buff *skb) +static int __nci_spi_send(struct nci_spi *nspi, struct sk_buff *skb, + int cs_change) { struct spi_message m; struct spi_transfer t; - t.tx_buf = skb->data; - t.len = skb->len; - t.cs_change = 0; - t.delay_usecs = ndev->xfer_udelay; + memset(&t, 0, sizeof(struct spi_transfer)); + /* a NULL skb means we just want the SPI chip select line to raise */ + if (skb) { + t.tx_buf = skb->data; + t.len = skb->len; + } else { + /* still set tx_buf non NULL to make the driver happy */ + t.tx_buf = &t; + t.len = 0; + } + t.cs_change = cs_change; + t.delay_usecs = nspi->xfer_udelay; spi_message_init(&m); spi_message_add_tail(&t, &m); - return spi_sync(ndev->spi, &m); + return spi_sync(nspi->spi, &m); } -static int nci_spi_send(struct nci_dev *nci_dev, struct sk_buff *skb) +int nci_spi_send(struct nci_spi *nspi, + struct completion *write_handshake_completion, + struct sk_buff *skb) { - struct nci_spi_dev *ndev = nci_get_drvdata(nci_dev); unsigned int payload_len = skb->len; unsigned char *hdr; int ret; long completion_rc; - ndev->ops->deassert_int(ndev); - /* add the NCI SPI header to the start of the buffer */ hdr = skb_push(skb, NCI_SPI_HDR_LEN); hdr[0] = NCI_SPI_DIRECT_WRITE; - hdr[1] = ndev->acknowledge_mode; + hdr[1] = nspi->acknowledge_mode; hdr[2] = payload_len >> 8; hdr[3] = payload_len & 0xFF; - if (ndev->acknowledge_mode == NCI_SPI_CRC_ENABLED) { + if (nspi->acknowledge_mode == NCI_SPI_CRC_ENABLED) { u16 crc; crc = crc_ccitt(CRC_INIT, skb->data, skb->len); @@ -96,123 +87,77 @@ static int nci_spi_send(struct nci_dev *nci_dev, struct sk_buff *skb) *skb_put(skb, 1) = crc & 0xFF; } - ret = __nci_spi_send(ndev, skb); + if (write_handshake_completion) { + /* Trick SPI driver to raise chip select */ + ret = __nci_spi_send(nspi, NULL, 1); + if (ret) + goto done; - kfree_skb(skb); - ndev->ops->assert_int(ndev); + /* wait for NFC chip hardware handshake to complete */ + if (wait_for_completion_timeout(write_handshake_completion, + msecs_to_jiffies(1000)) == 0) { + ret = -ETIME; + goto done; + } + } - if (ret != 0 || ndev->acknowledge_mode == NCI_SPI_CRC_DISABLED) + ret = __nci_spi_send(nspi, skb, 0); + if (ret != 0 || nspi->acknowledge_mode == NCI_SPI_CRC_DISABLED) goto done; - init_completion(&ndev->req_completion); - completion_rc = - wait_for_completion_interruptible_timeout(&ndev->req_completion, - NCI_SPI_SEND_TIMEOUT); + init_completion(&nspi->req_completion); + completion_rc = wait_for_completion_interruptible_timeout( + &nspi->req_completion, + NCI_SPI_SEND_TIMEOUT); - if (completion_rc <= 0 || ndev->req_result == ACKNOWLEDGE_NACK) + if (completion_rc <= 0 || nspi->req_result == ACKNOWLEDGE_NACK) ret = -EIO; done: + kfree_skb(skb); + return ret; } - -static struct nci_ops nci_spi_ops = { - .open = nci_spi_open, - .close = nci_spi_close, - .send = nci_spi_send, -}; +EXPORT_SYMBOL_GPL(nci_spi_send); /* ---- Interface to NCI SPI drivers ---- */ /** - * nci_spi_allocate_device - allocate a new nci spi device + * nci_spi_allocate_spi - allocate a new nci spi * * @spi: SPI device - * @ops: device operations - * @supported_protocols: NFC protocols supported by the device - * @supported_se: NFC Secure Elements supported by the device - * @acknowledge_mode: Acknowledge mode used by the device + * @acknowledge_mode: Acknowledge mode used by the NFC device * @delay: delay between transactions in us + * @ndev: nci dev to send incoming nci frames to */ -struct nci_spi_dev *nci_spi_allocate_device(struct spi_device *spi, - struct nci_spi_ops *ops, - u32 supported_protocols, - u32 supported_se, - u8 acknowledge_mode, - unsigned int delay) +struct nci_spi *nci_spi_allocate_spi(struct spi_device *spi, + u8 acknowledge_mode, unsigned int delay, + struct nci_dev *ndev) { - struct nci_spi_dev *ndev; - int tailroom = 0; + struct nci_spi *nspi; - if (!ops->open || !ops->close || !ops->assert_int || !ops->deassert_int) + nspi = devm_kzalloc(&spi->dev, sizeof(struct nci_spi), GFP_KERNEL); + if (!nspi) return NULL; - if (!supported_protocols) - return NULL; - - ndev = devm_kzalloc(&spi->dev, sizeof(struct nci_dev), GFP_KERNEL); - if (!ndev) - return NULL; + nspi->acknowledge_mode = acknowledge_mode; + nspi->xfer_udelay = delay; - ndev->ops = ops; - ndev->acknowledge_mode = acknowledge_mode; - ndev->xfer_udelay = delay; + nspi->spi = spi; + nspi->ndev = ndev; - if (acknowledge_mode == NCI_SPI_CRC_ENABLED) - tailroom += NCI_SPI_CRC_LEN; - - ndev->nci_dev = nci_allocate_device(&nci_spi_ops, supported_protocols, - NCI_SPI_HDR_LEN, tailroom); - if (!ndev->nci_dev) - return NULL; - - nci_set_drvdata(ndev->nci_dev, ndev); - - return ndev; + return nspi; } -EXPORT_SYMBOL_GPL(nci_spi_allocate_device); +EXPORT_SYMBOL_GPL(nci_spi_allocate_spi); -/** - * nci_spi_free_device - deallocate nci spi device - * - * @ndev: The nci spi device to deallocate - */ -void nci_spi_free_device(struct nci_spi_dev *ndev) -{ - nci_free_device(ndev->nci_dev); -} -EXPORT_SYMBOL_GPL(nci_spi_free_device); - -/** - * nci_spi_register_device - register a nci spi device in the nfc subsystem - * - * @pdev: The nci spi device to register - */ -int nci_spi_register_device(struct nci_spi_dev *ndev) -{ - return nci_register_device(ndev->nci_dev); -} -EXPORT_SYMBOL_GPL(nci_spi_register_device); - -/** - * nci_spi_unregister_device - unregister a nci spi device in the nfc subsystem - * - * @dev: The nci spi device to unregister - */ -void nci_spi_unregister_device(struct nci_spi_dev *ndev) -{ - nci_unregister_device(ndev->nci_dev); -} -EXPORT_SYMBOL_GPL(nci_spi_unregister_device); - -static int send_acknowledge(struct nci_spi_dev *ndev, u8 acknowledge) +static int send_acknowledge(struct nci_spi *nspi, u8 acknowledge) { struct sk_buff *skb; unsigned char *hdr; u16 crc; int ret; - skb = nci_skb_alloc(ndev->nci_dev, 0, GFP_KERNEL); + skb = nci_skb_alloc(nspi->ndev, 0, GFP_KERNEL); /* add the NCI SPI header to the start of the buffer */ hdr = skb_push(skb, NCI_SPI_HDR_LEN); @@ -225,14 +170,14 @@ static int send_acknowledge(struct nci_spi_dev *ndev, u8 acknowledge) *skb_put(skb, 1) = crc >> 8; *skb_put(skb, 1) = crc & 0xFF; - ret = __nci_spi_send(ndev, skb); + ret = __nci_spi_send(nspi, skb, 0); kfree_skb(skb); return ret; } -static struct sk_buff *__nci_spi_recv_frame(struct nci_spi_dev *ndev) +static struct sk_buff *__nci_spi_read(struct nci_spi *nspi) { struct sk_buff *skb; struct spi_message m; @@ -242,43 +187,49 @@ static struct sk_buff *__nci_spi_recv_frame(struct nci_spi_dev *ndev) int ret; spi_message_init(&m); + + memset(&tx, 0, sizeof(struct spi_transfer)); req[0] = NCI_SPI_DIRECT_READ; - req[1] = ndev->acknowledge_mode; + req[1] = nspi->acknowledge_mode; tx.tx_buf = req; tx.len = 2; tx.cs_change = 0; spi_message_add_tail(&tx, &m); + + memset(&rx, 0, sizeof(struct spi_transfer)); rx.rx_buf = resp_hdr; rx.len = 2; rx.cs_change = 1; spi_message_add_tail(&rx, &m); - ret = spi_sync(ndev->spi, &m); + ret = spi_sync(nspi->spi, &m); if (ret) return NULL; - if (ndev->acknowledge_mode == NCI_SPI_CRC_ENABLED) + if (nspi->acknowledge_mode == NCI_SPI_CRC_ENABLED) rx_len = ((resp_hdr[0] & NCI_SPI_MSB_PAYLOAD_MASK) << 8) + resp_hdr[1] + NCI_SPI_CRC_LEN; else rx_len = (resp_hdr[0] << 8) | resp_hdr[1]; - skb = nci_skb_alloc(ndev->nci_dev, rx_len, GFP_KERNEL); + skb = nci_skb_alloc(nspi->ndev, rx_len, GFP_KERNEL); if (!skb) return NULL; spi_message_init(&m); + + memset(&rx, 0, sizeof(struct spi_transfer)); rx.rx_buf = skb_put(skb, rx_len); rx.len = rx_len; rx.cs_change = 0; - rx.delay_usecs = ndev->xfer_udelay; + rx.delay_usecs = nspi->xfer_udelay; spi_message_add_tail(&rx, &m); - ret = spi_sync(ndev->spi, &m); + ret = spi_sync(nspi->spi, &m); if (ret) goto receive_error; - if (ndev->acknowledge_mode == NCI_SPI_CRC_ENABLED) { + if (nspi->acknowledge_mode == NCI_SPI_CRC_ENABLED) { *skb_push(skb, 1) = resp_hdr[1]; *skb_push(skb, 1) = resp_hdr[0]; } @@ -318,61 +269,53 @@ static u8 nci_spi_get_ack(struct sk_buff *skb) } /** - * nci_spi_recv_frame - receive frame from NCI SPI drivers + * nci_spi_read - read frame from NCI SPI drivers * - * @ndev: The nci spi device + * @nspi: The nci spi * Context: can sleep * * This call may only be used from a context that may sleep. The sleep * is non-interruptible, and has no timeout. * - * It returns zero on success, else a negative error code. + * It returns an allocated skb containing the frame on success, or NULL. */ -int nci_spi_recv_frame(struct nci_spi_dev *ndev) +struct sk_buff *nci_spi_read(struct nci_spi *nspi) { struct sk_buff *skb; - int ret = 0; - - ndev->ops->deassert_int(ndev); /* Retrieve frame from SPI */ - skb = __nci_spi_recv_frame(ndev); - if (!skb) { - ret = -EIO; + skb = __nci_spi_read(nspi); + if (!skb) goto done; - } - if (ndev->acknowledge_mode == NCI_SPI_CRC_ENABLED) { + if (nspi->acknowledge_mode == NCI_SPI_CRC_ENABLED) { if (!nci_spi_check_crc(skb)) { - send_acknowledge(ndev, ACKNOWLEDGE_NACK); + send_acknowledge(nspi, ACKNOWLEDGE_NACK); goto done; } /* In case of acknowledged mode: if ACK or NACK received, * unblock completion of latest frame sent. */ - ndev->req_result = nci_spi_get_ack(skb); - if (ndev->req_result) - complete(&ndev->req_completion); + nspi->req_result = nci_spi_get_ack(skb); + if (nspi->req_result) + complete(&nspi->req_completion); } /* If there is no payload (ACK/NACK only frame), * free the socket buffer */ - if (skb->len == 0) { + if (!skb->len) { kfree_skb(skb); + skb = NULL; goto done; } - if (ndev->acknowledge_mode == NCI_SPI_CRC_ENABLED) - send_acknowledge(ndev, ACKNOWLEDGE_ACK); - - /* Forward skb to NCI core layer */ - ret = nci_recv_frame(ndev->nci_dev, skb); + if (nspi->acknowledge_mode == NCI_SPI_CRC_ENABLED) + send_acknowledge(nspi, ACKNOWLEDGE_ACK); done: - ndev->ops->assert_int(ndev); - return ret; + return skb; } -EXPORT_SYMBOL_GPL(nci_spi_recv_frame); +EXPORT_SYMBOL_GPL(nci_spi_read); diff --git a/net/nfc/netlink.c b/net/nfc/netlink.c index 68063b2025da..84b7e3ea7b7a 100644 --- a/net/nfc/netlink.c +++ b/net/nfc/netlink.c @@ -58,6 +58,7 @@ static const struct nla_policy nfc_genl_policy[NFC_ATTR_MAX + 1] = { [NFC_ATTR_LLC_SDP] = { .type = NLA_NESTED }, [NFC_ATTR_FIRMWARE_NAME] = { .type = NLA_STRING, .len = NFC_FIRMWARE_NAME_MAXSIZE }, + [NFC_ATTR_SE_APDU] = { .type = NLA_BINARY }, }; static const struct nla_policy nfc_sdp_genl_policy[NFC_SDP_ATTR_MAX + 1] = { @@ -1278,6 +1279,91 @@ static int nfc_genl_dump_ses_done(struct netlink_callback *cb) return 0; } +struct se_io_ctx { + u32 dev_idx; + u32 se_idx; +}; + +static void se_io_cb(void *context, u8 *apdu, size_t apdu_len, int err) +{ + struct se_io_ctx *ctx = context; + struct sk_buff *msg; + void *hdr; + + msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!msg) { + kfree(ctx); + return; + } + + hdr = genlmsg_put(msg, 0, 0, &nfc_genl_family, 0, + NFC_CMD_SE_IO); + if (!hdr) + goto free_msg; + + if (nla_put_u32(msg, NFC_ATTR_DEVICE_INDEX, ctx->dev_idx) || + nla_put_u32(msg, NFC_ATTR_SE_INDEX, ctx->se_idx) || + nla_put(msg, NFC_ATTR_SE_APDU, apdu_len, apdu)) + goto nla_put_failure; + + genlmsg_end(msg, hdr); + + genlmsg_multicast(msg, 0, nfc_genl_event_mcgrp.id, GFP_KERNEL); + + kfree(ctx); + + return; + +nla_put_failure: + genlmsg_cancel(msg, hdr); +free_msg: + nlmsg_free(msg); + kfree(ctx); + + return; +} + +static int nfc_genl_se_io(struct sk_buff *skb, struct genl_info *info) +{ + struct nfc_dev *dev; + struct se_io_ctx *ctx; + u32 dev_idx, se_idx; + u8 *apdu; + size_t apdu_len; + + if (!info->attrs[NFC_ATTR_DEVICE_INDEX] || + !info->attrs[NFC_ATTR_SE_INDEX] || + !info->attrs[NFC_ATTR_SE_APDU]) + return -EINVAL; + + dev_idx = nla_get_u32(info->attrs[NFC_ATTR_DEVICE_INDEX]); + se_idx = nla_get_u32(info->attrs[NFC_ATTR_SE_INDEX]); + + dev = nfc_get_device(dev_idx); + if (!dev) + return -ENODEV; + + if (!dev->ops || !dev->ops->se_io) + return -ENOTSUPP; + + apdu_len = nla_len(info->attrs[NFC_ATTR_SE_APDU]); + if (apdu_len == 0) + return -EINVAL; + + apdu = nla_data(info->attrs[NFC_ATTR_SE_APDU]); + if (!apdu) + return -EINVAL; + + ctx = kzalloc(sizeof(struct se_io_ctx), GFP_KERNEL); + if (!ctx) + return -ENOMEM; + + ctx->dev_idx = dev_idx; + ctx->se_idx = se_idx; + + return dev->ops->se_io(dev, se_idx, apdu, apdu_len, se_io_cb, ctx); +} + static struct genl_ops nfc_genl_ops[] = { { .cmd = NFC_CMD_GET_DEVICE, @@ -1358,6 +1444,11 @@ static struct genl_ops nfc_genl_ops[] = { .done = nfc_genl_dump_ses_done, .policy = nfc_genl_policy, }, + { + .cmd = NFC_CMD_SE_IO, + .doit = nfc_genl_se_io, + .policy = nfc_genl_policy, + }, }; diff --git a/net/nfc/rawsock.c b/net/nfc/rawsock.c index 313bf1bc848a..cd958b381f96 100644 --- a/net/nfc/rawsock.c +++ b/net/nfc/rawsock.c @@ -142,11 +142,11 @@ static void rawsock_data_exchange_complete(void *context, struct sk_buff *skb, err = rawsock_add_header(skb); if (err) - goto error; + goto error_skb; err = sock_queue_rcv_skb(sk, skb); if (err) - goto error; + goto error_skb; spin_lock_bh(&sk->sk_write_queue.lock); if (!skb_queue_empty(&sk->sk_write_queue)) @@ -158,6 +158,9 @@ static void rawsock_data_exchange_complete(void *context, struct sk_buff *skb, sock_put(sk); return; +error_skb: + kfree_skb(skb); + error: rawsock_report_error(sk, err); sock_put(sk); diff --git a/net/openvswitch/Makefile b/net/openvswitch/Makefile index ea36e99089af..3591cb5dae91 100644 --- a/net/openvswitch/Makefile +++ b/net/openvswitch/Makefile @@ -9,6 +9,8 @@ openvswitch-y := \ datapath.o \ dp_notify.o \ flow.o \ + flow_netlink.o \ + flow_table.o \ vport.o \ vport-internal_dev.o \ vport-netdev.o diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c index 2aa13bd7f2b2..1408adc2a2a7 100644 --- a/net/openvswitch/datapath.c +++ b/net/openvswitch/datapath.c @@ -55,14 +55,10 @@ #include "datapath.h" #include "flow.h" +#include "flow_netlink.h" #include "vport-internal_dev.h" #include "vport-netdev.h" - -#define REHASH_FLOW_INTERVAL (10 * 60 * HZ) -static void rehash_flow_table(struct work_struct *work); -static DECLARE_DELAYED_WORK(rehash_flow_wq, rehash_flow_table); - int ovs_net_id __read_mostly; static void ovs_notify(struct sk_buff *skb, struct genl_info *info, @@ -165,7 +161,7 @@ static void destroy_dp_rcu(struct rcu_head *rcu) { struct datapath *dp = container_of(rcu, struct datapath, rcu); - ovs_flow_tbl_destroy((__force struct flow_table *)dp->table, false); + ovs_flow_tbl_destroy(&dp->table); free_percpu(dp->stats_percpu); release_net(ovs_dp_get_net(dp)); kfree(dp->ports); @@ -225,6 +221,7 @@ void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb) struct dp_stats_percpu *stats; struct sw_flow_key key; u64 *stats_counter; + u32 n_mask_hit; int error; stats = this_cpu_ptr(dp->stats_percpu); @@ -237,7 +234,7 @@ void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb) } /* Look up flow. */ - flow = ovs_flow_lookup(rcu_dereference(dp->table), &key); + flow = ovs_flow_tbl_lookup(&dp->table, &key, &n_mask_hit); if (unlikely(!flow)) { struct dp_upcall_info upcall; @@ -262,6 +259,7 @@ out: /* Update datapath statistics. */ u64_stats_update_begin(&stats->sync); (*stats_counter)++; + stats->n_mask_hit += n_mask_hit; u64_stats_update_end(&stats->sync); } @@ -435,7 +433,7 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex, upcall->dp_ifindex = dp_ifindex; nla = nla_nest_start(user_skb, OVS_PACKET_ATTR_KEY); - ovs_flow_to_nlattrs(upcall_info->key, upcall_info->key, user_skb); + ovs_nla_put_flow(upcall_info->key, upcall_info->key, user_skb); nla_nest_end(user_skb, nla); if (upcall_info->userdata) @@ -455,398 +453,6 @@ out: return err; } -/* Called with ovs_mutex. */ -static int flush_flows(struct datapath *dp) -{ - struct flow_table *old_table; - struct flow_table *new_table; - - old_table = ovsl_dereference(dp->table); - new_table = ovs_flow_tbl_alloc(TBL_MIN_BUCKETS); - if (!new_table) - return -ENOMEM; - - rcu_assign_pointer(dp->table, new_table); - - ovs_flow_tbl_destroy(old_table, true); - return 0; -} - -static struct nlattr *reserve_sfa_size(struct sw_flow_actions **sfa, int attr_len) -{ - - struct sw_flow_actions *acts; - int new_acts_size; - int req_size = NLA_ALIGN(attr_len); - int next_offset = offsetof(struct sw_flow_actions, actions) + - (*sfa)->actions_len; - - if (req_size <= (ksize(*sfa) - next_offset)) - goto out; - - new_acts_size = ksize(*sfa) * 2; - - if (new_acts_size > MAX_ACTIONS_BUFSIZE) { - if ((MAX_ACTIONS_BUFSIZE - next_offset) < req_size) - return ERR_PTR(-EMSGSIZE); - new_acts_size = MAX_ACTIONS_BUFSIZE; - } - - acts = ovs_flow_actions_alloc(new_acts_size); - if (IS_ERR(acts)) - return (void *)acts; - - memcpy(acts->actions, (*sfa)->actions, (*sfa)->actions_len); - acts->actions_len = (*sfa)->actions_len; - kfree(*sfa); - *sfa = acts; - -out: - (*sfa)->actions_len += req_size; - return (struct nlattr *) ((unsigned char *)(*sfa) + next_offset); -} - -static int add_action(struct sw_flow_actions **sfa, int attrtype, void *data, int len) -{ - struct nlattr *a; - - a = reserve_sfa_size(sfa, nla_attr_size(len)); - if (IS_ERR(a)) - return PTR_ERR(a); - - a->nla_type = attrtype; - a->nla_len = nla_attr_size(len); - - if (data) - memcpy(nla_data(a), data, len); - memset((unsigned char *) a + a->nla_len, 0, nla_padlen(len)); - - return 0; -} - -static inline int add_nested_action_start(struct sw_flow_actions **sfa, int attrtype) -{ - int used = (*sfa)->actions_len; - int err; - - err = add_action(sfa, attrtype, NULL, 0); - if (err) - return err; - - return used; -} - -static inline void add_nested_action_end(struct sw_flow_actions *sfa, int st_offset) -{ - struct nlattr *a = (struct nlattr *) ((unsigned char *)sfa->actions + st_offset); - - a->nla_len = sfa->actions_len - st_offset; -} - -static int validate_and_copy_actions(const struct nlattr *attr, - const struct sw_flow_key *key, int depth, - struct sw_flow_actions **sfa); - -static int validate_and_copy_sample(const struct nlattr *attr, - const struct sw_flow_key *key, int depth, - struct sw_flow_actions **sfa) -{ - const struct nlattr *attrs[OVS_SAMPLE_ATTR_MAX + 1]; - const struct nlattr *probability, *actions; - const struct nlattr *a; - int rem, start, err, st_acts; - - memset(attrs, 0, sizeof(attrs)); - nla_for_each_nested(a, attr, rem) { - int type = nla_type(a); - if (!type || type > OVS_SAMPLE_ATTR_MAX || attrs[type]) - return -EINVAL; - attrs[type] = a; - } - if (rem) - return -EINVAL; - - probability = attrs[OVS_SAMPLE_ATTR_PROBABILITY]; - if (!probability || nla_len(probability) != sizeof(u32)) - return -EINVAL; - - actions = attrs[OVS_SAMPLE_ATTR_ACTIONS]; - if (!actions || (nla_len(actions) && nla_len(actions) < NLA_HDRLEN)) - return -EINVAL; - - /* validation done, copy sample action. */ - start = add_nested_action_start(sfa, OVS_ACTION_ATTR_SAMPLE); - if (start < 0) - return start; - err = add_action(sfa, OVS_SAMPLE_ATTR_PROBABILITY, nla_data(probability), sizeof(u32)); - if (err) - return err; - st_acts = add_nested_action_start(sfa, OVS_SAMPLE_ATTR_ACTIONS); - if (st_acts < 0) - return st_acts; - - err = validate_and_copy_actions(actions, key, depth + 1, sfa); - if (err) - return err; - - add_nested_action_end(*sfa, st_acts); - add_nested_action_end(*sfa, start); - - return 0; -} - -static int validate_tp_port(const struct sw_flow_key *flow_key) -{ - if (flow_key->eth.type == htons(ETH_P_IP)) { - if (flow_key->ipv4.tp.src || flow_key->ipv4.tp.dst) - return 0; - } else if (flow_key->eth.type == htons(ETH_P_IPV6)) { - if (flow_key->ipv6.tp.src || flow_key->ipv6.tp.dst) - return 0; - } - - return -EINVAL; -} - -static int validate_and_copy_set_tun(const struct nlattr *attr, - struct sw_flow_actions **sfa) -{ - struct sw_flow_match match; - struct sw_flow_key key; - int err, start; - - ovs_match_init(&match, &key, NULL); - err = ovs_ipv4_tun_from_nlattr(nla_data(attr), &match, false); - if (err) - return err; - - start = add_nested_action_start(sfa, OVS_ACTION_ATTR_SET); - if (start < 0) - return start; - - err = add_action(sfa, OVS_KEY_ATTR_IPV4_TUNNEL, &match.key->tun_key, - sizeof(match.key->tun_key)); - add_nested_action_end(*sfa, start); - - return err; -} - -static int validate_set(const struct nlattr *a, - const struct sw_flow_key *flow_key, - struct sw_flow_actions **sfa, - bool *set_tun) -{ - const struct nlattr *ovs_key = nla_data(a); - int key_type = nla_type(ovs_key); - - /* There can be only one key in a action */ - if (nla_total_size(nla_len(ovs_key)) != nla_len(a)) - return -EINVAL; - - if (key_type > OVS_KEY_ATTR_MAX || - (ovs_key_lens[key_type] != nla_len(ovs_key) && - ovs_key_lens[key_type] != -1)) - return -EINVAL; - - switch (key_type) { - const struct ovs_key_ipv4 *ipv4_key; - const struct ovs_key_ipv6 *ipv6_key; - int err; - - case OVS_KEY_ATTR_PRIORITY: - case OVS_KEY_ATTR_SKB_MARK: - case OVS_KEY_ATTR_ETHERNET: - break; - - case OVS_KEY_ATTR_TUNNEL: - *set_tun = true; - err = validate_and_copy_set_tun(a, sfa); - if (err) - return err; - break; - - case OVS_KEY_ATTR_IPV4: - if (flow_key->eth.type != htons(ETH_P_IP)) - return -EINVAL; - - if (!flow_key->ip.proto) - return -EINVAL; - - ipv4_key = nla_data(ovs_key); - if (ipv4_key->ipv4_proto != flow_key->ip.proto) - return -EINVAL; - - if (ipv4_key->ipv4_frag != flow_key->ip.frag) - return -EINVAL; - - break; - - case OVS_KEY_ATTR_IPV6: - if (flow_key->eth.type != htons(ETH_P_IPV6)) - return -EINVAL; - - if (!flow_key->ip.proto) - return -EINVAL; - - ipv6_key = nla_data(ovs_key); - if (ipv6_key->ipv6_proto != flow_key->ip.proto) - return -EINVAL; - - if (ipv6_key->ipv6_frag != flow_key->ip.frag) - return -EINVAL; - - if (ntohl(ipv6_key->ipv6_label) & 0xFFF00000) - return -EINVAL; - - break; - - case OVS_KEY_ATTR_TCP: - if (flow_key->ip.proto != IPPROTO_TCP) - return -EINVAL; - - return validate_tp_port(flow_key); - - case OVS_KEY_ATTR_UDP: - if (flow_key->ip.proto != IPPROTO_UDP) - return -EINVAL; - - return validate_tp_port(flow_key); - - case OVS_KEY_ATTR_SCTP: - if (flow_key->ip.proto != IPPROTO_SCTP) - return -EINVAL; - - return validate_tp_port(flow_key); - - default: - return -EINVAL; - } - - return 0; -} - -static int validate_userspace(const struct nlattr *attr) -{ - static const struct nla_policy userspace_policy[OVS_USERSPACE_ATTR_MAX + 1] = { - [OVS_USERSPACE_ATTR_PID] = {.type = NLA_U32 }, - [OVS_USERSPACE_ATTR_USERDATA] = {.type = NLA_UNSPEC }, - }; - struct nlattr *a[OVS_USERSPACE_ATTR_MAX + 1]; - int error; - - error = nla_parse_nested(a, OVS_USERSPACE_ATTR_MAX, - attr, userspace_policy); - if (error) - return error; - - if (!a[OVS_USERSPACE_ATTR_PID] || - !nla_get_u32(a[OVS_USERSPACE_ATTR_PID])) - return -EINVAL; - - return 0; -} - -static int copy_action(const struct nlattr *from, - struct sw_flow_actions **sfa) -{ - int totlen = NLA_ALIGN(from->nla_len); - struct nlattr *to; - - to = reserve_sfa_size(sfa, from->nla_len); - if (IS_ERR(to)) - return PTR_ERR(to); - - memcpy(to, from, totlen); - return 0; -} - -static int validate_and_copy_actions(const struct nlattr *attr, - const struct sw_flow_key *key, - int depth, - struct sw_flow_actions **sfa) -{ - const struct nlattr *a; - int rem, err; - - if (depth >= SAMPLE_ACTION_DEPTH) - return -EOVERFLOW; - - nla_for_each_nested(a, attr, rem) { - /* Expected argument lengths, (u32)-1 for variable length. */ - static const u32 action_lens[OVS_ACTION_ATTR_MAX + 1] = { - [OVS_ACTION_ATTR_OUTPUT] = sizeof(u32), - [OVS_ACTION_ATTR_USERSPACE] = (u32)-1, - [OVS_ACTION_ATTR_PUSH_VLAN] = sizeof(struct ovs_action_push_vlan), - [OVS_ACTION_ATTR_POP_VLAN] = 0, - [OVS_ACTION_ATTR_SET] = (u32)-1, - [OVS_ACTION_ATTR_SAMPLE] = (u32)-1 - }; - const struct ovs_action_push_vlan *vlan; - int type = nla_type(a); - bool skip_copy; - - if (type > OVS_ACTION_ATTR_MAX || - (action_lens[type] != nla_len(a) && - action_lens[type] != (u32)-1)) - return -EINVAL; - - skip_copy = false; - switch (type) { - case OVS_ACTION_ATTR_UNSPEC: - return -EINVAL; - - case OVS_ACTION_ATTR_USERSPACE: - err = validate_userspace(a); - if (err) - return err; - break; - - case OVS_ACTION_ATTR_OUTPUT: - if (nla_get_u32(a) >= DP_MAX_PORTS) - return -EINVAL; - break; - - - case OVS_ACTION_ATTR_POP_VLAN: - break; - - case OVS_ACTION_ATTR_PUSH_VLAN: - vlan = nla_data(a); - if (vlan->vlan_tpid != htons(ETH_P_8021Q)) - return -EINVAL; - if (!(vlan->vlan_tci & htons(VLAN_TAG_PRESENT))) - return -EINVAL; - break; - - case OVS_ACTION_ATTR_SET: - err = validate_set(a, key, sfa, &skip_copy); - if (err) - return err; - break; - - case OVS_ACTION_ATTR_SAMPLE: - err = validate_and_copy_sample(a, key, depth, sfa); - if (err) - return err; - skip_copy = true; - break; - - default: - return -EINVAL; - } - if (!skip_copy) { - err = copy_action(a, sfa); - if (err) - return err; - } - } - - if (rem > 0) - return -EINVAL; - - return 0; -} - static void clear_stats(struct sw_flow *flow) { flow->used = 0; @@ -902,15 +508,16 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info) if (err) goto err_flow_free; - err = ovs_flow_metadata_from_nlattrs(flow, a[OVS_PACKET_ATTR_KEY]); + err = ovs_nla_get_flow_metadata(flow, a[OVS_PACKET_ATTR_KEY]); if (err) goto err_flow_free; - acts = ovs_flow_actions_alloc(nla_len(a[OVS_PACKET_ATTR_ACTIONS])); + acts = ovs_nla_alloc_flow_actions(nla_len(a[OVS_PACKET_ATTR_ACTIONS])); err = PTR_ERR(acts); if (IS_ERR(acts)) goto err_flow_free; - err = validate_and_copy_actions(a[OVS_PACKET_ATTR_ACTIONS], &flow->key, 0, &acts); + err = ovs_nla_copy_actions(a[OVS_PACKET_ATTR_ACTIONS], + &flow->key, 0, &acts); rcu_assign_pointer(flow->sf_acts, acts); if (err) goto err_flow_free; @@ -958,15 +565,18 @@ static struct genl_ops dp_packet_genl_ops[] = { } }; -static void get_dp_stats(struct datapath *dp, struct ovs_dp_stats *stats) +static void get_dp_stats(struct datapath *dp, struct ovs_dp_stats *stats, + struct ovs_dp_megaflow_stats *mega_stats) { - struct flow_table *table; int i; - table = rcu_dereference_check(dp->table, lockdep_ovsl_is_held()); - stats->n_flows = ovs_flow_tbl_count(table); + memset(mega_stats, 0, sizeof(*mega_stats)); + + stats->n_flows = ovs_flow_tbl_count(&dp->table); + mega_stats->n_masks = ovs_flow_tbl_num_masks(&dp->table); stats->n_hit = stats->n_missed = stats->n_lost = 0; + for_each_possible_cpu(i) { const struct dp_stats_percpu *percpu_stats; struct dp_stats_percpu local_stats; @@ -982,6 +592,7 @@ static void get_dp_stats(struct datapath *dp, struct ovs_dp_stats *stats) stats->n_hit += local_stats.n_hit; stats->n_missed += local_stats.n_missed; stats->n_lost += local_stats.n_lost; + mega_stats->n_mask_hit += local_stats.n_mask_hit; } } @@ -1005,100 +616,6 @@ static struct genl_multicast_group ovs_dp_flow_multicast_group = { .name = OVS_FLOW_MCGROUP }; -static int actions_to_attr(const struct nlattr *attr, int len, struct sk_buff *skb); -static int sample_action_to_attr(const struct nlattr *attr, struct sk_buff *skb) -{ - const struct nlattr *a; - struct nlattr *start; - int err = 0, rem; - - start = nla_nest_start(skb, OVS_ACTION_ATTR_SAMPLE); - if (!start) - return -EMSGSIZE; - - nla_for_each_nested(a, attr, rem) { - int type = nla_type(a); - struct nlattr *st_sample; - - switch (type) { - case OVS_SAMPLE_ATTR_PROBABILITY: - if (nla_put(skb, OVS_SAMPLE_ATTR_PROBABILITY, sizeof(u32), nla_data(a))) - return -EMSGSIZE; - break; - case OVS_SAMPLE_ATTR_ACTIONS: - st_sample = nla_nest_start(skb, OVS_SAMPLE_ATTR_ACTIONS); - if (!st_sample) - return -EMSGSIZE; - err = actions_to_attr(nla_data(a), nla_len(a), skb); - if (err) - return err; - nla_nest_end(skb, st_sample); - break; - } - } - - nla_nest_end(skb, start); - return err; -} - -static int set_action_to_attr(const struct nlattr *a, struct sk_buff *skb) -{ - const struct nlattr *ovs_key = nla_data(a); - int key_type = nla_type(ovs_key); - struct nlattr *start; - int err; - - switch (key_type) { - case OVS_KEY_ATTR_IPV4_TUNNEL: - start = nla_nest_start(skb, OVS_ACTION_ATTR_SET); - if (!start) - return -EMSGSIZE; - - err = ovs_ipv4_tun_to_nlattr(skb, nla_data(ovs_key), - nla_data(ovs_key)); - if (err) - return err; - nla_nest_end(skb, start); - break; - default: - if (nla_put(skb, OVS_ACTION_ATTR_SET, nla_len(a), ovs_key)) - return -EMSGSIZE; - break; - } - - return 0; -} - -static int actions_to_attr(const struct nlattr *attr, int len, struct sk_buff *skb) -{ - const struct nlattr *a; - int rem, err; - - nla_for_each_attr(a, attr, len, rem) { - int type = nla_type(a); - - switch (type) { - case OVS_ACTION_ATTR_SET: - err = set_action_to_attr(a, skb); - if (err) - return err; - break; - - case OVS_ACTION_ATTR_SAMPLE: - err = sample_action_to_attr(a, skb); - if (err) - return err; - break; - default: - if (nla_put(skb, type, nla_len(a), nla_data(a))) - return -EMSGSIZE; - break; - } - } - - return 0; -} - static size_t ovs_flow_cmd_msg_size(const struct sw_flow_actions *acts) { return NLMSG_ALIGN(sizeof(struct ovs_header)) @@ -1135,8 +652,7 @@ static int ovs_flow_cmd_fill_info(struct sw_flow *flow, struct datapath *dp, if (!nla) goto nla_put_failure; - err = ovs_flow_to_nlattrs(&flow->unmasked_key, - &flow->unmasked_key, skb); + err = ovs_nla_put_flow(&flow->unmasked_key, &flow->unmasked_key, skb); if (err) goto error; nla_nest_end(skb, nla); @@ -1145,7 +661,7 @@ static int ovs_flow_cmd_fill_info(struct sw_flow *flow, struct datapath *dp, if (!nla) goto nla_put_failure; - err = ovs_flow_to_nlattrs(&flow->key, &flow->mask->key, skb); + err = ovs_nla_put_flow(&flow->key, &flow->mask->key, skb); if (err) goto error; @@ -1155,7 +671,7 @@ static int ovs_flow_cmd_fill_info(struct sw_flow *flow, struct datapath *dp, used = flow->used; stats.n_packets = flow->packet_count; stats.n_bytes = flow->byte_count; - tcp_flags = flow->tcp_flags; + tcp_flags = (u8)ntohs(flow->tcp_flags); spin_unlock_bh(&flow->lock); if (used && @@ -1188,7 +704,8 @@ static int ovs_flow_cmd_fill_info(struct sw_flow *flow, struct datapath *dp, sf_acts = rcu_dereference_check(flow->sf_acts, lockdep_ovsl_is_held()); - err = actions_to_attr(sf_acts->actions, sf_acts->actions_len, skb); + err = ovs_nla_put_actions(sf_acts->actions, + sf_acts->actions_len, skb); if (!err) nla_nest_end(skb, start); else { @@ -1234,6 +751,14 @@ static struct sk_buff *ovs_flow_cmd_build_info(struct sw_flow *flow, return skb; } +static struct sw_flow *__ovs_flow_tbl_lookup(struct flow_table *tbl, + const struct sw_flow_key *key) +{ + u32 __always_unused n_mask_hit; + + return ovs_flow_tbl_lookup(tbl, key, &n_mask_hit); +} + static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info) { struct nlattr **a = info->attrs; @@ -1243,7 +768,6 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info) struct sw_flow_mask mask; struct sk_buff *reply; struct datapath *dp; - struct flow_table *table; struct sw_flow_actions *acts = NULL; struct sw_flow_match match; int error; @@ -1254,21 +778,21 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info) goto error; ovs_match_init(&match, &key, &mask); - error = ovs_match_from_nlattrs(&match, - a[OVS_FLOW_ATTR_KEY], a[OVS_FLOW_ATTR_MASK]); + error = ovs_nla_get_match(&match, + a[OVS_FLOW_ATTR_KEY], a[OVS_FLOW_ATTR_MASK]); if (error) goto error; /* Validate actions. */ if (a[OVS_FLOW_ATTR_ACTIONS]) { - acts = ovs_flow_actions_alloc(nla_len(a[OVS_FLOW_ATTR_ACTIONS])); + acts = ovs_nla_alloc_flow_actions(nla_len(a[OVS_FLOW_ATTR_ACTIONS])); error = PTR_ERR(acts); if (IS_ERR(acts)) goto error; - ovs_flow_key_mask(&masked_key, &key, &mask); - error = validate_and_copy_actions(a[OVS_FLOW_ATTR_ACTIONS], - &masked_key, 0, &acts); + ovs_flow_mask_key(&masked_key, &key, &mask); + error = ovs_nla_copy_actions(a[OVS_FLOW_ATTR_ACTIONS], + &masked_key, 0, &acts); if (error) { OVS_NLERR("Flow actions may not be safe on all matching packets.\n"); goto err_kfree; @@ -1284,29 +808,14 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info) if (!dp) goto err_unlock_ovs; - table = ovsl_dereference(dp->table); - /* Check if this is a duplicate flow */ - flow = ovs_flow_lookup(table, &key); + flow = __ovs_flow_tbl_lookup(&dp->table, &key); if (!flow) { - struct sw_flow_mask *mask_p; /* Bail out if we're not allowed to create a new flow. */ error = -ENOENT; if (info->genlhdr->cmd == OVS_FLOW_CMD_SET) goto err_unlock_ovs; - /* Expand table, if necessary, to make room. */ - if (ovs_flow_tbl_need_to_expand(table)) { - struct flow_table *new_table; - - new_table = ovs_flow_tbl_expand(table); - if (!IS_ERR(new_table)) { - rcu_assign_pointer(dp->table, new_table); - ovs_flow_tbl_destroy(table, true); - table = ovsl_dereference(dp->table); - } - } - /* Allocate flow. */ flow = ovs_flow_alloc(); if (IS_ERR(flow)) { @@ -1317,25 +826,14 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info) flow->key = masked_key; flow->unmasked_key = key; - - /* Make sure mask is unique in the system */ - mask_p = ovs_sw_flow_mask_find(table, &mask); - if (!mask_p) { - /* Allocate a new mask if none exsits. */ - mask_p = ovs_sw_flow_mask_alloc(); - if (!mask_p) - goto err_flow_free; - mask_p->key = mask.key; - mask_p->range = mask.range; - ovs_sw_flow_mask_insert(table, mask_p); - } - - ovs_sw_flow_mask_add_ref(mask_p); - flow->mask = mask_p; rcu_assign_pointer(flow->sf_acts, acts); /* Put flow in bucket. */ - ovs_flow_insert(table, flow); + error = ovs_flow_tbl_insert(&dp->table, flow, &mask); + if (error) { + acts = NULL; + goto err_flow_free; + } reply = ovs_flow_cmd_build_info(flow, dp, info->snd_portid, info->snd_seq, OVS_FLOW_CMD_NEW); @@ -1356,7 +854,7 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info) /* The unmasked key has to be the same for flow updates. */ error = -EINVAL; - if (!ovs_flow_cmp_unmasked_key(flow, &key, match.range.end)) { + if (!ovs_flow_cmp_unmasked_key(flow, &match)) { OVS_NLERR("Flow modification message rejected, unmasked key does not match.\n"); goto err_unlock_ovs; } @@ -1364,7 +862,7 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info) /* Update actions. */ old_acts = ovsl_dereference(flow->sf_acts); rcu_assign_pointer(flow->sf_acts, acts); - ovs_flow_deferred_free_acts(old_acts); + ovs_nla_free_flow_actions(old_acts); reply = ovs_flow_cmd_build_info(flow, dp, info->snd_portid, info->snd_seq, OVS_FLOW_CMD_NEW); @@ -1403,7 +901,6 @@ static int ovs_flow_cmd_get(struct sk_buff *skb, struct genl_info *info) struct sk_buff *reply; struct sw_flow *flow; struct datapath *dp; - struct flow_table *table; struct sw_flow_match match; int err; @@ -1413,7 +910,7 @@ static int ovs_flow_cmd_get(struct sk_buff *skb, struct genl_info *info) } ovs_match_init(&match, &key, NULL); - err = ovs_match_from_nlattrs(&match, a[OVS_FLOW_ATTR_KEY], NULL); + err = ovs_nla_get_match(&match, a[OVS_FLOW_ATTR_KEY], NULL); if (err) return err; @@ -1424,9 +921,8 @@ static int ovs_flow_cmd_get(struct sk_buff *skb, struct genl_info *info) goto unlock; } - table = ovsl_dereference(dp->table); - flow = ovs_flow_lookup_unmasked_key(table, &match); - if (!flow) { + flow = __ovs_flow_tbl_lookup(&dp->table, &key); + if (!flow || !ovs_flow_cmp_unmasked_key(flow, &match)) { err = -ENOENT; goto unlock; } @@ -1453,7 +949,6 @@ static int ovs_flow_cmd_del(struct sk_buff *skb, struct genl_info *info) struct sk_buff *reply; struct sw_flow *flow; struct datapath *dp; - struct flow_table *table; struct sw_flow_match match; int err; @@ -1465,18 +960,17 @@ static int ovs_flow_cmd_del(struct sk_buff *skb, struct genl_info *info) } if (!a[OVS_FLOW_ATTR_KEY]) { - err = flush_flows(dp); + err = ovs_flow_tbl_flush(&dp->table); goto unlock; } ovs_match_init(&match, &key, NULL); - err = ovs_match_from_nlattrs(&match, a[OVS_FLOW_ATTR_KEY], NULL); + err = ovs_nla_get_match(&match, a[OVS_FLOW_ATTR_KEY], NULL); if (err) goto unlock; - table = ovsl_dereference(dp->table); - flow = ovs_flow_lookup_unmasked_key(table, &match); - if (!flow) { + flow = __ovs_flow_tbl_lookup(&dp->table, &key); + if (!flow || !ovs_flow_cmp_unmasked_key(flow, &match)) { err = -ENOENT; goto unlock; } @@ -1487,7 +981,7 @@ static int ovs_flow_cmd_del(struct sk_buff *skb, struct genl_info *info) goto unlock; } - ovs_flow_remove(table, flow); + ovs_flow_tbl_remove(&dp->table, flow); err = ovs_flow_cmd_fill_info(flow, dp, reply, info->snd_portid, info->snd_seq, 0, OVS_FLOW_CMD_DEL); @@ -1506,8 +1000,8 @@ unlock: static int ovs_flow_cmd_dump(struct sk_buff *skb, struct netlink_callback *cb) { struct ovs_header *ovs_header = genlmsg_data(nlmsg_data(cb->nlh)); + struct table_instance *ti; struct datapath *dp; - struct flow_table *table; rcu_read_lock(); dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex); @@ -1516,14 +1010,14 @@ static int ovs_flow_cmd_dump(struct sk_buff *skb, struct netlink_callback *cb) return -ENODEV; } - table = rcu_dereference(dp->table); + ti = rcu_dereference(dp->table.ti); for (;;) { struct sw_flow *flow; u32 bucket, obj; bucket = cb->args[0]; obj = cb->args[1]; - flow = ovs_flow_dump_next(table, &bucket, &obj); + flow = ovs_flow_tbl_dump_next(ti, &bucket, &obj); if (!flow) break; @@ -1589,6 +1083,7 @@ static size_t ovs_dp_cmd_msg_size(void) msgsize += nla_total_size(IFNAMSIZ); msgsize += nla_total_size(sizeof(struct ovs_dp_stats)); + msgsize += nla_total_size(sizeof(struct ovs_dp_megaflow_stats)); return msgsize; } @@ -1598,6 +1093,7 @@ static int ovs_dp_cmd_fill_info(struct datapath *dp, struct sk_buff *skb, { struct ovs_header *ovs_header; struct ovs_dp_stats dp_stats; + struct ovs_dp_megaflow_stats dp_megaflow_stats; int err; ovs_header = genlmsg_put(skb, portid, seq, &dp_datapath_genl_family, @@ -1613,8 +1109,14 @@ static int ovs_dp_cmd_fill_info(struct datapath *dp, struct sk_buff *skb, if (err) goto nla_put_failure; - get_dp_stats(dp, &dp_stats); - if (nla_put(skb, OVS_DP_ATTR_STATS, sizeof(struct ovs_dp_stats), &dp_stats)) + get_dp_stats(dp, &dp_stats, &dp_megaflow_stats); + if (nla_put(skb, OVS_DP_ATTR_STATS, sizeof(struct ovs_dp_stats), + &dp_stats)) + goto nla_put_failure; + + if (nla_put(skb, OVS_DP_ATTR_MEGAFLOW_STATS, + sizeof(struct ovs_dp_megaflow_stats), + &dp_megaflow_stats)) goto nla_put_failure; return genlmsg_end(skb, ovs_header); @@ -1687,9 +1189,8 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info) ovs_dp_set_net(dp, hold_net(sock_net(skb->sk))); /* Allocate table. */ - err = -ENOMEM; - rcu_assign_pointer(dp->table, ovs_flow_tbl_alloc(TBL_MIN_BUCKETS)); - if (!dp->table) + err = ovs_flow_tbl_init(&dp->table); + if (err) goto err_free_dp; dp->stats_percpu = alloc_percpu(struct dp_stats_percpu); @@ -1699,7 +1200,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info) } dp->ports = kmalloc(DP_VPORT_HASH_BUCKETS * sizeof(struct hlist_head), - GFP_KERNEL); + GFP_KERNEL); if (!dp->ports) { err = -ENOMEM; goto err_destroy_percpu; @@ -1746,7 +1247,7 @@ err_destroy_ports_array: err_destroy_percpu: free_percpu(dp->stats_percpu); err_destroy_table: - ovs_flow_tbl_destroy(ovsl_dereference(dp->table), false); + ovs_flow_tbl_destroy(&dp->table); err_free_dp: release_net(ovs_dp_get_net(dp)); kfree(dp); @@ -2336,32 +1837,6 @@ error: return err; } -static void rehash_flow_table(struct work_struct *work) -{ - struct datapath *dp; - struct net *net; - - ovs_lock(); - rtnl_lock(); - for_each_net(net) { - struct ovs_net *ovs_net = net_generic(net, ovs_net_id); - - list_for_each_entry(dp, &ovs_net->dps, list_node) { - struct flow_table *old_table = ovsl_dereference(dp->table); - struct flow_table *new_table; - - new_table = ovs_flow_tbl_rehash(old_table); - if (!IS_ERR(new_table)) { - rcu_assign_pointer(dp->table, new_table); - ovs_flow_tbl_destroy(old_table, true); - } - } - } - rtnl_unlock(); - ovs_unlock(); - schedule_delayed_work(&rehash_flow_wq, REHASH_FLOW_INTERVAL); -} - static int __net_init ovs_init_net(struct net *net) { struct ovs_net *ovs_net = net_generic(net, ovs_net_id); @@ -2419,8 +1894,6 @@ static int __init dp_init(void) if (err < 0) goto error_unreg_notifier; - schedule_delayed_work(&rehash_flow_wq, REHASH_FLOW_INTERVAL); - return 0; error_unreg_notifier: @@ -2437,7 +1910,6 @@ error: static void dp_cleanup(void) { - cancel_delayed_work_sync(&rehash_flow_wq); dp_unregister_genl(ARRAY_SIZE(dp_genl_families)); unregister_netdevice_notifier(&ovs_dp_device_notifier); unregister_pernet_device(&ovs_net_ops); diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h index 4d109c176ef3..d3d14a58aa91 100644 --- a/net/openvswitch/datapath.h +++ b/net/openvswitch/datapath.h @@ -27,6 +27,7 @@ #include <linux/u64_stats_sync.h> #include "flow.h" +#include "flow_table.h" #include "vport.h" #define DP_MAX_PORTS USHRT_MAX @@ -45,11 +46,15 @@ * @n_lost: Number of received packets that had no matching flow in the flow * table that could not be sent to userspace (normally due to an overflow in * one of the datapath's queues). + * @n_mask_hit: Number of masks looked up for flow match. + * @n_mask_hit / (@n_hit + @n_missed) will be the average masks looked + * up per packet. */ struct dp_stats_percpu { u64 n_hit; u64 n_missed; u64 n_lost; + u64 n_mask_hit; struct u64_stats_sync sync; }; @@ -57,7 +62,7 @@ struct dp_stats_percpu { * struct datapath - datapath for flow-based packet switching * @rcu: RCU callback head for deferred destruction. * @list_node: Element in global 'dps' list. - * @table: Current flow table. Protected by ovs_mutex and RCU. + * @table: flow table. * @ports: Hash table for ports. %OVSP_LOCAL port always exists. Protected by * ovs_mutex and RCU. * @stats_percpu: Per-CPU datapath statistics. @@ -71,7 +76,7 @@ struct datapath { struct list_head list_node; /* Flow table. */ - struct flow_table __rcu *table; + struct flow_table table; /* Switch ports. */ struct hlist_head *ports; diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c index 410db90db73d..b409f5279601 100644 --- a/net/openvswitch/flow.c +++ b/net/openvswitch/flow.c @@ -45,202 +45,38 @@ #include <net/ipv6.h> #include <net/ndisc.h> -static struct kmem_cache *flow_cache; - -static void ovs_sw_flow_mask_set(struct sw_flow_mask *mask, - struct sw_flow_key_range *range, u8 val); - -static void update_range__(struct sw_flow_match *match, - size_t offset, size_t size, bool is_mask) +u64 ovs_flow_used_time(unsigned long flow_jiffies) { - struct sw_flow_key_range *range = NULL; - size_t start = rounddown(offset, sizeof(long)); - size_t end = roundup(offset + size, sizeof(long)); - - if (!is_mask) - range = &match->range; - else if (match->mask) - range = &match->mask->range; - - if (!range) - return; - - if (range->start == range->end) { - range->start = start; - range->end = end; - return; - } - - if (range->start > start) - range->start = start; + struct timespec cur_ts; + u64 cur_ms, idle_ms; - if (range->end < end) - range->end = end; -} + ktime_get_ts(&cur_ts); + idle_ms = jiffies_to_msecs(jiffies - flow_jiffies); + cur_ms = (u64)cur_ts.tv_sec * MSEC_PER_SEC + + cur_ts.tv_nsec / NSEC_PER_MSEC; -#define SW_FLOW_KEY_PUT(match, field, value, is_mask) \ - do { \ - update_range__(match, offsetof(struct sw_flow_key, field), \ - sizeof((match)->key->field), is_mask); \ - if (is_mask) { \ - if ((match)->mask) \ - (match)->mask->key.field = value; \ - } else { \ - (match)->key->field = value; \ - } \ - } while (0) - -#define SW_FLOW_KEY_MEMCPY(match, field, value_p, len, is_mask) \ - do { \ - update_range__(match, offsetof(struct sw_flow_key, field), \ - len, is_mask); \ - if (is_mask) { \ - if ((match)->mask) \ - memcpy(&(match)->mask->key.field, value_p, len);\ - } else { \ - memcpy(&(match)->key->field, value_p, len); \ - } \ - } while (0) - -static u16 range_n_bytes(const struct sw_flow_key_range *range) -{ - return range->end - range->start; + return cur_ms - idle_ms; } -void ovs_match_init(struct sw_flow_match *match, - struct sw_flow_key *key, - struct sw_flow_mask *mask) -{ - memset(match, 0, sizeof(*match)); - match->key = key; - match->mask = mask; +#define TCP_FLAGS_BE16(tp) (*(__be16 *)&tcp_flag_word(tp) & htons(0x0FFF)) - memset(key, 0, sizeof(*key)); - - if (mask) { - memset(&mask->key, 0, sizeof(mask->key)); - mask->range.start = mask->range.end = 0; - } -} - -static bool ovs_match_validate(const struct sw_flow_match *match, - u64 key_attrs, u64 mask_attrs) +void ovs_flow_used(struct sw_flow *flow, struct sk_buff *skb) { - u64 key_expected = 1 << OVS_KEY_ATTR_ETHERNET; - u64 mask_allowed = key_attrs; /* At most allow all key attributes */ - - /* The following mask attributes allowed only if they - * pass the validation tests. */ - mask_allowed &= ~((1 << OVS_KEY_ATTR_IPV4) - | (1 << OVS_KEY_ATTR_IPV6) - | (1 << OVS_KEY_ATTR_TCP) - | (1 << OVS_KEY_ATTR_UDP) - | (1 << OVS_KEY_ATTR_SCTP) - | (1 << OVS_KEY_ATTR_ICMP) - | (1 << OVS_KEY_ATTR_ICMPV6) - | (1 << OVS_KEY_ATTR_ARP) - | (1 << OVS_KEY_ATTR_ND)); - - /* Always allowed mask fields. */ - mask_allowed |= ((1 << OVS_KEY_ATTR_TUNNEL) - | (1 << OVS_KEY_ATTR_IN_PORT) - | (1 << OVS_KEY_ATTR_ETHERTYPE)); - - /* Check key attributes. */ - if (match->key->eth.type == htons(ETH_P_ARP) - || match->key->eth.type == htons(ETH_P_RARP)) { - key_expected |= 1 << OVS_KEY_ATTR_ARP; - if (match->mask && (match->mask->key.eth.type == htons(0xffff))) - mask_allowed |= 1 << OVS_KEY_ATTR_ARP; - } + __be16 tcp_flags = 0; - if (match->key->eth.type == htons(ETH_P_IP)) { - key_expected |= 1 << OVS_KEY_ATTR_IPV4; - if (match->mask && (match->mask->key.eth.type == htons(0xffff))) - mask_allowed |= 1 << OVS_KEY_ATTR_IPV4; - - if (match->key->ip.frag != OVS_FRAG_TYPE_LATER) { - if (match->key->ip.proto == IPPROTO_UDP) { - key_expected |= 1 << OVS_KEY_ATTR_UDP; - if (match->mask && (match->mask->key.ip.proto == 0xff)) - mask_allowed |= 1 << OVS_KEY_ATTR_UDP; - } - - if (match->key->ip.proto == IPPROTO_SCTP) { - key_expected |= 1 << OVS_KEY_ATTR_SCTP; - if (match->mask && (match->mask->key.ip.proto == 0xff)) - mask_allowed |= 1 << OVS_KEY_ATTR_SCTP; - } - - if (match->key->ip.proto == IPPROTO_TCP) { - key_expected |= 1 << OVS_KEY_ATTR_TCP; - if (match->mask && (match->mask->key.ip.proto == 0xff)) - mask_allowed |= 1 << OVS_KEY_ATTR_TCP; - } - - if (match->key->ip.proto == IPPROTO_ICMP) { - key_expected |= 1 << OVS_KEY_ATTR_ICMP; - if (match->mask && (match->mask->key.ip.proto == 0xff)) - mask_allowed |= 1 << OVS_KEY_ATTR_ICMP; - } - } - } - - if (match->key->eth.type == htons(ETH_P_IPV6)) { - key_expected |= 1 << OVS_KEY_ATTR_IPV6; - if (match->mask && (match->mask->key.eth.type == htons(0xffff))) - mask_allowed |= 1 << OVS_KEY_ATTR_IPV6; - - if (match->key->ip.frag != OVS_FRAG_TYPE_LATER) { - if (match->key->ip.proto == IPPROTO_UDP) { - key_expected |= 1 << OVS_KEY_ATTR_UDP; - if (match->mask && (match->mask->key.ip.proto == 0xff)) - mask_allowed |= 1 << OVS_KEY_ATTR_UDP; - } - - if (match->key->ip.proto == IPPROTO_SCTP) { - key_expected |= 1 << OVS_KEY_ATTR_SCTP; - if (match->mask && (match->mask->key.ip.proto == 0xff)) - mask_allowed |= 1 << OVS_KEY_ATTR_SCTP; - } - - if (match->key->ip.proto == IPPROTO_TCP) { - key_expected |= 1 << OVS_KEY_ATTR_TCP; - if (match->mask && (match->mask->key.ip.proto == 0xff)) - mask_allowed |= 1 << OVS_KEY_ATTR_TCP; - } - - if (match->key->ip.proto == IPPROTO_ICMPV6) { - key_expected |= 1 << OVS_KEY_ATTR_ICMPV6; - if (match->mask && (match->mask->key.ip.proto == 0xff)) - mask_allowed |= 1 << OVS_KEY_ATTR_ICMPV6; - - if (match->key->ipv6.tp.src == - htons(NDISC_NEIGHBOUR_SOLICITATION) || - match->key->ipv6.tp.src == htons(NDISC_NEIGHBOUR_ADVERTISEMENT)) { - key_expected |= 1 << OVS_KEY_ATTR_ND; - if (match->mask && (match->mask->key.ipv6.tp.src == htons(0xffff))) - mask_allowed |= 1 << OVS_KEY_ATTR_ND; - } - } - } - } - - if ((key_attrs & key_expected) != key_expected) { - /* Key attributes check failed. */ - OVS_NLERR("Missing expected key attributes (key_attrs=%llx, expected=%llx).\n", - key_attrs, key_expected); - return false; - } - - if ((mask_attrs & mask_allowed) != mask_attrs) { - /* Mask attributes check failed. */ - OVS_NLERR("Contain more than allowed mask fields (mask_attrs=%llx, mask_allowed=%llx).\n", - mask_attrs, mask_allowed); - return false; + if ((flow->key.eth.type == htons(ETH_P_IP) || + flow->key.eth.type == htons(ETH_P_IPV6)) && + flow->key.ip.proto == IPPROTO_TCP && + likely(skb->len >= skb_transport_offset(skb) + sizeof(struct tcphdr))) { + tcp_flags = TCP_FLAGS_BE16(tcp_hdr(skb)); } - return true; + spin_lock(&flow->lock); + flow->used = jiffies; + flow->packet_count++; + flow->byte_count += skb->len; + flow->tcp_flags |= tcp_flags; + spin_unlock(&flow->lock); } static int check_header(struct sk_buff *skb, int len) @@ -311,19 +147,6 @@ static bool icmphdr_ok(struct sk_buff *skb) sizeof(struct icmphdr)); } -u64 ovs_flow_used_time(unsigned long flow_jiffies) -{ - struct timespec cur_ts; - u64 cur_ms, idle_ms; - - ktime_get_ts(&cur_ts); - idle_ms = jiffies_to_msecs(jiffies - flow_jiffies); - cur_ms = (u64)cur_ts.tv_sec * MSEC_PER_SEC + - cur_ts.tv_nsec / NSEC_PER_MSEC; - - return cur_ms - idle_ms; -} - static int parse_ipv6hdr(struct sk_buff *skb, struct sw_flow_key *key) { unsigned int nh_ofs = skb_network_offset(skb); @@ -372,311 +195,6 @@ static bool icmp6hdr_ok(struct sk_buff *skb) sizeof(struct icmp6hdr)); } -void ovs_flow_key_mask(struct sw_flow_key *dst, const struct sw_flow_key *src, - const struct sw_flow_mask *mask) -{ - const long *m = (long *)((u8 *)&mask->key + mask->range.start); - const long *s = (long *)((u8 *)src + mask->range.start); - long *d = (long *)((u8 *)dst + mask->range.start); - int i; - - /* The memory outside of the 'mask->range' are not set since - * further operations on 'dst' only uses contents within - * 'mask->range'. - */ - for (i = 0; i < range_n_bytes(&mask->range); i += sizeof(long)) - *d++ = *s++ & *m++; -} - -#define TCP_FLAGS_OFFSET 13 -#define TCP_FLAG_MASK 0x3f - -void ovs_flow_used(struct sw_flow *flow, struct sk_buff *skb) -{ - u8 tcp_flags = 0; - - if ((flow->key.eth.type == htons(ETH_P_IP) || - flow->key.eth.type == htons(ETH_P_IPV6)) && - flow->key.ip.proto == IPPROTO_TCP && - likely(skb->len >= skb_transport_offset(skb) + sizeof(struct tcphdr))) { - u8 *tcp = (u8 *)tcp_hdr(skb); - tcp_flags = *(tcp + TCP_FLAGS_OFFSET) & TCP_FLAG_MASK; - } - - spin_lock(&flow->lock); - flow->used = jiffies; - flow->packet_count++; - flow->byte_count += skb->len; - flow->tcp_flags |= tcp_flags; - spin_unlock(&flow->lock); -} - -struct sw_flow_actions *ovs_flow_actions_alloc(int size) -{ - struct sw_flow_actions *sfa; - - if (size > MAX_ACTIONS_BUFSIZE) - return ERR_PTR(-EINVAL); - - sfa = kmalloc(sizeof(*sfa) + size, GFP_KERNEL); - if (!sfa) - return ERR_PTR(-ENOMEM); - - sfa->actions_len = 0; - return sfa; -} - -struct sw_flow *ovs_flow_alloc(void) -{ - struct sw_flow *flow; - - flow = kmem_cache_alloc(flow_cache, GFP_KERNEL); - if (!flow) - return ERR_PTR(-ENOMEM); - - spin_lock_init(&flow->lock); - flow->sf_acts = NULL; - flow->mask = NULL; - - return flow; -} - -static struct hlist_head *find_bucket(struct flow_table *table, u32 hash) -{ - hash = jhash_1word(hash, table->hash_seed); - return flex_array_get(table->buckets, - (hash & (table->n_buckets - 1))); -} - -static struct flex_array *alloc_buckets(unsigned int n_buckets) -{ - struct flex_array *buckets; - int i, err; - - buckets = flex_array_alloc(sizeof(struct hlist_head), - n_buckets, GFP_KERNEL); - if (!buckets) - return NULL; - - err = flex_array_prealloc(buckets, 0, n_buckets, GFP_KERNEL); - if (err) { - flex_array_free(buckets); - return NULL; - } - - for (i = 0; i < n_buckets; i++) - INIT_HLIST_HEAD((struct hlist_head *) - flex_array_get(buckets, i)); - - return buckets; -} - -static void free_buckets(struct flex_array *buckets) -{ - flex_array_free(buckets); -} - -static struct flow_table *__flow_tbl_alloc(int new_size) -{ - struct flow_table *table = kmalloc(sizeof(*table), GFP_KERNEL); - - if (!table) - return NULL; - - table->buckets = alloc_buckets(new_size); - - if (!table->buckets) { - kfree(table); - return NULL; - } - table->n_buckets = new_size; - table->count = 0; - table->node_ver = 0; - table->keep_flows = false; - get_random_bytes(&table->hash_seed, sizeof(u32)); - table->mask_list = NULL; - - return table; -} - -static void __flow_tbl_destroy(struct flow_table *table) -{ - int i; - - if (table->keep_flows) - goto skip_flows; - - for (i = 0; i < table->n_buckets; i++) { - struct sw_flow *flow; - struct hlist_head *head = flex_array_get(table->buckets, i); - struct hlist_node *n; - int ver = table->node_ver; - - hlist_for_each_entry_safe(flow, n, head, hash_node[ver]) { - hlist_del(&flow->hash_node[ver]); - ovs_flow_free(flow, false); - } - } - - BUG_ON(!list_empty(table->mask_list)); - kfree(table->mask_list); - -skip_flows: - free_buckets(table->buckets); - kfree(table); -} - -struct flow_table *ovs_flow_tbl_alloc(int new_size) -{ - struct flow_table *table = __flow_tbl_alloc(new_size); - - if (!table) - return NULL; - - table->mask_list = kmalloc(sizeof(struct list_head), GFP_KERNEL); - if (!table->mask_list) { - table->keep_flows = true; - __flow_tbl_destroy(table); - return NULL; - } - INIT_LIST_HEAD(table->mask_list); - - return table; -} - -static void flow_tbl_destroy_rcu_cb(struct rcu_head *rcu) -{ - struct flow_table *table = container_of(rcu, struct flow_table, rcu); - - __flow_tbl_destroy(table); -} - -void ovs_flow_tbl_destroy(struct flow_table *table, bool deferred) -{ - if (!table) - return; - - if (deferred) - call_rcu(&table->rcu, flow_tbl_destroy_rcu_cb); - else - __flow_tbl_destroy(table); -} - -struct sw_flow *ovs_flow_dump_next(struct flow_table *table, u32 *bucket, u32 *last) -{ - struct sw_flow *flow; - struct hlist_head *head; - int ver; - int i; - - ver = table->node_ver; - while (*bucket < table->n_buckets) { - i = 0; - head = flex_array_get(table->buckets, *bucket); - hlist_for_each_entry_rcu(flow, head, hash_node[ver]) { - if (i < *last) { - i++; - continue; - } - *last = i + 1; - return flow; - } - (*bucket)++; - *last = 0; - } - - return NULL; -} - -static void __tbl_insert(struct flow_table *table, struct sw_flow *flow) -{ - struct hlist_head *head; - - head = find_bucket(table, flow->hash); - hlist_add_head_rcu(&flow->hash_node[table->node_ver], head); - - table->count++; -} - -static void flow_table_copy_flows(struct flow_table *old, struct flow_table *new) -{ - int old_ver; - int i; - - old_ver = old->node_ver; - new->node_ver = !old_ver; - - /* Insert in new table. */ - for (i = 0; i < old->n_buckets; i++) { - struct sw_flow *flow; - struct hlist_head *head; - - head = flex_array_get(old->buckets, i); - - hlist_for_each_entry(flow, head, hash_node[old_ver]) - __tbl_insert(new, flow); - } - - new->mask_list = old->mask_list; - old->keep_flows = true; -} - -static struct flow_table *__flow_tbl_rehash(struct flow_table *table, int n_buckets) -{ - struct flow_table *new_table; - - new_table = __flow_tbl_alloc(n_buckets); - if (!new_table) - return ERR_PTR(-ENOMEM); - - flow_table_copy_flows(table, new_table); - - return new_table; -} - -struct flow_table *ovs_flow_tbl_rehash(struct flow_table *table) -{ - return __flow_tbl_rehash(table, table->n_buckets); -} - -struct flow_table *ovs_flow_tbl_expand(struct flow_table *table) -{ - return __flow_tbl_rehash(table, table->n_buckets * 2); -} - -static void __flow_free(struct sw_flow *flow) -{ - kfree((struct sf_flow_acts __force *)flow->sf_acts); - kmem_cache_free(flow_cache, flow); -} - -static void rcu_free_flow_callback(struct rcu_head *rcu) -{ - struct sw_flow *flow = container_of(rcu, struct sw_flow, rcu); - - __flow_free(flow); -} - -void ovs_flow_free(struct sw_flow *flow, bool deferred) -{ - if (!flow) - return; - - ovs_sw_flow_mask_del_ref(flow->mask, deferred); - - if (deferred) - call_rcu(&flow->rcu, rcu_free_flow_callback); - else - __flow_free(flow); -} - -/* Schedules 'sf_acts' to be freed after the next RCU grace period. - * The caller must hold rcu_read_lock for this to be sensible. */ -void ovs_flow_deferred_free_acts(struct sw_flow_actions *sf_acts) -{ - kfree_rcu(sf_acts, rcu); -} - static int parse_vlan(struct sk_buff *skb, struct sw_flow_key *key) { struct qtag_prefix { @@ -910,6 +428,7 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key) struct tcphdr *tcp = tcp_hdr(skb); key->ipv4.tp.src = tcp->source; key->ipv4.tp.dst = tcp->dest; + key->ipv4.tp.flags = TCP_FLAGS_BE16(tcp); } } else if (key->ip.proto == IPPROTO_UDP) { if (udphdr_ok(skb)) { @@ -978,6 +497,7 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key) struct tcphdr *tcp = tcp_hdr(skb); key->ipv6.tp.src = tcp->source; key->ipv6.tp.dst = tcp->dest; + key->ipv6.tp.flags = TCP_FLAGS_BE16(tcp); } } else if (key->ip.proto == NEXTHDR_UDP) { if (udphdr_ok(skb)) { @@ -1002,1080 +522,3 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key) return 0; } - -static u32 ovs_flow_hash(const struct sw_flow_key *key, int key_start, - int key_end) -{ - u32 *hash_key = (u32 *)((u8 *)key + key_start); - int hash_u32s = (key_end - key_start) >> 2; - - /* Make sure number of hash bytes are multiple of u32. */ - BUILD_BUG_ON(sizeof(long) % sizeof(u32)); - - return jhash2(hash_key, hash_u32s, 0); -} - -static int flow_key_start(const struct sw_flow_key *key) -{ - if (key->tun_key.ipv4_dst) - return 0; - else - return rounddown(offsetof(struct sw_flow_key, phy), - sizeof(long)); -} - -static bool __cmp_key(const struct sw_flow_key *key1, - const struct sw_flow_key *key2, int key_start, int key_end) -{ - const long *cp1 = (long *)((u8 *)key1 + key_start); - const long *cp2 = (long *)((u8 *)key2 + key_start); - long diffs = 0; - int i; - - for (i = key_start; i < key_end; i += sizeof(long)) - diffs |= *cp1++ ^ *cp2++; - - return diffs == 0; -} - -static bool __flow_cmp_masked_key(const struct sw_flow *flow, - const struct sw_flow_key *key, int key_start, int key_end) -{ - return __cmp_key(&flow->key, key, key_start, key_end); -} - -static bool __flow_cmp_unmasked_key(const struct sw_flow *flow, - const struct sw_flow_key *key, int key_start, int key_end) -{ - return __cmp_key(&flow->unmasked_key, key, key_start, key_end); -} - -bool ovs_flow_cmp_unmasked_key(const struct sw_flow *flow, - const struct sw_flow_key *key, int key_end) -{ - int key_start; - key_start = flow_key_start(key); - - return __flow_cmp_unmasked_key(flow, key, key_start, key_end); - -} - -struct sw_flow *ovs_flow_lookup_unmasked_key(struct flow_table *table, - struct sw_flow_match *match) -{ - struct sw_flow_key *unmasked = match->key; - int key_end = match->range.end; - struct sw_flow *flow; - - flow = ovs_flow_lookup(table, unmasked); - if (flow && (!ovs_flow_cmp_unmasked_key(flow, unmasked, key_end))) - flow = NULL; - - return flow; -} - -static struct sw_flow *ovs_masked_flow_lookup(struct flow_table *table, - const struct sw_flow_key *unmasked, - struct sw_flow_mask *mask) -{ - struct sw_flow *flow; - struct hlist_head *head; - int key_start = mask->range.start; - int key_end = mask->range.end; - u32 hash; - struct sw_flow_key masked_key; - - ovs_flow_key_mask(&masked_key, unmasked, mask); - hash = ovs_flow_hash(&masked_key, key_start, key_end); - head = find_bucket(table, hash); - hlist_for_each_entry_rcu(flow, head, hash_node[table->node_ver]) { - if (flow->mask == mask && - __flow_cmp_masked_key(flow, &masked_key, - key_start, key_end)) - return flow; - } - return NULL; -} - -struct sw_flow *ovs_flow_lookup(struct flow_table *tbl, - const struct sw_flow_key *key) -{ - struct sw_flow *flow = NULL; - struct sw_flow_mask *mask; - - list_for_each_entry_rcu(mask, tbl->mask_list, list) { - flow = ovs_masked_flow_lookup(tbl, key, mask); - if (flow) /* Found */ - break; - } - - return flow; -} - - -void ovs_flow_insert(struct flow_table *table, struct sw_flow *flow) -{ - flow->hash = ovs_flow_hash(&flow->key, flow->mask->range.start, - flow->mask->range.end); - __tbl_insert(table, flow); -} - -void ovs_flow_remove(struct flow_table *table, struct sw_flow *flow) -{ - BUG_ON(table->count == 0); - hlist_del_rcu(&flow->hash_node[table->node_ver]); - table->count--; -} - -/* The size of the argument for each %OVS_KEY_ATTR_* Netlink attribute. */ -const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = { - [OVS_KEY_ATTR_ENCAP] = -1, - [OVS_KEY_ATTR_PRIORITY] = sizeof(u32), - [OVS_KEY_ATTR_IN_PORT] = sizeof(u32), - [OVS_KEY_ATTR_SKB_MARK] = sizeof(u32), - [OVS_KEY_ATTR_ETHERNET] = sizeof(struct ovs_key_ethernet), - [OVS_KEY_ATTR_VLAN] = sizeof(__be16), - [OVS_KEY_ATTR_ETHERTYPE] = sizeof(__be16), - [OVS_KEY_ATTR_IPV4] = sizeof(struct ovs_key_ipv4), - [OVS_KEY_ATTR_IPV6] = sizeof(struct ovs_key_ipv6), - [OVS_KEY_ATTR_TCP] = sizeof(struct ovs_key_tcp), - [OVS_KEY_ATTR_UDP] = sizeof(struct ovs_key_udp), - [OVS_KEY_ATTR_SCTP] = sizeof(struct ovs_key_sctp), - [OVS_KEY_ATTR_ICMP] = sizeof(struct ovs_key_icmp), - [OVS_KEY_ATTR_ICMPV6] = sizeof(struct ovs_key_icmpv6), - [OVS_KEY_ATTR_ARP] = sizeof(struct ovs_key_arp), - [OVS_KEY_ATTR_ND] = sizeof(struct ovs_key_nd), - [OVS_KEY_ATTR_TUNNEL] = -1, -}; - -static bool is_all_zero(const u8 *fp, size_t size) -{ - int i; - - if (!fp) - return false; - - for (i = 0; i < size; i++) - if (fp[i]) - return false; - - return true; -} - -static int __parse_flow_nlattrs(const struct nlattr *attr, - const struct nlattr *a[], - u64 *attrsp, bool nz) -{ - const struct nlattr *nla; - u32 attrs; - int rem; - - attrs = *attrsp; - nla_for_each_nested(nla, attr, rem) { - u16 type = nla_type(nla); - int expected_len; - - if (type > OVS_KEY_ATTR_MAX) { - OVS_NLERR("Unknown key attribute (type=%d, max=%d).\n", - type, OVS_KEY_ATTR_MAX); - return -EINVAL; - } - - if (attrs & (1 << type)) { - OVS_NLERR("Duplicate key attribute (type %d).\n", type); - return -EINVAL; - } - - expected_len = ovs_key_lens[type]; - if (nla_len(nla) != expected_len && expected_len != -1) { - OVS_NLERR("Key attribute has unexpected length (type=%d" - ", length=%d, expected=%d).\n", type, - nla_len(nla), expected_len); - return -EINVAL; - } - - if (!nz || !is_all_zero(nla_data(nla), expected_len)) { - attrs |= 1 << type; - a[type] = nla; - } - } - if (rem) { - OVS_NLERR("Message has %d unknown bytes.\n", rem); - return -EINVAL; - } - - *attrsp = attrs; - return 0; -} - -static int parse_flow_mask_nlattrs(const struct nlattr *attr, - const struct nlattr *a[], u64 *attrsp) -{ - return __parse_flow_nlattrs(attr, a, attrsp, true); -} - -static int parse_flow_nlattrs(const struct nlattr *attr, - const struct nlattr *a[], u64 *attrsp) -{ - return __parse_flow_nlattrs(attr, a, attrsp, false); -} - -int ovs_ipv4_tun_from_nlattr(const struct nlattr *attr, - struct sw_flow_match *match, bool is_mask) -{ - struct nlattr *a; - int rem; - bool ttl = false; - __be16 tun_flags = 0; - - nla_for_each_nested(a, attr, rem) { - int type = nla_type(a); - static const u32 ovs_tunnel_key_lens[OVS_TUNNEL_KEY_ATTR_MAX + 1] = { - [OVS_TUNNEL_KEY_ATTR_ID] = sizeof(u64), - [OVS_TUNNEL_KEY_ATTR_IPV4_SRC] = sizeof(u32), - [OVS_TUNNEL_KEY_ATTR_IPV4_DST] = sizeof(u32), - [OVS_TUNNEL_KEY_ATTR_TOS] = 1, - [OVS_TUNNEL_KEY_ATTR_TTL] = 1, - [OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT] = 0, - [OVS_TUNNEL_KEY_ATTR_CSUM] = 0, - }; - - if (type > OVS_TUNNEL_KEY_ATTR_MAX) { - OVS_NLERR("Unknown IPv4 tunnel attribute (type=%d, max=%d).\n", - type, OVS_TUNNEL_KEY_ATTR_MAX); - return -EINVAL; - } - - if (ovs_tunnel_key_lens[type] != nla_len(a)) { - OVS_NLERR("IPv4 tunnel attribute type has unexpected " - " length (type=%d, length=%d, expected=%d).\n", - type, nla_len(a), ovs_tunnel_key_lens[type]); - return -EINVAL; - } - - switch (type) { - case OVS_TUNNEL_KEY_ATTR_ID: - SW_FLOW_KEY_PUT(match, tun_key.tun_id, - nla_get_be64(a), is_mask); - tun_flags |= TUNNEL_KEY; - break; - case OVS_TUNNEL_KEY_ATTR_IPV4_SRC: - SW_FLOW_KEY_PUT(match, tun_key.ipv4_src, - nla_get_be32(a), is_mask); - break; - case OVS_TUNNEL_KEY_ATTR_IPV4_DST: - SW_FLOW_KEY_PUT(match, tun_key.ipv4_dst, - nla_get_be32(a), is_mask); - break; - case OVS_TUNNEL_KEY_ATTR_TOS: - SW_FLOW_KEY_PUT(match, tun_key.ipv4_tos, - nla_get_u8(a), is_mask); - break; - case OVS_TUNNEL_KEY_ATTR_TTL: - SW_FLOW_KEY_PUT(match, tun_key.ipv4_ttl, - nla_get_u8(a), is_mask); - ttl = true; - break; - case OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT: - tun_flags |= TUNNEL_DONT_FRAGMENT; - break; - case OVS_TUNNEL_KEY_ATTR_CSUM: - tun_flags |= TUNNEL_CSUM; - break; - default: - return -EINVAL; - } - } - - SW_FLOW_KEY_PUT(match, tun_key.tun_flags, tun_flags, is_mask); - - if (rem > 0) { - OVS_NLERR("IPv4 tunnel attribute has %d unknown bytes.\n", rem); - return -EINVAL; - } - - if (!is_mask) { - if (!match->key->tun_key.ipv4_dst) { - OVS_NLERR("IPv4 tunnel destination address is zero.\n"); - return -EINVAL; - } - - if (!ttl) { - OVS_NLERR("IPv4 tunnel TTL not specified.\n"); - return -EINVAL; - } - } - - return 0; -} - -int ovs_ipv4_tun_to_nlattr(struct sk_buff *skb, - const struct ovs_key_ipv4_tunnel *tun_key, - const struct ovs_key_ipv4_tunnel *output) -{ - struct nlattr *nla; - - nla = nla_nest_start(skb, OVS_KEY_ATTR_TUNNEL); - if (!nla) - return -EMSGSIZE; - - if (output->tun_flags & TUNNEL_KEY && - nla_put_be64(skb, OVS_TUNNEL_KEY_ATTR_ID, output->tun_id)) - return -EMSGSIZE; - if (output->ipv4_src && - nla_put_be32(skb, OVS_TUNNEL_KEY_ATTR_IPV4_SRC, output->ipv4_src)) - return -EMSGSIZE; - if (output->ipv4_dst && - nla_put_be32(skb, OVS_TUNNEL_KEY_ATTR_IPV4_DST, output->ipv4_dst)) - return -EMSGSIZE; - if (output->ipv4_tos && - nla_put_u8(skb, OVS_TUNNEL_KEY_ATTR_TOS, output->ipv4_tos)) - return -EMSGSIZE; - if (nla_put_u8(skb, OVS_TUNNEL_KEY_ATTR_TTL, output->ipv4_ttl)) - return -EMSGSIZE; - if ((output->tun_flags & TUNNEL_DONT_FRAGMENT) && - nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT)) - return -EMSGSIZE; - if ((output->tun_flags & TUNNEL_CSUM) && - nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_CSUM)) - return -EMSGSIZE; - - nla_nest_end(skb, nla); - return 0; -} - -static int metadata_from_nlattrs(struct sw_flow_match *match, u64 *attrs, - const struct nlattr **a, bool is_mask) -{ - if (*attrs & (1 << OVS_KEY_ATTR_PRIORITY)) { - SW_FLOW_KEY_PUT(match, phy.priority, - nla_get_u32(a[OVS_KEY_ATTR_PRIORITY]), is_mask); - *attrs &= ~(1 << OVS_KEY_ATTR_PRIORITY); - } - - if (*attrs & (1 << OVS_KEY_ATTR_IN_PORT)) { - u32 in_port = nla_get_u32(a[OVS_KEY_ATTR_IN_PORT]); - - if (is_mask) - in_port = 0xffffffff; /* Always exact match in_port. */ - else if (in_port >= DP_MAX_PORTS) - return -EINVAL; - - SW_FLOW_KEY_PUT(match, phy.in_port, in_port, is_mask); - *attrs &= ~(1 << OVS_KEY_ATTR_IN_PORT); - } else if (!is_mask) { - SW_FLOW_KEY_PUT(match, phy.in_port, DP_MAX_PORTS, is_mask); - } - - if (*attrs & (1 << OVS_KEY_ATTR_SKB_MARK)) { - uint32_t mark = nla_get_u32(a[OVS_KEY_ATTR_SKB_MARK]); - - SW_FLOW_KEY_PUT(match, phy.skb_mark, mark, is_mask); - *attrs &= ~(1 << OVS_KEY_ATTR_SKB_MARK); - } - if (*attrs & (1 << OVS_KEY_ATTR_TUNNEL)) { - if (ovs_ipv4_tun_from_nlattr(a[OVS_KEY_ATTR_TUNNEL], match, - is_mask)) - return -EINVAL; - *attrs &= ~(1 << OVS_KEY_ATTR_TUNNEL); - } - return 0; -} - -static int ovs_key_from_nlattrs(struct sw_flow_match *match, u64 attrs, - const struct nlattr **a, bool is_mask) -{ - int err; - u64 orig_attrs = attrs; - - err = metadata_from_nlattrs(match, &attrs, a, is_mask); - if (err) - return err; - - if (attrs & (1 << OVS_KEY_ATTR_ETHERNET)) { - const struct ovs_key_ethernet *eth_key; - - eth_key = nla_data(a[OVS_KEY_ATTR_ETHERNET]); - SW_FLOW_KEY_MEMCPY(match, eth.src, - eth_key->eth_src, ETH_ALEN, is_mask); - SW_FLOW_KEY_MEMCPY(match, eth.dst, - eth_key->eth_dst, ETH_ALEN, is_mask); - attrs &= ~(1 << OVS_KEY_ATTR_ETHERNET); - } - - if (attrs & (1 << OVS_KEY_ATTR_VLAN)) { - __be16 tci; - - tci = nla_get_be16(a[OVS_KEY_ATTR_VLAN]); - if (!(tci & htons(VLAN_TAG_PRESENT))) { - if (is_mask) - OVS_NLERR("VLAN TCI mask does not have exact match for VLAN_TAG_PRESENT bit.\n"); - else - OVS_NLERR("VLAN TCI does not have VLAN_TAG_PRESENT bit set.\n"); - - return -EINVAL; - } - - SW_FLOW_KEY_PUT(match, eth.tci, tci, is_mask); - attrs &= ~(1 << OVS_KEY_ATTR_VLAN); - } else if (!is_mask) - SW_FLOW_KEY_PUT(match, eth.tci, htons(0xffff), true); - - if (attrs & (1 << OVS_KEY_ATTR_ETHERTYPE)) { - __be16 eth_type; - - eth_type = nla_get_be16(a[OVS_KEY_ATTR_ETHERTYPE]); - if (is_mask) { - /* Always exact match EtherType. */ - eth_type = htons(0xffff); - } else if (ntohs(eth_type) < ETH_P_802_3_MIN) { - OVS_NLERR("EtherType is less than minimum (type=%x, min=%x).\n", - ntohs(eth_type), ETH_P_802_3_MIN); - return -EINVAL; - } - - SW_FLOW_KEY_PUT(match, eth.type, eth_type, is_mask); - attrs &= ~(1 << OVS_KEY_ATTR_ETHERTYPE); - } else if (!is_mask) { - SW_FLOW_KEY_PUT(match, eth.type, htons(ETH_P_802_2), is_mask); - } - - if (attrs & (1 << OVS_KEY_ATTR_IPV4)) { - const struct ovs_key_ipv4 *ipv4_key; - - ipv4_key = nla_data(a[OVS_KEY_ATTR_IPV4]); - if (!is_mask && ipv4_key->ipv4_frag > OVS_FRAG_TYPE_MAX) { - OVS_NLERR("Unknown IPv4 fragment type (value=%d, max=%d).\n", - ipv4_key->ipv4_frag, OVS_FRAG_TYPE_MAX); - return -EINVAL; - } - SW_FLOW_KEY_PUT(match, ip.proto, - ipv4_key->ipv4_proto, is_mask); - SW_FLOW_KEY_PUT(match, ip.tos, - ipv4_key->ipv4_tos, is_mask); - SW_FLOW_KEY_PUT(match, ip.ttl, - ipv4_key->ipv4_ttl, is_mask); - SW_FLOW_KEY_PUT(match, ip.frag, - ipv4_key->ipv4_frag, is_mask); - SW_FLOW_KEY_PUT(match, ipv4.addr.src, - ipv4_key->ipv4_src, is_mask); - SW_FLOW_KEY_PUT(match, ipv4.addr.dst, - ipv4_key->ipv4_dst, is_mask); - attrs &= ~(1 << OVS_KEY_ATTR_IPV4); - } - - if (attrs & (1 << OVS_KEY_ATTR_IPV6)) { - const struct ovs_key_ipv6 *ipv6_key; - - ipv6_key = nla_data(a[OVS_KEY_ATTR_IPV6]); - if (!is_mask && ipv6_key->ipv6_frag > OVS_FRAG_TYPE_MAX) { - OVS_NLERR("Unknown IPv6 fragment type (value=%d, max=%d).\n", - ipv6_key->ipv6_frag, OVS_FRAG_TYPE_MAX); - return -EINVAL; - } - SW_FLOW_KEY_PUT(match, ipv6.label, - ipv6_key->ipv6_label, is_mask); - SW_FLOW_KEY_PUT(match, ip.proto, - ipv6_key->ipv6_proto, is_mask); - SW_FLOW_KEY_PUT(match, ip.tos, - ipv6_key->ipv6_tclass, is_mask); - SW_FLOW_KEY_PUT(match, ip.ttl, - ipv6_key->ipv6_hlimit, is_mask); - SW_FLOW_KEY_PUT(match, ip.frag, - ipv6_key->ipv6_frag, is_mask); - SW_FLOW_KEY_MEMCPY(match, ipv6.addr.src, - ipv6_key->ipv6_src, - sizeof(match->key->ipv6.addr.src), - is_mask); - SW_FLOW_KEY_MEMCPY(match, ipv6.addr.dst, - ipv6_key->ipv6_dst, - sizeof(match->key->ipv6.addr.dst), - is_mask); - - attrs &= ~(1 << OVS_KEY_ATTR_IPV6); - } - - if (attrs & (1 << OVS_KEY_ATTR_ARP)) { - const struct ovs_key_arp *arp_key; - - arp_key = nla_data(a[OVS_KEY_ATTR_ARP]); - if (!is_mask && (arp_key->arp_op & htons(0xff00))) { - OVS_NLERR("Unknown ARP opcode (opcode=%d).\n", - arp_key->arp_op); - return -EINVAL; - } - - SW_FLOW_KEY_PUT(match, ipv4.addr.src, - arp_key->arp_sip, is_mask); - SW_FLOW_KEY_PUT(match, ipv4.addr.dst, - arp_key->arp_tip, is_mask); - SW_FLOW_KEY_PUT(match, ip.proto, - ntohs(arp_key->arp_op), is_mask); - SW_FLOW_KEY_MEMCPY(match, ipv4.arp.sha, - arp_key->arp_sha, ETH_ALEN, is_mask); - SW_FLOW_KEY_MEMCPY(match, ipv4.arp.tha, - arp_key->arp_tha, ETH_ALEN, is_mask); - - attrs &= ~(1 << OVS_KEY_ATTR_ARP); - } - - if (attrs & (1 << OVS_KEY_ATTR_TCP)) { - const struct ovs_key_tcp *tcp_key; - - tcp_key = nla_data(a[OVS_KEY_ATTR_TCP]); - if (orig_attrs & (1 << OVS_KEY_ATTR_IPV4)) { - SW_FLOW_KEY_PUT(match, ipv4.tp.src, - tcp_key->tcp_src, is_mask); - SW_FLOW_KEY_PUT(match, ipv4.tp.dst, - tcp_key->tcp_dst, is_mask); - } else { - SW_FLOW_KEY_PUT(match, ipv6.tp.src, - tcp_key->tcp_src, is_mask); - SW_FLOW_KEY_PUT(match, ipv6.tp.dst, - tcp_key->tcp_dst, is_mask); - } - attrs &= ~(1 << OVS_KEY_ATTR_TCP); - } - - if (attrs & (1 << OVS_KEY_ATTR_UDP)) { - const struct ovs_key_udp *udp_key; - - udp_key = nla_data(a[OVS_KEY_ATTR_UDP]); - if (orig_attrs & (1 << OVS_KEY_ATTR_IPV4)) { - SW_FLOW_KEY_PUT(match, ipv4.tp.src, - udp_key->udp_src, is_mask); - SW_FLOW_KEY_PUT(match, ipv4.tp.dst, - udp_key->udp_dst, is_mask); - } else { - SW_FLOW_KEY_PUT(match, ipv6.tp.src, - udp_key->udp_src, is_mask); - SW_FLOW_KEY_PUT(match, ipv6.tp.dst, - udp_key->udp_dst, is_mask); - } - attrs &= ~(1 << OVS_KEY_ATTR_UDP); - } - - if (attrs & (1 << OVS_KEY_ATTR_SCTP)) { - const struct ovs_key_sctp *sctp_key; - - sctp_key = nla_data(a[OVS_KEY_ATTR_SCTP]); - if (orig_attrs & (1 << OVS_KEY_ATTR_IPV4)) { - SW_FLOW_KEY_PUT(match, ipv4.tp.src, - sctp_key->sctp_src, is_mask); - SW_FLOW_KEY_PUT(match, ipv4.tp.dst, - sctp_key->sctp_dst, is_mask); - } else { - SW_FLOW_KEY_PUT(match, ipv6.tp.src, - sctp_key->sctp_src, is_mask); - SW_FLOW_KEY_PUT(match, ipv6.tp.dst, - sctp_key->sctp_dst, is_mask); - } - attrs &= ~(1 << OVS_KEY_ATTR_SCTP); - } - - if (attrs & (1 << OVS_KEY_ATTR_ICMP)) { - const struct ovs_key_icmp *icmp_key; - - icmp_key = nla_data(a[OVS_KEY_ATTR_ICMP]); - SW_FLOW_KEY_PUT(match, ipv4.tp.src, - htons(icmp_key->icmp_type), is_mask); - SW_FLOW_KEY_PUT(match, ipv4.tp.dst, - htons(icmp_key->icmp_code), is_mask); - attrs &= ~(1 << OVS_KEY_ATTR_ICMP); - } - - if (attrs & (1 << OVS_KEY_ATTR_ICMPV6)) { - const struct ovs_key_icmpv6 *icmpv6_key; - - icmpv6_key = nla_data(a[OVS_KEY_ATTR_ICMPV6]); - SW_FLOW_KEY_PUT(match, ipv6.tp.src, - htons(icmpv6_key->icmpv6_type), is_mask); - SW_FLOW_KEY_PUT(match, ipv6.tp.dst, - htons(icmpv6_key->icmpv6_code), is_mask); - attrs &= ~(1 << OVS_KEY_ATTR_ICMPV6); - } - - if (attrs & (1 << OVS_KEY_ATTR_ND)) { - const struct ovs_key_nd *nd_key; - - nd_key = nla_data(a[OVS_KEY_ATTR_ND]); - SW_FLOW_KEY_MEMCPY(match, ipv6.nd.target, - nd_key->nd_target, - sizeof(match->key->ipv6.nd.target), - is_mask); - SW_FLOW_KEY_MEMCPY(match, ipv6.nd.sll, - nd_key->nd_sll, ETH_ALEN, is_mask); - SW_FLOW_KEY_MEMCPY(match, ipv6.nd.tll, - nd_key->nd_tll, ETH_ALEN, is_mask); - attrs &= ~(1 << OVS_KEY_ATTR_ND); - } - - if (attrs != 0) - return -EINVAL; - - return 0; -} - -/** - * ovs_match_from_nlattrs - parses Netlink attributes into a flow key and - * mask. In case the 'mask' is NULL, the flow is treated as exact match - * flow. Otherwise, it is treated as a wildcarded flow, except the mask - * does not include any don't care bit. - * @match: receives the extracted flow match information. - * @key: Netlink attribute holding nested %OVS_KEY_ATTR_* Netlink attribute - * sequence. The fields should of the packet that triggered the creation - * of this flow. - * @mask: Optional. Netlink attribute holding nested %OVS_KEY_ATTR_* Netlink - * attribute specifies the mask field of the wildcarded flow. - */ -int ovs_match_from_nlattrs(struct sw_flow_match *match, - const struct nlattr *key, - const struct nlattr *mask) -{ - const struct nlattr *a[OVS_KEY_ATTR_MAX + 1]; - const struct nlattr *encap; - u64 key_attrs = 0; - u64 mask_attrs = 0; - bool encap_valid = false; - int err; - - err = parse_flow_nlattrs(key, a, &key_attrs); - if (err) - return err; - - if ((key_attrs & (1 << OVS_KEY_ATTR_ETHERNET)) && - (key_attrs & (1 << OVS_KEY_ATTR_ETHERTYPE)) && - (nla_get_be16(a[OVS_KEY_ATTR_ETHERTYPE]) == htons(ETH_P_8021Q))) { - __be16 tci; - - if (!((key_attrs & (1 << OVS_KEY_ATTR_VLAN)) && - (key_attrs & (1 << OVS_KEY_ATTR_ENCAP)))) { - OVS_NLERR("Invalid Vlan frame.\n"); - return -EINVAL; - } - - key_attrs &= ~(1 << OVS_KEY_ATTR_ETHERTYPE); - tci = nla_get_be16(a[OVS_KEY_ATTR_VLAN]); - encap = a[OVS_KEY_ATTR_ENCAP]; - key_attrs &= ~(1 << OVS_KEY_ATTR_ENCAP); - encap_valid = true; - - if (tci & htons(VLAN_TAG_PRESENT)) { - err = parse_flow_nlattrs(encap, a, &key_attrs); - if (err) - return err; - } else if (!tci) { - /* Corner case for truncated 802.1Q header. */ - if (nla_len(encap)) { - OVS_NLERR("Truncated 802.1Q header has non-zero encap attribute.\n"); - return -EINVAL; - } - } else { - OVS_NLERR("Encap attribute is set for a non-VLAN frame.\n"); - return -EINVAL; - } - } - - err = ovs_key_from_nlattrs(match, key_attrs, a, false); - if (err) - return err; - - if (mask) { - err = parse_flow_mask_nlattrs(mask, a, &mask_attrs); - if (err) - return err; - - if (mask_attrs & 1ULL << OVS_KEY_ATTR_ENCAP) { - __be16 eth_type = 0; - __be16 tci = 0; - - if (!encap_valid) { - OVS_NLERR("Encap mask attribute is set for non-VLAN frame.\n"); - return -EINVAL; - } - - mask_attrs &= ~(1 << OVS_KEY_ATTR_ENCAP); - if (a[OVS_KEY_ATTR_ETHERTYPE]) - eth_type = nla_get_be16(a[OVS_KEY_ATTR_ETHERTYPE]); - - if (eth_type == htons(0xffff)) { - mask_attrs &= ~(1 << OVS_KEY_ATTR_ETHERTYPE); - encap = a[OVS_KEY_ATTR_ENCAP]; - err = parse_flow_mask_nlattrs(encap, a, &mask_attrs); - } else { - OVS_NLERR("VLAN frames must have an exact match on the TPID (mask=%x).\n", - ntohs(eth_type)); - return -EINVAL; - } - - if (a[OVS_KEY_ATTR_VLAN]) - tci = nla_get_be16(a[OVS_KEY_ATTR_VLAN]); - - if (!(tci & htons(VLAN_TAG_PRESENT))) { - OVS_NLERR("VLAN tag present bit must have an exact match (tci_mask=%x).\n", ntohs(tci)); - return -EINVAL; - } - } - - err = ovs_key_from_nlattrs(match, mask_attrs, a, true); - if (err) - return err; - } else { - /* Populate exact match flow's key mask. */ - if (match->mask) - ovs_sw_flow_mask_set(match->mask, &match->range, 0xff); - } - - if (!ovs_match_validate(match, key_attrs, mask_attrs)) - return -EINVAL; - - return 0; -} - -/** - * ovs_flow_metadata_from_nlattrs - parses Netlink attributes into a flow key. - * @flow: Receives extracted in_port, priority, tun_key and skb_mark. - * @attr: Netlink attribute holding nested %OVS_KEY_ATTR_* Netlink attribute - * sequence. - * - * This parses a series of Netlink attributes that form a flow key, which must - * take the same form accepted by flow_from_nlattrs(), but only enough of it to - * get the metadata, that is, the parts of the flow key that cannot be - * extracted from the packet itself. - */ - -int ovs_flow_metadata_from_nlattrs(struct sw_flow *flow, - const struct nlattr *attr) -{ - struct ovs_key_ipv4_tunnel *tun_key = &flow->key.tun_key; - const struct nlattr *a[OVS_KEY_ATTR_MAX + 1]; - u64 attrs = 0; - int err; - struct sw_flow_match match; - - flow->key.phy.in_port = DP_MAX_PORTS; - flow->key.phy.priority = 0; - flow->key.phy.skb_mark = 0; - memset(tun_key, 0, sizeof(flow->key.tun_key)); - - err = parse_flow_nlattrs(attr, a, &attrs); - if (err) - return -EINVAL; - - memset(&match, 0, sizeof(match)); - match.key = &flow->key; - - err = metadata_from_nlattrs(&match, &attrs, a, false); - if (err) - return err; - - return 0; -} - -int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, - const struct sw_flow_key *output, struct sk_buff *skb) -{ - struct ovs_key_ethernet *eth_key; - struct nlattr *nla, *encap; - bool is_mask = (swkey != output); - - if (nla_put_u32(skb, OVS_KEY_ATTR_PRIORITY, output->phy.priority)) - goto nla_put_failure; - - if ((swkey->tun_key.ipv4_dst || is_mask) && - ovs_ipv4_tun_to_nlattr(skb, &swkey->tun_key, &output->tun_key)) - goto nla_put_failure; - - if (swkey->phy.in_port == DP_MAX_PORTS) { - if (is_mask && (output->phy.in_port == 0xffff)) - if (nla_put_u32(skb, OVS_KEY_ATTR_IN_PORT, 0xffffffff)) - goto nla_put_failure; - } else { - u16 upper_u16; - upper_u16 = !is_mask ? 0 : 0xffff; - - if (nla_put_u32(skb, OVS_KEY_ATTR_IN_PORT, - (upper_u16 << 16) | output->phy.in_port)) - goto nla_put_failure; - } - - if (nla_put_u32(skb, OVS_KEY_ATTR_SKB_MARK, output->phy.skb_mark)) - goto nla_put_failure; - - nla = nla_reserve(skb, OVS_KEY_ATTR_ETHERNET, sizeof(*eth_key)); - if (!nla) - goto nla_put_failure; - - eth_key = nla_data(nla); - memcpy(eth_key->eth_src, output->eth.src, ETH_ALEN); - memcpy(eth_key->eth_dst, output->eth.dst, ETH_ALEN); - - if (swkey->eth.tci || swkey->eth.type == htons(ETH_P_8021Q)) { - __be16 eth_type; - eth_type = !is_mask ? htons(ETH_P_8021Q) : htons(0xffff); - if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE, eth_type) || - nla_put_be16(skb, OVS_KEY_ATTR_VLAN, output->eth.tci)) - goto nla_put_failure; - encap = nla_nest_start(skb, OVS_KEY_ATTR_ENCAP); - if (!swkey->eth.tci) - goto unencap; - } else - encap = NULL; - - if (swkey->eth.type == htons(ETH_P_802_2)) { - /* - * Ethertype 802.2 is represented in the netlink with omitted - * OVS_KEY_ATTR_ETHERTYPE in the flow key attribute, and - * 0xffff in the mask attribute. Ethertype can also - * be wildcarded. - */ - if (is_mask && output->eth.type) - if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE, - output->eth.type)) - goto nla_put_failure; - goto unencap; - } - - if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE, output->eth.type)) - goto nla_put_failure; - - if (swkey->eth.type == htons(ETH_P_IP)) { - struct ovs_key_ipv4 *ipv4_key; - - nla = nla_reserve(skb, OVS_KEY_ATTR_IPV4, sizeof(*ipv4_key)); - if (!nla) - goto nla_put_failure; - ipv4_key = nla_data(nla); - ipv4_key->ipv4_src = output->ipv4.addr.src; - ipv4_key->ipv4_dst = output->ipv4.addr.dst; - ipv4_key->ipv4_proto = output->ip.proto; - ipv4_key->ipv4_tos = output->ip.tos; - ipv4_key->ipv4_ttl = output->ip.ttl; - ipv4_key->ipv4_frag = output->ip.frag; - } else if (swkey->eth.type == htons(ETH_P_IPV6)) { - struct ovs_key_ipv6 *ipv6_key; - - nla = nla_reserve(skb, OVS_KEY_ATTR_IPV6, sizeof(*ipv6_key)); - if (!nla) - goto nla_put_failure; - ipv6_key = nla_data(nla); - memcpy(ipv6_key->ipv6_src, &output->ipv6.addr.src, - sizeof(ipv6_key->ipv6_src)); - memcpy(ipv6_key->ipv6_dst, &output->ipv6.addr.dst, - sizeof(ipv6_key->ipv6_dst)); - ipv6_key->ipv6_label = output->ipv6.label; - ipv6_key->ipv6_proto = output->ip.proto; - ipv6_key->ipv6_tclass = output->ip.tos; - ipv6_key->ipv6_hlimit = output->ip.ttl; - ipv6_key->ipv6_frag = output->ip.frag; - } else if (swkey->eth.type == htons(ETH_P_ARP) || - swkey->eth.type == htons(ETH_P_RARP)) { - struct ovs_key_arp *arp_key; - - nla = nla_reserve(skb, OVS_KEY_ATTR_ARP, sizeof(*arp_key)); - if (!nla) - goto nla_put_failure; - arp_key = nla_data(nla); - memset(arp_key, 0, sizeof(struct ovs_key_arp)); - arp_key->arp_sip = output->ipv4.addr.src; - arp_key->arp_tip = output->ipv4.addr.dst; - arp_key->arp_op = htons(output->ip.proto); - memcpy(arp_key->arp_sha, output->ipv4.arp.sha, ETH_ALEN); - memcpy(arp_key->arp_tha, output->ipv4.arp.tha, ETH_ALEN); - } - - if ((swkey->eth.type == htons(ETH_P_IP) || - swkey->eth.type == htons(ETH_P_IPV6)) && - swkey->ip.frag != OVS_FRAG_TYPE_LATER) { - - if (swkey->ip.proto == IPPROTO_TCP) { - struct ovs_key_tcp *tcp_key; - - nla = nla_reserve(skb, OVS_KEY_ATTR_TCP, sizeof(*tcp_key)); - if (!nla) - goto nla_put_failure; - tcp_key = nla_data(nla); - if (swkey->eth.type == htons(ETH_P_IP)) { - tcp_key->tcp_src = output->ipv4.tp.src; - tcp_key->tcp_dst = output->ipv4.tp.dst; - } else if (swkey->eth.type == htons(ETH_P_IPV6)) { - tcp_key->tcp_src = output->ipv6.tp.src; - tcp_key->tcp_dst = output->ipv6.tp.dst; - } - } else if (swkey->ip.proto == IPPROTO_UDP) { - struct ovs_key_udp *udp_key; - - nla = nla_reserve(skb, OVS_KEY_ATTR_UDP, sizeof(*udp_key)); - if (!nla) - goto nla_put_failure; - udp_key = nla_data(nla); - if (swkey->eth.type == htons(ETH_P_IP)) { - udp_key->udp_src = output->ipv4.tp.src; - udp_key->udp_dst = output->ipv4.tp.dst; - } else if (swkey->eth.type == htons(ETH_P_IPV6)) { - udp_key->udp_src = output->ipv6.tp.src; - udp_key->udp_dst = output->ipv6.tp.dst; - } - } else if (swkey->ip.proto == IPPROTO_SCTP) { - struct ovs_key_sctp *sctp_key; - - nla = nla_reserve(skb, OVS_KEY_ATTR_SCTP, sizeof(*sctp_key)); - if (!nla) - goto nla_put_failure; - sctp_key = nla_data(nla); - if (swkey->eth.type == htons(ETH_P_IP)) { - sctp_key->sctp_src = swkey->ipv4.tp.src; - sctp_key->sctp_dst = swkey->ipv4.tp.dst; - } else if (swkey->eth.type == htons(ETH_P_IPV6)) { - sctp_key->sctp_src = swkey->ipv6.tp.src; - sctp_key->sctp_dst = swkey->ipv6.tp.dst; - } - } else if (swkey->eth.type == htons(ETH_P_IP) && - swkey->ip.proto == IPPROTO_ICMP) { - struct ovs_key_icmp *icmp_key; - - nla = nla_reserve(skb, OVS_KEY_ATTR_ICMP, sizeof(*icmp_key)); - if (!nla) - goto nla_put_failure; - icmp_key = nla_data(nla); - icmp_key->icmp_type = ntohs(output->ipv4.tp.src); - icmp_key->icmp_code = ntohs(output->ipv4.tp.dst); - } else if (swkey->eth.type == htons(ETH_P_IPV6) && - swkey->ip.proto == IPPROTO_ICMPV6) { - struct ovs_key_icmpv6 *icmpv6_key; - - nla = nla_reserve(skb, OVS_KEY_ATTR_ICMPV6, - sizeof(*icmpv6_key)); - if (!nla) - goto nla_put_failure; - icmpv6_key = nla_data(nla); - icmpv6_key->icmpv6_type = ntohs(output->ipv6.tp.src); - icmpv6_key->icmpv6_code = ntohs(output->ipv6.tp.dst); - - if (icmpv6_key->icmpv6_type == NDISC_NEIGHBOUR_SOLICITATION || - icmpv6_key->icmpv6_type == NDISC_NEIGHBOUR_ADVERTISEMENT) { - struct ovs_key_nd *nd_key; - - nla = nla_reserve(skb, OVS_KEY_ATTR_ND, sizeof(*nd_key)); - if (!nla) - goto nla_put_failure; - nd_key = nla_data(nla); - memcpy(nd_key->nd_target, &output->ipv6.nd.target, - sizeof(nd_key->nd_target)); - memcpy(nd_key->nd_sll, output->ipv6.nd.sll, ETH_ALEN); - memcpy(nd_key->nd_tll, output->ipv6.nd.tll, ETH_ALEN); - } - } - } - -unencap: - if (encap) - nla_nest_end(skb, encap); - - return 0; - -nla_put_failure: - return -EMSGSIZE; -} - -/* Initializes the flow module. - * Returns zero if successful or a negative error code. */ -int ovs_flow_init(void) -{ - BUILD_BUG_ON(__alignof__(struct sw_flow_key) % __alignof__(long)); - BUILD_BUG_ON(sizeof(struct sw_flow_key) % sizeof(long)); - - flow_cache = kmem_cache_create("sw_flow", sizeof(struct sw_flow), 0, - 0, NULL); - if (flow_cache == NULL) - return -ENOMEM; - - return 0; -} - -/* Uninitializes the flow module. */ -void ovs_flow_exit(void) -{ - kmem_cache_destroy(flow_cache); -} - -struct sw_flow_mask *ovs_sw_flow_mask_alloc(void) -{ - struct sw_flow_mask *mask; - - mask = kmalloc(sizeof(*mask), GFP_KERNEL); - if (mask) - mask->ref_count = 0; - - return mask; -} - -void ovs_sw_flow_mask_add_ref(struct sw_flow_mask *mask) -{ - mask->ref_count++; -} - -void ovs_sw_flow_mask_del_ref(struct sw_flow_mask *mask, bool deferred) -{ - if (!mask) - return; - - BUG_ON(!mask->ref_count); - mask->ref_count--; - - if (!mask->ref_count) { - list_del_rcu(&mask->list); - if (deferred) - kfree_rcu(mask, rcu); - else - kfree(mask); - } -} - -static bool ovs_sw_flow_mask_equal(const struct sw_flow_mask *a, - const struct sw_flow_mask *b) -{ - u8 *a_ = (u8 *)&a->key + a->range.start; - u8 *b_ = (u8 *)&b->key + b->range.start; - - return (a->range.end == b->range.end) - && (a->range.start == b->range.start) - && (memcmp(a_, b_, range_n_bytes(&a->range)) == 0); -} - -struct sw_flow_mask *ovs_sw_flow_mask_find(const struct flow_table *tbl, - const struct sw_flow_mask *mask) -{ - struct list_head *ml; - - list_for_each(ml, tbl->mask_list) { - struct sw_flow_mask *m; - m = container_of(ml, struct sw_flow_mask, list); - if (ovs_sw_flow_mask_equal(mask, m)) - return m; - } - - return NULL; -} - -/** - * add a new mask into the mask list. - * The caller needs to make sure that 'mask' is not the same - * as any masks that are already on the list. - */ -void ovs_sw_flow_mask_insert(struct flow_table *tbl, struct sw_flow_mask *mask) -{ - list_add_rcu(&mask->list, tbl->mask_list); -} - -/** - * Set 'range' fields in the mask to the value of 'val'. - */ -static void ovs_sw_flow_mask_set(struct sw_flow_mask *mask, - struct sw_flow_key_range *range, u8 val) -{ - u8 *m = (u8 *)&mask->key + range->start; - - mask->range = *range; - memset(m, val, range_n_bytes(range)); -} diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h index 212fbf7510c4..1510f51dbf74 100644 --- a/net/openvswitch/flow.h +++ b/net/openvswitch/flow.h @@ -33,14 +33,6 @@ #include <net/inet_ecn.h> struct sk_buff; -struct sw_flow_mask; -struct flow_table; - -struct sw_flow_actions { - struct rcu_head rcu; - u32 actions_len; - struct nlattr actions[]; -}; /* Used to memset ovs_key_ipv4_tunnel padding. */ #define OVS_TUNNEL_KEY_SIZE \ @@ -101,6 +93,7 @@ struct sw_flow_key { struct { __be16 src; /* TCP/UDP/SCTP source port. */ __be16 dst; /* TCP/UDP/SCTP destination port. */ + __be16 flags; /* TCP flags. */ } tp; struct { u8 sha[ETH_ALEN]; /* ARP source hardware address. */ @@ -117,6 +110,7 @@ struct sw_flow_key { struct { __be16 src; /* TCP/UDP/SCTP source port. */ __be16 dst; /* TCP/UDP/SCTP destination port. */ + __be16 flags; /* TCP flags. */ } tp; struct { struct in6_addr target; /* ND target address. */ @@ -127,6 +121,31 @@ struct sw_flow_key { }; } __aligned(BITS_PER_LONG/8); /* Ensure that we can do comparisons as longs. */ +struct sw_flow_key_range { + size_t start; + size_t end; +}; + +struct sw_flow_mask { + int ref_count; + struct rcu_head rcu; + struct list_head list; + struct sw_flow_key_range range; + struct sw_flow_key key; +}; + +struct sw_flow_match { + struct sw_flow_key *key; + struct sw_flow_key_range range; + struct sw_flow_mask *mask; +}; + +struct sw_flow_actions { + struct rcu_head rcu; + u32 actions_len; + struct nlattr actions[]; +}; + struct sw_flow { struct rcu_head rcu; struct hlist_node hash_node[2]; @@ -141,23 +160,9 @@ struct sw_flow { unsigned long used; /* Last used time (in jiffies). */ u64 packet_count; /* Number of packets matched. */ u64 byte_count; /* Number of bytes matched. */ - u8 tcp_flags; /* Union of seen TCP flags. */ -}; - -struct sw_flow_key_range { - size_t start; - size_t end; + __be16 tcp_flags; /* Union of seen TCP flags. */ }; -struct sw_flow_match { - struct sw_flow_key *key; - struct sw_flow_key_range range; - struct sw_flow_mask *mask; -}; - -void ovs_match_init(struct sw_flow_match *match, - struct sw_flow_key *key, struct sw_flow_mask *mask); - struct arp_eth_header { __be16 ar_hrd; /* format of hardware address */ __be16 ar_pro; /* format of protocol address */ @@ -172,88 +177,9 @@ struct arp_eth_header { unsigned char ar_tip[4]; /* target IP address */ } __packed; -int ovs_flow_init(void); -void ovs_flow_exit(void); - -struct sw_flow *ovs_flow_alloc(void); -void ovs_flow_deferred_free(struct sw_flow *); -void ovs_flow_free(struct sw_flow *, bool deferred); - -struct sw_flow_actions *ovs_flow_actions_alloc(int actions_len); -void ovs_flow_deferred_free_acts(struct sw_flow_actions *); - -int ovs_flow_extract(struct sk_buff *, u16 in_port, struct sw_flow_key *); void ovs_flow_used(struct sw_flow *, struct sk_buff *); u64 ovs_flow_used_time(unsigned long flow_jiffies); -int ovs_flow_to_nlattrs(const struct sw_flow_key *, - const struct sw_flow_key *, struct sk_buff *); -int ovs_match_from_nlattrs(struct sw_flow_match *match, - const struct nlattr *, - const struct nlattr *); -int ovs_flow_metadata_from_nlattrs(struct sw_flow *flow, - const struct nlattr *attr); -#define MAX_ACTIONS_BUFSIZE (32 * 1024) -#define TBL_MIN_BUCKETS 1024 - -struct flow_table { - struct flex_array *buckets; - unsigned int count, n_buckets; - struct rcu_head rcu; - struct list_head *mask_list; - int node_ver; - u32 hash_seed; - bool keep_flows; -}; - -static inline int ovs_flow_tbl_count(struct flow_table *table) -{ - return table->count; -} - -static inline int ovs_flow_tbl_need_to_expand(struct flow_table *table) -{ - return (table->count > table->n_buckets); -} - -struct sw_flow *ovs_flow_lookup(struct flow_table *, - const struct sw_flow_key *); -struct sw_flow *ovs_flow_lookup_unmasked_key(struct flow_table *table, - struct sw_flow_match *match); - -void ovs_flow_tbl_destroy(struct flow_table *table, bool deferred); -struct flow_table *ovs_flow_tbl_alloc(int new_size); -struct flow_table *ovs_flow_tbl_expand(struct flow_table *table); -struct flow_table *ovs_flow_tbl_rehash(struct flow_table *table); - -void ovs_flow_insert(struct flow_table *table, struct sw_flow *flow); -void ovs_flow_remove(struct flow_table *table, struct sw_flow *flow); - -struct sw_flow *ovs_flow_dump_next(struct flow_table *table, u32 *bucket, u32 *idx); -extern const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1]; -int ovs_ipv4_tun_from_nlattr(const struct nlattr *attr, - struct sw_flow_match *match, bool is_mask); -int ovs_ipv4_tun_to_nlattr(struct sk_buff *skb, - const struct ovs_key_ipv4_tunnel *tun_key, - const struct ovs_key_ipv4_tunnel *output); - -bool ovs_flow_cmp_unmasked_key(const struct sw_flow *flow, - const struct sw_flow_key *key, int key_end); - -struct sw_flow_mask { - int ref_count; - struct rcu_head rcu; - struct list_head list; - struct sw_flow_key_range range; - struct sw_flow_key key; -}; +int ovs_flow_extract(struct sk_buff *, u16 in_port, struct sw_flow_key *); -struct sw_flow_mask *ovs_sw_flow_mask_alloc(void); -void ovs_sw_flow_mask_add_ref(struct sw_flow_mask *); -void ovs_sw_flow_mask_del_ref(struct sw_flow_mask *, bool deferred); -void ovs_sw_flow_mask_insert(struct flow_table *, struct sw_flow_mask *); -struct sw_flow_mask *ovs_sw_flow_mask_find(const struct flow_table *, - const struct sw_flow_mask *); -void ovs_flow_key_mask(struct sw_flow_key *dst, const struct sw_flow_key *src, - const struct sw_flow_mask *mask); #endif /* flow.h */ diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c new file mode 100644 index 000000000000..2bc1bc1aca3b --- /dev/null +++ b/net/openvswitch/flow_netlink.c @@ -0,0 +1,1630 @@ +/* + * Copyright (c) 2007-2013 Nicira, Inc. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA + * 02110-1301, USA + */ + +#include "flow.h" +#include "datapath.h" +#include <linux/uaccess.h> +#include <linux/netdevice.h> +#include <linux/etherdevice.h> +#include <linux/if_ether.h> +#include <linux/if_vlan.h> +#include <net/llc_pdu.h> +#include <linux/kernel.h> +#include <linux/jhash.h> +#include <linux/jiffies.h> +#include <linux/llc.h> +#include <linux/module.h> +#include <linux/in.h> +#include <linux/rcupdate.h> +#include <linux/if_arp.h> +#include <linux/ip.h> +#include <linux/ipv6.h> +#include <linux/sctp.h> +#include <linux/tcp.h> +#include <linux/udp.h> +#include <linux/icmp.h> +#include <linux/icmpv6.h> +#include <linux/rculist.h> +#include <net/ip.h> +#include <net/ipv6.h> +#include <net/ndisc.h> + +#include "flow_netlink.h" + +static void update_range__(struct sw_flow_match *match, + size_t offset, size_t size, bool is_mask) +{ + struct sw_flow_key_range *range = NULL; + size_t start = rounddown(offset, sizeof(long)); + size_t end = roundup(offset + size, sizeof(long)); + + if (!is_mask) + range = &match->range; + else if (match->mask) + range = &match->mask->range; + + if (!range) + return; + + if (range->start == range->end) { + range->start = start; + range->end = end; + return; + } + + if (range->start > start) + range->start = start; + + if (range->end < end) + range->end = end; +} + +#define SW_FLOW_KEY_PUT(match, field, value, is_mask) \ + do { \ + update_range__(match, offsetof(struct sw_flow_key, field), \ + sizeof((match)->key->field), is_mask); \ + if (is_mask) { \ + if ((match)->mask) \ + (match)->mask->key.field = value; \ + } else { \ + (match)->key->field = value; \ + } \ + } while (0) + +#define SW_FLOW_KEY_MEMCPY(match, field, value_p, len, is_mask) \ + do { \ + update_range__(match, offsetof(struct sw_flow_key, field), \ + len, is_mask); \ + if (is_mask) { \ + if ((match)->mask) \ + memcpy(&(match)->mask->key.field, value_p, len);\ + } else { \ + memcpy(&(match)->key->field, value_p, len); \ + } \ + } while (0) + +static u16 range_n_bytes(const struct sw_flow_key_range *range) +{ + return range->end - range->start; +} + +static bool match_validate(const struct sw_flow_match *match, + u64 key_attrs, u64 mask_attrs) +{ + u64 key_expected = 1 << OVS_KEY_ATTR_ETHERNET; + u64 mask_allowed = key_attrs; /* At most allow all key attributes */ + + /* The following mask attributes allowed only if they + * pass the validation tests. */ + mask_allowed &= ~((1 << OVS_KEY_ATTR_IPV4) + | (1 << OVS_KEY_ATTR_IPV6) + | (1 << OVS_KEY_ATTR_TCP) + | (1 << OVS_KEY_ATTR_TCP_FLAGS) + | (1 << OVS_KEY_ATTR_UDP) + | (1 << OVS_KEY_ATTR_SCTP) + | (1 << OVS_KEY_ATTR_ICMP) + | (1 << OVS_KEY_ATTR_ICMPV6) + | (1 << OVS_KEY_ATTR_ARP) + | (1 << OVS_KEY_ATTR_ND)); + + /* Always allowed mask fields. */ + mask_allowed |= ((1 << OVS_KEY_ATTR_TUNNEL) + | (1 << OVS_KEY_ATTR_IN_PORT) + | (1 << OVS_KEY_ATTR_ETHERTYPE)); + + /* Check key attributes. */ + if (match->key->eth.type == htons(ETH_P_ARP) + || match->key->eth.type == htons(ETH_P_RARP)) { + key_expected |= 1 << OVS_KEY_ATTR_ARP; + if (match->mask && (match->mask->key.eth.type == htons(0xffff))) + mask_allowed |= 1 << OVS_KEY_ATTR_ARP; + } + + if (match->key->eth.type == htons(ETH_P_IP)) { + key_expected |= 1 << OVS_KEY_ATTR_IPV4; + if (match->mask && (match->mask->key.eth.type == htons(0xffff))) + mask_allowed |= 1 << OVS_KEY_ATTR_IPV4; + + if (match->key->ip.frag != OVS_FRAG_TYPE_LATER) { + if (match->key->ip.proto == IPPROTO_UDP) { + key_expected |= 1 << OVS_KEY_ATTR_UDP; + if (match->mask && (match->mask->key.ip.proto == 0xff)) + mask_allowed |= 1 << OVS_KEY_ATTR_UDP; + } + + if (match->key->ip.proto == IPPROTO_SCTP) { + key_expected |= 1 << OVS_KEY_ATTR_SCTP; + if (match->mask && (match->mask->key.ip.proto == 0xff)) + mask_allowed |= 1 << OVS_KEY_ATTR_SCTP; + } + + if (match->key->ip.proto == IPPROTO_TCP) { + key_expected |= 1 << OVS_KEY_ATTR_TCP; + key_expected |= 1 << OVS_KEY_ATTR_TCP_FLAGS; + if (match->mask && (match->mask->key.ip.proto == 0xff)) { + mask_allowed |= 1 << OVS_KEY_ATTR_TCP; + mask_allowed |= 1 << OVS_KEY_ATTR_TCP_FLAGS; + } + } + + if (match->key->ip.proto == IPPROTO_ICMP) { + key_expected |= 1 << OVS_KEY_ATTR_ICMP; + if (match->mask && (match->mask->key.ip.proto == 0xff)) + mask_allowed |= 1 << OVS_KEY_ATTR_ICMP; + } + } + } + + if (match->key->eth.type == htons(ETH_P_IPV6)) { + key_expected |= 1 << OVS_KEY_ATTR_IPV6; + if (match->mask && (match->mask->key.eth.type == htons(0xffff))) + mask_allowed |= 1 << OVS_KEY_ATTR_IPV6; + + if (match->key->ip.frag != OVS_FRAG_TYPE_LATER) { + if (match->key->ip.proto == IPPROTO_UDP) { + key_expected |= 1 << OVS_KEY_ATTR_UDP; + if (match->mask && (match->mask->key.ip.proto == 0xff)) + mask_allowed |= 1 << OVS_KEY_ATTR_UDP; + } + + if (match->key->ip.proto == IPPROTO_SCTP) { + key_expected |= 1 << OVS_KEY_ATTR_SCTP; + if (match->mask && (match->mask->key.ip.proto == 0xff)) + mask_allowed |= 1 << OVS_KEY_ATTR_SCTP; + } + + if (match->key->ip.proto == IPPROTO_TCP) { + key_expected |= 1 << OVS_KEY_ATTR_TCP; + key_expected |= 1 << OVS_KEY_ATTR_TCP_FLAGS; + if (match->mask && (match->mask->key.ip.proto == 0xff)) { + mask_allowed |= 1 << OVS_KEY_ATTR_TCP; + mask_allowed |= 1 << OVS_KEY_ATTR_TCP_FLAGS; + } + } + + if (match->key->ip.proto == IPPROTO_ICMPV6) { + key_expected |= 1 << OVS_KEY_ATTR_ICMPV6; + if (match->mask && (match->mask->key.ip.proto == 0xff)) + mask_allowed |= 1 << OVS_KEY_ATTR_ICMPV6; + + if (match->key->ipv6.tp.src == + htons(NDISC_NEIGHBOUR_SOLICITATION) || + match->key->ipv6.tp.src == htons(NDISC_NEIGHBOUR_ADVERTISEMENT)) { + key_expected |= 1 << OVS_KEY_ATTR_ND; + if (match->mask && (match->mask->key.ipv6.tp.src == htons(0xffff))) + mask_allowed |= 1 << OVS_KEY_ATTR_ND; + } + } + } + } + + if ((key_attrs & key_expected) != key_expected) { + /* Key attributes check failed. */ + OVS_NLERR("Missing expected key attributes (key_attrs=%llx, expected=%llx).\n", + key_attrs, key_expected); + return false; + } + + if ((mask_attrs & mask_allowed) != mask_attrs) { + /* Mask attributes check failed. */ + OVS_NLERR("Contain more than allowed mask fields (mask_attrs=%llx, mask_allowed=%llx).\n", + mask_attrs, mask_allowed); + return false; + } + + return true; +} + +/* The size of the argument for each %OVS_KEY_ATTR_* Netlink attribute. */ +static const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = { + [OVS_KEY_ATTR_ENCAP] = -1, + [OVS_KEY_ATTR_PRIORITY] = sizeof(u32), + [OVS_KEY_ATTR_IN_PORT] = sizeof(u32), + [OVS_KEY_ATTR_SKB_MARK] = sizeof(u32), + [OVS_KEY_ATTR_ETHERNET] = sizeof(struct ovs_key_ethernet), + [OVS_KEY_ATTR_VLAN] = sizeof(__be16), + [OVS_KEY_ATTR_ETHERTYPE] = sizeof(__be16), + [OVS_KEY_ATTR_IPV4] = sizeof(struct ovs_key_ipv4), + [OVS_KEY_ATTR_IPV6] = sizeof(struct ovs_key_ipv6), + [OVS_KEY_ATTR_TCP] = sizeof(struct ovs_key_tcp), + [OVS_KEY_ATTR_TCP_FLAGS] = sizeof(__be16), + [OVS_KEY_ATTR_UDP] = sizeof(struct ovs_key_udp), + [OVS_KEY_ATTR_SCTP] = sizeof(struct ovs_key_sctp), + [OVS_KEY_ATTR_ICMP] = sizeof(struct ovs_key_icmp), + [OVS_KEY_ATTR_ICMPV6] = sizeof(struct ovs_key_icmpv6), + [OVS_KEY_ATTR_ARP] = sizeof(struct ovs_key_arp), + [OVS_KEY_ATTR_ND] = sizeof(struct ovs_key_nd), + [OVS_KEY_ATTR_TUNNEL] = -1, +}; + +static bool is_all_zero(const u8 *fp, size_t size) +{ + int i; + + if (!fp) + return false; + + for (i = 0; i < size; i++) + if (fp[i]) + return false; + + return true; +} + +static int __parse_flow_nlattrs(const struct nlattr *attr, + const struct nlattr *a[], + u64 *attrsp, bool nz) +{ + const struct nlattr *nla; + u64 attrs; + int rem; + + attrs = *attrsp; + nla_for_each_nested(nla, attr, rem) { + u16 type = nla_type(nla); + int expected_len; + + if (type > OVS_KEY_ATTR_MAX) { + OVS_NLERR("Unknown key attribute (type=%d, max=%d).\n", + type, OVS_KEY_ATTR_MAX); + return -EINVAL; + } + + if (attrs & (1 << type)) { + OVS_NLERR("Duplicate key attribute (type %d).\n", type); + return -EINVAL; + } + + expected_len = ovs_key_lens[type]; + if (nla_len(nla) != expected_len && expected_len != -1) { + OVS_NLERR("Key attribute has unexpected length (type=%d" + ", length=%d, expected=%d).\n", type, + nla_len(nla), expected_len); + return -EINVAL; + } + + if (!nz || !is_all_zero(nla_data(nla), expected_len)) { + attrs |= 1 << type; + a[type] = nla; + } + } + if (rem) { + OVS_NLERR("Message has %d unknown bytes.\n", rem); + return -EINVAL; + } + + *attrsp = attrs; + return 0; +} + +static int parse_flow_mask_nlattrs(const struct nlattr *attr, + const struct nlattr *a[], u64 *attrsp) +{ + return __parse_flow_nlattrs(attr, a, attrsp, true); +} + +static int parse_flow_nlattrs(const struct nlattr *attr, + const struct nlattr *a[], u64 *attrsp) +{ + return __parse_flow_nlattrs(attr, a, attrsp, false); +} + +static int ipv4_tun_from_nlattr(const struct nlattr *attr, + struct sw_flow_match *match, bool is_mask) +{ + struct nlattr *a; + int rem; + bool ttl = false; + __be16 tun_flags = 0; + + nla_for_each_nested(a, attr, rem) { + int type = nla_type(a); + static const u32 ovs_tunnel_key_lens[OVS_TUNNEL_KEY_ATTR_MAX + 1] = { + [OVS_TUNNEL_KEY_ATTR_ID] = sizeof(u64), + [OVS_TUNNEL_KEY_ATTR_IPV4_SRC] = sizeof(u32), + [OVS_TUNNEL_KEY_ATTR_IPV4_DST] = sizeof(u32), + [OVS_TUNNEL_KEY_ATTR_TOS] = 1, + [OVS_TUNNEL_KEY_ATTR_TTL] = 1, + [OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT] = 0, + [OVS_TUNNEL_KEY_ATTR_CSUM] = 0, + }; + + if (type > OVS_TUNNEL_KEY_ATTR_MAX) { + OVS_NLERR("Unknown IPv4 tunnel attribute (type=%d, max=%d).\n", + type, OVS_TUNNEL_KEY_ATTR_MAX); + return -EINVAL; + } + + if (ovs_tunnel_key_lens[type] != nla_len(a)) { + OVS_NLERR("IPv4 tunnel attribute type has unexpected " + " length (type=%d, length=%d, expected=%d).\n", + type, nla_len(a), ovs_tunnel_key_lens[type]); + return -EINVAL; + } + + switch (type) { + case OVS_TUNNEL_KEY_ATTR_ID: + SW_FLOW_KEY_PUT(match, tun_key.tun_id, + nla_get_be64(a), is_mask); + tun_flags |= TUNNEL_KEY; + break; + case OVS_TUNNEL_KEY_ATTR_IPV4_SRC: + SW_FLOW_KEY_PUT(match, tun_key.ipv4_src, + nla_get_be32(a), is_mask); + break; + case OVS_TUNNEL_KEY_ATTR_IPV4_DST: + SW_FLOW_KEY_PUT(match, tun_key.ipv4_dst, + nla_get_be32(a), is_mask); + break; + case OVS_TUNNEL_KEY_ATTR_TOS: + SW_FLOW_KEY_PUT(match, tun_key.ipv4_tos, + nla_get_u8(a), is_mask); + break; + case OVS_TUNNEL_KEY_ATTR_TTL: + SW_FLOW_KEY_PUT(match, tun_key.ipv4_ttl, + nla_get_u8(a), is_mask); + ttl = true; + break; + case OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT: + tun_flags |= TUNNEL_DONT_FRAGMENT; + break; + case OVS_TUNNEL_KEY_ATTR_CSUM: + tun_flags |= TUNNEL_CSUM; + break; + default: + return -EINVAL; + } + } + + SW_FLOW_KEY_PUT(match, tun_key.tun_flags, tun_flags, is_mask); + + if (rem > 0) { + OVS_NLERR("IPv4 tunnel attribute has %d unknown bytes.\n", rem); + return -EINVAL; + } + + if (!is_mask) { + if (!match->key->tun_key.ipv4_dst) { + OVS_NLERR("IPv4 tunnel destination address is zero.\n"); + return -EINVAL; + } + + if (!ttl) { + OVS_NLERR("IPv4 tunnel TTL not specified.\n"); + return -EINVAL; + } + } + + return 0; +} + +static int ipv4_tun_to_nlattr(struct sk_buff *skb, + const struct ovs_key_ipv4_tunnel *tun_key, + const struct ovs_key_ipv4_tunnel *output) +{ + struct nlattr *nla; + + nla = nla_nest_start(skb, OVS_KEY_ATTR_TUNNEL); + if (!nla) + return -EMSGSIZE; + + if (output->tun_flags & TUNNEL_KEY && + nla_put_be64(skb, OVS_TUNNEL_KEY_ATTR_ID, output->tun_id)) + return -EMSGSIZE; + if (output->ipv4_src && + nla_put_be32(skb, OVS_TUNNEL_KEY_ATTR_IPV4_SRC, output->ipv4_src)) + return -EMSGSIZE; + if (output->ipv4_dst && + nla_put_be32(skb, OVS_TUNNEL_KEY_ATTR_IPV4_DST, output->ipv4_dst)) + return -EMSGSIZE; + if (output->ipv4_tos && + nla_put_u8(skb, OVS_TUNNEL_KEY_ATTR_TOS, output->ipv4_tos)) + return -EMSGSIZE; + if (nla_put_u8(skb, OVS_TUNNEL_KEY_ATTR_TTL, output->ipv4_ttl)) + return -EMSGSIZE; + if ((output->tun_flags & TUNNEL_DONT_FRAGMENT) && + nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT)) + return -EMSGSIZE; + if ((output->tun_flags & TUNNEL_CSUM) && + nla_put_flag(skb, OVS_TUNNEL_KEY_ATTR_CSUM)) + return -EMSGSIZE; + + nla_nest_end(skb, nla); + return 0; +} + + +static int metadata_from_nlattrs(struct sw_flow_match *match, u64 *attrs, + const struct nlattr **a, bool is_mask) +{ + if (*attrs & (1 << OVS_KEY_ATTR_PRIORITY)) { + SW_FLOW_KEY_PUT(match, phy.priority, + nla_get_u32(a[OVS_KEY_ATTR_PRIORITY]), is_mask); + *attrs &= ~(1 << OVS_KEY_ATTR_PRIORITY); + } + + if (*attrs & (1 << OVS_KEY_ATTR_IN_PORT)) { + u32 in_port = nla_get_u32(a[OVS_KEY_ATTR_IN_PORT]); + + if (is_mask) + in_port = 0xffffffff; /* Always exact match in_port. */ + else if (in_port >= DP_MAX_PORTS) + return -EINVAL; + + SW_FLOW_KEY_PUT(match, phy.in_port, in_port, is_mask); + *attrs &= ~(1 << OVS_KEY_ATTR_IN_PORT); + } else if (!is_mask) { + SW_FLOW_KEY_PUT(match, phy.in_port, DP_MAX_PORTS, is_mask); + } + + if (*attrs & (1 << OVS_KEY_ATTR_SKB_MARK)) { + uint32_t mark = nla_get_u32(a[OVS_KEY_ATTR_SKB_MARK]); + + SW_FLOW_KEY_PUT(match, phy.skb_mark, mark, is_mask); + *attrs &= ~(1 << OVS_KEY_ATTR_SKB_MARK); + } + if (*attrs & (1 << OVS_KEY_ATTR_TUNNEL)) { + if (ipv4_tun_from_nlattr(a[OVS_KEY_ATTR_TUNNEL], match, + is_mask)) + return -EINVAL; + *attrs &= ~(1 << OVS_KEY_ATTR_TUNNEL); + } + return 0; +} + +static int ovs_key_from_nlattrs(struct sw_flow_match *match, u64 attrs, + const struct nlattr **a, bool is_mask) +{ + int err; + u64 orig_attrs = attrs; + + err = metadata_from_nlattrs(match, &attrs, a, is_mask); + if (err) + return err; + + if (attrs & (1 << OVS_KEY_ATTR_ETHERNET)) { + const struct ovs_key_ethernet *eth_key; + + eth_key = nla_data(a[OVS_KEY_ATTR_ETHERNET]); + SW_FLOW_KEY_MEMCPY(match, eth.src, + eth_key->eth_src, ETH_ALEN, is_mask); + SW_FLOW_KEY_MEMCPY(match, eth.dst, + eth_key->eth_dst, ETH_ALEN, is_mask); + attrs &= ~(1 << OVS_KEY_ATTR_ETHERNET); + } + + if (attrs & (1 << OVS_KEY_ATTR_VLAN)) { + __be16 tci; + + tci = nla_get_be16(a[OVS_KEY_ATTR_VLAN]); + if (!(tci & htons(VLAN_TAG_PRESENT))) { + if (is_mask) + OVS_NLERR("VLAN TCI mask does not have exact match for VLAN_TAG_PRESENT bit.\n"); + else + OVS_NLERR("VLAN TCI does not have VLAN_TAG_PRESENT bit set.\n"); + + return -EINVAL; + } + + SW_FLOW_KEY_PUT(match, eth.tci, tci, is_mask); + attrs &= ~(1 << OVS_KEY_ATTR_VLAN); + } else if (!is_mask) + SW_FLOW_KEY_PUT(match, eth.tci, htons(0xffff), true); + + if (attrs & (1 << OVS_KEY_ATTR_ETHERTYPE)) { + __be16 eth_type; + + eth_type = nla_get_be16(a[OVS_KEY_ATTR_ETHERTYPE]); + if (is_mask) { + /* Always exact match EtherType. */ + eth_type = htons(0xffff); + } else if (ntohs(eth_type) < ETH_P_802_3_MIN) { + OVS_NLERR("EtherType is less than minimum (type=%x, min=%x).\n", + ntohs(eth_type), ETH_P_802_3_MIN); + return -EINVAL; + } + + SW_FLOW_KEY_PUT(match, eth.type, eth_type, is_mask); + attrs &= ~(1 << OVS_KEY_ATTR_ETHERTYPE); + } else if (!is_mask) { + SW_FLOW_KEY_PUT(match, eth.type, htons(ETH_P_802_2), is_mask); + } + + if (attrs & (1 << OVS_KEY_ATTR_IPV4)) { + const struct ovs_key_ipv4 *ipv4_key; + + ipv4_key = nla_data(a[OVS_KEY_ATTR_IPV4]); + if (!is_mask && ipv4_key->ipv4_frag > OVS_FRAG_TYPE_MAX) { + OVS_NLERR("Unknown IPv4 fragment type (value=%d, max=%d).\n", + ipv4_key->ipv4_frag, OVS_FRAG_TYPE_MAX); + return -EINVAL; + } + SW_FLOW_KEY_PUT(match, ip.proto, + ipv4_key->ipv4_proto, is_mask); + SW_FLOW_KEY_PUT(match, ip.tos, + ipv4_key->ipv4_tos, is_mask); + SW_FLOW_KEY_PUT(match, ip.ttl, + ipv4_key->ipv4_ttl, is_mask); + SW_FLOW_KEY_PUT(match, ip.frag, + ipv4_key->ipv4_frag, is_mask); + SW_FLOW_KEY_PUT(match, ipv4.addr.src, + ipv4_key->ipv4_src, is_mask); + SW_FLOW_KEY_PUT(match, ipv4.addr.dst, + ipv4_key->ipv4_dst, is_mask); + attrs &= ~(1 << OVS_KEY_ATTR_IPV4); + } + + if (attrs & (1 << OVS_KEY_ATTR_IPV6)) { + const struct ovs_key_ipv6 *ipv6_key; + + ipv6_key = nla_data(a[OVS_KEY_ATTR_IPV6]); + if (!is_mask && ipv6_key->ipv6_frag > OVS_FRAG_TYPE_MAX) { + OVS_NLERR("Unknown IPv6 fragment type (value=%d, max=%d).\n", + ipv6_key->ipv6_frag, OVS_FRAG_TYPE_MAX); + return -EINVAL; + } + SW_FLOW_KEY_PUT(match, ipv6.label, + ipv6_key->ipv6_label, is_mask); + SW_FLOW_KEY_PUT(match, ip.proto, + ipv6_key->ipv6_proto, is_mask); + SW_FLOW_KEY_PUT(match, ip.tos, + ipv6_key->ipv6_tclass, is_mask); + SW_FLOW_KEY_PUT(match, ip.ttl, + ipv6_key->ipv6_hlimit, is_mask); + SW_FLOW_KEY_PUT(match, ip.frag, + ipv6_key->ipv6_frag, is_mask); + SW_FLOW_KEY_MEMCPY(match, ipv6.addr.src, + ipv6_key->ipv6_src, + sizeof(match->key->ipv6.addr.src), + is_mask); + SW_FLOW_KEY_MEMCPY(match, ipv6.addr.dst, + ipv6_key->ipv6_dst, + sizeof(match->key->ipv6.addr.dst), + is_mask); + + attrs &= ~(1 << OVS_KEY_ATTR_IPV6); + } + + if (attrs & (1 << OVS_KEY_ATTR_ARP)) { + const struct ovs_key_arp *arp_key; + + arp_key = nla_data(a[OVS_KEY_ATTR_ARP]); + if (!is_mask && (arp_key->arp_op & htons(0xff00))) { + OVS_NLERR("Unknown ARP opcode (opcode=%d).\n", + arp_key->arp_op); + return -EINVAL; + } + + SW_FLOW_KEY_PUT(match, ipv4.addr.src, + arp_key->arp_sip, is_mask); + SW_FLOW_KEY_PUT(match, ipv4.addr.dst, + arp_key->arp_tip, is_mask); + SW_FLOW_KEY_PUT(match, ip.proto, + ntohs(arp_key->arp_op), is_mask); + SW_FLOW_KEY_MEMCPY(match, ipv4.arp.sha, + arp_key->arp_sha, ETH_ALEN, is_mask); + SW_FLOW_KEY_MEMCPY(match, ipv4.arp.tha, + arp_key->arp_tha, ETH_ALEN, is_mask); + + attrs &= ~(1 << OVS_KEY_ATTR_ARP); + } + + if (attrs & (1 << OVS_KEY_ATTR_TCP)) { + const struct ovs_key_tcp *tcp_key; + + tcp_key = nla_data(a[OVS_KEY_ATTR_TCP]); + if (orig_attrs & (1 << OVS_KEY_ATTR_IPV4)) { + SW_FLOW_KEY_PUT(match, ipv4.tp.src, + tcp_key->tcp_src, is_mask); + SW_FLOW_KEY_PUT(match, ipv4.tp.dst, + tcp_key->tcp_dst, is_mask); + } else { + SW_FLOW_KEY_PUT(match, ipv6.tp.src, + tcp_key->tcp_src, is_mask); + SW_FLOW_KEY_PUT(match, ipv6.tp.dst, + tcp_key->tcp_dst, is_mask); + } + attrs &= ~(1 << OVS_KEY_ATTR_TCP); + } + + if (attrs & (1 << OVS_KEY_ATTR_TCP_FLAGS)) { + if (orig_attrs & (1 << OVS_KEY_ATTR_IPV4)) { + SW_FLOW_KEY_PUT(match, ipv4.tp.flags, + nla_get_be16(a[OVS_KEY_ATTR_TCP_FLAGS]), + is_mask); + } else { + SW_FLOW_KEY_PUT(match, ipv6.tp.flags, + nla_get_be16(a[OVS_KEY_ATTR_TCP_FLAGS]), + is_mask); + } + attrs &= ~(1 << OVS_KEY_ATTR_TCP_FLAGS); + } + + if (attrs & (1 << OVS_KEY_ATTR_UDP)) { + const struct ovs_key_udp *udp_key; + + udp_key = nla_data(a[OVS_KEY_ATTR_UDP]); + if (orig_attrs & (1 << OVS_KEY_ATTR_IPV4)) { + SW_FLOW_KEY_PUT(match, ipv4.tp.src, + udp_key->udp_src, is_mask); + SW_FLOW_KEY_PUT(match, ipv4.tp.dst, + udp_key->udp_dst, is_mask); + } else { + SW_FLOW_KEY_PUT(match, ipv6.tp.src, + udp_key->udp_src, is_mask); + SW_FLOW_KEY_PUT(match, ipv6.tp.dst, + udp_key->udp_dst, is_mask); + } + attrs &= ~(1 << OVS_KEY_ATTR_UDP); + } + + if (attrs & (1 << OVS_KEY_ATTR_SCTP)) { + const struct ovs_key_sctp *sctp_key; + + sctp_key = nla_data(a[OVS_KEY_ATTR_SCTP]); + if (orig_attrs & (1 << OVS_KEY_ATTR_IPV4)) { + SW_FLOW_KEY_PUT(match, ipv4.tp.src, + sctp_key->sctp_src, is_mask); + SW_FLOW_KEY_PUT(match, ipv4.tp.dst, + sctp_key->sctp_dst, is_mask); + } else { + SW_FLOW_KEY_PUT(match, ipv6.tp.src, + sctp_key->sctp_src, is_mask); + SW_FLOW_KEY_PUT(match, ipv6.tp.dst, + sctp_key->sctp_dst, is_mask); + } + attrs &= ~(1 << OVS_KEY_ATTR_SCTP); + } + + if (attrs & (1 << OVS_KEY_ATTR_ICMP)) { + const struct ovs_key_icmp *icmp_key; + + icmp_key = nla_data(a[OVS_KEY_ATTR_ICMP]); + SW_FLOW_KEY_PUT(match, ipv4.tp.src, + htons(icmp_key->icmp_type), is_mask); + SW_FLOW_KEY_PUT(match, ipv4.tp.dst, + htons(icmp_key->icmp_code), is_mask); + attrs &= ~(1 << OVS_KEY_ATTR_ICMP); + } + + if (attrs & (1 << OVS_KEY_ATTR_ICMPV6)) { + const struct ovs_key_icmpv6 *icmpv6_key; + + icmpv6_key = nla_data(a[OVS_KEY_ATTR_ICMPV6]); + SW_FLOW_KEY_PUT(match, ipv6.tp.src, + htons(icmpv6_key->icmpv6_type), is_mask); + SW_FLOW_KEY_PUT(match, ipv6.tp.dst, + htons(icmpv6_key->icmpv6_code), is_mask); + attrs &= ~(1 << OVS_KEY_ATTR_ICMPV6); + } + + if (attrs & (1 << OVS_KEY_ATTR_ND)) { + const struct ovs_key_nd *nd_key; + + nd_key = nla_data(a[OVS_KEY_ATTR_ND]); + SW_FLOW_KEY_MEMCPY(match, ipv6.nd.target, + nd_key->nd_target, + sizeof(match->key->ipv6.nd.target), + is_mask); + SW_FLOW_KEY_MEMCPY(match, ipv6.nd.sll, + nd_key->nd_sll, ETH_ALEN, is_mask); + SW_FLOW_KEY_MEMCPY(match, ipv6.nd.tll, + nd_key->nd_tll, ETH_ALEN, is_mask); + attrs &= ~(1 << OVS_KEY_ATTR_ND); + } + + if (attrs != 0) + return -EINVAL; + + return 0; +} + +static void sw_flow_mask_set(struct sw_flow_mask *mask, + struct sw_flow_key_range *range, u8 val) +{ + u8 *m = (u8 *)&mask->key + range->start; + + mask->range = *range; + memset(m, val, range_n_bytes(range)); +} + +/** + * ovs_nla_get_match - parses Netlink attributes into a flow key and + * mask. In case the 'mask' is NULL, the flow is treated as exact match + * flow. Otherwise, it is treated as a wildcarded flow, except the mask + * does not include any don't care bit. + * @match: receives the extracted flow match information. + * @key: Netlink attribute holding nested %OVS_KEY_ATTR_* Netlink attribute + * sequence. The fields should of the packet that triggered the creation + * of this flow. + * @mask: Optional. Netlink attribute holding nested %OVS_KEY_ATTR_* Netlink + * attribute specifies the mask field of the wildcarded flow. + */ +int ovs_nla_get_match(struct sw_flow_match *match, + const struct nlattr *key, + const struct nlattr *mask) +{ + const struct nlattr *a[OVS_KEY_ATTR_MAX + 1]; + const struct nlattr *encap; + u64 key_attrs = 0; + u64 mask_attrs = 0; + bool encap_valid = false; + int err; + + err = parse_flow_nlattrs(key, a, &key_attrs); + if (err) + return err; + + if ((key_attrs & (1 << OVS_KEY_ATTR_ETHERNET)) && + (key_attrs & (1 << OVS_KEY_ATTR_ETHERTYPE)) && + (nla_get_be16(a[OVS_KEY_ATTR_ETHERTYPE]) == htons(ETH_P_8021Q))) { + __be16 tci; + + if (!((key_attrs & (1 << OVS_KEY_ATTR_VLAN)) && + (key_attrs & (1 << OVS_KEY_ATTR_ENCAP)))) { + OVS_NLERR("Invalid Vlan frame.\n"); + return -EINVAL; + } + + key_attrs &= ~(1 << OVS_KEY_ATTR_ETHERTYPE); + tci = nla_get_be16(a[OVS_KEY_ATTR_VLAN]); + encap = a[OVS_KEY_ATTR_ENCAP]; + key_attrs &= ~(1 << OVS_KEY_ATTR_ENCAP); + encap_valid = true; + + if (tci & htons(VLAN_TAG_PRESENT)) { + err = parse_flow_nlattrs(encap, a, &key_attrs); + if (err) + return err; + } else if (!tci) { + /* Corner case for truncated 802.1Q header. */ + if (nla_len(encap)) { + OVS_NLERR("Truncated 802.1Q header has non-zero encap attribute.\n"); + return -EINVAL; + } + } else { + OVS_NLERR("Encap attribute is set for a non-VLAN frame.\n"); + return -EINVAL; + } + } + + err = ovs_key_from_nlattrs(match, key_attrs, a, false); + if (err) + return err; + + if (mask) { + err = parse_flow_mask_nlattrs(mask, a, &mask_attrs); + if (err) + return err; + + if (mask_attrs & 1 << OVS_KEY_ATTR_ENCAP) { + __be16 eth_type = 0; + __be16 tci = 0; + + if (!encap_valid) { + OVS_NLERR("Encap mask attribute is set for non-VLAN frame.\n"); + return -EINVAL; + } + + mask_attrs &= ~(1 << OVS_KEY_ATTR_ENCAP); + if (a[OVS_KEY_ATTR_ETHERTYPE]) + eth_type = nla_get_be16(a[OVS_KEY_ATTR_ETHERTYPE]); + + if (eth_type == htons(0xffff)) { + mask_attrs &= ~(1 << OVS_KEY_ATTR_ETHERTYPE); + encap = a[OVS_KEY_ATTR_ENCAP]; + err = parse_flow_mask_nlattrs(encap, a, &mask_attrs); + } else { + OVS_NLERR("VLAN frames must have an exact match on the TPID (mask=%x).\n", + ntohs(eth_type)); + return -EINVAL; + } + + if (a[OVS_KEY_ATTR_VLAN]) + tci = nla_get_be16(a[OVS_KEY_ATTR_VLAN]); + + if (!(tci & htons(VLAN_TAG_PRESENT))) { + OVS_NLERR("VLAN tag present bit must have an exact match (tci_mask=%x).\n", ntohs(tci)); + return -EINVAL; + } + } + + err = ovs_key_from_nlattrs(match, mask_attrs, a, true); + if (err) + return err; + } else { + /* Populate exact match flow's key mask. */ + if (match->mask) + sw_flow_mask_set(match->mask, &match->range, 0xff); + } + + if (!match_validate(match, key_attrs, mask_attrs)) + return -EINVAL; + + return 0; +} + +/** + * ovs_nla_get_flow_metadata - parses Netlink attributes into a flow key. + * @flow: Receives extracted in_port, priority, tun_key and skb_mark. + * @attr: Netlink attribute holding nested %OVS_KEY_ATTR_* Netlink attribute + * sequence. + * + * This parses a series of Netlink attributes that form a flow key, which must + * take the same form accepted by flow_from_nlattrs(), but only enough of it to + * get the metadata, that is, the parts of the flow key that cannot be + * extracted from the packet itself. + */ + +int ovs_nla_get_flow_metadata(struct sw_flow *flow, + const struct nlattr *attr) +{ + struct ovs_key_ipv4_tunnel *tun_key = &flow->key.tun_key; + const struct nlattr *a[OVS_KEY_ATTR_MAX + 1]; + u64 attrs = 0; + int err; + struct sw_flow_match match; + + flow->key.phy.in_port = DP_MAX_PORTS; + flow->key.phy.priority = 0; + flow->key.phy.skb_mark = 0; + memset(tun_key, 0, sizeof(flow->key.tun_key)); + + err = parse_flow_nlattrs(attr, a, &attrs); + if (err) + return -EINVAL; + + memset(&match, 0, sizeof(match)); + match.key = &flow->key; + + err = metadata_from_nlattrs(&match, &attrs, a, false); + if (err) + return err; + + return 0; +} + +int ovs_nla_put_flow(const struct sw_flow_key *swkey, + const struct sw_flow_key *output, struct sk_buff *skb) +{ + struct ovs_key_ethernet *eth_key; + struct nlattr *nla, *encap; + bool is_mask = (swkey != output); + + if (nla_put_u32(skb, OVS_KEY_ATTR_PRIORITY, output->phy.priority)) + goto nla_put_failure; + + if ((swkey->tun_key.ipv4_dst || is_mask) && + ipv4_tun_to_nlattr(skb, &swkey->tun_key, &output->tun_key)) + goto nla_put_failure; + + if (swkey->phy.in_port == DP_MAX_PORTS) { + if (is_mask && (output->phy.in_port == 0xffff)) + if (nla_put_u32(skb, OVS_KEY_ATTR_IN_PORT, 0xffffffff)) + goto nla_put_failure; + } else { + u16 upper_u16; + upper_u16 = !is_mask ? 0 : 0xffff; + + if (nla_put_u32(skb, OVS_KEY_ATTR_IN_PORT, + (upper_u16 << 16) | output->phy.in_port)) + goto nla_put_failure; + } + + if (nla_put_u32(skb, OVS_KEY_ATTR_SKB_MARK, output->phy.skb_mark)) + goto nla_put_failure; + + nla = nla_reserve(skb, OVS_KEY_ATTR_ETHERNET, sizeof(*eth_key)); + if (!nla) + goto nla_put_failure; + + eth_key = nla_data(nla); + memcpy(eth_key->eth_src, output->eth.src, ETH_ALEN); + memcpy(eth_key->eth_dst, output->eth.dst, ETH_ALEN); + + if (swkey->eth.tci || swkey->eth.type == htons(ETH_P_8021Q)) { + __be16 eth_type; + eth_type = !is_mask ? htons(ETH_P_8021Q) : htons(0xffff); + if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE, eth_type) || + nla_put_be16(skb, OVS_KEY_ATTR_VLAN, output->eth.tci)) + goto nla_put_failure; + encap = nla_nest_start(skb, OVS_KEY_ATTR_ENCAP); + if (!swkey->eth.tci) + goto unencap; + } else + encap = NULL; + + if (swkey->eth.type == htons(ETH_P_802_2)) { + /* + * Ethertype 802.2 is represented in the netlink with omitted + * OVS_KEY_ATTR_ETHERTYPE in the flow key attribute, and + * 0xffff in the mask attribute. Ethertype can also + * be wildcarded. + */ + if (is_mask && output->eth.type) + if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE, + output->eth.type)) + goto nla_put_failure; + goto unencap; + } + + if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE, output->eth.type)) + goto nla_put_failure; + + if (swkey->eth.type == htons(ETH_P_IP)) { + struct ovs_key_ipv4 *ipv4_key; + + nla = nla_reserve(skb, OVS_KEY_ATTR_IPV4, sizeof(*ipv4_key)); + if (!nla) + goto nla_put_failure; + ipv4_key = nla_data(nla); + ipv4_key->ipv4_src = output->ipv4.addr.src; + ipv4_key->ipv4_dst = output->ipv4.addr.dst; + ipv4_key->ipv4_proto = output->ip.proto; + ipv4_key->ipv4_tos = output->ip.tos; + ipv4_key->ipv4_ttl = output->ip.ttl; + ipv4_key->ipv4_frag = output->ip.frag; + } else if (swkey->eth.type == htons(ETH_P_IPV6)) { + struct ovs_key_ipv6 *ipv6_key; + + nla = nla_reserve(skb, OVS_KEY_ATTR_IPV6, sizeof(*ipv6_key)); + if (!nla) + goto nla_put_failure; + ipv6_key = nla_data(nla); + memcpy(ipv6_key->ipv6_src, &output->ipv6.addr.src, + sizeof(ipv6_key->ipv6_src)); + memcpy(ipv6_key->ipv6_dst, &output->ipv6.addr.dst, + sizeof(ipv6_key->ipv6_dst)); + ipv6_key->ipv6_label = output->ipv6.label; + ipv6_key->ipv6_proto = output->ip.proto; + ipv6_key->ipv6_tclass = output->ip.tos; + ipv6_key->ipv6_hlimit = output->ip.ttl; + ipv6_key->ipv6_frag = output->ip.frag; + } else if (swkey->eth.type == htons(ETH_P_ARP) || + swkey->eth.type == htons(ETH_P_RARP)) { + struct ovs_key_arp *arp_key; + + nla = nla_reserve(skb, OVS_KEY_ATTR_ARP, sizeof(*arp_key)); + if (!nla) + goto nla_put_failure; + arp_key = nla_data(nla); + memset(arp_key, 0, sizeof(struct ovs_key_arp)); + arp_key->arp_sip = output->ipv4.addr.src; + arp_key->arp_tip = output->ipv4.addr.dst; + arp_key->arp_op = htons(output->ip.proto); + memcpy(arp_key->arp_sha, output->ipv4.arp.sha, ETH_ALEN); + memcpy(arp_key->arp_tha, output->ipv4.arp.tha, ETH_ALEN); + } + + if ((swkey->eth.type == htons(ETH_P_IP) || + swkey->eth.type == htons(ETH_P_IPV6)) && + swkey->ip.frag != OVS_FRAG_TYPE_LATER) { + + if (swkey->ip.proto == IPPROTO_TCP) { + struct ovs_key_tcp *tcp_key; + + nla = nla_reserve(skb, OVS_KEY_ATTR_TCP, sizeof(*tcp_key)); + if (!nla) + goto nla_put_failure; + tcp_key = nla_data(nla); + if (swkey->eth.type == htons(ETH_P_IP)) { + tcp_key->tcp_src = output->ipv4.tp.src; + tcp_key->tcp_dst = output->ipv4.tp.dst; + if (nla_put_be16(skb, OVS_KEY_ATTR_TCP_FLAGS, + output->ipv4.tp.flags)) + goto nla_put_failure; + } else if (swkey->eth.type == htons(ETH_P_IPV6)) { + tcp_key->tcp_src = output->ipv6.tp.src; + tcp_key->tcp_dst = output->ipv6.tp.dst; + if (nla_put_be16(skb, OVS_KEY_ATTR_TCP_FLAGS, + output->ipv6.tp.flags)) + goto nla_put_failure; + } + } else if (swkey->ip.proto == IPPROTO_UDP) { + struct ovs_key_udp *udp_key; + + nla = nla_reserve(skb, OVS_KEY_ATTR_UDP, sizeof(*udp_key)); + if (!nla) + goto nla_put_failure; + udp_key = nla_data(nla); + if (swkey->eth.type == htons(ETH_P_IP)) { + udp_key->udp_src = output->ipv4.tp.src; + udp_key->udp_dst = output->ipv4.tp.dst; + } else if (swkey->eth.type == htons(ETH_P_IPV6)) { + udp_key->udp_src = output->ipv6.tp.src; + udp_key->udp_dst = output->ipv6.tp.dst; + } + } else if (swkey->ip.proto == IPPROTO_SCTP) { + struct ovs_key_sctp *sctp_key; + + nla = nla_reserve(skb, OVS_KEY_ATTR_SCTP, sizeof(*sctp_key)); + if (!nla) + goto nla_put_failure; + sctp_key = nla_data(nla); + if (swkey->eth.type == htons(ETH_P_IP)) { + sctp_key->sctp_src = swkey->ipv4.tp.src; + sctp_key->sctp_dst = swkey->ipv4.tp.dst; + } else if (swkey->eth.type == htons(ETH_P_IPV6)) { + sctp_key->sctp_src = swkey->ipv6.tp.src; + sctp_key->sctp_dst = swkey->ipv6.tp.dst; + } + } else if (swkey->eth.type == htons(ETH_P_IP) && + swkey->ip.proto == IPPROTO_ICMP) { + struct ovs_key_icmp *icmp_key; + + nla = nla_reserve(skb, OVS_KEY_ATTR_ICMP, sizeof(*icmp_key)); + if (!nla) + goto nla_put_failure; + icmp_key = nla_data(nla); + icmp_key->icmp_type = ntohs(output->ipv4.tp.src); + icmp_key->icmp_code = ntohs(output->ipv4.tp.dst); + } else if (swkey->eth.type == htons(ETH_P_IPV6) && + swkey->ip.proto == IPPROTO_ICMPV6) { + struct ovs_key_icmpv6 *icmpv6_key; + + nla = nla_reserve(skb, OVS_KEY_ATTR_ICMPV6, + sizeof(*icmpv6_key)); + if (!nla) + goto nla_put_failure; + icmpv6_key = nla_data(nla); + icmpv6_key->icmpv6_type = ntohs(output->ipv6.tp.src); + icmpv6_key->icmpv6_code = ntohs(output->ipv6.tp.dst); + + if (icmpv6_key->icmpv6_type == NDISC_NEIGHBOUR_SOLICITATION || + icmpv6_key->icmpv6_type == NDISC_NEIGHBOUR_ADVERTISEMENT) { + struct ovs_key_nd *nd_key; + + nla = nla_reserve(skb, OVS_KEY_ATTR_ND, sizeof(*nd_key)); + if (!nla) + goto nla_put_failure; + nd_key = nla_data(nla); + memcpy(nd_key->nd_target, &output->ipv6.nd.target, + sizeof(nd_key->nd_target)); + memcpy(nd_key->nd_sll, output->ipv6.nd.sll, ETH_ALEN); + memcpy(nd_key->nd_tll, output->ipv6.nd.tll, ETH_ALEN); + } + } + } + +unencap: + if (encap) + nla_nest_end(skb, encap); + + return 0; + +nla_put_failure: + return -EMSGSIZE; +} + +#define MAX_ACTIONS_BUFSIZE (32 * 1024) + +struct sw_flow_actions *ovs_nla_alloc_flow_actions(int size) +{ + struct sw_flow_actions *sfa; + + if (size > MAX_ACTIONS_BUFSIZE) + return ERR_PTR(-EINVAL); + + sfa = kmalloc(sizeof(*sfa) + size, GFP_KERNEL); + if (!sfa) + return ERR_PTR(-ENOMEM); + + sfa->actions_len = 0; + return sfa; +} + +/* RCU callback used by ovs_nla_free_flow_actions. */ +static void rcu_free_acts_callback(struct rcu_head *rcu) +{ + struct sw_flow_actions *sf_acts = container_of(rcu, + struct sw_flow_actions, rcu); + kfree(sf_acts); +} + +/* Schedules 'sf_acts' to be freed after the next RCU grace period. + * The caller must hold rcu_read_lock for this to be sensible. */ +void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts) +{ + call_rcu(&sf_acts->rcu, rcu_free_acts_callback); +} + +static struct nlattr *reserve_sfa_size(struct sw_flow_actions **sfa, + int attr_len) +{ + + struct sw_flow_actions *acts; + int new_acts_size; + int req_size = NLA_ALIGN(attr_len); + int next_offset = offsetof(struct sw_flow_actions, actions) + + (*sfa)->actions_len; + + if (req_size <= (ksize(*sfa) - next_offset)) + goto out; + + new_acts_size = ksize(*sfa) * 2; + + if (new_acts_size > MAX_ACTIONS_BUFSIZE) { + if ((MAX_ACTIONS_BUFSIZE - next_offset) < req_size) + return ERR_PTR(-EMSGSIZE); + new_acts_size = MAX_ACTIONS_BUFSIZE; + } + + acts = ovs_nla_alloc_flow_actions(new_acts_size); + if (IS_ERR(acts)) + return (void *)acts; + + memcpy(acts->actions, (*sfa)->actions, (*sfa)->actions_len); + acts->actions_len = (*sfa)->actions_len; + kfree(*sfa); + *sfa = acts; + +out: + (*sfa)->actions_len += req_size; + return (struct nlattr *) ((unsigned char *)(*sfa) + next_offset); +} + +static int add_action(struct sw_flow_actions **sfa, int attrtype, void *data, int len) +{ + struct nlattr *a; + + a = reserve_sfa_size(sfa, nla_attr_size(len)); + if (IS_ERR(a)) + return PTR_ERR(a); + + a->nla_type = attrtype; + a->nla_len = nla_attr_size(len); + + if (data) + memcpy(nla_data(a), data, len); + memset((unsigned char *) a + a->nla_len, 0, nla_padlen(len)); + + return 0; +} + +static inline int add_nested_action_start(struct sw_flow_actions **sfa, + int attrtype) +{ + int used = (*sfa)->actions_len; + int err; + + err = add_action(sfa, attrtype, NULL, 0); + if (err) + return err; + + return used; +} + +static inline void add_nested_action_end(struct sw_flow_actions *sfa, + int st_offset) +{ + struct nlattr *a = (struct nlattr *) ((unsigned char *)sfa->actions + + st_offset); + + a->nla_len = sfa->actions_len - st_offset; +} + +static int validate_and_copy_sample(const struct nlattr *attr, + const struct sw_flow_key *key, int depth, + struct sw_flow_actions **sfa) +{ + const struct nlattr *attrs[OVS_SAMPLE_ATTR_MAX + 1]; + const struct nlattr *probability, *actions; + const struct nlattr *a; + int rem, start, err, st_acts; + + memset(attrs, 0, sizeof(attrs)); + nla_for_each_nested(a, attr, rem) { + int type = nla_type(a); + if (!type || type > OVS_SAMPLE_ATTR_MAX || attrs[type]) + return -EINVAL; + attrs[type] = a; + } + if (rem) + return -EINVAL; + + probability = attrs[OVS_SAMPLE_ATTR_PROBABILITY]; + if (!probability || nla_len(probability) != sizeof(u32)) + return -EINVAL; + + actions = attrs[OVS_SAMPLE_ATTR_ACTIONS]; + if (!actions || (nla_len(actions) && nla_len(actions) < NLA_HDRLEN)) + return -EINVAL; + + /* validation done, copy sample action. */ + start = add_nested_action_start(sfa, OVS_ACTION_ATTR_SAMPLE); + if (start < 0) + return start; + err = add_action(sfa, OVS_SAMPLE_ATTR_PROBABILITY, + nla_data(probability), sizeof(u32)); + if (err) + return err; + st_acts = add_nested_action_start(sfa, OVS_SAMPLE_ATTR_ACTIONS); + if (st_acts < 0) + return st_acts; + + err = ovs_nla_copy_actions(actions, key, depth + 1, sfa); + if (err) + return err; + + add_nested_action_end(*sfa, st_acts); + add_nested_action_end(*sfa, start); + + return 0; +} + +static int validate_tp_port(const struct sw_flow_key *flow_key) +{ + if (flow_key->eth.type == htons(ETH_P_IP)) { + if (flow_key->ipv4.tp.src || flow_key->ipv4.tp.dst) + return 0; + } else if (flow_key->eth.type == htons(ETH_P_IPV6)) { + if (flow_key->ipv6.tp.src || flow_key->ipv6.tp.dst) + return 0; + } + + return -EINVAL; +} + +void ovs_match_init(struct sw_flow_match *match, + struct sw_flow_key *key, + struct sw_flow_mask *mask) +{ + memset(match, 0, sizeof(*match)); + match->key = key; + match->mask = mask; + + memset(key, 0, sizeof(*key)); + + if (mask) { + memset(&mask->key, 0, sizeof(mask->key)); + mask->range.start = mask->range.end = 0; + } +} + +static int validate_and_copy_set_tun(const struct nlattr *attr, + struct sw_flow_actions **sfa) +{ + struct sw_flow_match match; + struct sw_flow_key key; + int err, start; + + ovs_match_init(&match, &key, NULL); + err = ipv4_tun_from_nlattr(nla_data(attr), &match, false); + if (err) + return err; + + start = add_nested_action_start(sfa, OVS_ACTION_ATTR_SET); + if (start < 0) + return start; + + err = add_action(sfa, OVS_KEY_ATTR_IPV4_TUNNEL, &match.key->tun_key, + sizeof(match.key->tun_key)); + add_nested_action_end(*sfa, start); + + return err; +} + +static int validate_set(const struct nlattr *a, + const struct sw_flow_key *flow_key, + struct sw_flow_actions **sfa, + bool *set_tun) +{ + const struct nlattr *ovs_key = nla_data(a); + int key_type = nla_type(ovs_key); + + /* There can be only one key in a action */ + if (nla_total_size(nla_len(ovs_key)) != nla_len(a)) + return -EINVAL; + + if (key_type > OVS_KEY_ATTR_MAX || + (ovs_key_lens[key_type] != nla_len(ovs_key) && + ovs_key_lens[key_type] != -1)) + return -EINVAL; + + switch (key_type) { + const struct ovs_key_ipv4 *ipv4_key; + const struct ovs_key_ipv6 *ipv6_key; + int err; + + case OVS_KEY_ATTR_PRIORITY: + case OVS_KEY_ATTR_SKB_MARK: + case OVS_KEY_ATTR_ETHERNET: + break; + + case OVS_KEY_ATTR_TUNNEL: + *set_tun = true; + err = validate_and_copy_set_tun(a, sfa); + if (err) + return err; + break; + + case OVS_KEY_ATTR_IPV4: + if (flow_key->eth.type != htons(ETH_P_IP)) + return -EINVAL; + + if (!flow_key->ip.proto) + return -EINVAL; + + ipv4_key = nla_data(ovs_key); + if (ipv4_key->ipv4_proto != flow_key->ip.proto) + return -EINVAL; + + if (ipv4_key->ipv4_frag != flow_key->ip.frag) + return -EINVAL; + + break; + + case OVS_KEY_ATTR_IPV6: + if (flow_key->eth.type != htons(ETH_P_IPV6)) + return -EINVAL; + + if (!flow_key->ip.proto) + return -EINVAL; + + ipv6_key = nla_data(ovs_key); + if (ipv6_key->ipv6_proto != flow_key->ip.proto) + return -EINVAL; + + if (ipv6_key->ipv6_frag != flow_key->ip.frag) + return -EINVAL; + + if (ntohl(ipv6_key->ipv6_label) & 0xFFF00000) + return -EINVAL; + + break; + + case OVS_KEY_ATTR_TCP: + if (flow_key->ip.proto != IPPROTO_TCP) + return -EINVAL; + + return validate_tp_port(flow_key); + + case OVS_KEY_ATTR_UDP: + if (flow_key->ip.proto != IPPROTO_UDP) + return -EINVAL; + + return validate_tp_port(flow_key); + + case OVS_KEY_ATTR_SCTP: + if (flow_key->ip.proto != IPPROTO_SCTP) + return -EINVAL; + + return validate_tp_port(flow_key); + + default: + return -EINVAL; + } + + return 0; +} + +static int validate_userspace(const struct nlattr *attr) +{ + static const struct nla_policy userspace_policy[OVS_USERSPACE_ATTR_MAX + 1] = { + [OVS_USERSPACE_ATTR_PID] = {.type = NLA_U32 }, + [OVS_USERSPACE_ATTR_USERDATA] = {.type = NLA_UNSPEC }, + }; + struct nlattr *a[OVS_USERSPACE_ATTR_MAX + 1]; + int error; + + error = nla_parse_nested(a, OVS_USERSPACE_ATTR_MAX, + attr, userspace_policy); + if (error) + return error; + + if (!a[OVS_USERSPACE_ATTR_PID] || + !nla_get_u32(a[OVS_USERSPACE_ATTR_PID])) + return -EINVAL; + + return 0; +} + +static int copy_action(const struct nlattr *from, + struct sw_flow_actions **sfa) +{ + int totlen = NLA_ALIGN(from->nla_len); + struct nlattr *to; + + to = reserve_sfa_size(sfa, from->nla_len); + if (IS_ERR(to)) + return PTR_ERR(to); + + memcpy(to, from, totlen); + return 0; +} + +int ovs_nla_copy_actions(const struct nlattr *attr, + const struct sw_flow_key *key, + int depth, + struct sw_flow_actions **sfa) +{ + const struct nlattr *a; + int rem, err; + + if (depth >= SAMPLE_ACTION_DEPTH) + return -EOVERFLOW; + + nla_for_each_nested(a, attr, rem) { + /* Expected argument lengths, (u32)-1 for variable length. */ + static const u32 action_lens[OVS_ACTION_ATTR_MAX + 1] = { + [OVS_ACTION_ATTR_OUTPUT] = sizeof(u32), + [OVS_ACTION_ATTR_USERSPACE] = (u32)-1, + [OVS_ACTION_ATTR_PUSH_VLAN] = sizeof(struct ovs_action_push_vlan), + [OVS_ACTION_ATTR_POP_VLAN] = 0, + [OVS_ACTION_ATTR_SET] = (u32)-1, + [OVS_ACTION_ATTR_SAMPLE] = (u32)-1 + }; + const struct ovs_action_push_vlan *vlan; + int type = nla_type(a); + bool skip_copy; + + if (type > OVS_ACTION_ATTR_MAX || + (action_lens[type] != nla_len(a) && + action_lens[type] != (u32)-1)) + return -EINVAL; + + skip_copy = false; + switch (type) { + case OVS_ACTION_ATTR_UNSPEC: + return -EINVAL; + + case OVS_ACTION_ATTR_USERSPACE: + err = validate_userspace(a); + if (err) + return err; + break; + + case OVS_ACTION_ATTR_OUTPUT: + if (nla_get_u32(a) >= DP_MAX_PORTS) + return -EINVAL; + break; + + + case OVS_ACTION_ATTR_POP_VLAN: + break; + + case OVS_ACTION_ATTR_PUSH_VLAN: + vlan = nla_data(a); + if (vlan->vlan_tpid != htons(ETH_P_8021Q)) + return -EINVAL; + if (!(vlan->vlan_tci & htons(VLAN_TAG_PRESENT))) + return -EINVAL; + break; + + case OVS_ACTION_ATTR_SET: + err = validate_set(a, key, sfa, &skip_copy); + if (err) + return err; + break; + + case OVS_ACTION_ATTR_SAMPLE: + err = validate_and_copy_sample(a, key, depth, sfa); + if (err) + return err; + skip_copy = true; + break; + + default: + return -EINVAL; + } + if (!skip_copy) { + err = copy_action(a, sfa); + if (err) + return err; + } + } + + if (rem > 0) + return -EINVAL; + + return 0; +} + +static int sample_action_to_attr(const struct nlattr *attr, struct sk_buff *skb) +{ + const struct nlattr *a; + struct nlattr *start; + int err = 0, rem; + + start = nla_nest_start(skb, OVS_ACTION_ATTR_SAMPLE); + if (!start) + return -EMSGSIZE; + + nla_for_each_nested(a, attr, rem) { + int type = nla_type(a); + struct nlattr *st_sample; + + switch (type) { + case OVS_SAMPLE_ATTR_PROBABILITY: + if (nla_put(skb, OVS_SAMPLE_ATTR_PROBABILITY, + sizeof(u32), nla_data(a))) + return -EMSGSIZE; + break; + case OVS_SAMPLE_ATTR_ACTIONS: + st_sample = nla_nest_start(skb, OVS_SAMPLE_ATTR_ACTIONS); + if (!st_sample) + return -EMSGSIZE; + err = ovs_nla_put_actions(nla_data(a), nla_len(a), skb); + if (err) + return err; + nla_nest_end(skb, st_sample); + break; + } + } + + nla_nest_end(skb, start); + return err; +} + +static int set_action_to_attr(const struct nlattr *a, struct sk_buff *skb) +{ + const struct nlattr *ovs_key = nla_data(a); + int key_type = nla_type(ovs_key); + struct nlattr *start; + int err; + + switch (key_type) { + case OVS_KEY_ATTR_IPV4_TUNNEL: + start = nla_nest_start(skb, OVS_ACTION_ATTR_SET); + if (!start) + return -EMSGSIZE; + + err = ipv4_tun_to_nlattr(skb, nla_data(ovs_key), + nla_data(ovs_key)); + if (err) + return err; + nla_nest_end(skb, start); + break; + default: + if (nla_put(skb, OVS_ACTION_ATTR_SET, nla_len(a), ovs_key)) + return -EMSGSIZE; + break; + } + + return 0; +} + +int ovs_nla_put_actions(const struct nlattr *attr, int len, struct sk_buff *skb) +{ + const struct nlattr *a; + int rem, err; + + nla_for_each_attr(a, attr, len, rem) { + int type = nla_type(a); + + switch (type) { + case OVS_ACTION_ATTR_SET: + err = set_action_to_attr(a, skb); + if (err) + return err; + break; + + case OVS_ACTION_ATTR_SAMPLE: + err = sample_action_to_attr(a, skb); + if (err) + return err; + break; + default: + if (nla_put(skb, type, nla_len(a), nla_data(a))) + return -EMSGSIZE; + break; + } + } + + return 0; +} diff --git a/net/openvswitch/flow_netlink.h b/net/openvswitch/flow_netlink.h new file mode 100644 index 000000000000..440151045d39 --- /dev/null +++ b/net/openvswitch/flow_netlink.h @@ -0,0 +1,60 @@ +/* + * Copyright (c) 2007-2013 Nicira, Inc. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA + * 02110-1301, USA + */ + + +#ifndef FLOW_NETLINK_H +#define FLOW_NETLINK_H 1 + +#include <linux/kernel.h> +#include <linux/netlink.h> +#include <linux/openvswitch.h> +#include <linux/spinlock.h> +#include <linux/types.h> +#include <linux/rcupdate.h> +#include <linux/if_ether.h> +#include <linux/in6.h> +#include <linux/jiffies.h> +#include <linux/time.h> +#include <linux/flex_array.h> + +#include <net/inet_ecn.h> +#include <net/ip_tunnels.h> + +#include "flow.h" + +void ovs_match_init(struct sw_flow_match *match, + struct sw_flow_key *key, struct sw_flow_mask *mask); + +int ovs_nla_put_flow(const struct sw_flow_key *, + const struct sw_flow_key *, struct sk_buff *); +int ovs_nla_get_flow_metadata(struct sw_flow *flow, + const struct nlattr *attr); +int ovs_nla_get_match(struct sw_flow_match *match, + const struct nlattr *, + const struct nlattr *); + +int ovs_nla_copy_actions(const struct nlattr *attr, + const struct sw_flow_key *key, int depth, + struct sw_flow_actions **sfa); +int ovs_nla_put_actions(const struct nlattr *attr, + int len, struct sk_buff *skb); + +struct sw_flow_actions *ovs_nla_alloc_flow_actions(int actions_len); +void ovs_nla_free_flow_actions(struct sw_flow_actions *); + +#endif /* flow_netlink.h */ diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c new file mode 100644 index 000000000000..e42542706087 --- /dev/null +++ b/net/openvswitch/flow_table.c @@ -0,0 +1,592 @@ +/* + * Copyright (c) 2007-2013 Nicira, Inc. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA + * 02110-1301, USA + */ + +#include "flow.h" +#include "datapath.h" +#include <linux/uaccess.h> +#include <linux/netdevice.h> +#include <linux/etherdevice.h> +#include <linux/if_ether.h> +#include <linux/if_vlan.h> +#include <net/llc_pdu.h> +#include <linux/kernel.h> +#include <linux/jhash.h> +#include <linux/jiffies.h> +#include <linux/llc.h> +#include <linux/module.h> +#include <linux/in.h> +#include <linux/rcupdate.h> +#include <linux/if_arp.h> +#include <linux/ip.h> +#include <linux/ipv6.h> +#include <linux/sctp.h> +#include <linux/tcp.h> +#include <linux/udp.h> +#include <linux/icmp.h> +#include <linux/icmpv6.h> +#include <linux/rculist.h> +#include <net/ip.h> +#include <net/ipv6.h> +#include <net/ndisc.h> + +#include "datapath.h" + +#define TBL_MIN_BUCKETS 1024 +#define REHASH_INTERVAL (10 * 60 * HZ) + +static struct kmem_cache *flow_cache; + +static u16 range_n_bytes(const struct sw_flow_key_range *range) +{ + return range->end - range->start; +} + +void ovs_flow_mask_key(struct sw_flow_key *dst, const struct sw_flow_key *src, + const struct sw_flow_mask *mask) +{ + const long *m = (long *)((u8 *)&mask->key + mask->range.start); + const long *s = (long *)((u8 *)src + mask->range.start); + long *d = (long *)((u8 *)dst + mask->range.start); + int i; + + /* The memory outside of the 'mask->range' are not set since + * further operations on 'dst' only uses contents within + * 'mask->range'. + */ + for (i = 0; i < range_n_bytes(&mask->range); i += sizeof(long)) + *d++ = *s++ & *m++; +} + +struct sw_flow *ovs_flow_alloc(void) +{ + struct sw_flow *flow; + + flow = kmem_cache_alloc(flow_cache, GFP_KERNEL); + if (!flow) + return ERR_PTR(-ENOMEM); + + spin_lock_init(&flow->lock); + flow->sf_acts = NULL; + flow->mask = NULL; + + return flow; +} + +int ovs_flow_tbl_count(struct flow_table *table) +{ + return table->count; +} + +static struct flex_array *alloc_buckets(unsigned int n_buckets) +{ + struct flex_array *buckets; + int i, err; + + buckets = flex_array_alloc(sizeof(struct hlist_head), + n_buckets, GFP_KERNEL); + if (!buckets) + return NULL; + + err = flex_array_prealloc(buckets, 0, n_buckets, GFP_KERNEL); + if (err) { + flex_array_free(buckets); + return NULL; + } + + for (i = 0; i < n_buckets; i++) + INIT_HLIST_HEAD((struct hlist_head *) + flex_array_get(buckets, i)); + + return buckets; +} + +static void flow_free(struct sw_flow *flow) +{ + kfree((struct sf_flow_acts __force *)flow->sf_acts); + kmem_cache_free(flow_cache, flow); +} + +static void rcu_free_flow_callback(struct rcu_head *rcu) +{ + struct sw_flow *flow = container_of(rcu, struct sw_flow, rcu); + + flow_free(flow); +} + +static void rcu_free_sw_flow_mask_cb(struct rcu_head *rcu) +{ + struct sw_flow_mask *mask = container_of(rcu, struct sw_flow_mask, rcu); + + kfree(mask); +} + +static void flow_mask_del_ref(struct sw_flow_mask *mask, bool deferred) +{ + if (!mask) + return; + + BUG_ON(!mask->ref_count); + mask->ref_count--; + + if (!mask->ref_count) { + list_del_rcu(&mask->list); + if (deferred) + call_rcu(&mask->rcu, rcu_free_sw_flow_mask_cb); + else + kfree(mask); + } +} + +void ovs_flow_free(struct sw_flow *flow, bool deferred) +{ + if (!flow) + return; + + flow_mask_del_ref(flow->mask, deferred); + + if (deferred) + call_rcu(&flow->rcu, rcu_free_flow_callback); + else + flow_free(flow); +} + +static void free_buckets(struct flex_array *buckets) +{ + flex_array_free(buckets); +} + +static void __table_instance_destroy(struct table_instance *ti) +{ + int i; + + if (ti->keep_flows) + goto skip_flows; + + for (i = 0; i < ti->n_buckets; i++) { + struct sw_flow *flow; + struct hlist_head *head = flex_array_get(ti->buckets, i); + struct hlist_node *n; + int ver = ti->node_ver; + + hlist_for_each_entry_safe(flow, n, head, hash_node[ver]) { + hlist_del(&flow->hash_node[ver]); + ovs_flow_free(flow, false); + } + } + +skip_flows: + free_buckets(ti->buckets); + kfree(ti); +} + +static struct table_instance *table_instance_alloc(int new_size) +{ + struct table_instance *ti = kmalloc(sizeof(*ti), GFP_KERNEL); + + if (!ti) + return NULL; + + ti->buckets = alloc_buckets(new_size); + + if (!ti->buckets) { + kfree(ti); + return NULL; + } + ti->n_buckets = new_size; + ti->node_ver = 0; + ti->keep_flows = false; + get_random_bytes(&ti->hash_seed, sizeof(u32)); + + return ti; +} + +int ovs_flow_tbl_init(struct flow_table *table) +{ + struct table_instance *ti; + + ti = table_instance_alloc(TBL_MIN_BUCKETS); + + if (!ti) + return -ENOMEM; + + rcu_assign_pointer(table->ti, ti); + INIT_LIST_HEAD(&table->mask_list); + table->last_rehash = jiffies; + table->count = 0; + return 0; +} + +static void flow_tbl_destroy_rcu_cb(struct rcu_head *rcu) +{ + struct table_instance *ti = container_of(rcu, struct table_instance, rcu); + + __table_instance_destroy(ti); +} + +static void table_instance_destroy(struct table_instance *ti, bool deferred) +{ + if (!ti) + return; + + if (deferred) + call_rcu(&ti->rcu, flow_tbl_destroy_rcu_cb); + else + __table_instance_destroy(ti); +} + +void ovs_flow_tbl_destroy(struct flow_table *table) +{ + struct table_instance *ti = ovsl_dereference(table->ti); + + table_instance_destroy(ti, false); +} + +struct sw_flow *ovs_flow_tbl_dump_next(struct table_instance *ti, + u32 *bucket, u32 *last) +{ + struct sw_flow *flow; + struct hlist_head *head; + int ver; + int i; + + ver = ti->node_ver; + while (*bucket < ti->n_buckets) { + i = 0; + head = flex_array_get(ti->buckets, *bucket); + hlist_for_each_entry_rcu(flow, head, hash_node[ver]) { + if (i < *last) { + i++; + continue; + } + *last = i + 1; + return flow; + } + (*bucket)++; + *last = 0; + } + + return NULL; +} + +static struct hlist_head *find_bucket(struct table_instance *ti, u32 hash) +{ + hash = jhash_1word(hash, ti->hash_seed); + return flex_array_get(ti->buckets, + (hash & (ti->n_buckets - 1))); +} + +static void table_instance_insert(struct table_instance *ti, struct sw_flow *flow) +{ + struct hlist_head *head; + + head = find_bucket(ti, flow->hash); + hlist_add_head_rcu(&flow->hash_node[ti->node_ver], head); +} + +static void flow_table_copy_flows(struct table_instance *old, + struct table_instance *new) +{ + int old_ver; + int i; + + old_ver = old->node_ver; + new->node_ver = !old_ver; + + /* Insert in new table. */ + for (i = 0; i < old->n_buckets; i++) { + struct sw_flow *flow; + struct hlist_head *head; + + head = flex_array_get(old->buckets, i); + + hlist_for_each_entry(flow, head, hash_node[old_ver]) + table_instance_insert(new, flow); + } + + old->keep_flows = true; +} + +static struct table_instance *table_instance_rehash(struct table_instance *ti, + int n_buckets) +{ + struct table_instance *new_ti; + + new_ti = table_instance_alloc(n_buckets); + if (!new_ti) + return NULL; + + flow_table_copy_flows(ti, new_ti); + + return new_ti; +} + +int ovs_flow_tbl_flush(struct flow_table *flow_table) +{ + struct table_instance *old_ti; + struct table_instance *new_ti; + + old_ti = ovsl_dereference(flow_table->ti); + new_ti = table_instance_alloc(TBL_MIN_BUCKETS); + if (!new_ti) + return -ENOMEM; + + rcu_assign_pointer(flow_table->ti, new_ti); + flow_table->last_rehash = jiffies; + flow_table->count = 0; + + table_instance_destroy(old_ti, true); + return 0; +} + +static u32 flow_hash(const struct sw_flow_key *key, int key_start, + int key_end) +{ + u32 *hash_key = (u32 *)((u8 *)key + key_start); + int hash_u32s = (key_end - key_start) >> 2; + + /* Make sure number of hash bytes are multiple of u32. */ + BUILD_BUG_ON(sizeof(long) % sizeof(u32)); + + return jhash2(hash_key, hash_u32s, 0); +} + +static int flow_key_start(const struct sw_flow_key *key) +{ + if (key->tun_key.ipv4_dst) + return 0; + else + return rounddown(offsetof(struct sw_flow_key, phy), + sizeof(long)); +} + +static bool cmp_key(const struct sw_flow_key *key1, + const struct sw_flow_key *key2, + int key_start, int key_end) +{ + const long *cp1 = (long *)((u8 *)key1 + key_start); + const long *cp2 = (long *)((u8 *)key2 + key_start); + long diffs = 0; + int i; + + for (i = key_start; i < key_end; i += sizeof(long)) + diffs |= *cp1++ ^ *cp2++; + + return diffs == 0; +} + +static bool flow_cmp_masked_key(const struct sw_flow *flow, + const struct sw_flow_key *key, + int key_start, int key_end) +{ + return cmp_key(&flow->key, key, key_start, key_end); +} + +bool ovs_flow_cmp_unmasked_key(const struct sw_flow *flow, + struct sw_flow_match *match) +{ + struct sw_flow_key *key = match->key; + int key_start = flow_key_start(key); + int key_end = match->range.end; + + return cmp_key(&flow->unmasked_key, key, key_start, key_end); +} + +static struct sw_flow *masked_flow_lookup(struct table_instance *ti, + const struct sw_flow_key *unmasked, + struct sw_flow_mask *mask) +{ + struct sw_flow *flow; + struct hlist_head *head; + int key_start = mask->range.start; + int key_end = mask->range.end; + u32 hash; + struct sw_flow_key masked_key; + + ovs_flow_mask_key(&masked_key, unmasked, mask); + hash = flow_hash(&masked_key, key_start, key_end); + head = find_bucket(ti, hash); + hlist_for_each_entry_rcu(flow, head, hash_node[ti->node_ver]) { + if (flow->mask == mask && flow->hash == hash && + flow_cmp_masked_key(flow, &masked_key, + key_start, key_end)) + return flow; + } + return NULL; +} + +struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *tbl, + const struct sw_flow_key *key, + u32 *n_mask_hit) +{ + struct table_instance *ti = rcu_dereference(tbl->ti); + struct sw_flow_mask *mask; + struct sw_flow *flow; + + *n_mask_hit = 0; + list_for_each_entry_rcu(mask, &tbl->mask_list, list) { + (*n_mask_hit)++; + flow = masked_flow_lookup(ti, key, mask); + if (flow) /* Found */ + return flow; + } + return NULL; +} + +int ovs_flow_tbl_num_masks(const struct flow_table *table) +{ + struct sw_flow_mask *mask; + int num = 0; + + list_for_each_entry(mask, &table->mask_list, list) + num++; + + return num; +} + +static struct table_instance *table_instance_expand(struct table_instance *ti) +{ + return table_instance_rehash(ti, ti->n_buckets * 2); +} + +void ovs_flow_tbl_remove(struct flow_table *table, struct sw_flow *flow) +{ + struct table_instance *ti = ovsl_dereference(table->ti); + + BUG_ON(table->count == 0); + hlist_del_rcu(&flow->hash_node[ti->node_ver]); + table->count--; +} + +static struct sw_flow_mask *mask_alloc(void) +{ + struct sw_flow_mask *mask; + + mask = kmalloc(sizeof(*mask), GFP_KERNEL); + if (mask) + mask->ref_count = 0; + + return mask; +} + +static void mask_add_ref(struct sw_flow_mask *mask) +{ + mask->ref_count++; +} + +static bool mask_equal(const struct sw_flow_mask *a, + const struct sw_flow_mask *b) +{ + u8 *a_ = (u8 *)&a->key + a->range.start; + u8 *b_ = (u8 *)&b->key + b->range.start; + + return (a->range.end == b->range.end) + && (a->range.start == b->range.start) + && (memcmp(a_, b_, range_n_bytes(&a->range)) == 0); +} + +static struct sw_flow_mask *flow_mask_find(const struct flow_table *tbl, + const struct sw_flow_mask *mask) +{ + struct list_head *ml; + + list_for_each(ml, &tbl->mask_list) { + struct sw_flow_mask *m; + m = container_of(ml, struct sw_flow_mask, list); + if (mask_equal(mask, m)) + return m; + } + + return NULL; +} + +/** + * add a new mask into the mask list. + * The caller needs to make sure that 'mask' is not the same + * as any masks that are already on the list. + */ +static int flow_mask_insert(struct flow_table *tbl, struct sw_flow *flow, + struct sw_flow_mask *new) +{ + struct sw_flow_mask *mask; + mask = flow_mask_find(tbl, new); + if (!mask) { + /* Allocate a new mask if none exsits. */ + mask = mask_alloc(); + if (!mask) + return -ENOMEM; + mask->key = new->key; + mask->range = new->range; + list_add_rcu(&mask->list, &tbl->mask_list); + } + + mask_add_ref(mask); + flow->mask = mask; + return 0; +} + +int ovs_flow_tbl_insert(struct flow_table *table, struct sw_flow *flow, + struct sw_flow_mask *mask) +{ + struct table_instance *new_ti = NULL; + struct table_instance *ti; + int err; + + err = flow_mask_insert(table, flow, mask); + if (err) + return err; + + flow->hash = flow_hash(&flow->key, flow->mask->range.start, + flow->mask->range.end); + ti = ovsl_dereference(table->ti); + table_instance_insert(ti, flow); + table->count++; + + /* Expand table, if necessary, to make room. */ + if (table->count > ti->n_buckets) + new_ti = table_instance_expand(ti); + else if (time_after(jiffies, table->last_rehash + REHASH_INTERVAL)) + new_ti = table_instance_rehash(ti, ti->n_buckets); + + if (new_ti) { + rcu_assign_pointer(table->ti, new_ti); + table_instance_destroy(ti, true); + table->last_rehash = jiffies; + } + return 0; +} + +/* Initializes the flow module. + * Returns zero if successful or a negative error code. */ +int ovs_flow_init(void) +{ + BUILD_BUG_ON(__alignof__(struct sw_flow_key) % __alignof__(long)); + BUILD_BUG_ON(sizeof(struct sw_flow_key) % sizeof(long)); + + flow_cache = kmem_cache_create("sw_flow", sizeof(struct sw_flow), 0, + 0, NULL); + if (flow_cache == NULL) + return -ENOMEM; + + return 0; +} + +/* Uninitializes the flow module. */ +void ovs_flow_exit(void) +{ + kmem_cache_destroy(flow_cache); +} diff --git a/net/openvswitch/flow_table.h b/net/openvswitch/flow_table.h new file mode 100644 index 000000000000..fbe45d5ad07d --- /dev/null +++ b/net/openvswitch/flow_table.h @@ -0,0 +1,81 @@ +/* + * Copyright (c) 2007-2013 Nicira, Inc. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA + * 02110-1301, USA + */ + +#ifndef FLOW_TABLE_H +#define FLOW_TABLE_H 1 + +#include <linux/kernel.h> +#include <linux/netlink.h> +#include <linux/openvswitch.h> +#include <linux/spinlock.h> +#include <linux/types.h> +#include <linux/rcupdate.h> +#include <linux/if_ether.h> +#include <linux/in6.h> +#include <linux/jiffies.h> +#include <linux/time.h> +#include <linux/flex_array.h> + +#include <net/inet_ecn.h> +#include <net/ip_tunnels.h> + +#include "flow.h" + +struct table_instance { + struct flex_array *buckets; + unsigned int n_buckets; + struct rcu_head rcu; + int node_ver; + u32 hash_seed; + bool keep_flows; +}; + +struct flow_table { + struct table_instance __rcu *ti; + struct list_head mask_list; + unsigned long last_rehash; + unsigned int count; +}; + +int ovs_flow_init(void); +void ovs_flow_exit(void); + +struct sw_flow *ovs_flow_alloc(void); +void ovs_flow_free(struct sw_flow *, bool deferred); + +int ovs_flow_tbl_init(struct flow_table *); +int ovs_flow_tbl_count(struct flow_table *table); +void ovs_flow_tbl_destroy(struct flow_table *table); +int ovs_flow_tbl_flush(struct flow_table *flow_table); + +int ovs_flow_tbl_insert(struct flow_table *table, struct sw_flow *flow, + struct sw_flow_mask *mask); +void ovs_flow_tbl_remove(struct flow_table *table, struct sw_flow *flow); +int ovs_flow_tbl_num_masks(const struct flow_table *table); +struct sw_flow *ovs_flow_tbl_dump_next(struct table_instance *table, + u32 *bucket, u32 *idx); +struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *, + const struct sw_flow_key *, + u32 *n_mask_hit); + +bool ovs_flow_cmp_unmasked_key(const struct sw_flow *flow, + struct sw_flow_match *match); + +void ovs_flow_mask_key(struct sw_flow_key *dst, const struct sw_flow_key *src, + const struct sw_flow_mask *mask); +#endif /* flow_table.h */ diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c index c99dea543d64..a3d6951602db 100644 --- a/net/openvswitch/vport-gre.c +++ b/net/openvswitch/vport-gre.c @@ -24,8 +24,6 @@ #include <linux/if_tunnel.h> #include <linux/if_vlan.h> #include <linux/in.h> -#include <linux/if_vlan.h> -#include <linux/in.h> #include <linux/in_route.h> #include <linux/inetdevice.h> #include <linux/jhash.h> diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c index 98d3edbbc235..729c68763fe7 100644 --- a/net/openvswitch/vport-internal_dev.c +++ b/net/openvswitch/vport-internal_dev.c @@ -134,7 +134,7 @@ static void do_setup(struct net_device *netdev) netdev->tx_queue_len = 0; netdev->features = NETIF_F_LLTX | NETIF_F_SG | NETIF_F_FRAGLIST | - NETIF_F_HIGHDMA | NETIF_F_HW_CSUM | NETIF_F_TSO; + NETIF_F_HIGHDMA | NETIF_F_HW_CSUM | NETIF_F_GSO_SOFTWARE; netdev->vlan_features = netdev->features; netdev->features |= NETIF_F_HW_VLAN_CTAG_TX; diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c index a481c03e2861..e797a50ac2be 100644 --- a/net/openvswitch/vport-vxlan.c +++ b/net/openvswitch/vport-vxlan.c @@ -29,7 +29,6 @@ #include <net/ip.h> #include <net/udp.h> #include <net/ip_tunnels.h> -#include <net/udp.h> #include <net/rtnetlink.h> #include <net/route.h> #include <net/dsfield.h> @@ -173,7 +172,7 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb) skb->local_df = 1; - inet_get_local_port_range(&port_min, &port_max); + inet_get_local_port_range(net, &port_min, &port_max); src_port = vxlan_src_port(port_min, port_max, skb); err = vxlan_xmit_skb(vxlan_port->vs, rt, skb, diff --git a/net/rds/connection.c b/net/rds/connection.c index 642ad42c416b..378c3a6acf84 100644 --- a/net/rds/connection.c +++ b/net/rds/connection.c @@ -51,10 +51,16 @@ static struct kmem_cache *rds_conn_slab; static struct hlist_head *rds_conn_bucket(__be32 laddr, __be32 faddr) { + static u32 rds_hash_secret __read_mostly; + + unsigned long hash; + + net_get_random_once(&rds_hash_secret, sizeof(rds_hash_secret)); + /* Pass NULL, don't need struct net for hash */ - unsigned long hash = inet_ehashfn(NULL, - be32_to_cpu(laddr), 0, - be32_to_cpu(faddr), 0); + hash = __inet_ehashfn(be32_to_cpu(laddr), 0, + be32_to_cpu(faddr), 0, + rds_hash_secret); return &rds_conn_hash[hash & RDS_CONNECTION_HASH_MASK]; } diff --git a/net/rds/rds.h b/net/rds/rds.h index ec1d731ecff0..48f8ffc60f8f 100644 --- a/net/rds/rds.h +++ b/net/rds/rds.h @@ -749,7 +749,7 @@ void rds_atomic_send_complete(struct rds_message *rm, int wc_status); int rds_cmsg_atomic(struct rds_sock *rs, struct rds_message *rm, struct cmsghdr *cmsg); -extern void __rds_put_mr_final(struct rds_mr *mr); +void __rds_put_mr_final(struct rds_mr *mr); static inline void rds_mr_put(struct rds_mr *mr) { if (atomic_dec_and_test(&mr->r_refcount)) diff --git a/net/rfkill/Kconfig b/net/rfkill/Kconfig index 78efe895b663..4c10e7e6c9f6 100644 --- a/net/rfkill/Kconfig +++ b/net/rfkill/Kconfig @@ -36,7 +36,7 @@ config RFKILL_REGULATOR config RFKILL_GPIO tristate "GPIO RFKILL driver" - depends on RFKILL && GPIOLIB && HAVE_CLK + depends on RFKILL && GPIOLIB default n help If you say yes here you get support of a generic gpio RFKILL diff --git a/net/rfkill/rfkill-gpio.c b/net/rfkill/rfkill-gpio.c index fb076cd6f808..5620d3c07479 100644 --- a/net/rfkill/rfkill-gpio.c +++ b/net/rfkill/rfkill-gpio.c @@ -24,27 +24,23 @@ #include <linux/platform_device.h> #include <linux/clk.h> #include <linux/slab.h> +#include <linux/acpi.h> +#include <linux/acpi_gpio.h> #include <linux/rfkill-gpio.h> -enum rfkill_gpio_clk_state { - UNSPECIFIED = 0, - PWR_ENABLED, - PWR_DISABLED -}; +struct rfkill_gpio_data { + const char *name; + enum rfkill_type type; + int reset_gpio; + int shutdown_gpio; -#define PWR_CLK_SET(_RF, _EN) \ - ((_RF)->pwr_clk_enabled = (!(_EN) ? PWR_ENABLED : PWR_DISABLED)) -#define PWR_CLK_ENABLED(_RF) ((_RF)->pwr_clk_enabled == PWR_ENABLED) -#define PWR_CLK_DISABLED(_RF) ((_RF)->pwr_clk_enabled != PWR_ENABLED) + struct rfkill *rfkill_dev; + char *reset_name; + char *shutdown_name; + struct clk *clk; -struct rfkill_gpio_data { - struct rfkill_gpio_platform_data *pdata; - struct rfkill *rfkill_dev; - char *reset_name; - char *shutdown_name; - enum rfkill_gpio_clk_state pwr_clk_enabled; - struct clk *pwr_clk; + bool clk_enabled; }; static int rfkill_gpio_set_power(void *data, bool blocked) @@ -52,23 +48,22 @@ static int rfkill_gpio_set_power(void *data, bool blocked) struct rfkill_gpio_data *rfkill = data; if (blocked) { - if (gpio_is_valid(rfkill->pdata->shutdown_gpio)) - gpio_direction_output(rfkill->pdata->shutdown_gpio, 0); - if (gpio_is_valid(rfkill->pdata->reset_gpio)) - gpio_direction_output(rfkill->pdata->reset_gpio, 0); - if (rfkill->pwr_clk && PWR_CLK_ENABLED(rfkill)) - clk_disable(rfkill->pwr_clk); + if (gpio_is_valid(rfkill->shutdown_gpio)) + gpio_set_value(rfkill->shutdown_gpio, 0); + if (gpio_is_valid(rfkill->reset_gpio)) + gpio_set_value(rfkill->reset_gpio, 0); + if (!IS_ERR(rfkill->clk) && rfkill->clk_enabled) + clk_disable(rfkill->clk); } else { - if (rfkill->pwr_clk && PWR_CLK_DISABLED(rfkill)) - clk_enable(rfkill->pwr_clk); - if (gpio_is_valid(rfkill->pdata->reset_gpio)) - gpio_direction_output(rfkill->pdata->reset_gpio, 1); - if (gpio_is_valid(rfkill->pdata->shutdown_gpio)) - gpio_direction_output(rfkill->pdata->shutdown_gpio, 1); + if (!IS_ERR(rfkill->clk) && !rfkill->clk_enabled) + clk_enable(rfkill->clk); + if (gpio_is_valid(rfkill->reset_gpio)) + gpio_set_value(rfkill->reset_gpio, 1); + if (gpio_is_valid(rfkill->shutdown_gpio)) + gpio_set_value(rfkill->shutdown_gpio, 1); } - if (rfkill->pwr_clk) - PWR_CLK_SET(rfkill, blocked); + rfkill->clk_enabled = blocked; return 0; } @@ -77,117 +72,112 @@ static const struct rfkill_ops rfkill_gpio_ops = { .set_block = rfkill_gpio_set_power, }; +static int rfkill_gpio_acpi_probe(struct device *dev, + struct rfkill_gpio_data *rfkill) +{ + const struct acpi_device_id *id; + + id = acpi_match_device(dev->driver->acpi_match_table, dev); + if (!id) + return -ENODEV; + + rfkill->name = dev_name(dev); + rfkill->type = (unsigned)id->driver_data; + rfkill->reset_gpio = acpi_get_gpio_by_index(dev, 0, NULL); + rfkill->shutdown_gpio = acpi_get_gpio_by_index(dev, 1, NULL); + + return 0; +} + static int rfkill_gpio_probe(struct platform_device *pdev) { - struct rfkill_gpio_data *rfkill; struct rfkill_gpio_platform_data *pdata = pdev->dev.platform_data; + struct rfkill_gpio_data *rfkill; + const char *clk_name = NULL; int ret = 0; int len = 0; - if (!pdata) { - pr_warn("%s: No platform data specified\n", __func__); - return -EINVAL; + rfkill = devm_kzalloc(&pdev->dev, sizeof(*rfkill), GFP_KERNEL); + if (!rfkill) + return -ENOMEM; + + if (ACPI_HANDLE(&pdev->dev)) { + ret = rfkill_gpio_acpi_probe(&pdev->dev, rfkill); + if (ret) + return ret; + } else if (pdata) { + clk_name = pdata->power_clk_name; + rfkill->name = pdata->name; + rfkill->type = pdata->type; + rfkill->reset_gpio = pdata->reset_gpio; + rfkill->shutdown_gpio = pdata->shutdown_gpio; + } else { + return -ENODEV; } /* make sure at-least one of the GPIO is defined and that * a name is specified for this instance */ - if (!pdata->name || (!gpio_is_valid(pdata->reset_gpio) && - !gpio_is_valid(pdata->shutdown_gpio))) { + if ((!gpio_is_valid(rfkill->reset_gpio) && + !gpio_is_valid(rfkill->shutdown_gpio)) || !rfkill->name) { pr_warn("%s: invalid platform data\n", __func__); return -EINVAL; } - rfkill = kzalloc(sizeof(*rfkill), GFP_KERNEL); - if (!rfkill) - return -ENOMEM; - - if (pdata->gpio_runtime_setup) { + if (pdata && pdata->gpio_runtime_setup) { ret = pdata->gpio_runtime_setup(pdev); if (ret) { pr_warn("%s: can't set up gpio\n", __func__); - goto fail_alloc; + return ret; } } - rfkill->pdata = pdata; - - len = strlen(pdata->name); - rfkill->reset_name = kzalloc(len + 7, GFP_KERNEL); - if (!rfkill->reset_name) { - ret = -ENOMEM; - goto fail_alloc; - } + len = strlen(rfkill->name); + rfkill->reset_name = devm_kzalloc(&pdev->dev, len + 7, GFP_KERNEL); + if (!rfkill->reset_name) + return -ENOMEM; - rfkill->shutdown_name = kzalloc(len + 10, GFP_KERNEL); - if (!rfkill->shutdown_name) { - ret = -ENOMEM; - goto fail_reset_name; - } + rfkill->shutdown_name = devm_kzalloc(&pdev->dev, len + 10, GFP_KERNEL); + if (!rfkill->shutdown_name) + return -ENOMEM; - snprintf(rfkill->reset_name, len + 6 , "%s_reset", pdata->name); - snprintf(rfkill->shutdown_name, len + 9, "%s_shutdown", pdata->name); + snprintf(rfkill->reset_name, len + 6 , "%s_reset", rfkill->name); + snprintf(rfkill->shutdown_name, len + 9, "%s_shutdown", rfkill->name); - if (pdata->power_clk_name) { - rfkill->pwr_clk = clk_get(&pdev->dev, pdata->power_clk_name); - if (IS_ERR(rfkill->pwr_clk)) { - pr_warn("%s: can't find pwr_clk.\n", __func__); - ret = PTR_ERR(rfkill->pwr_clk); - goto fail_shutdown_name; - } - } + rfkill->clk = devm_clk_get(&pdev->dev, clk_name); - if (gpio_is_valid(pdata->reset_gpio)) { - ret = gpio_request(pdata->reset_gpio, rfkill->reset_name); + if (gpio_is_valid(rfkill->reset_gpio)) { + ret = devm_gpio_request_one(&pdev->dev, rfkill->reset_gpio, + 0, rfkill->reset_name); if (ret) { pr_warn("%s: failed to get reset gpio.\n", __func__); - goto fail_clock; + return ret; } } - if (gpio_is_valid(pdata->shutdown_gpio)) { - ret = gpio_request(pdata->shutdown_gpio, rfkill->shutdown_name); + if (gpio_is_valid(rfkill->shutdown_gpio)) { + ret = devm_gpio_request_one(&pdev->dev, rfkill->shutdown_gpio, + 0, rfkill->shutdown_name); if (ret) { pr_warn("%s: failed to get shutdown gpio.\n", __func__); - goto fail_reset; + return ret; } } - rfkill->rfkill_dev = rfkill_alloc(pdata->name, &pdev->dev, pdata->type, - &rfkill_gpio_ops, rfkill); - if (!rfkill->rfkill_dev) { - ret = -ENOMEM; - goto fail_shutdown; - } + rfkill->rfkill_dev = rfkill_alloc(rfkill->name, &pdev->dev, + rfkill->type, &rfkill_gpio_ops, + rfkill); + if (!rfkill->rfkill_dev) + return -ENOMEM; ret = rfkill_register(rfkill->rfkill_dev); if (ret < 0) - goto fail_rfkill; + return ret; platform_set_drvdata(pdev, rfkill); - dev_info(&pdev->dev, "%s device registered.\n", pdata->name); + dev_info(&pdev->dev, "%s device registered.\n", rfkill->name); return 0; - -fail_rfkill: - rfkill_destroy(rfkill->rfkill_dev); -fail_shutdown: - if (gpio_is_valid(pdata->shutdown_gpio)) - gpio_free(pdata->shutdown_gpio); -fail_reset: - if (gpio_is_valid(pdata->reset_gpio)) - gpio_free(pdata->reset_gpio); -fail_clock: - if (rfkill->pwr_clk) - clk_put(rfkill->pwr_clk); -fail_shutdown_name: - kfree(rfkill->shutdown_name); -fail_reset_name: - kfree(rfkill->reset_name); -fail_alloc: - kfree(rfkill); - - return ret; } static int rfkill_gpio_remove(struct platform_device *pdev) @@ -195,31 +185,26 @@ static int rfkill_gpio_remove(struct platform_device *pdev) struct rfkill_gpio_data *rfkill = platform_get_drvdata(pdev); struct rfkill_gpio_platform_data *pdata = pdev->dev.platform_data; - if (pdata->gpio_runtime_close) + if (pdata && pdata->gpio_runtime_close) pdata->gpio_runtime_close(pdev); rfkill_unregister(rfkill->rfkill_dev); rfkill_destroy(rfkill->rfkill_dev); - if (gpio_is_valid(rfkill->pdata->shutdown_gpio)) - gpio_free(rfkill->pdata->shutdown_gpio); - if (gpio_is_valid(rfkill->pdata->reset_gpio)) - gpio_free(rfkill->pdata->reset_gpio); - if (rfkill->pwr_clk && PWR_CLK_ENABLED(rfkill)) - clk_disable(rfkill->pwr_clk); - if (rfkill->pwr_clk) - clk_put(rfkill->pwr_clk); - kfree(rfkill->shutdown_name); - kfree(rfkill->reset_name); - kfree(rfkill); return 0; } +static const struct acpi_device_id rfkill_acpi_match[] = { + { "BCM4752", RFKILL_TYPE_GPS }, + { }, +}; + static struct platform_driver rfkill_gpio_driver = { .probe = rfkill_gpio_probe, .remove = rfkill_gpio_remove, .driver = { - .name = "rfkill_gpio", - .owner = THIS_MODULE, + .name = "rfkill_gpio", + .owner = THIS_MODULE, + .acpi_match_table = ACPI_PTR(rfkill_acpi_match), }, }; diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index a693aca2ae2e..5f43675ee1df 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -426,17 +426,16 @@ extern struct workqueue_struct *rxrpc_workqueue; /* * ar-accept.c */ -extern void rxrpc_accept_incoming_calls(struct work_struct *); -extern struct rxrpc_call *rxrpc_accept_call(struct rxrpc_sock *, - unsigned long); -extern int rxrpc_reject_call(struct rxrpc_sock *); +void rxrpc_accept_incoming_calls(struct work_struct *); +struct rxrpc_call *rxrpc_accept_call(struct rxrpc_sock *, unsigned long); +int rxrpc_reject_call(struct rxrpc_sock *); /* * ar-ack.c */ -extern void __rxrpc_propose_ACK(struct rxrpc_call *, u8, __be32, bool); -extern void rxrpc_propose_ACK(struct rxrpc_call *, u8, __be32, bool); -extern void rxrpc_process_call(struct work_struct *); +void __rxrpc_propose_ACK(struct rxrpc_call *, u8, __be32, bool); +void rxrpc_propose_ACK(struct rxrpc_call *, u8, __be32, bool); +void rxrpc_process_call(struct work_struct *); /* * ar-call.c @@ -445,19 +444,18 @@ extern struct kmem_cache *rxrpc_call_jar; extern struct list_head rxrpc_calls; extern rwlock_t rxrpc_call_lock; -extern struct rxrpc_call *rxrpc_get_client_call(struct rxrpc_sock *, - struct rxrpc_transport *, - struct rxrpc_conn_bundle *, - unsigned long, int, gfp_t); -extern struct rxrpc_call *rxrpc_incoming_call(struct rxrpc_sock *, - struct rxrpc_connection *, - struct rxrpc_header *, gfp_t); -extern struct rxrpc_call *rxrpc_find_server_call(struct rxrpc_sock *, - unsigned long); -extern void rxrpc_release_call(struct rxrpc_call *); -extern void rxrpc_release_calls_on_socket(struct rxrpc_sock *); -extern void __rxrpc_put_call(struct rxrpc_call *); -extern void __exit rxrpc_destroy_all_calls(void); +struct rxrpc_call *rxrpc_get_client_call(struct rxrpc_sock *, + struct rxrpc_transport *, + struct rxrpc_conn_bundle *, + unsigned long, int, gfp_t); +struct rxrpc_call *rxrpc_incoming_call(struct rxrpc_sock *, + struct rxrpc_connection *, + struct rxrpc_header *, gfp_t); +struct rxrpc_call *rxrpc_find_server_call(struct rxrpc_sock *, unsigned long); +void rxrpc_release_call(struct rxrpc_call *); +void rxrpc_release_calls_on_socket(struct rxrpc_sock *); +void __rxrpc_put_call(struct rxrpc_call *); +void __exit rxrpc_destroy_all_calls(void); /* * ar-connection.c @@ -465,19 +463,16 @@ extern void __exit rxrpc_destroy_all_calls(void); extern struct list_head rxrpc_connections; extern rwlock_t rxrpc_connection_lock; -extern struct rxrpc_conn_bundle *rxrpc_get_bundle(struct rxrpc_sock *, - struct rxrpc_transport *, - struct key *, - __be16, gfp_t); -extern void rxrpc_put_bundle(struct rxrpc_transport *, - struct rxrpc_conn_bundle *); -extern int rxrpc_connect_call(struct rxrpc_sock *, struct rxrpc_transport *, - struct rxrpc_conn_bundle *, struct rxrpc_call *, - gfp_t); -extern void rxrpc_put_connection(struct rxrpc_connection *); -extern void __exit rxrpc_destroy_all_connections(void); -extern struct rxrpc_connection *rxrpc_find_connection(struct rxrpc_transport *, - struct rxrpc_header *); +struct rxrpc_conn_bundle *rxrpc_get_bundle(struct rxrpc_sock *, + struct rxrpc_transport *, + struct key *, __be16, gfp_t); +void rxrpc_put_bundle(struct rxrpc_transport *, struct rxrpc_conn_bundle *); +int rxrpc_connect_call(struct rxrpc_sock *, struct rxrpc_transport *, + struct rxrpc_conn_bundle *, struct rxrpc_call *, gfp_t); +void rxrpc_put_connection(struct rxrpc_connection *); +void __exit rxrpc_destroy_all_connections(void); +struct rxrpc_connection *rxrpc_find_connection(struct rxrpc_transport *, + struct rxrpc_header *); extern struct rxrpc_connection * rxrpc_incoming_connection(struct rxrpc_transport *, struct rxrpc_header *, gfp_t); @@ -485,15 +480,15 @@ rxrpc_incoming_connection(struct rxrpc_transport *, struct rxrpc_header *, /* * ar-connevent.c */ -extern void rxrpc_process_connection(struct work_struct *); -extern void rxrpc_reject_packet(struct rxrpc_local *, struct sk_buff *); -extern void rxrpc_reject_packets(struct work_struct *); +void rxrpc_process_connection(struct work_struct *); +void rxrpc_reject_packet(struct rxrpc_local *, struct sk_buff *); +void rxrpc_reject_packets(struct work_struct *); /* * ar-error.c */ -extern void rxrpc_UDP_error_report(struct sock *); -extern void rxrpc_UDP_error_handler(struct work_struct *); +void rxrpc_UDP_error_report(struct sock *); +void rxrpc_UDP_error_handler(struct work_struct *); /* * ar-input.c @@ -501,18 +496,17 @@ extern void rxrpc_UDP_error_handler(struct work_struct *); extern unsigned long rxrpc_ack_timeout; extern const char *rxrpc_pkts[]; -extern void rxrpc_data_ready(struct sock *, int); -extern int rxrpc_queue_rcv_skb(struct rxrpc_call *, struct sk_buff *, bool, - bool); -extern void rxrpc_fast_process_packet(struct rxrpc_call *, struct sk_buff *); +void rxrpc_data_ready(struct sock *, int); +int rxrpc_queue_rcv_skb(struct rxrpc_call *, struct sk_buff *, bool, bool); +void rxrpc_fast_process_packet(struct rxrpc_call *, struct sk_buff *); /* * ar-local.c */ extern rwlock_t rxrpc_local_lock; -extern struct rxrpc_local *rxrpc_lookup_local(struct sockaddr_rxrpc *); -extern void rxrpc_put_local(struct rxrpc_local *); -extern void __exit rxrpc_destroy_all_locals(void); +struct rxrpc_local *rxrpc_lookup_local(struct sockaddr_rxrpc *); +void rxrpc_put_local(struct rxrpc_local *); +void __exit rxrpc_destroy_all_locals(void); /* * ar-key.c @@ -520,31 +514,29 @@ extern void __exit rxrpc_destroy_all_locals(void); extern struct key_type key_type_rxrpc; extern struct key_type key_type_rxrpc_s; -extern int rxrpc_request_key(struct rxrpc_sock *, char __user *, int); -extern int rxrpc_server_keyring(struct rxrpc_sock *, char __user *, int); -extern int rxrpc_get_server_data_key(struct rxrpc_connection *, const void *, - time_t, u32); +int rxrpc_request_key(struct rxrpc_sock *, char __user *, int); +int rxrpc_server_keyring(struct rxrpc_sock *, char __user *, int); +int rxrpc_get_server_data_key(struct rxrpc_connection *, const void *, time_t, + u32); /* * ar-output.c */ extern int rxrpc_resend_timeout; -extern int rxrpc_send_packet(struct rxrpc_transport *, struct sk_buff *); -extern int rxrpc_client_sendmsg(struct kiocb *, struct rxrpc_sock *, - struct rxrpc_transport *, struct msghdr *, - size_t); -extern int rxrpc_server_sendmsg(struct kiocb *, struct rxrpc_sock *, - struct msghdr *, size_t); +int rxrpc_send_packet(struct rxrpc_transport *, struct sk_buff *); +int rxrpc_client_sendmsg(struct kiocb *, struct rxrpc_sock *, + struct rxrpc_transport *, struct msghdr *, size_t); +int rxrpc_server_sendmsg(struct kiocb *, struct rxrpc_sock *, struct msghdr *, + size_t); /* * ar-peer.c */ -extern struct rxrpc_peer *rxrpc_get_peer(struct sockaddr_rxrpc *, gfp_t); -extern void rxrpc_put_peer(struct rxrpc_peer *); -extern struct rxrpc_peer *rxrpc_find_peer(struct rxrpc_local *, - __be32, __be16); -extern void __exit rxrpc_destroy_all_peers(void); +struct rxrpc_peer *rxrpc_get_peer(struct sockaddr_rxrpc *, gfp_t); +void rxrpc_put_peer(struct rxrpc_peer *); +struct rxrpc_peer *rxrpc_find_peer(struct rxrpc_local *, __be32, __be16); +void __exit rxrpc_destroy_all_peers(void); /* * ar-proc.c @@ -556,38 +548,36 @@ extern const struct file_operations rxrpc_connection_seq_fops; /* * ar-recvmsg.c */ -extern void rxrpc_remove_user_ID(struct rxrpc_sock *, struct rxrpc_call *); -extern int rxrpc_recvmsg(struct kiocb *, struct socket *, struct msghdr *, - size_t, int); +void rxrpc_remove_user_ID(struct rxrpc_sock *, struct rxrpc_call *); +int rxrpc_recvmsg(struct kiocb *, struct socket *, struct msghdr *, size_t, + int); /* * ar-security.c */ -extern int rxrpc_register_security(struct rxrpc_security *); -extern void rxrpc_unregister_security(struct rxrpc_security *); -extern int rxrpc_init_client_conn_security(struct rxrpc_connection *); -extern int rxrpc_init_server_conn_security(struct rxrpc_connection *); -extern int rxrpc_secure_packet(const struct rxrpc_call *, struct sk_buff *, - size_t, void *); -extern int rxrpc_verify_packet(const struct rxrpc_call *, struct sk_buff *, - u32 *); -extern void rxrpc_clear_conn_security(struct rxrpc_connection *); +int rxrpc_register_security(struct rxrpc_security *); +void rxrpc_unregister_security(struct rxrpc_security *); +int rxrpc_init_client_conn_security(struct rxrpc_connection *); +int rxrpc_init_server_conn_security(struct rxrpc_connection *); +int rxrpc_secure_packet(const struct rxrpc_call *, struct sk_buff *, size_t, + void *); +int rxrpc_verify_packet(const struct rxrpc_call *, struct sk_buff *, u32 *); +void rxrpc_clear_conn_security(struct rxrpc_connection *); /* * ar-skbuff.c */ -extern void rxrpc_packet_destructor(struct sk_buff *); +void rxrpc_packet_destructor(struct sk_buff *); /* * ar-transport.c */ -extern struct rxrpc_transport *rxrpc_get_transport(struct rxrpc_local *, - struct rxrpc_peer *, - gfp_t); -extern void rxrpc_put_transport(struct rxrpc_transport *); -extern void __exit rxrpc_destroy_all_transports(void); -extern struct rxrpc_transport *rxrpc_find_transport(struct rxrpc_local *, - struct rxrpc_peer *); +struct rxrpc_transport *rxrpc_get_transport(struct rxrpc_local *, + struct rxrpc_peer *, gfp_t); +void rxrpc_put_transport(struct rxrpc_transport *); +void __exit rxrpc_destroy_all_transports(void); +struct rxrpc_transport *rxrpc_find_transport(struct rxrpc_local *, + struct rxrpc_peer *); /* * debug tracing diff --git a/net/sched/Kconfig b/net/sched/Kconfig index c03a32a0418e..ad1f1d819203 100644 --- a/net/sched/Kconfig +++ b/net/sched/Kconfig @@ -443,6 +443,16 @@ config NET_CLS_CGROUP To compile this code as a module, choose M here: the module will be called cls_cgroup. +config NET_CLS_BPF + tristate "BPF-based classifier" + select NET_CLS + ---help--- + If you say Y here, you will be able to classify packets based on + programmable BPF (JIT'ed) filters as an alternative to ematches. + + To compile this code as a module, choose M here: the module will + be called cls_bpf. + config NET_EMATCH bool "Extended Matches" select NET_CLS diff --git a/net/sched/Makefile b/net/sched/Makefile index e5f9abe9a5db..35fa47a494ab 100644 --- a/net/sched/Makefile +++ b/net/sched/Makefile @@ -50,6 +50,7 @@ obj-$(CONFIG_NET_CLS_RSVP6) += cls_rsvp6.o obj-$(CONFIG_NET_CLS_BASIC) += cls_basic.o obj-$(CONFIG_NET_CLS_FLOW) += cls_flow.o obj-$(CONFIG_NET_CLS_CGROUP) += cls_cgroup.o +obj-$(CONFIG_NET_CLS_BPF) += cls_bpf.o obj-$(CONFIG_NET_EMATCH) += ematch.o obj-$(CONFIG_NET_EMATCH_CMP) += em_cmp.o obj-$(CONFIG_NET_EMATCH_NBYTE) += em_nbyte.o diff --git a/net/sched/act_police.c b/net/sched/act_police.c index 189e3c5b3d09..272d8e924cf6 100644 --- a/net/sched/act_police.c +++ b/net/sched/act_police.c @@ -231,14 +231,14 @@ override: } if (R_tab) { police->rate_present = true; - psched_ratecfg_precompute(&police->rate, &R_tab->rate); + psched_ratecfg_precompute(&police->rate, &R_tab->rate, 0); qdisc_put_rtab(R_tab); } else { police->rate_present = false; } if (P_tab) { police->peak_present = true; - psched_ratecfg_precompute(&police->peak, &P_tab->rate); + psched_ratecfg_precompute(&police->peak, &P_tab->rate, 0); qdisc_put_rtab(P_tab); } else { police->peak_present = false; diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c index d76a35d0dc85..636d9131d870 100644 --- a/net/sched/cls_basic.c +++ b/net/sched/cls_basic.c @@ -137,7 +137,7 @@ static int basic_set_parms(struct net *net, struct tcf_proto *tp, struct nlattr **tb, struct nlattr *est) { - int err = -EINVAL; + int err; struct tcf_exts e; struct tcf_ematch_tree t; diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c new file mode 100644 index 000000000000..1002a8226281 --- /dev/null +++ b/net/sched/cls_bpf.c @@ -0,0 +1,385 @@ +/* + * Berkeley Packet Filter based traffic classifier + * + * Might be used to classify traffic through flexible, user-defined and + * possibly JIT-ed BPF filters for traffic control as an alternative to + * ematches. + * + * (C) 2013 Daniel Borkmann <dborkman@redhat.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#include <linux/module.h> +#include <linux/types.h> +#include <linux/skbuff.h> +#include <linux/filter.h> +#include <net/rtnetlink.h> +#include <net/pkt_cls.h> +#include <net/sock.h> + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Daniel Borkmann <dborkman@redhat.com>"); +MODULE_DESCRIPTION("TC BPF based classifier"); + +struct cls_bpf_head { + struct list_head plist; + u32 hgen; +}; + +struct cls_bpf_prog { + struct sk_filter *filter; + struct sock_filter *bpf_ops; + struct tcf_exts exts; + struct tcf_result res; + struct list_head link; + u32 handle; + u16 bpf_len; +}; + +static const struct nla_policy bpf_policy[TCA_BPF_MAX + 1] = { + [TCA_BPF_CLASSID] = { .type = NLA_U32 }, + [TCA_BPF_OPS_LEN] = { .type = NLA_U16 }, + [TCA_BPF_OPS] = { .type = NLA_BINARY, + .len = sizeof(struct sock_filter) * BPF_MAXINSNS }, +}; + +static const struct tcf_ext_map bpf_ext_map = { + .action = TCA_BPF_ACT, + .police = TCA_BPF_POLICE, +}; + +static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp, + struct tcf_result *res) +{ + struct cls_bpf_head *head = tp->root; + struct cls_bpf_prog *prog; + int ret; + + list_for_each_entry(prog, &head->plist, link) { + int filter_res = SK_RUN_FILTER(prog->filter, skb); + + if (filter_res == 0) + continue; + + *res = prog->res; + if (filter_res != -1) + res->classid = filter_res; + + ret = tcf_exts_exec(skb, &prog->exts, res); + if (ret < 0) + continue; + + return ret; + } + + return -1; +} + +static int cls_bpf_init(struct tcf_proto *tp) +{ + struct cls_bpf_head *head; + + head = kzalloc(sizeof(*head), GFP_KERNEL); + if (head == NULL) + return -ENOBUFS; + + INIT_LIST_HEAD(&head->plist); + tp->root = head; + + return 0; +} + +static void cls_bpf_delete_prog(struct tcf_proto *tp, struct cls_bpf_prog *prog) +{ + tcf_unbind_filter(tp, &prog->res); + tcf_exts_destroy(tp, &prog->exts); + + sk_unattached_filter_destroy(prog->filter); + + kfree(prog->bpf_ops); + kfree(prog); +} + +static int cls_bpf_delete(struct tcf_proto *tp, unsigned long arg) +{ + struct cls_bpf_head *head = tp->root; + struct cls_bpf_prog *prog, *todel = (struct cls_bpf_prog *) arg; + + list_for_each_entry(prog, &head->plist, link) { + if (prog == todel) { + tcf_tree_lock(tp); + list_del(&prog->link); + tcf_tree_unlock(tp); + + cls_bpf_delete_prog(tp, prog); + return 0; + } + } + + return -ENOENT; +} + +static void cls_bpf_destroy(struct tcf_proto *tp) +{ + struct cls_bpf_head *head = tp->root; + struct cls_bpf_prog *prog, *tmp; + + list_for_each_entry_safe(prog, tmp, &head->plist, link) { + list_del(&prog->link); + cls_bpf_delete_prog(tp, prog); + } + + kfree(head); +} + +static unsigned long cls_bpf_get(struct tcf_proto *tp, u32 handle) +{ + struct cls_bpf_head *head = tp->root; + struct cls_bpf_prog *prog; + unsigned long ret = 0UL; + + if (head == NULL) + return 0UL; + + list_for_each_entry(prog, &head->plist, link) { + if (prog->handle == handle) { + ret = (unsigned long) prog; + break; + } + } + + return ret; +} + +static void cls_bpf_put(struct tcf_proto *tp, unsigned long f) +{ +} + +static int cls_bpf_modify_existing(struct net *net, struct tcf_proto *tp, + struct cls_bpf_prog *prog, + unsigned long base, struct nlattr **tb, + struct nlattr *est) +{ + struct sock_filter *bpf_ops, *bpf_old; + struct tcf_exts exts; + struct sock_fprog tmp; + struct sk_filter *fp, *fp_old; + u16 bpf_size, bpf_len; + u32 classid; + int ret; + + if (!tb[TCA_BPF_OPS_LEN] || !tb[TCA_BPF_OPS] || !tb[TCA_BPF_CLASSID]) + return -EINVAL; + + ret = tcf_exts_validate(net, tp, tb, est, &exts, &bpf_ext_map); + if (ret < 0) + return ret; + + classid = nla_get_u32(tb[TCA_BPF_CLASSID]); + bpf_len = nla_get_u16(tb[TCA_BPF_OPS_LEN]); + if (bpf_len > BPF_MAXINSNS || bpf_len == 0) { + ret = -EINVAL; + goto errout; + } + + bpf_size = bpf_len * sizeof(*bpf_ops); + bpf_ops = kzalloc(bpf_size, GFP_KERNEL); + if (bpf_ops == NULL) { + ret = -ENOMEM; + goto errout; + } + + memcpy(bpf_ops, nla_data(tb[TCA_BPF_OPS]), bpf_size); + + tmp.len = bpf_len; + tmp.filter = (struct sock_filter __user *) bpf_ops; + + ret = sk_unattached_filter_create(&fp, &tmp); + if (ret) + goto errout_free; + + tcf_tree_lock(tp); + fp_old = prog->filter; + bpf_old = prog->bpf_ops; + + prog->bpf_len = bpf_len; + prog->bpf_ops = bpf_ops; + prog->filter = fp; + prog->res.classid = classid; + tcf_tree_unlock(tp); + + tcf_bind_filter(tp, &prog->res, base); + tcf_exts_change(tp, &prog->exts, &exts); + + if (fp_old) + sk_unattached_filter_destroy(fp_old); + if (bpf_old) + kfree(bpf_old); + + return 0; + +errout_free: + kfree(bpf_ops); +errout: + tcf_exts_destroy(tp, &exts); + return ret; +} + +static u32 cls_bpf_grab_new_handle(struct tcf_proto *tp, + struct cls_bpf_head *head) +{ + unsigned int i = 0x80000000; + + do { + if (++head->hgen == 0x7FFFFFFF) + head->hgen = 1; + } while (--i > 0 && cls_bpf_get(tp, head->hgen)); + if (i == 0) + pr_err("Insufficient number of handles\n"); + + return i; +} + +static int cls_bpf_change(struct net *net, struct sk_buff *in_skb, + struct tcf_proto *tp, unsigned long base, + u32 handle, struct nlattr **tca, + unsigned long *arg) +{ + struct cls_bpf_head *head = tp->root; + struct cls_bpf_prog *prog = (struct cls_bpf_prog *) *arg; + struct nlattr *tb[TCA_BPF_MAX + 1]; + int ret; + + if (tca[TCA_OPTIONS] == NULL) + return -EINVAL; + + ret = nla_parse_nested(tb, TCA_BPF_MAX, tca[TCA_OPTIONS], bpf_policy); + if (ret < 0) + return ret; + + if (prog != NULL) { + if (handle && prog->handle != handle) + return -EINVAL; + return cls_bpf_modify_existing(net, tp, prog, base, tb, + tca[TCA_RATE]); + } + + prog = kzalloc(sizeof(*prog), GFP_KERNEL); + if (prog == NULL) + return -ENOBUFS; + + if (handle == 0) + prog->handle = cls_bpf_grab_new_handle(tp, head); + else + prog->handle = handle; + if (prog->handle == 0) { + ret = -EINVAL; + goto errout; + } + + ret = cls_bpf_modify_existing(net, tp, prog, base, tb, tca[TCA_RATE]); + if (ret < 0) + goto errout; + + tcf_tree_lock(tp); + list_add(&prog->link, &head->plist); + tcf_tree_unlock(tp); + + *arg = (unsigned long) prog; + + return 0; +errout: + if (*arg == 0UL && prog) + kfree(prog); + + return ret; +} + +static int cls_bpf_dump(struct tcf_proto *tp, unsigned long fh, + struct sk_buff *skb, struct tcmsg *tm) +{ + struct cls_bpf_prog *prog = (struct cls_bpf_prog *) fh; + struct nlattr *nest, *nla; + + if (prog == NULL) + return skb->len; + + tm->tcm_handle = prog->handle; + + nest = nla_nest_start(skb, TCA_OPTIONS); + if (nest == NULL) + goto nla_put_failure; + + if (nla_put_u32(skb, TCA_BPF_CLASSID, prog->res.classid)) + goto nla_put_failure; + if (nla_put_u16(skb, TCA_BPF_OPS_LEN, prog->bpf_len)) + goto nla_put_failure; + + nla = nla_reserve(skb, TCA_BPF_OPS, prog->bpf_len * + sizeof(struct sock_filter)); + if (nla == NULL) + goto nla_put_failure; + + memcpy(nla_data(nla), prog->bpf_ops, nla_len(nla)); + + if (tcf_exts_dump(skb, &prog->exts, &bpf_ext_map) < 0) + goto nla_put_failure; + + nla_nest_end(skb, nest); + + if (tcf_exts_dump_stats(skb, &prog->exts, &bpf_ext_map) < 0) + goto nla_put_failure; + + return skb->len; + +nla_put_failure: + nla_nest_cancel(skb, nest); + return -1; +} + +static void cls_bpf_walk(struct tcf_proto *tp, struct tcf_walker *arg) +{ + struct cls_bpf_head *head = tp->root; + struct cls_bpf_prog *prog; + + list_for_each_entry(prog, &head->plist, link) { + if (arg->count < arg->skip) + goto skip; + if (arg->fn(tp, (unsigned long) prog, arg) < 0) { + arg->stop = 1; + break; + } +skip: + arg->count++; + } +} + +static struct tcf_proto_ops cls_bpf_ops __read_mostly = { + .kind = "bpf", + .owner = THIS_MODULE, + .classify = cls_bpf_classify, + .init = cls_bpf_init, + .destroy = cls_bpf_destroy, + .get = cls_bpf_get, + .put = cls_bpf_put, + .change = cls_bpf_change, + .delete = cls_bpf_delete, + .walk = cls_bpf_walk, + .dump = cls_bpf_dump, +}; + +static int __init cls_bpf_init_mod(void) +{ + return register_tcf_proto_ops(&cls_bpf_ops); +} + +static void __exit cls_bpf_exit_mod(void) +{ + unregister_tcf_proto_ops(&cls_bpf_ops); +} + +module_init(cls_bpf_init_mod); +module_exit(cls_bpf_exit_mod); diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c index 867b4a3e3980..16006c92c3fd 100644 --- a/net/sched/cls_cgroup.c +++ b/net/sched/cls_cgroup.c @@ -72,11 +72,11 @@ static void cgrp_attach(struct cgroup_subsys_state *css, struct cgroup_taskset *tset) { struct task_struct *p; - void *v; + struct cgroup_cls_state *cs = css_cls_state(css); + void *v = (void *)(unsigned long)cs->classid; cgroup_taskset_for_each(p, css, tset) { task_lock(p); - v = (void *)(unsigned long)task_cls_classid(p); iterate_fd(p->files, 0, update_classid, v); task_unlock(p); } diff --git a/net/sched/em_ipset.c b/net/sched/em_ipset.c index 938b7cbf5627..527aeb7a3ff0 100644 --- a/net/sched/em_ipset.c +++ b/net/sched/em_ipset.c @@ -24,11 +24,12 @@ static int em_ipset_change(struct tcf_proto *tp, void *data, int data_len, { struct xt_set_info *set = data; ip_set_id_t index; + struct net *net = dev_net(qdisc_dev(tp->q)); if (data_len != sizeof(*set)) return -EINVAL; - index = ip_set_nfnl_get_byindex(set->index); + index = ip_set_nfnl_get_byindex(net, set->index); if (index == IPSET_INVALID_ID) return -ENOENT; @@ -37,7 +38,7 @@ static int em_ipset_change(struct tcf_proto *tp, void *data, int data_len, if (em->data) return 0; - ip_set_nfnl_put(index); + ip_set_nfnl_put(net, index); return -ENOMEM; } @@ -45,7 +46,7 @@ static void em_ipset_destroy(struct tcf_proto *p, struct tcf_ematch *em) { const struct xt_set_info *set = (const void *) em->data; if (set) { - ip_set_nfnl_put(set->index); + ip_set_nfnl_put(dev_net(qdisc_dev(p->q)), set->index); kfree((void *) em->data); } } diff --git a/net/sched/em_meta.c b/net/sched/em_meta.c index 7c3de6ffa516..e5cef9567225 100644 --- a/net/sched/em_meta.c +++ b/net/sched/em_meta.c @@ -793,8 +793,10 @@ static int em_meta_change(struct tcf_proto *tp, void *data, int len, goto errout; meta = kzalloc(sizeof(*meta), GFP_KERNEL); - if (meta == NULL) + if (meta == NULL) { + err = -ENOMEM; goto errout; + } memcpy(&meta->lvalue.hdr, &hdr->left, sizeof(hdr->left)); memcpy(&meta->rvalue.hdr, &hdr->right, sizeof(hdr->right)); diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index 2adda7fa2d39..cd81505662b8 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -737,9 +737,11 @@ void qdisc_tree_decrease_qlen(struct Qdisc *sch, unsigned int n) const struct Qdisc_class_ops *cops; unsigned long cl; u32 parentid; + int drops; if (n == 0) return; + drops = max_t(int, n, 0); while ((parentid = sch->parent)) { if (TC_H_MAJ(parentid) == TC_H_MAJ(TC_H_INGRESS)) return; @@ -756,6 +758,7 @@ void qdisc_tree_decrease_qlen(struct Qdisc *sch, unsigned int n) cops->put(sch, cl); } sch->q.qlen -= n; + sch->qstats.drops += drops; } } EXPORT_SYMBOL(qdisc_tree_decrease_qlen); diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index a74e278654aa..922a09406ba7 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -126,7 +126,7 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q, HARD_TX_LOCK(dev, txq, smp_processor_id()); if (!netif_xmit_frozen_or_stopped(txq)) - ret = dev_hard_start_xmit(skb, dev, txq); + ret = dev_hard_start_xmit(skb, dev, txq, NULL); HARD_TX_UNLOCK(dev, txq); @@ -829,7 +829,7 @@ void dev_deactivate_many(struct list_head *head) struct net_device *dev; bool sync_needed = false; - list_for_each_entry(dev, head, unreg_list) { + list_for_each_entry(dev, head, close_list) { netdev_for_each_tx_queue(dev, dev_deactivate_queue, &noop_qdisc); if (dev_ingress_queue(dev)) @@ -848,7 +848,7 @@ void dev_deactivate_many(struct list_head *head) synchronize_net(); /* Wait for outstanding qdisc_run calls. */ - list_for_each_entry(dev, head, unreg_list) + list_for_each_entry(dev, head, close_list) while (some_qdisc_is_busy(dev)) yield(); } @@ -857,7 +857,7 @@ void dev_deactivate(struct net_device *dev) { LIST_HEAD(single); - list_add(&dev->unreg_list, &single); + list_add(&dev->close_list, &single); dev_deactivate_many(&single); list_del(&single); } @@ -910,11 +910,12 @@ void dev_shutdown(struct net_device *dev) } void psched_ratecfg_precompute(struct psched_ratecfg *r, - const struct tc_ratespec *conf) + const struct tc_ratespec *conf, + u64 rate64) { memset(r, 0, sizeof(*r)); r->overhead = conf->overhead; - r->rate_bytes_ps = conf->rate; + r->rate_bytes_ps = max_t(u64, conf->rate, rate64); r->linklayer = (conf->linklayer & TC_LINKLAYER_MASK); r->mult = 1; /* diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c index 863846cc5513..0e1e38b40025 100644 --- a/net/sched/sch_htb.c +++ b/net/sched/sch_htb.c @@ -997,6 +997,8 @@ static const struct nla_policy htb_policy[TCA_HTB_MAX + 1] = { [TCA_HTB_CTAB] = { .type = NLA_BINARY, .len = TC_RTAB_SIZE }, [TCA_HTB_RTAB] = { .type = NLA_BINARY, .len = TC_RTAB_SIZE }, [TCA_HTB_DIRECT_QLEN] = { .type = NLA_U32 }, + [TCA_HTB_RATE64] = { .type = NLA_U64 }, + [TCA_HTB_CEIL64] = { .type = NLA_U64 }, }; static void htb_work_func(struct work_struct *work) @@ -1114,6 +1116,12 @@ static int htb_dump_class(struct Qdisc *sch, unsigned long arg, opt.level = cl->level; if (nla_put(skb, TCA_HTB_PARMS, sizeof(opt), &opt)) goto nla_put_failure; + if ((cl->rate.rate_bytes_ps >= (1ULL << 32)) && + nla_put_u64(skb, TCA_HTB_RATE64, cl->rate.rate_bytes_ps)) + goto nla_put_failure; + if ((cl->ceil.rate_bytes_ps >= (1ULL << 32)) && + nla_put_u64(skb, TCA_HTB_CEIL64, cl->ceil.rate_bytes_ps)) + goto nla_put_failure; nla_nest_end(skb, nest); spin_unlock_bh(root_lock); @@ -1332,6 +1340,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid, struct qdisc_rate_table *rtab = NULL, *ctab = NULL; struct nlattr *tb[TCA_HTB_MAX + 1]; struct tc_htb_opt *hopt; + u64 rate64, ceil64; /* extract all subattrs from opt attr */ if (!opt) @@ -1491,8 +1500,12 @@ static int htb_change_class(struct Qdisc *sch, u32 classid, cl->prio = TC_HTB_NUMPRIO - 1; } - psched_ratecfg_precompute(&cl->rate, &hopt->rate); - psched_ratecfg_precompute(&cl->ceil, &hopt->ceil); + rate64 = tb[TCA_HTB_RATE64] ? nla_get_u64(tb[TCA_HTB_RATE64]) : 0; + + ceil64 = tb[TCA_HTB_CEIL64] ? nla_get_u64(tb[TCA_HTB_CEIL64]) : 0; + + psched_ratecfg_precompute(&cl->rate, &hopt->rate, rate64); + psched_ratecfg_precompute(&cl->ceil, &hopt->ceil, ceil64); cl->buffer = PSCHED_TICKS2NS(hopt->buffer); cl->cbuffer = PSCHED_TICKS2NS(hopt->cbuffer); diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c index b87e83d07478..75c94e59a3bd 100644 --- a/net/sched/sch_netem.c +++ b/net/sched/sch_netem.c @@ -235,7 +235,6 @@ static bool loss_4state(struct netem_sched_data *q) clg->state = 2; else if (clg->a3 < rnd && rnd < clg->a2 + clg->a3) { clg->state = 1; - return true; } else if (clg->a2 + clg->a3 < rnd) { clg->state = 3; return true; diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c index 1aaf1b6e51a2..68f98595819c 100644 --- a/net/sched/sch_tbf.c +++ b/net/sched/sch_tbf.c @@ -266,20 +266,23 @@ static const struct nla_policy tbf_policy[TCA_TBF_MAX + 1] = { [TCA_TBF_PARMS] = { .len = sizeof(struct tc_tbf_qopt) }, [TCA_TBF_RTAB] = { .type = NLA_BINARY, .len = TC_RTAB_SIZE }, [TCA_TBF_PTAB] = { .type = NLA_BINARY, .len = TC_RTAB_SIZE }, + [TCA_TBF_RATE64] = { .type = NLA_U64 }, + [TCA_TBF_PRATE64] = { .type = NLA_U64 }, }; static int tbf_change(struct Qdisc *sch, struct nlattr *opt) { int err; struct tbf_sched_data *q = qdisc_priv(sch); - struct nlattr *tb[TCA_TBF_PTAB + 1]; + struct nlattr *tb[TCA_TBF_MAX + 1]; struct tc_tbf_qopt *qopt; struct qdisc_rate_table *rtab = NULL; struct qdisc_rate_table *ptab = NULL; struct Qdisc *child = NULL; int max_size, n; + u64 rate64 = 0, prate64 = 0; - err = nla_parse_nested(tb, TCA_TBF_PTAB, opt, tbf_policy); + err = nla_parse_nested(tb, TCA_TBF_MAX, opt, tbf_policy); if (err < 0) return err; @@ -341,9 +344,13 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt) q->tokens = q->buffer; q->ptokens = q->mtu; - psched_ratecfg_precompute(&q->rate, &rtab->rate); + if (tb[TCA_TBF_RATE64]) + rate64 = nla_get_u64(tb[TCA_TBF_RATE64]); + psched_ratecfg_precompute(&q->rate, &rtab->rate, rate64); if (ptab) { - psched_ratecfg_precompute(&q->peak, &ptab->rate); + if (tb[TCA_TBF_PRATE64]) + prate64 = nla_get_u64(tb[TCA_TBF_PRATE64]); + psched_ratecfg_precompute(&q->peak, &ptab->rate, prate64); q->peak_present = true; } else { q->peak_present = false; @@ -402,6 +409,13 @@ static int tbf_dump(struct Qdisc *sch, struct sk_buff *skb) opt.buffer = PSCHED_NS2TICKS(q->buffer); if (nla_put(skb, TCA_TBF_PARMS, sizeof(opt), &opt)) goto nla_put_failure; + if (q->rate.rate_bytes_ps >= (1ULL << 32) && + nla_put_u64(skb, TCA_TBF_RATE64, q->rate.rate_bytes_ps)) + goto nla_put_failure; + if (q->peak_present && + q->peak.rate_bytes_ps >= (1ULL << 32) && + nla_put_u64(skb, TCA_TBF_PRATE64, q->peak.rate_bytes_ps)) + goto nla_put_failure; nla_nest_end(skb, nest); return skb->len; diff --git a/net/sctp/associola.c b/net/sctp/associola.c index cef509985192..c9b91cb1cb0d 100644 --- a/net/sctp/associola.c +++ b/net/sctp/associola.c @@ -602,7 +602,7 @@ void sctp_assoc_rm_peer(struct sctp_association *asoc, /* Start a T3 timer here in case it wasn't running so * that these migrated packets have a chance to get - * retrnasmitted. + * retransmitted. */ if (!timer_pending(&active->T3_rtx_timer)) if (!mod_timer(&active->T3_rtx_timer, @@ -665,7 +665,7 @@ struct sctp_transport *sctp_assoc_add_peer(struct sctp_association *asoc, /* Set the path max_retrans. */ peer->pathmaxrxt = asoc->pathmaxrxt; - /* And the partial failure retrnas threshold */ + /* And the partial failure retrans threshold */ peer->pf_retrans = asoc->pf_retrans; /* Initialize the peer's SACK delay timeout based on the diff --git a/net/sctp/auth.c b/net/sctp/auth.c index 8c4fa5dec824..46b5977978a1 100644 --- a/net/sctp/auth.c +++ b/net/sctp/auth.c @@ -539,18 +539,14 @@ struct sctp_hmac *sctp_auth_asoc_get_hmac(const struct sctp_association *asoc) for (i = 0; i < n_elt; i++) { id = ntohs(hmacs->hmac_ids[i]); - /* Check the id is in the supported range */ - if (id > SCTP_AUTH_HMAC_ID_MAX) { - id = 0; - continue; - } - - /* See is we support the id. Supported IDs have name and - * length fields set, so that we can allocated and use + /* Check the id is in the supported range. And + * see if we support the id. Supported IDs have name and + * length fields set, so that we can allocate and use * them. We can safely just check for name, for without the * name, we can't allocate the TFM. */ - if (!sctp_hmac_list[id].hmac_name) { + if (id > SCTP_AUTH_HMAC_ID_MAX || + !sctp_hmac_list[id].hmac_name) { id = 0; continue; } diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c index 7bd5ed4a8657..f2044fcb9dd1 100644 --- a/net/sctp/chunk.c +++ b/net/sctp/chunk.c @@ -201,7 +201,7 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc, max = asoc->frag_point; /* If the the peer requested that we authenticate DATA chunks - * we need to accound for bundling of the AUTH chunks along with + * we need to account for bundling of the AUTH chunks along with * DATA. */ if (sctp_auth_send_cid(SCTP_CID_DATA, asoc)) { diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c index 96a55910262c..7567e6f1a920 100644 --- a/net/sctp/ipv6.c +++ b/net/sctp/ipv6.c @@ -428,20 +428,20 @@ static void sctp_v6_from_sk(union sctp_addr *addr, struct sock *sk) { addr->v6.sin6_family = AF_INET6; addr->v6.sin6_port = 0; - addr->v6.sin6_addr = inet6_sk(sk)->rcv_saddr; + addr->v6.sin6_addr = sk->sk_v6_rcv_saddr; } /* Initialize sk->sk_rcv_saddr from sctp_addr. */ static void sctp_v6_to_sk_saddr(union sctp_addr *addr, struct sock *sk) { if (addr->sa.sa_family == AF_INET && sctp_sk(sk)->v4mapped) { - inet6_sk(sk)->rcv_saddr.s6_addr32[0] = 0; - inet6_sk(sk)->rcv_saddr.s6_addr32[1] = 0; - inet6_sk(sk)->rcv_saddr.s6_addr32[2] = htonl(0x0000ffff); - inet6_sk(sk)->rcv_saddr.s6_addr32[3] = + sk->sk_v6_rcv_saddr.s6_addr32[0] = 0; + sk->sk_v6_rcv_saddr.s6_addr32[1] = 0; + sk->sk_v6_rcv_saddr.s6_addr32[2] = htonl(0x0000ffff); + sk->sk_v6_rcv_saddr.s6_addr32[3] = addr->v4.sin_addr.s_addr; } else { - inet6_sk(sk)->rcv_saddr = addr->v6.sin6_addr; + sk->sk_v6_rcv_saddr = addr->v6.sin6_addr; } } @@ -449,12 +449,12 @@ static void sctp_v6_to_sk_saddr(union sctp_addr *addr, struct sock *sk) static void sctp_v6_to_sk_daddr(union sctp_addr *addr, struct sock *sk) { if (addr->sa.sa_family == AF_INET && sctp_sk(sk)->v4mapped) { - inet6_sk(sk)->daddr.s6_addr32[0] = 0; - inet6_sk(sk)->daddr.s6_addr32[1] = 0; - inet6_sk(sk)->daddr.s6_addr32[2] = htonl(0x0000ffff); - inet6_sk(sk)->daddr.s6_addr32[3] = addr->v4.sin_addr.s_addr; + sk->sk_v6_daddr.s6_addr32[0] = 0; + sk->sk_v6_daddr.s6_addr32[1] = 0; + sk->sk_v6_daddr.s6_addr32[2] = htonl(0x0000ffff); + sk->sk_v6_daddr.s6_addr32[3] = addr->v4.sin_addr.s_addr; } else { - inet6_sk(sk)->daddr = addr->v6.sin6_addr; + sk->sk_v6_daddr = addr->v6.sin6_addr; } } diff --git a/net/sctp/output.c b/net/sctp/output.c index 319137340d15..e650978daf27 100644 --- a/net/sctp/output.c +++ b/net/sctp/output.c @@ -390,7 +390,6 @@ int sctp_packet_transmit(struct sctp_packet *packet) __u8 has_data = 0; struct dst_entry *dst = tp->dst; unsigned char *auth = NULL; /* pointer to auth in skb data */ - __u32 cksum_buf_len = sizeof(struct sctphdr); pr_debug("%s: packet:%p\n", __func__, packet); @@ -493,7 +492,6 @@ int sctp_packet_transmit(struct sctp_packet *packet) if (chunk == packet->auth) auth = skb_tail_pointer(nskb); - cksum_buf_len += chunk->skb->len; memcpy(skb_put(nskb, chunk->skb->len), chunk->skb->data, chunk->skb->len); @@ -538,12 +536,7 @@ int sctp_packet_transmit(struct sctp_packet *packet) if (!sctp_checksum_disable) { if (!(dst->dev->features & NETIF_F_SCTP_CSUM) || (dst_xfrm(dst) != NULL) || packet->ipfragok) { - __u32 crc32 = sctp_start_cksum((__u8 *)sh, cksum_buf_len); - - /* 3) Put the resultant value into the checksum field in the - * common header, and leave the rest of the bits unchanged. - */ - sh->checksum = sctp_end_cksum(crc32); + sh->checksum = sctp_compute_cksum(nskb, 0); } else { /* no need to seed pseudo checksum for SCTP */ nskb->ip_summed = CHECKSUM_PARTIAL; diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c index d244a23ab8d3..fe690320b1e4 100644 --- a/net/sctp/sm_make_chunk.c +++ b/net/sctp/sm_make_chunk.c @@ -1297,6 +1297,13 @@ struct sctp_chunk *sctp_make_auth(const struct sctp_association *asoc) /* Turn an skb into a chunk. * FIXME: Eventually move the structure directly inside the skb->cb[]. + * + * sctpimpguide-05.txt Section 2.8.2 + * M1) Each time a new DATA chunk is transmitted + * set the 'TSN.Missing.Report' count for that TSN to 0. The + * 'TSN.Missing.Report' count will be used to determine missing chunks + * and when to fast retransmit. + * */ struct sctp_chunk *sctp_chunkify(struct sk_buff *skb, const struct sctp_association *asoc, @@ -1314,29 +1321,9 @@ struct sctp_chunk *sctp_chunkify(struct sk_buff *skb, INIT_LIST_HEAD(&retval->list); retval->skb = skb; retval->asoc = (struct sctp_association *)asoc; - retval->has_tsn = 0; - retval->has_ssn = 0; - retval->rtt_in_progress = 0; - retval->sent_at = 0; retval->singleton = 1; - retval->end_of_packet = 0; - retval->ecn_ce_done = 0; - retval->pdiscard = 0; - - /* sctpimpguide-05.txt Section 2.8.2 - * M1) Each time a new DATA chunk is transmitted - * set the 'TSN.Missing.Report' count for that TSN to 0. The - * 'TSN.Missing.Report' count will be used to determine missing chunks - * and when to fast retransmit. - */ - retval->tsn_missing_report = 0; - retval->tsn_gap_acked = 0; - retval->fast_retransmit = SCTP_CAN_FRTX; - /* If this is a fragmented message, track all fragments - * of the message (for SEND_FAILED). - */ - retval->msg = NULL; + retval->fast_retransmit = SCTP_CAN_FRTX; /* Polish the bead hole. */ INIT_LIST_HEAD(&retval->transmitted_list); diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 911b71b26b0e..72046b9729a8 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -5890,7 +5890,7 @@ static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr) int low, high, remaining, index; unsigned int rover; - inet_get_local_port_range(&low, &high); + inet_get_local_port_range(sock_net(sk), &low, &high); remaining = (high - low) + 1; rover = net_random() % remaining + low; diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index 9c9caaa5e0d3..b6e59f0a9475 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -291,12 +291,14 @@ static int svc_one_sock_name(struct svc_sock *svsk, char *buf, int remaining) &inet_sk(sk)->inet_rcv_saddr, inet_sk(sk)->inet_num); break; +#if IS_ENABLED(CONFIG_IPV6) case PF_INET6: len = snprintf(buf, remaining, "ipv6 %s %pI6 %d\n", proto_name, - &inet6_sk(sk)->rcv_saddr, + &sk->sk_v6_rcv_saddr, inet_sk(sk)->inet_num); break; +#endif default: len = snprintf(buf, remaining, "*unknown-%d*\n", sk->sk_family); diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c index 716de1ac6cb5..0d4402587fdf 100644 --- a/net/tipc/bcast.c +++ b/net/tipc/bcast.c @@ -480,18 +480,24 @@ receive: tipc_node_unlock(node); tipc_link_recv_bundle(buf); } else if (msg_user(msg) == MSG_FRAGMENTER) { - int ret = tipc_link_recv_fragment(&node->bclink.defragm, - &buf, &msg); - if (ret < 0) + int ret; + ret = tipc_link_recv_fragment(&node->bclink.reasm_head, + &node->bclink.reasm_tail, + &buf); + if (ret == LINK_REASM_ERROR) goto unlock; spin_lock_bh(&bc_lock); bclink_accept_pkt(node, seqno); bcl->stats.recv_fragments++; - if (ret > 0) + if (ret == LINK_REASM_COMPLETE) { bcl->stats.recv_fragmented++; + /* Point msg to inner header */ + msg = buf_msg(buf); + spin_unlock_bh(&bc_lock); + goto receive; + } spin_unlock_bh(&bc_lock); tipc_node_unlock(node); - tipc_net_route_msg(buf); } else if (msg_user(msg) == NAME_DISTRIBUTOR) { spin_lock_bh(&bc_lock); bclink_accept_pkt(node, seqno); diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c index 609c30c80816..3f9707a16d06 100644 --- a/net/tipc/bearer.c +++ b/net/tipc/bearer.c @@ -387,7 +387,7 @@ restart: b_ptr = &tipc_bearers[bearer_id]; strcpy(b_ptr->name, name); - res = m_ptr->enable_bearer(b_ptr); + res = m_ptr->enable_media(b_ptr); if (res) { pr_warn("Bearer <%s> rejected, enable failure (%d)\n", name, -res); @@ -420,23 +420,15 @@ exit: } /** - * tipc_block_bearer - Block the bearer with the given name, and reset all its links + * tipc_block_bearer - Block the bearer, and reset all its links */ -int tipc_block_bearer(const char *name) +int tipc_block_bearer(struct tipc_bearer *b_ptr) { - struct tipc_bearer *b_ptr = NULL; struct tipc_link *l_ptr; struct tipc_link *temp_l_ptr; read_lock_bh(&tipc_net_lock); - b_ptr = tipc_bearer_find(name); - if (!b_ptr) { - pr_warn("Attempt to block unknown bearer <%s>\n", name); - read_unlock_bh(&tipc_net_lock); - return -EINVAL; - } - - pr_info("Blocking bearer <%s>\n", name); + pr_info("Blocking bearer <%s>\n", b_ptr->name); spin_lock_bh(&b_ptr->lock); b_ptr->blocked = 1; list_for_each_entry_safe(l_ptr, temp_l_ptr, &b_ptr->links, link_list) { @@ -465,7 +457,7 @@ static void bearer_disable(struct tipc_bearer *b_ptr) pr_info("Disabling bearer <%s>\n", b_ptr->name); spin_lock_bh(&b_ptr->lock); b_ptr->blocked = 1; - b_ptr->media->disable_bearer(b_ptr); + b_ptr->media->disable_media(b_ptr); list_for_each_entry_safe(l_ptr, temp_l_ptr, &b_ptr->links, link_list) { tipc_link_delete(l_ptr); } diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h index 09c869adcfcf..e5e04be6fffa 100644 --- a/net/tipc/bearer.h +++ b/net/tipc/bearer.h @@ -75,8 +75,8 @@ struct tipc_bearer; /** * struct tipc_media - TIPC media information available to internal users * @send_msg: routine which handles buffer transmission - * @enable_bearer: routine which enables a bearer - * @disable_bearer: routine which disables a bearer + * @enable_media: routine which enables a media + * @disable_media: routine which disables a media * @addr2str: routine which converts media address to string * @addr2msg: routine which converts media address to protocol message area * @msg2addr: routine which converts media address from protocol message area @@ -91,8 +91,8 @@ struct tipc_media { int (*send_msg)(struct sk_buff *buf, struct tipc_bearer *b_ptr, struct tipc_media_addr *dest); - int (*enable_bearer)(struct tipc_bearer *b_ptr); - void (*disable_bearer)(struct tipc_bearer *b_ptr); + int (*enable_media)(struct tipc_bearer *b_ptr); + void (*disable_media)(struct tipc_bearer *b_ptr); int (*addr2str)(struct tipc_media_addr *a, char *str_buf, int str_size); int (*addr2msg)(struct tipc_media_addr *a, char *msg_area); int (*msg2addr)(const struct tipc_bearer *b_ptr, @@ -163,7 +163,7 @@ int tipc_register_media(struct tipc_media *m_ptr); void tipc_recv_msg(struct sk_buff *buf, struct tipc_bearer *tb_ptr); -int tipc_block_bearer(const char *name); +int tipc_block_bearer(struct tipc_bearer *b_ptr); void tipc_continue(struct tipc_bearer *tb_ptr); int tipc_enable_bearer(const char *bearer_name, u32 disc_domain, u32 priority); diff --git a/net/tipc/core.h b/net/tipc/core.h index be72f8cebc53..94895d4e86ab 100644 --- a/net/tipc/core.h +++ b/net/tipc/core.h @@ -90,21 +90,21 @@ extern int tipc_random __read_mostly; /* * Routines available to privileged subsystems */ -extern int tipc_core_start_net(unsigned long); -extern int tipc_handler_start(void); -extern void tipc_handler_stop(void); -extern int tipc_netlink_start(void); -extern void tipc_netlink_stop(void); -extern int tipc_socket_init(void); -extern void tipc_socket_stop(void); -extern int tipc_sock_create_local(int type, struct socket **res); -extern void tipc_sock_release_local(struct socket *sock); -extern int tipc_sock_accept_local(struct socket *sock, - struct socket **newsock, int flags); +int tipc_core_start_net(unsigned long); +int tipc_handler_start(void); +void tipc_handler_stop(void); +int tipc_netlink_start(void); +void tipc_netlink_stop(void); +int tipc_socket_init(void); +void tipc_socket_stop(void); +int tipc_sock_create_local(int type, struct socket **res); +void tipc_sock_release_local(struct socket *sock); +int tipc_sock_accept_local(struct socket *sock, struct socket **newsock, + int flags); #ifdef CONFIG_SYSCTL -extern int tipc_register_sysctl(void); -extern void tipc_unregister_sysctl(void); +int tipc_register_sysctl(void); +void tipc_unregister_sysctl(void); #else #define tipc_register_sysctl() 0 #define tipc_unregister_sysctl() @@ -201,6 +201,6 @@ static inline struct tipc_msg *buf_msg(struct sk_buff *skb) return (struct tipc_msg *)skb->data; } -extern struct sk_buff *tipc_buf_acquire(u32 size); +struct sk_buff *tipc_buf_acquire(u32 size); #endif diff --git a/net/tipc/eth_media.c b/net/tipc/eth_media.c index 40ea40cf6204..f80d59f5a161 100644 --- a/net/tipc/eth_media.c +++ b/net/tipc/eth_media.c @@ -2,7 +2,7 @@ * net/tipc/eth_media.c: Ethernet bearer support for TIPC * * Copyright (c) 2001-2007, Ericsson AB - * Copyright (c) 2005-2008, 2011, Wind River Systems + * Copyright (c) 2005-2008, 2011-2013, Wind River Systems * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -37,19 +37,19 @@ #include "core.h" #include "bearer.h" -#define MAX_ETH_BEARERS MAX_BEARERS +#define MAX_ETH_MEDIA MAX_BEARERS #define ETH_ADDR_OFFSET 4 /* message header offset of MAC address */ /** - * struct eth_bearer - Ethernet bearer data structure + * struct eth_media - Ethernet bearer data structure * @bearer: ptr to associated "generic" bearer structure * @dev: ptr to associated Ethernet network device * @tipc_packet_type: used in binding TIPC to Ethernet driver * @setup: work item used when enabling bearer * @cleanup: work item used when disabling bearer */ -struct eth_bearer { +struct eth_media { struct tipc_bearer *bearer; struct net_device *dev; struct packet_type tipc_packet_type; @@ -58,7 +58,7 @@ struct eth_bearer { }; static struct tipc_media eth_media_info; -static struct eth_bearer eth_bearers[MAX_ETH_BEARERS]; +static struct eth_media eth_media_array[MAX_ETH_MEDIA]; static int eth_started; static int recv_notification(struct notifier_block *nb, unsigned long evt, @@ -100,7 +100,7 @@ static int send_msg(struct sk_buff *buf, struct tipc_bearer *tb_ptr, if (!clone) return 0; - dev = ((struct eth_bearer *)(tb_ptr->usr_handle))->dev; + dev = ((struct eth_media *)(tb_ptr->usr_handle))->dev; delta = dev->hard_header_len - skb_headroom(buf); if ((delta > 0) && @@ -128,43 +128,43 @@ static int send_msg(struct sk_buff *buf, struct tipc_bearer *tb_ptr, static int recv_msg(struct sk_buff *buf, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev) { - struct eth_bearer *eb_ptr = (struct eth_bearer *)pt->af_packet_priv; + struct eth_media *eb_ptr = (struct eth_media *)pt->af_packet_priv; if (!net_eq(dev_net(dev), &init_net)) { kfree_skb(buf); - return 0; + return NET_RX_DROP; } if (likely(eb_ptr->bearer)) { if (likely(buf->pkt_type <= PACKET_BROADCAST)) { buf->next = NULL; tipc_recv_msg(buf, eb_ptr->bearer); - return 0; + return NET_RX_SUCCESS; } } kfree_skb(buf); - return 0; + return NET_RX_DROP; } /** - * setup_bearer - setup association between Ethernet bearer and interface + * setup_media - setup association between Ethernet bearer and interface */ -static void setup_bearer(struct work_struct *work) +static void setup_media(struct work_struct *work) { - struct eth_bearer *eb_ptr = - container_of(work, struct eth_bearer, setup); + struct eth_media *eb_ptr = + container_of(work, struct eth_media, setup); dev_add_pack(&eb_ptr->tipc_packet_type); } /** - * enable_bearer - attach TIPC bearer to an Ethernet interface + * enable_media - attach TIPC bearer to an Ethernet interface */ -static int enable_bearer(struct tipc_bearer *tb_ptr) +static int enable_media(struct tipc_bearer *tb_ptr) { struct net_device *dev; - struct eth_bearer *eb_ptr = ð_bearers[0]; - struct eth_bearer *stop = ð_bearers[MAX_ETH_BEARERS]; + struct eth_media *eb_ptr = ð_media_array[0]; + struct eth_media *stop = ð_media_array[MAX_ETH_MEDIA]; char *driver_name = strchr((const char *)tb_ptr->name, ':') + 1; int pending_dev = 0; @@ -188,7 +188,7 @@ static int enable_bearer(struct tipc_bearer *tb_ptr) eb_ptr->tipc_packet_type.func = recv_msg; eb_ptr->tipc_packet_type.af_packet_priv = eb_ptr; INIT_LIST_HEAD(&(eb_ptr->tipc_packet_type.list)); - INIT_WORK(&eb_ptr->setup, setup_bearer); + INIT_WORK(&eb_ptr->setup, setup_media); schedule_work(&eb_ptr->setup); /* Associate TIPC bearer with Ethernet bearer */ @@ -205,14 +205,14 @@ static int enable_bearer(struct tipc_bearer *tb_ptr) } /** - * cleanup_bearer - break association between Ethernet bearer and interface + * cleanup_media - break association between Ethernet bearer and interface * * This routine must be invoked from a work queue because it can sleep. */ -static void cleanup_bearer(struct work_struct *work) +static void cleanup_media(struct work_struct *work) { - struct eth_bearer *eb_ptr = - container_of(work, struct eth_bearer, cleanup); + struct eth_media *eb_ptr = + container_of(work, struct eth_media, cleanup); dev_remove_pack(&eb_ptr->tipc_packet_type); dev_put(eb_ptr->dev); @@ -220,18 +220,18 @@ static void cleanup_bearer(struct work_struct *work) } /** - * disable_bearer - detach TIPC bearer from an Ethernet interface + * disable_media - detach TIPC bearer from an Ethernet interface * * Mark Ethernet bearer as inactive so that incoming buffers are thrown away, * then get worker thread to complete bearer cleanup. (Can't do cleanup * here because cleanup code needs to sleep and caller holds spinlocks.) */ -static void disable_bearer(struct tipc_bearer *tb_ptr) +static void disable_media(struct tipc_bearer *tb_ptr) { - struct eth_bearer *eb_ptr = (struct eth_bearer *)tb_ptr->usr_handle; + struct eth_media *eb_ptr = (struct eth_media *)tb_ptr->usr_handle; eb_ptr->bearer = NULL; - INIT_WORK(&eb_ptr->cleanup, cleanup_bearer); + INIT_WORK(&eb_ptr->cleanup, cleanup_media); schedule_work(&eb_ptr->cleanup); } @@ -245,8 +245,8 @@ static int recv_notification(struct notifier_block *nb, unsigned long evt, void *ptr) { struct net_device *dev = netdev_notifier_info_to_dev(ptr); - struct eth_bearer *eb_ptr = ð_bearers[0]; - struct eth_bearer *stop = ð_bearers[MAX_ETH_BEARERS]; + struct eth_media *eb_ptr = ð_media_array[0]; + struct eth_media *stop = ð_media_array[MAX_ETH_MEDIA]; if (!net_eq(dev_net(dev), &init_net)) return NOTIFY_DONE; @@ -265,17 +265,17 @@ static int recv_notification(struct notifier_block *nb, unsigned long evt, if (netif_carrier_ok(dev)) tipc_continue(eb_ptr->bearer); else - tipc_block_bearer(eb_ptr->bearer->name); + tipc_block_bearer(eb_ptr->bearer); break; case NETDEV_UP: tipc_continue(eb_ptr->bearer); break; case NETDEV_DOWN: - tipc_block_bearer(eb_ptr->bearer->name); + tipc_block_bearer(eb_ptr->bearer); break; case NETDEV_CHANGEMTU: case NETDEV_CHANGEADDR: - tipc_block_bearer(eb_ptr->bearer->name); + tipc_block_bearer(eb_ptr->bearer); tipc_continue(eb_ptr->bearer); break; case NETDEV_UNREGISTER: @@ -327,8 +327,8 @@ static int eth_msg2addr(const struct tipc_bearer *tb_ptr, */ static struct tipc_media eth_media_info = { .send_msg = send_msg, - .enable_bearer = enable_bearer, - .disable_bearer = disable_bearer, + .enable_media = enable_media, + .disable_media = disable_media, .addr2str = eth_addr2str, .addr2msg = eth_addr2msg, .msg2addr = eth_msg2addr, diff --git a/net/tipc/ib_media.c b/net/tipc/ib_media.c index 9934a32bfa87..c13989297464 100644 --- a/net/tipc/ib_media.c +++ b/net/tipc/ib_media.c @@ -42,17 +42,17 @@ #include "core.h" #include "bearer.h" -#define MAX_IB_BEARERS MAX_BEARERS +#define MAX_IB_MEDIA MAX_BEARERS /** - * struct ib_bearer - Infiniband bearer data structure + * struct ib_media - Infiniband media data structure * @bearer: ptr to associated "generic" bearer structure * @dev: ptr to associated Infiniband network device * @tipc_packet_type: used in binding TIPC to Infiniband driver * @cleanup: work item used when disabling bearer */ -struct ib_bearer { +struct ib_media { struct tipc_bearer *bearer; struct net_device *dev; struct packet_type tipc_packet_type; @@ -61,7 +61,7 @@ struct ib_bearer { }; static struct tipc_media ib_media_info; -static struct ib_bearer ib_bearers[MAX_IB_BEARERS]; +static struct ib_media ib_media_array[MAX_IB_MEDIA]; static int ib_started; /** @@ -93,7 +93,7 @@ static int send_msg(struct sk_buff *buf, struct tipc_bearer *tb_ptr, if (!clone) return 0; - dev = ((struct ib_bearer *)(tb_ptr->usr_handle))->dev; + dev = ((struct ib_media *)(tb_ptr->usr_handle))->dev; delta = dev->hard_header_len - skb_headroom(buf); if ((delta > 0) && @@ -121,43 +121,43 @@ static int send_msg(struct sk_buff *buf, struct tipc_bearer *tb_ptr, static int recv_msg(struct sk_buff *buf, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev) { - struct ib_bearer *ib_ptr = (struct ib_bearer *)pt->af_packet_priv; + struct ib_media *ib_ptr = (struct ib_media *)pt->af_packet_priv; if (!net_eq(dev_net(dev), &init_net)) { kfree_skb(buf); - return 0; + return NET_RX_DROP; } if (likely(ib_ptr->bearer)) { if (likely(buf->pkt_type <= PACKET_BROADCAST)) { buf->next = NULL; tipc_recv_msg(buf, ib_ptr->bearer); - return 0; + return NET_RX_SUCCESS; } } kfree_skb(buf); - return 0; + return NET_RX_DROP; } /** * setup_bearer - setup association between InfiniBand bearer and interface */ -static void setup_bearer(struct work_struct *work) +static void setup_media(struct work_struct *work) { - struct ib_bearer *ib_ptr = - container_of(work, struct ib_bearer, setup); + struct ib_media *ib_ptr = + container_of(work, struct ib_media, setup); dev_add_pack(&ib_ptr->tipc_packet_type); } /** - * enable_bearer - attach TIPC bearer to an InfiniBand interface + * enable_media - attach TIPC bearer to an InfiniBand interface */ -static int enable_bearer(struct tipc_bearer *tb_ptr) +static int enable_media(struct tipc_bearer *tb_ptr) { struct net_device *dev; - struct ib_bearer *ib_ptr = &ib_bearers[0]; - struct ib_bearer *stop = &ib_bearers[MAX_IB_BEARERS]; + struct ib_media *ib_ptr = &ib_media_array[0]; + struct ib_media *stop = &ib_media_array[MAX_IB_MEDIA]; char *driver_name = strchr((const char *)tb_ptr->name, ':') + 1; int pending_dev = 0; @@ -181,7 +181,7 @@ static int enable_bearer(struct tipc_bearer *tb_ptr) ib_ptr->tipc_packet_type.func = recv_msg; ib_ptr->tipc_packet_type.af_packet_priv = ib_ptr; INIT_LIST_HEAD(&(ib_ptr->tipc_packet_type.list)); - INIT_WORK(&ib_ptr->setup, setup_bearer); + INIT_WORK(&ib_ptr->setup, setup_media); schedule_work(&ib_ptr->setup); /* Associate TIPC bearer with InfiniBand bearer */ @@ -204,8 +204,8 @@ static int enable_bearer(struct tipc_bearer *tb_ptr) */ static void cleanup_bearer(struct work_struct *work) { - struct ib_bearer *ib_ptr = - container_of(work, struct ib_bearer, cleanup); + struct ib_media *ib_ptr = + container_of(work, struct ib_media, cleanup); dev_remove_pack(&ib_ptr->tipc_packet_type); dev_put(ib_ptr->dev); @@ -213,15 +213,15 @@ static void cleanup_bearer(struct work_struct *work) } /** - * disable_bearer - detach TIPC bearer from an InfiniBand interface + * disable_media - detach TIPC bearer from an InfiniBand interface * * Mark InfiniBand bearer as inactive so that incoming buffers are thrown away, * then get worker thread to complete bearer cleanup. (Can't do cleanup * here because cleanup code needs to sleep and caller holds spinlocks.) */ -static void disable_bearer(struct tipc_bearer *tb_ptr) +static void disable_media(struct tipc_bearer *tb_ptr) { - struct ib_bearer *ib_ptr = (struct ib_bearer *)tb_ptr->usr_handle; + struct ib_media *ib_ptr = (struct ib_media *)tb_ptr->usr_handle; ib_ptr->bearer = NULL; INIT_WORK(&ib_ptr->cleanup, cleanup_bearer); @@ -238,8 +238,8 @@ static int recv_notification(struct notifier_block *nb, unsigned long evt, void *ptr) { struct net_device *dev = netdev_notifier_info_to_dev(ptr); - struct ib_bearer *ib_ptr = &ib_bearers[0]; - struct ib_bearer *stop = &ib_bearers[MAX_IB_BEARERS]; + struct ib_media *ib_ptr = &ib_media_array[0]; + struct ib_media *stop = &ib_media_array[MAX_IB_MEDIA]; if (!net_eq(dev_net(dev), &init_net)) return NOTIFY_DONE; @@ -258,17 +258,17 @@ static int recv_notification(struct notifier_block *nb, unsigned long evt, if (netif_carrier_ok(dev)) tipc_continue(ib_ptr->bearer); else - tipc_block_bearer(ib_ptr->bearer->name); + tipc_block_bearer(ib_ptr->bearer); break; case NETDEV_UP: tipc_continue(ib_ptr->bearer); break; case NETDEV_DOWN: - tipc_block_bearer(ib_ptr->bearer->name); + tipc_block_bearer(ib_ptr->bearer); break; case NETDEV_CHANGEMTU: case NETDEV_CHANGEADDR: - tipc_block_bearer(ib_ptr->bearer->name); + tipc_block_bearer(ib_ptr->bearer); tipc_continue(ib_ptr->bearer); break; case NETDEV_UNREGISTER: @@ -323,8 +323,8 @@ static int ib_msg2addr(const struct tipc_bearer *tb_ptr, */ static struct tipc_media ib_media_info = { .send_msg = send_msg, - .enable_bearer = enable_bearer, - .disable_bearer = disable_bearer, + .enable_media = enable_media, + .disable_media = disable_media, .addr2str = ib_addr2str, .addr2msg = ib_addr2msg, .msg2addr = ib_msg2addr, diff --git a/net/tipc/link.c b/net/tipc/link.c index 0cc3d9015c5d..cf465d66ccde 100644 --- a/net/tipc/link.c +++ b/net/tipc/link.c @@ -75,20 +75,6 @@ static const char *link_unk_evt = "Unknown link event "; */ #define START_CHANGEOVER 100000u -/** - * struct tipc_link_name - deconstructed link name - * @addr_local: network address of node at this end - * @if_local: name of interface at this end - * @addr_peer: network address of node at far end - * @if_peer: name of interface at far end - */ -struct tipc_link_name { - u32 addr_local; - char if_local[TIPC_MAX_IF_NAME]; - u32 addr_peer; - char if_peer[TIPC_MAX_IF_NAME]; -}; - static void link_handle_out_of_seq_msg(struct tipc_link *l_ptr, struct sk_buff *buf); static void link_recv_proto_msg(struct tipc_link *l_ptr, struct sk_buff *buf); @@ -97,8 +83,7 @@ static int link_recv_changeover_msg(struct tipc_link **l_ptr, static void link_set_supervision_props(struct tipc_link *l_ptr, u32 tolerance); static int link_send_sections_long(struct tipc_port *sender, struct iovec const *msg_sect, - u32 num_sect, unsigned int total_len, - u32 destnode); + unsigned int len, u32 destnode); static void link_state_event(struct tipc_link *l_ptr, u32 event); static void link_reset_statistics(struct tipc_link *l_ptr); static void link_print(struct tipc_link *l_ptr, const char *str); @@ -161,72 +146,6 @@ int tipc_link_is_active(struct tipc_link *l_ptr) } /** - * link_name_validate - validate & (optionally) deconstruct tipc_link name - * @name: ptr to link name string - * @name_parts: ptr to area for link name components (or NULL if not needed) - * - * Returns 1 if link name is valid, otherwise 0. - */ -static int link_name_validate(const char *name, - struct tipc_link_name *name_parts) -{ - char name_copy[TIPC_MAX_LINK_NAME]; - char *addr_local; - char *if_local; - char *addr_peer; - char *if_peer; - char dummy; - u32 z_local, c_local, n_local; - u32 z_peer, c_peer, n_peer; - u32 if_local_len; - u32 if_peer_len; - - /* copy link name & ensure length is OK */ - name_copy[TIPC_MAX_LINK_NAME - 1] = 0; - /* need above in case non-Posix strncpy() doesn't pad with nulls */ - strncpy(name_copy, name, TIPC_MAX_LINK_NAME); - if (name_copy[TIPC_MAX_LINK_NAME - 1] != 0) - return 0; - - /* ensure all component parts of link name are present */ - addr_local = name_copy; - if_local = strchr(addr_local, ':'); - if (if_local == NULL) - return 0; - *(if_local++) = 0; - addr_peer = strchr(if_local, '-'); - if (addr_peer == NULL) - return 0; - *(addr_peer++) = 0; - if_local_len = addr_peer - if_local; - if_peer = strchr(addr_peer, ':'); - if (if_peer == NULL) - return 0; - *(if_peer++) = 0; - if_peer_len = strlen(if_peer) + 1; - - /* validate component parts of link name */ - if ((sscanf(addr_local, "%u.%u.%u%c", - &z_local, &c_local, &n_local, &dummy) != 3) || - (sscanf(addr_peer, "%u.%u.%u%c", - &z_peer, &c_peer, &n_peer, &dummy) != 3) || - (z_local > 255) || (c_local > 4095) || (n_local > 4095) || - (z_peer > 255) || (c_peer > 4095) || (n_peer > 4095) || - (if_local_len <= 1) || (if_local_len > TIPC_MAX_IF_NAME) || - (if_peer_len <= 1) || (if_peer_len > TIPC_MAX_IF_NAME)) - return 0; - - /* return link name components, if necessary */ - if (name_parts) { - name_parts->addr_local = tipc_addr(z_local, c_local, n_local); - strcpy(name_parts->if_local, if_local); - name_parts->addr_peer = tipc_addr(z_peer, c_peer, n_peer); - strcpy(name_parts->if_peer, if_peer); - } - return 1; -} - -/** * link_timeout - handle expiration of link timer * @l_ptr: pointer to link * @@ -485,15 +404,9 @@ static void link_release_outqueue(struct tipc_link *l_ptr) */ void tipc_link_reset_fragments(struct tipc_link *l_ptr) { - struct sk_buff *buf = l_ptr->defragm_buf; - struct sk_buff *next; - - while (buf) { - next = buf->next; - kfree_skb(buf); - buf = next; - } - l_ptr->defragm_buf = NULL; + kfree_skb(l_ptr->reasm_head); + l_ptr->reasm_head = NULL; + l_ptr->reasm_tail = NULL; } /** @@ -1065,8 +978,7 @@ static int link_send_buf_fast(struct tipc_link *l_ptr, struct sk_buff *buf, */ int tipc_link_send_sections_fast(struct tipc_port *sender, struct iovec const *msg_sect, - const u32 num_sect, unsigned int total_len, - u32 destaddr) + unsigned int len, u32 destaddr) { struct tipc_msg *hdr = &sender->phdr; struct tipc_link *l_ptr; @@ -1080,8 +992,7 @@ again: * Try building message using port's max_pkt hint. * (Must not hold any locks while building message.) */ - res = tipc_msg_build(hdr, msg_sect, num_sect, total_len, - sender->max_pkt, &buf); + res = tipc_msg_build(hdr, msg_sect, len, sender->max_pkt, &buf); /* Exit if build request was invalid */ if (unlikely(res < 0)) return res; @@ -1121,8 +1032,7 @@ exit: if ((msg_hdr_sz(hdr) + res) <= sender->max_pkt) goto again; - return link_send_sections_long(sender, msg_sect, - num_sect, total_len, + return link_send_sections_long(sender, msg_sect, len, destaddr); } tipc_node_unlock(node); @@ -1133,8 +1043,8 @@ exit: if (buf) return tipc_reject_msg(buf, TIPC_ERR_NO_NODE); if (res >= 0) - return tipc_port_reject_sections(sender, hdr, msg_sect, num_sect, - total_len, TIPC_ERR_NO_NODE); + return tipc_port_reject_sections(sender, hdr, msg_sect, + len, TIPC_ERR_NO_NODE); return res; } @@ -1154,18 +1064,17 @@ exit: */ static int link_send_sections_long(struct tipc_port *sender, struct iovec const *msg_sect, - u32 num_sect, unsigned int total_len, - u32 destaddr) + unsigned int len, u32 destaddr) { struct tipc_link *l_ptr; struct tipc_node *node; struct tipc_msg *hdr = &sender->phdr; - u32 dsz = total_len; + u32 dsz = len; u32 max_pkt, fragm_sz, rest; struct tipc_msg fragm_hdr; struct sk_buff *buf, *buf_chain, *prev; u32 fragm_crs, fragm_rest, hsz, sect_rest; - const unchar *sect_crs; + const unchar __user *sect_crs; int curr_sect; u32 fragm_no; int res = 0; @@ -1207,7 +1116,7 @@ again: if (!sect_rest) { sect_rest = msg_sect[++curr_sect].iov_len; - sect_crs = (const unchar *)msg_sect[curr_sect].iov_base; + sect_crs = msg_sect[curr_sect].iov_base; } if (sect_rest < fragm_rest) @@ -1283,8 +1192,8 @@ reject: buf = buf_chain->next; kfree_skb(buf_chain); } - return tipc_port_reject_sections(sender, hdr, msg_sect, num_sect, - total_len, TIPC_ERR_NO_NODE); + return tipc_port_reject_sections(sender, hdr, msg_sect, + len, TIPC_ERR_NO_NODE); } /* Append chain of fragments to send queue & send them */ @@ -1592,15 +1501,15 @@ void tipc_recv_msg(struct sk_buff *head, struct tipc_bearer *b_ptr) /* Ensure bearer is still enabled */ if (unlikely(!b_ptr->active)) - goto cont; + goto discard; /* Ensure message is well-formed */ if (unlikely(!link_recv_buf_validate(buf))) - goto cont; + goto discard; /* Ensure message data is a single contiguous unit */ if (unlikely(skb_linearize(buf))) - goto cont; + goto discard; /* Handle arrival of a non-unicast link message */ msg = buf_msg(buf); @@ -1616,20 +1525,18 @@ void tipc_recv_msg(struct sk_buff *head, struct tipc_bearer *b_ptr) /* Discard unicast link messages destined for another node */ if (unlikely(!msg_short(msg) && (msg_destnode(msg) != tipc_own_addr))) - goto cont; + goto discard; /* Locate neighboring node that sent message */ n_ptr = tipc_node_find(msg_prevnode(msg)); if (unlikely(!n_ptr)) - goto cont; + goto discard; tipc_node_lock(n_ptr); /* Locate unicast link endpoint that should handle message */ l_ptr = n_ptr->links[b_ptr->identity]; - if (unlikely(!l_ptr)) { - tipc_node_unlock(n_ptr); - goto cont; - } + if (unlikely(!l_ptr)) + goto unlock_discard; /* Verify that communication with node is currently allowed */ if ((n_ptr->block_setup & WAIT_PEER_DOWN) && @@ -1639,10 +1546,8 @@ void tipc_recv_msg(struct sk_buff *head, struct tipc_bearer *b_ptr) !msg_redundant_link(msg)) n_ptr->block_setup &= ~WAIT_PEER_DOWN; - if (n_ptr->block_setup) { - tipc_node_unlock(n_ptr); - goto cont; - } + if (n_ptr->block_setup) + goto unlock_discard; /* Validate message sequence number info */ seq_no = msg_seqno(msg); @@ -1678,98 +1583,100 @@ void tipc_recv_msg(struct sk_buff *head, struct tipc_bearer *b_ptr) /* Now (finally!) process the incoming message */ protocol_check: - if (likely(link_working_working(l_ptr))) { - if (likely(seq_no == mod(l_ptr->next_in_no))) { - l_ptr->next_in_no++; - if (unlikely(l_ptr->oldest_deferred_in)) - head = link_insert_deferred_queue(l_ptr, - head); -deliver: - if (likely(msg_isdata(msg))) { - tipc_node_unlock(n_ptr); - tipc_port_recv_msg(buf); - continue; - } - switch (msg_user(msg)) { - int ret; - case MSG_BUNDLER: - l_ptr->stats.recv_bundles++; - l_ptr->stats.recv_bundled += - msg_msgcnt(msg); - tipc_node_unlock(n_ptr); - tipc_link_recv_bundle(buf); - continue; - case NAME_DISTRIBUTOR: - n_ptr->bclink.recv_permitted = true; - tipc_node_unlock(n_ptr); - tipc_named_recv(buf); - continue; - case BCAST_PROTOCOL: - tipc_link_recv_sync(n_ptr, buf); - tipc_node_unlock(n_ptr); - continue; - case CONN_MANAGER: - tipc_node_unlock(n_ptr); - tipc_port_recv_proto_msg(buf); - continue; - case MSG_FRAGMENTER: - l_ptr->stats.recv_fragments++; - ret = tipc_link_recv_fragment( - &l_ptr->defragm_buf, - &buf, &msg); - if (ret == 1) { - l_ptr->stats.recv_fragmented++; - goto deliver; - } - if (ret == -1) - l_ptr->next_in_no--; - break; - case CHANGEOVER_PROTOCOL: - type = msg_type(msg); - if (link_recv_changeover_msg(&l_ptr, - &buf)) { - msg = buf_msg(buf); - seq_no = msg_seqno(msg); - if (type == ORIGINAL_MSG) - goto deliver; - goto protocol_check; - } - break; - default: - kfree_skb(buf); - buf = NULL; - break; - } + if (unlikely(!link_working_working(l_ptr))) { + if (msg_user(msg) == LINK_PROTOCOL) { + link_recv_proto_msg(l_ptr, buf); + head = link_insert_deferred_queue(l_ptr, head); + tipc_node_unlock(n_ptr); + continue; + } + + /* Traffic message. Conditionally activate link */ + link_state_event(l_ptr, TRAFFIC_MSG_EVT); + + if (link_working_working(l_ptr)) { + /* Re-insert buffer in front of queue */ + buf->next = head; + head = buf; tipc_node_unlock(n_ptr); - tipc_net_route_msg(buf); continue; } + goto unlock_discard; + } + + /* Link is now in state WORKING_WORKING */ + if (unlikely(seq_no != mod(l_ptr->next_in_no))) { link_handle_out_of_seq_msg(l_ptr, buf); head = link_insert_deferred_queue(l_ptr, head); tipc_node_unlock(n_ptr); continue; } - - /* Link is not in state WORKING_WORKING */ - if (msg_user(msg) == LINK_PROTOCOL) { - link_recv_proto_msg(l_ptr, buf); + l_ptr->next_in_no++; + if (unlikely(l_ptr->oldest_deferred_in)) head = link_insert_deferred_queue(l_ptr, head); +deliver: + if (likely(msg_isdata(msg))) { tipc_node_unlock(n_ptr); + tipc_port_recv_msg(buf); continue; } - - /* Traffic message. Conditionally activate link */ - link_state_event(l_ptr, TRAFFIC_MSG_EVT); - - if (link_working_working(l_ptr)) { - /* Re-insert buffer in front of queue */ - buf->next = head; - head = buf; + switch (msg_user(msg)) { + int ret; + case MSG_BUNDLER: + l_ptr->stats.recv_bundles++; + l_ptr->stats.recv_bundled += msg_msgcnt(msg); + tipc_node_unlock(n_ptr); + tipc_link_recv_bundle(buf); + continue; + case NAME_DISTRIBUTOR: + n_ptr->bclink.recv_permitted = true; + tipc_node_unlock(n_ptr); + tipc_named_recv(buf); + continue; + case BCAST_PROTOCOL: + tipc_link_recv_sync(n_ptr, buf); + tipc_node_unlock(n_ptr); + continue; + case CONN_MANAGER: + tipc_node_unlock(n_ptr); + tipc_port_recv_proto_msg(buf); + continue; + case MSG_FRAGMENTER: + l_ptr->stats.recv_fragments++; + ret = tipc_link_recv_fragment(&l_ptr->reasm_head, + &l_ptr->reasm_tail, + &buf); + if (ret == LINK_REASM_COMPLETE) { + l_ptr->stats.recv_fragmented++; + msg = buf_msg(buf); + goto deliver; + } + if (ret == LINK_REASM_ERROR) + tipc_link_reset(l_ptr); tipc_node_unlock(n_ptr); continue; + case CHANGEOVER_PROTOCOL: + type = msg_type(msg); + if (link_recv_changeover_msg(&l_ptr, &buf)) { + msg = buf_msg(buf); + seq_no = msg_seqno(msg); + if (type == ORIGINAL_MSG) + goto deliver; + goto protocol_check; + } + break; + default: + kfree_skb(buf); + buf = NULL; + break; } tipc_node_unlock(n_ptr); -cont: + tipc_net_route_msg(buf); + continue; +unlock_discard: + + tipc_node_unlock(n_ptr); +discard: kfree_skb(buf); } read_unlock_bh(&tipc_net_lock); @@ -2432,114 +2339,47 @@ static int link_send_long_buf(struct tipc_link *l_ptr, struct sk_buff *buf) } /* - * A pending message being re-assembled must store certain values - * to handle subsequent fragments correctly. The following functions - * help storing these values in unused, available fields in the - * pending message. This makes dynamic memory allocation unnecessary. - */ -static void set_long_msg_seqno(struct sk_buff *buf, u32 seqno) -{ - msg_set_seqno(buf_msg(buf), seqno); -} - -static u32 get_fragm_size(struct sk_buff *buf) -{ - return msg_ack(buf_msg(buf)); -} - -static void set_fragm_size(struct sk_buff *buf, u32 sz) -{ - msg_set_ack(buf_msg(buf), sz); -} - -static u32 get_expected_frags(struct sk_buff *buf) -{ - return msg_bcast_ack(buf_msg(buf)); -} - -static void set_expected_frags(struct sk_buff *buf, u32 exp) -{ - msg_set_bcast_ack(buf_msg(buf), exp); -} - -/* * tipc_link_recv_fragment(): Called with node lock on. Returns * the reassembled buffer if message is complete. */ -int tipc_link_recv_fragment(struct sk_buff **pending, struct sk_buff **fb, - struct tipc_msg **m) -{ - struct sk_buff *prev = NULL; - struct sk_buff *fbuf = *fb; - struct tipc_msg *fragm = buf_msg(fbuf); - struct sk_buff *pbuf = *pending; - u32 long_msg_seq_no = msg_long_msgno(fragm); - - *fb = NULL; - - /* Is there an incomplete message waiting for this fragment? */ - while (pbuf && ((buf_seqno(pbuf) != long_msg_seq_no) || - (msg_orignode(fragm) != msg_orignode(buf_msg(pbuf))))) { - prev = pbuf; - pbuf = pbuf->next; - } - - if (!pbuf && (msg_type(fragm) == FIRST_FRAGMENT)) { - struct tipc_msg *imsg = (struct tipc_msg *)msg_data(fragm); - u32 msg_sz = msg_size(imsg); - u32 fragm_sz = msg_data_sz(fragm); - u32 exp_fragm_cnt; - u32 max = TIPC_MAX_USER_MSG_SIZE + NAMED_H_SIZE; - - if (msg_type(imsg) == TIPC_MCAST_MSG) - max = TIPC_MAX_USER_MSG_SIZE + MCAST_H_SIZE; - if (fragm_sz == 0 || msg_size(imsg) > max) { - kfree_skb(fbuf); - return 0; - } - exp_fragm_cnt = msg_sz / fragm_sz + !!(msg_sz % fragm_sz); - pbuf = tipc_buf_acquire(msg_size(imsg)); - if (pbuf != NULL) { - pbuf->next = *pending; - *pending = pbuf; - skb_copy_to_linear_data(pbuf, imsg, - msg_data_sz(fragm)); - /* Prepare buffer for subsequent fragments. */ - set_long_msg_seqno(pbuf, long_msg_seq_no); - set_fragm_size(pbuf, fragm_sz); - set_expected_frags(pbuf, exp_fragm_cnt - 1); - } else { - pr_debug("Link unable to reassemble fragmented message\n"); - kfree_skb(fbuf); - return -1; - } - kfree_skb(fbuf); - return 0; - } else if (pbuf && (msg_type(fragm) != FIRST_FRAGMENT)) { - u32 dsz = msg_data_sz(fragm); - u32 fsz = get_fragm_size(pbuf); - u32 crs = ((msg_fragm_no(fragm) - 1) * fsz); - u32 exp_frags = get_expected_frags(pbuf) - 1; - skb_copy_to_linear_data_offset(pbuf, crs, - msg_data(fragm), dsz); - kfree_skb(fbuf); - - /* Is message complete? */ - if (exp_frags == 0) { - if (prev) - prev->next = pbuf->next; - else - *pending = pbuf->next; - msg_reset_reroute_cnt(buf_msg(pbuf)); - *fb = pbuf; - *m = buf_msg(pbuf); - return 1; - } - set_expected_frags(pbuf, exp_frags); +int tipc_link_recv_fragment(struct sk_buff **head, struct sk_buff **tail, + struct sk_buff **fbuf) +{ + struct sk_buff *frag = *fbuf; + struct tipc_msg *msg = buf_msg(frag); + u32 fragid = msg_type(msg); + bool headstolen; + int delta; + + skb_pull(frag, msg_hdr_sz(msg)); + if (fragid == FIRST_FRAGMENT) { + if (*head || skb_unclone(frag, GFP_ATOMIC)) + goto out_free; + *head = frag; + skb_frag_list_init(*head); return 0; + } else if (skb_try_coalesce(*head, frag, &headstolen, &delta)) { + kfree_skb_partial(frag, headstolen); + } else { + if (!*head) + goto out_free; + if (!skb_has_frag_list(*head)) + skb_shinfo(*head)->frag_list = frag; + else + (*tail)->next = frag; + *tail = frag; + (*head)->truesize += frag->truesize; + } + if (fragid == LAST_FRAGMENT) { + *fbuf = *head; + *tail = *head = NULL; + return LINK_REASM_COMPLETE; } - kfree_skb(fbuf); return 0; +out_free: + pr_warn_ratelimited("Link unable to reassemble fragmented message\n"); + kfree_skb(*fbuf); + return LINK_REASM_ERROR; } static void link_set_supervision_props(struct tipc_link *l_ptr, u32 tolerance) @@ -2585,25 +2425,21 @@ void tipc_link_set_queue_limits(struct tipc_link *l_ptr, u32 window) static struct tipc_link *link_find_link(const char *name, struct tipc_node **node) { - struct tipc_link_name link_name_parts; - struct tipc_bearer *b_ptr; struct tipc_link *l_ptr; + struct tipc_node *n_ptr; + int i; - if (!link_name_validate(name, &link_name_parts)) - return NULL; - - b_ptr = tipc_bearer_find_interface(link_name_parts.if_local); - if (!b_ptr) - return NULL; - - *node = tipc_node_find(link_name_parts.addr_peer); - if (!*node) - return NULL; - - l_ptr = (*node)->links[b_ptr->identity]; - if (!l_ptr || strcmp(l_ptr->name, name)) - return NULL; - + list_for_each_entry(n_ptr, &tipc_node_list, list) { + for (i = 0; i < MAX_BEARERS; i++) { + l_ptr = n_ptr->links[i]; + if (l_ptr && !strcmp(l_ptr->name, name)) + goto found; + } + } + l_ptr = NULL; + n_ptr = NULL; +found: + *node = n_ptr; return l_ptr; } @@ -2646,6 +2482,7 @@ static int link_cmd_set_value(const char *name, u32 new_value, u16 cmd) struct tipc_link *l_ptr; struct tipc_bearer *b_ptr; struct tipc_media *m_ptr; + int res = 0; l_ptr = link_find_link(name, &node); if (l_ptr) { @@ -2668,9 +2505,12 @@ static int link_cmd_set_value(const char *name, u32 new_value, u16 cmd) case TIPC_CMD_SET_LINK_WINDOW: tipc_link_set_queue_limits(l_ptr, new_value); break; + default: + res = -EINVAL; + break; } tipc_node_unlock(node); - return 0; + return res; } b_ptr = tipc_bearer_find(name); @@ -2678,15 +2518,18 @@ static int link_cmd_set_value(const char *name, u32 new_value, u16 cmd) switch (cmd) { case TIPC_CMD_SET_LINK_TOL: b_ptr->tolerance = new_value; - return 0; + break; case TIPC_CMD_SET_LINK_PRI: b_ptr->priority = new_value; - return 0; + break; case TIPC_CMD_SET_LINK_WINDOW: b_ptr->window = new_value; - return 0; + break; + default: + res = -EINVAL; + break; } - return -EINVAL; + return res; } m_ptr = tipc_media_find(name); @@ -2695,15 +2538,18 @@ static int link_cmd_set_value(const char *name, u32 new_value, u16 cmd) switch (cmd) { case TIPC_CMD_SET_LINK_TOL: m_ptr->tolerance = new_value; - return 0; + break; case TIPC_CMD_SET_LINK_PRI: m_ptr->priority = new_value; - return 0; + break; case TIPC_CMD_SET_LINK_WINDOW: m_ptr->window = new_value; - return 0; + break; + default: + res = -EINVAL; + break; } - return -EINVAL; + return res; } struct sk_buff *tipc_link_cmd_config(const void *req_tlv_area, int req_tlv_space, diff --git a/net/tipc/link.h b/net/tipc/link.h index c048ed1cbd76..8a6c1026644d 100644 --- a/net/tipc/link.h +++ b/net/tipc/link.h @@ -41,6 +41,12 @@ #include "node.h" /* + * Link reassembly status codes + */ +#define LINK_REASM_ERROR -1 +#define LINK_REASM_COMPLETE 1 + +/* * Out-of-range value for link sequence numbers */ #define INVALID_LINK_SEQ 0x10000 @@ -134,7 +140,8 @@ struct tipc_stats { * @next_out: ptr to first unsent outbound message in queue * @waiting_ports: linked list of ports waiting for link congestion to abate * @long_msg_seq_no: next identifier to use for outbound fragmented messages - * @defragm_buf: list of partially reassembled inbound message fragments + * @reasm_head: list head of partially reassembled inbound message fragments + * @reasm_tail: last fragment received * @stats: collects statistics regarding link activity */ struct tipc_link { @@ -196,9 +203,10 @@ struct tipc_link { struct sk_buff *next_out; struct list_head waiting_ports; - /* Fragmentation/defragmentation */ + /* Fragmentation/reassembly */ u32 long_msg_seq_no; - struct sk_buff *defragm_buf; + struct sk_buff *reasm_head; + struct sk_buff *reasm_tail; /* Statistics */ struct tipc_stats stats; @@ -227,13 +235,11 @@ int tipc_link_send_buf(struct tipc_link *l_ptr, struct sk_buff *buf); u32 tipc_link_get_max_pkt(u32 dest, u32 selector); int tipc_link_send_sections_fast(struct tipc_port *sender, struct iovec const *msg_sect, - const u32 num_sect, - unsigned int total_len, - u32 destnode); + unsigned int len, u32 destnode); void tipc_link_recv_bundle(struct sk_buff *buf); -int tipc_link_recv_fragment(struct sk_buff **pending, - struct sk_buff **fb, - struct tipc_msg **msg); +int tipc_link_recv_fragment(struct sk_buff **reasm_head, + struct sk_buff **reasm_tail, + struct sk_buff **fbuf); void tipc_link_send_proto_msg(struct tipc_link *l_ptr, u32 msg_typ, int prob, u32 gap, u32 tolerance, u32 priority, u32 acked_mtu); diff --git a/net/tipc/msg.c b/net/tipc/msg.c index ced60e2fc4f7..e525f8ce1dee 100644 --- a/net/tipc/msg.c +++ b/net/tipc/msg.c @@ -73,13 +73,13 @@ void tipc_msg_init(struct tipc_msg *m, u32 user, u32 type, u32 hsize, * Returns message data size or errno */ int tipc_msg_build(struct tipc_msg *hdr, struct iovec const *msg_sect, - u32 num_sect, unsigned int total_len, int max_size, - struct sk_buff **buf) + unsigned int len, int max_size, struct sk_buff **buf) { - int dsz, sz, hsz, pos, res, cnt; + int dsz, sz, hsz; + unsigned char *to; - dsz = total_len; - pos = hsz = msg_hdr_sz(hdr); + dsz = len; + hsz = msg_hdr_sz(hdr); sz = hsz + dsz; msg_set_size(hdr, sz); if (unlikely(sz > max_size)) { @@ -91,16 +91,11 @@ int tipc_msg_build(struct tipc_msg *hdr, struct iovec const *msg_sect, if (!(*buf)) return -ENOMEM; skb_copy_to_linear_data(*buf, hdr, hsz); - for (res = 1, cnt = 0; res && (cnt < num_sect); cnt++) { - skb_copy_to_linear_data_offset(*buf, pos, - msg_sect[cnt].iov_base, - msg_sect[cnt].iov_len); - pos += msg_sect[cnt].iov_len; + to = (*buf)->data + hsz; + if (len && memcpy_fromiovecend(to, msg_sect, 0, dsz)) { + kfree_skb(*buf); + *buf = NULL; + return -EFAULT; } - if (likely(res)) - return dsz; - - kfree_skb(*buf); - *buf = NULL; - return -EFAULT; + return dsz; } diff --git a/net/tipc/msg.h b/net/tipc/msg.h index 5e4ccf5c27df..76d1269b9443 100644 --- a/net/tipc/msg.h +++ b/net/tipc/msg.h @@ -554,12 +554,6 @@ static inline void msg_set_last_bcast(struct tipc_msg *m, u32 n) msg_set_bits(m, 4, 16, 0xffff, n); } - -static inline u32 msg_fragm_no(struct tipc_msg *m) -{ - return msg_bits(m, 4, 16, 0xffff); -} - static inline void msg_set_fragm_no(struct tipc_msg *m, u32 n) { msg_set_bits(m, 4, 16, 0xffff, n); @@ -576,12 +570,6 @@ static inline void msg_set_next_sent(struct tipc_msg *m, u32 n) msg_set_bits(m, 4, 0, 0xffff, n); } - -static inline u32 msg_long_msgno(struct tipc_msg *m) -{ - return msg_bits(m, 4, 0, 0xffff); -} - static inline void msg_set_long_msgno(struct tipc_msg *m, u32 n) { msg_set_bits(m, 4, 0, 0xffff, n); @@ -722,6 +710,5 @@ u32 tipc_msg_tot_importance(struct tipc_msg *m); void tipc_msg_init(struct tipc_msg *m, u32 user, u32 type, u32 hsize, u32 destnode); int tipc_msg_build(struct tipc_msg *hdr, struct iovec const *msg_sect, - u32 num_sect, unsigned int total_len, int max_size, - struct sk_buff **buf); + unsigned int len, int max_size, struct sk_buff **buf); #endif diff --git a/net/tipc/node.c b/net/tipc/node.c index 6e6c434872e8..25100c0a6fe8 100644 --- a/net/tipc/node.c +++ b/net/tipc/node.c @@ -298,9 +298,10 @@ static void node_lost_contact(struct tipc_node *n_ptr) } n_ptr->bclink.deferred_size = 0; - if (n_ptr->bclink.defragm) { - kfree_skb(n_ptr->bclink.defragm); - n_ptr->bclink.defragm = NULL; + if (n_ptr->bclink.reasm_head) { + kfree_skb(n_ptr->bclink.reasm_head); + n_ptr->bclink.reasm_head = NULL; + n_ptr->bclink.reasm_tail = NULL; } tipc_bclink_remove_node(n_ptr->addr); diff --git a/net/tipc/node.h b/net/tipc/node.h index 3c189b35b102..e5e96c04e167 100644 --- a/net/tipc/node.h +++ b/net/tipc/node.h @@ -74,7 +74,8 @@ * @deferred_size: number of OOS b'cast messages in deferred queue * @deferred_head: oldest OOS b'cast message received from node * @deferred_tail: newest OOS b'cast message received from node - * @defragm: list of partially reassembled b'cast message fragments from node + * @reasm_head: broadcast reassembly queue head from node + * @reasm_tail: last broadcast fragment received from node * @recv_permitted: true if node is allowed to receive b'cast messages */ struct tipc_node { @@ -98,7 +99,8 @@ struct tipc_node { u32 deferred_size; struct sk_buff *deferred_head; struct sk_buff *deferred_tail; - struct sk_buff *defragm; + struct sk_buff *reasm_head; + struct sk_buff *reasm_tail; bool recv_permitted; } bclink; }; diff --git a/net/tipc/port.c b/net/tipc/port.c index b3ed2fcab4fb..c081a7632302 100644 --- a/net/tipc/port.c +++ b/net/tipc/port.c @@ -90,8 +90,7 @@ int tipc_port_peer_msg(struct tipc_port *p_ptr, struct tipc_msg *msg) * tipc_multicast - send a multicast message to local and remote destinations */ int tipc_multicast(u32 ref, struct tipc_name_seq const *seq, - u32 num_sect, struct iovec const *msg_sect, - unsigned int total_len) + struct iovec const *msg_sect, unsigned int len) { struct tipc_msg *hdr; struct sk_buff *buf; @@ -114,8 +113,7 @@ int tipc_multicast(u32 ref, struct tipc_name_seq const *seq, msg_set_namelower(hdr, seq->lower); msg_set_nameupper(hdr, seq->upper); msg_set_hdr_sz(hdr, MCAST_H_SIZE); - res = tipc_msg_build(hdr, msg_sect, num_sect, total_len, MAX_MSG_SIZE, - &buf); + res = tipc_msg_build(hdr, msg_sect, len, MAX_MSG_SIZE, &buf); if (unlikely(!buf)) return res; @@ -436,14 +434,13 @@ exit: } int tipc_port_reject_sections(struct tipc_port *p_ptr, struct tipc_msg *hdr, - struct iovec const *msg_sect, u32 num_sect, - unsigned int total_len, int err) + struct iovec const *msg_sect, unsigned int len, + int err) { struct sk_buff *buf; int res; - res = tipc_msg_build(hdr, msg_sect, num_sect, total_len, MAX_MSG_SIZE, - &buf); + res = tipc_msg_build(hdr, msg_sect, len, MAX_MSG_SIZE, &buf); if (!buf) return res; @@ -918,15 +915,14 @@ int tipc_port_recv_msg(struct sk_buff *buf) * tipc_port_recv_sections(): Concatenate and deliver sectioned * message for this node. */ -static int tipc_port_recv_sections(struct tipc_port *sender, unsigned int num_sect, +static int tipc_port_recv_sections(struct tipc_port *sender, struct iovec const *msg_sect, - unsigned int total_len) + unsigned int len) { struct sk_buff *buf; int res; - res = tipc_msg_build(&sender->phdr, msg_sect, num_sect, total_len, - MAX_MSG_SIZE, &buf); + res = tipc_msg_build(&sender->phdr, msg_sect, len, MAX_MSG_SIZE, &buf); if (likely(buf)) tipc_port_recv_msg(buf); return res; @@ -935,8 +931,7 @@ static int tipc_port_recv_sections(struct tipc_port *sender, unsigned int num_se /** * tipc_send - send message sections on connection */ -int tipc_send(u32 ref, unsigned int num_sect, struct iovec const *msg_sect, - unsigned int total_len) +int tipc_send(u32 ref, struct iovec const *msg_sect, unsigned int len) { struct tipc_port *p_ptr; u32 destnode; @@ -950,11 +945,10 @@ int tipc_send(u32 ref, unsigned int num_sect, struct iovec const *msg_sect, if (!tipc_port_congested(p_ptr)) { destnode = port_peernode(p_ptr); if (likely(!in_own_node(destnode))) - res = tipc_link_send_sections_fast(p_ptr, msg_sect, num_sect, - total_len, destnode); + res = tipc_link_send_sections_fast(p_ptr, msg_sect, + len, destnode); else - res = tipc_port_recv_sections(p_ptr, num_sect, msg_sect, - total_len); + res = tipc_port_recv_sections(p_ptr, msg_sect, len); if (likely(res != -ELINKCONG)) { p_ptr->congested = 0; @@ -965,7 +959,7 @@ int tipc_send(u32 ref, unsigned int num_sect, struct iovec const *msg_sect, } if (port_unreliable(p_ptr)) { p_ptr->congested = 0; - return total_len; + return len; } return -ELINKCONG; } @@ -974,8 +968,7 @@ int tipc_send(u32 ref, unsigned int num_sect, struct iovec const *msg_sect, * tipc_send2name - send message sections to port name */ int tipc_send2name(u32 ref, struct tipc_name const *name, unsigned int domain, - unsigned int num_sect, struct iovec const *msg_sect, - unsigned int total_len) + struct iovec const *msg_sect, unsigned int len) { struct tipc_port *p_ptr; struct tipc_msg *msg; @@ -999,36 +992,32 @@ int tipc_send2name(u32 ref, struct tipc_name const *name, unsigned int domain, if (likely(destport || destnode)) { if (likely(in_own_node(destnode))) - res = tipc_port_recv_sections(p_ptr, num_sect, - msg_sect, total_len); + res = tipc_port_recv_sections(p_ptr, msg_sect, len); else if (tipc_own_addr) res = tipc_link_send_sections_fast(p_ptr, msg_sect, - num_sect, total_len, - destnode); + len, destnode); else res = tipc_port_reject_sections(p_ptr, msg, msg_sect, - num_sect, total_len, - TIPC_ERR_NO_NODE); + len, TIPC_ERR_NO_NODE); if (likely(res != -ELINKCONG)) { if (res > 0) p_ptr->sent++; return res; } if (port_unreliable(p_ptr)) { - return total_len; + return len; } return -ELINKCONG; } - return tipc_port_reject_sections(p_ptr, msg, msg_sect, num_sect, - total_len, TIPC_ERR_NO_NAME); + return tipc_port_reject_sections(p_ptr, msg, msg_sect, len, + TIPC_ERR_NO_NAME); } /** * tipc_send2port - send message sections to port identity */ int tipc_send2port(u32 ref, struct tipc_portid const *dest, - unsigned int num_sect, struct iovec const *msg_sect, - unsigned int total_len) + struct iovec const *msg_sect, unsigned int len) { struct tipc_port *p_ptr; struct tipc_msg *msg; @@ -1046,21 +1035,20 @@ int tipc_send2port(u32 ref, struct tipc_portid const *dest, msg_set_hdr_sz(msg, BASIC_H_SIZE); if (in_own_node(dest->node)) - res = tipc_port_recv_sections(p_ptr, num_sect, msg_sect, - total_len); + res = tipc_port_recv_sections(p_ptr, msg_sect, len); else if (tipc_own_addr) - res = tipc_link_send_sections_fast(p_ptr, msg_sect, num_sect, - total_len, dest->node); + res = tipc_link_send_sections_fast(p_ptr, msg_sect, len, + dest->node); else - res = tipc_port_reject_sections(p_ptr, msg, msg_sect, num_sect, - total_len, TIPC_ERR_NO_NODE); + res = tipc_port_reject_sections(p_ptr, msg, msg_sect, len, + TIPC_ERR_NO_NODE); if (likely(res != -ELINKCONG)) { if (res > 0) p_ptr->sent++; return res; } if (port_unreliable(p_ptr)) { - return total_len; + return len; } return -ELINKCONG; } diff --git a/net/tipc/port.h b/net/tipc/port.h index 5a7026b9c345..912253597343 100644 --- a/net/tipc/port.h +++ b/net/tipc/port.h @@ -151,24 +151,20 @@ int tipc_port_peer_msg(struct tipc_port *p_ptr, struct tipc_msg *msg); * TIPC messaging routines */ int tipc_port_recv_msg(struct sk_buff *buf); -int tipc_send(u32 portref, unsigned int num_sect, struct iovec const *msg_sect, - unsigned int total_len); +int tipc_send(u32 portref, struct iovec const *msg_sect, unsigned int len); int tipc_send2name(u32 portref, struct tipc_name const *name, u32 domain, - unsigned int num_sect, struct iovec const *msg_sect, - unsigned int total_len); + struct iovec const *msg_sect, unsigned int len); int tipc_send2port(u32 portref, struct tipc_portid const *dest, - unsigned int num_sect, struct iovec const *msg_sect, - unsigned int total_len); + struct iovec const *msg_sect, unsigned int len); int tipc_multicast(u32 portref, struct tipc_name_seq const *seq, - unsigned int section_count, struct iovec const *msg, - unsigned int total_len); + struct iovec const *msg, unsigned int len); int tipc_port_reject_sections(struct tipc_port *p_ptr, struct tipc_msg *hdr, - struct iovec const *msg_sect, u32 num_sect, - unsigned int total_len, int err); + struct iovec const *msg_sect, unsigned int len, + int err); struct sk_buff *tipc_port_get_ports(void); void tipc_port_recv_proto_msg(struct sk_buff *buf); void tipc_port_recv_mcast(struct sk_buff *buf, struct tipc_port_list *dp); diff --git a/net/tipc/socket.c b/net/tipc/socket.c index 6cc7ddd2fb7c..3906527259d1 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -338,7 +338,7 @@ static int release(struct socket *sock) buf = __skb_dequeue(&sk->sk_receive_queue); if (buf == NULL) break; - if (TIPC_SKB_CB(buf)->handle != 0) + if (TIPC_SKB_CB(buf)->handle != NULL) kfree_skb(buf); else { if ((sock->state == SS_CONNECTING) || @@ -622,13 +622,11 @@ static int send_msg(struct kiocb *iocb, struct socket *sock, res = tipc_send2name(tport->ref, &dest->addr.name.name, dest->addr.name.domain, - m->msg_iovlen, m->msg_iov, total_len); } else if (dest->addrtype == TIPC_ADDR_ID) { res = tipc_send2port(tport->ref, &dest->addr.id, - m->msg_iovlen, m->msg_iov, total_len); } else if (dest->addrtype == TIPC_ADDR_MCAST) { @@ -641,7 +639,6 @@ static int send_msg(struct kiocb *iocb, struct socket *sock, break; res = tipc_multicast(tport->ref, &dest->addr.nameseq, - m->msg_iovlen, m->msg_iov, total_len); } @@ -707,8 +704,7 @@ static int send_packet(struct kiocb *iocb, struct socket *sock, break; } - res = tipc_send(tport->ref, m->msg_iovlen, m->msg_iov, - total_len); + res = tipc_send(tport->ref, m->msg_iov, total_len); if (likely(res != -ELINKCONG)) break; if (timeout_val <= 0L) { @@ -1368,7 +1364,7 @@ static u32 filter_rcv(struct sock *sk, struct sk_buff *buf) return TIPC_ERR_OVERLOAD; /* Enqueue message */ - TIPC_SKB_CB(buf)->handle = 0; + TIPC_SKB_CB(buf)->handle = NULL; __skb_queue_tail(&sk->sk_receive_queue, buf); skb_set_owner_r(buf, sk); @@ -1691,7 +1687,7 @@ restart: /* Disconnect and send a 'FIN+' or 'FIN-' message to peer */ buf = __skb_dequeue(&sk->sk_receive_queue); if (buf) { - if (TIPC_SKB_CB(buf)->handle != 0) { + if (TIPC_SKB_CB(buf)->handle != NULL) { kfree_skb(buf); goto restart; } diff --git a/net/wimax/wimax-internal.h b/net/wimax/wimax-internal.h index 1e743d214856..5dcd9c067bf0 100644 --- a/net/wimax/wimax-internal.h +++ b/net/wimax/wimax-internal.h @@ -63,11 +63,11 @@ void __wimax_state_set(struct wimax_dev *wimax_dev, enum wimax_st state) { wimax_dev->state = state; } -extern void __wimax_state_change(struct wimax_dev *, enum wimax_st); +void __wimax_state_change(struct wimax_dev *, enum wimax_st); #ifdef CONFIG_DEBUG_FS -extern int wimax_debugfs_add(struct wimax_dev *); -extern void wimax_debugfs_rm(struct wimax_dev *); +int wimax_debugfs_add(struct wimax_dev *); +void wimax_debugfs_rm(struct wimax_dev *); #else static inline int wimax_debugfs_add(struct wimax_dev *wimax_dev) { @@ -76,13 +76,13 @@ static inline int wimax_debugfs_add(struct wimax_dev *wimax_dev) static inline void wimax_debugfs_rm(struct wimax_dev *wimax_dev) {} #endif -extern void wimax_id_table_add(struct wimax_dev *); -extern struct wimax_dev *wimax_dev_get_by_genl_info(struct genl_info *, int); -extern void wimax_id_table_rm(struct wimax_dev *); -extern void wimax_id_table_release(void); +void wimax_id_table_add(struct wimax_dev *); +struct wimax_dev *wimax_dev_get_by_genl_info(struct genl_info *, int); +void wimax_id_table_rm(struct wimax_dev *); +void wimax_id_table_release(void); -extern int wimax_rfkill_add(struct wimax_dev *); -extern void wimax_rfkill_rm(struct wimax_dev *); +int wimax_rfkill_add(struct wimax_dev *); +void wimax_rfkill_rm(struct wimax_dev *); extern struct genl_family wimax_gnl_family; extern struct genl_multicast_group wimax_gnl_mcg; diff --git a/net/wireless/chan.c b/net/wireless/chan.c index 50f6195c8b70..9b8cc877eb19 100644 --- a/net/wireless/chan.c +++ b/net/wireless/chan.c @@ -328,6 +328,7 @@ int cfg80211_chandef_dfs_required(struct wiphy *wiphy, return cfg80211_get_chans_dfs_required(wiphy, chandef->center_freq2, width); } +EXPORT_SYMBOL(cfg80211_chandef_dfs_required); static bool cfg80211_secondary_chans_ok(struct wiphy *wiphy, u32 center_freq, u32 bandwidth, @@ -503,7 +504,8 @@ cfg80211_get_chan_state(struct wireless_dev *wdev, case NL80211_IFTYPE_ADHOC: if (wdev->current_bss) { *chan = wdev->current_bss->pub.channel; - *chanmode = wdev->ibss_fixed + *chanmode = (wdev->ibss_fixed && + !wdev->ibss_dfs_possible) ? CHAN_MODE_SHARED : CHAN_MODE_EXCLUSIVE; return; diff --git a/net/wireless/core.h b/net/wireless/core.h index 3159e9c284c5..af10e59af2d8 100644 --- a/net/wireless/core.h +++ b/net/wireless/core.h @@ -234,10 +234,10 @@ struct cfg80211_beacon_registration { }; /* free object */ -extern void cfg80211_dev_free(struct cfg80211_registered_device *rdev); +void cfg80211_dev_free(struct cfg80211_registered_device *rdev); -extern int cfg80211_dev_rename(struct cfg80211_registered_device *rdev, - char *newname); +int cfg80211_dev_rename(struct cfg80211_registered_device *rdev, + char *newname); void ieee80211_set_bitrate_flags(struct wiphy *wiphy); @@ -382,15 +382,6 @@ int cfg80211_can_use_iftype_chan(struct cfg80211_registered_device *rdev, enum cfg80211_chan_mode chanmode, u8 radar_detect); -/** - * cfg80211_chandef_dfs_required - checks if radar detection is required - * @wiphy: the wiphy to validate against - * @chandef: the channel definition to check - * Return: 1 if radar detection is required, 0 if it is not, < 0 on error - */ -int cfg80211_chandef_dfs_required(struct wiphy *wiphy, - const struct cfg80211_chan_def *c); - void cfg80211_set_dfs_state(struct wiphy *wiphy, const struct cfg80211_chan_def *chandef, enum nl80211_dfs_state dfs_state); diff --git a/net/wireless/debugfs.c b/net/wireless/debugfs.c index 90d050036624..454157717efa 100644 --- a/net/wireless/debugfs.c +++ b/net/wireless/debugfs.c @@ -47,17 +47,19 @@ static int ht_print_chan(struct ieee80211_channel *chan, return 0; if (chan->flags & IEEE80211_CHAN_DISABLED) - return snprintf(buf + offset, - buf_size - offset, - "%d Disabled\n", - chan->center_freq); - - return snprintf(buf + offset, - buf_size - offset, - "%d HT40 %c%c\n", - chan->center_freq, - (chan->flags & IEEE80211_CHAN_NO_HT40MINUS) ? ' ' : '-', - (chan->flags & IEEE80211_CHAN_NO_HT40PLUS) ? ' ' : '+'); + return scnprintf(buf + offset, + buf_size - offset, + "%d Disabled\n", + chan->center_freq); + + return scnprintf(buf + offset, + buf_size - offset, + "%d HT40 %c%c\n", + chan->center_freq, + (chan->flags & IEEE80211_CHAN_NO_HT40MINUS) ? + ' ' : '-', + (chan->flags & IEEE80211_CHAN_NO_HT40PLUS) ? + ' ' : '+'); } static ssize_t ht40allow_map_read(struct file *file, diff --git a/net/wireless/genregdb.awk b/net/wireless/genregdb.awk index 9392f8cbb901..42ed274e81f4 100644 --- a/net/wireless/genregdb.awk +++ b/net/wireless/genregdb.awk @@ -46,6 +46,12 @@ BEGIN { sub(/:/, "", country) printf "static const struct ieee80211_regdomain regdom_%s = {\n", country printf "\t.alpha2 = \"%s\",\n", country + if ($NF ~ /DFS-ETSI/) + printf "\t.dfs_region = NL80211_DFS_ETSI,\n" + else if ($NF ~ /DFS-FCC/) + printf "\t.dfs_region = NL80211_DFS_FCC,\n" + else if ($NF ~ /DFS-JP/) + printf "\t.dfs_region = NL80211_DFS_JP,\n" printf "\t.reg_rules = {\n" active = 1 regdb = regdb "\t®dom_" country ",\n" diff --git a/net/wireless/ibss.c b/net/wireless/ibss.c index 403fe29c024d..9d797df56649 100644 --- a/net/wireless/ibss.c +++ b/net/wireless/ibss.c @@ -83,6 +83,8 @@ int __cfg80211_join_ibss(struct cfg80211_registered_device *rdev, struct cfg80211_cached_keys *connkeys) { struct wireless_dev *wdev = dev->ieee80211_ptr; + struct ieee80211_channel *check_chan; + u8 radar_detect_width = 0; int err; ASSERT_WDEV_LOCK(wdev); @@ -114,14 +116,28 @@ int __cfg80211_join_ibss(struct cfg80211_registered_device *rdev, wdev->connect_keys = connkeys; wdev->ibss_fixed = params->channel_fixed; + wdev->ibss_dfs_possible = params->userspace_handles_dfs; #ifdef CONFIG_CFG80211_WEXT wdev->wext.ibss.chandef = params->chandef; #endif + check_chan = params->chandef.chan; + if (params->userspace_handles_dfs) { + /* use channel NULL to check for radar even if the current + * channel is not a radar channel - it might decide to change + * to DFS channel later. + */ + radar_detect_width = BIT(params->chandef.width); + check_chan = NULL; + } + + err = cfg80211_can_use_iftype_chan(rdev, wdev, wdev->iftype, + check_chan, + (params->channel_fixed && + !radar_detect_width) + ? CHAN_MODE_SHARED + : CHAN_MODE_EXCLUSIVE, + radar_detect_width); - err = cfg80211_can_use_chan(rdev, wdev, params->chandef.chan, - params->channel_fixed - ? CHAN_MODE_SHARED - : CHAN_MODE_EXCLUSIVE); if (err) { wdev->connect_keys = NULL; return err; diff --git a/net/wireless/mlme.c b/net/wireless/mlme.c index 8d49c1ce3dea..6a6b1c8e907d 100644 --- a/net/wireless/mlme.c +++ b/net/wireless/mlme.c @@ -707,11 +707,13 @@ void cfg80211_dfs_channels_update_work(struct work_struct *work) if (c->dfs_state != NL80211_DFS_UNAVAILABLE) continue; - timeout = c->dfs_state_entered + - IEEE80211_DFS_MIN_NOP_TIME_MS; + timeout = c->dfs_state_entered + msecs_to_jiffies( + IEEE80211_DFS_MIN_NOP_TIME_MS); if (time_after_eq(jiffies, timeout)) { c->dfs_state = NL80211_DFS_USABLE; + c->dfs_state_entered = jiffies; + cfg80211_chandef_create(&chandef, c, NL80211_CHAN_NO_HT); diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index 626dc3b5fd8d..a7f4e7902104 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -354,6 +354,9 @@ static const struct nla_policy nl80211_policy[NL80211_ATTR_MAX+1] = { [NL80211_ATTR_CSA_IES] = { .type = NLA_NESTED }, [NL80211_ATTR_CSA_C_OFF_BEACON] = { .type = NLA_U16 }, [NL80211_ATTR_CSA_C_OFF_PRESP] = { .type = NLA_U16 }, + [NL80211_ATTR_STA_SUPPORTED_CHANNELS] = { .type = NLA_BINARY }, + [NL80211_ATTR_STA_SUPPORTED_OPER_CLASSES] = { .type = NLA_BINARY }, + [NL80211_ATTR_HANDLE_DFS] = { .type = NLA_FLAG }, }; /* policy for the key attributes */ @@ -3896,9 +3899,45 @@ static int nl80211_parse_sta_wme(struct genl_info *info, return 0; } +static int nl80211_parse_sta_channel_info(struct genl_info *info, + struct station_parameters *params) +{ + if (info->attrs[NL80211_ATTR_STA_SUPPORTED_CHANNELS]) { + params->supported_channels = + nla_data(info->attrs[NL80211_ATTR_STA_SUPPORTED_CHANNELS]); + params->supported_channels_len = + nla_len(info->attrs[NL80211_ATTR_STA_SUPPORTED_CHANNELS]); + /* + * Need to include at least one (first channel, number of + * channels) tuple for each subband, and must have proper + * tuples for the rest of the data as well. + */ + if (params->supported_channels_len < 2) + return -EINVAL; + if (params->supported_channels_len % 2) + return -EINVAL; + } + + if (info->attrs[NL80211_ATTR_STA_SUPPORTED_OPER_CLASSES]) { + params->supported_oper_classes = + nla_data(info->attrs[NL80211_ATTR_STA_SUPPORTED_OPER_CLASSES]); + params->supported_oper_classes_len = + nla_len(info->attrs[NL80211_ATTR_STA_SUPPORTED_OPER_CLASSES]); + /* + * The value of the Length field of the Supported Operating + * Classes element is between 2 and 253. + */ + if (params->supported_oper_classes_len < 2 || + params->supported_oper_classes_len > 253) + return -EINVAL; + } + return 0; +} + static int nl80211_set_station_tdls(struct genl_info *info, struct station_parameters *params) { + int err; /* Dummy STA entry gets updated once the peer capabilities are known */ if (info->attrs[NL80211_ATTR_PEER_AID]) params->aid = nla_get_u16(info->attrs[NL80211_ATTR_PEER_AID]); @@ -3909,6 +3948,10 @@ static int nl80211_set_station_tdls(struct genl_info *info, params->vht_capa = nla_data(info->attrs[NL80211_ATTR_VHT_CAPABILITY]); + err = nl80211_parse_sta_channel_info(info, params); + if (err) + return err; + return nl80211_parse_sta_wme(info, params); } @@ -4089,6 +4132,10 @@ static int nl80211_new_station(struct sk_buff *skb, struct genl_info *info) return -EINVAL; } + err = nl80211_parse_sta_channel_info(info, ¶ms); + if (err) + return err; + err = nl80211_parse_sta_wme(info, ¶ms); if (err) return err; @@ -5591,6 +5638,9 @@ static int nl80211_start_radar_detection(struct sk_buff *skb, if (err) return err; + if (netif_carrier_ok(dev)) + return -EBUSY; + if (wdev->cac_started) return -EBUSY; @@ -5634,15 +5684,27 @@ static int nl80211_channel_switch(struct sk_buff *skb, struct genl_info *info) static struct nlattr *csa_attrs[NL80211_ATTR_MAX+1]; u8 radar_detect_width = 0; int err; + bool need_new_beacon = false; if (!rdev->ops->channel_switch || !(rdev->wiphy.flags & WIPHY_FLAG_HAS_CHANNEL_SWITCH)) return -EOPNOTSUPP; - /* may add IBSS support later */ - if (dev->ieee80211_ptr->iftype != NL80211_IFTYPE_AP && - dev->ieee80211_ptr->iftype != NL80211_IFTYPE_P2P_GO) + switch (dev->ieee80211_ptr->iftype) { + case NL80211_IFTYPE_AP: + case NL80211_IFTYPE_P2P_GO: + need_new_beacon = true; + + /* useless if AP is not running */ + if (!wdev->beacon_interval) + return -EINVAL; + break; + case NL80211_IFTYPE_ADHOC: + case NL80211_IFTYPE_MESH_POINT: + break; + default: return -EOPNOTSUPP; + } memset(¶ms, 0, sizeof(params)); @@ -5651,15 +5713,14 @@ static int nl80211_channel_switch(struct sk_buff *skb, struct genl_info *info) return -EINVAL; /* only important for AP, IBSS and mesh create IEs internally */ - if (!info->attrs[NL80211_ATTR_CSA_IES]) - return -EINVAL; - - /* useless if AP is not running */ - if (!wdev->beacon_interval) + if (need_new_beacon && !info->attrs[NL80211_ATTR_CSA_IES]) return -EINVAL; params.count = nla_get_u32(info->attrs[NL80211_ATTR_CH_SWITCH_COUNT]); + if (!need_new_beacon) + goto skip_beacons; + err = nl80211_parse_beacon(info->attrs, ¶ms.beacon_after); if (err) return err; @@ -5699,6 +5760,7 @@ static int nl80211_channel_switch(struct sk_buff *skb, struct genl_info *info) return -EINVAL; } +skip_beacons: err = nl80211_parse_chandef(rdev, info, ¶ms.chandef); if (err) return err; @@ -5706,12 +5768,17 @@ static int nl80211_channel_switch(struct sk_buff *skb, struct genl_info *info) if (!cfg80211_reg_can_beacon(&rdev->wiphy, ¶ms.chandef)) return -EINVAL; - err = cfg80211_chandef_dfs_required(wdev->wiphy, ¶ms.chandef); - if (err < 0) { - return err; - } else if (err) { - radar_detect_width = BIT(params.chandef.width); - params.radar_required = true; + if (dev->ieee80211_ptr->iftype == NL80211_IFTYPE_AP || + dev->ieee80211_ptr->iftype == NL80211_IFTYPE_P2P_GO || + dev->ieee80211_ptr->iftype == NL80211_IFTYPE_ADHOC) { + err = cfg80211_chandef_dfs_required(wdev->wiphy, + ¶ms.chandef); + if (err < 0) { + return err; + } else if (err) { + radar_detect_width = BIT(params.chandef.width); + params.radar_required = true; + } } err = cfg80211_can_use_iftype_chan(rdev, wdev, wdev->iftype, @@ -6535,6 +6602,9 @@ static int nl80211_join_ibss(struct sk_buff *skb, struct genl_info *info) ibss.control_port = nla_get_flag(info->attrs[NL80211_ATTR_CONTROL_PORT]); + ibss.userspace_handles_dfs = + nla_get_flag(info->attrs[NL80211_ATTR_HANDLE_DFS]); + err = cfg80211_join_ibss(rdev, dev, &ibss, connkeys); if (err) kfree(connkeys); @@ -10740,7 +10810,9 @@ void cfg80211_ch_switch_notify(struct net_device *dev, wdev_lock(wdev); if (WARN_ON(wdev->iftype != NL80211_IFTYPE_AP && - wdev->iftype != NL80211_IFTYPE_P2P_GO)) + wdev->iftype != NL80211_IFTYPE_P2P_GO && + wdev->iftype != NL80211_IFTYPE_ADHOC && + wdev->iftype != NL80211_IFTYPE_MESH_POINT)) goto out; wdev->channel = chandef->chan; diff --git a/net/wireless/reg.c b/net/wireless/reg.c index de06d5d1287f..7da67fd0b418 100644 --- a/net/wireless/reg.c +++ b/net/wireless/reg.c @@ -172,11 +172,21 @@ static const struct ieee80211_regdomain world_regdom = { NL80211_RRF_NO_IBSS | NL80211_RRF_NO_OFDM), /* IEEE 802.11a, channel 36..48 */ - REG_RULE(5180-10, 5240+10, 80, 6, 20, + REG_RULE(5180-10, 5240+10, 160, 6, 20, NL80211_RRF_PASSIVE_SCAN | NL80211_RRF_NO_IBSS), - /* NB: 5260 MHz - 5700 MHz requires DFS */ + /* IEEE 802.11a, channel 52..64 - DFS required */ + REG_RULE(5260-10, 5320+10, 160, 6, 20, + NL80211_RRF_PASSIVE_SCAN | + NL80211_RRF_NO_IBSS | + NL80211_RRF_DFS), + + /* IEEE 802.11a, channel 100..144 - DFS required */ + REG_RULE(5500-10, 5720+10, 160, 6, 20, + NL80211_RRF_PASSIVE_SCAN | + NL80211_RRF_NO_IBSS | + NL80211_RRF_DFS), /* IEEE 802.11a, channel 149..165 */ REG_RULE(5745-10, 5825+10, 80, 6, 20, @@ -758,24 +768,25 @@ const struct ieee80211_reg_rule *freq_reg_info(struct wiphy *wiphy, } EXPORT_SYMBOL(freq_reg_info); -#ifdef CONFIG_CFG80211_REG_DEBUG -static const char *reg_initiator_name(enum nl80211_reg_initiator initiator) +const char *reg_initiator_name(enum nl80211_reg_initiator initiator) { switch (initiator) { case NL80211_REGDOM_SET_BY_CORE: - return "Set by core"; + return "core"; case NL80211_REGDOM_SET_BY_USER: - return "Set by user"; + return "user"; case NL80211_REGDOM_SET_BY_DRIVER: - return "Set by driver"; + return "driver"; case NL80211_REGDOM_SET_BY_COUNTRY_IE: - return "Set by country IE"; + return "country IE"; default: WARN_ON(1); - return "Set by bug"; + return "bug"; } } +EXPORT_SYMBOL(reg_initiator_name); +#ifdef CONFIG_CFG80211_REG_DEBUG static void chan_reg_rule_print_dbg(struct ieee80211_channel *chan, const struct ieee80211_reg_rule *reg_rule) { @@ -962,6 +973,13 @@ static bool reg_dev_ignore_cell_hint(struct wiphy *wiphy) } #endif +static bool wiphy_strict_alpha2_regd(struct wiphy *wiphy) +{ + if (wiphy->flags & WIPHY_FLAG_STRICT_REGULATORY && + !(wiphy->flags & WIPHY_FLAG_CUSTOM_REGULATORY)) + return true; + return false; +} static bool ignore_reg_update(struct wiphy *wiphy, enum nl80211_reg_initiator initiator) @@ -969,14 +987,17 @@ static bool ignore_reg_update(struct wiphy *wiphy, struct regulatory_request *lr = get_last_request(); if (!lr) { - REG_DBG_PRINT("Ignoring regulatory request %s since last_request is not set\n", + REG_DBG_PRINT("Ignoring regulatory request set by %s " + "since last_request is not set\n", reg_initiator_name(initiator)); return true; } if (initiator == NL80211_REGDOM_SET_BY_CORE && wiphy->flags & WIPHY_FLAG_CUSTOM_REGULATORY) { - REG_DBG_PRINT("Ignoring regulatory request %s since the driver uses its own custom regulatory domain\n", + REG_DBG_PRINT("Ignoring regulatory request set by %s " + "since the driver uses its own custom " + "regulatory domain\n", reg_initiator_name(initiator)); return true; } @@ -985,10 +1006,12 @@ static bool ignore_reg_update(struct wiphy *wiphy, * wiphy->regd will be set once the device has its own * desired regulatory domain set */ - if (wiphy->flags & WIPHY_FLAG_STRICT_REGULATORY && !wiphy->regd && + if (wiphy_strict_alpha2_regd(wiphy) && !wiphy->regd && initiator != NL80211_REGDOM_SET_BY_COUNTRY_IE && !is_world_regdom(lr->alpha2)) { - REG_DBG_PRINT("Ignoring regulatory request %s since the driver requires its own regulatory domain to be set first\n", + REG_DBG_PRINT("Ignoring regulatory request set by %s " + "since the driver requires its own regulatory " + "domain to be set first\n", reg_initiator_name(initiator)); return true; } @@ -1689,8 +1712,8 @@ int regulatory_hint(struct wiphy *wiphy, const char *alpha2) } EXPORT_SYMBOL(regulatory_hint); -void regulatory_hint_11d(struct wiphy *wiphy, enum ieee80211_band band, - const u8 *country_ie, u8 country_ie_len) +void regulatory_hint_country_ie(struct wiphy *wiphy, enum ieee80211_band band, + const u8 *country_ie, u8 country_ie_len) { char alpha2[2]; enum environment_cap env = ENVIRON_ANY; diff --git a/net/wireless/reg.h b/net/wireless/reg.h index af2d5f8a5d82..9677e3c13da9 100644 --- a/net/wireless/reg.h +++ b/net/wireless/reg.h @@ -58,7 +58,7 @@ int regulatory_hint_found_beacon(struct wiphy *wiphy, gfp_t gfp); /** - * regulatory_hint_11d - hints a country IE as a regulatory domain + * regulatory_hint_country_ie - hints a country IE as a regulatory domain * @wiphy: the wireless device giving the hint (used only for reporting * conflicts) * @band: the band on which the country IE was received on. This determines @@ -78,7 +78,7 @@ int regulatory_hint_found_beacon(struct wiphy *wiphy, * not observed. For this reason if a triplet is seen with channel * information for a band the BSS is not present in it will be ignored. */ -void regulatory_hint_11d(struct wiphy *wiphy, +void regulatory_hint_country_ie(struct wiphy *wiphy, enum ieee80211_band band, const u8 *country_ie, u8 country_ie_len); diff --git a/net/wireless/scan.c b/net/wireless/scan.c index eeb71480f1af..d4397eba5408 100644 --- a/net/wireless/scan.c +++ b/net/wireless/scan.c @@ -254,10 +254,10 @@ void __cfg80211_sched_scan_results(struct work_struct *wk) rdev = container_of(wk, struct cfg80211_registered_device, sched_scan_results_wk); - request = rdev->sched_scan_req; - rtnl_lock(); + request = rdev->sched_scan_req; + /* we don't have sched_scan_req anymore if the scan is stopping */ if (request) { if (request->flags & NL80211_SCAN_FLAG_FLUSH) { diff --git a/net/wireless/sme.c b/net/wireless/sme.c index 20e86a95dc4e..65f800890d70 100644 --- a/net/wireless/sme.c +++ b/net/wireless/sme.c @@ -682,8 +682,8 @@ void __cfg80211_connect_result(struct net_device *dev, const u8 *bssid, * - country_ie + 2, the start of the country ie data, and * - and country_ie[1] which is the IE length */ - regulatory_hint_11d(wdev->wiphy, bss->channel->band, - country_ie + 2, country_ie[1]); + regulatory_hint_country_ie(wdev->wiphy, bss->channel->band, + country_ie + 2, country_ie[1]); kfree(country_ie); } diff --git a/net/wireless/sysfs.h b/net/wireless/sysfs.h index 65acbebd3711..b533ed71daff 100644 --- a/net/wireless/sysfs.h +++ b/net/wireless/sysfs.h @@ -1,8 +1,8 @@ #ifndef __WIRELESS_SYSFS_H #define __WIRELESS_SYSFS_H -extern int wiphy_sysfs_init(void); -extern void wiphy_sysfs_exit(void); +int wiphy_sysfs_init(void); +void wiphy_sysfs_exit(void); extern struct class ieee80211_class; diff --git a/net/wireless/util.c b/net/wireless/util.c index ce090c1c5e4f..935dea9485da 100644 --- a/net/wireless/util.c +++ b/net/wireless/util.c @@ -10,6 +10,7 @@ #include <net/cfg80211.h> #include <net/ip.h> #include <net/dsfield.h> +#include <linux/if_vlan.h> #include "core.h" #include "rdev-ops.h" @@ -691,6 +692,7 @@ EXPORT_SYMBOL(ieee80211_amsdu_to_8023s); unsigned int cfg80211_classify8021d(struct sk_buff *skb) { unsigned int dscp; + unsigned char vlan_priority; /* skb->priority values from 256->263 are magic values to * directly indicate a specific 802.1d priority. This is used @@ -700,6 +702,13 @@ unsigned int cfg80211_classify8021d(struct sk_buff *skb) if (skb->priority >= 256 && skb->priority <= 263) return skb->priority - 256; + if (vlan_tx_tag_present(skb)) { + vlan_priority = (vlan_tx_tag_get(skb) & VLAN_PRIO_MASK) + >> VLAN_PRIO_SHIFT; + if (vlan_priority > 0) + return vlan_priority; + } + switch (skb->protocol) { case htons(ETH_P_IP): dscp = ipv4_get_dsfield(ip_hdr(skb)) & 0xfc; @@ -1240,7 +1249,7 @@ int cfg80211_can_use_iftype_chan(struct cfg80211_registered_device *rdev, enum cfg80211_chan_mode chmode; int num_different_channels = 0; int total = 1; - bool radar_required; + bool radar_required = false; int i, j; ASSERT_RTNL(); @@ -1255,14 +1264,20 @@ int cfg80211_can_use_iftype_chan(struct cfg80211_registered_device *rdev, case NL80211_IFTYPE_MESH_POINT: case NL80211_IFTYPE_P2P_GO: case NL80211_IFTYPE_WDS: - radar_required = !!(chan && - (chan->flags & IEEE80211_CHAN_RADAR)); + /* if the interface could potentially choose a DFS channel, + * then mark DFS as required. + */ + if (!chan) { + if (chanmode != CHAN_MODE_UNDEFINED && radar_detect) + radar_required = true; + break; + } + radar_required = !!(chan->flags & IEEE80211_CHAN_RADAR); break; case NL80211_IFTYPE_P2P_CLIENT: case NL80211_IFTYPE_STATION: case NL80211_IFTYPE_P2P_DEVICE: case NL80211_IFTYPE_MONITOR: - radar_required = false; break; case NUM_NL80211_IFTYPES: case NL80211_IFTYPE_UNSPECIFIED: diff --git a/net/xfrm/xfrm_algo.c b/net/xfrm/xfrm_algo.c index ab4ef72f0b1d..debe733386f8 100644 --- a/net/xfrm/xfrm_algo.c +++ b/net/xfrm/xfrm_algo.c @@ -802,17 +802,4 @@ int xfrm_count_pfkey_enc_supported(void) } EXPORT_SYMBOL_GPL(xfrm_count_pfkey_enc_supported); -#if defined(CONFIG_INET_ESP) || defined(CONFIG_INET_ESP_MODULE) || defined(CONFIG_INET6_ESP) || defined(CONFIG_INET6_ESP_MODULE) - -void *pskb_put(struct sk_buff *skb, struct sk_buff *tail, int len) -{ - if (tail != skb) { - skb->data_len += len; - skb->len += len; - } - return skb_put(tail, len); -} -EXPORT_SYMBOL_GPL(pskb_put); -#endif - MODULE_LICENSE("GPL"); diff --git a/net/xfrm/xfrm_hash.h b/net/xfrm/xfrm_hash.h index 716502ada53b..0622d319e1f2 100644 --- a/net/xfrm/xfrm_hash.h +++ b/net/xfrm/xfrm_hash.h @@ -130,7 +130,7 @@ static inline unsigned int __addr_hash(const xfrm_address_t *daddr, return h & hmask; } -extern struct hlist_head *xfrm_hash_alloc(unsigned int sz); -extern void xfrm_hash_free(struct hlist_head *n, unsigned int sz); +struct hlist_head *xfrm_hash_alloc(unsigned int sz); +void xfrm_hash_free(struct hlist_head *n, unsigned int sz); #endif /* _XFRM_HASH_H */ diff --git a/net/xfrm/xfrm_ipcomp.c b/net/xfrm/xfrm_ipcomp.c index 3be02b680268..ccfdc7115a83 100644 --- a/net/xfrm/xfrm_ipcomp.c +++ b/net/xfrm/xfrm_ipcomp.c @@ -220,8 +220,8 @@ static void ipcomp_free_scratches(void) static void * __percpu *ipcomp_alloc_scratches(void) { - int i; void * __percpu *scratches; + int i; if (ipcomp_scratch_users++) return ipcomp_scratches; @@ -233,7 +233,9 @@ static void * __percpu *ipcomp_alloc_scratches(void) ipcomp_scratches = scratches; for_each_possible_cpu(i) { - void *scratch = vmalloc(IPCOMP_SCRATCH_SIZE); + void *scratch; + + scratch = vmalloc_node(IPCOMP_SCRATCH_SIZE, cpu_to_node(i)); if (!scratch) return NULL; *per_cpu_ptr(scratches, i) = scratch; diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index 76e1873811d4..9a91f7431c41 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -1844,6 +1844,13 @@ static int xdst_queue_output(struct sk_buff *skb) struct xfrm_dst *xdst = (struct xfrm_dst *) dst; struct xfrm_policy *pol = xdst->pols[0]; struct xfrm_policy_queue *pq = &pol->polq; + const struct sk_buff *fclone = skb + 1; + + if (unlikely(skb->fclone == SKB_FCLONE_ORIG && + fclone->fclone == SKB_FCLONE_CLONE)) { + kfree_skb(skb); + return 0; + } if (pq->hold_queue.qlen > XFRM_MAX_QUEUE_LEN) { kfree_skb(skb); diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index b9c3f9e943a9..68c2f357a183 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -468,7 +468,7 @@ expired: } err = __xfrm_state_delete(x); - if (!err && x->id.spi) + if (!err) km_state_expired(x, 1, 0); xfrm_audit_state_delete(x, err ? 0 : 1, @@ -815,7 +815,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, xfrm_state_look_at(pol, x, fl, encap_family, &best, &acquire_in_progress, &error); } - if (best) + if (best || acquire_in_progress) goto found; h_wildcard = xfrm_dst_hash(net, daddr, &saddr_wildcard, tmpl->reqid, encap_family); @@ -824,7 +824,7 @@ xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr, x->props.reqid == tmpl->reqid && (mark & x->mark.m) == x->mark.v && !(x->props.flags & XFRM_STATE_WILDRECV) && - xfrm_state_addr_check(x, daddr, saddr, encap_family) && + xfrm_addr_equal(&x->id.daddr, daddr, encap_family) && tmpl->mode == x->props.mode && tmpl->id.proto == x->id.proto && (tmpl->id.spi == x->id.spi || !tmpl->id.spi)) |