From 65e9a6d25deb752d97b88335068dc0e7accbc9c3 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Sun, 17 Mar 2019 17:17:45 -0700 Subject: networking: fix snmp_counter.rst Doc. Warnings Fix documentation markup warnings in snmp_counter.rst: Documentation/networking/snmp_counter.rst:416: WARNING: Title underline too short. Documentation/networking/snmp_counter.rst:684: WARNING: Bullet list ends without a blank line; unexpected unindent. Documentation/networking/snmp_counter.rst:693: WARNING: Title underline too short. Documentation/networking/snmp_counter.rst:707: WARNING: Bullet list ends without a blank line; unexpected unindent. Documentation/networking/snmp_counter.rst:712: WARNING: Bullet list ends without a blank line; unexpected unindent. Documentation/networking/snmp_counter.rst:722: WARNING: Title underline too short. Documentation/networking/snmp_counter.rst:733: WARNING: Bullet list ends without a blank line; unexpected unindent. Documentation/networking/snmp_counter.rst:736: WARNING: Bullet list ends without a blank line; unexpected unindent. Documentation/networking/snmp_counter.rst:739: WARNING: Bullet list ends without a blank line; unexpected unindent. Fixes: 80cc49507ba48 ("net: Add part of TCP counts explanations in snmp_counters.rst") Fixes: 8e2ea53a83dfb ("add snmp counters document") Fixes: a6c7c7aac2de6 ("net: add document for several snmp counters") Signed-off-by: Randy Dunlap Cc: yupeng --- Documentation/networking/snmp_counter.rst | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/snmp_counter.rst b/Documentation/networking/snmp_counter.rst index 52b026be028f..38a4edc4522b 100644 --- a/Documentation/networking/snmp_counter.rst +++ b/Documentation/networking/snmp_counter.rst @@ -413,7 +413,7 @@ algorithm. .. _F-RTO: https://tools.ietf.org/html/rfc5682 TCP Fast Path -============ +============= When kernel receives a TCP packet, it has two paths to handler the packet, one is fast path, another is slow path. The comment in kernel code provides a good explanation of them, I pasted them below:: @@ -681,6 +681,7 @@ The TCP stack receives an out of order duplicate packet, so it sends a DSACK to the sender. * TcpExtTCPDSACKRecv + The TCP stack receives a DSACK, which indicates an acknowledged duplicate packet is received. @@ -690,7 +691,7 @@ The TCP stack receives a DSACK, which indicate an out of order duplicate packet is received. invalid SACK and DSACK -==================== +====================== When a SACK (or DSACK) block is invalid, a corresponding counter would be updated. The validation method is base on the start/end sequence number of the SACK block. For more details, please refer the comment @@ -704,11 +705,13 @@ explaination: .. _Add counters for discarded SACK blocks: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=18f02545a9a16c9a89778b91a162ad16d510bb32 * TcpExtTCPSACKDiscard + This counter indicates how many SACK blocks are invalid. If the invalid SACK block is caused by ACK recording, the TCP stack will only ignore it and won't update this counter. * TcpExtTCPDSACKIgnoredOld and TcpExtTCPDSACKIgnoredNoUndo + When a DSACK block is invalid, one of these two counters would be updated. Which counter will be updated depends on the undo_marker flag of the TCP socket. If the undo_marker is not set, the TCP stack isn't @@ -719,7 +722,7 @@ will be updated. If the undo_marker is set, TcpExtTCPDSACKIgnoredOld will be updated. As implied in its name, it might be an old packet. SACK shift -========= +========== The linux networking stack stores data in sk_buff struct (skb for short). If a SACK block acrosses multiple skb, the TCP stack will try to re-arrange data in these skb. E.g. if a SACK block acknowledges seq @@ -730,12 +733,15 @@ seq 14 to 20. All data in skb2 will be moved to skb1, and skb2 will be discard, this operation is 'merge'. * TcpExtTCPSackShifted + A skb is shifted * TcpExtTCPSackMerged + A skb is merged * TcpExtTCPSackShiftFallback + A skb should be shifted or merged, but the TCP stack doesn't do it for some reasons. -- cgit v1.2.3 From 25208dd856e74f2b60d053eb98e6dd335816fbc1 Mon Sep 17 00:00:00 2001 From: Tobias Klauser Date: Mon, 18 Mar 2019 12:08:58 +0100 Subject: doc: fix link to MSG_ZEROCOPY patchset Use https and link to the patch directly. Signed-off-by: Tobias Klauser Acked-by: Willem de Bruijn Signed-off-by: David S. Miller --- Documentation/networking/msg_zerocopy.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/networking/msg_zerocopy.rst b/Documentation/networking/msg_zerocopy.rst index 18c1415e7bfa..ace56204dd03 100644 --- a/Documentation/networking/msg_zerocopy.rst +++ b/Documentation/networking/msg_zerocopy.rst @@ -50,7 +50,7 @@ the excellent reporting over at LWN.net or read the original code. patchset [PATCH net-next v4 0/9] socket sendmsg MSG_ZEROCOPY - http://lkml.kernel.org/r/20170803202945.70750-1-willemdebruijn.kernel@gmail.com + https://lkml.kernel.org/netdev/20170803202945.70750-1-willemdebruijn.kernel@gmail.com Interface -- cgit v1.2.3 From ffa91253739ca89fc997195d8bbd1f7ba3e29fbe Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Mon, 18 Mar 2019 11:07:33 -0700 Subject: Documentation: networking: Update netdev-FAQ regarding patches Provide an explanation of what is expected with respect to sending new versions of specific patches within a patch series, as well as what happens if an earlier patch series accidentally gets merged). Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller --- Documentation/networking/netdev-FAQ.rst | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'Documentation') diff --git a/Documentation/networking/netdev-FAQ.rst b/Documentation/networking/netdev-FAQ.rst index 0ac5fa77f501..8c7a713cf657 100644 --- a/Documentation/networking/netdev-FAQ.rst +++ b/Documentation/networking/netdev-FAQ.rst @@ -131,6 +131,19 @@ it to the maintainer to figure out what is the most recent and current version that should be applied. If there is any doubt, the maintainer will reply and ask what should be done. +Q: I made changes to only a few patches in a patch series should I resend only those changed? +-------------------------------------------------------------------------------------------- +A: No, please resend the entire patch series and make sure you do number your +patches such that it is clear this is the latest and greatest set of patches +that can be applied. + +Q: I submitted multiple versions of a patch series and it looks like a version other than the last one has been accepted, what should I do? +------------------------------------------------------------------------------------------------------------------------------------------- +A: There is no revert possible, once it is pushed out, it stays like that. +Please send incremental versions on top of what has been merged in order to fix +the patches the way they would look like if your latest patch series was to be +merged. + Q: How can I tell what patches are queued up for backporting to the various stable releases? -------------------------------------------------------------------------------------------- A: Normally Greg Kroah-Hartman collects stable commits himself, but for -- cgit v1.2.3 From 7c9abe12b359d988970c5e38f0e190249e5ae00f Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Tue, 19 Mar 2019 13:51:22 +0100 Subject: netfilter: nf_flowtable: remove duplicated transition in diagram No direct transition from prerouting to forward hook, routing lookup needs to happen first. Fixes: 19b351f16fd9 ("netfilter: add flowtable documentation") Signed-off-by: Pablo Neira Ayuso --- Documentation/networking/nf_flowtable.txt | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/nf_flowtable.txt b/Documentation/networking/nf_flowtable.txt index 54128c50d508..ca2136c76042 100644 --- a/Documentation/networking/nf_flowtable.txt +++ b/Documentation/networking/nf_flowtable.txt @@ -44,10 +44,10 @@ including the Netfilter hooks and the flowtable fastpath bypass. / \ / \ |Routing | / \ --> ingress ---> prerouting ---> |decision| | postrouting |--> neigh_xmit \_________/ \__________/ ---------- \____________/ ^ - | ^ | | ^ | - flowtable | | ____\/___ | | - | | | / \ | | - __\/___ | --------->| forward |------------ | + | ^ | ^ | + flowtable | ____\/___ | | + | | / \ | | + __\/___ | | forward |------------ | |-----| | \_________/ | |-----| | 'flow offload' rule | |-----| | adds entry to | -- cgit v1.2.3 From 5cd1c56c42beb6d228cc8d4373fdc5f5ec78a5ad Mon Sep 17 00:00:00 2001 From: Jarkko Nikula Date: Fri, 15 Mar 2019 12:56:49 +0200 Subject: i2c: i801: Add support for Intel Comet Lake Add PCI ID for Intel Comet Lake PCH. Signed-off-by: Jarkko Nikula Reviewed-by: Jean Delvare Signed-off-by: Wolfram Sang --- Documentation/i2c/busses/i2c-i801 | 1 + drivers/i2c/busses/Kconfig | 1 + drivers/i2c/busses/i2c-i801.c | 4 ++++ 3 files changed, 6 insertions(+) (limited to 'Documentation') diff --git a/Documentation/i2c/busses/i2c-i801 b/Documentation/i2c/busses/i2c-i801 index d1ee484a787d..ee9984f35868 100644 --- a/Documentation/i2c/busses/i2c-i801 +++ b/Documentation/i2c/busses/i2c-i801 @@ -36,6 +36,7 @@ Supported adapters: * Intel Cannon Lake (PCH) * Intel Cedar Fork (PCH) * Intel Ice Lake (PCH) + * Intel Comet Lake (PCH) Datasheets: Publicly available at the Intel website On Intel Patsburg and later chipsets, both the normal host SMBus controller diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig index f2c681971201..f8979abb9a19 100644 --- a/drivers/i2c/busses/Kconfig +++ b/drivers/i2c/busses/Kconfig @@ -131,6 +131,7 @@ config I2C_I801 Cannon Lake (PCH) Cedar Fork (PCH) Ice Lake (PCH) + Comet Lake (PCH) This driver can also be built as a module. If so, the module will be called i2c-i801. diff --git a/drivers/i2c/busses/i2c-i801.c b/drivers/i2c/busses/i2c-i801.c index c91e145ef5a5..679c6c41f64b 100644 --- a/drivers/i2c/busses/i2c-i801.c +++ b/drivers/i2c/busses/i2c-i801.c @@ -71,6 +71,7 @@ * Cannon Lake-LP (PCH) 0x9da3 32 hard yes yes yes * Cedar Fork (PCH) 0x18df 32 hard yes yes yes * Ice Lake-LP (PCH) 0x34a3 32 hard yes yes yes + * Comet Lake (PCH) 0x02a3 32 hard yes yes yes * * Features supported by this driver: * Software PEC no @@ -240,6 +241,7 @@ #define PCI_DEVICE_ID_INTEL_LEWISBURG_SSKU_SMBUS 0xa223 #define PCI_DEVICE_ID_INTEL_KABYLAKE_PCH_H_SMBUS 0xa2a3 #define PCI_DEVICE_ID_INTEL_CANNONLAKE_H_SMBUS 0xa323 +#define PCI_DEVICE_ID_INTEL_COMETLAKE_SMBUS 0x02a3 struct i801_mux_config { char *gpio_chip; @@ -1038,6 +1040,7 @@ static const struct pci_device_id i801_ids[] = { { PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_CANNONLAKE_H_SMBUS) }, { PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_CANNONLAKE_LP_SMBUS) }, { PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICELAKE_LP_SMBUS) }, + { PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_COMETLAKE_SMBUS) }, { 0, } }; @@ -1534,6 +1537,7 @@ static int i801_probe(struct pci_dev *dev, const struct pci_device_id *id) case PCI_DEVICE_ID_INTEL_DNV_SMBUS: case PCI_DEVICE_ID_INTEL_KABYLAKE_PCH_H_SMBUS: case PCI_DEVICE_ID_INTEL_ICELAKE_LP_SMBUS: + case PCI_DEVICE_ID_INTEL_COMETLAKE_SMBUS: priv->features |= FEATURE_I2C_BLOCK_READ; priv->features |= FEATURE_IRQ; priv->features |= FEATURE_SMBUS_PEC; -- cgit v1.2.3 From 24105bf4d10485143f8e26337cda8bcb7f6e6da5 Mon Sep 17 00:00:00 2001 From: Fabrizio Castro Date: Tue, 19 Mar 2019 11:02:01 +0000 Subject: dt-bindings: irqchip: renesas-irqc: Document r8a774c0 support Document RZ/G2E (R8A774C0) SoC bindings. Signed-off-by: Fabrizio Castro Reviewed-by: Geert Uytterhoeven Reviewed-by: Simon Horman Reviewed-by: Rob Herring Signed-off-by: Marc Zyngier --- Documentation/devicetree/bindings/interrupt-controller/renesas,irqc.txt | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/interrupt-controller/renesas,irqc.txt b/Documentation/devicetree/bindings/interrupt-controller/renesas,irqc.txt index 8de96a4fb2d5..f977ea7617f6 100644 --- a/Documentation/devicetree/bindings/interrupt-controller/renesas,irqc.txt +++ b/Documentation/devicetree/bindings/interrupt-controller/renesas,irqc.txt @@ -16,6 +16,7 @@ Required properties: - "renesas,irqc-r8a7793" (R-Car M2-N) - "renesas,irqc-r8a7794" (R-Car E2) - "renesas,intc-ex-r8a774a1" (RZ/G2M) + - "renesas,intc-ex-r8a774c0" (RZ/G2E) - "renesas,intc-ex-r8a7795" (R-Car H3) - "renesas,intc-ex-r8a7796" (R-Car M3-W) - "renesas,intc-ex-r8a77965" (R-Car M3-N) -- cgit v1.2.3 From fb1eb41a3dd4cfff274c98f3c3324ab329641298 Mon Sep 17 00:00:00 2001 From: Christian Lamparter Date: Fri, 22 Mar 2019 01:05:00 +0100 Subject: dt-bindings: net: dsa: qca8k: fix example In the example, the phy at phy@0 is clashing with the switch0@0 at the same address. Usually, the switches are accessible through pseudo PHYs which in case of the qca8k are located at 0x10 - 0x18. Reviewed-by: Florian Fainelli Signed-off-by: Christian Lamparter Signed-off-by: David S. Miller --- Documentation/devicetree/bindings/net/dsa/qca8k.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/dsa/qca8k.txt b/Documentation/devicetree/bindings/net/dsa/qca8k.txt index bbcb255c3150..5eda99e6c86e 100644 --- a/Documentation/devicetree/bindings/net/dsa/qca8k.txt +++ b/Documentation/devicetree/bindings/net/dsa/qca8k.txt @@ -55,12 +55,12 @@ Example: reg = <4>; }; - switch0@0 { + switch@10 { compatible = "qca,qca8337"; #address-cells = <1>; #size-cells = <0>; - reg = <0>; + reg = <0x10>; ports { #address-cells = <1>; -- cgit v1.2.3 From 5e07321f3388e6f2b13c43ae9df3e09efa8418e0 Mon Sep 17 00:00:00 2001 From: Christian Lamparter Date: Fri, 22 Mar 2019 01:05:01 +0100 Subject: dt-bindings: net: dsa: qca8k: support internal mdio-bus This patch updates the qca8k's binding to document to the approach for using the internal mdio-bus of the supported qca8k switches. Reviewed-by: Florian Fainelli Signed-off-by: Christian Lamparter Signed-off-by: David S. Miller --- .../devicetree/bindings/net/dsa/qca8k.txt | 69 ++++++++++++++++++++-- 1 file changed, 64 insertions(+), 5 deletions(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/dsa/qca8k.txt b/Documentation/devicetree/bindings/net/dsa/qca8k.txt index 5eda99e6c86e..93a7469e70d4 100644 --- a/Documentation/devicetree/bindings/net/dsa/qca8k.txt +++ b/Documentation/devicetree/bindings/net/dsa/qca8k.txt @@ -12,10 +12,15 @@ Required properties: Subnodes: The integrated switch subnode should be specified according to the binding -described in dsa/dsa.txt. As the QCA8K switches do not have a N:N mapping of -port and PHY id, each subnode describing a port needs to have a valid phandle -referencing the internal PHY connected to it. The CPU port of this switch is -always port 0. +described in dsa/dsa.txt. If the QCA8K switch is connect to a SoC's external +mdio-bus each subnode describing a port needs to have a valid phandle +referencing the internal PHY it is connected to. This is because there's no +N:N mapping of port and PHY id. + +Don't use mixed external and internal mdio-bus configurations, as this is +not supported by the hardware. + +The CPU port of this switch is always port 0. A CPU port node has the following optional node: @@ -31,8 +36,9 @@ For QCA8K the 'fixed-link' sub-node supports only the following properties: - 'full-duplex' (boolean, optional), to indicate that full duplex is used. When absent, half duplex is assumed. -Example: +Examples: +for the external mdio-bus configuration: &mdio0 { phy_port1: phy@0 { @@ -108,3 +114,56 @@ Example: }; }; }; + +for the internal master mdio-bus configuration: + + &mdio0 { + switch@10 { + compatible = "qca,qca8337"; + #address-cells = <1>; + #size-cells = <0>; + + reg = <0x10>; + + ports { + #address-cells = <1>; + #size-cells = <0>; + + port@0 { + reg = <0>; + label = "cpu"; + ethernet = <&gmac1>; + phy-mode = "rgmii"; + fixed-link { + speed = 1000; + full-duplex; + }; + }; + + port@1 { + reg = <1>; + label = "lan1"; + }; + + port@2 { + reg = <2>; + label = "lan2"; + }; + + port@3 { + reg = <3>; + label = "lan3"; + }; + + port@4 { + reg = <4>; + label = "lan4"; + }; + + port@5 { + reg = <5>; + label = "wan"; + }; + }; + }; + }; -- cgit v1.2.3 From f52c97d9df983cc38a8809d0910e5eaba0c180b3 Mon Sep 17 00:00:00 2001 From: Jesper Dangaard Brouer Date: Mon, 25 Mar 2019 15:12:15 +0100 Subject: bpf, doc: fix BTF docs reflow of bullet list Section 2.2.1 BTF_KIND_INT a bullet list was collapsed due to text reflow in commit 9ab5305dbe3f ("docs/btf: reflow text to fill up to 78 characters"). This patch correct the mistake. Also adjust next bullet list, which is used for comparison, to get rendered the same way. Fixes: 9ab5305dbe3f ("docs/btf: reflow text to fill up to 78 characters") Link: https://www.kernel.org/doc/html/latest/bpf/btf.html#btf-kind-int Signed-off-by: Jesper Dangaard Brouer Acked-by: Andrii Nakryiko Signed-off-by: Alexei Starovoitov --- Documentation/bpf/btf.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'Documentation') diff --git a/Documentation/bpf/btf.rst b/Documentation/bpf/btf.rst index 9a60a5d60e38..7313d354f20e 100644 --- a/Documentation/bpf/btf.rst +++ b/Documentation/bpf/btf.rst @@ -148,16 +148,16 @@ The ``btf_type.size * 8`` must be equal to or greater than ``BTF_INT_BITS()`` for the type. The maximum value of ``BTF_INT_BITS()`` is 128. The ``BTF_INT_OFFSET()`` specifies the starting bit offset to calculate values -for this int. For example, a bitfield struct member has: * btf member bit -offset 100 from the start of the structure, * btf member pointing to an int -type, * the int type has ``BTF_INT_OFFSET() = 2`` and ``BTF_INT_BITS() = 4`` +for this int. For example, a bitfield struct member has: + * btf member bit offset 100 from the start of the structure, + * btf member pointing to an int type, + * the int type has ``BTF_INT_OFFSET() = 2`` and ``BTF_INT_BITS() = 4`` Then in the struct memory layout, this member will occupy ``4`` bits starting from bits ``100 + 2 = 102``. Alternatively, the bitfield struct member can be the following to access the same bits as the above: - * btf member bit offset 102, * btf member pointing to an int type, * the int type has ``BTF_INT_OFFSET() = 0`` and ``BTF_INT_BITS() = 4`` -- cgit v1.2.3 From 2291da5b4d305a6506d915d5bef85dedd7c94882 Mon Sep 17 00:00:00 2001 From: Dmitry Torokhov Date: Fri, 18 Jan 2019 13:37:40 -0800 Subject: [media] doc-rst: switch to new names for Full Screen/Aspect keys We defined better names for keys to activate full screen mode or change aspect ratio (while keeping the existing keycodes to avoid breaking userspace), so let's use them in the document. Signed-off-by: Dmitry Torokhov --- Documentation/media/uapi/rc/rc-tables.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/media/uapi/rc/rc-tables.rst b/Documentation/media/uapi/rc/rc-tables.rst index c8ae9479f842..57797e56f45e 100644 --- a/Documentation/media/uapi/rc/rc-tables.rst +++ b/Documentation/media/uapi/rc/rc-tables.rst @@ -616,7 +616,7 @@ the remote via /dev/input/event devices. - .. row 78 - - ``KEY_SCREEN`` + - ``KEY_ASPECT_RATIO`` - Select screen aspect ratio @@ -624,7 +624,7 @@ the remote via /dev/input/event devices. - .. row 79 - - ``KEY_ZOOM`` + - ``KEY_FULL_SCREEN`` - Put device into zoom/full screen mode -- cgit v1.2.3 From c4dcd89d20a8fe4009d25660c69396611328cc5e Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Fri, 22 Mar 2019 00:04:19 +0100 Subject: i2c: iop3xx: make bindings file name match the driver If we use the "i2c-" prefix for the binding documentation file name, then it should match the file name of the driver, if possible. It is possible for this driver, so rename it. Signed-off-by: Wolfram Sang --- Documentation/devicetree/bindings/i2c/i2c-iop3xx.txt | 20 ++++++++++++++++++++ Documentation/devicetree/bindings/i2c/i2c-xscale.txt | 20 -------------------- 2 files changed, 20 insertions(+), 20 deletions(-) create mode 100644 Documentation/devicetree/bindings/i2c/i2c-iop3xx.txt delete mode 100644 Documentation/devicetree/bindings/i2c/i2c-xscale.txt (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/i2c/i2c-iop3xx.txt b/Documentation/devicetree/bindings/i2c/i2c-iop3xx.txt new file mode 100644 index 000000000000..dcc8390e0d24 --- /dev/null +++ b/Documentation/devicetree/bindings/i2c/i2c-iop3xx.txt @@ -0,0 +1,20 @@ +i2c Controller on XScale platforms such as IOP3xx and IXP4xx + +Required properties: +- compatible : Must be one of + "intel,iop3xx-i2c" + "intel,ixp4xx-i2c"; +- reg +- #address-cells = <1>; +- #size-cells = <0>; + +Optional properties: +- Child nodes conforming to i2c bus binding + +Example: + +i2c@c8011000 { + compatible = "intel,ixp4xx-i2c"; + reg = <0xc8011000 0x18>; + interrupts = <33 IRQ_TYPE_LEVEL_LOW>; +}; diff --git a/Documentation/devicetree/bindings/i2c/i2c-xscale.txt b/Documentation/devicetree/bindings/i2c/i2c-xscale.txt deleted file mode 100644 index dcc8390e0d24..000000000000 --- a/Documentation/devicetree/bindings/i2c/i2c-xscale.txt +++ /dev/null @@ -1,20 +0,0 @@ -i2c Controller on XScale platforms such as IOP3xx and IXP4xx - -Required properties: -- compatible : Must be one of - "intel,iop3xx-i2c" - "intel,ixp4xx-i2c"; -- reg -- #address-cells = <1>; -- #size-cells = <0>; - -Optional properties: -- Child nodes conforming to i2c bus binding - -Example: - -i2c@c8011000 { - compatible = "intel,ixp4xx-i2c"; - reg = <0xc8011000 0x18>; - interrupts = <33 IRQ_TYPE_LEVEL_LOW>; -}; -- cgit v1.2.3 From 94c87527f4e1ebf85936f707ec84ff458f3bbb00 Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Fri, 22 Mar 2019 00:04:20 +0100 Subject: i2c: mt65xx: make bindings file name match the driver If we use the "i2c-" prefix for the binding documentation file name, then it should match the file name of the driver, if possible. It is possible for this driver, so rename it. Signed-off-by: Wolfram Sang --- .../devicetree/bindings/i2c/i2c-mt65xx.txt | 44 ++++++++++++++++++++++ Documentation/devicetree/bindings/i2c/i2c-mtk.txt | 44 ---------------------- 2 files changed, 44 insertions(+), 44 deletions(-) create mode 100644 Documentation/devicetree/bindings/i2c/i2c-mt65xx.txt delete mode 100644 Documentation/devicetree/bindings/i2c/i2c-mtk.txt (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/i2c/i2c-mt65xx.txt b/Documentation/devicetree/bindings/i2c/i2c-mt65xx.txt new file mode 100644 index 000000000000..ee4c32454198 --- /dev/null +++ b/Documentation/devicetree/bindings/i2c/i2c-mt65xx.txt @@ -0,0 +1,44 @@ +* MediaTek's I2C controller + +The MediaTek's I2C controller is used to interface with I2C devices. + +Required properties: + - compatible: value should be either of the following. + "mediatek,mt2701-i2c", "mediatek,mt6577-i2c": for MediaTek MT2701 + "mediatek,mt2712-i2c": for MediaTek MT2712 + "mediatek,mt6577-i2c": for MediaTek MT6577 + "mediatek,mt6589-i2c": for MediaTek MT6589 + "mediatek,mt7622-i2c": for MediaTek MT7622 + "mediatek,mt7623-i2c", "mediatek,mt6577-i2c": for MediaTek MT7623 + "mediatek,mt7629-i2c", "mediatek,mt2712-i2c": for MediaTek MT7629 + "mediatek,mt8173-i2c": for MediaTek MT8173 + - reg: physical base address of the controller and dma base, length of memory + mapped region. + - interrupts: interrupt number to the cpu. + - clock-div: the fixed value for frequency divider of clock source in i2c + module. Each IC may be different. + - clocks: clock name from clock manager + - clock-names: Must include "main" and "dma", if enable have-pmic need include + "pmic" extra. + +Optional properties: + - clock-frequency: Frequency in Hz of the bus when transfer, the default value + is 100000. + - mediatek,have-pmic: platform can control i2c form special pmic side. + Only mt6589 and mt8135 support this feature. + - mediatek,use-push-pull: IO config use push-pull mode. + +Example: + + i2c0: i2c@1100d000 { + compatible = "mediatek,mt6577-i2c"; + reg = <0x1100d000 0x70>, + <0x11000300 0x80>; + interrupts = ; + clock-frequency = <400000>; + mediatek,have-pmic; + clock-div = <16>; + clocks = <&i2c0_ck>, <&ap_dma_ck>; + clock-names = "main", "dma"; + }; + diff --git a/Documentation/devicetree/bindings/i2c/i2c-mtk.txt b/Documentation/devicetree/bindings/i2c/i2c-mtk.txt deleted file mode 100644 index ee4c32454198..000000000000 --- a/Documentation/devicetree/bindings/i2c/i2c-mtk.txt +++ /dev/null @@ -1,44 +0,0 @@ -* MediaTek's I2C controller - -The MediaTek's I2C controller is used to interface with I2C devices. - -Required properties: - - compatible: value should be either of the following. - "mediatek,mt2701-i2c", "mediatek,mt6577-i2c": for MediaTek MT2701 - "mediatek,mt2712-i2c": for MediaTek MT2712 - "mediatek,mt6577-i2c": for MediaTek MT6577 - "mediatek,mt6589-i2c": for MediaTek MT6589 - "mediatek,mt7622-i2c": for MediaTek MT7622 - "mediatek,mt7623-i2c", "mediatek,mt6577-i2c": for MediaTek MT7623 - "mediatek,mt7629-i2c", "mediatek,mt2712-i2c": for MediaTek MT7629 - "mediatek,mt8173-i2c": for MediaTek MT8173 - - reg: physical base address of the controller and dma base, length of memory - mapped region. - - interrupts: interrupt number to the cpu. - - clock-div: the fixed value for frequency divider of clock source in i2c - module. Each IC may be different. - - clocks: clock name from clock manager - - clock-names: Must include "main" and "dma", if enable have-pmic need include - "pmic" extra. - -Optional properties: - - clock-frequency: Frequency in Hz of the bus when transfer, the default value - is 100000. - - mediatek,have-pmic: platform can control i2c form special pmic side. - Only mt6589 and mt8135 support this feature. - - mediatek,use-push-pull: IO config use push-pull mode. - -Example: - - i2c0: i2c@1100d000 { - compatible = "mediatek,mt6577-i2c"; - reg = <0x1100d000 0x70>, - <0x11000300 0x80>; - interrupts = ; - clock-frequency = <400000>; - mediatek,have-pmic; - clock-div = <16>; - clocks = <&i2c0_ck>, <&ap_dma_ck>; - clock-names = "main", "dma"; - }; - -- cgit v1.2.3 From 0a96f9ffbfe9446b5c4c67461085236d578248e5 Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Fri, 22 Mar 2019 00:04:21 +0100 Subject: i2c: stu300: make bindings file name match the driver If we use the "i2c-" prefix for the binding documentation file name, then it should match the file name of the driver, if possible. It is possible for this driver, so rename it. Signed-off-by: Wolfram Sang --- Documentation/devicetree/bindings/i2c/i2c-st-ddci2c.txt | 15 --------------- Documentation/devicetree/bindings/i2c/i2c-stu300.txt | 15 +++++++++++++++ 2 files changed, 15 insertions(+), 15 deletions(-) delete mode 100644 Documentation/devicetree/bindings/i2c/i2c-st-ddci2c.txt create mode 100644 Documentation/devicetree/bindings/i2c/i2c-stu300.txt (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/i2c/i2c-st-ddci2c.txt b/Documentation/devicetree/bindings/i2c/i2c-st-ddci2c.txt deleted file mode 100644 index bd81a482634f..000000000000 --- a/Documentation/devicetree/bindings/i2c/i2c-st-ddci2c.txt +++ /dev/null @@ -1,15 +0,0 @@ -ST Microelectronics DDC I2C - -Required properties : -- compatible : Must be "st,ddci2c" -- reg: physical base address of the controller and length of memory mapped - region. -- interrupts: interrupt number to the cpu. -- #address-cells = <1>; -- #size-cells = <0>; - -Optional properties: -- Child nodes conforming to i2c bus binding - -Examples : - diff --git a/Documentation/devicetree/bindings/i2c/i2c-stu300.txt b/Documentation/devicetree/bindings/i2c/i2c-stu300.txt new file mode 100644 index 000000000000..bd81a482634f --- /dev/null +++ b/Documentation/devicetree/bindings/i2c/i2c-stu300.txt @@ -0,0 +1,15 @@ +ST Microelectronics DDC I2C + +Required properties : +- compatible : Must be "st,ddci2c" +- reg: physical base address of the controller and length of memory mapped + region. +- interrupts: interrupt number to the cpu. +- #address-cells = <1>; +- #size-cells = <0>; + +Optional properties: +- Child nodes conforming to i2c bus binding + +Examples : + -- cgit v1.2.3 From 45dfceb0d14a06eeffe22581eb2996f4ed5225ca Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Fri, 22 Mar 2019 00:04:22 +0100 Subject: i2c: sun6i-p2wi: make bindings file name match the driver If we use the "i2c-" prefix for the binding documentation file name, then it should match the file name of the driver, if possible. It is possible for this driver, so rename it. Signed-off-by: Wolfram Sang Acked-by: Maxime Ripard --- .../devicetree/bindings/i2c/i2c-sun6i-p2wi.txt | 41 ++++++++++++++++++++++ .../devicetree/bindings/i2c/i2c-sunxi-p2wi.txt | 41 ---------------------- 2 files changed, 41 insertions(+), 41 deletions(-) create mode 100644 Documentation/devicetree/bindings/i2c/i2c-sun6i-p2wi.txt delete mode 100644 Documentation/devicetree/bindings/i2c/i2c-sunxi-p2wi.txt (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/i2c/i2c-sun6i-p2wi.txt b/Documentation/devicetree/bindings/i2c/i2c-sun6i-p2wi.txt new file mode 100644 index 000000000000..49df0053347a --- /dev/null +++ b/Documentation/devicetree/bindings/i2c/i2c-sun6i-p2wi.txt @@ -0,0 +1,41 @@ + +* Allwinner P2WI (Push/Pull 2 Wire Interface) controller + +Required properties : + + - reg : Offset and length of the register set for the device. + - compatible : Should one of the following: + - "allwinner,sun6i-a31-p2wi" + - interrupts : The interrupt line connected to the P2WI peripheral. + - clocks : The gate clk connected to the P2WI peripheral. + - resets : The reset line connected to the P2WI peripheral. + +Optional properties : + + - clock-frequency : Desired P2WI bus clock frequency in Hz. If not set the +default frequency is 100kHz + +A P2WI may contain one child node encoding a P2WI slave device. + +Slave device properties: + Required properties: + - reg : the I2C slave address used during the initialization + process to switch from I2C to P2WI mode + +Example: + + p2wi@1f03400 { + compatible = "allwinner,sun6i-a31-p2wi"; + reg = <0x01f03400 0x400>; + interrupts = <0 39 4>; + clocks = <&apb0_gates 3>; + clock-frequency = <6000000>; + resets = <&apb0_rst 3>; + + axp221: pmic@68 { + compatible = "x-powers,axp221"; + reg = <0x68>; + + /* ... */ + }; + }; diff --git a/Documentation/devicetree/bindings/i2c/i2c-sunxi-p2wi.txt b/Documentation/devicetree/bindings/i2c/i2c-sunxi-p2wi.txt deleted file mode 100644 index 49df0053347a..000000000000 --- a/Documentation/devicetree/bindings/i2c/i2c-sunxi-p2wi.txt +++ /dev/null @@ -1,41 +0,0 @@ - -* Allwinner P2WI (Push/Pull 2 Wire Interface) controller - -Required properties : - - - reg : Offset and length of the register set for the device. - - compatible : Should one of the following: - - "allwinner,sun6i-a31-p2wi" - - interrupts : The interrupt line connected to the P2WI peripheral. - - clocks : The gate clk connected to the P2WI peripheral. - - resets : The reset line connected to the P2WI peripheral. - -Optional properties : - - - clock-frequency : Desired P2WI bus clock frequency in Hz. If not set the -default frequency is 100kHz - -A P2WI may contain one child node encoding a P2WI slave device. - -Slave device properties: - Required properties: - - reg : the I2C slave address used during the initialization - process to switch from I2C to P2WI mode - -Example: - - p2wi@1f03400 { - compatible = "allwinner,sun6i-a31-p2wi"; - reg = <0x01f03400 0x400>; - interrupts = <0 39 4>; - clocks = <&apb0_gates 3>; - clock-frequency = <6000000>; - resets = <&apb0_rst 3>; - - axp221: pmic@68 { - compatible = "x-powers,axp221"; - reg = <0x68>; - - /* ... */ - }; - }; -- cgit v1.2.3 From 080a910414659dfd18ffaf60a1e95b96c3f50eab Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Fri, 22 Mar 2019 00:04:23 +0100 Subject: i2c: wmt: make bindings file name match the driver If we use the "i2c-" prefix for the binding documentation file name, then it should match the file name of the driver, if possible. It is possible for this driver, so rename it. Signed-off-by: Wolfram Sang --- .../devicetree/bindings/i2c/i2c-vt8500.txt | 24 ---------------------- Documentation/devicetree/bindings/i2c/i2c-wmt.txt | 24 ++++++++++++++++++++++ 2 files changed, 24 insertions(+), 24 deletions(-) delete mode 100644 Documentation/devicetree/bindings/i2c/i2c-vt8500.txt create mode 100644 Documentation/devicetree/bindings/i2c/i2c-wmt.txt (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/i2c/i2c-vt8500.txt b/Documentation/devicetree/bindings/i2c/i2c-vt8500.txt deleted file mode 100644 index 94a425eaa6c7..000000000000 --- a/Documentation/devicetree/bindings/i2c/i2c-vt8500.txt +++ /dev/null @@ -1,24 +0,0 @@ -* Wondermedia I2C Controller - -Required properties : - - - compatible : should be "wm,wm8505-i2c" - - reg : Offset and length of the register set for the device - - interrupts : where IRQ is the interrupt number - - clocks : phandle to the I2C clock source - -Optional properties : - - - clock-frequency : desired I2C bus clock frequency in Hz. - Valid values are 100000 and 400000. - Default to 100000 if not specified, or invalid value. - -Example : - - i2c_0: i2c@d8280000 { - compatible = "wm,wm8505-i2c"; - reg = <0xd8280000 0x1000>; - interrupts = <19>; - clocks = <&clki2c0>; - clock-frequency = <400000>; - }; diff --git a/Documentation/devicetree/bindings/i2c/i2c-wmt.txt b/Documentation/devicetree/bindings/i2c/i2c-wmt.txt new file mode 100644 index 000000000000..94a425eaa6c7 --- /dev/null +++ b/Documentation/devicetree/bindings/i2c/i2c-wmt.txt @@ -0,0 +1,24 @@ +* Wondermedia I2C Controller + +Required properties : + + - compatible : should be "wm,wm8505-i2c" + - reg : Offset and length of the register set for the device + - interrupts : where IRQ is the interrupt number + - clocks : phandle to the I2C clock source + +Optional properties : + + - clock-frequency : desired I2C bus clock frequency in Hz. + Valid values are 100000 and 400000. + Default to 100000 if not specified, or invalid value. + +Example : + + i2c_0: i2c@d8280000 { + compatible = "wm,wm8505-i2c"; + reg = <0xd8280000 0x1000>; + interrupts = <19>; + clocks = <&clki2c0>; + clock-frequency = <400000>; + }; -- cgit v1.2.3 From 898a737c8a436b2fcd6dcb0b57775ada2f846a26 Mon Sep 17 00:00:00 2001 From: Erin Lo Date: Mon, 11 Mar 2019 16:54:31 +0800 Subject: dt-bindings: serial: Add compatible for Mediatek MT8183 This adds dt-binding documentation of uart for Mediatek MT8183 SoC Platform. Signed-off-by: Erin Lo Acked-by: Rob Herring Acked-by: Matthias Brugger Signed-off-by: Greg Kroah-Hartman --- Documentation/devicetree/bindings/serial/mtk-uart.txt | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/serial/mtk-uart.txt b/Documentation/devicetree/bindings/serial/mtk-uart.txt index 742cb470595b..bcfb13194f16 100644 --- a/Documentation/devicetree/bindings/serial/mtk-uart.txt +++ b/Documentation/devicetree/bindings/serial/mtk-uart.txt @@ -16,6 +16,7 @@ Required properties: * "mediatek,mt8127-uart" for MT8127 compatible UARTS * "mediatek,mt8135-uart" for MT8135 compatible UARTS * "mediatek,mt8173-uart" for MT8173 compatible UARTS + * "mediatek,mt8183-uart", "mediatek,mt6577-uart" for MT8183 compatible UARTS * "mediatek,mt6577-uart" for MT6577 and all of the above - reg: The base address of the UART register bank. -- cgit v1.2.3 From 7d6ab823d6461e60d211d4c8d89a13dce08b730d Mon Sep 17 00:00:00 2001 From: David Howells Date: Wed, 27 Mar 2019 22:53:31 +0000 Subject: vfs: Update mount API docs Update the mount API docs to reflect recent changes to the code. Signed-off-by: David Howells Signed-off-by: Linus Torvalds --- Documentation/filesystems/mount_api.txt | 367 +++++++++++++++++--------------- 1 file changed, 195 insertions(+), 172 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/mount_api.txt b/Documentation/filesystems/mount_api.txt index 944d1965e917..00ff0cfccfa7 100644 --- a/Documentation/filesystems/mount_api.txt +++ b/Documentation/filesystems/mount_api.txt @@ -12,11 +12,13 @@ CONTENTS (4) Filesystem context security. - (5) VFS filesystem context operations. + (5) VFS filesystem context API. - (6) Parameter description. + (6) Superblock creation helpers. - (7) Parameter helper functions. + (7) Parameter description. + + (8) Parameter helper functions. ======== @@ -41,12 +43,15 @@ The creation of new mounts is now to be done in a multistep process: (7) Destroy the context. -To support this, the file_system_type struct gains a new field: +To support this, the file_system_type struct gains two new fields: int (*init_fs_context)(struct fs_context *fc); + const struct fs_parameter_description *parameters; -which is invoked to set up the filesystem-specific parts of a filesystem -context, including the additional space. +The first is invoked to set up the filesystem-specific parts of a filesystem +context, including the additional space, and the second points to the +parameter description for validation at registration time and querying by a +future system call. Note that security initialisation is done *after* the filesystem is called so that the namespaces may be adjusted first. @@ -73,9 +78,9 @@ context. This is represented by the fs_context structure: void *s_fs_info; unsigned int sb_flags; unsigned int sb_flags_mask; + unsigned int s_iflags; + unsigned int lsm_flags; enum fs_context_purpose purpose:8; - bool sloppy:1; - bool silent:1; ... }; @@ -141,6 +146,10 @@ The fs_context fields are as follows: Which bits SB_* flags are to be set/cleared in super_block::s_flags. + (*) unsigned int s_iflags + + These will be bitwise-OR'd with s->s_iflags when a superblock is created. + (*) enum fs_context_purpose This indicates the purpose for which the context is intended. The @@ -150,17 +159,6 @@ The fs_context fields are as follows: FS_CONTEXT_FOR_SUBMOUNT -- New automatic submount of extant mount FS_CONTEXT_FOR_RECONFIGURE -- Change an existing mount - (*) bool sloppy - (*) bool silent - - These are set if the sloppy or silent mount options are given. - - [NOTE] sloppy is probably unnecessary when userspace passes over one - option at a time since the error can just be ignored if userspace deems it - to be unimportant. - - [NOTE] silent is probably redundant with sb_flags & SB_SILENT. - The mount context is created by calling vfs_new_fs_context() or vfs_dup_fs_context() and is destroyed with put_fs_context(). Note that the structure is not refcounted. @@ -342,28 +340,47 @@ number of operations used by the new mount code for this purpose: It should return 0 on success or a negative error code on failure. -================================= -VFS FILESYSTEM CONTEXT OPERATIONS -================================= +========================== +VFS FILESYSTEM CONTEXT API +========================== -There are four operations for creating a filesystem context and -one for destroying a context: +There are four operations for creating a filesystem context and one for +destroying a context: - (*) struct fs_context *vfs_new_fs_context(struct file_system_type *fs_type, - struct dentry *reference, - unsigned int sb_flags, - unsigned int sb_flags_mask, - enum fs_context_purpose purpose); + (*) struct fs_context *fs_context_for_mount( + struct file_system_type *fs_type, + unsigned int sb_flags); - Create a filesystem context for a given filesystem type and purpose. This - allocates the filesystem context, sets the superblock flags, initialises - the security and calls fs_type->init_fs_context() to initialise the - filesystem private data. + Allocate a filesystem context for the purpose of setting up a new mount, + whether that be with a new superblock or sharing an existing one. This + sets the superblock flags, initialises the security and calls + fs_type->init_fs_context() to initialise the filesystem private data. - reference can be NULL or it may indicate the root dentry of a superblock - that is going to be reconfigured (FS_CONTEXT_FOR_RECONFIGURE) or - the automount point that triggered a submount (FS_CONTEXT_FOR_SUBMOUNT). - This is provided as a source of namespace information. + fs_type specifies the filesystem type that will manage the context and + sb_flags presets the superblock flags stored therein. + + (*) struct fs_context *fs_context_for_reconfigure( + struct dentry *dentry, + unsigned int sb_flags, + unsigned int sb_flags_mask); + + Allocate a filesystem context for the purpose of reconfiguring an + existing superblock. dentry provides a reference to the superblock to be + configured. sb_flags and sb_flags_mask indicate which superblock flags + need changing and to what. + + (*) struct fs_context *fs_context_for_submount( + struct file_system_type *fs_type, + struct dentry *reference); + + Allocate a filesystem context for the purpose of creating a new mount for + an automount point or other derived superblock. fs_type specifies the + filesystem type that will manage the context and the reference dentry + supplies the parameters. Namespaces are propagated from the reference + dentry's superblock also. + + Note that it's not a requirement that the reference dentry be of the same + filesystem type as fs_type. (*) struct fs_context *vfs_dup_fs_context(struct fs_context *src_fc); @@ -390,20 +407,6 @@ context pointer or a negative error code. For the remaining operations, if an error occurs, a negative error code will be returned. - (*) int vfs_get_tree(struct fs_context *fc); - - Get or create the mountable root and superblock, using the parameters in - the filesystem context to select/configure the superblock. This invokes - the ->validate() op and then the ->get_tree() op. - - [NOTE] ->validate() could perhaps be rolled into ->get_tree() and - ->reconfigure(). - - (*) struct vfsmount *vfs_create_mount(struct fs_context *fc); - - Create a mount given the parameters in the specified filesystem context. - Note that this does not attach the mount to anything. - (*) int vfs_parse_fs_param(struct fs_context *fc, struct fs_parameter *param); @@ -432,17 +435,80 @@ returned. clear the pointer, but then becomes responsible for disposing of the object. - (*) int vfs_parse_fs_string(struct fs_context *fc, char *key, + (*) int vfs_parse_fs_string(struct fs_context *fc, const char *key, const char *value, size_t v_size); - A wrapper around vfs_parse_fs_param() that just passes a constant string. + A wrapper around vfs_parse_fs_param() that copies the value string it is + passed. (*) int generic_parse_monolithic(struct fs_context *fc, void *data); Parse a sys_mount() data page, assuming the form to be a text list consisting of key[=val] options separated by commas. Each item in the list is passed to vfs_mount_option(). This is the default when the - ->parse_monolithic() operation is NULL. + ->parse_monolithic() method is NULL. + + (*) int vfs_get_tree(struct fs_context *fc); + + Get or create the mountable root and superblock, using the parameters in + the filesystem context to select/configure the superblock. This invokes + the ->get_tree() method. + + (*) struct vfsmount *vfs_create_mount(struct fs_context *fc); + + Create a mount given the parameters in the specified filesystem context. + Note that this does not attach the mount to anything. + + +=========================== +SUPERBLOCK CREATION HELPERS +=========================== + +A number of VFS helpers are available for use by filesystems for the creation +or looking up of superblocks. + + (*) struct super_block * + sget_fc(struct fs_context *fc, + int (*test)(struct super_block *sb, struct fs_context *fc), + int (*set)(struct super_block *sb, struct fs_context *fc)); + + This is the core routine. If test is non-NULL, it searches for an + existing superblock matching the criteria held in the fs_context, using + the test function to match them. If no match is found, a new superblock + is created and the set function is called to set it up. + + Prior to the set function being called, fc->s_fs_info will be transferred + to sb->s_fs_info - and fc->s_fs_info will be cleared if set returns + success (ie. 0). + +The following helpers all wrap sget_fc(): + + (*) int vfs_get_super(struct fs_context *fc, + enum vfs_get_super_keying keying, + int (*fill_super)(struct super_block *sb, + struct fs_context *fc)) + + This creates/looks up a deviceless superblock. The keying indicates how + many superblocks of this type may exist and in what manner they may be + shared: + + (1) vfs_get_single_super + + Only one such superblock may exist in the system. Any further + attempt to get a new superblock gets this one (and any parameter + differences are ignored). + + (2) vfs_get_keyed_super + + Multiple superblocks of this type may exist and they're keyed on + their s_fs_info pointer (for example this may refer to a + namespace). + + (3) vfs_get_independent_super + + Multiple independent superblocks of this type may exist. This + function never matches an existing one and always creates a new + one. ===================== @@ -454,35 +520,22 @@ There's a core description struct that links everything together: struct fs_parameter_description { const char name[16]; - u8 nr_params; - u8 nr_alt_keys; - u8 nr_enums; - bool ignore_unknown; - bool no_source; - const char *const *keys; - const struct constant_table *alt_keys; const struct fs_parameter_spec *specs; const struct fs_parameter_enum *enums; }; For example: - enum afs_param { + enum { Opt_autocell, Opt_bar, Opt_dyn, Opt_foo, Opt_source, - nr__afs_params }; static const struct fs_parameter_description afs_fs_parameters = { .name = "kAFS", - .nr_params = nr__afs_params, - .nr_alt_keys = ARRAY_SIZE(afs_param_alt_keys), - .nr_enums = ARRAY_SIZE(afs_param_enums), - .keys = afs_param_keys, - .alt_keys = afs_param_alt_keys, .specs = afs_param_specs, .enums = afs_param_enums, }; @@ -494,28 +547,24 @@ The members are as follows: The name to be used in error messages generated by the parse helper functions. - (2) u8 nr_params; - - The number of discrete parameter identifiers. This indicates the number - of elements in the ->types[] array and also limits the values that may be - used in the values that the ->keys[] array maps to. - - It is expected that, for example, two parameters that are related, say - "acl" and "noacl" with have the same ID, but will be flagged to indicate - that one is the inverse of the other. The value can then be picked out - from the parse result. + (2) const struct fs_parameter_specification *specs; - (3) const struct fs_parameter_specification *specs; + Table of parameter specifications, terminated with a null entry, where the + entries are of type: - Table of parameter specifications, where the entries are of type: - - struct fs_parameter_type { - enum fs_parameter_spec type:8; - u8 flags; + struct fs_parameter_spec { + const char *name; + u8 opt; + enum fs_parameter_type type:8; + unsigned short flags; }; - and the parameter identifier is the index to the array. 'type' indicates - the desired value type and must be one of: + The 'name' field is a string to match exactly to the parameter key (no + wildcards, patterns and no case-independence) and 'opt' is the value that + will be returned by the fs_parser() function in the case of a successful + match. + + The 'type' field indicates the desired value type and must be one of: TYPE NAME EXPECTED VALUE RESULT IN ======================= ======================= ===================== @@ -525,85 +574,65 @@ The members are as follows: fs_param_is_u32_octal 32-bit octal int result->uint_32 fs_param_is_u32_hex 32-bit hex int result->uint_32 fs_param_is_s32 32-bit signed int result->int_32 + fs_param_is_u64 64-bit unsigned int result->uint_64 fs_param_is_enum Enum value name result->uint_32 fs_param_is_string Arbitrary string param->string fs_param_is_blob Binary blob param->blob fs_param_is_blockdev Blockdev path * Needs lookup fs_param_is_path Path * Needs lookup - fs_param_is_fd File descriptor param->file - - And each parameter can be qualified with 'flags': - - fs_param_v_optional The value is optional - fs_param_neg_with_no If key name is prefixed with "no", it is false - fs_param_neg_with_empty If value is "", it is false - fs_param_deprecated The parameter is deprecated. - - For example: - - static const struct fs_parameter_spec afs_param_specs[nr__afs_params] = { - [Opt_autocell] = { fs_param_is flag }, - [Opt_bar] = { fs_param_is_enum }, - [Opt_dyn] = { fs_param_is flag }, - [Opt_foo] = { fs_param_is_bool, fs_param_neg_with_no }, - [Opt_source] = { fs_param_is_string }, - }; + fs_param_is_fd File descriptor result->int_32 Note that if the value is of fs_param_is_bool type, fs_parse() will try to match any string value against "0", "1", "no", "yes", "false", "true". - [!] NOTE that the table must be sorted according to primary key name so - that ->keys[] is also sorted. - - (4) const char *const *keys; - - Table of primary key names for the parameters. There must be one entry - per defined parameter. The table is optional if ->nr_params is 0. The - table is just an array of names e.g.: + Each parameter can also be qualified with 'flags': - static const char *const afs_param_keys[nr__afs_params] = { - [Opt_autocell] = "autocell", - [Opt_bar] = "bar", - [Opt_dyn] = "dyn", - [Opt_foo] = "foo", - [Opt_source] = "source", - }; - - [!] NOTE that the table must be sorted such that the table can be searched - with bsearch() using strcmp(). This means that the Opt_* values must - correspond to the entries in this table. - - (5) const struct constant_table *alt_keys; - u8 nr_alt_keys; - - Table of additional key names and their mappings to parameter ID plus the - number of elements in the table. This is optional. The table is just an - array of { name, integer } pairs, e.g.: + fs_param_v_optional The value is optional + fs_param_neg_with_no result->negated set if key is prefixed with "no" + fs_param_neg_with_empty result->negated set if value is "" + fs_param_deprecated The parameter is deprecated. - static const struct constant_table afs_param_keys[] = { - { "baz", Opt_bar }, - { "dynamic", Opt_dyn }, + These are wrapped with a number of convenience wrappers: + + MACRO SPECIFIES + ======================= =============================================== + fsparam_flag() fs_param_is_flag + fsparam_flag_no() fs_param_is_flag, fs_param_neg_with_no + fsparam_bool() fs_param_is_bool + fsparam_u32() fs_param_is_u32 + fsparam_u32oct() fs_param_is_u32_octal + fsparam_u32hex() fs_param_is_u32_hex + fsparam_s32() fs_param_is_s32 + fsparam_u64() fs_param_is_u64 + fsparam_enum() fs_param_is_enum + fsparam_string() fs_param_is_string + fsparam_blob() fs_param_is_blob + fsparam_bdev() fs_param_is_blockdev + fsparam_path() fs_param_is_path + fsparam_fd() fs_param_is_fd + + all of which take two arguments, name string and option number - for + example: + + static const struct fs_parameter_spec afs_param_specs[] = { + fsparam_flag ("autocell", Opt_autocell), + fsparam_flag ("dyn", Opt_dyn), + fsparam_string ("source", Opt_source), + fsparam_flag_no ("foo", Opt_foo), + {} }; - [!] NOTE that the table must be sorted such that strcmp() can be used with - bsearch() to search the entries. - - The parameter ID can also be fs_param_key_removed to indicate that a - deprecated parameter has been removed and that an error will be given. - This differs from fs_param_deprecated where the parameter may still have - an effect. - - Further, the behaviour of the parameter may differ when an alternate name - is used (for instance with NFS, "v3", "v4.2", etc. are alternate names). + An addition macro, __fsparam() is provided that takes an additional pair + of arguments to specify the type and the flags for anything that doesn't + match one of the above macros. (6) const struct fs_parameter_enum *enums; - u8 nr_enums; - Table of enum value names to integer mappings and the number of elements - stored therein. This is of type: + Table of enum value names to integer mappings, terminated with a null + entry. This is of type: struct fs_parameter_enum { - u8 param_id; + u8 opt; char name[14]; u8 value; }; @@ -621,11 +650,6 @@ The members are as follows: try to look the value up in the enum table and the result will be stored in the parse result. - (7) bool no_source; - - If this is set, fs_parse() will ignore any "source" parameter and not - pass it to the filesystem. - The parser should be pointed to by the parser pointer in the file_system_type struct as this will provide validation on registration (if CONFIG_VALIDATE_FS_PARSER=y) and will allow the description to be queried from @@ -650,9 +674,8 @@ process the parameters it is given. int value; }; - and it must be sorted such that it can be searched using bsearch() using - strcmp(). If a match is found, the corresponding value is returned. If a - match isn't found, the not_found value is returned instead. + If a match is found, the corresponding value is returned. If a match + isn't found, the not_found value is returned instead. (*) bool validate_constant_table(const struct constant_table *tbl, size_t tbl_size, @@ -665,36 +688,36 @@ process the parameters it is given. should just be set to lie inside the low-to-high range. If all is good, true is returned. If the table is invalid, errors are - logged to dmesg, the stack is dumped and false is returned. + logged to dmesg and false is returned. + + (*) bool fs_validate_description(const struct fs_parameter_description *desc); + + This performs some validation checks on a parameter description. It + returns true if the description is good and false if it is not. It will + log errors to dmesg if validation fails. (*) int fs_parse(struct fs_context *fc, - const struct fs_param_parser *parser, + const struct fs_parameter_description *desc, struct fs_parameter *param, - struct fs_param_parse_result *result); + struct fs_parse_result *result); This is the main interpreter of parameters. It uses the parameter - description (parser) to look up the name of the parameter to use and to - convert that to a parameter ID (stored in result->key). + description to look up a parameter by key name and to convert that to an + option number (which it returns). If successful, and if the parameter type indicates the result is a boolean, integer or enum type, the value is converted by this function and - the result stored in result->{boolean,int_32,uint_32}. + the result stored in result->{boolean,int_32,uint_32,uint_64}. If a match isn't initially made, the key is prefixed with "no" and no value is present then an attempt will be made to look up the key with the prefix removed. If this matches a parameter for which the type has flag - fs_param_neg_with_no set, then a match will be made and the value will be - set to false/0/NULL. - - If the parameter is successfully matched and, optionally, parsed - correctly, 1 is returned. If the parameter isn't matched and - parser->ignore_unknown is set, then 0 is returned. Otherwise -EINVAL is - returned. - - (*) bool fs_validate_description(const struct fs_parameter_description *desc); + fs_param_neg_with_no set, then a match will be made and result->negated + will be set to true. - This is validates the parameter description. It returns true if the - description is good and false if it is not. + If the parameter isn't matched, -ENOPARAM will be returned; if the + parameter is matched, but the value is erroneous, -EINVAL will be + returned; otherwise the parameter's option number will be returned. (*) int fs_lookup_param(struct fs_context *fc, struct fs_parameter *value, -- cgit v1.2.3 From 47c42e6b4192a2ac8b6c9858ebcf400a9eff7a10 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Thu, 7 Mar 2019 15:27:44 -0800 Subject: KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size' The cr4_pae flag is a bit of a misnomer, its purpose is really to track whether the guest PTE that is being shadowed is a 4-byte entry or an 8-byte entry. Prior to supporting nested EPT, the size of the gpte was reflected purely by CR4.PAE. KVM fudged things a bit for direct sptes, but it was mostly harmless since the size of the gpte never mattered. Now that a spte may be tracking an indirect EPT entry, relying on CR4.PAE is wrong and ill-named. For direct shadow pages, force the gpte_size to '1' as they are always 8-byte entries; EPT entries can only be 8-bytes and KVM always uses 8-byte entries for NPT and its identity map (when running with EPT but not unrestricted guest). Likewise, nested EPT entries are always 8-bytes. Nested EPT presents a unique scenario as the size of the entries are not dictated by CR4.PAE, but neither is the shadow page a direct map. To handle this scenario, set cr0_wp=1 and smap_andnot_wp=1, an otherwise impossible combination, to denote a nested EPT shadow page. Use the information to avoid incorrectly zapping an unsync'd indirect page in __kvm_sync_page(). Providing a consistent and accurate gpte_size fixes a bug reported by Vitaly where fast_cr3_switch() always fails when switching from L2 to L1 as kvm_mmu_get_page() would force role.cr4_pae=0 for direct pages, whereas kvm_calc_mmu_role_common() would set it according to CR4.PAE. Fixes: 7dcd575520082 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed") Reported-by: Vitaly Kuznetsov Tested-by: Vitaly Kuznetsov Signed-off-by: Sean Christopherson Signed-off-by: Paolo Bonzini --- Documentation/virtual/kvm/mmu.txt | 11 +++++++---- arch/x86/include/asm/kvm_host.h | 4 ++-- arch/x86/kvm/mmu.c | 38 ++++++++++++++++++++++++-------------- arch/x86/kvm/mmutrace.h | 4 ++-- 4 files changed, 35 insertions(+), 22 deletions(-) (limited to 'Documentation') diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt index f365102c80f5..2efe0efc516e 100644 --- a/Documentation/virtual/kvm/mmu.txt +++ b/Documentation/virtual/kvm/mmu.txt @@ -142,7 +142,7 @@ Shadow pages contain the following information: If clear, this page corresponds to a guest page table denoted by the gfn field. role.quadrant: - When role.cr4_pae=0, the guest uses 32-bit gptes while the host uses 64-bit + When role.gpte_is_8_bytes=0, the guest uses 32-bit gptes while the host uses 64-bit sptes. That means a guest page table contains more ptes than the host, so multiple shadow pages are needed to shadow one guest page. For first-level shadow pages, role.quadrant can be 0 or 1 and denotes the @@ -158,9 +158,9 @@ Shadow pages contain the following information: The page is invalid and should not be used. It is a root page that is currently pinned (by a cpu hardware register pointing to it); once it is unpinned it will be destroyed. - role.cr4_pae: - Contains the value of cr4.pae for which the page is valid (e.g. whether - 32-bit or 64-bit gptes are in use). + role.gpte_is_8_bytes: + Reflects the size of the guest PTE for which the page is valid, i.e. '1' + if 64-bit gptes are in use, '0' if 32-bit gptes are in use. role.nxe: Contains the value of efer.nxe for which the page is valid. role.cr0_wp: @@ -173,6 +173,9 @@ Shadow pages contain the following information: Contains the value of cr4.smap && !cr0.wp for which the page is valid (pages for which this is true are different from other pages; see the treatment of cr0.wp=0 below). + role.ept_sp: + This is a virtual flag to denote a shadowed nested EPT page. ept_sp + is true if "cr0_wp && smap_andnot_wp", an otherwise invalid combination. role.smm: Is 1 if the page is valid in system management mode. This field determines which of the kvm_memslots array was used to build this diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index a5db4475e72d..88f5192ce05e 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -253,14 +253,14 @@ struct kvm_mmu_memory_cache { * kvm_memory_slot.arch.gfn_track which is 16 bits, so the role bits used * by indirect shadow page can not be more than 15 bits. * - * Currently, we used 14 bits that are @level, @cr4_pae, @quadrant, @access, + * Currently, we used 14 bits that are @level, @gpte_is_8_bytes, @quadrant, @access, * @nxe, @cr0_wp, @smep_andnot_wp and @smap_andnot_wp. */ union kvm_mmu_page_role { u32 word; struct { unsigned level:4; - unsigned cr4_pae:1; + unsigned gpte_is_8_bytes:1; unsigned quadrant:2; unsigned direct:1; unsigned access:3; diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 01bb090aaa5c..776a58b00682 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -182,7 +182,7 @@ struct kvm_shadow_walk_iterator { static const union kvm_mmu_page_role mmu_base_role_mask = { .cr0_wp = 1, - .cr4_pae = 1, + .gpte_is_8_bytes = 1, .nxe = 1, .smep_andnot_wp = 1, .smap_andnot_wp = 1, @@ -2205,6 +2205,7 @@ static bool kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp, static void kvm_mmu_commit_zap_page(struct kvm *kvm, struct list_head *invalid_list); + #define for_each_valid_sp(_kvm, _sp, _gfn) \ hlist_for_each_entry(_sp, \ &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)], hash_link) \ @@ -2215,12 +2216,17 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm, for_each_valid_sp(_kvm, _sp, _gfn) \ if ((_sp)->gfn != (_gfn) || (_sp)->role.direct) {} else +static inline bool is_ept_sp(struct kvm_mmu_page *sp) +{ + return sp->role.cr0_wp && sp->role.smap_andnot_wp; +} + /* @sp->gfn should be write-protected at the call site */ static bool __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, struct list_head *invalid_list) { - if (sp->role.cr4_pae != !!is_pae(vcpu) - || vcpu->arch.mmu->sync_page(vcpu, sp) == 0) { + if ((!is_ept_sp(sp) && sp->role.gpte_is_8_bytes != !!is_pae(vcpu)) || + vcpu->arch.mmu->sync_page(vcpu, sp) == 0) { kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list); return false; } @@ -2423,7 +2429,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, role.level = level; role.direct = direct; if (role.direct) - role.cr4_pae = 0; + role.gpte_is_8_bytes = true; role.access = access; if (!vcpu->arch.mmu->direct_map && vcpu->arch.mmu->root_level <= PT32_ROOT_LEVEL) { @@ -4794,7 +4800,6 @@ static union kvm_mmu_role kvm_calc_mmu_role_common(struct kvm_vcpu *vcpu, role.base.access = ACC_ALL; role.base.nxe = !!is_nx(vcpu); - role.base.cr4_pae = !!is_pae(vcpu); role.base.cr0_wp = is_write_protection(vcpu); role.base.smm = is_smm(vcpu); role.base.guest_mode = is_guest_mode(vcpu); @@ -4815,6 +4820,7 @@ kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu, bool base_only) role.base.ad_disabled = (shadow_accessed_mask == 0); role.base.level = kvm_x86_ops->get_tdp_level(vcpu); role.base.direct = true; + role.base.gpte_is_8_bytes = true; return role; } @@ -4879,6 +4885,7 @@ kvm_calc_shadow_mmu_root_page_role(struct kvm_vcpu *vcpu, bool base_only) role.base.smap_andnot_wp = role.ext.cr4_smap && !is_write_protection(vcpu); role.base.direct = !is_paging(vcpu); + role.base.gpte_is_8_bytes = !!is_pae(vcpu); if (!is_long_mode(vcpu)) role.base.level = PT32E_ROOT_LEVEL; @@ -4919,21 +4926,24 @@ kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty, bool execonly) { union kvm_mmu_role role = {0}; - union kvm_mmu_page_role root_base = vcpu->arch.root_mmu.mmu_role.base; - /* Legacy paging and SMM flags are inherited from root_mmu */ - role.base.smm = root_base.smm; - role.base.nxe = root_base.nxe; - role.base.cr0_wp = root_base.cr0_wp; - role.base.smep_andnot_wp = root_base.smep_andnot_wp; - role.base.smap_andnot_wp = root_base.smap_andnot_wp; + /* SMM flag is inherited from root_mmu */ + role.base.smm = vcpu->arch.root_mmu.mmu_role.base.smm; role.base.level = PT64_ROOT_4LEVEL; + role.base.gpte_is_8_bytes = true; role.base.direct = false; role.base.ad_disabled = !accessed_dirty; role.base.guest_mode = true; role.base.access = ACC_ALL; + /* + * WP=1 and NOT_WP=1 is an impossible combination, use WP and the + * SMAP variation to denote shadow EPT entries. + */ + role.base.cr0_wp = true; + role.base.smap_andnot_wp = true; + role.ext = kvm_calc_mmu_role_ext(vcpu); role.ext.execonly = execonly; @@ -5184,7 +5194,7 @@ static bool detect_write_misaligned(struct kvm_mmu_page *sp, gpa_t gpa, gpa, bytes, sp->role.word); offset = offset_in_page(gpa); - pte_size = sp->role.cr4_pae ? 8 : 4; + pte_size = sp->role.gpte_is_8_bytes ? 8 : 4; /* * Sometimes, the OS only writes the last one bytes to update status @@ -5208,7 +5218,7 @@ static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte) page_offset = offset_in_page(gpa); level = sp->role.level; *nspte = 1; - if (!sp->role.cr4_pae) { + if (!sp->role.gpte_is_8_bytes) { page_offset <<= 1; /* 32->64 */ /* * A 32-bit pde maps 4MB while the shadow pdes map diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h index 9f6c855a0043..dd30dccd2ad5 100644 --- a/arch/x86/kvm/mmutrace.h +++ b/arch/x86/kvm/mmutrace.h @@ -29,10 +29,10 @@ \ role.word = __entry->role; \ \ - trace_seq_printf(p, "sp gfn %llx l%u%s q%u%s %s%s" \ + trace_seq_printf(p, "sp gfn %llx l%u %u-byte q%u%s %s%s" \ " %snxe %sad root %u %s%c", \ __entry->gfn, role.level, \ - role.cr4_pae ? " pae" : "", \ + role.gpte_is_8_bytes ? 8 : 4, \ role.quadrant, \ role.direct ? " direct" : "", \ access_str[role.access], \ -- cgit v1.2.3 From 5e124900c6ebf5dfbe31b7b67073a64b2da14967 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Fri, 15 Feb 2019 12:48:38 -0800 Subject: KVM: doc: Fix incorrect word ordering regarding supported use of APIs Per Paolo[1], instantiating multiple VMs in a single process is legal; but this conflicts with KVM's API documentation, which states: The only supported use is one virtual machine per process, and one vcpu per thread. However, an earlier section in the documentation states: Only run VM ioctls from the same process (address space) that was used to create the VM. and: Only run vcpu ioctls from the same thread that was used to create the vcpu. This suggests that the conflicting documentation is simply an incorrect ordering of of words, i.e. what's really meant is that a virtual machine can't be shared across multiple processes and a vCPU can't be shared across multiple threads. Tweak the blurb on issuing ioctls to use a more assertive tone, and rewrite the "supported use" sentence to reference said blurb instead of poorly restating it in different terms. Opportunistically add missing punctuation. [1] https://lkml.kernel.org/r/f23265d4-528e-3bd4-011f-4d7b8f3281db@redhat.com Fixes: 9c1b96e34717 ("KVM: Document basic API") Signed-off-by: Sean Christopherson [Improve notes on asynchronous ioctl] Signed-off-by: Paolo Bonzini --- Documentation/virtual/kvm/api.txt | 28 ++++++++++++++++------------ 1 file changed, 16 insertions(+), 12 deletions(-) (limited to 'Documentation') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 7de9eee73fcd..158805532083 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -5,24 +5,26 @@ The Definitive KVM (Kernel-based Virtual Machine) API Documentation ---------------------- The kvm API is a set of ioctls that are issued to control various aspects -of a virtual machine. The ioctls belong to three classes +of a virtual machine. The ioctls belong to three classes: - System ioctls: These query and set global attributes which affect the whole kvm subsystem. In addition a system ioctl is used to create - virtual machines + virtual machines. - VM ioctls: These query and set attributes that affect an entire virtual machine, for example memory layout. In addition a VM ioctl is used to create virtual cpus (vcpus). - Only run VM ioctls from the same process (address space) that was used - to create the VM. + VM ioctls must be issued from the same process (address space) that was + used to create the VM. - vcpu ioctls: These query and set attributes that control the operation of a single virtual cpu. - Only run vcpu ioctls from the same thread that was used to create the - vcpu. + vcpu ioctls should be issued from the same thread that was used to create + the vcpu, except for asynchronous vcpu ioctl that are marked as such in + the documentation. Otherwise, the first ioctl after switching threads + could see a performance impact. 2. File descriptors @@ -41,9 +43,8 @@ In general file descriptors can be migrated among processes by means of fork() and the SCM_RIGHTS facility of unix domain socket. These kinds of tricks are explicitly not supported by kvm. While they will not cause harm to the host, their actual behavior is not guaranteed by -the API. The only supported use is one virtual machine per process, -and one vcpu per thread. - +the API. See "General description" for details on the ioctl usage +model that is supported by KVM. It is important to note that althought VM ioctls may only be issued from the process that created the VM, a VM's lifecycle is associated with its @@ -515,11 +516,15 @@ c) KVM_INTERRUPT_SET_LEVEL Note that any value for 'irq' other than the ones stated above is invalid and incurs unexpected behavior. +This is an asynchronous vcpu ioctl and can be invoked from any thread. + MIPS: Queues an external interrupt to be injected into the virtual CPU. A negative interrupt number dequeues the interrupt. +This is an asynchronous vcpu ioctl and can be invoked from any thread. + 4.17 KVM_DEBUG_GUEST @@ -2493,7 +2498,7 @@ KVM_S390_MCHK (vm, vcpu) - machine check interrupt; cr 14 bits in parm, machine checks needing further payload are not supported by this ioctl) -Note that the vcpu ioctl is asynchronous to vcpu execution. +This is an asynchronous vcpu ioctl and can be invoked from any thread. 4.78 KVM_PPC_GET_HTAB_FD @@ -3042,8 +3047,7 @@ KVM_S390_INT_EMERGENCY - sigp emergency; parameters in .emerg KVM_S390_INT_EXTERNAL_CALL - sigp external call; parameters in .extcall KVM_S390_MCHK - machine check interrupt; parameters in .mchk - -Note that the vcpu ioctl is asynchronous to vcpu execution. +This is an asynchronous vcpu ioctl and can be invoked from any thread. 4.94 KVM_S390_GET_IRQ_STATE -- cgit v1.2.3 From ddba91801aeb5c160b660caed1800eb3aef403f8 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Fri, 15 Feb 2019 12:48:39 -0800 Subject: KVM: Reject device ioctls from processes other than the VM's creator KVM's API requires thats ioctls must be issued from the same process that created the VM. In other words, userspace can play games with a VM's file descriptors, e.g. fork(), SCM_RIGHTS, etc..., but only the creator can do anything useful. Explicitly reject device ioctls that are issued by a process other than the VM's creator, and update KVM's API documentation to extend its requirements to device ioctls. Fixes: 852b6d57dc7f ("kvm: add device control API") Cc: Signed-off-by: Sean Christopherson Signed-off-by: Paolo Bonzini --- Documentation/virtual/kvm/api.txt | 16 +++++++++++----- virt/kvm/kvm_main.c | 3 +++ 2 files changed, 14 insertions(+), 5 deletions(-) (limited to 'Documentation') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 158805532083..88ecbc7645db 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -13,7 +13,7 @@ of a virtual machine. The ioctls belong to three classes: - VM ioctls: These query and set attributes that affect an entire virtual machine, for example memory layout. In addition a VM ioctl is used to - create virtual cpus (vcpus). + create virtual cpus (vcpus) and devices. VM ioctls must be issued from the same process (address space) that was used to create the VM. @@ -26,6 +26,11 @@ of a virtual machine. The ioctls belong to three classes: the documentation. Otherwise, the first ioctl after switching threads could see a performance impact. + - device ioctls: These query and set attributes that control the operation + of a single device. + + device ioctls must be issued from the same process (address space) that + was used to create the VM. 2. File descriptors ------------------- @@ -34,10 +39,11 @@ The kvm API is centered around file descriptors. An initial open("/dev/kvm") obtains a handle to the kvm subsystem; this handle can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this handle will create a VM file descriptor which can be used to issue VM -ioctls. A KVM_CREATE_VCPU ioctl on a VM fd will create a virtual cpu -and return a file descriptor pointing to it. Finally, ioctls on a vcpu -fd can be used to control the vcpu, including the important task of -actually running guest code. +ioctls. A KVM_CREATE_VCPU or KVM_CREATE_DEVICE ioctl on a VM fd will +create a virtual cpu or device and return a file descriptor pointing to +the new resource. Finally, ioctls on a vcpu or device fd can be used +to control the vcpu or device. For vcpus, this includes the important +task of actually running guest code. In general file descriptors can be migrated among processes by means of fork() and the SCM_RIGHTS facility of unix domain socket. These diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f25aa98a94df..55fe8e20d8fd 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2905,6 +2905,9 @@ static long kvm_device_ioctl(struct file *filp, unsigned int ioctl, { struct kvm_device *dev = filp->private_data; + if (dev->kvm->mm != current->mm) + return -EIO; + switch (ioctl) { case KVM_SET_DEVICE_ATTR: return kvm_device_ioctl_attr(dev, dev->ops->set_attr, arg); -- cgit v1.2.3 From 919f6cd8bb2fe7151f8aecebc3b3d1ca2567396e Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Fri, 15 Feb 2019 12:48:40 -0800 Subject: KVM: doc: Document the life cycle of a VM and its resources The series to add memcg accounting to KVM allocations[1] states: There are many KVM kernel memory allocations which are tied to the life of the VM process and should be charged to the VM process's cgroup. While it is correct to account KVM kernel allocations to the cgroup of the process that created the VM, it's technically incorrect to state that the KVM kernel memory allocations are tied to the life of the VM process. This is because the VM itself, i.e. struct kvm, is not tied to the life of the process which created it, rather it is tied to the life of its associated file descriptor. In other words, kvm_destroy_vm() is not invoked until fput() decrements its associated file's refcount to zero. A simple example is to fork() in Qemu and have the child sleep indefinitely; kvm_destroy_vm() isn't called until Qemu closes its file descriptor *and* the rogue child is killed. The allocations are guaranteed to be *accounted* to the process which created the VM, but only because KVM's per-{VM,vCPU} ioctls reject the ioctl() with -EIO if kvm->mm != current->mm. I.e. the child can keep the VM "alive" but can't do anything useful with its reference. Note that because 'struct kvm' also holds a reference to the mm_struct of its owner, the above behavior also applies to userspace allocations. Given that mucking with a VM's file descriptor can lead to subtle and undesirable behavior, e.g. memcg charges persisting after a VM is shut down, explicitly document a VM's lifecycle and its impact on the VM's resources. Alternatively, KVM could aggressively free resources when the creating process exits, e.g. via mmu_notifier->release(). However, mmu_notifier isn't guaranteed to be available, and freeing resources when the creator exits is likely to be error prone and fragile as KVM would need to ensure that it only freed resources that are truly out of reach. In practice, the existing behavior shouldn't be problematic as a properly configured system will prevent a child process from being moved out of the appropriate cgroup hierarchy, i.e. prevent hiding the process from the OOM killer, and will prevent an unprivileged user from being able to to hold a reference to struct kvm via another method, e.g. debugfs. [1]https://patchwork.kernel.org/patch/10806707/ Signed-off-by: Sean Christopherson Signed-off-by: Paolo Bonzini --- Documentation/virtual/kvm/api.txt | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) (limited to 'Documentation') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 88ecbc7645db..f39969d0121e 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -69,6 +69,23 @@ by and on behalf of the VM's process may not be freed/unaccounted when the VM is shut down. +It is important to note that althought VM ioctls may only be issued from +the process that created the VM, a VM's lifecycle is associated with its +file descriptor, not its creator (process). In other words, the VM and +its resources, *including the associated address space*, are not freed +until the last reference to the VM's file descriptor has been released. +For example, if fork() is issued after ioctl(KVM_CREATE_VM), the VM will +not be freed until both the parent (original) process and its child have +put their references to the VM's file descriptor. + +Because a VM's resources are not freed until the last reference to its +file descriptor is released, creating additional references to a VM via +via fork(), dup(), etc... without careful consideration is strongly +discouraged and may have unwanted side effects, e.g. memory allocated +by and on behalf of the VM's process may not be freed/unaccounted when +the VM is shut down. + + 3. Extensions ------------- -- cgit v1.2.3 From e2788c4a41cb5fa68096f5a58bccacec1a700295 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Thu, 28 Mar 2019 17:22:31 +0100 Subject: Documentation: kvm: clarify KVM_SET_USER_MEMORY_REGION The documentation does not mention how to delete a slot, add the information. Reported-by: Nathaniel McCallum Signed-off-by: Paolo Bonzini --- Documentation/virtual/kvm/api.txt | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) (limited to 'Documentation') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index f39969d0121e..67068c47c591 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1114,14 +1114,12 @@ struct kvm_userspace_memory_region { #define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0) #define KVM_MEM_READONLY (1UL << 1) -This ioctl allows the user to create or modify a guest physical memory -slot. When changing an existing slot, it may be moved in the guest -physical memory space, or its flags may be modified. It may not be -resized. Slots may not overlap in guest physical address space. -Bits 0-15 of "slot" specifies the slot id and this value should be -less than the maximum number of user memory slots supported per VM. -The maximum allowed slots can be queried using KVM_CAP_NR_MEMSLOTS, -if this capability is supported by the architecture. +This ioctl allows the user to create, modify or delete a guest physical +memory slot. Bits 0-15 of "slot" specify the slot id and this value +should be less than the maximum number of user memory slots supported per +VM. The maximum allowed slots can be queried using KVM_CAP_NR_MEMSLOTS, +if this capability is supported by the architecture. Slots may not +overlap in guest physical address space. If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of "slot" specifies the address space which is being modified. They must be @@ -1130,6 +1128,10 @@ KVM_CAP_MULTI_ADDRESS_SPACE capability. Slots in separate address spaces are unrelated; the restriction on overlapping slots only applies within each address space. +Deleting a slot is done by passing zero for memory_size. When changing +an existing slot, it may be moved in the guest physical memory space, +or its flags may be modified, but it may not be resized. + Memory for the region is taken starting at the address denoted by the field userspace_addr, which must point at user addressable memory for the entire memory slot size. Any object may back this memory, including -- cgit v1.2.3 From d3b018f757560ab5c2bce0e7f46e0c1510d7afd4 Mon Sep 17 00:00:00 2001 From: Carlos Menin Date: Wed, 13 Mar 2019 11:11:26 -0300 Subject: dt-bindings: hwmon: (adc128d818) Specify ti,mode property size By default, cells in DT are 32-bit in size. The driver reads "ti,mode" using the function of_property_read_u8() which causes the value to be read incorrectly in little-endian architectures if the size is not specified. Make it explicit in the binding documentation that this prorperty must be set as a 8-bit value. Signed-off-by: Carlos Menin Reviewed-by: Rob Herring Signed-off-by: Guenter Roeck --- Documentation/devicetree/bindings/hwmon/adc128d818.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/hwmon/adc128d818.txt b/Documentation/devicetree/bindings/hwmon/adc128d818.txt index 08bab0e94d25..d0ae46d7bac3 100644 --- a/Documentation/devicetree/bindings/hwmon/adc128d818.txt +++ b/Documentation/devicetree/bindings/hwmon/adc128d818.txt @@ -26,7 +26,7 @@ Required node properties: Optional node properties: - - ti,mode: Operation mode (see above). + - ti,mode: Operation mode (u8) (see above). Example (operation mode 2): @@ -34,5 +34,5 @@ Example (operation mode 2): adc128d818@1d { compatible = "ti,adc128d818"; reg = <0x1d>; - ti,mode = <2>; + ti,mode = /bits/ 8 <2>; }; -- cgit v1.2.3 From a957abb58ddfc83fa95406bb97b4541f30c78421 Mon Sep 17 00:00:00 2001 From: Thor Thayer Date: Mon, 11 Mar 2019 17:18:04 -0500 Subject: dt-bindings: arm: socfpga: Add S10 System Manager binding Add the device tree bindings for the Stratix10 System Manager. Signed-off-by: Thor Thayer Reviewed-by: Rob Herring Reviewed-by: Arnd Bergmann Signed-off-by: Lee Jones --- .../devicetree/bindings/arm/altera/socfpga-system.txt | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/arm/altera/socfpga-system.txt b/Documentation/devicetree/bindings/arm/altera/socfpga-system.txt index f4d04a067282..82edbaaa3f85 100644 --- a/Documentation/devicetree/bindings/arm/altera/socfpga-system.txt +++ b/Documentation/devicetree/bindings/arm/altera/socfpga-system.txt @@ -11,3 +11,15 @@ Example: reg = <0xffd08000 0x1000>; cpu1-start-addr = <0xffd080c4>; }; + +ARM64 - Stratix10 +Required properties: +- compatible : "altr,sys-mgr-s10" +- reg : Should contain 1 register range(address and length) + for system manager register. + +Example: + sysmgr@ffd12000 { + compatible = "altr,sys-mgr-s10"; + reg = <0xffd12000 0x228>; + }; -- cgit v1.2.3 From ae82899bbe92a7777ded9a562ee602dd5917bcd8 Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Mon, 1 Apr 2019 13:57:34 -0700 Subject: flow_dissector: document BPF flow dissector environment Short doc on what BPF flow dissector should expect in the input __sk_buff and flow_keys. Signed-off-by: Stanislav Fomichev Signed-off-by: Daniel Borkmann --- Documentation/networking/bpf_flow_dissector.txt | 115 ++++++++++++++++++++++++ 1 file changed, 115 insertions(+) create mode 100644 Documentation/networking/bpf_flow_dissector.txt (limited to 'Documentation') diff --git a/Documentation/networking/bpf_flow_dissector.txt b/Documentation/networking/bpf_flow_dissector.txt new file mode 100644 index 000000000000..70f66a2d57e7 --- /dev/null +++ b/Documentation/networking/bpf_flow_dissector.txt @@ -0,0 +1,115 @@ +================== +BPF Flow Dissector +================== + +Overview +======== + +Flow dissector is a routine that parses metadata out of the packets. It's +used in the various places in the networking subsystem (RFS, flow hash, etc). + +BPF flow dissector is an attempt to reimplement C-based flow dissector logic +in BPF to gain all the benefits of BPF verifier (namely, limits on the +number of instructions and tail calls). + +API +=== + +BPF flow dissector programs operate on an __sk_buff. However, only the +limited set of fields is allowed: data, data_end and flow_keys. flow_keys +is 'struct bpf_flow_keys' and contains flow dissector input and +output arguments. + +The inputs are: + * nhoff - initial offset of the networking header + * thoff - initial offset of the transport header, initialized to nhoff + * n_proto - L3 protocol type, parsed out of L2 header + +Flow dissector BPF program should fill out the rest of the 'struct +bpf_flow_keys' fields. Input arguments nhoff/thoff/n_proto should be also +adjusted accordingly. + +The return code of the BPF program is either BPF_OK to indicate successful +dissection, or BPF_DROP to indicate parsing error. + +__sk_buff->data +=============== + +In the VLAN-less case, this is what the initial state of the BPF flow +dissector looks like: ++------+------+------------+-----------+ +| DMAC | SMAC | ETHER_TYPE | L3_HEADER | ++------+------+------------+-----------+ + ^ + | + +-- flow dissector starts here + +skb->data + flow_keys->nhoff point to the first byte of L3_HEADER. +flow_keys->thoff = nhoff +flow_keys->n_proto = ETHER_TYPE + + +In case of VLAN, flow dissector can be called with the two different states. + +Pre-VLAN parsing: ++------+------+------+-----+-----------+-----------+ +| DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER | ++------+------+------+-----+-----------+-----------+ + ^ + | + +-- flow dissector starts here + +skb->data + flow_keys->nhoff point the to first byte of TCI. +flow_keys->thoff = nhoff +flow_keys->n_proto = TPID + +Please note that TPID can be 802.1AD and, hence, BPF program would +have to parse VLAN information twice for double tagged packets. + + +Post-VLAN parsing: ++------+------+------+-----+-----------+-----------+ +| DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER | ++------+------+------+-----+-----------+-----------+ + ^ + | + +-- flow dissector starts here + +skb->data + flow_keys->nhoff point the to first byte of L3_HEADER. +flow_keys->thoff = nhoff +flow_keys->n_proto = ETHER_TYPE + +In this case VLAN information has been processed before the flow dissector +and BPF flow dissector is not required to handle it. + + +The takeaway here is as follows: BPF flow dissector program can be called with +the optional VLAN header and should gracefully handle both cases: when single +or double VLAN is present and when it is not present. The same program +can be called for both cases and would have to be written carefully to +handle both cases. + + +Reference Implementation +======================== + +See tools/testing/selftests/bpf/progs/bpf_flow.c for the reference +implementation and tools/testing/selftests/bpf/flow_dissector_load.[hc] for +the loader. bpftool can be used to load BPF flow dissector program as well. + +The reference implementation is organized as follows: +* jmp_table map that contains sub-programs for each supported L3 protocol +* _dissect routine - entry point; it does input n_proto parsing and does + bpf_tail_call to the appropriate L3 handler + +Since BPF at this point doesn't support looping (or any jumping back), +jmp_table is used instead to handle multiple levels of encapsulation (and +IPv6 options). + + +Current Limitations +=================== +BPF flow dissector doesn't support exporting all the metadata that in-kernel +C-based implementation can export. Notable example is single VLAN (802.1Q) +and double VLAN (802.1AD) tags. Please refer to the 'struct bpf_flow_keys' +for a set of information that's currently can be exported from the BPF context. -- cgit v1.2.3 From 5eed7898626bedd6405421550c0c6e8ab9591bb2 Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Wed, 3 Apr 2019 13:53:18 -0700 Subject: flow_dissector: rst'ify documentation Rename bpf_flow_dissector.txt to bpf_flow_dissector.rst and fix formatting. Also, link it from the Documentation/networking/index.rst. Tested with 'make htmldocs' to make sure it looks reasonable. Fixes: ae82899bbe92 ("flow_dissector: document BPF flow dissector environment") Signed-off-by: Stanislav Fomichev Acked-by: Martin KaFai Lau Signed-off-by: Daniel Borkmann --- Documentation/networking/bpf_flow_dissector.rst | 126 ++++++++++++++++++++++++ Documentation/networking/bpf_flow_dissector.txt | 115 --------------------- Documentation/networking/index.rst | 1 + 3 files changed, 127 insertions(+), 115 deletions(-) create mode 100644 Documentation/networking/bpf_flow_dissector.rst delete mode 100644 Documentation/networking/bpf_flow_dissector.txt (limited to 'Documentation') diff --git a/Documentation/networking/bpf_flow_dissector.rst b/Documentation/networking/bpf_flow_dissector.rst new file mode 100644 index 000000000000..b375ae2ec2c4 --- /dev/null +++ b/Documentation/networking/bpf_flow_dissector.rst @@ -0,0 +1,126 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================== +BPF Flow Dissector +================== + +Overview +======== + +Flow dissector is a routine that parses metadata out of the packets. It's +used in the various places in the networking subsystem (RFS, flow hash, etc). + +BPF flow dissector is an attempt to reimplement C-based flow dissector logic +in BPF to gain all the benefits of BPF verifier (namely, limits on the +number of instructions and tail calls). + +API +=== + +BPF flow dissector programs operate on an ``__sk_buff``. However, only the +limited set of fields is allowed: ``data``, ``data_end`` and ``flow_keys``. +``flow_keys`` is ``struct bpf_flow_keys`` and contains flow dissector input +and output arguments. + +The inputs are: + * ``nhoff`` - initial offset of the networking header + * ``thoff`` - initial offset of the transport header, initialized to nhoff + * ``n_proto`` - L3 protocol type, parsed out of L2 header + +Flow dissector BPF program should fill out the rest of the ``struct +bpf_flow_keys`` fields. Input arguments ``nhoff/thoff/n_proto`` should be +also adjusted accordingly. + +The return code of the BPF program is either BPF_OK to indicate successful +dissection, or BPF_DROP to indicate parsing error. + +__sk_buff->data +=============== + +In the VLAN-less case, this is what the initial state of the BPF flow +dissector looks like:: + + +------+------+------------+-----------+ + | DMAC | SMAC | ETHER_TYPE | L3_HEADER | + +------+------+------------+-----------+ + ^ + | + +-- flow dissector starts here + + +.. code:: c + + skb->data + flow_keys->nhoff point to the first byte of L3_HEADER + flow_keys->thoff = nhoff + flow_keys->n_proto = ETHER_TYPE + +In case of VLAN, flow dissector can be called with the two different states. + +Pre-VLAN parsing:: + + +------+------+------+-----+-----------+-----------+ + | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER | + +------+------+------+-----+-----------+-----------+ + ^ + | + +-- flow dissector starts here + +.. code:: c + + skb->data + flow_keys->nhoff point the to first byte of TCI + flow_keys->thoff = nhoff + flow_keys->n_proto = TPID + +Please note that TPID can be 802.1AD and, hence, BPF program would +have to parse VLAN information twice for double tagged packets. + + +Post-VLAN parsing:: + + +------+------+------+-----+-----------+-----------+ + | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER | + +------+------+------+-----+-----------+-----------+ + ^ + | + +-- flow dissector starts here + +.. code:: c + + skb->data + flow_keys->nhoff point the to first byte of L3_HEADER + flow_keys->thoff = nhoff + flow_keys->n_proto = ETHER_TYPE + +In this case VLAN information has been processed before the flow dissector +and BPF flow dissector is not required to handle it. + + +The takeaway here is as follows: BPF flow dissector program can be called with +the optional VLAN header and should gracefully handle both cases: when single +or double VLAN is present and when it is not present. The same program +can be called for both cases and would have to be written carefully to +handle both cases. + + +Reference Implementation +======================== + +See ``tools/testing/selftests/bpf/progs/bpf_flow.c`` for the reference +implementation and ``tools/testing/selftests/bpf/flow_dissector_load.[hc]`` +for the loader. bpftool can be used to load BPF flow dissector program as well. + +The reference implementation is organized as follows: + * ``jmp_table`` map that contains sub-programs for each supported L3 protocol + * ``_dissect`` routine - entry point; it does input ``n_proto`` parsing and + does ``bpf_tail_call`` to the appropriate L3 handler + +Since BPF at this point doesn't support looping (or any jumping back), +jmp_table is used instead to handle multiple levels of encapsulation (and +IPv6 options). + + +Current Limitations +=================== +BPF flow dissector doesn't support exporting all the metadata that in-kernel +C-based implementation can export. Notable example is single VLAN (802.1Q) +and double VLAN (802.1AD) tags. Please refer to the ``struct bpf_flow_keys`` +for a set of information that's currently can be exported from the BPF context. diff --git a/Documentation/networking/bpf_flow_dissector.txt b/Documentation/networking/bpf_flow_dissector.txt deleted file mode 100644 index 70f66a2d57e7..000000000000 --- a/Documentation/networking/bpf_flow_dissector.txt +++ /dev/null @@ -1,115 +0,0 @@ -================== -BPF Flow Dissector -================== - -Overview -======== - -Flow dissector is a routine that parses metadata out of the packets. It's -used in the various places in the networking subsystem (RFS, flow hash, etc). - -BPF flow dissector is an attempt to reimplement C-based flow dissector logic -in BPF to gain all the benefits of BPF verifier (namely, limits on the -number of instructions and tail calls). - -API -=== - -BPF flow dissector programs operate on an __sk_buff. However, only the -limited set of fields is allowed: data, data_end and flow_keys. flow_keys -is 'struct bpf_flow_keys' and contains flow dissector input and -output arguments. - -The inputs are: - * nhoff - initial offset of the networking header - * thoff - initial offset of the transport header, initialized to nhoff - * n_proto - L3 protocol type, parsed out of L2 header - -Flow dissector BPF program should fill out the rest of the 'struct -bpf_flow_keys' fields. Input arguments nhoff/thoff/n_proto should be also -adjusted accordingly. - -The return code of the BPF program is either BPF_OK to indicate successful -dissection, or BPF_DROP to indicate parsing error. - -__sk_buff->data -=============== - -In the VLAN-less case, this is what the initial state of the BPF flow -dissector looks like: -+------+------+------------+-----------+ -| DMAC | SMAC | ETHER_TYPE | L3_HEADER | -+------+------+------------+-----------+ - ^ - | - +-- flow dissector starts here - -skb->data + flow_keys->nhoff point to the first byte of L3_HEADER. -flow_keys->thoff = nhoff -flow_keys->n_proto = ETHER_TYPE - - -In case of VLAN, flow dissector can be called with the two different states. - -Pre-VLAN parsing: -+------+------+------+-----+-----------+-----------+ -| DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER | -+------+------+------+-----+-----------+-----------+ - ^ - | - +-- flow dissector starts here - -skb->data + flow_keys->nhoff point the to first byte of TCI. -flow_keys->thoff = nhoff -flow_keys->n_proto = TPID - -Please note that TPID can be 802.1AD and, hence, BPF program would -have to parse VLAN information twice for double tagged packets. - - -Post-VLAN parsing: -+------+------+------+-----+-----------+-----------+ -| DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER | -+------+------+------+-----+-----------+-----------+ - ^ - | - +-- flow dissector starts here - -skb->data + flow_keys->nhoff point the to first byte of L3_HEADER. -flow_keys->thoff = nhoff -flow_keys->n_proto = ETHER_TYPE - -In this case VLAN information has been processed before the flow dissector -and BPF flow dissector is not required to handle it. - - -The takeaway here is as follows: BPF flow dissector program can be called with -the optional VLAN header and should gracefully handle both cases: when single -or double VLAN is present and when it is not present. The same program -can be called for both cases and would have to be written carefully to -handle both cases. - - -Reference Implementation -======================== - -See tools/testing/selftests/bpf/progs/bpf_flow.c for the reference -implementation and tools/testing/selftests/bpf/flow_dissector_load.[hc] for -the loader. bpftool can be used to load BPF flow dissector program as well. - -The reference implementation is organized as follows: -* jmp_table map that contains sub-programs for each supported L3 protocol -* _dissect routine - entry point; it does input n_proto parsing and does - bpf_tail_call to the appropriate L3 handler - -Since BPF at this point doesn't support looping (or any jumping back), -jmp_table is used instead to handle multiple levels of encapsulation (and -IPv6 options). - - -Current Limitations -=================== -BPF flow dissector doesn't support exporting all the metadata that in-kernel -C-based implementation can export. Notable example is single VLAN (802.1Q) -and double VLAN (802.1AD) tags. Please refer to the 'struct bpf_flow_keys' -for a set of information that's currently can be exported from the BPF context. diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst index 5449149be496..984e68f9e026 100644 --- a/Documentation/networking/index.rst +++ b/Documentation/networking/index.rst @@ -9,6 +9,7 @@ Contents: netdev-FAQ af_xdp batman-adv + bpf_flow_dissector can can_ucan_protocol device_drivers/freescale/dpaa2/index -- cgit v1.2.3 From b11ed18efa8f3dc58b259b812588317b765b1cfc Mon Sep 17 00:00:00 2001 From: Dave Rodgman Date: Fri, 5 Apr 2019 18:38:58 -0700 Subject: lib/lzo: fix bugs for very short or empty input For very short input data (0 - 1 bytes), lzo-rle was not behaving correctly. Fix this behaviour and update documentation accordingly. For zero-length input, lzo v0 outputs an end-of-stream marker only, which was misinterpreted by lzo-rle as a bitstream version number. Ensure bitstream versions > 0 require a minimum stream length of 5. Also fixes a bug in handling the tail for very short inputs when a bitstream version is present. Link: http://lkml.kernel.org/r/20190326165857.34613-1-dave.rodgman@arm.com Signed-off-by: Dave Rodgman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/lzo.txt | 8 +++++--- lib/lzo/lzo1x_compress.c | 9 ++++++--- lib/lzo/lzo1x_decompress_safe.c | 4 +--- 3 files changed, 12 insertions(+), 9 deletions(-) (limited to 'Documentation') diff --git a/Documentation/lzo.txt b/Documentation/lzo.txt index f79934225d8d..ca983328976b 100644 --- a/Documentation/lzo.txt +++ b/Documentation/lzo.txt @@ -102,9 +102,11 @@ Byte sequences dictionary which is empty, and that it will always be invalid at this place. - 17 : bitstream version. If the first byte is 17, the next byte - gives the bitstream version (version 1 only). If the first byte - is not 17, the bitstream version is 0. + 17 : bitstream version. If the first byte is 17, and compressed + stream length is at least 5 bytes (length of shortest possible + versioned bitstream), the next byte gives the bitstream version + (version 1 only). + Otherwise, the bitstream version is 0. 18..21 : copy 0..3 literals state = (byte - 17) = 0..3 [ copy literals ] diff --git a/lib/lzo/lzo1x_compress.c b/lib/lzo/lzo1x_compress.c index 4525fb094844..a8ede77afe0d 100644 --- a/lib/lzo/lzo1x_compress.c +++ b/lib/lzo/lzo1x_compress.c @@ -291,13 +291,14 @@ int lzogeneric1x_1_compress(const unsigned char *in, size_t in_len, { const unsigned char *ip = in; unsigned char *op = out; + unsigned char *data_start; size_t l = in_len; size_t t = 0; signed char state_offset = -2; unsigned int m4_max_offset; - // LZO v0 will never write 17 as first byte, - // so this is used to version the bitstream + // LZO v0 will never write 17 as first byte (except for zero-length + // input), so this is used to version the bitstream if (bitstream_version > 0) { *op++ = 17; *op++ = bitstream_version; @@ -306,6 +307,8 @@ int lzogeneric1x_1_compress(const unsigned char *in, size_t in_len, m4_max_offset = M4_MAX_OFFSET_V0; } + data_start = op; + while (l > 20) { size_t ll = l <= (m4_max_offset + 1) ? l : (m4_max_offset + 1); uintptr_t ll_end = (uintptr_t) ip + ll; @@ -324,7 +327,7 @@ int lzogeneric1x_1_compress(const unsigned char *in, size_t in_len, if (t > 0) { const unsigned char *ii = in + in_len - t; - if (op == out && t <= 238) { + if (op == data_start && t <= 238) { *op++ = (17 + t); } else if (t <= 3) { op[state_offset] |= t; diff --git a/lib/lzo/lzo1x_decompress_safe.c b/lib/lzo/lzo1x_decompress_safe.c index 6d2600ea3b55..9e07e9ef1aad 100644 --- a/lib/lzo/lzo1x_decompress_safe.c +++ b/lib/lzo/lzo1x_decompress_safe.c @@ -54,11 +54,9 @@ int lzo1x_decompress_safe(const unsigned char *in, size_t in_len, if (unlikely(in_len < 3)) goto input_overrun; - if (likely(*ip == 17)) { + if (likely(in_len >= 5) && likely(*ip == 17)) { bitstream_version = ip[1]; ip += 2; - if (unlikely(in_len < 5)) - goto input_overrun; } else { bitstream_version = 0; } -- cgit v1.2.3 From be87ab0afd680ac35486d16c0963c56d9be1d8a0 Mon Sep 17 00:00:00 2001 From: Waiman Long Date: Fri, 5 Apr 2019 18:39:14 -0700 Subject: psi: clarify the units used in pressure files The output of the PSI files show a bunch of numbers with no unit. The psi.txt documentation file also does not indicate what units are used. One can only find out by looking at the source code. The units are percentage for the averages and useconds for the total. Make the information easier to find by documenting the units in psi.txt. Link: http://lkml.kernel.org/r/20190402193810.3450-1-longman@redhat.com Signed-off-by: Waiman Long Acked-by: Johannes Weiner Cc: "Peter Zijlstra (Intel)" Cc: Tejun Heo Cc: Jonathan Corbet Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/accounting/psi.txt | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'Documentation') diff --git a/Documentation/accounting/psi.txt b/Documentation/accounting/psi.txt index b8ca28b60215..7e71c9c1d8e9 100644 --- a/Documentation/accounting/psi.txt +++ b/Documentation/accounting/psi.txt @@ -56,12 +56,12 @@ situation from a state where some tasks are stalled but the CPU is still doing productive work. As such, time spent in this subset of the stall state is tracked separately and exported in the "full" averages. -The ratios are tracked as recent trends over ten, sixty, and three -hundred second windows, which gives insight into short term events as -well as medium and long term trends. The total absolute stall time is -tracked and exported as well, to allow detection of latency spikes -which wouldn't necessarily make a dent in the time averages, or to -average trends over custom time frames. +The ratios (in %) are tracked as recent trends over ten, sixty, and +three hundred second windows, which gives insight into short term events +as well as medium and long term trends. The total absolute stall time +(in us) is tracked and exported as well, to allow detection of latency +spikes which wouldn't necessarily make a dent in the time averages, +or to average trends over custom time frames. Cgroup2 interface ================= -- cgit v1.2.3 From ac0722f23ff5bc1b15e268564a4d56d35cd4a1b5 Mon Sep 17 00:00:00 2001 From: Maxime Ripard Date: Mon, 18 Mar 2019 11:05:21 +0100 Subject: dt-bindings: cpu: Fix JSON schema Commit fd73403a4862 ("dt-bindings: arm: Add SMP enable-method for Milbeaut") added support for a new cpu enable-method, but did so using tabulations to ident. This is however invalid in the syntax, and resulted in a failure when trying to use that schemas for validation. Use spaces instead of tabs to indent to fix this. Fixes: fd73403a4862 ("dt-bindings: arm: Add SMP enable-method for Milbeaut") Signed-off-by: Maxime Ripard Reviewed-by: Rob Herring Acked-by: Sugaya Taichi Signed-off-by: Olof Johansson --- Documentation/devicetree/bindings/arm/cpus.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/arm/cpus.yaml b/Documentation/devicetree/bindings/arm/cpus.yaml index 365dcf384d73..82dd7582e945 100644 --- a/Documentation/devicetree/bindings/arm/cpus.yaml +++ b/Documentation/devicetree/bindings/arm/cpus.yaml @@ -228,7 +228,7 @@ patternProperties: - renesas,r9a06g032-smp - rockchip,rk3036-smp - rockchip,rk3066-smp - - socionext,milbeaut-m10v-smp + - socionext,milbeaut-m10v-smp - ste,dbx500-smp cpu-release-addr: -- cgit v1.2.3 From 837f74116585dcd235fae1696e1e1471b6bb9e01 Mon Sep 17 00:00:00 2001 From: Nicolas Dichtel Date: Tue, 9 Apr 2019 17:16:59 +0200 Subject: xfrm: update doc about xfrm[46]_gc_thresh Those entries are not used anymore. CC: Florian Westphal Fixes: 09c7570480f7 ("xfrm: remove flow cache") Signed-off-by: Nicolas Dichtel Signed-off-by: Steffen Klassert --- Documentation/networking/ip-sysctl.txt | 2 ++ 1 file changed, 2 insertions(+) (limited to 'Documentation') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index acdfb5d2bcaa..f1843b132453 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -1336,6 +1336,7 @@ tag - INTEGER Default value is 0. xfrm4_gc_thresh - INTEGER + (Obsolete since linux-4.14) The threshold at which we will start garbage collecting for IPv4 destination cache entries. At twice this value the system will refuse new allocations. @@ -1919,6 +1920,7 @@ echo_ignore_all - BOOLEAN Default: 0 xfrm6_gc_thresh - INTEGER + (Obsolete since linux-4.14) The threshold at which we will start garbage collecting for IPv6 destination cache entries. At twice this value the system will refuse new allocations. -- cgit v1.2.3 From 4611da30d679a4b0a2c2b5d4d7b3fbbafc922df7 Mon Sep 17 00:00:00 2001 From: Marc Dionne Date: Fri, 12 Apr 2019 16:33:47 +0100 Subject: rxrpc: Make rxrpc_kernel_check_life() indicate if call completed Make rxrpc_kernel_check_life() pass back the life counter through the argument list and return true if the call has not yet completed. Suggested-by: Marc Dionne Signed-off-by: David Howells Signed-off-by: David S. Miller --- Documentation/networking/rxrpc.txt | 16 +++++++++------- fs/afs/rxrpc.c | 4 ++-- include/net/af_rxrpc.h | 4 +++- net/rxrpc/af_rxrpc.c | 14 +++++++++----- 4 files changed, 23 insertions(+), 15 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.txt index 2df5894353d6..cd7303d7fa25 100644 --- a/Documentation/networking/rxrpc.txt +++ b/Documentation/networking/rxrpc.txt @@ -1009,16 +1009,18 @@ The kernel interface functions are as follows: (*) Check call still alive. - u32 rxrpc_kernel_check_life(struct socket *sock, - struct rxrpc_call *call); + bool rxrpc_kernel_check_life(struct socket *sock, + struct rxrpc_call *call, + u32 *_life); void rxrpc_kernel_probe_life(struct socket *sock, struct rxrpc_call *call); - The first function returns a number that is updated when ACKs are received - from the peer (notably including PING RESPONSE ACKs which we can elicit by - sending PING ACKs to see if the call still exists on the server). The - caller should compare the numbers of two calls to see if the call is still - alive after waiting for a suitable interval. + The first function passes back in *_life a number that is updated when + ACKs are received from the peer (notably including PING RESPONSE ACKs + which we can elicit by sending PING ACKs to see if the call still exists + on the server). The caller should compare the numbers of two calls to see + if the call is still alive after waiting for a suitable interval. It also + returns true as long as the call hasn't yet reached the completed state. This allows the caller to work out if the server is still contactable and if the call is still alive on the server while waiting for the server to diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c index 2c588f9bbbda..5cb11aff9298 100644 --- a/fs/afs/rxrpc.c +++ b/fs/afs/rxrpc.c @@ -621,7 +621,7 @@ static long afs_wait_for_call_to_complete(struct afs_call *call, rtt2 = 2; timeout = rtt2; - last_life = rxrpc_kernel_check_life(call->net->socket, call->rxcall); + rxrpc_kernel_check_life(call->net->socket, call->rxcall, &last_life); add_wait_queue(&call->waitq, &myself); for (;;) { @@ -639,7 +639,7 @@ static long afs_wait_for_call_to_complete(struct afs_call *call, if (afs_check_call_state(call, AFS_CALL_COMPLETE)) break; - life = rxrpc_kernel_check_life(call->net->socket, call->rxcall); + rxrpc_kernel_check_life(call->net->socket, call->rxcall, &life); if (timeout == 0 && life == last_life && signal_pending(current)) { if (stalled) diff --git a/include/net/af_rxrpc.h b/include/net/af_rxrpc.h index 2bfb87eb98ce..78c856cba4f5 100644 --- a/include/net/af_rxrpc.h +++ b/include/net/af_rxrpc.h @@ -61,10 +61,12 @@ int rxrpc_kernel_charge_accept(struct socket *, rxrpc_notify_rx_t, rxrpc_user_attach_call_t, unsigned long, gfp_t, unsigned int); void rxrpc_kernel_set_tx_length(struct socket *, struct rxrpc_call *, s64); -u32 rxrpc_kernel_check_life(const struct socket *, const struct rxrpc_call *); +bool rxrpc_kernel_check_life(const struct socket *, const struct rxrpc_call *, + u32 *); void rxrpc_kernel_probe_life(struct socket *, struct rxrpc_call *); u32 rxrpc_kernel_get_epoch(struct socket *, struct rxrpc_call *); bool rxrpc_kernel_get_reply_time(struct socket *, struct rxrpc_call *, ktime_t *); +bool rxrpc_kernel_call_is_complete(struct rxrpc_call *); #endif /* _NET_RXRPC_H */ diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c index c54dce3ca0dd..ae8c5d7f3bf1 100644 --- a/net/rxrpc/af_rxrpc.c +++ b/net/rxrpc/af_rxrpc.c @@ -371,18 +371,22 @@ EXPORT_SYMBOL(rxrpc_kernel_end_call); * rxrpc_kernel_check_life - Check to see whether a call is still alive * @sock: The socket the call is on * @call: The call to check + * @_life: Where to store the life value * * Allow a kernel service to find out whether a call is still alive - ie. we're - * getting ACKs from the server. Returns a number representing the life state - * which can be compared to that returned by a previous call. + * getting ACKs from the server. Passes back in *_life a number representing + * the life state which can be compared to that returned by a previous call and + * return true if the call is still alive. * * If the life state stalls, rxrpc_kernel_probe_life() should be called and * then 2RTT waited. */ -u32 rxrpc_kernel_check_life(const struct socket *sock, - const struct rxrpc_call *call) +bool rxrpc_kernel_check_life(const struct socket *sock, + const struct rxrpc_call *call, + u32 *_life) { - return call->acks_latest; + *_life = call->acks_latest; + return call->state != RXRPC_CALL_COMPLETE; } EXPORT_SYMBOL(rxrpc_kernel_check_life); -- cgit v1.2.3 From 19fad20d15a6494f47f85d869f00b11343ee5c78 Mon Sep 17 00:00:00 2001 From: ZhangXiaoxu Date: Tue, 16 Apr 2019 09:47:24 +0800 Subject: ipv4: set the tcp_min_rtt_wlen range from 0 to one day There is a UBSAN report as below: UBSAN: Undefined behaviour in net/ipv4/tcp_input.c:2877:56 signed integer overflow: 2147483647 * 1000 cannot be represented in type 'int' CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.1.0-rc4-00058-g582549e #1 Call Trace: dump_stack+0x8c/0xba ubsan_epilogue+0x11/0x60 handle_overflow+0x12d/0x170 ? ttwu_do_wakeup+0x21/0x320 __ubsan_handle_mul_overflow+0x12/0x20 tcp_ack_update_rtt+0x76c/0x780 tcp_clean_rtx_queue+0x499/0x14d0 tcp_ack+0x69e/0x1240 ? __wake_up_sync_key+0x2c/0x50 ? update_group_capacity+0x50/0x680 tcp_rcv_established+0x4e2/0xe10 tcp_v4_do_rcv+0x22b/0x420 tcp_v4_rcv+0xfe8/0x1190 ip_protocol_deliver_rcu+0x36/0x180 ip_local_deliver+0x15b/0x1a0 ip_rcv+0xac/0xd0 __netif_receive_skb_one_core+0x7f/0xb0 __netif_receive_skb+0x33/0xc0 netif_receive_skb_internal+0x84/0x1c0 napi_gro_receive+0x2a0/0x300 receive_buf+0x3d4/0x2350 ? detach_buf_split+0x159/0x390 virtnet_poll+0x198/0x840 ? reweight_entity+0x243/0x4b0 net_rx_action+0x25c/0x770 __do_softirq+0x19b/0x66d irq_exit+0x1eb/0x230 do_IRQ+0x7a/0x150 common_interrupt+0xf/0xf It can be reproduced by: echo 2147483647 > /proc/sys/net/ipv4/tcp_min_rtt_wlen Fixes: f672258391b42 ("tcp: track min RTT using windowed min-filter") Signed-off-by: ZhangXiaoxu Signed-off-by: David S. Miller --- Documentation/networking/ip-sysctl.txt | 1 + net/ipv4/sysctl_net_ipv4.c | 5 ++++- 2 files changed, 5 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index acdfb5d2bcaa..e2142fe40cda 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -422,6 +422,7 @@ tcp_min_rtt_wlen - INTEGER minimum RTT when it is moved to a longer path (e.g., due to traffic engineering). A longer window makes the filter more resistant to RTT inflations such as transient congestion. The unit is seconds. + Possible values: 0 - 86400 (1 day) Default: 300 tcp_moderate_rcvbuf - BOOLEAN diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index ba0fc4b18465..eeb4041fa5f9 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -49,6 +49,7 @@ static int ip_ping_group_range_min[] = { 0, 0 }; static int ip_ping_group_range_max[] = { GID_T_MAX, GID_T_MAX }; static int comp_sack_nr_max = 255; static u32 u32_max_div_HZ = UINT_MAX / HZ; +static int one_day_secs = 24 * 3600; /* obsolete */ static int sysctl_tcp_low_latency __read_mostly; @@ -1151,7 +1152,9 @@ static struct ctl_table ipv4_net_table[] = { .data = &init_net.ipv4.sysctl_tcp_min_rtt_wlen, .maxlen = sizeof(int), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dointvec_minmax, + .extra1 = &zero, + .extra2 = &one_day_secs }, { .procname = "tcp_autocorking", -- cgit v1.2.3 From 36ad7022536e0c65f8baeeaa5efde11dec44808a Mon Sep 17 00:00:00 2001 From: Petr Štetiar Date: Wed, 17 Apr 2019 22:09:12 +0200 Subject: of_net: Fix residues after of_get_nvmem_mac_address removal MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit I've discovered following discrepancy in the bindings/net/ethernet.txt documentation, where it states following: - nvmem-cells: phandle, reference to an nvmem node for the MAC address; - nvmem-cell-names: string, should be "mac-address" if nvmem is to be.. which is actually misleading and confusing. There are only two ethernet drivers in the tree, cadence/macb and davinci which supports this properties. This nvmem-cell* properties were introduced in commit 9217e566bdee ("of_net: Implement of_get_nvmem_mac_address helper"), but commit afa64a72b862 ("of: net: kill of_get_nvmem_mac_address()") forget to properly clean up this parts. So this patch fixes the documentation by moving the nvmem-cell* properties at the appropriate places. While at it, I've removed unused include as well. Cc: Bartosz Golaszewski Fixes: afa64a72b862 ("of: net: kill of_get_nvmem_mac_address()") Signed-off-by: Petr Štetiar Signed-off-by: David S. Miller --- Documentation/devicetree/bindings/net/davinci_emac.txt | 2 ++ Documentation/devicetree/bindings/net/ethernet.txt | 2 -- Documentation/devicetree/bindings/net/macb.txt | 4 ++++ drivers/of/of_net.c | 1 - 4 files changed, 6 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/davinci_emac.txt b/Documentation/devicetree/bindings/net/davinci_emac.txt index 24c5cdaba8d2..ca83dcc84fb8 100644 --- a/Documentation/devicetree/bindings/net/davinci_emac.txt +++ b/Documentation/devicetree/bindings/net/davinci_emac.txt @@ -20,6 +20,8 @@ Required properties: Optional properties: - phy-handle: See ethernet.txt file in the same directory. If absent, davinci_emac driver defaults to 100/FULL. +- nvmem-cells: phandle, reference to an nvmem node for the MAC address +- nvmem-cell-names: string, should be "mac-address" if nvmem is to be used - ti,davinci-rmii-en: 1 byte, 1 means use RMII - ti,davinci-no-bd-ram: boolean, does EMAC have BD RAM? diff --git a/Documentation/devicetree/bindings/net/ethernet.txt b/Documentation/devicetree/bindings/net/ethernet.txt index cfc376bc977a..2974e63ba311 100644 --- a/Documentation/devicetree/bindings/net/ethernet.txt +++ b/Documentation/devicetree/bindings/net/ethernet.txt @@ -10,8 +10,6 @@ Documentation/devicetree/bindings/phy/phy-bindings.txt. the boot program; should be used in cases where the MAC address assigned to the device by the boot program is different from the "local-mac-address" property; -- nvmem-cells: phandle, reference to an nvmem node for the MAC address; -- nvmem-cell-names: string, should be "mac-address" if nvmem is to be used; - max-speed: number, specifies maximum speed in Mbit/s supported by the device; - max-frame-size: number, maximum transfer unit (IEEE defined MTU), rather than the maximum frame size (there's contradiction in the Devicetree diff --git a/Documentation/devicetree/bindings/net/macb.txt b/Documentation/devicetree/bindings/net/macb.txt index 174f292d8a3e..8b80515729d7 100644 --- a/Documentation/devicetree/bindings/net/macb.txt +++ b/Documentation/devicetree/bindings/net/macb.txt @@ -26,6 +26,10 @@ Required properties: Optional elements: 'tsu_clk' - clocks: Phandles to input clocks. +Optional properties: +- nvmem-cells: phandle, reference to an nvmem node for the MAC address +- nvmem-cell-names: string, should be "mac-address" if nvmem is to be used + Optional properties for PHY child node: - reset-gpios : Should specify the gpio for phy reset - magic-packet : If present, indicates that the hardware supports waking diff --git a/drivers/of/of_net.c b/drivers/of/of_net.c index 810ab0fbcccb..d820f3edd431 100644 --- a/drivers/of/of_net.c +++ b/drivers/of/of_net.c @@ -7,7 +7,6 @@ */ #include #include -#include #include #include #include -- cgit v1.2.3 From c2b71462d294cf517a0bc6e4fd6424d7cee5596f Mon Sep 17 00:00:00 2001 From: Alan Stern Date: Fri, 19 Apr 2019 13:52:38 -0400 Subject: USB: core: Fix bug caused by duplicate interface PM usage counter The syzkaller fuzzer reported a bug in the USB hub driver which turned out to be caused by a negative runtime-PM usage counter. This allowed a hub to be runtime suspended at a time when the driver did not expect it. The symptom is a WARNING issued because the hub's status URB is submitted while it is already active: URB 0000000031fb463e submitted while active WARNING: CPU: 0 PID: 2917 at drivers/usb/core/urb.c:363 The negative runtime-PM usage count was caused by an unfortunate design decision made when runtime PM was first implemented for USB. At that time, USB class drivers were allowed to unbind from their interfaces without balancing the usage counter (i.e., leaving it with a positive count). The core code would take care of setting the counter back to 0 before allowing another driver to bind to the interface. Later on when runtime PM was implemented for the entire kernel, the opposite decision was made: Drivers were required to balance their runtime-PM get and put calls. In order to maintain backward compatibility, however, the USB subsystem adapted to the new implementation by keeping an independent usage counter for each interface and using it to automatically adjust the normal usage counter back to 0 whenever a driver was unbound. This approach involves duplicating information, but what is worse, it doesn't work properly in cases where a USB class driver delays decrementing the usage counter until after the driver's disconnect() routine has returned and the counter has been adjusted back to 0. Doing so would cause the usage counter to become negative. There's even a warning about this in the USB power management documentation! As it happens, this is exactly what the hub driver does. The kick_hub_wq() routine increments the runtime-PM usage counter, and the corresponding decrement is carried out by hub_event() in the context of the hub_wq work-queue thread. This work routine may sometimes run after the driver has been unbound from its interface, and when it does it causes the usage counter to go negative. It is not possible for hub_disconnect() to wait for a pending hub_event() call to finish, because hub_disconnect() is called with the device lock held and hub_event() acquires that lock. The only feasible fix is to reverse the original design decision: remove the duplicate interface-specific usage counter and require USB drivers to balance their runtime PM gets and puts. As far as I know, all existing drivers currently do this. Signed-off-by: Alan Stern Reported-and-tested-by: syzbot+7634edaea4d0b341c625@syzkaller.appspotmail.com CC: Signed-off-by: Greg Kroah-Hartman --- Documentation/driver-api/usb/power-management.rst | 14 +++++++++----- drivers/usb/core/driver.c | 13 ------------- drivers/usb/storage/realtek_cr.c | 13 +++++-------- include/linux/usb.h | 2 -- 4 files changed, 14 insertions(+), 28 deletions(-) (limited to 'Documentation') diff --git a/Documentation/driver-api/usb/power-management.rst b/Documentation/driver-api/usb/power-management.rst index 79beb807996b..4a74cf6f2797 100644 --- a/Documentation/driver-api/usb/power-management.rst +++ b/Documentation/driver-api/usb/power-management.rst @@ -370,11 +370,15 @@ autosuspend the interface's device. When the usage counter is = 0 then the interface is considered to be idle, and the kernel may autosuspend the device. -Drivers need not be concerned about balancing changes to the usage -counter; the USB core will undo any remaining "get"s when a driver -is unbound from its interface. As a corollary, drivers must not call -any of the ``usb_autopm_*`` functions after their ``disconnect`` -routine has returned. +Drivers must be careful to balance their overall changes to the usage +counter. Unbalanced "get"s will remain in effect when a driver is +unbound from its interface, preventing the device from going into +runtime suspend should the interface be bound to a driver again. On +the other hand, drivers are allowed to achieve this balance by calling +the ``usb_autopm_*`` functions even after their ``disconnect`` routine +has returned -- say from within a work-queue routine -- provided they +retain an active reference to the interface (via ``usb_get_intf`` and +``usb_put_intf``). Drivers using the async routines are responsible for their own synchronization and mutual exclusion. diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c index 8987cec9549d..ebcadaad89d1 100644 --- a/drivers/usb/core/driver.c +++ b/drivers/usb/core/driver.c @@ -473,11 +473,6 @@ static int usb_unbind_interface(struct device *dev) pm_runtime_disable(dev); pm_runtime_set_suspended(dev); - /* Undo any residual pm_autopm_get_interface_* calls */ - for (r = atomic_read(&intf->pm_usage_cnt); r > 0; --r) - usb_autopm_put_interface_no_suspend(intf); - atomic_set(&intf->pm_usage_cnt, 0); - if (!error) usb_autosuspend_device(udev); @@ -1633,7 +1628,6 @@ void usb_autopm_put_interface(struct usb_interface *intf) int status; usb_mark_last_busy(udev); - atomic_dec(&intf->pm_usage_cnt); status = pm_runtime_put_sync(&intf->dev); dev_vdbg(&intf->dev, "%s: cnt %d -> %d\n", __func__, atomic_read(&intf->dev.power.usage_count), @@ -1662,7 +1656,6 @@ void usb_autopm_put_interface_async(struct usb_interface *intf) int status; usb_mark_last_busy(udev); - atomic_dec(&intf->pm_usage_cnt); status = pm_runtime_put(&intf->dev); dev_vdbg(&intf->dev, "%s: cnt %d -> %d\n", __func__, atomic_read(&intf->dev.power.usage_count), @@ -1684,7 +1677,6 @@ void usb_autopm_put_interface_no_suspend(struct usb_interface *intf) struct usb_device *udev = interface_to_usbdev(intf); usb_mark_last_busy(udev); - atomic_dec(&intf->pm_usage_cnt); pm_runtime_put_noidle(&intf->dev); } EXPORT_SYMBOL_GPL(usb_autopm_put_interface_no_suspend); @@ -1715,8 +1707,6 @@ int usb_autopm_get_interface(struct usb_interface *intf) status = pm_runtime_get_sync(&intf->dev); if (status < 0) pm_runtime_put_sync(&intf->dev); - else - atomic_inc(&intf->pm_usage_cnt); dev_vdbg(&intf->dev, "%s: cnt %d -> %d\n", __func__, atomic_read(&intf->dev.power.usage_count), status); @@ -1750,8 +1740,6 @@ int usb_autopm_get_interface_async(struct usb_interface *intf) status = pm_runtime_get(&intf->dev); if (status < 0 && status != -EINPROGRESS) pm_runtime_put_noidle(&intf->dev); - else - atomic_inc(&intf->pm_usage_cnt); dev_vdbg(&intf->dev, "%s: cnt %d -> %d\n", __func__, atomic_read(&intf->dev.power.usage_count), status); @@ -1775,7 +1763,6 @@ void usb_autopm_get_interface_no_resume(struct usb_interface *intf) struct usb_device *udev = interface_to_usbdev(intf); usb_mark_last_busy(udev); - atomic_inc(&intf->pm_usage_cnt); pm_runtime_get_noresume(&intf->dev); } EXPORT_SYMBOL_GPL(usb_autopm_get_interface_no_resume); diff --git a/drivers/usb/storage/realtek_cr.c b/drivers/usb/storage/realtek_cr.c index 31b024441938..cc794e25a0b6 100644 --- a/drivers/usb/storage/realtek_cr.c +++ b/drivers/usb/storage/realtek_cr.c @@ -763,18 +763,16 @@ static void rts51x_suspend_timer_fn(struct timer_list *t) break; case RTS51X_STAT_IDLE: case RTS51X_STAT_SS: - usb_stor_dbg(us, "RTS51X_STAT_SS, intf->pm_usage_cnt:%d, power.usage:%d\n", - atomic_read(&us->pusb_intf->pm_usage_cnt), + usb_stor_dbg(us, "RTS51X_STAT_SS, power.usage:%d\n", atomic_read(&us->pusb_intf->dev.power.usage_count)); - if (atomic_read(&us->pusb_intf->pm_usage_cnt) > 0) { + if (atomic_read(&us->pusb_intf->dev.power.usage_count) > 0) { usb_stor_dbg(us, "Ready to enter SS state\n"); rts51x_set_stat(chip, RTS51X_STAT_SS); /* ignore mass storage interface's children */ pm_suspend_ignore_children(&us->pusb_intf->dev, true); usb_autopm_put_interface_async(us->pusb_intf); - usb_stor_dbg(us, "RTS51X_STAT_SS 01, intf->pm_usage_cnt:%d, power.usage:%d\n", - atomic_read(&us->pusb_intf->pm_usage_cnt), + usb_stor_dbg(us, "RTS51X_STAT_SS 01, power.usage:%d\n", atomic_read(&us->pusb_intf->dev.power.usage_count)); } break; @@ -807,11 +805,10 @@ static void rts51x_invoke_transport(struct scsi_cmnd *srb, struct us_data *us) int ret; if (working_scsi(srb)) { - usb_stor_dbg(us, "working scsi, intf->pm_usage_cnt:%d, power.usage:%d\n", - atomic_read(&us->pusb_intf->pm_usage_cnt), + usb_stor_dbg(us, "working scsi, power.usage:%d\n", atomic_read(&us->pusb_intf->dev.power.usage_count)); - if (atomic_read(&us->pusb_intf->pm_usage_cnt) <= 0) { + if (atomic_read(&us->pusb_intf->dev.power.usage_count) <= 0) { ret = usb_autopm_get_interface(us->pusb_intf); usb_stor_dbg(us, "working scsi, ret=%d\n", ret); } diff --git a/include/linux/usb.h b/include/linux/usb.h index 5e49e82c4368..ff010d1fd1c7 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -200,7 +200,6 @@ usb_find_last_int_out_endpoint(struct usb_host_interface *alt, * @dev: driver model's view of this device * @usb_dev: if an interface is bound to the USB major, this will point * to the sysfs representation for that device. - * @pm_usage_cnt: PM usage counter for this interface * @reset_ws: Used for scheduling resets from atomic context. * @resetting_device: USB core reset the device, so use alt setting 0 as * current; needs bandwidth alloc after reset. @@ -257,7 +256,6 @@ struct usb_interface { struct device dev; /* interface specific device info */ struct device *usb_dev; - atomic_t pm_usage_cnt; /* usage counter for autosuspend */ struct work_struct reset_ws; /* for resets in atomic context */ }; #define to_usb_interface(d) container_of(d, struct usb_interface, dev) -- cgit v1.2.3 From 39420fe04f093c15e1674ef56dbae0df109738ec Mon Sep 17 00:00:00 2001 From: Corentin Labbe Date: Sat, 20 Apr 2019 18:14:33 +0000 Subject: dt-bindings: add an explanation for internal phy-mode When working on the Allwinner internal PHY, the first work was to use the "internal" mode, but some answer was made my mail on what are really internal mean for PHY. This patch write that in the doc. Signed-off-by: Corentin Labbe Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller --- Documentation/devicetree/bindings/net/ethernet.txt | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/net/ethernet.txt b/Documentation/devicetree/bindings/net/ethernet.txt index 2974e63ba311..a68621580584 100644 --- a/Documentation/devicetree/bindings/net/ethernet.txt +++ b/Documentation/devicetree/bindings/net/ethernet.txt @@ -16,7 +16,8 @@ Documentation/devicetree/bindings/phy/phy-bindings.txt. Specification). - phy-mode: string, operation mode of the PHY interface. This is now a de-facto standard property; supported values are: - * "internal" + * "internal" (Internal means there is not a standard bus between the MAC and + the PHY, something proprietary is being used to embed the PHY in the MAC.) * "mii" * "gmii" * "sgmii" -- cgit v1.2.3 From 26d1b8586b4fe14814ff4fd471cfc56014359e59 Mon Sep 17 00:00:00 2001 From: Corentin Labbe Date: Sat, 20 Apr 2019 16:43:01 +0000 Subject: Documentation: decnet: remove reference to CONFIG_DECNET_ROUTE_FWMARK CONFIG_DECNET_ROUTE_FWMARK was removed in commit 47dcf0cb1005 ("[NET]: Rethink mark field in struct flowi") Since nothing replace it (and nothindg need to replace it, simply remove it from documentation. Signed-off-by: Corentin Labbe Signed-off-by: David S. Miller --- Documentation/networking/decnet.txt | 2 -- 1 file changed, 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/networking/decnet.txt b/Documentation/networking/decnet.txt index e12a4900cf72..d192f8b9948b 100644 --- a/Documentation/networking/decnet.txt +++ b/Documentation/networking/decnet.txt @@ -22,8 +22,6 @@ you'll need the following options as well... CONFIG_DECNET_ROUTER (to be able to add/delete routes) CONFIG_NETFILTER (will be required for the DECnet routing daemon) - CONFIG_DECNET_ROUTE_FWMARK is optional - Don't turn on SIOCGIFCONF support for DECnet unless you are really sure that you need it, in general you won't and it can cause ifconfig to malfunction. -- cgit v1.2.3 From 24512228b7a3f412b5a51f189df302616b021c33 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Thu, 25 Apr 2019 22:23:51 -0700 Subject: mm: do not boost watermarks to avoid fragmentation for the DISCONTIG memory model Mikulas Patocka reported that commit 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs") "broke" memory management on parisc. The machine is not NUMA but the DISCONTIG model creates three pgdats even though it's a UMA machine for the following ranges 0) Start 0x0000000000000000 End 0x000000003fffffff Size 1024 MB 1) Start 0x0000000100000000 End 0x00000001bfdfffff Size 3070 MB 2) Start 0x0000004040000000 End 0x00000040ffffffff Size 3072 MB Mikulas reported: With the patch 1c30844d2, the kernel will incorrectly reclaim the first zone when it fills up, ignoring the fact that there are two completely free zones. Basiscally, it limits cache size to 1GiB. For example, if I run: # dd if=/dev/sda of=/dev/null bs=1M count=2048 - with the proper kernel, there should be "Buffers - 2GiB" when this command finishes. With the patch 1c30844d2, buffers will consume just 1GiB or slightly more, because the kernel was incorrectly reclaiming them. The page allocator and reclaim makes assumptions that pgdats really represent NUMA nodes and zones represent ranges and makes decisions on that basis. Watermark boosting for small pgdats leads to unexpected results even though this would have behaved reasonably on SPARSEMEM. DISCONTIG is essentially deprecated and even parisc plans to move to SPARSEMEM so there is no need to be fancy, this patch simply disables watermark boosting by default on DISCONTIGMEM. Link: http://lkml.kernel.org/r/20190419094335.GJ18914@techsingularity.net Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs") Signed-off-by: Mel Gorman Reported-by: Mikulas Patocka Tested-by: Mikulas Patocka Acked-by: Vlastimil Babka Cc: James Bottomley Cc: Matthew Wilcox Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/sysctl/vm.txt | 16 ++++++++-------- mm/page_alloc.c | 13 +++++++++++++ 2 files changed, 21 insertions(+), 8 deletions(-) (limited to 'Documentation') diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index 6af24cdb25cc..3f13d8599337 100644 --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt @@ -866,14 +866,14 @@ The intent is that compaction has less work to do in the future and to increase the success rate of future high-order allocations such as SLUB allocations, THP and hugetlbfs pages. -To make it sensible with respect to the watermark_scale_factor parameter, -the unit is in fractions of 10,000. The default value of 15,000 means -that up to 150% of the high watermark will be reclaimed in the event of -a pageblock being mixed due to fragmentation. The level of reclaim is -determined by the number of fragmentation events that occurred in the -recent past. If this value is smaller than a pageblock then a pageblocks -worth of pages will be reclaimed (e.g. 2MB on 64-bit x86). A boost factor -of 0 will disable the feature. +To make it sensible with respect to the watermark_scale_factor +parameter, the unit is in fractions of 10,000. The default value of +15,000 on !DISCONTIGMEM configurations means that up to 150% of the high +watermark will be reclaimed in the event of a pageblock being mixed due +to fragmentation. The level of reclaim is determined by the number of +fragmentation events that occurred in the recent past. If this value is +smaller than a pageblock then a pageblocks worth of pages will be reclaimed +(e.g. 2MB on 64-bit x86). A boost factor of 0 will disable the feature. ============================================================= diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c6ce20aaf80b..6c6d9f1c404e 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -266,7 +266,20 @@ compound_page_dtor * const compound_page_dtors[] = { int min_free_kbytes = 1024; int user_min_free_kbytes = -1; +#ifdef CONFIG_DISCONTIGMEM +/* + * DiscontigMem defines memory ranges as separate pg_data_t even if the ranges + * are not on separate NUMA nodes. Functionally this works but with + * watermark_boost_factor, it can reclaim prematurely as the ranges can be + * quite small. By default, do not boost watermarks on discontigmem as in + * many cases very high-order allocations like THP are likely to be + * unsupported and the premature reclaim offsets the advantage of long-term + * fragmentation avoidance. + */ +int watermark_boost_factor __read_mostly; +#else int watermark_boost_factor __read_mostly = 15000; +#endif int watermark_scale_factor = 10; static unsigned long nr_kernel_pages __initdata; -- cgit v1.2.3 From dbcdae185a704068c22984d6d05acc140ec03a8f Mon Sep 17 00:00:00 2001 From: Andrew Jones Date: Mon, 29 Apr 2019 10:27:10 +0200 Subject: Documentation: kvm: fix dirty log ioctl arch lists KVM_GET_DIRTY_LOG is implemented by all architectures, not just x86, and KVM_CAP_MANUAL_DIRTY_LOG_PROTECT is additionally implemented by arm, arm64, and mips. Signed-off-by: Andrew Jones Signed-off-by: Paolo Bonzini --- Documentation/virtual/kvm/api.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 67068c47c591..7f3ebc9e7cee 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -321,7 +321,7 @@ cpu's hardware control block. 4.8 KVM_GET_DIRTY_LOG (vm ioctl) Capability: basic -Architectures: x86 +Architectures: all Type: vm ioctl Parameters: struct kvm_dirty_log (in/out) Returns: 0 on success, -1 on error @@ -3810,7 +3810,7 @@ to I/O ports. 4.117 KVM_CLEAR_DIRTY_LOG (vm ioctl) Capability: KVM_CAP_MANUAL_DIRTY_LOG_PROTECT -Architectures: x86 +Architectures: x86, arm, arm64, mips Type: vm ioctl Parameters: struct kvm_dirty_log (in) Returns: 0 on success, -1 on error @@ -4799,7 +4799,7 @@ and injected exceptions. 7.18 KVM_CAP_MANUAL_DIRTY_LOG_PROTECT -Architectures: all +Architectures: x86, arm, arm64, mips Parameters: args[0] whether feature should be enabled or not With this capability enabled, KVM_GET_DIRTY_LOG will not automatically -- cgit v1.2.3 From 76d58e0f07ec203bbdfcaabd9a9fc10a5a3ed5ea Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Wed, 17 Apr 2019 15:28:44 +0200 Subject: KVM: fix KVM_CLEAR_DIRTY_LOG for memory slots of unaligned size If a memory slot's size is not a multiple of 64 pages (256K), then the KVM_CLEAR_DIRTY_LOG API is unusable: clearing the final 64 pages either requires the requested page range to go beyond memslot->npages, or requires log->num_pages to be unaligned, and kvm_clear_dirty_log_protect requires log->num_pages to be both in range and aligned. To allow this case, allow log->num_pages not to be a multiple of 64 if it ends exactly on the last page of the slot. Reported-by: Peter Xu Fixes: 98938aa8edd6 ("KVM: validate userspace input in kvm_clear_dirty_log_protect()", 2019-01-02) Signed-off-by: Paolo Bonzini --- Documentation/virtual/kvm/api.txt | 5 +++-- tools/testing/selftests/kvm/dirty_log_test.c | 9 ++++++--- virt/kvm/kvm_main.c | 7 ++++--- 3 files changed, 13 insertions(+), 8 deletions(-) (limited to 'Documentation') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 7f3ebc9e7cee..64b38dfcc243 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -3830,8 +3830,9 @@ The ioctl clears the dirty status of pages in a memory slot, according to the bitmap that is passed in struct kvm_clear_dirty_log's dirty_bitmap field. Bit 0 of the bitmap corresponds to page "first_page" in the memory slot, and num_pages is the size in bits of the input bitmap. -Both first_page and num_pages must be a multiple of 64. For each bit -that is set in the input bitmap, the corresponding page is marked "clean" +first_page must be a multiple of 64; num_pages must also be a multiple of +64 unless first_page + num_pages is the size of the memory slot. For each +bit that is set in the input bitmap, the corresponding page is marked "clean" in KVM's dirty bitmap, and dirty tracking is re-enabled for that page (for example via write-protection, or by clearing the dirty bit in a page table entry). diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c index 4715cfba20dc..93f99c6b7d79 100644 --- a/tools/testing/selftests/kvm/dirty_log_test.c +++ b/tools/testing/selftests/kvm/dirty_log_test.c @@ -288,8 +288,11 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations, #endif max_gfn = (1ul << (guest_pa_bits - guest_page_shift)) - 1; guest_page_size = (1ul << guest_page_shift); - /* 1G of guest page sized pages */ - guest_num_pages = (1ul << (30 - guest_page_shift)); + /* + * A little more than 1G of guest page sized pages. Cover the + * case where the size is not aligned to 64 pages. + */ + guest_num_pages = (1ul << (30 - guest_page_shift)) + 3; host_page_size = getpagesize(); host_num_pages = (guest_num_pages * guest_page_size) / host_page_size + !!((guest_num_pages * guest_page_size) % host_page_size); @@ -359,7 +362,7 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations, kvm_vm_get_dirty_log(vm, TEST_MEM_SLOT_INDEX, bmap); #ifdef USE_CLEAR_DIRTY_LOG kvm_vm_clear_dirty_log(vm, TEST_MEM_SLOT_INDEX, bmap, 0, - DIV_ROUND_UP(host_num_pages, 64) * 64); + host_num_pages); #endif vm_dirty_log_verify(bmap); iteration++; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index dc8edc97ba85..a704d1f9bd96 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1240,7 +1240,7 @@ int kvm_clear_dirty_log_protect(struct kvm *kvm, if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_USER_MEM_SLOTS) return -EINVAL; - if ((log->first_page & 63) || (log->num_pages & 63)) + if (log->first_page & 63) return -EINVAL; slots = __kvm_memslots(kvm, as_id); @@ -1253,8 +1253,9 @@ int kvm_clear_dirty_log_protect(struct kvm *kvm, n = kvm_dirty_bitmap_bytes(memslot); if (log->first_page > memslot->npages || - log->num_pages > memslot->npages - log->first_page) - return -EINVAL; + log->num_pages > memslot->npages - log->first_page || + (log->num_pages < memslot->npages - log->first_page && (log->num_pages & 63))) + return -EINVAL; *flush = false; dirty_bitmap_buffer = kvm_second_dirty_bitmap(memslot); -- cgit v1.2.3 From 799381e49b4e7b0177664cdbc54a2de8b41647a8 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Sun, 28 Apr 2019 18:10:39 -0700 Subject: Documentation: fix netdev-FAQ.rst markup warning Fix ReST underline warning: ./Documentation/networking/netdev-FAQ.rst:135: WARNING: Title underline too short. Q: I made changes to only a few patches in a patch series should I resend only those changed? -------------------------------------------------------------------------------------------- Fixes: ffa91253739c ("Documentation: networking: Update netdev-FAQ regarding patches") Signed-off-by: Randy Dunlap Cc: Florian Fainelli Signed-off-by: David S. Miller --- Documentation/networking/netdev-FAQ.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/networking/netdev-FAQ.rst b/Documentation/networking/netdev-FAQ.rst index 8c7a713cf657..642fa963be3c 100644 --- a/Documentation/networking/netdev-FAQ.rst +++ b/Documentation/networking/netdev-FAQ.rst @@ -132,7 +132,7 @@ version that should be applied. If there is any doubt, the maintainer will reply and ask what should be done. Q: I made changes to only a few patches in a patch series should I resend only those changed? --------------------------------------------------------------------------------------------- +--------------------------------------------------------------------------------------------- A: No, please resend the entire patch series and make sure you do number your patches such that it is clear this is the latest and greatest set of patches that can be applied. -- cgit v1.2.3 From 13a22f73319c75511bd6b49cc77afb15fed0a91e Mon Sep 17 00:00:00 2001 From: Bartosz Golaszewski Date: Tue, 23 Apr 2019 11:04:41 +0200 Subject: dt-bindings: mfd: Add DT bindings for max77650 Add a DT binding document for max77650 ultra-low power PMIC. This describes the core mfd device and the GPIO module. Signed-off-by: Bartosz Golaszewski Reviewed-by: Rob Herring Acked-by: Pavel Machek Signed-off-by: Lee Jones --- Documentation/devicetree/bindings/mfd/max77650.txt | 46 ++++++++++++++++++++++ 1 file changed, 46 insertions(+) create mode 100644 Documentation/devicetree/bindings/mfd/max77650.txt (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/mfd/max77650.txt b/Documentation/devicetree/bindings/mfd/max77650.txt new file mode 100644 index 000000000000..b529d8d19335 --- /dev/null +++ b/Documentation/devicetree/bindings/mfd/max77650.txt @@ -0,0 +1,46 @@ +MAX77650 ultra low-power PMIC from Maxim Integrated. + +Required properties: +------------------- +- compatible: Must be "maxim,max77650" +- reg: I2C device address. +- interrupts: The interrupt on the parent the controller is + connected to. +- interrupt-controller: Marks the device node as an interrupt controller. +- #interrupt-cells: Must be <2>. + +- gpio-controller: Marks the device node as a gpio controller. +- #gpio-cells: Must be <2>. The first cell is the pin number and + the second cell is used to specify the gpio active + state. + +Optional properties: +-------------------- +gpio-line-names: Single string containing the name of the GPIO line. + +The GPIO-controller module is represented as part of the top-level PMIC +node. The device exposes a single GPIO line. + +For device-tree bindings of other sub-modules (regulator, power supply, +LEDs and onkey) refer to the binding documents under the respective +sub-system directories. + +For more details on GPIO bindings, please refer to the generic GPIO DT +binding document . + +Example: +-------- + + pmic@48 { + compatible = "maxim,max77650"; + reg = <0x48>; + + interrupt-controller; + interrupt-parent = <&gpio2>; + #interrupt-cells = <2>; + interrupts = <3 IRQ_TYPE_LEVEL_LOW>; + + gpio-controller; + #gpio-cells = <2>; + gpio-line-names = "max77650-charger"; + }; -- cgit v1.2.3 From 424ece627cd9e61ecd35ddd306b6bbda5e17ec24 Mon Sep 17 00:00:00 2001 From: Bartosz Golaszewski Date: Tue, 23 Apr 2019 11:04:42 +0200 Subject: dt-bindings: power: supply: Add DT bindings for max77650 Add the DT binding document for the battery charger module of max77650. Signed-off-by: Bartosz Golaszewski Reviewed-by: Sebastian Reichel Signed-off-by: Lee Jones --- .../bindings/power/supply/max77650-charger.txt | 28 ++++++++++++++++++++++ 1 file changed, 28 insertions(+) create mode 100644 Documentation/devicetree/bindings/power/supply/max77650-charger.txt (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/power/supply/max77650-charger.txt b/Documentation/devicetree/bindings/power/supply/max77650-charger.txt new file mode 100644 index 000000000000..e6d0fb6ff94e --- /dev/null +++ b/Documentation/devicetree/bindings/power/supply/max77650-charger.txt @@ -0,0 +1,28 @@ +Battery charger driver for MAX77650 PMIC from Maxim Integrated. + +This module is part of the MAX77650 MFD device. For more details +see Documentation/devicetree/bindings/mfd/max77650.txt. + +The charger is represented as a sub-node of the PMIC node on the device tree. + +Required properties: +-------------------- +- compatible: Must be "maxim,max77650-charger" + +Optional properties: +-------------------- +- input-voltage-min-microvolt: Minimum CHGIN regulation voltage. Must be one + of: 4000000, 4100000, 4200000, 4300000, + 4400000, 4500000, 4600000, 4700000. +- input-current-limit-microamp: CHGIN input current limit (in microamps). Must + be one of: 95000, 190000, 285000, 380000, + 475000. + +Example: +-------- + + charger { + compatible = "maxim,max77650-charger"; + input-voltage-min-microvolt = <4200000>; + input-current-limit-microamp = <285000>; + }; -- cgit v1.2.3 From 5a032b0697c79d7499ba955b40fa9a7756f08dad Mon Sep 17 00:00:00 2001 From: Bartosz Golaszewski Date: Tue, 23 Apr 2019 11:04:43 +0200 Subject: dt-bindings: leds: Add DT bindings for max77650 Add the DT binding document for the LEDs module of max77650. Signed-off-by: Bartosz Golaszewski Reviewed-by: Rob Herring Acked-by: Pavel Machek Signed-off-by: Lee Jones --- .../devicetree/bindings/leds/leds-max77650.txt | 57 ++++++++++++++++++++++ 1 file changed, 57 insertions(+) create mode 100644 Documentation/devicetree/bindings/leds/leds-max77650.txt (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/leds/leds-max77650.txt b/Documentation/devicetree/bindings/leds/leds-max77650.txt new file mode 100644 index 000000000000..3a67115cc1da --- /dev/null +++ b/Documentation/devicetree/bindings/leds/leds-max77650.txt @@ -0,0 +1,57 @@ +LED driver for MAX77650 PMIC from Maxim Integrated. + +This module is part of the MAX77650 MFD device. For more details +see Documentation/devicetree/bindings/mfd/max77650.txt. + +The LED controller is represented as a sub-node of the PMIC node on +the device tree. + +This device has three current sinks. + +Required properties: +-------------------- +- compatible: Must be "maxim,max77650-led" +- #address-cells: Must be <1>. +- #size-cells: Must be <0>. + +Each LED is represented as a sub-node of the LED-controller node. Up to +three sub-nodes can be defined. + +Required properties of the sub-node: +------------------------------------ + +- reg: Must be <0>, <1> or <2>. + +Optional properties of the sub-node: +------------------------------------ + +- label: See Documentation/devicetree/bindings/leds/common.txt +- linux,default-trigger: See Documentation/devicetree/bindings/leds/common.txt + +For more details, please refer to the generic GPIO DT binding document +. + +Example: +-------- + + leds { + compatible = "maxim,max77650-led"; + #address-cells = <1>; + #size-cells = <0>; + + led@0 { + reg = <0>; + label = "blue:usr0"; + }; + + led@1 { + reg = <1>; + label = "red:usr1"; + linux,default-trigger = "heartbeat"; + }; + + led@2 { + reg = <2>; + label = "green:usr2"; + }; + }; -- cgit v1.2.3 From 93fb61e2c3defc6fe89cdf041f7f6a659b64ec56 Mon Sep 17 00:00:00 2001 From: Bartosz Golaszewski Date: Tue, 23 Apr 2019 11:04:44 +0200 Subject: dt-bindings: input: Add DT bindings for max77650 Add the DT binding document for the onkey module of max77650. Signed-off-by: Bartosz Golaszewski Reviewed-by: Rob Herring Signed-off-by: Lee Jones --- .../devicetree/bindings/input/max77650-onkey.txt | 26 ++++++++++++++++++++++ 1 file changed, 26 insertions(+) create mode 100644 Documentation/devicetree/bindings/input/max77650-onkey.txt (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/input/max77650-onkey.txt b/Documentation/devicetree/bindings/input/max77650-onkey.txt new file mode 100644 index 000000000000..477dc74f452a --- /dev/null +++ b/Documentation/devicetree/bindings/input/max77650-onkey.txt @@ -0,0 +1,26 @@ +Onkey driver for MAX77650 PMIC from Maxim Integrated. + +This module is part of the MAX77650 MFD device. For more details +see Documentation/devicetree/bindings/mfd/max77650.txt. + +The onkey controller is represented as a sub-node of the PMIC node on +the device tree. + +Required properties: +-------------------- +- compatible: Must be "maxim,max77650-onkey". + +Optional properties: +- linux,code: The key-code to be reported when the key is pressed. + Defaults to KEY_POWER. +- maxim,onkey-slide: The system's button is a slide switch, not the default + push button. + +Example: +-------- + + onkey { + compatible = "maxim,max77650-onkey"; + linux,code = ; + maxim,onkey-slide; + }; -- cgit v1.2.3 From 56076a538536ace0ac4965771acd7c62fdd4c3dd Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Sun, 5 May 2019 18:43:20 +0300 Subject: dt-bindings: mfd: max77620: Add compatible for Maxim 77663 Maxim 77663 has a few minor differences in regards to hardware interface and available capabilities by comparing it with 77620 and 20024 models, hence re-use 77620 device-tree binding for the 77663. Signed-off-by: Dmitry Osipenko Reviewed-by: Rob Herring Signed-off-by: Lee Jones --- Documentation/devicetree/bindings/mfd/max77620.txt | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/mfd/max77620.txt b/Documentation/devicetree/bindings/mfd/max77620.txt index 9c16d51cc15b..b75283787ba1 100644 --- a/Documentation/devicetree/bindings/mfd/max77620.txt +++ b/Documentation/devicetree/bindings/mfd/max77620.txt @@ -4,7 +4,8 @@ Required properties: ------------------- - compatible: Must be one of "maxim,max77620" - "maxim,max20024". + "maxim,max20024" + "maxim,max77663" - reg: I2C device address. Optional properties: @@ -105,6 +106,7 @@ Optional properties: Here supported time periods by device in microseconds are as follows: MAX77620 supports 40, 80, 160, 320, 640, 1280, 2560 and 5120 microseconds. MAX20024 supports 20, 40, 80, 160, 320, 640, 1280 and 2540 microseconds. +MAX77663 supports 20, 40, 80, 160, 320, 640, 1280 and 2540 microseconds. -maxim,power-ok-control: configure map power ok bit 1: Enables POK(Power OK) to control nRST_IO and GPIO1 -- cgit v1.2.3 From c63217a462fe655eff329abc6a552fba5452c46f Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Sun, 5 May 2019 18:43:21 +0300 Subject: dt-bindings: mfd: max77620: Add system-power-controller property Document new generic property that designates the PMIC as the system's power controller. Signed-off-by: Dmitry Osipenko Reviewed-by: Rob Herring Signed-off-by: Lee Jones --- Documentation/devicetree/bindings/mfd/max77620.txt | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/mfd/max77620.txt b/Documentation/devicetree/bindings/mfd/max77620.txt index b75283787ba1..5a642a51d58e 100644 --- a/Documentation/devicetree/bindings/mfd/max77620.txt +++ b/Documentation/devicetree/bindings/mfd/max77620.txt @@ -18,6 +18,11 @@ Optional properties: IRQ numbers for different interrupt source of MAX77620 are defined at dt-bindings/mfd/max77620.h. +- system-power-controller: Indicates that this PMIC is controlling the + system power, see [1] for more details. + +[1] Documentation/devicetree/bindings/power/power-controller.txt + Optional subnodes and their properties: ======================================= -- cgit v1.2.3 From fb8c86911052ccd4011392f78a24991c55c0d172 Mon Sep 17 00:00:00 2001 From: Amelie Delaunay Date: Thu, 9 May 2019 10:58:48 +0200 Subject: dt-bindings: mfd: Add ST Multi-Function eXpander (STMFX) core bindings This patch adds documentation of device tree bindings for the STMicroelectronics Multi-Function eXpander (STMFX) MFD core. Signed-off-by: Amelie Delaunay Reviewed-by: Linus Walleij Reviewed-by: Rob Herring Signed-off-by: Lee Jones --- Documentation/devicetree/bindings/mfd/stmfx.txt | 28 +++++++++++++++++++++++++ 1 file changed, 28 insertions(+) create mode 100644 Documentation/devicetree/bindings/mfd/stmfx.txt (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/mfd/stmfx.txt b/Documentation/devicetree/bindings/mfd/stmfx.txt new file mode 100644 index 000000000000..f0c2f7fcf5c7 --- /dev/null +++ b/Documentation/devicetree/bindings/mfd/stmfx.txt @@ -0,0 +1,28 @@ +STMicroelectonics Multi-Function eXpander (STMFX) Core bindings + +ST Multi-Function eXpander (STMFX) is a slave controller using I2C for +communication with the main MCU. Its main features are GPIO expansion, main +MCU IDD measurement (IDD is the amount of current that flows through VDD) and +resistive touchscreen controller. + +Required properties: +- compatible: should be "st,stmfx-0300". +- reg: I2C slave address of the device. +- interrupts: interrupt specifier triggered by MFX_IRQ_OUT signal. + Please refer to ../interrupt-controller/interrupt.txt + +Optional properties: +- drive-open-drain: configure MFX_IRQ_OUT as open drain. +- vdd-supply: phandle of the regulator supplying STMFX. + +Example: + + stmfx: stmfx@42 { + compatible = "st,stmfx-0300"; + reg = <0x42>; + interrupts = <8 IRQ_TYPE_EDGE_RISING>; + interrupt-parent = <&gpioi>; + vdd-supply = <&v3v3>; + }; + +Please refer to ../pinctrl/pinctrl-stmfx.txt for STMFX GPIO expander function bindings. -- cgit v1.2.3 From 2e0b80ce4520602b6c1aa806b42a01b7ff7a9af5 Mon Sep 17 00:00:00 2001 From: Amelie Delaunay Date: Thu, 9 May 2019 10:58:50 +0200 Subject: dt-bindings: pinctrl: document the STMFX pinctrl bindings This patch adds documentation of device tree bindings for the STMicroelectronics Multi-Function eXpander (STMFX) GPIO expander. Signed-off-by: Amelie Delaunay Reviewed-by: Linus Walleij Reviewed-by: Rob Herring Signed-off-by: Lee Jones --- .../devicetree/bindings/pinctrl/pinctrl-stmfx.txt | 116 +++++++++++++++++++++ 1 file changed, 116 insertions(+) create mode 100644 Documentation/devicetree/bindings/pinctrl/pinctrl-stmfx.txt (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/pinctrl/pinctrl-stmfx.txt b/Documentation/devicetree/bindings/pinctrl/pinctrl-stmfx.txt new file mode 100644 index 000000000000..c1b4c1819b84 --- /dev/null +++ b/Documentation/devicetree/bindings/pinctrl/pinctrl-stmfx.txt @@ -0,0 +1,116 @@ +STMicroelectronics Multi-Function eXpander (STMFX) GPIO expander bindings + +ST Multi-Function eXpander (STMFX) offers up to 24 GPIOs expansion. +Please refer to ../mfd/stmfx.txt for STMFX Core bindings. + +Required properties: +- compatible: should be "st,stmfx-0300-pinctrl". +- #gpio-cells: should be <2>, the first cell is the GPIO number and the second + cell is the gpio flags in accordance with . +- gpio-controller: marks the device as a GPIO controller. +- #interrupt-cells: should be <2>, the first cell is the GPIO number and the + second cell is the interrupt flags in accordance with + . +- interrupt-controller: marks the device as an interrupt controller. +- gpio-ranges: specifies the mapping between gpio controller and pin + controller pins. Check "Concerning gpio-ranges property" below. +Please refer to ../gpio/gpio.txt. + +Please refer to pinctrl-bindings.txt for pin configuration. + +Required properties for pin configuration sub-nodes: +- pins: list of pins to which the configuration applies. + +Optional properties for pin configuration sub-nodes (pinconf-generic ones): +- bias-disable: disable any bias on the pin. +- bias-pull-up: the pin will be pulled up. +- bias-pull-pin-default: use the pin-default pull state. +- bias-pull-down: the pin will be pulled down. +- drive-open-drain: the pin will be driven with open drain. +- drive-push-pull: the pin will be driven actively high and low. +- output-high: the pin will be configured as an output driving high level. +- output-low: the pin will be configured as an output driving low level. + +Note that STMFX pins[15:0] are called "gpio[15:0]", and STMFX pins[23:16] are +called "agpio[7:0]". Example, to refer to pin 18 of STMFX, use "agpio2". + +Concerning gpio-ranges property: +- if all STMFX pins[24:0] are available (no other STMFX function in use), you + should use gpio-ranges = <&stmfx_pinctrl 0 0 24>; +- if agpio[3:0] are not available (STMFX Touchscreen function in use), you + should use gpio-ranges = <&stmfx_pinctrl 0 0 16>, <&stmfx_pinctrl 20 20 4>; +- if agpio[7:4] are not available (STMFX IDD function in use), you + should use gpio-ranges = <&stmfx_pinctrl 0 0 20>; + + +Example: + + stmfx: stmfx@42 { + ... + + stmfx_pinctrl: stmfx-pin-controller { + compatible = "st,stmfx-0300-pinctrl"; + #gpio-cells = <2>; + #interrupt-cells = <2>; + gpio-controller; + interrupt-controller; + gpio-ranges = <&stmfx_pinctrl 0 0 24>; + + joystick_pins: joystick { + pins = "gpio0", "gpio1", "gpio2", "gpio3", "gpio4"; + drive-push-pull; + bias-pull-up; + }; + }; + }; + +Example of STMFX GPIO consumers: + + joystick { + compatible = "gpio-keys"; + #address-cells = <1>; + #size-cells = <0>; + pinctrl-0 = <&joystick_pins>; + pinctrl-names = "default"; + button-0 { + label = "JoySel"; + linux,code = ; + interrupt-parent = <&stmfx_pinctrl>; + interrupts = <0 IRQ_TYPE_EDGE_RISING>; + }; + button-1 { + label = "JoyDown"; + linux,code = ; + interrupt-parent = <&stmfx_pinctrl>; + interrupts = <1 IRQ_TYPE_EDGE_RISING>; + }; + button-2 { + label = "JoyLeft"; + linux,code = ; + interrupt-parent = <&stmfx_pinctrl>; + interrupts = <2 IRQ_TYPE_EDGE_RISING>; + }; + button-3 { + label = "JoyRight"; + linux,code = ; + interrupt-parent = <&stmfx_pinctrl>; + interrupts = <3 IRQ_TYPE_EDGE_RISING>; + }; + button-4 { + label = "JoyUp"; + linux,code = ; + interrupt-parent = <&stmfx_pinctrl>; + interrupts = <4 IRQ_TYPE_EDGE_RISING>; + }; + }; + + leds { + compatible = "gpio-leds"; + orange { + gpios = <&stmfx_pinctrl 17 1>; + }; + + blue { + gpios = <&stmfx_pinctrl 19 1>; + }; + } -- cgit v1.2.3