<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/ipv4, branch master</title>
<subtitle>Linux Kernel (branches are rebased on master from time to time)</subtitle>
<id>https://sre.ring0.de/linux/atom?h=master</id>
<link rel='self' href='https://sre.ring0.de/linux/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/'/>
<updated>2023-01-24T05:37:39Z</updated>
<entry>
<title>ipv4: prevent potential spectre v1 gadget in fib_metrics_match()</title>
<updated>2023-01-24T05:37:39Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2023-01-20T13:31:40Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=5e9398a26a92fc402d82ce1f97cc67d832527da0'/>
<id>urn:sha1:5e9398a26a92fc402d82ce1f97cc67d832527da0</id>
<content type='text'>
if (!type)
        continue;
    if (type &gt; RTAX_MAX)
        return false;
    ...
    fi_val = fi-&gt;fib_metrics-&gt;metrics[type - 1];

@type being used as an array index, we need to prevent
cpu speculation or risk leaking kernel memory content.

Fixes: 5f9ae3d9e7e4 ("ipv4: do metrics match when looking up and deleting a route")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://lore.kernel.org/r/20230120133140.3624204-1-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>ipv4: prevent potential spectre v1 gadget in ip_metrics_convert()</title>
<updated>2023-01-24T05:37:25Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2023-01-20T13:30:40Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=1d1d63b612801b3f0a39b7d4467cad0abd60e5c8'/>
<id>urn:sha1:1d1d63b612801b3f0a39b7d4467cad0abd60e5c8</id>
<content type='text'>
if (!type)
		continue;
	if (type &gt; RTAX_MAX)
		return -EINVAL;
	...
	metrics[type - 1] = val;

@type being used as an array index, we need to prevent
cpu speculation or risk leaking kernel memory content.

Fixes: 6cf9dfd3bd62 ("net: fib: move metrics parsing to a helper")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://lore.kernel.org/r/20230120133040.3623463-1-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp: fix rate_app_limited to default to 1</title>
<updated>2023-01-20T13:23:35Z</updated>
<author>
<name>David Morley</name>
<email>morleyd@google.com</email>
</author>
<published>2023-01-19T19:00:28Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=300b655db1b5152d6101bcb6801d50899b20c2d6'/>
<id>urn:sha1:300b655db1b5152d6101bcb6801d50899b20c2d6</id>
<content type='text'>
The initial default value of 0 for tp-&gt;rate_app_limited was incorrect,
since a flow is indeed application-limited until it first sends
data. Fixing the default to be 1 is generally correct but also
specifically will help user-space applications avoid using the initial
tcpi_delivery_rate value of 0 that persists until the connection has
some non-zero bandwidth sample.

Fixes: eb8329e0a04d ("tcp: export data delivery rate")
Suggested-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: David Morley &lt;morleyd@google.com&gt;
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Tested-by: David Morley &lt;morleyd@google.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net/ulp: use consistent error code when blocking ULP</title>
<updated>2023-01-19T17:26:16Z</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2023-01-18T12:24:12Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=8ccc99362b60c6f27bb46f36fdaaccf4ef0303de'/>
<id>urn:sha1:8ccc99362b60c6f27bb46f36fdaaccf4ef0303de</id>
<content type='text'>
The referenced commit changed the error code returned by the kernel
when preventing a non-established socket from attaching the ktls
ULP. Before to such a commit, the user-space got ENOTCONN instead
of EINVAL.

The existing self-tests depend on such error code, and the change
caused a failure:

  RUN           global.non_established ...
 tls.c:1673:non_established:Expected errno (22) == ENOTCONN (107)
 non_established: Test failed at step #3
          FAIL  global.non_established

In the unlikely event existing applications do the same, address
the issue by restoring the prior error code in the above scenario.

Note that the only other ULP performing similar checks at init
time - smc_ulp_ops - also fails with ENOTCONN when trying to attach
the ULP to a non-established socket.

Reported-by: Sabrina Dubroca &lt;sd@queasysnail.net&gt;
Fixes: 2c02d41d71f9 ("net/ulp: prevent ULP without clone op from entering the LISTEN status")
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Reviewed-by: Sabrina Dubroca &lt;sd@queasysnail.net&gt;
Link: https://lore.kernel.org/r/7bb199e7a93317fb6f8bf8b9b2dc71c18f337cde.1674042685.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp: avoid the lookup process failing to get sk in ehash table</title>
<updated>2023-01-19T12:06:45Z</updated>
<author>
<name>Jason Xing</name>
<email>kernelxing@tencent.com</email>
</author>
<published>2023-01-18T01:59:41Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=3f4ca5fafc08881d7a57daa20449d171f2887043'/>
<id>urn:sha1:3f4ca5fafc08881d7a57daa20449d171f2887043</id>
<content type='text'>
While one cpu is working on looking up the right socket from ehash
table, another cpu is done deleting the request socket and is about
to add (or is adding) the big socket from the table. It means that
we could miss both of them, even though it has little chance.

Let me draw a call trace map of the server side.
   CPU 0                           CPU 1
   -----                           -----
tcp_v4_rcv()                  syn_recv_sock()
                            inet_ehash_insert()
                            -&gt; sk_nulls_del_node_init_rcu(osk)
__inet_lookup_established()
                            -&gt; __sk_nulls_add_node_rcu(sk, list)

Notice that the CPU 0 is receiving the data after the final ack
during 3-way shakehands and CPU 1 is still handling the final ack.

Why could this be a real problem?
This case is happening only when the final ack and the first data
receiving by different CPUs. Then the server receiving data with
ACK flag tries to search one proper established socket from ehash
table, but apparently it fails as my map shows above. After that,
the server fetches a listener socket and then sends a RST because
it finds a ACK flag in the skb (data), which obeys RST definition
in RFC 793.

Besides, Eric pointed out there's one more race condition where it
handles tw socket hashdance. Only by adding to the tail of the list
before deleting the old one can we avoid the race if the reader has
already begun the bucket traversal and it would possibly miss the head.

Many thanks to Eric for great help from beginning to end.

Fixes: 5e0724d027f0 ("tcp/dccp: fix hashdance race for passive sessions")
Suggested-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Jason Xing &lt;kernelxing@tencent.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Link: https://lore.kernel.org/lkml/20230112065336.41034-1-kerneljasonxing@gmail.com/
Link: https://lore.kernel.org/r/20230118015941.1313-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'net-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2023-01-05T20:40:50Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-01-05T20:40:50Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=50011c32f421215f6231996fcc84fd1fe81c4a48'/>
<id>urn:sha1:50011c32f421215f6231996fcc84fd1fe81c4a48</id>
<content type='text'>
Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf, wifi, and netfilter.

  Current release - regressions:

   - bpf: fix nullness propagation for reg to reg comparisons, avoid
     null-deref

   - inet: control sockets should not use current thread task_frag

   - bpf: always use maximal size for copy_array()

   - eth: bnxt_en: don't link netdev to a devlink port for VFs

  Current release - new code bugs:

   - rxrpc: fix a couple of potential use-after-frees

   - netfilter: conntrack: fix IPv6 exthdr error check

   - wifi: iwlwifi: fw: skip PPAG for JF, avoid FW crashes

   - eth: dsa: qca8k: various fixes for the in-band register access

   - eth: nfp: fix schedule in atomic context when sync mc address

   - eth: renesas: rswitch: fix getting mac address from device tree

   - mobile: ipa: use proper endpoint mask for suspend

  Previous releases - regressions:

   - tcp: add TIME_WAIT sockets in bhash2, fix regression caught by
     Jiri / python tests

   - net: tc: don't intepret cls results when asked to drop, fix
     oob-access

   - vrf: determine the dst using the original ifindex for multicast

   - eth: bnxt_en:
      - fix XDP RX path if BPF adjusted packet length
      - fix HDS (header placement) and jumbo thresholds for RX packets

   - eth: ice: xsk: do not use xdp_return_frame() on tx_buf-&gt;raw_buf,
     avoid memory corruptions

  Previous releases - always broken:

   - ulp: prevent ULP without clone op from entering the LISTEN status

   - veth: fix race with AF_XDP exposing old or uninitialized
     descriptors

   - bpf:
      - pull before calling skb_postpull_rcsum() (fix checksum support
        and avoid a WARN())
      - fix panic due to wrong pageattr of im-&gt;image (when livepatch and
        kretfunc coexist)
      - keep a reference to the mm, in case the task is dead

   - mptcp: fix deadlock in fastopen error path

   - netfilter:
      - nf_tables: perform type checking for existing sets
      - nf_tables: honor set timeout and garbage collection updates
      - ipset: fix hash:net,port,net hang with /0 subnet
      - ipset: avoid hung task warning when adding/deleting entries

   - selftests: net:
      - fix cmsg_so_mark.sh test hang on non-x86 systems
      - fix the arp_ndisc_evict_nocarrier test for IPv6

   - usb: rndis_host: secure rndis_query check against int overflow

   - eth: r8169: fix dmar pte write access during suspend/resume with
     WOL

   - eth: lan966x: fix configuration of the PCS

   - eth: sparx5: fix reading of the MAC address

   - eth: qed: allow sleep in qed_mcp_trace_dump()

   - eth: hns3:
      - fix interrupts re-initialization after VF FLR
      - fix handling of promisc when MAC addr table gets full
      - refine the handling for VF heartbeat

   - eth: mlx5:
      - properly handle ingress QinQ-tagged packets on VST
      - fix io_eq_size and event_eq_size params validation on big endian
      - fix RoCE setting at HCA level if not supported at all
      - don't turn CQE compression on by default for IPoIB

   - eth: ena:
      - fix toeplitz initial hash key value
      - account for the number of XDP-processed bytes in interface stats
      - fix rx_copybreak value update

  Misc:

   - ethtool: harden phy stat handling against buggy drivers

   - docs: netdev: convert maintainer's doc from FAQ to a normal
     document"

* tag 'net-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (112 commits)
  caif: fix memory leak in cfctrl_linkup_request()
  inet: control sockets should not use current thread task_frag
  net/ulp: prevent ULP without clone op from entering the LISTEN status
  qed: allow sleep in qed_mcp_trace_dump()
  MAINTAINERS: Update maintainers for ptp_vmw driver
  usb: rndis_host: Secure rndis_query check against int overflow
  net: dpaa: Fix dtsec check for PCS availability
  octeontx2-pf: Fix lmtst ID used in aura free
  drivers/net/bonding/bond_3ad: return when there's no aggregator
  netfilter: ipset: Rework long task execution when adding/deleting entries
  netfilter: ipset: fix hash:net,port,net hang with /0 subnet
  net: sparx5: Fix reading of the MAC address
  vxlan: Fix memory leaks in error path
  net: sched: htb: fix htb_classify() kernel-doc
  net: sched: cbq: dont intepret cls results when asked to drop
  net: sched: atm: dont intepret cls results when asked to drop
  dt-bindings: net: marvell,orion-mdio: Fix examples
  dt-bindings: net: sun8i-emac: Add phy-supply property
  net: ipa: use proper endpoint mask for suspend
  selftests: net: return non-zero for failures reported in arp_ndisc_evict_nocarrier
  ...
</content>
</entry>
<entry>
<title>inet: control sockets should not use current thread task_frag</title>
<updated>2023-01-05T04:38:25Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2023-01-03T19:27:36Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=1ac88557447088ccd15eb2f2520ce46d463c8e0b'/>
<id>urn:sha1:1ac88557447088ccd15eb2f2520ce46d463c8e0b</id>
<content type='text'>
Because ICMP handlers run from softirq contexts,
they must not use current thread task_frag.

Previously, all sockets allocated by inet_ctl_sock_create()
would use the per-socket page fragment, with no chance of
recursion.

Fixes: 98123866fcf3 ("Treewide: Stop corrupting socket's task_frag")
Reported-by: syzbot+bebc6f1acdf4cbb79b03@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Benjamin Coddington &lt;bcodding@redhat.com&gt;
Acked-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Link: https://lore.kernel.org/r/20230103192736.454149-1-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net/ulp: prevent ULP without clone op from entering the LISTEN status</title>
<updated>2023-01-05T04:37:41Z</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2023-01-03T11:19:17Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=2c02d41d71f90a5168391b6a5f2954112ba2307c'/>
<id>urn:sha1:2c02d41d71f90a5168391b6a5f2954112ba2307c</id>
<content type='text'>
When an ULP-enabled socket enters the LISTEN status, the listener ULP data
pointer is copied inside the child/accepted sockets by sk_clone_lock().

The relevant ULP can take care of de-duplicating the context pointer via
the clone() operation, but only MPTCP and SMC implement such op.

Other ULPs may end-up with a double-free at socket disposal time.

We can't simply clear the ULP data at clone time, as TLS replaces the
socket ops with custom ones assuming a valid TLS ULP context is
available.

Instead completely prevent clone-less ULP sockets from entering the
LISTEN status.

Fixes: 734942cc4ea6 ("tcp: ULP infrastructure")
Reported-by: slipper &lt;slipper.alive@gmail.com&gt;
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Link: https://lore.kernel.org/r/4b80c3d1dbe3d0ab072f80450c202d9bc88b4b03.1672740602.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp: Add TIME_WAIT sockets in bhash2.</title>
<updated>2022-12-30T07:25:52Z</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.com</email>
</author>
<published>2022-12-26T13:27:52Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=936a192f974018b4f6040f6f77b1cc1e75bd8666'/>
<id>urn:sha1:936a192f974018b4f6040f6f77b1cc1e75bd8666</id>
<content type='text'>
Jiri Slaby reported regression of bind() with a simple repro. [0]

The repro creates a TIME_WAIT socket and tries to bind() a new socket
with the same local address and port.  Before commit 28044fc1d495 ("net:
Add a bhash2 table hashed by port and address"), the bind() failed with
-EADDRINUSE, but now it succeeds.

The cited commit should have put TIME_WAIT sockets into bhash2; otherwise,
inet_bhash2_conflict() misses TIME_WAIT sockets when validating bind()
requests if the address is not a wildcard one.

The straight option is to move sk_bind2_node from struct sock to struct
sock_common to add twsk to bhash2 as implemented as RFC. [1]  However, the
binary layout change in the struct sock could affect performances moving
hot fields on different cachelines.

To avoid that, we add another TIME_WAIT list in inet_bind2_bucket and check
it while validating bind().

[0]: https://lore.kernel.org/netdev/6b971a4e-c7d8-411e-1f92-fda29b5b2fb9@kernel.org/
[1]: https://lore.kernel.org/netdev/20221221151258.25748-2-kuniyu@amazon.com/

Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address")
Reported-by: Jiri Slaby &lt;jirislaby@kernel.org&gt;
Suggested-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Acked-by: Joanne Koong &lt;joannelkoong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>treewide: Convert del_timer*() to timer_shutdown*()</title>
<updated>2022-12-25T21:38:09Z</updated>
<author>
<name>Steven Rostedt (Google)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2022-12-20T18:45:19Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=292a089d78d3e2f7944e60bb897c977785a321e3'/>
<id>urn:sha1:292a089d78d3e2f7944e60bb897c977785a321e3</id>
<content type='text'>
Due to several bugs caused by timers being re-armed after they are
shutdown and just before they are freed, a new state of timers was added
called "shutdown".  After a timer is set to this state, then it can no
longer be re-armed.

The following script was run to find all the trivial locations where
del_timer() or del_timer_sync() is called in the same function that the
object holding the timer is freed.  It also ignores any locations where
the timer-&gt;function is modified between the del_timer*() and the free(),
as that is not considered a "trivial" case.

This was created by using a coccinelle script and the following
commands:

    $ cat timer.cocci
    @@
    expression ptr, slab;
    identifier timer, rfield;
    @@
    (
    -       del_timer(&amp;ptr-&gt;timer);
    +       timer_shutdown(&amp;ptr-&gt;timer);
    |
    -       del_timer_sync(&amp;ptr-&gt;timer);
    +       timer_shutdown_sync(&amp;ptr-&gt;timer);
    )
      ... when strict
          when != ptr-&gt;timer
    (
            kfree_rcu(ptr, rfield);
    |
            kmem_cache_free(slab, ptr);
    |
            kfree(ptr);
    )

    $ spatch timer.cocci . &gt; /tmp/t.patch
    $ patch -p1 &lt; /tmp/t.patch

Link: https://lore.kernel.org/lkml/20221123201306.823305113@linutronix.de/
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
Acked-by: Pavel Machek &lt;pavel@ucw.cz&gt; [ LED ]
Acked-by: Kalle Valo &lt;kvalo@kernel.org&gt; [ wireless ]
Acked-by: Paolo Abeni &lt;pabeni@redhat.com&gt; [ networking ]
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
