summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2022-07-30rv/monitor: Add the wip monitorDaniel Bristot de Oliveira1-0/+10
The wakeup in preemptive (wip) monitor verifies if the wakeup events always take place with preemption disabled: | | v #==================# H preemptive H <+ #==================# | | | | preempt_disable | preempt_enable v | sched_waking +------------------+ | +--------------- | | | | | non_preemptive | | +--------------> | | -+ +------------------+ The wakeup event always takes place with preemption disabled because of the scheduler synchronization. However, because the preempt_count and its trace event are not atomic with regard to interrupts, some inconsistencies might happen. The documentation illustrates one of these cases. Link: https://lkml.kernel.org/r/c98ca678df81115fddc04921b3c79720c836b18f.1659052063.git.bristot@kernel.org Cc: Wim Van Sebroeck <wim@linux-watchdog.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marco Elver <elver@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Gabriele Paoloni <gpaoloni@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Tao Zhou <tao.zhou@linux.dev> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-30Documentation/rv: Add deterministic automata monitor synthesis documentationDaniel Bristot de Oliveira1-0/+3
Add the da_monitor_synthesis.rst introduces some concepts behind the Deterministic Automata (DA) monitor synthesis and interface. Link: https://lkml.kernel.org/r/7873bdb7b2e5d2bc0b2eb6ca0b324af9a0ba27a0.1659052063.git.bristot@kernel.org Cc: Wim Van Sebroeck <wim@linux-watchdog.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marco Elver <elver@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Gabriele Paoloni <gpaoloni@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Tao Zhou <tao.zhou@linux.dev> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-30rv/include: Add instrumentation helper functionsDaniel Bristot de Oliveira1-0/+29
Instrumentation helper functions to facilitate the instrumentation of auto-generated RV monitors create by dot2k. Link: https://lkml.kernel.org/r/3b36c9435f9d9299beb84e5c7c46920e205bedec.1659052063.git.bristot@kernel.org Cc: Wim Van Sebroeck <wim@linux-watchdog.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marco Elver <elver@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Gabriele Paoloni <gpaoloni@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Tao Zhou <tao.zhou@linux.dev> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-30rv/include: Add deterministic automata monitor definition via C macrosDaniel Bristot de Oliveira3-0/+671
In Linux terms, the runtime verification monitors are encapsulated inside the "RV monitor" abstraction. The "RV monitor" includes a set of instances of the monitor (per-cpu monitor, per-task monitor, and so on), the helper functions that glue the monitor to the system reference model, and the trace output as a reaction for event parsing and exceptions, as depicted below: Linux +----- RV Monitor ----------------------------------+ Formal Realm | | Realm +-------------------+ +----------------+ +-----------------+ | Linux kernel | | Monitor | | Reference | | Tracing | -> | Instance(s) | <- | Model | | (instrumentation) | | (verification) | | (specification) | +-------------------+ +----------------+ +-----------------+ | | | | V | | +----------+ | | | Reaction | | | +--+--+--+-+ | | | | | | | | | +-> trace output ? | +------------------------|--|----------------------+ | +----> panic ? +-------> <user-specified> Add the rv/da_monitor.h, enabling automatic code generation for the *Monitor Instance(s)* using C macros, and code to support it. The benefits of the usage of macro for monitor synthesis are 3-fold as it: - Reduces the code duplication; - Facilitates the bug fix/improvement; - Avoids the case of developers changing the core of the monitor code to manipulate the model in a (let's say) non-standard way. This initial implementation presents three different types of monitor instances: - DECLARE_DA_MON_GLOBAL(name, type) - DECLARE_DA_MON_PER_CPU(name, type) - DECLARE_DA_MON_PER_TASK(name, type) The first declares the functions for a global deterministic automata monitor, the second for monitors with per-cpu instances, and the third with per-task instances. Link: https://lkml.kernel.org/r/51b0bf425a281e226dfeba7401d2115d6091f84e.1659052063.git.bristot@kernel.org Cc: Wim Van Sebroeck <wim@linux-watchdog.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marco Elver <elver@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Gabriele Paoloni <gpaoloni@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Tao Zhou <tao.zhou@linux.dev> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-30rv/include: Add helper functions for deterministic automataDaniel Bristot de Oliveira1-0/+75
Formally, a deterministic automaton, denoted by G, is defined as a quintuple: G = { X, E, f, x_0, X_m } where: - X is the set of states; - E is the finite set of events; - x_0 is the initial state; - X_m (subset of X) is the set of marked states. - f : X x E -> X $ is the transition function. It defines the state transition in the occurrence of a event from E in the state X. In the special case of deterministic automata, the occurrence of the event in E in a state in X has a deterministic next state from X. An automaton can also be represented using a graphical format of vertices (nodes) and edges. The open-source tool Graphviz can produce this graphic format using the (textual) DOT language as the source code. The dot2c tool presented in this paper: De Oliveira, Daniel Bristot; Cucinotta, Tommaso; De Oliveira, Romulo Silva. Efficient formal verification for the Linux kernel. In: International Conference on Software Engineering and Formal Methods. Springer, Cham, 2019. p. 315-332. Translates a deterministic automaton in the DOT format into a C source code representation that to be used for monitoring. This header file implements helper functions to facilitate the usage of the C output from dot2c/k for monitoring. Link: https://lkml.kernel.org/r/563234f2bfa84b540f60cf9e39c2d9f0eea95a55.1659052063.git.bristot@kernel.org Cc: Wim Van Sebroeck <wim@linux-watchdog.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marco Elver <elver@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Gabriele Paoloni <gpaoloni@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Tao Zhou <tao.zhou@linux.dev> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-30rv: Add runtime reactors interfaceDaniel Bristot de Oliveira1-0/+17
A runtime monitor can cause a reaction to the detection of an exception on the model's execution. By default, the monitors have tracing reactions, printing the monitor output via tracepoints. But other reactions can be added (on-demand) via this interface. The user interface resembles the kernel tracing interface and presents these files: "available_reactors" - Reading shows the available reactors, one per line. For example: # cat available_reactors nop panic printk "reacting_on" - It is an on/off general switch for reactors, disabling all reactions. "monitors/MONITOR/reactors" - List available reactors, with the select reaction for the given MONITOR inside []. The default one is the nop (no operation) reactor. - Writing the name of a reactor enables it to the given MONITOR. For example: # cat monitors/wip/reactors [nop] panic printk # echo panic > monitors/wip/reactors # cat monitors/wip/reactors nop [panic] printk Link: https://lkml.kernel.org/r/1794eb994637457bdeaa6bad0b8263d2f7eece0c.1659052063.git.bristot@kernel.org Cc: Wim Van Sebroeck <wim@linux-watchdog.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marco Elver <elver@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Gabriele Paoloni <gpaoloni@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Tao Zhou <tao.zhou@linux.dev> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-30rv: Add Runtime Verification (RV) interfaceDaniel Bristot de Oliveira2-0/+54
RV is a lightweight (yet rigorous) method that complements classical exhaustive verification techniques (such as model checking and theorem proving) with a more practical approach to complex systems. RV works by analyzing the trace of the system's actual execution, comparing it against a formal specification of the system behavior. RV can give precise information on the runtime behavior of the monitored system while enabling the reaction for unexpected events, avoiding, for example, the propagation of a failure on safety-critical systems. The development of this interface roots in the development of the paper: De Oliveira, Daniel Bristot; Cucinotta, Tommaso; De Oliveira, Romulo Silva. Efficient formal verification for the Linux kernel. In: International Conference on Software Engineering and Formal Methods. Springer, Cham, 2019. p. 315-332. And: De Oliveira, Daniel Bristot. Automata-based formal analysis and verification of the real-time Linux kernel. PhD Thesis, 2020. The RV interface resembles the tracing/ interface on purpose. The current path for the RV interface is /sys/kernel/tracing/rv/. It presents these files: "available_monitors" - List the available monitors, one per line. For example: # cat available_monitors wip wwnr "enabled_monitors" - Lists the enabled monitors, one per line; - Writing to it enables a given monitor; - Writing a monitor name with a '!' prefix disables it; - Truncating the file disables all enabled monitors. For example: # cat enabled_monitors # echo wip > enabled_monitors # echo wwnr >> enabled_monitors # cat enabled_monitors wip wwnr # echo '!wip' >> enabled_monitors # cat enabled_monitors wwnr # echo > enabled_monitors # cat enabled_monitors # Note that more than one monitor can be enabled concurrently. "monitoring_on" - It is an on/off general switcher for monitoring. Note that it does not disable enabled monitors or detach events, but stop the per-entity monitors of monitoring the events received from the system. It resembles the "tracing_on" switcher. "monitors/" Each monitor will have its one directory inside "monitors/". There the monitor specific files will be presented. The "monitors/" directory resembles the "events" directory on tracefs. For example: # cd monitors/wip/ # ls desc enable # cat desc wakeup in preemptive per-cpu testing monitor. # cat enable 0 For further information, see the comments in the header of kernel/trace/rv/rv.c from this patch. Link: https://lkml.kernel.org/r/a4bfe038f50cb047bfb343ad0e12b0e646ab308b.1659052063.git.bristot@kernel.org Cc: Wim Van Sebroeck <wim@linux-watchdog.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marco Elver <elver@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Gabriele Paoloni <gpaoloni@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Tao Zhou <tao.zhou@linux.dev> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-30tracing: Use a copy of the va_list for __assign_vstr()Steven Rostedt (Google)1-1/+6
If an instance of tracing enables the same trace event as another instance, or the top level instance, or even perf, then the va_list passed into some tracepoints can be used more than once. As va_list can only be traversed once, this can cause issues: # cat /sys/kernel/tracing/instances/qla2xxx/trace cat-56106 [012] ..... 2419873.470098: ql_dbg_log: qla2xxx [0000:05:00.0]-1054:14: Entered (null). cat-56106 [012] ..... 2419873.470101: ql_dbg_log: qla2xxx [0000:05:00.0]-1000:14: Entered ×+<96>²Ü<98>^H. cat-56106 [012] ..... 2419873.470102: ql_dbg_log: qla2xxx [0000:05:00.0]-1006:14: Prepare to issue mbox cmd=0xde589000. # cat /sys/kernel/tracing/trace cat-56106 [012] ..... 2419873.470097: ql_dbg_log: qla2xxx [0000:05:00.0]-1054:14: Entered qla2x00_get_firmware_state. cat-56106 [012] ..... 2419873.470100: ql_dbg_log: qla2xxx [0000:05:00.0]-1000:14: Entered qla2x00_mailbox_command. cat-56106 [012] ..... 2419873.470102: ql_dbg_log: qla2xxx [0000:05:00.0]-1006:14: Prepare to issue mbox cmd=0x69. The instance version is corrupted because the top level instance iterated the va_list first. Use va_copy() in the __assign_vstr() macro to make sure that each trace event for each use case gets a fresh va_list. Link: https://lore.kernel.org/all/259d53a5-958e-6508-4e45-74dba2821242@marvell.com/ Link: https://lkml.kernel.org/r/20220719182004.21daa83e@gandalf.local.home Fixes: 0563231f93c6d ("tracing/events: Add __vstring() and __assign_vstr() helper macros") Reported-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-19scsi: qla2xxx: tracing: Use the new __vstring() helperSteven Rostedt (Google)1-2/+2
Instead of open coding a __dynamic_array() with a fixed length (which defeats the purpose of the dynamic array in the first place). Use the new __vstring() helper that will use a va_list and only write enough of the string into the ring buffer that is needed. Link: https://lkml.kernel.org/r/20220705224750.896553364@goodmis.org Cc: Bart Van Assche <bvanassche@acm.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-19scsi: iscsi: tracing: Use the new __vstring() helperSteven Rostedt (Google)1-2/+2
Instead of open coding a __dynamic_array() with a fixed length (which defeats the purpose of the dynamic array in the first place). Use the new __vstring() helper that will use a va_list and only write enough of the string into the ring buffer that is needed. Link: https://lkml.kernel.org/r/20220705224750.715763972@goodmis.org Cc: Fred Herard <fred.herard@oracle.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-15tracing/events: Add __vstring() and __assign_vstr() helper macrosSteven Rostedt (Google)6-0/+38
There's several places that open code the following logic: TP_STRUCT__entry(__dynamic_array(char, msg, MSG_MAX)), TP_fast_assign(vsnprintf(__get_str(msg), MSG_MAX, vaf->fmt, *vaf->va);) To load a string created by variable array va_list. The main issue with this approach is that "MSG_MAX" usage in the __dynamic_array() portion. That actually just reserves the MSG_MAX in the event, and even wastes space because there's dynamic meta data also saved in the event to denote the offset and size of the dynamic array. It would have been better to just use a static __array() field. Instead, create __vstring() and __assign_vstr() that work like __string and __assign_str() but instead of taking a destination string to copy, take a format string and a va_list pointer and fill in the values. It uses the helper: #define __trace_event_vstr_len(fmt, va) \ ({ \ va_list __ap; \ int __ret; \ \ va_copy(__ap, *(va)); \ __ret = vsnprintf(NULL, 0, fmt, __ap) + 1; \ va_end(__ap); \ \ min(__ret, TRACE_EVENT_STR_MAX); \ }) To figure out the length to store the string. It may be slightly slower as it needs to run the vsnprintf() twice, but it now saves space on the ring buffer. Link: https://lkml.kernel.org/r/20220705224749.053570613@goodmis.org Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Kalle Valo <kvalo@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Arend van Spriel <aspriel@gmail.com> Cc: Franky Lin <franky.lin@broadcom.com> Cc: Hante Meuleman <hante.meuleman@broadcom.com> Cc: Gregory Greenman <gregory.greenman@intel.com> Cc: Peter Chen <peter.chen@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mathias Nyman <mathias.nyman@intel.com> Cc: Chunfeng Yun <chunfeng.yun@mediatek.com> Cc: Bin Liu <b-liu@ti.com> Cc: Marek Lindner <mareklindner@neomailbox.ch> Cc: Simon Wunderlich <sw@simonwunderlich.de> Cc: Antonio Quartulli <a@unstable.cc> Cc: Sven Eckelmann <sven@narfation.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-15neighbor: tracing: Have neigh_create event use __string()Steven Rostedt (Google)1-1/+1
The dev field of the neigh_create event uses __dynamic_array() with a fixed size, which defeats the purpose of __dynamic_array(). Looking at the logic, as it already uses __assign_str(), just use the same logic in __string to create the size needed. It appears that because "dev" can be NULL, it needs the check. But __string() can have the same checks as __assign_str() so use them there too. Link: https://lkml.kernel.org/r/20220705183741.35387e3f@rorschach.local.home Cc: David Ahern <dsahern@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-15tracing/ipv4/ipv6: Use static array for name field in fib*_lookup_table eventSteven Rostedt (Google)2-7/+7
The fib_lookup_table and fib6_lookup_table events declare name as a dynamic_array, but also give it a fixed size, which defeats the purpose of the dynamic array, especially since the dynamic array also includes meta data in the event to specify its size. Since the size of the name is at most 16 bytes (defined by IFNAMSIZ), it is not worth spending the effort to determine the size of the string. Just use a fixed size array and copy into it. This will save 4 bytes that are used for the meta data that saves the size and position of a dynamic array, and even slightly speed up the event processing. Link: https://lkml.kernel.org/r/20220704091436.3705edbf@rorschach.local.home Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-14tracing: devlink: Use static array for string in devlink_trap_report eventSteven Rostedt (Google)1-4/+3
The trace event devlink_trap_report uses the __dynamic_array() macro to determine the size of the input_dev_name field. This is because it needs to test the dev field for NULL, and will use "NULL" if it is. But it also has the size of the dynamic array as a fixed IFNAMSIZ bytes. This defeats the purpose of the dynamic array, as this will reserve that amount of bytes on the ring buffer, and to make matters worse, it will even save that size in the event as the event expects it to be dynamic (for which it is not). Since IFNAMSIZ is just 16 bytes, just make it a static array and this will remove the meta data from the event that records the size. Link: https://lkml.kernel.org/r/20220712185820.002d9fb5@gandalf.local.home Cc: Leon Romanovsky <leon@kernel.org> Cc: Jiri Pirko <jiri@nvidia.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: netdev@vger.kernel.org Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-12blk-iocost: tracing: atomic64_read(&ioc->vtime_rate) is assigned an extra ↵Li kunyu1-1/+1
semicolon Remove extra semicolon. Link: https://lkml.kernel.org/r/20220629030013.10362-1-kunyu@nfschina.com Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Li kunyu <kunyu@nfschina.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-03lockref: remove unused 'lockref_get_or_lock()' functionLinus Torvalds1-1/+0
Looking at the conditional lock acquire functions in the kernel due to the new sparse support (see commit 4a557a5d1a61 "sparse: introduce conditional lock acquire function attribute"), it became obvious that the lockref code has a couple of them, but they don't match the usual naming convention for the other ones, and their return value logic is also reversed. In the other very similar places, the naming pattern is '*_and_lock()' (eg 'atomic_put_and_lock()' and 'refcount_dec_and_lock()'), and the function returns true when the lock is taken. The lockref code is superficially very similar to the refcount code, only with the special "atomic wrt the embedded lock" semantics. But instead of the '*_and_lock()' naming it uses '*_or_lock()'. And instead of returning true in case it took the lock, it returns true if it *didn't* take the lock. Now, arguably the reflock code is quite logical: it really is a "either decrement _or_ lock" kind of situation - and the return value is about whether the operation succeeded without any special care needed. So despite the similarities, the differences do make some sense, and maybe it's not worth trying to unify the different conditional locking primitives in this area. But while looking at this all, it did become obvious that the 'lockref_get_or_lock()' function hasn't actually had any users for almost a decade. The only user it ever had was the shortlived 'd_rcu_to_refcount()' function, and it got removed and replaced with 'lockref_get_not_dead()' back in 2013 in commits 0d98439ea3c6 ("vfs: use lockred 'dead' flag to mark unrecoverably dead dentries") and e5c832d55588 ("vfs: fix dentry RCU to refcounting possibly sleeping dput()") In fact, that single use was removed less than a week after the whole function was introduced in commit b3abd80250c1 ("lockref: add 'lockref_get_or_lock() helper") so this function has been around for a decade, but only had a user for six days. Let's just put this mis-designed and unused function out of its misery. We can think about the naming and semantic oddities of the remaining 'lockref_put_or_lock()' later, but at least that function has users. And while the naming is different and the return value doesn't match, that function matches the whole '{atomic,refcount}_dec_and_test()' pattern much better (ie the magic happens when the count goes down to zero, not when it is incremented from zero). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-03sparse: introduce conditional lock acquire function attributeLinus Torvalds2-3/+5
The kernel tends to try to avoid conditional locking semantics because it makes it harder to think about and statically check locking rules, but we do have a few fundamental locking primitives that take locks conditionally - most obviously the 'trylock' functions. That has always been a problem for 'sparse' checking for locking imbalance, and we've had a special '__cond_lock()' macro that we've used to let sparse know how the locking works: # define __cond_lock(x,c) ((c) ? ({ __acquire(x); 1; }) : 0) so that you can then use this to tell sparse that (for example) the spinlock trylock macro ends up acquiring the lock when it succeeds, but not when it fails: #define raw_spin_trylock(lock) __cond_lock(lock, _raw_spin_trylock(lock)) and then sparse can follow along the locking rules when you have code like if (!spin_trylock(&dentry->d_lock)) return LRU_SKIP; .. sparse sees that the lock is held here.. spin_unlock(&dentry->d_lock); and sparse ends up happy about the lock contexts. However, this '__cond_lock()' use does result in very ugly header files, and requires you to basically wrap the real function with that macro that uses '__cond_lock'. Which has made PeterZ NAK things that try to fix sparse warnings over the years [1]. To solve this, there is now a very experimental patch to sparse that basically does the exact same thing as '__cond_lock()' did, but using a function attribute instead. That seems to make PeterZ happy [2]. Note that this does not replace existing use of '__cond_lock()', but only exposes the new proposed attribute and uses it for the previously unannotated 'refcount_dec_and_lock()' family of functions. For existing sparse installations, this will make no difference (a negative output context was ignored), but if you have the experimental sparse patch it will make sparse now understand code that uses those functions, the same way '__cond_lock()' makes sparse understand the very similar 'atomic_dec_and_lock()' uses that have the old '__cond_lock()' annotations. Note that in some cases this will silence existing context imbalance warnings. But in other cases it may end up exposing new sparse warnings for code that sparse just didn't see the locking for at all before. This is a trial, in other words. I'd expect that if it ends up being successful, and new sparse releases end up having this new attribute, we'll migrate the old-style '__cond_lock()' users to use the new-style '__cond_acquires' function attribute. The actual experimental sparse patch was posted in [3]. Link: https://lore.kernel.org/all/20130930134434.GC12926@twins.programming.kicks-ass.net/ [1] Link: https://lore.kernel.org/all/Yr60tWxN4P568x3W@worktop.programming.kicks-ass.net/ [2] Link: https://lore.kernel.org/all/CAHk-=wjZfO9hGqJ2_hGQG3U_XzSh9_XaXze=HgPdvJbgrvASfA@mail.gmail.com/ [3] Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Aring <aahringo@redhat.com> Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-01Merge tag 'pm-5.19-rc5' of ↵Linus Torvalds1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix some issues in cpufreq drivers and some issues in devfreq: - Fix error code path issues related PROBE_DEFER handling in devfreq (Christian Marangi) - Revert an editing accident in SPDX-License line in the devfreq passive governor (Lukas Bulwahn) - Fix refcount leak in of_get_devfreq_events() in the exynos-ppmu devfreq driver (Miaoqian Lin) - Use HZ_PER_KHZ macro in the passive devfreq governor (Yicong Yang) - Fix missing of_node_put for qoriq and pmac32 driver (Liang He) - Fix issues around throttle interrupt for qcom driver (Stephen Boyd) - Add MT8186 to cpufreq-dt-platdev blocklist (AngeloGioacchino Del Regno) - Make amd-pstate enable CPPC on resume from S3 (Jinzhou Su)" * tag 'pm-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / devfreq: passive: revert an editing accident in SPDX-License line PM / devfreq: Fix kernel warning with cpufreq passive register fail PM / devfreq: Rework freq_table to be local to devfreq struct PM / devfreq: exynos-ppmu: Fix refcount leak in of_get_devfreq_events PM / devfreq: passive: Use HZ_PER_KHZ macro in units.h PM / devfreq: Fix cpufreq passive unregister erroring on PROBE_DEFER PM / devfreq: Mute warning on governor PROBE_DEFER PM / devfreq: Fix kernel panic with cpu based scaling to passive gov cpufreq: Add MT8186 to cpufreq-dt-platdev blocklist cpufreq: pmac32-cpufreq: Fix refcount leak bug cpufreq: qcom-hw: Don't do lmh things without a throttle interrupt drivers: cpufreq: Add missing of_node_put() in qoriq-cpufreq.c cpufreq: amd-pstate: Add resume and suspend callbacks
2022-07-01Merge tag 'io_uring-5.19-2022-07-01' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+1
Pull io_uring fixes from Jens Axboe: "Two minor tweaks: - While we still can, adjust the send/recv based flags to be in ->ioprio rather than in ->addr2. This is consistent with eg accept, and also doesn't waste a full 64-bit field for flags (Pavel) - 5.18-stable fix for re-importing provided buffers. Not much real world relevance here as it'll only impact non-pollable files gone async, which is more of a practical test case rather than something that is used in the wild (Dylan)" * tag 'io_uring-5.19-2022-07-01' of git://git.kernel.dk/linux-block: io_uring: fix provided buffer import io_uring: keep sendrecv flags in ioprio
2022-06-30Merge tag 'drm-fixes-2022-07-01' of git://anongit.freedesktop.org/drm/drmLinus Torvalds2-7/+19
Pull drm fixes from Dave Airlie: "Bit quieter this week, the main thing is it pulls in the fixes for the sysfb resource issue you were seeing. these had been queued for next so should have had some decent testing. Otherwise amdgpu, i915 and msm each have a few fixes, and vc4 has one. fbdev: - sysfb fixes/conflicting fb fixes amdgpu: - GPU recovery fix - Fix integer type usage in fourcc header for AMD modifiers - KFD TLB flush fix for gfx9 APUs - Display fix i915: - Fix ioctl argument error return - Fix d3cold disable to allow PCI upstream bridge D3 transition - Fix setting cache_dirty for dma-buf objects on discrete msm: - Fix to increment vsync_cnt before calling drm_crtc_handle_vblank so that userspace sees the value *after* it is incremented if waiting for vblank events - Fix to reset drm_dev to NULL in dp_display_unbind to avoid a crash in probe/bind error paths - Fix to resolve the smatch error of de-referencing before NULL check in dpu_encoder_phys_wb.c - Fix error return to userspace if fence-id allocation fails in submit ioctl vc4: - NULL ptr dereference fix" * tag 'drm-fixes-2022-07-01' of git://anongit.freedesktop.org/drm/drm: Revert "drm/amdgpu/display: set vblank_disable_immediate for DC" drm/amdgpu: To flush tlb for MMHUB of RAVEN series drm/fourcc: fix integer type usage in uapi header drm/amdgpu: fix adev variable used in amdgpu_device_gpu_recover() fbdev: Disable sysfb device registration when removing conflicting FBs firmware: sysfb: Add sysfb_disable() helper function firmware: sysfb: Make sysfb_create_simplefb() return a pdev pointer drm/msm/gem: Fix error return on fence id alloc fail drm/i915: tweak the ordering in cpu_write_needs_clflush drm/i915/dgfx: Disable d3cold at gfx root port drm/i915/gem: add missing else drm/vc4: perfmon: Fix variable dereferenced before check drm/msm/dpu: Fix variable dereferenced before check drm/msm/dp: reset drm_dev to NULL at dp_display_unbind() drm/msm/dpu: Increment vsync_cnt before waking up userspace
2022-07-01Merge tag 'drm-misc-fixes-2022-06-30' of ↵Dave Airlie1-5/+17
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes A NULL pointer dereference fix for vc4, and 3 patches to improve the sysfb device behaviour when removing conflicting framebuffers Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20220630072404.2fa4z3nk5h5q34ci@houat
2022-06-30Merge tag 'net-5.19-rc5' of ↵Linus Torvalds4-11/+22
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter. Current release - new code bugs: - clear msg_get_inq in __sys_recvfrom() and __copy_msghdr_from_user() - mptcp: - invoke MP_FAIL response only when needed - fix shutdown vs fallback race - consistent map handling on failure - octeon_ep: use bitwise AND Previous releases - regressions: - tipc: move bc link creation back to tipc_node_create, fix NPD Previous releases - always broken: - tcp: add a missing nf_reset_ct() in 3WHS handling to prevent socket buffered skbs from keeping refcount on the conntrack module - ipv6: take care of disable_policy when restoring routes - tun: make sure to always disable and unlink NAPI instances - phy: don't trigger state machine while in suspend - netfilter: nf_tables: avoid skb access on nf_stolen - asix: fix "can't send until first packet is send" issue - usb: asix: do not force pause frames support - nxp-nci: don't issue a zero length i2c_master_read() Misc: - ncsi: allow use of proper "mellanox" DT vendor prefix - act_api: add a message for user space if any actions were already flushed before the error was hit" * tag 'net-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (55 commits) net: dsa: felix: fix race between reading PSFP stats and port stats selftest: tun: add test for NAPI dismantle net: tun: avoid disabling NAPI twice net: sparx5: mdb add/del handle non-sparx5 devices net: sfp: fix memory leak in sfp_probe() mlxsw: spectrum_router: Fix rollback in tunnel next hop init net: rose: fix UAF bugs caused by timer handler net: usb: ax88179_178a: Fix packet receiving net: bonding: fix use-after-free after 802.3ad slave unbind ipv6: fix lockdep splat in in6_dump_addrs() net: phy: ax88772a: fix lost pause advertisement configuration net: phy: Don't trigger state machine while in suspend usbnet: fix memory allocation in helpers selftests net: fix kselftest net fatal error NFC: nxp-nci: don't print header length mismatch on i2c error NFC: nxp-nci: Don't issue a zero length i2c_master_read() net: tipc: fix possible refcount leak in tipc_sk_create() nfc: nfcmrvl: Fix irq_of_parse_and_map() return value net: ipv6: unexport __init-annotated seg6_hmac_net_init() ipv6/sit: fix ipip6_tunnel_get_prl return value ...
2022-06-30Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-1/+1
Pull rdma fixes from Jason Gunthorpe: "Three minor bug fixes: - qedr not setting the QP timeout properly toward userspace - Memory leak on error path in ib_cm - Divide by 0 in RDMA interrupt moderation" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: linux/dim: Fix divide by 0 in RDMA DIM RDMA/cm: Fix memory leak in ib_cm_insert_listen RDMA/qedr: Fix reporting QP timeout attribute
2022-06-30Merge tag 'fsnotify_for_v5.19-rc5' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fanotify fix from Jan Kara: "A fix for recently added fanotify API to have stricter checks and refuse some invalid flag combinations to make our life easier in the future" * tag 'fsnotify_for_v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: refine the validation checks on non-dir inode mask
2022-06-30io_uring: keep sendrecv flags in ioprioPavel Begunkov1-1/+1
We waste a u64 SQE field for flags even though we don't need as many bits and it can be used for something more useful later. Store io_uring specific send/recv flags in sqe->ioprio instead of ->addr2. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Fixes: 0455d4ccec54 ("io_uring: add POLL_FIRST support for send/sendmsg and recv/recvmsg") [axboe: change comment in io_uring.h as well] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-29net: phy: Don't trigger state machine while in suspendLukas Wunner1-0/+6
Upon system sleep, mdio_bus_phy_suspend() stops the phy_state_machine(), but subsequent interrupts may retrigger it: They may have been left enabled to facilitate wakeup and are not quiesced until the ->suspend_noirq() phase. Unwanted interrupts may hence occur between mdio_bus_phy_suspend() and dpm_suspend_noirq(), as well as between dpm_resume_noirq() and mdio_bus_phy_resume(). Retriggering the phy_state_machine() through an interrupt is not only undesirable for the reason given in mdio_bus_phy_suspend() (freezing it midway with phydev->lock held), but also because the PHY may be inaccessible after it's suspended: Accesses to USB-attached PHYs are blocked once usb_suspend_both() clears the can_submit flag and PHYs on PCI network cards may become inaccessible upon suspend as well. Amend phy_interrupt() to avoid triggering the state machine if the PHY is suspended. Signal wakeup instead if the attached net_device or its parent has been configured as a wakeup source. (Those conditions are identical to mdio_bus_phy_may_suspend().) Postpone handling of the interrupt until the PHY has resumed. Before stopping the phy_state_machine() in mdio_bus_phy_suspend(), wait for a concurrent phy_interrupt() to run to completion. That is necessary because phy_interrupt() may have checked the PHY's suspend status before the system sleep transition commenced and it may thus retrigger the state machine after it was stopped. Likewise, after re-enabling interrupt handling in mdio_bus_phy_resume(), wait for a concurrent phy_interrupt() to complete to ensure that interrupts which it postponed are properly rerun. The issue was exposed by commit 1ce8b37241ed ("usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling"), but has existed since forever. Fixes: 541cd3ee00a4 ("phylib: Fix deadlock on resume") Link: https://lore.kernel.org/netdev/a5315a8a-32c2-962f-f696-de9a26d30091@samsung.com/ Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: stable@vger.kernel.org # v2.6.33+ Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/b7f386d04e9b5b0e2738f0125743e30676f309ef.1656410895.git.lukas@wunner.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski1-6/+10
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Restore set counter when one of the CPU loses race to add elements to sets. 2) After NF_STOLEN, skb might be there no more, update nftables trace infra to avoid access to skb in this case. From Florian Westphal. 3) nftables bridge might register a prerouting hook with zero priority, br_netfilter incorrectly skips it. Also from Florian. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: br_netfilter: do not skip all hooks with 0 priority netfilter: nf_tables: avoid skb access on nf_stolen netfilter: nft_dynset: restore set element counter when failing to update ==================== Link: https://lore.kernel.org/r/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-30PM / devfreq: Rework freq_table to be local to devfreq structChristian Marangi1-0/+5
On a devfreq PROBE_DEFER, the freq_table in the driver profile struct, is never reset and may be leaved in an undefined state. This comes from the fact that we store the freq_table in the driver profile struct that is commonly defined as static and not reset on PROBE_DEFER. We currently skip the reinit of the freq_table if we found it's already defined since a driver may declare his own freq_table. This logic is flawed in the case devfreq core generate a freq_table, set it in the profile struct and then PROBE_DEFER, freeing the freq_table. In this case devfreq will found a NOT NULL freq_table that has been freed, skip the freq_table generation and probe the driver based on the wrong table. To fix this and correctly handle PROBE_DEFER, use a local freq_table and max_state in the devfreq struct and never modify the freq_table present in the profile struct if it does provide it. Fixes: 0ec09ac2cebe ("PM / devfreq: Set the freq_table of devfreq device") Cc: stable@vger.kernel.org Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2022-06-29drm/fourcc: fix integer type usage in uapi headerCarlos Llamas1-2/+2
Kernel uapi headers are supposed to use __[us]{8,16,32,64} types defined by <linux/types.h> as opposed to 'uint32_t' and similar. See [1] for the relevant discussion about this topic. In this particular case, the usage of 'uint64_t' escaped headers_check as these macros are not being called here. However, the following program triggers a compilation error: #include <drm/drm_fourcc.h> int main() { unsigned long x = AMD_FMT_MOD_CLEAR(RB); return 0; } gcc error: drm.c:5:27: error: ‘uint64_t’ undeclared (first use in this function) 5 | unsigned long x = AMD_FMT_MOD_CLEAR(RB); | ^~~~~~~~~~~~~~~~~ This patch changes AMD_FMT_MOD_{SET,CLEAR} macros to use the correct integer types, which fixes the above issue. [1] https://lkml.org/lkml/2019/6/5/18 Fixes: 8ba16d599374 ("drm/fourcc: Add AMD DRM modifiers.") Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Simon Ser <contact@emersion.fr> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-29firmware: sysfb: Add sysfb_disable() helper functionJavier Martinez Canillas1-0/+12
This can be used by subsystems to unregister a platform device registered by sysfb and also to disable future platform device registration in sysfb. Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20220607182338.344270-3-javierm@redhat.com
2022-06-29firmware: sysfb: Make sysfb_create_simplefb() return a pdev pointerJavier Martinez Canillas1-5/+5
This function just returned 0 on success or an errno code on error, but it could be useful for sysfb_init() callers to have a pointer to the device. Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20220607182338.344270-2-javierm@redhat.com
2022-06-28mptcp: fix conflict with <netinet/in.h>Ossama Othman1-4/+5
Including <linux/mptcp.h> before the C library <netinet/in.h> header causes symbol redefinition errors at compile-time due to duplicate declarations and definitions in the <linux/in.h> header included by <linux/mptcp.h>. Explicitly include <netinet/in.h> before <linux/in.h> in <linux/mptcp.h> when __KERNEL__ is not defined so that the C library compatibility logic in <linux/libc-compat.h> is enabled when including <linux/mptcp.h> in user space code. Fixes: c11c5906bc0a ("mptcp: add MPTCP_SUBFLOW_ADDRS getsockopt support") Signed-off-by: Ossama Othman <ossama.othman@intel.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-28linux/dim: Fix divide by 0 in RDMA DIMTao Liu1-1/+1
Fix a divide 0 error in rdma_dim_stats_compare() when prev->cpe_ratio == 0. CallTrace: Hardware name: H3C R4900 G3/RS33M2C9S, BIOS 2.00.37P21 03/12/2020 task: ffff880194b78000 task.stack: ffffc90006714000 RIP: 0010:backport_rdma_dim+0x10e/0x240 [mlx_compat] RSP: 0018:ffff880c10e83ec0 EFLAGS: 00010202 RAX: 0000000000002710 RBX: ffff88096cd7f780 RCX: 0000000000000064 RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000001 RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 000000001d7c6c09 R13: ffff88096cd7f780 R14: ffff880b174fe800 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff880c10e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000a0965b00 CR3: 000000000200a003 CR4: 00000000007606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> ib_poll_handler+0x43/0x80 [ib_core] irq_poll_softirq+0xae/0x110 __do_softirq+0xd1/0x28c irq_exit+0xde/0xf0 do_IRQ+0x54/0xe0 common_interrupt+0x8f/0x8f </IRQ> ? cpuidle_enter_state+0xd9/0x2a0 ? cpuidle_enter_state+0xc7/0x2a0 ? do_idle+0x170/0x1d0 ? cpu_startup_entry+0x6f/0x80 ? start_secondary+0x1b9/0x210 ? secondary_startup_64+0xa5/0xb0 Code: 0f 87 e1 00 00 00 8b 4c 24 14 44 8b 43 14 89 c8 4d 63 c8 44 29 c0 99 31 d0 29 d0 31 d2 48 98 48 8d 04 80 48 8d 04 80 48 c1 e0 02 <49> f7 f1 48 83 f8 0a 0f 86 c1 00 00 00 44 39 c1 7f 10 48 89 df RIP: backport_rdma_dim+0x10e/0x240 [mlx_compat] RSP: ffff880c10e83ec0 Fixes: f4915455dcf0 ("linux/dim: Implement RDMA adaptive moderation (DIM)") Link: https://lore.kernel.org/r/20220627140004.3099-1-thomas.liu@ucloud.cn Signed-off-by: Tao Liu <thomas.liu@ucloud.cn> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-06-28fanotify: refine the validation checks on non-dir inode maskAmir Goldstein1-0/+4
Commit ceaf69f8eadc ("fanotify: do not allow setting dirent events in mask of non-dir") added restrictions about setting dirent events in the mask of a non-dir inode mark, which does not make any sense. For backward compatibility, these restictions were added only to new (v5.17+) APIs. It also does not make any sense to set the flags FAN_EVENT_ON_CHILD or FAN_ONDIR in the mask of a non-dir inode. Add these flags to the dir-only restriction of the new APIs as well. Move the check of the dir-only flags for new APIs into the helper fanotify_events_supported(), which is only called for FAN_MARK_ADD, because there is no need to error on an attempt to remove the dir-only flags from non-dir inode. Fixes: ceaf69f8eadc ("fanotify: do not allow setting dirent events in mask of non-dir") Link: https://lore.kernel.org/linux-fsdevel/20220627113224.kr2725conevh53u4@quack3.lan/ Link: https://lore.kernel.org/r/20220627174719.2838175-1-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2022-06-27Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-0/+2
Pull virtio fixes from Michael Tsirkin: "Fixes all over the place, most notably we are disabling IRQ hardening (again!)" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_ring: make vring_create_virtqueue_split prettier vhost-vdpa: call vhost_vdpa_cleanup during the release virtio_mmio: Restore guest page size on resume virtio_mmio: Add missing PM calls to freeze/restore caif_virtio: fix race between virtio_device_ready() and ndo_open() virtio-net: fix race between ndo_open() and virtio_device_ready() virtio: disable notification hardening by default virtio: Remove unnecessary variable assignments virtio_ring : keep used_wrap_counter in vq->last_used_idx vduse: Tie vduse mgmtdev and its device vdpa/mlx5: Initialize CVQ vringh only once vdpa/mlx5: Update Control VQ callback information
2022-06-27netfilter: nf_tables: avoid skb access on nf_stolenFlorian Westphal1-6/+10
When verdict is NF_STOLEN, the skb might have been freed. When tracing is enabled, this can result in a use-after-free: 1. access to skb->nf_trace 2. access to skb->mark 3. computation of trace id 4. dump of packet payload To avoid 1, keep a cached copy of skb->nf_trace in the trace state struct. Refresh this copy whenever verdict is != STOLEN. Avoid 2 by skipping skb->mark access if verdict is STOLEN. 3 is avoided by precomputing the trace id. Only dump the packet when verdict is not "STOLEN". Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-06-26Merge tag 'soc-fixes-5.19' of ↵Linus Torvalds1-4/+5
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "A number of fixes have accumulated, but they are largely for harmless issues: - Several OF node leak fixes - A fix to the Exynos7885 UART clock description - DTS fixes to prevent boot failures on TI AM64 and J721s2 - Bus probe error handling fixes for Baikal-T1 - A fixup to the way STM32 SoCs use separate dts files for different firmware stacks - Multiple code fixes for Arm SCMI firmware, all dealing with robustness of the implementation - Multiple NXP i.MX devicetree fixes, addressing incorrect data in DT nodes - Three updates to the MAINTAINERS file, including Florian Fainelli taking over BCM283x/BCM2711 (Raspberry Pi) from Nicolas Saenz Julienne" * tag 'soc-fixes-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (29 commits) ARM: dts: aspeed: nuvia: rename vendor nuvia to qcom arm: mach-spear: Add missing of_node_put() in time.c ARM: cns3xxx: Fix refcount leak in cns3xxx_init MAINTAINERS: Update email address arm64: dts: ti: k3-am64-main: Remove support for HS400 speed mode arm64: dts: ti: k3-j721s2: Fix overlapping GICD memory region ARM: dts: bcm2711-rpi-400: Fix GPIO line names bus: bt1-axi: Don't print error on -EPROBE_DEFER bus: bt1-apb: Don't print error on -EPROBE_DEFER ARM: Fix refcount leak in axxia_boot_secondary ARM: dts: stm32: move SCMI related nodes in a dedicated file for stm32mp15 soc: imx: imx8m-blk-ctrl: fix display clock for LCDIF2 power domain ARM: dts: imx6qdl-colibri: Fix capacitive touch reset polarity ARM: dts: imx6qdl: correct PU regulator ramp delay firmware: arm_scmi: Fix incorrect error propagation in scmi_voltage_descriptors_get firmware: arm_scmi: Avoid using extended string-buffers sizes if not necessary firmware: arm_scmi: Fix SENSOR_AXIS_NAME_GET behaviour when unsupported ARM: dts: imx7: Move hsic_phy power domain to HSIC PHY node soc: bcm: brcmstb: pm: pm-arm: Fix refcount leak in brcmstb_pm_probe MAINTAINERS: Update BCM2711/BCM2835 maintainer ...
2022-06-26Merge tag 'mm-hotfixes-stable-2022-06-26' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "Minor things, mainly - mailmap updates, MAINTAINERS updates, etc. Fixes for this merge window: - fix for a damon boot hang, from SeongJae - fix for a kfence warning splat, from Jason Donenfeld - fix for zero-pfn pinning, from Alex Williamson - fix for fallocate hole punch clearing, from Mike Kravetz Fixes for previous releases: - fix for a performance regression, from Marcelo - fix for a hwpoisining BUG from zhenwei pi" * tag 'mm-hotfixes-stable-2022-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mailmap: add entry for Christian Marangi mm/memory-failure: disable unpoison once hw error happens hugetlbfs: zero partial pages during fallocate hole punch mm: memcontrol: reference to tools/cgroup/memcg_slabinfo.py mm: re-allow pinning of zero pfns mm/kfence: select random number before taking raw lock MAINTAINERS: add maillist information for LoongArch MAINTAINERS: update MM tree references MAINTAINERS: update Abel Vesa's email MAINTAINERS: add MEMORY HOT(UN)PLUG section and add David as reviewer MAINTAINERS: add Miaohe Lin as a memory-failure reviewer mailmap: add alias for jarkko@profian.com mm/damon/reclaim: schedule 'damon_reclaim_timer' only after 'system_wq' is initialized kthread: make it clear that kthread_create_on_node() might be terminated by any fatal signal mm: lru_cache_disable: use synchronize_rcu_expedited mm/page_isolation.c: fix one kernel-doc comment
2022-06-24Merge tag 'gpio-fixes-for-v5.19-rc4' of ↵Linus Torvalds1-13/+16
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - make the irqchip immutable in gpio-realtek-otto - fix error code propagation in gpio-winbond - fix device removing in gpio-grgpio - fix a typo in gpio-mxs which indicates the driver is for a different model - documentation fixes - MAINTAINERS file updates * tag 'gpio-fixes-for-v5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio: mxs: Fix header comment gpio: Fix kernel-doc comments to nested union gpio: grgpio: Fix device removing gpio: winbond: Fix error code in winbond_gpio_get() gpio: realtek-otto: Make the irqchip immutable docs: driver-api: gpio: Fix filename mismatch MAINTAINERS: add include/dt-bindings/gpio to GPIO SUBSYSTEM
2022-06-24net: fix IFF_TX_SKB_NO_LINEAR definitionDan Carpenter1-1/+1
The "1<<31" shift has a sign extension bug so IFF_TX_SKB_NO_LINEAR is 0xffffffff80000000 instead of 0x0000000080000000. Fixes: c2ff53d8049f ("net: Add priv_flags for allow tx skb without linear") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/YrRrcGttfEVnf85Q@kili Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-24Merge tag 'ata-5.19-rc4' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata Pull ATA fix from Damien Le Moal: - a single patch to fix tracing of command completion (Edward) * tag 'ata-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: ata: libata: add qc->flags in ata_qc_complete_template tracepoint
2022-06-24Merge tag 'block-5.19-2022-06-24' of git://git.kernel.dk/linux-blockLinus Torvalds2-7/+6
Pull block fixes from Jens Axboe: - Series fixing issues with sysfs locking and name reuse (Christoph) - NVMe pull request via Christoph: - Fix the mixed up CRIMS/CRWMS constants (Joel Granados) - Add another broken identifier quirk (Leo Savernik) - Fix up a quirk because Samsung reuses PCI IDs over different products (Christoph Hellwig) - Remove old WARN_ON() that doesn't apply anymore (Li) - Fix for using a stale cached request value for rq-qos throttling mechanisms that may schedule(), like iocost (me) - Remove unused parameter to blk_independent_access_range() (Damien) * tag 'block-5.19-2022-06-24' of git://git.kernel.dk/linux-block: block: remove WARN_ON() from bd_link_disk_holder nvme: move the Samsung X5 quirk entry to the core quirks nvme: fix the CRIMS and CRWMS definitions to match the spec nvme: add a bogus subsystem NQN quirk for Micron MTFDKBA2T0TFH block: pop cached rq before potentially blocking rq_qos_throttle() block: remove queue from struct blk_independent_access_range block: freeze the queue earlier in del_gendisk block: remove per-disk debugfs files in blk_unregister_queue block: serialize all debugfs operations using q->debugfs_mutex block: disable the elevator int del_gendisk
2022-06-24Merge tag 'io_uring-5.19-2022-06-24' of git://git.kernel.dk/linux-blockLinus Torvalds1-7/+35
Pull io_uring fixes from Jens Axboe: "A few fixes that should go into the 5.19 release. All are fixing issues that either happened in this release, or going to stable. In detail: - A small series of fixlets for the poll handling, all destined for stable (Pavel) - Fix a merge error from myself that caused a potential -EINVAL for the recv/recvmsg flag setting (me) - Fix a kbuf recycling issue for partial IO (me) - Use the original request for the inflight tracking (me) - Fix an issue introduced this merge window with trace points using a custom decoder function, which won't work for perf (Dylan)" * tag 'io_uring-5.19-2022-06-24' of git://git.kernel.dk/linux-block: io_uring: use original request task for inflight tracking io_uring: move io_uring_get_opcode out of TP_printk io_uring: fix double poll leak on repolling io_uring: fix wrong arm_poll error handling io_uring: fail links when poll fails io_uring: fix req->apoll_events io_uring: fix merge error in checking send/recv addr2 flags io_uring: mark reissue requests with REQ_F_PARTIAL_IO
2022-06-24Merge tag 'printk-for-5.19-rc4' of ↵Linus Torvalds2-33/+0
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk kernel thread revert from Petr Mladek: "Revert printk console kthreads. The testing of 5.19 release candidates revealed issues that did not happen when all consoles were serialized using the console semaphore. More time is needed to check expectations of the existing console drivers and be confident that they can be safely used in parallel" * tag 'printk-for-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: Revert "printk: add functions to prefer direct printing" Revert "printk: add kthread console printers" Revert "printk: extend console_lock for per-console locking" Revert "printk: remove @console_locked" Revert "printk: Block console kthreads when direct printing will be required" Revert "printk: Wait for the global console lock when the system is going down"
2022-06-24virtio: disable notification hardening by defaultJason Wang1-0/+2
We try to harden virtio device notifications in 8b4ec69d7e09 ("virtio: harden vring IRQ"). It works with the assumption that the driver or core can properly call virtio_device_ready() at the right place. Unfortunately, this seems to be not true and uncover various bugs of the existing drivers, mainly the issue of using virtio_device_ready() incorrectly. So let's add a Kconfig option and disable it by default. It gives us time to fix the drivers and then we can consider re-enabling it. Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220622012940.21441-1-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
2022-06-23gpio: Fix kernel-doc comments to nested unionAkira Yokosawa1-13/+16
Commit 48ec13d36d3f ("gpio: Properly document parent data union") is supposed to have fixed a warning from "make htmldocs" regarding kernel-doc comments to union members. However, the same warning still remains [1]. Fix the issue by following the example found in section "Nested structs/unions" of Documentation/doc-guide/kernel-doc.rst. Signed-off-by: Akira Yokosawa <akiyks@gmail.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: 48ec13d36d3f ("gpio: Properly document parent data union") Link: https://lore.kernel.org/r/20220606093302.21febee3@canb.auug.org.au/ [1] Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Bartosz Golaszewski <brgl@bgdev.pl> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Marc Zyngier <maz@kernel.org> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2022-06-23Merge tag 'random-5.19-rc4-for-linus' of ↵Linus Torvalds1-4/+8
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator fixes from Jason Donenfeld: - A change to schedule the interrupt randomness mixing less often, yet credit a little more each time, to reduce overhead during interrupt storms. - Squelch an undesired pr_warn() from __ratelimit(), which was causing problems in the reporters' CI. - A trivial comment fix. * tag 'random-5.19-rc4-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: random: update comment from copy_to_user() -> copy_to_iter() random: quiet urandom warning ratelimit suppression message random: schedule mix_interrupt_randomness() less often
2022-06-23Merge branch 'rework/kthreads' into for-linusPetr Mladek2-33/+0
2022-06-23Revert "printk: add functions to prefer direct printing"Petr Mladek1-11/+0
This reverts commit 2bb2b7b57f81255c13f4395ea911d6bdc70c9fe2. The testing of 5.19 release candidates revealed missing synchronization between early and regular console functionality. It would be possible to start the console kthreads later as a workaround. But it is clear that console lock serialized console drivers between each other. It opens a big area of possible problems that were not considered by people involved in the development and review. printk() is crucial for debugging kernel issues and console output is very important part of it. The number of consoles is huge and a proper review would take some time. As a result it need to be reverted for 5.19. Link: https://lore.kernel.org/r/YrBdjVwBOVgLfHyb@alley Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220623145157.21938-7-pmladek@suse.com
2022-06-23Revert "printk: add kthread console printers"Petr Mladek1-2/+0
This reverts commit 09c5ba0aa2fcfdadb17d045c3ee6f86d69270df7. This reverts commit b87f02307d3cfbda768520f0687c51ca77e14fc3. The testing of 5.19 release candidates revealed missing synchronization between early and regular console functionality. It would be possible to start the console kthreads later as a workaround. But it is clear that console lock serialized console drivers between each other. It opens a big area of possible problems that were not considered by people involved in the development and review. printk() is crucial for debugging kernel issues and console output is very important part of it. The number of consoles is huge and a proper review would take some time. As a result it need to be reverted for 5.19. Link: https://lore.kernel.org/r/YrBdjVwBOVgLfHyb@alley Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220623145157.21938-6-pmladek@suse.com