summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2020-02-26Merge tag 'trace-v5.6-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing and bootconfig updates: "Fixes and changes to bootconfig before it goes live in a release. Change in API of bootconfig (before it comes live in a release): - Have a magic value "BOOTCONFIG" in initrd to know a bootconfig exists - Set CONFIG_BOOT_CONFIG to 'n' by default - Show error if "bootconfig" on cmdline but not compiled in - Prevent redefining the same value - Have a way to append values - Added a SELECT BLK_DEV_INITRD to fix a build failure Synthetic event fixes: - Switch to raw_smp_processor_id() for recording CPU value in preempt section. (No care for what the value actually is) - Fix samples always recording u64 values - Fix endianess - Check number of values matches number of fields - Fix a printing bug Fix of trace_printk() breaking postponed start up tests Make a function static that is only used in a single file" * tag 'trace-v5.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: bootconfig: Fix CONFIG_BOOTTIME_TRACING dependency issue bootconfig: Add append value operator support bootconfig: Prohibit re-defining value on same key bootconfig: Print array as multiple commands for legacy command line bootconfig: Reject subkey and value on same parent key tools/bootconfig: Remove unneeded error message silencer bootconfig: Add bootconfig magic word for indicating bootconfig explicitly bootconfig: Set CONFIG_BOOT_CONFIG=n by default tracing: Clear trace_state when starting trace bootconfig: Mark boot_config_checksum() static tracing: Disable trace_printk() on post poned tests tracing: Have synthetic event test use raw_smp_processor_id() tracing: Fix number printing bug in print_synth_event() tracing: Check that number of vals matches number of synth event fields tracing: Make synth_event trace functions endian-correct tracing: Make sure synth_event_trace() example always uses u64
2020-02-26Merge branch 'master' of git://blackhole.kfki.hu/nfPablo Neira Ayuso
Jozsef Kadlecsik says: ==================== ipset patches for nf The first one is larger than usual, but the issue could not be solved simpler. Also, it's a resend of the patch I submitted a few days ago, with a one line fix on top of that: the size of the comment extensions was not taken into account at reporting the full size of the set. - Fix "INFO: rcu detected stall in hash_xxx" reports of syzbot by introducing region locking and using workqueue instead of timer based gc of timed out entries in hash types of sets in ipset. - Fix the forceadd evaluation path - the bug was also uncovered by the syzbot. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-25devlink: extend devlink_trap_report() to accept cookie and passJiri Pirko
Add cookie argument to devlink_trap_report() allowing driver to pass in the user cookie. Pass on the cookie down to drop monitor code. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-25drop_monitor: extend by passing cookie from driverJiri Pirko
If driver passed along the cookie, push it through Netlink. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-25devlink: add trap metadata type for cookieJiri Pirko
Allow driver to indicate cookie metadata for registered traps. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-25flow_offload: pass action cookie through offload structuresJiri Pirko
Extend struct flow_action_entry in order to hold TC action cookie specified by user inserting the action. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-25icmp: allow icmpv6_ndo_send to work with CONFIG_IPV6=nJason A. Donenfeld
The icmpv6_send function has long had a static inline implementation with an empty body for CONFIG_IPV6=n, so that code calling it doesn't need to be ifdef'd. The new icmpv6_ndo_send function, which is intended for drivers as a drop-in replacement with an identical function signature, should follow the same pattern. Without this patch, drivers that used to work with CONFIG_IPV6=n now result in a linker error. Cc: Chen Zhou <chenzhou10@huawei.com> Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: 0b41713b6066 ("icmp: introduce helper for nat'd source address in network device context") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-25Merge tag 'imx-clk-fixes-5.6' of ↵Stephen Boyd
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into clk-fixes Pull an i.MX clk fix from Shawn Guo: A fix on i.MX8MN I2C4 and UART1 clock IDs, which clash with PWM2 and PWM3 ID. This bug makes I2C4 and UART1 device in i.MX8MN DT point to PWM2 and PWM3 clock. * tag 'imx-clk-fixes-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: clk: imx8mn: Fix incorrect clock defines
2020-02-25blktrace: Protect q->blk_trace with RCUJan Kara
KASAN is reporting that __blk_add_trace() has a use-after-free issue when accessing q->blk_trace. Indeed the switching of block tracing (and thus eventual freeing of q->blk_trace) is completely unsynchronized with the currently running tracing and thus it can happen that the blk_trace structure is being freed just while __blk_add_trace() works on it. Protect accesses to q->blk_trace by RCU during tracing and make sure we wait for the end of RCU grace period when shutting down tracing. Luckily that is rare enough event that we can afford that. Note that postponing the freeing of blk_trace to an RCU callback should better be avoided as it could have unexpected user visible side-effects as debugfs files would be still existing for a short while block tracing has been shut down. Link: https://bugzilla.kernel.org/show_bug.cgi?id=205711 CC: stable@vger.kernel.org Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Tested-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reported-by: Tristan Madani <tristmd@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-24bpf: Provide recursion prevention helpersThomas Gleixner
The places which need to prevent the execution of trace type BPF programs to prevent deadlocks on the hash bucket lock do this open coded. Provide two inline functions, bpf_disable/enable_instrumentation() to replace these open coded protection constructs. Use migrate_disable/enable() instead of preempt_disable/enable() right away so this works on RT enabled kernels. On a !RT kernel migrate_disable / enable() are mapped to preempt_disable/enable(). These helpers use this_cpu_inc/dec() instead of __this_cpu_inc/dec() on an RT enabled kernel because migrate disabled regions are preemptible and preemption might hit in the middle of a RMW operation which can lead to inconsistent state. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200224145644.103910133@linutronix.de
2020-02-24bpf: Use migrate_disable/enable in array macros and cgroup/lirc code.David Miller
Replace the preemption disable/enable with migrate_disable/enable() to reflect the actual requirement and to allow PREEMPT_RT to substitute it with an actual migration disable mechanism which does not disable preemption. Including the code paths that go via __bpf_prog_run_save_cb(). Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200224145643.998293311@linutronix.de
2020-02-24bpf: Use bpf_prog_run_pin_on_cpu() at simple call sites.David Miller
All of these cases are strictly of the form: preempt_disable(); BPF_PROG_RUN(...); preempt_enable(); Replace this with bpf_prog_run_pin_on_cpu() which wraps BPF_PROG_RUN() with: migrate_disable(); BPF_PROG_RUN(...); migrate_enable(); On non RT enabled kernels this maps to preempt_disable/enable() and on RT enabled kernels this solely prevents migration, which is sufficient as there is no requirement to prevent reentrancy to any BPF program from a preempting task. The only requirement is that the program stays on the same CPU. Therefore, this is a trivially correct transformation. The seccomp loop does not need protection over the loop. It only needs protection per BPF filter program [ tglx: Converted to bpf_prog_run_pin_on_cpu() ] Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200224145643.691493094@linutronix.de
2020-02-24bpf: Replace cant_sleep() with cant_migrate()Thomas Gleixner
As already discussed in the previous change which introduced BPF_RUN_PROG_PIN_ON_CPU() BPF only requires to disable migration to guarantee per CPUness. If RT substitutes the preempt disable based migration protection then the cant_sleep() check will obviously trigger as preemption is not disabled. Replace it by cant_migrate() which maps to cant_sleep() on a non RT kernel and will verify that migration is disabled on a full RT kernel. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200224145643.583038889@linutronix.de
2020-02-24bpf: Provide bpf_prog_run_pin_on_cpu() helperThomas Gleixner
BPF programs require to run on one CPU to completion as they use per CPU storage, but according to Alexei they don't need reentrancy protection as obviously BPF programs running in thread context can always be 'preempted' by hard and soft interrupts and instrumentation and the same program can run concurrently on a different CPU. The currently used mechanism to ensure CPUness is to wrap the invocation into a preempt_disable/enable() pair. Disabling preemption is also disabling migration for a task. preempt_disable/enable() is used because there is no explicit way to reliably disable only migration. Provide a separate macro to invoke a BPF program which can be used in migrateable task context. It wraps BPF_PROG_RUN() in a migrate_disable/enable() pair which maps on non RT enabled kernels to preempt_disable/enable(). On RT enabled kernels this merely disables migration. Both methods ensure that the invoked BPF program runs on one CPU to completion. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200224145643.474592620@linutronix.de
2020-02-24Merge tag 'mac80211-next-for-net-next-2020-02-24' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== A new set of changes: * lots of small documentation fixes, from Jérôme Pouiller * beacon protection (BIGTK) support from Jouni Malinen * some initial code for TID configuration, from Tamizh chelvam * I reverted some new API before it's actually used, because it's wrong to mix controlled port and preauth * a few other cleanups/fixes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24mdio_bus: Add generic mdio_find_bus()Jeremy Linton
It appears most ethernet drivers follow one of two main strategies for mdio bus/phy management. A monolithic model where the net driver itself creates, probes and uses the phy, and one where an external mdio/phy driver instantiates the mdio bus/phy and the net driver only attaches to a known phy. Usually in this latter model the phys are discovered via DT relationships or simply phy name/address hardcoding. This is a shame because modern well behaved mdio buses are self describing and can be probed. The mdio layer itself is fully capable of this, yet there isn't a clean way for a standalone net driver to attach and enumerate the discovered devices. This is because outside of of_mdio_find_bus() there isn't a straightforward way to acquire the mii_bus pointer. So, lets add a mdio_find_bus which can return the mii_bus based only on its name. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24net: Special handling for IP & MPLS.Martin Varghese
Special handling is needed in bareudp module for IP & MPLS as they support more than one ethertypes. MPLS has 2 ethertypes. 0x8847 for MPLS unicast and 0x8848 for MPLS multicast. While decapsulating MPLS packet from UDP packet the tunnel destination IP address is checked to determine the ethertype. The ethertype of the packet will be set to 0x8848 if the tunnel destination IP address is a multicast IP address. The ethertype of the packet will be set to 0x8847 if the tunnel destination IP address is a unicast IP address. IP has 2 ethertypes.0x0800 for IPV4 and 0x86dd for IPv6. The version field of the IP header tunnelled will be checked to determine the ethertype. This special handling to tunnel additional ethertypes will be disabled by default and can be enabled using a flag called multiproto. This flag can be used only with ethertypes 0x8847 and 0x0800. Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24net: UDP tunnel encapsulation module for tunnelling different protocols like ↵Martin Varghese
MPLS, IP, NSH etc. The Bareudp tunnel module provides a generic L3 encapsulation tunnelling module for tunnelling different protocols like MPLS, IP,NSH etc inside a UDP tunnel. Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24devlink: add ACL generic packet trapsJiri Pirko
Add packet traps that can report packets that were dropped during ACL processing. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Bugfixes, including the fix for CVE-2020-2732 and a few issues found by 'make W=1'" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: s390: rstify new ioctls in api.rst KVM: nVMX: Check IO instruction VM-exit conditions KVM: nVMX: Refactor IO bitmap checks into helper function KVM: nVMX: Don't emulate instructions in guest mode KVM: nVMX: Emulate MTF when performing instruction emulation KVM: fix error handling in svm_hardware_setup KVM: SVM: Fix potential memory leak in svm_cpu_init() KVM: apic: avoid calculating pending eoi from an uninitialized val KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested pinbased_ctls only when apicv is globally disabled KVM: nVMX: handle nested posted interrupts when apicv is disabled for L1 kvm: x86: svm: Fix NULL pointer dereference when AVIC not enabled KVM: VMX: Add VMX_FEATURE_USR_WAIT_PAUSE KVM: nVMX: Hold KVM's srcu lock when syncing vmcs12->shadow KVM: x86: don't notify userspace IOAPIC on edge-triggered interrupt EOI kvm/emulate: fix a -Werror=cast-function-type KVM: x86: fix incorrect comparison in trace event KVM: nVMX: Fix some obsolete comments and grammar error KVM: x86: fix missing prototypes KVM: x86: enable -Werror
2020-02-24mac80211: Add api to support configuring TID specific configurationTamizh chelvam
Implement drv_set_tid_config api to allow TID specific configuration and drv_reset_tid_config api to reset peer specific TID configuration. This per-TID onfiguration will be applied for all the connected stations when MAC is NULL. Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org> Link: https://lore.kernel.org/r/1579506687-18296-7-git-send-email-tamizhr@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24nl80211: Add support to configure TID specific RTSCTS configurationTamizh chelvam
This patch adds support to configure per TID RTSCTS control configuration to enable/disable through the NL80211_TID_CONFIG_ATTR_RTSCTS_CTRL attribute. Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org> Link: https://lore.kernel.org/r/1579506687-18296-5-git-send-email-tamizhr@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24nl80211: Add support to configure TID specific AMPDU configurationTamizh chelvam
This patch adds support to configure per TID AMPDU control configuration to enable/disable aggregation through the NL80211_TID_CONFIG_ATTR_AMPDU_CTRL attribute. Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org> Link: https://lore.kernel.org/r/1579506687-18296-4-git-send-email-tamizhr@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24nl80211: Add support to configure TID specific retry configurationTamizh chelvam
This patch adds support to configure per TID retry configuration through the NL80211_TID_CONFIG_ATTR_RETRY_SHORT and NL80211_TID_CONFIG_ATTR_RETRY_LONG attributes. This TID specific retry configuration will have more precedence than phy level configuration. Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org> Link: https://lore.kernel.org/r/1579506687-18296-3-git-send-email-tamizhr@codeaurora.org [rebase completely on top of my previous API changes] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24nl80211: modify TID-config APIJohannes Berg
Make some changes to the TID-config API: * use u16 in nl80211 (only, and restrict to using 8 bits for now), to avoid issues in the future if we ever want to use higher TIDs. * reject empty TIDs mask (via netlink policy) * change feature advertising to not use extended feature flags but have own mechanism for this, which simplifies the code * fix all variable names from 'tid' to 'tids' since it's a mask * change to cfg80211_ name prefixes, not ieee80211_ * fix some minor docs/spelling things. Change-Id: Ia234d464b3f914cdeab82f540e018855be580dce Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24nl80211: Add NL command to support TID speicific configurationsTamizh chelvam
Add the new NL80211_CMD_SET_TID_CONFIG command to support data TID specific configuration. Per TID configuration is passed in the nested NL80211_ATTR_TID_CONFIG attribute. This patch adds support to configure per TID noack policy through the NL80211_TID_CONFIG_ATTR_NOACK attribute. Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org> Link: https://lore.kernel.org/r/1579506687-18296-2-git-send-email-tamizhr@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24cfg80211: Support key configuration for Beacon protection (BIGTK)Jouni Malinen
IEEE P802.11-REVmd/D3.0 adds support for protecting Beacon frames using a new set of keys (BIGTK; key index 6..7) similarly to the way group-addressed Robust Management frames are protected (IGTK; key index 4..5). Extend cfg80211 and nl80211 to allow the new BIGTK to be configured. Add an extended feature flag to indicate driver support for the new key index values to avoid array overflows in driver implementations and also to indicate to user space when this functionality is available. Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Link: https://lore.kernel.org/r/20200222132548.20835-2-jouni@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24cfg80211: fix indentation errorsJérôme Pouiller
Signed-off-by: Jérôme Pouiller <jerome.pouiller@silabs.com> Link: https://lore.kernel.org/r/20200221115604.594035-10-Jerome.Pouiller@silabs.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24cfg80211: merge documentations of field "dev"Jérôme Pouiller
The field "dev" was documented on two places. This patch merges the comments. Signed-off-by: Jérôme Pouiller <jerome.pouiller@silabs.com> Link: https://lore.kernel.org/r/20200221115604.594035-8-Jerome.Pouiller@silabs.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24cfg80211: merge documentations of field "debugfsdir"Jérôme Pouiller
The field "privid" is documented twice. Comments were more or less the same. The patch merge them. Signed-off-by: Jérôme Pouiller <jerome.pouiller@silabs.com> Link: https://lore.kernel.org/r/20200221115604.594035-7-Jerome.Pouiller@silabs.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24cfg80211: drop duplicated documentation of field "reg_notifier"Jérôme Pouiller
The field "reg_notifier" was already documented above the definition of struct wiphy. The comment inside the definition of the struct did not bring more information. Signed-off-by: Jérôme Pouiller <jerome.pouiller@silabs.com> Link: https://lore.kernel.org/r/20200221115604.594035-6-Jerome.Pouiller@silabs.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24cfg80211: drop duplicated documentation of field "perm_addr"Jérôme Pouiller
The field "perm_addr" was already documented above the definition of struct wiphy. Comments were almost identical. Signed-off-by: Jérôme Pouiller <jerome.pouiller@silabs.com> Link: https://lore.kernel.org/r/20200221115604.594035-5-Jerome.Pouiller@silabs.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24cfg80211: drop duplicated documentation of field "_net"Jérôme Pouiller
The field "_net" was already documented above the definition of struct wiphy. Both comments were identical. Signed-off-by: Jérôme Pouiller <jerome.pouiller@silabs.com> Link: https://lore.kernel.org/r/20200221115604.594035-4-Jerome.Pouiller@silabs.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24cfg80211: drop duplicated documentation of field "registered"Jérôme Pouiller
Field "registered" was documented three times: twice in the documentation block of struct wiphy and once inside the struct definition. This patch keep only one comment. Signed-off-by: Jérôme Pouiller <jerome.pouiller@silabs.com> Link: https://lore.kernel.org/r/20200221115604.594035-3-Jerome.Pouiller@silabs.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24cfg80211: drop duplicated documentation of field "privid"Jérôme Pouiller
The field "privid" was already documented above the definition of struct wiphy. Comments were not identical, but they said more or less the same thing. Signed-off-by: Jérôme Pouiller <jerome.pouiller@silabs.com> Link: https://lore.kernel.org/r/20200221115604.594035-2-Jerome.Pouiller@silabs.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24cfg80211: drop duplicated documentation of field "probe_resp_offload"Jérôme Pouiller
The field "probe_resp_offload" was already documented above the definition of struct wiphy. Both comments were identical. Signed-off-by: Jérôme Pouiller <jerome.pouiller@silabs.com> Link: https://lore.kernel.org/r/20200221115604.594035-1-Jerome.Pouiller@silabs.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-24Revert "nl80211: add src and dst addr attributes for control port tx/rx"Johannes Berg
This reverts commit 8c3ed7aa2b9ef666195b789e9b02e28383243fa8. As Jouni points out, there's really no need for this, since the RSN pre-authentication frames are normal data frames, not port control frames (locally). We can still revert this now since it hasn't actually gone beyond -next. Fixes: 8c3ed7aa2b9e ("nl80211: add src and dst addr attributes for control port tx/rx") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20200224101910.b746e263287a.I9eb15d6895515179d50964dec3550c9dc784bb93@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-02-22Merge tag 'irq-urgent-2020-02-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "Two fixes for the irq core code which are follow ups to the recent MSI fixes: - The WARN_ON which was put into the MSI setaffinity callback for paranoia reasons actually triggered via a callchain which escaped when all the possible ways to reach that code were analyzed. The proc/irq/$N/*affinity interfaces have a quirk which came in when ALPHA moved to the generic interface: In case that the written affinity mask does not contain any online CPU it calls into ALPHAs magic auto affinity setting code. A few years later this mechanism was also made available to x86 for no good reasons and in a way which circumvents all sanity checks for interrupts which cannot have their affinity set from process context on X86 due to the way the X86 interrupt delivery works. It would be possible to make this work properly, but there is no point in doing so. If the interrupt is not yet started then the affinity setting has no effect and if it is started already then it is already assigned to an online CPU so there is no point to randomly move it to some other CPU. Just return EINVAL as the code has done before that change forever. - The new MSI quirk bit in the irq domain flags turned out to be already occupied, which escaped the author and the reviewers because the already in use bits were 0,6,2,3,4,5 listed in that order. That bit 6 was simply overlooked because the ordering was straight forward linear otherwise. So the new bit ended up being a duplicate. Fix it up by switching the oddball 6 to the obvious 1" * tag 'irq-urgent-2020-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/irqdomain: Make sure all irq domain flags are distinct genirq/proc: Reject invalid affinity masks (again)
2020-02-22Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Four non-core fixes. Two are reverts of target fixes which turned out to have unwanted side effects, one is a revert of an RDMA fix with the same problem and the final one fixes an incorrect warning about memory allocation failures in megaraid_sas (the driver actually reduces the allocation size until it succeeds)" Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: Revert "target: iscsi: Wait for all commands to finish before freeing a session" scsi: Revert "RDMA/isert: Fix a recently introduced regression related to logout" scsi: megaraid_sas: silence a warning scsi: Revert "target/core: Inline transport_lun_remove_cmd()"
2020-02-22netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reportsJozsef Kadlecsik
In the case of huge hash:* types of sets, due to the single spinlock of a set the processing of the whole set under spinlock protection could take too long. There were four places where the whole hash table of the set was processed from bucket to bucket under holding the spinlock: - During resizing a set, the original set was locked to exclude kernel side add/del element operations (userspace add/del is excluded by the nfnetlink mutex). The original set is actually just read during the resize, so the spinlocking is replaced with rcu locking of regions. However, thus there can be parallel kernel side add/del of entries. In order not to loose those operations a backlog is added and replayed after the successful resize. - Garbage collection of timed out entries was also protected by the spinlock. In order not to lock too long, region locking is introduced and a single region is processed in one gc go. Also, the simple timer based gc running is replaced with a workqueue based solution. The internal book-keeping (number of elements, size of extensions) is moved to region level due to the region locking. - Adding elements: when the max number of the elements is reached, the gc was called to evict the timed out entries. The new approach is that the gc is called just for the matching region, assuming that if the region (proportionally) seems to be full, then the whole set does. We could scan the other regions to check every entry under rcu locking, but for huge sets it'd mean a slowdown at adding elements. - Listing the set header data: when the set was defined with timeout support, the garbage collector was called to clean up timed out entries to get the correct element numbers and set size values. Now the set is scanned to check non-timed out entries, without actually calling the gc for the whole set. Thanks to Florian Westphal for helping me to solve the SOFTIRQ-safe -> SOFTIRQ-unsafe lock order issues during working on the patch. Reported-by: syzbot+4b0e9d4ff3cf117837e5@syzkaller.appspotmail.com Reported-by: syzbot+c27b8d5010f45c666ed1@syzkaller.appspotmail.com Reported-by: syzbot+68a806795ac89df3aa1c@syzkaller.appspotmail.com Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7") Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
2020-02-21Merge tag 'sched-for-bpf-2020-02-20' of ↵Alexei Starovoitov
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into bpf-next Two migrate disable related stubs for BPF to base the RT patches on
2020-02-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2020-02-21 The following pull-request contains BPF updates for your *net-next* tree. We've added 25 non-merge commits during the last 4 day(s) which contain a total of 33 files changed, 2433 insertions(+), 161 deletions(-). The main changes are: 1) Allow for adding TCP listen sockets into sock_map/hash so they can be used with reuseport BPF programs, from Jakub Sitnicki. 2) Add a new bpf_program__set_attach_target() helper for adding libbpf support to specify the tracepoint/function dynamically, from Eelco Chaudron. 3) Add bpf_read_branch_records() BPF helper which helps use cases like profile guided optimizations, from Daniel Xu. 4) Enable bpf_perf_event_read_value() in all tracing programs, from Song Liu. 5) Relax BTF mandatory check if only used for libbpf itself e.g. to process BTF defined maps, from Andrii Nakryiko. 6) Move BPF selftests -mcpu compilation attribute from 'probe' to 'v3' as it has been observed that former fails in envs with low memlock, from Yonghong Song. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Conflict resolution of ice_virtchnl_pf.c based upon work by Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-21net: Generate reuseport group ID on group creationJakub Sitnicki
Commit 736b46027eb4 ("net: Add ID (if needed) to sock_reuseport and expose reuseport_lock") has introduced lazy generation of reuseport group IDs that survive group resize. By comparing the identifier we check if BPF reuseport program is not trying to select a socket from a BPF map that belongs to a different reuseport group than the one the packet is for. Because SOCKARRAY used to be the only BPF map type that can be used with reuseport BPF, it was possible to delay the generation of reuseport group ID until a socket from the group was inserted into BPF map for the first time. Now that SOCK{MAP,HASH} can be used with reuseport BPF we have two options, either generate the reuseport ID on map update, like SOCKARRAY does, or allocate an ID from the start when reuseport group gets created. This patch takes the latter approach to keep sockmap free of calls into reuseport code. This streamlines the reuseport_id access as its lifetime now matches the longevity of reuseport object. The cost of this simplification, however, is that we allocate reuseport IDs for all SO_REUSEPORT users. Even those that don't use SOCKARRAY in their setups. With the way identifiers are currently generated, we can have at most S32_MAX reuseport groups, which hopefully is sufficient. If we ever get close to the limit, we can switch an u64 counter like sk_cookie. Another change is that we now always call into SOCKARRAY logic to unlink the socket from the map when unhashing or closing the socket. Previously we did it only when at least one socket from the group was in a BPF map. It is worth noting that this doesn't conflict with sockmap tear-down in case a socket is in a SOCK{MAP,HASH} and belongs to a reuseport group. sockmap tear-down happens first: prot->unhash `- tcp_bpf_unhash |- tcp_bpf_remove | `- while (sk_psock_link_pop(psock)) | `- sk_psock_unlink | `- sock_map_delete_from_link | `- __sock_map_delete | `- sock_map_unref | `- sk_psock_put | `- sk_psock_drop | `- rcu_assign_sk_user_data(sk, NULL) `- inet_unhash `- reuseport_detach_sock `- bpf_sk_reuseport_detach `- WRITE_ONCE(sk->sk_user_data, NULL) Suggested-by: Martin Lau <kafai@fb.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200218171023.844439-10-jakub@cloudflare.com
2020-02-21tcp_bpf: Don't let child socket inherit parent protocol ops on copyJakub Sitnicki
Prepare for cloning listening sockets that have their protocol callbacks overridden by sk_msg. Child sockets must not inherit parent callbacks that access state stored in sk_user_data owned by the parent. Restore the child socket protocol callbacks before it gets hashed and any of the callbacks can get invoked. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200218171023.844439-4-jakub@cloudflare.com
2020-02-21net, sk_msg: Clear sk_user_data pointer on clone if taggedJakub Sitnicki
sk_user_data can hold a pointer to an object that is not intended to be shared between the parent socket and the child that gets a pointer copy on clone. This is the case when sk_user_data points at reference-counted object, like struct sk_psock. One way to resolve it is to tag the pointer with a no-copy flag by repurposing its lowest bit. Based on the bit-flag value we clear the child sk_user_data pointer after cloning the parent socket. The no-copy flag is stored in the pointer itself as opposed to externally, say in socket flags, to guarantee that the pointer and the flag are copied from parent to child socket in an atomic fashion. Parent socket state is subject to change while copying, we don't hold any locks at that time. This approach relies on an assumption that sk_user_data holds a pointer to an object aligned at least 2 bytes. A manual audit of existing users of rcu_dereference_sk_user_data helper confirms our assumption. Also, an RCU-protected sk_user_data is not likely to hold a pointer to a char value or a pathological case of "struct { char c; }". To be safe, warn when the flag-bit is set when setting sk_user_data to catch any future misuses. It is worth considering why clearing sk_user_data unconditionally is not an option. There exist users, DRBD, NVMe, and Xen drivers being among them, that rely on the pointer being copied when cloning the listening socket. Potentially we could distinguish these users by checking if the listening socket has been created in kernel-space via sock_create_kern, and hence has sk_kern_sock flag set. However, this is not the case for NVMe and Xen drivers, which create sockets without marking them as belonging to the kernel. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200218171023.844439-3-jakub@cloudflare.com
2020-02-21net, sk_msg: Annotate lockless access to sk_prot on cloneJakub Sitnicki
sk_msg and ULP frameworks override protocol callbacks pointer in sk->sk_prot, while tcp accesses it locklessly when cloning the listening socket, that is with neither sk_lock nor sk_callback_lock held. Once we enable use of listening sockets with sockmap (and hence sk_msg), there will be shared access to sk->sk_prot if socket is getting cloned while being inserted/deleted to/from the sockmap from another CPU: Read side: tcp_v4_rcv sk = __inet_lookup_skb(...) tcp_check_req(sk) inet_csk(sk)->icsk_af_ops->syn_recv_sock tcp_v4_syn_recv_sock tcp_create_openreq_child inet_csk_clone_lock sk_clone_lock READ_ONCE(sk->sk_prot) Write side: sock_map_ops->map_update_elem sock_map_update_elem sock_map_update_common sock_map_link_no_progs tcp_bpf_init tcp_bpf_update_sk_prot sk_psock_update_proto WRITE_ONCE(sk->sk_prot, ops) sock_map_ops->map_delete_elem sock_map_delete_elem __sock_map_delete sock_map_unref sk_psock_put sk_psock_drop sk_psock_restore_proto tcp_update_ulp WRITE_ONCE(sk->sk_prot, proto) Mark the shared access with READ_ONCE/WRITE_ONCE annotations. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200218171023.844439-2-jakub@cloudflare.com
2020-02-21Merge tag 'tty-5.6-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver fixes from Greg KH: "Here are a number of small tty and serial driver fixes for 5.6-rc3 that resolve a bunch of reported issues. They are: - vt selection and ioctl fixes - serdev bugfix - atmel serial driver fixes - qcom serial driver fixes - other minor serial driver fixes All of these have been in linux-next for a while with no reported issues" * tag 'tty-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: vt: selection, close sel_buffer race vt: selection, handle pending signals in paste_selection serial: cpm_uart: call cpm_muram_init before registering console tty: serial: qcom_geni_serial: Fix RX cancel command failure serial: 8250: Check UPF_IRQ_SHARED in advance tty: serial: imx: setup the correct sg entry for tx dma vt: vt_ioctl: fix race in VT_RESIZEX vt: fix scrollback flushing on background consoles tty: serial: tegra: Handle RX transfer in PIO mode if DMA wasn't started tty/serial: atmel: manage shutdown in case of RS485 or ISO7816 mode serdev: ttyport: restore client ops on deregistration serial: ar933x_uart: set UART_CS_{RX,TX}_READY_ORIDE
2020-02-21Merge tag 'usb-5.6-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB/Thunderbolt fixes from Greg KH: "Here are a number of small USB driver fixes for 5.6-rc3. Included in here are: - MAINTAINER file updates - USB gadget driver fixes - usb core quirk additions and fixes for regressions - xhci driver fixes - usb serial driver id additions and fixes - thunderbolt bugfix Thunderbolt patches come in through here now that USB4 is really thunderbolt. All of these have been in linux-next for a while with no reported issues" * tag 'usb-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (34 commits) USB: misc: iowarrior: add support for the 100 device thunderbolt: Prevent crash if non-active NVMem file is read usb: gadget: udc-xilinx: Fix xudc_stop() kernel-doc format USB: misc: iowarrior: add support for the 28 and 28L devices USB: misc: iowarrior: add support for 2 OEMed devices USB: Fix novation SourceControl XL after suspend xhci: Fix memory leak when caching protocol extended capability PSI tables - take 2 Revert "xhci: Fix memory leak when caching protocol extended capability PSI tables" MAINTAINERS: Sort entries in database for THUNDERBOLT usb: dwc3: debug: fix string position formatting mixup with ret and len usb: gadget: serial: fix Tx stall after buffer overflow usb: gadget: ffs: ffs_aio_cancel(): Save/restore IRQ flags usb: dwc2: Fix SET/CLEAR_FEATURE and GET_STATUS flows usb: dwc2: Fix in ISOC request length checking usb: gadget: composite: Support more than 500mA MaxPower usb: gadget: composite: Fix bMaxPower for SuperSpeedPlus usb: gadget: u_audio: Fix high-speed max packet size usb: dwc3: gadget: Check for IOC/LST bit in TRB->ctrl fields USB: core: clean up endpoint-descriptor parsing USB: quirks: blacklist duplicate ep on Sound Devices USBPre2 ...
2020-02-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Limit xt_hashlimit hash table size to avoid OOM or hung tasks, from Cong Wang. 2) Fix deadlock in xsk by publishing global consumer pointers when NAPI is finished, from Magnus Karlsson. 3) Set table field properly to RT_TABLE_COMPAT when necessary, from Jethro Beekman. 4) NLA_STRING attributes are not necessary NULL terminated, deal wiht that in IFLA_ALT_IFNAME. From Eric Dumazet. 5) Fix checksum handling in atlantic driver, from Dmitry Bezrukov. 6) Handle mtu==0 devices properly in wireguard, from Jason A. Donenfeld. 7) Fix several lockdep warnings in bonding, from Taehee Yoo. 8) Fix cls_flower port blocking, from Jason Baron. 9) Sanitize internal map names in libbpf, from Toke Høiland-Jørgensen. 10) Fix RDMA race in qede driver, from Michal Kalderon. 11) Fix several false lockdep warnings by adding conditions to list_for_each_entry_rcu(), from Madhuparna Bhowmik. 12) Fix sleep in atomic in mlx5 driver, from Huy Nguyen. 13) Fix potential deadlock in bpf_map_do_batch(), from Yonghong Song. 14) Hey, variables declared in switch statement before any case statements are not initialized. I learn something every day. Get rids of this stuff in several parts of the networking, from Kees Cook. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits) bnxt_en: Issue PCIe FLR in kdump kernel to cleanup pending DMAs. bnxt_en: Improve device shutdown method. net: netlink: cap max groups which will be considered in netlink_bind() net: thunderx: workaround BGX TX Underflow issue ionic: fix fw_status read net: disable BRIDGE_NETFILTER by default net: macb: Properly handle phylink on at91rm9200 s390/qeth: fix off-by-one in RX copybreak check s390/qeth: don't warn for napi with 0 budget s390/qeth: vnicc Fix EOPNOTSUPP precedence openvswitch: Distribute switch variables for initialization net: ip6_gre: Distribute switch variables for initialization net: core: Distribute switch variables for initialization udp: rehash on disconnect net/tls: Fix to avoid gettig invalid tls record bpf: Fix a potential deadlock with bpf_map_do_batch bpf: Do not grab the bucket spinlock by default on htab batch ops ice: Wait for VF to be reset/ready before configuration ice: Don't tell the OS that link is going down ice: Don't reject odd values of usecs set by user ...