summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-06-27PCI / ACPI: Add _PR0 dependent devicesMika Westerberg
If otherwise unrelated PCI devices share ACPI power resources turning them on causes the devices to enter D0uninitialized power state which may cause problems. For example in Intel Ice Lake two root ports (RP0 and RP1), Thunderbolt controller (NHI) and xHCI controller all share power resources as can be ween in the topology below where power resources are marked with []: Host bridge | +- RP0 ---\ +- RP1 ---|--+--> [TBT] +- NHI --/ | | | | v +- xHCI --> [D3C] In a situation where all devices sharing the power resources are in D3cold (the power resources are turned off) and for example the Thunderbolt controller is runtime resumed resulting that the power resources are turned on. This means that the other devices sharing them (RP0, RP1 and xHCI) are transitioned into D0uninitialized state. If they were configured to trigger wake (PME) on a certain event that configuration gets lost after reset so we would need to re-initialize them to get the wakeup working as expected again. To do so we would need to runtime resume all of them to make sure their registers get restored properly before we can runtime suspend them again. Since we just added concept of "_PR0 dependent device" we can solve this by calling the relevant add/remove functions when the PCI device is bind to its ACPI representation. If it has power resources the PCI device will be added as dependent device to them and runtime resumed whenever they are physically turned on. This should make sure PCI core can reconfigure wakes after the device is transitioned into D0uninitialized. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-06-27ACPI / PM: Introduce concept of a _PR0 dependent deviceMika Westerberg
If there are shared power resources between otherwise unrelated devices turning them on causes the other devices sharing them to be powered up as well. In case of PCI devices go into D0uninitialized state meaning that if they were configured to trigger wake that configuration is lost at this point. For this reason introduce a concept of "_PR0 dependent device" that can be added to any ACPI device that has power resources. The dependent device will be included in a list of dependent devices for all power resources returned by the ACPI device's _PR0 (assuming it has one). Whenever a power resource having dependent devices is turned physically on (its _ON method is called) we runtime resume all of them to allow their driver or in case of PCI the PCI core to re-initialize the device and its wake configuration. This adds two functions that can be used to add and remove these dependent devices. Note the dependent device does not necessary need share power resources so this functionality can be used to add "software dependencies" as well if needed. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-06-27PCI / ACPI: Use cached ACPI device state to get PCI device power stateMika Westerberg
The ACPI power state returned by acpi_device_get_power() may depend on the configuration of ACPI power resources in the system which may change any time after acpi_device_get_power() has returned, unless the reference counters of the ACPI power resources in question are set to prevent that from happening. Thus it is invalid to use acpi_device_get_power() in acpi_pci_get_power_state() the way it is done now and the value of the ->power.state field in the corresponding struct acpi_device objects (which reflects the ACPI power resources reference counting, among other things) should be used instead. As an example where this becomes an issue is Intel Ice Lake where the Thunderbolt controller (NHI), two PCIe root ports (RP0 and RP1) and xHCI all share the same power resources. The following picture with power resources marked with [] shows the topology: Host bridge | +- RP0 ---\ +- RP1 ---|--+--> [TBT] +- NHI --/ | | | | v +- xHCI --> [D3C] Here TBT and D3C are the shared ACPI power resources. ACPI _PR3() method of the devices in question returns either TBT or D3C or both. Say we runtime suspend first the root ports RP0 and RP1, then NHI. Now since the TBT power resource is still on when the root ports are runtime suspended their dev->current_state is set to D3hot. When NHI is runtime suspended TBT is finally turned off but state of the root ports remain to be D3hot. Now when the xHCI is runtime suspended D3C gets also turned off. PCI core thus has power states of these devices cached in their dev->current_state as follows: RP0 -> D3hot RP1 -> D3hot NHI -> D3cold xHCI -> D3cold If the user now runs lspci for instance, the result is all 1's like in the below output (00:07.0 is the first root port, RP0): 00:07.0 PCI bridge: Intel Corporation Device 8a1d (rev ff) (prog-if ff) !!! Unknown header type 7f Kernel driver in use: pcieport In short the hardware state is not in sync with the software state anymore. The exact same thing happens with the PME polling thread which ends up bringing the root ports back into D0 after they are runtime suspended. For this reason, modify acpi_pci_get_power_state() so that it uses the ACPI device power state that was cached by the ACPI core. This makes the PCI device power state match the ACPI device power state regardless of state of the shared power resources which may still be on at this point. Link: https://lore.kernel.org/r/20190618161858.77834-2-mika.westerberg@linux.intel.com Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-06-27ACPI: PM: Allow transitions to D0 to occur in special casesRafael J. Wysocki
If a device with ACPI PM is left in D0 during a system-wide transition to the S3 (suspend-to-RAM) or S4 (hibernation) sleep state, the actual state of the device need not be D0 during resume from it, although its power.state value will still reflect D0 (that is, the power state from before the system-wide transition). In that case, the acpi_device_set_power() call made to ensure that the power state of the device will be D0 going forward has no effect, because the new state (D0) is equal to the one reflected by the device's power.state value. That does not affect power resources, which are taken care of by acpi_resume_power_resources() called from acpi_pm_finish() during resume from system-wide sleep states, but it still may be necessary to invoke _PS0 for the device on top of that in order to finalize its transition to D0. For this reason, modify acpi_device_set_power() to allow transitions to D0 to occur even if D0 is the current power state of the device according to its power.state value. That will not affect power resources, which are assumed to be in the right configuration already (as reflected by the current values of their reference counters), but it may cause _PS0 to be evaluated for the device. However, evaluating _PS0 for a device already in D0 may lead to confusion in general, so invoke _PSC (if present) to check the device's current power state upfront and only evaluate _PS0 for it if _PSC has returned a power state different from D0. [If _PSC is not present or the evaluation of it fails, the power state of the device is assumed to be D0 at this point.] Fixes: 20dacb71ad28 (ACPI / PM: Rework device power management to follow ACPI 6) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2019-06-27ACPI: PM: Avoid evaluating _PS3 on transitions from D3hot to D3coldRafael J. Wysocki
If the power state of a device with ACPI PM is changed from D3hot to D3cold, it merely is a matter of dropping references to additional power resources (specifically, those in the list returned by _PR3), and the _PS3 method should not be invoked for the device then (as it has already been evaluated during the previous transition to D3hot). Fixes: 20dacb71ad28 (ACPI / PM: Rework device power management to follow ACPI 6) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2019-06-27proc: remove useless d_is_dir() checkChristian Brauner
Remove the d_is_dir() check from tgid_pidfd_to_pid(). It is pointless since you should never get &proc_tgid_base_operations for f_op on a non-directory. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Christian Brauner <christian@brauner.io>
2019-06-27copy_process(): don't use ksys_close() on cleanupsAl Viro
anon_inode_getfd() should be used *ONLY* in situations when we are guaranteed to be past the last failure point (including copying the descriptor number to userland, at that). And ksys_close() should not be used for cleanups at all. anon_inode_getfile() is there for all nontrivial cases like that. Just use that... Fixes: b3e583825266 ("clone: add CLONE_PIDFD") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Jann Horn <jannh@google.com> Signed-off-by: Christian Brauner <christian@brauner.io>
2019-06-27x86/entry: Simplify _TIF_SYSCALL_EMU handlingSudeep Holla
The usage of emulated and _TIF_SYSCALL_EMU flags in syscall_trace_enter is more complicated than required. Cc: Andy Lutomirski <luto@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Acked-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-06-27cpu/hotplug: Fix out-of-bounds read when setting fail stateEiichi Tsukata
Setting invalid value to /sys/devices/system/cpu/cpuX/hotplug/fail can control `struct cpuhp_step *sp` address, results in the following global-out-of-bounds read. Reproducer: # echo -2 > /sys/devices/system/cpu/cpu0/hotplug/fail KASAN report: BUG: KASAN: global-out-of-bounds in write_cpuhp_fail+0x2cd/0x2e0 Read of size 8 at addr ffffffff89734438 by task bash/1941 CPU: 0 PID: 1941 Comm: bash Not tainted 5.2.0-rc6+ #31 Call Trace: write_cpuhp_fail+0x2cd/0x2e0 dev_attr_store+0x58/0x80 sysfs_kf_write+0x13d/0x1a0 kernfs_fop_write+0x2bc/0x460 vfs_write+0x1e1/0x560 ksys_write+0x126/0x250 do_syscall_64+0xc1/0x390 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f05e4f4c970 The buggy address belongs to the variable: cpu_hotplug_lock+0x98/0xa0 Memory state around the buggy address: ffffffff89734300: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00 ffffffff89734380: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00 >ffffffff89734400: 00 00 00 00 fa fa fa fa 00 00 00 00 fa fa fa fa ^ ffffffff89734480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffffff89734500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Add a sanity check for the value written from user space. Fixes: 1db49484f21ed ("smp/hotplug: Hotplug state fail injection") Signed-off-by: Eiichi Tsukata <devel@etsukata.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: peterz@infradead.org Link: https://lkml.kernel.org/r/20190627024732.31672-1-devel@etsukata.com
2019-06-27crypto: asymmetric_keys - select CRYPTO_HASH where neededArnd Bergmann
Build testing with some core crypto options disabled revealed a few modules that are missing CRYPTO_HASH: crypto/asymmetric_keys/x509_public_key.o: In function `x509_get_sig_params': x509_public_key.c:(.text+0x4c7): undefined reference to `crypto_alloc_shash' x509_public_key.c:(.text+0x5e5): undefined reference to `crypto_shash_digest' crypto/asymmetric_keys/pkcs7_verify.o: In function `pkcs7_digest.isra.0': pkcs7_verify.c:(.text+0xab): undefined reference to `crypto_alloc_shash' pkcs7_verify.c:(.text+0x1b2): undefined reference to `crypto_shash_digest' pkcs7_verify.c:(.text+0x3c1): undefined reference to `crypto_shash_update' pkcs7_verify.c:(.text+0x411): undefined reference to `crypto_shash_finup' This normally doesn't show up in randconfig tests because there is a large number of other options that select CRYPTO_HASH. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-06-27crypto: serpent - mark __serpent_setkey_sbox noinlineArnd Bergmann
The same bug that gcc hit in the past is apparently now showing up with clang, which decides to inline __serpent_setkey_sbox: crypto/serpent_generic.c:268:5: error: stack frame size of 2112 bytes in function '__serpent_setkey' [-Werror,-Wframe-larger-than=] Marking it 'noinline' reduces the stack usage from 2112 bytes to 192 and 96 bytes, respectively, and seems to generate more useful object code. Fixes: c871c10e4ea7 ("crypto: serpent - improve __serpent_setkey with UBSAN") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-06-27crypto: testmgr - dynamically allocate crypto_shashArnd Bergmann
The largest stack object in this file is now the shash descriptor. Since there are many other stack variables, this can push it over the 1024 byte warning limit, in particular with clang and KASAN: crypto/testmgr.c:1693:12: error: stack frame size of 1312 bytes in function '__alg_test_hash' [-Werror,-Wframe-larger-than=] Make test_hash_vs_generic_impl() do the same thing as the corresponding eaed and skcipher functions by allocating the descriptor dynamically. We can still do better than this, but it brings us well below the 1024 byte limit. Suggested-by: Eric Biggers <ebiggers@kernel.org> Fixes: 9a8a6b3f0950 ("crypto: testmgr - fuzz hashes against their generic implementation") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-06-27crypto: testmgr - dynamically allocate testvec_configArnd Bergmann
On arm32, we get warnings about high stack usage in some of the functions: crypto/testmgr.c:2269:12: error: stack frame size of 1032 bytes in function 'alg_test_aead' [-Werror,-Wframe-larger-than=] static int alg_test_aead(const struct alg_test_desc *desc, const char *driver, ^ crypto/testmgr.c:1693:12: error: stack frame size of 1312 bytes in function '__alg_test_hash' [-Werror,-Wframe-larger-than=] static int __alg_test_hash(const struct hash_testvec *vecs, ^ On of the larger objects on the stack here is struct testvec_config, so change that to dynamic allocation. Fixes: 40153b10d91c ("crypto: testmgr - fuzz AEADs against their generic implementation") Fixes: d435e10e67be ("crypto: testmgr - fuzz skciphers against their generic implementation") Fixes: 9a8a6b3f0950 ("crypto: testmgr - fuzz hashes against their generic implementation") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-06-27crypto: talitos - eliminate unneeded 'done' functions at build timeChristophe Leroy
When building for SEC1 only, talitos2_done functions are unneeded and should go away. For this, use has_ftr_sec1() which will always return true when only SEC1 support is being built, allowing GCC to drop TALITOS2 functions. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-06-27crypto: sun4i-ss - reduce stack usageArnd Bergmann
After the latest addition, the stack usage of sun4i_ss_cipher_poll grew beyond the warning limit when KASAN is enabled: drivers/crypto/sunxi-ss/sun4i-ss-cipher.c:118:12: error: stack frame size of 1152 bytes in function 'sun4i_ss_cipher_poll' [-Werror,-Wframe-larger-than=] static int sun4i_ss_cipher_poll(struct skcipher_request *areq) Reduce it in three ways: - split out the new code into a separate function so its stack usage can overlap that of the sun4i_ss_opti_poll() code path - mark both special cases as noinline_for_stack, which should ideally result in a tail call that frees the rest of the stack - move the buf and obuf variables into the code blocks in which they are used. The three separate functions now use 144, 640 and 304 bytes of kernel stack, respectively. Fixes: 0ae1f46c55f8 ("crypto: sun4i-ss - fallback when length is not multiple of blocksize") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Corentin LABBE <clabbe.montjoie@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-06-27crypto: ccree - add HW engine config checkGilad Ben-Yossef
Add check to verify the stated device tree HW configuration matches the HW. Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-06-27crypto: ccree - prevent isr handling in case driver is suspendedOfir Drang
ccree irq may be shared with other devices, in order to prevent ccree isr handling while device maybe suspended we added a check to verify that the device is not suspended. Signed-off-by: Ofir Drang <ofir.drang@arm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-06-27crypto: ccree - check that cryptocell reset completedOfir Drang
In case of driver probe and pm resume we need to check that the cryptocell hardware reset cycle is completed. during the reset cycle that Cryptocell provide read only access to the APB interface which allows to verify through the CC registers that the reset is completed. Until reset completion we assume that any write/crypto operation is blocked. Signed-off-by: Ofir Drang <ofir.drang@arm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-06-27crypto: ccree - Relocate driver irq registration after clk initofir.drang@arm.com
Signed-off-by: Ofir Drang <ofir.drang@arm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-06-26af_packet: Block execution of tasks waiting for transmit to complete in ↵Neil Horman
AF_PACKET When an application is run that: a) Sets its scheduler to be SCHED_FIFO and b) Opens a memory mapped AF_PACKET socket, and sends frames with the MSG_DONTWAIT flag cleared, its possible for the application to hang forever in the kernel. This occurs because when waiting, the code in tpacket_snd calls schedule, which under normal circumstances allows other tasks to run, including ksoftirqd, which in some cases is responsible for freeing the transmitted skb (which in AF_PACKET calls a destructor that flips the status bit of the transmitted frame back to available, allowing the transmitting task to complete). However, when the calling application is SCHED_FIFO, its priority is such that the schedule call immediately places the task back on the cpu, preventing ksoftirqd from freeing the skb, which in turn prevents the transmitting task from detecting that the transmission is complete. We can fix this by converting the schedule call to a completion mechanism. By using a completion queue, we force the calling task, when it detects there are no more frames to send, to schedule itself off the cpu until such time as the last transmitted skb is freed, allowing forward progress to be made. Tested by myself and the reporter, with good results Change Notes: V1->V2: Enhance the sleep logic to support being interruptible and allowing for honoring to SK_SNDTIMEO (Willem de Bruijn) V2->V3: Rearrage the point at which we wait for the completion queue, to avoid needing to check for ph/skb being null at the end of the loop. Also move the complete call to the skb destructor to avoid needing to modify __packet_set_status. Also gate calling complete on packet_read_pending returning zero to avoid multiple calls to complete. (Willem de Bruijn) Move timeo computation within loop, to re-fetch the socket timeout since we also use the timeo variable to record the return code from the wait_for_complete call (Neil Horman) V3->V4: Willem has requested that the control flow be restored to the previous state. Doing so lets us eliminate the need for the po->wait_on_complete flag variable, and lets us get rid of the packet_next_frame function, but introduces another complexity. Specifically, but using the packet pending count, we can, if an applications calls sendmsg multiple times with MSG_DONTWAIT set, each set of transmitted frames, when complete, will cause tpacket_destruct_skb to issue a complete call, for which there will never be a wait_on_completion call. This imbalance will lead to any future call to wait_for_completion here to return early, when the frames they sent may not have completed. To correct this, we need to re-init the completion queue on every call to tpacket_snd before we enter the loop so as to ensure we wait properly for the frames we send in this iteration. Change the timeout and interrupted gotos to out_put rather than out_status so that we don't try to free a non-existant skb Clean up some extra newlines (Willem de Bruijn) Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-26sctp: change to hold sk after auth shkey is created successfullyXin Long
Now in sctp_endpoint_init(), it holds the sk then creates auth shkey. But when the creation fails, it doesn't release the sk, which causes a sk defcnf leak, Here to fix it by only holding the sk when auth shkey is created successfully. Fixes: a29a5bd4f5c3 ("[SCTP]: Implement SCTP-AUTH initializations.") Reported-by: syzbot+afabda3890cc2f765041@syzkaller.appspotmail.com Reported-by: syzbot+276ca1c77a19977c0130@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27Merge tag 'drm-misc-fixes-2019-06-26' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes virtio- Don't call drm_connector_update_edid_property() while holding spinlock Cc: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com> From: Sean Paul <sean@poorly.run> Link: https://patchwork.freedesktop.org/patch/msgid/20190626205615.GA123489@art_vandelay
2019-06-26block, bfq: Init saved_wr_start_at_switch_to_srt in unlikely caseDouglas Anderson
Some debug code suggested by Paolo was tripping when I did reboot stress tests. Specifically in bfq_bfqq_resume_state() "bic->saved_wr_start_at_switch_to_srt" was later than the current value of "jiffies". A bit of debugging showed that "bic->saved_wr_start_at_switch_to_srt" was actually 0 and a bit more debugging showed that was because we had run through the "unlikely" case in the bfq_bfqq_save_state() function. Let's init "saved_wr_start_at_switch_to_srt" in the unlikely case to something sane. NOTE: this fixes no known real-world errors. Reviewed-by: Paolo Valente <paolo.valente@linaro.org> Reviewed-by: Guenter Roeck <groeck@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-26riscv: mm: Fix code commentShihPo Hung
Fix the comment since vmalloc_fault doesn't reach flush_tlb_fix_spurious_fault. Signed-off-by: ShihPo Hung <shihpo.hung@sifive.com> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: linux-riscv@lists.infradead.org Reviewed-by: Palmer Dabbelt <palmer@sifive.com> Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-06-26dt-bindings: clock: sifive: add MIT license as an option for the header filePaul Walmsley
At Bin Meng's request, add the MIT license as an option for the SiFive FU540 PRCI header file. Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Cc: Bin Meng <bmeng.cn@gmail.com>
2019-06-26PCI: PM: Avoid skipping bus-level PM on platforms without ACPIRafael J. Wysocki
There are platforms that do not call pm_set_suspend_via_firmware(), so pm_suspend_via_firmware() returns 'false' on them, but the power states of PCI devices (PCIe ports in particular) are changed as a result of powering down core platform components during system-wide suspend. Thus the pm_suspend_via_firmware() checks in pci_pm_suspend_noirq() and pci_pm_resume_noirq() introduced by commit 3e26c5feed2a ("PCI: PM: Skip devices in D0 for suspend-to- idle") are not sufficient to determine that devices left in D0 during suspend will remain in D0 during resume and so the bus-level power management can be skipped for them. For this reason, introduce a new global suspend flag, PM_SUSPEND_FLAG_NO_PLATFORM, set it for suspend-to-idle only and replace the pm_suspend_via_firmware() checks mentioned above with checks against this flag. Fixes: 3e26c5feed2a ("PCI: PM: Skip devices in D0 for suspend-to-idle") Reported-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2019-06-26Merge branch 'ipv6-fix-neighbour-resolution-with-raw-socket'David S. Miller
Nicolas Dichtel says: ==================== ipv6: fix neighbour resolution with raw socket The first patch prepares the fix, it constify rt6_nexthop(). The detail of the bug is explained in the second patch. v1 -> v2: - fix compilation warnings - split the initial patch ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-26ipv6: fix neighbour resolution with raw socketNicolas Dichtel
The scenario is the following: the user uses a raw socket to send an ipv6 packet, destinated to a not-connected network, and specify a connected nh. Here is the corresponding python script to reproduce this scenario: import socket IPPROTO_RAW = 255 send_s = socket.socket(socket.AF_INET6, socket.SOCK_RAW, IPPROTO_RAW) # scapy # p = IPv6(src='fd00:100::1', dst='fd00:200::fa')/ICMPv6EchoRequest() # str(p) req = b'`\x00\x00\x00\x00\x08:@\xfd\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\xfd\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xfa\x80\x00\x81\xc0\x00\x00\x00\x00' send_s.sendto(req, ('fd00:175::2', 0, 0, 0)) fd00:175::/64 is a connected route and fd00:200::fa is not a connected host. With this scenario, the kernel starts by sending a NS to resolve fd00:175::2. When it receives the NA, it flushes its queue and try to send the initial packet. But instead of sending it, it sends another NS to resolve fd00:200::fa, which obvioulsy fails, thus the packet is dropped. If the user sends again the packet, it now uses the right nh (fd00:175::2). The problem is that ip6_dst_lookup_neigh() uses the rt6i_gateway, which is :: because the associated route is a connected route, thus it uses the dst addr of the packet. Let's use rt6_nexthop() to choose the right nh. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-26ipv6: constify rt6_nexthop()Nicolas Dichtel
There is no functional change in this patch, it only prepares the next one. rt6_nexthop() will be used by ip6_dst_lookup_neigh(), which uses const variables. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reported-by: kbuild test robot <lkp@intel.com> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-26net: dsa: microchip: Use gpiod_set_value_cansleep()Marek Vasut
Replace gpiod_set_value() with gpiod_set_value_cansleep(), as the switch reset GPIO can be connected to e.g. I2C GPIO expander and it is perfectly fine for the kernel to sleep for a bit in ksz_switch_register(). Signed-off-by: Marek Vasut <marex@denx.de> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Tristram Ha <Tristram.Ha@microchip.com> Cc: Woojung Huh <Woojung.Huh@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-26net: aquantia: fix vlans not working over bridged networkDmitry Bogdanov
In configuration of vlan over bridge over aquantia device it was found that vlan tagged traffic is dropped on chip. The reason is that bridge device enables promisc mode, but in atlantic chip vlan filters will still apply. So we have to corellate promisc settings with vlan configuration. The solution is to track in a separate state variable the need of vlan forced promisc. And also consider generic promisc configuration when doing vlan filter config. Fixes: 7975d2aff5af ("net: aquantia: add support of rx-vlan-filter offload") Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-26keys: Network namespace domain tagDavid Howells
Create key domain tags for network namespaces and make it possible to automatically tag keys that are used by networked services (e.g. AF_RXRPC, AFS, DNS) with the default network namespace if not set by the caller. This allows keys with the same description but in different namespaces to coexist within a keyring. Signed-off-by: David Howells <dhowells@redhat.com> cc: netdev@vger.kernel.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: linux-afs@lists.infradead.org
2019-06-26keys: Garbage collect keys for which the domain has been removedDavid Howells
If a key operation domain (such as a network namespace) has been removed then attempt to garbage collect all the keys that use it. Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-26keys: Include target namespace in match criteriaDavid Howells
Currently a key has a standard matching criteria of { type, description } and this is used to only allow keys with unique criteria in a keyring. This means, however, that you cannot have keys with the same type and description but a different target namespace in the same keyring. This is a potential problem for a containerised environment where, say, a container is made up of some parts of its mount space involving netfs superblocks from two different network namespaces. This is also a problem for shared system management keyrings such as the DNS records keyring or the NFS idmapper keyring that might contain keys from different network namespaces. Fix this by including a namespace component in a key's matching criteria. Keyring types are marked to indicate which, if any, namespace is relevant to keys of that type, and that namespace is set when the key is created from the current task's namespace set. The capability bit KEYCTL_CAPS1_NS_KEY_TAG is set if the kernel is employing this feature. Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-26keys: Move the user and user-session keyrings to the user_namespaceDavid Howells
Move the user and user-session keyrings to the user_namespace struct rather than pinning them from the user_struct struct. This prevents these keyrings from propagating across user-namespaces boundaries with regard to the KEY_SPEC_* flags, thereby making them more useful in a containerised environment. The issue is that a single user_struct may be represent UIDs in several different namespaces. The way the patch does this is by attaching a 'register keyring' in each user_namespace and then sticking the user and user-session keyrings into that. It can then be searched to retrieve them. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jann Horn <jannh@google.com>
2019-06-26keys: Namespace keyring namesDavid Howells
Keyring names are held in a single global list that any process can pick from by means of keyctl_join_session_keyring (provided the keyring grants Search permission). This isn't very container friendly, however. Make the following changes: (1) Make default session, process and thread keyring names begin with a '.' instead of '_'. (2) Keyrings whose names begin with a '.' aren't added to the list. Such keyrings are system specials. (3) Replace the global list with per-user_namespace lists. A keyring adds its name to the list for the user_namespace that it is currently in. (4) When a user_namespace is deleted, it just removes itself from the keyring name list. The global keyring_name_lock is retained for accessing the name lists. This allows (4) to work. This can be tested by: # keyctl newring foo @s 995906392 # unshare -U $ keyctl show ... 995906392 --alswrv 65534 65534 \_ keyring: foo ... $ keyctl session foo Joined session keyring: 935622349 As can be seen, a new session keyring was created. The capability bit KEYCTL_CAPS1_NS_KEYRING_NAME is set if the kernel is employing this feature. Signed-off-by: David Howells <dhowells@redhat.com> cc: Eric W. Biederman <ebiederm@xmission.com>
2019-06-26keys: Add a 'recurse' flag for keyring searchesDavid Howells
Add a 'recurse' flag for keyring searches so that the flag can be omitted and recursion disabled, thereby allowing just the nominated keyring to be searched and none of the children. Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-26keys: Cache the hash value to avoid lots of recalculationDavid Howells
Cache the hash of the key's type and description in the index key so that we're not recalculating it every time we look at a key during a search. The hash function does a bunch of multiplications, so evading those is probably worthwhile - especially as this is done for every key examined during a search. This also allows the methods used by assoc_array to get chunks of index-key to be simplified. Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-26keys: Simplify key description managementDavid Howells
Simplify key description management by cramming the word containing the length with the first few chars of the description also. This simplifies the code that generates the index-key used by assoc_array. It should speed up key searching a bit too. Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-26keys: Kill off request_key_async{,_with_auxdata}David Howells
Kill off request_key_async{,_with_auxdata}() as they're not currently used. Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-26Merge branch 'md-next' of https://github.com/liu-song-6/linux into for-5.3/blockJens Axboe
Pull single MD warning fix from Song. * 'md-next' of https://github.com/liu-song-6/linux: md/raid1: Fix a warning message in remove_wb()
2019-06-26ipv4: reset rt_iif for recirculated mcast/bcast out pktsStephen Suryaputra
Multicast or broadcast egress packets have rt_iif set to the oif. These packets might be recirculated back as input and lookup to the raw sockets may fail because they are bound to the incoming interface (skb_iif). If rt_iif is not zero, during the lookup, inet_iif() function returns rt_iif instead of skb_iif. Hence, the lookup fails. v2: Make it non vrf specific (David Ahern). Reword the changelog to reflect it. Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-26md/raid1: Fix a warning message in remove_wb()Dan Carpenter
The WARN_ON() macro doesn't take an error message, it just takes a condition. I've changed this to use WARN(1, "...") instead. Fixes: 3e148a320979 ("md/raid1: fix potential data inconsistency issue with write behind device") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Song Liu <songliubraving@fb.com>
2019-06-26dt-bindings: riscv: resolve 'make dt_binding_check' warningsPaul Walmsley
Rob pointed out that one of the examples in the RISC-V 'cpus' YAML schema results in warnings from 'make dt_binding_check'. Fix these. While here, make the whitespace in the second example consistent with the first example. Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Cc: Rob Herring <robh@kernel.org> Reviewed-by: Rob Herring <robh@kernel.org> # for fixing the dtc warnings
2019-06-26riscv: dts: Re-organize the DT nodesYash Shah
As per the convention for any SOC device with external connection, define only device DT node in SOC DTSi file with status = "disabled" and enable device in Board DTS file with status = "okay" Reported-by: Anup Patel <anup@brainfault.org> Signed-off-by: Yash Shah <yash.shah@sifive.com> Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-06-26RISC-V: defconfig: enable MMC & SPI for RISC-VAtish Patra
Currently, riscv upstream defconfig doesn't let you boot through userspace if rootfs is on the SD card. Let's enable MMC & SPI drivers as well so that one can boot to the user space using default config in upstream kernel. While here, enable automatic mounting of devtmpfs to simplify kernel testing with minimal root filesystems. (pjw) Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Palmer Dabbelt <palmer@sifive.com> [paul.walmsley@sifive.com: mention the DEVTMPFS_MOUNT change in the patch description] Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-06-26team: Always enable vlan tx offloadYueHaibing
We should rather have vlan_tci filled all the way down to the transmitting netdevice and let it do the hw/sw vlan implementation. Suggested-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-26Merge branch 'smc-fixes'David S. Miller
Ursula Braun says: ==================== net/smc: fixes 2019-06-26 here are 2 small smc fixes for the net tree. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-26net/smc: Fix error path in smc_initYueHaibing
If register_pernet_subsys success in smc_init, we should cleanup it in case any other error. Fixes: 64e28b52c7a6 (net/smc: add pnet table namespace support") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-26net/smc: hold conns_lock before calling smc_lgr_register_conn()Huaping Zhou
After smc_lgr_create(), the newly created link group is added to smc_lgr_list, thus is accessible from other context. Although link group creation is serialized by smc_create_lgr_pending, the new link group may still be accessed concurrently. For example, if ib_device is no longer active, smc_ib_port_event_work() will call smc_port_terminate(), which in turn will call __smc_lgr_terminate() on every link group of this device. So conns_lock is required here. Signed-off-by: Huaping Zhou <zhp@smail.nju.edu.cn> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>