summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-09-10iocost_monitor: Report more info with higher accuracyTejun Heo
When outputting json: * Don't truncate numbers. * Report address of iocg to ease drilling down further. When outputting table: * Use math.ceil() for delay_ms so that small delays don't read as 0. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10iocost_monitor: Always use strings for json valuesTejun Heo
Json has limited accuracy for numbers and can silently truncate 64bit values, which can be extremely confusing. Let's consistently use string encapsulated values for json output. While at it, convert an unnecesary f-string to str(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10blk-iocost: Don't let merges push vtime into the futureTejun Heo
Merges have the same problem that forced-bios had which is fixed by the previous patch. The cost of a merge is calculated at the time of issue and force-advances vtime into the future. Until global vtime catches up, how the cgroup's hweight changes in the meantime doesn't matter and it often leads to situations where the cost is calculated at one hweight and paid at a very different one. See the previous patch for more details. Fix it by never advancing vtime into the future for merges. If budget is available, vtime is advanced. Otherwise, the cost is charged as debt. This brings merge cost handling in line with issue cost handling in ioc_rqos_throttle(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10blk-iocost: Account force-charged overage in absolute vtimeTejun Heo
Currently, when a bio needs to be force-charged and there isn't enough budget, vtime is simply pushed into the future. This means that the cost of the whole bio is scaled using the current hweight and then charged immediately. Until the global vtime advances beyond this future vtime, the cgroup won't be allowed to issue normal IOs. This is incorrect and can lead to, for example, exploding vrate or extended stalls if vrate range is constrained. Consider the following scenario. 1. A cgroup with a very low hweight runs out of budget. 2. A storm of swap-out happens on it. All of them are scaled according to the current low hweight and charged to vtime pushing it to a far future. 3. All other cgroups go idle and now the above cgroup has access to the whole device. However, because vtime is already wound using the past low hweight, what its current hweight is doesn't matter until global vtime catches up to the local vtime. 4. As a result, either vrate gets ramped up extremely or the IOs stall while the underlying device is idle. This is because the hweight the overage is calculated at is different from the hweight that it's being paid at. Fix it by remembering the overage in absoulte vtime and continuously paying with the actual budget according to the current hweight at each period. Note that non-forced bios which wait already remembers the cost in absolute vtime. This brings forced-bio accounting in line. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10blk-iocost: Fix incorrect operation order during iocg freeTejun Heo
ioc_pd_free() first cancels the hrtimers and then deactivates the iocg. However, the iocg timer can run inbetween and reschedule the hrtimers which will end up running after the iocg is freed leading to crashes like the following. general protection fault: 0000 [#1] SMP ... RIP: 0010:iocg_kick_delay+0xbe/0x1b0 RSP: 0018:ffffc90003598ea0 EFLAGS: 00010046 RAX: 1cee00fd69512b54 RBX: ffff8881bba48400 RCX: 00000000000003e8 RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8881bba48400 RBP: 0000000000004e20 R08: 0000000000000002 R09: 00000000000003e8 R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90003598ef0 R13: 00979f3810ad461f R14: ffff8881bba4b400 R15: 25439f950d26e1d1 FS: 0000000000000000(0000) GS:ffff88885f800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f64328c7e40 CR3: 0000000002409005 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> iocg_delay_timer_fn+0x3d/0x60 __hrtimer_run_queues+0xfe/0x270 hrtimer_interrupt+0xf4/0x210 smp_apic_timer_interrupt+0x5e/0x120 apic_timer_interrupt+0xf/0x20 </IRQ> Fix it by canceling hrtimers after deactivating the iocg. Fixes: 7caa47151ab2 ("blkcg: implement blk-iocost") Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10sctp: fix the missing put_user when dumping transport thresholdsXin Long
This issue causes SCTP_PEER_ADDR_THLDS sockopt not to be able to dump a transport thresholds info. Fix it by adding 'goto' put_user in sctp_getsockopt_paddr_thresholds. Fixes: 8add543e369d ("sctp: add SCTP_FUTURE_ASSOC for SCTP_PEER_ADDR_THLDS sockopt") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10sch_hhf: ensure quantum and hhf_non_hh_weight are non-zeroCong Wang
In case of TCA_HHF_NON_HH_WEIGHT or TCA_HHF_QUANTUM is zero, it would make no progress inside the loop in hhf_dequeue() thus kernel would get stuck. Fix this by checking this corner case in hhf_change(). Fixes: 10239edf86f1 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc") Reported-by: syzbot+bc6297c11f19ee807dc2@syzkaller.appspotmail.com Reported-by: syzbot+041483004a7f45f1f20a@syzkaller.appspotmail.com Reported-by: syzbot+55be5f513bed37fc4367@syzkaller.appspotmail.com Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Terry Lam <vtlam@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10net_sched: check cops->tcf_block in tc_bind_tclass()Cong Wang
At least sch_red and sch_tbf don't implement ->tcf_block() while still have a non-zero tc "class". Instead of adding nop implementations to each of such qdisc's, we can just relax the check of cops->tcf_block() in tc_bind_tclass(). They don't support TC filter anyway. Reported-by: syzbot+21b29db13c065852f64b@syzkaller.appspotmail.com Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10KVM: x86: Add kvm_emulate_{rd,wr}msr() to consolidate VXM/SVM codeSean Christopherson
Move RDMSR and WRMSR emulation into common x86 code to consolidate nearly identical SVM and VMX code. Note, consolidating RDMSR introduces an extra indirect call, i.e. retpoline, due to reaching {svm,vmx}_get_msr() via kvm_x86_ops, but a guest kernel likely has bigger problems if increasing the latency of RDMSR VM-Exits by ~70 cycles has a measurable impact on overall VM performance. E.g. the only recurring RDMSR VM-Exits (after booting) on my system running Linux 5.2 in the guest are for MSR_IA32_TSC_ADJUST via arch_cpu_idle_enter(). No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10KVM: x86: Refactor up kvm_{g,s}et_msr() to simplify callersSean Christopherson
Refactor the top-level MSR accessors to take/return the index and value directly instead of requiring the caller to dump them into a msr_data struct. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10doc: kvm: Fix return description of KVM_SET_MSRSXiaoyao Li
Userspace can use ioctl KVM_SET_MSRS to update a set of MSRs of guest. This ioctl set specified MSRs one by one. If it fails to set an MSR, e.g., due to setting reserved bits, the MSR is not supported/emulated by KVM, etc..., it stops processing the MSR list and returns the number of MSRs have been set successfully. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10KVM: X86: Tune PLE Window tracepointPeter Xu
The PLE window tracepoint triggers even if the window is not changed, and the wording can be a bit confusing too. One example line: kvm_ple_window: vcpu 0: ple_window 4096 (shrink 4096) It easily let people think of "the window now is 4096 which is shrinked", but the truth is the value actually didn't change (4096). Let's only dump this message if the value really changed, and we make the message even simpler like: kvm_ple_window: vcpu 4 old 4096 new 8192 (growed) Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10KVM: VMX: Change ple_window type to unsigned intPeter Xu
The VMX ple_window is 32 bits wide, so logically it can overflow with an int. The module parameter is declared as unsigned int which is good, however the dynamic variable is not. Switching all the ple_window references to use unsigned int. The tracepoint changes will also affect SVM, but SVM is using an even smaller width (16 bits) so it's always fine. Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10KVM: X86: Remove tailing newline for tracepointsPeter Xu
It's done by TP_printk() already. Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10KVM: X86: Trace vcpu_id for vmexitPeter Xu
Tracing the ID helps to pair vmenters and vmexits for guests with multiple vCPUs. Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10Merge tag 'kvmarm-5.4' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm updates for 5.4 - New ITS translation cache - Allow up to 512 CPUs to be supported with GICv3 (for real this time) - Now call kvm_arch_vcpu_blocking early in the blocking sequence - Tidy-up device mappings in S2 when DIC is available - Clean icache invalidation on VMID rollover - General cleanup
2019-09-10Merge branch 'nfp-implement-firmware-loading-policy'David S. Miller
Simon Horman says: ==================== nfp: implement firmware loading policy Dirk says: This series adds configuration capabilities to the firmware loading policy of the NFP driver. NFP firmware loading is controlled via three HWinfo keys which can be set per device: 'abi_drv_reset', 'abi_drv_load_ifc' and 'app_fw_from_flash'. Refer to patch #11 for more detail on how these control the firmware loading. In order to configure the full extend of FW loading policy, a new devlink parameter has been introduced, 'reset_dev_on_drv_probe', which controls if the driver should reset the device when it's probed. This, in conjunction with the existing 'fw_load_policy' (extended to include a 'disk' option) provides the means to tweak the NFP HWinfo keys as required by users. Patches 1 and 2 adds the devlink modifications and patches 3 through 9 adds the support into the NFP driver. Furthermore, the last 2 patches are documentation only. v2: Renamed all 'reset_dev_on_drv_probe' defines the same as the devlink parameter name (Jiri) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10Documentation: nfp: add nfp driver specific notesDirk van der Merwe
This adds the initial documentation for the NFP driver specific documentation. Right now, only basic information is provided about acquiring firmware and configuring device firmware loading. Original driver documentation can be found here: https://github.com/Netronome/nfp-drv-kmods/blob/master/README.md Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10kdoc: fix nfp_fw_load documentationDirk van der Merwe
Fixed the incorrect prefix for the 'nfp_fw_load' function. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10nfp: devlink: add 'reset_dev_on_drv_probe' supportDirk van der Merwe
Add support for the 'reset_dev_on_drv_probe' devlink parameter. The reset control policy is controlled by the 'abi_drv_reset' hwinfo key. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10nfp: devlink: add 'fw_load_policy' supportDirk van der Merwe
Add support for the 'fw_load_policy' devlink parameter. The FW load policy is controlled by the 'app_fw_from_flash' hwinfo key. Remap the values from devlink to the hwinfo key and back. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10nfp: add devlink param infrastructureDirk van der Merwe
Register devlink parameters for driver use. Subsequent patches will add support for specific parameters. In order to support devlink parameters, the management firmware needs to be able to lookup and set hwinfo keys. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10nfp: honor FW reset and loading policiesDirk van der Merwe
The firmware reset and loading policies can be controlled with the combination of three hwinfo keys, 'abi_drv_reset', 'abi_drv_load_ifc' and 'app_fw_from_flash'. 'app_fw_from_flash' defines which firmware should take precedence, 'Disk', 'Flash' or the 'Preferred' firmware. When 'Preferred' is selected, the management firmware makes the decision on which firmware will be loaded by comparing versions of the flash firmware and the host supplied firmware. 'abi_drv_reset' defines when the driver should reset the firmware when the driver is probed, either 'Disk' if firmware was found on disk, 'Always' reset or 'Never' reset. Note that the device is always reset on driver unload if firmware was loaded when the driver was probed. 'abi_drv_load_ifc' defines a list of PF devices allowed to load FW on the device. Furthermore, we limit the cases to where the driver will unload firmware again when the driver is removed to only when firmware was loaded by the driver and only if this particular device was the only one that could have loaded firmware. This is needed to avoid firmware being removed while in use on multi-host platforms. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10nfp: nsp: add support for hwinfo set operationDirk van der Merwe
Add support for the NSP HWinfo set command. This closely follows the HWinfo lookup command. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10nfp: nsp: add support for optional hwinfo lookupDirk van der Merwe
There are cases where we want to read a hwinfo entry from the NFP, and if it doesn't exist, use a default value instead. To support this, we must silence warning/error messages when the hwinfo entry doesn't exist since this is a valid use case. The NSP command structure provides the ability to silence command errors, in which case the caller should log any command errors appropriately. Protocol errors are unaffected by this. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10nfp: nsp: add support for fw_loaded commandDirk van der Merwe
Add support for the simple command that indicates whether application firmware is loaded. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10devlink: add 'reset_dev_on_drv_probe' paramDirk van der Merwe
Add the 'reset_dev_on_drv_probe' devlink parameter, controlling the device reset policy on driver probe. This parameter is useful in conjunction with the existing 'fw_load_policy' parameter. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10devlink: extend 'fw_load_policy' valuesDirk van der Merwe
Add the 'disk' value to the generic 'fw_load_policy' devlink parameter. This value indicates that firmware should always be loaded from disk only. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10Merge branch 'net-dsa-mv88e6xxx-add-PCL-support'David S. Miller
Vivien Didelot says: ==================== net: dsa: mv88e6xxx: add PCL support This small series implements the ethtool RXNFC operations in the mv88e6xxx DSA driver to configure a port's Layer 2 Policy Control List (PCL) supported by models such as 88E6352 and 88E6390 and equivalent. This allows to configure a port to discard frames based on a configured destination or source MAC address and an optional VLAN, with e.g.: # ethtool --config-nfc lan1 flow-type ether src 00:11:22:33:44:55 action -1 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10net: dsa: mv88e6xxx: add RXNFC supportVivien Didelot
Implement the .get_rxnfc and .set_rxnfc DSA operations to configure a port's Layer 2 Policy Control List (PCL) via ethtool. Currently only dropping frames based on MAC Destination or Source Address (including the option VLAN parameter) is supported. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10net: dsa: mv88e6xxx: introduce .port_set_policyVivien Didelot
Introduce a new .port_set_policy operation to configure a port's Policy Control List, based on mapping such as DA, SA, Etype and so on. Models similar to 88E6352 and 88E6390 are supported at the moment. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10net: dsa: mv88e6xxx: complete ATU state definitionsVivien Didelot
Marvell has different values for the state of a MAC address, depending on its multicast bit. This patch completes the definitions for these states. At the same time, use 0 which is intuitive enough and simplifies the code a bit, instead of the UC or MC unused value. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-10io_uring: limit parallelism of buffered writesJens Axboe
All the popular filesystems need to grab the inode lock for buffered writes. With io_uring punting buffered writes to async context, we observe a lot of contention with all workers hamming this mutex. For buffered writes, we generally don't need a lot of parallelism on the submission side, as the flushing will take care of that for us. Hence we don't need a deep queue on the write side, as long as we can safely punt from the original submission context. Add a workqueue with a limit of 2 that we can use for buffered writes. This greatly improves the performance and efficiency of higher queue depth buffered async writes with io_uring. Reported-by: Andres Freund <andres@anarazel.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10io_uring: add io_queue_async_work() helperJens Axboe
Add a helper for queueing a request for async execution, in preparation for optimizing it. No functional change in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10waitid: Add support for waiting for the current process groupEric W. Biederman
It was recently discovered that the linux version of waitid is not a superset of the other wait functions because it does not include support for waiting for the current process group. This has two downsides: 1. An extra system call is needed to get the current process group. 2. After the current process group is received and before it is passed to waitid a signal could arrive causing the current process group to change. Inherent race-conditions as these make it impossible for userspace to emulate this functionaly and thus violate async-signal safety requirements for waitpid. Arguments can be made for using a different choice of idtype and id for this case but the BSDs already use this P_PGID and 0 to indicate waiting for the current process's process group. So be nice to user space programmers and don't introduce an unnecessary incompatibility. Some people have noted that the posix description is that waitpid will wait for the current process group, and that in the presence of pthreads that process group can change. To get clarity on this issue I looked at XNU, FreeBSD, and Luminos. All of those flavors of unix waited for the current process group at the time of call and as written could not adapt to the process group changing after the call. At one point Linux did adapt to the current process group changing but that stopped in 161550d74c07 ("pid: sys_wait... fixes"). It has been over 11 years since Linux has that behavior, no programs that fail with the change in behavior have been reported, and I could not find any other unix that does this. So I think it is safe to clarify the definition of current process group, to current process group at the time of the wait function. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Rich Felker <dalias@libc.org> Cc: Alistair Francis <alistair23@gmail.com> Cc: Zong Li <zongbox@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Florian Weimer <fweimer@redhat.com> Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org> Cc: GNU C Library <libc-alpha@sourceware.org> Link: https://lore.kernel.org/r/20190814154400.6371-2-christian.brauner@ubuntu.com
2019-09-10Merge tag 'kvm-ppc-next-5.4-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD PPC KVM update for 5.4 - Some prep for extending the uses of the rmap array - Various minor fixes - Commits from the powerpc topic/ppc-kvm branch, which fix a problem with interrupts arriving after free_irq, causing host hangs and crashes.
2019-09-10export.h: remove defined(__KERNEL__), which is no longer neededMasahiro Yamada
The conditional, define(__KERNEL__), was added by commit f235541699bc ("export.h: allow for per-symbol configurable EXPORT_SYMBOL()"). It was needed at that time to avoid the build error of modpost with CONFIG_TRIM_UNUSED_KSYMS=y. Since commit b2c5cdcfd4bc ("modpost: remove symbol prefix support"), modpost no longer includes linux/export.h, thus the define(__KERNEL__) is unneeded. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Nicolas Pitre <nico@fluxnic.net>
2019-09-10KVM: x86: Manually calculate reserved bits when loading PDPTRSSean Christopherson
Manually generate the PDPTR reserved bit mask when explicitly loading PDPTRs. The reserved bits that are being tracked by the MMU reflect the current paging mode, which is unlikely to be PAE paging in the vast majority of flows that use load_pdptrs(), e.g. CR0 and CR4 emulation, __set_sregs(), etc... This can cause KVM to incorrectly signal a bad PDPTR, or more likely, miss a reserved bit check and subsequently fail a VM-Enter due to a bad VMCS.GUEST_PDPTR. Add a one off helper to generate the reserved bits instead of sharing code across the MMU's calculations and the PDPTR emulation. The PDPTR reserved bits are basically set in stone, and pushing a helper into the MMU's calculation adds unnecessary complexity without improving readability. Oppurtunistically fix/update the comment for load_pdptrs(). Note, the buggy commit also introduced a deliberate functional change, "Also remove bit 5-6 from rsvd_bits_mask per latest SDM.", which was effectively (and correctly) reverted by commit cd9ae5fe47df ("KVM: x86: Fix page-tables reserved bits"). A bit of SDM archaeology shows that the SDM from late 2008 had a bug (likely a copy+paste error) where it listed bits 6:5 as AVL and A for PDPTEs used for 4k entries but reserved for 2mb entries. I.e. the SDM contradicted itself, and bits 6:5 are and always have been reserved. Fixes: 20c466b56168d ("KVM: Use rsvd_bits_mask in load_pdptrs()") Cc: stable@vger.kernel.org Cc: Nadav Amit <nadav.amit@gmail.com> Reported-by: Doug Reiland <doug.reiland@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10KVM: x86: Disable posted interrupts for non-standard IRQs delivery modesAlexander Graf
We can easily route hardware interrupts directly into VM context when they target the "Fixed" or "LowPriority" delivery modes. However, on modes such as "SMI" or "Init", we need to go via KVM code to actually put the vCPU into a different mode of operation, so we can not post the interrupt Add code in the VMX and SVM PI logic to explicitly refuse to establish posted mappings for advanced IRQ deliver modes. This reflects the logic in __apic_accept_irq() which also only ever passes Fixed and LowPriority interrupts as posted interrupts into the guest. This fixes a bug I have with code which configures real hardware to inject virtual SMIs into my guest. Signed-off-by: Alexander Graf <graf@amazon.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Wanpeng Li <wanpengli@tencent.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-10ARM: 8906/1: drivers/amba: add reset control to amba bus probeDINH L NGUYEN
The primecell controller on some SoCs, i.e. SoCFPGA, is held in reset by default. Until recently, the DMA controller was brought out of reset by the bootloader(i.e. U-Boot). But a recent change in U-Boot, the peripherals that are not used are held in reset and are left to Linux to bring them out of reset. Add a mechanism for getting the reset property and de-assert the primecell module from reset if found. This is a not a hard fail if the reset property is not present in the device tree node, so the driver will continue to probe. Because there are different variants of the controller that may have multiple reset signals, the code will find all reset(s) specified and de-assert them. Signed-off-by: Dinh Nguyen <dinguyen@kernel.org> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2019-09-10ARM: 8905/1: Emit __gnu_mcount_nc when using Clang 10.0.0 or newerNathan Chancellor
Currently, multi_v7_defconfig + CONFIG_FUNCTION_TRACER fails to build with clang: arm-linux-gnueabi-ld: kernel/softirq.o: in function `_local_bh_enable': softirq.c:(.text+0x504): undefined reference to `mcount' arm-linux-gnueabi-ld: kernel/softirq.o: in function `__local_bh_enable_ip': softirq.c:(.text+0x58c): undefined reference to `mcount' arm-linux-gnueabi-ld: kernel/softirq.o: in function `do_softirq': softirq.c:(.text+0x6c8): undefined reference to `mcount' arm-linux-gnueabi-ld: kernel/softirq.o: in function `irq_enter': softirq.c:(.text+0x75c): undefined reference to `mcount' arm-linux-gnueabi-ld: kernel/softirq.o: in function `irq_exit': softirq.c:(.text+0x840): undefined reference to `mcount' arm-linux-gnueabi-ld: kernel/softirq.o:softirq.c:(.text+0xa50): more undefined references to `mcount' follow clang can emit a working mcount symbol, __gnu_mcount_nc, when '-meabi gnu' is passed to it. Until r369147 in LLVM, this was broken and caused the kernel not to boot with '-pg' because the calling convention was not correct. Always build with '-meabi gnu' when using clang but ensure that '-pg' (which is added with CONFIG_FUNCTION_TRACER and its prereq CONFIG_HAVE_FUNCTION_TRACER) cannot be added with it unless this is fixed (which means using clang 10.0.0 and newer). Link: https://github.com/ClangBuiltLinux/linux/issues/35 Link: https://bugs.llvm.org/show_bug.cgi?id=33845 Link: https://github.com/llvm/llvm-project/commit/16fa8b09702378bacfa3d07081afe6b353b99e60 Reviewed-by: Matthias Kaehlcke <mka@chromium.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Stefan Agner <stefan@agner.ch> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2019-09-10ARM: 8904/1: skip nomap memblocks while finding the lowmem/highmem boundaryChester Lin
adjust_lowmem_bounds() checks every memblocks in order to find the boundary between lowmem and highmem. However some memblocks could be marked as NOMAP so they are not used by kernel, which should be skipped while calculating the boundary. Signed-off-by: Chester Lin <clin@suse.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2019-09-10io_uring: optimize submit_and_wait APIJens Axboe
For some applications that end up using a submit-and-wait type of approach for certain batches of IO, we can make that a bit more efficient by allowing the application to block for the last IO submission. This prevents an async when we don't need it, as the application will be blocking for the completion event(s) anyway. Typical use cases are using the liburing io_uring_submit_and_wait() API, or just using io_uring_enter() doing both submissions and completions. As a specific example, RocksDB doing MultiGet() is sped up quite a bit with this change. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10ALSA: firewire-tascam: check intermediate state of clock status and retryTakashi Sakamoto
2 bytes in MSB of register for clock status is zero during intermediate state after changing status of sampling clock in models of TASCAM FireWire series. The duration of this state differs depending on cases. During the state, it's better to retry reading the register for current status of the clock. In current implementation, the intermediate state is checked only when getting current sampling transmission frequency, then retry reading. This care is required for the other operations to read the register. This commit moves the codes of check and retry into helper function commonly used for operations to read the register. Fixes: e453df44f0d6 ("ALSA: firewire-tascam: add PCM functionality") Cc: <stable@vger.kernel.org> # v4.4+ Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Link: https://lore.kernel.org/r/20190910135152.29800-3-o-takashi@sakamocchi.jp Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-09-10ALSA: firewire-tascam: handle error code when getting current source of clockTakashi Sakamoto
The return value of snd_tscm_stream_get_clock() is ignored. This commit checks the value and handle error. Fixes: e453df44f0d6 ("ALSA: firewire-tascam: add PCM functionality") Cc: <stable@vger.kernel.org> # v4.4+ Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Link: https://lore.kernel.org/r/20190910135152.29800-2-o-takashi@sakamocchi.jp Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-09-10rtc: meson: mark PM functions as __maybe_unusedArnd Bergmann
The meson_vrtc_set_wakeup_time() function is only used by the PM functions and causes a warning when they are disabled: drivers/rtc/rtc-meson-vrtc.c:32:13: error: unused function 'meson_vrtc_set_wakeup_time' [-Werror,-Wunused-function] Remove the #ifdef around the callers and add a __maybe_unused annotation as a more reliable way to avoid these warnings. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Kevin Hilman <khilman@baylibre.com> Link: https://lore.kernel.org/r/20190906152438.1533833-1-arnd@arndb.de Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2019-09-10rtc: sc27xx: Remove clearing SPRD_RTC_POWEROFF_ALM_FLAG flagBaolin Wang
The SPRD_RTC_POWEROFF_ALM_FLAG flag is used to indicate if a poweroff alarm is set, which can power on the system when system in power-off status. And the bootloader will validate this flag to check if the booting mode is alarm booting mode, thus we should not clear this flag in kernel, instead bootloader will clear this flag after checking the booting mode. Signed-off-by: Baolin Wang <baolin.wang@linaro.org> Link: https://lore.kernel.org/r/1f75310242de75b14d8973538bf96efffb395daf.1567666894.git.baolin.wang@linaro.org Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2019-09-10ath9k: release allocated buffer if timed outNavid Emamdoost
In ath9k_wmi_cmd, the allocated network buffer needs to be released if timeout happens. Otherwise memory will be leaked. Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-09-10ath9k_htc: release allocated buffer if timed outNavid Emamdoost
In htc_config_pipe_credits, htc_setup_complete, and htc_connect_service if time out happens, the allocated buffer needs to be released. Otherwise there will be memory leak. Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-09-10ath9k: Remove unneeded variable to store return valuezhong jiang
ath9k_reg_rmw_single do not need return value to cope with different cases. And change functon return type to void. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>