summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2019-04-16i2c: core: introduce callbacks for atomic transfersWolfram Sang
We had the request to access devices very late when interrupts are not available anymore multiple times now. Mostly to prepare shutdown or reboot. Allow adapters to specify a specific callback for this case. Note that we fall back to the generic {master|smbus}_xfer callback if this new atomic one is not present. This is intentional to preserve the previous behaviour and avoid regressions. Because there are drivers not using interrupts or because it might have worked "accidently" before. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by Andy Shevchenko <andriy.shevchenko@linux.intel.com> Tested-by: Stefan Lengfeld <contact@stefanchrist.eu> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2019-04-16perf/core: Add perf_pmu_resched() as global functionStephane Eranian
This patch add perf_pmu_resched() a global function that can be called to force rescheduling of events for a given PMU. The function locks both cpuctx and task_ctx internally. This will be used by a subsequent patch. Signed-off-by: Stephane Eranian <eranian@google.com> [ Simplified the calling convention. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: kan.liang@intel.com Cc: nelson.dsouza@intel.com Cc: tonyj@suse.com Link: https://lkml.kernel.org/r/20190408173252.37932-2-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-16Merge branch 'linus' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-16x86/reboot, efi: Use EFI reboot for Acer TravelMate X514-51TJian-Hong Pan
Upon reboot, the Acer TravelMate X514-51T laptop appears to complete the shutdown process, but then it hangs in BIOS POST with a black screen. The problem is intermittent - at some points it has appeared related to Secure Boot settings or different kernel builds, but ultimately we have not been able to identify the exact conditions that trigger the issue to come and go. Besides, the EFI mode cannot be disabled in the BIOS of this model. However, after extensive testing, we observe that using the EFI reboot method reliably avoids the issue in all cases. So add a boot time quirk to use EFI reboot on such systems. Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=203119 Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com> Signed-off-by: Daniel Drake <drake@endlessm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Cc: linux@endlessm.com Link: http://lkml.kernel.org/r/20190412080152.3718-1-jian-hong@endlessm.com [ Fix !CONFIG_EFI build failure, clarify the code and the changelog a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-15hwmon: Add support for samples attributesGuenter Roeck
Add support for the new samples attributes to the hwmon core. Cc: Krzysztof Adamski <krzysztof.adamski@nokia.com> Cc: Nicolin Chen <nicoleotsuka@gmail.com> Acked-by: Nicolin Chen <nicoleotsuka@gmail.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2019-04-15hwmon: Add convience macro to define simple static sensorsCharles Keepax
It takes a fair amount of boiler plate code to add new sensors, add a macro that can be used to specify simple static sensors. Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2019-04-15ntp: Audit NTP parameters adjustmentOndrej Mosnacek
Emit an audit record every time selected NTP parameters are modified from userspace (via adjtimex(2) or clock_adjtime(2)). These parameters may be used to indirectly change system clock, and thus their modifications should be audited. Such events will now generate records of type AUDIT_TIME_ADJNTPVAL containing the following fields: - op -- which value was adjusted: - offset -- corresponding to the time_offset variable - freq -- corresponding to the time_freq variable - status -- corresponding to the time_status variable - adjust -- corresponding to the time_adjust variable - tick -- corresponding to the tick_usec variable - tai -- corresponding to the timekeeping's TAI offset - old -- the old value - new -- the new value Example records: type=TIME_ADJNTPVAL msg=audit(1530616044.507:7): op=status old=64 new=8256 type=TIME_ADJNTPVAL msg=audit(1530616044.511:11): op=freq old=0 new=49180377088000 The records of this type will be associated with the corresponding syscall records. An overview of parameter changes that can be done via do_adjtimex() (based on information from Miroslav Lichvar) and whether they are audited: __timekeeping_set_tai_offset() -- sets the offset from the International Atomic Time (AUDITED) NTP variables: time_offset -- can adjust the clock by up to 0.5 seconds per call and also speed it up or slow down by up to about 0.05% (43 seconds per day) (AUDITED) time_freq -- can speed up or slow down by up to about 0.05% (AUDITED) time_status -- can insert/delete leap seconds and it also enables/ disables synchronization of the hardware real-time clock (AUDITED) time_maxerror, time_esterror -- change error estimates used to inform userspace applications (NOT AUDITED) time_constant -- controls the speed of the clock adjustments that are made when time_offset is set (NOT AUDITED) time_adjust -- can temporarily speed up or slow down the clock by up to 0.05% (AUDITED) tick_usec -- a more extreme version of time_freq; can speed up or slow down the clock by up to 10% (AUDITED) Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-04-15timekeeping: Audit clock adjustmentsOndrej Mosnacek
Emit an audit record whenever the system clock is changed (i.e. shifted by a non-zero offset) by a syscall from userspace. The syscalls than can (at the time of writing) trigger such record are: - settimeofday(2), stime(2), clock_settime(2) -- via do_settimeofday64() - adjtimex(2), clock_adjtime(2) -- via do_adjtimex() The new records have type AUDIT_TIME_INJOFFSET and contain the following fields: - sec -- the 'seconds' part of the offset - nsec -- the 'nanoseconds' part of the offset Example record (time was shifted backwards by ~15.875 seconds): type=TIME_INJOFFSET msg=audit(1530616049.652:13): sec=-16 nsec=124887145 The records of this type will be associated with the corresponding syscall records. Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> [PM: fixed a line width problem in __audit_tk_injoffset()] Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-04-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter updates for net-next: 1) Remove the broute pseudo hook, implement this from the bridge prerouting hook instead. Now broute becomes real table in ebtables, from Florian Westphal. This also includes a size reduction patch for the bridge control buffer area via squashing boolean into bitfields and a selftest. 2) Add OS passive fingerprint version matching, from Fernando Fernandez. 3) Support for gue encapsulation for IPVS, from Jacky Hu. 4) Add support for NAT to the inet family, from Florian Westphal. This includes support for masquerade, redirect and nat extensions. 5) Skip interface lookup in flowtable, use device in the dst object. 6) Add jiffies64_to_msecs() and use it, from Li RongQing. 7) Remove unused parameter in nf_tables_set_desc_parse(), from Colin Ian King. 8) Statify several functions, patches from YueHaibing and Florian Westphal. 9) Add an optimized version of nf_inet_addr_cmp(), from Li RongQing. 10) Merge route extension to core, also from Florian. 11) Use IS_ENABLED(CONFIG_NF_NAT) instead of NF_NAT_NEEDED, from Florian. 12) Merge ip/ip6 masquerade extensions, from Florian. This includes netdevice notifier unification. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-15platform/chrome: wilco_ec: Standardize mailbox interfaceNick Crews
The current API for the wilco EC mailbox interface is bad. It assumes that most messages sent to the EC follow a similar structure, with a command byte in MBOX[0], followed by a junk byte, followed by actual data. This doesn't happen in several cases, such as setting the RTC time, using the raw debugfs interface, and reading or writing properties such as the Peak Shift policy (this last to be submitted soon). Similarly for the response message from the EC, the current interface assumes that the first byte of data is always 0, and the second byte is unused. However, in both setting and getting the RTC time, in the debugfs interface, and for reading and writing properties, this isn't true. The current way to resolve this is to use WILCO_EC_FLAG_RAW* flags to specify when and when not to skip these initial bytes in the sent and received message. They are confusing and used so much that they are normal, and not exceptions. In addition, the first byte of response in the debugfs interface is still always skipped, which is weird, since this raw interface should be giving the entire result. Additionally, sent messages assume the first byte is a command, and so struct wilco_ec_message contains the "command" field. In setting or getting properties however, the first byte is not a command, and so this field has to be filled with a byte that isn't actually a command. This is again inconsistent. wilco_ec_message contains a result field as well, copied from wilco_ec_response->result. The message result field should be removed: if the message fails, the cause is already logged, and the callers are alerted. They will never care about the actual state of the result flag. These flags and different cases make the wilco_ec_transfer() function, used in wilco_ec_mailbox(), really gross, dealing with a bunch of different cases. It's difficult to figure out what it is doing. Finally, making these assumptions about the structure of a message make it so that the messages do not correspond well with the specification for the EC's mailbox interface. For instance, this interface specification may say that MBOX[9] in the received message contains some information, but the calling code needs to remember that the first byte of response is always skipped, and because it didn't set the RESPONSE_RAW flag, the next byte is also skipped, so this information is actually contained within wilco_ec_message->response_data[7]. This makes it difficult to maintain this code in the future. To fix these problems this patch standardizes the mailbox interface by: - Removing the WILCO_EC_FLAG_RAW* flags - Removing the command and reserved_raw bytes from wilco_ec_request - Removing the mbox0 byte from wilco_ec_response - Simplifying wilco_ec_transfer() because of these changes - Gives the callers of wilco_ec_mailbox() the responsibility of exactly and consistently defining the structure of the mailbox request and response - Removing command and result from wilco_ec_message. This results in the reduction of total code, and makes it much more maintainable and understandable. Signed-off-by: Nick Crews <ncrews@chromium.org> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
2019-04-15PCI: endpoint: Add support to specify alignment for buffers allocated to BARsKishon Vijay Abraham I
The address that is allocated using pci_epf_alloc_space() is directly written to the target address of the Inbound Address Translation unit (ie the HW component implementing inbound address decoding) on endpoint controllers. Designware IP [1] has a configuration parameter (CX_ATU_MIN_REGION_SIZE [2]) which has 64KB as default value and the lower 16 bits of the Base, Limit and Target registers of the Inbound ATU are fixed to zero. If the programmed memory address is not aligned to 64 KB boundary this causes memory corruption. Modify pci_epf_alloc_space() API to take alignment size as argument in order to allocate buffers to be mapped to BARs with an alignment that suits the platform where they are used. Add an 'align' parameter to epc_features which can be used by platform drivers to specify the BAR allocation alignment requirements and use this while invoking pci_epf_alloc_space(). [1] "I/O and MEM Match Modes" section in DesignWare Cores PCI Express Controller Databook version 4.90a [2] http://www.ti.com/lit/ug/spruid7c/spruid7c.pdf Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2019-04-15printk: Tie printk_once / printk_deferred_once into .data.once for resetPaul Gortmaker
In commit b1fca27d384e ("kernel debug: support resetting WARN*_ONCE") we got the opportunity to reset state on the one shot messages, without having to reboot. However printk_once (printk_deferred_once) live in a different file and didn't get the same kind of update/conversion, so they remain unconditionally one shot, until the system is rebooted. For example, we currently have: sched/rt.c: printk_deferred_once("sched: RT throttling activated\n"); ..which could reasonably be tripped as someone is testing and tuning a new system/workload and their task placements. For consistency, and to avoid reboots in the same vein as the original commit, we make these two instances of _once the same as the WARN*_ONCE instances are. Link: http://lkml.kernel.org/r/1555121491-31213-1-git-send-email-paul.gortmaker@windriver.com Cc: Andi Kleen <ak@linux.intel.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
2019-04-15firmware: xilinx: Add fpga API'sNava kishore Manne
This Patch Adds fpga API's to support the Bitstream loading by using firmware interface. Signed-off-by: Nava kishore Manne <nava.manne@xilinx.com> Reviewed-by: Moritz Fischer <mdf@kernel.org> Signed-off-by: Michal Simek <michal.simek@xilinx.com>
2019-04-15BackMerge v5.1-rc5 into drm-nextDave Airlie
Need rc5 for udl fix to add udl cleanups on top. Signed-off-by: Dave Airlie <airlied@redhat.com>
2019-04-14Merge branch 'page-refs' (page ref overflow)Linus Torvalds
Merge page ref overflow branch. Jann Horn reported that he can overflow the page ref count with sufficient memory (and a filesystem that is intentionally extremely slow). Admittedly it's not exactly easy. To have more than four billion references to a page requires a minimum of 32GB of kernel memory just for the pointers to the pages, much less any metadata to keep track of those pointers. Jann needed a total of 140GB of memory and a specially crafted filesystem that leaves all reads pending (in order to not ever free the page references and just keep adding more). Still, we have a fairly straightforward way to limit the two obvious user-controllable sources of page references: direct-IO like page references gotten through get_user_pages(), and the splice pipe page duplication. So let's just do that. * branch page-refs: fs: prevent page refcount overflow in pipe_buf_get mm: prevent get_user_pages() from overflowing page refcount mm: add 'try_get_page()' helper function mm: make page ref count overflow check tighter and more explicit
2019-04-14fs: prevent page refcount overflow in pipe_buf_getMatthew Wilcox
Change pipe_buf_get() to return a bool indicating whether it succeeded in raising the refcount of the page (if the thing in the pipe is a page). This removes another mechanism for overflowing the page refcount. All callers converted to handle a failure. Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Matthew Wilcox <willy@infradead.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-14mm: add 'try_get_page()' helper functionLinus Torvalds
This is the same as the traditional 'get_page()' function, but instead of unconditionally incrementing the reference count of the page, it only does so if the count was "safe". It returns whether the reference count was incremented (and is marked __must_check, since the caller obviously has to be aware of it). Also like 'get_page()', you can't use this function unless you already had a reference to the page. The intent is that you can use this exactly like get_page(), but in situations where you want to limit the maximum reference count. The code currently does an unconditional WARN_ON_ONCE() if we ever hit the reference count issues (either zero or negative), as a notification that the conditional non-increment actually happened. NOTE! The count access for the "safety" check is inherently racy, but that doesn't matter since the buffer we use is basically half the range of the reference count (ie we look at the sign of the count). Acked-by: Matthew Wilcox <willy@infradead.org> Cc: Jann Horn <jannh@google.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-14mm: make page ref count overflow check tighter and more explicitLinus Torvalds
We have a VM_BUG_ON() to check that the page reference count doesn't underflow (or get close to overflow) by checking the sign of the count. That's all fine, but we actually want to allow people to use a "get page ref unless it's already very high" helper function, and we want that one to use the sign of the page ref (without triggering this VM_BUG_ON). Change the VM_BUG_ON to only check for small underflows (or _very_ close to overflowing), and ignore overflows which have strayed into negative territory. Acked-by: Matthew Wilcox <willy@infradead.org> Cc: Jann Horn <jannh@google.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-14iio: inkern: API for reading available iio channel attribute valuesArtur Rojek
Extend the inkern API with a function for reading available attribute values of iio channels. Signed-off-by: Artur Rojek <contact@artur-rojek.eu> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2019-04-13bfq: update internal depth state when queue depth changesJens Axboe
A previous commit moved the shallow depth and BFQ depth map calculations to be done at init time, moving it outside of the hotter IO path. This potentially causes hangs if the users changes the depth of the scheduler map, by writing to the 'nr_requests' sysfs file for that device. Add a blk-mq-sched hook that allows blk-mq to inform the scheduler if the depth changes, so that the scheduler can update its internal state. Tested-by: Kai Krakow <kai@kaishome.de> Reported-by: Paolo Valente <paolo.valente@linaro.org> Fixes: f0635b8a416e ("bfq: calculate shallow depths at init time") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-13Merge tag 'for-linus-20190412' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "Set of fixes that should go into this round. This pull is larger than I'd like at this time, but there's really no specific reason for that. Some are fixes for issues that went into this merge window, others are not. Anyway, this contains: - Hardware queue limiting for virtio-blk/scsi (Dongli) - Multi-page bvec fixes for lightnvm pblk - Multi-bio dio error fix (Jason) - Remove the cache hint from the io_uring tool side, since we didn't move forward with that (me) - Make io_uring SETUP_SQPOLL root restricted (me) - Fix leak of page in error handling for pc requests (Jérôme) - Fix BFQ regression introduced in this merge window (Paolo) - Fix break logic for bio segment iteration (Ming) - Fix NVMe cancel request error handling (Ming) - NVMe pull request with two fixes (Christoph): - fix the initial CSN for nvme-fc (James) - handle log page offsets properly in the target (Keith)" * tag 'for-linus-20190412' of git://git.kernel.dk/linux-block: block: fix the return errno for direct IO nvmet: fix discover log page when offsets are used nvme-fc: correct csn initialization and increments on error block: do not leak memory in bio_copy_user_iov() lightnvm: pblk: fix crash in pblk_end_partial_read due to multipage bvecs nvme: cancel request synchronously blk-mq: introduce blk_mq_complete_request_sync() scsi: virtio_scsi: limit number of hw queues by nr_cpu_ids virtio-blk: limit number of hw queues by nr_cpu_ids block, bfq: fix use after free in bfq_bfqq_expire io_uring: restrict IORING_SETUP_SQPOLL to root tools/io_uring: remove IOCQE_FLAG_CACHEHIT block: don't use for-inside-for in bio_for_each_segment_all
2019-04-13Merge tag 'nfs-for-5.1-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: Stable fix: - Fix a deadlock in close() due to incorrect draining of RDMA queues Bugfixes: - Revert "SUNRPC: Micro-optimise when the task is known not to be sleeping" as it is causing stack overflows - Fix a regression where NFSv4 getacl and fs_locations stopped working - Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family. - Fix xfstests failures due to incorrect copy_file_range() return values" * tag 'nfs-for-5.1-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: Revert "SUNRPC: Micro-optimise when the task is known not to be sleeping" NFSv4.1 fix incorrect return value in copy_file_range xprtrdma: Fix helper that drains the transport NFS: Fix handling of reply page vector NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
2019-04-13Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "Here's more than a handful of clk driver fixes for changes that came in during the merge window: - Fix the AT91 sama5d2 programmable clk prescaler formula - A bunch of Amlogic meson clk driver fixes for the VPU clks - A DMI quirk for Intel's Bay Trail SoC's driver to properly mark pmc clks as critical only when really needed - Stop overwriting CLK_SET_RATE_PARENT flag in mediatek's clk gate implementation - Use the right structure to test for a frequency table in i.MX's PLL_1416x driver" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: imx: Fix PLL_1416X not rounding rates clk: mediatek: fix clk-gate flag setting platform/x86: pmc_atom: Drop __initconst on dmi table clk: x86: Add system specific quirk to mark clocks as critical clk: meson: vid-pll-div: remove warning and return 0 on invalid config clk: meson: pll: fix rounding and setting a rate that matches precisely clk: meson-g12a: fix VPU clock parents clk: meson: g12a: fix VPU clock muxes mask clk: meson-gxbb: round the vdec dividers to closest clk: at91: fix programmable clock for sama5d2
2019-04-13CPER: Remove unnecessary use of user-space typesBjorn Helgaas
"__u32" and similar types are intended for things exported to user-space, including structs used in ioctls; see include/uapi/asm-generic/int-l64.h. They are not needed for the CPER struct definitions, which not exported to user-space and not used in ioctls. Replace them with the typical "u32" and similar types. No functional change intended. The reason for changing this is to remove the question of "why do we use __u32 here instead of u32?" We should use __u32 when there's a reason for it; otherwise, we should prefer u32 for consistency. Reference: Documentation/process/coding-style.rst Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Masahiro Yamada <yamada.masahiro@socionext.com> CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> CC: Andrew Morton <akpm@linux-foundation.org>
2019-04-13CPER: Add UEFI spec referencesBjorn Helgaas
Add UEFI spec references for CPER UUIDs and structures, fix a few typos, and remove some useless comments. No functional change intended. Link: http://www.uefi.org/specifications Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-04-12Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fixes from Ingo Molnar: "Fix an objtool warning plus fix a u64_to_user_ptr() macro expansion bug" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Add rewind_stack_do_exit() to the noreturn list linux/kernel.h: Use parentheses around argument in u64_to_user_ptr()
2019-04-12fs: drop unused fput_atomic definitionLukas Bulwahn
commit d7065da03822 ("get rid of the magic around f_count in aio") added fput_atomic to include/linux/fs.h, motivated by its use in __aio_put_req() in fs/aio.c. Later, commit 3ffa3c0e3f6e ("aio: now fput() is OK from interrupt context; get rid of manual delayed __fput()") removed the only use of fput_atomic in __aio_put_req(), but did not remove the since then unused fput_atomic definition in include/linux/fs.h. We curate this now and finally remove the unused definition. This issue was identified during a code review due to a coccinelle warning from the atomic_as_refcounter.cocci rule pointing to the use of atomic_t in fput_atomic. Suggested-by: Krystian Radlak <kradlak@exida.com> Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-04-12rhashtable: use BIT(0) for locking.NeilBrown
As reported by Guenter Roeck, the new bit-locking using BIT(1) doesn't work on the m68k architecture. m68k only requires 2-byte alignment for words and longwords, so there is only one unused bit in pointers to structs - We current use two, one for the NULLS marker at the end of the linked list, and one for the bit-lock in the head of the list. The two uses don't need to conflict as we never need the head of the list to be a NULLS marker - the marker is only needed to check if an object has moved to a different table, and the bucket head cannot move. The NULLS marker is only needed in a ->next pointer. As we already have different types for the bucket head pointer (struct rhash_lock_head) and the ->next pointers (struct rhash_head), it is fairly easy to treat the lsb differently in each. So: Initialize buckets heads to NULL, and use the lsb for locking. When loading the pointer from the bucket head, if it is NULL (ignoring the lock big), report as being the expected NULLS marker. When storing a value into a bucket head, if it is a NULLS marker, store NULL instead. And convert all places that used bit 1 for locking, to use bit 0. Fixes: 8f0db018006a ("rhashtable: use bit_spin_locks to protect hash bucket.") Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12rhashtable: replace rht_ptr_locked() with rht_assign_locked()NeilBrown
The only times rht_ptr_locked() is used, it is to store a new value in a bucket-head. This is the only time it makes sense to use it too. So replace it by a function which does the whole task: Sets the lock bit and assigns to a bucket head. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12rhashtable: move dereference inside rht_ptr()NeilBrown
Rather than dereferencing a pointer to a bucket and then passing the result to rht_ptr(), we now pass in the pointer and do the dereference in rht_ptr(). This requires that we pass in the tbl and hash as well to support RCU checks, and means that the various rht_for_each functions can expect a pointer that can be dereferenced without further care. There are two places where we dereference a bucket pointer where there is no testable protection - in each case we know that we much have exclusive access without having taken a lock. The previous code used rht_dereference() to pretend that holding the mutex provided protects, but holding the mutex never provides protection for accessing buckets. So instead introduce rht_ptr_exclusive() that can be used when there is known to be exclusive access without holding any locks. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12rhashtable: reorder some inline functions and macros.NeilBrown
This patch only moves some code around, it doesn't change the code at all. A subsequent patch will benefit from this as it needs to add calls to functions which are now defined before the call-site, but weren't before. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12rhashtable: fix some __rcu annotation errorsNeilBrown
With these annotations, the rhashtable now gets no warnings when compiled with "C=1" for sparse checking. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12scsi: scsi_transport_fc: nvme: display FC-NVMe port rolesHannes Reinecke
Currently the FC-NVMe driver is leverating the SCSI FC transport class to access the remote ports. Which means that all FC-NVMe remote ports will be visible to the fc transport layer, but due to missing definitions the port roles will always be 'unknown'. This patch adds the missing definitions to the fc transport class to that the port roles are correctly displayed. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: James Smart <james.smart@broadcom.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Giridhar Malavali <gmalavali@marvell.com> Reviewed-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-04-12Merge branch 'next-integrity-for-james' of ↵James Morris
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity into next-integrity From Mimi: "This pull request contains just three patches, the remainder are either included in other pull requests (eg. audit, lockdown) or will be upstreamed via other subsystems (eg. kselftests, Power).  Included in this pull request is one bug fix, one documentation update, and extending the x86 IMA arch policy rules to coordinate the different kernel module signature verification methods."
2019-04-12bpf: Introduce bpf_strtol and bpf_strtoul helpersAndrey Ignatov
Add bpf_strtol and bpf_strtoul to convert a string to long and unsigned long correspondingly. It's similar to user space strtol(3) and strtoul(3) with a few changes to the API: * instead of NUL-terminated C string the helpers expect buffer and buffer length; * resulting long or unsigned long is returned in a separate result-argument; * return value is used to indicate success or failure, on success number of consumed bytes is returned that can be used to identify position to read next if the buffer is expected to contain multiple integers; * instead of *base* argument, *flags* is used that provides base in 5 LSB, other bits are reserved for future use; * number of supported bases is limited. Documentation for the new helpers is provided in bpf.h UAPI. The helpers are made available to BPF_PROG_TYPE_CGROUP_SYSCTL programs to be able to convert string input to e.g. "ulongvec" output. E.g. "net/ipv4/tcp_mem" consists of three ulong integers. They can be parsed by calling to bpf_strtoul three times. Implementation notes: Implementation includes "../../lib/kstrtox.h" to reuse integer parsing functions. It's done exactly same way as fs/proc/base.c already does. Unfortunately existing kstrtoX function can't be used directly since they fail if any invalid character is present right after integer in the string. Existing simple_strtoX functions can't be used either since they're obsolete and don't handle overflow properly. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12bpf: Introduce ARG_PTR_TO_{INT,LONG} arg typesAndrey Ignatov
Currently the way to pass result from BPF helper to BPF program is to provide memory area defined by pointer and size: func(void *, size_t). It works great for generic use-case, but for simple types, such as int, it's overkill and consumes two arguments when it could use just one. Introduce new argument types ARG_PTR_TO_INT and ARG_PTR_TO_LONG to be able to pass result from helper to program via pointer to int and long correspondingly: func(int *) or func(long *). New argument types are similar to ARG_PTR_TO_MEM with the following differences: * they don't require corresponding ARG_CONST_SIZE argument, predefined access sizes are used instead (32bit for int, 64bit for long); * it's possible to use more than one such an argument in a helper; * provided pointers have to be aligned. It's easy to introduce similar ARG_PTR_TO_CHAR and ARG_PTR_TO_SHORT argument types. It's not done due to lack of use-case though. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12bpf: Add file_pos field to bpf_sysctl ctxAndrey Ignatov
Add file_pos field to bpf_sysctl context to read and write sysctl file position at which sysctl is being accessed (read or written). The field can be used to e.g. override whole sysctl value on write to sysctl even when sys_write is called by user space with file_pos > 0. Or BPF program may reject such accesses. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12bpf: Introduce bpf_sysctl_{get,set}_new_value helpersAndrey Ignatov
Add helpers to work with new value being written to sysctl by user space. bpf_sysctl_get_new_value() copies value being written to sysctl into provided buffer. bpf_sysctl_set_new_value() overrides new value being written by user space with a one from provided buffer. Buffer should contain string representation of the value, similar to what can be seen in /proc/sys/. Both helpers can be used only on sysctl write. File position matters and can be managed by an interface that will be introduced separately. E.g. if user space calls sys_write to a file in /proc/sys/ at file position = X, where X > 0, then the value set by bpf_sysctl_set_new_value() will be written starting from X. If program wants to override whole value with specified buffer, file position has to be set to zero. Documentation for the new helpers is provided in bpf.h UAPI. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12bpf: Introduce bpf_sysctl_get_current_value helperAndrey Ignatov
Add bpf_sysctl_get_current_value() helper to copy current sysctl value into provided by BPF_PROG_TYPE_CGROUP_SYSCTL program buffer. It provides same string as user space can see by reading corresponding file in /proc/sys/, including new line, etc. Documentation for the new helper is provided in bpf.h UAPI. Since current value is kept in ctl_table->data in a parsed form, ctl_table->proc_handler() with write=0 is called to read that data and convert it to a string. Such a string can later be parsed by a program using helpers that will be introduced separately. Unfortunately it's not trivial to provide API to access parsed data due to variety of data representations (string, intvec, uintvec, ulongvec, custom structures, even NULL, etc). Instead it's assumed that user know how to handle specific sysctl they're interested in and appropriate helpers can be used. Since ctl_table->proc_handler() expects __user buffer, conversion to __user happens for kernel allocated one where the value is stored. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12bpf: Sysctl hookAndrey Ignatov
Containerized applications may run as root and it may create problems for whole host. Specifically such applications may change a sysctl and affect applications in other containers. Furthermore in existing infrastructure it may not be possible to just completely disable writing to sysctl, instead such a process should be gradual with ability to log what sysctl are being changed by a container, investigate, limit the set of writable sysctl to currently used ones (so that new ones can not be changed) and eventually reduce this set to zero. The patch introduces new program type BPF_PROG_TYPE_CGROUP_SYSCTL and attach type BPF_CGROUP_SYSCTL to solve these problems on cgroup basis. New program type has access to following minimal context: struct bpf_sysctl { __u32 write; }; Where @write indicates whether sysctl is being read (= 0) or written (= 1). Helpers to access sysctl name and value will be introduced separately. BPF_CGROUP_SYSCTL attach point is added to sysctl code right before passing control to ctl_table->proc_handler so that BPF program can either allow or deny access to sysctl. Suggested-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-12block: disk_events: introduce event flagsMartin Wilck
Currently, an empty disk->events field tells the block layer not to forward media change events to user space. This was done in commit 7c88a168da80 ("block: don't propagate unlisted DISK_EVENTs to userland") in order to avoid events from "fringe" drivers to be forwarded to user space. By doing so, the block layer lost the information which events were supported by a particular block device, and most importantly, whether or not a given device supports media change events at all. Prepare for not interpreting the "events" field this way in the future any more. This is done by adding an additional field "event_flags" to struct gendisk, and two flag bits that can be set to have the device treated like one that had the "events" field set to a non-zero value before. This applies only to the sd and sr drivers, which are changed to set the new flags. The new flags are DISK_EVENT_FLAG_POLL to enforce polling of the device for synchronous events, and DISK_EVENT_FLAG_UEVENT to tell the blocklayer to generate udev events from kernel events. In order to add the event_flags field to struct gendisk, the events field is converted to an "unsigned short"; it doesn't need to hold values bigger than 2 anyway. This patch doesn't change behavior. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-12block: genhd: remove async_events fieldMartin Wilck
The async_events field, intended to be used for drivers that support asynchronous notifications about disk events (aka media change events), isn't currently used by any driver, and apparently that has been that way for a long time (if not forever). Remove it. Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-12iommu: io-pgtable: Add ARM Mali midgard MMU page table formatRob Herring
ARM Mali midgard GPU is similar to standard 64-bit stage 1 page tables, but have a few differences. Add a new format type to represent the format. The input address size is 48-bits and the output address size is 40-bits (and possibly less?). Note that the later bifrost GPUs follow the standard 64-bit stage 1 format. The differences in the format compared to 64-bit stage 1 format are: The 3rd level page entry bits are 0x1 instead of 0x3 for page entries. The access flags are not read-only and unprivileged, but read and write. This is similar to stage 2 entries, but the memory attributes field matches stage 1 being an index. The nG bit is not set by the vendor driver. This one didn't seem to matter, but we'll keep it aligned to the vendor driver. Cc: Will Deacon <will.deacon@arm.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: iommu@lists.linux-foundation.org Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20190409205427.6943-2-robh@kernel.org
2019-04-12block: only allow contiguous page structs in a bio_vecChristoph Hellwig
We currently have to call nth_page when iterating over pages inside a bio_vec. Jens complained a while ago that this is fairly expensive. To mitigate this we can check that that the actual page structures are contiguous when adding them to the bio, and just do check pointer arithmetics later on. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-12block: change how we get page references in bio_iov_iter_get_pagesChristoph Hellwig
Instead of needing a special macro to iterate over all pages in a bvec just do a second passs over the whole bio. This also matches what we do on the release side. The release side helper is moved up to where we need the get helper to clearly express the symmetry. Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-12overflow.h: Add comment documenting __ab_c_size()Rasmus Villemoes
__ab_c_size() is a somewhat opaque name. Document its purpose, and while at it, rename the parameters to actually match the abc naming. [ bp: glued a complete patch from chunks on LKML. ] Reported-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Kees Cook <keescook@chromium.org> Cc: Matthew Wilcox <willy@infradead.org> Link: https://lkml.kernel.org/r/20190405045711.30339-1-bp@alien8.de
2019-04-12vfio/mdev: Add iommu related member in mdev_deviceLu Baolu
A parent device might create different types of mediated devices. For example, a mediated device could be created by the parent device with full isolation and protection provided by the IOMMU. One usage case could be found on Intel platforms where a mediated device is an assignable subset of a PCI, the DMA requests on behalf of it are all tagged with a PASID. Since IOMMU supports PASID-granular translations (scalable mode in VT-d 3.0), this mediated device could be individually protected and isolated by an IOMMU. This patch adds a new member in the struct mdev_device to indicate that the mediated device represented by mdev could be isolated and protected by attaching a domain to a device represented by mdev->iommu_device. It also adds a helper to add or set the iommu device. * mdev_device->iommu_device - This, if set, indicates that the mediated device could be fully isolated and protected by IOMMU via attaching an iommu domain to this device. If empty, it indicates using vendor defined isolation, hence bypass IOMMU. * mdev_set/get_iommu_device(dev, iommu_device) - Set or get the iommu device which represents this mdev in IOMMU's device scope. Drivers don't need to set the iommu device if it uses vendor defined isolation. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Liu Yi L <yi.l.liu@intel.com> Suggested-by: Kevin Tian <kevin.tian@intel.com> Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-04-12spi: Add spi_is_bpw_supported()Noralf Trønnes
This let SPI clients check if the controller supports a particular word width. drivers/gpu/drm/tinydrm/mipi-dbi.c will use this to determine if the controller supports 16-bit for RGB565 pixels. If it doesn't it will swap the bytes before transfer on little endian machines. Signed-off-by: Noralf Trønnes <noralf@tronnes.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-04-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-04-12 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Improve BPF verifier scalability for large programs through two optimizations: i) remove verifier states that are not useful in pruning, ii) stop walking parentage chain once first LIVE_READ is seen. Combined gives approx 20x speedup. Increase limits for accepting large programs under root, and add various stress tests, from Alexei. 2) Implement global data support in BPF. This enables static global variables for .data, .rodata and .bss sections to be properly handled which allows for more natural program development. This also opens up the possibility to optimize program workflow by compiling ELFs only once and later only rewriting section data before reload, from Daniel and with test cases and libbpf refactoring from Joe. 3) Add config option to generate BTF type info for vmlinux as part of the kernel build process. DWARF debug info is converted via pahole to BTF. Latter relies on libbpf and makes use of BTF deduplication algorithm which results in 100x savings compared to DWARF data. Resulting .BTF section is typically about 2MB in size, from Andrii. 4) Add BPF verifier support for stack access with variable offset from helpers and add various test cases along with it, from Andrey. 5) Extend bpf_skb_adjust_room() growth BPF helper to mark inner MAC header so that L2 encapsulation can be used for tc tunnels, from Alan. 6) Add support for input __sk_buff context in BPF_PROG_TEST_RUN so that users can define a subset of allowed __sk_buff fields that get fed into the test program, from Stanislav. 7) Add bpf fs multi-dimensional array tests for BTF test suite and fix up various UBSAN warnings in bpftool, from Yonghong. 8) Generate a pkg-config file for libbpf, from Luca. 9) Dump program's BTF id in bpftool, from Prashant. 10) libbpf fix to use smaller BPF log buffer size for AF_XDP's XDP program, from Magnus. 11) kallsyms related fixes for the case when symbols are not present in BPF selftests and samples, from Daniel ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12bridge: broute: make broute a real ebtables tableFlorian Westphal
This makes broute a normal ebtables table, hooking at PREROUTING. The broute hook is removed. It uses skb->cb to signal to bridge rx handler that the skb should be routed instead of being bridged. This change is backwards compatible with ebtables as no userspace visible parts are changed. This means we can also remove the !ops test in ebt_register_table, it was only there for broute table sake. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>