summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-02-21mm/sparsemem: pfn_to_page is not valid yet on SPARSEMEMWei Yang
When we use SPARSEMEM instead of SPARSEMEM_VMEMMAP, pfn_to_page() doesn't work before sparse_init_one_section() is called. This leads to a crash when hotplug memory: BUG: unable to handle page fault for address: 0000000006400000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] SMP PTI CPU: 3 PID: 221 Comm: kworker/u16:1 Tainted: G W 5.5.0-next-20200205+ #343 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015 Workqueue: kacpi_hotplug acpi_hotplug_work_fn RIP: 0010:__memset+0x24/0x30 Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 <f3> 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3 RSP: 0018:ffffb43ac0373c80 EFLAGS: 00010a87 RAX: ffffffffffffffff RBX: ffff8a1518800000 RCX: 0000000000050000 RDX: 0000000000000000 RSI: 00000000000000ff RDI: 0000000006400000 RBP: 0000000000140000 R08: 0000000000100000 R09: 0000000006400000 R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000000000 R13: 0000000000000028 R14: 0000000000000000 R15: ffff8a153ffd9280 FS: 0000000000000000(0000) GS:ffff8a153ab00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000006400000 CR3: 0000000136fca000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: sparse_add_section+0x1c9/0x26a __add_pages+0xbf/0x150 add_pages+0x12/0x60 add_memory_resource+0xc8/0x210 __add_memory+0x62/0xb0 acpi_memory_device_add+0x13f/0x300 acpi_bus_attach+0xf6/0x200 acpi_bus_scan+0x43/0x90 acpi_device_hotplug+0x275/0x3d0 acpi_hotplug_work_fn+0x1a/0x30 process_one_work+0x1a7/0x370 worker_thread+0x30/0x380 kthread+0x112/0x130 ret_from_fork+0x35/0x40 We should use memmap as it did. On x86 the impact is limited to x86_32 builds, or x86_64 configurations that override the default setting for SPARSEMEM_VMEMMAP. Other memory hotplug archs (arm64, ia64, and ppc) also default to SPARSEMEM_VMEMMAP=y. [dan.j.williams@intel.com: changelog update] {rppt@linux.ibm.com: changelog update] Link: http://lkml.kernel.org/r/20200219030454.4844-1-bhe@redhat.com Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug") Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Baoquan He <bhe@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21mm/vmscan.c: don't round up scan size for online memory cgroupGavin Shan
Commit 68600f623d69 ("mm: don't miss the last page because of round-off error") makes the scan size round up to @denominator regardless of the memory cgroup's state, online or offline. This affects the overall reclaiming behavior: the corresponding LRU list is eligible for reclaiming only when its size logically right shifted by @sc->priority is bigger than zero in the former formula. For example, the inactive anonymous LRU list should have at least 0x4000 pages to be eligible for reclaiming when we have 60/12 for swappiness/priority and without taking scan/rotation ratio into account. After the roundup is applied, the inactive anonymous LRU list becomes eligible for reclaiming when its size is bigger than or equal to 0x1000 in the same condition. (0x4000 >> 12) * 60 / (60 + 140 + 1) = 1 ((0x1000 >> 12) * 60) + 200) / (60 + 140 + 1) = 1 aarch64 has 512MB huge page size when the base page size is 64KB. The memory cgroup that has a huge page is always eligible for reclaiming in that case. The reclaiming is likely to stop after the huge page is reclaimed, meaing the further iteration on @sc->priority and the silbing and child memory cgroups will be skipped. The overall behaviour has been changed. This fixes the issue by applying the roundup to offlined memory cgroups only, to give more preference to reclaim memory from offlined memory cgroup. It sounds reasonable as those memory is unlikedly to be used by anyone. The issue was found by starting up 8 VMs on a Ampere Mustang machine, which has 8 CPUs and 16 GB memory. Each VM is given with 2 vCPUs and 2GB memory. It took 264 seconds for all VMs to be completely up and 784MB swap is consumed after that. With this patch applied, it took 236 seconds and 60MB swap to do same thing. So there is 10% performance improvement for my case. Note that KSM is disable while THP is enabled in the testing. total used free shared buff/cache available Mem: 16196 10065 2049 16 4081 3749 Swap: 8175 784 7391 total used free shared buff/cache available Mem: 16196 11324 3656 24 1215 2936 Swap: 8175 60 8115 Link: http://lkml.kernel.org/r/20200211024514.8730-1-gshan@redhat.com Fixes: 68600f623d69 ("mm: don't miss the last page because of round-off error") Signed-off-by: Gavin Shan <gshan@redhat.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: <stable@vger.kernel.org> [4.20+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21lib/string.c: update match_string() doc-strings with correct behaviorAlexandru Ardelean
There were a few attempts at changing behavior of the match_string() helpers (i.e. 'match_string()' & 'sysfs_match_string()'), to change & extend the behavior according to the doc-string. But the simplest approach is to just fix the doc-strings. The current behavior is fine as-is, and some bugs were introduced trying to fix it. As for extending the behavior, new helpers can always be introduced if needed. The match_string() helpers behave more like 'strncmp()' in the sense that they go up to n elements or until the first NULL element in the array of strings. This change updates the doc-strings with this info. Link: http://lkml.kernel.org/r/20200213072722.8249-1-alexandru.ardelean@analog.com Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: "Tobin C . Harding" <tobin@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21mm/memcontrol.c: lost css_put in memcg_expand_shrinker_maps()Vasily Averin
for_each_mem_cgroup() increases css reference counter for memory cgroup and requires to use mem_cgroup_iter_break() if the walk is cancelled. Link: http://lkml.kernel.org/r/c98414fb-7e1f-da0f-867a-9340ec4bd30b@virtuozzo.com Fixes: 0a4465d34028 ("mm, memcg: assign memcg-aware shrinkers bitmap to memcg") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Acked-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21mm/swapfile.c: fix a comment in sys_swapon()Christoph Hellwig
claim_swapfile now always takes i_rwsem. Link: http://lkml.kernel.org/r/20200114161225.309792-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21scripts/get_maintainer.pl: deprioritize old Fixes: addressesDouglas Anderson
Recently, I found that get_maintainer was causing me to send emails to the old addresses for maintainers. Since I usually just trust the output of get_maintainer to know the right email address, I didn't even look carefully and fired off two patch series that went to the wrong place. Oops. The problem was introduced recently when trying to add signatures from Fixes. The problem was that these email addresses were added too early in the process of compiling our list of places to send. Things added to the list earlier are considered more canonical and when we later added maintainer entries we ended up deduplicating to the old address. Here are two examples using mainline commits (to make it easier to replicate) for the two maintainers that I messed up recently: $ git format-patch d8549bcd0529~..d8549bcd0529 $ ./scripts/get_maintainer.pl 0001-clk-Add-clk_hw*.patch | grep Boyd Stephen Boyd <sboyd@codeaurora.org>... $ git format-patch 6d1238aa3395~..6d1238aa3395 $ ./scripts/get_maintainer.pl 0001-arm64-dts-qcom-qcs404*.patch | grep Andy Andy Gross <andy.gross@linaro.org> Let's move the adding of addresses from Fixes: to the end since the email addresses from these are much more likely to be older. After this patch the above examples get the right addresses for the two examples. Link: http://lkml.kernel.org/r/20200127095001.1.I41fba9f33590bfd92cd01960161d8384268c6569@changeid Fixes: 2f5bd343694e ("scripts/get_maintainer.pl: add signatures from Fixes: <badcommit> lines in commit message") Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Joe Perches <joe@perches.com> Cc: Stephen Boyd <sboyd@kernel.org> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Andy Gross <agross@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21get_maintainer: remove uses of P: for maintainer nameJoe Perches
Commit 1ca84ed6425f ("MAINTAINERS: Reclaim the P: tag for Maintainer Entry Profile") changed the use of the "P:" tag from "Person" to "Profile (ie: special subsystem coding styles and characteristics)" Change how get_maintainer.pl parses the "P:" tag to match. Link: http://lkml.kernel.org/r/ca53823fc5d25c0be32ad937d0207a0589c08643.camel@perches.com Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Dan Williams <dan.j.william@intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21selftests/vm: add missed tests in run_vmtestsSeongJae Park
The commits introducing 'mlock-random-test'[1], 'map_fiex_noreplace'[2], and 'thuge-gen'[3] have not added those in the 'run_vmtests' script and thus the 'run_tests' command of kselftests doesn't run those. This commit adds those in the script. 'gup_benchmark' and 'transhuge-stress' are also not included in the 'run_vmtests', but this commit does not add those because those are for performance measurement rather than pass/fail tests. [1] commit 26b4224d9961 ("selftests: expanding more mlock selftest") [2] commit 91cbacc34512 ("tools/testing/selftests/vm/map_fixed_noreplace.c: add test for MAP_FIXED_NOREPLACE") [3] commit fcc1f2d5dd34 ("selftests: add a test program for variable huge page sizes in mmap/shmget") Link: http://lkml.kernel.org/r/20200206085144.29126-1-sj38.park@gmail.com Signed-off-by: SeongJae Park <sjpark@amazon.de> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21include/uapi/linux/swab.h: fix userspace breakage, use __BITS_PER_LONG for swapChristian Borntraeger
QEMU has a funny new build error message when I use the upstream kernel headers: CC block/file-posix.o In file included from /home/cborntra/REPOS/qemu/include/qemu/timer.h:4, from /home/cborntra/REPOS/qemu/include/qemu/timed-average.h:29, from /home/cborntra/REPOS/qemu/include/block/accounting.h:28, from /home/cborntra/REPOS/qemu/include/block/block_int.h:27, from /home/cborntra/REPOS/qemu/block/file-posix.c:30: /usr/include/linux/swab.h: In function `__swab': /home/cborntra/REPOS/qemu/include/qemu/bitops.h:20:34: error: "sizeof" is not defined, evaluates to 0 [-Werror=undef] 20 | #define BITS_PER_LONG (sizeof (unsigned long) * BITS_PER_BYTE) | ^~~~~~ /home/cborntra/REPOS/qemu/include/qemu/bitops.h:20:41: error: missing binary operator before token "(" 20 | #define BITS_PER_LONG (sizeof (unsigned long) * BITS_PER_BYTE) | ^ cc1: all warnings being treated as errors make: *** [/home/cborntra/REPOS/qemu/rules.mak:69: block/file-posix.o] Error 1 rm tests/qemu-iotests/socket_scm_helper.o This was triggered by commit d5767057c9a ("uapi: rename ext2_swab() to swab() and share globally in swab.h"). That patch is doing #include <asm/bitsperlong.h> but it uses BITS_PER_LONG. The kernel file asm/bitsperlong.h provide only __BITS_PER_LONG. Let us use the __ variant in swap.h Link: http://lkml.kernel.org/r/20200213142147.17604-1-borntraeger@de.ibm.com Fixes: d5767057c9a ("uapi: rename ext2_swab() to swab() and share globally in swab.h") Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Yury Norov <yury.norov@gmail.com> Cc: Allison Randal <allison@lohutok.net> Cc: Joe Perches <joe@perches.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: William Breathitt Gray <vilhelm.gray@gmail.com> Cc: Torsten Hilbrich <torsten.hilbrich@secunet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21Revert "ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()"Ioanna Alifieraki
This reverts commit a97955844807e327df11aa33869009d14d6b7de0. Commit a97955844807 ("ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()") removes a lock that is needed. This leads to a process looping infinitely in exit_sem() and can also lead to a crash. There is a reproducer available in [1] and with the commit reverted the issue does not reproduce anymore. Using the reproducer found in [1] is fairly easy to reach a point where one of the child processes is looping infinitely in exit_sem between for(;;) and if (semid == -1) block, while it's trying to free its last sem_undo structure which has already been freed by freeary(). Each sem_undo struct is on two lists: one per semaphore set (list_id) and one per process (list_proc). The list_id list tracks undos by semaphore set, and the list_proc by process. Undo structures are removed either by freeary() or by exit_sem(). The freeary function is invoked when the user invokes a syscall to remove a semaphore set. During this operation freeary() traverses the list_id associated with the semaphore set and removes the undo structures from both the list_id and list_proc lists. For this case, exit_sem() is called at process exit. Each process contains a struct sem_undo_list (referred to as "ulp") which contains the head for the list_proc list. When the process exits, exit_sem() traverses this list to remove each sem_undo struct. As in freeary(), whenever a sem_undo struct is removed from list_proc, it is also removed from the list_id list. Removing elements from list_id is safe for both exit_sem() and freeary() due to sem_lock(). Removing elements from list_proc is not safe; freeary() locks &un->ulp->lock when it performs list_del_rcu(&un->list_proc) but exit_sem() does not (locking was removed by commit a97955844807 ("ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()"). This can result in the following situation while executing the reproducer [1] : Consider a child process in exit_sem() and the parent in freeary() (because of semctl(sid[i], NSEM, IPC_RMID)). - The list_proc for the child contains the last two undo structs A and B (the rest have been removed either by exit_sem() or freeary()). - The semid for A is 1 and semid for B is 2. - exit_sem() removes A and at the same time freeary() removes B. - Since A and B have different semid sem_lock() will acquire different locks for each process and both can proceed. The bug is that they remove A and B from the same list_proc at the same time because only freeary() acquires the ulp lock. When exit_sem() removes A it makes ulp->list_proc.next to point at B and at the same time freeary() removes B setting B->semid=-1. At the next iteration of for(;;) loop exit_sem() will try to remove B. The only way to break from for(;;) is for (&un->list_proc == &ulp->list_proc) to be true which is not. Then exit_sem() will check if B->semid=-1 which is and will continue looping in for(;;) until the memory for B is reallocated and the value at B->semid is changed. At that point, exit_sem() will crash attempting to unlink B from the lists (this can be easily triggered by running the reproducer [1] a second time). To prove this scenario instrumentation was added to keep information about each sem_undo (un) struct that is removed per process and per semaphore set (sma). CPU0 CPU1 [caller holds sem_lock(sma for A)] ... freeary() exit_sem() ... ... ... sem_lock(sma for B) spin_lock(A->ulp->lock) ... list_del_rcu(un_A->list_proc) list_del_rcu(un_B->list_proc) Undo structures A and B have different semid and sem_lock() operations proceed. However they belong to the same list_proc list and they are removed at the same time. This results into ulp->list_proc.next pointing to the address of B which is already removed. After reverting commit a97955844807 ("ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()") the issue was no longer reproducible. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1694779 Link: http://lkml.kernel.org/r/20191211191318.11860-1-ioanna-maria.alifieraki@canonical.com Fixes: a97955844807 ("ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()") Signed-off-by: Ioanna Alifieraki <ioanna-maria.alifieraki@canonical.com> Acked-by: Manfred Spraul <manfred@colorfullife.com> Acked-by: Herton R. Krzesinski <herton@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: <malat@debian.org> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jay Vosburgh <jay.vosburgh@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21y2038: hide timeval/timespec/itimerval/itimerspec typesArnd Bergmann
There are no in-kernel users remaining, but there may still be users that include linux/time.h instead of sys/time.h from user space, so leave the types available to user space while hiding them from kernel space. Only the __kernel_old_* versions of these types remain now. Link: http://lkml.kernel.org/r/20200110154232.4104492-4-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21y2038: remove unused time32 interfacesArnd Bergmann
No users remain, so kill these off before we grow new ones. Link: http://lkml.kernel.org/r/20200110154232.4104492-3-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21y2038: remove ktime to/from timespec/timeval conversionArnd Bergmann
A couple of helpers are now obsolete and can be removed, so drivers can no longer start using them and instead use y2038-safe interfaces. Link: http://lkml.kernel.org/r/20200110154232.4104492-2-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21ACPI: PM: s2idle: Check fixed wakeup events in acpi_s2idle_wake()Rafael J. Wysocki
Commit fdde0ff8590b ("ACPI: PM: s2idle: Prevent spurious SCIs from waking up the system") overlooked the fact that fixed events can wake up the system too and broke RTC wakeup from suspend-to-idle as a result. Fix this issue by checking the fixed events in acpi_s2idle_wake() in addition to checking wakeup GPEs and break out of the suspend-to-idle loop if the status bits of any enabled fixed events are set then. Fixes: fdde0ff8590b ("ACPI: PM: s2idle: Prevent spurious SCIs from waking up the system") Reported-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-21enetc: remove "depends on (ARCH_LAYERSCAPE || COMPILE_TEST)"Vladimir Oltean
ARCH_LAYERSCAPE isn't needed for this driver, it builds and sends/receives traffic without this config option just fine. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-21tc-testing: updated tdc tests for basic filter with u16 extended match rulesRoman Mashak
Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-21net: page_pool: Add documentation on page_pool APIIlias Apalodimas
Add documentation explaining the basic functionality and design principles of the API Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-21Merge tag 'drm-intel-fixes-2020-02-20' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes drm/i915 fixes for v5.6-rc3: - Workaround missing Display Stream Compression (DSC) state readout by forcing modeset when its enabled at probe - Fix EHL port clock voltage level requirements - Fix queuing retire workers on the virtual engine - Fix use of partially initialized waiters - Stop using drm_pci_alloc/drm_pci/free - Fix rewind of RING_TAIL by forcing a context reload - Fix locking on resetting ring->head - Propagate our bug filing URL change to stable kernels Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/87y2sxtsrd.fsf@intel.com
2020-02-21Merge tag 'drm-misc-fixes-2020-02-20' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes drm-misc-fixes for v5.6-rc3: - Fix dt binding for sunxi. - Allow only 1 rotation argument, and allow 0 rotation in video cmdline. - Small compiler warning fix for panfrost. - Fix when using performance counters in panfrost when using per fd address space. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/f5a6370d-9898-6c72-43e4-5bb56a99b6f2@linux.intel.com
2020-02-20docs/bpf: Update bpf development Q/A fileYonghong Song
bpf now has its own mailing list bpf@vger.kernel.org. Update the bpf_devel_QA.rst file to reflect this. Also llvm has switch to github with llvm and clang in the same repo https://github.com/llvm/llvm-project.git. Update the QA file with newer build instructions. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200221004354.930952-1-yhs@fb.com
2020-02-20selftests/bpf: Fix trampoline_count clean up logicAndrii Nakryiko
Libbpf's Travis CI tests caught this issue. Ensure bpf_link and bpf_object clean up is performed correctly. Fixes: d633d57902a5 ("selftest/bpf: Add test for allowed trampolines count") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20200220230546.769250-1-andriin@fb.com
2020-02-20Merge branch 'set_attach_target'Alexei Starovoitov
Eelco Chaudron says: ==================== Currently when you want to attach a trace program to a bpf program the section name needs to match the tracepoint/function semantics. However the addition of the bpf_program__set_attach_target() API allows you to specify the tracepoint/function dynamically. The call flow would look something like this: xdp_fd = bpf_prog_get_fd_by_id(id); trace_obj = bpf_object__open_file("func.o", NULL); prog = bpf_object__find_program_by_title(trace_obj, "fentry/myfunc"); bpf_program__set_expected_attach_type(prog, BPF_TRACE_FENTRY); bpf_program__set_attach_target(prog, xdp_fd, "xdpfilt_blk_all"); bpf_object__load(trace_obj) v1 -> v2: Remove requirement for attach type hint in API v2 -> v3: Moved common warning to __find_vmlinux_btf_id, requested by Andrii Updated the xdp_bpf2bpf test to use this new API v3 -> v4: Split up patch, update libbpf.map version v4 -> v5: Fix return code, and prog assignment in test case ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-02-20selftests/bpf: Update xdp_bpf2bpf test to use new set_attach_target APIEelco Chaudron
Use the new bpf_program__set_attach_target() API in the xdp_bpf2bpf selftest so it can be referenced as an example on how to use it. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/158220520562.127661.14289388017034825841.stgit@xdp-tutorial
2020-02-20libbpf: Add support for dynamic program attach targetEelco Chaudron
Currently when you want to attach a trace program to a bpf program the section name needs to match the tracepoint/function semantics. However the addition of the bpf_program__set_attach_target() API allows you to specify the tracepoint/function dynamically. The call flow would look something like this: xdp_fd = bpf_prog_get_fd_by_id(id); trace_obj = bpf_object__open_file("func.o", NULL); prog = bpf_object__find_program_by_title(trace_obj, "fentry/myfunc"); bpf_program__set_expected_attach_type(prog, BPF_TRACE_FENTRY); bpf_program__set_attach_target(prog, xdp_fd, "xdpfilt_blk_all"); bpf_object__load(trace_obj) Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/158220519486.127661.7964708960649051384.stgit@xdp-tutorial
2020-02-20libbpf: Bump libpf current version to v0.0.8Eelco Chaudron
New development cycles starts, bump to v0.0.8. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/158220518424.127661.8278643006567775528.stgit@xdp-tutorial
2020-02-20Merge branch 'bnxt_en-shutdown-and-kexec-kdump-related-fixes'David S. Miller
Michael Chan says: ==================== bnxt_en: shutdown and kexec/kdump related fixes. 2 small patches to fix kexec shutdown and kdump kernel driver init issues. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20bnxt_en: Issue PCIe FLR in kdump kernel to cleanup pending DMAs.Vasundhara Volam
If crashed kernel does not shutdown the NIC properly, PCIe FLR is required in the kdump kernel in order to initialize all the functions properly. Fixes: d629522e1d66 ("bnxt_en: Reduce memory usage when running in kdump kernel.") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20bnxt_en: Improve device shutdown method.Vasundhara Volam
Especially when bnxt_shutdown() is called during kexec, we need to disable MSIX and disable Bus Master to completely quiesce the device. Make these 2 calls unconditionally in the shutdown method. Fixes: c20dc142dd7b ("bnxt_en: Disable bus master during PCI shutdown and driver unload.") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20net: netlink: cap max groups which will be considered in netlink_bind()Nikolay Aleksandrov
Since nl_groups is a u32 we can't bind more groups via ->bind (netlink_bind) call, but netlink has supported more groups via setsockopt() for a long time and thus nlk->ngroups could be over 32. Recently I added support for per-vlan notifications and increased the groups to 33 for NETLINK_ROUTE which exposed an old bug in the netlink_bind() code causing out-of-bounds access on archs where unsigned long is 32 bits via test_bit() on a local variable. Fix this by capping the maximum groups in netlink_bind() to BITS_PER_TYPE(u32), effectively capping them at 32 which is the minimum of allocated groups and the maximum groups which can be bound via netlink_bind(). CC: Christophe Leroy <christophe.leroy@c-s.fr> CC: Richard Guy Briggs <rgb@redhat.com> Fixes: 4f520900522f ("netlink: have netlink per-protocol bind function return an error code.") Reported-by: Erhard F. <erhard_f@mailbox.org> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20Merge branch '1GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 1GbE Intel Wired LAN Driver Updates 2020-02-19 This series contains updates to e1000e and igc drivers. Ben Dooks adds a missing cpu_to_le64() in the e1000e transmit ring flush function. Jia-Ju Bai replaces a couple of udelay() with usleep_range() where we could sleep while holding a spinlock in e1000e. Chen Zhou make 2 functions static in igc, Sasha finishes the legacy power management support in igc by adding resume and schedule suspend requests. Also added register dump functionality in the igc driver. Added device id support for the next generation of i219 devices in e1000e. Fixed a typo in the igc driver that referenced a device that is not support in the driver. Added the missing PTP support when suspending now that igc has legacy power management support. Added PCIe error detection, slot reset and resume capability in igc. Added WoL support for igc as well. Lastly, added a code comment to distinguish between interrupt and flag definitions. Vitaly adds device id support for Tiger Lake platforms, which has another next generation of i219 device in e1000e. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20net: thunderx: workaround BGX TX Underflow issueTim Harvey
While it is not yet understood why a TX underflow can easily occur for SGMII interfaces resulting in a TX wedge. It has been found that disabling/re-enabling the LMAC resolves the issue. Signed-off-by: Tim Harvey <tharvey@gateworks.com> Reviewed-by: Robert Jones <rjones@gateworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20ionic: fix fw_status readShannon Nelson
The fw_status field is only 8 bits, so fix the read. Also, we only want to look at the one status bit, to allow for future use of the other bits, and watch for a bad PCI read. Fixes: 97ca486592c0 ("ionic: add heartbeat check") Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20Merge branch 'next-integrity' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull IMA fixes from Mimi Zohar: "Two bug fixes and an associated change for each. The one that adds SM3 to the IMA list of supported hash algorithms is a simple change, but could be considered a new feature" * 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: ima: add sm3 algorithm to hash algorithm configuration list crypto: rename sm3-256 to sm3 in hash_algo_name efi: Only print errors about failing to get certs if EFI vars are found x86/ima: use correct identifier for SetupMode variable
2020-02-20Merge branch '100GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 100GbE Intel Wired LAN Driver Updates 2020-02-19 This series contains updates to the ice driver only. Avinash adds input validation for software DCB configurations received via lldptool or pcap to ensure bad bandwidth inputs are not inputted which could cause the loss of link. Paul update the malicious driver detection event messages to rate limit once per second and to include the total number of receive|transmit MDD event count. Dan updates how TCAM entries are managed to ensure when overriding pre-existing TCAM entries, properly delete the existing entry and remove it from the change/update list. Brett ensures we clear the relevant values in the QRXFLXP_CNTXT register for VF queues to ensure the receive queue data is not stale. Avinash adds required DCBNL operations for configuring ETS in software DCB CEE mode. Also added code to detect if DCB is in IEEE or CEE mode to properly report what mode we are in. Dave fixes the driver to properly report the current maximum TC, not the maximum allowed number of TCs. Krzysztof adds support for AF_XDP feature in the ice driver. Jake increases the maximum time that the driver will wait for a PR reset to account for possibility of a slightly longer than expected PD reset. Jesse fixes a number of strings which did not have line feeds, so add line feeds so that messages do not rum together, creating a jumbled mess. Bruce adds support for additional E810 and E823 device ids. Also updated the product name change for E822 devices. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20net: disable BRIDGE_NETFILTER by defaultRoman Kiryanov
The description says 'If unsure, say N.' but the module is built as M by default (once the dependencies are satisfied). When the module is selected (Y or M), it enables NETFILTER_FAMILY_BRIDGE and SKB_EXTENSIONS which alter kernel internal structures. We (Android Studio Emulator) currently do not use this module and think this it is more consistent to have it disabled by default as opposite to disabling it explicitly to prevent enabling NETFILTER_FAMILY_BRIDGE and SKB_EXTENSIONS. Signed-off-by: Roman Kiryanov <rkir@google.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20net: macb: Properly handle phylink on at91rm9200Alexandre Belloni
at91ether_init was handling the phy mode and speed but since the switch to phylink, the NCFGR register got overwritten by macb_mac_config(). The issue is that the RM9200_RMII bit and the MACB_CLK_DIV32 field are cleared but never restored as they conflict with the PAE, GBE and PCSSEL bits. Add new capability to differentiate between EMAC and the other versions of the IP and use it to set and avoid clearing the relevant bits. Also, this fixes a NULL pointer dereference in macb_mac_link_up as the EMAC doesn't use any rings/bufffers/queues. Fixes: 7897b071ac3b ("net: macb: convert to phylink") Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20sched: Provide cant_migrate()Thomas Gleixner
Some code pathes rely on preempt_disable() to prevent migration on a non RT enabled kernel. These preempt_disable/enable() pairs are substituted by migrate_disable/enable() pairs or other forms of RT specific protection. On RT these protections prevent migration but not preemption. Obviously a cant_sleep() check in such a section will trigger on RT because preemption is not disabled. Provide a cant_migrate() macro which maps to cant_sleep() on a non RT kernel and an empty placeholder for RT for now. The placeholder will be changed to a proper debug check along with the RT specific migration protection mechanism. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200214161503.070487511@linutronix.de
2020-02-20sched/rt: Provide migrate_disable/enable() inlinesThomas Gleixner
Code which solely needs to prevent migration of a task uses preempt_disable()/enable() pairs. This is the only reliable way to do so as setting the task affinity to a single CPU can be undone by a setaffinity operation from a different task/process. RT provides a seperate migrate_disable/enable() mechanism which does not disable preemption to achieve the semantic requirements of a (almost) fully preemptible kernel. As it is unclear from looking at a given code path whether the intention is to disable preemption or migration, introduce migrate_disable/enable() inline functions which can be used to annotate code which merely needs to disable migration. Map them to preempt_disable/enable() for now. The RT substitution will be provided later. Code which is annotated that way documents that it has no requirement to protect against reentrancy of a preempting task. Either this is not required at all or the call sites are already serialized by other means. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/878slclv1u.fsf@nanos.tec.linutronix.de
2020-02-20libbpf: Relax check whether BTF is mandatoryAndrii Nakryiko
If BPF program is using BTF-defined maps, BTF is required only for libbpf itself to process map definitions. If after that BTF fails to be loaded into kernel (e.g., if it doesn't support BTF at all), this shouldn't prevent valid BPF program from loading. Existing retry-without-BTF logic for creating maps will succeed to create such maps without any problems. So, presence of .maps section shouldn't make BTF required for kernel. Update the check accordingly. Validated by ensuring simple BPF program with BTF-defined maps is still loaded on old kernel without BTF support and map is correctly parsed and created. Fixes: abd29c931459 ("libbpf: allow specifying map definitions using BTF") Reported-by: Julia Kartseva <hex@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200220062635.1497872-1-andriin@fb.com
2020-02-20Merge branch 's390-fixes'David S. Miller
Julian Wiedmann says: ==================== s390/qeth: fixes 2020-02-20 please apply the following patch series for qeth to netdev's net tree. This corrects three minor issues: 1) return a more fitting errno when VNICC cmds are not supported, 2) remove a bogus WARN in the NAPI code, and 3) be _very_ pedantic about the RX copybreak. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20s390/qeth: fix off-by-one in RX copybreak checkJulian Wiedmann
The RX copybreak is intended as the _max_ value where the frame's data should be copied. So for frame_len == copybreak, don't build an SG skb. Fixes: 4a71df50047f ("qeth: new qeth device driver") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20s390/qeth: don't warn for napi with 0 budgetJulian Wiedmann
Calling napi->poll() with 0 budget is a legitimate use by netpoll. Fixes: a1c3ed4c9ca0 ("qeth: NAPI support for l2 and l3 discipline") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20s390/qeth: vnicc Fix EOPNOTSUPP precedenceAlexandra Winter
When getting or setting VNICC parameters, the error code EOPNOTSUPP should have precedence over EBUSY. EBUSY is used because vnicc feature and bridgeport feature are mutually exclusive, which is a temporary condition. Whereas EOPNOTSUPP indicates that the HW does not support all or parts of the vnicc feature. This issue causes the vnicc sysfs params to show 'blocked by bridgeport' for HW that does not support VNICC at all. Fixes: caa1f0b10d18 ("s390/qeth: add VNICC enable/disable support") Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20net: use netif_is_bridge_port() to check for IFF_BRIDGE_PORTJulian Wiedmann
Trivial cleanup, so that all bridge port-specific code can be found in one go. CC: Johannes Berg <johannes@sipsolutions.net> CC: Roopa Prabhu <roopa@cumulusnetworks.com> CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20net: page_pool: API cleanup and commentsIlias Apalodimas
Functions starting with __ usually indicate those which are exported, but should not be called directly. Update some of those declared in the API and make it more readable. page_pool_unmap_page() and page_pool_release_page() were doing exactly the same thing calling __page_pool_clean_page(). Let's rename __page_pool_clean_page() to page_pool_release_page() and export it in order to show up on perf logs and get rid of page_pool_unmap_page(). Finally rename __page_pool_put_page() to page_pool_put_page() since we can now directly call it from drivers and rename the existing page_pool_put_page() to page_pool_put_full_page() since they do the same thing but the latter is trying to sync the full DMA area. This patch also updates netsec, mvneta and stmmac drivers which use those functions. Suggested-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20Merge branch 'mlxsw-Preparation-for-RTNL-removal'David S. Miller
Ido Schimmel says: ==================== mlxsw: Preparation for RTNL removal The driver currently acquires RTNL in its route insertion path, which contributes to very large control plane latencies. This patch set prepares mlxsw for RTNL removal from its route insertion path in a follow-up patch set. Patches #1-#2 protect shared resources - KVDL and counter pool - with their own locks. All allocations of these resources are currently performed under RTNL, so no locks were required. Patches #3-#7 ensure that updates to mirroring sessions only take place in case there are active mirroring sessions. This allows us to avoid taking RTNL when it is unnecessary, as updating of the mirroring sessions must be performed under RTNL for the time being. Patches #8-#10 replace the use of APIs that assume that RTNL is taken with their RCU counterparts. Specifically, patches #8 and #9 replace __in_dev_get_rtnl() with __in_dev_get_rcu() under RCU read-side critical section. Patch #10 replaces __dev_get_by_index() with dev_get_by_index_rcu(). Patches #11-#15 perform small adjustments in the code to make it easier to later introduce a router lock instead of relying on RTNL. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20mlxsw: spectrum_nve: Make tunnel initialization symmetricIdo Schimmel
The device supports a single VTEP whose configuration is shared between all VXLAN tunnels. While the shared configuration is cleared upon the destruction of the last tunnel - in mlxsw_sp_nve_tunnel_fini() - it is set in mlxsw_sp_nve_fid_enable(), after calling mlxsw_sp_nve_tunnel_init(). Make tunnel initialization and destruction symmetric and set the configuration in mlxsw_sp_nve_tunnel_init(). This will later allow us to protect the shared configuration with a lock. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20mlxsw: spectrum: Export function to check if RIF existsIdo Schimmel
After the previous patch, all the callers of mlxsw_sp_rif_find_by_dev() outside of the routing code use it to understand if a RIF exists for the passed netdev. Therefore, export a function to check if a RIF exists and make mlxsw_sp_rif_find_by_dev() internal to the routing code. This will later allow us to more easily introduce the router lock which will also protect the RIFs. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20mlxsw: spectrum: Prevent RIF access outside of routing codeIdo Schimmel
There are currently 5 users of mlxsw_sp_rif_find_by_dev() outside of the routing code. Only one call site actually needs to dereference the router interface (RIF). The rest merely need to know if a RIF exists for the provided netdev. Convert this call site to query the needed information directly from the routing code instead of dereferencing the RIF. This will later allow us to replace mlxsw_sp_rif_find_by_dev() with a function that checks if a RIF exist. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-20mlxsw: spectrum_router: Prepare function for router lock introductionIdo Schimmel
The function de-associates the port-vlan from its router interface (RIF). It is called both from the netdev notifier block and the inetaddr notifier block that will soon hold the router lock. Make sure that router code calls the internal version, as it will already have the router lock held when the function is called. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>