summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-06-29fs/binfmt_flat.c: make load_flat_shared_library() workJann Horn
load_flat_shared_library() is broken: It only calls load_flat_file() if prepare_binprm() returns zero, but prepare_binprm() returns the number of bytes read - so this only happens if the file is empty. Instead, call into load_flat_file() if the number of bytes read is non-negative. (Even if the number of bytes is zero - in that case, load_flat_file() will see nullbytes and return a nice -ENOEXEC.) In addition, remove the code related to bprm creds and stop using prepare_binprm() - this code is loading a library, not a main executable, and it only actually uses the members "buf", "file" and "filename" of the linux_binprm struct. Instead, call kernel_read() directly. Link: http://lkml.kernel.org/r/20190524201817.16509-1-jannh@google.com Fixes: 287980e49ffc ("remove lots of IS_ERR_VALUE abuses") Signed-off-by: Jann Horn <jannh@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-29mm/mempolicy.c: fix an incorrect rebind node in mpol_rebind_nodemaskzhong jiang
mpol_rebind_nodemask() is called for MPOL_BIND and MPOL_INTERLEAVE mempoclicies when the tasks's cpuset's mems_allowed changes. For policies created without MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES, it works by remapping the policy's allowed nodes (stored in v.nodes) using the previous value of mems_allowed (stored in w.cpuset_mems_allowed) as the domain of map and the new mems_allowed (passed as nodes) as the range of the map (see the comment of bitmap_remap() for details). The result of remapping is stored back as policy's nodemask in v.nodes, and the new value of mems_allowed should be stored in w.cpuset_mems_allowed to facilitate the next rebind, if it happens. However, 213980c0f23b ("mm, mempolicy: simplify rebinding mempolicies when updating cpusets") introduced a bug where the result of remapping is stored in w.cpuset_mems_allowed instead. Thus, a mempolicy's allowed nodes can evolve in an unexpected way after a series of rebinding due to cpuset mems_allowed changes, possibly binding to a wrong node or a smaller number of nodes which may e.g. overload them. This patch fixes the bug so rebinding again works as intended. [vbabka@suse.cz: new changlog] Link: http://lkml.kernel.org/r/ef6a69c6-c052-b067-8f2c-9d615c619bb9@suse.cz Link: http://lkml.kernel.org/r/1558768043-23184-1-git-send-email-zhongjiang@huawei.com Fixes: 213980c0f23b ("mm, mempolicy: simplify rebinding mempolicies when updating cpusets") Signed-off-by: zhong jiang <zhongjiang@huawei.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Oscar Salvador <osalvador@suse.de> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-29fs/proc/array.c: allow reporting eip/esp for all coredumping threadsJohn Ogness
0a1eb2d474ed ("fs/proc: Stop reporting eip and esp in /proc/PID/stat") stopped reporting eip/esp and fd7d56270b52 ("fs/proc: Report eip/esp in /prod/PID/stat for coredumping") reintroduced the feature to fix a regression with userspace core dump handlers (such as minicoredumper). Because PF_DUMPCORE is only set for the primary thread, this didn't fix the original problem for secondary threads. Allow reporting the eip/esp for all threads by checking for PF_EXITING as well. This is set for all the other threads when they are killed. coredump_wait() waits for all the tasks to become inactive before proceeding to invoke a core dumper. Link: http://lkml.kernel.org/r/87y32p7i7a.fsf@linutronix.de Link: http://lkml.kernel.org/r/20190522161614.628-1-jlu@pengutronix.de Fixes: fd7d56270b526ca3 ("fs/proc: Report eip/esp in /prod/PID/stat for coredumping") Signed-off-by: John Ogness <john.ogness@linutronix.de> Reported-by: Jan Luebbe <jlu@pengutronix.de> Tested-by: Jan Luebbe <jlu@pengutronix.de> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-29mm/dev_pfn: exclude MEMORY_DEVICE_PRIVATE while computing virtual addressAnshuman Khandual
The presence of struct page does not guarantee linear mapping for the pfn physical range. Device private memory which is non-coherent is excluded from linear mapping during devm_memremap_pages() though they will still have struct page coverage. Change pfn_t_to_virt() to just check for device private memory before giving out virtual address for a given pfn. pfn_t_to_virt() actually has no callers. Let's fix it for the 5.2 kernel and remove it in 5.3. Link: http://lkml.kernel.org/r/1558089514-25067-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-29iwlwifi: mvm: clear rfkill_safe_init_done when we start the firmwareEmmanuel Grumbach
Otherwise it'll stay set forever which is clearly buggy. Cc: stable@vger.kernel.org Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: don't WARN when calling iwl_get_shared_mem_conf with RF-KillEmmanuel Grumbach
iwl_mvm_send_cmd returns 0 when the command won't be sent because RF-Kill is asserted. Do the same when we call iwl_get_shared_mem_conf since it is not sent through iwl_mvm_send_cmd but directly calls the transport layer. Cc: stable@vger.kernel.org Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: pcie: don't service an interrupt that was maskedEmmanuel Grumbach
Sometimes the register status can include interrupts that were masked. We can, for example, get the RF-Kill bit set in the interrupt status register although this interrupt was masked. Then if we get the ALIVE interrupt (for example) that was not masked, we need to *not* service the RF-Kill interrupt. Fix this in the MSI-X interrupt handler. Cc: stable@vger.kernel.org Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: fix RF-Kill interrupt while FW load for gen2 devicesEmmanuel Grumbach
Newest devices have a new firmware load mechanism. This mechanism is called the context info. It means that the driver doesn't need to load the sections of the firmware. The driver rather prepares a place in DRAM, with pointers to the relevant sections of the firmware, and the firmware loads itself. At the end of the process, the firmware sends the ALIVE interrupt. This is different from the previous scheme in which the driver expected the FH_TX interrupt after each section being transferred over the DMA. In order to support this new flow, we enabled all the interrupts. This broke the assumption that we have in the code that the RF-Kill interrupt can't interrupt the firmware load flow. Change the context info flow to enable only the ALIVE interrupt, and re-enable all the other interrupts only after the firmware is alive. Then, we won't see the RF-Kill interrupt until then. Getting the RF-Kill interrupt while loading the firmware made us kill the firmware while it is loading and we ended up dumping garbage instead of the firmware state. Re-enable the ALIVE | RX interrupts from the ISR when we get the ALIVE interrupt to be able to get the RX interrupt that comes immediately afterwards for the ALIVE notification. This is needed for non MSI-X only. Cc: stable@vger.kernel.org Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: pcie: fix ALIVE interrupt handling for gen2 devices w/o MSI-XEmmanuel Grumbach
We added code to restock the buffer upon ALIVE interrupt when MSI-X is disabled. This was added as part of the context info code. This code was added only if the ISR debug level is set which is very unlikely to be related. Move this code to run even when the ISR debug level is not set. Note that gen2 devices work with MSI-X in most cases so that this path is seldom used. Cc: stable@vger.kernel.org Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: mvm: delay GTK setting in FW in AP modeJohannes Berg
In AP (and IBSS) mode, we can only set GTKs to firmware after we have sent down the multicast station, but this we can only do after we've enabled beaconing, etc. However, during rfkill exit, hostapd will configure the keys before starting the AP, and cfg80211/mac80211 accept it happily. On earlier devices, this didn't bother us as GTK TX wasn't really handled in firmware, we just put the key material into the TX cmd and thus it only mattered when we actually transmitted a frame. On newer devices, however, the firmware needs to track all of this and that doesn't work if we add the key before the (multicast) sta it belongs to. To fix this, keep a list of keys to add during AP enable, and call the function there. Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: mvm: remove MAC_FILTER_IN_11AX for AP modeLuca Coelho
The FW API was clarified saying that this flag should only be set in BSS client mode. Remove it from the MAC_CTXT command we send in AP and GO modes. Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Fixes: 3b5ee8dd8bb1 ("iwlwifi: mvm: set MAC_FILTER_IN_11AX in AP mode") Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: dbg: debug recording stop and restart command removeShahar S Matityahu
The 0xF6 command used to start and stop the recording from 22560 devices was removed. This is causing an assert when the driver tries to alter the recording state. Remove the use of the command. Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: dbg: don't stop dbg recording before entering D3 from 9000 devicesShahar S Matityahu
From 9000 device family the FW automatically stops the debug recording and the driver should not stop it as well. Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: dbg_ini: fix debug monitor stop and restart in ini modeShahar S Matityahu
In ini debug mode the recording does not restart unless legacy monitor configuration is also given. Add dbg_ini_dest field to trans to indicate the debug monitor destination to solve this. Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: mvm: make the usage of TWT configurableEmmanuel Grumbach
TWT is still very new and we expect issues. Make its usage configurable and disable it by default. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: support FSEQ TLV even when FMAC is not compiledEmmanuel Grumbach
FSEQ TLV should be parsed and read even when FMAC is not compiled. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: dbg: move trans debug fields to a separate structShahar S Matityahu
Unite iwl_trans debug related fields under iwl_trans_debug struct to increase readability and keep iwl_trans clean. Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: dbg_ini: remove redundant checking of ini modeShahar S Matityahu
There are several flows where the driver checks if it runs in ini mode. Some of these flows are no longer used in ini mode or there is another condition that check the ini mode in the same flow. Either way, those conditions are redundant. Remove the redundant conditions. Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: dbg_ini: enforce apply point early on buffer allocation tlvShahar S Matityahu
Apply buffer allocation TLV only if it is set to apply point IWL_FW_INI_APPLY_EARLY. Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: dbg: fix debug monitor stop and restart delaysShahar S Matityahu
The driver should delay only in recording stop flow between writing to DBGC_IN_SAMPLE register and DBGC_OUT_CTRL register. Any other delay is not needed. Change the following: 1. Remove any unnecessary delays in the flow 2. Increase the delay in the stop recording flow since 100 micro is not enough 3. Use usleep_range instead of delay since the driver is allowed to sleep in this flow. Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Fixes: 5cfe79c8d92a ("iwlwifi: fw: stop and start debugging using host command") Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: pcie: increase the size of PCI dumpsLuca Coelho
Currently we dump only the first 64 bytes of the PCI config space, which leaves out some important things, such as the base address registers. Increase it to 352 for the PCI device and to 524 for the rootport to make sure we include everything we need. Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: mvm: Drop large non sta framesAndrei Otcheretianski
In some buggy scenarios we could possible attempt to transmit frames larger than maximum MSDU size. Since our devices don't know how to handle this, it may result in asserts, hangs etc. This can happen, for example, when we receive a large multicast frame and try to transmit it back to the air in AP mode. Since in a legal scenario this should never happen, drop such frames and warn about it. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: mvm: Add log information about SAR statusHaim Dreyfuss
Inform users when SAR status is changing. Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: Add support for SAR South Korea limitationHaim Dreyfuss
South Korea is adding a more strict SAR limit called "Limb SAR". Currently, WGDS SAR offset group 3 is not used (not mapped to any country). In order to be able to comply with South Korea new restriction: - OEM will use WGDS SAR offset group 3 to South Korea limitation. - OEM will change WGDS revision to 1 (currently latest revision is 0) to notify that Korea Limb SAR applied. - Driver will read the WGDS table and pass the values to FW (as usual) - Driver will pass to FW an indication that Korea Limb SAR is applied in case table revision is 1. Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: fix module init error pathsJohannes Berg
When the module fails to initialize for some reason, it doesn't clean up properly. Fix that. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: mvm: convert to FW AC when configuring MU EDCAShaul Triebitz
The AC numbers used by mac80211 differ from those used by the firmware. When setting MU EDCA params for each AC, use the correct FW AC numbers. Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: mvm: correctly fill the ac array in the iwl_mac_ctx_cmdNaftali Goldstein
The indexes into the ac array in the iwl_mac_ctx_cmd are from the iwl_ac enum and not the txfs. The current code therefore puts the edca params in the wrong indexes of the array, causing wrong priority for data-streams of different ACs. Fix this. Note that this bug only occurs in NICs that use the new tx api, since in the old tx api the txf number is equal to the corresponding ac in the iwl_ac enum. Signed-off-by: Naftali Goldstein <naftali.goldstein@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: remove some unnecessary NULL checksDan Carpenter
These pointers are an offset into the "sta" struct. They're assigned like this: const struct ieee80211_sta_vht_cap *vht_cap = &sta->vht_cap; They're not the first member of the struct (->supp_rates[] is first) so they can't be NULL. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: d3: Use struct_size() helperGustavo A. R. Silva
Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes, in particular in the context in which this code is being used. So, change the following form: sizeof(*pattern_cmd) + wowlan->n_patterns * sizeof(struct iwlagn_wowlan_pattern) to : struct_size(pattern_cmd, patterns, wowlan->n_patterns) This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: lib: Use struct_size() helperGustavo A. R. Silva
Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes, in particular in the context in which this code is being used. So, change the following form: sizeof(*pattern_cmd) + wowlan->n_patterns * sizeof(struct iwlagn_wowlan_pattern) to : struct_size(pattern_cmd, patterns, wowlan->n_patterns) This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: fw api: support adwell HB default APs number apiShahar S Matityahu
Support adaptive dwell high band default number of APs new api. Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: mvm: remove multiple debugfs entriesMordechay Goodstein
Now that we have per station control over amsdu size no need for multiple entries, especially that the old one is misleading due to not setting it for all protocols as a limit. Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: dbg_ini: implement dump info collectionShahar S Matityahu
The info struct contains data about the FW, HW, RF and the debug configuration. Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: mvm: add a debugfs entry to set a fixed size AMSDU for all TX packetsMordechay Goodstein
The current debugfs entry only limits the max AMSDU for TCP. Add a new debugfs entry to allow setting a fixed AMSDU size for all TX packets, including UDP and ICMP Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: dbg_ini: support debug info TLVShahar S Matityahu
Add support to debug info TLV. The TLV contains human readable naming of the FW image and the debug configuration. Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: dbg_ini: use different barker for ini dumpShahar S Matityahu
Use a different barker for ini dump to allow differentiation from legacy dump. Also it allows to remove INI_BIT from dump TLVs. Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: dbg_ini: add consecutive trigger firing supportShahar S Matityahu
When a dump trigger is fired, the driver sets IWL_FWRT_STATUS_DUMPING and aborts any consecutive dump collection. To allow consecutive triggers firing, use 5 dump workers and allocate them upon incoming dump collection requests. This functionality is needed since in ini debug mode each trigger may have entirely different memory regions to collect unlike the legacy mode in which all the triggers dump the same memory regions. Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: dbg_ini: abort region collection in case the size is 0Shahar S Matityahu
Allows to abort region collection in case the region size is 0. It is needed for future regions that their size might be 0. Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: update CSI APIJohannes Berg
Update the CSI API to the new version supported by the firmware. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: dbg_ini: dump headers cleanupShahar S Matityahu
Unite dump memory ranges under a single struct and add a specific header for each type of memory. Also, maintain a single version to all dump structures. This cleanup is also needed for the future addition of FW notification regions and others. Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: dbg: allow dump collection in case of an early errorShahar S Matityahu
Improve the robustness of the dump collection flow in case of an early error: 1. in iwl_trans_pcie_sync_nmi, disable and enable interrupts only if they were already enabled 2. attempt to initiate dump collection in iwl_fw_dbg_error_collect only if the device is enabled 3. check Tx command queue was already allocated before trying to collect it Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29iwlwifi: iwl_mvm_tx_mpdu() must be called with BH disabledJiri Kosina
As iwl_mvm_tx_mpdu() is not disabling BH while obtaining iwl_mvm_sta->lock (which is being taken from BH context as well), it has to be always invoked with BH disabled. Make that clear in a comment. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
2019-06-29bpf: fix uapi bpf_prog_info fields alignmentBaruch Siach
Merge commit 1c8c5a9d38f60 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next") undid the fix from commit 36f9814a494 ("bpf: fix uapi hole for 32 bit compat applications") by taking the gpl_compatible 1-bit field definition from commit b85fab0e67b162 ("bpf: Add gpl_compatible flag to struct bpf_prog_info") as is. That breaks architectures with 16-bit alignment like m68k. Add 31-bit pad after gpl_compatible to restore alignment of following fields. Thanks to Dmitry V. Levin his analysis of this bug history. Signed-off-by: Baruch Siach <baruch@tkos.co.il> Acked-by: Song Liu <songliubraving@fb.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-29Merge branch 'bpf-lookup-devmap'Daniel Borkmann
Toke Høiland-Jørgensen says: ==================== When using the bpf_redirect_map() helper to redirect packets from XDP, the eBPF program cannot currently know whether the redirect will succeed, which makes it impossible to gracefully handle errors. To properly fix this will probably require deeper changes to the way TX resources are allocated, but one thing that is fairly straight forward to fix is to allow lookups into devmaps, so programs can at least know when a redirect is *guaranteed* to fail because there is no entry in the map. Currently, programs work around this by keeping a shadow map of another type which indicates whether a map index is valid. This series contains two changes that are complementary ways to fix this issue: - Moving the map lookup into the bpf_redirect_map() helper (and caching the result), so the helper can return an error if no value is found in the map. This includes a refactoring of the devmap and cpumap code to not care about the index on enqueue. - Allowing regular lookups into devmaps from eBPF programs, using the read-only flag to make sure they don't change the values. The performance impact of the series is negligible, in the sense that I cannot measure it because the variance between test runs is higher than the difference pre/post series. Changelog: v6: - Factor out list handling in maps to a helper in list.h (new patch 1) - Rename variables in struct bpf_redirect_info (new patch 3 + patch 4) - Explain why we are clearing out the map in the info struct on lookup failure - Remove unneeded check for forwarding target in tracepoint macro v5: - Rebase on latest bpf-next. - Update documentation for bpf_redirect_map() with the new meaning of flags. v4: - Fix a few nits from Andrii - Lose the #defines in bpf.h and just compare the flags argument directly to XDP_TX in bpf_xdp_redirect_map(). v3: - Adopt Jonathan's idea of using the lower two bits of the flag value as the return code. - Always do the lookup, and cache the result for use in xdp_do_redirect(); to achieve this, refactor the devmap and cpumap code to get rid the bitmap for selecting which devices to flush. v2: - For patch 1, make it clear that the change works for any map type. - For patch 2, just use the new BPF_F_RDONLY_PROG flag to make the return value read-only. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-29devmap: Allow map lookups from eBPFToke Høiland-Jørgensen
We don't currently allow lookups into a devmap from eBPF, because the map lookup returns a pointer directly to the dev->ifindex, which shouldn't be modifiable from eBPF. However, being able to do lookups in devmaps is useful to know (e.g.) whether forwarding to a specific interface is enabled. Currently, programs work around this by keeping a shadow map of another type which indicates whether a map index is valid. Since we now have a flag to make maps read-only from the eBPF side, we can simply lift the lookup restriction if we make sure this flag is always set. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-29bpf_xdp_redirect_map: Perform map lookup in eBPF helperToke Høiland-Jørgensen
The bpf_redirect_map() helper used by XDP programs doesn't return any indication of whether it can successfully redirect to the map index it was given. Instead, BPF programs have to track this themselves, leading to programs using duplicate maps to track which entries are populated in the devmap. This patch fixes this by moving the map lookup into the bpf_redirect_map() helper, which makes it possible to return failure to the eBPF program. The lower bits of the flags argument is used as the return code, which means that existing users who pass a '0' flag argument will get XDP_ABORTED. With this, a BPF program can check the return code from the helper call and react by, for instance, substituting a different redirect. This works for any type of map used for redirect. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-29devmap: Rename ifindex member in bpf_redirect_infoToke Høiland-Jørgensen
The bpf_redirect_info struct has an 'ifindex' member which was named back when the redirects could only target egress interfaces. Now that we can also redirect to sockets and CPUs, this is a bit misleading, so rename the member to tgt_index. Reorder the struct members so we can have 'tgt_index' and 'tgt_value' next to each other in a subsequent patch. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-29devmap/cpumap: Use flush list instead of bitmapToke Høiland-Jørgensen
The socket map uses a linked list instead of a bitmap to keep track of which entries to flush. Do the same for devmap and cpumap, as this means we don't have to care about the map index when enqueueing things into the map (and so we can cache the map lookup). Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-29xskmap: Move non-standard list manipulation to helperToke Høiland-Jørgensen
Add a helper in list.h for the non-standard way of clearing a list that is used in xskmap. This makes it easier to reuse it in the other map types, and also makes sure this usage is not forgotten in any list refactorings in the future. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-29selftests/bpf: fix -Wstrict-aliasing in test_sockopt_sk.cStanislav Fomichev
Let's use union with u8[4] and u32 members for sockopt buffer, that should fix any possible aliasing issues. test_sockopt_sk.c: In function ‘getsetsockopt’: test_sockopt_sk.c:115:2: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing] if (*(__u32 *)buf != 0x55AA*2) { ^~ test_sockopt_sk.c:116:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing] log_err("Unexpected getsockopt(SO_SNDBUF) 0x%x != 0x55AA*2", ^~~~~~~ Fixes: 8a027dc0d8f5 ("selftests/bpf: add sockopt test that exercises sk helpers") Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>