summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2017-02-03net: remove support for per driver ndo_busy_poll()Eric Dumazet
We added generic support for busy polling in NAPI layer in linux-4.5 No network driver uses ndo_busy_poll() anymore, we can get rid of the pointer in struct net_device_ops, and its use in sk_busy_loop() Saves NETIF_F_BUSY_POLL features bit. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03base/memory, hotplug: fix a kernel oops in show_valid_zones()Toshi Kani
Reading a sysfs "memoryN/valid_zones" file leads to the following oops when the first page of a range is not backed by struct page. show_valid_zones() assumes that 'start_pfn' is always valid for page_zone(). BUG: unable to handle kernel paging request at ffffea017a000000 IP: show_valid_zones+0x6f/0x160 This issue may happen on x86-64 systems with 64GiB or more memory since their memory block size is bumped up to 2GiB. [1] An example of such systems is desribed below. 0x3240000000 is only aligned by 1GiB and this memory block starts from 0x3200000000, which is not backed by struct page. BIOS-e820: [mem 0x0000003240000000-0x000000603fffffff] usable Since test_pages_in_a_zone() already checks holes, fix this issue by extending this function to return 'valid_start' and 'valid_end' for a given range. show_valid_zones() then proceeds with the valid range. [1] 'Commit bdee237c0343 ("x86: mm: Use 2GB memory block size on large-memory x86-64 systems")' Link: http://lkml.kernel.org/r/20170127222149.30893-3-toshi.kani@hpe.com Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Zhang Zhen <zhenzhang.zhang@huawei.com> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> [4.4+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree, they are: 1) Stash ctinfo 3-bit field into pointer to nf_conntrack object from sk_buff so we only access one single cacheline in the conntrack hotpath. Patchset from Florian Westphal. 2) Don't leak pointer to internal structures when exporting x_tables ruleset back to userspace, from Willem DeBruijn. This includes new helper functions to copy data to userspace such as xt_data_to_user() as well as conversions of our ip_tables, ip6_tables and arp_tables clients to use it. Not surprinsingly, ebtables requires an ad-hoc update. There is also a new field in x_tables extensions to indicate the amount of bytes that we copy to userspace. 3) Add nf_log_all_netns sysctl: This new knob allows you to enable logging via nf_log infrastructure for all existing netnamespaces. Given the effort to provide pernet syslog has been discontinued, let's provide a way to restore logging using netfilter kernel logging facilities in trusted environments. Patch from Michal Kubecek. 4) Validate SCTP checksum from conntrack helper, from Davide Caratti. 5) Merge UDPlite conntrack and NAT helpers into UDP, this was mostly a copy&paste from the original helper, from Florian Westphal. 6) Reset netfilter state when duplicating packets, also from Florian. 7) Remove unnecessary check for broadcast in IPv6 in pkttype match and nft_meta, from Liping Zhang. 8) Add missing code to deal with loopback packets from nft_meta when used by the netdev family, also from Liping. 9) Several cleanups on nf_tables, one to remove unnecessary check from the netlink control plane path to add table, set and stateful objects and code consolidation when unregister chain hooks, from Gao Feng. 10) Fix harmless reference counter underflow in IPVS that, however, results in problems with the introduction of the new refcount_t type, from David Windsor. 11) Enable LIBCRC32C from nf_ct_sctp instead of nf_nat_sctp, from Davide Caratti. 12) Missing documentation on nf_tables uapi header, from Liping Zhang. 13) Use rb_entry() helper in xt_connlimit, from Geliang Tang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03sched: cls_flower: expose priority to offloading netdeviceJiri Pirko
The driver that offloads flower rules needs to know with which priority user inserted the rules. So add this information into offload struct. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03lib: Introduce priority array area managerJiri Pirko
This introduces a infrastructure for management of linear priority areas. Priority order in an array matters, however order of items inside a priority group does not matter. As an initial implementation, L-sort algorithm is used. It is quite trivial. More advanced algorithm called P-sort will be introduced as a follow-up. The infrastructure is prepared for other algos. Alongside this, a testing module is introduced as well. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03list: introduce list_for_each_entry_from_reverse helperJiri Pirko
Similar to list_for_each_entry_continue and its reverse variant list_for_each_entry_continue_reverse, introduce reverse helper for list_for_each_entry_from. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03trace: rename trace_print_hex_seq arg and add kdocDaniel Borkmann
Steven suggested to improve trace_print_hex_seq() a bit after commit 2acae0d5b0f7 ("trace: add variant without spacing in trace_print_hex_seq") in two ways: i) by adding a kdoc comment for the helper function itself and ii) by renaming 'spacing' argument into 'concatenate' to better denote that we don't add spaces between each hex bytes. Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03bridge: uapi: add per vlan tunnel infoRoopa Prabhu
New nested netlink attribute to associate tunnel info per vlan. This is used by bridge driver to send tunnel metadata to bridge ports in vlan tunnel mode. This patch also adds new per port flag IFLA_BRPORT_VLAN_TUNNEL to enable vlan tunnel mode. off by default. One example use for this is a vxlan bridging gateway or vtep which maps vlans to vn-segments (or vnis). User can configure per-vlan tunnel information which the bridge driver can use to bridge vlan into the corresponding vn-segment. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03vxlan: support fdb and learning in COLLECT_METADATA modeRoopa Prabhu
Vxlan COLLECT_METADATA mode today solves the per-vni netdev scalability problem in l3 networks. It expects all forwarding information to be present in dst_metadata. This patch series enhances collect metadata mode to include the case where only vni is present in dst_metadata, and the vxlan driver can then use the rest of the forwarding information datbase to make forwarding decisions. There is no change to default COLLECT_METADATA behaviour. These changes only apply to COLLECT_METADATA when used with the bridging use-case with a special dst_metadata tunnel info flag (eg: where vxlan device is part of a bridge). For all this to work, the vxlan driver will need to now support a single fdb table hashed by mac + vni. This series essentially makes this happen. use-case and workflow: vxlan collect metadata device participates in bridging vlan to vn-segments. Bridge driver above the vxlan device, sends the vni corresponding to the vlan in the dst_metadata. vxlan driver will lookup forwarding database with (mac + vni) for the required remote destination information to forward the packet. Changes introduced by this patch: - allow learning and forwarding database state in vxlan netdev in COLLECT_METADATA mode. Current behaviour is not changed by default. tunnel info flag IP_TUNNEL_INFO_BRIDGE is used to support the new bridge friendly mode. - A single fdb table hashed by (mac, vni) to allow fdb entries with multiple vnis in the same fdb table - rx path already has the vni - tx path expects a vni in the packet with dst_metadata - prior to this series, fdb remote_dsts carried remote vni and the vxlan device carrying the fdb table represented the source vni. With the vxlan device now representing multiple vnis, this patch adds a src vni attribute to the fdb entry. The remote vni already uses NDA_VNI attribute. This patch introduces NDA_SRC_VNI netlink attribute to represent the src vni in a multi vni fdb table. iproute2 example (patched and pruned iproute2 output to just show relevant fdb entries): example shows same host mac learnt on two vni's. before (netdev per vni): $bridge fdb show | grep "00:02:00:00:00:03" 00:02:00:00:00:03 dev vxlan1001 dst 12.0.0.8 self 00:02:00:00:00:03 dev vxlan1000 dst 12.0.0.8 self after this patch with collect metadata in bridged mode (single netdev): $bridge fdb show | grep "00:02:00:00:00:03" 00:02:00:00:00:03 dev vxlan0 src_vni 1001 dst 12.0.0.8 self 00:02:00:00:00:03 dev vxlan0 src_vni 1000 dst 12.0.0.8 self Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03ip_tunnels: new IP_TUNNEL_INFO_BRIDGE flag for ip_tunnel_info modeRoopa Prabhu
New ip_tunnel_info flag to represent bridged tunnel metadata. Used by bridge driver later in the series to pass per vlan dst metadata to bridge ports. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03net/sched: act_ife: Change to use ife moduleYotam Gigi
Use the encode/decode functionality from the ife module instead of using implementation inside the act_ife. Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03net: Introduce ife encapsulation moduleYotam Gigi
This module is responsible for the ife encapsulation protocol encode/decode logics. That module can: - ife_encode: encode skb and reserve space for the ife meta header - ife_decode: decode skb and extract the meta header size - ife_tlv_meta_encode - encodes one tlv entry into the reserved ife header space. - ife_tlv_meta_decode - decodes one tlv entry from the packet - ife_tlv_meta_next - advance to the next tlv Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03net/sched: act_ife: Unexport ife_tlv_meta_encodeYotam Gigi
As the function ife_tlv_meta_encode is not used by any other module, unexport it and make it static for the act_ife module. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03Merge tag 'drm-fixes-for-v4.10-rc7' of ↵Linus Torvalds
git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "Another fixes pull for v4.10, it's a bit big due to the backport of the VMA fixes for i915 that should fix the oops on shutdown problems that you've worked around. There are also two drm core connector registration fixes, a bunch of nouveau regression fixes and two AMD fixes" * tag 'drm-fixes-for-v4.10-rc7' of git://people.freedesktop.org/~airlied/linux: drm/radeon: Fix vram_size/visible values in DRM_RADEON_GEM_INFO ioctl drm/amdgpu/si: fix crash on headless asics drm/i915: Track pinned vma in intel_plane_state drm/atomic: Unconditionally call prepare_fb. drm/atomic: Fix double free in drm_atomic_state_default_clear drm/nouveau/kms/nv50: request vblank events for commits that send completion events drm/nouveau/nv1a,nv1f/disp: fix memory clock rate retrieval drm/nouveau/disp/gt215: Fix HDA ELD handling (thus, HDMI audio) on gt215 drm/nouveau/nouveau/led: prevent compiling the led-code if nouveau=y and leds=m drm/nouveau/disp/mcp7x: disable dptmds workaround drm/nouveau: prevent userspace from deleting client object drm/nouveau/fence/g84-: protect against concurrent access to semaphore buffers drm: Don't race connector registration drm: prevent double-(un)registration for connectors
2017-02-03firmware: qcom: scm: Fix interrupted SCM callsAndy Gross
This patch adds a Qualcomm specific quirk to the arm_smccc_smc call. On Qualcomm ARM64 platforms, the SMC call can return before it has completed. If this occurs, the call can be restarted, but it requires using the returned session ID value from the interrupted SMC call. The quirk stores off the session ID from the interrupted call in the quirk structure so that it can be used by the caller. This patch folds in a fix given by Sricharan R: https://lkml.org/lkml/2016/9/28/272 Signed-off-by: Andy Gross <andy.gross@linaro.org> Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-02-03arm: kernel: Add SMC structure parameterAndy Gross
This patch adds a quirk parameter to the arm_smccc_(smc/hvc) calls. The quirk structure allows for specialized SMC operations due to SoC specific requirements. The current arm_smccc_(smc/hvc) is renamed and macros are used instead to specify the standard arm_smccc_(smc/hvc) or the arm_smccc_(smc/hvc)_quirk function. This patch and partial implementation was suggested by Will Deacon. Signed-off-by: Andy Gross <andy.gross@linaro.org> Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-02-03Merge branch 'modversions' (modversions fixes for powerpc from Ard)Linus Torvalds
Merge kcrctab entry fixes from Ard Biesheuvel: "This is a followup to [0] 'modversions: redefine kcrctab entries as relative CRC pointers', but since relative CRC pointers do not work in modules, and are actually only needed by powerpc with CONFIG_RELOCATABLE=y, I have made it a Kconfig selectable feature instead. First it introduces the MODULE_REL_CRCS Kconfig symbol, and adds the kbuild handling of it, i.e., modpost, genksyms and kallsyms. Then it switches all architectures to 32-bit CRC entries in kcrctab, where all architectures except powerpc with CONFIG_RELOCATABLE=y use absolute ELF symbol references as before" [0] http://marc.info/?l=linux-arch&m=148493613415294&w=2 * emailed patches from Ard Biesheuvel: module: unify absolute krctab definitions for 32-bit and 64-bit modversions: treat symbol CRCs as 32 bit quantities kbuild: modversions: add infrastructure for emitting relative CRCs
2017-02-03log2: make order_base_2() behave correctly on const input value zeroArd Biesheuvel
The function order_base_2() is defined (according to the comment block) as returning zero on input zero, but subsequently passes the input into roundup_pow_of_two(), which is explicitly undefined for input zero. This has gone unnoticed until now, but optimization passes in GCC 7 may produce constant folded function instances where a constant value of zero is passed into order_base_2(), resulting in link errors against the deliberately undefined '____ilog2_NaN'. So update order_base_2() to adhere to its own documented interface. [ See http://marc.info/?l=linux-kernel&m=147672952517795&w=2 and follow-up discussion for more background. The gcc "optimization pass" is really just broken, but now the GCC trunk problem seems to have escaped out of just specially built daily images, so we need to work around it in mainline. - Linus ] Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03module: unify absolute krctab definitions for 32-bit and 64-bitArd Biesheuvel
The previous patch introduced a separate inline asm version of the krcrctab declaration template for use with 64-bit architectures, which cannot refer to ELF symbols using 32-bit quantities. This declaration should be equivalent to the C one for 32-bit architectures, but just in case - unify them in a separate patch, which can simply be dropped if it turns out to break anything. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03modversions: treat symbol CRCs as 32 bit quantitiesArd Biesheuvel
The modversion symbol CRCs are emitted as ELF symbols, which allows us to easily populate the kcrctab sections by relying on the linker to associate each kcrctab slot with the correct value. This has a couple of downsides: - Given that the CRCs are treated as memory addresses, we waste 4 bytes for each CRC on 64 bit architectures, - On architectures that support runtime relocation, a R_<arch>_RELATIVE relocation entry is emitted for each CRC value, which identifies it as a quantity that requires fixing up based on the actual runtime load offset of the kernel. This results in corrupted CRCs unless we explicitly undo the fixup (and this is currently being handled in the core module code) - Such runtime relocation entries take up 24 bytes of __init space each, resulting in a x8 overhead in [uncompressed] kernel size for CRCs. Switching to explicit 32 bit values on 64 bit architectures fixes most of these issues, given that 32 bit values are not treated as quantities that require fixing up based on the actual runtime load offset. Note that on some ELF64 architectures [such as PPC64], these 32-bit values are still emitted as [absolute] runtime relocatable quantities, even if the value resolves to a build time constant. Since relative relocations are always resolved at build time, this patch enables MODULE_REL_CRCS on powerpc when CONFIG_RELOCATABLE=y, which turns the absolute CRC references into relative references into .rodata where the actual CRC value is stored. So redefine all CRC fields and variables as u32, and redefine the __CRC_SYMBOL() macro for 64 bit builds to emit the CRC reference using inline assembler (which is necessary since 64-bit C code cannot use 32-bit types to hold memory addresses, even if they are ultimately resolved using values that do not exceed 0xffffffff). To avoid potential problems with legacy 32-bit architectures using legacy toolchains, the equivalent C definition of the kcrctab entry is retained for 32-bit architectures. Note that this mostly reverts commit d4703aefdbc8 ("module: handle ppc64 relocating kcrctabs when CONFIG_RELOCATABLE=y") Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03tcp: add tcp_mss_clamp() helperEric Dumazet
Small cleanup factorizing code doing the TCP_MAXSEG clamping. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03ipv6: sr: remove cleanup flag and fix HMAC computationDavid Lebrun
In the latest version of the IPv6 Segment Routing IETF draft [1] the cleanup flag is removed and the flags field length is shrunk from 16 bits to 8 bits. As a consequence, the input of the HMAC computation is modified in a non-backward compatible way by covering the whole octet of flags instead of only the cleanup bit. As such, if an implementation compatible with the latest draft computes the HMAC of an SRH who has other flags set to 1, then the HMAC result would differ from the current implementation. This patch carries those modifications to prevent conflict with other implementations of IPv6 SR. [1] https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-05 Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03ACPI: Add support for ResourceSource/IRQ domain mappingAgustin Vega-Frias
ACPI extended IRQ resources may contain a ResourceSource to specify an alternate interrupt controller. Introduce acpi_irq_get and use it to implement ResourceSource/IRQ domain mapping. The new API is similar to of_irq_get and allows re-initialization of a platform resource from the ACPI extended IRQ resource, and provides proper behavior for probe deferral when the domain is not yet present when called. Acked-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org> Tested-by: Hanjun Guo <hanjun.guo@linaro.org> Signed-off-by: Agustin Vega-Frias <agustinv@codeaurora.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-02-03[media] v4l2-subdev.h: fix v4l2_subdev_pad_config documentationBaruch Siach
The fields of v4l2_subdev_pad_config are not pointers. Fixes: 21c29de1d09 ("[media] v4l2-subdev.h: Improve documentation") Cc: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-02-03Merge remote-tracking branches 'regmap/topic/doc' and 'regmap/topic/rbtree' ↵Mark Brown
into regmap-next
2017-02-03vmw_vmci: switch to pci_irq_alloc_vectorsChristoph Hellwig
Cleans up the IRQ management code a lot, including removing a lot of state from the per-device structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-03[media] media: Protect enable_source and disable_source handler code pathsShuah Khan
Drivers might try to access and run enable_source and disable_source handlers when the driver that implements these handlers is clearing the handlers during its unregister. Fix the following race condition: process 1 process 2 request video streaming unbind au0828 v4l2 checks if tuner is free ... ... au0828_unregister_media_device() ... ... (doesn't hold graph_mutex) mdev->enable_source = NULL; if (mdev && mdev->enable_source) mdev->disable_source = NULL; mdev->enable_source() (enable_source holds graph_mutex) As shown above enable_source check is done without holding the graph_mutex. If unbind happens to be in progress, au0828 could clear enable_source and disable_source handlers leading to null pointer de-reference. Fix it by protecting enable_source and disable_source set and clear and protecting enable_source and disable_source handler access and the call itself. process 1 process 2 request video streaming unbind au0828 v4l2 checks if tuner is free ... ... au0828_unregister_media_device() ... ... (hold graph_mutex while clearing) mdev->enable_source = NULL; if (mdev) mdev->disable_source = NULL; (hold graph_mutex to check and call enable_source) if (mdev->enable_source) mdev->enable_source() If graph_mutex is held to just heck for handler being null and needs to be released before calling the handler, there will be another window for the handlers to be cleared. Hence, enable_source and disable_source handlers no longer hold the graph_mutex and expect callers to hold it to avoid forcing them release the graph_mutex before calling the handlers. Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-02-03serdev: add a tty port controller driverRob Herring
Add a serdev controller driver for tty ports. The controller is registered with serdev when tty ports are registered with the TTY core. As the TTY core is built-in only, this has the side effect of making serdev built-in as well. Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-By: Sebastian Reichel <sre@kernel.org> Tested-By: Sebastian Reichel <sre@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-03serdev: Introduce new bus for serial attached devicesRob Herring
The serdev bus is designed for devices such as Bluetooth, WiFi, GPS and NFC connected to UARTs on host processors. Tradionally these have been handled with tty line disciplines, rfkill, and userspace glue such as hciattach. This approach has many drawbacks since it doesn't fit into the Linux driver model. Handling of sideband signals, power control and firmware loading are the main issues. This creates a serdev bus with controllers (i.e. host serial ports) and attached devices. Typically, these are point to point connections, but some devices have muxing protocols or a h/w mux is conceivable. Any muxing is not yet supported with the serdev bus. Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-By: Sebastian Reichel <sre@kernel.org> Tested-By: Sebastian Reichel <sre@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-03tty_port: Add port client functionsRob Herring
Introduce a client (upward direction) operations struct for tty_port clients. Initially supported operations are for receiving data and write wake-up. This will allow for having clients other than an ldisc. Convert the calls to the ldisc to use the client ops as the default operations. Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-By: Sebastian Reichel <sre@kernel.org> Tested-By: Sebastian Reichel <sre@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-02net: add LINUX_MIB_PFMEMALLOCDROP counterEric Dumazet
Debugging issues caused by pfmemalloc is often tedious. Add a new SNMP counter to more easily diagnose these problems. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Josef Bacik <jbacik@fb.com> Acked-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-02net: phy: marvell: Add support for 88e1545 PHYAndrew Lunn
The 88e1545 PHYs are discrete Marvell PHYs, found in a quad package on the zii-devel-b board. Add support for it to the Marvell PHY driver. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-02unix: add ioctl to open a unix socket file with O_PATHAndrey Vagin
This ioctl opens a file to which a socket is bound and returns a file descriptor. The caller has to have CAP_NET_ADMIN in the socket network namespace. Currently it is impossible to get a path and a mount point for a socket file. socket_diag reports address, device ID and inode number for unix sockets. An address can contain a relative path or a file may be moved somewhere. And these properties say nothing about a mount namespace and a mount point of a socket file. With the introduced ioctl, we can get a path by reading /proc/self/fd/X and get mnt_id from /proc/self/fdinfo/X. In CRIU we are going to use this ioctl to dump and restore unix socket. Here is an example how it can be used: $ strace -e socket,bind,ioctl ./test /tmp/test_sock socket(AF_UNIX, SOCK_STREAM, 0) = 3 bind(3, {sa_family=AF_UNIX, sun_path="test_sock"}, 11) = 0 ioctl(3, SIOCUNIXFILE, 0) = 4 ^Z $ ss -a | grep test_sock u_str LISTEN 0 1 test_sock 17798 * 0 $ ls -l /proc/760/fd/{3,4} lrwx------ 1 root root 64 Feb 1 09:41 3 -> 'socket:[17798]' l--------- 1 root root 64 Feb 1 09:41 4 -> /tmp/test_sock $ cat /proc/760/fdinfo/4 pos: 0 flags: 012000000 mnt_id: 40 $ cat /proc/self/mountinfo | grep "^40\s" 40 19 0:37 / /tmp rw shared:23 - tmpfs tmpfs rw Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-02net: phy: Marvell: Add mv88e6390 internal PHYAndrew Lunn
The mv88e6390 Ethernet switch has internal PHYs. These PHYs don't have an model ID in the ID2 register. So the MDIO driver in the switch intercepts reads to this register, and returns the switch family ID. Extend the Marvell PHY driver by including this ID, and treat the PHY as a 88E1540. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
All merge conflicts were simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-02Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Five kernel fixes: - an mmap tracing ABI fix for certain mappings - a use-after-free fix, found via KASAN - three CPU hotplug related x86 PMU driver fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Make package handling more robust perf/x86/intel/uncore: Clean up hotplug conversion fallout perf/x86/intel/rapl: Make package handling more robust perf/core: Fix PERF_RECORD_MMAP2 prot/flags for anonymous memory perf/core: Fix use-after-free bug
2017-02-02blktrace: make do_blk_trace_setup() staticOmar Sandoval
This isn't used outside of blktrace.c anymore. Fixes: 62c2a7d969f3 ("block: push BKL into blktrace ioctls") Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02block: fix debugfs config conditional in struct request_queueOmar Sandoval
The debugfs dentries are only used for CONFIG_BLK_DEBUG_FS, so make them conditional on that instead of CONFIG_DEBUG_FS. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02debugfs: add debugfs_lookup()Omar Sandoval
We don't always have easy access to the dentry of a file or directory we created in debugfs. Add a helper which allows us to get a dentry we previously created. The motivation for this change is a problem with blktrace and the blk-mq debugfs entries introduced in 07e4fead45e6 ("blk-mq: create debugfs directory tree"). Namely, in some cases, the directory that blktrace needs to create may already exist, but in other cases, it may not. We _could_ rely on a bunch of implied knowledge to decide whether to create the directory or not, but it's much cleaner on our end to just look it up. Signed-off-by: Omar Sandoval <osandov@fb.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02ext4: move halfmd4 into hash.c directlyJason A. Donenfeld
The "half md4" transform should not be used by any new code. And fortunately, it's only used now by ext4. Since ext4 supports several hashing methods, at some point it might be desirable to move to something like SipHash. As an intermediate step, remove half md4 from cryptohash.h and lib, and make it just a local function in ext4's hash.c. There's precedent for doing this; the other function ext can use for its hashes -- TEA -- is also implemented in the same place. Also, by being a local function, this might allow gcc to perform some additional optimizations. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-02scsi, block: fix duplicate bdi name registration crashesDan Williams
Warnings of the following form occur because scsi reuses a devt number while the block layer still has it referenced as the name of the bdi [1]: WARNING: CPU: 1 PID: 93 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80 sysfs: cannot create duplicate filename '/devices/virtual/bdi/8:192' [..] Call Trace: dump_stack+0x86/0xc3 __warn+0xcb/0xf0 warn_slowpath_fmt+0x5f/0x80 ? kernfs_path_from_node+0x4f/0x60 sysfs_warn_dup+0x62/0x80 sysfs_create_dir_ns+0x77/0x90 kobject_add_internal+0xb2/0x350 kobject_add+0x75/0xd0 device_add+0x15a/0x650 device_create_groups_vargs+0xe0/0xf0 device_create_vargs+0x1c/0x20 bdi_register+0x90/0x240 ? lockdep_init_map+0x57/0x200 bdi_register_owner+0x36/0x60 device_add_disk+0x1bb/0x4e0 ? __pm_runtime_use_autosuspend+0x5c/0x70 sd_probe_async+0x10d/0x1c0 async_run_entry_fn+0x39/0x170 This is a brute-force fix to pass the devt release information from sd_probe() to the locations where we register the bdi, device_add_disk(), and unregister the bdi, blk_cleanup_queue(). Thanks to Omar for the quick reproducer script [2]. This patch survives where an unmodified kernel fails in a few seconds. [1]: https://marc.info/?l=linux-scsi&m=147116857810716&w=4 [2]: http://marc.info/?l=linux-block&m=148554717109098&w=2 Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: Bart Van Assche <bart.vanassche@sandisk.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Jan Kara <jack@suse.cz> Reported-by: Omar Sandoval <osandov@osandov.com> Tested-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02block: Get rid of blk_get_backing_dev_info()Jan Kara
blk_get_backing_dev_info() is now a simple dereference. Remove that function and simplify some code around that. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02block: Make blk_get_backing_dev_info() safe without open bdevJan Kara
Currenly blk_get_backing_dev_info() is not safe to be called when the block device is not open as bdev->bd_disk is NULL in that case. However inode_to_bdi() uses this function and may be call called from flusher worker or other writeback related functions without bdev being open which leads to crashes such as: [113031.075540] Unable to handle kernel paging request for data at address 0x00000000 [113031.075614] Faulting instruction address: 0xc0000000003692e0 0:mon> t [c0000000fb65f900] c00000000036cb6c writeback_sb_inodes+0x30c/0x590 [c0000000fb65fa10] c00000000036ced4 __writeback_inodes_wb+0xe4/0x150 [c0000000fb65fa70] c00000000036d33c wb_writeback+0x30c/0x450 [c0000000fb65fb40] c00000000036e198 wb_workfn+0x268/0x580 [c0000000fb65fc50] c0000000000f3470 process_one_work+0x1e0/0x590 [c0000000fb65fce0] c0000000000f38c8 worker_thread+0xa8/0x660 [c0000000fb65fd80] c0000000000fc4b0 kthread+0x110/0x130 [c0000000fb65fe30] c0000000000098f0 ret_from_kernel_thread+0x5c/0x6c Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02block: Dynamically allocate and refcount backing_dev_infoJan Kara
Instead of storing backing_dev_info inside struct request_queue, allocate it dynamically, reference count it, and free it when the last reference is dropped. Currently only request_queue holds the reference but in the following patch we add other users referencing backing_dev_info. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02block: Use pointer to backing_dev_info from request_queueJan Kara
We will want to have struct backing_dev_info allocated separately from struct request_queue. As the first step add pointer to backing_dev_info to request_queue and convert all users touching it. No functional changes in this patch. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02block: Unhash block device inodes on gendisk destructionJan Kara
Currently, block device inodes stay around after corresponding gendisk hash died until memory reclaim finds them and frees them. Since we will make block device inode pin the bdi, we want to free the block device inode as soon as the device goes away so that bdi does not stay around unnecessarily. Furthermore we need to avoid issues when new device with the same major,minor pair gets created since reusing the bdi structure would be rather difficult in this case. Unhashing block device inode on gendisk destruction nicely deals with these problems. Once last block device inode reference is dropped (which may be directly in del_gendisk()), the inode gets evicted. Furthermore if the major,minor pair gets reallocated, we are guaranteed to get new block device inode even if old block device inode is not yet evicted and thus we avoid issues with possible reuse of bdi. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02netfilter: allow logging from non-init namespacesMichal Kubeček
Commit 69b34fb996b2 ("netfilter: xt_LOG: add net namespace support for xt_LOG") disabled logging packets using the LOG target from non-init namespaces. The motivation was to prevent containers from flooding kernel log of the host. The plan was to keep it that way until syslog namespace implementation allows containers to log in a safe way. However, the work on syslog namespace seems to have hit a dead end somewhere in 2013 and there are users who want to use xt_LOG in all network namespaces. This patch allows to do so by setting /proc/sys/net/netfilter/nf_log_all_netns to a nonzero value. This sysctl is only accessible from init_net so that one cannot switch the behaviour from inside a container. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-02ipvs: free ip_vs_dest structs when refcnt=0David Windsor
Currently, the ip_vs_dest cache frees ip_vs_dest objects when their reference count becomes < 0. Aside from not being semantically sound, this is problematic for the new type refcount_t, which will be introduced shortly in a separate patch. refcount_t is the new kernel type for holding reference counts, and provides overflow protection and a constrained interface relative to atomic_t (the type currently being used for kernel reference counts). Per Julian Anastasov: "The problem is that dest_trash currently holds deleted dests (unlinked from RCU lists) with refcnt=0." Changing dest_trash to hold dest with refcnt=1 will allow us to free ip_vs_dest structs when their refcnt=0, in ip_vs_dest_put_and_free(). Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-02netfilter: merge ctinfo into nfct pointer storage areaFlorian Westphal
After this change conntrack operations (lookup, creation, matching from ruleset) only access one instead of two sk_buff cache lines. This works for normal conntracks because those are allocated from a slab that guarantees hw cacheline or 8byte alignment (whatever is larger) so the 3 bits needed for ctinfo won't overlap with nf_conn addresses. Template allocation now does manual address alignment (see previous change) on arches that don't have sufficent kmalloc min alignment. Some spots intentionally use skb->_nfct instead of skb_nfct() helpers, this is to avoid undoing the skb_nfct() use when we remove untracked conntrack object in the future. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-02netfilter: guarantee 8 byte minalign for template addressesFlorian Westphal
The next change will merge skb->nfct pointer and skb->nfctinfo status bits into single skb->_nfct (unsigned long) area. For this to work nf_conn addresses must always be aligned at least on an 8 byte boundary since we will need the lower 3bits to store nfctinfo. Conntrack templates are allocated via kmalloc. kbuild test robot reported BUILD_BUG_ON failed: NFCT_INFOMASK >= ARCH_KMALLOC_MINALIGN on v1 of this patchset, so not all platforms meet this requirement. Do manual alignment if needed, the alignment offset is stored in the nf_conn entry protocol area. This works because templates are not handed off to L4 protocol trackers. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>