linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2018-07-18	mtd: rawnand: provide only single helper function for ECC conf	Abhishek Sahu
	Function nand_ecc_choose_conf() will be help for all the cases, so other helper functions can be made static. nand_check_ecc_caps(): Invoke nand_ecc_choose_conf() with both chip->ecc.size and chip->ecc.strength value set. nand_maximize_ecc(): Invoke nand_ecc_choose_conf() with NAND_ECC_MAXIMIZE flag. nand_match_ecc_req(): Invoke nand_ecc_choose_conf() with either chip->ecc.size or chip->ecc.strength value set and without NAND_ECC_MAXIMIZE flag. CC: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Abhishek Sahu <absahu@codeaurora.org> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2018-07-18	mtd: rawnand: helper function for setting up ECC configuration	Abhishek Sahu
	commit 2c8f8afa7f92 ("mtd: nand: add generic helpers to check, match, maximize ECC settings") provides generic helpers which drivers can use for setting up ECC parameters. Since same board can have different ECC strength nand chips so following is the logic for setting up ECC strength and ECC step size, which can be used by most of the drivers. 1. If both ECC step size and ECC strength are already set (usually by DT) then just check whether this setting is supported by NAND controller. 2. If NAND_ECC_MAXIMIZE is set, then select maximum ECC strength supported by NAND controller. 3. Otherwise, try to match the ECC step size and ECC strength closest to the chip's requirement. If available OOB size can't fit the chip requirement then select maximum ECC strength which can be fit with available OOB size. This patch introduces nand_ecc_choose_conf function which calls the required helper functions for the above logic. The drivers can use this single function instead of calling the 3 helper functions individually. Signed-off-by: Abhishek Sahu <absahu@codeaurora.org> Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2018-07-18	ALSA: pcm: Nuke snd_pcm_lib_mmap_vmalloc()	Takashi Iwai
	snd_pcm_lib_mmap_vmalloc() was supposed to be implemented with somewhat special for vmalloc handling, but in the end, this turned to just the default handler, i.e. NULL. As the situation has never changed over decades, let's rip it off. Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2018-07-18	dt-bindings: clock: add rk3399 DDR3 standard speed bins.	Enric Balletbo i Serra
	DDR3 SDRAM Standard (JESD79-3F) defines some standard speed bins for DDR3 memories. The rk3399_dmc driver allows you to pass these values via the device tree. For that purpose the devfreq/rk3399_dmc.txt binding refers to a ddr.h file which does not exist. This patch adds the missing defines in a include file called rk3399-ddr.h with the definition of standard speed bins according to the ARM Trusted Firmware (ATF). Fixes: c1ceb8f7c167 (Documentation: bindings: add dt documentation for rk3399 dmc) Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
2018-07-17	aio: don't expose __aio_sigset in uapi	Christoph Hellwig
	glibc uses a different defintion of sigset_t than the kernel does, and the current version would pull in both. To fix this just do not expose the type at all - this somewhat mirrors pselect() where we do not even have a type for the magic sigmask argument, but just use pointer arithmetics. Fixes: 7a074e96 ("aio: implement io_pgetevents") Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Adrian Reber <adrian@lisas.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-17	random: Return nbytes filled from hw RNG	Tobin C. Harding
	Currently the function get_random_bytes_arch() has return value 'void'. If the hw RNG fails we currently fall back to using get_random_bytes(). This defeats the purpose of requesting random material from the hw RNG in the first place. There are currently no intree users of get_random_bytes_arch(). Only get random bytes from the hw RNG, make function return the number of bytes retrieved from the hw RNG. Acked-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Tobin C. Harding <me@tobin.cc> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-07-18	net: phy: sfp: Add HWMON support for module sensors	Andrew Lunn
	SFP modules can contain a number of sensors. The EEPROM also contains recommended alarm and critical values for each sensor, and indications of if these have been exceeded. Export this information via HWMON. Currently temperature, VCC, bias current, transmit power, and possibly receiver power is supported. The sensors in the modules can either return calibrate or uncalibrated values. Uncalibrated values need to be manipulated, using coefficients provided in the SFP EEPROM. Uncalibrated receive power values require floating point maths in order to calibrate them. Performing this in the kernel is hard. So if the SFP module indicates it uses uncalibrated values, RX power is not made available. With this hwmon device, it is possible to view the sensor values using lm-sensors programs: in0: +3.29 V (crit min = +2.90 V, min = +3.00 V) (max = +3.60 V, crit max = +3.70 V) temp1: +33.0°C (low = -5.0°C, high = +80.0°C) (crit low = -10.0°C, crit = +85.0°C) power1: 1000.00 nW (max = 794.00 uW, min = 50.00 uW) ALARM (LCRIT) (lcrit = 40.00 uW, crit = 1000.00 uW) curr1: +0.00 A (crit min = +0.00 A, min = +0.00 A) ALARM (LCRIT, MIN) (max = +0.01 A, crit max = +0.01 A) The scaling sensors performs on the bias current is not particularly good. The raw values are more useful: curr1: curr1_input: 0.000 curr1_min: 0.002 curr1_max: 0.010 curr1_lcrit: 0.000 curr1_crit: 0.011 curr1_min_alarm: 1.000 curr1_max_alarm: 0.000 curr1_lcrit_alarm: 1.000 curr1_crit_alarm: 0.000 In order to keep the I2C overhead to a minimum, the constant values, such as limits and calibration coefficients are read once at module insertion time. Thus only reading _input and _alarm properties requires i2c read operations. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-18	hwmon: Add helper to tell if a char is invalid in a name	Andrew Lunn
	HWMON device names are not allowed to contain "-* \t\n". Add a helper which will return true if passed an invalid character. It can be used to massage a string into a hwmon compatible name by replacing invalid characters with '_'. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-18	hwmon: Add support for power min, lcrit, min_alarm and lcrit_alarm	Andrew Lunn
	Some sensors support reporting minimal and lower critical power, as well as alarms when these thresholds are reached. Add support for these attributes to the hwmon core. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-18	hwmon: Add missing HWMON_T_LCRIT_ALARM define	Andrew Lunn
	The enum hwmon_temp_lcrit_alarm exists, but the BIT definition is missing. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-17	ALSA: hda: Make audio component support more generic	Takashi Iwai
	This is the final step for more generic support of DRM audio component. The generic audio component code is now moved to its own file, and the symbols are renamed from snd_hac_i915_* to snd_hdac_acomp_*, respectively. The generic code is enabled via the new kconfig, CONFIG_SND_HDA_COMPONENT, while CONFIG_SND_HDA_I915 is kept as the super-class. Along with the split, three new callbacks are added to audio_ops: pin2port is for providing the conversion between the pin number and the widget id, and master_bind/master_unbin are called at binding / unbinding the master component, respectively. All these are optional, but used in i915 implementation and also other later implementations. A note about the new snd_hdac_acomp_init() function: there is a slight difference between this and the old snd_hdac_i915_init(). The latter (still) synchronizes with the master component binding, i.e. it assures that the relevant DRM component gets bound when it returns, or gives a negative error. Meanwhile the new function doesn't synchronize but just leaves as is. It's the responsibility by the caller's side to synchronize, or the caller may accept the asynchronous binding on the fly. v1->v2: Fix missing NULL check in master_bind/unbind Signed-off-by: Takashi Iwai <tiwai@suse.de>
2018-07-17	ALSA: hda/i915: Associate audio component with devres	Takashi Iwai
	The HD-audio i915 binding code contains a single pointer, hdac_acomp, for allowing the access to audio component from the master bind/unbind callbacks. This was needed because the callbacks pass only the device pointer and we can't guarantee the object type assigned to the drvdata (which is free for each controller driver implementation). And this implementation will be a problem if we support multiple components for different DRM drivers, not only i915. As a solution, allocate the audio component object via devres and associate it with the given device, so that the component callbacks can refer to it via devres_find(). The removal of the object is still done half-manually via devres_destroy() to make the code consistent (although it may work without the explicit call). Also, the snd_hda_i915_register_notifier() had the reference to hdac_acomp as well. In this patch, the corresponding code is removed by passing hdac_bus object to the function, too. Signed-off-by: Takashi Iwai <tiwai@suse.de>
2018-07-17	drm/i915: Split audio component to a generic type	Takashi Iwai
	For allowing other drivers to use the DRM audio component, rename the i915_audio_component_* with drm_audio_component_*, and split the generic part into drm_audio_component.h. The i915 specific stuff remains in struct i915_audio_component, which contains drm_audio_component as the base. The license of drm_audio_component.h is kept to MIT as same as the the original i915_component.h. This is a preliminary change for further development, and no functional changes by this patch itself, merely code-split and renames. v1->v2: Use SPDX for drm_audio_component.h, fix remaining i915 argument in drm_audio_component.h Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2018-07-17	netfilter: nf_tables: fix jumpstack depth validation	Taehee Yoo
	The level of struct nft_ctx is updated by nf_tables_check_loops(). That is used to validate jumpstack depth. But jumpstack validation routine doesn't update and validate recursively. So, in some cases, chain depth can be bigger than the NFT_JUMP_STACK_SIZE. After this patch, The jumpstack validation routine is located in the nft_chain_validate(). When new rules or new set elements are added, the nft_table_validate() is called by the nf_tables_newrule and the nf_tables_newsetelem. The nft_table_validate() calls the nft_chain_validate() that visit all their children chains recursively. So it can update depth of chain certainly. Reproducer: %cat ./test.sh #!/bin/bash nft add table ip filter nft add chain ip filter input { type filter hook input priority 0\; } for ((i=0;i<20;i++)); do nft add chain ip filter a$i done nft add rule ip filter input jump a1 for ((i=0;i<10;i++)); do nft add rule ip filter a$i jump a$((i+1)) done for ((i=11;i<19;i++)); do nft add rule ip filter a$i jump a$((i+1)) done nft add rule ip filter a10 jump a11 Result: [ 253.931782] WARNING: CPU: 1 PID: 0 at net/netfilter/nf_tables_core.c:186 nft_do_chain+0xacc/0xdf0 [nf_tables] [ 253.931915] Modules linked in: nf_tables nfnetlink ip_tables x_tables [ 253.932153] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.18.0-rc3+ #48 [ 253.932153] RIP: 0010:nft_do_chain+0xacc/0xdf0 [nf_tables] [ 253.932153] Code: 83 f8 fb 0f 84 c7 00 00 00 e9 d0 00 00 00 83 f8 fd 74 0e 83 f8 ff 0f 84 b4 00 00 00 e9 bd 00 00 00 83 bd 64 fd ff ff 0f 76 09 <0f> 0b 31 c0 e9 bc 02 00 00 44 8b ad 64 fd [ 253.933807] RSP: 0018:ffff88011b807570 EFLAGS: 00010212 [ 253.933807] RAX: 00000000fffffffd RBX: ffff88011b807660 RCX: 0000000000000000 [ 253.933807] RDX: 0000000000000010 RSI: ffff880112b39d78 RDI: ffff88011b807670 [ 253.933807] RBP: ffff88011b807850 R08: ffffed0023700ece R09: ffffed0023700ecd [ 253.933807] R10: ffff88011b80766f R11: ffffed0023700ece R12: ffff88011b807898 [ 253.933807] R13: ffff880112b39d80 R14: ffff880112b39d60 R15: dffffc0000000000 [ 253.933807] FS: 0000000000000000(0000) GS:ffff88011b800000(0000) knlGS:0000000000000000 [ 253.933807] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 253.933807] CR2: 00000000014f1008 CR3: 000000006b216000 CR4: 00000000001006e0 [ 253.933807] Call Trace: [ 253.933807] <IRQ> [ 253.933807] ? sched_clock_cpu+0x132/0x170 [ 253.933807] ? __nft_trace_packet+0x180/0x180 [nf_tables] [ 253.933807] ? sched_clock_cpu+0x132/0x170 [ 253.933807] ? debug_show_all_locks+0x290/0x290 [ 253.933807] ? __lock_acquire+0x4835/0x4af0 [ 253.933807] ? inet_ehash_locks_alloc+0x1a0/0x1a0 [ 253.933807] ? unwind_next_frame+0x159e/0x1840 [ 253.933807] ? __read_once_size_nocheck.constprop.4+0x5/0x10 [ 253.933807] ? nft_do_chain_ipv4+0x197/0x1e0 [nf_tables] [ 253.933807] ? nft_do_chain+0x5/0xdf0 [nf_tables] [ 253.933807] nft_do_chain_ipv4+0x197/0x1e0 [nf_tables] [ 253.933807] ? nft_do_chain_arp+0xb0/0xb0 [nf_tables] [ 253.933807] ? __lock_is_held+0x9d/0x130 [ 253.933807] nf_hook_slow+0xc4/0x150 [ 253.933807] ip_local_deliver+0x28b/0x380 [ 253.933807] ? ip_call_ra_chain+0x3e0/0x3e0 [ 253.933807] ? ip_rcv_finish+0x1610/0x1610 [ 253.933807] ip_rcv+0xbcc/0xcc0 [ 253.933807] ? debug_show_all_locks+0x290/0x290 [ 253.933807] ? ip_local_deliver+0x380/0x380 [ 253.933807] ? __lock_is_held+0x9d/0x130 [ 253.933807] ? ip_local_deliver+0x380/0x380 [ 253.933807] __netif_receive_skb_core+0x1c9c/0x2240 Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-18	kbuild: Add build salt to the kernel and modules	Laura Abbott
	In Fedora, the debug information is packaged separately (foo-debuginfo) and can be installed separately. There's been a long standing issue where only one version of a debuginfo info package can be installed at a time. There's been an effort for Fedora for parallel debuginfo to rectify this problem. Part of the requirement to allow parallel debuginfo to work is that build ids are unique between builds. The existing upstream rpm implementation ensures this by re-calculating the build-id using the version and release as a seed. This doesn't work 100% for the kernel because of the vDSO which is its own binary and doesn't get updated when embedded. Fix this by adding some data in an ELF note for both the kernel and modules. The data is controlled via a Kconfig option so distributions can set it to an appropriate value to ensure uniqueness between builds. Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-07-17	HID: core: do not upper bound the collection stack	Benjamin Tissoires
	Looks like 4 was sufficient until now. However, the Surface Dial needs a stack of 5 and simply fails at probing. Dynamically add HID_COLLECTION_STACK_SIZE to the size of the stack if we hit the upper bound. Checkpatch complains about bare unsigned, so converting those to 'unsigned int' in struct hid_parser Acked-by: Peter Hutterer <peter.hutterer@who-t.net> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2018-07-17	HID: input: enable Totem on the Dell Canvas 27	Benjamin Tissoires
	The Dell Canvas 27 has a tool that can be put on the surface and acts as a dial. The firmware processes the detection of the tool and forward regular HID reports with X, Y, Azimuth, rotation, width/height. The firmware also exports Contact ID, Countact Count which may hint that several totems can be used at the same time (the FW only supports one). We can tell that MT_TOOL_DIAL will be reported by setting the min/max of ABS_MT_TOOL_TYPE to MT_TOOL_DIAL. This tool is aimed at being used by the system and not the applications, so the user space processing should not go through the regular touch inputs. We set INPUT_PROP_DIRECT which applies ID_INPUT_TOUCHSCREEN to this new type of devices, but we will counter this for the time being with the special udev hwdb entry mentioned above. Link: https://bugzilla.redhat.com/show_bug.cgi?id=1511846 Acked-by: Peter Hutterer <peter.hutterer@who-t.net> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2018-07-17	input: add MT_TOOL_DIAL	Benjamin Tissoires
	A dial is a tool you place on a multitouch surface which reports its orientation or a relative angle of rotation when rotating its knob. Some examples are the Dell Totem (on the Canvas 27"), the Microsoft Dial, or the Griffin Powermate, though the later can't be put on a touch surface. We give some extra space to account for other types of fingers if we need (MT_TOOL_THUMB) Slightly change the documentation to not make it mandatory to update each MT_TOOL we add. Reviewed-by: Peter Hutterer <peter.hutterer@who-t.net> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2018-07-17	netfilter: conntrack: remove l3proto abstraction	Florian Westphal
	This unifies ipv4 and ipv6 protocol trackers and removes the l3proto abstraction. This gets rid of all l3proto indirect calls and the need to do a lookup on the function to call for l3 demux. It increases module size by only a small amount (12kbyte), so this reduces size because nf_conntrack.ko is useless without either nf_conntrack_ipv4 or nf_conntrack_ipv6 module. before: text data bss dec hex filename 7357 1088 0 8445 20fd nf_conntrack_ipv4.ko 7405 1084 4 8493 212d nf_conntrack_ipv6.ko 72614 13689 236 86539 1520b nf_conntrack.ko 19K nf_conntrack_ipv4.ko 19K nf_conntrack_ipv6.ko 179K nf_conntrack.ko after: text data bss dec hex filename 79277 13937 236 93450 16d0a nf_conntrack.ko 191K nf_conntrack.ko Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-17	ARM: at91: pm: add PMC fast startup registers defines	Claudiu Beznea
	Add PMC fast startup registers defines. Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2018-07-17	ARM: at91: pm: Add ULP1 mode support	Wenyou Yang
	In the ULP1 mode, in order to achieve the lowest power consumption with the system in retention mode and be able to resume on the wake up events, all the clocks are shut off, inclusive the embedded 12MHz RC oscillator, and the number of wake up sources is limited as well. When the wake up event is asserted, the embedded 12MHz RC oscillator restarts automatically. The ULP1 (Ultra Low-power mode 1) is introduced by SAMA5D2. The previous size of pm_suspend.o was 2148 bytes. With the addition of ULP1 mode the new size of pm_suspend.o raised at 2456 bytes. Signed-off-by: Wenyou Yang <wenyou.yang@atmel.com> Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com> [claudiu.beznea@microchip.com: aligned with 4.18-rc1] Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2018-07-17	vga_switcheroo: set audio client id according to bound GPU id	Jim Qu
	On modern laptop, there are more and more platforms have two GPUs, and each of them maybe have audio codec for HDMP/DP output. For some dGPU which is no output, audio codec usually is disabled. In currect HDA audio driver, it will set all codec as VGA_SWITCHEROO_DIS, the audio which is binded to UMA will be suspended if user use debugfs to contorl power In HDA driver side, it is difficult to know which GPU the audio has binded to. So set the bound gpu pci dev to vga_switcheroo. if the audio client is not the third registration, audio id will set in vga_switcheroo enable function. if the audio client is the last registration when vga_switcheroo _ready() get true, we should get audio client id from bound GPU directly. Signed-off-by: Jim Qu <Jim.Qu@amd.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2018-07-17	i2c: recovery: add get_bus_free callback	Wolfram Sang
	Some IP cores have an internal 'bus free' logic which may be more advanced than just checking if SDA is high. Add a separate callback to get this status. Filling it is optional. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-07-17	i2c: recovery: require either get_sda or set_sda	Wolfram Sang
	For bus recovery, we either need to bail out early if we can read SDA or we need to send STOP after every pulse. Otherwise recovery might be misinterpreted as an unwanted write. So, require one of those SDA handling functions to avoid this problem. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Peter Rosin <peda@axentia.se> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-07-17	Merge tag 'v4.18-rc5' into i2c/for-4.19	Wolfram Sang
	Linux 4.18-rc5
2018-07-17	x86/mm/tlb: Leave lazy TLB mode at page table free time	Rik van Riel
	Andy discovered that speculative memory accesses while in lazy TLB mode can crash a system, when a CPU tries to dereference a speculative access using memory contents that used to be valid page table memory, but have since been reused for something else and point into la-la land. The latter problem can be prevented in two ways. The first is to always send a TLB shootdown IPI to CPUs in lazy TLB mode, while the second one is to only send the TLB shootdown at page table freeing time. The second should result in fewer IPIs, since operationgs like mprotect and madvise are very common with some workloads, but do not involve page table freeing. Also, on munmap, batching of page table freeing covers much larger ranges of virtual memory than the batching of unmapped user pages. Tested-by: Song Liu <songliubraving@fb.com> Signed-off-by: Rik van Riel <riel@surriel.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: efault@gmx.de Cc: kernel-team@fb.com Cc: luto@kernel.org Link: http://lkml.kernel.org/r/20180716190337.26133-3-riel@surriel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-17	mm: Allocate the mm_cpumask (mm->cpu_bitmap[]) dynamically based on nr_cpu_ids	Rik van Riel
	The mm_struct always contains a cpumask bitmap, regardless of CONFIG_CPUMASK_OFFSTACK. That means the first step can be to simplify things, and simply have one bitmask at the end of the mm_struct for the mm_cpumask. This does necessitate moving everything else in mm_struct into an anonymous sub-structure, which can be randomized when struct randomization is enabled. The second step is to determine the correct size for the mm_struct slab object from the size of the mm_struct (excluding the CPU bitmap) and the size the cpumask. For init_mm we can simply allocate the maximum size this kernel is compiled for, since we only have one init_mm in the system, anyway. Pointer magic by Mike Galbraith, to evade -Wstringop-overflow getting confused by the dynamically sized array. Tested-by: Song Liu <songliubraving@fb.com> Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Rik van Riel <riel@surriel.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-team@fb.com Cc: luto@kernel.org Link: http://lkml.kernel.org/r/20180716190337.26133-2-riel@surriel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-17	Merge tag 'v4.18-rc5' into x86/mm, to pick up fixes	Ingo Molnar
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-17	sched/Documentation: Update wake_up() & co. memory-barrier guarantees	Andrea Parri
	Both the implementation and the users' expectation [1] for the various wakeup primitives have evolved over time, but the documentation has not kept up with these changes: brings it into 2018. [1] http://lkml.kernel.org/r/20180424091510.GB4064@hirez.programming.kicks-ass.net Also applied feedback from Alan Stern. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Akira Yokosawa <akiyks@gmail.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Daniel Lustig <dlustig@nvidia.com> Cc: David Howells <dhowells@redhat.com> Cc: Jade Alglave <j.alglave@ucl.ac.uk> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luc Maranget <luc.maranget@inria.fr> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arch@vger.kernel.org Cc: parri.andrea@gmail.com Link: http://lkml.kernel.org/r/20180716180605.16115-12-paulmck@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-17	locking/spinlock, sched/core: Clarify requirements for smp_mb__after_spinlock()	Andrea Parri
	There are 11 interpretations of the requirements described in the header comment for smp_mb__after_spinlock(): one for each LKMM maintainer, and one currently encoded in the Cat file. Stick to the latter (until a more satisfactory solution is available). This also reworks some snippets related to the barrier to illustrate the requirements and to link them to the idioms which are relied upon at its call sites. Suggested-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: akiyks@gmail.com Cc: dhowells@redhat.com Cc: j.alglave@ucl.ac.uk Cc: linux-arch@vger.kernel.org Cc: luc.maranget@inria.fr Cc: npiggin@gmail.com Cc: parri.andrea@gmail.com Cc: stern@rowland.harvard.edu Link: http://lkml.kernel.org/r/20180716180605.16115-11-paulmck@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-17	Merge tag 'v4.18-rc5' into locking/core, to pick up fixes	Ingo Molnar
	Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-17	Merge branch 'for-mingo' of ↵	Ingo Molnar
	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: - An optimization and a fix for RCU expedited grace periods, with the fix being from Boqun Feng. - Miscellaneous fixes, including a lockdep-annotation fix from Boqun Feng. - SRCU updates. - Updates to rcutorture and associated scripting. - Introduce grace-period sequence numbers to the RCU-bh, RCU-preempt, and RCU-sched flavors, replacing the old ->gpnum and ->completed pair of fields. This change allows lockless code to obtain the complete grace-period state with a single READ_ONCE(), which is needed to maintain tolerable lock contention during the upcoming consolidation of the three RCU flavors. Note that grace-period sequence numbers are already used by rcu_barrier(), expedited RCU grace periods, and SRCU, and are thus already heavily used and well-tested. Joel Fernandes contributed a number of excellent fixes and improvements. - Clean up some grace-period-reporting loose ends, including improving the handling of quiescent states from offline CPUs and fixing some false-positive WARN_ON_ONCE() invocations. (Strictly speaking, the WARN_ON_ONCE() invocations were quite correct, but their invariants were (harmlessly) violated by the earlier sloppy handling of quiescent states from offline CPUs.) In addition, improve grace-period forward-progress guarantees so as to allow removal of fail-safe checks that required otherwise needless lock acquisitions. Finally, add more diagnostics to help debug the upcoming consolidation of the RCU-bh, RCU-preempt, and RCU-sched flavors. - Additional miscellaneous fixes, including those contributed by Byungchul Park, Mauro Carvalho Chehab, Joe Perches, Joel Fernandes, Steven Rostedt, Andrea Parri, and Neil Brown. - Additional torture-test changes, including several contributed by Arnd Bergmann and Joel Fernandes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-16	tcp: Fix broken repair socket window probe patch	Stefan Baranoff
	Correct previous bad attempt at allowing sockets to come out of TCP repair without sending window probes. To avoid changing size of the repair variable in struct tcp_sock, this lets the decision for sending probes or not to be made when coming out of repair by introducing two ways to turn it off. v2: * Remove erroneous comment; defines now make behavior clear Fixes: 70b7ff130224 ("tcp: allow user to create repair socket without window probes") Signed-off-by: Stefan Baranoff <sbaranoff@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-16	net/ethernet/freescale/fman: fix cross-build error	Randy Dunlap
	CC [M] drivers/net/ethernet/freescale/fman/fman.o In file included from ../drivers/net/ethernet/freescale/fman/fman.c:35: ../include/linux/fsl/guts.h: In function 'guts_set_dmacr': ../include/linux/fsl/guts.h:165:2: error: implicit declaration of function 'clrsetbits_be32' [-Werror=implicit-function-declaration] clrsetbits_be32(&guts->dmacr, 3 << shift, device << shift); ^~~~~~~~~~~~~~~ Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Madalin Bucur <madalin.bucur@nxp.com> Cc: netdev@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-16	net: convert gro_count to bitmask	Li RongQing
	gro_hash size is 192 bytes, and uses 3 cache lines, if there is few flows, gro_hash may be not fully used, so it is unnecessary to iterate all gro_hash in napi_gro_flush(), to occupy unnecessary cacheline. convert gro_count to a bitmask, and rename it as gro_bitmask, each bit represents a element of gro_hash, only flush a gro_hash element if the related bit is set, to speed up napi_gro_flush(). and update gro_bitmask only if it will be changed, to reduce cache update Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Cc: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-16	net: phy: add phy_speed_down and phy_speed_up	Heiner Kallweit
	Some network drivers include functionality to speed down the PHY when suspending and just waiting for a WoL packet because this saves energy. This functionality is quite generic, therefore let's factor it out to phylib. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-16	drm/amdgpu: Allow to create BO lists in CS ioctl v3	Andrey Grodzovsky
	This change is to support MESA performace optimization. Modify CS IOCTL to allow its input as command buffer and an array of buffer handles to create a temporay bo list and then destroy it when IOCTL completes. This saves on calling for BO_LIST create and destry IOCTLs in MESA and by this improves performance. v2: Avoid inserting the temp list into idr struct. v3: Remove idr alloation from amdgpu_bo_list_create. Remove useless argument from amdgpu_cs_parser_fini Minor cosmetic stuff. v4: Revert amdgpu_bo_list_destroy back to static Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-07-16	ima: based on policy require signed kexec kernel images	Mimi Zohar
	The original kexec_load syscall can not verify file signatures, nor can the kexec image be measured. Based on policy, deny the kexec_load syscall. Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: James Morris <james.morris@microsoft.com>
2018-07-16	security: define new LSM hook named security_kernel_load_data	Mimi Zohar
	Differentiate between the kernel reading a file specified by userspace from the kernel loading a buffer containing data provided by userspace. This patch defines a new LSM hook named security_kernel_load_data(). Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Serge Hallyn <serge@hallyn.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: James Morris <james.morris@microsoft.com>
2018-07-16	ipv6/mcast: init as INCLUDE when join SSM INCLUDE group	Hangbin Liu
	This an IPv6 version patch of "ipv4/igmp: init group mode as INCLUDE when join source group". From RFC3810, part 6.1: If no per-interface state existed for that multicast address before the change (i.e., the change consisted of creating a new per-interface record), or if no state exists after the change (i.e., the change consisted of deleting a per-interface record), then the "non-existent" state is considered to have an INCLUDE filter mode and an empty source list. Which means a new multicast group should start with state IN(). Currently, for MLDv2 SSM JOIN_SOURCE_GROUP mode, we first call ipv6_sock_mc_join(), then ip6_mc_source(), which will trigger a TO_IN() message instead of ALLOW(). The issue was exposed by commit a052517a8ff65 ("net/multicast: should not send source list records when have filter mode change"). Before this change, we sent both ALLOW(A) and TO_IN(A). Now, we only send TO_IN(A). Fix it by adding a new parameter to init group mode. Also add some wrapper functions to avoid changing too much code. v1 -> v2: In the first version I only cleared the group change record. But this is not enough. Because when a new group join, it will init as EXCLUDE and trigger a filter mode change in ip/ip6_mc_add_src(), which will clear all source addresses sf_crcount. This will prevent early joined address sending state change records if multi source addressed joined at the same time. In v2 patch, I fixed it by directly initializing the mode to INCLUDE for SSM JOIN_SOURCE_GROUP. I also split the original patch into two separated patches for IPv4 and IPv6. There is also a difference between v4 and v6 version. For IPv6, when the interface goes down and up, we will send correct state change record with unspecified IPv6 address (::) with function ipv6_mc_up(). But after DAD is completed, we resend the change record TO_IN() in mld_send_initial_cr(). Fix it by sending ALLOW() for INCLUDE mode in mld_send_initial_cr(). Fixes: a052517a8ff65 ("net/multicast: should not send source list records when have filter mode change") Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-16	ipv4/igmp: init group mode as INCLUDE when join source group	Hangbin Liu
	Based on RFC3376 5.1 If no interface state existed for that multicast address before the change (i.e., the change consisted of creating a new per-interface record), or if no state exists after the change (i.e., the change consisted of deleting a per-interface record), then the "non-existent" state is considered to have a filter mode of INCLUDE and an empty source list. Which means a new multicast group should start with state IN(). Function ip_mc_join_group() works correctly for IGMP ASM(Any-Source Multicast) mode. It adds a group with state EX() and inits crcount to mc_qrv, so the kernel will send a TO_EX() report message after adding group. But for IGMPv3 SSM(Source-specific multicast) JOIN_SOURCE_GROUP mode, we split the group joining into two steps. First we join the group like ASM, i.e. via ip_mc_join_group(). So the state changes from IN() to EX(). Then we add the source-specific address with INCLUDE mode. So the state changes from EX() to IN(A). Before the first step sends a group change record, we finished the second step. So we will only send the second change record. i.e. TO_IN(A). Regarding the RFC stands, we should actually send an ALLOW(A) message for SSM JOIN_SOURCE_GROUP as the state should mimic the 'IN() to IN(A)' transition. The issue was exposed by commit a052517a8ff65 ("net/multicast: should not send source list records when have filter mode change"). Before this change, we used to send both ALLOW(A) and TO_IN(A). After this change we only send TO_IN(A). Fix it by adding a new parameter to init group mode. Also add new wrapper functions so we don't need to change too much code. v1 -> v2: In my first version I only cleared the group change record. But this is not enough. Because when a new group join, it will init as EXCLUDE and trigger an filter mode change in ip/ip6_mc_add_src(), which will clear all source addresses' sf_crcount. This will prevent early joined address sending state change records if multi source addressed joined at the same time. In v2 patch, I fixed it by directly initializing the mode to INCLUDE for SSM JOIN_SOURCE_GROUP. I also split the original patch into two separated patches for IPv4 and IPv6. Fixes: a052517a8ff65 ("net/multicast: should not send source list records when have filter mode change") Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-16	mm: don't do zero_resv_unavail if memmap is not allocated	Pavel Tatashin
	Moving zero_resv_unavail before memmap_init_zone(), caused a regression on x86-32. The cause is that we access struct pages before they are allocated when CONFIG_FLAT_NODE_MEM_MAP is used. free_area_init_nodes() zero_resv_unavail() mm_zero_struct_page(pfn_to_page(pfn)); <- struct page is not alloced free_area_init_node() if CONFIG_FLAT_NODE_MEM_MAP alloc_node_mem_map() memblock_virt_alloc_node_nopanic() <- struct page alloced here On the other hand memblock_virt_alloc_node_nopanic() zeroes all the memory that it returns, so we do not need to do zero_resv_unavail() here. Fixes: e181ae0c5db9 ("mm: zero unavailable pages before memmap init") Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Tested-by: Matt Hart <matt@mattface.org> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-16	netfilter: conntrack: remove get_timeout() indirection	Florian Westphal
	Not needed, we can have the l4trackers fetch it themselvs. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-16	netfilter: conntrack: avoid calls to l4proto invert_tuple	Florian Westphal
	Handle the common cases (tcp, udp, etc). in the core and only do the indirect call for the protocols that need it (GRE for instance). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-16	netfilter: conntrack: remove get_l4proto indirection from l3 protocol trackers	Florian Westphal
	Handle it in the core instead. ipv6_skip_exthdr() is built-in even if ipv6 is a module, i.e. this doesn't create an ipv6 dependency. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-16	netfilter: conntrack: remove invert_tuple indirection from l3 protocol trackers	Florian Westphal
	Its simpler to just handle it directly in nf_ct_invert_tuple(). Also gets rid of need to pass l3proto pointer to resolve_conntrack(). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-16	netfilter: conntrack: remove pkt_to_tuple indirection from l3 protocol trackers	Florian Westphal
	Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-16	netfilter: conntrack: remove ctnetlink callbacks from l3 protocol trackers	Florian Westphal
	handle everything from ctnetlink directly. After all these years we still only support ipv4 and ipv6, so it seems reasonable to remove l3 protocol tracker support and instead handle ipv4/ipv6 from a common, always builtin inet tracker. Step 1: Get rid of all the l3proto->func() calls. Start with ctnetlink, then move on to packet-path ones. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-16	openvswitch: use nf_ct_get_tuplepr, invert_tuplepr	Florian Westphal
	These versions deal with the l3proto/l4proto details internally. It removes only caller of nf_ct_get_tuple, so make it static. After this, l3proto->get_l4proto() can be removed in a followup patch. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-16	netfilter: utils: move nf_ip6_checksum* from ipv6 to utils	Florian Westphal
	similar to previous change, this also allows to remove it from nf_ipv6_ops and avoid the indirection. It also removes the bogus dependency of nf_conntrack_ipv6 on ipv6 module: ipv6 checksum functions are built into kernel even if CONFIG_IPV6=m, but ipv6/netfilter.o isn't. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>