Age | Commit message | Author
2015-11-02 | bonding: simplify / unify event handling code for 3ad mode. | Mahesh Bandewar
The old logic for updating the state machine is no longer required, since ad_update_actor_keys() does it implicitly. The only loss is that notifications no longer differentiate between a speed change and a duplex change; a single unified notification is printed instead. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02 | bonding: unify all places where actor-oper key needs to be updated. | Mahesh Bandewar
The actor_admin and actor_oper keys are changed at multiple locations in the code. This patch brings all those updates into one location in an attempt to avoid inconsistent updates that could put the LACP state machine into a weird state. The unified place is ad_update_actor_keys(), with simple state-machine logic: (a) only a full-duplex port can participate in LACP; (b) a speed change reinitializes the LACP state machine. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
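A minimal user-space model of the unified key update described above. It assumes the 802.3ad key layout used by bond_3ad.c as we understand it (bit 0 duplex, bits 1-5 speed, bits 6-15 user-defined); the `struct port` fields, the small speed codes and the `reinit` flag are hypothetical simplifications, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Key bit layout, modelled on the 802.3ad key masks (assumption). */
#define KEY_DUPLEX_MASK	0x0001u
#define KEY_SPEED_MASK	0x003Eu
#define KEY_USER_MASK	0xFFC0u

/* Hypothetical, simplified per-port state. */
struct port {
	uint16_t speed_code;		/* small encoded speed value, 0..31 */
	bool full_duplex;
	uint16_t actor_admin_key;
	uint16_t actor_oper_key;
	bool lacp_enabled;		/* may the port take part in LACP? */
	bool reinit;			/* restart the LACP state machine */
};

/* 1 only for a link that is up and full duplex; initialise, then override. */
static int get_duplex(bool link_up, bool full_duplex)
{
	int retval = 0;

	if (link_up && full_duplex)
		retval = 1;
	return retval;
}

/* The single place where both actor keys are recomputed. */
static void update_actor_keys(struct port *p, bool link_up,
			      uint16_t speed_code, bool full_duplex)
{
	uint16_t old_speed = p->speed_code;
	uint16_t key = p->actor_admin_key & KEY_USER_MASK; /* keep user bits */
	int duplex = get_duplex(link_up, full_duplex);

	assert(speed_code < 32);	/* must fit in bits 1-5 */
	key |= (uint16_t)((speed_code << 1) & KEY_SPEED_MASK);
	if (duplex)
		key |= KEY_DUPLEX_MASK;

	p->speed_code = speed_code;
	p->full_duplex = full_duplex;
	p->actor_admin_key = key;
	p->actor_oper_key = key;

	/* (a) only full-duplex ports may participate in LACP */
	p->lacp_enabled = duplex;
	/* (b) a speed change re-initialises the state machine */
	if (speed_code != old_speed)
		p->reinit = true;
}
```

Keeping both keys and both rules in one function is what prevents the admin and oper keys from drifting apart when speed and duplex events arrive separately.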
2015-11-02 | bonding: Simplify __get_duplex function. | Mahesh Bandewar
Eliminate the 'else' clause by simply initializing the variable. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02 | Merge branch 'bpf-persistent' | David S. Miller
Daniel Borkmann says:
====================
BPF updates

This set adds support for persistent maps/progs. Please see the individual patches for further details. A man-page update to bpf(2) will be sent later on, as well as an iproute2 patch for support in tc.

v1 -> v2:
 - Reworked most of patches 4 and 5
 - Rebased to latest net-next
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02 | bpf: add sample usages for persistent maps/progs | Daniel Borkmann
This patch adds a couple of stand-alone examples showing how the BPF_OBJ_PIN and BPF_OBJ_GET commands can be used.

Example with maps:

  # ./fds_example -F /sys/fs/bpf/m -P -m -k 1 -v 42
  bpf: map fd:3 (Success)
  bpf: pin ret:(0,Success)
  bpf: fd:3 u->(1:42) ret:(0,Success)
  # ./fds_example -F /sys/fs/bpf/m -G -m -k 1
  bpf: get fd:3 (Success)
  bpf: fd:3 l->(1):42 ret:(0,Success)
  # ./fds_example -F /sys/fs/bpf/m -G -m -k 1 -v 24
  bpf: get fd:3 (Success)
  bpf: fd:3 u->(1:24) ret:(0,Success)
  # ./fds_example -F /sys/fs/bpf/m -G -m -k 1
  bpf: get fd:3 (Success)
  bpf: fd:3 l->(1):24 ret:(0,Success)
  # ./fds_example -F /sys/fs/bpf/m2 -P -m
  bpf: map fd:3 (Success)
  bpf: pin ret:(0,Success)
  # ./fds_example -F /sys/fs/bpf/m2 -G -m -k 1
  bpf: get fd:3 (Success)
  bpf: fd:3 l->(1):0 ret:(0,Success)
  # ./fds_example -F /sys/fs/bpf/m2 -G -m
  bpf: get fd:3 (Success)

Example with progs:

  # ./fds_example -F /sys/fs/bpf/p -P -p
  bpf: prog fd:3 (Success)
  bpf: pin ret:(0,Success)
  bpf sock:4 <- fd:3 attached ret:(0,Success)
  # ./fds_example -F /sys/fs/bpf/p -G -p
  bpf: get fd:3 (Success)
  bpf: sock:4 <- fd:3 attached ret:(0,Success)
  # ./fds_example -F /sys/fs/bpf/p2 -P -p -o ./sockex1_kern.o
  bpf: prog fd:5 (Success)
  bpf: pin ret:(0,Success)
  bpf: sock:3 <- fd:5 attached ret:(0,Success)
  # ./fds_example -F /sys/fs/bpf/p2 -G -p
  bpf: get fd:3 (Success)
  bpf: sock:4 <- fd:3 attached ret:(0,Success)

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02 | bpf: add support for persistent maps/progs | Daniel Borkmann
This work adds support for "persistent" eBPF maps/programs. Here, "persistent" means that maps/programs have a facility that lets them survive process termination. This is desired by various eBPF subsystem users; to name just one example, the tc classifier/action. Whenever tc parses the ELF object, extracts the maps/progs and loads them into the kernel, the resulting file descriptors are out of reach once the tc instance exits. A subsequent tc invocation therefore cannot access or relocate against these resources, so maps cannot easily be shared, e.g. between the ingress and egress networking data paths. The current workaround is to instrument Unix domain sockets (UDS) to pass the created eBPF map/program file descriptors to a third-party management daemon through the UDS fd-passing facility. This makes it somewhat complicated to deploy shared eBPF maps or programs (e.g. programs used for tail calls) among various processes. We've been brainstorming on how to tackle this issue, and various approaches have been tried so far; they are described further in the reference below. The architecture we eventually ended up with is a minimal file system that can hold map/prog objects. The file system is a per-mount-namespace singleton, and the default mount point is /sys/fs/bpf/. Any subsequent mounts within a given namespace will point to the same instance. The file system allows for creating a user-defined directory structure. Map/prog objects are created and fetched through bpf(2) with two new commands, BPF_OBJ_PIN and BPF_OBJ_GET. To pin, a bpf file descriptor along with a pathname is passed to bpf(2), which in turn creates the file system node (we call this eBPF object pinning). To fetch, only the pathname is passed to bpf(2), which returns a new BPF file descriptor for the existing node. The user can use that descriptor to access the map or prog later on through bpf(2).
Removal of file system nodes is managed through normal VFS functions such as unlink(2). The file system code is kept to a bare minimum and can be extended further later on. The next step I'm working on is to add commands to bpf(2) that dump an eBPF map/prog, so that its specification can be retrieved from a given file descriptor. This can be used by things like CRIU, but applications can also inspect the metadata after calling BPF_OBJ_GET. Big thanks also to Alexei and Hannes, who contributed significantly to the design discussion that eventually led us to this architecture. Reference: https://lkml.org/lkml/2015/10/15/925 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
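A sketch of the two new commands as seen from user space, calling bpf(2) directly through syscall(2). It assumes a kernel and `<linux/bpf.h>` that provide BPF_OBJ_PIN/BPF_OBJ_GET with the `pathname` and `bpf_fd` members of `union bpf_attr`; the wrapper names `obj_pin()` and `obj_get()` are hypothetical, and in practice both calls need suitable privileges and a mounted bpf file system.

```c
#define _GNU_SOURCE		/* for syscall() */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Thin wrapper around the raw bpf(2) system call. */
static int sys_bpf(int cmd, union bpf_attr *attr)
{
	return (int)syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

/* Pin an existing map/prog fd at 'path' inside a bpf file system
 * (e.g. below /sys/fs/bpf). Returns 0, or -1 with errno set. */
static int obj_pin(int fd, const char *path)
{
	union bpf_attr attr;

	assert(path != NULL);
	memset(&attr, 0, sizeof(attr));
	attr.pathname = (uint64_t)(uintptr_t)path;
	attr.bpf_fd = (uint32_t)fd;
	return sys_bpf(BPF_OBJ_PIN, &attr);
}

/* Get a new fd for an object previously pinned at 'path'.
 * Returns the fd, or -1 with errno set. */
static int obj_get(const char *path)
{
	union bpf_attr attr;

	assert(path != NULL);
	memset(&attr, 0, sizeof(attr));
	attr.pathname = (uint64_t)(uintptr_t)path;
	return sys_bpf(BPF_OBJ_GET, &attr);
}
```

A loader could pin a freshly created map with `obj_pin(map_fd, "/sys/fs/bpf/m")` and exit; a later process, such as another tc invocation, would call `obj_get("/sys/fs/bpf/m")` to reach the same map, and unlink(2) on the path removes the node again.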
2015-11-02 | bpf: consolidate bpf_prog_put{, _rcu} dismantle paths | Daniel Borkmann
We currently have duplicated cleanup code in the bpf_prog_put() and bpf_prog_put_rcu() cleanup paths. Back then we decided that it was not worth making it a common helper called by both, but with the recent addition of resource charging, we could have avoided the fix in commit ac00737f4e81 ("bpf: Need to call bpf_prog_uncharge_memlock from bpf_prog_put") if we had had only a single, common path. We can simplify it further by assigning aux->prog only once, at allocation time. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
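The idea of funnelling both release flavours through one teardown helper can be modelled in user space. Everything here is hypothetical: `charged_pages` stands in for memlock charging, and an explicit queue drained by `run_deferred()` stands in for call_rcu() and the end of an RCU grace period.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_DEFERRED 16

static long charged_pages;		/* stand-in for memlock accounting */

struct prog {
	int refcnt;
	long pages;
};

static struct prog *deferred[MAX_DEFERRED];	/* stand-in for call_rcu() */
static int n_deferred;

static struct prog *prog_alloc(long pages)
{
	struct prog *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	p->refcnt = 1;
	p->pages = pages;
	charged_pages += pages;		/* charge at allocation time */
	return p;
}

/* The single teardown path: both put flavours end up here, so the
 * uncharge step cannot be forgotten in one of them. */
static void prog_free_common(struct prog *p)
{
	charged_pages -= p->pages;
	free(p);
}

/* Immediate release. */
static void prog_put(struct prog *p)
{
	if (--p->refcnt == 0)
		prog_free_common(p);
}

/* Deferred release: queue the object instead of freeing it now. */
static void prog_put_deferred(struct prog *p)
{
	if (--p->refcnt == 0) {
		assert(n_deferred < MAX_DEFERRED);
		deferred[n_deferred++] = p;
	}
}

/* Stand-in for the end of a grace period: run the common path. */
static void run_deferred(void)
{
	while (n_deferred > 0)
		prog_free_common(deferred[--n_deferred]);
}
```

The two put functions differ only in *when* `prog_free_common()` runs, which is the property the consolidation is after.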
2015-11-02 | bpf: align and clean bpf_{map,prog}_get helpers | Daniel Borkmann
Add a bpf_map_get() function that we're going to use later on, and align/clean the remaining helpers a bit so that they are more consistent:
 - __bpf_map_get() and __bpf_prog_get() both work on the fd struct, check whether the descriptor is eBPF and return the pointer to the map/prog stored in the private data. Also, we can return f.file->private_data directly; the function signature is documentation enough.
 - bpf_map_get() and bpf_prog_get() both work on a u32 user fd, call their respective __bpf_map_get()/__bpf_prog_get() variants, and take a reference.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02 | bpf: abstract anon_inode_getfd invocations | Daniel Borkmann
Since we're going to use anon_inode_getfd() invocations in more than just the current places, make a helper function for both, so that we only need to pass a map/prog pointer to the helper itself in order to get a fd. The new helpers are called bpf_map_new_fd() and bpf_prog_new_fd(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02 | net: fix percpu memory leaks | Eric Dumazet
This patch fixes the following problems: 1) percpu_counter_init() can return an error, therefore init_frag_mem_limit() must propagate this error so that inet_frags_init_net() can do the same up to its callers. 2) If ip[46]_frags_ns_ctl_register() fails, we must unwind properly and free the percpu_counter. Without this fix, we leave a freed object in the percpu_counters global list (if CONFIG_HOTPLUG_CPU), leading to crashes. This bug was detected by KASAN and the syzkaller tool (http://github.com/google/syzkaller) Fixes: 6d7b857d541e ("net: use lib/percpu_counter API for fragmentation mem accounting") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
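The two fixes follow the usual propagate-then-unwind pattern. Below is a hypothetical user-space model of it: `counter_init()` stands in for percpu_counter_init()/init_frag_mem_limit(), `sysctl_register()` for ip[46]_frags_ns_ctl_register(), and the two fault-injection switches exist only to exercise the error paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical fault-injection switches for the demo. */
static bool fail_counter_init;
static bool fail_sysctl_register;

struct frags_net {
	long *mem;			/* stands in for the percpu counter */
	bool sysctl_registered;
};

static int counter_init(struct frags_net *nf)
{
	if (fail_counter_init)
		return -1;		/* simulated allocation failure */
	nf->mem = calloc(1, sizeof(*nf->mem));
	return nf->mem ? 0 : -1;
}

static void counter_destroy(struct frags_net *nf)
{
	free(nf->mem);
	nf->mem = NULL;
}

static int sysctl_register(struct frags_net *nf)
{
	if (fail_sysctl_register)
		return -1;
	nf->sysctl_registered = true;
	return 0;
}

/* Per-namespace init: propagate every error and undo earlier steps. */
static int frags_init_net(struct frags_net *nf)
{
	int err;

	assert(nf != NULL);
	err = counter_init(nf);
	if (err)
		return err;		/* 1) pass the init error upwards */

	err = sysctl_register(nf);
	if (err)
		goto err_sysctl;	/* 2) unwind: free the counter */
	return 0;

err_sysctl:
	counter_destroy(nf);
	return err;
}
```

Without the `goto` unwind, the second failure would return with the counter still allocated, which is the kind of leak (and, for a real percpu_counter, stale global-list entry) the fix removes.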
2015-11-02 | net: avoid NULL deref in inet_ctl_sock_destroy() | Eric Dumazet
Under low memory conditions, tcp_sk_init() and icmp_sk_init() can both iterate over all possible CPUs and call inet_ctl_sock_destroy(), possibly with a NULL pointer. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-03 | drm/exynos/gem: remove DMA-mapping hacks used for constructing page array | Marek Szyprowski
Exynos GEM objects contain an array of pointers to the pages that make up the allocated buffer. Until now the code used some hacks (like relying on DMA-mapping internal structures or using the ARM-specific dma_to_pfn helper) to build this array. This patch fixes this by adding a proper call to dma_get_sgtable_attrs() and using the acquired scatter-list to construct the needed array. This approach is more portable (it also works on ARM64) and finally fixes the layering violation that was present in this code. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2015-11-03 | ARM: exynos_defconfig: enable Exynos DRM Mixer driver | Andrzej Hajda
The Mixer driver is selected by the CONFIG_DRM_EXYNOS_HDMI option. Since Exynos5433 HDMI does not require the Mixer, there will be separate options to select Mixer and HDMI. Adding the new option to the defconfig before the Kconfig change keeps the series bisectable. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Acked-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2015-11-03 | drm/exynos: simplify Kconfig component names | Andrzej Hajda
Many Exynos DRM sub-options mention Exynos DRM in their titles. This is redundant and can safely be shortened. The patch additionally makes some entries more descriptive. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2015-11-03 | drm/exynos: re-arrange Kconfig entries | Andrzej Hajda
The Exynos DRM driver has quite a large number of components and options. The patch re-arranges them into three logical groups: - CRTCs, - Encoders and Bridges, - Sub-drivers. This should make the driver options clearer. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2015-11-03 | drm/exynos: abstract out common dependency | Andrzej Hajda
All options depend on DRM_EXYNOS, so the dependency can be moved to an enclosing if clause. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2015-11-03 | drm/exynos: separate Mixer and HDMI drivers | Andrzej Hajda
The latest Exynos SoCs do not have the Mixer IP, but they still have the HDMI IP. Their drivers should be configurable separately. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2015-11-03 | drm/exynos/mixer: replace direct cross-driver call with drm mode validation | Andrzej Hajda
The HDMI driver directly called a function from the MIXER driver to invalidate modes not supported by the MIXER. The patch replaces this hack with a proper .atomic_check callback. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2015-11-03 | drm/exynos: add atomic_check callback to exynos_crtc | Andrzej Hajda
Some CRTCs need mode validation; this patch adds the necessary callback to the Exynos DRM framework. It is called from the DRM core via the atomic_check helper for drm_crtc. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2015-11-03 | drm/exynos/decon5433: add support for DECON-TV | Andrzej Hajda
The DECON-TV IP is responsible for generating the video stream that is transferred to the HDMI IP. It is almost fully compatible with the DECON IP. The patch is based on initial work by Hyungwon Hwang. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2015-11-03 | drm/exynos/decon5433: remove duplicated initialization | Andrzej Hajda
The .commit field is already initialized a few lines above. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2015-11-03 | drm/exynos/decon5433: merge different flag fields | Andrzej Hajda
The driver uses four different fields for internal flags. They can be merged into one. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2015-11-03 | drm/exynos/decon5433: add function to set particular register bits | Andrzej Hajda
The driver often sets only particular bits of configuration registers. Using a separate function for this simplifies the code. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
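A read-modify-write helper of the kind described, sketched against a hypothetical in-memory register file. Driver code would use readl()/writel() on the mapped register address instead of the stand-in `reg_read()`/`reg_write()` functions, and all names here are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file standing in for the MMIO region. */
static uint32_t regs[64];

static uint32_t reg_read(unsigned int reg)
{
	assert(reg % 4 == 0 && reg / 4 < 64);	/* aligned, in range */
	return regs[reg / 4];
}

static void reg_write(unsigned int reg, uint32_t val)
{
	assert(reg % 4 == 0 && reg / 4 < 64);
	regs[reg / 4] = val;
}

/* Set the bits selected by 'mask' to the matching bits of 'val',
 * leaving every other bit of the register untouched. */
static void set_bits(unsigned int reg, uint32_t mask, uint32_t val)
{
	uint32_t old = reg_read(reg);

	reg_write(reg, (old & ~mask) | (val & mask));
}
```

Call sites then shrink to one line, e.g. `set_bits(REG, ENABLE_BIT, ~0u)` to set a bit or `set_bits(REG, ENABLE_BIT, 0)` to clear it (register and bit names hypothetical).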
2015-11-03 | drm/exynos/decon5433: fix timing registers writes | Andrzej Hajda
All timing registers should contain values decreased by one. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2015-11-03 | drm/exynos/decon5433: add PCLK clock | Andrzej Hajda
The PCLK clock is used by the DECON IP. The patch also replaces the magic number in the array definition with the number of clocks. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2015-11-03 | Merge branch 'xfs-dax-updates' into for-next | Dave Chinner
2015-11-03 | Merge branch 'xfs-misc-fixes-for-4.4-2' into for-next | Dave Chinner
2015-11-03 | xfs: optimise away log forces on timestamp updates for fdatasync | Dave Chinner
xfs: timestamp updates cause excessive fdatasync log traffic

Sage Weil reported that a ceph test workload was writing to the log on every fdatasync during an overwrite workload. Event tracing showed that the only metadata modification being made was the timestamp updates during the write(2) syscall, but fdatasync(2) is supposed to ignore them. The key observation was that the transactions in the log all looked like this:

  INODE: #regs: 4 ino: 0x8b flags: 0x45 dsize: 32

and contained a flags field of 0x45 or 0x85, and had data and attribute forks following the inode core. This means that the timestamp updates were triggering dirty relogging of previously logged parts of the inode that hadn't yet been flushed back to disk.

There are two parts to this problem. The first is that XFS relogs dirty regions in subsequent transactions, so it carries around the fields that have been dirtied since the last time the inode was written back to disk, not since the last time the inode was forced into the log. The second part is that on v5 filesystems, the inode change count update during inode dirtying also sets the XFS_ILOG_CORE flag, so on v5 filesystems a timestamp update dirties the entire inode. As a result, when fdatasync is run, it looks at the dirty fields in the inode and sees more than just the timestamp flag, even though the only metadata change since the last fdatasync was the timestamps. Hence we force the log on every subsequent fdatasync even though it is not needed.

To fix this, add a new field to the inode log item that tracks changes since the last time fsync/fdatasync forced the log to flush the changes to the journal. This field is updated when we dirty the inode, but we do it before updating the change count so it does not carry the "core dirty" flag from timestamp updates. The fields are zeroed when the inode is marked clean (due to writeback/freeing) or when an fsync/fdatasync forces the log. Hence if we only dirty the timestamps on the inode between fsync/fdatasync calls, the fdatasync will not trigger another log force.

Over 100 runs of the test program:

  Configuration        Runtime          Avg latency        Log forces  IOPS
  Ext4 baseline        1.63s +/- 0.24s  1.59ms +/- 0.24ms  -           ~2000
  XFS, vanilla kernel  2.45s +/- 0.18s  2.39ms +/- 0.18ms  ~400/s      ~1000
  XFS, patched kernel  1.49s +/- 0.26s  1.46ms +/- 0.25ms  ~30/s       ~1500

Reported-by: Sage Weil <sage@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
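The bookkeeping change can be modelled with two flag words per inode log item: one accumulating everything dirtied since writeback, and one, reset on each log force, accumulating what was dirtied since the last fsync/fdatasync. The flag values and function names below are hypothetical, not the XFS definitions, and the v5 change-count update is reduced to setting a CORE flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; the real XFS_ILOG_* values differ. */
#define ILOG_CORE	0x1u
#define ILOG_TIMESTAMP	0x2u
#define ILOG_DATA	0x4u

struct inode_log_item {
	unsigned int fields;		/* dirty since last writeback */
	unsigned int fsync_fields;	/* dirty since last fsync log force */
};

/* Dirty the inode. fsync_fields is updated *before* the (v5)
 * change-count bump adds the CORE flag. */
static void log_inode(struct inode_log_item *ili, unsigned int flags, bool v5)
{
	ili->fsync_fields |= flags;
	if (v5)
		flags |= ILOG_CORE;	/* change count update */
	ili->fields |= flags;
}

/* Decide whether fsync/fdatasync must force the log; reset the
 * fsync tracking when it does. */
static bool sync_needs_log_force(struct inode_log_item *ili, bool datasync)
{
	unsigned int relevant = ili->fsync_fields;

	if (datasync)
		relevant &= ~ILOG_TIMESTAMP;	/* fdatasync ignores timestamps */
	if (!relevant)
		return false;
	ili->fsync_fields = 0;
	return true;
}
```

Deciding on `fields` instead would always see the CORE flag on v5 filesystems, which is the excessive-log-force behaviour described above.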
2015-11-03 | xfs: don't leak uuid table on rmmod | Darrick J. Wong
Don't leak the UUID table when the module is unloaded. (Found with kmemleak.) Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-11-03 | xfs: invalidate cached acl if set via ioctl | Andreas Gruenbacher
Setting or removing the "SGI_ACL_[FILE|DEFAULT]" attributes via the XFS_IOC_ATTRMULTI_BY_HANDLE ioctl completely bypasses the POSIX ACL infrastructure, like setting the "trusted.SGI_ACL_[FILE|DEFAULT]" xattrs did until commit 6caa1056. Similar to that commit, invalidate cached ACLs when setting/removing them via the ioctl as well. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-11-03 | xfs: Plug memory leak in xfs_attrmulti_attr_set | Andreas Gruenbacher
When setting attributes via XFS_IOC_ATTRMULTI_BY_HANDLE, the user-space buffer is copied into a new kernel-space buffer via memdup_user; that buffer then isn't freed. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-11-03 | xfs: Validate the length of on-disk ACLs | Andreas Gruenbacher
In xfs_acl_from_disk, instead of trusting that xfs_acl.acl_cnt is correct, make sure that the length of the attributes is correct as well. Also, turn the aclp parameter into a const pointer. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
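A sketch of the kind of length check described, for a hypothetical on-disk layout: a big-endian 32-bit entry count followed by fixed 12-byte entries. The struct, macro and function names and the exact layout are assumptions for illustration, not the XFS definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>		/* ntohl(), htonl() */

/* Hypothetical on-disk entry (12 bytes, no padding). */
struct disk_acl_entry {
	uint32_t tag;
	uint32_t id;
	uint16_t perm;
	uint16_t pad;
};

#define ACL_HDR_SIZE	sizeof(uint32_t)
#define ACL_SIZE(cnt)	(ACL_HDR_SIZE + (size_t)(cnt) * sizeof(struct disk_acl_entry))

/* Return the entry count if 'len' bytes at 'buf' form a consistently
 * sized ACL, or -1 if the stored count and the attribute length disagree. */
static long acl_count_from_disk(const void *buf, size_t len, uint32_t max_entries)
{
	uint32_t be_cnt, cnt;

	assert(buf != NULL);
	if (len < ACL_HDR_SIZE)
		return -1;			/* too short for the header */
	memcpy(&be_cnt, buf, sizeof(be_cnt));	/* avoid unaligned access */
	cnt = ntohl(be_cnt);
	if (cnt > max_entries || ACL_SIZE(cnt) != len)
		return -1;			/* don't trust the count alone */
	return (long)cnt;
}
```

Only after this check passes is it safe to walk `cnt` entries starting right after the header.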
2015-11-03 | xfs: invalidate cached acl if set directly via xattr | Brian Foster
ACLs are stored as extended attributes of the inode to which they apply. XFS converts the standard "system.posix_acl_[access|default]" attribute names used to control ACLs to "trusted.SGI_ACL_[FILE|DEFAULT]" as stored on-disk. These xattrs are directly exposed in on-disk format via getxattr/setxattr, without any ACL-aware code in the path to perform validation, etc. This is partly historical and supports backup/restore applications such as xfsdump, which back up and restore the binary blob that represents ACLs as-is. Andreas reports that the ACLs observed via the getfacl interface are not consistent when ACLs are set directly via the setxattr path. This occurs because the ACLs are cached in-core against the inode and the xattr path has no knowledge that the operation relates to ACLs. Update the xattr set codepath to trap writes of the special XFS ACL attributes and invalidate the associated cached ACL when this occurs. This ensures that the correct ACLs are used on a subsequent operation through the actual ACL interface. Note that this does not update or add support for setting the ACL xattrs directly beyond the restore use case, which requires a correctly formatted binary blob and restoring a consistent i_mode at the same time. It is still possible for a root user to set an invalid or inconsistent (with i_mode) ACL blob on-disk and potentially cause corruption. [ With fixes from Andreas Gruenbacher. ] Reported-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-11-03 | xfs: xfs_filemap_pmd_fault treats read faults as write faults | Dave Chinner
The code initially committed didn't have the same checks for write faults as the dax_pmd_fault code and hence treats all faults as write faults. We can get read faults through this path because, unlike the normal page fault path, there is no pmd_mkwrite path for write faults. Hence we need to ensure that we only do c/mtime updates on write faults; freeze protection is unnecessary for read faults. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-11-03 | xfs: add ->pfn_mkwrite support for DAX | Dave Chinner
->pfn_mkwrite support is needed so that when a page with allocated backing store takes a write fault, we can check that the fault has not raced with a truncate and is not now pointing to a region beyond the current end of file. This also allows us to update the timestamp on the inode, which fixes a generic/080 failure. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
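The end-of-file check that such a handler performs can be sketched as a pure function. It assumes 4 KiB pages and that the caller holds whatever lock serialises the fault against truncate; the function and enum names are hypothetical, and the timestamp update is only indicated by a comment.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1ULL << PAGE_SHIFT)

enum fault_result { FAULT_OK, FAULT_SIGBUS };

/* A write fault on page index 'pgoff' is only valid while that page
 * still lies inside the file, i.e. below i_size rounded up to pages. */
static enum fault_result pfn_mkwrite_check(uint64_t i_size, uint64_t pgoff)
{
	uint64_t size_pages = (i_size + PAGE_SIZE - 1) >> PAGE_SHIFT;

	if (pgoff >= size_pages)
		return FAULT_SIGBUS;	/* raced with a truncate */
	/* real code would update c/mtime on the inode here */
	return FAULT_OK;
}
```

Rounding i_size up means a partially used last page still counts as inside the file, while any page at or beyond the rounded size is rejected.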
2015-11-03 | xfs: DAX does not use IO completion callbacks | Dave Chinner
For DAX, we are now doing block zeroing during allocation. This means we no longer need a special DAX fault IO completion callback to do unwritten extent conversion. Because mmap never extends the file size (it SEGVs the process) we don't need a callback to update the file size, either. Hence we can remove the completion callbacks from the __dax_fault and __dax_mkwrite calls. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-11-03 | xfs: Don't use unwritten extents for DAX | Dave Chinner
DAX has a page fault serialisation problem with block allocation. Because it allows concurrent page faults and does not have a page lock to serialise faults to the same page, two concurrent faults to the same page can race. When two read faults race, this isn't a huge problem, as the data underlying the page is not changing and so "detect and drop" works just fine. The issues are to do with write faults. When two write faults occur, we serialise block allocation in get_blocks() so only one fault will allocate the extent. It will, however, be marked as an unwritten extent, and that is where the problem lies: the DAX fault code cannot differentiate between a block that was just allocated and a block that was preallocated and needs zeroing. The result is that both write faults end up zeroing the block and attempting to convert it back to written. The problem is that the first fault can zero and convert before the second fault starts zeroing, so the second fault's zeroing overwrites the data that the first fault wrote. The second fault then attempts to convert the unwritten extent, which is a no-op because it's already written. Data loss occurs as a result of this race. Because there is no sane locking construct in the page fault code that we can use for serialisation across the page faults, we need to ensure block allocation and zeroing occur atomically in the filesystem. This means we can still take concurrent page faults, and the only time they will serialise is in the filesystem mapping/allocation callback. The page fault code will always see written, initialised extents, so we will be able to remove the unwritten extent handling from the DAX code when all filesystems are converted. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
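A toy, single-threaded replay of the race described above, contrasting per-fault zeroing of an unwritten extent with zeroing inside the serialised allocator. The `struct blockmap` model and all function names are hypothetical; real faults run concurrently, and the replay only fixes one problematic ordering of the steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BLK 8

/* Toy mapping state for one block. */
struct blockmap {
	bool allocated;
	bool unwritten;		/* allocated, not yet zeroed/converted */
	char data[BLK];
};

/* Old scheme, step 1: serialised allocation returns an unwritten extent. */
static void old_alloc(struct blockmap *b)
{
	if (!b->allocated) {
		b->allocated = true;
		b->unwritten = true;
	}
}

/* Old scheme, step 2: each fault that saw "unwritten" zeroes and
 * converts on its own, possibly after another fault already wrote. */
static void old_zero_and_convert(struct blockmap *b, bool saw_unwritten)
{
	if (saw_unwritten) {
		memset(b->data, 0, BLK);
		b->unwritten = false;	/* a no-op the second time */
	}
}

/* New scheme: allocate and zero in one serialised step, so faults
 * only ever see written, initialised blocks. */
static void new_alloc_zeroed(struct blockmap *b)
{
	if (!b->allocated) {
		b->allocated = true;
		memset(b->data, 0, BLK);
		b->unwritten = false;
	}
}
```

In the replay, faults A and B both observe the unwritten extent, A zeroes and writes its data, then B zeroes late and wipes it; with `new_alloc_zeroed()` the second fault finds the block already allocated and leaves the data alone.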
2015-11-03 | xfs: introduce BMAPI_ZERO for allocating zeroed extents | Dave Chinner
To enable DAX to do atomic allocation of zeroed extents, we need to drive the block zeroing deep into the allocator. Because xfs_bmapi_write() can return merged extents on allocation that were only partially allocated (i.e. the requested range spans allocated and hole regions, and allocation into the hole was contiguous), we cannot zero the extent returned from xfs_bmapi_write(), as that can overwrite existing data with zeros. Hence we have to drive the extent zeroing into the allocation code, prior to where we merge the extents into the BMBT and return the resultant map. This means we need to propagate this need down to xfs_alloc_vextent() and issue the block zeroing at that point. While this functionality is being introduced for DAX, there is no reason why it is specific to DAX: we can pre-zero blocks during the allocation transaction on any type of device. It's just slow (and usually slower than unwritten allocation and conversion) on traditional block devices, so it doesn't tend to get used. We can, however, hook hardware zeroing optimisations via sb_issue_zeroout() to this operation, so it may be useful in future, and hence the "allocate zeroed blocks" API needs to be implementation neutral. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-11-03 | xfs: fix inode size update overflow in xfs_map_direct() | Dave Chinner
Both direct IO and DAX pass an offset and count into get_blocks that will overflow an s64 variable when an IO goes into the last supported block in a file (i.e. at offset 2^63 - 1FSB bytes). This can be seen from the tracing:

  xfs_get_blocks_alloc: [...] offset 0x7ffffffffffff000 count 4096
  xfs_gbmap_direct: [...] offset 0x7ffffffffffff000 count 4096
  xfs_gbmap_direct_none:[...] offset 0x7ffffffffffff000 count 4096

0x7ffffffffffff000 + 4096 = 0x8000000000000000, which overflows the s64 offset, so we fail to detect the need for a file size update and no ioend is allocated. This is *mostly* avoided for direct IO because such extending IOs occur with full block allocation, and so the "IS_UNWRITTEN()" check still evaluates as true and we get an ioend that way. However, doing single-sector extending IOs to this last block will expose the fact that file size updates will not occur after the first allocating direct IO, as the overflow will then be exposed. There is one further complexity: the DAX page fault path also exposes the same issue in block allocation. However, page faults cannot extend the file size, so in this case we want to allocate the block but do not want to allocate an ioend to enable a file size update at IO completion. Hence we now need to distinguish between direct IO path allocation and DAX fault path allocation to avoid leaking ioend structures.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
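One overflow-free way to phrase the "does this IO extend the file?" test, using the offsets from the trace. This is a hypothetical helper illustrating the arithmetic, not necessarily the form used by the patch; it assumes a non-negative offset and file size and a positive count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Does an IO of 'count' bytes at 'offset' end beyond 'isize'?
 * Rewritten so that no intermediate value can overflow int64_t,
 * assuming 0 <= offset, 0 < count and 0 <= isize. */
static bool io_extends_file(int64_t offset, int64_t count, int64_t isize)
{
	/* offset + count > isize  <=>  offset > isize - count;
	 * isize - count stays within int64_t for the assumed ranges. */
	return offset > isize - count;
}
```

The naive `offset + count > isize` would need 0x8000000000000000 for the traced IO, one past INT64_MAX, which is signed overflow in C; moving `count` to the other side keeps every intermediate value representable.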
2015-11-02 | Documentation: add new description of path-name lookup. | Neil Brown
This document is based on three recent lwn.net articles. Some of the introductory material and linkage between articles has been removed, and some time-based descriptions have been revised. Also all links to code have been removed as the code is very close by. Contains corrections and improvements from Randy Dunlap <rdunlap@infradead.org>. Signed-off-by: NeilBrown <neil@brown.name> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2015-11-02 | Documentation/vm/slub.txt: document slabinfo-gnuplot.sh | Sergey Senozhatsky
Add documentation on how to use slabinfo-gnuplot.sh script. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2015-11-02 | Doc: ABI/stable: Fix typo in ABI/stable | Masanari Iida
This patch fixes some spelling typos in Documentation/ABI/stable. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2015-11-02 | Merge tag 'regmap-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap | Linus Torvalds
Pull regmap updates from Mark Brown:
 "Quite a few new features for regmap this time, mostly expanding things around the edges of the existing functionality to cover more devices rather than things with wide applicability:
  - Support for offload of the update_bits() operation to hardware where devices implement bit level access.
  - Support for a few extra operations that need scratch buffers on fast_io devices where we can't sleep.
  - Expanded the feature set of regmap_irq to cope with some extra register layouts.
  - Cleanups to the debugfs code"
* tag 'regmap-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: Allow installing custom reg_update_bits function
  regmap: debugfs: simplify regmap_reg_ranges_read_file() slightly
  regmap: debugfs: use memcpy instead of snprintf
  regmap: debugfs: use snprintf return value in regmap_reg_ranges_read_file()
  regmap: Add generic macro to define regmap_irq
  regmap: debugfs: Remove scratch buffer for register length calculation
  regmap: irq: add ack_invert flag for chips using cleared bits as ack
  regmap: irq: add support for chips who have separate unmask registers
  regmap: Allocate buffers with GFP_ATOMIC when fast_io == true
2015-11-03 | rtc: rtctest: enabling UIE for a chip that doesn't support it returns EINVAL | Uwe Kleine-König
Calling ioctl(..., RTC_UIE_ON, ...) without CONFIG_RTC_INTF_DEV_UIE_EMUL either ends in rtc_update_irq_enable if rtc->uie_unsupported is true or in __rtc_set_alarm in the if (!rtc->ops->set_alarm) branch. In both cases the return value is -EINVAL. So check for that one instead of ENOTTY. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
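How a test program might distinguish the outcomes, as a sketch: the helper name and its return convention are hypothetical, and `fd` is assumed to be an open RTC character device such as /dev/rtc0.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/rtc.h>

/* Try to enable update interrupts on an open RTC device.
 * Returns 1 if enabled, 0 if the chip does not support them
 * (the kernel reports -EINVAL, not -ENOTTY), and -1 on any other
 * error, with errno left as set by ioctl(). */
static int try_enable_uie(int fd)
{
	if (ioctl(fd, RTC_UIE_ON, 0) == 0)
		return 1;
	if (errno == EINVAL)
		return 0;	/* unsupported: skip the UIE tests */
	return -1;
}
```

A caller would skip the update-interrupt part of the test when this returns 0 and treat -1 as a real failure.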
2015-11-03 | rtc: pcf2127: make module license match the file header | Uwe Kleine-König
The header of the pcf2127 driver specifies GPL v2 only as license, so use "GPL v2" as module license specifier instead of "GPL" as the latter means "GNU Public License v2 or later". Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-11-02 | tracepoints: Fix documentation of RCU lockdep checks | Mathieu Desnoyers
The documentation on top of __DECLARE_TRACE() does not match its implementation since the condition check has been added to the RCU lockdep checks. Update the documentation to match its implementation. Link: http://lkml.kernel.org/r/1446504164-21563-1-git-send-email-mathieu.desnoyers@efficios.com CC: Dave Hansen <dave@sr71.net> Fixes: a05d59a56733 "tracing: Add condition check to RCU lockdep checks" Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02 | Merge tag 'nfs-rdma-4.4-2' of git://git.linux-nfs.org/projects/anna/nfs-rdma | Trond Myklebust
NFS: NFSoRDMA Client Side Changes In addition to a variety of bugfixes, these patches are mostly geared at enabling both swap and backchannel support to the NFS over RDMA client. Signed-off-by: Anna Schumake <Anna.Schumaker@Netapp.com>
2015-11-03 | Merge branch 'vmwgfx-next' of git://people.freedesktop.org/~thomash/linux into drm-next | Dave Airlie
Changes for vmwgfx for 4.4. If there is time, I'll follow up with a series to move to threaded irqs.
* 'vmwgfx-next' of git://people.freedesktop.org/~thomash/linux:
  drm/vmwgfx: Replace iowrite/ioread with volatile memory accesses
  drm/vmwgfx: Turn off support for multisample count != 0 v2
  drm/vmwgfx: switch from ioremap_cache to memremap
2015-11-02 | Merge branches 'pci/aer', 'pci/hotplug', 'pci/misc', 'pci/msi', 'pci/resource' and 'pci/virtualization' into next | Bjorn Helgaas
* pci/aer:
  PCI/AER: Clear error status registers during enumeration and restore
* pci/hotplug:
  PCI: pciehp: Queue power work requests in dedicated function
* pci/misc:
  PCI: Turn off Request Attributes to avoid Chelsio T5 Completion erratum
  x86/PCI: Make pci_subsys_init() static
  PCI: Add builtin_pci_driver() to avoid registration boilerplate
  PCI: Remove unnecessary "if" statement
* pci/msi:
  x86/PCI: Don't alloc pcibios-irq when MSI is enabled
  PCI/MSI: Export all remapped MSIs to sysfs attributes
  PCI: Disable MSI on SiS 761
* pci/resource:
  sparc/PCI: Add mem64 resource parsing for root bus
  PCI: Expand Enhanced Allocation BAR output
  PCI: Make Enhanced Allocation bitmasks more obvious
  PCI: Handle Enhanced Allocation capability for SR-IOV devices
  PCI: Add support for Enhanced Allocation devices
  PCI: Add Enhanced Allocation register entries
  PCI: Handle IORESOURCE_PCI_FIXED when assigning resources
  PCI: Handle IORESOURCE_PCI_FIXED when sizing resources
  PCI: Clear IORESOURCE_UNSET when reverting to firmware-assigned address
* pci/virtualization:
  PCI: Fix sriov_enable() error path for pcibios_enable_sriov() failures
  PCI: Wait 1 second between disabling VFs and clearing NumVFs
  PCI: Reorder pcibios_sriov_disable()
  PCI: Remove VFs in reverse order if virtfn_add() fails
  PCI: Remove redundant validation of SR-IOV offset/stride registers
  PCI: Set SR-IOV NumVFs to zero after enumeration
  PCI: Enable SR-IOV ARI Capable Hierarchy before reading TotalVFs
  PCI: Don't try to restore VF BARs
2015-11-02 | pstore: fix code comment to match code | Geliang Tang
Fix code comment about kmsg_dump register so it matches the code. Signed-off-by: Geliang Tang <geliangtang@163.com> Signed-off-by: Tony Luck <tony.luck@intel.com>