summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-12-12ceph: move xattr initialzation before the encoding past the ceph_mds_capsJeff Layton
Just for clarity. This part is inside the header, so it makes sense to group it with the rest of the stuff in the header. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
2016-12-12ceph: fix minor typo in unsafe_request_waitJeff Layton
Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
2016-12-12ceph: record truncate size/seq for snap data writebackYan, Zheng
Dirty snapshot data needs to be flushed unconditionally. If they were created before truncation, writeback should use old truncate size/seq. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-12-12ceph: check availability of mds cluster on mountYan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-12-12ceph: fix splice read for no Fc capability caseYan, Zheng
When iov_iter type is ITER_PIPE, copy_page_to_iter() increases the page's reference and add the page to a pipe_buffer. It also set the pipe_buffer's ops to page_cache_pipe_buf_ops. The comfirm callback in page_cache_pipe_buf_ops expects the page is from page cache and uptodate, otherwise it return error. For ceph_sync_read() case, pages are not from page cache. So we can't call copy_page_to_iter() when iov_iter type is ITER_PIPE. The fix is using iov_iter_get_pages_alloc() to allocate pages for the pipe. (the code is similar to default_file_splice_read) Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-12-12ceph: try getting buffer capability for readahead/fadviseYan, Zheng
For readahead/fadvise cases, caller of ceph_readpages does not hold buffer capability. Pages can be added to page cache while there is no buffer capability. This can cause data integrity issue. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-12-12ceph: fix scheduler warning due to nested blockingNikolay Borisov
try_get_cap_refs can be used as a condition in a wait_event* calls. This is all fine until it has to call __ceph_do_pending_vmtruncate, which in turn acquires the i_truncate_mutex. This leads to a situation in which a task's state is !TASK_RUNNING and at the same time it's trying to acquire a sleeping primitive. In essence a nested sleeping primitives are being used. This causes the following warning: WARNING: CPU: 22 PID: 11064 at kernel/sched/core.c:7631 __might_sleep+0x9f/0xb0() do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff8109447d>] prepare_to_wait_event+0x5d/0x110 ipmi_msghandler tcp_scalable ib_qib dca ib_mad ib_core ib_addr ipv6 CPU: 22 PID: 11064 Comm: fs_checker.pl Tainted: G O 4.4.20-clouder2 #6 Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1a 10/16/2015 0000000000000000 ffff8838b416fa88 ffffffff812f4409 ffff8838b416fad0 ffffffff81a034f2 ffff8838b416fac0 ffffffff81052b46 ffffffff81a0432c 0000000000000061 0000000000000000 0000000000000000 ffff88167bda54a0 Call Trace: [<ffffffff812f4409>] dump_stack+0x67/0x9e [<ffffffff81052b46>] warn_slowpath_common+0x86/0xc0 [<ffffffff81052bcc>] warn_slowpath_fmt+0x4c/0x50 [<ffffffff8109447d>] ? prepare_to_wait_event+0x5d/0x110 [<ffffffff8109447d>] ? prepare_to_wait_event+0x5d/0x110 [<ffffffff8107767f>] __might_sleep+0x9f/0xb0 [<ffffffff81612d30>] mutex_lock+0x20/0x40 [<ffffffffa04eea14>] __ceph_do_pending_vmtruncate+0x44/0x1a0 [ceph] [<ffffffffa04fa692>] try_get_cap_refs+0xa2/0x320 [ceph] [<ffffffffa04fd6f5>] ceph_get_caps+0x255/0x2b0 [ceph] [<ffffffff81094370>] ? wait_woken+0xb0/0xb0 [<ffffffffa04f2c11>] ceph_write_iter+0x2b1/0xde0 [ceph] [<ffffffff81613f22>] ? schedule_timeout+0x202/0x260 [<ffffffff8117f01a>] ? kmem_cache_free+0x1ea/0x200 [<ffffffff811b46ce>] ? iput+0x9e/0x230 [<ffffffff81077632>] ? __might_sleep+0x52/0xb0 [<ffffffff81156147>] ? __might_fault+0x37/0x40 [<ffffffff8119e123>] ? cp_new_stat+0x153/0x170 [<ffffffff81198cfa>] __vfs_write+0xaa/0xe0 [<ffffffff81199369>] vfs_write+0xa9/0x190 [<ffffffff811b6d01>] ? set_close_on_exec+0x31/0x70 [<ffffffff8119a056>] SyS_write+0x46/0xa0 This happens since wait_event_interruptible can interfere with the mutex locking code, since they both fiddle with the task state. Fix the issue by using the newly-added nested blocking infrastructure in 61ada528dea0 ("sched/wait: Provide infrastructure to deal with nested blocking") Link: https://lwn.net/Articles/628628/ Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-12-12ceph: fix printing wrong return variable in ceph_direct_read_write()Zhi Zhang
Fix printing wrong return variable for invalidate_inode_pages2_range in ceph_direct_read_write(). Signed-off-by: Zhi Zhang <zhang.david2011@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-12-12crush: include mapper.h in mapper.cTobias Klauser
Include linux/crush/mapper.h in crush/mapper.c to get the prototypes of crush_find_rule and crush_do_rule which are defined there. This fixes the following GCC warnings when building with 'W=1': net/ceph/crush/mapper.c:40:5: warning: no previous prototype for ‘crush_find_rule’ [-Wmissing-prototypes] net/ceph/crush/mapper.c:793:5: warning: no previous prototype for ‘crush_do_rule’ [-Wmissing-prototypes] Signed-off-by: Tobias Klauser <tklauser@distanz.ch> [idryomov@gmail.com: corresponding !__KERNEL__ include] Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-12-12rbd: silence bogus -Wmaybe-uninitialized warningIlya Dryomov
drivers/block/rbd.c: In function ‘rbd_watch_cb’: drivers/block/rbd.c:3690:5: error: ‘struct_v’ may be used uninitialized in this function [-Werror=maybe-uninitialized] drivers/block/rbd.c:3759:5: note: ‘struct_v’ was declared here Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-12-12Merge branch 'x86-headers-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 header fixlet from Ingo Molnar: "Remove unnecessary module.h inclusion from core code (Paul Gortmaker)" * 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/percpu: Remove unnecessary include of module.h, add asm/desc.h
2016-12-12ACPI / CPPC: Fix per-CPU pointer management in acpi_cppc_processor_probe()Rafael J. Wysocki
Fix a possible use-after-free scenario in acpi_cppc_processor_probe() that can happen if the function returns without cleaning up the per-CPU pointer set by it previously. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-12-12ACPI / CPPC: Fix crash in acpi_cppc_processor_exit()Sebastian Andrzej Siewior
First I had crashed what I bisected down to de966cf4a4fa (sched/x86: Change CONFIG_SCHED_ITMT to CONFIG_SCHED_MC_PRIO) because it made SCHED_ITMT the default. Then I run another bisect round and got here with the same backtrace: |BUG: unable to handle kernel NULL pointer dereference at (null) |IP: [<ffffffff812aab6e>] acpi_cppc_processor_exit+0x40/0x60 |PGD 0 [ 0.577616] |Oops: 0000 [#1] SMP |Modules linked in: |CPU: 3 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc6-00146-g17669006adf6 #51 |task: ffff88003f878000 task.stack: ffffc90000008000 |RIP: 0010:[<ffffffff812aab6e>] [<ffffffff812aab6e>] acpi_cppc_processor_exit+0x40/0x60 |RSP: 0000:ffffc9000000bd48 EFLAGS: 00010296 |RAX: 00000000000137e0 RBX: 0000000000000000 RCX: 0000000000000001 |RDX: ffff88003fc00000 RSI: 0000000000000000 RDI: ffff88003fbca130 |RBP: ffffc9000000bd60 R08: 0000000000000514 R09: 0000000000000000 |R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000002 |R13: 0000000000000020 R14: ffffffff8167cb00 R15: 0000000000000000 |FS: 0000000000000000(0000) GS:ffff88003fcc0000(0000) knlGS:0000000000000000 |CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 |CR2: 0000000000000000 CR3: 0000000001618000 CR4: 00000000000406e0 |Stack: | ffff88003f939848 ffff88003fbca130 0000000000000001 ffffc9000000bd80 | ffffffff812a4ccb ffff88003fc0cee8 0000000000000000 ffffc9000000bdb8 | ffffffff812dc20d ffff88003fc0cee8 ffffffff8167cb00 ffff88003fc0cf48 |Call Trace: | [<ffffffff812a4ccb>] acpi_processor_stop+0xb2/0xc5 | [<ffffffff812dc20d>] driver_probe_device+0x14d/0x2f0 | [<ffffffff812dc41e>] __driver_attach+0x6e/0x90 | [<ffffffff812da234>] bus_for_each_dev+0x54/0x90 | [<ffffffff812dbbf9>] driver_attach+0x19/0x20 | [<ffffffff812db6a6>] bus_add_driver+0xe6/0x200 | [<ffffffff812dcb23>] driver_register+0x83/0xc0 | [<ffffffff816f050a>] acpi_processor_driver_init+0x20/0x94 | [<ffffffff81000487>] do_one_initcall+0x97/0x180 | [<ffffffff816ccf5c>] kernel_init_freeable+0x112/0x1a6 | [<ffffffff813a0fc9>] kernel_init+0x9/0xf0 | [<ffffffff813acf35>] ret_from_fork+0x25/0x30 |Code: 02 00 00 00 48 8b 14 d5 e0 c3 55 81 48 8b 1c 02 4c 8d 6b 20 eb 15 49 8b 7d 00 48 85 ff 74 05 e8 39 8c d9 ff 41 ff c4 49 83 c5 20 <44> 3b 23 72 e6 48 8d bb a0 02 00 00 e8 b1 6f f9 ff 48 89 df e8 |RIP [<ffffffff812aab6e>] acpi_cppc_processor_exit+0x40/0x60 | RSP <ffffc9000000bd48> |CR2: 0000000000000000 |---[ end trace 917a625107b09711 ]--- Fix it. Fixes: 17669006adf6 (cpufreq/intel_pstate: Use CPPC to get max performance) Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> [ rjw: Subject ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-12-12Merge branch 'x86-fpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 FPU updates from Ingo Molnar: "The main changes in this cycle were: - do a large round of simplifications after all CPUs do 'eager' FPU context switching in v4.9: remove CR0 twiddling, remove leftover eager/lazy bts, etc (Andy Lutomirski) - more FPU code simplifications: remove struct fpu::counter, clarify nomenclature, remove unnecessary arguments/functions and better structure the code (Rik van Riel)" * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Remove clts() x86/fpu: Remove stts() x86/fpu: Handle #NM without FPU emulation as an error x86/fpu, lguest: Remove CR0.TS support x86/fpu, kvm: Remove host CR0.TS manipulation x86/fpu: Remove irq_ts_save() and irq_ts_restore() x86/fpu: Stop saving and restoring CR0.TS in fpu__init_check_bugs() x86/fpu: Get rid of two redundant clts() calls x86/fpu: Finish excising 'eagerfpu' x86/fpu: Split old_fpu & new_fpu handling into separate functions x86/fpu: Remove 'cpu' argument from __cpu_invalidate_fpregs_state() x86/fpu: Split old & new FPU code paths x86/fpu: Remove __fpregs_(de)activate() x86/fpu: Rename lazy restore functions to "register state valid" x86/fpu, kvm: Remove KVM vcpu->fpu_counter x86/fpu: Remove struct fpu::counter x86/fpu: Remove use_eager_fpu() x86/fpu: Remove the XFEATURE_MASK_EAGER/LAZY distinction x86/fpu: Hard-disable lazy FPU mode x86/crypto, x86/fpu: Remove X86_FEATURE_EAGER_FPU #ifdef from the crc32c code
2016-12-12Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 CPU updates from Ingo Molnar: "The changes in this development cycle were: - AMD CPU topology enhancements that are cleanups on current CPUs but which enable future Fam17 hardware. (Yazen Ghannam) - unify bugs.c and bugs_64.c (Borislav Petkov) - remove the show_msr= boot option (Borislav Petkov) - simplify a boot message (Borislav Petkov)" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/AMD: Clean up cpu_llc_id assignment per topology feature x86/cpu: Get rid of the show_msr= boot option x86/cpu: Merge bugs.c and bugs_64.c x86/cpu: Remove the printk format specifier in "CPU0: "
2016-12-12Input: drv260x - use generic device propertiesJingkui Wang
Update driver drv260x to use generic device properties so that it can be used on non-DT systems. We also remove platform data as generic device properties work on static board code as well. Signed-off-by: Jingkui Wang <jkwang@google.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2016-12-12Input: drv260x - use temporary for &client->devDmitry Torokhov
Let's introduce a temporary for "client->dev" is probe() as we use it quite a few times and "dev" is shorter. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2016-12-12Input: drv260x - fix input device's parent assignmentJingkui Wang
We were assigning I2C bus controller instead of client as parent device. Besides being logically wrong, it messed up with devm handling of input device. As a result we were leaving input device and event node behind after rmmod-ing the driver, which lead to a kernel oops if one were to access the event node later. Let's remove the assignment and rely on devm_input_allocate_device() to set it up properly for us. Signed-off-by: Jingkui Wang <jkwang@google.com> Fixes: 7132fe4f5687 ("Input: drv260x - add TI drv260x haptics driver") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2016-12-12i40iw: Reorganize structures to align with HW capabilitiesHenry Orosco
Some resources are incorrectly organized and at odds with HW capabilities. Specifically, ILQ, IEQ, QPs, MSS, QOS and statistics belong in a VSI. Signed-off-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12i40iw: Fix incorrect check for errorMustafa Ismail
In i40iw_ieq_handle_partial() the check for !status is incorrect. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12i40iw: Assign MSS only when it is a new MTUMustafa Ismail
Currently we are changing the MSS regardless of whether there is a change or not in MTU. Fix to make the assignment of MSS dependent on an MTU change. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12i40iw: Fix race condition in terminate timer's handlerShiraz Saleem
Add a QP reference when terminate timer is started to ensure the destroy QP doesn't race ahead to free the QP while it is being referenced in the terminate timer's handler. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12i40iw: Fix memory leak in CQP destroy when in resetMustafa Ismail
On a device close, the control QP (CQP) is destroyed by calling cqp_destroy which destroys the CQP and frees its SD buffer memory. However, if the reset flag is true, cqp_destroy is never called and leads to a memory leak on SD buffer memory. Fix this by always calling cqp_destroy, on device close, regardless of reset. The exception to this when CQP create fails. In this case, the SD buffer memory is already freed on an error check and there is no need to call cqp_destroy. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12i40iw: Fix QP flush to not hang on empty queues or failureShiraz Saleem
When flush QP and there are no pending work requests, signal completion to unblock i40iw_drain_sq and i40iw_drain_rq which are waiting on completion for iwqp->sq_drained and iwqp->sq_drained respectively. Also, signal completion if flush QP fails to prevent the drain SQ or RQ from being blocked indefintely. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12i40iw: Fix double free of QPMustafa Ismail
A QP can be double freed if i40iw_cm_disconn() is called while it is currently being freed by i40iw_rem_ref(). The fix in i40iw_cm_disconn() will first check if the QP is already freed before making another request for the QP to be freed. Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Two cleanups in the LDT handling code, by Dan Carpenter and Thomas Gleixner" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ldt: Make all size computations unsigned x86/ldt: Make a size argument unsigned
2016-12-12i40iw: Use correct src address in memcpy to rdma stats countersShiraz Saleem
hw_stats is a pointer to i40_iw_dev_stats struct in i40iw_get_hw_stats(). Use hw_stats and not &hw_stats in the memcpy to copy the i40iw device stats data into rdma_hw_stats counters. Fixes: b40f4757daa1 ("IB/core: Make device counter infrastructure dynamic") Cc: stable@vger.kernel.org # 4.7+ Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12Merge branch 'x86-build-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build updates from Ingo Molnar: "The main changes in this cycle were: - Makefile improvements (Paul Bolle) - KConfig cleanups to better separate 32-bit only, 64-bit only and generic feature enablement sections (Ingo Molnar)" * 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/build: Remove three unneeded genhdr-y entries x86/build: Don't use $(LINUXINCLUDE) twice x86/kconfig: Sort the 'config X86' selects alphabetically x86/kconfig: Clean up 32-bit compat options x86/kconfig: Clean up IA32_EMULATION select x86/kconfig, x86/pkeys: Move pkeys selects to X86_INTEL_MEMORY_PROTECTION_KEYS x86/kconfig: Move 64-bit only arch Kconfig selects to 'config X86_64' x86/kconfig: Move 32-bit only arch Kconfig selects to 'config X86_32'
2016-12-12Merge branch 'x86-boot-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: "Misc cleanups/simplifications by Borislav Petkov, Paul Bolle and Wei Yang" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/64: Optimize fixmap page fixup x86/boot: Simplify the GDTR calculation assembly code a bit x86/boot/build: Remove always empty $(USERINCLUDE)
2016-12-12i40iw: Remove macros I40IW_STAG_KEY_FROM_STAG and I40IW_STAG_INDEX_FROM_STAGThomas Huth
The macros I40IW_STAG_KEY_FROM_STAG and I40IW_STAG_INDEX_FROM_STAG are apparently bad - they are using the logical "&&" operation which does not make sense here. It should have been a bitwise "&" instead. Since the macros seem to be completely unused, let's simply remove them so that nobody accidentially uses them in the future. And while we're at it, also remove the unused macro I40IW_CREATE_STAG. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12libceph: no need to drop con->mutex for ->get_authorizer()Ilya Dryomov
->get_authorizer(), ->verify_authorizer_reply(), ->sign_message() and ->check_message_signature() shouldn't be doing anything with or on the connection (like closing it or sending messages). Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2016-12-12libceph: drop len argument of *verify_authorizer_reply()Ilya Dryomov
The length of the reply is protocol-dependent - for cephx it's ceph_x_authorize_reply. Nothing sensible can be passed from the messenger layer anyway. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2016-12-12libceph: verify authorize reply on connectIlya Dryomov
After sending an authorizer (ceph_x_authorize_a + ceph_x_authorize_b), the client gets back a ceph_x_authorize_reply, which it is supposed to verify to ensure the authenticity and protect against replay attacks. The code for doing this is there (ceph_x_verify_authorizer_reply(), ceph_auth_verify_authorizer_reply() + plumbing), but it is never invoked by the the messenger. AFAICT this goes back to 2009, when ceph authentication protocols support was added to the kernel client in 4e7a5dcd1bba ("ceph: negotiate authentication protocol; implement AUTH_NONE protocol"). The second param of ceph_connection_operations::verify_authorizer_reply is unused all the way down. Pass 0 to facilitate backporting, and kill it in the next commit. Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2016-12-12libceph: no need for GFP_NOFS in ceph_monc_init()Ilya Dryomov
It's called during inital setup, when everything should be allocated with GFP_KERNEL. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2016-12-12libceph: stop allocating a new cipher on every crypto requestIlya Dryomov
This is useless and more importantly not allowed on the writeback path, because crypto_alloc_skcipher() allocates memory with GFP_KERNEL, which can recurse back into the filesystem: kworker/9:3 D ffff92303f318180 0 20732 2 0x00000080 Workqueue: ceph-msgr ceph_con_workfn [libceph] ffff923035dd4480 ffff923038f8a0c0 0000000000000001 000000009eb27318 ffff92269eb28000 ffff92269eb27338 ffff923036b145ac ffff923035dd4480 00000000ffffffff ffff923036b145b0 ffffffff951eb4e1 ffff923036b145a8 Call Trace: [<ffffffff951eb4e1>] ? schedule+0x31/0x80 [<ffffffff951eb77a>] ? schedule_preempt_disabled+0xa/0x10 [<ffffffff951ed1f4>] ? __mutex_lock_slowpath+0xb4/0x130 [<ffffffff951ed28b>] ? mutex_lock+0x1b/0x30 [<ffffffffc0a974b3>] ? xfs_reclaim_inodes_ag+0x233/0x2d0 [xfs] [<ffffffff94d92ba5>] ? move_active_pages_to_lru+0x125/0x270 [<ffffffff94f2b985>] ? radix_tree_gang_lookup_tag+0xc5/0x1c0 [<ffffffff94dad0f3>] ? __list_lru_walk_one.isra.3+0x33/0x120 [<ffffffffc0a98331>] ? xfs_reclaim_inodes_nr+0x31/0x40 [xfs] [<ffffffff94e05bfe>] ? super_cache_scan+0x17e/0x190 [<ffffffff94d919f3>] ? shrink_slab.part.38+0x1e3/0x3d0 [<ffffffff94d9616a>] ? shrink_node+0x10a/0x320 [<ffffffff94d96474>] ? do_try_to_free_pages+0xf4/0x350 [<ffffffff94d967ba>] ? try_to_free_pages+0xea/0x1b0 [<ffffffff94d863bd>] ? __alloc_pages_nodemask+0x61d/0xe60 [<ffffffff94ddf42d>] ? cache_grow_begin+0x9d/0x560 [<ffffffff94ddfb88>] ? fallback_alloc+0x148/0x1c0 [<ffffffff94ed84e7>] ? __crypto_alloc_tfm+0x37/0x130 [<ffffffff94de09db>] ? __kmalloc+0x1eb/0x580 [<ffffffffc09fe2db>] ? crush_choose_firstn+0x3eb/0x470 [libceph] [<ffffffff94ed84e7>] ? __crypto_alloc_tfm+0x37/0x130 [<ffffffff94ed9c19>] ? crypto_spawn_tfm+0x39/0x60 [<ffffffffc08b30a3>] ? crypto_cbc_init_tfm+0x23/0x40 [cbc] [<ffffffff94ed857c>] ? __crypto_alloc_tfm+0xcc/0x130 [<ffffffff94edcc23>] ? crypto_skcipher_init_tfm+0x113/0x180 [<ffffffff94ed7cc3>] ? crypto_create_tfm+0x43/0xb0 [<ffffffff94ed83b0>] ? crypto_larval_lookup+0x150/0x150 [<ffffffff94ed7da2>] ? crypto_alloc_tfm+0x72/0x120 [<ffffffffc0a01dd7>] ? ceph_aes_encrypt2+0x67/0x400 [libceph] [<ffffffffc09fd264>] ? ceph_pg_to_up_acting_osds+0x84/0x5b0 [libceph] [<ffffffff950d40a0>] ? release_sock+0x40/0x90 [<ffffffff95139f94>] ? tcp_recvmsg+0x4b4/0xae0 [<ffffffffc0a02714>] ? ceph_encrypt2+0x54/0xc0 [libceph] [<ffffffffc0a02b4d>] ? ceph_x_encrypt+0x5d/0x90 [libceph] [<ffffffffc0a02bdf>] ? calcu_signature+0x5f/0x90 [libceph] [<ffffffffc0a02ef5>] ? ceph_x_sign_message+0x35/0x50 [libceph] [<ffffffffc09e948c>] ? prepare_write_message_footer+0x5c/0xa0 [libceph] [<ffffffffc09ecd18>] ? ceph_con_workfn+0x2258/0x2dd0 [libceph] [<ffffffffc09e9903>] ? queue_con_delay+0x33/0xd0 [libceph] [<ffffffffc09f68ed>] ? __submit_request+0x20d/0x2f0 [libceph] [<ffffffffc09f6ef8>] ? ceph_osdc_start_request+0x28/0x30 [libceph] [<ffffffffc0b52603>] ? rbd_queue_workfn+0x2f3/0x350 [rbd] [<ffffffff94c94ec0>] ? process_one_work+0x160/0x410 [<ffffffff94c951bd>] ? worker_thread+0x4d/0x480 [<ffffffff94c95170>] ? process_one_work+0x410/0x410 [<ffffffff94c9af8d>] ? kthread+0xcd/0xf0 [<ffffffff951efb2f>] ? ret_from_fork+0x1f/0x40 [<ffffffff94c9aec0>] ? kthread_create_on_node+0x190/0x190 Allocating the cipher along with the key fixes the issue - as long the key doesn't change, a single cipher context can be used concurrently in multiple requests. We still can't take that GFP_KERNEL allocation though. Both ceph_crypto_key_clone() and ceph_crypto_key_decode() are called from GFP_NOFS context, so resort to memalloc_noio_{save,restore}() here. Reported-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2016-12-12libceph: uninline ceph_crypto_key_destroy()Ilya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2016-12-12libceph: remove now unused ceph_*{en,de}crypt*() functionsIlya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2016-12-12libceph: switch ceph_x_decrypt() to ceph_crypt()Ilya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2016-12-12libceph: switch ceph_x_encrypt() to ceph_crypt()Ilya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2016-12-12libceph: tweak calcu_signature() a littleIlya Dryomov
- replace an ad-hoc array with a struct - rename to calc_signature() for consistency Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2016-12-12libceph: rename and align ceph_x_authorizer::reply_bufIlya Dryomov
It's going to be used as a temporary buffer for in-place en/decryption with ceph_crypt() instead of on-stack buffers, so rename to enc_buf. Ensure alignment to avoid GFP_ATOMIC allocations in the crypto stack. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2016-12-12libceph: introduce ceph_crypt() for in-place en/decryptionIlya Dryomov
Starting with 4.9, kernel stacks may be vmalloced and therefore not guaranteed to be physically contiguous; the new CONFIG_VMAP_STACK option is enabled by default on x86. This makes it invalid to use on-stack buffers with the crypto scatterlist API, as sg_set_buf() expects a logical address and won't work with vmalloced addresses. There isn't a different (e.g. kvec-based) crypto API we could switch net/ceph/crypto.c to and the current scatterlist.h API isn't getting updated to accommodate this use case. Allocating a new header and padding for each operation is a non-starter, so do the en/decryption in-place on a single pre-assembled (header + data + padding) heap buffer. This is explicitly supported by the crypto API: "... the caller may provide the same scatter/gather list for the plaintext and cipher text. After the completion of the cipher operation, the plaintext data is replaced with the ciphertext data in case of an encryption and vice versa for a decryption." Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2016-12-12libceph: introduce ceph_x_encrypt_offset()Ilya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2016-12-12libceph: old_key in process_one_ticket() is redundantIlya Dryomov
Since commit 0a990e709356 ("ceph: clean up service ticket decoding"), th->session_key isn't assigned until everything is decoded. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2016-12-12libceph: ceph_x_encrypt_buflen() takes in_lenIlya Dryomov
Pass what's going to be encrypted - that's msg_b, not ticket_blob. ceph_x_encrypt_buflen() returns the upper bound, so this doesn't change the maxlen calculation, but makes it a bit clearer. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
2016-12-12ubifs: Raise write version to 5Richard Weinberger
Starting with version 5 the following properties change: - UBIFS_FLG_DOUBLE_HASH is mandatory - UBIFS_FLG_ENCRYPTION is optional but depdens on UBIFS_FLG_DOUBLE_HASH - Filesystems with unknown super block flags will be rejected, this allows us in future to add new features without raising the UBIFS write version. Signed-off-by: Richard Weinberger <richard@nod.at>
2016-12-12ubifs: Implement UBIFS_FLG_ENCRYPTIONRichard Weinberger
This feature flag indicates that the filesystem contains encrypted files. Signed-off-by: Richard Weinberger <richard@nod.at>
2016-12-12ubifs: Implement UBIFS_FLG_DOUBLE_HASHRichard Weinberger
This feature flag indicates that all directory entry nodes have a 32bit cookie set and therefore UBIFS is allowed to perform lookups by hash. Signed-off-by: Richard Weinberger <richard@nod.at>
2016-12-12ubifs: Use a random number for cookiesRichard Weinberger
Signed-off-by: Richard Weinberger <richard@nod.at>
2016-12-12ubifs: Add full hash lookup supportRichard Weinberger
UBIFS stores a 32bit hash of every file, for traditional lookups by name this scheme is fine since UBIFS can first try to find the file by the hash of the filename and upon collisions it can walk through all entries with the same hash and do a string compare. When filesnames are encrypted fscrypto will ask the filesystem for a unique cookie, based on this cookie the filesystem has to be able to locate the target file again. With 32bit hashes this is impossible because the chance for collisions is very high. Do deal with that we store a 32bit cookie directly in the UBIFS directory entry node such that we get a 64bit cookie (32bit from filename hash and the dent cookie). For a lookup by hash UBIFS finds the entry by the first 32bit and then compares the dent cookie. If it does not match, it has to do a linear search of the whole directory and compares all dent cookies until the correct entry is found. Signed-off-by: Richard Weinberger <richard@nod.at>