summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2015-03-06Linux 3.19.1v3.19.1Greg Kroah-Hartman
2015-03-06ppc/kvm: Replace ACCESS_ONCE with READ_ONCEChristian Borntraeger
commit 5ee07612e9e20817bb99256ab6cf1400fd5aa270 upstream. ACCESS_ONCE does not work reliably on non-scalar types. For example gcc 4.6 and 4.7 might remove the volatile tag for such accesses during the SRA (scalar replacement of aggregates) step (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145) Change the ppc/kvm code to replace ACCESS_ONCE with READ_ONCE. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Alexander Graf <agraf@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06ppc/hugetlbfs: Replace ACCESS_ONCE with READ_ONCEChristian Borntraeger
commit da1a288d8562739aa8ba0273d4fb6b73b856c0d3 upstream. ACCESS_ONCE does not work reliably on non-scalar types. For example gcc 4.6 and 4.7 might remove the volatile tag for such accesses during the SRA (scalar replacement of aggregates) step (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145) Change the ppc/hugetlbfs code to replace ACCESS_ONCE with READ_ONCE. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06mm/gup: Replace ACCESS_ONCE with READ_ONCEChristian Borntraeger
commit 38c5ce936a0862a6ce2c8d1c72689a3aba301425 upstream. ACCESS_ONCE does not work reliably on non-scalar types. For example gcc 4.6 and 4.7 might remove the volatile tag for such accesses during the SRA (scalar replacement of aggregates) step (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145) Fixup gup_pmd_range. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06next: sh: Fix compile errorGuenter Roeck
commit 378af02b1aecabb3756e19c0cbb8cdd9c3b9637f upstream. Commit 927609d622a3 ("kernel: tighten rules for ACCESS ONCE") results in a compile failure for sh builds with CONFIG_X2TLB enabled. arch/sh/mm/gup.c: In function 'gup_get_pte': arch/sh/mm/gup.c:20:2: error: invalid initializer make[1]: *** [arch/sh/mm/gup.o] Error 1 Replace ACCESS_ONCE with READ_ONCE to fix the problem. Fixes: 927609d622a3 ("kernel: tighten rules for ACCESS ONCE") Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06quota: Store maximum space limit in bytesJan Kara
commit b10a08194c2b615955dfab2300331a90ae9344c7 upstream. Currently maximum space limit quota format supports is in blocks however since we store space limits in bytes, this is somewhat confusing. So store the maximum limit in bytes as well. Also rename the field to match the new unit and related inode field to match the new naming scheme. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06x86/xen/p2m: Replace ACCESS_ONCE with READ_ONCEChristian Borntraeger
commit 1760f1eb7ec485197bd3a8a9c13e4160bb740275 upstream. ACCESS_ONCE does not work reliably on non-scalar types. For example gcc 4.6 and 4.7 might remove the volatile tag for such accesses during the SRA (scalar replacement of aggregates) step (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145) Change the p2m code to replace ACCESS_ONCE with READ_ONCE. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06x86/spinlocks/paravirt: Fix memory corruption on unlockRaghavendra K T
commit d6abfdb2022368d8c6c4be3f11a06656601a6cc2 upstream. Paravirt spinlock clears slowpath flag after doing unlock. As explained by Linus currently it does: prev = *lock; add_smp(&lock->tickets.head, TICKET_LOCK_INC); /* add_smp() is a full mb() */ if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG)) __ticket_unlock_slowpath(lock, prev); which is *exactly* the kind of things you cannot do with spinlocks, because after you've done the "add_smp()" and released the spinlock for the fast-path, you can't access the spinlock any more. Exactly because a fast-path lock might come in, and release the whole data structure. Linus suggested that we should not do any writes to lock after unlock(), and we can move slowpath clearing to fastpath lock. So this patch implements the fix with: 1. Moving slowpath flag to head (Oleg): Unlocked locks don't care about the slowpath flag; therefore we can keep it set after the last unlock, and clear it again on the first (try)lock. -- this removes the write after unlock. note that keeping slowpath flag would result in unnecessary kicks. By moving the slowpath flag from the tail to the head ticket we also avoid the need to access both the head and tail tickets on unlock. 2. use xadd to avoid read/write after unlock that checks the need for unlock_kick (Linus): We further avoid the need for a read-after-release by using xadd; the prev head value will include the slowpath flag and indicate if we need to do PV kicking of suspended spinners -- on modern chips xadd isn't (much) more expensive than an add + load. Result: setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest) benchmark overcommit %improve kernbench 1x -0.13 kernbench 2x 0.02 dbench 1x -1.77 dbench 2x -0.63 [Jeremy: Hinted missing TICKET_LOCK_INC for kick] [Oleg: Moved slowpath flag to head, ticket_equals idea] [PeterZ: Added detailed changelog] Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Sasha Levin <sasha.levin@oracle.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Jones <drjones@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jones <davej@redhat.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Fernando Luis Vázquez Cao <fernando_b1@lab.ntt.co.jp> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: Waiman Long <Waiman.Long@hp.com> Cc: a.ryabinin@samsung.com Cc: dave@stgolabs.net Cc: hpa@zytor.com Cc: jasowang@redhat.com Cc: jeremy@goop.org Cc: paul.gortmaker@windriver.com Cc: riel@redhat.com Cc: tglx@linutronix.de Cc: waiman.long@hp.com Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20150215173043.GA7471@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06kernel: make READ_ONCE() valid on const argumentsLinus Torvalds
commit dd36929720f40f17685e841ae0d4c581c165ea60 upstream. The use of READ_ONCE() causes lots of warnings witht he pending paravirt spinlock fixes, because those ends up having passing a member to a 'const' structure to READ_ONCE(). There should certainly be nothing wrong with using READ_ONCE() with a const source, but the helper function __read_once_size() would cause warnings because it would drop the 'const' qualifier, but also because the destination would be marked 'const' too due to the use of 'typeof'. Use a union of types in READ_ONCE() to avoid this issue. Also make sure to use parenthesis around the macro arguments to avoid possible operator precedence issues. Tested-by: Ingo Molnar <mingo@kernel.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06kernel: Fix sparse warning for ACCESS_ONCEChristian Borntraeger
commit c5b19946eb76c67566aae6a84bf2b10ad59295ea upstream. Commit 927609d622a3 ("kernel: tighten rules for ACCESS ONCE") results in sparse warnings like "Using plain integer as NULL pointer" - Let's add a type cast to the dummy assignment. To avoid warnings lik "sparse: warning: cast to restricted __hc32" we also use __force on that cast. Fixes: 927609d622a3 ("kernel: tighten rules for ACCESS ONCE") Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06kernel: tighten rules for ACCESS ONCEChristian Borntraeger
commit 927609d622a3773995f84bc03b4564f873cf0e22 upstream. Now that all non-scalar users of ACCESS_ONCE have been converted to READ_ONCE or ASSIGN once, lets tighten ACCESS_ONCE to only work on scalar types. This variant was proposed by Alexei Starovoitov. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06x86: pmc-atom: Assign debugfs node as soon as possibleAndy Shevchenko
commit 1b43d7125f3b6f7d46e72da64f65f3187a83b66b upstream. pmc_dbgfs_unregister() will be called when pmc->dbgfs_dir is unconditionally NULL on error path in pmc_dbgfs_register(). To prevent this we move the assignment to where is should be. Fixes: f855911c1f48 (x86/pmc_atom: Expose PMC device state and platform sleep state) Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Aubrey Li <aubrey.li@linux.intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Kumar P. Mahesh <mahesh.kumar.p@intel.com> Link: http://lkml.kernel.org/r/1421253575-22509-2-git-send-email-andriy.shevchenko@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06x86/irq: Fix regression caused by commit b568b8601f05Jiang Liu
commit 1ea76fbadd667b19c4fa4466f3a3b55a505e83d9 upstream. Commit b568b8601f05 ("Treat SCI interrupt as normal GSI interrupt") accidently removes support of legacy PIC interrupt when fixing a regression for Xen, which causes a nasty regression on HP/Compaq nc6000 where we fail to register the ACPI interrupt, and thus lose eg. thermal notifications leading a potentially overheated machine. So reintroduce support of legacy PIC based ACPI SCI interrupt. Reported-by: Ville Syrjälä <syrjala@sci.fi> Tested-by: Ville Syrjälä <syrjala@sci.fi> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: linux-pm@vger.kernel.org Link: http://lkml.kernel.org/r/1424052673-22974-1-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06x86, mm/ASLR: Fix stack randomization on 64-bit systemsHector Marco-Gisbert
commit 4e7c22d447bb6d7e37bfe39ff658486ae78e8d77 upstream. The issue is that the stack for processes is not properly randomized on 64 bit architectures due to an integer overflow. The affected function is randomize_stack_top() in file "fs/binfmt_elf.c": static unsigned long randomize_stack_top(unsigned long stack_top) { unsigned int random_variable = 0; if ((current->flags & PF_RANDOMIZE) && !(current->personality & ADDR_NO_RANDOMIZE)) { random_variable = get_random_int() & STACK_RND_MASK; random_variable <<= PAGE_SHIFT; } return PAGE_ALIGN(stack_top) + random_variable; return PAGE_ALIGN(stack_top) - random_variable; } Note that, it declares the "random_variable" variable as "unsigned int". Since the result of the shifting operation between STACK_RND_MASK (which is 0x3fffff on x86_64, 22 bits) and PAGE_SHIFT (which is 12 on x86_64): random_variable <<= PAGE_SHIFT; then the two leftmost bits are dropped when storing the result in the "random_variable". This variable shall be at least 34 bits long to hold the (22+12) result. These two dropped bits have an impact on the entropy of process stack. Concretely, the total stack entropy is reduced by four: from 2^28 to 2^30 (One fourth of expected entropy). This patch restores back the entropy by correcting the types involved in the operations in the functions randomize_stack_top() and stack_maxrandom_size(). The successful fix can be tested with: $ for i in `seq 1 10`; do cat /proc/self/maps | grep stack; done 7ffeda566000-7ffeda587000 rw-p 00000000 00:00 0 [stack] 7fff5a332000-7fff5a353000 rw-p 00000000 00:00 0 [stack] 7ffcdb7a1000-7ffcdb7c2000 rw-p 00000000 00:00 0 [stack] 7ffd5e2c4000-7ffd5e2e5000 rw-p 00000000 00:00 0 [stack] ... Once corrected, the leading bytes should be between 7ffc and 7fff, rather than always being 7fff. Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es> Signed-off-by: Ismael Ripoll <iripoll@upv.es> [ Rebased, fixed 80 char bugs, cleaned up commit message, added test example and CVE ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Fixes: CVE-2015-1593 Link: http://lkml.kernel.org/r/20150214173350.GA18393@www.outflux.net Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06x86/efi: Avoid triple faults during EFI mixed mode callsMatt Fleming
commit 96738c69a7fcdbf0d7c9df0c8a27660011e82a7b upstream. Andy pointed out that if an NMI or MCE is received while we're in the middle of an EFI mixed mode call a triple fault will occur. This can happen, for example, when issuing an EFI mixed mode call while running perf. The reason for the triple fault is that we execute the mixed mode call in 32-bit mode with paging disabled but with 64-bit kernel IDT handlers installed throughout the call. At Andy's suggestion, stop playing the games we currently do at runtime, such as disabling paging and installing a 32-bit GDT for __KERNEL_CS. We can simply switch to the __KERNEL32_CS descriptor before invoking firmware services, and run in compatibility mode. This way, if an NMI/MCE does occur the kernel IDT handler will execute correctly, since it'll jump to __KERNEL_CS automatically. However, this change is only possible post-ExitBootServices(). Before then the firmware "owns" the machine and expects for its 32-bit IDT handlers to be left intact to service interrupts, etc. So, we now need to distinguish between early boot and runtime invocations of EFI services. During early boot, we need to restore the GDT that the firmware expects to be present. We can only jump to the __KERNEL32_CS code segment for mixed mode calls after ExitBootServices() has been invoked. A liberal sprinkling of comments in the thunking code should make the differences in early and late environments more apparent. Reported-by: Andy Lutomirski <luto@amacapital.net> Tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06blk-throttle: check stats_cpu before reading it from sysfsThadeu Lima de Souza Cascardo
commit 045c47ca306acf30c740c285a77a4b4bda6be7c5 upstream. When reading blkio.throttle.io_serviced in a recently created blkio cgroup, it's possible to race against the creation of a throttle policy, which delays the allocation of stats_cpu. Like other functions in the throttle code, just checking for a NULL stats_cpu prevents the following oops caused by that race. [ 1117.285199] Unable to handle kernel paging request for data at address 0x7fb4d0020 [ 1117.285252] Faulting instruction address: 0xc0000000003efa2c [ 1137.733921] Oops: Kernel access of bad area, sig: 11 [#1] [ 1137.733945] SMP NR_CPUS=2048 NUMA PowerNV [ 1137.734025] Modules linked in: bridge stp llc kvm_hv kvm binfmt_misc autofs4 [ 1137.734102] CPU: 3 PID: 5302 Comm: blkcgroup Not tainted 3.19.0 #5 [ 1137.734132] task: c000000f1d188b00 ti: c000000f1d210000 task.ti: c000000f1d210000 [ 1137.734167] NIP: c0000000003efa2c LR: c0000000003ef9f0 CTR: c0000000003ef980 [ 1137.734202] REGS: c000000f1d213500 TRAP: 0300 Not tainted (3.19.0) [ 1137.734230] MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 42008884 XER: 20000000 [ 1137.734325] CFAR: 0000000000008458 DAR: 00000007fb4d0020 DSISR: 40000000 SOFTE: 0 GPR00: c0000000003ed3a0 c000000f1d213780 c000000000c59538 0000000000000000 GPR04: 0000000000000800 0000000000000000 0000000000000000 0000000000000000 GPR08: ffffffffffffffff 00000007fb4d0020 00000007fb4d0000 c000000000780808 GPR12: 0000000022000888 c00000000fdc0d80 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 000001003e120200 c000000f1d5b0cc0 0000000000000200 0000000000000000 GPR24: 0000000000000001 c000000000c269e0 0000000000000020 c000000f1d5b0c80 GPR28: c000000000ca3a08 c000000000ca3dec c000000f1c667e00 c000000f1d213850 [ 1137.734886] NIP [c0000000003efa2c] .tg_prfill_cpu_rwstat+0xac/0x180 [ 1137.734915] LR [c0000000003ef9f0] .tg_prfill_cpu_rwstat+0x70/0x180 [ 1137.734943] Call Trace: [ 1137.734952] [c000000f1d213780] [d000000005560520] 0xd000000005560520 (unreliable) [ 1137.734996] [c000000f1d2138a0] [c0000000003ed3a0] .blkcg_print_blkgs+0xe0/0x1a0 [ 1137.735039] [c000000f1d213960] [c0000000003efb50] .tg_print_cpu_rwstat+0x50/0x70 [ 1137.735082] [c000000f1d2139e0] [c000000000104b48] .cgroup_seqfile_show+0x58/0x150 [ 1137.735125] [c000000f1d213a70] [c0000000002749dc] .kernfs_seq_show+0x3c/0x50 [ 1137.735161] [c000000f1d213ae0] [c000000000218630] .seq_read+0xe0/0x510 [ 1137.735197] [c000000f1d213bd0] [c000000000275b04] .kernfs_fop_read+0x164/0x200 [ 1137.735240] [c000000f1d213c80] [c0000000001eb8e0] .__vfs_read+0x30/0x80 [ 1137.735276] [c000000f1d213cf0] [c0000000001eb9c4] .vfs_read+0x94/0x1b0 [ 1137.735312] [c000000f1d213d90] [c0000000001ebb38] .SyS_read+0x58/0x100 [ 1137.735349] [c000000f1d213e30] [c000000000009218] syscall_exit+0x0/0x98 [ 1137.735383] Instruction dump: [ 1137.735405] 7c6307b4 7f891800 409d00b8 60000000 60420000 3d420004 392a63b0 786a1f24 [ 1137.735471] 7d49502a e93e01c8 7d495214 7d2ad214 <7cead02a> e9090008 e9490010 e9290018 And here is one code that allows to easily reproduce this, although this has first been found by running docker. void run(pid_t pid) { int n; int status; int fd; char *buffer; buffer = memalign(BUFFER_ALIGN, BUFFER_SIZE); n = snprintf(buffer, BUFFER_SIZE, "%d\n", pid); fd = open(CGPATH "/test/tasks", O_WRONLY); write(fd, buffer, n); close(fd); if (fork() > 0) { fd = open("/dev/sda", O_RDONLY | O_DIRECT); read(fd, buffer, 512); close(fd); wait(&status); } else { fd = open(CGPATH "/test/blkio.throttle.io_serviced", O_RDONLY); n = read(fd, buffer, BUFFER_SIZE); close(fd); } free(buffer); exit(0); } void test(void) { int status; mkdir(CGPATH "/test", 0666); if (fork() > 0) wait(&status); else run(getpid()); rmdir(CGPATH "/test"); } int main(int argc, char **argv) { int i; for (i = 0; i < NR_TESTS; i++) test(); return 0; } Reported-by: Ricardo Marin Matinata <rmm@br.ibm.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06Btrfs: fix fsync data loss after adding hard link to inodeFilipe Manana
commit 1a4bcf470c886b955adf36486f4c86f2441d85cb upstream. We have a scenario where after the fsync log replay we can lose file data that had been previously fsync'ed if we added an hard link for our inode and after that we sync'ed the fsync log (for example by fsync'ing some other file or directory). This is because when adding an hard link we updated the inode item in the log tree with an i_size value of 0. At that point the new inode item was in memory only and a subsequent fsync log replay would not make us lose the file data. However if after adding the hard link we sync the log tree to disk, by fsync'ing some other file or directory for example, we ended up losing the file data after log replay, because the inode item in the persisted log tree had an an i_size of zero. This is easy to reproduce, and the following excerpt from my test for xfstests shows this: _scratch_mkfs >> $seqres.full 2>&1 _init_flakey _mount_flakey # Create one file with data and fsync it. # This made the btrfs fsync log persist the data and the inode metadata with # a correct inode->i_size (4096 bytes). $XFS_IO_PROG -f -c "pwrite -S 0xaa -b 4K 0 4K" -c "fsync" \ $SCRATCH_MNT/foo | _filter_xfs_io # Now add one hard link to our file. This made the btrfs code update the fsync # log, in memory only, with an inode metadata having a size of 0. ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link # Now force persistence of the fsync log to disk, for example, by fsyncing some # other file. touch $SCRATCH_MNT/bar $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/bar # Before a power loss or crash, we could read the 4Kb of data from our file as # expected. echo "File content before:" od -t x1 $SCRATCH_MNT/foo # Simulate a crash/power loss. _load_flakey_table $FLAKEY_DROP_WRITES _unmount_flakey _load_flakey_table $FLAKEY_ALLOW_WRITES _mount_flakey # After the fsync log replay, because the fsync log had a value of 0 for our # inode's i_size, we couldn't read anymore the 4Kb of data that we previously # wrote and fsync'ed. The size of the file became 0 after the fsync log replay. echo "File content after:" od -t x1 $SCRATCH_MNT/foo Another alternative test, that doesn't need to fsync an inode in the same transaction it was created, is: _scratch_mkfs >> $seqres.full 2>&1 _init_flakey _mount_flakey # Create our test file with some data. $XFS_IO_PROG -f -c "pwrite -S 0xaa -b 8K 0 8K" \ $SCRATCH_MNT/foo | _filter_xfs_io # Make sure the file is durably persisted. sync # Append some data to our file, to increase its size. $XFS_IO_PROG -f -c "pwrite -S 0xcc -b 4K 8K 4K" \ $SCRATCH_MNT/foo | _filter_xfs_io # Fsync the file, so from this point on if a crash/power failure happens, our # new data is guaranteed to be there next time the fs is mounted. $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo # Add one hard link to our file. This made btrfs write into the in memory fsync # log a special inode with generation 0 and an i_size of 0 too. Note that this # didn't update the inode in the fsync log on disk. ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link # Now make sure the in memory fsync log is durably persisted. # Creating and fsync'ing another file will do it. touch $SCRATCH_MNT/bar $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/bar # As expected, before the crash/power failure, we should be able to read the # 12Kb of file data. echo "File content before:" od -t x1 $SCRATCH_MNT/foo # Simulate a crash/power loss. _load_flakey_table $FLAKEY_DROP_WRITES _unmount_flakey _load_flakey_table $FLAKEY_ALLOW_WRITES _mount_flakey # After mounting the fs again, the fsync log was replayed. # The btrfs fsync log replay code didn't update the i_size of the persisted # inode because the inode item in the log had a special generation with a # value of 0 (and it couldn't know the correct i_size, since that inode item # had a 0 i_size too). This made the last 4Kb of file data inaccessible and # effectively lost. echo "File content after:" od -t x1 $SCRATCH_MNT/foo This isn't a new issue/regression. This problem has been around since the log tree code was added in 2008: Btrfs: Add a write ahead tree log to optimize synchronous operations (commit e02119d5a7b4396c5a872582fddc8bd6d305a70a) Test cases for xfstests follow soon. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06btrfs: fix leak of path in btrfs_find_itemDavid Sterba
commit 381cf6587f8a8a8e981bc0c1aaaa8859b51dc756 upstream. If btrfs_find_item is called with NULL path it allocates one locally but does not free it. Affected paths are inserting an orphan item for a file and for a subvol root. Move the path allocation to the callers. Fixes: 3f870c289900 ("btrfs: expand btrfs_find_item() to include find_orphan_item functionality") Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06btrfs: set proper message level for skinny metadataDavid Sterba
commit 5efa0490cc94aee06cd8d282683e22a8ce0a0026 upstream. This has been confusing people for too long, the message is really just informative. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06libceph: fix double __remove_osd() problemIlya Dryomov
commit 7eb71e0351fbb1b242ae70abb7bb17107fe2f792 upstream. It turns out it's possible to get __remove_osd() called twice on the same OSD. That doesn't sit well with rb_erase() - depending on the shape of the tree we can get a NULL dereference, a soft lockup or a random crash at some point in the future as we end up touching freed memory. One scenario that I was able to reproduce is as follows: <osd3 is idle, on the osd lru list> <con reset - osd3> con_fault_finish() osd_reset() <osdmap - osd3 down> ceph_osdc_handle_map() <takes map_sem> kick_requests() <takes request_mutex> reset_changed_osds() __reset_osd() __remove_osd() <releases request_mutex> <releases map_sem> <takes map_sem> <takes request_mutex> __kick_osd_requests() __reset_osd() __remove_osd() <-- !!! A case can be made that osd refcounting is imperfect and reworking it would be a proper resolution, but for now Sage and I decided to fix this by adding a safe guard around __remove_osd(). Fixes: http://tracker.ceph.com/issues/8087 Cc: Sage Weil <sage@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06samsung-laptop: Add use_native_backlight quirk, and enable it on some modelsHans de Goede
commit 4690555e13c48fef07f2762f6b0cd6b181e326d0 upstream. Since kernel 3.14 the backlight control has been broken on various Samsung Atom based netbooks. This has been bisected and this problem happens since commit b35684b8fa94 ("drm/i915: do full backlight setup at enable time") This has been reported and discussed in detail here: http://lists.freedesktop.org/archives/intel-gfx/2014-July/049395.html Unfortunately no-one has been able to fix this. This only affects Samsung Atom netbooks, and the Linux kernel and the BIOS of those laptops have never worked well together. All affected laptops already have a quirk to avoid using the standard acpi-video interface and instead use the samsung specific SABI interface which samsung-laptop uses. It seems that recent fixes to the i915 driver have also broken backlight control through the SABI interface. The intel_backlight driver OTOH works fine, and also allows for finer grained backlight control. So add a new use_native_backlight quirk, and replace the broken_acpi_video quirk with this quirk for affected models. This new quirk disables acpi-video as before and also stops samsung-laptop from registering the SABI based samsung_laptop backlight interface, leaving only the working intel_backlight interface. This commit enables this new quirk for 3 models which are known to be affected, chances are that it needs to be used on other models too. BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1094948 # N145P BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1115713 # N250P Reported-by: Bertrik Sikken <bertrik@sikken.nl> # N150P Cc: stable@vger.kernel.org # 3.16 Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06jffs2: fix handling of corrupted summary lengthChen Jie
commit 164c24063a3eadee11b46575c5482b2f1417be49 upstream. sm->offset maybe wrong but magic maybe right, the offset do not have CRC. Badness at c00c7580 [verbose debug info unavailable] NIP: c00c7580 LR: c00c718c CTR: 00000014 REGS: df07bb40 TRAP: 0700 Not tainted (2.6.34.13-WR4.3.0.0_standard) MSR: 00029000 <EE,ME,CE> CR: 22084f84 XER: 00000000 TASK = df84d6e0[908] 'mount' THREAD: df07a000 GPR00: 00000001 df07bbf0 df84d6e0 00000000 00000001 00000000 df07bb58 00000041 GPR08: 00000041 c0638860 00000000 00000010 22084f88 100636c8 df814ff8 00000000 GPR16: df84d6e0 dfa558cc c05adb90 00000048 c0452d30 00000000 000240d0 000040d0 GPR24: 00000014 c05ae734 c05be2e0 00000000 00000001 00000000 00000000 c05ae730 NIP [c00c7580] __alloc_pages_nodemask+0x4d0/0x638 LR [c00c718c] __alloc_pages_nodemask+0xdc/0x638 Call Trace: [df07bbf0] [c00c718c] __alloc_pages_nodemask+0xdc/0x638 (unreliable) [df07bc90] [c00c7708] __get_free_pages+0x20/0x48 [df07bca0] [c00f4a40] __kmalloc+0x15c/0x1ec [df07bcd0] [c01fc880] jffs2_scan_medium+0xa58/0x14d0 [df07bd70] [c01ff38c] jffs2_do_mount_fs+0x1f4/0x6b4 [df07bdb0] [c020144c] jffs2_do_fill_super+0xa8/0x260 [df07bdd0] [c020230c] jffs2_fill_super+0x104/0x184 [df07be00] [c0335814] get_sb_mtd_aux+0x9c/0xec [df07be20] [c033596c] get_sb_mtd+0x84/0x1e8 [df07be60] [c0201ed0] jffs2_get_sb+0x1c/0x2c [df07be70] [c0103898] vfs_kern_mount+0x78/0x1e8 [df07bea0] [c0103a58] do_kern_mount+0x40/0x100 [df07bec0] [c011fe90] do_mount+0x240/0x890 [df07bf10] [c0120570] sys_mount+0x90/0xd8 [df07bf40] [c00110d8] ret_from_syscall+0x0/0x4 === Exception: c01 at 0xff61a34 LR = 0x100135f0 Instruction dump: 38800005 38600000 48010f41 4bfffe1c 4bfc2d15 4bfffe8c 72e90200 4082fc28 3d20c064 39298860 8809000d 68000001 <0f000000> 2f800000 419efc0c 38000001 mount: mounting /dev/mtdblock3 on /common failed: Input/output error Signed-off-by: Chen Jie <chenjie6@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06EDAC, amd64_edac: Prevent OOPS with >16 memory controllersDaniel J Blueman
commit 0c510cc83bdbaac8406f4f7caef34f4da0ba35ea upstream. When DRAM errors occur on memory controllers after EDAC_MAX_MCS (16), the kernel fatally dereferences unallocated structures, see splat below; this occurs on at least NumaConnect systems. Fix by checking if a memory controller info structure was found. BUG: unable to handle kernel NULL pointer dereference at 0000000000000320 IP: [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0 PGD 2f8b5a3067 PUD 2f8b5a2067 PMD 0 Oops: 0000 [#2] SMP Modules linked in: CPU: 224 PID: 11930 Comm: stream_c.exe.gn Tainted: G D 3.19.0 #1 Hardware name: Supermicro H8QGL/H8QGL, BIOS 3.5b 01/28/2015 task: ffff8807dbfb8c00 ti: ffff8807dd16c000 task.ti: ffff8807dd16c000 RIP: 0010:[<ffffffff819f714f>] [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0 RSP: 0000:ffff8907dfc03c48 EFLAGS: 00010297 RAX: 0000000000000001 RBX: 9c67400010080a13 RCX: 0000000000001dc6 RDX: 000000001dc61dc6 RSI: ffff8907dfc03df0 RDI: 000000000000001c RBP: ffff8907dfc03ce8 R08: 0000000000000000 R09: 0000000000000022 R10: ffff891fffa30380 R11: 00000000001cfc90 R12: 0000000000000008 R13: 0000000000000000 R14: 000000000000001c R15: 00009c6740001000 FS: 00007fa97ee18700(0000) GS:ffff8907dfc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000320 CR3: 0000003f889b8000 CR4: 00000000000407e0 Stack: 0000000000000000 ffff8907dfc03df0 0000000000000008 9c67400010080a13 000000000000001c 00009c6740001000 ffff8907dfc03c88 ffffffff810e4f9a ffff8907dfc03ce8 ffffffff81b375b9 0000000000000000 0000000000000010 Call Trace: <IRQ> ? vprintk_default ? printk amd_decode_mce notifier_call_chain atomic_notifier_call_chain mce_log machine_check_poll mce_timer_fn ? mce_cpu_restart call_timer_fn.isra.29 run_timer_softirq __do_softirq irq_exit smp_apic_timer_interrupt apic_timer_interrupt <EOI> ? down_read_trylock __do_page_fault ? __schedule do_page_fault page_fault Signed-off-by: Daniel J Blueman <daniel@numascale.com> Link: http://lkml.kernel.org/r/1424144078-24589-1-git-send-email-daniel@numascale.com [ Boris: massage commit message ] Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06sb_edac: Fix detection on SNB machinesBorislav Petkov
commit 11249e73992981e31fd50e7231da24fad68e3320 upstream. d0585cd815fa ("sb_edac: Claim a different PCI device") changed the probing of sb_edac to look for PCI device 0x3ca0: 3f:0e.0 System peripheral: Intel Corporation Xeon E5/Core i7 Processor Home Agent (rev 07) 00: 86 80 a0 3c 00 00 00 00 07 00 80 08 00 00 80 00 ... but we're matching for 0x3ca8, i.e. PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TA in sbridge_probe() therefore the probing fails. Changing it to probe for 0x3ca0 (PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_HA0), .i.e., the 14.0 device, fixes the issue and driver loads successfully again: [ 2449.013120] EDAC DEBUG: sbridge_init: [ 2449.017029] EDAC sbridge: Seeking for: PCI ID 8086:3ca0 [ 2449.022368] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca0 [ 2449.028498] EDAC sbridge: Seeking for: PCI ID 8086:3ca0 [ 2449.033768] EDAC sbridge: Seeking for: PCI ID 8086:3ca8 [ 2449.039028] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca8 [ 2449.045155] EDAC sbridge: Seeking for: PCI ID 8086:3ca8 ... Add a debug printk while at it to be able to catch the failure in the future and dump driver version on successful load. Fixes: d0585cd815fa ("sb_edac: Claim a different PCI device") Acked-by: Aristeu Rozanski <aris@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06md/raid1: fix read balance when a drive is write-mostly.Tomáš Hodek
commit d1901ef099c38afd11add4cfb3312c02ef21ec4a upstream. When a drive is marked write-mostly it should only be the target of reads if there is no other option. This behaviour was broken by commit 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc md/raid1: read balance chooses idlest disk for SSD which causes a write-mostly device to be *preferred* is some cases. Restore correct behaviour by checking and setting best_dist_disk and best_pending_disk rather than best_disk. We only need to test one of these as they are both changed from -1 or >=0 at the same time. As we leave min_pending and best_dist unchanged, any non-write-mostly device will appear better than the write-mostly device. Reported-by: Tomáš Hodek <tomas.hodek@volny.cz> Reported-by: Dark Penguin <darkpenguin@yandex.ru> Signed-off-by: NeilBrown <neilb@suse.de> Link: http://marc.info/?l=linux-raid&m=135982797322422 Fixes: 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06md/raid5: Fix livelock when array is both resyncing and degraded.NeilBrown
commit 26ac107378c4742978216be1005b7291b799c7b2 upstream. Commit a7854487cd7128a30a7f4f5259de9f67d5efb95f: md: When RAID5 is dirty, force reconstruct-write instead of read-modify-write. Causes an RCW cycle to be forced even when the array is degraded. A degraded array cannot support RCW as that requires reading all data blocks, and one may be missing. Forcing an RCW when it is not possible causes a live-lock and the code spins, repeatedly deciding to do something that cannot succeed. So change the condition to only force RCW on non-degraded arrays. Reported-by: Manibalan P <pmanibalan@amiindia.co.in> Bisected-by: Jes Sorensen <Jes.Sorensen@redhat.com> Tested-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de> Fixes: a7854487cd7128a30a7f4f5259de9f67d5efb95f Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06perf tools: Fix probing for PERF_FLAG_FD_CLOEXEC flagAdrian Hunter
commit 48536c9195ae8c2a00fd8f400bac72ab613feaab upstream. Commit f6edb53c4993ffe92ce521fb449d1c146cea6ec2 converted the probe to a CPU wide event first (pid == -1). For kernels that do not support the PERF_FLAG_FD_CLOEXEC flag the probe fails with EINVAL. Since this errno is not handled pid is not reset to 0 and the subsequent use of pid = -1 as an argument brings in an additional failure path if perf_event_paranoid > 0: $ perf record -- sleep 1 perf_event_open(..., 0) failed unexpectedly with error 13 (Permission denied) [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.007 MB /tmp/perf.data (11 samples) ] Also, ensure the fd of the confirmation check is closed and comment why pid = -1 is used. Needs to go to 3.18 stable tree as well. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Based-on-patch-by: David Ahern <david.ahern@oracle.com> Acked-by: David Ahern <david.ahern@oracle.com> Cc: David Ahern <dsahern@gmail.com> Link: http://lkml.kernel.org/r/54EC610C.8000403@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06clocksource: mtk: Fix race conditions in probe codeMatthias Brugger
commit d4a19eb3b15a4ba98f627182f48d5bc0cffae670 upstream. We have two race conditions in the probe code which could lead to a null pointer dereference in the interrupt handler. The interrupt handler accesses the clockevent device, which may not yet be registered. First race condition happens when the interrupt handler gets registered before the interrupts get disabled. The second race condition happens when the interrupts get enabled, but the clockevent device is not yet registered. Fix that by disabling the interrupts before we register the interrupt and enable the interrupts after the clockevent device got registered. Reported-by: Gongbae Park <yongbae2@gmail.com> Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06metag: Fix KSTK_EIP() and KSTK_ESP() macrosJames Hogan
commit c2996cb29bfb73927a79dc96e598a718e843f01a upstream. The KSTK_EIP() and KSTK_ESP() macros should return the user program counter (PC) and stack pointer (A0StP) of the given task. These are used to determine which VMA corresponds to the user stack in /proc/<pid>/maps, and for the user PC & A0StP in /proc/<pid>/stat. However for Meta the PC & A0StP from the task's kernel context are used, resulting in broken output. For example in following /proc/<pid>/maps output, the 3afff000-3b021000 VMA should be described as the stack: # cat /proc/self/maps ... 100b0000-100b1000 rwxp 00000000 00:00 0 [heap] 3afff000-3b021000 rwxp 00000000 00:00 0 And in the following /proc/<pid>/stat output, the PC is in kernel code (1074234964 = 0x40078654) and the A0StP is in the kernel heap (1335981392 = 0x4fa17550): # cat /proc/self/stat 51 (cat) R ... 1335981392 1074234964 ... Fix the definitions of KSTK_EIP() and KSTK_ESP() to use task_pt_regs(tsk)->ctx rather than (tsk)->thread.kernel_context. This gets the registers from the user context stored after the thread info at the base of the kernel stack, which is from the last entry into the kernel from userland, regardless of where in the kernel the task may have been interrupted, which results in the following more correct /proc/<pid>/maps output: # cat /proc/self/maps ... 0800b000-08070000 r-xp 00000000 00:02 207 /lib/libuClibc-0.9.34-git.so ... 100b0000-100b1000 rwxp 00000000 00:00 0 [heap] 3afff000-3b021000 rwxp 00000000 00:00 0 [stack] And /proc/<pid>/stat now correctly reports the PC in libuClibc (134320308 = 0x80190b4) and the A0StP in the [stack] region (989864576 = 0x3b002280): # cat /proc/self/stat 51 (cat) R ... 989864576 134320308 ... Reported-by: Alexey Brodkin <Alexey.Brodkin@synopsys.com> Reported-by: Vineet Gupta <Vineet.Gupta1@synopsys.com> Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: linux-metag@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06xfs: Fix quota type in quota structures when reusing quota fileJan Kara
commit dfcc70a8c868fe03276fa59864149708fb41930b upstream. For filesystems without separate project quota inode field in the superblock we just reuse project quota file for group quotas (and vice versa) if project quota file is allocated and we need group quota file. When we reuse the file, quota structures on disk suddenly have wrong type stored in d_flags though. Nobody really cares about this (although structure type reported to userspace was wrong as well) except that after commit 14bf61ffe6ac (quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space units) assertion in xfs_qm_scall_getquota() started to trigger on xfs/106 test (apparently I was testing without XFS_DEBUG so I didn't notice when submitting the above commit). Fix the problem by properly resetting ddq->d_flags when running quotacheck for a quota file. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06gpio: tps65912: fix wrong container_of argumentsNicolas Saenz Julienne
commit 2f97c20e5f7c3582c7310f65a04465bfb0fd0e85 upstream. The gpio_chip operations receive a pointer the gpio_chip struct which is contained in the driver's private struct, yet the container_of call in those functions point to the mfd struct defined in include/linux/mfd/tps65912.h. Signed-off-by: Nicolas Saenz Julienne <nicolassaenzj@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06gpiolib: of: allow of_gpiochip_find_and_xlate to find more than one chip per ↵Hans Holmberg
node commit 9cf75e9e4ddd587ac12e88e8751c358b7b27e95f upstream. The change: 7b8792bbdffdff3abda704f89c6a45ea97afdc62 gpiolib: of: Correct error handling in of_get_named_gpiod_flags assumed that only one gpio-chip is registred per of-node. Some drivers register more than one chip per of-node, so adjust the matching function of_gpiochip_find_and_xlate to not stop looking for chips if a node-match is found and the translation fails. Fixes: 7b8792bbdffd ("gpiolib: of: Correct error handling in of_get_named_gpiod_flags") Signed-off-by: Hans Holmberg <hans.holmberg@intel.com> Acked-by: Alexandre Courbot <acourbot@nvidia.com> Tested-by: Robert Jarzmik <robert.jarzmik@free.fr> Tested-by: Tyler Hall <tylerwhall@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06arm64: compat Fix siginfo_t -> compat_siginfo_t conversion on big endianCatalin Marinas
commit 9d42d48a342aee208c1154696196497fdc556bbf upstream. The native (64-bit) sigval_t union contains sival_int (32-bit) and sival_ptr (64-bit). When a compat application invokes a syscall that takes a sigval_t value (as part of a larger structure, e.g. compat_sys_mq_notify, compat_sys_timer_create), the compat_sigval_t union is converted to the native sigval_t with sival_int overlapping with either the least or the most significant half of sival_ptr, depending on endianness. When the corresponding signal is delivered to a compat application, on big endian the current (compat_uptr_t)sival_ptr cast always returns 0 since sival_int corresponds to the top part of sival_ptr. This patch fixes copy_siginfo_to_user32() so that sival_int is copied to the compat_siginfo_t structure. Reported-by: Bamvor Jian Zhang <bamvor.zhangjian@huawei.com> Tested-by: Bamvor Jian Zhang <bamvor.zhangjian@huawei.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06hx4700: regulator: declare full constraintsMartin Vajnar
commit a52d209336f8fc7483a8c7f4a8a7d2a8e1692a6c upstream. Since the removal of CONFIG_REGULATOR_DUMMY option, the touchscreen stopped working. This patch enables the "replacement" for REGULATOR_DUMMY and allows the touchscreen to work even though there is no regulator for "vcc". Signed-off-by: Martin Vajnar <martin.vajnar@gmail.com> Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06KVM: s390: avoid memory leaks if __inject_vm() failsDavid Hildenbrand
commit 428d53be5e7468769d4e7899cca06ed5f783a6e1 upstream. We have to delete the allocated interrupt info if __inject_vm() fails. Otherwise user space can keep flooding kvm with floating interrupts and provoke more and more memory leaks. Reported-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06KVM: s390: floating irqs: fix user triggerable endless loopDavid Hildenbrand
commit 8e2207cdd087ebb031e9118d1fd0902c6533a5e5 upstream. If a vm with no VCPUs is created, the injection of a floating irq leads to an endless loop in the kernel. Let's skip the search for a destination VCPU for a floating irq if no VCPUs were created. Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06KVM: s390: base hrtimer on a monotonic clockDavid Hildenbrand
commit 0ac96caf0f9381088c673a16d910b1d329670edf upstream. The hrtimer that handles the wait with enabled timer interrupts should not be disturbed by changes of the host time. This patch changes our hrtimer to be based on a monotonic clock. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06KVM: s390: forward hrtimer if guest ckc not pending yetDavid Hildenbrand
commit 2d00f759427bb3ed963b60f570830e9eca7e1c69 upstream. Patch 0759d0681cae ("KVM: s390: cleanup handle_wait by reusing kvm_vcpu_block") changed the way pending guest clock comparator interrupts are detected. It was assumed that as soon as the hrtimer wakes up, the condition for the guest ckc is satisfied. This is however only true as long as adjclock() doesn't speed up the monotonic clock. Reason is that the hrtimer is based on CLOCK_MONOTONIC, the guest clock comparator detection is based on the raw TOD clock. If CLOCK_MONOTONIC runs faster than the TOD clock, the hrtimer wakes the target VCPU up too early and the target VCPU will not detect any pending interrupts, therefore going back to sleep. It will never be woken up again because the hrtimer has finished. The VCPU is stuck. As a quick fix, we have to forward the hrtimer until the guest clock comparator is really due, to guarantee properly timed wake ups. As the hrtimer callback might be triggered on another cpu, we have to make sure that the timer is really stopped and not currently executing the callback on another cpu. This can happen if the vcpu thread is scheduled onto another physical cpu, but the timer base is not migrated. So lets use hrtimer_cancel instead of try_to_cancel. A proper fix might be to introduce a RAW based hrtimer. Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06udf: Check length of extended attributes and allocation descriptorsJan Kara
commit 23b133bdc452aa441fcb9b82cbf6dd05cfd342d0 upstream. Check length of extended attributes and allocation descriptors when loading inodes from disk. Otherwise corrupted filesystems could confuse the code and make the kernel oops. Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06udf: Remove repeated loads blocksizeJan Kara
commit 79144954278d4bb5989f8b903adcac7a20ff2a5a upstream. Store blocksize in a local variable in udf_fill_inode() since it is used a lot of times. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06MIPS: HTW: Prevent accidental HTW start due to nested htw_{start, stop}Markos Chandras
commit ed4cbc81addbc076b016c5b979fd1a02f0897f0a upstream. activate_mm() and switch_mm() call get_new_mmu_context() which in turn can enable the HTW before the entryhi is changed with the new ASID. Since the latter will enable the HTW in local_flush_tlb_all(), then there is a small timing window where the HTW is running with the new ASID but with an old pgd since the TLBMISS_HANDLER_SETUP_PGD hasn't assigned a new one yet. In order to prevent that, we introduce a simple htw counter to avoid starting HTW accidentally due to nested htw_{start,stop}() sequences. Moreover, since various IPI calls can enforce TLB flushing operations on a different core, such an operation may interrupt another htw_{stop,start} in progress leading inconsistent updates of the htw_seq variable. In order to avoid that, we disable the interrupts whenever we update that variable. Signed-off-by: Markos Chandras <markos.chandras@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/9118/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06ARC: fix page address calculation if PAGE_OFFSET != LINUX_LINK_BASEAlexey Brodkin
commit 06f34e1c28f3608b0ce5b310e41102d3fe7b65a1 upstream. We used to calculate page address differently in 2 cases: 1. In virt_to_page(x) we do --->8--- mem_map + (x - CONFIG_LINUX_LINK_BASE) >> PAGE_SHIFT --->8--- 2. In in pte_page(x) we do --->8--- mem_map + (pte_val(x) - PAGE_OFFSET) >> PAGE_SHIFT --->8--- That leads to problems in case PAGE_OFFSET != CONFIG_LINUX_LINK_BASE - different pages will be selected depending on where and how we calculate page address. In particular in the STAR 9000853582 when gdb attempted to read memory of another process it got improper page in get_user_pages() because this is exactly one of the places where we search for a page by pte_page(). The fix is trivial - we need to calculate page address similarly in both cases. Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06serial: fsl_lpuart: avoid new transfer while DMA is runningStefan Agner
commit 5f1437f61a0b351d25b528c159360da3d5e8c77b upstream. When the UART is in DMA receive mode (RDMAS set) and one character just arrived while another interrupt is handled (e.g. TX), the RDRF (receiver data register full flag) is set due to the water level of 1. But since the DMA will take care of this character, there is no need to handle it by calling lpuart_prepare_rx. Handling it leads to adding the RX timeout timer twice: [ 74.336698] Kernel BUG at 80053070 [verbose debug info unavailable] [ 74.342999] Internal error: Oops - BUG: 0 [#1] ARM0:00.00 khungtaskd [ 74.347817] Modules linked in: 0 S 0.0 0.0 0:00.00 writeback [ 74.350926] CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00001-g39d78e2 #1788 [ 74.358617] Hardware name: Freescale Vybrid VF610 (Device Tree)t [ 74.364563] task: 807a7678 ti: 8079c000 task.ti: 8079c000 kblockd [ 74.370002] PC is at add_timer+0x24/0x28.0 0.0 0:00.09 kworker/u2:1 [ 74.373960] LR is at lpuart_int+0x15c/0x3d8 [ 74.378171] pc : [<80053070>] lr : [<802e0d88>] psr: a0010193 [ 74.378171] sp : 8079de10 ip : 8079de20 fp : 8079de1c [ 74.389694] r10: 807d44c0 r9 : 8688c300 r8 : 00000013 [ 74.394943] r7 : 20010193 r6 : 00000000 r5 : 000000a0 r4 : 86997210 [ 74.401498] r3 : ffffa7da r2 : 80817868 r1 : 86997210 r0 : 86997344 [ 74.408052] Flags: NzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel [ 74.415489] Control: 10c5387d Table: 8611c059 DAC: 00000015 [ 74.421265] Process swapper (pid: 0, stack limit = 0x8079c230) ... Solve this by only execute the receiver path (lpuart_prepare_rx) if the DMA receive mode (RDMAS) is not set. Also, make sure the flag is cleared on initialization, in case it has been left set. This can be best reproduced using UART as a serial console, then running top while dd'ing data into the terminal. Signed-off-by: Stefan Agner <stefan@agner.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06serial: fsl_lpuart: delete timer on shutdownStefan Agner
commit 4a8588a1cf867333187d9ff071e6fbdab587d194 upstream. If the serial port gets closed while a RX transfer is in progress, the timer might fire after the serial port shutdown finished. This leads in a NULL pointer dereference: [ 7.508324] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 7.516590] pgd = 86348000 [ 7.519445] [00000000] *pgd=86179831, *pte=00000000, *ppte=00000000 [ 7.526145] Internal error: Oops: 17 [#1] ARM [ 7.530611] Modules linked in: [ 7.533876] CPU: 0 PID: 123 Comm: systemd Not tainted 3.19.0-rc3-00004-g5b11ea7 #1778 [ 7.541827] Hardware name: Freescale Vybrid VF610 (Device Tree) [ 7.547862] task: 861c3400 ti: 86ac8000 task.ti: 86ac8000 [ 7.553392] PC is at lpuart_timer_func+0x24/0xf8 [ 7.558127] LR is at lpuart_timer_func+0x20/0xf8 [ 7.562857] pc : [<802df99c>] lr : [<802df998>] psr: 600b0113 [ 7.562857] sp : 86ac9b90 ip : 86ac9b90 fp : 86ac9bbc [ 7.574467] r10: 80817180 r9 : 80817b98 r8 : 80817998 [ 7.579803] r7 : 807acee0 r6 : 86989000 r5 : 00000100 r4 : 86997210 [ 7.586444] r3 : 86ac8000 r2 : 86ac9bc0 r1 : 86997210 r0 : 00000000 [ 7.593085] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user [ 7.600341] Control: 10c5387d Table: 86348059 DAC: 00000015 [ 7.606203] Process systemd (pid: 123, stack limit = 0x86ac8230) Setup the timer on UART startup which allows to delete the timer unconditionally on shutdown. This also saves the initialization on each transfer. Signed-off-by: Stefan Agner <stefan@agner.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06ntp: Fixup adjtimex freq validation on 32-bit systemsJohn Stultz
commit 29183a70b0b828500816bd794b3fe192fce89f73 upstream. Additional validation of adjtimex freq values to avoid potential multiplication overflows were added in commit 5e5aeb4367b (time: adjtimex: Validate the ADJ_FREQUENCY values) Unfortunately the patch used LONG_MAX/MIN instead of LLONG_MAX/MIN, which was fine on 64-bit systems, but being much smaller on 32-bit systems caused false positives resulting in most direct frequency adjustments to fail w/ EINVAL. ntpd only does direct frequency adjustments at startup, so the issue was not as easily observed there, but other time sync applications like ptpd and chrony were more effected by the bug. See bugs: https://bugzilla.kernel.org/show_bug.cgi?id=92481 https://bugzilla.redhat.com/show_bug.cgi?id=1188074 This patch changes the checks to use LLONG_MAX for clarity, and additionally the checks are disabled on 32-bit systems since LLONG_MAX/PPM_SCALE is always larger then the 32-bit long freq value, so multiplication overflows aren't possible there. Reported-by: Josh Boyer <jwboyer@fedoraproject.org> Reported-by: George Joseph <george.joseph@fairview5.com> Tested-by: George Joseph <george.joseph@fairview5.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sasha Levin <sasha.levin@oracle.com> Link: http://lkml.kernel.org/r/1423553436-29747-1-git-send-email-john.stultz@linaro.org [ Prettified the changelog and the comments a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06kdb: Fix off by one error in kdb_cpu()Jason Wessel
commit df0036d117e6c9df36324e517728e33543065f9a upstream. There was a follow on replacement patch against the prior "kgdb: Timeout if secondary CPUs ignore the roundup". See: https://lkml.org/lkml/2015/1/7/442 This patch is the delta vs the patch that was committed upstream: * Fix an off-by-one error in kdb_cpu(). * Replace NR_CPUS with CONFIG_NR_CPUS to tell checkpatch that we really want a static limit. * Removed the "KGDB: " prefix from the pr_crit() in debug_core.c (kgdb-next contains a patch which introduced pr_fmt() to this file to the tag will now be applied automatically). Cc: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06kdb: Avoid printing KERN_ levels to consolesDaniel Thompson
commit f7d4ca8bbfda23b4f1eae9b6757ff64166b093d5 upstream. Currently when kdb traps printk messages then the raw log level prefix (consisting of '\001' followed by a numeral) does not get stripped off before the message is issued to the various I/O handlers supported by kdb. This causes annoying visual noise as well as causing problems grepping for ^. It is also a change of behaviour compared to normal usage of printk() usage. For example <SysRq>-h ends up with different output to that of kdb's "sr h". This patch addresses the problem by stripping log levels from messages before they are issued to the I/O handlers. printk() which can also act as an i/o handler in some cases is special cased; if the caller provided a log level then the prefix will be preserved when sent to printk(). The addition of non-printable characters to the output of kdb commands is a regression, albeit and extremely elderly one, introduced by commit 04d2c8c83d0e ("printk: convert the format for KERN_<LEVEL> to a 2 byte pattern"). Note also that this patch does *not* restore the original behaviour from v3.5. Instead it makes printk() from within a kdb command display the message without any prefix (i.e. like printk() normally does). Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Joe Perches <joe@perches.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06kdb: fix incorrect counts in KDB summary command outputJay Lan
commit 146755923262037fc4c54abc28c04b1103f3cc51 upstream. The output of KDB 'summary' command should report MemTotal, MemFree and Buffers output in kB. Current codes report in unit of pages. A define of K(x) as is defined in the code, but not used. This patch would apply the define to convert the values to kB. Please include me on Cc on replies. I do not subscribe to linux-kernel. Signed-off-by: Jay Lan <jlan@sgi.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06ARM: mvebu: build armada375-smp code conditionallyArnd Bergmann
commit 165235180ff61f0012ea68a299e46daec43dcaa7 upstream. mvebu_armada375_smp_wa_init is only used on armada 375 but is defined for all mvebu machines. As it calls a function that is only provided sometimes, this can result in a link error: arch/arm/mach-mvebu/built-in.o: In function `mvebu_armada375_smp_wa_init': :(.text+0x228): undefined reference to `mvebu_setup_boot_addr_wa' To solve this, we can just change the existing #ifdef around the function to also check for Armada375 SMP platforms. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 305969fb6292 ("ARM: mvebu: use the common function for Armada 375 SMP workaround") Cc: Andrew Lunn <andrew@lunn.ch> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Gregory Clement <gregory.clement@free-electrons.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06ARM: vexpress: use ARM_CPU_SUSPEND if neededArnd Bergmann
commit 95fcedb027a27f32bf2434f9271635c380e57fb5 upstream. The vexpress tc2 power management code calls mcpm_loopback, which is only available if ARM_CPU_SUSPEND is enabled, otherwise we get a link error: arch/arm/mach-vexpress/built-in.o: In function `tc2_pm_init': arch/arm/mach-vexpress/tc2_pm.c:389: undefined reference to `mcpm_loopback' This explicitly selects ARM_CPU_SUSPEND like other platforms that need it. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 3592d7e002438 ("ARM: 8082/1: TC2: test the MCPM loopback during boot") Acked-by: Nicolas Pitre <nico@linaro.org> Acked-by: Liviu Dudau <liviu.dudau@arm.com> Cc: Kevin Hilman <khilman@linaro.org> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>