summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)Author
2015-03-06quota: Store maximum space limit in bytesJan Kara
commit b10a08194c2b615955dfab2300331a90ae9344c7 upstream. Currently maximum space limit quota format supports is in blocks however since we store space limits in bytes, this is somewhat confusing. So store the maximum limit in bytes as well. Also rename the field to match the new unit and related inode field to match the new naming scheme. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06kernel: make READ_ONCE() valid on const argumentsLinus Torvalds
commit dd36929720f40f17685e841ae0d4c581c165ea60 upstream. The use of READ_ONCE() causes lots of warnings witht he pending paravirt spinlock fixes, because those ends up having passing a member to a 'const' structure to READ_ONCE(). There should certainly be nothing wrong with using READ_ONCE() with a const source, but the helper function __read_once_size() would cause warnings because it would drop the 'const' qualifier, but also because the destination would be marked 'const' too due to the use of 'typeof'. Use a union of types in READ_ONCE() to avoid this issue. Also make sure to use parenthesis around the macro arguments to avoid possible operator precedence issues. Tested-by: Ingo Molnar <mingo@kernel.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06kernel: Fix sparse warning for ACCESS_ONCEChristian Borntraeger
commit c5b19946eb76c67566aae6a84bf2b10ad59295ea upstream. Commit 927609d622a3 ("kernel: tighten rules for ACCESS ONCE") results in sparse warnings like "Using plain integer as NULL pointer" - Let's add a type cast to the dummy assignment. To avoid warnings lik "sparse: warning: cast to restricted __hc32" we also use __force on that cast. Fixes: 927609d622a3 ("kernel: tighten rules for ACCESS ONCE") Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06kernel: tighten rules for ACCESS ONCEChristian Borntraeger
commit 927609d622a3773995f84bc03b4564f873cf0e22 upstream. Now that all non-scalar users of ACCESS_ONCE have been converted to READ_ONCE or ASSIGN once, lets tighten ACCESS_ONCE to only work on scalar types. This variant was proposed by Alexei Starovoitov. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06kdb: Avoid printing KERN_ levels to consolesDaniel Thompson
commit f7d4ca8bbfda23b4f1eae9b6757ff64166b093d5 upstream. Currently when kdb traps printk messages then the raw log level prefix (consisting of '\001' followed by a numeral) does not get stripped off before the message is issued to the various I/O handlers supported by kdb. This causes annoying visual noise as well as causing problems grepping for ^. It is also a change of behaviour compared to normal usage of printk() usage. For example <SysRq>-h ends up with different output to that of kdb's "sr h". This patch addresses the problem by stripping log levels from messages before they are issued to the I/O handlers. printk() which can also act as an i/o handler in some cases is special cased; if the caller provided a log level then the prefix will be preserved when sent to printk(). The addition of non-printable characters to the output of kdb commands is a regression, albeit and extremely elderly one, introduced by commit 04d2c8c83d0e ("printk: convert the format for KERN_<LEVEL> to a 2 byte pattern"). Note also that this patch does *not* restore the original behaviour from v3.5. Instead it makes printk() from within a kdb command display the message without any prefix (i.e. like printk() normally does). Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Joe Perches <joe@perches.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06USB: add flag for HCDs that can't receive wakeup requests (isp1760-hcd)Alan Stern
commit 074f9dd55f9cab1b82690ed7e44bcf38b9616ce0 upstream. Currently the USB stack assumes that all host controller drivers are capable of receiving wakeup requests from downstream devices. However, this isn't true for the isp1760-hcd driver, which means that it isn't safe to do a runtime suspend of any device attached to a root-hub port if the device requires wakeup. This patch adds a "cant_recv_wakeups" flag to the usb_hcd structure and sets the flag in isp1760-hcd. The core is modified to prevent a direct child of the root hub from being put into runtime suspend with wakeup enabled if the flag is set. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Greg Kroah-Hartman <greg@kroah.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06USB: don't cancel queued resets when unbinding driversAlan Stern
commit 524134d422316a59d5464ccbc12036bbe90c5563 upstream. The USB stack provides a mechanism for drivers to request an asynchronous device reset (usb_queue_reset_device()). The mechanism uses a work item (reset_ws) embedded in the usb_interface structure used by the driver, and the reset is carried out by a work queue routine. The asynchronous reset can race with driver unbinding. When this happens, we try to cancel the queued reset before unbinding the driver, on the theory that the driver won't care about any resets once it is unbound. However, thanks to the fact that lockdep now tracks work queue accesses, this can provoke a lockdep warning in situations where the device reset causes another interface's driver to be unbound; see http://marc.info/?l=linux-usb&m=141893165203776&w=2 for an example. The reason is that the work routine for reset_ws in one interface calls cancel_queued_work() for the reset_ws in another interface. Lockdep thinks this might lead to a work routine trying to cancel itself. The simplest solution is not to cancel queued resets when unbinding drivers. This means we now need to acquire a reference to the usb_interface when queuing a reset_ws work item and to drop the reference when the work routine finishes. We also need to make sure that the usb_interface structure doesn't outlive its parent usb_device; this means acquiring and dropping a reference when the interface is created and destroyed. In addition, cancelling a queued reset can fail (if the device is in the middle of an earlier reset), and this can cause usb_reset_device() to try to rebind an interface that has been deallocated (see http://marc.info/?l=linux-usb&m=142175717016628&w=2 for details). Acquiring the extra references prevents this failure. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Russell King - ARM Linux <linux@arm.linux.org.uk> Reported-by: Olivier Sobrie <olivier@sobrie.be> Tested-by: Olivier Sobrie <olivier@sobrie.be> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06usb: core: buffer: smallest buffer should start at ARCH_DMA_MINALIGNSebastian Andrzej Siewior
commit 5efd2ea8c9f4f12916ffc8ba636792ce052f6911 upstream. the following error pops up during "testusb -a -t 10" | musb-hdrc musb-hdrc.1.auto: dma_pool_free buffer-128, f134e000/be842000 (bad dma) hcd_buffer_create() creates a few buffers, the smallest has 32 bytes of size. ARCH_KMALLOC_MINALIGN is set to 64 bytes. This combo results in hcd_buffer_alloc() returning memory which is 32 bytes aligned and it might by identified by buffer_offset() as another buffer. This means the buffer which is on a 32 byte boundary will not get freed, instead it tries to free another buffer with the error message. This patch fixes the issue by creating the smallest DMA buffer with the size of ARCH_KMALLOC_MINALIGN (or 32 in case ARCH_KMALLOC_MINALIGN is smaller). This might be 32, 64 or even 128 bytes. The next three pools will have the size 128, 512 and 2048. In case the smallest pool is 128 bytes then we have only three pools instead of four (and zero the first entry in the array). The last pool size is always 2048 bytes which is the assumed PAGE_SIZE / 2 of 4096. I doubt it makes sense to continue using PAGE_SIZE / 2 where we would end up with 8KiB buffer in case we have 16KiB pages. Instead I think it makes sense to have a common size(s) and extend them if there is need to. There is a BUILD_BUG_ON() now in case someone has a minalign of more than 128 bytes. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06cipso: don't use IPCB() to locate the CIPSO IP optionPaul Moore
commit 04f81f0154e4bf002be6f4d85668ce1257efa4d9 upstream. Using the IPCB() macro to get the IPv4 options is convenient, but unfortunately NetLabel often needs to examine the CIPSO option outside of the scope of the IP layer in the stack. While historically IPCB() worked above the IP layer, due to the inclusion of the inet_skb_param struct at the head of the {tcp,udp}_skb_cb structs, recent commit 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses") reordered the tcp_skb_cb struct and invalidated this IPCB() trick. This patch fixes the problem by creating a new function, cipso_v4_optptr(), which locates the CIPSO option inside the IP header without calling IPCB(). Unfortunately, this isn't as fast as a simple lookup so some additional tweaks were made to limit the use of this new function. Reported-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Tested-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06SUNRPC: NULL utsname dereference on NFS umount during namespace cleanupTrond Myklebust
commit 03a9a42a1a7e5b3e7919ddfacc1d1cc81882a955 upstream. Fix an Oopsable condition when nsm_mon_unmon is called as part of the namespace cleanup, which now apparently happens after the utsname has been freed. Link: http://lkml.kernel.org/r/20150125220604.090121ae@neptune.home Reported-by: Bruno Prémont <bonbons@linux-vserver.org> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06NFS: struct nfs_commit_info.lock must always point to inode->i_lockTrond Myklebust
commit f4086a3d789dbe18949862276d83b8f49fce6d2f upstream. Commit 411a99adffb4f (nfs: clear_request_commit while holding i_lock) assumes that the nfs_commit_info always points to the inode->i_lock. For historical reasons, that is not the case for O_DIRECT writes. Cc: Weston Andros Adamson <dros@primarydata.com> Fixes: 411a99adffb4f ("nfs: clear_request_commit while holding i_lock") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06fsnotify: fix handling of renames in auditJan Kara
commit 6ee8e25fc3e916193bce4ebb43d5439e1e2144ab upstream. Commit e9fd702a58c4 ("audit: convert audit watches to use fsnotify instead of inotify") broke handling of renames in audit. Audit code wants to update inode number of an inode corresponding to watched name in a directory. When something gets renamed into a directory to a watched name, inotify previously passed moved inode to audit code however new fsnotify code passes directory inode where the change happened. That confuses audit and it starts watching parent directory instead of a file in a directory. This can be observed for example by doing: cd /tmp touch foo bar auditctl -w /tmp/foo touch foo mv bar foo touch foo In audit log we see events like: type=CONFIG_CHANGE msg=audit(1423563584.155:90): auid=1000 ses=2 op="updated rules" path="/tmp/foo" key=(null) list=4 res=1 ... type=PATH msg=audit(1423563584.155:91): item=2 name="bar" inode=1046884 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=DELETE type=PATH msg=audit(1423563584.155:91): item=3 name="foo" inode=1046842 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=DELETE type=PATH msg=audit(1423563584.155:91): item=4 name="foo" inode=1046884 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=CREATE ... and that's it - we see event for the first touch after creating the audit rule, we see events for rename but we don't see any event for the last touch. However we start seeing events for unrelated stuff happening in /tmp. Fix the problem by passing moved inode as data in the FS_MOVED_FROM and FS_MOVED_TO events instead of the directory where the change happens. This doesn't introduce any new problems because noone besides audit_watch.c cares about the passed value: fs/notify/fanotify/fanotify.c cares only about FSNOTIFY_EVENT_PATH events. fs/notify/dnotify/dnotify.c doesn't care about passed 'data' value at all. fs/notify/inotify/inotify_fsnotify.c uses 'data' only for FSNOTIFY_EVENT_PATH. kernel/audit_tree.c doesn't care about passed 'data' at all. kernel/audit_watch.c expects moved inode as 'data'. Fixes: e9fd702a58c49db ("audit: convert audit watches to use fsnotify instead of inotify") Signed-off-by: Jan Kara <jack@suse.cz> Cc: Paul Moore <paul@paul-moore.com> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-08Merge tag 'trace-fixes-v3.19-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull ftrace fixes from Steven Rostedt: "During testing Sedat Dilek hit a "suspicious RCU usage" splat that pointed out a real bug. During suspend and resume the tlb_flush tracepoint is called when the CPU is going offline. As the CPU has been noted as offline, RCU is ignoring that CPU, which means that it can not use RCU protected locks. When tracepoints are activated, they require RCU locking, and if RCU is ignoring a CPU that runs a tracepoint, there is a chance that the tracepoint could cause corruption. The solution was to change the tracepoint into a TRACE_EVENT_CONDITION() which allows us to check a condition to determine if the tracepoint should be called or not. If the condition is not met, the rcu protected code will not be executed. By adding the condition "cpu_online(smp_processor_id())", this will prevent the RCU protected code from being executed if the CPU is marked offline. After adding this, another bug was discovered. As RCU checks rcu callers, if a rcu call is not done, there is no check (obviously). We found that tracepoints could be added in RCU ignored locations and not have lockdep complain until the tracepoint is activated. This missed places where tracepoints were added in places they should not have been. To fix this, code was added in 3.18 that if lockdep is enabled, any tracepoint will still call the rcu checks even if the tracepoint is not enabled. The bug here, is that the check does not take the CONDITION into account. As the condition may prevent tracepoints from being activated in RCU ignored areas (as the one patch does), we get false positives when we enable lockdep and hit a tracepoint that the condition prevents it from being called in a RCU ignored location. The fix for this is to add the CONDITION to the rcu checks, even if the tracepoint is not enabled" * tag 'trace-fixes-v3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: x86/tlb/trace: Do not trace on CPU that is offline tracing: Add condition check to RCU lockdep checks
2015-02-07x86/tlb/trace: Do not trace on CPU that is offlineSteven Rostedt (Red Hat)
When taking a CPU down for suspend and resume, a tracepoint may be called when the CPU has been designated offline. As tracepoints require RCU for protection, they must not be called if the current CPU is offline. Unfortunately, trace_tlb_flush() is called in this scenario as was noted by LOCKDEP: ... Disabling non-boot CPUs ... intel_pstate CPU 1 exiting =============================== smpboot: CPU 1 didn't die... [ INFO: suspicious RCU usage. ] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted ------------------------------- include/trace/events/tlb.h:35 suspicious rcu_dereference_check() usage! other info that might help us debug this: RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 0 no locks held by swapper/1/0. stack backtrace: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.1-iniza-small #1 Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 0000000000000001 ffff88011a44fe18 ffffffff817e370d 0000000000000011 ffff88011a448290 ffff88011a44fe48 ffffffff810d6847 ffff8800c66b9600 0000000000000001 ffff88011a44c000 ffffffff81cb3900 ffff88011a44fe78 Call Trace: [<ffffffff817e370d>] dump_stack+0x4c/0x65 [<ffffffff810d6847>] lockdep_rcu_suspicious+0xe7/0x120 [<ffffffff810b71a5>] idle_task_exit+0x205/0x2c0 [<ffffffff81054c4e>] play_dead_common+0xe/0x50 [<ffffffff81054ca5>] native_play_dead+0x15/0x140 [<ffffffff8102963f>] arch_cpu_idle_dead+0xf/0x20 [<ffffffff810cd89e>] cpu_startup_entry+0x37e/0x580 [<ffffffff81053e20>] start_secondary+0x140/0x150 intel_pstate CPU 2 exiting ... By converting the tlb_flush tracepoint to a TRACE_EVENT_CONDITION where the condition is cpu_online(smp_processor_id()), we can avoid calling RCU protected code when the CPU is offline. Link: http://lkml.kernel.org/r/CA+icZUUGiGDoL5NU8RuxKzFjoLjEKRtUWx=JB8B9a0EQv-eGzQ@mail.gmail.com Cc: stable@vger.kernel.org # 3.17+ Fixes: d17d8f9dedb9 "x86/mm: Add tracepoints for TLB flushes" Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Hansen <dave@sr71.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-02-07tracing: Add condition check to RCU lockdep checksSteven Rostedt (Red Hat)
The trace_tlb_flush() tracepoint can be called when a CPU is going offline. When a CPU is offline, RCU is no longer watching that CPU and since the tracepoint is protected by RCU, it must not be called. To prevent the tlb_flush tracepoint from being called when the CPU is offline, it was converted to a TRACE_EVENT_CONDITION where the condition checks if the CPU is online before calling the tracepoint. Unfortunately, this was not enough to stop lockdep from complaining about it. Even though the RCU protected code of the tracepoint will never be called, the condition is hidden within the tracepoint, and even though the condition prevents RCU code from being called, the lockdep checks are outside the tracepoint (this is to test tracepoints even when they are not enabled). Even though tracepoints should be checked to be RCU safe when they are not enabled, the condition should still be considered when checking RCU. Link: http://lkml.kernel.org/r/CA+icZUUGiGDoL5NU8RuxKzFjoLjEKRtUWx=JB8B9a0EQv-eGzQ@mail.gmail.com Fixes: 3a630178fd5f "tracing: generate RCU warnings even when tracepoints are disabled" Cc: stable@vger.kernel.org # 3.18+ Acked-by: Dave Hansen <dave@sr71.net> Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-02-07Merge tag 'rdma-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull one more infiniband revert from Roland Dreier: "One more last-second RDMA change for 3.19: Yann realized that the previous revert of new userspace ABI did not go far enough, and we're still exposing a change that we don't want. Revert even closer to 3.18 interface to make sure we get things right in the long run" Yann Droneaud pipes up: "I hope this could go in v3.19 as, at this stage, we don't want to expose any bits of this ABI in a released kernel" * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: Revert "IB/core: Add support for extended query device caps"
2015-02-06Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Misc fixes" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/deadline: Fix deadline parameter modification handling sched/wait: Remove might_sleep() from wait_event_cmd() sched: Fix crash if cpuset_cpumask_can_shrink() is passed an empty cpumask sched/fair: Avoid using uninitialized variable in preferred_group_nid()
2015-02-06Merge tag 'sound-3.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Hopefully the final pull request for 3.19: this ended up with a slightly higher volume than wished, but I put them all as they are either stable or 3.19 regression fixes. Most of commits are from ASoC, and have been stewed for a while in linux-next. The only change in the common code is the regression fixes for ASoC AC97 stuff wrt device registrations. The rest are device-specific, mostly small fixes in various ASoC drivers and ak411x on ice1724 boards" * tag 'sound-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ASoC: Intel: fix sst firmware path for cht-bsw-rt5672 ARM: dts: Fix I2S1, I2S2 compatible for exynos4 SoCs ASoC: sgtl5000: add delay before first I2C access MAINTAINERS: ASoC: add maintainer for Intel BDW/HSW ASoC driver ASoC: atmel_ssc_dai: fix the setting for DSP mode ASoC: sgtl5000: Use shift mask when setting codec mode ASoC: tlv320aic3x: Fix data delay configuration ALSA: ak411x: Fix stall in work callback ASoC: Intel: Used lock version to update shim registers ASoC: wm8731: init mutex in i2c init path ASoC: atmel_ssc_dai: fix start event for I2S mode ASoC: rt5640: Add RT5642 ACPI ID for Intel Baytrail ASoC: wm97xx: Reset AC'97 device before registering it ASoC: Add support for allocating AC'97 device before registering it
2015-02-06Revert "IB/core: Add support for extended query device caps"Yann Droneaud
While commit 7e36ef8205ff ("IB/core: Temporarily disable ex_query_device uverb") is correct as it makes the extended QUERY_DEVICE uverb (which came as part of commit 5a77abf9a97a ("IB/core: Add support for extended query device caps") and commit 860f10a799c8 ("IB/core: Add flags for on demand paging support")) not available to userspace, it doesn't address the initial issue regarding ib_copy_to_udata() [1][2]. Additionally, further discussions around this new uverb seems to conclude it would require a different data structure than the one currently described in <rdma/ib_user_verbs.h> [3]. Both of these issues require a revert of the changes, so this patch partially reverts commit 8cdd312cfed7 ("IB/mlx5: Implement the ODP capability query verb") and commit 860f10a799c8 ("IB/core: Add flags for on demand paging support") and fully reverts commit 5a77abf9a97a ("IB/core: Add support for extended query device caps"). [1] "Re: [PATCH v3 06/17] IB/core: Add support for extended query device caps" http://mid.gmane.org/1418733236.2779.26.camel@opteya.com [2] "Re: [PATCH] IB/core: Temporarily disable ex_query_device uverb" http://mid.gmane.org/1423067503.3030.83.camel@opteya.com [3] "RE: [PATCH v1 1/5] IB/uverbs: ex_query_device: answer must not depend on request's comp_mask" http://mid.gmane.org/2807E5FD2F6FDA4886F6618EAC48510E0CC12C30@CRSMSX101.amr.corp.intel.com Cc: Eli Cohen <eli@mellanox.com> Cc: Haggai Eran <haggaie@mellanox.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-02-05Merge tag 'asoc-fix-ac97-v3.19-rc7' of ↵Takashi Iwai
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: AC'97 fixes These are rather too large for this late in the release cycle but they're clear, well understood and have been tested to fix a regression which was introduced for v3.19. The details are all in Lars' changelog and they've been cooking in -next for a while, to a large extent out of conservatism about the size.
2015-02-05Merge tag 'asoc-fix-v3.19-rc7' of ↵Takashi Iwai
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v3.19 A few last minute fixes for v3.19, all driver specific. None of them stand out particularly - it's all the standard people who are affected will care stuff. The Samsung fix is a DT only fix for the audio controller, it's being merged via the ASoC tree due to process messups (the submitter sent it at the end of a tangentally related series rather than separately to the ARM folks) in order to make sure that it gets to people sooner.
2015-02-05MMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Stretch ACKs can kill performance with Reno and CUBIC congestion control, largely due to LRO and GRO. Fix from Neal Cardwell. 2) Fix userland breakage because we accidently emit zero length netlink messages from the bridging code. From Roopa Prabhu. 3) Carry handling in generic csum_tcpudp_nofold is broken, fix from Karl Beldan. 4) Remove bogus dev_set_net() calls from CAIF driver, from Nicolas Dichtel. 5) Make sure PPP deflation never returns a length greater then the output buffer, otherwise we overflow and trigger skb_over_panic(). Fix from Florian Westphal. 6) COSA driver needs VIRT_TO_BUS Kconfig dependencies, from Arnd Bergmann. 7) Don't increase route cached MTU on datagram too big ICMPs. From Li Wei. 8) Fix error path leaks in nf_tables, from Pablo Neira Ayuso. 9) Fix bitmask handling regression in netlink that broke things like acpi userland tools. From Pablo Neira Ayuso. 10) Wrong header pointer passed to param_type2af() in SCTP code, from Saran Maruti Ramanara. 11) Stacked vlans not handled correctly by vlan_get_protocol(), from Toshiaki Makita. 12) Add missing DMA memory barrier to xgene driver, from Iyappan Subramanian. 13) Fix crash in rate estimators, from Eric Dumazet. 14) We've been adding various workarounds, one after another, for the change which added the per-net tcp_sock. It was meant to reduce socket contention but added lots of problems. Reduce this instead to a proper per-cpu socket and that rids us of all the daemons. From Eric Dumazet. 15) Fix memory corruption and OOPS in mlx4 driver, from Jack Morgenstein. 16) When we disabled UFO in the virtio_net device, it introduces some serious performance regressions. The orignal problem was IPV6 fragment ID generation, so fix that properly instead. From Vlad Yasevich. 17) sr9700 driver build breaks on xtensa because it defines macros with the same name as those used by the arch code. Use more unique names. From Chen Gang. 18) Fix endianness in new virio 1.0 mode of the vhost net driver, from Michael S Tsirkin. 19) Several sysctls were setting the maxlen attribute incorrectly, from Sasha Levin. 20) Don't accept an FQ scheduler quantum of zero, that leads to crashes. From Kenneth Klette Jonassen. 21) Fix dumping of non-existing actions in the packet scheduler classifier. From Ignacy Gawędzki. 22) Return the write work_done value when doing TX work in the qlcnic driver. 23) ip6gre_err accesses the info field with the wrong endianness, from Sabrina Dubroca. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (54 commits) sit: fix some __be16/u16 mismatches ipv6: fix sparse errors in ip6_make_flowlabel() net: remove some sparse warnings flow_keys: n_proto type should be __be16 ip6_gre: fix endianness errors in ip6gre_err qlcnic: Fix NAPI poll routine for Tx completion amd-xgbe: Set RSS enablement based on hardware features amd-xgbe: Adjust for zero-based traffic class count cls_api.c: Fix dumping of non-existing actions' stats. pkt_sched: fq: avoid hang when quantum 0 net: rds: use correct size for max unacked packets and bytes vhost/net: fix up num_buffers endian-ness gianfar: correct the bad expression while writing bit-pattern net: usb: sr9700: Use 'SR_' prefix for the common register macros Revert "drivers/net: Disable UFO through virtio" Revert "drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets" ipv6: Select fragment id during UFO segmentation if not set. xen-netback: stop the guest rx thread after a fatal error net/mlx4_core: Fix kernel Oops (mem corruption) when working with more than 80 VFs isdn: off by one in connect_res() ...
2015-02-05ipv6: fix sparse errors in ip6_make_flowlabel()Eric Dumazet
include/net/ipv6.h:713:22: warning: incorrect type in assignment (different base types) include/net/ipv6.h:713:22: expected restricted __be32 [usertype] hash include/net/ipv6.h:713:22: got unsigned int include/net/ipv6.h:719:25: warning: restricted __be32 degrades to integer include/net/ipv6.h:719:22: warning: invalid assignment: ^= include/net/ipv6.h:719:22: left side has type restricted __be32 include/net/ipv6.h:719:22: right side has type unsigned int Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05flow_keys: n_proto type should be __be16Eric Dumazet
(struct flow_keys)->n_proto is in network order, use proper type for this. Fixes following sparse errors : net/core/flow_dissector.c:139:39: warning: incorrect type in assignment (different base types) net/core/flow_dissector.c:139:39: expected unsigned short [unsigned] [usertype] n_proto net/core/flow_dissector.c:139:39: got restricted __be16 [assigned] [usertype] proto net/core/flow_dissector.c:237:23: warning: incorrect type in assignment (different base types) net/core/flow_dissector.c:237:23: expected unsigned short [unsigned] [usertype] n_proto net/core/flow_dissector.c:237:23: got restricted __be16 [assigned] [usertype] proto Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: e0f31d849867 ("flow_keys: Record IP layer protocol in skb_flow_dissect()") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-03ipv6: Select fragment id during UFO segmentation if not set.Vlad Yasevich
If the IPv6 fragment id has not been set and we perform fragmentation due to UFO, select a new fragment id. We now consider a fragment id of 0 as unset and if id selection process returns 0 (after all the pertrubations), we set it to 0x80000000, thus giving us ample space not to create collisions with the next packet we may have to fragment. When doing UFO integrity checking, we also select the fragment id if it has not be set yet. This is stored into the skb_shinfo() thus allowing UFO to function correclty. This patch also removes duplicate fragment id generation code and moves ipv6_select_ident() into the header as it may be used during GSO. Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-03sched/wait: Remove might_sleep() from wait_event_cmd()Mikulas Patocka
The patch e22b886a8a43 ("sched/wait: Add might_sleep() checks") introduced a bug in the raid5 subsystem. The function raid5_quiesce() (and resize_stripes()) uses the 'cmd' part to release and acquire a spinlock (so we call the sleep primitives in atomic context), and therefore we cannot do the might_sleep() check. Remove it. Fixes: e22b886a8a43 ("sched/wait: Add might_sleep() checks") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1502020935580.13510@file01.intranet.prod.int.rdu2.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-02net/mlx4_core: Fix kernel Oops (mem corruption) when working with more than ↵Jack Morgenstein
80 VFs Commit de966c592802 (net/mlx4_core: Support more than 64 VFs) was meant to allow up to 126 VFs. However, due to leaving MLX4_MFUNC_MAX too low, using more than 80 VFs resulted in memory corruptions (and Oopses) when more than 80 VFs were requested. In addition, the number of slaves was left too high. This commit fixes these issues. Fixes: de966c592802 ("net/mlx4_core: Support more than 64 VFs") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree, they are: 1) Validate hooks for nf_tables NAT expressions, otherwise users can crash the kernel when using them from the wrong hook. We already got one user trapped on this when configuring masquerading. 2) Fix a BUG splat in nf_tables with CONFIG_DEBUG_PREEMPT=y. Reported by Andreas Schultz. 3) Avoid unnecessary reroute of traffic in the local input path in IPVS that triggers a crash in in xfrm. Reported by Florian Wiessner and fixes by Julian Anastasov. 4) Fix memory and module refcount leak from the error path of nf_tables_newchain(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-01ipv4: tcp: get rid of ugly unicast_sockEric Dumazet
In commit be9f4a44e7d41 ("ipv4: tcp: remove per net tcp_sock") I tried to address contention on a socket lock, but the solution I chose was horrible : commit 3a7c384ffd57e ("ipv4: tcp: unicast_sock should not land outside of TCP stack") addressed a selinux regression. commit 0980e56e506b ("ipv4: tcp: set unicast_sock uc_ttl to -1") took care of another regression. commit b5ec8eeac46 ("ipv4: fix ip_send_skb()") fixed another regression. commit 811230cd85 ("tcp: ipv4: initialize unicast_sock sk_pacing_rate") was another shot in the dark. Really, just use a proper socket per cpu, and remove the skb_orphan() call, to re-enable flow control. This solves a serious problem with FQ packet scheduler when used in hostile environments, as we do not want to allocate a flow structure for every RST packet sent in response to a spoofed packet. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-01sched: don't cause task state changes in nested sleep debuggingLinus Torvalds
Commit 8eb23b9f35aa ("sched: Debug nested sleeps") added code to report on nested sleep conditions, which we generally want to avoid because the inner sleeping operation can re-set the thread state to TASK_RUNNING, but that will then cause the outer sleep loop not actually sleep when it calls schedule. However, that's actually valid traditional behavior, with the inner sleep being some fairly rare case (like taking a sleeping lock that normally doesn't actually need to sleep). And the debug code would actually change the state of the task to TASK_RUNNING internally, which makes that kind of traditional and working code not work at all, because now the nested sleep doesn't just sometimes cause the outer one to not block, but will cause it to happen every time. In particular, it will cause the cardbus kernel daemon (pccardd) to basically busy-loop doing scheduling, converting a laptop into a heater, as reported by Bruno Prémont. But there may be other legacy uses of that nested sleep model in other drivers that are also likely to never get converted to the new model. This fixes both cases: - don't set TASK_RUNNING when the nested condition happens (note: even if WARN_ONCE() only _warns_ once, the return value isn't whether the warning happened, but whether the condition for the warning was true. So despite the warning only happening once, the "if (WARN_ON(..))" would trigger for every nested sleep. - in the cases where we knowingly disable the warning by using "sched_annotate_sleep()", don't change the task state (that is used for all core scheduling decisions), instead use '->task_state_change' that is used for the debugging decision itself. (Credit for the second part of the fix goes to Oleg Nesterov: "Can't we avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds?" with the suggested change to use 'task_state_change' as part of the test) Reported-and-bisected-by: Bruno Prémont <bonbons@linux-vserver.org> Tested-by: Rafael J Wysocki <rjw@rjwysocki.net> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de>, Cc: Ilya Dryomov <ilya.dryomov@inktank.com>, Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Hurley <peter@hurleysoftware.com>, Cc: Davidlohr Bueso <dave@stgolabs.net>, Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-31net: sched: fix panic in rate estimatorsEric Dumazet
Doing the following commands on a non idle network device panics the box instantly, because cpu_bstats gets overwritten by stats. tc qdisc add dev eth0 root <your_favorite_qdisc> ... some traffic (one packet is enough) ... tc qdisc replace dev eth0 root est 1sec 4sec <your_favorite_qdisc> [ 325.355596] BUG: unable to handle kernel paging request at ffff8841dc5a074c [ 325.362609] IP: [<ffffffff81541c9e>] __gnet_stats_copy_basic+0x3e/0x90 [ 325.369158] PGD 1fa7067 PUD 0 [ 325.372254] Oops: 0000 [#1] SMP [ 325.375514] Modules linked in: ... [ 325.398346] CPU: 13 PID: 14313 Comm: tc Not tainted 3.19.0-smp-DEV #1163 [ 325.412042] task: ffff8800793ab5d0 ti: ffff881ff2fa4000 task.ti: ffff881ff2fa4000 [ 325.419518] RIP: 0010:[<ffffffff81541c9e>] [<ffffffff81541c9e>] __gnet_stats_copy_basic+0x3e/0x90 [ 325.428506] RSP: 0018:ffff881ff2fa7928 EFLAGS: 00010286 [ 325.433824] RAX: 000000000000000c RBX: ffff881ff2fa796c RCX: 000000000000000c [ 325.440988] RDX: ffff8841dc5a0744 RSI: 0000000000000060 RDI: 0000000000000060 [ 325.448120] RBP: ffff881ff2fa7948 R08: ffffffff81cd4f80 R09: 0000000000000000 [ 325.455268] R10: ffff883ff223e400 R11: 0000000000000000 R12: 000000015cba0744 [ 325.462405] R13: ffffffff81cd4f80 R14: ffff883ff223e460 R15: ffff883feea0722c [ 325.469536] FS: 00007f2ee30fa700(0000) GS:ffff88407fa20000(0000) knlGS:0000000000000000 [ 325.477630] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 325.483380] CR2: ffff8841dc5a074c CR3: 0000003feeae9000 CR4: 00000000001407e0 [ 325.490510] Stack: [ 325.492524] ffff883feea0722c ffff883fef719dc0 ffff883feea0722c ffff883ff223e4a0 [ 325.499990] ffff881ff2fa79a8 ffffffff815424ee ffff883ff223e49c 000000015cba0744 [ 325.507460] 00000000f2fa7978 0000000000000000 ffff881ff2fa79a8 ffff883ff223e4a0 [ 325.514956] Call Trace: [ 325.517412] [<ffffffff815424ee>] gen_new_estimator+0x8e/0x230 [ 325.523250] [<ffffffff815427aa>] gen_replace_estimator+0x4a/0x60 [ 325.529349] [<ffffffff815718ab>] tc_modify_qdisc+0x52b/0x590 [ 325.535117] [<ffffffff8155edd0>] rtnetlink_rcv_msg+0xa0/0x240 [ 325.540963] [<ffffffff8155ed30>] ? __rtnl_unlock+0x20/0x20 [ 325.546532] [<ffffffff8157f811>] netlink_rcv_skb+0xb1/0xc0 [ 325.552145] [<ffffffff8155b355>] rtnetlink_rcv+0x25/0x40 [ 325.557558] [<ffffffff8157f0d8>] netlink_unicast+0x168/0x220 [ 325.563317] [<ffffffff8157f47c>] netlink_sendmsg+0x2ec/0x3e0 Lets play safe and not use an union : percpu 'pointers' are mostly read anyway, and we have typically few qdiscs per host. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: John Fastabend <john.fastabend@gmail.com> Fixes: 22e0f8b9322c ("net: sched: make bstats per cpu and estimator RCU safe") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-31Merge branch 'i2c/for-current' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "i2c driver bugfixes (s3c2410, slave-eeprom, sh_mobile), size regression "bugfix" (i2c slave), documentation bugfix (st). Also, one documentation update (da9063), so some devicetrees can now be verified" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: sh_mobile: terminate DMA reads properly i2c: Only include slave support if selected i2c: s3c2410: fix ABBA deadlock by keeping clock prepared i2c: slave-eeprom: fix boundary check when using sysfs i2c: st: Rename clock reference to something that exists DT: i2c: Add devices handled by the da9063 MFD driver
2015-01-30net: Fix vlan_get_protocol for stacked vlanToshiaki Makita
vlan_get_protocol() could not get network protocol if a skb has a 802.1ad vlan tag or multiple vlans, which caused incorrect checksum calculation in several drivers. Fix vlan_get_protocol() to retrieve network protocol instead of incorrect vlan protocol. As the logic is the same as skb_network_protocol(), create a common helper function __vlan_get_protocol() and call it from existing functions. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-30Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Mostly tooling fixes, but also an event groups fix, two PMU driver fixes and a CPU model variant addition" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Tighten (and fix) the grouping condition perf/x86/intel: Add model number for Airmont perf/rapl: Fix crash in rapl_scale() perf/x86/intel/uncore: Move uncore_box_init() out of driver initialization perf probe: Fix probing kretprobes perf symbols: Introduce 'for' method to iterate over the symbols with a given name perf probe: Do not rely on map__load() filter to find symbols perf symbols: Introduce method to iterate symbols ordered by name perf symbols: Return the first entry with a given name in find_by_name method perf annotate: Fix memory leaks in LOCK handling perf annotate: Handle ins parsing failures perf scripting perl: Force to use stdbool perf evlist: Remove extraneous 'was' on error message
2015-01-30Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota and UDF fix from Jan Kara: "A fix for UDF to properly free preallocated blocks and a fix for quota so that Q_GETQUOTA quotactl reports correct numbers for XFS filesystem (and similarly Q_XGETQUOTA quotactl works properly for other filesystems)" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space units udf: Release preallocation on last writeable close
2015-01-29vm: add VM_FAULT_SIGSEGV handling supportLinus Torvalds
The core VM already knows about VM_FAULT_SIGBUS, but cannot return a "you should SIGSEGV" error, because the SIGSEGV case was generally handled by the caller - usually the architecture fault handler. That results in lots of duplication - all the architecture fault handlers end up doing very similar "look up vma, check permissions, do retries etc" - but it generally works. However, there are cases where the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV. In particular, when accessing the stack guard page, libsigsegv expects a SIGSEGV. And it usually got one, because the stack growth is handled by that duplicated architecture fault handler. However, when the generic VM layer started propagating the error return from the stack expansion in commit fee7e49d4514 ("mm: propagate error from stack expansion even for guard page"), that now exposed the existing VM_FAULT_SIGBUS result to user space. And user space really expected SIGSEGV, not SIGBUS. To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those duplicate architecture fault handlers about it. They all already have the code to handle SIGSEGV, so it's about just tying that new return value to the existing code, but it's all a bit annoying. This is the mindless minimal patch to do this. A more extensive patch would be to try to gather up the mostly shared fault handling logic into one generic helper routine, and long-term we really should do that cleanup. Just from this patch, you can generally see that most architectures just copied (directly or indirectly) the old x86 way of doing things, but in the meantime that original x86 model has been improved to hold the VM semaphore for shorter times etc and to handle VM_FAULT_RETRY and other "newer" things, so it would be a good idea to bring all those improvements to the generic case and teach other architectures about them too. Reported-and-tested-by: Takashi Iwai <tiwai@suse.de> Tested-by: Jan Engelhardt <jengelh@inai.de> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots" Cc: linux-arch@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-28tcp: stretch ACK fixes prepNeal Cardwell
LRO, GRO, delayed ACKs, and middleboxes can cause "stretch ACKs" that cover more than the RFC-specified maximum of 2 packets. These stretch ACKs can cause serious performance shortfalls in common congestion control algorithms that were designed and tuned years ago with receiver hosts that were not using LRO or GRO, and were instead politely ACKing every other packet. This patch series fixes Reno and CUBIC to handle stretch ACKs. This patch prepares for the upcoming stretch ACK bug fix patches. It adds an "acked" parameter to tcp_cong_avoid_ai() to allow for future fixes to tcp_cong_avoid_ai() to correctly handle stretch ACKs, and changes all congestion control algorithms to pass in 1 for the ACKed count. It also changes tcp_slow_start() to return the number of packet ACK "credits" that were not processed in slow start mode, and can be processed by the congestion control module in additive increase mode. In future patches we will fix tcp_cong_avoid_ai() to handle stretch ACKs, and fix Reno and CUBIC handling of stretch ACKs in slow start and additive increase mode. Reported-by: Eyal Perry <eyalpe@mellanox.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-28ALSA: ak411x: Fix stall in work callbackTakashi Iwai
When ak4114 work calls its callback and the callback invokes ak4114_reinit(), it stalls due to flush_delayed_work(). For avoiding this, control the reentrance by introducing a refcount. Also flush_delayed_work() is replaced with cancel_delayed_work_sync(). The exactly same bug is present in ak4113.c and fixed as well. Reported-by: Pavel Hofman <pavel.hofman@ivitera.com> Acked-by: Jaroslav Kysela <perex@perex.cz> Tested-by: Pavel Hofman <pavel.hofman@ivitera.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2015-01-28perf: Tighten (and fix) the grouping conditionPeter Zijlstra
The fix from 9fc81d87420d ("perf: Fix events installation during moving group") was incomplete in that it failed to recognise that creating a group with events for different CPUs is semantically broken -- they cannot be co-scheduled. Furthermore, it leads to real breakage where, when we create an event for CPU Y and then migrate it to form a group on CPU X, the code gets confused where the counter is programmed -- triggered in practice as well by me via the perf fuzzer. Fix this by tightening the rules for creating groups. Only allow grouping of counters that can be co-scheduled in the same context. This means for the same task and/or the same cpu. Fixes: 9fc81d87420d ("perf: Fix events installation during moving group") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150123125834.090683288@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-28quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space unitsJan Kara
Currently ->get_dqblk() and ->set_dqblk() use struct fs_disk_quota which tracks space limits and usage in 512-byte blocks. However VFS quotas track usage in bytes (as some filesystems require that) and we need to somehow pass this information. Upto now it wasn't a problem because we didn't do any unit conversion (thus VFS quota routines happily stuck number of bytes into d_bcount field of struct fd_disk_quota). Only if you tried to use Q_XGETQUOTA or Q_XSETQLIM for VFS quotas (or Q_GETQUOTA / Q_SETQUOTA for XFS quotas), you got bogus results. Hardly anyone tried this but reportedly some Samba users hit the problem in practice. So when we want interfaces compatible we need to fix this. We bite the bullet and define another quota structure used for passing information from/to ->get_dqblk()/->set_dqblk. It's somewhat sad we have to have more conversion routines in fs/quota/quota.c and another copying of quota structure slows down getting of quota information by about 2% but it seems cleaner than overloading e.g. units of d_bcount to bytes. CC: stable@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2015-01-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Don't OOPS on socket AIO, from Christoph Hellwig. 2) Scheduled scans should be aborted upon RFKILL, from Emmanuel Grumbach. 3) Fix sleep in atomic context in kvaser_usb, from Ahmed S Darwish. 4) Fix RCU locking across copy_to_user() in bpf code, from Alexei Starovoitov. 5) Lots of crash, memory leak, short TX packet et al bug fixes in sh_eth from Ben Hutchings. 6) Fix memory corruption in SCTP wrt. INIT collitions, from Daniel Borkmann. 7) Fix return value logic for poll handlers in netxen, enic, and bnx2x. From Eric Dumazet and Govindarajulu Varadarajan. 8) Header length calculation fix in mac80211 from Fred Chou. 9) mv643xx_eth doesn't handle highmem correctly in non-TSO code paths. From Ezequiel Garcia. 10) udp_diag has bogus logic in it's hash chain skipping, copy same fix tcp diag used. From Herbert Xu. 11) amd-xgbe programs wrong rx flow control register, from Thomas Lendacky. 12) Fix race leading to use after free in ping receive path, from Subash Abhinov Kasiviswanathan. 13) Cache redirect routes otherwise we can get a heavy backlog of rcu jobs liberating DST_NOCACHE entries. From Hannes Frederic Sowa. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (48 commits) net: don't OOPS on socket aio stmmac: prevent probe drivers to crash kernel bnx2x: fix napi poll return value for repoll ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos too sh_eth: Fix DMA-API usage for RX buffers sh_eth: Check for DMA mapping errors on transmit sh_eth: Ensure DMA engines are stopped before freeing buffers sh_eth: Remove RX overflow log messages ping: Fix race in free in receive path udp_diag: Fix socket skipping within chain can: kvaser_usb: Fix state handling upon BUS_ERROR events can: kvaser_usb: Retry the first bulk transfer on -ETIMEDOUT can: kvaser_usb: Send correct context to URB completion can: kvaser_usb: Do not sleep in atomic context ipv4: try to cache dst_entries which would cause a redirect samples: bpf: relax test_maps check bpf: rcu lock must not be held when calling copy_to_user() net: sctp: fix slab corruption from use after free on INIT collisions net: mv643xx_eth: Fix highmem support in non-TSO egress path sh_eth: Fix serialisation of interrupt disable with interrupt & NAPI handlers ...
2015-01-26ipv4: try to cache dst_entries which would cause a redirectHannes Frederic Sowa
Not caching dst_entries which cause redirects could be exploited by hosts on the same subnet, causing a severe DoS attack. This effect aggravated since commit f88649721268999 ("ipv4: fix dst race in sk_dst_get()"). Lookups causing redirects will be allocated with DST_NOCACHE set which will force dst_release to free them via RCU. Unfortunately waiting for RCU grace period just takes too long, we can end up with >1M dst_entries waiting to be released and the system will run OOM. rcuos threads cannot catch up under high softirq load. Attaching the flag to emit a redirect later on to the specific skb allows us to cache those dst_entries thus reducing the pressure on allocation and deallocation. This issue was discovered by Marcelo Leitner. Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: Marcelo Leitner <mleitner@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26Merge branch 'akpm' (patches from Andrew Morton)Linus Torvalds
Merge misc fixes from Andrew Morton: "Six fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: drivers/rtc/rtc-s5m.c: terminate s5m_rtc_id array with empty element printk: add dummy routine for when CONFIG_PRINTK=n mm/vmscan: fix highidx argument type memcg: remove extra newlines from memcg oom kill log x86, build: replace Perl script with Shell script mm: page_alloc: embed OOM killing naturally into allocation slowpath
2015-01-26Merge tag 'regulator-v3.19-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "One correctness fix here for the s2mps11 driver which would have resulted in some of the regulators being completely broken together with a fix for locking in regualtor_put() (which is fortunately rarely called at all in practical systems)" * tag 'regulator-v3.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: s2mps11: Fix wrong calculation of register offset regulator: core: fix race condition in regulator_put()
2015-01-26printk: add dummy routine for when CONFIG_PRINTK=nPranith Kumar
There are missing dummy routines for log_buf_addr_get() and log_buf_len_get() for when CONFIG_PRINTK is not set causing build failures. This patch adds these dummy routines at the appropriate location. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Petr Mladek <pmladek@suse.cz> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-26mm: page_alloc: embed OOM killing naturally into allocation slowpathJohannes Weiner
The OOM killing invocation does a lot of duplicative checks against the task's allocation context. Rework it to take advantage of the existing checks in the allocator slowpath. The OOM killer is invoked when the allocator is unable to reclaim any pages but the allocation has to keep looping. Instead of having a check for __GFP_NORETRY hidden in oom_gfp_allowed(), just move the OOM invocation to the true branch of should_alloc_retry(). The __GFP_FS check from oom_gfp_allowed() can then be moved into the OOM avoidance branch in __alloc_pages_may_oom(), along with the PF_DUMPCORE test. __alloc_pages_may_oom() can then signal to the caller whether the OOM killer was invoked, instead of requiring it to duplicate the order and high_zoneidx checks to guess this when deciding whether to continue. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-26i2c: Only include slave support if selectedJean Delvare
Make the slave support depend on CONFIG_I2C_SLAVE. Otherwise it gets included unconditionally, even when it is not needed. I2C bus drivers which implement slave support must select I2C_SLAVE. Signed-off-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2015-01-26ASoC: Add support for allocating AC'97 device before registering itLars-Peter Clausen
In some cases it is necessary to before additional operations after the device has been initialized and before the device is registered. This can for example be resetting the device. This patch introduces a new function snd_soc_alloc_ac97_codec() which is similar to snd_soc_new_ac97_codec() except that it does not register the device. Any users of snd_soc_alloc_ac97_codec() are responsible for calling device_add() manually. Fixes: 6794f709b712 ("ASoC: ac97: Drop delayed device registration") Reported-by: Manuel Lauss <manuel.lauss@gmail.com> Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Tested-by: Manuel Lauss <manuel.lauss@gmail.com> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org
2015-01-25Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "A set of small fixes: - regression fix for exynos_mct clocksource - trivial build fix for kona clocksource - functional one liner fix for the sh_tmu clocksource - two validation fixes to prevent (root only) data corruption in the kernel via settimeofday and adjtimex. Tagged for stable" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: time: adjtimex: Validate the ADJ_FREQUENCY values time: settimeofday: Validate the values of tv from user clocksource: sh_tmu: Set cpu_possible_mask to fix SMP broadcast clocksource: kona: fix __iomem annotation clocksource: exynos_mct: Fix bitmask regression for exynos4_mct_write
2015-01-24Merge tag 'pci-v3.19-fixes-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: "These are fixes for: - a resource management problem that causes a Radeon "Fatal error during GPU init" on machines where the BIOS programmed an invalid Root Port window. This was a regression in v3.16. - an Atheros AR93xx device that doesn't handle PCI bus resets correctly. This was a regression in v3.14. - an out-of-date email address" * tag 'pci-v3.19-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: MAINTAINERS: Update Richard Zhu's email address sparc/PCI: Clip bridge windows to fit in upstream windows powerpc/PCI: Clip bridge windows to fit in upstream windows parisc/PCI: Clip bridge windows to fit in upstream windows mn10300/PCI: Clip bridge windows to fit in upstream windows microblaze/PCI: Clip bridge windows to fit in upstream windows ia64/PCI: Clip bridge windows to fit in upstream windows frv/PCI: Clip bridge windows to fit in upstream windows alpha/PCI: Clip bridge windows to fit in upstream windows x86/PCI: Clip bridge windows to fit in upstream windows PCI: Add pci_claim_bridge_resource() to clip window if necessary PCI: Add pci_bus_clip_resource() to clip to fit upstream window PCI: Pass bridge device, not bus, when updating bridge windows PCI: Mark Atheros AR93xx to avoid bus reset PCI: Add flag for devices where we can't use bus reset