summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)Author
2015-06-28kernel: make READ_ONCE() valid on const argumentsLinus Torvalds
[ Upstream commit dd36929720f40f17685e841ae0d4c581c165ea60 ] The use of READ_ONCE() causes lots of warnings witht he pending paravirt spinlock fixes, because those ends up having passing a member to a 'const' structure to READ_ONCE(). There should certainly be nothing wrong with using READ_ONCE() with a const source, but the helper function __read_once_size() would cause warnings because it would drop the 'const' qualifier, but also because the destination would be marked 'const' too due to the use of 'typeof'. Use a union of types in READ_ONCE() to avoid this issue. Also make sure to use parenthesis around the macro arguments to avoid possible operator precedence issues. Tested-by: Ingo Molnar <mingo@kernel.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-28[media] Add and use IS_REACHABLE macroArnd Bergmann
[ Upstream commit 9b174527e7b756cda9f5d9e541f87b7fec9cfdf0 ] In the media drivers, the v4l2 core knows about all submodules and calls into them from a common function. However this cannot work if the modules that get called are loadable and the core is built-in. In that case we get drivers/built-in.o: In function `set_type': drivers/media/v4l2-core/tuner-core.c:301: undefined reference to `tea5767_attach' drivers/media/v4l2-core/tuner-core.c:307: undefined reference to `tea5761_attach' drivers/media/v4l2-core/tuner-core.c:349: undefined reference to `tda9887_attach' drivers/media/v4l2-core/tuner-core.c:405: undefined reference to `xc4000_attach' This was working previously, until the IS_ENABLED() macro was used to replace the construct like #if defined(CONFIG_DVB_CX24110) || (defined(CONFIG_DVB_CX24110_MODULE) && defined(MODULE)) with the difference that the new code no longer checks whether it is being built as a loadable module itself. To fix this, this new patch adds an 'IS_REACHABLE' macro, which evaluates true in exactly the condition that was used previously. The downside of this is that this trades an obvious link error for a more subtle runtime failure, but it is clear that the change that introduced the link error was unintentional and it seems better to revert it for now. Also, a similar change was originally created by Trent Piepho and then reverted by teh change to the IS_ENABLED macro. Ideally Kconfig would be used to avoid the case of a broken dependency, or the code restructured in a way to turn around the dependency, but either way would require much larger changes here. Fixes: 7b34be71db53 ("[media] use IS_ENABLED() macro") See-also: c5dec9fb248e ("V4L/DVB (4751): Fix DBV_FE_CUSTOMISE for card drivers compiled into kernel") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-28jhash: Update jhash_[321]words functions to use correct initvalAlexander Duyck
[ Upstream commit 2e7056c433216f406b90a003aa0ba42e19d3bdcf ] Looking over the implementation for jhash2 and comparing it to jhash_3words I realized that the two hashes were in fact very different. Doing a bit of digging led me to "The new jhash implementation" in which lookup2 was supposed to have been replaced with lookup3. In reviewing the patch I noticed that jhash2 had originally initialized a and b to JHASH_GOLDENRATIO and c to initval, but after the patch a, b, and c were initialized to initval + (length << 2) + JHASH_INITVAL. However the changes in jhash_3words simply replaced the initialization of a and b with JHASH_INITVAL. This change corrects what I believe was an oversight so that a, b, and c in jhash_3words all have the same value added consisting of initval + (length << 2) + JHASH_INITVAL so that jhash2 and jhash_3words will now produce the same hash result given the same inputs. Fixes: 60d509c823cca ("The new jhash implementation") Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-28net: add sk_fullsock() helperEric Dumazet
[ Upstream commit 1d0ab253872cdd3d8e7913f59c266c7fd01771d0 ] We have many places where we want to check if a socket is not a timewait or request socket. Use a helper to avoid hard coding this. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-28inet: add TCP_NEW_SYN_RECV stateEric Dumazet
[ Upstream commit 10feb428a5045d5eb18a5d755fbb8f0cc9645626 ] TCP_SYN_RECV state is currently used by fast open sockets. Initial TCP requests (the pseudo sockets created when a SYN is received) are not yet associated to a state. They are attached to their parent, and the parent is in TCP_LISTEN state. This commit adds TCP_NEW_SYN_RECV state, so that we can convert TCP stack to a different schem gradually. This state is not exported to user space. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-28Driver core: Unified device properties interface for platform firmwareRafael J. Wysocki
[ Upstream commit b31384fa5de37a100507751dfb5c0a49d06cee67 ] Add a uniform interface by which device drivers can request device properties from the platform firmware by providing a property name and the corresponding data type. The purpose of it is to help to write portable code that won't depend on any particular platform firmware interface. The following general helper functions are added: device_property_present() device_property_read_u8() device_property_read_u16() device_property_read_u32() device_property_read_u64() device_property_read_string() device_property_read_u8_array() device_property_read_u16_array() device_property_read_u32_array() device_property_read_u64_array() device_property_read_string_array() The first one allows the caller to check if the given property is present. The next 5 of them allow single-valued properties of various types to be retrieved in a uniform way. The remaining 5 are for reading properties with multiple values (arrays of either numbers or strings). The interface covers both ACPI and Device Trees. This change set includes material from Mika Westerberg and Aaron Lu. Signed-off-by: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Grant Likely <grant.likely@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-28ACPI: Add support for device specific propertiesMika Westerberg
[ Upstream commit ffdcd955c3078af3ce117edcfce80fde1a512bed ] Device Tree is used in many embedded systems to describe the system configuration to the OS. It supports attaching properties or name-value pairs to the devices it describe. With these properties one can pass additional information to the drivers that would not be available otherwise. ACPI is another configuration mechanism (among other things) typically seen, but not limited to, x86 machines. ACPI allows passing arbitrary data from methods but there has not been mechanism equivalent to Device Tree until the introduction of _DSD in the recent publication of the ACPI 5.1 specification. In order to facilitate ACPI usage in systems where Device Tree is typically used, it would be beneficial to standardize a way to retrieve Device Tree style properties from ACPI devices, which is what we do in this patch. If a given device described in ACPI namespace wants to export properties it must implement _DSD method (Device Specific Data, introduced with ACPI 5.1) that returns the properties in a package of packages. For example: Name (_DSD, Package () { ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"), Package () { Package () {"name1", <VALUE1>}, Package () {"name2", <VALUE2>}, ... } }) The UUID reserved for properties is daffd814-6eba-4d8c-8a91-bc9bbf4aa301 and is documented in the ACPI 5.1 companion document called "_DSD Implementation Guide" [1], [2]. We add several helper functions that can be used to extract these properties and convert them to different Linux data types. The ultimate goal is that we only have one device property API that retrieves the requested properties from Device Tree or from ACPI transparent to the caller. [1] http://www.uefi.org/sites/default/files/resources/_DSD-implementation-guide-toplevel.htm [2] http://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Reviewed-by: Grant Likely <grant.likely@linaro.org> Signed-off-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-15tcp: fix child sockets to use system default congestion control if not setNeal Cardwell
[ Upstream commit 9f950415e4e28e7cfae2e416b43e862e8101d996 ] Linux 3.17 and earlier are explicitly engineered so that if the app doesn't specifically request a CC module on a listener before the SYN arrives, then the child gets the system default CC when the connection is established. See tcp_init_congestion_control() in 3.17 or earlier, which says "if no choice made yet assign the current value set as default". The change ("net: tcp: assign tcp cong_ops when tcp sk is created") altered these semantics, so that children got their parent listener's congestion control even if the system default had changed after the listener was created. This commit returns to those original semantics from 3.17 and earlier, since they are the original semantics from 2007 in 4d4d3d1e8 ("[TCP]: Congestion control initialization."), and some Linux congestion control workflows depend on that. In summary, if a listener socket specifically sets TCP_CONGESTION to "x", or the route locks the CC module to "x", then the child gets "x". Otherwise the child gets current system default from net.ipv4.tcp_congestion_control. That's the behavior in 3.17 and earlier, and this commit restores that. Fixes: 55d8694fa82c ("net: tcp: assign tcp cong_ops when tcp sk is created") Cc: Florian Westphal <fw@strlen.de> Cc: Daniel Borkmann <dborkman@redhat.com> Cc: Glenn Judd <glenn.judd@morganstanley.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-15sctp: Fix mangled IPv4 addresses on a IPv6 listening socketJason Gunthorpe
[ Upstream commit 9302d7bb0c5cd46be5706859301f18c137b2439f ] sctp_v4_map_v6 was subtly writing and reading from members of a union in a way the clobbered data it needed to read before it read it. Zeroing the v6 flowinfo overwrites the v4 sin_addr with 0, meaning that every place that calls sctp_v4_map_v6 gets ::ffff:0.0.0.0 as the result. Reorder things to guarantee correct behaviour no matter what the union layout is. This impacts user space clients that open an IPv6 SCTP socket and receive IPv4 connections. Prior to 299ee user space would see a sockaddr with AF_INET and a correct address, after 299ee the sockaddr is AF_INET6, but the address is wrong. Fixes: 299ee123e198 (sctp: Fixup v4mapped behaviour to comply with Sock API) Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-10xfrm: release dst_orig in case of error in xfrm_lookup()huaibin Wang
[ Upstream commit ac37e2515c1a89c477459a2020b6bfdedabdb91b ] dst_orig should be released on error. Function like __xfrm_route_forward() expects that behavior. Since a recent commit, xfrm_lookup() may also be called by xfrm_lookup_route(), which expects the opposite. Let's introduce a new flag (XFRM_LOOKUP_KEEP_DST_REF) to tell what should be done in case of error. Fixes: f92ee61982d("xfrm: Generate blackhole routes only from route lookup functions") Signed-off-by: huaibin Wang <huaibin.wang@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-10drm/radeon: add new bonaire pci idAlex Deucher
[ Upstream commit fcf3b54282e4c5a95a1f45f67558bc105acdbc6a ] Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-10sched: Handle priority boosted tasks proper in setscheduler()Thomas Gleixner
[ Upstream commit 0782e63bc6fe7e2d3408d250df11d388b7799c6b ] Ronny reported that the following scenario is not handled correctly: T1 (prio = 10) lock(rtmutex); T2 (prio = 20) lock(rtmutex) boost T1 T1 (prio = 20) sys_set_scheduler(prio = 30) T1 prio = 30 .... sys_set_scheduler(prio = 10) T1 prio = 30 The last step is wrong as T1 should now be back at prio 20. Commit c365c292d059 ("sched: Consider pi boosting in setscheduler()") only handles the case where a boosted tasks tries to lower its priority. Fix it by taking the new effective priority into account for the decision whether a change of the priority is required. Reported-by: Ronny Meeus <ronny.meeus@gmail.com> Tested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: <stable@vger.kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Fixes: c365c292d059 ("sched: Consider pi boosting in setscheduler()") Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1505051806060.4225@nanos Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-10libata: Ignore spurious PHY event on LPM policy changeGabriele Mazzotta
[ Upstream commit 09c5b4803a80a5451d950d6a539d2eb311dc0fb1 ] When the LPM policy is set to ATA_LPM_MAX_POWER, the device might generate a spurious PHY event that cuases errors on the link. Ignore this event if it occured within 10s after the policy change. The timeout was chosen observing that on a Dell XPS13 9333 these spurious events can occur up to roughly 6s after the policy change. Link: http://lkml.kernel.org/g/3352987.ugV1Ipy7Z5@xps13 Signed-off-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-10libata: Add helper to determine when PHY events should be ignoredGabriele Mazzotta
[ Upstream commit 8393b811f38acdf7fd8da2028708edad3e68ce1f ] This is a preparation commit that will allow to add other criteria according to which PHY events should be dropped. Signed-off-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-09xen/events: don't bind non-percpu VIRQs with percpu chipDavid Vrabel
[ Upstream commit 77bb3dfdc0d554befad58fdefbc41be5bc3ed38a ] A non-percpu VIRQ (e.g., VIRQ_CONSOLE) may be freed on a different VCPU than it is bound to. This can result in a race between handle_percpu_irq() and removing the action in __free_irq() because handle_percpu_irq() does not take desc->lock. The interrupt handler sees a NULL action and oopses. Only use the percpu chip/handler for per-CPU VIRQs (like VIRQ_TIMER). # cat /proc/interrupts | grep virq 40: 87246 0 xen-percpu-virq timer0 44: 0 0 xen-percpu-virq debug0 47: 0 20995 xen-percpu-virq timer1 51: 0 0 xen-percpu-virq debug1 69: 0 0 xen-dyn-virq xen-pcpu 74: 0 0 xen-dyn-virq mce 75: 29 0 xen-dyn-virq hvc_console Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: <stable@vger.kernel.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-09ktime: Fix ktime_divns to do signed divisionJohn Stultz
[ Upstream commit f7bcb70ebae0dcdb5a2d859b09e4465784d99029 ] It was noted that the 32bit implementation of ktime_divns() was doing unsigned division and didn't properly handle negative values. And when a ktime helper was changed to utilize ktime_divns, it caused a regression on some IR blasters. See the following bugzilla for details: https://bugzilla.redhat.com/show_bug.cgi?id=1200353 This patch fixes the problem in ktime_divns by checking and preserving the sign bit, and then reapplying it if appropriate after the division, it also changes the return type to a s64 to make it more obvious this is expected. Nicolas also pointed out that negative dividers would cause infinite loops on 32bit systems, negative dividers is unlikely for users of this function, but out of caution this patch adds checks for negative dividers for both 32-bit (BUG_ON) and 64-bit(WARN_ON) versions to make sure no such use cases creep in. [ tglx: Hand an u64 to do_div() to avoid the compiler warning ] Fixes: 166afb64511e 'ktime: Sanitize ktime_to_us/ms conversion' Reported-and-tested-by: Trevor Cordes <trevor@tecnopolis.ca> Signed-off-by: John Stultz <john.stultz@linaro.org> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Boyer <jwboyer@redhat.com> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1431118043-23452-1-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-09ktime: Optimize ktime_divns for constant divisorsNicolas Pitre
[ Upstream commit 8b618628b2bf83512fc8df5e8672619d65adfdfb ] At least on ARM, do_div() is optimized to turn constant divisors into an inline multiplication by the reciprocal value at compile time. However this optimization is missed entirely whenever ktime_divns() is used and the slow out-of-line division code is used all the time. Let ktime_divns() use do_div() inline whenever the divisor is constant and small enough. This will make things like ktime_to_us() and ktime_to_ms() much faster. Cc: Arnd Bergmann <arnd.bergmann@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Nicolas Pitre <nico@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-09ACPICA: Tables: Change acpi_find_root_pointer() to use acpi_physical_address.Lv Zheng
[ Upstream commit f254e3c57b9d952e987502aefa0804c177dd2503 ] ACPICA commit 7d9fd64397d7c38899d3dc497525f6e6b044e0e3 OSPMs like Linux expect an acpi_physical_address returning value from acpi_find_root_pointer(). This triggers warnings if sizeof (acpi_size) doesn't equal to sizeof (acpi_physical_address): drivers/acpi/osl.c:275:3: warning: passing argument 1 of 'acpi_find_root_pointer' from incompatible pointer type [enabled by default] In file included from include/acpi/acpi.h:64:0, from include/linux/acpi.h:36, from drivers/acpi/osl.c:41: include/acpi/acpixf.h:433:1: note: expected 'acpi_size *' but argument is of type 'acpi_physical_address *' This patch corrects acpi_find_root_pointer(). Link: https://github.com/acpica/acpica/commit/7d9fd643 Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-23nilfs2: fix sanity check of btree level in nilfs_btree_root_broken()Ryusuke Konishi
[ Upstream commit d8fd150fe3935e1692bf57c66691e17409ebb9c1 ] The range check for b-tree level parameter in nilfs_btree_root_broken() is wrong; it accepts the case of "level == NILFS_BTREE_LEVEL_MAX" even though the level is limited to values in the range of 0 to (NILFS_BTREE_LEVEL_MAX - 1). Since the level parameter is read from storage device and used to index nilfs_btree_path array whose element count is NILFS_BTREE_LEVEL_MAX, it can cause memory overrun during btree operations if the boundary value is set to the level parameter on device. This fixes the broken sanity check and adds a comment to clarify that the upper bound NILFS_BTREE_LEVEL_MAX is exclusive. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17uas: Add US_FL_MAX_SECTORS_240 flagHans de Goede
[ Upstream commit ee136af4a064c2f61e2025873584d2c7ec93f4ae ] The usb-storage driver sets max_sectors = 240 in its scsi-host template, for uas we do not want to do that for all devices, but testing has shown that some devices need it. This commit adds a US_FL_MAX_SECTORS_240 flag for such devices, and implements support for it in uas.c, while at it it also adds support for US_FL_MAX_SECTORS_64 to uas.c. Cc: stable@vger.kernel.org # 3.16 Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17ASoC: dapm: Enable autodisable on SOC_DAPM_SINGLE_TLV_AUTODISABLECharles Keepax
[ Upstream commit a2d97723cb3a7741af81868427b36bba274b681b ] Correct small copy and paste error where autodisable was not being enabled for the SOC_DAPM_SINGLE_TLV_AUTODISABLE control. Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17ALSA: emu10k1: Emu10k2 32 bit DMA modePeter Zubaj
[ Upstream commit 7241ea558c6715501e777396b5fc312c372e11d9 ] Looks like audigy emu10k2 (probably emu10k1 - sb live too) support two modes for DMA. Second mode is useful for 64 bit os with more then 2 GB of ram (fixes problems with big soundfont loading) 1) 32MB from 2 GB address space using 8192 pages (used now as default) 2) 16MB from 4 GB address space using 4096 pages Mode is set using HCFG_EXPANDED_MEM flag in HCFG register. Also format of emu10k2 page table is then different. Signed-off-by: Peter Zubaj <pzubaj@marticonet.sk> Tested-by: Takashi Iwai <tiwai@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17mm/hugetlb: take page table lock in follow_huge_pmd()Naoya Horiguchi
[ Upstream commit e66f17ff71772b209eed39de35aaa99ba819c93d ] We have a race condition between move_pages() and freeing hugepages, where move_pages() calls follow_page(FOLL_GET) for hugepages internally and tries to get its refcount without preventing concurrent freeing. This race crashes the kernel, so this patch fixes it by moving FOLL_GET code for hugepages into follow_huge_pmd() with taking the page table lock. This patch intentionally removes page==NULL check after pte_page. This is justified because pte_page() never returns NULL for any architectures or configurations. This patch changes the behavior of follow_huge_pmd() for tail pages and then tail pages can be pinned/returned. So the caller must be changed to properly handle the returned tail pages. We could have a choice to add the similar locking to follow_huge_(addr|pud) for consistency, but it's not necessary because currently these functions don't support FOLL_GET flag, so let's leave it for future development. Here is the reproducer: $ cat movepages.c #include <stdio.h> #include <stdlib.h> #include <numaif.h> #define ADDR_INPUT 0x700000000000UL #define HPS 0x200000 #define PS 0x1000 int main(int argc, char *argv[]) { int i; int nr_hp = strtol(argv[1], NULL, 0); int nr_p = nr_hp * HPS / PS; int ret; void **addrs; int *status; int *nodes; pid_t pid; pid = strtol(argv[2], NULL, 0); addrs = malloc(sizeof(char *) * nr_p + 1); status = malloc(sizeof(char *) * nr_p + 1); nodes = malloc(sizeof(char *) * nr_p + 1); while (1) { for (i = 0; i < nr_p; i++) { addrs[i] = (void *)ADDR_INPUT + i * PS; nodes[i] = 1; status[i] = 0; } ret = numa_move_pages(pid, nr_p, addrs, nodes, status, MPOL_MF_MOVE_ALL); if (ret == -1) err("move_pages"); for (i = 0; i < nr_p; i++) { addrs[i] = (void *)ADDR_INPUT + i * PS; nodes[i] = 0; status[i] = 0; } ret = numa_move_pages(pid, nr_p, addrs, nodes, status, MPOL_MF_MOVE_ALL); if (ret == -1) err("move_pages"); } return 0; } $ cat hugepage.c #include <stdio.h> #include <sys/mman.h> #include <string.h> #define ADDR_INPUT 0x700000000000UL #define HPS 0x200000 int main(int argc, char *argv[]) { int nr_hp = strtol(argv[1], NULL, 0); char *p; while (1) { p = mmap((void *)ADDR_INPUT, nr_hp * HPS, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0); if (p != (void *)ADDR_INPUT) { perror("mmap"); break; } memset(p, 0, nr_hp * HPS); munmap(p, nr_hp * HPS); } } $ sysctl vm.nr_hugepages=40 $ ./hugepage 10 & $ ./movepages 10 $(pgrep -f hugepage) Fixes: e632a938d914 ("mm: migrate: add hugepage migration code to move_pages()") Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reported-by: Hugh Dickins <hughd@google.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: <stable@vger.kernel.org> [3.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17ebpf: verifier: check that call reg with ARG_ANYTHING is initializedDaniel Borkmann
[ Upstream commit 80f1d68ccba70b1060c9c7360ca83da430f66bed ] I noticed that a helper function with argument type ARG_ANYTHING does not need to have an initialized value (register). This can worst case lead to unintented stack memory leakage in future helper functions if they are not carefully designed, or unintended application behaviour in case the application developer was not careful enough to match a correct helper function signature in the API. The underlying issue is that ARG_ANYTHING should actually be split into two different semantics: 1) ARG_DONTCARE for function arguments that the helper function does not care about (in other words: the default for unused function arguments), and 2) ARG_ANYTHING that is an argument actually being used by a helper function and *guaranteed* to be an initialized register. The current risk is low: ARG_ANYTHING is only used for the 'flags' argument (r4) in bpf_map_update_elem() that internally does strict checking. Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17ACPICA: Utilities: split IO address types from data type models.Lv Zheng
[ Upstream commit 2b8760100e1de69b6ff004c986328a82947db4ad ] ACPICA commit aacf863cfffd46338e268b7415f7435cae93b451 It is reported that on a physically 64-bit addressed machine, 32-bit kernel can trigger crashes in accessing the memory regions that are beyond the 32-bit boundary. The region field's start address should still be 32-bit compliant, but after a calculation (adding some offsets), it may exceed the 32-bit boundary. This case is rare and buggy, but there are real BIOSes leaked with such issues (see References below). This patch fixes this gap by always defining IO addresses as 64-bit, and allows OSPMs to optimize it for a real 32-bit machine to reduce the size of the internal objects. Internal acpi_physical_address usages in the structures that can be fixed by this change include: 1. struct acpi_object_region: acpi_physical_address address; 2. struct acpi_address_range: acpi_physical_address start_address; acpi_physical_address end_address; 3. struct acpi_mem_space_context; acpi_physical_address address; 4. struct acpi_table_desc acpi_physical_address address; See known issues 1 for other usages. Note that acpi_io_address which is used for ACPI_PROCESSOR may also suffer from same problem, so this patch changes it accordingly. For iasl, it will enforce acpi_physical_address as 32-bit to generate 32-bit OSPM compatible tables on 32-bit platforms, we need to define ACPI_32BIT_PHYSICAL_ADDRESS for it in acenv.h. Known issues: 1. Cleanup of mapped virtual address In struct acpi_mem_space_context, acpi_physical_address is used as a virtual address: acpi_physical_address mapped_physical_address; It is better to introduce acpi_virtual_address or use acpi_size instead. This patch doesn't make such a change. Because this should be done along with a change to acpi_os_map_memory()/acpi_os_unmap_memory(). There should be no functional problem to leave this unchanged except that only this structure is enlarged unexpectedly. Link: https://github.com/acpica/acpica/commit/aacf863c Reference: https://bugzilla.kernel.org/show_bug.cgi?id=87971 Reference: https://bugzilla.kernel.org/show_bug.cgi?id=79501 Reported-and-tested-by: Paul Menzel <paulepanter@users.sourceforge.net> Reported-and-tested-by: Sial Nije <sialnije@gmail.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17target: Fix COMPARE_AND_WRITE with SG_TO_MEM_NOALLOC handlingNicholas Bellinger
[ Upstream commit c8e639852ad720499912acedfd6b072325fd2807 ] This patch fixes a bug for COMPARE_AND_WRITE handling with fabrics using SCF_PASSTHROUGH_SG_TO_MEM_NOALLOC. It adds the missing allocation for cmd->t_bidi_data_sg within transport_generic_new_cmd() that is used by COMPARE_AND_WRITE for the initial READ payload, even if the fabric is already providing a pre-allocated buffer for cmd->t_data_sg. Also, fix zero-length COMPARE_AND_WRITE handling within the compare_and_write_callback() and target_complete_ok_work() to queue the response, skipping the initial READ. This fixes COMPARE_AND_WRITE emulation with loopback, vhost, and xen-backend fabric drivers using SG_TO_MEM_NOALLOC. Reported-by: Christoph Hellwig <hch@lst.de> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> # v3.12+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17usb: define a generic USB_RESUME_TIMEOUT macroFelipe Balbi
[ Upstream commit 62f0342de1f012f3e90607d39e20fce811391169 ] Every USB Host controller should use this new macro to define for how long resume signalling should be driven on the bus. Currently, almost every single USB controller is using a 20ms timeout for resume signalling. That's problematic for two reasons: a) sometimes that 20ms timer expires a little before 20ms, which makes us fail certification b) some (many) devices actually need more than 20ms resume signalling. Sure, in case of (b) we can state that the device is against the USB spec, but the fact is that we have no control over which device the certification lab will use. We also have no control over which host they will use. Most likely they'll be using a Windows PC which, again, we have no control over how that USB stack is written and how long resume signalling they are using. At the end of the day, we must make sure Linux passes electrical compliance when working as Host or as Device and currently we don't pass compliance as host because we're driving resume signallig for exactly 20ms and that confuses certification test setup resulting in Certification failure. Cc: <stable@vger.kernel.org> # v3.10+ Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Peter Chen <peter.chen@freescale.com> Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-11net: fix crash in build_skb()Eric Dumazet
[ Upstream commit 2ea2f62c8bda242433809c7f4e9eae1c52c40bbe ] When I added pfmemalloc support in build_skb(), I forgot netlink was using build_skb() with a vmalloc() area. In this patch I introduce __build_skb() for netlink use, and build_skb() is a wrapper handling both skb->head_frag and skb->pfmemalloc This means netlink no longer has to hack skb->head_frag [ 1567.700067] kernel BUG at arch/x86/mm/physaddr.c:26! [ 1567.700067] invalid opcode: 0000 [#1] PREEMPT SMP KASAN [ 1567.700067] Dumping ftrace buffer: [ 1567.700067] (ftrace buffer empty) [ 1567.700067] Modules linked in: [ 1567.700067] CPU: 9 PID: 16186 Comm: trinity-c182 Not tainted 4.0.0-next-20150424-sasha-00037-g4796e21 #2167 [ 1567.700067] task: ffff880127efb000 ti: ffff880246770000 task.ti: ffff880246770000 [ 1567.700067] RIP: __phys_addr (arch/x86/mm/physaddr.c:26 (discriminator 3)) [ 1567.700067] RSP: 0018:ffff8802467779d8 EFLAGS: 00010202 [ 1567.700067] RAX: 000041000ed8e000 RBX: ffffc9008ed8e000 RCX: 000000000000002c [ 1567.700067] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffffb3fd6049 [ 1567.700067] RBP: ffff8802467779f8 R08: 0000000000000019 R09: ffff8801d0168000 [ 1567.700067] R10: ffff8801d01680c7 R11: ffffed003a02d019 R12: ffffc9000ed8e000 [ 1567.700067] R13: 0000000000000f40 R14: 0000000000001180 R15: ffffc9000ed8e000 [ 1567.700067] FS: 00007f2a7da3f700(0000) GS:ffff8801d1000000(0000) knlGS:0000000000000000 [ 1567.700067] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1567.700067] CR2: 0000000000738308 CR3: 000000022e329000 CR4: 00000000000007e0 [ 1567.700067] Stack: [ 1567.700067] ffffc9000ed8e000 ffff8801d0168000 ffffc9000ed8e000 ffff8801d0168000 [ 1567.700067] ffff880246777a28 ffffffffad7c0a21 0000000000001080 ffff880246777c08 [ 1567.700067] ffff88060d302e68 ffff880246777b58 ffff880246777b88 ffffffffad9a6821 [ 1567.700067] Call Trace: [ 1567.700067] build_skb (include/linux/mm.h:508 net/core/skbuff.c:316) [ 1567.700067] netlink_sendmsg (net/netlink/af_netlink.c:1633 net/netlink/af_netlink.c:2329) [ 1567.774369] ? sched_clock_cpu (kernel/sched/clock.c:311) [ 1567.774369] ? netlink_unicast (net/netlink/af_netlink.c:2273) [ 1567.774369] ? netlink_unicast (net/netlink/af_netlink.c:2273) [ 1567.774369] sock_sendmsg (net/socket.c:614 net/socket.c:623) [ 1567.774369] sock_write_iter (net/socket.c:823) [ 1567.774369] ? sock_sendmsg (net/socket.c:806) [ 1567.774369] __vfs_write (fs/read_write.c:479 fs/read_write.c:491) [ 1567.774369] ? get_lock_stats (kernel/locking/lockdep.c:249) [ 1567.774369] ? default_llseek (fs/read_write.c:487) [ 1567.774369] ? vtime_account_user (kernel/sched/cputime.c:701) [ 1567.774369] ? rw_verify_area (fs/read_write.c:406 (discriminator 4)) [ 1567.774369] vfs_write (fs/read_write.c:539) [ 1567.774369] SyS_write (fs/read_write.c:586 fs/read_write.c:577) [ 1567.774369] ? SyS_read (fs/read_write.c:577) [ 1567.774369] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63) [ 1567.774369] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2594 kernel/locking/lockdep.c:2636) [ 1567.774369] ? trace_hardirqs_on_thunk (arch/x86/lib/thunk_64.S:42) [ 1567.774369] system_call_fastpath (arch/x86/kernel/entry_64.S:261) Fixes: 79930f5892e ("net: do not deplete pfmemalloc reserve") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-11net: add skb_checksum_complete_unsetTom Herbert
[ Upstream commit 4e18b9adf2f910ec4d30b811a74a5b626e6c6125 ] This function changes ip_summed to CHECKSUM_NONE if CHECKSUM_COMPLETE is set. This is called to discard checksum-complete when packet is being modified and checksum is not pulled for headers in a layer. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-11arm/arm64: KVM: Keep elrsr/aisr in sync with software modelChristoffer Dall
commit ae705930fca6322600690df9dc1c7d0516145a93 upstream. There is an interesting bug in the vgic code, which manifests itself when the KVM run loop has a signal pending or needs a vmid generation rollover after having disabled interrupts but before actually switching to the guest. In this case, we flush the vgic as usual, but we sync back the vgic state and exit to userspace before entering the guest. The consequence is that we will be syncing the list registers back to the software model using the GICH_ELRSR and GICH_EISR from the last execution of the guest, potentially overwriting a list register containing an interrupt. This showed up during migration testing where we would capture a state where the VM has masked the arch timer but there were no interrupts, resulting in a hung test. Cc: Marc Zyngier <marc.zyngier@arm.com> Reported-by: Alex Bennee <alex.bennee@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-11arm/arm64: KVM: Require in-kernel vgic for the arch timersChristoffer Dall
commit 05971120fca43e0357789a14b3386bb56eef2201 upstream. It is curently possible to run a VM with architected timers support without creating an in-kernel VGIC, which will result in interrupts from the virtual timer going nowhere. To address this issue, move the architected timers initialization to the time when we run a VCPU for the first time, and then only initialize (and enable) the architected timers if we have a properly created and initialized in-kernel VGIC. When injecting interrupts from the virtual timer to the vgic, the current setup should ensure that this never calls an on-demand init of the VGIC, which is the only call path that could return an error from kvm_vgic_inject_irq(), so capture the return value and raise a warning if there's an error there. We also change the kvm_timer_init() function from returning an int to be a void function, since the function always succeeds. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-11arm/arm64: KVM: vgic: move reset initialization into vgic_init_maps()Peter Maydell
commit 6d3cfbe21bef5b66530b50ad16c88fdc71a04c35 upstream. VGIC initialization currently happens in three phases: (1) kvm_vgic_create() (triggered by userspace GIC creation) (2) vgic_init_maps() (triggered by userspace GIC register read/write requests, or from kvm_vgic_init() if not already run) (3) kvm_vgic_init() (triggered by first VM run) We were doing initialization of some state to correspond with the state of a freshly-reset GIC in kvm_vgic_init(); this is too late, since it will overwrite changes made by userspace using the register access APIs before the VM is run. Move this initialization earlier, into the vgic_init_maps() phase. This fixes a bug where QEMU could successfully restore a saved VM state snapshot into a VM that had already been run, but could not restore it "from cold" using the -loadvm command line option (the symptoms being that the restored VM would run but interrupts were ignored). Finally rename vgic_init_maps to vgic_init and renamed kvm_vgic_init to kvm_vgic_map_resources. [ This patch is originally written by Peter Maydell, but I have modified it somewhat heavily, renaming various bits and moving code around. If something is broken, I am to be blamed. - Christoffer ] Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-11kvm: add a memslot flag for incoherent memory regionsArd Biesheuvel
commit 1050dcda3052912984b26fb6d2695a3f41792000 upstream. Memory regions may be incoherent with the caches, typically when the guest has mapped a host system RAM backed memory region as uncached. Add a flag KVM_MEMSLOT_INCOHERENT so that we can tag these memslots and handle them appropriately when mapping them. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-27usbnet: Fix tx_bytes statistic running backward in cdc_ncmBen Hutchings
[ Upstream commit 7a1e890e2168e33fb62d84528e996b8b4b478fea ] cdc_ncm disagrees with usbnet about how much framing overhead should be counted in the tx_bytes statistics, and tries 'fix' this by decrementing tx_bytes on the transmit path. But statistics must never be decremented except due to roll-over; this will thoroughly confuse user-space. Also, tx_bytes is only incremented by usbnet in the completion path. Fix this by requiring drivers that set FLAG_MULTI_FRAME to set a tx_bytes delta along with the tx_packets count. Fixes: beeecd42c3b4 ("net: cdc_ncm/cdc_mbim: adding NCM protocol statistics") Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-27usbnet: Fix tx_packets stat for FLAG_MULTI_FRAME driversBen Hutchings
[ Upstream commit 1e9e39f4a29857a396ac7b669d109f697f66695e ] Currently the usbnet core does not update the tx_packets statistic for drivers with FLAG_MULTI_PACKET and there is no hook in the TX completion path where they could do this. cdc_ncm and dependent drivers are bumping tx_packets stat on the transmit path while asix and sr9800 aren't updating it at all. Add a packet count in struct skb_data so these drivers can fill it in, initialise it to 1 for other drivers, and add the packet count to the tx_packets statistic on completion. Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Tested-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-27vlan: introduce *vlan_hwaccel_push_inside helpersJiri Pirko
[ Upstream commit 5968250c868ceee680aa77395b24e6ddcae17d36 ] Use them to push skb->vlan_tci into the payload and avoid code duplication. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-27vlan: rename __vlan_put_tag to vlan_insert_tag_set_protoJiri Pirko
[ Upstream commit 62749e2cb3c4a7da3eaa5c01a7e787aebeff8536 ] Name fits better. Plus there's going to be introduced __vlan_insert_tag later on. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-27vlan: kill vlan_put_tag helperJiri Pirko
[ Upstream commit b4bef1b57544b18899eb15569e3bafd8d2eeeff6 ] Since both tx and rx paths work with skb->vlan_tci, there's no need for this function anymore. Switch users directly to __vlan_hwaccel_put_tag. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-27ipv6: protect skb->sk accesses from recursive dereference inside the stackhannes@stressinduktion.org
[ Upstream commit f60e5990d9c1424af9dbca60a23ba2a1c7c1ce90 ] We should not consult skb->sk for output decisions in xmit recursion levels > 0 in the stack. Otherwise local socket settings could influence the result of e.g. tunnel encapsulation process. ipv6 does not conform with this in three places: 1) ip6_fragment: we do consult ipv6_npinfo for frag_size 2) sk_mc_loop in ipv6 uses skb->sk and checks if we should loop the packet back to the local socket 3) ip6_skb_dst_mtu could query the settings from the user socket and force a wrong MTU Furthermore: In sk_mc_loop we could potentially land in WARN_ON(1) if we use a PF_PACKET socket ontop of an IPv6-backed vxlan device. Reuse xmit_recursion as we are currently only interested in protecting tunnel devices. Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-27kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val)Christian Borntraeger
[ Upstream commit 43239cbe79fc369f5d2160bd7f69e28b5c50a58c ] Feedback has shown that WRITE_ONCE(x, val) is easier to use than ASSIGN_ONCE(val,x). There are no in-tree users yet, so lets change it for 3.19. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-26kernel: Provide READ_ONCE and ASSIGN_ONCEChristian Borntraeger
[ Upstream commit 230fa253df6352af12ad0a16128760b5cb3f92df ] ACCESS_ONCE does not work reliably on non-scalar types. For example gcc 4.6 and 4.7 might remove the volatile tag for such accesses during the SRA (scalar replacement of aggregates) step https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145) Let's provide READ_ONCE/ASSIGN_ONCE that will do all accesses via scalar types as suggested by Linus Torvalds. Accesses larger than the machines word size cannot be guaranteed to be atomic. These macros will use memcpy and emit a build warning. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-24cpuidle: remove state_count field from struct cpuidle_deviceBartlomiej Zolnierkiewicz
[ Upstream commit d75e4af14e228bbe3f86e29bcecb8e6be98d4e04 ] Thomas Schlichter reports the following issue on his Samsung NC20: "The C-states C1 and C2 to the OS when connected to AC, and additionally provides the C3 C-state when disconnected from AC. However, the number of C-states shown in sysfs is fixed to the number of C-states present at boot. If I boot with AC connected, I always only see the C-states up to C2 even if I disconnect AC. The reason is commit 130a5f692425 (ACPI / cpuidle: remove dev->state_count setting). It removes the update of dev->state_count, but sysfs uses exactly this variable to show the C-states. The fix is to use drv->state_count in sysfs. As this is currently the last user of dev->state_count, this variable can be completely removed." Remove dev->state_count as per the above. Reported-by: Thomas Schlichter <thomas.schlichter@web.de> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: 3.14+ <stable@vger.kernel.org> # 3.14+ [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-24Defer processing of REQ_PREEMPT requests for blocked devicesBart Van Assche
[ Upstream commit bba0bdd7ad4713d82338bcd9b72d57e9335a664b ] SCSI transport drivers and SCSI LLDs block a SCSI device if the transport layer is not operational. This means that in this state no requests should be processed, even if the REQ_PREEMPT flag has been set. This patch avoids that a rescan shortly after a cable pull sporadically triggers the following kernel oops: BUG: unable to handle kernel paging request at ffffc9001a6bc084 IP: [<ffffffffa04e08f2>] mlx4_ib_post_send+0xd2/0xb30 [mlx4_ib] Process rescan-scsi-bus (pid: 9241, threadinfo ffff88053484a000, task ffff880534aae100) Call Trace: [<ffffffffa0718135>] srp_post_send+0x65/0x70 [ib_srp] [<ffffffffa071b9df>] srp_queuecommand+0x1cf/0x3e0 [ib_srp] [<ffffffffa0001ff1>] scsi_dispatch_cmd+0x101/0x280 [scsi_mod] [<ffffffffa0009ad1>] scsi_request_fn+0x411/0x4d0 [scsi_mod] [<ffffffff81223b37>] __blk_run_queue+0x27/0x30 [<ffffffff8122a8d2>] blk_execute_rq_nowait+0x82/0x110 [<ffffffff8122a9c2>] blk_execute_rq+0x62/0xf0 [<ffffffffa000b0e8>] scsi_execute+0xe8/0x190 [scsi_mod] [<ffffffffa000b2f3>] scsi_execute_req+0xa3/0x130 [scsi_mod] [<ffffffffa000c1aa>] scsi_probe_lun+0x17a/0x450 [scsi_mod] [<ffffffffa000ce86>] scsi_probe_and_add_lun+0x156/0x480 [scsi_mod] [<ffffffffa000dc2f>] __scsi_scan_target+0xdf/0x1f0 [scsi_mod] [<ffffffffa000dfa3>] scsi_scan_host_selected+0x183/0x1c0 [scsi_mod] [<ffffffffa000edfb>] scsi_scan+0xdb/0xe0 [scsi_mod] [<ffffffffa000ee13>] store_scan+0x13/0x20 [scsi_mod] [<ffffffff811c8d9b>] sysfs_write_file+0xcb/0x160 [<ffffffff811589de>] vfs_write+0xce/0x140 [<ffffffff81158b53>] sys_write+0x53/0xa0 [<ffffffff81464592>] system_call_fastpath+0x16/0x1b [<00007f611c9d9300>] 0x7f611c9d92ff Reported-by: Max Gurtuvoy <maxg@mellanox.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Cc: <stable@vger.kernel.org> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-24mm: prevent endless growth of anon_vma hierarchyKonstantin Khlebnikov
[ Upstream commit 7a3ef208e662f4b63d43a23f61a64a129c525bbc ] Constantly forking task causes unlimited grow of anon_vma chain. Each next child allocates new level of anon_vmas and links vma to all previous levels because pages might be inherited from any level. This patch adds heuristic which decides to reuse existing anon_vma instead of forking new one. It adds counter anon_vma->degree which counts linked vmas and directly descending anon_vmas and reuses anon_vma if counter is lower than two. As a result each anon_vma has either vma or at least two descending anon_vmas. In such trees half of nodes are leafs with alive vmas, thus count of anon_vmas is no more than two times bigger than count of vmas. This heuristic reuses anon_vmas as few as possible because each reuse adds false aliasing among vmas and rmap walker ought to scan more ptes when it searches where page is might be mapped. Link: http://lkml.kernel.org/r/20120816024610.GA5350@evergreen.ssec.wisc.edu Fixes: 5beb49305251 ("mm: change anon_vma linking to fix multi-process server scalability issue") [akpm@linux-foundation.org: fix typo, per Rik] Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Reported-by: Daniel Forrest <dan.forrest@ssec.wisc.edu> Tested-by: Michal Hocko <mhocko@suse.cz> Tested-by: Jerome Marchand <jmarchan@redhat.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> [2.6.34+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-16regulator: palmas: Correct TPS659038 register definition for REGEN2Keerthy
[ Upstream commit e03826d5045e81a66a4fad7be9a8ecdaeb7911cf ] The register offset for REGEN2_CTRL in different for TPS659038 chip as when compared with other Palmas family PMICs. In the case of TPS659038 the wrong offset pointed to PLLEN_CTRL and was causing a hang. Correcting the same. Signed-off-by: Keerthy <j-keerthy@ti.com> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-16dm snapshot: suspend merging snapshot when doing exception handoverMikulas Patocka
[ Upstream commit 09ee96b21456883e108c3b00597bb37ec512151b ] The "dm snapshot: suspend origin when doing exception handover" commit fixed a exception store handover bug associated with pending exceptions to the "snapshot-origin" target. However, a similar problem exists in snapshot merging. When snapshot merging is in progress, we use the target "snapshot-merge" instead of "snapshot-origin". Consequently, during exception store handover, we must find the snapshot-merge target and suspend its associated mapped_device. To avoid lockdep warnings, the target must be suspended and resumed without holding _origins_lock. Introduce a dm_hold() function that grabs a reference on a mapped_device, but unlike dm_get(), it doesn't crash if the device has the DMF_FREEING flag set, it returns an error in this case. In snapshot_resume() we grab the reference to the origin device using dm_hold() while holding _origins_lock (_origins_lock guarantees that the device won't disappear). Then we release _origins_lock, suspend the device and grab _origins_lock again. NOTE to stable@ people: When backporting to kernels 3.18 and older, use dm_internal_suspend and dm_internal_resume instead of dm_internal_suspend_fast and dm_internal_resume_fast. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-16regmap: introduce regmap_name to fix syscon regmap trace eventsPhilipp Zabel
[ Upstream commit c6b570d97c0e77f570bb6b2ed30d372b2b1e9aae ] This patch fixes a NULL pointer dereference when enabling regmap event tracing in the presence of a syscon regmap, introduced by commit bdb0066df96e ("mfd: syscon: Decouple syscon interface from platform devices"). That patch introduced syscon regmaps that have their dev field set to NULL. The regmap trace events expect it to point to a valid struct device and feed it to dev_name(): $ echo 1 > /sys/kernel/debug/tracing/events/regmap/enable Unable to handle kernel NULL pointer dereference at virtual address 0000002c pgd = 80004000 [0000002c] *pgd=00000000 Internal error: Oops: 17 [#1] SMP ARM Modules linked in: coda videobuf2_vmalloc CPU: 0 PID: 304 Comm: kworker/0:2 Not tainted 4.0.0-rc2+ #9197 Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) Workqueue: events_freezable thermal_zone_device_check task: 9f25a200 ti: 9f1ee000 task.ti: 9f1ee000 PC is at ftrace_raw_event_regmap_block+0x3c/0xe4 LR is at _regmap_raw_read+0x1bc/0x1cc pc : [<803636e8>] lr : [<80365f2c>] psr: 600f0093 sp : 9f1efd78 ip : 9f1efdb8 fp : 9f1efdb4 r10: 00000004 r9 : 00000001 r8 : 00000001 r7 : 00000180 r6 : 00000000 r5 : 9f00e3c0 r4 : 00000003 r3 : 00000001 r2 : 00000180 r1 : 00000000 r0 : 9f00e3c0 Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel Control: 10c5387d Table: 2d91004a DAC: 00000015 Process kworker/0:2 (pid: 304, stack limit = 0x9f1ee210) Stack: (0x9f1efd78 to 0x9f1f0000) fd60: 9f1efda4 9f1efd88 fd80: 800708c0 805f9510 80927140 800f0013 9f1fc800 9eb2f490 00000000 00000180 fda0: 808e3840 00000001 9f1efdfc 9f1efdb8 80365f2c 803636b8 805f8958 800708e0 fdc0: a00f0013 803636ac 9f16de00 00000180 80927140 9f1fc800 9f1fc800 9f1efe6c fde0: 9f1efe6c 9f732400 00000000 00000000 9f1efe1c 9f1efe00 80365f70 80365d7c fe00: 80365f3c 9f1fc800 9f1fc800 00000180 9f1efe44 9f1efe20 803656a4 80365f48 fe20: 9f1fc800 00000180 9f1efe6c 9f1efe6c 9f732400 00000000 9f1efe64 9f1efe48 fe40: 803657bc 80365634 00000001 9e95f910 9f1fc800 9f1efeb4 9f1efe8c 9f1efe68 fe60: 80452ac0 80365778 9f1efe8c 9f1efe78 9e93d400 9e93d5e8 9f1efeb4 9f72ef40 fe80: 9f1efeac 9f1efe90 8044e11c 80452998 8045298c 9e93d608 9e93d400 808e1978 fea0: 9f1efecc 9f1efeb0 8044fd14 8044e0d0 ffffffff 9f25a200 9e93d608 9e481380 fec0: 9f1efedc 9f1efed0 8044fde8 8044fcec 9f1eff1c 9f1efee0 80038d50 8044fdd8 fee0: 9f1ee020 9f72ef40 9e481398 00000000 00000008 9f72ef54 9f1ee020 9f72ef40 ff00: 9e481398 9e481380 00000008 9f72ef40 9f1eff5c 9f1eff20 80039754 80038bfc ff20: 00000000 9e481380 80894100 808e1662 00000000 9e4f2ec0 00000000 9e481380 ff40: 800396f8 00000000 00000000 00000000 9f1effac 9f1eff60 8003e020 80039704 ff60: ffffffff 00000000 ffffffff 9e481380 00000000 00000000 9f1eff78 9f1eff78 ff80: 00000000 00000000 9f1eff88 9f1eff88 9e4f2ec0 8003df30 00000000 00000000 ffa0: 00000000 9f1effb0 8000eb60 8003df3c 00000000 00000000 00000000 00000000 ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 ffffffff ffffffff Backtrace: [<803636ac>] (ftrace_raw_event_regmap_block) from [<80365f2c>] (_regmap_raw_read+0x1bc/0x1cc) r9:00000001 r8:808e3840 r7:00000180 r6:00000000 r5:9eb2f490 r4:9f1fc800 [<80365d70>] (_regmap_raw_read) from [<80365f70>] (_regmap_bus_read+0x34/0x6c) r10:00000000 r9:00000000 r8:9f732400 r7:9f1efe6c r6:9f1efe6c r5:9f1fc800 r4:9f1fc800 [<80365f3c>] (_regmap_bus_read) from [<803656a4>] (_regmap_read+0x7c/0x144) r6:00000180 r5:9f1fc800 r4:9f1fc800 r3:80365f3c [<80365628>] (_regmap_read) from [<803657bc>] (regmap_read+0x50/0x70) r9:00000000 r8:9f732400 r7:9f1efe6c r6:9f1efe6c r5:00000180 r4:9f1fc800 [<8036576c>] (regmap_read) from [<80452ac0>] (imx_get_temp+0x134/0x1a4) r6:9f1efeb4 r5:9f1fc800 r4:9e95f910 r3:00000001 [<8045298c>] (imx_get_temp) from [<8044e11c>] (thermal_zone_get_temp+0x58/0x74) r7:9f72ef40 r6:9f1efeb4 r5:9e93d5e8 r4:9e93d400 [<8044e0c4>] (thermal_zone_get_temp) from [<8044fd14>] (thermal_zone_device_update+0x34/0xec) r6:808e1978 r5:9e93d400 r4:9e93d608 r3:8045298c [<8044fce0>] (thermal_zone_device_update) from [<8044fde8>] (thermal_zone_device_check+0x1c/0x20) r5:9e481380 r4:9e93d608 [<8044fdcc>] (thermal_zone_device_check) from [<80038d50>] (process_one_work+0x160/0x3d4) [<80038bf0>] (process_one_work) from [<80039754>] (worker_thread+0x5c/0x4f4) r10:9f72ef40 r9:00000008 r8:9e481380 r7:9e481398 r6:9f72ef40 r5:9f1ee020 r4:9f72ef54 [<800396f8>] (worker_thread) from [<8003e020>] (kthread+0xf0/0x108) r10:00000000 r9:00000000 r8:00000000 r7:800396f8 r6:9e481380 r5:00000000 r4:9e4f2ec0 [<8003df30>] (kthread) from [<8000eb60>] (ret_from_fork+0x14/0x34) r7:00000000 r6:00000000 r5:8003df30 r4:9e4f2ec0 Code: e3140040 1a00001a e3140020 1a000016 (e596002c) ---[ end trace 193c15c2494ec960 ]--- Fixes: bdb0066df96e (mfd: syscon: Decouple syscon interface from platform devices) Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-04-16sched/wait: Provide infrastructure to deal with nested blockingPeter Zijlstra
[ Upstream commit 61ada528dea028331e99e8ceaed87c683ad25de2 ] There are a few places that call blocking primitives from wait loops, provide infrastructure to support this without the typical task_struct::state collision. We record the wakeup in wait_queue_t::flags which leaves task_struct::state free to be used by others. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: tglx@linutronix.de Cc: ilya.dryomov@inktank.com Cc: umgwanakikbuti@gmail.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140924082242.051202318@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-03-28workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for ↵Tejun Heo
PREEMPT_NONE [ Upstream commit 8603e1b30027f943cc9c1eef2b291d42c3347af1 ] cancel[_delayed]_work_sync() are implemented using __cancel_work_timer() which grabs the PENDING bit using try_to_grab_pending() and then flushes the work item with PENDING set to prevent the on-going execution of the work item from requeueing itself. try_to_grab_pending() can always grab PENDING bit without blocking except when someone else is doing the above flushing during cancelation. In that case, try_to_grab_pending() returns -ENOENT. In this case, __cancel_work_timer() currently invokes flush_work(). The assumption is that the completion of the work item is what the other canceling task would be waiting for too and thus waiting for the same condition and retrying should allow forward progress without excessive busy looping Unfortunately, this doesn't work if preemption is disabled or the latter task has real time priority. Let's say task A just got woken up from flush_work() by the completion of the target work item. If, before task A starts executing, task B gets scheduled and invokes __cancel_work_timer() on the same work item, its try_to_grab_pending() will return -ENOENT as the work item is still being canceled by task A and flush_work() will also immediately return false as the work item is no longer executing. This puts task B in a busy loop possibly preventing task A from executing and clearing the canceling state on the work item leading to a hang. task A task B worker executing work __cancel_work_timer() try_to_grab_pending() set work CANCELING flush_work() block for work completion completion, wakes up A __cancel_work_timer() while (forever) { try_to_grab_pending() -ENOENT as work is being canceled flush_work() false as work is no longer executing } This patch removes the possible hang by updating __cancel_work_timer() to explicitly wait for clearing of CANCELING rather than invoking flush_work() after try_to_grab_pending() fails with -ENOENT. Link: http://lkml.kernel.org/g/20150206171156.GA8942@axis.com v3: bit_waitqueue() can't be used for work items defined in vmalloc area. Switched to custom wake function which matches the target work item and exclusive wait and wakeup. v2: v1 used wake_up() on bit_waitqueue() which leads to NULL deref if the target bit waitqueue has wait_bit_queue's on it. Use DEFINE_WAIT_BIT() and __wake_up_bit() instead. Reported by Tomeu Vizoso. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Rabin Vincent <rabin.vincent@axis.com> Cc: Tomeu Vizoso <tomeu.vizoso@gmail.com> Cc: stable@vger.kernel.org Tested-by: Jesper Nilsson <jesper.nilsson@axis.com> Tested-by: Rabin Vincent <rabin.vincent@axis.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-03-28mmu_gather: move minimal range calculations into generic codeWill Deacon
[ Upstream commit fb7332a9fedfd62b1ba6530c86f39f0fa38afd49 ] On architectures with hardware broadcasting of TLB invalidation messages , it makes sense to reduce the range of the mmu_gather structure when unmapping page ranges based on the dirty address information passed to tlb_remove_tlb_entry. arm64 already does this by directly manipulating the start/end fields of the gather structure, but this confuses the generic code which does not expect these fields to change and can end up calculating invalid, negative ranges when forcing a flush in zap_pte_range. This patch moves the minimal range calculation out of the arm64 code and into the generic implementation, simplifying zap_pte_range in the process (which no longer needs to care about start/end, since they will point to the appropriate ranges already). With the range being tracked by core code, the need_flush flag is dropped in favour of checking that the end of the range has actually been set. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King - ARM Linux <linux@arm.linux.org.uk> Cc: Michal Simek <monstr@monstr.eu> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>