summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2014-07-28Linux 3.15.7v3.15.7Greg Kroah-Hartman
2014-07-28ARC: Implement ptrace(PTRACE_GET_THREAD_AREA)Anton Kolesov
commit a4b6cb735b25aa84a462a1985e3e43bebaf5beb4 upstream. This patch adds implementation of GET_THREAD_AREA ptrace request type. This is required by GDB to debug NPTL applications. Signed-off-by: Anton Kolesov <Anton.Kolesov@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28Don't trigger congestion wait on dirty-but-not-writeout pagesLinus Torvalds
commit b738d764652dc5aab1c8939f637112981fce9e0e upstream. shrink_inactive_list() used to wait 0.1s to avoid congestion when all the pages that were isolated from the inactive list were dirty but not under active writeback. That makes no real sense, and apparently causes major interactivity issues under some loads since 3.11. The ostensible reason for it was to wait for kswapd to start writing pages, but that seems questionable as well, since the congestion wait code seems to trigger for kswapd itself as well. Also, the logic behind delaying anything when we haven't actually started writeback is not clear - it only delays actually starting that writeback. We'll still trigger the congestion waiting if (a) the process is kswapd, and we hit pages flagged for immediate reclaim (b) the process is not kswapd, and the zone backing dev writeback is actually congested. This probably needs to be revisited, but as it is this fixes a reported regression. Reported-by: Felipe Contreras <felipe.contreras@gmail.com> Pinpointed-by: Hillf Danton <dhillf@gmail.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [mhocko@suse.cz: backport to 3.12 stable tree] Fixes: e2be15f6c3ee ('mm: vmscan: stall page reclaim and writeback pages based on dirty/writepage pages encountered') Reported-by: Felipe Contreras <felipe.contreras@gmail.com> Pinpointed-by: Hillf Danton <dhillf@gmail.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28iwlwifi: mvm: disable CTS to SelfEmmanuel Grumbach
commit dc271ee0d04d12d6bfabacbec803289a7072fbd9 upstream. Firmware folks seem say that this flag can make trouble. Drop it. The advantage of CTS to self is that it slightly reduces the cost of the protection, but make the protection less reliable. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28ARM: dts: imx: Add alias for ethernet controllerMarek Vasut
commit 22970070e027cbbb9b2878f8f7c31d0d7f29e94d upstream. Add alias for FEC ethernet on i.MX to allow bootloaders (like U-Boot) patch-in the MAC address for FEC using this alias. Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28gpio: dwapb: drop irq_setup_generic_chip()Sebastian Andrzej Siewior
commit 11d3d334af07408ce3a68860c40006ddcd343da5 upstream. The driver calls irq_alloc_domain_generic_chips() which creates a gc and adds it to gc_list. The driver later then calls irq_setup_generic_chip() which also initializes the gc and adds it to the gc_list() and this corrupts the list. Enable LIST_DEBUG and you see the kernel complain. This isn't required, irq_alloc_domain_generic_chips() did the init. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Alan Tull <delicious.quinoa@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: dann frazier <dann.frazier@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28aio: protect reqs_available updates from changes in interrupt handlersBenjamin LaHaise
commit 263782c1c95bbddbb022dc092fd89a36bb8d5577 upstream. As of commit f8567a3845ac05bb28f3c1b478ef752762bd39ef it is now possible to have put_reqs_available() called from irq context. While put_reqs_available() is per cpu, it did not protect itself from interrupts on the same CPU. This lead to aio_complete() corrupting the available io requests count when run under a heavy O_DIRECT workloads as reported by Robert Elliott. Fix this by disabling irq updates around the per cpu batch updates of reqs_available. Many thanks to Robert and folks for testing and tracking this down. Reported-by: Robert Elliot <Elliott@hp.com> Tested-by: Robert Elliot <Elliott@hp.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Cc: Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28IB/mlx5: Enable "block multicast loopback" for kernel consumersOr Gerlitz
commit 652c1a05171695d21b84dd3a723606b50eeb80fd upstream. In commit f360d88a2efd, we advertise blocking multicast loopback to both kernel and userspace consumers, but don't allow kernel consumers (e.g IPoIB) to use it with their UD QPs. Fix that. Fixes: f360d88a2efd ("IB/mlx5: Add block multicast loopback support") Reported-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28x86/efi: Include a .bss section within the PE/COFF headersMichael Brown
commit c7fb93ec51d462ec3540a729ba446663c26a0505 upstream. The PE/COFF headers currently describe only the initialised-data portions of the image, and result in no space being allocated for the uninitialised-data portions. Consequently, the EFI boot stub will end up overwriting unexpected areas of memory, with unpredictable results. Fix by including a .bss section in the PE/COFF headers (functionally equivalent to the init_size field in the bzImage header). Signed-off-by: Michael Brown <mbrown@fensystems.co.uk> Cc: Thomas Bächler <thomas@archlinux.org> Cc: Josh Boyer <jwboyer@fedoraproject.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28sched: Fix possible divide by zero in avg_atom() calculationMateusz Guzik
commit b0ab99e7736af88b8ac1b7ae50ea287fffa2badc upstream. proc_sched_show_task() does: if (nr_switches) do_div(avg_atom, nr_switches); nr_switches is unsigned long and do_div truncates it to 32 bits, which means it can test non-zero on e.g. x86-64 and be truncated to zero for division. Fix the problem by using div64_ul() instead. As a side effect calculations of avg_atom for big nr_switches are now correct. Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1402750809-31991-1-git-send-email-mguzik@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28locking/mutex: Disable optimistic spinning on some architecturesPeter Zijlstra
commit 4badad352a6bb202ec68afa7a574c0bb961e5ebc upstream. The optimistic spin code assumes regular stores and cmpxchg() play nice; this is found to not be true for at least: parisc, sparc32, tile32, metag-lock1, arc-!llsc and hexagon. There is further wreckage, but this in particular seemed easy to trigger, so blacklist this. Opt in for known good archs. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Reported-by: Mikulas Patocka <mpatocka@redhat.com> Cc: David Miller <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Jason Low <jason.low2@hp.com> Cc: Waiman Long <waiman.long@hp.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: John David Anglin <dave.anglin@bell.net> Cc: James Hogan <james.hogan@imgtec.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: sparclinux@vger.kernel.org Link: http://lkml.kernel.org/r/20140606175316.GV13930@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28PM / sleep: Fix request_firmware() error at resumeTakashi Iwai
commit 4320f6b1d9db4ca912c5eb6ecb328b2e090e1586 upstream. The commit [247bc037: PM / Sleep: Mitigate race between the freezer and request_firmware()] introduced the finer state control, but it also leads to a new bug; for example, a bug report regarding the firmware loading of intel BT device at suspend/resume: https://bugzilla.novell.com/show_bug.cgi?id=873790 The root cause seems to be a small window between the process resume and the clear of usermodehelper lock. The request_firmware() function checks the UMH lock and gives up when it's in UMH_DISABLE state. This is for avoiding the invalid f/w loading during suspend/resume phase. The problem is, however, that usermodehelper_enable() is called at the end of thaw_processes(). Thus, a thawed process in between can kick off the f/w loader code path (in this case, via btusb_setup_intel()) even before the call of usermodehelper_enable(). Then usermodehelper_read_trylock() returns an error and request_firmware() spews WARN_ON() in the end. This oneliner patch fixes the issue just by setting to UMH_FREEZING state again before restarting tasks, so that the call of request_firmware() will be blocked until the end of this function instead of returning an error. Fixes: 247bc0374254 (PM / Sleep: Mitigate race between the freezer and request_firmware()) Link: https://bugzilla.novell.com/show_bug.cgi?id=873790 Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28RDMA/cxgb4: Initialize the device status pageSteve Wise
commit 6b54d54dea82ae214e4a45a503c4ef755a8ecee8 upstream. The status page is mapped to user processes and allows sharing the device state between the kernel and user processes. This state isn't getting initialized and thus intermittently causes problems. Namely, the user process can mistakenly think the user doorbell writes are disabled which causes SQ work requests to never get fetched by HW. Fixes: 05eb23893c2c ("cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes"). Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28dm cache metadata: do not allow the data block size to changeMike Snitzer
commit 048e5a07f282c57815b3901d4a68a77fa131ce0a upstream. The block size for the dm-cache's data device must remained fixed for the life of the cache. Disallow any attempt to change the cache's data block size. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28dm thin metadata: do not allow the data block size to changeMike Snitzer
commit 9aec8629ec829fc9403788cd959e05dd87988bd1 upstream. The block size for the thin-pool's data device must remained fixed for the life of the thin-pool. Disallow any attempt to change the thin-pool's data block size. It should be noted that attempting to change the data block size via thin-pool table reload will be ignored as a side-effect of the thin-pool handover that the thin-pool target does during thin-pool table reload. Here is an example outcome of attempting to load a thin-pool table that reduced the thin-pool's data block size from 1024K to 512K. Before: kernel: device-mapper: thin: 253:4: growing the data device from 204800 to 409600 blocks After: kernel: device-mapper: thin metadata: changing the data block size (from 2048 to 1024) is not supported kernel: device-mapper: table: 253:4: thin-pool: Error creating metadata object kernel: device-mapper: ioctl: error adding target to table Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28mtd: devices: elm: fix elm_context_save() and elm_context_restore() functionsTed Juan
commit 6938ad40cb97a52d88a763008935340729a4acc7 upstream. These two function's switch case lack the 'break' that make them always return error. Signed-off-by: Ted Juan <ted.juan@gmail.com> Acked-by: Pekon Gupta <pekon@ti.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28random: check for increase of entropy_count because of signed conversionHannes Frederic Sowa
commit 79a8468747c5f95ed3d5ce8376a3e82e0c5857fc upstream. The expression entropy_count -= ibytes << (ENTROPY_SHIFT + 3) could actually increase entropy_count if during assignment of the unsigned expression on the RHS (mind the -=) we reduce the value modulo 2^width(int) and assign it to entropy_count. Trinity found this. [ Commit modified by tytso to add an additional safety check for a negative entropy_count -- which should never happen, and to also add an additional paranoia check to prevent overly large count values to be passed into urandom_read(). ] Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28cpufreq: move policy kobj to policy->cpu at resumeViresh Kumar
commit 92c14bd9477a20a83144f08c0ca25b0308bf0730 upstream. This is only relevant to implementations with multiple clusters, where clusters have separate clock lines but all CPUs within a cluster share it. Consider a dual cluster platform with 2 cores per cluster. During suspend we start hot unplugging CPUs in order 1 to 3. When CPU2 is removed, policy->kobj would be moved to CPU3 and when CPU3 goes down we wouldn't free policy or its kobj as we want to retain permissions/values/etc. Now on resume, we will get CPU2 before CPU3 and will call __cpufreq_add_dev(). We will recover the old policy and update policy->cpu from 3 to 2 from update_policy_cpu(). But the kobj is still tied to CPU3 and isn't moved to CPU2. We wouldn't create a link for CPU2, but would try that for CPU3 while bringing it online. Which will report errors as CPU3 already has kobj assigned to it. This bug got introduced with commit 42f921a, which overlooked this scenario. To fix this, lets move kobj to the new policy->cpu while bringing first CPU of a cluster back. Also do a WARN_ON() if kobject_move failed, as we would reach here only for the first CPU of a non-boot cluster. And we can't recover from this situation, if kobject_move() fails. Fixes: 42f921a6f10c (cpufreq: remove sysfs files for CPUs which failed to come back after resume) Reported-and-tested-by: Bu Yitian <ybu@qti.qualcomm.com> Reported-by: Saravana Kannan <skannan@codeaurora.org> Reviewed-by: Srivatsa S. Bhat <srivatsa@mit.edu> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28x86, tsc: Fix cpufreq lockupPeter Zijlstra
commit 3896c329df8092661dac80f55a8c3f60136fd61a upstream. Mauro reported that his AMD X2 using the powernow-k8 cpufreq driver locked up when doing cpu hotplug. Because we called set_cyc2ns_scale() from the time_cpufreq_notifier() unconditionally, it gets called multiple times for each freq change, instead of only the once, when the tsc_khz value actually changes. Because it gets called more than once, we run out of cyc2ns data slots and stall, waiting for a free one, but because we're half way offline, there's no consumers to free slots. By placing the call inside the condition that actually changes tsc_khz we avoid superfluous calls and avoid the problem. Reported-by: Mauro <registosites@hotmail.com> Tested-by: Mauro <registosites@hotmail.com> Fixes: 20d1c86a5776 ("sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs") Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Bin Gao <bin.gao@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Stefani Seibold <stefani@seibold.net> Cc: linux-kernel@vger.kernel.org Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28hwrng: fetch randomness only after device initAmit Shah
commit d3cc7996473a7bdd33256029988ea690754e4e2a upstream. Commit d9e7972619334 "hwrng: add randomness to system from rng sources" added a call to rng_get_data() from the hwrng_register() function. However, some rng devices need initialization before data can be read from them. This commit makes the call to rng_get_data() depend on no init fn pointer being registered by the device. If an init function is registered, this call is made after device init. CC: Kees Cook <keescook@chromium.org> CC: Jason Cooper <jason@lakedaemon.net> CC: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Amit Shah <amit.shah@redhat.com> Reviewed-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28alarmtimer: Fix bug where relative alarm timers were treated as absoluteJohn Stultz
commit 16927776ae757d0d132bdbfabbfe2c498342bd59 upstream. Sharvil noticed with the posix timer_settime interface, using the CLOCK_REALTIME_ALARM or CLOCK_BOOTTIME_ALARM clockid, if the users tried to specify a relative time timer, it would incorrectly be treated as absolute regardless of the state of the flags argument. This patch corrects this, properly checking the absolute/relative flag, as well as adds further error checking that no invalid flag bits are set. Reported-by: Sharvil Nanavati <sharvil@google.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Sharvil Nanavati <sharvil@google.com> Link: http://lkml.kernel.org/r/1404767171-6902-1-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28Revert "drm/i915: reverse dp link param selection, prefer fast over wide again"Dave Airlie
commit c6930992948adf0f8fc1f6ff1da51c5002a2cf95 upstream. This reverts commit 38aecea0ccbb909d635619cba22f1891e589b434. This breaks Haswell Thinkpad + Lenovo dock in SST mode with a HDMI monitor attached. Before this we can 1920x1200 mode, after this we only ever get 1024x768, and a lot of deferring. This didn't revert clean, but this should be fine. bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1117008 Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28drm/radeon: avoid leaking edid dataAlex Deucher
commit 0ac66effe7fcdee55bda6d5d10d3372c95a41920 upstream. In some cases we fetch the edid in the detect() callback in order to determine what sort of monitor is connected. If that happens, don't fetch the edid again in the get_modes() callback or we will leak the edid. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28drm/qxl: return IRQ_NONE if it was not our irqJason Wang
commit fbb60fe35ad579b511de8604b06a30b43846473b upstream. Return IRQ_NONE if it was not our irq. This is necessary for the case when qxl is sharing irq line with a device A in a crash kernel. If qxl is initialized before A and A's irq was raised during this gap, returning IRQ_HANDLED in this case will cause this irq to be raised again after EOI since kernel think it was handled but in fact it was not. Cc: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28drm/radeon: set default bl level to something reasonableAlex Deucher
commit 201bb62402e0227375c655446ea04fcd0acf7287 upstream. If the value in the scratch register is 0, set it to the max level. This fixes an issue where the console fb blanking code calls back into the backlight driver on unblank and then sets the backlight level to 0 after the driver has already set the mode and enabled the backlight. bugs: https://bugs.freedesktop.org/show_bug.cgi?id=81382 https://bugs.freedesktop.org/show_bug.cgi?id=70207 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: David Heidelberger <david.heidelberger@ixit.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28irqchip: gic: Fix core ID calculation when topology is read from DTTomasz Figa
commit 29e697b11853d3f83b1864ae385abdad4aa2c361 upstream. Certain GIC implementation, namely those found on earlier, single cluster, Exynos SoCs, have registers mapped without per-CPU banking, which means that the driver needs to use different offset for each CPU. Currently the driver calculates the offset by multiplying value returned by cpu_logical_map() by CPU offset parsed from DT. This is correct when CPU topology is not specified in DT and aforementioned function returns core ID alone. However when DT contains CPU topology, the function changes to return cluster ID as well, which is non-zero on mentioned SoCs and so breaks the calculation in GIC driver. This patch fixes this by masking out cluster ID in CPU offset calculation so that only core ID is considered. Multi-cluster Exynos SoCs already have banked GIC implementations, so this simple fix should be enough. Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reported-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Tomasz Figa <t.figa@samsung.com> Fixes: db0d4db22a78d ("ARM: gic: allow GIC to support non-banked setups") Link: https://lkml.kernel.org/r/1405610624-18722-1-git-send-email-t.figa@samsung.com Signed-off-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28irqchip: gic: Add binding probe for ARM GIC400Suravee Suthikulpanit
commit 144cb08864ed44be52d8634ac69cd98e5efcf527 upstream. Commit 3ab72f9156bb "dt-bindings: add GIC-400 binding" added the "arm,gic-400" compatible string, but the corresponding IRQCHIP_DECLARE was never added to the gic driver. Therefore add the missing irqchip declaration for it. Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Removed additional empty line and adapted commit message to mark it as fixing an issue. Signed-off-by: Heiko Stuebner <heiko@sntech.de> Acked-by: Will Deacon <will.deacon@arm.com> Fixes: 3ab72f9156bb ("dt-bindings: add GIC-400 binding") Link: https://lkml.kernel.org/r/2621565.f5eISveXXJ@diego Signed-off-by: Jason Cooper <jason@lakedaemon.net>
2014-07-28irqchip: gic: Add support for cortex a7 compatible stringMatthias Brugger
commit a97e8027b1d28eafe6bafe062556c1ec926a49c6 upstream. Patch 0a68214b "ARM: DT: Add binding for GIC virtualization extentions (VGIC)" added the "arm,cortex-a7-gic" compatible string, but the corresponding IRQCHIP_DECLARE was never added to the gic driver. To let real Cortex-A7 SoCs use it, add the necessary declaration to the device driver. Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com> Link: https://lkml.kernel.org/r/1404388732-28890-1-git-send-email-matthias.bgg@gmail.com Fixes: 0a68214b76ca ("ARM: DT: Add binding for GIC virtualization extentions (VGIC)") Signed-off-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28ring-buffer: Fix polling on trace_pipeMartin Lau
commit 97b8ee845393701edc06e27ccec2876ff9596019 upstream. ring_buffer_poll_wait() should always put the poll_table to its wait_queue even there is immediate data available. Otherwise, the following epoll and read sequence will eventually hang forever: 1. Put some data to make the trace_pipe ring_buffer read ready first 2. epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, ee) 3. epoll_wait() 4. read(trace_pipe_fd) till EAGAIN 5. Add some more data to the trace_pipe ring_buffer 6. epoll_wait() -> this epoll_wait() will block forever ~ During the epoll_ctl(efd, EPOLL_CTL_ADD,...) call in step 2, ring_buffer_poll_wait() returns immediately without adding poll_table, which has poll_table->_qproc pointing to ep_poll_callback(), to its wait_queue. ~ During the epoll_wait() call in step 3 and step 6, ring_buffer_poll_wait() cannot add ep_poll_callback() to its wait_queue because the poll_table->_qproc is NULL and it is how epoll works. ~ When there is new data available in step 6, ring_buffer does not know it has to call ep_poll_callback() because it is not in its wait queue. Hence, block forever. Other poll implementation seems to call poll_wait() unconditionally as the very first thing to do. For example, tcp_poll() in tcp.c. Link: http://lkml.kernel.org/p/20140610060637.GA14045@devbig242.prn2.facebook.com Fixes: 2a2cc8f7c4d0 "ftrace: allow the event pipe to be polled" Reviewed-by: Chris Mason <clm@fb.com> Signed-off-by: Martin Lau <kafai@fb.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28mwifiex: fix Tx timeout issueAmitkumar Karwar
commit d76744a93246eccdca1106037e8ee29debf48277 upstream. https://bugzilla.kernel.org/show_bug.cgi?id=70191 https://bugzilla.kernel.org/show_bug.cgi?id=77581 It is observed that sometimes Tx packet is downloaded without adding driver's txpd header. This results in firmware parsing garbage data as packet length. Sometimes firmware is unable to read the packet if length comes out as invalid. This stops further traffic and timeout occurs. The root cause is uninitialized fields in tx_info(skb->cb) of packet used to get garbage values. In this case if MWIFIEX_BUF_FLAG_REQUEUED_PKT flag is mistakenly set, txpd header was skipped. This patch makes sure that tx_info is correctly initialized to fix the problem. Reported-by: Andrew Wiley <wiley.andrew.j@gmail.com> Reported-by: Linus Gasser <list@markas-al-nour.org> Reported-by: Michael Hirsch <hirsch@teufel.de> Tested-by: Xinming Hu <huxm@marvell.com> Signed-off-by: Amitkumar Karwar <akarwar@marvell.com> Signed-off-by: Maithili Hinge <maithili@marvell.com> Signed-off-by: Avinash Patil <patila@marvell.com> Signed-off-by: Bing Zhao <bzhao@marvell.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28perf/x86/intel: ignore CondChgd bit to avoid false NMI handlingHATAYAMA Daisuke
commit b292d7a10487aee6e74b1c18b8d95b92f40d4a4f upstream. Currently, any NMI is falsely handled by a NMI handler of NMI watchdog if CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR is set. For example, we use external NMI to make system panic to get crash dump, but in this case, the external NMI is falsely handled do to the issue. This commit deals with the issue simply by ignoring CondChgd bit. Here is explanation in detail. On x86 NMI watchdog uses performance monitoring feature to periodically signal NMI each time performance counter gets overflowed. intel_pmu_handle_irq() is called as a NMI_LOCAL handler from a NMI handler of NMI watchdog, perf_event_nmi_handler(). It identifies an owner of a given NMI by looking at overflow status bits in MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of the bits are set, then it handles the given NMI as its own NMI. The problem is that the intel_pmu_handle_irq() doesn't distinguish CondChgd bit from other bits. Unlike the other status bits, CondChgd bit doesn't represent overflow status for performance counters. Thus, CondChgd bit cannot be thought of as a mark indicating a given NMI is NMI watchdog's. As a result, if CondChgd bit is set, any NMI is falsely handled by the NMI handler of NMI watchdog. Also, if type of the falsely handled NMI is either NMI_UNKNOWN, NMI_SERR or NMI_IO_CHECK, the corresponding action is never performed until CondChgd bit is cleared. I noticed this behavior on systems with Ivy Bridge processors: Intel Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both systems, CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR has already been set in the beginning at boot. Then the CondChgd bit is immediately cleared by next wrmsr to MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to remain 0. On the other hand, on older processors such as Nehalem, Xeon E7540, CondChgd bit is not set in the beginning at boot. I'm not sure about exact behavior of CondChgd bit, in particular when this bit is set. Although I read Intel System Programmer's Manual to figure out that, the descriptions I found are: In 18.9.1: "The MSR_PERF_GLOBAL_STATUS MSR also provides a ¡sticky bit¢ to indicate changes to the state of performancmonitoring hardware" In Table 35-2 IA-32 Architectural MSRs 63 CondChg: status bits of this register has changed. These are different from the bahviour I see on the actual system as I explained above. At least, I think ignoring CondChgd bit should be enough for NMI watchdog perspective. Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Acked-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20140625.103503.409316067.d.hatayama@jp.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28perf: Do not allow optimized switch for non-cloned eventsJiri Olsa
commit 1f9a7268c67f0290837aada443d28fd953ddca90 upstream. The context check in perf_event_context_sched_out allows non-cloned context to be part of the optimized schedule out switch. This could move non-cloned context into another workload child. Once this child exits, the context is closed and leaves all original (parent) events in closed state. Any other new cloned event will have closed state and not measure anything. And probably causing other odd bugs. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1403598026-2310-2-git-send-email-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28xen-netback: Fix pointer incrementation to avoid incorrect loggingZoltan Kiss
[ Upstream commit d8cfbfc4660054150ca1b7c501a8edc0771022f9 ] Due to this pointer is increased prematurely, the error log contains rubbish. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Reported-by: Armin Zentai <armin.zentai@ezit.hu> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: xen-devel@lists.xenproject.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28xen-netback: Fix releasing header slot on error pathZoltan Kiss
[ Upstream commit 1b860da0404a76af8533099ffe0a965490939369 ] This patch makes this function aware that the first frag and the header might share the same ring slot. That could happen if the first slot is bigger than PKT_PROT_LEN. Due to this the error path might release that slot twice or never, depending on the error scenario. xenvif_idx_release is also removed from xenvif_idx_unmap, and called separately. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Reported-by: Armin Zentai <armin.zentai@ezit.hu> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: xen-devel@lists.xenproject.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28xen-netback: Fix releasing frag_list skbs in error pathZoltan Kiss
[ Upstream commit b42cc6e421e7bf74e545483aa34b99d2a2ca6d3a ] When the grant operations failed, the skb is freed up eventually, and it tries to release the frags, if there is any. For the main skb nr_frags is set to 0 to avoid this, but on the frag_list it iterates through the frags array, and tries to call put_page on the page pointer which contains garbage at that time. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Reported-by: Armin Zentai <armin.zentai@ezit.hu> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: xen-devel@lists.xenproject.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28xen-netback: Fix handling frag_list on grant op error pathZoltan Kiss
[ Upstream commit 1a998d3e6bc1e44f4c0bc7509bdedef8ed3845ec ] The error handling for skb's with frag_list was completely wrong, it caused double unmap attempts to happen if the error was on the first skb. Move it to the right place in the loop. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Reported-by: Armin Zentai <armin.zentai@ezit.hu> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: xen-devel@lists.xenproject.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28ipv4: fix buffer overflow in ip_options_compile()Eric Dumazet
[ Upstream commit 10ec9472f05b45c94db3c854d22581a20b97db41 ] There is a benign buffer overflow in ip_options_compile spotted by AddressSanitizer[1] : Its benign because we always can access one extra byte in skb->head (because header is followed by struct skb_shared_info), and in this case this byte is not even used. [28504.910798] ================================================================== [28504.912046] AddressSanitizer: heap-buffer-overflow in ip_options_compile [28504.913170] Read of size 1 by thread T15843: [28504.914026] [<ffffffff81802f91>] ip_options_compile+0x121/0x9c0 [28504.915394] [<ffffffff81804a0d>] ip_options_get_from_user+0xad/0x120 [28504.916843] [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630 [28504.918175] [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0 [28504.919490] [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90 [28504.920835] [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70 [28504.922208] [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140 [28504.923459] [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b [28504.924722] [28504.925106] Allocated by thread T15843: [28504.925815] [<ffffffff81804995>] ip_options_get_from_user+0x35/0x120 [28504.926884] [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630 [28504.927975] [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0 [28504.929175] [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90 [28504.930400] [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70 [28504.931677] [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140 [28504.932851] [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b [28504.934018] [28504.934377] The buggy address ffff880026382828 is located 0 bytes to the right [28504.934377] of 40-byte region [ffff880026382800, ffff880026382828) [28504.937144] [28504.937474] Memory state around the buggy address: [28504.938430] ffff880026382300: ........ rrrrrrrr rrrrrrrr rrrrrrrr [28504.939884] ffff880026382400: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28504.941294] ffff880026382500: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr [28504.942504] ffff880026382600: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28504.943483] ffff880026382700: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28504.944511] >ffff880026382800: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr [28504.945573] ^ [28504.946277] ffff880026382900: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.094949] ffff880026382a00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.096114] ffff880026382b00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.097116] ffff880026382c00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.098472] ffff880026382d00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.099804] Legend: [28505.100269] f - 8 freed bytes [28505.100884] r - 8 redzone bytes [28505.101649] . - 8 allocated bytes [28505.102406] x=1..7 - x allocated bytes + (8-x) redzone bytes [28505.103637] ================================================================== [1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28dns_resolver: Null-terminate the right stringBen Hutchings
[ Upstream commit 640d7efe4c08f06c4ae5d31b79bd8740e7f6790a ] *_result[len] is parsed as *(_result[len]) which is not at all what we want to touch here. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Fixes: 84a7c0b1db1c ("dns_resolver: assure that dns_query() result is null-terminated") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28dns_resolver: assure that dns_query() result is null-terminatedManuel Schölling
[ Upstream commit 84a7c0b1db1c17d5ded8d3800228a608e1070b40 ] dns_query() credulously assumes that keys are null-terminated and returns a copy of a memory block that is off by one. Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28net: ppp: don't call sk_chk_filter twiceChristoph Schulz
[ Upstream commit 3916a3192793fd3c11f69d623ef0cdbdbf9ea10a ] Commit 568f194e8bd16c353ad50f9ab95d98b20578a39d ("net: ppp: use sk_unattached_filter api") causes sk_chk_filter() to be called twice when setting a PPP pass or active filter. This applies to both the generic PPP subsystem implemented by drivers/net/ppp/ppp_generic.c and the ISDN PPP subsystem implemented by drivers/isdn/i4l/isdn_ppp.c. The first call is from within get_filter(). The second one is through the call chain ppp_ioctl() or isdn_ppp_ioctl() --> sk_unattached_filter_create() --> __sk_prepare_filter() --> sk_chk_filter() The first call from within get_filter() should be deleted as get_filter() is called just before calling sk_unattached_filter_create() later on, which eventually calls sk_chk_filter() anyway. For 3.15.x, this proposed change is a bugfix rather than a pure optimization as in that branch, sk_chk_filter() may replace filter codes by other codes which are not recognized when executing sk_chk_filter() a second time. So with 3.15.x, if sk_chk_filter() is called twice, the second invocation may yield EINVAL (this depends on the filter codes found in the filter to be set, but because the replacement is done for frequently used codes, this is almost always the case). The net effect is that setting pass and/or active PPP filters does not work anymore, since sk_unattached_filter_create() always returns EINVAL due to the second call to sk_chk_filter(), regardless whether the filter was originally sane or not. Signed-off-by: Christoph Schulz <develop@kristov.de> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28net: huawei_cdc_ncm: add "subclass 3" devicesBjørn Mork
[ Upstream commit c2a6c7813f1ffae636e369b5d7011c9f518d3cd9 ] Huawei's usage of the subclass and protocol fields is not 100% clear to us, but there appears to be a very strict system. A device with the "shared" device ID 12d1:1506 and this NCM function was recently reported (showing only default altsetting): Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 1 bAlternateSetting 0 bNumEndpoints 1 bInterfaceClass 255 Vendor Specific Class bInterfaceSubClass 3 bInterfaceProtocol 22 iInterface 8 CDC Network Control Model (NCM) ** UNRECOGNIZED: 05 24 00 10 01 ** UNRECOGNIZED: 06 24 1a 00 01 1f ** UNRECOGNIZED: 0c 24 1b 00 01 00 04 10 14 dc 05 20 ** UNRECOGNIZED: 0d 24 0f 0a 0f 00 00 00 ea 05 03 00 01 ** UNRECOGNIZED: 05 24 06 01 01 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x85 EP 5 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x0010 1x 16 bytes bInterval 9 Cc: Enrico Mioso <mrkiko.rs@gmail.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28net: ppp: fix creating PPP pass and active filtersChristoph Schulz
[ Upstream commit cc25eaae238ddd693aa5eaa73e565d8ff4915f6e ] Commit 568f194e8bd16c353ad50f9ab95d98b20578a39d ("net: ppp: use sk_unattached_filter api") inadvertently changed the logic when setting PPP pass and active filters. This applies to both the generic PPP subsystem implemented by drivers/net/ppp/ppp_generic.c and the ISDN PPP subsystem implemented by drivers/isdn/i4l/isdn_ppp.c. The original code in ppp_ioctl() (or isdn_ppp_ioctl(), resp.) handling PPPIOCSPASS and PPPIOCSACTIVE allowed to remove a pass/active filter previously set by using a filter of length zero. However, with the new code this is not possible anymore as this case is not explicitly checked for, which leads to passing NULL as a filter to sk_unattached_filter_create(). This results in returning EINVAL to the caller. Additionally, the variables ppp->pass_filter and ppp->active_filter (or is->pass_filter and is->active_filter, resp.) are not reset to NULL, although the filters they point to may have been destroyed by sk_unattached_filter_destroy(), so in this EINVAL case dangling pointers are left behind (provided the pointers were previously non-NULL). This patch corrects both problems by checking whether the filter passed is empty or non-empty, and prevents sk_unattached_filter_create() from being called in the first case. Moreover, the pointers are always reset to NULL as soon as sk_unattached_filter_destroy() returns. Signed-off-by: Christoph Schulz <develop@kristov.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28sunvnet: clean up objects created in vnet_new() on vnet_exit()Sowmini Varadhan
[ Upstream commit a4b70a07ed12a71131cab7adce2ce91c71b37060 ] Nothing cleans up the objects created by vnet_new(), they are completely leaked. vnet_exit(), after doing the vio_unregister_driver() to clean up ports, should call a helper function that iterates over vnet_list and cleans up those objects. This includes unregister_netdevice() as well as free_netdev(). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Reviewed-by: Karl Volz <karl.volz@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28net-gre-gro: Fix a bug that breaks the forwarding pathJerry Chu
[ Upstream commit c3caf1192f904de2f1381211f564537235d50de3 ] Fixed a bug that was introduced by my GRE-GRO patch (bf5a755f5e9186406bbf50f4087100af5bd68e40 net-gre-gro: Add GRE support to the GRO stack) that breaks the forwarding path because various GSO related fields were not set. The bug will cause on the egress path either the GSO code to fail, or a GRE-TSO capable (NETIF_F_GSO_GRE) NICs to choke. The following fix has been tested for both cases. Signed-off-by: H.K. Jerry Chu <hkchu@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28bonding: fix ad_select module param checkNikolay Aleksandrov
[ Upstream commit 548d28bd0eac840d122b691279ce9f4ce6ecbfb6 ] Obvious copy/paste error when I converted the ad_select to the new option API. "lacp_rate" there should be "ad_select" so we can get the proper value. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: David S. Miller <davem@davemloft.net> Fixes: 9e5f5eebe765 ("bonding: convert ad_select to use the new option API") Reported-by: Karim Scheik <karim.scheik@prisma-solutions.at> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28net: pppoe: use correct channel MTU when using Multilink PPPChristoph Schulz
[ Upstream commit a8a3e41c67d24eb12f9ab9680cbb85e24fcd9711 ] The PPP channel MTU is used with Multilink PPP when ppp_mp_explode() (see ppp_generic module) tries to determine how big a fragment might be. According to RFC 1661, the MTU excludes the 2-byte PPP protocol field, see the corresponding comment and code in ppp_mp_explode(): /* * hdrlen includes the 2-byte PPP protocol field, but the * MTU counts only the payload excluding the protocol field. * (RFC1661 Section 2) */ mtu = pch->chan->mtu - (hdrlen - 2); However, the pppoe module *does* include the PPP protocol field in the channel MTU, which is wrong as it causes the PPP payload to be 1-2 bytes too big under certain circumstances (one byte if PPP protocol compression is used, two otherwise), causing the generated Ethernet packets to be dropped. So the pppoe module has to subtract two bytes from the channel MTU. This error only manifests itself when using Multilink PPP, as otherwise the channel MTU is not used anywhere. In the following, I will describe how to reproduce this bug. We configure two pppd instances for multilink PPP over two PPPoE links, say eth2 and eth3, with a MTU of 1492 bytes for each link and a MRRU of 2976 bytes. (This MRRU is computed by adding the two link MTUs and subtracting the MP header twice, which is 4 bytes long.) The necessary pppd statements on both sides are "multilink mtu 1492 mru 1492 mrru 2976". On the client side, we additionally need "plugin rp-pppoe.so eth2" and "plugin rp-pppoe.so eth3", respectively; on the server side, we additionally need to start two pppoe-server instances to be able to establish two PPPoE sessions, one over eth2 and one over eth3. We set the MTU of the PPP network interface to the MRRU (2976) on both sides of the connection in order to make use of the higher bandwidth. (If we didn't do that, IP fragmentation would kick in, which we want to avoid.) Now we send a ICMPv4 echo request with a payload of 2948 bytes from client to server over the PPP link. This results in the following network packet: 2948 (echo payload) + 8 (ICMPv4 header) + 20 (IPv4 header) --------------------- 2976 (PPP payload) These 2976 bytes do not exceed the MTU of the PPP network interface, so the IP packet is not fragmented. Now the multilink PPP code in ppp_mp_explode() prepends one protocol byte (0x21 for IPv4), making the packet one byte bigger than the negotiated MRRU. So this packet would have to be divided in three fragments. But this does not happen as each link MTU is assumed to be two bytes larger. So this packet is diveded into two fragments only, one of size 1489 and one of size 1488. Now we have for that bigger fragment: 1489 (PPP payload) + 4 (MP header) + 2 (PPP protocol field for the MP payload (0x3d)) + 6 (PPPoE header) -------------------------- 1501 (Ethernet payload) This packet exceeds the link MTU and is discarded. If one configures the link MTU on the client side to 1501, one can see the discarded Ethernet frames with tcpdump running on the client. A ping -s 2948 -c 1 192.168.15.254 leads to the smaller fragment that is correctly received on the server side: (tcpdump -vvvne -i eth3 pppoes and ppp proto 0x3d) 52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864), length 1514: PPPoE [ses 0x3] MLPPP (0x003d), length 1494: seq 0x000, Flags [end], length 1492 and to the bigger fragment that is not received on the server side: (tcpdump -vvvne -i eth2 pppoes and ppp proto 0x3d) 52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864), length 1515: PPPoE [ses 0x5] MLPPP (0x003d), length 1495: seq 0x000, Flags [begin], length 1493 With the patch below, we correctly obtain three fragments: 52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864), length 1514: PPPoE [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000, Flags [begin], length 1492 52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864), length 1514: PPPoE [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000, Flags [none], length 1492 52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864), length 27: PPPoE [ses 0x1] MLPPP (0x003d), length 7: seq 0x000, Flags [end], length 5 And the ICMPv4 echo request is successfully received at the server side: IP (tos 0x0, ttl 64, id 21925, offset 0, flags [DF], proto ICMP (1), length 2976) 192.168.222.2 > 192.168.15.254: ICMP echo request, id 30530, seq 0, length 2956 The bug was introduced in commit c9aa6895371b2a257401f59d3393c9f7ac5a8698 ("[PPPOE]: Advertise PPPoE MTU") from the very beginning. This patch applies to 3.10 upwards but the fix can be applied (with minor modifications) to kernels as old as 2.6.32. Signed-off-by: Christoph Schulz <develop@kristov.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28net: sctp: fix information leaks in ulpevent layerDaniel Borkmann
[ Upstream commit 8f2e5ae40ec193bc0a0ed99e95315c3eebca84ea ] While working on some other SCTP code, I noticed that some structures shared with user space are leaking uninitialized stack or heap buffer. In particular, struct sctp_sndrcvinfo has a 2 bytes hole between .sinfo_flags and .sinfo_ppid that remains unfilled by us in sctp_ulpevent_read_sndrcvinfo() when putting this into cmsg. But also struct sctp_remote_error contains a 2 bytes hole that we don't fill but place into a skb through skb_copy_expand() via sctp_ulpevent_make_remote_error(). Both structures are defined by the IETF in RFC6458: * Section 5.3.2. SCTP Header Information Structure: The sctp_sndrcvinfo structure is defined below: struct sctp_sndrcvinfo { uint16_t sinfo_stream; uint16_t sinfo_ssn; uint16_t sinfo_flags; <-- 2 bytes hole --> uint32_t sinfo_ppid; uint32_t sinfo_context; uint32_t sinfo_timetolive; uint32_t sinfo_tsn; uint32_t sinfo_cumtsn; sctp_assoc_t sinfo_assoc_id; }; * 6.1.3. SCTP_REMOTE_ERROR: A remote peer may send an Operation Error message to its peer. This message indicates a variety of error conditions on an association. The entire ERROR chunk as it appears on the wire is included in an SCTP_REMOTE_ERROR event. Please refer to the SCTP specification [RFC4960] and any extensions for a list of possible error formats. An SCTP error notification has the following format: struct sctp_remote_error { uint16_t sre_type; uint16_t sre_flags; uint32_t sre_length; uint16_t sre_error; <-- 2 bytes hole --> sctp_assoc_t sre_assoc_id; uint8_t sre_data[]; }; Fix this by setting both to 0 before filling them out. We also have other structures shared between user and kernel space in SCTP that contains holes (e.g. struct sctp_paddrthlds), but we copy that buffer over from user space first and thus don't need to care about it in that cases. While at it, we can also remove lengthy comments copied from the draft, instead, we update the comment with the correct RFC number where one can look it up. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28tipc: clear 'next'-pointer of message fragments before reassemblyJon Paul Maloy
[ Upstream commit 999417549c16dd0e3a382aa9f6ae61688db03181 ] If the 'next' pointer of the last fragment buffer in a message is not zeroed before reassembly, we risk ending up with a corrupt message, since the reassembly function itself isn't doing this. Currently, when a buffer is retrieved from the deferred queue of the broadcast link, the next pointer is not cleared, with the result as described above. This commit corrects this, and thereby fixes a bug that may occur when long broadcast messages are transmitted across dual interfaces. The bug has been present since 40ba3cdf542a469aaa9083fa041656e59b109b90 ("tipc: message reassembly using fragment chain") This commit should be applied to both net and net-next. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28r8152: fix r8152_csum_workaround functionhayeswang
[ Upstream commit a91d45f1a343188793d6f2bdf1a72c64015a8255 ] The transport offset of the IPv4 packet should be fixed and wouldn't be out of the hw limitation, so the r8152_csum_workaround() should be used for IPv6 packets. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28be2net: set EQ DB clear-intr bit in be_open()Suresh Reddy
[ Upstream commit 4cad9f3b61c7268fa89ab8096e23202300399b5d ] On BE3, if the clear-interrupt bit of the EQ doorbell is not set the first time it is armed, ocassionally we have observed that the EQ doesn't raise anymore interrupts even if it is in armed state. This patch fixes this by setting the clear-interrupt bit when EQs are armed for the first time in be_open(). Signed-off-by: Suresh Reddy <Suresh.Reddy@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>