aboutsummaryrefslogtreecommitdiffstats
path: root/init
AgeCommit message (Collapse)Author
2020-06-11Merge branch 'v5.2/standard/base' into v5.2/standard/preempt-rt/intel-x86Bruce Ashfield
2020-06-11Merge tag 'v5.2.44' into v5.2/standard/baseBruce Ashfield
This is the 5.2.44 stable release
2020-06-08x86: Fix early boot crash on gcc-10, third tryBorislav Petkov
commit a9a3ed1eff3601b63aea4fb462d8b3b92c7c1e7e upstream. ... or the odyssey of trying to disable the stack protector for the function which generates the stack canary value. The whole story started with Sergei reporting a boot crash with a kernel built with gcc-10: Kernel panic — not syncing: stack-protector: Kernel stack is corrupted in: start_secondary CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.6.0-rc5—00235—gfffb08b37df9 #139 Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77M—D3H, BIOS F12 11/14/2013 Call Trace: dump_stack panic ? start_secondary __stack_chk_fail start_secondary secondary_startup_64 -—-[ end Kernel panic — not syncing: stack—protector: Kernel stack is corrupted in: start_secondary This happens because gcc-10 tail-call optimizes the last function call in start_secondary() - cpu_startup_entry() - and thus emits a stack canary check which fails because the canary value changes after the boot_init_stack_canary() call. To fix that, the initial attempt was to mark the one function which generates the stack canary with: __attribute__((optimize("-fno-stack-protector"))) ... start_secondary(void *unused) however, using the optimize attribute doesn't work cumulatively as the attribute does not add to but rather replaces previously supplied optimization options - roughly all -fxxx options. The key one among them being -fno-omit-frame-pointer and thus leading to not present frame pointer - frame pointer which the kernel needs. The next attempt to prevent compilers from tail-call optimizing the last function call cpu_startup_entry(), shy of carving out start_secondary() into a separate compilation unit and building it with -fno-stack-protector, was to add an empty asm(""). This current solution was short and sweet, and reportedly, is supported by both compilers but we didn't get very far this time: future (LTO?) optimization passes could potentially eliminate this, which leads us to the third attempt: having an actual memory barrier there which the compiler cannot ignore or move around etc. That should hold for a long time, but hey we said that about the other two solutions too so... Reported-by: Sergei Trofimovich <slyfox@gentoo.org> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Kalle Valo <kvalo@codeaurora.org> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200314164451.346497-1-slyfox@gentoo.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-06-08gcc-10: mark more functions __init to avoid section mismatch warningsLinus Torvalds
commit e99332e7b4cda6e60f5b5916cf9943a79dbef902 upstream. It seems that for whatever reason, gcc-10 ends up not inlining a couple of functions that used to be inlined before. Even if they only have one single callsite - it looks like gcc may have decided that the code was unlikely, and not worth inlining. The code generation difference is harmless, but caused a few new section mismatch errors, since the (now no longer inlined) function wasn't in the __init section, but called other init functions: Section mismatch in reference from the function kexec_free_initrd() to the function .init.text:free_initrd_mem() Section mismatch in reference from the function tpm2_calc_event_log_size() to the function .init.text:early_memremap() Section mismatch in reference from the function tpm2_calc_event_log_size() to the function .init.text:early_memunmap() So add the appropriate __init annotation to make modpost not complain. In both cases there were trivially just a single callsite from another __init function. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [PG: drop the tpm chunk - doesn't exist until v5.3-rc1~192^2~5.] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-06-08Stop the ad-hoc games with -Wno-maybe-initializedLinus Torvalds
commit 78a5255ffb6a1af189a83e493d916ba1c54d8c75 upstream. We have some rather random rules about when we accept the "maybe-initialized" warnings, and when we don't. For example, we consider it unreliable for gcc versions < 4.9, but also if -O3 is enabled, or if optimizing for size. And then various kernel config options disabled it, because they know that they trigger that warning by confusing gcc sufficiently (ie PROFILE_ALL_BRANCHES). And now gcc-10 seems to be introducing a lot of those warnings too, so it falls under the same heading as 4.9 did. At the same time, we have a very straightforward way to _enable_ that warning when wanted: use "W=2" to enable more warnings. So stop playing these ad-hoc games, and just disable that warning by default, with the known and straight-forward "if you want to work on the extra compiler warnings, use W=123". Would it be great to have code that is always so obvious that it never confuses the compiler whether a variable is used initialized or not? Yes, it would. In a perfect world, the compilers would be smarter, and our source code would be simpler. That's currently not the world we live in, though. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [PG: use the 4.19-stable version instead of mainline.] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-06-05Merge tag 'v5.2.43' into v5.2/standard/baseBruce Ashfield
This is the 5.2.43 stable release
2020-06-04printk: queue wake_up_klogd irq_work only if per-CPU areas are readySergey Senozhatsky
commit ab6f762f0f53162d41497708b33c9a3236d3609e upstream. printk_deferred(), similarly to printk_safe/printk_nmi, does not immediately attempt to print a new message on the consoles, avoiding calls into non-reentrant kernel paths, e.g. scheduler or timekeeping, which potentially can deadlock the system. Those printk() flavors, instead, rely on per-CPU flush irq_work to print messages from safer contexts. For same reasons (recursive scheduler or timekeeping calls) printk() uses per-CPU irq_work in order to wake up user space syslog/kmsg readers. However, only printk_safe/printk_nmi do make sure that per-CPU areas have been initialised and that it's safe to modify per-CPU irq_work. This means that, for instance, should printk_deferred() be invoked "too early", that is before per-CPU areas are initialised, printk_deferred() will perform illegal per-CPU access. Lech Perczak [0] reports that after commit 1b710b1b10ef ("char/random: silence a lockdep splat with printk()") user-space syslog/kmsg readers are not able to read new kernel messages. The reason is printk_deferred() being called too early (as was pointed out by Petr and John). Fix printk_deferred() and do not queue per-CPU irq_work before per-CPU areas are initialized. Link: https://lore.kernel.org/lkml/aa0732c6-5c4e-8a8b-a1c1-75ebe3dca05b@camlintechnologies.com/ Reported-by: Lech Perczak <l.perczak@camlintechnologies.com> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Tested-by: Jann Horn <jannh@google.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Theodore Ts'o <tytso@mit.edu> Cc: John Ogness <john.ogness@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2020-05-19sched: Lazy migrate_disable processingScott Wood
commit 425c5b38779a860062aa62219dc920d374b13c17 in linux-rt-devel. Avoid overhead on the majority of migrate disable/enable sequences by only manipulating scheduler data (and grabbing the relevant locks) when the task actually schedules while migrate-disabled. A kernel build showed around a 10% reduction in system time (with CONFIG_NR_CPUS=512). Instead of cpuhp_pin_lock, CPU hotplug is handled by keeping a per-CPU count of the number of pinned tasks (including tasks which have not scheduled in the migrate-disabled section); takedown_cpu() will wait until that reaches zero (confirmed by take_cpu_down() in stop machine context to deal with races) before migrating tasks off of the cpu. To simplify synchronization, updating cpus_mask is no longer deferred until migrate_enable(). This lets us not have to worry about migrate_enable() missing the update if it's on the fast path (didn't schedule during the migrate disabled section). It also makes the code a bit simpler and reduces deviation from mainline. While the main motivation for this is the performance benefit, lazy migrate disable also eliminates the restriction on calling migrate_disable() while atomic but leaving the atomic region prior to calling migrate_enable() -- though this won't help with local_bh_disable() (and thus rcutorture) unless something similar is done with the recently added local_lock. Signed-off-by: Scott Wood <swood@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
2019-07-22kconfig: Add PREEMPT_RT_FULLThomas Gleixner
Introduce the final symbol for PREEMPT_RT_FULL. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-07-22posix-timers: Thread posix-cpu-timers on -rtJohn Stultz
posix-cpu-timer code takes non -rt safe locks in hard irq context. Move it to a thread. [ 3.0 fixes from Peter Zijlstra <peterz@infradead.org> ] Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-07-22slub: Disable SLUB_CPU_PARTIALSebastian Andrzej Siewior
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915 |in_atomic(): 1, irqs_disabled(): 0, pid: 87, name: rcuop/7 |1 lock held by rcuop/7/87: | #0: (rcu_callback){......}, at: [<ffffffff8112c76a>] rcu_nocb_kthread+0x1ca/0x5d0 |Preemption disabled at:[<ffffffff811eebd9>] put_cpu_partial+0x29/0x220 | |CPU: 0 PID: 87 Comm: rcuop/7 Tainted: G W 4.0.0-rt0+ #477 |Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 | 000000000007a9fc ffff88013987baf8 ffffffff817441c7 0000000000000007 | 0000000000000000 ffff88013987bb18 ffffffff810eee51 0000000000000000 | ffff88013fc10200 ffff88013987bb48 ffffffff8174a1c4 000000000007a9fc |Call Trace: | [<ffffffff817441c7>] dump_stack+0x4f/0x90 | [<ffffffff810eee51>] ___might_sleep+0x121/0x1b0 | [<ffffffff8174a1c4>] rt_spin_lock+0x24/0x60 | [<ffffffff811a689a>] __free_pages_ok+0xaa/0x540 | [<ffffffff811a729d>] __free_pages+0x1d/0x30 | [<ffffffff811eddd5>] __free_slab+0xc5/0x1e0 | [<ffffffff811edf46>] free_delayed+0x56/0x70 | [<ffffffff811eecfd>] put_cpu_partial+0x14d/0x220 | [<ffffffff811efc98>] __slab_free+0x158/0x2c0 | [<ffffffff811f0021>] kmem_cache_free+0x221/0x2d0 | [<ffffffff81204d0c>] file_free_rcu+0x2c/0x40 | [<ffffffff8112c7e3>] rcu_nocb_kthread+0x243/0x5d0 | [<ffffffff810e951c>] kthread+0xfc/0x120 | [<ffffffff8174abc8>] ret_from_fork+0x58/0x90 Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2019-07-22sched: Disable CONFIG_RT_GROUP_SCHED on RTThomas Gleixner
Carsten reported problems when running: taskset 01 chrt -f 1 sleep 1 from within rc.local on a F15 machine. The task stays running and never gets on the run queue because some of the run queues have rt_throttled=1 which does not go away. Works nice from a ssh login shell. Disabling CONFIG_RT_GROUP_SCHED solves that as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-07-22mm: Allow only slub on RTIngo Molnar
Disable SLAB and SLOB on -RT. Only SLUB is adopted to -RT needs. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-07-22kernel: sched: Provide a pointer to the valid CPU maskSebastian Andrzej Siewior
In commit 4b53a3412d66 ("sched/core: Remove the tsk_nr_cpus_allowed() wrapper") the tsk_nr_cpus_allowed() wrapper was removed. There was not much difference in !RT but in RT we used this to implement migrate_disable(). Within a migrate_disable() section the CPU mask is restricted to single CPU while the "normal" CPU mask remains untouched. As an alternative implementation Ingo suggested to use struct task_struct { const cpumask_t *cpus_ptr; cpumask_t cpus_mask; }; with t->cpus_allowed_ptr = &t->cpus_allowed; In -RT we then can switch the cpus_ptr to t->cpus_allowed_ptr = &cpumask_of(task_cpu(p)); in a migration disabled region. The rules are simple: - Code that 'uses' ->cpus_allowed would use the pointer. - Code that 'modifies' ->cpus_allowed would use the direct mask. While converting the existing users I tried to stick with the rules above however… well mostly CPUFREQ tries to temporary switch the CPU mask to do something on a certain CPU and then switches the mask back it its original value. So in theory `cpus_ptr' could or should be used. However if this is invoked in a migration disabled region (which is not the case because it would require something like preempt_disable() and set_cpus_allowed_ptr() might sleep so it can't be) then the "restore" part would restore the wrong mask. So it only looks strange and I go for the pointer… Some drivers copy the cpumask without cpumask_copy() and others use cpumask_copy but without alloc_cpumask_var(). I did not fix those as part of this, could do this as a follow up… So is this the way we want it? Is the usage of `cpus_ptr' vs `cpus_mask' for the set + restore part (see cpufreq users) what we want? At some point it looks like they should use a different interface for their doing. I am not sure why switching to certain CPU is important but maybe it could be done via a workqueue from the CPUFREQ core (so we have a comment desribing why are doing this and a get_online_cpus() to ensure that the CPU does not go offline too early). Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2019-07-22printk_safe: remove printk safe codeJohn Ogness
vprintk variants are now NMI-safe so there is no longer a need for the "safe" calls. NOTE: This also removes printk flushing functionality. Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2019-07-22uptime: allow the optional limiting of kernel runtimeBruce Ashfield
Introduce the ability to limit the limit the uptime of a kernel. When enabled, these options set a maximum uptime on the kernel, and (optionally) trigger a clean reboot at expiration. This functionality may appear to be very close to the softdog watchdog implementation. It is. But can't be the softdog for several reasons: - The soft watchdog should be available while this functionality is active - The duration range is different between this and the softdog. The timeout available here is potentially quite a bit longer. - At expiration, there are different expiration requirements and actions. - This functionality is specific to a particular use case and should not impact mainline functionality To cleanly restart the kernel after one minute of uptime, the following config items would be required: CONFIG_UPTIME_LIMITED_KERNEL=y CONFIG_UPTIME_LIMIT_DURATION=1 CONFIG_UPTIME_LIMIT_KERNEL_REBOOT=y Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
2019-07-22check console device file on fs when bootingBruce Ashfield
If a root filesystem is generated as non-root, one of the tell tale signs is /dev/console not being a character file. To save a whole class of questions, let's just test for the condition and let the user know. Signed-off-by: Richard Laroque <rlarocqu@windriver.com> Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
2019-07-22mount_root: clarify error messages for when no rootfs foundPaul Gortmaker
To an end user who doesn't really know linux that well, a message like: Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) may just look like cryptic computer speak indicating some deep and complex problem, instead of the reality that they have a simple local configuration problem. Ideally it would be nice to not use the misleading "panic" at all, but since various panic notifiers are historically expecting to be called when there is no valid rootfs, we can't change that. So instead, this tries to make it 100% clear to folks of any background that it is an end user configuration issue. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2019-06-29initramfs: fix populate_initrd_image() section mismatchGeert Uytterhoeven
With gcc-4.6.3: WARNING: vmlinux.o(.text.unlikely+0x140): Section mismatch in reference from the function populate_initrd_image() to the variable .init.ramfs.info:__initramfs_size The function populate_initrd_image() references the variable __init __initramfs_size. This is often because populate_initrd_image lacks a __init annotation or the annotation of __initramfs_size is wrong. WARNING: vmlinux.o(.text.unlikely+0x14c): Section mismatch in reference from the function populate_initrd_image() to the function .init.text:unpack_to_rootfs() The function populate_initrd_image() references the function __init unpack_to_rootfs(). This is often because populate_initrd_image lacks a __init annotation or the annotation of unpack_to_rootfs is wrong. WARNING: vmlinux.o(.text.unlikely+0x198): Section mismatch in reference from the function populate_initrd_image() to the function .init.text:xwrite() The function populate_initrd_image() references the function __init xwrite(). This is often because populate_initrd_image lacks a __init annotation or the annotation of xwrite is wrong. Indeed, if the compiler decides not to inline populate_initrd_image(), a warning is generated. Fix this by adding the missing __init annotations. Link: http://lkml.kernel.org/r/20190617074340.12779-1-geert@linux-m68k.org Fixes: 7c184ecd262fe64f ("initramfs: factor out a helper to populate the initrd image") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-08Merge tag 'char-misc-5.2-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some small char and misc driver fixes for 5.2-rc4 to resolve a number of reported issues. The most "notable" one here is the kernel headers in proc^Wsysfs fixes. Those changes move the header file info into sysfs and fixes the build issues that you reported. Other than that, a bunch of small habanalabs driver fixes, some fpga driver fixes, and a few other tiny driver fixes. All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: habanalabs: Read upper bits of trace buffer from RWPHI habanalabs: Fix virtual address access via debugfs for 2MB pages fpga: zynqmp-fpga: Correctly handle error pointer habanalabs: fix bug in checking huge page optimization habanalabs: Avoid using a non-initialized MMU cache mutex habanalabs: fix debugfs code uapi/habanalabs: add opcode for enable/disable device debug mode habanalabs: halt debug engines on user process close test_firmware: Use correct snprintf() limit genwqe: Prevent an integer overflow in the ioctl parport: Fix mem leak in parport_register_dev_model fpga: dfl: expand minor range when registering chrdev region fpga: dfl: Add lockdep classes for pdata->lock fpga: dfl: afu: Pass the correct device to dma_mapping_error() fpga: stratix10-soc: fix use-after-free on s10_init() w1: ds2408: Fix typo after 49695ac46861 (reset on output_write retry with readback) kheaders: Do not regenerate archive if config is not changed kheaders: Move from proc to sysfs lkdtm/bugs: Adjust recursion test to avoid elision lkdtm/usercopy: Moves the KERNEL_DS test to non-canonical
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 167Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation version 2 of the license this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 59 temple place suite 330 boston ma 02111 1307 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 83 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070034.021731668@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-24kheaders: Move from proc to sysfsJoel Fernandes (Google)
The kheaders archive consisting of the kernel headers used for compiling bpf programs is in /proc. However there is concern that moving it here will make it permanent. Let us move it to /sys/kernel as discussed [1]. [1] https://lore.kernel.org/patchwork/patch/1067310/#1265969 Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21treewide: Add SPDX license identifier - Makefile/KconfigThomas Gleixner
Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21treewide: Add SPDX license identifier for missed filesThomas Gleixner
Add SPDX license identifiers to all files which: - Have no license information of any form - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-18initramfs: don't free a non-existent initrdSteven Price
Since commit 54c7a8916a88 ("initramfs: free initrd memory if opening /initrd.image fails"), the kernel has unconditionally attempted to free the initrd even if it doesn't exist. In the non-existent case this causes a boot-time splat if CONFIG_DEBUG_VIRTUAL is enabled due to a call to virt_to_phys() with a NULL address. Instead we should check that the initrd actually exists and only attempt to free it if it does. Link: http://lkml.kernel.org/r/20190516143125.48948-1-steven.price@arm.com Fixes: 54c7a8916a88 ("initramfs: free initrd memory if opening /initrd.image fails") Signed-off-by: Steven Price <steven.price@arm.com> Reported-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14mm: shuffle initial free memory to improve memory-side-cache utilizationDan Williams
Patch series "mm: Randomize free memory", v10. This patch (of 3): Randomization of the page allocator improves the average utilization of a direct-mapped memory-side-cache. Memory side caching is a platform capability that Linux has been previously exposed to in HPC (high-performance computing) environments on specialty platforms. In that instance it was a smaller pool of high-bandwidth-memory relative to higher-capacity / lower-bandwidth DRAM. Now, this capability is going to be found on general purpose server platforms where DRAM is a cache in front of higher latency persistent memory [1]. Robert offered an explanation of the state of the art of Linux interactions with memory-side-caches [2], and I copy it here: It's been a problem in the HPC space: http://www.nersc.gov/research-and-development/knl-cache-mode-performance-coe/ A kernel module called zonesort is available to try to help: https://software.intel.com/en-us/articles/xeon-phi-software and this abandoned patch series proposed that for the kernel: https://lkml.kernel.org/r/20170823100205.17311-1-lukasz.daniluk@intel.com Dan's patch series doesn't attempt to ensure buffers won't conflict, but also reduces the chance that the buffers will. This will make performance more consistent, albeit slower than "optimal" (which is near impossible to attain in a general-purpose kernel). That's better than forcing users to deploy remedies like: "To eliminate this gradual degradation, we have added a Stream measurement to the Node Health Check that follows each job; nodes are rebooted whenever their measured memory bandwidth falls below 300 GB/s." A replacement for zonesort was merged upstream in commit cc9aec03e58f ("x86/numa_emulation: Introduce uniform split capability"). With this numa_emulation capability, memory can be split into cache sized ("near-memory" sized) numa nodes. A bind operation to such a node, and disabling workloads on other nodes, enables full cache performance. However, once the workload exceeds the cache size then cache conflicts are unavoidable. While HPC environments might be able to tolerate time-scheduling of cache sized workloads, for general purpose server platforms, the oversubscribed cache case will be the common case. The worst case scenario is that a server system owner benchmarks a workload at boot with an un-contended cache only to see that performance degrade over time, even below the average cache performance due to excessive conflicts. Randomization clips the peaks and fills in the valleys of cache utilization to yield steady average performance. Here are some performance impact details of the patches: 1/ An Intel internal synthetic memory bandwidth measurement tool, saw a 3X speedup in a contrived case that tries to force cache conflicts. The contrived cased used the numa_emulation capability to force an instance of the benchmark to be run in two of the near-memory sized numa nodes. If both instances were placed on the same emulated they would fit and cause zero conflicts. While on separate emulated nodes without randomization they underutilized the cache and conflicted unnecessarily due to the in-order allocation per node. 2/ A well known Java server application benchmark was run with a heap size that exceeded cache size by 3X. The cache conflict rate was 8% for the first run and degraded to 21% after page allocator aging. With randomization enabled the rate levelled out at 11%. 3/ A MongoDB workload did not observe measurable difference in cache-conflict rates, but the overall throughput dropped by 7% with randomization in one case. 4/ Mel Gorman ran his suite of performance workloads with randomization enabled on platforms without a memory-side-cache and saw a mix of some improvements and some losses [3]. While there is potentially significant improvement for applications that depend on low latency access across a wide working-set, the performance may be negligible to negative for other workloads. For this reason the shuffle capability defaults to off unless a direct-mapped memory-side-cache is detected. Even then, the page_alloc.shuffle=0 parameter can be specified to disable the randomization on those systems. Outside of memory-side-cache utilization concerns there is potentially security benefit from randomization. Some data exfiltration and return-oriented-programming attacks rely on the ability to infer the location of sensitive data objects. The kernel page allocator, especially early in system boot, has predictable first-in-first out behavior for physical pages. Pages are freed in physical address order when first onlined. Quoting Kees: "While we already have a base-address randomization (CONFIG_RANDOMIZE_MEMORY), attacks against the same hardware and memory layouts would certainly be using the predictability of allocation ordering (i.e. for attacks where the base address isn't important: only the relative positions between allocated memory). This is common in lots of heap-style attacks. They try to gain control over ordering by spraying allocations, etc. I'd really like to see this because it gives us something similar to CONFIG_SLAB_FREELIST_RANDOM but for the page allocator." While SLAB_FREELIST_RANDOM reduces the predictability of some local slab caches it leaves vast bulk of memory to be predictably in order allocated. However, it should be noted, the concrete security benefits are hard to quantify, and no known CVE is mitigated by this randomization. Introduce shuffle_free_memory(), and its helper shuffle_zone(), to perform a Fisher-Yates shuffle of the page allocator 'free_area' lists when they are initially populated with free memory at boot and at hotplug time. Do this based on either the presence of a page_alloc.shuffle=Y command line parameter, or autodetection of a memory-side-cache (to be added in a follow-on patch). The shuffling is done in terms of CONFIG_SHUFFLE_PAGE_ORDER sized free pages where the default CONFIG_SHUFFLE_PAGE_ORDER is MAX_ORDER-1 i.e. 10, 4MB this trades off randomization granularity for time spent shuffling. MAX_ORDER-1 was chosen to be minimally invasive to the page allocator while still showing memory-side cache behavior improvements, and the expectation that the security implications of finer granularity randomization is mitigated by CONFIG_SLAB_FREELIST_RANDOM. The performance impact of the shuffling appears to be in the noise compared to other memory initialization work. This initial randomization can be undone over time so a follow-on patch is introduced to inject entropy on page free decisions. It is reasonable to ask if the page free entropy is sufficient, but it is not enough due to the in-order initial freeing of pages. At the start of that process putting page1 in front or behind page0 still keeps them close together, page2 is still near page1 and has a high chance of being adjacent. As more pages are added ordering diversity improves, but there is still high page locality for the low address pages and this leads to no significant impact to the cache conflict rate. [1]: https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operating-modes/ [2]: https://lkml.kernel.org/r/AT5PR8401MB1169D656C8B5E121752FC0F8AB120@AT5PR8401MB1169.NAMPRD84.PROD.OUTLOOK.COM [3]: https://lkml.org/lkml/2018/10/12/309 [dan.j.williams@intel.com: fix shuffle enable] Link: http://lkml.kernel.org/r/154943713038.3858443.4125180191382062871.stgit@dwillia2-desk3.amr.corp.intel.com [cai@lca.pw: fix SHUFFLE_PAGE_ALLOCATOR help texts] Link: http://lkml.kernel.org/r/20190425201300.75650-1-cai@lca.pw Link: http://lkml.kernel.org/r/154899811738.3165233.12325692939590944259.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Robert Elliott <elliott@hpe.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14init: free_initmem: poison freed init memoryMike Rapoport
Various architectures including x86 poison the freed init memory. Do the same in the generic free_initmem implementation and switch sparc32 architecture that is identical to the generic code over to it now. Link: http://lkml.kernel.org/r/1550515285-17446-4-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Richard Kuo <rkuo@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14init: provide a generic free_initmem implementationMike Rapoport
Patch series "provide a generic free_initmem implementation", v2. Many architectures implement free_initmem() in exactly the same or very similar way: they wrap the call to free_initmem_default() with sometimes different 'poison' parameter. These patches switch those architectures to use a generic implementation that does free_initmem_default(POISON_FREE_INITMEM). This was inspired by Christoph's patches for free_initrd_mem [1] and I shamelessly copied changelog entries from his patches :) [1] https://lore.kernel.org/lkml/20190213174621.29297-1-hch@lst.de/ This patch (of 2): For most architectures free_initmem just a wrapper for the same free_initmem_default(-1) call. Provide that as a generic implementation marked __weak. Link: http://lkml.kernel.org/r/1550515285-17446-2-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Richard Kuo <rkuo@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14initramfs: poison freed initrd memoryChristoph Hellwig
Various architectures including x86 poison the freed initrd memory. Do the same in the generic free_initrd_mem implementation and switch a few more architectures that are identical to the generic code over to it now. Link: http://lkml.kernel.org/r/20190213174621.29297-9-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> [arm64] Cc: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Cc: Steven Price <steven.price@arm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14initramfs: provide a generic free_initrd_mem implementationChristoph Hellwig
For most architectures free_initrd_mem just expands to the same free_reserved_area call. Provide that as a generic implementation marked __weak. Link: http://lkml.kernel.org/r/20190213174621.29297-8-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> [arm64] Cc: Steven Price <steven.price@arm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14initramfs: move the legacy keepinitrd parameter to core codeChristoph Hellwig
No need to handle the freeing disable in arch code when we already have a core hook (and a different name for the option) for it. Link: http://lkml.kernel.org/r/20190213174621.29297-7-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Cc: Steven Price <steven.price@arm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14initramfs: cleanup populate_rootfsChristoph Hellwig
The code for kernels that support ramdisks or not is mostly the same. Unify it by using an IS_ENABLED for the info message, and moving the error message into a stub for populate_initrd_image. [cai@lca.pw: fix a compilation error] Link: http://lkml.kernel.org/r/20190328014806.36375-1-cai@lca.pw Link: http://lkml.kernel.org/r/20190213174621.29297-6-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> [arm64] Cc: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Cc: Steven Price <steven.price@arm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14initramfs: factor out a helper to populate the initrd imageChristoph Hellwig
This will allow for cleaner code sharing in the caller. Link: http://lkml.kernel.org/r/20190213174621.29297-5-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> [arm64] Cc: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Cc: Steven Price <steven.price@arm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14initramfs: cleanup initrd freeingChristoph Hellwig
Factor the kexec logic into a separate helper, and then inline the rest of free_initrd into the only caller. Link: http://lkml.kernel.org/r/20190213174621.29297-4-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> [arm64] Cc: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Cc: Steven Price <steven.price@arm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14initramfs: free initrd memory if opening /initrd.image failsChristoph Hellwig
Patch series "initramfs tidyups". I've spent some time chasing down behavior in initramfs and found plenty of opportunity to improve the code. A first stab on that is contained in this series. This patch (of 7): We free the initrd memory for all successful or error cases except for the case where opening /initrd.image fails, which looks like an oversight. Steven said: : This also changes the behaviour when CONFIG_INITRAMFS_FORCE is enabled : - specifically it means that the initrd is freed (previously it was : ignored and never freed). But that seems like reasonable behaviour and : the previous behaviour looks like another oversight. Link: http://lkml.kernel.org/r/20190213174621.29297-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Steven Price <steven.price@arm.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> [arm64] Cc: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: Guan Xuetao <gxt@pku.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-07Merge tag 'random_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random Pull randomness updates from Ted Ts'o: - initialize the random driver earler - fix CRNG initialization when we trust the CPU's RNG on NUMA systems - other miscellaneous cleanups and fixes. * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: random: add a spinlock_t to struct batched_entropy random: document get_random_int() family random: fix CRNG initialization when random.trust_cpu=1 random: move rand_initialize() earlier random: only read from /dev/random after its pool has received 128 bits drivers/char/random.c: make primary_crng static drivers/char/random.c: remove unused stuct poolinfo::poolbits drivers/char/random.c: constify poolinfo_table
2019-05-07Merge tag 'driver-core-5.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core/kobject updates from Greg KH: "Here is the "big" set of driver core patches for 5.2-rc1 There are a number of ACPI patches in here as well, as Rafael said they should go through this tree due to the driver core changes they required. They have all been acked by the ACPI developers. There are also a number of small subsystem-specific changes in here, due to some changes to the kobject core code. Those too have all been acked by the various subsystem maintainers. As for content, it's pretty boring outside of the ACPI changes: - spdx cleanups - kobject documentation updates - default attribute groups for kobjects - other minor kobject/driver core fixes All have been in linux-next for a while with no reported issues" * tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (47 commits) kobject: clean up the kobject add documentation a bit more kobject: Fix kernel-doc comment first line kobject: Remove docstring reference to kset firmware_loader: Fix a typo ("syfs" -> "sysfs") kobject: fix dereference before null check on kobj Revert "driver core: platform: Fix the usage of platform device name(pdev->name)" init/config: Do not select BUILD_BIN2C for IKCONFIG Provide in-kernel headers to make extending kernel easier kobject: Improve doc clarity kobject_init_and_add() kobject: Improve docs for kobject_add/del driver core: platform: Fix the usage of platform device name(pdev->name) livepatch: Replace klp_ktype_patch's default_attrs with groups cpufreq: schedutil: Replace default_attrs field with groups padata: Replace padata_attr_type default_attrs field with groups irqdesc: Replace irq_kobj_type's default_attrs field with groups net-sysfs: Replace ktype default_attrs field with groups block: Replace all ktype default_attrs with groups samples/kobject: Replace foo_ktype's default_attrs field with groups kobject: Add support for default attribute groups to kobj_type driver core: Postpone DMA tear-down until after devres release for probe failure ...
2019-05-07Merge tag 'pidfd-v5.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull pidfd updates from Christian Brauner: "This patchset makes it possible to retrieve pidfds at process creation time by introducing the new flag CLONE_PIDFD to the clone() system call. Linus originally suggested to implement this as a new flag to clone() instead of making it a separate system call. After a thorough review from Oleg CLONE_PIDFD returns pidfds in the parent_tidptr argument. This means we can give back the associated pid and the pidfd at the same time. Access to process metadata information thus becomes rather trivial. As has been agreed, CLONE_PIDFD creates file descriptors based on anonymous inodes similar to the new mount api. They are made unconditional by this patchset as they are now needed by core kernel code (vfs, pidfd) even more than they already were before (timerfd, signalfd, io_uring, epoll etc.). The core patchset is rather small. The bulky looking changelist is caused by David's very simple changes to Kconfig to make anon inodes unconditional. A pidfd comes with additional information in fdinfo if the kernel supports procfs. The fdinfo file contains the pid of the process in the callers pid namespace in the same format as the procfs status file, i.e. "Pid:\t%d". To remove worries about missing metadata access this patchset comes with a sample/test program that illustrates how a combination of CLONE_PIDFD and pidfd_send_signal() can be used to gain race-free access to process metadata through /proc/<pid>. Further work based on this patchset has been done by Joel. His work makes pidfds pollable. It finished too late for this merge window. I would prefer to have it sitting in linux-next for a while and send it for inclusion during the 5.3 merge window" * tag 'pidfd-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: samples: show race-free pidfd metadata access signal: support CLONE_PIDFD with pidfd_send_signal clone: add CLONE_PIDFD Make anon_inodes unconditional
2019-05-07Merge tag 'printk-for-5.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - Allow state reset of printk_once() calls. - Prevent crashes when dereferencing invalid pointers in vsprintf(). Only the first byte is checked for simplicity. - Make vsprintf warnings consistent and inlined. - Treewide conversion of obsolete %pf, %pF to %ps, %pF printf modifiers. - Some clean up of vsprintf and test_printf code. * tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: lib/vsprintf: Make function pointer_string static vsprintf: Limit the length of inlined error messages vsprintf: Avoid confusion between invalid address and value vsprintf: Prevent crash when dereferencing invalid pointers vsprintf: Consolidate handling of unknown pointer specifiers vsprintf: Factor out %pO handler as kobject_string() vsprintf: Factor out %pV handler as va_format() vsprintf: Factor out %p[iI] handler as ip_addr_string() vsprintf: Do not check address of well-known strings vsprintf: Consistent %pK handling for kptr_restrict == 0 vsprintf: Shuffle restricted_pointer() printk: Tie printk_once / printk_deferred_once into .data.once for reset treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively lib/test_printf: Switch to bitmap_zalloc()
2019-05-05x86/mm: Initialize PGD cache during mm initializationNadav Amit
Poking-mm initialization might require to duplicate the PGD in early stage. Initialize the PGD cache earlier to prevent boot failures. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Nadav Amit <namit@vmware.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Rik van Riel <riel@surriel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 4fc19708b165 ("x86/alternatives: Initialize temporary mm for patching") Link: http://lkml.kernel.org/r/20190505011124.39692-1-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-30x86/alternatives: Initialize temporary mm for patchingNadav Amit
To prevent improper use of the PTEs that are used for text patching, the next patches will use a temporary mm struct. Initailize it by copying the init mm. The address that will be used for patching is taken from the lower area that is usually used for the task memory. Doing so prevents the need to frequently synchronize the temporary-mm (e.g., when BPF programs are installed), since different PGDs are used for the task memory. Finally, randomize the address of the PTEs to harden against exploits that use these PTEs. Suggested-by: Andy Lutomirski <luto@kernel.org> Tested-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: ard.biesheuvel@linaro.org Cc: deneen.t.dock@intel.com Cc: kernel-hardening@lists.openwall.com Cc: kristen@linux.intel.com Cc: linux_dti@icloud.com Cc: will.deacon@arm.com Link: https://lkml.kernel.org/r/20190426232303.28381-8-nadav.amit@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-29init/config: Do not select BUILD_BIN2C for IKCONFIGJoel Fernandes (Google)
Since commit 13610aa908dc ("kernel/configs: use .incbin directive to embed config_data.gz"), IKCONFIG no longer uses BUILD_BIN2C so prevent it from being selected in Kconfig. Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-29Provide in-kernel headers to make extending kernel easierJoel Fernandes (Google)
Introduce in-kernel headers which are made available as an archive through proc (/proc/kheaders.tar.xz file). This archive makes it possible to run eBPF and other tracing programs that need to extend the kernel for tracing purposes without any dependency on the file system having headers. A github PR is sent for the corresponding BCC patch at: https://github.com/iovisor/bcc/pull/2312 On Android and embedded systems, it is common to switch kernels but not have kernel headers available on the file system. Further once a different kernel is booted, any headers stored on the file system will no longer be useful. This is an issue even well known to distros. By storing the headers as a compressed archive within the kernel, we can avoid these issues that have been a hindrance for a long time. The best way to use this feature is by building it in. Several users have a need for this, when they switch debug kernels, they do not want to update the filesystem or worry about it where to store the headers on it. However, the feature is also buildable as a module in case the user desires it not being part of the kernel image. This makes it possible to load and unload the headers from memory on demand. A tracing program can load the module, do its operations, and then unload the module to save kernel memory. The total memory needed is 3.3MB. By having the archive available at a fixed location independent of filesystem dependencies and conventions, all debugging tools can directly refer to the fixed location for the archive, without concerning with where the headers on a typical filesystem which significantly simplifies tooling that needs kernel headers. The code to read the headers is based on /proc/config.gz code and uses the same technique to embed the headers. Other approaches were discussed such as having an in-memory mountable filesystem, but that has drawbacks such as requiring an in-kernel xz decompressor which we don't have today, and requiring usage of 42 MB of kernel memory to host the decompressed headers at anytime. Also this approach is simpler than such approaches. Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-19random: move rand_initialize() earlierKees Cook
Right now rand_initialize() is run as an early_initcall(), but it only depends on timekeeping_init() (for mixing ktime_get_real() into the pools). However, the call to boot_init_stack_canary() for stack canary initialization runs earlier, which triggers a warning at boot: random: get_random_bytes called from start_kernel+0x357/0x548 with crng_init=0 Instead, this moves rand_initialize() to after timekeeping_init(), and moves canary initialization here as well. Note that this warning may still remain for machines that do not have UEFI RNG support (which initializes the RNG pools during setup_arch()), or for x86 machines without RDRAND (or booting without "random.trust=on" or CONFIG_RANDOM_TRUST_CPU=y). Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-04-19init: initialize jump labels before command line option parsingDan Williams
When a module option, or core kernel argument, toggles a static-key it requires jump labels to be initialized early. While x86, PowerPC, and ARM64 arrange for jump_label_init() to be called before parse_args(), ARM does not. Kernel command line: rdinit=/sbin/init page_alloc.shuffle=1 panic=-1 console=ttyAMA0,115200 page_alloc.shuffle=1 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at ./include/linux/jump_label.h:303 page_alloc_shuffle+0x12c/0x1ac static_key_enable(): static key 'page_alloc_shuffle_key+0x0/0x4' used before call to jump_label_init() Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 5.1.0-rc4-next-20190410-00003-g3367c36ce744 #1 Hardware name: ARM Integrator/CP (Device Tree) [<c0011c68>] (unwind_backtrace) from [<c000ec48>] (show_stack+0x10/0x18) [<c000ec48>] (show_stack) from [<c07e9710>] (dump_stack+0x18/0x24) [<c07e9710>] (dump_stack) from [<c001bb1c>] (__warn+0xe0/0x108) [<c001bb1c>] (__warn) from [<c001bb88>] (warn_slowpath_fmt+0x44/0x6c) [<c001bb88>] (warn_slowpath_fmt) from [<c0b0c4a8>] (page_alloc_shuffle+0x12c/0x1ac) [<c0b0c4a8>] (page_alloc_shuffle) from [<c0b0c550>] (shuffle_store+0x28/0x48) [<c0b0c550>] (shuffle_store) from [<c003e6a0>] (parse_args+0x1f4/0x350) [<c003e6a0>] (parse_args) from [<c0ac3c00>] (start_kernel+0x1c0/0x488) Move the fallback call to jump_label_init() to occur before parse_args(). The redundant calls to jump_label_init() in other archs are left intact in case they have static key toggling use cases that are even earlier than option parsing. Link: http://lkml.kernel.org/r/155544804466.1032396.13418949511615676665.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Guenter Roeck <groeck@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Russell King <rmk@armlinux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-19Make anon_inodes unconditionalDavid Howells
Make the anon_inodes facility unconditional so that it can be used by core VFS code and pidfd code. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [christian@brauner.io: adapt commit message to mention pidfds] Signed-off-by: Christian Brauner <christian@brauner.io>
2019-04-09treewide: Switch printk users from %pf and %pF to %ps and %pS, respectivelySakari Ailus
%pF and %pf are functionally equivalent to %pS and %ps conversion specifiers. The former are deprecated, therefore switch the current users to use the preferred variant. The changes have been produced by the following command: git grep -l '%p[fF]' | grep -v '^\(tools\|Documentation\)/' | \ while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done And verifying the result. Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Cc: sparclinux@vger.kernel.org Cc: linux-um@lists.infradead.org Cc: xen-devel@lists.xenproject.org Cc: linux-acpi@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: drbd-dev@lists.linbit.com Cc: linux-block@vger.kernel.org Cc: linux-mmc@vger.kernel.org Cc: linux-nvdimm@lists.01.org Cc: linux-pci@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: linux-btrfs@vger.kernel.org Cc: linux-f2fs-devel@lists.sourceforge.net Cc: linux-mm@kvack.org Cc: ceph-devel@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Acked-by: David Sterba <dsterba@suse.com> (for btrfs) Acked-by: Mike Rapoport <rppt@linux.ibm.com> (for mm/memblock.c) Acked-by: Bjorn Helgaas <bhelgaas@google.com> (for drivers/pci) Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
2019-03-12init/main: add checks for the return value of memblock_alloc*()Mike Rapoport
Add panic() calls if memblock_alloc() returns NULL. The panic() format duplicates the one used by memblock itself and in order to avoid explosion with long parameters list replace open coded allocation size calculations with a local variable. Link: http://lkml.kernel.org/r/1548057848-15136-18-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Christoph Hellwig <hch@lst.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dennis Zhou <dennis@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Guo Ren <ren_guo@c-sky.com> [c-sky] Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Juergen Gross <jgross@suse.com> [Xen] Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Burton <paul.burton@mips.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-10Merge tag 'kbuild-v5.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - do not generate unneeded top-level built-in.a - let git ignore O= directory entirely - optimize scripts/kallsyms slightly - exclude DWARF info from *.s regardless of config options - fix GCC toolchain search path for Clang to prepare ld.lld support - do not generate modules.order when CONFIG_MODULES is disabled - simplify single target rules and remove VPATH for external module build - allow to add optional flags to dpkg-buildpackage when building deb-pkg - move some compiler option tests from Makefile to Kconfig - various Makefile cleanups * tag 'kbuild-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (40 commits) kbuild: remove scripts/basic/% build target kbuild: use -Werror=implicit-... instead of -Werror-implicit-... kbuild: clean up scripts/gcc-version.sh kbuild: remove cc-version macro kbuild: update comment block of scripts/clang-version.sh kbuild: remove commented-out INITRD_COMPRESS kbuild: move -gsplit-dwarf, -gdwarf-4 option tests to Kconfig kbuild: [bin]deb-pkg: add DPKG_FLAGS variable kbuild: move ".config not found!" message from Kconfig to Makefile kbuild: invoke syncconfig if include/config/auto.conf.cmd is missing kbuild: simplify single target rules kbuild: remove empty rules for makefiles kbuild: make -r/-R effective in top Makefile for old Make versions kbuild: move tools_silent to a more relevant place kbuild: compute false-positive -Wmaybe-uninitialized cases in Kconfig kbuild: refactor cc-cross-prefix implementation kbuild: hardcode genksyms path and remove GENKSYMS variable scripts/gdb: refactor rules for symlink creation kbuild: create symlink to vmlinux-gdb.py in scripts_gdb target scripts/gdb: do not descend into scripts/gdb from scripts ...
2019-03-10Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "A single fix to prevent a unmet dependencies warning in Kconfig" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: time: Make VIRT_CPU_ACCOUNTING_GEN depend on GENERIC_CLOCKEVENTS