aboutsummaryrefslogtreecommitdiffstats
path: root/arch/arm64/mm/mmu.c
AgeCommit message (Collapse)Author
2022-04-08arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zonesVijay Balakrishna
commit 031495635b4668f94e964e037ca93d0d38bfde58 upstream. The following patches resulted in deferring crash kernel reservation to mem_init(), mainly aimed at platforms with DMA memory zones (no IOMMU), in particular Raspberry Pi 4. commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32") commit 8424ecdde7df ("arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges") commit 0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()") commit 2687275a5843 ("arm64: Force NO_BLOCK_MAPPINGS if crashkernel reservation is required") Above changes introduced boot slowdown due to linear map creation for all the memory banks with NO_BLOCK_MAPPINGS, see discussion[1]. The proposed changes restore crash kernel reservation to earlier behavior thus avoids slow boot, particularly for platforms with IOMMU (no DMA memory zones). Tested changes to confirm no ~150ms boot slowdown on our SoC with IOMMU and 8GB memory. Also tested with ZONE_DMA and/or ZONE_DMA32 configs to confirm no regression to deferring scheme of crash kernel memory reservation. In both cases successfully collected kernel crash dump. [1] https://lore.kernel.org/all/9436d033-579b-55fa-9b00-6f4b661c2dd7@linux.microsoft.com/ Signed-off-by: Vijay Balakrishna <vijayb@linux.microsoft.com> Cc: stable@vger.kernel.org Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://lore.kernel.org/r/1646242689-20744-1-git-send-email-vijayb@linux.microsoft.com [will: Add #ifdef CONFIG_KEXEC_CORE guards to fix 'crashk_res' references in allnoconfig build] Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-08arm64/mm: avoid fixmap race condition when create pud mappingJianyong Wu
[ Upstream commit ee017ee353506fcec58e481673e4331ff198a80e ] The 'fixmap' is a global resource and is used recursively by create pud mapping(), leading to a potential race condition in the presence of a concurrent call to alloc_init_pud(): kernel_init thread virtio-mem workqueue thread ================== =========================== alloc_init_pud(...) alloc_init_pud(...) pudp = pud_set_fixmap_offset(...) pudp = pud_set_fixmap_offset(...) READ_ONCE(*pudp) pud_clear_fixmap(...) READ_ONCE(*pudp) // CRASH! As kernel may sleep during creating pud mapping, introduce a mutex lock to serialise use of the fixmap entries by alloc_init_pud(). However, there is no need for locking in early boot stage and it doesn't work well with KASLR enabled when early boot. So, enable lock when system_state doesn't equal to "SYSTEM_BOOTING". Signed-off-by: Jianyong Wu <jianyong.wu@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Fixes: f4710445458c ("arm64: mm: use fixmap when creating page tables") Link: https://lore.kernel.org/r/20220201114400.56885-1-jianyong.wu@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-11arm64: entry: Allow the trampoline text to occupy multiple pagesJames Morse
commit a9c406e6462ff14956d690de7bbe5131a5677dc9 upstream. Adding a second set of vectors to .entry.tramp.text will make it larger than a single 4K page. Allow the trampoline text to occupy up to three pages by adding two more fixmap slots. Previous changes to tramp_valias allowed it to reach beyond a single page. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-18arm64: mm: update max_pfn after memory hotplugSudarshan Rajagopalan
[ Upstream commit 8fac67ca236b961b573355e203dbaf62a706a2e5 ] After new memory blocks have been hotplugged, max_pfn and max_low_pfn needs updating to reflect on new PFNs being hot added to system. Without this patch, debug-related functions that use max_pfn such as get_max_dump_pfn() or read_page_owner() will not work with any page in memory that is hot-added after boot. Fixes: 4ab215061554 ("arm64: Add memory hotplug support") Signed-off-by: Sudarshan Rajagopalan <quic_sudaraja@quicinc.com> Signed-off-by: Chris Goldsworthy <quic_cgoldswo@quicinc.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Georgi Djakov <quic_c_gdjako@quicinc.com> Tested-by: Georgi Djakov <quic_c_gdjako@quicinc.com> Link: https://lore.kernel.org/r/a51a27ee7be66024b5ce626310d673f24107bcb8.1632853776.git.quic_cgoldswo@quicinc.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-30arm64: Force NO_BLOCK_MAPPINGS if crashkernel reservation is requiredCatalin Marinas
commit 2687275a5843d1089687f08fc64eb3f3b026a169 upstream. mem_init() currently relies on knowing the boundaries of the crashkernel reservation to map such region with page granularity for later unmapping via set_memory_valid(..., 0). If the crashkernel reservation is deferred, such boundaries are not known when the linear mapping is created. Simply parse the command line for "crashkernel" and, if found, create the linear map with NO_BLOCK_MAPPINGS. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Reviewed-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Acked-by: James Morse <james.morse@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Link: https://lore.kernel.org/r/20201119175556.18681-1-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-04-07arm64: mm: correct the inside linear map range during hotplug checkPavel Tatashin
[ Upstream commit ee7febce051945be28ad86d16a15886f878204de ] Memory hotplug may fail on systems with CONFIG_RANDOMIZE_BASE because the linear map range is not checked correctly. The start physical address that linear map covers can be actually at the end of the range because of randomization. Check that and if so reduce it to 0. This can be verified on QEMU with setting kaslr-seed to ~0ul: memstart_offset_seed = 0xffff START: __pa(_PAGE_OFFSET(vabits_actual)) = ffff9000c0000000 END: __pa(PAGE_END - 1) = 1000bfffffff Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> Fixes: 58284a901b42 ("arm64/mm: Validate hotplug range before creating linear mapping") Tested-by: Tyler Hicks <tyhicks@linux.microsoft.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20210216150351.129018-2-pasha.tatashin@soleen.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-17arm64: mm: use a 48-bit ID map when possible on 52-bit VA buildsArd Biesheuvel
[ Upstream commit 7ba8f2b2d652cd8d8a2ab61f4be66973e70f9f88 ] 52-bit VA kernels can run on hardware that is only 48-bit capable, but configure the ID map as 52-bit by default. This was not a problem until recently, because the special T0SZ value for a 52-bit VA space was never programmed into the TCR register anwyay, and because a 52-bit ID map happens to use the same number of translation levels as a 48-bit one. This behavior was changed by commit 1401bef703a4 ("arm64: mm: Always update TCR_EL1 from __cpu_set_tcr_t0sz()"), which causes the unsupported T0SZ value for a 52-bit VA to be programmed into TCR_EL1. While some hardware simply ignores this, Mark reports that Amberwing systems choke on this, resulting in a broken boot. But even before that commit, the unsupported idmap_t0sz value was exposed to KVM and used to program TCR_EL2 incorrectly as well. Given that we already have to deal with address spaces being either 48-bit or 52-bit in size, the cleanest approach seems to be to simply default to a 48-bit VA ID map, and only switch to a 52-bit one if the placement of the kernel in DRAM requires it. This is guaranteed not to happen unless the system is actually 52-bit VA capable. Fixes: 90ec95cda91a ("arm64: mm: Introduce VA_BITS_MIN") Reported-by: Mark Salter <msalter@redhat.com> Link: http://lore.kernel.org/r/20210310003216.410037-1-msalter@redhat.com Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20210310171515.416643-2-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-17arm64: mte: Map hotplugged memory as Normal TaggedCatalin Marinas
commit d15dfd31384ba3cb93150e5f87661a76fa419f74 upstream. In a system supporting MTE, the linear map must allow reading/writing allocation tags by setting the memory type as Normal Tagged. Currently, this is only handled for memory present at boot. Hotplugged memory uses Normal non-Tagged memory. Introduce pgprot_mhp() for hotplugged memory and use it in add_memory_resource(). The arm64 code maps pgprot_mhp() to pgprot_tagged(). Note that ZONE_DEVICE memory should not be mapped as Tagged and therefore setting the memory type in arch_add_memory() is not feasible. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Fixes: 0178dc761368 ("arm64: mte: Use Normal Tagged attributes for the linear map") Reported-by: Patrick Daly <pdaly@codeaurora.org> Tested-by: Patrick Daly <pdaly@codeaurora.org> Link: https://lore.kernel.org/r/1614745263-27827-1-git-send-email-pdaly@codeaurora.org Cc: <stable@vger.kernel.org> # 5.10.x Cc: Will Deacon <will@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20210309122601.5543-1-catalin.marinas@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-13arm64/mm: Validate hotplug range before creating linear mappingAnshuman Khandual
During memory hotplug process, the linear mapping should not be created for a given memory range if that would fall outside the maximum allowed linear range. Else it might cause memory corruption in the kernel virtual space. Maximum linear mapping region is [PAGE_OFFSET..(PAGE_END -1)] accommodating both its ends but excluding PAGE_END. Max physical range that can be mapped inside this linear mapping range, must also be derived from its end points. This ensures that arch_add_memory() validates memory hot add range for its potential linear mapping requirements, before creating it with __create_pgd_mapping(). Fixes: 4ab215061554 ("arm64: Add memory hotplug support") Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/1605252614-761-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-10-25treewide: Convert macro and uses of __section(foo) to __section("foo")Joe Perches
Use a more generic form for __section that requires quotes to avoid complications with clang and gcc differences. Remove the quote operator # from compiler_attributes.h __section macro. Convert all unquoted __section(foo) uses to quoted __section("foo"). Also convert __attribute__((section("foo"))) uses to __section("foo") even if the __attribute__ has multiple list entry forms. Conversion done using the script at: https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Nick Desaulniers <ndesaulniers@gooogle.com> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-13arch, drivers: replace for_each_membock() with for_each_mem_range()Mike Rapoport
There are several occurrences of the following pattern: for_each_memblock(memory, reg) { start = __pfn_to_phys(memblock_region_memory_base_pfn(reg); end = __pfn_to_phys(memblock_region_memory_end_pfn(reg)); /* do something with start and end */ } Using for_each_mem_range() iterator is more appropriate in such cases and allows simpler and cleaner code. [akpm@linux-foundation.org: fix arch/arm/mm/pmsa-v7.c build] [rppt@linux.ibm.com: mips: fix cavium-octeon build caused by memblock refactoring] Link: http://lkml.kernel.org/r/20200827124549.GD167163@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Axtens <dja@axtens.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Emil Renner Berthing <kernel@esmil.dk> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: https://lkml.kernel.org/r/20200818151634.14343-13-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-12Merge tag 'core-build-2020-10-12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull orphan section checking from Ingo Molnar: "Orphan link sections were a long-standing source of obscure bugs, because the heuristics that various linkers & compilers use to handle them (include these bits into the output image vs discarding them silently) are both highly idiosyncratic and also version dependent. Instead of this historically problematic mess, this tree by Kees Cook (et al) adds build time asserts and build time warnings if there's any orphan section in the kernel or if a section is not sized as expected. And because we relied on so many silent assumptions in this area, fix a metric ton of dependencies and some outright bugs related to this, before we can finally enable the checks on the x86, ARM and ARM64 platforms" * tag 'core-build-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) x86/boot/compressed: Warn on orphan section placement x86/build: Warn on orphan section placement arm/boot: Warn on orphan section placement arm/build: Warn on orphan section placement arm64/build: Warn on orphan section placement x86/boot/compressed: Add missing debugging sections to output x86/boot/compressed: Remove, discard, or assert for unwanted sections x86/boot/compressed: Reorganize zero-size section asserts x86/build: Add asserts for unwanted sections x86/build: Enforce an empty .got.plt section x86/asm: Avoid generating unused kprobe sections arm/boot: Handle all sections explicitly arm/build: Assert for unwanted sections arm/build: Add missing sections arm/build: Explicitly keep .ARM.attributes sections arm/build: Refactor linker script headers arm64/build: Assert for unwanted sections arm64/build: Add missing DWARF sections arm64/build: Use common DISCARDS in linker script arm64/build: Remove .eh_frame* sections due to unwind tables ...
2020-09-03arm64: mte: Use Normal Tagged attributes for the linear mapCatalin Marinas
Once user space is given access to tagged memory, the kernel must be able to clear/save/restore tags visible to the user. This is done via the linear mapping, therefore map it as such. The new MT_NORMAL_TAGGED index for MAIR_EL1 is initially mapped as Normal memory and later changed to Normal Tagged via the cpufeature infrastructure. From a mismatched attribute aliases perspective, the Tagged memory is considered a permission and it won't lead to undefined behaviour. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Suzuki K Poulose <Suzuki.Poulose@arm.com>
2020-09-01arm64/mm: Remove needless section quotesKees Cook
Fix a case of needless quotes in __section(), which Clang doesn't like. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20200821194310.3089815-9-keescook@chromium.org
2020-08-07arm64/mm: enable vmem_altmap support for vmemmap mappingsAnshuman Khandual
Device memory ranges when getting hot added into ZONE_DEVICE, might require their vmemmap mapping's backing memory to be allocated from their own range instead of consuming system memory. This prevents large system memory usage for potentially large device memory ranges. Device driver communicates this request via vmem_altmap structure. Architecture needs to take this request into account while creating and tearing down vemmmap mappings. This enables vmem_altmap support in vmemmap_populate() and vmemmap_free() which includes vmemmap_populate_basepages() used for ARM64_16K_PAGES and ARM64_64K_PAGES configs. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Jia He <justin.he@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Steve Capper <steve.capper@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Hsin-Yi Wang <hsinyi@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1594004178-8861-4-git-send-email-anshuman.khandual@arm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm/sparsemem: enable vmem_altmap support in vmemmap_alloc_block_buf()Anshuman Khandual
There are many instances where vmemap allocation is often switched between regular memory and device memory just based on whether altmap is available or not. vmemmap_alloc_block_buf() is used in various platforms to allocate vmemmap mappings. Lets also enable it to handle altmap based device memory allocation along with existing regular memory allocations. This will help in avoiding the altmap based allocation switch in many places. To summarize there are two different methods to call vmemmap_alloc_block_buf(). vmemmap_alloc_block_buf(size, node, NULL) /* Allocate from system RAM */ vmemmap_alloc_block_buf(size, node, altmap) /* Allocate from altmap */ This converts altmap_alloc_block_buf() into a static function, drops it's entry from the header and updates Documentation/vm/memory-model.rst. Suggested-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Jia He <justin.he@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Will Deacon <will@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Hsin-Yi Wang <hsinyi@chromium.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yu Zhao <yuzhao@google.com> Link: http://lkml.kernel.org/r/1594004178-8861-3-git-send-email-anshuman.khandual@arm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm/sparsemem: enable vmem_altmap support in vmemmap_populate_basepages()Anshuman Khandual
Patch series "arm64: Enable vmemmap mapping from device memory", v4. This series enables vmemmap backing memory allocation from device memory ranges on arm64. But before that, it enables vmemmap_populate_basepages() and vmemmap_alloc_block_buf() to accommodate struct vmem_altmap based alocation requests. This patch (of 3): vmemmap_populate_basepages() is used across platforms to allocate backing memory for vmemmap mapping. This is used as a standard default choice or as a fallback when intended huge pages allocation fails. This just creates entire vmemmap mapping with base pages (PAGE_SIZE). On arm64 platforms, vmemmap_populate_basepages() is called instead of the platform specific vmemmap_populate() when ARM64_SWAPPER_USES_SECTION_MAPS is not enabled as in case for ARM64_16K_PAGES and ARM64_64K_PAGES configs. At present vmemmap_populate_basepages() does not support allocating from driver defined struct vmem_altmap while trying to create vmemmap mapping for a device memory range. It prevents ARM64_16K_PAGES and ARM64_64K_PAGES configs on arm64 from supporting device memory with vmemap_altmap request. This enables vmem_altmap support in vmemmap_populate_basepages() unlocking device memory allocation for vmemap mapping on arm64 platforms with 16K or 64K base page configs. Each architecture should evaluate and decide on subscribing device memory based base page allocation through vmemmap_populate_basepages(). Hence lets keep it disabled on all archs in order to preserve the existing semantics. A subsequent patch enables it on arm64. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Jia He <justin.he@arm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Will Deacon <will@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hsin-Yi Wang <hsinyi@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Yu Zhao <yuzhao@google.com> Link: http://lkml.kernel.org/r/1594004178-8861-1-git-send-email-anshuman.khandual@arm.com Link: http://lkml.kernel.org/r/1594004178-8861-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm: remove unneeded includes of <asm/pgalloc.h>Mike Rapoport
Patch series "mm: cleanup usage of <asm/pgalloc.h>" Most architectures have very similar versions of pXd_alloc_one() and pXd_free_one() for intermediate levels of page table. These patches add generic versions of these functions in <asm-generic/pgalloc.h> and enable use of the generic functions where appropriate. In addition, functions declared and defined in <asm/pgalloc.h> headers are used mostly by core mm and early mm initialization in arch and there is no actual reason to have the <asm/pgalloc.h> included all over the place. The first patch in this series removes unneeded includes of <asm/pgalloc.h> In the end it didn't work out as neatly as I hoped and moving pXd_alloc_track() definitions to <asm-generic/pgalloc.h> would require unnecessary changes to arches that have custom page table allocations, so I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local to mm/. This patch (of 8): In most cases <asm/pgalloc.h> header is required only for allocations of page table memory. Most of the .c files that include that header do not use symbols declared in <asm/pgalloc.h> and do not require that header. As for the other header files that used to include <asm/pgalloc.h>, it is possible to move that include into the .c file that actually uses symbols from <asm/pgalloc.h> and drop the include from the header file. The process was somewhat automated using sed -i -E '/[<"]asm\/pgalloc\.h/d' \ $(grep -L -w -f /tmp/xx \ $(git grep -E -l '[<"]asm/pgalloc\.h')) where /tmp/xx contains all the symbols defined in arch/*/include/asm/pgalloc.h. [rppt@linux.ibm.com: fix powerpc warning] Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Joerg Roedel <joro@8bytes.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Cc: Stafford Horne <shorne@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Joerg Roedel <jroedel@suse.de> Cc: Matthew Wilcox <willy@infradead.org> Link: http://lkml.kernel.org/r/20200627143453.31835-1-rppt@kernel.org Link: http://lkml.kernel.org/r/20200627143453.31835-2-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-15arm64: mm: reset address tag set by kasan sw taggingShyam Thombre
KASAN sw tagging sets a random tag of 8 bits in the top byte of the pointer returned by the memory allocating functions. So for the functions unaware of this change, the top 8 bits of the address must be reset which is done by the function arch_kasan_reset_tag(). Signed-off-by: Shyam Thombre <sthombre@codeaurora.org> Link: https://lore.kernel.org/r/1591787384-5823-1-git-send-email-sthombre@codeaurora.org Signed-off-by: Will Deacon <will@kernel.org>
2020-06-09mm: consolidate pte_index() and pte_offset_*() definitionsMike Rapoport
All architectures define pte_index() as (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1) and all architectures define pte_offset_kernel() as an entry in the array of PTEs indexed by the pte_index(). For the most architectures the pte_offset_kernel() implementation relies on the availability of pmd_page_vaddr() that converts a PMD entry value to the virtual address of the page containing PTEs array. Let's move x86 definitions of the PTE accessors to the generic place in <linux/pgtable.h> and then simply drop the respective definitions from the other architectures. The architectures that didn't provide pmd_page_vaddr() are updated to have that defined. The generic implementation of pte_offset_kernel() can be overridden by an architecture and alpha makes use of this because it has special ordering requirements for its version of pte_offset_kernel(). [rppt@linux.ibm.com: v2] Link: http://lkml.kernel.org/r/20200514170327.31389-11-rppt@kernel.org [rppt@linux.ibm.com: update] Link: http://lkml.kernel.org/r/20200514170327.31389-12-rppt@kernel.org [rppt@linux.ibm.com: update] Link: http://lkml.kernel.org/r/20200514170327.31389-13-rppt@kernel.org [akpm@linux-foundation.org: fix x86 warning] [sfr@canb.auug.org.au: fix powerpc build] Link: http://lkml.kernel.org/r/20200607153443.GB738695@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-10-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04arm64: add support for folded p4d page tablesMike Rapoport
Implement primitives necessary for the 4th level folding, add walks of p4d level where appropriate, replace 5level-fixup.h with pgtable-nop4d.h and remove __ARCH_USE_5LEVEL_HACK. [arnd@arndb.de: fix gcc-10 shift warning] Link: http://lkml.kernel.org/r/20200429185657.4085975-1-arnd@arndb.de Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: James Morse <james.morse@arm.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200414153455.21744-4-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07arm64: Set GP bit in kernel page tables to enable BTI for the kernelMark Brown
Now that the kernel is built with BTI annotations enable the feature by setting the GP bit in the stage 1 translation tables. This is done based on the features supported by the boot CPU so that we do not need to rewrite the translation tables. In order to avoid potential issues on big.LITTLE systems when there are a mix of BTI and non-BTI capable CPUs in the system when we have enabled kernel mode BTI we change BTI to be a _STRICT_BOOT_CPU_FEATURE when we have kernel BTI. This will prevent any CPUs that don't support BTI being started if the boot CPU supports BTI rather than simply not using BTI as we do when supporting BTI only in userspace. The main concern is the possibility of BTYPE being preserved by a CPU that does not implement BTI when a thread is migrated to it resulting in an incorrect state which could generate an exception when the thread migrates back to a CPU that does support BTI. If we encounter practical systems which mix BTI and non-BTI CPUs we will need to revisit this implementation. Since we currently do not generate landing pads in the BPF JIT we only map the base kernel text in this way. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20200506195138.22086-5-broonie@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2020-04-10mm/memory_hotplug: add pgprot_t to mhp_paramsLogan Gunthorpe
devm_memremap_pages() is currently used by the PCI P2PDMA code to create struct page mappings for IO memory. At present, these mappings are created with PAGE_KERNEL which implies setting the PAT bits to be WB. However, on x86, an mtrr register will typically override this and force the cache type to be UC-. In the case firmware doesn't set this register it is effectively WB and will typically result in a machine check exception when it's accessed. Other arches are not currently likely to function correctly seeing they don't have any MTRR registers to fall back on. To solve this, provide a way to specify the pgprot value explicitly to arch_add_memory(). Of the arches that support MEMORY_HOTPLUG: x86_64, and arm64 need a simple change to pass the pgprot_t down to their respective functions which set up the page tables. For x86_32, set the page tables explicitly using _set_memory_prot() (seeing they are already mapped). For ia64, s390 and sh, reject anything but PAGE_KERNEL settings -- this should be fine, for now, seeing these architectures don't support ZONE_DEVICE. A check in __add_pages() is also added to ensure the pgprot parameter was set for all arches. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Badger <ebadger@gigaio.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200306170846.9333-7-logang@deltatee.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-10mm/memory_hotplug: rename mhp_restrictions to mhp_paramsLogan Gunthorpe
The mhp_restrictions struct really doesn't specify anything resembling a restriction anymore so rename it to be mhp_params as it is a list of extended parameters. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Badger <ebadger@gigaio.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200306170846.9333-3-logang@deltatee.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-04arm64/mm: Enable memory hot removeAnshuman Khandual
The arch code for hot-remove must tear down portions of the linear map and vmemmap corresponding to memory being removed. In both cases the page tables mapping these regions must be freed, and when sparse vmemmap is in use the memory backing the vmemmap must also be freed. This patch adds unmap_hotplug_range() and free_empty_tables() helpers which can be used to tear down either region and calls it from vmemmap_free() and ___remove_pgd_mapping(). The free_mapped argument determines whether the backing memory will be freed. It makes two distinct passes over the kernel page table. In the first pass with unmap_hotplug_range() it unmaps, invalidates applicable TLB cache and frees backing memory if required (vmemmap) for each mapped leaf entry. In the second pass with free_empty_tables() it looks for empty page table sections whose page table page can be unmapped, TLB invalidated and freed. While freeing intermediate level page table pages bail out if any of its entries are still valid. This can happen for partially filled kernel page table either from a previously attempted failed memory hot add or while removing an address range which does not span the entire page table page range. The vmemmap region may share levels of table with the vmalloc region. There can be conflicts between hot remove freeing page table pages with a concurrent vmalloc() walking the kernel page table. This conflict can not just be solved by taking the init_mm ptl because of existing locking scheme in vmalloc(). So free_empty_tables() implements a floor and ceiling method which is borrowed from user page table tear with free_pgd_range() which skips freeing page table pages if intermediate address range is not aligned or maximum floor-ceiling might not own the entire page table page. Boot memory on arm64 cannot be removed. Hence this registers a new memory hotplug notifier which prevents boot memory offlining and it's removal. While here update arch_add_memory() to handle __add_pages() failures by just unmapping recently added kernel linear mapping. Now enable memory hot remove on arm64 platforms by default with ARCH_ENABLE_MEMORY_HOTREMOVE. This implementation is overall inspired from kernel page table tear down procedure on X86 architecture and user page table tear down method. [Mike and Catalin added P4D page table level support] Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-02-04arm64: mm: convert mm/dump.c to use walk_page_range()Steven Price
Now walk_page_range() can walk kernel page tables, we can switch the arm64 ptdump code over to using it, simplifying the code. Link: http://lkml.kernel.org/r/20191218162402.45610-22-steven.price@arm.com Signed-off-by: Steven Price <steven.price@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Hogan <jhogan@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: "Liang, Kan" <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Burton <paul.burton@mips.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Zong Li <zong.li@sifive.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-04mm/memory_hotplug: shrink zones when offlining memoryDavid Hildenbrand
We currently try to shrink a single zone when removing memory. We use the zone of the first page of the memory we are removing. If that memmap was never initialized (e.g., memory was never onlined), we will read garbage and can trigger kernel BUGs (due to a stale pointer): BUG: unable to handle page fault for address: 000000000000353d #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] SMP PTI CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 5.3.0-rc5-next-20190820+ #317 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4 Workqueue: kacpi_hotplug acpi_hotplug_work_fn RIP: 0010:clear_zone_contiguous+0x5/0x10 Code: 48 89 c6 48 89 c3 e8 2a fe ff ff 48 85 c0 75 cf 5b 5d c3 c6 85 fd 05 00 00 01 5b 5d c3 0f 1f 840 RSP: 0018:ffffad2400043c98 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000200000000 RCX: 0000000000000000 RDX: 0000000000200000 RSI: 0000000000140000 RDI: 0000000000002f40 RBP: 0000000140000000 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000 R13: 0000000000140000 R14: 0000000000002f40 R15: ffff9e3e7aff3680 FS: 0000000000000000(0000) GS:ffff9e3e7bb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000353d CR3: 0000000058610000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __remove_pages+0x4b/0x640 arch_remove_memory+0x63/0x8d try_remove_memory+0xdb/0x130 __remove_memory+0xa/0x11 acpi_memory_device_remove+0x70/0x100 acpi_bus_trim+0x55/0x90 acpi_device_hotplug+0x227/0x3a0 acpi_hotplug_work_fn+0x1a/0x30 process_one_work+0x221/0x550 worker_thread+0x50/0x3b0 kthread+0x105/0x140 ret_from_fork+0x3a/0x50 Modules linked in: CR2: 000000000000353d Instead, shrink the zones when offlining memory or when onlining failed. Introduce and use remove_pfn_range_from_zone(() for that. We now properly shrink the zones, even if we have DIMMs whereby - Some memory blocks fall into no zone (never onlined) - Some memory blocks fall into multiple zones (offlined+re-onlined) - Multiple memory blocks that fall into different zones Drop the zone parameter (with a potential dubious value) from __remove_pages() and __remove_section(). Link: http://lkml.kernel.org/r/20191006085646.5768-6-david@redhat.com Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319] Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: <stable@vger.kernel.org> [5.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-26Merge tag 'acpi-5.5-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "These update the ACPICA code in the kernel to upstream revision 20191018, add support for EFI specific purpose memory, update the ACPI EC driver to make it work on systems with hardware-reduced ACPI, improve ACPI-based device enumeration for some platforms, rework the lid blacklist handling in the button driver and add more lid quirks to it, unify ACPI _HID/_UID matching, fix assorted issues and clean up the code and documentation. Specifics: - Update the ACPICA code in the kernel to upstream revision 20191018 including: * Fixes for Clang warnings (Bob Moore) * Fix for possible overflow in get_tick_count() (Bob Moore) * Introduction of acpi_unload_table() (Bob Moore) * Debugger and utilities updates (Erik Schmauss) * Fix for unloading tables loaded via configfs (Nikolaus Voss) - Add support for EFI specific purpose memory to optionally allow either application-exclusive or core-kernel-mm managed access to differentiated memory (Dan Williams) - Fix and clean up processing of the HMAT table (Brice Goglin, Qian Cai, Tao Xu) - Update the ACPI EC driver to make it work on systems with hardware-reduced ACPI (Daniel Drake) - Always build in support for the Generic Event Device (GED) to allow one kernel binary to work both on systems with full hardware ACPI and hardware-reduced ACPI (Arjan van de Ven) - Fix the table unload mechanism to unregister platform devices created when the given table was loaded (Andy Shevchenko) - Rework the lid blacklist handling in the button driver and add more lid quirks to it (Hans de Goede) - Improve ACPI-based device enumeration for some platforms based on Intel BayTrail SoCs (Hans de Goede) - Add an OpRegion driver for the Cherry Trail Crystal Cove PMIC and prevent handlers from being registered for unhandled PMIC OpRegions (Hans de Goede) - Unify ACPI _HID/_UID matching (Andy Shevchenko) - Clean up documentation and comments (Cao jin, James Pack, Kacper Piwiński)" * tag 'acpi-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (52 commits) ACPI: OSI: Shoot duplicate word ACPI: HMAT: use %u instead of %d to print u32 values ACPI: NUMA: HMAT: fix a section mismatch ACPI: HMAT: don't mix pxm and nid when setting memory target processor_pxm ACPI: NUMA: HMAT: Register "soft reserved" memory as an "hmem" device ACPI: NUMA: HMAT: Register HMAT at device_initcall level device-dax: Add a driver for "hmem" devices dax: Fix alloc_dax_region() compile warning lib: Uplevel the pmem "region" ida to a global allocator x86/efi: Add efi_fake_mem support for EFI_MEMORY_SP arm/efi: EFI soft reservation to memblock x86/efi: EFI soft reservation to E820 enumeration efi: Common enable/disable infrastructure for EFI soft reservation x86/efi: Push EFI_MEMMAP check into leaf routines efi: Enumerate EFI_MEMORY_SP ACPI: NUMA: Establish a new drivers/acpi/numa/ directory ACPICA: Update version to 20191018 ACPICA: debugger: remove leading whitespaces when converting a string to a buffer ACPICA: acpiexec: initialize all simple types and field units from user input ACPICA: debugger: add field unit support for acpi_db_get_next_token ...
2019-11-07arm/efi: EFI soft reservation to memblockDan Williams
UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the interpretation of the EFI Memory Types as "reserved for a specific purpose". The proposed Linux behavior for specific purpose memory is that it is reserved for direct-access (device-dax) by default and not available for any kernel usage, not even as an OOM fallback. Later, through udev scripts or another init mechanism, these device-dax claimed ranges can be reconfigured and hot-added to the available System-RAM with a unique node identifier. This device-dax management scheme implements "soft" in the "soft reserved" designation by allowing some or all of the reservation to be recovered as typical memory. This policy can be disabled at compile-time with CONFIG_EFI_SOFT_RESERVE=n, or runtime with efi=nosoftreserve. For this patch, update the ARM paths that consider EFI_CONVENTIONAL_MEMORY to optionally take the EFI_MEMORY_SP attribute into account as a reservation indicator. Publish the soft reservation as IORES_DESC_SOFT_RESERVED memory, similar to x86. (Based on an original patch by Ard) Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-11-06arm64: mm: simplify the page end calculation in __create_pgd_mapping()Masahiro Yamada
Calculate the page-aligned end address more simply. The local variable, "length" is unneeded. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-09-26mm: treewide: clarify pgtable_page_{ctor,dtor}() namingMark Rutland
The naming of pgtable_page_{ctor,dtor}() seems to have confused a few people, and until recently arm64 used these erroneously/pointlessly for other levels of page table. To make it incredibly clear that these only apply to the PTE level, and to align with the naming of pgtable_pmd_page_{ctor,dtor}(), let's rename them to pgtable_pte_page_{ctor,dtor}(). These changes were generated with the following shell script: ---- git grep -lw 'pgtable_page_.tor' | while read FILE; do sed -i '{s/pgtable_page_ctor/pgtable_pte_page_ctor/}' $FILE; sed -i '{s/pgtable_page_dtor/pgtable_pte_page_dtor/}' $FILE; done ---- ... with the documentation re-flowed to remain under 80 columns, and whitespace fixed up in macros to keep backslashes aligned. There should be no functional change as a result of this patch. Link: http://lkml.kernel.org/r/20190722141133.3116-1-mark.rutland@arm.com Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-30Merge branches 'for-next/52-bit-kva', 'for-next/cpu-topology', ↵Will Deacon
'for-next/error-injection', 'for-next/perf', 'for-next/psci-cpuidle', 'for-next/rng', 'for-next/smpboot', 'for-next/tbi' and 'for-next/tlbi' into for-next/core * for-next/52-bit-kva: (25 commits) Support for 52-bit virtual addressing in kernel space * for-next/cpu-topology: (9 commits) Move CPU topology parsing into core code and add support for ACPI 6.3 * for-next/error-injection: (2 commits) Support for function error injection via kprobes * for-next/perf: (8 commits) Support for i.MX8 DDR PMU and proper SMMUv3 group validation * for-next/psci-cpuidle: (7 commits) Move PSCI idle code into a new CPUidle driver * for-next/rng: (4 commits) Support for 'rng-seed' property being passed in the devicetree * for-next/smpboot: (3 commits) Reduce fragility of secondary CPU bringup in debug configurations * for-next/tbi: (10 commits) Introduce new syscall ABI with relaxed requirements for pointer tags * for-next/tlbi: (6 commits) Handle spurious page faults arising from kernel space
2019-08-28arm64: fix fixmap copy for 16K pages and 48-bit VAMark Rutland
With 16K pages and 48-bit VAs, the PGD level of table has two entries, and so the fixmap shares a PGD with the kernel image. Since commit: f9040773b7bbbd9e ("arm64: move kernel image to base of vmalloc area") ... we copy the existing fixmap to the new fine-grained page tables at the PUD level in this case. When walking to the new PUD, we forgot to offset the PGD entry and always used the PGD entry at index 0, but this worked as the kernel image and fixmap were in the low half of the TTBR1 address space. As of commit: 14c127c957c1c607 ("arm64: mm: Flip kernel VA space") ... the kernel image and fixmap are in the high half of the TTBR1 address space, and hence use the PGD at index 1, but we didn't update the fixmap copying code to account for this. Thus, we'll erroneously try to copy the fixmap slots into a PUD under the PGD entry at index 0. At the point we do so this PGD entry has not been initialised, and thus we'll try to write a value to a small offset from physical address 0, causing a number of potential problems. Fix this be correctly offsetting the PGD. This is split over a few steps for legibility. Fixes: 14c127c957c1c607 ("arm64: mm: Flip kernel VA space") Reported-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Acked-by: Steve Capper <Steve.Capper@arm.com> Tested-by: Steve Capper <Steve.Capper@arm.com> Tested-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-08-23arm64: map FDT as RW for early_init_dt_scan()Hsin-Yi Wang
Currently in arm64, FDT is mapped to RO before it's passed to early_init_dt_scan(). However, there might be some codes (eg. commit "fdt: add support for rng-seed") that need to modify FDT during init. Map FDT to RO after early fixups are done. Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-08-14arm64: memory: rename VA_START to PAGE_ENDMark Rutland
Prior to commit: 14c127c957c1c607 ("arm64: mm: Flip kernel VA space") ... VA_START described the start of the TTBR1 address space for a given VA size described by VA_BITS, where all kernel mappings began. Since that commit, VA_START described a portion midway through the address space, where the linear map ends and other kernel mappings begin. To avoid confusion, let's rename VA_START to PAGE_END, making it clear that it's not the start of the TTBR1 address space and implying that it's related to PAGE_OFFSET. Comments and other mnemonics are updated accordingly, along with a typo fix in the decription of VMEMMAP_SIZE. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-08-09arm64: mm: Remove vabits_userSteve Capper
Previous patches have enabled 52-bit kernel + user VAs and there is no longer any scenario where user VA != kernel VA size. This patch removes the, now redundant, vabits_user variable and replaces usage with vabits_actual where appropriate. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-08-09arm64: mm: Introduce vabits_actualSteve Capper
In order to support 52-bit kernel addresses detectable at boot time, one needs to know the actual VA_BITS detected. A new variable vabits_actual is introduced in this commit and employed for the KVM hypervisor layout, KASAN, fault handling and phys-to/from-virt translation where there would normally be compile time constants. In order to maintain performance in phys_to_virt, another variable physvirt_offset is introduced. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-08-09arm64: mm: Flip kernel VA spaceSteve Capper
In order to allow for a KASAN shadow that changes size at boot time, one must fix the KASAN_SHADOW_END for both 48 & 52-bit VAs and "grow" the start address. Also, it is highly desirable to maintain the same function addresses in the kernel .text between VA sizes. Both of these requirements necessitate us to flip the kernel address space halves s.t. the direct linear map occupies the lower addresses. This patch puts the direct linear map in the lower addresses of the kernel VA range and everything else in the higher ranges. We need to adjust: *) KASAN shadow region placement logic, *) KASAN_SHADOW_OFFSET computation logic, *) virt_to_phys, phys_to_virt checks, *) page table dumper. These are all small changes, that need to take place atomically, so they are bundled into this commit. As part of the re-arrangement, a guard region of 2MB (to preserve alignment for fixed map) is added after the vmemmap. Otherwise the vmemmap could intersect with IS_ERR pointers. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-07-18mm/memory_hotplug: allow arch_remove_memory() without CONFIG_MEMORY_HOTREMOVEDavid Hildenbrand
We want to improve error handling while adding memory by allowing to use arch_remove_memory() and __remove_pages() even if CONFIG_MEMORY_HOTREMOVE is not set to e.g., implement something like: arch_add_memory() rc = do_something(); if (rc) { arch_remove_memory(); } We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require quite some dependencies for memory offlining. Link: http://lkml.kernel.org/r/20190527111152.16324-7-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: David Hildenbrand <david@redhat.com> Cc: Oscar Salvador <osalvador@suse.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Mark Brown <broonie@kernel.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Rob Herring <robh@kernel.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Arun KS <arunks@codeaurora.org> Cc: Qian Cai <cai@lca.pw> Cc: Mathieu Malaterre <malat@debian.org> Cc: Baoquan He <bhe@redhat.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jun Yao <yaojun8558363@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18arm64/mm: add temporary arch_remove_memory() implementationDavid Hildenbrand
A proper arch_remove_memory() implementation is on its way, which also cleanly removes page tables in arch_add_memory() in case something goes wrong. As we want to use arch_remove_memory() in case something goes wrong during memory hotplug after arch_add_memory() finished, let's add a temporary hack that is sufficient enough until we get a proper implementation that cleans up page table entries. We will remove CONFIG_MEMORY_HOTREMOVE around this code in follow up patches. Link: http://lkml.kernel.org/r/20190527111152.16324-5-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Jun Yao <yaojun8558363@gmail.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arun KS <arunks@codeaurora.org> Cc: Baoquan He <bhe@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Mark Brown <broonie@kernel.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Mathieu Malaterre <malat@debian.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qian Cai <cai@lca.pw> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16mm/ioremap: probe platform for p4d huge map supportAnshuman Khandual
Finish up what commit c2febafc6773 ("mm: convert generic code to 5-level paging") started while levelling up P4D huge mapping support at par with PUD and PMD. A new arch call back arch_ioremap_p4d_supported() is added which just maintains status quo (P4D huge map not supported) on x86, arm64 and powerpc. When HAVE_ARCH_HUGE_VMAP is enabled its just a simple check from the arch about the support, hence runtime effects are minimal. Link: http://lkml.kernel.org/r/1561699231-20991-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-12arm64: switch to generic version of pte allocationMike Rapoport
The PTE allocations in arm64 are identical to the generic ones modulo the GFP flags. Using the generic pte_alloc_one() functions ensures that the user page tables are allocated with __GFP_ACCOUNT set. The arm64 definition of PGALLOC_GFP is removed and replaced with GFP_PGTABLE_USER for p[gum]d_alloc_one() for the user page tables and GFP_PGTABLE_KERNEL for the kernel page tables. The KVM memory cache is now using GFP_PGTABLE_USER. The mappings created with create_pgd_mapping() are now using GFP_PGTABLE_KERNEL. The conversion to the generic version of pte_free_kernel() removes the NULL check for pte. The pte_free() version on arm64 is identical to the generic one and can be simply dropped. [cai@lca.pw: fix a bogus GFP flag in pgd_alloc()] Link: https://lore.kernel.org/r/1559656836-24940-1-git-send-email-cai@lca.pw/ [and fix it more] Link: https://lore.kernel.org/linux-mm/20190617151252.GF16810@rapoport-lnx/ Link: http://lkml.kernel.org/r/1557296232-15361-5-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Guo Ren <ren_guo@c-sky.com> Cc: Helge Deller <deller@gmx.de> Cc: Ley Foon Tan <lftan@altera.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@armlinux.org.uk> Cc: Sam Creasey <sammy@sammy.net> Cc: Vincent Chen <deanbo422@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-08Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - arm64 support for syscall emulation via PTRACE_SYSEMU{,_SINGLESTEP} - Wire up VM_FLUSH_RESET_PERMS for arm64, allowing the core code to manage the permissions of executable vmalloc regions more strictly - Slight performance improvement by keeping softirqs enabled while touching the FPSIMD/SVE state (kernel_neon_begin/end) - Expose a couple of ARMv8.5 features to user (HWCAP): CondM (new XAFLAG and AXFLAG instructions for floating point comparison flags manipulation) and FRINT (rounding floating point numbers to integers) - Re-instate ARM64_PSEUDO_NMI support which was previously marked as BROKEN due to some bugs (now fixed) - Improve parking of stopped CPUs and implement an arm64-specific panic_smp_self_stop() to avoid warning on not being able to stop secondary CPUs during panic - perf: enable the ARM Statistical Profiling Extensions (SPE) on ACPI platforms - perf: DDR performance monitor support for iMX8QXP - cache_line_size() can now be set from DT or ACPI/PPTT if provided to cope with a system cache info not exposed via the CPUID registers - Avoid warning on hardware cache line size greater than ARCH_DMA_MINALIGN if the system is fully coherent - arm64 do_page_fault() and hugetlb cleanups - Refactor set_pte_at() to avoid redundant READ_ONCE(*ptep) - Ignore ACPI 5.1 FADTs reported as 5.0 (infer from the 'arm_boot_flags' introduced in 5.1) - CONFIG_RANDOMIZE_BASE now enabled in defconfig - Allow the selection of ARM64_MODULE_PLTS, currently only done via RANDOMIZE_BASE (and an erratum workaround), allowing modules to spill over into the vmalloc area - Make ZONE_DMA32 configurable * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (54 commits) perf: arm_spe: Enable ACPI/Platform automatic module loading arm_pmu: acpi: spe: Add initial MADT/SPE probing ACPI/PPTT: Add function to return ACPI 6.3 Identical tokens ACPI/PPTT: Modify node flag detection to find last IDENTICAL x86/entry: Simplify _TIF_SYSCALL_EMU handling arm64: rename dump_instr as dump_kernel_instr arm64/mm: Drop [PTE|PMD]_TYPE_FAULT arm64: Implement panic_smp_self_stop() arm64: Improve parking of stopped CPUs arm64: Expose FRINT capabilities to userspace arm64: Expose ARMv8.5 CondM capability to userspace arm64: defconfig: enable CONFIG_RANDOMIZE_BASE arm64: ARM64_MODULES_PLTS must depend on MODULES arm64: bpf: do not allocate executable memory arm64/kprobes: set VM_FLUSH_RESET_PERMS on kprobe instruction pages arm64/mm: wire up CONFIG_ARCH_HAS_SET_DIRECT_MAP arm64: module: create module allocations without exec permissions arm64: Allow user selection of ARM64_MODULE_PLTS acpi/arm64: ignore 5.1 FADTs that are reported as 5.0 arm64: Allow selecting Pseudo-NMI again ...
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 503 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Enrico Weigelt <info@metux.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-07arm64: Fix comment after #endifOdin Ugedal
The config value used in the if was changed in b433dce056d3814dc4b33e5a8a533d6401ffcfb0, but the comment on the corresponding end was not changed. Signed-off-by: Odin Ugedal <odin@ugedal.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-06-04arm64/mm: Change BUG_ON() to VM_BUG_ON() in [pmd|pud]_set_huge()Anshuman Khandual
There are no callers for the functions which will pass unaligned physical addresses. Hence just change these BUG_ON() checks into VM_BUG_ON() which gets compiled out unless CONFIG_VM_DEBUG is enabled. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-06-04arm64/mm: Simplify protection flag creation for kernel huge mappingsAnshuman Khandual
Even though they have got the same value, PMD_TYPE_SECT and PUD_TYPE_SECT get used for kernel huge mappings. But before that first the table bit gets cleared using leaf level PTE_TABLE_BIT. Though functionally they are same, we should use page table level specific macros to be consistent as per the MMU specifications. Create page table level specific wrappers for kernel huge mapping entries and just drop mk_sect_prot() which does not have any other user. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-05-22Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: - Fix SPE probe failure when backing auxbuf with high-order pages - Fix handling of DMA allocations from outside of the vmalloc area - Fix generation of build-id ELF section for vDSO object - Disable huge I/O mappings if kernel page table dumping is enabled - A few other minor fixes (comments, kconfig etc) * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: vdso: Explicitly add build-id option arm64/mm: Inhibit huge-vmap with ptdump arm64: Print physical address of page table base in show_pte() arm64: don't trash config with compat symbol if COMPAT is disabled arm64: assembler: Update comment above cond_yield_neon() macro drivers/perf: arm_spe: Don't error on high-order pages for aux buf arm64/iommu: handle non-remapped addresses in ->mmap and ->get_sgtable
2019-05-16arm64/mm: Inhibit huge-vmap with ptdumpMark Rutland
The arm64 ptdump code can race with concurrent modification of the kernel page tables. At the time this was added, this was sound as: * Modifications to leaf entries could result in stale information being logged, but would not result in a functional problem. * Boot time modifications to non-leaf entries (e.g. freeing of initmem) were performed when the ptdump code cannot be invoked. * At runtime, modifications to non-leaf entries only occurred in the vmalloc region, and these were strictly additive, as intermediate entries were never freed. However, since commit: commit 324420bf91f6 ("arm64: add support for ioremap() block mappings") ... it has been possible to create huge mappings in the vmalloc area at runtime, and as part of this existing intermediate levels of table my be removed and freed. It's possible for the ptdump code to race with this, and continue to walk tables which have been freed (and potentially poisoned or reallocated). As a result of this, the ptdump code may dereference bogus addresses, which could be fatal. Since huge-vmap is a TLB and memory optimization, we can disable it when the runtime ptdump code is in use to avoid this problem. Cc: Catalin Marinas <catalin.marinas@arm.com> Fixes: 324420bf91f60582 ("arm64: add support for ioremap() block mappings") Acked-by: Ard Biesheuvel <ard.biesheuvel@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-05-14treewide: replace #include <asm/sizes.h> with #include <linux/sizes.h>Masahiro Yamada
Since commit dccd2304cc90 ("ARM: 7430/1: sizes.h: move from asm-generic to <linux/sizes.h>"), <asm/sizes.h> and <asm-generic/sizes.h> are just wrappers of <linux/sizes.h>. This commit replaces all <asm/sizes.h> and <asm-generic/sizes.h> to prepare for the removal. Link: http://lkml.kernel.org/r/1553267665-27228-1-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>