aboutsummaryrefslogtreecommitdiffstats
path: root/arch/arm64/mm/init.c
AgeCommit message (Collapse)Author
2023-11-02Merge tag 'mm-nonmm-stable-2023-11-02-14-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "As usual, lots of singleton and doubleton patches all over the tree and there's little I can say which isn't in the individual changelogs. The lengthier patch series are - 'kdump: use generic functions to simplify crashkernel reservation in arch', from Baoquan He. This is mainly cleanups and consolidation of the 'crashkernel=' kernel parameter handling - After much discussion, David Laight's 'minmax: Relax type checks in min() and max()' is here. Hopefully reduces some typecasting and the use of min_t() and max_t() - A group of patches from Oleg Nesterov which clean up and slightly fix our handling of reads from /proc/PID/task/... and which remove task_struct.thread_group" * tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (64 commits) scripts/gdb/vmalloc: disable on no-MMU scripts/gdb: fix usage of MOD_TEXT not defined when CONFIG_MODULES=n .mailmap: add address mapping for Tomeu Vizoso mailmap: update email address for Claudiu Beznea tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions .mailmap: map Benjamin Poirier's address scripts/gdb: add lx_current support for riscv ocfs2: fix a spelling typo in comment proc: test ProtectionKey in proc-empty-vm test proc: fix proc-empty-vm test with vsyscall fs/proc/base.c: remove unneeded semicolon do_io_accounting: use sig->stats_lock do_io_accounting: use __for_each_thread() ocfs2: replace BUG_ON() at ocfs2_num_free_extents() with ocfs2_error() ocfs2: fix a typo in a comment scripts/show_delta: add __main__ judgement before main code treewide: mark stuff as __ro_after_init fs: ocfs2: check status values proc: test /proc/${pid}/statm compiler.h: move __is_constexpr() to compiler.h ...
2023-10-13arm64: swiotlb: Reduce the default size if no ZONE_DMA bouncing neededCatalin Marinas
With CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC enabled, the arm64 kernel still allocates the default SWIOTLB buffer (64MB) even if ZONE_DMA is disabled or all the RAM fits into this zone. However, this potentially wastes a non-negligible amount of memory on platforms with little RAM. Reduce the SWIOTLB size to 1MB per 1GB of RAM if only needed for kmalloc() buffer bouncing. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Suggested-by: Ross Burton <ross.burton@arm.com> Cc: Ross Burton <ross.burton@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2023-10-04arm64: kdump: use generic interface to simplify crashkernel reservationBaoquan He
With the help of newly changed function parse_crashkernel() and generic reserve_crashkernel_generic(), crashkernel reservation can be simplified by steps: 1) Add a new header file <asm/crash_core.h>, and define CRASH_ALIGN, CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX and DEFAULT_CRASH_KERNEL_LOW_SIZE in <asm/crash_core.h>; 2) Add arch_reserve_crashkernel() to call parse_crashkernel() and reserve_crashkernel_generic(); 3) Add ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION Kconfig in arch/arm64/Kconfig. The old reserve_crashkernel_low() and reserve_crashkernel() can be removed. Link: https://lkml.kernel.org/r/20230914033142.676708-8-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Jiahao <chenjiahao16@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-04crash_core: change the prototype of function parse_crashkernel()Baoquan He
Add two parameters 'low_size' and 'high' to function parse_crashkernel(), later crashkernel=,high|low parsing will be added. Make adjustments in all call sites of parse_crashkernel() in arch. Link: https://lkml.kernel.org/r/20230914033142.676708-3-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Jiahao <chenjiahao16@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-29Merge tag 'dma-mapping-6.6-2023-08-29' of ↵Linus Torvalds
git://git.infradead.org/users/hch/dma-mapping Pull dma-maping updates from Christoph Hellwig: - allow dynamic sizing of the swiotlb buffer, to cater for secure virtualization workloads that require all I/O to be bounce buffered (Petr Tesarik) - move a declaration to a header (Arnd Bergmann) - check for memory region overlap in dma-contiguous (Binglei Wang) - remove the somewhat dangerous runtime swiotlb-xen enablement and unexport is_swiotlb_active (Christoph Hellwig, Juergen Gross) - per-node CMA improvements (Yajun Deng) * tag 'dma-mapping-6.6-2023-08-29' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: optimize get_max_slots() swiotlb: move slot allocation explanation comment where it belongs swiotlb: search the software IO TLB only if the device makes use of it swiotlb: allocate a new memory pool when existing pools are full swiotlb: determine potential physical address limit swiotlb: if swiotlb is full, fall back to a transient memory pool swiotlb: add a flag whether SWIOTLB is allowed to grow swiotlb: separate memory pool data from other allocator data swiotlb: add documentation and rename swiotlb_do_find_slots() swiotlb: make io_tlb_default_mem local to swiotlb.c swiotlb: bail out of swiotlb_init_late() if swiotlb is already allocated dma-contiguous: check for memory region overlap dma-contiguous: support numa CMA for specified node dma-contiguous: support per-numa CMA for all architectures dma-mapping: move arch_dma_set_mask() declaration to header swiotlb: unexport is_swiotlb_active x86: always initialize xen-swiotlb when xen-pcifront is enabling xen/pci: add flag for PCI passthrough being possible
2023-08-04arm64: fix build warning for ARM64_MEMSTART_SHIFTZhang Jianhua
When building with W=1, the following warning occurs. arch/arm64/include/asm/kernel-pgtable.h:129:41: error: "PUD_SHIFT" is not defined, evaluates to 0 [-Werror=undef] 129 | #define ARM64_MEMSTART_SHIFT PUD_SHIFT | ^~~~~~~~~ arch/arm64/include/asm/kernel-pgtable.h:142:5: note: in expansion of macro ‘ARM64_MEMSTART_SHIFT’ 142 | #if ARM64_MEMSTART_SHIFT < SECTION_SIZE_BITS | ^~~~~~~~~~~~~~~~~~~~ The generic PUD_SHIFT was defined in include/asm-generic/pgtable-nopud.h, however the #ifndef __ASSEMBLY__ guard in this header file makes it unavailable for assembly files. While someone .S file include the <asm/kernel-pgtable.h>, the build warning would occur. Now move the macro ARM64_MEMSTART_SHIFT and ARM64_MEMSTART_ALIGN to arch/arm64/mm/init.c where it is used only, to avoid this issue. Signed-off-by: Zhang Jianhua <chris.zjh@huawei.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20230804075615.3334756-1-chris.zjh@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2023-07-31dma-contiguous: support per-numa CMA for all architecturesYajun Deng
In the commit b7176c261cdb ("dma-contiguous: provide the ability to reserve per-numa CMA"), Barry adds DMA_PERNUMA_CMA for ARM64. But this feature is architecture independent, so support per-numa CMA for all architectures, and enable it by default if NUMA. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Tested-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-06-28Merge tag 'mm-stable-2023-06-24-19-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm updates from Andrew Morton: - Yosry Ahmed brought back some cgroup v1 stats in OOM logs - Yosry has also eliminated cgroup's atomic rstat flushing - Nhat Pham adds the new cachestat() syscall. It provides userspace with the ability to query pagecache status - a similar concept to mincore() but more powerful and with improved usability - Mel Gorman provides more optimizations for compaction, reducing the prevalence of page rescanning - Lorenzo Stoakes has done some maintanance work on the get_user_pages() interface - Liam Howlett continues with cleanups and maintenance work to the maple tree code. Peng Zhang also does some work on maple tree - Johannes Weiner has done some cleanup work on the compaction code - David Hildenbrand has contributed additional selftests for get_user_pages() - Thomas Gleixner has contributed some maintenance and optimization work for the vmalloc code - Baolin Wang has provided some compaction cleanups, - SeongJae Park continues maintenance work on the DAMON code - Huang Ying has done some maintenance on the swap code's usage of device refcounting - Christoph Hellwig has some cleanups for the filemap/directio code - Ryan Roberts provides two patch series which yield some rationalization of the kernel's access to pte entries - use the provided APIs rather than open-coding accesses - Lorenzo Stoakes has some fixes to the interaction between pagecache and directio access to file mappings - John Hubbard has a series of fixes to the MM selftesting code - ZhangPeng continues the folio conversion campaign - Hugh Dickins has been working on the pagetable handling code, mainly with a view to reducing the load on the mmap_lock - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment from 128 to 8 - Domenico Cerasuolo has improved the zswap reclaim mechanism by reorganizing the LRU management - Matthew Wilcox provides some fixups to make gfs2 work better with the buffer_head code - Vishal Moola also has done some folio conversion work - Matthew Wilcox has removed the remnants of the pagevec code - their functionality is migrated over to struct folio_batch * tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits) mm/hugetlb: remove hugetlb_set_page_subpool() mm: nommu: correct the range of mmap_sem_read_lock in task_mem() hugetlb: revert use of page_cache_next_miss() Revert "page cache: fix page_cache_next/prev_miss off by one" mm/vmscan: fix root proactive reclaim unthrottling unbalanced node mm: memcg: rename and document global_reclaim() mm: kill [add|del]_page_to_lru_list() mm: compaction: convert to use a folio in isolate_migratepages_block() mm: zswap: fix double invalidate with exclusive loads mm: remove unnecessary pagevec includes mm: remove references to pagevec mm: rename invalidate_mapping_pagevec to mapping_try_invalidate mm: remove struct pagevec net: convert sunrpc from pagevec to folio_batch i915: convert i915_gpu_error to use a folio_batch pagevec: rename fbatch_count() mm: remove check_move_unevictable_pages() drm: convert drm_gem_put_pages() to use a folio_batch i915: convert shmem_sg_free_table() to use a folio_batch scatterlist: add sg_set_folio() ...
2023-06-19arm64: enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64Catalin Marinas
With the DMA bouncing of unaligned kmalloc() buffers now in place, enable it for arm64 to allow the kmalloc-{8,16,32,48,96} caches. In addition, always create the swiotlb buffer even when the end of RAM is within the 32-bit physical address range (the swiotlb buffer can still be disabled on the kernel command line). Link: https://lkml.kernel.org/r/20230612153201.554742-18-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com> Cc: Will Deacon <will@kernel.org> Cc: Alasdair Kergon <agk@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jerry Snitselaar <jsnitsel@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Mike Snitzer <snitzer@kernel.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Saravana Kannan <saravanak@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09arm64: kdump: simplify the reservation behaviour of crashkernel=,highBaoquan He
On arm64, reservation for 'crashkernel=xM,high' is taken by searching for suitable memory region top down. If the 'xM' of crashkernel high memory is reserved from high memory successfully, it will try to reserve crashkernel low memory later accoringly. Otherwise, it will try to search low memory area for the 'xM' suitable region. Please see the details in Documentation/admin-guide/kernel-parameters.txt. While we observed an unexpected case where a reserved region crosses the high and low meomry boundary. E.g on a system with 4G as low memory end, user added the kernel parameters like: 'crashkernel=512M,high', it could finally have [4G-126M, 4G+386M], [1G, 1G+128M] regions in running kernel. The crashkernel high region crossing low and high memory boudary will bring issues: 1) For crashkernel=x,high, if getting crashkernel high region across low and high memory boundary, then user will see two memory regions in low memory, and one memory region in high memory. The two crashkernel low memory regions are confusing as shown in above example. 2) If people explicityly specify "crashkernel=x,high crashkernel=y,low" and y <= 128M, when crashkernel high region crosses low and high memory boundary and the part of crashkernel high reservation below boundary is bigger than y, the expected crahskernel low reservation will be skipped. But the expected crashkernel high reservation is shrank and could not satisfy user space requirement. 3) The crossing boundary behaviour of crahskernel high reservation is different than x86 arch. On x86_64, the low memory end is 4G fixedly, and the memory near 4G is reserved by system, e.g for mapping firmware, pci mapping, so the crashkernel reservation crossing boundary never happens. From distros point of view, this brings inconsistency and confusion. Users need to dig into x86 and arm64 system details to find out why. For kernel itself, the impact of issue 3) could be slight. While issue 1) and 2) cause actual impact because it brings obscure semantics and behaviour to crashkernel=,high reservation. Here, for crashkernel=xM,high, search the high memory for the suitable region only in high memory. If failed, try reserving the suitable region only in low memory. Like this, the crashkernel high region will only exist in high memory, and crashkernel low region only exists in low memory. The reservation behaviour for crashkernel=,high is clearer and simpler. Note: RPi4 has different zone ranges than normal memory. Its DMA zone is 0~1G, and DMA32 zone is 1G~4G if CONFIG_ZONE_DMA|DMA32 are enabled by default. The low memory end is 1G in order to validate all devices, high memory starts at 1G memory. However, for being consistent with normal arm64 system, its low memory end is still 1G, while reserving crashkernel high memory from 4G if crashkernel=size,high specified. This will remove confusion. With above change applied, summary of arm64 crashkernel reservation range: 1) RPi4(zone DMA:0~1G; DMA32:1G~4G): crashkernel=size 0~1G: low memory | 1G~top: high memory crashkernel=size,high 0~1G: low memory | 4G~top: high memory 2) Other normal system: crashkernel=size crashkernel=size,high 0~4G: low memory | 4G~top: high memory 3) Systems w/o zone DMA|DMA32 crashkernel=size crashkernel=size,high 0~top: low memory Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/r/ZGIBSEoZ7VRVvP8H@MiWiFi-R3L-srv Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-04-11arm64: kdump: defer the crashkernel reservation for platforms with no DMA ↵Baoquan He
memory zones In commit 031495635b46 ("arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones"), reserve_crashkernel() is called much earlier in arm64_memblock_init() to avoid causing base apge mapping on platforms with no DMA meomry zones. With taking off protection on crashkernel memory region, no need to call reserve_crashkernel() specially in advance. The deferred invocation of reserve_crashkernel() in bootmem_init() can cover all cases. So revert the whole commit now. Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/r/20230407011507.17572-4-bhe@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18arm64: kdump: Support crashkernel=X fall back to reserve region above DMA zonesZhen Lei
For crashkernel=X without '@offset', select a region within DMA zones first, and fall back to reserve region above DMA zones. This allows users to use the same configuration on multiple platforms. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Acked-by: Baoquan He <bhe@redhat.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20221116121044.1690-3-thunder.leizhen@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18arm64: kdump: Provide default size when crashkernel=Y,low is not specifiedZhen Lei
Try to allocate at least 128 MiB low memory automatically for the case that crashkernel=,high is explicitly specified, while crashkenrel=,low is omitted. This allows users to focus more on the high memory requirements of their business rather than the low memory requirements of the crash kernel booting. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Acked-by: Baoquan He <bhe@redhat.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20221116121044.1690-2-thunder.leizhen@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2022-09-09arm64/sysreg: Add _EL1 into ID_AA64MMFR0_EL1 definition namesMark Brown
Normally we include the full register name in the defines for fields within registers but this has not been followed for ID registers. In preparation for automatic generation of defines add the _EL1s into the defines for ID_AA64MMFR0_EL1 to follow the convention. No functional changes. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Kristina Martsenko <kristina.martsenko@arm.com> Link: https://lore.kernel.org/r/20220905225425.1871461-5-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-08-03Merge tag 'efi-next-for-v5.20' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI updates from Ard Biesheuvel: - Enable mirrored memory for arm64 - Fix up several abuses of the efivar API - Refactor the efivar API in preparation for moving the 'business logic' part of it into efivarfs - Enable ACPI PRM on arm64 * tag 'efi-next-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (24 commits) ACPI: Move PRM config option under the main ACPI config ACPI: Enable Platform Runtime Mechanism(PRM) support on ARM64 ACPI: PRM: Change handler_addr type to void pointer efi: Simplify arch_efi_call_virt() macro drivers: fix typo in firmware/efi/memmap.c efi: vars: Drop __efivar_entry_iter() helper which is no longer used efi: vars: Use locking version to iterate over efivars linked lists efi: pstore: Omit efivars caching EFI varstore access layer efi: vars: Add thin wrapper around EFI get/set variable interface efi: vars: Don't drop lock in the middle of efivar_init() pstore: Add priv field to pstore_record for backend specific use Input: applespi - avoid efivars API and invoke EFI services directly selftests/kexec: remove broken EFI_VARS secure boot fallback check brcmfmac: Switch to appropriate helper to load EFI variable contents iwlwifi: Switch to proper EFI variable store interface media: atomisp_gmin_platform: stop abusing efivar API efi: efibc: avoid efivar API for setting variables efi: avoid efivars layer when loading SSDTs from variables efi: Correct comment on efi_memmap_alloc memblock: Disable mirror feature if kernelcore is not specified ...
2022-07-05arm64/mm: Define defer_reserve_crashkernel()Anshuman Khandual
Crash kernel memory reservation gets deferred, when either CONFIG_ZONE_DMA or CONFIG_ZONE_DMA32 config is enabled on the platform. This deferral also impacts overall linear mapping creation including the crash kernel itself. Just encapsulate this deferral check in a new helper for better clarity. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20220705062556.1845734-1-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-06-15arm64: mm: Only remove nomap flag for initrdMa Wupeng
Commit 177e15f0c144 ("arm64: add the initrd region to the linear mapping explicitly") remove all the flags of the memory used by initrd. This is fine since MEMBLOCK_MIRROR is not used in arm64. However with mirrored feature introduced to arm64, this will clear the mirrored flag used by initrd, which will lead to error log printed by find_zone_movable_pfns_for_nodes() if the lower 4G range has some non-mirrored memory. To solve this problem, only MEMBLOCK_NOMAP flag will be removed via memblock_clear_nomap(). Signed-off-by: Ma Wupeng <mawupeng1@huawei.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220614092156.1972846-5-mawupeng1@huawei.com Acked-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-05-25Merge tag 'dma-mapping-5.19-2022-05-25' of ↵Linus Torvalds
git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping updates from Christoph Hellwig: - don't over-decrypt memory (Robin Murphy) - takes min align mask into account for the swiotlb max mapping size (Tianyu Lan) - use GFP_ATOMIC in dma-debug (Mikulas Patocka) - fix DMA_ATTR_NO_KERNEL_MAPPING on xen/arm (me) - don't fail on highmem CMA pages in dma_direct_alloc_pages (me) - cleanup swiotlb initialization and share more code with swiotlb-xen (me, Stefano Stabellini) * tag 'dma-mapping-5.19-2022-05-25' of git://git.infradead.org/users/hch/dma-mapping: (23 commits) dma-direct: don't over-decrypt memory swiotlb: max mapping size takes min align mask into account swiotlb: use the right nslabs-derived sizes in swiotlb_init_late swiotlb: use the right nslabs value in swiotlb_init_remap swiotlb: don't panic when the swiotlb buffer can't be allocated dma-debug: change allocation mode from GFP_NOWAIT to GFP_ATIOMIC dma-direct: don't fail on highmem CMA pages in dma_direct_alloc_pages swiotlb-xen: fix DMA_ATTR_NO_KERNEL_MAPPING on arm x86: remove cruft from <asm/dma-mapping.h> swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl swiotlb: merge swiotlb-xen initialization into swiotlb swiotlb: provide swiotlb_init variants that remap the buffer swiotlb: pass a gfp_mask argument to swiotlb_init_late swiotlb: add a SWIOTLB_ANY flag to lift the low memory restriction swiotlb: make the swiotlb_init interface more useful x86: centralize setting SWIOTLB_FORCE when guest memory encryption is enabled x86: remove the IOMMU table infrastructure MIPS/octeon: use swiotlb_init instead of open coding it arm/xen: don't check for xen_initial_domain() in xen_create_contiguous_region swiotlb: rename swiotlb_late_init_with_default_size ...
2022-05-20Merge branches 'for-next/sme', 'for-next/stacktrace', ↵Catalin Marinas
'for-next/fault-in-subpage', 'for-next/misc', 'for-next/ftrace' and 'for-next/crashkernel', remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: perf/arm-cmn: Decode CAL devices properly in debugfs perf/arm-cmn: Fix filter_sel lookup perf/marvell_cn10k: Fix tad_pmu_event_init() to check pmu type first drivers/perf: hisi: Add Support for CPA PMU drivers/perf: hisi: Associate PMUs in SICL with CPUs online drivers/perf: arm_spe: Expose saturating counter to 16-bit perf/arm-cmn: Add CMN-700 support perf/arm-cmn: Refactor occupancy filter selector perf/arm-cmn: Add CMN-650 support dt-bindings: perf: arm-cmn: Add CMN-650 and CMN-700 perf: check return value of armpmu_request_irq() perf: RISC-V: Remove non-kernel-doc ** comments * for-next/sme: (30 commits) : Scalable Matrix Extensions support. arm64/sve: Move sve_free() into SVE code section arm64/sve: Make kernel FPU protection RT friendly arm64/sve: Delay freeing memory in fpsimd_flush_thread() arm64/sme: More sensibly define the size for the ZA register set arm64/sme: Fix NULL check after kzalloc arm64/sme: Add ID_AA64SMFR0_EL1 to __read_sysreg_by_encoding() arm64/sme: Provide Kconfig for SME KVM: arm64: Handle SME host state when running guests KVM: arm64: Trap SME usage in guest KVM: arm64: Hide SME system registers from guests arm64/sme: Save and restore streaming mode over EFI runtime calls arm64/sme: Disable streaming mode and ZA when flushing CPU state arm64/sme: Add ptrace support for ZA arm64/sme: Implement ptrace support for streaming mode SVE registers arm64/sme: Implement ZA signal handling arm64/sme: Implement streaming SVE signal handling arm64/sme: Disable ZA and streaming mode when handling signals arm64/sme: Implement traps and syscall handling for SME arm64/sme: Implement ZA context switching arm64/sme: Implement streaming SVE context switching ... * for-next/stacktrace: : Stacktrace cleanups. arm64: stacktrace: align with common naming arm64: stacktrace: rename stackframe to unwind_state arm64: stacktrace: rename unwinder functions arm64: stacktrace: make struct stackframe private to stacktrace.c arm64: stacktrace: delete PCS comment arm64: stacktrace: remove NULL task check from unwind_frame() * for-next/fault-in-subpage: : btrfs search_ioctl() live-lock fix using fault_in_subpage_writeable(). btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults arm64: Add support for user sub-page fault probing mm: Add fault_in_subpage_writeable() to probe at sub-page granularity * for-next/misc: : Miscellaneous patches. arm64: Kconfig.platforms: Add comments arm64: Kconfig: Fix indentation and add comments arm64: mm: avoid writable executable mappings in kexec/hibernate code arm64: lds: move special code sections out of kernel exec segment arm64/hugetlb: Implement arm64 specific huge_ptep_get() arm64/hugetlb: Use ptep_get() to get the pte value of a huge page arm64: mm: Make arch_faults_on_old_pte() check for migratability arm64: mte: Clean up user tag accessors arm64/hugetlb: Drop TLB flush from get_clear_flush() arm64: Declare non global symbols as static arm64: mm: Cleanup useless parameters in zone_sizes_init() arm64: fix types in copy_highpage() arm64: Set ARCH_NR_GPIO to 2048 for ARCH_APPLE arm64: cputype: Avoid overflow using MIDR_IMPLEMENTOR_MASK arm64: document the boot requirements for MTE arm64/mm: Compute PTRS_PER_[PMD|PUD] independently of PTRS_PER_PTE * for-next/ftrace: : ftrace cleanups. arm64/ftrace: Make function graph use ftrace directly ftrace: cleanup ftrace_graph_caller enable and disable * for-next/crashkernel: : Support for crashkernel reservations above ZONE_DMA. arm64: kdump: Do not allocate crash low memory if not needed docs: kdump: Update the crashkernel description for arm64 of: Support more than one crash kernel regions for kexec -s of: fdt: Add memory for devices by DT property "linux,usable-memory-range" arm64: kdump: Reimplement crashkernel=X arm64: Use insert_resource() to simplify code kdump: return -ENOENT if required cmdline option does not exist
2022-05-16arm64: kdump: Do not allocate crash low memory if not neededZhen Lei
When "crashkernel=X,high" is specified, the specified "crashkernel=Y,low" memory is not required in the following corner cases: 1. If both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 are disabled, it means that the devices can access any memory. 2. If the system memory is small, the crash high memory may be allocated from the DMA zones. If that happens, there's no need to allocate another crash low memory because there's already one. Add condition '(crash_base >= CRASH_ADDR_LOW_MAX)' to determine whether the 'high' memory is allocated above DMA zones. Note: when both CONFIG_ZONE_DMA and CONFIG_ZONE_DMA32 are disabled, the entire physical memory is DMA accessible, CRASH_ADDR_LOW_MAX equals 'PHYS_MASK + 1'. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Acked-by: Baoquan He <bhe@redhat.com> Link: https://lore.kernel.org/r/20220511032033.426-1-thunder.leizhen@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-07arm64: kdump: Reimplement crashkernel=XChen Zhou
There are following issues in arm64 kdump: 1. We use crashkernel=X to reserve crashkernel in DMA zone, which will fail when there is not enough low memory. 2. If reserving crashkernel above DMA zone, in this case, crash dump kernel will fail to boot because there is no low memory available for allocation. To solve these issues, introduce crashkernel=X,[high,low]. The "crashkernel=X,high" is used to select a region above DMA zone, and the "crashkernel=Y,low" is used to allocate specified size low memory. Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Co-developed-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/r/20220506114402.365-4-thunder.leizhen@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-07arm64: Use insert_resource() to simplify codeZhen Lei
insert_resource() traverses the subtree layer by layer from the root node until a proper location is found. Compared with request_resource(), the parent node does not need to be determined in advance. In addition, move the insertion of node 'crashk_res' into function reserve_crashkernel() to make the associated code close together. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Acked-by: John Donnelly <john.p.donnelly@oracle.com> Acked-by: Baoquan He <bhe@redhat.com> Link: https://lore.kernel.org/r/20220506114402.365-3-thunder.leizhen@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-05-05arm64: mm: Cleanup useless parameters in zone_sizes_init()Kefeng Wang
Directly use max_pfn for max and no one use min, kill them. Reviewed-by: Vijay Balakrishna <vijayb@linux.microsoft.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Link: https://lore.kernel.org/r/20220411092455.1461-4-wangkefeng.wang@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-18swiotlb: make the swiotlb_init interface more usefulChristoph Hellwig
Pass a boolean flag to indicate if swiotlb needs to be enabled based on the addressing needs, and replace the verbose argument with a set of flags, including one to force enable bounce buffering. Note that this patch removes the possibility to force xen-swiotlb use with the swiotlb=force parameter on the command line on x86 (arm and arm64 never supported that), but this interface will be restored shortly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2022-04-04arm64: fix typos in commentsJulia Lawall
Various spelling mistakes in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/20220318103729.157574-10-Julia.Lawall@inria.fr [will: Squashed in 20220318103729.157574-28-Julia.Lawall@inria.fr] Signed-off-by: Will Deacon <will@kernel.org>
2022-03-24Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: "Various misc subsystems, before getting into the post-linux-next material. 41 patches. Subsystems affected by this patch series: procfs, misc, core-kernel, lib, checkpatch, init, pipe, minix, fat, cgroups, kexec, kdump, taskstats, panic, kcov, resource, and ubsan" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (41 commits) Revert "ubsan, kcsan: Don't combine sanitizer with kcov on clang" kernel/resource: fix kfree() of bootmem memory again kcov: properly handle subsequent mmap calls kcov: split ioctl handling into locked and unlocked parts panic: move panic_print before kmsg dumpers panic: add option to dump all CPUs backtraces in panic_print docs: sysctl/kernel: add missing bit to panic_print taskstats: remove unneeded dead assignment kasan: no need to unset panic_on_warn in end_report() ubsan: no need to unset panic_on_warn in ubsan_epilogue() panic: unset panic_on_warn inside panic() docs: kdump: add scp example to write out the dump file docs: kdump: update description about sysfs file system support arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef x86/setup: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef riscv: mm: init: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef kexec: make crashk_res, crashk_low_res and crash_notes symbols always visible cgroup: use irqsave in cgroup_rstat_flush_locked(). fat: use pointer to simple type in put_user() minix: fix bug when opening a file with O_DIRECT ...
2022-03-23arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdefJisheng Zhang
Replace the conditional compilation using "#ifdef CONFIG_KEXEC_CORE" by a check for "IS_ENABLED(CONFIG_KEXEC_CORE)", to simplify the code and increase compile coverage. Link: https://lkml.kernel.org/r/20211206160514.2000-5-jszhang@kernel.org Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-09arm64: mm: Drop 'const' from conditional arm64_dma_phys_limit definitionWill Deacon
Commit 031495635b46 ("arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones") introduced different definitions for 'arm64_dma_phys_limit' depending on CONFIG_ZONE_DMA{,32} based on a late suggestion from Pasha. Sadly, this results in a build error when passing W=1: | arch/arm64/mm/init.c:90:19: error: conflicting type qualifiers for 'arm64_dma_phys_limit' Drop the 'const' for now and use '__ro_after_init' consistently. Link: https://lore.kernel.org/r/202203090241.aj7paWeX-lkp@intel.com Link: https://lore.kernel.org/r/CA+CK2bDbbx=8R=UthkMesWOST8eJMtOGJdfMRTFSwVmo0Vn0EA@mail.gmail.com Fixes: 031495635b46 ("arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones") Signed-off-by: Will Deacon <will@kernel.org>
2022-03-08arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zonesVijay Balakrishna
The following patches resulted in deferring crash kernel reservation to mem_init(), mainly aimed at platforms with DMA memory zones (no IOMMU), in particular Raspberry Pi 4. commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32") commit 8424ecdde7df ("arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges") commit 0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()") commit 2687275a5843 ("arm64: Force NO_BLOCK_MAPPINGS if crashkernel reservation is required") Above changes introduced boot slowdown due to linear map creation for all the memory banks with NO_BLOCK_MAPPINGS, see discussion[1]. The proposed changes restore crash kernel reservation to earlier behavior thus avoids slow boot, particularly for platforms with IOMMU (no DMA memory zones). Tested changes to confirm no ~150ms boot slowdown on our SoC with IOMMU and 8GB memory. Also tested with ZONE_DMA and/or ZONE_DMA32 configs to confirm no regression to deferring scheme of crash kernel memory reservation. In both cases successfully collected kernel crash dump. [1] https://lore.kernel.org/all/9436d033-579b-55fa-9b00-6f4b661c2dd7@linux.microsoft.com/ Signed-off-by: Vijay Balakrishna <vijayb@linux.microsoft.com> Cc: stable@vger.kernel.org Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://lore.kernel.org/r/1646242689-20744-1-git-send-email-vijayb@linux.microsoft.com [will: Add #ifdef CONFIG_KEXEC_CORE guards to fix 'crashk_res' references in allnoconfig build] Signed-off-by: Will Deacon <will@kernel.org>
2022-01-20arm64: mm: apply __ro_after_init to memory_limitPeng Fan
This variable is only set during initialization, so mark with __ro_after_init. Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20211215064559.2843555-1-peng.fan@oss.nxp.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-10-29Merge branch 'for-next/pfn-valid' into for-next/coreWill Deacon
* for-next/pfn-valid: arm64/mm: drop HAVE_ARCH_PFN_VALID dma-mapping: remove bogus test for pfn_valid from dma_map_resource
2021-10-01arm64/mm: drop HAVE_ARCH_PFN_VALIDAnshuman Khandual
CONFIG_SPARSEMEM_VMEMMAP is now the only available memory model on arm64 platforms and free_unused_memmap() would just return without creating any holes in the memmap mapping. There is no need for any special handling in pfn_valid() and HAVE_ARCH_PFN_VALID can just be dropped. This also moves the pfn upper bits sanity check into generic pfn_valid(). [rppt: rebased on v5.15-rc3] Link: https://lkml.kernel.org/r/1621947349-25421-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Link: https://lore.kernel.org/r/20210930013039.11260-3-rppt@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2021-09-29arm64: mm: Drop pointless call to set_max_mapnr()Will Deacon
set_max_mapnr() is an empty stub function if CONFIG_NUMA=y, otherwise it assigns to the 'max_mapnr' variable which is used to provide a generic pfn_valid() implementation if CONFIG_MMU=n. Since we don't support nommu on arm64, drop the pointless call to set_max_mapnr() from mem_init(). Link: https://lore.kernel.org/r/130a50d7-92fd-31fa-261e-f73dadcb4fcf@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
2021-09-10Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - Limit the linear region to 51-bit when KVM is running in nVHE mode. Otherwise, depending on the placement of the ID map, kernel-VA to hyp-VA translations may produce addresses that either conflict with other HYP mappings or generate addresses outside of the 52-bit addressable range. - Instruct kmemleak not to scan the memory reserved for kdump as this range is removed from the kernel linear map and therefore not accessible. * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: kdump: Skip kmemleak scan reserved memory for kdump arm64: mm: limit linear region to 51 bits for KVM in nVHE mode
2021-09-10arm64: kdump: Skip kmemleak scan reserved memory for kdumpChen Wandun
Trying to boot with kdump + kmemleak, command will result in a crash: "echo scan > /sys/kernel/debug/kmemleak" crashkernel reserved: 0x0000000007c00000 - 0x0000000027c00000 (512 MB) Kernel command line: BOOT_IMAGE=(hd1,gpt2)/vmlinuz-5.14.0-rc5-next-20210809+ root=/dev/mapper/ao-root ro rd.lvm.lv=ao/root rd.lvm.lv=ao/swap crashkernel=512M Unable to handle kernel paging request at virtual address ffff000007c00000 Mem abort info: ESR = 0x96000007 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x07: level 3 translation fault Data abort info: ISV = 0, ISS = 0x00000007 CM = 0, WnR = 0 swapper pgtable: 64k pages, 48-bit VAs, pgdp=00002024f0d80000 [ffff000007c00000] pgd=1800205ffffd0003, p4d=1800205ffffd0003, pud=1800205ffffd0003, pmd=1800205ffffc0003, pte=0068000007c00f06 Internal error: Oops: 96000007 [#1] SMP pstate: 804000c9 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : scan_block+0x98/0x230 lr : scan_block+0x94/0x230 sp : ffff80008d6cfb70 x29: ffff80008d6cfb70 x28: 0000000000000000 x27: 0000000000000000 x26: 00000000000000c0 x25: 0000000000000001 x24: 0000000000000000 x23: ffffa88a6b18b398 x22: ffff000007c00ff9 x21: ffffa88a6ac7fc40 x20: ffffa88a6af6a830 x19: ffff000007c00000 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: ffffffffffffffff x14: ffffffff00000000 x13: ffffffffffffffff x12: 0000000000000020 x11: 0000000000000000 x10: 0000000001080000 x9 : ffffa88a6951c77c x8 : ffffa88a6a893988 x7 : ffff203ff6cfb3c0 x6 : ffffa88a6a52b3c0 x5 : ffff203ff6cfb3c0 x4 : 0000000000000000 x3 : 0000000000000000 x2 : 0000000000000001 x1 : ffff20226cb56a40 x0 : 0000000000000000 Call trace: scan_block+0x98/0x230 scan_gray_list+0x120/0x270 kmemleak_scan+0x3a0/0x648 kmemleak_write+0x3ac/0x4c8 full_proxy_write+0x6c/0xa0 vfs_write+0xc8/0x2b8 ksys_write+0x70/0xf8 __arm64_sys_write+0x24/0x30 invoke_syscall+0x4c/0x110 el0_svc_common+0x9c/0x190 do_el0_svc+0x30/0x98 el0_svc+0x28/0xd8 el0t_64_sync_handler+0x90/0xb8 el0t_64_sync+0x180/0x184 The reserved memory for kdump will be looked up by kmemleak, this area will be set invalid when kdump service is bring up. That will result in crash when kmemleak scan this area. Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private") Signed-off-by: Chen Wandun <chenwandun@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20210910064844.3827813-1-chenwandun@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-09-09arm64: mm: limit linear region to 51 bits for KVM in nVHE modeArd Biesheuvel
KVM in nVHE mode divides up its VA space into two equal halves, and picks the half that does not conflict with the HYP ID map to map its linear region. This worked fine when the kernel's linear map itself was guaranteed to cover precisely as many bits of VA space, but this was changed by commit f4693c2716b35d08 ("arm64: mm: extend linear region for 52-bit VA configurations"). The result is that, depending on the placement of the ID map, kernel-VA to hyp-VA translations may produce addresses that either conflict with other HYP mappings (including the ID map itself) or generate addresses outside of the 52-bit addressable range, neither of which is likely to lead to anything useful. Given that 52-bit capable cores are guaranteed to implement VHE, this only affects configurations such as pKVM where we opt into non-VHE mode even if the hardware is VHE capable. So just for these configurations, let's limit the kernel linear map to 51 bits and work around the problem. Fixes: f4693c2716b3 ("arm64: mm: extend linear region for 52-bit VA configurations") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20210826165613.60774-1-ardb@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-09-03Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc updates from Andrew Morton: "173 patches. Subsystems affected by this series: ia64, ocfs2, block, and mm (debug, pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap, bootmem, sparsemem, vmalloc, kasan, pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, compaction, mempolicy, memblock, oom-kill, migration, ksm, percpu, vmstat, and madvise)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (173 commits) mm/madvise: add MADV_WILLNEED to process_madvise() mm/vmstat: remove unneeded return value mm/vmstat: simplify the array size calculation mm/vmstat: correct some wrong comments mm/percpu,c: remove obsolete comments of pcpu_chunk_populated() selftests: vm: add COW time test for KSM pages selftests: vm: add KSM merging time test mm: KSM: fix data type selftests: vm: add KSM merging across nodes test selftests: vm: add KSM zero page merging test selftests: vm: add KSM unmerge test selftests: vm: add KSM merge test mm/migrate: correct kernel-doc notation mm: wire up syscall process_mrelease mm: introduce process_mrelease system call memblock: make memblock_find_in_range method private mm/mempolicy.c: use in_task() in mempolicy_slab_node() mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies mm/mempolicy: advertise new MPOL_PREFERRED_MANY mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY ...
2021-09-03memblock: make memblock_find_in_range method privateMike Rapoport
There are a lot of uses of memblock_find_in_range() along with memblock_reserve() from the times memblock allocation APIs did not exist. memblock_find_in_range() is the very core of memblock allocations, so any future changes to its internal behaviour would mandate updates of all the users outside memblock. Replace the calls to memblock_find_in_range() with an equivalent calls to memblock_phys_alloc() and memblock_phys_alloc_range() and make memblock_find_in_range() private method of memblock. This simplifies the callers, ensures that (unlikely) errors in memblock_reserve() are handled and improves maintainability of memblock_find_in_range(). Link: https://lkml.kernel.org/r/20210816122622.30279-1-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Kirill A. Shutemov <kirill.shtuemov@linux.intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [ACPI] Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Nick Kossifidis <mick@ics.forth.gr> [riscv] Tested-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-01Merge tag 'devicetree-for-5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree updates from Rob Herring: - Refactor arch kdump DT related code to a common implementation - Add fw_devlink tracking for 'phy-handle', 'leds', 'backlight', 'resets', and 'pwm' properties - Various clean-ups to DT FDT code - Fix a runtime error for !CONFIG_SYSFS - Convert Synopsys DW PCI and derivative binding docs to schemas. Add Toshiba Visconti PCIe binding. - Convert a bunch of memory controller bindings to schemas - Covert eeprom-93xx46, Samsung Exynos TRNG, Samsung Exynos IRQ combiner, arm-charlcd, img-ascii-lcd, UniPhier eFuse, Xilinx Zynq MPSoC FPGA, Xilinx Zynq MPSoC reset, Mediatek mmsys, Gemini boards, brcm,iproc-i2c, faraday,ftpci100, and ks8851 net to DT schema. - Extend nvmem bindings to handle bit offsets in unit-addresses - Add DT schemas for HiKey 970 PCIe PHY - Remove unused ZTE, energymicro,efm32-timer, and Exynos SATA bindings - Enable dtc pci_device_reg warning by default - Fixes for handling 'unevaluatedProperties' in preparation to enable pending support in the tooling for jsonschema 2020-12 draft * tag 'devicetree-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (78 commits) dt-bindings: display: remove zte,vou.txt binding doc dt-bindings: hwmon: merge max1619 into trivial devices dt-bindings: mtd-physmap: Add 'arm,vexpress-flash' compatible dt-bindings: PCI: imx6: convert the imx pcie controller to dtschema dt-bindings: Use 'enum' instead of 'oneOf' plus 'const' entries dt-bindings: Add vendor prefix for Topic Embedded Systems of: fdt: Rename reserve_elfcorehdr() to fdt_reserve_elfcorehdr() arm64: kdump: Remove custom linux,usable-memory-range handling arm64: kdump: Remove custom linux,elfcorehdr handling riscv: Remove non-standard linux,elfcorehdr handling of: fdt: Use IS_ENABLED(CONFIG_BLK_DEV_INITRD) instead of #ifdef of: fdt: Add generic support for handling usable memory range property of: fdt: Add generic support for handling elf core headers property crash_dump: Make elfcorehdr address/size symbols always visible dt-bindings: memory: convert Samsung Exynos DMC to dtschema dt-bindings: devfreq: event: convert Samsung Exynos PPMU to dtschema dt-bindings: devfreq: event: convert Samsung Exynos NoCP to dtschema kbuild: Enable dtc 'pci_device_reg' warning by default dt-bindings: soc: remove obsolete zte zx header dt-bindings: clock: remove obsolete zte zx header ...
2021-08-25Partially revert "arm64/mm: drop HAVE_ARCH_PFN_VALID"Will Deacon
This partially reverts commit 16c9afc776608324ca71c0bc354987bab532f51d. Alex Bee reports a regression in 5.14 on their RK3328 SoC when configuring the PL330 DMA controller: | ------------[ cut here ]------------ | WARNING: CPU: 2 PID: 373 at kernel/dma/mapping.c:235 dma_map_resource+0x68/0xc0 | Modules linked in: spi_rockchip(+) fuse | CPU: 2 PID: 373 Comm: systemd-udevd Not tainted 5.14.0-rc7 #1 | Hardware name: Pine64 Rock64 (DT) | pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--) | pc : dma_map_resource+0x68/0xc0 | lr : pl330_prep_slave_fifo+0x78/0xd0 This appears to be because dma_map_resource() is being called for a physical address which does not correspond to a memory address yet does have a valid 'struct page' due to the way in which the vmemmap is constructed. Prior to 16c9afc77660 ("arm64/mm: drop HAVE_ARCH_PFN_VALID"), the arm64 implementation of pfn_valid() called memblock_is_memory() to return 'false' for such regions and the DMA mapping request would proceed. However, now that we are using the generic implementation where only the presence of the memory map entry is considered, we return 'true' and erroneously fail with DMA_MAPPING_ERROR because we identify the region as DRAM. Although fixing this in the DMA mapping code is arguably the right fix, it is a risky, cross-architecture change at this stage in the cycle. So just revert arm64 back to its old pfn_valid() implementation for v5.14. The change to the generic pfn_valid() code is preserved from the original patch, so as to avoid impacting other architectures. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Christoph Hellwig <hch@lst.de> Reported-by: Alex Bee <knaerzche@gmail.com> Link: https://lore.kernel.org/r/d3a3c828-b777-faf8-e901-904995688437@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
2021-08-24arm64: kdump: Remove custom linux,usable-memory-range handlingGeert Uytterhoeven
Remove the architecture-specific code for handling the "linux,usable-memory-range" property under the "/chosen" node in DT, as the platform-agnostic FDT core code already takes care of this. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/7356c531c49a24b4a55577bf8e46d93f4d8ae460.1628670468.git.geert+renesas@glider.be
2021-08-24arm64: kdump: Remove custom linux,elfcorehdr handlingGeert Uytterhoeven
Remove the architecture-specific code for handling the "linux,elfcorehdr" property under the "/chosen" node in DT, as the platform-agnostic handling in the FDT core code already takes care of this. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/3b8f801f9b92066855e87f3079fafc153ab20f69.1628670468.git.geert+renesas@glider.be
2021-07-02Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: "190 patches. Subsystems affected by this patch series: mm (hugetlb, userfaultfd, vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock, migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap, zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc, core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs, signals, exec, kcov, selftests, compress/decompress, and ipc" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits) ipc/util.c: use binary search for max_idx ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock ipc: use kmalloc for msg_queue and shmid_kernel ipc sem: use kvmalloc for sem_undo allocation lib/decompressors: remove set but not used variabled 'level' selftests/vm/pkeys: exercise x86 XSAVE init state selftests/vm/pkeys: refill shadow register after implicit kernel write selftests/vm/pkeys: handle negative sys_pkey_alloc() return code selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random kcov: add __no_sanitize_coverage to fix noinstr for all architectures exec: remove checks in __register_bimfmt() x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned hfsplus: report create_date to kstat.btime hfsplus: remove unnecessary oom message nilfs2: remove redundant continue statement in a while-loop kprobes: remove duplicated strong free_insn_page in x86 and s390 init: print out unknown kernel parameters checkpatch: do not complain about positive return values starting with EPOLL checkpatch: improve the indented label test checkpatch: scripts/spdxcheck.py now requires python3 ...
2021-06-30arm64/mm: drop HAVE_ARCH_PFN_VALIDAnshuman Khandual
CONFIG_SPARSEMEM_VMEMMAP is now the only available memory model on arm64 platforms and free_unused_memmap() would just return without creating any holes in the memmap mapping. There is no need for any special handling in pfn_valid() and HAVE_ARCH_PFN_VALID can just be dropped. This also moves the pfn upper bits sanity check into generic pfn_valid(). Link: https://lkml.kernel.org/r/1621947349-25421-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-30arm64: drop pfn_valid_within() and simplify pfn_valid()Mike Rapoport
The arm64's version of pfn_valid() differs from the generic because of two reasons: * Parts of the memory map are freed during boot. This makes it necessary to verify that there is actual physical memory that corresponds to a pfn which is done by querying memblock. * There are NOMAP memory regions. These regions are not mapped in the linear map and until the previous commit the struct pages representing these areas had default values. As the consequence of absence of the special treatment of NOMAP regions in the memory map it was necessary to use memblock_is_map_memory() in pfn_valid() and to have pfn_valid_within() aliased to pfn_valid() so that generic mm functionality would not treat a NOMAP page as a normal page. Since the NOMAP regions are now marked as PageReserved(), pfn walkers and the rest of core mm will treat them as unusable memory and thus pfn_valid_within() is no longer required at all and can be disabled on arm64. pfn_valid() can be slightly simplified by replacing memblock_is_map_memory() with memblock_is_memory(). [rppt@kernel.org: fix merge fix] Link: https://lkml.kernel.org/r/YJtoQhidtIJOhYsV@kernel.org Link: https://lkml.kernel.org/r/20210511100550.28178-5-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-30arm64: decouple check whether pfn is in linear map from pfn_valid()Mike Rapoport
The intended semantics of pfn_valid() is to verify whether there is a struct page for the pfn in question and nothing else. Yet, on arm64 it is used to distinguish memory areas that are mapped in the linear map vs those that require ioremap() to access them. Introduce a dedicated pfn_is_map_memory() wrapper for memblock_is_map_memory() to perform such check and use it where appropriate. Using a wrapper allows to avoid cyclic include dependencies. While here also update style of pfn_valid() so that both pfn_valid() and pfn_is_map_memory() declarations will be consistent. Link: https://lkml.kernel.org/r/20210511100550.28178-4-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-25arm64/mm: Validate CONFIG_PGTABLE_LEVELSAnshuman Khandual
CONFIG_PGTABLE_LEVELS has been statically defined in (arch/arm64/Kconfig) depending on the page size and requested virtual address range. In order to validate this page table levels selection this adds a BUILD_BUG_ON() as per the existing formula ARM64_HW_PGTABLE_LEVELS(). This would help protect any inadvertent changes to CONFIG_PGTABLE_LEVELS selection. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/1620649326-24115-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-05-14arm64: do not set SWIOTLB_NO_FORCE when swiotlb is requiredChristoph Hellwig
Although SWIOTLB_NO_FORCE is meant to allow later calls to swiotlb_init, today dma_direct_map_page returns error if SWIOTLB_NO_FORCE. For now, without a larger overhaul of SWIOTLB_NO_FORCE, the best we can do is to avoid setting SWIOTLB_NO_FORCE in mem_init when we know that it is going to be required later (e.g. Xen requires it). CC: boris.ostrovsky@oracle.com CC: jgross@suse.com CC: catalin.marinas@arm.com CC: will@kernel.org CC: linux-arm-kernel@lists.infradead.org Fixes: 2726bf3ff252 ("swiotlb: Make SWIOTLB_NO_FORCE perform no allocation") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20210512201823.1963-2-sstabellini@kernel.org Signed-off-by: Juergen Gross <jgross@suse.com>
2021-05-07Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull more arm64 updates from Catalin Marinas: "A mix of fixes and clean-ups that turned up too late for the first pull request: - Restore terminal stack frame records. Their previous removal caused traces which cross secondary_start_kernel to terminate one entry too late, with a spurious "0" entry. - Fix boot warning with pseudo-NMI due to the way we manipulate the PMR register. - ACPI fixes: avoid corruption of interrupt mappings on watchdog probe failure (GTDT), prevent unregistering of GIC SGIs. - Force SPARSEMEM_VMEMMAP as the only memory model, it saves with having to test all the other combinations. - Documentation fixes and updates: tagged address ABI exceptions on brk/mmap/mremap(), event stream frequency, update booting requirements on the configuration of traps" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: kernel: Update the stale comment arm64: Fix the documented event stream frequency arm64: entry: always set GIC_PRIO_PSR_I_SET during entry arm64: Explicitly document boot requirements for SVE arm64: Explicitly require that FPSIMD instructions do not trap arm64: Relax booting requirements for configuration of traps arm64: cpufeatures: use min and max arm64: stacktrace: restore terminal records arm64/vdso: Discard .note.gnu.property sections in vDSO arm64: doc: Add brk/mmap/mremap() to the Tagged Address ABI Exceptions psci: Remove unneeded semicolon ACPI: irq: Prevent unregistering of GIC SGIs ACPI: GTDT: Don't corrupt interrupt mappings on watchdow probe failure arm64: Show three registers per line arm64: remove HAVE_DEBUG_BUGVERBOSE arm64: alternative: simplify passing alt_region arm64: Force SPARSEMEM_VMEMMAP as the only memory management model arm64: vdso32: drop -no-integrated-as flag
2021-05-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "This is a large update by KVM standards, including AMD PSP (Platform Security Processor, aka "AMD Secure Technology") and ARM CoreSight (debug and trace) changes. ARM: - CoreSight: Add support for ETE and TRBE - Stage-2 isolation for the host kernel when running in protected mode - Guest SVE support when running in nVHE mode - Force W^X hypervisor mappings in nVHE mode - ITS save/restore for guests using direct injection with GICv4.1 - nVHE panics now produce readable backtraces - Guest support for PTP using the ptp_kvm driver - Performance improvements in the S2 fault handler x86: - AMD PSP driver changes - Optimizations and cleanup of nested SVM code - AMD: Support for virtual SPEC_CTRL - Optimizations of the new MMU code: fast invalidation, zap under read lock, enable/disably dirty page logging under read lock - /dev/kvm API for AMD SEV live migration (guest API coming soon) - support SEV virtual machines sharing the same encryption context - support SGX in virtual machines - add a few more statistics - improved directed yield heuristics - Lots and lots of cleanups Generic: - Rework of MMU notifier interface, simplifying and optimizing the architecture-specific code - a handful of "Get rid of oprofile leftovers" patches - Some selftests improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (379 commits) KVM: selftests: Speed up set_memory_region_test selftests: kvm: Fix the check of return value KVM: x86: Take advantage of kvm_arch_dy_has_pending_interrupt() KVM: SVM: Skip SEV cache flush if no ASIDs have been used KVM: SVM: Remove an unnecessary prototype declaration of sev_flush_asids() KVM: SVM: Drop redundant svm_sev_enabled() helper KVM: SVM: Move SEV VMCB tracking allocation to sev.c KVM: SVM: Explicitly check max SEV ASID during sev_hardware_setup() KVM: SVM: Unconditionally invoke sev_hardware_teardown() KVM: SVM: Enable SEV/SEV-ES functionality by default (when supported) KVM: SVM: Condition sev_enabled and sev_es_enabled on CONFIG_KVM_AMD_SEV=y KVM: SVM: Append "_enabled" to module-scoped SEV/SEV-ES control variables KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features KVM: SVM: Move SEV module params/variables to sev.c KVM: SVM: Disable SEV/SEV-ES if NPT is disabled KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails KVM: SVM: Zero out the VMCB array used to track SEV ASID association x86/sev: Drop redundant and potentially misleading 'sev_enabled' KVM: x86: Move reverse CPUID helpers to separate header file KVM: x86: Rename GPR accessors to make mode-aware variants the defaults ...