path: root/arch/s390
Age  Commit message  Author
2024-05-17  s390/mm: Fix clearing storage keys for huge pages  (Claudio Imbrenda)
[ Upstream commit 412050af2ea39407fe43324b0be4ab641530ce88 ] The function __storage_key_init_range() expects the end address to be the first byte outside the range to be initialized. I.e. end - start should be the size of the area to be initialized. The current code works because __storage_key_init_range() will still loop over every page in the range, but it is slower than using sske_frame(). Fixes: 3afdfca69870 ("s390/mm: Clear skeys for newly mapped huge guest pmds") Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20240416114220.28489-3-imbrenda@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
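To make the end-address convention concrete, here is a minimal sketch (not the actual patch): __storage_key_init_range() takes an exclusive end address, so the caller passes start + size. The function and variable names below are illustrative; the same convention applies to the next entry as well.

    /* Hedged sketch of the calling convention described above. */
    static void init_huge_page_skeys(unsigned long paddr, unsigned long size)
    {
            /* end is the first byte outside the range, i.e. end - start == size */
            __storage_key_init_range(paddr, paddr + size);

            /* Passing paddr + size - 1 still initializes every page (the helper
             * falls back to a page-by-page loop) but misses the faster
             * sske_frame() path, which is what the commit addresses. */
    }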
2024-05-17  s390/mm: Fix storage key clearing for guest huge pages  (Claudio Imbrenda)
[ Upstream commit 843c3280686fc1a83d89ee1e0b5599c9f6b09d0c ] The function __storage_key_init_range() expects the end address to be the first byte outside the range to be initialized. I.e. end - start should be the size of the area to be initialized. The current code works because __storage_key_init_range() will still loop over every page in the range, but it is slower than using sske_frame(). Fixes: 964c2c05c9f3 ("s390/mm: Clear huge page storage keys on enable_skey") Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20240416114220.28489-2-imbrenda@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-13  s390/entry: align system call table on 8 bytes  (Sumanth Korikkar)
commit 378ca2d2ad410a1cd5690d06b46c5e2297f4c8c0 upstream. Align system call table on 8 bytes. With sys_call_table entry size of 8 bytes that eliminates the possibility of a system call pointer crossing cache line boundary. Cc: stable@kernel.org Suggested-by: Ulrich Weigand <ulrich.weigand@de.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-26  s390/vtime: fix average steal time calculation  (Mete Durlu)
[ Upstream commit 367c50f78451d3bd7ad70bc5c89f9ba6dec46ca9 ] Current average steal timer calculation produces volatile and inflated values. The only user of this value is KVM so far and it uses that to decide whether or not to yield the vCPU which is seeing steal time. KVM compares average steal timer to a threshold and if the threshold is exceeded then it does not allow CPU polling and yields it to host, else it keeps the CPU by polling. Since KVM's steal time threshold is very low by default (10%) it most likely is not affected much by the bloated average steal timer values because the operating region is pretty small. However there might be new users in the future who might rely on this number. Fix average steal timer calculation by changing the formula from: avg_steal_timer = avg_steal_timer / 2 + steal_timer; to the following: avg_steal_timer = (avg_steal_timer + steal_timer) / 2; This ensures that avg_steal_timer is actually a naive average of steal timer values. It now closely follows steal timer values but of course in a smoother manner. Fixes: 152e9b8676c6 ("s390/vtime: steal time exponential moving average") Signed-off-by: Mete Durlu <meted@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
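A small user-space illustration (not kernel code) of why the old formula inflates the average while the new one stays within the range of observed values:

    #include <stdio.h>

    int main(void)
    {
            unsigned long long old_avg = 0, new_avg = 0;
            const unsigned long long steal = 1000;   /* constant steal time per interval */

            for (int i = 0; i < 20; i++) {
                    old_avg = old_avg / 2 + steal;        /* formula before the fix */
                    new_avg = (new_avg + steal) / 2;      /* formula after the fix  */
            }
            /* old_avg converges towards 2 * steal, new_avg towards steal */
            printf("old: %llu  new: %llu  sample: %llu\n", old_avg, new_avg, steal);
            return 0;
    }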
2024-03-01  s390: use the correct count for __iowrite64_copy()  (Jason Gunthorpe)
[ Upstream commit 723a2cc8d69d4342b47dfddbfe6c19f1b135f09b ] The signature for __iowrite64_copy() requires the number of 64 bit quantities, not bytes. Multiply by 8 to get a byte length before invoking zpci_memcpy_toio(). Fixes: 87bc359b9822 ("s390/pci: speed up __iowrite64_copy by using pci store block insn") Acked-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v1-9223d11a7662+1d7785-s390_iowrite64_jgg@nvidia.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
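A minimal sketch of the unit conversion the commit describes (signatures simplified): __iowrite64_copy() is given a count of 64-bit words, while zpci_memcpy_toio() expects a length in bytes.

    /* Sketch only: count is in 8-byte units, the backing copy wants bytes. */
    void __iowrite64_copy(void __iomem *to, const void *from, size_t count)
    {
            zpci_memcpy_toio(to, from, count * 8);   /* multiply by 8 for bytes */
    }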
2024-02-23  KVM: s390: fix setting of fpc register  (Heiko Carstens)
[ Upstream commit b988b1bb0053c0dcd26187d29ef07566a565cf55 ] kvm_arch_vcpu_ioctl_set_fpu() allows to set the floating point control (fpc) register of a guest cpu. The new value is tested for validity by temporarily loading it into the fpc register. This may lead to corruption of the fpc register of the host process: if an interrupt happens while the value is temporarily loaded into the fpc register, and within interrupt context floating point or vector registers are used, the current fp/vx registers are saved with save_fpu_regs() assuming they belong to user space and will be loaded into fp/vx registers when returning to user space. test_fp_ctl() restores the original user space / host process fpc register value, however it will be discarded, when returning to user space. In result the host process will incorrectly continue to run with the value that was supposed to be used for a guest cpu. Fix this by simply removing the test. There is another test right before the SIE context is entered which handles invalid values. This results in a change of behaviour: invalid values will now be accepted instead of the ioctl failing with -EINVAL. This seems to be acceptable, given that this interface is most likely not used anymore, and this is in addition the same behaviour implemented with the memory mapped interface (replace invalid values with zero) - see sync_regs() in kvm-s390.c. Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-23  s390/ptrace: handle setting of fpc register correctly  (Heiko Carstens)
[ Upstream commit 8b13601d19c541158a6e18b278c00ba69ae37829 ] If the content of the floating point control (fpc) register of a traced process is modified with the ptrace interface the new value is tested for validity by temporarily loading it into the fpc register. This may lead to corruption of the fpc register of the tracing process: if an interrupt happens while the value is temporarily loaded into the fpc register, and within interrupt context floating point or vector registers are used, the current fp/vx registers are saved with save_fpu_regs() assuming they belong to user space and will be loaded into fp/vx registers when returning to user space. test_fp_ctl() restores the original user space fpc register value, however it will be discarded, when returning to user space. In result the tracer will incorrectly continue to run with the value that was supposed to be used for the traced process. Fix this by saving fpu register contents with save_fpu_regs() before using test_fp_ctl(). Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-25  s390/pci: fix max size calculation in zpci_memcpy_toio()  (Niklas Schnelle)
[ Upstream commit 80df7d6af7f6d229b34cf237b2cc9024c07111cd ] The zpci_get_max_write_size() helper is used to determine the maximum size a PCI store or load can use at a given __iomem address. For the PCI block store the following restrictions apply: 1. The dst + len must not cross a 4K boundary in the (pseudo-)MMIO space 2. len must not exceed ZPCI_MAX_WRITE_SIZE 3. len must be a multiple of 8 bytes 4. The src address must be double word (8 byte) aligned 5. The dst address must be double word (8 byte) aligned Otherwise only a normal PCI store which takes its src value from a register can be used. For these PCI store restriction 1 still applies. Similarly 1 also applies to PCI loads. It turns out zpci_max_write_size() instead implements stricter conditions which prevents PCI block stores from being used where they can and should be used. In particular instead of conditions 4 and 5 it wrongly enforces both dst and src to be size aligned. This indirectly covers condition 1 but also prevents many legal PCI block stores. On top of the functional shortcomings the zpci_get_max_write_size() is misnamed as it is used for both read and write size calculations. Rename it to zpci_get_max_io_size() and implement the listed conditions explicitly. Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Fixes: cd24834130ac ("s390/pci: base support") Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> [agordeev@linux.ibm.com replaced spaces with tabs] Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
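The restrictions listed above can be expressed as a small, self-contained check. This is only an illustrative sketch (the ZPCI_MAX_WRITE_SIZE value is assumed), not the kernel's zpci_get_max_io_size() implementation:

    #include <stdbool.h>
    #include <stdint.h>

    #define ZPCI_MAX_WRITE_SIZE 128   /* assumed limit for this sketch */

    static bool pci_block_store_allowed(uint64_t src, uint64_t dst, uint64_t len)
    {
            if ((dst & 0xfffULL) + len > 0x1000)  /* 1: dst + len must not cross a 4K boundary */
                    return false;
            if (len > ZPCI_MAX_WRITE_SIZE)        /* 2: length limit */
                    return false;
            if (len & 7)                          /* 3: multiple of 8 bytes */
                    return false;
            if (src & 7)                          /* 4: src doubleword aligned */
                    return false;
            if (dst & 7)                          /* 5: dst doubleword aligned */
                    return false;
            return true;                          /* block store may be used */
    }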
2024-01-08  s390/vx: fix save/restore of fpu kernel context  (Heiko Carstens)
[ Upstream commit e6b2dab41888332bf83f592131e7ea07756770a4 ] The KERNEL_FPR mask only contains a flag for the first eight vector registers. However floating point registers overlay parts of the first sixteen vector registers. This could lead to vector register corruption if a kernel fpu context uses any of the vector registers 8 to 15 and is interrupted or calls a KERNEL_FPR context. If that context uses also vector registers 8 to 15, their contents will be corrupted on return. Luckily this is currently not a real bug, since the kernel has only one KERNEL_FPR user with s390_adjust_jiffies() and it is only using floating point registers 0 to 2. Fix this by using the correct bits for KERNEL_FPR. Fixes: 7f79695cc1b6 ("s390/fpu: improve kernel_fpu_[begin|end]") Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-13  KVM: s390/mm: Properly reset no-dat  (Claudio Imbrenda)
commit 27072b8e18a73ffeffb1c140939023915a35134b upstream. When the CMMA state needs to be reset, the no-dat bit also needs to be reset. Failure to do so could cause issues in the guest, since the guest expects the bit to be cleared after a reset. Cc: <stable@vger.kernel.org> Reviewed-by: Nico Boehr <nrb@linux.ibm.com> Message-ID: <20231109123624.37314-1-imbrenda@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-08  s390/cmma: fix detection of DAT pages  (Heiko Carstens)
[ Upstream commit 44d93045247661acbd50b1629e62f415f2747577 ] If the cmma no-dat feature is available the kernel page tables are walked to identify and mark all pages which are used for address translation (all region, segment, and page tables). In a subsequent loop all other pages are marked as "no-dat" pages with the ESSA instruction. This information is visible to the hypervisor, so that the hypervisor can optimize purging of guest TLB entries. The initial loop however is incorrect: only the first three of the four pages which belong to segment and region tables will be marked as being used for DAT. The last page is incorrectly marked as no-dat. This can result in incorrect guest TLB flushes. Fix this by simply marking all four pages. Cc: <stable@vger.kernel.org> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-08  s390/mm: fix phys vs virt confusion in mark_kernel_pXd() functions family  (Alexander Gordeev)
[ Upstream commit 3784231b1e091857bd129fd9658a8b3cedbdcd58 ] Due to historical reasons mark_kernel_pXd() functions misuse the notion of physical vs virtual addresses difference. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Stable-dep-of: 44d930452476 ("s390/cmma: fix detection of DAT pages") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25  s390/pci: fix iommu bitmap allocation  (Niklas Schnelle)
commit c1ae1c59c8c6e0b66a718308c623e0cb394dab6b upstream. Since the fixed commits both zdev->iommu_bitmap and zdev->lazy_bitmap are allocated as vzalloc(zdev->iommu_pages / 8). The problem is that zdev->iommu_bitmap is a pointer to unsigned long but the above only yields an allocation that is a multiple of sizeof(unsigned long) which is 8 on s390x if the number of IOMMU pages is a multiple of 64. This in turn is the case only if the effective IOMMU aperture is a multiple of 64 * 4K = 256K. This is usually the case and so didn't cause visible issues since both the virt_to_phys(high_memory) reduced limit and hardware limits use nice numbers. Under KVM, and in particular with QEMU limiting the IOMMU aperture to the vfio DMA limit (default 65535), it is possible for the reported aperture not to be a multiple of 256K however. In this case we end up with an iommu_bitmap whose allocation is not a multiple of 8 causing bitmap operations to access it out of bounds. Sadly we can't just fix this in the obvious way and use bitmap_zalloc() because for large RAM systems (tested on 8 TiB) the zdev->iommu_bitmap grows too large for kmalloc(). So add our own bitmap_vzalloc() wrapper. This might be a candidate for common code, but this area of code will be replaced by the upcoming conversion to use the common code DMA API on s390 so just add a local routine. Fixes: 224593215525 ("s390/pci: use virtual memory for iommu bitmap") Fixes: 13954fd6913a ("s390/pci_dma: improve lazy flush for unmap") Cc: stable@vger.kernel.org Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
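A user-space illustration of the rounding problem: dividing the number of IOMMU pages by 8 can come up short of a whole number of unsigned longs, which is the granularity bitmap operations actually access.

    #include <stdio.h>

    #define BITS_PER_LONG    (8 * sizeof(unsigned long))
    #define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

    int main(void)
    {
            unsigned long npages = 65535;   /* e.g. an aperture limited by the vfio DMA limit */

            printf("npages / 8 bytes        : %lu\n", npages / 8);
            printf("whole-longs bytes needed: %lu\n",
                   (unsigned long)(BITS_TO_LONGS(npages) * sizeof(unsigned long)));
            return 0;
    }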
2023-09-23  s390/ipl: add missing secure/has_secure file to ipl type 'unknown'  (Sven Schnelle)
commit ea5717cb13468323a7c3dd394748301802991f39 upstream. OS installers are relying on /sys/firmware/ipl/has_secure to be present on machines supporting secure boot. This file is present for all IPL types, but not the unknown type, which prevents a secure installation when an LPAR is booted in HMC via FTP(s), because this is an unknown IPL type in linux. While at it, also add the secure file. Fixes: c9896acc7851 ("s390/ipl: Provide has_secure sysfs attribute") Cc: stable@vger.kernel.org Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-11  KVM: s390: fix sthyi error handling  (Heiko Carstens)
[ Upstream commit 0c02cc576eac161601927b41634f80bfd55bfa9e ] Commit 9fb6c9b3fea1 ("s390/sthyi: add cache to store hypervisor info") added cache handling for store hypervisor info. This also changed the possible return code for sthyi_fill(). Instead of only returning a condition code like the sthyi instruction would do, it can now also return a negative error value (-ENOMEM). handle_sthyi() was not changed accordingly. In case of an error, the negative error value would be incorrectly injected into the guest PSW. Add proper error handling to prevent this, and update the comment which describes the possible return values of sthyi_fill(). Fixes: 9fb6c9b3fea1 ("s390/sthyi: add cache to store hypervisor info") Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Link: https://lore.kernel.org/r/20230727182939.2050744-1-hca@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-27  KVM: s390: vsie: fix the length of APCB bitmap  (Pierre Morel)
[ Upstream commit 246be7d2720ea9a795b576067ecc5e5c7a1e7848 ] bit_and() uses the count of bits as the working length. Fix the previous implementation and effectively use the right bitmap size. Fixes: 19fd83a64718 ("KVM: s390: vsie: allow CRYCB FORMAT-1") Fixes: 56019f9aca22 ("KVM: s390: vsie: Allow CRYCB FORMAT-2") Signed-off-by: Pierre Morel <pmorel@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/kvm/20230511094719.9691-1-pmorel@linux.ibm.com/ Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-27  KVM: s390: fix KVM_S390_GET_CMMA_BITS for GFNs in memslot holes  (Nico Boehr)
[ Upstream commit 285cff4c0454340a4dc53f46e67f2cb1c293bd74 ] The KVM_S390_GET_CMMA_BITS ioctl may return incorrect values when userspace specifies a start_gfn outside of memslots. This can occur when a VM has multiple memslots with a hole in between:

      +-----+----------+--------+--------+
      | ... | Slot N-1 | <hole> | Slot N |
      +-----+----------+--------+--------+
      ^                ^        ^        ^
      |                |        |        |
GFN   A               A+B     A+B+C   A+B+C+D

When userspace specifies a GFN in [A+B, A+B+C), it would expect to get the CMMA values of the first dirty page in Slot N. However, userspace may get a start_gfn of A+B+C+D with a count of 0, hence completely skipping over any dirty pages in slot N. The error is in kvm_s390_next_dirty_cmma(), which assumes gfn_to_memslot_approx() will return the memslot _below_ the specified GFN when the specified GFN lies outside a memslot. In reality it may return either the memslot below or above the specified GFN. When a memslot above the specified GFN is returned this happens:
- ofs is calculated, but since the memslot's base_gfn is larger than the specified cur_gfn, ofs will underflow to a huge number.
- ofs is passed to find_next_bit(). Since ofs will exceed the memslot's number of pages, the number of pages in the memslot is returned, completely skipping over all bits in the memslot userspace would be interested in.
Fix this by resetting ofs to zero when a memslot _above_ cur_gfn is returned (cur_gfn < ms->base_gfn). Signed-off-by: Nico Boehr <nrb@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Fixes: afdad61615cc ("KVM: s390: Fix storage attributes migration with memory slots") Message-Id: <20230324145424.293889-2-nrb@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
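A sketch of the fix described above (simplified; slot_dirty_bitmap() is a placeholder, not the real accessor): when the returned memslot lies above cur_gfn, scan it from bit 0 instead of letting the offset underflow.

    unsigned long ofs;

    if (cur_gfn < ms->base_gfn)
            ofs = 0;                          /* slot is above cur_gfn: start at bit 0 */
    else
            ofs = cur_gfn - ms->base_gfn;     /* normal case: offset into the slot */

    ofs = find_next_bit(slot_dirty_bitmap(ms), ms->npages, ofs);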
2023-06-09  treewide: Remove uninitialized_var() usage  (Kees Cook)
commit 3f649ab728cda8038259d8f14492fe400fbab911 upstream. Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \ xargs perl -pi -e \ 's/\buninitialized_var\(([^\)]+)\)/\1/g; s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-26  s390/ptrace: fix PTRACE_GET_LAST_BREAK error handling  (Heiko Carstens)
[ Upstream commit f9bbf25e7b2b74b52b2f269216a92657774f239c ] Return -EFAULT if put_user() for the PTRACE_GET_LAST_BREAK request fails, instead of silently ignoring it. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-05  s390/uaccess: add missing earlyclobber annotations to __clear_user()  (Heiko Carstens)
commit 89aba4c26fae4e459f755a18912845c348ee48f3 upstream. Add missing earlyclobber annotation to size, to, and tmp2 operands of the __clear_user() inline assembly since they are modified or written to before the last usage of all input operands. This can lead to incorrect register allocation for the inline assembly. Fixes: 6c2a9e6df604 ("[S390] Use alternative user-copy operations for new hardware.") Reported-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/all/20230321122514.1743889-3-mark.rutland@arm.com/ Cc: stable@vger.kernel.org Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-22  s390/ipl: add missing intersection check to ipl_report handling  (Sven Schnelle)
commit a52e5cdbe8016d4e3e6322fd93d71afddb9a5af9 upstream. The code which handles the ipl report is searching for a free location in memory where it could copy the component and certificate entries to. It checks for intersection between the sections required for the kernel and the component/certificate data area, but fails to check whether the data structures linking these data areas together intersect. This might cause the ipl report copy code to overwrite the ipl report itself. Fix this by adding two additional intersection checks. Cc: <stable@vger.kernel.org> Fixes: 9641b8cc733f ("s390/ipl: read IPL report at early boot") Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17  s390: define RUNTIME_DISCARD_EXIT to fix link error with GNU ld < 2.36  (Masahiro Yamada)
commit a494398bde273143c2352dd373cad8211f7d94b2 upstream. Nathan Chancellor reports that the s390 vmlinux fails to link with GNU ld < 2.36 since commit 99cb0d917ffa ("arch: fix broken BuildID for arm64 and riscv"). It happens for defconfig, or more specifically for CONFIG_EXPOLINE=y. $ s390x-linux-gnu-ld --version | head -n1 GNU ld (GNU Binutils for Debian) 2.35.2 $ make -s ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- allnoconfig $ ./scripts/config -e CONFIG_EXPOLINE $ make -s ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- olddefconfig $ make -s ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- `.exit.text' referenced in section `.s390_return_reg' of drivers/base/dd.o: defined in discarded section `.exit.text' of drivers/base/dd.o make[1]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1 make: *** [Makefile:1252: vmlinux] Error 2 arch/s390/kernel/vmlinux.lds.S wants to keep EXIT_TEXT: .exit.text : { EXIT_TEXT } But, at the same time, EXIT_TEXT is thrown away by DISCARD because s390 does not define RUNTIME_DISCARD_EXIT. I still do not understand why the latter wins after 99cb0d917ffa, but defining RUNTIME_DISCARD_EXIT seems correct because the comment line in arch/s390/kernel/vmlinux.lds.S says: /* * .exit.text is discarded at runtime, not link time, * to deal with references from __bug_table */ Nathan also found that binutils commit 21401fc7bf67 ("Duplicate output sections in scripts") cured this issue, so we cannot reproduce it with binutils 2.36+, but it is better to not rely on it. Fixes: 99cb0d917ffa ("arch: fix broken BuildID for arm64 and riscv") Link: https://lore.kernel.org/all/Y7Jal56f6UBh1abE@dev-arch.thelio-3990X/ Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20230105031306.1455409-1-masahiroy@kernel.org Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Tom Saeger <tom.saeger@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11  KVM: s390: disable migration mode when dirty tracking is disabled  (Nico Boehr)
commit f2d3155e2a6bac44d16f04415a321e8707d895c6 upstream. Migration mode is a VM attribute which enables tracking of changes in storage attributes (PGSTE). It assumes dirty tracking is enabled on all memslots to keep a dirty bitmap of pages with changed storage attributes. When enabling migration mode, we currently check that dirty tracking is enabled for all memslots. However, userspace can disable dirty tracking without disabling migration mode. Since migration mode is pointless with dirty tracking disabled, disable migration mode whenever userspace disables dirty tracking on any slot. Also update the documentation to clarify that dirty tracking must be enabled when enabling migration mode, which is already enforced by the code in kvm_s390_vm_start_migration(). Also highlight in the documentation for KVM_S390_GET_CMMA_BITS that it can now fail with -EINVAL when dirty tracking is disabled while migration mode is on. Move all the error codes to a table so this stays readable. To disable migration mode, slots_lock should be held, which is taken in kvm_set_memory_region() and thus held in kvm_arch_prepare_memory_region(). Restructure the prepare code a bit so all the sanity checking is done before disabling migration mode. This ensures migration mode isn't disabled when some sanity check fails. Cc: stable@vger.kernel.org Fixes: 190df4a212a7 ("KVM: s390: CMMA tracking, ESSA emulation, migration mode") Signed-off-by: Nico Boehr <nrb@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20230127140532.230651-2-nrb@linux.ibm.com Message-Id: <20230127140532.230651-2-nrb@linux.ibm.com> [frankja@linux.ibm.com: fixed commit message typo, moved api.rst error table upwards] Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11  s390/kprobes: fix current_kprobe never cleared after kprobes reenter  (Vasily Gorbik)
commit cd57953936f2213dfaccce10d20f396956222c7d upstream. Recent test_kprobe_missed kprobes kunit test uncovers the following problem. Once kprobe is triggered from another kprobe (kprobe reenter), all future kprobes on this cpu are considered as kprobe reenter, thus pre_handler and post_handler are not being called and kprobes are counted as "missed". Commit b9599798f953 ("[S390] kprobes: activation and deactivation") introduced a simpler scheme for kprobes (de)activation and status tracking by using push_kprobe/pop_kprobe, which is supposed to work for both initial kprobe entry as well as kprobe reentry and helps to avoid handling those two cases differently. The problem is that a sequence of calls in case of kprobes reenter:
push_kprobe() <- NULL (current_kprobe)
push_kprobe() <- kprobe1 (current_kprobe)
pop_kprobe()  -> kprobe1 (current_kprobe)
pop_kprobe()  -> kprobe1 (current_kprobe)
leaves "kprobe1" as "current_kprobe" on this cpu, instead of setting it to NULL. In fact push_kprobe/pop_kprobe can only store a single state (there is just one prev_kprobe in kprobe_ctlblk). Which is a hack but sufficient, there is no need to have another prev_kprobe just to store NULL. To make a simple and backportable fix simply reset "prev_kprobe" when kprobe is popped from this "stack". No need to worry about "kprobe_status" in this case, because its value is only checked when current_kprobe != NULL. Cc: stable@vger.kernel.org Fixes: b9599798f953 ("[S390] kprobes: activation and deactivation") Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
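A simplified sketch of the pop path the commit changes (the real state lives in the per-cpu kprobe_ctlblk; this is not the verbatim kernel code):

    static void pop_kprobe(struct kprobe_ctlblk *kcb)
    {
            __this_cpu_write(current_kprobe, kcb->prev_kprobe.kp);
            kcb->kprobe_status = kcb->prev_kprobe.status;

            /* There is only one prev_kprobe slot, so invalidate it once it has
             * been consumed; otherwise a later pop re-installs a stale kprobe
             * as current_kprobe instead of NULL. */
            kcb->prev_kprobe.kp = NULL;
    }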
2023-03-11  s390/kprobes: fix irq mask clobbering on kprobe reenter from post_handler  (Vasily Gorbik)
commit 42e19e6f04984088b6f9f0507c4c89a8152d9730 upstream. Recent test_kprobe_missed kprobes kunit test uncovers the following error (reported when CONFIG_DEBUG_ATOMIC_SLEEP is enabled): BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580 in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 662, name: kunit_try_catch preempt_count: 0, expected: 0 RCU nest depth: 0, expected: 0 no locks held by kunit_try_catch/662. irq event stamp: 280 hardirqs last enabled at (279): [<00000003e60a3d42>] __do_pgm_check+0x17a/0x1c0 hardirqs last disabled at (280): [<00000003e3bd774a>] kprobe_exceptions_notify+0x27a/0x318 softirqs last enabled at (0): [<00000003e3c5c890>] copy_process+0x14a8/0x4c80 softirqs last disabled at (0): [<0000000000000000>] 0x0 CPU: 46 PID: 662 Comm: kunit_try_catch Tainted: G N 6.2.0-173644-g44c18d77f0c0 #2 Hardware name: IBM 3931 A01 704 (LPAR) Call Trace: [<00000003e60a3a00>] dump_stack_lvl+0x120/0x198 [<00000003e3d02e82>] __might_resched+0x60a/0x668 [<00000003e60b9908>] __mutex_lock+0xc0/0x14e0 [<00000003e60bad5a>] mutex_lock_nested+0x32/0x40 [<00000003e3f7b460>] unregister_kprobe+0x30/0xd8 [<00000003e51b2602>] test_kprobe_missed+0xf2/0x268 [<00000003e51b5406>] kunit_try_run_case+0x10e/0x290 [<00000003e51b7dfa>] kunit_generic_run_threadfn_adapter+0x62/0xb8 [<00000003e3ce30f8>] kthread+0x2d0/0x398 [<00000003e3b96afa>] __ret_from_fork+0x8a/0xe8 [<00000003e60ccada>] ret_from_fork+0xa/0x40 The reason for this error report is that kprobes handling code failed to restore irqs. The problem is that when kprobe is triggered from another kprobe post_handler current sequence of enable_singlestep / disable_singlestep is the following: enable_singlestep <- original kprobe (saves kprobe_saved_imask) enable_singlestep <- kprobe triggered from post_handler (clobbers kprobe_saved_imask) disable_singlestep <- kprobe triggered from post_handler (restores kprobe_saved_imask) disable_singlestep <- original kprobe (restores wrong clobbered kprobe_saved_imask) There is just one kprobe_ctlblk per cpu and both calls saves and loads irq mask to kprobe_saved_imask. To fix the problem simply move resume_execution (which calls disable_singlestep) before calling post_handler. This also fixes the problem that post_handler is called with pt_regs which were not yet adjusted after single-stepping. Cc: stable@vger.kernel.org Fixes: 4ba069b802c2 ("[S390] add kprobes support.") Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11  s390: discard .interp section  (Ilya Leoshkevich)
commit e9c9cb90e76ffaabcc7ca8f275d9e82195fd6367 upstream. When debugging vmlinux with QEMU + GDB, the following GDB error may occur: (gdb) c Continuing. Warning: Cannot insert breakpoint -1. Cannot access memory at address 0xffffffffffff95c0 Command aborted. (gdb) The reason is that, when .interp section is present, GDB tries to locate the file specified in it in memory and put a number of breakpoints there (see enable_break() function in gdb/solib-svr4.c). Sometimes GDB finds a bogus location that matches its heuristics, fails to set a breakpoint and stops. This makes further debugging impossible. The .interp section contains misleading information anyway (vmlinux does not need ld.so), so fix by discarding it. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22  s390/decompressor: specify __decompress() buf len to avoid overflow  (Vasily Gorbik)
[ Upstream commit 7ab41c2c08a32132ba8c14624910e2fe8ce4ba4b ] Historically calls to __decompress() didn't specify "out_len" parameter on many architectures including s390, expecting that no writes beyond uncompressed kernel image are performed. This has changed since commit 2aa14b1ab2c4 ("zstd: import usptream v1.5.2") which includes zstd library commit 6a7ede3dfccb ("Reduce size of dctx by reutilizing dst buffer (#2751)"). Now zstd decompression code might store literal buffer in the unwritten portion of the destination buffer. Since "out_len" is not set, it is considered to be unlimited and hence free to use for optimization needs. On s390 this might corrupt initrd or ipl report which are often placed right after the decompressor buffer. Luckily the size of uncompressed kernel image is already known to the decompressor, so to avoid the problem simply specify it in the "out_len" parameter. Link: https://github.com/facebook/zstd/commit/6a7ede3dfccb Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com> Link: https://lore.kernel.org/r/patch-1.thread-41c676.git-41c676c2d153.your-ad-here.call-01675030179-ext-9637@work.hours Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-06  exit: Add and use make_task_dead.  (Eric W. Biederman)
commit 0e25498f8cd43c1b5aa327f373dd094e9a006da7 upstream. There are two big uses of do_exit. The first is its designed use as the guts of the exit(2) system call. The second use is to terminate a task after something catastrophic has happened like a NULL pointer in kernel code. Add a function make_task_dead that is initially exactly the same as do_exit to cover the cases where do_exit is called to handle catastrophic failure. In time this can probably be reduced to just a light wrapper around do_task_dead. For now keep it exactly the same so that there will be no behavioral differences introducing this new concept. Replace all of the uses of do_exit that use it for catastrophic task cleanup with make_task_dead to make it clear what the code is doing. As part of this, rename rewind_stack_do_exit to rewind_stack_and_make_dead. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-06  KVM: s390: interrupt: use READ_ONCE() before cmpxchg()  (Heiko Carstens)
[ Upstream commit 42400d99e9f0728c17240edb9645637ead40f6b9 ] Use READ_ONCE() before cmpxchg() to prevent that the compiler generates code that fetches the to be compared old value several times from memory. Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20230109145456.2895385-1-hca@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
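The pattern the commit applies, shown as a generic sketch (PENDING_BIT and the surrounding loop are illustrative, not the KVM interrupt code): read the old value exactly once so the compiler cannot re-fetch it between computing the new value and the cmpxchg().

    do {
            old = READ_ONCE(*ptr);            /* single, explicit read of the old value */
            new = old | PENDING_BIT;          /* derive the new value from that snapshot */
    } while (cmpxchg(ptr, old, new) != old);  /* retry if someone else changed *ptr */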
2023-02-06  s390/debug: add _ASM_S390_ prefix to header guard  (Niklas Schnelle)
[ Upstream commit 0d4d52361b6c29bf771acd4fa461f06d78fb2fac ] Using DEBUG_H without a prefix is very generic and inconsistent with other header guards in arch/s390/include/asm. In fact it collides with the same name in the ath9k wireless driver though that depends on !S390 via disabled wireless support. Let's just use a consistent header guard name and prevent possible future trouble. Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-18  s390/percpu: add READ_ONCE() to arch_this_cpu_to_op_simple()  (Heiko Carstens)
commit e3f360db08d55a14112bd27454e616a24296a8b0 upstream. Make sure that *ptr__ within arch_this_cpu_to_op_simple() is only dereferenced once by using READ_ONCE(). Otherwise the compiler could generate incorrect code. Cc: <stable@vger.kernel.org> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-18  s390/kexec: fix ipl report address for kdump  (Alexander Egorenkov)
commit c2337a40e04dde1692b5b0a46ecc59f89aaba8a1 upstream. This commit addresses the following erroneous situation with file-based kdump executed on a system with a valid IPL report. On s390, a kdump kernel, its initrd and IPL report if present are loaded into a special and reserved on boot memory region - crashkernel. When a system crashes and kdump was activated before, the purgatory code is entered first which swaps the crashkernel and [0 - crashkernel size] memory regions. Only after that the kdump kernel is entered. For this reason, the pointer to an IPL report in lowcore must point to the IPL report after the swap and not to the address of the IPL report that was located in crashkernel memory region before the swap. Failing to do so, makes the kdump's decompressor try to read memory from the crashkernel memory region which already contains the production's kernel memory. The situation described above caused spontaneous kdump failures/hangs on systems where the Secure IPL is activated because on such systems an IPL report is always present. In that case kdump's decompressor tried to parse an IPL report which frequently lead to illegal memory accesses because an IPL report contains addresses to various data. Cc: <stable@vger.kernel.org> Fixes: 99feaa717e55 ("s390/kexec_file: Create ipl report and pass to next kernel") Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-12-14  KVM: s390: vsie: Fix the initialization of the epoch extension (epdx) field  (Thomas Huth)
commit 0dd4cdccdab3d74bd86b868768a7dca216bcce7e upstream. We recently experienced some weird huge time jumps in nested guests when rebooting them in certain cases. After adding some debug code to the epoch handling in vsie.c (thanks to David Hildenbrand for the idea!), it was obvious that the "epdx" field (the multi-epoch extension) did not get set to 0xff in case the "epoch" field was negative. Seems like the code misses to copy the value from the epdx field from the guest to the shadow control block. By doing so, the weird time jumps are gone in our scenarios. Link: https://bugzilla.redhat.com/show_bug.cgi?id=2140899 Fixes: 8fa1696ea781 ("KVM: s390: Multiple Epoch Facility support") Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Cc: stable@vger.kernel.org # 4.19+ Link: https://lore.kernel.org/r/20221123090833.292938-1-thuth@redhat.com Message-Id: <20221123090833.292938-1-thuth@redhat.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-12-08  s390/crashdump: fix TOD programmable field size  (Heiko Carstens)
[ Upstream commit f44e07a8afdd713ddc1a8832c39372fe5dd86895 ] The size of the TOD programmable field was incorrectly increased from four to eight bytes with commit 1a2c5840acf9 ("s390/dump: cleanup CPU save area handling"). This leads to an elf notes section NT_S390_TODPREG which has a size of eight instead of four bytes in case of kdump, however even worse is that the contents is incorrect: it is supposed to contain only the contents of the TOD programmable field, but in fact contains a mix of the TOD programmable field (32 bit upper bits) and parts of the CPU timer register (lower 32 bits). Fix this by simply changing the size of the todpreg field within the save area structure. This will implicitly also fix the size of the corresponding elf notes sections. This also gets rid of this compile time warning: in function ‘fortify_memcpy_chk’, inlined from ‘save_area_add_regs’ at arch/s390/kernel/crash_dump.c:99:2: ./include/linux/fortify-string.h:413:25: error: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning] 413 | __read_overflow2_field(q_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: 1a2c5840acf9 ("s390/dump: cleanup CPU save area handling") Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
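A sketch of the layout issue (field names taken from the commit text; surrounding fields omitted and ordering illustrative): the TOD programmable register holds a 32-bit value, so the save area field and hence the NT_S390_TODPREG note must be four bytes.

    struct save_area {
            /* ... (other registers omitted in this sketch) */
            u32 todpreg;    /* TOD programmable field is 4 bytes; declaring it as a
                             * u64 made the NT_S390_TODPREG note 8 bytes and mixed in
                             * bits of the CPU timer, as described above */
            /* ... */
    };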
2022-11-03  s390/pci: add missing EX_TABLE entries to __pcistg_mio_inuser()/__pcilg_mio_inuser()  (Heiko Carstens)
commit 6ec803025cf3173a57222e4411097166bd06fa98 upstream. For some exception types the instruction address points behind the instruction that caused the exception. Take that into account and add the missing exception table entry. Cc: <stable@vger.kernel.org> Fixes: f058599e22d5 ("s390/pci: Fix s390_mmio_read/write with MIO") Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-03  s390/futex: add missing EX_TABLE entry to __futex_atomic_op()  (Heiko Carstens)
commit a262d3ad6a433e4080cecd0a8841104a5906355e upstream. For some exception types the instruction address points behind the instruction that caused the exception. Take that into account and add the missing exception table entry. Cc: <stable@vger.kernel.org> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-15  s390: fix nospec table alignments  (Josh Poimboeuf)
commit c9305b6c1f52060377c72aebe3a701389e9f3172 upstream. Add proper alignment for .nospec_call_table and .nospec_return_table in vmlinux. [hca@linux.ibm.com]: The problem with the missing alignment of the nospec tables exist since a long time, however only since commit e6ed91fd0768 ("s390/alternatives: remove padding generation code") and with CONFIG_RELOCATABLE=n the kernel may also crash at boot time. The above named commit reduced the size of struct alt_instr by one byte, so its new size is 11 bytes. Therefore depending on the number of cpu alternatives the size of the __alt_instructions array maybe odd, which again also causes that the addresses of the nospec tables will be odd. If the address of __nospec_call_start is odd and the kernel is compiled With CONFIG_RELOCATABLE=n the compiler may generate code that loads the address of __nospec_call_start with a 'larl' instruction. This will generate incorrect code since the 'larl' instruction only works with even addresses. In result the members of the nospec tables will be accessed with an off-by-one offset, which subsequently may lead to addressing exceptions within __nospec_revert(). Fixes: f19fbd5ed642 ("s390: introduce execute-trampolines for branches") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/8719bf1ce4a72ebdeb575200290094e9ce047bcc.1661557333.git.jpoimboe@kernel.org Cc: <stable@vger.kernel.org> # 4.16 Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-15  s390/hugetlb: fix prepare_hugepage_range() check for 2 GB hugepages  (Gerald Schaefer)
commit 7c8d42fdf1a84b1a0dd60d6528309c8ec127e87c upstream. The alignment check in prepare_hugepage_range() is wrong for 2 GB hugepages, it only checks for 1 MB hugepage alignment. This can result in kernel crash in __unmap_hugepage_range() at the BUG_ON(start & ~huge_page_mask(h)) alignment check, for mappings created with MAP_FIXED at unaligned address. Fix this by correctly handling multiple hugepage sizes, similar to the generic version of prepare_hugepage_range(). Fixes: d08de8e2d867 ("s390/mm: add support for 2GB hugepages") Cc: <stable@vger.kernel.org> # 4.8+ Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
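A sketch of a size-aware check along the lines of the generic prepare_hugepage_range() the commit refers to (simplified, not the actual s390 patch):

    static int prepare_hugepage_range(struct file *file,
                                      unsigned long addr, unsigned long len)
    {
            struct hstate *h = hstate_file(file);   /* selects the 1 MB or 2 GB page size */

            if (len & ~huge_page_mask(h))           /* length must be size-aligned */
                    return -EINVAL;
            if (addr & ~huge_page_mask(h))          /* so must a MAP_FIXED address */
                    return -EINVAL;
            return 0;
    }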
2022-09-05  s390/hypfs: avoid error message under KVM  (Juergen Gross)
[ Upstream commit 7b6670b03641ac308aaa6fa2e6f964ac993b5ea3 ] When booting under KVM the following error messages are issued: hypfs.7f5705: The hardware system does not support hypfs hypfs.7a79f0: Initialization of hypfs failed with rc=-61 Demote the severity of first message from "error" to "info" and issue the second message only in other error cases. Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Link: https://lore.kernel.org/r/20220620094534.18967-1-jgross@suse.com [arch/s390/hypfs/hypfs_diag.c changed description] Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-05  s390/mm: do not trigger write fault when vma does not allow VM_WRITE  (Gerald Schaefer)
commit 41ac42f137080bc230b5882e3c88c392ab7f2d32 upstream. For non-protection pXd_none() page faults in do_dat_exception(), we call do_exception() with access == (VM_READ | VM_WRITE | VM_EXEC). In do_exception(), vma->vm_flags is checked against that before calling handle_mm_fault(). Since commit 92f842eac7ee3 ("[S390] store indication fault optimization"), we call handle_mm_fault() with FAULT_FLAG_WRITE, when recognizing that it was a write access. However, the vma flags check is still only checking against (VM_READ | VM_WRITE | VM_EXEC), and therefore also calling handle_mm_fault() with FAULT_FLAG_WRITE in cases where the vma does not allow VM_WRITE. Fix this by changing access check in do_exception() to VM_WRITE only, when recognizing write access. Link: https://lkml.kernel.org/r/20220811103435.188481-3-david@redhat.com Fixes: 92f842eac7ee3 ("[S390] store indication fault optimization") Cc: <stable@vger.kernel.org> Reported-by: David Hildenbrand <david@redhat.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
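A sketch of the check described above (condensed from the description, not the verbatim do_exception() code; is_write_access is an illustrative name): once the fault is recognized as a write, only VM_WRITE is required of the vma.

    access = VM_READ | VM_WRITE | VM_EXEC;    /* default for pXd_none() faults */
    if (is_write_access) {                    /* e.g. derived from the store indication */
            access = VM_WRITE;
            flags |= FAULT_FLAG_WRITE;
    }
    if (!(vma->vm_flags & access))
            goto bad_area;                    /* SIGSEGV instead of a bogus write fault */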
2022-09-05  s390: fix double free of GS and RI CBs on fork() failure  (Brian Foster)
commit 13cccafe0edcd03bf1c841de8ab8a1c8e34f77d9 upstream. The pointers for guarded storage and runtime instrumentation control blocks are stored in the thread_struct of the associated task. These pointers are initially copied on fork() via arch_dup_task_struct() and then cleared via copy_thread() before fork() returns. If fork() happens to fail after the initial task dup and before copy_thread(), the newly allocated task and associated thread_struct memory are freed via free_task() -> arch_release_task_struct(). This results in a double free of the guarded storage and runtime info structs because the fields in the failed task still refer to memory associated with the source task. This problem can manifest as a BUG_ON() in set_freepointer() (with CONFIG_SLAB_FREELIST_HARDENED enabled) or KASAN splat (if enabled) when running trinity syscall fuzz tests on s390x. To avoid this problem, clear the associated pointer fields in arch_dup_task_struct() immediately after the new task is copied. Note that the RI flag is still cleared in copy_thread() because it resides in thread stack memory and that is where stack info is copied. Signed-off-by: Brian Foster <bfoster@redhat.com> Fixes: 8d9047f8b967c ("s390/runtime instrumentation: simplify task exit handling") Fixes: 7b83c6297d2fc ("s390/guarded storage: simplify task exit handling") Cc: <stable@vger.kernel.org> # 4.15 Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20220816155407.537372-1-bfoster@redhat.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
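A sketch of the fix described above (simplified; the real function also copies other architecture state): clear the per-task control-block pointers in the new copy immediately, so an early fork() failure cannot free them twice.

    int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
    {
            *dst = *src;                    /* duplicate the task, including thread_struct */

            dst->thread.ri_cb = NULL;       /* runtime instrumentation control block */
            dst->thread.gs_cb = NULL;       /* guarded storage control block */
            dst->thread.gs_bc_cb = NULL;    /* guarded storage broadcast control block */
            return 0;
    }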
2022-08-25  kexec, KEYS, s390: Make use of built-in and secondary keyring for signature verification  (Michal Suchanek)
commit 0828c4a39be57768b8788e8cbd0d84683ea757e5 upstream. commit e23a8020ce4e ("s390/kexec_file: Signature verification prototype") adds support for KEXEC_SIG verification with keys from platform keyring but the built-in keys and secondary keyring are not used. Add support for the built-in keys and secondary keyring as x86 does. Fixes: e23a8020ce4e ("s390/kexec_file: Signature verification prototype") Cc: stable@vger.kernel.org Cc: Philipp Rudo <prudo@linux.ibm.com> Cc: kexec@lists.infradead.org Cc: keyrings@vger.kernel.org Cc: linux-security-module@vger.kernel.org Signed-off-by: Michal Suchanek <msuchanek@suse.de> Reviewed-by: "Lee, Chun-Yi" <jlee@suse.com> Acked-by: Baoquan He <bhe@redhat.com> Signed-off-by: Coiby Xu <coxu@redhat.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03  s390/archrandom: prevent CPACF trng invocations in interrupt context  (Harald Freudenberger)
commit 918e75f77af7d2e049bb70469ec0a2c12782d96a upstream. This patch slightly reworks the s390 arch_get_random_seed_{int,long} implementation: Make sure the CPACF trng instruction is never called in any interrupt context. This is done by adding an additional condition in_task(). Justification: There are some constrains to satisfy for the invocation of the arch_get_random_seed_{int,long}() functions: - They should provide good random data during kernel initialization. - They should not be called in interrupt context as the TRNG instruction is relatively heavy weight and may for example make some network loads cause to timeout and buck. However, it was not clear what kind of interrupt context is exactly encountered during kernel init or network traffic eventually calling arch_get_random_seed_long(). After some days of investigations it is clear that the s390 start_kernel function is not running in any interrupt context and so the trng is called: Jul 11 18:33:39 t35lp54 kernel: [<00000001064e90ca>] arch_get_random_seed_long.part.0+0x32/0x70 Jul 11 18:33:39 t35lp54 kernel: [<000000010715f246>] random_init+0xf6/0x238 Jul 11 18:33:39 t35lp54 kernel: [<000000010712545c>] start_kernel+0x4a4/0x628 Jul 11 18:33:39 t35lp54 kernel: [<000000010590402a>] startup_continue+0x2a/0x40 The condition in_task() is true and the CPACF trng provides random data during kernel startup. The network traffic however, is more difficult. A typical call stack looks like this: Jul 06 17:37:07 t35lp54 kernel: [<000000008b5600fc>] extract_entropy.constprop.0+0x23c/0x240 Jul 06 17:37:07 t35lp54 kernel: [<000000008b560136>] crng_reseed+0x36/0xd8 Jul 06 17:37:07 t35lp54 kernel: [<000000008b5604b8>] crng_make_state+0x78/0x340 Jul 06 17:37:07 t35lp54 kernel: [<000000008b5607e0>] _get_random_bytes+0x60/0xf8 Jul 06 17:37:07 t35lp54 kernel: [<000000008b56108a>] get_random_u32+0xda/0x248 Jul 06 17:37:07 t35lp54 kernel: [<000000008aefe7a8>] kfence_guarded_alloc+0x48/0x4b8 Jul 06 17:37:07 t35lp54 kernel: [<000000008aeff35e>] __kfence_alloc+0x18e/0x1b8 Jul 06 17:37:07 t35lp54 kernel: [<000000008aef7f10>] __kmalloc_node_track_caller+0x368/0x4d8 Jul 06 17:37:07 t35lp54 kernel: [<000000008b611eac>] kmalloc_reserve+0x44/0xa0 Jul 06 17:37:07 t35lp54 kernel: [<000000008b611f98>] __alloc_skb+0x90/0x178 Jul 06 17:37:07 t35lp54 kernel: [<000000008b6120dc>] __napi_alloc_skb+0x5c/0x118 Jul 06 17:37:07 t35lp54 kernel: [<000000008b8f06b4>] qeth_extract_skb+0x13c/0x680 Jul 06 17:37:07 t35lp54 kernel: [<000000008b8f6526>] qeth_poll+0x256/0x3f8 Jul 06 17:37:07 t35lp54 kernel: [<000000008b63d76e>] __napi_poll.constprop.0+0x46/0x2f8 Jul 06 17:37:07 t35lp54 kernel: [<000000008b63dbec>] net_rx_action+0x1cc/0x408 Jul 06 17:37:07 t35lp54 kernel: [<000000008b937302>] __do_softirq+0x132/0x6b0 Jul 06 17:37:07 t35lp54 kernel: [<000000008abf46ce>] __irq_exit_rcu+0x13e/0x170 Jul 06 17:37:07 t35lp54 kernel: [<000000008abf531a>] irq_exit_rcu+0x22/0x50 Jul 06 17:37:07 t35lp54 kernel: [<000000008b922506>] do_io_irq+0xe6/0x198 Jul 06 17:37:07 t35lp54 kernel: [<000000008b935826>] io_int_handler+0xd6/0x110 Jul 06 17:37:07 t35lp54 kernel: [<000000008b9358a6>] psw_idle_exit+0x0/0xa Jul 06 17:37:07 t35lp54 kernel: ([<000000008ab9c59a>] arch_cpu_idle+0x52/0xe0) Jul 06 17:37:07 t35lp54 kernel: [<000000008b933cfe>] default_idle_call+0x6e/0xd0 Jul 06 17:37:07 t35lp54 kernel: [<000000008ac59f4e>] do_idle+0xf6/0x1b0 Jul 06 17:37:07 t35lp54 kernel: [<000000008ac5a28e>] cpu_startup_entry+0x36/0x40 Jul 06 17:37:07 t35lp54 kernel: [<000000008abb0d90>] smp_start_secondary+0x148/0x158 
Jul 06 17:37:07 t35lp54 kernel: [<000000008b935b9e>] restart_int_handler+0x6e/0x90 which confirms that the call is in softirq context. So in_task() covers exactly the cases where we want to have CPACF trng called: not in nmi, not in hard irq, not in soft irq but in normal task context and during kernel init. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Juergen Christ <jchrist@linux.ibm.com> Link: https://lore.kernel.org/r/20220713131721.257907-1-freude@linux.ibm.com Fixes: e4f74400308c ("s390/archrandom: simplify back to earlier design and initialize earlier") [agordeev@linux.ibm.com changed desc, added Fixes and Link, removed -stable] Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
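A sketch of the resulting gating (simplified from the description above; the static key name follows the commit's wording and should be treated as illustrative): the CPACF TRNG is only invoked from task context.

    static inline bool arch_get_random_seed_long(unsigned long *v)
    {
            if (static_branch_likely(&s390_arch_random_available) && in_task()) {
                    cpacf_trng(NULL, 0, (u8 *)v, sizeof(*v));   /* CPACF TRNG fills *v */
                    return true;
            }
            return false;                                       /* hard irq, soft irq, NMI, ... */
    }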
2022-07-29  locking/refcount: Consolidate implementations of refcount_t  (Will Deacon)
[ Upstream commit fb041bb7c0a918b95c6889fc965cdc4a75b4c0ca ] The generic implementation of refcount_t should be good enough for everybody, so remove ARCH_HAS_REFCOUNT and REFCOUNT_FULL entirely, leaving the generic implementation enabled unconditionally. Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Tested-by: Hanjun Guo <guohanjun@huawei.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191121115902.2551-9-will@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-07-07  s390: remove unneeded 'select BUILD_BIN2C'  (Masahiro Yamada)
commit 25deecb21c18ee29e3be8ac6177b2a9504c33d2d upstream. Since commit 4c0f032d4963 ("s390/purgatory: Omit use of bin2c"), s390 builds the purgatory without using bin2c. Remove 'select BUILD_BIN2C' to avoid the unneeded build of bin2c. Fixes: 4c0f032d4963 ("s390/purgatory: Omit use of bin2c") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20220613170902.1775211-1-masahiroy@kernel.org Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-07  s390/archrandom: simplify back to earlier design and initialize earlier  (Jason A. Donenfeld)
commit e4f74400308cb8abde5fdc9cad609c2aba32110c upstream. s390x appears to present two RNG interfaces: - a "TRNG" that gathers entropy using some hardware function; and - a "DRBG" that takes in a seed and expands it. Previously, the TRNG was wired up to arch_get_random_{long,int}(), but it was observed that this was being called really frequently, resulting in high overhead. So it was changed to be wired up to arch_get_random_ seed_{long,int}(), which was a reasonable decision. Later on, the DRBG was then wired up to arch_get_random_{long,int}(), with a complicated buffer filling thread, to control overhead and rate. Fortunately, none of the performance issues matter much now. The RNG always attempts to use arch_get_random_seed_{long,int}() first, which means a complicated implementation of arch_get_random_{long,int}() isn't really valuable or useful to have around. And it's only used when reseeding, which means it won't hit the high throughput complications that were faced before. So this commit returns to an earlier design of just calling the TRNG in arch_get_random_seed_{long,int}(), and returning false in arch_get_ random_{long,int}(). Part of what makes the simplification possible is that the RNG now seeds itself using the TRNG at bootup. But this only works if the TRNG is detected early in boot, before random_init() is called. So this commit also causes that check to happen in setup_arch(). Cc: stable@vger.kernel.org Cc: Harald Freudenberger <freude@linux.ibm.com> Cc: Ingo Franzki <ifranzki@linux.ibm.com> Cc: Juergen Christ <jchrist@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20220610222023.378448-1-Jason@zx2c4.com Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-02  kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add]  (Naveen N. Rao)
commit 3e35142ef99fe6b4fe5d834ad43ee13cca10a2dc upstream. Since commit d1bcae833b32f1 ("ELF: Don't generate unused section symbols") [1], binutils (v2.36+) started dropping section symbols that it thought were unused. This isn't an issue in general, but with kexec_file.c, gcc is placing kexec_arch_apply_relocations[_add] into a separate .text.unlikely section and the section symbol ".text.unlikely" is being dropped. Due to this, recordmcount is unable to find a non-weak symbol in .text.unlikely to generate a relocation record against. Address this by dropping the weak attribute from these functions. Instead, follow the existing pattern of having architectures #define the name of the function they want to override in their headers. [1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1 [akpm@linux-foundation.org: arch/s390/include/asm/kexec.h needs linux/module.h] Link: https://lkml.kernel.org/r/20220519091237.676736-1-naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-29  s390/cpumf: Handle events cycles and instructions identical  (Thomas Richter)
[ Upstream commit be857b7f77d130dbbd47c91fc35198b040f35865 ] Events CPU_CYCLES and INSTRUCTIONS can be submitted with two different perf_event attribute::type values:
- PERF_TYPE_HARDWARE: when invoked via perf tool predefined event names cycles or cpu-cycles or instructions.
- pmu->type: when invoked via perf tool event name cpu_cf/CPU_CYCLES/ or cpu_cf/INSTRUCTIONS/. This invocation also selects the PMU to which the event belongs.
Handle both types of invocation identically for events CPU_CYCLES and INSTRUCTIONS. They address the same hardware. The result is different when event modifier exclude_kernel is also set. Invocation with event modifier for user space event counting fails.
Output before:
# perf stat -e cpum_cf/cpu_cycles/u -- true
Performance counter stats for 'true':
<not supported> cpum_cf/cpu_cycles/u
0.000761033 seconds time elapsed
0.000076000 seconds user
0.000725000 seconds sys
#
Output after:
# perf stat -e cpum_cf/cpu_cycles/u -- true
Performance counter stats for 'true':
349,613 cpum_cf/cpu_cycles/u
0.000844143 seconds time elapsed
0.000079000 seconds user
0.000800000 seconds sys
#
Fixes: 6a82e23f45fe ("s390/cpumf: Adjust registration of s390 PMU device drivers") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> [agordeev@linux.ibm.com corrected commit ID of Fixes commit] Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-25  s390/mm: use non-quiescing sske for KVM switch to keyed guest  (Christian Borntraeger)
commit 3ae11dbcfac906a8c3a480e98660a823130dc16a upstream. The switch to a keyed guest does not require a classic sske as the other guest CPUs are not accessing the key before the switch is complete. By using the NQ SSKE things are faster especially with multiple guests. Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Suggested-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20220530092706.11637-3-borntraeger@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-22  s390: define get_cycles macro for arch-override  (Jason A. Donenfeld)
commit 2e3df523256cb9836de8441e9c791a796759bb3c upstream. S390x defines a get_cycles() function, but it does not do the usual `#define get_cycles get_cycles` dance, making it impossible for generic code to see if an arch-specific function was defined. While the get_cycles() ifdef is not currently used, the following timekeeping patch in this series will depend on the macro existing (or not existing) when defining random_get_entropy(). Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
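A sketch of the define-the-symbol pattern the commit adds (types simplified): by making the macro visible, generic code can test with #ifdef whether the architecture supplied its own get_cycles().

    /* architecture header (sketch) */
    static inline unsigned long get_cycles(void)
    {
            return (unsigned long)get_tod_clock() >> 2;   /* based on the s390 TOD clock */
    }
    #define get_cycles get_cycles    /* advertise the override to generic code */

    /* generic code can then do: */
    #ifndef get_cycles
    static inline unsigned long get_cycles(void) { return 0; }
    #endif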