aboutsummaryrefslogtreecommitdiffstats
path: root/fs/overlayfs
AgeCommit message (Collapse)Author
2021-10-17ovl: fix missing negative dentry check in ovl_rename()Zheng Liang
commit a295aef603e109a47af355477326bd41151765b6 upstream. The following reproducer mkdir lower upper work merge touch lower/old touch lower/new mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merge rm merge/new mv merge/old merge/new & unlink upper/new may result in this race: PROCESS A: rename("merge/old", "merge/new"); overwrite=true,ovl_lower_positive(old)=true, ovl_dentry_is_whiteout(new)=true -> flags |= RENAME_EXCHANGE PROCESS B: unlink("upper/new"); PROCESS A: lookup newdentry in new_upperdir call vfs_rename() with negative newdentry and RENAME_EXCHANGE Fix by adding the missing check for negative newdentry. Signed-off-by: Zheng Liang <zhengliang6@huawei.com> Fixes: e9be9d5e76e3 ("overlay filesystem") Cc: <stable@vger.kernel.org> # v3.18 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-23ovl: skip getxattr of security labelsAmir Goldstein
[ Upstream commit 03fedf93593c82538b18476d8c4f0e8f8435ea70 ] When inode has no listxattr op of its own (e.g. squashfs) vfs_listxattr calls the LSM inode_listsecurity hooks to list the xattrs that LSMs will intercept in inode_getxattr hooks. When selinux LSM is installed but not initialized, it will list the security.selinux xattr in inode_listsecurity, but will not intercept it in inode_getxattr. This results in -ENODATA for a getxattr call for an xattr returned by listxattr. This situation was manifested as overlayfs failure to copy up lower files from squashfs when selinux is built-in but not initialized, because ovl_copy_xattr() iterates the lower inode xattrs by vfs_listxattr() and vfs_getxattr(). ovl_copy_xattr() skips copy up of security labels that are indentified by inode_copy_up_xattr LSM hooks, but it does that after vfs_getxattr(). Since we are not going to copy them, skip vfs_getxattr() of the security labels. Reported-by: Michael Labriola <michael.d.labriola@gmail.com> Tested-by: Michael Labriola <michael.d.labriola@gmail.com> Link: https://lore.kernel.org/linux-unionfs/2nv9d47zt7.fsf@aldarion.sourceruckus.org/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-02-23ovl: perform vfs_getxattr() with mounter credsMiklos Szeredi
[ Upstream commit 554677b97257b0b69378bd74e521edb7e94769ff ] The vfs_getxattr() in ovl_xattr_set() is used to check whether an xattr exist on a lower layer file that is to be removed. If the xattr does not exist, then no need to copy up the file. This call of vfs_getxattr() wasn't wrapped in credential override, and this is probably okay. But for consitency wrap this instance as well. Reported-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-02-10ovl: fix dentry leak in ovl_get_redirectLiangyan
commit e04527fefba6e4e66492f122cf8cc6314f3cf3bf upstream. We need to lock d_parent->d_lock before dget_dlock, or this may have d_lockref updated parallelly like calltrace below which will cause dentry->d_lockref leak and risk a crash. CPU 0 CPU 1 ovl_set_redirect lookup_fast ovl_get_redirect __d_lookup dget_dlock //no lock protection here spin_lock(&dentry->d_lock) dentry->d_lockref.count++ dentry->d_lockref.count++ [   49.799059] PGD 800000061fed7067 P4D 800000061fed7067 PUD 61fec5067 PMD 0 [   49.799689] Oops: 0002 [#1] SMP PTI [   49.800019] CPU: 2 PID: 2332 Comm: node Not tainted 4.19.24-7.20.al7.x86_64 #1 [   49.800678] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8a46cfe 04/01/2014 [   49.801380] RIP: 0010:_raw_spin_lock+0xc/0x20 [   49.803470] RSP: 0018:ffffac6fc5417e98 EFLAGS: 00010246 [   49.803949] RAX: 0000000000000000 RBX: ffff93b8da3446c0 RCX: 0000000a00000000 [   49.804600] RDX: 0000000000000001 RSI: 000000000000000a RDI: 0000000000000088 [   49.805252] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff993cf040 [   49.805898] R10: ffff93b92292e580 R11: ffffd27f188a4b80 R12: 0000000000000000 [   49.806548] R13: 00000000ffffff9c R14: 00000000fffffffe R15: ffff93b8da3446c0 [   49.807200] FS:  00007ffbedffb700(0000) GS:ffff93b927880000(0000) knlGS:0000000000000000 [   49.807935] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [   49.808461] CR2: 0000000000000088 CR3: 00000005e3f74006 CR4: 00000000003606a0 [   49.809113] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [   49.809758] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [   49.810410] Call Trace: [   49.810653]  d_delete+0x2c/0xb0 [   49.810951]  vfs_rmdir+0xfd/0x120 [   49.811264]  do_rmdir+0x14f/0x1a0 [   49.811573]  do_syscall_64+0x5b/0x190 [   49.811917]  entry_SYSCALL_64_after_hwframe+0x44/0xa9 [   49.812385] RIP: 0033:0x7ffbf505ffd7 [   49.814404] RSP: 002b:00007ffbedffada8 EFLAGS: 00000297 ORIG_RAX: 0000000000000054 [   49.815098] RAX: ffffffffffffffda RBX: 00007ffbedffb640 RCX: 00007ffbf505ffd7 [   49.815744] RDX: 0000000004449700 RSI: 0000000000000000 RDI: 0000000006c8cd50 [   49.816394] RBP: 00007ffbedffaea0 R08: 0000000000000000 R09: 0000000000017d0b [   49.817038] R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000012 [   49.817687] R13: 00000000072823d8 R14: 00007ffbedffb700 R15: 00000000072823d8 [   49.818338] Modules linked in: pvpanic cirrusfb button qemu_fw_cfg atkbd libps2 i8042 [   49.819052] CR2: 0000000000000088 [   49.819368] ---[ end trace 4e652b8aa299aa2d ]--- [   49.819796] RIP: 0010:_raw_spin_lock+0xc/0x20 [   49.821880] RSP: 0018:ffffac6fc5417e98 EFLAGS: 00010246 [   49.822363] RAX: 0000000000000000 RBX: ffff93b8da3446c0 RCX: 0000000a00000000 [   49.823008] RDX: 0000000000000001 RSI: 000000000000000a RDI: 0000000000000088 [   49.823658] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff993cf040 [   49.825404] R10: ffff93b92292e580 R11: ffffd27f188a4b80 R12: 0000000000000000 [   49.827147] R13: 00000000ffffff9c R14: 00000000fffffffe R15: ffff93b8da3446c0 [   49.828890] FS:  00007ffbedffb700(0000) GS:ffff93b927880000(0000) knlGS:0000000000000000 [   49.830725] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [   49.832359] CR2: 0000000000000088 CR3: 00000005e3f74006 CR4: 00000000003606a0 [   49.834085] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [   49.835792] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Cc: <stable@vger.kernel.org> Fixes: a6c606551141 ("ovl: redirect on rename-dir") Signed-off-by: Liangyan <liangyan.peng@linux.alibaba.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-20ovl: initialize error in ovl_copy_xattrYuxuan Shui
commit 520da69d265a91c6536c63851cbb8a53946974f0 upstream. In ovl_copy_xattr, if all the xattrs to be copied are overlayfs private xattrs, the copy loop will terminate without assigning anything to the error variable, thus returning an uninitialized value. If ovl_copy_xattr is called from ovl_clear_empty, this uninitialized error value is put into a pointer by ERR_PTR(), causing potential invalid memory accesses down the line. This commit initialize error with 0. This is the correct value because when there's no xattr to copy, because all xattrs are private, ovl_copy_xattr should succeed. This bug is discovered with the help of INIT_STACK_ALL and clang. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com> Link: https://bugs.chromium.org/p/chromium/issues/detail?id=1050405 Fixes: 0956254a2d5b ("ovl: don't copy up opaqueness") Cc: stable@vger.kernel.org # v4.8 Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-17ovl: relax WARN_ON() on rename to selfAmir Goldstein
commit 6889ee5a53b8d969aa542047f5ac8acdc0e79a91 upstream. In ovl_rename(), if new upper is hardlinked to old upper underneath overlayfs before upper dirs are locked, user will get an ESTALE error and a WARN_ON will be printed. Changes to underlying layers while overlayfs is mounted may result in unexpected behavior, but it shouldn't crash the kernel and it shouldn't trigger WARN_ON() either, so relax this WARN_ON(). Reported-by: syzbot+bb1836a212e69f8e201a@syzkaller.appspotmail.com Fixes: 804032fabb3b ("ovl: don't check rename to self") Cc: <stable@vger.kernel.org> # v4.9+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-05ovl: filter of trusted xattr results in auditMark Salyzyn
commit 5c2e9f346b815841f9bed6029ebcb06415caf640 upstream. When filtering xattr list for reading, presence of trusted xattr results in a security audit log. However, if there is other content no errno will be set, and if there isn't, the errno will be -ENODATA and not -EPERM as is usually associated with a lack of capability. The check does not block the request to list the xattrs present. Switch to ns_capable_noaudit to reflect a more appropriate check. Signed-off-by: Mark Salyzyn <salyzyn@android.com> Cc: linux-security-module@vger.kernel.org Cc: kernel-team@android.com Cc: stable@vger.kernel.org # v3.18+ Fixes: a082c6f680da ("ovl: filter trusted xattr for non-admin") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21ovl: check whiteout in ovl_create_over_whiteout()Miklos Szeredi
commit 5e1275808630ea3b2c97c776f40e475017535f72 upstream. Kaixuxia repors that it's possible to crash overlayfs by removing the whiteout on the upper layer before creating a directory over it. This is a reproducer: mkdir lower upper work merge touch lower/file mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merge rm merge/file ls -al merge/file rm upper/file ls -al merge/ mkdir merge/file Before commencing with a vfs_rename(..., RENAME_EXCHANGE) verify that the lookup of "upper" is positive and is a whiteout, and return ESTALE otherwise. Reported by: kaixuxia <xiakaixu1987@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: e9be9d5e76e3 ("overlay filesystem") Cc: <stable@vger.kernel.org> # v3.18 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21ovl: fix recursive oi->lock in ovl_link()Amir Goldstein
commit 6cd078702f2f33cb6b19a682de3e9184112f1a46 upstream. linking a non-copied-up file into a non-copied-up parent results in a nested call to mutex_lock_interruptible(&oi->lock). Fix this by copying up target parent before ovl_nlink_start(), same as done in ovl_rename(). ~/unionmount-testsuite$ ./run --ov -s ~/unionmount-testsuite$ ln /mnt/a/foo100 /mnt/a/dir100/ WARNING: possible recursive locking detected -------------------------------------------- ln/1545 is trying to acquire lock: 00000000bcce7c4c (&ovl_i_lock_key[depth]){+.+.}, at: ovl_copy_up_start+0x28/0x7d but task is already holding lock: 0000000026d73d5b (&ovl_i_lock_key[depth]){+.+.}, at: ovl_nlink_start+0x3c/0xc1 [SzM: this seems to be a false positive, but doing the copy-up first is harmless and removes the lockdep splat] Reported-by: syzbot+3ef5c0d1a5cb0b21e6be@syzkaller.appspotmail.com Fixes: 5f8415d6b87e ("ovl: persistent overlay inode nlink for...") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> [amir: backport to v4.18] Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21ovl: fix error handling in ovl_verify_set_fh()Amir Goldstein
commit babf4770be0adc69e6d2de150f4040f175e24beb upstream. We hit a BUG on kfree of an ERR_PTR()... Reported-by: syzbot+ff03fe05c717b82502d0@syzkaller.appspotmail.com Fixes: 8b88a2e64036 ("ovl: verify upper root dir matches lower root dir") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-10vfs: swap names of {do,vfs}_clone_file_range()Amir Goldstein
commit a725356b6659469d182d662f22d770d83d3bc7b5 upstream. Commit 031a072a0b8a ("vfs: call vfs_clone_file_range() under freeze protection") created a wrapper do_clone_file_range() around vfs_clone_file_range() moving the freeze protection to former, so overlayfs could call the latter. The more common vfs practice is to call do_xxx helpers from vfs_xxx helpers, where freeze protecction is taken in the vfs_xxx helper, so this anomality could be a source of confusion. It seems that commit 8ede205541ff ("ovl: add reflink/copyfile/dedup support") may have fallen a victim to this confusion - ovl_clone_file_range() calls the vfs_clone_file_range() helper in the hope of getting freeze protection on upper fs, but in fact results in overlayfs allowing to bypass upper fs freeze protection. Swap the names of the two helpers to conform to common vfs practice and call the correct helpers from overlayfs and nfsd. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 031a072a0b8a ("vfs: call vfs_clone_file_range() under freeze...") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-10-10ovl: fix format of setxattr debugMiklos Szeredi
commit 1a8f8d2a443ef9ad9a3065ba8c8119df714240fa upstream. Format has a typo: it was meant to be "%.*s", not "%*s". But at some point callers grew nonprintable values as well, so use "%*pE" instead with a maximized length. Reported-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 3a1e819b4e80 ("ovl: store file handle of lower inode on copy up") Cc: <stable@vger.kernel.org> # v4.12 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10ovl: fix memory leak on unlink of indexed fileAmir Goldstein
commit 63e132528032ce937126aba591a7b37ec593a6bb upstream. The memory leak was detected by kmemleak when running xfstests overlay/051,053 Fixes: caf70cb2ba5d ("ovl: cleanup orphan index entries") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-10ovl: fix access beyond unterminated stringsAmir Goldstein
commit 601350ff58d5415a001769532f6b8333820e5786 upstream. KASAN detected slab-out-of-bounds access in printk from overlayfs, because string format used %*s instead of %.*s. > BUG: KASAN: slab-out-of-bounds in string+0x298/0x2d0 lib/vsprintf.c:604 > Read of size 1 at addr ffff8801c36c66ba by task syz-executor2/27811 > > CPU: 0 PID: 27811 Comm: syz-executor2 Not tainted 4.19.0-rc5+ #36 ... > printk+0xa7/0xcf kernel/printk/printk.c:1996 > ovl_lookup_index.cold.15+0xe8/0x1f8 fs/overlayfs/namei.c:689 Reported-by: syzbot+376cea2b0ef340db3dd4@syzkaller.appspotmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 359f392ca53e ("ovl: lookup index entry for copy up origin") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-03ovl: hash non-dir by lower inode for fsnotifyAmir Goldstein
commit 764baba80168ad3adafb521d2ab483ccbc49e344 upstream. Commit 31747eda41ef ("ovl: hash directory inodes for fsnotify") fixed an issue of inotify watch on directory that stops getting events after dropping dentry caches. A similar issue exists for non-dir non-upper files, for example: $ mkdir -p lower upper work merged $ touch lower/foo $ mount -t overlay -o lowerdir=lower,workdir=work,upperdir=upper none merged $ inotifywait merged/foo & $ echo 2 > /proc/sys/vm/drop_caches $ cat merged/foo inotifywait doesn't get the OPEN event, because ovl_lookup() called from 'cat' allocates a new overlay inode and does not reuse the watched inode. Fix this by hashing non-dir overlay inodes by lower real inode in the following cases that were not hashed before this change: - A non-upper overlay mount - A lower non-hardlink when index=off A helper ovl_hash_bylower() was added to put all the logic and documentation about which real inode an overlay inode is hashed by into one place. The issue dates back to initial version of overlayfs, but this patch depends on ovl_inode code that was introduced in kernel v4.13. Cc: <stable@vger.kernel.org> #v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Mark Salyzyn <salyzyn@android.com> #4.14 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-09ovl: fix wrong use of impure dir cache in ovl_iterate()Amir Goldstein
commit 67810693077afc1ebf9e1646af300436cb8103c2 upstream. Only upper dir can be impure, but if we are in the middle of iterating a lower real dir, dir could be copied up and marked impure. We only want the impure cache if we started iterating a real upper dir to begin with. Aditya Kali reported that the following reproducer hits the WARN_ON(!cache->refcount) in ovl_get_cache(): docker run --rm drupal:8.5.4-fpm-alpine \ sh -c 'cd /var/www/html/vendor/symfony && \ chown -R www-data:www-data . && ls -l .' Reported-by: Aditya Kali <adityakali@google.com> Tested-by: Aditya Kali <adityakali@google.com> Fixes: 4edb83bb1041 ('ovl: constant d_ino for non-merge dirs') Cc: <stable@vger.kernel.org> # v4.14 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03ovl: Sync upper dirty data when syncing overlayfsChengguang Xu
commit e8d4bfe3a71537284a90561f77c85dea6c154369 upstream. When executing filesystem sync or umount on overlayfs, dirty data does not get synced as expected on upper filesystem. This patch fixes sync filesystem method to keep data consistency for overlayfs. Signed-off-by: Chengguang Xu <cgxu@mykernel.net> Fixes: e593b2bf513d ("ovl: properly implement sync_filesystem()") Cc: <stable@vger.kernel.org> #4.11 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-19ovl: fix lookup with middle layer opaque dir and absolute path redirectsAmir Goldstein
commit 3ec9b3fafcaf441cc4d46b9742cd6ec0c79f8df0 upstream. As of now if we encounter an opaque dir while looking for a dentry, we set d->last=true. This means that there is no need to look further in any of the lower layers. This works fine as long as there are no redirets or relative redircts. But what if there is an absolute redirect on the children dentry of opaque directory. We still need to continue to look into next lower layer. This patch fixes it. Here is an example to demonstrate the issue. Say you have following setup. upper: /redirect (redirect=/a/b/c) lower1: /a/[b]/c ([b] is opaque) (c has absolute redirect=/a/b/d/) lower0: /a/b/d/foo Now "redirect" dir should merge with lower1:/a/b/c/ and lower0:/a/b/d. Note, despite the fact lower1:/a/[b] is opaque, we need to continue to look into lower0 because children c has an absolute redirect. Following is a reproducer. Watch me make foo disappear: $ mkdir lower middle upper work work2 merged $ mkdir lower/origin $ touch lower/origin/foo $ mount -t overlay none merged/ \ -olowerdir=lower,upperdir=middle,workdir=work2 $ mkdir merged/pure $ mv merged/origin merged/pure/redirect $ umount merged $ mount -t overlay none merged/ \ -olowerdir=middle:lower,upperdir=upper,workdir=work $ mv merged/pure/redirect merged/redirect Now you see foo inside a twice redirected merged dir: $ ls merged/redirect foo $ umount merged $ mount -t overlay none merged/ \ -olowerdir=middle:lower,upperdir=upper,workdir=work After mount cycle you don't see foo inside the same dir: $ ls merged/redirect During middle layer lookup, the opaqueness of middle/pure is left in the lookup state and then middle/pure/redirect is wrongly treated as opaque. Fixes: 02b69b284cd7 ("ovl: lookup redirects") Cc: <stable@vger.kernel.org> #v4.10 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-22ovl: hash directory inodes for fsnotifyAmir Goldstein
commit 31747eda41ef3c30c09c5c096b380bf54013746a upstream. fsnotify pins a watched directory inode in cache, but if directory dentry is released, new lookup will allocate a new dentry and a new inode. Directory events will be notified on the new inode, while fsnotify listener is watching the old pinned inode. Hash all directory inodes to reuse the pinned inode on lookup. Pure upper dirs are hashes by real upper inode, merge and lower dirs are hashed by real lower inode. The reference to lower inode was being held by the lower dentry object in the overlay dentry (oe->lowerstack[0]). Releasing the overlay dentry may drop lower inode refcount to zero. Add a refcount on behalf of the overlay inode to prevent that. As a by-product, hashing directory inodes also detects multiple redirected dirs to the same lower dir and uncovered redirected dir target on and returns -ESTALE on lookup. The reported issue dates back to initial version of overlayfs, but this patch depends on ovl_inode code that was introduced in kernel v4.13. Cc: <stable@vger.kernel.org> #v4.13 Reported-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Tested-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-16ovl: take mnt_want_write() for removing impure xattrAmir Goldstein
commit a5a927a7c82e28ea76599dee4019c41e372c911f upstream. The optimization in ovl_cache_get_impure() that tries to remove an unneeded "impure" xattr needs to take mnt_want_write() on upper fs. Fixes: 4edb83bb1041 ("ovl: constant d_ino for non-merge dirs") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-16ovl: fix failure to fsync lower dirAmir Goldstein
commit d796e77f1dd541fe34481af2eee6454688d13982 upstream. As a writable mount, it is not expected for overlayfs to return EINVAL/EROFS for fsync, even if dir/file is not changed. This commit fixes the case of fsync of directory, which is easier to address, because overlayfs already implements fsync file operation for directories. The problem reported by Raphael is that new PostgreSQL 10.0 with a database in overlayfs where lower layer in squashfs fails to start. The failure is due to fsync error, when PostgreSQL does fsync on all existing db directories on startup and a specific directory exists lower layer with no changes. Reported-by: Raphael Hertzog <raphael@ouaza.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Tested-by: Raphaël Hertzog <hertzog@debian.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25locking/barriers: Convert users of lockless_dereference() to READ_ONCE()Will Deacon
commit 3382290ed2d5e275429cef510ab21889d3ccd164 upstream. [ Note, this is a Git cherry-pick of the following commit: 506458efaf15 ("locking/barriers: Convert users of lockless_dereference() to READ_ONCE()") ... for easier x86 PTI code testing and back-porting. ] READ_ONCE() now has an implicit smp_read_barrier_depends() call, so it can be used instead of lockless_dereference() without any change in semantics. Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1508840570-22169-4-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20ovl: update ctx->pos on impure dir iterationAmir Goldstein
commit b02a16e6413a2f782e542ef60bad9ff6bf212f8a upstream. This fixes a regression with readdir of impure dir in overlayfs that is shared to VM via 9p fs. Reported-by: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com> Fixes: 4edb83bb1041 ("ovl: constant d_ino for non-merge dirs") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Tested-by: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20ovl: Pass ovl_get_nlink() parameters in right orderVivek Goyal
commit 08d8f8a5b094b66b29936e8751b4a818b8db1207 upstream. Right now we seem to be passing index as "lowerdentry" and origin.dentry as "upperdentry". IIUC, we should pass these parameters in reversed order and this looks like a bug. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Amir Goldstein <amir73il@gmail.com> Fixes: caf70cb2ba5d ("ovl: cleanup orphan index entries") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-30ovl: Put upperdentry if ovl_check_origin() failsVivek Goyal
commit 5455f92b54e516995a9ca45bbf790d3629c27a93 upstream. If ovl_check_origin() fails, we should put upperdentry. We have a reference on it by now. So goto out_put_upper instead of out. Fixes: a9d019573e88 ("ovl: lookup non-dir copy-up-origin by file handle") Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-24ovl: do not cleanup unsupported index entriesAmir Goldstein
With index=on, ovl_indexdir_cleanup() tries to cleanup invalid index entries (e.g. bad index name). This behavior could result in cleaning of entries created by newer kernels and is therefore undesirable. Instead, abort mount if such entries are encountered. We still cleanup 'stale' entries and 'orphan' entries, both those cases can be a result of offline changes to lower and upper dirs. When encoutering an index entry of type directory or whiteout, kernel was supposed to fallback to read-only mount, but the fill_super() operation returns EROFS in this case instead of returning success with read-only mount flag, so mount fails when encoutering directory or whiteout index entries. Bless this behavior by returning -EINVAL on directory and whiteout index entries as we do for all unsupported index entries. Fixes: 61b674710cd9 ("ovl: do not cleanup directory and whiteout index..") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2017-10-24ovl: handle ENOENT on index lookupAmir Goldstein
Treat ENOENT from index entry lookup the same way as treating a returned negative dentry. Apparently, either could be returned if file is not found, depending on the underlying file system. Fixes: 359f392ca53e ("ovl: lookup index entry for copy up origin") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com>
2017-10-24ovl: fix EIO from lookup of non-indexed upperAmir Goldstein
Commit fbaf94ee3cd5 ("ovl: don't set origin on broken lower hardlink") attempt to avoid the condition of non-indexed upper inode with lower hardlink as origin. If this condition is found, lookup returns EIO. The protection of commit mentioned above does not cover the case of lower that is not a hardlink when it is copied up (with either index=off/on) and then lower is hardlinked while overlay is offline. Changes to lower layer while overlayfs is offline should not result in unexpected behavior, so a permanent EIO error after creating a link in lower layer should not be considered as correct behavior. This fix replaces EIO error with success in cases where upper has origin but no index is found, or index is found that does not match upper inode. In those cases, lookup will not fail and the returned overlay inode will be hashed by upper inode instead of by lower origin inode. Fixes: 359f392ca53e ("ovl: lookup index entry for copy up origin") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-10-19ovl: Return -ENOMEM if an allocation fails ovl_lookup()Dan Carpenter
The error code is missing here so it means we return ERR_PTR(0) or NULL. The other error paths all return an error code so this probably should as well. Fixes: 02b69b284cd7 ("ovl: lookup redirects") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-10-19ovl: add NULL check in ovl_alloc_inodeHirofumi Nakagawa
This was detected by fault injection test Signed-off-by: Hirofumi Nakagawa <nklabs@gmail.com> Fixes: 13cf199d0088 ("ovl: allocate an ovl_inode struct") Cc: <stable@vger.kernel.org> # v4.13
2017-10-05ovl: fix regression caused by exclusive upper/work dir protectionAmir Goldstein
Enforcing exclusive ownership on upper/work dirs caused a docker regression: https://github.com/moby/moby/issues/34672. Euan spotted the regression and pointed to the offending commit. Vivek has brought the regression to my attention and provided this reproducer: Terminal 1: mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none merged/ Terminal 2: unshare -m Terminal 1: umount merged mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none merged/ mount: /root/overlay-testing/merged: none already mounted or mount point busy To fix the regression, I replaced the error with an alarming warning. With index feature enabled, mount does fail, but logs a suggestion to override exclusive dir protection by disabling index. Note that index=off mount does take the inuse locks, so a concurrent index=off will issue the warning and a concurrent index=on mount will fail. Documentation was updated to reflect this change. Fixes: 2cac0c00a6cd ("ovl: get exclusive ownership on upper/work dirs") Cc: <stable@vger.kernel.org> # v4.13 Reported-by: Euan Kemp <euank@euank.com> Reported-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-10-05ovl: fix missing unlock_rename() in ovl_do_copy_up()Amir Goldstein
Use the ovl_lock_rename_workdir() helper which requires unlock_rename() only on lock success. Fixes: ("fd210b7d67ee ovl: move copy up lock out") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-10-05ovl: fix dentry leak in ovl_indexdir_cleanup()Amir Goldstein
index dentry was not released when breaking out of the loop due to index verification error. Fixes: 415543d5c64f ("ovl: cleanup bad and stale index entries on mount") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-10-05ovl: fix dput() of ERR_PTR in ovl_cleanup_index()Amir Goldstein
Fixes: caf70cb2ba5d ("ovl: cleanup orphan index entries") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-10-05ovl: fix error value printed in ovl_lookup_index()Amir Goldstein
Fixes: 359f392ca53e ("ovl: lookup index entry for copy up origin") Cc: <stable@vger.kernel.org> # v4.13 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-09-14Merge branch 'work.mount' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull mount flag updates from Al Viro: "Another chunk of fmount preparations from dhowells; only trivial conflicts for that part. It separates MS_... bits (very grotty mount(2) ABI) from the struct super_block ->s_flags (kernel-internal, only a small subset of MS_... stuff). This does *not* convert the filesystems to new constants; only the infrastructure is done here. The next step in that series is where the conflicts would be; that's the conversion of filesystems. It's purely mechanical and it's better done after the merge, so if you could run something like list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$') sed -i -e 's/\<MS_RDONLY\>/SB_RDONLY/g' \ -e 's/\<MS_NOSUID\>/SB_NOSUID/g' \ -e 's/\<MS_NODEV\>/SB_NODEV/g' \ -e 's/\<MS_NOEXEC\>/SB_NOEXEC/g' \ -e 's/\<MS_SYNCHRONOUS\>/SB_SYNCHRONOUS/g' \ -e 's/\<MS_MANDLOCK\>/SB_MANDLOCK/g' \ -e 's/\<MS_DIRSYNC\>/SB_DIRSYNC/g' \ -e 's/\<MS_NOATIME\>/SB_NOATIME/g' \ -e 's/\<MS_NODIRATIME\>/SB_NODIRATIME/g' \ -e 's/\<MS_SILENT\>/SB_SILENT/g' \ -e 's/\<MS_POSIXACL\>/SB_POSIXACL/g' \ -e 's/\<MS_KERNMOUNT\>/SB_KERNMOUNT/g' \ -e 's/\<MS_I_VERSION\>/SB_I_VERSION/g' \ -e 's/\<MS_LAZYTIME\>/SB_LAZYTIME/g' \ $list and commit it with something along the lines of 'convert filesystems away from use of MS_... constants' as commit message, it would save a quite a bit of headache next cycle" * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: VFS: Differentiate mount flags (MS_*) from internal superblock flags VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb) vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags
2017-09-13mm: treewide: remove GFP_TEMPORARY allocation flagMichal Hocko
GFP_TEMPORARY was introduced by commit e12ba74d8ff3 ("Group short-lived and reclaimable kernel allocations") along with __GFP_RECLAIMABLE. It's primary motivation was to allow users to tell that an allocation is short lived and so the allocator can try to place such allocations close together and prevent long term fragmentation. As much as this sounds like a reasonable semantic it becomes much less clear when to use the highlevel GFP_TEMPORARY allocation flag. How long is temporary? Can the context holding that memory sleep? Can it take locks? It seems there is no good answer for those questions. The current implementation of GFP_TEMPORARY is basically GFP_KERNEL | __GFP_RECLAIMABLE which in itself is tricky because basically none of the existing caller provide a way to reclaim the allocated memory. So this is rather misleading and hard to evaluate for any benefits. I have checked some random users and none of them has added the flag with a specific justification. I suspect most of them just copied from other existing users and others just thought it might be a good idea to use without any measuring. This suggests that GFP_TEMPORARY just motivates for cargo cult usage without any reasoning. I believe that our gfp flags are quite complex already and especially those with highlevel semantic should be clearly defined to prevent from confusion and abuse. Therefore I propose dropping GFP_TEMPORARY and replace all existing users to simply use GFP_KERNEL. Please note that SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and so they will be placed properly for memory fragmentation prevention. I can see reasons we might want some gfp flag to reflect shorterm allocations but I propose starting from a clear semantic definition and only then add users with proper justification. This was been brought up before LSF this year by Matthew [1] and it turned out that GFP_TEMPORARY really doesn't have a clear semantic. It seems to be a heuristic without any measured advantage for most (if not all) its current users. The follow up discussion has revealed that opinions on what might be temporary allocation differ a lot between developers. So rather than trying to tweak existing users into a semantic which they haven't expected I propose to simply remove the flag and start from scratch if we really need a semantic for short term allocations. [1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org [akpm@linux-foundation.org: fix typo] [akpm@linux-foundation.org: coding-style fixes] [sfr@canb.auug.org.au: drm/i915: fix up] Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Neil Brown <neilb@suse.de> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-13Merge branch 'overlayfs-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs updates from Miklos Szeredi: "This fixes d_ino correctness in readdir, which brings overlayfs on par with normal filesystems regarding inode number semantics, as long as all layers are on the same filesystem. There are also some bug fixes, one in particular (random ioctl's shouldn't be able to modify lower layers) that touches some vfs code, but of course no-op for non-overlay fs" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: fix false positive ESTALE on lookup ovl: don't allow writing ioctl on lower layer ovl: fix relatime for directories vfs: add flags to d_real() ovl: cleanup d_real for negative ovl: constant d_ino for non-merge dirs ovl: constant d_ino across copy up ovl: fix readdir error value ovl: check snprintf return
2017-09-12ovl: fix false positive ESTALE on lookupAmir Goldstein
Commit b9ac5c274b8c ("ovl: hash overlay non-dir inodes by copy up origin") verifies that the origin lower inode stored in the overlayfs inode matched the inode of a copy up origin dentry found by lookup. There is a false positive result in that check when lower fs does not support file handles and copy up origin cannot be followed by file handle at lookup time. The false negative happens when finding an overlay inode in cache on a copied up overlay dentry lookup. The overlay inode still 'remembers' the copy up origin inode, but the copy up origin dentry is not available for verification. Relax the check in case copy up origin dentry is not available. Fixes: b9ac5c274b8c ("ovl: hash overlay non-dir inodes by copy up...") Cc: <stable@vger.kernel.org> # v4.13 Reported-by: Jordi Pujol <jordipujolp@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-09-05ovl: fix relatime for directoriesMiklos Szeredi
Need to treat non-regular overlayfs files the same as regular files when checking for an atime update. Add a d_real() flag to make it return the upper dentry for all file types. Reported-by: "zhangyi (F)" <yi.zhang@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-09-04vfs: add flags to d_real()Miklos Szeredi
Add a separate flags argument (in addition to the open flags) to control the behavior of d_real(). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-09-04ovl: cleanup d_real for negativeMiklos Szeredi
d_real() is never called with a negative dentry. So remove the d_is_negative() check (which would never trigger anyway, since d_is_reg() returns false for a negative dentry). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-08-10overlayfs, locking: Remove smp_mb__before_spinlock() usagePeter Zijlstra
While we could replace the smp_mb__before_spinlock() with the new smp_mb__after_spinlock(), the normal pattern is to use smp_store_release() to publish an object that is used for lockless_dereference() -- and mirrors the regular rcu_assign_pointer() / rcu_dereference() patterns. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-27ovl: constant d_ino for non-merge dirsMiklos Szeredi
Impure directories are ones which contain objects with origins (i.e. those that have been copied up). These are relevant to readdir operation only because of the d_ino field, no other transformation is necessary. Also a directory can become impure between two getdents(2) calls. This patch creates a cache for impure directories. Unlike the cache for merged directories, this one only contains entries with origin and is not refcounted but has a its lifetime tied to that of the dentry. Similarly to the merged cache, the impure cache is invalidated based on a version number. This version number is incremented when an entry with origin is added or removed from the directory. If the cache is empty, then the impure xattr is removed from the directory. This patch also fixes up handling of d_ino for the ".." entry if the parent directory is merged. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-27ovl: constant d_ino across copy upAmir Goldstein
When all layers are on the same fs, and iterating a directory which may contain copy up entries, call vfs_getattr() on the overlay entries to make sure that d_ino will be consistent with st_ino from stat(2). There is an overhead of lookup per upper entry in readdir. The overhead is minimal if the iterated entries are already in dcache. It is also quite useful for the common case of 'ls -l' that readdir() pre populates the dcache with the listed entries, making the following stat() calls faster. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-27ovl: fix readdir error valueMiklos Szeredi
actor's return value is taken as a bool (filled/not filled) so we need to return the error in the context. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-27ovl: check snprintf returnMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-20ovl: check for bad and whiteout index on lookupAmir Goldstein
Index should always be of the same file type as origin, except for the case of a whiteout index. A whiteout index should only exist if all lower aliases have been unlinked, which means that finding a lower origin on lookup whose index is a whiteout should be treated as a lookup error. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-20ovl: do not cleanup directory and whiteout index entriesAmir Goldstein
Directory index entries are going to be used for looking up redirected upper dirs by lower dir fh when decoding an overlay file handle of a merge dir. Whiteout index entries are going to be used as an indication that an exported overlay file handle should be treated as stale (i.e. after unlink of the overlay inode). We don't know the verification rules for directory and whiteout index entries, because they have not been implemented yet, so fail to mount overlay rw if those entries are found to avoid corrupting an index that was created by a newer kernel. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-07-20ovl: fix xattr get and set with selinuxMiklos Szeredi
inode_doinit_with_dentry() in SELinux wants to read the upper inode's xattr to get security label, and ovl_xattr_get() calls ovl_dentry_real(), which depends on dentry->d_inode, but d_inode is null and not initialized yet at this point resulting in an Oops. Fix by getting the upperdentry info from the inode directly in this case. Reported-by: Eryu Guan <eguan@redhat.com> Fixes: 09d8b586731b ("ovl: move __upperdentry to ovl_inode") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>