aboutsummaryrefslogtreecommitdiffstats
path: root/fs/namei.c
AgeCommit message (Collapse)Author
2015-12-06Don't reset ->total_link_count on nested calls of vfs_path_lookup()Al Viro
we already zero it on outermost set_nameidata(), so initialization in path_init() is pointless and wrong. The same DoS exists on pre-4.2 kernels, but there a slightly different fix will be needed. Cc: stable@vger.kernel.org # v4.2 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-07Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge second patch-bomb from Andrew Morton: - most of the rest of MM - procfs - lib/ updates - printk updates - bitops infrastructure tweaks - checkpatch updates - nilfs2 update - signals - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc, dma-debug, dma-mapping, ... * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits) ipc,msg: drop dst nil validation in copy_msg include/linux/zutil.h: fix usage example of zlib_adler32() panic: release stale console lock to always get the logbuf printed out dma-debug: check nents in dma_sync_sg* dma-mapping: tidy up dma_parms default handling pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode kexec: use file name as the output message prefix fs, seqfile: always allow oom killer seq_file: reuse string_escape_str() fs/seq_file: use seq_* helpers in seq_hex_dump() coredump: change zap_threads() and zap_process() to use for_each_thread() coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT) signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread() signal: turn dequeue_signal_lock() into kernel_dequeue_signal() signals: kill block_all_signals() and unblock_all_signals() nilfs2: fix gcc uninitialized-variable warnings in powerpc build nilfs2: fix gcc unused-but-set-variable warnings MAINTAINERS: nilfs2: add header file for tracing nilfs2: add tracepoints for analyzing reading and writing metadata files ...
2015-11-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial updates from Jiri Kosina: "Trivial stuff from trivial tree that can be trivially summed up as: - treewide drop of spurious unlikely() before IS_ERR() from Viresh Kumar - cosmetic fixes (that don't really affect basic functionality of the driver) for pktcdvd and bcache, from Julia Lawall and Petr Mladek - various comment / printk fixes and updates all over the place" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: bcache: Really show state of work pending bit hwmon: applesmc: fix comment typos Kconfig: remove comment about scsi_wait_scan module class_find_device: fix reference to argument "match" debugfs: document that debugfs_remove*() accepts NULL and error values net: Drop unlikely before IS_ERR(_OR_NULL) mm: Drop unlikely before IS_ERR(_OR_NULL) fs: Drop unlikely before IS_ERR(_OR_NULL) drivers: net: Drop unlikely before IS_ERR(_OR_NULL) drivers: misc: Drop unlikely before IS_ERR(_OR_NULL) UBI: Update comments to reflect UBI_METAONLY flag pktcdvd: drop null test before destroy functions
2015-11-06mm, fs: introduce mapping_gfp_constraint()Michal Hocko
There are many places which use mapping_gfp_mask to restrict a more generic gfp mask which would be used for allocations which are not directly related to the page cache but they are performed in the same context. Let's introduce a helper function which makes the restriction explicit and easier to track. This patch doesn't introduce any functional changes. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull userns hardlink capability check fix from Eric Biederman: "This round just contains a single patch. There has been a lot of other work this period but it is not quite ready yet, so I am pushing it until 4.5. The remaining change by Dirk Steinmetz wich fixes both Gentoo and Ubuntu containers allows hardlinks if we have the appropriate capabilities in the user namespace. Security wise it is really a gimme as the user namespace root can already call setuid become that user and create the hardlink" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: namei: permit linking with CAP_FOWNER in userns
2015-10-27namei: permit linking with CAP_FOWNER in usernsDirk Steinmetz
Attempting to hardlink to an unsafe file (e.g. a setuid binary) from within an unprivileged user namespace fails, even if CAP_FOWNER is held within the namespace. This may cause various failures, such as a gentoo installation within a lxc container failing to build and install specific packages. This change permits hardlinking of files owned by mapped uids, if CAP_FOWNER is held for that namespace. Furthermore, it improves consistency by using the existing inode_owner_or_capable(), which is aware of namespaced capabilities as of 23adbe12ef7d3 ("fs,userns: Change inode_capable to capable_wrt_inode_uidgid"). Signed-off-by: Dirk Steinmetz <public@rsjtdrjgfuzkfg.com> This is hitting us in Ubuntu during some dpkg upgrades in containers. When upgrading a file dpkg creates a hard link to the old file to back it up before overwriting it. When packages upgrade suid files owned by a non-root user the link isn't permitted, and the package upgrade fails. This patch fixes our problem. Tested-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2015-10-10namei: results of d_is_negative() should be checked after dentry revalidationTrond Myklebust
Leandro Awa writes: "After switching to version 4.1.6, our parallelized and distributed workflows now fail consistently with errors of the form: T34: ./regex.c:39:22: error: config.h: No such file or directory From our 'git bisect' testing, the following commit appears to be the possible cause of the behavior we've been seeing: commit 766c4cbfacd8" Al Viro says: "What happens is that 766c4cbfacd8 got the things subtly wrong. We used to treat d_is_negative() after lookup_fast() as "fall with ENOENT". That was wrong - checking ->d_flags outside of ->d_seq protection is unreliable and failing with hard error on what should've fallen back to non-RCU pathname resolution is a bug. Unfortunately, we'd pulled the test too far up and ran afoul of another kind of staleness. The dentry might have been absolutely stable from the RCU point of view (and we might be on UP, etc), but stale from the remote fs point of view. If ->d_revalidate() returns "it's actually stale", dentry gets thrown away and the original code wouldn't even have looked at its ->d_flags. What we need is to check ->d_flags where 766c4cbfacd8 does (prior to ->d_seq validation) but only use the result in cases where we do not discard this dentry outright" Reported-by: Leandro Awa <lawa@nvidia.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=104911 Fixes: 766c4cbfacd8 ("namei: d_is_negative() should be checked...") Tested-by: Leandro Awa <lawa@nvidia.com> Cc: stable@vger.kernel.org # v4.1+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-29fs: Drop unlikely before IS_ERR(_OR_NULL)Viresh Kumar
IS_ERR(_OR_NULL) already contain an 'unlikely' compiler flag and there is no need to do that again from its callers. Drop it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Jeff Layton <jlayton@poochiereds.net> Reviewed-by: David Howells <dhowells@redhat.com> Reviewed-by: Steve French <smfrench@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-09-10namei: fix warning while make xmldocs caused by namei.cMasanari Iida
Fix the following warnings: Warning(.//fs/namei.c:2422): No description found for parameter 'nd' Warning(.//fs/namei.c:2422): Excess function parameter 'nameidata' description in 'path_mountpoint' Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-21vfs: Test for and handle paths that are unreachable from their mnt_rootEric W. Biederman
In rare cases a directory can be renamed out from under a bind mount. In those cases without special handling it becomes possible to walk up the directory tree to the root dentry of the filesystem and down from the root dentry to every other file or directory on the filesystem. Like division by zero .. from an unconnected path can not be given a useful semantic as there is no predicting at which path component the code will realize it is unconnected. We certainly can not match the current behavior as the current behavior is a security hole. Therefore when encounting .. when following an unconnected path return -ENOENT. - Add a function path_connected to verify path->dentry is reachable from path->mnt.mnt_root. AKA to validate that rename did not do something nasty to the bind mount. To avoid races path_connected must be called after following a path component to it's next path component. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-08-04may_follow_link() should use nd->inodeAl Viro
Now that we can get there in RCU mode, we shouldn't play with nd->path.dentry->d_inode - it's not guaranteed to be stable. Use nd->inode instead. Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-08-01link_path_walk(): be careful when failing with ENOTDIRAl Viro
In RCU mode we might end up with dentry evicted just we check that it's a directory. In such case we should return ECHILD rather than ENOTDIR, so that pathwalk would be retries in non-RCU mode. Breakage had been introduced in commit b18825a - prior to that we were looking at nd->inode, which had been fetched before verifying that ->d_seq was still valid. That form of check would only be satisfied if at some point the pathname prefix would indeed have resolved to a non-directory. The fix consists of checking ->d_seq after we'd run into a non-directory dentry, and failing with ECHILD in case of mismatch. Note that all branches since 3.12 have that problem... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-29namei: make set_root_rcu() return voidAl Viro
The only caller that cares about its return value can just as easily pick it from nd->root_seq itself. We used to just calculate it and return to caller, but these days we are storing it in nd->root_seq in all cases. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15turn user_{path_at,path,lpath,path_dir}() into static inlinesAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: move saved_nd pointer into struct nameidataAl Viro
these guys are always declared next to each other; might as well put the former (pointer to previous instance) into the latter and simplify the calling conventions for {set,restore}_nameidata() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15inline user_path_create()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15inline user_path_parent()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: trim do_last() argumentsAl Viro
now that struct filename is stashed in nameidata we have no need to pass it in Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: stash dfd and name into nameidataAl Viro
fewer arguments to pass around... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: fold path_cleanup() into terminate_walk()Al Viro
they are always called next to each other; moreover, terminate_walk() is more symmetrical that way. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: saner calling conventions for filename_parentat()Al Viro
a) make it reject ERR_PTR() for name b) make it putname(name) on all other failure exits c) make it return name on success again, simplifies the callers Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: saner calling conventions for filename_create()Al Viro
a) make it reject ERR_PTR() for name b) make it putname(name) upon return in all other cases. seriously simplifies the callers... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: shift nameidata down into filename_parentat()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: make filename_lookup() reject ERR_PTR() passed as nameAl Viro
makes for much easier life in callers Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: shift nameidata inside filename_lookup()Al Viro
pass root instead; non-NULL => copy to nd.root and set LOOKUP_ROOT in flags Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: move putname() call into filename_lookup()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: pass the struct path to store the result down into path_lookupat()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: uninline set_root{,_rcu}()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: be careful with mountpoint crossings in follow_dotdot_rcu()Al Viro
Otherwise we are risking a hard error where nonlazy restart would be the right thing to do; it's a very narrow race with mount --move and most of the time it ends up being completely harmless, but it's possible to construct a case when we'll get a bogus hard error instead of falling back to non-lazy walk... For one thing, when crossing _into_ overmount of parent we need to check for mount_lock bumps when we get NULL from __lookup_mnt() as well. For another, and less exotically, we need to make sure that the data fetched in follow_up_rcu() had been consistent. ->mnt_mountpoint is pinned for as long as it is a mountpoint, but we need to check mount_lock after fetching to verify that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: unlazy_walk() doesn't need to mess with current->fs anymoreAl Viro
now that we have ->root_seq, legitimize_path(&nd->root, nd->root_seq) will do just fine... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: handle absolute symlinks without dropping out of RCU modeAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15enable passing fast relative symlinks without dropping out of RCU modeAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15VFS/namei: make the use of touch_atime() in get_link() RCU-safe.NeilBrown
touch_atime is not RCU-safe, and so cannot be called on an RCU walk. However, in situations where RCU-walk makes a difference, the symlink will likely to accessed much more often than it is useful to update the atime. So split out the test of "Does the atime actually need to be updated" into atime_needs_update(), and have get_link() unlazy if it finds that it will need to do that update. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: don't unlazy until get_link()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: make unlazy_walk and terminate_walk handle nd->stack, add unlazy_linkAl Viro
We are almost done - primitives for leaving RCU mode are aware of nd->stack now, a new primitive for going to non-RCU mode when we have a symlink on hands added. The thing we are heavily relying upon is that *any* unlazy failure will be shortly followed by terminate_walk(), with no access to nameidata in between. So it's enough to leave the things in a state terminate_walk() would cope with. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-11namei: store seq numbers in nd->stack[]Al Viro
we'll need them for unlazy_walk() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-11namei: make may_follow_link() safe in RCU modeAl Viro
We *can't* call that audit garbage in RCU mode - it's doing a weird mix of allocations (GFP_NOFS, immediately followed by GFP_KERNEL) and I'm not touching that... thing again. So if this security sclero^Whardening feature gets triggered when we are in RCU mode, tough - we'll fail with -ECHILD and have everything restarted in non-RCU mode. Only to hit the same test and fail, this time with EACCES and with (oh, rapture) an audit spew produced. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-11namei: make put_link() RCU-safeAl Viro
very simple - just make path_put() conditional on !RCU. Note that right now it doesn't get called in RCU mode - we leave it before getting anything into stack. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-11switch ->put_link() from dentry to inodeAl Viro
only one instance looks at that argument at all; that sole exception wants inode rather than dentry. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-11security: make inode_follow_link RCU-walk awareNeilBrown
inode_follow_link now takes an inode and rcu flag as well as the dentry. inode is used in preference to d_backing_inode(dentry), particularly in RCU-walk mode. selinux_inode_follow_link() gets dentry_has_perm() and inode_has_perm() open-coded into it so that it can call avc_has_perm_flags() in way that is safe if LOOKUP_RCU is set. Calling avc_has_perm_flags() with rcu_read_lock() held means that when avc_has_perm_noaudit calls avc_compute_av(), the attempt to rcu_read_unlock() before calling security_compute_av() will not actually drop the RCU read-lock. However as security_compute_av() is completely in a read_lock()ed region, it should be safe with the RCU read-lock held. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-11namei: pick_link() callers already have inodeAl Viro
no need to refetch (and once we move unlazy out of there, recheck ->d_seq). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-11VFS: Handle lower layer dentry/inode in pathwalkDavid Howells
Make use of d_backing_inode() in pathwalk to gain access to an inode or dentry that's on a lower layer. Signed-off-by: David Howells <dhowells@redhat.com>
2015-05-11namei: store inode in nd->stack[]Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-11namei: don't mangle nd->seq in lookup_fast()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-11namei: explicitly pass seq number to unlazy_walk() when dentry != NULLAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-11link_path_walk: use explicit returns for failure exitsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-11namei: lift terminate_walk() all the way upAl Viro
Lift it from link_path_walk(), trailing_symlink(), lookup_last(), mountpoint_last(), complete_walk() and do_last(). A _lot_ of those suckers merge. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-11namei: lift link_path_walk() call out of trailing_symlink()Al Viro
Make trailing_symlink() return the pathname to traverse or ERR_PTR(-E...). A subtle point is that for "magic" symlinks it returns "" now - that leads to link_path_walk("", nd), which is immediately returning 0 and we are back to the treatment of the last component, at whereever the damn thing has left us. Reduces the stack footprint - link_path_walk() called on more shallow stack now. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-11namei: path_init() calling conventions changeAl Viro
* lift link_path_walk() into callers; moving it down into path_init() had been a mistake. Stack footprint, among other things... * do _not_ call path_cleanup() after path_init() failure; on all failure exits out of it we have nothing for path_cleanup() to do * have path_init() return pathname or ERR_PTR(-E...) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-11namei: get rid of nameidata->baseAl Viro
we can do fdput() under rcu_read_lock() just fine; all we need to take care of is fetching nd->inode value first. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>