author Peter Seebach <seebs@eee12.(none)> 2010-03-16 19:26:24 -0500
committer Peter Seebach <seebs@eee12.(none)> 2010-03-16 19:26:24 -0500
commit 33d9386e8d818860ce603356eee074d2a2849085
tree 185f2c14b86df39403971f0651f4b6136dda147f /doc
initial public release
Diffstat (limited to 'doc')
4 files changed, 286 insertions, 0 deletions
diff --git a/doc/database b/doc/database
new file mode 100644
index 0000000..cac7a3a
--- /dev/null
+++ b/doc/database
@@ -0,0 +1,81 @@
+There are two databases. The log database contains a record of operations
+and events. (Operation logging is optional.) The file database contains a
+record of known files. In general, the file database is configured with
+sqlite options favoring stability, while the log database is configured for
+speed, as operation logging tends to outnumber file operations by a large
+ id (unique key)
+ path (varchar, if known)
+ dev (integer)
+ ino (integer)
+ uid (integer)
+ gid (integer)
+ mode (integer)
+ rdev (integer)
+There are two indexes on the file database, one by path and one by device
+and inode. Earlier versions of pseudo ignored symlinks, but this turned
+out to create problems; specifically, if you had a symlink to a directory,
+and accessed a file through that, it could create unexpected results. Names
+are fully canonicalized by the client, except for functions which would
+operate directly on a symlink, in which case the last path component is not
+It is not an error to have multiple entries with the same device and inode.
+Updates to uid, gid, mode, or rdev are applied to every file with the same
+device and inode. Operations by name are handled by looking up the name
+to obtain the device and inode, then modifying all matching records.
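The name-based update described above can be sketched in C roughly as follows. This is a minimal in-memory illustration, not pseudo's code: struct file_row, find_by_path, and chown_by_path are hypothetical names, and the real server does this work in sqlite rather than over an array.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical in-memory stand-in for one row of the files table. */
struct file_row {
    const char *path;
    dev_t dev;
    ino_t ino;
    uid_t uid;
    gid_t gid;
    mode_t mode;
};

/* Look up a row by path; returns NULL if the path is unknown. */
static const struct file_row *find_by_path(const struct file_row *rows,
                                           size_t n, const char *path)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(rows[i].path, path) == 0)
            return &rows[i];
    }
    return NULL;
}

/*
 * Apply an ownership change by name: resolve the name to (dev, ino),
 * then update every row sharing that pair. Returns the number of rows
 * changed, or 0 if the name is not in the table.
 */
size_t chown_by_path(struct file_row *rows, size_t n, const char *path,
                     uid_t uid, gid_t gid)
{
    const struct file_row *hit = find_by_path(rows, n, path);
    if (hit == NULL)
        return 0;

    dev_t dev = hit->dev;
    ino_t ino = hit->ino;
    size_t changed = 0;

    for (size_t i = 0; i < n; i++) {
        if (rows[i].dev == dev && rows[i].ino == ino) {
            rows[i].uid = uid;
            rows[i].gid = gid;
            changed++;
        }
    }
    return changed;
}
```

With two rows that share a device and inode (hard links, say), changing the owner through either name updates both, which is why duplicate device/inode pairs are not treated as an error.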
+If a file shows up with no name (this should VERY rarely happen), it is stored
+in the database with the special name 'NAMELESS FILE'. This name can never
+be sent by the client (all names are sent as absolute paths). If a later
+request comes in with a valid name, the 'NAMELESS FILE' is renamed to it so
+it can be unlinked later.
+Rename operations use a pair of paths, separated by a null byte; the client
+sends the total length of both names (plus the null byte), and the server
+knows to split them around the null byte. The impact of a rename on things
+contained within a directory is handled in SQL:
+ UPDATE files SET path = replace(path, oldpath, newpath) WHERE
+ path = oldpath;
+ UPDATE files SET path = replace(path, oldpath, newpath) WHERE
+ (path > oldpath || '/') AND (path < oldpath || '0');
+That is to say, anything which either starts with "oldpath/" or is exactly
+equal to oldpath gets renamed, with oldpath replaced by newpath. The
+unusual constructions address two key issues. One is that an "OR"
+would prevent proper use of the index. The other is that a pattern,
+such as "LIKE oldpath || '/%'", would prevent use of the index (at least
+in sqlite). The gimmick is that '0' is the character immediately after '/'
+in ASCII, so the only things greater than 'a/' and less than 'a0' are
+strings which begin with 'a/' and have additional characters after it.
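The range comparison can be checked outside of SQL with plain byte-wise string comparison. The sketch below is only an illustration of the ordering argument; in_rename_range is a hypothetical helper, and it assumes paths compare byte by byte, as strcmp does for ASCII names.

```c
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 if path sorts strictly between oldpath + "/" and
 * oldpath + "0", which is the second UPDATE's WHERE clause.
 * Returns 0 otherwise, including when oldpath is too long for the
 * fixed-size scratch buffers used in this sketch.
 */
int in_rename_range(const char *path, const char *oldpath)
{
    char lower[4096];
    char upper[4096];

    if (strlen(oldpath) + 2 > sizeof lower)
        return 0;

    snprintf(lower, sizeof lower, "%s/", oldpath);
    snprintf(upper, sizeof upper, "%s0", oldpath);

    return strcmp(path, lower) > 0 && strcmp(path, upper) < 0;
}
```

For oldpath "/a", the names "/a/b" and "/a/b/c" fall inside the range, while "/a" itself, "/a/", "/ab", "/a0", and "/a.b" do not. The exact match "/a" is what the first UPDATE statement is for.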
+ id (unique key)
+ stamp (integer, seconds since epoch)
+ operation (id from operations, can be null)
+ client (integer identifier)
+ dev (integer)
+ ino (integer)
+ mode (integer)
+ path (varchar)
+ result (result id)
+ severity (severity id)
+ text (anything else you wanted to say)
+ tag (identifier for operations)
+The log database contains a primary table (logs). As of this writing it
+is not indexed, because indexing is expensive during writes (common, for
+the log database) and very few queries are usually run.
+The log database also contains, when created, tables of operations, result
+types, and severities. These exist so that queries can be run against
+a log database even if these values might have changed in a newer build
+of pseudo. The tables of operations and severities are just id->name pairs.
+No enforcement of the relation is currently provided.
+The log database "tag" field, added since the initial release of pseudo,
+is available for tagging operations. When a client connects to the
+pseudo server, it passes the value of the environment variable PSEUDO_TAG;
+this tag is then recorded for all log entries pertaining to that client.
diff --git a/doc/overview b/doc/overview
new file mode 100644
index 0000000..7341e3c
--- /dev/null
+++ b/doc/overview
@@ -0,0 +1,96 @@
+The pseudo program and library combine to create an environment which
+provides the illusion of root permissions with respect to file creation,
+ownership, and related functions. At this time, this does not extend to
+emulating chroot functions or a virtual password database, but these features
+may be added.
+The underlying mechanism of pseudo is a library inserted using LD_PRELOAD,
+which provides replacement symbols for core C library functions. At this
+time, the implementation is specific to modern glibc. Support for other
+systems is certainly possible, but not currently implemented or immediately
+planned. The symbols wrapped are generally those that are documented in
+section 2 of the manual -- the ones which are essentially system calls.
+The library works by replacing each real function with a wrapper function
+which obtains the addresses of "real" functions (those in the next library
+down in the chain, typically glibc) and then calls custom-written wrappers
+which alter the behavior of these functions and return results corresponding
+to the virtual environment.
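A minimal sketch of the "obtain the real function, then wrap it" idea, assuming glibc and dlsym(3) with RTLD_NEXT. The names sketch_getuid, sketch_real_getuid, and resolve_real_getuid are hypothetical and chosen so the code can be built as an ordinary program; in an actual preloaded library the wrapper would carry the name of the function it replaces. This is not pseudo's wrapper code, and older glibc versions may need -ldl at link time.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes RTLD_NEXT on glibc; must precede includes */
#endif
#include <dlfcn.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

typedef uid_t (*getuid_fn)(void);

/* Cached address of the "real" getuid from the next library in the chain. */
static getuid_fn real_getuid;

/* Resolve the real function once; returns 1 on success, 0 on failure. */
static int resolve_real_getuid(void)
{
    if (real_getuid == NULL) {
        /* POSIX allows converting a dlsym() result to a function pointer. */
        real_getuid = (getuid_fn) dlsym(RTLD_NEXT, "getuid");
    }
    return real_getuid != NULL;
}

/* Calls through to the real getuid without altering the answer. */
uid_t sketch_real_getuid(void)
{
    return resolve_real_getuid() ? real_getuid() : (uid_t) -1;
}

/*
 * A wrapper in the spirit of the text: call the real function, then
 * report the virtual environment's answer (always root in this sketch).
 */
uid_t sketch_getuid(void)
{
    if (!resolve_real_getuid())
        return (uid_t) -1;
    (void) real_getuid(); /* real result is available but overridden */
    return 0;
}
```

Looking the symbol up lazily and caching it keeps the cost of the lookup out of every later call.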
+Underlying this is access to a server process, which is automatically
+spawned by the library if one is not available. The server process maintains
+a UNIX domain socket while it is active, and maintains a database (using
+sqlite) of files known to the system. Files are recorded in the database
+only if they are created within the virtualized environment or have been
+altered by it; files merely read are not added.
+There are four layers of logic for performing or wrapping any function,
+although not all functions involve all four layers:
+1. The generic wrapper, which handles details such as thread-synchronization.
+This function handles the mutex used to keep multiple threads from trying to
+write to the same socket at once, and also disables wrappers when a value
+called "antimagic" is set. The antimagic value is set internally by the
+pseudo client code, and the check for whether or not to use it is controlled
+by the mutex (actually by the mutex owner variable, which is protected by
+the mutex). Without that, read operations in another thread during the
+"antimagic" part of an operation would bypass pseudo, yielding erratically
+wrong results!
+2. The wrapper function itself. This function may translate a single
+operation into two or more logical operations. This function has no awareness
+of the database, but can send queries to the general client code.
+3. The general client code. This code maintains additional data, such as
+a mapping of file descriptors to paths. In most cases, this code also
+forwards requests to the server code. (If the server is unavailable, the
+client can restart it.)
+4. The server code. This code is fairly simple; all it does is maintain
+the database of file information. Operations consist either of a request
+for information (e.g., a stat(2) call) or notification of a change. The
+server sends back failure or success notices.
+As a fairly typical example, the progress of a stat(2) call is:
+* The __xstat() wrapper is called. This wrapper checks the version argument
+ against the _STAT_VER constant in case we some day run into a system where
+ programs call stat with different versions of struct stat. (Hasn't happened
+ yet.)
+* The __xstat() wrapper calls the __fxstatat() wrapper, which in turn calls
+ the __fxstatat64() wrapper (this allows us to have only one copy of the
+ logic shared among all the path-based stat syscalls).
+* The __fxstatat64() wrapper calls the underlying __fxstatat64() function,
+ which has been mapped to the name real___fxstatat64(). (If this fails,
+ the wrapper function returns immediately.)
+* The __fxstatat64() wrapper passes the resulting stat buffer and path to the
+ client code and asks for a response.
+* The client code converts the stat buffer into a pseudo_msg_t message
+ object, and canonicalizes the path (resolving symlinks and eliminating
+ extra slashes, as well as references to . and ..).
+* The client code now sends the pseudo_msg_t object and converted path to
+ the server as a message.
+* The server receives the message. Since this is a stat() operation (using
+ a path, not a dev/inode pair, for identification), the server searches its
+ database for existing entries with the corresponding name.
+* If the server finds an object, it updates the contents of the pseudo_msg_t
+ with the recorded values for uid, gid, mode, and raw device number, and
+ sends the message back with status SUCCEED.
+* The server also performs sanity checks to see whether there may be other
+ suspiciously-similar entries in the database, in which case it emits
+ diagnostics. (Usually to pseudo.log.)
+* If the server finds no object, it sends the message back with status FAIL.
+* The client code returns the message to the wrapper function.
+* If the status was SUCCEED, the wrapper function copies the modified
+ fields back into its stat buffer; otherwise, it does not.
+* The wrapper function returns the original exit status from stat.
+Most of the functions wrapped are syscalls. There are a few exceptions, such
+as mkstemp, fopen, and freopen. These are wrapped because, in glibc, they
+call internal functions which make inline assembly syscalls, rather than
+calling the syscall entry points. In each case, the wrapper makes the real
+call without intervention, then snoops the results for a file descriptor to
+path mapping. (This would be done to opendir/fdopendir/closedir as well,
+but the DIR * is opaque and can't be snooped practically. This is why
+some versions of 'rm -r' can, at higher diagnostic levels, generate a slew
+of warnings about file descriptors being reopened when no close was
diff --git a/doc/pseudo_ipc b/doc/pseudo_ipc
new file mode 100644
index 0000000..6a73ec8
--- /dev/null
+++ b/doc/pseudo_ipc
@@ -0,0 +1,76 @@
+typedef struct {
+ pseudo_msg_type_t type;
+ op_id_t op;
+ res_id_t result;
+ int xerrno;
+ int client;
+ dev_t dev;
+ unsigned long long ino;
+ uid_t uid;
+ gid_t gid;
+ unsigned long long mode;
+ dev_t rdev;
+ unsigned int pathlen;
+ int nlink;
+ char path[];
+} pseudo_msg_t;
+This structure is used for every communication between the client and the
+server. The last field is optional (it's a C99ism called a flexible array
+member, allowing a single allocation to hold both the structure and the
+variable-length character data at the end).
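As an illustration of the single-allocation point, here is a sketch using a cut-down structure. alloc_msg_t and alloc_msg_new are hypothetical; the real pseudo_msg_t has many more fields, and whether pathlen counts the terminating null byte is an assumption made for this example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cut-down message: fixed header plus flexible array member. */
typedef struct {
    int type;
    unsigned int pathlen;
    char path[];
} alloc_msg_t;

/*
 * Allocate the header and the path in one block. A NULL path yields a
 * header-only message with pathlen 0. In this sketch pathlen includes
 * the terminating null byte. The caller frees the result with free().
 */
alloc_msg_t *alloc_msg_new(int type, const char *path)
{
    size_t pathlen = path ? strlen(path) + 1 : 0;
    alloc_msg_t *msg = malloc(sizeof *msg + pathlen);

    if (msg == NULL)
        return NULL;

    msg->type = type;
    msg->pathlen = (unsigned int) pathlen;
    if (pathlen > 0)
        memcpy(msg->path, path, pathlen);
    return msg;
}
```

Because the path lives in the same allocation as the header, one free() releases both.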
+All messages contain items up through 'pathlen'. If pathlen is not zero,
+an additional pathlen bytes containing path are provided; path is
+Every message from the client should get a response from the server. The
+server never really sends a path, currently, but maybe it will someday. Note
+that all server responses will in general share a single message object,
+and future operations may cause that object to be reallocated; the same
+goes for messages received by the server. Basically, pseudo_msg_receive
+is not thread-safe; this is part of (but not all of) the reason that there's
+mutex stuff in the wrappers. (The other part is the "antimagic" being
+able to blow things up.)
+type is one of PING, OP, SHUTDOWN, ACK, or NAK. The client only sends PING or
+OP. The server should always send ACK. When run with '-S', the pseudo
+program runs as a client, sending a SHUTDOWN message to a server -- but only
+if it can find one; it does not start a new one. In this case, the server
+could respond with a NAK, in which case it sends a message in which "path"
+is a list of space-separated PIDs of currently-living clients, for the program
+to print out in an error message. The server will not shut down while there
+are living clients. (The request, though, causes it to shut down immediately
+when there are no more clients, rather than waiting for the timeout
+result is the result of a particular operation. It applies only in replies
+to OP messages.
+client should be the client's PID on send, and the server's client number for
+that client on response. (The response isn't checked, and this is just a
+debugging feature.)
+dev/ino/uid/gid/mode/rdev/path are information about the file. They should
+all be provided on send if possible, but the server only generally changes
+uid/gid/mode/rdev on response, and never sends a path back. Dev and inode
+are currently changed by stat-by-path operations, but this may turn out to
+be wrong.
+xerrno is used to contain a changed errno if, at some point, the server wants
+to override the default errno. Normally, the client just uses its existing
+nlink is used to forward the number of links. The server DOES NOT modify
+this. Rather, nlink is used to provide better diagnostics when checking
+paths against inodes.
+32/64 bit: This structure should have the same offsets for every element,
+including path, on both 64-bit and 32-bit machines. (Check with 'offsets.c'.)
+It is *not* an error if sizeof(pseudo_msg_t) is different; the padding
+happens after the path element. (Note: This is contrary to C99, TC1, but is
+correct according to the current standard. Anyway, gcc's always done it this
+way.) The data written are always pathlen + offsetof(pseudo_msg_t, path),
+and that's correct.
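The length computation can be sketched with a small stand-in structure. wire_msg_t and wire_length are hypothetical names, and the field list is shortened so that the trailing padding is easy to see; it is not the real pseudo_msg_t layout.

```c
#include <stddef.h>

/* Hypothetical cut-down message with an 8-byte member before the tail. */
typedef struct {
    unsigned long long ino;
    unsigned int pathlen;
    char path[];
} wire_msg_t;

/*
 * Number of bytes to put on the wire: everything up to the start of
 * path, plus the path bytes themselves. sizeof(wire_msg_t) is
 * deliberately not used, since it may include trailing padding that
 * differs between 32-bit and 64-bit builds.
 */
size_t wire_length(const wire_msg_t *msg)
{
    return offsetof(wire_msg_t, path) + msg->pathlen;
}
```

On a typical x86-64 build one would expect offsetof(wire_msg_t, path) to be 12 while sizeof(wire_msg_t) is 16, whereas a typical 32-bit x86 build gives 12 for both; using offsetof keeps the bytes written the same in either case.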
diff --git a/doc/utils b/doc/utils
new file mode 100644
index 0000000..35520d4
--- /dev/null
+++ b/doc/utils
@@ -0,0 +1,33 @@
+ Displays or creates log entries. This offers a quick first
+ approximation of the sorts of queries one is likely to need to
+ run.
+ The pseudo server. Run on the command line, the default behavior
 is to set up a pseudo environment (LD_PRELOAD, etc.), then run either
+ the specified command or a shell by default. The -d option specifies
+ a background daemon, and -f specifies a foreground daemon (which
+ may display output directly). The launcher function isn't really
+ 32-bit/64-bit aware, but if you have both types of libraries in
+ suitably-named directories, it'll do the right thing anyway.
+ Path may be in environment as PSEUDO_PREFIX, specified on command
+ line with -P path, or inferred from the path to $0. (The last
+ generates a diagnostic.)
+ To stop the pseudo server, either wait a while (the default timeout
+ is 30 seconds, or whatever you specified with "-t" or PSEUDO_OPTS
+ when starting it) or run "pseudo -S". The server will not exit
+ while clients are active, but requesting a shutdown sets the timeout
+ to one second, so it will exit quickly after the last client
+ disconnects.
+ The library providing the wrapper functionality, which spawns
+ the pseudo server automatically if needed. If the environment
+ variable PSEUDO_ENOSYS_ABORT is set, attempts to call missing
+ system calls will abort() rather than merely emitting a diagnostic.
+ allows browsing and modification of db (not implemented)