Diffstat (limited to 'doc/database')
-rw-r--r-- doc/database 81
1 files changed, 81 insertions, 0 deletions
diff --git a/doc/database b/doc/database
new file mode 100644
index 0000000..cac7a3a
--- /dev/null
+++ b/doc/database
@@ -0,0 +1,81 @@
+There are two databases. The log database contains a record of operations
+and events. (Operation logging is optional.) The file database contains a
+record of known files. In general, the file database is configured with
+sqlite options favoring stability, while the log database is configured for
+speed, because logged operations tend to outnumber file database updates by
+a large margin.
+
+FILES:
+ id (unique key)
+ path (varchar, if known)
+ dev (integer)
+ ino (integer)
+ uid (integer)
+ gid (integer)
+ mode (integer)
+ rdev (integer)
+
+There are two indexes on the file database, one by path and one by device
+and inode. Earlier versions of pseudo ignored symlinks, but this turned
+out to create problems; specifically, if you had a symlink to a directory,
+and accessed a file through that, it could create unexpected results. Names
+are fully canonicalized by the client, except for functions which would
+operate directly on a symlink, in which case the last path component is not
+replaced.
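
The table and its two indexes might be declared roughly as follows. This is a minimal sketch using Python's sqlite3 module: the column names come from the FILES listing above, but the SQL types, the index names (files__path, files__dev_ino), and the in-memory database are assumptions made for illustration, not pseudo's actual schema code.

```python
import sqlite3

# Hypothetical schema: columns follow the FILES listing; the types and
# index names are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (
        id   INTEGER PRIMARY KEY,
        path VARCHAR,
        dev  INTEGER,
        ino  INTEGER,
        uid  INTEGER,
        gid  INTEGER,
        mode INTEGER,
        rdev INTEGER
    );
    -- One index by path, one by device and inode.
    CREATE INDEX files__path ON files (path);
    CREATE INDEX files__dev_ino ON files (dev, ino);
""")

# List the explicitly created indexes on the table (column 1 is the name).
index_names = sorted(
    row[1] for row in conn.execute("PRAGMA index_list('files')")
)
```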
+
+It is not an error to have multiple entries with the same device and inode.
+Updates to uid, gid, mode, or rdev are applied to every file with the same
+device and inode. Operations by name are handled by looking up the name
+to obtain the device and inode, then modifying all matching records.
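
The lookup-then-update behaviour described above can be sketched like this. The helper name chown_by_path and the sample rows are hypothetical; only the two-step logic (resolve the name to a device and inode, then update every matching record) is taken from the text.

```python
import sqlite3

def chown_by_path(conn, path, uid, gid):
    """Resolve path to (dev, ino), then update every record sharing them.

    Returns the number of records modified (0 if the path is unknown).
    """
    row = conn.execute(
        "SELECT dev, ino FROM files WHERE path = ?", (path,)
    ).fetchone()
    if row is None:
        return 0
    cur = conn.execute(
        "UPDATE files SET uid = ?, gid = ? WHERE dev = ? AND ino = ?",
        (uid, gid, row[0], row[1]),
    )
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (id INTEGER PRIMARY KEY, path VARCHAR, dev INTEGER,"
    " ino INTEGER, uid INTEGER, gid INTEGER, mode INTEGER, rdev INTEGER)"
)
# Two names for the same device/inode (a hard link) plus an unrelated file.
conn.executemany(
    "INSERT INTO files (path, dev, ino, uid, gid, mode, rdev)"
    " VALUES (?, ?, ?, 0, 0, 33188, 0)",
    [("/a/file", 1, 100), ("/a/hardlink", 1, 100), ("/a/other", 1, 101)],
)

updated = chown_by_path(conn, "/a/hardlink", 1000, 1000)
owners = dict(conn.execute("SELECT path, uid FROM files"))
```

Both names for inode 100 pick up the new owner, while the unrelated file is left alone.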
+
+If a file shows up with no name (this should VERY rarely happen), it is stored
+in the database with the special name 'NAMELESS FILE'. This name can never
+be sent by the client (all names are sent as absolute paths). If a later
+request comes in with a valid name, the 'NAMELESS FILE' is renamed to it so
+it can be unlinked later.
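
A sketch of how a nameless record might later acquire its real name. The text does not say how the server matches the later request to the nameless record; matching on device and inode is an assumption here, as are the helper name adopt_name and the reduced table.

```python
import sqlite3

NAMELESS = "NAMELESS FILE"  # the special name described in the text

def adopt_name(conn, path, dev, ino):
    """Rename a nameless record to path; returns the number of rows renamed.

    Assumption: the nameless record is identified by device and inode.
    """
    cur = conn.execute(
        "UPDATE files SET path = ? WHERE path = ? AND dev = ? AND ino = ?",
        (path, NAMELESS, dev, ino),
    )
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (id INTEGER PRIMARY KEY, path VARCHAR,"
    " dev INTEGER, ino INTEGER)"
)
# A file that showed up without a name.
conn.execute(
    "INSERT INTO files (path, dev, ino) VALUES (?, 1, 200)", (NAMELESS,)
)

renamed = adopt_name(conn, "/tmp/scratch", 1, 200)
paths = [row[0] for row in conn.execute("SELECT path FROM files")]
```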
+
+Rename operations use a pair of paths, separated by a null byte; the client
+sends the total length of both names (plus the null byte), and the server
+knows to split them around the null byte. The impact of a rename on things
+contained within a directory is handled in SQL:
+ UPDATE files SET path = replace(path, oldpath, newpath) WHERE
+ path = oldpath;
+ UPDATE files SET path = replace(path, oldpath, newpath) WHERE
+ (path > oldpath || '/') AND (path < oldpath || '0');
+That is to say, anything which either starts with "oldpath/" or is exactly
+equal to oldpath gets renamed, with oldpath replaced by newpath. The
+unusual constructions address two key issues. One is that an "OR"
+would prevent proper use of the index. The other is that a pattern,
+such as "LIKE oldpath || '/%'", would also prevent use of the index (at
+least in sqlite). The gimmick is that '0' is the character immediately
+after '/' in ASCII, so the only things greater than 'a/' and less than
+'a0' are strings which begin with 'a/' and have additional characters
+after it.
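
The rename handling can be exercised end to end with a small sketch. The function names split_rename_paths and rename_tree, the named SQL parameters, and the sample paths are assumptions for illustration; the two UPDATE statements follow the range comparison described above, relying on SQLite's default byte-wise string ordering.

```python
import sqlite3

def split_rename_paths(payload: bytes):
    """Split 'oldpath\\0newpath' around the first null byte."""
    old, new = payload.split(b"\0", 1)
    return old.decode(), new.decode()

def rename_tree(conn, oldpath, newpath):
    """Rename oldpath itself, then everything beneath 'oldpath/'."""
    params = {"old": oldpath, "new": newpath}
    conn.execute(
        "UPDATE files SET path = replace(path, :old, :new)"
        " WHERE path = :old",
        params,
    )
    # '0' directly follows '/' in ASCII, so this range matches exactly the
    # strings that begin with 'oldpath/' and have more characters after it.
    conn.execute(
        "UPDATE files SET path = replace(path, :old, :new)"
        " WHERE (path > :old || '/') AND (path < :old || '0')",
        params,
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, path VARCHAR)")
conn.execute("CREATE INDEX files__path ON files (path)")
conn.executemany(
    "INSERT INTO files (path) VALUES (?)",
    [("/a/dir",), ("/a/dir/x",), ("/a/dir/sub/y",),
     ("/a/dir0",), ("/a/dirx",), ("/a/dir.bak",)],
)

old, new = split_rename_paths(b"/a/dir\0/b/new")
rename_tree(conn, old, new)
result = {row[0] for row in conn.execute("SELECT path FROM files")}
```

Siblings that merely share the prefix, such as /a/dir0, /a/dirx, and /a/dir.bak, fall outside the range and keep their names.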
+
+LOGS:
+ id (unique key)
+ stamp (integer, seconds since epoch)
+ operation (id from operations, can be null)
+ client (integer identifier)
+ dev (integer)
+ ino (integer)
+ mode (integer)
+ path (varchar)
+ result (result id)
+ severity (severity id)
+ text (anything else you wanted to say)
+ tag (identifier for operations)
+
+The log database contains a primary table (logs). As of this writing it
+is not indexed, because indexes make writes more expensive (and writes are
+common for the log database), while queries are rarely run.
+
+The log database also contains, when created, tables of operations, result
+types, and severities. These exist so that queries can be run against
+a log database even if these values might have changed in a newer build
+of pseudo. The tables of operations and severities are just id->name pairs.
+No enforcement of the relation is currently provided.
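
Because the lookup tables travel with the log database, a query can resolve ids to names without consulting the pseudo build that wrote it. The sketch below assumes a lookup table called operations with (id, name) columns, a reduced logs table, and made-up operation names; a LEFT JOIN is used because the operation column can be null and the relation is not enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical id->name lookup table, stored alongside the logs.
    CREATE TABLE operations (id INTEGER PRIMARY KEY, name VARCHAR);
    -- Reduced logs table: only the columns this sketch needs.
    CREATE TABLE logs (
        id        INTEGER PRIMARY KEY,
        stamp     INTEGER,
        operation INTEGER,  -- id from operations, may be NULL
        path      VARCHAR
    );
    INSERT INTO operations VALUES (1, 'chmod'), (2, 'unlink');
    INSERT INTO logs (stamp, operation, path) VALUES
        (1000, 1, '/a'), (1001, NULL, '/b'), (1002, 2, '/c');
""")

# LEFT JOIN keeps log rows whose operation is NULL or has no matching id.
rows = conn.execute(
    "SELECT logs.path, operations.name FROM logs"
    " LEFT JOIN operations ON logs.operation = operations.id"
    " ORDER BY logs.id"
).fetchall()
```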
+
+The log database "tag" field, added since the initial release of pseudo,
+is available for tagging operations. When a client connects to the
+pseudo server, it passes the value of the environment variable PSEUDO_TAG;
+this tag is then recorded for all log entries pertaining to that client.
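
The tagging flow could look roughly like this. PSEUDO_TAG is the variable named in the text; the functions connect_client and log_event, the session dictionary, and the reduced logs table are hypothetical stand-ins for the real client/server exchange.

```python
import os
import sqlite3

def connect_client(client_id, environ=os.environ):
    """Capture the client's PSEUDO_TAG (None if unset) at connect time."""
    return {"client": client_id, "tag": environ.get("PSEUDO_TAG")}

def log_event(conn, session, path, text):
    """Record a log entry stamped with the tag captured for this client."""
    conn.execute(
        "INSERT INTO logs (client, path, text, tag) VALUES (?, ?, ?, ?)",
        (session["client"], path, text, session["tag"]),
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs (id INTEGER PRIMARY KEY, client INTEGER,"
    " path VARCHAR, text VARCHAR, tag VARCHAR)"
)

# Pass an explicit environment so the sketch does not depend on the shell.
session = connect_client(7, {"PSEUDO_TAG": "build-42"})
log_event(conn, session, "/a", "first entry")
log_event(conn, session, "/b", "second entry")
tags = [row[0] for row in conn.execute("SELECT tag FROM logs ORDER BY id")]
```

Every entry written for the client carries the same tag, so a later query can filter on it, for example with WHERE tag = 'build-42'.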