diff options
Diffstat (limited to 'doc/database')
-rw-r--r-- | doc/database | 81 |
1 files changed, 81 insertions, 0 deletions
diff --git a/doc/database b/doc/database new file mode 100644 index 0000000..cac7a3a --- /dev/null +++ b/doc/database @@ -0,0 +1,81 @@ +There are two databases. The log database contains a record of operations +and events. (Operation logging is optional.) The file database contains a +record of known files. In general, the file database is configured with +sqlite options favoring stability, while the log database is configured for +speed, as operation logging tends to outnumber file operations by a large +margin. + +FILES: + id (unique key) + path (varchar, if known) + dev (integer) + ino (integer) + uid (integer) + gid (integer) + mode (integer) + rdev (integer) + +There are two indexes on the file database, one by path and one by device +and inode. Earlier versions of pseudo ignored symlinks, but this turned +out to create problems; specifically, if you had a symlink to a directory, +and accessed a file through that, it could create unexpected results. Names +are fully canonicalized by the client, except for functions which would +operate directly on a symlink, in which case the last path component is not +replaced. + +It is not an error to have multiple entries with the same device and inode. +Updates to uid, gid, mode, or rdev are applied to every file with the same +device and inode. Operations by name are handled by looking up the name +to obtain the device and inode, then modifying all matching records. + +If a file shows up with no name (this should VERY rarely happen), it is stored +in the database with the special name 'NAMELESS FILE'. This name can never +be sent by the client (all names are sent as absolute paths). If a later +request comes in with a valid name, the 'NAMELESS FILE' is renamed to it so +it can be unlinked later. + +Rename operations use a pair of paths, separated by a null byte; the client +sends the total length of both names (plus the null byte), and the server +knows to split them around the null byte. The impact of a rename on things +contained within a directory is handled in SQL: + UPDATE files SET path = replace(path, oldpath, newpath) WHERE + path = oldpath; + UPDATE files SET path = replace(path, oldpath, newpath) WHERE + (path > oldpath || '/') && (path < oldpath || '0); +That is to say, anything which either starts with "oldpath/" or is exactly +equal to oldpath gets renamed, with oldpath replaced by newpath... The +unusual constructions are to address two key issues. One is that an "OR" +would prevent proper use of the index. The other is that a pattern, +such as "LIKE oldpath || '/%'", would prevent use of the index (at least +in sqlite). The gimmick is that the only things greater than 'a/' and less +than 'a0' are strings which begin with 'a/' and have additional characters +after it. + +LOGS + id (unique key) + stamp (integer, seconds since epoch) + operation (id from operations, can be null) + client (integer identifier) + dev (integer) + ino (integer) + mode (integer) + path (varchar) + result (result id) + severity (severity id) + text (anything else you wanted to say) + tag (identifier for operations) + +The log database contains a primary table (logs). As of this writing it +is not indexed, because indexing is expensive during writes (common, for +the log database) and very few queries are usually run. + +The log database also contains, when created, tables of operations, result +types, and severities. These exist so that queries can be run against +a log database even if these values might have changed in a newer build +of pseudo. The tables of operations and severities are just id->name pairs. +No enforcement of the relation is currently provided. + +The log database "tag" field, added since the initial release of pseudo, +is available for tagging operations. When a client connects to the +pseudo server, it passes the value of the environment variable PSEUDO_TAG; +this tag is then recorded for all log entries pertaining to that client. |