diff options
18 files changed, 3565 insertions, 0 deletions
diff --git a/features/seccomp/00-README b/features/seccomp/00-README new file mode 100644 index 00000000..e14506a3 --- /dev/null +++ b/features/seccomp/00-README @@ -0,0 +1,114 @@ + +This is a backport of the seccomp BPF syscall filtering from v3.5 + +Quoting from: https://lkml.org/lkml/2012/1/11/260 + +--------------- +[RFC,PATCH 0/2] dynamic seccomp policies (using BPF filters) + +The goal of the patchset is straightforward: + + To provide a means of reducing the kernel attack surface. + +In practice, this is done at the primary kernel ABI: system calls. +Achieving this goal will address the needs expressed by many systems +projects: + qemu/kvm, openssh, vsftpd, lxc, and chromium and chromium os (me). + +While system call filtering has been attempted many times, I hope that +this approach shows more promise. It works as described below and in +the patch series. + +A userland task may call prctl(PR_ATTACH_SECCOMP_FILTER) to attach a +BPF program to itself. Once attached, all system calls made by the +task will be evaluated by the BPF program prior to being accepted. +Evaluation is done by executing the BPF program over the struct +user_regs_state for the process. +-------------- + +The content appears in v3.5 from: + +------------ +commit cb60e3e65c1b96a4d6444a7a13dc7dd48bc15a2b +Merge: 99262a3 ff2bb04 +Author: Linus Torvalds <torvalds@linux-foundation.org> +Date: Mon May 21 20:27:36 2012 -0700 + + Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security + + Pull security subsystem updates from James Morris: + "New notable features: + - The seccomp work from Will Drewry + - PR_{GET,SET}_NO_NEW_PRIVS from Andy Lutomirski + - Longer security labels for Smack from Casey Schaufler + - Additional ptrace restriction modes for Yama by Kees Cook" +----------- + +Here, we take Will's linear block of commits from the above merge, which +are all conveniently all marked with "v18" in the changelog, and the +one PR_{GET,SET}_NO_NEW_PRIVS commit from Andy (req'd as a dependency). + +Documentation: +============== + +See added file: Documentation/prctl/seccomp_filter.txt + + +Testing: +======== + +Several samples are added in samples/seccomp -- building is as easy as: + + mkdir ../test + make O=../test defconfig + make O=../test samples/seccomp/ + +The bpf-direct is a sample which grabs writes to STDERR, and redirects +them to STDOUT, with an "[ERR]" prefix. Consider the core of the program: + +--------------------- + syscall(__NR_write, STDOUT_FILENO, + payload("OHAI! WHAT IS YOUR NAME? ")); + bytes = syscall(__NR_read, STDIN_FILENO, buf, sizeof(buf)); + syscall(__NR_write, STDOUT_FILENO, payload("HELLO, ")); + syscall(__NR_write, STDOUT_FILENO, buf, bytes); + syscall(__NR_write, STDERR_FILENO, + payload("Error message going to STDERR\n")); +--------------------- + +Running this core on a non-seccomp kernel, (i.e. by copying the above core +to "foo.c") we can see with redirection, that the sample Error message +goes to STDERR; i.e. + +----------- +~$./foo +OHAI! WHAT IS YOUR NAME? sdfsdf +HELLO, sdfsdf +Error message going to STDERR +~$./foo 2> /dev/null +OHAI! WHAT IS YOUR NAME? sdfs +HELLO, sdfs +~$ +------------ + +Note in the 2nd instance, the error message disappears into /dev/null + +Now consider the seccomp enabled case, using the same redirect: + +------------ +$ ./bpf-direct +OHAI! WHAT IS YOUR NAME? sdfsd +HELLO, sdfsd +[ERR] Error message going to STDERR +$ ./bpf-direct 2>/dev/null +OHAI! WHAT IS YOUR NAME? sdfsdf +HELLO, sdfsdf +[ERR] Error message going to STDERR +$ +------------ + +There are two things to see in the above. + 1) We see the [ERR] prefix that is clearly from the emulator() + function we've installed on the __NR_write syscall, and + 2) Even when we redirect STDERR to /dev/null, we still see the + message, which confirms it was put on STDOUT instead. diff --git a/features/seccomp/Add-PR_-GET-SET-_NO_NEW_PRIVS-to-prevent-execve-from.patch b/features/seccomp/Add-PR_-GET-SET-_NO_NEW_PRIVS-to-prevent-execve-from.patch new file mode 100644 index 00000000..a14c20d7 --- /dev/null +++ b/features/seccomp/Add-PR_-GET-SET-_NO_NEW_PRIVS-to-prevent-execve-from.patch @@ -0,0 +1,223 @@ +From 45fe2238b82776f1bef0a0eb1082ae8abc97e6a0 Mon Sep 17 00:00:00 2001 +From: Andy Lutomirski <luto@amacapital.net> +Date: Thu, 12 Apr 2012 16:47:50 -0500 +Subject: [PATCH] Add PR_{GET,SET}_NO_NEW_PRIVS to prevent execve from + granting privs + +commit 259e5e6c75a910f3b5e656151dc602f53f9d7548 upstream. + +With this change, calling + prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) +disables privilege granting operations at execve-time. For example, a +process will not be able to execute a setuid binary to change their uid +or gid if this bit is set. The same is true for file capabilities. + +Additionally, LSM_UNSAFE_NO_NEW_PRIVS is defined to ensure that +LSMs respect the requested behavior. + +To determine if the NO_NEW_PRIVS bit is set, a task may call + prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0); +It returns 1 if set and 0 if it is not set. If any of the arguments are +non-zero, it will return -1 and set errno to -EINVAL. +(PR_SET_NO_NEW_PRIVS behaves similarly.) + +This functionality is desired for the proposed seccomp filter patch +series. By using PR_SET_NO_NEW_PRIVS, it allows a task to modify the +system call behavior for itself and its child tasks without being +able to impact the behavior of a more privileged task. + +Another potential use is making certain privileged operations +unprivileged. For example, chroot may be considered "safe" if it cannot +affect privileged tasks. + +Note, this patch causes execve to fail when PR_SET_NO_NEW_PRIVS is +set and AppArmor is in use. It is fixed in a subsequent patch. + +Signed-off-by: Andy Lutomirski <luto@amacapital.net> +Signed-off-by: Will Drewry <wad@chromium.org> +Acked-by: Eric Paris <eparis@redhat.com> +Acked-by: Kees Cook <keescook@chromium.org> + +v18: updated change desc +v17: using new define values as per 3.4 +Signed-off-by: James Morris <james.l.morris@oracle.com> +Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> +--- + fs/exec.c | 10 +++++++++- + include/linux/prctl.h | 15 +++++++++++++++ + include/linux/sched.h | 2 ++ + include/linux/security.h | 1 + + kernel/sys.c | 10 ++++++++++ + security/apparmor/domain.c | 4 ++++ + security/commoncap.c | 7 +++++-- + security/selinux/hooks.c | 10 +++++++++- + 8 files changed, 55 insertions(+), 4 deletions(-) + +diff --git a/fs/exec.c b/fs/exec.c +index b1fd202..d038968 100644 +--- a/fs/exec.c ++++ b/fs/exec.c +@@ -1245,6 +1245,13 @@ static int check_unsafe_exec(struct linux_binprm *bprm) + bprm->unsafe |= LSM_UNSAFE_PTRACE; + } + ++ /* ++ * This isn't strictly necessary, but it makes it harder for LSMs to ++ * mess up. ++ */ ++ if (current->no_new_privs) ++ bprm->unsafe |= LSM_UNSAFE_NO_NEW_PRIVS; ++ + n_fs = 1; + spin_lock(&p->fs->lock); + rcu_read_lock(); +@@ -1288,7 +1295,8 @@ int prepare_binprm(struct linux_binprm *bprm) + bprm->cred->euid = current_euid(); + bprm->cred->egid = current_egid(); + +- if (!(bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)) { ++ if (!(bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID) && ++ !current->no_new_privs) { + /* Set-uid? */ + if (mode & S_ISUID) { + bprm->per_clear |= PER_CLEAR_ON_SETID; +diff --git a/include/linux/prctl.h b/include/linux/prctl.h +index e0cfec2..78b76e2 100644 +--- a/include/linux/prctl.h ++++ b/include/linux/prctl.h +@@ -124,4 +124,19 @@ + #define PR_SET_CHILD_SUBREAPER 36 + #define PR_GET_CHILD_SUBREAPER 37 + ++/* ++ * If no_new_privs is set, then operations that grant new privileges (i.e. ++ * execve) will either fail or not grant them. This affects suid/sgid, ++ * file capabilities, and LSMs. ++ * ++ * Operations that merely manipulate or drop existing privileges (setresuid, ++ * capset, etc.) will still work. Drop those privileges if you want them gone. ++ * ++ * Changing LSM security domain is considered a new privilege. So, for example, ++ * asking selinux for a specific new context (e.g. with runcon) will result ++ * in execve returning -EPERM. ++ */ ++#define PR_SET_NO_NEW_PRIVS 38 ++#define PR_GET_NO_NEW_PRIVS 39 ++ + #endif /* _LINUX_PRCTL_H */ +diff --git a/include/linux/sched.h b/include/linux/sched.h +index 81a173c..ba60897 100644 +--- a/include/linux/sched.h ++++ b/include/linux/sched.h +@@ -1341,6 +1341,8 @@ struct task_struct { + * execve */ + unsigned in_iowait:1; + ++ /* task may not gain privileges */ ++ unsigned no_new_privs:1; + + /* Revert to default priority/policy when forking */ + unsigned sched_reset_on_fork:1; +diff --git a/include/linux/security.h b/include/linux/security.h +index 673afbb..6e1dea9 100644 +--- a/include/linux/security.h ++++ b/include/linux/security.h +@@ -144,6 +144,7 @@ struct request_sock; + #define LSM_UNSAFE_SHARE 1 + #define LSM_UNSAFE_PTRACE 2 + #define LSM_UNSAFE_PTRACE_CAP 4 ++#define LSM_UNSAFE_NO_NEW_PRIVS 8 + + #ifdef CONFIG_MMU + extern int mmap_min_addr_handler(struct ctl_table *table, int write, +diff --git a/kernel/sys.c b/kernel/sys.c +index e7006eb..b82568b 100644 +--- a/kernel/sys.c ++++ b/kernel/sys.c +@@ -1979,6 +1979,16 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, + error = put_user(me->signal->is_child_subreaper, + (int __user *) arg2); + break; ++ case PR_SET_NO_NEW_PRIVS: ++ if (arg2 != 1 || arg3 || arg4 || arg5) ++ return -EINVAL; ++ ++ current->no_new_privs = 1; ++ break; ++ case PR_GET_NO_NEW_PRIVS: ++ if (arg2 || arg3 || arg4 || arg5) ++ return -EINVAL; ++ return current->no_new_privs ? 1 : 0; + default: + error = -EINVAL; + break; +diff --git a/security/apparmor/domain.c b/security/apparmor/domain.c +index 6327685..18c88d0 100644 +--- a/security/apparmor/domain.c ++++ b/security/apparmor/domain.c +@@ -360,6 +360,10 @@ int apparmor_bprm_set_creds(struct linux_binprm *bprm) + if (bprm->cred_prepared) + return 0; + ++ /* XXX: no_new_privs is not usable with AppArmor yet */ ++ if (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS) ++ return -EPERM; ++ + cxt = bprm->cred->security; + BUG_ON(!cxt); + +diff --git a/security/commoncap.c b/security/commoncap.c +index 71a166a..f80d116 100644 +--- a/security/commoncap.c ++++ b/security/commoncap.c +@@ -512,14 +512,17 @@ skip: + + + /* Don't let someone trace a set[ug]id/setpcap binary with the revised +- * credentials unless they have the appropriate permit ++ * credentials unless they have the appropriate permit. ++ * ++ * In addition, if NO_NEW_PRIVS, then ensure we get no new privs. + */ + if ((new->euid != old->uid || + new->egid != old->gid || + !cap_issubset(new->cap_permitted, old->cap_permitted)) && + bprm->unsafe & ~LSM_UNSAFE_PTRACE_CAP) { + /* downgrade; they get no more than they had, and maybe less */ +- if (!capable(CAP_SETUID)) { ++ if (!capable(CAP_SETUID) || ++ (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)) { + new->euid = new->uid; + new->egid = new->gid; + } +diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c +index d85b793..0b06685 100644 +--- a/security/selinux/hooks.c ++++ b/security/selinux/hooks.c +@@ -2016,6 +2016,13 @@ static int selinux_bprm_set_creds(struct linux_binprm *bprm) + new_tsec->sid = old_tsec->exec_sid; + /* Reset exec SID on execve. */ + new_tsec->exec_sid = 0; ++ ++ /* ++ * Minimize confusion: if no_new_privs and a transition is ++ * explicitly requested, then fail the exec. ++ */ ++ if (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS) ++ return -EPERM; + } else { + /* Check for a default transition on this program. */ + rc = security_transition_sid(old_tsec->sid, isec->sid, +@@ -2029,7 +2036,8 @@ static int selinux_bprm_set_creds(struct linux_binprm *bprm) + ad.selinux_audit_data = &sad; + ad.u.path = bprm->file->f_path; + +- if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID) ++ if ((bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID) || ++ (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)) + new_tsec->sid = old_tsec->sid; + + if (new_tsec->sid == old_tsec->sid) { +-- +1.7.9.1 + diff --git a/features/seccomp/Documentation-prctl-seccomp_filter.patch b/features/seccomp/Documentation-prctl-seccomp_filter.patch new file mode 100644 index 00000000..9f431fb6 --- /dev/null +++ b/features/seccomp/Documentation-prctl-seccomp_filter.patch @@ -0,0 +1,1005 @@ +From c837aebb90de91991e51e55cfddf43b6c16da61e Mon Sep 17 00:00:00 2001 +From: Will Drewry <wad@chromium.org> +Date: Thu, 12 Apr 2012 16:48:04 -0500 +Subject: [PATCH] Documentation: prctl/seccomp_filter + +commit 8ac270d1e29f0428228ab2b9a8ae5e1ed4a5cd84 upstream. + +Documents how system call filtering using Berkeley Packet +Filter programs works and how it may be used. +Includes an example for x86 and a semi-generic +example using a macro-based code generator. + +Acked-by: Eric Paris <eparis@redhat.com> +Signed-off-by: Will Drewry <wad@chromium.org> +Acked-by: Kees Cook <keescook@chromium.org> + +v18: - added acked by + - update no new privs numbers +v17: - remove @compat note and add Pitfalls section for arch checking + (keescook@chromium.org) +v16: - +v15: - +v14: - rebase/nochanges +v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc +v12: - comment on the ptrace_event use + - update arch support comment + - note the behavior of SECCOMP_RET_DATA when there are multiple filters + (keescook@chromium.org) + - lots of samples/ clean up incl 64-bit bpf-direct support + (markus@chromium.org) + - rebase to linux-next +v11: - overhaul return value language, updates (keescook@chromium.org) + - comment on do_exit(SIGSYS) +v10: - update for SIGSYS + - update for new seccomp_data layout + - update for ptrace option use +v9: - updated bpf-direct.c for SIGILL +v8: - add PR_SET_NO_NEW_PRIVS to the samples. +v7: - updated for all the new stuff in v7: TRAP, TRACE + - only talk about PR_SET_SECCOMP now + - fixed bad JLE32 check (coreyb@linux.vnet.ibm.com) + - adds dropper.c: a simple system call disabler +v6: - tweak the language to note the requirement of + PR_SET_NO_NEW_PRIVS being called prior to use. (luto@mit.edu) +v5: - update sample to use system call arguments + - adds a "fancy" example using a macro-based generator + - cleaned up bpf in the sample + - update docs to mention arguments + - fix prctl value (eparis@redhat.com) + - language cleanup (rdunlap@xenotime.net) +v4: - update for no_new_privs use + - minor tweaks +v3: - call out BPF <-> Berkeley Packet Filter (rdunlap@xenotime.net) + - document use of tentative always-unprivileged + - guard sample compilation for i386 and x86_64 +v2: - move code to samples (corbet@lwn.net) +Signed-off-by: James Morris <james.l.morris@oracle.com> +Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> +--- + Documentation/prctl/seccomp_filter.txt | 163 ++++++++++++++++++++++ + samples/Makefile | 2 +- + samples/seccomp/Makefile | 38 +++++ + samples/seccomp/bpf-direct.c | 176 +++++++++++++++++++++++ + samples/seccomp/bpf-fancy.c | 102 ++++++++++++++ + samples/seccomp/bpf-helper.c | 89 ++++++++++++ + samples/seccomp/bpf-helper.h | 238 ++++++++++++++++++++++++++++++++ + samples/seccomp/dropper.c | 68 +++++++++ + 8 files changed, 875 insertions(+), 1 deletions(-) + create mode 100644 Documentation/prctl/seccomp_filter.txt + create mode 100644 samples/seccomp/Makefile + create mode 100644 samples/seccomp/bpf-direct.c + create mode 100644 samples/seccomp/bpf-fancy.c + create mode 100644 samples/seccomp/bpf-helper.c + create mode 100644 samples/seccomp/bpf-helper.h + create mode 100644 samples/seccomp/dropper.c + +diff --git a/Documentation/prctl/seccomp_filter.txt b/Documentation/prctl/seccomp_filter.txt +new file mode 100644 +index 0000000..597c3c5 +--- /dev/null ++++ b/Documentation/prctl/seccomp_filter.txt +@@ -0,0 +1,163 @@ ++ SECure COMPuting with filters ++ ============================= ++ ++Introduction ++------------ ++ ++A large number of system calls are exposed to every userland process ++with many of them going unused for the entire lifetime of the process. ++As system calls change and mature, bugs are found and eradicated. A ++certain subset of userland applications benefit by having a reduced set ++of available system calls. The resulting set reduces the total kernel ++surface exposed to the application. System call filtering is meant for ++use with those applications. ++ ++Seccomp filtering provides a means for a process to specify a filter for ++incoming system calls. The filter is expressed as a Berkeley Packet ++Filter (BPF) program, as with socket filters, except that the data ++operated on is related to the system call being made: system call ++number and the system call arguments. This allows for expressive ++filtering of system calls using a filter program language with a long ++history of being exposed to userland and a straightforward data set. ++ ++Additionally, BPF makes it impossible for users of seccomp to fall prey ++to time-of-check-time-of-use (TOCTOU) attacks that are common in system ++call interposition frameworks. BPF programs may not dereference ++pointers which constrains all filters to solely evaluating the system ++call arguments directly. ++ ++What it isn't ++------------- ++ ++System call filtering isn't a sandbox. It provides a clearly defined ++mechanism for minimizing the exposed kernel surface. It is meant to be ++a tool for sandbox developers to use. Beyond that, policy for logical ++behavior and information flow should be managed with a combination of ++other system hardening techniques and, potentially, an LSM of your ++choosing. Expressive, dynamic filters provide further options down this ++path (avoiding pathological sizes or selecting which of the multiplexed ++system calls in socketcall() is allowed, for instance) which could be ++construed, incorrectly, as a more complete sandboxing solution. ++ ++Usage ++----- ++ ++An additional seccomp mode is added and is enabled using the same ++prctl(2) call as the strict seccomp. If the architecture has ++CONFIG_HAVE_ARCH_SECCOMP_FILTER, then filters may be added as below: ++ ++PR_SET_SECCOMP: ++ Now takes an additional argument which specifies a new filter ++ using a BPF program. ++ The BPF program will be executed over struct seccomp_data ++ reflecting the system call number, arguments, and other ++ metadata. The BPF program must then return one of the ++ acceptable values to inform the kernel which action should be ++ taken. ++ ++ Usage: ++ prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, prog); ++ ++ The 'prog' argument is a pointer to a struct sock_fprog which ++ will contain the filter program. If the program is invalid, the ++ call will return -1 and set errno to EINVAL. ++ ++ If fork/clone and execve are allowed by @prog, any child ++ processes will be constrained to the same filters and system ++ call ABI as the parent. ++ ++ Prior to use, the task must call prctl(PR_SET_NO_NEW_PRIVS, 1) or ++ run with CAP_SYS_ADMIN privileges in its namespace. If these are not ++ true, -EACCES will be returned. This requirement ensures that filter ++ programs cannot be applied to child processes with greater privileges ++ than the task that installed them. ++ ++ Additionally, if prctl(2) is allowed by the attached filter, ++ additional filters may be layered on which will increase evaluation ++ time, but allow for further decreasing the attack surface during ++ execution of a process. ++ ++The above call returns 0 on success and non-zero on error. ++ ++Return values ++------------- ++A seccomp filter may return any of the following values. If multiple ++filters exist, the return value for the evaluation of a given system ++call will always use the highest precedent value. (For example, ++SECCOMP_RET_KILL will always take precedence.) ++ ++In precedence order, they are: ++ ++SECCOMP_RET_KILL: ++ Results in the task exiting immediately without executing the ++ system call. The exit status of the task (status & 0x7f) will ++ be SIGSYS, not SIGKILL. ++ ++SECCOMP_RET_TRAP: ++ Results in the kernel sending a SIGSYS signal to the triggering ++ task without executing the system call. The kernel will ++ rollback the register state to just before the system call ++ entry such that a signal handler in the task will be able to ++ inspect the ucontext_t->uc_mcontext registers and emulate ++ system call success or failure upon return from the signal ++ handler. ++ ++ The SECCOMP_RET_DATA portion of the return value will be passed ++ as si_errno. ++ ++ SIGSYS triggered by seccomp will have a si_code of SYS_SECCOMP. ++ ++SECCOMP_RET_ERRNO: ++ Results in the lower 16-bits of the return value being passed ++ to userland as the errno without executing the system call. ++ ++SECCOMP_RET_TRACE: ++ When returned, this value will cause the kernel to attempt to ++ notify a ptrace()-based tracer prior to executing the system ++ call. If there is no tracer present, -ENOSYS is returned to ++ userland and the system call is not executed. ++ ++ A tracer will be notified if it requests PTRACE_O_TRACESECCOMP ++ using ptrace(PTRACE_SETOPTIONS). The tracer will be notified ++ of a PTRACE_EVENT_SECCOMP and the SECCOMP_RET_DATA portion of ++ the BPF program return value will be available to the tracer ++ via PTRACE_GETEVENTMSG. ++ ++SECCOMP_RET_ALLOW: ++ Results in the system call being executed. ++ ++If multiple filters exist, the return value for the evaluation of a ++given system call will always use the highest precedent value. ++ ++Precedence is only determined using the SECCOMP_RET_ACTION mask. When ++multiple filters return values of the same precedence, only the ++SECCOMP_RET_DATA from the most recently installed filter will be ++returned. ++ ++Pitfalls ++-------- ++ ++The biggest pitfall to avoid during use is filtering on system call ++number without checking the architecture value. Why? On any ++architecture that supports multiple system call invocation conventions, ++the system call numbers may vary based on the specific invocation. If ++the numbers in the different calling conventions overlap, then checks in ++the filters may be abused. Always check the arch value! ++ ++Example ++------- ++ ++The samples/seccomp/ directory contains both an x86-specific example ++and a more generic example of a higher level macro interface for BPF ++program generation. ++ ++ ++ ++Adding architecture support ++----------------------- ++ ++See arch/Kconfig for the authoritative requirements. In general, if an ++architecture supports both ptrace_event and seccomp, it will be able to ++support seccomp filter with minor fixup: SIGSYS support and seccomp return ++value checking. Then it must just add CONFIG_HAVE_ARCH_SECCOMP_FILTER ++to its arch-specific Kconfig. +diff --git a/samples/Makefile b/samples/Makefile +index 2f75851..5ef08bb 100644 +--- a/samples/Makefile ++++ b/samples/Makefile +@@ -1,4 +1,4 @@ + # Makefile for Linux samples code + + obj-$(CONFIG_SAMPLES) += kobject/ kprobes/ tracepoints/ trace_events/ \ +- hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/ ++ hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/ seccomp/ +diff --git a/samples/seccomp/Makefile b/samples/seccomp/Makefile +new file mode 100644 +index 0000000..e8fe0f5 +--- /dev/null ++++ b/samples/seccomp/Makefile +@@ -0,0 +1,38 @@ ++# kbuild trick to avoid linker error. Can be omitted if a module is built. ++obj- := dummy.o ++ ++hostprogs-$(CONFIG_SECCOMP) := bpf-fancy dropper ++bpf-fancy-objs := bpf-fancy.o bpf-helper.o ++ ++HOSTCFLAGS_bpf-fancy.o += -I$(objtree)/usr/include ++HOSTCFLAGS_bpf-fancy.o += -idirafter $(objtree)/include ++HOSTCFLAGS_bpf-helper.o += -I$(objtree)/usr/include ++HOSTCFLAGS_bpf-helper.o += -idirafter $(objtree)/include ++ ++HOSTCFLAGS_dropper.o += -I$(objtree)/usr/include ++HOSTCFLAGS_dropper.o += -idirafter $(objtree)/include ++dropper-objs := dropper.o ++ ++# bpf-direct.c is x86-only. ++ifeq ($(SRCARCH),x86) ++# List of programs to build ++hostprogs-$(CONFIG_SECCOMP) += bpf-direct ++bpf-direct-objs := bpf-direct.o ++endif ++ ++HOSTCFLAGS_bpf-direct.o += -I$(objtree)/usr/include ++HOSTCFLAGS_bpf-direct.o += -idirafter $(objtree)/include ++ ++# Try to match the kernel target. ++ifeq ($(CONFIG_64BIT),) ++HOSTCFLAGS_bpf-direct.o += -m32 ++HOSTCFLAGS_dropper.o += -m32 ++HOSTCFLAGS_bpf-helper.o += -m32 ++HOSTCFLAGS_bpf-fancy.o += -m32 ++HOSTLOADLIBES_bpf-direct += -m32 ++HOSTLOADLIBES_bpf-fancy += -m32 ++HOSTLOADLIBES_dropper += -m32 ++endif ++ ++# Tell kbuild to always build the programs ++always := $(hostprogs-y) +diff --git a/samples/seccomp/bpf-direct.c b/samples/seccomp/bpf-direct.c +new file mode 100644 +index 0000000..26f523e +--- /dev/null ++++ b/samples/seccomp/bpf-direct.c +@@ -0,0 +1,176 @@ ++/* ++ * Seccomp filter example for x86 (32-bit and 64-bit) with BPF macros ++ * ++ * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org> ++ * Author: Will Drewry <wad@chromium.org> ++ * ++ * The code may be used by anyone for any purpose, ++ * and can serve as a starting point for developing ++ * applications using prctl(PR_SET_SECCOMP, 2, ...). ++ */ ++#define __USE_GNU 1 ++#define _GNU_SOURCE 1 ++ ++#include <linux/types.h> ++#include <linux/filter.h> ++#include <linux/seccomp.h> ++#include <linux/unistd.h> ++#include <signal.h> ++#include <stdio.h> ++#include <stddef.h> ++#include <string.h> ++#include <sys/prctl.h> ++#include <unistd.h> ++ ++#define syscall_arg(_n) (offsetof(struct seccomp_data, args[_n])) ++#define syscall_nr (offsetof(struct seccomp_data, nr)) ++ ++#if defined(__i386__) ++#define REG_RESULT REG_EAX ++#define REG_SYSCALL REG_EAX ++#define REG_ARG0 REG_EBX ++#define REG_ARG1 REG_ECX ++#define REG_ARG2 REG_EDX ++#define REG_ARG3 REG_ESI ++#define REG_ARG4 REG_EDI ++#define REG_ARG5 REG_EBP ++#elif defined(__x86_64__) ++#define REG_RESULT REG_RAX ++#define REG_SYSCALL REG_RAX ++#define REG_ARG0 REG_RDI ++#define REG_ARG1 REG_RSI ++#define REG_ARG2 REG_RDX ++#define REG_ARG3 REG_R10 ++#define REG_ARG4 REG_R8 ++#define REG_ARG5 REG_R9 ++#else ++#error Unsupported platform ++#endif ++ ++#ifndef PR_SET_NO_NEW_PRIVS ++#define PR_SET_NO_NEW_PRIVS 38 ++#endif ++ ++#ifndef SYS_SECCOMP ++#define SYS_SECCOMP 1 ++#endif ++ ++static void emulator(int nr, siginfo_t *info, void *void_context) ++{ ++ ucontext_t *ctx = (ucontext_t *)(void_context); ++ int syscall; ++ char *buf; ++ ssize_t bytes; ++ size_t len; ++ if (info->si_code != SYS_SECCOMP) ++ return; ++ if (!ctx) ++ return; ++ syscall = ctx->uc_mcontext.gregs[REG_SYSCALL]; ++ buf = (char *) ctx->uc_mcontext.gregs[REG_ARG1]; ++ len = (size_t) ctx->uc_mcontext.gregs[REG_ARG2]; ++ ++ if (syscall != __NR_write) ++ return; ++ if (ctx->uc_mcontext.gregs[REG_ARG0] != STDERR_FILENO) ++ return; ++ /* Redirect stderr messages to stdout. Doesn't handle EINTR, etc */ ++ ctx->uc_mcontext.gregs[REG_RESULT] = -1; ++ if (write(STDOUT_FILENO, "[ERR] ", 6) > 0) { ++ bytes = write(STDOUT_FILENO, buf, len); ++ ctx->uc_mcontext.gregs[REG_RESULT] = bytes; ++ } ++ return; ++} ++ ++static int install_emulator(void) ++{ ++ struct sigaction act; ++ sigset_t mask; ++ memset(&act, 0, sizeof(act)); ++ sigemptyset(&mask); ++ sigaddset(&mask, SIGSYS); ++ ++ act.sa_sigaction = &emulator; ++ act.sa_flags = SA_SIGINFO; ++ if (sigaction(SIGSYS, &act, NULL) < 0) { ++ perror("sigaction"); ++ return -1; ++ } ++ if (sigprocmask(SIG_UNBLOCK, &mask, NULL)) { ++ perror("sigprocmask"); ++ return -1; ++ } ++ return 0; ++} ++ ++static int install_filter(void) ++{ ++ struct sock_filter filter[] = { ++ /* Grab the system call number */ ++ BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_nr), ++ /* Jump table for the allowed syscalls */ ++ BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_rt_sigreturn, 0, 1), ++ BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW), ++#ifdef __NR_sigreturn ++ BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_sigreturn, 0, 1), ++ BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW), ++#endif ++ BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit_group, 0, 1), ++ BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW), ++ BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit, 0, 1), ++ BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW), ++ BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_read, 1, 0), ++ BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_write, 3, 2), ++ ++ /* Check that read is only using stdin. */ ++ BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_arg(0)), ++ BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDIN_FILENO, 4, 0), ++ BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL), ++ ++ /* Check that write is only using stdout */ ++ BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_arg(0)), ++ BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDOUT_FILENO, 1, 0), ++ /* Trap attempts to write to stderr */ ++ BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDERR_FILENO, 1, 2), ++ ++ BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW), ++ BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_TRAP), ++ BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL), ++ }; ++ struct sock_fprog prog = { ++ .len = (unsigned short)(sizeof(filter)/sizeof(filter[0])), ++ .filter = filter, ++ }; ++ ++ if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) { ++ perror("prctl(NO_NEW_PRIVS)"); ++ return 1; ++ } ++ ++ ++ if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog)) { ++ perror("prctl"); ++ return 1; ++ } ++ return 0; ++} ++ ++#define payload(_c) (_c), sizeof((_c)) ++int main(int argc, char **argv) ++{ ++ char buf[4096]; ++ ssize_t bytes = 0; ++ if (install_emulator()) ++ return 1; ++ if (install_filter()) ++ return 1; ++ syscall(__NR_write, STDOUT_FILENO, ++ payload("OHAI! WHAT IS YOUR NAME? ")); ++ bytes = syscall(__NR_read, STDIN_FILENO, buf, sizeof(buf)); ++ syscall(__NR_write, STDOUT_FILENO, payload("HELLO, ")); ++ syscall(__NR_write, STDOUT_FILENO, buf, bytes); ++ syscall(__NR_write, STDERR_FILENO, ++ payload("Error message going to STDERR\n")); ++ return 0; ++} +diff --git a/samples/seccomp/bpf-fancy.c b/samples/seccomp/bpf-fancy.c +new file mode 100644 +index 0000000..8eb483a +--- /dev/null ++++ b/samples/seccomp/bpf-fancy.c +@@ -0,0 +1,102 @@ ++/* ++ * Seccomp BPF example using a macro-based generator. ++ * ++ * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org> ++ * Author: Will Drewry <wad@chromium.org> ++ * ++ * The code may be used by anyone for any purpose, ++ * and can serve as a starting point for developing ++ * applications using prctl(PR_ATTACH_SECCOMP_FILTER). ++ */ ++ ++#include <linux/filter.h> ++#include <linux/seccomp.h> ++#include <linux/unistd.h> ++#include <stdio.h> ++#include <string.h> ++#include <sys/prctl.h> ++#include <unistd.h> ++ ++#include "bpf-helper.h" ++ ++#ifndef PR_SET_NO_NEW_PRIVS ++#define PR_SET_NO_NEW_PRIVS 38 ++#endif ++ ++int main(int argc, char **argv) ++{ ++ struct bpf_labels l; ++ static const char msg1[] = "Please type something: "; ++ static const char msg2[] = "You typed: "; ++ char buf[256]; ++ struct sock_filter filter[] = { ++ /* TODO: LOAD_SYSCALL_NR(arch) and enforce an arch */ ++ LOAD_SYSCALL_NR, ++ SYSCALL(__NR_exit, ALLOW), ++ SYSCALL(__NR_exit_group, ALLOW), ++ SYSCALL(__NR_write, JUMP(&l, write_fd)), ++ SYSCALL(__NR_read, JUMP(&l, read)), ++ DENY, /* Don't passthrough into a label */ ++ ++ LABEL(&l, read), ++ ARG(0), ++ JNE(STDIN_FILENO, DENY), ++ ARG(1), ++ JNE((unsigned long)buf, DENY), ++ ARG(2), ++ JGE(sizeof(buf), DENY), ++ ALLOW, ++ ++ LABEL(&l, write_fd), ++ ARG(0), ++ JEQ(STDOUT_FILENO, JUMP(&l, write_buf)), ++ JEQ(STDERR_FILENO, JUMP(&l, write_buf)), ++ DENY, ++ ++ LABEL(&l, write_buf), ++ ARG(1), ++ JEQ((unsigned long)msg1, JUMP(&l, msg1_len)), ++ JEQ((unsigned long)msg2, JUMP(&l, msg2_len)), ++ JEQ((unsigned long)buf, JUMP(&l, buf_len)), ++ DENY, ++ ++ LABEL(&l, msg1_len), ++ ARG(2), ++ JLT(sizeof(msg1), ALLOW), ++ DENY, ++ ++ LABEL(&l, msg2_len), ++ ARG(2), ++ JLT(sizeof(msg2), ALLOW), ++ DENY, ++ ++ LABEL(&l, buf_len), ++ ARG(2), ++ JLT(sizeof(buf), ALLOW), ++ DENY, ++ }; ++ struct sock_fprog prog = { ++ .filter = filter, ++ .len = (unsigned short)(sizeof(filter)/sizeof(filter[0])), ++ }; ++ ssize_t bytes; ++ bpf_resolve_jumps(&l, filter, sizeof(filter)/sizeof(*filter)); ++ ++ if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) { ++ perror("prctl(NO_NEW_PRIVS)"); ++ return 1; ++ } ++ ++ if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog)) { ++ perror("prctl(SECCOMP)"); ++ return 1; ++ } ++ syscall(__NR_write, STDOUT_FILENO, msg1, strlen(msg1)); ++ bytes = syscall(__NR_read, STDIN_FILENO, buf, sizeof(buf)-1); ++ bytes = (bytes > 0 ? bytes : 0); ++ syscall(__NR_write, STDERR_FILENO, msg2, strlen(msg2)); ++ syscall(__NR_write, STDERR_FILENO, buf, bytes); ++ /* Now get killed */ ++ syscall(__NR_write, STDERR_FILENO, msg2, strlen(msg2)+2); ++ return 0; ++} +diff --git a/samples/seccomp/bpf-helper.c b/samples/seccomp/bpf-helper.c +new file mode 100644 +index 0000000..579cfe3 +--- /dev/null ++++ b/samples/seccomp/bpf-helper.c +@@ -0,0 +1,89 @@ ++/* ++ * Seccomp BPF helper functions ++ * ++ * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org> ++ * Author: Will Drewry <wad@chromium.org> ++ * ++ * The code may be used by anyone for any purpose, ++ * and can serve as a starting point for developing ++ * applications using prctl(PR_ATTACH_SECCOMP_FILTER). ++ */ ++ ++#include <stdio.h> ++#include <string.h> ++ ++#include "bpf-helper.h" ++ ++int bpf_resolve_jumps(struct bpf_labels *labels, ++ struct sock_filter *filter, size_t count) ++{ ++ struct sock_filter *begin = filter; ++ __u8 insn = count - 1; ++ ++ if (count < 1) ++ return -1; ++ /* ++ * Walk it once, backwards, to build the label table and do fixups. ++ * Since backward jumps are disallowed by BPF, this is easy. ++ */ ++ filter += insn; ++ for (; filter >= begin; --insn, --filter) { ++ if (filter->code != (BPF_JMP+BPF_JA)) ++ continue; ++ switch ((filter->jt<<8)|filter->jf) { ++ case (JUMP_JT<<8)|JUMP_JF: ++ if (labels->labels[filter->k].location == 0xffffffff) { ++ fprintf(stderr, "Unresolved label: '%s'\n", ++ labels->labels[filter->k].label); ++ return 1; ++ } ++ filter->k = labels->labels[filter->k].location - ++ (insn + 1); ++ filter->jt = 0; ++ filter->jf = 0; ++ continue; ++ case (LABEL_JT<<8)|LABEL_JF: ++ if (labels->labels[filter->k].location != 0xffffffff) { ++ fprintf(stderr, "Duplicate label use: '%s'\n", ++ labels->labels[filter->k].label); ++ return 1; ++ } ++ labels->labels[filter->k].location = insn; ++ filter->k = 0; /* fall through */ ++ filter->jt = 0; ++ filter->jf = 0; ++ continue; ++ } ++ } ++ return 0; ++} ++ ++/* Simple lookup table for labels. */ ++__u32 seccomp_bpf_label(struct bpf_labels *labels, const char *label) ++{ ++ struct __bpf_label *begin = labels->labels, *end; ++ int id; ++ if (labels->count == 0) { ++ begin->label = label; ++ begin->location = 0xffffffff; ++ labels->count++; ++ return 0; ++ } ++ end = begin + labels->count; ++ for (id = 0; begin < end; ++begin, ++id) { ++ if (!strcmp(label, begin->label)) ++ return id; ++ } ++ begin->label = label; ++ begin->location = 0xffffffff; ++ labels->count++; ++ return id; ++} ++ ++void seccomp_bpf_print(struct sock_filter *filter, size_t count) ++{ ++ struct sock_filter *end = filter + count; ++ for ( ; filter < end; ++filter) ++ printf("{ code=%u,jt=%u,jf=%u,k=%u },\n", ++ filter->code, filter->jt, filter->jf, filter->k); ++} +diff --git a/samples/seccomp/bpf-helper.h b/samples/seccomp/bpf-helper.h +new file mode 100644 +index 0000000..643279d +--- /dev/null ++++ b/samples/seccomp/bpf-helper.h +@@ -0,0 +1,238 @@ ++/* ++ * Example wrapper around BPF macros. ++ * ++ * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org> ++ * Author: Will Drewry <wad@chromium.org> ++ * ++ * The code may be used by anyone for any purpose, ++ * and can serve as a starting point for developing ++ * applications using prctl(PR_SET_SECCOMP, 2, ...). ++ * ++ * No guarantees are provided with respect to the correctness ++ * or functionality of this code. ++ */ ++#ifndef __BPF_HELPER_H__ ++#define __BPF_HELPER_H__ ++ ++#include <asm/bitsperlong.h> /* for __BITS_PER_LONG */ ++#include <endian.h> ++#include <linux/filter.h> ++#include <linux/seccomp.h> /* for seccomp_data */ ++#include <linux/types.h> ++#include <linux/unistd.h> ++#include <stddef.h> ++ ++#define BPF_LABELS_MAX 256 ++struct bpf_labels { ++ int count; ++ struct __bpf_label { ++ const char *label; ++ __u32 location; ++ } labels[BPF_LABELS_MAX]; ++}; ++ ++int bpf_resolve_jumps(struct bpf_labels *labels, ++ struct sock_filter *filter, size_t count); ++__u32 seccomp_bpf_label(struct bpf_labels *labels, const char *label); ++void seccomp_bpf_print(struct sock_filter *filter, size_t count); ++ ++#define JUMP_JT 0xff ++#define JUMP_JF 0xff ++#define LABEL_JT 0xfe ++#define LABEL_JF 0xfe ++ ++#define ALLOW \ ++ BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW) ++#define DENY \ ++ BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL) ++#define JUMP(labels, label) \ ++ BPF_JUMP(BPF_JMP+BPF_JA, FIND_LABEL((labels), (label)), \ ++ JUMP_JT, JUMP_JF) ++#define LABEL(labels, label) \ ++ BPF_JUMP(BPF_JMP+BPF_JA, FIND_LABEL((labels), (label)), \ ++ LABEL_JT, LABEL_JF) ++#define SYSCALL(nr, jt) \ ++ BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (nr), 0, 1), \ ++ jt ++ ++/* Lame, but just an example */ ++#define FIND_LABEL(labels, label) seccomp_bpf_label((labels), #label) ++ ++#define EXPAND(...) __VA_ARGS__ ++/* Map all width-sensitive operations */ ++#if __BITS_PER_LONG == 32 ++ ++#define JEQ(x, jt) JEQ32(x, EXPAND(jt)) ++#define JNE(x, jt) JNE32(x, EXPAND(jt)) ++#define JGT(x, jt) JGT32(x, EXPAND(jt)) ++#define JLT(x, jt) JLT32(x, EXPAND(jt)) ++#define JGE(x, jt) JGE32(x, EXPAND(jt)) ++#define JLE(x, jt) JLE32(x, EXPAND(jt)) ++#define JA(x, jt) JA32(x, EXPAND(jt)) ++#define ARG(i) ARG_32(i) ++#define LO_ARG(idx) offsetof(struct seccomp_data, args[(idx)]) ++ ++#elif __BITS_PER_LONG == 64 ++ ++/* Ensure that we load the logically correct offset. */ ++#if __BYTE_ORDER == __LITTLE_ENDIAN ++#define ENDIAN(_lo, _hi) _lo, _hi ++#define LO_ARG(idx) offsetof(struct seccomp_data, args[(idx)]) ++#define HI_ARG(idx) offsetof(struct seccomp_data, args[(idx)]) + sizeof(__u32) ++#elif __BYTE_ORDER == __BIG_ENDIAN ++#define ENDIAN(_lo, _hi) _hi, _lo ++#define LO_ARG(idx) offsetof(struct seccomp_data, args[(idx)]) + sizeof(__u32) ++#define HI_ARG(idx) offsetof(struct seccomp_data, args[(idx)]) ++#else ++#error "Unknown endianness" ++#endif ++ ++union arg64 { ++ struct { ++ __u32 ENDIAN(lo32, hi32); ++ }; ++ __u64 u64; ++}; ++ ++#define JEQ(x, jt) \ ++ JEQ64(((union arg64){.u64 = (x)}).lo32, \ ++ ((union arg64){.u64 = (x)}).hi32, \ ++ EXPAND(jt)) ++#define JGT(x, jt) \ ++ JGT64(((union arg64){.u64 = (x)}).lo32, \ ++ ((union arg64){.u64 = (x)}).hi32, \ ++ EXPAND(jt)) ++#define JGE(x, jt) \ ++ JGE64(((union arg64){.u64 = (x)}).lo32, \ ++ ((union arg64){.u64 = (x)}).hi32, \ ++ EXPAND(jt)) ++#define JNE(x, jt) \ ++ JNE64(((union arg64){.u64 = (x)}).lo32, \ ++ ((union arg64){.u64 = (x)}).hi32, \ ++ EXPAND(jt)) ++#define JLT(x, jt) \ ++ JLT64(((union arg64){.u64 = (x)}).lo32, \ ++ ((union arg64){.u64 = (x)}).hi32, \ ++ EXPAND(jt)) ++#define JLE(x, jt) \ ++ JLE64(((union arg64){.u64 = (x)}).lo32, \ ++ ((union arg64){.u64 = (x)}).hi32, \ ++ EXPAND(jt)) ++ ++#define JA(x, jt) \ ++ JA64(((union arg64){.u64 = (x)}).lo32, \ ++ ((union arg64){.u64 = (x)}).hi32, \ ++ EXPAND(jt)) ++#define ARG(i) ARG_64(i) ++ ++#else ++#error __BITS_PER_LONG value unusable. ++#endif ++ ++/* Loads the arg into A */ ++#define ARG_32(idx) \ ++ BPF_STMT(BPF_LD+BPF_W+BPF_ABS, LO_ARG(idx)) ++ ++/* Loads hi into A and lo in X */ ++#define ARG_64(idx) \ ++ BPF_STMT(BPF_LD+BPF_W+BPF_ABS, LO_ARG(idx)), \ ++ BPF_STMT(BPF_ST, 0), /* lo -> M[0] */ \ ++ BPF_STMT(BPF_LD+BPF_W+BPF_ABS, HI_ARG(idx)), \ ++ BPF_STMT(BPF_ST, 1) /* hi -> M[1] */ ++ ++#define JEQ32(value, jt) \ ++ BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (value), 0, 1), \ ++ jt ++ ++#define JNE32(value, jt) \ ++ BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (value), 1, 0), \ ++ jt ++ ++/* Checks the lo, then swaps to check the hi. A=lo,X=hi */ ++#define JEQ64(lo, hi, jt) \ ++ BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \ ++ BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \ ++ BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (lo), 0, 2), \ ++ BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \ ++ jt, \ ++ BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */ ++ ++#define JNE64(lo, hi, jt) \ ++ BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 5, 0), \ ++ BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \ ++ BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (lo), 2, 0), \ ++ BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \ ++ jt, \ ++ BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */ ++ ++#define JA32(value, jt) \ ++ BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (value), 0, 1), \ ++ jt ++ ++#define JA64(lo, hi, jt) \ ++ BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (hi), 3, 0), \ ++ BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \ ++ BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (lo), 0, 2), \ ++ BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \ ++ jt, \ ++ BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */ ++ ++#define JGE32(value, jt) \ ++ BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (value), 0, 1), \ ++ jt ++ ++#define JLT32(value, jt) \ ++ BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (value), 1, 0), \ ++ jt ++ ++/* Shortcut checking if hi > arg.hi. */ ++#define JGE64(lo, hi, jt) \ ++ BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (hi), 4, 0), \ ++ BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \ ++ BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \ ++ BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (lo), 0, 2), \ ++ BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \ ++ jt, \ ++ BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */ ++ ++#define JLT64(lo, hi, jt) \ ++ BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (hi), 0, 4), \ ++ BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \ ++ BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \ ++ BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (lo), 2, 0), \ ++ BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \ ++ jt, \ ++ BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */ ++ ++#define JGT32(value, jt) \ ++ BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (value), 0, 1), \ ++ jt ++ ++#define JLE32(value, jt) \ ++ BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (value), 1, 0), \ ++ jt ++ ++/* Check hi > args.hi first, then do the GE checking */ ++#define JGT64(lo, hi, jt) \ ++ BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (hi), 4, 0), \ ++ BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \ ++ BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \ ++ BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (lo), 0, 2), \ ++ BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \ ++ jt, \ ++ BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */ ++ ++#define JLE64(lo, hi, jt) \ ++ BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (hi), 6, 0), \ ++ BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 3), \ ++ BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \ ++ BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (lo), 2, 0), \ ++ BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \ ++ jt, \ ++ BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */ ++ ++#define LOAD_SYSCALL_NR \ ++ BPF_STMT(BPF_LD+BPF_W+BPF_ABS, \ ++ offsetof(struct seccomp_data, nr)) ++ ++#endif /* __BPF_HELPER_H__ */ +diff --git a/samples/seccomp/dropper.c b/samples/seccomp/dropper.c +new file mode 100644 +index 0000000..c69c347 +--- /dev/null ++++ b/samples/seccomp/dropper.c +@@ -0,0 +1,68 @@ ++/* ++ * Naive system call dropper built on seccomp_filter. ++ * ++ * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org> ++ * Author: Will Drewry <wad@chromium.org> ++ * ++ * The code may be used by anyone for any purpose, ++ * and can serve as a starting point for developing ++ * applications using prctl(PR_SET_SECCOMP, 2, ...). ++ * ++ * When run, returns the specified errno for the specified ++ * system call number against the given architecture. ++ * ++ * Run this one as root as PR_SET_NO_NEW_PRIVS is not called. ++ */ ++ ++#include <errno.h> ++#include <linux/audit.h> ++#include <linux/filter.h> ++#include <linux/seccomp.h> ++#include <linux/unistd.h> ++#include <stdio.h> ++#include <stddef.h> ++#include <stdlib.h> ++#include <sys/prctl.h> ++#include <unistd.h> ++ ++static int install_filter(int nr, int arch, int error) ++{ ++ struct sock_filter filter[] = { ++ BPF_STMT(BPF_LD+BPF_W+BPF_ABS, ++ (offsetof(struct seccomp_data, arch))), ++ BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, arch, 0, 3), ++ BPF_STMT(BPF_LD+BPF_W+BPF_ABS, ++ (offsetof(struct seccomp_data, nr))), ++ BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, nr, 0, 1), ++ BPF_STMT(BPF_RET+BPF_K, ++ SECCOMP_RET_ERRNO|(error & SECCOMP_RET_DATA)), ++ BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW), ++ }; ++ struct sock_fprog prog = { ++ .len = (unsigned short)(sizeof(filter)/sizeof(filter[0])), ++ .filter = filter, ++ }; ++ if (prctl(PR_SET_SECCOMP, 2, &prog)) { ++ perror("prctl"); ++ return 1; ++ } ++ return 0; ++} ++ ++int main(int argc, char **argv) ++{ ++ if (argc < 5) { ++ fprintf(stderr, "Usage:\n" ++ "dropper <syscall_nr> <arch> <errno> <prog> [<args>]\n" ++ "Hint: AUDIT_ARCH_I386: 0x%X\n" ++ " AUDIT_ARCH_X86_64: 0x%X\n" ++ "\n", AUDIT_ARCH_I386, AUDIT_ARCH_X86_64); ++ return 1; ++ } ++ if (install_filter(strtol(argv[1], NULL, 0), strtol(argv[2], NULL, 0), ++ strtol(argv[3], NULL, 0))) ++ return 1; ++ execv(argv[4], &argv[4]); ++ printf("Failed to execv\n"); ++ return 255; ++} +-- +1.7.9.1 + diff --git a/features/seccomp/Fix-execve-behavior-apparmor-for-PR_-GET-SET-_NO_NEW.patch b/features/seccomp/Fix-execve-behavior-apparmor-for-PR_-GET-SET-_NO_NEW.patch new file mode 100644 index 00000000..5d0e2930 --- /dev/null +++ b/features/seccomp/Fix-execve-behavior-apparmor-for-PR_-GET-SET-_NO_NEW.patch @@ -0,0 +1,106 @@ +From a7c57bb9edacc420cc99d16852621a12d112cb0f Mon Sep 17 00:00:00 2001 +From: John Johansen <john.johansen@canonical.com> +Date: Thu, 12 Apr 2012 16:47:51 -0500 +Subject: [PATCH] Fix execve behavior apparmor for PR_{GET,SET}_NO_NEW_PRIVS + +commit c29bceb3967398cf2ac8bf8edf9634fdb722df7d upstream. + +Add support for AppArmor to explicitly fail requested domain transitions +if NO_NEW_PRIVS is set and the task is not unconfined. + +Transitions from unconfined are still allowed because this always results +in a reduction of privileges. + +Acked-by: Eric Paris <eparis@redhat.com> +Signed-off-by: Will Drewry <wad@chromium.org> +Signed-off-by: John Johansen <john.johansen@canonical.com> +Signed-off-by: Andy Lutomirski <luto@amacapital.net> + +v18: new acked-by, new description +Signed-off-by: James Morris <james.l.morris@oracle.com> +Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> +--- + security/apparmor/domain.c | 39 +++++++++++++++++++++++++++++++++++---- + 1 files changed, 35 insertions(+), 4 deletions(-) + +diff --git a/security/apparmor/domain.c b/security/apparmor/domain.c +index 18c88d0..b81ea10 100644 +--- a/security/apparmor/domain.c ++++ b/security/apparmor/domain.c +@@ -360,10 +360,6 @@ int apparmor_bprm_set_creds(struct linux_binprm *bprm) + if (bprm->cred_prepared) + return 0; + +- /* XXX: no_new_privs is not usable with AppArmor yet */ +- if (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS) +- return -EPERM; +- + cxt = bprm->cred->security; + BUG_ON(!cxt); + +@@ -398,6 +394,11 @@ int apparmor_bprm_set_creds(struct linux_binprm *bprm) + new_profile = find_attach(ns, &ns->base.profiles, name); + if (!new_profile) + goto cleanup; ++ /* ++ * NOTE: Domain transitions from unconfined are allowed ++ * even when no_new_privs is set because this aways results ++ * in a further reduction of permissions. ++ */ + goto apply; + } + +@@ -459,6 +460,16 @@ int apparmor_bprm_set_creds(struct linux_binprm *bprm) + /* fail exec */ + error = -EACCES; + ++ /* ++ * Policy has specified a domain transition, if no_new_privs then ++ * fail the exec. ++ */ ++ if (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS) { ++ aa_put_profile(new_profile); ++ error = -EPERM; ++ goto cleanup; ++ } ++ + if (!new_profile) + goto audit; + +@@ -613,6 +624,14 @@ int aa_change_hat(const char *hats[], int count, u64 token, bool permtest) + const char *target = NULL, *info = NULL; + int error = 0; + ++ /* ++ * Fail explicitly requested domain transitions if no_new_privs. ++ * There is no exception for unconfined as change_hat is not ++ * available. ++ */ ++ if (current->no_new_privs) ++ return -EPERM; ++ + /* released below */ + cred = get_current_cred(); + cxt = cred->security; +@@ -754,6 +773,18 @@ int aa_change_profile(const char *ns_name, const char *hname, bool onexec, + cxt = cred->security; + profile = aa_cred_profile(cred); + ++ /* ++ * Fail explicitly requested domain transitions if no_new_privs ++ * and not unconfined. ++ * Domain transitions from unconfined are allowed even when ++ * no_new_privs is set because this aways results in a reduction ++ * of permissions. ++ */ ++ if (current->no_new_privs && !unconfined(profile)) { ++ put_cred(cred); ++ return -EPERM; ++ } ++ + if (ns_name) { + /* released below */ + ns = aa_find_namespace(profile->ns, ns_name); +-- +1.7.9.1 + diff --git a/features/seccomp/arch-x86-add-syscall_get_arch-to-syscall.h.patch b/features/seccomp/arch-x86-add-syscall_get_arch-to-syscall.h.patch new file mode 100644 index 00000000..841ccffc --- /dev/null +++ b/features/seccomp/arch-x86-add-syscall_get_arch-to-syscall.h.patch @@ -0,0 +1,85 @@ +From d581579e1974f5bd2ff3bb5b93240aa5ccf2f907 Mon Sep 17 00:00:00 2001 +From: Will Drewry <wad@chromium.org> +Date: Thu, 12 Apr 2012 16:47:56 -0500 +Subject: [PATCH] arch/x86: add syscall_get_arch to syscall.h + +commit b7456536cf9466b402b540c5588d79a4177c723a upstream. + +Add syscall_get_arch() to export the current AUDIT_ARCH_* based on system call +entry path. + +Signed-off-by: Will Drewry <wad@chromium.org> +Acked-by: Serge Hallyn <serge.hallyn@canonical.com> +Reviewed-by: H. Peter Anvin <hpa@zytor.com> +Acked-by: Eric Paris <eparis@redhat.com> +Reviewed-by: Kees Cook <keescook@chromium.org> + +v18: - update comment about x32 tasks + - rebase to v3.4-rc2 +v17: rebase and reviewed-by +v14: rebase/nochanges +v13: rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc +Signed-off-by: James Morris <james.l.morris@oracle.com> +Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> +--- + arch/x86/include/asm/syscall.h | 27 +++++++++++++++++++++++++++ + 1 files changed, 27 insertions(+), 0 deletions(-) + +diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h +index 386b786..1ace47b 100644 +--- a/arch/x86/include/asm/syscall.h ++++ b/arch/x86/include/asm/syscall.h +@@ -13,9 +13,11 @@ + #ifndef _ASM_X86_SYSCALL_H + #define _ASM_X86_SYSCALL_H + ++#include <linux/audit.h> + #include <linux/sched.h> + #include <linux/err.h> + #include <asm/asm-offsets.h> /* For NR_syscalls */ ++#include <asm/thread_info.h> /* for TS_COMPAT */ + #include <asm/unistd.h> + + extern const unsigned long sys_call_table[]; +@@ -88,6 +90,12 @@ static inline void syscall_set_arguments(struct task_struct *task, + memcpy(®s->bx + i, args, n * sizeof(args[0])); + } + ++static inline int syscall_get_arch(struct task_struct *task, ++ struct pt_regs *regs) ++{ ++ return AUDIT_ARCH_I386; ++} ++ + #else /* CONFIG_X86_64 */ + + static inline void syscall_get_arguments(struct task_struct *task, +@@ -212,6 +220,25 @@ static inline void syscall_set_arguments(struct task_struct *task, + } + } + ++static inline int syscall_get_arch(struct task_struct *task, ++ struct pt_regs *regs) ++{ ++#ifdef CONFIG_IA32_EMULATION ++ /* ++ * TS_COMPAT is set for 32-bit syscall entry and then ++ * remains set until we return to user mode. ++ * ++ * TIF_IA32 tasks should always have TS_COMPAT set at ++ * system call time. ++ * ++ * x32 tasks should be considered AUDIT_ARCH_X86_64. ++ */ ++ if (task_thread_info(task)->status & TS_COMPAT) ++ return AUDIT_ARCH_I386; ++#endif ++ /* Both x32 and x86_64 are considered "64-bit". */ ++ return AUDIT_ARCH_X86_64; ++} + #endif /* CONFIG_X86_32 */ + + #endif /* _ASM_X86_SYSCALL_H */ +-- +1.7.9.1 + diff --git a/features/seccomp/asm-syscall.h-add-syscall_get_arch.patch b/features/seccomp/asm-syscall.h-add-syscall_get_arch.patch new file mode 100644 index 00000000..d95c897d --- /dev/null +++ b/features/seccomp/asm-syscall.h-add-syscall_get_arch.patch @@ -0,0 +1,59 @@ +From 2ca6c225eacea82fd7fdcd24312c817e1e8352e4 Mon Sep 17 00:00:00 2001 +From: Will Drewry <wad@chromium.org> +Date: Thu, 12 Apr 2012 16:47:55 -0500 +Subject: [PATCH] asm/syscall.h: add syscall_get_arch + +commit 07bd18d00d5dcf84eb22f8120f47f09c3d8fe27d upstream. + +Adds a stub for a function that will return the AUDIT_ARCH_* value +appropriate to the supplied task based on the system call convention. + +For audit's use, the value can generally be hard-coded at the +audit-site. However, for other functionality not inlined into syscall +entry/exit, this makes that information available. seccomp_filter is +the first planned consumer and, as such, the comment indicates a tie to +CONFIG_HAVE_ARCH_SECCOMP_FILTER. + +Suggested-by: Roland McGrath <mcgrathr@chromium.org> +Signed-off-by: Will Drewry <wad@chromium.org> +Acked-by: Serge Hallyn <serge.hallyn@canonical.com> +Acked-by: Eric Paris <eparis@redhat.com> + +v18: comment and change reword and rebase. +v14: rebase/nochanges +v13: rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc +v12: rebase on to linux-next +v11: fixed improper return type +v10: introduced +Signed-off-by: James Morris <james.l.morris@oracle.com> +Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> +--- + include/asm-generic/syscall.h | 14 ++++++++++++++ + 1 files changed, 14 insertions(+), 0 deletions(-) + +diff --git a/include/asm-generic/syscall.h b/include/asm-generic/syscall.h +index 5c122ae..5b09392 100644 +--- a/include/asm-generic/syscall.h ++++ b/include/asm-generic/syscall.h +@@ -142,4 +142,18 @@ void syscall_set_arguments(struct task_struct *task, struct pt_regs *regs, + unsigned int i, unsigned int n, + const unsigned long *args); + ++/** ++ * syscall_get_arch - return the AUDIT_ARCH for the current system call ++ * @task: task of interest, must be in system call entry tracing ++ * @regs: task_pt_regs() of @task ++ * ++ * Returns the AUDIT_ARCH_* based on the system call convention in use. ++ * ++ * It's only valid to call this when @task is stopped on entry to a system ++ * call, due to %TIF_SYSCALL_TRACE, %TIF_SYSCALL_AUDIT, or %TIF_SECCOMP. ++ * ++ * Architectures which permit CONFIG_HAVE_ARCH_SECCOMP_FILTER must ++ * provide an implementation of this. ++ */ ++int syscall_get_arch(struct task_struct *task, struct pt_regs *regs); + #endif /* _ASM_SYSCALL_H */ +-- +1.7.9.1 + diff --git a/features/seccomp/net-compat.c-linux-filter.h-share-compat_sock_fprog.patch b/features/seccomp/net-compat.c-linux-filter.h-share-compat_sock_fprog.patch new file mode 100644 index 00000000..f186f7c9 --- /dev/null +++ b/features/seccomp/net-compat.c-linux-filter.h-share-compat_sock_fprog.patch @@ -0,0 +1,80 @@ +From 01cef9b98077e652997585d35f765b4b69e33f51 Mon Sep 17 00:00:00 2001 +From: Will Drewry <wad@chromium.org> +Date: Thu, 12 Apr 2012 16:47:53 -0500 +Subject: [PATCH] net/compat.c,linux/filter.h: share compat_sock_fprog + +commit 0c5fe1b4221c6701224c2601cf3c692e5721103e upstream. + +Any other users of bpf_*_filter that take a struct sock_fprog from +userspace will need to be able to also accept a compat_sock_fprog +if the arch supports compat calls. This change allows the existing +compat_sock_fprog be shared. + +Signed-off-by: Will Drewry <wad@chromium.org> +Acked-by: Serge Hallyn <serge.hallyn@canonical.com> +Acked-by: Eric Dumazet <eric.dumazet@gmail.com> +Acked-by: Eric Paris <eparis@redhat.com> + +v18: tasered by the apostrophe police +v14: rebase/nochanges +v13: rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc +v12: rebase on to linux-next +v11: introduction +Signed-off-by: James Morris <james.l.morris@oracle.com> +Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> +--- + include/linux/filter.h | 11 +++++++++++ + net/compat.c | 8 -------- + 2 files changed, 11 insertions(+), 8 deletions(-) + +diff --git a/include/linux/filter.h b/include/linux/filter.h +index aaa2e80..f2e5315 100644 +--- a/include/linux/filter.h ++++ b/include/linux/filter.h +@@ -10,6 +10,7 @@ + + #ifdef __KERNEL__ + #include <linux/atomic.h> ++#include <linux/compat.h> + #endif + + /* +@@ -132,6 +133,16 @@ struct sock_fprog { /* Required for SO_ATTACH_FILTER. */ + + #ifdef __KERNEL__ + ++#ifdef CONFIG_COMPAT ++/* ++ * A struct sock_filter is architecture independent. ++ */ ++struct compat_sock_fprog { ++ u16 len; ++ compat_uptr_t filter; /* struct sock_filter * */ ++}; ++#endif ++ + struct sk_buff; + struct sock; + +diff --git a/net/compat.c b/net/compat.c +index e055708..242c828 100644 +--- a/net/compat.c ++++ b/net/compat.c +@@ -328,14 +328,6 @@ void scm_detach_fds_compat(struct msghdr *kmsg, struct scm_cookie *scm) + __scm_destroy(scm); + } + +-/* +- * A struct sock_filter is architecture independent. +- */ +-struct compat_sock_fprog { +- u16 len; +- compat_uptr_t filter; /* struct sock_filter * */ +-}; +- + static int do_set_attach_filter(struct socket *sock, int level, int optname, + char __user *optval, unsigned int optlen) + { +-- +1.7.9.1 + diff --git a/features/seccomp/ptrace-seccomp-Add-PTRACE_SECCOMP-support.patch b/features/seccomp/ptrace-seccomp-Add-PTRACE_SECCOMP-support.patch new file mode 100644 index 00000000..6c0194a8 --- /dev/null +++ b/features/seccomp/ptrace-seccomp-Add-PTRACE_SECCOMP-support.patch @@ -0,0 +1,165 @@ +From 02fa56dd47cf648e30198b2dd836a45b08354db0 Mon Sep 17 00:00:00 2001 +From: Will Drewry <wad@chromium.org> +Date: Thu, 12 Apr 2012 16:48:02 -0500 +Subject: [PATCH] ptrace,seccomp: Add PTRACE_SECCOMP support + +commit fb0fadf9b213f55ca9368f3edafe51101d5d2deb upstream. + +This change adds support for a new ptrace option, PTRACE_O_TRACESECCOMP, +and a new return value for seccomp BPF programs, SECCOMP_RET_TRACE. + +When a tracer specifies the PTRACE_O_TRACESECCOMP ptrace option, the +tracer will be notified, via PTRACE_EVENT_SECCOMP, for any syscall that +results in a BPF program returning SECCOMP_RET_TRACE. The 16-bit +SECCOMP_RET_DATA mask of the BPF program return value will be passed as +the ptrace_message and may be retrieved using PTRACE_GETEVENTMSG. + +If the subordinate process is not using seccomp filter, then no +system call notifications will occur even if the option is specified. + +If there is no tracer with PTRACE_O_TRACESECCOMP when SECCOMP_RET_TRACE +is returned, the system call will not be executed and an -ENOSYS errno +will be returned to userspace. + +This change adds a dependency on the system call slow path. Any future +efforts to use the system call fast path for seccomp filter will need to +address this restriction. + +Signed-off-by: Will Drewry <wad@chromium.org> +Acked-by: Eric Paris <eparis@redhat.com> + +v18: - rebase + - comment fatal_signal check + - acked-by + - drop secure_computing_int comment +v17: - ... +v16: - update PT_TRACE_MASK to 0xbf4 so that STOP isn't clear on SETOPTIONS call (indan@nul.nu) + [note PT_TRACE_MASK disappears in linux-next] +v15: - add audit support for non-zero return codes + - clean up style (indan@nul.nu) +v14: - rebase/nochanges +v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc + (Brings back a change to ptrace.c and the masks.) +v12: - rebase to linux-next + - use ptrace_event and update arch/Kconfig to mention slow-path dependency + - drop all tracehook changes and inclusion (oleg@redhat.com) +v11: - invert the logic to just make it a PTRACE_SYSCALL accelerator + (indan@nul.nu) +v10: - moved to PTRACE_O_SECCOMP / PT_TRACE_SECCOMP +v9: - n/a +v8: - guarded PTRACE_SECCOMP use with an ifdef +v7: - introduced +Signed-off-by: James Morris <james.l.morris@oracle.com> +Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> +--- + arch/Kconfig | 10 +++++----- + include/linux/ptrace.h | 5 ++++- + include/linux/seccomp.h | 1 + + kernel/seccomp.c | 16 ++++++++++++++++ + 4 files changed, 26 insertions(+), 6 deletions(-) + +diff --git a/arch/Kconfig b/arch/Kconfig +index 66aef13..c024b3e 100644 +--- a/arch/Kconfig ++++ b/arch/Kconfig +@@ -219,15 +219,15 @@ config ARCH_WANT_OLD_COMPAT_IPC + config HAVE_ARCH_SECCOMP_FILTER + bool + help +- This symbol should be selected by an architecure if it provides: +- asm/syscall.h: ++ An arch should select this symbol if it provides all of these things: + - syscall_get_arch() + - syscall_get_arguments() + - syscall_rollback() + - syscall_set_return_value() +- SIGSYS siginfo_t support must be implemented. +- __secure_computing()/secure_computing()'s return value must be +- checked, with -1 resulting in the syscall being skipped. ++ - SIGSYS siginfo_t support ++ - secure_computing is called from a ptrace_event()-safe context ++ - secure_computing return value is checked and a return value of -1 ++ results in the system call being skipped immediately. + + config SECCOMP_FILTER + def_bool y +diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h +index 5c71962..597e4fd 100644 +--- a/include/linux/ptrace.h ++++ b/include/linux/ptrace.h +@@ -58,6 +58,7 @@ + #define PTRACE_EVENT_EXEC 4 + #define PTRACE_EVENT_VFORK_DONE 5 + #define PTRACE_EVENT_EXIT 6 ++#define PTRACE_EVENT_SECCOMP 7 + /* Extended result codes which enabled by means other than options. */ + #define PTRACE_EVENT_STOP 128 + +@@ -69,8 +70,9 @@ + #define PTRACE_O_TRACEEXEC (1 << PTRACE_EVENT_EXEC) + #define PTRACE_O_TRACEVFORKDONE (1 << PTRACE_EVENT_VFORK_DONE) + #define PTRACE_O_TRACEEXIT (1 << PTRACE_EVENT_EXIT) ++#define PTRACE_O_TRACESECCOMP (1 << PTRACE_EVENT_SECCOMP) + +-#define PTRACE_O_MASK 0x0000007f ++#define PTRACE_O_MASK 0x000000ff + + #include <asm/ptrace.h> + +@@ -98,6 +100,7 @@ + #define PT_TRACE_EXEC PT_EVENT_FLAG(PTRACE_EVENT_EXEC) + #define PT_TRACE_VFORK_DONE PT_EVENT_FLAG(PTRACE_EVENT_VFORK_DONE) + #define PT_TRACE_EXIT PT_EVENT_FLAG(PTRACE_EVENT_EXIT) ++#define PT_TRACE_SECCOMP PT_EVENT_FLAG(PTRACE_EVENT_SECCOMP) + + /* single stepping state bits (used on ARM and PA-RISC) */ + #define PT_SINGLESTEP_BIT 31 +diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h +index 317ccb7..5818e86 100644 +--- a/include/linux/seccomp.h ++++ b/include/linux/seccomp.h +@@ -21,6 +21,7 @@ + #define SECCOMP_RET_KILL 0x00000000U /* kill the task immediately */ + #define SECCOMP_RET_TRAP 0x00030000U /* disallow and force a SIGSYS */ + #define SECCOMP_RET_ERRNO 0x00050000U /* returns an errno */ ++#define SECCOMP_RET_TRACE 0x7ff00000U /* pass to a tracer or disallow */ + #define SECCOMP_RET_ALLOW 0x7fff0000U /* allow */ + + /* Masks for the return value sections. */ +diff --git a/kernel/seccomp.c b/kernel/seccomp.c +index 9c38306..d9db6ec 100644 +--- a/kernel/seccomp.c ++++ b/kernel/seccomp.c +@@ -24,6 +24,7 @@ + #ifdef CONFIG_SECCOMP_FILTER + #include <asm/syscall.h> + #include <linux/filter.h> ++#include <linux/ptrace.h> + #include <linux/security.h> + #include <linux/slab.h> + #include <linux/tracehook.h> +@@ -408,6 +409,21 @@ int __secure_computing(int this_syscall) + /* Let the filter pass back 16 bits of data. */ + seccomp_send_sigsys(this_syscall, data); + goto skip; ++ case SECCOMP_RET_TRACE: ++ /* Skip these calls if there is no tracer. */ ++ if (!ptrace_event_enabled(current, PTRACE_EVENT_SECCOMP)) ++ goto skip; ++ /* Allow the BPF to provide the event message */ ++ ptrace_event(PTRACE_EVENT_SECCOMP, data); ++ /* ++ * The delivery of a fatal signal during event ++ * notification may silently skip tracer notification. ++ * Terminating the task now avoids executing a system ++ * call that may not be intended. ++ */ ++ if (fatal_signal_pending(current)) ++ break; ++ return 0; + case SECCOMP_RET_ALLOW: + return 0; + case SECCOMP_RET_KILL: +-- +1.7.9.1 + diff --git a/features/seccomp/seccomp-Add-SECCOMP_RET_TRAP.patch b/features/seccomp/seccomp-Add-SECCOMP_RET_TRAP.patch new file mode 100644 index 00000000..31466638 --- /dev/null +++ b/features/seccomp/seccomp-Add-SECCOMP_RET_TRAP.patch @@ -0,0 +1,138 @@ +From 365829a1caa9148a289fe895280a1d2ed0e56e37 Mon Sep 17 00:00:00 2001 +From: Will Drewry <wad@chromium.org> +Date: Thu, 12 Apr 2012 16:48:01 -0500 +Subject: [PATCH] seccomp: Add SECCOMP_RET_TRAP + +commit bb6ea4301a1109afdacaee576fedbfcd7152fc86 upstream. + +Adds a new return value to seccomp filters that triggers a SIGSYS to be +delivered with the new SYS_SECCOMP si_code. + +This allows in-process system call emulation, including just specifying +an errno or cleanly dumping core, rather than just dying. + +Suggested-by: Markus Gutschke <markus@chromium.org> +Suggested-by: Julien Tinnes <jln@chromium.org> +Signed-off-by: Will Drewry <wad@chromium.org> +Acked-by: Eric Paris <eparis@redhat.com> + +v18: - acked-by, rebase + - don't mention secure_computing_int() anymore +v15: - use audit_seccomp/skip + - pad out error spacing; clean up switch (indan@nul.nu) +v14: - n/a +v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc +v12: - rebase on to linux-next +v11: - clarify the comment (indan@nul.nu) + - s/sigtrap/sigsys +v10: - use SIGSYS, syscall_get_arch, updates arch/Kconfig + note suggested-by (though original suggestion had other behaviors) +v9: - changes to SIGILL +v8: - clean up based on changes to dependent patches +v7: - introduction +Signed-off-by: James Morris <james.l.morris@oracle.com> +Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> +--- + arch/Kconfig | 14 +++++++++----- + include/asm-generic/siginfo.h | 2 +- + include/linux/seccomp.h | 1 + + kernel/seccomp.c | 26 ++++++++++++++++++++++++++ + 4 files changed, 37 insertions(+), 6 deletions(-) + +diff --git a/arch/Kconfig b/arch/Kconfig +index beaab68..66aef13 100644 +--- a/arch/Kconfig ++++ b/arch/Kconfig +@@ -219,11 +219,15 @@ config ARCH_WANT_OLD_COMPAT_IPC + config HAVE_ARCH_SECCOMP_FILTER + bool + help +- This symbol should be selected by an architecure if it provides +- asm/syscall.h, specifically syscall_get_arguments(), +- syscall_get_arch(), and syscall_set_return_value(). Additionally, +- its system call entry path must respect a return value of -1 from +- __secure_computing() and/or secure_computing(). ++ This symbol should be selected by an architecure if it provides: ++ asm/syscall.h: ++ - syscall_get_arch() ++ - syscall_get_arguments() ++ - syscall_rollback() ++ - syscall_set_return_value() ++ SIGSYS siginfo_t support must be implemented. ++ __secure_computing()/secure_computing()'s return value must be ++ checked, with -1 resulting in the syscall being skipped. + + config SECCOMP_FILTER + def_bool y +diff --git a/include/asm-generic/siginfo.h b/include/asm-generic/siginfo.h +index d2c7f29..8ed6777 100644 +--- a/include/asm-generic/siginfo.h ++++ b/include/asm-generic/siginfo.h +@@ -101,7 +101,7 @@ typedef struct siginfo { + + /* SIGSYS */ + struct { +- void __user *_call_addr; /* calling insn */ ++ void __user *_call_addr; /* calling user insn */ + int _syscall; /* triggering system call number */ + unsigned int _arch; /* AUDIT_ARCH_* of syscall */ + } _sigsys; +diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h +index b4ce2c8..317ccb7 100644 +--- a/include/linux/seccomp.h ++++ b/include/linux/seccomp.h +@@ -19,6 +19,7 @@ + * selects the least permissive choice. + */ + #define SECCOMP_RET_KILL 0x00000000U /* kill the task immediately */ ++#define SECCOMP_RET_TRAP 0x00030000U /* disallow and force a SIGSYS */ + #define SECCOMP_RET_ERRNO 0x00050000U /* returns an errno */ + #define SECCOMP_RET_ALLOW 0x7fff0000U /* allow */ + +diff --git a/kernel/seccomp.c b/kernel/seccomp.c +index 5f78fb6..9c38306 100644 +--- a/kernel/seccomp.c ++++ b/kernel/seccomp.c +@@ -332,6 +332,26 @@ void put_seccomp_filter(struct task_struct *tsk) + kfree(freeme); + } + } ++ ++/** ++ * seccomp_send_sigsys - signals the task to allow in-process syscall emulation ++ * @syscall: syscall number to send to userland ++ * @reason: filter-supplied reason code to send to userland (via si_errno) ++ * ++ * Forces a SIGSYS with a code of SYS_SECCOMP and related sigsys info. ++ */ ++static void seccomp_send_sigsys(int syscall, int reason) ++{ ++ struct siginfo info; ++ memset(&info, 0, sizeof(info)); ++ info.si_signo = SIGSYS; ++ info.si_code = SYS_SECCOMP; ++ info.si_call_addr = (void __user *)KSTK_EIP(current); ++ info.si_errno = reason; ++ info.si_arch = syscall_get_arch(current, task_pt_regs(current)); ++ info.si_syscall = syscall; ++ force_sig_info(SIGSYS, &info, current); ++} + #endif /* CONFIG_SECCOMP_FILTER */ + + /* +@@ -382,6 +402,12 @@ int __secure_computing(int this_syscall) + syscall_set_return_value(current, task_pt_regs(current), + -data, 0); + goto skip; ++ case SECCOMP_RET_TRAP: ++ /* Show the handler the original registers. */ ++ syscall_rollback(current, task_pt_regs(current)); ++ /* Let the filter pass back 16 bits of data. */ ++ seccomp_send_sigsys(this_syscall, data); ++ goto skip; + case SECCOMP_RET_ALLOW: + return 0; + case SECCOMP_RET_KILL: +-- +1.7.9.1 + diff --git a/features/seccomp/seccomp-add-SECCOMP_RET_ERRNO.patch b/features/seccomp/seccomp-add-SECCOMP_RET_ERRNO.patch new file mode 100644 index 00000000..2bb5adcf --- /dev/null +++ b/features/seccomp/seccomp-add-SECCOMP_RET_ERRNO.patch @@ -0,0 +1,202 @@ +From dbb9ea8331cefce3fe15499126a7a1d29beb5d70 Mon Sep 17 00:00:00 2001 +From: Will Drewry <wad@chromium.org> +Date: Thu, 12 Apr 2012 16:47:59 -0500 +Subject: [PATCH] seccomp: add SECCOMP_RET_ERRNO + +commit acf3b2c71ed20c53dc69826683417703c2a88059 upstream. + +This change adds the SECCOMP_RET_ERRNO as a valid return value from a +seccomp filter. Additionally, it makes the first use of the lower +16-bits for storing a filter-supplied errno. 16-bits is more than +enough for the errno-base.h calls. + +Returning errors instead of immediately terminating processes that +violate seccomp policy allow for broader use of this functionality +for kernel attack surface reduction. For example, a linux container +could maintain a whitelist of pre-existing system calls but drop +all new ones with errnos. This would keep a logically static attack +surface while providing errnos that may allow for graceful failure +without the downside of do_exit() on a bad call. + +This change also changes the signature of __secure_computing. It +appears the only direct caller is the arm entry code and it clobbers +any possible return value (register) immediately. + +Signed-off-by: Will Drewry <wad@chromium.org> +Acked-by: Serge Hallyn <serge.hallyn@canonical.com> +Reviewed-by: Kees Cook <keescook@chromium.org> +Acked-by: Eric Paris <eparis@redhat.com> + +v18: - fix up comments and rebase + - fix bad var name which was fixed in later revs + - remove _int() and just change the __secure_computing signature +v16-v17: ... +v15: - use audit_seccomp and add a skip label. (eparis@redhat.com) + - clean up and pad out return codes (indan@nul.nu) +v14: - no change/rebase +v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc +v12: - move to WARN_ON if filter is NULL + (oleg@redhat.com, luto@mit.edu, keescook@chromium.org) + - return immediately for filter==NULL (keescook@chromium.org) + - change evaluation to only compare the ACTION so that layered + errnos don't result in the lowest one being returned. + (keeschook@chromium.org) +v11: - check for NULL filter (keescook@chromium.org) +v10: - change loaders to fn + v9: - n/a + v8: - update Kconfig to note new need for syscall_set_return_value. + - reordered such that TRAP behavior follows on later. + - made the for loop a little less indent-y + v7: - introduced +Signed-off-by: James Morris <james.l.morris@oracle.com> +Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> +--- + arch/Kconfig | 6 ++++-- + include/linux/seccomp.h | 10 ++++++---- + kernel/seccomp.c | 42 ++++++++++++++++++++++++++++++++---------- + 3 files changed, 42 insertions(+), 16 deletions(-) + +diff --git a/arch/Kconfig b/arch/Kconfig +index 91c2c73..beaab68 100644 +--- a/arch/Kconfig ++++ b/arch/Kconfig +@@ -220,8 +220,10 @@ config HAVE_ARCH_SECCOMP_FILTER + bool + help + This symbol should be selected by an architecure if it provides +- asm/syscall.h, specifically syscall_get_arguments() and +- syscall_get_arch(). ++ asm/syscall.h, specifically syscall_get_arguments(), ++ syscall_get_arch(), and syscall_set_return_value(). Additionally, ++ its system call entry path must respect a return value of -1 from ++ __secure_computing() and/or secure_computing(). + + config SECCOMP_FILTER + def_bool y +diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h +index 86bb68f..b4ce2c8 100644 +--- a/include/linux/seccomp.h ++++ b/include/linux/seccomp.h +@@ -12,13 +12,14 @@ + + /* + * All BPF programs must return a 32-bit value. +- * The bottom 16-bits are reserved for future use. ++ * The bottom 16-bits are for optional return data. + * The upper 16-bits are ordered from least permissive values to most. + * + * The ordering ensures that a min_t() over composed return values always + * selects the least permissive choice. + */ + #define SECCOMP_RET_KILL 0x00000000U /* kill the task immediately */ ++#define SECCOMP_RET_ERRNO 0x00050000U /* returns an errno */ + #define SECCOMP_RET_ALLOW 0x7fff0000U /* allow */ + + /* Masks for the return value sections. */ +@@ -64,11 +65,12 @@ struct seccomp { + struct seccomp_filter *filter; + }; + +-extern void __secure_computing(int); +-static inline void secure_computing(int this_syscall) ++extern int __secure_computing(int); ++static inline int secure_computing(int this_syscall) + { + if (unlikely(test_thread_flag(TIF_SECCOMP))) +- __secure_computing(this_syscall); ++ return __secure_computing(this_syscall); ++ return 0; + } + + extern long prctl_get_seccomp(void); +diff --git a/kernel/seccomp.c b/kernel/seccomp.c +index 0f7c709..5f78fb6 100644 +--- a/kernel/seccomp.c ++++ b/kernel/seccomp.c +@@ -199,15 +199,20 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen) + static u32 seccomp_run_filters(int syscall) + { + struct seccomp_filter *f; +- u32 ret = SECCOMP_RET_KILL; ++ u32 ret = SECCOMP_RET_ALLOW; ++ ++ /* Ensure unexpected behavior doesn't result in failing open. */ ++ if (WARN_ON(current->seccomp.filter == NULL)) ++ return SECCOMP_RET_KILL; ++ + /* + * All filters in the list are evaluated and the lowest BPF return +- * value always takes priority. ++ * value always takes priority (ignoring the DATA). + */ + for (f = current->seccomp.filter; f; f = f->prev) { +- ret = sk_run_filter(NULL, f->insns); +- if (ret != SECCOMP_RET_ALLOW) +- break; ++ u32 cur_ret = sk_run_filter(NULL, f->insns); ++ if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION)) ++ ret = cur_ret; + } + return ret; + } +@@ -346,11 +351,13 @@ static int mode1_syscalls_32[] = { + }; + #endif + +-void __secure_computing(int this_syscall) ++int __secure_computing(int this_syscall) + { + int mode = current->seccomp.mode; + int exit_sig = 0; + int *syscall; ++ u32 ret = SECCOMP_RET_KILL; ++ int data; + + switch (mode) { + case SECCOMP_MODE_STRICT: +@@ -361,14 +368,26 @@ void __secure_computing(int this_syscall) + #endif + do { + if (*syscall == this_syscall) +- return; ++ return 0; + } while (*++syscall); + exit_sig = SIGKILL; + break; + #ifdef CONFIG_SECCOMP_FILTER + case SECCOMP_MODE_FILTER: +- if (seccomp_run_filters(this_syscall) == SECCOMP_RET_ALLOW) +- return; ++ ret = seccomp_run_filters(this_syscall); ++ data = ret & SECCOMP_RET_DATA; ++ switch (ret & SECCOMP_RET_ACTION) { ++ case SECCOMP_RET_ERRNO: ++ /* Set the low-order 16-bits as a errno. */ ++ syscall_set_return_value(current, task_pt_regs(current), ++ -data, 0); ++ goto skip; ++ case SECCOMP_RET_ALLOW: ++ return 0; ++ case SECCOMP_RET_KILL: ++ default: ++ break; ++ } + exit_sig = SIGSYS; + break; + #endif +@@ -379,8 +398,11 @@ void __secure_computing(int this_syscall) + #ifdef SECCOMP_DEBUG + dump_stack(); + #endif +- audit_seccomp(this_syscall, exit_code, SECCOMP_RET_KILL); ++ audit_seccomp(this_syscall, exit_sig, ret); + do_exit(exit_sig); ++skip: ++ audit_seccomp(this_syscall, exit_sig, ret); ++ return -1; + } + + long prctl_get_seccomp(void) +-- +1.7.9.1 + diff --git a/features/seccomp/seccomp-add-system-call-filtering-using-BPF.patch b/features/seccomp/seccomp-add-system-call-filtering-using-BPF.patch new file mode 100644 index 00000000..908f3cfd --- /dev/null +++ b/features/seccomp/seccomp-add-system-call-filtering-using-BPF.patch @@ -0,0 +1,820 @@ +From 01c9617a2eca38f68d917ae16bdf8c2c8d863c8e Mon Sep 17 00:00:00 2001 +From: Will Drewry <wad@chromium.org> +Date: Thu, 12 Apr 2012 16:47:57 -0500 +Subject: [PATCH] seccomp: add system call filtering using BPF + +commit e2cfabdfd075648216f99c2c03821cf3f47c1727 upstream. + +[This patch depends on luto@mit.edu's no_new_privs patch: + https://lkml.org/lkml/2012/1/30/264 + The whole series including Andrew's patches can be found here: + https://github.com/redpig/linux/tree/seccomp + Complete diff here: + https://github.com/redpig/linux/compare/1dc65fed...seccomp +] + +This patch adds support for seccomp mode 2. Mode 2 introduces the +ability for unprivileged processes to install system call filtering +policy expressed in terms of a Berkeley Packet Filter (BPF) program. +This program will be evaluated in the kernel for each system call +the task makes and computes a result based on data in the format +of struct seccomp_data. + +A filter program may be installed by calling: + struct sock_fprog fprog = { ... }; + ... + prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog); + +The return value of the filter program determines if the system call is +allowed to proceed or denied. If the first filter program installed +allows prctl(2) calls, then the above call may be made repeatedly +by a task to further reduce its access to the kernel. All attached +programs must be evaluated before a system call will be allowed to +proceed. + +Filter programs will be inherited across fork/clone and execve. +However, if the task attaching the filter is unprivileged +(!CAP_SYS_ADMIN) the no_new_privs bit will be set on the task. This +ensures that unprivileged tasks cannot attach filters that affect +privileged tasks (e.g., setuid binary). + +There are a number of benefits to this approach. A few of which are +as follows: +- BPF has been exposed to userland for a long time +- BPF optimization (and JIT'ing) are well understood +- Userland already knows its ABI: system call numbers and desired + arguments +- No time-of-check-time-of-use vulnerable data accesses are possible. +- system call arguments are loaded on access only to minimize copying + required for system call policy decisions. + +Mode 2 support is restricted to architectures that enable +HAVE_ARCH_SECCOMP_FILTER. In this patch, the primary dependency is on +syscall_get_arguments(). The full desired scope of this feature will +add a few minor additional requirements expressed later in this series. +Based on discussion, SECCOMP_RET_ERRNO and SECCOMP_RET_TRACE seem to be +the desired additional functionality. + +No architectures are enabled in this patch. + +Signed-off-by: Will Drewry <wad@chromium.org> +Acked-by: Serge Hallyn <serge.hallyn@canonical.com> +Reviewed-by: Indan Zupancic <indan@nul.nu> +Acked-by: Eric Paris <eparis@redhat.com> +Reviewed-by: Kees Cook <keescook@chromium.org> + +v18: - rebase to v3.4-rc2 + - s/chk/check/ (akpm@linux-foundation.org,jmorris@namei.org) + - allocate with GFP_KERNEL|__GFP_NOWARN (indan@nul.nu) + - add a comment for get_u32 regarding endianness (akpm@) + - fix other typos, style mistakes (akpm@) + - added acked-by +v17: - properly guard seccomp filter needed headers (leann@ubuntu.com) + - tighten return mask to 0x7fff0000 +v16: - no change +v15: - add a 4 instr penalty when counting a path to account for seccomp_filter + size (indan@nul.nu) + - drop the max insns to 256KB (indan@nul.nu) + - return ENOMEM if the max insns limit has been hit (indan@nul.nu) + - move IP checks after args (indan@nul.nu) + - drop !user_filter check (indan@nul.nu) + - only allow explicit bpf codes (indan@nul.nu) + - exit_code -> exit_sig +v14: - put/get_seccomp_filter takes struct task_struct + (indan@nul.nu,keescook@chromium.org) + - adds seccomp_chk_filter and drops general bpf_run/chk_filter user + - add seccomp_bpf_load for use by net/core/filter.c + - lower max per-process/per-hierarchy: 1MB + - moved nnp/capability check prior to allocation + (all of the above: indan@nul.nu) +v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc +v12: - added a maximum instruction count per path (indan@nul.nu,oleg@redhat.com) + - removed copy_seccomp (keescook@chromium.org,indan@nul.nu) + - reworded the prctl_set_seccomp comment (indan@nul.nu) +v11: - reorder struct seccomp_data to allow future args expansion (hpa@zytor.com) + - style clean up, @compat dropped, compat_sock_fprog32 (indan@nul.nu) + - do_exit(SIGSYS) (keescook@chromium.org, luto@mit.edu) + - pare down Kconfig doc reference. + - extra comment clean up +v10: - seccomp_data has changed again to be more aesthetically pleasing + (hpa@zytor.com) + - calling convention is noted in a new u32 field using syscall_get_arch. + This allows for cross-calling convention tasks to use seccomp filters. + (hpa@zytor.com) + - lots of clean up (thanks, Indan!) + v9: - n/a + v8: - use bpf_chk_filter, bpf_run_filter. update load_fns + - Lots of fixes courtesy of indan@nul.nu: + -- fix up load behavior, compat fixups, and merge alloc code, + -- renamed pc and dropped __packed, use bool compat. + -- Added a hidden CONFIG_SECCOMP_FILTER to synthesize non-arch + dependencies + v7: (massive overhaul thanks to Indan, others) + - added CONFIG_HAVE_ARCH_SECCOMP_FILTER + - merged into seccomp.c + - minimal seccomp_filter.h + - no config option (part of seccomp) + - no new prctl + - doesn't break seccomp on systems without asm/syscall.h + (works but arg access always fails) + - dropped seccomp_init_task, extra free functions, ... + - dropped the no-asm/syscall.h code paths + - merges with network sk_run_filter and sk_chk_filter + v6: - fix memory leak on attach compat check failure + - require no_new_privs || CAP_SYS_ADMIN prior to filter + installation. (luto@mit.edu) + - s/seccomp_struct_/seccomp_/ for macros/functions (amwang@redhat.com) + - cleaned up Kconfig (amwang@redhat.com) + - on block, note if the call was compat (so the # means something) + v5: - uses syscall_get_arguments + (indan@nul.nu,oleg@redhat.com, mcgrathr@chromium.org) + - uses union-based arg storage with hi/lo struct to + handle endianness. Compromises between the two alternate + proposals to minimize extra arg shuffling and account for + endianness assuming userspace uses offsetof(). + (mcgrathr@chromium.org, indan@nul.nu) + - update Kconfig description + - add include/seccomp_filter.h and add its installation + - (naive) on-demand syscall argument loading + - drop seccomp_t (eparis@redhat.com) + v4: - adjusted prctl to make room for PR_[SG]ET_NO_NEW_PRIVS + - now uses current->no_new_privs + (luto@mit.edu,torvalds@linux-foundation.com) + - assign names to seccomp modes (rdunlap@xenotime.net) + - fix style issues (rdunlap@xenotime.net) + - reworded Kconfig entry (rdunlap@xenotime.net) + v3: - macros to inline (oleg@redhat.com) + - init_task behavior fixed (oleg@redhat.com) + - drop creator entry and extra NULL check (oleg@redhat.com) + - alloc returns -EINVAL on bad sizing (serge.hallyn@canonical.com) + - adds tentative use of "always_unprivileged" as per + torvalds@linux-foundation.org and luto@mit.edu + v2: - (patch 2 only) +Signed-off-by: James Morris <james.l.morris@oracle.com> +Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> +--- + arch/Kconfig | 17 ++ + include/linux/Kbuild | 1 + + include/linux/seccomp.h | 76 +++++++++- + kernel/fork.c | 3 + + kernel/seccomp.c | 396 ++++++++++++++++++++++++++++++++++++++++++++-- + kernel/sys.c | 2 +- + 6 files changed, 472 insertions(+), 23 deletions(-) + +diff --git a/arch/Kconfig b/arch/Kconfig +index 684eb5a..91c2c73 100644 +--- a/arch/Kconfig ++++ b/arch/Kconfig +@@ -216,4 +216,21 @@ config HAVE_CMPXCHG_DOUBLE + config ARCH_WANT_OLD_COMPAT_IPC + bool + ++config HAVE_ARCH_SECCOMP_FILTER ++ bool ++ help ++ This symbol should be selected by an architecure if it provides ++ asm/syscall.h, specifically syscall_get_arguments() and ++ syscall_get_arch(). ++ ++config SECCOMP_FILTER ++ def_bool y ++ depends on HAVE_ARCH_SECCOMP_FILTER && SECCOMP && NET ++ help ++ Enable tasks to build secure computing environments defined ++ in terms of Berkeley Packet Filter programs which implement ++ task-defined system call filtering polices. ++ ++ See Documentation/prctl/seccomp_filter.txt for details. ++ + source "kernel/gcov/Kconfig" +diff --git a/include/linux/Kbuild b/include/linux/Kbuild +index 50f55c7..bc82495 100644 +--- a/include/linux/Kbuild ++++ b/include/linux/Kbuild +@@ -333,6 +333,7 @@ header-y += scc.h + header-y += sched.h + header-y += screen_info.h + header-y += sdla.h ++header-y += seccomp.h + header-y += securebits.h + header-y += selinux_netlink.h + header-y += sem.h +diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h +index d61f27f..86bb68f 100644 +--- a/include/linux/seccomp.h ++++ b/include/linux/seccomp.h +@@ -1,14 +1,67 @@ + #ifndef _LINUX_SECCOMP_H + #define _LINUX_SECCOMP_H + ++#include <linux/compiler.h> ++#include <linux/types.h> ++ ++ ++/* Valid values for seccomp.mode and prctl(PR_SET_SECCOMP, <mode>) */ ++#define SECCOMP_MODE_DISABLED 0 /* seccomp is not in use. */ ++#define SECCOMP_MODE_STRICT 1 /* uses hard-coded filter. */ ++#define SECCOMP_MODE_FILTER 2 /* uses user-supplied filter. */ ++ ++/* ++ * All BPF programs must return a 32-bit value. ++ * The bottom 16-bits are reserved for future use. ++ * The upper 16-bits are ordered from least permissive values to most. ++ * ++ * The ordering ensures that a min_t() over composed return values always ++ * selects the least permissive choice. ++ */ ++#define SECCOMP_RET_KILL 0x00000000U /* kill the task immediately */ ++#define SECCOMP_RET_ALLOW 0x7fff0000U /* allow */ ++ ++/* Masks for the return value sections. */ ++#define SECCOMP_RET_ACTION 0x7fff0000U ++#define SECCOMP_RET_DATA 0x0000ffffU ++ ++/** ++ * struct seccomp_data - the format the BPF program executes over. ++ * @nr: the system call number ++ * @arch: indicates system call convention as an AUDIT_ARCH_* value ++ * as defined in <linux/audit.h>. ++ * @instruction_pointer: at the time of the system call. ++ * @args: up to 6 system call arguments always stored as 64-bit values ++ * regardless of the architecture. ++ */ ++struct seccomp_data { ++ int nr; ++ __u32 arch; ++ __u64 instruction_pointer; ++ __u64 args[6]; ++}; + ++#ifdef __KERNEL__ + #ifdef CONFIG_SECCOMP + + #include <linux/thread_info.h> + #include <asm/seccomp.h> + ++struct seccomp_filter; ++/** ++ * struct seccomp - the state of a seccomp'ed process ++ * ++ * @mode: indicates one of the valid values above for controlled ++ * system calls available to a process. ++ * @filter: The metadata and ruleset for determining what system calls ++ * are allowed for a task. ++ * ++ * @filter must only be accessed from the context of current as there ++ * is no locking. ++ */ + struct seccomp { + int mode; ++ struct seccomp_filter *filter; + }; + + extern void __secure_computing(int); +@@ -19,7 +72,7 @@ static inline void secure_computing(int this_syscall) + } + + extern long prctl_get_seccomp(void); +-extern long prctl_set_seccomp(unsigned long); ++extern long prctl_set_seccomp(unsigned long, char __user *); + + static inline int seccomp_mode(struct seccomp *s) + { +@@ -31,15 +84,16 @@ static inline int seccomp_mode(struct seccomp *s) + #include <linux/errno.h> + + struct seccomp { }; ++struct seccomp_filter { }; + +-#define secure_computing(x) do { } while (0) ++#define secure_computing(x) 0 + + static inline long prctl_get_seccomp(void) + { + return -EINVAL; + } + +-static inline long prctl_set_seccomp(unsigned long arg2) ++static inline long prctl_set_seccomp(unsigned long arg2, char __user *arg3) + { + return -EINVAL; + } +@@ -48,7 +102,21 @@ static inline int seccomp_mode(struct seccomp *s) + { + return 0; + } +- + #endif /* CONFIG_SECCOMP */ + ++#ifdef CONFIG_SECCOMP_FILTER ++extern void put_seccomp_filter(struct task_struct *tsk); ++extern void get_seccomp_filter(struct task_struct *tsk); ++extern u32 seccomp_bpf_load(int off); ++#else /* CONFIG_SECCOMP_FILTER */ ++static inline void put_seccomp_filter(struct task_struct *tsk) ++{ ++ return; ++} ++static inline void get_seccomp_filter(struct task_struct *tsk) ++{ ++ return; ++} ++#endif /* CONFIG_SECCOMP_FILTER */ ++#endif /* __KERNEL__ */ + #endif /* _LINUX_SECCOMP_H */ +diff --git a/kernel/fork.c b/kernel/fork.c +index 8163333..9faa812 100644 +--- a/kernel/fork.c ++++ b/kernel/fork.c +@@ -34,6 +34,7 @@ + #include <linux/cgroup.h> + #include <linux/security.h> + #include <linux/hugetlb.h> ++#include <linux/seccomp.h> + #include <linux/swap.h> + #include <linux/syscalls.h> + #include <linux/jiffies.h> +@@ -171,6 +172,7 @@ void free_task(struct task_struct *tsk) + free_thread_info(tsk->stack); + rt_mutex_debug_task_free(tsk); + ftrace_graph_exit_task(tsk); ++ put_seccomp_filter(tsk); + free_task_struct(tsk); + } + EXPORT_SYMBOL(free_task); +@@ -1164,6 +1166,7 @@ static struct task_struct *copy_process(unsigned long clone_flags, + goto fork_out; + + ftrace_graph_init_task(p); ++ get_seccomp_filter(p); + + rt_mutex_init_task(p); + +diff --git a/kernel/seccomp.c b/kernel/seccomp.c +index e8d76c5..0aeec19 100644 +--- a/kernel/seccomp.c ++++ b/kernel/seccomp.c +@@ -3,16 +3,343 @@ + * + * Copyright 2004-2005 Andrea Arcangeli <andrea@cpushare.com> + * +- * This defines a simple but solid secure-computing mode. ++ * Copyright (C) 2012 Google, Inc. ++ * Will Drewry <wad@chromium.org> ++ * ++ * This defines a simple but solid secure-computing facility. ++ * ++ * Mode 1 uses a fixed list of allowed system calls. ++ * Mode 2 allows user-defined system call filters in the form ++ * of Berkeley Packet Filters/Linux Socket Filters. + */ + ++#include <linux/atomic.h> + #include <linux/audit.h> +-#include <linux/seccomp.h> +-#include <linux/sched.h> + #include <linux/compat.h> ++#include <linux/sched.h> ++#include <linux/seccomp.h> + + /* #define SECCOMP_DEBUG 1 */ +-#define NR_SECCOMP_MODES 1 ++ ++#ifdef CONFIG_SECCOMP_FILTER ++#include <asm/syscall.h> ++#include <linux/filter.h> ++#include <linux/security.h> ++#include <linux/slab.h> ++#include <linux/tracehook.h> ++#include <linux/uaccess.h> ++ ++/** ++ * struct seccomp_filter - container for seccomp BPF programs ++ * ++ * @usage: reference count to manage the object lifetime. ++ * get/put helpers should be used when accessing an instance ++ * outside of a lifetime-guarded section. In general, this ++ * is only needed for handling filters shared across tasks. ++ * @prev: points to a previously installed, or inherited, filter ++ * @len: the number of instructions in the program ++ * @insns: the BPF program instructions to evaluate ++ * ++ * seccomp_filter objects are organized in a tree linked via the @prev ++ * pointer. For any task, it appears to be a singly-linked list starting ++ * with current->seccomp.filter, the most recently attached or inherited filter. ++ * However, multiple filters may share a @prev node, by way of fork(), which ++ * results in a unidirectional tree existing in memory. This is similar to ++ * how namespaces work. ++ * ++ * seccomp_filter objects should never be modified after being attached ++ * to a task_struct (other than @usage). ++ */ ++struct seccomp_filter { ++ atomic_t usage; ++ struct seccomp_filter *prev; ++ unsigned short len; /* Instruction count */ ++ struct sock_filter insns[]; ++}; ++ ++/* Limit any path through the tree to 256KB worth of instructions. */ ++#define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter)) ++ ++static void seccomp_filter_log_failure(int syscall) ++{ ++ int compat = 0; ++#ifdef CONFIG_COMPAT ++ compat = is_compat_task(); ++#endif ++ pr_info("%s[%d]: %ssystem call %d blocked at 0x%lx\n", ++ current->comm, task_pid_nr(current), ++ (compat ? "compat " : ""), ++ syscall, KSTK_EIP(current)); ++} ++ ++/** ++ * get_u32 - returns a u32 offset into data ++ * @data: a unsigned 64 bit value ++ * @index: 0 or 1 to return the first or second 32-bits ++ * ++ * This inline exists to hide the length of unsigned long. If a 32-bit ++ * unsigned long is passed in, it will be extended and the top 32-bits will be ++ * 0. If it is a 64-bit unsigned long, then whatever data is resident will be ++ * properly returned. ++ * ++ * Endianness is explicitly ignored and left for BPF program authors to manage ++ * as per the specific architecture. ++ */ ++static inline u32 get_u32(u64 data, int index) ++{ ++ return ((u32 *)&data)[index]; ++} ++ ++/* Helper for bpf_load below. */ ++#define BPF_DATA(_name) offsetof(struct seccomp_data, _name) ++/** ++ * bpf_load: checks and returns a pointer to the requested offset ++ * @off: offset into struct seccomp_data to load from ++ * ++ * Returns the requested 32-bits of data. ++ * seccomp_check_filter() should assure that @off is 32-bit aligned ++ * and not out of bounds. Failure to do so is a BUG. ++ */ ++u32 seccomp_bpf_load(int off) ++{ ++ struct pt_regs *regs = task_pt_regs(current); ++ if (off == BPF_DATA(nr)) ++ return syscall_get_nr(current, regs); ++ if (off == BPF_DATA(arch)) ++ return syscall_get_arch(current, regs); ++ if (off >= BPF_DATA(args[0]) && off < BPF_DATA(args[6])) { ++ unsigned long value; ++ int arg = (off - BPF_DATA(args[0])) / sizeof(u64); ++ int index = !!(off % sizeof(u64)); ++ syscall_get_arguments(current, regs, arg, 1, &value); ++ return get_u32(value, index); ++ } ++ if (off == BPF_DATA(instruction_pointer)) ++ return get_u32(KSTK_EIP(current), 0); ++ if (off == BPF_DATA(instruction_pointer) + sizeof(u32)) ++ return get_u32(KSTK_EIP(current), 1); ++ /* seccomp_check_filter should make this impossible. */ ++ BUG(); ++} ++ ++/** ++ * seccomp_check_filter - verify seccomp filter code ++ * @filter: filter to verify ++ * @flen: length of filter ++ * ++ * Takes a previously checked filter (by sk_chk_filter) and ++ * redirects all filter code that loads struct sk_buff data ++ * and related data through seccomp_bpf_load. It also ++ * enforces length and alignment checking of those loads. ++ * ++ * Returns 0 if the rule set is legal or -EINVAL if not. ++ */ ++static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen) ++{ ++ int pc; ++ for (pc = 0; pc < flen; pc++) { ++ struct sock_filter *ftest = &filter[pc]; ++ u16 code = ftest->code; ++ u32 k = ftest->k; ++ ++ switch (code) { ++ case BPF_S_LD_W_ABS: ++ ftest->code = BPF_S_ANC_SECCOMP_LD_W; ++ /* 32-bit aligned and not out of bounds. */ ++ if (k >= sizeof(struct seccomp_data) || k & 3) ++ return -EINVAL; ++ continue; ++ case BPF_S_LD_W_LEN: ++ ftest->code = BPF_S_LD_IMM; ++ ftest->k = sizeof(struct seccomp_data); ++ continue; ++ case BPF_S_LDX_W_LEN: ++ ftest->code = BPF_S_LDX_IMM; ++ ftest->k = sizeof(struct seccomp_data); ++ continue; ++ /* Explicitly include allowed calls. */ ++ case BPF_S_RET_K: ++ case BPF_S_RET_A: ++ case BPF_S_ALU_ADD_K: ++ case BPF_S_ALU_ADD_X: ++ case BPF_S_ALU_SUB_K: ++ case BPF_S_ALU_SUB_X: ++ case BPF_S_ALU_MUL_K: ++ case BPF_S_ALU_MUL_X: ++ case BPF_S_ALU_DIV_X: ++ case BPF_S_ALU_AND_K: ++ case BPF_S_ALU_AND_X: ++ case BPF_S_ALU_OR_K: ++ case BPF_S_ALU_OR_X: ++ case BPF_S_ALU_LSH_K: ++ case BPF_S_ALU_LSH_X: ++ case BPF_S_ALU_RSH_K: ++ case BPF_S_ALU_RSH_X: ++ case BPF_S_ALU_NEG: ++ case BPF_S_LD_IMM: ++ case BPF_S_LDX_IMM: ++ case BPF_S_MISC_TAX: ++ case BPF_S_MISC_TXA: ++ case BPF_S_ALU_DIV_K: ++ case BPF_S_LD_MEM: ++ case BPF_S_LDX_MEM: ++ case BPF_S_ST: ++ case BPF_S_STX: ++ case BPF_S_JMP_JA: ++ case BPF_S_JMP_JEQ_K: ++ case BPF_S_JMP_JEQ_X: ++ case BPF_S_JMP_JGE_K: ++ case BPF_S_JMP_JGE_X: ++ case BPF_S_JMP_JGT_K: ++ case BPF_S_JMP_JGT_X: ++ case BPF_S_JMP_JSET_K: ++ case BPF_S_JMP_JSET_X: ++ continue; ++ default: ++ return -EINVAL; ++ } ++ } ++ return 0; ++} ++ ++/** ++ * seccomp_run_filters - evaluates all seccomp filters against @syscall ++ * @syscall: number of the current system call ++ * ++ * Returns valid seccomp BPF response codes. ++ */ ++static u32 seccomp_run_filters(int syscall) ++{ ++ struct seccomp_filter *f; ++ u32 ret = SECCOMP_RET_KILL; ++ /* ++ * All filters in the list are evaluated and the lowest BPF return ++ * value always takes priority. ++ */ ++ for (f = current->seccomp.filter; f; f = f->prev) { ++ ret = sk_run_filter(NULL, f->insns); ++ if (ret != SECCOMP_RET_ALLOW) ++ break; ++ } ++ return ret; ++} ++ ++/** ++ * seccomp_attach_filter: Attaches a seccomp filter to current. ++ * @fprog: BPF program to install ++ * ++ * Returns 0 on success or an errno on failure. ++ */ ++static long seccomp_attach_filter(struct sock_fprog *fprog) ++{ ++ struct seccomp_filter *filter; ++ unsigned long fp_size = fprog->len * sizeof(struct sock_filter); ++ unsigned long total_insns = fprog->len; ++ long ret; ++ ++ if (fprog->len == 0 || fprog->len > BPF_MAXINSNS) ++ return -EINVAL; ++ ++ for (filter = current->seccomp.filter; filter; filter = filter->prev) ++ total_insns += filter->len + 4; /* include a 4 instr penalty */ ++ if (total_insns > MAX_INSNS_PER_PATH) ++ return -ENOMEM; ++ ++ /* ++ * Installing a seccomp filter requires that the task have ++ * CAP_SYS_ADMIN in its namespace or be running with no_new_privs. ++ * This avoids scenarios where unprivileged tasks can affect the ++ * behavior of privileged children. ++ */ ++ if (!current->no_new_privs && ++ security_capable_noaudit(current_cred(), current_user_ns(), ++ CAP_SYS_ADMIN) != 0) ++ return -EACCES; ++ ++ /* Allocate a new seccomp_filter */ ++ filter = kzalloc(sizeof(struct seccomp_filter) + fp_size, ++ GFP_KERNEL|__GFP_NOWARN); ++ if (!filter) ++ return -ENOMEM; ++ atomic_set(&filter->usage, 1); ++ filter->len = fprog->len; ++ ++ /* Copy the instructions from fprog. */ ++ ret = -EFAULT; ++ if (copy_from_user(filter->insns, fprog->filter, fp_size)) ++ goto fail; ++ ++ /* Check and rewrite the fprog via the skb checker */ ++ ret = sk_chk_filter(filter->insns, filter->len); ++ if (ret) ++ goto fail; ++ ++ /* Check and rewrite the fprog for seccomp use */ ++ ret = seccomp_check_filter(filter->insns, filter->len); ++ if (ret) ++ goto fail; ++ ++ /* ++ * If there is an existing filter, make it the prev and don't drop its ++ * task reference. ++ */ ++ filter->prev = current->seccomp.filter; ++ current->seccomp.filter = filter; ++ return 0; ++fail: ++ kfree(filter); ++ return ret; ++} ++ ++/** ++ * seccomp_attach_user_filter - attaches a user-supplied sock_fprog ++ * @user_filter: pointer to the user data containing a sock_fprog. ++ * ++ * Returns 0 on success and non-zero otherwise. ++ */ ++long seccomp_attach_user_filter(char __user *user_filter) ++{ ++ struct sock_fprog fprog; ++ long ret = -EFAULT; ++ ++#ifdef CONFIG_COMPAT ++ if (is_compat_task()) { ++ struct compat_sock_fprog fprog32; ++ if (copy_from_user(&fprog32, user_filter, sizeof(fprog32))) ++ goto out; ++ fprog.len = fprog32.len; ++ fprog.filter = compat_ptr(fprog32.filter); ++ } else /* falls through to the if below. */ ++#endif ++ if (copy_from_user(&fprog, user_filter, sizeof(fprog))) ++ goto out; ++ ret = seccomp_attach_filter(&fprog); ++out: ++ return ret; ++} ++ ++/* get_seccomp_filter - increments the reference count of the filter on @tsk */ ++void get_seccomp_filter(struct task_struct *tsk) ++{ ++ struct seccomp_filter *orig = tsk->seccomp.filter; ++ if (!orig) ++ return; ++ /* Reference count is bounded by the number of total processes. */ ++ atomic_inc(&orig->usage); ++} ++ ++/* put_seccomp_filter - decrements the ref count of tsk->seccomp.filter */ ++void put_seccomp_filter(struct task_struct *tsk) ++{ ++ struct seccomp_filter *orig = tsk->seccomp.filter; ++ /* Clean up single-reference branches iteratively. */ ++ while (orig && atomic_dec_and_test(&orig->usage)) { ++ struct seccomp_filter *freeme = orig; ++ orig = orig->prev; ++ kfree(freeme); ++ } ++} ++#endif /* CONFIG_SECCOMP_FILTER */ + + /* + * Secure computing mode 1 allows only read/write/exit/sigreturn. +@@ -34,10 +361,11 @@ static int mode1_syscalls_32[] = { + void __secure_computing(int this_syscall) + { + int mode = current->seccomp.mode; +- int * syscall; ++ int exit_sig = 0; ++ int *syscall; + + switch (mode) { +- case 1: ++ case SECCOMP_MODE_STRICT: + syscall = mode1_syscalls; + #ifdef CONFIG_COMPAT + if (is_compat_task()) +@@ -47,7 +375,16 @@ void __secure_computing(int this_syscall) + if (*syscall == this_syscall) + return; + } while (*++syscall); ++ exit_sig = SIGKILL; + break; ++#ifdef CONFIG_SECCOMP_FILTER ++ case SECCOMP_MODE_FILTER: ++ if (seccomp_run_filters(this_syscall) == SECCOMP_RET_ALLOW) ++ return; ++ seccomp_filter_log_failure(this_syscall); ++ exit_sig = SIGSYS; ++ break; ++#endif + default: + BUG(); + } +@@ -56,7 +393,7 @@ void __secure_computing(int this_syscall) + dump_stack(); + #endif + audit_seccomp(this_syscall); +- do_exit(SIGKILL); ++ do_exit(exit_sig); + } + + long prctl_get_seccomp(void) +@@ -64,25 +401,48 @@ long prctl_get_seccomp(void) + return current->seccomp.mode; + } + +-long prctl_set_seccomp(unsigned long seccomp_mode) ++/** ++ * prctl_set_seccomp: configures current->seccomp.mode ++ * @seccomp_mode: requested mode to use ++ * @filter: optional struct sock_fprog for use with SECCOMP_MODE_FILTER ++ * ++ * This function may be called repeatedly with a @seccomp_mode of ++ * SECCOMP_MODE_FILTER to install additional filters. Every filter ++ * successfully installed will be evaluated (in reverse order) for each system ++ * call the task makes. ++ * ++ * Once current->seccomp.mode is non-zero, it may not be changed. ++ * ++ * Returns 0 on success or -EINVAL on failure. ++ */ ++long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter) + { +- long ret; ++ long ret = -EINVAL; + +- /* can set it only once to be even more secure */ +- ret = -EPERM; +- if (unlikely(current->seccomp.mode)) ++ if (current->seccomp.mode && ++ current->seccomp.mode != seccomp_mode) + goto out; + +- ret = -EINVAL; +- if (seccomp_mode && seccomp_mode <= NR_SECCOMP_MODES) { +- current->seccomp.mode = seccomp_mode; +- set_thread_flag(TIF_SECCOMP); ++ switch (seccomp_mode) { ++ case SECCOMP_MODE_STRICT: ++ ret = 0; + #ifdef TIF_NOTSC + disable_TSC(); + #endif +- ret = 0; ++ break; ++#ifdef CONFIG_SECCOMP_FILTER ++ case SECCOMP_MODE_FILTER: ++ ret = seccomp_attach_user_filter(filter); ++ if (ret) ++ goto out; ++ break; ++#endif ++ default: ++ goto out; + } + +- out: ++ current->seccomp.mode = seccomp_mode; ++ set_thread_flag(TIF_SECCOMP); ++out: + return ret; + } +diff --git a/kernel/sys.c b/kernel/sys.c +index b82568b..ba0ae8e 100644 +--- a/kernel/sys.c ++++ b/kernel/sys.c +@@ -1908,7 +1908,7 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, + error = prctl_get_seccomp(); + break; + case PR_SET_SECCOMP: +- error = prctl_set_seccomp(arg2); ++ error = prctl_set_seccomp(arg2, (char __user *)arg3); + break; + case PR_GET_TSC: + error = GET_TSC_CTL(arg2); +-- +1.7.9.1 + diff --git a/features/seccomp/seccomp-kill-the-seccomp_t-typedef.patch b/features/seccomp/seccomp-kill-the-seccomp_t-typedef.patch new file mode 100644 index 00000000..95ac5398 --- /dev/null +++ b/features/seccomp/seccomp-kill-the-seccomp_t-typedef.patch @@ -0,0 +1,88 @@ +From 1bed374c1210f2390a39e715243b3767f8958e3b Mon Sep 17 00:00:00 2001 +From: Will Drewry <wad@chromium.org> +Date: Thu, 12 Apr 2012 16:47:54 -0500 +Subject: [PATCH] seccomp: kill the seccomp_t typedef + +commit 932ecebb0405b9a41cd18946e6cff8a17d434e23 upstream. + +Replaces the seccomp_t typedef with struct seccomp to match modern +kernel style. + +Signed-off-by: Will Drewry <wad@chromium.org> +Reviewed-by: James Morris <jmorris@namei.org> +Acked-by: Serge Hallyn <serge.hallyn@canonical.com> +Acked-by: Eric Paris <eparis@redhat.com> + +v18: rebase +... +v14: rebase/nochanges +v13: rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc +v12: rebase on to linux-next +v8-v11: no changes +v7: struct seccomp_struct -> struct seccomp +v6: original inclusion in this series. +Signed-off-by: James Morris <james.l.morris@oracle.com> +Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> +--- + include/linux/sched.h | 2 +- + include/linux/seccomp.h | 10 ++++++---- + 2 files changed, 7 insertions(+), 5 deletions(-) + +diff --git a/include/linux/sched.h b/include/linux/sched.h +index ba60897..cad1502 100644 +--- a/include/linux/sched.h ++++ b/include/linux/sched.h +@@ -1452,7 +1452,7 @@ struct task_struct { + uid_t loginuid; + unsigned int sessionid; + #endif +- seccomp_t seccomp; ++ struct seccomp seccomp; + + /* Thread group tracking */ + u32 parent_exec_id; +diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h +index cc7a4e9..d61f27f 100644 +--- a/include/linux/seccomp.h ++++ b/include/linux/seccomp.h +@@ -7,7 +7,9 @@ + #include <linux/thread_info.h> + #include <asm/seccomp.h> + +-typedef struct { int mode; } seccomp_t; ++struct seccomp { ++ int mode; ++}; + + extern void __secure_computing(int); + static inline void secure_computing(int this_syscall) +@@ -19,7 +21,7 @@ static inline void secure_computing(int this_syscall) + extern long prctl_get_seccomp(void); + extern long prctl_set_seccomp(unsigned long); + +-static inline int seccomp_mode(seccomp_t *s) ++static inline int seccomp_mode(struct seccomp *s) + { + return s->mode; + } +@@ -28,7 +30,7 @@ static inline int seccomp_mode(seccomp_t *s) + + #include <linux/errno.h> + +-typedef struct { } seccomp_t; ++struct seccomp { }; + + #define secure_computing(x) do { } while (0) + +@@ -42,7 +44,7 @@ static inline long prctl_set_seccomp(unsigned long arg2) + return -EINVAL; + } + +-static inline int seccomp_mode(seccomp_t *s) ++static inline int seccomp_mode(struct seccomp *s) + { + return 0; + } +-- +1.7.9.1 + diff --git a/features/seccomp/seccomp-remove-duplicated-failure-logging.patch b/features/seccomp/seccomp-remove-duplicated-failure-logging.patch new file mode 100644 index 00000000..ed1e662b --- /dev/null +++ b/features/seccomp/seccomp-remove-duplicated-failure-logging.patch @@ -0,0 +1,135 @@ +From 60ec12b5d1111e19e716ee5029296dc0550fad11 Mon Sep 17 00:00:00 2001 +From: Kees Cook <keescook@chromium.org> +Date: Thu, 12 Apr 2012 16:47:58 -0500 +Subject: [PATCH] seccomp: remove duplicated failure logging + +commit 3dc1c1b2d2ed7507ce8a379814ad75745ff97ebe upstream. + +This consolidates the seccomp filter error logging path and adds more +details to the audit log. + +Signed-off-by: Will Drewry <wad@chromium.org> +Signed-off-by: Kees Cook <keescook@chromium.org> +Acked-by: Eric Paris <eparis@redhat.com> + +v18: make compat= permanent in the record +v15: added a return code to the audit_seccomp path by wad@chromium.org + (suggested by eparis@redhat.com) +v*: original by keescook@chromium.org +Signed-off-by: James Morris <james.l.morris@oracle.com> +Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> +--- + include/linux/audit.h | 8 ++++---- + kernel/auditsc.c | 8 ++++++-- + kernel/seccomp.c | 15 +-------------- + 3 files changed, 11 insertions(+), 20 deletions(-) + +diff --git a/include/linux/audit.h b/include/linux/audit.h +index ed3ef19..22f292a 100644 +--- a/include/linux/audit.h ++++ b/include/linux/audit.h +@@ -463,7 +463,7 @@ extern void audit_putname(const char *name); + extern void __audit_inode(const char *name, const struct dentry *dentry); + extern void __audit_inode_child(const struct dentry *dentry, + const struct inode *parent); +-extern void __audit_seccomp(unsigned long syscall); ++extern void __audit_seccomp(unsigned long syscall, long signr, int code); + extern void __audit_ptrace(struct task_struct *t); + + static inline int audit_dummy_context(void) +@@ -508,10 +508,10 @@ static inline void audit_inode_child(const struct dentry *dentry, + } + void audit_core_dumps(long signr); + +-static inline void audit_seccomp(unsigned long syscall) ++static inline void audit_seccomp(unsigned long syscall, long signr, int code) + { + if (unlikely(!audit_dummy_context())) +- __audit_seccomp(syscall); ++ __audit_seccomp(syscall, signr, code); + } + + static inline void audit_ptrace(struct task_struct *t) +@@ -634,7 +634,7 @@ extern int audit_signals; + #define audit_inode(n,d) do { (void)(d); } while (0) + #define audit_inode_child(i,p) do { ; } while (0) + #define audit_core_dumps(i) do { ; } while (0) +-#define audit_seccomp(i) do { ; } while (0) ++#define audit_seccomp(i,s,c) do { ; } while (0) + #define auditsc_get_stamp(c,t,s) (0) + #define audit_get_loginuid(t) (-1) + #define audit_get_sessionid(t) (-1) +diff --git a/kernel/auditsc.c b/kernel/auditsc.c +index af1de0f..4b96415 100644 +--- a/kernel/auditsc.c ++++ b/kernel/auditsc.c +@@ -67,6 +67,7 @@ + #include <linux/syscalls.h> + #include <linux/capability.h> + #include <linux/fs_struct.h> ++#include <linux/compat.h> + + #include "audit.h" + +@@ -2710,13 +2711,16 @@ void audit_core_dumps(long signr) + audit_log_end(ab); + } + +-void __audit_seccomp(unsigned long syscall) ++void __audit_seccomp(unsigned long syscall, long signr, int code) + { + struct audit_buffer *ab; + + ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_ANOM_ABEND); +- audit_log_abend(ab, "seccomp", SIGKILL); ++ audit_log_abend(ab, "seccomp", signr); + audit_log_format(ab, " syscall=%ld", syscall); ++ audit_log_format(ab, " compat=%d", is_compat_task()); ++ audit_log_format(ab, " ip=0x%lx", KSTK_EIP(current)); ++ audit_log_format(ab, " code=0x%x", code); + audit_log_end(ab); + } + +diff --git a/kernel/seccomp.c b/kernel/seccomp.c +index 0aeec19..0f7c709 100644 +--- a/kernel/seccomp.c ++++ b/kernel/seccomp.c +@@ -60,18 +60,6 @@ struct seccomp_filter { + /* Limit any path through the tree to 256KB worth of instructions. */ + #define MAX_INSNS_PER_PATH ((1 << 18) / sizeof(struct sock_filter)) + +-static void seccomp_filter_log_failure(int syscall) +-{ +- int compat = 0; +-#ifdef CONFIG_COMPAT +- compat = is_compat_task(); +-#endif +- pr_info("%s[%d]: %ssystem call %d blocked at 0x%lx\n", +- current->comm, task_pid_nr(current), +- (compat ? "compat " : ""), +- syscall, KSTK_EIP(current)); +-} +- + /** + * get_u32 - returns a u32 offset into data + * @data: a unsigned 64 bit value +@@ -381,7 +369,6 @@ void __secure_computing(int this_syscall) + case SECCOMP_MODE_FILTER: + if (seccomp_run_filters(this_syscall) == SECCOMP_RET_ALLOW) + return; +- seccomp_filter_log_failure(this_syscall); + exit_sig = SIGSYS; + break; + #endif +@@ -392,7 +379,7 @@ void __secure_computing(int this_syscall) + #ifdef SECCOMP_DEBUG + dump_stack(); + #endif +- audit_seccomp(this_syscall); ++ audit_seccomp(this_syscall, exit_code, SECCOMP_RET_KILL); + do_exit(exit_sig); + } + +-- +1.7.9.1 + diff --git a/features/seccomp/seccomp.scc b/features/seccomp/seccomp.scc new file mode 100644 index 00000000..7ceac72a --- /dev/null +++ b/features/seccomp/seccomp.scc @@ -0,0 +1,15 @@ +patch Add-PR_-GET-SET-_NO_NEW_PRIVS-to-prevent-execve-from.patch +patch Fix-execve-behavior-apparmor-for-PR_-GET-SET-_NO_NEW.patch +patch sk_run_filter-add-BPF_S_ANC_SECCOMP_LD_W.patch +patch net-compat.c-linux-filter.h-share-compat_sock_fprog.patch +patch seccomp-kill-the-seccomp_t-typedef.patch +patch asm-syscall.h-add-syscall_get_arch.patch +patch arch-x86-add-syscall_get_arch-to-syscall.h.patch +patch seccomp-add-system-call-filtering-using-BPF.patch +patch seccomp-remove-duplicated-failure-logging.patch +patch seccomp-add-SECCOMP_RET_ERRNO.patch +patch signal-x86-add-SIGSYS-info-and-make-it-synchronous.patch +patch seccomp-Add-SECCOMP_RET_TRAP.patch +patch ptrace-seccomp-Add-PTRACE_SECCOMP-support.patch +patch x86-Enable-HAVE_ARCH_SECCOMP_FILTER.patch +patch Documentation-prctl-seccomp_filter.patch diff --git a/features/seccomp/signal-x86-add-SIGSYS-info-and-make-it-synchronous.patch b/features/seccomp/signal-x86-add-SIGSYS-info-and-make-it-synchronous.patch new file mode 100644 index 00000000..735a9b94 --- /dev/null +++ b/features/seccomp/signal-x86-add-SIGSYS-info-and-make-it-synchronous.patch @@ -0,0 +1,174 @@ +From 5b84a784a5f5186e35aea6efad849d8898f527a2 Mon Sep 17 00:00:00 2001 +From: Will Drewry <wad@chromium.org> +Date: Thu, 12 Apr 2012 16:48:00 -0500 +Subject: [PATCH] signal, x86: add SIGSYS info and make it synchronous. + +commit a0727e8ce513fe6890416da960181ceb10fbfae6 upstream. + +This change enables SIGSYS, defines _sigfields._sigsys, and adds +x86 (compat) arch support. _sigsys defines fields which allow +a signal handler to receive the triggering system call number, +the relevant AUDIT_ARCH_* value for that number, and the address +of the callsite. + +SIGSYS is added to the SYNCHRONOUS_MASK because it is desirable for it +to have setup_frame() called for it. The goal is to ensure that +ucontext_t reflects the machine state from the time-of-syscall and not +from another signal handler. + +The first consumer of SIGSYS would be seccomp filter. In particular, +a filter program could specify a new return value, SECCOMP_RET_TRAP, +which would result in the system call being denied and the calling +thread signaled. This also means that implementing arch-specific +support can be dependent upon HAVE_ARCH_SECCOMP_FILTER. + +Suggested-by: H. Peter Anvin <hpa@zytor.com> +Signed-off-by: Will Drewry <wad@chromium.org> +Acked-by: Serge Hallyn <serge.hallyn@canonical.com> +Reviewed-by: H. Peter Anvin <hpa@zytor.com> +Acked-by: Eric Paris <eparis@redhat.com> + +v18: - added acked by, rebase +v17: - rebase and reviewed-by addition +v14: - rebase/nochanges +v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc +v12: - reworded changelog (oleg@redhat.com) +v11: - fix dropped words in the change description + - added fallback copy_siginfo support. + - added __ARCH_SIGSYS define to allow stepped arch support. +v10: - first version based on suggestion +Signed-off-by: James Morris <james.l.morris@oracle.com> +Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> +--- + arch/x86/ia32/ia32_signal.c | 4 ++++ + arch/x86/include/asm/ia32.h | 6 ++++++ + include/asm-generic/siginfo.h | 22 ++++++++++++++++++++++ + kernel/signal.c | 9 ++++++++- + 4 files changed, 40 insertions(+), 1 deletions(-) + +diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c +index a69245b..0b3f235 100644 +--- a/arch/x86/ia32/ia32_signal.c ++++ b/arch/x86/ia32/ia32_signal.c +@@ -67,6 +67,10 @@ int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from) + switch (from->si_code >> 16) { + case __SI_FAULT >> 16: + break; ++ case __SI_SYS >> 16: ++ put_user_ex(from->si_syscall, &to->si_syscall); ++ put_user_ex(from->si_arch, &to->si_arch); ++ break; + case __SI_CHLD >> 16: + if (ia32) { + put_user_ex(from->si_utime, &to->si_utime); +diff --git a/arch/x86/include/asm/ia32.h b/arch/x86/include/asm/ia32.h +index ee52760..b04cbdb 100644 +--- a/arch/x86/include/asm/ia32.h ++++ b/arch/x86/include/asm/ia32.h +@@ -144,6 +144,12 @@ typedef struct compat_siginfo { + int _band; /* POLL_IN, POLL_OUT, POLL_MSG */ + int _fd; + } _sigpoll; ++ ++ struct { ++ unsigned int _call_addr; /* calling insn */ ++ int _syscall; /* triggering system call number */ ++ unsigned int _arch; /* AUDIT_ARCH_* of syscall */ ++ } _sigsys; + } _sifields; + } compat_siginfo_t; + +diff --git a/include/asm-generic/siginfo.h b/include/asm-generic/siginfo.h +index 5e5e386..d2c7f29 100644 +--- a/include/asm-generic/siginfo.h ++++ b/include/asm-generic/siginfo.h +@@ -98,9 +98,18 @@ typedef struct siginfo { + __ARCH_SI_BAND_T _band; /* POLL_IN, POLL_OUT, POLL_MSG */ + int _fd; + } _sigpoll; ++ ++ /* SIGSYS */ ++ struct { ++ void __user *_call_addr; /* calling insn */ ++ int _syscall; /* triggering system call number */ ++ unsigned int _arch; /* AUDIT_ARCH_* of syscall */ ++ } _sigsys; + } _sifields; + } __ARCH_SI_ATTRIBUTES siginfo_t; + ++/* If the arch shares siginfo, then it has SIGSYS. */ ++#define __ARCH_SIGSYS + #endif + + /* +@@ -124,6 +133,11 @@ typedef struct siginfo { + #define si_addr_lsb _sifields._sigfault._addr_lsb + #define si_band _sifields._sigpoll._band + #define si_fd _sifields._sigpoll._fd ++#ifdef __ARCH_SIGSYS ++#define si_call_addr _sifields._sigsys._call_addr ++#define si_syscall _sifields._sigsys._syscall ++#define si_arch _sifields._sigsys._arch ++#endif + + #ifdef __KERNEL__ + #define __SI_MASK 0xffff0000u +@@ -134,6 +148,7 @@ typedef struct siginfo { + #define __SI_CHLD (4 << 16) + #define __SI_RT (5 << 16) + #define __SI_MESGQ (6 << 16) ++#define __SI_SYS (7 << 16) + #define __SI_CODE(T,N) ((T) | ((N) & 0xffff)) + #else + #define __SI_KILL 0 +@@ -143,6 +158,7 @@ typedef struct siginfo { + #define __SI_CHLD 0 + #define __SI_RT 0 + #define __SI_MESGQ 0 ++#define __SI_SYS 0 + #define __SI_CODE(T,N) (N) + #endif + +@@ -240,6 +256,12 @@ typedef struct siginfo { + #define NSIGPOLL 6 + + /* ++ * SIGSYS si_codes ++ */ ++#define SYS_SECCOMP (__SI_SYS|1) /* seccomp triggered */ ++#define NSIGSYS 1 ++ ++/* + * sigevent definitions + * + * It seems likely that SIGEV_THREAD will have to be handled from +diff --git a/kernel/signal.c b/kernel/signal.c +index 17afcaf..1a006b5 100644 +--- a/kernel/signal.c ++++ b/kernel/signal.c +@@ -160,7 +160,7 @@ void recalc_sigpending(void) + + #define SYNCHRONOUS_MASK \ + (sigmask(SIGSEGV) | sigmask(SIGBUS) | sigmask(SIGILL) | \ +- sigmask(SIGTRAP) | sigmask(SIGFPE)) ++ sigmask(SIGTRAP) | sigmask(SIGFPE) | sigmask(SIGSYS)) + + int next_signal(struct sigpending *pending, sigset_t *mask) + { +@@ -2706,6 +2706,13 @@ int copy_siginfo_to_user(siginfo_t __user *to, siginfo_t *from) + err |= __put_user(from->si_uid, &to->si_uid); + err |= __put_user(from->si_ptr, &to->si_ptr); + break; ++#ifdef __ARCH_SIGSYS ++ case __SI_SYS: ++ err |= __put_user(from->si_call_addr, &to->si_call_addr); ++ err |= __put_user(from->si_syscall, &to->si_syscall); ++ err |= __put_user(from->si_arch, &to->si_arch); ++ break; ++#endif + default: /* this is just in case for now ... */ + err |= __put_user(from->si_pid, &to->si_pid); + err |= __put_user(from->si_uid, &to->si_uid); +-- +1.7.9.1 + diff --git a/features/seccomp/sk_run_filter-add-BPF_S_ANC_SECCOMP_LD_W.patch b/features/seccomp/sk_run_filter-add-BPF_S_ANC_SECCOMP_LD_W.patch new file mode 100644 index 00000000..00a6038a --- /dev/null +++ b/features/seccomp/sk_run_filter-add-BPF_S_ANC_SECCOMP_LD_W.patch @@ -0,0 +1,73 @@ +From 23be50acb6765e31a3c1c5b79421c81cce9dbbf9 Mon Sep 17 00:00:00 2001 +From: Will Drewry <wad@chromium.org> +Date: Thu, 12 Apr 2012 16:47:52 -0500 +Subject: [PATCH] sk_run_filter: add BPF_S_ANC_SECCOMP_LD_W + +commit 46b325c7eb01482674406701825ff67f561ccdd4 upstream. + +Introduces a new BPF ancillary instruction that all LD calls will be +mapped through when skb_run_filter() is being used for seccomp BPF. The +rewriting will be done using a secondary chk_filter function that is run +after skb_chk_filter. + +The code change is guarded by CONFIG_SECCOMP_FILTER which is added, +along with the seccomp_bpf_load() function later in this series. + +This is based on http://lkml.org/lkml/2012/3/2/141 + +Suggested-by: Indan Zupancic <indan@nul.nu> +Signed-off-by: Will Drewry <wad@chromium.org> +Acked-by: Eric Dumazet <eric.dumazet@gmail.com> +Acked-by: Eric Paris <eparis@redhat.com> + +v18: rebase +... +v15: include seccomp.h explicitly for when seccomp_bpf_load exists. +v14: First cut using a single additional instruction +... v13: made bpf functions generic. +Signed-off-by: James Morris <james.l.morris@oracle.com> +Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> +--- + include/linux/filter.h | 1 + + net/core/filter.c | 6 ++++++ + 2 files changed, 7 insertions(+), 0 deletions(-) + +diff --git a/include/linux/filter.h b/include/linux/filter.h +index 8eeb205..aaa2e80 100644 +--- a/include/linux/filter.h ++++ b/include/linux/filter.h +@@ -228,6 +228,7 @@ enum { + BPF_S_ANC_HATYPE, + BPF_S_ANC_RXHASH, + BPF_S_ANC_CPU, ++ BPF_S_ANC_SECCOMP_LD_W, + }; + + #endif /* __KERNEL__ */ +diff --git a/net/core/filter.c b/net/core/filter.c +index 6f755cc..491e2e1 100644 +--- a/net/core/filter.c ++++ b/net/core/filter.c +@@ -38,6 +38,7 @@ + #include <linux/filter.h> + #include <linux/reciprocal_div.h> + #include <linux/ratelimit.h> ++#include <linux/seccomp.h> + + /* No hurry in this branch + * +@@ -352,6 +353,11 @@ load_b: + A = 0; + continue; + } ++#ifdef CONFIG_SECCOMP_FILTER ++ case BPF_S_ANC_SECCOMP_LD_W: ++ A = seccomp_bpf_load(fentry->k); ++ continue; ++#endif + default: + WARN_RATELIMIT(1, "Unknown code:%u jt:%u tf:%u k:%u\n", + fentry->code, fentry->jt, +-- +1.7.9.1 + diff --git a/features/seccomp/x86-Enable-HAVE_ARCH_SECCOMP_FILTER.patch b/features/seccomp/x86-Enable-HAVE_ARCH_SECCOMP_FILTER.patch new file mode 100644 index 00000000..9bf43d5b --- /dev/null +++ b/features/seccomp/x86-Enable-HAVE_ARCH_SECCOMP_FILTER.patch @@ -0,0 +1,80 @@ +From 648b737bc10632617f45eff886dcc29398e717da Mon Sep 17 00:00:00 2001 +From: Will Drewry <wad@chromium.org> +Date: Thu, 12 Apr 2012 16:48:03 -0500 +Subject: [PATCH] x86: Enable HAVE_ARCH_SECCOMP_FILTER + +commit c6cfbeb4029610c8c330c312dcf4d514cc067554 upstream. + +Enable support for seccomp filter on x86: +- syscall_get_arch() +- syscall_get_arguments() +- syscall_rollback() +- syscall_set_return_value() +- SIGSYS siginfo_t support +- secure_computing is called from a ptrace_event()-safe context +- secure_computing return value is checked (see below). + +SECCOMP_RET_TRACE and SECCOMP_RET_TRAP may result in seccomp needing to +skip a system call without killing the process. This is done by +returning a non-zero (-1) value from secure_computing. This change +makes x86 respect that return value. + +To ensure that minimal kernel code is exposed, a non-zero return value +results in an immediate return to user space (with an invalid syscall +number). + +Signed-off-by: Will Drewry <wad@chromium.org> +Reviewed-by: H. Peter Anvin <hpa@zytor.com> +Acked-by: Eric Paris <eparis@redhat.com> +Reviewed-by: Kees Cook <keescook@chromium.org> + +v18: rebase and tweaked change description, acked-by +v17: added reviewed by and rebased +v..: all rebases since original introduction. +Signed-off-by: James Morris <james.l.morris@oracle.com> +Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> +--- + arch/x86/Kconfig | 1 + + arch/x86/kernel/ptrace.c | 7 ++++++- + 2 files changed, 7 insertions(+), 1 deletions(-) + +diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig +index c9866b0..2b79d94 100644 +--- a/arch/x86/Kconfig ++++ b/arch/x86/Kconfig +@@ -82,6 +82,7 @@ config X86 + select ARCH_HAVE_NMI_SAFE_CMPXCHG + select GENERIC_IOMAP + select DCACHE_WORD_ACCESS ++ select HAVE_ARCH_SECCOMP_FILTER + + config INSTRUCTION_DECODER + def_bool (KPROBES || PERF_EVENTS) +diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c +index cf11783..c4c6a5c 100644 +--- a/arch/x86/kernel/ptrace.c ++++ b/arch/x86/kernel/ptrace.c +@@ -1474,7 +1474,11 @@ long syscall_trace_enter(struct pt_regs *regs) + regs->flags |= X86_EFLAGS_TF; + + /* do the secure computing check first */ +- secure_computing(regs->orig_ax); ++ if (secure_computing(regs->orig_ax)) { ++ /* seccomp failures shouldn't expose any additional code. */ ++ ret = -1L; ++ goto out; ++ } + + if (unlikely(test_thread_flag(TIF_SYSCALL_EMU))) + ret = -1L; +@@ -1499,6 +1503,7 @@ long syscall_trace_enter(struct pt_regs *regs) + regs->dx, regs->r10); + #endif + ++out: + return ret ?: regs->orig_ax; + } + +-- +1.7.9.1 + diff --git a/ktypes/standard/standard-nocfg.scc b/ktypes/standard/standard-nocfg.scc index e0b6eb66..6c9a5de5 100644 --- a/ktypes/standard/standard-nocfg.scc +++ b/ktypes/standard/standard-nocfg.scc @@ -24,6 +24,9 @@ tag systemtap include features/utrace/utrace.scc tag utrace +include features/seccomp/seccomp.scc +tag seccomp + include arch/arm/arm.scc tag arm |