This is a backport of the seccomp BPF syscall filtering feature from v3.5.

Quoting from: https://lkml.org/lkml/2012/1/11/260

---------------
[RFC,PATCH 0/2] dynamic seccomp policies (using BPF filters)

The goal of the patchset is straightforward:

 To provide a means of reducing the kernel attack surface.

In practice, this is done at the primary kernel ABI: system calls.
Achieving this goal will address the needs expressed by many systems
projects:
  qemu/kvm, openssh, vsftpd, lxc, and chromium and chromium os (me).

While system call filtering has been attempted many times, I hope that
this approach shows more promise.  It works as described below and in
the patch series.

A userland task may call prctl(PR_ATTACH_SECCOMP_FILTER) to attach a
BPF program to itself.  Once attached, all system calls made by the
task will be evaluated by the BPF program prior to being accepted.
Evaluation is done by executing the BPF program over the struct
user_regs_state for the process.
--------------

The content appears in v3.5 from:

------------
commit cb60e3e65c1b96a4d6444a7a13dc7dd48bc15a2b
Merge: 99262a3 ff2bb04
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon May 21 20:27:36 2012 -0700

    Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
    
    Pull security subsystem updates from James Morris:
     "New notable features:
       - The seccomp work from Will Drewry
       - PR_{GET,SET}_NO_NEW_PRIVS from Andy Lutomirski
       - Longer security labels for Smack from Casey Schaufler
       - Additional ptrace restriction modes for Yama by Kees Cook"
-----------

Here, we take Will's linear block of commits from the above merge, which
are conveniently all marked with "v18" in the changelog, and the one
PR_{GET,SET}_NO_NEW_PRIVS commit from Andy (required as a dependency).
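
As a rough illustration of how the merged interface is used (a minimal
sketch, not taken from the patches): the RFC quoted above names
prctl(PR_ATTACH_SECCOMP_FILTER), but in the form merged for v3.5 a filter
is attached with prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...), and the
BPF program reads a struct seccomp_data.  Unless the task has
CAP_SYS_ADMIN, PR_SET_NO_NEW_PRIVS must be set first, which is why Andy's
commit is a dependency.  The helper name deny_getppid() and the choice of
getppid/EPERM are hypothetical, picked only for illustration, and the
sketch omits the architecture check that a real filter should do first.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/*
 * Hypothetical helper (not part of the backport): install a filter that
 * makes getppid() fail with EPERM and allows every other system call.
 * Returns 0 on success, -1 if either prctl() call fails.
 *
 * NOTE: a production filter should also validate seccomp_data.arch
 * before trusting the syscall number; that check is left out here.
 */
static int deny_getppid(void)
{
	struct sock_filter filter[] = {
		/* Load the syscall number from struct seccomp_data. */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, nr)),
		/* getppid: fall through to the ERRNO return; else skip it. */
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getppid, 0, 1),
		BPF_STMT(BPF_RET | BPF_K,
			 SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
		/* Everything else is allowed. */
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
		.filter = filter,
	};

	/* Required for unprivileged tasks before a filter can be attached. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
		return -1;
	if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog))
		return -1;
	return 0;
}
```

Calling deny_getppid() early in main() and then syscall(__NR_getppid)
should yield -1 with errno set to EPERM, while other system calls proceed
as usual.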

Documentation:
==============

See added file: Documentation/prctl/seccomp_filter.txt


Testing:
========

Several samples are added in samples/seccomp -- building is as easy as:

	mkdir ../test
	make O=../test defconfig
	make O=../test samples/seccomp/

bpf-direct is a sample that intercepts writes to STDERR and redirects
them to STDOUT with an "[ERR]" prefix.  Consider the core of the program:

---------------------
       syscall(__NR_write, STDOUT_FILENO,
               payload("OHAI! WHAT IS YOUR NAME? "));
       bytes = syscall(__NR_read, STDIN_FILENO, buf, sizeof(buf));
       syscall(__NR_write, STDOUT_FILENO, payload("HELLO, "));
       syscall(__NR_write, STDOUT_FILENO, buf, bytes);
       syscall(__NR_write, STDERR_FILENO,
               payload("Error message going to STDERR\n"));
---------------------

Running this core on a non-seccomp kernel (i.e. by copying the above core
to "foo.c"), we can see, with redirection, that the sample error message
goes to STDERR; i.e.

-----------
~$ ./foo
OHAI! WHAT IS YOUR NAME? sdfsdf
HELLO, sdfsdf
Error message going to STDERR
~$ ./foo 2> /dev/null
OHAI! WHAT IS YOUR NAME? sdfs
HELLO, sdfs
~$
------------

Note that in the second instance, the error message disappears into /dev/null.
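
For reference, the body of such a "foo.c" might look roughly like the
sketch below.  The payload() macro is written as we recall it from
samples/seccomp/bpf-direct.c and should be treated as an assumption; the
function name run_core(), the buffer size, and the check on the read
result are hypothetical additions for illustration.  No filter is
installed here, so the last write really does go to STDERR.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Expands to "string, length" (assumed to match the sample's macro). */
#define payload(_c) (_c), sizeof((_c))

/*
 * Hypothetical wrapper around the core shown above, with no seccomp
 * filter attached.  Returns the number of bytes read from STDIN, or -1
 * if the read fails.
 */
static long run_core(void)
{
	char buf[4096];
	long bytes;

	syscall(__NR_write, STDOUT_FILENO,
		payload("OHAI! WHAT IS YOUR NAME? "));
	bytes = syscall(__NR_read, STDIN_FILENO, buf, sizeof(buf));
	if (bytes < 0)
		return -1;
	syscall(__NR_write, STDOUT_FILENO, payload("HELLO, "));
	syscall(__NR_write, STDOUT_FILENO, buf, bytes);
	/* Unfiltered: this lands on STDERR and vanishes with 2>/dev/null. */
	syscall(__NR_write, STDERR_FILENO,
		payload("Error message going to STDERR\n"));
	return bytes;
}
```

Calling run_core() from main() and building with a plain "cc foo.c -o foo"
should be enough to reproduce the transcript above.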

Now consider the seccomp enabled case, using the same redirect:

------------
$ ./bpf-direct 
OHAI! WHAT IS YOUR NAME? sdfsd
HELLO, sdfsd
[ERR] Error message going to STDERR
$ ./bpf-direct 2>/dev/null
OHAI! WHAT IS YOUR NAME? sdfsdf
HELLO, sdfsdf
[ERR] Error message going to STDERR
$
------------

There are two things to see in the above.
  1) We see the [ERR] prefix that is clearly from the emulator()
     function we've installed on the __NR_write syscall, and
  2) Even when we redirect STDERR to /dev/null, we still see the
     message, which confirms it was put on STDOUT instead.