codeblog code is freedom — patching my itch

6/13/2014

glibc select weakness fixed

Filed under: Blogging,Chrome OS,Debian,General,Security,Ubuntu,Ubuntu-Server — kees @ 11:21 am

In 2009, I reported this bug to glibc, describing the problem that exists when a program is using select, and has its open file descriptor resource limit raised above 1024 (FD_SETSIZE). If a network daemon starts using the FD_SET/FD_CLR glibc macros on fdset variables for descriptors larger than 1024, glibc will happily write beyond the end of the fdset variable, producing a buffer overflow condition. (This problem had existed since the introduction of the macros, so, for decades? I figured it was long over-due to have a report opened about it.)

At the time, I was told this wasn’t going to be fixed and “every program using [select] must be considered buggy.” 2 years later still more people kept asking for this feature and continued to be told “no”.

But, as it turns out, a few months later after the most recent “no”, it got silently fixed anyway, with the bug left open as “Won’t Fix”! I’m glad Florian did some house-cleaning on the glibc bug tracker, since I’d otherwise never have noticed that this protection had been added to the ever-growing list of -D_FORTIFY_SOURCE=2 protections.

I’ll still recommend everyone use poll instead of select, but now I won’t be so worried when I see requests to raise the open descriptor limit above 1024.

© 2014, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

5/7/2014

Linux Security Summit 2014

Filed under: Blogging,Chrome OS,Debian,General,Security,Ubuntu,Ubuntu-Server — kees @ 10:31 am

The Linux Security Summit is happening in Chicago August 18th and 19th, just before LinuxCon. Send us some presentation and topic proposals, and join the conversation with other like-minded people. :)

I’d love to see what people have been working on, and what they’d like to work on. Our general topics will hopefully include:

  • System hardening
  • Access control
  • Cryptography
  • Integrity control
  • Hardware security
  • Networking
  • Storage
  • Virtualization
  • Desktop
  • Tools
  • Management
  • Case studies
  • Emerging technologies, threats & techniques

The Call For Participation closes June 6th, so you’ve got about a month, but earlier is better.

© 2014, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

1/27/2014

-fstack-protector-strong

Filed under: Blogging,Chrome OS,Debian,Security,Ubuntu,Ubuntu-Server — kees @ 2:28 pm

There will be a new option in gcc 4.9 named “-fstack-protector-strong“, which offers an improved version of “-fstack-protector” without going all the way to “-fstack-protector-all“. The stack protector feature itself adds a known canary to the stack during function preamble, and checks it when the function returns. If it changed, there was a stack overflow, and the program aborts. This is fine, but figuring out when to include it is the reason behind the various options.

Since traditionally stack overflows happen with string-based manipulations, the default (-fstack-protector), only includes the canary code when a function defines an 8 (--param=ssp-buffer-size=N, N=8 by default) or more byte local character array. This means just a few functions get the checking, but they’re probably the most likely to need it, so it’s an okay balance. Various distributions ended up lowering their default --param=ssp-buffer-size option down to 4, since there were still cases of functions that should have been protected but the conservative gcc upstream default of 8 wasn’t covering them.

However, even with the increased function coverage, there are rare cases when a stack overflow happens on other kinds of stack variables. To handle this more paranoid concern, -fstack-protector-all was defined to add the canary to all functions. This results in substantial use of stack space for saving the canary on deep stack users, and measurable (though surprisingly still relatively low) performance hit due to all the saving/checking. For a long time, Chrome OS used this, since we’re paranoid. :)

In the interest of gaining back some of the lost performance and not hitting our Chrome OS build images with such a giant stack-protector hammer, Han Shen from the Chrome OS compiler team created the new option -fstack-protector-strong, which enables the canary in many more conditions:

  • local variable’s address used as part of the right hand side of an assignment or function argument
  • local variable is an array (or union containing an array), regardless of array type or length
  • uses register local variables

This meant we were covering all the more paranoid conditions that might lead to a stack overflow. Chrome OS has been using this option instead of -fstack-protector-all for about 10 months now.

As a quick demonstration of the options, you can see this example program under various conditions. It tries to show off an example of shoving serialized data into a non-character variable, like might happen in some network address manipulations or streaming data parsing. Since I’m using memcpy here for clarity, the builds will need to turn off FORTIFY_SOURCE, which would also notice the overflow.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct no_chars {
    unsigned int len;
    unsigned int data;
};

int main(int argc, char * argv[])
{
    struct no_chars info = { };

    if (argc < 3) {
        fprintf(stderr, "Usage: %s LENGTH DATA...\n", argv[0]);
        return 1;
    }

    info.len = atoi(argv[1]);
    memcpy(&info.data, argv[2], info.len);

    return 0;
}

Built with everything disabled, this faults trying to return to an invalid VMA:

    $ gcc -Wall -O2 -U_FORTIFY_SOURCE -fno-stack-protector /tmp/boom.c -o /tmp/boom
    $ /tmp/boom 64 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    Segmentation fault (core dumped)
    

Built with FORTIFY_SOURCE enabled, we see the expected catch of the overflow in memcpy:

    $ gcc -Wall -O2 -D_FORTIFY_SOURCE=2 -fno-stack-protector /tmp/boom.c -o /tmp/boom
    $ /tmp/boom 64 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    *** buffer overflow detected ***: /tmp/boom terminated
    ...
    

So, we’ll leave FORTIFY_SOURCE disabled for our comparisons. With pre-4.9 gcc, we can see that -fstack-protector does not get triggered to protect this function:

    $ gcc -Wall -O2 -U_FORTIFY_SOURCE -fstack-protector /tmp/boom.c -o /tmp/boom
    $ /tmp/boom 64 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    Segmentation fault (core dumped)
    

However, using -fstack-protector-all does trigger the protection, as expected:

    $ gcc -Wall -O2 -U_FORTIFY_SOURCE -fstack-protector-all /tmp/boom.c -o /tmp/boom
    $ /tmp/boom 64 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    *** stack smashing detected ***: /tmp/boom terminated
    Aborted (core dumped)
    

And finally, using the gcc snapshot of 4.9, here is -fstack-protector-strong doing its job:

    $ /usr/lib/gcc-snapshot/bin/gcc -Wall -O2 -U_FORTIFY_SOURCE -fstack-protector-strong /tmp/boom.c -o /tmp/boom
    $ /tmp/boom 64 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    *** stack smashing detected ***: /tmp/boom terminated
    Aborted (core dumped)
    

For Linux 3.14, I’ve added support for -fstack-protector-strong via the new CONFIG_CC_STACKPROTECTOR_STRONG option. The old CONFIG_CC_STACKPROTECTOR will be available as CONFIG_CC_STACKPROTECTOR_REGULAR. When comparing the results on builds via size and objdump -d analysis, here’s what I found with gcc 4.9:

A normal x86_64 “defconfig” build, without stack protector had a kernel text size of 11430641 bytes with 36110 function bodies. Adding CONFIG_CC_STACKPROTECTOR_REGULAR increased the kernel text size to 11468490 (a +0.33% change), with 1015 of 36110 functions stack-protected (2.81%). Using CONFIG_CC_STACKPROTECTOR_STRONG increased the kernel text size to 11692790 (+2.24%), with 7401 of 36110 functions stack-protected (20.5%). And 20% is a far-cry from 100% if support for -fstack-protector-all was added back to the kernel.

The next bit of work will be figuring out the best way to detect the version of gcc in use when doing Debian package builds, and using -fstack-protector-strong instead of -fstack-protector. For Ubuntu, it’s much simpler because it’ll just be the compiler default.

© 2014, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

12/10/2013

live patching the kernel

Filed under: Blogging,Chrome OS,Debian,Security,Ubuntu,Ubuntu-Server — kees @ 3:40 pm

A nice set of recent posts have done a great job detailing the remaining ways that a root user can get at kernel memory. Part of this is driven by the ideas behind UEFI Secure Boot, but they come from the same goal: making sure that the root user cannot directly subvert the running kernel. My perspective on this is toward making sure that an attacker who has gained access and then gained root privileges can’t continue to elevate their access and install invisible kernel rootkits.

An outline for possible attack vectors is spelled out by Matthew Gerrett’s continuing “useful kernel lockdown” patch series. The set of attacks was examined by Tyler Borland in “Bypassing modules_disabled security”. His post describes each vector in detail, and he ultimately chooses MSR writing as the way to write kernel memory (and shows an example of how to re-enable module loading). One thing not mentioned is that many distros have MSR access as a module, and it’s rarely loaded. If modules_disabled is already set, an attacker won’t be able to load the MSR module to begin with. However, the other general-purpose vector, kexec, is still available. To prove out this method, Matthew wrote a proof-of-concept for changing kernel memory via kexec.

Chrome OS is several steps ahead here, since it has hibernation disabled, MSR writing disabled, kexec disabled, modules verified, root filesystem read-only and verified, kernel verified, and firmware verified. But since not all my machines are Chrome OS, I wanted to look at some additional protections against kexec on general-purpose distro kernels that have CONFIG_KEXEC enabled, especially those without UEFI Secure Boot and Matthew’s lockdown patch series.

My goal was to disable kexec without needing to rebuild my entire kernel. For future kernels, I have proposed adding /proc/sys/kernel/kexec_disabled, a partner to the existing modules_disabled, that will one-way toggle kexec off. For existing kernels, things got more ugly.

What options do I have for patching a running kernel?

First I looked back at what I’d done in the past with fixing vulnerabilities with systemtap. This ends up being a rather heavy-duty way to go about things, since you need all the distro kernel debug symbols, etc. It does work, but has a significant problem: since it uses kprobes, a root user can just turn off the probes, reverting the changes. So that’s not going to work.

Next I looked at ksplice. The original upstream has gone away, but there is still some work being done by Jiri Slaby. However, even with his updates which fixed various build problems, there were still more, even when building a 3.2 kernel (Ubuntu 12.04 LTS). So that’s out too, which is too bad, since ksplice does exactly what I want: modifies the running kernel’s functions via a module.

So, finally, I decided to just do it by hand, and wrote a friendly kernel rootkit. Instead of dealing with flipping page table permissions on the normally-unwritable kernel code memory, I borrowed from PaX’s KERNEXEC feature, and just turn off write protect checking on the CPU briefly to make the changes. The return values for functions on x86_64 are stored in RAX, so I just need to stuff the kexec_load syscall with “mov -1, %rax; ret” (-1 is EPERM):

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>

static unsigned long long_target;
static char *target;
module_param_named(syscall, long_target, ulong, 0644);
MODULE_PARM_DESC(syscall, "Address of syscall");

/* mov $-1, %rax; ret */
unsigned const char bytes[] = { 0x48, 0xc7, 0xc0, 0xff, 0xff, 0xff, 0xff,
                                0xc3 };
unsigned char *orig;

/* Borrowed from PaX KERNEXEC */
static inline void disable_wp(void)
{
        unsigned long cr0;

        preempt_disable();
        barrier();
        cr0 = read_cr0();
        cr0 &= ~X86_CR0_WP;
        write_cr0(cr0);
}

static inline void enable_wp(void)
{
        unsigned long cr0;

        cr0 = read_cr0();
        cr0 |= X86_CR0_WP;
        write_cr0(cr0);
        barrier();
        preempt_enable_no_resched();
}

static int __init syscall_eperm_init(void)
{
        int i;
        target = (char *)long_target;

        if (target == NULL)
                return -EINVAL;

        /* save original */
        orig = kmalloc(sizeof(bytes), GFP_KERNEL);
        if (!orig)
                return -ENOMEM;
        for (i = 0; i < sizeof(bytes); i++) {
                orig[i] = target[i];
        }

        pr_info("writing %lu bytes at %p\n", sizeof(bytes), target);

        disable_wp();
        for (i = 0; i < sizeof(bytes); i++) {
                target[i] = bytes[i];
        }
        enable_wp();

        return 0;
}
module_init(syscall_eperm_init);

static void __exit syscall_eperm_exit(void)
{
        int i;

        pr_info("restoring %lu bytes at %p\n", sizeof(bytes), target);

        disable_wp();
        for (i = 0; i < sizeof(bytes); i++) {
                target[i] = orig[i];
        }
        enable_wp();

        kfree(orig);
}
module_exit(syscall_eperm_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Kees Cook <kees@outflux.net>");
MODULE_DESCRIPTION("makes target syscall always return EPERM");

If I didn’t want to leave an obvious indication that the kernel had been manipulated, the module could be changed to:

  • not announce what it’s doing
  • remove the exit route to not restore the changes on module unload
  • error out at the end of the init function instead of staying resident

And with this in place, it’s just a matter of loading it with the address of sys_kexec_load (found via /proc/kallsyms) before I disable module loading via modprobe. Here’s my upstart script:

# modules-disable - disable modules after rc scripts are done
#
description "disable loading modules"

start on stopped module-init-tools and stopped rc

task
script
        cd /root/modules/syscall_eperm
        make clean
        make
        insmod ./syscall_eperm.ko \
                syscall=0x$(egrep ' T sys_kexec_load$' /proc/kallsyms | cut -d" " -f1)
        modprobe disable
end script

And now I’m safe from kexec before I have a kernel that contains /proc/sys/kernel/kexec_disabled.

© 2013, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

8/13/2013

TPM providing /dev/hwrng

Filed under: Blogging,Chrome OS,Debian,Security,Ubuntu,Ubuntu-Server — kees @ 9:10 am

A while ago, I added support for the TPM’s pRNG to the rng-tools package in Ubuntu. Since then, Kent Yoder added TPM support directly into the kernel’s /dev/hwrng device. This means there’s no need to carry the patch in rng-tools any more, since I can use /dev/hwrng directly now:

# modprobe tpm-rng
# echo tpm-rng >> /etc/modules
# grep -v ^# /etc/default/rng-tools
RNGDOPTIONS="--fill-watermark=90%"
# service rng-tools restart

And as before, once it’s been running a while (or you send SIGUSR1 to rngd), you can see reporting in syslog:

# pkill -USR1 rngd
# tail -n 15 /var/log/syslog
Aug 13 09:51:01 linux rngd[39114]: stats: bits received from HRNG source: 260064
Aug 13 09:51:01 linux rngd[39114]: stats: bits sent to kernel pool: 216384
Aug 13 09:51:01 linux rngd[39114]: stats: entropy added to kernel pool: 216384
Aug 13 09:51:01 linux rngd[39114]: stats: FIPS 140-2 successes: 13
Aug 13 09:51:01 linux rngd[39114]: stats: FIPS 140-2 failures: 0
Aug 13 09:51:01 linux rngd[39114]: stats: FIPS 140-2(2001-10-10) Monobit: 0
Aug 13 09:51:01 linux rngd[39114]: stats: FIPS 140-2(2001-10-10) Poker: 0
Aug 13 09:51:01 linux rngd[39114]: stats: FIPS 140-2(2001-10-10) Runs: 0
Aug 13 09:51:01 linux rngd[39114]: stats: FIPS 140-2(2001-10-10) Long run: 0
Aug 13 09:51:01 linux rngd[39114]: stats: FIPS 140-2(2001-10-10) Continuous run: 0
Aug 13 09:51:01 linux rngd[39114]: stats: HRNG source speed: (min=10.433; avg=10.442; max=10.454)Kibits/s
Aug 13 09:51:01 linux rngd[39114]: stats: FIPS tests speed: (min=73.360; avg=75.504; max=86.305)Mibits/s
Aug 13 09:51:01 linux rngd[39114]: stats: Lowest ready-buffers level: 2
Aug 13 09:51:01 linux rngd[39114]: stats: Entropy starvations: 0
Aug 13 09:51:01 linux rngd[39114]: stats: Time spent starving for entropy: (min=0; avg=0.000; max=0)us

I’m pondering getting this running in Chrome OS too, but I want to make sure it doesn’t suck too much battery.

© 2013, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

1/21/2013

facedancer built

Filed under: Blogging,Chrome OS,Embedded,Security,Ubuntu,Ubuntu-Server — kees @ 2:39 pm

I finally had the time to put together the facedancer11 that Travis Goodspeed was so kind to give me. I had ordered all the parts some time ago, but had been dreading the careful surface-mount soldering work it was going to require. As it turned out, I’m not half bad at it — everything seems to have worked the first time through. I did, however, fail to order 33ohm 0603 resistors, so I have some temporary ones in use until I can replace them.

My facedancer

This device allows the HOST side computer to drive USB protocol communication at the packet level, with the TARGET seeing a USB device on the other end. No more needing to write careful embedded code while breaking USB stacks: the fake USB device can be controlled with Python.

This means I’m able to start some more serious fuzzing of the USB protocol layer. There is already code for emulating HID (Keyboard), Mass Storage, and now Firmware Updates. There’s probably tons to look at just in that. For some background on the fun to be had just with Mass Storage devices, see Goodspeed’s 23C9 presentation on it.

© 2013, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

11/28/2012

clean module disabling

Filed under: Blogging,Chrome OS,Security,Ubuntu,Ubuntu-Server — kees @ 3:55 pm

I think I found a way to make disabling kernel module loading (via /proc/sys/kernel/modules_disabled) easier for server admins. Right now there’s kind of a weird problem on some distros where reading /etc/modules races with reading /etc/sysctl.{conf,d}. In these cases, you can’t just put “kernel.modules_disabled=1” in the latter since you might not have finished loading modules from /etc/modules.

Before now, on my own systems, I’d added the sysctl call to my /etc/rc.local, which seems like a hack — that file is related to neither sysctl nor modules and both subsystems have their own configuration files, but it does happen absolutely last.

Instead, I’ve now defined “disable” as a modprobe alias via /etc/modprobe.d/disable.conf:

# To disable module loading after boot, "modprobe disable" can be used to
# set the sysctl that controls module loading.
install disable /sbin/sysctl kernel.modules_disabled=1

And then in /etc/modules I can list all the modules I actually need, and then put “disable” on the last line. (Or, if I want to not remember the sysctl path, I can manually run “modprobe disable” to turn off modules at some later point.)

I think it’d be cool this this become an internal alias in upstream kmod.

© 2012, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

11/2/2012

ARM assembly

Filed under: Blogging,Chrome OS,Embedded,Ubuntu,Ubuntu-Server — kees @ 10:01 am

While I’ve been skimming ARM assembly here and there, yesterday I actually had to write some from scratch to hook up seccomp on ARM. I got stumped for a while, and ended up using two references frequently:

The suffix one is pretty interesting because ARM allows for instructions to be conditional, rather than being required to rely on branching, like x86. For example, if you wanted something like this in C:

    if (i == 0)
        i = 1;
    i = i + 1;

In x86 assembly, you’d have a compare followed by a jump to skip the moving of the “1″ value:

    cmp %ecx, $0
    jne 2
    mov %ecx, $1
2:  inc %ecx

In ARM assembly, you can make the move conditional with a suffix (“mov if equal”):

    cmp   r2, #0
    moveq r2, #1
    add   r2, r2, #1

The real thing that stumped me yesterday, though, was the “!” suffix on load/store. Mainly, I didn’t notice it was there until I’d stared at the objdump output and systematically trimmed away all other other code that wasn’t changing the behavior:

    ldr r0, [sp, #OFFSET]
    str r0, [sp, #OFFSET]!

I was reading this as “variable = variable;” and I thought I was going crazy; how could a self-assignment change the code at all? In the second reference above, I found the that the trailing “!” means “(pre)increment the base by the offset”. I was doing a meaningless assignment, but it had the side-effect of pushing the “sp” register forward, and suddenly it all made sense (I needed to unwind the stack). The actual solution I needed was:

    add sp, sp, #S_OFF

Yay for a crash-course in actual ARM assembly. :)

(And yes, I’m aware of x86′s “cmov”, but I just wanted to do a simple illustration. ARM can do conditional calls!)

© 2012, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

10/1/2012

Link restrictions released in Linux 3.6

Filed under: Blogging,Chrome OS,Debian,Security,Ubuntu,Ubuntu-Server — kees @ 12:59 pm

It’s been a very long time coming, but symlink and hardlink restrictions have finally landed in the mainline Linux kernel as of version 3.6. The protection is at least old enough to have a driver’s license in most US states, with some of the first discussions I could find dating from Aug 1996.

While this protection is old (to ancient) news for anyone running Chrome OS, Ubuntu, grsecurity, or OpenWall, I’m extremely excited that is can now benefit everyone running Linux. All the way from cloud monstrosities to cell phones, an entire class of vulnerability just goes away. Thanks to everyone that had a part in developing, testing, reviewing, and encouraging these changes over the years. It’s quite a relief to have it finally done. I hope I never have to include the year in my patch revision serial number again. :)

© 2012, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

5/16/2012

USB AVR fun

At the recent Ubuntu Developer Summit, I managed to convince a few people (after assurances that there would be no permanent damage) to plug a USB stick into their machines so we could watch Xorg crash and wedge their console. What was this evil thing, you ask? It was an AVR microprocessor connected to USB, acting as a USB HID Keyboard, with the product name set to “%n”.

Recently a Chrome OS developer discovered that renaming his Bluetooth Keyboard to “%n” would crash Xorg. The flaw was in the logging stack, triggering glibc to abort the process due to format string protections. At first glance, it looks like this isn’t a big deal since one would have to have already done a Bluetooth pairing with the keyboard, but it would be a problem for any input device, not just Bluetooth. I wanted to see this in action for a “normal” (USB) keyboard.

I borrowed a “Maximus” USB AVR from a friend, and then ultimately bought a Minimus. It will let you put anything you want on the USB bus.

I added a rule for it to udev:

SUBSYSTEM=="usb", ACTION=="add", ATTR{idVendor}=="03eb", ATTR{idProduct}=="*", GROUP="plugdev"

installed the AVR tools:

sudo apt-get install dfu-programmer gcc-avr avr-libc

and pulled down the excellent LUFA USB tree:

git clone git://github.com/abcminiuser/lufa-lib.git

After applying a patch to the LUFA USB keyboard demo, I had my handy USB-AVR-as-Keyboard stick ready to crash Xorg:

-       .VendorID               = 0x03EB,
-       .ProductID              = 0x2042,
+       .VendorID               = 0x045e,
+       .ProductID              = 0x000b,
...
-       .UnicodeString          = L"LUFA Keyboard Demo"
+       .UnicodeString          = L"Keyboard (%n%n%n%n)"

In fact, it was so successfully that after I got the code right and programmed it, Xorg immediately crashed on my development machine. :)

make dfu

After a reboot, I switched it back to programming mode by pressing and holding the “H” button, press/releasing the “R” button, and releasing “H”.

The fix to Xorg is winding its way through upstream, and should land in your distros soon. In the meantime, you can disable your external USB ports, as Marc Deslauriers demonstrated for me:

echo "0" > /sys/bus/usb/devices/usb1/authorized
echo "0" > /sys/bus/usb/devices/usb1/authorized_default

Be careful of shared internal/external ports, and having two buses on one port, etc.

© 2012, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

3/26/2012

keeping your process unprivileged

Filed under: Blogging,Chrome OS,Debian,Security,Ubuntu,Ubuntu-Server — kees @ 1:17 pm

One of the prerequisites for seccomp filter is the new PR_SET_NO_NEW_PRIVS prctl from Andy Lutomirski.

If you’re not interested in digging into creating a seccomp filter for your program, but you know your program should be effectively a “leaf node” in the process tree, you can call PR_SET_NO_NEW_PRIVS (nnp) to make sure that the current process and its children can not gain new privileges (like through running a setuid binary). This produces some fun results, since things like the “ping” tool expect to gain enough privileges to open a raw socket. If you set nnp to “1″, suddenly that can’t happen any more.

Here’s a quick example that sets nnp, and tries to run the command line arguments:

#include <stdio.h>
#include <unistd.h>
#include <sys/prctl.h>
#ifndef PR_SET_NO_NEW_PRIVS
# define PR_SET_NO_NEW_PRIVS 38
#endif

int main(int argc, char * argv[])
{
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
                perror("prctl(NO_NEW_PRIVS)");
                return 1;
        }

        return execvp(argv[1], &argv[1]);
}

When it tries to run ping, the setuid-ness just gets ignored:

$ gcc -Wall nnp.c -o nnp
$ ./nnp ping -c1 localhost
ping: icmp open socket: Operation not permitted

So, if your program has all the privs its going to need, consider using nnp to keep it from being a potential gateway to more trouble. Hopefully we can ship something like this trivial nnp helper as part of coreutils or similar, like nohup, nice, etc.

© 2012, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

3/22/2012

seccomp filter now in Ubuntu

Filed under: Blogging,Chrome OS,Debian,Security,Ubuntu,Ubuntu-Server — kees @ 10:02 pm

With the generous help of the Ubuntu kernel team, Will Drewry’s seccomp filter code has landed in Ubuntu 12.04 LTS in time for Beta 2, and will be in Chrome OS shortly. Hopefully this will be in upstream soon, and filter (pun intended) to the rest of the distributions quickly.

One of the questions I’ve been asked by several people while they developed policy for earlier “mode 2″ seccomp implementations was “How do I figure out which syscalls my program is going to need?” To help answer this question, and to show a simple use of seccomp filter, I’ve written up a little tutorial that walks through several steps of building a seccomp filter. It includes a header file (“seccomp-bpf.h“) for implementing the filter, and a collection of other files used to assist in syscall discovery. It should be portable, so it can build even on systems that do not have seccomp available yet.

Read more in the seccomp filter tutorial. Enjoy!

© 2012, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

2/15/2012

discard, hole-punching, and TRIM

Filed under: Chrome OS,Debian,Ubuntu,Ubuntu-Server — kees @ 1:19 pm

Under Linux, there are a number of related features around marking areas of a file, filesystem, or block device as “no longer allocated”. In the standard view, here’s what happens if you fill a file to 500M and then truncate it to 100M, using the “truncate” syscall:

  1. create the empty file, filesystem allocates an inode, writes accounting details to block device.
  2. write data to file, filesystem allocates and fills data blocks, writes blocks to block device.
  3. truncate the file to a smaller size, filesystem updates accounting details and releases blocks, writes accounting details to block device.

The important thing to note here is that in step 3 the block device has no idea about the released data blocks. The original contents of the file are actually still on the device. (And to a certain extent is why programs like shred exist.) While the recoverability of such released data is a whole other issue, the main problem about this lack of information for the block device is that some devices (like SSDs) could use this information to their benefit to help with extending their life, etc. To support this, the “TRIM” set of commands were created so that a block device could be informed when blocks were released. Under Linux, this is handled by the block device driver, and what the filesystem can pass down is “discard” intent, which is translated into the needed TRIM commands.

So now, when discard notification is enabled for a filesystem (e.g. mount option “discard” for ext4), the earlier example looks like this:

  1. create the empty file, filesystem allocates an inode, writes accounting details to block device.
  2. write data to file, filesystem allocates and fills data blocks, writes blocks to block device.
  3. truncate the file to a smaller size, filesystem updates accounting details and releases blocks, writes accounting details and sends discard intent to block device.

While SSDs can use discard to do fancy SSD things, there’s another great use for discard, which is to restore sparseness to files. Normally, if you create a sparse file (open, seek to size, close), there was no way, after writing data to this file, to “punch a hole” back into it. The best that could be done was to just write zeros over the area, but that took up filesystem space. So, the ability to punch holes in files was added via the FALLOC_FL_PUNCH_HOLE option of fallocate. And when discard was enabled for a filesystem, these punched holes would get passed down to the block device as well.

Take, for example, a qemu/KVM VM running on a disk image that was built from a sparse file. While inside the VM instance, the disk appears to be 10G. Externally, it might only have actually allocated 600M, since those are the only blocks that had been allocated so far. In the instance, if you wrote 8G worth of temporary data, and then deleted it, the underlying sparse file would have ballooned by 8G and stayed ballooned. With discard and hole punching, it’s now possible for the filesystem in the VM to issue discards to the block driver, and then qemu could issue hole-punching requests to the sparse file backing the image, and all of that 8G would get freed again. The only down side is that each layer needs to correctly translate the requests into what the next layer needs.

With Linux 3.1, dm-crypt supports passing discards from the filesystem above down to the block device under it (though this has cryptographic risks, so it is disabled by default). With Linux 3.2, the loopback block driver supports receiving discards and passing them down as hole-punches. That means that a stack like this works now: ext4, on dm-crypt, on loopback of a sparse file, on ext4, on SSD. If a file is deleted at the top, it’ll pass all the way down, discarding allocated blocks all the way to the SSD:

Set up a sparse backing file, loopback mount it, and create a dm-crypt device (with “allow_discards”) on it:

# cd /root
# truncate -s10G test.block
# ls -lk test.block 
-rw-r--r-- 1 root root 10485760 Feb 15 12:36 test.block
# du -sk test.block
0       test.block
# DEV=$(losetup -f --show /root/test.block)
# echo $DEV
/dev/loop0
# SIZE=$(blockdev --getsz $DEV)
# echo $SIZE
20971520
# KEY=$(echo -n "my secret passphrase" | sha256sum | awk '{print $1}')
# echo $KEY
a7e845b0854294da9aa743b807cb67b19647c1195ea8120369f3d12c70468f29
# dmsetup create testenc --table "0 $SIZE crypt aes-cbc-essiv:sha256 $KEY 0 $DEV 0 1 allow_discards"

Now build an ext4 filesystem on it. This enables discard during mkfs, and disables lazy initialization so we can see the final size of the used space on the backing file without waiting for the background initialization at mount-time to finish, and mount it with the “discard” option:

# mkfs.ext4 -E discard,lazy_itable_init=0,lazy_journal_init=0 /dev/mapper/testenc
mke2fs 1.42-WIP (16-Oct-2011)
Discarding device blocks: done                            
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
655360 inodes, 2621440 blocks
131072 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2684354560
80 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done 

# mount -o discard /dev/mapper/testenc /mnt
# sync; du -sk test.block
297708  test.block

Now, we create a 200M file, examine the backing file allocation, remove it, and compare the results:

# dd if=/dev/zero of=/mnt/blob bs=1M count=200
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 9.92789 s, 21.1 MB/s
# sync; du -sk test.block
502524  test.block
# rm /mnt/blob
# sync; du -sk test.block
297720  test.block

Nearly all the space was reclaimed after the file was deleted. Yay!

Note that the Linux tmpfs filesystem does not yet support hole punching, so the exampe above wouldn’t work if you tried it in a tmpfs-backed filesystem (e.g. /tmp on many systems).

© 2012, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

2/10/2012

kvm and product_uuid

Filed under: Chrome OS,Debian,Ubuntu,Ubuntu-Server — kees @ 10:08 am

While looking for something to use as a system-unique fall-back when a TPM is not available, I looked at /sys/devices/virtual/dmi/id/product_uuid (same as dmidecode‘s “System Information / UUID”), but was disappointed when, under KVM, the file was missing (and running dmidecode crashes KVM *cough*). However, after a quick check, I noticed that KVM supports the “-uuid” option to set the value of /sys/devices/virtual/dmi/id/product_uuid. Looks like libvirt supports this under capabilities / host / uuid in the XML, too.

host# kvm -uuid 12345678-ABCD-1234-ABCD-1234567890AB ...
host# ssh localhost ...
...
guest# cat /sys/devices/virtual/dmi/id/product_uuid 
12345678-ABCD-1234-ABCD-1234567890AB

© 2012, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

2/6/2012

use of ptrace

Filed under: Blogging,Chrome OS,Security,Ubuntu,Ubuntu-Server — kees @ 4:48 pm

As I discussed last year, Ubuntu has been restricting the use of ptrace for a few releases now. I’m excited to see Fedora starting to introduce similar restrictions, but I’m disappointed at the specific implementation:

  • A method for doing this already exists (Yama). Yama is not plumbed into SELinux, but I would argue that’s not needed.
  • The SELinux method depends, unsurprisingly, on an active SELinux policy on the system, which isn’t everyone.
  • It’s not possible for regular developers (not system developers) to debug their own processes.
  • It will break all ptrace-based crash handlers (e.g. KDE, Firefox, Chrome) or tools that depend on ptrace to do their regular job (e.g. Wine, gdb, strace, ltrace).

Blocking ptrace blocks exactly one type of attack: credential extraction from a running process. In the face of a persistent attack, ultimately, anything running as the user can be trojaned, regardless of ptrace. Blocking ptrace, however, stalls the initial attack. At the moment an attacker arrives on a system, they cannot immediately extend their reach by examining the other processes (e.g. jumping down existing SSH connections, pulling passwords out of Firefox, etc). Some sensitive processes are already protected from this kind of thing because they are not “dumpable” (due to either specifically requesting this from prctl(PR_SET_DUMPABLE, ...) or due to a uid/gid transition), but many are open for abuse.

The primary “valid” use cases for ptrace are crash handlers, debuggers, and memory analysis tools. In each case, they have a single common element: the process being ptraced knows which process should have permission to attach to it. What Linux lacked was a way to declare these relationships, which is what Yama added. The use of SELinux policy, for example, isn’t sufficient because the permissions are too wide (e.g. giving gdb the ability to ptrace anything just means the attacker has to use gdb to do the job). Right now, due to the use of Yama in Ubuntu, all the mentioned tools have the awareness of how to programmatically declare the ptrace relationships at runtime with prctl(PR_SET_PTRACER, ...). I find it disappointing that Fedora won’t be using this to their advantage when it is available and well tested.

Even ChromeOS uses Yama now. ;)

© 2012, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

Powered by WordPress