codeblog code is freedom — patching my itch


compiler hardening in Ubuntu and Debian

Filed under: Blogging,Debian,Security,Ubuntu,Ubuntu-Server — kees @ 8:42 am

Back in 2006, the compiler in Ubuntu was patched to enable most build-time security-hardening features (relro, stack protector, fortify source). I wasn’t able to convince Debian to do the same, so Debian went the route of other distributions, adding security hardening flags during package builds only. I remain disappointed in this approach, because it means that someone who builds software without using the packaging tools on a non-Ubuntu system won’t get those hardening features. Think of a sysadmin trying the latest nginx, or a vendor like Valve building games for distribution. On Ubuntu, when you do that “./configure && make” you’ll get the features automatically.

Debian, at the time, didn’t have a good way forward even for package builds since it lacked a concept of “global package build flags”. Happily, a solution (via dh) was developed about 2 years ago, and Debian package maintainers have been working to adopt it ever since.

So, while I don’t think any distro can match Ubuntu’s method of security hardening compiler defaults, it is valuable to see the results of global package build flags in Debian on the package archive. I’ve had an on-going graph of the state of build hardening on both Ubuntu and Debian for a while, but only recently did I put together a comparison of a default install. Very few people have all the packages in the archive installed, so it’s a bit silly to only look at the archive statistics. But let’s start there, just to describe what’s being measured.

Here’s today’s snapshot of Ubuntu’s development archive for the past year (you can see development “opening” after a release every 6 months with an influx of new packages):

Here’s today’s snapshot of Debian’s unstable archive for the past year (at the start of May you can see the archive “unfreezing” after the Wheezy release; the gaps were my analysis tool failing):

Ubuntu’s lines are relatively flat because everything that can be built with hardening already is. Debian’s graph is on a slow upward trend as more packages get migrated to dh to gain knowledge of the global flags.

Each line in the graphs represents the count of source packages that contain binary packages that have at least 1 “hit” for a given category. “ELF” is just that: a source package that ultimately produces at least 1 binary package with at least 1 ELF binary in it (i.e. produces a compiled output). The “Read-only Relocations” (“relro”) hardening feature is almost always done for an ELF, excepting uncommon situations. As a result, the count of ELF and relro are close on Ubuntu. In fact, examining relro is a good indication of whether or not a source package got built with hardening of any kind. So, in Ubuntu, 91.5% of the archive is built with hardening, with Debian at 55.2%.

The “stack protector” and “fortify source” features depend on characteristics of the source itself, and may not always be present in package’s binaries even when hardening is enabled for the build (e.g. no functions got selected for stack protection, or no fortified glibc functions were used). Really these lines mostly indicate the count of packages that have a sufficiently high level of complexity that would trigger such protections.

The “PIE” and “immediate binding” (“bind_now”) features are specifically enabled by a package maintainer. PIE can have a noticeable performance impact on CPU-register-starved architectures like i386 (ia32), so it is neither patched on in Ubuntu, nor part of the default flags in Debian. (And bind_now doesn’t make much sense without PIE, so they usually go together.) It’s worth noting, however, that it probably should be the default on amd64 (x86_64), which has plenty of available registers.

Here is a comparison of default installed packages between the most recent stable releases of Ubuntu (13.10) and Debian (Wheezy). It’s clear that what the average user gets with a default fresh install is better than what the archive-to-archive comparison shows. Debian’s showing is better (74% built with hardening), though it is still clearly lagging behind Ubuntu (99%):

© 2014, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License



Filed under: Blogging,Chrome OS,Debian,Security,Ubuntu,Ubuntu-Server — kees @ 2:28 pm

There will be a new option in gcc 4.9 named “-fstack-protector-strong“, which offers an improved version of “-fstack-protector” without going all the way to “-fstack-protector-all“. The stack protector feature itself adds a known canary to the stack during function preamble, and checks it when the function returns. If it changed, there was a stack overflow, and the program aborts. This is fine, but figuring out when to include it is the reason behind the various options.

Since traditionally stack overflows happen with string-based manipulations, the default (-fstack-protector), only includes the canary code when a function defines an 8 (--param=ssp-buffer-size=N, N=8 by default) or more byte local character array. This means just a few functions get the checking, but they’re probably the most likely to need it, so it’s an okay balance. Various distributions ended up lowering their default --param=ssp-buffer-size option down to 4, since there were still cases of functions that should have been protected but the conservative gcc upstream default of 8 wasn’t covering them.

However, even with the increased function coverage, there are rare cases when a stack overflow happens on other kinds of stack variables. To handle this more paranoid concern, -fstack-protector-all was defined to add the canary to all functions. This results in substantial use of stack space for saving the canary on deep stack users, and measurable (though surprisingly still relatively low) performance hit due to all the saving/checking. For a long time, Chrome OS used this, since we’re paranoid. :)

In the interest of gaining back some of the lost performance and not hitting our Chrome OS build images with such a giant stack-protector hammer, Han Shen from the Chrome OS compiler team created the new option -fstack-protector-strong, which enables the canary in many more conditions:

  • local variable’s address used as part of the right hand side of an assignment or function argument
  • local variable is an array (or union containing an array), regardless of array type or length
  • uses register local variables

This meant we were covering all the more paranoid conditions that might lead to a stack overflow. Chrome OS has been using this option instead of -fstack-protector-all for about 10 months now.

As a quick demonstration of the options, you can see this example program under various conditions. It tries to show off an example of shoving serialized data into a non-character variable, like might happen in some network address manipulations or streaming data parsing. Since I’m using memcpy here for clarity, the builds will need to turn off FORTIFY_SOURCE, which would also notice the overflow.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct no_chars {
    unsigned int len;
    unsigned int data;

int main(int argc, char * argv[])
    struct no_chars info = { };

    if (argc < 3) {
        fprintf(stderr, "Usage: %s LENGTH DATA...\n", argv[0]);
        return 1;

    info.len = atoi(argv[1]);
    memcpy(&, argv[2], info.len);

    return 0;

Built with everything disabled, this faults trying to return to an invalid VMA:

    $ gcc -Wall -O2 -U_FORTIFY_SOURCE -fno-stack-protector /tmp/boom.c -o /tmp/boom
    Segmentation fault (core dumped)

Built with FORTIFY_SOURCE enabled, we see the expected catch of the overflow in memcpy:

    $ gcc -Wall -O2 -D_FORTIFY_SOURCE=2 -fno-stack-protector /tmp/boom.c -o /tmp/boom
    *** buffer overflow detected ***: /tmp/boom terminated

So, we’ll leave FORTIFY_SOURCE disabled for our comparisons. With pre-4.9 gcc, we can see that -fstack-protector does not get triggered to protect this function:

    $ gcc -Wall -O2 -U_FORTIFY_SOURCE -fstack-protector /tmp/boom.c -o /tmp/boom
    Segmentation fault (core dumped)

However, using -fstack-protector-all does trigger the protection, as expected:

    $ gcc -Wall -O2 -U_FORTIFY_SOURCE -fstack-protector-all /tmp/boom.c -o /tmp/boom
    *** stack smashing detected ***: /tmp/boom terminated
    Aborted (core dumped)

And finally, using the gcc snapshot of 4.9, here is -fstack-protector-strong doing its job:

    $ /usr/lib/gcc-snapshot/bin/gcc -Wall -O2 -U_FORTIFY_SOURCE -fstack-protector-strong /tmp/boom.c -o /tmp/boom
    *** stack smashing detected ***: /tmp/boom terminated
    Aborted (core dumped)

For Linux 3.14, I’ve added support for -fstack-protector-strong via the new CONFIG_CC_STACKPROTECTOR_STRONG option. The old CONFIG_CC_STACKPROTECTOR will be available as CONFIG_CC_STACKPROTECTOR_REGULAR. When comparing the results on builds via size and objdump -d analysis, here’s what I found with gcc 4.9:

A normal x86_64 “defconfig” build, without stack protector had a kernel text size of 11430641 bytes with 36110 function bodies. Adding CONFIG_CC_STACKPROTECTOR_REGULAR increased the kernel text size to 11468490 (a +0.33% change), with 1015 of 36110 functions stack-protected (2.81%). Using CONFIG_CC_STACKPROTECTOR_STRONG increased the kernel text size to 11692790 (+2.24%), with 7401 of 36110 functions stack-protected (20.5%). And 20% is a far-cry from 100% if support for -fstack-protector-all was added back to the kernel.

The next bit of work will be figuring out the best way to detect the version of gcc in use when doing Debian package builds, and using -fstack-protector-strong instead of -fstack-protector. For Ubuntu, it’s much simpler because it’ll just be the compiler default.

© 2014, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License


DOM scraping

Filed under: Blogging,Debian,General,Ubuntu,Ubuntu-Server,Web — kees @ 11:16 pm

For a long time now I’ve used mechanize (via either Perl or Python) for doing website interaction automation. Stuff like playing web games, checking the weather, or reviewing my balance at the bank. However, as the use of javascript continues to increase, it’s getting harder and harder to screen-scrape without actually processing DOM events. To do that, really only browsers are doing the right thing, so getting attached to an actual browser DOM is generally the only way to do any kind of web interaction automation.

It seems the thing furthest along this path is Selenium. Initially, I spent some time trying to make it work with Firefox, but gave up. Instead, this seems to work nicely with Chrome via the Chrome WebDriver. And even better, all of this works out of the box on Ubuntu 13.10 via python-selenium and chromium-chromedriver.

Running /usr/lib/chromium-browser/chromedriver2_server from chromium-chromedriver starts a network listener on port 9515. This is the WebDriver API that Selenium can talk to. When requests are made, chromedriver2_server spawns Chrome, and all the interactions happen against that browser.

Since I prefer Python, I avoided the Java interfaces and focused on the Python bindings:

#!/usr/bin/env python
import sys
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys

caps = webdriver.DesiredCapabilities.CHROME

browser = webdriver.Remote("http://localhost:9515", caps)

assert "My Bank" in browser.title

    elem = browser.find_element_by_name("userid")

    elem = browser.find_element_by_name("password")
    elem.send_keys("wheee my password" + Keys.RETURN)
except NoSuchElementException:
    print "Could not find login elements"

assert "Account Balances" in browser.title

xpath = "//div[text()='Balance']/../../td[2]/div[contains(text(),'$')]"
balance = browser.find_element_by_xpath(xpath).text

print balance


This would work pretty great, but if you need to save any state between sessions, you’ll want to be able to change where Chrome stores data (since by default in this configuration, it uses an empty temporary directory via --user-data-dir=). Happily, various things about the browser environment can be controlled, including the command line arguments. This is configurable by expanding the “desired capabilities” variable:

caps = webdriver.DesiredCapabilities.CHROME
caps["chromeOptions"] = {
        "args": ["--user-data-dir=/home/user/somewhere/to/store/your/session"],

A great thing about this is that you get to actually watch the browser do its work. However, in cases where this interaction is going to be fully automated, you likely won’t have a Xorg session running, so you’ll need to wrap the WebDriver in one (since it launches Chrome). I used Xvfb for this:

# Start WebDriver under fake X and wait for it to be listening
xvfb-run /usr/lib/chromium-browser/chromedriver2_server &
while ! nc -q0 -w0 localhost 9515; do
    sleep 1


# Shut down WebDriver
kill $pid

exit $rc

Alternatively, all of this could be done in the python script too, but I figured it’s easier to keep the support infrastructure separate from the actual test script itself. I actually leave the xvfb-run call external too, so it’s easier to debug the browser in my own X session.

One bug I encountered was that the WebDriver’s cache of the browser’s DOM can sometimes get out of sync with the actual browser’s DOM. I didn’t find a solution to this, but managed to work around it. I’m hoping later versions fix this. :)

© 2013, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License


live patching the kernel

Filed under: Blogging,Chrome OS,Debian,Security,Ubuntu,Ubuntu-Server — kees @ 3:40 pm

A nice set of recent posts have done a great job detailing the remaining ways that a root user can get at kernel memory. Part of this is driven by the ideas behind UEFI Secure Boot, but they come from the same goal: making sure that the root user cannot directly subvert the running kernel. My perspective on this is toward making sure that an attacker who has gained access and then gained root privileges can’t continue to elevate their access and install invisible kernel rootkits.

An outline for possible attack vectors is spelled out by Matthew Gerrett’s continuing “useful kernel lockdown” patch series. The set of attacks was examined by Tyler Borland in “Bypassing modules_disabled security”. His post describes each vector in detail, and he ultimately chooses MSR writing as the way to write kernel memory (and shows an example of how to re-enable module loading). One thing not mentioned is that many distros have MSR access as a module, and it’s rarely loaded. If modules_disabled is already set, an attacker won’t be able to load the MSR module to begin with. However, the other general-purpose vector, kexec, is still available. To prove out this method, Matthew wrote a proof-of-concept for changing kernel memory via kexec.

Chrome OS is several steps ahead here, since it has hibernation disabled, MSR writing disabled, kexec disabled, modules verified, root filesystem read-only and verified, kernel verified, and firmware verified. But since not all my machines are Chrome OS, I wanted to look at some additional protections against kexec on general-purpose distro kernels that have CONFIG_KEXEC enabled, especially those without UEFI Secure Boot and Matthew’s lockdown patch series.

My goal was to disable kexec without needing to rebuild my entire kernel. For future kernels, I have proposed adding /proc/sys/kernel/kexec_disabled, a partner to the existing modules_disabled, that will one-way toggle kexec off. For existing kernels, things got more ugly.

What options do I have for patching a running kernel?

First I looked back at what I’d done in the past with fixing vulnerabilities with systemtap. This ends up being a rather heavy-duty way to go about things, since you need all the distro kernel debug symbols, etc. It does work, but has a significant problem: since it uses kprobes, a root user can just turn off the probes, reverting the changes. So that’s not going to work.

Next I looked at ksplice. The original upstream has gone away, but there is still some work being done by Jiri Slaby. However, even with his updates which fixed various build problems, there were still more, even when building a 3.2 kernel (Ubuntu 12.04 LTS). So that’s out too, which is too bad, since ksplice does exactly what I want: modifies the running kernel’s functions via a module.

So, finally, I decided to just do it by hand, and wrote a friendly kernel rootkit. Instead of dealing with flipping page table permissions on the normally-unwritable kernel code memory, I borrowed from PaX’s KERNEXEC feature, and just turn off write protect checking on the CPU briefly to make the changes. The return values for functions on x86_64 are stored in RAX, so I just need to stuff the kexec_load syscall with “mov -1, %rax; ret” (-1 is EPERM):

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>

static unsigned long long_target;
static char *target;
module_param_named(syscall, long_target, ulong, 0644);
MODULE_PARM_DESC(syscall, "Address of syscall");

/* mov $-1, %rax; ret */
unsigned const char bytes[] = { 0x48, 0xc7, 0xc0, 0xff, 0xff, 0xff, 0xff,
                                0xc3 };
unsigned char *orig;

/* Borrowed from PaX KERNEXEC */
static inline void disable_wp(void)
        unsigned long cr0;

        cr0 = read_cr0();
        cr0 &= ~X86_CR0_WP;

static inline void enable_wp(void)
        unsigned long cr0;

        cr0 = read_cr0();
        cr0 |= X86_CR0_WP;

static int __init syscall_eperm_init(void)
        int i;
        target = (char *)long_target;

        if (target == NULL)
                return -EINVAL;

        /* save original */
        orig = kmalloc(sizeof(bytes), GFP_KERNEL);
        if (!orig)
                return -ENOMEM;
        for (i = 0; i < sizeof(bytes); i++) {
                orig[i] = target[i];

        pr_info("writing %lu bytes at %p\n", sizeof(bytes), target);

        for (i = 0; i < sizeof(bytes); i++) {
                target[i] = bytes[i];

        return 0;

static void __exit syscall_eperm_exit(void)
        int i;

        pr_info("restoring %lu bytes at %p\n", sizeof(bytes), target);

        for (i = 0; i < sizeof(bytes); i++) {
                target[i] = orig[i];


MODULE_AUTHOR("Kees Cook <>");
MODULE_DESCRIPTION("makes target syscall always return EPERM");

If I didn’t want to leave an obvious indication that the kernel had been manipulated, the module could be changed to:

  • not announce what it’s doing
  • remove the exit route to not restore the changes on module unload
  • error out at the end of the init function instead of staying resident

And with this in place, it’s just a matter of loading it with the address of sys_kexec_load (found via /proc/kallsyms) before I disable module loading via modprobe. Here’s my upstart script:

# modules-disable - disable modules after rc scripts are done
description "disable loading modules"

start on stopped module-init-tools and stopped rc

        cd /root/modules/syscall_eperm
        make clean
        insmod ./syscall_eperm.ko \
                syscall=0x$(egrep ' T sys_kexec_load$' /proc/kallsyms | cut -d" " -f1)
        modprobe disable
end script

And now I’m safe from kexec before I have a kernel that contains /proc/sys/kernel/kexec_disabled.

© 2013, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License


Thanks UPS

Filed under: Blogging,Debian,Networking,Ubuntu,Ubuntu-Server — kees @ 8:25 pm

My UPS has decided that every two weeks when it performs a self-test that my 116V mains power isn’t good enough, so it drains the battery and shuts down my home network. Only took a month and a half for me to see on the network graphs that my outages were, to the minute, 2 weeks apart. :)

APC Monitoring

In theory, reducing the sensitivity will fix this…

© 2013, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License


TPM providing /dev/hwrng

Filed under: Blogging,Chrome OS,Debian,Security,Ubuntu,Ubuntu-Server — kees @ 9:10 am

A while ago, I added support for the TPM’s pRNG to the rng-tools package in Ubuntu. Since then, Kent Yoder added TPM support directly into the kernel’s /dev/hwrng device. This means there’s no need to carry the patch in rng-tools any more, since I can use /dev/hwrng directly now:

# modprobe tpm-rng
# echo tpm-rng >> /etc/modules
# grep -v ^# /etc/default/rng-tools
# service rng-tools restart

And as before, once it’s been running a while (or you send SIGUSR1 to rngd), you can see reporting in syslog:

# pkill -USR1 rngd
# tail -n 15 /var/log/syslog
Aug 13 09:51:01 linux rngd[39114]: stats: bits received from HRNG source: 260064
Aug 13 09:51:01 linux rngd[39114]: stats: bits sent to kernel pool: 216384
Aug 13 09:51:01 linux rngd[39114]: stats: entropy added to kernel pool: 216384
Aug 13 09:51:01 linux rngd[39114]: stats: FIPS 140-2 successes: 13
Aug 13 09:51:01 linux rngd[39114]: stats: FIPS 140-2 failures: 0
Aug 13 09:51:01 linux rngd[39114]: stats: FIPS 140-2(2001-10-10) Monobit: 0
Aug 13 09:51:01 linux rngd[39114]: stats: FIPS 140-2(2001-10-10) Poker: 0
Aug 13 09:51:01 linux rngd[39114]: stats: FIPS 140-2(2001-10-10) Runs: 0
Aug 13 09:51:01 linux rngd[39114]: stats: FIPS 140-2(2001-10-10) Long run: 0
Aug 13 09:51:01 linux rngd[39114]: stats: FIPS 140-2(2001-10-10) Continuous run: 0
Aug 13 09:51:01 linux rngd[39114]: stats: HRNG source speed: (min=10.433; avg=10.442; max=10.454)Kibits/s
Aug 13 09:51:01 linux rngd[39114]: stats: FIPS tests speed: (min=73.360; avg=75.504; max=86.305)Mibits/s
Aug 13 09:51:01 linux rngd[39114]: stats: Lowest ready-buffers level: 2
Aug 13 09:51:01 linux rngd[39114]: stats: Entropy starvations: 0
Aug 13 09:51:01 linux rngd[39114]: stats: Time spent starving for entropy: (min=0; avg=0.000; max=0)us

I’m pondering getting this running in Chrome OS too, but I want to make sure it doesn’t suck too much battery.

© 2013, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License


Hardy is end of life

Filed under: Blogging,Ubuntu,Ubuntu-Server — kees @ 12:53 pm

Well, the second Ubuntu Long Term Support release, 8.04 Hardy, has reached end-of-life. (Along with 11.10 Oneiric and the Desktop Support for the 10.04 LTS Lucid.) Flushing my package mirror of Hardy and Oneiric was pretty dramatic, freeing up about 142GB worth of space.


$ df -h /var/cache/mirrors/
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/sysvg-debmirrorlv  753G  692G   62G  92% /var/cache/mirror


$ df -h /var/cache/mirrors/
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/sysvg-debmirrorlv  753G  550G  204G  73% /var/cache/mirror

If only online filesize resize shrinking worked. :)

© 2013, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License


facedancer built

Filed under: Blogging,Chrome OS,Embedded,Security,Ubuntu,Ubuntu-Server — kees @ 2:39 pm

I finally had the time to put together the facedancer11 that Travis Goodspeed was so kind to give me. I had ordered all the parts some time ago, but had been dreading the careful surface-mount soldering work it was going to require. As it turned out, I’m not half bad at it — everything seems to have worked the first time through. I did, however, fail to order 33ohm 0603 resistors, so I have some temporary ones in use until I can replace them.

My facedancer

This device allows the HOST side computer to drive USB protocol communication at the packet level, with the TARGET seeing a USB device on the other end. No more needing to write careful embedded code while breaking USB stacks: the fake USB device can be controlled with Python.

This means I’m able to start some more serious fuzzing of the USB protocol layer. There is already code for emulating HID (Keyboard), Mass Storage, and now Firmware Updates. There’s probably tons to look at just in that. For some background on the fun to be had just with Mass Storage devices, see Goodspeed’s 23C9 presentation on it.

© 2013, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License


clean module disabling

Filed under: Blogging,Chrome OS,Security,Ubuntu,Ubuntu-Server — kees @ 3:55 pm

I think I found a way to make disabling kernel module loading (via /proc/sys/kernel/modules_disabled) easier for server admins. Right now there’s kind of a weird problem on some distros where reading /etc/modules races with reading /etc/sysctl.{conf,d}. In these cases, you can’t just put “kernel.modules_disabled=1” in the latter since you might not have finished loading modules from /etc/modules.

Before now, on my own systems, I’d added the sysctl call to my /etc/rc.local, which seems like a hack — that file is related to neither sysctl nor modules and both subsystems have their own configuration files, but it does happen absolutely last.

Instead, I’ve now defined “disable” as a modprobe alias via /etc/modprobe.d/disable.conf:

# To disable module loading after boot, "modprobe disable" can be used to
# set the sysctl that controls module loading.
install disable /sbin/sysctl kernel.modules_disabled=1

And then in /etc/modules I can list all the modules I actually need, and then put “disable” on the last line. (Or, if I want to not remember the sysctl path, I can manually run “modprobe disable” to turn off modules at some later point.)

I think it’d be cool this this become an internal alias in upstream kmod.

© 2012, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License


ARM assembly

Filed under: Blogging,Chrome OS,Embedded,Ubuntu,Ubuntu-Server — kees @ 10:01 am

While I’ve been skimming ARM assembly here and there, yesterday I actually had to write some from scratch to hook up seccomp on ARM. I got stumped for a while, and ended up using two references frequently:

The suffix one is pretty interesting because ARM allows for instructions to be conditional, rather than being required to rely on branching, like x86. For example, if you wanted something like this in C:

    if (i == 0)
        i = 1;
    i = i + 1;

In x86 assembly, you’d have a compare followed by a jump to skip the moving of the “1″ value:

    cmp %ecx, $0
    jne 2
    mov %ecx, $1
2:  inc %ecx

In ARM assembly, you can make the move conditional with a suffix (“mov if equal”):

    cmp   r2, #0
    moveq r2, #1
    add   r2, r2, #1

The real thing that stumped me yesterday, though, was the “!” suffix on load/store. Mainly, I didn’t notice it was there until I’d stared at the objdump output and systematically trimmed away all other other code that wasn’t changing the behavior:

    ldr r0, [sp, #OFFSET]
    str r0, [sp, #OFFSET]!

I was reading this as “variable = variable;” and I thought I was going crazy; how could a self-assignment change the code at all? In the second reference above, I found the that the trailing “!” means “(pre)increment the base by the offset”. I was doing a meaningless assignment, but it had the side-effect of pushing the “sp” register forward, and suddenly it all made sense (I needed to unwind the stack). The actual solution I needed was:

    add sp, sp, #S_OFF

Yay for a crash-course in actual ARM assembly. :)

(And yes, I’m aware of x86′s “cmov”, but I just wanted to do a simple illustration. ARM can do conditional calls!)

© 2012, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License


Link restrictions released in Linux 3.6

Filed under: Blogging,Chrome OS,Debian,Security,Ubuntu,Ubuntu-Server — kees @ 12:59 pm

It’s been a very long time coming, but symlink and hardlink restrictions have finally landed in the mainline Linux kernel as of version 3.6. The protection is at least old enough to have a driver’s license in most US states, with some of the first discussions I could find dating from Aug 1996.

While this protection is old (to ancient) news for anyone running Chrome OS, Ubuntu, grsecurity, or OpenWall, I’m extremely excited that is can now benefit everyone running Linux. All the way from cloud monstrosities to cell phones, an entire class of vulnerability just goes away. Thanks to everyone that had a part in developing, testing, reviewing, and encouraging these changes over the years. It’s quite a relief to have it finally done. I hope I never have to include the year in my patch revision serial number again. :)

© 2012, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License



At the recent Ubuntu Developer Summit, I managed to convince a few people (after assurances that there would be no permanent damage) to plug a USB stick into their machines so we could watch Xorg crash and wedge their console. What was this evil thing, you ask? It was an AVR microprocessor connected to USB, acting as a USB HID Keyboard, with the product name set to “%n”.

Recently a Chrome OS developer discovered that renaming his Bluetooth Keyboard to “%n” would crash Xorg. The flaw was in the logging stack, triggering glibc to abort the process due to format string protections. At first glance, it looks like this isn’t a big deal since one would have to have already done a Bluetooth pairing with the keyboard, but it would be a problem for any input device, not just Bluetooth. I wanted to see this in action for a “normal” (USB) keyboard.

I borrowed a “Maximus” USB AVR from a friend, and then ultimately bought a Minimus. It will let you put anything you want on the USB bus.

I added a rule for it to udev:

SUBSYSTEM=="usb", ACTION=="add", ATTR{idVendor}=="03eb", ATTR{idProduct}=="*", GROUP="plugdev"

installed the AVR tools:

sudo apt-get install dfu-programmer gcc-avr avr-libc

and pulled down the excellent LUFA USB tree:

git clone git://

After applying a patch to the LUFA USB keyboard demo, I had my handy USB-AVR-as-Keyboard stick ready to crash Xorg:

-       .VendorID               = 0x03EB,
-       .ProductID              = 0x2042,
+       .VendorID               = 0x045e,
+       .ProductID              = 0x000b,
-       .UnicodeString          = L"LUFA Keyboard Demo"
+       .UnicodeString          = L"Keyboard (%n%n%n%n)"

In fact, it was so successfully that after I got the code right and programmed it, Xorg immediately crashed on my development machine. :)

make dfu

After a reboot, I switched it back to programming mode by pressing and holding the “H” button, press/releasing the “R” button, and releasing “H”.

The fix to Xorg is winding its way through upstream, and should land in your distros soon. In the meantime, you can disable your external USB ports, as Marc Deslauriers demonstrated for me:

echo "0" > /sys/bus/usb/devices/usb1/authorized
echo "0" > /sys/bus/usb/devices/usb1/authorized_default

Be careful of shared internal/external ports, and having two buses on one port, etc.

© 2012, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License


keeping your process unprivileged

Filed under: Blogging,Chrome OS,Debian,Security,Ubuntu,Ubuntu-Server — kees @ 1:17 pm

One of the prerequisites for seccomp filter is the new PR_SET_NO_NEW_PRIVS prctl from Andy Lutomirski.

If you’re not interested in digging into creating a seccomp filter for your program, but you know your program should be effectively a “leaf node” in the process tree, you can call PR_SET_NO_NEW_PRIVS (nnp) to make sure that the current process and its children can not gain new privileges (like through running a setuid binary). This produces some fun results, since things like the “ping” tool expect to gain enough privileges to open a raw socket. If you set nnp to “1″, suddenly that can’t happen any more.

Here’s a quick example that sets nnp, and tries to run the command line arguments:

#include <stdio.h>
#include <unistd.h>
#include <sys/prctl.h>
# define PR_SET_NO_NEW_PRIVS 38

int main(int argc, char * argv[])
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
                return 1;

        return execvp(argv[1], &argv[1]);

When it tries to run ping, the setuid-ness just gets ignored:

$ gcc -Wall nnp.c -o nnp
$ ./nnp ping -c1 localhost
ping: icmp open socket: Operation not permitted

So, if your program has all the privs its going to need, consider using nnp to keep it from being a potential gateway to more trouble. Hopefully we can ship something like this trivial nnp helper as part of coreutils or similar, like nohup, nice, etc.

© 2012, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License


seccomp filter now in Ubuntu

Filed under: Blogging,Chrome OS,Debian,Security,Ubuntu,Ubuntu-Server — kees @ 10:02 pm

With the generous help of the Ubuntu kernel team, Will Drewry’s seccomp filter code has landed in Ubuntu 12.04 LTS in time for Beta 2, and will be in Chrome OS shortly. Hopefully this will be in upstream soon, and filter (pun intended) to the rest of the distributions quickly.

One of the questions I’ve been asked by several people while they developed policy for earlier “mode 2″ seccomp implementations was “How do I figure out which syscalls my program is going to need?” To help answer this question, and to show a simple use of seccomp filter, I’ve written up a little tutorial that walks through several steps of building a seccomp filter. It includes a header file (“seccomp-bpf.h“) for implementing the filter, and a collection of other files used to assist in syscall discovery. It should be portable, so it can build even on systems that do not have seccomp available yet.

Read more in the seccomp filter tutorial. Enjoy!

© 2012, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License


discard, hole-punching, and TRIM

Filed under: Chrome OS,Debian,Ubuntu,Ubuntu-Server — kees @ 1:19 pm

Under Linux, there are a number of related features around marking areas of a file, filesystem, or block device as “no longer allocated”. In the standard view, here’s what happens if you fill a file to 500M and then truncate it to 100M, using the “truncate” syscall:

  1. create the empty file, filesystem allocates an inode, writes accounting details to block device.
  2. write data to file, filesystem allocates and fills data blocks, writes blocks to block device.
  3. truncate the file to a smaller size, filesystem updates accounting details and releases blocks, writes accounting details to block device.

The important thing to note here is that in step 3 the block device has no idea about the released data blocks. The original contents of the file are actually still on the device. (And to a certain extent is why programs like shred exist.) While the recoverability of such released data is a whole other issue, the main problem about this lack of information for the block device is that some devices (like SSDs) could use this information to their benefit to help with extending their life, etc. To support this, the “TRIM” set of commands were created so that a block device could be informed when blocks were released. Under Linux, this is handled by the block device driver, and what the filesystem can pass down is “discard” intent, which is translated into the needed TRIM commands.

So now, when discard notification is enabled for a filesystem (e.g. mount option “discard” for ext4), the earlier example looks like this:

  1. create the empty file, filesystem allocates an inode, writes accounting details to block device.
  2. write data to file, filesystem allocates and fills data blocks, writes blocks to block device.
  3. truncate the file to a smaller size, filesystem updates accounting details and releases blocks, writes accounting details and sends discard intent to block device.

While SSDs can use discard to do fancy SSD things, there’s another great use for discard, which is to restore sparseness to files. Normally, if you create a sparse file (open, seek to size, close), there was no way, after writing data to this file, to “punch a hole” back into it. The best that could be done was to just write zeros over the area, but that took up filesystem space. So, the ability to punch holes in files was added via the FALLOC_FL_PUNCH_HOLE option of fallocate. And when discard was enabled for a filesystem, these punched holes would get passed down to the block device as well.

Take, for example, a qemu/KVM VM running on a disk image that was built from a sparse file. While inside the VM instance, the disk appears to be 10G. Externally, it might only have actually allocated 600M, since those are the only blocks that had been allocated so far. In the instance, if you wrote 8G worth of temporary data, and then deleted it, the underlying sparse file would have ballooned by 8G and stayed ballooned. With discard and hole punching, it’s now possible for the filesystem in the VM to issue discards to the block driver, and then qemu could issue hole-punching requests to the sparse file backing the image, and all of that 8G would get freed again. The only down side is that each layer needs to correctly translate the requests into what the next layer needs.

With Linux 3.1, dm-crypt supports passing discards from the filesystem above down to the block device under it (though this has cryptographic risks, so it is disabled by default). With Linux 3.2, the loopback block driver supports receiving discards and passing them down as hole-punches. That means that a stack like this works now: ext4, on dm-crypt, on loopback of a sparse file, on ext4, on SSD. If a file is deleted at the top, it’ll pass all the way down, discarding allocated blocks all the way to the SSD:

Set up a sparse backing file, loopback mount it, and create a dm-crypt device (with “allow_discards”) on it:

# cd /root
# truncate -s10G test.block
# ls -lk test.block
-rw-r--r-- 1 root root 10485760 Feb 15 12:36 test.block
# du -sk test.block
0       test.block
# DEV=$(losetup -f --show /root/test.block)
# echo $DEV
# SIZE=$(blockdev --getsz $DEV)
# echo $SIZE
# KEY=$(echo -n "my secret passphrase" | sha256sum | awk '{print $1}')
# echo $KEY
# dmsetup create testenc --table "0 $SIZE crypt aes-cbc-essiv:sha256 $KEY 0 $DEV 0 1 allow_discards"

Now build an ext4 filesystem on it. This enables discard during mkfs, and disables lazy initialization so we can see the final size of the used space on the backing file without waiting for the background initialization at mount-time to finish, and mount it with the “discard” option:

# mkfs.ext4 -E discard,lazy_itable_init=0,lazy_journal_init=0 /dev/mapper/testenc
mke2fs 1.42-WIP (16-Oct-2011)
Discarding device blocks: done
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
655360 inodes, 2621440 blocks
131072 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2684354560
80 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done 

# mount -o discard /dev/mapper/testenc /mnt
# sync; du -sk test.block
297708  test.block

Now, we create a 200M file, examine the backing file allocation, remove it, and compare the results:

# dd if=/dev/zero of=/mnt/blob bs=1M count=200
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 9.92789 s, 21.1 MB/s
# sync; du -sk test.block
502524  test.block
# rm /mnt/blob
# sync; du -sk test.block
297720  test.block

Nearly all the space was reclaimed after the file was deleted. Yay!

Note that the Linux tmpfs filesystem does not yet support hole punching, so the exampe above wouldn’t work if you tried it in a tmpfs-backed filesystem (e.g. /tmp on many systems).

© 2012, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License


kvm and product_uuid

Filed under: Chrome OS,Debian,Ubuntu,Ubuntu-Server — kees @ 10:08 am

While looking for something to use as a system-unique fall-back when a TPM is not available, I looked at /sys/devices/virtual/dmi/id/product_uuid (same as dmidecode‘s “System Information / UUID”), but was disappointed when, under KVM, the file was missing (and running dmidecode crashes KVM *cough*). However, after a quick check, I noticed that KVM supports the “-uuid” option to set the value of /sys/devices/virtual/dmi/id/product_uuid. Looks like libvirt supports this under capabilities / host / uuid in the XML, too.

host# kvm -uuid 12345678-ABCD-1234-ABCD-1234567890AB ...
host# ssh localhost ...
guest# cat /sys/devices/virtual/dmi/id/product_uuid

© 2012, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License


use of ptrace

Filed under: Blogging,Chrome OS,Security,Ubuntu,Ubuntu-Server — kees @ 4:48 pm

As I discussed last year, Ubuntu has been restricting the use of ptrace for a few releases now. I’m excited to see Fedora starting to introduce similar restrictions, but I’m disappointed at the specific implementation:

  • A method for doing this already exists (Yama). Yama is not plumbed into SELinux, but I would argue that’s not needed.
  • The SELinux method depends, unsurprisingly, on an active SELinux policy on the system, which isn’t everyone.
  • It’s not possible for regular developers (not system developers) to debug their own processes.
  • It will break all ptrace-based crash handlers (e.g. KDE, Firefox, Chrome) or tools that depend on ptrace to do their regular job (e.g. Wine, gdb, strace, ltrace).

Blocking ptrace blocks exactly one type of attack: credential extraction from a running process. In the face of a persistent attack, ultimately, anything running as the user can be trojaned, regardless of ptrace. Blocking ptrace, however, stalls the initial attack. At the moment an attacker arrives on a system, they cannot immediately extend their reach by examining the other processes (e.g. jumping down existing SSH connections, pulling passwords out of Firefox, etc). Some sensitive processes are already protected from this kind of thing because they are not “dumpable” (due to either specifically requesting this from prctl(PR_SET_DUMPABLE, ...) or due to a uid/gid transition), but many are open for abuse.

The primary “valid” use cases for ptrace are crash handlers, debuggers, and memory analysis tools. In each case, they have a single common element: the process being ptraced knows which process should have permission to attach to it. What Linux lacked was a way to declare these relationships, which is what Yama added. The use of SELinux policy, for example, isn’t sufficient because the permissions are too wide (e.g. giving gdb the ability to ptrace anything just means the attacker has to use gdb to do the job). Right now, due to the use of Yama in Ubuntu, all the mentioned tools have the awareness of how to programmatically declare the ptrace relationships at runtime with prctl(PR_SET_PTRACER, ...). I find it disappointing that Fedora won’t be using this to their advantage when it is available and well tested.

Even ChromeOS uses Yama now. ;)

© 2012, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License


fixing vulnerabilities with systemtap

Filed under: Blogging,Debian,Security,Ubuntu,Ubuntu-Server,Vulnerabilities — kees @ 3:22 pm

Recently the upstream Linux kernel released a fix for a serious security vulnerability (CVE-2012-0056) without coordinating with Linux distributions, leaving a window of vulnerability open for end users. Luckily:

  • it is only a serious issue in 2.6.39 and later (e.g. Ubuntu 11.10 Oneiric)
  • it is “only” local
  • it requires execute access to a setuid program that generates output

Still, it’s a cross-architecture local root escalation on most common installations. Don’t stop reading just because you don’t have a local user base — attackers can use this to elevate privileges from your user, or from the web server’s user, etc.

Since there is now a nearly-complete walk-through, the urgency for fixing this is higher. While you’re waiting for your distribution’s kernel update, you can use systemtap to change your kernel’s running behavior. RedHat suggested this, and here’s how to do it in Debian and Ubuntu:

  • Download the “am I vulnerable?” tool, either from RedHat (above), or a more correct version from Brad Spengler.
  • Check if you’re vulnerable:
    $ make correct_proc_mem_reproducer
    $ ./correct_proc_mem_reproducer
  • Install the kernel debugging symbols (this is big — over 2G installed on Ubuntu) and systemtap:
    • Debian:
      # apt-get install -y systemtap linux-image-$(uname -r)-dbg
    • Ubuntu:
      • Add the debug package repository and key for your Ubuntu release:
        $ sudo apt-get install -y lsb-release
        $ echo "deb $(lsb_release -cs) main restricted universe multiverse" | \
              sudo tee -a /etc/apt/sources.list.d/ddebs.list
        $ sudo apt-key adv --keyserver --recv-keys ECDCAD72428D7C01
        $ sudo apt-get update
      • (This step does not work since the repository metadata isn’t updating correctly at the moment — see the next step for how to do this manually.) Install the debug symbols for the kernel and install systemtap:
        sudo apt-get install -y systemtap linux-image-$(uname -r)-dbgsym
      • (Manual version of the above, skip if the above works for you. Note that this has no integrity checking, etc.)
        $ sudo apt-get install -y systemtap dpkg-dev
        $ wget$(dpkg -l linux-image-$(uname -r) | grep ^ii | awk '{print $2 "-dbgsym_" $3}' | tail -n1)_$(dpkg-architecture -qDEB_HOST_ARCH).ddeb
        $ sudo dpkg -i linux-image-$(uname -r)-dbgsym.ddeb
  • Create a systemtap script to block the mem_write function, and install it:
    $ cat > proc-pid-mem.stp <<'EOM'
    probe kernel.function("mem_write@fs/proc/base.c").call {
            $count = 0
    $ sudo stap -Fg proc-pid-mem.stp
  • Check that you’re no longer vulnerable (until the next reboot):
    $ ./correct_proc_mem_reproducer
    not vulnerable

In this case, the systemtap script is changing the argument containing the size of the write to zero bytes ($count = 0), which effectively closes this vulnerability.

UPDATE: here’s a systemtap script from Soren that doesn’t require the full debug symbols. Sneaky, put can be rather slow since it hooks all writes in the system. :)

© 2012, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License


abusing the FILE structure

Filed under: Blogging,Debian,Security,Ubuntu,Ubuntu-Server — kees @ 4:46 pm

When attacking a process, one interesting target on the heap is the FILE structure used with “stream functions” (fopen(), fread(), fclose(), etc) in glibc. Most of the FILE structure (struct _IO_FILE internally) is pointers to the various memory buffers used for the stream, flags, etc. What’s interesting is that this isn’t actually the entire structure. When a new FILE structure is allocated and its pointer returned from fopen(), glibc has actually allocated an internal structure called struct _IO_FILE_plus, which contains struct _IO_FILE and a pointer to struct _IO_jump_t, which in turn contains a list of pointers for all the functions attached to the FILE. This is its vtable, which, just like C++ vtables, is used whenever any stream function is called with the FILE. So on the heap, we have:

glibc FILE vtable location

In the face of use-after-free, heap overflows, or arbitrary memory write vulnerabilities, this vtable pointer is an interesting target, and, much like the pointers found in setjmp()/longjmp(), atexit(), etc, could be used to gain control of execution flow in a program. Some time ago, glibc introduced PTR_MANGLE/PTR_DEMANGLE to protect these latter functions, but until now hasn’t protected the FILE structure in the same way.

I’m hoping to change this, and have introduced a patch to use PTR_MANGLE on the vtable pointer. Hopefully I haven’t overlooked something, since I’d really like to see this get in. FILE structure usage is a fair bit more common than setjmp() and atexit() usage. :)

Here’s a quick exploit demonstration in a trivial use-after-free scenario:

#include <stdio.h>
#include <stdlib.h>

void pwn(void)
    printf("Dave, my mind is going.\n");

void * funcs[] = {
    NULL, // "extra word"
    NULL, // DUMMY
    exit, // finish
    NULL, // overflow
    NULL, // underflow
    NULL, // uflow
    NULL, // pbackfail
    NULL, // xsputn
    NULL, // xsgetn
    NULL, // seekoff
    NULL, // seekpos
    NULL, // setbuf
    NULL, // sync
    NULL, // doallocate
    NULL, // read
    NULL, // write
    NULL, // seek
    pwn,  // close
    NULL, // stat
    NULL, // showmanyc
    NULL, // imbue

int main(int argc, char * argv[])
    FILE *fp;
    unsigned char *str;

    printf("sizeof(FILE): 0x%x\n", sizeof(FILE));

    /* Allocate and free enough for a FILE plus a pointer. */
    str = malloc(sizeof(FILE) + sizeof(void *));
    printf("freeing %p\n", str);

    /* Open a file, observe it ended up at previous location. */
    if (!(fp = fopen("/dev/null", "r"))) {
        return 1;
    printf("FILE got %p\n", fp);
    printf("_IO_jump_t @ %p is 0x%08lx\n",
           str + sizeof(FILE), *(unsigned long*)(str + sizeof(FILE)));

    /* Overwrite vtable pointer. */
    *(unsigned long*)(str + sizeof(FILE)) = (unsigned long)funcs;
    printf("_IO_jump_t @ %p now 0x%08lx\n",
           str + sizeof(FILE), *(unsigned long*)(str + sizeof(FILE)));

    /* Trigger call to pwn(). */

    return 0;

Before the patch:

$ ./mini
sizeof(FILE): 0x94
freeing 0x9846008
FILE got 0x9846008
_IO_jump_t @ 0x984609c is 0xf7796aa0
_IO_jump_t @ 0x984609c now 0x0804a060
Dave, my mind is going.

After the patch:

$ ./mini
sizeof(FILE): 0x94
freeing 0x9846008
FILE got 0x9846008
_IO_jump_t @ 0x984609c is 0x3a4125f8
_IO_jump_t @ 0x984609c now 0x0804a060
Segmentation fault

Astute readers will note that this demonstration takes advantage of another characteristic of glibc, which is that its malloc system is unrandomized, allowing an attacker to be able to determine where various structures will end up in the heap relative to each other. I’d like to see this fixed too, but it’ll require more time to study. :)

© 2011, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License


how to throw an EC2 party

Filed under: Blogging,Debian,Ubuntu,Ubuntu-Server — kees @ 9:53 am

Prepare a location to run juju and install it:

mkdir ~/party
cd ~/party
sudo apt-get install juju

Initialize your juju environment. Be sure to add “juju-origin: ppa” to your environment, along with filling in your access-key and secret-key from your Amazon AWS account. Note that control-bucket and admin-secret should not be used by any other environment or juju won’t be able to distinguish them. Other variables are good to set now too. I wanted my instances close to me, use I set “region: us-west-1“. I also wanted a 64bit system, so using the AMI list, I chose “default-series: oneiric“, “default-instance-type: m1.large” and “default-image-id: ami-7b772b3e

$EDITOR ~/.juju/environments.yaml

Get my sbuild charm, and configure some types of builders. The salt should be something used only for this party; it is used to generate the random passwords for the builder accounts. The distro and releases can be set to whatever the mk-sbuild tool understands.

bzr co lp:~kees/charm/oneiric/sbuild/trunk sbuild-charm
cat >local.yaml <<EOM
    salt: some-secret-phrase-for-this-party
    distro: debian
    releases: unstable
    salt: some-secret-phrase-for-this-party
    distro: ubuntu
    releases: precise,oneiric

Bootstrap juju and wait for ec2 instance to come up.

juju bootstrap

Before running the status, you can either accept the SSH key blindly, or use “ec2-describe-instances” to find the instance and public host name, and use my “wait-for-ssh” tool to inject the SSH host key into your ~/.ssh/known_hosts file. This requires having set up the environment variables needed by “ec2-describe-instances“, though.

ec2-describe-instances --region REGION
./sbuild-charm/wait-for-ssh INSTANCE HOST REGION

Get status:

juju status

Deploy a builder:

juju deploy --config local.yaml --repository $PWD local:sbuild-charm builder-debian

Deploy more of the same type:

juju add-unit builder-debian
juju add-unit builder-debian
juju add-unit builder-debian

Now you have to wait for them to finish installing, which will take a while. Once they’re at least partially up (the “builder” user has been created), you can print out the slips of paper to hand out to your party attendees:

./sbuild-charm/slips | mpage -1 > /tmp/
ps2pdf /tmp/ /tmp/slips.pdf

They look like this:

Unit: builder-debian/3
SSH key fingerprints:
  1024 3e:f7:66:53:a9:e8:96:c7:27:36:71:ce:2a:cf:65:31 (DSA)
  256 53:a9:e8:96:c7:20:6f:8f:4a:de:b2:a3:b7:6b:34:f7 (ECDSA)
  2048 3b:29:99:20:6f:8f:4a:de:b2:a3:b7:6b:34:bc:7a:e3 (RSA)
Username: builder
Password: 68b329da9893

To admin the machines, you can use juju itself, where N is the machine number from the “juju status” output:

juju ssh N

To add additional chroots to the entire builder service, add them to the config:

juju set builder-debian release=unstable,testing,stable
juju set builder-ubuntu release=precise,oneiric,lucid,natty

Notes about some of the terrible security hacks this charm does:

  • enables password-based SSH access (and locks the default “ubuntu” account), so party attendees don’t need anything but the ssh client itself to get to the builders.
  • starts rngd -r /dev/urandom to create terrible but plentiful entropy for the sbuild GPG key generation.


© 2011, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

juju bug fixing

Filed under: Blogging,Debian,Ubuntu,Ubuntu-Server — kees @ 9:11 am

My earlier post on juju described a number of weird glitches I ran into. I got invited by hazmat on IRC (freenode #juju) to try to reproduce the problems so we could isolate the trouble.

Fix #1: use the version from the PPA. The juju setup documentation doesn’t mention this, but it seems that adding “juju-origin: ppa” to your ~/.juju/environment.yaml is a good idea. I suggest it be made the default, and to link to the full list of legal syntax for the environment.yaml file. I was not able to reproduce the missing-machines-at-startup problem after doing this, but perhaps it’s a hard race to lose.

Fix #2: don’t use “terminate-machine“. :P There seems to be a problem around doing the following series of commands: “juju remove-unit FOO/N; juju terminate-machine X; juju add-unit FOO“. This makes the provisioner go crazy, and leaves all further attempts to add units stick in “pending” forever.

Big thank you to hazmat and SpamapS for helping debug this.

© 2011, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License


EC2 instances in support of a BSP

Filed under: Blogging,Debian,Ubuntu,Ubuntu-Server — kees @ 4:05 pm

On Sunday, I brought up EC2 instances to support the combined Debian Bug Squashing Party/Ubuntu Local Jam that took place at PuppetLabs in Portland, OR, USA. The intent was to provide each participant with their own sbuild environment on a 64bit machine, since we were going to be working on Multi-Arch support, and having both 64bit and 32bit chroots would be helpful. The host was an Ubuntu 11.10 (Oneiric) instance so it would be possible to do SRU verifications in the cloud too.

I was curious about the juju provisioning system, since it has an interesting plugin system, called “charms”, that can be used to build out services. I decided to write an sbuild charm, which was pretty straight forward and quite powerful (using this charm it would be possible to trigger the creation of new schroots across all instances at any time, etc).

The juju service itself works really well when it works correctly. When something goes wrong, unfortunately, it becomes nearly impossible to debug or fix. Repeatedly while working on charm development, the provisioning system would lose its mind, and I’d have to destroy the entire environment and re-bootstrap to get things running again. I had hoped this wouldn’t be the case while I was using it during “production” on Sunday, but the provisioner broke spectacularly on Sunday too. Due to the fragility of the juju agents, it wasn’t possible to restart the provisioner — it lost its mind, the other agent’s couldn’t talk to it any more, etc. I would expect the master services on a cloud instance manager to be extremely robust since having it die would mean totally losing control of all your instances.

On Sunday morning, I started 8 instances. 6 came up perfectly and were excellent work-horses all day at the BSP. 2 never came up. The EC2 instances started, but the service provisioner never noticed them. Adding new units didn’t work (instances would start, but no services would notice them), and when I tried to remove the seemingly broken machines, the instance provisioner completely went crazy and started dumping Python traces into the logs (which seems to be related to this bug, though some kind of race condition seems to have confused it much earlier than this total failure), and that was it. We used the instances we had, and I spent 3 hours trying to fix the provisioner, eventually giving up on it.

I was very pleased with EC2 and Ubuntu Server itself on the instances. The schroots worked, sbuild worked (though I identified some additional things that the charm should likely do for setup). I think juju has a lot of potential, but I’m surprised at how fragile it is. It didn’t help that Amazon had rebooted the entire West Coast the day before and there were dead Ubuntu Archive Mirrors in the DNS rotation.

For anyone else wanting to spin up builders in the cloud using juju, I have a run-down of what this looks like from the admin’s perspective, and even include a little script to produce little slips of paper to hand out to attendees with an instance’s hostname, ssh keys, and builder SSH password. Seemed to work pretty well overall; I just wish I could have spun up a few more. :)

So, even with the fighting with juju and a few extra instances that came up and I had to shut down again without actually using them, the total cost to run the instances for the whole BSP was about US$40, and including the charm development time, about US$45.

UPDATE: some more details on how to avoid the glitches I hit.

© 2011, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License


non-executable kernel memory progress

Filed under: Blogging,Debian,Security,Ubuntu,Ubuntu-Server — kees @ 2:39 pm

The Linux kernel attempts to protect portions of its memory from unexpected modification (through potential future exploits) by setting areas read-only where the compiler has allowed it (CONFIG_DEBUG_RODATA). This, combined with marking function pointer tables “const”, reduces the number of easily writable kernel memory targets for attackers.

However, modules (which are almost the bulk of kernel code) were not handled, and remained read-write, regardless of compiler markings. In 2.6.38, thanks to the efforts of many people (especially Siarhei Liakh and Matthieu Castet), CONFIG_DEBUG_SET_MODULE_RONX was created (and CONFIG_DEBUG_RODATA expanded).

To visualize the effects, I patched Arjan van de Ven’s arch/x86/mm/dump_pagetables.c to be a loadable module so I could look at /sys/kernel/debug/kernel_page_tables without needing to rebuild my kernel with CONFIG_X86_PTDUMP.

Comparing Lucid (2.6.32), Maverick (2.6.35), and Natty (2.6.38), it’s clear to see the effects of the RO/NX improvements, especially in the “Modules” section which has no NX markings at all before 2.6.38:

lucid-amd64# awk '/Modules/,/End Modules/' /sys/kernel/debug/kernel_page_tables | grep NX | wc -l

maverick-amd64# awk '/Modules/,/End Modules/' /sys/kernel/debug/kernel_page_tables | grep NX | wc -l

natty-amd64# awk '/Modules/,/End Modules/' /sys/kernel/debug/kernel_page_tables | grep NX | wc -l

2.6.38′s memory region is much more granular, since each module has been chopped up for the various segment permissions:

lucid-amd64# awk '/Modules/,/End Modules/' /sys/kernel/debug/kernel_page_tables | wc -l

maverick-amd64# awk '/Modules/,/End Modules/' /sys/kernel/debug/kernel_page_tables | wc -l

natty-amd64# awk '/Modules/,/End Modules/' /sys/kernel/debug/kernel_page_tables | wc -l

For example, here’s the large “sunrpc” module. “RW” is read-write, “ro” is read-only, “x” is executable, and “NX” is non-executable:

maverick-amd64# awk '/^'$(awk '/^sunrpc/ {print $NF}' /proc/modules)'/','!/GLB/' /sys/kernel/debug/kernel_page_tables
0xffffffffa005d000-0xffffffffa0096000         228K     RW             GLB x  pte
0xffffffffa0096000-0xffffffffa0098000           8K                           pte

natty-amd64# awk '/^'$(awk '/^sunrpc/ {print $NF}' /proc/modules)'/','!/GLB/' /sys/kernel/debug/kernel_page_tables
0xffffffffa005d000-0xffffffffa007a000         116K     ro             GLB x  pte
0xffffffffa007a000-0xffffffffa0083000          36K     ro             GLB NX pte
0xffffffffa0083000-0xffffffffa0097000          80K     RW             GLB NX pte
0xffffffffa0097000-0xffffffffa0099000           8K                           pte

The latter looks a whole lot more like a proper ELF (text segment is read-only and executable, rodata segment is read-only and non-executable, and data segment is read-write and non-executable).

Just another reason to make sure you’re using your CPU’s NX bit (via 64bit or 32bit-PAE kernels)! (And no, PAE is not slower in any meaningful way.)

© 2011, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License


Linux Security Summit 2011 CFP

Filed under: Blogging,Debian,Security,Ubuntu,Ubuntu-Server — kees @ 11:06 am

I’m once again on the program committee for the Linux Security Summit, so I’d love to see people submit talks, attend, etc. It will be held along with the Linux Plumber’s Conference, on September 8th in Santa Rosa, CA, USA.

I’d really like to see more non-LSM developers and end-users show up for this event. We need people interested in defining threats and designing defenses. There is a lot of work to be done on all kinds of fronts and having people voice their opinions and plans can really help us prioritize the areas that need the most attention.

Here’s one of many archives of the announcement, along with the website. We’ve got just under 2 months to get talks submitted (May 27th deadline), with speaker notification quickly after that on June 1st.

Come help us make Linux more secure! :)

© 2011, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License


ptracing siblings

Filed under: Blogging,Debian,Security,Ubuntu,Ubuntu-Server — kees @ 5:29 pm

In Ubuntu, the use of ptrace is restricted. The default allowed relationship between the debugger and the debuggee is that parents are allowed to ptrace their descendants. This means that running “gdb /some/program” and “strace /some/program” Just Works. Using gdb‘s “attach” and strace‘s “-p” options need CAP_SYS_PTRACE, care of sudo.

The next most common use-case was that of crash handlers needing to do a live ptrace of a crashing program (in the rare case of Apport being insufficient). For example, KDE applications have a segfault handler that calls out to kdeinit and requests that the crash handling process be started on it, and then sits in a loop waiting to be attached to. While kdeinit is the parent of both the crashing program (debuggee) and the crash handling program (debugger), the debugger cannot attach to the debugee since they are siblings, not parent/descendant. To solve this, a prctl() call was added so that the debugee could declare who’s descendants were going to attach to it. KDE patched their segfault handler to make the prctl() and everything Just Works again.

Breakpad, the crash handler for Firefox and Chromium, was updated to do effectively the same thing, though they had to add code to pass the process id back to the debuggee since they didn’t have it handy like KDE.

Another use-case was Wine, where for emulation to work correctly, they needed to allow all Wine processes to ptrace each other to correctly emulate Windows. For this, they just declared that all descendants of the wine-server could debug a given Wine process, there-by confining their ptrace festival to just Wine programs.

One of the remaining use-cases is that of a debugging IDE that doesn’t directly use ptrace itself. For example, qtcreator will launch a program and then later attach to it by launching gdb and using the “attach” command. This looks a lot like the crash handler use-case, except that the debuggee doesn’t have any idea that it is running under an IDE. A simple solution for this is to have the IDE run its programs with the LD_PRELOAD environment variable aimed at a short library that just calls prctl() with the parent process id, and suddenly the IDE and its descendants (i.e. gdb) can debug the program all day long.

I’ve got an example of this preloadable library written. If it turns out this is generally useful for IDEs, I could package it up like fakeroot and faketime.

© 2011, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License


shaping the direction of research

Filed under: Blogging,Debian,Security,Ubuntu,Ubuntu-Server,Vulnerabilities — kees @ 1:45 pm

Other people have taken notice of the recent “auto-run” attack research against Linux. I was extremely excited to see Jon Larimer publishing this stuff, since it ultimately did not start with the words, “first we disabled NX, ASLR, and (SELinux|AppArmor) …”

I was pretty disappointed with last year’s Blackhat conference because so many of the presentations just rehashed ancient exploitation techniques, and very few actually showed new ideas. I got tired of seeing mitigation technologies disabled to accomplish an attack. That’s kind of not the point.

Anyway, Jon’s research is a step in the right direction. He defeats ASLR via brute-force, side-steps NX with ret-to-libc, and finds policy holes in AppArmor to accomplish the goal. I was pleased to see “protected by PIE and AppArmor” in his slides — Ubuntu’s hardening of evince was very intentional. It has proven to be a dangerous piece of software, which Jon’s research just further reinforces. He chose to attack the difficult target instead of going after what might have been the easier thumbnailers.

So, because of this research, we can take a step back and think about what could be done to improve the situation from a proactive security perspective. A few things stand out:

  • GNOME really shouldn’t be auto-mounting anything while the screen is locked (LP: #714958).
  • AppArmor profiles for the other thumbnailers should be written (LP: #715874).
  • The predictable ASLR found in the NX-emulation patch is long over-due to be fixed. This has been observed repeatedly before, but I hadn’t actually opened a bug for it yet. Now I have. (LP: #717412)
  • Media players should be built PIE. This has been on the Roadmap for a while now, but is not as easy as it sounds because several of them use inline assembly for speed, and that can be incompatible with PIE.
  • Consider something like grsecurity’s GRKERNSEC_BRUTE to slow down execution of potentially vulnerable processes. It’s like the 3 second delay between bad password attempts.

Trying to brute-force operational ASLR on a 64bit system, though, would probably not have worked. So, again, I stand by my main recommendation for security: use 64bit. :)

Good stuff; thanks Jon!

© 2011, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License


gcc-4.5 and -D_FORTIFY_SOURCE=2 with “header” structures

Filed under: Blogging,Debian,Security,Ubuntu,Ubuntu-Server — kees @ 6:11 pm

Recently gcc (4.5) improved its ability to see the size of various structures. As a result, the FORTIFY protections have suddenly gotten a bit stricter. In the past, you used to be able to do things like this:

struct thingy {
    int magic;
    char data[4];

void work(char *input) {
    char buffer[1000];
    int length;
    struct thingy *header;

    header = (struct thingy *)buffer;

    length = strlen(input);
    if (length > sizeof(buffer) - sizeof(*header) - 1) abort();

    strcpy(header->data, input);
    header->magic = 42;


The problem here is that gcc thinks that header->data is only 4 bytes long. But gcc doesn’t know we intentionally overruled this (and even did length checking), so due to -D_FORTIFY_SOURCE=2, the strcpy() checks kick in when input is more than 4 bytes.

The fix, in this case, is to use memcpy() instead, since we actually know how long our destination is, we can replace the strcpy(...) line with:

    memcpy(header->data, input, length + 1); /* take 0-term too */

This kind of header and then data stuff is common for protocol handlers. So far, things like Wine, TFTP, and others have been experiencing problems with the change. Please keep an eye out for it when doing testing.

© 2010, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License


TARPIT iptables target

Filed under: Blogging,Debian,Networking,Security,Ubuntu,Ubuntu-Server — kees @ 9:21 am

Want to use a network tarpit? It’s so easy to set up! Thanks to jpds for this whole post. :)

sudo module-assistant auto-install xtables-addons-source
sudo iptables -p tcp ... -j TARPIT

Though no such thing exists for IPv6 yet.

Here it is watching over the SSH port:

iptables -N INGRESS-SSH
iptables -A INPUT -p tcp --dport 22 -m state --state NEW -j INGRESS-SSH
iptables -A INGRESS-SSH -p tcp --dport 22 -m state --state NEW -m recent --name SSH --set
iptables -A INGRESS-SSH -p tcp --dport 22 -m state --state NEW -m recent --name SSH --update --rttl --seconds 60 --hitcount 4 -j LOG --log-prefix "[INGRESS SSH TARPIT] "
iptables -A INGRESS-SSH -p tcp --dport 22 -m state --state NEW -m recent --name SSH --rcheck --rttl --seconds 60 --hitcount 4 -j TARPIT

© 2010, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License


security is more than bug fixing

Filed under: Blogging,Debian,Security,Ubuntu,Ubuntu-Server — kees @ 12:20 pm

Security is more than bug fixing. Security fixing/updating, the thing most people are exposed to, is “reactive security”. However, a large area of security work is “proactive” where defensive abilities are put in place to try and catch problems before they happen, or make classes of vulnerabilities unexploitable. This kind of security is what a lot of people don’t understand, and I think it’s important to point out so the distinction can be clearly seen.

In the Linux kernel, there’s yet another distinction: userspace proactive security and kernel proactive security. Most of the effort in kernel code has been protecting userspace from itself (things like Address Space Layout Randomization), but less attention has been given to protecting the kernel from userspace (currently if a serious enough flaw is found in the kernel, it is usually very easy to exploit it).

One project has taken great strides with proactive security for the Linux kernel: PaX and grsecurity. There hasn’t been a concerted effort to get its pieces upstream and it’s long overdue. People are starting to take proactive kernel security more seriously, though there is still plenty of debate.

While I did my best to push some userspace protections upstream earlier in the year, now it’s time for kernel protections. What to help? Here is the initial list of things to do.

Dan Rosenberg has started the information leaks discussion, and I’ve started the read-only memory discussion. Hopefully this will go somewhere good.

© 2010, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License


Jettison Jaunty

Filed under: Blogging,Security,Ubuntu,Ubuntu-Server — kees @ 10:07 pm

Jaunty Jackalope (Ubuntu 9.04) went End-Of-Life on Saturday.

Looking back through my build logs, it seems my desktop did 223 builds, spending 19 hours, 18 minutes, and 23 seconds doing builds during the development cycle of Jaunty. Once released, it performed an additional 99 builds, taking 18 hours, 3 minutes, and 37 seconds for security updates. As before, these times obviously don’t include patch hunting/development, failed builds, testing, stuff done on my laptop or the porting machines, etc.

Combined devel/security build standings per current release:

dapper: 59:19:10
hardy: 189:32:51
karmic: 57:44:27
lucid: 36:07:05
maverick: 13:54:15

Looking at the build histories, Gutsy and Jaunty had about the same amount of builds (around 19 hours) during development, but Intrepid was a whopping 70 hours. This was related to all the default compiler flag testing there. I rebuilt the entire “main” component multiple times that release. Jaunty was a nice return to normalcy.

© 2010, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

Older Posts »

Powered by WordPress