live patching the kernel

December 10, 2013

live patching the kernel

Filed under: Blogging,Chrome OS,Debian,Security,Ubuntu,Ubuntu-Server — kees @ 3:40 pm

A nice set of recent posts have done a great job detailing the remaining ways that a root user can get at kernel memory. Part of this is driven by the ideas behind UEFI Secure Boot, but they come from the same goal: making sure that the root user cannot directly subvert the running kernel. My perspective on this is toward making sure that an attacker who has gained access and then gained root privileges can’t continue to elevate their access and install invisible kernel rootkits.

An outline for possible attack vectors is spelled out by Matthew Gerrett’s continuing “useful kernel lockdown” patch series. The set of attacks was examined by Tyler Borland in “Bypassing modules_disabled security”. His post describes each vector in detail, and he ultimately chooses MSR writing as the way to write kernel memory (and shows an example of how to re-enable module loading). One thing not mentioned is that many distros have MSR access as a module, and it’s rarely loaded. If modules_disabled is already set, an attacker won’t be able to load the MSR module to begin with. However, the other general-purpose vector, kexec, is still available. To prove out this method, Matthew wrote a proof-of-concept for changing kernel memory via kexec.

Chrome OS is several steps ahead here, since it has hibernation disabled, MSR writing disabled, kexec disabled, modules verified, root filesystem read-only and verified, kernel verified, and firmware verified. But since not all my machines are Chrome OS, I wanted to look at some additional protections against kexec on general-purpose distro kernels that have CONFIG_KEXEC enabled, especially those without UEFI Secure Boot and Matthew’s lockdown patch series.

My goal was to disable kexec without needing to rebuild my entire kernel. For future kernels, I have proposed adding /proc/sys/kernel/kexec_disabled, a partner to the existing modules_disabled, that will one-way toggle kexec off. For existing kernels, things got more ugly.

What options do I have for patching a running kernel?

First I looked back at what I’d done in the past with fixing vulnerabilities with systemtap. This ends up being a rather heavy-duty way to go about things, since you need all the distro kernel debug symbols, etc. It does work, but has a significant problem: since it uses kprobes, a root user can just turn off the probes, reverting the changes. So that’s not going to work.

Next I looked at ksplice. The original upstream has gone away, but there is still some work being done by Jiri Slaby. However, even with his updates which fixed various build problems, there were still more, even when building a 3.2 kernel (Ubuntu 12.04 LTS). So that’s out too, which is too bad, since ksplice does exactly what I want: modifies the running kernel’s functions via a module.

So, finally, I decided to just do it by hand, and wrote a friendly kernel rootkit. Instead of dealing with flipping page table permissions on the normally-unwritable kernel code memory, I borrowed from PaX’s KERNEXEC feature, and just turn off write protect checking on the CPU briefly to make the changes. The return values for functions on x86_64 are stored in RAX, so I just need to stuff the kexec_load syscall with “mov -1, %rax; ret” (-1 is EPERM):

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>

static unsigned long long_target;
static char *target;
module_param_named(syscall, long_target, ulong, 0644);
MODULE_PARM_DESC(syscall, "Address of syscall");

/* mov $-1, %rax; ret */
unsigned const char bytes[] = { 0x48, 0xc7, 0xc0, 0xff, 0xff, 0xff, 0xff,
                                0xc3 };
unsigned char *orig;

/* Borrowed from PaX KERNEXEC */
static inline void disable_wp(void)
{
        unsigned long cr0;

        preempt_disable();
        barrier();
        cr0 = read_cr0();
        cr0 &= ~X86_CR0_WP;
        write_cr0(cr0);
}

static inline void enable_wp(void)
{
        unsigned long cr0;

        cr0 = read_cr0();
        cr0 |= X86_CR0_WP;
        write_cr0(cr0);
        barrier();
        preempt_enable_no_resched();
}

static int __init syscall_eperm_init(void)
{
        int i;
        target = (char *)long_target;

        if (target == NULL)
                return -EINVAL;

        /* save original */
        orig = kmalloc(sizeof(bytes), GFP_KERNEL);
        if (!orig)
                return -ENOMEM;
        for (i = 0; i < sizeof(bytes); i++) {
                orig[i] = target[i];
        }

        pr_info("writing %lu bytes at %p\n", sizeof(bytes), target);

        disable_wp();
        for (i = 0; i < sizeof(bytes); i++) {
                target[i] = bytes[i];
        }
        enable_wp();

        return 0;
}
module_init(syscall_eperm_init);

static void __exit syscall_eperm_exit(void)
{
        int i;

        pr_info("restoring %lu bytes at %p\n", sizeof(bytes), target);

        disable_wp();
        for (i = 0; i < sizeof(bytes); i++) {
                target[i] = orig[i];
        }
        enable_wp();

        kfree(orig);
}
module_exit(syscall_eperm_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Kees Cook <kees@outflux.net>");
MODULE_DESCRIPTION("makes target syscall always return EPERM");

If I didn’t want to leave an obvious indication that the kernel had been manipulated, the module could be changed to:

not announce what it’s doing
remove the exit route to not restore the changes on module unload
error out at the end of the init function instead of staying resident

And with this in place, it’s just a matter of loading it with the address of sys_kexec_load (found via /proc/kallsyms) before I disable module loading via modprobe. Here’s my upstart script:

# modules-disable - disable modules after rc scripts are done
#
description "disable loading modules"

start on stopped module-init-tools and stopped rc

task
script
        cd /root/modules/syscall_eperm
        make clean
        make
        insmod ./syscall_eperm.ko \
                syscall=0x$(egrep ' T sys_kexec_load$' /proc/kallsyms | cut -d" " -f1)
        modprobe disable
end script

And now I’m safe from kexec before I have a kernel that contains /proc/sys/kernel/kexec_disabled.

Comments (7)

7 Comments

“[systemtap-based patching] does work, but has a significant problem: since it uses kprobes, a root user can just turn off the probes, reverting the changes. So that’s not going to work.”

Perhaps. It does sound possible to make a systemtap script protect itself against such reversion, by e.g. disabling further module loads, kexec, the other secureboot paths, and the /sys/kernel/debug-based kprobes overrides, etc.

Comment by Frank Ch. Eigler — December 11, 2013 @ 7:13 am
You need to disable interrupts and perhaps NMIs too, not just preempt.

Comment by x — December 11, 2013 @ 7:14 am
Is it possible to add signature verification to kexec? I.e. allow to load only signed payloads.

Comment by Alex Besogonov — December 11, 2013 @ 3:04 pm
@Alex: not yet, but this is being worked on. Even in the face of that, there will need to be a way for a system to declare it will only load verified images. https://lwn.net/Articles/566170/

Comment by kees — December 11, 2013 @ 3:31 pm
@x: well, for a frequently-called syscall, a lot more needs to be done to avoid insanity. But for this minor case, I took short-cuts. If I had to make it more robust, I’d go examine ksplice more.

Comment by kees — December 11, 2013 @ 3:33 pm
“or $-1, %rax; ret” is only 5 bytes.

Comment by a — December 19, 2013 @ 1:09 pm
SElinux/GRsec could be the solution also. Anyway i don’t get why we have kexec enabled by default in distros kernel :( Not so many people needs it

Comment by Wintch — March 26, 2014 @ 3:02 pm

codeblog code is freedom — patching my itch

December 10, 2013

live patching the kernel

7 Comments