Notes from “Breaking Kernel Address Space Layout Randomization (KASLR) with Intel TSX”

As a Georgia Tech OMSCS student and a working software professional, I always want to learn more about advanced security topics. Georgia Tech’s Institute for Information Security & Privacy is presenting a weekly Cybersecurity Lecture Series on Fridays this fall, and since I live locally I’ve started attending. Here are my quick (and not necessarily complete) notes from this week’s presentation by Yeongjin Jang, a PhD student at Georgia Tech.

Introduction

The goal of many kernel exploits is to invoke commit_creds(), prepare_kernel_cred(), and similar functions through code reuse.

KASLR changes the kernel symbol addresses at every boot, so any code reuse exploit must first circumvent KASLR.
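To make the point concrete, here is a minimal sketch of why leaking a single base address is enough to undo the randomization. The region start, slot size, slot count, and symbol offsets below are illustrative assumptions, not values from the talk.

```python
# Hypothetical layout parameters, chosen only for illustration.
KERNEL_REGION_START = 0xFFFFFFFF80000000  # assumed start of the randomization window
SLOT_SIZE = 0x200000                      # assumed alignment of the kernel base
NUM_SLOTS = 64                            # assumed number of possible base slots

# Hypothetical offsets from the kernel base. Offsets like these are fixed
# when the kernel is built; only the base moves from boot to boot.
SYMBOL_OFFSETS = {
    "commit_creds": 0x0A1B20,
    "prepare_kernel_cred": 0x0A1F00,
}


def kernel_base(slot: int) -> int:
    """Return the kernel base address for a given randomization slot."""
    if not 0 <= slot < NUM_SLOTS:
        raise ValueError("slot out of range")
    return KERNEL_REGION_START + slot * SLOT_SIZE


def resolve(symbol: str, base: int) -> int:
    """Compute a symbol's runtime address from a known (leaked) base."""
    return base + SYMBOL_OFFSETS[symbol]


if __name__ == "__main__":
    base = kernel_base(17)  # pretend a side channel revealed slot 17
    for name in SYMBOL_OFFSETS:
        print(f"{name}: {resolve(name, base):#x}")
```

Without the base, an attacker has to guess among all slots; with it, every symbol address follows by simple addition.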

Popular OSs that have adopted KASLR:

  • Linux 2.6.12+
  • OSX 10.5+
  • Android 4.0+
  • iOS 6+
  • Windows Vista+

TLB Timing Side Channel

Measures the timing difference between a TLB hit and a TLB miss by deliberately generating page faults on the addresses under test (Hund et al., see below).

The resulting map is a bit “noisy” because several layers are involved in handling each access:

  • User
  • CPU
  • TLB
  • OS exception handling
  • OS noise (the largest detractor from the mapping algorithm’s accuracy)

DrK Attack

De-randomizing Kernel ASLR

Based on work by Rafal Wojtczuk (see below), leveraging the Transactional Synchronization Extensions (TSX) in newer Intel CPU architectures.

Abort handler of TSX:

  • Suppresses all synchronous exceptions (e.g. page faults)
  • Does not notify the OS
  • Drops the error range of the TLB timing measurement from ~4000 cycles to ~180

Attack targets

  • DrK is a hardware side channel, so the mechanism is OS independent
  • Targeted popular OSs: Linux, Windows, OSX

Attack types

  • Type 1: Revealing mapping status of each page
  • Type 2: Finer grained module detection
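A Type 1 scan can be sketched roughly as follows. This is a simulation under stated assumptions: `fake_probe` is a hypothetical stand-in for a real timed TSX probe, the cycle counts and address ranges are synthetic, and the sketch assumes that a probe of a mapped page finishes faster than a probe of an unmapped one.

```python
from typing import Callable, List, Optional


def scan(probe: Callable[[int], int], start: int, step: int, count: int) -> List[int]:
    """Probe `count` addresses spaced `step` bytes apart, starting at `start`."""
    return [probe(start + i * step) for i in range(count)]


def classify(timings: List[int], threshold: int) -> List[bool]:
    """Assumption: a probe faster than `threshold` cycles touched a mapped page."""
    return [t < threshold for t in timings]


def first_mapped_address(mapped: List[bool], start: int, step: int) -> Optional[int]:
    """Return the lowest probed address classified as mapped, if any."""
    for i, is_mapped in enumerate(mapped):
        if is_mapped:
            return start + i * step
    return None


def fake_probe(addr: int) -> int:
    """Hypothetical probe returning synthetic cycle counts, not real measurements."""
    mapped_lo, mapped_hi = 0xFFFFFFFF82000000, 0xFFFFFFFF83000000
    return 200 if mapped_lo <= addr < mapped_hi else 240


if __name__ == "__main__":
    start, step = 0xFFFFFFFF80000000, 0x200000
    timings = scan(fake_probe, start, step, count=64)
    mapped = classify(timings, threshold=220)
    base = first_mapped_address(mapped, start, step)
    print(f"first mapped slot at {base:#x}" if base is not None else "nothing mapped")
```

In a real attack the threshold would have to be calibrated on the target machine, since the gap between the two timings depends on the CPU.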

The Type 2 attack achieves almost 100% mapping accuracy in under 2 seconds on average CPUs running almost any OS. Even cloud systems are vulnerable if TSX is enabled on the host CPU, although there is more noise due to virtualization.
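My notes do not capture how the finer-grained detection works, so the following is only one plausible sketch. It assumes the probe can tell executable mapped pages ("X") from non-executable mapped pages ("N") and unmapped pages ("U"), and that each module has a known size signature. The module names and signatures are hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def runs(page_map: str) -> List[Tuple[str, int, int]]:
    """Split a per-page map into (kind, start_index, length) runs."""
    out = []
    index = 0
    for kind, group in groupby(page_map):
        length = len(list(group))
        out.append((kind, index, length))
        index += length
    return out


def find_modules(page_map: str, signatures: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Match each executable run followed directly by a non-executable run
    against known (exec_pages, nonexec_pages) signatures.

    Returns a mapping from module name to the index of its first page.
    """
    found = {}
    all_runs = runs(page_map)
    for (kind_a, start, len_a), (kind_b, _, len_b) in zip(all_runs, all_runs[1:]):
        if kind_a == "X" and kind_b == "N":
            for name, signature in signatures.items():
                if signature == (len_a, len_b):
                    found[name] = start
    return found


if __name__ == "__main__":
    # Synthetic page map and hypothetical module signatures.
    page_map = "UUXXXNNUUXNNNNUU"
    signatures = {"mod_a": (3, 2), "mod_b": (1, 4)}
    print(find_modules(page_map, signatures))
```

Two modules with identical signatures would be indistinguishable under this scheme, which is one reason to treat it as a sketch only.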

What about cache coherence?

Intel TLBs are not coherent! If the exploit is context-switched to another core, it doesn’t matter. Each core either pulls from its own TLB or walks the page tables, resulting in the same kinds of timings.

Controlling noise

  • Dynamic frequency scaling (SpeedStep, TurboBoost, etc.) changes the return value of rdtscp()
    • Run busy loops (while(1);) to max out CPU boost
  • Hardware interrupts and cache conflicts also abort TSX
    • Probe multiple times (e.g. 2-200) and take the minimum
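Taking the minimum works because this kind of noise only ever adds delay, so the smallest reading is the one closest to the true cost. A minimal simulation, in which the noisy probe, its jitter, and its spike sizes are all synthetic assumptions:

```python
import random
from typing import Callable


def min_probe(probe: Callable[[], int], repeats: int) -> int:
    """Run `probe` `repeats` times and keep the smallest reading."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return min(probe() for _ in range(repeats))


def make_noisy_probe(true_cycles: int, spike_chance: float,
                     rng: random.Random) -> Callable[[], int]:
    """Build a synthetic probe: the true cost, small jitter, occasional big spikes."""
    def probe() -> int:
        jitter = rng.randint(0, 5)
        # A spike models an interrupt or cache conflict landing during the probe.
        spike = rng.randint(500, 5000) if rng.random() < spike_chance else 0
        return true_cycles + jitter + spike
    return probe


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs behave the same
    probe = make_noisy_probe(true_cycles=200, spike_chance=0.3, rng=rng)
    print("single reading:", probe())
    print("minimum of 100:", min_probe(probe, repeats=100))
```

An average would be dragged upward by the spikes, while the minimum ignores them as long as at least one probe runs undisturbed.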

Increasing covertness

  • OS never sees page faults
    • TSX suppresses the exception
  • Possible traces: performance counters
    • High count on dTLB/iTLB miss
    • High count on tx-aborts

Countermeasures?

  • Modify CPU to eliminate timing channels
    • CPUs have already shipped
  • Turn off TSX
    • Disabling it in microcode is not an option from software
  • Have a more coarse-grained timer
  • Use separate page tables for kernel and user processes
    • High overhead due to frequent TLB flushes
  • Fine grained randomization
    • Difficult to implement, performance degradation
  • Insert fake mapped/executable pages between mapped pages
    • Doesn’t work as well as you’d hope; ASLR, especially on Linux, doesn’t give you enough space to work with
  • Pad modules to vary offsets, which might make mapping more difficult

Conclusion

  • TSX can break the KASLR of commodity OSes
  • It does so with accuracy, speed, and covertness
  • The timing side channel is caused by hardware and is OS independent

References