This is a blog post about GDB support in the Gramine project. This feature was originally written by Chia-Che Tsai, and later expanded by me (Paweł Marczewski) and other contributors to the project. Thanks to Michał Kowalczyk and Dmitrii Kuvaiskii for reviewing the post.
Gramine is a framework for running Linux applications under non-standard environments, such as Intel SGX. Intel SGX is a special processor feature that allows creating enclaves: execution environments that are isolated from the host system. When a program runs inside an SGX enclave, its execution state and memory cannot be changed or even viewed by other software running on the machine - even by the host operating system.
This is a pretty unusual situation: normally, an application might not trust the external world (for example remote hosts), but it still trusts the host OS when it comes to basic operations like saving data in files, or sending it between processes. However, in the case of SGX, we do not want the host to access (or modify) our data, so performing these operations will be more complicated than just asking the host OS.
To make sure the interactions with the outside world are safe, Gramine itself acts as an operating system, and intercepts requests (system calls) made by the application. Handling these requests might ultimately involve talking to the host OS, but with extra security precautions: for instance, read and write operations might apply encryption on the fly, so that the data stored on the host cannot be read without an encryption key.
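For a concrete flavor of what this means, here is a minimal sketch of such a handler. It is not Gramine's actual code: the toy XOR "cipher" and the host_write() helper are placeholders standing in for a real authenticated cipher and the real exit to the untrusted host.

```c
/* Minimal sketch, not Gramine's actual code: a write-style handler that
 * encrypts data inside the enclave before handing it to the untrusted host.
 * The XOR "cipher" and host_write() are placeholders. */
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Placeholder for the call that asks the host OS to store the bytes. */
static ssize_t host_write(int fd, const void* buf, size_t count) {
    return write(fd, buf, count);
}

ssize_t handle_write(int fd, const void* buf, size_t count) {
    uint8_t ciphertext[4096];
    if (count > sizeof(ciphertext))
        count = sizeof(ciphertext);  /* a real handler would loop instead */

    /* Toy encryption: in real code, the key never leaves the enclave. */
    for (size_t i = 0; i < count; i++)
        ciphertext[i] = ((const uint8_t*)buf)[i] ^ 0x5a;

    /* Only ciphertext ever crosses the enclave boundary. */
    return host_write(fd, ciphertext, count);
}
```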
The diagram below shows the architecture of Gramine. The application and Gramine run inside an enclave. Gramine handles the application's system calls. When necessary, it invokes the untrusted runtime, which runs outside of the enclave and is responsible for calling the host OS.
Gramine might not be a real OS kernel, but it's still a pretty complicated piece of software. It's really useful to have a debugger when developing Gramine and running programs under it. However, we're not exactly running a normal Linux program, and standard Linux tools won't work without extra effort. We managed to get GDB running, but it wasn't easy...
As mentioned before, by design, there is no way to look "inside" an SGX enclave. If we're not currently executing enclave code, memory access will not work.
For debugging purposes, the rules can be relaxed somewhat. It's possible to run an enclave with debugging features enabled, in which case Linux will allow us to read the enclave memory. Direct memory access still won't work, but it will be possible to use mechanisms such as the /proc/<pid>/mem special file. GDB uses /proc/<pid>/mem, so it can access enclave memory without problems.
(To enable debugging features, the enclave must be created with a debug bit set, and enclave threads must additionally set the DBGOPTIN flag. This enables special CPU instructions (EDBGRD/EDBGWR) which the Linux kernel can use to read and write enclave memory.)
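As a sketch of what this access looks like in practice, the snippet below reads memory of another (stopped) process through /proc/<pid>/mem, which is exactly the interface GDB relies on; the helper name is made up for illustration.

```c
/* Sketch: reading another (stopped) process's memory via /proc/<pid>/mem,
 * the same interface GDB uses. For an SGX enclave, this only works if the
 * enclave was created with debugging enabled. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

ssize_t read_remote(pid_t pid, uintptr_t addr, void* out, size_t size) {
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/mem", (int)pid);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* pread() takes the virtual address in the target process as offset. */
    ssize_t n = pread(fd, out, size, (off_t)addr);
    close(fd);
    return n;
}
```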
Unfortunately, it's not that easy to check which part of the program is being executed. Whenever the process is stopped while inside the enclave, the SGX hardware forces it to exit the enclave and land on the Async Exit Pointer (AEP). Most CPU registers are also reset to prevent data leaks. Therefore, when we examine a stopped process with GDB, we will never see the actual location inside the enclave, or any other useful information such as the stack pointer.
However, if we're able to read the enclave memory (via the /proc/<pid>/mem special file), we can actually find the previous position stored there, as well as the values of all the other registers.
This is actually more complicated than it sounds, and requires some support from Gramine. SGX saves the register values for each thread inside its TCS (Thread Control Structure), and in order to read these values from another process, we need to know the TCS base address for a given thread. Gramine dumps this information (along with other necessary details about the enclave) to a global object stored at a predefined address in memory.
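The exact layout is Gramine-specific, but the idea can be sketched as follows. The struct name, fields and the fixed address below are invented for illustration; they are not Gramine's real format.

```c
/* Hypothetical sketch of per-enclave debug information. A debugger that
 * knows the predefined address can read this object via /proc/<pid>/mem
 * and locate each thread's TCS. */
#include <stdint.h>

#define DEBUG_INFO_ADDR   0x100000000000UL  /* made-up "well-known" address */
#define MAX_DEBUG_THREADS 64

struct enclave_debug_info {
    uint64_t enclave_base;                  /* start of enclave memory */
    uint64_t enclave_size;
    uint64_t tcs_addrs[MAX_DEBUG_THREADS];  /* TCS base of each thread */
};
```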
Now, we need to make sure GDB uses that information. GDB interacts with the process using ptrace. This is a Linux mechanism that allows, among other things, starting and stopping another process, and reading register values while it's stopped. Unfortunately, this is not useful if the process was stopped while inside the enclave: as mentioned before, stopping the process causes it to temporarily exit the enclave, so ptrace is not going to report the right location.
What we can do is intercept that mechanism: we use the LD_PRELOAD trick to inject our own wrapper for ptrace.
Using this wrapper, we can ensure that GDB sees the right register values. Whenever we find the process stopped at the AEP, we read the "real" instruction pointer and other registers through /proc/<pid>/mem, and return this information to GDB. Effectively, we pretend that the process was actually stopped before it exited the enclave.
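A minimal sketch of such a wrapper is shown below, assuming the classic LD_PRELOAD/dlsym technique. It is not Gramine's actual implementation: it only handles PTRACE_GETREGS, and fixup_enclave_regs() is a hypothetical stand-in for the logic that recovers the real register values from enclave memory.

```c
/* Sketch of an LD_PRELOAD ptrace() wrapper, not Gramine's implementation.
 * Every request is forwarded to the real ptrace(); for PTRACE_GETREGS, the
 * hypothetical fixup_enclave_regs() would replace the registers with values
 * recovered from enclave memory whenever the tracee is stopped at the AEP. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdarg.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>

/* Hypothetical: read the saved context via /proc/<pid>/mem and patch
 * RIP, RSP and friends. Left empty in this sketch. */
static void fixup_enclave_regs(pid_t pid, struct user_regs_struct* regs) {
    (void)pid;
    (void)regs;
}

long ptrace(enum __ptrace_request request, ...) {
    static long (*real_ptrace)(enum __ptrace_request, pid_t, void*, void*);
    if (!real_ptrace)
        real_ptrace = (long (*)(enum __ptrace_request, pid_t, void*, void*))
                      dlsym(RTLD_NEXT, "ptrace");

    va_list ap;
    va_start(ap, request);
    pid_t pid  = va_arg(ap, pid_t);
    void* addr = va_arg(ap, void*);
    void* data = va_arg(ap, void*);
    va_end(ap);

    long ret = real_ptrace(request, pid, addr, data);

    if (request == PTRACE_GETREGS && ret == 0)
        fixup_enclave_regs(pid, (struct user_regs_struct*)data);

    return ret;
}
```

GDB then has to be started with this library preloaded (via LD_PRELOAD), so that all of its ptrace calls go through the wrapper.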
As a result, the enclave code is largely transparent to GDB: we're able to see the current location inside the enclave, single-step through instructions, and add breakpoints.
Thanks to our ptrace wrapper, GDB knows what instructions are being executed. However, it only sees them as raw machine code: there is nothing that says which source lines (or even functions) these instructions correspond to. This is because GDB has no information about binaries that are loaded inside the enclave.
This situation is actually not that different from regular Linux programs that use dynamic libraries. And regular programs have a standard solution for it: the GNU C library (glibc) maintains a special structure (r_debug) with a list of currently loaded shared libraries, and GDB reads this list.
It's tempting to try using this mechanism. However, r_debug is already maintained by glibc in the untrusted runtime (outside of the enclave), and while we could probably write to it, interfering with data maintained by glibc sounds like a bad idea. And creating another r_debug (either inside the enclave, or outside) wouldn't really work, since GDB only expects one such variable.
Instead, we implemented a similar solution ourselves: Gramine maintains its own debug_map structure describing currently loaded binaries, and on each change, calls a function called debug_map_update_debugger(). This function is a no-op, and exists only so that our Python extension for GDB can set a breakpoint in it, and read debug_map whenever the breakpoint is triggered.
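The Gramine side of this can be sketched roughly as follows. The names debug_map and debug_map_update_debugger() come from the description above, but the exact fields and layout shown here are simplified for illustration.

```c
/* Rough sketch of the Gramine side: a linked list describing loaded
 * binaries, plus an empty function that exists only so the GDB extension
 * can put a breakpoint on it. */
struct debug_map {
    const char*       file_name;  /* path of the loaded binary     */
    void*             load_addr;  /* where it was mapped in memory */
    struct debug_map* next;
};

/* Global list head that the GDB extension reads out of enclave memory. */
struct debug_map* g_debug_map = 0;

/* Deliberately empty: GDB sets a breakpoint here and re-reads g_debug_map
 * every time the breakpoint fires. noinline keeps the call visible. */
__attribute__((noinline)) void debug_map_update_debugger(void) {
}
```

Whenever the breakpoint fires, the Python extension walks the list, calls GDB's add-symbol-file machinery for any new entries, and resumes the process.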
Determining what files to add to debug_map is not that simple: we can easily register binaries loaded by Gramine, but Gramine is not the only component that loads them. Much like Linux, when Gramine executes a dynamically linked binary, it loads only the binary itself and its ELF interpreter (usually a dynamic linker called ld-linux*.so). Then, the dynamic linker loads all other libraries required by the application. In addition, the application itself could use dlopen to load more libraries at run time.
Fortunately, both the dynamic linker and dlopen are usually provided as part of the libc library. Gramine provides a patched version of glibc that calls gramine_register_library() whenever it loads a new dynamic library, so Gramine is able to notify GDB whenever a new library is added.
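To tie the two pieces together, the Gramine-side handler might look roughly like the sketch below. It builds on the debug_map sketch above, and the signature of gramine_register_library() is assumed here for illustration rather than taken from the real code.

```c
/* Hedged sketch: the patched glibc calls this right after mapping a new
 * library. The entry is added to the list from the previous sketch, then
 * the no-op function is called so GDB's breakpoint fires. */
#include <stdlib.h>
#include <string.h>

void gramine_register_library(const char* name, void* load_addr) {
    struct debug_map* map = malloc(sizeof(*map));
    if (!map)
        return;

    map->file_name = strdup(name);
    map->load_addr = load_addr;
    map->next      = g_debug_map;
    g_debug_map    = map;

    /* The GDB extension has a breakpoint here and re-reads g_debug_map. */
    debug_map_update_debugger();
}
```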
Having a list of all loaded binaries also turned out to be useful for other developer features: thanks to it, Gramine can report more details about a crash location, and even help with profiling.
As mentioned before, Gramine runs as a library, but handles operations that would normally be system calls. This is also achieved using our patched version of the glibc library: instead of running a SYSCALL instruction, the library jumps directly into Gramine. (The application might also execute SYSCALL instructions directly, and we intercept these, but this is much slower than a simple jump.)
What's important is that we cannot keep using the application's stack. Normally, the SYSCALL instruction triggers an entry into the kernel (which sets up its own stack), and the code containing SYSCALL relies on the fact that the application's stack stays untouched. Therefore, Gramine also needs to switch to its own stack.
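A very rough sketch of such a stack switch is shown below, written as a global assembly block in a C file. It is illustrative only: it assumes the entry point was reached via a regular call, the real Gramine entry path saves far more state, and all names here are invented.

```c
/* Illustrative only, not Gramine's entry code: switch from the
 * application's stack to a private stack before running the C syscall
 * handler, then switch back. */
void handle_syscall_sketch(void) {
    /* ...emulate the system call here, on Gramine's own stack... */
}

static char gramine_stack[64 * 1024] __attribute__((aligned(16)));
void* gramine_stack_top = gramine_stack + sizeof(gramine_stack);

__asm__(
    ".globl syscall_entry_sketch\n"
    "syscall_entry_sketch:\n"
    "    mov %rsp, %r11\n"                    /* remember application RSP    */
    "    mov gramine_stack_top(%rip), %rsp\n" /* switch to the private stack */
    "    push %r11\n"                         /* keep old RSP for later      */
    "    push %r11\n"                         /* again, for 16-byte align    */
    "    call handle_syscall_sketch\n"
    "    pop %rsp\n"                          /* back to application stack   */
    "    ret\n"
);
```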
Deeper layers of Gramine actually switch stacks a second time: when Gramine needs to run some code outside of the SGX enclave, it uses another syscall-like mechanism called OCALL ("outside call"). The code after OCALL cannot access the enclave memory anymore, so it needs its own stack in untrusted (non-enclave) memory.
As a result, if we want to read a full stack trace, we might need to go through three different places in memory: the untrusted stack, Gramine's stack, and finally the application stack.
GDB has to understand how to move between all these stacks. This is usually done by decorating the assembly code with CFI directives, which contain exactly this information (e.g. "in this place in the code, the previous stack frame is saved at address [RSP+xyz]").
However, notice that in our case, the stack frames are not ordered: most of the time each new stack frame is at a lower address than the previous one (as is normal on x86), but when we switch stacks, the next frame might end up at a higher address. It turns out that GDB really doesn't like this situation, and claims the stack is corrupt. The only exception is when a function is called __morestack (this magic name is actually hardcoded in GDB's sources). In order for GDB to process our stack trace correctly, we have to pretend that the return address is inside a __morestack function.
Unfortunately, getting GDB to work with Gramine required some fighting. I mentioned the most important parts, but left out some details, such as signal handling and support for multiple threads. The end result is usable for most scenarios, but still breaks occasionally.
Of course, we're not interfacing with GDB using any official API. An alternative implementation could use GDB's libthread_db mechanism, or perhaps the remote debugging protocol. I think both of these are worth at least investigating.