In a previous post I discussed kernel debugging with VMware Fusion and LLDB. In that approach we connected LLDB to the kernel via the Kernel Debugging Protocol (KDP), which works thanks to a stub implemented in the (target) kernel itself. One drawback we discussed was that we could not halt kernel execution from the debugger and instead needed a slightly cumbersome keyboard shortcut to generate an NMI on the target VM.

After publishing the article I received some great feedback, including a tweet from Ryan Govostes:

VMware Fusion has a GDB stub built-in, which lldb can talk to if you load a target definitions file.

To be fair, I didn’t have a clear idea of exactly what this meant when I first read it, but since it sounded pretty interesting I started doing some research.

I found a great post by snare that explains how to use GDB to connect to the remote debug stub in VMware Fusion and debug the target kernel from the host machine.

I will briefly discuss the approach here and then show how we can instead use LLDB to connect to the remote.

GDB stub in VMware Fusion

It turns out that VMware Fusion implements a GDB stub. I don’t think it is a documented feature (all the mentions I’ve found of it were from users in the VMware forums) but it can be enabled by setting a preference. Each VM has a .vmx config file inside its .vmwarevm package that can be edited (make sure that the VM is not running while you edit it).

Open it in a text editor and add the following line:

# If you are debugging a 32-bit machine use `guest32`
debugStub.listen.guest64 = "TRUE"

With this in place and after rebooting, the VM will listen for connections on port 8864 (8832 if you’re using guest32) on localhost.
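If you would rather script the edit described above than do it by hand, here is a minimal sketch. It assumes the .vmx file is plain text with one `key = "value"` pair per line; `enable_debug_stub` is a hypothetical helper name, not part of any VMware tooling.

```python
import re


def enable_debug_stub(vmx_text: str, bits: int = 64) -> str:
    """Return vmx_text with debugStub.listen.guest<bits> set to "TRUE".

    Assumption: the .vmx format is one `key = "value"` pair per line.
    """
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    key = f"debugStub.listen.guest{bits}"
    setting = f'{key} = "TRUE"'
    # Match an existing assignment of exactly this key; the `.remote`
    # variant does not match because `=` must follow the key directly.
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    if pattern.search(vmx_text):
        return pattern.sub(setting, vmx_text)
    # Otherwise append the setting on its own line.
    if vmx_text and not vmx_text.endswith("\n"):
        vmx_text += "\n"
    return vmx_text + setting + "\n"
```

To apply it, read the .vmx file, pass its contents through `enable_debug_stub` and write the result back, with the VM shut down and a backup copy of the original kept around.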

If you wanted to connect from another machine you could use a different option instead and would need to connect to the IP used by the VM:

# If you are debugging a 32-bit machine use `guest32`
debugStub.listen.guest64.remote = "TRUE"

For our use case we will simply connect to localhost so no need for the remote part.
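Before attaching a debugger it can be handy to confirm that something is actually listening on the port. The sketch below simply attempts a TCP connection to localhost; `stub_is_listening` is a hypothetical helper, and it assumes that opening and immediately closing a connection does the stub no harm, which I have not verified.

```python
import socket


def stub_is_listening(host: str = "localhost", port: int = 8864,
                      timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection tries every address the host name resolves to.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or the host could not be resolved.
        return False
```

A `False` result usually means the VM is not running or the `debugStub` option was not picked up; use port 8832 when the `guest32` option is set.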

GDB debugging stub

Before explaining how to connect from GDB, let’s quickly discuss what the GDB stub is.

In order to set up communication between two hosts, we need (among other things) a transmission protocol and an application protocol that both client and server can understand. Then obviously both server and client need to have code that is able to send, receive and interpret the packets that come through.

This whole system is implemented as GDB Remote and consists mainly of four parts:

TCP as the transmission protocol (KDP on the other hand uses UDP).
The Remote Serial Protocol as the application protocol. It is a well-documented protocol and one rarely needs to know the details of it.
The client side of the connection is GDB and, as expected, knows how to connect to the remote and understands the Remote Serial Protocol to send and receive packets.
The server side of the connection is the tricky part since it’s the guest system and rarely has any knowledge of GDB and how to act as a remote out of the box. In order for the debugged program to allow connecting to GDB, one would use either one of these two solutions:

Using gdbserver, which is a control program for Unix-like systems that allows you to connect your program with a remote GDB. It can be a good option if you have little or no control over the target environment. The docs explain gdbserver in much more detail.
Implementing the GDB debugging stub on the target. By doing so a program can itself implement the target side of the communication protocol. The official docs have a lot more info if you’re interested in the particular implementation.

In the case of VMware Fusion, a full GDB remote stub is implemented by the virtual machine and can be enabled by setting the option described above, allowing a remote GDB session to connect to the VM.
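To give a feel for what travels over the wire, here is a minimal sketch of Remote Serial Protocol packet framing: each packet is `$`, the payload, `#`, then a two-digit hex checksum equal to the sum of the payload bytes modulo 256. The function names are mine, and the sketch deliberately ignores acknowledgements, escaping and run-length encoding.

```python
def checksum(payload: str) -> str:
    """Two hex digits: sum of the payload bytes modulo 256."""
    return f"{sum(payload.encode('ascii')) % 256:02x}"


def frame_packet(payload: str) -> str:
    """Wrap a payload as $<payload>#<checksum>."""
    return f"${payload}#{checksum(payload)}"


def parse_packet(packet: str) -> str:
    """Validate the framing and checksum, then return the payload."""
    if len(packet) < 4 or not packet.startswith("$") or packet[-3] != "#":
        raise ValueError("malformed packet")
    payload, received = packet[1:-3], packet[-2:]
    if received.lower() != checksum(payload):
        raise ValueError("checksum mismatch")
    return payload
```

For example, `frame_packet("g")` (the "read general registers" request) yields `$g#67`, since the ASCII code of `g` is 0x67.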

GDB Remote

With the debugStub.listen.guest64 option set and the VM rebooted, we can start a GDB session on the host machine and attempt to connect to the VM.

(gdb) file /Library/Developer/KDKs/KDK_10.10.5_14F27.kdk/System/Library/Kernels/kernel.development
Reading symbols from /Library/Developer/KDKs/KDK_10.10.5_14F27.kdk/System/Library/Kernels/kernel.development...Reading symbols from /Library/Developer/KDKs/KDK_10.10.5_14F27.kdk/System/Library/Kernels/kernel.development.dSYM/Contents/Resources/DWARF/kernel.development...
done.
(gdb) target remote localhost:8864
Remote debugging using localhost:8864
0xffffff800f9f1e52 in ?? ()

At this point we are connected to the remote through the debug stub and we can do anything in the debugger (ignore the missing symbols here, I haven’t looked too much into it). After continuing, one can stop the kernel execution by pressing ^C in the debugger as usual.

However, I had to install GDB on my host just to try this out (GDB has not shipped with OS X since Mavericks) and I’d really like to use LLDB wherever I can since it’s what I’m most familiar with nowadays.

Connecting LLDB to a GDB remote stub

LLDB actually has support for connecting to a GDB remote out of the box with the gdb-remote command. To quote the LLDB docs:

To enable remote debugging, LLDB employs a client-server architecture. The client part runs on the local system and the remote system runs the server. The client and server communicate using the gdb-remote protocol, usually transported over TCP/IP.

In particular, the LLDB-specific extensions are discussed in a fantastic document in the LLDB repo.

LLDB has added new GDB server packets to better support multi-threaded and remote debugging. Why? Normally you need to start the correct GDB and the correct GDB server when debugging. If you have mismatch, then things go wrong very quickly. LLDB makes extensive use of the GDB remote protocol and we wanted to make sure that the experience was a bit more dynamic where we can discover information about a remote target with having to know anything up front. [...] Again with GDB, both sides pre-agree on how the registers will look (how many, their register number,name and offsets). We prefer to be able to dynamically determine what kind of architecture, OS and vendor we are debugging, as well as how things are laid out when it comes to the thread register contexts. Below are the details on the new packets we have added above and beyond the standard GDB remote protocol packets.

So we should be able to just connect to the remote system from LLDB? Let’s find out.

(lldb) file /Library/Developer/KDKs/KDK_10.10.5_14F27.kdk/System/Library/Kernels/kernel.development
Current executable set to '/Library/Developer/KDKs/KDK_10.10.5_14F27.kdk/System/Library/Kernels/kernel.development' (x86_64).
(lldb) gdb-remote 8864
Kernel UUID: C75BDFDD-9F27-3694-BB80-73CF991C13D8
Load Address: 0xffffff800f800000
Kernel slid 0xf600000 in memory.
Loaded kernel file /Library/Developer/KDKs/KDK_10.10.5_14F27.kdk/System/Library/Kernels/kernel.development
Loading 87 kext modules ....................................................................................... done.
Target arch: x86_64
Connected to live debugserver or arm core. Will associate on-core threads to registers reported by server.
Process 1 stopped
* thread #3: tid = 0x0066, name = '0xffffff801c91d9c0', queue = 'cpu-0', stop reason = signal SIGTRAP
    frame #0: 0xffffffffffffffff

Cool! So we were able to connect to the GDB stub on the VM system. Let’s try and get a backtrace and see how things look.

(lldb) thread backtrace
* thread #3: tid = 0x0066, name = '0xffffff801c91d9c0', queue = 'cpu-0', stop reason = signal SIGTRAP
  frame #0: 0xffffffffffffffff

Hmm, that’s not a lot of information. Also, the only frame being at address 0xffffffffffffffff doesn’t sound right either.

LLDB target definition

Remember that Ryan’s tweet mentioned a target definitions file? I did some more research and found another tweet, from Shantonu Sen, that pointed me to the right approach.

We can download the x86_64_target_definition.py file and use it as our plugin.process.gdb-remote.target-definition-file in LLDB’s settings.

# You can alternatively add this to the `.lldbinit` so that it's loaded whenever lldb starts
(lldb) settings set plugin.process.gdb-remote.target-definition-file /path/to/x86_64_target_definition.py

The file has a great comment explaining what the target definition does and why it is necessary.

This file can be used with the following setting: plugin.process.gdb-remote.target-definition-file

This setting should be used when you are trying to connect to a remote GDB server that doesn't support any of the register discovery packets that LLDB normally uses.

Why is this necessary? LLDB doesn't require a new build of LLDB that targets each new architecture you will debug with. Instead, all architectures are supported and LLDB relies on extra GDB server packets to discover the target we are connecting to so that is can show the right registers for each target. This allows the GDB server to change and add new registers without requiring a new LLDB build just so we can see new registers.

This file implements the x86_64 registers for the darwin version of GDB and allows you to connect to servers that use this register set.
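As I understand it, the gist is that the file tells LLDB the name, size and position of every register in the blob the stub sends back, since this stub cannot describe them itself. The sketch below is not the real file and does not use its register list or field names; it only illustrates, with a made-up three-register list, how an ordered list of register sizes turns into byte offsets and a total packet size.

```python
def layout_registers(registers):
    """Assign sequential byte offsets to an ordered list of
    (name, bitsize) pairs, as a register-dump layout would.

    Returns (layout, total_size_in_bytes).
    """
    offset = 0
    layout = []
    for number, (name, bitsize) in enumerate(registers):
        if bitsize % 8:
            raise ValueError(f"{name}: bitsize must be a multiple of 8")
        size = bitsize // 8
        layout.append({"name": name, "number": number,
                       "offset": offset, "size": size})
        offset += size
    return layout, offset


# Illustrative subset only; the real definition lists many more registers.
example_layout, example_size = layout_registers(
    [("rax", 64), ("rbx", 64), ("eflags", 32)]
)
```

Here `example_layout` places `rax` at offset 0, `rbx` at offset 8 and `eflags` at offset 16, for a total of 20 bytes. If the debugger and the stub disagree on any of these positions, register values (and therefore backtraces) come out wrong, which is consistent with the bogus frame we saw earlier.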

Let’s try to use gdb-remote after setting the target definition file.

(lldb) settings set plugin.process.gdb-remote.target-definition-file /path/to/x86_64_target_definition.py
(lldb) file /Library/Developer/KDKs/KDK_10.10.5_14F27.kdk/System/Library/Kernels/kernel.development
Current executable set to '/Library/Developer/KDKs/KDK_10.10.5_14F27.kdk/System/Library/Kernels/kernel.development' (x86_64).
(lldb) gdb-remote 8864
Kernel UUID: C75BDFDD-9F27-3694-BB80-73CF991C13D8
Load Address: 0xffffff800f800000
Kernel slid 0xf600000 in memory.
Loaded kernel file /Library/Developer/KDKs/KDK_10.10.5_14F27.kdk/System/Library/Kernels/kernel.development
Loading 87 kext modules ....................................................................................... done.
Target arch: x86_64
Connected to live debugserver or arm core. Will associate on-core threads to registers reported by server.
Process 1 stopped
* thread #3: tid = 0x0066, 0xffffff800f9f1e52 kernel.development`machine_idle + 370 at pmCPU.c:174, name = '0xffffff801c91d9c0', queue = 'cpu-0', stop reason = signal SIGTRAP
    frame #0: 0xffffff800f9f1e52 kernel.development`machine_idle + 370 at pmCPU.c:174

It already looks better. Let’s now try to get a backtrace:

(lldb) thread backtrace
* thread #3: tid = 0x0066, 0xffffff800f9f1e52 kernel.development`machine_idle + 370 at pmCPU.c:174, name = '0xffffff801c91d9c0', queue = 'cpu-0', stop reason = signal SIGTRAP
  * frame #0: 0xffffff800f9f1e52 kernel.development`machine_idle + 370 at pmCPU.c:174
    frame #1: 0xffffff800f8fddb3 kernel.development`processor_idle(thread=0x0000000000000000, processor=0xffffff80100ef658) + 179 at sched_prim.c:4605
    frame #2: 0xffffff800f8fe300 kernel.development`idle_thread + 32 at sched_prim.c:4729
    frame #3: 0xffffff800f9ea347 kernel.development`call_continuation + 23

Perfect! We have a complete symbolicated trace and the addresses now look correct.
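A quick aside on the "Kernel slid 0xf600000 in memory" line in the output: the kernel is loaded at a randomized offset (KASLR), so every runtime address is the address recorded in the symbol file plus that slide. A small sketch of the arithmetic, using the values from the session above (the helper names are mine):

```python
KERNEL_SLIDE = 0xF600000  # from "Kernel slid 0xf600000 in memory."


def unslide(runtime_addr: int, slide: int = KERNEL_SLIDE) -> int:
    """Map a runtime kernel address back to its address in the symbol file."""
    return runtime_addr - slide


def apply_slide(static_addr: int, slide: int = KERNEL_SLIDE) -> int:
    """Map a symbol-file address to where it lives in the running kernel."""
    return static_addr + slide


# Load address reported by LLDB and the pc of frame #0 (machine_idle + 370).
print(hex(unslide(0xFFFFFF800F800000)))  # 0xffffff8000200000
print(hex(unslide(0xFFFFFF800F9F1E52)))  # 0xffffff80003f1e52
```

LLDB does this bookkeeping for us once it has found the kernel in memory, but it helps to know when comparing addresses against a disassembly of the on-disk kernel.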

In practice

To make sure that things are working as expected, let’s set a breakpoint on forkproc (this function is used to create a new process structure given a parent process and is called from the fork syscall) and make sure that our breakpoint is hit and that we can inspect the frame arguments.

(lldb) breakpoint set --name forkproc
Breakpoint 1: where = kernel.development`forkproc + 20 at cpu_data.h:330, address = 0xffffff8006da6414
(lldb) continue
Process 1 resuming
Process 1 stopped
* thread #6: tid = 0x0f4c, 0xffffff8006da6414 kernel.development`forkproc(parent_proc=0xffffff8013f37b00) + 20 at cpu_data.h:330, name = '0xffffff8013e4f9c0', queue = 'cpu-1', stop reason = breakpoint 1.1
    frame #0: 0xffffff8006da6414 kernel.development`forkproc(parent_proc=0xffffff8013f37b00) + 20 at cpu_data.h:330
(lldb) thread backtrace
* thread #6: tid = 0x0f4c, 0xffffff8006da6414 kernel.development`forkproc(parent_proc=0xffffff8013f37b00) + 20 at cpu_data.h:330, name = '0xffffff8013e4f9c0', queue = 'cpu-1', stop reason = breakpoint 1.1
  * frame #0: 0xffffff8006da6414 kernel.development`forkproc(parent_proc=0xffffff8013f37b00) + 20 at cpu_data.h:330
    frame #1: 0xffffff8006da6d69 kernel.development`cloneproc(parent_task=0xffffff80135c7718, parent_coalition=0xffffff80135c4400, parent_proc=0xffffff8013f37b00, inherit_memory=0, memstat_internal=0) + 41 at kern_fork.c:977
    frame #2: 0xffffff8006da6038 kernel.development`fork1(parent_proc=0xffffff8013f37b00, child_threadp=0xffffff8014613ac0, kind=<unavailable>, coalition=<unavailable>) + 328 at kern_fork.c:554
    frame #3: 0xffffff8006d9b441 kernel.development`posix_spawn(ap=0xffffff8013f37b00, uap=<unavailable>, retval=0xffffff80135d0040) + 1937 at kern_exec.c:2078
    frame #4: 0xffffff8006e2c0c1 kernel.development`unix_syscall64(state=0xffffff80135db540) + 753 at systemcalls.c:368
    frame #5: 0xffffff8006a0e656 kernel.development`hndl_unix_scall64 + 22
(lldb) p *(struct proc *)$rdi
    (struct proc) $1 = {
      p_list = {
        le_next = 0xffffff80177e6cf0
        le_prev = 0xffffff801610d840
      }
      p_pid = 275
      task = 0xffffff801776cd08
      ...

Everything is working as expected: our breakpoint is hit, we can get a complete backtrace, and we can print the first argument, a reference to the parent process structure that we want to fork from. (I’ve cut the output since the proc struct is huge.)

Conclusion

We showed an alternative approach to remote debugging with VMware Fusion and LLDB. This method has some advantages over KDP since it lets us interrupt the execution of the kernel from the debugger at any time and doesn’t require us to trigger an NMI from the target VM to give control to the debugger on the host.

I’ve read that this method is also faster but I haven’t noticed a major difference in my testing so far. I’m sure heavy use of both methods will provide much more insight in that regard.

Thanks to Ryan Govostes for the idea, snare for the great post, Shantonu Sen for the target definition solution and VMware for making an awesome product.