Block Debugging
This is a repost of an article I published on the Realmac Software blog.
As a software engineer, I can divide a working day into three main areas:
- Writing code: the easy part. If you know your frameworks, it’s really just a matter of putting your thoughts on “paper”.
- Designing interfaces: the hard part. Even though writing code is not so hard, writing code that is reusable, easy to read and easy to extend is much harder.
- Debugging: the fun part. Stopping the execution of a program and inspecting its internals.
I’d say that only 40% of my day is spent writing code, 30% designing and a good 30% debugging. Of course I include in debugging much more than simply stepping through my own code hunting for bugs.
When stepping through instructions in the debugger, one often encounters an instance of __NSGlobalBlock__, __NSStackBlock__ or __NSMallocBlock__ as an argument to a method, or simply being invoked. Inspecting what is happening in that block might seem daunting at first sight, but if one remembers that a block is really nothing more than a simple struct containing a function pointer, things get a bit easier.
Calling conventions
Before diving into it we need to talk about calling conventions. In this article I will be discussing the x86-64 instruction set, which is what processors on most Mac computers use. This is not what ARM processors on iOS devices use, but it shouldn’t be too hard to translate: the underlying concept is not much different.
The main piece of documentation one needs is the System V Application Binary Interface - AMD64 Architecture Processor Supplement, which used to be available online but seems to have since been removed. I have taken the liberty of hosting a copy on our website so that it remains accessible.
In order to inspect function arguments when stepping through instructions in the debugger, one needs to know where the various arguments are actually located, in the registers or on the stack. The calling convention for x86-64 is as follows: the first six integer or pointer arguments are passed, in order, in the registers %rdi, %rsi, %rdx, %rcx, %r8 and %r9.
The remaining arguments are placed on the stack. One should also know that return values are located in the accumulator register %rax. Note that for simplicity we won’t discuss floating-point values, which involve vector registers, or functions that return a struct (in this case the return value is located somewhere else, likely on the stack, and its address is passed in %rdi, the remaining arguments being in the following registers, as per the calling convention).
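To make the integer-argument rule concrete, here is a minimal C sketch that maps an argument position to the register the System V x86-64 convention assigns it. The function name is hypothetical and only exists for illustration, and the sketch deliberately ignores floating-point and struct arguments, just like the rest of this article.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the register holding the integer or pointer
 * argument at the given zero-based position under the System V x86-64
 * calling convention, or NULL when the argument is passed on the stack. */
const char *sysv_integer_argument_register(int index)
{
    static const char *const registers[] = {
        "%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"
    };

    assert(index >= 0);
    if (index < 6) {
        return registers[index];
    }
    return NULL; /* seventh argument onwards: on the stack */
}
```

Asking for position 0 gives %rdi, position 2 gives %rdx, and any position from 6 onwards gives NULL, meaning the argument has to be looked up on the stack instead.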
Another key concept is preservation. Some registers are preserved across function calls, which means that the value they store will not change after a call instruction. They might be mutated during the function execution but they have to be restored to their original values so that they don’t appear changed from the caller’s perspective. This is usually why you see a number of push and pop instructions in a function prologue and epilogue, the idea being to store the state of the registers before the function executes and to restore it afterwards. All the registers used for passing arguments are, by their very nature, not preserved.
So, as an example, if one inspects the various registers just before a call instruction for the following function
int function(int firstArgument, int secondArgument);
one would find firstArgument in %rdi and secondArgument in %rsi. By stepping over the call instruction, one would then find the return value in %rax.
But what about Objective-C methods? Well, as we know, an Objective-C message send is turned by the compiler into an objc_msgSend function call. The objc_msgSend function definition can be found in the Objective-C Runtime open source project and is
void objc_msgSend(void /* id self, SEL op, ... */ )
Thus, an Objective-C message send becomes an objc_msgSend function call where the first argument is the receiver and the second argument is the selector. The remaining arguments simply follow. (The reason for the function being declared as taking void is to prevent using objc_msgSend without a cast, so that the compiler can use the correct calling convention at the call site.) As with a plain function, the return value of an Objective-C method will be placed in %rax. It is important to note that the runtime uses a few additional functions for sending messages, mainly for special cases such as messaging super, or methods that return floating-point values or structs. However, we won’t be discussing these here.
So, taking the following method as an example, the compiler will transform it into a function call:
- (NSString *)substringToIndex:(NSUInteger)idx;
((NSString *(*)(id, SEL, NSUInteger))objc_msgSend)(self, @selector(substringToIndex:), idx)
This means that just before the call instruction, the receiver self can be found in %rdi, the selector in %rsi and the method’s first argument in %rdx. The return value (a pointer to an NSString object) will be located in %rax upon return of the function.
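The cast-before-call pattern is not specific to Objective-C, so it can be sketched in plain C. In the sketch below everything is a hypothetical stand-in: generic_imp plays the role of the deliberately vague objc_msgSend declaration, and demo_length_to_index plays the role of a method implementation that receives the receiver first, the selector second and the real argument third. It is only an analogy, not how the runtime dispatches messages.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for a generic entry point declared without useful parameters. */
typedef void (*generic_imp)(void);

/* Hypothetical "method": receiver first, selector second, then the argument.
 * It clamps idx to the length of the receiver string. */
static unsigned long demo_length_to_index(const char *self,
                                          const char *selector,
                                          unsigned long idx)
{
    (void)selector; /* unused in this sketch */
    assert(self != NULL);
    unsigned long length = (unsigned long)strlen(self);
    return idx < length ? idx : length;
}

unsigned long demo_send(const char *receiver, unsigned long idx)
{
    generic_imp imp = (generic_imp)demo_length_to_index;

    /* Cast back to the real signature so that the compiler places the
     * receiver, the selector and idx where the callee expects them. */
    return ((unsigned long (*)(const char *, const char *, unsigned long))imp)(
        receiver, "substringToIndex:", idx);
}
```

The point of the cast is the same as with objc_msgSend: the generic pointer carries no information about arguments or return type, so the call site has to spell them out.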
Block structure
With all this in mind, we can now get back to our block discussion. The LLVM project has a very useful page for the Block Implementation Specification. The libclosure Open Source page is also very useful if one wants to know more about Apple’s block implementation. One key part is the definition of the structure of a block:
struct Block_literal_1 {
    void *isa; // initialized to &_NSConcreteStackBlock or &_NSConcreteGlobalBlock
    int flags;
    int reserved;
    void (*invoke)(void *, ...);
    struct Block_descriptor_1 {
        unsigned long int reserved; // NULL
        unsigned long int size; // sizeof(struct Block_literal_1)
        // optional helper functions
        void (*copy_helper)(void *dst, void *src); // IFF (1<<25)
        void (*dispose_helper)(void *src); // IFF (1<<25)
        // required ABI.2010.3.16
        const char *signature; // IFF (1<<30)
    } *descriptor;
    // imported variables
};
The first member isa is interesting since it is the reason why a block is, after all, an Objective-C object. After a flags integer and a reserved member we find the actual function pointer, void (*invoke)(void *, ...);. The last member is a reference to a Block_descriptor_1 struct that contains additional data such as copy and dispose function pointers, a size and a signature.
It is important to notice that the first parameter of the invoke function is a pointer, which hints that this is actually the block itself, similarly to the receiver being the first argument of the objc_msgSend function. If only this argument were easy to get from within a block body, it would make recursive blocks less ugly and error-prone, in particular when using ARC.
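To see why passing the block as the first argument of invoke is useful, here is a hand-rolled imitation of a block in plain C. All the names are hypothetical and the struct is only modelled on the layout above; it is a sketch of the idea, not what the compiler actually emits. The invoke function reaches the captured variable through its first parameter, exactly like a method reaches instance variables through self.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a block literal capturing a single long. */
struct demo_block {
    void *isa;
    int flags;
    int reserved;
    long (*invoke)(struct demo_block *self, long arg);
    void *descriptor;
    long captured; /* the "imported variable" */
};

/* The block body: the first parameter is the block itself. */
static long demo_invoke(struct demo_block *self, long arg)
{
    return arg + self->captured;
}

struct demo_block demo_make_block(long captured)
{
    struct demo_block block = { NULL, 0, 0, demo_invoke, NULL, captured };
    return block;
}

/* Calling a block means calling its invoke pointer with the block first. */
long demo_call_block(struct demo_block *block, long arg)
{
    assert(block != NULL && block->invoke != NULL);
    return block->invoke(block, arg);
}
```

Creating a block with demo_make_block(2) and calling it with 22 adds the captured value to the argument, which mirrors the arg2 + capturedInteger expression used in the sample program of this article.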
In practice
Following is a simple sample program that we will use in order to inspect a block in the debugger:
// clang -framework Foundation -fobjc-arc -o block block.m
#import <Foundation/Foundation.h>
@interface HelperClass : NSObject
- (void)doThingWithBlock:(BOOL (^)(NSString *arg1, NSInteger arg2))block;
@end
@implementation HelperClass
- (void)doThingWithBlock:(BOOL (^)(NSString *arg1, NSInteger arg2))block
{
    block(@"Oh Hai", 22);
}
@end
int main(int argc, char **argv)
{
    @autoreleasepool {
        HelperClass *object = [HelperClass new];
        NSInteger capturedInteger = 2;
        [object doThingWithBlock:^ BOOL (NSString *arg1, NSInteger arg2) {
            NSInteger someInteger = arg2 + capturedInteger;
            printf("%p %li\n", arg1, someInteger);
            return YES;
        }];
        return 0;
    }
}
Compile the code and launch the program in the debugger by running
$ clang -framework Foundation -fobjc-arc -o block block.m
$ lldb block
Then, in LLDB, set a breakpoint on the method call and run the program
(lldb) breakpoint set --name "-[HelperClass doThingWithBlock:]"
(lldb) run
We should now hit our breakpoint in the doThingWithBlock: method. We have stopped execution at the very start of the method implementation. The first few instructions, a series of push and mov, are the function prologue and take care of storing the value of registers that should be preserved by pushing them on the stack. As we saw earlier, if we print the content of the registers mentioned above we should be able to retrieve our arguments.
(lldb) po $rdi
$2 = 4296049056 <HelperClass: 0x1001081a0>
(lldb) p (char *)$rsi
(char *) $3 = 0x0000000100000ef2 "doThingWithBlock:"
(lldb) po $rdx
$4 = 140734799804432 <__NSStackBlock__: 0x7fff5fbff810>
Note that here we relied on the fact that selectors are currently just strings. This might not always be the case though, so a more robust solution would be to actually get the string representation of a selector by means of the function const char *sel_getName(SEL sel) from the runtime. This function currently returns the selector cast to (const char *), but again this could easily change in the future.
We are mostly interested in the first method argument (the third argument of the objc_msgSend function), which happens to be a stack block. Now, obviously we know exactly what the arguments, return value and body of this block are since we wrote it, but if you were stepping through framework code and found such an instance, you would have no idea. And from experience, the actual interesting bits often happen to be in that very block. Well, this is unfortunately where most people would stop investigating, but it is also exactly where any debugging aficionado starts to have some fun!
As we mentioned above, we are mainly interested in knowing two things from the block:
- the body of the invoke function
- the block signature
Invoke function
Let’s start with the invoke function. We saw in the block structure that the invoke function pointer is the fourth member in the struct. We previously printed the address of the block itself, so if we manage to infer the address of the function pointer it should be trivial to disassemble it.
Assuming a 64-bit system, we know exactly the size of each member in the struct. We are only interested in the members positioned before the function pointer: isa is a pointer (8 bytes), while flags and reserved are both int (4 bytes each).
We can thus conclude that the function pointer is positioned 16 bytes down the struct. LLDB has a handy tool to read from the memory at a particular address of the process being debugged. Let’s read the memory at the block address, nicely formatted by chunks of 8 bytes (the size of a pointer on 64-bit)
(lldb) memory read --size 8 --format x 0x7fff5fbff810
0x7fff5fbff810: 0x00007fff76b420e0 0x0000000040000000
0x7fff5fbff820: 0x0000000100000de0 0x0000000100001190
0x7fff5fbff830: 0x0000000000000002 0x0000000000000002
0x7fff5fbff840: 0x00000001001081a0 0x00007fff5fbff880
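The way these 8-byte chunks map onto the block struct can be sketched in C. The struct and function names below are hypothetical; the sketch assumes the 64-bit, little-endian layout discussed in this article, where flags occupies the low 4 bytes of the second chunk and reserved the high 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the fixed part of a block on 64-bit. */
struct block_header {
    uint64_t isa;        /* bytes 0..7   */
    uint32_t flags;      /* bytes 8..11  */
    uint32_t reserved;   /* bytes 12..15 */
    uint64_t invoke;     /* bytes 16..23 */
    uint64_t descriptor; /* bytes 24..31 */
};

/* words: the first four values printed by `memory read --size 8`. */
struct block_header decode_block_header(const uint64_t words[4])
{
    struct block_header header;

    assert(words != NULL);
    header.isa = words[0];
    /* Little-endian: the member stored first ends up in the low half. */
    header.flags = (uint32_t)(words[1] & 0xffffffffu);
    header.reserved = (uint32_t)(words[1] >> 32);
    header.invoke = words[2];
    header.descriptor = words[3];
    return header;
}
```

Feeding it the four values from the first two lines of the memory dump yields the invoke and descriptor addresses directly, without counting bytes by hand.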
As previously said, the function pointer is located after 16 bytes of content in the struct so we can conclude from the memory reading that the function address itself is 0x0000000100000de0. If we try to disassemble from this address we should hopefully get the first few instructions of this function.
(lldb) disassemble --start-address 0x0000000100000de0
block__main_block_invoke:
0x100000de0: pushq %rbp
0x100000de1: movq %rsp, %rbp
0x100000de4: subq $64, %rsp
0x100000de8: movq %rdi, -40(%rbp)
0x100000dec: movq %rsi, %rdi
0x100000def: movq %rdx, -48(%rbp)
0x100000df3: callq 0x100000e68 ; symbol stub for: objc_retain
0x100000df8: leaq 208(%rip), %rdi ; "%p %li\n"
One could now try to disassemble further instructions by guessing an --end-address or simply set a breakpoint at the address of the first instruction and step through.
Block signature
Now that we managed to disassemble the actual block invoke function, it would be nice to know the block signature. We could probably infer it from the content of the registers in the function prologue, but there is a more robust way to get it. You might have noticed the signature string in the descriptor struct in the block. Now if we could get it we could surely create an NSMethodSignature from it.
First things first, we need to get to the actual descriptor struct. This struct is held by reference (probably so that its content can be changed down the line without having to change the block definition itself). The descriptor struct pointer is positioned just after the function pointer, which we know to be 8 bytes in size. From our memory read above we can deduce that its address is 0x0000000100001190.
However, it is not a given that the descriptor struct will actually hold a signature. Luckily, the block has a flags mask that gives us some hints about it. The block specification documents the flags in use in the mask as
enum {
    BLOCK_HAS_COPY_DISPOSE = (1 << 25),
    BLOCK_HAS_CTOR = (1 << 26), // helpers have C++ code
    BLOCK_IS_GLOBAL = (1 << 28),
    BLOCK_HAS_STRET = (1 << 29), // IFF BLOCK_HAS_SIGNATURE
    BLOCK_HAS_SIGNATURE = (1 << 30),
};
The flags integer is the second member in the block struct, located 8 bytes in, right after the isa pointer. Reading the memory from the block address formatted in chunks of 4 bytes (the size of an int), we find the flags in the third chunk, 0x40000000:
(lldb) memory read --size 4 --format x 0x7fff5fbff810
0x7fff5fbff810: 0x76b420e0 0x00007fff 0x40000000 0x00000000
0x7fff5fbff820: 0x00000de0 0x00000001 0x00001190 0x00000001
We can quickly check that ((0x40000000 & (1 << 30)) != 0) so we indeed have a signature.
Note that the documentation states that not every block has a signature. From my experience though, both global blocks (blocks that do not capture any surrounding variable and are thus optimised to live in a fixed global location rather than being created on the stack) and stack blocks (and malloc blocks by extension) do indeed have a signature. A quick look at the Clang source for the block emitter (see in particular the implementation of the CodeGenFunction::EmitBlockLiteral and buildGlobalBlock functions in CGBlocks.cpp) does show that any block (even global blocks, for which the block literal emission function takes a fast path) has the BLOCK_HAS_SIGNATURE flag set. We can thus be quite confident that blocks in code built with Clang will have a signature.
In order to find the address of the signature variable we also need to figure out whether the optional members of the descriptor struct (copy_helper and dispose_helper) are actually populated. In our case ((0x40000000 & (1 << 25)) == 0) so our block doesn’t have copy and dispose helper function pointers.
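Both flag checks, and the offset computation that follows from them, can be put together in a short C sketch. The two flag values come from the block specification quoted above; the function names are hypothetical, and the offsets assume the descriptor layout shown earlier in this article (two unsigned long members, then the optional helpers, then the signature).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum {
    BLOCK_HAS_COPY_DISPOSE = (1 << 25),
    BLOCK_HAS_SIGNATURE    = (1 << 30),
};

bool block_has_copy_dispose(int flags)
{
    return (flags & BLOCK_HAS_COPY_DISPOSE) != 0;
}

bool block_has_signature(int flags)
{
    return (flags & BLOCK_HAS_SIGNATURE) != 0;
}

/* Byte offset of the signature pointer inside the descriptor struct. */
size_t block_signature_offset(int flags)
{
    /* Asking for the offset only makes sense if a signature is present. */
    assert(block_has_signature(flags));

    size_t offset = 2 * sizeof(unsigned long); /* reserved + size */
    if (block_has_copy_dispose(flags)) {
        /* copy_helper + dispose_helper sit before the signature. */
        offset += 2 * sizeof(void (*)(void));
    }
    return offset;
}
```

On a 64-bit Mac, where both unsigned long and function pointers are 8 bytes, this gives an offset of 16 bytes for our flags value of 0x40000000, and 32 bytes for a block that also has copy and dispose helpers.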
So let’s summarise: the signature string pointer will be positioned after two unsigned long variables in the descriptor struct. An unsigned long being 8 bytes on 64-bit, we expect the signature pointer to be 16 bytes into the struct. Let’s inspect the memory at the descriptor address and print the (suitably cast) string.
(lldb) memory read --size 8 --format x 0x0000000100001190
0x100001190: 0x0000000000000000 0x0000000000000028
0x1000011a0: 0x0000000100000ed7 0x0000000100000eef
0x1000011b0: 0x0000000000000000 0x0000000000000000
0x1000011c0: 0x0000000000000000 0x0000000000000000
(lldb) p (const char *) 0x0000000100000ed7
(const char *) $8 = 0x0000000100000ed7 "c24@?0@8q16"
You might think this is junk but it is actually a signature! We can easily find out by creating an NSMethodSignature from it:
(lldb) po [NSMethodSignature signatureWithObjCTypes:"c24@?0@8q16"]
$9 = 0x000000010010b060 <NSMethodSignature: 0x10010b060>
number of arguments = 3
frame size = 224
is special struct return? NO
return value: -------- -------- -------- --------
type encoding (c) 'c'
flags {isSigned}
modifiers {}
frame {offset = 0, offset adjust = 0, size = 8, size adjust = -7}
memory {offset = 0, size = 1}
argument 0: -------- -------- -------- --------
type encoding (@) '@?'
flags {isObject, isBlock}
modifiers {}
frame {offset = 0, offset adjust = 0, size = 8, size adjust = 0}
memory {offset = 0, size = 8}
argument 1: -------- -------- -------- --------
type encoding (@) '@'
flags {isObject}
modifiers {}
frame {offset = 8, offset adjust = 0, size = 8, size adjust = 0}
memory {offset = 0, size = 8}
argument 2: -------- -------- -------- --------
type encoding (q) 'q'
flags {isSigned}
modifiers {}
frame {offset = 16, offset adjust = 0, size = 8, size adjust = 0}
memory {offset = 0, size = 8}
If all this is Greek to you I suggest reading the Type Encodings in the Objective-C Runtime Programming Guide.
What this says in English is simply: a function that returns a char (aka a BOOL), takes three arguments, a block as first (which is our block reference), a reference to an object as second (our NSString) and a long long as third (the NSInteger).
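For the curious, the structure of such a string can be pulled apart with very little code. The following C sketch is a deliberately minimal parser with hypothetical names: it only understands single-character type codes plus the two-character @? block code, each followed by a decimal number, which is enough for our signature but not for structs, pointers or type qualifiers. The number after the return type is the total size of the argument frame, and the number after each argument is its offset in that frame.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct encoded_item {
    char type[3]; /* "c", "@", "@?", "q", ... */
    long number;  /* frame size (first item) or argument offset */
};

/* Parses a simple type encoding; returns the number of items stored. */
size_t parse_simple_signature(const char *sig,
                              struct encoded_item *items,
                              size_t max_items)
{
    size_t count = 0;

    assert(sig != NULL && items != NULL);
    while (*sig != '\0' && count < max_items) {
        /* "@?" (a block) is the only two-character code handled here. */
        size_t length = (sig[0] == '@' && sig[1] == '?') ? 2 : 1;
        memcpy(items[count].type, sig, length);
        items[count].type[length] = '\0';
        sig += length;

        if (!isdigit((unsigned char)*sig)) {
            break; /* unsupported or malformed encoding: stop here */
        }
        char *end = NULL;
        items[count].number = strtol(sig, &end, 10);
        sig = end;
        count++;
    }
    return count;
}
```

Running it over our signature string produces four items: the return type c with a frame size of 24, then @? at offset 0, @ at offset 8 and q at offset 16, which matches the offsets NSMethodSignature printed above.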
We were thus able to find the signature of a given block and disassemble its invoke function. See, debugging is fun!
If you have any comments, you can leave them below or contact me on Twitter.