Advanced Apple Debugging & Reverse Engineering

Fourth Edition · iOS 16, macOS 13.3 · Swift 5.8, Python 3 · Xcode 14

12. Assembly & Memory
Written by Walter Tyree

You’ve begun the journey and learned the dark arts of the calling convention in the previous chapter. You now know how parameters are passed to a function when it’s called, and how its return value comes back. What you haven’t learned yet is how code executes once it’s loaded into memory.

In this chapter, you’ll explore how a program executes. You’ll look at a special register used to tell the processor where it should read the next instruction from, as well as how different sizes and groupings of memory can produce very different results.

Reviewing Reading Assembly

As you saw in the previous chapter, an assembly instruction contains an opcode, a destination and a source. Historically, there have been two formats for assembly code, called Intel and AT&T. They reverse the order of source and destination, and use different leading characters to denote registers, constants, etc. The default format for LLDB is Intel, which places the destination as the first argument after the opcode.

opcode  destination source

If you ever encounter a disassembly where those things are reversed, or where the registers are all prefixed with % symbols, you are reading AT&T format. Depending on what system you’re using at the time, there should be a setting to swap formats.

Before you move forward, another change to your LLDB setup will make some things a little easier. Before the body of a function can run, the function needs to make space on the stack and get all of the values into the right registers or into the right places on the stack. This is called the function prologue. After completing its work, a function needs to put everything back and clean up. This is the function epilogue.

Because these two parts aren’t particularly relevant to the logic of a function, LLDB’s default is to skip over the prologue when you set a breakpoint on a function. However, while you’re learning, it’s important to see how the prologue moves things around. So, you’ll change this setting.

Add the following line to the bottom of your ~/.lldbinit file:

settings set target.skip-prologue false

This line tells LLDB not to skip the function prologue. You came across this setting earlier in this book, and from now on it’s prudent not to skip the prologue, since you’ll be inspecting assembly right from the first instruction of a function.

Note: When editing your ~/.lldbinit file, don’t use a program like TextEdit, as it can insert extra characters that stop LLDB from parsing the file correctly. An easy (although dangerous) way to add this line is with a Terminal command like so:

echo "settings set target.skip-prologue false" >> ~/.lldbinit

Make sure you type two > characters (>>), which append to the file; with only one, you’ll overwrite everything already in your ~/.lldbinit file. If you’re not comfortable with the Terminal, an editor like nano (which you’ve used earlier) is your best bet.

Creating the cpx Command

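First of all, you’re going to create your own LLDB command, cpx, to help later on. It evaluates an expression in an Objective-C context (-l objc) and prints the result in hexadecimal (-f x). Add the following line to your ~/.lldbinit file so the alias is available in every session: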

command alias -H "Print value in ObjC context in hexadecimal" -h "Print in hex" -- cpx expression -f x -l objc --

Bits, Bytes and Other Terminology

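Before you begin exploring memory, you need to be aware of some vocabulary about how memory is grouped. A value that can contain either a 1 or a 0 is known as a bit. In a 64-bit architecture, addresses and general purpose registers are 64 bits wide. Simple enough.

Bits are grouped into larger units: 4 bits make a nibble, and 8 bits make a byte. For example, LLDB reports that the character literal 'A' takes up a single byte, which you can also print in binary and in hexadecimal: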

(lldb) p sizeof('A')
(unsigned long) $0 = 1
(lldb) p/t 'A'
(char) $1 = 0b01000001
(lldb) cpx 'A'
(char) $2 = 0x41
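To see the same three views of 'A' outside LLDB, here’s a minimal Swift sketch. It assumes 'A' is stored as a single ASCII byte (UInt8), which matches what sizeof reported above; the variable names are illustrative only.

```swift
// 'A' stored as one byte holding its ASCII code (assumption: matches sizeof('A') == 1).
let letterA = UInt8(ascii: "A")

// Size in bytes, like `p sizeof('A')`.
let sizeInBytes = MemoryLayout<UInt8>.size

// Binary, zero-padded to a full byte (8 bits), like `p/t 'A'`.
let rawBits = String(letterA, radix: 2)
let binary = "0b" + String(repeating: "0", count: 8 - rawBits.count) + rawBits

// Hexadecimal, like `cpx 'A'`: each hex digit covers exactly one nibble (4 bits).
let hex = "0x" + String(letterA, radix: 16)
let highNibble = letterA >> 4    // upper 4 bits
let lowNibble = letterA & 0x0F   // lower 4 bits

print(sizeInBytes, binary, hex, highNibble, lowNibble)   // 1 0b01000001 0x41 4 1
```

Because one hex digit maps to one nibble, a byte always fits in exactly two hex digits, which is why hexadecimal is the format of choice for reading memory.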

The Program Counter Register

When a program executes, the code to be executed is loaded into memory. The address of the next instruction to execute is held in one magically important register: pc, the program counter, also called the instruction pointer register.

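import Cocoa

@NSApplicationMain
class AppDelegate: NSObject, NSApplicationDelegate {

  // Called when the app is about to become active; always calls aBadMethod().
  func applicationWillBecomeActive(
    _ notification: Notification) {
    print("\(#function)")
    self.aBadMethod()
  }

  func aBadMethod() {
    print("\(#function)")
  }

  // Never called by the code above; you'll reach it by changing the pc register.
  func aGoodMethod() {
    print("\(#function)")
  }
}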

(lldb) cpx $pc
(unsigned long) $1 = 0x0000000100dfda78
(lldb) image lookup -vrn ^Registers.*aGoodMethod

(lldb) register write $pc 0x0000000100dfdc48
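As a rough mental model only, the sketch below treats pc as an address into a table of instructions. ToyCPU, step() and the addresses are hypothetical; a real CPU decodes machine code rather than looking up strings. It does capture two ideas from this chapter: on arm64 every instruction is 4 bytes, so straight-line execution advances pc by 4, and writing a new value to pc, as register write $pc does above, makes execution continue from that address.

```swift
// Hypothetical toy model of instruction fetch; not how a real CPU is built.
struct ToyCPU {
    var pc: UInt64                  // address of the next instruction
    let memory: [UInt64: String]    // address -> instruction label

    // Fetch the instruction at pc, then advance pc by the fixed
    // arm64 instruction size of 4 bytes. Returns nil past the end.
    mutating func step() -> String? {
        guard let instruction = memory[pc] else { return nil }
        pc += 4
        return instruction
    }
}

var cpu = ToyCPU(pc: 0x1000, memory: [
    0x1000: "aBadMethod #0",
    0x1004: "aBadMethod #1",
    0x2000: "aGoodMethod #0",
    0x2004: "aGoodMethod #1",
])

var trace: [String] = []
if let first = cpu.step() { trace.append(first) }   // runs "aBadMethod #0"
cpu.pc = 0x2000                                      // like `register write $pc ...`
while let next = cpu.step() { trace.append(next) }

print(trace)   // ["aBadMethod #0", "aGoodMethod #0", "aGoodMethod #1"]
```

The second instruction of aBadMethod never runs: once pc points somewhere else, nothing remembers where execution "should" have gone. That is also why a careless write to pc can crash the program.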

Registers and Breaking Up the Bits

As mentioned in the previous chapter, arm64 has 31 general purpose registers: x0 - x30. In order to maintain compatibility with 32-bit code, the lower 32 bits of each of these 64-bit registers can be accessed on their own through the matching w register, w0 - w30. For example, after you write a 64-bit value to x0, w0 shows only its lower half:

(lldb) register write x0 0x0123456789ABCDEF
(lldb) cpx $x0
(lldb) cpx $w0
0x89abcdef
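The relationship between x0 and w0 can be mirrored with Swift integer conversions. This is a sketch, not how LLDB reads registers: truncating a UInt64 to a UInt32 keeps exactly the lower 32 bits, which is what w0 shows. The second half models one arm64 rule about the reverse direction: writing to a w register zero-fills the upper 32 bits of the matching x register.

```swift
let x0: UInt64 = 0x0123456789ABCDEF

// w0 is the lower 32 bits of x0; truncation discards the upper 32 bits.
let w0 = UInt32(truncatingIfNeeded: x0)
print("0x" + String(w0, radix: 16))            // 0x89abcdef

// On arm64, writing a w register zero-extends the value into the x register,
// so the old upper bits of x0 do not survive.
let newW0: UInt32 = 0xDEADBEEF
let x0AfterWrite = UInt64(newW0)
print("0x" + String(x0AfterWrite, radix: 16))  // 0xdeadbeef
```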

Breaking Down the Memory

Now that you’ve taken a look at the program counter, it’s time to explore the memory it points to.

(lldb) cpx $pc
(lldb) memory read -fi -c1 0x100685a78
->  0x100685a78: 0xd10383ff   sub    sp, sp, #0xe0
(lldb) expression -f i -l objc -- 0xd10383ff
(unsigned int) $1 = 0xd10383ff   sub    sp, sp, #0xe0

(lldb) p/i 0xd10383ff
(lldb) memory read -fi -c4 0x1005eda78
(lldb) x/4i 0x1005eda78
0xd10383ff   sub    sp, sp, #0xe0
0xa90c4ff4   stp    x20, x19, [sp, #0xc0]
0xa90d7bfd   stp    x29, x30, [sp, #0xd0]
0x910343fd   add    x29, sp, #0xd0
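To make the encoding feel less magical, here’s a hedged Swift sketch that decodes one instruction class by hand: ADD/SUB (immediate), the class of sub sp, sp, #0xe0 and add x29, sp, #0xd0 above. The bit positions follow the arm64 encoding for that class. decodeAddSubImmediate is a hypothetical helper: it returns nil for every other instruction (including the two stp instructions), ignores aliases such as mov, and prints a shifted immediate as a single value, so its text won’t always match LLDB’s disassembler.

```swift
/// Hypothetical helper: decodes a 32-bit arm64 instruction word if it is a
/// non-flag-setting ADD or SUB (immediate); returns nil for anything else.
func decodeAddSubImmediate(_ word: UInt32) -> String? {
    // Bits 28-23 identify the ADD/SUB (immediate) class: 0b100010.
    guard (word >> 23) & 0b111111 == 0b100010 else { return nil }
    // Bit 29 (S) set means ADDS/SUBS, which this sketch doesn't handle.
    guard (word >> 29) & 1 == 0 else { return nil }

    let is64Bit = (word >> 31) & 1 == 1   // sf: x registers instead of w
    let isSub = (word >> 30) & 1 == 1     // op: SUB instead of ADD
    let shifted = (word >> 22) & 1 == 1   // sh: immediate shifted left by 12
    let imm12 = (word >> 10) & 0xFFF      // 12-bit unsigned immediate
    let rn = (word >> 5) & 0x1F           // source register number
    let rd = word & 0x1F                  // destination register number

    // In this instruction class, register number 31 means the stack pointer.
    func registerName(_ number: UInt32) -> String {
        if number == 31 { return is64Bit ? "sp" : "wsp" }
        return (is64Bit ? "x" : "w") + String(number)
    }

    let immediate = shifted ? imm12 << 12 : imm12
    let mnemonic = isSub ? "sub" : "add"
    return "\(mnemonic) \(registerName(rd)), \(registerName(rn)), #0x\(String(immediate, radix: 16))"
}

// The four words from the `x/4i` output above.
for word: UInt32 in [0xd10383ff, 0xa90c4ff4, 0xa90d7bfd, 0x910343fd] {
    print(String(word, radix: 16), "->", decodeAddSubImmediate(word) ?? "not ADD/SUB (immediate)")
}
```

Note how 0xe0 and 0xd0 sit in bits 21-10 of their words, and how register number 31 decodes to sp: the whole instruction, opcode and operands, fits in those 4 bytes.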

Endianness… This Stuff Is Reversed?

Apple’s ARM-based devices all use little-endian byte order, which means that the least significant byte of a value is stored first, at the lowest address. If you were to store the number 0xabcd in memory, the 0xcd byte would be stored first, followed by the 0xab byte.
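You can observe this byte order from Swift. The sketch below asks for each value’s bytes in little-endian order (via littleEndian, so the result doesn’t depend on the host) and lists them from lowest address to highest. The helper names are illustrative, not from any library.

```swift
/// Returns the bytes of `value` as a little-endian machine stores them,
/// lowest address first. Using `.littleEndian` keeps the result the same on any host.
func littleEndianBytes<T: FixedWidthInteger>(_ value: T) -> [UInt8] {
    withUnsafeBytes(of: value.littleEndian) { Array($0) }
}

/// Formats a byte as two hex digits, like LLDB's 0xNN output.
func hexByte(_ byte: UInt8) -> String {
    let digits = String(byte, radix: 16)
    return "0x" + (digits.count == 1 ? "0" + digits : digits)
}

let small: UInt16 = 0xabcd
print(littleEndianBytes(small).map(hexByte))         // ["0xcd", "0xab"]

// The first instruction word from the disassembly above.
let instruction: UInt32 = 0xd10383ff
print(littleEndianBytes(instruction).map(hexByte))   // ["0xff", "0x83", "0x03", "0xd1"]
```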

(lldb) p/i 0xff8303d1
0xff8303d1   .long  0xff8303d1 ; unknown opcode
(lldb) memory read -s1 -c20 -fx 0x1005eda78
0x1005eda78: 0xff 0x83 0x03 0xd1 0xf4 0x4f 0x0c 0xa9
0x1005eda80: 0xfd 0x7b 0x0d 0xa9 0xfd 0x43 0x03 0x91
0x1005eda88: 0xe8 0x03 0x14 0xaa
(lldb) memory read -s2 -c10 -fx 0x1005eda78
0x1005eda78: 0x83ff 0xd103 0x4ff4 0xa90c 0x7bfd 0xa90d 0x43fd 0x9103
0x1005eda88: 0x03e8 0xaa14
(lldb) memory read -s4 -c5 -fx 0x1005eda78
0x1005eda78: 0xd10383ff 0xa90c4ff4 0xa90d7bfd 0x910343fd
0x1005eda88: 0xaa1403e8
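The three memory read commands above print the same 20 bytes; only the group size changes. This sketch reproduces that by reassembling little-endian groups from the -s1 byte dump. littleEndianGroups is a hypothetical helper, not an LLDB API. It also shows why p/i 0xff8303d1 failed: reading the first four bytes in big-endian order is the same as byte-swapping the real instruction word.

```swift
// The 20 bytes printed by `memory read -s1 -c20 -fx` above, lowest address first.
let bytes: [UInt8] = [
    0xff, 0x83, 0x03, 0xd1, 0xf4, 0x4f, 0x0c, 0xa9,
    0xfd, 0x7b, 0x0d, 0xa9, 0xfd, 0x43, 0x03, 0x91,
    0xe8, 0x03, 0x14, 0xaa,
]

/// Hypothetical helper: splits `bytes` into `size`-byte groups and reads each
/// group as a little-endian integer, similar to `memory read -s<size> -fx`.
/// Leftover bytes that don't fill a whole group are ignored.
func littleEndianGroups(_ bytes: [UInt8], size: Int) -> [UInt64] {
    precondition((1...8).contains(size), "size must fit in a UInt64")
    var groups: [UInt64] = []
    var start = 0
    while start + size <= bytes.count {
        var value: UInt64 = 0
        for offset in 0..<size {
            // The byte at the lowest address is the least significant one.
            value |= UInt64(bytes[start + offset]) << (8 * offset)
        }
        groups.append(value)
        start += size
    }
    return groups
}

for size in [2, 4] {
    let groups = littleEndianGroups(bytes, size: size)
    print("-s\(size):", groups.map { "0x" + String($0, radix: 16) })
}

// Reading the first instruction's bytes in the opposite (big-endian) order
// yields the value that `p/i 0xff8303d1` could not decode.
let firstWord = UInt32(truncatingIfNeeded: littleEndianGroups(bytes, size: 4)[0])
print("0x" + String(firstWord.byteSwapped, radix: 16))   // 0xff8303d1
```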

Key Points

  • The default format for assembly in LLDB is opcode destination source, which is referred to as “Intel” format.
  • LLDB skips the function prologue when you set a breakpoint on a function. You can change this using the target.skip-prologue setting.
  • A bit is a single 0 or 1 value. Bits are grouped into larger chunks called nibbles (4 bits), bytes (8 bits), words (32 bits) and double words (64 bits).
  • Use register read and register write to inspect and change the values in the registers during an LLDB session.
  • The pc register is technically read-only, but you can write to it at the risk of crashing everything.
  • ARM64 uses a w prefix to refer to the lower 32 bits of any x register.
  • Every ARM64 instruction, opcode and operands together, is encoded in exactly 4 bytes, regardless of what it does.
  • ARM64 uses little-endian encoding, where the least significant byte is stored first.

Where to Go From Here?

Good job getting through this one. Memory layout can be a confusing topic. Try exploring memory on other devices to make sure you have a solid understanding of the little-endian architecture and how assembly is grouped together.

© 2024 Kodeco Inc.
