An Incredibly Brief Introduction to ARM Assembly (by watswat5)
Note from Wololo: This tutorial was initially published by /Talk member watswat5, as part of our monthly tutorial contest. Watswat5 won the “mods award” prize in February (a $10 PSN Code) for his entry. You can find the original post here.
Recently, a ROP (return-oriented programming) exploit was used to discover and exploit a vulnerability in the PS Vita WebKit process. One of the many things to arise from this is the possibility of dumping and analyzing the Vita’s API modules, potentially exposing more useful or interesting security exploits. Unfortunately, dumped modules consist entirely of ARM machine code, something that the average human would find impossible to read and wrap their head around.
One possible solution to this problem is to translate the machine code back into human-readable instructions, a process known as “disassembling” the code. Disassembly does not recover the original source code; instead, each machine code instruction is replaced with its ARM assembly equivalent. Assembly is the second lowest level of ARM programming possible, sitting just above raw machine code.
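To make the idea of disassembling a little more concrete, here is a minimal sketch in Python that turns a few 32-bit ARM instruction words back into text. It is only an illustration: the helper name disassemble is made up for this example, and it handles nothing but unconditional, register-only ADD, SUB and MOV instructions with no shift. A real disassembler covers far more encodings.

```python
# Tiny illustrative decoder for a small subset of 32-bit ARM instructions.
# "disassemble" is a hypothetical helper, not part of any real tool.

OPCODES = {0b0010: "SUB", 0b0100: "ADD", 0b1101: "MOV"}

def disassemble(word):
    cond = (word >> 28) & 0xF       # condition field, 0b1110 means "always"
    kind = (word >> 26) & 0x3       # 0b00 is the data-processing class
    imm = (word >> 25) & 0x1        # 1 would mean operand 2 is an immediate
    opcode = (word >> 21) & 0xF     # which operation to perform
    set_flags = (word >> 20) & 0x1  # 1 would mean "update the condition flags"
    rn = (word >> 16) & 0xF         # first source register
    rd = (word >> 12) & 0xF         # destination register
    shift = (word >> 4) & 0xFF      # shift applied to Rm (0 means none)
    rm = word & 0xF                 # second source register

    supported = (cond == 0b1110 and kind == 0 and imm == 0 and shift == 0
                 and set_flags == 0 and opcode in OPCODES)
    if not supported:
        return "<not handled by this sketch>"
    name = OPCODES[opcode]
    if name == "MOV":
        return f"MOV r{rd}, r{rm}"
    return f"{name} r{rd}, r{rn}, r{rm}"

print(disassemble(0xE0810002))  # ADD r0, r1, r2
print(disassemble(0xE1A00001))  # MOV r0, r1
```

Every instruction is just a 32-bit number whose bit fields say which operation to run and which registers to use; a disassembler reads those fields and prints the matching mnemonic.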
So, without further ado, here it is:
An Incredibly Brief Introduction to ARM Assembly (And Assembly in General), Part 1
Most programmers would probably barf at ARM assembly. For people who know how to program in high level languages like C, C++, Java or C#, assembly will probably look like gibberish. Most high level operations are intuitive, like adding numbers (number + number), setting a variable value (number = number), or even calling functions (function_name()). In assembly, all of these are handled by a set of low level operations with a specific syntax, called instructions.
Instructions in Assembly often reference things like registers, the stack pointer or the program counter. These are integral parts of the hardware inside the CPU, and it’s very important to know what they do in order to successfully program or analyze any assembly language.
Registers:
- Registers are temporary placeholders for variables that are used by the program and various CPU specific data. They’re a faster alternative to saving variables in memory.
- The classic 32-bit ARM architecture has 37 registers, each of which holds 32 bits.
- Registers in assembly are referenced using r followed by the register number (e.g. r0, r1, etc.)
- The first 13 registers (r0 – r12) are General Purpose Registers (GPRs) that can be used to store anything.
Program Counter
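- r15 is the Program Counter (PC). It holds the address in memory that the processor is fetching instructions from. A PC is one of the most basic components of a CPU: it is essentially a counter that advances to the next instruction after each one runs, unless a branch loads a new address into it. In ARM state (as opposed to the more compact Thumb state) every instruction is 4 bytes long and word-aligned, so the PC moves in steps of 4 and the bottom 2 bits of an instruction address are always 0. That leaves 30 meaningful bits, or 2^30 possible instruction locations.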
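As a rough illustration of how the PC moves in ARM state, the Python sketch below steps a counter through four consecutive instructions. The start address 0x8000 and the helper name next_pc are made up for the example, and branches are ignored entirely.

```python
# Sketch: stepping a program counter through 32-bit ARM-state instructions.
INSTRUCTION_SIZE = 4  # bytes per instruction in ARM state

def next_pc(pc):
    # Advance to the following instruction, wrapping at 32 bits.
    return (pc + INSTRUCTION_SIZE) & 0xFFFFFFFF

pc = 0x8000  # hypothetical start address
trace = []
for _ in range(4):
    trace.append(pc)
    pc = next_pc(pc)

print([hex(a) for a in trace])            # ['0x8000', '0x8004', '0x8008', '0x800c']
print(all(a & 0b11 == 0 for a in trace))  # True: the low two bits stay 0
```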
Stack Pointer
- r13 is the Stack Pointer (SP). The number in this register points to the top of the Stack.
- The stack in ARM assembly is like any other stack: First In, Last Out.
- When a subroutine is called, the address to return to has to be remembered somewhere. On ARM, the branch-with-link instruction (BL) first puts that return address in the Link Register (r14, described below). A subroutine that calls other subroutines then pushes it onto the Stack so it isn’t lost. Returning is simply a matter of copying the saved address back into the PC.
- On ARM the Stack conventionally grows downward in memory (a “full descending” stack): pushing a 32-bit value decrements the SP by 4 bytes, and popping one increments it by 4.
- Stack Overflow errors (common with runaway recursion) happen when the Stack grows past the memory reserved for it. Further pushes then overwrite unrelated data or hit unmapped memory, which usually corrupts or crashes the program.
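The push and pop behaviour can be sketched in a few lines of Python. This assumes the full-descending convention normally used on ARM, where the SP moves down by 4 bytes on a push and back up by 4 on a pop. The Stack class, the starting address 0x1000 and the pushed values are all invented for the illustration.

```python
# Sketch of a full-descending stack: SP points at the last pushed word
# and moves toward lower addresses on every push.
WORD = 4  # bytes in one 32-bit value

class Stack:
    def __init__(self, top_address):
        self.sp = top_address   # stack pointer (r13)
        self.memory = {}        # address -> 32-bit word

    def push(self, value):
        self.sp -= WORD                         # make room first...
        self.memory[self.sp] = value & 0xFFFFFFFF  # ...then store

    def pop(self):
        value = self.memory.pop(self.sp)        # read the top word...
        self.sp += WORD                         # ...then release the space
        return value

stack = Stack(0x1000)     # hypothetical top-of-stack address
stack.push(0xAAAA)
stack.push(0xBBBB)
print(hex(stack.sp))      # 0xff8
print(hex(stack.pop()))   # 0xbbbb (last in, first out)
print(hex(stack.pop()))   # 0xaaaa
print(hex(stack.sp))      # 0x1000
```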
Link Register
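- r14 is the Link Register (LR). The Link Register fulfills the same purpose as the Stack on a smaller scale: holding a return address. When a subroutine is called with BL, the processor writes the return address into LR. A subroutine that doesn’t call any other subroutine (called a leaf subroutine) can simply leave it there and return by copying LR back into the PC, which is faster than saving it to the Stack AND adjusting the SP. A subroutine that does make further calls has to save LR first (usually on the Stack), because the next call overwrites it. Code can also keep a return address in one of the GPRs by hand, but BL always uses LR.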
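Here is a small Python sketch of how a call and return move addresses between the PC and LR, and why a subroutine that makes its own calls has to save LR first. The addresses and helper names are hypothetical, and for simplicity "pc" here means the address of the instruction currently running.

```python
# Sketch: branch-with-link (BL) and returning through the Link Register.
# "pc" models the address of the instruction currently executing.
regs = {"pc": 0, "lr": 0}

def branch_with_link(regs, target):
    # BL: remember the address of the instruction after the call, then jump.
    regs["lr"] = regs["pc"] + 4
    regs["pc"] = target

def return_via_lr(regs):
    # BX lr: copy the saved return address back into the PC.
    regs["pc"] = regs["lr"]

saved = []                        # stands in for the Stack

regs["pc"] = 0x8010               # hypothetical address of the first BL
branch_with_link(regs, 0x9000)    # call a non-leaf subroutine
saved.append(regs["lr"])          # PUSH {lr}: keep 0x8014 safe

regs["pc"] = 0x9008               # a BL inside that subroutine
branch_with_link(regs, 0xA000)    # call a leaf; LR is overwritten
return_via_lr(regs)               # the leaf returns without touching the Stack
print(hex(regs["pc"]))            # 0x900c

regs["lr"] = saved.pop()          # POP {lr}: restore the outer return address
return_via_lr(regs)
print(hex(regs["pc"]))            # 0x8014
```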
Current Program Status Register
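- Known as the CPSR, this register holds status information about the running program: the condition flags (N, Z, C and V, for negative, zero, carry and overflow) set by arithmetic and comparison instructions, the interrupt disable bits, the Thumb state bit, and the current processor mode. See http://en.wikipedia.org/wiki/ARM_architecture#Registers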
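As an illustration, the Python sketch below pulls a few of the better-known fields out of a 32-bit CPSR value: the N, Z, C and V condition flags in the top four bits, the Thumb bit, and the mode number in the bottom five bits. The function name decode_cpsr and the sample value are made up, and the other CPSR fields are ignored.

```python
# Sketch: decoding some fields of a 32-bit CPSR value.
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}

def decode_cpsr(cpsr):
    return {
        "N": bool((cpsr >> 31) & 1),      # result was negative
        "Z": bool((cpsr >> 30) & 1),      # result was zero
        "C": bool((cpsr >> 29) & 1),      # carry out / no borrow
        "V": bool((cpsr >> 28) & 1),      # signed overflow
        "thumb": bool((cpsr >> 5) & 1),   # executing Thumb instructions
        "mode": MODES.get(cpsr & 0x1F, "unknown"),
    }

# Hypothetical value: Z and C set, ARM state, User mode.
print(decode_cpsr(0x60000010))
```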
All the Rest
- The keen will have noticed that not all 37 registers have been mentioned. The ARM processor has several operating modes, and only some of the registers can be accessed in each mode. Each mode always sees an r0 – r15 and a CPSR, but some of those names (such as r13 and r14) map to different physical registers depending on the mode. These are known as banked registers.
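The figure of 37 can be reproduced by tallying up the banked registers, as in the Python sketch below. It assumes the classic ARM layout (FIQ mode banking r8 – r14, the other exception modes banking r13 and r14, plus one saved status register per exception mode). The helper physical_register and its naming scheme are invented for the illustration.

```python
# Sketch: counting registers in the classic ARM banked layout.
SHARED = [f"r{i}" for i in range(16)]           # r0-r15 as seen in User/System mode

BANKED = {                                      # names that get a private copy per mode
    "FIQ": [f"r{i}" for i in range(8, 15)],     # r8-r14
    "IRQ": ["r13", "r14"],
    "Supervisor": ["r13", "r14"],
    "Abort": ["r13", "r14"],
    "Undefined": ["r13", "r14"],
}

general = len(SHARED) + sum(len(names) for names in BANKED.values())
status = 1 + len(BANKED)                        # CPSR + one saved copy per exception mode
print(general, status, general + status)        # 31 6 37

def physical_register(mode, name):
    # Hypothetical labels: banked names get a mode suffix, shared ones do not.
    if name in BANKED.get(mode, []):
        return f"{name}_{mode.lower()}"
    return name

print(physical_register("IRQ", "r13"))   # r13_irq
print(physical_register("IRQ", "r0"))    # r0
```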
Throughout an assembly code file, these registers will be referenced regularly. It’s best to remember them or keep a handy reference nearby!
ARM is based on the RISC (Reduced Instruction Set Computing) design philosophy as opposed to CISC (Complex Instruction Set Computing). RISC follows the K.I.S.S. method and generally has fewer, simpler instructions to remember, each of which is designed to execute quickly. However, the downside is that complex operations that CISC might express as one or two instructions require multiple RISC instructions, which can make an algorithm harder to follow when reading the assembly.
One example is multiplication:
In CISC, a MULT instruction can multiply two operands in memory and store the result back, all in one instruction.
In RISC, you have to load two values into two registers (two separate instructions), multiply with another instruction and store the result back into memory with a fourth instruction.
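The four-step RISC sequence can be sketched in Python as below. The helpers are loosely named after the ARM LDR, MUL and STR instructions, but they are simplified stand-ins: the memory addresses are made up, and a real LDR or STR takes its address from a register rather than directly.

```python
# Sketch: multiplying two values from memory the RISC way (load, load, multiply, store).
memory = {0x100: 6, 0x104: 7, 0x108: 0}   # hypothetical addresses and values
regs = [0] * 16                           # r0-r15

def ldr(rd, address):
    regs[rd] = memory[address]            # load a word from memory into a register

def mul(rd, rm, rs):
    regs[rd] = (regs[rm] * regs[rs]) & 0xFFFFFFFF   # keep the low 32 bits

def str_(rd, address):
    memory[address] = regs[rd]            # store a register back to memory

ldr(0, 0x100)     # r0 = first operand
ldr(1, 0x104)     # r1 = second operand
mul(2, 0, 1)      # r2 = r0 * r1
str_(2, 0x108)    # write the result back

print(memory[0x108])  # 42
```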
You can read more about the RISC and CISC standards here: http://cs.stanford.edu/people/eroberts/courses/soco/projects/risc/risccisc/
This tutorial also uses information from the following resource:
http://simplemachines.it/doc/arm_inst.pdf
End of Part 1!
To read more, check watswat5’s thread on /talk!
Want a chance to win a $10 PSN Code and be featured on the blog with your own guide? Join our monthly tutorial contest here!