[Tutorial] An Incredibly Brief Introduction to ARM Assembly
Posted: Fri Feb 06, 2015 4:21 am
Recently, an ROP (return-oriented programming) hack was used to discover and exploit a vulnerability in the PS Vita's WebKit process. One of the many things to arise from this is the ability to dump and analyze the Vita's API modules, potentially exposing more useful or interesting security exploits. Unfortunately, dumped modules consist entirely of ARM machine code, something that the average human would find impossible to read and wrap their head around.
One possible solution to this problem is to translate the machine code back into a human-readable form in a process known as “disassembling” the code. After being disassembled, each machine code instruction is shown as its ARM assembly equivalent. Assembly is the second lowest level of ARM programming possible, sitting just above the machine code itself.
So, without further ado, here it is:
An Incredibly Brief Introduction to ARM Assembly (And Assembly in General), Part 1
Most programmers would probably barf at ARM assembly. For people who know how to program in high level languages like C, C++, Java, C#, etc., assembly would probably look like gibberish. Most high level operations are intuitive, like adding numbers (number + number), setting a variable value (number = number), or even calling functions (function_name()). In assembly, all of these are handled by a set of operations with a specific syntax, called instructions.
Instructions in assembly often reference things like registers, the stack pointer or the program counter. These are integral parts of the hardware inside the CPU, and it's very important to know what they do in order to successfully program or analyze any assembly language.
Registers:
- Registers are small, fast storage slots inside the CPU that hold the values a program is working with, along with various CPU-specific data. They're a faster alternative to keeping variables in memory.
The classic 32-bit ARM architecture has 37 registers, each of which holds 32 bits.
Registers in assembly are referenced using r followed by the register number (e.g. r0, r1, etc.)
The first 13 registers (r0 – r12) are General Purpose Registers (GPRs) that can be used to store anything.
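- r15 is the Program Counter (PC). It holds the address in memory of the instruction the CPU is working on. A PC is one of the most basic components of a CPU: it normally counts upward one instruction at a time, and a branch works by writing a new address into it. In ARM state every instruction is 4 bytes long and word-aligned, so the lowest 2 bits of the PC are always 0, leaving 2^30 possible instruction addresses. (Because of pipelining, reading r15 in ARM state actually yields the address of the current instruction plus 8.)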
- r13 is the Stack Pointer (SP). The number in this register points to the top of the Stack.
The stack in ARM assembly is like any other stack: First In, Last Out.
When a subroutine needs to call another subroutine, it first saves its own return address (and any registers it needs to preserve) on the Stack. To return, it copies the saved address from the top of the Stack back into the PC.
By convention the ARM Stack grows downward in memory: whenever a 32-bit value is pushed, the SP is decremented by 4 (bytes), and whenever one is popped, the SP is incremented by 4.
Stack Overflow errors (common with runaway recursion) are caused by the Stack growing past the memory reserved for it, which overwrites unrelated data or triggers a fault and usually crashes the program.
- r14 is the Link Register (LR). The Link Register fulfills the same purpose as the Stack on a smaller scale. A branch-with-link instruction (BL) automatically stores the return address in the LR. When a subroutine doesn't call another subroutine (called a leaf subroutine), it's faster to return straight from the LR than to save the address to the Stack AND update the SP. A subroutine that does make further calls has to save the LR on the Stack first, because the next BL will overwrite it.
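- The Current Program Status Register (CPSR) houses information about the current state of the program, such as the condition flags (negative, zero, carry, overflow) and the current processor mode.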
http://en.wikipedia.org/wiki/ARM_architecture#Registers
- The keen will have noticed that not all 37 registers have been mentioned. The ARM processor has several operating modes, and only some of the registers can be accessed in each mode. Each mode always has an r0 – r15 and a CPSR, but some of those names (such as r13 and r14) are backed by a different physical register depending on the mode. These are known as banked registers.
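To make the SP, LR and PC a bit less abstract, here is a rough Python sketch of how the three interact during a subroutine call. It's a toy model, not real hardware behaviour: the ToyCPU class, its method names and all of the addresses are made up for illustration, it assumes 4-byte ARM-state instructions and a stack that grows downward, and it ignores the pipeline offset you see when reading r15.

```python
# Toy model of r13 (SP), r14 (LR) and r15 (PC) during subroutine calls.
# Assumptions: 4-byte instructions, a stack that grows downward, and
# made-up addresses. The class and method names are hypothetical.

WORD = 4  # bytes per 32-bit word


class ToyCPU:
    def __init__(self, stack_top=0x1000):
        self.memory = {}       # address -> value, stands in for RAM
        self.sp = stack_top    # r13
        self.lr = 0            # r14
        self.pc = 0            # r15

    def push(self, value):
        """Like PUSH: move SP down one word, then store the value there."""
        self.sp -= WORD
        self.memory[self.sp] = value

    def pop(self):
        """Like POP: read the value at SP, then move SP up one word."""
        value = self.memory.pop(self.sp)
        self.sp += WORD
        return value

    def branch_with_link(self, target):
        """Like BL: save the address of the next instruction in LR, then jump."""
        self.lr = self.pc + WORD
        self.pc = target

    def return_from_leaf(self):
        """Like BX LR: a leaf subroutine returns straight from LR."""
        self.pc = self.lr


def demo():
    cpu = ToyCPU()
    cpu.pc = 0x100
    cpu.branch_with_link(0x200)   # main calls outer(); LR = 0x104
    cpu.push(cpu.lr)              # outer() is not a leaf, so it saves LR
    cpu.pc = 0x204                # outer() moves on to its next instruction
    cpu.branch_with_link(0x300)   # outer() calls leaf(); LR is overwritten
    cpu.return_from_leaf()        # leaf() returns into outer()
    back_in_outer = cpu.pc
    cpu.pc = cpu.pop()            # outer() returns by popping saved LR into PC
    return back_in_outer, cpu.pc, cpu.sp
```

Calling demo() walks through one nested call: the leaf returns using only the LR, while the outer subroutine has to go through the Stack to find its way back, and the SP ends up where it started.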
ARM is based on the RISC (Reduced Instruction Set Computing) design philosophy as opposed to CISC (Complex Instruction Set Computing). RISC follows the K.I.S.S. method and generally has fewer, simpler instructions to remember, each of which tends to execute quickly. However, the downside is that to perform complex operations that CISC might encode as one or two instructions, multiple RISC instructions will be required, making it harder to analyze an algorithm.
One example is multiplication:
In CISC, MULT will multiply two operands in memory and store the result back, all in one instruction.
In RISC, you have to load two values into two registers (two separate instructions), multiply with another instruction and store the result back into memory with a fourth instruction.
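The four-step RISC sequence above can be sketched in Python as a tiny load/store machine. This is only an illustration: LDR, MUL and STR are real ARM mnemonics, but the run() and multiply_in_memory() helpers, the tuple encoding and the memory addresses are all hypothetical, and real ARM loads take their address from a register rather than a literal number.

```python
# Toy load/store machine: arithmetic only ever touches registers.
# LDR, MUL and STR are real ARM mnemonics; everything else here
# (helper names, tuple encoding, addresses) is made up for illustration.

MASK32 = 0xFFFFFFFF  # registers hold 32 bits


def run(program, memory):
    regs = [0] * 16
    for op, *args in program:
        if op == "LDR":      # load: register = memory[addr]
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "MUL":    # multiply: rd = rm * rs, keeping the low 32 bits
            rd, rm, rs = args
            regs[rd] = (regs[rm] * regs[rs]) & MASK32
        elif op == "STR":    # store: memory[addr] = register
            rd, addr = args
            memory[addr] = regs[rd]
        else:
            raise ValueError(f"unknown op {op}")
    return regs


def multiply_in_memory(a, b):
    memory = {0x10: a, 0x14: b, 0x18: 0}
    program = [
        ("LDR", 0, 0x10),    # 1. load the first operand into r0
        ("LDR", 1, 0x14),    # 2. load the second operand into r1
        ("MUL", 2, 0, 1),    # 3. multiply in registers, result in r2
        ("STR", 2, 0x18),    # 4. store the result back to memory
    ]
    run(program, memory)
    return memory[0x18]
```

For example, multiply_in_memory(6, 7) leaves 42 at the result address. The multiply itself never sees memory, which is why the disassembly of even a simple C expression tends to be wrapped in loads and stores.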
You can read more about the RISC and CISC standards here: http://cs.stanford.edu/people/eroberts/ ... /risccisc/
This tutorial also uses information from the following resource:
http://simplemachines.it/doc/arm_inst.pdf
End of Part 1!
In the next part, we'll set up our environment for making and testing simple ARM assembly programs.