[Tutorial]An Incredibly Brief Introduction to ARM Assembly

Posted: Fri Feb 06, 2015 4:21 am
by watswat5
Recently, a ROP hack was used to exploit a vulnerability in the PS Vita's WebKit process. One of the many things to come out of this is the ability to dump and analyze the Vita's API modules, potentially exposing more useful or interesting security holes. Unfortunately, dumped modules consist entirely of ARM machine code, something that the average human would find nearly impossible to read and wrap their head around.

One possible solution to this problem is to translate the machine code back into human-readable instructions in a process known as “disassembling” the code. A disassembler replaces each machine code instruction with its ARM assembly equivalent. Assembly is the second lowest level of ARM programming possible, sitting just above raw machine code.

So, without further ado, here it is:

An Incredibly Brief Introduction to ARM Assembly (And Assembly in General), Part 1

Most programmers would probably barf at ARM assembly. For people who know how to program in high level languages like C, C++, Java or C#, assembly would probably look like gibberish. Most high level operations are intuitive, like adding numbers (number + number), setting a variable's value (number = number), or even calling functions (function_name()). In assembly, all of these are handled by a set of operations with a specific syntax, called instructions.

Instructions in Assembly often reference things like registers, the stack pointer or the program counter. These are integral parts of the hardware inside the CPU, and it's very important to know what they do in order to successfully program or analyze any assembly language.

Registers:
  • Registers are small, fast storage locations inside the CPU that hold the values a program is currently working with, along with some CPU specific data. They're a much faster alternative to keeping variables in memory.
    The classic 32 bit ARM architecture has 37 registers, each of which holds 32 bits.
    Registers in assembly are referenced using r followed by the register number (e.g. r0, r1, etc.)
    The first 13 registers (r0 – r12) are General Purpose Registers (GPRs) that can be used to store anything.
Program Counter
  • r15 is the Program Counter (PC). It holds the memory address of the instruction the CPU is fetching, so it determines what runs next. In ARM mode every instruction is 4 bytes long, so the PC normally advances by 4 after each instruction and its lowest 2 bits always remain 0. Because of the processor's pipeline, reading r15 from within an instruction gives the address of that instruction plus 8.
Stack Pointer
  • r13 is the Stack Pointer (SP). The address in this register points to the top of the Stack.
    The stack in ARM assembly is like any other stack: First In, Last Out.
    The Stack is where a subroutine saves values it needs to get back later, such as the return address held in the Link Register (see below) before it calls another subroutine. Restoring the saved address into the PC returns execution to the original location in memory.
    By convention the ARM Stack grows downward: whenever a 32 bit value is pushed onto the Stack, the SP is decremented by 4 (bytes). Whenever one is popped off, the SP is incremented by 4.
    Stack Overflow errors (common with runaway recursion) are caused by the Stack growing past the memory set aside for it, so it overwrites other data or touches invalid memory and the program misbehaves or crashes.
Link Register
  • r14 is the Link Register (LR). When a subroutine is called with a Branch with Link instruction, the CPU stores the return address in LR rather than on the Stack. If the subroutine doesn't call any other subroutine (a leaf subroutine), it can simply copy LR back into the PC to return, which is faster than saving the address to the Stack AND adjusting the SP. A subroutine that does call others must first save LR on the Stack, because the next call overwrites it.
Current Program Status Register and All the Rest
  • The Current Program Status Register (CPSR) holds the condition flags and the current processor mode. The keen will have noticed that not all 37 registers have been mentioned. The ARM processor has several operation modes, and only some of the registers can be accessed in each mode. Each mode always has an r0 – r15 and a CPSR, but the actual physical registers behind those names can differ from mode to mode.
Throughout an assembly code file, these registers will be referenced regularly. It's best to remember them or keep a handy reference nearby!
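
To make the Stack Pointer behaviour a bit more concrete, here's a minimal Python sketch of a stack that grows downward in 4 byte steps, which is the usual ARM convention. The starting address 0x1000, the class name and the pushed values are all made up for illustration; nothing here comes from the Vita.

```python
# Toy model of an ARM-style stack that grows toward lower addresses.
# Assumptions: memory is a dict of addresses, and SP starts at 0x1000.
WORD = 4  # each register holds 32 bits = 4 bytes


class Stack:
    def __init__(self, top=0x1000):
        self.sp = top        # plays the role of r13
        self.memory = {}     # address -> 32 bit value

    def push(self, value):
        self.sp -= WORD                            # SP moves DOWN first...
        self.memory[self.sp] = value & 0xFFFFFFFF  # ...then the value is stored

    def pop(self):
        value = self.memory.pop(self.sp)  # read the word SP points at
        self.sp += WORD                   # SP moves back UP
        return value


stack = Stack()
stack.push(0x8004)   # e.g. a saved return address
stack.push(42)
print(hex(stack.sp))                   # 0xff8
print(stack.pop(), hex(stack.pop()))   # 42 0x8004
```

The last value pushed is the first one popped, and the SP ends up back where it started once everything has been popped.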

ARM is based on the RISC (Reduced Instruction Set Computing) design philosophy as opposed to CISC (Complex Instruction Set Computing). RISC follows the K.I.S.S. method: it generally has fewer, simpler instructions to remember, each of which executes quickly. The downside is that a complex operation that CISC might encode as one or two instructions requires several RISC instructions, which can make an algorithm harder to follow in a disassembly.

One example is multiplication:

In CISC, a single MULT instruction can multiply two operands in memory and store the result back.
In RISC, you have to load the two values into two registers (two separate instructions), multiply them with a third instruction and store the result back into memory with a fourth.
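
The four RISC steps can be sketched in Python as a toy load/store machine. The addresses (0x100 to 0x108), the values 6 and 7 and the helper names are assumptions made up for the example; the point is only that the arithmetic never touches memory directly.

```python
# Toy load/store machine: arithmetic only ever works on registers.
# The addresses and their contents are invented for this example.
memory = {0x100: 6, 0x104: 7, 0x108: 0}
regs = {"r0": 0, "r1": 0, "r2": 0}


def load(reg, addr):
    """Memory -> register."""
    regs[reg] = memory[addr]


def multiply(dst, a, b):
    """Registers only; the result is kept to 32 bits."""
    regs[dst] = (regs[a] * regs[b]) & 0xFFFFFFFF


def store(reg, addr):
    """Register -> memory."""
    memory[addr] = regs[reg]


load("r0", 0x100)            # instruction 1
load("r1", 0x104)            # instruction 2
multiply("r2", "r0", "r1")   # instruction 3
store("r2", 0x108)           # instruction 4
print(memory[0x108])         # 42
```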

You can read more about the RISC and CISC standards here: http://cs.stanford.edu/people/eroberts/ ... /risccisc/

This tutorial also uses information from the following resource:
http://simplemachines.it/doc/arm_inst.pdf

End of Part 1!

In the next part, we'll set up our environment for making and testing simple ARM assembly programs.

Posted: Fri Feb 06, 2015 7:51 am
by watswat5
And now for...

An Incredibly Brief Introduction to ARM Assembly (And Assembly in General), Part 2

All RISCiness aside, we can begin to actually delve into code.

The following reference to all the ARM instructions will come in handy:

http://infocenter.arm.com/help/topic/co ... 01_UAL.pdf

P.S. We're learning about the standard ARM instruction set. ARM has support for another set of instructions called the Thumb instruction set, but we'll look at those later. Remember this whenever you're looking at ARM documentation!

Also, please note that I'm still learning with you guys. This tutorial will mostly gloss over what I've learned and been learning about ARM in the hopes that it encourages more people to bust into the Vita hacking scene. If there's any incorrect or misleading information, please let me know.

The rest of the tutorial will be in Linux. If you don't use Linux, I'm sure you can find alternatives to everything I'm doing. If you feel so inclined, a virtual PC with Ubuntu is what I'm using and it works just fine.

Basically, we're going to grab 2 things: QEMU and the ARM GNU Toolchain.

Windows has some support for QEMU and the GNU toolchain, but I won't be going through the install process for any Windows based computers.

For QEMU, type in the terminal the following:

Code: Select all

sudo apt-get install qemu

sudo apt-get install qemu-system
This will install QEMU and all the emulator cores, including the ARM one we'll be using.

For the ARM Toolchain, type the following commands in the terminal:

Code: Select all

sudo add-apt-repository ppa:terry.guo/gcc-arm-embedded 

sudo apt-get update 

sudo apt-get install gcc-arm-none-eabi 
The ARM GNU Toolchain will take our assembly and generate the ARM binary file.

The following is a bash script that handles a lot of monotonous typing for you. The script uses the toolchain to assemble an assembly file (filename.s) into an object file (filename.o), link that into an executable (filename.elf) and finally convert it to a raw binary file (filename.bin). The binary file is then copied into a 16MB file called flash.bin that QEMU uses to simulate the ARM's flash memory.

Code: Select all

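#!/bin/bash
# Assemble, link and package filename.s into a flash image for QEMU.
# Run this from the directory that contains filename.s.
set -e  # stop at the first step that fails

read -r -p "Enter File Name (No File Extension): " filename

outdir="$HOME/Desktop/$filename"
mkdir -p "$outdir"  # -p: don't fail if the folder already exists

# filename.s -> filename.o -> filename.elf -> filename.bin
arm-none-eabi-as -o "$outdir/$filename.o" "$filename.s"
arm-none-eabi-ld -Ttext=0x0 -o "$outdir/$filename.elf" "$outdir/$filename.o"
arm-none-eabi-objcopy -O binary "$outdir/$filename.elf" "$outdir/$filename.bin"

# Create a 16MB zero-filled flash image, then copy the program to its start.
dd if=/dev/zero of="$outdir/flash.bin" bs=4096 count=4096
dd if="$outdir/$filename.bin" of="$outdir/flash.bin" bs=4096 conv=notrunc

# Write (rather than append) the launcher so re-runs don't duplicate the command.
echo "qemu-system-arm -M connex -pflash \"$outdir/flash.bin\" -nographic -serial /dev/null" > "$outdir/QEMU.sh"
chmod +x "$outdir/QEMU.sh"

echo "Done!"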
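
If the two dd commands look cryptic, here's a rough Python equivalent of just that step. It's a sketch under assumptions: build_flash is a name I made up, the output goes to a temporary folder instead of the desktop, and the 8 bytes used as the "program" are only a stand-in for filename.bin.

```python
# Sketch of the two dd steps: make a zero-filled 16MB image, then overwrite
# its first bytes with the program binary without shrinking the image.
import os
import tempfile

FLASH_SIZE = 4096 * 4096  # bs=4096 count=4096 -> 16777216 bytes


def build_flash(program: bytes, flash_path: str) -> None:
    with open(flash_path, "wb") as flash:
        flash.write(bytes(FLASH_SIZE))   # like: dd if=/dev/zero ...
    with open(flash_path, "r+b") as flash:
        flash.seek(0)
        flash.write(program)             # like: dd ... conv=notrunc


# Placeholder bytes standing in for the contents of filename.bin.
program = b"\x05\x00\xa0\xe3\xfe\xff\xff\xea"
path = os.path.join(tempfile.mkdtemp(), "flash.bin")
build_flash(program, path)
print(os.path.getsize(path))             # 16777216
```

Opening the image in "r+b" mode is what keeps the file at its full size, which is the job conv=notrunc does for dd.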

That's all the setup we'll need. To make sure everything is working, let's run through an incredibly simple assembly program!

Open up a new blank document and enter the following:

Code: Select all

	.text
start:
	mov r0, #5
stop:
	b stop
Save the document to the desktop with a .s extension.

Now, run the bash script I provided from the folder that contains your file (the desktop, if you followed along) and input the name of your file WITHOUT THE EXTENSION.

In the end, there should be a folder on the desktop with the same name as your file. Inside it will be the files we talked about earlier (file.o, file.elf, file.bin and flash.bin) as well as QEMU.sh.
Run QEMU.sh and, if all went well, the terminal should display the QEMU prompt “(qemu)”.

If you don't get the prompt, or if something before this step isn't working, make sure that you've installed all the packages and that your assembly code is exactly what I've posted.

If you do get the prompt, congrats! That's all the hard stuff done.

Now, in the terminal, type “info registers” and it should display the registers we talked about earlier (r0 – r15). If you observe r0, you'll notice the value 5 is stored there. This is good, as it is essentially the only thing our program was supposed to do!

Assuming all went well, let's return and dissect our program.

Code: Select all

.text
This is a directive. It tells the assembler to place everything that follows in the text section, which holds the program's code. If we wanted any global variables, we'd define them in a separate section introduced by the .data directive.

Code: Select all

start:
This is a label. It acts like the method name, and if we ever want to jump to this memory location we can just refer to the label.

Code: Select all

mov r0, #5
mov copies a value into a register in ARM assembly. Here, we're setting r0 to 5.
The # tells the assembler that 5 is a constant (an immediate value). Alternatively, we could copy from another register like r5. In our QEMU setup r5 and r0 both start out as 0, though, so this would have been pretty boring.

Code: Select all

stop:
Another label.

Code: Select all

b stop
Infinitely loops back to stop. b is short for Branch and essentially jumps to the label stop.

So, our program should have loaded #5 into r0 and paused indefinitely. By viewing the registers in the QEMU prompt, we can visually see if this actually happened. Right now, these registers are the only form of I/O that we have, so we'll be using them a bunch.
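
If you're curious what the assembler actually produced, the two instructions can be encoded by hand. The Python below is a minimal sketch that only knows the MOV-with-constant and B encodings; the function names are mine, and it assumes the program is placed at address 0, as the -Ttext=0x0 option in the script requests.

```python
# Hand-encode the two instructions of the test program.
# Only "MOV Rd, #imm8" and "B label" are handled here.
import struct

COND_ALWAYS = 0xE  # condition field meaning "always execute"


def encode_mov_imm(rd: int, imm8: int) -> int:
    """MOV Rd, #imm8 with no rotation (imm8 must fit in 8 bits)."""
    assert 0 <= rd <= 15 and 0 <= imm8 <= 0xFF
    return (COND_ALWAYS << 28) | (0x3A << 20) | (rd << 12) | imm8


def encode_b(instr_addr: int, target_addr: int) -> int:
    """B target: the offset is counted in words, relative to instr_addr + 8."""
    offset = (target_addr - (instr_addr + 8)) >> 2
    return (COND_ALWAYS << 28) | (0xA << 24) | (offset & 0xFFFFFF)


words = [
    encode_mov_imm(0, 5),  # start: mov r0, #5   (at address 0)
    encode_b(4, 4),        # stop:  b stop       (at address 4)
]
print([hex(w) for w in words])              # ['0xe3a00005', '0xeafffffe']
print(struct.pack("<2I", *words).hex())     # 0500a0e3feffffea
```

The second line of output is the byte order you'd see in a hex dump of the .bin file, since the words are stored little-endian.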

End of Part 2

In the next part, we'll learn some more instructions and implement some simple algorithms!

Posted: Sat Feb 07, 2015 1:41 am
by watswat5
An Incredibly Brief Introduction to ARM Assembly (And Assembly in General), Part 3
(This is gonna be a relatively long one)

So, now we have everything we need to begin writing our own assembly programs for ARM. Now would be a good time to learn some basic instructions!

*We'll be looking at what's really a tiny fraction of ARM instructions. For an in depth list, check the link in part 2 or the ARM website.

For the next program we're writing, we'll only need two more instructions than we already know.

Code: Select all

ADD Rd, Rn, Operand2
The above instruction adds the values of Rn and Operand2 and stores the result in Rd. As you might expect, Rd and Rn are registers. Operand2, however, can be either a register or a constant value.

Code: Select all

ADD r0, r1, #16
This takes r0 and sets it equal to r1 + 16.

Code: Select all

ADD r0, r0, #1
This increments r0 by 1.

On the opposite end of ADD is SUB, which simply subtracts Operand2 from Rn rather than adding.

Code: Select all

SUB r0, r0, #1
This decrements r0 rather than incrementing it.

There's another instruction for subtraction called RSB (reverse subtraction)

Code: Select all

RSB Rd, Rn, Operand2
RSB subtracts Rn from Operand2 rather than vice versa, and stores the result in Rd.

Code: Select all

RSB r0, r1, #100
This sets r0 equal to 100 – r1.

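RSB and SUB are both required because, unlike addition, subtraction is NOT commutative, and only Operand2 can be a constant. Without RSB, there would be no single instruction for computing 100 – r1.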
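
Here's a small Python sketch of what these three instructions compute. The function names simply mirror the mnemonics, and the masking with 0xFFFFFFFF is there because registers hold only 32 bits, so results wrap around instead of growing or going negative.

```python
MASK = 0xFFFFFFFF  # registers are 32 bits wide, so results wrap around


def add(rn, operand2):
    return (rn + operand2) & MASK


def sub(rn, operand2):
    return (rn - operand2) & MASK


def rsb(rn, operand2):
    return (operand2 - rn) & MASK   # operands swapped compared with SUB


r1 = 30
print(sub(r1, 100) == rsb(r1, 100))  # False
print(rsb(r1, 100))                  # 70, i.e. 100 - r1
print(hex(sub(0, 1)))                # 0xffffffff, because 0 - 1 wraps around
```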

So, with all this information newly planted in your noggin, let's test another assembly program.

Code: Select all

	.text
start:
	mov r0, #5
	mov r1, #5
	add r2, r1, r0
	sub r0, r2, #2
	b stop
stop:
	b stop
After assembling and running the program in QEMU, check the registers with the “info registers” command. r0 should display 8, r1 should display 5 and r2 should display A (HEX for 10).

Success! We've begun to learn kindergarten math in ARM assembly!

Now, let's learn about another common math operator, MUL.

Code: Select all

MUL Rd, Rm, Rs
This is a simple multiplication. It takes Rm * Rs and puts the low 32 bits of the result in Rd.

What you may have noticed is that there is no Operand2. There's a very good reason for this.

One ARM instruction takes up 32 bits. Of these, some are taken by the opcode, some by the register numbers and some by the condition (which we haven't touched on yet), so only a few bits are left for a constant. The encoding of MUL has no room for a constant at all, so both values being multiplied have to come from registers, which also means each can be a full 32 bit number.

Unfortunately, this also means that MUL takes an extra step of loading a second register with a value.
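
The same shortage of bits limits the constants that instructions like MOV and ADD can take. In the classic ARM encoding, a constant has to be an 8 bit number rotated right by an even number of bits. The Python function below is a sketch of that rule (the name is_arm_immediate is mine); assemblers have their own tricks for constants that don't fit, which this doesn't try to model.

```python
def is_arm_immediate(value: int) -> bool:
    """True if value fits the classic ARM constant form:
    an 8 bit number rotated right by an even number of bits."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Undo a rotate-right by `rot` bits with a rotate-left by `rot` bits.
        rotated = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if rotated <= 0xFF:
            return True
    return False


for constant in (5, 100, 0xFF000000, 0x101, 0x12345678):
    print(hex(constant), is_arm_immediate(constant))
```

Small numbers like 5 and 100 fit, as does 0xFF000000, but 0x101 and 0x12345678 don't, because their set bits can't be covered by a single 8 bit window.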

Let's make a simple assembly program that does some more simple math.

Code: Select all

	.text
start:
	mov r0, #5
	mov r1, #6
	mul r2, r1, r0
	b stop
stop:
	b stop
Running and displaying the registers should result in r2 being 1E, which is exactly 30 in base 10!

Now, since we have access to MUL, we can assume that we have access to some kind of division, right?

Hahahahahahahahahahahahahaha no.

Although some do, most ARM processors don't come with any way of performing hardware integer division. For this, we must resort to a subroutine, which is basically the same as a function in other languages.

Although many subroutines exist for integer division, writing our own is great practice. The next program we write will implement division by repeated subtraction!

Before we begin, however, we need to look at some more instructions.

Code: Select all

CMP Rm, Rn
This instruction does a couple of things that add up to a simple comparison:
First, it subtracts Rn from Rm. It doesn't store the answer, however.
Next, it updates the flags in the CPSR to describe the result.

If you remember back to part one, we discussed the registers of the ARM environment. However, I mostly glossed over the Current Program Status Register (CPSR). Now would be a great time to return to the topic!

The CPSR contains some information about the running program. Of most interest to us are the flags it contains. These flags, each represented as 1 bit, tell the CPU the result of a comparison, like when we use CMP.
The N and Z flags indicate when the result of the subtraction is Negative (its top bit is set) or Zero (Rm = Rn).
C and V represent Carry and Overflow. After a CMP, C is set when no borrow was needed (Rm >= Rn as unsigned numbers), and V is set when the subtraction overflowed as a signed operation.

CMP updates all four flags. Based on them, we can know if Rm is >, < or = Rn and run instructions accordingly.

CMP would be useless without some actual conditional statements, however!

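Most instructions in ARM support something called a conditional suffix. With conditional suffixes, instructions like ADD can become ADDLO (ADD if CMP shows that Rm < Rn) or ADDHI (ADD if CMP shows that Rm > Rn). LO and HI treat the values as unsigned numbers; for signed comparisons the matching suffixes are LT and GT.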

There are many more suffixes for conditional code execution. You can view them all here:
http://infocenter.arm.com/help/index.js ... BHJCJ.html
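
To see how the flags and a handful of suffixes fit together, here's a Python sketch of CMP. The function and dictionary names are my own, and only six of the condition codes are modelled.

```python
MASK = 0xFFFFFFFF


def cmp_flags(rm: int, rn: int) -> dict:
    """Flags produced by CMP Rm, Rn (computes Rm - Rn, discards the result)."""
    rm &= MASK
    rn &= MASK
    result = (rm - rn) & MASK
    return {
        "N": result >> 31 == 1,                       # top bit of the result
        "Z": result == 0,                             # result is zero
        "C": rm >= rn,                                # no borrow (unsigned)
        "V": ((rm ^ rn) & (rm ^ result)) >> 31 == 1,  # signed overflow
    }


CONDITIONS = {
    "EQ": lambda f: f["Z"],
    "NE": lambda f: not f["Z"],
    "LO": lambda f: not f["C"],                       # unsigned <
    "HI": lambda f: f["C"] and not f["Z"],            # unsigned >
    "LT": lambda f: f["N"] != f["V"],                 # signed <
    "GT": lambda f: not f["Z"] and f["N"] == f["V"],  # signed >
}


def passes(cond: str, rm: int, rn: int) -> bool:
    return CONDITIONS[cond](cmp_flags(rm, rn))


print(passes("LO", 0, 5))            # True
print(passes("HI", 25, 5))           # True
print(passes("LO", 0xFFFFFFFF, 1))   # False: as unsigned, 0xFFFFFFFF is huge
print(passes("LT", 0xFFFFFFFF, 1))   # True: as signed, 0xFFFFFFFF is -1
```

The last two lines show why it matters which suffix you pick: the same bits compare differently depending on whether you treat them as signed or unsigned.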

Now that we have the ability to execute conditional statements, we can continue to make our simple division algorithm!

When considering an algorithm, I find it helpful to construct some pseudo code in a high level language (English works):

Code: Select all

Is the Divisor (bottom) larger than the Dividend (top)?
    Yes:
        Return the remainder and the counter (quotient)
    No:
        Subtract the Divisor from the Dividend
        Increment the counter (quotient) by 1
        Repeat the method
By looking at the pseudo code, we can see that we'll need 3 variables: the divisor, the dividend and the counter. In ARM assembly, it's standard to use r0 – r3 for passing values to subroutines, so that's what we'll do!

- r0 will contain the dividend
- r1 will contain the divisor
- r2 will contain the counter
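
Before moving to assembly, the pseudo code can be checked in a high level language. This Python version is a sketch of the same idea; the zero check is my addition, since with a divisor of 0 the loop would never finish.

```python
def divide(dividend: int, divisor: int):
    """Unsigned division by repeated subtraction.
    Returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("the loop below would never end")
    counter = 0                   # the quotient
    while dividend >= divisor:    # stop once the divisor is larger
        dividend -= divisor
        counter += 1
    return counter, dividend      # whatever is left over is the remainder


print(divide(25, 5))   # (5, 0)
print(divide(17, 5))   # (3, 2)
```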

So, with that being said, let's begin!

Open the file with the standard .text, start and stop labels.

Code: Select all

	.text
start:
	b stop
stop:
	b stop
Now, move a number (25 here) to r0 and another (5) to r1

Code: Select all

	.text
start:
	mov r0, #25
	mov r1, #5
	b stop
stop:
	b stop
Now, we're going to branch to the subroutine.
Remember how we use

Code: Select all

b stop
to jump to the stop label? Well, we're going to do something similar to call a subroutine:

Code: Select all

	.text
start:
	mov r0, #25
	mov r1, #5
	bl divide
	b stop
stop:
	b stop
BL stands for Branch with Link. Before jumping, it saves the address of the next instruction (here, b stop) in the Link Register (LR) so we can return here when we're done.
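
The difference between B and BL can be modelled in a few lines of Python. This is only a sketch: the addresses assume the program starts at address 0 with 4 bytes per instruction (so bl divide sits at 0x08 and the subroutine would start at 0x10), and ret stands for copying LR back into the PC.

```python
# Toy model of B versus BL. Addresses are byte addresses and every
# instruction takes up 4 bytes.
regs = {"pc": 0, "lr": 0}


def b(instr_addr, target):
    regs["pc"] = target              # plain jump, nothing is remembered


def bl(instr_addr, target):
    regs["lr"] = instr_addr + 4      # address of the instruction after the BL
    regs["pc"] = target


def ret():
    regs["pc"] = regs["lr"]          # copy LR back into the PC


bl(0x08, 0x10)            # a BL at address 0x08 calling a subroutine at 0x10
print(hex(regs["lr"]))    # 0xc
ret()
print(hex(regs["pc"]))    # 0xc, so execution resumes right after the BL
```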

Now, we need a label for the division. I'm going to use “divide” for convenience.

Code: Select all

	.text
start:
	mov r0, #25
	mov r1, #5
	bl divide
	b stop
divide:

stop:
	b stop
Now let's convert the pseudo code to assembly.

Code: Select all

Is the Divisor (bottom) larger than the Dividend (top)?
    This is represented with a CMP of the dividend (r0) and the divisor (r1)
        CMP r0, r1
If Yes, return
    This is done with a MOV command and the LO (unsigned lower than) suffix.
    We use MOVLO to copy the Link Register back into the PC
        MOVLO pc, lr
If No, Do Stuff
    Subtract divisor (r1) from dividend (r0)
        SUB r0, r0, r1
    Increment the counter (r2)
        ADD r2, r2, #1
    Repeat subroutine
        b divide
Piecing all this together gives us the following:

Code: Select all

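	.text
start:
	mov r0, #25		@ dividend
	mov r1, #5		@ divisor
	mov r2, #0		@ counter (quotient) starts at 0
	bl divide
	b stop
divide:
	cmp r0, r1		@ compare dividend with divisor
	movlo pc, lr		@ dividend < divisor: return
	sub r0, r0, r1		@ dividend = dividend - divisor
	add r2, r2, #1		@ counter = counter + 1
	b divide		@ repeat
stop:
	b stop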
Where r0 will hold the remainder and r2 will hold the quotient.

After assembling and testing with the bash script, the output of r0 should be 0, r1 and r2 should both be 5.

Try plugging in different values for r0 and r1 to see if it keeps working. Remember, the registers are in HEX and not decimal!

This method of dividing, although nice and easy to understand, is actually very inefficient. Try implementing a better division algorithm in assembly as practice!
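
As a starting point for that exercise, here's one better-known approach sketched in Python: binary long division (shift and subtract). It always takes 32 steps, however large the quotient is, whereas repeated subtraction takes one step per unit of the quotient. This is just one option and not necessarily the best; the function name is mine, and translating it to assembly needs shift operations that this tutorial hasn't covered yet.

```python
def divide_shift_subtract(dividend: int, divisor: int):
    """Unsigned 32 bit long division in base 2.
    Returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    for bit in range(31, -1, -1):       # walk the dividend from bit 31 down
        # Bring down the next bit of the dividend.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        if remainder >= divisor:        # does the divisor fit?
            remainder -= divisor
            quotient |= 1 << bit        # then this quotient bit is 1
    return quotient, remainder


print(divide_shift_subtract(25, 5))          # (5, 0)
print(divide_shift_subtract(1_000_000, 7))   # (142857, 1)
```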

End of Part 3