# Computer <br> Organization and Structure 

Bing-Yu Chen
National Taiwan University

## Instructions:

## Language of the Computer

$\square$ Operations and Operands - of the Computer Hardware
$\square$ Signed and Unsigned Numbers
$\square$ Representing Instructions

- in the Computer
$\square$ Logical Operations
$\square$ Instructions for Making Decisions
$\square$ Supporting Procedures
- in Computer Hardware
$\square$ Communicating with People
$\square$ MIPS Addressing
- for 32-Bit Immediates and Addresses
$\square$ Translating and Starting a Program
$\square$ Arrays vs. Pointers


## Instruction Set

$\square$ The repertoire of instructions of a computer
$\square$ Different computers have different instruction sets

- But with many aspects in common
$\square$ Early computers had very simple instruction sets
- Simplified implementation
$\square$ Many modern computers also have simple instruction sets


## The MIPS Instruction Set

- Used as the example throughout the book
$\square$ Stanford MIPS commercialized by MIPS Technologies (www.mips.com)
$\square$ Large share of embedded core market
- Applications in consumer electronics, network/storage equipment, cameras, printers, ...
$\square$ Typical of many modern ISAs
- See MIPS Reference Data tear-out card, and Appendixes B and E


## Arithmetic Operations

$\square$ Add and Subtract, 3 operands

- 2 sources and 1 destination
$\square$ operand order is fixed
- destination first
- all arithmetic operations have this form
$\square$ Example:
- C code:
$a=b+c$
MIPS code:
add $a, b, c$


## Arithmetic Operations

$\square$ Design Principle 1:

- simplicity favors regularity
$\square$ Regularity makes implementation simpler
$\square$ Simplicity enables higher performance at lower cost


## Arithmetic Examples

$\square$ compiling two C assignments into MIPS

- C code: $\quad \begin{array}{ll}a=b+c ; \\ d=a-e ;\end{array}$
- MIPS code: add $a, b, c$ sub d, a, e
$\square$ compiling a complex $C$ assignment into MIPS
- C code: $\quad f=(g+h)-(i+j)$
- MIPS code: add $\$ t 0, g, h \quad \#$ temp t0 $=g+h$ add \$t1, i, j
\# temp t1 $=i+j$
sub $\mathrm{f}, \$ \mathrm{t} 0, \$ \mathrm{t} 1 \quad \# \mathrm{f}=\mathrm{t} 0-\mathrm{t} 1$


## Register Operands

$\square$ Of course this complicates some things...

- C code: $\quad a=b+c+d$;
- MIPS code: add $a, b, c$ add $a, a, d$
- where $a \& b \& c \& d$ mean registers
$\square$ Arithmetic instructions use register operands
- operands must be registers


## Register Operands

$\square$ MIPS has a $32 \times 32$-bit register file

- Use for frequently accessed data
- Numbered 0 to 31
- 32-bit data called a "word"
$\square$ Assembler names
- \$t0, \$t1, ..., \$t9 for temporary values
- \$s0, \$s1, ..., \$s7 for saved variables
$\square$ Design Principle 2:
- smaller is faster
$\square$ c.f. main memory: millions of locations


## Register Operand Example

$\square C$ code: $\quad f=(g+h)-(i+j)$

- assume f, ..., jin \$s0, .., \$s4
$\square$ MIPS code: add \$t0, \$s1, \$s2 add \$t1, \$s3, \$s4 sub \$s0, \$t0, \$t1


## Registers vs. Memory

$\square$ Arithmetic instructions operands must be registers

- only 32 registers provided
$\square$ Compiler associates variables with registers
$\square$ What about programs with lots of variables



## Memory Operands

$\square$ Main memory used for composite data - Arrays, structures, dynamic data
$\square$ To apply arithmetic operations

- Load values from memory into registers
- Store result from register to memory
$\square$ MIPS is Big Endian
- Most-Significant Byte at least address of a word
- c.f. Little Endian: Least-Significant Byte at least address


## Memory Organization

$\square$ viewed as a large, single-dimension array, with an address
$\square$ A memory address is an index into the array


## Big Endian vs. Little Endian



Big Endianness

Memory

## Memory Organization

$\square$ "bytes" are nice, but most data items use larger "words"
$\square$ for MIPS

- a word is 32 bits or 4 bytes

| 0 | 32 bits of data |
| :--- | :--- |
| 4 | 32 bits of data |
| 8 | 32 bits of data |
| 12 | 32 bits of data |

$\square 2^{32}$ bytes with byte addresses from 0 to $2^{32}-1$
$\square 2^{30}$ words with byte addresses $0,4,8, \ldots 2^{32}-4$
$\square$ words are aligned (alignment restriction)

- Address must be a multiple of 4
- What are the least 2 significant bits of a word address?


## Word Addressing



## Load \& Store Instructions

$\square$ C code: $\quad \mathrm{g}=\mathrm{h}+\mathrm{A}[8]$;

- $g$ in $\$ s 1, h$ in $\$ s 2$, base address of $A$ in $\$ s 3$
$\square$ MIPS code: Iw \$t0, 32(\$s3) add \$s1, \$s2, \$t0
- index 8 requires offset of 32
$\square 4$ bytes per word
$\square$ can refer to registers by name (e.g., \$s2, \$t0) instead of number


## Load \& Store Instructions

$\square$ C code: $\quad A[12]=h+A[8] ;$

- $h$ in $\$$ s2, base address of $A$ in $\$ s 3$
$\square$ MIPS code: Iw \$t0, 32(\$s3) add \$t0, \$s2, \$t0 sw \$t0, 48(\$s3)
$\square$ store word has destination last
$\square$ remember arithmetic operands are registers, not memory
- can't write: add 48(\$s3), \$s2, 32(\$s3)


## Registers vs. Memory

$\square$ Registers are faster to access than memory
$\square$ Operating on memory data requires loads and stores

- More instructions to be executed
$\square$ Compiler must use registers for variables as much as possible
- Only spill to memory for less frequently used variables
- Register optimization is important!


## Constants

$\square$ Small constants are used quite frequently

- e.g., $A=A+5$;
$B=B+1 ;$
$C=C-18 ;$
$\square$ Solutions? Why not?
- Put 'typical constants' in memory and load them?
- Create hard-wired registers (like \$zero) for constants like one?


## The Constant Zero

$\square$ MIPS register 0 (\$zero) is the constant 0

- Cannot be overwritten
$\square$ Useful for common operations
- add \$t2, \$s1, \$zero
$\square$ e.g., move between registers


## Immediate Operands

$\square$ Constant data specified in an instruction - addi \$s3, \$s3, 4
$\square$ No subtract immediate* instruction

- Just use a negative constant
- addi $\$$ s2, $\$$ s1, -1
$\square$ Design Principle 3:
- Make the common case fast
- Small constants are common
$\square$ Immediate operand avoids a load instruction


## Numbers

$\square$ Bits are just bits (no inherent meaning)

- conventions define relationship between bits and numbers
$\square$ Binary numbers (base 2)
- decimal: $0 . . .2^{\text {n }}-1$
$\square$ Of course it gets more complicated:
- numbers are finite (overflow)
- fractions and real numbers
- negative numbers


## Unsigned Binary Integers

$\square$ Given an $n$-bit number

$$
x=x_{n-1} 2^{n-1}+x_{n-2} 2^{n-2}+\cdots+x_{1} 2^{1}+x_{0} 2^{0}
$$

$\square$ Range: 0 to $+2^{n}-1$
$\square$ Example

- $00000000000000000000000000001011_{2}$ $=0+\ldots+1 \times 2^{3}+0 \times 2^{2}+1 \times 2^{1}+1 \times 2^{0}$ $=0+\ldots+8+0+2+1=11_{10}$
$\square$ Using 32 bits
- 0 to $+4,294,967,295$


## 2's-Complement Signed Integers

$\square$ Given an n-bit number

$$
x=-x_{n-1} 2^{n-1}+x_{n-2} 2^{n-2}+\cdots+x_{1} 2^{1}+x_{0} 2^{0}
$$

$\square$ Range: $-2^{n-1}$ to $+2^{n-1}-1$
$\square$ Example

$$
\begin{aligned}
& 11111111111111111111111111111100_{2} \\
& =-1 \times 2^{31}+1 \times 2^{30}+\ldots+1 \times 2^{2}+0 \times 2^{1}+0 \times 2^{0} \\
& =-2,147,483,648+2,147,483,644=-4_{10}
\end{aligned}
$$

$\square$ Using 32 bits

- $-2,147,483,648$ to $+2,147,483,647$


## 2's-Complement Signed Integers

$\square$ Bit 31 is sign bit

- 1 for negative numbers
- 0 for non-negative numbers
$\square-\left(-2^{n-1}\right)$ can't be represented
$\square$ Non-negative numbers have the same unsigned and 2's-complement representation
$\square$ Some specific numbers
0: $\quad 00000000 \ldots 0000$
$-1: \quad 11111111 \ldots 1111$
Most-negative: $\quad 10000000 \ldots 0000$
Most-positive: $\quad 01111111 \ldots 1111$


## Signed Negation

$\square$ Complement and add 1

- Complement means $1 \rightarrow 0,0 \rightarrow 1$

$$
\begin{aligned}
& x+\bar{x}=1111 \ldots . .111_{2}=-1 \\
& \bar{x}+1=-x
\end{aligned}
$$

$\square$ Example: negate +2

$$
\begin{aligned}
+2 & =00000000 \ldots 0010_{2} \\
-2 & =11111111 \ldots 1101_{2}+1 \\
& =11111111 \ldots 1110_{2}
\end{aligned}
$$

$\square$ "negate" and "complement" are quite diffepent!

## Sign Extension

$\square$ Representing a number using more bits

- Preserve the numeric value
$\square$ In MIPS instruction set
- addi: extend immediate value
- lb, lh: extend loaded byte/halfword
- beq, bne: extend the displacement
$\square$ Replicate the sign bit to the left - c.f. unsigned values: extend with 0s
$\square$ Examples: 8-bit to 16-bit
■ +2: $00000010=>0000000000000010$
- 2: $11111110=>1111111111111110$


## Representing Instructions

$\square$ Instructions are encoded in binary - Called machine code
$\square$ MIPS instructions
E Encoded as 32-bit instruction words

- Small number of formats encoding operation code (opcode), register numbers, ...
- Regularity!
$\square$ Register numbers
- \$t0 - \$t7 are reg's 8-15
- \$t8 - \$t9 are reg's 24-25
- \$s0 - \$s7 are reg's 16-23

MIPS register conventions@Fig.2.14@P.121

## MIPS R-format Instructions

| op | rs | rt | rd | shamt | funct |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits |

$\square$ op $=$ operation code (opcode)

- basic operation of the instruction
$\square$ rs / rt / rd
- register source / destination operand
$\square$ shamt $=$ shift amount
- 00000 for now
$\square$ funct $=$ function code
- extends opcode


## R-format Example

| op | $r s$ | $r t$ | $r d$ | shamt | funct |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits |

$\square$ add \$t0, \$s1, \$s2

| special | $\$$ s1 | $\$$ s2 | $\$$ t0 | 0 | add |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 17 | 18 | 8 | 0 | 32 |
| 000000 | 10001 | 10010 | 01000 | 00000 | 100000 |

$\square 00000010001100100100000000100000_{2}$ $=02324020_{16}$

## Hexadecimal

$\square$ Base 16

- Compact representation of bit strings
- 4 bits per hex digit

| 0 | 0000 | 4 | 0100 | 8 | 1000 | c | 1100 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 1 | 0001 | 5 | 0101 | 9 | 1001 | d | 1101 |
| 2 | 0010 | 6 | 0110 | a | 1010 | e | 1110 |
| 3 | 0011 | 7 | 0111 | b | 1011 | f | 1111 |

$\square$ Example: eca8 6420
■ 11101100101010000110010000100000

## MIPS I-format Instructions

| op | rs | rt | constant or address |
| :---: | :---: | :---: | :---: |
| 6 bits | 5 bits | 5 bits | 16 bits |

$\square$ Immediate arithmetic and load/store instructions

- rs / rt: source or destination register number
- Constant: $-2^{15}$ to $+2^{15}-1$
- Address: offset added to base address in rs
$\square$ Design Principle 4:
- Good design demands good compromises
$\square$ Different formats complicate decoding, but allow 32bit instructions uniformly
$\square$ Keep formats as similar as possible


## I-format Example

| op | rs | rt | constant or address |
| :---: | :---: | :---: | :---: |
| 6 bits | 5 bits | 5 bits | 16 bits |

$\square$ Iw \$t0, 32(\$s2)

| Iw | $\$ s 2$ | $\$$ t0 | 32 |
| :---: | :---: | :---: | :---: |
| 35 | 18 | 8 | 32 |
| 100011 | 10010 | 01000 | 0000000000100000 |

## C / MIPS / Machine Languages

ㅁ $\mathrm{C}: \quad \mathrm{A}[300]=\mathrm{h}+\mathrm{A}[300]$

- MIPS: Iw \$t0, 1200(\$t1)
add \$t0, \$s2, \$t0
sw \$t0, 1200(\$t1)
$\square$ Machine Language:

| 35 | 9 | 8 | 1200 |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 18 | 8 | 8 | 0 | 32 |
| 43 | 9 | 8 | 1200 |  |  |

## Stored Program Concept

$\square$ Instructions are bits
$\square$ Programs are stored in memory

- to be read or written just like data
$\square$ Fetch \& Execute Cycle
- Instructions are fetched and put into a special register
- Bits in the register "control" the subsequent actions Fetch the "next" instruction and continue



## Logical Operations

| Logical <br> operations | C <br> operators | MIPS <br> instructions |
| :--- | :---: | :---: |
| Shift left | $\ll$ | sll |
| Shift right | $\gg$ | srl |
| Bitwise AND | $\&$ | and, andi |
| Bitwise OR | $\mid$ | or, ori |

## Logical Operations

| Instruction | Example |
| :--- | :--- |
| and | and $\$ s 1, \$ s 2, \$ s 3$ |
| or | or $\$ s 1, \$ s 2, \$ s 3$ |
| nor | nor $\$ s 1, \$ s 2, \$ s 3$ |
| and immediate | andi $\$ s 1, \$ s 2,100$ |
| or immediate | ori $\$ s 1, \$ s 2,100$ |
| shift left logical | sll $\$ s 1, \$ s 2,10$ |
| shift right logical | srl $\$ s 1, \$ s 2,10$ |

$\square$ There is no NOT, since ...

- $\operatorname{NOT}(A)=\operatorname{NOT}(A$ OR 0$)=A \operatorname{NOR} 0$


## Shift Operations

| op | rs | $r t$ | rd | shamt | funct |
| :---: | :---: | :---: | :---: | :---: | :---: |
| 6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits |

$\square$ shamt: how many positions to shift
$\square$ Shift left logical

- Shift left and fill with 0 bits
- sll by i bits multiplies by $2^{i}$
$\square$ Shift right logical
- Shift right and fill with 0 bits
- srl by i bits divides by $2^{i}$ (unsigned only)


## Shift Operations

- NOTICE
- shift left/right logical is not I-type
$\square$ Example: sll \$t2, \$s0, 4
$\square$ Machine Language:

| op | $r s$ | $r t$ | $r d$ | shamt | funct |
| :---: | :---: | :---: | :---: | :---: | :---: |
| special | none | $\$ s 0$ | $\$ t 2$ | 4 | sIl |
| 0 | 0 | 16 | 10 | 4 | 0 |

## AND Operations

$\square$ Useful to mask bits in a word - Select some bits, clear others to 0
$\square$ and \$t0, \$t1, \$t2
■ $\$$ t2 $=00000000000000000000110111000000$

- \$t1 = 00000000000000000011110000000000
- \$t0 $=00000000000000000000110000000000$


## OR Operations

ㅁ Useful to include bits in a word

- Set some bits to 1 , leave others unchanged
- or \$t0, \$t1, \$t2

■ \$t2 = 00000000000000000000110111000000

- \$t1 = 00000000000000000011110000000000
- \$t0 $=00000000000000000011110111000000$


## NOT Operations

$\square$ Useful to invert bits in a word - Change 0 to 1 , and 1 to 0
$\square$ MIPS has NOR 3-operand instruction - a NOR b == NOT ( a OR b )

- nor \$t0, \$t1, \$zero
- $\$$ tl $=00000000000000000011110000000000$

■ - to = 11111111111111111100001111111111

## Conditional Operations

$\square$ Decision making instructions

- alter the control flow,
- i.e., change the "next" instruction to be executed
$\square$ MIPS conditional branch instructions:
- bne \$t0, \$t1, Label
- beq \$t0, \$t1, Label

Example: if $(i==j) h=i+j$;
bne \$s0, \$s1, Label add \$s3, \$s0, \$s1
Label:

## Unconditional Operations

$\square$ MIPS unconditional branch instructions:

- j Label
$\square$ (Un-)Conditional Branch Example:
if ( $\mathrm{i}==\mathrm{j}$ )
$\mathrm{f}=\mathrm{g}+\mathrm{h}$;
else
$\mathrm{f}=\mathrm{g}$-h;
bne \$s3, \$s4, Else
add \$s0, \$s1, \$s2
j Exit
Else: sub \$s0, \$s1, \$s2
Exit: ...

Assembler
calculates addresses
$\square$ Can you build a simple for / while loop?

## While Loop

## C:

while (save [i] $==k$ ) $i+=1$;

- assume $i$ in $\$ s 3, k$ in $\$ s 5$, address of save in $\$ s 6$

MI PS:
Loop: sll \$t1,\$s3, 2
add \$t1, \$t1, \$s6
Iw \$t0, 0(\$t1)
bne \$t0, \$s5, Exit
addi \$s3, \$s3, 1
j Loop
\# \$t1=4*i
\# \$t1=addr. of save[i]
\# \$t0=save[i]
\# go to Exit if save[i]!=k
\# i+=1
\# go to Loop
Exit:

## Control Flow

$\square$ set on less than:

```
if ($s3 < $54)
slti $t1, $s3, s`4
$t1=1;
else
\[
\$ \mathrm{tl}=0 ;
\]
```

$\square$ can use this instruction to build "blt \$s1, \$s2, Label"

- can now build general control structures
$\square$ NOTE
- the assembler needs a register to do this,
- there are policy of use conventions for registers


## Branch Instruction Design

$\square$ Why not blt, bge, etc?
$\square$ Hardware for $<, \geq, \ldots$ slower than $=, \neq$ - Combining with branch involves more work per instruction, requiring a slower clock

- All instructions penalized!
$\square$ beq and bne are the common case
$\square$ This is a good design compromise


## Case/Switch in C

switch (k) \{
case 0:
case 1:
case 2 :
case 3:
\}
$f=i+j ; \quad$ break;
/* k=0 */
$f=g+h ;$ break; break;
/* k=1 */
$f=g-h$;
$f=i-j$; break; $\quad{ }^{*} k=3$ */

## Case/Switch in MIPS -- using Jump Address Table

|  | slt | \$t3, \$s5, \$zero | \# test if $\mathrm{k}<0$ |
| :---: | :---: | :---: | :---: |
|  | bne | \$t3, \$zero, Exit | \# if $k<0$, go to Exit |
|  | siti | \$t3, \$s5, 4 | \# test if $k<4$ |
|  | beq | \$t3, \$zero, Exit | \# if $k>=4$, go to Exit |
|  | sll | \$t1, \$s5, 2 | \# \$tl $=4 * \mathrm{k}$ |
|  | add | \$t1, \$t1, \$t4 | \# \$t1=addr. of JumpTable[k] |
|  | Iw | \$t0, 0 (\$t1) | \# \$t0= JumpTable[k] |
|  | jr | \$t0 | \# jump based on \$t0 |
| LO: | add | \$s0, \$s3, \$s4 | \# k $=0$, so f gets $i+j$ |
|  | j | Exit | \# end of this case so go to Exit |
| L1: | add | \$s0, \$s1, \$ 22 | \# $k=1$, so f gets $\mathrm{g}+\mathrm{h}$ |
|  | j | Exit | \# end of this case so go to Exit |
| L2: | sub | \$s0, \$s1, \$s2 | \# k $=2$, so f gets $\mathrm{g}-\mathrm{h}$ |
|  | j | Exit | \# end of this case so go to Exit |
| L3: | sub | \$s0, \$s3, \$s4 | \# $k=3$, so f gets i - j |
| Exit: |  |  | \# end of switch statement |

## Signed vs. Unsigned

$\square$ Signed comparison: slt, slti
$\square$ Unsigned comparison: sltu, sltui

- Example

■ $\quad$ s0 $=11111111111111111111111111111111$
■ $\$ 51=00000000000000000000000000000001$

- slt \$t0, \$s0, \$sl \# signed
$\square-1<+1 \Rightarrow \$ \mathrm{tO}=1$
- sltu \$t0, \$50, \$s1 \# unsigned
$\square+4,294,967,295>+1 \Rightarrow \$ t 0=0$


## Procedure Calling

$\square$ Steps required

- Place parameters in registers
- Transfer control to procedure
- Acquire storage for procedure
- Perform procedure's operations
- Place result in register for caller
- Return to place of call


## Register Usage

| Name | Register No. | Usage |
| :--- | :---: | :--- |
| \$zero | 0 | the constant value 0 |
| \$v0-\$v1 | $2-3$ | values for results \& expression evaluation |
| \$a0-\$a3 | $4-7$ | arguments |
| $\$ t 0-\$ t 7$ | $8-15$ | temporaries (can be overwritten by callee) |
| $\$$ s0-\$s7 | $16-23$ | saved (must be saved/restored by callee) |
| $\$$ t8-\$t9 | $24-25$ | more temporaries |
| $\$ g p$ | 28 | global pointer |
| $\$$ sp | 29 | stack pointer |
| $\$ f p$ | 30 | frame pointer |
| $\$ r a$ | 31 | return address |

Register 1 (\$at) reserved for assembler, 26-27 for operating system

## Procedure Call Instructions

$\square$ Procedure call: jump and link - jal ProcedureLabel
$\square$ Address of following instruction put in \$ra

- Jumps to target address
$\square$ Procedure return: jump register
- jr \$ra
$\square$ Copies \$ra to program counter
$\square$ Can also be used for computed jumps
- e.g., for case/switch statements


## Leaf Procedure in C

int leaf_example (int g, int h, int $i$, int $j)$ \{ int f;
$f=(g+h)-(i+j) ;$ return f;
\}
$\square$ Assume

- Arguments $\mathrm{g}, \ldots, \mathrm{j}$ in $\$ \mathrm{a} 0, \ldots, \$ \mathrm{a3}$
- f in $\$ \mathrm{~s} 0$ (hence, need to save $\$ \mathrm{~s} 0$ on stack)
- Result in \$v0


## Leaf Procedure in MISP

addi $\$$ sp, $\$ \mathrm{sp},-4 \quad$ \# adjust stack for saving $\$ \mathrm{~s} 0$
sw \$s0, 0(\$sp)
add \$t0, \$a0, \$al \# g+h
add \$t1, \$a2, \$a3 \# i+j
sub \$s0, \$t0, \$t1 \# ( $\mathrm{g}+\mathrm{h}$ )-( $\mathrm{i}+\mathrm{j}$ )
add $\$ v 0, \$ s 0$, $\$$ zero $\#$ return $\mathrm{f}(\$ \mathrm{v} 0=\$ \mathrm{~s} 0+0)$
Iw $\$ \mathrm{~s} 0,0(\$ \mathrm{sp})$
addi $\$$ sp, $\$$ sp, $4 \quad$ \# adjust stack again
jr \$ra

## The Stack

High address


## Non-Leaf Procedures

$\square$ Procedures that call other procedures
$\square$ For nested call, caller needs to save on the stack:

- Its return address
- Any arguments and temporaries needed after the call
$\square$ Restore from the stack after the call


## Recursive Procedure in C

int fact (int n) \{
if $(\mathrm{n}<1$ )
return 1;
else
return ( n * fact ( $\mathrm{n}-1$ ) );
\}

- Assume
- Argument n in $\$ \mathrm{aO}$
- Result in \$v0


## Recursive Procedure in MISP

fact:

| addi | $\$ s p, \$ s p,-8$ | \# adjust stack for 2 items |
| :--- | :--- | :--- |
| sw | $\$ r a, 4(\$ s p)$ | \# save the return address |

sw $\quad \$ a 0,0(\$ \mathrm{sp}) \quad$ \# save the argument $n$
siti $\quad \$$ to, \$a0, $1 \quad$ \# test for $n<1$
beq $\quad \$ t 0$, $\$$ zero, $L 1$ \# if $n>=1$, go to L1
addi
\$sp, \$sp, 8 \# pop 2 items off stack
addi $\$ v 0, \$ z e r o, 1$ \# return 1
ir $\quad$ \$ra $\quad$ \# return to after jal
L1: addi $\quad \$ a 0, \$ a 0,-1 \quad \# n>=1:$ argument gets $(\mathrm{n}-1)$

| jal | fact | \# call fact with $(n-1)$ |
| :--- | :--- | :--- |
| Iw | $\$ a 0,0(\$ s p)$ | \# return from jal: restore argument $n$ |

Iw \$ra, 4(\$sp) \# restore the return address
addi $\$$ sp, $\$$ sp, 8 \# adjust stack pointer to pop 2 items
mul $\quad \$ \mathrm{vo}, \$ \mathrm{aO}, \$ \mathrm{v0}$ \# return $n$ * fact $(\mathrm{n}-\mathrm{I})$
jr \$ra \# return to the caller

## Local Data on the Stack

High address

$\square$ Procedure frame (activation record)

- Used by some compilers to manage stack storage


## Memory Layout

$\square$ Text: program code
$\square$ Static data: global variables

- e.g., static variables in C, constant
arrays and strings \$gp initialized to address allowing $\pm$ offsets into this segment
$\square$ Dynamic data: heap E.g., malloc in C
$\square$ Stack: automatic storage

$$
\begin{aligned}
& \text { \$gp } \\
& \$ p \mathrm{c} \\
& \text { \$ }
\end{aligned}
$$

## Character Data

$\square$ Byte-encoded character sets

- ASCII: 128 characters
$\square 95$ graphic, 33 control
- Latin- 1: 256 characters
$\square$ ASCII, +96 more graphic characters
$\square$ Unicode: 32-bit character set
- Used in C++ wide characters, ...
- Most of the world's alphabets, plus symbols
- UTF-8, UTF-16: variable-length encodings


## Byte/Halfword Operations

$\square$ Could use bitwise operations
$\square$ MIPS byte/halfword load/store - String processing is a common case
$\square \mathrm{lb} \mathrm{rt}$, offset(rs) lh rt , offset(rs)

- Sign extend to 32 bits in rt
$\square$ lbu rt, offset(rs) lhu rt, offset(rs)
- Zero extend to 32 bits in rt
$\square$ sb rt, offset(rs) sh rt, offset(rs)
- Store just rightmost byte/halfword


## String Copy Procedure in C

void strcpy (char x[], char y []) \{ int $i$;
$i=0 ;$
while ( $x[i]=y[i]!=' \neq 0$ ') \{ $i=i+1 ;$
\}
\}
$\square$ Assume

- Null-terminated string
- Addresses of $x, y$ in $\$ a 0, \$ a 1, i$ in $\$ s 0$


## String Copy Procedure in MIPS

| addi | \$sp, \$sp, -4 |  |
| :---: | :---: | :---: |
| sw | \$s0, 0(\$sp) |  |
| add | \$s0, \$zero, \$zero | \# 1 = 0 |
| L1: add | \$t1, \$s0, \$a1 | \# address of $\mathrm{y}[\mathrm{i}]$ in \$t1 |
| lb | \$t2, 0(\$t1) | \# \$t2 = y i ] |
| add | \$t3, \$s0, \$a0 | \# address of $x[i]$ in \$t3 |
| sb | \$t2, 0(\$t3) | $\# x[i]=y[i]$ |
| beq | \$t2, \$zero, L2 | \# if $\mathrm{y}[\mathrm{i}]==0, \mathrm{go}$ to L 2 |
| addi | \$s0, \$s0, 1 | $\# \mathrm{i}=\mathrm{i}+1$ |
| j | L1 | \# go to L1 |
| L2: Iw | \$s0, 0(\$sp) | \# restore old \$s0 |
| addi | \$sp, \$sp, 4 |  |
| jr | \$ra |  |

## 32-bit Constants

$\square$ Most constants are small

- 16 -bit immediate is sufficient
$\square$ For the occasional 32-bit constant
- lui rt, constant
- Copies 16 -bit constant to left 16 bits of rt
$\square$ Clears right 16 bits of rt to 0

| lui $\$ s 0,61$ | 0000000001111101 | 0000000000000000 |
| :--- | ---: | ---: |
| ori $\$ s 0, \$ s 0,23040000000001111101$ | 0000100100000000 |  |

## Branch Addressing

$\square$ Instructions:

- bne \$s0,\$s1,L1
- beq \$s0,\$s1,L2
$\square$ Formats:

| op | rs | rt | 16 bit number |
| :---: | :---: | :---: | :---: |

$\square$ Most branch targets are near branch

- Forward or backward
$\square$ PC-relative addressing
- Target address $=$ PC + offset $\times 4$
- PC already incremented by 4 by this time


## Jump Addressing

$\square$ Instructions:

$\square$ Formats:

J op | op | 26 bit number |
| :--- | :--- |

$\square$ Jump targets could be anywhere in text segment

- Encode full address in instruction
$\square$ (Pseudo)Direct jump addressing
- Target address $=$ PC $_{31 . .28}:($ address $\times 4)$


## Target Addressing Example

C:
while (save $[i]==k$ ) $i+=1$;
MI PS:

| Loop: | sll | \$t1, \$s3, 2 | 80000 | 0 | 0 | 19 | 9 | 4 | 0 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  | add | \$t1, \$t1, \$s6 | 80004 | 0 | 9 | 22 | 9 | 0 | 32 |
|  | Iw | \$t0, 0 (\$t1) | 80008 | 35 | 9 | 8 |  | 0 |  |
|  | bne | \$t0, \$s5, Exit | 80012 | 5 | 8 | 21 |  | 2 |  |
|  | addi | \$s3, \$s3, 1 | 80016 | 8 | 19 | 19 |  | 1 |  |
|  | j | Loop | 80020 | 2 |  |  | 00 |  |  |
| Exit: |  |  | 80024 |  |  |  |  |  |  |

## Branching Far Away

- If branch target is too far to encode with 16 -bit offset, assembler rewrites the code
$\square$ Example beq \$s0,\$s1, L1 ।
bne \$s0,\$s1, L2
j L1
L2:


## Addressing Modes

$\square$ Immediate addressing

| op | rs | rt | immediate |
| :--- | :--- | :--- | :--- |

$\square$ Register addressing


## Addressing Modes

$\square$ Base addressing


## Addressing Modes

$\square$ PC-relative addressing
Memory


## Decoding Machine Code

$\square$ What is the assembly language statement corresponding to this machine instruction?

- 00af8020 hex
$\rightarrow 00000000101011111000000000100000$
- op $=000000 \Rightarrow$ R-format
- rs = $00101(\mathrm{al}) / \mathrm{rt}=01111(\mathrm{t} 7) / \mathrm{rd}=10000(\mathrm{~s} 0)$
- shamt $=00000 /$ funct $=100000 \Rightarrow$ add
$\rightarrow$ add \$s0, \$a1, \$t7


## Translation and Startup



Many compilers produce object modules directly

Assembler


## Assembler Pseudoinstructions

$\square$ Most assembler instructions represent machine instructions one-to-one
$\square$ Pseudoinstructions: figments of the assembler's imagination

- move $\$$ t0, $\$ t 1 \rightarrow$ add $\$ t 0$, $\$$ zero, $\$$ t1
- blt \$t0, \$t1, L $\rightarrow$ slt \$at, \$t0, \$t1 bne \$at, \$zero, L
- \$at (register 1): assembler temporary


## Producing an Object Module

$\square$ Assembler (or compiler) translates program into machine instructions
$\square$ Provides information for building a complete program from the pieces

- Header: described contents of object module
- Text segment: translated instructions
- Static data segment: data allocated for the life of the program
- Relocation info: for contents that depend on absolute location of loaded program
- Symbol table: global definitions and external refs
- Debug info: for associating with source code


## Linking Object Modules

$\square$ Produces an executable image

1. Merges segments
2. Resolve labels (determine their addresses)
3. Patch location-dependent and external refs
$\square$ Could leave location dependencies for fixing by a relocating loader

- But with virtual memory, no need to do this
- Program can be loaded into absolute location in virtual memory space


## Loading a Program

$\square$ Load from image file on disk into memory

1. Read header to determine segment sizes
2. Create virtual address space
3. Copy text and initialized data into memory
$\square$ Or set page table entries so they can be faulted in
4. Set up arguments on stack
5. Initialize registers (including \$sp, \$fp, \$gp)
6. Jump to startup routine
$\square$ Copies arguments to $\$ a 0, \ldots$ and calls main
$\square$ When main returns, do exit syscall

## Dynamic Linking

$\square$ Only link/load library procedure when it is called

- Requires procedure code to be relocatable
- Avoids image bloat caused by static linking of all (transitively) referenced libraries
- Automatically picks up new library versions


## Lazy Linkage



## Arrays vs. Pointers

$\square$ Array indexing involves

- Multiplying index by element size
- Adding to array base address
$\square$ Pointers correspond directly to memory addresses
- Can avoid indexing complexity


## Array vs. Pointers in C

void clear1 (int array[], int size) \{
int $i$;
for ( $i=0 ; i<$ size; $i+=1$ )
$\operatorname{array}[i]=0$;
\}
void clear2 (int * array, int size) \{
int *p;
for ( $p=$ \&array[0]; $p<$ \&array[size]; $p+=1$ )

$$
\text { *p }=0
$$

\}

## Array Version of Clear in MIPS

add<br>loop1: sll<br>\$t0, \$zero, \$zero<br>add<br>sw<br>addi<br>slt<br>bne<br>\$t1, \$t0, 2<br>\$t2, \$a0, \$t1<br>\$zero, 0(\$t2)<br>\$t0, \$t0, 1<br>\$t3, \$t0, \$a1<br>\$t3, \$zero, loop1

## Pointer Version of Clear in MIPS

add \$t0, \$a0, \$zero<br>loop2: sw addi<br>sll<br>add<br>slt<br>bne<br>\$zero, 0(\$t0)<br>\$t0, \$t0, 4<br>\$t1, \$a1, 2<br>\$t2, \$a0, \$t1<br>\$t3, \$t0, \$t2<br>\$t3, \$zero, loop2

## New Pointer Version of Clear

add \$t0, \$a0, \$zero<br>sll<br>\$t1, \$a1, 2<br>add \$t2, \$a0, \$t1<br>loop2: sw \$zero, 0(\$t0)<br>addi $\$ t 0, \$ t 0,4$<br>slt<br>\$t3, \$t0, \$t2<br>bne<br>\$t3, \$zero, loop2

## Comparing the Two Versions

|  | add | \$t0, \$zero, \$zero | add | \$t0, \$a0, \$zero |
| :---: | :---: | :---: | :---: | :---: |
| \|p1: | sll | \$t1, \$t0, 2 | sll | \$t1, \$a1, 2 |
|  | add | \$t2, \$a0, \$t1 | add | \$t2, \$a0, \$t1 |
|  | sw | \$zero, 0(\$t2) lp2 | sw | \$zero, 0(\$t0) |
|  | addi | \$t0, \$t0, 1 | addi | \$t0, \$t0, 4 |
|  | slt | \$t3, \$t0, \$al | slt | \$t3, \$t0, \$t2 |
|  | bne | \$t3, \$zero, lp1 | bne | \$t3, \$zero, lp2 |

## Comparison of Array vs. Pointer

$\square$ Multiply "strength reduced" to shift
$\square$ Array version requires shift to be inside loop

- Part of index calculation for incremented i
- c.f. incrementing pointer
- Compiler can achieve same effect as manual use of pointers
- Induction variable elimination
- Better to make program clearer and safer


## Summary: <br> MIPS Operands

| Name | Example | Comments |
| :---: | :---: | :---: |
| 32 registers | ```$s0-$s7, $t0-$t9, $zero, $a0-$a3, $v0-$v1, $gp, $fp, $sp, $ra, $at``` | Fast locations for data. In MIPS data must be in registers to perform arithmetic. MIPS register \$zero always equals 0. |
| $2^{30}$ memory words | Memory[0], <br> Memory[4], <br> Memory[4294967292] | Accessed only by data transfer instructions. MIPS uses byte addresses, so sequential words differ by 4. Memory holds data structures, such as arrays, and spilled registers, such as those saved on procedure calls. |

## Summary: <br> MIPS Assembly Language

| Category | Instruction | Example | Meaning | Comments |
| :---: | :---: | :---: | :---: | :---: |
| Arithmetic | add | add \$s1, \$s2, \$s3 | \$s1=\$s2+\$s3 | Three register operands |
|  | subtract | sub \$s1, \$s2, \$s3 | \$s1=\$s2-\$s3 | Three register operands |
|  | add immediate | addi \$s1, \$s2, 100 | \$s1=\$s2+100 | Used to add constants |
| Conditional branch | branch on equal | beq \$s1, \$s2, 25 | if ( $\$ \mathrm{~s} 1==\$ \mathrm{~s} 2$ ) go to PC $+4+100$ | Equal test; PC-relative branch |
|  | branch on not equal | bne \$s1, \$s2, 25 | if (\$s1!=\$s2) go to PC $+4+100$ | Not equal test; PC-relative |
|  | set on less than | slt \$s1, \$s2, \$s3 | if (\$s2<\$s3) \$s1=1 else \$s1=0 | Compare less than; for beq, bne |
|  | set less than immediate | slti \$s1, \$s2, 100 | if (\$s2<100) \$s1=1 else \$s1=0 | Compare less than constant |
| Unconditional jump | jump | j 2500 | go to 10000 | Jump to target address |
|  | jump register | jr \$ra | go to \$ra | For switch, procedure return |
|  | jump and link | jal 2500 | \$ra=PC+4; go to 10000 | For procedure call |

## Summary: <br> MIPS Assembly Language

| Category | Instruction | Example | Meaning | Comments |
| :---: | :---: | :---: | :---: | :---: |
| Data transfer | load word | Iw \$s1, 100(\$s2) | \$s1=Memory[\$s2+100] | Word from memory to register |
|  | store word | sw \$s1, 100(\$s2) | Memory[\$s2+100]=\$s1 | Word from register to memory |
|  | load half | Ih \$s1, 100(\$s2) | \$s1=Memory[\$s2+100] | Halfword from memory to register |
|  | store half | sh \$s1, 100(\$s2) | Memory[\$s2+100]=\$s1 | Halfword from register to memory |
|  | load byte | lb \$s1, 100(\$s2) | \$s1=Memory[\$s2+100] | Byte from memory to register |
|  | store byte | sb \$s1, 100(\$s2) | Memory[\$s2+100]=\$s1 | Byte from register to memory |
|  | load upper immed. | lui \$s1, 100 | \$s1=100* ${ }^{16}$ | Loads constant in upper 16 bits |
| Logical | and | and \$s1, \$s2, \$s3 | \$s1=\$s2\&\$s3 | Bit-by-bit AND |
|  | or | or \$s1, \$s2, \$s3 | \$s1=\$s2\|\$s3 | Bit-by-bit OR |
|  | nor | nor \$s1, \$s2, \$s3 | \$s1=~(\$s2\|\$s3) | Bit-by-bit NOR |
|  | and immediate | andi \$s1, \$s2, 100 | \$s1=\$s2\&100 | Bit-by-bit AND reg with constant |
|  | or immediate | ori \$s1, \$s2, 100 | \$s1=\$s2\|100 | Bit-by-bit OR reg with constant |
|  | shift left logical | sll \$s1, \$s2, 10 | \$s1=\$s2<<10 | Shift left by constant |
|  | shift right logical | srl \$s1, \$s2, 10 | \$s1=\$s2>>10 | Shift right by constant |

