Computer Organization and Structure
Homework
#3
Due:
2009/12/15
1. Different
instructions utilize different hardware blocks in the basic single-cycle
implementation as shown in Figure 1. The next three
problems refer to the following instruction:
|
Instruction |
Interpretation |
a. |
add Rd, Rs, Rt |
Reg[Rd]=Reg[Rs]+Reg[Rt] |
b. |
lw Rt,
Offs(Rs) |
Reg[Rt]=Mem[Reg[Rs]+Offs] |
Figure
1: The basic implementation of the
MIPS subset, including the necessary multiplexors and control lines.
a.
What are the values of control signals
generated by the control in Figure 1 for this
instruction?
b.
Which resources (blocks) perform a
useful function for this instruction?
c.
Which resources (blocks) produce outputs,
but their outputs are not used for this instruction? Which resources produce no
outputs for this instruction?
2. Different
execution units and blocks of digital logic have different latencies (time
needed to do their work). In Figure 1 there are seven
kinds of major blocks. Latencies of blocks along the critical (longest-latency)
path for an instruction determine the minimum latency of that instruction. For
the following three problems, assume the following resource latencies:
|
I-Mem |
Add |
Mux |
ALU |
Regs |
D-Mem |
Control |
a. |
400ps |
100ps |
30ps |
120ps |
200ps |
350ps |
100ps |
b. |
500ps |
150ps |
100ps |
180ps |
220ps |
1000ps |
65ps |
a.
What is the critical path for a MIPS AND
instruction?
b.
What is the critical path for a MIPS
load (LD) instruction?
c.
What is the critical path for a MIPS BEQ
instruction?
3. The
basic single-cycle MIPS implementation in Figure 1 can only
implement some instructions. New instructions can be added to an existing ISA,
but the decision whether or not to do that depends, among other things, on the
cost and complexity such an addition introduces into the processor datapath and
control. The next three problems refer to this new instruction:
|
Instruction |
Interpretation |
a. |
add3 Rd, Rs,
Rt,Rx |
Reg[Rd]=Reg[Rs]+Reg[Rt]+Reg[Rx] |
b. |
sll Rt, Rd,
Shift |
Reg[Rd]=
Reg[Rt] << Shift (shift left by Shift bits) |
a.
Which existing blocks (if any) can be
used for this instruction?
b.
Which new functional blocks (if any) do
we need for this instruction?
c.
What new signals do we need (if any) from
the control unit to support this instruction?
4. The
following problems assume that individual stages of the datapath have the
following latencies:
|
IF |
ID |
EX |
MEM |
WB |
a. |
300ps |
400ps |
350ps |
500ps |
100ps |
b. |
200ps |
150ps |
120ps |
190ps |
140ps |
a.
What is the clock cycle time in a
pipelined and nonpipelined processor?
b.
What is the total latency of a lw
instruction in a pipelined and nonpipelined processor?
c.
If we can split one stage of the
pipelined datapath into two new stages, each with half the latency of the
original stage, which stage would you split and what is the new clock cycle
time of the processor?
5. The following
problems assume that instructions executed by the processor are broken down as
follows:
|
ALU |
beq |
lw |
sw |
a. |
50% |
25% |
15% |
10% |
b. |
30% |
25% |
30% |
15% |
a.
Assuming there are no stalls or hazards,
what is the utilization (% of cycles used) of the data memory?
b.
Assuming there are no stalls or hazards,
what is the utilization of the write-register port of the “Registers”
unit?
c.
Instead of a singl-cycle organization,
we can use a multi-cycle organization where each instruction takes multiple
cycles but one instruction through stages it actually needs (e.g., ST only
takes four cycles because it does not need the WB stage). Compare clock cycle
times and execution times with single-cycle, multi-cycle, and pipelined
organization.
6. The
following problems refer to the following sequence of instructions:
|
Instruction
sequence |
a. |
lw $1, 40($6) add
$6, $2, $2 sw $6, 50($1) |
b. |
lw $5, -16($5) sw $5, -16($5) add
$5, $5, $5 |
a.
Indicate dependences and their type.
b.
Assume there is no forwarding in this
pipelined processor. Indicate hazards and add nop instructions to
eliminate them.
c.
Assume there is full forwarding.
Indicate hazards and add nop instructions to eliminate them.
The
remaining problems assume the following clock cycle times:
|
Without
forwarding |
With
full forwarding |
With
ALU-ALU forwarding only |
a. |
300ps |
400ps |
360ps |
b. |
200ps |
250ps |
220ps |
d.
What is the total execution time of this
instruction sequence without forwarding and with full forwarding? What is the
speed-up achieved by adding full forwarding to a pipeline that had no
forwarding?
e.
Add nop instructions to
this code to eliminate hazards if there is ALU-ALU forwarding only (no
forwarding from the MEM to the EX stage)?
f.
What is the total execution time of this
instruction sequence with only ALU-ALU forwarding? What is the speed-up over a
no-forwarding pipeline?