
# Lecture 06

• The programming model assumes Total Sequential Execution (TSE): one instruction finishes before the next starts
• TSE is a sufficient but not necessary condition for semantic correctness; it is enough to satisfy the inter-instruction dependencies
• Consider instruction $$i$$ followed by instruction $$j$$ ($$j>i$$) and these pipeline latencies: IF-ID:1, IF-EX:2, IF-ME:3, IF-WB:4
• Data dependencies through registers (read in ID, write in WB)
• RAW: $$j+1>i+4$$ is true if $$j-i>3$$ and false if $$j-i\leq 3$$ (hazard): instruction $$i$$ must write back in WB before instruction $$j$$ reads the register in ID
• WAR: $$j+4>i+1$$ true $$\forall j>i$$
• WAW: $$j+4>i+4$$ true $$\forall j>i$$
• Data dependencies through memory (read in ME, write in ME) (load, store)
• RAW: $$j+3>i+3$$ true $$\forall j>i$$
• WAR: $$j+3>i+3$$ true $$\forall j>i$$
• WAW: $$j+3>i+3$$ true $$\forall j>i$$
• Control dependencies through the PC (read in IF, write in ID)
• RAW: $$j+0>i+1$$ true if $$j-i>1$$ false if $$j-i\leq 1$$ (hazard)
• WAR: $$j+1>i+0$$ true $$\forall j>i$$
• WAW: $$j+1>i+1$$ true $$\forall j>i$$
• Hazard: potential for a violated dependency
• 3-cycle RAW hazard on registers
• 1-cycle RAW hazard on the PC (control hazard), since the PC condition $$j+0>i+1$$ fails only for $$j-i\leq 1$$
• Solving register RAW hazard:
1. stall for $$x$$ cycles
• worst case ($$j=i+1$$): stall for a minimum of 3 cycles

$$j=i+1$$
$$j+1+x>i+4$$
$$(i+1)+1+x>i+4$$
$$x>2$$, so $$x\geq 3$$ cycles
2. Data forwarding
• Special case, load hazard: a load cannot forward its result during the ME stage (the data is available only after ME, so it can be forwarded during WB)
• if a dependent instruction immediately follows a load ($$j=i+1$$) then it must stall for 1 penalty cycle
• Solving control hazard:
1. if branch is taken then squash the fall-through instruction
2. branch delay slot
• execute the fall-through instruction even if the branch is taken.
• MIPS does this: performance depends on ability to fill the delay slot
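
The dependency inequalities above can be checked mechanically. The following is a minimal Python sketch, assuming the stage latencies listed in these notes; the names `LATENCY`, `hazard_window`, and `stalls_needed` are hypothetical helpers introduced only for illustration. It computes how many following instructions can violate a dependency and how many stall cycles a dependent instruction at distance $$j-i$$ would need without forwarding.

```python
# Stage latencies measured from IF, as given in the notes.
LATENCY = {"IF": 0, "ID": 1, "EX": 2, "ME": 3, "WB": 4}


def hazard_window(kind, read_stage, write_stage):
    """Number of distances j - i >= 1 that violate the dependency.

    RAW: j reads after i writes   -> need j + r > i + w
    WAR: j writes after i reads   -> need j + w > i + r
    WAW: j writes after i writes  -> need j + w > i + w
    """
    r, w = LATENCY[read_stage], LATENCY[write_stage]
    if kind == "RAW":
        earlier, later = w, r
    elif kind == "WAR":
        earlier, later = r, w
    elif kind == "WAW":
        earlier, later = w, w
    else:
        raise ValueError(f"unknown dependency kind: {kind}")
    # The condition (j - i) + later > earlier fails for j - i <= earlier - later.
    return max(0, earlier - later)


def stalls_needed(distance, window):
    """Stall cycles x so that an instruction at the given distance is safe."""
    return max(0, window - distance + 1)


reg_raw = hazard_window("RAW", "ID", "WB")  # registers: read in ID, write in WB
pc_raw = hazard_window("RAW", "IF", "ID")   # PC: read in IF, write in ID
print(reg_raw, pc_raw, stalls_needed(1, reg_raw))
```

For the register case this reproduces the derivation $$x>2$$ for $$j=i+1$$; every WAR and WAW case, and every memory case, yields a window of 0.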

## 3 Static Scheduling

• compiler orders instructions to minimize hazards.
• example pipeline
IF->ID->EX->ME->WB
|->FP1->FP2->FP3->WB

integer registers R0, R1, R2, ...        // 32-bit registers
floating-point registers F0, F1, F2, ... // combined for double precision
• assumptions:
• all possible forwarding paths implemented
• up to 1 integer WB and 1 floating-point WB per cycle (cannot do the WB for an L.D and an ADD.D in the same cycle)
• load delay = 1 cycle
• 1 branch delay slot
• stalls in ID to resolve hazards.
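
To illustrate what "ordering instructions to minimize hazards" means under the 1-cycle load delay, here is a minimal Python sketch. It models only the load-use stall with full forwarding; it ignores the floating-point latencies, the WB port limit, and the branch delay slot. The `Instr` type, the `load_use_stalls` helper, and the two instruction sequences are hypothetical examples, not taken from the lecture.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    op: str                 # mnemonic, e.g. "LW" or "ADD"
    dest: str               # destination register
    srcs: Tuple[str, ...]   # source registers


def load_use_stalls(program: List[Instr]) -> int:
    """Count 1-cycle stalls where an instruction uses the result of the
    load immediately before it (load delay = 1, all other results forwarded)."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev.op == "LW" and prev.dest in cur.srcs:
            stalls += 1
    return stalls


# Each ADD directly follows the load that produces its operand.
unscheduled = [
    Instr("LW", "R1", ("R10",)),
    Instr("ADD", "R2", ("R1", "R3")),
    Instr("LW", "R4", ("R11",)),
    Instr("ADD", "R5", ("R4", "R6")),
]

# Same instructions, reordered so no load result is used by the next instruction.
scheduled = [
    Instr("LW", "R1", ("R10",)),
    Instr("LW", "R4", ("R11",)),
    Instr("ADD", "R2", ("R1", "R3")),
    Instr("ADD", "R5", ("R4", "R6")),
]

print(load_use_stalls(unscheduled), load_use_stalls(scheduled))
```

The reordering is legal here because the two load/add pairs touch disjoint registers, so no dependency is reversed; a real scheduler would have to check that before moving instructions.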
