- The programming model assumes total sequential execution (TSE): each instruction finishes before the next one starts
- TSE is a sufficient but not necessary condition for semantic correctness; it is enough to satisfy the inter-instruction dependencies
- Consider instruction \(i\) followed by instruction \(j\) (\(j>i\)) and these pipeline latencies:
`IF-ID:1, IF-EX:2, IF-ME:3, IF-WB:4`

- Data dependencies through registers (read in ID, write in WB)
- RAW: \(j+1>i+4\), true if \(j-i>3\), false if \(j-i\leq 3\) (hazard): instruction \(i\) must write back before instruction \(j\) reads the register in ID
- WAR: \(j+4>i+1\) true \(\forall j>i\)
- WAW: \(j+4>i+4\) true \(\forall j>i\)

- Data dependencies through memory (read in ME, write in ME) (load, store)
- RAW: \(j+3>i+3\) true \(\forall j>i\)
- WAR: \(j+3>i+3\) true \(\forall j>i\)
- WAW: \(j+3>i+3\) true \(\forall j>i\)

- Control dependencies through the PC (read in IF, write in ID)
- RAW: \(j+0>i+1\) true if \(j-i>1\) false if \(j-i\leq 1\) (hazard)
- WAR: \(j+1>i+0\) true \(\forall j>i\)
- WAW: \(j+1>i+1\) true \(\forall j>i\)
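
The nine conditions above all follow one pattern: the later access by \(j\) must land strictly after the earlier access by \(i\). A minimal Python sketch of that check, assuming the stage latencies listed earlier (the names `LATENCY`, `RESOURCES`, `is_satisfied`, and `hazard_distances` are illustrative, not from the notes):

```python
# Stage latencies measured from IF, as listed in the notes.
LATENCY = {"IF": 0, "ID": 1, "EX": 2, "ME": 3, "WB": 4}

# (read stage, write stage) for each resource, as stated in the notes.
RESOURCES = {
    "register": ("ID", "WB"),
    "memory": ("ME", "ME"),
    "PC": ("IF", "ID"),
}

def is_satisfied(kind, read_stage, write_stage, distance):
    """True if the dependency holds with no stall for j = i + distance."""
    r, w = LATENCY[read_stage], LATENCY[write_stage]
    if kind == "RAW":  # j reads after i writes: j + r > i + w
        return distance + r > w
    if kind == "WAR":  # j writes after i reads: j + w > i + r
        return distance + w > r
    if kind == "WAW":  # j writes after i writes: j + w > i + w
        return distance + w > w
    raise ValueError(f"unknown dependency kind: {kind}")

def hazard_distances(kind, read_stage, write_stage, max_distance=8):
    """Distances j - i (up to max_distance) at which the dependency is violated."""
    return [d for d in range(1, max_distance + 1)
            if not is_satisfied(kind, read_stage, write_stage, d)]

if __name__ == "__main__":
    for name, (rd, wr) in RESOURCES.items():
        for kind in ("RAW", "WAR", "WAW"):
            print(name, kind, hazard_distances(kind, rd, wr))
```

Only the RAW cases on registers and on the PC should report non-empty lists, since those are the only resources read in an earlier stage than they are written.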

- Hazard: potential for a violated dependency
- 3-cycle RAW hazard on registers
- 1-cycle RAW hazard on the PC (control hazard), since the PC condition fails only for \(j-i\leq 1\)

- Solving register RAW hazard:
- stall for \(x\) cycles
worst case (\(j=i+1\)): stall for a minimum of 3 cycles

`j=i+1:  j+1+x > i+4  =>  (i+1)+1+x > i+4  =>  x > 2, i.e. x >= 3 cycles`
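
The same inequality gives the stall count for any distance \(j-i\), not just the worst case. A small sketch, assuming registers are read in ID (latency 1) and written in WB (latency 4) with no forwarding; `min_stall` is a hypothetical helper name:

```python
def min_stall(distance, read_latency=1, write_latency=4):
    """Smallest integer x >= 0 such that distance + read_latency + x > write_latency.

    distance is j - i; the defaults model a register read in ID and written in WB.
    """
    # Strict inequality over integers: x >= write_latency - read_latency - distance + 1.
    return max(0, write_latency - read_latency - distance + 1)

if __name__ == "__main__":
    for d in range(1, 6):
        print(f"j - i = {d}: stall {min_stall(d)} cycle(s)")
```

The stall shrinks by one cycle for each independent instruction placed between \(i\) and \(j\), which is why instruction ordering matters.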

- Data forwarding

- stall for \(x\) cycles
- Special case: load hazard. A load cannot forward its result during the ME stage (the data is available only after ME, so it can be forwarded during WB)
- if a dependent instruction immediately follows a load (\(j=i+1\)) then it must stall for 1 penalty cycle
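
The load penalty can be derived with the same inequality once forwarding is modelled. The sketch below assumes a value becomes forwardable at the end of the stage that produces it (EX for ALU operations, ME for loads) and that the consumer needs it at the start of its own EX stage; both assumptions and the name `stall_with_forwarding` are illustrative and not taken from the notes:

```python
# Stage latencies measured from IF, as listed in the notes.
EX, ME = 2, 3

def stall_with_forwarding(distance, produced_in, needed_in=EX):
    """Smallest x >= 0 such that distance + needed_in + x > produced_in.

    distance    : j - i
    produced_in : stage latency at whose end i's result exists (EX or ME)
    needed_in   : stage latency at whose start j consumes it (assumed EX)
    """
    return max(0, produced_in - needed_in - distance + 1)

if __name__ == "__main__":
    print("ALU result, j = i+1:", stall_with_forwarding(1, EX))
    print("load result, j = i+1:", stall_with_forwarding(1, ME))
    print("load result, j = i+2:", stall_with_forwarding(2, ME))
```

Under these assumptions only a load followed immediately by its consumer still stalls, and for exactly one cycle.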

- Solving control hazard:
- if branch is taken then squash the fall-through instruction
- branch delay slot
- execute the fall-through instruction even if the branch is taken.
- MIPS does this: performance depends on ability to fill the delay slot
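
To see why the delay slot's value depends on filling it, the two schemes can be compared with a rough cost model. This is a sketch under stated assumptions: a base CPI of 1, one wasted cycle per squashed instruction or per unfilled delay slot, and made-up workload numbers; `branch_cpi` is a hypothetical helper:

```python
def branch_cpi(branch_fraction, wasted_fraction, penalty=1, base_cpi=1.0):
    """Average CPI when a fraction of branches each waste `penalty` cycles.

    branch_fraction : fraction of instructions that are branches
    wasted_fraction : fraction of branches that waste the slot
                      (squash: taken branches; delay slot: unfilled slots)
    """
    return base_cpi + branch_fraction * wasted_fraction * penalty

if __name__ == "__main__":
    # Hypothetical workload: 20% branches, 60% of them taken,
    # and a compiler that fills half of the delay slots usefully.
    squash = branch_cpi(0.20, wasted_fraction=0.60)
    delay_slot = branch_cpi(0.20, wasted_fraction=1 - 0.50)
    print(f"squash on taken: CPI = {squash:.2f}")
    print(f"delay slot:      CPI = {delay_slot:.2f}")
```

In this model the delay slot wins only when the fraction of unfilled slots is smaller than the fraction of taken branches.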

- compiler orders instructions to minimize hazards.
- example pipeline

```
IF->ID->EX->ME->WB
|->FP1->FP2->FP3->WB
integer register R0, R1, R2, ... //32 bit regs
floating-point register F0, F1, F2, ... //combined for double-precision
```

- assumptions:
- all possible forwarding paths implemented
- up to 1 integer WB and 1 floating-point WB per cycle (cannot do the WB for an L.D and an ADD.D in the same cycle)
- load delay = 1 cycle
- 1 branch delay slot
- stalls in ID to resolve hazards.