
# Lecture 07

## 3 Static Scheduling

```
IF->ID->EX->ME->WB
     |->FP1->FP2->FP3->WB
```
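One way to read stall counts off this pipeline is sketched below. Number the stages from IF = 0. If a producer's result can be forwarded at the end of stage p, and a consumer issued right behind it needs the value at the start of stage q, the consumer waits max(0, p - q) cycles. The stage assignments are assumptions, not taken from the lecture: full forwarding, FP1 in the same slot as EX, store data needed in ME, and branch comparison in ID. `ADD.D` is the FP add for x[i] + y, and `ADDI` is an assumed mnemonic for the integer instruction that steps R1.

```python
# Stage positions in the pipeline above, counting IF as 0.
# ASSUMPTION: FP1 starts in the same cycle slot as EX.
STAGE = {"IF": 0, "ID": 1, "EX": 2, "ME": 3, "FP1": 2, "FP2": 3, "FP3": 4}

# ASSUMPTION: stage at the END of which each producer's result can be forwarded.
READY = {"L.D": "ME", "ADD.D": "FP3", "ADDI": "EX"}

# ASSUMPTION: stage at the START of which each consumer needs the operand.
# S.D needs its store data in ME; BGE compares its registers in ID.
NEED = {"L.D": "EX", "ADDI": "EX", "ADD.D": "FP1", "S.D": "ME", "BGE": "ID"}


def stalls(producer: str, consumer: str) -> int:
    """Stall cycles when `consumer` is issued right after `producer`.

    The producer (issued in cycle 0) has its result at the end of cycle p.
    The consumer (issued in cycle 1) reaches stage q in cycle 1 + q, so it
    needs p - q extra cycles whenever that is positive.
    """
    p = STAGE[READY[producer]]
    q = STAGE[NEED[consumer]]
    return max(0, p - q)


if __name__ == "__main__":
    for prod, cons in [("L.D", "ADD.D"), ("ADD.D", "S.D"), ("ADDI", "BGE"), ("ADDI", "L.D")]:
        print(f"{prod} -> {cons}: {stalls(prod, cons)} stall(s)")
```

Under these assumptions each of the three dependent back-to-back pairs costs one cycle, while an integer result feeding a load address costs none.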
```
for (i = 999; i >= 0; i--) x[i] = x[i] + y;
// &x[999] in R1
// &x[0]   in R2
// y       in F0
//                                  stalls
Loop:   L.D     F2, (R1)            0
        ADD.D   F2, F2, F0          1
        S.D     F2, (R1)            1
        ADDI    R1, R1, -8          0
        BGE     R1, R2, Loop        1
        NOP                         0
// .D suffix = double-precision floating point
```
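To see where a stall column like this comes from, here is a minimal sketch that walks one iteration in issue order. It assumes single-issue, in-order execution and one-cycle penalties for load to FP add, FP add to store, and integer add to branch (the `EXTRA` table). `ADD.D` is the FP add and `ADDI` is an assumed mnemonic for the pointer decrement.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    op: str
    dest: Optional[str]      # register written, if any
    srcs: tuple[str, ...]    # registers read


# ASSUMPTION: extra cycles a consumer waits when it directly follows the
# producer (load -> FP add, FP add -> store, integer add -> branch).
EXTRA = {("L.D", "ADD.D"): 1, ("ADD.D", "S.D"): 1, ("ADDI", "BGE"): 1}


def count_stalls(code: list[Instr]) -> tuple[list[int], int]:
    """Single-issue, in-order model of one loop iteration.

    Each instruction issues one cycle after the previous one unless a source
    register's producer needs extra cycles. Returns the stall count for each
    instruction and the total cycles (instructions + stalls).
    """
    last_writer: dict[str, tuple[str, int]] = {}  # reg -> (op, issue cycle)
    per_instr: list[int] = []
    prev = -1
    for ins in code:
        earliest = prev + 1
        t = earliest
        for reg in ins.srcs:
            if reg in last_writer:
                p_op, p_t = last_writer[reg]
                t = max(t, p_t + 1 + EXTRA.get((p_op, ins.op), 0))
        per_instr.append(t - earliest)
        if ins.dest is not None:
            last_writer[ins.dest] = (ins.op, t)
        prev = t
    return per_instr, prev + 1


LOOP1 = [
    Instr("L.D", "F2", ("R1",)),
    Instr("ADD.D", "F2", ("F2", "F0")),
    Instr("S.D", None, ("F2", "R1")),
    Instr("ADDI", "R1", ("R1",)),
    Instr("BGE", None, ("R1", "R2")),
    Instr("NOP", None, ()),
]

if __name__ == "__main__":
    print(count_stalls(LOOP1))  # ([0, 1, 1, 0, 1, 0], 9)
```

Only read-after-write dependences within one iteration are tracked. Ignoring the loop-carried use of R1 is safe here because BGE and NOP already separate the ADDI from the next iteration's L.D.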
• average CPI (cycles per instruction)
• ideal CPI is 1
• stalls increase CPI

e.g. $$CPI_{loop1}=\frac{6\ \text{instr}\times 1\ \text{CPI}+3\ \text{stalls}}{6\ \text{instr}}=1.5$$
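The average-CPI formula written as a small function. The argument values are the loop1 numbers from the example; `base_cpi` is a hypothetical parameter for the ideal cost per instruction (1 here).

```python
def cpi(instructions: int, stall_cycles: int, base_cpi: float = 1.0) -> float:
    """Average CPI: every instruction costs base_cpi cycles, plus the stalls."""
    return (instructions * base_cpi + stall_cycles) / instructions


if __name__ == "__main__":
    print(cpi(6, 3))  # loop1: 6 instructions, 3 stall cycles -> 1.5
    print(cpi(6, 0))  # same count with every stall removed -> 1.0
```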

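• program execution time $$t=\frac{n\times CPI}{f}$$, where $$n$$ = dynamic instruction count and $$f$$ = clock frequency
• speedup of program 2 over program 1: $$S=\frac{t_1}{t_2}$$; on the same machine $$f$$ cancels, so $$S=\frac{n_1\times CPI_1}{n_2\times CPI_2}$$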
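The two formulas as functions. The clock rates in the usage loop are hypothetical and only show that the speedup does not depend on f when both programs run on the same machine. The instruction counts and CPIs are the loop1 and loop2 figures from the local-scheduling example.

```python
def exec_time(n: int, cpi: float, f_hz: float) -> float:
    """Program execution time t = n * CPI / f, in seconds."""
    return n * cpi / f_hz


def speedup(t1: float, t2: float) -> float:
    """Speedup of program 2 over program 1: S = t1 / t2."""
    return t1 / t2


if __name__ == "__main__":
    # loop1 vs loop2 over 1000 iterations; the clock rates are hypothetical.
    for f in (1e9, 2.5e9):
        t1 = exec_time(6 * 1000, 1.5, f)
        t2 = exec_time(5 * 1000, 1.0, f)
        print(f"f = {f:.1e} Hz: S = {speedup(t1, t2):.2f}")
```

Both lines report S = 1.80, since f appears in the numerator and denominator alike.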

## 4 Local Scheduling

• scheduling within the confines of a basic block
• basic block: a straight-line sequence in which, if one instruction executes, all of them do (only the first instruction can be a branch target, and only the last instruction can be a branch)
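The definition translates directly into the usual "leaders" rule for cutting code into basic blocks: the first instruction, every branch target, and the instruction after a branch each start a new block. This is a minimal sketch over a hypothetical instruction list. The `Instr` fields, the `Skip` label and the `BEQ` branch are made up for illustration, and `delay_slots` lets a block end after a branch delay slot, as with the NOP that follows BGE in loop1.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    label: Optional[str]   # label defined on this instruction, if any
    op: str
    target: Optional[str]  # branch target label; None if not a branch


def basic_blocks(code: list[Instr], delay_slots: int = 0) -> list[list[Instr]]:
    """Split code into basic blocks with the 'leaders' rule.

    Leaders are the first instruction, every branch target, and the
    instruction that follows a branch (and its delay slots, if any).
    Each block runs from one leader up to the next.
    """
    if not code:
        return []
    targets = {ins.target for ins in code if ins.target is not None}
    leaders = {0}
    for i, ins in enumerate(code):
        if ins.label is not None and ins.label in targets:
            leaders.add(i)
        if ins.target is not None:
            nxt = i + 1 + delay_slots
            if nxt < len(code):
                leaders.add(nxt)
    starts = sorted(leaders)
    ends = starts[1:] + [len(code)]
    return [code[s:e] for s, e in zip(starts, ends)]


# Hypothetical fragment with an if-style forward branch inside a loop.
CODE = [
    Instr("Loop", "L.D", None),
    Instr(None, "BEQ", "Skip"),
    Instr(None, "NOP", None),      # delay slot
    Instr(None, "ADD.D", None),
    Instr("Skip", "S.D", None),
    Instr(None, "BGE", "Loop"),
    Instr(None, "NOP", None),      # delay slot
]

if __name__ == "__main__":
    for block in basic_blocks(CODE, delay_slots=1):
        print([ins.op for ins in block])
```

This prints three blocks: `['L.D', 'BEQ', 'NOP']`, `['ADD.D']` and `['S.D', 'BGE', 'NOP']`. A loop body with no internal branches, like loop1, comes out as a single block.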
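```
Loop2:  L.D     F2, (R1)
        ADDI    R1, R1, -8
        ADD.D   F2, F2, F0
        BGE     R1, R2, Loop2
        S.D     F2, 8(R1)      // delay slot; R1 already decremented, so 8(R1) still addresses x[i]
// 0 stalls
```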

$$CPI_{loop2}=\frac{5\times 1+0}{5}=1$$, $$S_{loop2/loop1}=\frac{t_{loop1}}{t_{loop2}}=\frac{6\times 1000\times 1.5}{5\times 1000\times 1}=1.8$$
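Moving the store below the pointer decrement is only correct because its displacement changes from 0 to 8. The small interpreter below checks this. It assumes a hypothetical base address `BASE` for x[0], 8-byte doubles, `ADDI` as the decrement mnemonic, and that the instruction after BGE sits in a branch delay slot that executes on every iteration. It compares the scheduled loop2 against the C loop; with `store_offset=0` (keeping the store's original `(R1)` address) the results no longer match.

```python
BASE = 0x1000  # hypothetical byte address of x[0]
WORD = 8       # bytes per double


def run_loop2(x: list[float], y: float, store_offset: int = 8) -> list[float]:
    """Interpret the scheduled loop2 on a copy of x (x must be non-empty).

    The instruction after BGE is treated as a delay slot: it executes on
    every iteration, after the branch outcome is known.
    """
    mem = {BASE + WORD * i: v for i, v in enumerate(x)}
    r1, r2, f0 = BASE + WORD * (len(x) - 1), BASE, y  # &x[n-1], &x[0], y
    while True:
        f2 = mem[r1]                  # L.D   F2, (R1)
        r1 -= WORD                    # ADDI  R1, R1, -8
        f2 = f2 + f0                  # ADD.D F2, F2, F0
        taken = r1 >= r2              # BGE   R1, R2, Loop2
        mem[r1 + store_offset] = f2   # S.D   F2, 8(R1) when store_offset = 8 (delay slot)
        if not taken:
            break
    return [mem[BASE + WORD * i] for i in range(len(x))]


def reference(x: list[float], y: float) -> list[float]:
    """for (i = n-1; i >= 0; i--) x[i] = x[i] + y, on a copy of x."""
    out = list(x)
    for i in range(len(out) - 1, -1, -1):
        out[i] = out[i] + y
    return out


if __name__ == "__main__":
    x = [float(i) for i in range(1000)]
    print(run_loop2(x, 0.5) == reference(x, 0.5))                  # True
    print(run_loop2(x, 0.5, store_offset=0) == reference(x, 0.5))  # False
```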

## 5 Global Scheduling

Scheduling across multiple basic blocks

### 5.1 Loop Unrolling

• replicate the loop body $$n$$ times
• reduces loop overhead (pointer-update and branch instructions per element) and increases scheduling opportunities
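At the source level, unrolling the x[i] = x[i] + y loop by two can be sketched as below, with Python standing in for the C loop. It assumes an even number of elements, as with the 1000 here; an odd count would need one leftover iteration. The function name is hypothetical.

```python
def add_y_unrolled(x: list[float], y: float) -> int:
    """x[i] = x[i] + y for i = n-1 down to 0, with the loop body replicated
    twice (unroll factor 2). Returns how many loop iterations ran.

    ASSUMPTION: len(x) is even.
    """
    if len(x) % 2 != 0:
        raise ValueError("sketch assumes an even number of elements")
    iterations = 0
    i = len(x) - 1
    while i >= 0:
        x[i] = x[i] + y            # copy 1 of the body
        x[i - 1] = x[i - 1] + y    # copy 2 of the body
        i -= 2                     # one index update and one test per two elements
        iterations += 1
    return iterations


if __name__ == "__main__":
    x = [float(i) for i in range(1000)]
    print(add_y_unrolled(x, 0.5))  # 500
```

The loop-control work (index update and test) now runs 500 times instead of 1000, and the two body copies are independent, which is what gives the scheduler room to interleave them.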

• example: unroll loop1 twice (2 copies total)
• before scheduling

```
Loop3:  L.D     F2, (R1)
        ADD.D   F2, F2, F0
        S.D     F2, (R1)
        ADDI    R1, R1, -8
        L.D     F2, (R1)
        ADD.D   F2, F2, F0
        S.D     F2, (R1)
        ADDI    R1, R1, -8
        BGE     R1, R2, Loop3
        NOP
```
• after scheduling

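```
Loop3:  L.D     F2, (R1)
        L.D     F4, -8(R1)
        ADD.D   F2, F2, F0
        ADD.D   F4, F4, F0
        ADDI    R1, R1, -16
        S.D     F2, 16(R1)
        BGE     R1, R2, Loop3
        S.D     F4, 8(R1)      // delay slot
```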
$$CPI_{loop3}=\frac{8\times 1+0}{8}=1$$, $$S_{loop3/loop1}=\frac{t_{loop1}}{t_{loop3}}=\frac{6\times 1000\times 1.5}{8\times 500\times 1}=2.25$$ (500 iterations, since each iteration now processes two elements)
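The two scheduled versions suggest a pattern: 5 instructions per iteration for one copy and 8 for two. The sketch below extrapolates this to an unroll factor k. It is an assumption, not from the lecture: 3k + 2 instructions per iteration (k L.D, k ADD.D, k S.D, one pointer update, one branch, with a store filling the delay slot), CPI = 1, and k dividing the 1000-element trip count. Register pressure and code size are ignored.

```python
from fractions import Fraction

N = 1000                               # elements processed, as in loop1
LOOP1_CYCLES = 6 * N * Fraction(3, 2)  # 6 instructions/iteration at CPI 1.5


def unrolled_cycles(k: int) -> Fraction:
    """Cycles for the loop unrolled k times and fully scheduled.

    ASSUMPTION (extrapolated from loop2 and loop3): 3k + 2 instructions per
    iteration, CPI = 1, and N / k iterations.
    """
    if N % k != 0:
        raise ValueError("sketch assumes k divides N")
    return Fraction((3 * k + 2) * (N // k))


def speedup_over_loop1(k: int) -> Fraction:
    """S = t_loop1 / t_unrolled; the clock frequency cancels."""
    return LOOP1_CYCLES / unrolled_cycles(k)


if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        print(k, float(speedup_over_loop1(k)))
```

k = 1 and k = 2 reproduce 1.8 and 2.25. Under this model the gains shrink as k grows, approaching 3 because each element still needs its own load, add, and store.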