Branch Prediction and Predication
Prerequisites: 02-Pipelining-and-Hazards Learning Goals: Understand how branch mispredictions cost performance, and the spectrum of predictors from simple to sophisticated. Understand when predication is preferable to prediction.
Why Branch Prediction Matters
- Branches account for ~20% of all instructions
- ~60% of branches are taken
- Mispredicting a branch wastes all instructions fetched down the wrong path
- Mispredictions/Instruction depends on predictor accuracy
- Penalty/Misprediction depends on pipeline depth
The deeper the pipeline, the more important accurate branch prediction becomes.
Every Processor Predicts
Options when encountering a branch:
| Approach | Branch Cost | Non-branch Cost |
|---|---|---|
| Refuse-to-predict (stall) | 3 cycles | 2 cycles |
| Predict not-taken | 1 cycle (if NT) or 3 (if taken) | 1 cycle |
Every processor uses some form of prediction — refusing to predict is itself a prediction.
Predict Not-Taken Accuracy
- 20% of instructions are branches; 60% of those are taken
- Predict Not-Taken is correct: 80% × 100% + 20% × 40% = 88% of all instructions
Branch Target Buffer (BTB)
The BTB stores the target PC for branches, indexed by the branch instruction’s PC.
BTB Steps
- At Fetch, use current PC to look up BTB
- Read predicted next PC from BTB
- Compare predicted PC with actual PC computed in ALU stage
- If match → correct prediction; if mismatch → misprediction, update BTB
BTB Design Constraints:
- Must have 1-cycle latency → must be small
- Use LSBs of PC for indexing (only likely-soon instructions needed)
Direction Prediction: Branch History Table (BHT)
The BHT predicts whether a branch is taken or not-taken, separately from the target address.
1-Bit Predictor
- Index BHT with LSB of PC
- Entry:
0= not-taken,1= taken - If BHT=0: just increment PC; if BHT=1: look up target in BTB
Problem: Each behavior change causes 2 mispredictions (e.g., loop exit: predicts taken, misses; changes to not-taken, misses again on next loop entry).
2-Bit Predictor (2BP / 2BC)
It requires two consecutive wrong outcomes to change its prediction, which prevents a single anomaly from ruining future predictions.
Uses 2 bits per entry:
- MSB = prediction (taken or not-taken)
- LSB = conviction (strong or weak)
| State | Prediction | Conviction |
|---|---|---|
00 | Not-taken | Strong |
01 | Not-taken | Weak |
10 | Taken | Weak |
11 | Taken | Strong |
State transitions:
| Current State | Branch Outcome | Next State |
|---|---|---|
| 00 | Not taken | 00 |
| 00 | Taken | 01 |
| 01 | Not taken | 00 |
| 01 | Taken | 10 |
| 10 | Not taken | 01 |
| 10 | Taken | 11 |
| 11 | Not taken | 10 |
| 11 | Taken | 11 |
- Single anomaly: 1 misprediction
- Behavior change: 2 mispredictions
- Usually initialized to
00(easiest)
Note: Every predictor has a sequence that causes every prediction to be wrong. More bits alone don’t dramatically improve accuracy.
History-Based Predictors
Use the history of past branch outcomes to predict patterns like TNTNTN… or TTNTTNTTN…
1-Bit History with 2-Bit Counters
- BHT entry: 1 history bit + two 2-bit counters (one for when last outcome was NT, one for T)
- Select which counter based on history bit
N-Bit History Predictor
- Can predict all patterns of length ≤ N+1
- Cost: N + 2×2^N bits per entry (most counters wasted)
Pattern History Table (PHT)
- Share counters across many BHT entries to reduce cost
PShare vs. GShare
| Predictor | History | Counters | Best For |
|---|---|---|---|
| PShare | Private (per-branch) | Shared | Small loops, predictable short patterns |
| GShare | Global (all branches) | Shared | Correlated branches |
In practice: Use both together in a processor.
Tournament Predictors
Combines two predictors and a meta-predictor that chooses which predictor to trust:
| GShare | PShare | Meta-Predictor Action |
|---|---|---|
| Correct | Correct | No change |
| Correct | Incorrect | Count down (favor GShare) |
| Incorrect | Correct | Count up (favor PShare) |
| Incorrect | Incorrect | No change |
Hierarchical Predictors
- Tournament: combines two good predictors; both always updated
- Hierarchical: combines one good + one okay predictor; the good predictor is only updated when the okay predictor is wrong → saves power
An “okay” predictor is typically a simple hardware structure that requires very little power and has low latency. e.g. a small Branch History Table A “good” predictor is a more sophisticated structure designed to catch difficult or correlated patterns that simpler logic would miss. e.g. A GShare or PShare
Return Address Stack (RAS)
Different branch types need different prediction:
| Branch Type | Best Predictor | |
|---|---|---|
| Conditional | BTB | |
| Unconditional | BTB | |
| Function Return (RET) | RAS |
RAS: A small hardware stack storing return addresses for function calls.
- On CALL: push return address onto RAS
- On RET: pop from RAS
- When full: wrap-around replacement, meaning it overwrites the oldest entries.
Why RAS is Necessary
While a Branch Target Buffer (BTB) is great for “normal” branches, it struggles with RET instructions.
- The Problem: A single function (like
print()) might be called from ten different places in your code. - BTB Limitation: A BTB usually only remembers the last destination. If
print()was called by Function A last time, the BTB will predict it returns to Function A, even if Function B is the one calling it now. - The Solution: The RAS uses a Last-In, First-Out (LIFO) stack to perfectly track these nested calls.
Identifying RET(return) early: Use a predictor or predecoding from the prefetch stage.
Predication
For branches that are hard to predict (e.g., small if-then-else), predication can be better than prediction.
Predication: Execute both paths; select the correct result; discard the wrong-path work.
When to Use Each
| Scenario | Better Approach |
|---|---|
| Loops | Prediction |
| Function calls/returns | Prediction |
| Large if-then-else | Prediction |
| Small if-then-else (hard to predict) | Predication |
Conditional Move (MOVC)
- Compiler converts if-then-else to conditional instructions (e.g.,
MOVC) - Requires:
- Compiler support
- More registers
- More instructions executed
Full Predication HW Support: Add predicate bits to every instruction telling the processor which predicate register qualifies the instruction.
Summary
Key Takeaways:
- Branches are frequent (~20%) — misprediction penalty scales with pipeline depth
- BTB: predicts branch targets; BHT: predicts branch direction
- 2-bit predictors reduce mispredictions from behavior changes vs 1-bit
- History-based predictors (PShare, GShare) recognize patterns
- Tournament/hierarchical predictors combine strategies adaptively
- RAS handles function return prediction
- Predication eliminates hard-to-predict branches at the cost of executing both paths
See Also: 02-Pipelining-and-Hazards, 04-ILP-and-Register-Renaming Next: 04-ILP-and-Register-Renaming