source

03 Branch Prediction

Branch Prediction and Predication

Prerequisites: 02-Pipelining-and-Hazards Learning Goals: Understand how branch mispredictions cost performance, and the spectrum of predictors from simple to sophisticated. Understand when predication is preferable to prediction.


Why Branch Prediction Matters

CPI=1+MispredictionsInstructions×PenaltyMisprediction\text{CPI} = 1 + \frac{\text{Mispredictions}}{\text{Instructions}} \times \frac{\text{Penalty}}{\text{Misprediction}}

The deeper the pipeline, the more important accurate branch prediction becomes.


Every Processor Predicts

Options when encountering a branch:

ApproachBranch CostNon-branch Cost
Refuse-to-predict (stall)3 cycles2 cycles
Predict not-taken1 cycle (if NT) or 3 (if taken)1 cycle

Every processor uses some form of prediction — refusing to predict is itself a prediction.

Predict Not-Taken Accuracy


Branch Target Buffer (BTB)

The BTB stores the target PC for branches, indexed by the branch instruction’s PC.

BTB Steps

  1. At Fetch, use current PC to look up BTB
  2. Read predicted next PC from BTB
  3. Compare predicted PC with actual PC computed in ALU stage
  4. If match → correct prediction; if mismatch → misprediction, update BTB

BTB Design Constraints:


Direction Prediction: Branch History Table (BHT)

The BHT predicts whether a branch is taken or not-taken, separately from the target address.

1-Bit Predictor

Problem: Each behavior change causes 2 mispredictions (e.g., loop exit: predicts taken, misses; changes to not-taken, misses again on next loop entry).

2-Bit Predictor (2BP / 2BC)

It requires two consecutive wrong outcomes to change its prediction, which prevents a single anomaly from ruining future predictions.

Uses 2 bits per entry:

StatePredictionConviction
00Not-takenStrong
01Not-takenWeak
10TakenWeak
11TakenStrong

State transitions:

Current StateBranch OutcomeNext State
00Not taken00
00Taken01
01Not taken00
01Taken10
10Not taken01
10Taken11
11Not taken10
11Taken11

Note: Every predictor has a sequence that causes every prediction to be wrong. More bits alone don’t dramatically improve accuracy.


History-Based Predictors

Use the history of past branch outcomes to predict patterns like TNTNTN… or TTNTTNTTN…

1-Bit History with 2-Bit Counters

N-Bit History Predictor

Pattern History Table (PHT)


PShare vs. GShare

PredictorHistoryCountersBest For
PSharePrivate (per-branch)SharedSmall loops, predictable short patterns
GShareGlobal (all branches)SharedCorrelated branches

In practice: Use both together in a processor.


Tournament Predictors

Combines two predictors and a meta-predictor that chooses which predictor to trust:

GSharePShareMeta-Predictor Action
CorrectCorrectNo change
CorrectIncorrectCount down (favor GShare)
IncorrectCorrectCount up (favor PShare)
IncorrectIncorrectNo change

Hierarchical Predictors

An “okay” predictor is typically a simple hardware structure that requires very little power and has low latency. e.g. a small Branch History Table A “good” predictor is a more sophisticated structure designed to catch difficult or correlated patterns that simpler logic would miss. e.g. A GShare or PShare


Return Address Stack (RAS)

Different branch types need different prediction:

Branch TypeBest Predictor
ConditionalBTB
UnconditionalBTB
Function Return (RET)RAS

RAS: A small hardware stack storing return addresses for function calls.

Why RAS is Necessary

While a Branch Target Buffer (BTB) is great for “normal” branches, it struggles with RET instructions.

Identifying RET(return) early: Use a predictor or predecoding from the prefetch stage.


Predication

For branches that are hard to predict (e.g., small if-then-else), predication can be better than prediction.

Predication: Execute both paths; select the correct result; discard the wrong-path work.

When to Use Each

ScenarioBetter Approach
LoopsPrediction
Function calls/returnsPrediction
Large if-then-elsePrediction
Small if-then-else (hard to predict)Predication

Conditional Move (MOVC)

Full Predication HW Support: Add predicate bits to every instruction telling the processor which predicate register qualifies the instruction.


Summary

Key Takeaways:

See Also: 02-Pipelining-and-Hazards, 04-ILP-and-Register-Renaming Next: 04-ILP-and-Register-Renaming