HPCA Study Guide

Course Overview

High Performance Computer Architecture covers the principles and techniques used to design fast, efficient processors. Topics progress from fundamentals through advanced parallelism and memory systems.

Learning Path

Prerequisites

Basic digital logic and computer organization
Understanding of assembly language / instruction sets
Familiarity with binary arithmetic

Recommended Study Order

01-Introduction-and-Metrics — Why architecture matters; how to measure performance
02-Pipelining-and-Hazards — The core pipeline mechanism and its problems
03-Branch-Prediction — Resolving control hazards efficiently
04-ILP-and-Register-Renaming — Exploiting instruction-level parallelism; removing false dependencies
05-Tomasulo-and-OOO-Execution — Hardware out-of-order scheduling
06-ROB-and-Memory-Ordering — Correct OOO execution; memory dependency handling
07-Compiler-ILP-and-VLIW — Compiler-driven ILP; VLIW architecture
08-Advanced-Caches — Cache optimization techniques; hierarchy design
09-Cache-Coherence — Multi-core coherence protocols

Quick Reference

Key Formulas

Formula	Meaning
`CPU Time = #Instructions × CPI × Clock Cycle Time`	Iron Law of Performance
`Speedup = Latency(Y) / Latency(X)`	Performance comparison
`Speedup = 1 / ((1 - Frac) + Frac/SpeedupEnh)`	Amdahl’s Law
`P = ½ C × V² × freq × alpha`	Dynamic power
`CPI = 1 + (Mispredictions/Inst) × (Penalty/Misprediction)`	Branch penalty CPI
`AMAT = Hit Time + Miss Rate × Miss Penalty`	Average Memory Access Time

Key Concepts at a Glance

Moore’s Law — Transistors double every 18–24 months
Memory Wall — Memory latency improves only ~1.1× every 2 years vs faster CPUs
ILP — Instruction-Level Parallelism (ideal); ILP ≥ IPC always
RAW — True data dependency (Read After Write)
WAR / WAW — False (name) dependencies, removable by register renaming
Tomasulo’s Algorithm — Hardware OOO via Reservation Stations + RAT
ROB — ReOrder Buffer enables in-order commit for correct OOO execution
AMAT — Average Memory Access Time; optimized via hit time, miss rate, miss penalty

File Descriptions

01-Introduction-and-Metrics

Topics: Computer architecture goals, Moore’s Law, Memory Wall, power consumption (dynamic/static), fabrication costs, performance metrics, benchmarks, Iron Law, Amdahl’s Law Key Learning Goals: Understand why architecture matters, how performance is measured and compared

02-Pipelining-and-Hazards

Topics: 5-stage pipeline, pipelining CPI, pipeline stalls & flushes, control/data dependencies, hazard types (RAW/WAW/WAR), hazard handling strategies, pipeline depth trade-offs Key Learning Goals: Understand how pipelining improves throughput and what hazards cost performance

03-Branch-Prediction

Topics: Branch behavior, BTB, BHT, 1-bit/2-bit predictors, history-based predictors, PShare/GShare, tournament/hierarchical predictors, Return Address Stack Key Learning Goals: Understand how branch mispredictions hurt performance and how predictors minimize this

04-ILP-and-Register-Renaming

Topics: ILP definition, register renaming, RAT (Register Allocation Table), architectural vs physical registers, predication for branches Key Learning Goals: Understand ILP as an ideal upper bound, and how register renaming removes false dependencies

05-Tomasulo-and-OOO-Execution

Topics: Tomasulo’s Algorithm, Reservation Stations (RS), Issue/Dispatch/Broadcast pipeline, stale results, load/store dependencies Key Learning Goals: Understand the hardware mechanism for out-of-order instruction scheduling

06-ROB-and-Memory-Ordering

Topics: ReOrder Buffer (ROB), exception/misprediction handling in OOO, commit semantics, superscalar, Load/Store Queue (LSQ), store-to-load forwarding Key Learning Goals: Understand how the ROB enables correct OOO execution and how memory ordering is maintained

07-Compiler-ILP-and-VLIW

Topics: Tree height reduction, instruction scheduling, loop unrolling, function inlining, VLIW architecture, superscalar vs VLIW comparison Key Learning Goals: Understand how compilers extract ILP and the trade-offs of VLIW design

08-Advanced-Caches

Topics: AMAT optimization, pipelined caches, VIPT caches, way prediction, replacement policies (LRU/NMRU/PLRU), 3 Cs of misses, prefetching, non-blocking caches, MSHR, cache hierarchies, inclusion/exclusion Key Learning Goals: Understand techniques to reduce hit time, miss rate, and miss penalty

09-Cache-Coherence

Topics: Coherence problem in multicore, write-update vs write-invalidate, snooping, MSI/MOSI/MOESI protocols, directory-based coherence, coherence misses (true/false sharing) Key Learning Goals: Understand why multicore systems need coherence protocols and how major protocols work

Study Tips

The Iron Law and Amdahl’s Law appear on every exam — know them cold
Pipeline hazard questions require tracing instruction timing cycle by cycle
Tomasulo’s Algorithm questions require filling in RS/RAT tables step by step
ROB + commit = correct OOO; remember “phantom exceptions”
AMAT questions: always expand the full hierarchy equation
Cache coherence: draw state machine transitions for MSI/MOESI