Understanding Static Analysis
Before building static analyzers in OCaml, let's develop intuition for how control flow graphs represent program execution, why the dataflow framework works, and the key analyses (reaching definitions, live variables) you'll implement in the exercises.
Why Control Flow Graphs?
An AST tells you what the code says. But to analyze how the code runs — which statements execute, in what order, along which paths — you need a control flow graph (CFG).
How many distinct execution paths exist through this code?
let x = input()
if x > 0 then
if x > 10 then
y = "big"
else
y = "small"
else
y = "negative"
print(y)Explicit Flow
CFGs make every possible execution path visible — branches, loops, and fallthrough are all edges.
Foundation for Dataflow
Reaching definitions, live variables, and all dataflow analyses operate on CFGs — they propagate facts along edges.
Bug Detection
Unreachable code, uninitialized variables, and dead stores all become visible through CFG-based analysis.
Building Blocks of CFGs
A basic block is a maximal sequence of consecutive statements with one entry and one exit — no branches in or out except at the beginning and end.
Code
x = 1 y = 2 z = x + y
CFG
ENTRY | [B1] x=1 y=2 z=x+y | EXIT
Consecutive statements with no branches form a single basic block. Control flows straight through.
Click lines that start a new basic block. A new block starts at: the first statement, branch targets, join points, and statements after branches.
The Dataflow Framework
All dataflow analyses share a common framework: a lattice of abstract values, a transfer function per block, a join operator at merge points, and an iteration strategy that computes until fixpoint.
The powerset lattice of {d1, d2, d3} has 8 elements. Pick two sets to see their join (∪) and meet (∩).
{d1,d2,d3} ← TOP (⊤)
/ | \
{d1,d2} {d1,d3} {d2,d3}
| \ / | \ / |
| {d1} {d2} {d3}
| \ | /
\ \ | /
∅ ← BOTTOM (⊥)Set A: ∅
Set B: ∅
Join (∪): A ∪ B = ∅
Least upper bound — the smallest set containing both A and B. Used at merge points in may-analyses.
Meet (∩): A ∩ B = ∅
Greatest lower bound — elements present in both A and B. Used at merge points in must-analyses.
Forward Analysis (e.g., Reaching Definitions)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Direction: ENTRY ──────────────► EXIT
Merge at: Block entry (IN)
Compute: Block exit (OUT)
IN[B] = ∪ OUT[P] for all predecessors P
OUT[B] = transfer(IN[B])
= gen[B] ∪ (IN[B] − kill[B])
Initialize: OUT[ENTRY] = ∅Fixpoint: Iterate until no IN or OUT set changes. The lattice has finite height (2n for n definitions or variables), so iteration always terminates. For monotone transfer functions, this is guaranteed by the ascending chain condition.
Reaching Definitions
A definition d: x = ... reaches a program point if there exists a path from the definition to that point on which x is not redefined. This is a forward may-analysis — we ask "can this definition possibly reach here?"
[B1] x = 5 [B1] y = 1 [B2] if x > 3 [B3] y = x + 1 (then) [B4] y = x - 1 (else) [B5] z = y (join)
| Block | gen | kill | IN | OUT |
|---|---|---|---|---|
| B1 | {d1,d2} | {} | ∅ | ∅ |
| B2 | {} | {} | ∅ | ∅ |
| B3 | {d3} | {d2,d4} | ∅ | ∅ |
| B4 | {d4} | {d2,d3} | ∅ | ∅ |
| B5 | {d5} | {} | ∅ | ∅ |
All IN and OUT sets start empty. Definitions: d1:x=5, d2:y=1, d3:y=x+1, d4:y=x-1, d5:z=y.
Why Does This Matter?
Reaching definitions answer: "where could this value have come from?" This enables:
- Constant propagation: If only one definition reaches a use, and it's a constant, we can substitute it.
- Dead store detection: If a definition reaches no use, the store is dead.
- Use-def chains: The reaching definitions at each use point form the use-def chain.
Live Variables
A variable is live at a point if its current value may be read before being overwritten. This is a backward may-analysis — information flows from uses back to definitions.
[B1] x = 5 [B1] y = 1 [B2] if x > 3 [B3] y = x + 1 (then) [B4] y = x - 1 (else) [B5] z = y (join)
| Block | use | def | IN | OUT |
|---|---|---|---|---|
| B5 | {y} | {z} | ∅ | ∅ |
| B4 | {x} | {y} | ∅ | ∅ |
| B3 | {x} | {y} | ∅ | ∅ |
| B2 | {x} | {} | ∅ | ∅ |
| B1 | {} | {x,y} | ∅ | ∅ |
All IN/OUT start empty. Note: we process backward (B5 first, B1 last). use = vars read before redefined. def = vars assigned.
Reaching Definitions
- Direction: Forward (ENTRY → EXIT)
- Domain: Sets of definitions
- gen/kill: Definitions created / overwritten
- Transfer: OUT = gen ∪ (IN − kill)
- Meet: ∪ (may analysis)
- Answers: "Where did this value come from?"
Live Variables
- Direction: Backward (EXIT → ENTRY)
- Domain: Sets of variables
- use/def: Vars read / overwritten
- Transfer: IN = use ∪ (OUT − def)
- Meet: ∪ (may analysis)
- Answers: "Will this value be used later?"
Same framework, different direction. Both analyses use the powerset lattice with union as join. The transfer functions are structurally identical — just with "gen" renamed to "use" and "kill" renamed to "def", flowing in the opposite direction.
Ready to Build Your Own?
You now understand how CFGs represent control flow, how the dataflow framework propagates facts, and the duality between reaching definitions and live variables. Time to implement these concepts in OCaml.
CFG Explorer
Interactively explore CFGs, step through reaching definition and live variable iterations, and inspect call graphs on 5 code snippets.
Open explorer →Exercise 1: CFG Construction
Build control flow graphs from ASTs. Implement basic block identification, edge construction, and predecessor/successor computation.
Start exercise →