Code Representation & ASTs
Understanding ASTs
Interactive walkthrough of AST construction, traversals, symbol tables, and transformations.
Start learning →AST Explorer
Explore AST trees, step through traversals, inspect symbol tables, and see transformations on real code snippets.
Open playground →Exercises
0/4 completeModule 2: Code Representation and Abstract Syntax Trees -- Student Guide
Welcome to Module 2! You will explore how compilers and analyzers represent source code internally as Abstract Syntax Trees (ASTs). You will build visualizers, traversal algorithms, a scoped symbol table, and AST transformation passes -- all in OCaml.
Source Code --> Lexer --> Tokens --> Parser --> AST
|
+──────────────────────────+──────────────+
| | | |
Visualize Traverse Symbol Table Transform
(Exercise 1) (Exercise 2) (Exercise 3) (Exercise 4)
Exercises: 4 (63 tests) | Lab: Lab 2 -- AST Parser (9 tests) Estimated time: 1.5--2 hours for exercises, 2--3 hours for the lab
Table of Contents
- How Exercises Work
- Background: The AST Types
- Exercise 1: AST Structure Mapping (no tests)
- Exercise 2: Traversal Algorithms (27 tests)
- Exercise 3: Symbol Table (6 tests)
- Exercise 4: AST Transformations (30 tests)
- Lab 2: AST Parser and Analyzer (9 tests)
- Troubleshooting
- Exercise Progression Cheat Sheet
1. How Exercises Work
For environment setup (OCaml, opam, dune, ounit2, menhir), follow the
instructions in resources/tools/installation-guides/ocaml-setup.md.
Each exercise has this structure:
exercises/traversal-algorithms/
├── dune <-- tells dune which dirs to build
├── starter/
│ ├── dune <-- library definition
│ ├── traversals.ml <-- YOUR FILE -- edit this
│ └── visitor.ml <-- YOUR FILE -- edit this
└── tests/
├── dune <-- test configuration
└── test_traversals.ml <-- read-only test file
The workflow:
-
Read the starter file -- types and docstrings are provided; every function body is
failwith "TODO"with a comment explaining what to implement. -
Implement one function at a time -- replace the
failwithwith real code. -
Run tests -- see your progress:
dune runtest modules/module2-ast/exercises/traversal-algorithms/ -
Interpret the output:
Early on (many TODOs remaining), you will often see:
Fatal error: exception Failure("TODO: ...")This means the tests hit a
failwith "TODO"stub and crashed before they could run individually. Keep implementing functions from the top of the file.Once enough functions are implemented, you will see per-test results:
..E.FEEEE.= test passedE= error (your code threw an exception -- likely a remaining TODO)F= failure (your code ran but returned the wrong answer -- check logic)
-
Repeat until all tests pass:
....................(all dots!)
Tips:
- Implement functions in the order they appear in the file -- later functions often depend on earlier ones.
- Read the test file (
tests/test_*.ml) to understand exactly what is expected. - Tests are read-only -- do not modify them.
- Always run
dunecommands from the repository root, not from inside an exercise directory.
2. Background: The AST Types
All exercises in this module (except Exercise 3) use the shared AST types
defined in lib/shared_ast/ast_types.ml. You access them via
Shared_ast.Ast_types or with open Shared_ast.Ast_types.
Operators
type op =
| Add | Sub | Mul | Div (* arithmetic *)
| Eq | Neq | Lt | Gt | Le | Ge (* comparison *)
| And | Or (* logical *)
type uop = Neg | Not (* unary operators *)
Expressions
type expr =
| IntLit of int (* 42 *)
| BoolLit of bool (* true *)
| Var of string (* x *)
| BinOp of op * expr * expr (* x + y *)
| UnaryOp of uop * expr (* -x *)
| Call of string * expr list (* f(x, y) *)
Statements
type stmt =
| Assign of string * expr (* x = e *)
| If of expr * stmt list * stmt list (* if e { ... } else { ... } *)
| While of expr * stmt list (* while e { ... } *)
| Return of expr option (* return e or return *)
| Print of expr list (* print(e1, e2) *)
| Block of stmt list (* { s1; s2; ... } *)
Functions and Programs
type func_def = {
name : string;
params : string list;
body : stmt list;
}
type program = func_def list
How it fits together
Here is how the statement x = (2 + 3) * 4 looks as an AST:
Assign("x")
|
BinOp(Mul)
/ \
BinOp(Add) IntLit(4)
/ \
IntLit(2) IntLit(3)
Each node in the tree corresponds to one constructor from the types above. Your exercises will build, walk, query, and rewrite these trees.
3. Exercise 1: AST Structure Mapping (no tests)
Goal: Build three visualization functions that render ASTs in different formats. This is a warm-up exercise with no automated tests -- you run the executable and inspect the output visually.
Time: ~20 minutes
Files to edit:
exercises/ast-structure-mapping/starter/ast_visualizer.ml
Also provided:
exercises/ast-structure-mapping/starter/sample_asts.ml-- pre-built AST examples that re-export programs fromShared_ast.Sample_programsexercises/ast-structure-mapping/starter/worksheet.md-- guided hand-drawing exercises (fill in on paper or in the file)
What to implement
| # | Function | What it does |
|---|---|---|
| 1 | dump_expr / dump_stmt / dump_stmts / dump_ast | Indented tree dump: each node on its own line, children indented 2 more spaces |
| 2 | inc / count_expr / count_stmt / count_stmts / count_node_types | Walk the AST and count how many of each node type appear; return an association list |
| 3 | string_of_op / string_of_uop / expr_to_string / pp_stmt / pp_stmts / print_tree | Pretty-print the AST as human-readable pseudo-code |
Run the executable
dune exec modules/module2-ast/exercises/ast-structure-mapping/starter/ast_visualizer.exe
There are no automated tests for this exercise. Verify your output visually
against the comments in the starter file. The executable runs your three
visualizers on three sample programs (simple_arithmetic, branching,
loop_example).
Hints
dump_ast: Use theindent depthhelper to build the padding string. For compound nodes likeBinOp, print the parent at the current depth, then recursively print children atdepth + 1.count_node_types: Thread an accumulator(string * int) listthrough the recursion. Theinchelper should increment a key if present, or add(key, 1)if not.print_tree: This is the most substantial function. Pattern-match on each statement variant and format it like readable pseudo-code (see the comments in the file for the expected format of each case).
4. Exercise 2: Traversal Algorithms (27 tests)
Goal: Implement three classic tree traversal strategies and two visitor-style operations on the AST.
Time: ~30 minutes
Files to edit:
exercises/traversal-algorithms/starter/traversals.ml-- pre-order, post-order, and BFS traversalsexercises/traversal-algorithms/starter/visitor.ml--count_nodesandevaluate
Dependencies: shared_ast
The three traversals
Each traversal walks a stmt list and collects a string label for every node
visited. Labels look like:
- Statements:
"Assign","If","While","Return","Print","Block" - Expressions:
"IntLit(3)","BoolLit(true)","Var(x)","BinOp(+)","UnaryOp(-)","Call(f)"
Example: Assign("x", BinOp(Add, IntLit 1, IntLit 2))
Pre-order: ["Assign"; "BinOp(+)"; "IntLit(1)"; "IntLit(2)"]
Visit node FIRST, then children left-to-right.
Post-order: ["IntLit(1)"; "IntLit(2)"; "BinOp(+)"; "Assign"]
Visit children FIRST, then the node itself.
BFS: ["Assign"; "BinOp(+)"; "IntLit(1)"; "IntLit(2)"]
All nodes at depth d before any at depth d+1.
What to implement
In traversals.ml:
| # | Function | Hint |
|---|---|---|
| 1 | label_of_expr e | Pattern match on expr and return the label string, e.g., IntLit n returns "IntLit(" ^ string_of_int n ^ ")" |
| 2 | label_of_stmt s | Pattern match on stmt and return just the constructor name, e.g., Assign _ returns "Assign" |
| 3 | pre_order stmts | Emit current node label, then recurse into children. Use mutual recursion with helpers for expr and stmt |
| 4 | post_order stmts | Recurse into children first, then emit the current node label |
| 5 | bfs stmts | Use the OCaml Queue module. You need a sum type to handle both stmt and expr nodes in the same queue |
In visitor.ml:
| # | Function | Hint |
|---|---|---|
| 6 | count_nodes stmts | Walk the AST and count each constructor name (without parameters). Return an association list like [("Assign", 2); ("BinOp", 3); ...] |
| 7 | evaluate e | Evaluate constant integer expressions. Return Some int for pure arithmetic, None if variables/booleans/calls/comparisons appear. Division by zero returns None |
Run tests
dune runtest modules/module2-ast/exercises/traversal-algorithms/
Starter output: EEEEEEEEEEEEEEEEEEEEEEEEEEE (27 errors -- all TODOs)
Hints
- Pre-order/Post-order: Write mutually recursive helpers
pre_order_expr,pre_order_stmt,pre_order_stmts(and likewise for post-order). For pre-order, cons the label before the recursive results; for post-order, append the label after. - BFS: Define a sum type like
type node = Stmt_node of stmt | Expr_node of expr. Seed the queue with all top-level statements, then dequeue a node, emit its label, and enqueue its children. Repeat until empty. evaluate: UseOption.bindto chain recursive calls. If either operand of aBinOpreturnsNone, the whole expression isNone.
5. Exercise 3: Symbol Table (6 tests)
Goal: Implement a scoped symbol table using a stack of maps. This exercise
is self-contained -- it does not depend on shared_ast.
Time: ~15 minutes
Files to edit:
exercises/symbol-table/starter/symbol_table.ml-- the implementationexercises/symbol-table/starter/symbol_table.mli-- the interface (provided, read-only)
The data structures
(* Each symbol has a name, type, and mutability flag *)
type symbol_info = {
sym_name : string;
sym_type : string;
mutable_flag : bool;
}
(* The table is a scope stack: a list of maps from names to symbol_info.
Head = innermost scope, tail = enclosing scopes. *)
type t = symbol_info StringMap.t list
How scoping works
Global scope: { x -> int }
|
+-- Inner scope: { x -> float, y -> bool }
^
x is shadowed (float wins over int)
y is only visible here
After exit_scope:
Global scope: { x -> int }
y is gone, x is int again
What to implement
| # | Function | Hint |
|---|---|---|
| 1 | create () | Return a list with one empty StringMap.t: [StringMap.empty] |
| 2 | define tbl name info | Add the binding to the head (innermost) map using StringMap.add |
| 3 | lookup tbl name | Search from innermost scope outward; return the first match. Use List.find_map or a recursive scan |
| 4 | enter_scope tbl | Push StringMap.empty onto the front of the list |
| 5 | exit_scope tbl | Pop the head. If only one scope remains (the global scope), return None; otherwise return Some rest |
Run tests
dune runtest modules/module2-ast/exercises/symbol-table/
Starter output: EEEEEE (6 errors -- all TODOs)
Hints
- The entire implementation is a list of
StringMap.tvalues.createreturns[StringMap.empty],enter_scopeconses another empty map, andexit_scopereturns the tail. - For
lookup, iterate through the list of scopes and checkStringMap.find_opt name scopein each. The firstSomeyou find is the answer (innermost scope wins). exit_scopeshould refuse to pop the last remaining scope (the global scope).
6. Exercise 4: AST Transformations (30 tests)
Goal: Implement three pure AST-to-AST transformation passes: constant folding, variable renaming, and dead-code elimination.
Time: ~30 minutes
File to edit:
exercises/ast-transformations/starter/transformations.ml
Dependencies: shared_ast
Also provided:
exercises/ast-transformations/starter/build_sample_ast.ml -- small AST
fragments used by the tests
What to implement
| # | Function | What it does |
|---|---|---|
| 1 | constant_fold expr | Bottom-up: fold sub-expressions first, then if a BinOp has two IntLit (or BoolLit) children, compute the result. Handle arithmetic, comparison, logical, and unary operators |
| 2 | rename_variable old_name new_name stmts | Replace every Var old_name with Var new_name and every Assign(old_name, ...) with Assign(new_name, ...). Recurse into all nested structures |
| 3 | eliminate_dead_code stmts | Two rules: (a) remove all statements after a Return in the same list; (b) replace If(BoolLit true, then_b, _) with Block then_b and If(BoolLit false, _, else_b) with Block else_b. Apply recursively into nested blocks |
Constant folding examples
BinOp(Add, IntLit 2, IntLit 3) --> IntLit 5
BinOp(Mul, IntLit 2, BinOp(Add, IntLit 1, IntLit 3)) --> IntLit 8
BinOp(Add, Var "x", BinOp(Add, IntLit 2, IntLit 3)) --> BinOp(Add, Var "x", IntLit 5)
BinOp(Lt, IntLit 3, IntLit 5) --> BoolLit true
UnaryOp(Neg, IntLit 5) --> IntLit (-5)
UnaryOp(Not, BoolLit true) --> BoolLit false
Dead-code elimination examples
[Return (Some (IntLit 42)); Print [Var "x"]]
--> [Return (Some (IntLit 42))]
[If (BoolLit true, [Assign("x", IntLit 1)], [Assign("x", IntLit 2)])]
--> [Block [Assign("x", IntLit 1)]]
Run tests
dune runtest modules/module2-ast/exercises/ast-transformations/
Starter output: EEEEEEEEEEEEEEEEEEEEEEEEEEEEEE (30 errors -- all TODOs,
displayed as 28 due to terminal line truncation)
Hints
- Constant folding: This is a bottom-up transformation. First recursively
fold both operands, then check whether the result is two literals. Write a
helper
eval_binop : op -> expr -> expr -> exprthat handles all operator cases (Add,Sub,Mul,Div,Lt,Gt,Eq, etc.). Return aBoolLitfor comparison/logical operators and anIntLitfor arithmetic. For logical operators (And,Or), also handleBoolLitoperands. - Variable renaming: Write a helper
rename_exprfor expressions andrename_stmtfor statements. Pattern match each constructor and rebuild it with the renamed parts. Remember to rename both the LHS ofAssignandVarnodes inside expressions. - Dead-code elimination: Process the statement list left-to-right. If you
encounter a
Return, keep it and discard everything after it. ForIfnodes, check the condition -- if it isBoolLit trueorBoolLit false, replace the wholeIfwith aBlockof the live branch. Recurse intoIfbranches,Whilebodies, andBlockcontents.
7. Lab 2: AST Parser and Analyzer (9 tests)
After completing the exercises, tackle the lab. You will build a parser for a small language (MiniLang) using Menhir and implement AST analysis functions.
Location: labs/lab2-ast-parser/
Read the full spec: labs/lab2-ast-parser/README.md
What is provided
| File | Status | Description |
|---|---|---|
starter/lexer.mll | Complete | Tokenizes MiniLang source code (do not modify) |
starter/ast_parser.ml | Complete | Wires the lexer and parser together (do not modify) |
starter/parser.mly | TODO | Menhir grammar -- implement the stmt, expr, and atom rules |
starter/analyzer.ml | TODO | Implement extract_functions, extract_variables, extract_calls |
starter/sample_programs/ | Provided | .mini source files for testing |
What to implement
In parser.mly (grammar rules):
| Rule | Cases to handle |
|---|---|
stmt | Assignment (IDENT = expr ;), If/else, While, Return (with/without expr), Print |
expr | Binary operations with the precedence declared at the top of the file. Use atom for leaf expressions |
atom | Integer/boolean literals, variables, parenthesized expressions, unary -/!, function calls |
In analyzer.ml:
| Function | What it does |
|---|---|
extract_functions prog | Return a list of all function names in the program |
extract_variables stmts | Return all variable names from Assign LHS (including nested blocks) |
extract_calls stmts | Return all function names from Call expressions (including nested blocks) |
Build and test
# Build
dune build labs/lab2-ast-parser/
# Run tests (9 student-visible tests)
dune runtest labs/lab2-ast-parser/
Starter output: EEEEEEEEE (9 errors -- all TODOs)
Tips
- The
programandfunc_defrules inparser.mlyare provided as examples. Study them to understand the Menhir syntax before writing your own rules. - Use
%prec UMINUSand%prec UNOTannotations for unary operators to resolve precedence. - The precedence/associativity declarations are already set up at the top of the
file -- your
exprrule just needs to list the binary operation patterns. - Start with
atom(simplest), thenexpr, thenstmt.
8. Troubleshooting
Build errors
| Error | Fix |
|---|---|
ocaml: command not found | Run eval $(opam env) or add it to your ~/.bashrc/~/.zshrc |
dune: command not found | opam install dune && eval $(opam env) |
Error: Unbound module OUnit2 | opam install ounit2 |
Error: Unbound module Shared_ast | Run dune build from the repo root first (not from inside an exercise) |
Error: Unbound module Traversal_exercises | Same -- dune build from the repo root |
Parser warnings from labs/lab2-ast-parser/ | Expected -- the parser starter has unused tokens until you complete the grammar |
Test errors
| Symptom | Meaning |
|---|---|
EEEEEE | Every test errors -- functions still have failwith "TODO" |
..EEEE | First 2 tests pass, rest still TODO |
..F.EE | 3rd test fails (wrong answer) -- read the error message for expected vs actual |
Fatal error: exception Failure("TODO: ...") | The first function called is still a TODO -- implement it first |
Common OCaml mistakes
| Mistake | Fix |
|---|---|
match not exhaustive | Add all cases for the AST variants (every expr and stmt constructor) |
| Wrong traversal order | Double-check: pre-order visits the node before children; post-order visits after |
| BFS does not differ from pre-order | BFS uses a queue, not a stack. Make sure you enqueue children and dequeue level-by-level |
count_nodes counts are wrong | Make sure you count the node and recurse into its children -- do not forget both steps |
evaluate hits Not_found | Use Option.bind to propagate None from sub-expressions instead of unwrapping with Option.get |
exit_scope crashes | Return None when there is only one scope left, not an exception |
| Renaming misses some occurrences | Check both Var references in expressions and the LHS of Assign statements |
| Dead-code elimination not recursive | Apply the rules inside If branches, While bodies, and Block contents, not just at the top level |
Running tests from the right directory
Always run dune commands from the repository root, not from inside an
exercise directory:
# CORRECT -- from repo root:
dune runtest modules/module2-ast/exercises/traversal-algorithms/
# WRONG -- from inside the exercise (will fail to find shared libraries):
cd modules/module2-ast/exercises/traversal-algorithms/
dune runtest # ERROR: can't find shared_ast
9. Exercise Progression Cheat Sheet
Exercise 1: AST Structure Mapping <-- warm-up: visualize ASTs, no tests
|
Exercise 2: Traversal Algorithms <-- walk the tree three ways + visitor pattern
|
Exercise 3: Symbol Table <-- scoped lookups with a stack of maps
|
Exercise 4: AST Transformations <-- rewrite the tree: fold, rename, eliminate dead code
|
Lab 2: AST Parser and Analyzer <-- parse source code into an AST with Menhir
| Exercise | Tests | Key concepts |
|---|---|---|
| 1. AST Structure Mapping | 0 (run executable) | AST construction, indentation, pretty-printing |
| 2. Traversal Algorithms | 27 | Pre-order, post-order, BFS, visitor pattern, expression evaluation |
| 3. Symbol Table | 6 | Scope stacks, lexical scoping, shadowing |
| 4. AST Transformations | 30 | Constant folding, variable renaming, dead-code elimination |
| Exercise total | 63 | |
| Lab 2: AST Parser | 9 | Menhir grammars, lexing, parsing, AST analysis |
| Grand total | 72 |
Good luck!