OCaml Warm-Up

Module 0 — Program Analysis Bootcamp

Getting ready for program analysis with OCaml

5 exercises • ~2 hours • No tests — guided tutorials

Learning Objectives

By the end of this module, you will be able to:

  1. Write OCaml functions using let bindings, type annotations, pattern matching, and recursion
  2. Define and manipulate algebraic data types (ADTs) representing expression trees
  3. Use collection types: List.map/fold, StringMap, StringSet, and ref
  4. Build modules satisfying a signature and use functors to parameterize code
  5. Read and extend ocamllex/Menhir grammar rules for a simple parser
Why these five? Each exercise directly foreshadows a concept you will use in Modules 2-6: AST types, dataflow sets, abstract domains, and parser grammars.

Why OCaml for Program Analysis?

Pattern Matching

Match on AST node types directly. The compiler warns you if you forget a case.

Algebraic Data Types

ASTs, lattice values, and analysis results are all naturally expressed as ADTs.

Type Safety

Strong static types catch bugs at compile time — no null pointer surprises.

Immutability by Default

Functional style means fewer side effects, easier reasoning about program state.

Module System

Signatures + functors let you write generic analyses parameterized by abstract domains.

Tooling

ocamllex and Menhir provide industrial-strength lexer/parser generators.

Industry note: Meta's Infer static analyzer, Jane Street's trading systems, and the Coq proof assistant (now the Rocq Prover) are all written largely in OCaml.

OCaml Basics: Let Bindings & Functions

Top-level bindings

(* Immutable binding *)
let x = 42

(* Function with type annotations *)
let square (n : int) : int = n * n

(* Multiple arguments *)
let add (a : int) (b : int) : int = a + b

Local bindings with let...in

let hypotenuse a b =
  let a2 = a *. a in
  let b2 = b *. b in
  Float.sqrt (a2 +. b2)

If / then / else (expression, not statement)

let abs x =
  if x >= 0 then x else -x

(* Returns a value -- no "return" keyword *)
let classify n =
  if n > 0 then "positive"
  else if n < 0 then "negative"
  else "zero"

Key idea

Everything is an expression that produces a value. There are no statements.

Exercise 1 builds on these: square, is_empty, greet, is_digit, classify_char.
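
To make the "everything is an expression" idea concrete, here is a minimal sketch. The names sign_label and describe are hypothetical and are not the Exercise 1 functions.

```ocaml
(* An if/else produces a value, so it can be bound with let ... in. *)
let sign_label (n : int) : string =
  let label = if n >= 0 then "non-negative" else "negative" in
  label

(* Here the whole if-expression is an operand of the (^) operator. *)
let describe (n : int) : string =
  string_of_int n ^ " is " ^ (if n mod 2 = 0 then "even" else "odd")

let () = print_endline (describe 7)   (* prints: 7 is odd *)
```

Because both branches must produce a value of the same type, an if without an else is only allowed when the then branch has type unit.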

OCaml Basics: Tuples and Records

Tuples — lightweight grouping

(* A position is (line, column) *)
type pos = int * int

let origin : pos = (1, 1)

(* Destructure in function args *)
let format_pos ((line, col) : pos) : string =
  Printf.sprintf "line %d, col %d" line col

(* Pattern match *)
let advance_pos (line, col) c =
  if c = '\n' then (line + 1, 1)
  else (line, col + 1)

Tuples are positional — access by pattern matching, not by name.

Records — named fields

type assignment = {
  var_name : string;
  value    : int;
  line     : int;
}

let a = { var_name = "x"; value = 5; line = 1 }

(* Field access *)
let name = a.var_name

(* Functional update -- creates a NEW record *)
let a' = { a with value = a.value + 3 }

Records are immutable by default. Use { r with field = new_val } to "update".
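
A short sketch of what "creates a NEW record" means in practice. The describe helper is a hypothetical name added for illustration.

```ocaml
type assignment = { var_name : string; value : int; line : int }

let a = { var_name = "x"; value = 5; line = 1 }
let a' = { a with value = a.value + 3 }

(* Records can be destructured in a pattern; _ ignores the other fields. *)
let describe { var_name; value; _ } =
  Printf.sprintf "%s = %d" var_name value

let () =
  print_endline (describe a);    (* prints: x = 5, the original is unchanged *)
  print_endline (describe a')    (* prints: x = 8 *)
```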

OCaml Basics: Printf and String Formatting

String concatenation

let greeting = "Hello, " ^ "world!"       (* "Hello, world!" *)
let msg = "x = " ^ string_of_int 42       (* "x = 42" *)

Printf — type-safe formatted output

(* Print to stdout; wrapped in let () so it is valid at top level in a file *)
let () = Printf.printf "name = %s, age = %d\n" "Alice" 30

(* Format to a string *)
let s = Printf.sprintf "[%s: %s]" "keyword" "if"

(* Common format specifiers *)
(*  %d  int          %s  string       %b  bool
    %f  float        %c  char         %B  bool (true/false)  *)

Type safety in action

(* This is a compile error -- OCaml checks format types! *)
Printf.printf "%d" "not an int"
(* Error: This expression has type string but ... expected int *)
Contrast with C: OCaml's Printf is checked at compile time. No %s-on-an-int crashes.
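
As a small illustration of the specifiers above, the following sketch formats three differently typed values into one string. The function name show_binding is hypothetical.

```ocaml
(* sprintf returns the formatted string instead of printing it.
   Each specifier fixes the type of the matching argument. *)
let show_binding (name : string) (v : int) (live : bool) : string =
  Printf.sprintf "%s = %d (live: %b)" name v live

let () = print_endline (show_binding "x" 42 true)
(* prints: x = 42 (live: true) *)
```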

Algebraic Data Types (ADTs)

ADTs let you define types with multiple variants, each carrying different data. They are the backbone of ASTs in program analysis.

Defining variants

(* Binary operators *)
type op = Add | Sub | Mul

(* Expression tree -- a mini AST *)
type expr =
  | Num of int
  | Var of string
  | BinOp of op * expr * expr

Each variant is a constructor that tags the data it carries.

Building values

(* 2 + 3 *)
let e1 = BinOp (Add, Num 2, Num 3)

(* x * (1 + y) *)
let e2 = BinOp (Mul, Var "x",
            BinOp (Add, Num 1, Var "y"))

        BinOp(Mul)
        /        \
   Var "x"     BinOp(Add)
               /        \
            Num 1     Var "y"

Foreshadow: Module 2's Shared_ast.Ast_types defines expr, stmt, func_def, and program using exactly this pattern.

Pattern Matching

match...with is OCaml's most powerful control structure. It destructures values and the compiler ensures you handle every case.

Basic matching

let string_of_op o =
  match o with
  | Add -> "+"
  | Sub -> "-"
  | Mul -> "*"

Recursive matching on trees

let rec string_of_expr e =
  match e with
  | Num n -> string_of_int n
  | Var x -> x
  | BinOp (o, l, r) ->
    Printf.sprintf "(%s %s %s)"
      (string_of_expr l)
      (string_of_op o)
      (string_of_expr r)

Exhaustiveness checking

(* If you forget a case: *)
let bad o = match o with
  | Add -> "+"
  | Sub -> "-"
(* Warning 8: this pattern-matching
   is not exhaustive.
   Here is an example of a case
   that is not matched: Mul *)
This is critical for analysis. When you add a new AST node type, the compiler tells you every function that needs updating.

Matching on tuples

let classify (x, y) = match (x, y) with
  | (0, 0) -> "origin"
  | (0, _) -> "y-axis"
  | (_, 0) -> "x-axis"
  | _      -> "other"
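
The pieces above can be combined into one self-contained sketch. It repeats the type and printer definitions from the slides and adds vars_of, a hypothetical helper that is not part of the course code, to show a second recursive match on the same tree.

```ocaml
type op = Add | Sub | Mul
type expr = Num of int | Var of string | BinOp of op * expr * expr

let string_of_op = function Add -> "+" | Sub -> "-" | Mul -> "*"

let rec string_of_expr = function
  | Num n -> string_of_int n
  | Var x -> x
  | BinOp (o, l, r) ->
    Printf.sprintf "(%s %s %s)"
      (string_of_expr l) (string_of_op o) (string_of_expr r)

(* Hypothetical helper: list every variable occurrence, left to right. *)
let rec vars_of = function
  | Num _ -> []
  | Var x -> [x]
  | BinOp (_, l, r) -> vars_of l @ vars_of r

let e2 = BinOp (Mul, Var "x", BinOp (Add, Num 1, Var "y"))

let () = print_endline (string_of_expr e2)   (* prints: (x * (1 + y)) *)
```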

Recursion and the Option Type

Recursive functions with let rec

(* Count nodes in an expression tree *)
let rec count_nodes e =
  match e with
  | Num _ | Var _ -> 1
  | BinOp (_, l, r) ->
    1 + count_nodes l + count_nodes r

(* Tree depth *)
let rec depth e =
  match e with
  | Num _ | Var _ -> 1
  | BinOp (_, l, r) ->
    1 + max (depth l) (depth r)

Option: safe "nullable" values

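(* Option is built in; its definition is shown for reference only:
   type 'a option = None | Some of 'a *)

(* Apply a binary operator to two known ints *)
let apply_op o a b =
  match o with
  | Add -> a + b
  | Sub -> a - b
  | Mul -> a * b

(* Evaluate if no variables present *)
let rec eval e =
  match e with
  | Num n -> Some n
  | Var _ -> None  (* can't evaluate *)
  | BinOp (o, l, r) ->
    match eval l, eval r with
    | Some a, Some b ->
      Some (apply_op o a b)
    | _ -> None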
Foreshadow: "We might not know the exact value" is the norm in abstract interpretation (Module 4). Option is a tiny abstract domain: Some n = known, None = unknown.
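
A sketch of how a caller consumes the result of eval. The block defines apply_op itself so that it is self-contained, and show_result is a hypothetical name.

```ocaml
type op = Add | Sub | Mul
type expr = Num of int | Var of string | BinOp of op * expr * expr

(* Defined here so the block is self-contained. *)
let apply_op o a b =
  match o with Add -> a + b | Sub -> a - b | Mul -> a * b

let rec eval e =
  match e with
  | Num n -> Some n
  | Var _ -> None
  | BinOp (o, l, r) ->
    (match eval l, eval r with
     | Some a, Some b -> Some (apply_op o a b)
     | _ -> None)

(* The caller has to handle both cases before it can use the int. *)
let show_result e =
  match eval e with
  | Some n -> string_of_int n
  | None -> "unknown"

let () =
  print_endline (show_result (BinOp (Add, Num 2, Num 3)));     (* prints: 5 *)
  print_endline (show_result (BinOp (Add, Num 2, Var "x")))    (* prints: unknown *)
```

When a default value is acceptable, Option.value ~default:0 (eval e) is a shorter alternative to the explicit match.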

Expression Trees as ADTs

Tree transformations are the core mechanic of program analysis. You will do this constantly in Modules 2-6.

Substitution — replacing variables with values

(* substitute "x" 5 (x * (1 + y))  -->  (5 * (1 + y)) *)
let rec substitute var_name value e =
  match e with
  | Num _ -> e
  | Var x -> if x = var_name then Num value else e
  | BinOp (o, l, r) ->
    BinOp (o, substitute var_name value l,
               substitute var_name value r)

Constant folding — simplifying known sub-expressions

(* simplify (10 + 20) - 5  -->  25 *)
let rec simplify e =
  match e with
  | Num _ | Var _ -> e
  | BinOp (o, l, r) ->
    match simplify l, simplify r with
    | Num a, Num b -> Num (apply_op o a b)  (* fold! *)
    | l', r' -> BinOp (o, l', r')           (* can't fold *)
Foreshadow: Constant folding is a real compiler optimization. Module 4's abstract interpretation generalizes this: instead of exact values, you track sign, range, or taint.
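
The two transformations compose: substituting known values and then simplifying acts as a small partial evaluator. The sketch below repeats the definitions so it runs on its own; apply_op is defined locally for that reason.

```ocaml
type op = Add | Sub | Mul
type expr = Num of int | Var of string | BinOp of op * expr * expr

(* Defined here so the block is self-contained. *)
let apply_op o a b =
  match o with Add -> a + b | Sub -> a - b | Mul -> a * b

let rec substitute var_name value e =
  match e with
  | Num _ -> e
  | Var x -> if x = var_name then Num value else e
  | BinOp (o, l, r) ->
    BinOp (o, substitute var_name value l, substitute var_name value r)

let rec simplify e =
  match e with
  | Num _ | Var _ -> e
  | BinOp (o, l, r) ->
    (match simplify l, simplify r with
     | Num a, Num b -> Num (apply_op o a b)
     | l', r' -> BinOp (o, l', r'))

(* The tree for x * (1 + y) *)
let e = BinOp (Mul, Var "x", BinOp (Add, Num 1, Var "y"))

(* Only x is known: the right operand still contains y, so nothing folds. *)
let partial = simplify (substitute "x" 5 e)

(* Both variables are known: the whole tree folds to Num 15. *)
let full = simplify (substitute "y" 2 (substitute "x" 5 e))
```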

Lists and Higher-Order Functions

List basics

(* Immutable linked lists *)
let xs = [1; 2; 3; 4; 5]
let ys = 0 :: xs  (* prepend: [0;1;2;3;4;5] *)

(* Pattern match on lists *)
let rec length = function
  | [] -> 0
  | _ :: rest -> 1 + length rest

List.map — transform each element

let double_all xs = List.map (fun x -> x * 2) xs
(* double_all [1;2;3] = [2;4;6] *)

List.filter — keep matching elements

let keep_positive xs =
  List.filter (fun x -> x > 0) xs
(* keep_positive [-1;3;0;5] = [3;5] *)

List.fold_left — reduce to a single value

let sum xs =
  List.fold_left (fun acc x -> acc + x) 0 xs
(* sum [1;2;3;4] = 10 *)

List.fold_left f init [a; b; c]
  Step 1: f init a  --> r1
  Step 2: f r1 b    --> r2
  Step 3: f r2 c    --> result

Why this matters: You will use fold_left everywhere — building environments from lists of assignments, accumulating analysis results, computing fixpoints.
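
Two further folds, sketched with hypothetical names (rev and max_opt), show that the accumulator does not have to be a number: it can be a list or an option.

```ocaml
(* Reverse a list: the accumulator is the reversed prefix seen so far. *)
let rev xs = List.fold_left (fun acc x -> x :: acc) [] xs

(* Largest element seen so far; None for an empty list. *)
let max_opt xs =
  List.fold_left
    (fun acc x ->
       match acc with
       | None -> Some x
       | Some m -> Some (max m x))
    None xs

let () =
  List.iter (fun x -> Printf.printf "%d " x) (rev [1; 2; 3]);   (* prints: 3 2 1 *)
  print_newline ()
```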

Collections: Map and Set

OCaml's standard library provides immutable, balanced-tree-backed Map and Set via functors.

StringMap — variable environments

module StringMap = Map.Make(String)

(* Build from a list of pairs *)
let build_env pairs =
  List.fold_left
    (fun env (k, v) -> StringMap.add k v env)
    StringMap.empty
    pairs

(* Lookup with Option *)
let lookup env name =
  StringMap.find_opt name env

(* Get all keys *)
let all_vars env =
  List.map fst (StringMap.bindings env)

StringSet — variable sets

module StringSet = Set.Make(String)

let s1 = StringSet.of_list ["x"; "y"; "z"]
let s2 = StringSet.of_list ["y"; "z"; "w"]

(* Set operations *)
let union = StringSet.union s1 s2
let inter = StringSet.inter s1 s2
let diff  = StringSet.diff s1 s2

(* Convert to a sorted list *)
let common = StringSet.elements inter  (* ["y"; "z"] *)
Foreshadow: Modules 3-5 use StringSet for live-variable sets, reaching-definition sets, and taint sets. Map stores variable-to-abstract-value bindings.
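
As a preview of how these set operations are typically used, here is a sketch of a live-variable style transfer function for a single statement. The name live_in and its labelled arguments are assumptions for illustration, not the bootcamp's API.

```ocaml
module StringSet = Set.Make (String)

(* Hypothetical transfer function for one statement:
   live_in = use union (live_out minus def) *)
let live_in ~use ~def live_out =
  StringSet.union use (StringSet.diff live_out def)

(* The statement x = y + z uses y and z and defines x.
   Assume x and w are live after it. *)
let result =
  live_in
    ~use:(StringSet.of_list ["y"; "z"])
    ~def:(StringSet.singleton "x")
    (StringSet.of_list ["x"; "w"])

let () = print_endline (String.concat ", " (StringSet.elements result))
(* prints: w, y, z *)
```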

Records and Mutable State

Records in practice

type assignment = {
  var_name : string;
  value    : int;
  line     : int;
}

let a = { var_name="x"; value=5; line=1 }

(* Format for display *)
let format_assign a =
  Printf.sprintf "%s = %d (line %d)"
    a.var_name a.value a.line

(* Functional update *)
let increment_value a n =
  { a with value = a.value + n }

Mutable state with ref

(* ref creates a mutable cell *)
let counter = ref 0

(* Read with ! *)
let current = !counter     (* 0 *)

(* Write with :=; wrapped in let () so it is valid at top level in a file *)
let () = counter := !counter + 1    (* now 1 *)

(* Closure over a ref -- a counter factory *)
let make_counter () =
  let n = ref 0 in
  fun () ->
    let v = !n in
    n := v + 1;
    v

let next = make_counter ()
(* next() = 0, next() = 1, next() = 2 *)
Use sparingly. You will see ref in fixpoint loops (Modules 3-4) where a worklist updates until convergence.
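
A minimal sketch of the kind of loop mentioned above, assuming a step function that eventually stops changing its input. The name fixpoint is hypothetical, and real worklist algorithms are more selective about what they recompute.

```ocaml
(* Apply f repeatedly until the value stops changing.
   This terminates only if f eventually reaches a fixed point. *)
let fixpoint (f : 'a -> 'a) (init : 'a) : 'a =
  let current = ref init in
  let changed = ref true in
  while !changed do
    let next = f !current in
    changed := next <> !current;
    current := next
  done;
  !current

(* Toy step function: grow by 2, capped at 10. *)
let () = Printf.printf "%d\n" (fixpoint (fun n -> min 10 (n + 2)) 0)
(* prints: 10 *)
```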

Module System: Signatures and Structures

OCaml modules group related types, values, and functions. Signatures describe the interface; structures provide the implementation.

Signature (module type)

module type LATTICE = sig
  type t
  val bottom : t
  val top : t
  val join : t -> t -> t
  val equal : t -> t -> bool
  val to_string : t -> string
end

The signature says what exists. The type t is abstract — callers cannot see its representation.

Structure (module)

module BoolLattice : LATTICE
  with type t = bool
= struct
  type t = bool
  let bottom = false
  let top = true
  let join a b = a || b
  let equal a b = (a = b)
  let to_string b =
    if b then "true" else "false"
end

The with type t = bool constraint exposes the representation of t, so callers can pass true/false directly. Without it, t stays abstract outside the module.
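
For contrast, the following sketch seals the same structure without the type constraint. The module name Opaque is hypothetical. Callers can then only combine values that the module itself provides.

```ocaml
module type LATTICE = sig
  type t
  val bottom : t
  val top : t
  val join : t -> t -> t
  val equal : t -> t -> bool
  val to_string : t -> string
end

(* Sealed with plain LATTICE: the type t stays abstract. *)
module Opaque : LATTICE = struct
  type t = bool
  let bottom = false
  let top = true
  let join a b = a || b
  let equal a b = (a = b)
  let to_string b = if b then "true" else "false"
end

(* Accepted: only values produced by the module are used. *)
let s = Opaque.to_string (Opaque.join Opaque.bottom Opaque.top)

(* Rejected by the type checker, since bool is not known to equal Opaque.t:
   let _ = Opaque.join true false *)

let () = print_endline s   (* prints: true *)
```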

Functors: Parameterized Modules

A functor is a function from modules to modules. It lets you write generic code parameterized by an interface.

(* MakeEnv takes any LATTICE and produces an environment module *)
module MakeEnv (L : LATTICE) = struct
  module M = Map.Make(String)
  type t = L.t M.t                  (* map from string to L.t *)

  let empty = M.empty

  let lookup env x =
    match M.find_opt x env with
    | Some v -> v
    | None   -> L.bottom            (* missing = bottom *)

  let update env x v = M.add x v env

  let join env1 env2 =
    M.union (fun _k v1 v2 -> Some (L.join v1 v2)) env1 env2
end

Instantiation

module Env = MakeEnv(ThreeValueLattice)   (* concrete environment *)
let env = Env.update Env.empty "x" Zero   (* use it! *)
Foreshadow: This is exactly the pattern in lib/abstract_domains/abstract_env.ml. In Modules 3-4, you will plug in sign domains, interval domains, and taint domains as the LATTICE parameter.
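
A self-contained sketch of the functor in use, instantiated with the BoolLattice from the earlier slide so that it does not depend on definitions that come later. BoolEnv and the sample bindings are illustrative names.

```ocaml
module type LATTICE = sig
  type t
  val bottom : t
  val top : t
  val join : t -> t -> t
  val equal : t -> t -> bool
  val to_string : t -> string
end

module BoolLattice : LATTICE with type t = bool = struct
  type t = bool
  let bottom = false
  let top = true
  let join a b = a || b
  let equal a b = (a = b)
  let to_string b = if b then "true" else "false"
end

module MakeEnv (L : LATTICE) = struct
  module M = Map.Make (String)
  type t = L.t M.t
  let empty : t = M.empty
  let lookup env x =
    match M.find_opt x env with Some v -> v | None -> L.bottom
  let update env x v = M.add x v env
  (* Keys present in both maps are merged with L.join. *)
  let join env1 env2 =
    M.union (fun _k v1 v2 -> Some (L.join v1 v2)) env1 env2
end

module BoolEnv = MakeEnv (BoolLattice)

let env1 = BoolEnv.update (BoolEnv.update BoolEnv.empty "x" false) "y" true
let env2 = BoolEnv.update BoolEnv.empty "x" true
let merged = BoolEnv.join env1 env2

let () =
  Printf.printf "x=%b y=%b z=%b\n"
    (BoolEnv.lookup merged "x")    (* join false true = true *)
    (BoolEnv.lookup merged "y")    (* only in env1 *)
    (BoolEnv.lookup merged "z")    (* missing, so bottom = false *)
```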

Building a Simple Lattice Module

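A lattice is a partially ordered set in which any two elements have a join (least upper bound) and a meet (greatest lower bound). The lattices used in this bootcamp also have a least element (bottom) and a greatest element (top). The LATTICE signature exposes only bottom, top, and join.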

      Unknown (top)
       /       \
    Zero     Positive
       \       /
      Bot (bottom)

ThreeValueLattice

type three_value =
  | Bot | Zero | Positive | Unknown

module ThreeValueLattice
  : LATTICE with type t = three_value
= struct
  type t = three_value
  let bottom = Bot
  let top = Unknown
  let join a b =
    if a = b then a
    else if a = Bot then b
    else if b = Bot then a
    else Unknown
  let equal a b = (a = b)
  let to_string = function
    | Bot -> "Bot" | Zero -> "Zero"
    | Positive -> "Positive"
    | Unknown -> "Unknown"
end

Why lattices?

Every abstract domain in program analysis forms a lattice:

  • Bottom = no information / unreachable
  • Top = could be anything / no precision
  • Join = merge information from two paths

if (cond) {
  x = 0;     // x -> Zero
} else {
  x = 5;     // x -> Positive
}
// x -> join(Zero, Positive) = Unknown

Foreshadow: Module 3 (reaching definitions) and Module 4 (sign analysis) are both built on this exact lattice + functor pattern.
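
Because this lattice has only four elements, the expected algebraic properties of join can be checked exhaustively. The sketch below copies the join from ThreeValueLattice; the checking helpers are hypothetical names.

```ocaml
type three_value = Bot | Zero | Positive | Unknown

let join a b =
  if a = b then a
  else if a = Bot then b
  else if b = Bot then a
  else Unknown

let all = [Bot; Zero; Positive; Unknown]

(* Check a binary property for every pair of elements. *)
let for_all_pairs p =
  List.for_all (fun a -> List.for_all (fun b -> p a b) all) all

let commutative = for_all_pairs (fun a b -> join a b = join b a)
let idempotent = List.for_all (fun a -> join a a = a) all
let bottom_is_identity = List.for_all (fun a -> join Bot a = a) all
let top_absorbs = List.for_all (fun a -> join Unknown a = Unknown) all
let associative =
  for_all_pairs (fun a b ->
    List.for_all (fun c -> join (join a b) c = join a (join b c)) all)

let () =
  Printf.printf "%b %b %b %b %b\n"
    commutative idempotent bottom_is_identity top_absorbs associative
```

These properties are what make it safe to merge information from several paths in any order.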

Parsing with ocamllex and Menhir

A parser turns source text into an AST. OCaml provides two tools that work together:

               ocamllex                   Menhir
Source text  --------->  Token stream  --------->  AST

"3 + x * 2"              [INT 3; PLUS;             BinOp(Add, Num 3,
                          IDENT "x";                 BinOp(Mul, Var "x",
                          STAR; INT 2]                 Num 2))

ocamllex — lexer (.mll files)

let digit = ['0'-'9']
let alpha = ['a'-'z' 'A'-'Z' '_']

rule token = parse
  | [' ' '\t' '\n']+  { token lexbuf }
  | '+'               { PLUS }
  | '-'               { MINUS }
  | digit+ as n       { INT (int_of_string n) }
  | alpha+ as id      { IDENT id }
  | eof               { EOF }

Menhir — parser (.mly files)

%token <int> INT
%token <string> IDENT
%token PLUS MINUS STAR SLASH
%token EOF
%left PLUS MINUS    (* precedence: lowest first *)
%left STAR SLASH
%start <expr> program   (* assumes the expr type is in scope *)

%%
program: e=expr EOF  { e } ;

expr:
  | e1=expr PLUS e2=expr
      { BinOp(Add, e1, e2) }
  | a=atom  { a } ;

atom:
  | n=INT   { Num n }
  | id=IDENT { Var id } ;
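
To show what the lexer stage produces without running ocamllex, here is a hand-written tokenizer in plain OCaml. It is an illustrative stand-in and not the code that ocamllex generates. The token type is declared locally, and tokenize, scan_while, is_digit, and is_alpha are hypothetical names.

```ocaml
type token = INT of int | IDENT of string | PLUS | MINUS | STAR | EOF

let is_digit c = c >= '0' && c <= '9'
let is_alpha c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c = '_'

(* Return the first index at or after i where p stops holding. *)
let rec scan_while p s i =
  if i < String.length s && p s.[i] then scan_while p s (i + 1) else i

let tokenize (s : string) : token list =
  let rec go i acc =
    if i >= String.length s then List.rev (EOF :: acc)
    else
      match s.[i] with
      | ' ' | '\t' | '\n' -> go (i + 1) acc          (* skip whitespace *)
      | '+' -> go (i + 1) (PLUS :: acc)
      | '-' -> go (i + 1) (MINUS :: acc)
      | '*' -> go (i + 1) (STAR :: acc)
      | c when is_digit c ->
        let j = scan_while is_digit s i in
        go j (INT (int_of_string (String.sub s i (j - i))) :: acc)
      | c when is_alpha c ->
        let j = scan_while is_alpha s i in
        go j (IDENT (String.sub s i (j - i)) :: acc)
      | c -> failwith (Printf.sprintf "unexpected character %c" c)
  in
  go 0 []

let tokens = tokenize "3 + x * 2"
(* [INT 3; PLUS; IDENT "x"; STAR; INT 2; EOF] *)
```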

Menhir: Precedence and Associativity

Menhir resolves ambiguity in grammars using precedence and associativity declarations.

The ambiguity problem

How should 3 + 4 * 2 be parsed?

Option A:                 Option B:
BinOp(Mul,                BinOp(Add,
  BinOp(Add, Num 3,         Num 3,
             Num 4),        BinOp(Mul, Num 4,
  Num 2)                               Num 2))
= (3+4)*2 = 14            = 3+(4*2) = 11

We want Option B (standard math precedence).

Declarations (lowest to highest)

%left PLUS MINUS       (* lowest *)
%left STAR SLASH       (* higher *)
%nonassoc UMINUS       (* highest *)
  • %left — left-associative: a-b-c = (a-b)-c
  • %right — right-associative
  • %nonassoc — cannot chain

Unary minus trick

atom:
  | MINUS a=atom %prec UMINUS
      { Neg a }
  ;

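%prec UMINUS gives this production the precedence of UMINUS. Because UMINUS is declared last, it has the highest precedence, so unary minus binds tighter than any binary operator.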
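
To see the tree shapes that these declarations are meant to produce, the sketch below uses precedence climbing, a hand-written technique. This is not how Menhir works internally (Menhir resolves conflicts in an LR automaton), but the resulting trees follow the same precedence and left-associativity rules. All names are hypothetical, and the parser works on a token list rather than a lexbuf.

```ocaml
type op = Add | Sub | Mul
type expr = Num of int | Var of string | BinOp of op * expr * expr
type token = INT of int | IDENT of string | PLUS | MINUS | STAR | EOF

(* Precedence level of each binary operator token: higher binds tighter. *)
let prec_of = function
  | PLUS -> Some (1, Add)
  | MINUS -> Some (1, Sub)
  | STAR -> Some (2, Mul)
  | _ -> None

let parse_atom = function
  | INT n :: rest -> (Num n, rest)
  | IDENT x :: rest -> (Var x, rest)
  | _ -> failwith "expected an atom"

(* Parse operators whose precedence is at least min_prec. Recursing with
   p + 1 on the right-hand side makes every operator left-associative. *)
let rec parse_expr min_prec tokens =
  let lhs, rest = parse_atom tokens in
  climb min_prec lhs rest

and climb min_prec lhs tokens =
  match tokens with
  | t :: rest ->
    (match prec_of t with
     | Some (p, o) when p >= min_prec ->
       let rhs, rest' = parse_expr (p + 1) rest in
       climb min_prec (BinOp (o, lhs, rhs)) rest'
     | _ -> (lhs, tokens))
  | [] -> (lhs, tokens)

let parse tokens = fst (parse_expr 1 tokens)

(* 3 + 4 * 2 parses as Option B. *)
let t1 = parse [INT 3; PLUS; INT 4; STAR; INT 2; EOF]

(* a - b - c parses as (a - b) - c. *)
let t2 = parse [IDENT "a"; MINUS; IDENT "b"; MINUS; IDENT "c"; EOF]
```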

Hands-On Exercises Overview

# | Exercise                                        | Time    | Key Concepts                                        | Foreshadows
--+-------------------------------------------------+---------+-----------------------------------------------------+---------------------
1 | OCaml Basics: "Token Classifier"                | ~20 min | let, functions, tuples, Printf, char classification | Lexer helpers
2 | Types and Recursion: "Mini Expression Tree"     | ~25 min | ADTs, pattern matching, Option, recursive tree ops  | shared_ast expr type
3 | Collections and Records: "Variable Tracker"     | ~25 min | List.map/fold, StringMap, StringSet, ref            | Dataflow analysis
4 | Modules and Functors: "Analysis Domain Builder" | ~25 min | Signatures, structs, functors, LATTICE              | abstract_domains
5 | Calculator Parser                               | ~25 min | ocamllex, Menhir grammar rules                      | Lab 2 parser

How to work: Fill in (* EXERCISE: ... *) stubs, run with dune exec, and compare output against the STUDENT_README. No OUnit2 tests — just guided tutorials.

How Module 0 Connects to the Bootcamp

  Module 0        Module 1        Module 2         Module 3
  OCaml Warm-Up   Foundations     AST & CFG        Dataflow Analysis
+-----------+   +-----------+   +------------+   +----------------+
| let, ADTs |   | What is   |   | shared_ast |   | Reaching defs  |
| match     |-->| program   |-->| expr, stmt |-->| Live variables |
| Map, Set  |   | analysis? |   | CFG build  |   | Fixpoint loops |
| Functors  |   | Soundness |   | Visitors   |   | Worklist algo  |
| Parsing   |   | Lattices  |   | Lab 2      |   | Lab 3          |
+-----------+   +-----------+   +------------+   +----------------+
                                                         |
                                                         v
                              Module 5             Module 4
                              Security Analysis    Abstract Interp.
                            +----------------+   +----------------+
                            | Taint analysis |   | Sign domain    |
                            | Source/sink    |<--| Interval domain|
                            | Sanitizers     |   | MakeEnv functor|
                            | Lab 5          |   | Lab 4          |
                            +----------------+   +----------------+

Exercise 2 (ADTs) directly previews Shared_ast.Ast_types from Module 2.
Exercise 4 (Functors) directly previews Abstract_domains.Abstract_env.MakeEnv from Module 4.

Key Takeaways

Language Fundamentals

  • Everything is an expression — no statements, no null, no void
  • Immutable by default — use ref only when needed
  • Pattern matching is your primary control flow tool
  • Type inference means annotations are optional but helpful

Data Structures

  • ADTs for ASTs and abstract domain values
  • StringMap for variable environments
  • StringSet for tracking variable sets
  • Records for structured data with named fields

Module System

  • Signatures define interfaces (like Java interfaces)
  • Structures provide implementations
  • Functors parameterize modules over other modules
  • The LATTICE + MakeEnv pattern recurs throughout the bootcamp

Parsing

  • ocamllex for lexing (regex-based tokenization)
  • Menhir for parsing (grammar rules producing AST nodes)
  • Precedence declarations resolve ambiguity

Next: Module 1 — Foundations

Now that you are comfortable with OCaml, Module 1 introduces the theory behind program analysis.

What you will learn

  • What is program analysis and why do we need it?
  • Static vs. dynamic analysis trade-offs
  • Soundness and completeness
  • The lattice-theoretic foundation of abstract interpretation
  • Fixpoint computation and the widening operator

How Module 0 prepared you

Module 0 Concept        | Module 1+ Usage
------------------------+------------------------
ADTs + pattern matching | AST traversal
Map + Set               | Dataflow facts
LATTICE signature       | Abstract domains
MakeEnv functor         | Abstract environments
Menhir parser           | Lab 2 parser extension

Ready? Complete all 5 exercises, then move on to Module 1. If you get stuck, check the STUDENT_README for expected output and hints.