Before You Code

Understanding Security Analysis

Before building taint analyzers in OCaml, let's develop intuition for how taint analysis tracks untrusted data, why sources, sinks, and sanitizers matter, and the subtle difference between explicit and implicit information flows that makes security analysis challenging.

Why Taint Analysis?

Many security vulnerabilities share a common pattern: untrusted data from an external source (user input, database, network) flows into a sensitive operation (SQL query, HTML output, command execution) without being properly sanitized. Taint analysis tracks this flow automatically.

The Classic SQL Injection

What happens when the attacker enters ' OR 1=1 -- as their username?

username = req.body.username   // attacker controls this!
query = "SELECT * FROM users WHERE name='" + username + "'"
db.exec(query)  // 💥 SQL Injection
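
To make the attack concrete, here is a minimal OCaml sketch of the same string concatenation. The `build_query` helper is hypothetical and only mirrors the snippet above; it is not part of any real database API.

```ocaml
(* Hypothetical helper mirroring the vulnerable concatenation above. *)
let build_query username =
  "SELECT * FROM users WHERE name='" ^ username ^ "'"

(* The attacker's input closes the string literal and appends its own condition. *)
let injected = build_query "' OR 1=1 --"

let () = print_endline injected
(* prints: SELECT * FROM users WHERE name='' OR 1=1 --' *)
```

The `OR 1=1` condition is always true, so the query matches every row. In SQL dialects where `--` starts a comment, the trailing quote is ignored as well.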

Sources

Where untrusted data enters: user input, URL params, request body, database reads, environment variables.

Sinks

Where tainted data is dangerous: SQL queries, HTML output, command execution, file paths.

Sanitizers

Functions that clean data: parameterized queries, HTML escaping, input validation. They convert Tainted → Untainted.

The Taint Lattice

Taint analysis uses a simple 4-element lattice. Every variable is tracked as one of: Bot (unreachable), Untainted (safe), Tainted (user-controlled), or Top (might be either).
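
One way to write this lattice down in OCaml is sketched below. It assumes the usual diamond ordering, in which Bot is below everything, Top is above everything, and Untainted and Tainted are incomparable; the exercise may choose a different ordering, so treat the function bodies as an illustration rather than the reference solution.

```ocaml
(* Four-element taint lattice, assuming Bot <= Untainted, Tainted <= Top. *)
type taint = Bot | Untainted | Tainted | Top

(* Partial order: leq a b holds when a is at least as precise as b. *)
let leq a b =
  match a, b with
  | Bot, _ | _, Top -> true
  | Untainted, Untainted | Tainted, Tainted -> true
  | _ -> false

(* Least upper bound: combines facts from merging control-flow paths. *)
let join a b =
  match a, b with
  | Bot, x | x, Bot -> x
  | Untainted, Untainted -> Untainted
  | Tainted, Tainted -> Tainted
  | _ -> Top

(* Greatest lower bound: keeps only what both facts agree on. *)
let meet a b =
  match a, b with
  | Top, x | x, Top -> x
  | Untainted, Untainted -> Untainted
  | Tainted, Tainted -> Tainted
  | _ -> Bot
```

Under this ordering, `join Untainted Tainted` is `Top` (the value might be either) and `meet Untainted Tainted` is `Bot`.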


Taint Propagation Rules

Operation	Input A	Input B	Result	Why
+	Tainted	Untainted	Tainted	Any tainted operand taints the result
+	Untainted	Untainted	Untainted	Both safe → result safe
concat	Tainted	Untainted	Tainted	String concat with tainted = tainted string
assign	Tainted	—	Tainted	Assigning tainted value propagates taint
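
The table rows can be captured in a single propagation function. The sketch below covers the Tainted and Untainted rows from the table; how Bot and Top operands behave is not stated in the table, so those two cases are assumptions chosen to be conservative.

```ocaml
type taint = Bot | Untainted | Tainted | Top

(* Taint of the result of a binary operation such as + or concat. *)
let propagate a b =
  match a, b with
  (* Assumption: an unreachable operand makes the result unreachable. *)
  | Bot, _ | _, Bot -> Bot
  (* From the table: any tainted operand taints the result. *)
  | Tainted, _ | _, Tainted -> Tainted
  (* Assumption: a possibly tainted operand keeps the result possibly tainted. *)
  | Top, _ | _, Top -> Top
  (* From the table: both safe, so the result is safe. *)
  | Untainted, Untainted -> Untainted
```

Note that this is not the lattice join: propagation answers "what is the taint of the computed value", while join answers "what do we know after two paths merge".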

Taint Propagation

Taint flows through the program following data dependencies. When a tainted value is used in an expression, the result is tainted too. Sanitizers are the only way to break the chain.

Step Through: Safe Taint Flow

name = req.query.name
safe_name = escape_html(name)
greeting = "Hello, " + safe_name
res.send(greeting)

Step 1 / 4: name = req.query.name

name is Tainted because it comes from the URL query string.

Taint Environment: name = Tainted (source: user input)

Propagation Rules Summary

Assignment: x = tainted_value → x becomes Tainted

Operations: tainted + anything → result is Tainted

Sanitizer: sanitize(tainted) → result is Untainted

Literals: "hello", 42 → Untainted
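
The four rules can be sketched as a small abstract evaluator. The expression type, the `Env` map, and the choice to treat unknown variables as Top are all assumptions made for this illustration; the program at the end encodes the first three lines of the step-through example above.

```ocaml
type taint = Bot | Untainted | Tainted | Top

(* A tiny expression language, hypothetical and just large enough for the rules. *)
type expr =
  | Lit of string            (* literal such as "hello" *)
  | Var of string            (* variable read *)
  | Source of string         (* read from a source, e.g. req.query.name *)
  | Concat of expr * expr    (* stands in for any binary operation *)
  | Sanitize of expr         (* call to a sanitizer, e.g. escape_html *)

module Env = Map.Make (String)

let propagate a b =
  match a, b with
  | Bot, _ | _, Bot -> Bot
  | Tainted, _ | _, Tainted -> Tainted
  | Top, _ | _, Top -> Top
  | Untainted, Untainted -> Untainted

let rec eval env = function
  | Lit _ -> Untainted                      (* Literals rule *)
  | Source _ -> Tainted
  | Var x ->
    (* Assumption: an unknown variable is treated as possibly tainted. *)
    (match Env.find_opt x env with Some t -> t | None -> Top)
  | Concat (a, b) -> propagate (eval env a) (eval env b)   (* Operations rule *)
  | Sanitize e ->
    (* Sanitizer rule; unreachable input stays unreachable. *)
    (match eval env e with Bot -> Bot | _ -> Untainted)

(* Assignment rule: the variable takes the taint of the right-hand side. *)
let assign env (x, e) = Env.add x (eval env e) env

let program = [
  "name", Source "req.query.name";
  "safe_name", Sanitize (Var "name");
  "greeting", Concat (Lit "Hello, ", Var "safe_name");
]

let final_env = List.fold_left assign Env.empty program
```

After the fold, `name` is Tainted while `safe_name` and `greeting` are Untainted, so passing `greeting` to `res.send` would not be flagged.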

Security Configuration

A taint analyzer needs to know what to track. The security configuration defines three sets: which functions are sources of tainted data, which are sinks where tainted data is dangerous, and which are sanitizers that make data safe.

Sources are functions or locations where untrusted data enters the program. Any value returned from a source is marked Tainted.

user_input

Raw user input from forms, URL params, or request body

req.body, req.query, req.params

database_read

Data read from database (may contain previously-stored tainted data)

db.query(), Model.find()

environment

Environment variables or config that could be manipulated

process.env, os.environ
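
A security configuration like this can be represented as three sets of names. The sketch below is one possible shape, not a fixed format: the record fields, `call_result`, and `is_violation` are hypothetical, the source names come from the list above, and the sink and sanitizer entries are assumptions taken from the earlier examples (`db.exec`, `res.send`, `escape_html`).

```ocaml
type taint = Bot | Untainted | Tainted | Top

module Names = Set.Make (String)

(* The three sets the analyzer consults. *)
type config = {
  sources : Names.t;
  sinks : Names.t;
  sanitizers : Names.t;
}

let example_config = {
  sources = Names.of_list [
    "req.body"; "req.query"; "req.params";   (* user_input *)
    "db.query"; "Model.find";                (* database_read *)
    "process.env"; "os.environ";             (* environment *)
  ];
  sinks = Names.of_list ["db.exec"; "res.send"];   (* assumed sinks *)
  sanitizers = Names.of_list ["escape_html"];      (* assumed sanitizer *)
}

(* Taint of a call's result, given the callee name and its argument's taint. *)
let call_result cfg fname arg =
  if Names.mem fname cfg.sources then Tainted
  else if Names.mem fname cfg.sanitizers then Untainted
  else arg   (* unknown functions pass taint through unchanged *)

(* A sink call is reported when its argument is, or might be, tainted. *)
let is_violation cfg fname arg =
  Names.mem fname cfg.sinks && (arg = Tainted || arg = Top)
```

Keeping the configuration as data means the same analyzer can check for SQL injection, XSS, or command injection by swapping the sets.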

Information Flow

Basic taint analysis tracks explicit flows, where data moves through assignments and expressions. Information can also leak through implicit flows, where the control structure reveals something about a secret even though no data is copied directly.

Explicit Flow

secret = get_password()
leak = secret  // direct copy!

Data flows directly from secret to leak via assignment. Easy to track — just follow the data dependencies.

Implicit Flow

secret = get_password()
leak = 0
if secret > 0 then
  leak = 1

No direct copy! But leak's value reveals whether secret > 0. Information leaks through the branch.
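
A common way to catch this kind of leak is to track the taint of the enclosing branch conditions (often called the program-counter taint) and mix it into every assignment made under them. The sketch below illustrates that idea on the example above; the AST and function names are hypothetical, and it only analyzes the branch body without merging in the path where the branch is not taken, which a full analyzer would also need to do.

```ocaml
type taint = Bot | Untainted | Tainted | Top

type expr =
  | Lit of int
  | Var of string
  | Source of string          (* e.g. get_password() *)
  | Gt of expr * expr

type stmt =
  | Assign of string * expr
  | If of expr * stmt list    (* if-then without an else branch *)

module Env = Map.Make (String)

let propagate a b =
  match a, b with
  | Bot, _ | _, Bot -> Bot
  | Tainted, _ | _, Tainted -> Tainted
  | Top, _ | _, Top -> Top
  | Untainted, Untainted -> Untainted

let rec eval env = function
  | Lit _ -> Untainted
  | Source _ -> Tainted
  | Var x -> (match Env.find_opt x env with Some t -> t | None -> Top)
  | Gt (a, b) -> propagate (eval env a) (eval env b)

(* pc is the taint of the conditions guarding the current statement. *)
let rec exec pc env = function
  | Assign (x, e) ->
    (* Mixing in pc is what turns an implicit flow into a visible one. *)
    Env.add x (propagate pc (eval env e)) env
  | If (cond, body) ->
    let pc' = propagate pc (eval env cond) in
    List.fold_left (exec pc') env body

let program = [
  Assign ("secret", Source "get_password");
  Assign ("leak", Lit 0);
  If (Gt (Var "secret", Lit 0), [Assign ("leak", Lit 1)]);
]

let final_env = List.fold_left (exec Untainted) Env.empty program
```

With the program-counter taint, `leak` ends up Tainted even though it is only ever assigned literals; an analysis that follows explicit flows alone would report it as Untainted.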


Ready to Build Your Own?

You now understand taint lattices, propagation rules, security configurations (sources/sinks/sanitizers), and the difference between explicit and implicit information flow. Time to implement a taint analyzer in OCaml.

Security Analysis Explorer

Trace taint flow through 5 programs: SQL injection, XSS, sanitized code, implicit flows, and multi-source scenarios.


Exercise 1: Taint Lattice

Implement the taint domain with join, meet, leq, and taint propagation operators.
