Before You Code

Understanding Security Analysis

Before building taint analyzers in OCaml, let's develop intuition for how taint analysis tracks untrusted data, why sources, sinks, and sanitizers matter, and how the subtle difference between explicit and implicit information flows makes security analysis challenging.

Why Taint Analysis?

Many security vulnerabilities share a common pattern: untrusted data from an external source (user input, database, network) flows into a sensitive operation (SQL query, HTML output, command execution) without being properly sanitized. Taint analysis tracks this flow automatically.

The Classic SQL Injection

What happens when the attacker enters ' OR 1=1 -- as their username?

username = req.body.username   // attacker controls this!
query = "SELECT * FROM users WHERE name='" + username + "'"
db.exec(query)  // 💥 SQL Injection
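
To make the question above concrete, here is a minimal OCaml sketch of the same string concatenation. The helper name build_query is hypothetical and only mirrors the pseudocode; it is not part of any real database library.

```ocaml
(* Hypothetical helper mirroring the pseudocode above: the username is
   spliced into the SQL text with no escaping at all. *)
let build_query (username : string) : string =
  "SELECT * FROM users WHERE name='" ^ username ^ "'"

let () =
  (* The malicious input closes the quote and appends an always-true clause. *)
  print_endline (build_query "' OR 1=1 --")
```

Running this prints SELECT * FROM users WHERE name='' OR 1=1 --' so the WHERE clause is true for every row, and the trailing -- comments out the leftover quote.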

Sources

Where untrusted data enters: user input, URL params, request body, database reads, environment variables.

Sinks

Where tainted data is dangerous: SQL queries, HTML output, command execution, file paths.

Sanitizers

Functions that clean data: parameterized queries, HTML escaping, input validation. They convert Tainted → Untainted.

The Taint Lattice

Taint analysis uses a simple 4-element lattice. Every variable is tracked as one of: Bot (unreachable), Untainted (safe), Tainted (user-controlled), or Top (might be either).
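
The four elements can be sketched as an OCaml variant type. The ordering below is an assumption based on the descriptions above: a diamond with Bot at the bottom, Top at the top, and Untainted and Tainted incomparable in the middle. The function names leq, join, and meet are illustrative.

```ocaml
(* The four lattice elements named in the text. *)
type taint = Bot | Untainted | Tainted | Top

(* Partial order, assuming a diamond shape: Bot is below everything,
   Top is above everything, Untainted and Tainted are incomparable. *)
let leq a b =
  match a, b with
  | Bot, _ | _, Top -> true
  | Untainted, Untainted | Tainted, Tainted -> true
  | _ -> false

(* Join: least upper bound, used when control-flow paths merge. *)
let join a b =
  match a, b with
  | Bot, x | x, Bot -> x
  | Untainted, Untainted -> Untainted
  | Tainted, Tainted -> Tainted
  | _ -> Top

(* Meet: greatest lower bound. *)
let meet a b =
  match a, b with
  | Top, x | x, Top -> x
  | Untainted, Untainted -> Untainted
  | Tainted, Tainted -> Tainted
  | _ -> Bot
```

Under this ordering, join Untainted Tainted is Top (a variable that is safe on one path and tainted on another might be either), and meet Untainted Tainted is Bot.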


Taint Propagation Rules
Operation | Input A   | Input B   | Result    | Why
+         | Tainted   | Untainted | Tainted   | Any tainted operand taints the result
+         | Untainted | Untainted | Untainted | Both safe → result safe
concat    | Tainted   | Untainted | Tainted   | String concat with tainted = tainted string
assign    | Tainted   | -         | Tainted   | Assigning tainted value propagates taint
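
The table rows can be sketched as a transfer function for binary operations. Note that the table gives Tainted for a Tainted and an Untainted operand, so this function is not simply "might be either"; a tainted operand always wins. The name propagate_binop is hypothetical, and the Bot and Top cases are assumptions, since the table does not list them.

```ocaml
type taint = Bot | Untainted | Tainted | Top

(* Result taint of a binary operation such as + or concat. *)
let propagate_binop a b =
  match a, b with
  | Bot, _ | _, Bot -> Bot               (* assumption: unreachable stays unreachable *)
  | Tainted, _ | _, Tainted -> Tainted   (* table: any tainted operand taints the result *)
  | Top, _ | _, Top -> Top               (* assumption: possibly tainted stays possibly tainted *)
  | Untainted, Untainted -> Untainted    (* table: both safe, result safe *)

(* Assignment copies the taint of the right-hand side unchanged. *)
let propagate_assign rhs = rhs
```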

Taint Propagation

Taint flows through the program following data dependencies. When a tainted value is used in an expression, the result is tainted too, and so is everything derived from that result. Passing the value through a sanitizer is the only way to break the chain.

Step Through: Safe Taint Flow
name = req.query.name
safe_name = escape_html(name)
greeting = "Hello, " + safe_name
res.send(greeting)
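Step 1 of 4: name = req.query.name

name is Tainted because it comes from the URL query string (source: user input).

Step 2 of 4: safe_name = escape_html(name)

safe_name is Untainted because escape_html is a sanitizer.

Step 3 of 4: greeting = "Hello, " + safe_name

greeting is Untainted because both the literal and safe_name are Untainted.

Step 4 of 4: res.send(greeting)

The sink receives Untainted data, so the flow is safe.

Taint Environment (after step 4)

name: Tainted
safe_name: Untainted
greeting: Untainted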

Propagation Rules Summary

Assignment: x = tainted_value → x becomes Tainted
Operations: tainted + anything → result is Tainted
Sanitizer: sanitize(tainted) → result is Untainted
Literals: "hello", 42 → Untainted
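
The four rules can be sketched as a small OCaml evaluator over a toy statement language, then run on the four-line safe flow from the step-through. Everything here (the expr and stmt types, combine, eval, step, analyze) is a hypothetical illustration, not the analyzer this tutorial goes on to build. Treating an unknown variable as Top is an assumption.

```ocaml
module Env = Map.Make (String)

type taint = Bot | Untainted | Tainted | Top

type expr =
  | Lit of string              (* literal value *)
  | Var of string              (* variable read *)
  | Source of string           (* read from a configured source *)
  | Concat of expr * expr      (* stands in for any binary operation *)
  | Sanitize of string * expr  (* call to a configured sanitizer *)

type stmt =
  | Assign of string * expr
  | Sink of string * expr      (* call to a sink with one argument *)

(* Operations rule: a tainted operand taints the result. *)
let combine a b =
  match a, b with
  | Bot, _ | _, Bot -> Bot
  | Tainted, _ | _, Tainted -> Tainted
  | Top, _ | _, Top -> Top
  | Untainted, Untainted -> Untainted

let rec eval env = function
  | Lit _ -> Untainted                                        (* Literals rule *)
  | Var x -> Option.value (Env.find_opt x env) ~default:Top   (* assumption *)
  | Source _ -> Tainted
  | Concat (a, b) -> combine (eval env a) (eval env b)
  | Sanitize (_, _) -> Untainted                              (* Sanitizer rule *)

(* Process one statement; record the sink name if it may see tainted data. *)
let step (env, alerts) = function
  | Assign (x, e) -> (Env.add x (eval env e) env, alerts)     (* Assignment rule *)
  | Sink (f, e) ->
    (match eval env e with
     | Tainted | Top -> (env, f :: alerts)
     | Bot | Untainted -> (env, alerts))

let analyze prog =
  let env, alerts = List.fold_left step (Env.empty, []) prog in
  (env, List.rev alerts)

(* The safe flow from the step-through. *)
let safe_flow = [
  Assign ("name", Source "req.query.name");
  Assign ("safe_name", Sanitize ("escape_html", Var "name"));
  Assign ("greeting", Concat (Lit "Hello, ", Var "safe_name"));
  Sink ("res.send", Var "greeting");
]
```

Calling analyze safe_flow yields an empty alert list. Replacing Var "safe_name" with Var "name" in the third statement would make greeting Tainted and report res.send.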

Security Configuration

A taint analyzer needs to know what to track. The security configuration defines three sets: which functions are sources of tainted data, which are sinks where tainted data is dangerous, and which are sanitizers that make data safe.

Sources are functions or locations where untrusted data enters the program. Any value returned from a source is marked Tainted.

user_input

Raw user input from forms, URL params, or request body.
Examples: req.body, req.query, req.params

database_read

Data read from a database (may contain previously stored tainted data).
Examples: db.query(), Model.find()

environment

Environment variables or config that could be manipulated.
Examples: process.env, os.environ
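
One way to represent such a configuration in OCaml is a record of three string sets. This is a sketch under assumptions: the record shape, the classify helper, and the role type are hypothetical. The source names come from the list above; the sink names (db.exec, res.send) and the sanitizer name (escape_html) are borrowed from the earlier examples on this page.

```ocaml
module StringSet = Set.Make (String)

(* The three sets a taint analyzer is configured with. *)
type config = {
  sources : StringSet.t;
  sinks : StringSet.t;
  sanitizers : StringSet.t;
}

let config = {
  sources =
    StringSet.of_list
      [ "req.body"; "req.query"; "req.params";   (* user_input *)
        "db.query"; "Model.find";                (* database_read *)
        "process.env"; "os.environ" ];           (* environment *)
  sinks = StringSet.of_list [ "db.exec"; "res.send" ];
  sanitizers = StringSet.of_list [ "escape_html" ];
}

type role = Source | Sink | Sanitizer | Other

(* Look up what role a function or location plays in the analysis. *)
let classify cfg name =
  if StringSet.mem name cfg.sources then Source
  else if StringSet.mem name cfg.sinks then Sink
  else if StringSet.mem name cfg.sanitizers then Sanitizer
  else Other
```

Keeping the configuration as plain data means the same analysis can be pointed at a different framework by swapping the sets, without touching the propagation logic.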

Information Flow

Taint analysis tracks explicit flows (data assignments). But information can also leak through implicit flows — where the control structure reveals secrets even without direct data copying.

Explicit Flow
secret = get_password()
leak = secret  // direct copy!

Data flows directly from secret to leak via assignment. Easy to track — just follow the data dependencies.

Implicit Flow
secret = get_password()
leak = 0
if secret > 0 then
  leak = 1

No direct copy! But leak's value reveals whether secret > 0. Information leaks through the branch.
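
One common way to catch this kind of leak is to carry a "program counter" taint: the taint of every branch condition enclosing the current statement is joined into each assignment made under it. The sketch below is a hypothetical illustration of that idea, not necessarily the approach this tutorial implements. It collapses the lattice to two points to stay short, and Secret stands in for get_password().

```ocaml
module Env = Map.Make (String)

(* Two-point simplification of the lattice, for this sketch only. *)
type taint = Untainted | Tainted

let join a b = if a = Tainted || b = Tainted then Tainted else Untainted

type expr =
  | Lit of int
  | Var of string
  | Secret                   (* stands in for get_password () *)
  | Gt of expr * expr

type stmt =
  | Assign of string * expr
  | If of expr * stmt list   (* if cond then body, with no else branch *)

let rec eval env = function
  | Lit _ -> Untainted
  | Secret -> Tainted
  | Var x -> Option.value (Env.find_opt x env) ~default:Untainted
  | Gt (a, b) -> join (eval env a) (eval env b)

(* pc is the joined taint of the branch conditions around the statement. *)
let rec exec pc env = function
  | Assign (x, e) -> Env.add x (join pc (eval env e)) env
  | If (cond, body) ->
    let pc' = join pc (eval env cond) in
    let env_then = List.fold_left (exec pc') env body in
    (* Merge with the state in which the branch was skipped. *)
    Env.union (fun _ a b -> Some (join a b)) env env_then

(* The implicit-flow example from the text. *)
let program = [
  Assign ("secret", Secret);
  Assign ("leak", Lit 0);
  If (Gt (Var "secret", Lit 0), [ Assign ("leak", Lit 1) ]);
]

let final_env = List.fold_left (exec Untainted) Env.empty program
```

With the pc taint in place, leak ends up Tainted even though only the literals 0 and 1 are ever assigned to it. An analysis that tracked explicit flows alone would report it as Untainted.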

Ready to Build Your Own?

You now understand taint lattices, propagation rules, security configurations (sources/sinks/sanitizers), and the difference between explicit and implicit information flow. Time to implement a taint analyzer in OCaml.