Understanding Security Analysis
Before building taint analyzers in OCaml, let's develop intuition for how taint analysis tracks untrusted data, why sources, sinks, and sanitizers matter, and the subtle difference between explicit and implicit information flows that makes security analysis challenging.
Why Taint Analysis?
Many security vulnerabilities share a common pattern: untrusted data from an external source (user input, database, network) flows into a sensitive operation (SQL query, HTML output, command execution) without being properly sanitized. Taint analysis tracks this flow automatically.
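What happens when the attacker enters ' OR 1=1 -- as their username?

    username = req.body.username   // attacker controls this!
    query = "SELECT * FROM users WHERE name='" + username + "'"
    db.exec(query)                 // 💥 SQL Injection

The query becomes SELECT * FROM users WHERE name='' OR 1=1 --'. The WHERE clause is now true for every row, and in most SQL dialects the -- turns the leftover quote into a comment.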
Sources
Where untrusted data enters: user input, URL params, request body, database reads, environment variables.
Sinks
Where tainted data is dangerous: SQL queries, HTML output, command execution, file paths.
Sanitizers
Functions that clean data: parameterized queries, HTML escaping, input validation. They convert Tainted → Untainted.
The Taint Lattice
Taint analysis uses a simple 4-element lattice. Every variable is tracked as one of: Bot (unreachable), Untainted (safe), Tainted (user-controlled), or Top (might be either).
| Operation | Input A | Input B | Result | Why |
|---|---|---|---|---|
| + | Tainted | Untainted | Tainted | Any tainted operand taints the result |
| + | Untainted | Untainted | Untainted | Both safe → result safe |
| concat | Tainted | Untainted | Tainted | String concat with tainted = tainted string |
| assign | Tainted | — | Tainted | Assigning tainted value propagates taint |
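The lattice and the table above can be sketched in OCaml as follows. This is a minimal sketch, assuming the usual diamond ordering (Bot below everything, Top above everything, Untainted and Tainted incomparable). The names leq, join, and binop are illustrative, and the way binop treats Bot and Top is an assumption rather than something the table specifies.

```ocaml
(* The four lattice elements described above. *)
type taint = Bot | Untainted | Tainted | Top

(* Assumed partial order: Bot is below everything, Top is above
   everything, Untainted and Tainted are incomparable. *)
let leq a b =
  match a, b with
  | Bot, _ | _, Top -> true
  | Untainted, Untainted | Tainted, Tainted -> true
  | _ -> false

(* Join (least upper bound), used where control-flow paths merge. *)
let join a b =
  if leq a b then b
  else if leq b a then a
  else Top

(* Transfer function for binary operators such as + and concat,
   matching the table: any tainted operand taints the result.
   The Bot and Top cases are assumptions made for this sketch. *)
let binop a b =
  match a, b with
  | Bot, _ | _, Bot -> Bot
  | Tainted, _ | _, Tainted -> Tainted
  | Top, _ | _, Top -> Top
  | Untainted, Untainted -> Untainted
```

Note that join Tainted Untainted is Top, while binop Tainted Untainted is Tainted: join merges facts from different control-flow paths, whereas binop models a single operation whose result definitely contains tainted data.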
Taint Propagation
Taint flows through the program following data dependencies. When a tainted value is used in an expression, the result is tainted too. Sanitizers are the only way to break the chain.

    name = req.query.name
    safe_name = escape_html(name)
    greeting = "Hello, " + safe_name
    res.send(greeting)

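name is Tainted because it comes from the URL query string. escape_html is a sanitizer, so safe_name is Untainted; greeting concatenates two untainted strings and is Untainted as well, which makes res.send(greeting) safe.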
Taint Environment
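A taint environment maps each variable to its current taint value. The sketch below is one possible way to model it in OCaml, using a string-keyed Map and a tiny expression type invented for this example (Lit, Var, Source, Concat, Sanitize). It is simplified to two taint values, leaving out Bot and Top, and treating unbound variables as Untainted is an assumption.

```ocaml
module Env = Map.Make (String)

(* Simplified to two values for this sketch. *)
type taint = Untainted | Tainted

(* A hypothetical expression language, just enough for the example. *)
type expr =
  | Lit of string              (* literal: always Untainted *)
  | Var of string              (* variable lookup *)
  | Source of string           (* e.g. req.query.name: always Tainted *)
  | Concat of expr * expr      (* tainted if either side is tainted *)
  | Sanitize of string * expr  (* e.g. escape_html(e): always Untainted *)

let rec eval env = function
  | Lit _ -> Untainted
  | Source _ -> Tainted
  | Sanitize (_, _) -> Untainted
  | Var x ->
    (* Assumption: variables never assigned are treated as Untainted. *)
    (match Env.find_opt x env with Some t -> t | None -> Untainted)
  | Concat (a, b) ->
    if eval env a = Tainted || eval env b = Tainted then Tainted
    else Untainted

(* An assignment records the taint of its right-hand side. *)
let assign env (x, e) = Env.add x (eval env e) env

(* The first three lines of the program above. *)
let program = [
  "name", Source "req.query.name";
  "safe_name", Sanitize ("escape_html", Var "name");
  "greeting", Concat (Lit "Hello, ", Var "safe_name");
]

let final_env = List.fold_left assign Env.empty program
```

Folding assign over the program leaves name Tainted and both safe_name and greeting Untainted. Concatenating name directly, without the sanitizer, would evaluate to Tainted.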
Propagation Rules Summary
- x = tainted_value → x becomes Tainted
- tainted + anything → result is Tainted
- sanitize(tainted) → result is Untainted
- "hello", 42 → Untainted
Security Configuration
A taint analyzer needs to know what to track. The security configuration defines three sets: which functions are sources of tainted data, which are sinks where tainted data is dangerous, and which are sanitizers that make data safe.
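As a rough illustration, the three sets could be held in an OCaml record. The type and function names here (config, role, classify, web_config) are hypothetical. The entries reuse names from the snippets in this section, and listing db.exec and res.send as sinks is an assumption made for the example.

```ocaml
module Names = Set.Make (String)

(* The three sets that make up a security configuration. *)
type config = {
  sources : Names.t;
  sinks : Names.t;
  sanitizers : Names.t;
}

type role = Source | Sink | Sanitizer | Other

(* Look up which role, if any, a function or location name plays. *)
let classify cfg name =
  if Names.mem name cfg.sources then Source
  else if Names.mem name cfg.sinks then Sink
  else if Names.mem name cfg.sanitizers then Sanitizer
  else Other

(* Illustrative entries only; a real configuration would be larger. *)
let web_config = {
  sources = Names.of_list ["req.body"; "req.query"; "req.params"];
  sinks = Names.of_list ["db.exec"; "res.send"];
  sanitizers = Names.of_list ["escape_html"];
}
```

An analyzer would call something like classify at each call site: a Source result marks the returned value Tainted, a Sanitizer result marks it Untainted, and a Sink result triggers a check on the arguments.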
Sources are functions or locations where untrusted data enters the program. Any value returned from a source is marked Tainted.
- Raw user input from forms, URL params, or request body: req.body, req.query, req.params
- Data read from database (may contain previously-stored tainted data): db.query(), Model.find()
- Environment variables or config that could be manipulated: process.env, os.environ
Information Flow
Taint analysis tracks explicit flows (data assignments). But information can also leak through implicit flows — where the control structure reveals secrets even without direct data copying.

    secret = get_password()
    leak = secret   // direct copy!

This is an explicit flow: data moves directly from secret to leak via assignment. It is easy to track by following the data dependencies.

    secret = get_password()
    leak = 0
    if secret > 0 then leak = 1

This is an implicit flow: there is no direct copy, but leak's value reveals whether secret > 0. Information leaks through the branch.
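One common way to catch implicit flows, which this section does not itself describe, is to carry a "program counter" taint: the taint of every condition guarding the current statement, joined into each assignment. The sketch below compares an explicit-only analysis with that approach on the program above. All names (expr, stmt, exec, analyze, track_implicit) are hypothetical, and the two-value taint type is a simplification.

```ocaml
type taint = Untainted | Tainted

let join a b = if a = Tainted || b = Tainted then Tainted else Untainted

module Env = Map.Make (String)

type expr =
  | Const of int
  | Var of string
  | Secret                (* stands in for get_password() *)
  | Gt of expr * expr

type stmt =
  | Assign of string * expr
  | If of expr * stmt list

let rec taint_of env = function
  | Const _ -> Untainted
  | Secret -> Tainted
  | Var x -> (match Env.find_opt x env with Some t -> t | None -> Untainted)
  | Gt (a, b) -> join (taint_of env a) (taint_of env b)

(* [pc] is the taint of the conditions guarding the current statement.
   With [track_implicit = false] it is ignored, which gives the
   explicit-only analysis. *)
let rec exec ~track_implicit pc env = function
  | Assign (x, e) ->
    let t = taint_of env e in
    Env.add x (if track_implicit then join pc t else t) env
  | If (cond, body) ->
    let pc' = join pc (taint_of env cond) in
    let env_then = List.fold_left (exec ~track_implicit pc') env body in
    (* Merge the branch result with the fall-through environment. *)
    Env.union (fun _ a b -> Some (join a b)) env env_then

let program = [
  Assign ("secret", Secret);
  Assign ("leak", Const 0);
  If (Gt (Var "secret", Const 0), [Assign ("leak", Const 1)]);
]

let analyze ~track_implicit =
  List.fold_left (exec ~track_implicit Untainted) Env.empty program
```

With track_implicit set to false, leak ends up Untainted because only constants are ever assigned to it. With track_implicit set to true, the tainted branch condition taints the assignment inside the branch, so leak is reported as Tainted.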
Ready to Build Your Own?
You now understand taint lattices, propagation rules, security configurations (sources/sinks/sanitizers), and the difference between explicit and implicit information flow. Time to implement a taint analyzer in OCaml.