Understanding Tools Integration
Before building the capstone analyzer in OCaml, let's understand how real-world tools combine multiple analysis passes into a unified pipeline, aggregate findings into a common format, and produce actionable reports for developers and CI/CD systems.
Why Tools Integration?
You've built individual analyses: CFGs, dataflow, abstract interpretation, and taint tracking. Real-world tools like ESLint, Semgrep, and Coverity run many analyses rather than one, so they need a layer that ties the passes together. This module is about building that integration layer, which has three parts.
Multi-Pass Pipeline
Chain multiple analyses together. Each pass produces findings. The pipeline orchestrates execution and collects results.
Unified Findings
A common finding format: severity, location, message, category. Different passes produce the same type of output.
Actionable Reports
Group by severity, by pass, or by location. Format for developers (inline) or security teams (summary). The same data, different views.
Analysis Findings
Every analysis pass produces findings — structured records describing potential issues. A unified finding type lets the pipeline collect results from different passes into one stream.
type severity = Critical | High | Medium | Low | Info

type finding = {
  id : string;          (* unique identifier *)
  pass : string;        (* which analysis pass *)
  severity : severity;  (* Critical | High | Medium | Low | Info *)
  line : int;           (* source location *)
  message : string;     (* human-readable description *)
  category : string;    (* e.g. "sql-injection", "div-by-zero" *)
}
Severity Levels
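One way to make the five levels sortable is to map each level to a numeric rank. The sketch below assumes that approach; `severity_rank` and `compare_severity` are hypothetical helper names chosen for illustration, not a prescribed solution.

```ocaml
(* The five levels named in the finding type's comment. *)
type severity = Critical | High | Medium | Low | Info

(* Hypothetical helper: a smaller rank means a more urgent finding. *)
let severity_rank = function
  | Critical -> 0
  | High -> 1
  | Medium -> 2
  | Low -> 3
  | Info -> 4

(* Comparison function usable with List.sort: most severe first. *)
let compare_severity a b = compare (severity_rank a) (severity_rank b)

let sorted = List.sort compare_severity [ Low; Critical; Info; High ]
(* sorted = [Critical; High; Low; Info] *)
```

An explicit rank keeps the ordering independent of the order in which the constructors happen to be declared.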
Categories
Categories group findings by what kind of issue they represent, regardless of which pass found them.
Multiple passes can produce findings in the same category — e.g., both taint analysis and constant propagation might flag injection risks. Deduplication is part of the reporting layer.
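A minimal sketch of deduplication, assuming two findings count as duplicates when they share a line and a category. That key is an assumption for illustration (a real tool might also compare messages or spans), and `dedup_key` and `deduplicate` are hypothetical names.

```ocaml
type severity = Critical | High | Medium | Low | Info

type finding = {
  id : string;
  pass : string;
  severity : severity;
  line : int;
  message : string;
  category : string;
}

(* Assumed duplicate key: same line and same category. *)
let dedup_key f = (f.line, f.category)

(* Keep the first finding seen for each key, preserving input order. *)
let deduplicate findings =
  let seen = Hashtbl.create 16 in
  List.fold_left
    (fun acc f ->
      let k = dedup_key f in
      if Hashtbl.mem seen k then acc
      else begin
        Hashtbl.add seen k ();
        f :: acc
      end)
    [] findings
  |> List.rev

let example = [
  { id = "T1"; pass = "taint"; severity = High; line = 12;
    message = "tainted input reaches query"; category = "sql-injection" };
  { id = "C1"; pass = "constprop"; severity = Medium; line = 12;
    message = "query built from non-constant string"; category = "sql-injection" };
  { id = "S1"; pass = "sign"; severity = Medium; line = 20;
    message = "divisor may be zero"; category = "div-by-zero" };
]

(* T1 and C1 share line 12 and a category, so only T1 survives. *)
let unique = deduplicate example
```

Because the first finding for each key wins, sorting by severity before deduplicating would keep the most severe report of each issue.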
Dead Code Detection
Dead code detection combines several analyses into a single pass: live-variable analysis finds assignments whose values are never read, reachability finds statements that can never execute, and reaching definitions finds definitions that reach no use (dead stores). It's a good example of multi-analysis integration.
x = 5
y = 10          // y is assigned but never read
result = x + 1
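The liveness part of this check can be sketched for straight-line code as a single backward pass. The `stmt` record is a hypothetical mini-IR invented for this example, and the sketch ignores branches and loops, which a real detector would handle over the CFG.

```ocaml
module S = Set.Make (String)

(* Hypothetical mini-IR: one assignment per line; [uses] are the variables read. *)
type stmt = { line : int; def : string; uses : string list }

(* Walk the statements backward, tracking which variables are live.
   An assignment is dead when its target is not live right after it.
   [live_out] lists the variables still needed after the block ends. *)
let dead_assignments ~live_out stmts =
  let step (live, dead) s =
    let dead = if S.mem s.def live then dead else s.line :: dead in
    let live = S.union (S.remove s.def live) (S.of_list s.uses) in
    (live, dead)
  in
  let _, dead = List.fold_left step (S.of_list live_out, []) (List.rev stmts) in
  dead

(* The three-line example above, assuming only [result] is used afterwards. *)
let program = [
  { line = 1; def = "x"; uses = [] };
  { line = 2; def = "y"; uses = [] };
  { line = 3; def = "result"; uses = [ "x" ] };
]

let dead = dead_assignments ~live_out:[ "result" ] program
(* dead = [2]: the assignment to y is never read *)
```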
Why Dead Code Matters
Dead code isn't just messy — it can hide bugs. An unused variable might indicate a missing operation. Unreachable code might be a guard that was supposed to execute. Dead stores waste computation. Flagging these issues helps developers maintain cleaner, more correct codebases.
Configurable Pipelines
A pipeline defines which passes to run, in what order, with what options. Think of it like a build system for analysis: each pass is a task, and the pipeline orchestrates them.
┌──────────────┐
│ Source Code  │
└──────┬───────┘
       │
       ▼
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│   Pass 1:    │    │   Pass 2:    │    │   Pass 3:    │
│    Taint     │───▶│     Sign     │───▶│  Dead Code   │
│   Analysis   │    │   Analysis   │    │   Detector   │
└──────┬───────┘    └──────┬───────┘    └──────┬───────┘
       │                   │                   │
       ▼                   ▼                   ▼
┌──────────────────────────────────────────────────────┐
│                  Finding Aggregator                  │
│        Collect, deduplicate, sort by severity        │
└──────────────────────────┬───────────────────────────┘
                           │
                           ▼
                    ┌──────────────┐
                    │    Report    │
                    └──────────────┘
Parallel Execution
Independent passes (e.g., taint and sign analysis) can run in parallel for speed. Dependent passes (e.g., inlining before interprocedural analysis) must run sequentially. The pipeline respects these constraints.
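One way to respect these constraints is to group passes into stages by their declared dependencies. The sketch below only computes the schedule; it does not run anything concurrently (OCaml 5 domains would be one option for that). The `pass_spec` record and `stages` function are hypothetical names.

```ocaml
(* Hypothetical pass descriptor: a name plus the passes it must run after. *)
type pass_spec = { name : string; deps : string list }

(* Group passes into stages. Every pass in a stage depends only on passes
   from earlier stages, so members of one stage are safe to run in parallel.
   Raises Failure on a dependency cycle or an unknown dependency. *)
let stages passes =
  let rec go finished remaining =
    match remaining with
    | [] -> []
    | _ ->
      let ready, blocked =
        List.partition
          (fun p -> List.for_all (fun d -> List.mem d finished) p.deps)
          remaining
      in
      if ready = [] then failwith "cyclic or missing dependency"
      else
        let names = List.map (fun p -> p.name) ready in
        names :: go (names @ finished) blocked
  in
  go [] passes

let plan = stages [
  { name = "taint"; deps = [] };
  { name = "sign"; deps = [] };
  { name = "inlining"; deps = [] };
  { name = "interprocedural"; deps = [ "inlining" ] };
]
(* plan = [["taint"; "sign"; "inlining"]; ["interprocedural"]] *)
```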
Stop on Critical
In CI/CD, you might want to stop the pipeline immediately when a Critical finding is detected — no point running more passes if deployment is already blocked. This is a configurable option.
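A configurable pipeline with this option might look like the sketch below. The `analysis_pass` and `config` records and the `run_pipeline` function are assumptions made for illustration; the two stub passes return fixed findings instead of analyzing the source.

```ocaml
type severity = Critical | High | Medium | Low | Info

type finding = {
  id : string;
  pass : string;
  severity : severity;
  line : int;
  message : string;
  category : string;
}

(* Hypothetical shapes: a pass maps source text to findings. *)
type analysis_pass = { name : string; run : string -> finding list }

type config = { passes : analysis_pass list; stop_on_critical : bool }

(* Run passes in order and collect findings. With [stop_on_critical] set,
   stop right after the first pass that reports a Critical finding. *)
let run_pipeline config source =
  let rec go acc = function
    | [] -> acc
    | p :: rest ->
      let found = p.run source in
      let acc = acc @ found in
      let critical = List.exists (fun f -> f.severity = Critical) found in
      if config.stop_on_critical && critical then acc else go acc rest
  in
  go [] config.passes

(* Stub passes for the example. *)
let taint = {
  name = "taint";
  run = (fun _ ->
    [ { id = "T1"; pass = "taint"; severity = Critical; line = 12;
        message = "tainted input reaches query"; category = "sql-injection" } ]);
}

let sign = {
  name = "sign";
  run = (fun _ ->
    [ { id = "S1"; pass = "sign"; severity = Medium; line = 20;
        message = "divisor may be zero"; category = "div-by-zero" } ]);
}

let strict = run_pipeline { passes = [ taint; sign ]; stop_on_critical = true } "src"
let full = run_pipeline { passes = [ taint; sign ]; stop_on_critical = false } "src"
(* strict holds only T1; full holds T1 and S1 *)
```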
Analysis Reporting
The report is the user-facing output of your tool. The same findings can be presented in multiple formats: grouped by severity for triage, by pass for understanding, or as structured data (JSON/SARIF) for integration with CI/CD tools.
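As a rough illustration of the severity view, the sketch below groups findings by level and renders a plain-text summary. The text layout and the names `by_severity` and `render` are assumptions, not a required output format.

```ocaml
type severity = Critical | High | Medium | Low | Info

type finding = {
  id : string;
  pass : string;
  severity : severity;
  line : int;
  message : string;
  category : string;
}

let severity_to_string = function
  | Critical -> "Critical"
  | High -> "High"
  | Medium -> "Medium"
  | Low -> "Low"
  | Info -> "Info"

let all_severities = [ Critical; High; Medium; Low; Info ]

(* Group findings by severity, most severe first, dropping empty groups. *)
let by_severity findings =
  all_severities
  |> List.map (fun s -> (s, List.filter (fun f -> f.severity = s) findings))
  |> List.filter (fun (_, group) -> group <> [])

(* Render one header per severity followed by its findings. *)
let render findings =
  by_severity findings
  |> List.map (fun (s, group) ->
         let header =
           Printf.sprintf "%s (%d)" (severity_to_string s) (List.length group)
         in
         let lines =
           List.map
             (fun f -> Printf.sprintf "  line %d [%s] %s" f.line f.pass f.message)
             group
         in
         String.concat "\n" (header :: lines))
  |> String.concat "\n"

let sample = [
  { id = "S1"; pass = "sign"; severity = Medium; line = 20;
    message = "divisor may be zero"; category = "div-by-zero" };
  { id = "T1"; pass = "taint"; severity = High; line = 12;
    message = "tainted input reaches query"; category = "sql-injection" };
]

let report = render sample
```

Grouping by pass or by line would follow the same pattern with a different key.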
Real-World Integration
Tools like GitHub Code Scanning use SARIF (Static Analysis Results Interchange Format) — a JSON standard for analysis results. Your reporter module converts internal findings into this kind of structured output, making your tool compatible with the broader ecosystem.
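The sketch below emits a small SARIF-shaped document by hand. It covers only a handful of SARIF 2.1.0 fields and is not a complete or validated exporter. Several choices are assumptions: using the finding's category as `ruleId`, the severity-to-level mapping, and the `tool` and `uri` parameters (the finding type carries no file path).

```ocaml
type severity = Critical | High | Medium | Low | Info

type finding = {
  id : string;
  pass : string;
  severity : severity;
  line : int;
  message : string;
  category : string;
}

(* Escape a string for use inside a JSON string literal. *)
let escape s =
  let b = Buffer.create (String.length s) in
  String.iter
    (function
      | '"' -> Buffer.add_string b "\\\""
      | '\\' -> Buffer.add_string b "\\\\"
      | '\n' -> Buffer.add_string b "\\n"
      | c when Char.code c < 0x20 ->
        Buffer.add_string b (Printf.sprintf "\\u%04x" (Char.code c))
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

(* Assumed mapping from our severities to SARIF result levels. *)
let sarif_level = function
  | Critical | High -> "error"
  | Medium -> "warning"
  | Low | Info -> "note"

(* One SARIF result object; [uri] is the analyzed file (supplied by the caller). *)
let sarif_result ~uri f =
  Printf.sprintf
    {|{"ruleId":"%s","level":"%s","message":{"text":"%s"},"locations":[{"physicalLocation":{"artifactLocation":{"uri":"%s"},"region":{"startLine":%d}}}]}|}
    (escape f.category) (sarif_level f.severity) (escape f.message) (escape uri) f.line

(* Wrap the results in a single run for the named tool. *)
let sarif_report ~tool ~uri findings =
  Printf.sprintf
    {|{"version":"2.1.0","runs":[{"tool":{"driver":{"name":"%s"}},"results":[%s]}]}|}
    (escape tool)
    (String.concat "," (List.map (sarif_result ~uri) findings))

let json =
  sarif_report ~tool:"capstone" ~uri:"src/app.ml"
    [ { id = "T1"; pass = "taint"; severity = High; line = 12;
        message = "tainted input reaches query"; category = "sql-injection" } ]
```

In practice a JSON library would be a safer choice than string formatting; the hand-rolled version is used here only to keep the example dependency-free.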
Ready to Build Your Own?
You now understand how multi-pass pipelines work, how findings are structured and aggregated, how dead code detection combines analyses, and how configurable pipelines and reporters turn raw results into actionable output. Time to build the integration layer in OCaml.
Exercise 1: Analysis Finding
Implement the unified finding type with severity levels, categories, sorting, and deduplication.