The Book of Why: A Summary of Judea Pearl’s Causal Revolution
Judea Pearl’s revolution in how we understand cause and effect
“The Book of Why: The New Science of Cause and Effect” (2018) by Judea Pearl and Dana Mackenzie presents a groundbreaking argument: for over a century, science crippled itself by refusing to talk about causation, and this failure cost millions of lives. Pearl, a Turing Award-winning computer scientist, provides both a historical account of how statistics banned causal language and an introduction to his mathematical framework (the “Ladder of Causation,” do-calculus, and causal diagrams) that finally allows scientists to answer “why” questions with mathematical rigor.
Pearl’s central insight is deceptively simple: correlation is not causation, but we can move from correlation to causation if we’re willing to make our causal assumptions explicit. Traditional statistics could only tell you that smokers get lung cancer more often—Pearl’s framework can tell you whether smoking actually causes cancer, even without conducting experiments.
The century-long ban on asking “why”
Pearl opens with a striking historical narrative: modern statistics was founded on an explicit prohibition against causal reasoning. Karl Pearson, who established the field in the late 19th century, argued in his influential “Grammar of Science” (1892) that science should be “essentially descriptive rather than explanatory.”
This anti-causal stance calcified into dogma. During the smoking-cancer debate of the 1950s, Ronald Fisher, a pioneer of randomized experimental design, used his enormous prestige to argue that observed correlations proved nothing. Fisher proposed a hypothetical “smoking gene” that might cause both the urge to smoke and susceptibility to cancer, a hypothesis that was technically admissible within pure statistical reasoning but tragically wrong in hindsight.
Pearl argues this intellectual straitjacket delayed an authoritative scientific verdict on smoking and cancer for years while people continued dying.
Three rungs on the ladder of causation
Pearl’s most accessible contribution is the Ladder of Causation, a hierarchy that distinguishes three fundamentally different types of questions:
Rung 1: Association (Seeing) asks “What patterns exist in the data?” This is the realm of traditional statistics and machine learning—observing that barometer readings drop before storms, or that people who exercise live longer. Current AI systems, including deep learning, operate exclusively here.
Rung 2: Intervention (Doing) asks “What happens if I take action?” This requires understanding causal mechanisms. Seeing the barometer fall predicts rain, but forcing the barometer down doesn’t cause rain. The crucial distinction is between P(Y|X)—the probability of Y given that we observe X—and P(Y|do(X))—the probability of Y given that we force X to occur.
Rung 3: Counterfactuals (Imagining) asks “What would have happened if circumstances had been different?” These questions—“Would I have survived if I’d taken the other treatment?”—require the most sophisticated reasoning.
The hierarchy matters because you cannot answer higher-rung questions using only lower-rung information: no amount of passively observed data, by itself, can settle what an intervention would do or what would have happened otherwise.
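A small simulation makes the seeing/doing distinction concrete. The toy world below is illustrative only (the variable names and probabilities are assumptions, not taken from the book): low atmospheric pressure causes both a low barometer reading and rain, so observing a low reading raises the probability of rain, while forcing the reading down does not.

```python
import random

random.seed(0)

def simulate(n=100_000, force_barometer=None):
    """Toy world: low pressure causes both a low barometer reading and rain.
    If force_barometer is set, we intervene and override the reading (the do-operator)."""
    rains_given_low, lows, rains = 0, 0, 0
    for _ in range(n):
        low_pressure = random.random() < 0.3            # exogenous weather
        rain = low_pressure and random.random() < 0.8   # pressure causes rain
        barometer_low = low_pressure                     # pressure causes the reading
        if force_barometer is not None:
            barometer_low = force_barometer              # do(barometer_low := value)
        if barometer_low:
            lows += 1
            rains_given_low += rain
        rains += rain
    return rains_given_low / lows, rains / n

p_seeing, _ = simulate()                          # seeing: P(rain | barometer low)
p_doing, p_rain = simulate(force_barometer=True)  # doing: P(rain | do(barometer low))
print(f"P(rain | barometer low)     ≈ {p_seeing:.2f}")
print(f"P(rain | do(barometer low)) ≈ {p_doing:.2f}  (base rate of rain ≈ {p_rain:.2f})")
```

Under the intervention, the probability of rain collapses to its base rate, which is exactly the gap between P(Y|X) and P(Y|do(X)).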
Causal diagrams make assumptions visible
Perhaps Pearl’s most practical contribution is the causal diagram, also called a Directed Acyclic Graph (DAG). These simple pictures use circles (or boxes) for variables and arrows for direct causal relationships.
The power lies in three fundamental structures:
- A chain (A→B→C) shows mediation—fire causes smoke, which triggers alarms
- A fork (A←B→C) shows confounding—age affects both shoe size and reading ability in children
- A collider (A→B←C) is the trickiest: two independent causes affecting a common effect
What makes DAGs revolutionary is that they force researchers to state their assumptions explicitly. Before Pearl, a researcher might “control for” variables based on intuition. DAGs reveal that controlling for the wrong variables can introduce bias rather than remove it.
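A quick simulation shows the collider trap in action (the setup is an illustrative assumption, not an example from the book): A and C are independent by construction, yet restricting attention to cases where their common effect B occurred, i.e. “controlling for” the collider, manufactures a spurious correlation.

```python
import random

random.seed(1)

def correlation(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return cov / (sx * sy)

# Two independent causes A and C of a common effect B (the collider A -> B <- C).
a = [random.gauss(0, 1) for _ in range(50_000)]
c = [random.gauss(0, 1) for _ in range(50_000)]
b = [ai + ci > 1.0 for ai, ci in zip(a, c)]      # B occurs when the causes add up

print("corr(A, C), all cases:   ", round(correlation(a, c), 3))          # close to 0

# "Controlling for" the collider: restrict to cases where B occurred.
a_sel = [ai for ai, bi in zip(a, b) if bi]
c_sel = [ci for ci, bi in zip(c, b) if bi]
print("corr(A, C), given B = 1: ", round(correlation(a_sel, c_sel), 3))  # clearly negative
```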
The do-operator translates causal questions into math
Pearl’s technical breakthrough is the do-operator and associated do-calculus. Writing do(X=x) means “set X to value x through intervention,” as opposed to simply observing that X equals x.
The do-calculus consists of three mathematical rules that, together, form a complete system: if a causal effect can be identified from observational data at all, the rules can derive an expression for it; if it cannot, the calculus can prove that no such expression exists.
The practical tool most researchers use is the back-door criterion: a simple graphical test that identifies which variables to control for when estimating causal effects.
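As a sketch of how the criterion is used, suppose a single variable Z (say, age group) blocks every back-door path from treatment X to outcome Y. The adjustment formula is then P(y | do(x)) = Σ_z P(y | x, z) P(z); the counts below are hypothetical and exist only to make the calculation concrete.

```python
# Hypothetical observational counts over (Z, X, Y), where Z (say, age group)
# satisfies the back-door criterion for the effect of treatment X on outcome Y.
# The numbers are invented for illustration.
counts = {
    # (z, x, y): number of patients
    (0, 0, 0): 40,  (0, 0, 1): 60, (0, 1, 0): 20, (0, 1, 1): 180,
    (1, 0, 0): 140, (1, 0, 1): 60, (1, 1, 0): 30, (1, 1, 1): 20,
}
total = sum(counts.values())

def p_y_given_do_x(x, y=1):
    """Back-door adjustment: P(y | do(x)) = sum over z of P(y | x, z) * P(z)."""
    result = 0.0
    for z in (0, 1):
        n_z = sum(v for (zz, _, _), v in counts.items() if zz == z)
        n_zx = sum(v for (zz, xx, _), v in counts.items() if zz == z and xx == x)
        result += (counts[(z, x, y)] / n_zx) * (n_z / total)
    return result

print("P(Y=1 | do(X=1)) ≈", round(p_y_given_do_x(1), 3))
print("P(Y=1 | do(X=0)) ≈", round(p_y_given_do_x(0), 3))
```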
Key examples from the book
Simpson’s Paradox
In the famous UC Berkeley admissions case (1973), overall data suggested discrimination against women (about 35% of women admitted vs. 44% of men), yet department-by-department analysis showed no bias against women, and in several departments a slight bias in their favor. Pearl’s answer: data alone cannot tell you which analysis is correct; you need a causal model to know whether to aggregate the departments or keep them separate.
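The reversal is easy to reproduce with made-up numbers (these are not the actual Berkeley figures): each department admits women at an equal or higher rate, yet the aggregate rate looks lower because women apply disproportionately to the more selective department.

```python
# Invented two-department example of Simpson's paradox (not the real Berkeley data).
departments = {
    # dept: {group: (admitted, applicants)}
    "easy": {"men": (80, 100), "women": (90, 100)},
    "hard": {"men": (30, 100), "women": (120, 400)},
}

def rate(admitted, applicants):
    return admitted / applicants

for dept, groups in departments.items():
    print(f"{dept}: men {rate(*groups['men']):.0%}, women {rate(*groups['women']):.0%}")

men_total = [sum(x) for x in zip(*(g["men"] for g in departments.values()))]
women_total = [sum(x) for x in zip(*(g["women"] for g in departments.values()))]
print(f"overall: men {rate(*men_total):.0%}, women {rate(*women_total):.0%}")
```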
The Monty Hall Problem
Your door choice and the prize location independently determine which door Monty opens, a collider structure (your choice → Monty’s door ← prize location). Once you condition on Monty’s choice (observe which door he opened), the previously independent variables become correlated, and that induced dependence is why switching wins two-thirds of the time.
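A brute-force simulation confirms the two-thirds figure (the implementation below is an illustrative sketch, not code from the book):

```python
import random

random.seed(2)

def play(switch):
    """One round of Monty Hall; returns True if the contestant wins the prize."""
    prize = random.randrange(3)
    choice = random.randrange(3)
    # Monty opens a door that hides no prize and was not chosen.
    monty = random.choice([d for d in range(3) if d not in (prize, choice)])
    if switch:
        choice = next(d for d in range(3) if d not in (choice, monty))
    return choice == prize

n = 100_000
for switch in (False, True):
    wins = sum(play(switch) for _ in range(n))
    print(f"switch={switch}: win rate ≈ {wins / n:.3f}")
```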
Counterfactuals and Structural Causal Models
The highest rung of Pearl’s ladder addresses questions like “Would the patient have survived with a different treatment?” These questions require Structural Causal Models (SCMs).
An SCM specifies:
- Endogenous variables (the system being modeled)
- Exogenous variables (outside factors including randomness)
- Structural equations that determine each variable as a function of its direct causes
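A toy SCM, with a structural equation and numbers invented purely for illustration, shows how such a counterfactual is computed in Pearl’s three steps of abduction, action, and prediction:

```python
# A toy structural causal model; the equation, variable names, and numbers are
# invented for illustration. Counterfactuals are computed in Pearl's three steps:
# abduction (infer the exogenous terms from the facts), action (apply the do-operator),
# prediction (recompute the outcome with the same exogenous terms).

def survival(treatment, u_severity, u_resilience):
    """Structural equation for the outcome as a function of its direct causes."""
    return int(u_resilience + 0.4 * treatment - 0.5 * u_severity > 0)

# Observed fact: the patient received treatment = 0 and did not survive.
observed_treatment, observed_outcome = 0, 0

# Step 1 (abduction): pick exogenous values consistent with what was observed.
u_severity, u_resilience = 1.0, 0.2
assert survival(observed_treatment, u_severity, u_resilience) == observed_outcome

# Step 2 (action): set treatment to 1 by intervention, do(treatment = 1).
# Step 3 (prediction): recompute the outcome holding the exogenous terms fixed.
counterfactual = survival(1, u_severity, u_resilience)
print("Would the patient have survived under the other treatment?", bool(counterfactual))
```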
This framework has immediate legal applications. The “but-for” test in tort law is precisely a counterfactual question: would the harm have occurred but for the defendant’s action?
Implications for AI
Pearl argues that artificial intelligence has achieved “impressive abilities but no intelligence.” Deep learning systems can recognize faces and play games, but they cannot answer “why” questions or predict the effects of interventions.
The diagnosis is architectural: neural networks learn associations from data, and associations are symmetric (if X predicts Y, then Y predicts X). Causal relationships are fundamentally asymmetric: smoking causes cancer, but cancer does not cause smoking, even though each predicts the other.
Pearl’s prescription for achieving stronger AI includes:
- Building explicit causal models of the world
- Enabling systems to reason about interventions
- Developing capacity for counterfactual reasoning
- Creating adaptable systems that can generalize to new environments
Key takeaways
- Correlation truly is not causation, but we can move from correlation to causation through explicit causal modeling
- Causal diagrams force transparency about assumptions
- Controlling for the wrong variables can introduce bias
- Answering “why” requires going beyond the data to ask what would have happened under different circumstances
Pearl’s framework has not replaced traditional statistics—rather, it completes it by providing the missing tools for causal reasoning. As Pearl emphasizes, the “causal revolution” represents a wedding between the language of cause and effect and the language of data.