Beyond Data: Mechanisms and Structures in Evidence

When to Trust the Model, and When to Follow the Mechanism

The Clinician’s Dilemma… as Told on a Baseball Diamond

Let me start with something seemingly far afield from rehab: stealing second base.

If you’ve read Moneyball, you might recall the chapter where the data revolution in baseball went head-to-head with the traditional wisdom of the game. For decades, stealing bases was seen as a hallmark of aggressive, smart baseball, a way of manufacturing runs. A well-timed steal could change the momentum of a game. It was what “savvy” players did. But the statisticians who helped shape the Oakland A’s new analytic strategy weren’t buying it. They ran the numbers and came to a clear, unpopular conclusion: stealing, in most cases, is a losing bet. The odds didn’t justify the risk. And so the new rule in the dugout became clear: Don’t steal.1

But then there’s the moment that doesn’t fit the model. A runner’s on first. He’s fast. The pitcher is slow to the plate. The catcher’s throw is mediocre at best. The defense is out of position. The opportunity is there. And he runs.

He’s safe. The crowd erupts. The model said don’t. The moment said go.

Who was right?

This is the tension we sit in, every day, as clinicians. We are constantly negotiating between the base rates of population-level research and the particulars of the patient in front of us. The temptation is to settle the matter once and for all—trust the model or trust the moment. But as Moneyball reminds us, that’s the wrong question. The real question is: When does the model apply, and when does it miss something crucial? And what allows us to know the difference?

Think about a clinical decision you’ve made recently. One where the guideline or systematic review said one thing, but your hands, your patient’s story, your understanding of the body’s response systems said something else. Why did you choose the direction you did? What tipped the scales?

The Problem with Pure Data: When the Model Misses the Mechanism

Years ago, outcomes research told us that traction doesn’t work for low back pain. It was studied, compared, measured, and ultimately dismissed. No benefit over placebo. That was the consensus.

But something didn’t sit right. Patients who described compressive pain—symptoms clearly aggravated by load-bearing—sometimes responded to traction. Not universally. Not always dramatically. But consistently enough to notice.

What the data didn’t capture was the mechanism. The studies tested traction generically, not mechanistically. They lumped together subgroups and asked whether the intervention made a statistical dent in the pain score across the whole sample.

In doing so, they erased the very reasons a clinician might choose to use traction: not because it works for back pain in general, but because it might work for back pain caused by compressive sensitivity. If that distinction isn’t reflected in the study design, the research will fail to detect what’s clinically meaningful.

This is exactly the kind of scenario Paul Meehl was warning about when he famously advised: “Don’t deviate from a predictive model unless you know about causes the model didn’t consider.” That sentence holds immense clinical weight. It doesn’t mean we should disregard statistical models or clinical prediction rules. It means we should recognize that they are limited to the variables they include, and therefore blind to the causal structures they leave out.

Population Evidence vs. Individual Reasoning

There’s another example I keep coming back to—this one from cardiopulmonary rehab. A set of guidelines once reported that both aerobic and strength training improved outcomes in patients with heart failure. Both groups improved. Therefore, either approach could be recommended.

But what if the patient in front of you has significant muscular weakness and no endurance limitation? What if the benefit of aerobic training in the trial came from improving cardiovascular function in a population with low endurance—but that doesn’t apply to this individual?

The trial reports an effect size. But it doesn’t tell us why the effect occurred, or for whom. Without understanding the mechanism behind the effect, we’re left with a statistical abstraction. Useful? Yes. But not definitive.

This is the essential divide between statistical significance and clinical reasoning. Statistical significance talks to populations. Clinical reasoning must talk to patients. And patients don’t live in confidence intervals—they live in bodies, stories, systems, and contexts.

When has a research finding clashed with your clinical instincts about what a particular patient needed? Did the discrepancy come from the intervention itself—or from the model’s failure to recognize why that intervention might work in a given case?

The Mechanistic Middle: Where Our Judgment Lives

Let’s take a step deeper.

Roy Bhaskar’s framework of critical realism gives us three layers to work with: the empirical, the actual, and the real.

  • The empirical domain is what we can observe and measure. Symptoms. Outcomes. Lab values. Scores.

  • The actual domain is what happens—whether we see it or not. Muscle firing patterns. Inflammatory responses. Motor planning.

  • The real domain is what generates those happenings—the underlying structures and systems. Social determinants. Biological predispositions. Cognitive schemas.

RCTs operate in the empirical domain. Clinical practice lives in all three.

This middle layer—the actual domain—is where mechanisms reside. When I apply manual therapy, I’m not just hoping to change a pain score. I’m attempting to modulate a neuromuscular response, alter a tissue state, or influence central processing. The outcomes may be measured empirically, but the reason I act is grounded in the mechanisms I understand (or at least hypothesize).

And this is why mechanisms matter. They give us a reason to expect a different outcome in a particular case. They are not just academic abstractions; they are the foundations of clinical justification.

Think of a treatment you use often. What mechanism do you believe drives its effectiveness? Where did that belief come from—clinical experience, training, research, or all three?

Mechanisms as Evidence: Beyond the Black Box

We often talk about “the evidence” as if it comes in one form: a statistically significant result, a guideline recommendation, maybe a meta-analysis summary. But another kind of evidence works quietly in the background of everyday clinical judgment: mechanistic reasoning.

In traditional hierarchies of evidence, mechanistic reasoning has been treated with deep suspicion. That caution isn’t unfounded. There are countless examples of interventions that “made sense” mechanistically but failed when tested empirically. Think of hormone replacement therapy and its early promise in reducing cardiac mortality, a prediction made on the back of a plausible lipid-lowering mechanism—but which ultimately didn’t hold up in trials. The mistake wasn’t just clinical—it was epistemological: trusting a mechanism as if it were outcome evidence.

But dismissing all mechanistic reasoning because some of it is poor-quality is like dismissing all clinical trials because some are underpowered or poorly blinded. What matters is the quality of the reasoning, not the category.

Philosopher Holly Andersen (2012) argues that mechanistic reasoning plays at least two roles in clinical epistemology. First, it has a predictive role—trying to anticipate what an intervention will do. This is where mechanistic reasoning is most fragile, especially in complex systems. But more importantly, she identifies an applicative role: mechanisms help clinicians judge when population-level findings apply to individual patients. This is where mechanisms earn their keep—not in prediction, but in translation.

That’s the real work of reasoning in rehab.

Jeremy Howick and colleagues (2010) offer a more formal structure for evaluating the strength of mechanistic reasoning. They distinguish between “empty” mechanisms—those based on speculation or folk physiology—and high-quality mechanistic reasoning, which involves a clear, coherent chain linking intervention to outcome. According to Howick et al., high-quality mechanistic reasoning is:

  • Rooted in empirical knowledge of biological or behavioral pathways,

  • Conscious of the complexity and probabilistic nature of mechanisms,

  • And structured through what they call a four-phase model:

  1. Set-up (diagnosis, context, environment),

  2. Delivery (can the intervention be implemented properly?),

  3. Action (how does it produce its effects?), and

  4. Outcome (what patient-relevant changes follow?).

In our context, this framework maps seamlessly onto PT reasoning. Consider a case of progressive loading for tendon pain. Did we set the stage properly (load tolerance, baseline irritability)? Can the patient execute the intervention as prescribed? Do we know what tissue, neural, or cognitive adaptations we expect? Are those changes detectable in pain, function, or confidence?

Each phase involves mechanistic judgment—not just about what to do, but why we think it will work.
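
For readers who like to see the scaffolding written down, here is a minimal sketch of what that four-phase chain might look like as an explicit checklist for the tendon-loading case above. The structure follows Howick’s phases, but the field names and example entries are illustrative assumptions of mine, not content from any guideline or from the paper itself.

```python
from dataclasses import dataclass, field

@dataclass
class MechanisticChain:
    """A Howick-style four-phase argument made explicit.
    Field names and example content are illustrative, not canonical."""
    set_up: list[str] = field(default_factory=list)    # diagnosis, context, environment
    delivery: list[str] = field(default_factory=list)  # can the intervention be implemented properly?
    action: list[str] = field(default_factory=list)    # how is it thought to produce its effects?
    outcome: list[str] = field(default_factory=list)   # what patient-relevant changes should follow?

    def gaps(self) -> list[str]:
        """Name any phase left empty -- a prompt to reason further, not a verdict."""
        return [phase for phase, claims in vars(self).items() if not claims]

# Hypothetical entry for the progressive-loading example in the text
tendon_loading = MechanisticChain(
    set_up=["load-related tendon pain", "low baseline irritability", "no red flags"],
    delivery=["patient can perform the graded loading program as prescribed",
              "dosage and progression agreed and understood"],
    action=["progressive tensile load to drive tendon adaptation",
            "expected reduction in pain sensitivity and fear of loading"],
    outcome=["pain with loaded tasks", "function score", "confidence with daily activity"],
)

print(tendon_loading.gaps())  # [] -> every phase carries at least one explicit, challengeable claim
```

Nothing about the code matters in itself; the point is that each phase becomes a claim you can inspect, challenge, and revise.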

And then there’s the final piece. In a 2020 paper, Daniel Auker-Howlett and Michael Wilde introduce a concept called reinforced reasoning. Their argument is elegant: neither mechanistic reasoning nor statistical correlation, on its own, is enough to justify clinical action. But together? They create something stronger.

They compare this to reinforced concrete: the concrete resists compression, the steel resists tension. Alone, each has weaknesses. Together, they form a structure far more resilient than either component on its own. In the same way, they argue, combining evidence of correlation (e.g., from trials) with evidence of mechanisms (e.g., from basic science or clinical physiology) yields a more reliable inference than either could provide alone.

This approach—what philosophers call evidential pluralism—fits how many of us already reason. It’s not either/or. It’s both, in context. We use research evidence to understand what tends to work. We use mechanistic reasoning to understand when and why it might work—or not—for the person in front of us.

Mechanistic reasoning, then, is not just a philosophical exercise—it’s a clinical one. It’s how we interpret outcomes in light of causes, how we distinguish correlation from consequence. But here’s the challenge: the place where most clinicians look for practical guidance—the clinical practice guideline—is often where mechanisms are the least visible. They’re there, but they’re rarely explicit. To use guidelines well, we have to read not just what they say, but what they assume. What causal logic underwrites the recommendation? What view of the body, the condition, or the therapeutic process is embedded in the summary? If we fail to ask those questions, we risk treating guidelines as gospel—rather than the structured, partial, and context-dependent tools they really are.

Mechanisms in the Margins of Guidelines

Clinical practice guidelines (CPGs) are often treated as the most distilled form of evidence—consensus documents backed by systematic reviews, meta-analyses, and expert panels. And while they do provide rigorously synthesized data, they are not mechanistically neutral. Mechanisms are embedded within them, even if they aren’t always named.

Take the 2025 CPG for rotator cuff tendinopathy (Desmeules et al.). On the surface, it’s a structured list of recommendations—when to use manual therapy, what modalities to avoid, the role of progressive loading. But underneath those statements is a functional model of shoulder pathology. The guideline distinguishes between tendinopathy types (e.g., degenerative vs. reactive), links interventions to biological processes (like tissue adaptation, subacromial pressure reduction), and situates care within timelines that reflect healing trajectories. Those are mechanistic claims—statements about how change is thought to happen. They aren’t foregrounded, but they’re doing the epistemic lifting.

This is where the rotator cuff CPG quietly excels. It assumes that tissue load capacity matters. It assumes that pathology has a process, not just a name. And it provides space for clinicians to reason through when and why a given intervention fits.

Now contrast that with the 2023 CPG on plantar fasciitis (Koc et al.). It’s comprehensive, methodologically sound, and rigorously structured. But its recommendations are largely outcome-driven. Stretching works. Manual therapy helps. Orthoses help when combined with other strategies. But the causal logic behind those findings—the distinction between inflammatory and degenerative processes, the interaction between plantar fascia stress and neuromotor control—is mostly implicit. The clinician is left to supply the causal interpretation.

Neither approach is wrong. They represent different modes of clinical reasoning support. But the contrast makes something important visible: even evidence-based guidelines rely on mechanistic assumptions. They simply vary in how explicitly they frame them.

And this matters, because mechanisms are what allow us to take a recommendation and ask: Will it work for this patient? In this context? Given this presentation?

Guidelines can’t answer those questions for us. But they can—if we read them closely—provide the scaffolding for us to reason through them. What appears as a recommendation is often a placeholder for a more complex mechanistic argument that the guideline itself doesn’t fully articulate.

So as clinicians, we need to read for what’s said—and for what’s assumed. Evidence doesn’t only come from outcomes. It also comes from understanding how those outcomes come to be.

Open Systems and Ethical Reasoning

One of the clearest misalignments between research and practice comes from a misunderstanding of systems.

RCTs are built to operate in closed systems. They attempt to isolate variables, minimize noise, control for confounds. And that’s not a flaw—it’s a necessity for answering certain kinds of questions.

But clinical care doesn’t happen in a lab. It happens in open systems—dynamic, adaptive, and deeply human. Patients change their minds. They have beliefs, fears, social obligations, prior injuries. The therapeutic relationship itself becomes a mechanism.

To try to close this system—to force it into the shape of a controlled trial—is to misunderstand what clinical reasoning is for. We don’t treat variables. We treat people.

This is why evidence-based guidelines should inform our reasoning, not replace it. They are one lens among many. They are useful precisely to the extent that we understand their scope—and their limitations.

When was the last time you adjusted a plan—not because a journal article told you to, or a protocol flagged a change, but because something you noticed in the patient’s response, their context, or their story called for it?

When to Trust the Model — and When to Trust the Mechanism

So when do we follow the model? And when do we step aside?

We come back, once more, to Meehl’s insight: Don’t deviate from a predictive model unless you understand a cause the model doesn’t consider.

That’s the fulcrum of this lesson.

Ask yourself:

  • Is the patient in front of me similar to those the model was built on?

  • Is the causal pathway for this patient consistent with the one assumed in the model?

  • Do I have a reason—grounded in theory, mechanism, or experience—to believe this case deviates?

If you can answer yes to that last question, then you have a reasoned basis to depart. Not a hunch. Not an intuition. A reason.
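
If it helps to see that checklist as explicit logic, here is a toy sketch of the decision rule Meehl’s advice implies. The function and variable names are invented for illustration; the code cannot supply the causal reason, which remains the clinician’s work.

```python
def apply_or_depart(similar_to_study_population: bool,
                    causal_pathway_matches_model: bool,
                    cause_the_model_ignores: str | None) -> str:
    """Toy rendering of Meehl's advice: deviate only when you can name
    a cause the model didn't consider. Names are invented for illustration."""
    if cause_the_model_ignores:
        return f"Reasoned departure: {cause_the_model_ignores}"
    if similar_to_study_population and causal_pathway_matches_model:
        return "Follow the model: no known cause it fails to consider"
    return "Hold: the model may not fit, but the reason hasn't been named yet"

# Example echoing the traction discussion earlier in the essay
print(apply_or_depart(
    similar_to_study_population=True,
    causal_pathway_matches_model=False,
    cause_the_model_ignores="symptoms driven by compressive sensitivity",
))
```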

That is clinical expertise: knowing when the model applies, and when the phenomenon demands something more.

Reclaiming the Full Scope of Evidence

When I started physical therapy school in 1994, I didn’t just think I was entering a healthcare profession—I realized I was stepping into an intellectual territory that stretched from genetics to sociology. Every class, every lab, every clinical rotation showed me how wide the field really was. Tissue biology, motor control, behavior change, structural inequality, adaptation theory—they all had something to say about why a patient recovered or didn’t, why a treatment succeeded or failed, why pain persisted or finally receded.

Practicing physical therapy gives you license to roam across disciplines. Every patient encounter is a small research question. Every plan of care is a structured hypothesis. And if you know where to look, the underlying mechanisms are always there—at the cellular level, the systems level, and the social level.

When I was a student, the research that filled our journals was mechanistically curious. Experimental studies, applied science, biopsychosocial models. They weren’t perfect. But they were trying to explain things—to offer reasons, not just results. Somewhere along the line, the pendulum swung. We became obsessed with outcomes. We needed numbers, significance levels, meta-analytic confidence. And those things matter—deeply. But they are not the whole story.

If we want to reclaim a deeper kind of evidence-based practice, we need to stop treating “evidence” as synonymous with outcomes. We need to ask again: What is the nature of this condition? What changes when we intervene? What are the structures and mechanisms at play? We need to let our reasoning be as plural and layered as the patients we treat.

Because real clinical reasoning is not just about applying data. It’s about engaging a causal reality that moves through and around us—from intracellular cascades to cultural norms, from movement patterns to social narratives. We don’t just treat impairments—we navigate causes.

Maybe the goal isn’t to choose between data and mechanisms, between guidelines and experience. Maybe the real task is to bring them together—to use all of the tools, all of the disciplines, all of the evidence, to make sense of what we see, and help change what we can.

This isn’t just evidence-based practice. This is knowledge-based practice. And it’s what our profession has always been capable of—when we remember that the map isn’t the territory, and the guideline isn’t the reason.

Wrapping Up

If mechanisms are the middle layer of clinical reasoning, then reclaiming them is essential to making sense of everything else.

Mechanisms aren’t a side conversation. They’re not optional add-ons to our evidence. They are the structure that makes our reasoning coherent. They are what separates “what worked” from “why it worked,” and “why it might not.”

Statistical models are tools. But they don’t think. They don’t reason. They don’t explain.

You do.

And explanation—causal, contextual, patient-centered—is the heart of what we do.

What’s Next?

Sounds simple, doesn’t it? Of course it does. But let’s not forget the problem that sits on both sides of the process, the statistical and the mechanistic alike. In the next lesson, we turn the mirror inward. Bias. The ever-present influence hiding behind every judgment, every interpretation, every decision. Whether we follow the model or the mechanism, bias is with us.



1

Just a note that after Moneyball, stolen-base attempts dropped throughout MLB; however, MLB’s 2023 rule changes (e.g., bigger bases, limited pickoff attempts, the pitch clock) were explicitly designed to encourage more running, and attempts jumped. The success rate before Moneyball was about 68% on roughly 3,500 attempts per season; after Moneyball it rose to about 74% (which we might attribute to selectivity, since attempts fell to about 2,200 per season); and after the rule changes it jumped to about 80% on roughly 3,500 attempts. That’s quite the rule change!
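
For the numerically inclined, here is a rough sketch of the break-even arithmetic behind “don’t steal.” The run-expectancy values are approximate league-average figures chosen for illustration, not numbers from the book; swap in any era’s table and the logic is the same.

```python
# Rough illustration of the stolen-base break-even calculation.
# Run-expectancy values are approximate league averages, used only to show the logic.
re_runner_on_first_no_outs  = 0.86  # expected runs this inning, runner on first, nobody out
re_runner_on_second_no_outs = 1.10  # expected runs, runner on second, nobody out
re_bases_empty_one_out      = 0.27  # expected runs after a caught stealing

gain_if_safe   = re_runner_on_second_no_outs - re_runner_on_first_no_outs  # ~0.24 runs
loss_if_caught = re_runner_on_first_no_outs - re_bases_empty_one_out       # ~0.59 runs

break_even = loss_if_caught / (gain_if_safe + loss_if_caught)
print(f"Break-even success rate: {break_even:.0%}")  # roughly 70 percent

# A ~68% league success rate sits below that line (a losing bet on average);
# an ~80% rate after the 2023 rules sits comfortably above it.
```

Of course, the break-even rate is an average. The runner in the story knew about causes (the slow delivery, the mediocre throw) that the table didn’t consider, which is exactly Meehl’s point.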