Bayes’ Theorem in Clinical Decision-Making

Summary of our Bayesian Travels

Lesson 3, Bayesian Reasoning for Clinical Decision-Making, introduces Bayesian reasoning as a natural and clinically aligned alternative to traditional frequentist statistics in physical therapy. Clinicians inherently use Bayesian principles—updating beliefs with new evidence—to manage uncertainty in diagnosis (patient classification). The lesson contrasts frequentist and Bayesian approaches, emphasizing that Bayesian reasoning reflects how clinical decisions are made in real time. It addresses the challenge of paradigm shifts in PT education, where frequentist methods dominate due to standardized testing and institutional inertia, despite Bayesian methods offering greater transparency, adaptability, and clinical relevance.

Lesson 4, Bayesian Applications in Research and Evidence Synthesis, explores how Bayesian methods enhance clinical research and evidence synthesis by offering a more flexible, context-sensitive alternative to traditional frequentist approaches. It outlines the advantages of Bayesian clinical trials—such as adaptive design, smaller sample sizes, and real-time decision-making—as well as Bayesian meta-analyses, which incorporate prior knowledge and better handle heterogeneity. The lesson also connects Bayesian reasoning to causal inference and critical realist reviews, showing how it supports deeper understanding of mechanisms, context, and individualized outcomes. Ultimately, it positions Bayesian methods as helpful for refining causal models and making research findings more meaningful for clinical practice.

Introduction to this Lesson

Here, Lesson 5, Bayes’ Theorem in Clinical Decision-Making, shifts the focus from research and population-level inference to the real-time decisions clinicians make with individual patients. Unlike the inductive flow of information from cases to populations seen in research, this lesson emphasizes the abductive reasoning clinicians rely on when applying population-based evidence to a case. Bayes’ Theorem offers a structure for this process, helping clinicians update beliefs about diagnoses (classification of any type*) by integrating prior knowledge with evolving clinical information - always with the goal of minimizing (or at least understanding) uncertainty in an attempt to make actionable decisions.

*Note: Diagnosis simply means classification. Classification can include classifying a patient into a prognostic group. A prognostic group is a population with shared characteristics that make a particular intervention effective. Clinical prediction rules attempt to find people who are part of a prognostic group (a population). I dare not say subpopulation, because if we added “sub” for every characteristic by which we create a population from another population, we could fill the page with the prefix “sub” - such as the “subsubsubsubsubpopulation”. But that’s what clinical reasoning requires us to do: find the population that is small enough to be useful without being so small that it includes only the patient we’re seeing.

In fact, any diagnostic group is a classification that we label in a way that is useful for some reason - but it’s a population that everyone with that diagnosis is a part of. Many of our diagnostic classifications are less specific than prognostic groups, which are subpopulations of the diagnostic population. It is only a subpopulation of patients with COPD that have a positive Hoover’s sign, and that subpopulation has a different prognosis in response to inspiratory muscle training than those with a negative Hoover’s sign. Such prognostic groups are often generated using clinical prediction rules. Prognostic groups tend to be prognostic groups because they have a shared characteristic that is mechanistically valuable. What do I mean by this? Low back pain may be a diagnostic label, but it’s mechanistically empty and therefore not very useful for intervention or prognosis. The process of determining whether a particular prognostic group is more likely to have positive (or negative) outcomes with an intervention (or risk factor) is an inductive process and would be covered under Bayesian Applications in Research and Evidence Synthesis (https://peripateticpt.substack.com/p/bayesian-applications-in-research).


The Flow of Practice

Clinical practice unfolds as a sequence of decisions, each shaped by evolving beliefs and accumulating information. Whether diagnosing, treating, or estimating prognosis, clinicians move through a continuous process of refinement—beliefs shift under the weight of new information, insights evolve, and uncertainty is reduced over time. This is the essence of Bayesian reasoning. Applying Bayes’ Theorem to clinical decision-making doesn’t require a fundamental shift in how clinicians work—it simply gives a name and form to the logic behind the reasoning that’s already occurring.

So why highlight it?

First, Bayes’ Theorem provides the underlying logic of belief revision. Making this logic explicit allows for more transparent and justifiable decisions, or at least gives us the language to study it - and without that language, we can’t study it at all.

Second, Bayes’ Theorem applies to both statistical inference (induction) from samples to populations and clinical inference (abduction and deduction) from population-based knowledge to individual patients. In this way, it bridges research and practice through a coherent, unified framework.

Third—and perhaps most practically—Bayesian reasoning clarifies and justifies commonly used clinical tools such as diagnostic accuracy metrics, which are easy to misunderstand or apply incorrectly.

In short, Bayesian reasoning offers a structured way to update probability estimates as new evidence becomes available, supporting more individualized and context-sensitive decisions.

It helps answer fundamental clinical questions about classification (diagnosis): Given a patient’s symptoms and test results, how likely is a particular condition?


The Application of Bayesian Reasoning

I hope this lesson doesn’t come across as a polemic urging clinicians to perform Bayesian calculations in real time during patient care - what Schön would call reflection-in-practice! What I’m attempting to do is present three practical ways to incorporate Bayesian reasoning into clinical decision-making:

  1. Intuitive Belief Updating: As described above, clinicians already revise their beliefs as new information arises. Simply recognizing that this process mirrors Bayesian reasoning can improve awareness of how uncertainty is managed in practice. If this recognition catches on, it could provide a unified language for discussing how research evidence can be organized to more easily translate into clinical reasoning and decision making.

  2. Strategic Decision Planning: Bayesian reasoning provides a framework for developing strategies about which diagnostic tests or clinical measures to use, and under what circumstances they are most effective in reducing uncertainty or shifting diagnostic confidence. It also encourages an explicit attempt to bridge research findings (inductive inference) with individual clinical presentations (abductive and deductive reasoning)—a connection often glossed over in published research with vague handwaving about “practice implications.” Through a Bayesian lens, these implications could be addressed directly within the data analysis and results, rather than being tacked on as an afterthought in the discussion section.

  3. Metacognitive Reflection: Finally, Bayes’ Theorem supports reflection-on-practice—a retrospective evaluation of how information influenced decision-making. This approach helps clinicians refine their reasoning habits, better understand errors, and improve future care.

In all three forms, Bayesian reasoning supports a more deliberate, transparent, and rational clinical inquiry process—whether happening in the moment or in thoughtful reflection afterward.


Revisiting Bayes’ Theorem

Bayes’ Theorem is expressed as:

P(C|E) = [P(E|C) × P(C)] / P(E)

In clinical diagnosis, the cause (C) is the disease or condition, and the effect (E) is the patient’s observed presentation. In prognosis, the cause (C) is the treatment, and the effect (E) is a particular outcome.

Each element of Bayes’ theorem helps us consider what we do, and why we do it, during the clinical encounter.

P(C) = Prior probability: belief in the cause before seeing the effect; in clinical practice, often the belief that the cause exists in the situation we’re facing. Don’t be fooled by the seemingly straightforward notation P(C). I’m of the belief that all probabilities are conditional - even if there’s nothing on which we’re conditioning, that is simply P(C|nothing). But how often do we really have P(C|nothing)? I’d argue never. If I ask you what you think P(C) is (here let’s say C is any particular disease), you will immediately want to know something to help you form your estimate. You’ll ask: the probability of that disease in the US? In Canada? In men? In women? In adults? In older adults? Whatever population I identify for you (an older adult male in the US) is now the conditional for the probability P(C|older adult male in the US).

That’s one of the problems with some examples of using Bayes’ theorem in a clinical diagnostic process. The question “what’s the prior?” comes up a lot. Is the prior the baseline prevalence of a disease? Not really - after all, this person is in front of you, and there’s a reason you suspect they may have this disease. It’s actually P(C|the reason you suspect C); in English, the probability of the disease given the reason you suspect they have it. At the very least, in a clinical diagnostic process, that should be at least a bit higher than the baseline prevalence in the entire population. But if you’re in a screening situation, then P(C) may be the baseline prevalence of whatever subpopulation this person is in that warrants the screening test in the first place. If they are not in a subpopulation that warrants the screening test, then why are they being screened? If you think about it, you should realize that Bayesian reasoning lies behind the debates about who should be screened and why.

Mammograms and colonoscopies are worth consideration when the baseline prevalence of breast or colon cancer is higher; but if the baseline prevalence is low, that low number greatly reduces the posterior probability (reflect on the equation to follow that reasoning, and let me know if you need me to unpack it some more).

P(E|C) = Likelihood: the probability of the effect if the cause is true. This is what we hope to learn from clinical trials and meta-analyses of clinical trials for interventions, and from longitudinal cohort (observational) epidemiological studies for risk factors and exposures. We hope first to understand whether (or how likely it is that) something is a cause. Then we hope to understand how likely the effect is to occur given the cause. Why don’t studies present the conditional probability this way - as a conditional about how effective an intervention is at causing its intended effect? That’s something I explore in my work-in-progress paper: The Effective Conditional: Enhancing Clinical Decision-Making Through Prediction Intervals. I really do hope to get that paper submitted soon; I first wrote about it in 2017, so it’s not like I haven’t taken the time to mull over its implications.

P(E) = Marginal probability: how likely the effect is across all possible causes of the effect (not just the particular cause under consideration). In diagnostic (classification) reasoning the effect is a symptom, sign, or test result, and this is the probability of it occurring at all. The probability that a shoulder clicks when there isn’t a labral tear; the probability that a hip has reduced rotation when it doesn’t have OA; the probability that someone has dyspnea on exertion or lower extremity edema when they are not in heart failure. The marginal probability is critically important for determining how specific a symptom, sign, or test result is for any particular cause (a higher marginal probability means more false positives). It’s why “loss of taste and smell” was a more specific sign of COVID than “cough”: there are far more causes of cough than of loss of taste and smell. But loss of taste and smell was not that common even with COVID, making that sign less sensitive (more false negatives).

P(C|E) = Posterior probability: the updated belief in a cause (C) given observed effect (E). This is the reason we keep asking questions and performing examinations during our clinical encounters (note - not just during the initial examination - but during the entire clinical encounter). We’re always in search of a new posterior probability. In layman’s terms, the posterior probability is your new state of knowledge, your belief - either more secure or less secure, perhaps different than it was before. In the ongoing sequence of clinical reasoning, the posterior probability replaces the prior probability for the next step in reasoning. The posterior of thought sequence 1 is the prior of thought sequence 2 which generates a new posterior, and so it goes.
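That update cycle - posterior becomes the next prior - can be sketched in a few lines of Python. The probabilities here are hypothetical placeholders, chosen only to show the mechanics:

```python
# A minimal sketch of sequential Bayesian belief updating.
def bayes_update(prior, p_e_given_c, p_e_given_not_c):
    """Return P(C|E) from P(C), P(E|C), and P(E|~C)."""
    # Marginal probability via the law of total probability:
    # P(E) = P(E|C)P(C) + P(E|~C)P(~C)
    p_e = p_e_given_c * prior + p_e_given_not_c * (1 - prior)
    return p_e_given_c * prior / p_e

# The posterior after one finding becomes the prior for the next.
belief = 0.20                              # initial prior (hypothetical)
belief = bayes_update(belief, 0.90, 0.30)  # finding 1 observed
belief = bayes_update(belief, 0.80, 0.10)  # finding 2 observed
print(round(belief, 3))                    # prints 0.857
```

Each call plays the role of one “thought sequence”: the output of sequence 1 feeds straight into sequence 2.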

Quantitative Example: Diagnosing Smallpox vs. Chickenpox

In James Stone’s introductory example on Bayes’ Rule, he illustrates how probability updates in clinical diagnosis by examining the case of a rare disease and a diagnostic test. The disease affects 1 in 1,000 people (a prior probability of 0.1%). The diagnostic test used has a sensitivity of 99% (true positive rate) and a false positive rate of 5%, which corresponds to a specificity of 95% (since specificity = 1 − false positive rate).

When a patient tests positive, we might expect the likelihood of disease to be high due to the test’s strong sensitivity and reasonable specificity. However, applying Bayes’ Rule reveals that the posterior probability—the probability that the person actually has the disease given a positive result—is only about 2%.

This result is surprising but critical: even a good test can produce mostly false positives when the condition is rare. Stone’s example emphasizes the importance of considering both the prevalence of the condition (prior probability) and the test characteristics (sensitivity and specificity). Bayesian reasoning provides the structure to combine these factors and calculate how much diagnostic certainty has actually improved.
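Stone’s numbers can be checked directly with the theorem. A minimal sketch in Python:

```python
# Stone's rare-disease numbers: prevalence 1/1000, sensitivity 0.99,
# false-positive rate 0.05 (specificity 0.95).
prior = 0.001
sens = 0.99   # P(positive | disease)
fpr = 0.05    # P(positive | no disease)

p_positive = sens * prior + fpr * (1 - prior)  # marginal P(E)
posterior = sens * prior / p_positive          # P(disease | positive)
print(f"{posterior:.1%}")                      # about 2%
```

The false positives (5% of the 999-in-1000 who are disease-free) swamp the true positives, which is exactly why the posterior stays near 2% despite the strong test.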

Reflective Example: Diagnosing Deep Vein Thrombosis (DVT)

A patient presents with unilateral leg swelling. You consider the possibility of DVT.

P(C): Prior probability of DVT in this patient’s population - triggers in-practice reflection that may lead to questions about risk factors, recent travel, medications, medical history, and family history. Essentially: what population does this patient belong to? When I cover this in class, I have the students think of cases that have a “high” or “low” base rate of X (X being whatever condition they are considering - keeping in mind that to be considering X in a case, you must have some information just to get started). Then we do the math based on “high” being 75-80% and “low” being 20-25%. These numbers are subjectively obtained; to build a reasoning system, objectively determined numbers could be obtained - just not usually during reflection-in-practice. The irony here is that while this is a base rate (prior probability), it is also truly a conditional probability. What are the conditions that specify a more precise population for the patient?

P(E|C): Likelihood - the probability of swelling if DVT is present, P(Swelling|DVT). Note that empirically this is the rate of true positives, the sensitivity of a symptom or sign. As a side note: the +LR is the ratio of the rate of true positives to the rate of false positives (sensitivity / (1 − specificity)). Therefore, a high sensitivity and high specificity are required for a clinically meaningful +LR. And, logically, if there is a high sensitivity and high specificity, there will also be a clinically meaningful -LR, since the -LR is the rate of false negatives divided by the rate of true negatives ((1 − sensitivity) / specificity). Also note that since P(E|C) is simply a probability related to causation (how strong is the causal linkage between C and E?), sensitivity, specificity, and LRs are implicitly connected to and strongly resonant of causal and Bayesian reasoning.

P(E): Marginal probability of swelling in general (DVT or not) - triggers in-practice reflection about whether the patient has other conditions that could cause swelling. If you can rule those out, the marginal probability of other causes drops, which increases the posterior probability that the swelling is due to a DVT.

P(C|E): Posterior probability - our updated belief about whether there is a DVT based on the presence of swelling. Of significant importance: in the sequence of clinical reasoning, this posterior probability becomes the prior probability for the next step in the sequence.
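The likelihood-ratio arithmetic noted above is easy to verify. A short sketch, using hypothetical sensitivity and specificity values:

```python
# Likelihood ratios from sensitivity and specificity (hypothetical values).
def likelihood_ratios(sensitivity, specificity):
    pos_lr = sensitivity / (1 - specificity)  # true-positive rate / false-positive rate
    neg_lr = (1 - sensitivity) / specificity  # false-negative rate / true-negative rate
    return pos_lr, neg_lr

pos_lr, neg_lr = likelihood_ratios(0.90, 0.95)
print(pos_lr, neg_lr)  # roughly 18 and 0.105 - both clinically meaningful
```

As the prose says: push sensitivity and specificity high together and both ratios become clinically meaningful at once (+LR well above 1, -LR well below 1).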

The above process can also be performed using Fagan’s nomogram, which relies on an estimate of the base rate and a likelihood ratio. However, as far as I can tell, Fagan’s nomogram does not explicitly account for the marginal probability - the base rate of swelling itself. It is rather intuitive that if you rule out other causes, then the cause left standing is more likely to be the cause. Not quite the same as the famous Sherlock Holmes quote - but close enough:

“When you have eliminated the impossible, whatever remains, however improbable, must be the truth.”

— Sir Arthur Conan Doyle, The Sign of the Four

If you didn’t know this - the author of Sherlock Holmes was trained as a physician and derived Holmes from his medical school professor, Dr. Joseph Bell, who was a skilled diagnostician. While Bell is credited for Holmes’ savant capabilities of deduction, it is only because Doyle was writing before C.S. Peirce that it’s called deduction in the books. Clearly, what Holmes is doing is abduction. But Peirce wasn’t writing about abduction until years after Doyle began writing about Holmes; and it took Peirce a while to settle on the term abduction, having tried hypothesis and retroduction first (both of which have gone on to have other meanings, though of the two, retroduction is often defined as synonymous with abduction).

Using Bayes’ Theorem, you can revise the probability that this patient has DVT based on test results (e.g., D-dimer) or clinical decision tools (e.g., Wells criteria), in other words, a progressive cumulative case approach to determining belief. Bayesian reasoning helps avoid over-reliance on single tests or binary thresholds and supports more nuanced decision-making.
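The odds form of Bayes’ theorem is the math behind Fagan’s nomogram, and it makes the cumulative-case approach explicit: each finding’s likelihood ratio multiplies the running odds. The pre-test probability and LRs below are hypothetical placeholders, not published DVT values:

```python
# Odds-form Bayesian updating: post-test odds = pre-test odds x LR.
def update_with_lr(pretest_prob, lr):
    pretest_odds = pretest_prob / (1 - pretest_prob)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)  # back to a probability

prob = 0.20                       # pre-test probability (hypothetical)
prob = update_with_lr(prob, 6.0)  # e.g., a positive finding with +LR = 6
prob = update_with_lr(prob, 1.6)  # a second, weaker positive finding
print(round(prob, 2))             # prints 0.71
```

Chaining calls like this is the “progressive cumulative case”: each test’s post-test probability becomes the next test’s pre-test probability.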

Quantitative & Reflective Example: Heart Failure (HF)

Bayesian reasoning also underpins causal modeling frameworks like Bayesian networks, which explicitly represent how multiple clinical signs relate to a possible diagnosis.

When developing the HF clinical practice guideline I created a Bayesian network with SAMIAM (http://reasoning.cs.ucla.edu/samiam/) using the diagnostic characteristics of:

• Pulmonary crackles

• Jugular venous distention (JVD)

• Dependent edema

• S3 heart sound

The network allowed us to estimate HF likelihood based on any combination of these findings, using the base (prior) probability of someone having a history of heart failure. For example:

• All four signs → ~99% probability of HF

• All four absent → ~5% probability

• S3 alone → ~44% probability

• Edema alone → ~16%

This network didn’t just calculate—it supported abductive inference and offered probabilistic expectations when some findings were missing. These causal, data-driven models move beyond checklists and help physical therapists reason transparently and flexibly. And it’s resonant of the DVT process above: each sign is tested one at a time, so there remains a cumulative case approach to determining belief.
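For illustration only, here is a naive-Bayes sketch of how such a network combines findings, assuming the signs are conditionally independent given HF status. Every number below is a hypothetical placeholder, not a value from the SAMIAM network described above:

```python
# Naive-Bayes combination of four HF signs (all numbers hypothetical).
signs = {  # name: (P(sign present | HF), P(sign present | no HF))
    "crackles": (0.60, 0.20),
    "jvd":      (0.50, 0.05),
    "edema":    (0.70, 0.30),
    "s3":       (0.40, 0.03),
}

def p_hf(findings, prior=0.10):
    """findings maps sign name -> True/False/None (None = unobserved)."""
    like_hf, like_not = prior, 1 - prior
    for name, present in findings.items():
        if present is None:
            continue  # missing findings simply contribute no evidence
        p_given_hf, p_given_not = signs[name]
        like_hf *= p_given_hf if present else 1 - p_given_hf
        like_not *= p_given_not if present else 1 - p_given_not
    return like_hf / (like_hf + like_not)

print(round(p_hf({"crackles": True, "jvd": True, "edema": True, "s3": True}), 3))
```

Like the network, this handles any combination of present, absent, or unobserved findings - though a real Bayesian network (as built in SAMIAM) can also encode dependencies between the signs, which naive Bayes deliberately ignores.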


Using Bayesian Reasoning

In summary, you’re already using Bayesian reasoning. It is the logic that informs the diagnostic (classification) process. And you have to classify. Once you broaden the term diagnosis to mean classification, and recognize that it is to classifications that we prescribe particular interventions, then all clinical decision making hinges on classification. Yes, you’re prescribing stability exercise to Ms. Smith. But it’s because you’ve determined Ms. Smith fits into a classification that has some reasonable probability of responding to stability exercise.

Importantly, this isn’t just about math—it’s a mindset. Many clinicians already think probabilistically, even if not formally. By making that reasoning process explicit and structured, Bayesian thinking strengthens clinical inquiry and decision-making at the very least. And it provides a framework that opens the door to deeper connection to research evidence.


What’s Next?

In the next lesson of Stats4PT, we’ll dive into Lesson 6: Beyond Data: Mechanisms and Structures in Evidence – why mechanistic reasoning matters, integrating causal inference into clinical decision-making.

