6 Comments
Tim Duffy's avatar

Sean,

I’m looking forward to hearing about Bayesian reasoning and started writing a question to you on this topic. However, after writing it down I came up with a solution myself. I’m curious whether my understanding is in concert with yours, and I figure it might be useful for the rest of the fellows (or in our current cohort, fellas). Here it is:

I understand the importance of the base rate, but I’ve been bothered when applying Bayesian reasoning to conditions where the reported prevalence spans a large range. My observation is that this is a problem in conditions with poorer diagnostic agreement, often those labeled as syndromes. There tend to be a lot of these among musculoskeletal injuries. A good example is SIJ pain, where the estimated prevalence is troublingly variable: it is said to be between 15% and 30% of the population (Cohen, 2005).

My solution was to use a Fagan nomogram, assign a made-up special test a positive likelihood ratio (LR+) of 7, and plug it in using pre-test probabilities of 15% and 30%. I then cheated and plugged this into ChatGPT to get the exact numbers…

“let's say I have a special test with a positive likelihood ratio (+LR) of 7 that I use on two diseases. Disease A has a prevalence of 15% of the population and disease B has a prevalence of 30% of the population. Could you calculate the post-test probabilities for each of these scenarios?”

ChatGPT did the math for me (and showed the work), providing a final answer of:

Disease A (Prevalence 15%) → Post-test probability: 55.2%

Disease B (Prevalence 30%) → Post-test probability: 75%
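
For anyone who wants to check these figures by hand, here is a sketch of the odds form of Bayes' theorem that a Fagan nomogram encodes. The symbols $p$ (pre-test probability) and $\mathrm{LR}^{+}$ are notation introduced here for illustration, not taken from the ChatGPT output. The results agree with the quoted figures up to rounding.

```latex
\[
\text{odds}_{\text{pre}} = \frac{p}{1-p}, \qquad
\text{odds}_{\text{post}} = \mathrm{LR}^{+} \times \text{odds}_{\text{pre}}, \qquad
p_{\text{post}} = \frac{\text{odds}_{\text{post}}}{1 + \text{odds}_{\text{post}}}
\]

\[
p = 0.15:\quad \frac{0.15}{0.85} \approx 0.1765, \qquad
7 \times 0.1765 \approx 1.2353, \qquad
\frac{1.2353}{2.2353} \approx 0.553
\]

\[
p = 0.30:\quad \frac{0.30}{0.70} \approx 0.4286, \qquad
7 \times 0.4286 = 3.0, \qquad
\frac{3.0}{4.0} = 0.75
\]
```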

So how should I use this data? Take an average? Use 15? Use 30? Continue to question the existence of SIJ pain?

This led me to ask a more philosophical question: does the published base rate (empirical domain) reflect the actual and real domains?

Thanks!

Tim

Cohen SP. Sacroiliac joint pain: a comprehensive review of epidemiology, diagnosis, and treatment. Anesth Analg. 2005;101(5):1440-1453

Sean Collins's avatar

Hey Tim,

Great question! And actually, your process of working through the issue—running the numbers for both 15% and 30% and seeing how they affect post-test probability—is exactly the kind of thinking Bayesian reasoning promotes.

To get to the heart of your question: which base rate should you use? The key is that base rates aren’t fixed truths; they are conditional probabilities. The reported prevalence of SIJ pain (15-30%) isn’t a universal constant, but an estimate that depends on the population studied, or at least on the characteristics of the people in it. Within the entire population there are almost certainly factors that define subpopulations with different base rates (that's a conditional probability), so arriving at the base rate is itself a Bayesian reasoning process. We should refine our base rate before applying test likelihood ratios.

If we had more information about the patient population in front of us, we could further stratify the prior probability:

• Is this patient postpartum? If so, the SIJ pain prevalence might be on the higher end.

• Is this patient a 22-year-old male with no history of trauma? Then the prevalence of SIJ pain is likely lower than 15%.

• Does this patient have a history of inflammatory arthritis? Then it may be even higher than 30%.

What you’re running into is an example of how clinical reasoning interacts with Bayesian inference in multiple steps—the prevalence you use as a prior should already account for patient-specific contextual factors. The challenge isn’t just knowing what the base rate is for a given condition, but also deciding which population your patient most appropriately belongs to. A prevalence estimate is only useful if it represents a population similar to the individual you’re treating.

So what should you do?

• If no better information is available, taking an average (e.g., 22.5%) might be reasonable, but it’s a blunt approach.

• Ideally, you refine the prior by looking at studies that break down SIJ pain prevalence in more specific populations that match your patient.

• If no such data exists, your clinical judgment on the patient’s risk factors effectively serves as an informal Bayesian prior adjustment.
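
One way to see how much the choice of prior matters is to run the same test result across the whole plausible range. The sketch below is only an illustration: it assumes the made-up LR+ of 7 from Tim's example, and the function name `post_test_probability` and the labels are hypothetical.

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0.0 < pre_test_prob < 1.0:
        raise ValueError("pre_test_prob must be strictly between 0 and 1")
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Assumption: the made-up positive likelihood ratio from the example above.
LR_POSITIVE = 7.0

# Low end, midpoint, and high end of the reported 15-30% prevalence range.
priors = {
    "low (15%)": 0.15,
    "midpoint (22.5%)": 0.225,
    "high (30%)": 0.30,
}

results = {
    label: post_test_probability(prior, LR_POSITIVE)
    for label, prior in priors.items()
}

for label, posterior in results.items():
    print(f"prior {label:<17} -> post-test probability {posterior:.1%}")
```

Note that the midpoint prior gives a post-test probability of about 67%, which is not the same as averaging the two endpoint results (about 65%), because the conversion between probability and odds is nonlinear. Reporting the full range is often more honest than collapsing it to a single number.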

As for your last question—does the empirical domain reflect the actual and real domains? Not always! Prevalence estimates are empirical observations, but they may not fully capture the mechanisms (Actual domain) or underlying structures (Real domain) influencing SIJ pain. That’s why Bayesian reasoning is so valuable—it allows us to incorporate broader reasoning beyond what’s strictly in the dataset.

This is a great topic that will definitely come up in the Bayesian reasoning posts, especially when we talk about how priors should be structured in clinical reasoning. Looking forward to diving into this more!

Sean

Tim Duffy's avatar

Hi Sean,

I just used Models4PT using the exact same instructions provided by the video. Super easy to use!

Thanks!

Sean Collins's avatar

Great, thanks for giving it a try!

Tim Duffy's avatar

Thanks, that's what I needed: an explanation of conditional probability!

Sean Collins's avatar

Great, this allowed me to make sure I add a short explanation of conditional probability to the first “lesson” (page) of the Bayesian topics. In my opinion, all probabilities are conditional. When we say the “probability of X” as in P(X), we’re really just saying “probability of X given no other information” as in P(X|I don’t know anything). :)
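
As a rough sketch of that idea in symbols: write $I$ for the background information and $S_1, \dots, S_n$ for mutually exclusive subgroups that together cover the population (for example postpartum patients or patients with inflammatory arthritis). Both symbols are notation assumed here for illustration. A published prevalence for a disease $D$ is then a weighted mixture of subgroup base rates, which is the law of total probability.

```latex
\[
P(X) \;\equiv\; P(X \mid I)
\]

\[
P(D \mid I) \;=\; \sum_{i=1}^{n} P(D \mid S_i, I)\, P(S_i \mid I)
\]
```

Knowing which subgroup a patient belongs to replaces the mixture $P(D \mid I)$ with the more specific $P(D \mid S_i, I)$, which is the refined prior to use before applying a likelihood ratio.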
