Starting very soon I’ll be practicing physical therapy part time: two days a week, 4 hours a day, 8 hours a week. To help me update, and continue to learn, I’m reading a lot (books and journals) and taking some continuing education (“update”) courses. I’ll be practicing in an outpatient setting. The clientele is primarily seeking physical therapy for what most people would consider musculoskeletal conditions (orthopedics). I tend to take a different approach and try to avoid making assumptions that partition a whole being into one human-constructed system, but that might be a topic for another post.
What inspired me this morning was an accumulation of issues I’ve witnessed in the continuing education I’ve been listening to.
First, Causation.
People teaching the courses I’ve watched are certainly aware that correlation does not imply causation. However, they seem less inclined to consider that causation implies correlation, and they are too quick to claim the opposite of causation when reporting correlation findings. I’ve now heard the phrase “This is NOT causation” many times. I think they mean to say: “We cannot conclude that this correlation is causal, but here are the considerations for and against that conclusion.” Using “Hill’s Criteria for Causation” there are nine factors we can consider. Strength of the association (correlation or some other approach to measuring association) is simply ONE of these nine considerations. If there is a correlation, the fact that it does not “imply” causation does not mean there is “no causation.” Saying this is “not causation” sounds like the person is claiming something epistemologically equivalent to saying there “is causation.” The claim “NOT” is just as knowledge laden as the claim “IS”; they are both claims to know something, just like the claim that a test result is “positive” is equivalent (as a claim of knowledge) to saying the test result is “negative.” My suggestion: be more clear. Don’t take not knowing for sure whether a relationship is causal to mean that a relationship is “not causation.” That’s just as fraught as saying it is causal. If you’ve ever made a causal graph, the decision to NOT draw an arrow between two variables is just as important as the decision to draw one.
Second, Diagnostic Accuracy for Screening.
Related to “not causal” is the claim, from a negative finding, that someone does not have a particular condition. This is important to consider for screening. And with movement- or treatment-based approaches to physical therapy, there’s still a need to screen: if you assume there’s nothing specific that requires attention, you need to at least consider whether that assumption is true. And regardless of your treatment approach, you’re trying to make claims about possible causes, so you cannot escape the need to consider the possible false positives or false negatives associated with your claims. Meaning, even if you don’t use “special tests,” you cannot escape the impact of uncertainty and the need to try to understand it in your practice.
The important diagnostic accuracy metrics for a “screening” tool are Sensitivity and the negative likelihood ratio (-LR), which is influenced by sensitivity, but not exclusively - see the figure below. This is because for a screening tool, we don’t want many false negatives. If someone screens negative, we would like to feel confident that the result is in fact a true negative.
The figure above shows how the Sensitivity is related to the Negative Likelihood ratio at three different Specificity values. I’ll refer back to this below.
If you need some help here with the definitions: Sensitivity (Sn) is the rate of true positives among those that have the condition being tested for. It is calculated, from among those that have the condition, as the number of True Positives divided by the total number of people with the condition. The total number of people that have the condition is the number of True Positives plus the number of False Negatives. Therefore, the Sensitivity is largely driven by the number of False Negatives. If there are a lot of False Negatives, the sensitivity is lower (hence the saying “SnOut”). And that’s bad for screening. That entire paragraph is a waste of space because it can be summarized with an equation - but I’ve provided enough continuing education to know that most PTs don’t want to see the equations…. But I suppose here I’m speaking to readers of my Substack, and perhaps you’re more willing to see the equations….
Sensitivity: Sn = True Positives / (True Positives + False Negatives)
Specificity: Sp = True Negatives / (True Negatives + False Positives)
+LR = Sn / (1 - Sp) (Rate of True Positives / Rate of False Positives)
-LR = (1 - Sn) / Sp (Rate of False Negatives / Rate of True Negatives)
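And for those willing to go one step further than the equations, here’s a small Python sketch (my own illustration, not from any of the courses, with made-up counts) that computes all four of these quantities from a 2x2 table:

```python
def diagnostic_accuracy(tp, fn, tn, fp):
    """Compute Sn, Sp, +LR and -LR from 2x2 table counts.

    tp / fn: people WITH the condition who test positive / negative
    tn / fp: people WITHOUT the condition who test negative / positive
    """
    sn = tp / (tp + fn)        # Sensitivity
    sp = tn / (tn + fp)        # Specificity
    pos_lr = sn / (1 - sp)     # +LR: rate of true positives / rate of false positives
    neg_lr = (1 - sn) / sp     # -LR: rate of false negatives / rate of true negatives
    return sn, sp, pos_lr, neg_lr

# Hypothetical counts: 97 TP, 3 FN, 25 TN, 75 FP
print(diagnostic_accuracy(tp=97, fn=3, tn=25, fp=75))
# -> Sn = 0.97, Sp = 0.25, +LR ~ 1.29, -LR = 0.12
```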
The LRs work on odds: they adjust the odds of having the condition based on whether the test (or sign or symptom) is positive or negative. If the test is positive, the odds of the condition are elevated; the higher the +LR, the higher the odds. If the test is negative, the odds of the condition are reduced; the closer the -LR is to zero, the lower the odds.
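In the same plain-text style as the formulas above, the relationship the LRs rely on is:

Post-test Odds = Pre-test Odds × LR
Odds = Probability / (1 - Probability), and Probability = Odds / (1 + Odds)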
When screening you seek a test (or sign, or symptom) that has a high Sn and a low negative LR.
Now, in two continuing education courses I’ve heard people say that screening tests “send too many people for referrals” because the positive test is in error (a false positive). But in neither situation did they bother to relate that possibility to the Sp or +LR. The probability of a false positive (in someone without the condition) is the complement of the Specificity (1 - Sp), and the higher the +LR, the lower the probability that a positive result is a false positive (though you can never escape the need to consider the baseline or prior probability; more on this below). If we specifically identify and recommend tests (signs and symptoms) for screening based on the -LR (and Sn), then this is bound to occur. In both courses I’ve taken, the recommendation is to select tests for screening with a Sn of 90% (0.9) or greater, or a -LR of 0.1 or less. But in neither case did the person teaching make the connection to the Sp or draw out its implications. If you look back up at the figure of how Sn relates to -LR, you’ll see the impact. With a high enough Sn, the Sp can be as low as 25% and still meet the criterion of a -LR of 0.1 or less. This means a person without the condition still has a 75% chance of testing positive (a false positive).
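To make that trade-off concrete, here’s a short Python sketch (again my own, with arbitrary Specificity values) showing the Sensitivity needed to reach a -LR of 0.1 at several Specificities, and the false positive rate that comes along with each:

```python
# -LR = (1 - Sn) / Sp, so the Sensitivity needed for a target -LR is Sn = 1 - target * Sp.
target_neg_lr = 0.1
for sp in (0.25, 0.50, 0.75, 0.95):
    sn_needed = 1 - target_neg_lr * sp
    false_positive_rate = 1 - sp   # chance someone WITHOUT the condition tests positive
    print(f"Sp = {sp:.2f}: need Sn >= {sn_needed:.3f}; "
          f"false positive rate = {false_positive_rate:.0%}")

# Sp = 0.25 only needs Sn >= 0.975 to reach -LR = 0.1, yet 75% of people
# without the condition will still screen positive.
```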
If you want to have an idea of how likely it is that the positive test (sign or symptom) is a false positive, simply consider what the Specificity is (or the +LR). When considering the +LR of a screening test that may not give great results when the test result is positive, also consider the “prior probability” of the condition for which you are screening. The LR is intended to be used as an adjustment to your knowledge, starting with your belief that the person has the condition. That prior belief is a prior probability. If you believe there’s a 5% (0.05) probability that someone has the condition you’re screening for, the +LR is 3, and they test positive, then your belief that they have the condition, knowing that the test is positive, is still only 13.6%. It’s certainly higher than 5%, but not that much higher. Knowing this information is helpful in your process. You may think of other tests to try to confirm the condition you’re considering referring out for. Maybe that 13.6% posterior probability is enough for you to consider a test that helps to “rule in” that condition. If you start with 13.6% and the person then tests positive on a test with a +LR of 15, the posterior probability is now 70% - time to refer. Someone might ask: why didn’t I start with the test that had the higher +LR? I’d answer: because you only believed there was a 5% chance that they had that condition. So you were going to try to rule it out (screen for it) with a test that had a high Sn (and low -LR). But when that test was positive you had to rethink your belief. Your new belief that they had the condition was higher. So now you’re wondering whether that is simply because your test was good for ruling out, not for ruling in. So you decide to see if you can rule it in with a test built for that purpose.
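Here’s how those numbers fall out of the odds arithmetic, as a Python sketch (the 5% prior, the +LR of 3, and the +LR of 15 are the same illustrative values used above):

```python
def update_probability(prior_prob, lr):
    """Bayes via odds: convert the prior probability to odds, multiply by the
    likelihood ratio, and convert back to a posterior probability."""
    prior_odds = prior_prob / (1 - prior_prob)
    post_odds = prior_odds * lr
    return post_odds / (1 + post_odds)

# Positive screening test with +LR = 3, starting from a 5% prior belief
after_screen = update_probability(0.05, 3)            # ~0.136, i.e. 13.6%

# Follow-up "rule in" test with +LR = 15, starting from that 13.6%
after_confirm = update_probability(after_screen, 15)  # ~0.70, i.e. 70%

print(f"{after_screen:.1%} -> {after_confirm:.1%}")   # 13.6% -> 70.3%
```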
If you started with that same 5% prior probability, used a screening test with a -LR of 0.1, and the test was negative, the posterior probability would be about 0.5% (0.005). That’s what you’re looking for from your screening test.
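The same little function from the sketch above covers the negative screen:

```python
# Negative screening result: the prior odds get multiplied by the -LR of 0.1
after_negative = update_probability(0.05, 0.1)
print(f"{after_negative:.2%}")   # ~0.52%, roughly the 0.5% posterior noted above
```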
But if your screening test is positive, you have to consider what’s next - and as my colleagues pointed out in the courses I’ve watched, if the next step is to immediately refer out, you may be referring too many people out. What they missed, however, was an opportunity to discuss the partners of Sn and -LR in the spectrum of diagnostic accuracy statistics - Sp and +LR - and how they all fit together in your reasoning.
Should I continue…..?
If anyone would like more posts on either of these topics (causation or diagnostic accuracy), or even how they are ultimately linked together - please leave a comment or respond to the poll below. They are topics near and dear to my teaching and practice - but I can tend to go overboard with them. I’m happy to write more about them - if someone thinks they’d read it. ;)