A previous post (Not Causal & a negative test: strong epistemological claims) ended with a poll asking whether I should further explore causation or diagnostic accuracy or both and how they are related. 100% of respondents requested both and how they are related. Sorry it has taken so long to get to this - my relationship to writing is a tenuous one. Even this post, at this time, is a primer for getting past writer's block on several projects. It's a fairly common experience (at least among the small group I talk with) that being busy with things that must be done results in greater productivity with things that you'd like to get done. To a point, of course - there must be a U-shaped curve in there somewhere.
Causation is a word ripe with meaning and interpretation. It’s critical in all aspects of clinical practice. However, there are entire books on evidence and research for practice that don’t ever mention the word other than to say that correlation doesn’t imply causation. Such sources don’t tell readers that causation does in fact imply correlation, which is why we need to be warned that correlation doesn’t imply causation. Every time there’s causation there’s a correlation of some sort.
I fully believe that the connection between research-based evidence and actual practice depends on the recognition that the most we can learn from research-based evidence is something about causes. Therefore, despite the philosophical challenges of proving causation, it's a necessary step in generating knowledge for practice.
Many of these assertions rest on logical proof rather than empirical verification, using the standard rules of logic for implication (IF-THEN statements). I'll spare readers the formal proofs of the following logic.
IF Causation, THEN Correlation.
But "IF Causation, THEN Correlation" is not equivalent to "IF Correlation, THEN Causation" (the Affirming the Consequent formal fallacy).
And you may notice that it is not equivalent to:
"IF NOT Causation, THEN NOT Correlation" (the Denying the Antecedent formal fallacy).
It is equivalent to: IF NOT Correlation, THEN NOT Causation.
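For readers who would rather see this than take it on faith, here is a minimal sketch that checks the four forms against a truth table, treating causation and correlation as simple true/false propositions (the variable names are mine, purely for illustration):

```python
from itertools import product

def implies(p, q):
    # Material implication: "IF p, THEN q" is false only when p is true and q is false.
    return (not p) or q

for causation, correlation in product([True, False], repeat=2):
    original       = implies(causation, correlation)          # IF Causation, THEN Correlation
    converse       = implies(correlation, causation)          # affirming the consequent
    inverse        = implies(not causation, not correlation)  # denying the antecedent
    contrapositive = implies(not correlation, not causation)  # the equivalent form
    print(causation, correlation, original, converse, inverse, contrapositive)
```

Every row of the printout has original equal to contrapositive, while the converse and inverse each disagree with the original on some rows - which is all that "equivalent" and "not equivalent" mean here.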
Correlation is thus the pattern of observations that we look for when making an initial judgement about causation in the phenomena we regularly interact with - whether those observations are unstructured or structured. Here I've introduced two terms - unstructured and structured - that I use to distinguish modes of inductive (empirical) learning, that is, learning by making observations and drawing inductive inferences from those observations.
Our daily lives are filled with experiences that include the observation of phenomena we may or may not pay attention to. These are "unstructured". Sometimes we set up protocols (methods) to structure our observations. There are a variety of ways we can structure our observations. Along that spectrum we reach a point where the protocol (methods) gets a name and is officially a nameable "research design". Even with "case studies" we can enter the world of structured observations (for example, the ABAB case study design, which alternates two scenarios that usually differ from one another based on the presence and then absence of an independent variable - a cause, but we dare not call it that!).
Unstructured observations are fully immersed in the reality we live in; they are open systems. Open because there is little attempt to close them off from factors that could impact the interpretation of the observations. Moving from unstructured to structured observations involves a leap from open, to a little less open, and eventually to a closed system. An a priori case study is much closer to an open system than any other type of structured observation. An a posteriori case study is an open system with reflection back onto it to try to figure out what happened (like historical research). An experiment with genetically identical cells in Petri dishes, with identical exposures to some chemical hypothesized to impact the cells, is a highly structured set of observations and a closed system. Although, thankfully for the development of antibiotics, the discovery of penicillin was not in a system so closed that the intended observations could not be contaminated! Similarly, thankfully for our understanding of stress physiology and its health effects, Hans Selye was bad at giving his mice (or rats?) injections - a confounder that contaminated his supposedly closed-system structured observations. Point being: even when we try to close a system to structure our observations, we must be vigilant and attempt to identify possible confounders that alter the internal validity of the observations.
With unstructured observations (always open systems), we know that there are many possible confounders that alter the internal validity of our observations. And that doesn't even begin to consider the biases (blind spots) that enter into the sensory-perception-consciousness chain of unstructured observations. I've gotten off track.
Measures of diagnostic accuracy are measures of correlation
To connect causation to diagnostic accuracy we can first acknowledge that all measures of diagnostic accuracy are measures of correlation. To see this you may have to first remove from your brain any false notion that correlation is an “r” value. An “r” value is one measure of correlation, but it is not the only measure of correlation. You must disentangle the concept of correlation from the measurement of correlation. The concept of correlation is an association. The measurement of correlation depends on the data being generated from observations. Any measure that relies on a distribution of data in a 2x2 table (such as sensitivity and specificity) is a correlation. The distribution of observations in the 4 cells of the 2x2 provides information about whether a correlation exists.
A clinical test that can be positive or negative, and a condition that can be determined to be present or absent, together generate a 2x2 table.
If the correlation is perfect, based on 100 observations of people, half of whom have the condition (the disease), then the cells True Positive and True Negative will have 50 people each, and the cells labelled False Positive and False Negative will be zero. Everyone that has the disease will test positive, and everyone that tests negative will not have the disease. If there is no correlation - if the data are perfectly NOT correlated - then each cell would have 25 subjects. People that have the disease would be 50% likely to test negative and 50% likely to test positive, and the same would be true for those that don't have the disease.
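Here is a minimal sketch of those two scenarios, assuming the usual 2x2 layout (true positive, false positive, false negative, true negative) and using the phi coefficient as one conventional way to put a 2x2 association on the familiar -1 to 1 scale (the function name is mine, for illustration only):

```python
import math

def two_by_two_stats(tp, fp, fn, tn):
    sensitivity = tp / (tp + fn)   # P(test positive | condition present)
    specificity = tn / (tn + fp)   # P(test negative | condition absent)
    # Phi coefficient: a correlation measure computed directly from the 2x2 cells.
    denom = math.sqrt((tp + fp) * (fn + tn) * (tp + fn) * (fp + tn))
    phi = (tp * tn - fp * fn) / denom if denom else float("nan")
    return sensitivity, specificity, phi

# Perfect correlation: all 50 with the disease test positive, all 50 without test negative.
print(two_by_two_stats(tp=50, fp=0, fn=0, tn=50))    # (1.0, 1.0, 1.0)

# No correlation: the 100 people are spread evenly across the four cells.
print(two_by_two_stats(tp=25, fp=25, fn=25, tn=25))  # (0.5, 0.5, 0.0)
```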
I think I've demonstrated that diagnostic accuracy measures are measures of correlation. The higher the correlation between a condition and a sign (symptom, test result, etc.), the greater the diagnostic accuracy. From a pragmatic point of view, all that matters to the clinician is this correlation and how high (or low) it is (how high or low the sensitivity and specificity are determines the likelihood ratios and the shift in probability that occurs with a positive or negative test result).
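As a rough illustration of that last parenthetical, here is how sensitivity and specificity translate into likelihood ratios and a shift in probability; the particular numbers (90% sensitivity, 80% specificity, 30% pre-test probability) are made up for the example:

```python
def likelihood_ratios(sensitivity, specificity):
    lr_pos = sensitivity / (1 - specificity)   # LR+: how much a positive result raises the odds
    lr_neg = (1 - sensitivity) / specificity   # LR-: how much a negative result lowers the odds
    return lr_pos, lr_neg

def post_test_probability(pre_test_prob, lr):
    # Convert probability to odds, apply the likelihood ratio, convert back.
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)   # LR+ = 4.5, LR- = 0.125
print(post_test_probability(0.30, lr_pos))       # roughly 0.66 after a positive test
print(post_test_probability(0.30, lr_neg))       # roughly 0.05 after a negative test
```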
But correlation is not equivalent to causation. So we're only part way there. In my next post I'll pick this up and explain the reasons that what we look for in a diagnostic test is that it is causally related to the condition it is attempting to diagnose. The factors that influence how good the correlation (diagnostic accuracy) is really depend on some ontological, mechanistic features of causation, including the presence of other causes. It turns out to be a signal and noise situation. And, when interpreting the result of a test in a particular person, understanding the causation (not just the correlation) is helpful.