The Unrealistic Gold Standard
by Jason S. Brinkley, PhD, MA, MS
On the Brink addresses topics related to data, analytics, and visualizations on personal health and public health research. This column explores current practices in the health arena and how both the data and mathematical sciences have an impact. (The opinions and views represented here are the author’s own and do not reflect any group for which the author has an association.)
“When did linear regression stop being good enough?” a co-worker asked last week when he came by my office for some insight on a project he was reviewing. It’s nice working in a group with only a few statisticians, because I find myself consulting on analytics with my colleagues as often as I do with clients and outside researchers. The discussion concerned a program he is evaluating in which a large patient registry is used to compare the effectiveness of two competing interventions. I’ve spoken before about making better comparisons, and it was those techniques my colleague was asking about. I explained to him, just as I did in my previous post, that methods such as propensity scores (and other quasi-experimental techniques) have become a huge boon to our ability to make causal comparisons from observational data. After a twenty-minute discussion, my friend said, “So the point is to make the registry look like an experiment?” To which I replied a triumphant “YES,” and he thanked me and was on his way. While that conversation focused on only one side of the observational data/randomized experiment coin, there is an interesting duality in the current state of methods research that is being overlooked.
Let’s first focus on observational studies, those sources of data collected outside an experimental setting. Such data is everywhere, but the inherent problem is that it may carry all kinds of biases, only some of which can be observed or adjusted for in the analysis. The issue has been exacerbated in the world of big data; while we have found ways to collect large volumes of data about the world around us, that data can sometimes pose more questions than answers. Our quest for drawing causal conclusions has led some to suggest that the best path forward is to make this data “look” like an experiment. Randomized experiments are considered the gold standard because the act of randomization removes many of these biases.
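To see why randomization earns that reputation, here is a minimal simulated sketch (all numbers are made up for illustration, and the assignment rule is purely hypothetical). When treatment assignment depends on a patient characteristic such as age, the two groups differ before any outcome is ever measured; a coin-flip assignment balances that same characteristic automatically.

```python
import random

random.seed(1)
n = 10_000
ages = [random.uniform(40, 80) for _ in range(n)]

# Observational assignment: the chance of treatment rises with age,
# so age is a confounder baked into the groups from the start.
obs = [(a, random.random() < (a - 40) / 40) for a in ages]
obs_t = [a for a, t in obs if t]
obs_c = [a for a, t in obs if not t]

# Randomized assignment: a coin flip, independent of age.
rct = [(a, random.random() < 0.5) for a in ages]
rct_t = [a for a, t in rct if t]
rct_c = [a for a, t in rct if not t]

mean = lambda xs: sum(xs) / len(xs)
print(f"observational age gap: {mean(obs_t) - mean(obs_c):.1f} years")
print(f"randomized age gap:    {mean(rct_t) - mean(rct_c):.1f} years")
```

The observational groups end up roughly a decade apart in average age before a single outcome is observed, while the randomized groups are essentially identical. That baseline imbalance is exactly the kind of bias the methods below try to undo after the fact.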
Rubin (2007) provides a great introduction to this line of thinking. While the “how” of making observational data look like an experiment varies across the many techniques created for virtually every setting in which one could gather data, most revolve around the idea that one should limit the observational data to those individuals for whom comparisons are appropriate (i.e., either intervention being compared is a viable candidate for that person) and then calculate comparisons in a way that adjusts both for the observational nature of the data AND for the impact of various patient characteristics on the outcome of interest. Rubin suggests that this work has aspects of both design and analysis, and he lays out some general ground rules, such as “create comparison groups while completely ignoring the outcome data.” In other words, pretend you don’t have the outcomes until you have your comparison groups, in much the same way one would randomize patients in an experiment before observing any outcome. Some balk at these ideas for several reasons, one being that it doesn’t seem appropriate to simply ignore or limit some of the observed data. I counter by suggesting that experiments ignore or limit potential data all the time through study inclusion and exclusion criteria, so are these changes really that big a deal?
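As a rough sketch of that design-first workflow, the toy example below estimates propensity scores and forms matched comparison groups while completely ignoring the outcome, and only then compares outcomes. Everything here is a simplification on simulated registry data: the hand-rolled logistic regression, the greedy 1:1 matching, and the 0.05 caliper are illustrative choices of mine, not Rubin's prescriptions and not a recipe for a real analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated registry: older patients are more likely to receive the
# intervention, and the outcome depends on age (a confounder) but NOT
# on treatment, so the true treatment effect is zero.
n = 500
age = rng.uniform(40, 80, n)
treated = (rng.uniform(0, 1, n) < 1 / (1 + np.exp(-(age - 60) / 5))).astype(float)
outcome = 0.1 * age + rng.normal(0, 1, n)

# Step 1 (design): estimate propensity scores, age -> P(treated),
# via plain gradient ascent on the logistic likelihood. The outcome
# is never touched in this step.
X = np.column_stack([np.ones(n), (age - age.mean()) / age.std()])
w = np.zeros(2)
for _ in range(200):
    p = 1 / (1 + np.exp(-X @ w))
    w += 0.1 * X.T @ (treated - p) / n
ps = 1 / (1 + np.exp(-X @ w))

# Step 2 (design): greedy 1:1 nearest-neighbor matching on the
# propensity score, within a caliper, still ignoring the outcome.
caliper = 0.05
controls = list(np.where(treated == 0)[0])
pairs = []
for i in np.where(treated == 1)[0]:
    if not controls:
        break
    j = min(controls, key=lambda k: abs(ps[k] - ps[i]))
    if abs(ps[j] - ps[i]) <= caliper:
        pairs.append((i, j))
        controls.remove(j)

# Step 3 (analysis): only now compare outcomes.
t_match = [i for i, _ in pairs]
c_match = [j for _, j in pairs]
naive_diff = outcome[treated == 1].mean() - outcome[treated == 0].mean()
matched_diff = outcome[t_match].mean() - outcome[c_match].mean()
print(f"naive difference in means:   {naive_diff:.2f}")
print(f"matched difference in means: {matched_diff:.2f}")
```

Because the simulation builds in no real treatment effect, the naive comparison is pure confounding bias, and the matched comparison lands much closer to zero. Note how the sketch also discards treated patients with no comparable control, which is exactly the "limit the data to appropriate comparisons" idea, analogous to an experiment's exclusion criteria.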
So the take-home message is that observational data is always bad and experiments are always good, right? Not really. Recently, some experiments (i.e., clinical trials) have come under criticism for not being “realistic enough.” That is to say, drugs and therapies are often tested in settings and in ways in which they would rarely be used in practice. There is a push for further investment in so-called “Pragmatic Clinical Trials” that more closely mimic how patients would actually use the intervention. As Patsopoulos points out: “Although hundreds of trials and RCTs have been performed so far in most clinical conditions, comparing dozens of interventions, there is an increasing expression of doubt as to whether the plethora of available evidence and ongoing data are translatable and usable in real life.” The basic tenets of these trials: use clinically relevant interventions, include patients from a variety of backgrounds, incorporate different settings, and collect multiple relevant outcomes. Pragmatic Clinical Trials can be complicated and costly but offer great potential for eliciting a truer measure of overall effectiveness and utility.
And that brings us to the unusual state of current methods research. We have a simultaneous desire to make our observational data seem more like clinical trials and to make our trials more realistic, mimicking what is observed in practice. Perhaps the end goal is to one day establish a unified methodology from which one can draw causal conclusions regardless of how the data were gathered. If those goals are achieved at study design and before data collection, then it’s possible that the most appropriate analysis will be plain linear regression, which surely means my co-worker will be back for another chat.
Jason S. Brinkley, PhD, MS, MA is a Senior Researcher and Biostatistician at the American Institutes for Research, where he works on a wide variety of data for health services, policy, and disparities research. He maintains a research affiliation with the North Carolina Agromedicine Institute and serves on the executive committees for the NC Chapter of the American Statistical Association and the Southeast SAS Users Group. Follow him on Twitter.