Making Better Comparisons

by Jason S. Brinkley, PhD, MA, MS

On the Brink addresses topics related to data, analytics, and visualizations on personal health and public health research. This column explores current practices in the health arena and how both the data and mathematical sciences have an impact.

Our health care system is obsessed with decision making. We track everything; then we analyze it and try to make comparisons so that we can figure out what is “best.” We constantly make head-to-head comparisons of almost every aspect of the health care system in the hope that we can see who is winning and who is losing. If we find more people doing better on one treatment than another, then the logical conclusion is that everyone should be getting the superior treatment.

Except, some people have it better than others. You thought all sick people have equal access to all treatments? Hardly.

In health analytics, we measure the chance that someone gets a particular treatment with a metric called a propensity score. Propensity scores help us make better comparisons. So how do they work? Well, to illustrate, we’ll start with Bob.

The worst day of Bob’s life could be the one where he is told that he has cancer. His journey to that day begins in his home bathroom, where he notices an unusual mole after stepping out of the shower. A trip to his regular doctor leads to a specialist referral, which leads to blood tests, and then a biopsy.

Bob has melanoma. Could be the worst day of his life. Bob will spend a lot of mental energy on how to tell his wife and children. “Don’t worry; the doctor said the prognosis is good.” Then Bob wonders whether the doctor actually said the prognosis is good, or if he just made that up from something he saw on TV?

“The doctor did say I’m a good candidate for surgery.” Bob leans on the doctor’s words, saying them over and over like a mantra. He’s a good candidate for surgery. There is a plan.

The day Bob finds out he has cancer isn’t the worst day of his life, because Bob has hope.

So how did this story move away from a cancer death sentence? Why is Bob a good candidate for anything, and what does that mean?

One of the most important contributions of statistics to medicine in the last 50 years is showing that outside the experimental setting (ie, clinical trials), characteristics such as age can impact who gets which intervention, how they respond to that intervention, and whether people will have positive outcomes. All of these things may have to be accounted for in comparing two interventions.

Imagine that you have two therapies to treat melanoma, but one therapy is only given to “good candidates,” maybe in this case the ones who are young and have no other medical conditions. We also know that younger and healthier individuals are more likely to survive a cancer diagnosis. A raw head-to-head comparison would then credit that therapy with survival that is partly due to who received it.
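To make the “good candidate” problem concrete, here is a minimal simulation sketch. Every number in it is invented for illustration: young patients are assumed to get therapy A 80% of the time and to survive 90% of the time, older patients get it 20% of the time and survive 60% of the time, and the therapy itself is assumed to have no effect at all. The function names are hypothetical.

```python
import random

def simulate_patients(n=20000, seed=0):
    """Simulate patients where age drives both treatment choice and survival.

    All probabilities are made up for illustration. The therapy itself
    has NO effect on survival in this simulation.
    """
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        young = rng.random() < 0.5
        # "Good candidates" (young patients) get therapy A far more often.
        got_a = rng.random() < (0.8 if young else 0.2)
        # Survival depends on age only, not on the therapy.
        survived = rng.random() < (0.9 if young else 0.6)
        patients.append((young, got_a, survived))
    return patients

def survival_rate(rows):
    return sum(1 for _, _, survived in rows if survived) / len(rows)

def naive_difference(patients):
    """Survival on therapy A minus survival on therapy B, ignoring age."""
    on_a = [p for p in patients if p[1]]
    on_b = [p for p in patients if not p[1]]
    return survival_rate(on_a) - survival_rate(on_b)

def age_adjusted_difference(patients):
    """Compare A and B within each age group, then average the two gaps."""
    gaps = []
    for young in (True, False):
        stratum = [p for p in patients if p[0] == young]
        gaps.append(naive_difference(stratum))
    return sum(gaps) / len(gaps)

patients = simulate_patients()
naive = naive_difference(patients)
adjusted = age_adjusted_difference(patients)
print(f"naive difference in survival:        {naive:.3f}")
print(f"age-adjusted difference in survival: {adjusted:.3f}")
```

Under these assumptions the naive comparison makes therapy A look clearly better even though it does nothing, while comparing patients of the same age group shrinks the gap toward zero.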

You can find a deeper overview of propensity score analysis here. The overall idea is that we can leverage the information we get from identifying these gaps by building statistical models of the likelihood that patients receive certain therapies, and the results give us a framework that lets us make apples-to-apples comparisons between interventions. These methods fall into a setting usually deemed “quasi-experimental” because they seek to reap some of the same rewards one finds in the experimental setting, where random treatment assignment between patients in a clinical trial yields treatment comparisons free from “good candidate” effects.
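As a rough sketch of that idea, the example below estimates a propensity score and uses it to reweight patients (inverse probability weighting, one of several ways propensity scores get used). It assumes a single yes/no characteristic, so the score is simply the share of each age group that was treated; real analyses typically fit a model such as logistic regression on many characteristics. The data are simulated with made-up probabilities and no true treatment effect, and all names are hypothetical.

```python
import random
from collections import defaultdict

def simulate(n=20000, seed=1):
    """Made-up data: age affects treatment and survival; treatment does nothing."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        young = rng.random() < 0.5
        treated = rng.random() < (0.8 if young else 0.2)
        survived = rng.random() < (0.9 if young else 0.6)
        rows.append({"young": young, "treated": treated, "survived": survived})
    return rows

def estimate_propensity(rows):
    """Estimate P(treated | young) as the treated fraction in each age group."""
    counts = defaultdict(lambda: [0, 0])  # young -> [treated count, total]
    for r in rows:
        counts[r["young"]][0] += int(r["treated"])
        counts[r["young"]][1] += 1
    return {group: treated_count / total
            for group, (treated_count, total) in counts.items()}

def ipw_difference(rows, propensity):
    """Weighted survival difference (treated minus untreated).

    Each patient is weighted by 1 / P(receiving the treatment they got),
    so under-represented patients in each arm count for more.
    """
    num_t = den_t = num_c = den_c = 0.0
    for r in rows:
        p = propensity[r["young"]]
        if r["treated"]:
            w = 1.0 / p
            num_t += w * r["survived"]
            den_t += w
        else:
            w = 1.0 / (1.0 - p)
            num_c += w * r["survived"]
            den_c += w
    return num_t / den_t - num_c / den_c

rows = simulate()
propensity = estimate_propensity(rows)
effect = ipw_difference(rows, propensity)
print("estimated propensity by age group:", propensity)
print(f"weighted difference in survival: {effect:.3f}")
```

Because the simulated therapy has no effect, a weighted difference near zero is what an apples-to-apples comparison should show here.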

So propensity scores have certainly revolutionized surgery and other areas of medicine where comparisons need to be made but clinical trials either haven’t been, or can’t be, done. What about a more general public health setting? Propensity scores are, in some ways, even more important here but also more difficult because the mechanism that generates “good candidates” can be a lot more subtle.

In the melanoma scenario, it is easy to see where physician preference or location of treatment (say urban versus rural) may lead to differences in who gets what treatment. Who is to blame when we find disparities between races in vaccinations? Or comparing health insurance rates across low- and medium-income countries? Or differences in breastfeeding rates when exploring outcomes such as child cognitive development? JPHMP readers may recall a study from 2014 that looked at the impact of having a patient-centered medical home among diabetics.

In addition to health studies, propensity scores are commonly used in political science, law, history, anthropology, agriculture, and many other fields. They have become part of the methodological standards for the Patient-Centered Outcomes Research Institute, which aims to fund health research that is patient centric or compares interventions in more “real life” settings. Their time is here, and they are slowly becoming a research standard for making better comparisons.

So is the methodology perfect? No. In fact, there can be some big limitations in using propensity scores. Most center around exactly which interventions are to be compared, which brings us back to the beginning of the discussion. Propensity scores help make for better comparisons, but that means there have to be two appropriate things to compare. Bob is a “good candidate” for surgery. But surgery is a broad area, and there are many different types of surgery. So propensity scores may be useful if we have two competing surgical techniques and it is unclear whether Bob should have one type of surgery or another.

But what if the comparison is between, say, surgery + chemo versus chemo alone? Bob’s age and otherwise good health make him a good candidate for surgery, but what about whether he would get chemo only? Probably not, if age plays a big role. Let’s say the propensity score for Bob getting surgery + chemo is 99.5% when compared to just chemo. Then Bob’s chance of getting chemo only is 0.5%, which for all intents and purposes means that Bob isn’t going to get chemo alone. Clinically speaking, almost every doctor out there would do surgery. So Bob’s data doesn’t help us make better comparisons, because his chance of receiving one of the two interventions is simply too high to provide useful insights into how patients like Bob would respond to chemo alone. Propensity score analysis really needs patients who could be “good candidates” for either intervention. Not a lot of researchers think about it this way, even though balancing the treatment comparison is precisely why we want to use propensity scores.
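One common way to act on this is to check for overlap before comparing anything, setting aside patients whose propensity score is too close to 0 or 1. The sketch below uses Bob’s 99.5% from above; the other two patients and the 5% and 95% cutoffs are hypothetical choices for illustration, not a standard.

```python
def has_overlap(propensity, lower=0.05, upper=0.95):
    """True if the patient had a realistic chance of either intervention.

    The cutoffs are illustrative assumptions; analysts choose their own.
    """
    return lower <= propensity <= upper

# Propensity of receiving surgery + chemo (versus chemo alone).
# Bob's value comes from the text; Alice and Carol are invented.
patients = {"Bob": 0.995, "Alice": 0.60, "Carol": 0.02}

usable = {name: p for name, p in patients.items() if has_overlap(p)}
excluded = sorted(set(patients) - set(usable))

print("usable for comparison:", usable)
print("set aside (no realistic alternative):", excluded)
```

Bob is set aside because he would almost never receive chemo alone, and a patient like the invented Carol is set aside for the opposite reason.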

So what about Bob? Bob and millions of other cancer patients are entering a world of personalized medicine where propensity score analysis and similar novel methods help answer the question, “What is the best intervention for me?” instead of “What is the best intervention for most?” Look for more on personalized medicine later on this blog, as we will shift the discussion away from the head-to-head comparison paradigm. For now, we are confident that Bob can be optimistic about his chances of recovery, and studies show that positive attitudes go a long way in having a healthy life. Of course, maybe Bob is a natural-born optimist, in which case we may need to predict optimism and do yet another propensity score analysis.

Jason S. Brinkley, PhD, MS, MA is a Senior Researcher and Biostatistician at Abt Associates Inc. where he works on a wide variety of data for health services, policy, and disparities research. He maintains a research affiliation with the North Carolina Agromedicine Institute and serves on the executive committee for the NC Chapter of the American Statistical Association and the Southeast SAS Users Group. Follow him on Twitter.
