by Jason S. Brinkley, PhD, MA, MS
On the Brink addresses topics related to data, analytics, and visualizations on personal health and public health research. This column explores current practices in the health arena and how both the data and mathematical sciences have an impact. (The opinions and views represented here are the author’s own and do not reflect any group for which the author has an association.)
The Health Insurance Portability and Accountability Act of 1996 is a significant piece of US law that protects the confidentiality of private medical information. More commonly referred to as HIPAA, it governs the kind of information that health care providers can release about the patients in their care. Home addresses are among the protections offered by HIPAA, which suggests that we place a premium on protecting information about where people live. So what do we do if address information is publicly available from other sources? Doesn’t the potential for abuse of such sources create an unfair environment? Is it fair if patients can track down their physicians’ home addresses online but those physicians can’t even release their patients’ zip codes?
North Carolina is one of many US states that have public access voter data. If you vote in any state election, then your voter information is automatically added to a public registry that is made available online. This is a good thing from the perspective of public transparency and prevention of voter fraud. Also, the North Carolina Medical Board publishes a list of physicians who are certified to practice in the state of North Carolina. This is a good thing from the perspective that patients can ensure that their doctors are licensed to practice medicine in North Carolina. But these data sources are totally open and available for download, so they can be linked to create something new and unintended. I wanted to know how challenging this would be and how many addresses I could easily find. Here’s how I did it.
Start with the North Carolina State Board of Elections’ website, which has downloadable voter data files; from its download page you can get data by county or across the whole state. Next, go to the NC Medical Board website and do a provider search, filling in only “NC” in the state field. You will get approximately 59,000 active and inactive board-certified physicians in North Carolina; that info can be copied and pasted into a spreadsheet program. Then it is just a matter of searching the voter data for a physician of interest in order to obtain his or her home address. However, not all physicians vote, so how well do these data match up? Bringing these data sources into my favorite statistical software, I was able to link them with minimal effort. Using only exact matches on first, middle, and last name, I got addresses for about 31,000 NC physicians (a little more than 50% of physicians, which is better than the overall population voting rate). I used open source map sources to get the longitude and latitude of 16,000 of those physicians and have mapped them for you below. By the way, Google will give you the longitude and latitude of up to 2,500 addresses for free.
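For readers who want to see the shape of that linking step, here is a minimal sketch in Python. It is not the code used for this column, which was done in statistical software. The field names (`first`, `middle`, `last`, `address`) and the toy records are made up for illustration and do not reflect the layout of either real file.

```python
from collections import defaultdict


def name_key(record):
    """Exact-match key: first, middle, and last name, compared as-is."""
    return (record["first"], record["middle"], record["last"])


def link_exact(physicians, voters):
    """Return (physician, voter) pairs whose three name fields are identical."""
    # Index the voter file by name so each physician lookup is a dict access.
    voters_by_name = defaultdict(list)
    for voter in voters:
        voters_by_name[name_key(voter)].append(voter)

    linked = []
    for doc in physicians:
        for voter in voters_by_name.get(name_key(doc), []):
            linked.append((doc, voter))
    return linked


# Hypothetical toy records, not real people or addresses.
physicians = [
    {"first": "Ann", "middle": "B", "last": "Example"},
    {"first": "Carl", "middle": "D", "last": "Sample"},
]
voters = [
    {"first": "Ann", "middle": "B", "last": "Example", "address": "1 Placeholder St"},
    {"first": "Eve", "middle": "F", "last": "Nobody", "address": "2 Placeholder Ave"},
]

matches = link_exact(physicians, voters)
# Share of physicians with at least one exact name match in the voter file.
match_rate = len({name_key(doc) for doc, _ in matches}) / len(physicians)
```

One caveat with a sketch like this: a name-only key will pair a physician with every voter who shares that exact name, so some links can point to the wrong person.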
All told, this work took me about one evening of effort on a regular laptop while watching television. It was neither extremely difficult nor time-consuming, and nothing here is limited to just physician names. I also could have gotten more matches by including close but not exact matches. For example, if one registry has “John Smith Jr.” and the other has “John Smith Jr” (no period), then those did not match. I took only the exact matches without any other efforts and was able to get this high number from just a first pass. I feel reasonably confident about this first pass, given the high concentrations of physicians it shows in the areas of Charlotte, Raleigh/Durham, and Greensboro/Winston-Salem. While many may struggle with the challenges of doing this kind of matching at the state level, the county-level files are easy to work with, and finding just one person in this data can be done quickly.
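The “Jr.” versus “Jr” example suggests how little cleanup it would take to pick up some of those near misses. The sketch below is one assumed approach, not something done for this column: it lowercases each name, drops punctuation, and collapses extra spaces before comparing. The function names are hypothetical.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT = str.maketrans("", "", string.punctuation)


def normalize_name(name):
    """Lowercase, drop ASCII punctuation, and collapse repeated whitespace."""
    return " ".join(name.translate(_PUNCT).lower().split())


def same_name(a, b):
    """True when two name strings agree after normalization."""
    return normalize_name(a) == normalize_name(b)


registry_a = "John Smith Jr."
registry_b = "John Smith Jr"

exact = registry_a == registry_b             # the raw strings differ
cleaned = same_name(registry_a, registry_b)  # they agree once the period is gone
```

Note that this rule also removes hyphens and apostrophes, so it trades a few extra matches for a somewhat higher chance of linking two different people.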
North Carolina isn’t the only state with open access voter data files, so this kind of matching can be done in a lot of other places. My point is that each of these sources of public reporting is necessary from one perspective but also creates opportunities for unintended use by outside groups. Some of these groups are from outside the United States, as is the case with this academic paper that uses the North Carolina data for methodological research. Indeed, the recent Congressional investigations into the use of social media platforms like Facebook and Twitter had lawmakers asking executives whether voter data had been used to help specifically target certain areas.
So beyond privacy concerns, is this a big deal? HIPAA seemed to think so when it was created because so much can be known about people just by knowing where they live. Indeed, the Robert Wood Johnson Foundation has a website that will tell you how long you are expected to live just by typing in your zip code. Maybe policy discussions need to be had about how we want this data to be used before we get to a place where asking for directions to your doctor’s home address is just the latest feature from Alexa.
Jason S. Brinkley, PhD, MS, MA is a Senior Researcher and Biostatistician at Abt Associates Inc. where he works on a wide variety of data for health services, policy, and disparities research. He maintains a research affiliation with the North Carolina Agromedicine Institute and serves on the executive committee for the NC Chapter of the American Statistical Association and the Southeast SAS Users Group.
Previous posts by this author:
- The Population Bullet
- The Unknown Unknowns of Missing Data
- Communicating Science–More Than Just Good Words?
- Counting Alabamas
- The Third World in Your Own Backyard
- The Unrealistic Gold Standard
- Does MACRA Signal the Beginning of the End for Medicare Claims Data?
- Think You Aren’t Extraordinary? Odds Are You’re Wrong
- Mapping by Words
- Are We Asking Too Much From Surveys?
- Making Better Comparisons
- What Kills Us?