Did it take a village?

What can we do to reduce racial inequalities in health? There are mountains of scientific reports, peer-reviewed articles, reviews, monographs, books, and blog posts showing that US blacks have worse health indicators than whites. Although the magnitude of black-white differences in health has varied over time, the persistence of the racial gap is distressing and can be seen across a diverse array of environments and measures of health: life expectancy, survival from cancer, death rates, cases of infectious disease, and accident and crime victimization, among other outcomes.

This problem is echoed in striking language in a 2013 paper titled, "Eliminating Racial Disparities in Colorectal Cancer in the Real World: It Took a Village" by a group of public health cancer researchers from Delaware:

Those who are poor, underserved, or minorities are more likely to get cancer and die as a result of it than those who are rich or white. This is a fact, and it is the current reality of cancer care in the United States, as documented in thousands of peer-reviewed articles,…
— Grubbs et al. J Clinical Oncology 2013;31:1928.

Despite this depressing introduction, the quote above comes from a paper that attempts to make a case that a sustained, focused program aimed at reducing racial differences in cancer in Delaware was enormously successful, demonstrating “what can happen when the entire health care community of a state is mobilized toward a goal: eliminating health disparities in CRC [colorectal cancer].”

Briefly, the program contained 3 general aims:

  • A colorectal cancer screening program.
  • A cancer treatment program covering costs for the uninsured.
  • An emphasis on African American cancer disparity reduction.

In the paper the authors argue that the program resulted in the near-elimination of black-white inequalities in colorectal cancer:

By doing these common-sense things, we accomplished the following with respect to CRC health disparities from 2002 to 2009: elimination of screening disparities, equalization of incidence rates, reduction in the percentage of African Americans with regional and distant disease from 79% to 40%, and most importantly a near elimination of mortality differences.
— Grubbs et al. J Clinical Oncology 2013;31:1929.

The primary pieces of evidence behind this claim come from Table 1 and Supplementary Figures 2 and 3:

Table 1 shows rates of ever being screened by colonoscopy, incidence, and mortality in 2001 and 2009, and the relative change between the two time points. Without a doubt, screening rates increased more among blacks than whites, and both incidence and mortality declined more for blacks than whites, although no estimates of precision were included (only p-values, sigh), a serious and unethical omission, in my opinion. These two sets of measurements, together with the fact that the program was implemented between 2001 and 2009, serve to bolster the authors’ claims that the program reduced black-white differences.

However, Appendix Figures 2 and 3 complicate the story a little bit, since they show trends over the entire period, for colorectal cancer incidence (the authors circled the rates at the end of the period, I added an arrow indicating the general time of program implementation):

Grubbs et al. JCO 2013

Here is the graph for mortality:

Grubbs et al. JCO 2013

The authors are certainly correct that inequalities decreased. But one thing that is clearer from the figures than from the table is that rates of cancer incidence and mortality were decreasing, particularly among blacks, prior to the program's implementation. This complicates the authors’ story, since now we have an alternative explanation for the decrease in black-white inequalities at the end of the observation period: black rates were already declining faster than white rates for reasons that have nothing to do with the program. I don’t know what those reasons might be, but they make it harder to accept the authors’ attribution of the elimination of inequality to the program, since some part of this decrease looks as though it would have happened even if the program had never been implemented.

It is worth mentioning that national rates of colorectal cancer mortality have also been declining, and perhaps faster for blacks than for whites. I generated the graph below from the National Cancer Institute’s web-based tool (https://seer.cancer.gov/faststats):

Source: US National Cancer Institute (author's calculations).

This at least suggests that there may be some factors operating nationwide that are driving down colorectal cancer mortality rates for both blacks and whites, which makes it harder to evaluate the researchers' claim that their program was responsible for the success in Delaware.

How can we adjudicate between these competing explanations to figure out whether the authors' argument for the impact of the program is more plausible than the argument that factors other than the program were already driving down rates of colorectal cancer incidence and mortality among blacks before the program was even implemented?

One way (I do not claim this is the only way) is to use something called an “interrupted time series” (ITS) study, which effectively uses the knowledge about when a program was implemented and assesses whether the program may have had an immediate impact on the rates, or possibly changed the trajectory of the outcome after the program was implemented. (There are many papers on this, but a rather nice overview is given by Gillings and colleagues in the American Journal of Public Health in 1981). The picture below gives the basic idea, which is using this type of study design to test whether some intervention "interrupts" pre-existing trends.

Source: Author.

The key idea of ITS is to look at the outcome trend before the program (β1), predict what would have happened in the absence of the program, assuming the trend would have continued in the same fashion (clearly a serious assumption), and compare this to what actually happened (since the program was actually implemented). In the figure above, β2 represents any immediate impact of the program (less likely for most cancer indicators), while β3 represents a change in the trend that we could potentially attribute to the program. [Note that I have a color code for "control" there, but for now we can just think about the red lines as trends for either Delaware as a whole or for black residents of Delaware...more on that below.] If there are important differences between what we would have predicted without the program and what actually happened (under these assumptions), we may be more confident that the program actually “interrupted” the trends and had some impact on outcomes. (This leaves aside a lot of other complicating features and assumptions of ITS [autocorrelation, seasonality, time-varying confounders, etc.], which need to be taken seriously.)
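The single-group version of this model is just a segmented regression on time, a post-intervention indicator, and time-since-intervention. Here is a minimal sketch in Python with simulated data (all numbers are made up to mimic a steady pre-existing decline with no program effect; it is not the Delaware data):

```python
import numpy as np

# Simulated annual log mortality rates, 1999-2013, "intervention" in 2003
years = np.arange(1999, 2014)
t = years - years[0]                  # time since start of series (b1: pre-trend)
post = (years >= 2003).astype(float)  # 1 from the intervention year on (b2: level shift)
t_post = post * (years - 2003)        # years since intervention, 0 before (b3: slope change)

rng = np.random.default_rng(0)
log_rate = 4.8 - 0.06 * t + rng.normal(0, 0.01, len(years))  # steady ~6%/yr decline

# Design matrix: intercept, pre-trend, level shift, slope change
X = np.column_stack([np.ones_like(t), t, post, t_post])
b, *_ = np.linalg.lstsq(X, log_rate, rcond=None)
print(b)  # b[2] and b[3] should be near zero: no "interruption" was built in
```

Because the simulated series has the same trend throughout, the fitted level shift and slope change come out near zero, which is exactly the pattern described for the Delaware black mortality series below.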

A simple analysis for Delaware mortality

I downloaded annual age-adjusted rates of colorectal cancer mortality for black and white individuals ages 50 and over from Delaware using the US National Cancer Institute’s SEER database (http://seer.cancer.gov) for the years 1999 to 2013. Unfortunately, incidence rates are a little unstable among blacks and NCI suppresses some cases where numbers are small, but we could apply this same idea to trends in stage-specific incidence or cancer screening (which would be very interesting and perhaps a better example, since those are likely easier to change in short order than mortality).

I used the Stata routine itsa, written by Ariel Linden (and later described in an article published in Stata Journal) to estimate the impact of the introduction of the program in 2003 on colorectal cancer mortality rates among Delaware blacks. It isn't necessary to use this routine for an ITS analysis, but it makes some simple coding aspects easier and I was feeling lazy. I used the log mortality rate but the results are similar if one uses the absolute mortality rate.

I'll show the regression output below, but this picture tells the story pretty well. We tell Stata that an intervention occurred in 2003, and the software routine tests for a break in the time series in 2003 and any change in the slope of the mortality rate after 2003 (under assumptions of linearity, which, again, are important and need to be thoroughly explored, but hey, Grubbs et al. didn't even report standard errors!). 

Source: Author's calculations of SEER data.

It's pretty straightforward to see that after the intervention began in 2003 there was little immediate change in the cancer mortality rate among blacks, and little change in the trajectory of the death rate thereafter. This is confirmed by the output from the regression analysis:

Source: Author's calculations of SEER data.


A few notes here. Exponentiating the constant term (4.8) gives the predicted colorectal cancer rate for blacks in 1999, which is approximately exp(4.8)=122 per 100,000 population, roughly equal to the actual rate of 119. The coefficient _t says that the black mortality rate was declining by roughly 6% per year prior to 2003, the coefficient for _x2003 represents the change in the log mortality at 2003, which was near zero, and the coefficient for _x_t2003 captures any difference in the log rate trend after the intervention, which is also effectively zero. The fact that the two trends before and after 2003 look nearly identical in the figure above corresponds with the fact that this latter coefficient is nearly zero.  
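A quick check of this arithmetic, using the rounded coefficients reported above:

```python
import math

# Approximate reported coefficients: _cons and _t from the single-group model
const, trend = 4.8, -0.06

print(round(math.exp(const)))  # predicted 1999 black rate, ~122 per 100,000
print(1 - math.exp(trend))     # ~0.058, i.e. roughly a 6% annual decline
```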

This suggests that the program did not "interrupt" black mortality rates. The rate of decline was effectively the same before and after the program. But, you might ask, maybe comparing blacks after 2003 to their trajectory before 2003 isn't the right comparison. Since the program was focused on reducing racial inequalities, maybe we should be comparing black rates to white rates (or even modeling the black-white difference in rates)?

This is a fairly straightforward extension of ITS, which is just to include a *non-affected* control group as a comparison. I say "non-affected," but this is unlikely to be true here, since the program (at least from the limited description of it in the paper) was not restricted to blacks in Delaware; presumably, whites may also have benefited. Nevertheless, it would seem valuable to use whites as a control group, since the idea was for the program to narrow racial inequalities.

Below is another graph based on the ITS regression results, this time with Delaware whites as a control group: 

Source: Author's calculations of SEER data.

Again, a few things to notice here. First, white log mortality rates were on roughly the same trajectory as blacks prior to 2003 (although much lower in absolute terms). Second, the post-intervention trends also look generally similar, which is inconsistent with the idea that the program disproportionately benefited blacks. The regression estimates provide some numbers to back this up:

Source: Author's calculations of SEER data.

There are more parameters in this model, since we are not only estimating the program impact on blacks, but also on whites. Since in this case blacks are the "treated" group (_z=1) and whites are the control group (_z=0), the parameters have the following interpretations. The constant term (_cons) is the predicted white rate in 1999, so roughly exp(4.3)=74 per 100,000. The coefficient on _t is the white trend prior to 2003 (weakly declining), the coefficient on _z is the black-white difference in log mortality levels in 1999, which is large and obvious in the graph, and the coefficient _z_t is the difference in the pre-intervention trend between blacks and whites, which is rather small and suggests that blacks and whites were on similar trajectories prior to the intervention. The next set of coefficients give the change in the white rate at the time of the intervention (_x2003, small and positive), the change in the trend after the intervention among whites (_x_t2003, small, but still negative and showing declines in rates), the change in the black rate at the time of the intervention (_z_x2003, small and negative), and, most importantly, the difference in the post-intervention trend between blacks and whites (_z_x_t2003), which is also close to zero.
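The multiple-group model simply interacts a treatment indicator with each of the single-group terms, which is where all of these parameters come from. A sketch in Python, again with simulated data (levels and trends chosen loosely to mimic the figure, with no true program effect built in; the column comments map to the itsa-style names above):

```python
import numpy as np

years = np.arange(1999, 2014)
t = years - years[0]
post = (years >= 2003).astype(float)
t_post = post * (years - 2003)

def design(z):
    """Design matrix for one group (z=1 treated, z=0 control)."""
    one = np.ones_like(t, dtype=float)
    return np.column_stack([
        one, t, z * one, z * t,                 # _cons, _t, _z, _z_t
        post, t_post, z * post, z * t_post,     # _x2003, _x_t2003, _z_x2003, _z_x_t2003
    ])

rng = np.random.default_rng(1)
y_control = 4.3 - 0.05 * t + rng.normal(0, 0.01, len(t))  # "whites": lower level
y_treated = 4.8 - 0.06 * t + rng.normal(0, 0.01, len(t))  # "blacks": higher level

X = np.vstack([design(0), design(1)])
y = np.concatenate([y_control, y_treated])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b[7])  # key parameter: treated-control difference in post-intervention trends
```

With no program effect in the simulated data, the last coefficient (the analogue of _z_x_t2003) is estimated near zero, which is the same pattern as in the Delaware regression output.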

The itsa routine will also display the post-intervention trends for the treated and control groups, and the difference between the two. The panel below gives this information, and you can see that both blacks ("treated") and whites ("controls") were decreasing by roughly 5-6% annually. The black decline is larger than the white decline, but it is hard to make a case for a meaningful difference in the post-intervention trends with these data. The 95% confidence interval for the black-white difference in post-intervention trends goes from -4% to +3%.

Source: Author's calculations of SEER data.

Okay, as one last example, rather than compare Delaware blacks to Delaware whites (since we might worry about whites also being affected by the program), we could instead compare Delaware blacks to blacks in a different state without the same program. I grabbed some additional data for blacks in Maryland, and we could instead use them as a control population. Here are the ITS results:

Source: Author's calculations of SEER data.

In this case it looks a little more plausible that the post-intervention trends show steeper declines for blacks in Delaware than in Maryland. The regression estimates are posted below:

Source: Author's calculations of SEER data.

These estimates are obviously related to the graphs above, but the coefficient on the post-intervention product term that captures the difference in post-intervention trends between blacks in Delaware and Maryland is still small and close to zero. The post-trend estimates do show that the post-intervention trends are more negative (i.e., a steeper mortality decline) in Delaware (~6% annual decline) than in Maryland (~4% annual decline), but these are still not reliably different from one another, although this seems a more compelling story than we had before. 

I am not submitting these analyses for publication, and this is not a serious attempt to estimate the program's impact, but the fact that no alternative explanations were even explored by Grubbs et al. is what really disappoints me, since these methods have been around for a long time. [Also, the data I used for these analyses are here, if anyone is interested]. 

Who cares?

You may be wondering why I would take the trouble to write this post, given that the paper was published in 2013. Old news, right? But I began thinking more about this precisely because I have recently seen this paper cited a number of times (currently 60 citations, according to Google Scholar) as evidence of a program that has successfully "worked" to reduce black-white differences in health, which is a topic I do care about.

In a recent interview on the "Future of Health Equity Research" with the Robert Wood Johnson Foundation, the sociologist David Williams, who has done some really fine work on black-white differences in health, responded to a question about interventions that have been shown to reduce health inequalities by citing the Delaware example: 

And the State of Delaware provided colorectal cancer screenings and treatment for all residents but created targeted interventions and marketing campaigns for the black community, which experienced lower screening rates and higher colon cancer incidence rates than whites. As a result, screening rates among blacks rose from 48% in 2002 to 74% in 2009, which was equal to the improved screening rate for whites. Additionally, colon cancer incidence rates declined from 67 and 58 per 100,000 residents for blacks and whites respectively in 2002 to 45 per 100,000 residents among both populations in 2009.
— David Williams

Elsewhere, this study also shows up as evidence cited by other researchers that the program eliminated black-white differences in colorectal cancer outcomes. Renee Williams and colleagues reviewed the epidemiology of colorectal cancer among African-Americans and noted the impact of the program implemented in Delaware:

The Delaware Cancer Advisory Council was established in 2001 with the goal of developing a Statewide cancer control program. This program provided reimbursement and covered 2 years of the cost of care for newly diagnosed CRC for the uninsured. Special programs to reach the African American community were put in place partnering with community organizations. Between 2001 and 2009 they equalized the screening rates between blacks and whites and significantly diminished the gaps in mortality. In 2001, the CRC mortality was 31.2 per 100,000 in blacks and 19.5 per 100,000 in whites in 2001. In 2009 these numbers decreased to 18.0 per 100,000 and 16.9 per 100,000, respectively [citing Grubbs et al. 2013].
— Williams et al. Colorectal Cancer in African Americans: An Update. Clinical and Translational Gastroenterology, 2016;7(7), e185

And another recent paper by an esteemed group of public health researchers at the Harvard School of Public Health repeated this same claim:

We note that a recent comprehensive screening program deployed in the state of Delaware resulted in the elimination of stage at diagnosis disparities and in the reduction of survival disparities [citing Grubbs et al. 2013].
— Valeri et al. Cancer Epidemiol Biomarkers Prev; 25(1); 83–89

Again, it is worth noting that in both of these examples the entire reduction or elimination of black-white inequality in colorectal cancer is attributed *to the program*, yet, as we showed above, this claim is very hard to back up with solid evidence.

More distressingly, Grubbs et al. are still making serious claims that attribute the reduction or elimination of black-white inequalities in cancer outcomes entirely to the program. An abstract of a conference presentation in the fall of 2016 says it all:

Finally, I should say, and want to emphasize, that the point of this post is not to try and embarrass or shame these researchers; they are no doubt committed to reducing the appalling gap in black-white health in their state, and I certainly applaud them for creating the support and political will to implement a program designed to reduce racial inequalities in colorectal cancer. And I am, of course, happy that black-white differences in many of these cancer-related outcomes have diminished in Delaware. Nor is my goal to question whether a program like the one in Delaware has produced some positive benefits; it may very well have, and it may have made an important contribution to the decline in black-white inequalities. But the program also costs money, and before we consider it “a model” for other states, we should get serious about understanding its impact. For all I know, other states may be trying to replicate this program and anticipating the same result. If the program was not the chief reason why black mortality rates decreased faster than white rates, then we have little reason to expect the result to be replicated elsewhere, and that wastes resources that could be spent on other efforts.

My main motivation is to alert researchers (and consumers of research) that without a proper study design to isolate the impact of a particular program or intervention on health, we cannot know whether the program itself is responsible for the positive (or negative!) outcomes. What I hope is that, when considering whether or not their program was successful, researchers such as Grubbs et al. first remember to ask themselves, “What would have happened in the absence of the program?” This kind of thinking will help (I hope) to improve the reliability and validity of claims about the effectiveness of public health interventions. We can, and should, demand better evidence for the investments we make in public health.

Bad news for US life expectancy

The big health news this week was a report that, for the first time since 1993, life expectancy in the United States went down (incidentally this was also a story earlier this year when the preliminary numbers were published, coverage by the NYT here). Life expectancy decreased from 81.3 to 81.2 years for women and from 76.5 to 76.3 years for men. Those seem like small changes, right? So why is this such a big story? Because US life expectancy has been steadily increasing--with minor interruptions--for decades. When these numbers are reported each year (and usually don't make headlines) it is because the story is that life expectancy increased again, mortality is improving, and we are doing something right (or, at least, we aren't doing something wrong). The graph below shows trends in life expectancy at birth since 1929 for US men and women:

From published estimates by the US National Center for Health Statistics.


It's an impressive record, notwithstanding the fact that, despite this progress, the US still lags behind many other rich (and some poor) countries in terms of life expectancy.

This story has gotten a lot of coverage, and there are solid articles that have attempted to discuss the reasons for the decrease. Scientific American ran an interview with Robert Anderson, chief of the Mortality Statistics Branch at the NCHS, who published the report. The interview discusses increases in cardiovascular disease, the 2015 flu epidemic spike that potentially spilled over to cause more cardiovascular deaths, and the continuing epidemic of overdoses from prescription opioids and heroin.

The NY Times focused quite a bit on socioeconomic factors and the role they may have played in the decline. They quoted Columbia's Peter Muennig, who said that "popular theories for the cause of the decline, including an increase in obesity rates and an opioid epidemic, fail to explain a problem that feels broader."

Vox also had a nice write-up that discussed the wide range of diseases affected, potential mechanisms related to stress and poor health behaviors, but also (and positively, in my view) noted that, despite a recent focus on white mortality patterns in the press, this recent mortality increase did little to alter the gap in life expectancy between blacks and whites.

There are clearly important questions about what the underlying causes of this decrease are, especially if it is sustained and shows up in next year's statistics as well. Anderson from NCHS says that the preliminary data from 2016 suggest that 2015 may just be a "blip":

if we look at the age-adjusted rate for 2015, which is about 733 deaths per 100,000 population, and look at the age-adjusted rate for the 12 months ending in June 2016, it drops back to 720 per 100,000—and that’s actually below the 2014 level, so it may very well be a blip.
— Robert Anderson, NCHS, quoted in Scientific American

Most of these stories have centered on the fact that 8 out of the 10 leading causes of death showed increases, most distressingly for cardiovascular diseases (chiefly heart disease and stroke), which have topped the cause-of-death tables for decades. Because cardiovascular diseases represent the leading cause of death, their increase is potentially what makes this a story about a decrease in life expectancy rather than about a few causes of death being higher from one year to the next. And there are, indeed, signs that the historic declines in deaths from cardiovascular diseases may be coming to an end. A recent paper in JAMA Cardiology (Sidney et al. 2016;1(5):594-599) shows that the declines have been stalling since around 2011 or 2012, which is of great concern since heart disease continues to be the number one killer in the US.

But none of these stories really tell us about which causes of death played the largest role in the decline of life expectancy. So, let's take a very early and simple look at this question.

Decomposing the decrease in life expectancy

We'll do this separately for men and women, since the mortality patterns and trends are often quite different by gender. The data on mortality by age and cause of death are publicly available and easily downloadable from CDC WONDER. I used the multiple cause of death database for 2014 and 2015, and asked for the "standard" list of 113 causes of death used by NCHS.

Because a decrease in life expectancy can arise due to changes in mortality at different ages, I used a technique developed by the demographer Eduardo Arriaga to decompose the change (there are other ways to do this, but this one is fairly straightforward and simple).

It's important to note as well that life expectancy is essentially a weighted average of age-specific death rates, with deaths at younger ages given more weight (the steep increases in life expectancy in the early 20th century were largely driven by substantial reductions in infant and child mortality). Because it is a weighted average, life expectancy at birth can change in lots of different ways: it could decrease even if mortality improves at younger ages, provided it worsens enough at older ages, and vice versa. The decomposition helps to sort out which age groups made the most difference.
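Arriaga's method has closed-form "direct" and "indirect" effects; a closely related stepwise-replacement decomposition, which also sums exactly to the total change in life expectancy (though the contributions depend slightly on the replacement order), is easier to sketch. The death rates below are made up for illustration and use simplifying life-table assumptions (deaths at mid-interval, a single open-ended final age group):

```python
import numpy as np

def life_expectancy(mx, n=10):
    """e0 from age-specific death rates mx, closed intervals of width n years
    plus a final open-ended interval. Assumes deaths occur at mid-interval."""
    mx = np.asarray(mx, dtype=float)
    ax = n / 2.0                                   # years lived by those who die
    qx = n * mx / (1 + (n - ax) * mx)              # probability of death in interval
    qx[-1] = 1.0                                   # everyone dies in the open interval
    lx = np.concatenate([[1.0], np.cumprod(1 - qx)[:-1]])  # survivors at interval start
    dx = lx * qx                                   # deaths in interval
    Lx = n * (lx - dx) + ax * dx                   # person-years lived in interval
    Lx[-1] = lx[-1] / mx[-1]                       # open-ended interval
    return Lx.sum()

def decompose(mx_old, mx_new, n=10):
    """Age-group contributions to the change in e0, by swapping in the new
    rates one age group at a time (youngest first). Sums exactly to the total."""
    mx_old, mx_new = np.asarray(mx_old, float), np.asarray(mx_new, float)
    contrib = np.empty(len(mx_old))
    for i in range(len(mx_old)):
        upper = mx_old.copy(); upper[:i + 1] = mx_new[:i + 1]
        lower = mx_old.copy(); lower[:i] = mx_new[:i]
        contrib[i] = life_expectancy(upper, n) - life_expectancy(lower, n)
    return contrib

# Made-up rates for eight 10-year age groups (0-9, ..., 70+), per person-year
mx_2014 = np.array([0.002, 0.0005, 0.001, 0.002, 0.005, 0.012, 0.030, 0.090])
mx_2015 = mx_2014 * np.array([1.00, 1.00, 1.02, 1.02, 1.00, 1.01, 1.01, 1.02])

contrib = decompose(mx_2014, mx_2015)
total = life_expectancy(mx_2015) - life_expectancy(mx_2014)
print(contrib.round(3), round(total, 3))  # contributions sum to the total change
```

Age groups whose rates worsened get negative contributions (they pushed e0 down), unchanged groups contribute zero, and the contributions sum to the overall change, which is the property that makes the percentage breakdowns below meaningful.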


I calculated life expectancy for women as 81.36 in 2014 and 81.20 in 2015, a 0.16 year decrease, which is close to what was reported by NCHS. For men the values were 76.48 and 76.31, a decrease of 0.17 years. Below you can see the graph showing the proportion of the decline in life expectancy due to mortality changes at different ages, by gender:

Author's calculations. Source: CDC WONDER http://wonder.cdc.gov/


A few things to notice here.

First, the contributions across age groups are quite different by gender. Nearly 60% of the (0.16-year) decrease in life expectancy among women came from unfavorable mortality changes at very old ages (85 and over). On the other hand, for men 70% of the decrease was due to mortality increases among those aged 15 to 44 years. Although infant mortality can strongly influence life expectancy, it played no role in the worsening of life expectancy for either men or women (though the fact that it did not help push life expectancy upward is probably not a good sign).

Second, nearly all age groups (except the 45-54 year olds) helped to push life expectancy down for both men and women, which adds some weight to the story that this decrease may be associated with a broad range of factors affecting mortality.


With respect to causes of death, it obviously isn't feasible to try and look at all 113 causes, so I broke it down to some broad categories of disease (there are many other ways to do this). Here are the results:

Author's calculations. Source: CDC WONDER http://wonder.cdc.gov/


Again, there are a lot of potentially interesting things going on here.

  • For women, most of the story appears to be worsening cardiovascular disease and Alzheimer's disease mortality. These two groups accounted for 94% of the net decrease (it is important to remember that the overall change is a net change that results from some causes that improve and some that worsen).
  • For men, most of the story is about deaths from injuries, both unintentional and intentional. The largest contributor (41%) was unintentional poisoning, but homicide also contributed 20%, motor vehicle crashes 13%, and suicide 7%. Altogether these causes accounted for 82% of the net decrease in life expectancy for men. 
  • Progress in cancer mortality for both men and women kept life expectancy from declining by more than it did (i.e., the decrease in life expectancy would have been greater had cancer mortality rates not declined from 2014 to 2015).

There is obviously quite a lot of additional work to be done exploring the decrease, but a few things to keep in mind.

One, reductions in life expectancy are obviously bad, but they are uncommon, and in this case they are also small in magnitude, so it is important to keep that in mind when interpreting the proportional contribution of different causes.

Two, there has been a lot of attention focused on the opioid epidemic, but it would seem that the decline among men is linked more broadly to a number of injury-related causes. I have yet to see many reports even mention homicide (perhaps because there has been so much focus on the middle-aged white population after the Case/Deaton study last year, where homicides do not figure prominently), yet homicide played a larger role among men than did Alzheimer's. My guess is that this played an even larger role among non-Hispanic black men.

Three, there are many potential hypotheses that may be consistent with some parts of the patterns above. Many of the researchers quoted talked about wide ranging stress, economic inequality, poor health behaviors, increases in obesity, and any number of additional "meta-theories". However, it seems unlikely (to me) that any one of these explanations will be sufficient to account for the diverse patterns by gender, age, and cause of death we see here, not to mention race, which may make things more complicated. It may be that a number of different channels are operating simultaneously rather than one "big thing".

Finally, and more controversially, there is a lot of literature suggesting that mortality rates worsen when the economy improves (and vice versa). Yes, this sounds counterintuitive, but there is a lot of work from Christopher Ruhm, Jose Tapia-Granados, and others that makes a reasonably good case for this hypothesis. That being said, whatever increases in death rates come from an improving economy (one common mechanism is an increase in traffic accidents as more people work and commute, which we do see here and which has received some press attention), they are usually not large enough to overwhelm all of the other improvements in lifestyles, public health, and medical care that make death rates *usually* decline.

It will be very interesting to follow this story as it unfolds.

For those interested, the mortality data I used for the decomposition can be downloaded here, and the Stata do-file is here.

Is support for democracy really declining?

A recent paper by Roberto Foa (University of Melbourne) and Yascha Mounk (Harvard) suggesting that support for democratic governance is declining among younger generations has generated a fair bit of buzz. The New York Times ran a relatively long story on it, highlighting a graph purporting to show declines in support for democracy among younger generations that had been making the rounds on Twitter (see for example here, retweeted nearly 6,000 times). Below is the NYT version of the graph (which, as far as I can tell, does not appear in the actual paper, so I am a little unclear about where exactly these data came from):

These "declines" are dramatic, but this is essentially a snapshot of two pooled cross-sectional datasets (Waves 5 and 6 of the World Values Survey data, corresponding roughly to surveys conducted around 2006 and 2012). By "snapshot" what I mean is, we survey individuals at a single point in time (e.g., in 2012), and then we plot the percentage of individuals responding that it is "essential" to live in a democracy according to the year they were born. Fine. Nothing wrong with that. And I don't doubt that these estimates are correct in showing that individuals born more recently are less likely to say that it is essential to live in a democracy. 

However, it's important to note that these estimates, on their own, say nothing about trends in beliefs about democracy, unless one is willing to assume that individual perceptions of how "essential" it is to live in a democracy are fixed at birth. I'm not a political scientist, but I certainly know that many of my political opinions have changed over the years (there must be data on this, but I'm too lazy to search for it right now), and I'd gather that is true for many others. 

Nevertheless, the message the New York Times was pushing clearly did suggest that faith in democracy was declining:

Across numerous countries, including Australia, Britain, the Netherlands, New Zealand, Sweden and the United States, the percentage of people who say it is “essential” to live in a democracy has plummeted, and it is especially low among younger generations.

Saying the percentage has "plummeted" strongly suggests that it has declined in a substantial way over time, yet the graph shown above cannot tell us whether or not that is true.


The authors of the original paper made it pretty straightforward to see how they constructed their numbers, and the WVS has a brilliant website where you can easily download the data, so I grabbed it to see whether a closer look at more than one survey would provide a similar picture.

With a little tweaking (code is at the bottom of the post) I think I got pretty close to reproducing the NYT figure:

Okay, so now it seems like we are working with basically the same outcome data as did the authors. What I'm interested in here is the change in the fraction of people reporting that living in a democracy is "essential" (measured as 10 out of 10 on a scale of importance). Because the WVS asked this question in both Wave 5 (around 2006) and Wave 6 (around 2012), we can look at a couple of things. First, we can see whether or not the proportion reporting that it's essential to live in a democracy has "plummeted" between the two surveys, as the NYT claimed above.

Second, and more relevant to the broader point I believe the authors were making about younger generations not valuing democracy in the same way as their elders, we can look and see, for individuals born in the same decade, whether their views are actually changing. In the table below I have calculated, just for the United States, the percentage reporting that it is essential to live in a democracy for each birth cohort, by survey year for waves 5 and 6 of the WVS:

Birth cohort    2006     2011     Change
1930-39         75.4%    71.3%    -4.0%
1940-49         67.0%    58.4%    -8.6%
1950-59         56.5%    57.4%     0.8%
1960-69         48.4%    50.6%     2.2%
1970-79         40.3%    44.1%     3.8%
1980-89         35.8%    29.2%    -6.6%
Total           53.2%    50.9%    -2.3%
Average within-cohort change: -1.2%
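The within-cohort calculation is simple enough to sketch in a few lines of pandas on toy data (the post's actual code, in Stata, is at the bottom; the variable names and the eight made-up respondents here are purely illustrative):

```python
import pandas as pd

# Toy data: one row per respondent, with survey year, birth cohort, and
# whether they rated living in a democracy 10/10 ("essential").
df = pd.DataFrame({
    "syear":     [2006] * 4 + [2011] * 4,
    "bcohort":   ["1930-39", "1930-39", "1980-89", "1980-89"] * 2,
    "essential": [1, 1, 1, 0,  1, 0, 0, 0],
})

# Percent "essential" by cohort and wave, then the within-cohort change
pct = df.groupby(["bcohort", "syear"])["essential"].mean().unstack() * 100
pct["change"] = pct[2011] - pct[2006]
print(pct)
```

Reading across a row of `pct` gives the same cohort over time, which is the comparison the snapshot graph cannot make.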

A few things are worth noting here. First, as the authors of the paper and the NYT story emphasized, the decrease in the belief that it is essential to live in a democracy across birth cohorts is quite large (>70% for those born in the 1930s vs. ~35% for those born in the 1980s), and indeed this is troubling, particularly if younger generations do not change their thinking as they get older. (And as I noted before, I'm not a political scientist--there may be reams of very good evidence out there that opinions like this don't change much as people age, in which case this is all the more distressing. But an age effect is not the same thing as a secular trend, which is the point I am making here.)

Second, the claim that the NYT made about the "plummeting" of the percentage reporting that it is essential to live in a democracy is hard to see in these data. The overall percentage decreased from 53% to 51% in the roughly 6 years between survey waves. I do think it is probably a bad sign for this kind of indicator to be declining over time, but I find it very difficult to characterize this as a "plummeting" change.

Third, and perhaps most interesting, if you look across the rows you can see the change within the same birth cohorts. For example, the first row shows that 75% of people born in the 1930s said it was "essential" to live in a democracy in 2006, but 6 years later only 71% of people in that same birth cohort said it was essential. If anything, this suggests that it is older, rather than younger, individuals who seem to be "losing" their belief that living in a democracy is essential. In fact, the percentage regarding it as essential to live in a democracy is actually increasing over time in the "younger" cohorts of the 60s and 70s. (In fairness, the 1980s cohort does show a decline.) The average within-cohort change (weighted by cohort size) is a decline of about 1 percentage point, but you can see this is due to very different patterns for older and younger cohorts.

I did this same exercise for 3 of the other countries shown in the figures above (data for England and New Zealand for this outcome were not available in both survey waves) and plotted the results below. Each graph shows the change in opinion within each birth cohort between the two surveys, with a 95% confidence interval for each estimate.

The estimates I showed in the table above for the United States are reflected here, and one can also see that it is hard to paint a picture of anything like robust declines in believing that it is essential to live in a democracy. In Australia, for example, the fraction of individuals who consider it essential to live in a democracy is actually increasing for every single birth cohort, despite the fact that it is lower in general for younger cohorts (the overall percentage in Australia also increased from 63% to 68% between the two surveys). Of course, there is measurement error here so nothing is really definitive for any country, but this pattern of results is considerably more nuanced than the NYT reported. I have not read the entire academic paper in detail, but I think it is safe to say that the authors do not use only this information to form their conclusions about the prospects of democracy as a form of government in rich countries. But I do think that many people (including Amanda Taub at the NYT) have stretched the interpretation of these data in ways that go well beyond what they are capable of revealing.


For anyone who is interested, I have provided the Stata code I used to generate these numbers at the bottom of this post:

capture log close
log using declining-democracy.txt, replace text

//  program:    declining-democracy.do
//  task:		replicate NYT (How Stable Are Democracies? ‘Warning Signs
//              Are Flashing Red’ http://nyti.ms/2gdv7I6) 
//  author:     sam harper \ 30nov2016

//  #0
//  program setup

version 14
set linesize 80
clear all
macro drop _all	

// #1
// load World Values Survey data for Wave 5

* Wave 5 data
use V1 V2 V162 V236 using ///
  "WV5_Data_stata_v_2015_04_18.dta", clear

* restrict to AUS, SWE, NED, USA, NZL, GBR
keep if inlist(V2,752,36,528,840,554,826)

* rename variables
rename (V1 V2 V162 V236) (wave country impdemo byear)

* generate survey year midpoint
gen syear = 2006
label var syear "Survey year (midpoint)"

* save temporary dataset
tempfile wvs5
save `wvs5', replace

// #2
// load World Values Survey data for Wave 6

* now for Wave 6 data
use V1 V2 V140 V241 using ///
  "WV6_Stata_v_2016_01_01.dta", clear

* restrict to AUS, SWE, NED, USA, NZL, GBR
keep if inlist(V2,752,36,528,840,554,826)

* rename
rename (V1 V2 V140 V241) (wave country impdemo byear)

* generate survey year midpoint
gen syear = 2012
label var syear "Survey year (midpoint)"

* append Wave 5 data
append using `wvs5'

// #3
// generate outcomes and birth cohort categories

* country order for graphing
recode country (752 = 1 "Sweden") (36 = 2 "Australia") ///
  (528 = 3 "Netherlands") (840 = 4 "United States") ///
  (554 = 5 "New Zealand") (826 = 6 "Britain"), gen(cname)

* generate outcome variable: How impt to live in a democracy? 
* 1 = not at all, 10 = absolutely
recode impdemo (10=1 "Yes") (1/9=0 "No") (-5 -4 -2 -1 = .), gen(demo)
label var demo "Democracy 'essential'?"

* generate birth cohort categories and labels
egen bcohort = cut(byear), at(1930(10)1990) icodes
label define bcohort 0 "1930-39" 1 "1940-49" 2 "1950-59" ///
  3 "1960-69" 4 "1970-79" 5 "1980-89"
label values bcohort bcohort
label var bcohort "Birth cohort"

// #4
// plot outcomes by birth cohort

* reproduce NYT figure
qui logit demo cname##bcohort, nolog
qui margins cname#bcohort
marginsplot, by(cname) recastci(rarea) ///
  byopts(rows(1) imargin(medsmall) ///
  title("Percentage of people who say it is 'essential' to live in a democracy", ///
  size(medium)) ///
  note("Source: Author's calculations of the World Values Surveys, Waves 5 and 6", ///
   color(gray) size(vsmall))) ///
  ylabel(0 "" .25 "25%" .5 "50%" .75 "75%" 1 "100%", angle(horizontal)) ///
  xlabel(0 "1930s" 5 "1980s") ciopts(color(ltblue)) ///
  plotopts(lcolor(edkblue) mcolor(edkblue) msymbol(O) msize(small)) ///
  ytitle("") scheme(tufte) name(nyt, replace) xsize(6.5)
graph export "nyt-replication.png", replace
* within cohort change in proportion reporting "essential"
qui logit demo cname##bcohort##wave if cname<5, nolog
qui margins r.wave, over(cname bcohort)
marginsplot, by(cname) xdim(bcohort) recast(scatter) recastci(rspike) ///
  byopts(rows(1) imargin(medium) ///
  title("Change in % saying it is 'essential' to live in a democracy, 2006 to 2012", ///
  size(medium)) ///
  note("Source: Author's calculations of the World Values Surveys, Waves 5 and 6", ///
   color(gray) size(vsmall))) ///
  ylabel(-.2 "-20%" -.1 "-10%" 0 "" .1 "10%" .2 "20%", angle(horizontal)) ///
  xlabel(0 "1930s" 5 "1980s") ciopts(color(ltblue)) ///
  plotopts(lcolor(edkblue) mcolor(edkblue) msymbol(O) msize(small)) ///
  ytitle("") scheme(tufte) yline(0, lcolor(red)) name(change, replace)
graph export "change.png", replace

log close







Disability, Welfare, Opioids...

Nick King, Erin Strumpf and I just published a new editorial in Annals of Internal Medicine on the potential association between changes to welfare policy in the 1990s, the subsequent rise of the disability rolls (particularly for mental health and musculoskeletal conditions such as back pain), and the extraordinary rise in deaths from drug overdoses related to prescription opioids (and now, increasingly, heroin). An ungated version of the paper will be available for 6 months here.

A Bad Taste That Keeps Not Getting Any Better....

Reposted from http://bit.ly/1orsq4A. Some interesting comments by Jay Kaufman on the responsiveness (or lack thereof) of the PLoS ONE editors to serious concerns about one of their papers on race:

Jonathan Eisen already posted on this blog about a PLoS ONE paper by Mason et al. published on 23 October 2013.  And he posted related comments on the PLoS ONE website.  I also commented at this site, in reference to his comments and the authors' response. The purpose of my comments here is just to review those concerns and comment additionally on the PLoS ONE response and what this means for the journal's publication model and the progress of science.

The paper by Mason and colleagues analyzes data on 48 people in each of 4 self-identified ethnic groups (African American, Caucasian, Chinese, and Latino). These study subjects are apparently volunteers, and the paper only states that they are non-smokers over 18 years old who are free of a list of diagnosed diseases and who have not recently had their teeth cleaned. Based on the text of the published paper, there is no consideration of their age, diet, social class, or even gender. The authors culture bacterial species from the study subjects and process the data through an algorithm that maximizes the prediction of racial group membership based on these measured data.

The prediction is moderately successful, but this could result from any number of unsurprising reasons. For example, if alcohol consumption affects some particular bacterial species, and whites drink more than Asians in central Ohio, then whatever species is diminished by alcohol exposure would help predict that a sample was from a white rather than from an Asian volunteer. And likewise for any of a million possible lifestyle, social class and demographic differences.  In fact, this is a general problem with data mining exercises that Lazer et al describe in the current issue of Science.
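To illustrate this point numerically (everything here is invented for the example: the "species X" mechanism, the drinking-rate differences, and the groups themselves), a feature shaped entirely by a lifestyle difference can "predict" group membership well above chance with no genetic signal at all:

```python
import random
random.seed(1)

# Hypothetical mechanism: bacterial species X is depleted by alcohol, and
# group A drinks more than group B. X then "predicts" group membership
# even though genetics plays no role in the simulation.
def species_x_level(drinks_per_week):
    return 10 - 0.5 * drinks_per_week + random.gauss(0, 1)

group_a = [species_x_level(random.gauss(8, 2)) for _ in range(48)]  # heavier drinkers
group_b = [species_x_level(random.gauss(2, 1)) for _ in range(48)]  # lighter drinkers

# Crude threshold classifier: below the pooled mean -> guess group A
threshold = sum(group_a + group_b) / 96
correct = sum(x < threshold for x in group_a) + sum(x >= threshold for x in group_b)
print(f"accuracy: {correct / 96:.2f}")  # well above the 50% chance level
```

Any of the "million possible lifestyle, social class and demographic differences" could play the role of drinking here, which is why predictive accuracy alone tells you nothing about the source of the difference.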

The authors provide no information about how balanced this sample is with respect to any of these variables. Maybe the 48 Hispanics are younger than the 48 whites on average, or have more tooth decay or eat more refined sugar or any of a million other possibilities. The fact that these countless potentially imbalanced factors get represented in the oral bio-environment hardly seems surprising, and the fact that these behaviors and exposures might be differential by race is an observation that is completely trivial from a sociological perspective.

My concern here, however, is that the authors assert in the published text that these differences do not arise from any of these myriad environmental factors, but from some innate genetic characteristics of the groups. In the Discussion section on page 3 they state that "ethnicity exerts a selection pressure on the oral microbiome, and...this selection pressure is genetic rather than environmental, since the two ethnicities that shared a common food, nutritional and lifestyle heritage (Caucasians and African Americans) demonstrated significant microbial divergence." This is a remarkable statement: that Caucasians and African Americans experience no differential dietary or lifestyle factors. It is directly contradicted by thousands of published papers in sociology, epidemiology and anthropology that document these differences for reasons of culture, geographic origin, social class and discrimination.

Jonathan Eisen's post directly confronted the authors on this point, and they responded with the following explanation:

"Subjects were selected based on extensive questionnaire surveys and clinical examinations to ensure homogeneity. These questionnaires evaluated educational level, socio-economic status, diet and nutritional history, systemic health status, oral hygiene habits and dental visits, among other things."

This is surely an important statement about the research design, but the problem is that it appears nowhere in the peer-reviewed text of the published paper. What exactly do the authors mean when they insist that the study subjects were perfectly balanced on factors such as socio-economic status and nutritional history? These complex social and lifestyle variables are notoriously difficult to define and measure. While the authors describe the laboratory techniques in baroque detail, they do not even mention in the published paper that they measured these factors, let alone how these variables were defined and considered in the analysis. This represents a profound limitation for the reader in assessing the validity of these measures and adjustments, and therefore the adequacy of the claimed "homogeneity". The complete omission of these crucial aspects of the analysis in the paper prevents the reader from investing much confidence in the boldly stated claim that observed differences are "genetic rather than environmental" in origin.

I expressed these concerns in my own post at the journal website on 14 November 2013, but the authors did not respond.  Therefore, at Jonathan's suggestion, I addressed this concern to the PLoS ONE editors in an e-mail on 22 November 2013. I got passed along from one editor to another, and finally I got a very nice response from Elizabeth Silva on 4 December 2013. She wrote:
I wanted to let you know that I am discussing this article and your concerns with both the Academic Editor and the authors, as well as with my colleagues. We take such concerns very seriously and will ensure that appropriate measures are taken to correct any errors or discrepancies. 

Then I waited.  After 2 months I had heard nothing, so I wrote to Dr. Silva again asking for any word on progress, but received no reply.  So I waited another month.

On 4 March it had been 3 months since the note from PLoS ONE promising appropriate measures to correct any errors or discrepancies, so I wrote again, this time a bit more insistently.  This message did finally generate a quick and reassuring response from Dr. Silva:
I really am very sorry for the extended delay in replying to you, and for my neglect in providing you with an update. Following your correspondence I contacted the authors to ask them for additional information relating to their statement that they corrected for confounding factors, and details of these methods. They promptly replied with a table of details of the baseline variables that they corrected for, and that they described in the comment on their article (see attached), as well as an additional correction to one of their figure legends. I then contacted the Academic Editor, Dr. Indranil Biswas with the full details of your concerns, as well as the table sent by the authors and the correction they requested for their figure legend. We asked Dr. Biswas to revisit the manuscript, in light of this new information, and he has informed us that he feels the conclusions of the manuscript are sound. We will now work with the authors to draft and issue a formal correction to the published article to update the methods to include the table, and to amend the figure legend in question.

The table that Dr. Silva forwarded displayed a list of variables and a p-value for some kind of test between the values in the 4 race groups. The test is not specified (t-test? chi-square test?) but presumably it is for any difference in means or proportions between the 4 groups.  Most of the p-values are large, indicating little evidence for any difference between the groups in income, age, education, or frequency of tooth-brushing, etc.  Based on this table, the populations differed only in their diets, which were characterized as "Asian Diet", "Hispanic Diet" and "American Diet".  Not surprisingly, the Asians were significantly more likely to report an "Asian Diet" and the Hispanics were significantly more likely to report a "Hispanic Diet".  The Blacks and Whites had similar reported consumption of the "American Diet", which presumably was the basis for the authors' assertion that these groups have identical social environments.

To date, there has been no correction made to the Mason et al paper at the PLoS ONE website.  Therefore it is perhaps somewhat premature to speculate on how the authors will address the concern voiced by Jonathan Eisen's posted comment and blog post that balance across a handful of measured covariates does not in any way imply balance across all relevant factors except for genetics. Indeed, it has long been argued in the epidemiology literature that one cannot make indirect inferences about genes by measuring and adjusting for a few environmental exposures and attributing all remaining differences to genes.  The argument that Blacks and Whites in Ohio experience identical environments is clearly false, even if a handful of measured covariates are not significantly different in their small convenience sample, the exact origin of which is still obscure.
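There is also a statistical reason large p-values in that table are weak evidence of balance: with only 48 subjects per group, a balance test has little power. A quick simulation sketch (hypothetical effect size, normal data with known unit variance for simplicity) shows how often a real difference between two such groups would be declared "not significant":

```python
import random
random.seed(2)

# Two groups of 48 that truly differ by 0.4 SD on some covariate.
# How often does a two-sample z-test call them "balanced" (p > .05)?
def simulate_once(n=48, diff=0.4):
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(diff, 1) for _ in range(n)]
    mean = lambda xs: sum(xs) / len(xs)
    se = (1 / n + 1 / n) ** 0.5      # SE of the difference in means, SD known = 1
    z = abs(mean(a) - mean(b)) / se
    return z < 1.96                   # "no significant difference"

missed = sum(simulate_once() for _ in range(2000)) / 2000
print(f"share of real differences declared 'balanced': {missed:.2f}")
```

Under these assumptions a genuine 0.4 SD imbalance slips past the test roughly half the time, so "no significant difference" in a sample this small is far from proof of homogeneity.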

There are many observations that can be made from this episode. I offer just a few:
  1. These authors are assiduous in describing their lab techniques, but regarding the study design and analysis they are quite cavalier.  Presumably the reviewers were not population scientists, and so they failed to point out these embarrassing flaws. This raises the question of whether a multidisciplinary journal such as PLoS ONE has the relevant expertise to screen out scientifically invalid papers. The fact that Dr. Silva suggested that the authors' table of covariates and p-values solves the issue demonstrates a wide gap of understanding.  Specialty journals that handle a narrow disciplinary range are not faced with this kind of crisis of competence.
  2. PLoS ONE is such a large operation with so many papers, that quality control seems to suffer. These beleaguered editors are responsible for an enormous publishing volume. Has quantity overwhelmed quality to the extent that gross errors of logic slip through? Months later, the Mason paper has been accessed thousands of times and generated a great deal of media attention, and yet no correction or erratum has appeared, despite the fact that the authors freely admit that the methods in their published paper are not accurate. 
  3. The publishing model gives PLoS ONE a big incentive (almost $2000) to accept a paper, but once it is published, little incentive to correct or withdraw it. 

Sadly, this is not an isolated example.  This week, PLoS ONE published a paper by Wikoff et al., which makes a similar logical gaffe about an observed racial difference proving a genetic difference.  We could post a comment online, but it seems that nobody (neither the authors nor the editors) has much time to spend monitoring such comments, nor much incentive to care about them.  The authors have their publication, the journal has its $2000, and another tiny piece of horrific misinformation has been released into the world.  The basic philosophy of PLoS ONE is to reduce the gate-keeper role of scientific publication. I am starting to become convinced that a little gate-keeping is not such a bad idea.

Don't do it...

Jay Kaufman sends along a bit from the Economist about the futility of earning a PhD. My favorite part:

One thing many PhD students have in common is dissatisfaction.

But this gets to the heart of the matter:

There is an oversupply of PhDs. Although a doctorate is designed as training for a job in academia, the number of PhD positions is unrelated to the number of job openings. 

And later...

America produced more than 100,000 doctoral degrees between 2005 and 2009. In the same period there were just 16,000 new professorships...
...Even in Canada, where the output of PhD graduates has grown relatively modestly, universities conferred 4,800 doctorate degrees in 2007 but hired just 2,616 new full-time professors.


Right now this seems less true in epidemiology (the job market still seems relatively strong), but recent doctorates still depend on retirements or deaths to open up positions.