Is 4/20 deadly?

[Note: this post was written with Adam Palayew]

This Friday is April 20th, a day that for most people means nothing. Americans are relieved at having filed their taxes, Canadians are stressed in anticipation of filing theirs, and in Montreal we are still wondering if winter will ever end. However, for a small minority of people, April 20th (or 4/20) is a day of celebration of all things cannabis, with large outdoor gatherings where cannabis consumption is promoted, especially at 4:20pm.

Big deal, right? Let ‘em have fun. However, a few weeks ago a short paper was published in a leading medical journal, JAMA Internal Medicine, suggesting that, over the 25 years from 1992-2016, excess cannabis consumption after 4:20pm on 4/20 increased fatal traffic crashes by 12% relative to fatal crashes that occurred one week before and one week after. Here is the key result from the paper:

In total, 1369 drivers were involved in fatal crashes after 4:20 PM on April 20 whereas 2453 drivers were in fatal crashes on control days during the same time intervals (corresponding to 7.1 and 6.4 drivers in fatal crashes per hour, respectively). The risk of a fatal crash was significantly higher on April 20 (relative risk, 1.12; 95% CI, 1.05-1.19; P = .001).
— Staples JA, Redelmeier DA. The April 20 Cannabis Celebration and Fatal Traffic Crashes in the United States. JAMA Intern Med, Feb 18, 2018, p.E2

Naturally, this sparked (heh) considerable media interest, not only because p<.05 and the finding is “surprising”, but also because cannabis is a hot topic these days (and, of course, April 20th happens every year).

LA Times equates stoned driving with drunk driving.


Here in Canada, recreational cannabis is going to be legal throughout the nation later this summer, and the movement to make recreational cannabis legal continues to gather strength in the US. Moreover, the topic of impaired driving due to cannabis consumption has also seen increased attention in recent years. And, of course, using the relative scale to measure the impact also helps to make a splash. Here is a quote from the University of British Columbia's press release:

"Assuming fewer than 10 per cent of Americans drive while high on April 20, our results suggest that drug use at 4/20 celebrations more than doubles the risk of a fatal crash."

— Donald Redelmeier, February 12, 2018

How seriously should we take these findings?

We should say at the outset that this is an interesting idea, and some aspects of the study design are clever and thoughtful. In particular, comparing fatal crashes on 4/20 after 4:20pm to "control" days at the same time of day one week before and after inherently adjusts for seasonal and day-of-the-week factors and for changes in vehicle design, and it obviates the need for a denominator, assuming that the population at risk and/or vehicle traffic volume is similar a week before and after 4/20.

However, there are, we think, some reasons to be skeptical about the plausibility of the findings, as well as some methodological problems to sort out before we decide to start intervening to prevent these excess deaths (it goes without saying that we think driving while intoxicated with any substance that impairs your faculties is a bad idea).

In order for 4/20 celebrations to raise the national number of fatal crashes by 12%, they have to lead either to considerably increased consumption or to particularly risky kinds of driving relative to control days. The authors argue for the former, noting that 4/20 is associated with "synchronized mass consumption." So, is 4/20 popular enough to lead to a meaningful increase in the proportion of people driving while high?

The authors defined 4/20 celebrations as having started in 1992, claiming that 4/20 events were popularized by a 1991 High Times editorial declaring "the grandmaster of all holidays: 4/20, or April 20th" (see timeline below). However, there is evidence that the first 4/20 celebration in the US did not occur until 1997 and was limited to Hippie Hill in San Francisco, with a few thousand people in attendance (some interesting history here, here, and here). The large 4/20 celebrations seen more recently (e.g., in the last 5 years) are likely due to a combination of the legalization of recreational use in certain states (Colorado, Washington, etc.), wider societal acceptance, and the growth of the cannabis market into a lucrative industry.

Timeline of the evolution of 4/20.


Interestingly, Staples and Redelmeier (hereafter SR) also show state variation in the impact of 4/20, but the heterogeneity is a little puzzling. Not all states celebrate 4/20 with the same enthusiasm (we are still trying to estimate the potential size of celebrations), and the states highlighted in Figure 2 of their paper (Hawaii, Minnesota, Maine) are not the ones you would expect to have elevated rates if 4/20 causes excess crashes. The largest and most popular 4/20 celebrations have typically been in California and Colorado, but SR found that both states had lower-than-median 4/20 risk (or perhaps fewer 'newbies' smoke there, or folks are more accustomed to driving while high or know their limits better).

Is it plausible?

If we try to back out some estimates of what would have to happen on 4/20 to generate a 12% increase in the national rate of fatal car crashes, the 4/20 effect seems less and less plausible. Let's give it a shot.

In 2015 there were 17,466 driver deaths, so on any given day there are on average about 48 driver deaths. With respect to person-time at risk, in 2015 there were 3,095 billion (yes, billion) vehicle miles traveled (VMT). Not all of those miles are accrued by drivers, but the 2009 National Household Travel Survey estimated 30 daily VMT per driver and 212 million drivers, which gives us roughly 6.4 billion daily VMT by drivers. So let's say the base rate is around 48 deaths / 6,400,000,000 VMT per day = 0.0000000075 deaths per VMT, or 7.5 deaths per billion VMT. The transportation survey also suggests that about 1/3 of VMT occur between 4pm and midnight, so let's also say that during those hours (the time of day SR restrict to) there are roughly 2.5 billion VMT each day.

Over the 25-year period SR tally 1369 deaths on 4/20 and 2453 deaths on control days, which works out to an average each year of 1369/25 ~ 55 deaths on 4/20 and 2453/25/2 ~ 49 on control days, an average excess of about 6 deaths each year. If we use our estimate of post-1620h VMT above, that works out to around 55/2.5 = 22 deaths per billion VMT on 4/20 vs. 49/2.5 ~ 19.6 on control days.
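These back-of-envelope numbers are easy to reproduce. Here is a short Python sketch; all inputs are the figures quoted above, and the rounding matches the text:

```python
# Back-of-envelope rates from the figures quoted in the text.
DRIVER_DEATHS_2015 = 17_466            # annual US driver deaths
deaths_per_day = DRIVER_DEATHS_2015 / 365        # ~48 per day

DAILY_VMT_PER_DRIVER = 30              # 2009 National Household Travel Survey
N_DRIVERS = 212e6
daily_vmt = DAILY_VMT_PER_DRIVER * N_DRIVERS     # ~6.4 billion VMT per day

base_rate = deaths_per_day / daily_vmt * 1e9     # ~7.5 deaths per billion VMT

EVENING_VMT = 2.5                      # billion VMT, 4:20pm-midnight (~1/3 of daily)

deaths_420 = 1369 / 25                 # ~55 deaths per year on 4/20
deaths_ctrl = 2453 / (25 * 2)          # ~49 per year on each control day

rate_420 = deaths_420 / EVENING_VMT    # ~22 deaths per billion VMT
rate_ctrl = deaths_ctrl / EVENING_VMT  # ~19.6 deaths per billion VMT
```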

Now, on any given day the fatal crash rate in the whole population can be expressed as a weighted average of the fatal crash rates for non-cannabis-intoxicated drivers and intoxicated cannabis drivers:

Rate_population = %_sober × Rate_sober + %_stoned × Rate_stoned

In order to estimate how the excess deaths are generated, we need to know what fraction of the population is driving while high, and what their risk might be relative to non-cannabis users.

Both American and Canadian reports suggest a similar prevalence of past-year cannabis use among adults (~12%), but it isn't clear whether these kinds of casual users are likely to be 4/20 celebrants. Past-month use in the US is closer to 8%, and daily use (the folks we'd say are most likely to be 4/20 celebrants) is more like 2.5%.

What about driving high? Roadside tests in Canada indicate that 4-6% of drivers report having used cannabis, and about 20% of the 12% of past-year cannabis users admit to driving within 2 hours of use (.20 × .12 = 0.024) [see the report by Capler et al.]. If we conservatively say that 3% of the population is driving after using cannabis, what is their risk of a fatal crash? A meta-analysis published in BMJ says that acute cannabis use roughly doubles the risk of a fatal crash (but note the enormous heterogeneity among the studies contributing to that estimate). Let's plug these numbers in and see what we get.

For the sober rate on control days, let's assume, based on our estimates above, 19 deaths per billion VMT during the hours from 4:20pm to midnight. Let's also suppose that around 3% of the population is driving while high, with twice the fatal crash rate (38/bVMT) of the sober. This generates something close to the base rate we calculated above (19.6 deaths per billion VMT):

Control day rate: .97*(19) + .03*(38) = 19.6


Recall that on 4/20 the rate increases by 12%, to 22. If we don't assume the relative risk changes on 4/20, just that more people are smoking, what proportion of the population would need to be driving while high to generate a rate of 22 per billion VMT? A little algebra tells us that to get to 22 we'd need something like:

4/20 day rate: .85*(19) + .15*(38) = 21.9


15%! That's nearly one-sixth of the population driving while high on 4/20 from 4:20pm to midnight, which, absent any other evidence, doesn't seem very likely. (Note that our base rates don't really need to be correct here: even if the sober rate is 10 or 30/bVMT, you'd still need ~15% driving while high on 4/20 to increase the rate by 12%.) Alternatively, one could raise the relative risk among cannabis-using drivers to 6x the sober rate and get something close. Or some combination of the two. This means that either the nationwide prevalence of driving while using cannabis increases massively on 4/20, or the RR of a fatal crash with the kind of cannabis use happening on 4/20 is absurdly high. Neither scenario seems particularly likely based on what we currently know about cannabis use and driving risks.
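The algebra behind these two scenarios is just the weighted-average identity above solved for one unknown at a time. A quick Python check (using our assumed rates, which are of course only guesses):

```python
def required_stoned_share(r_sober, rr, r_target):
    """Share p of driving-while-high solving (1-p)*r_sober + p*rr*r_sober = r_target."""
    return (r_target - r_sober) / (r_sober * (rr - 1))

def required_rr(r_sober, p, r_target):
    """Relative risk solving the same identity for rr, given a fixed share p."""
    return (r_target / r_sober - (1 - p)) / p

# Sober rate 19/bVMT, target rate on 4/20 of 22/bVMT:
p_needed = required_stoned_share(19, 2, 22)   # ~0.16 if the RR stays at 2
rr_needed = required_rr(19, 0.03, 22)         # ~6.3 if only 3% drive high
```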

Now, of course, we are just making guesses, and it's true that we don't have strong evidence that 15% of drivers nationwide being impaired after 4:20pm on 4/20 is an implausibly high estimate; but neither do Staples and Redelmeier present any evidence that it is reasonably plausible (Redelmeier himself said less than 10% in the quote above).

Extra-Poisson variation (yes, that).

Even if we accept the plausibility of a meaningful point estimate of the effect of 4/20 on fatal crashes, we have other issues to deal with: namely, other sources of variation in an outcome like daily traffic crashes that are ignored by SR, who effectively treat each year as a matched pair of case and control days but fail to account for this matching when calculating their measure of precision (remember that p<.05).

Between 1992 and 2016 they report 1369 deaths on 4/20 and 2453 deaths on the combined days of 4/13 and 4/27. One can easily reproduce their estimate and measure of precision (RR=1.12, 95%CI 1.04, 1.19) with the basic assumption that there is twice as much unexposed person-time during the “control” days:

Simple tabular replication in Stata.
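For readers without Stata, the same tabular estimate falls out of the standard large-sample formula for a log rate ratio. A minimal Python version (counts from the paper; one unit of person-time for 4/20, two for the pair of control days):

```python
import math

a, b = 1369, 2453               # deaths on 4/20 and on the two control days
pt_exposed, pt_control = 1, 2   # relative person-time (one case day vs two controls)

rr = (a / pt_exposed) / (b / pt_control)   # ~1.12
se = math.sqrt(1 / a + 1 / b)              # SE of log(rr) for Poisson counts
lo = math.exp(math.log(rr) - 1.96 * se)    # ~1.04
hi = math.exp(math.log(rr) + 1.96 * se)    # ~1.19
```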


Turning to regression models, one could use Poisson regression to model the effect of 4/20 relative to control days on the daily count of deaths, but Poisson regression assumes that the mean and variance of the outcome variable are equal. If there is "extra-Poisson" variation, the data exhibit overdispersion. What is that? We aren't biostatisticians, so here is a description from the late Joseph Hilbe, who wrote a lovely book on this kind of regression analysis:

Overdispersion is caused by positive correlation between responses or by an excess variation between response probabilities or counts.

Overdispersion also arises when there are violations in the distributional assumptions of the data, such as when the data are clustered and thereby violate the likelihood independence of observations assumption.

Overdispersion may cause standard errors of the estimates to be deflated or…
— Joseph Hilbe, Negative Binomial Regression (2nd ed.), p.141.

These issues (overdispersion, clustering) are relevant to the 4/20 problem, so we used the same Fatality Analysis Reporting System (FARS) dataset (restricted to drivers and the same time of day) to see whether accounting for additional variation in daily traffic crashes would yield different estimates of precision. Not surprisingly, it does. First, it's easy to show that the tabular estimate is identical to what you get using Poisson regression (beta=0.11, se=0.034, IRR=1.12, 95%CI 1.05, 1.19). Then one can correct for overdispersion and/or extra-Poisson variation in a few different ways:

  • use negative binomial regression;
  • scale the standard errors by the Pearson-based dispersion statistic;
  • use a robust clustered variance (year as cluster).
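To see why the second approach matters, here is a pure-Python sketch with hypothetical daily counts (not the FARS data). For a single binary exposure, the Poisson MLE of the IRR is just the ratio of the group means, and the naive SE of the log IRR is sqrt(1/Σy_exposed + 1/Σy_control); when the Pearson dispersion statistic exceeds 1, the scaled SE is larger:

```python
import math

def poisson_irr_with_dispersion(y_exposed, y_control):
    """IRR for a binary exposure, with a naive Poisson SE and a SE scaled
    by the Pearson dispersion statistic (illustration, hypothetical data)."""
    mu1 = sum(y_exposed) / len(y_exposed)
    mu0 = sum(y_control) / len(y_control)
    irr = mu1 / mu0
    se_naive = math.sqrt(1 / sum(y_exposed) + 1 / sum(y_control))
    # Pearson chi-square: sum of (y - mu)^2 / mu over all observations,
    # divided by residual degrees of freedom (n observations - 2 parameters)
    chi2 = sum((y - mu1) ** 2 / mu1 for y in y_exposed) + \
           sum((y - mu0) ** 2 / mu0 for y in y_control)
    phi = chi2 / (len(y_exposed) + len(y_control) - 2)
    return irr, se_naive, se_naive * math.sqrt(phi)

# Hypothetical daily death counts on "case" vs control days (not real data)
irr, se, se_scaled = poisson_irr_with_dispersion(
    [55, 48, 70, 40, 62],
    [45, 52, 38, 60, 50])
```

With day-to-day variation like this the dispersion statistic is well above 1, so inference based on the naive Poisson SE is overconfident.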

We also included a model using a resampling technique (bootstrapping with 500 replications) as another possible way to estimate the standard error. Here are the results:

Source: Authors' analysis of FARS data (n=75 for each model, i.e., deaths on 1 case and 2 control days x 25 years from 1992-2016.)


As can be seen in the table above, any of these techniques would have yielded a larger standard error and, obviously, p>.05. We aren't saying JAMA Internal Medicine wouldn't have considered the paper in that case, but there are, sadly, good reasons to think so.

But why consider 4/20 in isolation? Is it especially different from other days of the year? We ran the same analysis (using negative binomial regression) for every day of the year (comparing deaths between 1620h and 2359h on each day relative to the days one week before and after), and here is a plot of the results for each day:

Source: Authors' analysis of FARS data (n=75 for each estimate, i.e., 1 case and 2 control days a week before and after x 25 years from 1992-2016.)


There is quite a lot of noise in these daily crash rate ratios, and few days appear reliably above or below the rates one week before and after. Although accounting for overdispersion leads to estimates for 4/20 that are not statistically distinguishable from the null (a distinction we don't think is meaningful anyway), there are apparently several other days to worry about (excess deaths) and/or celebrate (reduced deaths):



Rather than 4/20, should we be concerned about May 25? (out-of-control Geek Pride Day celebrations?) What about July 14? (Americans celebrating Bastille Day?) December 18?

But the table and graph above also suggest it's not all noise. The elevated rates of fatal crashes around Labor Day and July 4th have been noted many times, so it's instructive to compare 4/20 to July 4th. Here are crude incidence rate ratios for each year, across the entire range of the FARS data (back to 1975).

First for 4/20:

Source: Authors' analysis of FARS data. IRR and 95%CI computed with Stata's -iri command.


Same analyses for July 4th:

Source: Authors' analysis of FARS data. IRR and 95%CI computed with Stata's -iri command.


Note the more consistently elevated estimates for 7/04 vs. 4/20 in each year. Again, it is difficult to see much signal in the 4/20 estimates, and clearly there is a fair bit of noise in both sets of estimates. It would probably be beneficial to study these problems using hierarchical or Bayesian models, as has been suggested for similar analyses of "day" effect problems. 

Concluding thoughts

Taken together, we think the preceding discussion of plausibility and methodological concerns should temper any conclusions regarding 4/20 and fatal traffic crashes. We should also point out that this post is not meant as a "take down" or critique of Staples and Redelmeier as researchers. Rather, SR have identified an interesting question, and we hoped to provide some additional exploration, context, and considerations that did not make it into their report. We are writing a more formal paper exploring some of these issues and will post all of the data and code for that analysis, as we have for other papers. We remain skeptical that the impact of 4/20 is as large as SR claim, but we are open to seeing other evidence that would change our minds.

So, this 4/20, we hope everyone celebrating does so with enthusiasm, but appropriate caution. Don’t overdo it, take public transit or walk. And if you are thinking of driving, promise that you’ll listen to the entirety of the Dead’s Dark Star > Other One > Lovelight from 1970’s Fillmore East show before getting behind the wheel. Have a safe trip!

Stata code for these estimates below:

capture log close
log using blog-420.txt, replace text

//  program:
//  task:     replication of 420 effects
//  input:    FARS raw data
//  output:   various figures
//  project:  FARS and fatal traffic crashes
//  author:   sam harper \ 19apr2018

//  #0
//  program setup

version 14
set linesize 80
clear all
macro drop _all

/*  SECTION 1: Replicate Staples / Redelmeier numbers        
    Notes: */

   /* [>   1.1.  Prepare dataset   <] */ 

* read in FARS person-level datasets for 1975-2016
forvalues i=1975/2016 {
  tempfile fars`i'
  qui use state county month day hour minute ///
    st_case per_no per_typ age sex ///
    death_da death_mo death_yr death_hr mod_year ///
    death_mn death_tm lag_hrs lag_mins using farsp`i'.dta, clear
  qui gen syear=`i'
  qui save `fars`i'', replace
}

use `fars1975' , clear
forvalues i=1976/2016 {
  qui append using `fars`i''
}

* restrict to drivers
keep if per_typ==1

* replace unknown month of crash
replace month = . if month==99

* replace unknown day of crash
replace day = . if day==99

* replace unknown hour of crash
replace hour = . if hour==99

* replace unknown minute of crash
replace minute = . if minute==99

* revised time of death
gen crashtime = hour*100 + minute

   /* [>   1.2.  Replicate some table numbers   <] */ 

* count of fatalities from 1992-2016
count if syear>1991 

* exposure to 420
gen e420 = 1 if month==4 & day==20 & ///
  (crashtime>=1620 & crashtime<=2359)
* control period
replace e420 = 0 if month==4 & (day==13 | day==27) & ///
  (crashtime>=1620 & crashtime<=2359)

recode sex (2=0 "No") (1=1 "Yes") (8/9=.), gen(male)
label var male "Male gender?"

recode syear (1992/2003 = 0 "Remote (1992-2003)") ///
  (2004/2016 = 1 "Recent (2004-2016)"), gen(era)
label var era "Time period"

recode age (0/20 = 1 "<20y") (21/30 =2 "21-30y") (31/40 = 3 "31-40y") ///
  (41/50 = 4 "41-50y") (51/97 = 5 ">51y") (998 999 = .), gen(agec)
replace agec = 5 if syear>2008 & age>=98 & age<998
replace agec = . if syear<2009 & age==99
label var agec "Age group"

* Some Fig 1 numbers
tab e420 if syear>1991
tab agec e420 if syear>1991
tab male e420 if syear>1991
tab era e420 if syear>1991

   /* [>   1.3.  Measure of association   <] */ 

* exposed cases
qui count if e420==1 & syear>1991
local e1 = r(N)

* unexposed cases
qui count if e420==0 & syear>1991
local e0 = r(N)

* rate ratio (person-time=1 for 420, 2 for control days)
iri `e1' `e0' 1 2 

disp as result "IRR= " %3.2f r(irr) as result ///
  "; 95%CI: (" %3.2f r(lb_irr) as result ", " %3.2f r(ub_irr) ")"

* compare to published estimates
* rate ratio (person-time=1 for 420, 2 for control days)
iri 1369 2453 1 2 

disp as result "SR IRR= " %3.2f r(irr) as result ///
  "; 95%CI: (" %3.2f r(lb_irr) as result ", " %3.2f r(ub_irr) ")"

/*------------------------------------ End of SECTION 1 ------------------------------------*/

/*  SECTION 2: Correcting for overdispersion        
    Notes: */

   /* [>   2.1.  Collapse to daily deaths 1620h-2359h <] */ 

* indicator for crashtimes between 1620h and 2359h
gen d420 = (crashtime>=1620 & crashtime<=2359 & crashtime!=.) 

* aggregate to daily data
collapse (count) deaths=per_no, by(syear month day d420)

* create date variable
gen date = mdy(month, day, syear)
format date %td

* restrict to crashes between 1620h and 2359h
keep if d420==1

* format month, year, day of week, day of year
drop if day==.
drop if month==.
replace month=month(date) 
gen year=year(date)
gen dow=dow(date) // day of week
gen doy=doy(date) // day of year

* define exposure to 4/20, controls +/- 7 days
gen exp420 = 1 if month==4 & day==20
replace exp420 = 0 if exp420[ _n - 7 ] == 1
replace exp420 = 0 if exp420[ _n + 7 ] == 1

* series of regression models
poisson deaths i.exp420 if syear>1991, irr nolog
estimates store m1 

nbreg deaths i.exp420 if syear>1991, irr nolog
estimates store m2

glm deaths i.exp420 if syear>1991, fam(poi) scale(x2) eform nolog
estimates store m3

poisson deaths i.exp420 if syear>1991, vce(cl syear) irr nolog
estimates store m4

poisson deaths i.exp420 if syear>1991, vce(bootstrap, reps(500)) irr
estimates store m5

esttab m1 m2 m3 m4 m5, b(%3.2f) ci(%3.2f) wide eform nostar

drop exp420

    /* [>   2.2.  NB regression for each day   <] */ 

* placeholders for regression coefficients and 95% CIs 
gen nbirr = .
gen nbirr_lb = .
gen nbirr_ub = .

* exposure variable 
gen expday = .

* sort by month and
sort year month day

* define exposure and run regression, store results
qui levelsof month, local(months) // cycle through months
foreach m of local months {
  qui levelsof day if month==`m', local(days) // cycle through days
  foreach d of local days {
    qui replace expday = .
    qui replace expday = 1 if month==`m' & day==`d' & syear>1991
    qui replace expday = 0 if expday[ _n - 7 ] == 1
    qui replace expday = 0 if expday[ _n + 7 ] == 1
    qui nbreg deaths i.expday if syear>1991, nolog
    qui replace nbirr = exp(_b[1.expday]) if month==`m' & day==`d' & syear>1991
    qui replace nbirr_lb = exp(_b[1.expday] - 1.96*_se[1.expday]) ///
      if month==`m' & day==`d' & syear>1991
    qui replace nbirr_ub = exp(_b[1.expday] + 1.96*_se[1.expday]) ///
      if month==`m' & day==`d' & syear>1991
  }
}

* list days with "significant" effects
count if syear==2016 & (nbirr_lb>1 | nbirr_ub<1)
list month day nbirr nbirr_lb nbirr_ub if syear==2016 & (nbirr_lb>1 | nbirr_ub<1)

* graph of IRR each day
egen md = group(month day)

tw (rcap nbirr_ub nbirr_lb md if md!=111, msize(zero) lcolor(black) lwidth(vvthin)) ///
  (rcap nbirr_ub nbirr_lb md if md==111, msize(zero) lcolor(red) lwidth(vvthin)) ///
  (scatter nbirr md if md!=111, mfcolor(black) mlcolor(white) mlwidth(vthin) msize(vsmall)) ///
  (scatter nbirr md if md==111, mfcolor(red) mlcolor(white) mlwidth(vthin) msize(vsmall)) ///
  (pcarrowi 1.7 0 1.2 0 (3), lcolor(black) mcolor(black) msize(small) lwidth(thin)), ///
  ylab(0.8 1 1.25 2, angle(horizontal)) yscale(log) xsize(11) yline(1) ///
  xlab(16 "Jan" 47 "Feb" 77 "Mar" 107 "Apr" 138 "May" 168 "Jun" 199 "Jul" 230 ///
  "Aug" 260 "Sep" 291 "Oct" 321 "Nov" 351 "Dec", noticks) ///
  xmtick(1 32 61 92 122 153 183 214 245 275 306 336) ///
  legend(off) text(1.45 111 `""4/20""', size(small) color(red)) ///
  text(1.75 186 "July" "4th", size(small))  ///
  text(1.75 248 "Labor" "Day", size(small)) ///
  text(1.75 127 "Memorial" "Day", size(small)) ///
  text(1.75 360 "Christmas", size(small)) ///
  text(1.75 -2 "New Year's" "Day", place(e) size(small)) xtitle("") ///
  graphregion(fcolor(white) lcolor(white) lstyle(none)) ///
  title("Incidence rate ratios (log scale) from separate negative binomial regression for each day of the year relative to control days +/- 1 week, 1992-2016", ///
  size(medsmall) pos(11)) name(nb, replace) 

/*------------------------------------ End of SECTION 2 ------------------------------------*/


/*  SECTION 3: Annual estimates for 4/20 and 7/4        
    Notes: */

   /* [>   3.1.  Analysis and graph for 4/20   <] */ 

* define exposure to 4/20, controls +/- 7 days
gen exp420 = 1 if month==4 & day==20
replace exp420 = 0 if exp420[ _n - 7 ] == 1
replace exp420 = 0 if exp420[ _n + 7 ] == 1

* loop over years and estimate IRR and 95% CI
gen pt420 = 1   // person-time
gen iri420=.    // IRR
gen iri420lb=.  // IRR lower bound
gen iri420ub=.  // IRR upper bound

forvalues i=1975/2016 {
  qui ir deaths exp420 pt420 if year==`i'
  qui replace iri420 = r(irr) if year==`i' & exp420!=.
  qui replace iri420lb = r(lb_irr) if year==`i' & exp420!=.
  qui replace iri420ub = r(ub_irr) if year==`i' & exp420!=.
}

gen lniri420 = ln(iri420)
gen lniri420se = abs(ln(iri420ub) - ln(iri420lb)) / (2*invnorm(0.975))
tw (rcap iri420ub iri420lb year if exp420==1, msize(zero) lcolor(black) ///
  lwidth(medium)) ///
  (scatter iri420 year if exp420==1, mfcolor(black) mlcolor(white) ///
  mlwidth(thick) msize(medsmall)) ///
  , ylab(,angle(horizontal)) yscale(log) yline(1) ///
  graphregion(fcolor(white) lcolor(white) lstyle(none)) ///
  name(year420iri, replace) legend(off) xtitle("") ///
  title("Incidence rate ratio (log scale): April 20 vs April 13/April 27, each year", ///
  pos(11) size(medsmall)) 

    /* [>   3.2.  Analysis and graph for 7/4   <] */ 
 * define exposure to 7/4, controls +/- 7 days
gen exp704 = 1 if month==7 & day==4
replace exp704 = 0 if exp704[ _n - 7 ] == 1
replace exp704 = 0 if exp704[ _n + 7 ] == 1

* loop over years and estimate IRR and 95% CI
gen pt704 = 1   // person-time
gen iri704=.    // IRR
gen iri704lb=.  // IRR lower bound
gen iri704ub=.  // IRR upper bound

forvalues i=1975/2016 {
  qui ir deaths exp704 pt704 if year==`i'
  qui replace iri704 = r(irr) if year==`i' & exp704!=.
  qui replace iri704lb = r(lb_irr) if year==`i' & exp704!=.
  qui replace iri704ub = r(ub_irr) if year==`i' & exp704!=.
}

tw (rcap iri704ub iri704lb year if exp704==1, msize(zero) lcolor(black) ///
  lwidth(medium)) ///
  (scatter iri704 year if exp704==1, mfcolor(black) mlcolor(white) ///
  mlwidth(thick) msize(medsmall)) ///
  , ylab(,angle(horizontal)) yscale(log) yline(1) ///
  graphregion(fcolor(white) lcolor(white) lstyle(none)) ///
  name(year704iri, replace) legend(off) xtitle("") ///
  title("Incidence rate ratio (log scale): July 4 vs. June 27/July 11, each year", ///
  pos(11) size(medsmall))

   /* [>   3.3.  Generate figure   <] */ 

* combine graphs for annual estimates
graph combine year420iri year704iri, ycommon rows(1) ///
  graphregion(fcolor(white) lcolor(white) lstyle(none)) ///
  name(yearly, replace)

* now combine with daily pooled estimates
graph combine nb yearly, ycommon rows(2) ///
  graphregion(fcolor(white) lcolor(white) lstyle(none)) ///
  name(all, replace)

log close

Did it take a village?

What can we do to reduce racial inequalities in health? There are mountains of scientific reports, peer-reviewed articles, reviews, monographs, books, and blog posts showing that US blacks have worse health indicators than whites. Although the magnitude of black-white differences in health has varied over time, the persistence of the racial gap is distressing and can be seen across a diverse array of environments and measures of health: life expectancy, survival from cancer, death rates, cases of infectious diseases, accidents, and crime victimization, among other outcomes.

This problem is echoed in striking language in a 2013 paper titled, "Eliminating Racial Disparities in Colorectal Cancer in the Real World: It Took a Village" by a group of public health cancer researchers from Delaware:

Those who are poor, underserved, or minorities are more likely to get cancer and die as a result of it than those who are rich or white. This is a fact, and it is the current reality of cancer care in the United States, as documented in thousands of peer-reviewed articles,…
— Grubbs et al. J Clinical Oncology 2013;31:1928.

Despite this depressing introduction, the quote above comes from a paper that attempts to make a case that a sustained, focused program aimed at reducing racial differences in cancer in Delaware was enormously successful, demonstrating “what can happen when the entire health care community of a state is mobilized toward a goal: eliminating health disparities in CRC [colorectal cancer].”

Briefly, the program contained 3 general aims:

  • A colorectal cancer screening program.
  • A cancer treatment program providing for the uninsured.
  • An emphasis on African American cancer disparity reduction.

In the paper the authors argue that the program resulted in the near-elimination of black-white inequalities in colorectal cancer:

By doing these common-sense things, we accomplished the following with respect to CRC health disparities from 2002 to 2009: elimination of screening disparities, equalization of incidence rates, reduction in the percentage of African Americans with regional and distant disease from 79% to 40%, and most importantly a near elimination of mortality differences.
— Grubbs et al. J Clinical Oncology 2013;31:1929.

The primary pieces of evidence behind this claim come from Table 1 and Supplementary Figures 2 and 3:

Table 1 shows rates of ever having had a colonoscopy screening, incidence, and mortality in 2001 and 2009, and the relative change between the two time points. Without a doubt, screening rates increased more among blacks than whites, and both incidence and mortality declined more for blacks than whites, although no estimates of precision (only p-values, sigh) were included (a serious and unethical omission, in my opinion). These two sets of measurements, and the fact that the program was implemented between 2001 and 2009, serve to bolster the authors' claims that the program reduced black-white differences.

However, Appendix Figures 2 and 3 complicate the story a little bit, since they show trends over the entire period. First, colorectal cancer incidence (the authors circled the rates at the end of the period; I added an arrow indicating the approximate time of program implementation):

Grubbs et al. JCO 2013

Here is the graph for mortality:

Grubbs et al. JCO 2013

The authors are certainly correct that inequalities decreased. But one thing that is clearer from the figures than from the table is that rates of cancer incidence and mortality were already decreasing, particularly among blacks, prior to program implementation. This complicates the authors' story, since now we have an alternative explanation for the decrease in black-white inequalities at the end of the observation period: black rates were already declining faster than white rates for reasons that may have nothing to do with the program. I don't know what those reasons might be, but this makes it harder to accept the authors' attribution of the elimination of inequality to the program, since some part of the decrease looks as though it would have happened even if the program had never been implemented.

It is worth mentioning that national rates of colorectal cancer mortality have also been declining, perhaps faster for blacks than whites. I generated the graph below from the National Cancer Institute's web-based tool:

Source: US National Cancer Institute (author's calculations).

This at least suggests that there may be some factors operating nationwide that are driving down colorectal cancer mortality rates for both blacks and whites, which makes it harder to evaluate the researchers' claim that their program was responsible for the success in Delaware.

How can we adjudicate between these competing explanations to figure out whether the authors' argument for the impact of the program is more plausible than the argument that factors other than the program were already driving down rates of colorectal cancer incidence and mortality among blacks before the program was even implemented?

One way (I do not claim this is the only way) is to use something called an “interrupted time series” (ITS) study, which effectively uses the knowledge about when a program was implemented and assesses whether the program may have had an immediate impact on the rates, or possibly changed the trajectory of the outcome after the program was implemented. (There are many papers on this, but a rather nice overview is given by Gillings and colleagues in the American Journal of Public Health in 1981). The picture below gives the basic idea, which is using this type of study design to test whether some intervention "interrupts" pre-existing trends.

Source: Author.

The key idea of ITS is to say: let's look at the outcome trends before the program (β1), predict what would have happened in the absence of the program, assuming the trends would have continued in the same fashion (clearly a serious assumption), and compare this to what actually happened (since the program was actually implemented). In the figure above, β2 represents any immediate impact of the program (less likely for most cancer indicators), while β3 represents a change in the trend that we could potentially attribute to the program. [Note that I have a color code for "control" there, but for now we can just think about the red lines as trends for either Delaware as a whole or for black residents of Delaware...more on that below.] If there are important differences between what we would have predicted without the program and what actually happened (under these assumptions), we may be more confident that the program actually "interrupted" the trends and had some impact on outcomes. (This leaves aside a lot of other complicating features and assumptions of ITS [autocorrelation, seasonality, time-varying confounders, etc.], which need to be taken seriously.)
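To make the setup concrete, here is a minimal sketch (in Python, with hypothetical years and an assumed intervention in 2003, so the numbers are illustrative only) of how the three ITS regressors are typically constructed; β1, β2, and β3 are then estimated by regressing the outcome on these columns.

```python
def its_design(years, intervention_year):
    """Regressors for a single-group interrupted time series:
    time trend (beta1), post-intervention level shift (beta2),
    and post-intervention change in trend (beta3)."""
    start = min(years)
    rows = []
    for y in years:
        t = y - start                            # years since start of series
        x = 1 if y >= intervention_year else 0   # level shift at intervention
        xt = x * (y - intervention_year)         # change in slope afterwards
        rows.append((t, x, xt))
    return rows

rows = its_design(range(1999, 2014), 2003)
# 1999 -> (0, 0, 0); 2003 -> (4, 1, 0); 2013 -> (14, 1, 10)
```

A regression of the outcome on these three columns (plus a constant) gives the pre-intervention trend, the jump at the intervention, and the post-intervention change in slope.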

A simple analysis for Delaware mortality

I downloaded annual age-adjusted rates of colorectal cancer mortality for black and white individuals ages 50 and over in Delaware from the US National Cancer Institute's SEER database for the years 1999 to 2013. Unfortunately, incidence rates are a little unstable among blacks, and NCI suppresses some cases where numbers are small, but we could apply this same idea to trends in stage-specific incidence or cancer screening (which would be very interesting and perhaps a better example, since those are likely easier to change in short order than mortality).

I used the Stata routine itsa, written by Ariel Linden (and later described in an article published in Stata Journal) to estimate the impact of the introduction of the program in 2003 on colorectal cancer mortality rates among Delaware blacks. It isn't necessary to use this routine for an ITS analysis, but it makes some simple coding aspects easier and I was feeling lazy. I used the log mortality rate but the results are similar if one uses the absolute mortality rate.

I'll show the regression output below, but this picture tells the story pretty well. We tell Stata that an intervention occurred in 2003, and the software routine tests for a break in the time series in 2003 and any change in the slope of the mortality rate after 2003 (under assumptions of linearity, which, again, are important and need to be thoroughly explored, but hey, Grubbs et al. didn't even report standard errors!). 

Source: Author's calculations of SEER data.

It's pretty straightforward to see that there was little immediate change in the cancer mortality rate among blacks when the intervention began in 2003, and little change in the trajectory of the death rate thereafter. This is confirmed by the output from the regression analysis:

Source: Author's calculations of SEER data.

A few notes here. Exponentiating the constant term (4.8) gives the predicted colorectal cancer rate for blacks in 1999, which is approximately exp(4.8)=122 per 100,000 population, roughly equal to the actual rate of 119. The coefficient _t says that the black mortality rate was declining by roughly 6% per year prior to 2003, the coefficient for _x2003 represents the change in the log mortality at 2003, which was near zero, and the coefficient for _x_t2003 captures any difference in the log rate trend after the intervention, which is also effectively zero. The fact that the two trends before and after 2003 look nearly identical in the figure above corresponds with the fact that this latter coefficient is nearly zero.  
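As a quick sanity check on that interpretation (using the rounded coefficients quoted above, so the numbers are approximate):

```python
import math

# Rounded coefficients from the regression output above
b_cons = 4.8      # constant: log of the black mortality rate in 1999
b_t = -0.06       # approximate pre-2003 annual trend in the log rate

rate_1999 = math.exp(b_cons)          # ~122 per 100,000 (actual rate: ~119)
annual_change = math.exp(b_t) - 1.0   # ~-0.058, i.e. roughly a 6% decline/year
```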

This suggests that the program did not "interrupt" black mortality rates. The rate of decline was effectively the same before and after the program. But, you might ask, maybe comparing blacks after 2003 to their trajectory before 2003 isn't the right comparison. Since the program was focused on reducing racial inequalities, maybe we should be comparing black rates to white rates (or even modeling the black white difference in rates)?

This is a fairly straightforward extension of ITS, which is just to include a *non-affected* control group as a comparison. I say "non-affected," but this is unlikely to be true here, since the program (at least from the limited description of it in the paper) was not restricted to blacks in Delaware; presumably, whites may also have benefited. Nevertheless, it would seem valuable to use whites as a control group, since the idea was for the program to narrow racial inequalities.

Below is another graph based on the ITS regression results, this time with Delaware whites as a control group: 

Source: Author's calculations of SEER data.

Again, a few things to notice here. First, white log mortality rates were on roughly the same trajectory as blacks prior to 2003 (although much lower in absolute terms). Second, the post-intervention trends also look generally similar, which is inconsistent with the idea that the program disproportionately benefited blacks. The regression estimates provide some numbers to back this up:

Source: Author's calculations of SEER data.

There are more parameters in this model, since we are not only estimating the program impact on blacks, but also on whites. Since in this case blacks are the "treated" group (_z=1) and whites are the control group (_z=0), the parameters have the following interpretations. The constant term (_cons) is the predicted white rate in 1999, so roughly exp(4.3)=74 per 100,000. The coefficient on _t is the white trend prior to 2003 (weakly declining), the coefficient on _z is the black-white difference in log mortality levels in 1999, which is large and obvious in the graph, and the coefficient _z_t is the difference in the pre-intervention trend between blacks and whites, which is rather small and suggests that blacks and whites were on similar trajectories prior to the intervention. The next set of coefficients give the change in the white rate at the time of the intervention (_x2003, small and positive), the change in the trend after the intervention among whites (_x_t2003, small, but still negative and showing declines in rates), the change in the black rate at the time of the intervention (_z_x2003, small and negative), and, most importantly, the difference in the post-intervention trend between blacks and whites (_z_x_t2003), which is also close to zero.
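The eight terms in this controlled model can be hard to keep straight. Here is a sketch (a hypothetical data layout, not itsa's actual internals) of how the regressor columns line up with the coefficient names above:

```python
def controlled_its_row(year, z, start_year, intervention_year):
    """One design-matrix row for a two-group ITS, mirroring the itsa
    coefficient names: _t, _z, _z_t, _x2003, _x_t2003, _z_x2003, _z_x_t2003.
    z = 1 for the treated group (blacks), 0 for the control group (whites)."""
    t = year - start_year                      # time since start of series
    x = 1 if year >= intervention_year else 0  # post-intervention indicator
    xt = x * (year - intervention_year)        # time since intervention
    return {"_t": t, "_z": z, "_z_t": z * t,
            "_x2003": x, "_x_t2003": xt,
            "_z_x2003": z * x, "_z_x_t2003": z * xt}

# e.g., a treated (black) observation in 2010, series starting in 1999:
row = controlled_its_row(2010, 1, 1999, 2003)
```

The last column, `_z_x_t2003`, is the one that matters most here: it is nonzero only for treated observations after the intervention, so its coefficient is the black-white difference in post-intervention trends.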

The itsa routine will also display the post-intervention trends for the treated and control groups, and the difference between these two trends. The panel below gives this information, and you can see that both blacks ("treated") and whites ("controls") were decreasing by roughly 5-6% annually. The black decline is larger than the white decline, but it is hard to make a case for a meaningful difference in the post-intervention trends with these data. The 95% confidence interval for the black-white difference in post-intervention trends runs from -4% to +3%.

Source: Author's calculations of SEER data.

Okay, as one last example, rather than compare Delaware blacks to Delaware whites (since we might worry about whites also being affected by the program), we could instead compare Delaware blacks to blacks in a different state without the same program. I grabbed some additional data for blacks in Maryland, and we could instead use them as a control population. Here are the ITS results:

Source: Author's calculations of SEER data.

In this case it looks a little more plausible that the post-intervention trends show steeper declines for blacks in Delaware than in Maryland. The regression estimates are posted below:

Source: Author's calculations of SEER data.

These estimates are obviously related to the graphs above, but the coefficient on the post-intervention product term that captures the difference in post-intervention trends between blacks in Delaware and Maryland is still small and close to zero. The post-trend estimates do show that the post-intervention trends are more negative (i.e., a steeper mortality decline) in Delaware (~6% annual decline) than in Maryland (~4% annual decline), but these are still not reliably different from one another, although this seems a more compelling story than we had before. 

I am not submitting these analyses for publication, and this is not a serious attempt to estimate the program's impact, but the fact that no alternative explanations were even explored by Grubbs et al. is what really disappoints me, since these methods have been around for a long time. [Also, the data I used for these analyses are here, if anyone is interested]. 

Who cares?

You may be wondering why I would take the trouble to write this post, given that the paper was published in 2013. Old news, right? But I began thinking more about this precisely because I have recently seen this paper cited a number of times (currently 60 citations, according to Google Scholar) as evidence of a program that has successfully "worked" to reduce black-white differences in health, which is a topic I do care about.

In a recent interview on the "Future of Health Equity Research" with the Robert Wood Johnson Foundation, the sociologist David Williams, who has done some really fine work on black-white differences in health, responded to a question about interventions that have been shown to reduce health inequalities by citing the Delaware example: 

And the State of Delaware provided colorectal cancer screenings and treatment for all residents but created targeted interventions and marketing campaigns for the black community, which experienced lower screening rates and higher colon cancer incidence rates than whites. As a result, screening rates among blacks rose from 48% in 2002 to 74% in 2009, which was equal to the improved screening rate for whites. Additionally, colon cancer incidence rates declined from 67 and 58 per 100,000 residents for blacks and whites respectively in 2002 to 45 per 100,000 residents among both populations in 2009.
— David Williams

Elsewhere, this study also shows up as evidence cited by other researchers that the program eliminated black-white differences in colorectal cancer outcomes. Renee Williams and colleagues reviewed the epidemiology of colorectal cancer among African-Americans and noted the impact of the program implemented in Delaware:

The Delaware Cancer Advisory Council was established in 2001 with the goal of developing a Statewide cancer control program. This program provided reimbursement and covered 2 years of the cost of care for newly diagnosed CRC for the uninsured. Special programs to reach the African American community were put in place partnering with community organizations. Between 2001 and 2009 they equalized the screening rates between blacks and whites and significantly diminished the gaps in mortality. In 2001, the CRC mortality was 31.2 per 100,000 in blacks and 19.5 per 100,000 in whites in 2001. In 2009 these numbers decreased to 18.0 per 100,000 and 16.9 per 100,000, respectively [citing Grubbs et al. 2013].
— Williams et al. Colorectal Cancer in African Americans: An Update. Clinical and Translational Gastroenterology, 2016;7(7), e185

And another recent paper by an esteemed group of public health researchers at the Harvard School of Public Health repeated this same claim:

We note that a recent comprehensive screening program deployed in the state of Delaware resulted in the elimination of stage at diagnosis disparities and in the reduction of survival disparities [citing Grubbs et al. 2013].
— Valeri et al. Cancer Epidemiol Biomarkers Prev; 25(1); 83–89

Again, it is worth noting that in both of these examples the entire reduction/elimination of black-white inequality in colorectal cancer is attributed *to the program*, yet, as we showed above, this claim is very hard to back up with solid evidence.

More distressingly, Grubbs et al. are still making serious claims that attribute the reduction or elimination of black-white inequalities in cancer outcomes entirely to the program. An abstract of a conference presentation in the fall of 2016 says it all:

Finally, I should say, and want to emphasize, that the point of this post is not to try and embarrass or shame these researchers; they are no doubt committed to reducing the appalling gap in black-white health in their state, and I certainly applaud them for creating the support and political will to implement a program designed to reduce racial inequalities in colorectal cancer. And I am, of course, happy that black-white differences in many of these cancer-related outcomes have diminished in Delaware. Nor is my goal to question whether a program like the one in Delaware has produced some positive benefits; it may very well have, and it may have made an important contribution to the decline in black-white inequalities. But the program also costs money, and before we consider it "a model" for other states, we should get serious about understanding its impact. For all I know, other states may be trying to replicate this program and anticipating the same result. If the program was not the chief reason why black mortality rates decreased faster than white rates, then we have little reason to expect that it can be replicated elsewhere, and we risk wasting resources that could be spent elsewhere.

My main motivation is to alert researchers (and consumers of research) that without a proper study design to isolate the impact of a particular program or intervention on health, we cannot know whether the program itself is responsible for the positive (or negative!) outcomes. What I hope is that, when considering whether or not their program was successful, researchers such as Grubbs et al. first remember to ask themselves, “What would have happened in the absence of the program?” This kind of thinking will help (I hope) to improve the reliability and validity of claims about the effectiveness of public health interventions. We can, and should, demand better evidence for the investments we make in public health.

Bad news for US life expectancy

The big health news this week was a report that, for the first time since 1993, life expectancy in the United States went down (incidentally, this was also a story earlier this year when the preliminary numbers were published; coverage by the NYT here). Life expectancy decreased from 81.3 to 81.2 years for women and from 76.5 to 76.3 years for men. Those seem like small changes, right? So why is this such a big story? Because US life expectancy has been steadily increasing--with minor interruptions--for decades. When these numbers are reported each year (and usually don't make headlines), it is because the story is that life expectancy increased again, mortality is improving, and we are doing something right (or, at least, we aren't doing something wrong). The graph below shows trends in life expectancy at birth since 1929 for US men and women:

From published estimates by the US National Center for Health Statistics.

It's an impressive record, notwithstanding the fact that, despite this progress, the US still lags behind many other rich (and some poor) countries in terms of life expectancy.

This story has gotten a lot of coverage, and there are solid articles that have attempted to discuss the reasons for the decrease. Scientific American ran an interview with Robert Anderson, chief of the Mortality Statistics Branch at the NCHS, who published the report; he discusses increases in cardiovascular disease, the spike in the 2015 flu epidemic that potentially spilled over to cause more cardiovascular deaths, and the continuing epidemic of overdoses from prescription opioids and heroin.

The NY Times focused quite a bit on socioeconomic factors and the role they may have played in the decline. They quoted Columbia's Peter Muennig, who said that "popular theories for the cause of the decline, including an increase in obesity rates and an opioid epidemic, fail to explain a problem that feels broader."

Vox also had a nice write-up that discussed the wide range of diseases involved and the potential mechanisms related to stress and poor health behaviors, but also (and positively, in my view) noted that, despite a recent focus on white mortality patterns in the press, this recent mortality increase did little to alter the gap in life expectancy between blacks and whites.

There are clearly important questions about the underlying causes of this decrease, especially if it is sustained and shows up in next year's statistics as well. Anderson from NCHS says that the preliminary data from 2016 suggest that 2015 may just be a "blip":

if we look at the age-adjusted rate for 2015, which is about 733 deaths per 100,000 population, and look at the age-adjusted rate for the 12 months ending in June 2016, it drops back to 720 per 100,000—and that’s actually below the 2014 level, so it may very well be a blip.
— Robert Anderson, NCHS, quoted in Scientific American

Most of these stories have centered on the fact that 8 out of the 10 leading causes of death showed increases, most distressingly for cardiovascular diseases (chiefly heart disease and stroke), which have led the cause-of-death tables for decades. Because cardiovascular diseases represent the leading cause of death, their increase is potentially what makes this a story about a decrease in life expectancy rather than about a few causes of death that are higher from one year to the next. And there are, indeed, signs that the historic declines in deaths from cardiovascular diseases may be coming to an end. A recent paper in JAMA Cardiology (Sidney et al. 2016;1(5):594-599) shows that the declines have been stalling since around 2011 or 2012, which is of great concern since cardiovascular disease continues to be the number one killer in the US.

But none of these stories really tell us about which causes of death played the largest role in the decline of life expectancy. So, let's take a very early and simple look at this question.

Decomposing the decrease in life expectancy

We'll do this separately for men and women, since the mortality patterns and trends are often quite different by gender. The data on mortality by age and cause of death are publicly available and easily downloadable from CDC WONDER. I used the multiple cause of death database for 2014 and 2015, and asked for the "standard" list of 113 causes of death used by NCHS.

Because a decrease in life expectancy can arise due to changes in mortality at different ages, I used a technique developed by the demographer Eduardo Arriaga to decompose the change (there are other ways to do this, but this one is fairly straightforward and simple).

It's important to note as well that life expectancy is essentially a weighted average of age-specific death rates, with deaths at younger ages given more weight (the steep increases in life expectancy in the early 20th century were largely driven by substantial reductions in infant and child mortality). Because it is a weighted average, life expectancy at birth can change in lots of different ways. It could decrease, for example, if mortality improves at younger ages but worsens enough at older ages, or vice versa. The decomposition helps to sort out which age groups made the most difference.
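Arriaga's method can be sketched in a few lines. The following is a simplified illustration with made-up mortality schedules (not the CDC WONDER data), using a crude abridged life table that assumes deaths occur at mid-interval; the key property of the decomposition is that the age-group contributions sum exactly to the total change in life expectancy at birth.

```python
import math

def life_table(m, n):
    """Abridged life table (radix 1) from age-specific death rates m over
    intervals of width n; deaths assumed mid-interval, last group open."""
    k = len(m)
    q = [(n * mi) / (1 + 0.5 * n * mi) for mi in m]  # death probabilities
    q[-1] = 1.0                                      # everyone dies in 85+
    l = [1.0]
    for i in range(k - 1):
        l.append(l[i] * (1 - q[i]))                  # survivors to age x
    L = [n * (l[i] - 0.5 * l[i] * q[i]) for i in range(k)]  # person-years
    L[-1] = l[-1] / m[-1]                            # open-ended interval
    T = [0.0] * k                                    # person-years above x
    T[-1] = L[-1]
    for i in range(k - 2, -1, -1):
        T[i] = T[i + 1] + L[i]
    return l, L, T

def arriaga(l1, L1, T1, l2, L2, T2):
    """Arriaga decomposition: each age group's contribution to
    e0(table 2) - e0(table 1)."""
    k = len(l1)
    c = []
    for i in range(k - 1):
        direct = l1[i] * (L2[i] / l2[i] - L1[i] / l1[i])
        indirect = T2[i + 1] * (l1[i] / l2[i] - l1[i + 1] / l2[i + 1])
        c.append(direct + indirect)
    c.append(l1[-1] * (T2[-1] / l2[-1] - T1[-1] / l1[-1]))  # open group
    return c

# Hypothetical death rates for 5-year groups 0-4 ... 85+ (not the CDC data)
ages = list(range(0, 90, 5))
m2014 = [0.001 * math.exp(0.07 * a) for a in ages]
m2015 = [mi * 1.01 for mi in m2014]   # a uniform 1% worsening
lt14, lt15 = life_table(m2014, 5.0), life_table(m2015, 5.0)
contrib = arriaga(*lt14, *lt15)       # sums to e0(2015) - e0(2014)
```

In this toy example every age group contributes a small negative amount; with real data some groups push life expectancy up and others down, and the decomposition nets them out.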


I calculated life expectancy for women as 81.36 in 2014 and 81.20 in 2015, a 0.16 year decrease, which is close to what was reported by NCHS. For men the values were 76.48 and 76.31, a decrease of 0.17 years. Below you can see the graph showing the proportion of the decline in life expectancy due to mortality changes at different ages, by gender:

Author's calculations. Source: CDC WONDER

A few things to notice here.

First, the contributions across age groups are quite different by gender. Nearly 60% of the 0.16-year decrease in life expectancy among women came from unfavorable mortality changes at very old ages (85 and over). On the other hand, for men 70% of the decrease was due to mortality increases among those aged 15 to 44 years. Although infant mortality can strongly influence life expectancy, it played no role in the worsening of life expectancy for either men or women (though the fact that it did not help push life expectancy upward is probably not a good sign).

Second, nearly all age groups (except the 45-54 year olds) helped to push life expectancy down for both men and women, which adds some weight to the story that this decrease may be associated with a broad range of factors affecting mortality.


With respect to causes of death, it obviously isn't feasible to try and look at all 113 causes, so I broke it down to some broad categories of disease (there are many other ways to do this). Here are the results:

Author's calculations. Source: CDC WONDER

Again, there are a lot of potentially interesting things going on here.

  • For women, most of the story appears to be worsening cardiovascular disease and Alzheimer's disease mortality. These two groups accounted for 94% of the net decrease (it is important to remember that the overall change is a net change that results from some causes that improve and some that worsen).
  • For men, most of the story is about deaths from injuries, both unintentional and intentional. The largest contributor (41%) was unintentional poisoning, but homicide also contributed 20%, motor vehicle crashes 13%, and suicide 7%. Altogether these causes accounted for 82% of the net decrease in life expectancy for men. 
  • Progress in cancer mortality for both men and women kept life expectancy from declining by more than it did (i.e., the decrease in life expectancy would have been greater had cancer mortality rates not declined from 2014 to 2015).

There is obviously quite a lot of additional work to be done exploring the decrease, but a few things to keep in mind.

One, reductions in life expectancy are obviously bad, but they are uncommon, and in this case they are also small in magnitude, so it is important to keep that in mind when interpreting the proportional contribution of different causes.

Two, there has been a lot of attention focused on the opioid epidemic, but it would seem that the decline among men is linked more broadly to a number of injury-related causes. I have yet to see many reports even mention homicide (perhaps because there has been so much focus on the middle-aged white population after the Case/Deaton study last year, where homicides do not figure prominently), yet homicide played a larger role among men than did Alzheimer's. My guess is that this played an even larger role among non-Hispanic black men.

Three, there are many potential hypotheses that may be consistent with some parts of the patterns above. Many of the researchers quoted talked about wide-ranging stress, economic inequality, poor health behaviors, increases in obesity, and any number of additional "meta-theories". However, it seems unlikely (to me) that any one of these explanations will be sufficient to account for the diverse patterns by gender, age, and cause of death we see here, not to mention race, which may make things more complicated. It may be that a number of different channels are operating simultaneously rather than one "big thing".

Finally, and more controversially, there is a lot of literature suggesting that mortality rates worsen when the economy improves (and vice versa). Yes, this sounds counterintuitive, but there is a lot of work from Christopher Ruhm, Jose Tapia-Granados, and others that makes a reasonably good case for this hypothesis. That being said, whatever increases in death rates may come from an improving economy (one mechanism is usually an increase in traffic accidents as more people are working and commuting, which we do see here and which has received some press attention), they are usually not large enough to overwhelm all of the other improvements in lifestyles, public health, and medical care that make death rates *usually* decline.

It will be very interesting to follow this story as it unfolds.

For those interested, the mortality data I used for the decomposition can be downloaded here, and the Stata do-file is here.

Is support for democracy really declining?

A recent paper by Roberto Foa (University of Melbourne) and Yascha Mounk (Harvard) suggesting that support for democratic governance is declining among younger generations has generated a fair bit of buzz recently. The New York Times ran a relatively long story on it, highlighting a graph purporting to show declines in support for democracy among younger generations that had been making the rounds on Twitter (see, for example, here, retweeted nearly 6,000 times). Below is the NYT version of the graph (which, as far as I can tell, does not appear in the actual paper, so I am a little unclear about where exactly these data came from):

These "declines" are dramatic, but this is essentially a snapshot of two pooled cross-sectional datasets (Waves 5 and 6 of the World Values Survey data, corresponding roughly to surveys conducted around 2006 and 2012). By "snapshot" what I mean is, we survey individuals at a single point in time (e.g., in 2012), and then we plot the percentage of individuals responding that it is "essential" to live in a democracy according to the year they were born. Fine. Nothing wrong with that. And I don't doubt that these estimates are correct in showing that individuals born more recently are less likely to say that it is essential to live in a democracy. 

However, it's important to note that these estimates, on their own, say nothing about trends in beliefs about democracy, unless one is willing to assume that individual perceptions of how "essential" it is to live in a democracy are fixed at birth. I'm not a political scientist, but I certainly know that many of my political opinions have changed over the years (there must be data on this, but I'm too lazy to search for it right now), and I'd gather that is true for many others. 

Nevertheless, the message the New York Times was pushing clearly did suggest that faith in democracy was declining:

Across numerous countries, including Australia, Britain, the Netherlands, New Zealand, Sweden and the United States, the percentage of people who say it is “essential” to live in a democracy has plummeted, and it is especially low among younger generations.

Saying the percentage has "plummeted" strongly suggests that it has declined in a substantial way over time, yet the graph shown above cannot tell us whether or not that is true.


The authors of the original paper made it pretty straightforward to see how they constructed their numbers, and the WVS has a brilliant website where you can easily download the data, so I grabbed the data and wanted to see whether a closer look at more than one survey would provide a similar picture.

With a little tweaking (code is at the bottom of the post) I think I got pretty close to reproducing the NYT figure:

Okay, so now it seems like we are working with basically the same outcome data as did the authors. What I'm interested in here is the change in the fraction of people reporting that living in a democracy is "essential" (measured as 10 out of 10 on a scale of importance). Because the WVS asked this question in both Wave 5 (around 2006) and Wave 6 (around 2012), we can look at a couple of things. First, we can see whether or not the proportion reporting that it's essential to live in a democracy has "plummeted" between the two surveys, as the NYT claimed above.

Second, and more relevant to the broader point I believe the authors were making about younger generations not valuing democracy in the same way as their elders, we can look and see, for individuals born in the same decade, whether their views are actually changing. In the table below I have calculated, just for the United States, the percentage reporting that it is essential to live in a democracy for each birth cohort, by survey year for waves 5 and 6 of the WVS:

Birth cohort   2006    2011    Change
1930-39        75.4%   71.3%   -4.0%
1940-49        67.0%   58.4%   -8.6%
1950-59        56.5%   57.4%    0.8%
1960-69        48.4%   50.6%    2.2%
1970-79        40.3%   44.1%    3.8%
1980-89        35.8%   29.2%   -6.6%
Total          53.2%   50.9%   -2.3%
Average within-cohort change: -1.2%
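The within-cohort changes can be recomputed directly from the two survey columns. (Here I recompute from the rounded percentages, so the last digit can differ slightly from the table, which was calculated on unrounded values.)

```python
# Percent saying democracy is "essential" (US, WVS waves 5 and 6),
# taken from the table above
pct = {
    "1930-39": (75.4, 71.3),
    "1940-49": (67.0, 58.4),
    "1950-59": (56.5, 57.4),
    "1960-69": (48.4, 50.6),
    "1970-79": (40.3, 44.1),
    "1980-89": (35.8, 29.2),
}

# change in percentage points within each cohort between the two waves
change = {c: round(w6 - w5, 1) for c, (w5, w6) in pct.items()}

# cohorts whose belief that democracy is essential actually rose
rising = [c for c, d in change.items() if d > 0]
```

This makes the within-cohort pattern explicit: the 1950s-70s cohorts moved up between waves, while the oldest and youngest cohorts moved down.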

A few things are worth noting here. First, as the authors of the paper and the NYT story emphasized, the decrease in the belief that it is essential to live in a democracy across birth cohorts is quite large (>70% for those born in the 1930s vs. ~35% for those born in the 1980s), and indeed this is troubling, particularly if younger generations do not change their thinking as they get older. (And as I noted before, I'm not a political scientist--there may be reams of very good evidence out there that opinions like this don't change much as people age, in which case this is all the more distressing. But an age effect is not the same thing as a secular trend, which is the point I am making here.)

Second, the claim that the NYT made about the "plummeting" of the percentage reporting that it is essential to live in a democracy is hard to see in these data. The overall percentage decreased from 53% to 51% in the roughly 6 years between survey waves. I do think it is probably a bad sign for this kind of indicator to be declining over time, but I find it very difficult to characterize this as "plummeting."

Third, and perhaps most interesting, looking across the rows you can see the change within the same birth cohorts. For example, the first row shows that 75% of people born in the 1930s said it was "essential" to live in a democracy in 2006, but 6 years later only 71% of that same birth cohort said it was essential. If anything, this suggests that it is older, rather than younger, individuals who seem to be "losing" their belief that living in a democracy is essential. In fact, the percentage regarding it as essential to live in a democracy is actually increasing over time in the "younger" cohorts of the 60s and 70s. (In fairness, the 1980s cohort does show a decline.) The average within-cohort change (weighted by cohort size) is a decline of about 1 percentage point, but you can see this is due to very different patterns for older and younger cohorts.

I did this same exercise for 3 of the other countries shown in the figures above (data for England and New Zealand for this outcome were not available in both survey waves) and plotted the results below. Each graph shows the change in opinion within each birth cohort between the two surveys, with a 95% confidence interval for each estimate.

The estimates I showed in the table above for the United States are reflected here, and one can also see that it is hard to paint a picture of anything like robust declines in believing that it is essential to live in a democracy. In Australia, for example, the fraction of individuals who consider it essential to live in a democracy is actually increasing for every single birth cohort, despite the fact that it is lower in general for younger cohorts (the overall percentage in Australia also increased from 63% to 68% between the two surveys). Of course, there is measurement error here so nothing is really definitive for any country, but this pattern of results is considerably more nuanced than the NYT reported. I have not read the entire academic paper in detail, but I think it is safe to say that the authors do not use only this information to form their conclusions about the prospects of democracy as a form of government in rich countries. But I do think that many people (including Amanda Taub at the NYT) have stretched the interpretation of these data in ways that go well beyond what they are capable of revealing.


For anyone who is interested, I have provided the Stata code I used to generate these numbers at the bottom of this post:

capture log close
log using declining-democracy.txt, replace text

//  program:
//  task:		replicate NYT (How Stable Are Democracies? ‘Warning Signs
//              Are Flashing Red’)
//  author:     sam harper \ 30nov2016

//  #0
//  program setup

version 14
set linesize 80
clear all
macro drop _all	

// #1
// load World Values Survey data for Wave 5

* Wave 5 data
use V1 V2 V162 V236 using ///
  "WV5_Data_stata_v_2015_04_18.dta", clear

* restrict to AUS, SWE, NED, USA, NZL, GBR
keep if inlist(V2,752,36,528,840,554,826)

* rename variables
rename (V1 V2 V162 V236) (wave country impdemo byear)

* generate survey year midpoint
gen syear = 2006
label var syear "Survey year (midpoint)"

* save temporary dataset
tempname wvs5
save `wvs5', replace

// #2
// load World Values Survey data for Wave 6

* now for Wave 6 data
use V1 V2 V140 V241 using ///
  "WV6_Stata_v_2016_01_01.dta", clear

* restrict to AUS, SWE, NED, USA, NZL, GBR
keep if inlist(V2,752,36,528,840,554,826)

* rename
rename (V1 V2 V140 V241) (wave country impdemo byear)

* generate survey year midpoint
gen syear = 2012
label var syear "Survey year (midpoint)"

* append Wave 5 data
append using `wvs5'

// #3
// generate outcomes and birth cohort categories

* country order for graphing
recode country (752 = 1 "Sweden") (36 = 2 "Australia") ///
  (528 = 3 "Netherlands") (840 = 4 "United States") ///
  (554 = 5 "New Zealand") (826 = 6 "Britain"), gen(cname)

* generate outcome variable: How impt to live in a democracy? 
* 1 = not at all, 10 = absolutely
recode impdemo (10=1 "Yes") (1/9=0 "No") (-5 -4 -2 -1 = .), gen(demo)
label var demo "Democracy 'essential'?"

* generate birth cohort categories and labels
egen bcohort = cut(byear), at(1930(10)1990) icodes
label define bcohort 0 "1930-39" 1 "1940-49" 2 "1950-59" ///
  3 "1960-69" 4 "1970-79" 5 "1980-89"
label values bcohort bcohort
label var bcohort "Birth cohort"

// #4
// plot outcomes by birth cohort

* reproduce NYT figure
qui logit demo cname##bcohort, nolog
qui margins cname#bcohort
marginsplot, by(cname) recastci(rarea) ///
  byopts(rows(1) imargin(medsmall) ///
  title("Percentage of people who say it is 'essential' to live in a democracy", ///
  size(medium)) ///
  note("Source: Author's calculations of the World Values Surveys, Waves 5 and 6", ///
   color(gray) size(vsmall))) ///
  ylabel(0 "" .25 "25%" .5 "50%" .75 "75%" 1 "100%", angle(horizontal)) ///
  xlabel(0 "1930s" 5 "1980s") ciopts(color(ltblue)) ///
  plotopts(lcolor(edkblue) mcolor(edkblue) msymbol(O) msize(small)) ///
  ytitle("") scheme(tufte) name(nyt, replace) xsize(6.5)
graph export "nyt-replication.png", replace

* within cohort change in proportion reporting "essential"
qui logit demo cname##bcohort##wave if cname<5, nolog
qui margins r.wave, over(cname bcohort)
marginsplot, by(cname) xdim(bcohort) recast(scatter) recastci(rspike) ///
  byopts(rows(1) imargin(medium) ///
  title("Change in % saying it is 'essential' to live in a democracy, 2006 to 2012", ///
  size(medium)) ///
  note("Source: Author's calculations of the World Values Surveys, Waves 5 and 6", ///
   color(gray) size(vsmall))) ///
  ylabel(-.2 "-20%" -.1 "-10%" 0 "" .1 "10%" .2 "20%", angle(horizontal)) ///
  xlabel(0 "1930s" 5 "1980s") ciopts(color(ltblue)) ///
  plotopts(lcolor(edkblue) mcolor(edkblue) msymbol(O) msize(small)) ///
  ytitle("") scheme(tufte) yline(0, lcolor(red)) name(change, replace)
graph export "change.png", replace

log close







Disability, Welfare, Opioids...

Nick King, Erin Strumpf and I just published a new editorial in Annals of Internal Medicine on the potential association between changes to welfare policy in the 1990s, the subsequent rise of the disability rolls (particularly for mental health and musculoskeletal conditions such as back pain), and the extraordinary rise in deaths from drug overdoses related to prescription opioids (and now, increasingly, heroin). An ungated version of the paper will be available for 6 months here.

A Bad Taste That Keeps Not Getting Any Better....

Reposted: some interesting comments by Jay Kaufman on the responsiveness (or lack thereof) of the PLoS ONE editors to serious concerns about one of their papers on race:

Jonathan Eisen already posted on this blog about a PLoS ONE paper by Mason et al published on 23 October 2013.  And he posted related comments on the PLoS ONE website.  I also commented at this site, in reference to his comments and the authors' response. The purpose of my comments here is just to review those concerns and comment additionally on the PLoS ONE response and what this means for the journal's publication model and the progress of science.

The paper by Mason and colleagues analyzes data on 48 people in each of 4 self-identified ethnic groups (African American, Caucasian, Chinese, and Latino). These study subjects are apparently volunteers, and the paper only states that they are non-smokers over 18 years old who are free of a list of diagnosed diseases and who have not recently had their teeth cleaned. Based on the text of the published paper, there is no consideration of their age, diet, social class, or even gender. The authors culture bacterial species from the study subjects and process the data through an algorithm that maximizes the prediction of racial group membership based on these measured data.

The prediction is moderately successful, but this could result from any number of unsurprising reasons. For example, if alcohol consumption affects some particular bacterial species, and whites drink more than Asians in central Ohio, then whatever species is diminished by alcohol exposure would help predict that a sample was from a white rather than from an Asian volunteer. And likewise for any of a million possible lifestyle, social class and demographic differences.  In fact, this is a general problem with data mining exercises that Lazer et al describe in the current issue of Science.
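That kind of confounded prediction is easy to simulate. A hypothetical sketch (Python; every number here is invented) in which a "bacterial" marker predicts group membership only because an unmeasured lifestyle exposure differs between groups:

```python
import random

random.seed(1)

# Hypothetical setup: two groups differ in an unmeasured exposure (drinks per
# week), which in turn shifts the abundance of some bacterial species.
def bacteria_level(drinks_per_week):
    return 10.0 - 0.8 * drinks_per_week + random.gauss(0, 1)

group_a = [bacteria_level(random.gauss(7, 2)) for _ in range(500)]  # heavier drinkers
group_b = [bacteria_level(random.gauss(2, 2)) for _ in range(500)]  # lighter drinkers

# A trivial "classifier": split at the midpoint of the two group means.
cut = (sum(group_a) / 500 + sum(group_b) / 500) / 2
correct = sum(x < cut for x in group_a) + sum(x >= cut for x in group_b)
accuracy = correct / 1000

# Prediction succeeds well above chance even though group membership has no
# direct effect on the bacteria -- the unmeasured exposure does all the work.
print(accuracy)
```

Nothing about a successful prediction, in other words, distinguishes a genetic cause from any of the "million possible lifestyle, social class and demographic differences."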

The authors provide no information about how balanced this sample is with respect to any of these variables. Maybe the 48 Hispanics are younger than the 48 whites on average, or have more tooth decay or eat more refined sugar or any of a million other possibilities. The fact that these countless potentially imbalanced factors get represented in the oral bio-environment hardly seems surprising, and the fact that these behaviors and exposures might be differential by race is an observation that is completely trivial from a sociological perspective.

My concern here, however, is that the authors assert in the published text that these differences do not arise from any of these myriad environmental factors, but from some innate genetic characteristics of the groups. In the Discussion section on page 3 they state that "ethnicity exerts a selection pressure on the oral microbiome, and...this selection pressure is genetic rather than environmental, since the two ethnicities that shared a common food, nutritional and lifestyle heritage (Caucasians and African Americans) demonstrated significant microbial divergence." This is a remarkable statement: that Caucasians and African Americans experience no differential dietary or lifestyle factors. It is directly contradicted by thousands of published papers in sociology, epidemiology and anthropology that document these differences for reasons of culture, geographic origin, social class and discrimination.

Jonathan Eisen's post directly confronted the authors on this point, and they responded with the following explanation:

"Subjects were selected based on extensive questionnaire surveys and clinical examinations to ensure homogeneity. These questionnaires evaluated educational level, socio-economic status, diet and nutritional history, systemic health status, oral hygiene habits and dental visits, among other things."

This is surely an important statement about the research design, but the problem is that it appears nowhere in the peer-reviewed text of the published paper. What exactly do the authors mean when they insist that the study subjects were perfectly balanced on factors such as socio-economic status and nutritional history? These complex social and lifestyle variables are notoriously difficult to define and measure. While the authors describe the laboratory techniques in baroque detail, they do not even mention in the published paper that they measured these factors, let alone how these variables were defined and considered in the analysis. This represents a profound limitation for the reader in assessing the validity of these measures and adjustments, and therefore the adequacy of the claimed "homogeneity". The complete omission of these crucial aspects of the analysis in the paper prevents the reader from investing much confidence in the boldly stated claim that observed differences are "genetic rather than environmental" in origin.

I expressed these concerns in my own post at the journal website on 14 November 2013, but the authors did not respond.  Therefore, at Jonathan's suggestion, I addressed this concern to the PLoS ONE editors in an e-mail on 22 November 2013. I got passed along from one editor to another, and finally I got a very nice response from Elizabeth Silva on 4 December 2013. She wrote:
I wanted to let you know that I am discussing this article and your concerns with both the Academic Editor and the authors, as well as with my colleagues. We take such concerns very seriously and will ensure that appropriate measures are taken to correct any errors or discrepancies. 

Then I waited.  After 2 months I had heard nothing, so I wrote to Dr. Silva again asking for any word on progress, but received no reply.  So I waited another month.

On 4 March it had been 3 months since the note from PLoS ONE promising appropriate measures to correct any errors or discrepancies, so I wrote again, this time a bit more insistently.  This message did finally generate a quick and reassuring response from Dr. Silva:
I really am very sorry for the extended delay in replying to you, and for my neglect in providing you with an update. Following your correspondence I contacted the authors to ask them for additional information relating to their statement that they corrected for confounding factors, and details of these methods. They promptly replied with a table of details of the baseline variables that they corrected for, and that they described in the comment on their article (see attached), as well as an additional correction to one of their figure legends. I then contacted the Academic Editor, Dr. Indranil Biswas with the full details of your concerns, as well as the table sent by the authors and the correction they requested for their figure legend. We asked Dr. Biswas to revisit the manuscript, in light of this new information, and he has informed us that he feels the conclusions of the manuscript are sound. We will now work with the authors to draft and issue a formal correction to the published article to update the methods to include the table, and to amend the figure legend in question.

The table that Dr. Silva forwarded displayed a list of variables and a p-value for some kind of test between the values in the 4 race groups. The test is not specified (t-test? chi-square test?) but presumably it is for any difference in means or proportions between the 4 groups.  Most of the p-values are large, indicating little evidence for any difference between the groups in income, age, education, or frequency of tooth-brushing, etc.  Based on this table, the populations differed only in their diets, which were characterized as "Asian Diet", "Hispanic Diet" and "American Diet".  Not surprisingly, the Asians were significantly more likely to report an "Asian Diet" and the Hispanics were significantly more likely to report a "Hispanic Diet".  The Blacks and Whites had similar reported consumption of the "American Diet", which presumably was the basis for the authors' assertion that these groups have identical social environments.
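For what it's worth, the kind of test that table presumably reports is easy to reproduce. A sketch (Python, with made-up counts) of a chi-square test of one categorical covariate across groups, using three groups so the 2-degree-of-freedom p-value has a closed form:

```python
import math

# Made-up counts: daily vs. non-daily tooth-brushers in three groups of 48
# volunteers (rows = groups). A chi-square test of independence, by hand.
observed = [[40, 8], [38, 10], [41, 7]]

n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(row[j] for row in observed) for j in range(2)]

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n  # expected count under independence
        chi2 += (o - e) ** 2 / e

# With (3-1)*(2-1) = 2 degrees of freedom, the chi-square survival function
# has the closed form exp(-x/2).
p = math.exp(-chi2 / 2)

# A large p-value says only that this one measured covariate is not detectably
# imbalanced -- not that the groups are balanced on everything that matters.
print(round(p, 3))
```

The comment in the code is the whole point: with 48 people per group, even real imbalances will often fail to reach significance, and a table of large p-values cannot license the leap to "genetic rather than environmental."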

To date, there has been no correction made to the Mason et al paper at the PLoS ONE website.  Therefore it is perhaps somewhat premature to speculate on how the authors will address the concern voiced by Jonathan Eisen's posted comment and blog post that balance across a handful of measured covariates does not in any way imply balance across all relevant factors except for genetics. Indeed, it has long been argued in the epidemiology literature that one cannot make indirect inferences about genes by measuring and adjusting for a few environmental exposures and attributing all remaining differences to genes.  The argument that Blacks and Whites in Ohio experience identical environments is clearly false, even if a handful of measured covariates are not significantly different in their small convenience sample, the exact origin of which is still obscure.

There are many observations that can be made from this episode. I offer just a few:
  1. These authors are assiduous in describing their lab techniques, but regarding the study design and analysis they are quite cavalier.  Presumably the reviewers were not population scientists, and so they failed to point out these embarrassing flaws. This raises the question of whether a multidisciplinary journal such as PLoS ONE has the relevant expertise to screen out scientifically invalid papers. The fact that Dr. Silva suggested that the authors' table of covariates and p-values solves the issue demonstrates a wide gap of understanding.  Specialty journals that handle a narrow disciplinary range are not faced with this kind of crisis of competence.
  2. PLoS ONE is such a large operation, with so many papers, that quality control seems to suffer. These beleaguered editors are responsible for an enormous publishing volume. Has quantity overwhelmed quality to the extent that gross errors of logic slip through? Months later, the Mason paper has been accessed thousands of times and generated a great deal of media attention, and yet no correction or erratum has appeared, despite the fact that the authors freely admit that the methods in their published paper are not accurate.
  3. The publishing model gives PLoS ONE a big incentive (almost $2000) to accept a paper, but once it is published, little incentive to correct or withdraw it. 

Sadly, this is not an isolated example.  This week, PLoS ONE published a paper by Wikoff et al, which makes a similar logical gaffe about observed racial difference proving a genetic difference.  We could post a comment online, but it seems that nobody (neither the authors nor the editors) has much time to spend monitoring such comments, nor much incentive to care about them.  The authors have their publication, the journal has its $2000, and another tiny piece of horrific misinformation has been released into the world.  The basic philosophy of PLoS ONE is to reduce the gate-keeper role of scientific publication. I am starting to become convinced that a little gate-keeping is not such a bad idea.

Don't do it...

Jay Kaufman sends along a bit from the Economist about the futility of earning a PhD. My favorite part:

One thing many PhD students have in common is dissatisfaction.

But this gets to the heart of the matter:

There is an oversupply of PhDs. Although a doctorate is designed as training for a job in academia, the number of PhD positions is unrelated to the number of job openings. 

And later...

America produced more than 100,000 doctoral degrees between 2005 and 2009. In the same period there were just 16,000 new professorships...
...Even in Canada, where the output of PhD graduates has grown relatively modestly, universities conferred 4,800 doctorate degrees in 2007 but hired just 2,616 new full-time professors.


Right now this seems less true in epidemiology (the job market still seems relatively strong), but recent doctorates still depend on retirements or deaths to open up positions.