Statistical Analysis

Our basic framework is a cluster-randomized evaluation, with clusters specified at the level of the hamlet. A key issue for inference from such a design is to account for non-independence among observations from the same cluster when estimating the impact of the intervention \cite{Hayes:2009it}. We focus here on the general plan for analysis of survey results, including descriptive tables and basic model specifications.

Descriptive tables

We will generate a descriptive table of measures of central tendency and dispersion for the main outcome variables and the primary covariates specified above, and provide both cluster-level summaries and individual-level summaries \cite{Campbell:2004fj}. Continuous variables will be presented as means and standard deviations, and categorical variables as proportions and standard deviations. We will also provide a breakdown by our main stratification variable, block. We will refrain from conducting statistical tests of balance across treated and control groups \cite{Senn:1994qv}.

Sample attrition and missing data

Given the relatively high initial response rate for our baseline survey (~90%), relatively short follow-up, and generally limited migration in the study area, we anticipate little study attrition. Even for those observed at follow-up it is also possible that some primary or secondary outcome data could be missing, although we anticipate relatively little item non-response, based on the baseline survey results. Regardless, sample attrition and missing data are common in virtually all surveys and we describe briefly our plan for dealing with these issues. If the proportion of missing outcome data is relatively small (e.g., ¡5%) we will proceed with a complete case analysis. Regardless of the amount of missing data, we will describe the reason why subjects were lost to follow-up and why outcomes are missing \cite{NAP:2010qc,Groenwold:2014bq}. In addition to reporting baseline characteristics we will also include characteristics of those who were lost to follow-up or were missing outcome or covariate data. For amounts of missing data that may be more severe we will perform sensitivity analysis to quantify the effect of missing outcome data on our effects of interest (e.g. multiple imputation or inverse probability weighting, or estimating bounds for our treatment effects) \cite{Manski:1990fr,Lee:2009aa,White:2011aa}.

Main model specifications

The primary specification will be an intent-to-treat (ITT) model including adjustment for stratification by block. For continuous outcomes we will use linear model of the form:
\begin{equation} y_{ij}=\alpha+\beta Z_{j}+\sum_{k=1}^{K}\gamma_{k}K_{jk}+\varepsilon_{ij}\\ \end{equation}
where \(y_{ij}\) is the outcome of interest, \(Z_{j}\) is the main treatment assignment variable (1 for intervention, 0 for control), \(\gamma_{k}\) are coefficients for each of the \(K\) blocks on which we stratified randomized treatment assignment, and \(\varepsilon_{ij}\) is an error term. To account for non-independence of errors among observations in the same hamlet we will use cluster-robust variance estimators in all models \cite{Hayes:2009it,Cameron:2015aa}. The coefficient \(\beta\) and its 95% confidence interval will be our primary estimate of interest. Equation 1 above represents our unadjusted model, but in order to increase precision we may include additional baseline covariates that are strong predictors of the outcome. It is often the case that one of the strongest predictors of post-treatment outcome is pre-treatment outcome measured at baseline, and we will include this as well:
\begin{equation} y_{ij}=\alpha+\beta Z_{j}+\sum_{k=1}^{K}\gamma_{k}K_{jk}+\sum_{m=1}^{M}\delta_{m}\mathbf{C}_{m}+\varepsilon_{ij}\\ \end{equation}
where now \(\mathbf{C}\) is a vector of pre-treatment covariates, including a measure of the outcome \(y_{ij}\) at baseline. In particular, including measures of the outcome at baseline can lead to meaningful increases in power, especially for outcomes that are weakly correlated over time \cite{Frison:1992rt,McKenzie:2012yu}. For continuous outcomes (e.g., empowerment scale) we will use linear regression. For binary outcomes (e.g., maternal employment) we will use logistic regression and report marginal effects on the absolute risk scale. For count outcomes (e.g., reports of symptoms of distress) we will use either Poisson or negative binomial regression and report marginal estimates of the expected number of events. For all models we will use cluster-robust standard errors, clustered at the hamlet level \cite{Cameron:2015aa}.

Heterogeneous treatment effects

Prior literature has found some evidence that access to center-based child care had stronger impacts on mothers who may have been more disadvantaged \cite{Barros:2011qr,Angeles:2014bh}. Therefore, we will assess whether there are any differences in the program impact according to the following maternal characteristics: 1) Current work status (whether or not women were working at baseline); 2) Education (less than primary vs. primary or greater); 3) Household structure (e.g., whether there is another adult caregiver in house); and 4) Measures of women’s empowerment at baseline (continuous latent scale from confirmatory factor analysis). Additionally, we will test whether the program has any differential impact by our main geographic stratification variable (block). In order to test whether the program effects may differ by other individual characteristics, we will extend the generalized linear model specified with covariates (including block) above (Equation 2) to allow for heterogeneity in the impact of the treatment:
\begin{equation} y_{ij}=\alpha+\beta Z_{j}+\sum_{k=1}^{K}\gamma_{k}K_{jk}+\sum_{m=1}^{M}\delta_{m}\mathbf{C}_{m}+\theta\left(Z_{j}\times C_{ij}\right)+\varepsilon_{ij}\\ \end{equation}
where the coefficient \(\theta\) on the product term between treatment assignment (\(Z_{j}\)) and a specific covariate \(C\) provides a test of whether the program’s impact is homogenous across levels of \(C\). The other terms in Equation 3 are defined analogously to Equation 2, though the inclusion of the product term affects their interpretation. Based on prior research we anticipate stronger treatment effects for women who were not working at baseline, with low education, without another adult caregiver in the household, and with higher baseline levels of empowerment.

Multiple comparisons

Our primary analytic goal for this study is to estimate treatment effects and measures of precision (i.e., confidence intervals) for the intervention, and we will generally avoid null hypothesis significance testing (with the exception of tests for heterogeneity as described above). However, because daycare is a complex intervention that may plausibly affect a number of outcomes for both mothers and children, in order maintain an overall Type 1 error rate we will make some adjustments for testing the impact of the intervention across many outcomes \cite{Schochet:2008db,Anderson:2012nx,Glennerster:qf}. To test whether the intervention has general impacts on a given domain with multiple indicators (e.g., empowerment, employment, income/savings, time use) we will create a summary index for each domain. For related outcomes within a given domain we will take the approach of creating standardized effect estimates for each of the subdomains, averaging them, and calculating a summary estimate that accounts for testing multiple outcomes \cite{OBrien:1984xe,Kling:2007fv}. The use of an index is beneficial in this case since probability of a false rejection does not increase as additional outcomes are added to a summary index. Moreover, an indexed outcome can provide a statistical test for whether a program has a “general effect” on a set of outcomes.
Using an index, however, will not provide us with inference on specific sub-components within a given domain (e.g., a women’s decision-making ability regarding her own health care in the empowerment domain). Within a given domain we will report both unadjusted \(p\)-values for individual outcomes and “family-wise” \(p\)-values adjusted to account for the multiple outcomes examined within a domain \cite{Kling:2007fv,Finkelstein:2012ye}. To avoid the losses of power associated with simple Bonferroni-type adjustments we will use the free step-down method of Westfall and Young \cite{Westfall:1993qr}. Finally, because many of our tests for heterogenous effects are exploratory, we will use the less conservative strategy of Benjamini and Hochberg \cite{Benjamini:1995le} account for multiple testing across the standardized domain outcomes.

Estimating the impact of use of daycare on maternal outcomes

All of the prior analyses will estimate the impact of random assignment to daycare. However, we are also interested in estimating the impact of actual receipt of daycare on maternal and child outcomes. For these analyses the basic identifying assumption is that random assignment to daycare only affects outcomes through its impact on the actual take-up and use of daycare. Equation 1 above may be considered as the ITT or reduced form equation for the impact of treatment assignment, and we will use two-stage least squares (2SLS) instrumental variables analysis to estimate the impact of daycare on maternal outcomes. We will proceed by estimating the first stage equation, which gives the impact of randomized treatment assignment on daycare:
\begin{equation} D_{ij}=\alpha+\beta Z_{j}+\sum_{k=1}^{K}\gamma_{k}K_{jk}+\sum_{m=1}^{M}\delta_{m}\mathbf{C}_{m}+\varepsilon_{ij}\\ \end{equation}
where \(D_{ij}\) now represents whether or not mother reports enrolling her child in the balwadi program and the other parameters are defined similarly as in Equation 1 above. We use the coefficients from Equation 3 to predict, for each observation, the probability \(\hat{D}_{ij}\) of enrolling in the balwadi program, which is then used in the second stage equation:
\begin{equation} y_{ij}=\zeta+\eta\hat{D}_{ij}+\sum_{k=1}^{K}\theta_{k}K_{jk}+\sum_{m=1}^{M}\lambda_{m}\mathbf{C}_{m}+\varepsilon_{ij}\\ \end{equation}
where \(y_{ij}\) is the outcome of interest, \(\hat{D}_{ij}\) is the predicted probability of daycare use derived from Equation 4 and the coefficient. We will use software to correct standard errors for the fact that \(\hat{D}_{ij}\) is predicted in the first stage \cite{Angrist:1996rm,Angrist:2008vk}.