Empirical framework
Context
Tanzania is one of the largest countries in Eastern and Central Africa,
and an important source of the region’s maize production. However, most
of this production comes from smallholders who have relatively low
levels of productivity, and few of whom use modern inputs such as
fertilizer. As such, raising maize yields has been an important
investment and policy target for the country and its partners in recent
years. Tanzania is representative in many ways of the maize-based
farming systems found elsewhere in the region, in terms of its
agroecologies and range of biophysical endowments, the predominant
production characteristics of its smallholder farmers, and the
relatively low levels of market infrastructure development. At the same
time, the heterogeneity of production characteristics found within
Tanzania’s maize growing areas bodes well for its value as a test case
for evaluating variability of agronomic responses across key
geographical characteristics (Nord & Snapp 2020).
Data
Farm household survey data were collected in Tanzania in 2016 and 2017
on 624 households, located in 25 districts (Figure 1). These districts
are located in both the Southern Highlands and Northern zone,
representing the most important maize growing areas in the country.
Within each district, a stratified sampling frame was used that
maximized soil type variability so as to be able to make broad inferences
about crop response, and to identify survey localities (Walsh & Vågen
2006; Shepherd et al., 2015). Within each locality, a listing of all
maize producing households was generated with the assistance of the
local headman. From this listing, 24 households in each locality were
randomly selected. Data were collected on household demographics, farm
and non-farm economic portfolios, land holdings and productive assets,
and other characteristics. Within each farm household, basic information
was collected for each plot managed by the household (e.g. land use
status, production decisions). In addition, very detailed agronomic
management information was collected for each household’s most important
maize plot (henceforth the farm’s “focal plot”). This plot was
identified by the farmer as the plot which generated the most maize
production, and which received the most managerial effort.
Nitrogen and other macronutrient supplies were calculated from the
various fertilizer blends farmers reported using. To account for
implausible values, we replaced application rates exceeding 700 kg
ha⁻¹ of N with that value, which was tantamount to winsorizing at the
99th percentile of N application rates for fertilizer users, and which
follows the protocols used by
Liverpool-Tasie et al. (2017) and Sheahan and Barrett (2017).
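The calculation described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the products and grades in `GRADES` are assumed examples (not the survey's list of blends), the conversion of labelled P2O5 and K2O content to elemental P and K is an assumption about how macronutrient supplies were expressed, and `nutrient_rates` is a hypothetical helper name. Only the 700 kg ha⁻¹ cap on N comes from the text.

```python
# Fertilizer grades as percent N, P2O5 and K2O by weight.
# These products and grades are illustrative assumptions, not the survey's list.
GRADES = {
    "urea": (46.0, 0.0, 0.0),
    "DAP": (18.0, 46.0, 0.0),
    "CAN": (27.0, 0.0, 0.0),
}
P2O5_TO_P = 0.4364   # mass fraction of elemental P in P2O5
K2O_TO_K = 0.8301    # mass fraction of elemental K in K2O
N_CAP_KG_HA = 700.0  # cap on N application rates stated in the text


def nutrient_rates(applications, area_ha):
    """Return (N, P, K) in kg/ha for a list of (product, kg applied) pairs."""
    n = p = k = 0.0
    for product, kg in applications:
        pct_n, pct_p2o5, pct_k2o = GRADES[product]
        n += kg * pct_n / 100.0
        p += kg * pct_p2o5 / 100.0 * P2O5_TO_P
        k += kg * pct_k2o / 100.0 * K2O_TO_K
    # Replace implausibly high N rates with the cap.
    n_rate = min(n / area_ha, N_CAP_KG_HA)
    return n_rate, p / area_ha, k / area_ha


# Hypothetical plot: 50 kg urea and 50 kg DAP applied on 0.5 ha.
print(nutrient_rates([("urea", 50.0), ("DAP", 50.0)], 0.5))
```

Capping at a fixed value, rather than dropping observations, keeps the high-application plots in the sample while limiting their leverage on the estimated response curve.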
Maize yields on focal plots were measured using crop cuts from three 5x5
meter quadrants, calculated at 12.5% grain moisture content. Soil
characteristics from these plots were measured from samples taken at
quadrant locations at 0-10 and 10-20 cm depths.
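As an illustration of how a crop-cut yield at 12.5% grain moisture might be computed, the sketch below uses the standard dry-matter adjustment. It assumes that grain from the three quadrants is pooled before scaling to a hectare and that a single field moisture reading applies to the pooled sample; neither detail is stated in the text, and `yield_kg_ha` and the example weights are hypothetical.

```python
STANDARD_MOISTURE_PCT = 12.5     # reporting moisture content from the text
QUADRANT_AREA_M2 = 5.0 * 5.0     # each crop cut is 5 m x 5 m
M2_PER_HA = 10_000.0


def yield_kg_ha(grain_kg_by_quadrant, field_moisture_pct):
    """Scale pooled crop-cut grain weight to kg/ha at standard moisture."""
    harvested_area_m2 = QUADRANT_AREA_M2 * len(grain_kg_by_quadrant)
    # Dry matter is what remains after removing water at field moisture.
    dry_matter_kg = sum(grain_kg_by_quadrant) * (100.0 - field_moisture_pct) / 100.0
    # Re-express that dry matter as grain containing 12.5% water.
    grain_at_standard_kg = dry_matter_kg / ((100.0 - STANDARD_MOISTURE_PCT) / 100.0)
    return grain_at_standard_kg / harvested_area_m2 * M2_PER_HA


# Hypothetical crop cuts: 4, 5 and 6 kg of grain measured at 20% moisture.
print(round(yield_kg_ha([4.0, 5.0, 6.0], 20.0), 1))
```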
Total organic carbon, despite its well-recognized importance as an
indicator of overall soil quality, is not an ideal indicator of nutrient
availability because much of the bulk soil organic matter is relatively
inert (Drinkwater et al., 1998). Soil organic carbon is largely
conditioned by topography and soil parent material; however, once a
field is converted to agriculture, active soil organic matter fractions
largely determine soil productivity, and this is markedly influenced by
farmer practices (Zingore et al., 2008). Thus, rather than testing for
total carbon, as is often the case in standardized soil testing, testing
the active organic matter pool provides better insight into how changes
in management affect nutrient cycling and potential soil C accumulation
or loss (Haynes, 2005; Wander, 2004). The active carbon pool, while
constituting a small fraction (5–20%) of the soil’s total organic
matter, is the component that greatly influences key soil functions,
such as nutrient cycling and availability, soil aggregation, and soil C
accumulation (Grandy and Robertson, 2007; Schmidt et al., 2011; Six et
al., 1998; Wander, 2004). Hence, in this analysis, we focus on the
factors influencing active carbon.
Developments in laboratory assays to monitor ‘active’ soil organic
matter fractions have highlighted the value of permanganate oxidizable
carbon as an early indicator of management influence on soil organic
carbon (Culman et al., 2012). Total soil organic carbon also provides
insights regarding sustainable soil management, although at a slow
timestep (five to ten years). For this work, permanganate oxidizable
carbon (POXC) was determined on a ground (1 mm sieve) sub-sample,
oxidized with 0.02 M KMnO4, and absorbance was subsequently read at a
wavelength of 550 nm (ibid.). To address potential measurement error, and
under the assumption that the soil properties of interest here
(particularly soil active carbon) are relatively stable, we use the
average measure across the two years for each plot in our regression
work.
Rainfall was measured as the sum of dekadal values recorded for the main
growing season, using the CHIRPS dataset (Funk et al., 2017). Rainfall
variability was measured as the coefficient of variation on the dekadal
observations within a season.
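The two rainfall measures can be sketched as below. The dekadal values are invented, `season_rainfall` is a hypothetical helper, and the use of the population standard deviation in the coefficient of variation is an assumption, since the text does not say which variant was used.

```python
import statistics


def season_rainfall(dekadal_mm):
    """Return (seasonal total in mm, coefficient of variation) for one season."""
    total = sum(dekadal_mm)
    mean = statistics.fmean(dekadal_mm)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a rainless season")
    # Assumption: population standard deviation across the season's dekads.
    cv = statistics.pstdev(dekadal_mm) / mean
    return total, cv


# Hypothetical season with four dekads of rainfall (mm).
total_mm, cv = season_rainfall([10.0, 20.0, 30.0, 40.0])
print(total_mm, round(cv, 4))
```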
Estimation strategy
The intent of this paper is to understand the agronomic and economic
returns to nitrogen fertilizer applications in smallholder maize
production. In keeping with agronomic and agricultural economic
literature, we frame maize yield (y) as a function of fertilizer
application rates (F), other agronomic management decisions (M), and
other exogenous conditioners (G).
\(y\ =\ f(\mathbf{F},\mathbf{M},\mathbf{G})\) (1)
Because farmers in Tanzania use a variety of fertilizer blends, we
integrate these decisions by decomposing each blend into its
macronutrient content, i.e. nitrogen (N), phosphorus (P) and potassium
(K). Other management factors include improved maize seed, maize-legume
intercropping (common in the Southern Highlands), organic matter
integration via compost, manure and crop residue retention, plant
spacing, weeding, fallowing, terracing and erosion control structures,
and herbicide and pesticide applications. Other exogenous conditioners
include slope, rainfall, rainfall variability and the presence of
disease or striga (witchweed).
We adopt a flexible polynomial functional form, allowing for quadratic
terms and interactions between variables. In this approach, we follow
similar empirical studies (e.g. Burke et al., 2017; Sheahan et al.,
2013; Xu et al., 2009). This flexibility is important in enabling us to
investigate how yield response to nitrogen is conditioned by other
factors. We may generalize this function as:
\(y_{\text{it}}\ =\ \alpha+\beta_{1}N_{\text{it}}+\beta_{2}N_{\text{it}}^{2}+\beta_{10}\mathbf{X}_{\text{it}}+\beta_{11}N_{\text{it}}*\mathbf{X}_{\text{it}}+\ u_{\text{it}}\) (2)
where N is nitrogen, our primary input of interest, i indexes plots,
t indexes observations over time, and where, for convenience, we have
subsumed M and G in the vector X. As indicated earlier, a priori
hypotheses include the
possibility of positive interactions between nitrogen, soil organic
carbon and rainfall, after controlling for other factors.
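As a worked step, using only the symbols of equation 2, differentiating with respect to nitrogen gives the marginal product of nitrogen:
\(\frac{\partial y_{\text{it}}}{\partial N_{\text{it}}}\ =\ \beta_{1}+2\beta_{2}N_{\text{it}}+\beta_{11}\mathbf{X}_{\text{it}}\)
Read this way, a negative \(\beta_{2}\) would imply diminishing returns to nitrogen, and the hypothesized positive interactions correspond to positive elements of \(\beta_{11}\) on soil organic carbon and rainfall. The expression also shows that the response to nitrogen is not a single number but varies across plots and seasons with the conditioning factors in X.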
A key consideration is that unobserved factors may bias our
estimation results. Concretely, we may decompose the
residual in equation 2 as:
\(u_{\text{it}}\ =\ o_{\text{it}}+\ c_{i}+\ \epsilon_{\text{it}}\) (3)
where \(o\) represents unobserved time-varying factors, \(c\) represents
unobserved time-constant factors, and \(\epsilon\) is a randomly
distributed error term. Time-varying unobservables may include soil
moisture, nutrient status or other factors which are often missing from
empirical studies (or poorly measured). Time-constant unobservables may
include farmer ability or plot biophysical characteristics which change
little from year to year, but which may affect both fertilizer usage and
yield outcomes. Finally, correlation between model covariates and the
stochastic error term may be an additional source of bias. Burke et
al. (2017) provide a useful, detailed discussion of these issues and
corresponding identification strategies in survey data settings.
In the present study, we argue that our dataset does a better job of
controlling for time-varying plot and plot-management factors than is
typically the case in empirical studies, and therefore unobserved
\(o_{\text{it}}\) is unlikely to be a major issue. Our larger concern is
with time-invariant unobserved farmer and plot-level heterogeneity, which
is likely to bias our results upward if not addressed (e.g. under the
assumption that more able farmers are more likely to use fertilizer than
less able farmers). To address this, we estimate models with the
Mundlak-Chamberlain device (i.e. the Correlated Random Effects model;
Wooldridge 2010), as well as a Fixed Effects estimator.
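To illustrate the logic of the Mundlak-Chamberlain device, the sketch below uses an invented two-year panel with a single regressor. It is not the estimation code or data of this study: the plot effects are constructed to be larger on plots that receive more nitrogen, the true slope is set to 10, and `ols` is a small hypothetical helper written for this example. The device adds each plot's time-average of the regressor to a pooled regression, which absorbs the part of \(c_{i}\) that is correlated with nitrogen use. A full application would add the plot means of all time-varying covariates.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Invented data: y = 1000 + 10 * N + c_i, with c_i rising with N use.
TRUE_SLOPE = 10.0
PLOT_EFFECT = {1: 0.0, 2: 300.0, 3: 600.0, 4: 900.0}
N_RATES = {1: [0.0, 20.0], 2: [40.0, 50.0], 3: [80.0, 110.0], 4: [120.0, 130.0]}

rows = []  # one (N, plot mean of N, yield, plot mean of yield) tuple per plot-year
for plot, rates in N_RATES.items():
    yields = [1000.0 + TRUE_SLOPE * n + PLOT_EFFECT[plot] for n in rates]
    n_bar = sum(rates) / len(rates)
    y_bar = sum(yields) / len(yields)
    rows += [(n, n_bar, yy, y_bar) for n, yy in zip(rates, yields)]

y = [r[2] for r in rows]

# Pooled OLS ignores c_i, so the slope absorbs the plot effect.
pooled = ols([[1.0, r[0]] for r in rows], y)[1]

# Correlated Random Effects: add the plot-level mean of N as a regressor.
cre = ols([[1.0, r[0], r[1]] for r in rows], y)[1]

# Fixed Effects: within (plot-demeaned) estimator.
fe = (sum((r[0] - r[1]) * (r[2] - r[3]) for r in rows)
      / sum((r[0] - r[1]) ** 2 for r in rows))

print(pooled, cre, fe)
```

In this constructed example the pooled slope lies well above the true value, while the CRE and within estimates coincide and recover it. This mirrors the upward bias that the text expects when more able farmers, or better plots, are also the ones receiving more fertilizer.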