Minghao Qiu

and 2 more

Evaluating the influence of anthropogenic emissions changes on air quality requires accounting for the influence of meteorological variability. Statistical methods such as multiple linear regression (MLR) models with basic meteorological variables are often used to remove meteorological variability and estimate trends in measured pollutant concentrations attributable to emissions changes. However, the ability of these widely used statistical approaches to correct for meteorological variability remains unknown, limiting their usefulness in real-world policy evaluations. Here, we quantify the performance of MLR and other quantitative methods using two scenarios simulated by a chemical transport model, GEOS-Chem, as a synthetic dataset. Focusing on the impacts of anthropogenic emissions changes in the US (2011 to 2017) and China (2013 to 2017) on PM2.5 and O3, we show that widely used regression methods do not perform well in correcting for meteorological variability and identifying long-term trends in ambient pollution related to changes in emissions. The estimation errors, characterized as the differences between meteorology-corrected trends and emission-driven trends under constant meteorology scenarios, can be reduced by 30%-42% using a random forest model that incorporates both local- and regional-scale meteorological features. We further design a correction method based on GEOS-Chem simulations with constant emission input and quantify the degree to which emissions and meteorological influences are inseparable due to their process-based interactions. We conclude by providing recommendations for evaluating the effectiveness of emissions reduction policies using statistical approaches.
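The idea of a meteorology-corrected trend can be illustrated with a minimal sketch. It is not the authors' code and does not involve GEOS-Chem output: the synthetic data, the variable names (`temperature`, `wind_speed`, `pm25`), the coefficient values, and the choice of a single joint regression with a linear time term are all assumptions made for illustration. Because the "true" emission-driven trend is set by construction, the raw and corrected trends can each be compared against it, loosely mirroring the synthetic-dataset logic described in the abstract.

```python
import numpy as np


def ols_coefficients(predictors, response):
    """Ordinary least squares with an intercept; returns [intercept, b1, b2, ...]."""
    design = np.column_stack([np.ones(len(response)), predictors])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


# Hypothetical synthetic record: five years of daily values.
rng = np.random.default_rng(0)
n_days = 5 * 365
years = np.arange(n_days) / 365.0

# Assumed meteorology: temperature drifts upward over time, wind does not.
temperature = 0.5 * years + rng.normal(0.0, 1.0, n_days)
wind_speed = rng.normal(0.0, 1.0, n_days)

# Assumed "truth": emissions alone change the pollutant by -2.0 units per year.
true_emission_trend = -2.0
pm25 = (
    50.0
    + true_emission_trend * years
    + 1.5 * temperature
    - 3.0 * wind_speed
    + rng.normal(0.0, 1.0, n_days)
)

# Raw trend: regress the concentration on time only.
raw_trend = ols_coefficients(years, pm25)[1]

# Meteorology-corrected trend: time coefficient after adding the
# meteorological variables to the regression.
corrected_trend = ols_coefficients(
    np.column_stack([years, temperature, wind_speed]), pm25
)[1]

print(f"true trend:      {true_emission_trend:.2f} per year")
print(f"raw trend:       {raw_trend:.2f} per year")
print(f"corrected trend: {corrected_trend:.2f} per year")
```

In this toy setup the correction works well because the data were generated from a linear, additive model. The abstract's point is that real emission and meteorological influences interact, so such a regression can perform considerably worse in practice.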

Yuchen Xiao

and 3 more

Saltwater disposal (SWD) has been linked to the recent increase in earthquakes in various regions of the United States. In some cases, strong temporal and spatial associations have provided the scientific community with unequivocal evidence that wastewater injection is one of the dominant causal factors in the onset of seismicity. In addition, numerous physical models have suggested that the increase in pore pressure from wastewater injection is capable of inducing fault slip, providing further physical evidence. A growing body of literature seeks to rigorously establish causality through statistical analysis: these studies propose statistical frameworks with parametric regression models to evaluate whether the observed earthquakes occurred more often than would be expected by chance, and test the statistical significance of the observed earthquake occurrences to arrive at causal interpretations. We propose causal inference frameworks based on the potential outcomes perspective to explicitly define what we mean by a causal effect in mathematical terms and to state the assumptions necessary to ensure consistency between models for model comparison. In particular, we consider two common difficulties in raster-based spatial statistical analysis: spatial correlation, which can be described by Tobler’s first law of geography (near things are more related than distant things), and interference, a causal inference term for the situation in which treatments applied to some spatially indexed units affect the outcomes at other spatially indexed units, mostly due to complex physical processes. The study region, the Fort Worth Basin of North Central Texas, is discretized into non-overlapping grid blocks.
The first proposed workflow adopts a cross-sectional study design on the aggregated earthquake catalog and injection data, in which two statistical methods are employed to test the significance of the causal effect of the presence or absence of saltwater disposal on the number of earthquakes and to estimate the magnitude of the average causal effect. The second proposed workflow incorporates the temporal domain, which is of greater scientific interest. Finally, the analysis is repeated for different grid configurations to directly assess the sensitivity of the statistical results.
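The discretization step and its repetition over grid configurations might look roughly like the sketch below. Everything in it is hypothetical: the rectangular extent, the randomly generated earthquake and well coordinates, the three grid sizes, and the helper `counts_per_block` are placeholders, not the study's data or code. A block is treated here as "exposed" if it contains at least one disposal well, which is one possible reading of "presence or absence".

```python
import numpy as np


def counts_per_block(x_km, y_km, extent_km, n_cols, n_rows):
    """Count points falling in each block of an n_cols x n_rows grid."""
    (x_min, x_max), (y_min, y_max) = extent_km
    counts, _, _ = np.histogram2d(
        x_km,
        y_km,
        bins=[n_cols, n_rows],
        range=[[x_min, x_max], [y_min, y_max]],
    )
    return counts.astype(int)


# Hypothetical study area and randomly placed events (not real catalog data).
rng = np.random.default_rng(1)
extent_km = ((0.0, 100.0), (0.0, 80.0))
quake_x, quake_y = rng.uniform(0, 100, 300), rng.uniform(0, 80, 300)
well_x, well_y = rng.uniform(0, 100, 40), rng.uniform(0, 80, 40)

# Repeat the aggregation for several assumed grid configurations.
summaries = {}
for n_cols, n_rows in [(5, 4), (10, 8), (20, 16)]:
    quakes = counts_per_block(quake_x, quake_y, extent_km, n_cols, n_rows)
    wells = counts_per_block(well_x, well_y, extent_km, n_cols, n_rows)
    treated = wells > 0  # block contains at least one disposal well
    summaries[(n_cols, n_rows)] = {
        "n_blocks": quakes.size,
        "n_treated": int(treated.sum()),
        "total_quakes": int(quakes.sum()),
    }

for config, summary in summaries.items():
    print(config, summary)
```

The number of exposed blocks changes with block size even though the underlying points are identical, which is why a sensitivity check over grid configurations is informative.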

Yuchen Xiao

and 4 more

Saltwater disposal has been identified as the dominant causal factor contributing to induced seismicity. Physical models rely on mechanistic understanding to infer causality by evaluating various conditions for fault slip, albeit with a high degree of uncertainty due to sparse data and subsurface heterogeneity. Given these uncertainties, statistical analyses are designed to measure associations in the observed data with parametric regression models and interpret the significance of specific coefficients as evidence of causation. However, it is often difficult to compare coefficients across different statistical models because the coefficients carry different implications. We propose a causal inference framework based on the potential outcomes perspective to explicitly define what we mean by a causal effect and to state the assumptions necessary to ensure consistency between models for model comparison. The proposed workflow is applied to the Fort Worth Basin of North Central Texas, with the area of interest discretized into non-overlapping grid blocks. Two statistical methods are employed to test the significance of the causal effect of the presence or absence of saltwater disposal on the number of earthquakes and to estimate the magnitude of the average causal effect. In addition, our analysis is repeated for different grid configurations to directly assess the sensitivity of the statistical results. We have identified a stable and statistically significant causal relationship between the presence of saltwater disposal and the number of earthquakes, and estimate that, on average, 13 more earthquakes occur in grid blocks with saltwater disposal than in those without.
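The abstract does not name its two statistical methods, so the following is only a generic illustration of testing and estimating an average causal effect on gridded counts. It uses a difference in mean earthquake counts between exposed and unexposed blocks as the estimate and a permutation (randomization) test for significance. Both choices are assumptions, as are the synthetic data and the function names. The sketch also assumes blocks are independent and free of interference, which the spatial setting may well violate.

```python
import numpy as np


def difference_in_means(outcome, treated):
    """Mean outcome in treated blocks minus mean outcome in untreated blocks."""
    return outcome[treated].mean() - outcome[~treated].mean()


def permutation_p_value(outcome, treated, n_permutations=2000, seed=0):
    """Two-sided p-value from shuffling treatment labels across blocks."""
    rng = np.random.default_rng(seed)
    observed = abs(difference_in_means(outcome, treated))
    exceed = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(treated)
        if abs(difference_in_means(outcome, shuffled)) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (exceed + 1) / (n_permutations + 1)


# Hypothetical per-block data (not the study's catalog): roughly 30% of blocks
# host disposal, and hosting disposal adds about 5 earthquakes on average.
rng = np.random.default_rng(2)
n_blocks = 200
treated = rng.random(n_blocks) < 0.3
quake_counts = rng.poisson(2.0, n_blocks) + treated * rng.poisson(5.0, n_blocks)

effect = difference_in_means(quake_counts, treated)
p_value = permutation_p_value(quake_counts, treated)
print(f"estimated average effect: {effect:.2f} earthquakes per block")
print(f"permutation p-value:      {p_value:.4f}")
```

Under the potential outcomes view, the difference in means estimates the average causal effect only if exposure is as good as randomly assigned across blocks. Stating that assumption explicitly is the kind of declaration the abstract argues for.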