Materials and Methods

Data Compilation

We used a combination of keywords, “fung*” or “bacteria*”, “ratio”, and “terrestrial” or “soil”, to search for peer-reviewed papers in Google Scholar. Papers were selected using the following criteria: 1) at least one of fungal biomass, bacterial biomass, or F:B ratio was reported, and the units were clearly stated; 2) the data were extractable from tables (assessing the text) or figures (using Engauge Digitizer Version 10.7); 3) the study sites were not affected by disturbances such as fire, mining, and heavy metal contamination; and 4) the reported data cover the 0-30 cm topsoil. Geographical information of the sampling sites was recorded and used to locate the sites on the global map (Fig. 1). We also collected any available soil pH, mean annual precipitation (MAP), mean annual temperature (MAT), SOC and total nitrogen (TN) concentration, and soil texture data to validate the data extracted from global datasets.
Fungal and bacterial biomass were measured using a number of methods, such as phospholipid fatty acid (PLFA), direct microscopy (DM), colony forming units (CFU), substrate-induced respiration (SIR), and glucosamine and muramic acid (GMA). Additionally, we included some experimental data (214) measured using PLFA from a global topsoil dataset; detailed information about this dataset can be found in Bahram et al. (2018). To examine potential biases in the measurement of fungal and bacterial biomass, we compared these methods (Table 1, Table S1). To compare the fungal (FBC) and bacterial (BBC) biomass C measured using different methods, we used conversion factors reported by previous studies for PLFA (Frostegård & Bååth 1996; Klamer & Bååth 2004), SIR (Beare et al. 1990), CFU (Aon et al. 2001), DM (Birkhofer et al. 2008), and GMA (Jost et al. 2011). Across biomes, FBC, BBC, and F:B ratio generally followed similar patterns among methods. However, we found large variations in measured FBC and BBC among methods. Specifically, compared with PLFA, SIR, and GMA, fungi were more dominant over bacteria using CFU, while DM estimated higher dominance of bacteria relative to fungi, suggesting that DM may underestimate FBC while CFU may overestimate FBC. In addition, FBC and BBC measured using GMA were higher overall and differed largely from the measurements using other methods. Therefore, combining data generated from multiple methods in one analysis might be problematic, and we used only PLFA data for this analysis. We chose PLFA for two reasons: 1) PLFA was the most widely used approach (Materials and Methods), with PLFA-derived FBC and BBC measurements accounting for 73% of the whole dataset; and 2) PLFA has been evaluated and shown to be the most appropriate approach for estimating FBC and BBC simultaneously (Waring et al. 2013).
The final database included the fungal and bacterial biomass data measured using PLFA from publications spanning from the late 1960s to 2018. Collectively, 1323 data points in unvegetated ground and 11 biomes (i.e., boreal forest, temperate forest, tropical/subtropical forest, grassland, shrub, savanna, tundra, desert, natural wetlands, cropland, and pasture) across the globe were included in the database (Fig. 1). Forest, grassland, and cropland contribute approximately 39%, 22%, and 19% of the dataset, respectively, with all the other biomes together contributing 20%. A majority of the field sites are located in North America, Europe, and Asia. There are relatively few observations in South America, Africa, Russian Asia, Australia, and Antarctica. All soil samples are from the 0-30 cm soil profile. For data points without reported coordinates, we searched for the geographical coordinates based on the location of the study site, city, state, and country. The geographical information was then used to locate the sampling points on the global map and to extract long-term climate, edaphic properties, plant productivity, and soil microclimate data from global datasets.

Climate, Plant, and Soil Data

MAT and MAP at a spatial resolution of 30 arc-seconds for 1970-2000 were obtained from the WorldClim database version 2 (http://worldclim.org/version2). In addition, monthly mean soil moisture (SM) and soil temperature (ST) for 1979-2014 were obtained from the NCEP/DOE AMIP-II Reanalysis (https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis2.gaussian.html). The global vegetation distribution data were from a spatial map of 11 major biomes: boreal forest, temperate forest, tropical/subtropical forest, mixed forest, grassland, shrub, tundra, desert, natural wetlands, cropland, and pasture, which was used in our previous publication (Xu et al. 2013). We used the spatial distribution of soil properties, including soil pH, sand, silt, clay, and SOC, from the Harmonized World Soil Database (HWSD, https://daac.ornl.gov/cgi-bin/dsviewer.pl?ds_id=1247), while soil bulk density and TN data are from the IGBP-DIS dataset (IGBP, https://daac.ornl.gov/SOILS/guides/igbp-surfaces.html) because TN is not in HWSD. Since TN in IGBP-DIS is for the 100-cm profile as a whole, we scaled it to the top 0-30 cm using a factor calculated from the fraction of SOC in the top 0-30 cm in HWSD. Since SOC and soil TN exhibit large spatial heterogeneity, fine-scale variation in edaphic properties is underrepresented in global datasets. To better account for the edaphic effects on fungal and bacterial distribution, we examined the relationships of FBC, BBC, and F:B ratio with SOC, TN, and C:N ratio using data extracted directly from the literature. Due to the poor correlation between bulk density extracted from HWSD and bulk density reported in the literature, we used the same soil bulk density values for the entire top 100 cm soil profile from IGBP, assuming no difference in bulk density between the top 0-30 cm and 30-100 cm soil profiles. Root C density (Croot) data were extracted from a global dataset at 0.5 degree resolution based on observation data (Ruesch & Gibbs 2008; Song et al. 2017).
Annual net primary productivity (NPP) for 2000-2015 was obtained from the MODIS gridded dataset at a spatial resolution of 30 arc-seconds (http://files.ntsg.umt.edu/data/NTSG_Products/MOD17/GeoTIFF/MOD17A3/GeoTIFF_30arcsec/). We then compared the data extracted directly from the literature with those extracted from global datasets and found them consistent for a majority of the dataset (Fig. S1).
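The depth adjustment of TN described above can be illustrated with a minimal sketch. It assumes the scaling factor is the ratio of SOC in the top 0-30 cm to SOC in the full 0-100 cm profile; the function name, argument names, and example values are hypothetical and are not taken from the original analysis.

```python
def topsoil_tn(tn_0_100, soc_0_30, soc_0_100):
    """Scale a 0-100 cm TN value to the top 0-30 cm.

    Assumption: TN is distributed with depth in the same proportion as SOC,
    so the factor is SOC(0-30 cm) / SOC(0-100 cm).
    """
    if soc_0_100 <= 0:
        raise ValueError("soc_0_100 must be positive")
    fraction = soc_0_30 / soc_0_100  # share of profile SOC held in the top 30 cm
    return tn_0_100 * fraction


# Made-up values for illustration only (same units for both SOC inputs).
tn_top = topsoil_tn(tn_0_100=1.2, soc_0_30=6.0, soc_0_100=10.0)
print(tn_top)
```

The result carries the units of the TN input, since the SOC ratio is dimensionless.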

Model Selection and Validation

Considering the clear biogeographic patterns of FBC, BBC, and F:B ratio, we developed generalized linear models with climate (MAP and MAT), soil microclimate (ST and SM), plant (NPP and Croot), and edaphic properties (clay, sand, soil pH, bulk density, SOC, and TN) to disentangle the factors controlling fungal and bacterial distribution. These models explained over 70% of the variation in FBC, BBC, and F:B ratio, and FBC and BBC were better explained than F:B ratio (Fig. 2).
Considering the higher proportion of missing data in FBC (14.8%) and BBC (16.3%) relative to F:B ratio (1.9%), we built an empirical model for F:B ratio with 75% of the dataset. With the generalized linear model of F:B ratio, we performed a principal component analysis to select the important factors in explaining the variation in F:B ratio. Based on the variation explained by each component and the cumulative variation of the components, we used stepwise regression to select the 31 most important factors, with emphasis on climate, which together explained 33.0% of the variation in F:B ratio (Fig. S7; Table S2). The selected empirical model had the formula: log10 (F:B ratio)=0.6789-0.03402*MAT-0.000058*MAP+0.003772*ST+1.542*SM-0.00099*NPP+0.01553*Croot+0.1226*bulk density+0.05991*soil pH-0.03631*clay-0.0045*sand+0.002878*SOC-0.01607*TN+0.000177*MAT*ST-0.03955*MAT*SM-0.000015*MAP*ST-0.000335*MAP*SM+0.000005*MAT*NPP-0.001615*MAT*Croot+0.000001*MAP*NPP+0.000007*MAP*Croot+0.02201*MAT*bulk density-0.003794*MAT*soil pH+0.002188*MAT*clay+0.000137*MAT*sand-0.000061*MAT*SOC+0.00513*MAT*TN-0.000029*MAP*soil pH+0.000001*MAP*clay+0.000003*MAP*sand-0.000001*MAP*SOC-0.000043*MAP*TN.
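For readers who want to evaluate the empirical model at a single location, the formula above can be transcribed directly into a function. This is only a sketch: the function and argument names are hypothetical, and the inputs are assumed to be in the same units as the source datasets used to fit the model, which are not restated here.

```python
def predict_fb_ratio(mat, map_, st, sm, npp, croot, bd, ph, clay, sand, soc, tn):
    """Return the F:B ratio predicted by the empirical model.

    Coefficients are copied term by term from the formula in the text.
    bd is bulk density and ph is soil pH; map_ avoids shadowing Python's map().
    """
    log_fb = (
        0.6789
        # main effects (12 terms)
        - 0.03402 * mat - 0.000058 * map_
        + 0.003772 * st + 1.542 * sm
        - 0.00099 * npp + 0.01553 * croot
        + 0.1226 * bd + 0.05991 * ph
        - 0.03631 * clay - 0.0045 * sand
        + 0.002878 * soc - 0.01607 * tn
        # climate x soil microclimate interactions
        + 0.000177 * mat * st - 0.03955 * mat * sm
        - 0.000015 * map_ * st - 0.000335 * map_ * sm
        # climate x plant interactions
        + 0.000005 * mat * npp - 0.001615 * mat * croot
        + 0.000001 * map_ * npp + 0.000007 * map_ * croot
        # MAT x edaphic interactions
        + 0.02201 * mat * bd - 0.003794 * mat * ph
        + 0.002188 * mat * clay + 0.000137 * mat * sand
        - 0.000061 * mat * soc + 0.00513 * mat * tn
        # MAP x edaphic interactions (no MAP x bulk density term in the formula)
        - 0.000029 * map_ * ph + 0.000001 * map_ * clay
        + 0.000003 * map_ * sand - 0.000001 * map_ * soc
        - 0.000043 * map_ * tn
    )
    return 10 ** log_fb  # back-transform from log10 scale
```

With all predictors set to zero the function returns 10 raised to the intercept, which is a quick way to check the transcription.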
After developing the model, we validated it with the 25% of the data that were not used in model development, which showed high consistency (Fig. S8a). We then evaluated the model performance for F:B ratio by comparing simulated and observed data in each biome (Fig. S9). We found overall consistency between simulated and observed log-scaled F:B ratio, with a relatively poor fit in deserts. Given the much lower BBC and FBC in deserts, this inconsistency does not introduce a large bias into our large-scale estimation. Additionally, we found a slight overestimation of F:B ratio in croplands and pastures, indicating large uncertainties in managed systems that were caused by human activities.
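The 75%/25% development and validation scheme can be sketched as follows. The text does not state how the split was drawn or which statistic was used to judge consistency, so the random split, the fixed seed, and the use of the coefficient of determination are all assumptions made for illustration; the function names are hypothetical.

```python
import random


def split_train_validation(records, train_fraction=0.75, seed=42):
    """Randomly split records into development and validation subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def r_squared(observed, predicted):
    """Coefficient of determination of predictions against observations."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Illustration with placeholder record identifiers rather than real data.
train, validation = split_train_validation(range(100))
print(len(train), len(validation))
```

In practice the model would be fitted on `train`, and `r_squared` would be computed on the log-scaled observed and simulated F:B ratios of `validation`.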

Mapping Global Bacterial and Fungal Biomass Carbon

We compared the microbial biomass C in Xu et al. (2013) with the sum of FBC and BBC in this study and found good agreement between the two (Fig. S8b; R² = 0.91), indicating that the sum of FBC and BBC constitutes a constant proportion of microbial biomass and providing a feasible way to estimate FBC and BBC. Based on the microbial biomass C dataset in Xu et al. (2013) and the global map of F:B ratio, we generated global maps of FBC and BBC and estimated the global storage of FBC and BBC. The auxiliary data used included the global vegetation distribution (Xu et al. 2013) and the global land area database supplied by the surface data map generated by Community Land Model 4.0 (https://svn-ccsm-models.cgd.ucar.edu/clm2/trunk_tags/clm4_5_1_r085/models/lnd/clm/tools/clm4_5/mksurfdata_map/).
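One way to read the mapping step is as a partition of microbial biomass C by the F:B ratio in each grid cell. The sketch below assumes that FBC + BBC equals a fixed fraction of microbial biomass C; the value of that fraction is not given in this section, so it appears as a hypothetical parameter defaulting to 1.0, and the function name is likewise hypothetical.

```python
def partition_microbial_biomass(mbc, fb_ratio, fraction=1.0):
    """Split microbial biomass C into fungal and bacterial parts.

    Assumptions: FBC + BBC = fraction * mbc and FBC / BBC = fb_ratio.
    The default fraction of 1.0 is a placeholder, not a reported value.
    """
    if fb_ratio < 0:
        raise ValueError("fb_ratio must be non-negative")
    pool = mbc * fraction                 # C attributed to fungi plus bacteria
    bbc = pool / (1.0 + fb_ratio)         # bacterial share
    fbc = pool * fb_ratio / (1.0 + fb_ratio)  # fungal share
    return fbc, bbc


# Made-up grid-cell values for illustration only.
fbc, bbc = partition_microbial_biomass(mbc=100.0, fb_ratio=3.0)
print(fbc, bbc)
```

Applying the function to every cell of the microbial biomass C map and the F:B ratio map gives the two biomass maps, and multiplying by cell land area and summing gives storage.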

Uncertainty Analysis

To estimate the parameter-induced uncertainties in fungal and bacterial biomass distribution and storage, we used an improved Latin Hypercube Sampling (LHS) approach to estimate the variation in F:B ratio. The LHS approach randomly produces an ensemble of parameter combinations with high efficiency and has been widely used in the modeling community to estimate uncertainties in model output (Haefner 2005; Xu 2010; Xu et al. 2014). First, we assumed that all parameters follow a normal distribution. We then used LHS to randomly select an ensemble of 3000 parameter sets using “improvedLHS” in R (Table S2). Finally, we calculated the 95% confidence interval of fungal and bacterial biomass C density and storage for reporting (Table 2).
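The sampling idea can be sketched in a few lines. Note that this is plain Latin Hypercube Sampling written in Python, not the improved variant used in the analysis, and the parameter means and standard deviations below are hypothetical placeholders rather than values from Table S2.

```python
import random
from statistics import NormalDist, quantiles


def latin_hypercube_normal(means, sds, n_samples, seed=0):
    """Draw n_samples parameter sets, stratifying each parameter separately."""
    rng = random.Random(seed)
    columns = []
    for mean, sd in zip(means, sds):
        dist = NormalDist(mean, sd)  # sd must be > 0
        # One uniform draw inside each of n_samples equal-probability strata.
        probs = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Keep probabilities inside the open interval (0, 1) that inv_cdf needs.
        probs = [min(max(p, 1e-12), 1.0 - 1e-12) for p in probs]
        rng.shuffle(probs)  # decouple the stratum order between parameters
        columns.append([dist.inv_cdf(p) for p in probs])
    return [list(row) for row in zip(*columns)]


def interval_95(values):
    """Return the 2.5th and 97.5th percentiles of a list of values."""
    cuts = quantiles(values, n=40)  # 39 cut points, one every 2.5%
    return cuts[0], cuts[-1]


# Hypothetical means and standard deviations for two parameters.
ensemble = latin_hypercube_normal([0.0, 10.0], [1.0, 2.0], n_samples=3000)
low, high = interval_95([row[0] for row in ensemble])
```

In an uncertainty analysis, each row of `ensemble` would be passed through the model, and `interval_95` would be applied to the resulting outputs rather than to the parameters themselves.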

Statistical Analysis

Since FBC, BBC, and F:B ratio in our dataset did not follow a normal distribution, we log-transformed them to approximate normal distributions for subsequent statistical analysis. The mean and 95% confidence boundaries of FBC, BBC, and F:B ratio were transformed back to the original scale for reporting. To understand the variation in FBC, BBC, and F:B ratio, we used generalized linear models to investigate their relationships with long-term climate (MAP and MAT), soil microclimate (ST and SM), plant (NPP and Croot), and edaphic properties (clay, sand, soil pH, bulk density, SOC, and TN). We then used the Akaike information criterion (AIC) as the selection criterion, i.e., the smaller the AIC, the better the regression. Before fitting the generalized linear models, we tested for multicollinearity among the variables within and among the variable groups, i.e., climate, soil microclimate, edaphic properties, and plant, and did not find significant multicollinearity (VIF < 5). All statistical analyses were carried out and relevant figures were plotted with R 3.5.3 on Mac OS X. Fig. 1 and Fig. 3 were produced with NCAR Command Language (version 6.3.0) and ArcGIS (version 10.5), respectively.
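The log-transformation and back-transformation of the mean and 95% confidence boundaries can be sketched as below. The base-10 logarithm is assumed because the empirical model is expressed in log10, and the interval uses a normal approximation (z = 1.96); neither choice is stated explicitly for this step, and the function name and example values are hypothetical.

```python
import math
from statistics import mean, stdev


def back_transformed_mean_ci(values, z=1.96):
    """Mean and 95% confidence bounds computed on the log10 scale,
    then transformed back to the original scale."""
    logs = [math.log10(v) for v in values]  # all values must be positive
    center = mean(logs)
    se = stdev(logs) / math.sqrt(len(logs))  # standard error of the log mean
    return 10 ** center, 10 ** (center - z * se), 10 ** (center + z * se)


# Made-up values for illustration only.
est, lower, upper = back_transformed_mean_ci([1.0, 10.0, 100.0])
print(est, lower, upper)
```

The back-transformed mean is a geometric mean, so it is generally smaller than the arithmetic mean of the same skewed data, and the resulting interval is asymmetric on the original scale.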