Figure 3. Schematic flowchart of several processes for landcover classification. The detailed explanations of the processes in Google Earth Engine, QGIS, and Python are described in subsections 2.2, 2.3, and 2.4, respectively.
2.3 Investigation of NDXI and slope across different vegetation types
NDXI values in different vegetation types were investigated using the GIS software Quantum GIS (QGIS version 3.20.0). In this study, we attempted to classify landcover in the Tyrma region into three vegetation types: wetland, forest, and grassland (Figure 2). First, a random point cloud of 30 points was generated in each area where one of these vegetation types is dominant. The random point cloud tool samples a specified number of points in a given area, and the minimum distance between points can be specified (Figure S3). Here, we specified 30 m as the minimum distance to avoid generating points on the same grid cell of the Landsat-8 data, which have a 30 m resolution. Most importantly, the areas where the random point clouds were generated are locations where we confirmed the actual vegetation type (ground truth) at the study site. The random point clouds were generated in three ground truth areas for each vegetation type, giving a total of 270 points (3 vegetation types × 3 ground truth areas × 30 points). Then, the values of NDXI calculated using JJA-median Landsat-8 data were extracted for all 270 points. In addition to NDXI, the slope (in degrees) at all 270 points was also extracted. The slope data were created using a 30 m resolution digital elevation model (DEM) provided by the Japan Aerospace Exploration Agency (JAXA). The coordinates, the values of NDXI, and the slope for all 270 points are summarized in Table S1.
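The minimum-distance sampling described above can be illustrated with a short Python sketch. This is not the QGIS tool itself: it is a simple rejection-sampling approximation, and the function name `random_points_min_distance`, the rectangular 500 m × 500 m area, and the projected metre coordinates are all assumptions made for illustration.

```python
import math
import random

def random_points_min_distance(n_points, bounds, min_dist, seed=0, max_tries=100_000):
    """Draw n_points uniformly inside a rectangle, rejecting any candidate
    that lies closer than min_dist to an already accepted point."""
    xmin, ymin, xmax, ymax = bounds
    rng = random.Random(seed)
    points = []
    tries = 0
    while len(points) < n_points:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place all points; is the area too small?")
        candidate = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        if all(math.dist(candidate, p) >= min_dist for p in points):
            points.append(candidate)
    return points

# Hypothetical rectangular ground-truth area (metres, projected coordinates).
points = random_points_min_distance(30, (0.0, 0.0, 500.0, 500.0), 30.0)
print(len(points), "points generated")
```

A real ground-truth area would be an irregular polygon rather than a rectangle, so an additional point-in-polygon check would be needed in practice.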
2.4 Determination of classification criteria by decision tree algorithm
To classify landcover based on NDXI and slope, the classification criteria were determined by supervised machine learning. Here, we utilized decision tree analysis in Python (version 3.8.5). A decision tree is an algorithm that classifies data step by step based on generated rules and outputs a tree-like graph. Of the data for all 270 points, 30% were used as test data, and the remaining 70% were used as training data. The classification was done in three stages. The criteria obtained from the decision tree analysis were extrapolated to the whole Tyrma region, and landcover was classified into wetland (Mari), forest, and grassland. The code for the decision tree analysis in Python is available in Figure S4.
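For readers unfamiliar with the technique, the following minimal sketch shows how a decision tree derives threshold rules from two features and is then evaluated on a 70%/30% split. It is not the code of Figure S4: it is a small pure-Python tree based on Gini impurity (a library implementation such as scikit-learn's `DecisionTreeClassifier` would normally be used), the feature ranges in `RANGES` are invented and perfectly separable, and treating the "three stages" as a maximum tree depth of three is an assumption.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature index, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # candidate threshold between neighbouring values
            left = [y for r, y in zip(rows, labels) if r[f] <= thr]
            right = [y for r, y in zip(rows, labels) if r[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, thr), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively build a tree; leaves are class names, nodes are tuples."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, thr = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= thr]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > thr]
    return (f, thr,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def predict(tree, row):
    """Follow the threshold rules down to a leaf."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if row[f] <= thr else right
    return tree

# Synthetic data: class -> ((NDSI min, max), (slope min, max)). Invented values,
# not the measurements of Table S1.
RANGES = {
    "wetland": ((-0.30, -0.15), (0.5, 4.0)),
    "forest": ((-0.55, -0.40), (8.0, 20.0)),
    "grassland": ((-0.55, -0.40), (0.5, 4.0)),
}
rng = random.Random(0)
data = [((rng.uniform(*ndsi), rng.uniform(*slope)), name)
        for name, (ndsi, slope) in RANGES.items() for _ in range(90)]
rng.shuffle(data)

n_train = round(0.7 * len(data))  # 70% training data, 30% test data
train, test = data[:n_train], data[n_train:]
tree = build_tree([x for x, _ in train], [y for _, y in train])
accuracy = sum(predict(tree, x) == y for x, y in test) / len(test)
print(tree)
print(f"test accuracy: {accuracy:.2f}")
```

Each internal node of the printed tree is a tuple (feature index, threshold, left branch, right branch), which corresponds to one classification criterion of the kind that can then be applied to every grid cell of a region.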
2.5 Sampling of river waters
Water samples were collected in July 2019. The Tyrma Main River was sampled just before its confluence with the Gujik River, and the other large rivers (the Yaurin River, the Gujik River, the Gujal River, and the Sutyri River) were sampled just before their confluence with the Tyrma River (Figure 1b). In addition, water samples were collected in 19 small rivers: 8 rivers in the Gujal River system and 19 rivers in the Tyrma River system (Figure 1c). Two hundred milliliters of water was sampled using a disposable syringe (TERUMO, SS-50ESZ) and immediately filtered through 0.45 µm disposable filters made of cellulose acetate (ADVANTEC, DISMIC 25CS045AS). One hundred milliliters of the filtered water was preserved in an acid-washed propylene bottle for dFe measurement, and the other 100 mL was preserved in a propylene bottle for DOC measurement. Both samples were kept in a refrigerator until analysis. Electrical conductivity (EC) was also measured using a portable EC meter (ES-71, HORIBA) at the time of water sampling.
2.6 Chemical analyses and statistics
dFe concentration was determined by the 1,10-phenanthroline method (Russian international technical standards 52.24.358-2006: https://files.stroyinf.ru/Index2/1/4293837/4293837319.htm). Here, we describe this method briefly. First, 1 mL of 10% hydroxylammonium chloride was added to 50 mL of the sample. Second, the sample was boiled for 15–20 min until the volume was reduced to 25 mL to separate organic iron complexes into organic compounds and Fe(II). Third, after cooling, ammonium hydroxide was added until the pH reached ~4. Fourth, 3 mL of ammonium acetate buffer and 1 mL of 1,10-phenanthroline were added, and ultrapure water was added until the volume reached 50 mL. Finally, 20 min after the color development, the absorbance at a wavelength of 510 nm was measured with an ultraviolet–visible spectrophotometer (SHIMADZU UV mini-1240). In this paper, we define dFe as Fe that was determined by this process. The detection limit for dFe by the 1,10-phenanthroline method was 0.02 mg L⁻¹. DOC concentration was determined with a total organic carbon (TOC) analyzer (SHIMADZU TOC-LCSH) using the catalytic combustion oxidation method. The detection limit for TOC by the TOC analyzer was 0.1 mg L⁻¹, and standard solutions for DOC analysis were prepared using potassium hydrogen phthalate (C6H4(COOK)(COOH)) (Nacalai Tesque).
Based on the produced landcover map (subsections 2.2–2.4), the coverage of wetland (Mari) was determined for the catchment area of each of the 5 large rivers and 19 small rivers. The correlation between water chemistry (dFe, DOC, and EC) and the coverage of wetland (Mari) was assessed using linear and nonlinear regression analysis. For nonlinear regression analysis, three common functions (power, exponential, and logarithmic) were tested to create an approximation curve. Both linear and nonlinear regression calculations were performed by the least-squares method with Microsoft Excel Solver (version 2021). The approximation line or curve with the highest coefficient of determination (r²) was selected as the most suitable regression equation to represent the relationship between water chemistry and wetland coverage. Note that the coefficient of determination (r²) and Pearson’s correlation coefficient (r) for each approximation curve were calculated by fitting a linear regression model to the log-transformed data.
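The model comparison described above can be sketched in Python as follows. This is a simplified illustration, not the Excel Solver procedure: each nonlinear function is fitted by ordinary least squares on suitably log-transformed variables, and r² is taken from that transformed linear fit. The names `linear_fit`, `TRANSFORMS`, and `compare_models` are hypothetical, and the data are invented values that follow an exact power law, not measurements from this study.

```python
import math

def linear_fit(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, Pearson's r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b, sxy / math.sqrt(sxx * syy)

def identity(v):
    return v

# Each model becomes linear after transforming x and/or y:
#   power:       ln y = ln a + b ln x
#   exponential: ln y = ln a + b x
#   logarithmic: y    = a + b ln x
TRANSFORMS = {
    "linear": (identity, identity),
    "power": (math.log, math.log),
    "exponential": (identity, math.log),
    "logarithmic": (math.log, identity),
}

def compare_models(x, y):
    """Fit every model on transformed data (all values must be positive)."""
    results = {}
    for name, (fx, fy) in TRANSFORMS.items():
        a, b, r = linear_fit([fx(v) for v in x], [fy(v) for v in y])
        results[name] = {"intercept": a, "slope": b, "r": r, "r2": r * r}
    return results

# Invented example: wetland coverage (%) and a concentration following y = 0.05 * x**1.2.
coverage = [2.0, 5.0, 10.0, 20.0, 40.0]
concentration = [0.05 * c ** 1.2 for c in coverage]

results = compare_models(coverage, concentration)
best = max(results, key=lambda name: results[name]["r2"])
for name, res in results.items():
    print(f"{name:12s} r2 = {res['r2']:.4f}")
print("selected model:", best)
```

Because the logarithm is undefined at zero, catchments with 0% wetland coverage or concentrations below the detection limit would need separate handling in such a scheme.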
3 Results
3.1 Landcover classification by decision tree analysis based on NDXI and slope
Ranges of NDXI and slope on the point clouds in wetland (Mari), forest, and grassland are shown in Figure 4. Here we focus on how the NDXI and slope in the wetland differ from those in the forest and the grassland. In the wetland, NDVI was in the range of 0.50–0.75, NDSI was –0.32 to –0.14, and NDWI was –0.65 to –0.47. NDVI and NDWI in the wetland largely overlapped with those of the forest, indicating that NDVI and NDWI were not useful in distinguishing between wetland and forest. On the other hand, NDSI in the wetland was clearly higher than that in the forest and the grassland. The slope in the wetland was quite low, ranging from 0.43 to 4.31 degrees. By comparison, the forest clearly showed a higher range, but the grassland showed almost the same range as the wetland; accordingly, wetland cannot be distinguished from grassland by slope alone. From these findings, NDSI seems to be the most effective index for identifying the distribution of wetland (Mari).