2. METHODS

2.1 Study area and tag deployment

Field work was conducted off the southwestern coast of Mauritius Island in 2014, 2016 and 2018 (Fig. 1). Sperm whales (n=22) were instrumented with Wildlife Computers SPOT5, SPOT6 and SPLASH10 satellite transmitters (http://wildlifecomputers.com), modified for deployment and use on whales by Mikkel Villum Jensen (http://mikkelvillum.com). The tags were deployed with the ARTS, a modified pneumatic air gun set at a pressure of 11 bar, from about 8 to 10 m from the whale (Heide‐Jørgensen et al. 2001). This is a standard procedure in tracking projects of large whales (Andrews et al. 2019). The transmitters consisted of a stainless-steel cylinder (SPOT5: 22 mm x 110 mm; SPLASH10: 24 mm x 155 mm) that contained the electronics and one lithium AA cell. A 38 mm stop plate mounted 3 cm from the rear end of the tag stopped the tag at the surface of the skin and prevented it from penetrating deeper into the blubber-muscle layer. The rear end of the steel tube carried an antenna (160 mm long) and a saltwater switch that ensured that transmissions were made only when the rear part of the tag was out of the water. On SPLASH10 tags, a pressure transducer was positioned just below the stop plate. At the front, the tags were equipped with a stainless-steel anchor spear with a sharp, pointed triangular tip and foldable barbs (40–50 mm) to impede expulsion from the blubber-muscle layer. The total length of the SPOT5 and SPOT6 tags from the stop plate to the tip of the anchor was 170 mm, and the mass of the instrument with attachment spear was 133 g. The total length of the SPLASH10 tag was 215 mm, and the mass of the instrument with attachment spear was 250 g.
The SPLASH10 tags collected summarized dive data: dives to different depths and the time spent at those depths were binned into 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300 and >1300 m bins. Dive durations were summarized in the following bins: 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65 and >65 min. In addition, the maximum dive depth was recorded for each 24 h period.
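The binning described above can be illustrated with a minimal Python sketch. It is not the tag firmware: the function names are hypothetical, and the choice to treat each bin value as an inclusive upper edge is an assumption, since the tag's exact edge convention is not stated here.

```python
from bisect import bisect_left

# Upper edges (m) of the depth bins listed above; deeper dives go to ">1300".
DEPTH_EDGES_M = list(range(100, 1400, 100))   # 100, 200, ..., 1300
# Upper edges (min) of the duration bins; longer dives go to ">65".
DURATION_EDGES_MIN = list(range(5, 70, 5))    # 5, 10, ..., 65


def bin_label(value, edges):
    """Return the label of the bin holding `value`.

    Assumption: each edge is an inclusive upper bound, so a 100 m dive
    falls in the "100" bin and a 100.5 m dive in the "200" bin.
    """
    i = bisect_left(edges, value)
    return str(edges[i]) if i < len(edges) else ">" + str(edges[-1])


def summarize(values, edges):
    """Count values per bin, keeping empty bins in the output."""
    labels = [str(e) for e in edges] + [">" + str(edges[-1])]
    counts = {label: 0 for label in labels}
    for v in values:
        counts[bin_label(v, edges)] += 1
    return counts
```

Calling `summarize(max_depths, DEPTH_EDGES_M)` on a list of dive depths in metres returns one count per depth bin, and the same function with `DURATION_EDGES_MIN` summarizes dive durations in minutes.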
The tags were programmed to make a maximum of 250 transmissions per day between 04:00 and 16:00. The SPOT5 and SPOT6 tags were allowed to transmit every day in November through January and every other day the rest of the year. The SPLASH10 tags were allowed to collect dive data and transmit every day.
The tagging operation in Mauritius was conducted from a rigid hull inflatable boat (24 ft) with two 90 hp outboard motors, a steering panel and a maximum speed of 24 knots. The boat was equipped with a barrel to secure the tagger and provide a stable platform when approaching and tagging the whales. The satellite tags were deployed into the left or right flank of the whales, about 1–2 m ahead of the dorsal fin and within 2 m of the midline of the whale's body. The approximate length of each tagged whale was estimated by comparing the size of the whale with the length of the boats involved in the tagging. Mature males and mature females were identified based on dimorphic morphology and on the Mauritius Marine Conservation Organization (M2CO) photo ID catalogue (Sarano & Sarano 2017).
 
2.2 Location data processing
Location and dive data were obtained through the Argos Data Collection and Location System, with locations estimated using the Kalman filter, which improves location accuracy (Lopez et al. 2014). Dive data were decoded in the Wildlife Computers portal. All statistical analyses were performed in R version 4.0.0 (R Core Team 2020). We restricted the dataset to positions associated with a travel speed lower than 7 km h⁻¹ (Wahlberg 2002). Locations on land were discarded, as were individuals with fewer than 10 locations (both seasons combined). To assess seasonal patterns in relation to the monsoon periods of the Indian Ocean, seasons were classified as follows: dry season from April to November and wet season from December to March.
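The speed filter and the seasonal classification can be sketched as follows. This is a Python illustration only (the analyses were run in R). The function names are hypothetical, and measuring speed from the last retained position along a great circle is an assumption, as the exact filtering rule is not detailed above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
MAX_SPEED_KMH = 7.0          # speed threshold stated in the text
WET_MONTHS = {12, 1, 2, 3}   # December to March; all other months are dry


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dlat = p2 - p1
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(p1) * cos(p2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def speed_filter(track, max_speed=MAX_SPEED_KMH):
    """Keep positions reached at less than `max_speed` km/h.

    `track` is a time-sorted list of (hours, lat, lon) tuples.
    Assumption: speed is measured from the last retained position.
    """
    kept = []
    for t, lat, lon in track:
        if kept:
            t0, lat0, lon0 = kept[-1]
            dt = t - t0
            if dt <= 0:
                continue  # duplicate or out-of-order timestamp
            if haversine_km(lat0, lon0, lat, lon) / dt >= max_speed:
                continue  # implausibly fast displacement
        kept.append((t, lat, lon))
    return kept


def season(month):
    """Classify a calendar month (1-12) as 'wet' or 'dry'."""
    return "wet" if month in WET_MONTHS else "dry"
```

The land mask and the 10-location minimum per individual would be applied as separate steps on the filtered tracks.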
 
2.3 Kernel density estimation
To investigate the residency pattern of the sperm whales and locate their high-use areas, a kernel utilisation density approach was applied to each season separately (Worton 1989). To prevent the over- and under-smoothing commonly found in kernel density estimations, we used a visual ad hoc procedure previously applied to terrestrial animals (Berger & Gese 2007, Jacques et al. 2009) and recently tested in sea turtles (Chambault et al. 2020). The reference bandwidth parameter href was first calculated for each season. Then, href was sequentially reduced in steps of 0.1 href (0.9 href, 0.8 href, 0.7 href, …) down to 0.1 href, and the most appropriate smoothing parameter was chosen visually by comparing the kernel density to the original location data (Kie 2013). The core and global home ranges were calculated from the 50% and 90% kernel contours, respectively, for each season.
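The ad hoc bandwidth sequence can be illustrated with a short Python sketch. It assumes one common definition of the reference bandwidth for a bivariate normal kernel, href = σ n^(-1/6) with σ² the mean of the x and y variances. The software used for the analysis may define σ slightly differently, the function names are hypothetical, and coordinates are assumed to be projected (e.g. in metres).

```python
from math import exp, pi, sqrt
from statistics import variance


def reference_bandwidth(xs, ys):
    """Reference bandwidth href = sigma * n^(-1/6).

    Assumption: sigma^2 is the mean of the sample variances of x and y.
    """
    n = len(xs)
    sigma = sqrt(0.5 * (variance(xs) + variance(ys)))
    return sigma * n ** (-1 / 6)


def candidate_bandwidths(href):
    """Return (factor, bandwidth) pairs from 1.0 href down to 0.1 href."""
    return [(k / 10, k / 10 * href) for k in range(10, 0, -1)]


def kernel_density(x, y, xs, ys, h):
    """Bivariate normal kernel density estimate at point (x, y)."""
    n = len(xs)
    total = sum(
        exp(-((x - xi) ** 2 + (y - yi) ** 2) / (2 * h * h))
        for xi, yi in zip(xs, ys)
    )
    return total / (2 * pi * n * h * h)
```

Evaluating `kernel_density` on a grid for each candidate bandwidth gives the set of surfaces that would then be compared visually with the raw locations. The visual choice itself is a judgment step and is not automated here.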
 
2.4 Environmental data
Strong relationships exist between cetacean distribution and dynamic environmental variables (Mannocci et al. 2014a, b), such as sea surface temperature (SST), sea surface height (SSH), ocean currents (U and V components) and ocean current velocity. These variables were therefore tested as potential drivers of sperm whale movements and used to predict the potential distribution of sperm whales in the southwestern Indian Ocean (SWIO). In addition to surface variables, the mixed layer depth (MLD) was considered, as this variable is known to be closely related to primary productivity. However, the deep-diving behaviour of sperm whales might also be influenced by temperatures at the bottom of the water column, where they mainly forage. Consequently, bottom temperature was also considered a likely driver of sperm whale movements. Bathymetry was extracted from GEBCO at a spatial resolution of 1 km, and the slope was derived from the bathymetry and expressed in degrees as a proxy of seafloor roughness. The dynamic variables were extracted monthly from the products Global Ocean Physics Reanalysis Glorys S2V4 (PHYS 001-024) and Global Ocean Physics Reanalysis Glorys12v1 (PHY-001-030) at a resolution of 0.08° (from E.U. Copernicus Marine Service Information). All variables were then set to the same spatial resolution of 0.08 decimal degrees. Monthly grids of each predictor were then averaged for each season: December to March for the wet season, and April to November for the dry season.
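The seasonal averaging of the monthly grids can be sketched in Python as below. This is an illustration under stated assumptions rather than the processing code: grids are represented as nested lists on a common 0.08° lattice, `None` marks a missing cell (e.g. land), and the function name is hypothetical.

```python
WET_MONTHS = (12, 1, 2, 3)                  # December to March
DRY_MONTHS = (4, 5, 6, 7, 8, 9, 10, 11)     # April to November


def seasonal_mean(monthly_grids, months):
    """Average monthly grids cell by cell over the given months.

    `monthly_grids` maps a month number (1-12) to a 2-D list of values
    with identical dimensions. Missing cells (None) are skipped; a cell
    missing in every month stays None.
    """
    grids = [monthly_grids[m] for m in months]
    n_rows, n_cols = len(grids[0]), len(grids[0][0])
    out = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            vals = [g[r][c] for g in grids if g[r][c] is not None]
            row.append(sum(vals) / len(vals) if vals else None)
        out.append(row)
    return out
```

For example, `seasonal_mean(sst_by_month, WET_MONTHS)` would return the wet-season SST grid, where `sst_by_month` is a hypothetical dictionary of monthly SST grids.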
 
2.5 Species distribution modelling
To identify the environmental drivers of sperm whale movement and predict the whales' potential distribution, we built a series of species distribution models (SDMs) using multiple algorithms from the caret package in R. The aim was to relate the individual occurrences (observations provided by the tracking data) to the selected environmental predictors. We first used an environmental-background-based technique to generate pseudo-absences (Senay et al. 2013, Iturbide et al. 2015, Hattab et al. 2017, Schickele et al. 2020), relying on the assumption that true absences are more likely to be located in areas that are environmentally dissimilar from presence locations. Following the procedure described in Chambault et al. (under review), a principal component analysis (PCA) was used to generate a two-dimensional environmental background representing the ordination results of the seven environmental variables available over the study area. One PCA was performed for each season separately. Pseudo-absences were then randomly generated outside environmentally favourable areas for each season, in equal number to the filtered occurrences (i.e. tracking locations). To assess the models' sensitivity to the pseudo-absence generation procedure, 10 different sets of pseudo-absences were simulated (i.e. 10 runs for each season) for each algorithm. The eight environmental variables were then extracted at each occurrence and pseudo-absence.
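The pseudo-absence step can be illustrated with a simplified Python sketch. It assumes the PCA has already been run, so each grid cell and each presence has a pair of scores on the first two axes. The "favourable area" is approximated here by the rectangle spanned by the presence scores, which is a hypothetical simplification and not necessarily the delineation used in the cited procedure. All function names are illustrative.

```python
import random


def favourable_envelope(presence_scores):
    """Rectangle (min/max on PC1 and PC2) spanned by the presence scores.

    Assumption: this bounding box stands in for the environmentally
    favourable area; the cited procedure may delineate it differently.
    """
    pc1 = [s[0] for s in presence_scores]
    pc2 = [s[1] for s in presence_scores]
    return min(pc1), max(pc1), min(pc2), max(pc2)


def is_outside(score, envelope):
    """True if a (PC1, PC2) score lies outside the favourable rectangle."""
    lo1, hi1, lo2, hi2 = envelope
    return not (lo1 <= score[0] <= hi1 and lo2 <= score[1] <= hi2)


def sample_pseudo_absences(background_scores, presence_scores, n_runs=10, seed=0):
    """Draw `n_runs` sets of pseudo-absence cell indices.

    Each set has as many cells as there are presences and is sampled
    without replacement from background cells outside the envelope.
    """
    envelope = favourable_envelope(presence_scores)
    candidates = [
        i for i, s in enumerate(background_scores) if is_outside(s, envelope)
    ]
    n = len(presence_scores)
    if len(candidates) < n:
        raise ValueError("not enough background cells outside the envelope")
    rng = random.Random(seed)
    return [rng.sample(candidates, n) for _ in range(n_runs)]
```

Each returned set of indices identifies the grid cells at which the environmental predictors would be extracted as pseudo-absences for one run.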
In order to find the most adequate model to predict the distribution of sperm whales with the highest accuracy, we tested 14 different algorithms belonging to the following categories:
1.     Ensemble: Random Forest (RF) and Stochastic Gradient Boosting (GBM);
2.     Regression: Generalized Additive Model (GAM) and Multivariate Adaptive Regression Splines (MARS);
3.     Bayesian: Naïve-Bayes (NB) and Bayesian Generalized Linear Model (BayesGLM);
4.     Decision tree: Logistic Model Trees (LMT) and C5.0;
5.     Instance-based: K-Nearest Neighbour (KNN) and KKNN;
6.     Dimensionality reduction: Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA);
7.     Support Vector Machine (SVM): SVM with radial kernel (SVMradial) and SVM with linear kernel (SVMlinear).
The 14 algorithms were run for each simulation run using the presence of sperm whales (1: presence vs. 0: pseudo-absence) as the response variable. The 14 models included the seven predictors mentioned above. All predictors were scaled between 0 and 1, and collinearity was checked using the Variance Inflation Factor (VIF below four). The dataset of each run was first randomly split into a training dataset (80% of the data) and a validation dataset (20% of the data). Each algorithm was run on the training dataset, while model evaluation was performed on the validation dataset. Model comparison was based on a 10-fold cross-validation with three repetitions using the following performance metrics calculated for each run on the 20% validation dataset: accuracy, Kappa, sensitivity, specificity, the True Skill Statistic (TSS) and the F1 score. The best model was then tuned by testing several values of the mtry argument (the number of randomly selected predictors). The “tuned model” was then used to generate ten prediction maps of the sperm whale distribution (one for each of the ten runs) for each season separately. In parallel, the caretEnsemble package was used to generate ten predictions based on the combination of the 14 algorithms previously tested, hereafter called the “stacking method”. The ten prediction maps of each approach were finally averaged to provide a final map of the potential distribution of sperm whales during the wet and dry seasons separately. The coefficient of variation (ratio of the standard deviation to the mean) was also calculated to provide a map of uncertainty.
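The performance metrics and the uncertainty measure named above follow standard definitions, which can be written out in a short Python sketch. This is an illustration, not the caret implementation: the function names are hypothetical, the metrics assume a non-degenerate confusion matrix (no zero denominators), and the use of the sample standard deviation in the coefficient of variation is an assumption.

```python
from statistics import mean, stdev


def classification_metrics(tp, fp, tn, fn):
    """Metrics from a 2x2 confusion matrix (presence = positive class)."""
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)     # true positive rate
    specificity = tn / (tn + fp)     # true negative rate
    precision = tp / (tp + fp)
    # Agreement expected by chance, used by Cohen's Kappa.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "accuracy": accuracy,
        "kappa": (accuracy - p_e) / (1 - p_e),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "tss": sensitivity + specificity - 1,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


def coefficient_of_variation(values):
    """Standard deviation over mean for one grid cell across runs.

    Assumption: the sample standard deviation is used. Returns None
    when the mean is zero, where the ratio is undefined.
    """
    m = mean(values)
    return stdev(values) / m if m != 0 else None
```

Applying `coefficient_of_variation` to the ten predicted values of each grid cell yields the uncertainty map, and `classification_metrics` applied to the validation-set confusion matrix of each run yields the values used to rank the algorithms.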