Discussion:
The ability to predict the outcome of a specific surgical treatment is
becoming essential to providing the highest standard of surgical care.
It helps clinicians propose the best-fitting treatment and allocate
specific resources to the patients most likely to benefit from them. At
the individual level, routine use of predictive technology could be of
value in presurgical counseling: it gives accurate prognostic
information to both physicians and patients, supports medical
decision-making, tempers expectations, limits the disappointment that
arises when patients and caregivers misunderstand the likely course, and
reduces legal disputes. At the system level, timely refraining from
unnecessary testing or treatment would result in lower complication
rates and healthcare costs.
In the literature, the association between CSF leakage and preoperative
risk factors has already been investigated, without consistent results.
A recent review(16) found that only suprasellar intraventricular
extension was consistently associated with CSF leakage. Other
preoperative features considered to be associated with an increased risk
of CSF leak were lower age, higher BMI and ACTH secretion. However, the
preoperative characteristics of patients with PAs, above all their
marked heterogeneity, make traditional statistical models less reliable
than in other CNS diseases. Meta-analyses are lacking for the same
reason.
We provided a comparative analysis of the performance of classical
statistical methods versus ML, and of different ML techniques against
one another, demonstrating the feasibility of developing an internally
validated prediction model based on a supervised machine learning
architecture.
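Internal validation of this kind rests on two resampling steps listed in Figure 1: a 70%/30% split into training and hold-out sets, and 10-fold cross-validation on the training set. The following is a minimal sketch of both steps using only the Python standard library. The function names, the seed and the 100-patient cohort are hypothetical and chosen for illustration; stratifying the split by outcome is also an assumption, and this is not the code used in the study.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.30, seed=0):
    """Return (train_idx, test_idx) with class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    rng = random.Random(seed)
    order = list(range(n_samples))
    rng.shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i, val_idx in enumerate(folds):
        train_part = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_part, val_idx


# Hypothetical cohort: 100 patients, 23 with an intraoperative leak (label 1).
labels = [1] * 23 + [0] * 77
train_idx, test_idx = stratified_split(labels)

# Cross-validation folds are built over positions within the training set only,
# so the hold-out set is never seen during model tuning.
folds = list(k_fold_indices(len(train_idx), k=10))
```

Keeping the hold-out indices out of every cross-validation fold is what allows the final test to count as data the models were never trained on.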
According to our initial exploratory analysis, less invasive PAs (Knosp
grade 2 or lower) were more represented among cases suffering from
intraoperative CSF leakage (31.5% vs 14.7%) compared to more invasive
PAs (Knosp grade 3: leak 5.6% vs no-leak 18.5%). Parasellar extension,
therefore, could play a major role in planning the resection goal and in
determining intraoperative CSF leaks, since the latter were more
frequently associated with an intention of radical resection. A shorter
ICD was more represented in patients who experienced intraoperative CSF
leak (19.5±4.13 vs 21.6±3.84), as a reduced ICD might facilitate
cavernous sinus invasion, require additional tissue manipulation and
laterally narrow the surgical corridor during the endoscopic approach,
resulting in a higher chance of traction on surrounding tissue and
damage to the arachnoid.
Of interest, sellar osteodural invasiveness was related to a lower
chance of intraoperative CSF leak (59.2% vs 35.2%). A more prominent
downward expansion into the sphenoid sinus might have resulted in less
suprasellar growth, decreasing the risk of arachnoid tearing and leak.
Despite only moderate accuracy, the multivariate regression identified
non-secreting status, osteodural invasiveness, older age and ICD as risk
factors. In our experience, increasing age seems to be related to
increased odds, in contrast with previous studies. This could be due to
the predominance of macroadenomas (n=190, 82.8%) in our cohort, which
most likely occur in older patients with a higher risk of CSF leakage.
The intraoperative (22.69%) and postoperative (2.1%) CSF leakage rates
in our population are in line with previous literature, providing a
reliable reference point for the generalizability of the model(17).
Because ACTH- and GH-secreting PAs made up a minor proportion of the
population investigated, it is not surprising that the current study did
not show relevance for these variables. In fact, intraoperative CSF leak
occurrence could be higher in younger adults suffering from Cushing’s
disease, since TNS surgery plays a curative role in this disabling
condition and aggressive strategies are usually recommended. In
contrast, GH-secreting microadenomas in older adults could receive more
conservative treatment, as adjuvant medical therapies have been shown to
be suitable for disease control. Non-secreting status, which is
associated with major extrasellar invasion and firm consistency at a not
well defined frequency, might be regarded as a major risk factor
(highest odds in our study). In our traditional analysis, non-secreting
status showed increased odds of CSF leak occurrence independently of
diameter measurements, implying that additional factors might contribute
to the increased risk of leakage.
These patients might require additional attention when no macroscopic
leakage is identified during surgery: patients flagged by the predictive
tool as being at higher risk of CSF leakage might benefit from careful
exploration of the surgical corridor to identify even occult low-flow
leaks that occasionally go undetected during endoscopic resection. They
might also require preventive sellar floor repair even in the absence of
CSF leakage after scrupulous inspection, especially if they are frail
and multimorbid and would suffer most from postoperative complications.
The implementation of a feature selection algorithm (BORUTA) helped
identify the best-performing predictors in our population:
“non-secreting status”, “higher age”, “x-axis”, “y-axis”, “z-axis”,
“ICD” and “R ratio”. Compared to previous classical analyses, the Knosp
classification was outperformed by the R ratio. Moreover, in addition to
the confirmed endocrinological and demographic predictors (higher age
and non-secreting status), dimensional tumor measurements also proved
highly predictive of CSF leakage occurrence.
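The idea behind this kind of feature selection can be illustrated with a heavily simplified sketch. Boruta compares each real feature against "shadow" copies whose values have been shuffled, and keeps only features that repeatedly score higher than the best shadow. In the sketch below, absolute Pearson correlation stands in for the random forest importance that Boruta actually uses, and the final statistical test on the hit counts is omitted. The function names, feature names and data are hypothetical and are not taken from the study.

```python
import random
from statistics import mean, pstdev


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return abs(cov / (sx * sy))


def shadow_feature_hits(features, outcome, n_iter=50, seed=0):
    """Count how often each real feature beats the best shuffled (shadow) copy."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_iter):
        shadow_scores = []
        for values in features.values():
            shadow = list(values)
            rng.shuffle(shadow)  # shuffling breaks any link with the outcome
            shadow_scores.append(abs_correlation(shadow, outcome))
        threshold = max(shadow_scores)
        for name, values in features.items():
            if abs_correlation(values, outcome) > threshold:
                hits[name] += 1
    return hits


# Hypothetical data: 40 cases with an alternating binary outcome.
outcome = [0, 1] * 20
features = {
    # Tracks the outcome closely, plus a small deterministic drift.
    "informative": [y + 0.01 * i for i, y in enumerate(outcome)],
    # Balanced pattern with zero correlation to the outcome.
    "uninformative": [0, 0, 1, 1] * 10,
}
hits = shadow_feature_hits(features, outcome)
```

A feature that beats the best shadow in nearly every iteration would be retained, while one that never does would be discarded.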
Our supervised ML models successfully passed the internal validation
test; in particular, the random forest classifier showed high
discriminative capacity in predicting intraoperative CSF leak occurrence
(Table 3).
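For readers less familiar with these measures, the sketch below shows how accuracy, sensitivity, specificity and the F-1 score used for model ranking are derived from the four cells of a confusion matrix. The counts are hypothetical and do not reproduce the study's results, and the function name is chosen for illustration only. The sketch assumes every denominator is non-zero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of true leaks that were predicted
    specificity = tn / (tn + fp)   # share of non-leaks correctly ruled out
    precision = tp / (tp + fp)     # share of predicted leaks that were real
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F-1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical hold-out set of 50 patients, 10 of whom had a leak.
metrics = classification_metrics(tp=8, fp=5, tn=35, fn=2)
```

With imbalanced outcomes such as these, accuracy alone can look favorable even when many leaks are missed, which is one reason to rank models by F-1 score instead.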
To the best of our knowledge, this is the first study to compare
different ML models and their performance in predicting CSF leakage
occurrence in EE-TNS surgery. Unlike previous studies, in which models
were arbitrarily picked by the investigators without justification
(about 87.5% of all studies, according to a recent systematic
review(18)), our workflow included a preliminary parallel analysis of a
substantial number of different models, among which we selected a
subgroup for further analysis based on F-1 score before hyperparameter
tuning. Staartjes and colleagues already investigated the reliability of
a predictive ML model for intraoperative CSF leakage(19): they reported
high accuracy in a small monocentric population with a single ANN-based
model. In our analysis, RF outperformed every other tested model,
including ANN, in accordance with previous evidence where tabular data
were used as inputs[36]. RFs, in fact, can be trained on small datasets
and deal with missing data, while ANNs require larger datasets and
feature normalization. In addition, ANNs applied to small populations
generalize less well, because of overfitting and because biases in the
sample characteristics might play a major role. It is also noteworthy
that ANNs, like other deep learning technologies, work on implicit
relationships between input and output features. This so-called
“black-box” process prevents the explicit workflow of the analysis from
being extracted. In contrast, RFs can be interrogated with several
approaches to extract the most important input features for further
clinical discussion and implementation(22).
Although training on high-quality data allows ML models to capture
complex non-linear interactions and compute accurate predictions, the
interpretation of such results should remain speculative and
experimental. A limitation of our study is its monocentric design, which
could introduce hidden biases if patients treated in other institutions
and by different surgeons are tested with our tool. Therefore, external
validation is needed to investigate the generalizability of the model.
With prospective patient inclusion and external validation, we expect to
be able to generalize this ML-powered tool in the near future, with a
view to its deployment in clinical practice.
Conclusions:
We believe that machine learning can improve the current planning and
perioperative management of PAs. In this study, we provided a pipeline
for training and validating different supervised machine learning
models. Our random forest classifier (RF) predicted intraoperative CSF
leak occurrence with an accuracy of 87% in the training set and 84% in
the hold-out test set (sensitivity 87%, specificity 82%). We encourage other
institutions to join our mission and share their surgical experience to
develop a tool able to assist daily neurosurgical practice.
References:
1. Ostrom QT, Cioffi G, Gittleman H, Patil N, Waite K, Kruchko C, et al.
CBTRUS Statistical Report: Primary Brain and Other Central Nervous
System Tumors Diagnosed in the United States in 2012-2016. Neuro Oncol.
2019;
2. Tabaee A, Anand VK, Barrón Y, Hiltzik DDH, Brown SM, Kacker A, et al.
Endoscopic pituitary surgery: A systematic review and meta-analysis:
Clinical article. J Neurosurg. 2009;111(3):545–54.
3. Nishioka H, Haraoka J, Ikeda Y. Risk factors of cerebrospinal fluid
rhinorrhea following transsphenoidal surgery. Acta Neurochir (Wien).
2005;147(11):1163–6.
4. Ciric I, Ragin A, Baumgartner C, Pierce D. Complications of
transsphenoidal surgery: Results of a national survey, review of the
literature, and personal experience. Neurosurgery.
1997;40(2):225–37.
5. Senders JT, Zaki MM, Karhade AV, Chang B, Gormley WB, Broekman ML,
et al. An introduction and overview of machine learning in neurosurgical
care. Acta Neurochir (Wien). 2018;160(1):29–38.
6. Luo CB, Teng MM, Chen SS, Lirng JF, Chang FC, Guo WY, et al.
Imaging of invasiveness of pituitary adenomas. Kaohsiung J Med
Sci [Internet]. 2000 [cited 2020 May 10];16(1):26–31. Available
from: https://pubmed.ncbi.nlm.nih.gov/10741013/
7. Micko ASG, Wöhrer A, Wolfsberger S, Knosp E. Invasion of the
cavernous sinus space in pituitary adenomas: Endoscopic verification and
its correlation with an MRI-based classification. J Neurosurg. 2015;
8. Hardy J, Vezina JL. Transsphenoidal neurosurgery of intracranial
neoplasm. Adv Neurol. 1976;
9. Kursa MB, Jankowski A, Rudnicki WR. Boruta - A system for feature
selection. Fundam Informaticae. 2010;
10. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: Synthetic
minority over-sampling technique. J Artif Intell Res. 2002;
11. Olson RS, Moore JH. TPOT: A Tree-Based Pipeline Optimization Tool
for Automating Machine Learning. In 2019.
12. Brownlee J. A Gentle Introduction to k-fold Cross-Validation.
machinelearningmastery.com. 2019.
13. Ghawi R, Pfeffer J. Efficient Hyperparameter Tuning with Grid Search
for Text Categorization using kNN Approach with BM25 Similarity. Open
Comput Sci. 2019;
14. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, et al.
TensorFlow: A system for large-scale machine learning. In: Proceedings
of the 12th USENIX Symposium on Operating Systems Design and
Implementation, OSDI 2016. 2016.
15. Hall MA, Holmes G. Benchmarking Attribute Selection Techniques for
Discrete Class Data Mining. IEEE Trans Knowl Data Eng. 2003;
16. Lobatto DJ, de Vries F, Zamanipoor Najafabadi AH, Pereira AM, Peul
WC, Vliet Vlieland TPM, et al. Preoperative risk factors for
postoperative complications in endoscopic pituitary surgery: a
systematic review. Pituitary. 2018;21(1):84–97.
17. Strickland BA, Lucas J, Harris B, Kulubya E, Bakhsheshian J, Liu C,
et al. Identification and repair of intraoperative cerebrospinal fluid
leaks in endonasal transsphenoidal pituitary surgery: Surgical
experience in a series of 1002 patients. J Neurosurg. 2018;
18. Qiao N. A systematic review on machine learning in sellar region
diseases: Quality and reporting items. Endocr Connect. 2019;
19. Staartjes VE, Zattra CM, Akeret K, Maldaner N, Muscas G, Bas van
Niftrik CH, et al. Neural network–based identification of patients at
high risk for intraoperative cerebrospinal fluid leaks in endoscopic
pituitary surgery. J Neurosurg. 2019;
20. Nawar S, Mouazen AM. Comparison between random forests, artificial
neural networks and gradient boosted machines methods of on-line Vis-NIR
spectroscopy measurements of soil total nitrogen and total carbon.
Sensors (Switzerland). 2017;
21. Senders JT, Staples P, Mehrtash A, Cote DJ, Taphoorn MJB, Reardon
DA, et al. An Online Calculator for the Prediction of Survival in
Glioblastoma Patients Using Classical Statistics and Machine Learning.
Clin Neurosurg. 2020;
22. Banerjee M, Ding Y, Noone AM. Identifying representative trees from
ensembles. Stat Med. 2012;
Figures:
Figure 1: Study design and machine learning model building
pipeline. All phases of the current study are reported: 1) Patient
selection: definition of inclusion and exclusion criteria; 2) Data
extraction: medical chart review and radiological measurements; 3) Data
preprocessing: dataset construction and variable definition; 4) Data
splitting: definition of a training set (70% of overall data) and a
hold-out test set (30%) for final internal validation; 5) Feature
selection: the BORUTA feature selection algorithm discarded variables
not relevant to improving model accuracy and identified a small subset
of preoperative data as implicated in intraoperative CSF leakage
occurrence; 6) Minority class oversampling: as intraoperative CSF
leakage occurred in a small proportion (22.6%) of patients, the
performance evaluation of ML models would be negatively influenced by
such an imbalance between outcome classes (occurrence of intraoperative
CSF leakage or not). The SMOTE-NC algorithm permits comparison by
oversampling the minority class; 7) Model selection on the training set:
a customized pipeline based on TPOT was coded and the best models
according to F-1 score were picked; 8) Model optimization: 10-fold CV
was run for hyperparameter optimization; 9) Model performance report on
the hold-out test set: the five best optimized models were tested on
data not used for training (hold-out) and their performance was
reported.
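The oversampling in step 6 can be illustrated with a minimal sketch of the interpolation at the core of SMOTE: each synthetic case is placed at a random position on the segment between a minority-class case and one of its nearest minority-class neighbours. The sketch covers continuous features only, whereas SMOTE-NC also handles categorical features. The function name, parameter values and data points are hypothetical and are not taken from the study.

```python
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])  # one of the k nearest neighbours
        gap = rng.random()                  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority class: five cases with two continuous features each.
minority = [(19.0, 60.0), (20.5, 55.0), (18.0, 70.0), (21.0, 65.0), (19.5, 58.0)]
majority_count = 12

# Generate enough synthetic cases to balance the two classes.
new_points = smote_oversample(minority, n_new=majority_count - len(minority))
```

Following the step order above, oversampling of this kind would be applied to the training set only, so that the hold-out set keeps the original class proportions.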
Figure 2: Left: Feature importance plot as computed by the BORUTA
feature selection algorithm. Right: Correlation plot, provided for
comparison, showing classical statistical inference by Pearson
correlation test.
Additional materials: