
Two Nearest Means: A New Case Based Reasoning Method
Farrokh Alemi
George Mason University

Corresponding Author: [email protected]

Madhukar Reddy Vongala
George Mason University
Mr. Sri Surya Krishna Rama Taraka Naren Durbha
George Mason University
Manaf Zargoush
McMaster University DeGroote School of Business

Abstract

Objective: Case-based reasoning predicts outcomes by matching new cases to training cases, without modeling the relationship between features and outcome. This study compares the accuracy of the two nearest means (2NM), a case-based reasoning method, with that of regression, a feature-based reasoning method.

Data Sources: The accuracy of the two methods was examined in predicting mortality of 296,051 residents of Veterans Health Administration nursing homes. Data were collected from 1/1/2000 to 9/10/2012 and randomly divided into training (90%) and validation (10%) samples.

Study Design: Case-control observational study.

Data Collection/Extraction Methods: In the 2NM algorithm, the data were first transformed so that all features are monotonically related to the outcome. Second, all means that violate the monotone order were set aside to be processed as exceptions to the general algorithm. Third, to predict a new case, the means in the training set were divided into “excessive” and “partial” means, based on how they match the new case. Finally, the outcome for the new case was predicted as the average of two means: the excessive mean with the minimum outcome and the partial mean with the maximum outcome. For regression, mortality was predicted from age, gender, and 10 disabilities.

Principal Findings: In the cases set aside for validation, 2NM had a McFadden pseudo R-squared of 0.51. Linear logistic regression, trained on the same training sample and evaluated on the same validation cases, had a McFadden pseudo R-squared of 0.09. 2NM was significantly more accurate than linear logistic regression (p < 0.001).

Conclusions: 2NM, a case-based reasoning method, captured nonlinear interactions in the data.
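The abstract lists the four 2NM steps but leaves the encoding and the exception handling unspecified, so the following is only a minimal Python sketch under explicit assumptions, not the authors' implementation. Assumed: features are binary and stored as a frozenset of the names present; a "mean" is the average outcome of the training cases that share one feature combination; "excessive" means come from training combinations that contain all of the new case's features (plus possibly more), and "partial" means come from combinations whose features are all present in the new case; monotone violations are set aside with a crude pairwise rule; and an exception simply returns its own mean. All function names (`orient_features`, `recode`, `group_means`, `split_exceptions`, `predict_2nm`, `mcfadden_r2`) and the toy data are hypothetical. McFadden's pseudo R-squared is computed as 1 - LL(model) / LL(null), assuming the null model predicts the training base rate for every case.

```python
import math
from collections import defaultdict

# Hypothetical encoding: a case is (features, outcome), where features is a
# frozenset of the binary risk factors present and outcome is 0 or 1.

def orient_features(cases, feature_names):
    """Return features whose presence lowers the outcome rate (assumed
    marginal check); recoding them makes every feature raise risk."""
    flipped = set()
    for f in feature_names:
        with_f = [y for x, y in cases if f in x]
        without_f = [y for x, y in cases if f not in x]
        if with_f and without_f and (
            sum(with_f) / len(with_f) < sum(without_f) / len(without_f)
        ):
            flipped.add(f)
    return flipped

def recode(features, flipped):
    # Toggle each flipped feature, so its name now means "factor absent".
    return frozenset(features) ^ flipped

def group_means(cases):
    """Mean outcome for each distinct feature combination (assumed 'means')."""
    totals = defaultdict(lambda: [0, 0])
    for x, y in cases:
        totals[x][0] += y
        totals[x][1] += 1
    return {x: s / n for x, (s, n) in totals.items()}

def split_exceptions(means):
    """Set aside both members of any pair that breaks monotone order, i.e.
    A is a proper subset of B but mean(A) > mean(B). A crude, assumed rule."""
    bad = set()
    for a, mean_a in means.items():
        for b, mean_b in means.items():
            if a < b and mean_a > mean_b:
                bad.update((a, b))
    regular = {k: v for k, v in means.items() if k not in bad}
    exceptions = {k: means[k] for k in bad}
    return regular, exceptions

def predict_2nm(x, regular, exceptions, fallback):
    if x in exceptions:  # hypothetical handling: reuse the exception's mean
        return exceptions[x]
    # "Excessive" means: training combinations containing all of x's features.
    excessive = [m for c, m in regular.items() if x <= c]
    # "Partial" means: training combinations whose features are all in x.
    partial = [m for c, m in regular.items() if c <= x]
    if excessive and partial:
        return (min(excessive) + max(partial)) / 2
    if excessive or partial:
        return min(excessive) if excessive else max(partial)
    return fallback  # no comparable training combination at all

def mcfadden_r2(y_true, p_pred, base_rate, eps=1e-6):
    """1 - LL(model) / LL(null); the null model predicts base_rate for all."""
    def log_lik(ps):
        total = 0.0
        for y, p in zip(y_true, ps):
            p = min(max(p, eps), 1 - eps)  # avoid log(0) for 0/1 means
            total += y * math.log(p) + (1 - y) * math.log(1 - p)
        return total
    return 1 - log_lik(p_pred) / log_lik([base_rate] * len(y_true))

# Toy training data (invented for illustration); {"a"} alone is never seen.
E, B, AB = frozenset(), frozenset({"b"}), frozenset({"a", "b"})
train = [(E, 0), (E, 0), (E, 0), (E, 1), (B, 0), (B, 1),
         (AB, 1), (AB, 1), (AB, 1), (AB, 0)]
flipped = orient_features(train, ["a", "b"])
train = [(recode(x, flipped), y) for x, y in train]
regular, exceptions = split_exceptions(group_means(train))
base = sum(y for _, y in train) / len(train)
print(predict_2nm(recode({"a"}, flipped), regular, exceptions, base))  # 0.5
```

In this sketch, if the monotone order holds, the smallest excessive mean is an upper bound and the largest partial mean is a lower bound on the new case's rate, so their average interpolates between the two; a new case that exactly matches a training combination falls in both groups and gets that combination's own mean. The unseen case {"a"} is bracketed by {"a", "b"} (0.75) and the empty combination (0.25), giving 0.5.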