2.2 Problem Statement
Many techniques, methods, and models have been proposed for lung cancer classification. In [10], the authors used logistic regression as a machine learning classifier for lung cancer classification. Data from a total of 80 patients were collected, but only 50 patients' data were used in the analysis. They reported accuracy scores ranging from 71% to 78%. The following are the major problems in the existing research work.
First, the data set of 50 patients is a very small sample for a supervised model, and a small data set tends to cause overfitting. A model trained on a small data set is more likely to see patterns that do not exist, which results in high variance and a very high error on the test set. These are the common signs of overfitting.
Second, no work was done on feature scaling or model optimization to train the best model for the best accuracy. This may explain why the sensitivity and specificity reported in the existing work differ. As a result, the accuracy was low, and the reported figure may not reflect the true performance of the model. It is important to apply feature scaling and model optimization to avoid overfitting. In short, the low accuracy can be attributed to the small data set and to the fact that overfitting was not handled during model training.
Third, only one algorithm was considered for classification, although many advanced classification algorithms are available. It is important to evaluate more than one algorithm and to select a suitable, advanced algorithm that gives better results with high accuracy. A powerful tool or programming language is also needed, one that reduces computation time and requires limited human effort.
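The two evaluation concerns raised above, feature scaling and the gap between sensitivity and specificity, can be illustrated with a minimal sketch. This is not the method used in [10]. The function names (`fit_scaler`, `apply_scaler`, `sensitivity_specificity`) are hypothetical, and the feature values and labels are invented purely for illustration; they are not patient data. The sketch assumes binary labels where 1 denotes cancer and 0 denotes no cancer.

```python
from statistics import mean, pstdev


def fit_scaler(train_rows):
    """Compute per-feature (mean, std) from the TRAINING rows only."""
    columns = list(zip(*train_rows))
    # Use 1.0 for constant features to avoid division by zero.
    return [(mean(col), pstdev(col) or 1.0) for col in columns]


def apply_scaler(rows, params):
    """Standardize each feature: z = (x - mean) / std."""
    return [[(x - m) / s for x, (m, s) in zip(row, params)] for row in rows]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # share of cancer cases detected
    specificity = tn / (tn + fp)  # share of non-cancer cases cleared
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


# Illustrative values only (two hypothetical features on different scales).
train = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
test = [[2.5, 150.0]]

params = fit_scaler(train)            # statistics come from training data
train_scaled = apply_scaler(train, params)
test_scaled = apply_scaler(test, params)  # reuse them on the test data

# Illustrative labels and predictions for ten hypothetical cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec, acc = sensitivity_specificity(y_true, y_pred)
```

Fitting the scaler on the training rows and reusing the same parameters on the test rows keeps test information out of training, which matters most when the data set is small. Reporting sensitivity and specificity alongside accuracy shows whether a single accuracy figure hides an imbalance between the two error types.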