2.2 Problem Statement
Many techniques, methods, and models have been proposed for lung cancer
classification. In [10], the authors used logistic regression as a
machine learning classifier for lung cancer classification. Although
data from a total of 80 patients were available, only 50 patients' data
were used in the analysis. The authors reported accuracy scores ranging
from 71% to 78%. The major problems in this existing research work are
as follows.
First, the study used a data set of only 50 patients. This is a very
small sample for a supervised model, and a small data set makes the
model prone to overfitting. Second, no work was done on feature scaling
or model optimization to train the best model and obtain the best
accuracy. As a result, the sensitivity and specificity reported in the
existing work are not the same, the accuracy was low, and the reported
figure does not reliably reflect the model's performance. Applying
feature scaling and model optimization is therefore important to reduce
overfitting.

A model trained on a small data set is more likely to detect patterns
that do not exist in general, which results in high variance and a very
high error on a test set. These are the common signs of overfitting. The
low accuracy of the model can therefore be attributed to the small data
set and to overfitting not being handled during model training.

Third, only one algorithm was used for classification, although many
advanced classification algorithms are available. It is important to
evaluate more than one algorithm in order to find one that gives better
results with higher accuracy. Finally, a powerful tool or language is
needed that reduces processing time and requires limited human effort.
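To make two of the points above concrete, the following is a minimal Python sketch, not the method of [10] or of this work: z-score feature scaling whose statistics are learned from the training rows only (so no test information leaks into the model), and the computation of sensitivity, specificity, and accuracy from true and predicted labels. The function names, the label convention (1 for a cancer case, 0 otherwise), and the toy values are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev

# Hypothetical helper names; a sketch, not the implementation used in [10].

def fit_standardizer(rows):
    """Learn per-feature mean and standard deviation from training rows only."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant feature has zero standard deviation; use 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds

def apply_standardizer(rows, means, stds):
    """Scale each feature to zero mean and unit variance (z-scores)."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def classification_rates(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, accuracy) for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy

# Usage with hypothetical toy data: fit on the training rows, then reuse the
# same statistics to scale the test rows.
train = [[60.0, 1.0], [70.0, 3.0], [80.0, 5.0]]
means, stds = fit_standardizer(train)
scaled_test = apply_standardizer([[75.0, 2.0]], means, stds)
```

For example, on ten hypothetical cases with two positives, a classifier that always predicts 0 gives `classification_rates([1, 1, 0, 0, 0, 0, 0, 0, 0, 0], [0] * 10)` equal to `(0.0, 1.0, 0.8)`: 80% accuracy with zero sensitivity. This illustrates why accuracy alone can hide a large gap between sensitivity and specificity, especially on a small or imbalanced data set.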