Joel Bangalan added Methodology_a_Information_and_Resources__.md  about 8 years ago

Commit id: cb7f3b69cbeffb1b82e4e9f48e802d2c35ab150a

## Methodology

### (a) Information and Resources

Previous research on gene expression and cancer classification, as applied to the colon cancer and leukemia data sets, will serve as the foundation for this research. A similar data set involving lung tissues will be used, and classification algorithms will be developed based on it. Feature selection methods will be considered, with a focus on how R and its existing packages can be used in a high-dimensional setting.

### (b) Data Collection

The colon and leukemia data sets are available as described in the previous research (see the preliminary bibliography list).

The lung cancer data set is proprietary and will be provided by the First Reader.

### (c) Overview of analytical approach

The author will analyze the existing research on cancer classification from gene expression data, focusing on the methods used and the accuracy measures reported for each. The author will then apply some (or all) of these algorithms to a data set provided by the First Reader, made up of gene expression profiles of normal and cancerous lung tissues. Other algorithms (or variants of the current algorithms) will also be developed, and their accuracies will be reported in comparison to each other. Finally, these new algorithms will be applied to the leukemia and colon cancer data sets. This will enable the author to summarize the results of the various algorithms across data sets.

### (d) How the analysis relates to the topic or research question

The objective is to develop machine learning algorithms for classifying cancers in the lung cancer data set, and to compare these algorithms with the methods used on the leukemia and colon data sets. A comparative analysis of the classification results across the data sets will show which algorithm is best suited to each data set.
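
As a rough illustration of the kind of evaluation outlined in section (c), the sketch below runs feature selection and a simple classifier inside leave-one-out cross-validation in a setting with far more genes than samples. It is only a sketch under stated assumptions: the proposal plans to work in R, so Python is used here purely for illustration; the data are synthetic, not the lung, colon, or leukemia sets; and the signal-to-noise ranking and nearest-centroid classifier are stand-ins chosen for brevity, not the methods the author has committed to. All function names are hypothetical.

```python
import random
import statistics

def make_synthetic(n_per_class=10, n_genes=200, n_informative=5, shift=3.0, seed=0):
    """Synthetic stand-in for an expression matrix: rows are samples, columns are genes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                for g in range(n_informative):
                    row[g] += shift  # only the first few genes differ between classes
            X.append(row)
            y.append(label)
    return X, y

def select_genes(X, y, k):
    """Rank genes by the score |mean1 - mean0| / (sd0 + sd1) and keep the top k."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        denom = statistics.stdev(a) + statistics.stdev(b)
        scores.append((abs(statistics.mean(b) - statistics.mean(a)) / denom, g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]

def nearest_centroid_predict(X, y, genes, sample):
    """Assign the sample to the class whose mean profile on the selected genes is closest."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroid = [statistics.mean(r[g] for r in rows) for g in genes]
        dist = sum((sample[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loocv_accuracy(X, y, k):
    """Leave-one-out accuracy; genes are re-selected in each fold to avoid selection bias."""
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        genes = select_genes(X_train, y_train, k)
        if nearest_centroid_predict(X_train, y_train, genes, X[i]) == y[i]:
            correct += 1
    return correct / len(X)

X, y = make_synthetic()
accuracy = loocv_accuracy(X, y, k=5)
print(f"LOOCV accuracy with 5 selected genes: {accuracy:.2f}")
```

Selecting genes inside each fold, rather than once on the full data, matters when genes greatly outnumber samples: ranking on all samples first would let the held-out sample influence the chosen features and inflate the reported accuracy. Running the same procedure on each data set would yield the kind of cross-data-set accuracy summary that section (c) describes.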