Joel Bangalan edited Methodology_Information_and_Resources_necessary__.md  about 8 years ago

Commit id: 6b59d62915235c34ef106b790dbb5064e787c6f3


# Methodology

## Information and Resources necessary

Previous research on gene expression and cancer classification, as applied to the colon cancer and leukemia data sets, will serve as the foundation for this research. A similar data set involving lung tissues will be used, and classification algorithms will be developed based on these data. Feature selection methods will be considered, with a focus on how R and its existing packages can be used in a high-dimensional setting.

## Data Collection

The colon and leukemia data sets are available as described in the previous research (see the preliminary bibliography list). The lung cancer data set is proprietary and will be provided by a Cancer Research Organization.

## Overview of the analytical approach

1. **Previous Research Analysis**. The author will analyze the existing research on gene expression cancer classification, particularly the methods used and the accuracy measures reported for each.

The results will be summarized and the accuracy measures compared across data sets.

## How the analysis relates to the topic or research question

The objective is to develop machine learning algorithms for classifying cancers in the lung cancer data set and to compare these algorithms with the methods used on the leukemia and colon data sets. A comparative analysis of the classification results across the data sets will show which algorithm is best suited to each type of data.
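
The cross-data-set comparison described above could be sketched roughly as follows. This is only an illustration under several assumptions: it is written in plain Python rather than R, the signal-to-noise gene score and the nearest-centroid classifier are stand-ins chosen for brevity (not the methods of the cited studies or of this proposal), and the two "data sets" are synthetic matrices generated by a hypothetical `synthetic_dataset` helper, not the colon, leukemia, or lung data. Genes are re-selected inside every leave-one-out fold so that the held-out sample never influences feature selection, which matters when genes far outnumber samples.

```python
import random
from statistics import mean, pstdev


def signal_to_noise(values_a, values_b):
    """Score one gene: absolute mean difference divided by summed spread."""
    spread = pstdev(values_a) + pstdev(values_b)
    return abs(mean(values_a) - mean(values_b)) / (spread + 1e-9)


def select_genes(samples, labels, k):
    """Return indices of the k genes that best separate the two classes."""
    classes = sorted(set(labels))
    scores = []
    for g in range(len(samples[0])):
        a = [s[g] for s, y in zip(samples, labels) if y == classes[0]]
        b = [s[g] for s, y in zip(samples, labels) if y == classes[1]]
        scores.append((signal_to_noise(a, b), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]


def nearest_centroid_predict(train_x, train_y, x, genes):
    """Assign x to the class whose mean profile over `genes` is closest."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [s for s, y in zip(train_x, train_y) if y == label]
        centroid = [mean(r[g] for r in rows) for g in genes]
        dist = sum((x[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(samples, labels, k):
    """Leave-one-out accuracy, with genes re-selected inside every fold."""
    hits = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        genes = select_genes(train_x, train_y, k)
        predicted = nearest_centroid_predict(train_x, train_y, samples[i], genes)
        hits += predicted == labels[i]
    return hits / len(samples)


def synthetic_dataset(n_samples, n_genes, n_informative, shift, seed):
    """Hypothetical stand-in for an expression matrix (not real data)."""
    rng = random.Random(seed)
    samples, labels = [], []
    for i in range(n_samples):
        y = i % 2  # alternate the two class labels
        samples.append([
            rng.gauss(shift * y if g < n_informative else 0.0, 1.0)
            for g in range(n_genes)
        ])
        labels.append(y)
    return samples, labels


# Placeholder data sets; the names and parameters are illustrative only.
datasets = {
    "strong_signal": synthetic_dataset(16, 100, 5, 3.0, seed=1),
    "weak_signal": synthetic_dataset(16, 100, 5, 0.5, seed=2),
}
results = {name: loocv_accuracy(x, y, k=5) for name, (x, y) in datasets.items()}
for name, accuracy in results.items():
    print(f"{name}: leave-one-out accuracy = {accuracy:.2f}")
```

In the actual study the same loop would run once per data set and per candidate algorithm, giving a table of accuracy measures that can be compared directly.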