Joel Bangalan edited Methodology_Information_and_Resources_necessary__.md  about 8 years ago

Commit id: 43a6ec66e867eb9499e49bd19b98af8fbea60f6e

### Data Collection

The colon and leukemia data sets are available as described in previous research (see the preliminary bibliography list). The lung cancer data set is proprietary and will be provided by the First Reader, a Cancer Research Organization.

### Overview of the Analytical Approach

### Previous Research Analysis

The author will analyze existing research on cancer classification from gene expression data, focusing on the methods used and the accuracy measures reported for each.

### Application to the Lung Cancer Data

The author will then apply some (or all) of these algorithms to a data set provided by the First Reader, consisting of gene expression profiles of normal and cancerous lung tissues.

### Development of New Machine Learning Algorithms

Other algorithms (or variants of the existing algorithms) will also be developed, and their accuracies will be reported and compared with one another. Finally, these new algorithms will be applied to the lung, leukemia, and colon cancer data sets.

### Comparative Analysis

The results will be summarized, and the accuracy measures of the various algorithms will be compared across data sets.

### How the Analysis Relates to the Topic or Research Question

The objective is to develop machine learning algorithms for classifying cancers in the lung cancer data set and to determine how these algorithms compare with the methods used on the leukemia and colon data sets. A comparative analysis of the classification results across the data sets will show which algorithm is best suited to each data type.
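The accuracy measures to be collected from previous research and compared across data sets can all be derived from a confusion matrix. The sketch below is a minimal illustration, not the author's method. It assumes a binary task with the hypothetical label convention 1 = cancerous tissue and 0 = normal tissue, and it reports sensitivity and specificity alongside plain accuracy, since these are commonly reported together for cancer classifiers.

```python
from typing import Dict, Sequence


def accuracy_measures(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity for a binary classifier.

    Assumed (hypothetical) label convention: 1 = cancerous, 0 = normal.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn

    return {
        # Fraction of all samples classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Fraction of cancerous samples that were detected.
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        # Fraction of normal samples correctly left unflagged.
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }
```

For example, `accuracy_measures([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])` gives an accuracy of 0.6, a sensitivity of 2/3, and a specificity of 0.5.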
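The comparative analysis, which applies several algorithms to several data sets and compares their accuracies, can be organized as a grid of (data set, classifier) scores. The following is a hedged sketch under several assumptions that the methodology does not state. Leave-one-out cross-validation is used as the evaluation protocol because gene expression studies often have few samples. Nearest-centroid and 3-nearest-neighbor classifiers stand in for whichever algorithms are actually studied. The data sets are randomly generated placeholders, not real expression values. All function and data set names are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

# One expression profile: one value per gene (an assumed layout).
Profile = List[float]
# A classifier takes training profiles, their labels, and one query profile,
# and returns a predicted label (0 = normal, 1 = cancerous, by assumption).
Classifier = Callable[[List[Profile], List[int], Profile], int]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_centroid(train_x: List[Profile], train_y: List[int], x: Profile) -> int:
    """Assign x to the class whose mean profile is closest."""
    centroids = {}
    for label in set(train_y):
        rows = [row for row, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda label: euclidean(centroids[label], x))


def three_nearest_neighbors(train_x: List[Profile], train_y: List[int], x: Profile) -> int:
    """Majority vote among the 3 training profiles closest to x."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: euclidean(pair[0], x))
    votes = [y for _, y in ranked[:3]]
    return max(set(votes), key=votes.count)


def leave_one_out_accuracy(clf: Classifier, xs: List[Profile], ys: List[int]) -> float:
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if clf(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


def synthetic_dataset(n_per_class: int, n_genes: int, shift: float, seed: int):
    """Random stand-in data, NOT real expression values: class 1 is shifted by `shift`."""
    rng = random.Random(seed)
    xs, ys = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            xs.append([rng.gauss(label * shift, 1.0) for _ in range(n_genes)])
            ys.append(label)
    return xs, ys


def compare(datasets: Dict[str, Tuple[List[Profile], List[int]]],
            classifiers: Dict[str, Classifier]) -> Dict[Tuple[str, str], float]:
    """Leave-one-out accuracy for every (data set, classifier) pair."""
    return {
        (d_name, c_name): leave_one_out_accuracy(clf, xs, ys)
        for d_name, (xs, ys) in datasets.items()
        for c_name, clf in classifiers.items()
    }


if __name__ == "__main__":
    # Placeholder data sets; the real lung, leukemia, and colon matrices would go here.
    datasets = {
        "synthetic_well_separated": synthetic_dataset(10, 20, shift=3.0, seed=0),
        "synthetic_overlapping": synthetic_dataset(10, 20, shift=0.3, seed=1),
    }
    classifiers = {"nearest_centroid": nearest_centroid,
                   "3-NN": three_nearest_neighbors}
    for (d_name, c_name), acc in sorted(compare(datasets, classifiers).items()):
        print(f"{d_name:26s} {c_name:17s} {acc:.2f}")
```

Keeping every classifier behind the same `(train_x, train_y, x) -> label` signature means new algorithms or variants can be added to the `classifiers` dictionary and scored on every data set without changing the evaluation loop.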