2016 U.S. Election Exit Poll Results Modeling

  PUI2015 Extra Credit Project

<Xianbo Gao, gaogxb, xg656>

Abstract: PCA and Lasso regression are used to build a regression model of the 2016 U.S. Election exit poll results, in order to find which factors contribute to the result and to what extent.
Introduction:
In this project, I aim to discover the main factors that influence the percentage of people voting for Trump and Clinton at the state level in the 2016 U.S. Election exit poll results, to quantify how much each factor contributes to that percentage, and to build a model that fits the state-level voting percentages. I can then use the exit poll results to explain why Trump won the election.
Data:
County-level election results and information about residents, provided by the United States Department of Agriculture Economic Research Service
Election results and information about residents in Excel format, provided by uselectionatlas.org

The data include population figures for 2014 only. In addition, they cover only 37 states, not all of them.
There are 51 columns, each of which is a factor or variable. The column names are codes, so I rename the columns with descriptive names. I try to convert all the data into percentages: 30 factors either are percentages or can be converted into them (such as the percentage of people under 18). The 21 factors that cannot be converted into percentages (such as mean travel time to work) are normalized instead. After that, the county-level data are aggregated to the state level using an average weighted by the population of each county. The format of the data is shown below.
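The preprocessing described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names (`state`, `population`, `pct_under_18`, `mean_commute_min`), the helper `to_state_level`, and the toy values are all hypothetical, and z-score scaling is assumed for the "normalized" factors because the text does not say which normalization was used.

```python
import pandas as pd


def to_state_level(county, pct_cols, other_cols,
                   pop_col="population", state_col="state"):
    """Aggregate county rows to state rows with population-weighted means.

    All column names are hypothetical placeholders for the renamed columns.
    """
    df = county.copy()

    # Normalize the factors that cannot be expressed as percentages.
    # Assumption: z-score scaling across counties.
    for col in other_cols:
        df[col] = (df[col] - df[col].mean()) / df[col].std(ddof=0)

    value_cols = list(pct_cols) + list(other_cols)

    # Weighted average per state: sum(value * population) / sum(population).
    weighted = df[value_cols].multiply(df[pop_col], axis=0)
    weighted[state_col] = df[state_col]
    numerator = weighted.groupby(state_col)[value_cols].sum()
    denominator = df.groupby(state_col)[pop_col].sum()
    return numerator.div(denominator, axis=0)


# Toy data: two counties in state "A", one county in state "B".
county = pd.DataFrame({
    "state": ["A", "A", "B"],
    "population": [100, 300, 200],
    "pct_under_18": [20.0, 24.0, 30.0],
    "mean_commute_min": [20.0, 30.0, 25.0],
})

state = to_state_level(county, ["pct_under_18"], ["mean_commute_min"])
print(state)
```

In the toy data, state "A" gets a weighted `pct_under_18` of (20 × 100 + 24 × 300) / 400 = 23, so the larger county dominates the state-level value, which is the intended effect of weighting by population.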

Head of dataset