Extra Credit Project
U.S. Election Exit Poll Results Modeling <Xianbo Gao, gaogxb, xg656>
Abstract: This project uses PCA and Lasso regression to build a regression model of the 2016 U.S.
Election Exit Poll Results and to find which factors contribute to the result, and to what
extent.
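The abstract names PCA as one of the two tools. The report does not show how PCA was configured, so the following is only a minimal sketch under stated assumptions: the factors are standardized first, and only the first principal component is computed. It builds the covariance matrix of the standardized columns and finds its leading eigenvector by power iteration, in pure Python. The function names and any toy data are hypothetical and are not the author's code.

```python
import math
import random


def standardize(rows):
    """Center each column to mean 0 and scale it to unit (population) variance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    # A constant column has std 0; use 1.0 so it simply becomes all zeros.
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(p)]
    return [[(r[j] - means[j]) / stds[j] for j in range(p)] for r in rows]


def covariance(z):
    """Covariance matrix of already-centered data (divides by n)."""
    n, p = len(z), len(z[0])
    return [[sum(r[a] * r[b] for r in z) / n for b in range(p)] for a in range(p)]


def first_component(cov, iters=500, seed=0):
    """Power iteration: return (unit eigenvector, eigenvalue) for the largest eigenvalue."""
    rng = random.Random(seed)
    p = len(cov)
    v = [rng.random() + 0.1 for _ in range(p)]  # positive random start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all-zero covariance: no variance to explain
            return v, 0.0
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue for the unit vector v.
    eig = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    return v, eig
```

The share of total variance explained by the first component is the returned eigenvalue divided by the trace of the covariance matrix, which for standardized data equals the number of (non-constant) factors. The entries of the eigenvector are the loadings, indicating how strongly each factor participates in that component.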
In this project, I aim to discover the main factors that influence the state-level
percentages of people voting for Trump and for Clinton in the 2016 U.S. Election Exit
Poll Results, to measure how much each factor contributes to those percentages, and to
build a model that fits the state-level voting percentages. I can then use the
exit poll results to explain why Trump won the election.
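Lasso regression is the tool the abstract names for deciding which factors matter and by how much. The following is a minimal sketch, not the author's implementation. It assumes the factors (columns of X) and the target (for example, a state's Trump vote percentage) have already been centered, so no intercept is fitted. It minimizes (1/(2n)) * ||y - Xb||^2 + alpha * ||b||_1 by cyclic coordinate descent with soft-thresholding. That objective scaling follows scikit-learn's Lasso convention. The report does not state which software or which alpha it used, and the function names are hypothetical.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X, y, alpha, iters=1000, tol=1e-10):
    """Minimize (1/(2n))*||y - Xb||^2 + alpha*||b||_1 by cyclic coordinate descent.

    Assumes the columns of X and the target y are already centered (no intercept).
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = [float(v) for v in y]  # residual y - Xb, with b = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(iters):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:  # an all-zero column carries no information
                continue
            # (1/n) * x_j . (residual with factor j's own contribution added back)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep resid == y - Xb
                b[j] = new_bj
            max_change = max(max_change, abs(delta))
        if max_change < tol:  # stop once no coefficient moves noticeably
            break
    return b
```

Coefficients that the L1 penalty sets exactly to zero correspond to factors dropped from the model, and a larger alpha drops more of them. The remaining coefficients estimate each factor's contribution. Their magnitudes are only comparable across factors when the factors are on a common scale.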
County-level election results and information about the population, provided by the
United States Department of Agriculture Economic Research Service
Election results and information about the population in Excel format, provided by
uselectionatlas.org
The population data are available only for 2014.
In addition, the data cover only 37 states, not all states.
There are 51 columns, each of which is a factor (variable). The column names are codes rather than
descriptions, so I renamed the columns with descriptive names. I then tried to convert all the data into percentage format. Thirty of the factors either are already percentages or can be converted into percentages (such as the percentage of the population under age 18). The remaining 21 factors, which cannot be converted
into percentages (such as mean travel time to work), are normalized. Finally, the county-level data are aggregated to
the state level using a weighted average, with each county weighted by its population.
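The preprocessing in this paragraph can be sketched as follows. It has three steps: converting counts to percentages, normalizing the factors that cannot be expressed as percentages, and taking a population-weighted average from county to state level. This is a minimal sketch under stated assumptions. The county records are plain dicts with hypothetical keys "state" and "population" (the 2014 population, the only year available). The report does not say which normalization it used, so min-max scaling to [0, 1] is assumed. The state value of a factor is the sum over that state's counties of population times value, divided by the state's total population.

```python
from collections import defaultdict


def to_percent(count, population):
    """Express a county-level count as a percentage of the county population."""
    return 100.0 * count / population


def min_max_normalize(values):
    """Scale values to [0, 1]. ASSUMPTION: the report does not name its normalization."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant factor: map everything to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def state_weighted_average(counties, field):
    """Population-weighted average of `field` over the counties of each state.

    `counties` is a list of dicts with hypothetical keys "state", "population"
    (2014 population) and `field` (a percentage or normalized factor value).
    """
    weighted_sum = defaultdict(float)
    total_pop = defaultdict(float)
    for c in counties:
        weighted_sum[c["state"]] += c["population"] * c[field]
        total_pop[c["state"]] += c["population"]
    return {s: weighted_sum[s] / total_pop[s] for s in weighted_sum}
```

For example, a state with two counties of population 100 and 300, whose under-18 percentages are 20% and 40%, gets a state value of (100 * 20 + 300 * 40) / 400 = 35%. A plain average of the two counties would give 30%.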
The format of the data is shown below.