PUI2016 Extra Credit Project Proposal

2016 U.S. Election Exit Poll Results Modeling

  <Xianbo Gao, gaogxb, xg656>
Problem Description:
In this project, I aim to identify the main factors that influence the county-level percentage of people voting for Trump and for Clinton in the 2016 U.S. election exit poll results, to estimate how much each factor contributes to that percentage, and to build a model that fits the voting percentages.

Data:
Election results by times
County-level election results and information about the population, provided by the United States Department of Agriculture Economic Research Service
Election results and information about the population in Excel format, provided by uselectionatlas.org

Analysis:
First, I will collect information about the people who took the exit polls, including race, income and so on, and treat these attributes as potential factors. Depending on the distribution of each variable, I will apply a logarithm, square-root or power transformation to the raw data and normalize some variables to percentages. I will then run a single-factor linear regression on each potential factor and select the factors that contribute most, based on the R-squared of each regression model, or use PCA to reduce the dimensionality of the variables. Finally, I will fit a multiple linear regression model to the poll results so that I can explain which factors influence the results most. During the process, I may split the data into several subsets by population density if the factors vary widely across counties.
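
The steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the data are synthetic and randomly generated (not real election or census figures), the factor names `log_income`, `pct_college` and `unrelated` are hypothetical, and the helpers `r_squared` and `fit_multiple` are written here for the example rather than taken from any library. PCA and the split by population density are left out.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def r_squared(x, y):
    """R-squared of a single-factor least-squares line (squared Pearson r)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def fit_multiple(columns, y):
    """Ordinary least squares with an intercept, via the normal equations.

    Returns [intercept, coef_1, ..., coef_k] for the given factor columns.
    """
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [aug[i][k] for i in range(k)]


# Synthetic "counties" (ASSUMPTION: made-up numbers for illustration only).
random.seed(0)
n = 200
income = [random.lognormvariate(10.8, 0.4) for _ in range(n)]  # right-skewed
pct_college = [random.uniform(10, 60) for _ in range(n)]       # already a percentage
unrelated = [random.uniform(0, 1) for _ in range(n)]           # pure noise factor

# Step 1: transform the skewed variable with a logarithm.
log_income = [math.log(v) for v in income]

# Hypothetical vote share generated from two of the factors plus noise.
vote_share = [
    170 - 10 * li - 0.5 * pc + random.gauss(0, 2)
    for li, pc in zip(log_income, pct_college)
]

factors = {
    "log_income": log_income,
    "pct_college": pct_college,
    "unrelated": unrelated,
}

# Step 2: rank factors by the R-squared of their single-factor regressions.
ranking = sorted(
    factors, key=lambda name: r_squared(factors[name], vote_share), reverse=True
)

# Step 3: fit a multiple linear regression on the two strongest factors.
top = ranking[:2]
coefs = fit_multiple([factors[name] for name in top], vote_share)
model = dict(zip(["intercept"] + top, coefs))

for name in ranking:
    print(name, round(r_squared(factors[name], vote_share), 3))
print(model)
```

In practice a library such as statsmodels or scikit-learn would likely replace the hand-written solver; the pure-Python version is used here only to keep the example self-contained.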

References:
 
Deliverable:
A multiple linear regression model, based on several factors, that models the exit poll results. It will be presented as a table and a regression figure, together with a conclusion on which factors contribute to the exit poll results and how they contribute.