PUI2016 Extra Credit Project Proposal

In this project, I aim to identify the main
factors that influence the percentage of people voting for Trump and Clinton at
the county level in the 2016 U.S. Election exit poll results, to quantify how much each factor contributes
to that percentage, and to build a model that fits the voting percentages.

Election results by times

County-level election results and
information about people provided by the United States Department of Agriculture
Economic Research Service

Election results and
information about people, in Excel format, provided by
uselectionatlas.org

First, I will collect information about the people
who took the exit polls, including their race, income, and so on, and treat these
attributes as potential factors. Depending on the distribution of each variable,
I will transform the raw data with a logarithm, square root, or power, and
normalize some variables to percentages. Then I will run a single-factor
linear regression on each potential factor and select the factors that
contribute most, either by comparing the R-squared of each regression model or
by using PCA to reduce the dimensionality of the variables. Finally, I will fit
a multiple linear regression model to the poll results so that I can explain which
factors influence the result most. During the process, I may split the
data into several subsets based on population density if the factors vary
considerably across counties.
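
The workflow described above (transform, single-factor regressions ranked by R-squared, then a multi-factor fit) could be sketched roughly as follows. This is only an illustration under stated assumptions: the data are synthetic, the factor names (median_income, pct_college, noise_factor) and the helper functions fit_ols and rank_factors are hypothetical, and the real analysis would use the county-level data sources listed earlier.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination for predictions y_hat of y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, R^2)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, r_squared(y, A @ coef)


def rank_factors(factors, y):
    """Run one single-factor regression per factor; sort by descending R^2."""
    scores = {name: fit_ols(x, y)[1] for name, x in factors.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic stand-in for county-level data (ASSUMPTION: not real data).
rng = np.random.default_rng(0)
n = 500
median_income = rng.lognormal(mean=10.8, sigma=0.3, size=n)
pct_college = rng.uniform(5, 60, size=n)
noise_factor = rng.normal(size=n)
vote_share = (
    80
    - 0.6 * pct_college
    - 15.0 * (np.log(median_income) - 10.8)
    + rng.normal(scale=3, size=n)
)

# Step 1: transform skewed variables (log for income) before regressing.
factors = {
    "log_median_income": np.log(median_income),
    "pct_college": pct_college,
    "noise_factor": noise_factor,
}

# Step 2: single-factor regressions, ranked by R^2.
ranking = rank_factors(factors, vote_share)

# Step 3: multi-factor regression on the two highest-ranked factors.
top = [name for name, _ in ranking[:2]]
X_top = np.column_stack([factors[name] for name in top])
coef_multi, r2_multi = fit_ols(X_top, vote_share)

for name, score in ranking:
    print(f"{name}: R^2 = {score:.3f}")
print("multi-factor model on", top, f"R^2 = {r2_multi:.3f}")
```

Ranking by single-factor R-squared ignores correlations between factors, which is one reason the proposal also considers PCA as an alternative way to reduce the number of variables.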

A multiple linear regression
model, based on several factors, that models the exit poll results. This can be
presented as a table and a regression figure. A conclusion about which factors
contribute to the exit poll results and by how much.
