WORKING DRAFT authorea.com/20524
    Forecasting the Israeli 2015 elections using a smartphone application

    Introduction

    The 19th Knesset, elected on January 22nd, 2013, was officially dissolved on December 8th, 2014. The elections for the 20th Knesset, originally scheduled for November 7th, 2017, will instead be held on March 17th, 2015, more than two years ahead of schedule and just two years and two months after the previous elections.

    In the weeks after the elections were called, Ofer Moshaioff, Yoav Ram and Idan Cohen developed a smartphone application ('app') called Ha'Midgam המדגם (http://hamidgam.com). The app allows users to anonymously vote for one of the major parties participating in the upcoming elections (2015), to disclose their vote in the previous elections (2013), and to view a forecast of the 2015 election results based on the aggregated data from all users.

    The app was published for Android devices on the Google Play Store on December 29th, 2014, and for iOS devices (iPhone, iPad; developed by Elad Ben-Israel) on the Apple App Store on January 26th, 2015. It quickly gained attention on local radio shows, in digital media and in newspapers. This media attention contributed to over 7,500 application downloads by March 16th, 2015.

    Our app differs from traditional polls in several respects. In traditional polls, media outlets publish forecasts based on a group of 500-1,000 individuals chosen by a polling company at a specific point in time to form an unbiased sample of the population.

    In contrast, our app allows users to view a real-time, online forecast of the elections based on individuals who chose to disclose their vote. As a result, the sample size in our app is roughly ten times larger. However, unlike traditional polls, our app does not collect any demographic information, such as age, socioeconomic status, religion or ethnicity. Our app's sample may therefore be biased and requires statistical correction.

    Our app does collect information that is unique: it allows users to change their mind at any time; it keeps a history of user choices; it logs the precise time and, if allowed by the device, the location; and, most importantly for this manuscript, it asks users to disclose which party they voted for in the previous elections (2013). Our hypothesis was that this information could be enough to make a good forecast of the election results, that is, the distribution of seats between the participating parties.

    In this manuscript we describe how the app works, the methods we used to process the data, and the forecasts we obtained. We wanted to make this manuscript available before election day begins, and therefore in its current form it includes only a basic analysis.

    Methods

    App technical description

    The mobile client was developed for the Android and iOS smartphone operating systems (the iOS version did not include the entire feature set). The app communicates with a RESTful API server, developed using Python 2.7 and the Flask web application framework and hosted on Heroku, largely following a tutorial by Miguel Grinberg.

    The app presents the user with a grid of the parties, including some basic information and a link to each party's home page or Facebook page. The user can vote for a specific party, at which point the forecast screen appears, showing the number of seats per party. The user can change their vote at any time. The Android version implements additional features; most importantly, users are asked to disclose their vote in the 2013 elections. In addition, users can see a geographical distribution of the votes by the country's main administrative regions.

    Seats distribution forecasting

    We describe only our latest approach, with some variations. The basic problem is how to control bias in our vote sample. Although our sample has over 7,500 votes, it could be biased by several factors such as age, socioeconomic status, and campaigning by party activists.

    Bias control

    We started asking users for their 2013 election choices on February 13th, 2015. We used this information, together with the official results of the 2013 elections, to attempt to control sample bias.

    First, we take only the latest vote for each device ID, both from the 2013 and the 2015 datasets. Next, we calculate a counts matrix \(C\) with rows for 2015 parties and columns for 2013 parties: \(C_{i,j}\) is the number of individuals who voted for party \(j\) in 2013 and will vote for party \(i\) in 2015.
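
    The two steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the app's actual code: the record layout `(device_id, timestamp, party)`, the function names, and the choice to count only devices that appear in both datasets are all assumptions.

```python
def latest_votes(records):
    """Keep only the most recent vote per device ID.

    `records` is an iterable of (device_id, timestamp, party) tuples;
    this layout is assumed for illustration only.
    """
    latest = {}
    for device_id, timestamp, party in records:
        if device_id not in latest or timestamp > latest[device_id][0]:
            latest[device_id] = (timestamp, party)
    return {device_id: party for device_id, (_, party) in latest.items()}


def counts_matrix(votes_2013, votes_2015, parties_2013, parties_2015):
    """Build C, where C[i][j] counts devices reporting party j (2013) and party i (2015)."""
    latest_2013 = latest_votes(votes_2013)
    latest_2015 = latest_votes(votes_2015)
    row = {party: i for i, party in enumerate(parties_2015)}
    col = {party: j for j, party in enumerate(parties_2013)}
    C = [[0] * len(parties_2013) for _ in parties_2015]
    # Assumption: only devices present in both datasets contribute to C.
    for device_id in latest_2013.keys() & latest_2015.keys():
        C[row[latest_2015[device_id]]][col[latest_2013[device_id]]] += 1
    return C
```

    Here a device that changed its 2015 choice several times contributes a single count, taken from its most recent record.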

    Next, we use the counts matrix \(C\) to estimate the transition matrix \(M\), in which \(M_{i,j}\) is the probability that an individual who voted for party \(j\) in 2013 will vote for party \(i\) in 2015. This is done by normalizing the columns: \(M_{i,j} = \frac{C_{i,j}}{\sum_{k}{C_{k,j}}}\).
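
    As a hedged sketch, the column normalization could be written as follows. The function name is hypothetical, and leaving an empty column as zeros is an assumption made only to avoid division by zero.

```python
def transition_matrix(C):
    """Normalize each column of the counts matrix C so that it sums to 1."""
    n_rows, n_cols = len(C), len(C[0])
    column_totals = [sum(C[i][j] for i in range(n_rows)) for j in range(n_cols)]
    # Assumption: a 2013 party with no respondents gets an all-zero column.
    return [
        [C[i][j] / column_totals[j] if column_totals[j] else 0.0 for j in range(n_cols)]
        for i in range(n_rows)
    ]
```

    For example, a column with counts 3 and 1 becomes the probabilities 0.75 and 0.25.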

    We then generate the 2013 results vector \(v\) from the official results data, removing the counts of parties for which we have no information as well as invalid or discarded votes. We multiply the transition matrix by the results vector to get the forecast vector: \(f = M \cdot v\). The forecast vector \(f\) describes our prediction of the number of votes each party will get in the 2015 elections.
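
    The matrix-vector product can be illustrated with plain Python lists. This is a sketch with a hypothetical function name; the numbers in the usage note are invented for illustration and are not election data.

```python
def forecast_votes(M, v):
    """Multiply the transition matrix M (rows: 2015 parties) by the 2013 results vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]
```

    With `M = [[0.75, 0.5], [0.25, 0.5]]` and `v = [1000, 2000]`, the function returns `[1750.0, 1250.0]`. Because every column of a transition matrix sums to 1, the forecast preserves the total number of votes in `v`.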

    To get a forecast of the number of seats for each party we process the forecast vector \(f\) using the Bader-Offer method, also known as the Hagenbach-Bischoff system. In our version of the Bader-Offer method we disregarded surplus vote agreements.
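
    Without surplus vote agreements, the seat allocation can be sketched as a highest-averages (D'Hondt-style) procedure, which to our understanding yields the same allocation. The function name and the tie-breaking rule are assumptions, and the sketch applies no electoral threshold; the default of 120 is the number of seats in the Knesset.

```python
def allocate_seats(votes, seats=120):
    """Give each successive seat to the party with the largest votes / (seats_won + 1)."""
    allocation = [0] * len(votes)
    for _ in range(seats):
        quotients = [v / (s + 1) for v, s in zip(votes, allocation)]
        # Assumption: ties go to the party listed first.
        winner = max(range(len(votes)), key=quotients.__getitem__)
        allocation[winner] += 1
    return allocation
```

    For instance, `allocate_seats([100000, 80000, 30000, 20000], seats=8)` returns `[4, 3, 1, 0]`.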

    The multiplication of the transition matrix by the 2013 results vector can be viewed as giving different respondents different weights. If \(c_j = \sum_{k}{C_{k,j}}\) respondents replied that they voted for party \(j\) in 2013, then in the normalized transition matrix each such respondent has weight \(1/c_j\). When we multiply the matrix by the actual 2013 results vector \(v\), each such respondent ends up with weight \(v_j/c_j\). With this weighting scheme, the total weight of the respondents who claim to have voted for party \(j\) in 2013 equals the actual number of voters for \(j\) in 2013, so our sample now 'agrees' with the actual 2013 election results. To recap, we introduced a weighting scheme that controls for the publicly known 2013 election results.
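
    The weighting interpretation can be checked numerically with a small sketch. The function name is hypothetical, and the code assumes that every 2013 party has at least one respondent. Each respondent who reported a given 2013 party receives that party's official 2013 vote count divided by the number of such respondents, and the weights are then summed per 2015 party.

```python
def weighted_forecast(C, v):
    """Sum per-respondent weights over each 2015 party (rows of C)."""
    n_rows, n_cols = len(C), len(C[0])
    # Number of respondents who reported each 2013 party (column totals of C).
    respondents = [sum(C[i][j] for i in range(n_rows)) for j in range(n_cols)]
    # Weight of a single respondent: official 2013 votes per respondent in that column.
    weights = [v[j] / respondents[j] for j in range(n_cols)]
    return [sum(C[i][j] * weights[j] for j in range(n_cols)) for i in range(n_rows)]
```

    With `C = [[3, 1], [1, 1]]` and `v = [1000, 2000]`, the result is `[1750.0, 1250.0]`, the same vector obtained by normalizing the columns of `C` and multiplying by `v`.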

    Additional bias control

    As another layer of bias correction, we experimented with fixing the number of votes received by parties that represent four demographic groups to the number of votes they received in 2013. These groups are:

    1. The Arab sector, represented by Hadash, Balad and Raam-Taal in 2013 and by the Arab Unified List in 2015.
    2. The Ashkenazi-Orthodox sector, represented by Yahadut Ha'Tora both in 2013 and in 2015.
    3. The Sfaradi-Orthodox sector, represented by Shas and Am Shalem in 2013 and by Shas and Yachad in 2015. Because Yachad merged with Ozma La'Am for the 2015 elections, we included Ozma La'Am in the respective 2013 votes.
    4. The liberal, pro-cannabis legalisation party, Ale Yarok.

    Fixing the number of voters in the first three groups can be justified by the relatively constant number of seats their respective parties received in the previous three elections and by the sectoral nature of these parties. As for Ale Yarok, fixing its number of votes was considered necessary because supporters of this party are known to be very active online, which biases online surveys and polls. For example, Ale Yarok has 85,709 "Likes" on Facebook, compared with 27,205 for Ha'Likud, the major right-wing party.
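
    One possible way to implement this fixing step is sketched below. The text does not specify how a group's fixed total is divided when the group is represented by more than one party in 2015, nor whether the remaining parties are rescaled afterwards. The sketch therefore assumes a split proportional to the unadjusted forecast and leaves all other parties unchanged. The function name and the dictionary-based data layout are also assumptions.

```python
def fix_group_votes(forecast, results_2013, groups):
    """Rescale each group's 2015 forecast so that it sums to the group's 2013 total.

    forecast:     dict mapping 2015 party -> forecast votes
    results_2013: dict mapping 2013 party -> official votes
    groups:       list of (parties_2013, parties_2015) tuples
    """
    fixed = dict(forecast)
    for parties_2013, parties_2015 in groups:
        target = sum(results_2013[p] for p in parties_2013)
        current = sum(forecast[p] for p in parties_2015)
        for p in parties_2015:
            # Assumption: split proportionally to the forecast; evenly if the forecast is zero.
            share = forecast[p] / current if current else 1 / len(parties_2015)
            fixed[p] = target * share
    return fixed
```

    For a group with 2013 parties `A` and `B` (150 and 50 votes) and 2015 parties `X` and `Y` (forecast 300 and 100), the fixed forecast assigns 150 votes to `X` and 50 to `Y`, while parties outside the group keep their forecast.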