<Emily Hansen, ekh331, ekh331>

Abstract: The wealth of WiFi probe requests collected from public access points in Lower Manhattan was explored through time series, probability distributions, and Kolmogorov-Smirnov (KS) tests to determine the similarity of distributions of human client behavior. Weekday and weekend behavior show distinct patterns, while variation among weekdays is mixed: the times at which clients enter the network are consistent across weekdays, but the duration of clients in the network varies by day of the week.

Introduction: Recent studies have shown that WiFi networks can be used to classify user type and aid the understanding of urban mobility through opportunistic sensing networks (Campbell 2006). These networks, placed in areas of high public traffic, are highly scalable and collect vast amounts of data concerning user connectivity and counts. Such networks can be used to formulate and map urban mobility at the device level (Afanasyev 2008). The role of city-wide WiFi networks in tandem with a host of private networks, and the extent to which they may interact, is still an open question (Afanasyev 2015). With this in mind, the activity patterns of urban populations are a promising area of exploration now that large public WiFi networks are being introduced. One example is the network across Lower Manhattan, including its bustling Financial District. This network captures the pulse of Lower Manhattan and its workers 24/7. What insight may we glean from this rate of observation? This research examines the consistency of when human-controlled devices are first and last seen in this network each day, and the implications of those patterns. It explores what fraction of users can be expected to be present in the network by a given time of day, and whether client duration and first-seen times in Lower Manhattan vary among weekdays as well as between weekdays and weekends.
Data:

Description of Data
The data utilized for this project came from a Cisco Meraki WiFi probe request feed from over 50 access points in Lower Manhattan. The fields of primary interest were the client device ID and the time stamp of each probe request. Additional information included the access point ID, whether or not the probe request resulted in an authenticated connection to the network, and the signal strength of the probe request, since a single probe request from a client device may reach varying numbers of access points at varying distances from the client. The data span 6 months from April to August 2017. These data are not available publicly, but they can be obtained from Cisco Meraki. The entirety of the data available, from the time the feed was accessible, was utilized. More data exist for this region prior to April 2017, but they were not available. The data are geographically limited to the range of the 50+ access points across Lower Manhattan, so only clients using these public access points, or near them with WiFi enabled, may be recorded. They also include only "universal" client addresses; local, or anonymous, client addresses are not included in the data set.

Data Wrangling and Processing
The data arrived unfiltered (aside from the absence of local client addresses), and two substantial data management steps were carried out. First, the data included probe requests from human-controlled and robotic devices alike. The robotic, automatic, or "bot" devices are not useful for measuring human activity in this context, so they were removed by filtering on the frequency at which clients sent probe requests: clients whose requests were too frequent or too regular were considered bots and removed (Figures 1, 2).
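A minimal sketch of such a frequency-based filter is given here for illustration; it is not the filter actually used. It assumes the probe requests are available as (client ID, time stamp) pairs. The two thresholds, `MAX_PROBES_PER_HOUR` and `MIN_INTERVAL_STDEV_S`, and the function names are hypothetical, since the actual cutoffs are not stated.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import pstdev

# Hypothetical thresholds; the actual cutoffs are not given in the text.
MAX_PROBES_PER_HOUR = 60      # probing rate considered "too high"
MIN_INTERVAL_STDEV_S = 1.0    # gap spread considered "too consistent"


def find_bots(probes):
    """Return the set of client IDs whose probing looks automated.

    probes: iterable of (client_id, datetime) pairs.
    """
    times = defaultdict(list)
    for client_id, ts in probes:
        times[client_id].append(ts)

    bots = set()
    for client_id, stamps in times.items():
        stamps.sort()
        if len(stamps) < 3:
            continue  # too few probes to judge rate or regularity
        # Seconds between consecutive probes from this client.
        gaps = [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
        span_hours = max((stamps[-1] - stamps[0]).total_seconds() / 3600, 1e-9)
        rate = len(stamps) / span_hours
        if rate > MAX_PROBES_PER_HOUR or pstdev(gaps) < MIN_INTERVAL_STDEV_S:
            bots.add(client_id)
    return bots


def remove_bots(probes):
    """Drop every probe sent by a client flagged by find_bots."""
    probes = list(probes)
    bots = find_bots(probes)
    return [(c, t) for c, t in probes if c not in bots]


# Toy example with made-up time stamps: one perfectly regular client
# and one irregular client.
start = datetime(2017, 4, 3, 8, 0, 0)
toy = [("bot", start + timedelta(seconds=10 * i)) for i in range(100)]
toy += [("human", start + timedelta(minutes=m)) for m in (0, 5, 52, 190)]
flagged = find_bots(toy)
```

In the toy example only the regular client ends up in `flagged`. A real filter would need thresholds tuned against the observed probe-frequency distributions.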
Second, after bot filtering, the time stamps of each probe request were used to produce a second data set: the duration of each unique client device in the network on each day, derived from its first-seen and last-seen times that day, so that these could be studied in greater detail.
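The derivation of this second data set could look roughly like the sketch below. It assumes bot-filtered (client ID, time stamp) pairs and splits presence by calendar day, so a client active across midnight counts as two client-days. The function names are hypothetical.

```python
from datetime import datetime


def daily_presence(probes):
    """Map (client_id, date) -> (first_seen, last_seen, duration_seconds).

    probes: iterable of (client_id, datetime) pairs, bots already removed.
    A client seen only once on a given day gets a duration of 0.
    """
    seen = {}
    for client_id, ts in probes:
        key = (client_id, ts.date())
        first, last = seen.get(key, (ts, ts))
        seen[key] = (min(first, ts), max(last, ts))
    return {
        key: (first, last, (last - first).total_seconds())
        for key, (first, last) in seen.items()
    }


def fraction_first_seen_by(presence, hour):
    """Fraction of client-days whose first probe came before `hour` (0 to 24)."""
    firsts = [first for first, _, _ in presence.values()]
    if not firsts:
        return 0.0
    early = sum(1 for f in firsts if f.hour + f.minute / 60 < hour)
    return early / len(firsts)


# Toy example with made-up time stamps.
toy = [
    ("a", datetime(2017, 4, 3, 8, 0)),
    ("a", datetime(2017, 4, 3, 17, 30)),
    ("a", datetime(2017, 4, 4, 9, 0)),
    ("b", datetime(2017, 4, 3, 12, 0)),
]
presence = daily_presence(toy)
```

The first-seen times and durations produced this way are the quantities whose distributions can then be compared between days of the week.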
PUI2017 Extra Credit Project Proposal
Vehicular urban mobility and the introduction of ride-sharing
<Emily Hansen, ekh331, ekh331>

Problem Description: The proposed project aims to evaluate the relationship, if any, that exists between New York City Yellow Cab and Citi Bike trips before and after major urban mobility events. The events considered include the debut of ride-sharing services such as Uber and Lyft and of carpooling services such as UberPOOL. The goal is to evaluate, via a series of Mann-Whitney U tests, any empirical change in ridership, controlled for population, that may result from changing public perception of urban mobility or the beginnings of a re-branding process for certain modes of transportation. Where ridership numbers change significantly, origin and destination aggregates will be analyzed alongside socioeconomic data for their respective regions in order to discuss whose ridership is changing the most. The questions the research aims to address are as follows: are samples of ridership data before and after these major events representative of different populations? If so, can we identify "tipping points" in how urban mobility functions? Geographically, who is represented among users of taxis, public transportation, and ride-sharing, and how has that representation changed over time with the advent of new transportation services?

Data: The data planned for this project include taxi trip records from the New York City Taxi and Limousine Commission, Citi Bike trip data, and historical data on major changes in the transportation market. The data are suitable for this project because their records, including features such as trip duration, location, and distance traveled, can be plotted as time series once filtered to reflect the time periods of interest.

Analysis: The data can be visualized at a high level as a series of time series plots surrounding the periods of interest.
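As an illustration of the before/after comparison named in the problem description, the sketch below computes a two-sided Mann-Whitney U test from first principles using only the standard library. The function name is hypothetical, and the p-value relies on the normal approximation (tie-corrected, no continuity correction), which is only reasonable for samples that are not very small. In practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used instead.

```python
from statistics import NormalDist


def mann_whitney_u(before, after):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U statistic for `before`, p-value). Tied values receive
    midranks and the variance is tie-corrected.
    """
    n1, n2 = len(before), len(after)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in before] + [(v, 1) for v in after])

    rank_sum_before = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of 1-based ranks i+1 .. j
        t = j - i                  # size of this group of tied values
        tie_term += t**3 - t
        rank_sum_before += midrank * sum(
            1 for k in range(i, j) if pooled[k][1] == 0
        )
        i = j

    u = rank_sum_before - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values identical: no evidence of a difference
    z = (u - mean_u) / var_u**0.5
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p
```

Here `before` and `after` would be, for example, daily trip counts or trip durations from windows on either side of an event date. A small p-value suggests the two windows were not drawn from the same distribution.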
To determine whether ridership behavior (as measured by any number of trip features) changed significantly as a result of major events, a series of KS tests or Mann-Whitney U tests can be performed on the data to tell whether the populations choosing a given mode of transportation have changed.

References: Recent studies have speculated that ride-sharing services such as Uber are changing the way taxi systems work in urban areas. Socially, the introduction of Uber has led to a decrease in complaints about taxi usage, but the magnitude of this decrease has not been contextualized (Wallsten 2015). Riders have reportedly taken well to Uber over taxis: as of October 2017, New Yorkers take more Uber rides than taxi rides, and those rides extend far into the boroughs where yellow cabs are scarcely seen (Muoio 2017).

Deliverable: The intended deliverable is a statistical conclusion about the different modes of transportation, as a way to quantify with increased confidence the impact of major transportation-related events in the city. Also included will be descriptive statistics and plots to better visualize changes, or the lack thereof, in ridership features.

Extensions: If time permits, the city of Chicago, which also has bike sharing, taxis, and ride-sharing, may be studied as well, and the relative impacts of transportation events there and in New York City compared. Chicago's Data Portal has analogous transportation data available at a smaller scale.

Bibliography:
Gaskell, A. (26 January 2017). Study Explores the Impact of Uber On The Taxi Industry.
Muoio, D. (13 October 2017). New Yorkers now use Uber more often than taxis.
Wallsten, S. (01 June 2015). The Competitive Effects of the Sharing Economy: How is Uber Changing Taxis?
Abstract
The Citi Bike program has revolutionized public transportation across New York City, with hundreds, if not thousands, of crowd-shared bicycle trips occurring daily. An analysis is performed on Citi Bike records to compare the mean travel times of users subscribed to the service and users employing the service as one-time customers. A t-test for difference in means found that one-time customers rode Citi Bikes for a statistically significantly longer time, on average, than subscribed users. This supports the idea that subscribers take shorter trips for the purpose of neighborhood commuting.

Introduction
The Citi Bike program is the largest bicycle sharing service in a major United States city, spanning several boroughs and providing over 12,000 bikes at over 700 stations to New Yorkers as of October 2017. Given its increasing popularity and the population density of New York City, it is possible to analyze large numbers of trips taken across the city over time and glean insight from ridership activity. Citi Bike classifies its users as "subscribers" (with a monthly subscription) or "customers" (one-time service users). Knowing any difference between how long subscribers and customers tend to spend on their routes will help Citi Bike improve its services to cater to clusters of office spaces or tourist destinations, for example.

Data
The data utilized are from an open, downloadable index of trip data published by Citi Bike. The data cover a single month, December 2015. The data were read in as a data frame and cleaned to separate user type and trip duration from the rest of the information. Their distributions are seen in Figures 1 and 2 below. The remaining data were grouped by user type, and the average trip duration for each group was calculated.
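The grouping and comparison of means can be sketched as below, using only the standard library. Two things are assumptions rather than details from the analysis itself: the column names `usertype` and `tripduration`, and the use of Welch's unequal-variance form of the t-test, since the text does not say which variant was applied. The function names are hypothetical.

```python
from statistics import mean, variance


def durations_by_user_type(rows):
    """Group trip durations (seconds) by user type.

    rows: iterable of dicts with the assumed keys "usertype" and
    "tripduration", e.g. as produced by csv.DictReader.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row["usertype"], []).append(float(row["tripduration"]))
    return groups


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    A positive t means sample_a has the larger mean.
    """
    na, nb = len(sample_a), len(sample_b)
    # Squared standard error of each group's mean.
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va**2 / (na - 1) + vb**2 / (nb - 1))
    return t, df
```

With the trip records loaded, `welch_t(groups["Customer"], groups["Subscriber"])` would give the test statistic for the hypothesis that customers ride longer. Converting it to a p-value requires the t distribution, for which a routine such as `scipy.stats.ttest_ind` with `equal_var=False` is the usual choice.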