UPDATE: This proposal was updated to reflect the results seen in the report located here. After speaking with Prof. Fedhere, the analysis shifted from histograms to plotting geographic features onto the raster image. The tiling of the images and hillside were completed as originally proposed.

PUI2016 Extra Credit Project Proposal
Anthropogenic Impacts Found in the Bedrock Layer
<Dana Karwas, dlk253, dlk253>

Problem Description: Coastal urban ecosystems are under constant pressure from natural and man-made forces. How much is the bedrock layer affected by urban coastal ecosystems? By identifying patterns at the bedrock layer, is it possible to identify urban coastal areas through their bedrock profile? Can an algorithm to measure anthropogenic impacts on urban coastal ecosystems be established by applying visual synthesis and analysis techniques to bedrock models? What can the bedrock tell us about the current state of our coastal cities? Can a metric for human impact be established by looking at the shape and topographic details of the bedrock?

Data: The dataset that is available and suitable is the Earth2014 1 arc-minute global topography and relief models from Curtin University. The data includes a global bedrock-only layer, which is what I would like to start with. It is available as gridded data and as degree-10,800 spherical harmonics. The bedrock (BED) layer represents Earth's relief without water and ice masses. This data was found in the paper linked below, which has an accompanying data gateway.
Paper: http://ddfe.curtin.edu.au/models/Earth2014/Hirt_Rexer2015_Earth2014.pdf
Data Gateway: http://ddfe.curtin.edu.au/models/Earth2014/
Bedrock Layer: http://ddfe.curtin.edu.au/models/Earth2014/Earth2014_visualisation_Antarctica.jpg
Data Source: Western Australian Center for Geodesy, Curtin University Perth
Data Contact: email@example.com

This data is suitable for my questions because it has an isolated bedrock layer for the entire globe. The analysis will be made on three coastal urban cities in the US (New York City, Los Angeles, and New Orleans). I will have to pay close attention to land and ocean stitching and may need to find additional data to fill gaps in resolution. I will also need to pay close attention to the coordinate system transformation for my datasets. I will look for topographic anomalies by using image processing techniques on the shapefiles.
I will establish search criteria for "man-made" interventions through image processing (histogram matching/analysis), ultimately leading to machine learning (this is very ambitious, and I would be happy if I could just begin to compare a few histograms).

Other data of interest:
Earth
1- ETOPO1 (1 arc minute) http://www.ngdc.noaa.gov/mgg/global/global.html
2- SRTM30_PLUS (0.5 arc minute ~ 900 meters) and SRTM15_PLUS (0.25 arc minute ~ 450 meters) http://topex.ucsd.edu/WWW_html/srtm30_plus.html
Mars
The MOLA Mission Experiment Gridded Data Records (MEGDRs) are global topographic maps of Mars http://pds-geosciences.wustl.edu/missions/mgs/megdr.html

Analysis: Image processing techniques such as histogram matching could be used as a way to compare the datasets. Finding patterns in the histograms would be one way to begin identifying the impacts.

References
Technical References:
http://www.machinalis.com/blog/python-for-geospatial-data-processing/
https://en.wikipedia.org/wiki/Histogram_matching
http://geospatialpython.com/
https://github.com/GeospatialPython/pyshp
https://code.google.com/archive/p/pyshp/wikis/CreatePRJfiles.wiki
Theoretical References:
http://press.uchicago.edu/ucp/books/book/chicago/S/bo18295743.html
http://www.nyu.edu/classes/bkg/methods/daston.pdf

Deliverable: The expected deliverable would be an algorithm that can stitch together topographic bedrock data of ANY planetary body, such as Mars (http://astrogeology.usgs.gov/search/map/Mars/GlobalSurveyor/MOLA/Mars_MGS_MOLA_DEM_mosaic_global_463m), and search for human impact in that dataset. This algorithm can be used by agencies and students to search for "unnatural impacts". This will be interesting, I think, when the impact includes errors from the sensing device, such as those discussed for the Mars MOLA dataset, as well as impacts created by human (or other) intervention.
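The histogram matching named in the Analysis section can be illustrated with a minimal sketch. It is not the project's code: the function names and the toy elevation values are assumptions made for illustration, and a real workflow would read raster tiles rather than short lists. Each source value is mapped to the reference value that sits at the same cumulative quantile.

```python
from bisect import bisect_left
from collections import Counter


def cumulative_distribution(values):
    """Return the sorted unique levels and the cumulative fraction at each level."""
    counts = Counter(values)
    levels = sorted(counts)
    total = len(values)
    cumulative, running = [], 0
    for level in levels:
        running += counts[level]
        cumulative.append(running / total)
    return levels, cumulative


def match_histogram(source, reference):
    """Remap source values so their distribution follows the reference distribution."""
    src_levels, src_cdf = cumulative_distribution(source)
    ref_levels, ref_cdf = cumulative_distribution(reference)
    mapping = {}
    for level, quantile in zip(src_levels, src_cdf):
        # First reference level whose cumulative fraction reaches this quantile.
        idx = bisect_left(ref_cdf, quantile)
        mapping[level] = ref_levels[min(idx, len(ref_levels) - 1)]
    return [mapping[v] for v in source]


# Toy elevation samples (made-up numbers, not Earth2014 data).
tile_a = [0, 0, 1, 1, 2, 2, 3, 3]
tile_b = [10, 10, 20, 20, 30, 30, 40, 40]
matched = match_histogram(tile_a, tile_b)
print(matched)
```

After matching, the two tiles share a value distribution, so any remaining differences between their histograms come from shape rather than from offset or scale.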
_PROBLEM DESCRIPTION_ In recent years, cities across the world have turned to automated traffic enforcement cameras in order to free up human law enforcement resources for other tasks. The City of Chicago has both a red light and a speed camera program, which issue fines to owners whose vehicles are caught driving over the maximum speed limit or entering a signalized intersection after the signal has prohibited them from doing so. Since these cameras use photographic enforcement, they provide a live video feed from all over the city, available to law enforcement officials at a moment's request. I am interested in whether the installation of these traffic cameras has led to a subsequent decrease in general levels of criminal activity. The question is as follows: Does the placement of red light or speed cameras reduce the rate of reported criminal activity within the various political wards of the City of Chicago?

_DATA_ I will use the datasets below as the basis for my analysis. For all three sets, there is a recorded address, date, and latitude/longitude coordinate pair for each violation. The Crimes dataset also has a unique case number for each recorded incident (unique ID), a date-time stamp, the police district, the approximate street block where the incident occurred, and the type/severity of the offense. They are all released directly by the City of Chicago on the city's online data portal.

_DATASETS_
Crimes - 2001 to present
Red Light Camera Violations
Speed Camera Violations

_ANALYSIS_ I will use spatial autocorrelation to find the relationship between the placement of automated traffic cameras and their effect on local crime throughout the City of Chicago's wards. Since the cameras were all placed and activated by July 2014, I will compare the crime data for two periods: the 24 months preceding July 2014 and the 24 months afterwards.
Significant increases and decreases in crime should show up on a map as hotspots and coldspots respectively. The geography used would be the city's fifty wards, creating a choropleth map of the wards that reflects positive and negative changes in local crime rates. Lastly, I will attempt to use Moran's I, which will determine whether the expressed pattern is random, clustered, or dispersed.

_REFERENCES_
How Spatial Autocorrelation (Global Moran's I) works
Intro to Spatial Data Analysis in Python

_DELIVERABLE_ My deliverable will be a statistical conclusion on whether the placement of the cameras has any significant effect on the change in crime rates in Chicago police districts.
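To make the Moran's I step of the analysis concrete, here is a minimal sketch of the global statistic, I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2. The function names, the chain-shaped neighbour matrix, and the numbers are assumptions for illustration only; a real analysis would build the weights from ward boundaries and would also need a significance test (for example, by permutation).

```python
def morans_i(values, weights):
    """Global Moran's I for one value per region and an n x n spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted sum of cross-products of deviations between every pair of regions.
    cross = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    squared = sum(d * d for d in deviations)
    return (n / total_weight) * (cross / squared)


def chain_weights(n):
    """Toy weights: region i neighbours only regions i-1 and i+1 (hypothetical layout)."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


# Made-up changes in crime rate for four regions laid out in a row.
smooth_trend = [1.0, 2.0, 3.0, 4.0]       # similar neighbours: positive I (clustered)
checkerboard = [1.0, -1.0, 1.0, -1.0]     # opposite neighbours: negative I (dispersed)
w = chain_weights(4)
print(morans_i(smooth_trend, w), morans_i(checkerboard, w))
```

Values near zero suggest a spatially random pattern, positive values suggest clustering, and negative values suggest dispersion.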
We revisit the measurements , who used the technique of light echoes to observe Cassiopeia A (Cas A) from multiple lines of sight and hence determine its asphericity. We confirm and improve on this measurement by reproducing the effect of the light echoes in the spectra of several type IIb supernovae (SNe IIb) found in the literature, as well as in a new SN IIb template recently created by , and comparing these to the measured spectra of Cas A. To make this comparison precise, we use a Monte Carlo method to measure the velocities, including uncertainties, of three features in the spectra (He 5876, Hα 6564 and Ca II 8579). We then test the hypothesis that this asphericity is enough to explain the diversity seen in SNe IIb by comparing it to the range of velocities seen in this population. We conclude that the ranges of velocities due to asphericity and to diversity are of the same order, and thus asphericity could be enough to explain the diversity by itself. We identify the low signal-to-noise ratio of the Cas A spectra as the main weakness of this study and advocate for more resources.
Citi Bike has operated in NYC since 2013, and the full trip dataset has been opened to the public. This analysis examines Citi Bike trip duration by gender using the open data, plotting the trip duration histogram for each of the two genders. The null hypothesis is that the average Citi Bike trip duration for women is the same as that for men. The Mann-Whitney U test is used to test the hypothesis because the two histograms do not follow a Gaussian distribution. The significance level is set to 0.05, and the p-value of the test is 0.0, which is below the significance level. Therefore, the null hypothesis is rejected.
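The test described above can be sketched in a few lines. This is an illustrative version, not the analysis code: the function names and the sample durations are made up, ties receive average ranks, and the p-value uses the plain normal approximation without tie or continuity corrections, which is only reasonable for large samples such as the Citi Bike data.

```python
from math import sqrt
from statistics import NormalDist


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U for sample_a, two-sided p-value from the normal approximation)."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = midranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = 2 * NormalDist().cdf(-abs(z))
    return u_a, p_value


# Made-up trip durations in seconds, for illustration only.
women = [620, 710, 840, 905, 1010, 1200]
men = [480, 530, 600, 655, 700, 780]
u_stat, p = mann_whitney_u(women, men)
print(u_stat, p, "reject H0" if p < 0.05 else "fail to reject H0")
```

A p-value below the chosen significance level of 0.05 leads to rejecting the null hypothesis; a p-value printed as 0.0 means it is smaller than the displayed precision.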
Problem Description: According to the National Health Interview Survey conducted in 2015, an estimated 25 million Americans are affected by asthma (Centers for Disease Control and Prevention, 2015). Half of asthma cases are hereditary, while the other half are caused by environmental factors. A number of plants have been identified as common asthma triggers, with plane trees identified as a major source of pollen (Asthma Respiratory Foundation NZ, 2017). However, other studies have shown that only 25 per cent of people are allergic to pollen from plane trees (Sercombe et al., 2011). As this study, conducted at Sydney University, involved only 64 people, my project aims to expand existing research by analyzing the effect of London plane trees on asthma hospitalization discharge rates.

Data: Data from the Street Tree Census and the local asthma hospitalization rate will be used. The 2015 Street Tree Census includes details on every street tree in New York City, including specific information about the species, size, and health of the trees. Preliminary data processing identified the five most common tree types in the City (Figure 1), and the London plane tree was the most common of all. The data will be cleaned to include only London plane trees. Asthma hospital discharge data for counties in New York, by zip code, is available from the New York State Department of Health. The data records discharge rates and absolute numbers of discharges at the zip code level. This data is useful because the distribution and count of plane trees across the city can be compared with asthma hospitalization rates at the zip code level. The data is, however, limited by the lack of timestamps in the asthma dataset.
While I know that flowering for plane trees occurs in spring during April and May, and pollination occurs in October, lasting until early winter, an analysis of the effect of plane trees on asthma discharge rates specifically during the pollination period cannot be conducted. This implies that the relationship identified may appear less significant because of the influence of other factors during other times of the year.
Pooneh Famili pf910 Github: poonehfamili

Abstract: This research project seeks to find the impact of socioeconomic factors (age, income), city facilities (proximity to a subway station), and the safety score of the streets on the taxi trip rate at the census tract level. I used a multivariate regression technique to analyze the data. The results indicate that the most important factor affecting the popularity of taxis in an area is the median income of the neighborhood. In addition, distance to subway and age each have a significant negative correlation with the number of taxi pickups.

Key words: Taxi pickup numbers, socioeconomic factors, Multivariate regression, NYC

Introduction: The question that this study seeks to answer is: “How much impact can socioeconomic indices and city facilities have on the popularity of taxis at the neighborhood level in NYC?” This question is important because the answer could help taxi agencies and transportation organizations plan more effectively, could show taxi drivers in which neighborhoods the chance of getting more trips is higher, and could give a good view of the difference between taxi pickups and those of their main competitor today, Uber. To do so, I first picked four indices to analyze: median age, income of the neighborhood, distance from a subway station, and safety score of the street. Then, after wrangling and cleaning the data, I ran a multivariate regression to find the correlation between each of the above factors and the number of pickups.

Data: In the data collection phase, I first got the data for all taxi trips for one month in summer (June) and one month in winter (January) in 2012 and 2014 from https://github.com/toddwschneider/nyc-taxi-data/blob/master/raw_data_urls.txt.
Since the data for Uber is only available from April 2014, I used the Uber data for June 2014, so that it is comparable with the taxi trips, from https://github.com/fivethirtyeight/uber-tlc-foil-response/tree/master/uber-trip-data.

Regarding socioeconomic metrics, I used median age and income data from http://nyu.policymap.com/, which are available at the census tract level for 2010. For accessibility, I got the data on subway locations all over NYC from https://data.ny.gov/Transportation/NYC-Transit-Subway-Entrance-And-Exit-Data/i9wp-a4ja/data. Finally, for safety, I used the safety score of NYC streets (which is available in the link below).

The safety score data include points that are not necessarily at intersections. To get the safety of the taxi pickup points, I calculated the distance between each pickup point and all city safety score points, and then for each pickup point I selected the closest one. I used the same strategy to find the proximity of each point to subway stations. For age and median income, some of the census tracts have more than one value. Since I could not find any attached documentation explaining the reason, I took the average of the values and used it as the tract's age or income.

Each taxi data set includes more than 10 million trips (a mean of 15 million for 2012 and 13,800,000 for 2014), and the Uber data set includes 700,000 reported trips. To make the computation feasible, I randomly picked 20,000 trips from each of them.

As mentioned above, the subway and safety score data use points (lat, lon) as their coordinates, and I merged them on the common points into the data frame that I was working with as the main data set.
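The nearest-point strategy described above, where each pickup is compared against every safety score point (or subway entrance) and the closest one is kept, can be sketched as follows. This is an assumption-laden illustration rather than the notebook's code: the function names, the great-circle (haversine) distance, and the coordinates and scores are all hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_reference(point, references):
    """Return the (lat, lon, value) reference tuple closest to the given (lat, lon) point."""
    lat, lon = point
    return min(references, key=lambda ref: haversine_km(lat, lon, ref[0], ref[1]))


# Hypothetical safety-score points: (lat, lon, score).
safety_points = [
    (40.7590, -73.9850, 7.0),
    (40.7000, -74.0000, 3.0),
]
pickup = (40.7580, -73.9855)  # hypothetical taxi pickup location

lat, lon, score = nearest_reference(pickup, safety_points)
distance = haversine_km(pickup[0], pickup[1], lat, lon)
print(score, round(distance, 3))
```

This brute-force search compares every pickup with every reference point, which is one reason a sample of 20,000 trips keeps the computation manageable; a spatial index would be the usual alternative for larger samples.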
Since I wanted to do my analysis at the census tract level, I found the intersection of all points with the census tracts, grouped my data by tract, and used the mean of the point attributes in each census tract. Then I merged the age and income data, which were available at the census tract level, with my data set. Since I wanted to run a multivariate regression on my data sets, I normalized all four of my features. Some records (fewer than 10 across all of the datasets) are null, and I dropped them. I did all the above steps in a separate notebook for each taxi and Uber data set and saved the resulting data frame in csv format for use in the “analysis notebook”. All the notebooks are available at the link below. In the comparison phase, I also merged the related datasets.

To get a good understanding of my data, I calculated the number of pickups in each census tract, added it to my data, and mapped the frequency of pickups for each census tract (Figures 1 and 2). The maps show that all my data are from Manhattan. Therefore, this research applies only to Manhattan.

Methodology: This research used a multivariate regression technique to find the correlation between each of the independent variables and the frequency of trips at the census tract level. Since these variables are independent of each other, multivariate regression works for our purpose. This method has been used in several studies that evaluate the correlation of an independent variable with the dependent one. In the ADS class, we had a real-world example that evaluated the impact of different factors, such as the income of the residents and the size of the units, on the price of buildings, and a multivariate regression technique was used. This method cannot eliminate multicollinearity between the independent variables. PCA does not have this problem, and if more datasets were available it would be the better option.
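The normalize-then-regress procedure from the Methodology can be sketched with ordinary least squares. This is a minimal illustration under stated assumptions, not the analysis notebook: the helper names and the tract-level numbers are made up, features are z-scored, and the coefficients come from solving the normal equations directly. It reports no p-values; a statistics package would normally be used for those.

```python
from statistics import mean, pstdev


def zscore(column):
    """Normalize a feature column to zero mean and unit (population) standard deviation."""
    m, s = mean(column), pstdev(column)
    return [(v - m) / s for v in column]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def fit_ols(feature_columns, target):
    """Fit target ~ intercept + z-scored features; returns [intercept, b1, b2, ...]."""
    columns = [[1.0] * len(target)] + [zscore(c) for c in feature_columns]
    k = len(columns)
    xtx = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(a * y for a, y in zip(columns[i], target)) for i in range(k)]
    return solve(xtx, xty)


# Made-up tract-level values, for illustration only.
income = [1.0, 2.0, 3.0, 4.0, 5.0]
subway_distance = [5.0, 3.0, 4.0, 1.0, 2.0]
pickups = [10 + 3 * a - 2 * b for a, b in zip(zscore(income), zscore(subway_distance))]

coefficients = fit_ols([income, subway_distance], pickups)
print([round(c, 6) for c in coefficients])
```

Because the features are on a common scale after normalization, the magnitudes of the fitted coefficients can be compared with each other, which is how the most influential factor is read off in the findings.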
Conclusions:

Findings: The multivariate regression on the taxi January 2014 data indicates that income, age, and distance to subway have significant regression coefficients with respect to the number of taxi pickups (p-values: 0, 0.02, 0.008, all smaller than 0.05). Their coefficients are (221, -72, -51), with R^2 = 0.241 (Figures 3, 4, 5, 6, 7). The other four data sets show the same trend, except taxi June 2012 and Uber June 2014, whose distance to subway has a p-value greater than 0.5.

Interpretation: My findings show that, regardless of the time differences between the datasets (2012 or 2014, winter or summer), income is the most important factor and has a positive coefficient with taxi pickups in Manhattan. Age and distance from a subway station have a negative impact in all of my analyses. This makes sense since the farther you are from the subway, the more likely you are to take a taxi, and the older you are, the more money you have to take a taxi and the less energy you have to walk.

Based on the maps (Figures 1 and 2), the other interesting result is that the most popular spot for taxis is in Midtown around Times Sq, which is a tourist attraction, whereas for Uber it is in Midtown West, which is poorly served by public transportation and has no tourist attractions (mostly stores and vehicle stores). This confirms the previous study on this subject (newsroom.uber.com), which claims that most Uber trips are destined for transportation hubs.

Although I sampled the same number of trips (20,000) from each of my data sets, the trips are not distributed equally across the most popular census tracts. For example, for Uber June 2014 and taxi in the same period, the most popular census tracts are different (Figure 9), and these spots also differ from winter to summer even for taxis alone (Figure 8).
Future work: Adding more independent variables to the model, such as the number of sightseeing spots, the number of people above 18 instead of the average age of all people, and the number of building units in the census tract (density), and using PCA to eliminate multicollinearity, would identify the important factors with more certainty. Running a spatial analysis to find the autocorrelation between the census tracts' taxi trip rates, or clustering the census tracts by their taxi trip rates, would also give interesting results. Finding the exact characteristics of census tracts that differ significantly in their pickup numbers, for Uber or taxi, or for taxi alone at different times of the year, could help us better understand the reasons behind the differences.

Links: To make my code reproducible, I have put all my data sets on Github: https://github.com/poonehfamili/PUI2016_pf910/tree/master/extra%20credit

Bibliography:
http://toddwschneider.com/posts/taxi-uber-lyft-usage-new-york-city/
https://newsroom.uber.com/us-new-york/top-destinations-in-nyc-according-to-the-data/
https://data.ny.gov/Transportation/NYC-Transit-Subway-Entrance-And-Exit-Data/i9wp-a4ja/data
https://github.com/fivethirtyeight/uber-tlc-foil-response/tree/master/uber-trip-data
https://github.com/toddwschneider/nyc-taxi-data/blob/master/raw_data_urls.txt
http://nyu.policymap.com/

Appendix: