Shalmali Kulkarni, skulkarni2, sck408
This paper presents a neighborhood clustering of Manhattan based on characteristics of the physical environment. The research project was carried out as part of the curriculum for the Principles of Urban Informatics class, 2016.
This paper studies the spatial and social homogeneity of neighborhoods to better understand the socio-spatial character of different neighborhoods in Manhattan, New York. The analysis uses two clustering techniques, KMeans and DBSCAN, to assess cluster homogeneity. The results show that DBSCAN performs better than KMeans at revealing that Manhattan has a distinct pattern based on the physical characteristics (diversity) of its buildings.
KMeans clustering, DBSCAN clustering, Manhattan.
INTRODUCTION (Relevance of the Study)
The relation between urban form and social heterogeneity has a long history in urban research. Socio-spatial inequality in NYC has long been discussed in city planning, economic development and social justice. This research clusters the zip codes in Manhattan, New York, based on the physical characteristics of the built environment and population distribution. The study tries to understand the clusters that define the famous Manhattan skyline.
The datasets used for this research are the PLUTO data and a zip code GeoJSON file for mapping. The study covers all zip codes in Manhattan and could later be extended to all boroughs of New York City.
The 2016 PLUTO data for Manhattan was downloaded from the Open NYC website. It contains extensive land and geographic information in about 70 attributes, such as zoning, land use category, lot area, building area, building frontage and building depth.
The final zip code file was also downloaded from NYC Open Data: [Zip Code file](http://data.nycprepared.org/dataset/nyc-zip-code-tabulation-areas/resource/0c0e14e9-78e1-404e-97b0-c2fabceb3981).
The data wrangling process follows the idea of reproducibility and includes the following stages:
Zip code shapefile
A final dataset including all the required fields was exported as a CSV file for later use.
A color map was made for each attribute to reveal emerging patterns.
The number-of-floors map (ref: Figure 1) shows the greatest building heights in the Financial District of Lower Manhattan and in Midtown. Because the map is plotted using standard deviations, it shows the variation clearly.
The 'diversity score' (ref: Figure 2), defined as the sum of the normalized standard deviations of all attributes considered, visually shows some cluster formation.
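To make the diversity score concrete, the following is a minimal sketch of one way it could be computed with pandas. It is not the study's actual code. The toy values are made up, the column names (`ZipCode`, `NumFloors`, `LotArea`) only mimic PLUTO fields, the helper `diversity_score` is hypothetical, and min-max scaling is an assumed choice of normalization, since the text does not say which one was used.

```python
import pandas as pd

# Toy lot-level records; column names mimic PLUTO fields, values are made up.
lots = pd.DataFrame({
    "ZipCode":   [10001, 10001, 10001, 10002, 10002, 10002, 10003, 10003, 10003],
    "NumFloors": [5, 40, 12, 6, 6, 7, 3, 20, 10],
    "LotArea":   [2000, 30000, 5000, 2500, 2400, 2600, 1800, 9000, 4000],
})

def diversity_score(df, key, attributes):
    """Sum of normalized per-group standard deviations (hypothetical helper)."""
    # Spread of each attribute within each zip code.
    stds = df.groupby(key)[attributes].std()
    # Assumed normalization: min-max scale each column to [0, 1]
    # so that attributes with different units are comparable.
    span = (stds.max() - stds.min()).replace(0, 1)
    normalized = (stds - stds.min()) / span
    # One score per zip code: the sum across attributes.
    return normalized.sum(axis=1).rename("diversity_score")

scores = diversity_score(lots, "ZipCode", ["NumFloors", "LotArea"])
print(scores.sort_values(ascending=False))
```

Under this normalization, a zip code whose buildings vary the most on every attribute scores highest, and one with uniform buildings scores near zero.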
This research uses two clustering techniques from the sklearn package for further analysis.
The KMeans algorithm clusters the data by trying to separate the samples into groups of equal variance, minimizing the within-cluster sum of squares. It requires the number of clusters to be specified in advance. Silhouette analysis was performed to find the optimal number of clusters. Based on the maximum score, this analysis suggested 3 or 4 clusters. The study explores four clusters.
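The silhouette-based choice of the number of clusters can be sketched as below. This is an illustration under stated assumptions, not the study's code. The data are synthetic blobs standing in for the per-zip-code feature table, the helper `silhouette_by_k` is hypothetical, and the candidate range of 2 to 7 clusters is arbitrary.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the zip-code feature table (not the study's data):
# four well-separated groups in two dimensions.
centers = [[0, 0], [10, 0], [0, 10], [10, 10]]
X, _ = make_blobs(n_samples=120, centers=centers, cluster_std=0.6, random_state=0)
X = StandardScaler().fit_transform(X)  # put features on a common scale

def silhouette_by_k(X, k_values, seed=0):
    """Fit KMeans for each candidate k and return the mean silhouette score."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        labels = model.fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores

scores = silhouette_by_k(X, range(2, 8))
best_k = max(scores, key=scores.get)  # k with the highest mean silhouette

for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("best k:", best_k)
```

When two candidate values score almost equally, as reported here for 3 and 4, the maximum alone does not settle the choice, and inspecting the resulting cluster maps is a reasonable tie-breaker.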