All of the data used in our analysis are publicly available. Most of these data are easily accessible through New York City's (NYC's) Open Data portal\cite{data}, although a few datasets required a more intensive collection and processing procedure before they could be incorporated into our analysis. These procedures are detailed below, and our publicly available code repository [LINK HERE] provides all data and scripts used to collect and process these data. Finally, our team also used a few indices generated by other urban science initiatives. The data used to generate these indices will be made publicly available in the near future, and links to the methodologies used to generate them (when available) are included below.
Data from NYC's Open Data Portal or another public NYC website
1. Building sales data collected by NYC's Department of Finance (DOF)
Data on building sales are collected by New York City's Department of Finance. Sales records from 2007-2017 can be found on the DOF's website.\cite{updatea} Each record includes information such as the address, Borough-Block-Lot number (BBL, a key identifier for buildings in New York City), sales price, date of sale, number of residential units, area, and building class, among other features. This dataset does not include identifying information on either the buyer or the seller.
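Because the BBL is the join key for most of the datasets below, it helps to see how the identifier is assembled. The BBL is a 10-digit string made of a 1-digit borough code (1 = Manhattan, 2 = Bronx, 3 = Brooklyn, 4 = Queens, 5 = Staten Island), a 5-digit block, and a 4-digit lot. The following is a minimal sketch, assuming the sales records store borough, block, and lot as separate numeric columns; the helper names `make_bbl` and `split_bbl` are hypothetical.

```python
def make_bbl(borough: int, block: int, lot: int) -> str:
    """Assemble a 10-digit BBL from its three components (hypothetical helper)."""
    if not 1 <= borough <= 5:
        raise ValueError(f"invalid borough code: {borough}")
    if not (0 <= block <= 99999 and 0 <= lot <= 9999):
        raise ValueError(f"block {block} or lot {lot} out of range")
    # Zero-pad block to 5 digits and lot to 4 digits so every BBL has 10 digits.
    return f"{borough}{block:05d}{lot:04d}"


def split_bbl(bbl: str) -> tuple[int, int, int]:
    """Inverse of make_bbl: return (borough, block, lot)."""
    s = str(bbl).strip()
    if len(s) != 10 or not s.isdigit():
        raise ValueError(f"not a 10-digit BBL: {bbl!r}")
    return int(s[0]), int(s[1:6]), int(s[6:])


print(make_bbl(1, 1, 10))       # 1000010010
print(split_bbl("3012340056"))  # (3, 1234, 56)
```

Keeping the BBL as a zero-padded string, rather than an integer, avoids losing leading zeros in the block and lot when records are written out and read back in.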
2. Complaints compiled by NYC's Department of Buildings (DOB)
The DOB receives complaints from NYC residents detailing problems with their buildings. These complaints are filed through 311 (a hub of information for NYC residents and visitors and an important outlet for filing complaints)\cite{york} and through the DOB's own complaint system. Each record (i.e., each complaint) includes the Building Identification Number (BIN, another key identifier for buildings in NYC), date of complaint, type of complaint, and complaint status (e.g., whether the complaint was investigated or converted into a building violation), among other features. The full dataset of complaints from 2013 to 2017 is available from NYC's Open Data Portal.\cite{received}
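Since DOB complaints are keyed by BIN while most of the other datasets use the BBL, complaints must be mapped to BBLs before they can be combined. The sketch below counts complaints per building and year, assuming a BIN-to-BBL crosswalk is available as a dictionary (`bin_to_bbl` is hypothetical; the complaints data does not supply it). Several BINs can map to the same BBL, since one tax lot may contain multiple buildings.

```python
from collections import Counter
from datetime import date


def complaints_per_bbl_year(complaints, bin_to_bbl):
    """Count DOB complaints per (BBL, year).

    complaints: iterable of (bin, date_filed) pairs.
    bin_to_bbl: dict mapping BIN -> BBL (a hypothetical crosswalk).
    Returns (counts, unmatched), where unmatched is the number of complaints
    whose BIN is absent from the crosswalk.
    """
    counts = Counter()
    unmatched = 0
    for bin_, filed in complaints:
        bbl = bin_to_bbl.get(bin_)
        if bbl is None:
            unmatched += 1  # track unmatched records instead of silently dropping them
            continue
        counts[(bbl, filed.year)] += 1
    return counts, unmatched


# Two BINs on the same lot roll up to one BBL.
crosswalk = {"1001": "1000010010", "1002": "1000010010"}
sample = [("1001", date(2014, 3, 1)), ("1002", date(2014, 5, 2)),
          ("1001", date(2015, 1, 9)), ("9999", date(2016, 6, 1))]
counts, unmatched = complaints_per_bbl_year(sample, crosswalk)
print(dict(counts), unmatched)
```

Reporting the unmatched count makes it easy to check how much of the complaints data is lost in the BIN-to-BBL mapping.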
3. Construction permits compiled by NYC's DOB
The DOB compiles data on all construction permits issued to building owners. These permits allow building owners to legally make improvements or repairs, or to build new structures. Each record in this dataset includes the type of construction work permitted, the date the permit was issued, the permit's expiration date, and the BBL, among other features. The full dataset of permits from 1999 to 2018 is available from NYC's Open Data Portal.\cite{datac}
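Because each permit has an issue date and an expiration date, one natural derived feature is how many permits were active at a building on a given date. The following is a minimal sketch, assuming permit records have been loaded into the hypothetical `Permit` class below. The work-type labels are placeholders, not official DOB codes.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Permit:
    bbl: str
    issued: date
    expires: date
    work_type: str  # placeholder label, not an official DOB code


def active_permits_by_bbl(permits, on: date) -> Counter:
    """Count, per BBL, the permits whose [issued, expires] window contains `on`."""
    return Counter(p.bbl for p in permits if p.issued <= on <= p.expires)


permits = [
    Permit("1000010010", date(2015, 1, 1), date(2015, 12, 31), "alteration"),
    Permit("1000010010", date(2015, 6, 1), date(2016, 5, 31), "repair"),
    Permit("3012340056", date(2014, 1, 1), date(2014, 6, 30), "new building"),
]
print(active_permits_by_bbl(permits, date(2015, 7, 1)))
```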
4. Complaints submitted to NYC 311
NYC 311 is a hub of information for NYC residents and visitors and an important outlet for filing complaints.\cite{york} NYC 311 collects data on all complaints filed through its system and makes these complaints available on the NYC Open Data Portal.\cite{datad} Each record (i.e., each complaint) includes the BBL, date of complaint, type of complaint, and complaint status (e.g., whether the complaint was received or forwarded to the relevant agencies), among other features. These data are available from 2010 to the present and are updated daily.
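Because the 311 data are updated daily, it can be convenient to query the portal directly rather than download the full file. NYC Open Data is served through the Socrata Open Data API (SODA), which accepts SoQL parameters such as `$where`, `$order`, `$limit`, and `$offset`. The sketch below only builds a request URL. The resource ID `erm2-nwe9` and the field names `bbl` and `created_date` are assumptions, and should be checked against the dataset's page on the portal.

```python
from urllib.parse import urlencode

# Assumption: Socrata resource ID for "311 Service Requests from 2010 to Present".
BASE_URL = "https://data.cityofnewyork.us/resource/erm2-nwe9.json"


def build_311_query(bbl: str, start: str, end: str,
                    limit: int = 1000, offset: int = 0) -> str:
    """Build a SODA URL for one building's 311 complaints in [start, end).

    The field names `bbl` and `created_date` are assumptions about the schema.
    """
    where = (f"bbl = '{bbl}' AND created_date >= '{start}' "
             f"AND created_date < '{end}'")
    params = {
        "$where": where,
        "$order": "created_date",
        "$limit": limit,    # page size
        "$offset": offset,  # advance by `limit` to fetch the next page
    }
    return BASE_URL + "?" + urlencode(params)


print(build_311_query("1000010010", "2017-01-01", "2018-01-01"))
```

The resulting URL can be fetched with `urllib.request.urlopen` and decoded with `json.loads`. To page through all results, repeat the request with `offset` increased by `limit` until an empty list comes back.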
5. NYC's Automated City Register Information System (ACRIS)
ACRIS is a registry of property records, including mortgages and tax records, among other document types. It is compiled by NYC's Department of Finance. ACRIS data are available on the NYC Open Data Portal, split across multiple files, but may be too large for many users to download, since at least three of these files are around 1 gigabyte each.\cite{datasets} Our team used only ACRIS's mortgage records, but the utility of this extensive dataset for analyzing predatory landlord behavior extends far beyond what is discussed in this paper.
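Given the size of the ACRIS files, one option is to stream them row by row and keep only the records of interest, rather than loading a full file into memory. The following is a minimal sketch using Python's `csv` module. It assumes a document-type column named `DOC. TYPE` and the mortgage code `MTGE`. Both are assumptions, and should be verified against the file headers and ACRIS's document-type code table. The sample rows are illustrative only.

```python
import csv
from typing import Iterable, Iterator


def iter_mortgages(lines: Iterable[str],
                   doc_type_col: str = "DOC. TYPE",         # assumed column name
                   mortgage_codes=frozenset({"MTGE"}),      # assumed document-type code
                   ) -> Iterator[dict]:
    """Yield only mortgage rows from an ACRIS CSV, one row at a time.

    `lines` can be an open file handle, so memory use stays flat no matter
    how large the file is.
    """
    for row in csv.DictReader(lines):
        # Short rows get None for missing fields, so guard before strip().
        if (row.get(doc_type_col) or "").strip() in mortgage_codes:
            yield row


sample = [
    "DOCUMENT ID,DOC. TYPE,DOCUMENT AMT\n",
    "2015000000001,MTGE,500000\n",
    "2015000000002,DEED,750000\n",
]
print([r["DOCUMENT ID"] for r in iter_mortgages(sample)])  # ['2015000000001']
```

For a real file, pass the handle returned by `open(path, newline="")`, where `path` is a local ACRIS download.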
6. NYC's Primary Land Use Tax Lot Output (PLUTO)
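PLUTO, published by NYC's Department of City Planning, is a dataset covering every tax lot in New York City. It records land use and building characteristics for each lot, with the BBL as its key.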
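Because PLUTO is keyed by BBL, it can be used to attach lot-level attributes to the other datasets. One practical snag is that BBLs may be stored in different forms across files, for example as integers, or as floats such as `1000010010.0` in some CSV exports. Keys should therefore be normalized before joining. The following is a minimal sketch of a left join. The helper names and the PLUTO field names used in the example are illustrative.

```python
def normalize_bbl(value) -> str:
    """Coerce a BBL stored as int, float, or string to a 10-digit string."""
    s = str(value).strip()
    if s.endswith(".0"):
        s = s[:-2]  # drop a trailing float suffix
    if len(s) != 10 or not s.isdigit():
        raise ValueError(f"not a BBL: {value!r}")
    return s


def attach_pluto(records, pluto_rows, fields):
    """Left-join PLUTO fields onto records by BBL; unmatched lots get None."""
    pluto_by_bbl = {normalize_bbl(r["bbl"]): r for r in pluto_rows}
    joined = []
    for rec in records:
        lot = pluto_by_bbl.get(normalize_bbl(rec["bbl"]), {})
        joined.append({**rec, **{f: lot.get(f) for f in fields}})
    return joined


sales = [{"bbl": 1000010010, "price": 2_000_000},
         {"bbl": "3012340056", "price": 900_000}]
pluto = [{"bbl": 1000010010.0, "yearbuilt": 1920, "unitsres": 24}]
for row in attach_pluto(sales, pluto, ["yearbuilt", "unitsres"]):
    print(row)
```

A left join keeps every sale even when its lot is absent from PLUTO. The `None` values then flag lots that need a closer look.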
Existing Indices
1. Displacement Typology: Urban Displacement Project
Generated by the NYU Center for Urban Science + Progress (CUSP) 2018 capstone group entitled "Map of gentrification and displacement for the greater New York", this index was created using a methodology developed by the Urban Displacement Project, a research initiative of the University of California, Berkeley.\cite{displacement} The index uses demographic data from the US Census Bureau's American Community Survey and aims to measure the human impacts of gentrification and displacement. The group's data are not publicly available at the moment but will be published in the coming months.
2. Renovation Index: "Digital Traces of Gentrification" Project from CUSP 2018's Capstone Cohort
The NYU Center for Urban Science + Progress (CUSP) 2018 capstone group entitled "Digital Traces of Gentrification" designed an index to measure gentrification, the "Renovation Index." This index uses sales and construction permit data to measure the built-environment factors involved in the process of gentrification. The group's data and methodology are not publicly available at the moment but will be published in the coming months.
Data with extraction and processing demands
1. Evictions data from the New York City Housing Court and the Department of Investigation, compiled by nyc-db
Evictions data are available on NYC Open Data (provided by the Department of Investigation), but the records only include evictions occurring in 2017 or later.\cite{dataa} To acquire evictions data over a longer period, we used nyc-db, a PostgreSQL database that compiles a variety of publicly available datasets on buildings in NYC.\cite{nyc-db} One of these is a dataset of evictions in New York City from 2013 to 2015, which includes the BBL and the date of eviction, among other features. These data are technically public, but because they are stored in a PostgreSQL database, reading them in requires more substantial technical skills than the datasets described above, which can be downloaded through a link. Instructions for creating a personal instance of nyc-db can be found in nyc-db's GitHub repository. Ultimately, our team used evictions data from 2013-2017, excluding 2016, as neither dataset contains data for that year.
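Combining the two eviction sources amounts to stacking them and aggregating by building and year, while keeping 2016 visibly missing rather than treating it as a year with zero evictions. The following is a minimal sketch, assuming both sources have been reduced to hypothetical `(bbl, year)` pairs.

```python
from collections import Counter


def evictions_per_bbl_year(nycdb_rows, opendata_rows, start=2013, end=2017):
    """Stack both sources and count evictions per (BBL, year) within [start, end]."""
    counts = Counter()
    for source in (nycdb_rows, opendata_rows):
        for bbl, year in source:
            if start <= year <= end:
                counts[(bbl, year)] += 1
    return counts


def missing_years(counts, start=2013, end=2017):
    """Years in the window with no records from either source."""
    covered = {year for _, year in counts}
    return [y for y in range(start, end + 1) if y not in covered]


nycdb = [("1000010010", 2013), ("1000010010", 2014), ("3012340056", 2015)]
opendata = [("1000010010", 2017), ("3012340056", 2018)]
counts = evictions_per_bbl_year(nycdb, opendata)
print(missing_years(counts))  # [2016]
```

Downstream, building-year cells for any year returned by `missing_years` can be set to missing (e.g., `None`) rather than 0, so that a gap in coverage is not read as an absence of evictions.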
2. Property listings from Street Easy
Our team developed a relationship with Street Easy to obtain listings containing keywords our team was interested in analyzing.\cite{nyc} The data we collected contain only properties for sale as of the date of collection (mid-July 2018) and only properties listed on Zillow.
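Since the listings were selected by keyword, the same keywords can be used to flag each listing's description. The following is a minimal sketch using regular expressions with word boundaries, so that a keyword does not match inside a longer word. The keyword list in the example is hypothetical.

```python
import re


def flag_keywords(description: str, keywords) -> set:
    """Return the keywords that occur in `description` as whole words or phrases.

    Matching is case-insensitive; re.escape treats any punctuation in a
    keyword literally.
    """
    found = set()
    for kw in keywords:
        if re.search(r"\b" + re.escape(kw) + r"\b", description, flags=re.IGNORECASE):
            found.add(kw)
    return found


keywords = ["renovated", "vacant", "gut renovation"]  # hypothetical keyword list
print(flag_keywords("Fully RENOVATED unit, delivered vacant.", keywords))
```

The word-boundary anchors keep "renovated" from matching inside "unrenovated", which would otherwise invert the meaning of a listing.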
3. Rent stabilized unit estimates from the New York City Department of Finance, compiled by John Krauss and nyc-db
Another dataset in nyc-db contains estimated counts of rent stabilized units for each building, by year, from 2007 through 2016. John Krauss, a civic hacker, either collected or estimated these counts (depending on the record in question) from NYC's tax lot data. His methods are thoroughly documented on GitHub.\cite{nyc-stabilization-unit-counts}
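With a building-by-year panel of these counts, one simple derived measure is the net change in rent stabilized units over the period covered. The following is a minimal sketch, offered as an illustration only. It assumes the counts have been loaded into a hypothetical nested dictionary mapping each BBL to `{year: count}`.

```python
def stabilized_unit_change(counts_by_year, start, end):
    """Net change in stabilized units from `start` to `end`, or None if either year is missing."""
    if start not in counts_by_year or end not in counts_by_year:
        return None
    return counts_by_year[end] - counts_by_year[start]


def buildings_losing_units(panel, start=2007, end=2016):
    """Map each BBL with a net loss over [start, end] to the number of units lost."""
    losses = {}
    for bbl, counts_by_year in panel.items():
        change = stabilized_unit_change(counts_by_year, start, end)
        if change is not None and change < 0:
            losses[bbl] = -change
    return losses


panel = {
    "1000010010": {2007: 10, 2016: 6},  # lost 4 units
    "2000020020": {2007: 5, 2016: 5},   # no change
    "3012340056": {2016: 3},            # 2007 missing, so skipped
}
print(buildings_losing_units(panel))  # {'1000010010': 4}
```

Returning `None` when an endpoint year is absent keeps buildings with incomplete records out of the loss tally instead of treating the missing year as zero units.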