I was hired as a Bloomberg Associates Fellow to work on the Detroit Land Bank Authority (DLBA) Inventory team, beginning in July 2016, to build a statistical “occupancy model” that would predict the occupancy of every residential non-lot parcel in the city of Detroit.
SUMMARY – OCCUPANCY MODEL
The occupancy of homes, particularly in Detroit, is dynamic and difficult to measure from a single observation or input. Robust occupancy data is valuable to the DLBA because, among other things, it allows the DLBA to focus its limited resources on targeting occupied homes for its buy-back program, and it will be useful for future programs focused on occupied homes. The model will also become a critical part of the DLBA’s ongoing property triage process. Specifically, the Land Bank regularly receives thousands to tens of thousands of properties from Wayne County, and the occupancy model will help it select properties more accurately for DLBA programs (i.e., the Demolition, Auction, Community Partners, and Own-It-Now programs). My goal for this project was therefore to build a predictive model for the DLBA that incorporates multiple data inputs to measure occupancy at the level of the individual parcel. I built the occupancy model so that its data is updated on a recurring basis, which improves the accuracy of its predictions.
I gathered data, listed in the Dataset section below (Table 1), from a variety of sources that I believed would collectively help indicate whether or not a person is living in a house. The comprehensive Loveland Technologies Motor City Mapping (MCM) data tool served as the basis for my model. Since the beginning of 2014, Loveland Technologies has gathered a large amount of data on every parcel in the city by having dedicated teams of “blight-texters” (known as “blexters”) go door-to-door recording information. This information ranges from the physical condition of the home, to the extent of fire damage, to (importantly for me) the occupancy of the home.
I determined which homes looked “occupied” and “unoccupied” by combining the MCM “occupancy” data with water usage data, active energy account (i.e., DTE) data, United States Postal Service (USPS) data, voter registration records, and fire history, and then applying a number of machine learning algorithms. In total, I built my model using around 66,000 residential non-lot parcels on which Motor City Mapping has gathered data since January 1, 2016. After building the model, I applied it to the full list of about 234,000 residential non-lot parcels in the city of Detroit. In the final occupancy model I developed, each residential non-lot, non-apartment parcel in the city is listed, and each parcel has an occupancy score (ranging from 0 to 1). Using a subset of MCM lots, my model provides an occupancy classification of either “occupied” or “unoccupied,” based on that score and a chosen predictor threshold (see Table 2).
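The step from occupancy score to classification can be illustrated with a minimal sketch. The threshold of 0.5, the function name, and the parcel identifiers below are assumptions made for illustration; the report does not state the threshold that was actually chosen.

```python
def classify_occupancy(score, threshold=0.5):
    """Map an occupancy score in [0, 1] to a label.

    The default threshold of 0.5 is an assumption for illustration,
    not the value used in the DLBA model.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("occupancy score must be between 0 and 1")
    return "occupied" if score >= threshold else "unoccupied"


# Hypothetical parcel IDs and scores, for illustration only.
scores = {"parcel_A": 0.91, "parcel_B": 0.12, "parcel_C": 0.50}
labels = {parcel_id: classify_occupancy(s) for parcel_id, s in scores.items()}
```

Raising the threshold makes the “occupied” label more conservative: a parcel needs a higher score to be classified as occupied, which trades missed occupied homes for fewer vacant homes mislabeled as occupied.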
This model builds upon and updates the MCM data; for example, it relies on more recent observations to define occupancy. Also, because we receive our data on a monthly cadence, the outputs (occupancy scores) can be updated every time the inputs are updated, without having to retrain the model (i.e., without having to actually go to a house and observe whether or not it is occupied).
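The idea of refreshing scores from new inputs while leaving the trained model untouched can be sketched as follows. The logistic scoring function, the coefficient values, and the feature names are hypothetical stand-ins; the report says only that a number of machine learning algorithms were applied, not which one produced the final scores.

```python
import math

# Hypothetical coefficients standing in for a model that has already
# been trained. They stay fixed between monthly updates.
WEIGHTS = {
    "water_usage": 1.2,
    "active_dte_account": 1.5,
    "usps_delivery": 0.8,
    "registered_voters": 0.4,
    "recent_fire": -2.0,
}
INTERCEPT = -1.5


def occupancy_score(features):
    """Return a score in (0, 1) from one parcel's input features."""
    z = INTERCEPT + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def rescore(monthly_inputs):
    """Re-score every parcel from fresh inputs; no retraining involved."""
    return {pid: occupancy_score(f) for pid, f in monthly_inputs.items()}


# Hypothetical inputs for the same parcel in two consecutive months.
july = {"parcel_A": {"water_usage": 1, "active_dte_account": 1, "usps_delivery": 1}}
august = {"parcel_A": {"water_usage": 0, "active_dte_account": 0, "usps_delivery": 0}}

july_scores = rescore(july)
august_scores = rescore(august)
```

In this sketch the parcel's score drops from July to August only because its inputs changed (water, energy, and mail activity all stopped); the coefficients are identical in both months.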
To build the machine learning algorithms for the occupancy model described above, I compiled a dataset at the parcel level from a number of different sources. The dataset contains water usage data from the Detroit Water and Sewerage Department (DWSD), active electricity and gas account data from DTE, voter registration records, USPS delivery data, Detroit Fire Department fire data, and MCM data.
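Compiling a parcel-level dataset from several sources amounts to joining each source's records on a shared parcel identifier. The sketch below shows one way this could look; the parcel IDs, source keys, and field names are hypothetical and do not reflect the actual schemas of the DWSD, DTE, or USPS data.

```python
def compile_parcel_dataset(sources):
    """Merge per-source records into one row per parcel.

    `sources` maps a source name to {parcel_id: {field: value}}.
    Each field is prefixed with its source name so that columns from
    different sources cannot collide.
    """
    dataset = {}
    for source_name, records in sources.items():
        for parcel_id, fields in records.items():
            row = dataset.setdefault(parcel_id, {})
            for field, value in fields.items():
                row[f"{source_name}_{field}"] = value
    return dataset


# Hypothetical records, for illustration only.
sources = {
    "dwsd": {"P1": {"water_usage": 3.2}, "P2": {"water_usage": 0.0}},
    "dte": {"P1": {"active_account": True}},
    "usps": {"P1": {"delivery_active": True}, "P2": {"delivery_active": False}},
}
dataset = compile_parcel_dataset(sources)
```

A parcel that is missing from one source (here, P2 has no DTE record) simply lacks that source's columns, so any downstream model has to decide how to treat such gaps.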
Table 1: Data Set and Variable List