I was hired as a Bloomberg Associates Fellow to work on the Detroit Land
Bank Authority (DLBA) Inventory team, beginning in July 2016, in order
to build a statistical “occupancy model” that would predict the
occupancy of every residential non-lot parcel in the city of Detroit.
SUMMARY – OCCUPANCY MODEL
The occupancy of homes, particularly in Detroit, is dynamic and
difficult to measure from a single observation or input. Robust
occupancy data is valuable to the DLBA because, among other things, it
will allow the DLBA to focus its limited resources on targeting occupied
homes for its buy-back program, and it will be useful for future
programs focused on occupied homes. The model will also become a
critical part of the DLBA’s
ongoing property triage process. Specifically, the Land Bank regularly
receives thousands to tens of thousands of properties from Wayne County.
This occupancy model will greatly assist in more accurately selecting
properties for DLBA programs (i.e., the Demolition, Auction, Community
Partners, and Own-It-Now programs). My goal for this project was
therefore to build a predictive model for the DLBA that incorporates
multiple data inputs and measures occupancy at the level of the
individual parcel. I built the occupancy model so that its data is
updated on a recurring basis, improving the accuracy of the model.
I gathered data, listed in the Dataset section below (Table 1), from a
variety of sources, all of which I believed collectively would help
inform whether or not a person is living in a house. The comprehensive
Loveland Technologies Motor City Mapping (MCM) data tool served as the
basis for my model. Since the beginning of 2014, Loveland Technologies
has gathered extensive data on every parcel in the city
by having dedicated teams of “blight-texters” (known as “blexters”) go
door-to-door recording information. This information ranges from the
physical condition of the home, to the extent of fire damage, to
(importantly for me) the occupancy of the home.
I determined which homes looked “occupied”
and “unoccupied” by combining the MCM “occupancy” data with water usage
data, active energy account (i.e., DTE) data, United States Postal
Service (USPS) data, voter registration records, and fire history, and
applying a number of machine learning algorithms. In total, I built my
model using around 66,000 residential non-lot parcels that have had data
gathered on them by Motor City Mapping since January 1, 2016. After
building the model, I then applied it to the full list of about 234,000
residential non-lot parcels in the city of Detroit.
In the final occupancy model I developed, each residential non-lot,
non-apartment parcel in the city is listed, and each parcel has an
occupancy score (ranging from 0 to 1). Using a subset of MCM lots, my
model provides an occupancy classification of either “occupied” or
“unoccupied”, based on that score and a chosen predictor threshold (see
Table 2).
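As a rough illustration of how a score and a threshold yield a classification, the sketch below maps scores in the 0 to 1 range to the two labels. The threshold of 0.5, the function name `classify_occupancy`, and the parcel IDs and scores are all assumptions made for the example; they are not the values used in the model.

```python
# Hypothetical threshold: the value actually chosen for the model is not
# stated here, so 0.5 is only a placeholder.
DEFAULT_THRESHOLD = 0.5


def classify_occupancy(score, threshold=DEFAULT_THRESHOLD):
    """Map an occupancy score in [0, 1] to "occupied" or "unoccupied"."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    # Scores at or above the threshold are treated as occupied.
    return "occupied" if score >= threshold else "unoccupied"


# Made-up parcel IDs and scores, for illustration only.
example_scores = {"parcel-A": 0.91, "parcel-B": 0.12, "parcel-C": 0.50}
labels = {pid: classify_occupancy(s) for pid, s in example_scores.items()}
```

Raising the threshold makes the "occupied" label more conservative: a parcel scoring 0.6 is "occupied" at a threshold of 0.5 but "unoccupied" at 0.7.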
This model builds upon and updates the MCM data; for example, it relies
on more recent observations to define occupancy. Also, because we
receive our data monthly, the outputs (occupancy scores) can be updated
every time the inputs are updated without having to retrain the model,
which would require new field observations (i.e., actually going to a
house and observing whether or not it is occupied).
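The idea of refreshing scores without retraining can be sketched as follows: the model's parameters stay fixed, and only the inputs change from month to month. The logistic form, the coefficient values, and the feature names below are hypothetical stand-ins chosen for the example; they do not describe the algorithms or parameters actually used.

```python
import math

# Hypothetical coefficients standing in for an already-trained model.
# They are NOT the real model's parameters.
WEIGHTS = {
    "water_active": 1.5,
    "dte_active": 1.2,
    "usps_delivering": 1.0,
    "registered_voters": 0.4,
    "recent_fire": -2.0,
}
INTERCEPT = -1.8


def occupancy_score(features):
    """Return a logistic score in (0, 1) from one parcel's current inputs."""
    z = INTERCEPT + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def rescore(monthly_inputs):
    """Apply the fixed model to a fresh batch of inputs; nothing is refit."""
    return {pid: occupancy_score(f) for pid, f in monthly_inputs.items()}


# Made-up inputs for one parcel in two consecutive months.
june = {"parcel-A": {"water_active": 1, "dte_active": 1, "usps_delivering": 1}}
july = {"parcel-A": {"water_active": 0, "dte_active": 0, "usps_delivering": 1}}

june_scores = rescore(june)
july_scores = rescore(july)
```

In this made-up case the parcel's score falls from June to July because its water and energy accounts go inactive, even though no one has revisited the house.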
To build the machine learning algorithms for the occupancy model
described above, I compiled a dataset at the parcel level from a number
of different sources. The dataset contains water usage data from the
Detroit Water and Sewerage Department (DWSD), active electricity and gas
account data from DTE, voter registration records, USPS delivery data,
Detroit Fire Department fire data, and MCM data.
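Compiling these sources at the parcel level amounts to joining each source on a shared parcel identifier. The sketch below shows one simple way to do that, assuming every source can be keyed by parcel ID; the function name `build_parcel_table`, the source keys, the column names, and the records are hypothetical.

```python
from collections import defaultdict


def build_parcel_table(sources):
    """Combine per-source records into one row per parcel.

    `sources` maps a source name to {parcel_id: {column: value}}.
    Each column is prefixed with its source name so columns cannot collide.
    """
    table = defaultdict(dict)
    for source_name, records in sources.items():
        for parcel_id, columns in records.items():
            for column, value in columns.items():
                table[parcel_id][f"{source_name}_{column}"] = value
    return dict(table)


# Made-up records; source keys and column names are hypothetical.
sources = {
    "dwsd": {"P1": {"usage": 3.2}, "P2": {"usage": 0.0}},
    "dte": {"P1": {"active": True}},
    "mcm": {"P1": {"occupancy": "occupied"}, "P2": {"occupancy": "unoccupied"}},
}
parcels = build_parcel_table(sources)
```

A parcel that is missing from a source (here, P2 has no DTE record) simply lacks that source's columns, so gaps in coverage remain visible rather than being silently filled.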
Table 1: Data Set and Variable List