Keywords: building occupancy, decision tree model, building energy consumption
In this paper I review and evaluate the work of Simona D'Oca and Tianzhen Hong of the Lawrence Berkeley National Laboratory, entitled Occupancy Schedules Learning Process Through a Data Mining Framework (OSL), published in the journal Energy and Buildings in February 2015 (D’Oca 2015).
OSL is a simple, real-world application of the knowledge discovery and data mining process to building occupancy schedules. This matters for energy scheduling and, in turn, for energy consumption. Buildings account for one quarter of global energy consumption (World Energy Outlook...), so they are a key component of the future smart grid with its dynamic supply and demand patterns. There is also an increasing number of buildings fitted with building management systems (BMS) (Open data communicati...), some already connected to the so-called "Internet of Things", but the benefits of these data streams are yet to be harvested. Spaces in buildings can be characterized in terms of their energy consumption (heat, light, electricity) based on their occupants' presence profiles. OSL presents a simple methodology for determining these profiles using a decision tree model. Existing occupancy data from a set of 16 offices, recorded at 10-minute intervals over a two-year period, is mined with a decision tree model that predicts occupant presence. A rule induction algorithm is then used to learn a pruned set of rules from the results of the decision tree model. Finally, a cluster analysis is employed to obtain consistent patterns of occupancy schedules. The identified occupancy rules and schedules are summarized as four archetypal working profiles that can be used as input to current building energy modeling programs.
OSL's decision tree model predicts the binary occupancy state of an office from four attributes: season, time of day, day of week and previous occupancy state. For this it employs the entropy-based C4.5 algorithm (Shen 2009, Quinlan 1993). Because the study is conducted on fully known data, the resulting model is regarded as training for datasets that are not complete, i.e. buildings with only a portion of the sensors installed.
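To make the split criterion concrete, the following is a minimal sketch of the gain ratio that C4.5 uses to rank candidate attributes (Quinlan 1993). It is not the authors' implementation: the records in `RAW` are hypothetical toy values invented for illustration, the names `entropy` and `gain_ratio` are my own, and the handling of continuous attributes and the pruning that a full C4.5 performs are left out.

```python
import math
from collections import Counter

# Hypothetical toy records, invented for illustration (not data from the study).
# Columns: season, time_of_day, day_of_week, previous_state, occupied
RAW = [
    ("winter", "morning",   "weekday", 0, 0),
    ("winter", "morning",   "weekday", 0, 1),
    ("winter", "afternoon", "weekday", 1, 1),
    ("winter", "afternoon", "weekday", 1, 1),
    ("winter", "evening",   "weekday", 1, 0),
    ("winter", "evening",   "weekday", 0, 0),
    ("summer", "morning",   "weekday", 0, 1),
    ("summer", "afternoon", "weekday", 1, 1),
    ("summer", "afternoon", "weekday", 1, 1),
    ("summer", "evening",   "weekday", 1, 0),
    ("summer", "morning",   "weekend", 0, 0),
    ("summer", "afternoon", "weekend", 0, 0),
]
ATTRIBUTES = ["season", "time_of_day", "day_of_week", "previous_state"]
rows = [dict(zip(ATTRIBUTES + ["occupied"], record)) for record in RAW]


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def gain_ratio(rows, attribute, target="occupied"):
    """Information gain of `attribute` divided by its split information."""
    total = len(rows)
    # Group the target labels by the value of the candidate attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    # Weighted entropy that remains after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy([row[target] for row in rows]) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


ratios = {a: gain_ratio(rows, a) for a in ATTRIBUTES}
best = max(ratios, key=ratios.get)  # attribute chosen for the root split
print(ratios, best)
```

A tree is grown by applying the same ranking recursively to the subset of records in each branch. Which attribute wins depends entirely on the data, so the ranking on these toy records says nothing about the ranking reported in OSL.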
In the second part of the study, which is independent of the first, OSL clusters the occupancy patterns into subgroups. The pairwise Dynamic Time Warping (DTW, (Dynamic Time Warping...)) differences between the occupancy schedules are compared, and the Davies-Bouldin Index (Thomas 2014) is used to select the optimal number of clusters. This yields four occupancy schedule clusters, which makes it easy to visually identify the characteristic activities for each time of day.
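The two ingredients of this step can be sketched as follows. This is an illustration under my own assumptions rather than the procedure used in OSL: the four schedules are hypothetical occupancy fractions, `dtw_distance` is the textbook unconstrained DTW with an absolute-difference cost, and `davies_bouldin` is a medoid-based variant of the index (the usual definition uses centroids, which are not well defined under DTW).

```python
def dtw_distance(a, b):
    """Unconstrained DTW distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # stretch b
                                    cost[i][j - 1],      # stretch a
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def medoid(cluster):
    """Member with the smallest total DTW distance to the other members."""
    return min(cluster, key=lambda s: sum(dtw_distance(s, t) for t in cluster))


def davies_bouldin(clusters):
    """Medoid-based Davies-Bouldin index; lower means better separation."""
    medoids = [medoid(c) for c in clusters]
    # Scatter: mean distance of each member to its cluster medoid.
    scatter = [sum(dtw_distance(s, m) for s in c) / len(c)
               for c, m in zip(clusters, medoids)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        worst = 0.0
        for j in range(k):
            if j == i:
                continue
            separation = dtw_distance(medoids[i], medoids[j])
            # Identical medoids mean the two clusters are not separated at all.
            ratio = float("inf") if separation == 0 else \
                (scatter[i] + scatter[j]) / separation
            worst = max(worst, ratio)
        total += worst
    return total / k


# Hypothetical daily profiles (occupancy fraction per time slot), for illustration only.
full_a = [0.0, 0.1, 0.8, 0.9, 0.9, 0.8, 0.2, 0.0]
full_b = [0.0, 0.0, 0.7, 0.9, 0.8, 0.9, 0.3, 0.0]
low_a = [0.0, 0.0, 0.2, 0.3, 0.2, 0.1, 0.0, 0.0]
low_b = [0.0, 0.1, 0.3, 0.2, 0.3, 0.1, 0.0, 0.0]

db_good = davies_bouldin([[full_a, full_b], [low_a, low_b]])
db_mixed = davies_bouldin([[full_a, low_a], [full_b, low_b]])
print(db_good, db_mixed)
```

Selecting the number of clusters then amounts to computing the index for each candidate partition and keeping the one with the lowest value. Note that unconstrained DTW absorbs pure time shifts, so two schedules that differ only in arrival time can end up at zero distance; whether that is desirable depends on how the schedules are meant to be used.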