Background and related work

\label{sec:background}

Whole building performance diagnostic approaches have been investigated for many years, especially since digital data collection systems became common in commercial buildings. This section reviews previous efforts in conventional building performance analysis approaches such as automated fault detection and diagnostics (AFDD) and building simulation model calibration. The intent of these reviews is to show the context that the proposed exploratory analysis process is meant to supplement. We then review data-driven approaches that specifically utilize historical process data. We also discuss the background of temporal data symbolic aggregation and the mining techniques utilized in this paper.

Automated fault detection and diagnostics

Automated fault detection and diagnostics (AFDD) is a process in which abnormal conditions in specific equipment are found and the root cause is identified automatically, i.e., without human intervention \cite{Katipamula:2004tp}. Quantitative methods are based on the development of detailed physical, white-box models whose output is analytically compared to measured building data. Recently developed quantitative methods still require extensive, detailed knowledge in order to construct simulation models in programs such as EnergyPlus \cite{ONeill:2014io}. Qualitative methods rely on expert analysis of specific building systems and the setting of thresholds or alarms, often on much of the detailed sensor data \cite{Costa:2013ko}. Other methods have been developed that do not rely on simulation models and instead use decoupling-based techniques and virtual sensors to find faults in specific systems \cite{Zhao:2014jc}. Many of the AFDD approaches in the literature focus on the efficacy of detecting and diagnosing faults and neglect the people and process issues related to the implementation and utilization of the systems. This gap can be addressed through investigation of data mining and exploratory methods.

Building simulation model calibration

Building simulation model calibration is a promising approach for bridging the gap between design and operational performance \cite{Coakley:2014bg,Reddy:2006un}. A building data model hierarchy matching approach was created to analyze the relationship between simulated and measured datasets \cite{Maile:2012cf}. Calibration is utilized in the quantitative AFDD methods outlined previously, in validating model assumptions, as a tool for various measurement and verification (M\&V) procedures, and in performing investment-grade retrofit analysis. Previous day-typing processes have been developed to reduce measurement data complexity for simulation calibration \cite{Hadley:1993tn,Kaplan:1990td}. There have been several more recent attempts to convert measured data into occupancy patterns and diversity factors \cite{Duarte:2013gy,DavisIII:2010gj,Abushakra:2001us}. These studies require manual intervention to describe the basic occupancy schedule partitions and to remove anomalous days such as holidays and days with abnormal performance. Overall, reviews of the literature in this area have found a lack of standards, of methods for identifying discrepancies, and of automation, all of which contribute to the under-utilization of the process in industry. More work in exploratory and automated parameter reduction is necessary to further enhance automation of the calibration process.

Data-driven building performance analysis

Building process history-based, or data-driven, techniques are the third major category of diagnostics in buildings. The key feature uniting many of these studies is their focus on extracting knowledge from measured datasets without detailed intervention from an expert. The concept of whole building diagnostics was established in the 1990s as a means of finding major disruptions in performance from high-level system metrics and probabilistic regression models \cite{Dodier:1999vh}. Since then, various unsupervised learning or exploratory methods have become more popular for extracting information. For example, clustering has been used as a means of finding similar daily performance \cite{Seem:2005ea,Panapakidis:2014ck,Xiao:2014kn}, detecting deviant performance \cite{Seem:2007gj}, enhancing analyst productivity \cite{Koran:2002ut}, and supplementing controls optimization algorithms \cite{Kusiak:2008dw,Bogen:2013vh}. More generalized building data mining methodologies have been developed using outlier detection \cite{Li:2010fq}, multiple linear regression models \cite{Jacob:2010gs}, fuzzy behavior modeling \cite{Linda:2012ek}, generalized additive models \cite{Ploennigs:2013um}, and association rule mining \cite{Yu:2012ho}. Wavelet transformations and clustering have been used in large-scale classification of the electrical demand profiles of hundreds of buildings \cite{Florita:2013fz}. The newest advanced methods combine a number of different techniques in an analysis framework \cite{Yu:2013ca,Fontugne:2013ug}. Semi-supervised methods have been introduced which allow expert intervention in the process to leverage both unsupervised and supervised machine learning \cite{Yoshida:2008fu}. All of these approaches focus on the efficacy of detecting and diagnosing various potential issues in buildings, but they lack discussion or design dedicated to implementation in context or to the interpretability of the results.

Temporal data mining

Temporal data mining for performance monitoring focuses generally on the extraction of patterns and model building of time series data. These techniques are, in some ways, similar to many existing building performance analysis approaches; however, different concepts and terminology are used. Two key concepts to understand when applying data mining to buildings are motifs and discords. A motif is a common subsequence pattern that has the highest number of non-trivial matches \cite{Patel:2002bb}, that is, a pattern that is found frequently in the dataset. A discord, on the other hand, is defined as a subsequence of a time series that has the largest distance to its nearest non-self match \cite{Keogh:2005wd}. It is a subsequence of a univariate data stream that is least like all other non-overlapping subsequences and is, therefore, a rare pattern that diverges from the rest of the dataset. These definitions are more general than that of a fault and are therefore more appropriate for our goal of higher-level information extraction with less parameter setting. In short, we want to efficiently find interesting or infrequent behavior rather than create a detailed list of specific problems that could be occurring in individual systems.
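
The discord definition above can be made concrete with a brute-force search. The following is a minimal sketch under stated assumptions, not the procedure used in this paper: the function name and its arguments are hypothetical, Euclidean distance is assumed as the distance measure, and two subsequences are treated as non-self matches only when their start positions are at least one window length apart.

```python
import math


def find_discord(series, window):
    """Return (start_index, distance) of the top discord in `series`.

    Brute-force illustration: for every subsequence of length `window`,
    find the distance to its nearest non-self match, then keep the
    subsequence for which that nearest distance is largest.
    """
    n_sub = len(series) - window + 1
    best_pos, best_dist = None, -1.0
    for p in range(n_sub):
        nearest = math.inf
        for q in range(n_sub):
            # Skip overlapping subsequences (self and trivial matches).
            if abs(p - q) < window:
                continue
            d = math.dist(series[p:p + window], series[q:q + window])
            nearest = min(nearest, d)
        # A subsequence with no non-self match is ignored.
        if nearest != math.inf and nearest > best_dist:
            best_pos, best_dist = p, nearest
    return best_pos, best_dist


# A repeating pattern with a single anomalous reading at index 9.
readings = [0, 1, 0, 1, 0, 1, 0, 1, 0, 5, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
position, distance = find_discord(readings, window=4)
```

The search is quadratic in the number of subsequences, which is one reason a discretized representation of the data is attractive for larger datasets.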

In order to work with common temporal mining approaches, we utilize the extensive work on the Symbolic Aggregate approXimation (SAX) representation of time series data \cite{Lin:2003wz}. SAX allows discretization of time series data, which facilitates the use of various motif and discord detection algorithms. The process breaks time series data into subsequences, each of which is converted into an alphabetic symbol. These symbols are combined to form strings that represent the original time series, enabling various mining and visualization techniques. In terms of application, an example of a process using SAX-based techniques is the VizTree tool, which uses augmented suffix tree visualizations designed for usability by an analyst \cite{Lin:2004wv,Lin:2005bi}. A specific application of VizTree is the analysis of sensor data collected ahead of an impending spacecraft launch, in which thousands of telemetry sensors feed data back to a command center where experts are required to interpret it. Visualization and filtering tools are needed that allow a natural and intuitive transfer of mined knowledge to the monitoring task. Human perception of visualizations and the algorithms behind them must work in unison to achieve understanding of large amounts of novel data streams.
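
The discretization step can be illustrated as follows. This is a minimal sketch of the standard SAX formulation (z-normalization, piecewise aggregate approximation, and equiprobable Gaussian breakpoints), not the implementation used in this paper; the function name, the parameter names, and the requirement that the series length be divisible by the number of segments are simplifying assumptions.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def sax_word(series, n_segments, alphabet_size):
    """Convert a numeric sequence into a SAX string (illustrative only)."""
    if not 2 <= alphabet_size <= len(LETTERS):
        raise ValueError("alphabet_size must be between 2 and 26")
    if n_segments < 1 or len(series) % n_segments != 0:
        raise ValueError("series length must be divisible by n_segments")

    # 1. z-normalize so that the Gaussian breakpoints are meaningful.
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        z = [0.0] * len(series)  # constant series: no shape information
    else:
        z = [(x - mu) / sigma for x in series]

    # 2. Piecewise aggregate approximation: mean of each equal-width segment.
    width = len(z) // n_segments
    paa = [mean(z[i * width:(i + 1) * width]) for i in range(n_segments)]

    # 3. Breakpoints that split the standard normal into equiprobable bins.
    standard_normal = NormalDist()
    breakpoints = [standard_normal.inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]

    # 4. Map each segment mean to the letter of the bin it falls in.
    return "".join(LETTERS[bisect_right(breakpoints, v)] for v in paa)


# Hypothetical 24-value daily profile: low overnight, high during the day.
daily_profile = [1] * 6 + [5] * 12 + [1] * 6
word = sax_word(daily_profile, n_segments=4, alphabet_size=3)
print(word)  # acca
```

Once each day is reduced to a short word in this way, frequently occurring words are candidate motifs and rarely occurring words are candidate discords.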

SAX has previously been applied to building performance data in a few studies focused on data center chilled water plants, where it was found effective in detecting the most efficient control strategies \cite{Patnaik:2009uk}. The same research was used to create a visual exploration tool for high-frequency time series data \cite{Hao:2012go}. Despite these efforts, our review of the literature found a lack of tools or processes similar to the VizTree tool for day-types that fit our targeted context of bridging the performance gap. We therefore introduce a new process focused on combining temporal approximation, filtering, and visualization.