In addition to the increase in available building performance data, there is a growing awareness of the gap in performance between building design and operations \cite{Trust:2011uz,deWilde:2013hh,Menezes:2012hm,Demanuele:2010vw,Turner:2008wc}. Multiple studies have documented and validated this phenomenon, with the most extreme mismatch finding measured energy consumption at five times the predicted consumption for a commercial building \cite{Trust:2011uz}. A framework for investigating this gap emphasizes leveraging measurement data more robustly and ensuring that research in this field aligns with actual building engineering practice \cite{deWilde:2013hh}.

From the conventional operations and management side, this performance gap is generally addressed through various performance analysis techniques, which fall into two major categories: top-down, whole-building techniques and bottom-up, device-focused diagnostics \cite{Seem:2007gj, Ulickey:2010ut}.

Top-down approaches such as Energy Information Systems (EIS) are designed to characterize a building's overall performance health. They leverage whole-building and sub-system level data to show how well a building performs compared to its peers (benchmarking) or through simple tracking metrics. Despite their high-level usefulness, these techniques offer limited insight and ignore much of the detailed digital data created in recently built or renovated high performance buildings \cite{Friedman:2011vw}. In addition, they are often unable to leverage higher frequency, sub-hourly measurements.
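As a minimal illustration of such a top-down tracking metric, the sketch below ranks a building's annual Energy Use Intensity (EUI) against a peer group. The function name and all numbers are hypothetical placeholders, not drawn from any cited tool.

```python
# Sketch of a simple top-down benchmarking metric: where does a building's
# annual Energy Use Intensity (EUI, kWh/m^2) fall within a peer group?
# All values are illustrative placeholders.

def eui_percentile(building_eui, peer_euis):
    """Percentage of peers whose annual EUI is below the target building's."""
    below = sum(1 for e in peer_euis if e < building_eui)
    return 100.0 * below / len(peer_euis)

peers = [95, 110, 120, 135, 150, 160, 180, 210]  # hypothetical peer EUIs
print(eui_percentile(140, peers))  # 50.0: half the peers use less energy
```

Such a single-number ranking conveys how the building compares at a glance, but it illustrates the limitation noted above: it says nothing about when or why the building consumes more than its peers.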

Bottom-up, component-level approaches such as commissioning and automated fault detection and diagnostics (AFDD) are more effective at detecting the root cause of performance problems. A review of AFDD approaches for building systems diagnostics describes three general categories: qualitative model-based, quantitative model-based, and process history based \cite{Katipamula:2004tp}. The first two categories often require an understanding of the impact of each detailed data stream in order to set thresholds or parameters for the detection of anomalies. Process history based methods rely on large amounts of empirical, measured data to create statistical models or use pattern recognition to find operational anomalies. Only process history based approaches are identified as useful with little a priori knowledge. However, they are considered inferior due to weaknesses such as an inability to extrapolate beyond the range of the training data, the large amount of data required, and models that are specific to a particular dataset.
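A minimal sketch of the process history based idea follows: fit a simple statistical model (per-hour mean and standard deviation) to past measurements and flag new readings that deviate strongly. The z-score threshold and the data are illustrative assumptions, not a method from the cited review.

```python
# Minimal process-history-based detection sketch: a per-hour statistical
# baseline learned from past measurements, with a z-score anomaly test.
# Threshold and readings are illustrative assumptions.
import statistics

def hourly_baseline(history):
    """Build a per-hour (mean, stdev) model from past (hour_of_day, value) pairs."""
    by_hour = {}
    for hour, value in history:
        by_hour.setdefault(hour, []).append(value)
    return {h: (statistics.mean(v), statistics.stdev(v)) for h, v in by_hour.items()}

def is_anomalous(hour, value, baseline, z_threshold=3.0):
    """Flag a reading deviating from the historical mean by > z_threshold sigmas."""
    mean, stdev = baseline[hour]
    return abs(value - mean) > z_threshold * stdev

# Hypothetical 9:00 power readings (kW) and a new reading to screen
history = [(9, 100), (9, 102), (9, 98), (9, 101)]
baseline = hourly_baseline(history)
print(is_anomalous(9, 110, baseline))  # True: far outside the historical spread
```

The sketch also exposes the weaknesses listed above: the model cannot judge hours it has never seen, and the baseline is specific to the building that produced the history.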

Beyond AFDD and EIS, another very active research topic is the process of calibrating the building simulation model developed in the design phase with measured performance data from the operations phase \cite{Coakley:2014bg}. The benefits of such a process have long been lauded as key in the understanding of the performance gap: first, in identifying the deficiencies in modeling engines and assumptions and, second, in investigating potential performance deviation in operations. This field was one of the first to investigate the use of day-typing as a means of parameter reduction of measured data for simulation feedback. Much of the literature in simulation model calibration treats raw measured data preparation and day-typing manually, often ignoring the shape and magnitude patterns and relying on rules-of-thumb for schedule creation. These manual approaches add to the cost, time, and automation burdens that calibration suffers from in real implementations.
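To make the rule-of-thumb nature of this manual day-typing concrete, the sketch below assigns day types purely from the calendar, ignoring the measured shape and magnitude patterns entirely. The category names are illustrative.

```python
# Sketch of rule-of-thumb day-typing as commonly used in calibration:
# each day is assigned a schedule type from calendar rules alone, with
# no reference to measured data. Categories are illustrative.
import datetime

def rule_of_thumb_day_type(date, holidays=()):
    """Assign a schedule type using only the calendar."""
    if date in holidays:
        return "holiday"
    return "weekend" if date.weekday() >= 5 else "weekday"

print(rule_of_thumb_day_type(datetime.date(2015, 7, 4)))  # a Saturday -> "weekend"
```

A day on which the building actually behaved anomalously (e.g. an unscheduled shutdown on a weekday) is invisible to such rules, which is the gap that data-driven day-typing aims to close.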

A comprehensive study of building performance tracking was completed by the California Commissioning Collaborative (CACx) and funded by the California Energy Commission (CEC) to characterize the technology, market, and research landscape in the United States. Three of the key tasks in this project focused on establishing the state of the art \cite{Effinger:2010tm}, characterizing available tools and the barriers to their adoption \cite{Ulickey:2010ut}, and establishing standard performance metrics \cite{Greensfelder:2010wl}. These reports were accomplished through investigation of the available tools and technologies on the market as well as discussions and surveys with building operators and engineers. The common theme amongst the interviews and case studies was the lack of time and expertise on the part of the involved operations professionals. The findings showed that installation time and cost were driven by the need for a controls engineer to develop a full understanding of the building and systems. We interpret these results as a latent need for techniques that take into consideration the people, process, and philosophy aspects of the performance analysis equation \cite{Miller:2013vk}. The effort described in this paper addresses this challenge by focusing on automatically finding insight in large, unstructured building performance datasets as part of an analysis process.

\subsection{Parameter-light exploratory analysis for building performance data}

We draw inspiration from other time-series analysis and visualization applications in order to address the progression of data mining in the building industry. One emerging trend is that ``data mining algorithms should have as few parameters as possible, ideally none. A parameter-free algorithm prevents us from imposing our prejudices and presumptions on the problem at hand and lets the data itself speak to us'' \cite{Keogh:2004vp}. This approach is known as parameter-free or parameter-light data mining. The efficacy of these algorithms has been shown to be comparable to or better than that of many more complex, traditional time-series data mining approaches \cite{Keogh:2004vp}.
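The parameter-free idea of \cite{Keogh:2004vp} can be sketched with their Compression-based Dissimilarity Measure (CDM), which compares two sequences using an off-the-shelf compressor and has no tunable parameters; the byte strings below are illustrative stand-ins for discretized sensor data.

```python
# Sketch of the Compression-based Dissimilarity Measure (CDM) from
# parameter-free data mining: no thresholds, windows, or weights to tune.
# The byte strings stand in for discretized time-series data.
import zlib

def cdm(x: bytes, y: bytes) -> float:
    """CDM(x, y) = C(xy) / (C(x) + C(y)); lower means more similar."""
    c = lambda s: len(zlib.compress(s))
    return c(x + y) / (c(x) + c(y))

a = b"abcabcabcabcabc" * 10   # repetitive pattern
b_ = b"abcabcabcabcabc" * 10  # same pattern
c_ = b"zqxwvu" * 25           # different pattern
print(cdm(a, b_) < cdm(a, c_))  # True: similar sequences compress better together
```

Because the only "model" is a general-purpose compressor, nothing about the analyst's expectations is encoded in the measure, which is exactly the property the quotation above argues for.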

An emerging circumstance in the building industry is the consolidated analysis of multiple buildings or portfolios by third-party experts \cite{Granderson:2010jh,Lawrence:2012dx}. The responsibility of managing and mining performance data is shifted from operations staff to data and building science experts who develop specific skills and efficiencies of scale. This type of scaled analysis and intervention addresses the previously mentioned time and expertise deficiency and the cost effectiveness of building performance investigations. This scenario requires computational techniques that, on the one hand, condense information more effectively than conventional top-down techniques and, on the other hand, require less a priori knowledge than bottom-up, component-level approaches. Therefore, exploratory visualization and data mining techniques could be designed as part of a process that bridges these gaps. Our research combines traditional AFDD with exploration techniques such as time-series pattern recognition and visualization.

We propose a new context for the process history based methods found in the literature by testing their usefulness not as a full-scale automated fault detection and diagnostics (AFDD) approach, but as an exploratory step between the top-down and bottom-up paradigms within a process of analysis. The goal is to reduce the expert intervention needed to utilize raw measured data, most importantly in the initial analysis stages. The question addressed is whether parameter-light techniques can provide meaningful insight for bridging the performance gap and for implementing other techniques in a more automated way. A further goal is the automated filtering of a building's typical performance behavior in order to better understand whether the building is performing as designed. These objectives are useful in the post-occupancy phase for stakeholders such as designers, architects, and non-technical occupants and managers.

This paper introduces DayFilter, a process with three key contributions related to post-occupancy exploratory analysis:

\begin{itemize}
  \item Automated tagging of the daily profiles whose patterns occur most frequently in the dataset. These subsequences are defined as motif candidates and can be used to characterize general daily performance profiles.
  \item Automated tagging of potentially anomalous individual daily profiles whose patterns occur least frequently in the dataset. These subsequences are defined as discord candidates and are further investigated in the development of rule-based diagnostics.
  \item Presentation of both types of analysis through combinations of visualizations that are expressive and interpretable by analysts as part of a larger process.
\end{itemize}
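As a simplified sketch of the motif/discord idea, assume each daily profile has already been reduced to a short symbolic string (e.g. by a SAX-style discretization): the most frequent strings suggest motif candidates and the rarest suggest discord candidates. The strings, cutoff, and function name below are illustrative, not the DayFilter implementation itself.

```python
# Simplified sketch of motif/discord candidate tagging over daily profiles
# that have been reduced to symbolic strings. Frequent strings suggest
# motif candidates; rare ones suggest discord candidates.
# Data and the rarity cutoff are illustrative.
from collections import Counter

def tag_daily_profiles(day_strings, discord_max_count=1):
    """Return (motif_candidates, discord_candidates) from symbolic day strings."""
    counts = Counter(day_strings)
    top = counts.most_common(1)[0][1]
    motifs = {s for s, n in counts.items() if n == top}
    discords = {s for s, n in counts.items() if n <= discord_max_count}
    return motifs, discords

days = ["aabb", "aabb", "abbc", "aabb", "abbc", "ccca"]
motifs, discords = tag_daily_profiles(days)
print(motifs, discords)  # {'aabb'} {'ccca'}
```

Mapping the tagged strings back to their calendar dates then yields the candidate days an analyst would inspect visually, which is the role the visualizations in the third contribution play.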

These efforts are meant to assist in transforming high frequency collected data into more qualitative or simplified forms so that it can be compared to design-phase assumptions or used to implement more sophisticated bottom-up techniques such as AFDD. This assists in bridging the previously mentioned performance gap between design and operations.

The paper is organized as follows. In Section \ref{sec:background}, we discuss the background of whole building performance analysis and give an overview of the time-series data mining techniques investigated, with application to the buildings context. Section \ref{sec:process} explains DayFilter, a process of analysis used to filter information from raw measured building data. Then, Section \ref{sec:application} outlines two real-world case studies as applications of the process. Section \ref{sec:parameters} discusses the achievement of the parameter-light goals by investigating the influence of the input parameters and by statistically analyzing the tightness of fit of the clustering process. Finally, Section \ref{sec:discussion} discusses the insight gained from the application of the process with respect to the achievement of the objectives.