Data Mining for Accurately Estimating Residential Natural Gas Energy Consumption and Savings Using a Random Forest Approach

Date of Award


Degree Name

Ph.D. in Mechanical Engineering


Department of Mechanical and Aerospace Engineering and Renewable and Clean Energy


Advisor: Kevin P Hallinan


Cost-effective energy efficiency improvements in residential buildings could yield annual electricity savings of approximately 30 percent within this sector in the United States. Furthermore, such investment could create millions of direct and indirect jobs throughout the economy. Unfortunately, realizing these savings is difficult. One impediment is the means by which savings are estimated. The prevalent approach is to estimate savings with energy models. However, energy models more often than not over-predict actual savings, which makes potential investors, including the residents themselves, wary.

A driver for this research is a portfolio of 500 residential buildings owned by the University of Dayton, for which geometrical and historical energy data are known. Further, the energy characteristics of these buildings are knowable. This housing stock offers significant diversity in size (floor areas ranging from 715 to 2800 square feet), age (from the early 1900s to new construction), and energy effectiveness, the latter resulting from gradual improvements made to the residences over the past 15 years. In the summer of 2015, energy and building data audits were conducted on a subset of 139 homes. The audits documented the areas of the walls and attic; the amount and type of insulation in the walls and attic; the areas and types of windows; floor heights; maximum occupancy; appliance (refrigerator, range, oven) specifications; heating, ventilation, and air-conditioning (HVAC) system specifications; domestic hot water equipment specifications; interior attic penetration area; and the presence of a basement.

A data mining approach was used to develop a Random Forest (RF) model that predicts energy consumption in a group of single-family houses. The model draws on knowledge of residential energy characteristics, historical energy consumption, occupancy, and building geometrical data, as well as on energy characteristics inferred from the consumption data.
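The abstract does not state how the RF model was implemented (software, hyperparameters, or the exact feature set). The sketch below therefore shows only the general random forest regression technique in pure Python: each tree is grown on a bootstrap sample, each split searches a random subset of features for the largest reduction in squared error, and the forest prediction is the mean over trees. The features (floor area, house age), the toy data, and all hyperparameter values are hypothetical. In practice a library implementation such as scikit-learn's `RandomForestRegressor` would typically be used instead.

```python
import random
from typing import List, Optional, Sequence


class Node:
    """Regression-tree node: a leaf if `feature` is None, otherwise a split."""

    def __init__(self, value: float) -> None:
        self.value = value                  # mean target of training rows reaching this node
        self.feature: Optional[int] = None  # index of the split feature
        self.threshold = 0.0                # rows with x[feature] <= threshold go left
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def sse(ys: Sequence[float]) -> float:
    """Sum of squared errors around the mean (the CART regression impurity)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((v - mean) ** 2 for v in ys)


def build_tree(X, y, depth, max_depth, min_leaf, n_feats, rng) -> Node:
    node = Node(sum(y) / len(y))
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return node
    best = None  # (impurity, feature, threshold)
    # Random-forest twist: only a random subset of features is searched at each split.
    for f in rng.sample(range(len(X[0])), n_feats):
        for t in sorted({row[f] for row in X}):
            left = [v for row, v in zip(X, y) if row[f] <= t]
            right = [v for row, v in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None or best[0] >= sse(y):
        return node  # no split reduces impurity, so keep this node as a leaf
    _, node.feature, node.threshold = best
    go_left = [row[node.feature] <= node.threshold for row in X]
    node.left = build_tree([r for r, g in zip(X, go_left) if g],
                           [v for v, g in zip(y, go_left) if g],
                           depth + 1, max_depth, min_leaf, n_feats, rng)
    node.right = build_tree([r for r, g in zip(X, go_left) if not g],
                            [v for v, g in zip(y, go_left) if not g],
                            depth + 1, max_depth, min_leaf, n_feats, rng)
    return node


def predict_tree(node: Node, row: Sequence[float]) -> float:
    """Walk from the root to a leaf and return the leaf's mean target."""
    while node.feature is not None:
        node = node.left if row[node.feature] <= node.threshold else node.right
    return node.value


class RandomForest:
    """Bagged ensemble of regression trees; the prediction is the mean over trees."""

    def __init__(self, n_trees=30, max_depth=6, min_leaf=2, n_feats=1, seed=0):
        self.n_trees, self.max_depth = n_trees, max_depth
        self.min_leaf, self.n_feats = min_leaf, n_feats
        self.rng = random.Random(seed)
        self.trees: List[Node] = []

    def fit(self, X, y):
        n = len(y)
        self.trees = []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], 0,
                                         self.max_depth, self.min_leaf, self.n_feats, self.rng))
        return self

    def predict(self, rows):
        return [sum(predict_tree(t, r) for t in self.trees) / len(self.trees) for r in rows]


# Hypothetical toy data: [floor area (sq ft), house age (years)] -> annual gas use (arbitrary units).
X = [[a, g] for a in range(800, 2801, 100) for g in (10, 50, 100)]
y = [0.5 * a + 2.0 * g for a, g in X]
model = RandomForest().fit(X, y)
print(model.predict([[1000, 50], [2500, 50]]))
```

Averaging many decorrelated trees is what lets a random forest capture nonlinear interactions (for example between house size and envelope quality) without the variance of a single deep tree.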
The model was used to estimate savings and to develop a cost implementation model from discrete measures for each residence, so that the cost effectiveness of each possible measure could be assessed. From these assessments, prioritized energy reduction measures among all possible measures for all residences could be identified using a 'worst-to-first' strategy, in order to achieve community-scale energy (and carbon) savings most cost effectively. When extrapolated to 45,000 single-family houses in Dayton, Ohio, the results show that a preliminary investment in energy efficiency of $26 million can achieve energy cost savings of $2.21M per year. Equally or more importantly, an Economic Input-Output analysis reveals a total sequential economic impact of $41.2M from the investment. Thus, this approach offers significant and indisputable local impact.
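The abstract names a levelized cost of fuel saving (in the keywords) and a 'worst-to-first' prioritization, but not their exact formulas. The sketch below assumes the common definition of levelized cost of saved energy (up-front cost times a capital recovery factor, divided by annual savings) and interprets worst-to-first as a greedy ranking of every (home, measure) pair by that cost. Only pairs cheaper than buying the fuel are treated as cost effective, and these are funded cheapest-first until a budget runs out. The `Measure` fields, discount rate, fuel price, and all numbers are hypothetical, and interactions between measures in the same home are ignored.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Measure:
    home_id: str           # hypothetical residence identifier
    name: str              # e.g. "attic insulation"
    cost: float            # up-front installed cost ($)
    annual_savings: float  # predicted annual gas savings (e.g. ccf/yr), e.g. from the RF model
    life_years: int        # assumed measure lifetime (years)


def capital_recovery_factor(rate: float, years: int) -> float:
    """CRF = r(1+r)^n / ((1+r)^n - 1); reduces to 1/n when r == 0."""
    if rate == 0:
        return 1.0 / years
    g = (1 + rate) ** years
    return rate * g / (g - 1)


def levelized_cost(m: Measure, rate: float) -> float:
    """Annualized cost per unit of fuel saved per year ($ per ccf, for example)."""
    return m.cost * capital_recovery_factor(rate, m.life_years) / m.annual_savings


def worst_to_first(measures: List[Measure], budget: float, rate: float,
                   fuel_price: float) -> List[Tuple[Measure, float]]:
    """Rank all (home, measure) pairs by levelized cost, keep those cheaper than the
    fuel they displace, and fund them cheapest-first while the budget allows."""
    ranked = sorted(measures, key=lambda m: levelized_cost(m, rate))
    plan, spent = [], 0.0
    for m in ranked:
        lc = levelized_cost(m, rate)
        if lc >= fuel_price:
            break  # list is sorted, so no later measure is cost effective either
        if spent + m.cost <= budget:
            plan.append((m, lc))
            spent += m.cost
    return plan


measures = [
    Measure("A", "attic insulation", cost=1500, annual_savings=150, life_years=30),
    Measure("A", "furnace replacement", cost=4000, annual_savings=120, life_years=20),
    Measure("B", "air sealing", cost=600, annual_savings=40, life_years=15),
    Measure("C", "wall insulation", cost=3000, annual_savings=60, life_years=30),
]
plan = worst_to_first(measures, budget=6000, rate=0.03, fuel_price=1.5)
for m, lc in plan:
    print(f"{m.home_id}: {m.name} at ${lc:.2f} per unit saved")
```

Homes with the poorest envelopes tend to offer the largest savings per dollar, so ranking by levelized cost naturally reaches the "worst" homes first. For scale, the abstract's community-wide figures imply a simple payback of roughly $26M / $2.21M per year, or about 11.8 years.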


Mechanical Engineering, Economics, Energy, Random Forest, Building Energy Efficiency, Data Mining, Levelized Cost of Fuel Saving, Worst-to-First Strategy

Rights Statement

Copyright © 2019, author