Graduate Theses and Dissertations

Using Hadoop to cluster data in energy system

Jun Hou, University of Dayton

Date of Award

2015

Degree Name

M.S. in Computer Science

Department

Department of Computer Science

Advisor/Chair

Advisor: Zhongmei Yao

Abstract

With the large amount of data generated by various devices, data scientists face major challenges, because conventional machine learning algorithms running on a single computer can no longer process or analyze such large data sets. This thesis takes a distributed computing approach built on Apache Hadoop, a distributed data analysis framework that runs on multiple computers. The main components of this work include implementing the k-means machine learning algorithm on the Hadoop MapReduce framework, processing raw data from real energy systems, clustering the data using k-means in Hadoop, and improving seed selection for k-means. Finally, this thesis demonstrates the efficiency and effectiveness of the approach using different data sets.
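
The abstract's idea of running k-means as a MapReduce job can be illustrated with a minimal sketch. It is plain Python that simulates the map, shuffle, and reduce phases in memory; it does not use the Hadoop API and is not the thesis's implementation. The function names (`kmeans_map`, `kmeans_reduce`, `kmeans_iteration`, `kmeans`) are hypothetical, the initial centroids are passed in by the caller, and the seed-selection improvement mentioned in the abstract is not shown.

```python
from collections import defaultdict
from math import dist


def kmeans_map(point, centroids):
    """Map step: emit (index of the nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def kmeans_reduce(values):
    """Reduce step: average all points that share one centroid index."""
    total = None
    count = 0
    for point, n in values:
        total = list(point) if total is None else [a + b for a, b in zip(total, point)]
        count += n
    return tuple(s / count for s in total)


def kmeans_iteration(points, centroids):
    """One simulated MapReduce round: map, group by key, reduce."""
    groups = defaultdict(list)
    for p in points:
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)
    # Assumption: a centroid with no assigned points keeps its old position.
    return [
        kmeans_reduce(groups[i]) if i in groups else tuple(centroids[i])
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iters=20):
    """Repeat rounds until the centroids stop changing or max_iters is hit."""
    for _ in range(max_iters):
        updated = kmeans_iteration(points, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

In a real Hadoop job each round would typically be a separate MapReduce job, with the current centroids distributed to every mapper; here the grouping dictionary stands in for the shuffle phase.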

Keywords

Public utilities -- Data processing -- Case studies, Electronic data processing -- Distributed processing -- Case studies, Data mining -- Case studies, Computer algorithms, Computer Science, Hadoop, K-means, energy data, clustering analysis

Recommended Citation

Hou, Jun, "Using Hadoop to cluster data in energy system" (2015). Graduate Theses and Dissertations. 1044.
https://ecommons.udayton.edu/graduate_theses/1044
