Multi-core architectures for feed-forward neural networks

Date of Award

2014

Degree Name

M.S. in Electrical Engineering

Department

Department of Electrical and Computer Engineering

Advisor/Chair

Advisor: Tarek Taha

Abstract

Power density constraints and processor reliability concerns have caused energy-efficient processor architectures to gain increasing interest in recent years. One approach to reducing processor power consumption is the use of specialized multi-core architectures that provide significant speedups for neural network applications. Several studies have shown that a large variety of processing tasks can be represented as neural networks. This thesis examines specialized multi-core processor designs for such applications, studying both SRAM-based and memristor-based neural core designs.

The thesis also examines the on-chip routing needed to enable communication between cores, in particular the routing bandwidth needed to process large multi-layered feed-forward neural networks. Two routing bandwidth models were developed for mesh interconnection networks: one examined sending neuron outputs from one layer to the next, while the other examined streaming synaptic weights from off-chip memory. The models were validated through simulations. Both static and dynamic routing approaches for large multi-core feed-forward neural network accelerators were examined. We observed that the accumulated on-chip network bandwidth required to access off-chip data is much greater than the bandwidth required to send neuron outputs between cores. In almost all cases, static routing was significantly more efficient than dynamic routing, requiring both lower area and lower power.

We compared our proposed SRAM-based and memristor-based digital systems to more traditional HPC systems. Two commodity high-performance processors were examined: a six-core Intel Xeon processor and an NVIDIA Tesla M2070 GPGPU. Care was taken to ensure the code on each platform was highly efficient: multi-threaded and exploiting vector parallelism on the Xeon processor, and a high-device-utilization CUDA program on the GPGPU. Our results indicate that the specialized systems can be two to five orders of magnitude more energy efficient than the traditional HPC systems.
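The thesis's actual bandwidth models are not reproduced in this abstract, but the observation that off-chip weight traffic dominates neuron-output traffic can be illustrated with a minimal back-of-the-envelope sketch for a dense, fully connected layer. All parameter values, function names, and byte widths below are hypothetical placeholders, not figures from the thesis:

```python
# Illustrative sketch only (not the thesis's models): per-layer traffic for a
# dense fully connected feed-forward layer. Parameter values are hypothetical.

def neuron_output_traffic(n_in: int, bytes_per_activation: int = 2) -> int:
    """Bytes sent between cores to forward one layer's input activations."""
    # Each of the n_in neuron outputs is sent onward once.
    return n_in * bytes_per_activation

def weight_stream_traffic(n_in: int, n_out: int,
                          bytes_per_weight: int = 2) -> int:
    """Bytes streamed from off-chip memory for the layer's synaptic weights."""
    # A dense layer needs the full n_in x n_out weight matrix.
    return n_in * n_out * bytes_per_weight

n_in, n_out = 1024, 1024
act = neuron_output_traffic(n_in)          # 2,048 bytes
wts = weight_stream_traffic(n_in, n_out)   # 2,097,152 bytes
print(f"activation traffic: {act} B, weight traffic: {wts} B, "
      f"ratio: {wts // act}x")
```

Even in this toy setting the streamed weight traffic exceeds the inter-core activation traffic by a factor on the order of the layer width, which is consistent with the abstract's claim that off-chip data access dominates the on-chip bandwidth requirement.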

Keywords

Neural networks (Computer science), Routing (Computer network management), Microprocessors--Design and construction, Computers--Energy conservation, Electrical Engineering, Multi-core architectures, neuromorphic architecture, on-chip routing, memristor crossbar, specialized core

Rights Statement

Copyright © 2014, author