A 2D/3D Feature-Level Information Fusion Architecture For Remote Sensing Applications

Date of Award


Degree Name

M.S. in Electrical and Computer Engineering


Department of Electrical and Computer Engineering


Vijayan Asari


Remote sensing has received significant attention due to advances in technology and growing access to data. A current challenge is classifying land regions by their usage: residential, industrial, forest, and so on. Scope is very important: too large an area leads to multiple classes being present in one scene, while too small an area does not contain enough contextual information to classify the scene accurately. To further complicate matters, similar objects appear in several classes; for example, trees are found in residential, forest, and park classes. Deep learning has been successful on problems at this level of ambiguity. The most straightforward approach is to classify land regions from remote sensing images. However, deep learning on 2D images has its downsides, especially for aerial data: it lacks 3-dimensional information such as depth. Likewise, 3D deep learning architectures have their own weaknesses, namely longer processing times and a lack of intensity information. As access to processing hardware and remote sensing data continues to increase, there is a pressing need to leverage the strengths of both modalities. This can be done in one of three ways: (1) data-level fusion, where the data modalities are fused together directly; (2) feature-level fusion, where features are fused after each data modality is processed individually; or (3) decision-level fusion, where predictions are made from each modality independently and then fused into one final decision. In this work, we use feature-level fusion because our dataset (composed of lidar and RGB scenes) contains very different types of information; after analysis, we found that each modality was better suited to different sections of our data, which we could harness using feature-level fusion.
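To make the distinction between fusion levels concrete, the following is a minimal NumPy sketch contrasting feature-level and decision-level fusion. It is not the architecture used in this work: the linear-plus-ReLU `encode` function, the random weights, the input sizes, and the three-class setup are all hypothetical stand-ins for trained 2D and 3D networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Row-wise softmax, shifted for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def encode(x, weights):
    """Stand-in feature extractor: one linear layer followed by ReLU."""
    return np.maximum(x @ weights, 0.0)

n_scenes, n_classes = 4, 3

# Hypothetical per-scene inputs; real inputs would be RGB images and lidar point clouds.
rgb_input = rng.normal(size=(n_scenes, 48))
lidar_input = rng.normal(size=(n_scenes, 30))

# Each modality is processed individually by its own (untrained, random) encoder.
rgb_feat = encode(rgb_input, rng.normal(size=(48, 16)))      # shape (4, 16)
lidar_feat = encode(lidar_input, rng.normal(size=(30, 8)))   # shape (4, 8)

# Feature-level fusion: concatenate the per-modality features,
# then make a single prediction from the fused vector.
fused_feat = np.concatenate([rgb_feat, lidar_feat], axis=1)  # shape (4, 24)
feature_level_probs = softmax(fused_feat @ rng.normal(size=(24, n_classes)))

# Decision-level fusion, for contrast: predict from each modality
# independently, then average the two class distributions.
rgb_probs = softmax(rgb_feat @ rng.normal(size=(16, n_classes)))
lidar_probs = softmax(lidar_feat @ rng.normal(size=(8, n_classes)))
decision_level_probs = 0.5 * (rgb_probs + lidar_probs)

print(fused_feat.shape, feature_level_probs.shape, decision_level_probs.shape)
```

In the feature-level variant the classifier sees both sets of features at once, so it can weight each modality differently per class; in the decision-level variant the modalities interact only through the final averaging step.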
Furthermore, to improve on these results, an accurate registration strategy is important because features from each modality should describe the same spatial region of the original scene. This would be especially important in a segmentation (pixel-wise decision) fusion architecture. Feature extraction and matching is a technique common to both image registration and point cloud registration. We apply this methodology using the Scale-Invariant Feature Transform (SIFT) algorithm, a robust feature extraction technique for 2D images. Then, using the geographical metadata of our images and point clouds, the SIFT detections are accurately projected onto their respective point clouds. This transformation can be used in many applications, including an automatic 2D/3D registration method.
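One way such a projection could work is sketched below. The exact procedure is not specified here, so the sketch makes several assumptions: the image carries a six-parameter affine geotransform in the GDAL ordering, the image and point cloud share one coordinate reference system, and each detection is attached to its nearest lidar point in the horizontal plane. The keypoint pixel locations, geotransform values, and point coordinates are invented for illustration; in practice the pixel locations would come from a SIFT detector.

```python
import numpy as np

def pixel_to_map(cols, rows, geotransform):
    """Map pixel (col, row) positions to map (x, y) with an affine geotransform.

    Assumes the GDAL ordering (x_origin, pixel_width, row_rotation,
    y_origin, col_rotation, pixel_height); pixel_height is negative for
    north-up images. The result refers to the pixel's top-left corner.
    """
    x0, dx, rx, y0, ry, dy = geotransform
    x = x0 + cols * dx + rows * rx
    y = y0 + cols * ry + rows * dy
    return np.stack([x, y], axis=1)

def attach_nearest_points(map_xy, cloud_xyz):
    """Return, for each map position, the lidar point nearest to it in XY.

    Brute-force search; a spatial index would be preferable for large clouds.
    """
    diff = map_xy[:, None, :] - cloud_xyz[None, :, :2]
    nearest = np.argmin((diff ** 2).sum(axis=2), axis=1)
    return cloud_xyz[nearest]

# Hypothetical north-up image with 0.5 m pixels.
geotransform = (500000.0, 0.5, 0.0, 4400000.0, 0.0, -0.5)

# Hypothetical keypoint locations as (col, row) pixel coordinates.
keypoints_px = np.array([[10.0, 20.0], [100.0, 40.0]])

# Hypothetical lidar points as (x, y, z) in the same coordinate system.
cloud_xyz = np.array([
    [500005.1, 4399990.2, 231.0],
    [500050.0, 4399980.0, 245.5],
    [500020.0, 4399950.0, 228.3],
])

map_xy = pixel_to_map(keypoints_px[:, 0], keypoints_px[:, 1], geotransform)
matched_xyz = attach_nearest_points(map_xy, cloud_xyz)
print(map_xy)
print(matched_xyz)
```

Each 2D detection ends up paired with a 3D point, which is the kind of 2D/3D correspondence a registration method can build on.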


Computer Engineering, Computer Science, Artificial Intelligence, Remote Sensing, deep learning, computer vision, lidar, aerial imagery, land classification, multimodal, fusion, machine learning, point clouds, GIS, mapping, geospatial intelligence, transfer learning, wide area surveillance, automatic target recognition

Rights Statement

Copyright © 2022, author.