Jonathan Schierl



Despite extensive testing and evaluation of state-of-the-art deep learning methods, few studies directly compare object detection across sensing modalities. To address this gap, we created an in-house dataset designed specifically for comparing 2D infrared captures with 3D LiDAR point clouds. The sensors used to capture the dataset were placed next to each other so that both retained a similar point of view and resolution. Each modality was evaluated individually using state-of-the-art deep learning architectures. For 2D infrared, Retinex, a neighborhood-based image enhancement algorithm, was used to improve image contrast, and the enhanced images were then processed with the Mask R-CNN architecture. For 3D point clouds, PointNet++ was used for feature extraction and classification. Detection accuracy and overall performance were then compared between the modalities. In general, the 3D approach performed better, with higher detection rates and better accuracy. Comparing these architectures also revealed the strengths and weaknesses of each modality. To further increase detection accuracy, we propose a fusion network that incorporates the strengths of both modalities and processes them in a single architecture. This network would extract features in 2D using Mask R-CNN and in 3D using KPConv. The two feature spaces would then be combined and passed through the region proposal network and the rest of the Mask R-CNN architecture to achieve higher detection accuracy.
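
To illustrate the contrast-enhancement step mentioned above, the following is a minimal sketch of a generic single-scale Retinex, which computes log(I) - log(G * I) for a Gaussian surround G. It is not the thesis implementation: the abstract does not say which Retinex variant or parameters were used, so the function names, the sigma value, the border handling, and the rescaling to [0, 255] are all assumptions made for illustration. It uses only the Python standard library; an array library would normally be used for real images.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return 1D Gaussian weights normalized to sum to 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(k * row[min(max(x + j - radius, 0), width - 1)]
             for j, k in enumerate(kernel))
         for x in range(width)]
        for row in image
    ]


def transpose(image):
    """Swap rows and columns of a list-of-lists image."""
    return [list(col) for col in zip(*image)]


def single_scale_retinex(image, sigma=2.0):
    """Generic single-scale Retinex (assumed variant): log(I) - log(G * I).

    `image` is a list of rows of non-negative intensities. The result is
    rescaled linearly to [0, 255]; that rescaling is an illustrative choice.
    """
    radius = max(1, int(3 * sigma))
    kernel = gaussian_kernel(sigma, radius)
    # Separable Gaussian blur: filter the rows, then the columns.
    blurred = transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))
    # Adding 1 before the logarithm avoids log(0) on black pixels.
    retinex = [[math.log(p + 1.0) - math.log(b + 1.0)
                for p, b in zip(row, blurred_row)]
               for row, blurred_row in zip(image, blurred)]
    lo = min(min(row) for row in retinex)
    hi = max(max(row) for row in retinex)
    span = hi - lo
    if span < 1e-9:
        # A flat image carries no contrast to stretch.
        return [[0.0 for _ in row] for row in retinex]
    return [[255.0 * (v - lo) / span for v in row] for row in retinex]


# Hypothetical low-contrast frame: a slightly warmer 2x2 patch on a flat background.
frame = [[100.0] * 8 for _ in range(8)]
for y in (3, 4):
    for x in (3, 4):
        frame[y][x] = 110.0
enhanced = single_scale_retinex(frame, sigma=1.0)
```

In this sketch the 10-level difference between the patch and the background is stretched across the full output range, which is the kind of local contrast gain that a detector such as Mask R-CNN could then benefit from.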

Publication Date


Project Designation

Honors Thesis

Primary Advisor

Theus H. Aspiras

Primary Advisor's Department

Electrical and Computer Engineering


Stander Symposium project, School of Engineering

United Nations Sustainable Development Goals

Industry, Innovation, and Infrastructure

Multi-modal Data Analysis and Fusion for Robust Object Detection in 2D/3D Sensing