Dashcam-Based Collision Anticipation (DCA) Using Multimodal Large Language Models: An Ablation Study

Date of Award

5-9-2026

Degree Name

M.S. in Computer Science

Department

Department of Computer Science

Advisor/Chair

Tam Van Nguyen

Abstract

With automobile accidents increasing every year, a system that can predict road accidents has become vital, and detecting collisions early is a critical challenge in building safer driving systems. While modern autonomous vehicles are equipped with advanced safety features, most vehicles on the road today lack such capabilities. This raises the question: can multimodal large language models analyze dashcam footage and anticipate collisions before they occur? To answer this, we introduce Dashcam-Based Collision Anticipation (DCA) and evaluate six models on this task, three commercial (GPT-4o, Gemini 2.5 Flash, and Claude Sonnet 4) and three open-source (Llama 3.2 Vision 11B, LLaVA 13B, and LLaVA 7B), using a dataset of 1,960 dashcam video clips. We also evaluate two zero-shot vision-language baselines, CLIP ViT-L/14 and BLIP-2, as a reference point for comparison. Performance is measured using accuracy, precision, recall, and F1-score. Results show that commercial models significantly outperform both the open-source models and the zero-shot baselines, with GPT-4o achieving the highest accuracy.
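
The four metrics named in the abstract can be illustrated with a minimal sketch. It assumes each clip carries a binary label (1 for collision, 0 for no collision) and that each model returns one binary prediction per clip; the abstract does not state the labeling scheme, so this framing, the function name `binary_metrics`, and the sample data are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Assumption: 1 = collision, 0 = no collision (hypothetical encoding).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative labels only; these are not results from the thesis.
example_true = [1, 1, 1, 0, 0, 0, 1, 0]
example_pred = [1, 1, 0, 0, 1, 0, 1, 0]
example_scores = binary_metrics(example_true, example_pred)
```

Reporting precision and recall alongside accuracy matters for a task like this because a model that misses collisions and a model that raises false alarms can have the same accuracy while behaving very differently.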

Keywords

Computer Science

Comments

OCLC No. 1591829754

Rights Statement

Copyright 2026, author.
