Dashcam-Based Collision Anticipation (DCA) Using Multimodal Large Language Models: An Ablation Study
Date of Award
5-9-2026
Degree Name
M.S. in Computer Science
Department
Department of Computer Science
Advisor/Chair
Tam Van Nguyen
Abstract
With automobile accidents increasing every year, road accident prediction systems have become vital, and detecting collisions early is a critical challenge in building safer driving systems. While modern autonomous vehicles are equipped with advanced safety features, most vehicles on the road today lack such capabilities. This raises the question: can multimodal large language models analyze dashcam footage and anticipate collisions before they occur? To answer this, we introduce Dashcam-Based Collision Anticipation (DCA) and evaluate six models on this task: three commercial systems (GPT-4o, Gemini 2.5 Flash, and Claude Sonnet 4) and three open-source systems (Llama 3.2 Vision 11B, LLaVA 13B, and LLaVA 7B), using a dataset of 1,960 dashcam video clips. We also evaluate two zero-shot vision-language baselines, CLIP ViT-L/14 and BLIP-2, as a reference point for comparison. Performance is measured using accuracy, precision, recall, and F1-score. Results show that commercial models significantly outperform both open-source models and zero-shot baselines, with GPT-4o achieving the highest accuracy.
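As an illustration of the four metrics named in the abstract, the following minimal Python sketch computes them for a binary collision / no-collision task. It is not taken from the thesis: the function name `binary_metrics`, the `BinaryMetrics` container, and the label encoding (1 = collision, 0 = no collision) are assumptions made for this example, and the sample labels are made up rather than drawn from the 1,960-clip dataset.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Assumed encoding (hypothetical, not from the thesis):
    1 = collision, 0 = no collision.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    total = len(pairs)
    # Guard each ratio against a zero denominator by returning 0.0.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return BinaryMetrics(accuracy, precision, recall, f1)


# Made-up labels for six clips, for illustration only.
example = binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
```

For collision anticipation, recall reflects how many true collisions a model flags, while precision reflects how many of its warnings are correct; F1 is the harmonic mean of the two.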
Keywords
Computer Science
Rights Statement
Copyright 2026, author.
Recommended Citation
Merakanapalli, Jayanth Nageswara Rao, "Dashcam-Based Collision Anticipation (DCA) Using Multimodal Large Language Models: An Ablation Study" (2026). Graduate Theses and Dissertations. 7655.
https://ecommons.udayton.edu/graduate_theses/7655

Comments
OCLC No. 1591829754