Assessment of Generative AI for Text-to-Image Synthesis
Date of Award
5-5-2024
Degree Name
M.C.S. in Computer Science
Department
Department of Computer Science
Advisor/Chair
Tam Nguyen
Abstract
Text-to-image models, a prominent application of generative AI, face several challenges shared with the broader fields of computer vision and machine learning. Generating realistic images from textual descriptions is a complex task: while recent models have made significant progress, photorealism remains elusive, and understanding and translating natural-language descriptions into meaningful visual representations is inherently difficult. Many text-to-image models struggle to produce images with sufficient resolution and clarity, especially for complex scenes or objects. Assessing the quality of the generated images is itself challenging because perception of image quality is subjective, and generated images lack a clear ground truth or the fine detail that guides human attention. To address these challenges in the study of artificially generated images, we introduce a novel approach centered on a dataset, Generative Artificial Image Assessment (GAIA), paired with trained architectures to establish a definitive reference point. The dataset comprises images produced by eight popular text-to-image AI models together with user rankings collected through crowd-sourced annotation. We evaluate the dataset with pre-trained state-of-the-art networks, using ranking classes and diverse regression techniques in a comparative study of the images. The approach combines objective evaluation metrics, subjective human judgment, benchmark datasets with diverse ground-truth annotations, and advances in multimodal learning, offering a way forward for the field of text-to-image generation.
Keywords
Computer Science
Rights Statement
Copyright 2024, author
Recommended Citation
Sharma, Kriti, "Assessment of Generative AI for Text-to-Image Synthesis" (2024). Graduate Theses and Dissertations. 7606.
https://ecommons.udayton.edu/graduate_theses/7606
