Image Caption Generation Using AI/ML With User Feedback





Nidhi Sinha


Presentation: 10:00-10:20 a.m., Jessie Hathcock Hall 101



The objective of this project is to create a machine learning (ML) and artificial intelligence (AI) system that can produce accurate and meaningful captions for images. Given an image as input, the system will return a caption that describes what the image shows. This will be achieved by training a neural network on a large collection of images and their associated captions. The system will use a convolutional neural network (CNN) to extract features from the image and a recurrent neural network (RNN) to generate the caption: the CNN will transform the image into a feature vector, which will be passed as input to the RNN, and the RNN will then produce the caption as a sequence of words. Training the model requires a large dataset of images and their corresponding captions. The dataset will first be pre-processed so that it is suitable for neural networks, which includes resizing the images to a fixed size, normalizing pixel values, and converting the captions to numerical vectors. Once training is complete, the model will be tested on a separate set of images to check the accuracy and relevance of its captions. Users will be able to upload an image to a web application and instantly receive a caption. Through the application's feedback feature, they will be able to rate the generated captions and offer additional input, and this feedback will be used to fine-tune the model and improve its accuracy. The technique has a wide range of possible uses, such as making images more accessible to people who are visually impaired, improving the searchability of image databases, and providing automated image descriptions for advertising and social media. In conclusion, the goal of this project is to create a system that produces precise and relevant captions for images using AI/ML methods. The system will be deployed as a web application for real-time use and assessed using user feedback.
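
The pre-processing steps named in the abstract (resizing images to a fixed size, normalizing pixel values, and converting captions to numerical vectors) could look roughly like the sketch below. It is an illustration only, not the project's implementation: the function names, the nearest-neighbour resize, the special tokens, and the whitespace tokenizer are all assumptions made for this example, and a real pipeline would more likely rely on an image library and a deep learning framework.

```python
from collections import Counter

# Hypothetical special tokens; the project does not specify its vocabulary.
PAD, START, END, UNK = "<pad>", "<start>", "<end>", "<unk>"


def resize_nearest(image, out_h, out_w):
    """Resize a 2-D grid of pixels (a list of rows) by nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize(image, max_value=255):
    """Scale integer pixel values into the range [0, 1]."""
    return [[px / max_value for px in row] for row in image]


def build_vocab(captions, min_count=1):
    """Map each sufficiently frequent word to an integer id (assumed scheme)."""
    counts = Counter(word for c in captions for word in c.lower().split())
    vocab = {PAD: 0, START: 1, END: 2, UNK: 3}
    for word in sorted(counts):
        if counts[word] >= min_count:
            vocab[word] = len(vocab)
    return vocab


def encode_caption(caption, vocab, max_len):
    """Turn a caption into a fixed-length list of ids, padded with PAD.

    Captions longer than max_len are truncated, which can drop the END token.
    """
    ids = [vocab[START]]
    ids += [vocab.get(w, vocab[UNK]) for w in caption.lower().split()]
    ids.append(vocab[END])
    ids = ids[:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))
```

For example, `normalize(resize_nearest(pixels, 224, 224))` would prepare a grayscale image for a CNN that expects 224 x 224 input (the size is an assumption), while `encode_caption` yields the integer sequence an RNN decoder would be trained to predict.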

Publication Date


Project Designation

Course Project 202310 CPS 595 P1

Primary Advisor

Ahmed El Ouadrhiri, Phu Phung

Primary Advisor's Department

Computer Science


Stander Symposium, College of Arts and Sciences
