IJFANS International Journal of Food and Nutritional Sciences

ISSN Print 2319-1775, Online 2320-7876

Advancements in Image Captioning: Bridging Computer Vision and Natural Language Processing with Deep Learning


S. Sagar Imambi¹, N. V. Nikhila²
DOI: 10.48047/IJFANS/11/S6/036


Over the past few years, image captioning has emerged as a complex and demanding task within the field of artificial intelligence, attracting many AI researchers as both a challenging and an interesting problem. Image captioning automatically generates a textual description of the content observed in an image, and it combines two fields: computer vision and natural language processing. Computer vision is used to understand the content of the image, and natural language processing is used to express that content as words in the correct order. Recently, deep learning methods have achieved better results on caption generation: instead of requiring a pipeline of specifically designed models or sophisticated data preparation, they can define a single end-to-end model that predicts a caption for a given photograph. Using deep learning techniques such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), accurate descriptions can be predicted. The CNN extracts features from the image, and the RNN generates the sentence. The model is trained so that, given an image, it generates a textual description of what is observed in that image. The recurrent neural network can be trained on a dataset of images paired with text descriptions and then used to generate new descriptions for new images.
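The CNN-encoder / RNN-decoder pipeline described above can be sketched in plain Python. This is a minimal, untrained illustration under stated assumptions, not the authors' implementation: the toy vocabulary, the hand-written kernels, the names `cnn_encode` and `TinyCaptioner`, and the choice of a simple Elman RNN with greedy decoding are all hypothetical. A practical captioner would use a deep pretrained CNN and an LSTM or GRU decoder trained on image-caption pairs.

```python
import math
import random

# Hypothetical toy vocabulary; a real system builds this from a captioned dataset.
VOCAB = ["<start>", "<end>", "a", "dog", "cat", "runs", "sits", "on", "grass"]


def conv2d_relu(image, kernel):
    """Valid 2-D convolution (stride 1, no padding; cross-correlation as in
    most deep learning libraries) followed by a ReLU non-linearity."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(max(0.0, s))
        out.append(row)
    return out


def cnn_encode(image, kernels):
    """Toy CNN encoder: one feature per kernel (conv + ReLU + global average pooling)."""
    features = []
    for k in kernels:
        fmap = conv2d_relu(image, k)
        features.append(sum(sum(r) for r in fmap) / (len(fmap) * len(fmap[0])))
    return features


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyCaptioner:
    """Hypothetical Elman-RNN decoder conditioned on image features (random, untrained weights)."""

    def __init__(self, n_features, hidden=8, seed=0):
        rng = random.Random(seed)
        v = len(VOCAB)
        self.W_init = rand_matrix(hidden, n_features, rng)  # image features -> initial hidden state
        self.W_xh = rand_matrix(hidden, v, rng)             # one-hot word -> hidden
        self.W_hh = rand_matrix(hidden, hidden, rng)        # hidden -> hidden (recurrence)
        self.W_hy = rand_matrix(v, hidden, rng)             # hidden -> vocabulary scores

    def step(self, h, word_id):
        x = [1.0 if i == word_id else 0.0 for i in range(len(VOCAB))]
        h_new = [math.tanh(p + q) for p, q in zip(matvec(self.W_xh, x), matvec(self.W_hh, h))]
        return h_new, matvec(self.W_hy, h_new)

    def generate(self, features, max_len=10):
        h = [math.tanh(z) for z in matvec(self.W_init, features)]
        word = VOCAB.index("<start>")
        caption = []
        for _ in range(max_len):
            h, logits = self.step(h, word)
            # Greedy decoding: pick the highest-scoring word, never "<start>" (index 0).
            word = max(range(1, len(logits)), key=logits.__getitem__)
            if VOCAB[word] == "<end>":
                break
            caption.append(VOCAB[word])
        return caption


# Usage: encode a synthetic 6x6 grayscale "image", then decode a caption.
image = [[(i * j) % 5 / 4.0 for j in range(6)] for i in range(6)]
kernels = [[[1, 0], [0, -1]], [[0, 1], [1, 0]], [[1, 1], [1, 1]]]
features = cnn_encode(image, kernels)
model = TinyCaptioner(n_features=len(features))
print(" ".join(model.generate(features)))
```

Because the weights are random, the printed words are meaningless; training would adjust `W_init`, `W_xh`, `W_hh` and `W_hy` (and, in practice, the CNN) to maximize the likelihood of reference captions. Feeding the image features only into the initial hidden state is one common design choice; other designs feed them to the decoder at every step.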
