IJFANS International Journal of Food and Nutritional Sciences

ISSN Print 2319-1775, Online 2320-7876

Critical Factors for Optimizing Large Multi-modal Models: Image Resolution and Text Labeling


U. Harita, Tanaya Ganguly

Abstract

Large Multimodal Models have proven remarkably adept at tasks that require broad vision-and-language understanding. However, these models frequently struggle with complex scene understanding and narrative tasks for two reasons: their supported input resolution is limited (e.g., 448 × 448), and the text descriptions paired with their training images are often incomplete.
