The Role of Transformer-based Image Captioning for Indoor Environment Visual Understanding



Image captioning has attracted extensive attention in the field of image understanding. It has two natural parts, image and language expression, and combines computer vision and natural language processing (NLP) to generate a caption. The goal is for the model to describe an image as accurately as the ground-truth captions written by humans. Image captioning can be applied in different scenarios, such as helping visually impaired people better understand their surroundings through generated captions that can be converted to speech. In this paper, we present a novel image captioning approach in Bahasa Indonesia, using a Transformer, to enable visual understanding of indoor environments. We use our own modified MSCOCO dataset, built from ten indoor object categories in MSCOCO: beds, sinks, chairs, couches, tables, televisions, refrigerators, house plants, ovens, and cellphones. We modified the captions by creating three new captions in Bahasa Indonesia that include each object's name, color, position, size, characteristics, and close surroundings. We compare the Transformer architecture with a merged encoder-decoder architecture under different hyperparameter settings. Both architectures use InceptionV3 to extract image features. Our experiments show that the Transformer model with a batch size of 64, 4 attention heads, and a dropout of 0.2 outperforms the other models, with a BLEU-1 score of 0.527565, BLEU-2 score of 0.353696, BLEU-3 score of 0.227728, BLEU-4 score of 0.146192, METEOR score of 0.184714, ROUGE-L score of 0.377379, and CIDEr score of 0.393117. Finally, the inference results show that the generated captions can convey an understanding of the indoor environment.
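For readers unfamiliar with the BLEU-n scores quoted in the abstract, the following is a minimal sketch of sentence-level BLEU (clipped n-gram precision with a brevity penalty and uniform weights). It is an illustration only, not the authors' evaluation code: the function names are hypothetical, the example captions are invented, and published scores are normally computed at corpus level with a standard toolkit, so the numbers will not match the ones reported.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it appears in any single reference."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def brevity_penalty(candidate, references):
    """Penalize candidates shorter than the closest reference length."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Closest reference length; ties are broken toward the shorter reference.
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    return 1.0 if c > r else math.exp(1.0 - r / c)


def bleu(candidate, references, max_n):
    """Sentence-level BLEU-max_n with uniform weights and no smoothing."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # unsmoothed: any zero precision gives a zero score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_mean)


if __name__ == "__main__":
    # Invented captions in Bahasa Indonesia, for illustration only.
    reference = "sebuah kursi coklat berada di samping meja kayu".split()
    candidate = "sebuah kursi coklat di samping meja".split()
    for n in range(1, 5):
        print(f"BLEU-{n}: {bleu(candidate, [reference], n):.4f}")
```

Because higher-order n-grams match less often than unigrams, BLEU-4 is typically lower than BLEU-1 for the same captions, which is the pattern seen in the reported scores.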


Detail Information

Publisher: International Journal of Computing and Digital Systems, Bahrain
Collation: 005
Language: English
ISBN/ISSN: 2210-142X
Classification: NONE

Other Information

Accreditation: Scopus Q3
