17 September 2024
Warsaw University of Life Sciences - SGGW
Europe/Warsaw timezone

Improving the efficiency of "Show and Tell" encoder-decoder image captioning model

17 Sept 2024, 11:25
25m
Online meeting - MS Teams (Warsaw University of Life Sciences - SGGW)

Nowoursynowska 159 Warszawa, Poland (See the section "Programme & Venue" for details)
paper main track

Speakers

Albert Ziółkiewicz, Karol Zieliński, Marcin Iwanowski (Institute of Control and Industrial Electronics, Warsaw University of Technology), Mateusz Bartosiewicz (Institute of Control and Industrial Electronics, Warsaw University of Technology), Piotr Szczepański

Description

The paper investigates how the hyperparameters of the "Show and Tell" image captioning model influence the overall efficiency of the method. The method follows an encoder-decoder approach: the encoder, a backbone feature extractor based on a convolutional neural network (CNN), extracts image features, and the decoder, a recurrent neural network (RNN), produces a caption, that is, a phrase describing the image content. In our research, we tested the encoder by comparing DenseNet, ResNet, and RegNet image feature extractors, and the decoder by varying the size of the RNN. We also investigated the sentence generation stage. The investigation aims to find the optimal combination of feature extractor and decoder size. Our research shows that a suitable choice of the model's hyperparameters increases caption generation efficiency.
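
The search over feature extractor and decoder size combinations described above could be organised roughly as in the following sketch. It is only an illustration under stated assumptions: the names `CaptionerConfig`, `build_grid`, and `select_best` are hypothetical, the RNN sizes are placeholders rather than values from the paper, and the scoring function stands in for whatever caption-quality metric is actually used.

```python
from itertools import product
from typing import Callable, Iterable, List, NamedTuple


class CaptionerConfig(NamedTuple):
    """One hyperparameter combination (hypothetical container, not from the paper)."""
    encoder: str   # CNN backbone family used as the image feature extractor
    rnn_size: int  # hidden-state size of the RNN decoder


def build_grid(encoders: Iterable[str], rnn_sizes: Iterable[int]) -> List[CaptionerConfig]:
    """Enumerate every encoder / decoder-size combination."""
    return [CaptionerConfig(e, s) for e, s in product(encoders, rnn_sizes)]


def select_best(grid: Iterable[CaptionerConfig],
                evaluate: Callable[[CaptionerConfig], float]) -> CaptionerConfig:
    """Return the configuration with the highest score under `evaluate`.

    `evaluate` is supplied by the caller; in practice it would train and
    score a captioning model for the given configuration.
    """
    return max(grid, key=evaluate)


# Backbone families are those named in the abstract; the sizes are assumed.
grid = build_grid(["densenet", "resnet", "regnet"], [256, 512, 1024])
```

Keeping the scoring function as a parameter separates the enumeration of configurations from the costly training and evaluation step, so the same grid can be reused with different metrics.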

Primary authors

Albert Ziółkiewicz, Karol Zieliński, Marcin Iwanowski (Institute of Control and Industrial Electronics, Warsaw University of Technology), Mateusz Bartosiewicz (Institute of Control and Industrial Electronics, Warsaw University of Technology), Piotr Szczepański
