Content-Aware Generative Model for Multi-Item Outfit Recommendation

EasyChair Preprint no. 8016

14 pages
Date: May 22, 2022


Recently, deep learning-based recommender systems have received increasing attention from researchers and have demonstrated excellent results on a variety of tasks across many domains. One recent and growing trend is learning the compatibility of items in a set and predicting the next item, or several items, from the input ones. Fashion compatibility modeling is one of the areas in which this task is actively researched. Classical solutions are trained on existing sets and learn to recommend items that have already been combined with each other, which severely limits the number of possible combinations. GAN models have proved the most effective at reducing the impact of this problem and generating unseen combinations of items, but they also have several limitations. First, they use a fixed number of input and output items, whereas real outfits contain a variable number of items. Second, they use unimodal or multimodal data to generate only visual features, an approach that does not guarantee that the content attributes of items are preserved during generation. We propose a multimodal transformer-based GAN with cross-modal attention that simultaneously explores visual features and textual attributes. We also propose representing a set of items as a sequence, which allows the model to decide how many items the set should contain. Experiments on the FOTOS dataset for the fill-in-the-blank task show that our method outperforms strong baselines such as Bi-LSTM-VSE, MGCM, and HFGN: our model reaches an accuracy of 0.878, compared with 0.724 for Bi-LSTM-VSE, 0.822 for MGCM, and 0.826 for HFGN.
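The abstract names cross-modal attention between visual features and textual attributes but does not describe it further. As a rough illustration only, the sketch below shows generic single-head scaled dot-product cross-attention in which visual item vectors act as queries over textual attribute vectors (keys and values). It is not the authors' implementation: the function names, the omission of learned projections, and the requirement that both modalities share one dimension are all assumptions made to keep the mechanism visible.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(visual: Matrix, textual: Matrix) -> Matrix:
    """Hypothetical sketch: each visual token (query) attends over the
    textual attribute tokens (keys and values).

    Assumption: learned projections W_q, W_k, W_v are omitted (identity),
    so visual and textual vectors must share the same dimension d.
    """
    d = len(visual[0])
    if any(len(t) != d for t in textual):
        raise ValueError("visual and textual tokens must have the same dimension")
    scale = 1.0 / math.sqrt(d)
    out: Matrix = []
    for q in visual:
        # Scaled similarity of this visual query to every textual key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in textual]
        weights = softmax(scores)
        # Attention-weighted sum of textual values: one fused vector per visual query.
        out.append([sum(w * v[j] for w, v in zip(weights, textual)) for j in range(d)])
    return out


if __name__ == "__main__":
    visual = [[1.0, 0.0], [0.0, 1.0]]              # two item images (toy features)
    textual = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # three attribute embeddings
    for row in cross_modal_attention(visual, textual):
        print([round(x, 3) for x in row])
```

For example, `cross_modal_attention([[1.0, 0.0]], [[0.2, 0.8]])` returns `[[0.2, 0.8]]`, because with a single textual token all attention weight falls on it. A full transformer layer would add learned projections, multiple heads, residual connections, and normalization; they are left out here so that only the query/key/value flow across modalities remains.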

Keyphrases: Generative Adversarial Networks, Multimodal Recommendations, Outfit Recommendations, Recommender Systems, Set Recommendations, Transformers

