Text to image synthesis using generative adversarial network

Text-to-image synthesis generates images from a given text description, such that the content of the synthesised images matches the description. Existing text-to-image synthesis approaches are mainly built on Generative Adversarial Networks (GANs) for optimal performance. However, text-to-image synthesis remains challenging in synthesising realistic and semantically consistent images. In this thesis, text-to-image synthesis frameworks are proposed to synthesise highly realistic images conditioned on the text description. This research proposes GAN-based text-to-image synthesis frameworks that utilise advanced model architectures to synthesise large-scale realistic images. Moreover, self-supervised learning is investigated for the first time in text-to-image synthesis, to exploit high-level structural information when synthesising complex objects. In this work, three novel text-to-image synthesis frameworks are designed, referred to as: (1) Self-supervised Residual Generative Adversarial Network (SResGAN), (2) Multi-scale Self-supervised Residual Generative Adversarial Network (MSResGAN), and (3) Multi-scale Refined Self-supervised Residual Generative Adversarial Network (MRSResGAN). SResGAN investigates an advanced model architecture built on residual networks to produce large-scale, realistic images. For better visual realism, MSResGAN and MRSResGAN employ a multi-scale GAN architecture that synthesises images progressively from a lower scale to a larger scale. All designed frameworks manage to synthesise highly realistic images based on the given text description. In addition, the proposed frameworks integrate self-supervised learning via a rotation task to mitigate the low-data regime and diversify the learned representations. In doing so, the models are able to exploit high-level structural information throughout the network and synthesise more diverse image content. Furthermore, the proposed frameworks incorporate feature matching, an L1 distance loss, and one-sided label smoothing to stabilise model training. All three frameworks are evaluated on two benchmark text-to-image synthesis datasets, Oxford-102 and CUB-200-2011. Performance is measured by three evaluation metrics: Inception Score, Fréchet Inception Distance, and Structural Similarity Index. The experimental results show that all three frameworks outperform several existing text-to-image synthesis approaches on both benchmark datasets.
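To make the multi-scale residual design concrete, here is a minimal PyTorch sketch of a generator that conditions on a text embedding and emits images at three scales, with residual blocks between upsampling stages. All dimensions, channel counts, and module names are illustrative assumptions; the thesis's actual SResGAN, MSResGAN, and MRSResGAN architectures are not specified in this record.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block: two 3x3 convolutions with an identity skip."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))

class MultiScaleGenerator(nn.Module):
    """Maps (noise, text embedding) to 64x64, 128x128 and 256x256 images."""
    def __init__(self, noise_dim=100, text_dim=256, ch=512):
        super().__init__()
        self.ch = ch
        self.fc = nn.Linear(noise_dim + text_dim, ch * 4 * 4)

        def up(cin, cout):  # double spatial size, then refine with a ResBlock
            return nn.Sequential(
                nn.Upsample(scale_factor=2, mode="nearest"),
                nn.Conv2d(cin, cout, 3, padding=1),
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
                ResBlock(cout),
            )

        # 4x4 -> 8 -> 16 -> 32 -> 64, then two more doublings for 128 and 256.
        self.trunk64 = nn.Sequential(
            up(ch, ch // 2), up(ch // 2, ch // 4),
            up(ch // 4, ch // 8), up(ch // 8, ch // 8),
        )
        self.trunk128 = up(ch // 8, ch // 16)
        self.trunk256 = up(ch // 16, ch // 16)

        def to_rgb(cin):  # project features to a 3-channel image in [-1, 1]
            return nn.Sequential(nn.Conv2d(cin, 3, 3, padding=1), nn.Tanh())

        self.rgb64 = to_rgb(ch // 8)
        self.rgb128 = to_rgb(ch // 16)
        self.rgb256 = to_rgb(ch // 16)

    def forward(self, noise, text_embedding):
        h = self.fc(torch.cat([noise, text_embedding], dim=1))
        h = h.view(-1, self.ch, 4, 4)
        h64 = self.trunk64(h)
        h128 = self.trunk128(h64)
        h256 = self.trunk256(h128)
        return self.rgb64(h64), self.rgb128(h128), self.rgb256(h256)
```

Emitting an image at every scale lets a discriminator at each resolution supervise the coarse layout before the fine detail, which is the usual motivation for low-to-high multi-scale GANs.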
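The abstract also names the training-stabilisation ingredients: a self-supervised rotation task, feature matching, an L1 distance loss, and one-sided label smoothing. The sketch below shows one common way to combine them in discriminator and generator losses. The discriminator interface and the attribute `D.rotation_head` are hypothetical, and the loss weights are illustrative, not taken from the thesis.

```python
import torch
import torch.nn.functional as F

def rotate_batch(images):
    """Rotate copies of the batch by 0/90/180/270 degrees and return the
    rotated images together with their rotation-class labels (0..3)."""
    rotations, labels = [], []
    for k in range(4):  # k * 90 degrees
        rotations.append(torch.rot90(images, k, dims=(2, 3)))
        labels.append(torch.full((images.size(0),), k, dtype=torch.long))
    return torch.cat(rotations), torch.cat(labels)

def discriminator_loss(D, real_images, fake_images, text_embedding,
                       smooth=0.9, ssl_weight=1.0):
    # Conditional real/fake scores; one-sided label smoothing softens
    # only the real targets (e.g. 0.9), never the fake targets.
    real_logits, _ = D(real_images, text_embedding)
    fake_logits, _ = D(fake_images.detach(), text_embedding)
    adv = F.binary_cross_entropy_with_logits(
        real_logits, torch.full_like(real_logits, smooth))
    adv += F.binary_cross_entropy_with_logits(
        fake_logits, torch.zeros_like(fake_logits))

    # Self-supervised rotation task: classify which of the four
    # rotations was applied to a real image.
    rotated, rot_labels = rotate_batch(real_images)
    rot_logits = D.rotation_head(rotated)  # hypothetical auxiliary head
    ssl = F.cross_entropy(rot_logits, rot_labels.to(rotated.device))
    return adv + ssl_weight * ssl

def generator_loss(D, fake_images, real_images, text_embedding,
                   fm_weight=1.0, l1_weight=1.0):
    # Adversarial term plus feature matching (match intermediate
    # discriminator features of real and fake) and an L1 distance loss.
    fake_logits, fake_feats = D(fake_images, text_embedding)
    _, real_feats = D(real_images, text_embedding)
    adv = F.binary_cross_entropy_with_logits(
        fake_logits, torch.ones_like(fake_logits))
    fm = F.l1_loss(fake_feats, real_feats.detach())
    l1 = F.l1_loss(fake_images, real_images)
    return adv + fm_weight * fm + l1_weight * l1
```

Smoothing only the real labels is the standard one-sided variant: it stops the discriminator from becoming over-confident on real images without rewarding it for pushing fake scores toward the smoothed target.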
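For the three reported metrics, a minimal evaluation sketch using the torchmetrics library (an assumption; the record does not say which implementation the thesis used) could look as follows, with images supplied as float tensors in [0, 1].

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.inception import InceptionScore
from torchmetrics.image import StructuralSimilarityIndexMeasure

# normalize=True tells the Inception-based metrics to expect floats in [0, 1].
fid = FrechetInceptionDistance(feature=2048, normalize=True)
inception = InceptionScore(normalize=True)
ssim = StructuralSimilarityIndexMeasure(data_range=1.0)

def evaluate(real_batches, fake_batches):
    """Accumulate metrics over paired batches of real and synthesised images."""
    ssim_scores = []
    for real, fake in zip(real_batches, fake_batches):
        fid.update(real, real=True)
        fid.update(fake, real=False)
        inception.update(fake)
        # SSIM compares each synthesised image against its reference image.
        ssim_scores.append(ssim(fake, real))
    is_mean, is_std = inception.compute()
    return {
        "FID": fid.compute().item(),
        "IS": is_mean.item(),
        "SSIM": torch.stack(ssim_scores).mean().item(),
    }
```

Lower FID, higher Inception Score, and higher SSIM indicate better results; exact preprocessing (image size, Inception input resizing) may differ from the thesis's pipeline.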

Bibliographic Details
Main Author: Tan, Yong Xuan
Format: Thesis
Published: 2022
Subjects: Q300-390 Cybernetics
id my-mmu-ep.11543
record_format uketd_dc
spelling my-mmu-ep.11543 2023-07-18T05:23:06Z Text to image synthesis using generative adversarial network 2022-12 Tan, Yong Xuan Q300-390 Cybernetics Text-to-image synthesis generates images from a given text description, such that the content of the synthesised images matches the description. Existing text-to-image synthesis approaches are mainly built on Generative Adversarial Networks (GANs) for optimal performance. However, text-to-image synthesis remains challenging in synthesising realistic and semantically consistent images. In this thesis, text-to-image synthesis frameworks are proposed to synthesise highly realistic images conditioned on the text description. This research proposes GAN-based text-to-image synthesis frameworks that utilise advanced model architectures to synthesise large-scale realistic images. Moreover, self-supervised learning is investigated for the first time in text-to-image synthesis, to exploit high-level structural information when synthesising complex objects. In this work, three novel text-to-image synthesis frameworks are designed, referred to as: (1) Self-supervised Residual Generative Adversarial Network (SResGAN), (2) Multi-scale Self-supervised Residual Generative Adversarial Network (MSResGAN), and (3) Multi-scale Refined Self-supervised Residual Generative Adversarial Network (MRSResGAN). SResGAN investigates an advanced model architecture built on residual networks to produce large-scale, realistic images. For better visual realism, MSResGAN and MRSResGAN employ a multi-scale GAN architecture that synthesises images progressively from a lower scale to a larger scale. All designed frameworks manage to synthesise highly realistic images based on the given text description. In addition, the proposed frameworks integrate self-supervised learning via a rotation task to mitigate the low-data regime and diversify the learned representations. In doing so, the models are able to exploit high-level structural information throughout the network and synthesise more diverse image content. Furthermore, the proposed frameworks incorporate feature matching, an L1 distance loss, and one-sided label smoothing to stabilise model training. All three frameworks are evaluated on two benchmark text-to-image synthesis datasets, Oxford-102 and CUB-200-2011. Performance is measured by three evaluation metrics: Inception Score, Fréchet Inception Distance, and Structural Similarity Index. The experimental results show that all three frameworks outperform several existing text-to-image synthesis approaches on both benchmark datasets. 2022-12 Thesis http://shdl.mmu.edu.my/11543/ http://erep.mmu.edu.my/ masters Multimedia University Faculty of Information Science and Technology (FIST) EREP ID: 10860
institution Multimedia University
collection MMU Institutional Repository
topic Q300-390 Cybernetics
spellingShingle Q300-390 Cybernetics
Tan, Yong Xuan
Text to image synthesis using generative adversarial network
description Text-to-image synthesis generates images from a given text description, such that the content of the synthesised images matches the description. Existing text-to-image synthesis approaches are mainly built on Generative Adversarial Networks (GANs) for optimal performance. However, text-to-image synthesis remains challenging in synthesising realistic and semantically consistent images. In this thesis, text-to-image synthesis frameworks are proposed to synthesise highly realistic images conditioned on the text description. This research proposes GAN-based text-to-image synthesis frameworks that utilise advanced model architectures to synthesise large-scale realistic images. Moreover, self-supervised learning is investigated for the first time in text-to-image synthesis, to exploit high-level structural information when synthesising complex objects. In this work, three novel text-to-image synthesis frameworks are designed, referred to as: (1) Self-supervised Residual Generative Adversarial Network (SResGAN), (2) Multi-scale Self-supervised Residual Generative Adversarial Network (MSResGAN), and (3) Multi-scale Refined Self-supervised Residual Generative Adversarial Network (MRSResGAN). SResGAN investigates an advanced model architecture built on residual networks to produce large-scale, realistic images. For better visual realism, MSResGAN and MRSResGAN employ a multi-scale GAN architecture that synthesises images progressively from a lower scale to a larger scale. All designed frameworks manage to synthesise highly realistic images based on the given text description. In addition, the proposed frameworks integrate self-supervised learning via a rotation task to mitigate the low-data regime and diversify the learned representations. In doing so, the models are able to exploit high-level structural information throughout the network and synthesise more diverse image content. Furthermore, the proposed frameworks incorporate feature matching, an L1 distance loss, and one-sided label smoothing to stabilise model training. All three frameworks are evaluated on two benchmark text-to-image synthesis datasets, Oxford-102 and CUB-200-2011. Performance is measured by three evaluation metrics: Inception Score, Fréchet Inception Distance, and Structural Similarity Index. The experimental results show that all three frameworks outperform several existing text-to-image synthesis approaches on both benchmark datasets.
format Thesis
qualification_level Master's degree
author Tan, Yong Xuan
author_facet Tan, Yong Xuan
author_sort Tan, Yong Xuan
title Text to image synthesis using generative adversarial network
title_short Text to image synthesis using generative adversarial network
title_full Text to image synthesis using generative adversarial network
title_fullStr Text to image synthesis using generative adversarial network
title_full_unstemmed Text to image synthesis using generative adversarial network
title_sort text to image synthesis using generative adversarial network
granting_institution Multimedia University
granting_department Faculty of Information Science and Technology (FIST)
publishDate 2022
_version_ 1776101417046507520