Molecular similarity searching based on deep learning for feature reduction

Bibliographic Details
Main Author: Saeed Nasser, Maged Mohammed
Format: Thesis
Language: English
Published: 2022
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.utm.my/id/eprint/101478/1/MagedMohammedSaeedNasserPSC2022.pdf.pdf
id my-utm-ep.101478
record_format uketd_dc
institution Universiti Teknologi Malaysia
collection UTM Institutional Repository
language English
topic QA75 Electronic computers. Computer science
description The concept of molecular similarity has been widely used in rational drug design, where molecular databases are searched for structurally similar molecules in order to retrieve functionally similar ones. The most widely used conventional similarity methods rely on two-dimensional (2D) fingerprints to evaluate the similarity of molecules to a target query. However, these descriptors include redundant and irrelevant features that might reduce the effectiveness of similarity searching methods. Moreover, most existing similarity searching methods disregard the fact that some features are more important than others and instead treat all features as equally important. This study therefore proposed three approaches for identifying the important features of molecules in chemical datasets. The first approach represented the molecular features using an Autoencoder (AE), which removes irrelevant and redundant features. The second approach was a feature selection model based on Deep Belief Networks (DBN), in which the DBN is used to find the subset of features that represents the important ones. The third approach combined descriptors that complement each other: the important features of several descriptors were filtered through a DBN and combined to form a new descriptor for molecular similarity searching. The proposed approaches were tested on the standard MDL Drug Data Report (MDDR) dataset. Based on the test results, the three proposed approaches outperformed some of the existing benchmark similarity methods, such as Bayesian Inference Networks (BIN), the Tanimoto Similarity Method (TAN), the Adapted Similarity Measure of Text Processing (ASMTP) and the Quantum-Based Similarity Method (SQB).
The results showed that the three proposed approaches achieved better average recall values than the methods previously used to improve molecular similarity searching, especially on structurally heterogeneous datasets.
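A minimal sketch of the kind of evaluation the description refers to, assuming binary 2D fingerprints stored as sets of on-bit indices. Molecules are ranked against a query by the Tanimoto coefficient (the basis of the TAN benchmark), optionally after restricting every fingerprint to a reduced feature subset. Recall is then measured as the fraction of known actives retrieved within a top-k cutoff. The toy molecules, the hand-picked `SELECTED` subset (a stand-in for features an AE or DBN might retain, not the thesis's model) and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Set

# A 2D fingerprint is modelled as the set of indices of its "on" bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient for binary fingerprints: |A & B| / |A | B|."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: no shared evidence, score 0
    return len(a & b) / union


def restrict(fp: Fingerprint, selected: Optional[Set[int]]) -> Fingerprint:
    """Keep only the bits in `selected` (None means use every feature)."""
    return fp if selected is None else frozenset(fp & selected)


def rank_by_similarity(query: Fingerprint, database: Dict[str, Fingerprint],
                       selected: Optional[Set[int]] = None) -> List[str]:
    """Return database IDs ordered from most to least similar to the query."""
    q = restrict(query, selected)
    scores = {mol_id: tanimoto(q, restrict(fp, selected))
              for mol_id, fp in database.items()}
    # Descending score; ties broken by ID so the ranking is deterministic.
    return sorted(scores, key=lambda mol_id: (-scores[mol_id], mol_id))


def recall_at(ranking: List[str], actives: Set[str], cutoff: int) -> float:
    """Fraction of the known actives that appear in the top `cutoff` positions."""
    if not actives:
        return 0.0
    hits = sum(1 for mol_id in ranking[:cutoff] if mol_id in actives)
    return hits / len(actives)


# Hypothetical toy data: bits 10 and 11 play the role of irrelevant features.
QUERY: Fingerprint = frozenset({0, 1, 2, 3, 10, 11})
DB: Dict[str, Fingerprint] = {
    "m1": frozenset({0, 1, 2, 3}),      # active
    "m2": frozenset({0, 10, 11, 12}),   # inactive
    "m3": frozenset({1, 2, 3, 4, 5}),   # active
    "m4": frozenset({6, 7}),            # inactive
}
ACTIVES = {"m1", "m3"}
# Hand-picked stand-in for a learned reduced feature set (not a trained AE/DBN).
SELECTED = {0, 1, 2, 3, 4, 5}

if __name__ == "__main__":
    full = rank_by_similarity(QUERY, DB)
    reduced = rank_by_similarity(QUERY, DB, SELECTED)
    print(full, recall_at(full, ACTIVES, 2))        # ['m1', 'm2', 'm3', 'm4'] 0.5
    print(reduced, recall_at(reduced, ACTIVES, 2))  # ['m1', 'm3', 'm2', 'm4'] 1.0
```

In this toy data the hypothetical noise bits 10 and 11, which the inactive m2 shares with the query, lift m2 above the active m3. Dropping them moves m3 into the top two and raises recall at cutoff 2 from 0.5 to 1.0. This only illustrates why removing irrelevant features can matter; it reproduces none of the thesis's results.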
format Thesis
qualification_name Doctor of Philosophy (PhD.)
qualification_level Doctorate
author Saeed Nasser, Maged Mohammed
title Molecular similarity searching based on deep learning for feature reduction
granting_institution Universiti Teknologi Malaysia
granting_department Faculty of Engineering - School of Computing
publishDate 2022
url http://eprints.utm.my/id/eprint/101478/1/MagedMohammedSaeedNasserPSC2022.pdf.pdf