Alignment-free distance measures for clustering Expressed Sequence Tags

Clustering of expressed sequence tags (ESTs) is a vital step in the EST analysis pipeline. The main goal of clustering is to gather overlapping ESTs from the same transcript of a single gene into a distinct cluster. A simple way to cluster ESTs is to compare their similarity in a pair-wise manner. In...

Bibliographic Details
Main Author: Ngo, Keng Hoong
Format: Thesis
Published: 2013
Subjects: QA299.6-433 Analysis
id my-mmu-ep.5237
record_format uketd_dc
spelling my-mmu-ep.5237 2014-02-26T01:43:55Z 2013-03 Thesis http://shdl.mmu.edu.my/5237/ http://vlib.mmu.edu.my/diglib/login/dlusr/login.php phd doctoral Multimedia University Faculty of Computing & Informatics
institution Multimedia University
collection MMU Institutional Repository
topic QA299.6-433 Analysis
description Clustering of expressed sequence tags (ESTs) is a vital step in the EST analysis pipeline. The main goal of clustering is to gather overlapping ESTs from the same transcript of a single gene into a distinct cluster. A simple way to cluster ESTs is to compare their similarity in a pair-wise manner. Earlier EST clustering tools used alignment-based distance measures such as BLAST, FASTA and the Smith-Waterman algorithm. However, the main shortcoming of the alignment-based approach is the high computational cost of pair-wise alignment, which makes it impractical for very large EST datasets. This has motivated the introduction of alignment-free distance measures for EST clustering. Established EST clustering methods such as d2_cluster, wcd and PEACE apply alignment-free distance measures. They run faster than the alignment-based methods while retaining acceptable clustering accuracy. In EST clustering, it is common to combine the alignment-free distance measures with a windowing strategy. Some distance measures also use heuristics to speed up the comparisons. Consequently, the clustering results they produce can vary significantly from one dataset to another. Clustering performance is excellent when the distance measure detects and quantifies the features found in the dataset efficiently. It can be poor on another dataset with different characteristics, where the distance measure fails to capture and quantify those features correctly.
format Thesis
qualification_name Doctor of Philosophy (PhD.)
qualification_level Doctorate
author Ngo, Keng Hoong
title Alignment-free distance measures for clustering Expressed Sequence Tags
granting_institution Multimedia University
granting_department Faculty of Computing & Informatics
publishDate 2013
_version_ 1747829567697977344
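
The description refers to alignment-free distance measures used together with a windowing strategy. As a rough illustration only, the sketch below computes a d2-style distance (squared difference of k-mer counts) and takes the minimum over all pairs of fixed-size windows. The function names, the word size `k` and the window size are assumptions chosen for the example; this is not the thesis's method, nor the implementation used by d2_cluster, wcd or PEACE, and it omits the heuristics those tools use for speed.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_distance(a, b, k):
    """Sum of squared differences between the k-mer counts of a and b."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # A Counter returns 0 for words it has not seen.
    return sum((ca[w] - cb[w]) ** 2 for w in ca.keys() | cb.keys())


def windows(seq, size):
    """Yield every window of the given size; a shorter sequence yields itself."""
    if len(seq) <= size:
        yield seq
        return
    for i in range(len(seq) - size + 1):
        yield seq[i:i + size]


def windowed_d2(a, b, k=3, window=8):
    """Smallest d2 distance over all window pairs (brute force, illustrative)."""
    return min(
        d2_distance(wa, wb, k)
        for wa in windows(a, window)
        for wb in windows(b, window)
    )


if __name__ == "__main__":
    # Two made-up sequences that overlap in the stretch "ACGTACGGAT".
    est1 = "TTTTTTTTACGTACGGAT"
    est2 = "ACGTACGGATCCCCCCCC"
    print(d2_distance(est1, est2, 3))  # whole-sequence distance
    print(windowed_d2(est1, est2))     # distance of the best-matching windows
```

Comparing windows rather than whole sequences lets two ESTs that overlap only partially still score as close, at the cost of many more comparisons in this brute-force form.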