Alignment-free distance measures for clustering Expressed Sequence Tags

Bibliographic Details
Main Author:	Ngo, Keng Hoong
Format:	Thesis
Published:	2013
Subjects:	QA299.6-433 Analysis

id	my-mmu-ep.5237
date	2013-03
url	http://shdl.mmu.edu.my/5237/
institution	Multimedia University
collection	MMU Institutional Repository
topic	QA299.6-433 Analysis
description	Clustering of expressed sequence tags (ESTs) is a vital step in the EST analysis pipeline. Its main goal is to gather overlapping ESTs from the same transcript of a single gene into a distinct cluster. A simple way to cluster ESTs is to compare their similarity pairwise. Early EST clustering relied on alignment-based methods such as BLAST, FASTA and the Smith-Waterman algorithm. The main shortcoming of the alignment-based approach, however, is the high computational cost of pairwise alignment, which makes it impractical for very large EST datasets. This has motivated the introduction of alignment-free distance measures for EST clustering. Established EST clustering tools such as d2_cluster, wcd and PEACE apply alignment-free distance measures and, compared with alignment-based methods, run faster while achieving acceptable clustering accuracy. In EST clustering, alignment-free distance measures are commonly combined with a windowing strategy, and some measures also use heuristics to speed up the comparisons. Consequently, the clustering results they produce can vary significantly from one dataset to another: performance is excellent when the distance measure detects and quantifies the features present in a dataset well, but it can be poor on a dataset with different characteristics that the measure fails to capture and quantify correctly.
format	Thesis
qualification_name	Doctor of Philosophy (PhD)
qualification_level	Doctorate
author	Ngo, Keng Hoong
title	Alignment-free distance measures for clustering Expressed Sequence Tags
granting_institution	Multimedia University
granting_department	Faculty of Computing & Informatics
publishDate	2013
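The description outlines a general recipe: compare ESTs pairwise with an alignment-free distance computed over windows, then group ESTs whose distance is small. The Python sketch below illustrates that recipe with the d2 measure (the sum of squared differences between k-mer counts), taking the minimum d2 over all window pairs and linking ESTs by single linkage with a union-find structure. It is not the thesis's method, nor the implementation used by d2_cluster, wcd or PEACE. The function names, the toy sequences, the word length `k`, the window size and the threshold are hypothetical choices for illustration, and no speed-up heuristics are applied, so every pair is compared.

```python
from collections import Counter
from itertools import combinations


def word_counts(seq: str, k: int) -> Counter:
    """Count every overlapping word (k-mer) of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(a: str, b: str, k: int) -> int:
    """d2 distance: sum over all k-mers w of (count_a(w) - count_b(w))^2."""
    ca, cb = word_counts(a, k), word_counts(b, k)
    return sum((ca[w] - cb[w]) ** 2 for w in ca.keys() | cb.keys())


def windows(seq: str, size: int) -> list[str]:
    """All windows of length `size` (the whole sequence if it is shorter)."""
    if len(seq) <= size:
        return [seq]
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]


def windowed_d2(a: str, b: str, k: int, window: int) -> int:
    """Smallest d2 over every pair of windows, so a local overlap can score well."""
    return min(d2(wa, wb, k) for wa in windows(a, window) for wb in windows(b, window))


def cluster(ests: list[str], k: int, window: int, threshold: int) -> list[set[int]]:
    """Single-linkage clustering: link every pair whose windowed d2 <= threshold."""
    parent = list(range(len(ests)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Naive all-pairs loop: the quadratic step that heuristics try to shrink.
    for i, j in combinations(range(len(ests)), 2):
        if windowed_d2(ests[i], ests[j], k, window) <= threshold:
            parent[find(i)] = find(j)

    groups: dict[int, set[int]] = {}
    for i in range(len(ests)):
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


# Hypothetical toy ESTs: EST 1 starts with the last 12 bases of EST 0.
ESTS = [
    "ACGTACGTTGCAACGTAGCT",
    "TGCAACGTAGCTTTGACCGA",
    "GGGGCCCCAAAATTTTGGGG",
]

if __name__ == "__main__":
    print(cluster(ESTS, k=3, window=8, threshold=0))  # prints [{0, 1}, {2}]
```

With `k=3`, `window=8` and `threshold=0`, ESTs 0 and 1 share the identical window `TGCAACGT` inside their 12-base overlap, so they fall into one cluster while EST 2 stays on its own. A zero threshold only accepts exact window matches; in practice the threshold, window size and word length would need tuning per dataset, which echoes the abstract's point that results can vary with dataset characteristics.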

