COMPUTATIONAL METHODS FOR PHYLOGENETIC ANALYSIS

Zahid, Mohd. Abdul Hai

DSpace Home
→
ELECTRONICS AND COMMUNICATION ENGINEERING (FORMERLY ELECTRONICS & COMPUTER ENGINEERING)
→
DOCTORAL THESES (E & C)
→
View Item

COMPUTATIONAL METHODS FOR PHYLOGENETIC ANALYSIS

Zahid, Mohd. Abdul Hai

URI: http://hdl.handle.net/123456789/1790

Date: 2007

Abstract:

The huge volume of genetic data generated through biological experiments is not useful until it is analyzed and classified properly. A single human genome project ensued in 3.2 million base pairs of nucleotide sequence data. Manual analysis of this kind of huge data is impossible [1]. The quest of classifying the genetic material leads to one of the most important field of biology called phylogenet ics. Phylogenetics is the study of relationship among species or genes with the combination of molecular biology and mathematics. The large applications and availability of genetic data indicate the serious requirement for accurate, fast and generic phylogenetic analysis tools to process genomic data. This is presented in detail through citing recent works in the first part of the thesis. It is well known that the network representation of the evolutionary relation ship provides a better understanding of the evolutionary process and the non-tree like events such as horizontal gene transfer, hybridization, recombination and homoplasy. The second part of the thesis proposes a pattern recognition based approach for the construction of phylogenetic network due to recombination. Unlike other works [2, 3, 4, 5, 6, 7], we used both similarity and dissimilarity of Single Nucleotide Polymorphism (SNP) sites for classifying the nodes into mu tation and recombination nodes. The use of both distance measures helps to overcome the information loss due to converting of sequence data into distances. iii Abstract The proposed algorithm [8] conducts a row-based search to detect the recombina tion nodes which significantly reduces the complexity of the proposed algorithm. Comparisons with existing algorithms show the superiority of our approach both in network visualization and time complexity. Third part of the thesis presents the problem of merging a set of given smaller phylogenetic trees into a bigger tree called supertree. There exist nearly 1.7 mil lion known species. Constructing tree of life consisting of these species is im practical. A method which exploits the features of distance and character based phylogenetic reconstruction methods is proposed to construct phylogenetic tree of life. We developed a variant of well known Unweighted-Pair-Group Method with Arithmetic mean (UPGMA) [9] for constructing the rooted supertree [10]. The algorithm satisfies all the desirable properties of the supertree algorithms and gives the better visualization of the supertree than the existing supertree methods. We also consider the problem of supertree reconstruction for unrooted input trees [11]. The fourth part of the thesis introduces the problem of incorporating addi tional evolutionary information for constructing supertrees. Most of the existing supertree methods combine the input trees based on the topological information carried by each of the input tree and other evolutionary information is usually ignored [12]. If the available evolutionary information is considered with tree topology for amalgamating the input collection of trees, the resulting supertree would be more accurate and resolved. In this part, we propose a novel supertree method [13], which incorporates relative time divergence information with tree topologies. This method returns a supertree even for incompatibilities such as conflicts in divergence dates and incompatibility between topology and diver gence dates. The conflicts are resolved based on graph theoretic concepts [14]. iv Abstract Another advantage of the algorithm is that the resulting supertree represents all nestings present in the input collection, which is not possible with other existing algorithms. Fifth part of the thesis explores the problem of merging smaller trees with some of the labelled internal nodes. Generally, most of the existing supertree methods are developed based on the implicit assumption that only leaf nodes are labelled in the input tree collection. On the other hand, the phylogenies con structed based on morphological studies often contain the labelled internal nodes, thus requiring for a more generalized supertree approach. In this part of the dis sertation, we propose an optimization based divide and conquer method [15] to combine semi-labelled trees. The algorithm returns a supertree even for (descen ded level) incompatible input trees. Moreover, it also preserves all the nestings present in the input tree collection. On the other hand, most of the existing methods are neither capable of handling incompatible input trees nor the result ing supertree represent all the nestings present in all the input trees. Finally, the contributions made in the thesis are summarized and scope for the future work is outlined.

Show full item record