Abstract:
The huge volume of genetic data generated through biological experiments is not
useful until it is analyzed and classified properly. A single human genome project
ensued in 3.2 million base pairs of nucleotide sequence data. Manual analysis
of this kind of huge data is impossible [1]. The quest of classifying the genetic
material leads to one of the most important field of biology called phylogenet
ics. Phylogenetics is the study of relationship among species or genes with the
combination of molecular biology and mathematics. The large applications and
availability of genetic data indicate the serious requirement for accurate, fast and
generic phylogenetic analysis tools to process genomic data. This is presented in
detail through citing recent works in the first part of the thesis.
It is well known that the network representation of the evolutionary relation
ship provides a better understanding of the evolutionary process and the non-tree
like events such as horizontal gene transfer, hybridization, recombination and
homoplasy. The second part of the thesis proposes a pattern recognition based
approach for the construction of phylogenetic network due to recombination.
Unlike other works [2, 3, 4, 5, 6, 7], we used both similarity and dissimilarity of
Single Nucleotide Polymorphism (SNP) sites for classifying the nodes into mu
tation and recombination nodes. The use of both distance measures helps to
overcome the information loss due to converting of sequence data into distances.
iii
Abstract
The proposed algorithm [8] conducts a row-based search to detect the recombina
tion nodes which significantly reduces the complexity of the proposed algorithm.
Comparisons with existing algorithms show the superiority of our approach both
in network visualization and time complexity.
Third part of the thesis presents the problem of merging a set of given smaller
phylogenetic trees into a bigger tree called supertree. There exist nearly 1.7 mil
lion known species. Constructing tree of life consisting of these species is im
practical. A method which exploits the features of distance and character based
phylogenetic reconstruction methods is proposed to construct phylogenetic tree
of life. We developed a variant of well known Unweighted-Pair-Group Method
with Arithmetic mean (UPGMA) [9] for constructing the rooted supertree [10].
The algorithm satisfies all the desirable properties of the supertree algorithms
and gives the better visualization of the supertree than the existing supertree
methods. We also consider the problem of supertree reconstruction for unrooted
input trees [11].
The fourth part of the thesis introduces the problem of incorporating addi
tional evolutionary information for constructing supertrees. Most of the existing
supertree methods combine the input trees based on the topological information
carried by each of the input tree and other evolutionary information is usually
ignored [12]. If the available evolutionary information is considered with tree
topology for amalgamating the input collection of trees, the resulting supertree
would be more accurate and resolved. In this part, we propose a novel supertree
method [13], which incorporates relative time divergence information with tree
topologies. This method returns a supertree even for incompatibilities such as
conflicts in divergence dates and incompatibility between topology and diver
gence dates. The conflicts are resolved based on graph theoretic concepts [14].
iv
Abstract
Another advantage of the algorithm is that the resulting supertree represents all
nestings present in the input collection, which is not possible with other existing
algorithms.
Fifth part of the thesis explores the problem of merging smaller trees with
some of the labelled internal nodes. Generally, most of the existing supertree
methods are developed based on the implicit assumption that only leaf nodes
are labelled in the input tree collection. On the other hand, the phylogenies con
structed based on morphological studies often contain the labelled internal nodes,
thus requiring for a more generalized supertree approach. In this part of the dis
sertation, we propose an optimization based divide and conquer method [15] to
combine semi-labelled trees. The algorithm returns a supertree even for (descen
ded level) incompatible input trees. Moreover, it also preserves all the nestings
present in the input tree collection. On the other hand, most of the existing
methods are neither capable of handling incompatible input trees nor the result
ing supertree represent all the nestings present in all the input trees.
Finally, the contributions made in the thesis are summarized and scope for
the future work is outlined.